I have been analysing how to tackle the creation of an OSM-based administrative boundaries layer, as described in my previous post, First thoughs for a free administrative boundaries layer (I).

Basically, the task involves the following steps:

  1. Identify the right tags and export the associated geometries
  2. Convert these geometries into polygons
  3. Determine the hierarchy of these units (which provinces are subareas of a given country and so on)
  4. Add existing identifiers to make the layer usable in different policy contexts (FIPS, NUTS, ISO, etc.)
  5. Check the completeness of the dataset against a reference database to detect additions, deletions, newly introduced errors, etc.
  6. Add missing regions to the OSM database, or correct wrong ones (and start over)
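To make step 3 more concrete, here is a minimal sketch of one way a hierarchy could be derived from the polygons alone: each unit is assigned to the containing unit with the closest lower admin_level. Everything in it is an assumption for illustration: the unit names and coordinates are made up, the polygons are simple rings without holes, and the probe point (the mean of the vertices) is only guaranteed to fall inside convex shapes. Real boundaries would need a proper point-on-surface routine and multipolygon handling.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_mean(ring: Ring) -> Point:
    """Probe point: mean of the vertices (only safe for convex rings)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def find_parents(units: List[dict]) -> Dict[str, Optional[str]]:
    """Map each unit name to the name of its closest containing unit, or None."""
    parents: Dict[str, Optional[str]] = {}
    for child in units:
        probe = vertex_mean(child["ring"])
        candidates = [
            u for u in units
            if u["admin_level"] < child["admin_level"]
            and point_in_ring(probe, u["ring"])
        ]
        # The closest ancestor is the container with the highest admin_level.
        best = max(candidates, key=lambda u: u["admin_level"], default=None)
        parents[child["name"]] = best["name"] if best else None
    return parents


# Hypothetical toy data: one country, two regions, one province.
units = [
    {"name": "Country", "admin_level": 2, "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"name": "Region A", "admin_level": 4, "ring": [(0, 0), (5, 0), (5, 10), (0, 10)]},
    {"name": "Region B", "admin_level": 4, "ring": [(5, 0), (10, 0), (10, 10), (5, 10)]},
    {"name": "Province A1", "admin_level": 6, "ring": [(1, 1), (3, 1), (3, 3), (1, 3)]},
]

parents = find_parents(units)
```

Even in this toy form the weak points are visible: the result depends on where the probe point falls and on admin_level being tagged consistently, which is part of why the step is less trivial than it looks.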

As the layer should be updated regularly, all these steps have to be automated, ideally by reusing existing tools. As I am rather new to the OSM export toolchain, I asked for advice on the [OSM-talk] mailing list, and I got a bunch of ideas and potential tools to perform the task:

  • The MapIt scripts perform steps 1 and 2 nicely, but then you have to figure out the hierarchy on your own, which is not as trivial as it might look. The MapIt scripts are used to feed the MapIt Global gazetteer service with OSM data.
  • The Nominatim database seems to be able to perform steps 1 and 2 and, to some extent, also step 3. It is also a gazetteer service, more closely related to the OSM community.
  • Wikidata could be used for step 4, as it provides the links between NUTS, ISO, etc. and OSM relation IDs (although it is quite incomplete at the moment). It can also be used to identify hierarchies or to validate the hierarchies identified by Nominatim. Finally, at some point it could also be used as a reference database to assess the completeness of OSM data (step 5). Wikidata is a project of the Wikimedia Foundation (the organization behind Wikipedia) which aims to create a knowledge base of structured information.
  • Note that the geometries generated by MapIt or Nominatim might not correspond exactly to the expected geometries, as boundaries are tagged inconsistently in OSM (some of them map territorial waters, while others map the land boundaries). In order to generate uniform layers, the OSM Land Polygons dataset may be really helpful.
  • Finally, as setting up Nominatim or MapIt requires a lot of disk space when working on the full OSM planet, an initial filtering of the data can be very useful. This can be performed using osmosis or osmfilter.
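As a rough illustration of what that initial filtering amounts to, the sketch below shows the kind of predicate such a filter would apply to each relation's tags. It assumes the usual OSM tagging for administrative units (boundary=administrative plus a numeric admin_level); the max_level cut-off, the relation IDs and the sample tags are hypothetical, and this is not how osmosis or osmfilter are implemented.

```python
from typing import Dict, List


def is_admin_boundary(tags: Dict[str, str], max_level: int = 6) -> bool:
    """True if the tags describe an administrative boundary up to max_level."""
    if tags.get("boundary") != "administrative":
        return False
    try:
        # admin_level is stored as text; malformed values are rejected.
        level = int(tags.get("admin_level", ""))
    except ValueError:
        return False
    return 1 <= level <= max_level


# Hypothetical relations, as they might look after parsing an OSM extract.
relations: List[dict] = [
    {"id": 1, "tags": {"boundary": "administrative", "admin_level": "2"}},
    {"id": 2, "tags": {"boundary": "administrative", "admin_level": "4"}},
    {"id": 3, "tags": {"boundary": "administrative", "admin_level": "8"}},
    {"id": 4, "tags": {"boundary": "national_park"}},
    {"id": 5, "tags": {"boundary": "administrative", "admin_level": "unknown"}},
]

kept = [r["id"] for r in relations if is_admin_boundary(r["tags"])]
```

Dropping everything else before the import is what keeps the database small; the relations that survive still need their member ways and nodes to be kept so the geometries can be rebuilt.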

I am already familiar with the MapIt scripts, but I am tempted by Nominatim, as it would simplify my life by identifying the hierarchies for me. Moreover, as Nominatim is PostGIS-based, this might reduce the post-processing to a set of SQL scripts plus data imports and exports. I will describe some Nominatim internals in my next post.