First thoughts for a free administrative boundaries layer (II)



I have been analysing how to tackle the creation of an OSM-based administrative boundaries layer, as described in my previous post, First thoughts for a free administrative boundaries layer (I).

Basically, the task involves the following steps:

  1. Identify the right tags and export the associated geometries
  2. Convert these geometries into polygons
  3. Determine the hierarchy of these units (which provinces are subareas of a given country and so on)
  4. Add existing identifiers to make the layer usable in different policy contexts (FIPS, NUTS, ISO, etc.)
  5. Check the completeness of the dataset against some reference database to detect additions, deletions, errors introduced, etc.
  6. Add or correct missing regions in the OSM database (and start over)

As the layer should be updated regularly, all these steps have to be automated, ideally by reusing existing tools. As I am rather new to the OSM export toolchain, I asked for advice on the [OSM-talk] mailing list, and I got a bunch of ideas and potential tools to perform the task:

  • MapIt scripts perform steps 1 and 2 nicely, but then you have to figure out the hierarchy on your own, which is not as trivial as it might look. The MapIt scripts are used to feed the MapIt Global gazetteer service with OSM data.
  • The Nominatim database seems able to perform steps 1 and 2 and, to some extent, also step 3. It is also a gazetteer service, more closely tied to the OSM community.
  • Wikidata could be used for step 4, as it links NUTS, ISO and other codes to OSM relation IDs (although it is quite incomplete at the moment). It can also be used to identify hierarchies or to validate the hierarchies identified by Nominatim. Finally, at some point it could also serve as a reference database to assess the completeness of OSM data (step 5). Wikidata is a project of the Wikimedia Foundation (the organization behind Wikipedia) which aims to create a knowledge base of structured information.
  • Note that the geometries generated by MapIt or Nominatim might not exactly match the expected geometries, as boundaries are inconsistently tagged in OSM (some of them map territorial waters, while others map the land boundaries). To generate uniform layers, the OSM Land Polygons dataset may be really helpful.
  • Finally, as setting up Nominatim or MapIt requires a lot of disk space when working on the full OSM planet, an initial filtering of the data can be very useful. This can be done using osmosis or osmfilter.
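The initial filtering mentioned in the last point could be scripted, for example by building the osmfilter call from Python. This is only a minimal sketch: it assumes the planet extract has already been converted to the .o5m format, that keeping everything tagged boundary=administrative is enough for the task, and that the file names (which are hypothetical) suit your setup. The options should be checked against the osmfilter documentation before running it on real data.

```python
from typing import List


def osmfilter_command(source: str, target: str) -> List[str]:
    """Build an argument list for osmfilter that keeps administrative boundaries.

    Assumption: 'source' is an .o5m (or .osm) file and 'target' is the
    filtered output file. Both names are placeholders chosen by the caller.
    """
    return [
        "osmfilter",
        source,
        "--keep=boundary=administrative",  # tag filter for the objects to keep
        f"-o={target}",                    # output file
    ]


# Hypothetical file names, for illustration only.
command = osmfilter_command("planet.o5m", "admin-boundaries.osm")
print(" ".join(command))
```

The list can then be passed to subprocess.run(command, check=True). Passing the arguments as a list avoids the shell quoting that the --keep expression would otherwise need.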

I am already familiar with the MapIt scripts, but I am tempted by Nominatim, as it would simplify my life by identifying hierarchies for me. Moreover, as Nominatim is PostGIS-based, this might reduce the post-processing to a set of SQL scripts plus data imports and exports. I will describe some Nominatim internals in my next post.


Accessing PostGIS from 64-bit Linux



When I need to explore GIS data from Linux, my first options are gvSIG and QGIS. However, if you try to access PostGIS tables stored on a 64-bit server, QGIS is unable to open them, as only 32-bit identifiers are supported (at least in version 1.7.5, the one included in Linux Mint 14 Nadia). You can recognise this problem by the following error message:

“There were no columns in the table that were suitable as a qgis key into the table (either a column with a unique index and type int4 or a PostgreSQL oid column. The unique index on column ‘xxxxx’ is unsuitable because Quantum GIS does not currently support non-int4 type columns as a key into the table.”

There is an open ticket for this issue that you can use to track progress.

In this case, gvSIG is your friend. Note, however, that installing gvSIG on 64-bit systems is a bit tricky, so I recommend following these steps:

  • Choose “All-included version (recommended)” from the gvSIG download section.
  • Run the installer and choose “Install a new Java Runtime Environment” (or use the suggested previous installation if this is not the first time you install gvSIG).
  • Install the 32-bit base libraries if they are not already installed. They are available in the package ia32-libs-multiarch on Linux Mint and Ubuntu systems, and ia32-libs on Debian.

gvSIG requires a 32-bit Java runtime environment, and these steps ensure that a 32-bit JRE is available and used by gvSIG.

The following error is a clear symptom of not having ia32-libs installed:

Caused by: java.lang.UnsatisfiedLinkError: /home/xxx/gvSIG/jre/1.6.0_20/lib/i386/xawt/ cannot open shared object file: No such file or directory

If you are using Linux Mint 14 Cinnamon, you might face an additional problem, as the gvSIG installer crashes the Cinnamon window manager. In this case, you can avoid the problem by temporarily switching your session to a different desktop environment (GNOME Classic, for instance). You can switch back to Cinnamon after installing gvSIG, as the problem only affects the installer.

First thoughts for a free administrative boundaries layer (I)



I am in the process of creating an administrative boundaries layer based on OpenStreetMap data. Ideally, it should contain the different OSM administrative levels and their hierarchy, include existing administrative codes (ISO, FIPS, NUTS in Europe, etc.) for each level, and be updated regularly.

I am aware this is not a simple task, so I think of it as a mid-term project, but I hope to have some preliminary results soon (which I will publish here as soon as they are available) and then keep improving on this basis little by little. I think such a layer would be very useful in a number of scenarios, for instance for scientific analysis involving administrative boundaries, especially in areas where such data is not publicly available. Moreover, it would probably attract collaborators to complete or correct the OSM boundaries where needed.

I have started analysing administrative boundary tagging in OSM, tools to export and process the data, potential problems, etc. I have also posted the idea on the [OSM-talk] mailing list and have received lots of advice and proposals.

Regarding tagging, the current situation is quite heterogeneous: some country boundaries are mapped on the coastline, while others are mapped on the territorial waters limit. Some countries try to map both, using a combination of tags which varies from country to country.

Regarding the hierarchy (for instance, which provinces are subareas of a country), it is not usually specified in the tagging, but it can be guessed (to some extent) from the geometric properties of the units.
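To make the idea of guessing the hierarchy from geometry a bit more concrete, here is a minimal sketch in plain Python. It assumes each unit is a single simple polygon with a known admin_level, and it picks as parent the deepest lower-level unit that contains a representative point of the child. The unit names, the coordinates and the vertex-average "centroid" are toy assumptions; real OSM boundaries are multipolygons with holes and would need a proper geometry engine such as PostGIS.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]        # simple polygon: vertices in order, not closed
Unit = Tuple[int, Ring]   # (admin_level, outer ring)


def contains(ring: Ring, point: Point) -> bool:
    """Ray-casting point-in-polygon test for a simple ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(ring: Ring) -> Point:
    """Vertex average: a crude representative point, fine for convex toy shapes."""
    n = len(ring)
    return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)


def guess_parents(units: Dict[str, Unit]) -> Dict[str, Optional[str]]:
    """Map each unit to the deepest lower-level unit containing its centroid."""
    parents: Dict[str, Optional[str]] = {}
    for name, (level, ring) in units.items():
        probe = centroid(ring)
        candidates = [
            (other_level, other_name)
            for other_name, (other_level, other_ring) in units.items()
            if other_level < level and contains(other_ring, probe)
        ]
        # The highest admin_level number among the enclosing units is the closest parent.
        parents[name] = max(candidates)[1] if candidates else None
    return parents


# Toy data: one country split into two provinces, with a town in the first one.
TOY_UNITS: Dict[str, Unit] = {
    "country": (2, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    "province_a": (4, [(0, 0), (5, 0), (5, 10), (0, 10)]),
    "province_b": (4, [(5, 0), (10, 0), (10, 10), (5, 10)]),
    "town": (8, [(1, 1), (2, 1), (2, 2), (1, 2)]),
}

print(guess_parents(TOY_UNITS))
```

Testing a single representative point keeps the sketch short, but it is exactly where the "to some extent" caveat applies: units that straddle a badly mapped parent boundary, or exclaves, can end up with the wrong parent.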

Existing third-party administrative codes (ISO, etc.) are not widely tagged in OSM data, so they should be added in some way, probably by creating a correspondence table that could also be used to ensure that there are no missing countries (or regions) in the export.
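Such a correspondence table could be as simple as a mapping from each external code to an OSM relation ID, checked against the set of relation IDs found in the export. The sketch below only illustrates that check; the function name is hypothetical and the relation IDs are made-up placeholders, not real OSM identifiers.

```python
from typing import Dict, List, Set, Tuple


def compare_with_reference(
    exported_ids: Set[int], code_to_relation: Dict[str, int]
) -> Tuple[List[str], List[int]]:
    """Compare an export against a correspondence table.

    Returns (codes whose relation is missing from the export,
             exported relations that have no code in the table).
    """
    missing_codes = sorted(
        code for code, relation in code_to_relation.items()
        if relation not in exported_ids
    )
    uncoded_relations = sorted(exported_ids - set(code_to_relation.values()))
    return missing_codes, uncoded_relations


# Placeholder data: the relation IDs below are invented for the example.
correspondence = {"ES": 1001, "FR": 1002, "PT": 1003}
exported = {1001, 1002, 2000}

missing, uncoded = compare_with_reference(exported, correspondence)
print("Codes missing from the export:", missing)
print("Exported relations without a code:", uncoded)
```

The first list points at countries or regions to add or fix in OSM, while the second one points at gaps in the correspondence table itself.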

I will go deeper into these ideas in the following weeks, so keep reading!

Querying and exporting OSM data


OpenStreetMap has become an impressive geographical database which can be considered “the Wikipedia of maps”. OSM opens a very interesting door for the derivation of specialized layers (not only roads but also railways, shops, protected areas, etc.) which can be used for spatial analysis, filling an important gap in the Open Data field.

Until recently, export options were rather limited, mainly involving XAPI (a read-only web interface for OSM data) or creating a kind of replica of the OSM database in PostGIS. Fortunately, more options are available now, such as the simpler yet powerful Overpass API, which together with the Overpass Turbo service is lowering the entry barrier for exporting data.
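As an illustration of how low that entry barrier is, the following sketch builds an Overpass QL query and the corresponding request URL in Python, without sending it. The choice of administrative boundary relations as the example, the function names and the timeout value are my own assumptions; the endpoint shown is one public Overpass instance, and any other instance could be substituted.

```python
from urllib.parse import urlencode

# One public Overpass instance; replace it with your own if you run a local replica.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def build_admin_query(admin_level: int, timeout: int = 60) -> str:
    """Overpass QL query listing the tags of boundary relations at one admin level."""
    return (
        f"[out:json][timeout:{timeout}];"
        f'relation["boundary"="administrative"]["admin_level"="{admin_level}"];'
        "out tags;"
    )


def build_request_url(query: str) -> str:
    """URL for a GET request carrying the query in the 'data' parameter."""
    return OVERPASS_ENDPOINT + "?" + urlencode({"data": query})


query = build_admin_query(2)
print(query)
print(build_request_url(query))
```

The same query text can be pasted into Overpass Turbo to try it interactively before scripting anything.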

However, if you need to export a very large amount of data (such as the roads of a whole continent), you will still need to set up a local replica of the OSM data, otherwise you will be banned by the online services. There are several options for building that replica, each best suited to a different purpose, as summarized in the following table:

Schema name  | Created with | Used by           | Primary use case  | Geometries (PostGIS)? | Database
-------------+--------------+-------------------+-------------------+-----------------------+------------------
osm2pgsql    | osm2pgsql    | Mapnik, Kothic JS | Rendering         | Yes                   | PostgreSQL
apidb        | osmosis      | API               | Mirroring         | No                    | PostgreSQL, MySQL
pgsnapshot   | osmosis      | jXAPI             | Analysis          | optionally            | PostgreSQL
imposm       | Imposm       |                   | Rendering         | Yes                   | PostgreSQL
nominatim    | osm2pgsql    | Nominatim         | Search, Geocoding | Yes                   | PostgreSQL
osmsharp     | OsmSharp     |                   | Routing           | No                    | Oracle
overpass     | Overpass API |                   | Analysis          | ?                     | custom
mongosm      | MongOSM      |                   | Analysis          | ?                     | MongoDB
node-mongosm | Mongoosejs   |                   | Analysis          | Yes                   | MongoDB

The original, complete table can be found in the OSM Wiki. Carefully consider the features offered by each option (and its space requirements) before selecting any of them.

The ones I am familiar with are the pgsnapshot schema and the overpass schema. The pgsnapshot schema is a very good option if you prefer to query and post-process your data using SQL/PostGIS predicates. The full OSM database uses about 400 GB when imported into this schema. You can find the installation instructions in this tutorial. The overpass schema is suitable if you prefer to filter OSM data using a simple, OSM-centric XML query language. In this case, the full planet requires about 150 GB (50 GB if excluding updates and metadata), and installation is quite easy if you are familiar with a Linux environment (a similar tutorial is available on the website).