Roll-your-own geocoding with OpenStreetMap Nominatim on Amazon EC2

Sometimes you need to geocode a few addresses, and while Google is obviously the gold standard, the Google Maps API conditions are quite strict - you are only supposed to geocode addresses you will be displaying in conjunction with a Google map. That’s no use for bulk / backend geocoding, the kind you might do for analysis purposes.

It’s not as accurate, but the OpenStreetMap project has a fairly serviceable global geocoder subproject called Nominatim. You can read the API docs here. Note that the official OSM Nominatim site also has a fairly restrictive usage policy (summed up as ‘No heavy uses’ but effectively no parallel requests). The next step up is to use the Mapquest instance of Nominatim. It seems like you can in principle be a heavy user of this service, but in that case they have to approve your request. I didn’t try this, so I don’t know what their terms or restrictions might be. In any case, geocoding across the internet to a public server incurs a degree of latency, which may not be desirable if you have a really large number of addresses to code.
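As a rough illustration of what a polite, strictly sequential client might look like, here is a minimal Python sketch. The endpoint shown is the public OSM one, and `q`, `format` and `limit` are standard search parameters; the `USER_AGENT` string, the function names and the one-second pause are my own assumptions for illustration, so check the current usage policy before relying on them.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the public OSM endpoint; swap in your own instance's URL.
BASE_URL = "https://nominatim.openstreetmap.org/search"
# Hypothetical identifier; replace with something that identifies your application.
USER_AGENT = "example-geocoder/0.1 (you@example.com)"


def build_search_url(address, base_url=BASE_URL):
    """Return a free-form search URL asking for at most one JSON result."""
    params = {"q": address, "format": "json", "limit": 1}
    return base_url + "?" + urllib.parse.urlencode(params)


def geocode(address, base_url=BASE_URL):
    """Return (lat, lon) as floats, or None if nothing matched."""
    request = urllib.request.Request(
        build_search_url(address, base_url),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode_all(addresses, pause_seconds=1.0):
    """Geocode one address at a time, sleeping between requests."""
    coded = {}
    for address in addresses:
        coded[address] = geocode(address)
        time.sleep(pause_seconds)  # assumed pause; never fire requests in parallel here
    return coded
```

At one request per second, a million addresses would take well over eleven days, which is the practical reason a public server does not suit bulk work.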

In my case, I have potentially millions of addresses. What’s more, many of them are extremely low-quality, and in my experience Nominatim does not handle low-quality addresses well at all (unlike - sigh - Google). To deal with this I use a pre-coding stage to attempt to guess the 5-10 most likely variants of the address data to geocode. But that means I’m geocoding 10 million+ addresses, which might stretch even Mapquest’s generous free service.
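To give a flavour of what a pre-coding stage could look like, here is a minimal Python sketch. It is not the actual procedure I used: the rule shown (progressively dropping leading components such as a flat or building name) and the names `address_variants` and `first_hit` are purely illustrative assumptions.

```python
def address_variants(raw, max_variants=10):
    """Return progressively simpler versions of a comma-separated address."""
    # Tidy up whitespace and discard empty components.
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    variants = []

    def add(candidate_parts):
        candidate = ", ".join(candidate_parts)
        if candidate and candidate not in variants:
            variants.append(candidate)

    add(parts)  # the cleaned-up address as given
    # Drop leading components one at a time, always keeping at least two.
    for start in range(1, len(parts) - 1):
        add(parts[start:])
    return variants[:max_variants]


def first_hit(raw, geocode):
    """Try each variant in turn; return (variant, result) for the first match."""
    for variant in address_variants(raw):
        result = geocode(variant)  # geocode is any callable returning None on a miss
        if result is not None:
            return variant, result
    return None
```

For example, `address_variants("Flat 2,  10 High St, , Oxford, UK")` yields the full cleaned address, then `10 High St, Oxford, UK`, then `Oxford, UK`. A real pre-coder would need rules tuned to the data at hand (abbreviations, misspellings, reordered fields), and each extra variant multiplies the number of lookups.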

The great advantage of OSM, of course, is that it’s an open project, so you can, if you wish or if you need to, replicate the entire thing on your own hardware. Which, of course, means Amazon’s hardware.

I was slightly surprised to find there is no pre-existing AMI with OSM/Nominatim pre-installed, so I had to install from scratch. The instructions are quite complete, but not specific to Amazon Linux and the EC2 environment, so I had to do quite a bit of adapting and trial-and-error to get things to work. All in all it took about 7 days of runtime, which on the EC2 machine I used (r3.2xlarge - $0.70/hr) cost about $120. Whether that’s a small or large upfront cost depends on the project, but once you’ve done that you can geocode to your heart’s content for 70 cents per hour. In fact, after installation I actually downgraded my instance to r3.xlarge ($0.35/hr) with no performance degradation, so there’s probably scope to do this even more cheaply.
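The arithmetic behind those figures is simple enough to check in a few lines of Python. The rates and the 7-day runtime are the ones quoted above; the variable names are just for illustration, and the daily running cost assumes the instance is left on around the clock.

```python
HOURS_PER_DAY = 24

install_days = 7       # approximate runtime quoted above
install_rate = 0.70    # r3.2xlarge, USD per hour
running_rate = 0.35    # r3.xlarge, USD per hour

# One-off cost of the import, and the ongoing cost of the smaller instance.
install_cost = install_days * HOURS_PER_DAY * install_rate
daily_running_cost = HOURS_PER_DAY * running_rate

print(f"Install: about ${install_cost:.2f}")
print(f"Running: about ${daily_running_cost:.2f} per day")
```

That works out to a little under $118 for the install, which matches the "about $120" figure, and a bit over $8 a day to keep the smaller instance running.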

Anyway, in case you want to try this yourself, I kept a reasonably complete (but probably not perfect, let me know any corrections) log of the install, which you can find in this gist:

https://gist.github.com/econandrew/c057b4575d9d17175861

Comments

My blog no longer supports comments, but I have preserved comments on older posts like this one.

Hi, This is amazing, but before I try to follow your directions I have a question. You did not happen to create an EC2 image from this did you? It would be fantastic to just boot this up and run it at cost, rather than having to reinstall.

Reverse Geocoder

Unfortunately I don’t have one handy any more - but it’s a good idea. The other thing worth remembering is if you don’t need worldwide geocoding you can use a part of the OSM map and it will build much quicker.

Andrew Whitby

Andrew, Is there a reason you install postgresql 9.3? I have run into difficulty installing 9.3 on amazon-linux (it looks like there is confusion if RHEL6 or 7 repos should be used) and am contemplating installing 9.4 but don’t want to run into issues down the road.

Luke Spencer

I suspect it was just the current version at the time. Perhaps check the latest guidance from the OSM wiki.

Andrew Whitby

I will try to dig up their recommendations, but wanted to leave a note here for anyone trying to implement these instructions: starting with 9.3, PostgreSQL has added a repo for amazon-linux; find the build that you want here: http://yum.postgresql.org/repopackages.php

It’s also worth noting that changing the repo can change the names used in later steps (e.g. postgresql-9.3 changed to postgresql93 in my case).

Luke Spencer

Hi,

fantastic tutorial, well done.

For those who don’t want to go through all the effort, though, may I suggest the OpenCage Geocoder: https://geocoder.opencagedata.com which also uses OpenStreetMap data (along with other open data sources) via Nominatim.

We provide a simple, well-documented API for forward and reverse geocoding. There are libraries for all common programming languages, and a free testing level or affordable paid plans if you need more. We do a bit more than Nominatim in that we provide well-formatted addresses and annotations (things like timezone, etc.).

But we love nominatim too, we regularly contribute new features and improvements.

Ed

I have a problem importing Luxembourg into my PostgreSQL…

My OS: CentOS release 6.7 (Final)

My version of Postgres: PostgreSQL 9.3.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

Version of Nominatim: nominatim 2.3.0

Problem:

When I try to import a small .pbf file, I get an error executing the command “createdb -E UTF-8 -p 5432 nominatim”: it says that the database “nominatim” already exists…

[code]
bash-4.1$ ./utils/setup.php --osm-file ../luxembourg-latest.osm.pbf --all --osm2pgsql-cache 1024
Create DB
createdb: database creation failed: ERROR: database "nominatim" already exists
ERROR: Error executing external command: createdb -E UTF-8 -p 5432 nominatim
Error executing external command: createdb -E UTF-8 -p 5432 nominatim
[/code]

I’m desperate and don’t know what to do. Could someone please explain…

Marco Antonio Siqueiros

Hard to say for sure - maybe try “dropdb nominatim” since apparently the database already exists.

Andrew Whitby

What’s the process to bulk geo-code using Nominatim? Can this only be achieved by firing a bunch of parallel requests?

Tony

As far as I know, yes - although my memory is pretty vague in this respect.

Andrew Whitby
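For anyone wondering what "a bunch of parallel requests" might look like in practice, here is a minimal Python sketch using a thread pool. It is only suitable for an instance you run yourself, not the public servers. The names `geocode_parallel` and `geocode_one` and the default of 8 workers are illustrative assumptions; `geocode_one` stands for whatever function sends a single request to your own server.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode_parallel(addresses, geocode_one, workers=8):
    """Apply geocode_one to every address using a pool of worker threads.

    Results come back in the same order as the input addresses.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(geocode_one, addresses))


if __name__ == "__main__":
    # Stand-in for a real lookup, so the sketch runs without a server.
    def fake_lookup(address):
        return address.upper()

    print(geocode_parallel(["oxford, uk", "paris, france"], fake_lookup, workers=2))
```

Threads are a reasonable fit here because each worker spends most of its time waiting on the network; the right number of workers depends on how much load your instance can take.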

Have you created an AMI from your original installation and noticed it has performance issues unlike the original? I get errors when doing large generic searches like wal mart which I do not get on the original server.

Bk

Hi, Great post, it is very useful! Having trouble finding Postgres 9.3, I moved to 9.6 and to PostGIS 2.3. Now the map is loading properly, but a URL like {domain}/nominatim/reverse.php?format=json&lat=29.9&lon=-90.1&debug=1 gives: ERROR: relation “placex” does not exist. This was an issue several years ago, when osm2pgsql was not updated. Updating it did not help. Do you have any idea what else could be the reason?

Zsolt

Unfortunately I haven’t returned to Nominatim since this post, so I have no idea!

Andrew Whitby