Wed, 17 Aug 2011

Adventures in Geocoding

— SjG @ 5:34 pm

An iPhone locator app I’ve been building had a weird thing happen: if you denied it access to Location Services and entered a valid ZIP code, it would work, but an invalid ZIP code would always home in on Lancaster, Pennsylvania. At first, I thought it was a bug in how I was sending coordinates to MKMapView, but I was quickly able to confirm that the problem originated in my server-side geolocation service.

My server-side geolocation service uses Google’s deprecated Geocoding API version 2. The problem arose from sloppy coding on my side, coupled with Google’s map intelligence and the way the v2 API reports its results. Here’s how:

My code would assemble an address string from street number, city, state, and ZIP code (if provided). In this particular case, however, it was only receiving a ZIP code. But my code was crappy, and the address string it assembled looked like “null, null, null 90066”. If the ZIP code was legit (like 90066), Google’s geolocator is smart enough to figure out that it’s a ZIP code, ignore the “nulls,” and do the right thing. But the interesting thing happens when the ZIP code is not legit. Google’s algorithm evidently tries its best to match up the provided address with something you might be looking for. Perhaps due to previous searches with the API key we were using, perhaps for other unknown reasons, those “nulls” were matched to Noll Airport, East Hempfield, Lancaster, PA 17603.
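The “null, null, null 90066” string is what you get when missing fields are stringified instead of skipped. As a rough illustration only (this is a Python sketch, not the service's actual code, and `build_address` and its parameter names are made up for the example), an assembler that joins only the parts it was actually given might look like this:

```python
def build_address(street=None, city=None, state=None, zip_code=None):
    """Join only the address parts that were actually provided.

    Hypothetical helper: the output format "street, city, state ZIP"
    mirrors the string described in the post.
    """
    # Drop parts that are None, empty, or whitespace-only.
    present = [p.strip() for p in (street, city, state) if p and p.strip()]
    address = ", ".join(present)
    if zip_code and zip_code.strip():
        # The ZIP follows the last part after a space, or stands alone.
        address = f"{address} {zip_code.strip()}".strip()
    return address


if __name__ == "__main__":
    print(build_address(zip_code="90066"))
    print(build_address("123 Main St", "Los Angeles", "CA", "90066"))
```

With only a ZIP code supplied, the geocoder receives just the ZIP and no stray “null” tokens to match against place names.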

Interestingly, the Geo Address Accuracy value returned for that specific match is 9, or “Premise level accuracy,” the highest level of accuracy. Again, my crappy code had assumed that the Geo Address Accuracy was a confidence factor, not a resolution indicator. So to my code that guess looked like a really good fix on the location when, in fact, it was no better than any other guess.

Revising the code to leave out the “nulls” produced another interesting result. Again, legitimate ZIP codes were found right off. Bad ZIP codes ended up being matched against other numeric codes, so, for example, “90000” ended up matching Belfort, France, and “91000” ended up matching Tawau, Malaysia. These results all come back with a Geo Address Accuracy of 5, or ZIP-code level.
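Since the accuracy value describes resolution rather than confidence, one possible guard is to check the returned match against what was asked for instead of trusting the accuracy number. The Python sketch below assumes the geocoder response has already been reduced to a simple dict; the keys `country_code` and `postal_code` are hypothetical names for this example, not the v2 API's actual response layout.

```python
def is_plausible_zip_match(requested_zip, result):
    """Return True only if the match is a US result for the requested ZIP.

    `result` is a hypothetical, simplified dict such as
    {"country_code": "US", "postal_code": "90066", "accuracy": 5}.
    The accuracy value is deliberately ignored: it says how fine-grained
    the match is, not how likely it is to be the right place.
    """
    return (
        result.get("country_code") == "US"
        and result.get("postal_code") == requested_zip
    )


if __name__ == "__main__":
    belfort = {"country_code": "FR", "postal_code": "90000", "accuracy": 5}
    print(is_plausible_zip_match("90000", belfort))
```

Under these assumptions the Belfort match for “90000” is rejected because it is not a US result, even though its accuracy is the same 5 that a genuine ZIP match reports.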

One solution to this problem is to validate ZIP codes before submission to Google. Another solution would be to upgrade to the v3 API, where there’s more information about what’s going on.
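For the first option, here is a minimal Python sketch of pre-submission validation. A format check alone is not enough, because “90000” looks like a perfectly good ZIP code, so the sketch also checks membership in a set of known ZIP codes. `KNOWN_ZIPS` is a tiny placeholder; a real service would have to load the full list from some ZIP code database, which is assumed here rather than shown.

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

# Placeholder sample only; a real list would come from a ZIP code database.
KNOWN_ZIPS = {"90066", "17603"}


def is_valid_zip(zip_code, known_zips=KNOWN_ZIPS):
    """Check the format first, then look up the five-digit ZIP in a known set."""
    candidate = zip_code.strip()
    if not ZIP_PATTERN.fullmatch(candidate):
        return False
    return candidate[:5] in known_zips


if __name__ == "__main__":
    for z in ("90066", "90000", "9006"):
        print(z, is_valid_zip(z))
```

Only ZIP codes that pass both checks would be sent on to the geocoder; anything else can be rejected with an error before Google ever gets a chance to guess.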
