Saturday, May 17, 2008

Weekend listen: The Giant Pool of Money

It's an hour long, off topic, and I'm behind on work. But it's worth a listen if you're a non-finance-whiz, and interested in understanding the housing crisis:

http://www.thisamericanlife.org/Radio_Episode.aspx?episode=355

("Free download" link is on the left side for a full mp3.)

For a geospatial take, the NY Fed has been putting together some very nice mortgage maps with quite recent data.

Wednesday, May 7, 2008

On sharing geospatial data

Blogger James Fee put up a post this Monday that's generated a lot of interesting discussion.

Whenever geospatial data sharing comes up there's a pretty strong tendency to gloss over the fact that "sharing data" is actually a two-parter:
  • There is the sort of sharing you do with co-workers or associates: getting your data to them in, to borrow a phrase from the GPL, the "preferred form of the work for making modifications to it."
  • Then there is publishing GIS data (internally or externally) once it's "finalized."
As everyone (or at least everyone commenting on James' article) recognizes, the current state of the art sucks. We are lost in a maze of twisty file formats, standards, and protocols, none alike; even the simple task of emailing a shapefile to a co-worker is often the cause of bitter enmity between GIS and IT technicians.

Status quo for simple sharing: Zip your data up, making sure not to leave off any of the many files it is spread across, put a password on the zip file (the attachment will be stripped if it contains a personal geodatabase mdb visible to the spam filter), change the extension to something not '.zip', attach and pray. Gosh forbid you are sending anything larger than a few megabytes, or working with whole map documents at once. (Did you remember to include every related layer in that zip file with relative paths in the mxd?)
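
The "don't leave off any of the many files" step is at least easy to script. Here is a minimal Python sketch, assuming a plain shapefile on disk; the function name `bundle_shapefile` and the list of sidecar extensions are my own choices for illustration, and the list is not exhaustive. It does nothing about passwords, renamed extensions, or mxd paths.

```python
import zipfile
from pathlib import Path

# Common shapefile sidecar extensions (an assumption: not an exhaustive list).
SIDECAR_SUFFIXES = (".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".shp.xml")
# The three parts a shapefile cannot be read without.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def bundle_shapefile(shp_path, zip_path):
    """Zip a shapefile with whichever sidecar files exist next to it.

    Returns the list of file names written to the archive.
    Raises FileNotFoundError if a required part is missing.
    """
    base = Path(shp_path).with_suffix("")  # data/roads.shp -> data/roads

    # Check before creating the archive, so a failure leaves no half-built zip.
    missing = [s for s in REQUIRED_SUFFIXES if not Path(str(base) + s).exists()]
    if missing:
        raise FileNotFoundError(f"{base.name} is missing: {', '.join(missing)}")

    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for suffix in SIDECAR_SUFFIXES:
            part = Path(str(base) + suffix)
            if part.exists():
                # Store only the file name, so the archive has no directory paths.
                zf.write(part, arcname=part.name)
                added.append(part.name)
    return added
```

Calling `bundle_shapefile("data/roads.shp", "roads.zip")` (hypothetical paths) returns the names it packed. Note that the extension match is exact, so an upper-case `.SHP` on a case-sensitive filesystem would need extra handling.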

As for map data publishing, there is an embarrassment of options, none of them really mature (or rather, dominant) yet. Most require some programming to set up the publisher side, and on the consumer side you'll need to jump through some awkward conversion hoops as soon as you want to do something more involved than look at a pretty picture in IE or Google Earth. If you are like me, and you prefer or need to get your hands dirty, you breathe a deep sigh of relief when there are plain old shapefiles or GeoTiffs available for download over HTTP.

Then there's discovering published data in the first place. With a few exceptions, if you don't know in advance that it exists and have a good idea of where to look for it, it might as well not exist.


I like the idea of a SQLite data format, at least as a "preferred working format" for vector data. As a full relational database in a file, it supports the rich data model that was shoehorned into GML, but in a format designed especially for working with relational data, and with a full set of mature tools and libraries. And it doesn't have the drawbacks of being flaky and proprietary the way personal geodatabases are. What would sweeten the pot for its use as a working data format, besides vendor support in the tools everyone uses? Basic versioning or a 'track changes' equivalent. Way, way better would be supporting distributed branches and merging, the way we programmers get to treat source code!
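
To make the 'track changes' idea concrete, here is a rough sketch using Python's built-in sqlite3 module and ordinary SQLite triggers. Everything in it is an assumption for illustration: the `parks` table, the `parks_history` table, and storing geometry as WKT text are hypothetical choices, not an existing format or standard. It only records the old values of edited and deleted rows; it is nowhere near branching and merging.

```python
import sqlite3


def open_tracked_layer(path=":memory:"):
    """Open (or create) a hypothetical 'parks' layer with change tracking."""
    conn = sqlite3.connect(path)
    conn.executescript("""
    -- The working layer; geometry kept as WKT text for simplicity.
    CREATE TABLE IF NOT EXISTS parks (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        geom_wkt TEXT NOT NULL
    );

    -- One row per edit, holding the values the row had before the change.
    CREATE TABLE IF NOT EXISTS parks_history (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        park_id INTEGER NOT NULL,
        action TEXT NOT NULL,
        old_name TEXT,
        old_geom_wkt TEXT,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER IF NOT EXISTS parks_track_update
    AFTER UPDATE ON parks
    BEGIN
        INSERT INTO parks_history (park_id, action, old_name, old_geom_wkt)
        VALUES (OLD.id, 'update', OLD.name, OLD.geom_wkt);
    END;

    CREATE TRIGGER IF NOT EXISTS parks_track_delete
    AFTER DELETE ON parks
    BEGIN
        INSERT INTO parks_history (park_id, action, old_name, old_geom_wkt)
        VALUES (OLD.id, 'delete', OLD.name, OLD.geom_wkt);
    END;
    """)
    return conn
```

Because the triggers live inside the database file, any tool that edits `parks` through SQLite would leave a trail in `parks_history` without having to know about it. That is the appeal of doing this at the file-format level rather than in each application.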

(Consider if the folks working on the WDPA had the GIS equivalent of git or darcs to do their jobs: branches for published versions where error corrections can still be made; the ability to accept or reject edits via email from anyone; local experts with their own working copies; larger datasets that contain other information on protected areas from other sources, but which can still automatically merge in the latest WDPA changes...)

That is the direction I think things will eventually move, but it might take a very long time. Don't hold your breath.

Footnote: I've put a reasonable amount of thought into these topics; without saying too much, they are extremely relevant to the product I am building. (ETA: inside of a couple of months!) (It's not decentralized version control for GIS, either: alack and alas.)