Rex Sorgatz has posted An Interview with Adrian Holovaty. Adrian created ChicagoCrime.org (which now redirects to his new venture, EveryBlock) back when we had to use hacks rather than an API to embed Google Maps, such as the Google Maps Standalone method.

Adrian talks about some of the challenges at EveryBlock, which definitely rang a bell with me. Here are a few interesting passages that developers in the local space, and aggregators of data in general, may be able to relate to:


One of our post-launch priorities is to clean up the fire-hose of raw information, to introduce concepts of priority and improved relevance — but I do think there’s a certain appeal to that raw dump of “here’s everything that’s happened around this address, in simple, reverse-chronological order.” When significant events happen, they sort of “POP out” of the list.
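The "everything around this address, newest first" view is easy to picture in code. Below is a minimal sketch, not EveryBlock's implementation: the `Item` record, the 400 m default radius and the function names are all hypothetical. It keeps items within a great-circle (haversine) distance of a point and sorts them in reverse-chronological order.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class Item:
    # Hypothetical record: one event (crime, inspection, article, ...) at a point.
    title: str
    item_date: date
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def around_address(items, lat, lon, radius_m=400):
    """Every item within radius_m of (lat, lon), newest first (the raw 'fire-hose' view)."""
    nearby = [it for it in items if haversine_m(lat, lon, it.lat, it.lon) <= radius_m]
    return sorted(nearby, key=lambda it: it.item_date, reverse=True)
```

The unranked date sort is the point here: relevance and priority, which Adrian mentions as post-launch work, would replace the `key` function rather than the filter.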

The first layer is the army of scripts that compile data from all over the Web. This includes public APIs, private APIs, screen-scraping the “deep Web,” crawling news sites, plus harvesting data from PDFs and other non-Web-friendly documents. Some data also comes to us manually, like in spreadsheets e-mailed to us on a weekly basis. For each bit of data, we determine geographic relevance and normalize it so that it fits into our system.
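To make "normalize it so that it fits into our system" concrete, here is a minimal sketch under stated assumptions: two hypothetical sources (a crime feed and an e-mailed inspection spreadsheet) with made-up field names, each mapped by its own normalizer onto one common record shape. Rows without a usable date or location are skipped, since they cannot be placed geographically.

```python
from datetime import datetime


def normalize_crime(raw):
    """Hypothetical source A: fields 'offense', 'occurred' (YYYY-MM-DD), 'lat', 'lng'."""
    return {
        "schema": "crime",
        "item_date": datetime.strptime(raw["occurred"], "%Y-%m-%d").date(),
        "lat": float(raw["lat"]),
        "lon": float(raw["lng"]),
        "attributes": {"crime_type": raw["offense"].strip().lower()},
    }


def normalize_inspection(raw):
    """Hypothetical source B: spreadsheet row, US-style dates, ';'-separated violations."""
    return {
        "schema": "restaurant_inspection",
        "item_date": datetime.strptime(raw["Inspection Date"], "%m/%d/%Y").date(),
        "lat": float(raw["Latitude"]),
        "lon": float(raw["Longitude"]),
        "attributes": {
            "violations": [v.strip() for v in raw["Violations"].split(";") if v.strip()]
        },
    }


NORMALIZERS = {"crime_feed": normalize_crime, "inspection_sheet": normalize_inspection}


def ingest(source, rows):
    """Normalize every row from one source; an unknown source raises KeyError."""
    normalizer = NORMALIZERS[source]
    out = []
    for raw in rows:
        try:
            out.append(normalizer(raw))
        except (KeyError, ValueError):
            # Missing field or unparseable date/coordinates: no geographic
            # relevance can be determined, so the row is dropped.
            continue
    return out
```

One normalizer per source keeps the scraping side messy and isolated while everything downstream sees a single record shape.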

The second layer is the data storage layer, which we built in a way that can handle an arbitrary number of data types, each with arbitrary attributes. For example, a restaurant inspection has a violation (or multiple violations), whereas a crime has a crime type (e.g., homicide). Of course, we want to be able to query across that whole database to get a geographic “slice,” so there’s a strong geo focus baked into everything.
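One common way to store "an arbitrary number of data types, each with arbitrary attributes" while still querying everything by location is an entity-attribute-value layout. The sketch below is an assumption, not EveryBlock's actual schema: the table and function names are hypothetical, it uses an in-memory SQLite database, and a plain B-tree index on (lat, lon) stands in for the spatial index a real deployment would likely want.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item_type (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
        CREATE TABLE newsitem (
            id INTEGER PRIMARY KEY,
            type_id INTEGER NOT NULL REFERENCES item_type(id),
            item_date TEXT NOT NULL,  -- ISO 8601, so text order equals date order
            lat REAL NOT NULL,
            lon REAL NOT NULL
        );
        -- One row per attribute value; list-valued attributes get several rows.
        CREATE TABLE attribute (
            item_id INTEGER NOT NULL REFERENCES newsitem(id),
            name TEXT NOT NULL,
            value TEXT NOT NULL
        );
        CREATE INDEX newsitem_lat_lon ON newsitem (lat, lon);
    """)
    return db


def add_item(db, slug, item_date, lat, lon, **attrs):
    """Store one item of any type; each keyword argument becomes an attribute."""
    db.execute("INSERT OR IGNORE INTO item_type (slug) VALUES (?)", (slug,))
    (type_id,) = db.execute("SELECT id FROM item_type WHERE slug = ?", (slug,)).fetchone()
    cur = db.execute(
        "INSERT INTO newsitem (type_id, item_date, lat, lon) VALUES (?, ?, ?, ?)",
        (type_id, item_date, lat, lon),
    )
    item_id = cur.lastrowid
    for name, value in attrs.items():
        values = value if isinstance(value, list) else [value]
        db.executemany(
            "INSERT INTO attribute (item_id, name, value) VALUES (?, ?, ?)",
            [(item_id, name, str(v)) for v in values],
        )
    return item_id


def attrs_for(db, item_id):
    """Collect an item's attributes as {name: [values...]} in insertion order."""
    attrs = {}
    rows = db.execute(
        "SELECT name, value FROM attribute WHERE item_id = ? ORDER BY rowid", (item_id,)
    )
    for name, value in rows:
        attrs.setdefault(name, []).append(value)
    return attrs


def geo_slice(db, south, west, north, east):
    """Every item of every type inside the bounding box, newest first."""
    rows = db.execute("""
        SELECT n.id, t.slug, n.item_date
        FROM newsitem n JOIN item_type t ON t.id = n.type_id
        WHERE n.lat BETWEEN ? AND ? AND n.lon BETWEEN ? AND ?
        ORDER BY n.item_date DESC, n.id DESC
    """, (south, north, west, east)).fetchall()
    return [(slug, item_date, attrs_for(db, item_id)) for item_id, slug, item_date in rows]
```

Adding a new data type needs no schema change, which is the appeal of this layout; the cost is that filtering on a specific attribute means extra joins against `attribute`.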

The user interface was, and continues to be, a challenge. How do you display so many disparate pieces of data together, without overwhelming people?

Dealing with structured data is relatively easy, but attempting to determine structure from unstructured data is a challenge. The main example of unstructured data parsing is our geocoding of news articles. We do a pretty good job here, but we’re not crawling all of the sources we want to crawl — again, there’s a lot of room to grow.
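Geocoding news articles starts with pulling location-like strings out of free text. The interview gives no details of EveryBlock's method, so what follows is only a deliberately naive sketch: a regular expression (an assumption, covering a handful of street suffixes) finds phrases such as "1200 block of North Clark Street", normalizes them to a (block, street) key, and looks that key up in a hypothetical block-level gazetteer. The coordinates below are made-up illustrative values.

```python
import re

SUFFIX = {"street": "st", "st": "st", "avenue": "ave", "ave": "ave",
          "boulevard": "blvd", "blvd": "blvd", "road": "rd", "rd": "rd"}
DIRECTION = {"north": "n", "n": "n", "south": "s", "s": "s",
             "east": "e", "e": "e", "west": "w", "w": "w"}

# House number, optional "block of", then capitalized words ending in a known suffix.
ADDRESS_RE = re.compile(
    r"\b(?P<num>\d{1,5})(?:\s+block\s+of)?\s+"
    r"(?P<street>(?:[A-Z][a-z]*\.?\s+)+?(?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd))\b"
)

# Hypothetical gazetteer: (block, normalized street) -> (lat, lon), illustrative values.
GAZETTEER = {(1200, "n clark st"): (41.90, -87.63)}


def normalize(num, street):
    """Map '1234', 'N. Clark St' and '1200', 'North Clark Street' to the same key."""
    tokens = [t.rstrip(".").lower() for t in street.split()]
    tokens = [DIRECTION.get(t, t) for t in tokens[:-1]] + [SUFFIX[tokens[-1]]]
    return int(num) // 100 * 100, " ".join(tokens)


def geocode_article(text, gazetteer):
    """Return (matched text, (lat, lon)) for every address the gazetteer knows."""
    hits = []
    for m in ADDRESS_RE.finditer(text):
        key = normalize(m.group("num"), m.group("street"))
        if key in gazetteer:
            hits.append((m.group(0), gazetteer[key]))
    return hits
```

A pattern like this misses intersections, landmarks and neighborhood names entirely, which is one reason parsing unstructured text stays much harder than handling structured feeds.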

On a completely different note, it’s been a challenge to acquire data from governments. We (namely Dan, our People Person) have been working since July to request formal data feeds from various agencies, and we’ve run into many roadblocks there, from the political to the technical. We expected that, of course, but the expectation doesn’t make it any less of a challenge.

Rather than use Google Maps or Microsoft’s Virtual Earth, you built your own mapping service application. Why?

That, along with “When will you bring EveryBlock to city XXX?”, is by far the most frequently asked question we get. Paul, our developer in charge of maps, is working on an article explaining our reasoning, so I don’t want to steal his thunder. I’ll just say that the existing free maps APIs are optimized for driving directions and wayfinding, not for data visualization. And, besides, having non-clichéd maps is an easy way to set yourself apart. Google Maps is so 2005. ;-)

We use an open-source library called Mapnik to render the maps, so that library does the heavy lifting for us. Paul is also working on a how-to article, in the spirit of giving back to the open-source community, that explains how to use Mapnik.

I strongly suspect we’ll have an API eventually.

[via kottke]
