When we first started Ontodia, we were really keen on applying pragmatic semantic web technologies to open data. “Onto” comes from ontology, and “dia”, from encyclopedia. We thought (and still do) that “Open Data is the killer app of Linked Data and vice-versa.”
NYCpedia was actually our third attempt at doing that (NYCFacets was our second, and NYCDataWeb, the first). And on our third attempt, we found what we thought was a practical way to link open data, and that was with LOCATION.
Location, Location, Location!
It’s the mantra of every real estate agent. You may have the best property, but if it’s not in the right place, it doesn’t matter! With just a property’s address, a real estate agent can find out a lot of information culled from hundreds of data sources. What are the latest sales in the area? Is it near entertainment or shopping? In the right school district? Accessible to public transportation? Low crime rate? Etc., etc.
We found that this also applied to Open Data! When we first attempted to link data through identifiers, we found that it was a Sisyphean task! The datasets were simply too “dirty” and raw to link across data sources to create true linked data. And what can a small startup like ours do to minimize, never mind eliminate, the endless data-wrangling that is behind every Big Data project!?
But as we learned with NYCpedia, just enough context is good enough, and location is the ultimate link across disparate data sources! I don’t need to know that Building ID Number 1009745 is tax block 579, lot 70, which is 137 Varick Street in Manhattan, NY.
All I need to know is the address, and with it, I can start linking data across systems: NYC’s Building Identification System, property tax information from its Department of Finance, job listings from Indeed, 311 complaints, Foursquare check-ins, Flickr pictures taken in the area, Zillow property valuations, companies in Crunchbase, etc., etc.
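To make the idea concrete, here is a minimal sketch of linking records by address rather than by system-specific identifiers. It is not how NYCpedia is implemented; the function names, the tiny abbreviation table and the source names are all made up for illustration, and the complaint record is a placeholder rather than real data. Only the building number, tax block and lot, and street address come from the example above.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real geocoder handles far more cases.
_ABBREVIATIONS = {"st": "street", "ave": "avenue", "blvd": "boulevard"}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(_ABBREVIATIONS.get(token, token) for token in tokens)


def link_by_address(sources: dict) -> dict:
    """Group records from every source under their normalized address."""
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = normalize_address(record["address"])
            linked[key].setdefault(source_name, []).append(record)
    return dict(linked)


# Each source spells the address differently and uses its own identifier.
sources = {
    "buildings": [{"address": "137 Varick Street", "bin": 1009745}],
    "tax_lots": [{"address": "137 Varick St.", "block": 579, "lot": 70}],
    "complaints": [{"address": "137 VARICK ST", "type": "placeholder"}],
}

linked = link_by_address(sources)
for address, by_source in linked.items():
    print(address, "->", sorted(by_source))
```

In practice the normalization step would be replaced by a proper geocoder, so that records can also be linked by coordinates when the addresses are too messy to match as text.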
We first discovered CartoDB in 2012 when we met its co-founder, Javier de la Torre, at NYCBigApps. Since then, we’ve been using it, and we love how it makes creating beautiful data-driven maps and telling stories around data so much easier. We also love that they share our technologies and values as they democratize geospatial software, using open source software (including PostgreSQL) delivered as a service over the cloud, much the same way we’re democratizing open data with OpenData.city.
There’s simply no better “GIS in the Cloud” solution.
So when we embarked on our fourth attempt with OpenData.city, we knew we were going to incorporate it as a fundamental component of the hosted service.
CKAN + CartoDB > Data Portal
We built OpenData.city on CKAN – the leading open source data portal platform, and the technology behind the biggest data portals in the world. Out of the box, it already has some advanced geospatial capabilities to allow you to preview, search and organize geospatial data. However, those capabilities were largely limited to viewing geospatial data, and more advanced capabilities required access to specialized GIS software. There were also constraints in the geospatial file formats it recognized, and there were some performance issues with large files.
But the Resource Views feature in the upcoming version 2.3 release of CKAN created an opportunity for us to integrate CartoDB. As both are open source, we were able to integrate the two seamlessly, taking our CKAN hosting offering from just a data portal solution (an online catalog for viewing and downloading datasets) to a place where data publishers can actually start to contextualize, analyze and tell stories with geospatial data.
And our CartoDB integration isn’t only skin-deep. Datasets can be set to synchronize transparently with CKAN, with full access to the CartoDB platform: SQL joins across datasets, extra layers, customized and styled map elements, georeferencing, map wizards, the works! We even linked any map generated from the portal back to its dataset (and if there’s a related Discourse instance, to the related discussion thread as well).
And there’s more!
Attribution: the “Open Data is Geospatial Data Venn Diagram” was inspired by Andrew Nicklin’s presentation at GISMO NYC November 18, 2014, “GeoData Startup Showcase.”