For the past few days we have been showing off some sweet maps, data, and use-cases built from CartoDB’s Data Observatory. We’re proud of the project and know some of our customers are already gaining a lot of value from using it. Making CartoDB users happy and successful is our primary goal, of course. However, there is something hidden in the Data Observatory that may not be obvious to most people yet. The Data Observatory isn’t about data; it’s about prediction.
Spatial data and prediction on CartoDB
Spatial data is special, or so they say. But at CartoDB, we have been working hard to lower barrier after successive barrier in the location data realm. With the help of companies like Mapzen and Mapbox, we have pushed web mapping forward and greatly increased the number of creators and consumers of maps on the web. Our next goal is to do the same with spatial analysis. In particular, we want to help people make more accurate inferences from their data and generate more powerful predictions.
Inference and prediction are capabilities that can spell the difference between running a good company and running a great company. As a simple analogy, think of your own health, where understanding why you have a cough is the only way to ensure that you get the right treatment. Knowing that same why is the only way you can set a course of action and predict how long it will take you to get better if you stay that course. In business, the impacts of identifying factors that lead to success and predicting future outcomes range from reducing costs to outmaneuvering competitors. This is why we see so much money being spent by businesses to improve their data analytics and data science.
What we also see is that many businesses still haven’t fully discovered the value and potential in their spatial data. Location data has a contextualizing power that can be paired with the predictive capabilities of geostatistics in a way that few other types of data can. We think we can help businesses leapfrog their competitors and derive entirely new value from their existing data by showing them this future with location data.
The first step is context
When I was a biologist, we used data as well as predictive tools to learn more about animals than the data at hand could show. For example, we worked a lot with data about the locations where animals had been observed in order to understand those animals. But point locations don’t tell you a lot. Instead, we combined the location data with the known environment in those locations (temperature, precipitation, etc.) to build simple models and then find other suitable areas where that animal likely existed. This workflow has proven powerful for biologists to learn more about the shifting ranges of animals, find historical events in the evolution of animals, and predict future responses of animals to things like climate change.
The two big components of building the predictive workflows were the input data and the contextual data (i.e. environment). Many of our clients come to CartoDB with data that is not too dissimilar: locations of people, stores, or events. What they lack are the “environmental” variables necessary to perform predictive analysis.
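To make the biology workflow a bit more concrete, here is a minimal sketch of one very simple kind of model: an environmental "envelope" that records the range of each variable at the observed locations and then flags other sites that fall inside that range. This is my own illustration, not the models we actually used, and the function names, variable names, and numbers are all made up for the example.

```python
from typing import Dict, List, Tuple

# One record of contextual ("environmental") values at a location.
Env = Dict[str, float]


def fit_envelope(observations: List[Env]) -> Dict[str, Tuple[float, float]]:
    """Record the (min, max) of every variable across the observed locations."""
    variables = observations[0].keys()
    return {
        name: (
            min(obs[name] for obs in observations),
            max(obs[name] for obs in observations),
        )
        for name in variables
    }


def is_suitable(site: Env, envelope: Dict[str, Tuple[float, float]]) -> bool:
    """A site is suitable if every variable falls inside the observed range."""
    return all(low <= site[name] <= high for name, (low, high) in envelope.items())


# Toy, made-up values for places where the animal was observed.
OBSERVED = [
    {"temperature_c": 12.0, "precipitation_mm": 800.0},
    {"temperature_c": 15.5, "precipitation_mm": 950.0},
    {"temperature_c": 14.0, "precipitation_mm": 1100.0},
]

# Toy, made-up candidate sites where nobody has looked yet.
CANDIDATES = {
    "valley": {"temperature_c": 13.0, "precipitation_mm": 900.0},
    "ridge": {"temperature_c": 6.0, "precipitation_mm": 1000.0},
    "plain": {"temperature_c": 14.5, "precipitation_mm": 400.0},
}

envelope = fit_envelope(OBSERVED)
suitable_sites = [name for name, env in CANDIDATES.items() if is_suitable(env, envelope)]
```

Swap "animal" for "store" and "temperature" for a demographic variable and the shape of the problem stays the same: observed locations plus context in, likely locations out.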
The environments of business
The environment that a business, customer, or event exists in isn’t limited to the climatological environment that an animal in the forest experiences (my biologist self is cringing at the oversimplification here, but bear with me). The “environment” of your business, for example, is a collection of contextual location data, including what population lives in or visits that location, what transportation methods are found there, and many other dimensions. So when a CartoDB user uploads their store locations, we want to be able to quickly provide them with that contextual location information.
Like a million arrows
How we can provide contextual location information is actually pretty simple. I like to think of location data as arrows that you shoot at a map. But on their way to reaching the map, they have the ability to pass through hundreds of other layers that are also known to exist in the same locations. As an arrow passes through each of those layers, you can pick out the useful ones and collect more and more information. The Data Observatory is providing those layers. And our goal is to make it easy for CartoDB users to fire millions of arrows and collect limitless pieces of information.
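If you like to see ideas as code, the arrow metaphor boils down to a point-in-polygon lookup repeated across layers. The sketch below is only an illustration of that idea in plain Python, not how the Data Observatory is implemented: the layer names, attributes, coordinates, and the `shoot_arrow` helper are all hypothetical, and the geometry test is a standard ray-casting check.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]         # ring of vertices, first vertex not repeated
Feature = Tuple[Polygon, Dict[str, float]]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: flip 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def shoot_arrow(point: Point, layers: Dict[str, List[Feature]]) -> Dict[str, float]:
    """Pass one point through every layer and collect the attributes it hits."""
    collected: Dict[str, float] = {}
    for layer_name, features in layers.items():
        for polygon, attributes in features:
            if point_in_polygon(point, polygon):
                for key, value in attributes.items():
                    collected[f"{layer_name}.{key}"] = value
                break  # assumes polygons within one layer do not overlap
    return collected


# Hypothetical layers with toy values; real layers would be real boundaries.
LAYERS: Dict[str, List[Feature]] = {
    "population": [
        ([(0, 0), (10, 0), (10, 10), (0, 10)], {"total": 1200}),
        ([(10, 0), (20, 0), (20, 10), (10, 10)], {"total": 450}),
    ],
    "transit": [
        ([(5, 5), (15, 5), (15, 15), (5, 15)], {"stops": 3}),
    ],
}

# Hypothetical store locations: the "arrows".
STORES = {"store_a": (7, 7), "store_b": (12, 2), "store_c": (30, 30)}

context = {name: shoot_arrow(location, LAYERS) for name, location in STORES.items()}
```

Each store comes back with whatever the layers under it had to offer, and a store that lands outside every layer simply comes back empty. At real scale you would lean on a spatial index or a spatial database rather than looping over every polygon.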
From data to prediction
Setting up the Data Observatory in this way allows us to lower the cost and time it takes users to collect contextual location data. Once our users are blending the Data Observatory with their own data (e.g. store locations, customer locations, etc.), we can help them apply replicable modeling and statistical methods to draw inferences and drive predictions.
For example, we are developing new ways for you to discover highly correlated dimensions in your data. We can do this in a few exciting ways: first, by using geospatial clustering, and second, by predicting correlated demographic variables from input data on the fly. Immediate uses of these methods include outlier detection in the first case and market potential analysis in the second. Like a blooming flower, these methods reveal more and more layers, each a different, highly valuable analysis that we can enable through data and the CartoDB technology.
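As a rough feel for what "discovering correlated dimensions" can mean, here is a deliberately simple stand-in: ranking contextual variables by their Pearson correlation with a number you care about. This is not the clustering or prediction method described above, just a sketch of the underlying idea, and the variable names and values are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_dimensions(
    target: List[float], dimensions: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Sort contextual dimensions by the strength of their correlation with target."""
    scores = {name: pearson(values, target) for name, values in dimensions.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy, made-up numbers: one value per store.
SALES = [10.0, 20.0, 30.0, 40.0]
CONTEXT = {
    "distance_to_transit": [4.0, 3.0, 5.0, 1.0],
    "population": [100.0, 200.0, 300.0, 400.0],
}

ranked = rank_dimensions(SALES, CONTEXT)
```

The dimension at the top of `ranked` is the one that moves most closely with sales in this toy data. Keep in mind that correlation alone does not account for spatial structure, which is exactly where the spatial methods come in.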
The present is flying by
While the Data Observatory is ready to use today, some of the advanced analysis methods will be released over the coming months. We have been racing to build them for some time now, and you can already read about some of their impacts on our blog. Or you can check us out talking about them in person. More will be coming out quickly, so keep up. Try to catch us at one of many public events. Or get in touch if you are dying to try these methods or might be interested in collaborating on the future of location intelligence.
The future is coming.
Happy data mapping!