
5 ways the Data Observatory is going to change your relationship with CartoDB


Earlier this week we released the Data Observatory. We built the Data Observatory because we wanted geospatial data that is easier to find and easier to use. More than that, we wanted to put in place the foundation for some things we have in store for the future (more on that in my next blog post). We are all a bunch of map and data nerds, so this is a project that really came from our hearts. While the Data Observatory is far from complete, we built it so it could expand rapidly into new areas of data, and the first release is enough to have us tearing up a bit. We think it has the power to change the way you analyze location data and potentially the way you think about maps entirely.

Here are 5 ways we think it’s going to change your relationship with CartoDB:

1. CartoDB will become the first place you look for the staples of location data.


The Data Observatory’s list of available data grows bigger every week. Our focus is on the high-value and/or hard-to-use datasets that people and businesses ask us for all the time. For people who need or just love location data, the Data Observatory might become a bit of an addiction. Be careful!

Example

Grab U.S. Census Block Groups for your home town quickly by simply running this function and clicking “create dataset from query” in the CartoDB Editor.

SELECT * FROM OBS_GetBoundariesByGeometry(ST_Buffer(CDB_LatLng(40.689, -73.944), 0.1), 'us.census.tiger.block_group')

2. Soon, you’ll be wondering if you normalize data too much.

It is now super simple to normalize your data in CartoDB with some of the world’s most trusted sources of denominators. So why not see what your data looks like per household? Or per person? Or per dollar of income? It’s so easy you might as well take a look. :)

Example

Add the U.S. population, per square meter, as a new column of any point dataset in your CartoDB account. Simply add a new numeric column called “population” and run the following statement:

-- your_table stands in for the name of your point dataset
UPDATE your_table SET population = OBS_GetUSCensusMeasure(the_geom, 'Total Population')
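Once a denominator column like this exists, any per-person view is one division away. A minimal sketch, assuming your table also has a numeric sales column (a hypothetical name):

-- add a numeric column sales_per_person first; NULLIF avoids division by zero
UPDATE your_table SET sales_per_person = sales / NULLIF(population, 0)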

3. You will start making maps of people just because.

This is one that we know to be true. We know it because we’ve been living it for the past two months. Our team has had the pleasure of early access to the Data Observatory. The result of that access? Maps. Maps for curiosity. Maps for discovery. Maps just because. It’s awesome and you’re going to love it. There are so many interesting measures in the Data Observatory, it’s worth just playing around.

Example

Take, for example, the number of people who walk to work as their primary mode of transportation. You can add the count of these people to any of your U.S.-based polygons simply by adding a new column, “walkers”, to your table and then running:

-- your_table is the table containing your U.S.-based polygons
UPDATE your_table SET walkers = OBS_GetUSCensusMeasure(the_geom, 'Walked to Work')

4. Basemaps will become more and more optional.

One thing I’ve noticed since we started getting more and more data into the Data Observatory is that our Senior Cartographer has been using fewer and fewer basemaps in her visualizations. And even the visualizations that do use a basemap are often just using boundary layers from the Data Observatory in place of the fully detailed basemaps we all normally use.

Example

Use a new boundary dataset as a blank canvas instead of a basemap. For example, grab the U.S. Congressional Districts and add them to a new blank table. First, create a new empty dataset in CartoDB. Add a new string-based column called ‘geoid’. Rename your new table ‘us_congressional_districts’ and just run:

INSERT INTO us_congressional_districts(the_geom, geoid) SELECT the_geom, geom_refs FROM OBS_GetBoundariesByGeometry(ST_MakeEnvelope(-179.5, 13.4, -42.4, 74.4, 4326), 'us.census.tiger.congressional_district')

5. You’ll find many new stories in old data.

In our mapping rampage we found ourselves thinking this quite a lot. The Data Observatory has actually backed some of our recent blog posts, including the L Train Analysis, Airbnb Impact Mapping, and Data Mountains. What we now know is that we have only scratched the surface of the stories out there.

Let’s go dive into data!

Happy data mapping!


The Data Observatory is about Prediction

CartoDB is about prediction

For the past few days we have been showing off some sweet maps, data, and use-cases built from CartoDB’s Data Observatory. We’re proud of the project and know some of our customers are already gaining a lot of value from using it. Making CartoDB users happy and successful is our primary goal of course. However, there is something hidden in the Data Observatory that may not be obvious to most people yet. The Data Observatory isn’t about data, it’s about prediction.

Spatial data and prediction on CartoDB

Spatial data is special, or so they say. But at CartoDB, we have been working hard to lower successive barriers in the location data realm. With the help of companies like Mapzen and Mapbox, we have pushed web mapping forward and greatly increased the number of creators and consumers of maps on the web. Our next goal is to do the same with spatial analysis. In particular, we want to help people make more accurate inferences from their data and generate more powerful predictions.

Inference and prediction are capabilities that can spell the difference between running a good company and running a great company. In a simple analogy, think of your own health, where understanding why you have a cough is the only way to ensure that you get the right treatment. Knowing that same why is the only way you can set a course of action and predict how long it will take you to get better if you stay that course. In business, the impacts of identifying factors that lead to success and predicting future outcomes range from reducing costs to outmaneuvering competitors. This is why we see so much money being spent by businesses to improve their data analytics and data science.

What we also see is that many businesses still haven’t fully discovered the value and potential in their spatial data. Location data has a contextualizing power that can be paired with the predictive capabilities of geostatistics in a way that few other types of data can match. We think we can help businesses leapfrog their competitors and derive entirely new value from their existing data by showing them this future with location data.

The first step is context

When I was a biologist, we used data as well as predictive tools to learn more about animals than what our data at hand could show. For example, we worked a lot with data about the locations where animals had been observed to understand those animals. But point locations don’t tell you a lot. Instead we combined the location data with the known environment in those locations (temperature, precipitation, etc.) to build simple models and to then find other suitable areas where that animal likely existed. This workflow has proven powerful for biologists to learn more about the shifting ranges of animals, find historical events in the evolution of animals, and predict future responses of animals to things like climate change.

The two big components of building these predictive workflows were the input data and the contextual data (i.e. the environment). Many of our clients come to CartoDB with not-too-dissimilar data: locations of people, stores, or events. What they lack are the “environmental” variables necessary to perform predictive analysis.

The environments of business

The environment that a business, customer, or event exists in isn’t limited to the climatological environment that an animal in the forest experiences (my biologist self is cringing at the oversimplification here, but bear with me). The “environment” of your business, for example, is a collection of contextual location data, including what population exists or visits that location, what transportation methods are found in the location, and many other dimensions. So when a CartoDB user uploads their store locations, we want to be able to quickly provide them with that contextual location information.

Like a million arrows

How we provide contextual location information is actually pretty simple. I like to think of location data as arrows that you shoot at a map. On their way to the map, they have the ability to pass through hundreds of other layers that are known to exist in the same locations. By passing through each of those layers, you can find the good ones and collect more and more information. The Data Observatory provides those layers, and our goal is to make it easy for CartoDB users to fire millions of arrows and collect limitless pieces of information.
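As a sketch of the arrows idea in SQL, each Data Observatory call below is one layer the same points pass through. Table and column names are hypothetical; 'Total Population' follows the convention shown earlier, while 'Median Household Income' is an assumed measure name:

-- add numeric columns total_pop and median_income first
UPDATE my_points SET
  total_pop     = OBS_GetUSCensusMeasure(the_geom, 'Total Population'),
  median_income = OBS_GetUSCensusMeasure(the_geom, 'Median Household Income')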

From data to prediction

Setting up the Data Observatory in this way allows us to lower the cost and time it takes users to collect contextual location data. Once our users are blending the Data Observatory with their own data (e.g. store locations, customer locations, etc.), we can help them apply replicable modeling and statistical methods to draw inferences and drive prediction.

For example, we are developing new ways for you to discover highly correlated dimensions in your data. We can do this in a few exciting ways: first, by using geospatial clustering, and second, by predicting correlated demographic variables from input data on the fly. Immediate uses of these methods include outlier detection in the first case and market potential analysis in the second. Like a blooming flower, these methods reveal more and more layers, each a different, highly valuable analysis that we can enable through data and the CartoDB technology.
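Even before those methods land, plain PostgreSQL can hint at correlated dimensions once a table has been augmented. A minimal sketch with hypothetical table and column names:

-- corr() is PostgreSQL's built-in Pearson correlation aggregate
SELECT corr(revenue, total_pop) AS revenue_vs_population
FROM my_stores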

The present is flying by

While the Data Observatory is ready to use today, some of the advanced analysis methods will be released over the coming months. We have been hard at work on them for some time now, and you can already read about some of their impacts on our blog, or catch us talking about them in person. More will be coming out quickly, so keep up! Try to catch us at one of many public events, or get in touch if you are dying to try these methods or might be interested in collaborating on the future of location intelligence.

The future is coming.

Happy data mapping!

Import and Export Your Visualizations using .carto

Import and Export Your CartoDB

For data scientists, it’s very important to share the way they work their magic so others can check and iterate on it. Let’s face it: analysis is one of those tasks that is much better and easier when done in a collaborative environment. Starting today, exporting and sharing your layered, styled visualizations is available directly from the interface!

CartoDB has been working on a brand new way to transport and distribute your data-driven visualizations, and we are proud to release the new .carto file format. Using .carto, you can export public maps or your own visualizations and map styles, store them, and drag-and-drop them into the CartoDB interface. Your styles, SQL, and data are all preserved exactly in the .carto file. Now you can spend your time on what really matters: performing analysis on your data.

How it works

For private work environments, the .carto file format is compatible with your on-premise and secure data. Simply click on edit and export your map. Your data and visualizations remain safe within your organization as before, and you can share your data-driven visualizations efficiently, seamlessly, and flexibly.

Also, with this new feature you can export any visualization that you have access to (shared or public), with its data, into a file and import it into your account. This is a great feature for CartoDB Ambassadors, or anyone looking for data to combine with their own. Just select any public map you want to export, click on the download icon (hint: it’s next to the ‘like’ icon), and drag-and-drop your downloaded .carto file into your CartoDB Editor!

Data collaboration and analysis just got faster!

The new .carto format makes it easy to complete projects and pass the files to your client’s CartoDB account, keeping data, styles, and queries intact and ready to go on launch day. You can also ship pre-made maps with your company’s brand and requirements in place. The format is ideal for brand managers who maintain a series of maps with strict brand requirements, since it ensures that all brand elements remain consistent across your organization.

Simply put: make maps faster, deliver analysis quicker, and distribute your visualizations to your entire team. Create maps that are styled and data-ready for people to spin off, allowing even more stories to be told in your visualizations. Just store the package on a server and share a link.

To get you started, we have three amazing visualizations that you can check out to see just how easy .carto is!

  • NYC subway network – See map
  • Unemployment Rate in US – See map
  • Sydney natural population change – See map

Happy data mapping!

Sneak peek: CartoDB for Alteryx!

CartoDB for Alteryx

We previously announced that CartoDB will be a platinum sponsor at the Alteryx Inspire conference happening in San Diego from June 6-9th. As we approach our joint workshop with Alteryx and Boston Consulting Group, we wanted to give you a sneak peek into using CartoDB with Alteryx!

The CartoDB connector for Alteryx makes it easy to push Alteryx outputs directly to CartoDB and then create beautifully dynamic maps. With the CartoDB connector you can easily run multiple Alteryx workflows and update a CartoDB dashboard in real time to visually analyze the results.

Adjust attributes on the fly, select locations, and see the computational output adjust to match your data. The video below shows selecting and zooming in on a trade area to gain insight into how much revenue is at stake with low- and high-risk customers who might move to a competitor. The world of business data analysis is open!

We hope you’ll join us at Inspire 2016 for our hands-on workshop. Drop us a note if you want to hear more.

Happy data mapping and see you in San Diego!

Accessing Demographic Clusters with CartoDB's Segmentation Layers


The U.S. Census is an amazing resource of data and information. The U.S. Census performs a number of regular as well as ongoing surveys that document many facets of people and life in the U.S. These data can often be used to help learn about dimensions of a location and what it might contain.

As humans though, when asked about what a neighborhood is like we don’t rhyme off a series of census variables for that neighborhood. Instead, whether it be the hipsters of Williamsburg or the stroller traffic jams of Noe Valley, we tend to describe neighborhoods in terms of archetypes that we can more easily relate to. These kinds of neighborhood descriptions can add meaningful value and context to your location data.

We want to make this kind of contextual data available and easier to use in CartoDB.

Releasing segmentation layers

Today, inside the Data Observatory, we are releasing a new set of layers created from demographic segmentation. Demographic segments group people into clusters, to which we then apply a data-driven naming method that makes them easily readable and recognizable when analyzing your data. By releasing them in the Data Observatory, we are making them available for users to quickly augment their own data.

The segmentation is generated through a clustering procedure that we’ll cover in more depth in a forthcoming blog post. The output gives us two granularities of clustering, one that produces 10 unique groupings of people across the USA and a second that creates 55 unique groupings of people.

10 cluster resolution

To generate these clusters we used the algorithm proposed by Spielman and Singleton, and for the 10-cluster layer we were able to adopt their naming structure.

The names of the 10 different neighborhood types are:

Hispanic and kids
Low Income and Diverse
Middle income, single family homes
Native American
Low income, minority mix
Old Wealthy, White
Residential institutions, young people
Wealthy Nuclear Families
Low Income African American
Wealthy, urban, and kid-free

55 cluster resolution

The 55 cluster layer was more difficult, as the names of each group had not been previously published. For these more detailed categories, we generated names based on the dominant traits of the populations within each cluster (or the dominant omission in a few cases). For example, if an area within a city is found to be highly dominated by college-age adults with some college education, it was given the name “City center university campuses”.

Take a look at all 55 proposed group names:

Middle Class, educated, suburban, mixed race
Low income on urban periphery
Suburban, young and low-income
Low-income, urban, young, unmarried
Low education, mainly suburban
Young, working class and rural
Low income with gentrification
High school education, long commuters, Black, White Hispanic mix
Rural, bachelors or college degree, rent/owned mix
Rural, high school education, owns property
Young, city based renters in sparse neighborhoods, low poverty
Predominantly black, high high-school attainment, home owners
White and minority mix, multilingual, mixed income / education, married
Hispanic/Black mix multilingual, high poverty, renters, uses public transport
Predominantly black renters, rent / own mix
Lower middle income with higher rent burden
Black and mixed community with rent burden
Lower middle income with affordable housing
Relatively affordable, satisfied lower middle class
Satisfied lower middle income, higher rent costs
Suburban/rural, satisfied, decently educated lower middle class
Struggling lower middle class with rent burden
Older white home owners, less comfortable financially
Older home owners, more financially comfortable, some diversity
Younger, poorer, single parent family, Native Americans
Older, middle income Native Americans married and educated
Older, mixed race professionals
Works from home, highly educated, super wealthy
Retired grandparents
Wealthy and rural living
Wealthy, retired mountains/coasts
Wealthy diverse suburbanites on the coasts
Retirement communities
Urban - inner city
Rural families
College towns
College town with poverty
University campus wider area
City outskirt university campuses
City center university campuses
Lower educational attainment, homeowner, low rent
Younger, long commuter in dense neighborhood
Long commuters White/Black mix
Low rent in built up neighborhoods
Renters within cities, mixed income areas, White/Hispanic mix, unmarried
Older Home owners with high income
Older home owners and very high income
White/Asian Mix big city burb dwellers
Bachelors degree mid income with mortgages
Asian/Hispanic Mix, mid income
Bachelors degree higher income home owners
Wealthy city commuters
New developments
Very wealthy, multiple million dollar homes
High rise, dense urbanites

On the map

You can explore both on this map and in the deep insights dashboard here, or take a look at the simple map version here:

Accessing demographic segments

Using the awesome power of the Data Observatory to bring these segments into your data is as easy as calling a quick SQL statement.

To query these segments at a single point location, simply use the OBS_GetUSCensusCategory function:

10 clusters

SELECT * FROM OBS_GetUSCensusCategory(CDB_LatLng(40.704512, -73.936669), 'Spielman-Singleton Segments: 10 Clusters')

55 clusters

SELECT * FROM OBS_GetUSCensusCategory(CDB_LatLng(40.704512, -73.936669), 'Spielman-Singleton Segments: 55 Clusters')

Augmenting your data

Another interesting use of the segmentation data is to augment your tables. You can do so by adding a new column to any table called segment (or any other unique name).

Next, augment your table with the segment description:

UPDATE your_table SET segment = (SELECT * FROM OBS_GetUSCensusCategory(the_geom, 'Spielman-Singleton Segments: 10 Clusters'))
UPDATE your_table SET segment = (SELECT * FROM OBS_GetUSCensusCategory(the_geom, 'Spielman-Singleton Segments: 55 Clusters'))

Next steps

Today we wanted to announce the availability of this exciting set of layers in the Data Observatory. In future blog posts we will explore some of these groupings, what they can tell us about the U.S., and how they can add context and insight into your data. We will also detail how these segments were created and how we plan to improve and expand on them in the future.

For further reading, check out the Data Services API docs and the Data Observatory.

For now, happy demographic segment mapping!

The Data Observatory is for Cities


Lately, we’ve been talking a lot about data analysis of segmentation and demographics: two things you are guaranteed to uncover when perusing city data. Cities are overburdened with a plethora of issues like crime, housing, and transportation, and need to provide context and solutions for these issues based on information. What that takes is data, and lots of it.

Find the data, find the solution

U.S. Census and American Community Survey data are key resources for understanding municipal patterns. Accessibility to contextual data about constituents is a critical part of a successful city. Widely available location data is used to make faster and more informed choices about where to distribute resources, improve systems, or identify imbalances and inequalities. New insights on a city’s infrastructure can be gleaned from location data because new relationships and patterns emerge when seen spatially.

However, keeping a city loaded up with the data it needs takes significant effort. Resource-strapped GIS and IT departments spend time acquiring data, updating it, and managing the storage it demands. Additionally, managing large datasets takes a lot of effort and expensive tools.

It’s all baked into the system

The Data Observatory is a way for cities to instantly access common support data at a seriously reduced effort. The Data Observatory houses hard to find and high value datasets on CartoDB servers. You don’t need to gather, store, or clean up the data. Instead, you can just jump in, request data when you need it, and start analyzing in just a matter of moments.

For example, to view census boundary data on the total population of Oakland, CA, simply get the bounding box around the area from bbox finder and put it into the following SQL statement:

INSERT INTO alameda_census_tracts(the_geom, geom_refs) SELECT * FROM OBS_GetBoundariesByGeometry(ST_MakeEnvelope(-122.351303, 37.656101, -121.958542, 37.862928, 4326), 'us.census.tiger.census_tract_clipped')

Once the bounding box is set, you can request the Census Tracts from the Data Observatory, clipped to shoreline.

Next, after adding a numeric column called measure, you can pull in the total population for each tract, specifying a few details that give you more control over the output.

UPDATE alameda_census_tracts SET measure = OBS_GetMeasure(ST_PointOnSurface(the_geom), 'us.census.acs.B01003001', 'area', 'us.census.tiger.census_tract', '2010 - 2014')

The result: a choropleth visualization of the data displayed in one of our color ramps.

Oakland Census Tracts

There aren’t any directories to sort through or joins to navigate. Speedier access to data facilitates faster and smarter action by city officials and their staff, with fewer worries about the resources needed to get there. The Data Observatory helps you augment official data to build visualization and intelligence tools that matter.

Experiment and iterate

The Data Observatory makes it easier to iterate and experiment with your data and reach a more informed end result. We’ve got some great plans for the Observatory: over time the available datasets will expand, and cities will have more access to the data they need.

Happy data mapping!

Offline maps with the Mobile SDK

Offline maps

Working offline is kind of a big deal. There are many mobile apps that always need to work, not just when a device is online. In previous posts we covered offline routing features. Today, I’d like to tell you all about offline basemaps for the CartoDB Mobile SDK. It should come as no surprise that the most common reason developers have sought out and used the SDK is that our basemaps work offline!

The Mobile SDK provides several solutions for offline basemaps:

  1. Country-wide packages: The CartoDB SDK provides OpenStreetMap-based vector map packages for every country in the world. For bigger countries, we additionally offer separate state packages. Countries are also organized into continents and subcontinents. All our offline packages are pre-packaged on the server side, which means a very fast download rate via CDN. See our long list of available country packages. As an alternative, we provide Here.com maps as similar country-wide packages. Here.com maps have great address, geocoding, and routing data. Additionally, in some instances you may only need one particular country or state to be available. Some app providers have created an interactive country selection menu (e.g. the Offline Maps app for Android).

  2. Maps for limited regions: This is useful for when you do not need to visualize the whole country. For example, Lonely Planet uses this approach in their Guides app for Android, which is powered by CartoDB. In the Guides app we use pre-packaged maps, which offers a very fast download speed, especially compared to the tile-by-tile download approach.

  3. Your own map data packages: The Mobile SDK supports MBTiles as a map package format (via MBTilesTileDataSource) and you can find several tools to create these files (see our list here). For more compact vector packages there are no ready-made open source tools that we know of, but we can provide you with custom scripts. You’ll need a vector tile server with your data and a script that scrapes all tiles for your area of interest and saves them to an MBTiles file. Our tools for this support data encryption, which is useful when you want to prevent copying and misuse of your valuable vector data assets. For more advanced cases you can also combine online and offline data sources. With your own map packages you can bundle map data with an app, which makes the installation bigger, but you won’t need the additional download.

I hope you find offline mobile mapping as exciting and as useful as we do. If you’d like to know more about the latest in our Mobile SDK technology, join our ‘Get started using CartoDB’s Mobile SDK’ webinar on June 29, 2016.

Happy offline mobile mapping!

GeoJourNews 2016 is less than 2 weeks away!

GeoJournews

Come join us for a day full of fascinating talks from some innovative and talented journalists! GeoJourNews 2016 is on June 8th and is a media and technology conference catered to journalists to showcase the power of visualization and mapping. Attendees will hear from developers, journalists, and activists from various communities of practice working specifically on interactive mapping stories, GIS challenges, and creative coding solutions that couple storytelling and cartography in powerful and creative narratives.

CartoDB is proud to sponsor this unique event with the Mozilla Foundation and The New School.

We are excited to show how CartoDB, data visualization, and data science are being used by journalists, data scientists, developers, and activists to shape policy, teach, and better understand our world.

Check out the list of speakers and their talks below:

Jonathan Stray, Columbia University: “Location Privacy in the Age of the Smartphone”

Jonathan Stray is a data scientist and a computer scientist who’s written for The New York Times, The Atlantic, Wired, Foreign Policy, and ProPublica. He teaches computational journalism at the Tow Center for Digital Journalism at Columbia University, and leads the Overview Project.

Matías Kalwill, Bikestorming: “Adventures in Open Bicycle Data”

As a designer and entrepreneur, Matias’ focus is to explore the potential of design and the arts to produce massive social and environmental impact for good. His work in Civictech and sustainable transportation has been awarded by the British.

Joey K. Lee, Mozilla Science Lab: “Entangling the map: adventures in that fuzzy space between art and science”

Joey Lee is an Open Science Fellow at the Mozilla Science Lab where he is working to enhance public access to science through the web.

Peter Richardson, Mapzen: “Free as in Maps”

Peter Richardson is an artist, animator, and open data enthusiast. He makes unusual maps at Mapzen. Art and code; emergent systems of righteousness. “Vertex predator”, “lighter of paint”.

Ben Wellington, IQuantNY: “Explorations in Urban Data Science”

Ben Wellington is the creator of I Quant NY, a data science and policy blog that focuses on insights drawn from New York City’s public data, and advocates for the expansion and improvement of that data. His data science has influenced local government policy including changes in NYC street infrastructure.

JD Godchaux, NiJel.org: “Mapping in the Hyperlocal”

JD Godchaux is the co-founder and CTO of NiJeL. He has over 15 years’ experience in both directing and contributing to data visualization projects. He is an expert in open source GIS, visualization, and data management tools.

Martin Burch, The Wall Street Journal: “Don’t Make a Map! Alternative Uses for Geospatial Analysis”

Martin Burch is a data developer for graphics at The Wall Street Journal. He specializes in data research and acquisition.

Max Rust, The Wall Street Journal: “Data Dumping”

Max Rust is a cartographer with the graphics department at The Wall Street Journal.

Justin Blinder: “Vacated: Mapping change in New York City from the ground up”

Justin Blinder is a Brooklyn-based artist, programmer, and designer. His work examines how big data has shaped our claims of ownership, criteria for an object’s value, and social interactions in the built environment.

Stefani Bardin, NYU, Parsons: “No Free Lunch: Mapping Food & Climate Change”

Stefani Bardin explores the influences of corporate culture and industrial food production on our food system and the environment. She works with neuroscientists, biologists, engineers and gastroenterologists to ground her research in the scientific world. These investigations take

GeoJourNews will be held at Eugene Lang College, The New School, and is accessible from all over New York City. Following the conference, join us at the CartoDB office for the afterparty and lightning talks from journalists across all landscapes!

Where: Eugene Lang College, The New School: 65 W 11th St, New York, NY
When: June 8th, 9:00 a.m. - 9:00 p.m.
Sponsors: Parsons Journalism & Design Program, Knight-Mozilla OpenNews

If you’re a member of IRE or a NICAR attendee, use promo code newsmaps16 for a 95% discount!

The first 20 people who get to the end of this blog post have earned themselves a 95% discount, as well!
RSVP: https://nvite.com/GeoJourNews2016/fb60/
Code: newsies2016

If you’d like to tweet about the event, or the sweet speaker set list, please use the #geojournews16 hashtag and @CartoDB!

See you all there and happy data mapping!


What’s happening with CartoDB in June

What’s happening with CartoDB in June

Coast to coast, Austin to Detroit, Madrid to the UK – CartoDB is traversing the globe and making waves in the world of location intelligence. This June, these are the events you must attend.

We’ll be in sunny San Diego, California for the Alteryx Inspire conference from June 6-9, where we will be giving Deep Insights demos and a joint workshop with Alteryx and Boston Consulting Group.

CartoDB’s GeoJourNews conference is happening this month! This conference is the premier event for data journalists that focuses primarily on visual storytelling and data. This year there will be 11 incredible speakers and great new resources for journalists announced.

Join us at the Southeastern Michigan Council of Governments (SEMCOG) annual meeting in Detroit, Michigan, where we will be giving a talk.

See Santiago Giraldo, Civic Technologist, during his panel session, “Healthy and SMART Design”, on June 13 in Austin, Texas. Santiago will be there for four days and will also be giving a talk in the exhibition hall. Come visit us at our booth!

FOSS4G-UK is happening from June 14-16 in Southampton, UK. See a workshop by CartoDB Ambassador Tim Martin.

For two days in Madrid, our COO, Miguel Arias, will be giving a talk at the Money Conference.

And on June 8 in Layton, Utah, see Matt McCullough, Founder of Public Health Geographies and GIS Mapping Director at Infobytes, talking about CartoDB at the Utah Valley GIS User Group Conference.

Happy data mapping!

Learning more about your store locations with CartoDB's Data Observatory


There are a lot of ways for you to analyze store locations using CartoDB. The combination of the Data Observatory, Location Data Services, and Deep Insights gives you a complete toolkit for building actionable dashboards and delivering insights using maps. Today, we are going to quickly show you a few ways to interact with location data using these three CartoDB tools.

1. Upload a new dataset of store locations

I’m going to use one I have lying around. You can grab it here if you want to follow along. It’s just a few handfuls of locations from the Boston area.

2. Georeference our point locations

CartoDB makes it incredibly simple to turn street addresses into mappable coordinates. This process, called georeferencing, can be done right in the CartoDB Editor.

Georeferencing in CartoDB

3. Enrich your data using the Data Observatory

Next, we can explore our data using the map in the CartoDB Editor. We can also enrich it by using the Data Observatory as a source of augmentation. You can currently write SQL functions to update your tables of location data on demand. Take a look:

Augment in CartoDB

The relevant SQL to create a column and augment it with the Data Observatory at the same time looks like this:

ALTER TABLE store_performance_data ADD COLUMN pop_density NUMERIC;
UPDATE andrew.store_performance_data SET pop_density = OBS_GetMeasure(the_geom, 'us.census.acs.B01003001')

This is just one of hundreds of measures you can grab. Read about them in the documentation.

4. Combine your data with Data Observatory data

Once you have enriched your table, it’s pretty simple to combine two or more columns through simple operations. For example, here we will divide the total sales of each store by the population pulled in from the Data Observatory.

Normalize Sales Data
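The SQL behind that operation would be a simple division; a sketch assuming a numeric total_sales column and a sales_per_pop column to hold the result (both hypothetical names):

-- pop_density comes from the Data Observatory step above; NULLIF guards against zero
UPDATE andrew.store_performance_data SET sales_per_pop = total_sales / NULLIF(pop_density, 0)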

5. Grab boundary data from local regions

There are a few different ways to grab boundary data from the Data Observatory. One of my favorites is by using point data to supply an envelope to the Data Observatory and then gather all the boundaries within that envelope. In this example, I’m grabbing all the Census Block Groups.

Get Regional Polygons

The SQL may look long, but we’ll be working to make this easier and easier over time. The key thing to notice is that it writes results into an existing table, new_table_name with a geometry column (default in CartoDB), and an attribute column called geom_refs. Inside the request, it uses a Data Observatory function called OBS_GetBoundariesByGeometry. We are accessing a boundary layer called us.census.tiger.block_group_clipped, which is our custom coastline clipped block group data.

INSERT INTO new_table_name(the_geom, geom_refs)
SELECT the_geom, geom_refs
FROM (
  SELECT (OBS_GetBoundariesByGeometry(ST_Envelope(ST_Collect(the_geom)), 'us.census.tiger.block_group_clipped')).*
  FROM andrew.store_performance_data
) AS results
GROUP BY the_geom, geom_refs

You can always use those to pull in data about the areas. Here we’ll grab Per Capita Income for each block group.

Per Capita Income

6. Overlay trade areas

There are many different ways to define trade regions in CartoDB. One of the most interesting is by defining polygons by walking or driving distance. Here, we’ll take each store location and define a 20 minute walking distance around them. We’ll write those into a new table of polygons to overlay on our map.

Store walking distance

This is pretty cool in CartoDB because you can calculate these walking distances right in the Editor pretty easily. In the SQL example, we define walk as the method, and we provide an array with a single value, ARRAY[1200], which means 1200 seconds, or 20 minutes. We can actually get multiple walking distances at once by adding multiple values to the array, as in the sketch after the example below.

INSERT INTO untitled_table_54(the_geom, data_range)
SELECT the_geom, data_range
FROM (
  SELECT (CDB_Isochrone(the_geom, 'walk', ARRAY[1200]::integer[])).*
  FROM andrew.store_performance_data
) AS results
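For instance, a minimal sketch requesting 10-, 20-, and 30-minute walking bands (600, 1200, and 1800 seconds) in a single call:

-- three isochrone bands per store in one request
SELECT (CDB_Isochrone(the_geom, 'walk', ARRAY[600, 1200, 1800]::integer[])).*
FROM andrew.store_performance_data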

7. Use Deep Insights for a custom dashboard

Once you’ve seen your data on the map, you can improve your analysis by exploring it in a dynamic and custom interface. We have a library called Deep Insights that allows you to build amazing dashboards around your data. In the following example, I’ve built a dashboard that lets you drill down into the Per Capita Income, the overlap of the block groups, and the trade areas, helping you identify possible markets and potentially new territories. See it here.

8. Explore local segmentation data

This morning we released a new dataset of segmentation for the USA. This is a really interesting dataset for exploring location trends in businesses, people, and events. I wanted to quickly test adding each store’s local segment classification as a new column and then building a Deep Insights dashboard. Take a look at the results:

You can access segmentation from the Data Observatory with a simple SQL UPDATE like this:

UPDATE untitled_table_57 SET segment55 = OBS_GetCategory(ST_PointOnSurface(the_geom), 'us.census.spielman_singleton_segments.X55')

9. Enjoy

These are just a few peeks into the powerful capabilities inside of CartoDB. Pull them apart and start applying them to your workflows. We’re excited to see what you start to build!

Happy data mapping!

Exploring Congressional Districts using the Data Observatory


The Data Observatory contains a wealth of information available for exploration and data augmentation. In this post we will look at an available dataset provided by the U.S. Census that is very much in the news: United States Congressional Districts.

First let’s retrieve all of the U.S. Congressional Districts and display them in a map. Using the function OBS_GetBoundariesByGeometry we can populate a table with all of the geometries. We can find the tag we need for getting the districts from the Data Observatory’s catalog.

Steps to Create a Congressional District Map

  1. Create a new table in CartoDB
  2. Rename the table to us_congressional_districts
  3. Rename the name column to geoid
  4. Run the following query:
INSERT INTO us_congressional_districts(the_geom, geoid) SELECT the_geom, geom_refs FROM OBS_GetBoundariesByGeometry(ST_MakeEnvelope(-179.5, 13.4, -42.4, 74.4, 4326), 'us.census.tiger.congressional_district')

If you switch to ‘Map View’, you’ll see the United States covered in congressional districts. Now we can start exploring.

Where are the 10 smallest districts?

Here, New York City takes the cake.

10 Smallest Districts

It has 9 of the 10 smallest congressional districts by area. Number 10 is in the center of Los Angeles. If you’re interested in recreating this, see the SQL I used.
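If that link isn’t handy, a minimal equivalent sketch simply orders the districts built above by geodesic area:

-- casting to geography measures area in square meters on the spheroid
SELECT geoid, ST_Area(the_geom::geography) AS area_m2
FROM us_congressional_districts
ORDER BY area_m2 ASC
LIMIT 10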

Where are the 10 largest districts?

Unsurprisingly, we find that Alaska is the largest congressional district, followed by Montana, Wyoming, and large chunks of other western states. I re-projected the data into a more compact format so it is easier to see all of the congressional districts in one view.

See the SQL I used to create that map. Or just download the .carto file on that map’s public page.

10 potentially gerrymandered districts

The geometries of congressional districts are often in the news because of accusations of gerrymandering. The Washington Post highlights several districts which have highly irregular shapes (i.e., they’re not compact in a mathematical sense). We list them here!

North Carolina’s 1st, 4th, and 12th districts have long, extended shapes that span large areas of the state:

North Carolina Congressional Districts

Chicago has a district the shape of ear muffs:

Maryland’s 3rd district wends all across the central parts of the state, as does Pennsylvania’s 7th:

Texas’ 21st and 33rd districts, and Louisiana’s 2nd:

Finally, Florida’s 5th district:

These were all captured with our Static Maps API from this map.

Use this map and its data by downloading this map’s .carto file from its public page.
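If you want to hunt for candidates yourself, one standard compactness measure is the Polsby-Popper score, 4πA/P², which is 1 for a perfect circle and approaches 0 for highly contorted shapes. A sketch for ranking the least compact districts in the table built earlier:

-- lower scores mean less compact (potentially gerrymandered) shapes
SELECT geoid,
       4 * pi() * ST_Area(the_geom::geography) / POWER(ST_Perimeter(the_geom::geography), 2) AS polsby_popper
FROM us_congressional_districts
ORDER BY polsby_popper ASC
LIMIT 10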

What’s next?

Watch our blog for more about the Data Observatory or read more in our introductory post.

Happy open data mapping!

A 3D City for the Mobile SDK

Maps 3D

In previous blog posts we primarily covered 2D data and functions. CartoDB has largely been a 2D data visualization tool, with 3D maps for the web environment appearing sporadically. Now with the Mobile SDK, we often find ourselves building 3D maps for real use cases, and sometimes just because we can!

The Nutiteq Mobile SDK, version 3.x, has several 3D features:

  • Map Tilting for perspective or navigation view.
  • Billboard orientation for certain objects, which means that markers, balloons, and texts face the viewer even when the map is tilted.
  • 3D Polygons are just like polygons on 2D maps, but have a height (in meters) that can be set in code. The general color of the polygon can also be set, but not the outline.
  • 3D Objects can be added to maps. This is useful for landmarks, such as adding a 3D car or a pointer to the map as a 3D model. You can dynamically rotate or move these objects on the map.
  • 3D City Layer Type for bigger 3D datasets used for visualizing an entire city. The data can be loaded from an offline file or via an online API.

To convert data from common standard formats like KMZ, OpenCollada, and DAE, we have built converters. These converters are provided as extra products; just email me your model files for a free sample of the first conversion!

We don’t provide 3D data as part of our standard map service, because 3D city data and 3D landmarks are very expensive to produce and we license them from other vendors. If you are interested in adding 3D capabilities, or if you have your own 3D city data, contact us!

P.S. These and many other Mobile SDK features will be shown in a webinar on June 29. RSVP now.

Happy 3D mobile mapping!

5 cartographic tips for your Data Observatory maps


Over the past couple of weeks, you’ve probably been keeping up with all of the exciting news about CartoDB’s Data Observatory and losing sleep thinking about all of the cool maps that you can make!

With all of the information available in the Data Observatory, the thematic mapping possibilities are endless. In this post, we give you a few cartographic tips (there are definitely more!) to consider when building your Data Observatory maps.

Tip 1: Choose an appropriate map type

There are some basic principles to follow when choosing an appropriate thematic map type. First, establish if your variable of interest represents categories or numbers.

Categories?

Make a category map. Category maps are meant for data that are non-numeric and have distinct categories.

Limit the number of categories to around 10, with a maximum of 12. Why? The human eye can only distinguish between so many different elements on a map. The consequence of using too many of anything is that your map reader will have a difficult time identifying patterns, and some pretty interesting information can easily get lost. If you have more than 12 categories, think about ways to aggregate your categories into higher-level groupings, as in the sketch below.
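One way to do that regrouping is a CASE expression in SQL; a sketch with hypothetical table and category names:

-- collapse detailed commute modes into a handful of displayable groups
SELECT *,
  CASE
    WHEN mode IN ('Subway', 'Bus', 'Ferry') THEN 'Public transportation'
    WHEN mode IN ('Walked', 'Bicycle')      THEN 'Active transportation'
    ELSE 'Other'
  END AS mode_group
FROM commute_data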

Numbers?

There are several thematic map types for numeric data. Examples include choropleth maps, proportional symbol maps, dot density maps, isarithmic maps, and many more. In this post, we’ll focus on choropleth maps and briefly touch on proportional symbol maps. For a detailed how-to on dot density maps in CartoDB check out this post by my colleague Stuart Lynn.

Choropleth Maps

On a choropleth map, each polygon (or enumeration unit) is colored according to its value in the data. Choropleth maps should not be used to map raw counts. Instead, they should be used to map data that are rates, proportions, percentages, etc. By normalizing your data, you avoid the pitfall of making a map that is biased towards larger areas. An un-normalized choropleth gives unfair advantage to larger areas on your map; mapping raw counts often suggests that if an area is large, it must have ‘more’ of the variable of interest.

The example below illustrates this area effect, with the total female population for each county on the left and the percent of the total population for a given county that is female on the right. You can see the map with raw totals does not give an accurate picture of female populations around the country. By normalizing the total female population by total population, we can more easily compare different parts of the country and find places that are more or less similar to one another.
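The normalization behind the right-hand map is a one-line division; a sketch assuming hypothetical female_pop and total_pop columns:

-- percent female per county; NULLIF avoids dividing by zero
SELECT *, 100.0 * female_pop / NULLIF(total_pop, 0) AS pct_female
FROM county_data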

A super useful piece of the Data Observatory Catalogue is the guidance it provides on which normalizing variable is best suited to use with which counts.

Proportional Symbol Maps

If you are mapping raw counts, proportional symbols are a good choice. For a more detailed discussion on proportional symbol maps, check out this blog post.

Tip 2: Choose an appropriate classification method

Numbers

Choropleth and proportional symbol maps can be classed or unclassed. For a classed map, choose a classification method that best suits your data’s distribution histogram. For example, if the distribution is uniform, you may want to use Equal Interval. On the other hand, if the distribution is clustered, you may want to use Jenks. Or, if you know your data well, you may decide to manually define your class breaks to best highlight the story you want to tell.

Regardless of your classification method, for classed maps (choropleth or proportional symbol), aim to have between 3 and 7 classes. Any more than 7 classes will make it difficult for your map reader to pick out important patterns and detail. That might sound counterintuitive (you are showing more detail in the data with a higher number of classes), but the more colors and/or symbol sizes used, the harder time your map reader will have distinguishing between them. The goal at all times is to reveal interesting patterns and outliers in your data.
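If you want to compute breaks yourself rather than relying on the Editor, CartoDB’s PostgreSQL extension ships binning helpers you can call from SQL. A hedged sketch using CDB_JenksBins on a hypothetical pct_rent column, assuming the two-argument form (values array, break count):

-- returns the break values for 5 Jenks classes
SELECT CDB_JenksBins(array_agg(pct_rent::numeric), 5)
FROM county_data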

Tip 3: Choose the right color palette

Categories

The most common type of symbology for category maps is to use discrete colors for each unique type in your variable of interest. An alternative to color is to explore the use of pattern fills (for polygons) or different shaped icons (for points). If using color, be sure to pick a palette that keeps a somewhat uniform level of saturation and lightness for each category. In doing so, one category is not perceived as more important than another.

Below is a map of the most common mode of transportation to work, other than by car, for each county. There are three categories: public transportation, walk to work, or tie. Each category is given a unique color. Since walk to work is the most common category, I chose a color that would act as a highlighter for the other, less common categories. By doing this, we’re able to easily see the patterns and outliers in the other two categories.

Numbers

Choropleth maps typically use sequential or diverging color palettes. The colors in a sequential palette should be ordered in such a way where your map reader can easily distinguish between high and low values. An effective sequential palette varies in lightness and saturation at each color stop. Sequential palettes can use a single hue or multiple hues.

Diverging palettes can be thought of as two sequential palettes on either side of a central, neutral color. Use a diverging palette when you want to show values above or below an interesting midpoint in your data. Lots of times you will see variables with near-normal distributions symbolized using a diverging color palette with a standard deviation classification method.

Typically, darker colors represent higher values and lighter colors lower values. If using a dark background, think about flipping this order, just make sure to update your legend!

To illustrate this idea, let’s take a look at a light and a dark version of a map showing the percent of household income spent on rent by county. On the left, the sequential palette is ‘traditionally’ ordered, with lighter colors representing lower values and darker colors higher values. We could use the same ordering for the dark version on the right, but the contrast of light against dark is stronger, so I flipped the ramp so that higher values (now using the lighter end) come to the foreground of the map.

Tip 4: Basemaps

For a lot of the thematic maps I design, I typically create a really simple basemap in the CartoDB Editor. There are times when having more detailed context is important, especially at larger scales, but if the basemap is hurting more than it’s helping, you can create one of your own. For more information on simple basemap design in CartoDB, see our blog on how to Create a Thematic Map of Current Drought Conditions.

If you use one of CartoDB’s basemaps, you have a few different options for both Positron (our light map) and Dark Matter (our dark map). The default version of both maps sandwiches thematic layers between a base layer and a label reference layer (left). This is a great option for the majority of your maps, especially when you have continuous polygons. Other options include labels underneath (middle) and a ‘lite’ version of both basemaps with no labels (right).

Basemap Styles

Tip 5: Projections

The category, choropleth, and proportional symbol maps described above are summarizing information about areas (counties). With these maps it is a best practice to use a projection that preserves area! In the example maps above, I am using Albers Equal Area centered on the US.

At larger scales, around the block group level (from the U.S. Census), sticking with the default web mercator projection is an OK choice. For more details about projections in CartoDB, see our post, Free Your Maps from Web Mercator.
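The recipe described in that post boils down to overriding the_geom_webmercator with a re-projected geometry; a sketch using the US National Atlas Equal Area projection (SRID 2163, assumed present in spatial_ref_sys) on a hypothetical counties table:

-- CartoDB renders the_geom_webmercator, so swapping it swaps the projection
SELECT cartodb_id, the_geom, ST_Transform(the_geom, 2163) AS the_geom_webmercator
FROM county_data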

We hope these tips help get you started building your thematic maps. Keep an eye out for more blogs about the Data Observatory and cartography best practices!

Happy Map Designing!

Supporting the Data Stories Podcast

Data Stories

Last week we were excited to sponsor, for the first time, Data Stories, a podcast on data visualization. We are big fans of the podcast and are very proud to support their work. Enrico Bertini and Moritz Stefaner have worked over the years to create a series of shows that explore diverse domains and ask really interesting questions about how people explore data, discover new insights, and create beautiful visualizations.

Last week’s show is titled Listening to Data From Space with Scott Hughes. In it, you will learn how Scott Hughes and his collaborators have turned complicated data about our universe into sounds they were able to learn from. It’s awesome.

If you haven’t ever listened to Data Stories before, definitely check out their show archive where you can find conversations with the likes of Michal Migurski, Rachel Binx, Eric Rodenbeck, Giorgia Lupi, and many more.

Happy data visualizing!

Better U.S. Boundaries through Shoreline Clipping


As part of our Data Observatory rollout, CartoDB is excited to announce that we are maintaining shoreline clipped versions of the USA boundaries provided by U.S. Census TIGER (Topologically Integrated Geographic Encoding and Referencing).

TIGER publishes shapefiles of some of the most important boundaries in the U.S., including U.S. states, counties, congressional districts, census tracts, and zip code tabulation areas, amongst others. As TIGER’s boundaries are a work of the United States government, they are not under copyright. They can be freely reproduced, reused, and altered. This makes them an especially valuable resource for the democratization of mapmaking: anyone can use them however they want to!

TIGER does a great job producing and maintaining files at an excellent resolution, but since 2010 they have not published high-resolution files where boundaries follow shorelines. This means that a fledgling mapmaker’s first experience with TIGER boundaries is frustrating:

Boundaries extend deep into the oceans and Great Lakes; counties and states reach across major rivers and bodies of water to touch their neighbors. While there are many statistical reasons one would want shapes to do this, they don’t work well for maps.

GIS StackExchange abounds with questions about where to get TIGER boundaries that don’t ignore the shore. The shoreline-hugging files which the census does make available are low resolution, and thus unsuitable for maps that are zoomed in.

Using the same PostGIS that’s available to every CartoDB user, we’ve processed every TIGER boundary we make available to be “shoreline clipped”. What we’ve done is taken water shapes and “clipped” them out of the original, leaving boundaries that clearly follow the shoreline. Take a look, and you’ll understand why this is a big deal:

US Counties
US States
Tracts in New York City
Congressional Districts in LA

As an added bonus, the water areas we use to clip to shoreline are TIGER’s own AREAWATER shapes. This means we can put the resulting dataset under the same extremely liberal license as regular TIGER data sources.

The reason why shoreline clipped boundaries aren’t easy to come by is that they’re surprisingly difficult to put together. Fortunately, PostGIS is a powerful tool, and all the transformations can be achieved in SQL (a condensed sketch follows the steps below):

  1. Split the “positive” land shapes, like states or counties, into smaller, simpler components by using ST_Subdivide. Some of these geometries start out quite large – hundreds of thousands of vertices – and this makes them easier to work with. Geometries with a land area of 0 in TIGER are also eliminated here. Each component is assigned a unique ID.

  2. Create a water area layer by importing several thousand AREAWATER shapefiles from TIGER and bringing them into one table, excluding minor water features (lakes and ponds of less than 1/3 of a square kilometer). This will be our “negative” layer, which we subtract from the positive.

  3. Join the water and land areas using ST_Intersects, and union the resulting positive and negative geometries by the ID from step (1). This leaves us with manageably-sized positive and negative shapes to subtract from each other. Note that this will exclude any land shapes that don’t touch water areas.

  4. Subtract the water shapes from the land shapes, leaving clipped versions with the unique IDs from step (1).

  5. Add back to these all the land shapes that didn’t touch water, which were excluded at step (3).

  6. Re-union the shapes according to their Census geoid (a unique identifier for every census boundary). This stitches the shapes back together, as they’ve been broken apart since step (1). For example, a state could have been broken into hundreds of shapes in step (1) – this would stitch those components, which since had water subtracted from them, back into one geometry.

  7. Eliminate any water features that have created holes in the original geom, using ST_ExteriorRing. Since this breaks geoms apart into their components, this is also where we eliminate pesky edge artifacts.
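A condensed sketch of the core subtraction (roughly steps 1 through 4 and 6), with hypothetical table names:

-- subdivide land into simpler parts, subtract intersecting water, then stitch back up
WITH land AS (
  SELECT geoid, ST_Subdivide(the_geom) AS geom
  FROM tiger_counties
),
clipped AS (
  SELECT l.geoid,
         COALESCE(ST_Difference(l.geom, ST_Union(w.the_geom)), l.geom) AS geom
  FROM land l
  LEFT JOIN tiger_areawater w ON ST_Intersects(l.geom, w.the_geom)
  GROUP BY l.geoid, l.geom
)
SELECT geoid, ST_Union(geom) AS the_geom
FROM clipped
GROUP BY geoid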

Our partners at Mapzen are also picking up our shoreline clipped boundaries, meaning that consumers of the Who’s On First gazetteer of places will be able to take advantage of shoreline clipping, too.

Happy data mapping!


Enjoy the CartoDB FOSS4G NA Presentation Videos


CartoDB has been participating in and sponsoring the premier conference on free and open source software for geospatial, better known as FOSS4G-NA, for several years now, and this year has been no exception!

This year we had three really fun talks at FOSS4G NA. If you have a little bit of time, enjoy these presentations and get some peeks into the future of CartoDB.

Machine Learning on Geospatial Datasets for Segmentation, Prediction and Modeling

In the first presentation, Stuart Lynn gives an engaging introduction to machine learning with a geospatial bent. The talk touches on a number of exciting research areas at CartoDB.

Beyond Mapping Population Density: Cartography for Big Data

In the second presentation, Mamata Akella talks about her work to push the boundaries of web cartography on the data layer. Much of her work is actively making its way into future CartoDB releases, so you'll get plenty of insight into features you'll see play out down the road.

Wrapping up Python into a Cloud-based PostgreSQL

In the third presentation, Stuart Lynn shows how we are using Python in-database at CartoDB to build powerful functionality directly into the platform you are already using.

Happy data mapping!

CMaps Analytics Integrates CartoDB in BI Dashboards

CMaps Analytics & CartoDB

CartoDB and CMaps Analytics are working in the same direction to make location intelligence more accessible to more people than ever before.

With a wide range of business cases and ecosystems, independent software vendors like CMaps Analytics specialize in embedding location analytics into platforms like SAP, and are taking advantage of CartoDB's platform to enhance their software solutions.

What is CMaps Analytics?

CMaps Analytics is a cloud-based JavaScript API designed to make embedding interactive maps easier. As a Google Maps Technology Partner, CMaps Analytics inherits Google Maps and injects hundreds of configurable business intelligence capabilities and over 20 data visualization layers on top. With a cloud-first approach, CMaps Analytics has built a series of extensions so customers can plug location intelligence into platforms like SAP BusinessObjects, Microsoft SharePoint, and now Google Sheets.

Now, CMaps Analytics includes four brand new CartoDB map layers that make embedding CartoDB powered analysis a breeze.

In Action

CMaps Analytics cloud APIs are used for native extensions into other platforms like SAP. Customers who invest in CMaps Analytics build interactive visualizations using their on-premise data. However, the "where" questions that customers are increasingly asking require data that does not exist in their transactional systems or BI data warehouses.

That is where CartoDB comes in, to deliver significant value:

“Before CartoDB, when a customer approached us wanting to take demographics, public, or open sourced data, or had a massive data extract from a system that is not integrated with a BI data warehouse, it was a laborsome process that required GIS,” said Evan Delodder, CMaps Analytics CTO. “Now, we can take that data and have it loaded into CartoDB, secured, and prepared as beautiful visualizations into BI reports and dashboards the same day, with minimal effort.”

Benefits

For CMaps Analytics customers who embed cloud location intelligence into on-premise business intelligence, the benefits of CartoDB center on acquiring and generating new geospatially powered insights.

The value CartoDB provides to CMaps Analytics is its powerful, flexible APIs and services for delivering interactive big data visualization layers. Currently CMaps Analytics is using CartoDB as a native data visualization layer type so customers can:

  • Address big data visualization and analysis scenarios that require lots of geospatial data
  • Centralize geospatial analysis on a platform designed for spatial analytics
  • Move from legacy GIS installations to modern cloud solutions
  • Use innovative new visualization techniques like Torque for animating time series data
  • Deliver content hosted in the cloud securely to on-premise analytics and business apps

Deeper integration and an expansion of services are planned for the near future. Learn more about CartoDB integration in CMaps Analytics.

Did you miss out on this great webinar? Please find a recording of this session below!

Happy data mapping!

Editing Vector Map Data in Mobile

Editing vector map data

A feature frequently requested by more advanced Mobile SDK users is the ability to edit and modify vector map data in a mobile app. Well, we are here to tell you that it is not only entirely possible, but that it can be done in many different ways!

There are two ways to modify/edit your vector map data with the Mobile SDK:

  • Simple: The app developer can modify vector overlay geometries programmatically. All geometry coordinates are mutable, so code can change them and the modifications are reflected on the map automatically.

  • Advanced: The developer can define that a layer is editable, which allows objects on the map to be moved around, vertices of polygons and polylines to be changed, and even new geometries to be drawn.

For the more advanced technique, the Mobile SDK has a special layer type called "Editable Vector Layer," which is defined on top of a data source. Data sources can be files, your online API, or in-memory data. You can retrieve objects in this layer, save changes, define what to do when a corner point or a whole object is clicked, and decide how to highlight objects being edited. You can define and customize both sides of the solution—backend data storage as well as user experience. We provide you with a ready-made connection to the CartoDB API as a backend and web solution for your map data, but you can also connect it to Shapefiles, GeoPackage databases, ArcGIS Server APIs, or to your own custom servers.

Here is a short video of polygon editing on top of an aerial image:

NOTE: The Editable Vector Layer is provided as a GIS extension. Please contact us about how to get an extension, and we will provide you with a special sample app. Remember that our SDK is not a full-blown GIS client—it is a flexible developer toolkit to visualize map data. You can build a complete app around it yourself, or ask us for an intro to experienced partners who build ready apps.

P.S. These and many other Mobile SDK features will be shown in a webinar on June 29. RSVP today!

Happy mobile data mapping!

Creating Demographic Segments

segments

In a previous blog post we announced our demographic segmentation service as part of the Data Observatory. In today’s post we will discuss how we generate these segments and how we went about giving them names.

Finding Segments in the Census

Each one of us is a precious individual snowflake of data… but if you look around your neighborhood you will start noticing similarities to your fellow humans. You might all be roughly the same age, have the same income, drive to work or take the subway. There are patterns in groups of people everywhere you look.

Luckily, we can train computers to pick out these kinds of groupings, or clusters, of people. We can take each census tract and a selection of the census variables which describe it, then, using a method called K-means clustering, identify groups of tracts that are statistically similar to each other.

To see how this might work, let's consider a simple example. Imagine we collected data on people's ages and the probability they own a record player. We plot the data and it looks something like this:

clusters

As humans it’s really easy for us to pick out that there are three clusters of people. K-means attempts to find these clusters programmatically. It does this by:

  • Randomly guessing where the center of each cluster is
  • Finding all the points closest to each cluster center
  • Moving each cluster center to the mean location of the points that belong to that cluster
  • Repeating until the cluster centers don't move much between iterations

The end result of this process is to label each point on the graph 1, 2 or 3 depending on which cluster it belongs to. If we color the points by the labels k-means gave them, the data looks like this:

clusters with labels

Awesome: the algorithm has done programmatically what we as humans do instinctively. This simple example is trivial and we didn't need k-means to find the clusters, but what if we had 150 different variables to sift through and wanted to find 55 independent clusters, as we do with the census? Then it's essential to use an algorithm.
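
To make that loop concrete, here is a minimal sketch of a single k-means iteration in plain PostgreSQL, assuming hypothetical tables points(id, x, y) and centers(k, cx, cy); you would repeat it until the centers stop moving:

```sql
-- One k-means iteration: assign points to their nearest center,
-- then move each center to the mean of its assigned points.
WITH assignment AS (
  SELECT DISTINCT ON (p.id) p.id, p.x, p.y, c.k
  FROM points p
  CROSS JOIN centers c
  ORDER BY p.id, (p.x - c.cx)^2 + (p.y - c.cy)^2
)
UPDATE centers c
SET cx = m.mx,
    cy = m.my
FROM (
  SELECT k, avg(x) AS mx, avg(y) AS my
  FROM assignment
  GROUP BY k
) m
WHERE c.k = m.k;
```

The assign-then-average loop really is all there is to k-means; real libraries just add smarter initialization and convergence checks.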

Naming clusters

Unfortunately, the algorithm can't determine meaningful names for these clusters; that's up to us. In this example we might decide to call the yellow cluster 'young hipsters,' the blue cluster 'parents with mp3 players,' and the green cluster 'original record player owners.' These names are subjective, but informed by our intuition and the data.

Census Clusters

Applying the procedure outlined above allows us to segment the census into neighborhoods that fall into one of 55 different clusters. After many hours of staring at plots of the census variables in each cluster, we gave each one a name. No doubt some of these names can be improved, and we are going to keep working on more accurate descriptions of these neighborhoods, but we wanted to set you loose on them early.

To get a feel for just how diverse a place the U.S. is, here are the neighborhood segments for multiple U.S. cities:

Chicago / New York

Seattle / San Francisco

You can explore the 55 segments in more detail using this deep insights dashboard:

For more details on how this all works in practice, check out this paper from Spielman and Singleton, which our segments are based on. Also check out their blog post about segmentation in CartoDB.
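
If you'd like to pull these segment labels into your own tables, the Data Observatory's category functions are the place to look. Here is a hedged sketch; the category identifier below is an assumption, so check the Data Observatory catalog for the exact name:

```sql
-- Hedged sketch: look up the demographic segment at a point.
-- 'Spielman-Singleton Segments: 55 Clusters' is an assumed identifier;
-- consult the Data Observatory catalog for the exact one.
SELECT OBS_GetCategory(
  CDB_LatLng(40.689, -73.944),
  'Spielman-Singleton Segments: 55 Clusters'
);
```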

We are working hard to generate segmentations for countries outside of the U.S. Keep an eye on the blog and CartoDB to find out when we will be launching these.

Until then happy data mapping!

Get the best of your National Mapping Agency into CartoDB


Note: This post has been written in collaboration with Antonio F. Rodríguez Pascual, Deputy Assistant Director of the Spanish Centre for Geographical Information.

National Mapping Agencies (NMAs) are organizations with lots of data to offer to citizens and organizations: thematic data, but also Reference Data (data used to georeference other information), generally known as basemaps. They are great producers of data and do their best to put it into people's hands, normally using two approaches: direct data download or standard services. Standard services are the preferred way to publish rendered maps, using the well-known Web Map Service (WMS), which works both for vector maps and for serving imagery. The tiled version of this service is called Web Map Tile Service (WMTS), and its use is becoming more popular as client applications implement it.

With CartoDB it is straightforward to add a WMS/WMTS service as a basemap. For example, we can take the official Spanish 1:25,000 WMTS basemap by the IGN, the Spanish National Mapping Agency (Instituto Geográfico Nacional), and add it to CartoDB simply by putting the URL http://www.ign.es/wmts/ign-base in the WMS/WMTS section of the custom basemap dialog.

Add a WMTS basemap

There are plenty of great services to be used as basemaps, including this map created to compare current Madrid neighborhoods with the shape of the city in 1875. Many more can be found for Spain at the official services directory of the national Spatial Data Infrastructure.

Another approach is using NMA data directly with your CartoDB account. Permission to use this data will vary depending on the license of the datasets, from public domain to restrictions like non-commercial use. Always check your dataset's license! If the dataset is offered directly in a format CartoDB supports (like a zipped shapefile), you can create a Synced Dataset that will be checked periodically and updated automatically when needed. Unfortunately, data is not always directly available: you may need to complete a form to download it, or to process it before uploading it to your account.

Regarding data licensing, some NMAs are publishing their data under open conditions. We usually understand Open Data as data under CC-BY, CC-BY-SA, and similar licenses. It is not easy to know what the situation is in each country; the best effort in this direction is the annually updated web page by the Open Knowledge Foundation about Open Data around the world, which includes a very interesting section about maps.

OKFN National Map section

For example, Canada, the United States, Mexico, Denmark, Norway, and recently Spain have partially or totally opened their official cartography. In the case of Spain, the National Geographic Institute declared all of its digital data products Open Data in December 2015, and is publishing them under a CC-BY 4.0 or equivalent license. It is generally agreed worldwide that this is a good idea, because open geospatial data acts as an engine for development. However, sometimes NMAs are obliged to follow a restrictive legal framework and/or a business model based on self-financing.

In any case, it is important to stress that when you reuse others' geographic data, it is essential to maintain attribution, recognizing the original authorship and copyright of the data you are using. This is the first requirement of any data producer. It also lets you take advantage of the reliability and prestige of the base data you are using in your product.

To do this in CartoDB, remember to fill in the Attribution box in your tables' metadata so your maps correctly attribute your sources. For data coming from the CartoDB Data Library, this is set up automatically for you.

Attribution on CartoDB maps

Happy open data mapping!
