Creating Demographic Segments

In a previous blog post we announced our demographic segmentation service as part of the Data Observatory. In today’s post we will discuss how we generate these segments and how we went about giving them names.

Finding Segments in the Census

Each one of us is a precious individual snowflake of data… but if you look around your neighborhood you will start noticing similarities to your fellow humans. You might all be roughly the same age, have the same income, drive to work or take the subway. There are patterns in groups of people everywhere you look.

Luckily we can train computers to pick out these kinds of groupings, or clusters, of people. We can take each census tract and a selection of the census variables which describe it, then, using a method called k-means clustering, identify groups of tracts that are statistically similar to each other.

To see how this might work, let’s consider a simple example. Imagine we collected data on people’s ages and the probability that they own a record player. We plot the data and it looks something like this:

As humans it’s really easy for us to pick out that there are three clusters of people. K-means attempts to find these clusters programmatically. It does this by:

Randomly guessing where the center of each cluster is
Assigning each point to the cluster center closest to it
Moving each cluster center to the mean location of all the points that belong to that cluster
Repeating the previous two steps until the cluster centers don’t move much between iterations
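The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code behind the Data Observatory segments. The function names (`sq_dist`, `assign`, `kmeans`) and the toy age and record player data are made up for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        for p in points
    ]


def kmeans(points, k, max_iters=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Randomly guess the centers by picking k of the data points.
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        # Assign each point to the closest center.
        labels = assign(points, centers)
        # Move each center to the mean location of its points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                # A center with no points stays where it is.
                new_centers.append(centers[c])
        # Stop once the centers barely move between iterations.
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return assign(points, centers), centers


# Made-up toy data: (age, probability of owning a record player).
data_rng = random.Random(42)


def blob(age, prob, n):
    return [
        (age + data_rng.uniform(-3, 3), prob + data_rng.uniform(-0.05, 0.05))
        for _ in range(n)
    ]


points = blob(25, 0.8, 30) + blob(45, 0.2, 30) + blob(70, 0.9, 30)
labels, centers = kmeans(points, k=3)
for index, center in enumerate(centers):
    print(index, center, labels.count(index))
```

Because the starting centers are random, a single run can settle on a poor grouping, so k-means is usually run several times from different starting points and the best result kept. Note also that age dominates the distance in this toy data because the two variables are on very different scales; real inputs are often rescaled first.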

The end result of this process is to label each point on the graph a 1, 2 or 3 depending on what cluster it belongs to. If we color the points by the labels k-means gave them the data looks like this:

Awesome, the algorithm has done programmatically what we as humans do instinctively. This example is trivial, and we didn’t need k-means to find the clusters. But what if we had 150 different variables to sift through and wanted to find 55 independent clusters, as we do with the census? Then it’s essential to use an algorithm.

Naming Clusters

Unfortunately the algorithm can’t determine meaningful names for these clusters; that’s up to us. In this example we might decide to call the yellow cluster ‘young hipsters,’ the blue cluster ‘parents with mp3 players,’ and the green cluster ‘original record player owners.’ These names are subjective but informed by our intuition and the data.
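One way to let the data inform a name is to average each variable within each cluster and look at what stands out. The snippet below is a rough sketch of that idea, not the process we actually followed. The helper `cluster_profiles`, the variable names and the six data rows are all invented for illustration.

```python
from collections import defaultdict


def cluster_profiles(rows, labels, names):
    """Average every variable within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {
            name: sum(column) / len(members)
            for name, column in zip(names, zip(*members))
        }
        for label, members in grouped.items()
    }


# Made-up toy data: (age, probability of owning a record player),
# with cluster labels as a clustering algorithm might assign them.
rows = [(24, 0.8), (26, 0.9), (44, 0.1), (46, 0.2), (70, 0.9), (72, 0.8)]
labels = [0, 0, 1, 1, 2, 2]

profiles = cluster_profiles(rows, labels, ["age", "record_player_prob"])
for label, profile in sorted(profiles.items()):
    print(label, profile)
```

A cluster whose average age is low and whose ownership probability is high would be a candidate for the ‘young hipsters’ label, but the choice of name remains a judgment call.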

Census Clusters

Applying the procedure outlined above allows us to segment the census into neighborhoods that each fall into one of 55 different clusters. After many hours of staring at plots of the census variables in each cluster, we had a go at naming them. No doubt some of these names can be improved, and we are going to keep working on getting more accurate descriptions of these neighborhoods, but we wanted to set you loose on them early.

To get a feel for just how diverse a place the U.S. is, here are the neighborhood segments for multiple U.S. cities:

Chicago, New York

Seattle, San Francisco

You can explore the 55 segments in more detail using this deep insights dashboard:

For more details of how this all works in practice, check out this paper from Spielman and Singleton, which our segments are based on. Also check out their blog post about segmentation in CartoDB.

We are working hard to generate segmentation for countries outside the U.S. Keep an eye on the blog and CartoDB to find out when we will be launching these.

Until then, happy data mapping!
