
No data? No problem with the Data Observatory

As you might have already heard, the Data Observatory just launched to provide CartoDB users with a universe of data. One of the reasons we built the Data Observatory is that getting the third-party data you need is often the hardest part of analyzing your own data. Data wrangling shouldn't be such a big roadblock to mapping and analyzing your world.

Some of the most desirable and most used datasets are also the hardest to handle. Datasets like the US Census are rich with information but can be large and distributed. What most people want, though, is just a snapshot of some boundaries and particular measures. For instance, if you were telling a story about unemployment in a city, you might want to grab Census Tracts for that city, the number of unemployed people for each tract, and the number of people 16 and over for each tract (as the denominator). All of this is no easy feat given the size of the datasets provided by the US Census.

For today's post, I want to look at how we can use CartoDB's Data Observatory to answer questions when we don't have any data at all. Further, I want to focus on just the region I'm interested in, and avoid spending time working with data and geometries that won't be used in my analysis.

Let’s explore San Francisco’s current housing price boom.

We'll pull Zillow's real estate data for San Francisco, available in the Data Observatory. For the boundaries, we will work with the US Census-generated ZIP Code Tabulation Area (ZCTA) boundaries, which correspond to ZIP codes for most purposes. To go beyond displaying a single measure, we will pull the most recent median home value per square foot and compare it to the same locations from three years ago to show the explosive growth in the region.

Getting Appropriate Boundaries

To get started, we'll need the boundaries that we're interested in visualizing. Depending on the scale of your map (national, state, county), you will want different geometries. Zillow provides median home values for a handful of geographical regions, the most convenient of which is ZIP codes.

To find the available boundaries, we can use the function OBS_GetAvailableBoundaries, which lists the IDs of all boundaries available through the Data Observatory at the location we need them. Since we're focused on San Francisco, we just pass the approximate location of the city and filter for descriptions that contain the word 'zip', like so:

SELECT *
FROM OBS_GetAvailableBoundaries(CDB_LatLng(37.7749, -122.4194))
WHERE description ILIKE '%zip%'

The results look like this:

[Screenshot: all available boundaries at a location]

This produces a temporary table of all the boundaries we can use that intersect that point. We want the ZCTA boundary since we're focused on ZIP codes. I'll also opt for the water-clipped version since we don't need to do any analysis with the geometries; we are only using them for visualization this time. The boundary_id that we need is us.census.tiger.zcta5_clipped. You can read what these boundaries are straight from the ID: they come from US Census TIGER geometries, they are ZCTA boundaries, and they are clipped (meaning water-clipped rather than the statistical versions).
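
If you'd like to see the clipped and unclipped variants side by side, you can filter the same function call on the ID itself (a quick sketch; it assumes, as in the results above, that the function returns boundary_id and description columns):

SELECT boundary_id, description
FROM OBS_GetAvailableBoundaries(CDB_LatLng(37.7749, -122.4194))
WHERE boundary_id ILIKE '%zcta5%'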

Now that we have our boundary ID, we can get our boundaries with the Data Observatory function OBS_GetBoundariesByPointAndRadius, which requires a center, a radius (in meters), and the boundary ID we're interested in. I'll populate an empty table called sf_housing, which has a column for geometries (the_geom) and another for storing the text ID of each geometry.
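
If you don't already have that table, here's a minimal sketch for creating it from the SQL tray (the geometry type and the CDB_CartodbfyTable call are assumptions based on a standard CartoDB setup; you can also just create an empty table from the dashboard):

CREATE TABLE sf_housing (
  the_geom geometry(Geometry, 4326),  -- boundary polygons
  geoid text                          -- text ID of each boundary
);
SELECT CDB_CartodbfyTable('sf_housing');  -- register the table with CartoDB so it appears in your dashboard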

INSERT INTO sf_housing (the_geom, geoid)
SELECT *
FROM OBS_GetBoundariesByPointAndRadius(
  CDB_LatLng(37.7749, -122.4194),  -- San Francisco, CA
  25000 * 1.609,                   -- 25 mile radius in meters (1 mile ≈ 1,609 m)
  'us.census.tiger.zcta5_clipped'  -- use water-clipped geometries for visualization
)

Checking out our table, we’ll find it filled with all of the ZCTA boundaries in the region we’re interested in, but there’s no data for the geometries yet!
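
If you'd rather check from the SQL tray than the table view, a one-line count does it (a sketch):

SELECT count(*) AS zcta_count
FROM sf_housing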

Getting Housing Value Data

Our next step is getting the housing price data for each of the ZIP code areas. The Data Observatory makes that possible through the wonderfully versatile OBS_GetMeasure. The Data Observatory splits the work into these two pieces (boundaries and measures) so that you have complete flexibility over which data you blend and which data you ignore. With hundreds of measures available and more on the way, this flexibility keeps your work easy.

To find measures of interest, we'll use the OBS_Search function through CartoDB's SQL tray, like so:

SELECT *
FROM OBS_Search('zillow')

In CartoDB, it will look like this:

[Screenshot: all measures that mention Zillow]

This gives back several IDs of interest, but let's choose us.zillow.AllHomes_MedianValuePerSqft. This entry is just what it says: median home value per square foot for "All Homes" (i.e., single-family, condominium, and co-operative homes). We'll use the most recent timestamp (March 2016).
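
Before pulling anything, we need somewhere to put the values. A one-line sketch, with a column name of our choosing that the next query reuses:

ALTER TABLE sf_housing ADD COLUMN median_value_per_sqft_2016 numeric;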

We'll put all of this information into OBS_GetMeasure and populate the new column with the results:

UPDATE sf_housing
SET median_value_per_sqft_2016 = OBS_GetMeasure(
  the_geom,                                -- specify the place
  'us.zillow.AllHomes_MedianValuePerSqft', -- specify the measure
  'area',                                  -- specify normalization (this gives back the raw measure)
  'us.census.tiger.zcta5',                 -- specify the level of the geometries
  '2016-03'                                -- specify the point in time
)

Visualizing this in CartoDB, we get the following map:

Now we can see the median home value per square foot visualized on a map, and we didn't have to track down the datasets to do so!
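
If you want to eyeball the numbers behind the map, a quick ranking straight from the SQL tray works too (a sketch using the column we just populated):

SELECT geoid, round(median_value_per_sqft_2016) AS usd_per_sqft
FROM sf_housing
ORDER BY median_value_per_sqft_2016 DESC NULLS LAST
LIMIT 10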

Calculating Historical Changes

Now that we have recent housing price data for San Francisco, let's compare it to data from March 2013 to look at the change in home value over time. To do this, we divide the current price by the historical price to get the ratio of change. Subtracting one from this ratio and multiplying by 100 gives the percentage change of 2016 as compared to 2013. For example, a ZIP code that went from $500 to $700 per square foot has a ratio of 1.4, and (1.4 - 1) * 100 = 40% growth.
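
As before, we first add a numeric column to hold the result (a one-line sketch; the column name matches the query below):

ALTER TABLE sf_housing ADD COLUMN percent_change_median_value_2016_2013 numeric;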

With the new column in place, we can accomplish this with the Data Observatory as follows:

UPDATE sf_housing
SET percent_change_median_value_2016_2013 = 100.0 * (
  median_value_per_sqft_2016 /
  OBS_GetMeasure(
    ST_PointOnSurface(the_geom),             -- a point guaranteed to fall inside the polygon
    'us.zillow.AllHomes_MedianValuePerSqft',
    'area',
    'us.census.tiger.zcta5',
    '2013-03'                                -- same measure, three years earlier
  ) - 1
)

Now that we have our data, let's map it! Just like that, we have a pretty cool map of housing value change in SF, starting from an empty dataset.
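
For a quick read on which ZIP codes heated up the most, you can also rank them directly in the SQL tray (a sketch using the columns we created above):

SELECT geoid, round(percent_change_median_value_2016_2013, 1) AS pct_change
FROM sf_housing
ORDER BY percent_change_median_value_2016_2013 DESC NULLS LAST
LIMIT 10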

We went from having no data at all to a value-added map in under five minutes. And this is just the beginning of what can be done with the Data Observatory.

What’s next?

We hope you are starting to see what the Data Observatory makes possible. We see it as a source of enrichment and a way to make your analyses in CartoDB more powerful.

There's so much more to explore, so watch our blog and Twitter for more. Also, read our documentation and check out our catalog of available data to get started.

What maps and datasets will you make from an empty table? Share your “something from nothing” maps with us!

Happy data mapping!

