
CartoDB Welcomes Ambassador Chris Woods


At CartoDB we do our very best to spread the word on how anyone around the globe can leverage data and visualizations to make better, faster decisions. When great mappers make maps and discover insights through location intelligence, we like to reward them.

We are proud of our carefully selected and very talented CartoDB Ambassadors. CartoDB Ambassadors are deeply involved in the mapping community and are inspirational educators who use CartoDB technology to empower future mappers. Ambassadors play an essential role in our mission of democratizing location intelligence by producing amazing maps using open data and educating others about the power of CartoDB.

Our Ambassadors Program identifies exceptional individuals who are engaged with their local community as well as mappers across the globe, harnessing the power of data to create positive change. These mappers gain recognition for their influential visualizations and achieve increased visibility as their projects are promoted by our team.

We would like to celebrate our newest Ambassador, Chris Woods, and congratulate him on the impact his visualizations have had in the UK, as well as his contributions to the global data mapping community.

Chris comes from a strong IT background and was inspired to map because he wanted to see how the power of emerging geographical visualization platforms could be used to bring new insights into the wave of public data being published by the British government and others.

One of his most notable CartoDB mapping projects is for the Campaign For Better Transport. The interactive visualizations show how rail usage in the UK has evolved over nearly two decades. The data-driven maps let viewers absorb large amounts of data, instantly grasp the national picture, and zoom in on the area where they live and work.

“The fact that the map shows 45,000 data points in a way that can be quickly understood is astonishing,” says Chris. “When these abilities are combined with a map that is animated, attractive, and recognizable you can see why it’s been popular.”


The map was published by BBC News and by Scotland’s national newspaper, The Scotsman, and has received over 75,000 views to date!

Why use CartoDB and participate in the Ambassadors Program?

Like other Ambassadors, Chris was drawn by the ability to create attractive, engaging, and easily understood visualizations of detailed data that would be otherwise dry and difficult to understand.

“I was drawn to the Ambassador’s Program for two reasons: first, I wanted to be more visible in the geo visualisation world; second, to be able to use more advanced features such as the Twitter Firehose and linking dynamically to data outside of CartoDB.”

We can expect more insightful visualizations from Chris as he engages in talks with the Campaign For Better Transport about further mapping projects.

If you would like to learn more about the program or if you are interested in applying, please visit our Ambassadors Program page.

Also, be sure to check out our gallery of amazing maps created using CartoDB!

Happy data mapping!



US Census + Machine Learning to map entirely new populations using CartoDB


This map shows the distribution of people in the US who have a problem with both their sight and their hearing. It was made using US Census data, but if you go looking for sight and hearing data in your copy of the US Census files, you aren’t going to find them. How did we make it? In this post we will take you through a quick experiment that uses a pinch of public use microdata, a smidgen of machine learning, and some inspiration from our friends over at Enigma to make the US Census reveal patterns we’ve never seen before.

The US Census is an amazing data project. It contains incredible details of people’s lives right down to the tiny block level. Used correctly, either by itself or in tandem with other data, it can give us amazing insights into a wide variety of problems.

While the attributes in the US Census data allow us to see many dimensions of the US population, for some questions the packaged data isn’t enough on its own. One example where significant work is required to leverage the power of the US Census is an area called segmentation. Segmentation is the division of a population into different groupings, letting us understand the distribution of those groups across the US. The ability to perform segmentation on US Census data is valuable to everyone from non-profits and expanding businesses to sales teams and election campaigns.

Introducing Public Use Microdata

We are going to try to predict the joint probability of a person in the US population having both a hearing problem and a vision problem. We can get raw counts of these two independent problems, as reported by a limited number of people, from the US Census’s Public Use Microdata Sample, or PUMS for short. The PUMS is provided at a coarse geospatial scale of areas called PUMAs.

We extract from the PUMS the fraction of people in each PUMA who reported:

  1. A vision problem but no hearing problem
  2. A hearing problem but no vision problem
  3. Both a hearing and vision problem
  4. No problems with hearing or vision

Using the distributions of 1 and 2 above, we are able to calculate for each PUMA what proportion of 3 and 4 exist. The following map shows the number of people in each PUMA who have both a vision and a hearing problem. While the map is interesting already, we really wish we had the same data at a finer scale.
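As a rough sketch of how those fractions can be pulled out of the PUMS person records with pandas (DEAR and DEYE are the ACS hearing- and vision-difficulty flags and PWGTP the person weight; the file name is illustrative):

import pandas as pd

# One row per surveyed person; DEAR/DEYE: 1 = has difficulty, 2 = does not
pums = pd.read_csv("pums_persons.csv",  # illustrative file name
                   usecols=["ST", "PUMA", "DEAR", "DEYE", "PWGTP"])
pums["hearing"] = pums["DEAR"] == 1
pums["vision"] = pums["DEYE"] == 1

def shares(g):
    w = g["PWGTP"]
    total = w.sum()
    return pd.Series({
        "vision_only":  w[g.vision & ~g.hearing].sum() / total,
        "hearing_only": w[~g.vision & g.hearing].sum() / total,
        "both":         w[g.vision & g.hearing].sum() / total,
        "neither":      w[~g.vision & ~g.hearing].sum() / total,
    })

fractions = pums.groupby(["ST", "PUMA"]).apply(shares)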

Upsampling US Census data through Machine Learning

A nice feature of the US Census data is that it provides many different statistical areas at different spatial resolutions. For our analysis, we want to determine the rate of our target population at the spatial scale of the Census Block Group. Census Block Groups are nice and small and have a ton of valuable dimensions published by the US Census. Those dimensions are the key to our ability to upsample our data.

Many of the same dimensions published for Census Block Groups can be found in the PUMA data as well. This linking across scales allows us to create the models that perform our upsampling. The way it works is that first we train a predictive model for a target variable based on the known inputs at the PUMA scale. We can then feed our model inputs at the Block Group scale and create new outputs at our desired scale.

Determining how our new target variables (sight and hearing) relate to the other PUMS values would be a Herculean task for most humans. If we tried, we might suspect that they correlate with age and perhaps income, but the precise nature of the relationship is bound to be complex and non-linear. This is where machine learning lets us perform a task that might otherwise be impossible.

We won’t go into too many details in this blog post, but if you are interested in the models we created, check out our ipython notebook here. The basic procedure is to set up a neural network that takes in a vector containing all the summary table information and produces a vector of the 4 probabilities we want to compute. The algorithm learns the relationship between the inputs and outputs. Put simply, the model aims to predict the likelihood of hearing and vision problems from all the other attributes of the census data.
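As a hedged sketch of that setup with scikit-learn (the notebook linked above contains the real model; the synthetic data here just stands in for the PUMA feature matrix and the four target fractions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2300, 40))  # stand-in: one row of summary-table columns per PUMA
y = rng.uniform(size=(2300, 4))  # stand-in: the four target fractions per PUMA

# Hold some PUMAs back so we can test the model on areas it never saw
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000))
model.fit(X_train, y_train)
print("R^2 on held-out PUMAs:", model.score(X_test, y_test))

# The same feature columns exist at Block Group scale, so upsampling is just:
# block_group_predictions = model.predict(X_block_groups)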

After training the model we want to check that it is capable of accurately predicting what we trained it to predict. To do this we simply have the model predict the values for a handful of PUMAs that we held back from it during training. If the predicted values are close to the known ones, then the model is doing a good job. The following graph shows a dot for each of the test PUMAs, with the y-axis showing the predicted value and the x-axis the known value. The closer a point lies to the red line, the better the model did at predicting it. We do pretty well, only being off by about 0.05% at most.

Armed with a model that we trust, we can now do the fun stuff. While we trained our model on inputs from the PUMA data, those attributes are the same ones we can find at the Census Block Group resolution. When we apply our model to the Census Block Groups, we are able to produce the first map of this population across the entire US.

Conclusions and future steps

Applying machine learning to US Census data to define new variables is a really interesting approach. We think there are lots of cases where it can be incredibly useful and powerful. In this example we showed how we can use data from one US Census dataset to predict that variable at smaller scales. The method isn’t limited to moving across the scale dimension. In fact, it isn’t limited to staying within the US Census data at all. Just take a second to imagine this method being used to predict the locations of your potential customers, your possible donors, or the likely voters you need to reach. This method of on-demand segmentation opens up a world of possibility.

If you are interested in how you can start using the US Census Data directly in your CartoDB accounts, or if you want to learn how to do advanced analysis on CartoDB, get in touch!

Happy data mapping!

Routing and Path Sectionals


You might have seen the blog post we produced a little while ago looking at the L train closure and the possible impact it might have on L commuters to Manhattan. One of the most visually striking elements of that post were the maps where we showed walking routes for people in Brooklyn to their nearest L stop. We did it a few times: once, like above, showing all people in Brooklyn, and a second time showing cumulative walking corridors for people who live on the L. You might have wondered how we made those lines, so in this post we will show you how.

Plan of attack

One of the aims of our blog post about the L line closure was to show the large number of people that would be impacted by the closure of the tunnel. We could have shown this as a choropleth, simply coloring regions of the map brighter where there were more affected commuters, but that loses the sense that these are people with lives and jobs who are being impacted. Instead, we decided to convey this information by producing flow lines of the routes commuters would take to their nearest L train stop. Being able to imagine yourself walking to your own subway stop helps connect you with those who are going to be affected by the closure.

To produce maps of the flows of drivers, pedestrians, and other commuters, we needed to know three things:

  1. How many people from a given area were affected?
  2. Which station did known Manhattan commuters likely walk to?
  3. What is their most likely walking route to that station?

Getting the data

To understand how many people would be affected by the closure, we turned to the US Census. The Census is a treasure trove of information on the kinds of people in the US and how they live and work. It has a column which asks people how they typically commute to work. While the column has a few caveats (e.g. people could only reply to the question with a single transportation method), we decided that it could give a reasonable idea of the relative numbers of people who would be walking from each census block.

With our subway commuters in hand, we needed to filter them to only include people who commute to Manhattan. To get this number we use the LODES dataset, which describes where people in the US commute to work. It tells us, for each region, the number of people who work in each other region. Using some simple math, we can combine our two values and uncover all the people who likely take the subway from Brooklyn to Manhattan for work. Below we map our results out:

Finally, we filtered our data to only the blocks where people likely walked to the L (based on distance) for their commute. This would have been enough to tell the story we wanted to tell, but it lacked the visual impact we wanted. So we busted out some routing.
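Conceptually, the LODES combination and the filtering boil down to a join you can run through the CartoDB SQL API; here is a hedged sketch (the account, table, and column names are illustrative):

import requests

sql = """
SELECT c.block_geoid, c.the_geom,
       c.subway_commuters * l.manhattan_jobs / NULLIF(l.total_jobs, 0)::float
         AS subway_to_manhattan
FROM census_commute c
JOIN lodes_flows l ON l.home_geoid = c.block_geoid
WHERE c.nearest_stop_line = 'L'  -- keep only blocks whose nearest stop is on the L
"""
resp = requests.get("https://YOUR_ACCOUNT.cartodb.com/api/v2/sql",
                    params={"q": sql, "api_key": "YOUR_API_KEY"})
rows = resp.json()["rows"]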

Routing

To show the daily impact on street traffic of the L train closure, we decided to map the walking routes of those commuters. Routing isn’t available within the CartoDB interface yet, so we turned to Valhalla, the excellent routing solution produced by our friends at Mapzen.

Valhalla has some nifty features for finding the best route between two points, including a choice of travel mode. In our case we wanted to map out pedestrian routes. We wrote a little python package to take a list of origin and destination latitude/longitude pairs and get back the Valhalla-generated linestring, travel distance, and travel time.

If you want to try it, you can install it via:

pip install valhallapy

You can point this script at Valhalla’s servers, include your API key, and let it trundle through all the data you want to route.
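If you want to see what a package like this does under the hood, Valhalla’s route endpoint takes a small JSON payload; a minimal sketch with requests (the host and coordinates are placeholders, and the returned shape is an encoded polyline you still need to decode):

import requests

payload = {
    "locations": [{"lat": 40.7146, "lon": -73.9565},   # origin: a Brooklyn block
                  {"lat": 40.7172, "lon": -73.9569}],  # destination: an L stop
    "costing": "pedestrian",                           # walking routes
}
resp = requests.post("https://valhalla.example.com/route", json=payload)
trip = resp.json()["trip"]
print(trip["summary"]["time"], "seconds over", trip["summary"]["length"], "km")
shape = trip["legs"][0]["shape"]  # encoded polyline for the walking linestring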

Alternatively, if you have a one-off job that requires a large number of routes, like we did, you can use the docker container here to quickly and easily run a version of Valhalla locally on your computer.

Once all the routes have been generated, the script spits them out as a GeoJSON file that is ready to be uploaded to CartoDB. If we simply plot those lines we get the following map. This is awesome, but with so many of the routes from different census blocks overlapping, we have no way of really gauging the number of people along each route.

Combining overlapping segments

To get around this problem we need to segment the linestrings, breaking them into straight segments with no turns. To do this we wrote a PL/pgSQL function which you can grab here:

Just copy and paste that segment_line function into your CartoDB SQL pane and it will be available in your account. Running the function on your geometry column will split the lines into separate parts, each with the number of commuters and the travel time along that route.

Now we just have to aggregate the number of commuters along each line segment, and the max travel time along that segment, by running the query in that gist. Having those two variables is perfect, as it gives us two dimensions with which to style the routes.
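The aggregation itself is a plain GROUP BY over the segment geometries; a hedged sketch of the kind of query in the gist (table and column names are illustrative):

# Run in the CartoDB SQL pane, or via the SQL API as above
sql = """
SELECT the_geom_webmercator,
       SUM(commuters)   AS total_commuters,
       MAX(travel_time) AS max_travel_time
FROM l_route_segments
GROUP BY the_geom_webmercator
"""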

Styling and bringing it all together

Let’s bring it all together by writing some CartoCSS. We are going to style the thickness of each line by the number of commuters along it, and the color of the line by the maximum commute time of any commuter along that segment. Here is a sketch of the kind of CartoCSS you could use (the column names follow the aggregation above and are illustrative):
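/* line width scales with commuters, color with the longest commute time */
#routes {
  line-opacity: 0.75;
  line-width: 0.5;
  line-color: #3E7BB6;
  [total_commuters > 250]  { line-width: 1.5; }
  [total_commuters > 1000] { line-width: 3; }
  [max_travel_time > 900]  { line-color: #F1A340; }
  [max_travel_time > 1800] { line-color: #D7301F; }
}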

Bringing that all together, we get the following map, which is really powerful and impactful! We also produced this map showing the walking commute of every subway rider to their nearest subway stop.

We’re excited to find out what is possible with all the work that Mapzen has been doing. Look out for more experiments soon! In the meantime, let us know if you try this kind of analysis yourself.

Have fun with routing and stylizing routes!

Automatically Detecting Areas of Interest


One of the most common reasons people make maps using CartoDB is to explore the patterns in their data in order to make better decisions or to communicate interesting ideas. This exploratory approach to location intelligence can be difficult, but armed with the right analytical workflows and cartographic approaches, people can create incredible new value simply by using a map. In today’s blog post, I’m going to introduce a lesser-known method that can help you find interesting patterns in polygon data that a simple thematic map (e.g. a choropleth) alone may allow you to overlook.

Choropleths have to be one of the most popular thematic map types. One form of choropleth you are almost certainly familiar with right now is the election map, showing the percent win of various candidates in elections (learn how to make one over on the Map Academy). Take a look at the one below from a recent workshop on the topic; the particulars of the data don’t matter to this story yet.

Choropleths can tell us a lot about the geographical distribution of the data we are mapping, but sometimes they lack certain information that can be critical to gaining insights and extracting value. For example, depending on the selection of your color bins and ramp, a choropleth may highlight or obscure key attributes such as the average value or the shape of the data’s distribution. Still, choropleths allow us to explore the distribution of data on a map. Trends easily become clear, such as the Midwest being a strong region for Republican candidates, while urban areas and more populated states tend to vote more Democratic.

What we often find with choropleths is that they can be good for quickly learning about your data, but they become better the more you already know. So how can you learn more about the patterns in your data than a choropleth alone can show you?

Exploring the Census

Let’s visualize American Community Survey data from the United States Census, focusing on just a couple of variables. Here we’re looking at the ratio of the number of people who ‘worked at home’ to the number of workers 16 years and older:

As you can easily see, there are dozens of counties in the Upper Midwest, west of Minnesota and Iowa, where people reported high rates of working from home (WFH). From this map, we can infer that there is a change in work environments north and west of the line from Dallas to Chicago, with values in the double digits to the west and north, and values around or below the mean of 4.7% to the south and east of this line. There are also several outliers scattered throughout the country, easily seen as dark spots.

Some of the natural groupings of work-from-home counties are obviously not contained by county borders alone–there are broader clusters of WFH hotspots and coldspots–but in other regions the pattern is more contained, like the anomaly in southern Missouri. This leads us to wonder how we can detect the spatial correlation of work-from-home percentages from one county to the next to better identify regional behavior.

Tobler's First Law of Geography — Everything is related to everything else, but near things are more related than distant things.

Finding Spatial Clusters

Let’s take a different tack with this data. Obviously, there are regions where counties are correlated with one another. That is, a county with a high percentage of WFH lies adjacent to other counties with high percentages, and vice versa for counties with low percentages. A lot of the country does not seem correlated with its neighbors – some highs, some lows, but mostly values close to the mean percentage of workers working from home.

Let’s classify counties by how correlated they are with their neighbors. The conditions we’re looking for:

  1. Highs, where a county and its neighbors have a high value on average
  2. Lows, where a county and its neighbors have a low value on average
  3. Neutrals, where the averages of neighbors tend towards the mean or what would be expected from a random distribution, so don’t seem to be clustered
  4. Outliers, for counties that don’t fit conditions 1-3 (more on these later)

To find these groupings, we draw upon a geostatistical cluster- and outlier-detection method called Local Moran’s I, which lets us test the distribution of our attribute of interest over geography. Moran’s I is one of many statistical approaches we have been exploring lately by combining CartoDB with the PySAL library (another blog post on that soon).

By applying the Moran’s I algorithm, we classify our data into the types above. To visualize it, we simply style the counties according to these types.
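A hedged sketch of this classification using the PySAL family of packages (in current releases the pieces live in libpysal and esda; file and column names are illustrative):

import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran_Local

counties = gpd.read_file("acs_counties.geojson")    # illustrative file
w = Queen.from_dataframe(counties)                  # neighbors share a border
w.transform = "r"                                   # row-standardized weights
lm = Moran_Local(counties["wfh_rate"], w, permutations=999)

# Quadrants: 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low
labels = {1: "high", 2: "low outlier", 3: "low", 4: "high outlier"}
counties["cluster"] = "not significant"
sig = lm.p_sim < 0.05
counties.loc[sig, "cluster"] = [labels[q] for q in lm.q[sig]]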

Below is a map with the clustering overlaid on the choropleth we used above for work-from-home percentages. I did this to help show how the choropleth informs how and why the clusters, neutrals, and outliers are formed. We let the neutrals be masked (dark grey over the choropleth) because they are not ‘significant’. Here, ‘not significant’ means that their arrangement is consistent with what would be expected by randomly distributing the WFH percentages of each county across the map.

What is exciting about this approach is that the clustering algorithm picks apart which variations in our data are interesting (i.e., is an outlier or part of a cluster) and which may just be random variation relative to neighbors. We now have an automated way of finding areas of interest for future study. That’s pretty exciting!

Let’s walk through some of the observations we can make looking at the outputs:

  1. A broad region extends from Colorado through Nebraska, South and North Dakota, Montana, and Idaho, petering out in eastern Oregon. This is evidently America’s work-from-home region. There’s also a more localized cluster centered west of Lake Tahoe in northeastern California.
  2. The Southeastern United States has a broad not-working-from-home region that extends up both east and west of the Appalachian Mountains to West Virginia and eastern North Carolina. Within this region, you can see that the counties are consistently in the lowest shade of our color ramp (all below the mean).
  3. A large part of the country has work-from-home rates no different from what you would expect if the counties were placed randomly across the map, so no clusters or outliers are found in these areas.

What we see in items 1 and 2 is evidence of a clear underlying process driving the clusters in these regions, one that goes well beyond county borders and leads us to ask what this process could be.

There is obviously a big shift in work behavior between the WFH and not WFH regions mentioned above. What causes these differences? Perhaps a combination of topography, distance to work, population density, and nature of work? Maybe something else entirely?

The clustering also brought out some subtleties that weren’t evident in our choropleth. While there is variation across a few regions, they can’t be algorithmically distinguished from what you would expect for a random arrangement about the mean value, so these ‘not significant’ counties lack a border and are masked in dark grey.

So what are outliers?

Finding Outliers

In the simplification above, I purposefully left out a discussion of the outliers. These are statistical outliers in a spatial sense: they are counties that buck the trend of their local area. That is, they are very dissimilar from their neighbors – for instance, a high work-from-home county adjacent to several low work-from-home counties. There are dozens of counties like this in the northern parts of our work-from-home cluster.

spatial outlier example

In the map above, the outliers are typically adjacent to regions of high or low values, and are sometimes in an island of non-significant counties. These are areas that are very different from their immediate neighborhood – maybe explained by a county with a very different economy than those around it.

non-spatial outlier example

The southern Missouri county (pictured above) that is an ordinary, non-spatial outlier for the dataset looks like a shoo-in to be a spatial outlier as well. Instead, this county is surrounded by a lot of variability, which means that it is not correlated with its neighbors.

In the north-central US, there are a number of interesting outliers sitting in the middle of our large work-from-home region. These outliers could indicate an economy different from the surrounding regions: several large cities, a change in industry, the presence of large institutions like universities, etc. Answers to the causes can be sought in the Workplace Area Characteristics data from the Census.

Outliers can be defined more precisely now:

  1. High outlier: A ‘high’ in a region of lows (on average)
  2. Low outlier: A ‘low’ in a region of highs (on average)

Finding clusters is fun!

We started playing around with this method of finding statistical clusters and outliers across different attributes of the American Community Survey. In the map below, we took a look at the population who are 18 or younger, popularly known as Generation Z. This group is interesting because it is largely made up of people who haven’t yet left home. It can give you a lot of insight into how a region will change over the coming years as the Gen Z population transitions into the workforce and moves into new homes and apartments nearby.

This time I peeled away the choropleth layer to show only the cluster and outlier analysis. Looking at Generation Z, we see some interesting patterns:

  1. The Appalachians extending up into Maine, Northern Michigan, Minnesota, and Wisconsin, and large parts of Florida have low rates of young people (except in nearby major cities).
  2. There are two geographically large hotspots of Generation Z population: Eastern New Mexico through Western Texas and a huge area centered on Utah.

The technique is useful because it reveals many patterns that aren’t immediately evident in the choropleth map: it gives a strong sense of the areas that are above and below the mean value, and it allows us to make inferences about the patterns. Once we have inferences, we can use techniques such as spatial regression to make predictions.

Where to go next?

We’re working hard to bring these techniques to all CartoDB users so keep your eye on our tutorials, documentation, and of course here on our blog over the coming weeks and months.

If you want to use any of the maps from this blog post, feel free. Here are some basic instructions for getting them into your stories.

Happy statistical mapping!

CartoDB takes Barcelona by Storm, again!


What a year! We can still feel the energy of Barcelona after MWC 2016, and we keep coming back for more. This time, Javier de la Torre will be speaking at the eFintech Show, where CartoDB was selected by an expert panel as one of the 30 most disruptive fintech startups featured in a single event.

It is an important conference for the business intelligence community, and we are happy to have our CEO speaking on ‘Location, the Missing Dimension in Fintech Data.’ His talk will focus on the vast amount of information generated every day. The main problems with data remain the same: its enormous volume and its disorder. For this reason big data analytics was born, allowing organizations to capture, manage, and process data in ways that were once unimaginable. Companies that know how to analyze that information and put it into practice can create products and services tailored and customized in real time for each customer.

These companies, among which are many fintech startups, will be better positioned to succeed, since they will know what their customers need and exactly when they need it. Additionally, the world of finance has changed dramatically. In this post-crisis environment, new opportunities and a clear competitive advantage are open to financial firms that can obtain a more nuanced understanding of their risk exposure. Firms can achieve this level of accurate risk management by leveraging advanced spatial analytics and social data. In this context, the eFintech Show arrives as the first conference and exhibition focused exclusively on presenting the best and most innovative technological solutions for banking and finance.

Register now to join Javier at this event on Thursday, March 17, from 13:00 to 13:15. This event is specifically for technology executives in banking and finance, as well as press, venture capital, analysts, bloggers, and entrepreneurs.

Happy data mapping!

Emerging and Established Community Come Together


CartoDB happily moved from Williamsburg to Bushwick last September. Our office is in the middle of the industrial zone, with vast spaces around and graffiti art on the walls. It has ample room for up to 120 people, where we love to host events, as well as outdoor space for the summertime.

However, we also recognize that our tech community is a significant force in the development of neighborhoods, including Bushwick. As more startups relocate here together with their employees, we felt it was important to connect the businesses already in the area with the ones arriving. We want to build a network of organizations, community members, universities, government, and other startups, not to go against the flow of neighborhood life, but to become an integral part of it.

As a tech workforce, CartoDB would like to set a positive example and bridge the gap between us and the neighborhoods by giving before getting, connecting with local organizations, and providing services and mentoring opportunities.

That is the principle on which we would like to build a community of organizations, first in Bushwick, then extending to connect with other parts of Brooklyn. We would like to represent Brooklyn’s tech force as just one eager participant among the many actors in local neighborhood life.

Bushwick Happy Hour

You can support our effort to build the Brooklyn tech workforce by becoming a part of the neighborhood events. At our location on Moore Street, we recently hosted the first of a series of informal happy hours to start tightening the community and serving as a connector. It was a terrific event, attracting more than 50 people from local tech startups and businesses, New York City leaders, and some of our fellow mappers. We made a few new friends and showed off an in-progress map of the community as we have found it.

As we plan the next events for the Bushwick community and Brooklyn at large, we will be reaching out to our readers to participate. If you want to hear about CartoDB Brooklyn/Bushwick events, drop us a line here and we will keep you informed.

Take a look at our map!


We hope to see you at the next one!

Understanding Airbnb's impact through Census data


Airbnb has become a lightning rod for controversy about the so-called sharing economy. In tight housing markets – in particular, New York City and San Francisco – many have claimed that Airbnb contributes to the displacement of longtime, poorer residents.

While these reports are alarming, none have tried to map the major concern: concentrated neighborhood impact from Airbnb’s heavy concentration in small parts of cities. By leveraging US census data in CartoDB’s Data Observatory, it’s possible to isolate the impact of Airbnb at the level of just a few city blocks.

The Inside Airbnb data is incredibly detailed, to the point where visualizing every point can limit our ability to extract clear information. Our challenge was to visualize the listings in a way that would make their impact clear at the local level. A quick glance at a point-based Airbnb map shows that listings are concentrated in certain, generally central, parts of cities, but point-based maps don’t take neighborhood characteristics into account – all points look the same.

We wanted to understand the effect of Airbnb as it is experienced by residents at the level of a neighborhood or even just a block. Below we will work through several techniques for aggregating and exploring the Airbnb data in the context of the American Community Survey (ACS). The ACS provides us with information about the population across the US at the block group level.

We are excited to show the power of these third-party datasets and are slowly making them available to CartoDB users in exciting new ways. Watch for announcements soon, or get in touch if you are interested in taking advantage of our work sooner.

Airbnb units per square mile

Our first analysis of the Airbnb data aggregates the point data into block groups and shows the concentration of listings per square mile. Next, we used a statistical method called Moran’s I to cluster and highlight areas of statistically significant high or low rates of Airbnb listings per square mile.
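The aggregation is a spatial join plus an area normalization; a hedged sketch of the per-square-mile query (table and column names are illustrative):

# Run in the CartoDB SQL pane; 2,589,988.11 m^2 per square mile
sql = """
SELECT bg.geoid, bg.the_geom_webmercator,
       COUNT(a.cartodb_id) /
         (ST_Area(bg.the_geom::geography) / 2589988.11) AS listings_per_sq_mi
FROM block_groups bg
LEFT JOIN airbnb_listings a
  ON ST_Contains(bg.the_geom, a.the_geom)
GROUP BY bg.geoid, bg.the_geom, bg.the_geom_webmercator
"""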

You can see the result in our first map below. While this still does not incorporate any neighborhood characteristics, it provides great visibility into high-density clusters of Airbnb listings running from the Lower East Side up along most of Central Park in Manhattan, with small clusters in Williamsburg and Clinton Hill in Brooklyn.

Airbnb compared to available housing units

A more nuanced analysis leverages the American Community Survey to look at what percentage of the housing units in each block group are listed on Airbnb. Shown in the second map above, this approach reveals very different clusters than the naive analysis of listings per square mile. While Airbnb is very dense in the Upper East and Upper West Sides of Manhattan, it actually accounts for a significantly “low” percentage of housing units in those neighborhoods. In contrast, the two Brooklyn clusters are much larger, with almost the entirety of Williamsburg and Greenpoint rented out on Airbnb at a high rate.

The reason for these differences is housing density. The Upper East and Upper West sides of Manhattan are very dense; even if they have a lot of Airbnbs, the neighborhood can more easily absorb these units into the huge existing stock of apartments.

Williamsburg and Greenpoint, however, are lower-density neighborhoods. Even though the per-square-mile concentration of Airbnb may not be very high, as a percentage of the available housing it is much greater. Williamsburg and Greenpoint are therefore Brooklyn neighborhoods whose characters may be much more affected by Airbnb.

Manhattan also has several neighborhoods with a large percentage of apartments rented out on the platform: the Lower East Side, the East and West Village, and Hell’s Kitchen are among the most heavily affected by Airbnb rentals.

The sharing economy and the rental market

While a large percentage of housing units on Airbnb may change the character of a neighborhood, it does not give many clues about what effect Airbnb may have on rental prices in an area.

The simple solution might be to map Airbnb prices across the city. But mapping Airbnb prices in New York doesn’t reveal many surprises (top map below). Apartments are much more expensive to rent in some of the most desirable residential neighborhoods: Tribeca, Soho, Flatiron, and just south of Central Park. The cheapest Airbnbs are in more distant neighborhoods at the north end of Manhattan and in a ring of Brooklyn running from the south end of Prospect Park through Crown Heights and Bed-Stuy to Bushwick.

To dig a bit deeper, we compared median Airbnb prices to the median rent for every block. We find that clusters of high profitability in Manhattan center on Chinatown, East Harlem, and Manhattan Valley just west of Central Park. These are neighborhoods where the median Airbnb listing costs 25% of the median rent – in other words, you could easily pay your rent by renting on Airbnb just four days of the month! Williamsburg was the only large high-profitability cluster in Brooklyn, with a median Airbnb listing able to pay the median rent in just five days.
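The underlying arithmetic is simple; a minimal sketch with illustrative numbers:

# If the median listing's nightly price is 25% of the median monthly rent,
# four booked nights cover the month:
median_rent = 3000.0            # illustrative monthly rent (USD)
median_nightly_price = 750.0    # illustrative Airbnb price (USD/night)
days_to_make_rent = median_rent / median_nightly_price
print(days_to_make_rent)        # 4.0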


Low-profitability areas such as Crown Heights and Bushwick reveal an interesting pattern, perhaps a gap between desirability for residence and for tourism. In very desirable parts of Brooklyn and Manhattan, like the Upper East Side, Park Slope, Cobble Hill, Ditmas Park, and Prospect Heights, one would need to rent a median apartment at the median price for ten days or more to make rent. Considering the difficulty of renting out an entire month, it would be exceptionally hard in those areas to justify renting on Airbnb instead of to a long-term tenant.

Living with the sharing economy

For people interested in where Airbnb and its users are having the biggest impact on communities, these analyses provide good starting points. These maps allow us to focus on the relatively small clusters of blocks where high percentages of the available units are being rented on Airbnb, or where a financial incentive to rent out apartments may change the make-up of neighborhoods.

What is exciting about using these methods and data in CartoDB is that they allow us to work across the entire US at once. In addition to New York City, we completed these same analyses for many other cities in the US, including San Francisco, Los Angeles, Nashville, Austin, Washington DC, Boston, Portland, Seattle, San Diego, and Chicago. Use the map below to search for any of those cities and see the Median Days to Make Rent with Airbnb.

If you are interested in learning about how location intelligence plays an essential role in the real estate sector and in data analysis, please join our webinar next Thursday, March 17.

Happy data mapping!


Intro to CartoDB Mobile API


My name is Jaak, and I am the head of CartoDB’s newest division, Mobile. As CartoDB embarks on innovating in the field of mobile development, we want to keep you up-to-date on all the latest and greatest developments happening here. I’d like to start the Mobile SDK series by introducing the native mobile capabilities of CartoDB on both a general and a technical level.

Like the web, mobile has several map engines and APIs to choose from. The first implementation question is always: Which map API should I use?

Major mobile platforms already provide a map engine as part of their own APIs: iOS has MapKit (with Apple Maps), and Android has the Google Maps Mobile API. The Google Maps API is also available for iOS, Windows Phone, and Xamarin, making the choice even more difficult.

There are also third-party options like the CartoDB SDK (formerly known as Nutiteq), with built-in map APIs (similar to Google’s) that are nice and free.

However, the major platform-provided SDK offerings come with some limitations:

  • They do not work offline.
  • Basemaps require a constant online connection. Some basemaps have certain caching abilities, but they are very limited.
  • They do not allow you to change basemaps to anything other than their own. There is no customization of basemaps.
  • There is no white-label option; the Google logo will always appear in the corner.
  • There are usage restrictions in some cases (e.g. fleet management and navigation).
  • The API is made for end-user app cases, so map data editing requires a lot of ugly hacking, if it is possible at all.
  • There are no advanced 3D features.

YET there are HUGE advantages to the CartoDB mobile API.

First, it comes pre-integrated with your CartoDB server and account: just create maps using the CartoDB Editor, and the maps work very similarly on your mobile device, with the same basemap, overlay map layers, and extra overlays. Second, it has nice native mobile-specific features like offline basemaps and 2.5D map rotation and tilting. It is a stand-alone, fully functional SDK library, so it does not require Google services to work.

Happy data mapping!

Bend Time and Distance Isolines to Your Business Needs


When calculating distances on maps, we often simply compute the ‘as the crow flies’ distance: the length of the straight line that links our origin and destination. For human-centric uses, this is rarely the correct approach, as it ignores the fact that we cannot walk on water or pass through solid walls. Instead, we have to navigate among obstacles such as buildings and one-way streets. For many tactical business decisions, we need to use the distance someone would actually have to walk or drive to get from A to B, which can dramatically change the results.

To solve these problems, we introduce isolines. Isolines are a way of measuring how far a person could realistically travel from a location in a given time, taking all the buildings, streets, and other obstacles in their way into account. Since they can go in many directions, the lines form the boundary of the surface of all possible end points.

CartoDB’s Time and Distance Isolines service is a strategic approach to routing that saves time and resources. It offers both forward and reverse measurements, with calculations accurate to the minute and the meter. Instead of just knowing raw distance, you can determine how long it takes to walk or drive to various locations, and the area covered within a specific distance, to empower your business analysis and intelligence.

If augmented with other data such as age, income, or ethnicity, time and distance isolines help you understand behaviors, patterns, and trends. Click on any stop along the New York MTA’s ‘L’ line and our powerful interactive dashboard will refresh with real-time trade areas, showing all the key demographics of the target users who fall within the areas analyzed. This is essential for deciding whether to open a new business or branch, setting prices, optimizing your distribution channels, knowing your target audience, and making other strategic location-based decisions.

Time and Distance Isolines functionality is in beta, and can currently only be used with Nokia HERE basemaps. In using the service, you agree to the HERE terms and conditions.

Contact us if you want to start using CartoDB’s Time and Distance Isolines.


Start calculating trade areas on demand using your business data, and view the results as points or polygons for optimal geospatial analysis.

Happy data analyzing!

Transform Millions of Data Points into Actionable Insights


Last month at MWC 2016 we made two announcements that show just how strong CartoDB’s presence is in the location intelligence world. One of the breaking news items was our big data solution, Deep Insights!

Deep Insights technology enables the visualization, dynamic filtering, and exploration of large location datasets at groundbreaking scale in a single all-in-one visualization solution. CartoDB for Deep Insights empowers everyone to instantly visualize big data patterns and trends that otherwise would not have been identifiable.

Join this 45-minute webinar to learn how CartoDB’s Deep Insights helps any organization discover patterns, trends, and connections:

Register

In this webinar you will learn:

  • How to manage and analyze your data effectively, with the main objective of helping you make better, faster decisions.
  • How to create a dashboard based on your business needs, choosing from multiple criteria such as total population, average income, etc.
  • Results and tangible benefits.
  • Examples of different use cases that already rely on CartoDB’s Deep Insights.

This webinar will be hosted by Jaime de Mora, EMEA Sales Director at CartoDB.

Happy data mapping!

Welcome, Explorers! CartoDB Expands our Origami Practice


CartoDB began 2016 in a disruptive way by expanding location intelligence into new industries, countries, and devices. To continue to disrupt this space we need amazing explorers to join our team! We still have about 10 open positions across departments such as Marketing, Sales, and Product, working in our New York and Madrid offices.

We’re growing! Here are some of our fresh new faces:

Rafael Martin Soriano


Before CartoDB, Rafa managed SeedRocket, a leading startup accelerator in Spain, at Campus Madrid (Google’s space for entrepreneurs). As an account executive here, Rafa helps our customers bring location intelligence into their organizations.

“I’ve known CartoDB for a long time, but I got more deeply acquainted because of Sergio and Miguel. They’re both mentors at SeedRocket, where I first saw CartoDB in action, and I thought it was amazing. CartoDB is everything I hoped for: a technology leader, a startup mindset, and a diverse international team.”

Apart from work, Rafa loves sports, music, and spending time with friends, though he’s quick to say: “Whenever I’m working, I try to see work as playing. I’m trying to have fun at work and most of the time I get it, so I don’t have to have this feeling of unplugging. I try to do work that I like and I think I’m gonna have fun at CartoDB”.

Ramiro Aznar Ballarin


Ramiro is our newest solutions engineer, a role in which he develops proofs-of-concept using our customers’ data, helping them answer their biggest “where” questions.

Ramiro comes from a participatory urban planning studio, where he managed GIS data and visualizations, and from ONGAWA, an organization similar to Engineers Without Borders, where he supported work in Peru, Tanzania, Nicaragua, and Mozambique.

“I have a background in biology, then I started using GIS to become a GIS technician, and now I’m a GIS programmer. I am eager to put all of this into practice to help CartoDB users!”

Just like the mission of CartoDB, Ramiro also enjoys transferring his mapping knowledge and techniques to people: “It’s gratifying to see people making useful maps, and that’s what I did in Nicaragua. (I also learned from them; for example, how to use the ‘A tool’ to make contour lines, a sustainable practice for plowing.) CartoDB is the best place in the world to help bring mapping to more people.”

Ramiro also loves cooking (carrot cake is his secret weapon), dinosaurs, and origami (here is his handmade original Cartophante!).

Rafael Porres Molina


Rafa comes to us from Amazon Web Services in Dublin, and he has experience as a consultant for telecom and financial services firms. At CartoDB, he ensures our product is responsive and reliable, no matter how much traffic we get. It’s a big job!

“What attracted me is our work in the open source community, and the attention to the user community. I love that so many people use their free CartoDB accounts to do amazing things. I definitely relate to the scale and mission of CartoDB.”

His favorite place in the world is Madrid, and beyond work he is an avid bike commuter.

“I bike to work every day, and I used to be an active part of the Critical Mass movement in Madrid. I’m also the bass player of The Muffin Band.”

Welcome to the team!

Improve your (musical) timing with CartoDB Time Isolines and derive the best data insights

SXSW Concerts Selector

Planning where to go next can be a daunting task. There are numerous factors to consider, like time. At major festivals like SXSW, planning how long it takes to move from place to place doesn’t have to be so intimidating anymore. Time and Distance Isolines (also called isochrones), a new feature coming to the CartoDB platform, allow you to measure travel time directly through SQL statements. Time isolines provide insight into how long it takes pedestrians to move from one hotspot to the next, making tactical planning a cinch.

Isochrones

Time and Distance Isolines are a visual way of showing how much time you need to get from a point to all surrounding places, by forming lines that connect points reachable in the same amount of time. Businesses, as well as everyday people, can harness their power by determining through data the most efficient way to spend time. Because, as we all know, time is money!

In this visualization we focused only on musical events and concerts at SXSW, because you don’t often get the opportunity to bring music to your map! This data-driven visualization shows how much time it takes to go from each venue and hotel to other entertainment spots.

Scraping SXSW agenda data

The SXSW website has a nicely made agenda of events that screams, “Make a map out of me!”

However, not all data is created equal, and sometimes you have to get your hands a little dirty to gather the data you want to see on your map. In this case we scrape the website data using a few simple lines of node.js and a few node libraries (namely request, cheerio, underscore, moment…): basically the ‘node survival kit for scraping.’

The whole process and data are visible in this repo: sxsw-schedule-csv.

After the scraping process, uploading this CSV data into a PostgreSQL table is a breeze using the CartoDB Import API.
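A hedged sketch of that upload against the Import API (the account and key are placeholders):

import requests

with open("sxsw_schedule.csv", "rb") as f:
    resp = requests.post(
        "https://YOUR_ACCOUNT.cartodb.com/api/v1/imports/",
        params={"api_key": "YOUR_API_KEY"},
        files={"file": f},
    )
print(resp.json())  # returns an import id you can poll for status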

This is where an important part of the magic happens: the addresses of the venues are automatically geocoded in the process, converting raw data into geo-data that can instantly be put on a map with the CartoDB Editor.

SXSW venues on the map

For further information on geocoding using the SQL API, see the Geocoding functions section of the Data Services API docs.
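If you need to trigger the geocoding yourself, the Data Services API exposes it as SQL functions; a hedged sketch using cdb_geocode_street_point (table and column names are illustrative):

# Run in the CartoDB SQL pane or via the SQL API
sql = """
UPDATE sxsw_venues
SET the_geom = cdb_geocode_street_point(address, 'Austin', 'Texas', 'USA')
WHERE the_geom IS NULL
"""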

Creating isolines for each hotspot

CartoDB nicely abstracts the creation of time isoline geometries into a SQL function:

cdb_isochrone(the_geom, ‘walk’, Array[60, 120, 180])

Here the_geom is the center geometry of the isoline, ‘walk’ or ‘car’ is the isoline mode, and the third argument is an array of time intervals, in seconds, that will be used to create the actual lines.

But there are many more options: have a look at the documentation!

We want to show time isolines for each venue when the user clicks it on the map. We could run a dynamic query with the cdb_isochrone function every time, but as the service is subject to quota limitations, we’re going to prepare our geometries ahead of time using our sxsw_venues table.
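A hedged sketch of that preparation (table and column names are illustrative; cdb_isochrone returns one row per requested time range):

sql = """
CREATE TABLE sxsw_isolines AS
SELECT venue_id,
       (cdb_isochrone(the_geom, 'walk', ARRAY[60, 120, 180])).*
FROM sxsw_venues
"""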

We’ll also prepare a nearby-venues table in advance, to show a concise list of the closest events for a given venue. To do that we determine, for each hotspot, whether it falls inside each of the isoline geometries that we previously created (using a simple ST_Contains).
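A hedged sketch of that containment test (names are illustrative):

sql = """
SELECT v.venue_id AS origin,
       n.venue_id AS nearby,
       i.data_range          -- which time band the neighbor falls in
FROM sxsw_isolines i
JOIN sxsw_venues v ON v.venue_id = i.venue_id
JOIN sxsw_venues n ON ST_Contains(i.the_geom, n.the_geom)
WHERE n.venue_id <> v.venue_id
"""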

Grabbing artists data from Spotify

Wouldn’t it be better if the user could listen to the artists directly from the map? Our friends at Spotify have an amazing REST API that we used to match SXSW artist names to song samples and artist visuals.

We used two API endpoints:

  • Search: to match an artist name with an artist entity in the Spotify database;

  • Artists: to get artists details
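A hedged sketch of those two calls with requests (the token and artist name are placeholders; today’s Spotify API requires an OAuth token):

import requests

headers = {"Authorization": "Bearer YOUR_OAUTH_TOKEN"}  # placeholder token

# 1. Search: match an artist name to a Spotify artist entity
r = requests.get("https://api.spotify.com/v1/search", headers=headers,
                 params={"q": "Some SXSW Artist", "type": "artist", "limit": 1})
artist = r.json()["artists"]["items"][0]

# 2. Artists: fetch details such as genres, images, and popularity
r = requests.get("https://api.spotify.com/v1/artists/" + artist["id"],
                 headers=headers)
details = r.json()
print(details["name"], details["genres"], details["images"][0]["url"])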

The process uses a node.js script and the CartoDB Import and SQL APIs through the CartoDB node library. This library is currently under heavy development, so definitely keep an eye on it!

Have a look at the node script.

Creating the app

We put all of that together using CartoDB.js as the ‘nerve center’ of our app.

The meaty parts of the code are visible here: cartodb-sxsw-concerts on GitHub.

You can see that we are making heavy use of templating with underscore.js. This brings real flexibility to our app by allowing dynamic HTML elements, dynamic SQL requests to the CartoDB SQL API (we are ‘injecting’ venue ids, the selected day, etc.), and dynamic CartoCSS, allowing us to quickly edit styles without using the CartoDB Editor.

Happy data mapping!

Getting Started with CartoDB Native Mobile SDK


Welcome to the second blog post of our CartoDB mobile series! I’m back to advise you on how to get started with CartoDB’s native mobile SDK.

Currently we offer the well-proven Nutiteq Maps SDK version 3.x as the CartoDB native mobile SDK. One of the first things you’ll notice is that the Nutiteq SDK is not specific to, or technically dependent on, CartoDB, due to its long history. So let’s get started building your first CartoDB mobile app!

Here are three easy steps to make native apps for your mobile devices with CartoDB:

  1. Register an account at developer.nutiteq.com
    As of now, a cartodb.com account is not enough. You’ll need to re-register at the developer.nutiteq.com site, as well as register your app there. You can select the Free Lite package to get started.
  2. Develop the app
    See the links below.
  3. Share the app!
    Publish your app to the app store or share privately within your enterprise.

For development you’ll need the following resources. The guidelines for the Nutiteq Maps SDK are, as of now, at the developer.nutiteq.com site; soon you’ll find them in CartoDB’s developer documentation. The guidelines include code samples in several languages, depending on which platform your app targets: Objective-C for iPhone/iPad (iOS), Java for Android, and C#, which is usable in both Xamarin (a cross-platform tool for Android and iOS) and Windows Phone.

Some developers are already using the new Swift language to develop iOS apps, and our SDK supports this too. However, we don’t have specific code samples for it yet. The API is the same as in Objective-C, so feel free to use the ObjC documentation.

Happy data mapping!

Join CartoDB at FOSS4G Argentina!


CartoDB is really excited about the upcoming FOSS4G Argentina! Held in Buenos Aires, Argentina from April 5-9 and organized by the Instituto Geográfico Nacional of Argentina, the event brings together decision makers from government institutions, IDERA members, universities, and state employees, and CartoDB is a proud sponsor.

Our very own Jorge Sanz, an active member of the geospatial open-source community, Madrid’s solutions engineer extraordinaire, and support team lead, is taking part in this great event.

His workshop, an ‘Introduction to CartoDB,’ is on Tuesday, April 5 from 8:30 AM to 1 PM (ART). In this session you will learn how to use our Editor to publish your geospatial data in a simple way, and to go even further in your data analysis. Get your tickets now, you won’t want to miss this!

He will also be giving a talk on Thursday, April 7 at 4:30 PM (ART) called ‘Democratizing the publication of geographic data.’ There you will discover the many uses of CartoDB’s location intelligence and its applications across several industries. Come and enjoy this talk; no registration necessary.

If you want to join Jorge or set up a meeting don’t hesitate to contact us.

Happy data mapping and see you in Argentina!


CartoDB goes BIG!!!


What is constantly increasing in volume, variety, and velocity, and greatly affects your business’s bottom line if not properly utilized? Big Data!

At CartoDB we know how much it matters how much data you collect, how well you understand and analyze that data, and how that analysis turns into better, more cost-effective decision making for your organization.

That’s why we want to celebrate CartoDB’s solution for Big Data! Our technology lets you filter and drill down into hundreds of millions of data points and delivers actionable analysis like never before, which is why we’ve dedicated a whole week to the benefits of big data analysis. From April 4 - 10, discover:

  • What Big Data is
  • Why Big Data matters
  • How to handle large data volumes
  • And why Location Intelligence is essential for big data analysis

We will be featuring blog posts on our big data solution, a white paper, a webinar about Deep Insights, an exclusive event, and much, much, more!

Here’s how you can stay in the loop on all things Big Data and participate in CartoDB’s Big Data week:

  • Follow us on our social media networks like Twitter, LinkedIn, Facebook, Medium, and Slideshare. We will be posting key information, tips, facts, and insights about big data under the hashtag #CartoDBforBigData.

  • We know you already do, but … don’t forget to check our blog! We will be writing interesting and informative blog posts about how visual analysis can help your business intelligence perform in the big data world.

  • Our EMEA Sales Director, Jaime de Mora, is giving a 45-minute webinar next Thursday at 1 p.m. EDT: Deep Insights–Transform Data Into Actionable Insights.
    He will show you how to manage and analyze your data, how to create a dashboard of widgets based on your business needs, and the results and tangible benefits you can get from it.

We have more planned for you, so don’t forget to mark your calendars!

Happy Big Data mapping!

RSEI in motion, and a few tips on Torque


The US Environmental Protection Agency (EPA) has an incredible amount of data available to people, and the Risk-Screening Environmental Indicators (RSEI) microdata is a whopper of a dataset. RSEI is an air model that organizes results from collected environmental data. Specifically, it tallies the quantities of chemical releases and transfers, and is used to rate the air you breathe. The EPA is re-releasing this data by way of Amazon’s public data portal for anyone to use and visualize.

We took a look at the data and decided our Torque engine would be a great way to show how the readings have changed over time. With 27 years of data for 400 listed chemicals over 45,000 facilities, there’s a lot to show. We narrowed it down to a small area (the SF Bay Area) and joined the RSEI 1km grid to the aggregated microdata to produce the map below, showing 1992-2014 changes in toxic concentrations.

What we have done here, exactly, is assign the ‘toxconc’ value to each grid cell for the 22 years of data we are using, merge that with the year, and then visualize the cells, colored over a range. Since our values ranged from 0.000004 to 13 million, we used a custom Torque aggregation function that mapped the values across a ramp:

-torque-aggregation-function:"avg((log(toxconc)+5.36)*100.0/12.49)";

The general form of this function is:

avg((log(toxconc) + (-log_min)) * 100.0 / (log_max - log_min))

where log_min and log_max are the base-10 logs of the smallest and largest toxconc values.

The log transforms our values, which run from 0.000004 to 13 million, into the range -5.36 to 7.13. Then we simply shift the origin and scale so that each value falls between 0 and 100.
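Written as a small helper, the rescaling is easy to check (the constants are taken from the figures above; Math.log10 gives the base-10 log):

function scaleToxconc(toxconc) {
  var logMin = -5.36;  // log10 of the smallest value (~0.000004)
  var logMax = 7.13;   // log10 of the largest value (~13 million)
  // Shift the logged value so the minimum sits at 0, then stretch to 0-100.
  return (Math.log10(toxconc) - logMin) * 100.0 / (logMax - logMin);
}

scaleToxconc(0.000004);  // ~0
scaleToxconc(13000000);  // ~100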

This gave us a balanced look at the data in the grid and made it possible to understand the nuances. Using chroma.js, we developed a color ramp based on a perceptual color space that assigned a color to each value to give it a continuous tone.
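A minimal sketch of that ramp with chroma.js, assuming illustrative endpoint colors rather than the exact ones used on the map:

// Interpolate in Lab, a perceptual color space, across the 0-100 domain
// produced by the aggregation function above.
var ramp = chroma.scale(['#fee5d9', '#a50f15']).mode('lab').domain([0, 100]);

ramp(12).hex();  // color for a low-concentration cell
ramp(95).hex();  // color for a near-maximum cell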

Using the power of CartoDB, we’ve brought the history of these 748K points into view and can see the story of the air we breathe much more clearly.

Happy mapping (and breathing!)

Consuming CartoDB with Nutiteq SDK

Nutiteq SDK

This is the third post in CartoDB’s mobile blog series. If you’ve missed any of our previous posts you can view them here and here. In this edition we’ll continue to build on the earlier posts and present new and compelling details on how you can consume the same CartoDB platform and engine that you love with the Nutiteq SDK.

Once you have the basic app running, you might wonder: how exactly can you consume CartoDB via the SDK? If you have already tried the basic app with the SDK then you may have discovered that there are no CartoDB specific methods. This will change with the next SDK updates. However, you can already consume a lot of the CartoDB platform:

  1. Use raster map tiles: just define the tile URL for a RasterTileLayer.
  2. To get interactivity (object click data) you can use UTFGrid. This uses both raster map tiles and JSON-based UTFGrid tiles. On the CartoDB side, you need to enable and define the tooltips.
  3. You can also load data as vector tiles. The CartoDB platform has beta support for serving tiles in the Mapbox Vector Tile (MVT) format, which Nutiteq can render on the client side. To use them you also need CartoCSS styling. This provides many advantages, from smooth zooming and rotation of maps to data that can be packaged for offline use as MBTiles.
  4. Loading GeoJSON vector data provides a better look and feel. You can add interactivity (object click actions) and client-side styling of the objects. You can also render vector objects as billboard markers, which “stand up” during 3D view manipulations. For vector data, CartoDB provides the SQL API, so the app can load whole tables and render them on the map. You can also apply client-side simplification and clustering.
  5. If the data table is large, more than a few thousand objects, then loading whole tables can be too much for the mobile client. Instead, you can use smart, on-demand, view-based loading of vector data. As with the SQL API and GeoJSON format used on the CartoDB side, the SDK uses custom vector data sources to consume it. This loads only the currently visible area of the map. The app can control zoom levels, and server-side generalization or simplification can be applied.
  6. For point-based time-series visualizations, you will want to use Torque. This provides animated rendering, and the latest development version of the SDK has a special layer type for it. From the API point of view it uses the SQL API and CartoCSS styling, but Torque has an additional time-control method.

Before consuming CartoDB via the SDK, you need to know the specific endpoints — tile URLs, CartoCSS, SQL API endpoints, etc. The best way to get these endpoints is to use a CartoDB Maps API request. This part is not yet done by the SDK, so for now your app needs to handle it itself. It is not very difficult: just JSON over HTTP.
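As a node.js sketch of that handshake (the account name, table, and styling here are placeholders), an anonymous Maps API instantiation looks roughly like this:

var https = require('https');

// The layergroup definition: which SQL and CartoCSS the map should use.
var config = JSON.stringify({
  version: '1.3.0',
  layers: [{
    type: 'cartodb',
    options: {
      sql: 'SELECT * FROM my_table',
      cartocss: '#layer { marker-fill: #e8544f; }',
      cartocss_version: '2.1.1'
    }
  }]
});

var req = https.request({
  host: 'youraccount.cartodb.com',
  path: '/api/v1/map',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
}, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    // The response carries a layergroupid; plug it into the tile URL
    // template that the SDK's RasterTileLayer can consume.
    var layergroupid = JSON.parse(body).layergroupid;
    console.log('https://youraccount.cartodb.com/api/v1/map/' +
                layergroupid + '/{z}/{x}/{y}.png');
  });
});

req.write(config);
req.end();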

Last, but not least: we have just published CartoDB integration code samples in our Advanced Map samples for Android. See:

  • CartoDBSQLActivity for vector data (SQL API)
  • CartoDBTorqueActivity for Torque tiles
  • CartoDBUTFGridActivity for raster tiles and UTFGrid
  • CartoDBVectorTileActivity for vector tiles with CartoCSS styling

Just remember that for these you’ll need to use the latest development version (snapshot) of the SDK.

The figure below summarizes the various ways the SDK can consume CartoDB data:

CartoDB Maps SDK

Happy mobile mapping!

Subdivide All the Things


One of the things that makes managing geospatial data challenging is the huge variety of scales that geospatial data covers: areas as large as a continent or as small as a man-hole cover.

The data in the database also covers a wide range, from single points, to polygons described with thousands of vertices. And size matters! A large object takes more time to retrieve from storage, and more time to run calculations on.

The Natural Earth countries file is a good example of that variation. Load the data into CartoDB and inspect the object sizes using SQL:

SELECT admin, ST_NPoints(the_geom), ST_MemSize(the_geom) FROM ne_10m_admin_0_countries ORDER BY ST_NPoints(the_geom);
  • Coral Sea Islands are represented with a 4 point polygon, only 112 bytes.
  • Canada is represented with a 68159 point multi-polygon, 1 megabyte in size!

Countries by Size in KB

Over half of the countries in the table (149 of them) are larger than the database page size (8 kB), which means they will take extra time to retrieve.

SELECT Count(*) FROM ne_10m_admin_0_countries WHERE ST_MemSize(the_geom) > 8192;

We can see the overhead involved in working with large data by forcing a large retrieval and computation.

Load the Natural Earth populated places into CartoDB as well, and then run a full spatial join between the two tables:

SELECT Count(*) FROM ne_10m_admin_0_countries countries JOIN ne_10m_populated_places_simple places ON ST_Contains(countries.the_geom, places.the_geom);

Even though the places table (7322 rows) and the countries table (255 rows) are quite small, the computation still takes a long time, about 30 seconds on my computer.

The large objects cause a number of inefficiencies:

  • Geographically large areas (like Canada or Russia) have large bounding boxes, so the indexes don’t work as efficiently in winnowing out points that don’t fall within the countries.
  • Physically large objects have large vertex lists, which take a long time to pass through the containment calculation. This combines with the poor winnowing to make a bad situation worse.

How can we speed things up? Make the large objects smaller using ST_Subdivide()!

First, generate a new, sub-divided countries table:

CREATE TABLE ne_10m_admin_0_countries_subdivided AS SELECT ST_Subdivide(the_geom) AS the_geom, admin FROM ne_10m_admin_0_countries;

Remember to register the table with CartoDB, so that the editor interface can pick it up:

SELECT CDB_CartodbfyTable('ne_10m_admin_0_countries_subdivided');

Now we have the same data, but no object is more than 255 vertices (about 4 kB) in size!

Subdivided Countries by Size in KB

Run the spatial join torture test again, and see the change!

SELECT Count(*) FROM ne_10m_admin_0_countries_subdivided countries JOIN ne_10m_populated_places_simple places ON ST_Contains(countries.the_geom, places.the_geom);

On my computer the query now returns in about 0.5 seconds, 60 times faster, even though the countries table now has 8633 rows. The subdivision has accomplished two things:

  • Each polygon now covers a smaller area, so index searches are less likely to pull up points that are not within the polygon.
  • Each polygon is now below the page size, so retrieval from disk will be much faster.

Subdividing big things can make map drawing faster too, but beware: once your polygons are subdivided you’ll have to turn off the polygon outlines to avoid showing the funny square boundaries in your rendered map.

Happy mapping and querying!

What exactly is Big Data?

What is big data?

As we announced last week, today is the first day of Big Data week at CartoDB!

How can we begin to better understand and explain what big data is? Every day we read the words ‘big data’ in newspapers and on social media, applied to businesses and to systems of analysis and data aggregation. While it may seem like a marketing ploy, a critical approach to understanding what exactly big data is can give new meaning to business intelligence.

In plain terms, big data describes a situation where the amount, expanse, and type of data exceed an organization’s capacity to store or analyze it for accurate and efficient decision making.

An example of this is when you have a large dataset with millions of points and several columns that updates with more data every hour. Additionally, the dataset includes varied, disparate types of data, like images, drone data, or mobile ad clicks. That, simply, is big data. The real understanding of big data, however, comes from being able to distinguish what in all of it is relevant and useful.

The true value lies not just in having the data, but in harvesting it for fast, fact-based decisions that lead to real business value. Implementing a technology like CartoDB’s Deep Insights can take you from the broad definition of big data to the narrow analysis that yields the best business insights.

Experts say that 80% of data has a spatial component. Yet organizations have been using only a subset of their data, or are constrained to simplistic analysis, because the sheer volume of data overwhelms their platforms. What good is it to collect and store terabytes of data if you can’t analyze it in full context, or if you have to wait hours or days for answers to urgent questions?

CartoDB is ready to visualize and analyze hundreds of millions of data points. CartoDB for Big Data is a business intelligence tool that lets you visualize and drill down into hundreds of millions of points, filtering the data by predetermined dimensions and spatio-temporal criteria.

Join us and discover even more about it and how using location intelligence can improve your business intelligence. Don’t forget to sign up for our webinar this Thursday, April 7, on Deep Insights, hosted by Jaime de Mora!

Happy big data mapping!
