Channel: CARTO Blog

Subdivide All the Things


One of the things that makes managing geospatial data challenging is the huge variety of scales it covers: areas as large as a continent or as small as a manhole cover.

The data in the database also covers a wide range of sizes, from single points to polygons described by thousands of vertices. And size matters! A large object takes more time to retrieve from storage, and more time to run calculations on.

The Natural Earth countries file is a good example of that variation. Load the data into CartoDB and inspect the object sizes using SQL:

SELECT admin, ST_NPoints(the_geom), ST_MemSize(the_geom)
FROM ne_10m_admin_0_countries
ORDER BY ST_NPoints(the_geom);
  • The Coral Sea Islands are represented with a 4-point polygon, only 112 bytes.
  • Canada is represented with a 68,159-point multi-polygon, 1 megabyte in size!

Countries by Size in KB

Over half (149) of the countries in the table are larger than the database page size (8KB), which means they will take extra time to retrieve.

SELECT Count(*)
FROM ne_10m_admin_0_countries
WHERE ST_MemSize(the_geom) > 8192;

We can see the overhead involved in working with large data by forcing a large retrieval and computation.

Load the Natural Earth populated places into CartoDB as well, and then run a full spatial join between the two tables:

SELECT Count(*)
FROM ne_10m_admin_0_countries countries
JOIN ne_10m_populated_places_simple places
  ON ST_Contains(countries.the_geom, places.the_geom);

Even though the places table (7,322 rows) and countries table (255 rows) are both quite small, the computation still takes a long time: about 30 seconds on my computer.

The large objects cause a number of inefficiencies:

  • Geographically large areas (like Canada or Russia) have large bounding boxes, so the indexes don’t work as efficiently in winnowing out points that don’t fall within the countries.
  • Physically large objects have large vertex lists, which take a long time to pass through the containment calculation. This combines with the poor winnowing to make a bad situation worse.
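You can see the oversized bounding boxes directly. A quick sketch against the same table (using PostGIS's Box2D, and ordering by vertex count so the biggest offenders come first):

```sql
-- The largest countries have bounding boxes spanning a huge share of
-- the globe, so almost any index probe will match them.
SELECT admin, Box2D(the_geom) AS bbox, ST_NPoints(the_geom) AS npoints
FROM ne_10m_admin_0_countries
ORDER BY ST_NPoints(the_geom) DESC
LIMIT 5;
```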

How can we speed things up? Make the large objects smaller using ST_Subdivide()!

First, generate a new, sub-divided countries table:

CREATE TABLE ne_10m_admin_0_countries_subdivided AS
SELECT ST_Subdivide(the_geom) AS the_geom, admin
FROM ne_10m_admin_0_countries;

Remember to register the table with CartoDB, so that the editor interface can pick it up:

SELECT CDB_CartodbfyTable('ne_10m_admin_0_countries_subdivided');

Now we have the same data, but no object has more than 255 vertices (about 4KB in size)!
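If you want even smaller pieces, ST_Subdivide also takes an optional maximum-vertex argument. A sketch, with a hypothetical table name:

```sql
-- ST_Subdivide(geom, max_vertices) caps each output piece at the given
-- vertex count; a smaller cap yields more, smaller polygons.
CREATE TABLE ne_10m_admin_0_countries_subdivided_64 AS
SELECT ST_Subdivide(the_geom, 64) AS the_geom, admin
FROM ne_10m_admin_0_countries;
```

There is a trade-off: more pieces mean more rows and more index entries, so the sweet spot depends on your data and queries.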

Subdivided Countries by Size in KB

Run the spatial join torture test again, and see the change!

SELECT Count(*)
FROM ne_10m_admin_0_countries_subdivided countries
JOIN ne_10m_populated_places_simple places
  ON ST_Contains(countries.the_geom, places.the_geom);

On my computer, the query now returns in about 0.5 seconds, or 60 times faster, even though the countries table has grown to 8,633 rows. The subdivision has accomplished two things:

  • Each polygon now covers a smaller area, so index searches are less likely to pull up points that are not within the polygon.
  • Each polygon is now below the page size, so retrieval from disk will be much faster.
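Since the subdivided pieces have disjoint interiors, each place should still fall inside exactly one piece, so the join results should agree with the original. A quick sanity check (assuming the places table has the `cartodb_id` column CartoDB adds on import):

```sql
-- Count distinct places matched against the subdivided table; this
-- should equal the count from the join against the original table.
SELECT Count(DISTINCT places.cartodb_id)
FROM ne_10m_admin_0_countries_subdivided countries
JOIN ne_10m_populated_places_simple places
  ON ST_Contains(countries.the_geom, places.the_geom);
```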

Subdividing big things can make map drawing faster too, but beware: once your polygons are subdivided you'll have to turn off the polygon outlines to avoid showing the funny square boundaries in your rendered map.

Happy mapping and querying!
