CartoDB has an excellent and easy-to-use synced tables feature, that reads data from a public URL on a regular schedule and keeps an associated CartoDB table up-to-date. However, synced tables require the data to be publicly available so CartoDB can read it, and that the destination table get over-written at each update.
For some users, these limitations are not acceptable–their data is not public or they want to continually append new information to their table.
Fortunately, it’s easy to create your own live data tables in CartoDB, updated on your own schedule, from your private data, using the ogr2ogr tool from the open source GDAL library. The support for CartoDB in GDAL has gotten a lot better recently, so use the latest version from SVN, or the upcoming GDAL 2.0 release.
Initial Load
You can load a fresh table directly into CartoDB using ogr2ogr like this (I’m loading my trusty 1:250K map grid for British Columbia):
ogr2ogr \
--config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \
-t_srs EPSG:4326 \
-f CartoDB \
"CartoDB:pramsey" \
g250.shp
The commandline syntax of ogr2ogr is not the most obvious thing in the world.
- The API key is required to allow write access into the database.
- The
-t_srs EPSG:4326
option tells ogr2ogr reproject the data to geographic coordinates, which is required for data to be published via the CartoDB Editor. - The
-f CartoDB
option tells ogr2ogr to use the CartoDB driver for writing. - The
CartoDB:pramsey
entry is the destination data, in this case, thepramsey
account at CartoDB. - The
g250.shp
entry is the source data.
Once the command has run though, the data are available in the CartoDB dashboard, ready for mapping. (It may take a minute for the dashboard to recognize the new table, be patient, and hit reload.)
Refreshing the Data
Once you’ve loaded and styled your data, and published your map, you want to keep the map up-to-date with the latest data. (Remember, if your data are public, then the easiest way to do this is with a synced table).
If you’ve read the ogr2ogr manual, you may want to use the -overwrite
option: don’t.
The overwrite option will drop the table, and then create a replacement. For some applications this would be fine, but for CartoDB it means the associated visualizations will also get dropped. For that reason we want to leave the table in place, but simply remove the data inside it before loading new data.
First, remove all the existing data. We do that by sending the
TRUNCATE
command directly to the SQL API to quickly delete all the records in the table.key=abcdefghijabcdefghijabcdefghijabcdefghij sql=`echo "TRUNCATE TABLE g250" | tr ' ' +` curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
Then, load new data into the table. Now that the table is empty, we can run ogr2ogr in
-append
mode to put the new records in place.ogr2ogr \ --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \ -append \ -t_srs EPSG:4326 \ -f CartoDB \ "CartoDB:pramsey" g250.shp
Now your table (and all the maps that depend on it) is full of the latest data, and you didn’t have to place the raw data on a public URL to make it happen.
Appending new Data
Sometimes applications with real-time data will not need to simply replace existing data, but rather add new data as it comes in. You can do this by just running ogr2ogr in -append
mode without first running the TRUNCATE
command.
The table will grow and grow and grow as you add data, and you may want to delete older data. If your raw data has a time stamp, you can use that as a key to delete old data. If it doesn’t, you can alter your table after import, to include a timestamp field that holds the date the data was loaded.
First, load your initial raw data.
ogr2ogr \ --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \ -t_srs EPSG:4326 \ -f CartoDB \ "CartoDB:pramsey" g250.shp
Then, add a time column with a default value, so that it gets filled in automatically when new data are inserted.
key=abcdefghijabcdefghijabcdefghijabcdefghij sql=`echo "ALTER TABLE g250 ADD COLUMN ts TIMESTAMP DEFAULT now()" | tr ' ' +` curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
Then, run your import in
-append
mode to see that new columns are timestamped with the time of import.ogr2ogr \ --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \ -append \ -t_srs EPSG:4326 \ -f CartoDB \ "CartoDB:pramsey" g250.shp
Finally, run a scheduled job to delete all records you consider too old to map. For example, 1 week old.
key=abcdefghijabcdefghijabcdefghijabcdefghij sql=`echo "DELETE FROM g250 WHERE ts < now() - '1 week'::interval" | tr ' ' +` curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
That’s all there is to it! The GDAL 2.0 release is expected in the next month or so, at which point binary builds for Windows and other platforms will be available. But for now, pull down the source, build, and start syncing your data with ogr2ogr!