Let’s imagine the following scenario…
There are five geostationary satellites around the earth, and you need your map to automatically pan to the center of their locations so you can see the rough region of their coverage. On a number line this isn’t a very hard problem, since it’s like finding a center of mass or a mean for some arbitrary quantity \(x\):
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \]
Latitude works this way, but longitude is different.
Why Longitude is Different
What makes longitude difficult is that points that are actually very close together can be very far apart in the coordinate system we commonly use. Take the following example: Tonga is located at 175.2° W and Tuvalu is at 179.2° E. How far apart are they? Well, it should be 5.6° of longitude, since they are 4.8° and 0.8° from 180° W/E, respectively. But since we normally view longitude as a number line [-180°, 180°] where East is positive and West is negative, these two points are
\[ 179.2^\circ - (-175.2^\circ) = 354.4^\circ \]
apart! Which answer is correct? Well, they both are, since they measure the two arcs of the same circle, but the ambiguity causes some annoying problems when we try to devise a way of finding the center of points on a map. In our example with Tonga and Tuvalu, the first solution would be well zoomed into the islands, whereas the second would be zoomed out enough to encompass 354.4° of the globe.
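To make the two answers concrete, here is a minimal Python sketch that computes both separations for the longitudes quoted above. The function names `naive_separation` and `wrapped_separation` are illustrative choices, not part of CartoDB or of the original post.

```python
def naive_separation(lon_a, lon_b):
    """Distance between two longitudes treated as points on a number line."""
    return abs(lon_a - lon_b)


def wrapped_separation(lon_a, lon_b):
    """Shortest angular distance between two longitudes, in degrees (0 to 180)."""
    diff = abs(lon_a - lon_b) % 360.0
    return min(diff, 360.0 - diff)


tonga_lon = -175.2   # 175.2° W, as quoted in the text
tuvalu_lon = 179.2   # 179.2° E, as quoted in the text

print(naive_separation(tonga_lon, tuvalu_lon))    # about 354.4
print(wrapped_separation(tonga_lon, tuvalu_lon))  # about 5.6
```

The wrapped version simply picks the shorter of the two arcs, which is the one a map would want to zoom to.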
Still working with Tonga and Tuvalu, what would be the mean location of these two countries? The mean is the sum divided by the number of points, so we get
\[ \frac{179.2^\circ + (-175.2^\circ)}{2} = 2^\circ \]
or 2° to the east of the prime meridian. That seems odd because that’s on the other side of the world.
Since we are getting results that seem a little nonsensical, there must be a clearer solution to our problem.
Finding another way
The problem here lies in viewing longitude as a number line where the end points are far apart, when in fact the end points represent the same physical location. One way to get around this difficulty involves mapping the line segment to a two-dimensional ring of radius one (a unit circle). This is convenient because now -180° and +180° lie at the same point in this new two-dimensional coordinate system: \((-1,0)\).
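As a small illustration of this mapping, the Python sketch below converts a longitude in degrees to a point on the unit circle and checks that both ends of the number line land on the same spot. The helper name `lon_to_unit_circle` is a hypothetical one chosen for this example.

```python
import math


def lon_to_unit_circle(lon_deg):
    """Map a longitude in degrees to a point (x, y) on the unit circle."""
    theta = math.radians(lon_deg)
    return math.cos(theta), math.sin(theta)


# Both ends of the [-180°, 180°] number line map to (-1, 0),
# up to floating-point rounding in the y component.
for lon in (-180.0, 180.0):
    x, y = lon_to_unit_circle(lon)
    print(lon, round(x, 12), round(y, 12))
```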
Below is an image of the satellite problem mentioned above. Five satellites are arranged around the earth. The coordinate system on the left represents the longitude as we normally measure it; the coordinate system on the right is the transformed longitude with degrees at appropriate locations to show the mapping. As you can easily see on the map, taking the normal mean of longitude (the red ×) gives a value that’s way off compared to the correct value (the gold star). If the coordinate system were just a number line without periodic boundary values, though, you could balance the line on your finger at the red ×.
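The transformation that helps us find the gold star is the following:
\[ \zeta_i = \sin\left(\frac{\pi\,\mathrm{lon}_i}{180}\right), \qquad \xi_i = \cos\left(\frac{\pi\,\mathrm{lon}_i}{180}\right) \]
where the index \(i\) means that it is the \(i\)th entry in the data table (of \(N\) entries), and \(\mathrm{lon}_i\) is that entry’s longitude in degrees.
To find our center of points, we now need to average over all the \(\zeta_i\) and \(\xi_i\) (which we’ll call \(\bar{\zeta}\) and \(\bar{\xi}\)), and then find the angle that the averaged point \((\bar{\xi}, \bar{\zeta})\) makes with the positive \(\xi\)-axis. The angle can be calculated by finding the inverse tangent. To be careful about which quadrant is chosen, we will use atan2, the two-argument inverse tangent that preserves the signs of the inputs:
\[ \overline{\mathrm{lon}} = \frac{180}{\pi}\,\operatorname{atan2}\left(\bar{\zeta}, \bar{\xi}\right) \]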
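The steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming longitudes are given in degrees; the function names `circular_mean_lon` and `naive_mean_lon` are made up for this example, and the two input longitudes are the Tonga and Tuvalu values from the data table in this post.

```python
import math


def circular_mean_lon(lons_deg):
    """Mean longitude in degrees, via averaged unit-circle components and atan2."""
    n = len(lons_deg)
    zeta = sum(math.sin(math.radians(lon)) for lon in lons_deg) / n  # mean sine
    xi = sum(math.cos(math.radians(lon)) for lon in lons_deg) / n    # mean cosine
    return math.degrees(math.atan2(zeta, xi))


def naive_mean_lon(lons_deg):
    """Straight arithmetic mean of the longitudes, for comparison."""
    return sum(lons_deg) / len(lons_deg)


lons = [-175.2, 179.2167]  # Tonga, Tuvalu
print(round(naive_mean_lon(lons), 3))     # about 2.008
print(round(circular_mean_lon(lons), 3))  # about -177.992
```

Note that the result is undefined when the averaged point sits at the origin (for example, two points exactly 180° apart), since there is no preferred direction in that case.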
This calculation for longitude, and the one mentioned above for latitude, can easily be run in CartoDB by placing the following SQL command in the SQL editor. The values calculated are: avg_lon, based on the discussion just above; avg_lon_naive, based on a straight average of longitude; and avg_lat, the average latitude.
Data Table
country   lat        lon
-------   --------   ---------
Tonga     -21.1333   -175.2
Tuvalu    -8.53333   179.2167
SQL
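SELECT
  -- angle of the averaged point, converted back to degrees
  180 * atan2(s.zeta, s.xi) / pi() AS avg_lon,
  s.avg_lon_naive AS avg_lon_naive,
  s.avg_lat AS avg_lat
FROM (
  SELECT
    -- longitude is converted from degrees to radians
    avg(sin(pi() * lon / 180)) AS zeta,
    avg(cos(pi() * lon / 180)) AS xi,
    avg(lon) AS avg_lon_naive,
    avg(lat) AS avg_lat
  FROM pacific_islands
) AS s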
Aside: If you are performing this calculation over a large number of points, the SQL function sum() will probably be faster than avg(). Since tangent is defined as opposite / adjacent, any common positive factor in the two arguments of atan2 cancels. This means that the denominator used in the averages (the number of points) cancels for both zeta and xi, so avg() can be replaced by sum() for those calculations.
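The cancellation described in the aside can be checked numerically. The Python sketch below computes the angle once from averaged components and once from plain sums; the function names are hypothetical, and the longitude list mixes the two values from the data table with two made-up ones purely for illustration.

```python
import math


def mean_lon_with_avg(lons_deg):
    """Angle from averaged sine and cosine components."""
    n = len(lons_deg)
    zeta = sum(math.sin(math.radians(lon)) for lon in lons_deg) / n
    xi = sum(math.cos(math.radians(lon)) for lon in lons_deg) / n
    return math.degrees(math.atan2(zeta, xi))


def mean_lon_with_sum(lons_deg):
    """Same angle from plain sums; the shared 1/n factor is skipped."""
    zeta = sum(math.sin(math.radians(lon)) for lon in lons_deg)
    xi = sum(math.cos(math.radians(lon)) for lon in lons_deg)
    return math.degrees(math.atan2(zeta, xi))


# First two longitudes are from the table; the last two are invented.
lons = [-175.2, 179.2167, 170.0, -160.0]
print(mean_lon_with_avg(lons), mean_lon_with_sum(lons))
```

The two printed values should agree up to floating-point rounding, because scaling both arguments of atan2 by the same positive number does not change the angle.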
Output
avg_lon    avg_lon_naive   avg_lat
--------   -------------   --------
-177.992   2.00833         -14.8333
The result for avg_lon makes sense because it is in between Tuvalu and Tonga. avg_lon_naive is just wrong: it lands on the opposite side of the globe. Notice that we have to convert degrees to radians and then back in the process. Also notice that the two calculated longitudes are 180° apart in this two-point case. For a larger dataset, the avg_lon_naive result becomes more nonsensical.
Using the dataset linked at the bottom, you can see the data visualized on the map. It’s abundantly clear that the naive approach to finding the mean longitude does not work.
Output
avg_lon    avg_lon_naive   avg_lat
--------   -------------   --------
176.614    36.6444         -11.4833
Weighted Center of Points
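These results can be generalized to a weighted center of points. If the column you want to weight by is \(w\), then we get
\[ \overline{\mathrm{lon}}_w = \frac{180}{\pi}\,\operatorname{atan2}\left( \frac{\sum_i w_i \sin(\pi\,\mathrm{lon}_i/180)}{\sum_i w_i},\ \frac{\sum_i w_i \cos(\pi\,\mathrm{lon}_i/180)}{\sum_i w_i} \right) \]
Since tangent takes the ratio of the arguments, the denominators cancel (as long as the total weight is positive), and our equation simplifies in a way that makes it easier for the computer!
\[ \overline{\mathrm{lon}}_w = \frac{180}{\pi}\,\operatorname{atan2}\left( \sum_i w_i \sin(\pi\,\mathrm{lon}_i/180),\ \sum_i w_i \cos(\pi\,\mathrm{lon}_i/180) \right) \]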
In SQL, this equation would look like:
SELECT
  180 * atan2(s.zeta_w, s.xi_w) / pi() AS avg_w_lon,
  s.avg_w_lon_naive AS avg_w_lon_naive,
  s.avg_w_lat AS avg_w_lat
FROM (
  SELECT
    -- weighted sums; the common sum(w) denominator cancels inside atan2
    sum(w * sin(pi() * lon / 180)) AS zeta_w,
    sum(w * cos(pi() * lon / 180)) AS xi_w,
    sum(w * lon) / sum(w) AS avg_w_lon_naive,
    sum(w * lat) / sum(w) AS avg_w_lat
  FROM pacific_islands
) AS s
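For readers who want to experiment outside the SQL editor, the weighted version can be sketched in Python as follows. This mirrors the weighted sums in the query rather than reproducing any CartoDB internals; the function name `weighted_circular_mean_lon` and the weights used are hypothetical, while the longitudes are the Tonga and Tuvalu values from the data table.

```python
import math


def weighted_circular_mean_lon(lons_deg, weights):
    """Weighted mean longitude in degrees, assuming the total weight is positive."""
    zeta_w = sum(w * math.sin(math.radians(lon)) for lon, w in zip(lons_deg, weights))
    xi_w = sum(w * math.cos(math.radians(lon)) for lon, w in zip(lons_deg, weights))
    return math.degrees(math.atan2(zeta_w, xi_w))


lons = [-175.2, 179.2167]  # Tonga, Tuvalu

# Equal weights reproduce the unweighted result (about -177.99).
print(weighted_circular_mean_lon(lons, [1.0, 1.0]))

# Made-up weights: giving Tonga three times the weight pulls the center toward -175.2.
print(weighted_circular_mean_lon(lons, [3.0, 1.0]))
```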
Pro Tip: If you don’t have an explicit latitude or longitude column, you can use the information directly from the_geom by replacing lat with ST_Y(the_geom) and lon with ST_X(the_geom).
We’re all about open data here at CartoDB. If you want to clone the map and data, see them both here. Happy mapping!