Tuesday, 30th September, 2014

R-R-Running with R
(Plotting GPS tracks and heart rate data)

I have recently been toying with the idea of getting better versed in the modern tools used in data analysis and so-called big data. One of the most often mentioned tools in this arena is R, the free software environment for statistical computing and graphics. Since I like to practice with new programming languages, I had already been looking for a couple of months for a good excuse to try R in a real application.

Until recently, I have been logging all my running activities on online services, which provide a nice overview of each run individually and a convenient way of browsing through old entries. However, this has always felt a bit silly: while having the data online is good for sharing on social media, it makes little sense for anything that interests only me. Nevertheless, unaware of any good offline tools for this purpose, I was stuck with the online solutions. As you can guess, this is where R came to the rescue. I can't remember how it happened, but this Sunday I landed on Mollie Taylor's blog, where she discussed Mapping GPS Tracks in R. Now I had the long-awaited excuse for getting my hands dirty with R.

Obtaining data from the GPS watch

I already had the GPS and heart rate data from my Garmin Forerunner 405 transferred to the hard drive with Braiden Kindt's python-ant-downloader, and Mollie pointed out in her blog that the tcx files I had could be converted into csv using GPSBabel. What I still needed to figure out myself was how to use OpenStreetMap instead of data from one specific colossal corporation, how to include several plots within one figure, and how to populate those plots with the data I wanted.

Calculating accurate distance from GPS data

The trickiest part turned out to be calculating accurate distances from the GPS coordinates. For reasons I can't fully comprehend, almost everybody assumes that the Earth is a perfect sphere when calculating the distance. This is obviously not true, and since I often run on different continents, I wanted to get this one nailed down. Of course, I wasn't the first one looking for a solution to this problem. The most reasonable implementation for R, by Mario Pineda-Krch, can be found on r-bloggers.com. It is based on JavaScript code by Chris Veness available at movable-type.co.uk (attribution license). (The theory behind the method is by Thaddeus Vincenty.) This formula, too, disregards altitude variations, which can be significant if you like to run in hilly areas like I do. So, I needed to add that part myself.
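
To make the idea concrete, here is a minimal sketch of Vincenty's inverse formula on the WGS-84 ellipsoid, followed by one possible altitude correction. It is written in Python for illustration and is not the runmap R code. The function names vincenty_distance and distance_3d are hypothetical, and combining the horizontal distance with the altitude difference via Pythagoras is an assumption that should only be reasonable for the short hops between consecutive GPS points.

```python
import math

# WGS-84 ellipsoid parameters
WGS84_A = 6378137.0                 # semi-major axis (m)
WGS84_F = 1 / 298.257223563         # flattening
WGS84_B = (1 - WGS84_F) * WGS84_A   # semi-minor axis (m)


def vincenty_distance(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Ellipsoidal surface distance in metres; inputs in decimal degrees."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    a, b, f = WGS84_A, WGS84_B, WGS84_F
    # Reduced latitudes
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    L = math.radians(lon2 - lon1)
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam,
                               cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos_sq_alpha = 1 - sin_alpha ** 2
        # cos_sq_alpha is zero only for lines along the equator
        if cos_sq_alpha:
            cos_2sigma_m = cos_sigma - 2 * sinU1 * sinU2 / cos_sq_alpha
        else:
            cos_2sigma_m = 0.0
        C = f / 16 * cos_sq_alpha * (4 + f * (4 - 3 * cos_sq_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (
                cos_2sigma_m + C * cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        # Can happen for nearly antipodal points, not for running tracks
        raise ValueError("Vincenty formula failed to converge")

    u_sq = cos_sq_alpha * (a * a - b * b) / (b * b)
    A = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    B = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = B * sin_sigma * (cos_2sigma_m + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)
        - B / 6 * cos_2sigma_m * (-3 + 4 * sin_sigma ** 2)
        * (-3 + 4 * cos_2sigma_m ** 2)))
    return b * A * (sigma - delta_sigma)


def distance_3d(lat1, lon1, alt1, lat2, lon2, alt2):
    """Assumed altitude correction: treat the horizontal distance and the
    altitude difference (metres) as the legs of a right triangle."""
    horizontal = vincenty_distance(lat1, lon1, lat2, lon2)
    return math.hypot(horizontal, alt2 - alt1)


if __name__ == "__main__":
    # Two made-up track points roughly 100 m apart with a 12 m climb
    print(distance_3d(40.7700, -73.9700, 20.0, 40.7709, -73.9700, 32.0))
```

Summing distance_3d over consecutive track points would then give the cumulative distance along the run, hills included.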

Plotting the pace

Another point to consider was that the point-by-point pace information was far too noisy to be plotted directly from the data. I ended up solving this by calculating the average pace for each 100 m stretch and plotting those averages instead of each individual point. The result seems to match reasonably well the pace variations I noticed while running.
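
The averaging step could look roughly like the sketch below. It is written in Python for illustration and is not the runmap R code. The function name pace_per_segment and its inputs (cumulative distance in metres and elapsed time in seconds for each track point) are assumptions. A segment is closed at the first track point that lies at least 100 m past the segment start, so segments can be slightly longer than 100 m.

```python
def pace_per_segment(cum_dist_m, elapsed_s, segment_m=100.0):
    """Average pace per segment.

    cum_dist_m: cumulative distance at each track point (metres)
    elapsed_s:  elapsed time at each track point (seconds)
    Returns a list of (distance_at_segment_end_m, pace_min_per_km) tuples.
    """
    if not cum_dist_m:
        return []
    paces = []
    start_d, start_t = cum_dist_m[0], elapsed_s[0]
    for d, t in zip(cum_dist_m, elapsed_s):
        covered = d - start_d
        if covered >= segment_m:
            minutes = (t - start_t) / 60.0
            pace = minutes / (covered / 1000.0)  # minutes per kilometre
            paces.append((d, pace))
            # The next segment starts where this one ended
            start_d, start_t = d, t
    return paces


if __name__ == "__main__":
    # Made-up data: a point every 10 m, every 3 s
    dist = [10.0 * i for i in range(31)]
    time = [3.0 * i for i in range(31)]
    for end_m, pace in pace_per_segment(dist, time):
        print(f"{end_m:6.0f} m  {pace:.2f} min/km")
```

Any leftover stretch shorter than 100 m at the end of the run is simply dropped in this sketch.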

End result

The end result of my Sunday coding is R code that produces figures such as this one:

An 8.5 km run in New York City in 2014

Code is available on GitHub

The runmap code itself is available on GitHub under the MIT license. I hope others will find it useful too.
