Thursday, November 05, 2009

Some Probability And Statistics On The Individual Time Trial

The Tour de France (TDF) is the marquee cycling event on the calendar for any top international pro cyclist as well as their squads. Everyone wants to do well here because it's arguably the biggest and most glamorous stage for displaying athletic talent. The competition is tough, the fans are many, the stages are epic and the prize money is fat.

In this post, I'm trying to figure out what kind of a statistical distribution is seen in the finishing times from this year's prologue TT (Tour de France). I will also try to quantify the probability of getting close to the fastest time trialist in the world. Alberto Contador tried pretty darn well. How well?

Only one way to find out these things.

So here's what I did.

Step 1 : I obtained Cyclingnews.com data for the TDF Prologue TT on July 4, 2009. I obtained 180 data points corresponding to all the competing cyclists.

Step 2 : To make sense of this data clutter, I put them into Microsoft Excel 2007 and ran a descriptive statistics analysis on it. Here's what I obtained. What you're about to see is powerful.

Fig 1 : Descriptive statistical figures for the finishing times of a sample set of 180 cyclists from the Tour de France 2009.
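If you'd rather not use Excel, the same kind of summary can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the finishing times are already in a list of decimal minutes, the function name `describe` is made up for illustration, and the skewness and kurtosis lines use the bias-corrected sample formulas that, as far as I understand the documentation, Excel's SKEW and KURT functions use. The sample list holds placeholder values, not the actual race data.

```python
import statistics as st

def describe(times):
    """Summary statistics for a list of finishing times (decimal minutes).

    Needs at least 4 values because of the kurtosis correction terms.
    """
    n = len(times)
    mean = st.mean(times)
    s = st.stdev(times)  # sample standard deviation (n - 1 denominator)
    z = [(t - mean) / s for t in times]
    # Bias-corrected sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n,
        "mean": mean,
        "median": st.median(times),
        "stdev": s,
        "avg_abs_dev": sum(abs(t - mean) for t in times) / n,
        "min": min(times),
        "max": max(times),
        "skewness": skew,
        "kurtosis": kurt,
    }

# Placeholder values only, NOT the real prologue results.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = describe(sample)
```

Feeding in the 180 finishing times instead of the placeholder list should give figures comparable to the ones in Fig 1.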

So is my sample set taken from a normal distribution or something different?
Let's try to answer that reasonably with the table above.

The mean, median and mode are very close to each other, which MAY indicate it's normally distributed. The average absolute deviation of the cyclists from the mean was 0.63 min or 37.8 seconds. The minimum time belonged to Fabian Cancellara, with a blitzy 19.53 mins, whereas the maximum time belonged to Yauheni Hutarovich. I also have a Kurtosis and Skewness of 0.558 and -0.068 respectively.

Positive Kurtosis indicates a relatively appreciable peak, which makes me suspect the distribution is leptokurtic (more sharply peaked than a normal curve). The book Using Multivariate Statistics (Tabachnick & Fidell, 1996) explains that if my Kurtosis statistic is more than 2 x sqrt(24/180) = 0.73, the data is not normally distributed. Since 0.558 is less than 0.73, we're ok.

Negative Skewness indicates that my data is left skewed. The same book mentioned above explains that if the size of my Skewness statistic (ignoring its sign) is more than 2 x sqrt(6/180) = 0.365, the distribution is not normal. Since 0.068 is less than 0.365, we're ok here as well.
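The two rule-of-thumb checks above are easy to reproduce. Here is a small sketch, assuming the limits are simply twice the approximate standard errors sqrt(6/N) and sqrt(24/N) quoted above; the function names are hypothetical and this is a quick screen, not a formal normality test.

```python
import math

def rule_of_thumb_limits(n):
    """Return (skewness_limit, kurtosis_limit), each 2 x the approximate standard error."""
    return 2 * math.sqrt(6 / n), 2 * math.sqrt(24 / n)

def looks_normal(skewness, kurtosis, n):
    """True if both statistics fall inside their rule-of-thumb limits."""
    skew_limit, kurt_limit = rule_of_thumb_limits(n)
    # Compare magnitudes, so a negative skew is judged by its size.
    return abs(skewness) <= skew_limit and abs(kurtosis) <= kurt_limit

# Values quoted in the post: skewness -0.068, kurtosis 0.558, N = 180.
skew_limit, kurt_limit = rule_of_thumb_limits(180)
ok = looks_normal(-0.068, 0.558, 180)
```

With N = 180 the limits come out at roughly 0.365 and 0.73, and both statistics from the table fall inside them.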

Step 3 :
The above only gives rough indications of the type of distribution. Nothing beats setting up a visual of the spread. So I made a histogram, with a chosen bin width of 0.20 min.
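For the curious, the binning behind a histogram like this can be sketched as follows. It assumes times in decimal minutes and the 0.20 min (12 second) bin width mentioned above; the function name `histogram`, the `start_min` parameter and the sample times are made up for illustration and are not the real results.

```python
from collections import Counter

def histogram(times_min, width_min=0.20, start_min=None):
    """Count finishing times per bin; returns {bin_index: count}.

    Bin i covers [start + i*width, start + (i+1)*width). Times are converted
    to whole seconds first to avoid floating point trouble at bin edges.
    """
    secs = [round(t * 60) for t in times_min]
    width_s = round(width_min * 60)  # 0.20 min -> 12 s
    start_s = min(secs) if start_min is None else round(start_min * 60)
    return dict(Counter((s - start_s) // width_s for s in secs))

# Placeholder times in decimal minutes, NOT the real prologue results.
counts = histogram([19.5, 19.6, 19.75, 20.0], start_min=19.4)
```

Each key is a bin number counted from the starting time, so bin 0 here covers 19.40 to 19.60 min.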

Fig 2 : The histogram for the data set. Please see source of data on CyclingNews.

The graph agrees with the skewness and kurtosis statistics. The data has central tendency but is ever so slightly skewed towards the left. This is the data for the best cyclists in the world. Not really a Gaussian, but not too far away from it either. What kind of distribution it is will take more analysis and tests for goodness of fit, which I'm going to tackle some other time.

So What Does All This Mean?

Looking at the data and Fig 2, we can say that the course conditions in Monaco on that July day were such that nearly 48% of all 180 cyclists managed to get times below the average, which might mean they were pretty fit and came well prepared (or something else worked in their favor which I can't quantify). Thus, the average time, i.e. 21 min and 30 seconds, sits at the 48th percentile.

To put it in another fashion, the probability of a world class cyclist racing on this course in a time less than the average time is 0.48.

52% of the 180 performed under par, with about 8% of that 52% giving exactly average times. The probability is 0.52 that a cyclist is at average time or above it on this course.
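The 48/52 split is just a count of who finished under the mean. A minimal sketch, assuming a list of times in decimal minutes (the function name and the sample numbers are placeholders, not the race data):

```python
import statistics as st

def split_around_mean(times):
    """Return (share strictly below the mean, share at or above the mean)."""
    mean = st.mean(times)
    below = sum(t < mean for t in times) / len(times)
    return below, 1 - below

# Placeholder values only; the mean of this list is 4.
below, at_or_above = split_around_mean([1, 2, 3, 4, 10])
```

Note that these are observed shares for one race on one course, which is how the probabilities above should be read.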

We can also say that 72% of the 180 cyclists lie within one standard deviation of the average, 93% lie within two standard deviations and 99% lie within three standard deviations. Pretty close to the 68-95-99.7 rule obeyed by normal distributions, eh?
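Here is a rough sketch of that comparison: the observed share of riders within k standard deviations of the mean, next to what a true normal distribution would give. The function names are hypothetical and the sample list is a placeholder, not the race data.

```python
import math
import statistics as st

def fraction_within(times, k):
    """Observed share of times within k sample standard deviations of the mean."""
    mean = st.mean(times)
    s = st.stdev(times)
    return sum(abs(t - mean) <= k * s for t in times) / len(times)

def normal_fraction(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Placeholder values only, NOT the real prologue results.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [fraction_within(sample, k) for k in (1, 2, 3)]
expected = [round(normal_fraction(k), 4) for k in (1, 2, 3)]
```

The `expected` list is where the 68-95-99.7 figures come from; the observed 72%, 93% and 99% can be set against it directly.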

Alberto Contador Vs Fabian Cancellara As Time Trialists

Our last question is the most interesting. So if you're a top pro at the peak of your abilities, what are your chances of ever getting close to Fabian Cancellara's blitzkrieg results? Then the next question would be, how close do you want to get to 'Spartacus'? Within 2%? 3%?

Let's do 2% as a start. 2% of Cancellara's 19.53 min works out to a 23 second difference. Now that's probably the limit of what a time trialist can accept to cap the gap, so to speak!!

Let's look at what Contador obtained that day from the data. Bert raced the course 18 seconds slower than Cancellara for an amazing second place. In other words, there was a mere 1.54% time difference between the best all round cyclist in the world and the fastest time trialist in the world. Just 4 cyclists managed to come within 2% of Cancellara's time - Contador, Wiggins, Kloden and Evans. 4/180 = 0.022, or about 2%.

In other words, just 2% of the 180 cyclists got a time less than or equal to 19 minutes and 55 seconds (this 2% window we're talking about).
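The arithmetic behind the 2% window can be checked with a short sketch. It uses only figures quoted in this post (19.53 min for Cancellara, an 18 second gap for Contador, 4 riders out of 180); the function names are made up for illustration.

```python
def gap_window_seconds(best_min, pct):
    """Seconds corresponding to pct percent of the best time (given in minutes)."""
    return best_min * 60 * pct / 100

def percent_gap(best_min, gap_s):
    """A gap in seconds expressed as a percentage of the best time."""
    return 100 * gap_s / (best_min * 60)

def format_mmss(seconds):
    """Format a duration in seconds as M:SS, rounded to the nearest second."""
    m, s = divmod(round(seconds), 60)
    return f"{m}:{s:02d}"

best = 19.53                            # Cancellara's time, decimal minutes
window = gap_window_seconds(best, 2)    # about 23 seconds
cutoff = best * 60 + window             # slowest time still inside the window
contador = percent_gap(best, 18)        # Contador's 18 s gap as a percentage
share = 4 / 180                         # riders inside the window
```

Here `format_mmss(cutoff)` gives the 19:55 cutoff, `contador` rounds to 1.54 and `share` is about 0.022, or roughly 1 in 45.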

Put in another way, this is the 2nd percentile. This is where the glory is at. And the money. And the kisses from the long legged European girls.

The probability that you're in this 23 second window from the best man on the bike is low. Just 0.022, or a 1 in 45 chance. Keep in mind this is for the best in the world.

Now you know why you and I are not racing in the Tour de France. Let's just scratch our butts and cheer these beasts on.

* * *

1. Anonymous, 7:32 PM

I hear there's a 1 in 40 chance of winning "Mega Millions" http://www.encyclopediabranigan.com/2009/04/there-is-1-in-40-chance-of-winning-with.html

:)

2. Yeah damn the Tour de France. Go win the bucks haha

3. Contador is definitely in league to thwart Cancellara. He just needs to mature as a time trialist but considering he has overall GC plans, I doubt he would give one discipline so much undue attention. Definitely two of the best athletes in pro cycling of today.

4. A bit late here but terrific stuff, Ron.

Thanks!

5. Anonymous, 5:17 PM

The main conclusion I find after reading your analysis is that if you want to win the Tour de France your time trial skills have to be pretty close to the best time trial rider, Mr Cancellara. That's why Contador, Wiggins and Evans are there in the top!!! Andy would be the exception though. So for the next Tour de France, look for the first 3 or 4 riders in the prologue and gamble some money on them!!!

6. Nice blog. Sorry to nit pick, but your analysis is flawed. It is not using a random sample.

Short of Cancellara crashing, I know that I, as a recreational rider, have ZERO chance of coming within 2% of his TT time. There are very few in the peloton who have a sustainable power output comparable to Cancellara's.

The proper way to calculate the "Probability of being within 2% of Fabian Cancellara's TT Times at the TDF" would be to take a sample or recent history of Cancellara's, say, 40K TT times or average power output for such an effort and build an appropriate confidence interval.

7. Sorry I titled it wrong. It is not the "Probability of being within 2% of Fabian Cancellara's TT Times at the TDF" but rather a calculation of the 2nd percentile of the distribution. I will change this title.

Otherwise, I don't see any flaws.

8. I understand you better, but your post-hoc analysis is more than just the title.

"Alberto Contador Vs Fabian Cancellara As Time Trialists

Our last question is the most interesting. So if you're a top pro at the peak of your abilities, what are you chances of ever getting close to Fabian Cancellara's blitzkrieg results?"

The answer is to increase your average power output over 40K. Your distribution of "top pros" includes sprinters (high power output over short period of time), TT specialists and climbers (high Power/weight).

If Fabian Cancellara's average power output (or average speed) is X and Alberto Contador's is X - Y, then you can build _respective_ (normal) distributions for them and then see where their distributions overlap. (Alternatively, confidence intervals work well, too.) If you do the same for Fabian and another "Top Pro" cyclist, who incidentally rides as a domestique for a team, you can see how such distributions do not overlap and hence the domestique has zero chance of winning or coming within % percent of Fabian's time.

-Peter
P.S. I enjoyed reading your "The Anatomy Of A Cancellara Attack" work.
