Data visualization & Business Intelligence predict Wimbledon final

The action is heating up in the men’s draw at the All England Club. The 2011 Wimbledon Championship has delivered some scintillating matches, match-ups and upsets.

Aussie young gun Bernard Tomic cut a swath through his highly fancied opponents, dispensing with the likes of Davydenko and Soderling, before going down to world number two Novak Djokovic in an admirable quarter-final performance.

Eighth seed, American Andy Roddick, bowed out in the third round, beaten in straight sets by Spain’s Feliciano Lopez. Infectiously exuberant Frenchman Jo-Wilfried Tsonga, meanwhile, came back from two sets down to overcome the eternally graceful master of the court, Roger Federer, in five.

At this point, you’re probably feeling as though you need a crystal ball to predict the outcome of this year’s men’s final.

But we’re an innovative yet well-considered lot at Yellowfin – we don’t like leaving things to chance. A crystal ball, or any other sort of gypsy magic, just won’t do. So we’ve enlisted the help of our trusty companions – Business Intelligence (BI) and data visualization – to remove the guesswork, and enable us to make a fact-based prediction.

Now, let’s find out whose mantelpiece will sparkle that little bit brighter, and how hard they’ll have to grunt, sweat and dummy-spit to get it.

What’s that I hear you say? We’re being too cocky, too self-assured. Maybe.

But do you remember our data blog regarding the Australian Open – Data visualization and BI reveal a gamblers’ guide to the Australian Open? If you do, you’ll also remember that our data-based predictions were eerily accurate.


Average length of Wimbledon’s men’s final in minutes (1900 – 2010)

Analyze the average length of the men’s final at the world’s premier grass court event, and the first thing you’ll notice is that matches are getting longer – odds are there’ll be plenty of fist pumps, end changes and line-calls-gone-wrong. The average number of minutes played in the final has steadily risen from 104 in 1900 to 188 in 2010.


There are a number of potential factors at ‘play’. Players have undoubtedly become fitter and more highly skilled as the sport moved from amateur pastime to professional moneymaking exercise, a shift cemented by the birth of the Open Era in 1968. Curiously, based on this data set, the introduction of the tie-break did not have a shortening effect on matches. Invented by James Van Alen in 1965, the tie-break was introduced at Wimbledon in 1971. It originally came into effect when the score in any set, except the last, reached eight games apiece. In 1979, this changed to six games apiece, again excluding the final set.

Average length of Wimbledon’s men’s final in minutes (1985 – 2010)

Despite the obvious upward trend, there’s a noticeable plateau from around 1985 to 2005.


However, minimal archive dredging quickly reveals the reason. This period was dominated by a raft of now household names who mercilessly aced, smashed and drove their feeble final opponents into near humiliation: McEnroe (’83, ’84), Becker (’85, ’86, ’89), Sampras (’93, ’94, ’95, ’97, ’98, ’99, ’00) and Federer in the early noughties.

From 2005 onwards, there was a noticeable surge in minutes spent on court during the men’s final at Wimbledon, thanks to several epic face-offs between arch nemeses and once-in-a-lifetime players of contrasting style, Roger Federer and Rafael Nadal.

Number and moving average number of sets played in Wimbledon’s men’s final (1900 – 2010)

The period between 1955 and 1969 is characterized by an interesting contrast. Whilst the number of minutes spent on court continued to climb, the average number of sets dropped markedly, from 4.1 in 1955 to 3.2 in 1969. So whilst the likes of Laver, Emerson and Newcombe clearly dominated their opponents in straight sets, perhaps the encounters weren’t entirely lop-sided.


However, the dip in the number of sets played from the mid 80s to mid 00s coincides with the aforementioned supremacy of McEnroe, Becker, Sampras and Federer.
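
For readers who want to reproduce the smoothing themselves, here is a minimal sketch of a trailing moving average over sets played per final. The window size and the set counts below are placeholders chosen for illustration, not the values behind our chart, and `moving_average` is a hypothetical helper rather than a Yellowfin function.

```python
def moving_average(values, window):
    """Trailing moving average: one value per full window of `window` items."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Placeholder set counts (each final lasts 3, 4 or 5 sets); NOT real results.
sets_played = [3, 4, 5, 3, 3, 4, 5, 5]

# Assumed window of four finals; the chart's actual window is not stated.
print(moving_average(sets_played, window=4))
```

A wider window gives a smoother line but reacts more slowly to a run of straight-sets finals, so it is worth trying a few window sizes before reading too much into any one dip.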

Number of sets played vs expected victories and upsets in Wimbledon’s men’s final (1900 – 2010)

Now let’s check out the effect that expected victories and upset wins have on the number of sets played.

For the purpose of this analysis, we’re defining ‘expected victories’ as those finals won by the player with superior ranking, and ‘upset wins’ as those finals won by players with a lower world ranking than their opponent.


Just as we suspected. You can see a direct correlation between upset victories and the number of sets played. The grey bars represent an ‘expected victory’. We can see that the number of sets played (grey line) is, on the whole, significantly lower when everything goes according to plan. But when there’s white space, signaling an ‘upset win’, the number of sets played is almost always higher – signaling an epic struggle, as the favorite submits to unexpected defeat. Oh, except for between 1985 and 1990, with six hardly-unforeseen ‘upset’ victories going to Becker, Cash and Edberg.
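
If you’d like to run the same comparison on your own data, the definition above boils down to a simple rule. The sketch below is an illustration only: the function names are hypothetical, rankings are assumed to be numbers where 1 is best, and the sample finals are invented to show the mechanics rather than taken from the tournament record.

```python
from statistics import mean


def classify_final(winner_rank, loser_rank):
    """Label a final using the article's definition (lower number = better rank)."""
    if winner_rank == loser_rank:
        raise ValueError("finalists cannot share the same ranking")
    return "expected victory" if winner_rank < loser_rank else "upset win"


def average_sets_by_outcome(finals):
    """Average sets played for each outcome type.

    `finals` is an iterable of (winner_rank, loser_rank, sets_played) tuples.
    """
    grouped = {}
    for winner_rank, loser_rank, sets in finals:
        outcome = classify_final(winner_rank, loser_rank)
        grouped.setdefault(outcome, []).append(sets)
    return {outcome: mean(counts) for outcome, counts in grouped.items()}


# Invented sample rows for illustration; NOT real Wimbledon finals.
sample_finals = [(1, 2, 3), (1, 5, 3), (2, 1, 5), (4, 2, 4), (1, 3, 4)]
print(average_sets_by_outcome(sample_finals))
```

Comparing the two averages is a rough check on the pattern in the chart; it says nothing about why upsets tend to run longer.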

The bottom line

From 1935 to 2010, the favorite (assuming the better seeded or ranked player goes into the final as favorite) has won 54 times. So where’s your money?
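
To put that 54 in context, here is a back-of-the-envelope calculation. It rests on two assumptions of ours: that the count covers every final from 1935 to 2010 inclusive, and that the only gap is 1940 to 1945, when the Championships were not held because of World War II.

```python
FIRST_YEAR, LAST_YEAR = 1935, 2010
NO_TOURNAMENT = range(1940, 1946)  # 1940-1945: Championships not held
FAVORITE_WINS = 54  # figure quoted in this post

# Assumption: one men's final per year in the period, except the war years.
finals_played = sum(
    1 for year in range(FIRST_YEAR, LAST_YEAR + 1) if year not in NO_TOURNAMENT
)
favorite_win_rate = FAVORITE_WINS / finals_played

print(f"{finals_played} finals, favorite won {favorite_win_rate:.0%}")
```

Under those assumptions the favorite has lifted the trophy in roughly three finals out of four.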