Who was the best driver of the ball on the PGA Tour in 2013? Was it Luke List, who led in distance at 306.3 yards? Jerry Kelly, who was tops in accuracy at 71.81 percent fairways hit? Wait, it must have been Graham DeLaet, who was No. 1 in the total driving stat that combines rankings in distance and accuracy. Actually, the best driver in terms of gaining the most advantage from where his tee shots finished was Bubba Watson. He led in strokes gained/driving, which is not an official PGA Tour statistic -- yet -- but is calculated by Columbia professor Mark Broadie, who helped the tour develop the strokes gained/putting stat.
Strokes gained/driving is just one example of advancements in statistical analysis in golf enabled by the PGA Tour's ShotLink system, which tracks the results of every shot at most tournaments and officially began 10 years ago at the start of the 2004 season. It took a while to understand how best to utilize that mountain of data, but we have reached a point where a statistical revolution akin to those we've seen in other sports is well underway.
It can be argued that no sport needed a statistical advance more than golf. While traditional baseball stats like batting average and runs batted in are imperfect measures of performance, golf's standard stats are even more flawed. The old putting stats didn't account for where on the green a putt started, and although they were replaced in 2011 by strokes gained/putting, that was only a first step. Driving accuracy is still measured by fairways hit, which counts a shot that finishes harmlessly in the intermediate cut the same as a ball hit out of bounds. Iron play is still tracked by greens in regulation, which doesn't factor in whether a player aims at the center of the green or fires at the flag. In addition, there's no good stat for the short game, and none of the traditional stats account for the often large differences in course difficulty among players' schedules.
New statistics aren't the only outgrowths of the ShotLink information. Players and coaches are beginning to see the usefulness of the data to target improvements in their games and strategies, in some cases bringing in a "stats coach" for analysis. The USGA mines the data to monitor developments regarding past and possible future equipment regulations. Architects use it in redesigning tour courses. Journalists can report the exact distance of putts rather than estimating them. Golf fans can track the shots of any player in the field on pgatour.com.
It all grew out of the PGA Tour realizing in the mid-1990s that its score-reporting system would need an upgrade for the 21st century. Hand-held digital devices rather than a pencil and paper were clearly the way to go for walking scorers to update leader boards more efficiently, and while they were at it someone had the forward-thinking idea of devising a system that could report the precise result of every shot hit on tour. It took years of development and a major financial investment to map each course and provide the resources for a laser measuring system at each tour event, but ultimately ShotLink was born, giving the tour an extensive database currently managed along with technology partner CDW.
Suddenly, the tour had new stats like percentage of putts made from various distances, average distance from the hole on approach shots from various yardage ranges, tendencies to miss left or right with tee shots and many more. But how to make some sense out of all those numbers? The tour's idea was to utilize the brainpower of America's university system, spreading word that customized ShotLink data would be made available to researchers.
Coincidentally, Broadie at about the same time had started his own research project analyzing exactly the same kind of data with a large group of amateur golfers who were tracking the results of every shot. His goal was to discern and quantify the differences between golfers of different skill levels, and his tool was the concept of strokes gained.
In a nutshell, the idea is to compare the player's expected number of strokes to hole out before and after each shot, then subtract one for the stroke just played. If the shot improves his expected score by more than that one stroke, he has gained a commensurate amount (usually a fraction of a stroke); if it improves it by less, the difference goes down as a negative. For tour pros, the expected score from a given position is the tour average from that spot, based on reams of ShotLink data. Take a putt from just inside eight feet, which a pro is expected to make half the time, so his expected score from there is 1.5 strokes. A make gives him .5 strokes gained (1.5 minus the one stroke taken), and a miss that leaves a tap-in is -.5.
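To make the arithmetic concrete, here is a minimal Python sketch of the per-shot calculation. The expected-strokes table is hypothetical: only the 1.5 value for the eight-foot putt follows from the example above, and the 1.0 for a tap-in is an assumption rather than ShotLink data. The function name `strokes_gained` is illustrative.

```python
# Hypothetical tour-average expected strokes to hole out, keyed by putt
# length in feet. Illustrative values only, not actual ShotLink averages.
EXPECTED_PUTTS = {
    0.5: 1.0,  # tap-in: assumed to be holed every time
    7.9: 1.5,  # "just inside eight feet": made half the time
}


def strokes_gained(expected_before: float, expected_after: float) -> float:
    """Strokes gained on a single shot.

    The shot itself costs one stroke, so a shot that leaves the player
    exactly as well off as the tour average would be scores 0.0.
    """
    return expected_before - expected_after - 1.0


before = EXPECTED_PUTTS[7.9]
made = strokes_gained(before, 0.0)                    # holed: nothing left to play
missed = strokes_gained(before, EXPECTED_PUTTS[0.5])  # missed, leaving a tap-in

print(f"make: {made:+.1f}, miss: {missed:+.1f}")  # make: +0.5, miss: -0.5
```

The same function applies to any shot, from a drive to a chip, once there is an expected-strokes value for the starting and finishing positions.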
Thanks to ShotLink, Broadie was able to expand his research to the tour-pro level. Meanwhile, a trio of Massachusetts Institute of Technology researchers (Douglas Fearing, Jason Acimovic and Steven Graves) combed six years of ShotLink statistics and published a paper in 2010 showing how strokes gained could be used to develop a new putting statistic.
"We knew there was a problem with the putting stat," says Steve Evans, the PGA Tour's senior vice president of information systems. "Ultimately, we realized there would be value in exposing the data to the academic community."
The MIT paper coined the term "strokes gained" instead of Broadie's original "shot value." In fact, by whatever name, the analytical tool is not new. Something along those lines was suggested in 1968 by Australians Alastair Cochran and John Stobbs. Peter Sanders developed a similar system called ShotByShot and started working with LPGA players in 1989 (he now works with PGA Tour players and has a program for amateurs), and L.M. Landsberger wrote a paper on the subject for the World Scientific Congress of Golf in 1994.
A few others were ahead of their time. Bob (Cowboy) Ming used to track the results of his players' shots with each club and analyze the data when he caddied on the PGA Tour in the 1990s. Annika Sorenstam became the top player in the women's game partly through recording the results of every shot she hit starting as a teenage amateur -- even as a pro, she would quickly jot down the information after each hole -- and inputting the data into a spreadsheet. Short-game instructor Dave Pelz got his start by meticulously recording shot-by-shot results for players he followed around the course in the late 1970s. His analysis of the data is what convinced players to start carrying a third (and eventually a fourth) wedge.
It was ShotLink that enabled full implementation, and strokes gained/putting marked the turning point in recognition and acceptance of advanced statistical analysis. But strokes gained doesn't need to stop at the green's edge. The calculation can be made for any shot, although it gets more complicated because there are more variables. The tour has been studying how to expand strokes gained since 2011, but is moving slowly and carefully in its consideration of the concept for drives, approach shots and short-game shots.
"The primary challenge is basically the tee shot versus the approach shot," says Evans. "If someone hits what appears to be a poor second shot based on the data, what portion of that is attributed to the tee shot and what portion to the approach shot?"
The location of trees is not part of the ShotLink course mapping, nor is a terrible lie in the rough differentiated from a good one. Mapping trees is not the answer, because there are too many variables in the types of shots players can hit, including going over the trees. In any case, it would be hard to differentiate between a great recovery shot into a greenside bunker and a mediocre shot into the same bunker from a more favorable position or lie.
Broadie has tackled this problem by determining zones on tour courses from which a recovery shot is required based on historical data, and in those cases penalizing the shot that put the player in that position. Sanders mines the ShotLink data for the PGA Tour pros he now works with and is able to break down missed fairways into the categories of good miss, poor miss and no shot, developing an algorithm to determine the worst outcome.
But the tour has additional issues. While Broadie is typically looking at the long term, where inconsistencies even out, using recovery zones to determine the worth of shots could lead to misleading results for a given round. "We're being very obsessive about trying to accommodate all kinds of situations," Evans says. In an effort to make any new strokes-gained stats work in real time, the tour is trying to develop a computation that doesn't use historical data. It is hoping to avoid using subjective judgments by the walking volunteer scorers as to the difficulty of recovery shots, but in the end that might be part of the equation. (Currently the scorers make a note if a lie in the rough or bunker is "buried," but that information is generally not used.)
So, the tour is taking a gradual approach. Sometime during 2014, it will unveil the new statistic of strokes gained/tee to green. It essentially will subtract strokes gained/putting from a player's stroke differential versus the field in a given round. For example, if a player shoots 68, the field averages 71.0 and his strokes gained/putting is 1.0, his strokes gained/tee to green is 2.0 to account for the total of three strokes gained on the field. That's simple enough, but even this rollout has been delayed by some additional related computations.
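The tee-to-green figure is simple subtraction, sketched below in Python using the round from the example. This only illustrates the arithmetic as described; the tour's own version involves additional computations not detailed here, and the function names are hypothetical.

```python
def strokes_gained_total(score: int, field_average: float) -> float:
    """Strokes gained on the field for one round (positive = better than average)."""
    return field_average - score


def strokes_gained_tee_to_green(score: int, field_average: float,
                                sg_putting: float) -> float:
    """The part of the round's advantage that putting does not account for."""
    return strokes_gained_total(score, field_average) - sg_putting


# A 68 against a field average of 71.0, with 1.0 strokes gained putting.
print(strokes_gained_tee_to_green(68, 71.0, 1.0))  # 2.0
```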
Meanwhile, Broadie is constantly refining his analytics. For example, one difficulty with analyzing short-game and approach-shot data strictly by distance from the hole is the short-side situation. All else being equal, shorter shots should be easier, but that's often not the case on the PGA Tour, where missing on the short side of a hole location near the edge of the green often creates a difficult shot.
"I've spent quite a long time thinking about that problem," says Broadie, who is currently working with a grad student on a research project in that area. "We are taking a look at how we can adjust strokes gained for other factors that can be measured in the data, like whether a player is short-sided, the elevation change between the ball and the hole, the slope of the green, the angle relative to the fall line and the kind of lie if that is available."
Broadie has also spent quite a long time analyzing the data to determine the aspects of the game that contribute the most to low scoring and to winning tournaments. He conveys those results, along with suggestions for strategy based on the data, in an upcoming book, Every Shot Counts, to be published in March by Gotham Books.
Many of the results show that conventional wisdom is not to be trusted. For example, breaking down the ShotLink numbers for the top 40 players from 2004 to 2012, Broadie shows that approach shots accounted for 40 percent of their scoring advantage, driving accounted for 28 percent, the short game (shots off the green and inside 100 yards) for 17 percent and putting for 15 percent.
"The importance of the long game versus the short game is surprising to many people, but looking at the data it is striking how true it is throughout the whole range, from top pros to lesser pros to amateurs," says Broadie. "It becomes clear if you think about some examples. If I were playing a par 5 of 550 yards and I could have Tiger Woods hit the shots outside 100 yards or inside 100 yards, I think it's pretty clear I would choose outside 100."
Proponents of the importance of putting might be encouraged that a hot putter matters somewhat more in winning: putting accounts for 35 percent of the advantage in victories, compared with 15 percent of what makes a player a top player overall. However, that still leaves tee-to-green play as the greatest contributor to taking home the top prize. In the 2013 season, the week's leader in strokes gained/tee to green won eight times and finished second 11 times in the 30 tournaments where all four rounds were covered by ShotLink, finishing out of the top 10 only once. The week's leader in strokes gained/putting won only twice with just four runner-up finishes, missing the top 10 fully a third of the time.
Players can sometimes win with mediocre or even substandard putting, but much more rarely with mediocre play from tee to green -- in 2012 and 2013 combined, 10 players won while ranking worse than 25th in strokes gained/putting but only two did so ranking worse than 25th from tee to green.
Another conclusion Broadie draws from the data is that driving distance contributes more to scoring than driving accuracy does. That's why long hitters like Bubba Watson populate the top of the strokes gained/driving standings, though accuracy matters enough to hurt a very wild driver like distance leader Luke List. A 20-yard advantage in driving distance leads to a fractional advantage on every stroke, and over the long run that adds up. Strokes gained/driving also reflects the advantage gained by being able to go for the green on reachable holes more often, an edge that isn't reflected in traditional stats like greens in regulation.
Going back to the point at the top of this story, Watson trumps Graham DeLaet in strokes gained/driving simply because averaging two rankings, as total driving does, is a mathematically flawed way of determining who is best at a combination of two factors (DeLaet ranked 17th in strokes gained off the tee). For approach shots, strokes gained is superior to even a seemingly advanced stat like proximity to the hole because it better reflects the likelihood of making birdie or bogey after the approach. And all of the strokes-gained stats are adjusted to the field average at each event, removing the penalty the traditional stats impose on players who face a tougher schedule.
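Why combining rankings falls short can be shown with a toy example. In the Python sketch below, three hypothetical players keep the same accuracy numbers while the longest hitter's edge shrinks from 20 yards to 1 yard. Because a rank-based stat sees only the order, the combined result doesn't change at all. The players, numbers and helper names are invented for illustration, and the sketch sums the two ranks (averaging them gives the same order).

```python
def rank(values: dict[str, float], higher_is_better: bool = True) -> dict[str, int]:
    """Rank players 1..n on a single stat (1 = best). Ties are ignored for simplicity."""
    order = sorted(values, key=values.get, reverse=higher_is_better)
    return {player: i + 1 for i, player in enumerate(order)}


def combined_rank(distance: dict[str, float], accuracy: dict[str, float]) -> dict[str, int]:
    """Sum of distance rank and accuracy rank; lower is better."""
    d, a = rank(distance), rank(accuracy)
    return {player: d[player] + a[player] for player in distance}


accuracy = {"A": 55.0, "B": 62.0, "C": 60.0}    # hypothetical fairway percentages

big_gap = {"A": 310.0, "B": 290.0, "C": 289.0}  # A is 20 yards longer than B
tiny_gap = {"A": 291.0, "B": 290.0, "C": 289.0} # A is 1 yard longer than B

print(combined_rank(big_gap, accuracy))   # {'A': 4, 'B': 3, 'C': 5}
print(combined_rank(tiny_gap, accuracy))  # {'A': 4, 'B': 3, 'C': 5}
```

A strokes-gained measure, by contrast, works from the actual yards and the actual positions the drives leave, so a 20-yard edge and a 1-yard edge count very differently.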
With the advances in statistical analysis, it should come as no surprise that some players have started to use ShotLink numbers to help their game. One of the early adopters was Luke Donald, along with his instructor Pat Goss. The start of Donald's pro career nearly coincided with the start of ShotLink, a fortunate break for Goss, Donald's coach both at Northwestern and for most of his pro career. (Donald has recently switched to another swing coach but still relies on Goss for the short game and statistical analysis.)
"The old stats didn't have a lot of value. They never told the story correctly," says Goss, who has an economics degree. "For a guy with a background in stats, it didn't feel like there was enough info to make it worth the time. Even with ShotLink, we are just beginning to brush the surface."
Donald is proof that long hitting isn't the only route to success, even in the world of advanced stats. There is still room for individual differences, and Donald is able to attribute most of his strokes-gained advantage to short game and putting. Fortunately, he and Goss are smart enough to look at the stats and not say he needs to work on driving the ball farther, though they did note that an extreme dip in driving accuracy in 2009-10 needed to be corrected. They realize he could actually get worse if he tried to change his swing to gain extra yards. Instead, they look for realistic areas where he can improve.
The main use for pros is how best to utilize their practice time. "Most players don't spend enough time on their weaknesses," says instructor Mike Bender, who brought in Sanders to analyze the games of his players Zach Johnson and Jonathan Byrd. "With Peter, we can identify the weak areas and formulate a practice plan to enhance skills and bring the levels up."
"What I do is lend clarity and perspective to each piece of the puzzle," says Sanders, who works mostly with Bender rather than directly with the players. "For a player like Zach, it's really the fine tuning of a great athlete. We want to lessen the difference between his best and his not-best play, and extend the periods of time when he's playing well."
Experts like Sanders, Broadie (who now consults with several tour players and coaches), Goss and Pelz are valuable because they can cut through the noise and glean true insights from the data. Otherwise, there's the danger of being overwhelmed by so many numbers or making too much out of small sample sizes.
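The danger of small samples can be put in numbers with a simple binomial model, an assumption made here for illustration: treat each putt as an independent attempt holed with a fixed probability, such as the 50 percent a pro makes from just inside eight feet. The spread of the observed make rate then shrinks with the square root of the number of putts. The function name is hypothetical.

```python
import math


def make_rate_spread(p: float, n: int) -> float:
    """Standard deviation of the observed make rate over n independent putts,
    each holed with probability p (binomial model)."""
    return math.sqrt(p * (1 - p) / n)


for n in (10, 100, 1_000_000):
    spread = 100 * make_rate_spread(0.5, n)
    print(f"{n:>9} putts: +/- {spread:.2f} percentage points")
```

For a true 50 percent putter, one standard deviation is about 16 percentage points over 10 putts, 5 points over 100, and only 0.05 points over a million, which is why a single round says little about putting skill.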
"A guy could almost make every putt in a round, and he may be a really good putter even though it doesn't look like it because he didn't make any," says Pelz. "There's no luck in a million putts, but there is a lot of luck in every putt. You have to understand stats if you are going to work with them."
It's best when you can point to a tangible reason for a weakness. Kevin Streelman looked at his stats at the end of 2013 and noticed that he didn't make enough putts in the 15- to 25-foot range. Rather than just going out and mindlessly practicing them during the offseason, he thought about why he was making a low percentage. "Too many times, I was just lagging it up close, trying not to hit it five feet past," he says. "But I need to make a higher percentage of those to be more competitive." Sorenstam recalls that early in her career, her data showed she tended to miss long irons short and to the right, a fact that enabled her to correct a swing flaw.
Jason Day's coach Colin Swatton is very much into stats and recalls a time when he was able to pinpoint a problem with Day missing fairways. It turned out Day's percentage with the 3-wood wasn't any better than it was with the driver, so it was the shorter club -- with which one expects to be more accurate -- that was the biggest problem.
Of course, there's no guarantee that a player will actually improve with practice. But the potential payoff is worth the effort, especially as small improvements can make a big difference given how tightly compacted in skill level the pros are, a fact that can be gleaned from the data.
Another way players can use the data is to determine the best strategy on a given hole or particular course. That's a specific focus of Brandt Snedeker and his analytics coach Mark Horton, whom Snedeker credits as a reason for his improvement over the last couple of years, though the two are loath to talk about it for fear of giving away any secrets.
Swatton is in a unique position to give that kind of course-management advice because he is Day's caddie as well as his swing coach and statistical analyst. "In the past, you could say something like, 'You've played well at Torrey Pines,' " Swatton says. "But now you can look at it and see that he might have played 13 holes well and five not so well, and come up with a different plan on those holes."
Overall, Swatton compares statistical analysis in golf to a football coach or player watching game film. "At the end of the day, these guys play for so much money, why wouldn't they use it? Whether it's this year, next year or next month, who knows, but it's going in that direction."
What does the future hold for ShotLink and statistical analysis? One possibility, as the tour's laser equipment is nearing the end of its life, is a change to a video-camera system that would be able to capture the final trajectory of the shot and where the ball hits the ground as well as where it ultimately finishes.
Ideally, we would see ShotLink technology installed at the four major championships, all run by organizations other than the PGA Tour. That seems unlikely to happen soon, though responses to inquiries about the possibility from the PGA of America (PGA Championship) and the Masters at least offered some degree of hope. It's an unfortunate situation when there is less information about the game's most important tournaments than there is about regular events.
"All tour players are guessing about what they do or don't do well at the majors," says Goss. "This year I'm trying to come up with a homemade method for the majors and the European Tour events Luke plays in."
Broadie foresees more application of ShotLink data to course strategy, further development of strokes gained, more incorporation of additional information like the lie of the ball and the contours of the green, perhaps even connecting ShotLink data to TrackMan ball-flight data in a more direct link to coaching.
"If this were a round of golf, I'd say we're still on the front nine in golf analytics," Broadie says. "There's a long way to go."
ShotLink's many uses
ShotLink data has additional uses besides analyzing players' games. Most notably, it is a treasure trove of information for the USGA as it considers possible equipment regulations or analyzes the effect of past regulations.
"We call ourselves power users," says USGA technical director Matt Pringle. "There's not a week that goes by that we don't tap into it."
The USGA delves deep into the ShotLink data, marrying it with information from the tour setup staff on rough height, grass type, green speed and fairway width. Pringle's staff has developed its own software to go through different scenarios.
"With so much data, you can slice it a hundred different ways and test hypotheses to see if you can attribute things to equipment effects, course setup or something else," said Pringle. "The tour is not a controlled experiment, but there's so much data we can use it as if it were."
Naturally, the ruling body uses the data to constantly monitor distance. But it goes much further, and one of its most interesting tools is simulation -- the USGA has simulated the 2012 season 10,000 times with changing variables.
"We can say: 'What if driving distance were shorter or longer, how does that play out?' " says Pringle. "For example, if drives went 15 yards farther, what effect would it have on scoring for different types of players. We can see what would happen if you changed course setup or made courses longer or shorter."
ShotLink data on where shots have finished on particular holes -- and scoring from various positions -- has also proved to be very useful in redesigns of tour courses, as architect Gil Hanse discovered in working on TPC Boston and Doral.
"When they first gave me the information, I thought it was going to be useless. Boy, was I wrong!" says Hanse. "For placement of fairway bunkers it provides us with a great database that incorporates wind, weather and distance over a period of years and multiple rounds, and allows us to focus on proper placement to challenge these guys."
More subtly, Steve Wenzloff of the PGA Tour's design staff says that on a new course like Conway Farms, host of the BMW Championship for the first time in 2013, the tour looks to see if there are areas on greens or fairways that are collecting or repelling shots. Collection areas on fairways aren't good from a divot standpoint, so the course may look to mitigate those. And areas on the greens that either gather or repel shots are considerations when it comes to hole locations.
The tour's field staff can use that type of data when it sets pins for each round. Data about how holes have played, and where drives have landed, helps in moving tees to various positions in different rounds. In more general terms, ShotLink information can help the tour determine the optimum rough height or fairway width on a given course.
In the academic world, ShotLink has helped college professors demonstrate points in areas far removed from golf. Jennifer Brown of Northwestern used the proven tendency of Tiger Woods' fellow competitors to play worse when they were paired with him to illustrate an effect in the business world where a "superstar" performer at a company can actually hamper the performance of others. Todd McFall of Wake Forest looked at the effect of the USGA's 2010 change in the rule on iron grooves as an example of regulations leading to unexpected consequences because people modify their behavior, in this case pros playing more cautiously and thus mitigating the effect of ostensibly more difficult shots from the rough.