Advice for Average Joes
U.S. Open 2021 picks: 5 secrets from a data scientist to win your U.S. Open pool
It’s time to submit a lineup for your U.S. Open pool. Where do you start? Using data as a strategic tool to contextualize your picks and optimize your golf lineups is only helpful if you organize, value and use the results. And don’t worry: You don’t have to be a data scientist like me to use stats to build an effective team.
My strategy is to provide a framework of insights that adds computer vision-derived stats onto the advanced data that is publicly available, and then vet the correlations my models find with experts (caddies, coaches and players) to provide an additional perspective. (Computer vision is a type of code that allows me to ‘measure’ almost anything using the television broadcast video. You know those launch monitors and the shot tracer on broadcast? It’s similar to that, and it gives me measurements of things my expert helpers say could be important, like hips/balance, club movement, hand placement at time of strike, and much more. Then I use a lot of statistical models to verify or disprove the relationships between them mathematically.)
I also really enjoy the competitive elements surrounding choosing which golfers the public believes in, and who might be flying under the radar. And that’s what I want to help you do:
Use my strategy to apply your own approach to the stats available and beat your friends and coworkers—or place high in a DFS contest. Here are five ways to do it for Torrey Pines. Good luck!
Determine which stats you think will be crucial to success, and find players who do those things well.
You don’t need to build data models for a living to use stats to your advantage. Make it simple: Do your research on the course where the pros are playing and identify which skills are essential to success. Then apply it to your player pool.
For me, this is what I’m looking at: total driving; strokes gained/approach; SG/putting on Poa; rough play (I used computer vision and a series function here to measure by club, distance traveled, distance to pin and next club result); recovery play (similar to what I looked for with rough, but applied only when a shot did not go as planned, measuring how long the miss impacts the player’s game); and par-4 results (450-500 yards), since there are seven of those holes.
One player who popped up as underrated based on those metrics and projected ownership: Adam Scott, for his strokes gained/Poa putting numbers and his past results on the South course.
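If you want to try the "pick your stats, then rank your player pool" idea yourself, here is a minimal sketch in Python. It is not my model. It assumes you have already collected a few stats per player, that every stat is oriented so higher is better, and that you choose the weights yourself. The player names, stat values and weights below are all hypothetical placeholders.

```python
from statistics import mean, pstdev

# Hypothetical stat lines for illustration only; these are not real player data.
# Assumption: every stat is oriented so that a higher number is better.
players = {
    "Player A": {"driving": 1.0, "sg_approach": 0.9, "sg_putting_poa": 0.8},
    "Player B": {"driving": 0.2, "sg_approach": 0.1, "sg_putting_poa": 0.5},
    "Player C": {"driving": -0.5, "sg_approach": 0.4, "sg_putting_poa": -0.3},
}

# Hypothetical weights: how much you think each skill matters this week.
weights = {"driving": 0.3, "sg_approach": 0.4, "sg_putting_poa": 0.3}


def z_scores(values):
    """Standardize a list of numbers so different stats share one scale."""
    mu = mean(values)
    sigma = pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def composite_scores(players, weights):
    """Return a weighted sum of standardized stats for each player."""
    names = list(players)
    totals = {name: 0.0 for name in names}
    for stat, weight in weights.items():
        standardized = z_scores([players[name][stat] for name in names])
        for name, z in zip(names, standardized):
            totals[name] += weight * z
    return totals


def rank_players(players, weights):
    """Return player names sorted from best composite score to worst."""
    totals = composite_scores(players, weights)
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    for name in rank_players(players, weights):
        print(name)
```

Standardizing first keeps one stat with a wide range from drowning out the others. Change the weights to match what you believe about the course and see whose ranking moves.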
Adam Scott hadn't missed a cut since the COVID restart until an MC at last month's PGA Championship.
One way to gain an edge this week: ignore course history.
Most people have narrowed their picks to those who have great history at Torrey Pines, so this is a great chance to differentiate yourself.
Remember, in 2008 Rocco Mediate missed the cut at Torrey Pines earlier in the year, and then nearly beat Tiger at the U.S. Open. Since the Open is played only on the South course, and it’s expected to play a lot differently than when the Farmers was here, it is a smart strategy to differentiate your lineup by considering a player or two who doesn’t have a high course history rating.
Let me be clear: Overall course history does have value and is statistically relevant, but if most people are weighting this metric very highly, and there are good reasons to lower its value, then leaving it out of the model can give you an edge.
Selecting for traits and recent performance that should fare well on this course, without considering course history, my model flags the following players, whom I haven’t mentioned yet in this article, as finishing in the top 15 in value by price in DFS:
Abraham Ancer
Daniel Berger
Shane Lowry
And for upside picks:
Wyndham Clark
Jhonattan Vegas
Wilco Nienaber
Why “rough play” might win you your contest
Since it’s likely a key factor this weekend, I dug a little deeper into how players have handled rough play.
Here’s how I defined rough play: Any time a player took a shot from the rough, I recorded the club and distance to the pin, then I used computer vision to measure his swing and hips, hand placement at time of strike and path/distance the ball traveled. Then I recorded the same things for the remaining shots he took that hole and related it to par. I compared the golfer’s results from the rough until finishing the hole to his results when he stayed in the fairway, making sure to factor in context like club and distance to the pin.
The rough will play a major factor at Torrey Pines this week—as Cameron Smith found out in this lie during a Tuesday practice round.
The idea is to see how much playing out of the rough impacted the golfer’s score. The less of a negative impact, the better the rough metric. In the past five U.S. Opens, only four players who finished in the top 10 ranked outside of the top 30 in this rough metric.
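A stripped-down version of that comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not my actual metric: it ignores the context I factor in (club, distance to the pin, the computer vision measurements) and simply compares a player's average score to par on holes where he found the rough against holes where he didn't. The hole records and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hole records for illustration: (player, hit_rough, score_to_par).
holes = [
    ("Player A", True, 1),
    ("Player A", True, 0),
    ("Player A", False, 0),
    ("Player A", False, -1),
    ("Player B", True, 0),
    ("Player B", True, 0),
    ("Player B", False, 0),
    ("Player B", False, -1),
    ("Player C", False, -1),  # never in the rough, so no penalty can be computed
]


def rough_penalty(holes):
    """Average score to par on rough holes minus the average on fairway holes."""
    # For each player keep [sum of scores, number of holes] per lie type.
    tallies = defaultdict(lambda: {"rough": [0, 0], "fairway": [0, 0]})
    for player, hit_rough, score_to_par in holes:
        bucket = tallies[player]["rough" if hit_rough else "fairway"]
        bucket[0] += score_to_par
        bucket[1] += 1

    penalties = {}
    for player, lies in tallies.items():
        rough_sum, rough_n = lies["rough"]
        fairway_sum, fairway_n = lies["fairway"]
        if rough_n == 0 or fairway_n == 0:
            continue  # need both kinds of holes to make a comparison
        penalties[player] = rough_sum / rough_n - fairway_sum / fairway_n
    return penalties


def rank_by_rough(holes):
    """Players sorted from smallest rough penalty (best) to largest."""
    penalties = rough_penalty(holes)
    return sorted(penalties, key=penalties.get)


if __name__ == "__main__":
    print(rank_by_rough(holes))
```

A smaller penalty means the rough costs that player fewer strokes, which is the direction you want for a U.S. Open setup.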
For example, ahead of last year’s U.S. Open, Bryson DeChambeau ranked fourth in this category among the field. This weekend, two popular choices, Louis Oosthuizen and Paul Casey, both rank in the top five in terms of overall rough play, and Will Zalatoris forecasts to be a smart risk (meaning he projects to play better from the rough on this course than his ranking of 22nd suggests).
My upside pick here is a name that has already come up: Wyndham Clark, who ranks 10th.
Why differentiating your lineup is so important
Whether it’s in a GPP tournament or your office pool, sometimes the key to differentiating yourself lies in finding the right pivots from popular choices—to provide the edge against the competition. In other words, projecting who your opponents are likely to select (or overlook) allows you to use game theory to set your lineup—strategically knowing where to blend in with the crowd and where to take smart risk for potential gains.
Taking a look at projected ownership data at a site like FanShare Sports or Fantasy National takes a lot of the guesswork out of it (and yes, we recommend subscribing—it’s worth it!). For this example, I looked at FanShare’s data and combined it with my model’s outputs. Here are a few interesting takeaways:
Highly Owned: Paul Casey (18.5 percent)
Strategy: Consider a swap for Sam Burns (8.4 percent)
Burns’ trajectory for strokes gained/putting in recent weeks compared to the past year is trending in the right direction. And he has been and remains efficient in strokes gained/tee to green (gaining more than 5.1 in the past six weeks), which drives a top-14 ceiling projection.
Build at least one lineup with: Charley Hoffman (9.6 percent owned) or Justin Thomas (7.2 percent)
Strategy: The two biggest overperformers versus roster percentage.
When I set my risk parameters to high, these two ping as having top-eight results in ceiling projections. I know Thomas’ course history is basically non-existent, but the indicators are there.
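One simple way to hunt for pivots like these is to compare how often a projection likes a player with how often the public rosters him. The Python sketch below is an assumption-laden illustration, not the calculation FanShare or my model performs: the `leverage` ratio, the golfer names and every number in `field` are hypothetical.

```python
# Hypothetical field for illustration: (name, projected_pct, owned_pct).
# projected_pct: how often your own projection has the player hitting his ceiling.
# owned_pct: projected roster percentage from an ownership source.
field = [
    ("Golfer A", 15.0, 18.5),
    ("Golfer B", 12.0, 8.0),
    ("Golfer C", 9.0, 9.0),
    ("Golfer D", 14.0, 7.0),
]


def leverage(projected_pct, owned_pct):
    """Ratio above 1 means your projection likes the player more than the public does."""
    if owned_pct <= 0:
        raise ValueError("owned_pct must be positive")
    return projected_pct / owned_pct


def pivots(field, min_leverage=1.0):
    """Names whose leverage exceeds min_leverage, best leverage first."""
    scored = [(leverage(proj, owned), name) for name, proj, owned in field]
    keep = [(lev, name) for lev, name in scored if lev > min_leverage]
    keep.sort(reverse=True)
    return [name for lev, name in keep]


if __name__ == "__main__":
    print(pivots(field))
```

Players near the top of that list are candidates to swap in for a chalky pick, while anyone well below 1 is a spot where the crowd may be overconfident.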
You (most likely) need to pick the winner
In DFS contests, it will be impossible to win the top prize if you don’t have the winner (unless a complete underdog wins … not likely). So you’re probably wondering: Who are you picking?
I hate when my model (based off 400,000 moderate-risk simulations) picks the favorite, but in those, Jon Rahm edges out Collin Morikawa, who bests Patrick Cantlay.
One more pick whom I haven’t mentioned yet, but who appears in the largest percentage of my optimal model outputs:
Jason Kokrak, who has excelled in driving and putting recently and is trending up in all relevant metrics.
OK, one more thing you’ll enjoy: I have Koepka finishing ahead of DeChambeau in 5 percent more of my moderate-risk models, and 11.1 percent more in high-risk models. Koepka finishes in the top five 14 percent of the time in moderate-risk simulations. DeChambeau has been a bit too volatile longer term to pop higher in these simulations, so you might consider going Brooksy over Bryson in this developing feud.
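For anyone curious how a "finishes ahead in X percent of simulations" number comes about, here is a toy Monte Carlo sketch in Python. It is far simpler than my simulations and rests on loud assumptions: each golfer's 72-hole total is drawn from a normal distribution, the two golfers are independent, and the means and standard deviations you pass in are hypothetical inputs rather than real projections.

```python
import random


def simulate_head_to_head(mean_a, sd_a, mean_b, sd_b, n_sims=10_000, seed=0):
    """Share of simulated tournaments in which golfer A posts the lower total.

    Assumption: each total is an independent draw from a normal distribution.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    wins_a = 0
    for _ in range(n_sims):
        total_a = rng.gauss(mean_a, sd_a)
        total_b = rng.gauss(mean_b, sd_b)
        if total_a < total_b:  # lower score wins in golf
            wins_a += 1
    return wins_a / n_sims


if __name__ == "__main__":
    # Hypothetical inputs: A projects two strokes better, B is the more volatile player.
    share = simulate_head_to_head(mean_a=282.0, sd_a=5.0, mean_b=284.0, sd_b=8.0)
    print(f"A finishes ahead in {share:.1%} of simulations")
```

Raising a player's standard deviation is one crude way to mimic a higher risk setting: it widens both his ceiling and his floor without changing his average.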
Cynthia Frelund is an analytics expert for the NFL Network who has applied her game-theory analysis to building models for golf.