Every year, members of the APBRmetrics bulletin board come together to share their rankings for the upcoming NBA draft. As the online home of the NBA analytics community, featuring NBA team executives, leading basketball journalists and, of course, a gifted and passionate fan base, the board produces rankings that differ from a typical mock draft. With basketball analytics now popular and ubiquitous around the league, the APBRmetrics community wanted to share its rankings with the passionate fans and followers of DraftExpress.
Before you read this, please look at any previous year's mock draft. Note how many lottery picks quickly left the league and how many later picks became stars. Every year after the NBA draft, analysts will tell you that the teams that won the draft are those that drafted players who slid past their projected slot in mock drafts, while the teams that lost are those that drafted players above their projected slot. Despite the increase in information available on NBA prospects, mock drafts have become less prescient. In today's landscape, where analytics play such a large role in determining where and when a player is drafted, traditional mock drafts are left in a state of turmoil.
Despite the amazing statistical tools available to evaluate the impact of NBA players, ranking college prospects is a much more complicated task. Not only must team style, strength of schedule and age be accounted for, but a more important question must be asked: how will a player's game change when competing against bigger and faster opponents? Other important factors simply cannot be measured with stats: how much room does the player have to grow (physically and mentally)? What about his work ethic? Does he have a good head on his shoulders? Can he guard a position in the NBA? Additionally, teams need to decide whether they want a player who excels in traditional basketball metrics or one who rates well analytically. Zach LaVine was selected to the Rising Stars Challenge, won the 2015 Slam Dunk Contest and was named to the NBA All-Rookie Second Team. Most consider that a successful rookie season and a good draft selection by Minnesota. Analytics, on the other hand, rate LaVine's year as one of the worst in the entire NBA and a bad draft choice.
Keep in mind when looking at these models that they are not mock drafts. Use these rankings as a supplement to all of the other information you know about a prospect. Further, due to the limited information available for international prospects, these models rank only NCAA players eligible for the NBA draft who are likely to be selected.
Preview of the Five Different NBA Draft Models and Their Top-14 Prospects. Full Rankings Displayed at Bottom
Note: Non-collegiate prospects, such as Emmanuel Mudiay, Mario Hezonja, Kristaps Porzingis, and others, have been excluded from this study.
My name is Layne Vashro (@VJL_bball) and I am presenting my simple Estimated Wins Peak (EWP) model. In the past, I have put together a number of different projection models and tools to help evaluate incoming talent. These include several NCAA/international models, a player-season comparison finder, a tool that shows how each statistic has historically translated to the NBA for players under different coaches, and a tool that allows you to follow each prospect's progression or regression throughout the season. You can find these over at Nylon Calculus under the Our Stats tab.
The goal of the EWP model is to project how good each prospect will be at the peak of his NBA career. In order to do that, I must quantify peak NBA performance in some acceptable way. I do this by calculating the number of wins a player is responsible for in each season of his career using a blend of Win Shares (box-score metric) and RAPM (+/- metric). I then use a two-year rolling average and select the highest value as that player's wins peak. Here is a link to the list of previously drafted players included in the sample. If this list largely agrees with the order in which you would select these players in a redraft, you can at least be comfortable with my model's validity.
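The peak calculation described above (best two-year rolling average of per-season win values) can be sketched in a few lines. The function name and the sample career arc below are illustrative, not part of the actual model code:

```python
def wins_peak(season_wins):
    """Peak = highest two-year rolling average of per-season win values.

    `season_wins` is a chronological list of blended wins (the Win
    Shares / RAPM blend) for one player's career.
    """
    if len(season_wins) < 2:
        # Not enough seasons for a two-year window; fall back to the max.
        return max(season_wins, default=0.0)
    return max((a + b) / 2 for a, b in zip(season_wins, season_wins[1:]))

# A career arc of 2, 6, 11, 9, 4 wins peaks at (11 + 9) / 2 = 10
```

The rolling average smooths out single-season spikes so one fluke year does not define a player's peak.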
The model is constructed with collegiate box-score statistics pulled from DraftExpress.com and basketball-reference.com, play-by-play statistics pulled from hoop-math.com, anthropometric information (measurements) from DraftExpress, and a selection of team statistics pulled from sports-reference.com. Then, a linear regression is used to identify what each bit of pre-NBA information says about a player's future peak production in the NBA based on historical results. This knowledge is then applied to current prospects to project their NBA future.
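The fitting step is a standard linear regression: learn coefficients from historical players' pre-NBA features and their eventual peak wins, then apply them to a current prospect. This is a minimal sketch with made-up numbers; the feature columns (age, per-40 production) stand in for the much larger feature set the model actually uses:

```python
import numpy as np

# Historical players: [intercept, age, per-40 production] -> peak wins.
X = np.array([
    [1.0, 19.2, 24.5],
    [1.0, 22.1, 18.0],
    [1.0, 20.5, 21.3],
])
y = np.array([12.0, 4.5, 8.0])                # historical peak wins

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit

# Apply the learned coefficients to a current prospect's feature row.
prospect = np.array([1.0, 19.8, 23.0])
projected_peak = prospect @ coef
```

With real data the regression would of course be fit on hundreds of historical players rather than three rows.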
My name is Steve Shea (@SteveShea33) and I am an associate professor of mathematics at Saint Anselm College and a co-author of the book, Basketball Analytics.
College Prospect Rating (CPR) uses a college player's box score statistics and his class (freshman, etc.) to approximate his NBA potential. It differs significantly from other objective draft models in at least the following ways: CPR does not use regressions. Thus, CPR does not have to make a choice of a dependent variable. This is nice, but not the primary motivation for not using regressions. A typical regression uses information of what has worked in the past to predict what will work in the future. Implicit in the prediction is the assumption that the context of the past will be similar enough to the context of the future. This may not be true in the NBA. The NBA is changing in very measurable ways (such as the percentage of a team's offense that comes from 3-point shots). CPR hopes to project the players that will succeed in 2016 and beyond, not pick the players that will thrive in the 90s.
College players are inconsistent. This is most problematic in the freshman season, which is the last college season for some of the top prospects. Some freshmen improve dramatically over the course of the season. Others simply don't show up on occasion. These inconsistencies blur season average numbers. CPR gets around this by focusing only on each individual's top 10 performances in each statistic.
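The "top 10 performances" idea can be sketched directly: for each statistic, rate the player on the average of his ten best games rather than his season average. The function and sample game log below are illustrative:

```python
def top_n_average(game_values, n=10):
    """Average of a player's n best single-game values for one statistic.

    CPR rates players on their best performances rather than season
    averages; this is a simplified sketch of that idea.
    """
    best = sorted(game_values, reverse=True)[:n]
    return sum(best) / len(best)

# A streaky freshman: the ten big games dominate the rating even though
# the no-show games drag down the season average.
points = [31, 2, 28, 5, 25, 30, 4, 27, 26, 3, 29, 24, 6, 23, 22]
```

For the game log above, `top_n_average(points)` averages the ten best scoring games (22 through 31), ignoring the 2-, 3-, 4-, 5- and 6-point duds entirely.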
There are no weights on statistics or adjustments for positional scarcity. CPR weights all statistics the same. It simply looks for excellence. Anthony Davis was excellent. Kevin Durant was excellent. The two were superior prospects in very different ways. CPR leaves it to the team to decide what positions it needs or perceives to be scarce at the time, and what type of player it wants. In spite of its nonstandard construction (or maybe because of it), CPR has been effective at projecting both high picks that busted and late picks that surprised, as can be seen here.
My name is Nick Restifo. In my basketball life, I write for Nylon Calculus and am a special assistant for the D2 powerhouse that is the University of New Haven Chargers. If you like, you can follow me on Twitter at @itsastat.
My overall predictions are based on an ensemble of four base models predicting a two year career peak blend of RAPM and Win Shares. The ensemble takes input from a regression based model and a bagged neural network trained on two different subsets of data; all prospects with statistics listed on DraftExpress since 2001-2002, and just those prospects that were actually drafted since 2001-2002 (a total of four base models).
I use RSCI high school rank, NBA Combine measurements and tests, pace and per minute adjusted box score statistics, minutes per game, age on February 1st of a player's draft year, strength of schedule, and percentage of points from three (to account for some spacing benefits). I average an entire player's pre-NBA career, each year weighted by minutes played. For the vast amount of missing data for the players who did not participate in the combine, I impute regression based estimates of body dimensions (hand length, body fat, etc) based on listed height and weight. For the vertical and agility tests, I impute missing values via decision trees trained on a player's age and body dimensions.
Compared to other models, mine includes high school ranking as a variable, so it will favor highly heralded high school players significantly more than other models do. This ranking has served as an important predictor in the past, and it helps put into context the ratings of five-star recruits such as Cliff Alexander versus unranked high school prospects like Frank Kaminsky.
My name is Jesse Fischer. I work as a Senior Software Engineer at Amazon. My academic background includes a degree in Computer Engineering with a minor in Mathematics from the University of Washington. I blog on www.tothemean.com and can be found on twitter at @jessefischer33.
My "Longevity" draft model optimizes for long-term value, defined as a player's maximum five-year "Value over Replacement Player" (VORP), which is based on the stat BPM. VORP accounts for playing time, allowing injuries, durability and coaching preferences to be factored in, which is important when measuring longevity. To account for players who are still active (and, most importantly, players who haven't hit their five-year peak yet), I have a separate "Predicted VORP" model (based on age, VORP trajectory, playing-time trajectory, maximum single-season VORP, etc.) that predicts a player's maximum five-year value from his career thus far.
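One plausible reading of the "max five year VORP" target is the best total over any five consecutive seasons of a career. A minimal sketch under that assumption, with illustrative VORP values:

```python
def max_five_year_vorp(vorp_by_season):
    """Best sum over any five consecutive seasons of a career.

    Sketch of the 'long-term value' target; a short career simply
    sums whatever seasons exist.
    """
    if len(vorp_by_season) < 5:
        return sum(vorp_by_season)
    return max(sum(vorp_by_season[i:i + 5])
               for i in range(len(vorp_by_season) - 4))

# A career of [0.5, 1.2, 2.0, 3.1, 2.8, 2.5, 1.0] peaks on the
# window covering seasons 2-6.
```

Because VORP already scales with minutes played, a durable starter naturally outscores an equally efficient player who misses seasons, which is exactly the longevity effect the model wants to capture.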
The actual "Longevity" model is built on public data: college stats (multiple years), team stats, NBA Combine data, etc. It also includes actual/expected draft position to factor in real-life scouting rather than relying blindly on the numbers. The data were transformed to account for pace, competition, playing time and teammate quality, among other things I describe on my blog. The model uses not just the players who were drafted but anyone who had even a remote chance of playing in the NBA (assigning a replacement-player value to those who didn't make it). To automate this pre-filtering, there is an additional "Predicted NBA Player" model that projects the probability of an NCAA player reaching the NBA.
Using this filtered dataset, the "Longevity" model runs as a blend of many different models. The individual models use different machine learning algorithms, all optimized and tuned in different ways. Further, my overall model is not limited to linear relationships like most draft models are. Many details and much discussion are left out of this summary; they can be read about in future posts on my blog: www.tothemean.com.
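Blending many tuned models reduces to combining their per-prospect predictions. The sketch below uses an equal-weight average with made-up model names and outputs; a real blend would tune the weights (and the base learners) rather than average naively:

```python
# Hypothetical outputs of three tuned base models for two prospects.
predictions = {
    "gbm":    {"Prospect A": 4.2, "Prospect B": 1.1},
    "forest": {"Prospect A": 3.8, "Prospect B": 1.9},
    "linear": {"Prospect A": 4.6, "Prospect B": 0.9},
}

def blend(preds):
    """Equal-weight average of each model's prediction per prospect."""
    players = next(iter(preds.values())).keys()
    return {p: sum(model[p] for model in preds.values()) / len(preds)
            for p in players}
```

Averaging diverse models tends to cancel their individual errors, which is the core argument for a blend over any single algorithm.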
My name is Masseffectlenk (@masseffectlenk), and I am a graduate student in bioengineering.
My model is based on a regression using basic box score stats on a pace-adjusted, per-40-minute basis. Rates are used specifically (three-point rate, free throw rate, assist rate, usage rate) as well as height, weight and age. The regressions are informed by nearly a thousand NBA players who have played at least 100 NBA minutes and were drafted after 2005. The stats are mapped to a blend of average offensive and defensive Win Shares/RPM values. Similarity scores are then created from the overall stats and used to determine the weightings for the regression. The most recent model incorporates at-rim shots per 40 minutes and dunk rate for each player over the past three seasons, adding an athletic component; this applies only to NCAA players from this past season. The spreadsheet of data can be found here.
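Using similarity scores as regression weights amounts to a weighted least-squares fit: historical players who resemble the prospect count for more. A minimal sketch with invented features, targets and similarity weights (weighted least squares is equivalent to ordinary least squares on square-root-weighted rows):

```python
import numpy as np

# Historical players: [intercept, 3PT rate, FT rate] -> blended WS/RPM.
X = np.array([[1.0, 0.35, 0.40],
              [1.0, 0.10, 0.55],
              [1.0, 0.25, 0.30],
              [1.0, 0.40, 0.45]])
y = np.array([2.1, 0.4, 1.0, 2.8])    # blended target values
w = np.array([0.9, 0.1, 0.4, 0.8])    # similarity to the prospect

# Weighted least squares via sqrt-weighted rows fed to ordinary lstsq.
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```

The fitted coefficients are then applied to the prospect's own feature row, so each prospect effectively gets a regression tailored to players like him.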
The athletic regression is used specifically to dock players who post deceptively athletic box score stats but lack athleticism otherwise (e.g. Jordan Adams) or elevate those who are more athletic than their box score stats indicate (e.g. Jordan Clarkson, Norman Powell this year).
My name is Daniel Myers (@DSMok1) and I am a structural engineer by trade, born and raised in Oklahoma, but now living in Maine. I have always been a math nerd (and Excel whiz), and started dabbling in advanced sports statistics around 2007. I started posting on the APBRmetrics forum in 2009, and currently am the acting administrator. My focus is to be open with my work and very aware of the limitations and weaknesses of our statistics.
Box Plus/Minus (BPM) is my contribution to this project, but it is not a projection system at all. Rather, it is perhaps the best public metric for measuring actual production at the college level. The ranking published here is simply a ranking by BPM, which evaluates player production per possession. The full derivation and methodology is available at http://www.basketball-reference.com/about/bpm.html.
BPM was developed by regressing advanced box score stats onto long term Regularized Adjusted Plus/Minus (RAPM). This was done using NBA data (no RAPM is available for the NCAA), but the values of each statistic should be valid at the NCAA level as well. BPM is adjusted for context and strength of schedule. Full BPM data for the NCAA are available through Sports Reference / College Basketball's Player Season Finder.
Treat BPM not as a projection of NBA ability but as context for the other models: has the player produced in college? Why would or wouldn't we expect that production to translate to the NBA? If a player produces well in the NCAA as a freshman, that's a great indicator.
If you have enjoyed this article and want to see more about NBA draft models or read about analytics, we encourage you to visit us at the APBRmetrics forum.
Composite Ranking Comparison to DraftExpress Rankings
Table also available in Google Spreadsheet format for sorting purposes and further analysis.
*BPM is not included in the composite ranking
COMP: Simple composite ranking including all 5 models.
LV: Layne Vashro
SS: Steve Shea
NR: Nick Restifo
JF: Jesse Fischer
BPM: Daniel Myers
DX-100: DraftExpress Top-100 Ranking
DIFF: Difference between the composite ranking and the DraftExpress Top-100 ranking (+ = DX is higher on the prospect, - = the model composite is higher).