Using Python to improve my Fantasy Basketball team

Front Page

Nikolaos Michas

PyCon Balkan 2018

What is Fantasy Basketball?

Fantasy sport

From Wikipedia, the free encyclopedia

A fantasy sport (also known less commonly as rotisserie or roto) is a type of online game where participants assemble imaginary or virtual teams of real players of a professional sport. These teams compete based on the statistical performance of those players in actual games. This performance is converted into points that are compiled and totaled according to a roster selected by each fantasy team's manager. These point systems can be simple enough to be manually calculated by a "league commissioner" who coordinates and manages the overall league, or points can be compiled and calculated using computers tracking actual results of the professional sport. In fantasy sports, team owners draft, trade and cut (drop) players, analogously to real sports.

Basic Rules

  • Team owner drafts a team of 13 players

  • Each week owners select 10 active players

  • Owners collect points based on their picks' performance


Each player collects points from the following statistics:

  • Field Goals Made (FGM): 1.5
  • Field Goals Attempted (FGA): -0.5
  • Free Throws Made (FTM): 1
  • Free Throws Attempted (FTA): -0.75
  • Three Pointers Made (3PM): 1
  • Three Pointers Attempted (3PA): -0.25
  • Offensive Rebounds (OREB): 0.5
  • Rebounds (REB): 1
  • Assists (AST): 2
  • Steals (STL): 2.5
  • Blocks (BLK): 2.5
  • Turnovers (TO): -1.75
  • Points (PTS): 1
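Applied to a made-up stat line (the numbers below are for illustration only, not a real player's), the weights combine like this:

```python
# Scoring weights from the list above, keyed by stat abbreviation
WEIGHTS = {
    'FGM': 1.5, 'FGA': -0.5, 'FTM': 1.0, 'FTA': -0.75,
    '3PM': 1.0, '3PA': -0.25, 'OREB': 0.5, 'REB': 1.0,
    'AST': 2.0, 'STL': 2.5, 'BLK': 2.5, 'TO': -1.75, 'PTS': 1.0,
}

def fantasy_score(stats):
    """Sum each stat multiplied by its weight."""
    return sum(WEIGHTS[k] * v for k, v in stats.items())

# A hypothetical per-game stat line
line = {'FGM': 8, 'FGA': 16, 'FTM': 4, 'FTA': 5, '3PM': 1, '3PA': 3,
        'OREB': 2, 'REB': 10, 'AST': 5, 'STL': 1, 'BLK': 1, 'TO': 3, 'PTS': 21}
print(fantasy_score(line))  # 46.25
```

Note how the attempt penalties reward efficiency: missed shots cost points, so a high-volume, low-percentage scorer is worth less than the raw PTS column suggests.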

How to improve my team?

  • Improve Draft Process
  • Make smarter moves during the season (based on schedule and form)

Predict player performance

  • Use previous season's statistics to predict the next one
  • Machine Learning
  • Regression
  • Neural Networks

Use Python

  • Pandas

  • Beautiful Soup

  • Jupyter

  • Seaborn / Plotly

  • Scikit Learn

  • Keras

Step 0.

Pandas

https://pandas.pydata.org/pandas-docs/stable/

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

(venv) [nmichas@my-pc]$ pip install pandas

pandas.DataFrame

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
In [2]:
df
Out[2]:
A B C D
0 -0.023940 -1.116884 -1.420836 0.026762
1 0.472838 0.537210 -0.174598 -1.972429
2 0.030127 -0.493965 -1.710277 -1.127274
3 -0.838290 -0.340422 0.982786 -0.291325
4 0.942333 0.914386 -1.218660 -2.353766
5 0.326871 -0.797093 -0.446801 -0.366841
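The point about arithmetic aligning on labels deserves a tiny demo (toy data, not from the talk):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# Addition aligns on the index: 'x' has no partner in b, so the sum there is NaN
c = a + b
print(c['y'], c['z'])  # 12.0 23.0
print(pd.isna(c['x']))  # True
```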

Step 1.

Collect Statistics from Previous Years

Beautiful Soup

https://www.crummy.com/software/BeautifulSoup/

Install by:

(venv) [nmichas@my-pc]$ pip install beautifulsoup4
(venv) [nmichas@my-pc]$ pip install lxml

Parse and read all player statistics from the Basketball-Reference website

https://www.basketball-reference.com/players/a/antetgi01.html

In [3]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/players/a/antetgi01.html'

r = requests.get(url)
s = BeautifulSoup(r.text, 'lxml')
In [4]:
player_df = pd.read_html(r.text)[0]
player_df.head()
Out[4]:
Season Age Tm Lg Pos G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 2013-14 19.0 MIL NBA SF 77 23 24.6 2.2 5.4 ... 0.683 1.0 3.4 4.4 1.9 0.8 0.8 1.6 2.2 6.8
1 2014-15 20.0 MIL NBA SG 81 71 31.4 4.7 9.6 ... 0.741 1.2 5.5 6.7 2.6 0.9 1.0 2.1 3.1 12.7
2 2015-16 21.0 MIL NBA PG 80 79 35.3 6.4 12.7 ... 0.724 1.4 6.2 7.7 4.3 1.2 1.4 2.6 3.2 16.9
3 2016-17 22.0 MIL NBA SF 80 80 35.6 8.2 15.7 ... 0.770 1.8 7.0 8.8 5.4 1.6 1.9 2.9 3.1 22.9
4 2017-18 23.0 MIL NBA PF 75 75 36.7 9.9 18.7 ... 0.760 2.1 8.0 10.0 4.8 1.5 1.4 3.0 3.1 26.9

5 rows × 30 columns

In [5]:
COLUMNS = ['Season', 'Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 
           'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 
           '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 
           'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 
           'PF', 'PTS']
player_df = player_df[COLUMNS]
player_df.head()
Out[5]:
Season Age Tm Lg Pos G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 2013-14 19.0 MIL NBA SF 77 23 24.6 2.2 5.4 ... 0.683 1.0 3.4 4.4 1.9 0.8 0.8 1.6 2.2 6.8
1 2014-15 20.0 MIL NBA SG 81 71 31.4 4.7 9.6 ... 0.741 1.2 5.5 6.7 2.6 0.9 1.0 2.1 3.1 12.7
2 2015-16 21.0 MIL NBA PG 80 79 35.3 6.4 12.7 ... 0.724 1.4 6.2 7.7 4.3 1.2 1.4 2.6 3.2 16.9
3 2016-17 22.0 MIL NBA SF 80 80 35.6 8.2 15.7 ... 0.770 1.8 7.0 8.8 5.4 1.6 1.9 2.9 3.1 22.9
4 2017-18 23.0 MIL NBA PF 75 75 36.7 9.9 18.7 ... 0.760 2.1 8.0 10.0 4.8 1.5 1.4 3.0 3.1 26.9

5 rows × 30 columns

In [6]:
import re

player_df['Height'] = s.find(itemprop='height').get_text()
player_df['Weight'] = s.find(itemprop='weight').get_text()

regex = re.compile(
    '(Guard|Forward|Point Guard|Center|Power Forward|Shooting Guard|Small Forward)')
player_df['Position'] = s.findAll(text=regex)[0].strip().split('\n')[0]
player_df.columns
Out[6]:
Index(['Season', 'Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', 'Height',
       'Weight', 'Position'],
      dtype='object')

Read All Player Names

(venv) [nmichas@my-pc]$ python get_all_players.py

players.csv should be a CSV file of this form:

name,shortname,href
Alaa Abdelnaby,abdelal01,/players/a/abdelal01.html
Zaid Abdul-Aziz,abdulza01,/players/a/abdulza01.html
Kareem Abdul-Jabbar,abdulka01,/players/a/abdulka01.html
Mahmoud Abdul-Rauf,abdulma02,/players/a/abdulma02.html
Tariq Abdul-Wahad,abdulta01,/players/a/abdulta01.html
Shareef Abdur-Rahim,abdursh01,/players/a/abdursh01.html
Tom Abernethy,abernto01,/players/a/abernto01.html
Forest Able,ablefo01,/players/a/ablefo01.html
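The get_all_players.py script itself is not shown; below is a minimal sketch of how such a scraper could work. The HTML snippet only mimics the structure of a Basketball-Reference player index page, and the helper name `extract_players` is hypothetical:

```python
from bs4 import BeautifulSoup

# A toy snippet imitating a players index page (not the real site markup)
html = '''
<table id="players">
  <tr><th><a href="/players/a/abdelal01.html">Alaa Abdelnaby</a></th></tr>
  <tr><th><a href="/players/a/abdulza01.html">Zaid Abdul-Aziz</a></th></tr>
</table>
'''

def extract_players(page):
    soup = BeautifulSoup(page, 'html.parser')
    rows = []
    for a in soup.select('table#players a'):
        href = a['href']
        # the short name is the file name without the .html extension
        shortname = href.rsplit('/', 1)[-1].replace('.html', '')
        rows.append((a.get_text(), shortname, href))
    return rows

print(extract_players(html))
```

The real script would fetch each letter's index page with requests and write the rows out with the csv module.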

Read all Statistics for every player

(venv) [nmichas@my-pc]$ python get_all_seasons.py

seasons.csv should be a CSV file of this form:

Player,ShortName,Height,Weight,Position,BirthPlace,SeasonURL,Season,Age,Tm,Lg,Pos,G,...
Alaa Abdelnaby,abdelal01,6-10,240lb,Power Forward,Egypt,/players/a/abdelal01/gamelog/1991/,1990-91,22.0,POR,NBA,PF,43,0,6.7,1.3,2.7,0.474,0.0,0.0,,1.3,...
Alaa Abdelnaby,abdelal01,6-10,240lb,Power Forward,Egypt,/players/a/abdelal01/gamelog/1992/,1991-92,23.0,POR,NBA,PF,71,1,13.2,2.5,5.1,0.493,0.0,0.0,,2.5,...
Alaa Abdelnaby,abdelal01,6-10,240lb,Power Forward,Egypt,/players/a/abdelal01/gamelog/1993/,1992-93,24.0,TOT,NBA,PF,75,52,17.5,3.3,6.3,0.518,0.0,0.0,0.0,...

Step 2.

Clean the Data

In order to perform analysis and make predictions, we need entirely numerical values.

In [7]:
df = pd.read_csv('seasons.csv')
df.sample(1)
Out[7]:
Player ShortName Height Weight Position BirthPlace SeasonURL Season Age Tm ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
2165 Benoit Benjamin benjabe01 7-0 250lb Center Louisiana /players/b/benjabe01/gamelog/1996/ 1995-96 31.0 MIL ... 0.732 1.6 4.7 6.2 0.7 0.5 1.0 1.6 2.6 7.8

1 rows × 37 columns

Remove per-team stats for players that changed teams mid-season (keeping the combined TOT row)

In [8]:
df[['Player', 'ShortName', 'Season', 'Lg', 'Tm']].head(5)
Out[8]:
Player ShortName Season Lg Tm
0 Alaa Abdelnaby abdelal01 1990-91 NBA POR
1 Alaa Abdelnaby abdelal01 1991-92 NBA POR
2 Alaa Abdelnaby abdelal01 1992-93 NBA TOT
3 Alaa Abdelnaby abdelal01 1992-93 NBA MIL
4 Alaa Abdelnaby abdelal01 1992-93 NBA BOS
In [9]:
df.drop(
    df[df.duplicated(['ShortName', 'Season'], keep='first')].index, 
    inplace=True)
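On a toy frame, `keep='first'` keeps the TOT row (the combined totals, which Basketball-Reference lists first, as in the output above) and drops the per-team rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'ShortName': ['abdelal01'] * 3,
    'Season': ['1992-93'] * 3,
    'Tm': ['TOT', 'MIL', 'BOS'],
})
# duplicated(..., keep='first') marks every repeat after the first occurrence
mask = toy.duplicated(['ShortName', 'Season'], keep='first')
toy = toy.drop(toy[mask].index)
print(toy['Tm'].tolist())  # ['TOT']
```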

Drop rows for non-NBA leagues and total career statistics

In [10]:
df[['Player', 'ShortName', 'Season', 'Lg', 'Tm']][df.Lg != 'NBA'].head(5)
Out[10]:
Player ShortName Season Lg Tm
92 John Abramovic abramjo01 1946-47 BAA PIT
93 John Abramovic abramjo01 1947-48 BAA TOT
96 John Abramovic abramjo01 Career BAA NaN
151 Don Adams adamsdo01 1974-75 TOT TOT
154 Don Adams adamsdo01 1975-76 TOT TOT
In [11]:
df.drop(df[df.Lg == 'ABA'].index, inplace=True)
df.drop(df[df.Lg == 'BAA'].index, inplace=True)
df.drop(df[df.Lg == 'TOT'].index, inplace=True)
df.drop(df[df.Season == 'Career'].index, inplace=True)
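The four drops above can also be expressed as a single boolean filter; an equivalent sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({
    'Lg': ['NBA', 'BAA', 'ABA', 'TOT', 'NBA'],
    'Season': ['1990-91', '1946-47', '1974-75', '1975-76', 'Career'],
})
# Keep only NBA rows that are not career totals
toy = toy[(toy.Lg == 'NBA') & (toy.Season != 'Career')]
print(toy['Season'].tolist())  # ['1990-91']
```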

Remove data from players with missing information

In [12]:
df[['Player', 'ShortName', 'Season', 'Lg', 'Tm', '3P', 'GS']][df['3P'].isnull()].head(5)
Out[12]:
Player ShortName Season Lg Tm 3P GS
10 Zaid Abdul-Aziz abdulza01 1968-69 NBA TOT NaN NaN
13 Zaid Abdul-Aziz abdulza01 1969-70 NBA MIL NaN NaN
14 Zaid Abdul-Aziz abdulza01 1970-71 NBA SEA NaN NaN
15 Zaid Abdul-Aziz abdulza01 1971-72 NBA SEA NaN NaN
16 Zaid Abdul-Aziz abdulza01 1972-73 NBA HOU NaN NaN
In [13]:
# drop seasons from before the 3-point shot was tracked
df.dropna(subset=['3P', '3PA'], inplace=True)
# drop players without info for Games Started
df.dropna(subset=['GS'], inplace=True)
# drop players with no height-weight info
df.dropna(subset=['Height', 'Weight'], inplace=True)

Convert Height and Weight to numeric

In [14]:
df[['Height', 'Weight']].sample(5)
Out[14]:
Height Weight
30612 6-8 199lb
3868 6-9 245lb
27836 6-2 186lb
24749 6-8 210lb
13275 6-5 184lb
In [15]:
def height_to_cm(h):
    ft, inch = h.split('-')
    inch = int(inch) + int(ft) * 12
    return round(inch * 2.54, 1)


def remove_lb(w):
    return int(w.replace('lb', ''))

df['Height'] = df['Height'].map(height_to_cm)
df['Weight'] = df['Weight'].map(remove_lb)
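A quick sanity check of the two converters (redefined here so the snippet stands alone):

```python
def height_to_cm(h):
    # '6-10' means 6 feet 10 inches
    ft, inch = h.split('-')
    inch = int(inch) + int(ft) * 12
    return round(inch * 2.54, 1)

def remove_lb(w):
    return int(w.replace('lb', ''))

print(height_to_cm('6-10'))  # 208.3  (6 ft 10 in = 82 in)
print(remove_lb('240lb'))    # 240
```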

Convert season to numeric

In [16]:
min_season = int(df['Season'].min().split('-')[0])

def get_season(row):
    return int(row['Season'].split('-')[0]) - min_season

df['Season_Numeric'] = df.apply(get_season, axis=1)
In [17]:
df[['Season', 'Season_Numeric']].sample(5)
Out[17]:
Season Season_Numeric
12132 2010-11 31
30148 1982-83 3
19979 1982-83 3
3439 1981-82 2
7663 1990-91 11
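The row-wise apply can also be replaced by a vectorized string slice, which is faster on a large frame; a sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({'Season': ['1982-83', '2010-11', '1979-80']})
# The first four characters of 'YYYY-YY' are the starting year
start_year = toy['Season'].str[:4].astype(int)
toy['Season_Numeric'] = start_year - start_year.min()
print(toy['Season_Numeric'].tolist())  # [3, 31, 0]
```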

Convert position to an array of Boolean values

In [18]:
def get_position_matrix(position):
    positions = [0, 0, 0, 0, 0]
    if 'Point Guard' in position:
        positions[0] = 1
    if 'Shooting Guard' in position:
        positions[1] = 1
    if 'Small Forward' in position:
        positions[2] = 1
    if 'Power Forward' in position:
        positions[3] = 1
    if 'Center' in position:
        positions[4] = 1
    if 'Guard' in position and 'Point Guard' not in position and 'Shooting Guard' not in position:
        positions[0] = 1
        positions[1] = 1
    if 'Forward' in position and 'Power Forward' not in position and 'Small Forward' not in position:
        positions[2] = 1
        positions[3] = 1
    return positions

position_matrix = []
for i, season_row in df.iterrows():
    position_matrix.append(get_position_matrix(season_row['Position']))
position_matrix = np.array(position_matrix)
for i, position in enumerate(['PG', 'SG', 'SF', 'PF', 'C']):
    df['plays_' + position] = position_matrix[:, i]
In [19]:
giannis_df = df[df['Player'] == 'Giannis Antetokounmpo']
giannis_df[['Player', 'Position', 'plays_PG', 'plays_SG', 'plays_SF', 'plays_PF', 'plays_C']].head(1)
Out[19]:
Player Position plays_PG plays_SG plays_SF plays_PF plays_C
770 Giannis Antetokounmpo Small Forward and Point Guard and Shooting Gua... 1 1 1 1 0
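The same mapping can be written more compactly; an equivalent sketch (same PG/SG/SF/PF/C order):

```python
def get_position_matrix(position):
    names = ['Point Guard', 'Shooting Guard', 'Small Forward',
             'Power Forward', 'Center']
    positions = [1 if name in position else 0 for name in names]
    # A bare 'Guard' or 'Forward' (no qualifier) maps to both matching slots
    if 'Guard' in position and not (positions[0] or positions[1]):
        positions[0] = positions[1] = 1
    if 'Forward' in position and not (positions[2] or positions[3]):
        positions[2] = positions[3] = 1
    return positions

print(get_position_matrix('Center and Power Forward'))  # [0, 0, 0, 1, 1]
print(get_position_matrix('Guard'))                     # [1, 1, 0, 0, 0]
```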

Calculate score

In [20]:
def get_score(row):
    return row['FG'] * 1.5 + row['FGA'] * (-0.5) + row['FT'] + \
        row['FTA'] * (-0.75) + row['3P'] + row['3PA'] * (-0.25) + \
        row['ORB'] * 0.5 + row['TRB'] + row['AST'] * 2 + \
        row['STL'] * 2.5 + row['BLK'] * 2.5 + \
        row['TOV'] * (-1.75) + row['PTS']

df['Score'] = df.apply(get_score, axis=1)
In [21]:
giannis_df = df[df['Player'] == 'Giannis Antetokounmpo']
giannis_df[['Player', 'Season', 'Score']].head(5)
Out[21]:
Player Season Score
770 Giannis Antetokounmpo 2013-14 17.275
771 Giannis Antetokounmpo 2014-15 28.475
772 Giannis Antetokounmpo 2015-16 39.025
773 Giannis Antetokounmpo 2016-17 51.675
774 Giannis Antetokounmpo 2017-18 55.300

Find next season's score: the target column

In [22]:
df.sort_values(['ShortName', 'Season_Numeric'], inplace=True)
g = df.groupby(['ShortName'])
next_season_score = list()
for i, gr in g:
    next_season_score += list(gr['Score'].shift(-1))
df['Next_Season_Score'] = next_season_score
df.dropna(subset=['Next_Season_Score'], inplace=True)
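The groupby loop above is equivalent to a single grouped shift, assuming the frame is already sorted by player and season; a sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({
    'ShortName': ['a', 'a', 'a', 'b', 'b'],
    'Season_Numeric': [0, 1, 2, 0, 1],
    'Score': [10.0, 12.0, 15.0, 7.0, 9.0],
})
# shift(-1) within each group pulls the following season's score up one row
toy['Next_Season_Score'] = toy.groupby('ShortName')['Score'].shift(-1)
# each player's final season has no "next season" and is dropped
toy = toy.dropna(subset=['Next_Season_Score'])
print(toy['Next_Season_Score'].tolist())  # [12.0, 15.0, 9.0]
```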
In [23]:
giannis_df = df[df['Player'] == 'Giannis Antetokounmpo']
giannis_df[['Player', 'Season', 'Season_Numeric', 'Score', 'Next_Season_Score']].head(5)
Out[23]:
Player Season Season_Numeric Score Next_Season_Score
770 Giannis Antetokounmpo 2013-14 34 17.275 28.475
771 Giannis Antetokounmpo 2014-15 35 28.475 39.025
772 Giannis Antetokounmpo 2015-16 36 39.025 51.675
773 Giannis Antetokounmpo 2016-17 37 51.675 55.300
In [24]:
# drop unnecessary columns
df.drop(
    ['BirthPlace', 'Season', 'Position', 'Player', 
     'SeasonURL', 'Lg', 'ShortName', 'Pos', 'Tm', 'Score'], 
    axis=1, inplace=True)
In [25]:
df.fillna(0, inplace=True)
df.reset_index(inplace=True)
df.drop(['index'], axis=1, inplace=True)
In [26]:
df.sample(5)
Out[26]:
Height Weight Age G GS MP FG FGA FG% 3P ... TOV PF PTS Season_Numeric plays_PG plays_SG plays_SF plays_PF plays_C Next_Season_Score
9259 182.9 170 28.0 75.0 74.0 31.7 6.4 13.1 0.484 1.6 ... 2.6 1.4 18.2 13 1 0 0 0 0 39.475
3652 200.7 209 32.0 50.0 5.0 9.2 0.8 2.2 0.369 0.0 ... 0.4 1.0 1.9 18 0 0 1 0 0 1.300
4787 195.6 195 24.0 58.0 26.0 18.7 3.8 8.4 0.458 0.0 ... 1.0 1.5 9.2 8 0 1 0 0 0 2.725
2915 185.4 189 28.0 63.0 17.0 23.0 3.6 9.3 0.384 1.1 ... 0.9 1.6 9.5 22 1 1 0 0 0 21.000
880 205.7 235 28.0 56.0 6.0 16.7 3.9 7.3 0.532 0.3 ... 1.2 1.6 9.4 37 0 0 1 1 0 25.400

5 rows × 35 columns

We can now use the following function to get an entirely numeric dataframe.

from clear_seasons_data import get_clear_final_data
df = get_clear_final_data()

Step 3.

Inspect and Visualize our data.

We will use pandas to get some insights into our data, and Seaborn and Plotly to easily create plots and understand the relationships that may exist between our dataframe's columns.

In [27]:
from clear_seasons_data import get_clear_final_data
df = get_clear_final_data()
df.describe()
Out[27]:
Height Weight Age G GS MP FG FGA FG% 3P ... TOV PF PTS plays_PG plays_SG plays_SF plays_PF plays_C Season_Numeric Next_Season_Score
count 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 ... 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000 12648.000000
mean 200.705218 216.453036 26.551312 60.043722 31.035421 22.369892 3.533491 7.694790 0.450584 0.438536 ... 1.391034 2.100253 9.305021 0.253716 0.332701 0.327087 0.355946 0.303843 50.780519 19.671316
std 9.413949 27.616132 3.907472 22.078819 30.437214 9.816474 2.275182 4.706923 0.074528 0.609560 ... 0.820865 0.815102 6.079629 0.435154 0.471199 0.469168 0.478818 0.459934 10.129191 12.078043
min 160.000000 133.000000 18.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 30.000000 -4.000000
25% 193.000000 195.000000 24.000000 47.000000 2.000000 14.400000 1.700000 3.900000 0.415000 0.000000 ... 0.800000 1.500000 4.500000 0.000000 0.000000 0.000000 0.000000 0.000000 42.000000 10.050000
50% 200.700000 215.000000 26.000000 68.000000 19.000000 22.300000 3.100000 6.800000 0.452000 0.100000 ... 1.200000 2.100000 8.000000 0.000000 0.000000 0.000000 0.000000 0.000000 51.000000 17.500000
75% 208.300000 235.000000 29.000000 79.000000 62.000000 30.800000 4.900000 10.800000 0.490000 0.700000 ... 1.900000 2.700000 13.000000 1.000000 1.000000 1.000000 1.000000 1.000000 60.000000 27.725000
max 231.100000 330.000000 42.000000 85.000000 83.000000 43.700000 13.400000 27.800000 1.000000 5.100000 ... 5.700000 6.000000 37.100000 1.000000 1.000000 1.000000 1.000000 1.000000 67.000000 68.050000

8 rows × 35 columns

In [28]:
df.groupby("Age").mean()[['FG', '3P%', 'FT%', '2P%', 'TRB', 'AST', 'BLK', 'TOV', 'PF', 'PTS']].head(10)
Out[28]:
FG 3P% FT% 2P% TRB AST BLK TOV PF PTS
Age
18.0 1.416667 0.159917 0.608917 0.432083 1.941667 0.566667 0.383333 0.658333 1.358333 3.708333
19.0 2.652041 0.201745 0.669204 0.459949 3.220408 1.321429 0.524490 1.162245 1.716327 6.943878
20.0 2.969231 0.215640 0.680802 0.457838 3.682996 1.560324 0.522267 1.277733 1.882996 7.827126
21.0 3.371655 0.219900 0.687147 0.463687 3.852608 1.719955 0.511565 1.360544 2.011111 8.843764
22.0 3.039694 0.202374 0.689547 0.461044 3.462755 1.682857 0.432857 1.276735 1.944898 7.945714
23.0 3.113436 0.202794 0.701479 0.462601 3.459618 1.757416 0.435977 1.283700 1.951762 8.130690
24.0 3.395662 0.211654 0.704888 0.467106 3.722500 1.966103 0.452279 1.372794 2.059706 8.895441
25.0 3.665811 0.214601 0.721808 0.472418 3.985634 2.136116 0.482745 1.435714 2.126164 9.680177
26.0 3.877049 0.214056 0.733745 0.475793 4.201639 2.297066 0.503969 1.501639 2.220362 10.239776
27.0 3.954891 0.220972 0.739258 0.476619 4.234188 2.362393 0.506553 1.523172 2.234283 10.449953
In [29]:
df.groupby("Season_Numeric").mean()[['3P%', 'FT%', '2P%', 'TRB', 'AST', 'BLK', 'TOV', 'PF', 'PTS']].head(10)
Out[29]:
3P% FT% 2P% TRB AST BLK TOV PF PTS
Season_Numeric
30 0.217100 0.755000 0.500100 4.660000 2.750000 0.390000 1.820000 2.420000 11.500000
31 0.112182 0.735818 0.502909 4.200000 2.581818 0.681818 1.818182 2.345455 10.709091
32 0.156051 0.722311 0.488097 4.201556 2.401556 0.522957 1.687549 2.549416 10.540078
33 0.124720 0.717420 0.482973 4.295331 2.469261 0.550195 1.826459 2.475875 10.487938
34 0.142391 0.734000 0.484050 4.031034 2.482759 0.494636 1.663985 2.432950 10.306513
35 0.143127 0.738647 0.485985 4.053455 2.476364 0.500000 1.685091 2.366545 10.404000
36 0.141316 0.718680 0.484004 4.189098 2.472932 0.509774 1.694737 2.421805 10.563910
37 0.141714 0.736007 0.476693 4.116429 2.407143 0.511071 1.596429 2.351429 10.327500
38 0.169051 0.749293 0.474420 4.027536 2.407609 0.502174 1.563043 2.288043 10.147464
39 0.182789 0.733786 0.472057 4.065886 2.378595 0.490970 1.571572 2.206020 10.080268

A picture is worth a thousand words

Seaborn

https://seaborn.pydata.org/

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

(venv) [nmichas@my-pc]$ pip install seaborn
In [30]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')
In [31]:
from clear_seasons_data import get_clear_seasons_data
sns_df = get_clear_seasons_data()
sns_df.sample(5)
Out[31]:
ShortName Height Weight Age Tm Pos G GS MP FG ... PF PTS plays_PG plays_SG plays_SF plays_PF plays_C Score Season_Numeric Next_Season_Score
8429 obannch01 195.6 209 22.0 DET SG 30.0 0.0 7.8 0.9 ... 0.5 2.1 0 1 0 0 0 5.075 48 7.425
413 arenagi01 190.5 191 22.0 WAS PG 55.0 52.0 37.6 6.5 ... 3.2 19.6 1 0 0 0 0 34.950 54 44.800
820 battito01 210.8 230 33.0 NJN C 15.0 0.0 8.9 0.9 ... 1.3 2.4 0 0 0 1 1 5.100 60 7.250
10449 smithke01 190.5 170 24.0 TOT PG 79.0 51.0 30.6 4.8 ... 1.8 11.9 1 0 0 0 0 26.475 40 36.925
7160 marjabo01 221.0 290 27.0 SAS C 54.0 4.0 9.4 1.9 ... 1.0 5.5 0 0 0 0 1 12.500 66 12.450

5 rows × 39 columns

In [32]:
sns.distplot(sns_df['STL'])
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fce611a55c0>
In [33]:
sns.jointplot(x='BLK', y='TRB', data=sns_df, kind='reg')
Out[33]:
<seaborn.axisgrid.JointGrid at 0x7fce610651d0>
In [34]:
test_df = sns_df[['Pos', 'TRB','AST','STL']]
sns.pairplot(test_df, hue='Pos', diag_kind='hist', hue_order=['PG', 'SG', 'SF', 'PF', 'C'])
Out[34]:
<seaborn.axisgrid.PairGrid at 0x7fce61dbdeb8>
In [35]:
sns.barplot(x='plays_PG', y='AST', data=sns_df)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fce6259a6a0>
In [36]:
sns.boxplot(x='Pos', y='TRB', data=sns_df, order=['PG', 'SG', 'SF', 'PF', 'C'])
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fce6242a240>
In [37]:
sns.heatmap(sns_df[['TRB','AST','STL','BLK','TOV', '2P%', '3P%', 'eFG%', 'PF', 'Score']].corr(), annot=True)
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fce623953c8>

Plotly

https://plot.ly/python/

Plotly's Python graphing library makes interactive, publication-quality graphs online.

(venv) [nmichas@my-pc]$ pip install plotly
(venv) [nmichas@my-pc]$ pip install cufflinks
In [38]:
import pandas as pd
import numpy as np
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()
%matplotlib inline
In [39]:
plotly_df = get_clear_seasons_data()
In [40]:
plotly_df.iplot(
    kind='scatter', x='Age', y='Score', text='ShortName', mode='markers', 
    layout={'autosize':False, 'width':800, 'height':600, 'hovermode': 'closest'})

# plotly and cufflinks do not work well with Jupyter's slides

Step 4.

Predict Next Season.

We will use some Python machine learning libraries to predict how every player will perform next season.

Scikit-learn

http://scikit-learn.org/

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

(venv) [nmichas@my-pc]$ pip install scikit-learn

Regression analysis

From Wikipedia, the free encyclopedia

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Predict with Linear Regression

In [41]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [42]:
from clear_seasons_data import get_clear_final_data
In [43]:
def get_train_test_datasets(df):
    max_season = df['Season_Numeric'].max()
    df_train = df[df['Season_Numeric'] != max_season]
    df_test = df[df['Season_Numeric'] == max_season]
    X_train = df_train.drop(columns=['Next_Season_Score'])
    y_train = df_train['Next_Season_Score']
    X_test = df_test.drop(columns=['Next_Season_Score'])
    y_test = df_test['Next_Season_Score']
    return X_train, X_test, y_train, y_test
In [44]:
regr_df = get_clear_final_data()
X_train, X_test, y_train, y_test = get_train_test_datasets(regr_df)
In [45]:
X_train.head(5)
Out[45]:
Height Weight Age G GS MP FG FGA FG% 3P ... BLK TOV PF PTS plays_PG plays_SG plays_SF plays_PF plays_C Season_Numeric
0 208.3 240 22.0 43.0 0.0 6.7 1.3 2.7 0.474 0.0 ... 0.3 0.5 0.9 3.1 0 0 0 1 0 41
1 208.3 240 23.0 71.0 1.0 13.2 2.5 5.1 0.493 0.0 ... 0.2 0.9 1.9 6.1 0 0 0 1 0 42
2 208.3 240 24.0 75.0 52.0 17.5 3.3 6.3 0.518 0.0 ... 0.3 1.3 2.5 7.7 0 0 0 1 0 43
3 208.3 240 25.0 13.0 0.0 12.2 1.8 4.2 0.436 0.0 ... 0.2 1.3 1.5 4.9 0 0 0 1 0 44
4 218.4 225 34.0 76.0 76.0 35.2 9.9 17.1 0.579 0.0 ... 2.7 3.0 2.9 23.9 0 0 0 0 1 32

5 rows × 34 columns

In [46]:
y_train.head(5)
Out[46]:
0    12.325
1    14.950
2     8.350
3     8.500
4    44.350
Name: Next_Season_Score, dtype: float64
In [47]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
Out[47]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [48]:
lm.intercept_
Out[48]:
19.390105326328374
In [49]:
lm.coef_
Out[49]:
array([-1.62823036e-02,  4.15601307e-03, -4.21215753e-01,  2.53078338e-02,
       -9.78700150e-03, -2.22532612e-01,  6.52200483e-01, -2.20696560e-01,
       -5.15913294e+00,  3.70380493e+00, -9.80280912e-01, -1.36486384e-01,
        2.23047050e+00, -7.86904051e-01,  5.18996156e+00, -7.18298303e+00,
        1.43034951e+00, -6.28300398e-01, -5.12766833e-01,  2.47388821e+00,
        2.50854260e+00, -1.25279451e+00,  1.89030934e+00,  3.21056009e+00,
        2.68854892e+00, -1.17614684e+00, -7.65141144e-01,  7.58430663e-01,
        8.48358157e-01,  8.18898330e-01,  7.68414798e-01,  4.24741807e-01,
        9.26033901e-01, -1.82307125e-03])
In [50]:
coeff_df = pd.DataFrame(lm.coef_,X_train.columns,columns=['Coefficient'])
coeff_df.transpose()
Out[50]:
Height Weight Age G GS MP FG FGA FG% 3P ... BLK TOV PF PTS plays_PG plays_SG plays_SF plays_PF plays_C Season_Numeric
Coefficient -0.016282 0.004156 -0.421216 0.025308 -0.009787 -0.222533 0.6522 -0.220697 -5.159133 3.703805 ... 2.688549 -1.176147 -0.765141 0.758431 0.848358 0.818898 0.768415 0.424742 0.926034 -0.001823

1 rows × 34 columns

In [51]:
predictions = lm.predict(X_test)
In [52]:
plt.scatter(y_test,predictions)
Out[52]:
<matplotlib.collections.PathCollection at 0x7fce563cecc0>
In [53]:
sns.distplot((y_test-predictions),bins=50)
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fce57422d68>
In [54]:
from sklearn import metrics
print('Mean Absolute Error     :', metrics.mean_absolute_error(y_test, predictions))
print('Mean Squared Error      :', metrics.mean_squared_error(y_test, predictions))
print('Root Mean Squared Error :', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
Mean Absolute Error     : 4.90991732806849
Mean Squared Error      : 39.127676770519244
Root Mean Squared Error : 6.255211968472311
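For reference, the three metrics reduce to simple NumPy expressions (toy arrays below, not the model's output):

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

err = y_pred - y_true          # per-sample errors: [2, -2, 3]
mae = np.mean(np.abs(err))     # mean absolute error
mse = np.mean(err ** 2)        # mean squared error
rmse = np.sqrt(mse)            # root mean squared error
print(mae, mse, rmse)
```

RMSE is in the same units as the target, so the model's RMSE of about 6.3 means predictions are typically off by roughly six fantasy points per game.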

Compare Other Regression Methods

In [55]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from clear_seasons_data import get_clear_final_data
compare_df = get_clear_final_data()
X = compare_df.drop(columns=['Next_Season_Score'])
y = compare_df['Next_Season_Score']
In [56]:
X.head()
Out[56]:
Height Weight Age G GS MP FG FGA FG% 3P ... BLK TOV PF PTS plays_PG plays_SG plays_SF plays_PF plays_C Season_Numeric
0 208.3 240 22.0 43.0 0.0 6.7 1.3 2.7 0.474 0.0 ... 0.3 0.5 0.9 3.1 0 0 0 1 0 41
1 208.3 240 23.0 71.0 1.0 13.2 2.5 5.1 0.493 0.0 ... 0.2 0.9 1.9 6.1 0 0 0 1 0 42
2 208.3 240 24.0 75.0 52.0 17.5 3.3 6.3 0.518 0.0 ... 0.3 1.3 2.5 7.7 0 0 0 1 0 43
3 208.3 240 25.0 13.0 0.0 12.2 1.8 4.2 0.436 0.0 ... 0.2 1.3 1.5 4.9 0 0 0 1 0 44
4 218.4 225 34.0 76.0 76.0 35.2 9.9 17.1 0.579 0.0 ... 2.7 3.0 2.9 23.9 0 0 0 0 1 32

5 rows × 34 columns

In [57]:
y.head()
Out[57]:
0    12.325
1    14.950
2     8.350
3     8.500
4    44.350
Name: Next_Season_Score, dtype: float64
In [58]:
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
In [59]:
a = 0.9

methods = [
    ('linear regression', LinearRegression()),
    ('lasso', Lasso(fit_intercept=True, alpha=a)),
    ('ridge', Ridge(fit_intercept=True, alpha=a)),
    ('elastic-net', ElasticNet(fit_intercept=True, alpha=a))
]
In [60]:
for name,met in methods:
    met.fit(X,y)
    p = met.predict(X)
    e = p-y
    total_error = np.dot(e,e)
    rmse_train = np.sqrt(total_error/len(p))
    
    print('Method: %s' %name)
    print('RMSE on training: %.4f' %rmse_train)
    print("\n")
Method: linear regression
RMSE on training: 6.0929


Method: lasso
RMSE on training: 6.4137


Method: ridge
RMSE on training: 6.0930


Method: elastic-net
RMSE on training: 6.4085
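KFold is imported above but never used in that cell; a sketch of how cross-validation could compare the same methods follows. Synthetic data stands in for the real dataframe here (which needs seasons.csv), so the numbers are only illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic regression data standing in for the real features/target
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.randn(200) * 0.1

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [('linear regression', LinearRegression()),
                    ('ridge', Ridge(alpha=0.9))]:
    # negated MSE is sklearn's convention: higher scores are better
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring='neg_mean_squared_error')
    rmse = np.sqrt(-scores.mean())
    print('%s: CV RMSE %.4f' % (name, rmse))
```

Unlike the training-set RMSE printed above, cross-validated RMSE estimates how the model generalizes to unseen data, which is what actually matters for next-season predictions.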


Neural Networks with Keras and TensorFlow

Keras

https://keras.io/

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Tensorflow

https://www.tensorflow.org/

TensorFlow™ is an open source software library for high-performance numerical computation, originally developed by researchers and engineers from the Google Brain team within Google's AI organization.

In [62]:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
Using TensorFlow backend.
In [63]:
from clear_seasons_data import get_clear_final_data, get_train_test_datasets
keras_df = get_clear_final_data()
X_train, X_test, y_train, y_test = get_train_test_datasets(keras_df)
In [64]:
# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=34, kernel_initializer='normal', activation='relu'))
    model.add(Dense(10, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
In [65]:
estimator = KerasRegressor(build_fn=baseline_model, epochs=200, batch_size=256, verbose=2)
In [66]:
estimator.fit(X_train, y_train)
Epoch 1/200
 - 0s - loss: 459.5953
Epoch 2/200
 - 0s - loss: 132.8051
Epoch 3/200
 - 0s - loss: 99.5188
Epoch 4/200
 - 0s - loss: 82.6606
Epoch 5/200
 - 0s - loss: 66.9503
...
Epoch 124/200
 - 0s - loss: 36.7832
Epoch 125/200
 - 0s - loss: 36.7161
Epoch 126/200
 - 0s - loss: 36.9202
Epoch 127/200
 - 0s - loss: 36.8115
Epoch 128/200
 - 0s - loss: 36.8146
Epoch 129/200
 - 0s - loss: 37.0024
Epoch 130/200
 - 0s - loss: 36.6736
Epoch 131/200
 - 0s - loss: 36.9156
Epoch 132/200
 - 0s - loss: 36.6609
Epoch 133/200
 - 0s - loss: 36.8168
Epoch 134/200
 - 0s - loss: 37.2711
Epoch 135/200
 - 0s - loss: 36.8177
Epoch 136/200
 - 0s - loss: 36.6591
Epoch 137/200
 - 0s - loss: 36.8011
Epoch 138/200
 - 0s - loss: 37.0466
Epoch 139/200
 - 0s - loss: 36.7148
Epoch 140/200
 - 0s - loss: 36.6854
Epoch 141/200
 - 0s - loss: 36.8929
Epoch 142/200
 - 0s - loss: 36.8034
Epoch 143/200
 - 0s - loss: 37.0319
Epoch 144/200
 - 0s - loss: 36.8104
Epoch 145/200
 - 0s - loss: 36.6326
Epoch 146/200
 - 0s - loss: 36.9849
Epoch 147/200
 - 0s - loss: 37.2135
Epoch 148/200
 - 0s - loss: 36.7456
Epoch 149/200
 - 0s - loss: 36.5984
Epoch 150/200
 - 0s - loss: 36.6494
Epoch 151/200
 - 0s - loss: 36.5613
Epoch 152/200
 - 0s - loss: 37.0005
Epoch 153/200
 - 0s - loss: 36.6890
Epoch 154/200
 - 0s - loss: 36.6270
Epoch 155/200
 - 0s - loss: 36.8675
Epoch 156/200
 - 0s - loss: 36.6778
Epoch 157/200
 - 0s - loss: 36.6405
Epoch 158/200
 - 0s - loss: 36.5097
Epoch 159/200
 - 0s - loss: 36.8047
Epoch 160/200
 - 0s - loss: 36.5839
Epoch 161/200
 - 0s - loss: 36.6382
Epoch 162/200
 - 0s - loss: 36.6244
Epoch 163/200
 - 0s - loss: 36.6021
Epoch 164/200
 - 0s - loss: 36.5452
Epoch 165/200
 - 0s - loss: 36.5041
Epoch 166/200
 - 0s - loss: 36.6022
Epoch 167/200
 - 0s - loss: 36.6299
Epoch 168/200
 - 0s - loss: 36.6023
Epoch 169/200
 - 0s - loss: 36.7352
Epoch 170/200
 - 0s - loss: 36.4985
Epoch 171/200
 - 0s - loss: 36.7347
Epoch 172/200
 - 0s - loss: 36.5795
Epoch 173/200
 - 0s - loss: 36.6202
Epoch 174/200
 - 0s - loss: 36.8924
Epoch 175/200
 - 0s - loss: 36.5441
Epoch 176/200
 - 0s - loss: 36.6195
Epoch 177/200
 - 0s - loss: 36.6718
Epoch 178/200
 - 0s - loss: 36.7881
Epoch 179/200
 - 0s - loss: 36.4715
Epoch 180/200
 - 0s - loss: 36.4860
Epoch 181/200
 - 0s - loss: 36.5994
Epoch 182/200
 - 0s - loss: 36.4601
Epoch 183/200
 - 0s - loss: 36.5082
Epoch 184/200
 - 0s - loss: 36.4787
Epoch 185/200
 - 0s - loss: 36.6551
Epoch 186/200
 - 0s - loss: 36.5175
Epoch 187/200
 - 0s - loss: 36.7392
Epoch 188/200
 - 0s - loss: 36.5299
Epoch 189/200
 - 0s - loss: 36.4365
Epoch 190/200
 - 0s - loss: 36.5772
Epoch 191/200
 - 0s - loss: 36.5714
Epoch 192/200
 - 0s - loss: 36.4685
Epoch 193/200
 - 0s - loss: 36.5741
Epoch 194/200
 - 0s - loss: 36.5005
Epoch 195/200
 - 0s - loss: 36.4790
Epoch 196/200
 - 0s - loss: 36.4878
Epoch 197/200
 - 0s - loss: 36.6152
Epoch 198/200
 - 0s - loss: 36.8434
Epoch 199/200
 - 0s - loss: 36.5095
Epoch 200/200
 - 0s - loss: 36.5246
Out[66]:
<keras.callbacks.History at 0x7fce461f1978>
In [67]:
predictions = estimator.predict(X_test)
In [68]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(y_test,predictions)
Out[68]:
<matplotlib.collections.PathCollection at 0x7fce45e511d0>
In [69]:
from sklearn import metrics
import numpy as np
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
MAE: 4.911167843974367
MSE: 39.100205451025126
RMSE: 6.253015708522179
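
To put an RMSE of roughly 6.25 fantasy points in context, it helps to compare against a model that ignores the features entirely and always predicts the training mean. A minimal sketch with synthetic placeholder scores (in the notebook you would pass the real `y_train` / `y_test` instead):

```python
import numpy as np
from sklearn import metrics

# Placeholder fantasy scores standing in for the notebook's y_train / y_test.
rng = np.random.default_rng(0)
y_train = rng.normal(30, 10, size=200)
y_test = rng.normal(30, 10, size=50)

# The baseline always predicts the training mean, whatever the input.
baseline = np.full_like(y_test, y_train.mean())
rmse_baseline = np.sqrt(metrics.mean_squared_error(y_test, baseline))
print('Baseline RMSE:', rmse_baseline)
```

If the trained model's RMSE is well below this baseline, the features are genuinely adding predictive value.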

Step 5.

Use the predictions

In [70]:
from clear_seasons_data import get_clear_final_data, get_train_test_datasets

final_df = get_clear_final_data(with_labels=True)
X_train, X_test, y_train, y_test = get_train_test_datasets(final_df)
labels_train = X_train[['Player']].copy()  # .copy() avoids SettingWithCopyWarning later
labels_test = X_test[['Player']].copy()
X_train.drop(columns=['Player'], inplace=True)
X_test.drop(columns=['Player'], inplace=True)
In [71]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
predictions = lm.predict(X_test)
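
Beyond raw predictions, a fitted linear model also exposes which inputs drive the score via `lm.coef_` (paired with `X_train.columns` in the notebook). A self-contained sketch on synthetic data with known weights, to show that the recovered coefficients match the generating ones:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features with known weights 2.0 and -1.0; the third column
# does not influence y at all, so its coefficient should be near zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lm = LinearRegression().fit(X, y)
print(lm.coef_)  # approximately [2, -1, 0]
```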
In [72]:
labels_test['Prediction'] = predictions
In [73]:
prediction_df_sorted = labels_test.sort_values(by='Prediction', ascending=False)
In [74]:
prediction_df_sorted[:10]
Out[74]:
Player Prediction
11892 Russell Westbrook 64.568737
4664 James Harden 61.799162
2719 Anthony Davis 56.979021
5627 LeBron James 54.898267
369 Giannis Antetokounmpo 54.394127
3285 Kevin Durant 53.026744
2441 DeMarcus Cousins 52.271180
11256 Karl-Anthony Towns 51.269578
11706 John Wall 50.722309
1737 Jimmy Butler 47.564992
In [75]:
prediction_df_sorted[10:20]
Out[75]:
Player Prediction
6670 Kawhi Leonard 47.317304
2612 Stephen Curry 46.954240
11023 Isaiah Thomas 46.403544
6755 Damian Lillard 46.100491
5458 Kyrie Irving 43.599142
5971 Nikola Jokic 43.551978
8811 Chris Paul 43.180978
2959 DeMar DeRozan 42.571234
6902 Kyle Lowry 41.559765
4081 Paul George 41.363313
In [76]:
last_season_df = pd.read_csv('current.csv')
from clear_seasons_data import get_score
last_season_df['Score'] = last_season_df.apply(get_score, axis=1)
last_season_df = last_season_df.sort_values(by='Score', ascending=False)[['Player', 'Score']]
last_season_df.head(10)
Out[76]:
Player Score
84 Anthony Davis 58.750
179 LeBron James 55.575
103 Joel Embiid 54.550
9 Giannis Antetokounmpo 53.900
82 Stephen Curry 53.700
100 Kevin Durant 52.900
90 DeMar DeRozan 52.775
351 Russell Westbrook 52.350
141 James Harden 51.175
139 Blake Griffin 50.075
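
One quick way to judge the predictions against reality is to count how many of the predicted top 10 actually finished in last season's top 10 (in the notebook this could also be done with a merge on `Player`). Using the names from the two tables above:

```python
# Names copied from the prediction table and the last-season score table.
predicted_top10 = ['Russell Westbrook', 'James Harden', 'Anthony Davis',
                   'LeBron James', 'Giannis Antetokounmpo', 'Kevin Durant',
                   'DeMarcus Cousins', 'Karl-Anthony Towns', 'John Wall',
                   'Jimmy Butler']
actual_top10 = ['Anthony Davis', 'LeBron James', 'Joel Embiid',
                'Giannis Antetokounmpo', 'Stephen Curry', 'Kevin Durant',
                'DeMar DeRozan', 'Russell Westbrook', 'James Harden',
                'Blake Griffin']

overlap = set(predicted_top10) & set(actual_top10)
print(f'{len(overlap)} of 10 predicted players appear in the actual top 10')
# → 6 of 10 predicted players appear in the actual top 10
```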

Next Steps

Potential Improvements

  • Use college stats for rookies or international stats for overseas players
  • Use more than one season as input (at the cost of fewer training samples)
  • Use more advanced stats (stats per possession, team averages, ...)
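
The per-possession idea from the last bullet can be sketched as follows. The possession estimate `FGA + 0.44*FTA - OREB + TO` is a commonly used approximation (the 0.44 free-throw weight is a convention, not an exact figure), and the sample numbers below are made up for illustration:

```python
def points_per_possession(fga, fta, oreb, to, pts):
    """Scoring efficiency: points per estimated possession used."""
    possessions = fga + 0.44 * fta - oreb + to
    return pts / possessions

# Hypothetical single-game line: 20 FGA, 8 FTA, 2 OREB, 3 TO, 30 PTS.
print(points_per_possession(fga=20, fta=8, oreb=2, to=3, pts=30))
```

The same normalisation could be applied to any of the counting stats in the scoring table, making players on fast- and slow-paced teams more directly comparable.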

Q & A


Nikolaos Michas

PyCon Balkan 2018