NBA games results visualization with Holoviews

Visualizing NBA season game results with Holoviews

(Image source: fivethirtyeight)

(Image source: Holoviews)

Getting the data

We'll grab NBA game data from basketball-reference.com using pandas' read_html function, which returns a list of DataFrames, one per table it finds on the page. On simple pages read_html almost always works with its defaults, and it offers a few parameters (such as match and attrs) for controlling which tables get selected when the defaults fail. It is a good first tool to try before moving on to BeautifulSoup or lxml for more complicated pages.
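To make the table-selection parameters concrete, here is a minimal sketch that runs read_html against a small made-up HTML snippet instead of the real site. The table ids, team names, and column names below are invented for illustration, and the example assumes an HTML parser that pandas supports (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# A made-up page with two tables; only the second one has id="schedule".
html = """
<table id="standings">
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Team A</td><td>10</td></tr>
</table>
<table id="schedule">
  <tr><th>Visitor</th><th>Visitor PTS</th><th>Home</th><th>Home PTS</th></tr>
  <tr><td>Team A</td><td>99</td><td>Team B</td><td>102</td></tr>
</table>
"""

# With the defaults, read_html returns every table it can parse.
all_tables = pd.read_html(StringIO(html))

# attrs narrows the selection by attributes of the <table> tag.
by_id = pd.read_html(StringIO(html), attrs={"id": "schedule"})

# match keeps only tables containing text that matches the string or regex.
by_text = pd.read_html(StringIO(html), match="Visitor")

print(len(all_tables), len(by_id), len(by_text))
print(by_id[0])
```

Wrapping the markup in StringIO keeps the call working on recent pandas versions, which no longer like receiving literal HTML strings directly.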

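We will use data from the last three seasons, including the current one: the 2016, 2017, and 2018 seasons. Basketball-reference labels a season by the year in which it ends, so the 2018 season starts in October 2017.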

In [1]:
%matplotlib inline
import os
import pathlib
import numpy as np
import pandas as pd
import scipy
import bokeh
import scipy.stats as ss
pd.options.display.max_rows = 10
# Ignore the excessive repeated warnings. Comment this out if you want warnings.
import warnings
warnings.filterwarnings("ignore")
In [2]:
months = ["october", "november", "december", "january", "february", "march"]
In [3]:
class Downloader(object):
    def __init__(self, year, months, filepath):
        self.year = year
        self.months = months
        self.filepath = pathlib.Path(filepath)
        self.games = self.download_dataset()

    def scrape(self, month):
        base_url = "https://www.basketball-reference.com/leagues/NBA_" + self.year + "_games"
        url = base_url + "-" + month + ".html"
        tables = pd.read_html(url)
        games = tables[0]
        return games

    def download_dataset(self):
        # Reuse the cached CSV if this season was already downloaded.
        if self.filepath.exists():
            return pd.read_csv(self.filepath)

        # Otherwise scrape one table per month and stack them into a single DataFrame.
        games = pd.concat([self.scrape(month) for month in self.months], axis=0)
        # Create the target directory (e.g. ./data) if it does not exist yet;
        # to_csv does not create missing directories.
        self.filepath.parent.mkdir(parents=True, exist_ok=True)
        games.to_csv(self.filepath, index=False)
        return games
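The download-once-then-cache behaviour of Downloader can be isolated into a small helper, which makes it easy to try without touching the network. This is only a sketch: load_or_build and fake_scrape are hypothetical names, and fake_scrape stands in for the real request. The helper also creates the parent directory before writing, since to_csv does not create missing directories.

```python
import pathlib
import tempfile

import pandas as pd


def load_or_build(filepath, build):
    """Return the CSV at filepath if it exists; otherwise call build() and cache the result."""
    filepath = pathlib.Path(filepath)
    if filepath.exists():
        return pd.read_csv(filepath)
    frame = build()
    filepath.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(filepath, index=False)
    return frame


calls = []


def fake_scrape():
    # Hypothetical stand-in for the network request; records each invocation.
    calls.append(1)
    return pd.DataFrame({"Visitor/Neutral": ["Team A"], "PTS": [99]})


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "data" / "nba_demo.csv"
    first = load_or_build(path, fake_scrape)   # builds the frame and writes the CSV
    second = load_or_build(path, fake_scrape)  # served from the cached CSV

print("scrape calls:", len(calls))
print(second)
```

The second call never reaches fake_scrape, which is the same reason re-running the notebook does not hit basketball-reference.com again once the CSV files exist.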
In [4]:
games_2018 = Downloader("2018", months, "./data/nba_2018.csv").games
games_2017 = Downloader("2017", months, "./data/nba_2017.csv").games
games_2016 = Downloader("2016", months, "./data/nba_2016.csv").games
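Each of these three calls requests one page per month the first time it runs. As a rough sketch of what gets requested, the snippet below rebuilds the URLs with the same pattern that scrape uses; month_urls is a hypothetical helper introduced here for illustration and nothing is downloaded.

```python
def month_urls(year, months):
    """Build the monthly schedule URLs for one season, following the pattern in scrape()."""
    base_url = f"https://www.basketball-reference.com/leagues/NBA_{year}_games"
    return [f"{base_url}-{month}.html" for month in months]


months = ["october", "november", "december", "january", "february", "march"]

for year in ("2016", "2017", "2018"):
    urls = month_urls(year, months)
    # Show how many pages a season needs and what the first URL looks like.
    print(year, len(urls), urls[0])
```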
In [5]:
games_2018.head()
Out[5]:
   Date               Start (ET)  Visitor/Neutral    PTS    Home/Neutral           PTS.1             .1   Attend.  Notes
0  Tue, Oct 17, 2017  8:01 pm     Boston Celtics     99.0   Cleveland Cavaliers    102.0  Box Score  NaN  20562.0  NaN
1  Tue, Oct 17, 2017  10:30 pm    Houston Rockets    122.0  Golden State Warriors  121.0  Box Score  NaN  19596.0  NaN
2  Wed, Oct 18, 2017  7:30 pm     Milwaukee Bucks    108.0  Boston Celtics         100.0  Box Score  NaN  18624.0  NaN
3  Wed, Oct 18, 2017  8:30 pm     Atlanta Hawks      117.0  Dallas Mavericks       111.0  Box Score  NaN  19709.0  NaN
4  Wed, Oct 18, 2017  7:00 pm     Charlotte Hornets  90.0   Detroit Pistons        102.0  Box Score  NaN  20491.0  NaN
In [6]:
games_2018.tail()
Out[6]:
     Date               Start (ET)  Visitor/Neutral        PTS  Home/Neutral        PTS.1       .1   Attend.  Notes
217  Sat, Mar 31, 2018  7:30 pm     Toronto Raptors        NaN  Boston Celtics      NaN    NaN  NaN  NaN      NaN
218  Sat, Mar 31, 2018  8:00 pm     Brooklyn Nets          NaN  Miami Heat          NaN    NaN  NaN  NaN      NaN
219  Sat, Mar 31, 2018  5:00 pm     Detroit Pistons        NaN  New York Knicks     NaN    NaN  NaN  NaN      NaN
220  Sat, Mar 31, 2018  10:00 pm    Golden State Warriors  NaN  Sacramento Kings    NaN    NaN  NaN  NaN      NaN
221  Sat, Mar 31, 2018  3:00 pm     Charlotte Hornets      NaN  Washington Wizards  NaN    NaN  NaN  NaN      NaN
In [7]:
games_2017.head()
Out[7]:
   Date               Start (ET)  Visitor/Neutral    PTS  Home/Neutral            PTS.1             .1   Attend.  Notes
0  Tue, Oct 25, 2016  7:30 pm     New York Knicks    88   Cleveland Cavaliers     117    Box Score  NaN  20562    NaN
1  Tue, Oct 25, 2016  10:30 pm    San Antonio Spurs  129  Golden State Warriors   100    Box Score  NaN  19596    NaN
2  Tue, Oct 25, 2016  10:00 pm    Utah Jazz          104  Portland Trail Blazers  113    Box Score  NaN  19446    NaN
3  Wed, Oct 26, 2016  7:30 pm     Brooklyn Nets      117  Boston Celtics          122    Box Score  NaN  18624    NaN
4  Wed, Oct 26, 2016  7:00 pm     Dallas Mavericks   121  Indiana Pacers          130    Box Score  OT   17923    NaN
In [8]:
games_2016.head()
Out[8]:
   Date               Start (ET)  Visitor/Neutral       PTS  Home/Neutral           PTS.1             .1   Attend.  Notes
0  Tue, Oct 27, 2015  8:00 pm     Detroit Pistons       106  Atlanta Hawks          94     Box Score  NaN  19187    NaN
1  Tue, Oct 27, 2015  8:00 pm     Cleveland Cavaliers   95   Chicago Bulls          97     Box Score  NaN  21957    NaN
2  Tue, Oct 27, 2015  10:30 pm    New Orleans Pelicans  95   Golden State Warriors  111    Box Score  NaN  19596    NaN
3  Wed, Oct 28, 2015  7:30 pm     Philadelphia 76ers    95   Boston Celtics         112    Box Score  NaN  18624    NaN
4  Wed, Oct 28, 2015  7:30 pm     Chicago Bulls         115  Brooklyn Nets          100    Box Score  NaN  17732    NaN
In [9]:
column_names = {
    'Date': 'date',
    'Start (ET)': 'start',
    '\xa0': 'box',
    'Visitor/Neutral': 'away_team',
    'PTS': 'away_points',
    'Home/Neutral': 'home_team',
    'PTS.1': 'home_points',
    '\xa0.1': 'n_ot',
    'Attend.': 'attendance'
}
In [10]:
def clean_dataframe(games):
    games = (
        games.rename(columns=column_names)
        # Keep only rows that have at least 4 non-missing values.
        .dropna(thresh=4)
        .drop(columns=['Notes', 'n_ot', 'attendance', 'box'])
        # Dates look like "Tue, Oct 17, 2017"; an explicit format avoids
        # relying on the deprecated infer_datetime_format option.
        .assign(date=lambda x: pd.to_datetime(x['date'], format='%a, %b %d, %Y'))
    )
    return games
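It helps to see what this cleaning chain does on a tiny hand-made frame. The sketch below uses three rows: a played game, a scheduled game without a score yet, and a made-up row with only one filled cell. Note that dropna(thresh=4) keeps any row with at least four non-missing values, so the scheduled game (date, start time, two teams) survives with NaN points and only the mostly empty row is dropped. The date parsing here uses an explicit format string, which is an assumption about the date layout based on the output shown earlier.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Date": ["Tue, Oct 17, 2017", "Sat, Mar 31, 2018", "Sun, Apr 1, 2018"],
    "Start (ET)": ["8:01 pm", "7:30 pm", np.nan],
    "Visitor/Neutral": ["Boston Celtics", "Toronto Raptors", np.nan],
    "PTS": [99.0, np.nan, np.nan],
    "Home/Neutral": ["Cleveland Cavaliers", "Boston Celtics", np.nan],
    "PTS.1": [102.0, np.nan, np.nan],
})

# A reduced version of the renaming map used in the notebook.
names = {
    "Date": "date",
    "Start (ET)": "start",
    "Visitor/Neutral": "away_team",
    "PTS": "away_points",
    "Home/Neutral": "home_team",
    "PTS.1": "home_points",
}

cleaned = (
    raw.rename(columns=names)
    .dropna(thresh=4)  # rows need at least 4 non-missing values to stay
    .assign(date=lambda x: pd.to_datetime(x["date"], format="%a, %b %d, %Y"))
)

print(cleaned)
print(cleaned["away_points"].isna().sum(), "game(s) without a score yet")
```

If unplayed games should be excluded from a later step, they still need to be filtered explicitly, for example with dropna(subset=["away_points", "home_points"]).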
In [11]:
games_2018 = clean_dataframe(games_2018)
games_2017 = clean_dataframe(games_2017)
games_2016 = clean_dataframe(games_2016)
In [12]:
games_2018.head()
Out[12]:
   date        start     away_team          away_points  home_team              home_points
0  2017-10-17  8:01 pm   Boston Celtics     99.0         Cleveland Cavaliers    102.0
1  2017-10-17  10:30 pm  Houston Rockets    122.0        Golden State Warriors  121.0
2  2017-10-18  7:30 pm   Milwaukee Bucks    108.0        Boston Celtics         100.0
3  2017-10-18  8:30 pm   Atlanta Hawks      117.0        Dallas Mavericks       111.0
4  2017-10-18  7:00 pm   Charlotte Hornets  90.0         Detroit Pistons        102.0
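With the columns in this shape, a game result can be derived directly from the two points columns. The following is only an illustrative sketch, not necessarily how the rest of the notebook proceeds: the home_win and winner columns are hypothetical names, and the small frame reuses rows from the outputs above, including one game that has no score yet.

```python
import numpy as np
import pandas as pd

games = pd.DataFrame({
    "date": pd.to_datetime(["2017-10-17", "2017-10-17", "2018-03-31"]),
    "away_team": ["Boston Celtics", "Houston Rockets", "Toronto Raptors"],
    "away_points": [99.0, 122.0, np.nan],
    "home_team": ["Cleveland Cavaliers", "Golden State Warriors", "Boston Celtics"],
    "home_points": [102.0, 121.0, np.nan],
})

# Drop games without a score so NaN comparisons do not count as away wins.
played = games.dropna(subset=["away_points", "home_points"]).copy()

# True when the home team outscored the visitor.
played["home_win"] = played["home_points"] > played["away_points"]

# Pick the winning team's name row by row.
played["winner"] = np.where(played["home_win"], played["home_team"], played["away_team"])

print(played[["date", "winner", "home_win"]])
```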
In [13]:
from holoext.bokeh import Mod
import holoviews as hv
# Use Bokeh Backend for HoloViews
hv.extension('bokeh')