View on GitHub

Teachers Credit Union Daily Cash Use Forecast

2021 Crossroads Classic Datathon

Introduction

Teachers Credit Union (TCU) is committed to being the best in-store member experience it can be, and it starts with the logistics of providing enough cash. Too much cash for a branch will exceed the cash limit, but too little cash will incur additonal cost for TCU to make emergency orders. The old method of handling this is by team members viewing how much cash is on hand, comparing that to last year’s usage in the current time frame, and estimating the need for the next week. However this naive seasonal estimation based on past year cannot take into consideration the yearly changes and has been made even harder due to covid situation. It is time now to build a better time series model to forecast the daily cash use by branches.

Data

We used daily cash use data of 49 branches at TCU from Jan 1, 2010 to July 31, 2020. It is a time series data of daily cash use with 171230 observations. Aside from daily cash use, it also includes variables such as holidays, cash order received per week, cash balance start of the day, etc.

BranchIDCashOrdersReceivedCashBalance_StartOfDayTotalCashUsedCashToVaultCashUnderCashOverTotalCashDisbursedTotalCashOpeningBalanceTotalCashReceivedAllSourcesHolidayDateHolidaybeforebefore_holidayafterafter_holidayYearMonthDay_name
BusinessDate                   
2010-01-0199nan1195235.3348082.66nannannannan1195235.33nan2010-01-01New Year's DayNaNNaNNaNNaN20101Friday
2010-01-0299nan1147152.68nannannannannan1147152.68nanNaNNaNNaNNaN2010-01-02New Year's Day_After20101Saturday
2010-01-0399nan1147152.6892262.26nannannannan1147152.68nanNaNNaNNaNNaNNaNNaN20101Sunday
2010-01-0499nan1054890.4264107.06nannannannan1054890.42nanNaNNaNNaNNaNNaNNaN20101Monday
2010-01-0599nan990783.3512372.77nannannannan990783.35nanNaNNaNNaNNaNNaNNaN20101Tuesday
............................................................
2020-07-27501nan1174197.1218095.32nannannannan1174197.12nanNaNNaNNaNNaNNaNNaN20207Monday
2020-07-28501nan1156101.79161573.86nannannannan1156101.79nanNaNNaNNaNNaNNaNNaN20207Tuesday
2020-07-29501440550.111435078.0466574.28nannannannan1435078.04nanNaNNaNNaNNaNNaNNaN20207Wednesday
2020-07-30501nan1368503.76227236.55nannannannan1368503.76nanNaNNaNNaNNaNNaNNaN20207Thursday
2020-07-31501nan1141267.2155886.53nan600.15nan816941.381314870.56643938.18NaNNaNNaNNaNNaNNaN20207Friday
171230 rows × 19 columns

EDA

Daily cash use by branches

We can see that the distribution of daily cash usage by branches have a lot of outliers for each branch.

Average daily cash use by branches

We can see that branches vary a lot in terms of average daily cash usage. The largest branch has an average daily cash usage of $256,456, while the smallest branch has an average daily cash usage of $9,957.

Compare two branches

Branch 270

Branch 398

The comparison between two branches further elaborate on the differences among branches. These differences suggest that we may need to build different models for different branches.

Initial EDA confirms on two main points:

  1. Big differences by branch
  2. Naive seaonal prediction based on last year’s data at the same time cannot capture the changes in the new year, especially during the COVID

Closing dates for branches

Closing days of week

Most of the branches will close on Saturdays.

Closing holidays

Most of the branches will close on Independence Day and Thanksgiving.

Modeling

Fit one branch

Branch=270

Seasonal Naive – Benchmark

This is the traditional way used by TCU to predict the cash usage.

ARIMA and SARIMA Models

This graph compares the results of ARIMA(6, 1, 0) and Seasonal ARIMA(6,1,0)(0,1,2)[7] models.

Prophet Model

This graph shows the result of prophet model.

Model Selection
SMAPE (Symetric Mean Absolute Percentage Error)

This is the main metric the client used to test the model.

Final selection - Prophet Model
Prophet forecast

This graph shows the forecast result for the prophet model. Blue lines indicate the forcast values, black dots indicate the actual values

Prophet component

This graph shows the components of prophet model, including the trend, holiday effect, yearly, daily, weekly seasonality. It is worth noticing that daily cash usage during the covid is more spread out over a week while daily cash usage before the covid tend to cluster on one day.

Model Tuning
Hyperparameters for tuning:

There are 49 branches, we looped through all branches and calculate the SMAPE for each branch. Then selected branches with high SMAPE to retune the model. Here is an example of code:

param_grid = {
'changepoint_prior_scale': [0.01, 0.05, 0.1, 0.5], 'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0], 'holidays_prior_scale': [0.1, 1.0, 10.0, 20],

}

##Generate all combinations of parameters¶
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())] smape_values = [] # Store the smapes for each params here

for params in all_params:

m = Prophet(holidays=holiday, yearly_seasonality=True, weekly_seasonality=False,  daily_seasonality=True, seasonality_mode='multiplicative', 
       **params)
m.add_seasonality(name='weekly_is_covid', period=7, fourier_order=3, condition_name='is_covid')
m.add_seasonality(name='weekly_not_covid', period=7, fourier_order=3, condition_name='not_covid')
#m.add_regressor('is_weekends')

m.fit(train_ts)  # Fit model with given params
cutoffs = pd.to_datetime(['2019-12-01', '2020-03-01','2020-04-01', '2020-05-01'])
df_cv = cross_validation(m, cutoffs=cutoffs, horizon='120 days', period='7 days', parallel="processes")
a = df_cv['y']
f = df_cv['yhat']
smape_value = smape(a, f)
smape_values.append(smape_value)
Results