Teachers Credit Union Daily Cash Use Forecast

Introduction

Teachers Credit Union (TCU) is committed to being the best in-store member experience it can be, and it starts with the logistics of providing enough cash. Too much cash for a branch will exceed the cash limit, but too little cash will incur additonal cost for TCU to make emergency orders. The old method of handling this is by team members viewing how much cash is on hand, comparing that to last year’s usage in the current time frame, and estimating the need for the next week. However this naive seasonal estimation based on past year cannot take into consideration the yearly changes and has been made even harder due to covid situation. It is time now to build a better time series model to forecast the daily cash use by branches.

Data

We used daily cash use data of 49 branches at TCU from Jan 1, 2010 to July 31, 2020. It is a time series data of daily cash use with 171230 observations. Aside from daily cash use, it also includes variables such as holidays, cash order received per week, cash balance start of the day, etc.


	BranchID	CashOrdersReceived	CashBalance_StartOfDay	TotalCashUsed	CashToVault	CashUnder	CashOver	TotalCashDisbursed	TotalCashOpeningBalance	TotalCashReceivedAllSources	HolidayDate	Holiday	before	before_holiday	after	after_holiday	Year	Month	Day_name
BusinessDate
2010-01-01	99	nan	1195235.33	48082.66	nan	nan	nan	nan	1195235.33	nan	2010-01-01	New Year's Day	NaN	NaN	NaN	NaN	2010	1	Friday
2010-01-02	99	nan	1147152.68	nan	nan	nan	nan	nan	1147152.68	nan	NaN	NaN	NaN	NaN	2010-01-02	New Year's Day_After	2010	1	Saturday
2010-01-03	99	nan	1147152.68	92262.26	nan	nan	nan	nan	1147152.68	nan	NaN	NaN	NaN	NaN	NaN	NaN	2010	1	Sunday
2010-01-04	99	nan	1054890.42	64107.06	nan	nan	nan	nan	1054890.42	nan	NaN	NaN	NaN	NaN	NaN	NaN	2010	1	Monday
2010-01-05	99	nan	990783.35	12372.77	nan	nan	nan	nan	990783.35	nan	NaN	NaN	NaN	NaN	NaN	NaN	2010	1	Tuesday
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2020-07-27	501	nan	1174197.12	18095.32	nan	nan	nan	nan	1174197.12	nan	NaN	NaN	NaN	NaN	NaN	NaN	2020	7	Monday
2020-07-28	501	nan	1156101.79	161573.86	nan	nan	nan	nan	1156101.79	nan	NaN	NaN	NaN	NaN	NaN	NaN	2020	7	Tuesday
2020-07-29	501	440550.11	1435078.04	66574.28	nan	nan	nan	nan	1435078.04	nan	NaN	NaN	NaN	NaN	NaN	NaN	2020	7	Wednesday
2020-07-30	501	nan	1368503.76	227236.55	nan	nan	nan	nan	1368503.76	nan	NaN	NaN	NaN	NaN	NaN	NaN	2020	7	Thursday
2020-07-31	501	nan	1141267.21	55886.53	nan	600.15	nan	816941.38	1314870.56	643938.18	NaN	NaN	NaN	NaN	NaN	NaN	2020	7	Friday

171230 rows × 19 columns

EDA

Daily cash use by branches

We can see that the distribution of daily cash usage by branches have a lot of outliers for each branch.

Average daily cash use by branches

We can see that branches vary a lot in terms of average daily cash usage. The largest branch has an average daily cash usage of $256,456, while the smallest branch has an average daily cash usage of $9,957.

Compare two branches

Branch 270

Branch 398

The comparison between two branches further elaborate on the differences among branches. These differences suggest that we may need to build different models for different branches.

Initial EDA confirms on two main points:

Big differences by branch
Naive seaonal prediction based on last year’s data at the same time cannot capture the changes in the new year, especially during the COVID

Closing dates for branches

Closing days of week

Most of the branches will close on Saturdays.

Closing holidays

Most of the branches will close on Independence Day and Thanksgiving.

Modeling

Fit one branch

Branch=270

Seasonal Naive – Benchmark

This is the traditional way used by TCU to predict the cash usage.

ARIMA and SARIMA Models

This graph compares the results of ARIMA(6, 1, 0) and Seasonal ARIMA(6,1,0)(0,1,2)[7] models.

Prophet Model

This graph shows the result of prophet model.

Model Selection

SMAPE (Symetric Mean Absolute Percentage Error)

This is the main metric the client used to test the model.

Seasonal Naive SMAPE=107.52
ARIMA SMAPE=75.09
SARIMA SMAPE=65.83
SMAPE=34.56

Final selection - Prophet Model

Accurate and fast
Fully automatic: robust to outliers, missing data, and dramatic changes in time series.
Tunable forecasts: human-interpretable parameters, adding domain knowledge.

Prophet forecast

This graph shows the forecast result for the prophet model. Blue lines indicate the forcast values, black dots indicate the actual values

Prophet component

This graph shows the components of prophet model, including the trend, holiday effect, yearly, daily, weekly seasonality. It is worth noticing that daily cash usage during the covid is more spread out over a week while daily cash usage before the covid tend to cluster on one day.

Model Tuning

Hyperparameters for tuning:

changepoint_prior_scale [0.001, 0.5]: flexibility of the trend. Small – less flexible
seasonality_prior_scale [0.01, 10]: flexibility of the seasonality.
holidays_prior_scale [0.01, 10]: flexibility to fit holiday effects.
seasonality_mode: Options are ‘additive’ and ‘multiplicative’.

There are 49 branches, we looped through all branches and calculate the SMAPE for each branch. Then selected branches with high SMAPE to retune the model. Here is an example of code:

param_grid = {
'changepoint_prior_scale': [0.01, 0.05, 0.1, 0.5], 'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0], 'holidays_prior_scale': [0.1, 1.0, 10.0, 20],

}

##Generate all combinations of parameters¶
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())] smape_values = [] # Store the smapes for each params here

for params in all_params:

m = Prophet(holidays=holiday, yearly_seasonality=True, weekly_seasonality=False,  daily_seasonality=True, seasonality_mode='multiplicative', 
       **params)
m.add_seasonality(name='weekly_is_covid', period=7, fourier_order=3, condition_name='is_covid')
m.add_seasonality(name='weekly_not_covid', period=7, fourier_order=3, condition_name='not_covid')
#m.add_regressor('is_weekends')

m.fit(train_ts)  # Fit model with given params
cutoffs = pd.to_datetime(['2019-12-01', '2020-03-01','2020-04-01', '2020-05-01'])
df_cv = cross_validation(m, cutoffs=cutoffs, horizon='120 days', period='7 days', parallel="processes")
a = df_cv['y']
f = df_cv['yhat']
smape_value = smape(a, f)
smape_values.append(smape_value)

Results

Overall Prophet model SMAPE on test data: 59.71%.
Overall Seasonal Naïve method SMAPE on test data: 141.26%.
Improved performance: (141.26%-59.71%)/141.26% = 57.73%.
SMAPE on forecast data: 74.89%, ranked 1st out of 29 teams.
Built an Streamlit app to deploy prophet model.