Bayesian geo-lift in Python
Let’s consider a hypothetical scenario… You are a data scientist at a company that operates across Europe. You have been given a historical dataset of sales volumes, measured in thousands of units. The data is broken down by country, was collected at a weekly frequency, and covers the past 4 years.
At the start of 2022, the marketing department decided to refurbish all the stores in Denmark. Now, at the end of 2022, the company wants you to assess whether this refurbishment programme increased sales. If you tell them that the refurbishment scheme increased sales volumes, they will roll it out to other countries. Nobody has said this out loud, but in the back of your mind you worry that if you tell them refurbishments increase sales and that doesn’t actually happen in the future, the company’s profits will drop, the value of your shares will decrease, and your job security may be at risk.
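To make the setup concrete, here is a minimal sketch of how such a dataset might be laid out, assuming a wide format with one column per country and one row per week. Everything apart from Denmark, the weekly frequency, the four-year span and the start-of-2022 intervention is an assumption: the other country names are hypothetical placeholders, the weeks are taken to end on Sundays, and the sales values are random numbers, not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical country list: only Denmark comes from the scenario, the rest are placeholders.
countries = ["Denmark", "Austria", "Belgium", "Finland", "Norway", "Sweden"]

# Four years (2019-2022) of weekly timestamps; freq="W" anchors each week on a Sunday.
dates = pd.date_range(start="2019-01-06", end="2022-12-25", freq="W")

# Placeholder sales volumes in thousands of units: random draws, NOT real sales data.
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(
    rng.normal(loc=5.0, scale=0.5, size=(len(dates), len(countries))),
    index=dates,
    columns=countries,
)

# The refurbishment programme started at the beginning of 2022, in Denmark only.
treatment_time = pd.Timestamp("2022-01-01")
treated = "Denmark"
untreated = [c for c in countries if c != treated]

# Split the data into pre-intervention and post-intervention periods.
pre = df[df.index < treatment_time]
post = df[df.index >= treatment_time]

print(df.shape, pre.shape, post.shape)
```

With these assumed dates there are 208 weekly rows: 156 before the intervention and 52 after it. The untreated countries are the natural candidates for building a comparison against Denmark.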
Your boss is pretty tuned in, and she shares these concerns. She knows that while it might be easy to establish an association between the store refurbishments and changes in sales volumes, what we really want to know is whether the store refurbishments caused an increase in sales.
We know that the best way to make causal claims is to run a randomized controlled trial (sometimes known as an A/B test). If we had randomly assigned stores across Europe (or picked a…