Develop a Hierarchical Bayesian Model with DAG (Directed acyclic graph)
- G B
- Feb 17

Section 1 - Background
Let's walk through how to develop a Marketing Mix Model that explains the impact of media investments across key channels on sales, while controlling for macroeconomic and market forces.
Section 2 - Methodology for Developing the Marketing Mix Model (MMX)
1. Overview of the Marketing Equation
The foundation of the Marketing Mix Model (MMX) we developed rests on understanding how various media spends influence overall sales. The equation can be expressed as:
Ln(Sales) = Intercept + Σ (βi * Ln(Media Spend_i)) + Σ (γj * Ln(Control Variable_j))
- βi: Represents the media coefficient (impact factor) for each media channel.
- f(Media Spend_i): The function capturing the diminishing returns and carryover effect (adstock) of media spend over time. Media Spend_i in the equation above refers to spend after this transformation.
- γj: Coefficients for various control variables (e.g., seasonality, macroeconomic factors, promotions, holidays).
- Intercept: Represents the baseline (log) sales when no media spend or external control variables are present.
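Because the equation is log-log, each βi can be read as an elasticity: a 1% increase in a channel's spend moves sales by roughly βi percent. The sketch below simply evaluates the equation as written; the function name `predict_log_sales`, the channel names, and every coefficient and input value are hypothetical and chosen only for illustration.

```python
import math

def predict_log_sales(intercept, media_betas, media_spend, control_gammas, control_values):
    """Evaluate Ln(Sales) = Intercept + sum(beta_i * Ln(spend_i)) + sum(gamma_j * Ln(control_j))."""
    media_term = sum(media_betas[ch] * math.log(media_spend[ch]) for ch in media_betas)
    control_term = sum(control_gammas[c] * math.log(control_values[c]) for c in control_gammas)
    return intercept + media_term + control_term

# Hypothetical coefficients and inputs, for illustration only.
betas = {"tv": 0.10, "search": 0.05}
gammas = {"store_count": 0.80}
spend = {"tv": 100_000.0, "search": 20_000.0}
controls = {"store_count": 250.0}

base = predict_log_sales(2.0, betas, spend, gammas, controls)

# Raise TV spend by 1% and measure the percentage change in predicted sales.
spend_up = dict(spend, tv=spend["tv"] * 1.01)
bumped = predict_log_sales(2.0, betas, spend_up, gammas, controls)
pct_change = (math.exp(bumped - base) - 1) * 100
print(f"Sales change for +1% TV spend: {pct_change:.4f}%")
```

With a TV coefficient of 0.10, the printed change is close to 0.1%, which is the elasticity reading of βi.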
2. Adstock and Carryover Effects
One key aspect of media impact is that it extends beyond the immediate week of spending, creating what is referred to as an "adstock" or carryover effect. The adstock transformation is applied to media spend through the following equation:
Adstocked Spend_t = α * Adstocked Spend_(t-1) + Media Spend_t
α is the decay factor, also known as the "adstock" or "carryover" factor. It lies between 0 and 1: the closer it is to 1, the longer past spend keeps influencing sales.
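A minimal sketch of the adstock recursion, assuming the adstocked spend before the first week is zero and that α lies in [0, 1). The function name `adstock` and the sample spend values are illustrative, not taken from the original model.

```python
def adstock(spend, alpha):
    """Geometric adstock: adstocked[t] = alpha * adstocked[t-1] + spend[t]."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1)")
    adstocked = []
    carry = 0.0  # assumption: no carryover exists before the first week
    for s in spend:
        carry = alpha * carry + s
        adstocked.append(carry)
    return adstocked

# Hypothetical weekly spend: a burst in week 1, nothing for two weeks, then a smaller burst.
weekly_spend = [100.0, 0.0, 0.0, 50.0]
print(adstock(weekly_spend, alpha=0.5))  # [100.0, 50.0, 25.0, 62.5]
```

The week-1 spend keeps contributing in weeks 2 to 4, halving each week, which is the carryover effect described above.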

3. Diminishing Returns: The Hill Function
To capture the diminishing returns of media spend, we implemented the Hill function, which models how the marginal impact of additional media spend decreases as spending increases. The Hill function takes the form:
f(Media Spend) = 1 / [1 + (Media Spend / EC) ^ (-slope)]
- EC (Effective Concentration) is the media spend level at which the media channel impact is half of its maximum potential.
- slope controls the steepness of the curve, determining how rapidly the returns diminish as media spend increases.
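The Hill function can be sketched in a few lines. The function name `hill`, the choice to return 0 at zero spend (where the formula itself is undefined), and the sample values are assumptions made for illustration.

```python
def hill(spend, ec, slope):
    """f(spend) = 1 / (1 + (spend / ec) ** (-slope)), with f(0) taken as 0."""
    if spend <= 0:
        return 0.0  # assumption: zero spend has zero effect
    return 1.0 / (1.0 + (spend / ec) ** (-slope))

# Hypothetical channel with EC = 100 and slope = 1.
for s in [50.0, 100.0, 200.0, 400.0]:
    print(s, round(hill(s, ec=100.0, slope=1.0), 4))
```

At spend equal to EC the function returns exactly 0.5, matching the definition of EC. Each doubling of spend here adds less than the previous one. With a slope above 1 the curve becomes S-shaped, so returns first accelerate and then diminish.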

4. Coefficients (Betas)
Each media channel has its own βi, which represents its effectiveness in driving sales. These media coefficients are learned through model training and quantify the relative contribution of each media channel to sales. Similarly, control variables such as macroeconomic factors, store count, markdowns, holidays, and seasonality are included with their respective γj coefficients.

5. Hierarchical Bayesian Modeling
Given the complexity of marketing mix models, we employed a Hierarchical Bayesian framework. This approach provides several advantages over traditional methods, especially when dealing with high uncertainty.
- Hierarchical Structure: The model incorporates multiple levels, where each media channel has its own parameters (e.g., adstock decay factor, Hill parameters). These parameters are modeled with shared hyperparameters, which allow the model to “borrow strength” across media channels.
- Bayesian Inference: By using Bayesian inference, we estimate distributions for the parameters rather than point estimates. This allows us to quantify uncertainty in our parameter estimates and make more informed decisions by considering the entire posterior distribution of the parameters.
- Prior Distributions: We incorporate prior knowledge about parameters (e.g., reasonable ranges for adstock decay, slopes, etc.). These priors guide the model towards more plausible parameter values, especially when data is noisy or sparse.
The hierarchical Bayesian approach is particularly suitable because:
- Handling Uncertainty: Marketing data is often noisy and uncertain, especially for media channels with lower spends or fewer observations. Bayesian models provide more accurate estimates by incorporating uncertainty directly into the estimation.
- Pooling Information: When data for some media channels is limited, the model can pool information across channels, leading to more robust and stable estimates.
- Posterior Distributions: Instead of single-point estimates, we obtain posterior distributions for each parameter, which provide a richer understanding of potential media effectiveness and allow us to quantify uncertainty in our estimates.
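The "borrow strength" idea can be illustrated without a full sampler. The sketch below assumes the simplest textbook case, a normal-normal model with known variances, where each channel's posterior mean is a precision-weighted average of its own estimate and the shared mean. It is not the model described above: a real hierarchical model would also estimate the shared mean and spread, typically with an MCMC library. The function name `partial_pool` and all numbers are hypothetical.

```python
def partial_pool(estimates, std_errors, mu, tau):
    """Shrink each channel's estimate toward the shared mean mu.

    estimates  : per-channel coefficient estimates from that channel's data alone
    std_errors : standard error of each estimate (larger means noisier data)
    mu, tau    : mean and standard deviation of the shared prior across channels
    """
    pooled = {}
    for channel, estimate in estimates.items():
        se2 = std_errors[channel] ** 2
        weight = tau**2 / (tau**2 + se2)  # weight placed on the channel's own data
        pooled[channel] = weight * estimate + (1 - weight) * mu
    return pooled

# Hypothetical inputs: TV is well measured, radio has little data and a noisy estimate.
estimates = {"tv": 0.30, "radio": 0.90}
std_errors = {"tv": 0.05, "radio": 0.40}
pooled = partial_pool(estimates, std_errors, mu=0.35, tau=0.10)
for channel, value in pooled.items():
    print(channel, round(value, 4))
```

The well-measured TV estimate barely moves, while the noisy radio estimate is pulled most of the way toward the shared mean. That is the pooling behaviour that keeps low-spend channels from producing extreme coefficients.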
Can we use other methods like XGBoost or Random Forest?
The answer is a big YES. I tried using Random Forest. (A similar approach is valid for XGBoost: it builds its trees sequentially, each one correcting the errors of the trees before it, whereas a Random Forest trains its trees independently on bootstrapped samples of the data.)
Model Training Code
Pay very close attention to how you optimize 1. Best_Alphas (from the adstock transformation), 2. Slope and 3. EC (from the Hill transformation). When you train the model and then call
predicted_sales_scaled = rf_model.predict(X_media_hill)
only the Random Forest hyperparameters have been optimized, NOT the alphas, slope or EC for each channel.
We need to tune those ourselves and make that part of the Random Forest optimization using nested for loops. I will demonstrate below how to do that.
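One way the nested loops could look is sketched here for a single channel. This is not the author's code. To stay dependency-free it scores each (alpha, slope, EC) combination with a one-variable least-squares fit as a stand-in; in the real workflow the `score` callable is where the Random Forest would be trained and evaluated. The names `grid_search` and `ols_sse`, the candidate grids, and the synthetic data are all assumptions.

```python
import random

def adstock(spend, alpha):
    """adstocked[t] = alpha * adstocked[t-1] + spend[t], starting from zero."""
    out, carry = [], 0.0
    for s in spend:
        carry = alpha * carry + s
        out.append(carry)
    return out

def hill(x, ec, slope):
    """Hill saturation, with zero effect assumed at zero spend."""
    return 0.0 if x <= 0 else 1.0 / (1.0 + (x / ec) ** (-slope))

def ols_sse(x, y):
    """Stand-in scorer: fit y = a + b*x by least squares, return the sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def grid_search(spend, sales, alphas, slopes, ecs, score=ols_sse):
    """Nested loops over alpha, slope and EC; keep the combination with the lowest error."""
    best = None
    for alpha in alphas:
        adstocked = adstock(spend, alpha)  # adstock depends only on alpha
        for slope in slopes:
            for ec in ecs:
                features = [hill(v, ec, slope) for v in adstocked]
                error = score(features, sales)  # a Random Forest fit would go here
                if best is None or error < best[0]:
                    best = (error, alpha, slope, ec)
    return best

# Synthetic data generated with known parameters (alpha=0.5, slope=1.5, EC=150).
rng = random.Random(0)
spend = [rng.uniform(0.0, 200.0) for _ in range(60)]
sales = [2.0 + 3.0 * hill(v, 150.0, 1.5) for v in adstock(spend, 0.5)]

best_error, best_alpha, best_slope, best_ec = grid_search(
    spend, sales,
    alphas=[0.3, 0.5, 0.7], slopes=[1.0, 1.5, 2.0], ecs=[100.0, 150.0, 200.0],
)
print(best_alpha, best_slope, best_ec)
```

On this noiseless synthetic data the search recovers the parameters used to generate it. With a flexible model such as a Random Forest, the error should be measured on held-out data rather than on the training set, otherwise almost any transformation can look like a good fit.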


