You can customize the Meridian base model specification to fit your needs, for example by setting custom ROI priors, adjusting for seasonality, setting the maximum carryover duration, or using reach and frequency data. For more information about the Meridian base model and its options, see The Meridian model section.
Default model specification
You can use the following default model specification to start creating your model:
from meridian.model import prior_distribution
from meridian.model import spec

model_spec = spec.ModelSpec(
    prior=prior_distribution.PriorDistribution(),
    media_effects_dist='log_normal',
    hill_before_adstock=False,
    max_lag=8,
    unique_sigma_for_each_geo=False,
    media_prior_type='roi',
    roi_calibration_period=None,
    rf_prior_type='roi',
    rf_roi_calibration_period=None,
    organic_media_prior_type='contribution',
    organic_rf_prior_type='contribution',
    non_media_treatments_prior_type='contribution',
    knots=None,
    baseline_geo=None,
    holdout_id=None,
    control_population_scaling_id=None,
)
Set the priors
You can customize the priors in the default model specification. Every parameter gets its own independent prior that can be set by the prior argument in Meridian ModelSpec. For information about the priors and exceptions, see Default prior distributions.
The following example customizes the ROI prior distributions for each media channel. In this particular example, the ROI priors for the media channels differ.
- Channel 1: LogNormal(0.2, 0.7)
- Channel 2: LogNormal(0.3, 0.9)
- Channel 3: LogNormal(0.4, 0.6)
- Channel 4: LogNormal(0.3, 0.7)
- Channel 5: LogNormal(0.3, 0.6)
- Channel 6: LogNormal(0.4, 0.5)
import tensorflow_probability as tfp

from meridian import constants
from meridian.data import input_data

my_input_data = input_data.InputData( ... )
build_media_channel_args = my_input_data.get_paid_media_channels_argument_builder()

# Assuming Channel1,...,Channel6 are all media channels.
roi_m = build_media_channel_args(
    Channel1=(0.2, 0.7),
    Channel2=(0.3, 0.9),
    Channel3=(0.4, 0.6),
    Channel4=(0.3, 0.7),
    Channel5=(0.3, 0.6),
    Channel6=(0.4, 0.5),
)  # This creates a list of channel-ordered (mu, sigma) tuples.
roi_m_mu, roi_m_sigma = zip(*roi_m)

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_m_mu, roi_m_sigma, name=constants.ROI_M
    )
)
model_spec = spec.ModelSpec(prior=prior)
Where prior in ModelSpec is a PriorDistribution object specifying the prior distribution of each set of model parameters. Every parameter gets its own independent prior that can be set by the prior_distribution.PriorDistribution() constructor.
Model parameters with an m subscript (for example, roi_m) can either have dimensionality equal to the number of media channels or be one-dimensional. When the dimensionality is equal to the number of media channels, the order of parameter values in the custom prior corresponds to the order in data.media_channel, so that a custom prior is set for each respective media channel. If you can't determine a custom prior for some of the media channels, consider manually using the default tfp.distributions.LogNormal(0.2, 0.9) for those channels. When a one-dimensional prior is passed, the single dimension is used for all of the media channels.
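The ordering rule can be illustrated with a minimal sketch in plain Python, independent of Meridian. The channel names and the channel_order list are hypothetical stand-ins for data.media_channel; channels without a custom prior fall back to the default (0.2, 0.9) pair mentioned above.

```python
# Hypothetical channel order, standing in for data.media_channel.
channel_order = ["Channel1", "Channel2", "Channel3"]

# Custom (mu, sigma) pairs for the channels where prior knowledge exists.
# The dictionary order does not matter; channel_order decides the positions.
custom_priors = {"Channel3": (0.4, 0.6), "Channel1": (0.2, 0.7)}

# Default LogNormal parameters used when no custom prior is available.
default_prior = (0.2, 0.9)

# Arrange the pairs so that position i matches channel i.
ordered = [custom_priors.get(name, default_prior) for name in channel_order]
roi_m_mu, roi_m_sigma = (list(values) for values in zip(*ordered))

print(roi_m_mu)     # [0.2, 0.2, 0.4]
print(roi_m_sigma)  # [0.7, 0.9, 0.6]
```

The two resulting lists have one entry per channel, in channel order, which is the layout that a per-channel LogNormal prior expects.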
The logic for setting priors for model parameters with a c subscript (for example, gamma_c) is the same as it is for the m subscript. For c subscripts, the dimensionality can either be equal to the number of control variables or be one-dimensional. When the dimensionality is equal to the number of control variables, the order of parameter values in the custom prior corresponds to the order in data.control_variable, so that a custom prior is set for each respective control variable.
The following example uses a single pair of values to set the same ROI prior for every channel. In this particular example, the ROI priors for all media channels are identical, each represented as LogNormal(0.2, 0.9).
roi_mu = 0.2
roi_sigma = 0.9
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)
Meridian has a distinct ROI parameter (roi_rf) and beta parameter (beta_rf) for channels that have reach and frequency data. The previous code snippets therefore need some changes when specific channels have reach and frequency data. In this example, channel 5 and channel 6 have reach and frequency data.
To customize the ROI prior distributions for each media channel:
# ROI prior for channels without R&F data
build_media_channel_args = my_input_data.get_paid_media_channels_argument_builder()
roi_m = build_media_channel_args(
    Channel1=(0.2, 0.7),
    Channel2=(0.3, 0.9),
    Channel3=(0.4, 0.6),
    Channel4=(0.3, 0.7),
)
roi_m_mu, roi_m_sigma = zip(*roi_m)

# ROI prior for channels with R&F data
build_rf_channel_args = my_input_data.get_paid_rf_channels_argument_builder()
roi_rf = build_rf_channel_args(
    Channel5=(0.3, 0.6),
    Channel6=(0.4, 0.5),
)
roi_rf_mu, roi_rf_sigma = zip(*roi_rf)

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_m_mu, roi_m_sigma, name=constants.ROI_M
    ),
    roi_rf=tfp.distributions.LogNormal(
        roi_rf_mu, roi_rf_sigma, name=constants.ROI_RF
    ),
)
model_spec = spec.ModelSpec(prior=prior)
Note that the order of parameter values in roi_rf_mu and roi_rf_sigma must match data.rf_channel.
To set the same ROI prior for all media channels:
roi_mu = 0.2
roi_sigma = 0.9
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_mu, roi_sigma, name=constants.ROI_M
    ),
    roi_rf=tfp.distributions.LogNormal(
        roi_mu, roi_sigma, name=constants.ROI_RF
    ),
)
model_spec = spec.ModelSpec(prior=prior)
Use a training and test data split (Optional)
We recommend using a training and test data split to help avoid overfitting and to help ensure that the model generalizes well to new data. This can be done by using holdout_id. This is an optional step.
The following example shows a holdout_id argument that randomly sets 20% of the data as the test group:
import numpy as np

np.random.seed(1)
test_pct = 0.2  # 20% of data are held out
n_geos = len(data.geo)
n_times = len(data.time)
holdout_id = np.full([n_geos, n_times], False)
for i in range(n_geos):
    holdout_id[
        i,
        np.random.choice(
            n_times,
            int(np.round(test_pct * n_times)),
        )
    ] = True
model_spec = spec.ModelSpec(holdout_id=holdout_id)
Where holdout_id is an optional boolean tensor of dimensions (n_geos x n_times) or (n_times) that indicates which observations are excluded from the training sample. Only the response variable is excluded from the training sample; the media variables are still included because they can affect the Adstock of subsequent weeks. Default: None (meaning that no geos or time periods are held out).
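When drawing the held-out periods at random, note that np.random.choice samples with replacement by default, so the same period can be drawn twice and the held-out share can end up slightly below the target. The following is a minimal NumPy-only sketch of that difference; n_times is a hypothetical value standing in for len(data.time), and np.random.default_rng is used here only for the illustration.

```python
import numpy as np

n_times = 100   # hypothetical number of time periods
test_pct = 0.2  # target share of held-out periods
n_holdout = int(np.round(test_pct * n_times))
rng = np.random.default_rng(seed=1)

# Sampling with replacement (the NumPy default) can draw a period twice,
# so the number of distinct held-out periods is at most n_holdout.
with_repl = np.zeros(n_times, dtype=bool)
with_repl[rng.choice(n_times, size=n_holdout)] = True

# Sampling without replacement always marks exactly n_holdout periods.
without_repl = np.zeros(n_times, dtype=bool)
without_repl[rng.choice(n_times, size=n_holdout, replace=False)] = True

print(with_repl.sum() <= n_holdout)     # True
print(without_repl.sum() == n_holdout)  # True
```

If an exact 20% holdout per geo matters for your evaluation, passing replace=False in the per-geo loop is one way to get it.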
Tune the automatic seasonality adjustment (Optional)
Meridian applies automatic seasonality adjustment through a time-varying intercept approach. The effects can be tuned by adjusting the value of knots. For more information, see How the knots argument works.
knots is an optional integer or list of integers indicating the knots used to estimate time effects. When knots is a list of integers, the knot locations are provided by that list, where zero corresponds to a knot at the first time period, one corresponds to a knot at the second time period, and so on, with (n_times - 1) corresponding to a knot at the last time period.
When knots is an integer, there are a corresponding number of knots with locations equally spaced across the time periods, including knots at zero and (n_times - 1). When knots is 1, there is a single common regression coefficient used for all time periods.
If knots is set to None, the number of knots used is equal to the number of time periods in the case of a geo model. This is equivalent to each time period having its own regression coefficient. For national models, if knots is set to None, then the number of knots used is 1. By default, its value is set to None.
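To make "equally spaced, including knots at zero and (n_times - 1)" concrete, the following sketch computes such locations with NumPy. The helper equally_spaced_knots is hypothetical and only illustrates the description above; it is not Meridian's internal implementation, and the location returned for a single knot is arbitrary.

```python
import numpy as np


def equally_spaced_knots(n_knots: int, n_times: int) -> list[int]:
    """Illustrative helper (not part of Meridian): spread knots over time."""
    if n_knots == 1:
        # One knot means one common coefficient; its location is arbitrary.
        return [0]
    # Include both endpoints, 0 and n_times - 1, and round to valid indexes.
    return np.round(np.linspace(0, n_times - 1, n_knots)).astype(int).tolist()


print(equally_spaced_knots(5, 13))  # [0, 3, 6, 9, 12]
print(equally_spaced_knots(4, 10))  # [0, 3, 6, 9]
```

A list produced this way can be passed as knots in the same way as an explicit list of indexes.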
For guidance on how to choose the number of knots for time effects in the model, see Choosing the number of knots for time effects in the model.
Examples
- Setting knots to 1 so that there is no automatic seasonality adjustment. In this case, we recommend that you include your own seasonality or holiday dummies as control variables:
model_spec = spec.ModelSpec(knots=1)
- Setting knots to a relatively large number:
knots = round(0.8 * n_times)
model_spec = spec.ModelSpec(knots=knots)
- Setting knots for every 4 time points:
knots = np.arange(0, n_times, 4).tolist()
model_spec = spec.ModelSpec(knots=knots)
- Setting knots so that there are knots in November and December, but relatively few elsewhere. To make this example tractable, assume that there are only 12 data points and that the data is monthly (this assumption is not realistic or recommended). This example is summarized in the following table:
To set knots at indexes 0, 3, 6, 10, and 11, do the following:
knots = [0, 3, 6, 10, 11]
model_spec = spec.ModelSpec(knots=knots)
You can use a similar approach to help ensure knots at major holidays.
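One possible way to do this for weekly data is sketched below with NumPy only. The time axis, the holiday dates, and the 13-week spacing of the sparse knots are all hypothetical choices made for illustration; with real data, the time axis would come from data.time.

```python
import numpy as np

# Hypothetical weekly time axis: 52 Mondays starting 2021-01-04.
times = np.datetime64("2021-01-04") + np.arange(52) * np.timedelta64(7, "D")
n_times = len(times)

# Hypothetical holiday dates that should each fall on a knot.
holidays = np.array(["2021-11-25", "2021-12-25"], dtype="datetime64[D]")

# Index of the week containing each holiday (last week start <= holiday).
holiday_idx = np.searchsorted(times, holidays, side="right") - 1

# Sparse baseline knots every 13 weeks, plus the first and last periods.
sparse_idx = np.arange(0, n_times, 13)
knots = sorted(
    set(sparse_idx.tolist()) | set(holiday_idx.tolist()) | {0, n_times - 1}
)

print(knots)  # [0, 13, 26, 39, 46, 50, 51]
```

The resulting list can then be passed as knots when constructing the model specification.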
Tune the ROI calibration (Optional)
Meridian introduces an ROI calibration method that reparameterizes ROI as a model parameter. For more information, see ROI priors for calibration.
By default, the same non-informative prior on ROI is applied to all media channels. You can tune this feature by doing one of the following:
- Turning off the ROI calibration
- Setting the ROI calibration period
Turn off the ROI calibration
You can turn off the ROI calibration feature using media_prior_type='coefficient' and rf_prior_type='coefficient':
model_spec = spec.ModelSpec(
    media_prior_type='coefficient',
    rf_prior_type='coefficient',
)
The media_prior_type argument indicates whether to use a prior on roi_m, mroi_m, or beta_m in the PriorDistribution. Default: 'roi' (recommended)
Set the ROI calibration period
Although the media effect regression coefficients are not time-varying, there is a calibration window argument for setting the ROI (or mROI) prior. This is because ROI (or mROI) at a given time depends on additional factors that might vary across time:
- The Hill curves model the non-linear, diminishing returns of media execution. Therefore, the amount of media execution at a given time can impact the ROI.
- Media allocation across geos with different effectiveness.
- Cost of media execution.
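The first factor can be illustrated with a small worked example: with a fixed coefficient, a saturating Hill curve alone makes ROI depend on how much media runs in a period. The sketch below uses NumPy only, ignores Adstock, and all numbers (beta, ec, slope, cost_per_unit, media) are hypothetical values chosen for illustration.

```python
import numpy as np


def hill(x, ec, slope):
    """Hill saturation curve: x**slope / (x**slope + ec**slope)."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + ec**slope)


beta = 1000.0         # hypothetical maximum incremental outcome
ec, slope = 1.0, 1.0  # hypothetical half-saturation point and shape
cost_per_unit = 50.0  # hypothetical cost of one unit of media

media = np.array([0.5, 1.0, 2.0, 4.0])  # media units in four periods
incremental = beta * hill(media, ec, slope)
roi = incremental / (media * cost_per_unit)

# ROI falls as media execution rises, although beta never changes.
print(np.round(roi, 2))
```

Because the periods with heavier media execution have lower ROI, an ROI measured in one window does not necessarily describe another window, which is why the calibration period is configurable.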
You can calibrate the MMM using a subset of data when the experimental results don't reflect the modeling return on ad spend (ROAS) that the MMM aims to measure, for example, when the experiment timeframe is not aligned with the MMM data timeframe. For more information, see Media Mix Model Calibration with Bayesian Priors and ROI priors and calibration.
The following example shows how to specify the ROI calibration period from '2021-11-01' to '2021-12-20' for channel 1. Any media channel not specified in roi_period uses all available periods for ROI calibration.
roi_period = {
    'Channel1': [
        '2021-11-01',
        '2021-11-08',
        '2021-11-15',
        '2021-11-22',
        '2021-11-29',
        '2021-12-06',
        '2021-12-13',
        '2021-12-20',
    ],
}

roi_calibration_period = np.zeros((len(data.time), len(data.media_channel)))
# Mark the listed time periods for each channel named in roi_period.
for channel, dates in roi_period.items():
    roi_calibration_period[
        np.isin(data.time.values, dates), data.media_channel.values == channel
    ] = 1
# Channels not named in roi_period use all time periods.
roi_calibration_period[
    :, ~np.isin(data.media_channel.values, list(roi_period.keys()))
] = 1

model_spec = spec.ModelSpec(roi_calibration_period=roi_calibration_period)
Where roi_calibration_period is an optional boolean array of shape (n_media_times, n_media_channels) indicating the subset of time for media ROI calibration. If set to None, all times are used for media ROI calibration. Default: None.
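To see what such a mask looks like, the following NumPy-only sketch builds one from small, hypothetical coordinate arrays (times and channels stand in for data.time and data.media_channel). It follows the same logic as the example above but uses a boolean dtype to match the description.

```python
import numpy as np

# Hypothetical coordinates, standing in for data.time and data.media_channel.
times = np.array(["2021-11-01", "2021-11-08", "2021-11-15", "2021-11-22"])
channels = np.array(["Channel1", "Channel2"])

# Calibrate Channel1 on two weeks only; Channel2 is not listed.
roi_period = {"Channel1": ["2021-11-08", "2021-11-15"]}

mask = np.zeros((len(times), len(channels)), dtype=bool)
for channel, dates in roi_period.items():
    # Rows are time periods, columns are channels.
    mask[np.isin(times, dates), channels == channel] = True
# Channels without an entry use every time period.
mask[:, ~np.isin(channels, list(roi_period))] = True

print(mask.astype(int).tolist())  # [[0, 1], [1, 1], [1, 1], [0, 1]]
```

The first column (Channel1) is set only for the two listed weeks, while the second column (Channel2) is set for all periods.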
Note that Meridian has a different parameter, rf_roi_calibration_period, for the channels that have reach and frequency data.
The following example shows how to specify the ROI calibration period from '2021-11-01' to '2021-12-20' for channel 5, which uses reach and frequency as inputs.
roi_period = {
    'Channel5': [
        '2021-11-01',
        '2021-11-08',
        '2021-11-15',
        '2021-11-22',
        '2021-11-29',
        '2021-12-06',
        '2021-12-13',
        '2021-12-20',
    ],
}

# np.zeros takes the shape as a single tuple argument.
rf_roi_calibration_period = np.zeros((len(data.time), len(data.rf_channel)))
# Mark the listed time periods for each R&F channel named in roi_period.
for channel, dates in roi_period.items():
    rf_roi_calibration_period[
        np.isin(data.time.values, dates), data.rf_channel.values == channel
    ] = 1
# R&F channels not named in roi_period use all time periods.
rf_roi_calibration_period[
    :, ~np.isin(data.rf_channel.values, list(roi_period.keys()))
] = 1

model_spec = spec.ModelSpec(rf_roi_calibration_period=rf_roi_calibration_period)
Where rf_roi_calibration_period is an optional boolean array of shape (n_media_times, n_rf_channels). If set to None, all times are used for ROI calibration of the reach and frequency channels. Default: None.
Set additional attributes (Optional)
You can change any remaining attributes in the default model specification as needed. This section describes the remaining attributes and examples of how to change the values.
baseline_geo
An optional integer or a string for the baseline geo. The baseline geo is treated as the reference geo in the dummy encoding of geos. Non-baseline geos have a corresponding tau_g indicator variable, which means that they have a higher prior variance than the baseline geo. When set to None, the geo with the largest population is used as the baseline. Default: None
The following example shows the baseline geo set to 'Geo10':
model_spec = spec.ModelSpec(baseline_geo='Geo10')
hill_before_adstock
A boolean indicating whether to apply the Hill function before the Adstock function, in contrast to the default order of Adstock before Hill. Default: False
The following example applies the Hill function first by setting the value to True:
model_spec = spec.ModelSpec(hill_before_adstock=True)
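The order matters because the Hill function is nonlinear, so the two compositions generally give different results. The following NumPy-only sketch shows this with a simplified geometric Adstock and a Hill curve; both helper functions and all parameter values are hypothetical illustrations, not Meridian's internal implementation.

```python
import numpy as np


def adstock(x, alpha, max_lag):
    """Geometric-decay weighted average of the current and lagged values."""
    weights = alpha ** np.arange(max_lag + 1)
    weights = weights / weights.sum()
    padded = np.concatenate([np.zeros(max_lag), x])
    # Element t combines x[t], x[t-1], ..., x[t-max_lag].
    return np.array([
        np.dot(weights, padded[t:t + max_lag + 1][::-1]) for t in range(len(x))
    ])


def hill(x, ec, slope):
    """Hill saturation curve: x**slope / (x**slope + ec**slope)."""
    return x**slope / (x**slope + ec**slope)


media = np.array([0.0, 4.0, 0.0, 0.0, 0.0])  # one burst of media
alpha, max_lag, ec, slope = 0.5, 2, 1.0, 2.0

adstock_then_hill = hill(adstock(media, alpha, max_lag), ec, slope)
hill_then_adstock = adstock(hill(media, ec, slope), alpha, max_lag)

print(np.round(adstock_then_hill, 3))
print(np.round(hill_then_adstock, 3))
```

In this sketch, applying Adstock first spreads the burst over later periods before saturation is applied, whereas applying Hill first saturates the burst and then spreads the already-saturated effect.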
max_lag
An integer indicating the maximum number of lag periods (>= 0) to include in the Adstock calculation. Can also be set to None, which is equivalent to using the whole modeling period. Default: 8
The following example changes the value to 4:
model_spec = spec.ModelSpec(max_lag=4)
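When choosing max_lag, it can help to estimate how much carryover the truncation discards. Assuming a geometric decay with rate alpha (an assumption of this sketch), the share of an untruncated decay that lies beyond max_lag is alpha ** (max_lag + 1). The helper truncated_share and the value of alpha below are hypothetical.

```python
def truncated_share(alpha: float, max_lag: int) -> float:
    """Share of an infinite geometric decay that lies beyond max_lag."""
    # sum_{s > L} alpha**s / sum_{s >= 0} alpha**s == alpha**(L + 1)
    return alpha ** (max_lag + 1)


for max_lag in (4, 8):
    print(max_lag, round(truncated_share(0.7, max_lag), 3))
# 4 0.168
# 8 0.04
```

With a slow decay such as alpha = 0.7, cutting max_lag from 8 to 4 discards roughly 17% of the carryover instead of about 4%; with a fast decay, the difference is much smaller.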
media_effects_dist
A string to specify the distribution of media random effects across geos. Allowed values: 'normal' or 'log_normal'. Default: 'log_normal'
Where:
- media_effects_dist='log_normal' is \(\beta_{m,g} \overset{iid}{\sim} \text{LogNormal}(\beta_m, \eta_m^2)\)
- media_effects_dist='normal' is \(\beta_{m,g} \overset{iid}{\sim} \text{Normal}(\beta_m, \eta_m^2)\)
The following example shows how to change the value to 'normal':
model_spec = spec.ModelSpec(media_effects_dist='normal')
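The practical difference between the two options is the sign constraint: log-normal geo-level effects are always positive, while normal ones can be negative. The following NumPy-only sketch draws both from the same standard normal noise; the hyperparameter values beta_m and eta_m are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_geos = 1000
beta_m, eta_m = 0.1, 0.5  # hypothetical hyperparameters for one channel

z = rng.standard_normal(n_geos)
beta_gm_log_normal = np.exp(beta_m + eta_m * z)  # always positive
beta_gm_normal = beta_m + eta_m * z              # can be negative

print((beta_gm_log_normal > 0).all())  # True
print((beta_gm_normal < 0).any())      # True
```

Choosing 'normal' therefore allows a channel's effect to be negative in some geos, which 'log_normal' rules out.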
control_population_scaling_id
An optional boolean tensor of dimension (n_controls) indicating the control variables for which the control value will be scaled by population. Default: None
The following example specifies the control variable at index 1 to be scaled by population:
n_controls = len(data.control_variable)
control_population_scaling_id = np.full([n_controls], False)
control_population_scaling_id[1] = True
model_spec = spec.ModelSpec(
    control_population_scaling_id=control_population_scaling_id
)
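Selecting controls by name rather than by position can be less error-prone when the order of control variables changes. The following is a minimal NumPy-only sketch; the control names are hypothetical, and control_names stands in for data.control_variable.

```python
import numpy as np

# Hypothetical control names, standing in for data.control_variable.
control_names = np.array(["temperature", "store_count", "competitor_price"])

# Hypothetical choice: controls measured as raw counts scale with population.
scale_by_population = ["store_count"]

# True where the control should be scaled by population.
control_population_scaling_id = np.isin(control_names, scale_by_population)

print(control_population_scaling_id)  # [False  True False]
```

The resulting boolean array has one entry per control variable and can be passed as control_population_scaling_id.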
unique_sigma_for_each_geo
A boolean indicating whether to use a unique residual variance for each geo. If False, then a single residual variance is used for all geos. Default: False
The following example sets the model to use a unique residual variance for each geo:
model_spec = spec.ModelSpec(unique_sigma_for_each_geo=True)