Configure the model

You can customize the Meridian base model specification for your needs, for example by customizing ROI priors, adjusting for seasonality, setting the maximum carryover duration, or using reach and frequency data. For more information about the Meridian base model and options, see The Meridian model section.

Default model specification

You can use the following default model specification to start creating your model:

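# Imports assumed by the snippets on this page.
from meridian.model import prior_distribution
from meridian.model import spec

model_spec = spec.ModelSpec(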
    prior=prior_distribution.PriorDistribution(),
    media_effects_dist='log_normal',
    hill_before_adstock=False,
    max_lag=8,
    unique_sigma_for_each_geo=False,
    media_prior_type='roi',
    roi_calibration_period=None,
    rf_prior_type='roi',
    rf_roi_calibration_period=None,
    organic_media_prior_type='contribution',
    organic_rf_prior_type='contribution',
    non_media_treatments_prior_type='contribution',
    knots=None,
    baseline_geo=None,
    holdout_id=None,
    control_population_scaling_id=None,
)

Set the priors

You can customize the priors in the default model specification. Every parameter gets its own independent prior that can be set by the prior argument in Meridian ModelSpec. For information about the priors and exceptions, see Default prior distributions.

The following example customizes the ROI prior distributions for each media channel. In this particular example, the ROI priors for the media channels differ.

Channel 1: LogNormal(0.2, 0.7)
Channel 2: LogNormal(0.3, 0.9)
Channel 3: LogNormal(0.4, 0.6)
Channel 4: LogNormal(0.3, 0.7)
Channel 5: LogNormal(0.3, 0.6)
Channel 6: LogNormal(0.4, 0.5)

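import tensorflow_probability as tfp
from meridian import constants
from meridian.data import input_data

my_input_data = input_data.InputData( ... )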
build_media_channel_args = my_input_data.get_paid_media_channels_argument_builder()

# Assuming Channel1,...,Channel6 are all media channels.
roi_m = build_media_channel_args(
  Channel1=(0.2, 0.7),
  Channel2=(0.3, 0.9),
  Channel3=(0.4, 0.6),
  Channel4=(0.3, 0.7),
  Channel5=(0.3, 0.6),
  Channel6=(0.4, 0.5),
) # This creates a list of channel-ordered (mu, sigma) tuples.
roi_m_mu, roi_m_sigma = zip(*roi_m)

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_m_mu, roi_m_sigma, name=constants.ROI_M
    )
)
model_spec = spec.ModelSpec(prior=prior)

Here, the prior argument of ModelSpec is a PriorDistribution object that specifies the prior distribution of each set of model parameters. Each parameter set gets its own independent prior, which you set through the prior_distribution.PriorDistribution() constructor.

Model parameters with an m subscript (for example, roi_m) can either have dimensionality equal to the number of media channels or be one-dimensional. When the dimensionality is equal to the number of media channels, the order of parameter values in the custom prior corresponds to the order in data.media_channel, so that a custom prior is set for each respective media channel. If you can't determine a custom prior for some of the media channels, consider explicitly setting the default tfd.LogNormal(0.2, 0.9) for those channels. When a one-dimensional prior is passed, the single value is used for all of the media channels.

The logic for setting priors for model parameters with a c subscript (for example, gamma_c) is the same as it is for the m subscript. For c subscripts, the dimensionality can either be equal to the number of control variables or be one-dimensional. When the dimensionality is equal to the number of control variables, the order of parameter values in the custom prior corresponds to the order in data.control_variable, so that a custom prior is set for each respective control variable.
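The ordering rule for m and c subscripts can be illustrated with a small plain-Python sketch. The helper `order_priors` and the channel names are hypothetical and are not part of Meridian. The sketch only shows how per-channel (mu, sigma) pairs might be arranged in coordinate order, falling back to the default (0.2, 0.9) for any channel without a custom prior.

```python
DEFAULT_ROI_PRIOR = (0.2, 0.9)  # default LogNormal(mu, sigma) mentioned in the text


def order_priors(channel_order, custom_priors, default=DEFAULT_ROI_PRIOR):
    """Returns (mus, sigmas) tuples aligned with channel_order."""
    unknown = set(custom_priors) - set(channel_order)
    if unknown:
        raise ValueError(f"Unknown channels: {sorted(unknown)}")
    # Channels without a custom prior receive the default pair.
    pairs = [custom_priors.get(name, default) for name in channel_order]
    mus, sigmas = zip(*pairs)
    return mus, sigmas


# Hypothetical channel order, standing in for data.media_channel.
channels = ["Channel1", "Channel2", "Channel3"]
mus, sigmas = order_priors(
    channels, {"Channel3": (0.4, 0.6), "Channel1": (0.2, 0.7)}
)
print(mus)     # (0.2, 0.2, 0.4)
print(sigmas)  # (0.7, 0.9, 0.6)
```

The resulting tuples could then be passed as the location and scale of a per-channel LogNormal prior, in the same way as roi_m_mu and roi_m_sigma in the earlier example.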

The following example uses a single pair of values to set the same ROI prior for every channel. Here, the ROI prior for each media channel is LogNormal(0.2, 0.9).

roi_mu = 0.2
roi_sigma = 0.9
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)

Meridian has a distinct ROI parameter (roi_rf) and beta parameter (beta_rf) for channels that have reach and frequency data. The previous code snippets therefore need some changes when specific channels have reach and frequency data. In this example, channel 5 and channel 6 have reach and frequency data.

To customize the ROI prior distributions for each media channel:

# ROI prior for channels without R&F data
build_media_channel_args = my_input_data.get_paid_media_channels_argument_builder()
roi_m = build_media_channel_args(
  Channel1=(0.2, 0.7),
  Channel2=(0.3, 0.9),
  Channel3=(0.4, 0.6),
  Channel4=(0.3, 0.7),
)
roi_m_mu, roi_m_sigma = zip(*roi_m)

# ROI prior for channels with R&F data
build_rf_channel_args = my_input_data.get_paid_rf_channels_argument_builder()
roi_rf = build_rf_channel_args(
  Channel5=(0.3, 0.6),
  Channel6=(0.4, 0.5),
)
roi_rf_mu, roi_rf_sigma = zip(*roi_rf)

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_m_mu, roi_m_sigma, name=constants.ROI_M
    ),
    roi_rf=tfp.distributions.LogNormal(
        roi_rf_mu, roi_rf_sigma, name=constants.ROI_RF
    ),
)
model_spec = spec.ModelSpec(prior=prior)

Note that the order of parameter values in roi_rf_mu and roi_rf_sigma must match data.rf_channel.

To set the same ROI priors for all media channels:

roi_mu = 0.2
roi_sigma = 0.9
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(
        roi_mu, roi_sigma, name=constants.ROI_M
    ),
    roi_rf=tfp.distributions.LogNormal(
        roi_mu, roi_sigma, name=constants.ROI_RF
    ),
)
model_spec = spec.ModelSpec(prior=prior)

Use a training and test data split (Optional)

We recommend using a training and test data split to help avoid overfitting and to help ensure that the model generalizes well with new data. This can be done by using holdout_id. This is an optional step.

The following example shows a holdout_id argument that randomly sets 20% of the data as the test group:

import numpy as np

np.random.seed(1)
test_pct = 0.2  # 20% of data are held out
n_geos = len(data.geo)
n_times = len(data.time)
holdout_id = np.full([n_geos, n_times], False)
for i in range(n_geos):
  # Sample without replacement so that no time period is drawn twice.
  holdout_id[
    i,
    np.random.choice(
      n_times,
      int(np.round(test_pct * n_times)),
      replace=False,
    )
  ] = True
model_spec = spec.ModelSpec(holdout_id=holdout_id)

Where holdout_id is an optional boolean tensor of dimensions (n_geos x n_times) or (n_times) that indicates which observations are excluded from the training sample. Only the response variable is excluded from the training sample, and the media variables are still included as they can affect the Adstock of subsequent weeks. Default: None (meaning there aren't any holdout geo and times).
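As a rough illustration of the mask's shape and contents, the following standard-library sketch builds an (n_geos x n_times) holdout mask as a nested list. The helper `make_holdout_mask` is hypothetical and not part of Meridian. It samples time indexes without replacement, so each geo ends up with the same number of held-out periods.

```python
import random


def make_holdout_mask(n_geos, n_times, test_pct, seed=1):
    """Returns an n_geos x n_times nested list of booleans (True = held out)."""
    rng = random.Random(seed)
    n_holdout = round(test_pct * n_times)
    mask = [[False] * n_times for _ in range(n_geos)]
    for row in mask:
        # sample() draws without replacement, so no time index repeats.
        for t in rng.sample(range(n_times), n_holdout):
            row[t] = True
    return mask


mask = make_holdout_mask(n_geos=3, n_times=50, test_pct=0.2)
held_out_per_geo = [sum(row) for row in mask]
print(held_out_per_geo)  # [10, 10, 10]
```

A nested list like this would need to be converted to an array (for example, with np.asarray) before being passed as holdout_id.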

Tune the automatic seasonality adjustment (Optional)

Meridian applies automatic seasonality adjustment through a time-varying intercept approach. The effects can be tuned by adjusting the value of knots. For more information, see How the knots argument works.

knots is an optional integer or list of integers indicating the knots used to estimate time effects. When knots is a list of integers, the knot locations are provided by that list, where zero corresponds to a knot at the first time period, one corresponds to a knot at the second time period, etc., with (n_times - 1) corresponding to a knot at the last time period.

When knots is an integer, there are a corresponding number of knots with locations equally spaced across the time periods, including knots at zero and (n_times - 1). When knots is 1, there is a single common regression coefficient used for all time periods.

If knots is set to None, the number of knots used is equal to the number of time periods in the case of a geo model. This is equivalent to each time period having its own regression coefficient. For national models, if knots is set to None, then the number of knots used is 1. By default, its value is set to None.
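The rules above for None, integer, and list values can be summarized in a short sketch. The function `resolve_knot_locations` is hypothetical. It mirrors the behavior described in the text, but the rounding used for equally spaced knots is an assumption and might differ from Meridian's internal implementation.

```python
def resolve_knot_locations(knots, n_times, is_national):
    """Returns knot locations (time indexes) following the rules in the text."""
    if knots is None:
        # Default: one knot for national models, one per time period for geo models.
        knots = 1 if is_national else n_times
    if isinstance(knots, int):
        if knots == 1:
            # A single common coefficient; the location itself is illustrative.
            return [0]
        # Equally spaced, including the first and last time periods.
        step = (n_times - 1) / (knots - 1)
        return [round(i * step) for i in range(knots)]
    return sorted(knots)  # an explicit list of knot locations


print(resolve_knot_locations(None, 5, is_national=False))       # [0, 1, 2, 3, 4]
print(resolve_knot_locations(None, 5, is_national=True))        # [0]
print(resolve_knot_locations(3, 9, is_national=False))          # [0, 4, 8]
print(resolve_knot_locations([6, 0, 3], 9, is_national=False))  # [0, 3, 6]
```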

For guidance on how to choose the number of knots for time effects in the model, see Choosing the number of knots for time effects in the model.

Examples

Setting knots to 1 so that there is no automatic seasonality adjustment. In this case, we recommend that you include your own seasonality or holiday dummies as control variables:
```
model_spec = spec.ModelSpec(knots=1)
```

Setting knots to a relatively large number:

knots = round(0.8 * n_times)
model_spec = spec.ModelSpec(knots=knots)

Setting knots for every 4 time points:

knots = np.arange(0, n_times, 4).tolist()
model_spec = spec.ModelSpec(knots=knots)

Setting knots so that there is a knot in November and in December, with relatively sparse knots elsewhere. To make this example tractable, assume that there are only 12 data points and that the data is monthly (this assumption is not realistic or recommended). This example is summarized in the following table:

To set knots at indexes 0, 3, 6, 10, and 11, do the following:

knots = [0, 3, 6, 10, 11]
model_spec = spec.ModelSpec(knots=knots)

You can use a similar approach to help ensure knots at major holidays.
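One way to derive such a list from calendar dates, rather than typing the indexes by hand, is sketched here. The time axis is hypothetical and assumes 12 monthly periods starting in January, which is consistent with November and December falling at indexes 10 and 11.

```python
from datetime import date

# Hypothetical monthly time axis: 12 periods starting in January 2021.
time_periods = [date(2021, month, 1) for month in range(1, 13)]

# Months that should receive a knot: sparse early in the year, dense at year end.
knot_months = {1, 4, 7, 11, 12}

knots = [i for i, period in enumerate(time_periods) if period.month in knot_months]
print(knots)  # [0, 3, 6, 10, 11]
```

For weekly data, the same pattern could select the indexes of the weeks that contain major holidays.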

Tune the ROI calibration (Optional)

Meridian introduces an ROI calibration method that reparameterizes ROI as a model parameter. For more information, see ROI priors for calibration.

By default, the same non-informative prior on ROI is applied to all media channels. You can tune this feature by doing one of the following:

Turning off the ROI calibration
Setting the ROI calibration period

Turn off the ROI calibration

You can turn off the ROI calibration feature using media_prior_type='coefficient' and rf_prior_type='coefficient':

model_spec = spec.ModelSpec(
    media_prior_type='coefficient',
    rf_prior_type='coefficient',
)

The media_prior_type argument indicates whether to use a prior on roi_m, mroi_m, or beta_m in the PriorDistribution. Default: 'roi' (recommended)

Set the ROI calibration period

Although the media effect regression coefficients don't vary over time, there is a calibration window argument for setting the ROI (or mROI) prior. This is because ROI (or mROI) at a given time depends on additional factors that might vary across time:

The Hill curves model the non-linear, diminishing returns of media execution. Therefore, the amount of media execution at a given time can impact the ROI.
Media allocation across geos with different effectiveness.
Cost of media execution.
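The first factor can be illustrated numerically. The sketch below assumes a common form of the Hill function, 1 / (1 + (x / ec)^(-slope)), with hypothetical parameter values, and a constant cost per unit of media so that ROI is proportional to the effect per unit. It is an illustration of diminishing returns, not Meridian's implementation.

```python
def hill(x, ec=1.0, slope=1.0):
    """Hill saturation curve; ec is the half-saturation point (x must be > 0)."""
    return 1.0 / (1.0 + (x / ec) ** (-slope))


levels = (0.5, 1.0, 2.0, 4.0)  # hypothetical media execution levels
per_unit = [hill(media) / media for media in levels]

for media, value in zip(levels, per_unit):
    print(f"media={media:.1f}  effect={hill(media):.3f}  effect per unit={value:.3f}")
```

The effect per unit falls as the media level rises, so a period with heavy media execution has a lower ROI than a period with light execution, even though the underlying coefficients are the same.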

You can calibrate the MMM using a subset of the data when the experimental results don't reflect the modeling return on ad spend (ROAS) that the MMM aims to measure, for example when the experiment timeframe is not aligned with the MMM data timeframe. For more information, see Media Mix Model Calibration with Bayesian Priors and ROI priors and calibration.

The following example shows how to specify the ROI calibration period from '2021-11-01' to '2021-12-20' for channel 1. Any media channels not specified in roi_period use all available periods for ROI calibration.

roi_period = {
  'Channel1': [
    '2021-11-01',
    '2021-11-08',
    '2021-11-15',
    '2021-11-22',
    '2021-11-29',
    '2021-12-06',
    '2021-12-13',
    '2021-12-20',
  ],
}

roi_calibration_period = np.zeros((len(data.time), len(data.media_channel)))
for channel, dates in roi_period.items():
  roi_calibration_period[
      np.isin(data.time.values, dates), data.media_channel.values == channel
  ] = 1

roi_calibration_period[
    :, ~np.isin(data.media_channel.values, list(roi_period.keys()))
] = 1

model_spec = spec.ModelSpec(roi_calibration_period=roi_calibration_period)

Where roi_calibration_period is an optional boolean array of shape (n_media_times, n_media_channels) indicating the subset of time for media ROI calibration. If set to None, all times are used for media ROI calibration. Default: None.
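The mask logic can also be expressed as a small reusable helper. The following standard-library sketch uses a hypothetical function, `build_calibration_mask`, and hypothetical dates and channel names. It returns a nested list with one row per time period and one column per channel.

```python
def build_calibration_mask(times, channels, periods):
    """Returns a len(times) x len(channels) nested list of booleans.

    Channels listed in `periods` are calibrated only on their listed dates;
    all other channels are calibrated on every time period.
    """
    unknown = set(periods) - set(channels)
    if unknown:
        raise ValueError(f"Unknown channels: {sorted(unknown)}")
    mask = []
    for t in times:
        mask.append([
            (t in periods[ch]) if ch in periods else True
            for ch in channels
        ])
    return mask


# Hypothetical weekly time axis and channel names.
times = ["2021-10-25", "2021-11-01", "2021-11-08", "2021-11-15"]
channels = ["Channel1", "Channel2"]
mask = build_calibration_mask(
    times, channels, {"Channel1": ["2021-11-01", "2021-11-08"]}
)
for t, row in zip(times, mask):
    print(t, row)
```

Raising an error for unknown channel names helps catch typos that would otherwise leave a channel calibrated on all periods without any warning.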

Note that Meridian has a different parameter, rf_roi_calibration_period, for the channels that have reach and frequency data. The following example shows how to specify the ROI calibration period from '2021-11-01' to '2021-12-20' for channel 5, which uses reach and frequency as inputs.

roi_period = {
  'Channel5': [
    '2021-11-01',
    '2021-11-08',
    '2021-11-15',
    '2021-11-22',
    '2021-11-29',
    '2021-12-06',
    '2021-12-13',
    '2021-12-20',
  ],
}

# np.zeros takes the shape as a single tuple.
rf_roi_calibration_period = np.zeros((len(data.time), len(data.rf_channel)))
for channel, dates in roi_period.items():
  rf_roi_calibration_period[
      np.isin(data.time.values, dates), data.rf_channel.values == channel
  ] = 1

rf_roi_calibration_period[
    :, ~np.isin(data.rf_channel.values, list(roi_period.keys()))
] = 1

model_spec = spec.ModelSpec(rf_roi_calibration_period=rf_roi_calibration_period)

Where rf_roi_calibration_period is an optional boolean array of shape (n_media_times, n_rf_channels). If set to None, all times and geos are used for media ROI calibration. Default: None.

Set additional attributes (Optional)

You can change any remaining attributes in the default model specification as needed. This section describes the remaining attributes and examples of how to change the values.

`baseline_geo`

An optional integer or a string for the baseline geo. The baseline geo is treated as the reference geo in the dummy encoding of geos. Non-baseline geos have a corresponding tau_g indicator variable, which means that they have a higher prior variance than the baseline geo. When set to None, the geo with the biggest population is used as the baseline. Default: None

The following example shows the baseline geo set to 'Geo10':

model_spec = spec.ModelSpec(baseline_geo='Geo10')

`hill_before_adstock`

A boolean indicating whether to apply the Hill function before the Adstock function, in contrast to the default order of Adstock before Hill. Default: False

The following example applies the Hill function first by setting the value to True:

model_spec = spec.ModelSpec(hill_before_adstock=True)
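To see why the order matters, the following sketch applies both orders to a single burst of media. It assumes a geometric Adstock with weights normalized to sum to 1 and a simple Hill curve. The decay, max_lag, ec, and slope values are hypothetical, and the functions are simplified stand-ins rather than Meridian's implementation.

```python
def adstock(series, decay=0.5, max_lag=2):
    """Geometric carryover with weights normalized to sum to 1."""
    weights = [decay ** lag for lag in range(max_lag + 1)]
    total = sum(weights)
    out = []
    for t in range(len(series)):
        acc = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:  # ignore lags that reach before the first period
                acc += w * series[t - lag]
        out.append(acc / total)
    return out


def hill(x, ec=1.0, slope=1.0):
    """Hill saturation curve, defined as 0 at x = 0."""
    return 0.0 if x <= 0 else 1.0 / (1.0 + (x / ec) ** (-slope))


media = [4.0, 0.0, 0.0, 0.0]  # a single burst of media in the first period

adstock_then_hill = [hill(x) for x in adstock(media)]  # default order
hill_then_adstock = adstock([hill(x) for x in media])  # hill_before_adstock=True

print([round(v, 3) for v in adstock_then_hill])
print([round(v, 3) for v in hill_then_adstock])
```

In this sketch the two orders give different response paths for the same input, because saturation is applied either to the carried-over media or to each period's media before carryover.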

`max_lag`

An integer indicating the maximum number of lag periods (>= 0) to include in the Adstock calculation. Can also be set to None, which is equivalent to using the whole modeling period. Default: 8

The following example changes the value to 4:

model_spec = spec.ModelSpec(max_lag=4)

`media_effects_dist`

A string to specify the distribution of media random effects across geos. Allowed values: 'normal' or 'log_normal'. Default: 'log_normal'

Where:

media_effects_dist='log_normal' is \(\beta_{m,g} \overset{iid}{\sim} \text{Lognormal}(\beta_m, \eta_m^2)\)
media_effects_dist='normal' is \(\beta_{m,g} \overset{iid}{\sim} \text{Normal}(\beta_m, \eta_m^2)\)

The following example shows how to change the value to 'normal':

model_spec = spec.ModelSpec(media_effects_dist='normal')
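A practical difference between the two options is the sign of the geo-level effects. The following standard-library sketch draws hypothetical geo-level coefficients under each distribution. The values of beta_m and eta_m are made up for illustration, and the sampling is a simplification of the hierarchical model.

```python
import math
import random

rng = random.Random(0)
beta_m, eta_m = 0.1, 0.5  # hypothetical channel-level location and spread
n_geos = 1000

normal_draws = [rng.gauss(beta_m, eta_m) for _ in range(n_geos)]
# A log-normal draw is the exponential of a normal draw, so it is always positive.
log_normal_draws = [math.exp(rng.gauss(beta_m, eta_m)) for _ in range(n_geos)]

print(any(d < 0 for d in normal_draws))      # True: some geo effects are negative
print(all(d > 0 for d in log_normal_draws))  # True: every geo effect is positive
```

Under 'log_normal', every geo-level media effect is positive, whereas 'normal' allows some geos to have negative effects.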

`control_population_scaling_id`

An optional boolean tensor of dimension (n_controls) indicating the control variables for which the control value will be scaled by population. Default: None

The following example specifies the control variable at index 1 to be scaled by population:

control_population_scaling_id = np.full([n_controls], False)
control_population_scaling_id[1] = True
model_spec = spec.ModelSpec(
  control_population_scaling_id=control_population_scaling_id
)
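Selecting controls by name can be less error-prone than selecting them by index. The sketch below builds the same kind of boolean mask from hypothetical control names. Both the names and the choice of which control to scale are assumptions for illustration.

```python
# Hypothetical control names, standing in for data.control_variable.
control_names = ["temperature", "store_visits", "competitor_price"]
# Controls assumed to grow with geo population.
scale_by_population = {"store_visits"}

control_population_scaling_id = [
    name in scale_by_population for name in control_names
]
print(control_population_scaling_id)  # [False, True, False]
```

The list would need to be converted to an array (for example, with np.asarray) before being passed to ModelSpec.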

`unique_sigma_for_each_geo`

A boolean indicating whether to use a unique residual variance for each geo. If False, then a single residual variance is used for all geos. Default: False

The following example sets the model to use a unique residual variance for each geo:

model_spec = spec.ModelSpec(unique_sigma_for_each_geo=True)
