| Title: | Multivariate Age-Period-Cohort (MAPC) Modeling for Health Data |
|---|---|
| Description: | Bayesian multivariate age-period-cohort (MAPC) models for analyzing health data, with support for model fitting, visualization, stratification, and model comparison. Inference focuses on identifiable cross-strata differences, as described by Riebler and Held (2010) <doi:10.1093/biostatistics/kxp037>. Methods for handling complex survey data via the 'survey' package are included, as described in Mercer et al. (2014) <doi:10.1016/j.spasta.2013.12.001>. |
| Authors: | Lars Vatten [aut, cre] |
| Maintainer: | Lars Vatten <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-23 07:39:58 UTC |
| Source: | https://github.com/larsvatten/mapctools |
Aggregates specified columns of a data frame into summary statistics, preserving the potentially complex structure returned by aggregator functions (like data frames or inla.mdata objects) within list-columns. Aggregation is performed according to the sufficient statistics of the distribution specified for each column. Supported distributions: Gaussian, binomial. The entire data frame is aggregated into a single-row result.
aggregate_df(
  data,
  gaussian = NULL,
  gaussian.precision.scales = NULL,
  binomial = NULL
)
data |
A data frame. |
gaussian |
Gaussian columns in |
gaussian.precision.scales |
Scales for the precision of Gaussian observations.
|
binomial |
Binomial columns in |
A single-row data frame (tibble) containing:
A column n with the total number of rows in the input data.
For each specified column in gaussian or binomial, a corresponding
list-column (named e.g. colname_gaussian or colname_binomial).
Each element of these list-columns can be accessed by using the $ operator twice, e.g. through data$colname_gaussian$Y1 for the first element of the Gaussian summary.
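As a rough illustration of what aggregating by sufficient statistics means, the base-R sketch below collapses a Gaussian column to its count, mean and sum of squared deviations, and a 0/1 binomial column to successes and trials. The helper names gaussian_suffstats and binomial_suffstats and the example columns are hypothetical and not part of the package; the exact statistics and list-column layout returned by aggregate_df may differ.

```r
# Hypothetical helpers, for illustration only (not part of the package).

# Sufficient statistics for i.i.d. Gaussian observations:
# the count, the sample mean, and the sum of squared deviations.
gaussian_suffstats <- function(y) {
  y <- y[!is.na(y)]
  list(n = length(y), mean = mean(y), ss = sum((y - mean(y))^2))
}

# Sufficient statistics for 0/1 binomial observations:
# the number of successes and the number of trials.
binomial_suffstats <- function(y) {
  y <- y[!is.na(y)]
  list(successes = sum(y), trials = length(y))
}

# Made-up example data
df <- data.frame(bmi = c(22, 24, 26, 28), smoker = c(0, 1, 1, 0))

# A single-row result with one list-column per summarised variable
out <- data.frame(n = nrow(df))
out$bmi_gaussian <- list(gaussian_suffstats(df$bmi))
out$smoker_binomial <- list(binomial_suffstats(df$smoker))

# Access a summary through the list-column
out$bmi_gaussian[[1]]$mean
```

Storing each summary as one list element keeps its internal structure intact, which is the reason the result uses list-columns rather than flat numeric columns.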
Adds 1-indexed age, period and cohort (APC) index columns to a data frame, handling numeric or categorical age and period columns.
as.APC.df(data, age, period, age_order = NULL, period_order = NULL, M = 1)
data |
Data frame with age and period columns. |
age |
Age column in |
period |
Period column in |
age_order |
(Optional) Character vector giving the desired order of age levels.
If NULL and the |
period_order |
(Optional) Vector (numeric or character) giving the desired order of periods.
If NULL and |
M |
Grid factor, defined as the ratio of age interval width to period interval width. Defaults to 1 (i.e. assuming equal sized age and period increments). |
The data frame with new columns age_index, period_index and cohort_index,
sorted by (age_index, period_index).
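To make the role of the indices and the grid factor concrete, here is a minimal base-R sketch. It assumes the cohort-index convention commonly used in the APC literature, k = M * (I - i) + j, where i is the age index, j the period index and I the number of age groups. This convention is an assumption and is not confirmed by the package documentation; the variable names are illustrative.

```r
# Illustrative sketch (assumed convention, not taken from the package source).
ages    <- c(20, 25, 30)   # three age groups
periods <- c(2000, 2005)   # two periods
M <- 1                     # equal age and period interval widths

grid <- expand.grid(age = ages, period = periods)

# 1-indexed positions of each value among the sorted unique values
grid$age_index    <- match(grid$age, sort(unique(grid$age)))
grid$period_index <- match(grid$period, sort(unique(grid$period)))

# Assumed cohort index: k = M * (I - i) + j
n_age <- length(unique(grid$age))
grid$cohort_index <- M * (n_age - grid$age_index) + grid$period_index

# Sort by (age_index, period_index)
grid <- grid[order(grid$age_index, grid$period_index), ]
grid
```

Under this convention the oldest age group in the first period gets cohort index 1, and the index increases towards more recently born cohorts.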
Creates a data frame where age, period, and cohort values are placed into
columns specific to their stratum (defined by stratify_by), with other
strata combinations marked as NA. This structure is often useful for
specific modeling approaches, like certain Age-Period-Cohort (APC) models.
Optionally includes unique indices for random effects.
as.APC.NA.df(data, stratify_by, age, period, cohort, include.random = FALSE)
data |
Data frame with age, period, cohort, and stratification columns. |
stratify_by |
Stratification variable column. This column will be used to create the stratum-specific NA structure. It should ideally be a factor or character vector. |
age |
Age column in |
period |
Name of the period column (must be a numeric/integer column). |
cohort |
Name of the cohort column (must be a numeric/integer column). |
include.random |
Logical. Whether to include a unique index ('random') for each combination of age, period, and stratum, potentially for use as random effect identifiers in models. Defaults to FALSE. |
A data frame containing the original age, period,
cohort, and stratify_by columns, plus:
Dummy indicator columns for each level of stratify_by (e.g., Region_North, Region_South if Region was a stratifying variable).
Stratum-specific age, period, and cohort columns (e.g., age_Region_North,
period_Region_North, cohort_Region_North), containing the respective
value if the row belongs to that stratum, and NA otherwise.
If include.random = TRUE, a column named random with unique integer indices.
The rows are ordered primarily by the stratification variable levels. This is useful for defining random components in MAPC models.
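The stratum-specific NA structure described above can be sketched in base R as follows. This is an illustration of the layout only, using a made-up Region variable; it is not the package's implementation, and details such as row ordering and how the random index is built are assumptions.

```r
# Made-up example data with two strata
df <- data.frame(
  Region = c("North", "North", "South"),
  age    = c(1, 2, 1),
  period = c(1, 1, 2),
  cohort = c(2, 1, 3)
)

for (lev in unique(df$Region)) {
  in_stratum <- df$Region == lev

  # Dummy indicator column, e.g. Region_North
  df[[paste0("Region_", lev)]] <- as.integer(in_stratum)

  # Stratum-specific copies, NA outside the stratum, e.g. age_Region_North
  for (v in c("age", "period", "cohort")) {
    df[[paste0(v, "_", "Region_", lev)]] <- ifelse(in_stratum, df[[v]], NA)
  }
}

# Assumed form of the optional index: one unique integer per row
df$random <- seq_len(nrow(df))
df
```

Because each stratum-specific column is NA outside its stratum, a model term built on that column only contributes to observations from that stratum.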
Fits all configurations of shared vs. stratum-specific time effects:
Shared age and period effects, stratum-specific cohort effects.
Shared age and cohort effects, stratum-specific period effects.
Shared period and cohort effects, stratum-specific age effects.
Shared age effects, stratum-specific period and cohort effects.
Shared period effects, stratum-specific age and cohort effects.
Shared cohort effects, stratum-specific age and period effects.
Uses the fit_MAPC function.
The multivariate APC model is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037.
For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001,
implemented using the survey package.
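The configurations are identified by three-letter format strings such as "apC", where lower-case letters mark stratum-specific effects and upper-case letters mark shared effects. The small helper below, decode_apc_format, is hypothetical and only illustrates how such a string can be read; it is not exported by the package.

```r
# Hypothetical helper: decode an APC format string into effect types.
decode_apc_format <- function(fmt) {
  letters_in <- strsplit(fmt, "")[[1]]
  effect <- c(a = "age", p = "period", c = "cohort")[tolower(letters_in)]
  data.frame(
    effect = unname(effect),
    # Upper-case letter: shared effect; lower-case: stratum-specific effect
    type = ifelse(letters_in == toupper(letters_in),
                  "shared", "stratum-specific"),
    stringsAsFactors = FALSE
  )
}

# "apC": stratum-specific age and period effects, shared cohort effects
decode_apc_format("apC")
```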
fit_all_MAPC(
  data,
  response,
  family,
  stratify_by,
  reference_strata = NULL,
  age = "age",
  period = "period",
  grid.factor = 1,
  all_models = c("apC", "aPc", "Apc", "aPC", "ApC", "APc"),
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL,
  apc_prior = "rw1",
  include.random = FALSE,
  binomial.n = NULL,
  poisson.offset = NULL,
  apc_hyperprior = NULL,
  survey.design = NULL,
  control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
  track.progress = FALSE,
  verbose = FALSE
)
data |
A data frame containing the age, period, response, and stratification variables.
Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns.
Factor/character columns are handled, as long as they are properly sorted by |
response |
A string naming the response (outcome) variable in |
family |
A string indicating the likelihood family. The default is |
stratify_by |
The column in |
reference_strata |
Level of |
age |
The age column in |
period |
The period column in |
grid.factor |
(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1. |
all_models |
(Optional) Character vector of valid APC formats (e.g. |
extra.fixed |
(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.random |
(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.models |
(Optional) If the user specifies one or more additional random effects to be added in |
extra.hyper |
(Optional) If the user specifies one or more additional random effects to be added in |
apc_prior |
(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. |
include.random |
(Optional) Logical; if |
binomial.n |
(Optional) For the |
poisson.offset |
(Optional) For the |
apc_hyperprior |
(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
survey.design |
(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required.
The user can pass a |
control.compute |
(Optional) A list of control variables passed to the |
track.progress |
(Optional) Whether to report progress of the estimation of models in the console; defaults to |
verbose |
(Optional) This argument is passed along to the |
The returned object is of class all_mapc, which is a container for multiple mapc model fits (each typically fitted with a different APC format).
It also contains a model_selection element, which holds plots summarizing comparative fit metrics (DIC, WAIC and log-scores).
The following S3 methods are available:
print(): Prints a compact summary for each individual model fit.
summary(): Calls summary() on each contained mapc object, providing detailed posterior summaries.
plot(): Displays model comparison plots (DIC/WAIC/log-score comparisons).
These methods are intended to streamline multi-model workflows and allow quick comparison of results across model specifications.
A named list of mapc objects, one for each configuration of shared vs. stratum-specific time effects: APc, ApC, aPC, Apc, aPc, apC.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x See also https://www.r-inla.org for more information about the INLA method and software.
fit_MAPC for fitting a single model (more flexible; can pass your own formula and lincombs),
and the function inla() from the INLA package for the estimation machinery.
For complex survey data, see svydesign for the creation of a survey design object which can be passed to survey.design.
data("toy_data")
fits <- fit_all_MAPC(
  data = toy_data,
  response = count,
  family = "poisson",
  stratify_by = education,
  reference_strata = 1,
  age = age,
  period = period,
  apc_prior = "rw2",
  include.random = TRUE
)

# Print concise summary of the models and estimation procedure
print(fits)

# Plot comparison plots, based on comparative fit metrics
plot(fits)

# Optional: view full summary of all models (can be long)
# summary(fits)
Fit a Bayesian multivariate age-period-cohort model, and obtain posteriors for identifiable cross-strata contrasts. The method is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037. For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001, implemented using the survey package.
fit_MAPC(
  data,
  response,
  family,
  apc_format,
  stratify_by,
  reference_strata = NULL,
  age,
  period,
  grid.factor = 1,
  apc_prior = "rw1",
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL,
  include.random = FALSE,
  binomial.n = NULL,
  poisson.offset = NULL,
  inla_formula = NULL,
  lincombs = NULL,
  survey.design = NULL,
  apc_hyperprior = NULL,
  control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
  verbose = FALSE
)
data |
A data frame containing the age, period, response, and stratification variables.
Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns.
Factor/character columns are handled, as long as they are properly sorted by |
response |
A string naming the response (outcome) variable in |
family |
A string indicating the likelihood family. The default is |
apc_format |
A specification of the APC structure, with options:
Note: It is also possible to specify models with only one or two time effects, by omitting the letters corresponding to the time effects to be excluded. |
stratify_by |
A string naming the column in |
reference_strata |
Level of |
age |
Name of the age variable in |
period |
Name of the period variable in |
grid.factor |
(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1. |
apc_prior |
(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. |
extra.fixed |
(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.random |
(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.models |
(Optional) If the user specifies one or more additional random effects to be added in |
extra.hyper |
(Optional) If the user specifies one or more additional random effects to be added in |
include.random |
(Optional) Logical; if |
binomial.n |
(Optional) For the |
poisson.offset |
(Optional) For the |
inla_formula |
(Optional) If desired, the user can pass their own INLA-compatible formula to define the model. If not, a formula is generated automatically, with the models and priors defined. |
lincombs |
(Optional) If desired, the user can pass their own INLA-compatible linear combinations to be computed by the |
survey.design |
(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required.
The user can pass a |
apc_hyperprior |
(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
control.compute |
(Optional) A list of control variables passed to the |
verbose |
(Optional) This argument is passed along to the |
This function works as a wrapper around the inla() function from the INLA package, which executes the model fitting procedures using Integrated Nested Laplace Approximations.
The returned object is of class mapc. S3 methods are available for:
print(): Displays a concise summary of the model, including the APC format used, CPU time,
number of estimated parameters (fixed, random, hyperparameters, linear combinations), and model fit scores (DIC, WAIC, log-score).
summary(): Prints detailed posterior summaries of all estimated components, including fixed effects,
random effects, hyperparameters, and linear combinations, as estimated by the inla() function.
plot(): Visualizes model estimates of cross-strata contrast trends, using precomputed plots stored in the object.
The available plots depend on the APC format that was used.
You can control which effects to plot using the which argument (e.g. which="age" or which=c("age", "period")).
A named list containing the following elements:
model_fit: An object of class "inla", containing posterior densities, posterior summaries, measures of model fit etc. See the documentation for the inla() function for details.
plots: A named list of plots for each time effect. Extract them as plots$age, plots$period or plots$cohort.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x See also https://www.r-inla.org for more information about the INLA method and software.
fit_all_MAPC for fitting multiple models at once,
and the function inla() from the INLA package for the estimation machinery.
For complex survey data, see svydesign for the creation of a survey design object which can be passed to survey.design.
data("toy_data")
fit <- fit_MAPC(
  data = toy_data,
  response = count,
  family = "poisson",
  apc_format = "ApC",
  stratify_by = education,
  reference_strata = 1,
  age = age,
  period = period
)

# Print concise summary of the MAPC fit and the estimation procedure
print(fit)

# Plot estimated cross-strata contrast trends
plot(fit)

# Optional: view full summary of the model (can be long)
# summary(fit)
Constructs a set of linear combinations (contrasts) for age, period, and/or cohort effects
across different strata, relative to a specified reference stratum, suitable for use with
inla.make.lincomb from the INLA package.
generate_apc_lincombs(
  apc_format,
  data,
  strata,
  reference_strata,
  age = "age",
  period = "period",
  cohort = "cohort"
)
apc_format |
Character string containing any combination of
e.g. |
data |
A |
strata |
String giving the name of the factor column in |
reference_strata |
String indicating which level of |
age |
String name of the column in |
period |
String name of the column in |
cohort |
String name of the column in |
For each specified dimension (a, p, c), the function loops over all
unique values of age, period, or cohort in the data, and over all strata levels except
the reference. It then constructs a contrast that subtracts the effect in the reference
stratum from the effect in the other strata at each index.
A named list of linear combination objects as returned by
inla.make.lincomb() (INLA function). Each element corresponds to one contrast,
with names of the form “Age = x, Strata = y vs ref”, “Period = x, Strata = y vs ref”,
or “Cohort = x, Strata = y vs ref”, depending on apc_format.
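The looping scheme described above can be sketched without INLA by enumerating the contrasts and their weights. The snippet below only illustrates the structure (one contrast per index value and non-reference stratum, with weight +1 on the stratum and -1 on the reference); the object names are illustrative, the naming pattern is assumed to substitute the reference level for "ref", and the objects actually returned by the package are created with inla.make.lincomb().

```r
# Illustration only: enumerate age contrasts without calling INLA.
ages   <- c(1, 2, 3)          # unique age index values (made up)
strata <- c("1", "2", "3")    # strata levels (made up)
ref    <- "1"                 # reference stratum

contrasts <- list()
for (s in setdiff(strata, ref)) {
  for (x in ages) {
    nm <- sprintf("Age = %s, Strata = %s vs %s", x, s, ref)
    # Weight +1 on the effect in stratum s, -1 on the reference stratum
    contrasts[[nm]] <- c(stratum = 1, reference = -1)
  }
}

length(contrasts)
names(contrasts)[1]
```

With three age values and two non-reference strata this gives six contrasts, each of which sums to zero so that only the cross-strata difference is estimated.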
Generates the formula to pass to INLA for fitting a MAPC model, based on the APC format.
generate_MAPC_formula(
  df,
  APC_format,
  response,
  stratify_var,
  age = "age",
  period = "period",
  cohort = "cohort",
  intercept = FALSE,
  apc_prior = "rw1",
  apc_hyper = NULL,
  random_term = TRUE,
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL
)
df |
Data frame for which MAPC models should be fit |
APC_format |
A string where lower-case letters indicate stratum-specific time effects and upper-case letters indicate shared time effects. |
response |
A string, name of the column in |
stratify_var |
Stratification variable. At least one time effect should be stratum-specific, and at least one should be shared. |
age |
Name of age column |
period |
Name of period column |
cohort |
Name of cohort column |
intercept |
Boolean, indicating if an overall intercept should be included in the formula. |
apc_prior |
Which prior model to use for the time effects. |
apc_hyper |
If the user wants non-default hyperpriors for the random time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
random_term |
Logical, indicating whether a random term should be included in the model. |
extra.fixed |
Name of additional fixed effects. |
extra.random |
Name of additional random effects. |
extra.models |
Models for additional random effects. Supported |
extra.hyper |
If the user wants non-default hyperpriors for the additional random effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
A formula object that can be passed to INLA to fit the desired MAPC model.
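To give a feel for what such a formula might look like, the sketch below assembles one from strings for the "apC" format. Everything here is an assumption made for illustration: the helper build_formula is hypothetical, the stratum-specific column names follow the age_Region_North pattern described for as.APC.NA.df, and the terms the package actually generates (for example intercepts, constraints or hyperpriors) are likely to differ. No INLA installation is needed, since f() is only parsed, not evaluated.

```r
# Hypothetical helper: build an INLA-style formula from strings.
build_formula <- function(response, shared, specific, strat_var,
                          strata_levels, prior = "rw1") {
  # One f() term per shared time effect
  shared_terms <- sprintf('f(%s, model = "%s")', shared, prior)

  # One f() term per stratum-specific effect and stratum level,
  # using the assumed <effect>_<stratvar>_<level> column naming
  specific_terms <- unlist(lapply(specific, function(v) {
    sprintf('f(%s_%s_%s, model = "%s")', v, strat_var, strata_levels, prior)
  }))

  rhs <- paste(c("-1", shared_terms, specific_terms), collapse = " + ")
  as.formula(paste(response, "~", rhs))
}

# "apC": shared cohort effect, stratum-specific age and period effects
fml <- build_formula(
  response = "count",
  shared = "cohort",
  specific = c("age", "period"),
  strat_var = "education",
  strata_levels = c("1", "2")
)
fml
```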
Bins a specified numeric variable into intervals, counts observations for each combination of bin group and value of a second specified variable, and plots one line per bin group using ggplot2. If a stratification variable is provided, counts are calculated per stratum and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_binned_counts(
  data,
  x,
  bin_by,
  stratify_by = NULL,
  for_each = NULL,
  n_bins = 8,
  bin_width = NULL,
  title = "Observation counts",
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)
data |
Data frame containing all input variables. |
x |
Variable in |
bin_by |
Numeric variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
n_bins |
(Optional) Number of bins to create across |
bin_width |
(Optional) Width of the bins created across |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself. |
If for_each is not supplied, a ggplot object showing counts
per x and bin groups, optionally faceted by stratify_by. If for_each
is supplied, a named list of such plots.
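The binning-and-counting step behind such a plot can be sketched in base R with cut() and table(). This is only an illustration on made-up data; how the package chooses bin edges and labels is not documented here, so equal-width bins are an assumption.

```r
# Made-up data: every age from 20 to 59 observed once in each period
df <- data.frame(
  period = rep(2000:2004, each = 40),
  age    = rep(20:59, times = 5)
)

# Bin age into 4 equal-width intervals (assumed binning scheme)
df$age_bin <- cut(df$age, breaks = 4)

# Count observations per period and age bin, in long format
counts <- as.data.frame(
  table(period = df$period, age_bin = df$age_bin),
  responseName = "n"
)
head(counts)
```

The long-format counts table has one row per period and bin, which is the shape a line plot with one line per bin group needs.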
plot_counts_1D, plot_counts_2D,
plot_counts_with_mean, ggplot
data("toy_data")

# Counts by period, binned by age
plot_binned_counts(toy_data, x = period, bin_by = age, n_bins = 4)

# Counts by period, binned by age, stratified by education levels
plot_binned_counts(toy_data, period, bin_by = age, n_bins = 4, stratify_by = education)

# Counts by period, binned by age, stratified by education levels, for each sex
plot_binned_counts(toy_data, period, bin_by = age, n_bins = 4, stratify_by = education, for_each = sex)
Computes the number of observations at each value of a specified variable and creates a line plot of these counts using ggplot2. If a stratification variable is provided, counts are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_counts_1D(
  data,
  x,
  stratify_by = NULL,
  for_each = NULL,
  title = "Observation counts",
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)
data |
Data frame containing all input variables. |
x |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself. |
A ggplot object displaying counts across the variable supplied in x,
optionally stratified by stratify_by. If for_each is supplied, separate plots are created in separate windows for each level.
Visuals can be modified with ggplot2.
plot_counts_2D, plot_binned_counts,
plot_counts_with_mean, ggplot
data("toy_data")

# Counts by age
plot_counts_1D(toy_data, x = age)

# Counts by age, stratified by education level
plot_counts_1D(toy_data, x = age, stratify_by = education)

# Count by age, stratified by education level, for each sex
plot_counts_1D(toy_data, x = age, stratify_by = education, for_each = sex)
Computes the number of observations for each combination of two specified variables, and displays the result as a heatmap using ggplot2. If a stratification variable is provided, counts are calculated per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_counts_2D(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  color_gradient = c("blue", "beige", "red"),
  title = "Observation counts",
  legend_title = NULL,
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)
data |
Data frame containing all input variables. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
color_gradient |
(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_counts>, <middle_counts>, <high_counts>).
Defaults to |
title |
(Optional) Plot title; defaults to |
legend_title |
(Optional) Legend title for color gradient; defaults to "Count". |
subtitle |
(Optional) Plot subtitle; defaults to |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
If for_each is not supplied, a ggplot object
showing a heatmap of counts for each x-y combination, optionally
faceted by stratify_by. If for_each is supplied, a named list
of such ggplot objects, one per unique value of for_each.
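The per-stratum counting behind the heatmap can be sketched in base R with aggregate(). The data and column names below are made up, and the package may compute its counts differently; the point is only to show one row per observed combination of x, y and stratum.

```r
# Made-up individual-level data
df <- data.frame(
  age       = c(20, 20, 25, 25, 25, 30),
  period    = c(2000, 2000, 2000, 2005, 2005, 2005),
  education = c("low", "high", "low", "low", "low", "high")
)

# Count rows for each observed (age, period, education) combination
df$n <- 1
counts <- aggregate(n ~ age + period + education, data = df, FUN = sum)
counts
```

Unlike table(), aggregate() returns only the combinations that occur in the data, so unobserved cells would appear as gaps in a heatmap rather than as zero counts.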
plot_counts_1D, plot_binned_counts,
plot_counts_with_mean, ggplot
data("toy_data")

# Heatmap of counts by age and period
plot_counts_2D(toy_data, x = age, y = period)

# Heatmap of counts by age and period, stratified by education
plot_counts_2D(toy_data, x = period, y = age, stratify_by = education)

# Heatmap of counts by age and period, stratified by education, for each sex
plot_counts_2D(toy_data, x = period, y = age, stratify_by = education, for_each = sex)
Computes counts of observations for each combination of two variables and displays them as a heatmap, with an overlaid line showing the mean of the second variable across the first. If a stratification variable is provided, observations are counted per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_counts_with_mean(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  title = NULL,
  subtitle = NULL,
  heatmap_legend = "Count",
  mean_legend = "Mean",
  viridis_color_option = "D",
  mean_color = "coral"
)
data |
Data frame containing all input variables. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts and means are computed for each level of |
for_each |
(Optional) Additional stratification variable.
If supplied, separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
heatmap_legend |
(Optional) Label for the heatmap legend; defaults to "Count". |
mean_legend |
(Optional) Label for the overlay mean line legend; defaults to "Mean". |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself. |
mean_color |
(Optional) Color for the overlay mean line; defaults to "coral". |
If for_each is not supplied, a ggplot object showing a
heatmap of counts with a mean overlay line, optionally faceted by stratify_by.
If for_each is supplied, a named list of such plots.
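The overlay line is the mean of y within each value of x. A minimal base-R sketch of that computation, on made-up data and with illustrative object names, is shown below; it is not the package's implementation.

```r
# Made-up data: ages observed in two periods
df <- data.frame(
  period = c(2000, 2000, 2000, 2005, 2005),
  age    = c(20, 30, 40, 30, 50)
)

# Mean of y (age) for each value of x (period)
mean_age <- tapply(df$age, df$period, mean)

# Long-format table that could be drawn as a line on top of the heatmap
overlay <- data.frame(
  period   = as.numeric(names(mean_age)),
  mean_age = as.vector(mean_age)
)
overlay
```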
plot_counts_1D, plot_counts_2D,
plot_binned_counts, ggplot
data("toy_data")

# Heatmap of counts by age vs. period with mean age overlay
plot_counts_with_mean(toy_data, x = period, y = age)

# Heatmap of counts by age vs. period with mean age overlay, stratified by education
plot_counts_with_mean(toy_data, x = period, y = age, stratify_by = education)

# Heatmap of counts by age vs. period with mean age overlay, stratified by education, for each sex
plot_counts_with_mean(toy_data, x = period, y = age, stratify_by = education, for_each = sex)
Generates ggplot2 line plots of estimated linear combinations for age, period, and/or cohort effects from an INLA fit, stratified by a factor. Returns a named list of ggplot objects for each requested effect.
plot_lincombs(
  inla_fit,
  apc_model,
  data,
  strata_col,
  reference_level,
  family = NULL,
  age_ind = "age",
  period_ind = "period",
  cohort_ind = "cohort",
  age_title = NULL,
  period_title = NULL,
  cohort_title = NULL,
  y_lab = NULL,
  age_vals = NULL,
  period_vals = NULL,
  cohort_vals = NULL,
  age_breaks = NULL,
  age_limits = NULL,
  period_breaks = NULL,
  period_limits = NULL,
  cohort_breaks = NULL,
  cohort_limits = NULL,
  PDF_export = FALSE
)
inla_fit |
An object returned by the |
apc_model |
Character string indicating the configuration of shared vs. stratum-specific time effects in the model. |
data |
The data frame used to fit |
strata_col |
Character name of the factor column in |
reference_level |
Character value of |
family |
Optional character; if |
age_ind |
Character name of the age variable in |
period_ind |
Character name of the period variable in |
cohort_ind |
Character name of the cohort variable in |
age_title |
Optional plot title for the age effect. |
period_title |
Optional plot title for the period effect. |
cohort_title |
Optional plot title for the cohort effect. |
y_lab |
Optional y-axis label; if |
age_vals |
Optional numeric vector of x-values for age; defaults to
|
period_vals |
Optional numeric vector of x-values for period; defaults to
|
cohort_vals |
Optional numeric vector of x-values for cohort; defaults to
|
age_breaks |
Optional vector of breaks for the age plot x-axis. |
age_limits |
Optional numeric vector of length 2 giving x-axis limits for age. |
period_breaks |
Optional vector of breaks for the period plot x-axis. |
period_limits |
Optional numeric vector of length 2 giving x-axis limits for period. |
cohort_breaks |
Optional vector of breaks for the cohort plot x-axis. |
cohort_limits |
Optional numeric vector of length 2 giving x-axis limits for cohort. |
PDF_export |
Logical; if |
A named list of ggplot objects. Elements are
"age", "period", and/or "cohort" depending on apc_model.
if (requireNamespace("INLA", quietly = TRUE)) {
  # Load toy dataset
  data("toy_data")

  # Filter away unobserved cohorts (see plot_missing_data() function):
  library(dplyr)
  toy_data.f <- toy_data %>%
    filter(sex == "female", cohort > 1931)

  # Load precomputed 'mapc' object
  apC_fit.f <- readRDS(system.file("extdata", "quickstart-apC_fit_f.rds", package = "MAPCtools"))

  # Extract INLA object:
  apC_fit.inla <- apC_fit.f$model_fit

  apC_plots <- plot_lincombs(
    inla_fit = apC_fit.inla,
    apc_model = "apC",
    data = toy_data.f,
    strata_col = "education",
    reference_level = "1",
    family = "poisson"
  )

  # Display the age effect plot
  print(apC_plots$age)

  # Display the period effect plot
  print(apC_plots$period)
}
Computes the mean of a specified response variable at each value of a specified x variable and displays a line plot using ggplot2. If a stratification variable is provided, means are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_mean_response_1D(
  data,
  response,
  x,
  stratify_by = NULL,
  for_each = NULL,
  title = NULL,
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)
data |
A |
response |
A numeric variable in |
x |
A variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, means are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for the color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See the viridis documentation for details. |
A ggplot object displaying the mean of the response across the variable supplied in x,
optionally stratified by stratify_by. If for_each is supplied, separate plots are created in separate windows for each level.
Visuals can be modified with ggplot2.
data("toy_data")

# Mean by age
plot_mean_response_1D(toy_data, response = count, x = age)

# Mean count by age, stratified by education
plot_mean_response_1D(toy_data, response = count, x = age, stratify_by = education)

# Mean count by age, stratified by education, for each sex
plot_mean_response_1D(toy_data, response = count, x = age, stratify_by = education, for_each = sex)
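The aggregation that plot_mean_response_1D() is described as performing (the mean of the response at each value of x, optionally per stratum) can be sketched in base R. This is only an illustration on made-up data, not the package's internal code; the data frame `df` and the result names are assumptions.

```r
# Synthetic stand-in data; column names mirror toy_data, values are made up.
df <- data.frame(
  age       = c(20, 20, 20, 30, 30, 30),
  education = factor(c("1", "1", "2", "1", "2", "2")),
  count     = c(2, 4, 6, 1, 3, 5)
)

# Mean response at each value of x (here: age).
means_by_age <- aggregate(count ~ age, data = df, FUN = mean)

# Mean response at each value of x within each stratum (here: education).
means_by_age_edu <- aggregate(count ~ age + education, data = df, FUN = mean)

print(means_by_age)
print(means_by_age_edu)
```

Each row of `means_by_age_edu` corresponds to one point on one of the colored lines in the stratified plot.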
Computes the mean of a specified response variable for each combination of two variables and displays it as a heatmap using ggplot2. If a stratification variable is provided, means are calculated per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
plot_mean_response_2D(
  data,
  response,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  color_gradient = c("blue", "beige", "red"),
  title = NULL,
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)
data |
Data frame containing all input variables. |
response |
Numeric variable in |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, means are computed for each combination of |
for_each |
(Optional) Additional stratification variable.
If supplied, separate plot windows are created per level of |
color_gradient |
(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_counts>, <middle_counts>, <high_counts>).
Defaults to |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
A ggplot object showing the mean of response
across x and y, optionally stratified by stratify_by. If for_each is supplied, separate plots are created in separate windows for each level.
data("toy_data")

# Mean count by age and period
plot_mean_response_2D(toy_data, response = count, x = period, y = age)

# Mean count by age and period, stratified by education level
plot_mean_response_2D(toy_data, response = count, x = period, y = age, stratify_by = education)

# Mean count by age and period, stratified by education level, for each sex
plot_mean_response_2D(toy_data, response = count, x = period, y = age, stratify_by = education, for_each = sex)
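The grid of means that such a heatmap displays can be sketched with base R's tapply(). This is a minimal illustration on made-up data, not the package's implementation; `df` and `mean_grid` are hypothetical names.

```r
# Synthetic stand-in data; the (age = 20, period = 2000) cell is unobserved.
df <- data.frame(
  age    = c(20, 20, 30, 30, 30),
  period = c(1990, 1990, 1990, 2000, 2000),
  count  = c(2, 4, 1, 3, 5)
)

# Rows index y (age), columns index x (period).
# Cells with no observations are returned as NA.
mean_grid <- tapply(df$count, list(age = df$age, period = df$period), mean)

print(mean_grid)
```

Each non-NA cell of `mean_grid` corresponds to one colored tile in the heatmap.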
Creates a tile plot highlighting combinations of grouping variables that are expected but missing from the data. Allows for faceting.
plot_missing_data(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  facet_labeller = NULL,
  title = "Missing data",
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)
data |
Data frame. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, missing data is examined separately for each level of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
facet_labeller |
A |
title |
Character string for the plot title. Defaults to "Missing data". |
subtitle |
Character string for the plot subtitle. Defaults to NULL. |
x_lab |
Character string for the x-axis label. Defaults to the name of |
y_lab |
Character string for the y-axis label. Defaults to the name of |
A ggplot object, or NULL if no missing combinations found.
data("toy_data")

# Plot missing data across age and period, stratified by education, for each sex
plot_missing_data(data = toy_data, x = period, y = age, stratify_by = education, for_each = sex)
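One way to find "expected but missing" combinations is to build the full grid of observed x and y values and keep the cells that never occur in the data. The sketch below assumes that "expected" means the full cross of observed values; the package may define it differently. All object names are hypothetical.

```r
# Synthetic stand-in data: the (age = 30, period = 2000) combination is absent.
df <- data.frame(
  age    = c(20, 20, 30),
  period = c(1990, 2000, 1990)
)

# The "expected" grid: every combination of the observed age and period values.
expected <- expand.grid(
  age    = sort(unique(df$age)),
  period = sort(unique(df$period))
)

# Key each row so observed combinations can be matched against the grid.
observed_key <- paste(df$age, df$period)
expected_key <- paste(expected$age, expected$period)

# Keep only the grid cells that never appear in the data.
missing_combos <- expected[!(expected_key %in% observed_key), , drop = FALSE]

print(missing_combos)
```

In a tile plot, the rows of `missing_combos` would be the highlighted cells.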
A toy dataset generated to illustrate modeling of age, period, and cohort effects, including interactions with education and sex. This data simulates count outcomes (e.g., disease incidence or event counts) as a function of demographic variables using a Poisson process.
data(toy_data)
A data frame with 10000 rows and 7 variables:
age: Age of individuals, sampled uniformly from 20 to 59.
period: Calendar year of observation, sampled uniformly from 1990 to 2019.
education: Factor for education level, with levels 1, 2 and 3.
sex: Factor indicating biological sex, with levels: "male", "female".
count: Simulated event count, generated from a Poisson distribution.
The true Poisson rate used to generate count, computed from the log-linear model.
cohort: Derived variable indicating year of birth (period - age).
The underlying event rate is modeled on the log scale as a linear combination of age, period, sex, education, and an age-education interaction. The count outcome is drawn from a Poisson distribution with this rate. This dataset is handy for testing APC models.
The true log-rate is computed (for observation ) as:
where the rate decreases over time (periods), increases with age up to age 40, and decreases after. The coefficients used are:
intercept = 1.0
b_period = 0.02
b_sex = 0.5 (female effect)
b_education_base = 0.5
b_education_age_interaction = 0.015
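The general recipe (a log-linear rate followed by a Poisson draw) can be sketched as follows. This is not the code that generated toy_data. The listed coefficients are reused, but the exact functional form is an assumption: the quadratic age term and its coefficient `b_age`, the sign and centering of the period term, the numeric coding of education, the column name `rate`, and the sample size are all made up for illustration.

```r
set.seed(1)
n <- 1000  # ASSUMPTION: smaller than the 10000 rows of toy_data

age       <- sample(20:59, n, replace = TRUE)
period    <- sample(1990:2019, n, replace = TRUE)
sex       <- factor(sample(c("male", "female"), n, replace = TRUE),
                    levels = c("male", "female"))
education <- factor(sample(1:3, n, replace = TRUE))

# Coefficients listed in the documentation.
intercept <- 1.0
b_period <- 0.02
b_sex <- 0.5
b_education_base <- 0.5
b_education_age_interaction <- 0.015

# ASSUMPTION: a quadratic age term peaking at age 40 (form and value are hypothetical).
b_age <- 0.001

# ASSUMPTION: education enters as a 0/1/2 numeric score.
edu_num <- as.integer(education) - 1

# ASSUMPTION: the period term is negative and centered at 1990,
# and the interaction is centered at age 40.
log_rate <- intercept -
  b_period * (period - 1990) -
  b_age * (age - 40)^2 +
  b_sex * (sex == "female") +
  b_education_base * edu_num +
  b_education_age_interaction * edu_num * (age - 40)

rate  <- exp(log_rate)
count <- rpois(n, rate)

sim <- data.frame(age, period, education, sex, count, rate, cohort = period - age)
head(sim)
```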
Simulated data, created using base R and tibble.
Checks that all variable names used in inla.make.lincomb() expressions
(inside a list of lincombs) are present in the provided INLA model formula.
validate_lincombs_against_formula(lincombs, formula)
lincombs |
A list of linear combinations (as generated by |
formula |
The INLA model formula object (e.g., from |
Invisible TRUE if all terms match. Otherwise, stops with an informative error.
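The idea behind such a check can be sketched with base R's all.vars(). The sketch uses a simplified stand-in for the lincomb list (named weight vectors); real objects from INLA's inla.make.lincomb() are nested differently, and the helper `check_lincomb_vars()` is hypothetical, not the package's function.

```r
# Simplified stand-in: each entry maps variable names to weights.
# ASSUMPTION: real inla.make.lincomb() output has a different, nested structure.
lincombs <- list(
  lc1 = c(age_id = 1, period_id = -1),
  lc2 = c(cohort_id = 1)
)
formula <- y ~ age_id + period_id

# Hypothetical helper: stop if any lincomb variable is absent from the formula.
check_lincomb_vars <- function(lincombs, formula) {
  formula_vars <- all.vars(formula)
  used_vars <- unique(unlist(lapply(lincombs, names)))
  missing_vars <- setdiff(used_vars, formula_vars)
  if (length(missing_vars) > 0) {
    stop("Variables not found in formula: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# lc2 refers to cohort_id, which the formula does not contain, so the check fails.
result <- tryCatch(
  check_lincomb_vars(lincombs, formula),
  error = function(e) conditionMessage(e)
)
print(result)
```

Running the check before fitting surfaces a misspelled or missing term as a clear error message.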