| Title: | An R package to facilitate fitting of global and local Bayesian hierarchical models |
|---|---|
| Description: | The localhierarchy package implements a flexible set up to estimate parameters hierarchically and to carry out local model fits, whereby higher-level parameters are fixed. |
| Authors: | Leontine Alkema [aut, cre], Shauna Mooney [aut], Evan Ray [aut], Herbert Susmann [aut] |
| Maintainer: | Leontine Alkema <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.91 |
| Built: | 2026-06-02 05:32:56 UTC |
| Source: | https://github.com/AlkemaLab/localhierarchy |
This function checks if there are any NAs in a specified column of a data frame.
check_nas(data, column)check_nas(data, column)
data |
tibble or data frame |
column |
column to check for NAs |
stops if there are NAs in the column or if the column does not exist
This function fits the localhierarchy example model
fit_model_localhierarchy( survey_df, y = "y", area = "iso", area_select = NULL, runstep, global_fit = NULL, hierarchical_level = c("intercept", "subcluster", "iso"), add_subnational_hierarchy = "subnat", use_globalsubnat_fromnat = TRUE, mu_isvector = FALSE, chains = 4, iter_sampling = 300, iter_warmup = 150, compile_model = TRUE, force_recompile = FALSE, seed = 1234, refresh = 200, adapt_delta = 0.9, max_treedepth = 14 )fit_model_localhierarchy( survey_df, y = "y", area = "iso", area_select = NULL, runstep, global_fit = NULL, hierarchical_level = c("intercept", "subcluster", "iso"), add_subnational_hierarchy = "subnat", use_globalsubnat_fromnat = TRUE, mu_isvector = FALSE, chains = 4, iter_sampling = 300, iter_warmup = 150, compile_model = TRUE, force_recompile = FALSE, seed = 1234, refresh = 200, adapt_delta = 0.9, max_treedepth = 14 )
survey_df |
tibble with survey data |
y |
column name of outcome, defaults to |
area |
column name of the area of each observation (used as geounit; iso or subnational region) |
area_select |
area name to use for local run (eg iso code or subnat region name) |
runstep |
Type of run, defines which model fitting step to perform (see Details for options). |
global_fit |
optional global fit object, used to obtain fixed values to use for some parameters in the current fit (see Details). |
hierarchical_level |
vector specifying hierarchical structure used for mu |
add_subnational_hierarchy |
level that's added to the hierarchy for subnational, defaults to |
use_globalsubnat_fromnat |
Logical, whether in a local subnational run,
to use the global fit derived from national data
if TRUE and local subnat run, global_fit needs to contain object |
mu_isvector |
Logical, TRUE if mu is a vector, defaults to FALSE Settings for sampling |
chains |
number of chains to run |
iter_sampling |
number of posterior samples to draw |
iter_warmup |
number of warmup iterations |
compile_model |
boolean indicator of whether to compile the Stan model |
force_recompile |
boolean indicator of whether to force recompilation of the Stan model |
seed |
random seed |
refresh |
number of iterations between progress updates |
adapt_delta |
target acceptance rate for the No-U-Turn Sampler |
max_treedepth |
maximum tree depth for the No-U-Turn Sampler |
The fit_model_localhierarchy function fits the toy example for hierarchical models/seq fitting.
The argument runstep determines the type of run to perform. The
following run steps are supported:
"global_national": Fit the global model.
"local_national": Fit the model to data from a single country, using a global_national fit.
"global_subnational": Fit the model to global database with subnational data, using a global_national fit (NISE modeling).
"local_subnational": Fit the model to subnational data from a single country or region, using a global_subnational fit.
The options are also explained in the article with the package.
Details on hierarchical set ups used
The package allows the structure of the hierarchical prior to be configured by the user
through the hierarchical_level argument.
These arguments expect a character vector that specifies a nesting hierarchical structure.
Each element of the vector must be either "intercept" or a column name in the survey dataset, where
"intercept" will add a global intercept for the parameter.
The vector must be in descending order in terms of the hierarchy: that is, it starts with
"intercept" and proceeds down the hierarchy.
For example, suppose we are fitting country-level data, where the dataset has columns
"name_country", "name_sub_region", and "name_region" containing the name of the country,
sub-region, and region that each observation belongs to. To specify that parameter mu
should be fitted with a hierarchical model in which countries are nested within sub-regions within regions within world,
we would use the argument
hierarchical_level = c("intercept", "name_region", "name_sub_region", "name_country").
Optionally, model parameters can be fixed to values from a previous model fit
provided via the global_fit argument. In a typical use case, the
global_fit will have been fit to data from many geographic units
(e.g., all countries), while the current fit uses data from a smaller number
of locations.
When using a global_fit to fix parameter values, what exactly is fixed is determined by the runstep and global_fit combi.
List that contains samples, stan_data, other information relevant to model fit (arguments), and for global fits, point estimates of relevant parameters (post_summ). For subnational global fits, the list includes fit_globalsubnat_fromnat, which is the global fit with additional subnational sigmas added to the postsum object.
Get unique hierarchical levels from survey data and assign index "c" based on area.
get_geo_unit_index_data(data, hierarchical_levels, area)get_geo_unit_index_data(data, hierarchical_levels, area)
data |
survey data of interest that contains the hierarchical_column_names |
hierarchical_levels |
vector that contains one of more specs of hierarchical_level (incl intercept) |
area |
unit of analysis (eg country or subnational region) |
tibble with unique hierarchical levels (one column for each level) and index "c" assigned based on area
survey_dat <- tibble::tibble(subcluster = c("A", "A", "B", "B"), iso = c("iso1", "iso2", "iso3", "iso4")) get_geo_unit_index_data(survey_dat, hierarchical_levels = c("intercept", "subcluster", "iso"), area = "iso")survey_dat <- tibble::tibble(subcluster = c("A", "A", "B", "B"), iso = c("iso1", "iso2", "iso3", "iso4")) get_geo_unit_index_data(survey_dat, hierarchical_levels = c("intercept", "subcluster", "iso"), area = "iso")
Get posterior summaries for a set of parameters for use in local model
get_posterior_summaries_localhierarchy( fit, params = c("mu_raw", "mu_sigma", "nonse") )get_posterior_summaries_localhierarchy( fit, params = c("mu_raw", "mu_sigma", "nonse") )
fit |
fit object to summarize |
params |
vector of parnames to summarize (w/o index) |
tibble with variable (parname with index used in model) and postmean (posterior mean)
Create hierarchical data structure for a hierarchical model.
hierarchical_data(geo_unit_index_data, hierarchical_level)hierarchical_data(geo_unit_index_data, hierarchical_level)
geo_unit_index_data |
tibble with unique hierarchical levels (one column for each level) |
hierarchical_level |
vector with hierarchical spec |
list with model matrix and info on start and end indices associated with different levels
Create list with information including a model matrix, associated with the geo_unit_data and hierarchical_levels
hierarchical_model_matrix(geo_unit_index_data, hierarchical_level)hierarchical_model_matrix(geo_unit_index_data, hierarchical_level)
geo_unit_index_data |
tibble with unique hierarchical levels (one column for each level, one row per combi) |
hierarchical_level |
vector with hierarchical spec from highest to lowest levels |
A list with components modelmatrix, assign, and index:
modelmatrix: A matrix where each column refers to one eta (number of rows equals number of lowest level units).
assign: Integer vector of length equal to the number of etas; 0 for the intercept, then a hierarchical level index (starting at 2) for each eta level.
index: A tibble with n_eta rows, and columns:
i: Index of eta
column: Hierarchical level
level: Name of hierarchical level
All components are associated with the provided geo_unit_data and hierarchical_levels.
Set up data for Stan related to hierarchical parameters
hierarchical_param_stan_data( param_name, param_data, global_fit = NULL, hierarchical_terms_fixed, hierarchical_sigmas_fixed )hierarchical_param_stan_data( param_name, param_data, global_fit = NULL, hierarchical_terms_fixed, hierarchical_sigmas_fixed )
param_name |
The name of the parameter we are working with, e.g. "mu" |
param_data |
Data structures as constructed by hierarchical_data for that parameter |
global_fit |
an optional "global" fit that will be used to extract parameter estimates for any specified hierarchical units to fix. Defaults to NULL, no fixing. |
hierarchical_terms_fixed |
character vector specifying hierarchical
levels for which the terms should be fixed (subset of |
hierarchical_sigmas_fixed |
character vector specifying hierarchical
levels for which the terms should be fixed (subset of |
named list with Stan data relevant to the hierarchical set up for
this parameter, e.g. if the param_name is "mu", these will be
mu_n_terms, mu_n_sigma, mu_re_start, mu_re_end, mu_model_matrix, mu_n_terms_fixed, mu_n_terms_estimate, mu_raw_fixed, mu_n_sigma_fixed, mu_n_sigma_estimate, mu_sigma_fixed
This function plots the posterior densities of location (mu_raw) parameters, with priors added.
plot_muraw_localhierarchy( fit, parname, morethan1param = FALSE, nresultsperpage = 30 )plot_muraw_localhierarchy( fit, parname, morethan1param = FALSE, nresultsperpage = 30 )
fit |
List that includes "parname"_raw_estimate and stan_data |
parname |
Selected parameter name (example: mu) |
morethan1param |
Logical, does parname refer to more than 1 parameter (a vector) |
nresultsperpage |
Number of results per page in summary plots |
lists with list 'summary_plots' and list 'plots_allmuraw'. summary plots gives summary CIs, nresultsperpage at a time. plots_allmuraw gives all the individual plots of each mu_raw, with density per chain and prior added If morethan1param = TRUE, then each list contains a list per parameter k
Display outputs from posterior_summary_hierparam_localhierarchy
plot_posterior_summaries_localhierarchy( res, hierarchy_select = NULL, areas_select = NULL, res2 = NULL, modelname1 = "model 1", modelname2 = "model 2", k_select = NULL, dodge = position_dodge(width = 0.5) )plot_posterior_summaries_localhierarchy( res, hierarchy_select = NULL, areas_select = NULL, res2 = NULL, modelname1 = "model 1", modelname2 = "model 2", k_select = NULL, dodge = position_dodge(width = 0.5) )
res |
output from posterior_summary_hierparam_localhierarchy |
hierarchy_select |
optional, what hierarchical level to show? if NULL, all levels are shown |
areas_select |
optional: specific areas in a level to filter by, allowed only if one hierarchical level is selected |
res2 |
optional, output from posterior_summary_hierparam_localhierarchy for comparison |
modelname1 |
label for res |
modelname2 |
label for res2 |
k_select |
optional, if res contains k, which k values to show? if NULL, all k values are shown |
dodge |
used for offsetting plots, default is 0.5 |
ggplot object
This function plots the prior and posterior densities of sigma_estimate parameters.
plot_prior_post_sigmas_localhierarchy(fit, parname)plot_prior_post_sigmas_localhierarchy(fit, parname)
fit |
List, needs to include parname_sigma_estimate and stan_data |
parname |
Selected parameter name (example: "mu") |
Plot with density of sigma_estimate and prior added
Calculate posterior summaries for hierarchical parameters
posterior_summary_hierparam_localhierarchy( fit, parname, morethan1param = FALSE, hierarchical_levels = fit$hierarchical_level )posterior_summary_hierparam_localhierarchy( fit, parname, morethan1param = FALSE, hierarchical_levels = fit$hierarchical_level )
fit |
needs to include parname_star |
parname |
selected parameter name (example: "mu") |
morethan1param |
does paramname refer to more than 1 parameter (a vector) |
hierarchical_levels |
specifies the names of the hierarchical levels (defaults to fit$hierarchical_level) |
list with summaries of mu for each hierarchical level (units with each level) these mus are obtained by summing up all relevant etas for morethan1param, each level has a list where k refers to the index
Simulates a nested multilevel data structure. Lowest level can be used as observational error.
simulate_multilevel_data( n_levels = 3, n_units_perlevel = rep(25, n_levels), sigma_perlevel = rep(1, n_levels), mu_global = 0, add_data = TRUE, n_data = 10, sigma_y = 0.05, add_data1levelup = FALSE, seed = 12345 )simulate_multilevel_data( n_levels = 3, n_units_perlevel = rep(25, n_levels), sigma_perlevel = rep(1, n_levels), mu_global = 0, add_data = TRUE, n_data = 10, sigma_y = 0.05, add_data1levelup = FALSE, seed = 12345 )
n_levels |
Number of hierarchical levels (default: 3). |
n_units_perlevel |
A vector specifying the number of units at each level (default: rep(25, n_levels)). |
sigma_perlevel |
A vector specifying the standard deviation of etas at each level (default: rep(1, n_levels)). |
mu_global |
Global mean for the outcome variable (default: 0). |
add_data |
Logical indicating whether to add observations (default: TRUE). If TRUE, adds column y. |
n_data |
Number of observations to add at the lowest level (default: 10). |
sigma_y |
Standard deviation of the added observations (default: 0.05) |
add_data1levelup |
Logical indicating whether to add observations one level up e.g. at national level if lowest level is subnational (default: FALSE). |
seed |
Random seed for reproducibility (default: 1234). |
A tibble with columns for each level, mu_global, etas, and their total labeled y