Title: | Flexible Bayesian Model Selection and Model Averaging |
---|---|
Description: | Implements the Mode Jumping Markov Chain Monte Carlo algorithm described in <doi:10.1016/j.csda.2018.05.020> and its Genetically Modified counterpart described in <doi:10.1613/jair.1.13047> as well as the sub-sampling versions described in <doi:10.1016/j.ijar.2022.08.018> for flexible Bayesian model selection and model averaging. |
Authors: | Jon Lachmann [cre, aut], Aliaksandr Hubin [aut] |
Maintainer: | Jon Lachmann <[email protected]> |
License: | GPL-2 |
Version: | 1.1 |
Built: | 2025-02-26 21:42:47 UTC |
Source: | https://github.com/cran/FBMS |
Implements the Mode Jumping Markov Chain Monte Carlo algorithm described in <doi:10.1016/j.csda.2018.05.020> and its Genetically Modified counterpart described in <doi:10.1613/jair.1.13047> as well as the sub-sampling versions described in <doi:10.1016/j.ijar.2022.08.018> for flexible Bayesian model selection and model averaging.
Maintainer: Jon Lachmann [email protected]
Authors:
Jon Lachmann [email protected]
Aliaksandr Hubin [email protected]
Other contributors:
Florian Frommlet [email protected] [contributor]
Geir Storvik [email protected] [contributor]
Lachmann, J., Storvik, G., Frommlet, F., & Hubin, A. (2022). A subsampling approach for Bayesian model selection. International Journal of Approximate Reasoning, 151, 33-63. Elsevier.
Hubin, A., Storvik, G., & Frommlet, F. (2021). Flexible Bayesian Nonlinear Model Configuration. Journal of Artificial Intelligence Research, 72, 901-942.
Hubin, A., Frommlet, F., & Storvik, G. (2021). Reversible Genetically Modified MJMCMC. Under review in EYSM 2021.
Hubin, A., & Storvik, G. (2018). Mode jumping MCMC for Bayesian variable selection in GLMM. Computational Statistics & Data Analysis, 127, 281-297. Elsevier.
%% ~~ A concise (1-5 lines) description of the dataset. ~~
A data frame with 4177 observations on the following 9 variables.
Diameter Perpendicular to length, continuous
Height with with meat in shell, continuous.
Longest shell measurement, continuous
+1.5 gives the age in years, integer
Sex of the abalone, F
is female, M
male, and I
infant, categorical.
Grams after being dried, continuous.
Grams weight of meat, continuous.
Grams gut weight (after bleeding), continuous.
Grams whole abalone, continuous.
See the web page https://archive.ics.uci.edu/ml/datasets/Abalone for more information about the data set.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository https://archive.ics.uci.edu/ml/. Irvine, CA: University of California, School of Information and Computer Science.
data(abalone) ## maybe str(abalone) ; plot(abalone) ...
data(abalone) ## maybe str(abalone) ; plot(abalone) ...
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
data(breastcancer)
data(breastcancer)
A data frame with 569 rows and 32 variables
Separating plane described above was obtained using Multisurface Method-Tree (MSM-T) (K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992), a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: (K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34).
The variables are as follows:
ID number
Diagnosis (1 = malignant, 0 = benign)
Ten real-valued features are computed for each cell nucleus
Dataset downloaded from the UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Creators:
Dr. William H. Wolberg, General Surgery Dept. University of Wisconsin, Clinical Sciences Center Madison, WI 53792 wolberg 'at' eagle.surgery.wisc.edu
W. Nick Street, Computer Sciences Dept. University of Wisconsin, 1210 West Dayton St., Madison, WI 53706 street 'at' cs.wisc.edu 608-262-6619
Olvi L. Mangasarian, Computer Sciences Dept. University of Wisconsin, 1210 West Dayton St., Madison, WI 53706 olvi 'at' cs.wisc.edu
Donor: Nick Street
W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
Lichman, M. (2013). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.
This function computes model averaged effects for specified covariates using a fitted model object. The effects are expected change in the BMA linear predictor having an increase of the corresponding covariate by one unit, while other covariates are fixed to 0. Users can provide custom labels and specify quantiles for the computation of effects.
compute_effects(object, labels, quantiles = c(0.025, 0.5, 0.975))
compute_effects(object, labels, quantiles = c(0.025, 0.5, 0.975))
object |
A fitted model object, typically the result of a regression or predictive modeling. |
labels |
A vector of labels for which effects are to be computed. |
quantiles |
A numeric vector specifying the quantiles to be calculated. Default is c(0.025, 0.5, 0.975). |
A matrix of treatment effects for the specified labels, with rows corresponding to labels and columns to quantiles.
data <- data.frame(matrix(rnorm(600), 100)) result <- mjmcmc.parallel(runs = 2, cores = 1, data, gaussian.loglik) compute_effects(result,labels = names(data)[-1])
data <- data.frame(matrix(rnorm(600), 100)) result <- mjmcmc.parallel(runs = 2, cores = 1, data, gaussian.loglik) compute_effects(result,labels = names(data)[-1])
Cosine function for degrees
cos_deg(x)
cos_deg(x)
x |
The vector of values in degrees |
The cosine of x
cos_deg(0)
cos_deg(0)
Plot convergence of best/median/mean/other summary log posteriors in time
diagn_plot(res, FUN = median, conf = 0.95, burnin = 0, window = 5, ylim = NULL)
diagn_plot(res, FUN = median, conf = 0.95, burnin = 0, window = 5, ylim = NULL)
res |
Object corresponding gmjmcmc output |
FUN |
The summary statistics to check convergence |
conf |
which confidence intervals to plot |
burnin |
how many first populations to skip |
window |
sliding window for computing the standard deviation |
ylim |
limits for the plotting range, if unspecified, min and max of confidence intervals will be used |
A list of summary statistics for checking convergence with given confidence intervals
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) diagnstats <- diagn_plot(result)
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) diagnstats <- diagn_plot(result)
erf function
erf(x)
erf(x)
x |
The vector of values |
2 * pnorm(x * sqrt(2)) - 1
erf(2)
erf(2)
Data fields include planet and host star attributes.
data(exoplanet)
data(exoplanet)
A data frame with 223 rows and 11 variables
The variables are as follows:
TypeFlag: Flag indicating the type of data
PlanetaryMassJpt: Mass of the planetary object in Jupiter masses
RadiusJpt: Radius of the planetary object in Jupiter radii
PeriodDays: Orbital period of the planetary object in days
SemiMajorAxisAU: Semi-major axis of the planetary object's orbit in astronomical units
Eccentricity: Eccentricity of the planetary object's orbit
HostStarMassSlrMass: Mass of the host star in solar masses
HostStarRadiusSlrRad: Radius of the host star in solar radii
HostStarMetallicity: Metallicity of the host star
HostStarTempK: Effective temperature of the host star in Kelvin
PlanetaryDensJpt: Density of the planetary object up to a constant
Dataset downloaded from the Open Exoplanet Catalogue Repository. https://github.com/OpenExoplanetCatalogue/oec_tables/
Creators:
Prof. Hanno Rein, Department for Physical and Environmental Sciences. University of Toronto at Scarborough Toronto, Ontario M1C 1A4 hanno.rein 'at' utoronto.ca
Double exponential function
exp_dbl(x)
exp_dbl(x)
x |
The vector of values |
e^(-abs(x))
exp_dbl(2)
exp_dbl(2)
This function fits a model using the relevant MCMC sampling. The user can specify the formula, family, data, transforms, and other parameters to customize the model.
fbms( formula = NULL, family = "gaussian", data = NULL, impute = FALSE, loglik.pi = gaussian.loglik, method = "mjmcmc", verbose = TRUE, ... )
fbms( formula = NULL, family = "gaussian", data = NULL, impute = FALSE, loglik.pi = gaussian.loglik, method = "mjmcmc", verbose = TRUE, ... )
formula |
A formula object specifying the model structure. Default is NULL. |
family |
The distribution family of the response variable. Currently supports "gaussian", "binomial" and "custom". Default is "gaussian". |
data |
A data frame containing the variables in the model. If NULL, the variables are taken from the environment of the formula. Default is NULL. |
impute |
TRUE means imputation combined with adding a dummy column with indicators of imputed values, FALSE (default) means only full data is used. |
loglik.pi |
Custom function to compute the logarithm of the posterior mode based on logarithm of marginal likelihood and logarithm of prior functions (needs specification only used if family = "custom") |
method |
Which fitting algorithm should be used, currently implemented options include "gmjmcmc", "gmjmcmc.parallel", "mjmcmc" and "mjmcmc.parallel" with "mjmcmc" being the default and 'mjmcmc' means that only linear models will be estimated |
verbose |
If TRUE, print detailed progress information during the fitting process. Default is TRUE. |
... |
Additional parameters to be passed to the underlying method. |
An object containing the results of the fitted model and MCMC sampling.
mjmcmc
, gmjmcmc
, gmjmcmc.parallel
# Fit a Gaussian multivariate time series model fbms_result <- fbms( X1 ~ ., family = "gaussian", method = "gmjmcmc.parallel", data = data.frame(matrix(rnorm(600), 100)), transforms = c("sin","cos"), P = 10, runs = 1, cores = 1 ) summary(fbms_result) plot(fbms_result)
# Fit a Gaussian multivariate time series model fbms_result <- fbms( X1 ~ ., family = "gaussian", method = "gmjmcmc.parallel", data = data.frame(matrix(rnorm(600), 100)), transforms = c("sin","cos"), P = 10, runs = 1, cores = 1 ) summary(fbms_result) plot(fbms_result)
Gaussian function
gauss(x)
gauss(x)
x |
The vector of values |
e^(-x^2)
gauss(2)
gauss(2)
Log likelihood function for gaussian regression with a prior p(m)=r*sum(total_width).
gaussian.loglik(y, x, model, complex, params)
gaussian.loglik(y, x, model, complex, params)
y |
A vector containing the dependent variable |
x |
The matrix containing the precalculated features |
model |
The model to estimate as a logical vector |
complex |
A list of complexity measures for the features |
params |
A list of parameters for the log likelihood, supplied by the user |
A list with the log marginal likelihood combined with the log prior (crit) and the posterior mode of the coefficients (coefs).
gaussian.loglik(rnorm(100), matrix(rnorm(100)), TRUE, list(oc = 1), NULL)
gaussian.loglik(rnorm(100), matrix(rnorm(100)), TRUE, list(oc = 1), NULL)
Log likelihood function for gaussian regression for alpha calculation This function is just the bare likelihood function Note that it only gives a proportional value and is equivalent to least squares
gaussian.loglik.alpha(a, data, mu_func)
gaussian.loglik.alpha(a, data, mu_func)
a |
A vector of the alphas to be used |
data |
The data to be used for calculation |
mu_func |
The function linking the mean to the covariates, as a string with the alphas as a[i]. |
A numeric with the log likelihood.
## Not run: gaussian.loglik.alpha(my_alpha,my_data,my_mu) ## End(Not run)
## Not run: gaussian.loglik.alpha(my_alpha,my_data,my_mu) ## End(Not run)
GELU function
gelu(x)
gelu(x)
x |
The vector of values |
x*pnorm(x)
gelu(2)
gelu(2)
This function generates the full list of parameters required for the Generalized Mode Jumping Markov Chain Monte Carlo (GMJMCMC) algorithm, building upon the parameters from gen.params.mjmcmc
. The generated parameter list includes feature generation settings, population control parameters, and optimization controls for the search process.
gen.params.gmjmcmc(data)
gen.params.gmjmcmc(data)
data |
A data frame containing the dataset with covariates and response variable. |
A list of parameters for controlling GMJMCMC behavior:
feat
)feat$D
Maximum feature depth, default 5
. Limits the number of recursive feature transformations. For fractional polynomials, it is recommended to set D = 1
.
feat$L
Maximum number of features per model, default 15
. Increase for complex models.
feat$alpha
Strategy for generating $alpha$ parameters in non-linear projections:
"unit"
(Default) Sets all components to 1.
"deep"
Optimizes $alpha$ across all feature layers.
"random"
Samples $alpha$ from the prior for a fully Bayesian approach.
feat$pop.max
Maximum feature population size per iteration. Defaults to min(100, as.integer(1.5 * p))
, where p
is the number of covariates.
feat$keep.org
Logical flag; if TRUE
, original covariates remain in every population (default FALSE
).
feat$prel.filter
Threshold for pre-filtering covariates before the first population generation. Default 0
disables filtering.
feat$prel.select
Indices of covariates to include initially. Default NULL
includes all.
feat$keep.min
Minimum proportion of features to retain during population updates. Default 0.8
.
feat$eps
Threshold for feature inclusion probability during generation. Default 0.05
.
feat$check.col
Logical; if TRUE
(default), checks for collinearity during feature generation.
feat$max.proj.size
Maximum number of existing features used to construct a new one. Default 15
.
rescale.large
Logical flag for rescaling large data values for numerical stability. Default FALSE
.
burn_in
The burn-in period for the MJMCMC algorithm, which is set to 100 iterations by default.
mh
A list containing parameters for the regular Metropolis-Hastings (MH) kernel:
neigh.size
The size of the neighborhood for MH proposals with fixed proposal size, default set to 1.
neigh.min
The minimum neighborhood size for random proposal size, default set to 1.
neigh.max
The maximum neighborhood size for random proposal size, default set to 2.
large
A list containing parameters for the large jump kernel:
neigh.size
The size of the neighborhood for large jump proposals with fixed neighborhood size, default set to the smaller of 0.35 * p
and 35
, where is the number of covariates.
neigh.min
The minimum neighborhood size for large jumps with random size of the neighborhood, default set to the smaller of 0.25 * p
and 25
.
neigh.max
The maximum neighborhood size for large jumps with random size of the neighborhood, default set to the smaller of 0.45 * p
and 45
.
random
A list containing a parameter for the randomization kernel:
prob
The small probability of changing the component around the mode, default set to 0.01.
sa
A list containing parameters for the simulated annealing kernel:
probs
A numeric vector of length 6 specifying the probabilities for different types of proposals in the simulated annealing algorithm.
neigh.size
The size of the neighborhood for the simulated annealing proposals, default set to 1.
neigh.min
The minimum neighborhood size, default set to 1.
neigh.max
The maximum neighborhood size, default set to 2.
t.init
The initial temperature for simulated annealing, default set to 10.
t.min
The minimum temperature for simulated annealing, default set to 0.0001.
dt
The temperature decrement factor, default set to 3.
M
The number of iterations in the simulated annealing process, default set to 12.
greedy
A list containing parameters for the greedy algorithm:
probs
A numeric vector of length 6 specifying the probabilities for different types of proposals in the greedy algorithm.
neigh.size
The size of the neighborhood for greedy algorithm proposals, set to 1.
neigh.min
The minimum neighborhood size for greedy proposals, set to 1.
neigh.max
The maximum neighborhood size for greedy proposals, set to 2.
steps
The number of steps for the greedy algorithm, set to 20.
tries
The number of tries for the greedy algorithm, set to 3.
loglik
A list to store log-likelihood values, which is by default empty.
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100)) params <- gen.params.gmjmcmc(data) str(params)
data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100)) params <- gen.params.gmjmcmc(data) str(params)
Generate a parameter list for MJMCMC (Mode Jumping MCMC)
gen.params.mjmcmc(data)
gen.params.mjmcmc(data)
data |
The dataset that will be used in the algorithm |
A list of parameters to use when running the mjmcmc function.
The list contains the following elements:
burn_in
The burn-in period for the MJMCMC algorithm, which is set to 100 iterations by default.
mh
A list containing parameters for the regular Metropolis-Hastings (MH) kernel:
neigh.size
The size of the neighborhood for MH proposals with fixed proposal size, default set to 1.
neigh.min
The minimum neighborhood size for random proposal size, default set to 1.
neigh.max
The maximum neighborhood size for random proposal size, default set to 2.
large
A list containing parameters for the large jump kernel:
neigh.size
The size of the neighborhood for large jump proposals with fixed neighborhood size, default set to the smaller of 0.35 and 35, where
is the number of covariates.
neigh.min
The minimum neighborhood size for large jumps with random size of the neighborhood, default set to the smaller of 0.25 and 25.
neigh.max
The maximum neighborhood size for large jumps with random size of the neighborhood, default set to the smaller of 0.45 and 45.
random
A list containing a parameter for the randomization kernel:
prob
The small probability of changing the component around the mode, default set to 0.01.
sa
A list containing parameters for the simulated annealing kernel:
probs
A numeric vector of length 6 specifying the probabilities for different types of proposals in the simulated annealing algorithm.
neigh.size
The size of the neighborhood for the simulated annealing proposals, default set to 1.
neigh.min
The minimum neighborhood size, default set to 1.
neigh.max
The maximum neighborhood size, default set to 2.
t.init
The initial temperature for simulated annealing, default set to 10.
t.min
The minimum temperature for simulated annealing, default set to 0.0001.
dt
The temperature decrement factor, default set to 3.
M
The number of iterations in the simulated annealing process, default set to 12.
greedy
A list containing parameters for the greedy algorithm:
probs
A numeric vector of length 6 specifying the probabilities for different types of proposals in the greedy algorithm.
neigh.size
The size of the neighborhood for greedy algorithm proposals, set to 1.
neigh.min
The minimum neighborhood size for greedy proposals, set to 1.
neigh.max
The maximum neighborhood size for greedy proposals, set to 2.
steps
The number of steps for the greedy algorithm, set to 20.
tries
The number of tries for the greedy algorithm, set to 3.
loglik
A list to store log-likelihood values, which is by default empty.
Note that the $loglik
item is an empty list, which is passed to the log likelihood function of the model,
intended to store parameters that the estimator function should use.
gen.params.mjmcmc(matrix(rnorm(600), 100))
gen.params.mjmcmc(matrix(rnorm(600), 100))
Generate a probability list for GMJMCMC (Genetically Modified MJMCMC)
gen.probs.gmjmcmc(transforms)
gen.probs.gmjmcmc(transforms)
transforms |
A list of the transformations used (to get the count). |
A named list with eight elements:
large
The probability of a large jump kernel in the MJMCMC algorithm. With this probability, a large jump proposal will be made; otherwise, a local Metropolis-Hastings proposal will be used. One needs to consider good mixing around and between modes when specifying this parameter.
large.kern
A numeric vector of length 4 specifying the probabilities for different types of large jump kernels. The four components correspond to:
Random change with random neighborhood size
Random change with fixed neighborhood size
Swap with random neighborhood size
Swap with fixed neighborhood size
These probabilities will be automatically normalized if they do not sum to 1.
localopt.kern
A numeric vector of length 2 specifying the probabilities for different local optimization methods during large jumps. The first value represents the probability of using simulated annealing, while the second corresponds to the greedy optimizer. These probabilities will be normalized if needed.
random.kern
A numeric vector of length 2 specifying the probabilities
of first two randomization kernels applied after local optimization. These correspond
to the same kernel types as in large.kern
but are used for local proposals
where type and 2 only are allowed.
mh
A numeric vector specifying the probabilities of different standard Metropolis-Hastings kernels, where the first four as the same as for other kernels, while fifths and sixes components are uniform addition/deletion of a covariate.
filter
A numeric value controlling the filtering of features
with low posterior probabilities in the current population. Features with
posterior probabilities below this threshold will be removed with a probability
proportional to .
gen
A numeric vector of length 4 specifying the probabilities of different feature generation operators. These determine how new nonlinear features are introduced. The first entry gives the probability for an interaction, followed by modification, nonlinear projection, and a mutation operator, which reintroduces discarded features. If these probabilities do not sum to 1, they are automatically normalized.
trans
A numeric vector of length equal to the number of elements in transforms
,
specifying the probabilities of selecting each nonlinear transformation from .
By default, a uniform distribution is assigned, but this can be modified by providing a specific
transforms
argument.
gen.probs.gmjmcmc(c("p0", "exp_dbl"))
gen.probs.gmjmcmc(c("p0", "exp_dbl"))
Generate a probability list for MJMCMC (Mode Jumping MCMC)
gen.probs.mjmcmc()
gen.probs.mjmcmc()
A named list with five elements:
large
A numeric value representing the probability of making a large jump. If a large jump is not made, a local MH (Metropolis-Hastings) proposal is used instead.
large.kern
A numeric vector of length 4 specifying the probabilities for different types of large jump kernels. The four components correspond to:
Random change with random neighborhood size
Random change with fixed neighborhood size
Swap with random neighborhood size
Swap with fixed neighborhood size
These probabilities will be automatically normalized if they do not sum to 1.
localopt.kern
A numeric vector of length 2 specifying the probabilities for different local optimization methods during large jumps. The first value represents the probability of using simulated annealing, while the second corresponds to the greedy optimizer. These probabilities will be normalized if needed.
random.kern
A numeric vector of length 2 specifying the probabilities of different randomization kernels applied after local optimization of type one or two. These correspond to the first two kernel types as in large.kern
but are used for local proposals with different neighborhood sizes.
mh
A numeric vector specifying the probabilities of different standard Metropolis-Hastings kernels, where the first four as the same as for other kernels, while fifths and sixes components are uniform addition/deletion of a covariate.
gen.probs.mjmcmc()
gen.probs.mjmcmc()
This function retrieves the best model from the results of MJMCMC, MJMCMC parallel, GMJMCMC, or GMJMCMC merged runs
based on the maximum criterion value (crit
). The returned list includes the model probability, selected features,
criterion value, intercept parameter, and named coefficients.
get.best.model(result, labels = FALSE)
get.best.model(result, labels = FALSE)
result |
An object of class |
labels |
Logical; if |
The function identifies the best model by selecting the one with the highest crit
value. Selection logic depends on the class of the result
object:
"mjmcmc"
Selects the top model from a single MJMCMC run.
"mjmcmc_parallel"
Identifies the best chain, then selects the best model from that chain.
"gmjmcmc"
Selects the best population and model within that population.
"gmjmcmc_merged"
Finds the best chain and population before extracting the top model.
A list containing the details of the best model:
prob
A numeric value representing the model's probability.
model
A logical vector indicating which features are included in the best model.
crit
The criterion value used for model selection (e.g., marginal likelihood or posterior probability).
alpha
The intercept parameter of the best model.
coefs
A named numeric vector of model coefficients, including the intercept and selected features.
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) get.best.model(result)
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) get.best.model(result)
This function extracts the Median Probability Model (MPM) from a fitted model object. The MPM includes features with marginal posterior inclusion probabilities greater than 0.5. It constructs the corresponding model matrix and computes the model fit using the specified likelihood.
get.mpm.model( result, y, x, labels = F, family = "gaussian", loglik.pi = gaussian.loglik, params = NULL )
get.mpm.model( result, y, x, labels = F, family = "gaussian", loglik.pi = gaussian.loglik, params = NULL )
result |
A fitted model object (e.g., from |
y |
A numeric vector of response values. For |
x |
A |
labels |
If specified, custom labels of covariates can be used. Default is |
family |
Character string specifying the model family. Supported options are:
If an unsupported family is provided, a warning is issued and the Gaussian likelihood is used by default. |
loglik.pi |
A function that computes the log-likelihood. Defaults to |
params |
Parameters of |
A bgnlm_model
object containing:
prob
The log marginal likelihood of the MPM.
model
A logical vector indicating included features.
crit
Criterion label set to "MPM"
.
coefs
A named numeric vector of model coefficients, including the intercept.
## Not run: # Simulate data set.seed(42) x <- data.frame( PlanetaryMassJpt = rnorm(100), RadiusJpt = rnorm(100), PeriodDays = rnorm(100) ) y <- 1 + 0.5 * x$PlanetaryMassJpt - 0.3 * x$RadiusJpt + rnorm(100) # Assume 'result' is a fitted object from gmjmcmc or mjmcmc result <- mjmcmc(cbind(y,x)) # Get the MPM mpm_model <- get.mpm.model(result, y, x, family = "gaussian") # Access coefficients mpm_model$coefs ## End(Not run)
## Not run: # Simulate data set.seed(42) x <- data.frame( PlanetaryMassJpt = rnorm(100), RadiusJpt = rnorm(100), PeriodDays = rnorm(100) ) y <- 1 + 0.5 * x$PlanetaryMassJpt - 0.3 * x$RadiusJpt + rnorm(100) # Assume 'result' is a fitted object from gmjmcmc or mjmcmc result <- mjmcmc(cbind(y,x)) # Get the MPM mpm_model <- get.mpm.model(result, y, x, family = "gaussian") # Access coefficients mpm_model$coefs ## End(Not run)
Main algorithm for GMJMCMC (Genetically Modified MJMCMC)
gmjmcmc( data, loglik.pi = gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, transforms, P = 10, N.init = 100, N.final = 100, probs = NULL, params = NULL, sub = FALSE, verbose = TRUE )
gmjmcmc( data, loglik.pi = gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, transforms, P = 10, N.init = 100, N.final = 100, probs = NULL, params = NULL, sub = FALSE, verbose = TRUE )
data |
A matrix containing the data to use in the algorithm, first column should be the dependent variable, and the rest of the columns should be the independent variables. |
loglik.pi |
The (log) density to explore |
loglik.alpha |
The likelihood function to use for alpha calculation |
transforms |
A Character vector including the names of the non-linear functions to be used by the modification and the projection operator. |
P |
The number of generations for GMJMCMC (Genetically Modified MJMCMC). The default value is $P = 10$. A larger value like $P = 50$ might be more realistic for more complicated examples where one expects a lot of non-linear structures. |
N.init |
The number of iterations per population (total iterations = (T-1)*N.init+N.final) |
N.final |
The number of iterations for the final population (total iterations = (T-1)*N.init+N.final) |
probs |
A list of the various probability vectors to use |
params |
A list of the various parameters for all the parts of the algorithm |
sub |
An indicator that if the likelihood is inexact and should be improved each model visit (EXPERIMENTAL!) |
verbose |
A logical denoting if messages should be printed |
A list containing the following elements:
models |
All models per population. |
lo.models |
All local optimization models per population. |
populations |
All features per population. |
marg.probs |
Marginal feature probabilities per population. |
model.probs |
Marginal feature probabilities per population. |
model.probs.idx |
Marginal feature probabilities per population. |
best.margs |
Best marginal model probability per population. |
accept |
Acceptance rate per population. |
accept.tot |
Overall acceptance rate. |
best |
Best marginal model probability throughout the run, represented as the maximum value in |
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result) plot(result)
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result) plot(result)
Run multiple gmjmcmc (Genetically Modified MJMCMC) runs in parallel returning a list of all results.
gmjmcmc.parallel( runs = 2, cores = getOption("mc.cores", 2L), merge.options = list(populations = "best", complex.measure = 2, tol = 1e-07), data, loglik.pi = gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, transforms, ... )
gmjmcmc.parallel( runs = 2, cores = getOption("mc.cores", 2L), merge.options = list(populations = "best", complex.measure = 2, tol = 1e-07), data, loglik.pi = gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, transforms, ... )
runs |
The number of runs to run |
cores |
The number of cores to run on |
merge.options |
A list of options to pass to the |
data |
A matrix containing the data to use in the algorithm, first column should be the dependent variable, and the rest of the columns should be the independent variables. |
loglik.pi |
The (log) density to explore |
loglik.alpha |
The likelihood function to use for alpha calculation |
transforms |
A Character vector including the names of the non-linear functions to be used by the modification and the projection operator. |
... |
Further parameters passed to mjmcmc. |
Results from multiple gmjmcmc runs
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result) plot(result)
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result) plot(result)
heavy side function
hs(x)
hs(x)
x |
The vector of values |
as.integer(x>0)
hs(2)
hs(2)
Log likelihood function for linear regression using Zellners g-prior
linear.g.prior.loglik(y, x, model, complex, params = list(g = 4))
linear.g.prior.loglik(y, x, model, complex, params = list(g = 4))
y |
A vector containing the dependent variable |
x |
The matrix containing the precalculated features |
model |
The model to estimate as a logical vector |
complex |
A list of complexity measures for the features |
params |
A list of parameters for the log likelihood, supplied by the user |
A list with the log marginal likelihood combined with the log prior (crit) and the posterior mode of the coefficients (coefs).
linear.g.prior.loglik(rnorm(100), matrix(rnorm(100)), TRUE, list(oc=1))
linear.g.prior.loglik(rnorm(100), matrix(rnorm(100)), TRUE, list(oc=1))
Log model prior function
log_prior(params, complex)
log_prior(params, complex)
params |
list of passed parameters of the likelihood in GMJMCMC |
complex |
list of complexity measures of the features included into the model |
A numeric with the log model prior.
log_prior(params = list(r=2), complex = list(oc = 2))
log_prior(params = list(r=2), complex = list(oc = 2))
Log likelihood function for logistic regression with a prior p(m)=sum(total_width) This function is created as an example of how to create an estimator that is used to calculate the marginal likelihood of a model.
logistic.loglik(y, x, model, complex, params = list(r = exp(-0.5)))
logistic.loglik(y, x, model, complex, params = list(r = exp(-0.5)))
y |
A vector containing the dependent variable |
x |
The matrix containing the precalculated features |
model |
The model to estimate as a logical vector |
complex |
A list of complexity measures for the features |
params |
A list of parameters for the log likelihood, supplied by the user |
A list with the log marginal likelihood combined with the log prior (crit) and the posterior mode of the coefficients (coefs).
logistic.loglik(as.integer(rnorm(100) > 0), matrix(rnorm(100)), TRUE, list(oc = 1))
logistic.loglik(as.integer(rnorm(100) > 0), matrix(rnorm(100)), TRUE, list(oc = 1))
Log likelihood function for logistic regression with an approximate Laplace approximations used This function is created as an example of how to create an estimator that is used to calculate the marginal likelihood of a model.
logistic.loglik.ala(y, x, model, complex, params = list(r = exp(-0.5)))
logistic.loglik.ala(y, x, model, complex, params = list(r = exp(-0.5)))
y |
A vector containing the dependent variable |
x |
The matrix containing the precalculated features |
model |
The model to estimate as a logical vector |
complex |
A list of complexity measures for the features |
params |
A list of parameters for the log likelihood, supplied by the user |
A list with the log marginal likelihood combined with the log prior (crit) and the posterior mode of the coefficients (coefs).
logistic.loglik.ala(as.integer(rnorm(100) > 0), matrix(rnorm(100)), TRUE, list(oc = 1))
logistic.loglik.ala(as.integer(rnorm(100) > 0), matrix(rnorm(100)), TRUE, list(oc = 1))
Log likelihood function for logistic regression for alpha calculation This function is just the bare likelihood function
logistic.loglik.alpha(a, data, mu_func)
logistic.loglik.alpha(a, data, mu_func)
a |
A vector of the alphas to be used |
data |
The data to be used for calculation |
mu_func |
The function linking the mean to the covariates, as a string with the alphas as a[i]. |
A numeric with the log likelihood.
Function for calculating marginal inclusion probabilities of features given a list of models
marginal.probs(models)
marginal.probs(models)
models |
The list of models to use. |
A numeric vector of marginal model probabilities based on relative frequencies of model visits in MCMC.
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) marginal.probs(result$models[[1]])
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) marginal.probs(result$models[[1]])
Merge a list of multiple results from many runs This function will weight the features based on the best marginal posterior in that population and merge the results together, simplifying by merging equivalent features (having high correlation).
merge_results( results, populations = NULL, complex.measure = NULL, tol = NULL, data = NULL )
merge_results( results, populations = NULL, complex.measure = NULL, tol = NULL, data = NULL )
results |
A list containing multiple results from GMJMCMC (Genetically Modified MJMCMC). |
populations |
Which populations should be merged from the results, can be "all", "last" (default) or "best". |
complex.measure |
The complex measure to use when finding the simplest equivalent feature, 1=total width, 2=operation count and 3=depth. |
tol |
The tolerance to use for the correlation when finding equivalent features, default is 0.0000001 |
data |
Data to use when comparing features, default is NULL meaning that mock data will be generated, if data is supplied it should be of the same form as is required by gmjmcmc, i.e. with both x, y and an intercept. |
An object of class "gmjmcmc_merged" containing the following elements:
features |
The features where equivalent features are represented in their simplest form. |
marg.probs |
Importance of features. |
counts |
Counts of how many versions that were present of each feature. |
results |
Results as they were passed to the function. |
pop.best |
The population in the results which contained the model with the highest log marginal posterior. |
thread.best |
The thread in the results which contained the model with the highest log marginal posterior. |
crit.best |
The highest log marginal posterior for any model in the results. |
reported |
The highest log marginal likelihood for the reported populations as defined in the populations argument. |
rep.pop |
The index of the population which contains reported. |
best.log.posteriors |
A matrix where the first column contains the population indices and the second column contains the model with the highest log marginal posterior within that population. |
rep.thread |
The index of the thread which contains reported. |
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result) plot(result) merge_results(result$results)
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result) plot(result) merge_results(result$results)
Main algorithm for MJMCMC (Genetically Modified MJMCMC)
mjmcmc( data, loglik.pi = gaussian.loglik, N = 100, probs = NULL, params = NULL, sub = FALSE, verbose = TRUE )
mjmcmc( data, loglik.pi = gaussian.loglik, N = 100, probs = NULL, params = NULL, sub = FALSE, verbose = TRUE )
data |
A matrix containing the data to use in the algorithm, first column should be the dependent variable, and the rest of the columns should be the independent variables. |
loglik.pi |
The (log) density to explore |
N |
The number of iterations to run for |
probs |
A list of the various probability vectors to use |
params |
A list of the various parameters for all the parts of the algorithm |
sub |
An indicator that if the likelihood is inexact and should be improved each model visit (EXPERIMENTAL!) |
verbose |
A logical denoting if messages should be printed |
A list containing the following elements:
models |
All visited models. |
accept |
Average acceptance rate of the chain. |
lo.models |
All models visited during local optimization. |
best.crit |
The highest log marginal probability of the visited models. |
marg.probs |
Marginal probabilities of the features. |
model.probs |
Marginal probabilities of all of the visited models. |
model.probs.idx |
Indices of unique visited models. |
populations |
The covariates represented as a list of features. |
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) summary(result) plot(result)
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) summary(result) plot(result)
Run multiple mjmcmc runs in parallel, merging the results before returning.
mjmcmc.parallel(runs = 2, cores = getOption("mc.cores", 2L), ...)
mjmcmc.parallel(runs = 2, cores = getOption("mc.cores", 2L), ...)
runs |
The number of runs to run |
cores |
The number of cores to run on |
... |
Further parameters passed to mjmcmc. |
Merged results from multiple mjmcmc runs
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) summary(result) plot(result)
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) summary(result) plot(result)
Function to generate a function string for a model consisting of features
model.string(model, features, link = "I", round = 2)
model.string(model, features, link = "I", round = 2)
model |
A logical vector indicating which features to include |
features |
The population of features |
link |
The link function to use, as a string |
round |
Rounding error for the features in the printed format |
A character representation of a model
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result) plot(result) model.string(c(TRUE, FALSE, TRUE, FALSE, TRUE), result$populations[[1]]) model.string(result$models[[1]][1][[1]]$model, result$populations[[1]])
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result) plot(result) model.string(c(TRUE, FALSE, TRUE, FALSE, TRUE), result$populations[[1]]) model.string(result$models[[1]][1][[1]]$model, result$populations[[1]])
Negative GELU function
ngelu(x)
ngelu(x)
x |
The vector of values |
-x*pnorm(-x)
ngelu(2)
ngelu(2)
negative heavy side function
nhs(x)
nhs(x)
x |
The vector of values |
as.integer(x<0)
nhs(2)
nhs(2)
not x
not(x)
not(x)
x |
The vector of binary values |
1-x
not(TRUE)
not(TRUE)
negative ReLu function
nrelu(x)
nrelu(x)
x |
The vector of values |
max(-x,0)
nrelu(2)
nrelu(2)
p0 polynomial term
p0(x)
p0(x)
x |
The vector of values |
log(abs(x) + .Machine$double.eps)
p0(2)
p0(2)
p05 polynomial term
p05(x)
p05(x)
x |
The vector of values |
(abs(x)+.Machine$double.eps)^(0.5)
p05(2)
p05(2)
p0p0 polynomial term
p0p0(x)
p0p0(x)
x |
The vector of values |
p0(x)*p0(x)
p0p0(2)
p0p0(2)
p0p05 polynomial term
p0p05(x)
p0p05(x)
x |
The vector of values |
p0(x)*(abs(x)+.Machine$double.eps)^(0.5)
p0p05(2)
p0p05(2)
p0p1 polynomial term
p0p1(x)
p0p1(x)
x |
The vector of values |
p0(x)*x
p0p1(2)
p0p1(2)
p0p2 polynomial term
p0p2(x)
p0p2(x)
x |
The vector of values |
p0(x)*x^(2)
p0p2(2)
p0p2(2)
p0p3 polynomial term
p0p3(x)
p0p3(x)
x |
The vector of values |
p0(x)*x^(3)
p0p3(2)
p0p3(2)
p0pm05 polynomial term
p0pm05(x)
p0pm05(x)
x |
The vector of values |
p0(x)sign(x)(abs(x)+.Machine$double.eps)^(-0.5)
p0pm05(2)
p0pm05(2)
p0pm1 polynomial terms
p0pm1(x)
p0pm1(x)
x |
The vector of values |
p0(x)*(x+.Machine$double.eps)^(-1)
p0pm1(2)
p0pm1(2)
p0pm2 polynomial term
p0pm2(x)
p0pm2(x)
x |
The vector of values |
p0(x)sign(x)(abs(x)+.Machine$double.eps)^(-2)
p0pm2(2)
p0pm2(2)
p2 polynomial term
p2(x)
p2(x)
x |
The vector of values |
x^(2)
p2(2)
p2(2)
p3 polynomial term
p3(x)
p3(x)
x |
The vector of values |
x^(3)
p3(2)
p3(2)
Function to plot the results, works both for results from gmjmcmc and merged results from merge.results
## S3 method for class 'gmjmcmc' plot(x, count = "all", pop = "best", tol = 1e-07, data = NULL, ...)
## S3 method for class 'gmjmcmc' plot(x, count = "all", pop = "best", tol = 1e-07, data = NULL, ...)
x |
The results to use |
count |
The number of features to plot, defaults to all |
pop |
The population to plot, defaults to last |
tol |
The tolerance to use for the correlation when finding equivalent features, default is 0.0000001 |
data |
Data to merge on, important if pre-filtering was used |
... |
Not used. |
No return value, just creates a plot
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) plot(result)
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) plot(result)
Plot a gmjmcmc_merged run
## S3 method for class 'gmjmcmc_merged' plot(x, count = "all", pop = NULL, tol = 1e-07, data = NULL, ...)
## S3 method for class 'gmjmcmc_merged' plot(x, count = "all", pop = NULL, tol = 1e-07, data = NULL, ...)
x |
The results to use |
count |
The number of features to plot, defaults to all |
pop |
The population to plot, defaults to last |
tol |
The tolerance to use for the correlation when finding equivalent features, default is 0.0000001 |
data |
Data to merge on, important if pre-filtering was used |
... |
Not used. |
No return value, just creates a plot
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) plot(result)
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) plot(result)
Function to plot the results, works both for results from gmjmcmc and merged results from merge.results
## S3 method for class 'mjmcmc' plot(x, count = "all", ...)
## S3 method for class 'mjmcmc' plot(x, count = "all", ...)
x |
The results to use |
count |
The number of features to plot, defaults to all |
... |
Not used. |
No return value, just creates a plot
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) plot(result)
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) plot(result)
Plot a mjmcmc_parallel run
## S3 method for class 'mjmcmc_parallel' plot(x, count = "all", ...)
## S3 method for class 'mjmcmc_parallel' plot(x, count = "all", ...)
x |
The results to use |
count |
The number of features to plot, defaults to all |
... |
Not used. |
No return value, just creates a plot
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) plot(result)
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) plot(result)
pm05 polynomial term
pm05(x)
pm05(x)
x |
The vector of values |
(abs(x)+.Machine$double.eps)^(-0.5)
pm05(2)
pm05(2)
pm1 polynomial term
pm1(x)
pm1(x)
x |
The vector of values |
sign(x)*(abs(x)+.Machine$double.eps)^(-1)
pm1(2)
pm1(2)
pm2 polynomial term
pm2(x)
pm2(x)
x |
The vector of values |
sign(x)*(abs(x)+.Machine$double.eps)^(-2)
pm2(2)
pm2(2)
This function generates predictions from a fitted bgnlm_model
object given a new dataset.
## S3 method for class 'bgnlm_model' predict( object, x, link = function(x) { x }, ... )
## S3 method for class 'bgnlm_model' predict( object, x, link = function(x) { x }, ... )
object |
A fitted |
x |
A |
link |
A link function to apply to the linear predictor.
By default, it is the identity function |
... |
Additional arguments to pass to prediction function. |
A numeric vector of predicted values for the given data x
.
These predictions are calculated as ,
where
is the design matrix and
are the model coefficients.
## Not run: # Example with simulated data set.seed(123) x_train <- data.frame(PlanetaryMassJpt = rnorm(100), RadiusJpt = rnorm(100)) model <- list( coefs = c(Intercept = -0.5, PlanetaryMassJpt = 0.2, RadiusJpt = -0.1), class = "bgnlm_model" ) class(model) <- "bgnlm_model" # New data for prediction x_new <- data.frame(PlanetaryMassJpt = c(0.1, -0.3), RadiusJpt = c(0.2, -0.1)) # Predict using the identity link (default) preds <- predict.bgnlm_model(model, x_new) ## End(Not run)
## Not run: # Example with simulated data set.seed(123) x_train <- data.frame(PlanetaryMassJpt = rnorm(100), RadiusJpt = rnorm(100)) model <- list( coefs = c(Intercept = -0.5, PlanetaryMassJpt = 0.2, RadiusJpt = -0.1), class = "bgnlm_model" ) class(model) <- "bgnlm_model" # New data for prediction x_new <- data.frame(PlanetaryMassJpt = c(0.1, -0.3), RadiusJpt = c(0.2, -0.1)) # Predict using the identity link (default) preds <- predict.bgnlm_model(model, x_new) ## End(Not run)
Predict using a gmjmcmc result object.
## S3 method for class 'gmjmcmc' predict( object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), pop = NULL, tol = 1e-07, ... )
## S3 method for class 'gmjmcmc' predict( object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), pop = NULL, tol = 1e-07, ... )
object |
The model to use. |
x |
The new data to use for the prediction, a matrix where each row is an observation. |
link |
The link function to use |
quantiles |
The quantiles to calculate credible intervals for the posterior modes (in model space). |
pop |
The population to plot, defaults to last |
tol |
The tolerance to use for the correlation when finding equivalent features, default is 0.0000001 |
... |
Not used. |
A list containing aggregated predictions and per model predictions.
aggr |
Aggregated predictions with mean and quantiles. |
preds |
A list of lists containing individual predictions per model per population in object. |
result <- gmjmcmc( matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result, matrix(rnorm(600), 100))
result <- gmjmcmc( matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result, matrix(rnorm(600), 100))
Predict using a merged gmjmcmc result object.
## S3 method for class 'gmjmcmc_merged' predict( object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), pop = NULL, tol = 1e-07, ... )
## S3 method for class 'gmjmcmc_merged' predict( object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), pop = NULL, tol = 1e-07, ... )
object |
The model to use. |
x |
The new data to use for the prediction, a matrix where each row is an observation. |
link |
The link function to use |
quantiles |
The quantiles to calculate credible intervals for the posterior modes (in model space). |
pop |
The population to plot, defaults to last |
tol |
The tolerance to use for the correlation when finding equivalent features, default is 0.0000001 |
... |
Not used. |
A list containing aggregated predictions and per model predictions.
aggr |
Aggregated predictions with mean and quantiles. |
preds |
A list of lists containing individual predictions per model per population in object. |
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result, matrix(rnorm(600), 100))
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result, matrix(rnorm(600), 100))
Predict using a gmjmcmc result object from a parallel run.
## S3 method for class 'gmjmcmc_parallel' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
## S3 method for class 'gmjmcmc_parallel' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
object |
The model to use. |
x |
The new data to use for the prediction, a matrix where each row is an observation. |
link |
The link function to use |
quantiles |
The quantiles to calculate credible intervals for the posterior modes (in model space). |
... |
Additional arguments to pass to merge_results. |
A list containing aggregated predictions and per model predictions.
aggr |
Aggregated predictions with mean and quantiles. |
preds |
A list of lists containing individual predictions per model per population in object. |
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result$results, matrix(rnorm(600), 100))
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) preds <- predict(result$results, matrix(rnorm(600), 100))
Predict using a mjmcmc result object.
## S3 method for class 'mjmcmc' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
## S3 method for class 'mjmcmc' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
object |
The model to use. |
x |
The new data to use for the prediction, a matrix where each row is an observation. |
link |
The link function to use |
quantiles |
The quantiles to calculate credible intervals for the posterior modes (in model space). |
... |
Not used. |
A list containing aggregated predictions.
mean |
Mean of aggregated predictions. |
quantiles |
Quantiles of aggregated predictions. |
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) preds <- predict(result, matrix(rnorm(500), 100))
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) preds <- predict(result, matrix(rnorm(500), 100))
Predict using a mjmcmc result object from a parallel run.
## S3 method for class 'mjmcmc_parallel' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
## S3 method for class 'mjmcmc_parallel' predict(object, x, link = function(x) x, quantiles = c(0.025, 0.5, 0.975), ...)
object |
The model to use. |
x |
The new data to use for the prediction, a matrix where each row is an observation. |
link |
The link function to use |
quantiles |
The quantiles to calculate credible intervals for the posterior modes (in model space). |
... |
Not used. |
A list containing aggregated predictions.
mean |
Mean of aggregated predictions. |
quantiles |
Quantiles of aggregated predictions. |
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) preds <- predict(result, matrix(rnorm(500), 100))
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) preds <- predict(result, matrix(rnorm(500), 100))
Print method for "feature" class
## S3 method for class 'feature' print(x, dataset = FALSE, alphas = FALSE, labels = FALSE, round = FALSE, ...)
## S3 method for class 'feature' print(x, dataset = FALSE, alphas = FALSE, labels = FALSE, round = FALSE, ...)
x |
An object of class "feature" |
dataset |
Set the regular covariates as columns in a dataset |
alphas |
Print a "?" instead of actual alphas to prepare the output for alpha estimation |
labels |
Should the covariates be named, or just referred to as their place in the data.frame. |
round |
Should numbers be rounded when printing? Default is FALSE, otherwise it can be set to the number of decimal places. |
... |
Not used. |
String representation of a feature
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) print(result$populations[[1]][1])
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) print(result$populations[[1]][1])
ReLu function
relu(x)
relu(x)
x |
The vector of values |
max(x,0)
relu(2)
relu(2)
This function applies a function in parallel to a list or vector (X
) using multiple cores.
On Linux/macOS, it uses mclapply
, while on Windows it uses a hackish version of parallelism.
The Windows version is based on parLapply
to mimic forking following Nathan VanHoudnos.
rmclapply(runs, args, fun, mc.cores = NULL)
rmclapply(runs, args, fun, mc.cores = NULL)
runs |
The runs to run |
args |
The arguments to pass to fun |
fun |
The function to run |
mc.cores |
Number of cores to use for parallel processing. Defaults to |
A list of results, with one element for each element of X
.
A 210 times 3221 matrix with indviduals along the rows and expression data along the columns
data(SangerData2)
data(SangerData2)
A data frame with 210 rows and 3221 variables
The first column corresponds to column number 24266 (with name GI_6005726-S) in the original data. Column names give the name of the variables, row names the "name" of the individuals. This is a subset of SangerData where the 3220 last rows are select among all original rows following the same pre-processing procedure as in (section 1.6.1). See also the file Read_sanger_data.R
Dataset downloaded from https://ftp.sanger.ac.uk/pub/genevar/
References:
Stranger, BE et al (2007): Relative impact of nucleotide and copy number variation on gene expression phenotypes Science, 2007•science.org
Bogdan et al (2020): Handbook of Multiple Comparisons, https://arxiv.org/pdf/2011.12154
Set the transformations option for GMJMCMC (Genetically Modified MJMCMC), this is also done when running the algorithm, but this function allows for it to be done manually.
set.transforms(transforms)
set.transforms(transforms)
transforms |
The vector of non-linear transformations |
No return value, just sets the gmjmcmc-transformations option
set.transforms(c("p0","p1"))
set.transforms(c("p0","p1"))
Sigmoid function
sigmoid(x)
sigmoid(x)
x |
The vector of values |
The sigmoid of x
sigmoid(2)
sigmoid(2)
Sine function for degrees
sin_deg(x)
sin_deg(x)
x |
The vector of values in degrees |
The sine of x
sin_deg(0)
sin_deg(0)
Square root function
sqroot(x)
sqroot(x)
x |
The vector of values |
The square root of the absolute value of x
sqroot(4)
sqroot(4)
Function to get a character representation of a list of features
string.population(x, round = 2)
string.population(x, round = 2)
x |
A list of feature objects |
round |
Rounding precision for parameters of the features |
A matrix of character representations of the features of a model.
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) string.population(result$populations[[1]])
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) string.population(result$populations[[1]])
Function to get a character representation of a list of models
string.population.models(features, models, round = 2, link = "I")
string.population.models(features, models, round = 2, link = "I")
features |
A list of feature objects on which the models are build |
models |
A list of model objects |
round |
Rounding precision for parameters of the features |
link |
The link function to use, as a string |
A matrix of character representations of a list of models.
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) string.population.models(result$populations[[2]], result$models[[2]])
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) string.population.models(result$populations[[2]], result$models[[2]])
Function to print a quick summary of the results
## S3 method for class 'gmjmcmc' summary( object, pop = "best", tol = 1e-04, labels = FALSE, effects = NULL, data = NULL, verbose = TRUE, ... )
## S3 method for class 'gmjmcmc' summary( object, pop = "best", tol = 1e-04, labels = FALSE, effects = NULL, data = NULL, verbose = TRUE, ... )
object |
The results to use |
pop |
The population to print for, defaults to last |
tol |
The tolerance to use as a threshold when reporting the results. |
labels |
Should the covariates be named, or just referred to as their place in the data.frame. |
effects |
Quantiles for posterior modes of the effects across models to be reported, if either effects are NULL or if labels are NULL, no effects are reported. |
data |
Data to merge on, important if pre-filtering was used |
verbose |
If the summary should be printed to the console or just returned, defaults to TRUE |
... |
Not used. |
A data frame containing the following columns:
feats.strings |
Character representation of the features ordered by marginal probabilities. |
marg.probs |
Marginal probabilities corresponding to the ordered feature strings. |
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result, pop = "best")
result <- gmjmcmc(matrix(rnorm(600), 100), P = 2, gaussian.loglik, NULL, c("p0", "exp_dbl")) summary(result, pop = "best")
Function to print a quick summary of the results
## S3 method for class 'gmjmcmc_merged' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, pop = NULL, data = NULL, verbose = TRUE, ... )
## S3 method for class 'gmjmcmc_merged' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, pop = NULL, data = NULL, verbose = TRUE, ... )
object |
The results to use |
tol |
The tolerance to use as a threshold when reporting the results. |
labels |
Should the covariates be named, or just referred to as their place in the data.frame. |
effects |
Quantiles for posterior modes of the effects across models to be reported, if either effects are NULL or if labels are NULL, no effects are reported. |
pop |
If null same as in merge.options for running parallel gmjmcmc otherwise results will be re-merged according to pop that can be "all", "last", "best" |
data |
Data to merge on, important if pre-filtering was used |
verbose |
If the summary should be printed to the console or just returned, defaults to TRUE |
... |
Not used. |
A data frame containing the following columns:
feats.strings |
Character representation of the features ordered by marginal probabilities. |
marg.probs |
Marginal probabilities corresponding to the ordered feature strings. |
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result)
result <- gmjmcmc.parallel( runs = 1, cores = 1, list(populations = "best", complex.measure = 2, tol = 0.0000001), matrix(rnorm(600), 100), P = 2, gaussian.loglik, loglik.alpha = gaussian.loglik.alpha, c("p0", "exp_dbl") ) summary(result)
Function to print a quick summary of the results
## S3 method for class 'mjmcmc' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, verbose = TRUE, ... )
## S3 method for class 'mjmcmc' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, verbose = TRUE, ... )
object |
The results to use |
tol |
The tolerance to use as a threshold when reporting the results. |
labels |
Should the covariates be named, or just referred to as their place in the data.frame. |
effects |
Quantiles for posterior modes of the effects across models to be reported, if either effects are NULL or if labels are NULL, no effects are reported. |
verbose |
If the summary should be printed to the console or just returned, defaults to TRUE |
... |
Not used. |
A data frame containing the following columns:
feats.strings |
Character representation of the covariates ordered by marginal probabilities. |
marg.probs |
Marginal probabilities corresponding to the ordered feature strings. |
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) summary(result)
result <- mjmcmc(matrix(rnorm(600), 100), gaussian.loglik) summary(result)
Function to print a quick summary of the results
## S3 method for class 'mjmcmc_parallel' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, verbose = TRUE, ... )
## S3 method for class 'mjmcmc_parallel' summary( object, tol = 1e-04, labels = FALSE, effects = NULL, verbose = TRUE, ... )
object |
The results to use |
tol |
The tolerance to use as a threshold when reporting the results. |
labels |
Should the covariates be named, or just referred to as their place in the data.frame. |
effects |
Quantiles for posterior modes of the effects across models to be reported, if either effects are NULL or if labels are NULL, no effects are reported. |
verbose |
If the summary should be printed to the console or just returned, defaults to TRUE |
... |
Not used. |
A data frame containing the following columns:
feats.strings |
Character representation of the covariates ordered by marginal probabilities. |
marg.probs |
Marginal probabilities corresponding to the ordered feature strings. |
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) summary(result)
result <- mjmcmc.parallel(runs = 1, cores = 1, matrix(rnorm(600), 100), gaussian.loglik) summary(result)
To the 2.3 power function
to23(x)
to23(x)
x |
The vector of values |
x^2.3
to23(2)
to23(2)
To 2.5 power
to25(x)
to25(x)
x |
The vector of values |
x^(2.5)
to25(2)
to25(2)
To 3.5 power
to35(x)
to35(x)
x |
The vector of values |
x^(3.5)
to35(2)
to35(2)
To the 7/2 power function
to72(x)
to72(x)
x |
The vector of values |
x^(7/2)
to72(2)
to72(2)
Cube root function
troot(x)
troot(x)
x |
The vector of values |
The cube root of x
troot(27)
troot(27)