Title: | Coordinate-Wise Mediation Filter |
---|---|
Description: | Functions to discover, plot, and select multiple mediators from an x -> M -> y linear system. This exploratory mediation analysis is performed using the Coordinate-wise Mediation Filter as introduced by Van Kesteren and Oberski (2019) <doi: 10.1080/10705511.2019.1588124>. |
Authors: | Erik-Jan van Kesteren <[email protected]> |
Maintainer: | Erik-Jan van Kesteren <[email protected]> |
License: | MIT + file LICENCE |
Version: | 0.9.5 |
Built: | 2024-12-02 04:12:33 UTC |
Source: | https://github.com/vankesteren/cmfilter |
This function combines two cmf objects and returns one cmf object with the combined results. This helps with combining results done over multiple runs, for example in high-performance computing.
## S3 method for class 'cmf' x + y
## S3 method for class 'cmf' x + y
x |
a cmf object |
y |
a cmf object |
a cmf object with combined results
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) # create two different cmf objects on this data res_1 <- cmf(dat, nStarts = 500) res_2 <- cmf(dat, nStarts = 500) # Combine the results using the + operator res_1 + res_2
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) # create two different cmf objects on this data res_1 <- cmf(dat, nStarts = 500) res_2 <- cmf(dat, nStarts = 500) # Combine the results using the + operator res_1 + res_2
This function performs CMF on a set of potential mediators, given an input and an output variable. It selects those mediators that are deemed relevant by a default or a user-defined decision function, *conditional* on the other mediators in the model. By doing this cyclically, and with multiple random starts, the algorithm outputs an estimate of the best mediators.
cmf( x, M, y, decisionFunction = "prodcoef", nStarts = 1000, nCores = 2, cutoff = 0.5, maxIter = 25, stableLag = 5, pb = TRUE, ... )
cmf( x, M, y, decisionFunction = "prodcoef", nStarts = 1000, nCores = 2, cutoff = 0.5, maxIter = 25, stableLag = 5, pb = TRUE, ... )
x |
exogenous variable; numeric vector or data frame with x, y, and at least one M column |
M |
potential mediators; data frame with column names |
y |
outcome variable; numeric vector |
decisionFunction |
either a function with as inputs x, m, y, parameters, and as output a TRUE (include) or FALSE (exclude) statement or a string indicating the built-in decision function to use (see details) |
nStarts |
how many times to start the algorithm |
nCores |
how many threads (cores) to use for parallel processing (default 2) |
cutoff |
a cutoff value for selection: variables are selected if they
display a selection rate higher than this value. Only relevant when multiple
starts are specified. Can also be specified post-hoc using
|
maxIter |
the maximum number of iterations for each start |
stableLag |
how long does the selection need to be stable before deciding the algorithm has converged |
pb |
Whether to display a progress bar (default TRUE). Only available with built-in decision functions |
... |
parameters passed to decisionFunction |
Available decision functions. These functions are implemented in
C++ to speed up computation. Between brackets the additional parameter that
may be passed to the function in the ...
argument of this function.
(arguments
= defaultvalue
):
- prodcoef
(p.value
= 0.1): Test for the product of
coefficients, Sobel test
- causalsteps
(p.value
= 0.1): Causal steps test min(Ta, Tb)
an object of class cmf
. See cmf-methods
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) # Run CMF on this data cmf(dat)
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) # Run CMF on this data cmf(dat)
Plotting, summarising, and printing cmf objects
## S3 method for class 'cmf' plot( x, select, line = TRUE, labelSelected = TRUE, defaultColour = "#00008b", highlightColour = "#e2bd36", ... ) ## S3 method for class 'cmf' screeplot(x, topn, ...) ## S3 method for class 'cmf' summary(object, ...) ## S3 method for class 'cmf' print(x, ...)
## S3 method for class 'cmf' plot( x, select, line = TRUE, labelSelected = TRUE, defaultColour = "#00008b", highlightColour = "#e2bd36", ... ) ## S3 method for class 'cmf' screeplot(x, topn, ...) ## S3 method for class 'cmf' summary(object, ...) ## S3 method for class 'cmf' print(x, ...)
x |
cmf object |
select |
optional selection vector of variables to show |
line |
whether to show a line at the chosen cutoff |
labelSelected |
whether to label the selected mediators in the plot, not used when fewer than 20 bars are shown. |
defaultColour |
the colour for the bars lower than the cutoff |
highlightColour |
the colour for the bars of the selected mediators |
... |
other arguments passed to barplot and summary |
topn |
only show the top n mediators |
object |
cmf object |
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat) # screeplot of the result screeplot(res) # manhattan style plot the result plot(res)
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat) # screeplot of the result screeplot(res) # manhattan style plot the result plot(res)
This function generates a dataset from an x -> M -> y model, where M may be of any size with any correlation matrix.
generateMed( n = 100L, a = 0.3, b = 0.3, r2y = 0.5, dir = 0, Sigma, residual = FALSE, empirical = FALSE, scaley = FALSE, forma = identity, formb = identity )
generateMed( n = 100L, a = 0.3, b = 0.3, r2y = 0.5, dir = 0, Sigma, residual = FALSE, empirical = FALSE, scaley = FALSE, forma = identity, formb = identity )
n |
Sample size |
a |
Vector of a path coefficients within 0 and 1 |
b |
Vector of b path coefficients within 0 and 1 |
r2y |
Proportion of explained variance in y. Set to
|
dir |
Direct path from x to y |
Sigma |
Desired true covariance matrix between the mediators M |
residual |
Whether Sigma indicates residual or marginal covariance |
empirical |
Ensure observed data matrix has exactly the requested covmat (only if Sigma is specified) |
scaley |
Whether to standardise y (changes b path coefficients) |
forma |
Functional form of the a paths. Function that accepts a matrix as input and transforms each column to the desired form. |
formb |
Functional form of the b paths. Function that accepts a vector. |
A data frame with columns x, M.1 - M.p, y
# Generate a suppression dataset where M.2 is suppressed sup <- generateMed(n = 100, a = c(-0.4, 0.4), b = c(0.8, 0.48), Sigma = matrix(c(1, -0.6, -0.6, 1), 2))
# Generate a suppression dataset where M.2 is suppressed sup <- generateMed(n = 100, a = c(-0.4, 0.4), b = c(0.8, 0.48), Sigma = matrix(c(1, -0.6, -0.6, 1), 2))
A dataset containing variables on childhood trauma, stress response, and 1000 preprocessed epigenetic methylation values. This dataset was synthesised using the synthpop
package to remove any identifying information.
methylation
methylation
A data frame with 85 rows and 1002 variables: a composite score on a childhood trauma questionnaire (x), a measure of increase in area under the curve of cortisol after stress task (y), and beta values of methylation at 1000 different locations in the genome.
Houtepen, L. C., Vinkers, C. H., Carrillo-Roa, T., Hiemstra, M., Van Lier, P. A., Meeus, W., ... & Schalkwyk, L. C. (2016). Genome-wide DNA methylation levels and altered cortisol stress reactivity following childhood trauma in humans. Nature communications, 7, 10967.
van Kesteren, E. J., & Oberski, D. L. (2018). Exploratory Mediation Analysis with Many Potential Mediators. arXiv preprint arXiv:1810.06334.
This function sets the cutoff value on a cmf object for mediator selection. Any cutoff value between 0 and 1 is allowed, where potential mediators with empirical selection probability (selection rate) above the cutoff will be considered mediators and the others will not. The cutoff can be entered manually or, when set to "mc", be based on a monte carlo simulation. See "details"
setCutoff(object, cutoff = 0.5)
setCutoff(object, cutoff = 0.5)
object |
a cmf object |
cutoff |
either a number between 0 and 1 or "mc" - see details |
The monte carlo determination is based on a procedure in PCA and Factor Analysis called "Parallel Analysis". We generate data from the null hypothesis with the same dimensionality as the original dataset and perform the algorithm 100 times. This creates a distribution of nonmediator selection rates from which we can determine the cutoff (the 99.9th percentile)
a cmf object with updated cutoff value
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat) # set the cutoff for this result at 0.1 setCutoff(res, 0.1)
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat) # set the cutoff for this result at 0.1 setCutoff(res, 0.1)
This function adds additional samples to the existing results object
## S3 method for class 'cmf' update(object, nStarts = 100, ...)
## S3 method for class 'cmf' update(object, nStarts = 100, ...)
object |
a cmf object |
nStarts |
the number of starts to add (default 100) |
... |
not used |
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat, nStarts = 200) # double the samples res <- update(res, 500)
# generate some data dat <- generateMed(a = (1:10)/20, b = (1:10)/20) res <- cmf(dat, nStarts = 200) # double the samples res <- update(res, 500)