Title: | Higher Criticism Tuned Regression |
---|---|
Description: | A novel search scheme for the tuning parameter in high-dimensional penalized regression. We propose a new estimate of the regularization parameter based on an estimated lower bound of the proportion of false null hypotheses (Meinshausen and Rice (2006) <doi:10.1214/009053605000000741>). The bound is estimated by applying the empirical null distribution of the higher criticism statistic, a second-level significance test, which is constructed from dependent p-values produced by a multi-split regression and aggregation method (Jeng, Zhang and Tzeng (2019) <doi:10.1080/01621459.2018.1518236>). The tuning parameter in penalized regression is then chosen to correspond to this lower bound of the proportion of false null hypotheses. Several penalized regression methods are available in the multi-split algorithm. |
Authors: | Tao Jiang [aut, cre] |
Maintainer: | Tao Jiang <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2024-11-17 03:54:23 UTC |
Source: | https://github.com/cran/HCTR |
Calculates the bounding sequence of higher criticism for the proportion estimator using p-values.
bounding.seq(p.value, alpha)
p.value |
A matrix of p-values from permutation: each row corresponds to one permutation and each column to one variable. |
alpha |
Probability of Type I error for the bounding sequence; the default value is 1 / sqrt(log(p)), where p is the number of p-values in each permutation. |
A bounding value of higher criticism with (1 - alpha) confidence.
Jeng XJ, Zhang T, Tzeng J (2019). “Efficient Signal Inclusion With Genomic Applications.” Journal of the American Statistical Association, 1–23.
set.seed(10)
X <- matrix(runif(n = 10000, min = 0, max = 1), nrow = 100)
result <- bounding.seq(p.value = X)
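To make the idea of a bounding value concrete, the following base R sketch computes a one-sided higher criticism statistic for each permutation row and takes its (1 - alpha) empirical quantile. The helper names `hc_stat` and `bounding_sketch` are hypothetical, and the exact statistic and any adjustment used inside `bounding.seq()` may differ; this is an illustration under those assumptions, not the package's code.

```r
# Hypothetical helper: one-sided higher criticism statistic for one vector of p-values.
hc_stat <- function(pv) {
  p <- length(pv)
  sorted <- sort(pv)
  i <- seq_len(p)
  # Guard against division by zero when a p-value is exactly 0 or 1.
  denom <- sqrt(pmax(sorted * (1 - sorted), .Machine$double.eps))
  max((i / p - sorted) / denom)
}

# Hypothetical helper: (1 - alpha) empirical quantile of the statistic over permutations.
# The default alpha follows the documented default, 1 / sqrt(log(p)).
bounding_sketch <- function(p.value, alpha = 1 / sqrt(log(ncol(p.value)))) {
  stats <- apply(p.value, 1, hc_stat)   # one statistic per permutation (row)
  unname(quantile(stats, probs = 1 - alpha))
}

set.seed(10)
X <- matrix(runif(10000), nrow = 100)   # 100 permutations of 100 null p-values
cn <- bounding_sketch(X)
```

The result is a single non-negative number that plays the role of `cn` in a proportion estimator.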
Estimates the upper and lower bounds of the new tuning region for the regularization parameter lambda.
est.lambda(cv.fit, pihat, p, cov.num = 0)
cv.fit |
An object of either class "cv.glmnet" from glmnet::cv.glmnet() or class "cv.ncvreg" from ncvreg::cv.ncvreg(), which is a list generated by a cross-validation fit. |
pihat |
Estimated proportion from HCTR::est.prop(). |
p |
Total number of variables, except for covariates. |
cov.num |
Number of covariates in the model, default is 0. The covariate matrix, W, is assumed to be on the left side of the variable matrix, X, so the column indices of the covariates come before those of the variables. |
A list of (1) lambda.max, upper bound of new tuning region; (2) lambda.min, lower bound of new tuning region.
set.seed(10)
X <- matrix(rnorm(20000), nrow = 100)
beta <- rep(0, 200)
beta[1:100] <- 5
Y <- MASS::mvrnorm(n = 1, mu = X %*% beta, Sigma = diag(100))
fit <- glmnet::cv.glmnet(x = X, y = Y)
pihat <- 0.01
result <- est.lambda(cv.fit = fit, pihat = pihat, p = ncol(X))
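The covariate layout described for `cov.num` can be illustrated in base R. The sketch below places a covariate matrix W to the left of X and maps column positions in the combined matrix back to variable indices. The data, the selected columns, and the use of a zero penalty factor to leave covariates unpenalized are assumptions made for illustration; the source does not state how the package penalizes covariates.

```r
set.seed(10)
n <- 100
W <- matrix(rnorm(n * 2), nrow = n)    # two covariates (hypothetical data)
X <- matrix(rnorm(n * 20), nrow = n)   # twenty candidate variables
cov.num <- ncol(W)
p <- ncol(X)

# Covariates occupy columns 1..cov.num; variables follow.
design <- cbind(W, X)

# Assumed convention: a penalty factor of 0 leaves the covariates unpenalized.
p.fac <- c(rep(0, cov.num), rep(1, p))

# Map column positions in the combined matrix back to variable indices in X.
design_cols <- c(3, 10, 22)            # hypothetical selected columns
variable_idx <- design_cols[design_cols > cov.num] - cov.num
```

With two covariates, columns 3, 10 and 22 of the combined matrix correspond to variables 1, 8 and 20 of X.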
Estimates the proportion of false null hypotheses from multiple p-values using the higher criticism test estimator.
est.prop(p.value, cn, adj = TRUE)
p.value |
A sequence of p-values from test data, not including p-values from covariates. |
cn |
A value of bounding sequence generated by HCTR::bounding.seq(). |
adj |
A logical value indicating whether to use the adjusted higher criticism test statistic; the default value is TRUE. |
An estimated proportion of false null hypotheses.
Meinshausen N, Rice J (2006). “Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses.” The Annals of Statistics, 34(1), 373–393.
set.seed(10)
X <- matrix(runif(n = 10000, min = 0, max = 1), nrow = 100)
result <- bounding.seq(p.value = X)
Y <- matrix(runif(n = 100, min = 0, max = 1), nrow = 100)
test <- est.prop(p.value = Y, cn = result)
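A lower-bound estimator in the style of Meinshausen and Rice (2006) compares the empirical distribution of the p-values with the uniform distribution, after subtracting a bounding term scaled by `cn`. The base R sketch below shows one such form; the helper name `prop_sketch`, the bounding function sqrt(t(1 - t)), and the fixed value of `cn` are assumptions made for illustration, and the adjusted statistic used by `est.prop()` may differ.

```r
# Hypothetical helper sketching a Meinshausen-Rice style lower bound.
prop_sketch <- function(pv, cn) {
  p <- length(pv)
  sorted <- sort(pv)
  i <- seq_len(p)
  keep <- sorted < 1                       # avoid dividing by zero at 1
  tt <- sorted[keep]
  bound <- (i[keep] / p - tt - cn * sqrt(tt * (1 - tt))) / (1 - tt)
  max(0, bound)                            # the proportion cannot be negative
}

cn <- 0.5                                  # assumed bounding value, for illustration only
null_like <- (1:100) / 101                 # evenly spread p-values, as under the null
signal_like <- rep(1e-10, 100)             # every p-value tiny
est_null <- prop_sketch(null_like, cn)
est_signal <- prop_sketch(signal_like, cn)
```

Evenly spread p-values give an estimate of zero, while uniformly tiny p-values push the estimate close to one.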
Returns the indices of the variables selected in the final chosen model.
final.selection(cv.fit, pihat, p, cov.num = 0)
cv.fit |
An object of either class "cv.glmnet" from glmnet::cv.glmnet() or class "cv.ncvreg" from ncvreg::cv.ncvreg(), which is a list generated by a cross-validation fit. |
pihat |
Estimated proportion from HCTR::est.prop(). |
p |
Total number of variables, except for covariates. |
cov.num |
Number of covariates in the model, default is 0. The covariate matrix, W, is assumed to be on the left side of the variable matrix, X, so the column indices of the covariates come before those of the variables. |
A vector of indices of the variables selected in the final chosen model.
set.seed(10)
X <- matrix(rnorm(20000), nrow = 100)
beta <- rep(0, 200)
beta[1:100] <- 5
Y <- MASS::mvrnorm(n = 1, mu = X %*% beta, Sigma = diag(100))
fit <- glmnet::cv.glmnet(x = X, y = Y)
pihat <- 0.01
result <- est.lambda(cv.fit = fit, pihat = pihat, p = ncol(X))
lambda.seq <- seq(from = result$lambda.min, to = result$lambda.max, length.out = 100)
# Note: The lambda sequences in glmnet and ncvreg are different.
fit2 <- glmnet::cv.glmnet(x = X, y = Y, lambda = lambda.seq)
result2 <- final.selection(cv.fit = fit2, pihat = 0.01, p = ncol(X))
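The example above builds an evenly spaced grid between the two bounds. As a possible alternative, the base R sketch below builds a log-spaced, decreasing grid over the same kind of interval. The bounds here are hypothetical stand-ins for the output of `est.lambda()`, and whether a log-spaced grid is preferable in this workflow is an assumption, not something the source states.

```r
lambda.min <- 0.01   # hypothetical lower bound standing in for result$lambda.min
lambda.max <- 1      # hypothetical upper bound standing in for result$lambda.max

# Evenly spaced on the log scale, ordered from largest to smallest.
lambda.log <- exp(seq(from = log(lambda.max), to = log(lambda.min), length.out = 100))
```

A log-spaced grid places more candidate values near the lower bound, where a small change in lambda can change the fitted model more.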
Calculates p-values in high-dimensional linear models using the multi-split method.
highdim.p(Y, X, W = NULL, type, B = 100, fold.num)
Y |
A numeric response vector of length nobs. |
X |
An input matrix, of dimension nobs x nvars. |
W |
A covariate matrix, of dimension nobs x ncors, default is NULL. |
type |
Penalized regression type, valid parameters include "Lasso", "AdaLasso", "SCAD", and "MCP". |
B |
Multi-split times, default is 100. |
fold.num |
The number of cross validation folds. |
A list containing: (1) harmonic mean p-values; (2) original p-values; (3) indices of selected samples; (4) indices of selected variables.
set.seed(10)
X <- matrix(rnorm(20000), nrow = 100)
beta <- rep(0, 200)
beta[1:100] <- 5
Y <- MASS::mvrnorm(n = 1, mu = X %*% beta, Sigma = diag(100))
result <- highdim.p(Y = Y, X = X, type = "Lasso", B = 2, fold.num = 10)
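The returned list distinguishes the original p-values from their harmonic mean. Assuming the original p-values form a B x p matrix with one row per split, the aggregation step could look like the base R sketch below. The helper name `harmonic_p` and the unweighted form are assumptions; the package may apply weights or other adjustments.

```r
# Hypothetical helper: harmonic mean of each column of a B x p matrix of p-values.
harmonic_p <- function(pmat) {
  apply(pmat, 2, function(col) length(col) / sum(1 / col))
}

# Two splits (rows) and three variables (columns), made-up values.
pmat <- rbind(c(0.50, 0.10, 1.00),
              c(0.25, 0.10, 1.00))
hm <- harmonic_p(pmat)
```

For the first variable the harmonic mean is 2 / (1 / 0.50 + 1 / 0.25) = 1/3, which sits closer to the smaller p-value than the arithmetic mean would.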
Multi-split variable selection using Adaptive Lasso.
multi.adlasso(X, Y, covar.num = NULL, fold.num)
X |
An input matrix, of dimension nobs x nvars. |
Y |
A numeric response vector of length nobs. |
covar.num |
Number of covariates in model, default is NULL. |
fold.num |
The number of cross validation folds. |
A list of two numeric vectors giving the indices of (1) selected and (2) unselected variables.
Multi-split variable selection using Lasso.
multi.lasso(X, Y, p.fac = NULL, fold.num)
X |
An input matrix, of dimension nobs x nvars. |
Y |
A numeric response vector of length nobs. |
p.fac |
A vector of penalty factors, one applied to each variable. |
fold.num |
The number of cross validation folds. |
A list of two numeric vectors giving the indices of (1) selected and (2) unselected variables.
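A single sample split can be sketched in base R to show how a selected/unselected pair of index vectors might feed into p-values. The sketch selects variables on one half of the data and fits ordinary least squares on the other half. Marginal correlation screening stands in for the Lasso here, so this is not what `multi.lasso()` does internally; the helper name `screen_select`, the choice of `k`, and the convention of giving unselected variables a p-value of 1 are all assumptions.

```r
# Hypothetical helper: keep the k columns most correlated with Y.
# Marginal screening is a stand-in here; it is not the Lasso.
screen_select <- function(X, Y, k = 5) {
  score <- abs(cor(X, Y))[, 1]
  selected <- sort(order(score, decreasing = TRUE)[seq_len(k)])
  list(selected = selected, unselected = setdiff(seq_len(ncol(X)), selected))
}

set.seed(10)
X <- matrix(rnorm(100 * 20), nrow = 100)
Y <- 5 * X[, 1] + 5 * X[, 2] + rnorm(100)
half <- sample(nrow(X), size = nrow(X) / 2)        # rows used for selection
sel <- screen_select(X[half, ], Y[half], k = 3)

# Ordinary least squares on the held-out half gives p-values for the selected columns.
fit <- lm(Y[-half] ~ X[-half, sel$selected, drop = FALSE])
pv <- rep(1, ncol(X))                              # assumed: unselected variables get 1
pv[sel$selected] <- summary(fit)$coefficients[-1, 4]
```

Repeating this over many random splits would give one row of p-values per split, which can then be aggregated.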
Multi-split variable selection using MCP.
multi.mcp(X, Y, p.fac = NULL, fold.num)
X |
An input matrix, of dimension nobs x nvars. |
Y |
A numeric response vector of length nobs. |
p.fac |
A vector of penalty factors, one applied to each variable. |
fold.num |
The number of cross validation folds. |
A list of two numeric vectors giving the indices of (1) selected and (2) unselected variables.
Multi-split variable selection using SCAD.
multi.scad(X, Y, p.fac = NULL, fold.num)
X |
An input matrix, of dimension nobs x nvars. |
Y |
A numeric response vector of length nobs. |
p.fac |
A vector of penalty factors, one applied to each variable. |
fold.num |
The number of cross validation folds. |
A list of two numeric vectors giving the indices of (1) selected and (2) unselected variables.
Calculates harmonic mean p-values from permutation in high-dimensional linear models using the multi-split method.
pmpv(Y, X, W = NULL, type, B = 100, fold.num = 10, perm.num = 1000)
Y |
A numeric response vector of length nobs. |
X |
An input matrix, of dimension nobs x nvars. |
W |
A covariate matrix, of dimension nobs x ncors, default is NULL. |
type |
Penalized regression type, valid parameters include "Lasso", "AdaLasso", "SCAD", and "MCP". |
B |
Multi-split times, default is 100. |
fold.num |
The number of cross validation folds, default is 10. |
perm.num |
Permutation times, default is 1000. |
A matrix containing harmonic mean p-values from permutation.
set.seed(10)
X <- matrix(rnorm(20000), nrow = 100)
beta <- rep(0, 200)
beta[1:100] <- 5
Y <- MASS::mvrnorm(n = 1, mu = X %*% beta, Sigma = diag(100))
result <- pmpv(Y = Y, X = X, type = "Lasso", B = 2, fold.num = 10, perm.num = 10)
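The shape of a permutation p-value matrix, with one row per permutation and one column per variable, can be reproduced in base R. In the sketch below, simple marginal regression p-values stand in for the multi-split harmonic mean p-values that `pmpv()` computes; the helper name `marginal_p` and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper: marginal regression p-value for each column of X.
marginal_p <- function(X, Y) {
  apply(X, 2, function(x) summary(lm(Y ~ x))$coefficients[2, 4])
}

set.seed(10)
X <- matrix(rnorm(50 * 8), nrow = 50)
Y <- 2 * X[, 1] + rnorm(50)
perm.num <- 10

# Each row holds the p-values obtained after shuffling Y, which breaks any X-Y link.
perm_p <- t(replicate(perm.num, marginal_p(X, sample(Y))))
```

A matrix laid out this way matches the row and column convention that `bounding.seq()` documents for its `p.value` argument.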