| Title: | Sparse Functional Data Analysis Methods |
|---|---|
| Description: | Provides algorithms to fit linear regression models under several popular penalization techniques and functional linear regression models based on Majorizing-Minimizing (MM) and Alternating Direction Method of Multipliers (ADMM) techniques. See Boyd et al (2010) <doi:10.1561/2200000016> for complete introduction to the method. |
| Authors: | Mauro Bernardi [aut, cre], Marco Stefanucci [aut], Antonio Canale [ctb] |
| Maintainer: | Mauro Bernardi <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.2 |
| Built: | 2026-05-27 06:27:20 UTC |
| Source: | https://github.com/cran/fdaSP |
Provides algorithms to fit linear regression models under several popular penalization techniques and functional linear regression models based on Majorizing-Minimizing (MM) and Alternating Direction Method of Multipliers (ADMM) techniques. See Boyd et al (2010) <doi:10.1561/2200000016> for complete introduction to the method.
Index of help topics:
confband Function to plot the confidence bands
f2fSP Overlap Group Least Absolute Shrinkage and
Selection Operator for function-on-function
regression model
f2fSP_cv Cross-validation for Overlap Group Least
Absolute Shrinkage and Selection Operator for
function-on-function regression model
f2sSP Overlap Group Least Absolute Shrinkage and
Selection Operator for scalar-on-function
regression model
f2sSP_cv Cross-validation for Overlap Group Least
Absolute Shrinkage and Selection Operator on
scalar-on-function regression model
fdaSP-package Sparse Functional Data Analysis Methods
forward_diff Forward finite difference approximation of
order d
lmSP Sparse Adaptive Overlap Group Least Absolute
Shrinkage and Selection Operator
lmSP_cv Cross-validation for Sparse Adaptive Overlap
Group Least Absolute Shrinkage and Selection
Operator
softhresh Function to solve the soft thresholding problem
Mauro Bernardi <[email protected]>
Mauro Bernardi [aut, cre], Marco Stefanucci [aut], Antonio Canale [ctb]
Function to plot the confidence bands
confband(xV, yVmin, yVmax)confband(xV, yVmin, yVmax)
xV |
the values for the x-axis. |
yVmin |
the minimum values for the y-axis. |
yVmax |
the maximum values for the y-axis. |
a polygon.
Overlap Group-LASSO for function-on-function regression model solves the following optimization problem
to obtain a sparse coefficient vector for the functional penalized predictor , where the coefficient matrix ,
the regression function ,
and are two B-splines bases of order and dimension and , respectively. For each group , each row of
the matrix has non-zero entries only for those bases belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each basis function belongs to more than one group. The diagonal matrix contains
the basis-specific weights. These values are provided by the argument var_weights (see below).
The regularization path is computed for the overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method.
f2fSP( mY, mX, L, M, group_weights = NULL, var_weights = NULL, standardize.data = TRUE, splOrd = 4, lambda = NULL, lambda.min.ratio = NULL, nlambda = 30, overall.group = FALSE, control = list() )f2fSP( mY, mX, L, M, group_weights = NULL, var_weights = NULL, standardize.data = TRUE, splOrd = 4, lambda = NULL, lambda.min.ratio = NULL, nlambda = 30, overall.group = FALSE, control = list() )
mY |
an |
mX |
an |
L |
number of elements of the B-spline basis vector |
M |
number of elements of the B-spline basis vector |
group_weights |
a vector of length |
var_weights |
a vector of length |
standardize.data |
logical. Should data be standardized? |
splOrd |
the order |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
overall.group |
logical. If it is TRUE, an overall group including all penalized covariates is added. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
an solution matrix for the parameters , which corresponds to the minimum in-sample MSE.
an array of estimated coefficients for each lambda.
an matrix providing the estimated functional coefficient for .
an array providing the estimated functional coefficients for for each lambda.
sequence of lambda.
value of lambda that attains the minimum in-sample MSE.
in-sample mean squared error.
minimum value of the in-sample MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
When you run the algorithm, output returns not only the solution, but also the iteration history recording following fields over iterates,
objective function value.
norm of primal residual.
norm of dual residual.
feasibility tolerance for primal feasibility condition.
feasibility tolerance for dual feasibility condition.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) and Lin et al. (2022) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) and Lin et al. (2022) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
## generate sample data set.seed(4321) s <- seq(0, 1, length.out = 100) t <- seq(0, 1, length.out = 100) p1 <- 5 p2 <- 6 r <- 10 n <- 50 beta_basis1 <- splines::bs(s, df = p1, intercept = TRUE) # first basis for beta beta_basis2 <- splines::bs(s, df = p2, intercept = TRUE) # second basis for beta data_basis <- splines::bs(s, df = r, intercept = TRUE) # basis for X x_0 <- apply(matrix(rnorm(p1 * p2, sd = 1), p1, p2), 1, fdaSP::softhresh, 1.5) # regression coefficients x_fun <- beta_basis2 %*% x_0 %*% t(beta_basis1) fun_data <- matrix(rnorm(n*r), n, r) %*% t(data_basis) b <- fun_data %*% x_fun + rnorm(n * 100, sd = sd(fun_data %*% x_fun )/3) ## set the hyper-parameters maxit <- 1000 rho_adaptation <- FALSE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ## fit functional regression model mod <- f2fSP(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, rho = rho, "print.out" = FALSE)) mycol <- function (n) { palette <- colorRampPalette(RColorBrewer::brewer.pal(11, "Spectral")) palette(n) } cols <- mycol(1000) oldpar <- par(mfrow = c(1, 2)) image(x_0, col = cols) image(mod$sp.coefficients, col = cols) par(oldpar) oldpar <- par(mfrow = c(1, 2)) image(x_fun, col = cols) contour(x_fun, add = TRUE) image(beta_basis2 %*% mod$sp.coefficients %*% t(beta_basis1), col = cols) contour(beta_basis2 %*% mod$sp.coefficients %*% t(beta_basis1), add = TRUE) par(oldpar)## generate sample data set.seed(4321) s <- seq(0, 1, length.out = 100) t <- seq(0, 1, length.out = 100) p1 <- 5 p2 <- 6 r <- 10 n <- 50 beta_basis1 <- splines::bs(s, df = p1, intercept = TRUE) # first basis for beta beta_basis2 <- splines::bs(s, df = p2, intercept = TRUE) # second basis for beta data_basis <- splines::bs(s, df = r, intercept = TRUE) # basis for X x_0 <- apply(matrix(rnorm(p1 * p2, sd = 1), p1, p2), 1, fdaSP::softhresh, 1.5) # regression coefficients x_fun <- beta_basis2 %*% x_0 %*% t(beta_basis1) fun_data <- matrix(rnorm(n*r), n, r) %*% t(data_basis) b <- fun_data %*% x_fun + rnorm(n * 100, sd = sd(fun_data %*% x_fun )/3) ## set the hyper-parameters maxit <- 1000 rho_adaptation <- FALSE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ## fit functional regression model mod <- f2fSP(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, rho = rho, "print.out" = FALSE)) mycol <- function (n) { palette <- colorRampPalette(RColorBrewer::brewer.pal(11, "Spectral")) palette(n) } cols <- mycol(1000) oldpar <- par(mfrow = c(1, 2)) image(x_0, col = cols) image(mod$sp.coefficients, col = cols) par(oldpar) oldpar <- par(mfrow = c(1, 2)) image(x_fun, col = cols) contour(x_fun, add = TRUE) image(beta_basis2 %*% mod$sp.coefficients %*% t(beta_basis1), col = cols) contour(beta_basis2 %*% mod$sp.coefficients %*% t(beta_basis1), add = TRUE) par(oldpar)
Overlap Group-LASSO for function-on-function regression model solves the following optimization problem
to obtain a sparse coefficient vector for the functional penalized predictor , where the coefficient matrix ,
the regression function ,
and are two B-splines bases of order and dimension and , respectively. For each group , each row of
the matrix has non-zero entries only for those bases belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each basis function belongs to more than one group. The diagonal matrix contains
the basis-specific weights. These values are provided by the argument var_weights (see below).
The regularization path is computed for the overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method.
f2fSP_cv( mY, mX, L, M, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, lambda.min.ratio = NULL, nlambda = NULL, cv.fold = 5, overall.group = FALSE, control = list() )f2fSP_cv( mY, mX, L, M, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, lambda.min.ratio = NULL, nlambda = NULL, cv.fold = 5, overall.group = FALSE, control = list() )
mY |
an |
mX |
an |
L |
number of elements of the B-spline basis vector |
M |
number of elements of the B-spline basis vector |
group_weights |
a vector of length |
var_weights |
a vector of length |
standardize.data |
logical. Should data be standardized? |
splOrd |
the order |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
cv.fold |
the number of folds - default is 5. |
overall.group |
logical. If it is TRUE, an overall group including all penalized covariates is added. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
an solution matrix for the parameters , which corresponds to the minimum cross-validated MSE.
an matrix providing the estimated functional coefficient for corresponding to the minimum cross-validated MSE.
sequence of lambda.
value of lambda that attains the cross-validated minimum mean squared error.
index of the lambda sequence corresponding to lambda.min.
cross-validated mean squared error.
minimum value of the cross-validated MSE for the sequence of lambda.
standard deviation of the cross-validated mean squared error.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) and Lin et al. (2022) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) and Lin et al. (2022) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
## generate sample data set.seed(4321) s <- seq(0, 1, length.out = 100) t <- seq(0, 1, length.out = 100) p1 <- 5 p2 <- 6 r <- 10 n <- 50 beta_basis1 <- splines::bs(s, df = p1, intercept = TRUE) # first basis for beta beta_basis2 <- splines::bs(s, df = p2, intercept = TRUE) # second basis for beta data_basis <- splines::bs(s, df = r, intercept = TRUE) # basis for X x_0 <- apply(matrix(rnorm(p1 * p2, sd = 1), p1, p2), 1, fdaSP::softhresh, 1.5) # regression coefficients x_fun <- beta_basis2 %*% x_0 %*% t(beta_basis1) fun_data <- matrix(rnorm(n*r), n, r) %*% t(data_basis) b <- fun_data %*% x_fun + rnorm(n * 100, sd = sd(fun_data %*% x_fun )/3) ## set the hyper-parameters maxit <- 1000 rho_adaptation <- FALSE rho <- 0.01 reltol <- 1e-5 abstol <- 1e-5 ## fit functional regression model mod_cv <- f2fSP_cv(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, cv.fold = 5, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- f2fSP(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - x_0)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)## generate sample data set.seed(4321) s <- seq(0, 1, length.out = 100) t <- seq(0, 1, length.out = 100) p1 <- 5 p2 <- 6 r <- 10 n <- 50 beta_basis1 <- splines::bs(s, df = p1, intercept = TRUE) # first basis for beta beta_basis2 <- splines::bs(s, df = p2, intercept = TRUE) # second basis for beta data_basis <- splines::bs(s, df = r, intercept = TRUE) # basis for X x_0 <- apply(matrix(rnorm(p1 * p2, sd = 1), p1, p2), 1, fdaSP::softhresh, 1.5) # regression coefficients x_fun <- beta_basis2 %*% x_0 %*% t(beta_basis1) fun_data <- matrix(rnorm(n*r), n, r) %*% t(data_basis) b <- fun_data %*% x_fun + rnorm(n * 100, sd = sd(fun_data %*% x_fun )/3) ## set the hyper-parameters maxit <- 1000 rho_adaptation <- FALSE rho <- 0.01 reltol <- 1e-5 abstol <- 1e-5 ## fit functional regression model mod_cv <- f2fSP_cv(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, cv.fold = 5, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- f2fSP(mY = b, mX = fun_data, L = p1, M = p2, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min.ratio = NULL, control = list("abstol" = abstol, "reltol" = reltol, "maxit" = maxit, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - x_0)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)
Overlap Group-LASSO for scalar-on-function regression model solves the following optimization problem
to obtain a sparse coefficient vector for the functional penalized predictor and a coefficient vector for the unpenalized scalar predictors . The regression function is
where is a B-spline basis of order and dimension . For each group , each row of
the matrix has non-zero entries only for those bases belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each basis function belongs to more than one group. The diagonal matrix contains
the basis-specific weights. These values are provided by the argument var_weights (see below).
The regularization path is computed for the overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method.
f2sSP( vY, mX, mZ = NULL, M, group_weights = NULL, var_weights = NULL, weights_adaptive = FALSE, standardize.data = TRUE, splOrd = 4, diff_order = 1, lambda = NULL, lambda2 = NULL, nlambda = 30, lambda.min.ratio = NULL, intercept = FALSE, overall.group = FALSE, control = list() )f2sSP( vY, mX, mZ = NULL, M, group_weights = NULL, var_weights = NULL, weights_adaptive = FALSE, standardize.data = TRUE, splOrd = 4, diff_order = 1, lambda = NULL, lambda2 = NULL, nlambda = 30, lambda.min.ratio = NULL, intercept = FALSE, overall.group = FALSE, control = list() )
vY |
a length- |
mX |
a |
mZ |
an |
M |
number of elements of the B-spline basis vector |
group_weights |
a vector of length |
var_weights |
a vector of length |
weights_adaptive |
logical. If |
standardize.data |
logical. Should data be standardized? |
splOrd |
the order |
diff_order |
order of the discrete difference operator used in the smoothness penalty. The default is 1. |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
lambda2 |
either a non-negative smoothing regularization parameter or a vector of
smoothing regularization parameters. If |
nlambda |
the number of lambda values - default is 30. |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. If it is TRUE, an overall group including all penalized covariates is added. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
a length- solution vector for the parameters , which corresponds to the minimum in-sample MSE.
an matrix of estimated coefficients for each lambda.
a length- vector providing the estimated functional coefficient for .
an matrix providing the estimated functional coefficients for for each lambda.
a length- solution vector for the parameters , which corresponds to the minimum in-sample MSE.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
an matrix of estimated coefficients for each lambda.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
estimated degrees of freedom for each value of the regularization parameter.
active groups used in the degrees-of-freedom computation for each value of the regularization parameter.
active coefficient indices used in the degrees-of-freedom computation for each value of the regularization parameter.
active basis coefficients used in the degrees-of-freedom computation for each value of the regularization parameter.
Bayesian information criterion computed along the regularization path.
value of lambda that attains the minimum in-sample MSE.
sequence of lambda.
sequence of smoothing regularization parameters. It is NULL when no smoothness penalty is used.
in-sample mean squared error.
minimum value of the in-sample MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
When you run the algorithm, output returns not only the solution, but also the iteration history recording following fields over iterates,
objective function value.
norm of primal residual.
norm of dual residual.
feasibility tolerance for primal feasibility condition.
feasibility tolerance for dual feasibility condition.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) and Lin et al. (2022) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) and Lin et al. (2022) for details.
tolerance parameter used in the degrees-of-freedom computation to determine active constraints.
tolerance parameter used in the degrees-of-freedom computation to determine active coefficients or groups.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
## generate sample data set.seed(1) n <- 40 p <- 18 # number of basis to GENERATE beta r <- 100 s <- seq(0, 1, length.out = r) beta_basis <- splines::bs(s, df = p, intercept = TRUE) # basis coef_data <- matrix(rnorm(n*floor(p/2)), n, floor(p/2)) fun_data <- coef_data %*% t(splines::bs(s, df = floor(p/2), intercept = TRUE)) x_0 <- apply(matrix(rnorm(p, sd=1),p,1), 1, fdaSP::softhresh, 1) # regression coefficients x_fun <- beta_basis %*% x_0 b <- fun_data %*% x_fun + rnorm(n, sd = sqrt(crossprod(fun_data %*% x_fun ))/10) l <- 10^seq(2, -4, length.out = 30) maxit <- 1000 ## set the hyper-parameters maxit <- 1000 rho_adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 mod <- f2sSP(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min = NULL, overall.group = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) # plot coefficiente path matplot(log(mod$lambda), mod$sp.coef.path, type = "l", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "", bty = "n", lwd = 1.2)## generate sample data set.seed(1) n <- 40 p <- 18 # number of basis to GENERATE beta r <- 100 s <- seq(0, 1, length.out = r) beta_basis <- splines::bs(s, df = p, intercept = TRUE) # basis coef_data <- matrix(rnorm(n*floor(p/2)), n, floor(p/2)) fun_data <- coef_data %*% t(splines::bs(s, df = floor(p/2), intercept = TRUE)) x_0 <- apply(matrix(rnorm(p, sd=1),p,1), 1, fdaSP::softhresh, 1) # regression coefficients x_fun <- beta_basis %*% x_0 b <- fun_data %*% x_fun + rnorm(n, sd = sqrt(crossprod(fun_data %*% x_fun ))/10) l <- 10^seq(2, -4, length.out = 30) maxit <- 1000 ## set the hyper-parameters maxit <- 1000 rho_adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 mod <- f2sSP(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min = NULL, overall.group = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) # plot coefficiente path matplot(log(mod$lambda), mod$sp.coef.path, type = "l", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "", bty = "n", lwd = 1.2)
Overlap Group-LASSO for scalar-on-function regression model solves the following optimization problem
to obtain a sparse coefficient vector for the functional penalized predictor and a coefficient vector for the unpenalized scalar predictors . The regression function is
where is a B-spline basis of order and dimension .
For each group , each row of the matrix has non-zero entries only for those bases belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each basis function belongs to more than one group. The diagonal matrix contains
the basis specific weights. These values are provided by the argument var_weights (see below).
The regularization path is computed for the overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method.
f2sSP_cv( vY, mX, mZ = NULL, M, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, diff_order = 1, lambda = NULL, lambda.min.ratio = NULL, nlambda = NULL, lambda2 = NULL, cv.fold = 5, intercept = FALSE, overall.group = FALSE, control = list() )f2sSP_cv( vY, mX, mZ = NULL, M, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, diff_order = 1, lambda = NULL, lambda.min.ratio = NULL, nlambda = NULL, lambda2 = NULL, cv.fold = 5, intercept = FALSE, overall.group = FALSE, control = list() )
vY |
a length- |
mX |
a |
mZ |
an |
M |
number of elements of the B-spline basis vector |
group_weights |
a vector of length |
var_weights |
a vector of length |
standardize.data |
logical. Should data be standardized? |
splOrd |
the order |
diff_order |
order of the discrete difference operator used in the smoothness penalty. The default is 1. |
lambda |
either a regularization parameter or a vector of regularization parameters. |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
lambda2 |
either a non-negative smoothing regularization parameter or a vector of
smoothing regularization parameters. If |
cv.fold |
the number of folds - default is 5. |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. If it is TRUE, an overall group including all penalized covariates is added. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
a length- solution vector for the parameters , which corresponds to the minimum cross-validated MSE.
a length- vector providing the estimated functional coefficient for corresponding to the minimum cross-validated MSE.
a length- solution vector for the parameters , which corresponds to the minimum cross-validated MSE.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
sequence of lambda.
value of lambda that attains the minimum cross-validated MSE.
cross-validated mean squared error.
minimum value of the cross-validated MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) and Lin et al. (2022) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) and Lin et al. (2022) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
## generate sample data and functional coefficients set.seed(1) n <- 40 p <- 18 r <- 100 s <- seq(0, 1, length.out = r) beta_basis <- splines::bs(s, df = p, intercept = TRUE) # basis coef_data <- matrix(rnorm(n*floor(p/2)), n, floor(p/2)) fun_data <- coef_data %*% t(splines::bs(s, df = floor(p/2), intercept = TRUE)) x_0 <- apply(matrix(rnorm(p, sd=1),p,1), 1, fdaSP::softhresh, 1) x_fun <- beta_basis %*% x_0 b <- fun_data %*% x_fun + rnorm(n, sd = sqrt(crossprod(fun_data %*% x_fun ))/10) l <- 10^seq(2, -4, length.out = 30) maxit <- 1000 ## set the hyper-parameters maxit <- 1000 rho_adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ## run cross-validation mod_cv <- f2sSP_cv(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, lambda.min = 1e-5, nlambda = 30, cv.fold = 5, intercept = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- f2sSP(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min = 1e-5, intercept = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - x_0)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)## generate sample data and functional coefficients set.seed(1) n <- 40 p <- 18 r <- 100 s <- seq(0, 1, length.out = r) beta_basis <- splines::bs(s, df = p, intercept = TRUE) # basis coef_data <- matrix(rnorm(n*floor(p/2)), n, floor(p/2)) fun_data <- coef_data %*% t(splines::bs(s, df = floor(p/2), intercept = TRUE)) x_0 <- apply(matrix(rnorm(p, sd=1),p,1), 1, fdaSP::softhresh, 1) x_fun <- beta_basis %*% x_0 b <- fun_data %*% x_fun + rnorm(n, sd = sqrt(crossprod(fun_data %*% x_fun ))/10) l <- 10^seq(2, -4, length.out = 30) maxit <- 1000 ## set the hyper-parameters maxit <- 1000 rho_adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ## run cross-validation mod_cv <- f2sSP_cv(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, lambda.min = 1e-5, nlambda = 30, cv.fold = 5, intercept = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- f2sSP(vY = b, mX = fun_data, M = p, group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4, lambda = NULL, nlambda = 30, lambda.min = 1e-5, intercept = FALSE, control = list("abstol" = abstol, "reltol" = reltol, "adaptation" = rho_adaptation, "rho" = rho, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - x_0)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)
Forward finite difference approximation of order d
forward_diff(f, h, d)forward_diff(f, h, d)
f |
Numeric vector containing function values. |
h |
Step size (default = 1.0). |
d |
Order of the derivative (integer >= 1). |
Numeric vector of forward differences of order d (length n, last d elements = NA). #@examples #f <- c(0, 1, 4, 9, 16) #forward_diff(f = f, h = 1, d = 2)
Sparse Adaptive overlap group-LASSO, or sparse adaptive group -regularized regression, solves the following optimization problem
to obtain a sparse coefficient vector for the matrix of penalized predictors and a coefficient vector
for the matrix of unpenalized predictors . For each group , each row of
the matrix has non-zero entries only for those variables belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each variable can belong to more than one group. The diagonal matrix contains the variable-specific weights. These values are
provided by the argument var_weights (see below). The diagonal matrix contains
the variable-specific weights. These values are provided by the argument var_weights_L1 (see below).
The regularization path is computed for the sparse adaptive overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method. The regularization is a combination of and
simultaneous constraints. Different specifications of the penalty argument lead to different models choice:
The classical Lasso regularization (Tibshirani, 1996) can be obtained by specifying
and the matrix as the identity matrix. An adaptive version of this model (Zou, 2006) can be obtained if is
a diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The group-Lasso regularization (Yuan and Lin, 2006) can be obtained by specifying ,
non-overlapping groups in and by setting the matrix equal to the identity matrix.
An adaptive version of this model can be obtained if the matrix is a diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The sparse group-Lasso regularization (Simon et al., 2011) can be obtained by specifying ,
non-overlapping groups in and by setting the matrices and equal to the identity matrix.
An adaptive version of this model can be obtained if the matrices and are
diagonal matrices of adaptive weights.
The overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
, overlapping groups in and by setting the matrix equal to the identity matrix. An adaptive version of this model can be obtained if the matrix is a
diagonal matrix of adaptive weights.
The sparse overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
, overlapping groups in and by setting the matrices and equal to the identity matrix.
An adaptive version of this model can be obtained if the matrices and are diagonal matrices of adaptive weights.
lmSP( X, Z = NULL, y, penalty = c("LASSO", "GLASSO", "spGLASSO", "OVGLASSO", "spOVGLASSO"), groups = NULL, group_weights = NULL, var_weights = NULL, var_weights_L1 = NULL, weights_adaptive = FALSE, adaptive = FALSE, standardize.data = TRUE, intercept = FALSE, overall.group = FALSE, lambda = NULL, alpha = NULL, lambda.min.ratio = NULL, nlambda = 30, control = list() )lmSP( X, Z = NULL, y, penalty = c("LASSO", "GLASSO", "spGLASSO", "OVGLASSO", "spOVGLASSO"), groups = NULL, group_weights = NULL, var_weights = NULL, var_weights_L1 = NULL, weights_adaptive = FALSE, adaptive = FALSE, standardize.data = TRUE, intercept = FALSE, overall.group = FALSE, lambda = NULL, alpha = NULL, lambda.min.ratio = NULL, nlambda = 30, control = list() )
X |
an |
Z |
an |
y |
a length- |
penalty |
choose one from the following options: 'LASSO', for the or adaptive-Lasso penalties, 'GLASSO', for the group-Lasso penalty, 'spGLASSO', for the sparse group-Lasso penalty, 'OVGLASSO', for the overlap group-Lasso penalty and 'spOVGLASSO', for the sparse overlap group-Lasso penalty. |
groups |
either a vector of length |
group_weights |
a vector of length |
var_weights |
a vector of length |
var_weights_L1 |
a vector of length |
weights_adaptive |
logical. If |
adaptive |
a flag for running adaptive Lasso. The default is FALSE. |
standardize.data |
logical. Should data be standardized? |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. This setting is only available for the overlap group-LASSO and the sparse overlap group-LASSO penalties, otherwise it is set to NULL. If it is TRUE, an overall group including all penalized covariates is added. |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
alpha |
the sparse overlap group-LASSO mixing parameter, with |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
a length- solution vector for the parameters . If then the provided vector corresponds to the minimum in-sample MSE.
a length- solution vector for the parameters . If then the provided vector corresponds to the minimum in-sample MSE.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
an matrix of estimated coefficients for each lambda of the provided sequence.
an matrix of estimated coefficients for each lambda of the provided sequence.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
sequence of lambda.
value of lambda that attains the minimum in sample MSE.
in-sample mean squared error.
minimum value of the in-sample MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
When you run the algorithm, output returns not only the solution, but also the iteration history recording following fields over iterates:
objective function value
norm of primal residual
norm of dual residual
feasibility tolerance for primal feasibility condition
feasibility tolerance for dual feasibility condition.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
numeric tolerance for identifying effective zeros in the degrees-of-freedom calculation. The default is 5.
numeric tolerance for identifying effective zeros in the degrees-of-freedom calculation. The default is 0.001.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Hastie T, Tibshirani R, Wainwright M (2015). Statistical learning with sparsity: the lasso and generalizations, number 143 in Monographs on statistics and applied probability. CRC Press, Taylor & Francis Group, Boca Raton. ISBN 978-1-4987-1216-3.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
Simon N, Friedman J, Hastie T, Tibshirani R (2013). “A sparse-group lasso.” J. Comput. Graph. Statist., 22(2), 231–245. ISSN 1061-8600. doi:10.1080/10618600.2012.681250.
Yuan M, Lin Y (2006). “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.
Zou H (2006). “The adaptive lasso and its oracle properties.” J. Amer. Statist. Assoc., 101(476), 1418–1429. ISSN 0162-1459. doi:10.1198/016214506000000735.
### generate sample data set.seed(2023) n <- 50 p <- 30 X <- matrix(rnorm(n*p), n, p) ### Example 1, LASSO penalty beta <- apply(matrix(rnorm(p, sd = 1), p, 1), 1, fdaSP::softhresh, 1.5) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### set regularization parameter grid lam <- 10^seq(0, -2, length.out = 30) ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run example mod <- lmSP(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, lambda = lam, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation matplot(log(lam), mod$sp.coef.path, type = "l", main = "Lasso solution path", bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "") ### Example 2, sparse group-LASSO penalty beta <- c(rep(4, 12), rep(0, p - 13), -2) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### define groups of dimension 3 each group1 <- rep(1:10, each = 3) ### set regularization parameter grid lam <- 10^seq(1, -2, length.out = 30) ### set the alpha parameter alpha <- 0.5 ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run example mod <- lmSP(X = X, y = y, penalty = "spGLASSO", groups = group1, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation matplot(log(lam), mod$sp.coef.path, type = "l", main = "Sparse Group Lasso solution path", bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "")### generate sample data set.seed(2023) n <- 50 p <- 30 X <- matrix(rnorm(n*p), n, p) ### Example 1, LASSO penalty beta <- apply(matrix(rnorm(p, sd = 1), p, 1), 1, fdaSP::softhresh, 1.5) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### set regularization parameter grid lam <- 10^seq(0, -2, length.out = 30) ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run example mod <- lmSP(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, lambda = lam, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation matplot(log(lam), mod$sp.coef.path, type = "l", main = "Lasso solution path", bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "") ### Example 2, sparse group-LASSO penalty beta <- c(rep(4, 12), rep(0, p - 13), -2) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### define groups of dimension 3 each group1 <- rep(1:10, each = 3) ### set regularization parameter grid lam <- 10^seq(1, -2, length.out = 30) ### set the alpha parameter alpha <- 0.5 ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run example mod <- lmSP(X = X, y = y, penalty = "spGLASSO", groups = group1, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation matplot(log(lam), mod$sp.coef.path, type = "l", main = "Sparse Group Lasso solution path", bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "")
Sparse Adaptive overlap group-LASSO, or sparse adaptive group -regularized regression, solves the following optimization problem
to obtain a sparse coefficient vector for the matrix of penalized predictors and a coefficient vector
for the matrix of unpenalized predictors . For each group , each row of
the matrix has non-zero entries only for those variables belonging
to that group. These values are provided by the arguments groups and group_weights (see below).
Each variable can belong to more than one group. The diagonal matrix contains the variable-specific weights. These values are
provided by the argument var_weights (see below). The diagonal matrix contains
the variable-specific weights. These values are provided by the argument var_weights_L1 (see below).
The regularization path is computed for the sparse adaptive overlap group-LASSO penalty at a grid of values for the regularization
parameter using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method. The regularization is a combination of and
simultaneous constraints. Different specifications of the penalty argument lead to different models choice:
The classical Lasso regularization (Tibshirani, 1996) can be obtained by specifying
and the matrix as the identity matrix. An adaptive version of this model (Zou, 2006) can be obtained if is
a diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The group-Lasso regularization (Yuan and Lin, 2006) can be obtained by specifying ,
non-overlapping groups in and by setting the matrix equal to the identity matrix.
An adaptive version of this model can be obtained if the matrix is a diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The sparse group-Lasso regularization (Simon et al., 2011) can be obtained by specifying ,
non-overlapping groups in and by setting the matrices and equal to the identity matrix.
An adaptive version of this model can be obtained if the matrices and are
diagonal matrices of adaptive weights.
The overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
, overlapping groups in and by setting the matrix equal to the identity matrix. An adaptive version of this model can be obtained if the matrix is a
diagonal matrix of adaptive weights.
The sparse overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
, overlapping groups in and by setting the matrices and equal to the identity matrix.
An adaptive version of this model can be obtained if the matrices and are diagonal matrices of adaptive weights.
lmSP_cv( X, Z = NULL, y, penalty = c("LASSO", "GLASSO", "spGLASSO", "OVGLASSO", "spOVGLASSO"), groups = NULL, group_weights = NULL, var_weights = NULL, var_weights_L1 = NULL, cv.fold = 5, standardize.data = TRUE, intercept = FALSE, overall.group = FALSE, lambda = NULL, alpha = NULL, lambda.min.ratio = NULL, nlambda = 30, control = list() )lmSP_cv( X, Z = NULL, y, penalty = c("LASSO", "GLASSO", "spGLASSO", "OVGLASSO", "spOVGLASSO"), groups = NULL, group_weights = NULL, var_weights = NULL, var_weights_L1 = NULL, cv.fold = 5, standardize.data = TRUE, intercept = FALSE, overall.group = FALSE, lambda = NULL, alpha = NULL, lambda.min.ratio = NULL, nlambda = 30, control = list() )
X |
an |
Z |
an |
y |
a length- |
penalty |
choose one from the following options: 'LASSO', for the or adaptive-Lasso penalties, 'GLASSO', for the group-Lasso penalty, 'spGLASSO', for the sparse group-Lasso penalty, 'OVGLASSO', for the overlap group-Lasso penalty and 'spOVGLASSO', for the sparse overlap group-Lasso penalty. |
groups |
either a vector of length |
group_weights |
a vector of length |
var_weights |
a vector of length |
var_weights_L1 |
a vector of length |
cv.fold |
the number of folds - default is 5. |
standardize.data |
logical. Should data be standardized? |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. This setting is only available for the overlap group-LASSO and the sparse overlap group-LASSO penalties, otherwise it is set to NULL. If it is TRUE, an overall group including all penalized covariates is added. |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
alpha |
the sparse overlap group-LASSO mixing parameter, with |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
a length- solution vector for the parameters . If then the provided vector corresponds to the minimum cross-validated MSE.
a length- solution vector for the parameters . If then the provided vector corresponds to the minimum cross-validated MSE.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
an matrix of estimated coefficients for each lambda of the provided sequence.
an matrix of estimated coefficients for each lambda of the provided sequence.
It is provided only when either the matrix in input is not NULL or the intercept is set to TRUE.
sequence of lambda.
value of lambda that attains the minimum cross-validated MSE.
cross-validated mean squared error.
minimum value of the cross-validated MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
a vector of values between 1 and cv.fold identifying what fold each observation is in.
When you run the algorithm, output returns not only the solution, but also the iteration history recording following fields over iterates:
objective function value
norm of primal residual
norm of dual residual
feasibility tolerance for primal feasibility condition
feasibility tolerance for dual feasibility condition.
Iteration stops when both r_norm and s_norm values
become smaller than eps_pri and eps_dual, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926. https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237. doi:10.1561/2200000016. http://dx.doi.org/10.1561/2200000016.
Hastie T, Tibshirani R, Wainwright M (2015). Statistical learning with sparsity: the lasso and generalizations, number 143 in Monographs on statistics and applied probability. CRC Press, Taylor & Francis Group, Boca Raton. ISBN 978-1-4987-1216-3.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8. doi:10.1007/978-981-16-9840-8. With forewords by Zongben Xu and Zhi-Quan Luo.
Simon N, Friedman J, Hastie T, Tibshirani R (2013). “A sparse-group lasso.” J. Comput. Graph. Statist., 22(2), 231–245. ISSN 1061-8600. doi:10.1080/10618600.2012.681250.
Yuan M, Lin Y (2006). “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.
Zou H (2006). “The adaptive lasso and its oracle properties.” J. Amer. Statist. Assoc., 101(476), 1418–1429. ISSN 0162-1459. doi:10.1198/016214506000000735.
### generate sample data set.seed(2023) n <- 50 p <- 30 X <- matrix(rnorm(n * p), n, p) ### Example 1, LASSO penalty beta <- apply(matrix(rnorm(p, sd = 1), p, 1), 1, fdaSP::softhresh, 1.5) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run cross-validation mod_cv <- lmSP_cv(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, cv.fold = 5, nlambda = 30, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- lmSP(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, nlambda = 30, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2) ### Example 2, sparse group-LASSO penalty beta <- c(rep(4, 12), rep(0, p - 13), -2) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### define groups of dimension 3 each group1 <- rep(1:10, each = 3) ### set regularization parameter grid lam <- 10^seq(1, -2, length.out = 30) ### set the alpha parameter alpha <- 0.5 ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run cross-validation mod_cv <- lmSP_cv(X = X, y = y, penalty = "spGLASSO", groups = group1, cv.fold = 5, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- lmSP(X = X, y = y, penalty = "spGLASSO", groups = group1, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)### generate sample data set.seed(2023) n <- 50 p <- 30 X <- matrix(rnorm(n * p), n, p) ### Example 1, LASSO penalty beta <- apply(matrix(rnorm(p, sd = 1), p, 1), 1, fdaSP::softhresh, 1.5) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run cross-validation mod_cv <- lmSP_cv(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, cv.fold = 5, nlambda = 30, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- lmSP(X = X, y = y, penalty = "LASSO", standardize.data = FALSE, intercept = FALSE, nlambda = 30, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2) ### Example 2, sparse group-LASSO penalty beta <- c(rep(4, 12), rep(0, p - 13), -2) y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20) ### define groups of dimension 3 each group1 <- rep(1:10, each = 3) ### set regularization parameter grid lam <- 10^seq(1, -2, length.out = 30) ### set the alpha parameter alpha <- 0.5 ### set the hyper-parameters of the ADMM algorithm maxit <- 1000 adaptation <- TRUE rho <- 1 reltol <- 1e-5 abstol <- 1e-5 ### run cross-validation mod_cv <- lmSP_cv(X = X, y = y, penalty = "spGLASSO", groups = group1, cv.fold = 5, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) ### graphical presentation plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n", xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error", ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd), main = "Cross-validated Prediction Error") fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd, yVmax = mod_cv$mse + mod_cv$mse.sd) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0) ### comparison with oracle error mod <- lmSP(X = X, y = y, penalty = "spGLASSO", groups = group1, standardize.data = FALSE, intercept = FALSE, lambda = lam, alpha = 0.5, control = list("adaptation" = adaptation, "rho" = rho, "maxit" = maxit, "reltol" = reltol, "abstol" = abstol, "print.out" = FALSE)) err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2)) plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Estimation Error", main = "True Estimation Error", bty = "n") abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0) abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]), col = "red", lwd = 1.0, lty = 2)
Function to solve the soft thresholding problem
softhresh(x, lambda)softhresh(x, lambda)
x |
the data value. |
lambda |
the lambda value. |
the solution to the soft thresholding operator.
Hastie T, Tibshirani R, Wainwright M (2015). Statistical learning with sparsity: the lasso and generalizations, number 143 in Monographs on statistics and applied probability. CRC Press, Taylor & Francis Group, Boca Raton. ISBN 978-1-4987-1216-3.