This function performs the full ASCA decomposition.

ASCA_decompose(
  d,
  x,
  f,
  glm_par = vector(mode = "list", length = 0),
  res_type = "response"
)

Arguments

d

a data.frame/matrix with the design

x

a data.frame/matrix of numeric values to be decomposed

f

a string holding the formula of the decomposition

glm_par

a list with the parameters to be passed to the glm call

res_type

the type of GLM residuals to compute

Value

a list with the full outcomes of the decomposition, containing the following elements:

  • decomposition: a list holding the results of the decomposition

  • mu: a vector with the constant terms of the univariate models

  • residuals: a matrix holding the model residuals. Their type is stored in the res_type element.

  • prediction: the matrix with the predicted values in the linear predictor space

  • pseudoR2: a parameter to assess the goodness of fit for the model on each variable.

  • glm_par: a list with the parameters used for modelling

  • res_type: the type of residuals

  • varimp: the importance of the individual variables in the decomposition terms

  • terms_L2: the L2 norm of the individual terms, back-transformed to the response space

  • d: a data.frame with the design

  • x: a data.frame with the initial data

  • f: the string defining the decomposition

  • combined: a vector holding the combined terms

  • invlink: the inverse of the link function used in the glm fitting

Details

The ASCA decomposition of a data matrix is performed by using Generalized Linear Models (GLMs) to estimate univariate expected values. The use of GLMs allows the method to be extended to non-normal data and unbalanced designs. This function performs only the decomposition, without the SVD, which has to be performed by ASCA_svd. The level of fit for each variable is assessed by calculating the pseudoR2, which is defined as:

$$1 - \frac{\text{residual deviance}}{\text{null deviance}}$$
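
As a rough illustration of this definition, the sketch below fits one Poisson GLM per column of a small simulated data set using base R's glm and computes the pseudoR2 for each variable. It is not the package's internal code: the objects d and x, the variable names v1 and v2, and the simulated counts are all hypothetical, and the default treatment contrasts are used only for brevity.

```r
# Hypothetical design and count data, simulated for illustration only
set.seed(42)
d <- data.frame(treatment = gl(2, 6, labels = c("A", "B")))
x <- data.frame(
  v1 = rpois(12, lambda = rep(c(3, 9), each = 6)),  # responds to treatment
  v2 = rpois(12, lambda = 20)                       # no treatment effect
)

# One univariate GLM per column of x
fits <- lapply(x, function(y) {
  glm(y ~ treatment, data = d, family = poisson())
})

# pseudoR2 = 1 - residual deviance / null deviance, one value per variable
pseudo_r2 <- sapply(fits, function(m) 1 - m$deviance / m$null.deviance)
pseudo_r2
```

Values close to 1 indicate that the model terms account for most of the null deviance of that variable; values close to 0 indicate that they account for very little.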

The variable importance element (varimp) stores, for each term, a measure of the importance of each variable \(c\), calculated as the norm of the corresponding column. The L2 norm is also calculated for each decomposition term.
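
The two kinds of norm can be sketched on a toy matrix. The matrix term below is a hypothetical stand-in for one decomposition term (rows are samples, columns are variables), and taking the whole-term norm as the square root of the sum of squared entries is an assumption made for illustration; the package may scale or transform the values differently.

```r
# Hypothetical term matrix: 4 samples x 2 variables (filled column-wise)
term <- matrix(
  c(3, 4, 0, 0,
    1, 2, 2, 1),
  nrow = 4,
  dimnames = list(NULL, c("var1", "var2"))
)

# Per-variable importance: L2 norm of each column
col_norms <- sqrt(colSums(term^2))

# Norm of the whole term (assumed here: root of the sum of all squared entries)
term_norm <- sqrt(sum(term^2))
```

Here var1 has column norm 5 and var2 has column norm sqrt(10), so var1 would be ranked as the more important variable for this term.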

It is important to highlight that, in the case of count data with a large fraction of zeroes, the variable importance and the term L2 norms cannot be considered reliable measures of importance, because in the presence of log links very low expected values are associated with very large negative values in the linear predictor space.
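
The effect can be seen directly from the log link of base R's poisson() family. The expected values in mu below are arbitrary numbers chosen for illustration.

```r
# Arbitrary expected counts, from moderate to nearly zero
mu <- c(10, 1, 0.1, 1e-6, 1e-12)

fam <- poisson()           # log link by default
eta <- fam$linkfun(mu)     # values in the linear predictor space
back <- fam$linkinv(eta)   # the inverse link recovers the expected counts

data.frame(mu = mu, eta = eta)
```

Expected values that are all practically zero on the response scale (1e-6 and 1e-12) sit far apart on the linear predictor scale, so norms computed in that space can be dominated by variables with many zeroes.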

Examples

## load the data
data("synth_count_data")

## perform the ASCA decomposition
dec_test <- ASCA_decompose(
  d = synth_count_data$design,
  x = synth_count_data$counts,
  f = "time + treatment + time:treatment",
  glm_par = list(family = poisson())
)