The element \(X_{ij}\) contains the value of variable \(j\) in sample \(i\).
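As a minimal sketch (NumPy, with made-up numbers), the data matrix stores one sample per row and one variable per column, so `X[i, j]` is exactly the element \(X_{ij}\) (0-based indices here):

```python
import numpy as np

# Toy data matrix: 3 samples (rows) x 3 variables (columns); values are made up.
X = np.array([
    [4.2, 1.1, 0.3],   # sample 0
    [3.9, 1.4, 0.5],   # sample 1
    [5.0, 0.9, 0.2],   # sample 2
])

i, j = 1, 2
print(X[i, j])  # value of variable j measured on sample i -> 0.5
```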
In the presence of correlation among the variables, the samples actually occupy only a “fraction” of the potential multidimensional space. In this situation a projection is highly informative.
A latent variable (LV) is a mathematical combination of several variables.
By projecting the data along specific latent variables, we can highlight desired properties of the data. More broadly, latent variables can also be seen as the mathematical representation of the hidden rules that determine the behavior of the samples.
A set of latent variables can be used to reconstruct an informative representation of the dataset which captures relevant multidimensional aspects of the data.
This representation is constructed by “projecting” the samples on the LVs. Each projection results in a new coordinate (a score) for each sample.
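As a hedged sketch (synthetic numbers, and an arbitrary unit-length direction rather than a fitted LV), projecting the samples on an LV amounts to a weighted sum of the variables, giving one score per sample:

```python
import numpy as np

# Toy data: 5 samples, 2 correlated variables (made-up numbers).
X = np.array([[1.0, 0.9],
              [2.1, 2.0],
              [3.0, 2.9],
              [3.9, 4.1],
              [5.2, 4.8]])

# An LV is defined by its loadings: one weight per variable.
# This unit-norm direction is only an example (0.8^2 + 0.6^2 = 1).
w = np.array([0.8, 0.6])

scores = X @ w   # projection: one score per sample
print(scores)
```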
Which LV will maximize the separation between the two groups? Can you guess something about the loadings?
The loadings represent the weights of the original variables along the discriminating direction.
Var a: 0.9987687
Var b: 0.0496086
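A minimal sketch of finding such a direction (synthetic groups; the normalized difference of the group means is used here as a simple stand-in for a proper discriminant direction, not necessarily the method behind the numbers above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic groups, separated almost entirely along variable a.
A = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
B = rng.normal(loc=[3.0, 0.2], scale=0.3, size=(50, 2))

# Simple discriminating direction: normalized difference of group means.
d = B.mean(axis=0) - A.mean(axis=0)
w = d / np.linalg.norm(d)

print(dict(zip(["Var a", "Var b"], np.round(w, 4))))
# Var a dominates the loadings; Var b contributes little.
```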
The aim of PCA is dimension reduction, and it is the most frequently applied method for computing linear latent variables (components).
The transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest possible variance under the constraint that it is orthogonal to the preceding components.
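A minimal sketch of this property, assuming synthetic data and using the eigendecomposition of the covariance matrix as one standard way to compute the components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated data: 200 samples, 3 variables.
X = rng.normal(size=(200, 3)) @ np.array([[1.0, 0.8, 0.1],
                                          [0.0, 0.6, 0.3],
                                          [0.0, 0.0, 0.2]])
Xc = X - X.mean(axis=0)              # PCA is computed on mean-centered data

eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigval)[::-1]     # sort components by decreasing variance
eigval, eigvec = eigval[order], eigvec[:, order]

print(eigval)                          # PC1 has the largest variance, then PC2, ...
print(np.round(eigvec.T @ eigvec, 6))  # loadings are orthonormal (identity matrix)
```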
Which LV will highlight the direction of maximal variance?
The loadings represent the weights of the original variables along PC1.
Var a: 0.0899434
Var b: 0.9959469
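The same pattern can be reproduced with a sketch on synthetic data (so the numbers below only mimic the values above; they are not the same): when one variable carries most of the variance, the PC1 loadings are dominated by it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic variables: Var b carries most of the variance.
var_b = rng.normal(scale=3.0, size=200)
var_a = 0.1 * var_b + rng.normal(scale=0.3, size=200)
Xc = np.column_stack([var_a, var_b])
Xc -= Xc.mean(axis=0)

eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvec[:, np.argmax(eigval)]   # loadings of PC1
pc1 *= np.sign(pc1[1])               # fix the (arbitrary) sign for readability

print(dict(zip(["Var a", "Var b"], np.round(pc1, 4))))
# PC1 points almost entirely along Var b, the high-variance variable.
```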