Difference between principal directions and principal component scores in the context of dimensionality reduction

I am really confused. I have read a lot about this, but I still don't quite understand it.

asked Sep 28, 2015 at 20:31

$\begingroup$ Principal axis/direction/dimension/component are one thing. It is a derived variable. As an axis, it is "rotated" in space relative to the original axes-variables. $V$ is the matrix of the angles (the cosines) between the original variables (rows of $V$) and the PCs (columns of $V$). The data points are then projected onto the PC axes, yielding coordinates $US$ on them, called the PC values or scores. Note that $US=XV$, which makes it clearer that the rows of the data $X$ (data points) were projected onto the axes $V$, which were in turn the result of rotating the columns of $X$ (the original variables). $\endgroup$

Commented Sep 28, 2015 at 21:25

$\begingroup$ But why do the column vectors have so many elements/dimensions? It doesn't make sense if I have reduced the dimensionality to only 6 by choosing only 6 column vectors. $\endgroup$

Commented Sep 29, 2015 at 4:53

$\begingroup$ And how can $US=XV$ if $X=USV^\top$? Shouldn't it be $X=USV^\top \Leftrightarrow US=X(V^\top)^{-1}$? $\endgroup$

Commented Sep 29, 2015 at 4:56

$\begingroup$ Columns of $US$ have as many elements as columns of $X$. If you had 1000 points in 12 dimensions, then dimensionality reduction from 12 to 6 means that you select 6 principal axes (columns of $V$) and project your data onto them, obtaining 6 columns of $US$, each having 1000 values -- one value for each data point. In the end you have 1000 points in 6 dimensions. Regarding the math, $US=XV$ because $XV=USV^\top V = US$. Matrix $V$ is orthogonal, so $V^{-1}=V^\top$. Does this make sense? $\endgroup$

Commented Sep 29, 2015 at 8:55
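
As a quick numeric sanity check of the identity discussed in the comments above, here is a minimal numpy sketch (the matrix sizes are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))   # 1000 points in 12 dimensions
X = X - X.mean(axis=0)            # center the data, as PCA assumes

# Thin SVD: X = U S V^T, with V orthogonal, so V^{-1} = V^T
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

US = U * S                        # same as U @ np.diag(S)
print(np.allclose(US, X @ V))     # True: US = XV up to floating-point error
```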

1 Answer

$\begingroup$

Most of these things are covered in my answers in the following two threads:

  1. Relationship between SVD and PCA. How to use SVD to perform PCA?
  2. What exactly is called "principal component" in PCA?

Still, here I will try to answer your specific concerns.

Think about it like this. You have, let's say, $1000$ data points in $12$-dimensional space (i.e. your data matrix $X$ is of size $1000\times12$). PCA finds directions in this space that capture maximal variance. So, for example, the PC1 direction is a certain axis in this $12$-dimensional space, i.e. a vector of length $12$. The PC2 direction is another axis, etc. These directions are given by the columns of your matrix $V$. All your $1000$ data points can be projected onto each of these directions/axes, yielding coordinates of the $1000$ data points along each PC direction; these projections are what is called PC scores, and what I prefer to simply call PCs. They are given by the columns of $US$.
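
As a concrete illustration, here is a minimal numpy sketch of this setup, using made-up random data with exactly these dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 12))        # 1000 points in 12-dimensional space
X = X - X.mean(axis=0)                 # center first

U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                               # columns of V are the PC directions
scores = U * S                         # columns of US are the PC scores

print(V[:, 0].shape)                   # (12,): PC1 direction, a vector of length 12
print(scores[:, 0].shape)              # (1000,): PC1 scores, one value per point
```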

So for each PC you have a $12$-dimensional vector specifying the PC direction or axis and a $1000$-dimensional vector specifying the PC projection on this axis.

"Reducing dimensionality" means that you take several PC projections as your new variables (e.g. if you take $6$ of them, then your new data matrix will be of $1000\times 6$ size) and essentially forget about the PC directions in the original $12$-dimensional space.

> Most websites about PCA say that I should choose some principal components, but isn't it more correct to choose principal directions/axes, since my objective is to reduce dimensionality?

This is equivalent. One column of $V$ corresponds to one column of $US$. You can say that you choose some columns of $V$ or you can say that you choose some columns of $US$. Doesn't matter. Also, by "principal components" some people mean columns of $V$ and some people mean columns of $US$. Again, most of the time it does not matter.
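
To see the equivalence numerically: projecting onto $6$ chosen columns of $V$ gives exactly the first $6$ columns of $US$ (again a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 12))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

via_axes = X @ V[:, :6]                    # choose 6 columns of V, then project
via_scores = (U * S)[:, :6]                # choose 6 columns of US directly
print(np.allclose(via_axes, via_scores))   # True: the two choices coincide
```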

> I have seen that my matrix $V$ consists of $12$ column vectors, each with $12$ elements. If I choose $6$ of these column vectors, each vector still has $12$ elements, but how is this possible if I have reduced the dimensionality?

You chose $6$ axes in the $12$-dimensional space. If you only consider these $6$ axes and discard the other $6$, then you have reduced your dimensionality from $12$ to $6$. But each of the $6$ chosen axes is still, originally, a vector in the $12$-dimensional space. There is no contradiction.
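
A shape check makes this explicit (same hypothetical sizes as above):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 12))
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
V6 = Vt.T[:, :6]       # keep 6 of the 12 principal axes
print(V6.shape)        # (12, 6): 6 axes, but each still has 12 components
```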

> Besides, there are $12$ column vectors of $US$, representing the principal components (scores), but each column vector has an awful lot of elements. What does this mean?

As I said, these are the projections on the principal axes. If your data matrix has $1000$ points, then each PC score vector will have $1000$ values, one coordinate for each data point. This makes sense.
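
And the corresponding shape check for the scores (again a standalone sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 12))
X = X - X.mean(axis=0)

U, S, _ = np.linalg.svd(X, full_matrices=False)
pc1 = (U * S)[:, 0]    # projections of all 1000 points on the first axis
print(pc1.shape)       # (1000,): one score per data point
```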