Why do we want First Principal Component?
Suppose that the data with features are represented as a row vector, denote the matrix
For every unit vector , it is easy to see that represents all the magnitude of projected onto .
Next,
is the standard deviation of the set of values with mean . Our target is now to find a direction so that the standard deviation is as huge as possible, therefore along that direction our data will be much more comparable and hopefully we can find clustering values for which the data can be grouped together.
Finding such a direction is the same as finding with such that
We call the first principal component.
Conclusion. First Principal Component is a unit vector in that maximizes the standard deviation of , where .
In Terms of Singular Value Decomposition
is in fact the right singular vector corresponding to the largest singular value. The -th principal component is accordingly the -th right singular vector corresponding to the -th largest singular value.
Note that by constructon .