https://www.knime.com/sites/default/files/inline-images/knime_seventechniquesdatadimreduction.pdf
KNIME and dimensionality reduction (decreasing the number of input columns)
PCA is a statistical procedure that uses an orthogonal transformation to move the original n coordinates of a data set into a new set of n coordinates called principal components. As a result of the transformation, the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible); each succeeding component has the highest possible variance under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components.
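The variance ordering and orthogonality described above can be checked on a small example. The following is a minimal sketch, not KNIME's implementation: it uses only the Python standard library and finds the eigenvectors of the covariance matrix by power iteration with deflation, a technique swapped in here for a linear algebra library. The sample data and all function names (covariance_matrix, top_eigenpair, pca) are hypothetical and chosen for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def top_eigenpair(matrix, iterations=1000):
    """Power iteration: dominant eigenvalue and unit eigenvector
    of a symmetric positive semi-definite matrix."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]      # arbitrary start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                       # nothing left to extract
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows):
    """Return (variance, direction) pairs, largest variance first."""
    cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(d):
        value, vector = top_eigenpair(cov)
        components.append((value, vector))
        # Deflation: remove the component just found from the matrix,
        # so the next power iteration converges to the next-largest one.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    return components

# Hypothetical two-column data set, for illustration only.
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]

components = pca(data)
for variance, direction in components:
    print(f"variance={variance:.4f} direction={direction}")
```

The printed variances come out in decreasing order, the dot product of the two directions is zero up to rounding, and the variances add up to the total variance of the original columns.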
The principal components are orthogonal because they are the eigenvectors of the covariance matrix, which is symmetric. The purpose of applying PCA to a data set is ultimately to reduce its dimensionality by finding a smaller set of m variables, m < n, that retains most of the information in the data, i.e. its variation. Since the principal components (PCs) resulting from PCA are sorted by variance, keeping the first m PCs retains most of that information while reducing the dimensionality of the data set.

Note that the PCA transformation is sensitive to the relative scaling of the original variables, so data column ranges need to be normalized before applying PCA. Note also that the new coordinates (PCs) are no longer real system-produced variables: each one is a linear combination of the original columns, so a data set transformed with PCA loses its interpretability. If interpretability of the results is important for your analysis, PCA is not the right transformation for your project. KNIME has two nodes that implement the PCA transformation: PCA Compute and PCA Apply.
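To illustrate why normalization matters and how m might be chosen, here is a small sketch using only the Python standard library. It is not how the KNIME nodes work internally. It assumes a two-column data set, so the PC variances can be computed in closed form as the eigenvalues of the 2x2 covariance matrix, and it picks m as the smallest number of PCs whose cumulative share of variance reaches a threshold. The columns (incomes, ages), the 0.95 threshold, and the function names are all hypothetical.

```python
import math
import statistics

def zscore(column):
    """Normalize a column to zero mean and unit sample standard deviation."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    return [(x - mean) / std for x in column]

def pc_variances_2d(xs, ys):
    """Variances of the two PCs: eigenvalues of the 2x2 sample
    covariance matrix, largest first."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, ys)) / (len(xs) - 1)
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    root = math.sqrt(max(trace * trace / 4 - det, 0.0))
    return [trace / 2 + root, trace / 2 - root]

def components_to_keep(variances, threshold=0.95):
    """Smallest m whose first m PCs reach the given share of total variance."""
    total = sum(variances)
    running = 0.0
    for m, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= threshold:
            return m
    return len(variances)

# Hypothetical columns on very different scales, for illustration only.
incomes = [21000, 35000, 48000, 52000, 61000, 75000]
ages = [47, 23, 58, 31, 39, 52]

raw_vars = pc_variances_2d(incomes, ages)
scaled_vars = pc_variances_2d(zscore(incomes), zscore(ages))

raw_keep = components_to_keep(raw_vars)
scaled_keep = components_to_keep(scaled_vars)
print("PCs kept without normalization:", raw_keep)
print("PCs kept after normalization:", scaled_keep)
```

Without normalization the income column dominates the total variance simply because of its units, so a single PC appears sufficient. After normalization both columns contribute on an equal footing and the weak correlation between them becomes visible.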