Presentation available online (PDF files):

- Part 1, in color, 55 Pages, 262KB
- Part 2, in color, 44 Pages, 950KB
- Part 3, in color, 23 Pages, 493KB

This tutorial will cover Principal Component Analysis (PCA), spectral clustering via the Laplacian matrix of a graph, nonnegative matrix factorization (NMF), other matrix models used in machine learning.

Principal Component Analysis (PCA) and its underlying Singular Value Decomposition (SVD) are widely used in machine learning, image/signal processing, data analysis and many other fields.

This area is going through a Renaissance period with many new developments: Kernel PCA, probabilistic PCA, independent component analysis; Sparsification of PCA using L1 constraints extending LASSO, truncation and discretize, direct sparsification.

Recent results show that PCA components provides solutions to K-means clustering, thus connecting dimension reduction to clustering.

Solutions of Kernel K-means clustering are given by the eigenvector of the kernel matrix (or a matrix of pairwise similarities), whereas solutions of the spectral clustering normalized cut objective are given by the generalized eigenvector of the same matrix. Thus Kernel K-means clustering and spectral clustering are closely related.

From matrix perspective, PCA/SVD are matrix factorization (approximations by lower rank matrices with clear meaning). Thus K-means and spectral clustering are under this broad matrix model framework.

Aside from eigenvector based factorizations, nonnegative matrix factorization (NMF) have many desirable properties. It is recognized that NMF provides a continuous nonnegative solution to the K-means clustering and also a solution to the spectral clustering.

Besides NMF, other matrix factorization such as maximum margin factorization, partitioned columns based factorizations are also useful.

The key defining properties of the matrix model are orthogonality and nonnegativity. Enforcing orthogonality while ignoring nonnegativity, we get spectral clustering. Ignoring orthogonality while enforing nonnegativity, we get NMF. We may also impose orthogonality and nonnegativity simultaneously. This leads to orthogonal NMF in NMF side and indicator matrix quadratic clustering in spectral clustering side.

Standard SVD/PCA deals with a set of 1D vectors such as data points in a high dimensional space. When the input is a set of 2D objects such as images or weather maps, 2DSVD computes their low-dimensional approximations by using principal eigenvectors of the row-row and column-column covariance matrices.

The tutorial starts with the classic PCA and provides a unified review of these recent developments. We hope to convey the breadth and depth of matrix model based approach for machine learning.