Angelos Markos

Data Scientist, Professor, Runner

Proud of this one: the Theory Award of the German Classification Society (GfKl) for our paper, "Incremental Generalized Canonical Correlation Analysis" (together with Alfonso Iodice D'Enza), which was presented at the European Conference on Data Analysis 2014 in Bremen.

Photo taken before the ceremony in Essex, UK (ECDA 2015)

Best paper award

Abstract

Generalized canonical correlation analysis (GCANO) is a versatile technique that allows the joint analysis of several sets of data matrices through data reduction. The method embraces a number of representative techniques of multivariate data analysis as special cases (Takane et al., 2008). When all data sets consist of indicator variables, GCANO specializes into Correspondence Analysis (simple and multiple), and into Principal Component Analysis when each of the data sets consists of a single continuous variable. In the case of two data sets with continuous variables, GCANO reduces to canonical correlation analysis, and when one of the two sets of variables consists of indicator variables, the method specializes into canonical discriminant analysis or MANOVA. GCANO can also be viewed as a method for data fusion from disparate sources and has recently found applications in large-scale scenarios. The GCANO solution can be obtained noniteratively through an eigenequation, and distributional assumptions are not required. However, the high computational and memory requirements of ordinary eigendecomposition make its application impractical on massive or sequential data sets. The aim of the present contribution is two-fold: i) to extend the family of GCANO techniques to a split-apply-combine framework that leads to exact and parallel implementations; ii) to allow for incremental updates of existing solutions, which lead to approximate yet highly accurate solutions. For this purpose, an incremental SVD approach with desirable properties is revised and embedded in the context of GCANO, which extends the applicability of GCANO to modern big data problems and data streams.
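
To give a feel for the split-apply-combine and incremental ideas mentioned in the abstract, here is a minimal Python sketch on a plain SVD. It is not the algorithm from the paper: it only illustrates the general property that a matrix split by rows can be reduced chunk by chunk to small summaries whose stacked SVD gives the same singular values as the full matrix. The function names (`chunk_summary`, `combine`, `update`), the random data, and the optional `rank` truncation are all assumptions made for illustration.

```python
import numpy as np

def chunk_summary(chunk):
    """Apply step: reduce a row chunk to a small matrix with the same Gram matrix."""
    _, s, vt = np.linalg.svd(chunk, full_matrices=False)
    return s[:, None] * vt  # diag(s) @ vt

def combine(summaries):
    """Combine step: SVD of the stacked chunk summaries."""
    _, s, vt = np.linalg.svd(np.vstack(summaries), full_matrices=False)
    return s, vt

def update(s, vt, new_chunk, rank=None):
    """Incremental step: fold a new row chunk into an existing (s, vt) pair.

    Without truncation the result is exact; passing `rank` keeps only the
    leading components, which makes later updates approximate.
    """
    stacked = np.vstack([s[:, None] * vt, new_chunk])
    _, s_new, vt_new = np.linalg.svd(stacked, full_matrices=False)
    if rank is not None:
        s_new, vt_new = s_new[:rank], vt_new[:rank]
    return s_new, vt_new

# Hypothetical data: 600 observations on 5 variables, split into 4 row chunks.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 5))
chunks = np.array_split(X, 4, axis=0)

# Split-apply-combine: each summary could be computed in parallel.
s_merged, vt_merged = combine([chunk_summary(c) for c in chunks])

# Incremental: start from the first chunk and fold in the rest one at a time.
_, s_inc, vt_inc = np.linalg.svd(chunks[0], full_matrices=False)
for c in chunks[1:]:
    s_inc, vt_inc = update(s_inc, vt_inc, c)

# Reference: singular values of the full matrix computed in one pass.
s_full = np.linalg.svd(X, compute_uv=False)
```

In this sketch both routes reproduce the singular values of the full matrix up to floating-point error, because each summary preserves the cross-product matrix of its chunk. Truncating with `rank` trades that exactness for lower memory use, which mirrors the "approximate yet highly accurate" incremental setting described above.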