A Kernel Canonical Correlation Analysis for Learning the Semantics of Text
Blaž Fortuna (Jožef Stefan Institute, Slovenia), Nello Cristianini (University of Bristol, UK) and John Shawe-Taylor (University of Southampton, UK)
Copyright: © 2007
We present a general method that uses kernel canonical correlation analysis (KCCA) to learn a semantic representation of text from an aligned multilingual collection of documents. The learned semantic space provides a language-independent representation of text and enables comparison of documents written in different languages. In experiments, we apply KCCA to cross-lingual retrieval of text documents, where the query is written in only one language, and to cross-lingual text categorization, where we train a cross-lingual classifier.
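The following is a minimal sketch of how a KCCA semantic space can support cross-lingual retrieval. It is an illustration under stated assumptions, not the authors' implementation. It assumes linear kernels on document vectors and uses one simple regularization variant: the constraint alpha^T (K_x + kappa I)^2 alpha = 1 (and likewise for beta), which reduces the problem to an SVD of R_x R_y with R = (K + kappa I)^{-1} K. The aligned corpus is replaced by synthetic paired vectors generated from shared hidden "topics", so dense Gaussian vectors stand in for bag-of-words vectors. The function names `fit_kcca` and `retrieve`, the parameter `kappa`, and all data sizes are hypothetical.

```python
import numpy as np


def fit_kcca(X, Y, n_components, kappa=10.0):
    """Regularized KCCA with linear kernels on two aligned views.

    X: (n, dx) vectors of documents in language 1 (hypothetical features)
    Y: (n, dy) vectors of the aligned documents in language 2
    Returns primal weights Wx (dx, c), Wy (dy, c), the training means,
    and the top canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Kx, Ky = Xc @ Xc.T, Yc @ Yc.T  # centred linear kernels, (n, n)
    I = np.eye(n)
    # Substituting u = (Kx + kappa I) alpha and v = (Ky + kappa I) beta turns
    # the regularized objective into: maximize u^T Rx Ry v with ||u|| = ||v|| = 1,
    # which is solved by the SVD of Rx Ry.
    Rx = np.linalg.solve(Kx + kappa * I, Kx)
    Ry = np.linalg.solve(Ky + kappa * I, Ky)
    U, s, Vt = np.linalg.svd(Rx @ Ry)
    alpha = np.linalg.solve(Kx + kappa * I, U[:, :n_components])
    beta = np.linalg.solve(Ky + kappa * I, Vt[:n_components].T)
    # With a linear kernel the dual weights map back to primal weights.
    return Xc.T @ alpha, Yc.T @ beta, mx, my, s[:n_components]


def retrieve(queries, docs, Wx, Wy, mx, my):
    """Rank language-2 documents for each language-1 query by cosine similarity."""
    q = (queries - mx) @ Wx  # queries projected into the shared space
    d = (docs - my) @ Wy     # documents projected into the same space
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return np.argsort(-(q @ d.T), axis=1)  # best match first


# Synthetic aligned "corpus": both languages are noisy views of shared topics.
rng = np.random.default_rng(0)
k, dx, dy, n_train, n_test = 10, 50, 60, 300, 40
A, B = rng.normal(size=(k, dx)), rng.normal(size=(k, dy))
Z = rng.normal(size=(n_train + n_test, k))
X = Z @ A + 0.1 * rng.normal(size=(n_train + n_test, dx))
Y = Z @ B + 0.1 * rng.normal(size=(n_train + n_test, dy))

Wx, Wy, mx, my, corr = fit_kcca(X[:n_train], Y[:n_train], n_components=k)
ranks = retrieve(X[n_train:], Y[n_train:], Wx, Wy, mx, my)
top1 = np.mean(ranks[:, 0] == np.arange(n_test))  # is the aligned doc ranked first?
print("canonical correlations:", np.round(corr, 3))
print(f"top-1 cross-lingual retrieval accuracy: {top1:.2f}")
```

Queries are projected with the weights learned for their own language and documents with the weights for theirs, so similarity is measured in the shared space rather than between the two vocabularies. For a nonlinear kernel, one would keep the dual weights `alpha` and `beta` and compute centred test-versus-training kernels instead of using the primal shortcut shown here.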