With the emergence of semi-structured data formats (such as XML), the storage of documents in centralised facilities appeared as a natural adaptation of data warehousing technology. Nowadays, OLAP (On-Line Analytical Processing) systems face growing volumes of non-numeric data. This chapter presents a framework for the multidimensional analysis of textual data in an OLAP sense. Document structure, metadata, and contents are converted into subjects of analysis (facts) and analysis axes (dimensions) within an adapted conceptual multidimensional schema. This schema represents the concepts that decision makers can manipulate in order to express their analyses. It widens the possibilities for multidimensional analysis, allowing a user to gain insight into a collection of documents.
The rapid expansion of information technologies has considerably increased the quantity of data available through electronic documents. The volume of this information is now so large that making sense of it is a difficult problem.