Strategies for Document Management

Karen Corral, David Schuff, Gregory Schymik, Robert St. Louis
DOI: 10.4018/978-1-4666-0279-3.ch007

Abstract

Keyword search has failed to adequately meet the needs of enterprise users, largely because of the size of document stores, the distribution of word frequencies, and the indeterminate nature of language. The authors argue that a different approach is needed, and they draw on the successes of dimensional data modeling and subject indexing to propose a solution. They test their solution by performing search queries on a large research database. By incorporating readily available subject indexes into the search process, they obtain order-of-magnitude improvements in the performance of search queries. Their performance measure is the ratio of the number of documents returned without subject indexes to the number of documents returned when subject indexes are used; a ratio of 10, for example, means that using subject indexes cuts the result set to one tenth of its size. The authors explain why the tenfold improvement observed on their research database can be expected for searches on a wide variety of enterprise document stores.
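
The performance measure in the abstract is a simple ratio, which a short sketch can make concrete. The following is a minimal Python illustration under stated assumptions: a toy in-memory corpus in which each document carries free text plus a set of subject-index terms. The `Document` class, the functions, and the sample documents are hypothetical; they are not the authors' database or implementation. Keyword matching is a naive whole-word AND match, and "using subject indexes" is modeled as restricting the keyword results to documents tagged with one subject term.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document: free text plus subject-index terms assigned by an indexer.
    doc_id: int
    text: str
    subjects: set[str] = field(default_factory=set)


def keyword_search(docs: list[Document], keywords: list[str]) -> list[Document]:
    """Return documents whose text contains every keyword as a whole word (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    results = []
    for d in docs:
        words = set(d.text.lower().split())
        if all(k in words for k in kws):
            results.append(d)
    return results


def subject_filtered_search(docs: list[Document], keywords: list[str], subject: str) -> list[Document]:
    """Keyword search restricted to documents tagged with the given subject term."""
    return [d for d in keyword_search(docs, keywords) if subject in d.subjects]


def improvement_ratio(docs: list[Document], keywords: list[str], subject: str) -> float:
    """Documents returned without subject indexes divided by documents returned with them."""
    without = len(keyword_search(docs, keywords))
    with_subject = len(subject_filtered_search(docs, keywords, subject))
    if with_subject == 0:
        raise ValueError("subject-filtered query returned no documents")
    return without / with_subject


# Hypothetical toy corpus (illustrative only, not the authors' data).
docs = [
    Document(1, "dimensional model for a sales data warehouse", {"data warehousing"}),
    Document(2, "a model of consumer choice", {"marketing"}),
    Document(3, "star schema model and fact tables", {"data warehousing"}),
    Document(4, "a queueing model for call centers", {"operations"}),
    Document(5, "keyword search in enterprise document stores", {"information retrieval"}),
]

print(improvement_ratio(docs, ["model"], "data warehousing"))  # → 2.0
```

Here the keyword "model" alone matches four documents, while adding the subject term "data warehousing" narrows the result to two, giving a ratio of 2.0. The toy value says nothing about the tenfold improvement the authors report; it only shows how the measure is computed.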

Chapter Preview

The Current Approach

Apple, Microsoft, and Google all market software designed to facilitate document retrieval. Apple claims that its Spotlight search tool “can find anything on your computer as quickly as you type” (Apple, 2009). Moreover, it claims that “you always find what you are looking for, even if you don’t know where to look.” Microsoft and Google make similar claims for their desktop search engines (Google, 2009b; Microsoft, 2009). Google states that its search tool “puts your information easily within your reach and frees you from having to manually organize your files, emails and bookmarks” (Google, 2009b).

These claims are impressive and might lead one to believe that the document management problem has been solved. They are even more impressive when one considers that the individual or company that owns the documents need not add any metadata or structure to the documents before storing them. In fact, for these search engines both organization and format are irrelevant. The artifacts (files, emails, contacts, images, calendars, music, etc.) need only be stored on a device accessible from a personal computer or a server. It makes no difference whether the documents are placed in a single folder or stored in an elaborate hierarchical structure. The presumption is that the content itself, together with the artifact’s metadata (owner, date created, size, file type, etc.), provides sufficient information to enable retrieval.
