A Model for Estimating the Savings from Dimensional vs. Keyword Search

Karen Corral (Boise State University, USA), David Schuff (Temple University, USA), Robert D. St. Louis (Arizona State University, USA) and Ozgur Turetken (Ryerson University, Canada)
DOI: 10.4018/978-1-60566-172-8.ch009

Abstract

Inefficient and ineffective search is widely recognized as a problem for businesses. The shortcomings of keyword searches have been described by many authors, and many enhancements to keyword searches have been proposed. To date, however, no one has provided a quantitative model or systematic process for evaluating the savings that accrue from enhanced search procedures. This paper presents a model for estimating the total cost to a company of relying on keyword searches versus a dimensional search approach. The model is based on the Zipf-Mandelbrot law in quantitative linguistics. Our analysis of the model shows that a surprisingly small number of searches is required to justify the cost of encoding the metadata needed to support a dimensional search engine. The results imply that it is cost-effective for almost any business organization to implement a dimensional search strategy.
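The chapter builds its cost model on the Zipf-Mandelbrot law, but this preview does not reproduce the model itself. As a hedged illustration of the law alone, the sketch below computes rank-frequency probabilities in the commonly cited form p(r) ∝ 1 / (r + q)^s. The function name `zipf_mandelbrot_probs` and the parameter values s = 1.0 and q = 2.7 are illustrative assumptions, not estimates taken from the chapter.

```python
def zipf_mandelbrot_probs(n_terms: int, s: float = 1.0, q: float = 2.7) -> list[float]:
    """Return normalized Zipf-Mandelbrot probabilities for ranks 1..n_terms.

    The unnormalized weight of the term at rank r is 1 / (r + q) ** s.
    s and q here are illustrative defaults, not values fitted in the chapter.
    """
    weights = [1.0 / (r + q) ** s for r in range(1, n_terms + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = zipf_mandelbrot_probs(10_000)
    # Share of all word occurrences accounted for by the 100 most frequent terms
    print(f"top-100 share: {sum(probs[:100]):.3f}")
```

Setting q = 0 recovers the classic Zipf law; a positive shift q flattens the curve for the highest-ranked terms.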

Introduction

People spend a tremendous amount of time searching for information. One estimate puts the time the average employee spends on unsuccessful searches at 3.5 hours a week (Ultraseek, 2006). For a 1,000-employee company, that amounts to $9.7 million a year in salary costs alone (Ultraseek, 2006). Some estimates put the cost as high as $33 million annually per company when the cost of recreating information that was not found is included (Thompson, 2004). Furthermore, 60% to 80% of queries over an intranet (as opposed to the Internet) are for material that the searcher has seen before (Mukherjee and Mao, 2004).
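The salary figure can be related back to the weekly hours with simple arithmetic. The sketch below backs out the hourly cost implied by the Ultraseek numbers. It assumes 52 working weeks a year, which the source does not state, so the implied rate is only approximate; the helper name `implied_hourly_cost` is hypothetical.

```python
def implied_hourly_cost(annual_cost: float, hours_per_week: float,
                        employees: int, weeks_per_year: int = 52) -> tuple[float, float]:
    """Return (total unsuccessful-search hours per year, implied cost per hour).

    weeks_per_year = 52 is an assumption, not a figure from the source.
    """
    total_hours = hours_per_week * weeks_per_year * employees
    return total_hours, annual_cost / total_hours


# Figures from Ultraseek (2006): 3.5 hours/week, 1,000 employees, $9.7 million/year
hours, rate = implied_hourly_cost(9_700_000, 3.5, 1_000)
print(f"{hours:,.0f} hours/year at ${rate:.2f} per hour")
# prints: 182,000 hours/year at $53.30 per hour
```

Assuming 50 working weeks instead raises the implied rate, so the figure should be read as a rough loaded salary cost rather than a precise wage.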

Keyword search has several well-known problems (for a review, see Blair, 2002), but its advantage over other methods is that once documents have been saved, the user has no additional work to perform. One alternative to keyword search is dimensional search. Dimensional search eliminates the ambiguity of words, which causes many of the problems with keyword search, by describing each document with pre-defined categories (dimensions), each of which has a finite set of possible values. Dimensional search has been shown to reduce the number of irrelevant documents returned in the result set (LaBrie, 2004). However, it requires a significant up-front investment of time. In particular, metadata must be stored about each document, and much of this information must be determined and entered by a person. So the question becomes: is the increased retrieval accuracy worth the initial cost of categorizing documents?
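To make the contrast concrete, here is a minimal sketch of the two retrieval styles over a toy corpus. The dimension names, allowed values, documents, and function names are all hypothetical, and real search engines tokenize and rank text far more elaborately than a whitespace split. The point is only that a keyword match on an ambiguous word such as "bank" returns every document containing it, whereas a dimensional query returns only documents whose human-entered metadata matches.

```python
from dataclasses import dataclass, field

# Hypothetical dimensions, each restricted to a finite set of allowed values.
DIMENSIONS = {
    "doc_type": {"contract", "memo", "report"},
    "department": {"finance", "legal", "hr"},
}


@dataclass
class Document:
    title: str
    text: str
    # Metadata entered by a person when the document is saved (the up-front cost).
    dims: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject any dimension or value outside the pre-defined vocabulary.
        for dim, value in self.dims.items():
            if value not in DIMENSIONS.get(dim, set()):
                raise ValueError(f"{value!r} is not an allowed value for {dim!r}")


def keyword_search(docs: list[Document], word: str) -> list[Document]:
    """Return documents whose text contains the word, whatever sense it is used in."""
    return [d for d in docs if word.lower() in d.text.lower().split()]


def dimensional_search(docs: list[Document], **criteria: str) -> list[Document]:
    """Return documents whose metadata matches every requested dimension value."""
    return [d for d in docs if all(d.dims.get(k) == v for k, v in criteria.items())]


docs = [
    Document("Loan terms", "terms agreed with the bank", {"doc_type": "contract", "department": "finance"}),
    Document("Offsite", "picnic on the river bank", {"doc_type": "memo", "department": "hr"}),
    Document("Q3 cash", "bank balances for Q3", {"doc_type": "report", "department": "finance"}),
]

print([d.title for d in keyword_search(docs, "bank")])
# prints: ['Loan terms', 'Offsite', 'Q3 cash']
print([d.title for d in dimensional_search(docs, department="finance", doc_type="contract")])
# prints: ['Loan terms']
```

The validation in `__post_init__` mirrors the idea of a finite value set per dimension; it is also where the human categorization effort shows up, since every document needs its `dims` filled in before it can be found this way.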

The content management market was estimated at over $1 billion in 2003 (Dunwoodie, 2004) and to have grown 9.7% in 2006 (Webster, 2007). Vendors make remarkable claims about the efficacy of their software, yet despite the money companies spend on it, there has been little academic work evaluating these systems. We want to determine the cost, in time, of performing a keyword search versus the cost, in time, of performing a dimensional search, including the initial time investment. Factors that affect the overall cost of searching include the start-up costs of any content management system, the size of the library (a small library is much easier to search exhaustively than a large one), the size of the documents in the library (books are more difficult to search than e-mail messages), and the cost of not finding the document.

In evaluating how best to study this question, we considered a number of research methodologies. A case study approach, which is largely what IDC, Gartner, and other commercial information providers use, would be hampered by a lack of generalizability. Also, employees could consider the collection of data on their searches invasive. If employees know that their time and actions are being tracked, they might perform searches outside the data collection, out of concern that the data might be used to evaluate their work rather than the content management software. Moreover, data drawn from a survey of content management product users would be difficult to compare, because the nature of searches might vary considerably by company as well as by user. There is the additional concern that users might not have an accurate sense of how long their searches take or how effective they are.

An experiment would need to account for all of the above factors and also ensure that participants were properly motivated. For these reasons, we elected to use an analytical modeling approach, which allows us to vary the values of the model's variables and examine their impact on search cost. From our model we were able to determine the break-even point, in terms of the number of searches, at which dimensional search becomes more cost-effective than keyword search. That is, we were able to determine the number of searches an organization must perform to justify the up-front cost of determining and entering the metadata required to support dimensional search.
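The threshold number of searches described above can be illustrated with a deliberately simplified cost model. This is not the chapter's model, which draws on the Zipf-Mandelbrot law and accounts for library size, document size, and the cost of not finding a document. It assumes only a fixed tagging time per document and fixed average times per keyword search and per dimensional search, all hypothetical inputs. Under those assumptions, dimensional search breaks even once the cumulative per-search savings cover the up-front tagging cost, i.e. at N* = ⌈D · t_tag / (t_kw − t_dim)⌉.

```python
import math


def break_even_searches(n_docs: int, tag_minutes_per_doc: float,
                        keyword_minutes_per_search: float,
                        dimensional_minutes_per_search: float) -> int:
    """Smallest N at which total dimensional cost <= total keyword cost.

    Simplified, hypothetical cost model (not the chapter's):
      keyword total     = N * keyword_minutes_per_search
      dimensional total = n_docs * tag_minutes_per_doc + N * dimensional_minutes_per_search
    """
    saving = keyword_minutes_per_search - dimensional_minutes_per_search
    if saving <= 0:
        raise ValueError("dimensional search must be faster per search to ever break even")
    upfront = n_docs * tag_minutes_per_doc
    return math.ceil(upfront / saving)


# Hypothetical inputs: 10,000 documents, 1 minute to tag each,
# 5 minutes per keyword search vs. 2 minutes per dimensional search.
print(break_even_searches(10_000, 1.0, 5.0, 2.0))
# prints: 3334
```

At 3,334 searches the keyword total is 16,670 minutes and the dimensional total is 16,668 minutes; one search earlier, keyword search is still cheaper by a minute.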

The rest of this paper is organized as follows. In the next section, we briefly review some of the most important findings in word frequency distributions. After that, we present the basic model for net search cost. We then present a model for estimating the net search cost of keyword searches, followed by a model for estimating the net search cost of dimensional searches. The output of the two models is then compared, followed by a discussion of the implications of the results and possible refinements of the model.
