It is routine to hear and read about the information explosion and how we are all overwhelmed with data and information. Is it progress when our search tools report that a query returned 300,000 hits? Or are we still left wondering where the information we really wanted is, and how far down the list we must go to find it? Discovery informatics is a distinctly 21st-century emerging methodology that brings together several threads of research and practice aimed at making sense of massive data sources. It is defined as “the study and practice of employing the full spectrum of computing and analytical science and technology to the singular pursuit of discovering new information by identifying and validating patterns in data” (Agresti, 2003).
The rapid rise in the amount of information generated each year may be quite understandable. After all, the world’s population is growing, and countries with very large populations, such as China and India, are becoming increasingly influential worldwide. However, the real reason people are confronted with so much more information in their lives and work is that the information offers them real benefits. Yet these benefits are not always, or even often, realized, and therein lies the motivation for discovery informatics.
Companies today are mining increasingly granular data to better understand their customers’ buying habits. As a result, all businesses are under pressure to attain the same level of understanding or be left behind, and being left behind in the 21st century can mean going out of business. Not-for-profits are becoming equally adept at mining data to discover which likely donors are most cost-effective to cultivate. Increasing granularity enables more targeted marketing, but it also produces more data that requires more analysis. A co-conspirator in this infoglut is the declining cost of storing data. Organizations no longer need to choose which data to keep. They can keep it all.
The task of making sense out of this burgeoning mass of data is growing more difficult every day. Effectively transforming this data into usable knowledge is the challenge of discovery informatics. In this broad-based conceptualization, discovery informatics may be seen as taking shape by drawing on more established disciplines:
Data analysis and visualization: analytic frameworks, interactive data manipulation tools, visualization environments
Database management: data models, data analysis, data structures, data management, federation of databases, data warehouses, database management systems
Pattern recognition: statistical processes, classifier design, image data analysis, similarity measures, feature extraction, fuzzy sets, clustering algorithms
Information storage and retrieval: indexing, content analysis, abstracting, summarization, electronic content management, search algorithms, query formulation, information filtering, relevance and recall, storage networks, storage technology
Knowledge management: knowledge sharing, knowledge bases, tacit and explicit knowledge, relationship management, content structuring, knowledge portals, collaboration support systems
Artificial intelligence: learning, concept formation, neural nets, knowledge acquisition, intelligent systems, inference systems, Bayesian methods, decision support systems, problem solving, intelligent agents, text analysis, natural language processing
What distinguishes discovery informatics is that it brings coherence across dimensions of technologies and domains to focus on discovery. It recognizes and builds upon excellent programs of research and practice in individual disciplines and application areas. It looks selectively across these boundaries to find anything (e.g., ideas, tools, strategies, and heuristics) that will help with the critical task of discovering new information.
To help characterize discovery informatics, it may be useful to see if there are any roughly analogous developments elsewhere. Two examples, knowledge management and core competence, may be instructive as reference points.