This chapter mainly focuses on biomedical knowledge representation and its use in biomedicine. It first surveys the most relevant existing bioinformatics resources and explains why they need to be better integrated. It then describes the main problems that machines encounter when processing factual biomedical knowledge, explains what terminologies, classifications and ontologies are, and shows why they can help to better organize and exploit the bioinformatics resources available online. The authors hope that a concise overview of the field, together with a list of selected resources annotated with their scope and usability, will help interested readers quickly understand the main principles of knowledge representation in biomedicine and its high relevance to modern biomedical research and e-health.
Key Terms in this Chapter
Terminology: A collection of names of the entities involved in a domain. It simply lists the principal terms used in the domain, without any further information. Although this is a rather simplistic approach, it is extremely useful because it helps computer programs recognize the relevant terms and concentrate only on them. It can sometimes be difficult to tell a terminology from a controlled vocabulary: the former is just a list of the terms used in a domain, while the latter guarantees that its terms are precise, accurate and unequivocal.
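As an illustration of how a bare list of terms already lets a program "recognize the relevant terms", the following is a minimal sketch in Python. The term list and the function name find_terms are hypothetical; real systems use much larger terminologies and more robust matching (tokenization, stemming, abbreviations).

```python
import re

# Hypothetical toy terminology: just a flat list of domain terms,
# with no definitions and no relations between them.
TERMINOLOGY = ["gene expression", "lipid metabolism", "metabolism", "protein", "mRNA"]

def find_terms(text, terminology):
    """Return the terminology terms occurring in `text`, longest terms checked first.

    Matching is case-insensitive and restricted to whole words. Once a longer
    term is matched, its occurrences are blanked out, so a shorter term nested
    inside it (e.g. "metabolism" in "lipid metabolism") is not reported again.
    """
    found = []
    masked = text.lower()
    for term in sorted(terminology, key=len, reverse=True):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, masked):
            found.append(term)
            # Replace every matched span with spaces of the same length.
            masked = re.sub(pattern, lambda m: " " * len(m.group()), masked)
    return found

print(find_terms("Defects in lipid metabolism alter protein levels.", TERMINOLOGY))
```

Here the program ignores every word that is not in the list and reports only "lipid metabolism" and "protein"; note that nothing tells it whether these terms are precise or unambiguous, which is exactly the limitation of a plain terminology.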
Biomedical Informatics: The discipline that studies biomedical information and knowledge, focusing in particular on their structure, acquisition, integration, management, and optimal use. It adopts and applies results from a variety of other disciplines including Information Science, Computer Science, Cognitive Science, Statistics and Biometrics, Mathematics, Artificial Intelligence, Operations Research, and basic and clinical Health Sciences.
Proteomics: The study of the complete set of proteins (amino acid sequences) of an organism, translated from its different transcripts (mRNA sequences transcribed from a DNA nucleotide sequence).
Biomolecular or Genomic Databank: A structured repository of biomolecular, genomic or proteomic data, often integrated with related biological, medical, clinical, or experimental information. Generally it also provides interfaces and tools for browsing, querying, and sometimes analyzing the data it contains.
Genomics: The systematic identification and study of genomes, each genome being the whole genetic material of a living organism.
Bioinformatics: A joint branch of biology and informatics concerned with the development of techniques for the collection and manipulation of biological data, and with the use of such data to make biological discoveries or predictions. It comprises all computational methods and theories applicable to molecular biology and the computer-based techniques for solving biological problems, including the manipulation of models and datasets.
Classification: A collection of terms organized in categories. Thus, it includes only the is-a relationship between terms. This enables machines to group lower-level terms under their higher-level ancestors, e.g. grouping all terms related to “lipid metabolism” under the upper term “metabolism.”
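The grouping described above amounts to following is-a links upward until the desired category is reached. The Python sketch below assumes, for simplicity, a tree in which each term has a single is-a parent; the terms and the helper names ancestors and group_under are illustrative only, and real biomedical classifications can be larger and less regular.

```python
# Hypothetical toy classification: each term maps to its single is-a parent.
IS_A = {
    "fatty acid oxidation": "lipid metabolism",
    "cholesterol biosynthesis": "lipid metabolism",
    "glycolysis": "carbohydrate metabolism",
    "lipid metabolism": "metabolism",
    "carbohydrate metabolism": "metabolism",
}

def ancestors(term, is_a):
    """Return the chain of is-a ancestors of `term`, nearest first."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain

def group_under(terms, category, is_a):
    """Return the terms that fall under `category` (the category itself included)."""
    return [t for t in terms if t == category or category in ancestors(t, is_a)]

terms = ["glycolysis", "fatty acid oxidation", "cholesterol biosynthesis"]
print(group_under(terms, "lipid metabolism", IS_A))
print(group_under(terms, "metabolism", IS_A))
```

Grouping under “lipid metabolism” keeps only the two lipid-related terms, while grouping under the upper term “metabolism” collects all three, because every chain of is-a links eventually reaches it.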
Semantic Network: A graph structure useful to represent the knowledge of a domain. It is composed of a set of objects, the graph nodes, which represent the concepts of the domain, and of relations among such objects, the graph arcs, which represent the domain knowledge. Semantic networks are also a reasoning tool, since they make it possible to find relations between concepts that are not directly connected. To this aim, it is enough to “follow the arrows”, i.e. the arcs leaving the considered nodes, and find the node in which the paths meet.
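The "follow the arrows" procedure can be sketched as two breadth-first searches, one from each concept, along outgoing arcs; the nodes reached by both searches are where the paths meet. In the Python sketch below, the small network, the relation names and the functions reachable and meeting_nodes are hypothetical, and ranking the meeting nodes by total path length is only one possible choice.

```python
from collections import deque

# Hypothetical toy semantic network: node -> list of (relation, target) arcs.
NETWORK = {
    "insulin": [("is_a", "hormone"), ("regulates", "glucose metabolism")],
    "glucagon": [("is_a", "hormone"), ("regulates", "glucose metabolism")],
    "hormone": [("is_a", "signaling molecule")],
    "glucose metabolism": [("is_a", "carbohydrate metabolism")],
}

def reachable(start, network):
    """Breadth-first search along outgoing arcs; return {node: number of arcs from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in network.get(node, []):
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist

def meeting_nodes(a, b, network):
    """Return the nodes reachable from both a and b, shortest combined path first."""
    da, db = reachable(a, network), reachable(b, network)
    common = set(da) & set(db)
    return sorted(common, key=lambda n: (da[n] + db[n], n))

print(meeting_nodes("insulin", "glucagon", NETWORK))
```

Insulin and glucagon have no direct arc between them, yet their paths meet first at “glucose metabolism” and “hormone”, revealing an indirect relation: both are hormones involved in glucose metabolism.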
Controlled vocabulary: A collection of precise and universally understandable terms that define and identify the concepts of a domain in a unique and unequivocal way, e.g. the anatomical terminology. Such a vocabulary is called controlled because it is defined and kept up to date by domain experts, the curators. Controlled vocabularies are very useful in extended and complex domains, such as Medicine and Biology, where distinct concepts must be identified with high precision in order to codify, analyze, and communicate the domain knowledge. Controlled vocabularies are similar to terminologies; the difference is that a terminology does not guarantee that its terms are precise, accurate and unequivocal, being rather a list of the terms used in a specific domain.
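One way a program can exploit the "unique and unequivocal" property is to resolve every accepted term to exactly one curated concept identifier. The Python sketch below assumes a hypothetical identifier scheme ("CV:0001", ...) and made-up entries; it refuses to build an index in which the same term would point to two different concepts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    identifier: str       # unique concept ID (hypothetical scheme)
    preferred_term: str   # the curated, precise name
    synonyms: tuple = ()  # other accepted surface forms

# Hypothetical curated entries; the IDs are invented for illustration.
VOCABULARY = [
    Concept("CV:0001", "myocardial infarction", ("heart attack",)),
    Concept("CV:0002", "hypertension", ("high blood pressure",)),
]

def build_index(vocabulary):
    """Map every accepted term (lower-cased) to exactly one concept.

    Raises ValueError if a term would point to two different concepts,
    since a controlled vocabulary must identify each concept unequivocally.
    """
    index = {}
    for concept in vocabulary:
        for term in (concept.preferred_term, *concept.synonyms):
            key = term.lower()
            if key in index and index[key] is not concept:
                raise ValueError(f"ambiguous term: {term!r}")
            index[key] = concept
    return index

index = build_index(VOCABULARY)
print(index["heart attack"].identifier, index["heart attack"].preferred_term)
```

Different surface forms such as “heart attack” and “myocardial infarction” resolve to the same identifier, so data annotated with either term can be codified and compared consistently.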
Ontology: A semantic structure useful to standardize and provide rigorous definitions of the terminology used in a domain and to describe the knowledge of that domain. It is composed of a controlled vocabulary, which describes the concepts of the considered domain, and a semantic network, which describes the relations among such concepts. Each concept is connected to other concepts of the domain through semantic relations that specify the domain knowledge. A general concept can be described by several terms, which may be synonyms or be specific to the different domains in which the concept appears. For this reason, ontologies tend to have a hierarchical structure, with generic concepts/terms at the higher levels of the hierarchy and specific concepts/terms at the lower levels, connected by different types of relations.
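To make the two components of an ontology concrete, the Python sketch below keeps a controlled-vocabulary part (terms and synonyms mapped to concept IDs) and a semantic-network part (typed subject-relation-object triples). The concept IDs, the relation names is_a and part_of as used here, and the functions related and children are illustrative assumptions, not taken from any real ontology.

```python
# Hypothetical minimal ontology: controlled vocabulary + typed semantic relations.
TERMS = {  # surface term -> concept ID (controlled-vocabulary part)
    "cell": "C1",
    "neuron": "C2", "nerve cell": "C2",  # synonyms share one concept
    "axon": "C3",
    "plasma membrane": "C4",
}
RELATIONS = [  # (subject, relation, object) triples (semantic-network part)
    ("C2", "is_a", "C1"),
    ("C3", "part_of", "C2"),
    ("C4", "part_of", "C1"),
]

def related(term, relation, terms=TERMS, relations=RELATIONS):
    """Resolve `term` to its concept, then return the concepts it points to via `relation`."""
    concept = terms[term.lower()]
    return sorted(o for s, r, o in relations if s == concept and r == relation)

def children(term, relation="is_a", terms=TERMS, relations=RELATIONS):
    """Return the more specific concepts that point to `term` via `relation`."""
    concept = terms[term.lower()]
    return sorted(s for s, r, o in relations if o == concept and r == relation)

print(related("nerve cell", "is_a"))    # the generic concept above "nerve cell"
print(children("neuron", "part_of"))   # the concepts that are parts of a neuron
```

Because “neuron” and “nerve cell” resolve to the same concept, queries give the same answer whichever synonym is used, and navigating is_a links upward or downward moves between the generic and specific levels of the hierarchy.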