Semantification of Large Corpora of Technical Documentation

Sebastian Furth (denkbares GmbH, Germany) and Joachim Baumeister (denkbares GmbH, Germany & University of Würzburg, Germany)
Copyright: © 2016 | Pages: 30
DOI: 10.4018/978-1-5225-0293-7.ch011

The complexity of machines has grown dramatically in recent years. Today, they are built as complex functional networks of mechanical, electronic, and hydraulic components. As a result, technical documentation has become a fundamental source of information for service technicians in their daily work. Technicians need fast and focused access methods to handle the massive volumes of documentation. For this reason, semantic search has emerged as the new paradigm for presenting technical documentation. However, existing large corpora of legacy documentation are usually not semantically prepared. This creates a seemingly insurmountable gap between new technological opportunities and the actual data quality at companies. This chapter presents a novel and comprehensive approach for the semantification of large volumes of legacy technical documents. The approach especially tackles the veracity and variety found in technical documentation and makes explicit use of its typical characteristics. The experiences with the implementation and the benefits gained are discussed in industrial case studies.
Chapter Preview


In the field of Information Extraction and Text Analytics, established methods exist for extracting semantic information from natural language texts. Most of these methods are based on supervised Machine Learning approaches, which require a sufficient amount of training data to produce decent results. In real-world scenarios, such training data is often not available, and creating it is not justifiable from a cost-benefit perspective.
