KYOTO: A Wiki for Establishing Semantic Interoperability for Knowledge Sharing Across Languages and Cultures

Piek Vossen (VU University Amsterdam, The Netherlands), Eneko Agirre (EHU, Spain), Francis Bond (Nanyang Technological University, Singapore), Wauter Bosma (VU University Amsterdam, The Netherlands), Axel Herold (BBAW, Germany), Amanda Hicks (BBAW, Germany), Shu-Kai Hsieh (National Taiwan Normal University, Taiwan), Hitoshi Isahara (NICT, Japan), Chu-Ren Huang (Hong Kong University, China), Kyoko Kanzaki (NICT, Japan), Andrea Marchetti (CNR-IIT, Italy), German Rigau (EHU, Spain), Francesco Ronzano (CNR-IIT, Italy), Roxane Segers (VU University Amsterdam, The Netherlands) and Maurizio Tesconi (CNR-IIT, Italy)
KYOTO is an Asian-European project developing a community platform for modeling knowledge and finding facts across languages and cultures. The platform operates as a Wiki system that multilingual and multi-cultural communities can use to agree on the meaning of terms in specific domains. The Wiki is fed with terms that are automatically extracted from documents in different languages. Users can modify these terms and relate them across languages. The system generates complex, language-neutral knowledge structures that remain hidden from the user but that can be used to apply open text mining to text collections. The resulting database of facts will be browsable and searchable. Knowledge is shared across cultures by modeling the knowledge across languages. The system is developed for seven languages and applied to the domain of the environment, but it can easily be extended to other languages and domains.
Chapter Preview

Information And Knowledge In The Environment Domain

The globalization of markets and communication brings with it a concomitant globalization of problems and the need for new solutions. Timely examples are global warming, climate change and other environmental issues related to rapid growth and economic development. Environmental problems can be acute, requiring immediate support and action that rely on information available elsewhere. Knowledge sharing and transfer are also essential for sustainable growth and development in the longer term. In both cases, it is important that distributed information and experience can be re-used on a global scale. The globalization of problems and their solutions requires that information and communication be supported across a wide range of languages and cultures. Such a system should furthermore allow both experts and laymen to access this information in their own language, without recourse to cultural background knowledge.

Key Terms in this Chapter

Role Concept: A concept is a role if it is not rigid, which means that it is not essential to some or all of its instances. For example, invasive species is a role because certain species may become invasive at some point in time and become native at a later point in time.

Rigid Concept: A concept is rigid if it is essential to all of its instances. For example, the concept animal is rigid because everything that is an animal must be an animal and is an animal for as long as it exists. It cannot cease being an animal and change into, for example, a plant.

Synset: set of synonyms that represent a single concept

Cultural Interoperability: the degree to which knowledge and information are anchored to a unified model of meaning across cultures

WordNet: lexical semantic database in which concepts are represented by sets of synonyms in a language, so-called synsets, with semantic relations between these concepts

Ontology: formalized database of conceptual knowledge that can be used by computers to do inferencing

Knowledge Mining: computer systems that extract knowledge from natural language text

Semantic Interoperability: the degree to which natural language text and resources are anchored to a unified model of meaning across resources and languages

Information Extraction: computer systems that extract information defined in a template from natural language text

Text Mining: computer systems that extract information from natural language text
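
To make some of the terms above concrete, the following is a minimal sketch of how synsets in different languages might be anchored to shared ontology concepts, and how the rigid/role distinction could be recorded. It is an illustration only and not the KYOTO implementation: the class names, the field names, the helper functions and the sample entries are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """An ontology concept (hypothetical structure, for illustration)."""
    name: str
    rigid: bool  # True: essential to all instances; False: a role


@dataclass(frozen=True)
class Synset:
    """A set of synonyms in one language that represent a single concept."""
    language: str      # language code, e.g. "en"
    lemmas: frozenset  # the synonyms that make up the synset
    concept: str       # name of the shared concept the synset is anchored to


# Illustrative sample data, not taken from any real wordnet or ontology.
ONTOLOGY = {
    c.name: c
    for c in (
        Concept("animal", rigid=True),
        Concept("invasive species", rigid=False),
    )
}

SYNSETS = [
    Synset("en", frozenset({"animal", "creature"}), "animal"),
    Synset("nl", frozenset({"dier", "beest"}), "animal"),
    Synset("en", frozenset({"invasive species"}), "invasive species"),
    Synset("es", frozenset({"especie invasora"}), "invasive species"),
]


def concepts_for(lemma, language):
    """Return the names of all concepts that a lemma is anchored to."""
    return {
        s.concept
        for s in SYNSETS
        if s.language == language and lemma in s.lemmas
    }


def equivalents(lemma, language):
    """Map each other language to the lemmas sharing a concept with lemma."""
    targets = concepts_for(lemma, language)
    found = {}
    for s in SYNSETS:
        if s.language != language and s.concept in targets:
            found.setdefault(s.language, set()).update(s.lemmas)
    return found


def is_role(lemma, language):
    """Return True if the lemma is anchored to at least one non-rigid concept."""
    return any(
        not ONTOLOGY[name].rigid for name in concepts_for(lemma, language)
    )
```

In this sketch, calling `equivalents("animal", "en")` finds the Dutch lemmas anchored to the same concept, which is one simple reading of semantic interoperability across languages, and `is_role("invasive species", "en")` reports that the term denotes a role rather than a rigid concept.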

