Representation Languages for Unstructured ‘Narrative’ Documents

Gian Piero Zarri (University Paris Est and LISSI Laboratory, France)
Copyright: © 2011 |Pages: 14
DOI: 10.4018/978-1-59904-931-1.ch132


A large amount of important, ‘economically relevant’ information is buried in unstructured, multimedia ‘narrative’ resources. This is true, e.g., of most corporate knowledge documents (memos, policy statements, reports, minutes, etc.), news stories, normative and legal texts, medical records, many intelligence messages, the ‘storyboards/historians’ describing sequences of events in industrial plants, surveillance videos, news photos for newspapers and magazines, and much eLearning material (text, image, video, sound…), as well as, more generally, of a huge fraction of the information stored on the Web. In these ‘narrative documents’, or ‘narratives’, the main part of the information content consists of the description of ‘events’ that relate the real or intended behavior of some ‘actors’ (characters, personages, etc.). The term ‘event’ is taken here in its most general meaning, also covering closely related notions like fact, action, state, situation, etc. These actors try to attain a specific result, experience particular situations, manipulate some (concrete or abstract) materials, send or receive messages, buy, sell, deliver, etc. Note that, in these narratives, the actors or personages are not necessarily human beings; we can have narrative documents concerning, e.g., the vicissitudes in the journey of a nuclear submarine (the ‘actor’, ‘subject’ or ‘personage’) or the various avatars in the life of a commercial product. Note also that, even if a large share of narrative documents are natural language (NL) texts, this is not necessarily the case, and ‘narratives’ are genuinely ‘multimedia’. A photo representing a situation that, verbalized, could be expressed as “The US President is addressing the Congress” is of course not an NL text, yet it is still a narrative document.
Because of the ubiquity of these ‘narrative’ resources, being able to represent their semantic content (i.e., their key ‘meaning’) in a general, accurate, and effective way is both conceptually relevant and economically important: narratives form, in fact, a huge underutilized component of organizational knowledge, and people could be willing to pay for a system able to process this information in an ‘intelligent’ way and/or for the results of the processing. This type of explicit yet unstructured knowledge can, of course, be indexed and searched in a variety of ways, but it requires an approach to formal analysis and effective utilization that is clearly different from the ‘traditional’ ones.

Key Terms in this Chapter

Predicative Occurrences: Conceptual structures obtained by instantiating templates and used to represent particular elementary events. To take into account the ‘connectivity phenomena’ (see the corresponding key term), the conceptual labels denoting predicative occurrences can be associated within second-order structures that make use of operators like CAUSE, GOAL, COORD(ination), etc.

Corporate Memories and Narrative Documents: Knowledge is one of the most important assets of an enterprise, provided that it can be controlled, shared and reused in an effective way. The core of any commercial/industrial organization can then be conceived in the form of a general and shared ‘Corporate Memory’, i.e., an on-line, computer-based storehouse of expertise, experience and documentation about all the strategic aspects of the organization. Given that this corporate knowledge is mainly represented in the form of narrative documents, having tools for the effective management of these documents becomes an essential condition for the concrete set-up and the ‘intelligent’ exploitation of non-trivial Corporate Memories.

Connectivity Phenomena: A term drawn from Computational Linguistics: in the presence of several logically linked elementary events, it denotes the existence of a global information content that goes beyond the simple addition of the information conveyed by the single events. The connectivity phenomena are linked with the presence of logico-semantic relationships like causality, goal, indirect speech, co-ordination and subordination, etc., as in a sequence like: “Company X has sold its subsidiary Y to Z because the profits of Y have fallen dangerously in recent years due to a lack of investments”. These phenomena cannot be managed by the usual ontological tools; in NKRL, they are dealt with using second-order tools based on reification.
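To make the idea of reification more concrete, the example sentence above can be sketched in Python as three elementary events plus second-order links whose arguments are event labels rather than concepts. This is only an illustration: the labels `occ1` to `occ3`, the tuple layout and the helper `causal_chain` are assumptions made for this sketch and do not reproduce NKRL's actual syntax.

```python
# Elementary events, each identified by a conceptual label.
# The labels and the plain-text descriptions are hypothetical.
events = {
    "occ1": "Company X sells its subsidiary Y to Z",
    "occ2": "The profits of Y fall dangerously",
    "occ3": "Y lacks investments",
}

# Second-order structures: the arguments are labels of events, not concepts.
# Each tuple reads (operator, effect_label, cause_label); this layout is assumed.
links = [
    ("CAUSE", "occ1", "occ2"),  # the sale is explained by the falling profits
    ("CAUSE", "occ2", "occ3"),  # the falling profits are explained by the lack of investments
]


def causal_chain(start, links):
    """Follow CAUSE links from an event label back to its earliest known cause."""
    causes = {effect: cause for op, effect, cause in links if op == "CAUSE"}
    chain = [start]
    # Stop when no further cause is recorded or when a label would repeat.
    while chain[-1] in causes and causes[chain[-1]] not in chain:
        chain.append(causes[chain[-1]])
    return chain


for label in causal_chain("occ1", links):
    print(label, "->", events[label])
```

The global content of the sentence is carried by `links`, not by any single entry in `events`, which is the point of treating event labels as arguments in their own right.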

Semantic Networks: Basically, directed graphs (digraphs) where the nodes represent concepts and the arcs represent different kinds of associative links: not only the ‘classical’ IsA and property-value links but also, e.g., ‘ternary’ relationships derived from Case Grammar in Linguistics and labeled as Actor, Object, Recipient, Instrument, etc. Representational solutions that can be reduced in some way to a Semantic Network framework include, among (many) other things, Ceccato’s Correlational Grammar (which goes back to the fifties), Quillian’s Semantic Memory, Schank’s Conceptual Dependency theory, Sowa’s Conceptual Graphs, Lenat’s CYC, Zarri’s NKRL (Narrative Knowledge Representation Language), etc. Semantic Network solutions have often been used or proposed to represent different kinds of narrative phenomena.
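A labeled digraph of this kind can be sketched in a few lines of Python. The class `SemanticNetwork`, its methods and the node names below are hypothetical and chosen only for illustration; the sketch does not correspond to any of the systems listed above.

```python
from collections import defaultdict


class SemanticNetwork:
    """Minimal labeled digraph: nodes are concept names, arcs carry a link label."""

    def __init__(self):
        self._arcs = defaultdict(list)  # source node -> list of (label, target node)

    def add(self, source, label, target):
        self._arcs[source].append((label, target))

    def targets(self, source, label):
        """All nodes reached from `source` through arcs with the given label."""
        return [t for arc_label, t in self._arcs.get(source, []) if arc_label == label]

    def is_a(self, node, ancestor):
        """True if `ancestor` is `node` itself or is reachable through IsA arcs."""
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.targets(current, "IsA"))
        return False


net = SemanticNetwork()
# 'Classical' IsA links between concepts (names invented for the example).
net.add("subsidiary", "IsA", "company")
net.add("company", "IsA", "organization")
# Case-grammar style links attached to an event node.
net.add("sale_1", "Actor", "company_x")
net.add("sale_1", "Object", "subsidiary_y")
net.add("sale_1", "Recipient", "company_z")

print(net.is_a("subsidiary", "organization"))
print(net.targets("sale_1", "Recipient"))
```

Storing the arc label alongside the target keeps IsA links and case-role links in the same graph, so both taxonomic queries and role lookups go through the same structure.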

Templates: In NKRL, templates take the form of combinations of quadruples connecting together the ‘symbolic name’ of the template, a ‘predicate’ (such as BEHAVE, MOVE, OWN, PRODUCE…) and the ‘arguments’ of the predicate (concepts or combinations of concepts) introduced by named relations, the ‘roles’ (like SUBJ(ect), OBJ(ect), SOURCE, BEN(e)F(iciary), etc.). The quadruples have the ‘name’ and ‘predicate’ components in common. If we denote by $L_i$ the generic symbolic label identifying a given template, by $P_j$ the predicate used in the template, by $R_k$ the generic role and by $a_k$ the corresponding argument, the NKRL core data structure for templates has the following general format: $(L_i\ (P_j\ (R_1\ a_1)\ (R_2\ a_2)\ \ldots\ (R_n\ a_n)))$. Templates are included in an inheritance hierarchy, HTemp(lates), which implements in practice the ‘ontology of events’ of NKRL.
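The general format above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions: the class `Template`, its methods, the label `example_move_template` and the argument names are all invented for this sketch and are not NKRL's actual implementation or vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical rendering of the format (L (P (R1 a1) (R2 a2) ... (Rn an)))."""
    label: str          # L: symbolic name of the template
    predicate: str      # P: e.g. BEHAVE, MOVE, OWN, PRODUCE
    roles: tuple = ()   # ordered (role, argument) pairs

    def quadruples(self):
        """One (label, predicate, role, argument) quadruple per role.

        All quadruples share the same label and predicate components.
        """
        return [(self.label, self.predicate, role, arg) for role, arg in self.roles]

    def to_sexpr(self):
        """Render the template in the nested-parenthesis format."""
        inner = " ".join(f"({role} {arg})" for role, arg in self.roles)
        return f"({self.label} ({self.predicate} {inner}))"


# Illustrative template; the label and the argument concepts are assumptions.
move = Template(
    label="example_move_template",
    predicate="MOVE",
    roles=(
        ("SUBJ", "human_being"),
        ("OBJ", "physical_object"),
        ("BENF", "human_being"),
    ),
)

print(move.to_sexpr())
for quadruple in move.quadruples():
    print(quadruple)
```

Keeping the roles as an ordered tuple of pairs mirrors the text: each pair expands into one quadruple, and the shared label and predicate are stored only once.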

Narrative Documents or ‘Narratives’: Multimedia documents (very often, unstructured, natural language documents like memos, policy statements, reports, minutes, news stories, normative and legal texts etc.) that constitute a huge underutilized component of corporate knowledge. In these ‘narratives’, the main part of the information content consists in the description of ‘events’ that relate the real or intended behavior of some ‘actors’ (characters, personages, etc.): these try to attain a specific result, experience particular situations, manipulate some (concrete or abstract) materials, send or receive messages, buy, sell, deliver etc. ‘Classical’ ontologies are inadequate for representing and exploiting narrative knowledge in a non-trivial way.

The Narrative Knowledge Representation Language (NKRL): ‘Classical’ ontologies are largely sufficient to provide a static, a priori definition of the concepts and of their properties. This is no longer true when we consider the dynamic behavior of the concepts, i.e., when we want to describe their mutual relationships as they take part in some concrete action, situation, etc. (‘events’). NKRL deals with this problem by adding to the usual ontology of concepts an ‘ontology of events’, a new sort of hierarchical organization where the nodes, called ‘templates’, represent general classes of events like “move a physical object”, “be present in a place”, “produce a service”, “send/receive a message”, etc.
