A Dynamic Interoperability Model for an Emergent Middleware Framework

Standard middleware platforms are unable to cope with extreme heterogeneity and dynamicity of distributed systems. With new trends in mobile/pervasive applications, distributed systems are required to connect to one another at run time, implying that heterogeneities arising in systems need to be resolved on the fly. This ability of a system to interact with a different system is known as interoperability. More advanced solutions, which exceed the state-of-the-art in middleware, are required to handle interoperability on the fly. This paper investigates the challenges of enabling dynamic interoperability for the domain of vehicular ad-hoc networks (VANETs). The paper uses semantic web technologies to help devise an emergent middleware to enable different VANETs to interact with each other at runtime. An ontology-based framework coupled with an experimental evaluation of the framework is presented. The need for linguistic techniques in assisting ontologies is also emphasized in the framework.


INTRODUCTION
Interoperability is the ability of a system to interact with other systems, possibly from different manufacturers, with the intent of sharing services and data among them. Middleware plays an important role in bridging the barrier between heterogeneous systems by providing a layer of abstraction to mask the heterogeneity of the underlying operating systems and hardware devices used. As long as there is one middleware standard operating, interoperability among the systems can be achieved. However, such common standards in the field of distributed systems are rare owing to the plethora of middleware standards competing on the market. Recent distributed systems contribute to a wider variety of services such as sensor-based, mobile, ubiquitous and Internet-based. The level of heterogeneity, exhibited through the usage of hardware devices, operating systems, varying communication styles (infrastructure-based, ad-hoc and mobile) and new programming paradigms, significantly surpasses that of previous generation distributed systems. Given the high heterogeneity and dynamicity of evolving systems, the level of complexity in achieving interoperability has considerably increased. It is clear that such a problem necessitates more advanced and principled solutions that go beyond the state-of-the-art in middleware. This implies investigating new methodologies to explore the heterogeneity issue in distributed systems at run time. In particular, this paper looks into the potential use of ontologies to address heterogeneity in distributed systems, more specifically in VANETs. The aims of this work can be summed up as:
1. To investigate the potential role of ontologies in an emergent middleware architecture to resolve the underlying heterogeneity arising at different levels in a system.
2. To investigate the potential role of ontologies in an emergent middleware architecture to enable semantic reasoning at a conceptual level of a system.
The work presented in this paper was carried out for a PhD thesis, and detailed information about the proposed framework can be found in (Nundloll, 2013).

ILLUSTRATING THE CHALLENGES
Interoperation is possible between two distributed systems if both are able to interact at their application, middleware and network levels. Applications differ in terms of their data formats and interface methods; middleware protocols differ in their behaviour and message formats; and network protocols differ in their communication styles and packet formats. The need to handle heterogeneity in a distributed system has been highlighted in (Bennaceur, 2015), where a unified approach is adopted to deal with the interoperability issue at the application and the middleware layers. The emergent middleware concept has been contributed by the distributed systems community, whereby different systems are able to establish a dynamic connection at run time. The purpose of such an emergent middleware is to provide a level of abstraction over the different middleware standards employed by these systems. (Blair, 2011; Grace, 2009; Nundloll, 2011) present the contributions of the Connect project in enabling interoperability through an emergent middleware.

Application Level Heterogeneity
Fig. 1 illustrates two applications (A and C) that provide a similar service but differ in the way they have been designed and in their data representation formats. For example, A uses a method getServices to retrieve all take-away delivery services for a given area, enabling the user to choose the nearest service, whilst C finds the nearest take-away service for the user through a findNearestService method. Data such as item price can be defined in pounds by A and in euros by C. Such differences hinder A and C if they want to interoperate with each other.

Middleware Level Heterogeneity
As illustrated in Fig. 2, system A uses CORBA (OMG, 2022) to communicate whereas system C uses Web Services (David, 2004). Web Services are message-based, and they have no concept of objects. However, they provide abstraction over the hardware, operating system and programming language used. Therefore, distributed services running on different platforms can still speak to other services through the medium of web service interfaces. The architecture basically enables a client to send and receive messages. The protocol used for message exchange is normally SOAP (David, 2004). Whilst CORBA uses IDL to advertise its services, Web Services use WSDL. CORBA and Web Services differ significantly in their architectural and communication styles. Interaction between different middleware systems relies on a successful exchange of messages. However, messages produced by each middleware protocol vary in their formats. As shown in Fig. 2, CORBA messages cannot be un-marshalled by System C.

Network Level Heterogeneity
Each network uses a different routing protocol to route packets from the sender to the receiver. For instance, in system A (Fig. 2), the routing protocol used can be OSPF while in system C, OLSR (Jacquet, 2001) can be used to route packets. Since each network uses a different routing protocol, each network node maintains its own routing table. Hence, one network cannot route packets coming from the other network. This is a serious hindrance, which needs to be tackled to enable interoperability between heterogeneous systems. One possible solution is to use a gateway node that maintains a routing table containing routing information for both networks. Though this is a feasible solution, it is not applicable where two different networks encounter each other at run time. The routing protocols used by the different networks are significantly different and exhibit heterogeneity in the data format of the routing packets and also in the behaviour of the protocols themselves.

Middleware Community
The middleware community is actively involved in providing solutions such as software bridges and interoperability frameworks. A software bridge acts as a one-to-one mapping between two different middleware environments to establish communication between them. It takes messages from a client in one format and marshals them to the format of the server middleware. For example, the OMG has created the DCOM/CORBA Inter-working specification (OMG, 2022). Additionally, SOAP2CORBA (Soap2Corba and Corba2Soap) bridges SOAP and CORBA middleware systems. Model Driven Architecture advocates the generation of such bridges to underpin deployed interoperable solutions. However, the development of these bridges is a resource-intensive and time-consuming task. Enabling universal interoperability will necessitate developing a bridge for every protocol pair. This also implies that a future protocol will require a mapping to every existing protocol. Finally, software bridges must normally be deployed and available in the network, which is not a feasible solution where environments are resource constrained.
Interoperability frameworks are grouped in two sets. In one set, the data is translated into an intermediary representation at the source and then translated to the legacy format at the destination. Examples are the Enterprise Service Buses (ESB) (Menge, 2007), INDISS (Y.-D. Bromberg & Issarny, 2005), uMiddle (Nakazawa, 2006) and SeDIM (Flores-Cortés, 2007). However, this solution is possible only between two protocols which show matching behaviour. This means that protocols with no matching attributes are unable to interoperate. With the increase in protocols showing different behaviours, interoperability is limited to a subset. The second set of interoperability frameworks substitutes the communication middleware such that it behaves as the peer or the server it wishes to contact. Examples are ReMMoC (Grace, 2003), UIC (Román, 2001) and WSIF (Duftler, 2001). However, this approach is also resource consuming and requires that every middleware is designed in such a way that it can be substituted. Moreover, this approach is generally limited to client-side interoperability with heterogeneous servers.
All these solutions seem quite insufficient where dynamic interoperability is required. (Nundloll, 2020) presents the concept of a holon, which is basically a high-level representation of the services and requirements of a system. Holons can be used to enable opportunistic interactions between systems at runtime. The paper proposes the use of ontologies to capture the abstract definition of holons. Moreover, (Elhabbash, 2020) takes this idea of a holon further and presents an ontology-based approach to enable systems to self-adapt by opportunistically composing at runtime. The ontology-based approach is used to define distributed systems at a high level, hence abstracting their heterogeneity and enabling a programmatic construction of systems of systems. On the other hand, (Htaik, 2017) presents a framework to tackle heterogeneity issues arising in sensor devices by using semantic web technologies. The framework uses a semantic model to integrate sensor information into the existing middleware.

Semantic Web Community
The Semantic Web community addresses the problem of data heterogeneity in applications. The W3C states that the Semantic Web can act as a common framework in order to enable sharing and reuse of data across different boundaries. The Semantic Web aims at inserting semantic meaning into content found in web pages so that meaningful information can be extracted in a cohesive way from different applications running independently on the web. In an online article entitled "The Semantic Web" (Berners-Lee, 2001), Tim Berners-Lee provides a detailed account of the concept of the Semantic Web. (Feigenbaum, 2008) presents case studies on the use of the Semantic Web, such as drug discovery and health care. One technology contributed by this community is the ontology. Ontologies act as a vocabulary to define a given data concept and the relationships existing between different data concepts of a particular domain. Ontologies can describe the knowledge found in a domain and act as a structural framework for classifying and interpreting information. They can add meaning to data and infer new knowledge from existing data. The ability of ontologies to bring meaning to data makes them useful in handling data heterogeneity issues. Hence, ontologies are being used in the development of web services to integrate data from disparate systems. Examples of ontology-based solutions are the Semantic Desktop (Siebert, 2006), Semantic Security Web Services (SSWS) (Denker, 2004) and Cultural Heritage and the Semantic Web (Benjamins, 2004). Ontologies are also used to tackle semantic interoperability and knowledge reusability. (Fraga, 2020) discusses the use of ontology-based solutions to achieve semantic interoperability in manufacturing industrial environments. (Liyanage, 2015) mentions that ontologies are being applied in health care for modelling medical concepts and integrating disparate data sets. The paper further presents an ontological toolkit to tackle some interoperability challenges.

The Challenge of Interoperability
The proposed framework embodies an emergent middleware that applies ontologies to enable conceptual reasoning about systems, in order to address the heterogeneity arising at runtime in the message formats issued by routing protocols. The proposed framework necessitates a semantic analysis of the protocols involved and entails the following stages:
• Matching Protocols: comparing information labels of two routing protocol messages to check for any similarities/differences.
• Classifying Protocols: discovering/defining the type of messaging from each system and classifying these protocols.
• Mapping Protocols: using the information provided from the matching phase to build a bridge to enable both systems to interoperate.
The key research questions associated with the aims of this work are:
1. What is the potential role of ontologies in an emergent middleware, particularly with respect to introducing semantic reasoning at a conceptual level, rather than syntactic reasoning as employed in current systems?
2. How can ontologies support matching, classification and mapping of systems?
a. How can ontologies reason over the differences arising in data at network levels of systems to enable matching of the system messages?
b. How can ontologies provide the level of understanding necessary to classify messages received from a system?
c. How can ontologies be subsequently used to provide a degree of mapping between heterogeneous messages?
3. How are ontological reasoning and related functions incorporated into an emergent middleware architecture, and what are the implications for the architecture generally, especially in recognising the potential cross-cutting role of ontological reasoning?

Methodology
The methodology adopted is an experimental one, building on a generic software architecture tailored to enable dynamic interoperability for a specific domain and tested through that domain. To this effect, this paper explores the domain of Vehicular Ad-Hoc Networks (VANETs) to evaluate the framework based on the stages outlined earlier: matching, classification and mapping using an ontological approach. VANETs have been chosen owing to the richness of their routing strategies and their dynamic nature. Their diverse routing protocols contribute to a high heterogeneity of message formats, and their dynamicity implies that different VANETs need to interact with one another at runtime. The domain knowledge of VANETs partially came from a separate experiment, a component-based framework built to gain more insight into VANETs (Nundloll, 2009). The paper investigates VANETs that are significantly different from one another, and gathers domain information about such VANET-based systems to enable two different VANETs to interoperate. New software has been developed and tested, with the purpose of exploring how ontologies can provide the conceptual reasoning for the VANET domain. The purpose of the experiment is to show that this approach can potentially be generalised and applied to another domain or indeed to another middleware layer.

Translating Network Messages
The proposed framework translates text-based messages from one format to another. It takes input messages issued from one system in a text-based format at runtime. Given that systems provide network messages in the form of byte streams, these messages first need to be converted into a text-based format. One tool to convert byte streams into text messages has been contributed by (Y.D. Bromberg, 2011) through their Starlink framework. This is a middleware solution devised to generate runtime solutions in the form of high-level models in order to enable two heterogeneous protocols to interact with each other.

Designing the VANET ontology
The purpose of the ontology design is to capture the definitions of different VANET concepts. The primary design considerations in the VANET ontology are to describe the existing routing strategies and to describe the type of information found within the routing messages. This section explains briefly how the Protege software (version 3.4.4) has been used to create the ontology before going into the details of the framework. It should be noted that version 3.4.4 was the stable release of Protege at the time of conception of the framework, and was used owing to its full support for SWRL and SQWRL rules (which are used in the proposed framework). The ontology describes the various routing strategies applied to VANETs. It is described using the Manchester OWL syntax.

VANET domain information
Fig. 3 provides examples of a few VANET routing protocols together with the type of information they represent. For example, the first row shows the BBR protocol, which performs MFRBroadcast-based routing.
The main pieces of information required to perform this routing are represented in the BBR (Zhang & Wolff, 2007) routing messages: CommonNeighbourNo, NeighbourList, Destination, BroadcastMeter, Longitude and Latitude. Other pieces of information also present include the Source, PacketID, etc.

Defining the fields
In order to define the routing messages, the ontology needs to define every constituent field of the routing messages. Fig. 4 shows the fields described in the vehicular ontology. These represent the type of information required by VANETs. The information presented in the left pane of Fig. 4 denotes a superclass concept called NamedFields, which defines all the types of fields present in the messages. For example, NeighbourFields, highlighted in the left pane, is a subclass of NamedFields and consists of two other subclasses: CommonNeighbourNo and NeighbourList. This list of fields is not exhaustive.

Defining the Routing Strategies
The VANET ontology also models different routing strategies applicable to VANETs. The different ontological concepts representing these routing strategies have been modelled in the ontology. All these routing strategies differ in the way they operate, but they all carry information regarding the destination of the message they need to communicate. For example, in Cluster-based routing (Nundloll, 2009), a group of nodes identify themselves as part of a cluster, and the node designated to be the cluster-head broadcasts the packet to the cluster. Therefore, ClusterBasedPacket is defined as having fields like Clusterhead (the head of a cluster), TargetRoute (the trajectory taken between destination and sender), and LocationCoordinates (the geographical coordinates of the destination node). The classes PartialClusterBasedPacket, PartialMFRBroadcast and PartialPositionBasedPacket are subclasses of the respective routing class, as they contain a subset of all the information required to perform a particular routing. For example, PartialMFRBroadcast states that this class has either CommonNeighbourNo or NeighbourList but not both.

NamedPackets
When a routing message is received at runtime, its information is compared against the information represented by the different routing strategies. In case of a match, this routing message is defined under NamedPackets (see Fig. 7). The figure shows a few VANET protocols, BBR (Zhang & Wolff, 2007), Broadcomm (Durresi, 2005) and Lora-cbf (Santos, 2005), that have been named as BBRPacket, Broadcomm and Lora-cbfPacket respectively. Moreover, the right pane also shows the definition of a BBR routing message. The BBR protocol discovers the nearest neighbours of a current node to send the packet and also designates a border node, which is the node lying furthest away from the source node. Therefore, the definition captured by the ontology shows that a BBR routing message consists of fields that encapsulate the underlying algorithm of the BBR protocol: Broadcast Meter, Common Neighbour Number, Neighbour List, Source, Destination, etc. Other routing messages created in the ontology are BBRLora, PartialBBR and PartialLora, and are based on variations of the real protocols. For example, BBRLora is a combination of the protocols BBR and Lora-cbf, that is, it combines features of Broadcast and Cluster-based routing. PartialBBR represents a protocol that behaves partially as BBR. Finally, PartialLora represents a protocol that behaves partially as Lora-cbf.

Formulating other information
Different properties have been defined in the VANET ontology to represent the relationships between the different VANET concepts. They are categorised as: (i) Object Properties, which represent relationships between two concepts, for example, IdentifiedPacket hasFields only NamedFields; (ii) Data Properties, which represent an object having a numerical or a text value, for example, CommonNeighbourNo hasFieldValue 3. Moreover, since there can be multiple labels to represent the same information, the ontology has also been formulated to increase the scope of matching the information labels. For example, the label 'Destination' stands for the destination node; however, this information can also be represented by labels such as 'To', 'End', etc. Table 1 provides a few examples of instances formulated to remove the dependency on a single label.

The matching phase
The matching phase involves comparing two different systems. The meaning of a system is derived from the content of the abstract messages obtained at runtime. These messages are defined as a series of field labels with field values and appropriate data types, which represent the syntactic information about the messages as shown in Fig. 7. The role of the ontology at the matching stage is to semantically describe these abstract messages, and compare these descriptions against the semantic message descriptions of existing routing messages found in the ontology. Fig. 8 illustrates a message received at runtime, defined as UnNamedPackets because the message has not yet been classified under a corresponding routing strategy.

Why are ontologies not enough for matching?
Given that the ontology represents information using specific terms, it is unable to match information represented using different terms. In such a case, the ontology alone is not enough to perform the matching step. Table 1 shows the case of adding a few more labels to the ontology to define some information. However, the list of terms supplied in this way is not exhaustive. A more efficient way is required to capture different representations of information. For example, a routing message can carry a field label like 'Dest', which stands for the term 'Destination'. However, it cannot be matched if the ontology represents this information with 'Destination'. Consequently, the field gets classified as an unIdentifiedField. This has repercussions on the successive steps, such that the message is neither properly classified nor mapped by the framework. This is a crucial step in the matching process, where additional tools are necessary to identify fields termed as unIdentifiedField. Hence, the scope of the matching process has been enhanced through the use of linguistic techniques.
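The limitation of exact label matching can be illustrated with a minimal Python sketch. It is not the framework's implementation: the dictionary below is a hypothetical stand-in for the label instances of Table 1, and only the synonyms 'To' and 'End' for 'Destination' are taken from the text.

```python
# Hypothetical stand-in for the ontology's label instances (Table 1).
# Only 'To' and 'End' as alternatives for 'Destination' come from the text.
LABEL_INSTANCES = {
    "Destination": {"Destination", "To", "End"},
}


def match_label(label: str) -> str:
    """Return the concept whose known labels contain `label` exactly."""
    for concept, labels in LABEL_INSTANCES.items():
        if label in labels:
            return concept
    # No exact match: the field stays unidentified.
    return "unIdentifiedField"


print(match_label("To"))    # a listed synonym is matched
print(match_label("Dest"))  # an unlisted abbreviation is not
```

Because the lookup is exact, every new abbreviation would have to be enumerated by hand, which is the motivation for the linguistic techniques.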

The use of linguistics to enhance the matching phase
Linguistics refers to the study of the structure or grammar of human language, and is based on a set of rules used by speakers of the language. Different linguistic techniques have been tried to see which one increases the probability of matching information at runtime. This section gives a brief overview of the techniques applied: Phonetic Matching, Similarity Metrics, Preprocessing (filtering, stemming and tokenization), and Semantics of words. The following subsections narrate the experience of applying these techniques through a running example based on resolving irregularities linked to the word 'Destination'.

Phonetic Matching
Phonetics is a branch of linguistics that deals with the sounds made through speech. The sound descriptions can be represented by written symbols. One such technique is Soundex (Zobel & Dart, 1996), (Holmes & McCabe, 2002), one of the best-known phonetic matching schemes. Numeric codes are associated with letters of the alphabet and, put together for a word, they yield a short code made of the initial letter followed by digits. In the case of matching 'Destination' with 'Dest', standard Soundex encodes 'Dest' as D230 and 'Destination' as D235, so the two terms can be matched only within a code range of D230 - D235. However, the drawback of this method is that a word like 'Distance' also evaluates to D235, implying that this can result in an irrelevant match.
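The behaviour described above can be reproduced with a short Python sketch of the standard American Soundex encoding. This is an illustrative implementation only; the Soundex variant and tooling used in the framework are not specified in the text and may differ.

```python
def soundex(word: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""

    result = word[0]
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        # Skip vowels and repeated adjacent codes.
        if digit and digit != prev:
            result += digit
        # H and W do not separate letters that share a code.
        if ch not in "HW":
            prev = digit
    # Pad with zeros and truncate to four characters.
    return (result + "000")[:4]


for label in ("Dest", "Destination", "Distance"):
    print(label, soundex(label))
```

'Destination' and 'Distance' receive the same code, which illustrates why phonetic matching alone can produce irrelevant matches.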

Similarity Metrics
Similarity metrics revolve around approximate string matching. One such metric is the distance function, which calculates the similarity between two strings and evaluates to a value between 0 and 1: 0 if there is no match, 1 if there is a match, and an intermediate value if there is a partial match. Distance functions such as MongeElkan and Matching Coefficient have been applied in this case. It was found that they can differentiate between words such as 'Distance' and 'Destination', and that they find 'Dest' and 'Destination' to be similar.
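As a rough illustration of such a metric, the sketch below implements a Monge-Elkan style score in Python, using a simplified Smith-Waterman local alignment as the inner similarity. The scoring parameters (match +1, mismatch -1, gap -1) and the normalisation by the shorter string are assumptions made for this example, not the settings of the library used in the framework.

```python
def smith_waterman(a: str, b: str, match: float = 1.0,
                   mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Best local alignment score, normalised to the range 0..1."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0.0, prev[j - 1] + step,
                          prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best / (match * min(len(a), len(b)))


def monge_elkan(s: str, t: str) -> float:
    """Average, over tokens of s, of the best inner score against t."""
    s_tokens, t_tokens = s.split(), t.split()
    if not s_tokens or not t_tokens:
        return 0.0
    total = sum(max(smith_waterman(x, y) for y in t_tokens)
                for x in s_tokens)
    return total / len(s_tokens)


print(monge_elkan("Dest", "Destination"))
print(monge_elkan("Distance", "Destination"))
```

Under these assumptions 'Dest' aligns perfectly inside 'Destination', whereas 'Distance' shares only short fragments with it and scores much lower.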

Additional processing of phonetic results
Although there was a match between 'Dest' and 'Destination', the similarity metrics are not able to find any match with labels such as 'FieldDestination', 'FieldDest', 'FldDEst', 'Fld_Dest'. This motivated preprocessing the labels by applying methods such as filtering, tokenization and stemming.
The tokenization function separates the tokens of a given label. Each token is then checked for abbreviations using the stemming algorithm. Those abbreviations designated to be filtered out are then removed using the filtering function.
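A minimal Python sketch of the tokenization and filtering steps is given below. The list of filler abbreviations ('field', 'fld') is a hypothetical example chosen to cover the labels quoted above, and the stemming step is reduced here to lowercasing; the actual algorithms used in the framework are not detailed in the text.

```python
import re

# Hypothetical abbreviations designated to be filtered out.
FILLER_TOKENS = ("field", "fld")


def tokenize(label: str) -> list:
    """Split a label on separators and on known filler prefixes."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", label):
        part = part.lower()
        while part:
            for filler in FILLER_TOKENS:
                if part.startswith(filler) and len(part) > len(filler):
                    tokens.append(filler)
                    part = part[len(filler):]
                    break
            else:
                # No filler prefix left: keep the remainder as one token.
                tokens.append(part)
                part = ""
    return tokens


def preprocess(label: str) -> list:
    """Tokenize a label, then remove the filler tokens."""
    return [tok for tok in tokenize(label) if tok not in FILLER_TOKENS]


for label in ("FieldDestination", "FieldDest", "FldDEst", "Fld_Dest"):
    print(label, preprocess(label))
```

After preprocessing, each label is reduced to 'destination' or 'dest', which the similarity metrics can then compare.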

Semantics of words
Whilst the phonetic methods are useful to match variations of a given word, they are unable to match words that may not 'look' similar but have the same meaning. Semantic matching looks at the semantic similarity between words using a dictionary-based approach. The approach here is based on the use of the WordNet dictionary, a large lexical database of English words, consisting of groups (synsets) of related words based on their meanings. A synset represents a distinct concept, expressed by a set of words that are cognitive synonyms. All the words in WordNet are organised into synsets, and a particular word can be found in several synsets. The synsets are classified into four parts of speech (POS): nouns, verbs, adjectives and adverbs. WordNet makes use of the POS value of a given word to look up associated words that share a semantic relationship with that word. Therefore, the use of the WordNet dictionary makes it possible to find semantic similarities between labels such as 'End' and 'Destination'.

The Classification phase
The classification phase comes next, in order to identify the provenance of these messages, that is, the type of routing protocol they belong to. It is used to classify semantically formulated concepts within a taxonomy in order to perform searches based on specific criteria.

Classification of the VANET Ontology
Classification of the ontology is concerned with identifying the nature of the routing strategy represented by the received messages. This is achieved through the ontology reasoner, which computes the superclass-subclass relationships between concepts demarcated as primitive and defined. An example of a primitive concept is BBR, a message received at runtime whose routing strategy is unknown, whereas a defined concept is a routing strategy such as Broadcast-based. The ontology reasoner will hence classify BBR as a subclass of the superclass Broadcast-based. The classification process is based on the open world assumption made by OWL: if something is not present, then OWL assumes that the required knowledge has not yet been added to the knowledge base.
Fig. 9 shows the inferred ontology and the messages classified under ClusterBasedPacket. One such message is Broadcomm, a routing message issued from a Cluster-based routing protocol, which contains information such as ClusterHead, TargetRoute and LocationCoordinates. This is a subclass of NamedPackets, and is a primary concept. The same procedure applies for creating other types of routing strategies, and enabling the identification of messages at runtime. Fig. 10 shows two other routing strategies and the type of information they each require, formulated as restrictions.

Routing messages performing more than one routing strategy
The reason for creating routing messages such as BBRLora, PartialBBR and PartialLora is to increase the scope of classifying routing messages. Fig. 11 demonstrates the classification results for BBRLora, highlighting the fields that constitute this particular routing message. Given that among this set of fields, BBRLora contains information that can be used to perform both Broadcast-based (like BBR) and Cluster-based routing (like Lora-cbf), the ontology reasoner classifies BBRLora under both Broadcast and Cluster-based, as shown in the figure.
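The classification idea can be approximated with a small Python sketch in which each defined concept is a set of required fields and a message is classified under every strategy whose requirements it satisfies. This is a closed-world simplification and not an OWL reasoner; the class name MFRBroadcast and the exact required-field sets are assumptions pieced together from the fields mentioned in the text.

```python
# Assumed required fields per routing strategy (simplified, illustrative).
ROUTING_STRATEGIES = {
    "ClusterBasedPacket": {"ClusterHead", "TargetRoute",
                           "LocationCoordinates"},
    "MFRBroadcast": {"CommonNeighbourNo", "NeighbourList"},
    "PositionBasedPacket": {"LocationCoordinates", "TrajectoryFields"},
}


def classify(fields) -> list:
    """Return every strategy whose required fields are all present."""
    present = set(fields)
    return sorted(name for name, required in ROUTING_STRATEGIES.items()
                  if required <= present)


broadcomm = {"ClusterHead", "TargetRoute", "LocationCoordinates"}
bbr_lora = {"CommonNeighbourNo", "NeighbourList", "BroadcastMeter",
            "ClusterHead", "TargetRoute", "LocationCoordinates",
            "Source", "Destination"}

print(classify(broadcomm))  # one strategy
print(classify(bbr_lora))   # two strategies, as for BBRLora
```

A message carrying the fields of two strategies is placed under both, mirroring how BBRLora is classified under Broadcast and Cluster-based routing.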

Routing Messages that partially perform a routing strategy
Similarly, cases like PartialBBR and PartialLora represent routing messages that partially perform Broadcast and Cluster-based routing respectively. In order to identify such routing messages, new defined concepts are required that partially describe the information requirements of these routing strategies. Fig. 12 illustrates three defined concepts, PartialClusterBasedPacket, PartialMFRBroadcast and PartialPositionBasedPacket, implying that they partially contain the information required to perform Cluster-based, Broadcast-based and Position-based routing respectively. For example, Fig. 12 shows that PartialPositionBasedPacket is formulated as having LocationCoordinates or TrajectoryFields, but not both fields required for performing Position-based routing.
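The "one or the other but not both" formulation of the partial concepts can be sketched in Python as follows. The required-field sets are assumptions based on the fields named in the text, and the check (some, but not all, required fields present) is a simplified stand-in for the OWL restrictions shown in Fig. 12.

```python
# Assumed required fields per routing strategy (simplified, illustrative).
REQUIRED_FIELDS = {
    "ClusterBasedPacket": {"ClusterHead", "TargetRoute",
                           "LocationCoordinates"},
    "MFRBroadcast": {"CommonNeighbourNo", "NeighbourList"},
    "PositionBasedPacket": {"LocationCoordinates", "TrajectoryFields"},
}


def classify_partial(fields) -> list:
    """Return the Partial* concepts a message falls under."""
    present = set(fields)
    partial = []
    for name, required in REQUIRED_FIELDS.items():
        found = required & present
        # Partial: at least one required field, but not all of them.
        if found and found != required:
            partial.append("Partial" + name)
    return sorted(partial)


print(classify_partial({"NeighbourList", "Source", "Destination"}))
print(classify_partial({"CommonNeighbourNo", "NeighbourList"}))
print(classify_partial({"TrajectoryFields", "Destination"}))
```

A message holding both CommonNeighbourNo and NeighbourList is a full match rather than a partial one, so it is not reported here.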

Classifying unknown messages
Messages received at runtime get stored as unIdentifiedPacketRecv and later get classified by the reasoner under the appropriate routing strategy. Fig. 13 shows a message, stored as unIdentifiedPacketRecv0, that gets classified as Cluster-based routing, implying that this particular message is probably issued from a cluster-based routing protocol. As shown in the figure, this class contains information, such as Destination, IntermediateIP, ClusterHead, TargetRoute, Longitude and Latitude, identified by the ontology.

The mapping phase
The classification step highlights whether two messages belong to the same family of routing protocols, in which case they can be interchanged between the systems. Therefore, the purpose of the mapping phase is to check if a message received from system A can be formulated into a message format required by system B. This phase entails mapping field values of messages of one system from their original data type to the data type of the corresponding fields of the messages of the other system. Hence, this phase necessitates reasoning about the data types of the corresponding field values and devising a conversion mechanism between them. The proposed framework enables such mapping through user-defined built-ins embedded within SWRL rules, which represent an extra feature provided by ontologies. These built-ins can trigger a mapping engine which, in turn, activates a converter factory to enable the required data type conversions at runtime. The mapping framework consists of three main components: user-defined built-ins, a mapping engine, and a converter factory, as illustrated in Fig. 14.

SWRL Rules
SWRL, the Semantic Web Rule Language, is an OWL-based language designed to formulate rules using OWL concepts within the ontology. The reasoning capabilities of SWRL are more powerful than OWL-based reasoning alone and have been adapted to perform mapping in the proposed framework. The built-ins are defined as instances of a class called swrl:Builtin in the SWRL ontology, which should be imported by the actual ontology using the SWRL rules. These built-ins require a namespace qualifier. Whilst qualifiers for the core built-ins already exist in the SWRL ontology, qualifiers for user-defined built-ins need to be explicitly created through a separate ontology.
How the mapping engine invokes the converter factory
All the values (field values and field data types) are first sent as strings through the user-defined built-in. The mapping engine then invokes a converter factory, which provides the methods required to perform the conversions at runtime. The role of the mapping engine is to instantiate this converter factory and to load a map object at runtime, which maps strings representing the data types of the field values to the appropriate class types. For example, if the data type of a field value is integer, passed as a string such as "Int", the mapping engine maps this string onto the class type Integer. Assume the input received at runtime through the built-in is ("10", "int", "float"). The map, loaded at runtime, invokes the converter factory to convert the string "10" to an Integer value. The resulting object is an instance of the converter factory, which invokes the required method, floatValue(), to convert the value from Integer to Float. The converter factory supports both primitive mapping (simple data types) and composite mapping (composite data types). The outline of this mapping process is depicted in Fig. 14.
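The conversion of ("10", "int", "float") described above can be sketched as follows. This is a minimal illustration in Python rather than the framework's own Java code; the names TYPE_MAP and convert are hypothetical, and the lookup table merely stands in for the map object that the mapping engine loads at runtime.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for the map object loaded by the mapping engine:
# it maps type-name strings onto callables that build a value of that type.
TYPE_MAP: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "float": float,
    "string": str,
}


def convert(value: str, source_type: str, target_type: str) -> Any:
    """Parse `value` as `source_type`, then convert it to `target_type`.

    All three arguments arrive as strings, mirroring how the user-defined
    built-in passes them to the mapping engine.
    """
    src = source_type.strip().lower()  # accepts "Int" as well as "int"
    dst = target_type.strip().lower()
    if src not in TYPE_MAP or dst not in TYPE_MAP:
        raise ValueError(f"unsupported conversion: {source_type} -> {target_type}")
    parsed = TYPE_MAP[src](value)      # "10" -> 10
    return TYPE_MAP[dst](parsed)       # 10 -> 10.0


result = convert("10", "int", "float")
```

Keeping the type names in a table means that supporting a further data type only requires a new entry, which is one plausible reason for loading the map at runtime.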
Furthermore, regular expressions are employed to parse composite values; for example, Struct("int, int") needs the string "Struct" removed from the argument before it is processed. Another feature of the composite mapping is that this processing returns an array of object values. Given that an SWRL built-in can return only one object value, the built-in has been modified to return more than one argument of primitive type.
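The parsing of a composite type such as Struct("int, int") can be sketched as below. The regular expression, the comma-separated value format ("10, 20") and the names parse_composite, convert_composite and CASTS are all assumptions made for illustration; the section does not give the framework's actual expressions.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical pattern: a wrapper name, then a parenthesised and optionally
# quoted comma-separated list of primitive type names.
_COMPOSITE = re.compile(r'^\s*(\w+)\(\s*"?([^")]*)"?\s*\)\s*$')

# Hypothetical primitive casts, keyed by type name.
CASTS: Dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "string": str}


def parse_composite(spec: str) -> List[str]:
    """Strip the wrapper (e.g. "Struct") and return the member type names."""
    match = _COMPOSITE.match(spec)
    if match is None:
        raise ValueError(f"not a composite type: {spec!r}")
    return [name.strip() for name in match.group(2).split(",") if name.strip()]


def convert_composite(values: str, source_spec: str, target_spec: str) -> List[Any]:
    """Convert each member value and return the results as a list."""
    sources = parse_composite(source_spec)
    targets = parse_composite(target_spec)
    parts = [part.strip() for part in values.split(",")]
    if not (len(sources) == len(targets) == len(parts)):
        raise ValueError("member counts do not match")
    return [CASTS[dst](CASTS[src](part))
            for part, src, dst in zip(parts, sources, targets)]


members = parse_composite('Struct("int, int")')
converted = convert_composite("10, 20", 'Struct("int, int")', 'Struct("float, float")')
```

Returning a list mirrors the array of object values mentioned above, which is why a built-in limited to a single return value has to be extended.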

Use of SQWRL
In case mapping is not possible between two systems, the interoperability model can still deduce the underlying heterogeneity between the systems involved. This is achieved through SQWRL (Semantic Query-Enhanced Web Rule Language) (O'Connor & Das, 2009), a query language based on SWRL that provides SQL-like operations for querying knowledge from OWL. The SQWRL library provides a set of core and collection operators, used within SQWRL rules to query the underlying differences in the messages obtained from systems, as shown in Fig. 15. The rule states: find all the fields in the messages from systems A and B, and compare both collections of fields to find the differences between them. The results returned are all the fields that are not found in the unidentified system but are required for it to function as system A. Through the SQWRL operations, the model shows that the underlying differences can still be discovered at runtime, and that they can inform future mechanisms for handling such heterogeneities in order to enable interoperability.
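The comparison expressed by the rule amounts to a set difference over field labels. The sketch below shows that logic in plain Python, not in SQWRL syntax; the function name missing_fields and the example field lists are hypothetical.

```python
from typing import Iterable, List


def missing_fields(required: Iterable[str], received: Iterable[str]) -> List[str]:
    """Return the fields system A requires that the other message lacks."""
    return sorted(set(required) - set(received))


# Hypothetical field collections, for illustration only.
system_a_fields = ["Destination", "ClusterHead", "Latitude", "Longitude"]
unidentified_fields = ["Destination", "Latitude"]

differences = missing_fields(system_a_fields, unidentified_fields)
```

An empty result would mean that the unidentified system already carries every field system A needs.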

EVALUATION
This section assesses the proposed framework in terms of its ability to match different protocols, and in terms of its overall performance.

Experimental Set Up
A Java-based framework merges all the components presented for the experiment into a single workspace by handling received packets and manipulating the ontology at runtime. This is achieved through the Protege-OWL API, an open-source Java library used to load and save the vehicular ontology at runtime, to manipulate OWL data models, and to reason through classification based on Description Logic engines. The framework also executes all the linguistic techniques to increase the matching scope and evaluates the mapping phase for reformulating messages at runtime.

Evaluation Strategy
This section investigates the research questions in Section 3.1, which revolve around the role played by ontologies in each of the 3 phases of the proposed framework. The evaluation looks at whether interoperability can be achieved across a multitude of different protocols, grouped into families:
• Same family/Same protocols with different implementations: Routing protocols performing similar routing but implemented differently.
• Same family/Different protocols: Different routing protocols belonging to the same family of routing because they contain similar information.
• Different families: Different routing protocols performing different routing. The complexity of interoperability is higher owing to the higher level of heterogeneity.

Table 2 provides a brief overview of five routing protocols used in the evaluation of the VANET systems. A few fictitious protocols are also added to the table in order to evaluate the framework: PartialBBR, PartialLora and BBRLora.

Matching protocols
The evaluation of the matching phase tests the following hypotheses:
• Hypothesis 1: Ontologies should add significant value to matching.
• Hypothesis 2: Linguistic techniques, such as phonetic and semantic matching, significantly increase the probability of matching.
Same Family of Protocols/Same Protocols (Different Implementations)
Fig. 16 presents the families together with examples of routing protocols, and Fig. 17 shows their matching results with and without linguistic techniques.
Same Family of Protocols/Different Protocols
The different routing protocols belonging to the same family are shown in Fig. 16 and their matching results in Fig. 18.
Overall Results
Fig. 20 summarizes the matching percentage obtained for each family of protocols. In general, linguistic techniques are highly effective for matching and are therefore used in all subsequent experiments.
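To illustrate how phonetic matching can reconcile differently spelled field labels, the sketch below uses American Soundex. This section does not name the phonetic algorithm used by the framework, so Soundex is an assumption chosen only because it is widely known; the function names and the example labels are likewise hypothetical.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no digit.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code           # a vowel resets the previous code
    return (first + "".join(digits) + "000")[:4]


def phonetic_match(label_a: str, label_b: str) -> bool:
    """Treat two field labels as equivalent when their codes are equal."""
    return soundex(label_a) == soundex(label_b)


same = phonetic_match("Latitude", "Lattitude")
different = phonetic_match("Latitude", "Destination")
```

Labels that differ only in spelling collapse to the same code, whereas very short or ambiguous abbreviations carry too few consonants to be discriminated reliably.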

Classifying protocols
The hypothesis being assessed here is: Hypothesis 3: Ontologies should facilitate classification of systems if they are properly matched. The routing messages are created as primitive concepts within the ontology under UnNamedPackets, since the identity of the packets is not yet known. The Pellet reasoner is used to classify these primitive concepts as subclasses of the appropriate routing strategies within the ontology. Fig. 21 portrays the resulting ontology after the classification of these test cases at run time. For example, BBRPacket, a packet defined by the ontology, is colour-coded orange. The instances of BBR, namely UnIdentifiedPacketRecv0, UnIdentifiedPacketRecv1 and UnIdentifiedPacketRecv2, are also colour-coded orange. When all the fields of a message have been identified during the matching phase, the reasoner classifies the message as IdentifiedPacket. Given that BBRPacket is also classified under MFRBroadcast, this packet also performs broadcast-based routing. Furthermore, given that Lora-cbf packet is a cluster-based packet, the [...]
The Different Families case (BBR and Broadcomm) has only a 29% match. This is reflected in the classification results, where the test case of Broadcomm, UnIdentifiedPacketRecv9, is correctly classified under Cluster-based routing, and also under Position-based routing, but bears no similarity to BBR, which performs Broadcast-based routing.
Finally, to show the classification result for an unidentified packet, the BBR1 packet has been matched without the linguistic techniques and stored as UnIdentifiedPacketRecv11 in the ontology. Given that there are unidentified fields, the reasoner is unable to classify this packet under any routing strategy and hence classifies it as UnIdentifiedPacket. The implication of this result is that this packet cannot be mapped to any other packet.

Overall Results
Table 3 shows the classification percentage obtained per family of protocols. As the table shows, a matching value of 100% is likely to produce a 100% classification result. A proper matching of the field labels implies a proper classification of the message. Table 3 shows that linguistic techniques play a very significant part in the whole framework, since they indirectly increase the chances of a proper classification by increasing the accuracy of matching.
Every field should ideally be matched to exactly one field so that the definition of the received packet can be formulated correctly. However, a few minor discrepancies arise. Taking BBR1 as an example, it is only 85% classified because, although it is properly classified under the expected MFRBroadcast, it is also classified under PartialCluster/PartialPosition routing because of the additional field Latitude.
Another discrepancy is that, even if matching is done correctly for all the received fields, the reasoner may still be unable to classify a particular packet properly, as in the case of Trade (UnIdentifiedPacketRecv6). Given that one field is missing from this packet, the packet is classified under PartialCluster/PartialPosition instead of the expected Position/Cluster routing. Even though this is a more serious discrepancy, the problem is beyond the scope of this framework, as it cannot create missing values. Nonetheless, such packets are still validated to proceed to the mapping phase, since they have been classified in subclasses of the expected routing strategy. The problem arising in the mapping phase is that it will not be able to map the missing values.
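The full versus partial classification behaviour discussed above can be approximated with simple set tests. The sketch below is not how the Pellet reasoner works internally; it only mimics the observable outcome. The field sets attached to each routing strategy and the function name classify are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical definitions: full class name -> (partial class name, fields).
STRATEGIES: Dict[str, Tuple[str, Set[str]]] = {
    "ClusterBasedPacket": ("PartialClusterBasedPacket",
                           {"ClusterHead", "Latitude", "Longitude"}),
    "PositionBasedPacket": ("PartialPositionBasedPacket",
                            {"Latitude", "Longitude"}),
    "MFRBroadcastPacket": ("PartialMFRBroadcastPacket",
                           {"Source", "HopCount"}),
}


def classify(identified: Iterable[str], unidentified: Iterable[str] = ()) -> List[str]:
    """Return every class a packet falls under, given its matched fields."""
    if list(unidentified):
        # Any field left unmatched prevents a proper classification.
        return ["UnIdentifiedPacket"]
    fields = set(identified)
    classes = []
    for full_name, (partial_name, required) in STRATEGIES.items():
        if required <= fields:
            classes.append(full_name)      # every required field is present
        elif required & fields:
            classes.append(partial_name)   # only some required fields present
    return sorted(classes) or ["UnIdentifiedPacket"]


# A broadcast packet carrying one extra positional field, similar to BBR1.
bbr1_like = classify(["Source", "HopCount", "Latitude"])
```

The extra Latitude field is what pulls the packet into the partial cluster and partial position classes in addition to its expected broadcast class.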

Mapping protocols
The hypothesis assessed here is: Hypothesis 4: Ontologies should facilitate mapping of heterogeneous messages if they are validated through the classification process. As shown in Table 4, the category Same family/Same protocols shows 100% mapping since the protocols have the required fields for mapping. The next category, Same family/Different protocols, shows at least 50% mapping because not all the fields may be present in the protocols being mapped. Unlike the previous case, a two-way reformulation is not possible in this case. Finally, the third category, Different families, shows 0% mapping because the protocols are found to be highly heterogeneous.

Overall performance of the framework
The hypotheses being assessed here are:
• Hypothesis 5: An emergent middleware architecture can be designed for doing semantic reasoning.
The proposed framework shows that such an emergent middleware can indeed be designed for semantic reasoning. It captures the underlying heterogeneities of messages received at runtime and devises a strategy to reason about them with the support of ontologies.
• Hypothesis 6: A valid and effective middleware incorporates a software architecture that captures the cross-cutting role of ontologies.
The proposed framework consists of 3 distinct steps (matching, classification and mapping), which are interlinked:
• Matching: The information carried by the messages needs to be parsed first to match the field labels against the ontology repository labels so that they can be identified by the ontology.
• Classification: The classification process compares these messages through the use of the reasoner engine. This process validates the next phase, which is the mapping phase.
• Mapping: The mapping phase converts the field values to the appropriate corresponding data type through extensive use of SWRL rules, executed by the ontology.

Performance Evaluation of the framework
Fig. 22 displays the time taken by each phase. Generally, the matching phase takes more time than the other phases, because it performs additional processing with the linguistic tools. Moreover, the higher the number of fields in a message, the longer the matching phase takes. On the other hand, the sharp rise noted for some of the classification cases is related to the high number of instances present within the ontology. This issue can be solved by separating the instances from the ontology knowledge base and using a triple store geared at handling and reasoning over semantic data. For the final category (different families), the mapping time is 0. This is because the two protocols tested in this case belong to different families and have little in common; hence, the classification phase does not validate the packets to proceed with mapping. The high classification value is due to having at least 20 instances of packets created within the ontology.
Fig. 23 plots the total time taken by the framework for each category. The first set of values assumes that no latency is incurred, in which case the framework takes only approximately 20 to 30 seconds. In the second set of results, which assumes a latency of 1 min before the second message is received, the total time increases considerably.

Analysis of the Framework
The ontology describes routing protocols that are identified by one message only. Protocols that are identified by a sequence of messages (such as AODV, on the basis of its three messages RREQ, RREP and RERR) cannot be identified through the ontology in the current framework. For a case like AODV, the framework would need not only a state machine to keep track of the sequence of messages, but also a modified matching phase to enable matching of a new type of message.
The matching phase is able to resolve heterogeneities arising in field labels using phonetic and semantic matching techniques. However, fields with very ambiguous abbreviations may impair phonetic matching. Moreover, the framework relies on the WordNet dictionary for semantic matching. Given that WordNet is a general-purpose dictionary, the semantic phase often provides irrelevant matches. The best solution in such a case is to provide a dictionary tailored to the intricacies of a particular domain. For example, a specialised dictionary for the VANET domain could significantly improve the accuracy of the semantic phase.
It would also be interesting to see whether AI techniques can help bridge the gap between highly heterogeneous protocols. Finally, to strengthen the evaluation, it would be useful to see how the framework behaves when applied to a new domain of application protocols.
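A state machine of the kind suggested for AODV could look like the sketch below. It is not part of the current framework; the class name SequenceTracker is hypothetical, and requiring all three message types in order before identification is a simplifying assumption made only for illustration.

```python
from typing import Sequence


class SequenceTracker:
    """Track whether an expected sequence of message types has been seen."""

    def __init__(self, protocol: str, expected: Sequence[str]) -> None:
        self.protocol = protocol
        self.expected = tuple(expected)
        self.position = 0  # index of the next message type awaited

    @property
    def identified(self) -> bool:
        return self.position == len(self.expected)

    def observe(self, message_type: str) -> bool:
        """Feed one classified message; return True once fully identified."""
        if not self.identified and message_type == self.expected[self.position]:
            self.position += 1
        # Message types that arrive out of order are ignored.
        return self.identified


tracker = SequenceTracker("AODV", ["RREQ", "RREP", "RERR"])
outcomes = [tracker.observe(m) for m in ["RREP", "RREQ", "RREP", "RERR"]]
```

Each message would still pass through matching and classification individually; the tracker only adds the memory across messages that the single-message ontology lacks.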

CONCLUSION
This paper has presented the concept of an emergent middleware for bridging two unknown systems at runtime. The proposed framework is based on semantic reasoning about systems using ontologies and consists of three distinct phases: matching (defines a system), classification (classifies the system according to its definition) and mapping (translates the system data for use by another system). To increase accuracy in interpreting data, the matching phase applies linguistic techniques to surmount the limitations of ontologies during that phase. On the other hand, through their reasoning capability, ontologies can be used to decide whether interoperability is possible between heterogeneous systems. We think that ontologies have been under-utilised in the field of distributed systems and believe that they provide great scope for understanding the meaning of data as well as the behaviour of systems.

Figure 1. Application level heterogeneity

Figure 3. VANET routing protocols and packet information

Fig. 5. Part (a) shows the classes Packets, ClusterBasedPacket, PositionBasedPacket and MFRBroadcast. These represent a packet definition, cluster-based routing, position-based routing and broadcast-based routing respectively. Part (b) shows PartialClusterBasedPacket, PartialPositionBasedPacket and PartialMFRBroadcastPacket. These classes contain only part of the information required for performing a particular routing protocol.

Fig. 6 shows the concept Packets, which represents Opportunistic forwarding (Broadcast), Position-based forwarding, Trajectory-based forwarding, Restricted-directional flooding, Content-based forwarding and Cluster-based forwarding. These are represented as sub-concepts such as ClusterBasedPacket, PositionBasedPacket and MFRBroadcastPacket, which represent Cluster, Position and Broadcast-based routing respectively.

Figure 8. Concept UnNamedPackets formulating messages received at runtime

Figure 14. Mapping framework for reformulation of a packet

Figure 16. Example of routing protocols

Figure 17. Matching results for same family/same protocols

Figure 22. Time taken by each phase