Web resource exploration is increasingly driven by semantic web technologies that support automated processing. Earth science communities generate large numbers of datasets, described in hundreds of millions of metadata records. It is critical to discover the right data among these records based on the end user's search intent. The major challenge, however, is ensuring that catalogs and Spatial Web Portals can understand that intent. To enable portals to effectively ‘understand’ the meaning of users' queries and to provide a better search experience for end users, we collaborated with Earth Science Information Partners (ESIP) to develop such a capability through a semantic testbed. We implemented a reasoning engine that uses similarity calculations to facilitate the meaningful discovery of Earth science data and to improve the accuracy of search results.
Introduction
Earth science communities generate and publish datasets and services described in metadata records. To promote the broad sharing of geospatial data, services and other resources among public users and government, researchers proposed the Spatial Web Portal (SWP; Yang et al., 2007). An SWP can be considered an interface to geospatial cyberinfrastructure (Yang et al., 2010) that provides mechanisms for Earth science data storage, indexing, editing, searching, visualization and analysis through an interactive web interface. For example, the FGDC Virtual Arctic Spatial Data Infrastructure (SDI), which is built upon the Service-Oriented Architecture (SOA), has incorporated most available Arctic Web Map Services (WMSs) for online service chaining and map integration (Li et al., 2010; Li et al., 2011). For the intergovernmental GEO (“Group on Earth Observations,” 2011), we built the GEOSS (Global Earth Observation System of Systems) clearinghouse (http://clearinghouse.cisc.gmu.edu/geonetwork) to facilitate the discovery, access, and utilization of Earth observation data, information, tools and services using standardized metadata. By July 2012, 133 remote datasets or services and 167 K metadata records had been registered or harvested by the GEOSS Clearinghouse. The ever-increasing resources in national catalogs and clearinghouses pose great challenges for effective resource discovery.
Traditional search tools, built upon keyword matching, are weak at understanding user behavior and providing the most relevant results. The success of an SWP search engine is a matter not only of the quantity of the resources found but also of their quality. Two measures are commonly used to evaluate the discovery of Earth science records through SWPs: precision and recall. Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved (“Precision and Recall,” 2011). Two problems affect these measures. 1) Users of Earth science data and information are hindered by syntax mismatches between users and providers (Raskin & Pan, 2005). With millions of geospatial data, services and other resources, it is a major challenge for catalogs and SWPs to retrieve the most relevant records so that users can discover geospatial information effectively. 2) SWPs normally discover Earth science records by matching text against the search terms that end users enter online. It is difficult for SWPs to understand the meaning of the search terms and to extend discovery beyond literal matches. Therefore, both precision and recall are important and should be considered when improving the efficiency of record discovery.
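The two measures defined above can be illustrated with a minimal Python sketch. The function name `precision_recall` and the record identifiers are hypothetical and serve only to make the definitions concrete; they are not taken from the testbed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers of the records returned by the portal
    relevant:  identifiers of the records judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # records that are both retrieved and relevant
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four records returned, three relevant overall.
retrieved = ["rec1", "rec2", "rec3", "rec4"]
relevant = ["rec2", "rec4", "rec7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In this example, two of the four returned records are relevant (precision 0.50), and two of the three relevant records are found (recall 0.67). Returning more records can raise recall while lowering precision, which is why both measures need to be considered together.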
The 21st century witnessed the emergence of the semantic Web (Berners-Lee, 2001) for web resource exploration with a focus on automated processing. The goal of the semantic Web is to augment the current World Wide Web (WWW) with a highly interconnected network of data that can be easily exploited and processed by both machines and human beings. Thus, the semantic Web is designed to make Web data more meaningful so that it can be understood, interpreted, manipulated, and integrated. To this end, W3C proposed a series of formal specifications that define how Web resources can be modeled, interpreted and presented. These include the Resource Description Framework (RDF), RDF Schema (RDFS) and the Web Ontology Language (OWL). Several semantic discovery studies based on ontology matching and integration have been introduced to Earth science (Zhang et al., 2010a). By formalizing the semantics of user query behavior and modeling them in these standardized machine-readable languages, the semantic Web can help machines further improve the performance of a search engine.
This paper reports our research on improving the discovery of Earth science records based on the semantic Web, using a case study of the ESIP semantic testbed (Yang et al., 2008). The research problem we address is “Among all the results returned, which ones best fit a user's request?” For example, a query of “Natural Resource WMS” will return many different records, and it becomes extremely difficult for users to pick the best match. Therefore, it will be helpful if the system can evaluate the relevance between the Earth science records and “Natural Resource WMS” to rank the results. This paper presents our research on using semantic similarity calculations for results ranking.
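The idea of ranking results by their relevance to a query can be sketched as follows. This is only an illustration under stated assumptions: the record titles are hypothetical, and the `jaccard` token-overlap score is a simple lexical placeholder, not the semantic similarity measure developed in this research. The `similarity` parameter marks where a semantic measure would be plugged in.

```python
def tokenize(text):
    """Lower-case a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Placeholder similarity: token overlap divided by token union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_records(query, records, similarity=jaccard):
    """Score each record title against the query and sort best-first.

    `similarity` is any function taking two token sets and returning a
    number; a semantic measure could replace the lexical placeholder.
    """
    q = tokenize(query)
    scored = [(similarity(q, tokenize(title)), title) for title in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical record titles, for illustration only.
records = [
    "Global Land Cover WMS",
    "Natural Resource WMS for Alaska",
    "Ocean Temperature Dataset",
]
for score, title in rank_records("Natural Resource WMS", records):
    print(f"{score:.2f}  {title}")
```

With the placeholder score, a record sharing only the token “WMS” with the query still receives a non-zero score, while a record about a related concept under a different name receives zero. This is the limitation of keyword matching that motivates a semantic similarity measure.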