An Efficient Approach for Ranking of Semantic Web Documents by Computing Semantic Similarity and Using HCS Clustering

Poonam Chahal, Manjeet Singh
Copyright: © 2021 | Pages: 12
DOI: 10.4018/IJSVR.2021010104
Abstract

With the huge amount of dynamic information available on the World Wide Web (WWW), it is difficult for users to search for and retrieve relevant information. One technique used in information retrieval is clustering, after which the web documents are ranked to give users information that matches their query. In this paper, the semantic similarity score of Semantic Web documents is computed using a semantic similarity feature that combines latent semantic analysis (LSA) and latent relational analysis (LRA). LSA and LRA help to determine the relevant concepts and the relationships between them, which in turn correspond to the words in a document and the relationships between those words. The extracted interrelated concepts are represented as a graph that captures the semantic content of the web document. From the graph representation of each document, the HCS clustering algorithm extracts the most highly connected subgraphs, so that the number of clusters constructed follows an information-theoretic approach. The web documents in each cluster, in their graph form, are ranked using the TextRank method in combination with the proposed method. The experimental analysis uses the OpinRank benchmark dataset. The approach to ranking web documents using semantic-based clustering shows promising results.
1. Introduction

A tremendous number of documents are present as web pages on the World Wide Web (WWW) (Wang & Wu, 2013), which contains dynamic and continually expanding information. Users retrieve this information by submitting a query to a search engine, and vast numbers of such queries have been submitted. Retrieving the relevant part of this huge volume of information is a challenging task in the Semantic Web. Most ranking techniques return a ranked set of documents for the query submitted to the search engine, but the result set often does not match what the user wants. Research has also examined (Spink et al., 2001; Croft, 1980) the forms that user queries take, such as unigrams, bigrams, and trigrams, and this variety makes information retrieval a more complex and critical task.

Researchers have already proposed various techniques for information retrieval from the Semantic Web. The Semantic Web is an extension of the web that focuses on making the web machine readable, so that information can be processed entirely by machines. In view of this, it has been observed (Li et al., 2012) that first clustering documents according to the semantic information they contain, and then applying an appropriate ranking algorithm to the resulting clusters, helps to retrieve relevant information.

The major contributions of our approach are as follows:

  1. First, we extract the semantic information from each web document, based on the concepts and the relationships connecting them that correspond to the words actually present in the document, using a combination of LSA and LRA.

  2. The extracted semantic information is represented as a graph, and the HCS clustering algorithm is applied to obtain the most highly connected subgraphs, which represent only the relevant semantic part of the information.

  3. The proposed ranking algorithm, together with TextRank, is applied to rank the document graphs in the different domain-related clusters according to the user query.
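
As an illustration of steps 2 and 3, the following minimal Python sketch clusters a small concept graph with HCS (Highly Connected Subgraphs, the clustering algorithm of Hartuv and Shamir) and then scores the nodes of each cluster with plain, unweighted TextRank. It is not the authors' implementation. The toy graph, the function names `min_cut`, `hcs`, and `textrank`, the Stoer-Wagner routine used to find minimum cuts, the damping factor of 0.85, and the choice to discard singleton vertices are all assumptions made for the example, and the query-dependent part of the proposed ranking is not shown.

```python
from collections import defaultdict


def min_cut(nodes, edges):
    """Stoer-Wagner global minimum cut: returns (cut size, one side of the cut)."""
    weight = {u: defaultdict(int) for u in nodes}
    for u, v in edges:
        weight[u][v] += 1
        weight[v][u] += 1
    merged = {u: {u} for u in nodes}  # original vertices inside each super-node
    active = list(nodes)
    best_size, best_side = float("inf"), set()
    while len(active) > 1:
        # One phase: keep adding the vertex most tightly connected to the grown set.
        start = active[0]
        order = [start]
        conn = {u: weight[start][u] for u in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for u in conn:
                conn[u] += weight[nxt][u]
        s, t = order[-2], order[-1]
        if cut_of_phase < best_size:
            best_size, best_side = cut_of_phase, set(merged[t])
        # Merge t into s before the next phase.
        merged[s] |= merged[t]
        active.remove(t)
        for u in active:
            if u != s:
                weight[s][u] += weight[t][u]
                weight[u][s] += weight[u][t]
    return best_size, best_side


def hcs(nodes, edges):
    """HCS clustering on a simple undirected graph: returns a list of vertex sets."""
    nodes = set(nodes)
    edges = [(u, v) for u, v in edges if u in nodes and v in nodes and u != v]
    if len(nodes) < 2:
        return []  # assumption: singleton vertices are not reported as clusters
    size, side = min_cut(sorted(nodes), edges)
    if size > len(nodes) / 2:  # highly connected: edge connectivity > n/2
        return [nodes]
    return hcs(side, edges) + hcs(nodes - side, edges)


def textrank(nodes, edges, damping=0.85, iterations=50):
    """Unweighted TextRank: PageRank-style iteration on an undirected graph."""
    nbrs = {u: set() for u in nodes}
    for u, v in edges:
        if u in nbrs and v in nbrs and u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    score = {u: 1.0 for u in nbrs}
    for _ in range(iterations):
        score = {
            u: (1 - damping)
            + damping * sum(score[v] / len(nbrs[v]) for v in nbrs[u])
            for u in nbrs
        }
    return score


# Hypothetical concept graph: a dense five-concept group and a triangle,
# joined by the single edge ("e", "f").
edges = [
    ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e"),
    ("c", "d"), ("c", "e"), ("d", "e"),
    ("e", "f"),
    ("f", "g"), ("f", "h"), ("g", "h"),
]
concepts = {u for edge in edges for u in edge}
clusters = hcs(concepts, edges)
for cluster in clusters:
    scores = textrank(cluster, edges)
    ranked = sorted(scores, key=scores.get, reverse=True)
    print(sorted(cluster), "ranked:", ranked)
```

In this sketch a subgraph counts as highly connected when its minimum cut contains more than half as many edges as the subgraph has vertices. Otherwise it is split along that cut and each side is processed recursively, so the number of clusters is not fixed in advance.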

The paper is organized as follows. Section 2 reviews related work in information retrieval using techniques such as clustering and query-dependent ranking of web documents. Section 3 presents the proposed approach in detail, along with the algorithms for ranking web documents. Section 4 provides an empirical analysis of the proposed approach against existing approaches, and Section 5 concludes the work and outlines its future scope.

Retrieving the required information is the most critical task for web users. For efficient retrieval, the information present in semantic web pages needs to be analysed semantically. Semantic information is mainly extracted from a semantic web document by using an ontology, which is a conceptual view of the concepts in a particular domain and the relationships among them. The analysed semantic web documents are then ranked to give users a result set that matches the query they submitted to a search engine. Many researchers in information retrieval have proposed approaches that provide users with information that is semantically useful and related to their conceptual view of the search engine query. Starr and De Oliveira (2013) built a technique for ontology construction. The authors use an application to analyse the model maps in terms of the words/concepts and their relationships. This application helps to remove ambiguous words/concepts and relationships, so that the resulting ontology is consistent and shows all the specialization and generalization among the incorporated concepts.

Song and Park (2009) proposed a genetic algorithm based on Latent Semantic Indexing (LSI) for clustering text. The algorithm helps to reduce dimensionality by taking synonymy and polysemy into account. The number of clusters to be formed is determined by a string-type variable defined by the authors (Chen et al., 2019; Gavankar & Ghosh, 2019).
