Semantic Web mining for Content-Based Online Shopping Recommender Systems

Ibukun Tolulope Afolabi (Covenant University, Ota, Nigeria), Opeyemi Samuel Makinde (Covenant University, Ota, Nigeria) and Olufunke Oyejoke Oladipupo (Covenant University, Ota, Nigeria)
Copyright: © 2019 | Pages: 16
DOI: 10.4018/IJIIT.2019100103

Abstract

Currently, semantic analysis of text from webpages seems to be a major problem for content-based recommendation. In this research, we present a semantic web content mining approach for recommender systems in online shopping. The methodology has two major phases. The first phase is the semantic preprocessing of textual data using a combination of a developed ontology and an existing ontology. The second phase uses the Naïve Bayes algorithm to make the recommendations. The output of the system is evaluated using precision, recall and F-measure. The results showed that semantic preprocessing improved the recommendation accuracy of the recommender system by 5.2% over the existing approach. The developed system also provides a platform for content-based recommendation in online shopping. It has an edge over existing recommender approaches because it can analyze the textual content of users' feedback on a product in order to provide the necessary product recommendation.

1. Introduction

The importance of data in the 21st century cannot be overemphasized. For this reason, application areas such as medicine, transportation, e-commerce, banking and many others seek to harness its power for various purposes. Data can be structured or unstructured. Structured data has a degree of organization that makes it readily searchable and quick to consolidate into information. It is well defined, predictable, and managed by an elaborate infrastructure (Lee, 2017). As a rule, most units of data in a structured environment can be located quickly and easily. Typically, structured data is managed by a database management system (DBMS) and consists of records, attributes, keys, and indexes. Unstructured data, on the other hand, is not organized in a specific manner and does not have a specific data type or orientation. Without preprocessing, storing this type of data in a table is impossible. Examples include call centre data, email and social media data (Johnson & Kumar, 2012). Data mining involves recognizing patterns in large data sets using various mathematical and machine learning methods such as Support Vector Machines, classification and Artificial Neural Networks (ANN) (Xu, Zhang, & Li, 2011). Unstructured data mining is the practice of examining relatively unstructured data in order to derive more refined data sets from it. It often involves extracting data from sources not generally used for data mining activities. In unstructured data mining, techniques break down the data, searching for identifiers and pieces of information. Finally, semi-structured data is data that does not reside in a relational database but has some hierarchical properties that make it easier to analyze. Mining semi-structured data involves finding and extracting useful information from semi-structured data sets.
The first step in unstructured data mining is data processing, and the techniques used depend on the type of data to be mined.

Web mining lies in between and copes with semi-structured and/or unstructured data. It calls for creative use of data mining and/or text mining techniques as well as its own distinctive approaches. Mining web data is one of the most challenging tasks for data mining and data management scholars because the web holds a huge amount of heterogeneous, less structured data that can easily become overwhelming (Zhang & Segall, 2008). The content of web data may be a collection of facts that webpages are meant to contain, and these may consist of text, structured data such as lists and tables, and even images, video and audio. Web mining therefore extracts valuable information or patterns from the Web hyperlink structure, page content and usage data. In recent years, with the rapid progress of the internet, one can easily access various kinds of information on the World Wide Web (WWW). However, it has become very inconvenient for users to obtain relevant and interesting information on the WWW, because its information resources are growing rapidly and web pages contain a large amount of redundant and irrelevant information. Web mining can be categorized into web usage mining, web structure mining and web content mining (Gunasundari & Eswari, 2013; Mihai, 2009). Web usage mining deals with extracting patterns and data from server logs to understand user actions, including where users are from, how many users clicked an item on a web page and the kinds of activities carried out on the website (Mishra & Tripathi, 2015). Web structure mining deals with analyzing the nodes and connection structure of a website using graph theory. Two things can be obtained from this: the structure of a website in terms of how it is connected to other sites, and the document structure of the website itself, that is, how its pages are connected to one another (Anami, Wadawadagi, & Pagi, 2014).
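As a rough illustration of the graph view used in web structure mining, the sketch below treats pages as nodes and hyperlinks as directed edges, then counts incoming and outgoing links per page. The function name `link_degrees` and the page names in the usage note are hypothetical and are not taken from the article; this is a minimal sketch, not the authors' method.

```python
from collections import defaultdict


def link_degrees(links):
    """Count incoming and outgoing links for each page.

    `links` is an assumed input format: a dict mapping a page name to
    the list of pages it links to. Duplicate links are counted once.
    """
    out_degree = {page: len(set(targets)) for page, targets in links.items()}
    in_degree = defaultdict(int)
    for page, targets in links.items():
        in_degree[page] += 0  # make sure every source page appears
        for target in set(targets):
            in_degree[target] += 1
    return dict(in_degree), out_degree
```

For example, calling `link_degrees({"home": ["products", "about"], "products": ["home"], "about": ["home"]})` reports two incoming links for the hypothetical `home` page, which is the kind of signal a structure-mining step might use to find central pages.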
Web content mining is the process of mining useful information from the contents of web pages and web documents, which are mostly text, images and audio/video files. The techniques used in this discipline draw heavily on natural language processing (NLP) and information retrieval (Asvial, Budiyanto, & Gunawan, 2014; Gunasundari & Eswari, 2013).
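To make the idea of web content mining more concrete, the following sketch extracts the visible text from an HTML page and counts term frequencies, a common first step before any NLP or information retrieval technique is applied. It uses only the Python standard library. The names `TextExtractor`, `extract_text` and `term_frequencies` are hypothetical, and the sketch does not reproduce the ontology-based preprocessing described in the abstract.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def term_frequencies(text):
    """Lowercase the text, split it into alphanumeric tokens, and count them."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)
```

A typical use would be `term_frequencies(extract_text(page_html))`, where `page_html` is an assumed variable holding the raw HTML of a product or review page. A real pipeline would usually add stop-word removal and stemming on top of this.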