Extracting Knowledge from Web Data

Hanane Ezzikouri (Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco), Mohamed Fakir (Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco), Cherki Daoui (Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco) and Mohamed Erritali (Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco)
Copyright: © 2014 | Pages: 15
DOI: 10.4018/jitr.2014100103


User behavior on a website triggers a sequence of requests whose result is the display of certain pages. Information about these requests (including the names of the resources requested and the responses from the Web server) is stored in a text file called a log file. Analysis of the server log file can provide significant and useful information. Web mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World Wide Web. Web usage mining is a main research area in Web mining, focused on learning about Web users and their interactions with Web sites. Its goal is to find users' access patterns automatically and quickly from vast Web log files, such as frequent access paths, frequent access page groups, and user clusters. Through Web usage mining, the information left by user accesses can be mined to provide a foundation for organizational decision making. The Web mining process, defined as the set of techniques designed to explore, process, and analyze large volumes of consecutive activity information on the Internet, has three main steps: data preprocessing, extraction of usage patterns, and interpretation of results. This paper starts by presenting the different formats of Web log files, then presents the preprocessing methods that have been used, and finally presents a system for "Web Content and Usage Mining" for Web data extraction and Web site analysis using the data mining algorithms Apriori, FP-Growth, K-Means, KNN, and ID3.

2. Web Log File

Web log files contain information about the activity of visitors on a website. They are created automatically by Web servers. Each time a visitor requests a resource (a page, an image, etc.) on the site, information about the request is appended to the current log file.

Most log files are in text format, and each log entry is saved as one line of text.
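
As an illustration of the one-entry-per-line layout, the sketch below parses a single log line. It assumes the widely used NCSA Common Log Format (host, identity, user, timestamp, request line, status code, response size); the text above does not name a specific format, so the pattern, the function name parse_log_line, and the sample line are all hypothetical.

```python
import re
from typing import Optional

# Assumption: entries follow the NCSA Common Log Format, i.e.
# host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log entry, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent; store it as 0.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Hypothetical sample entry, made up for illustration.
sample = '192.0.2.1 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
print(entry)
```

Lines that do not match the assumed layout return None, which lets a preprocessing step count or discard malformed entries instead of failing on them.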

2.1. Location of Weblog File

Web log files can be located in three different places.

  • Web server logs: Log files kept on the web server provide the most accurate and complete usage data. Because they contain sensitive, personal information, the web server keeps them private (Langhnoja et al., 2012).

  • Web proxy server: The client sends its request to the web server via a proxy server. The proxy takes the HTTP request from the user, passes it to the web server, and then returns the web server's result to the user (Charrad, 2005).

  • Client browser: Log data can also reside in the client's browser itself, where HTTP cookies are used. These cookies are pieces of information generated by a web server and stored on the user's computer, ready for future access (Pamutha et al., 2012).
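
To make the cookie mechanism in the last item concrete, the sketch below uses Python's standard http.cookies module to read a cookie as a server might send it, and then builds the header a browser would send back on a later request. The cookie name session_id, its value, and its attributes are hypothetical and chosen only for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie header generated by a web server.
set_cookie_value = "session_id=abc123; Path=/; Max-Age=3600"

# Parse the header the way a client would before storing the cookie.
cookie = SimpleCookie()
cookie.load(set_cookie_value)
morsel = cookie["session_id"]

print(morsel.value)       # identifier stored on the user's computer
print(morsel["path"])     # URL path the cookie applies to
print(morsel["max-age"])  # lifetime in seconds, kept as a string

# On a later request the browser sends the stored pair back to the server.
cookie_header = f"{morsel.key}={morsel.value}"
print("Cookie:", cookie_header)
```

Because the same identifier comes back with each request, a server can associate successive requests with one visitor, which is what makes cookie data useful for usage analysis.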
