1. Introduction
Advances in Information and Communication Technologies (ICT) have led to new approaches to information exchange and communication. The Internet has grown exponentially and therefore plays a vital role in the dissemination of information. By June 30, 2012, the number of Internet users was 2,405,518,376, which is 34.2% of the total world population (Internet World Stats, 2012). As a result, the World Wide Web (WWW) has evolved into an information hub on the Internet. It manages the enormous flood of information and presents it to Internet users. However, the arrangement of information on the WWW is important for its success. It should satisfy the needs of its users by providing the right information in the right way with minimum hassle.
WWW data is exchanged through client-server communication over the Internet. The client uses a web browser to send a Hypertext Transfer Protocol (HTTP) request to the web server. The web browser simply sends and receives data, whereas the web server has more responsibilities. It not only receives the request and sends back the reply but also stores access log details for each client. The web server log file maintains a history of page requests as per the W3C standard, including the Internet Protocol (IP) address, the date and time of the request, the requested page, the HTTP status code, and the bytes served (Phillip & Brian, n.d.). Log file analysis provides an opportunity to improve the website for better use and for the benefit of the organization. As a result, Web mining has emerged as an important area of research. Web mining is the discovery and analysis of web data, and it is used to discover information that can be utilized to improve websites. Web mining is divided into two major areas: Web content mining and Web usage mining. Web content mining is the automated search for useful information available from a large number of websites, whereas Web usage mining is the study and analysis of user access patterns through log files (Kavita, Gulshan, & Vikas, 2011). Web usage mining discovers useful knowledge from the web access log file. It analyzes the records of client interaction to find interesting patterns. The results can be used to improve the structure of the website and the performance of the web server (Navin, Solanki, & Manoj, 2010). They can also be used to improve website recommendations, adaptive content, and personalized search on the website (Eirinaki & Vazirgiannis, 2003).
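To make the log fields listed above concrete, the following is a minimal sketch of reading one access log entry in Python. It assumes the widely used Common Log Format purely for illustration; a server configured for a different layout, such as the W3C extended format, would need a different pattern. The function name `parse_log_line`, the regular expression, and the sample line are hypothetical and do not come from the study.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format, e.g.
# ip ident user [timestamp] "METHOD page PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "ip": fields["ip"],
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "page": fields["page"],
        "status": int(fields["status"]),
        # A "-" in the size column means no body was sent.
        "bytes": 0 if fields["bytes"] == "-" else int(fields["bytes"]),
    }

# Invented sample entry, used only to exercise the parser.
sample = ('192.0.2.10 - - [30/Jun/2012:10:15:32 +0000] '
          '"GET /admissions.html HTTP/1.0" 200 5120')
record = parse_log_line(sample)
```

Lines that do not fit the assumed layout return `None` instead of raising an error, so malformed entries can be counted or skipped during later cleaning.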
Web usage mining comprises four steps (Srivastava, Sharma, & Kumar, 2011; Chitraa & Antony, 2010): data collection, data preprocessing, pattern discovery, and pattern analysis. Data collection involves compiling user log data from the web and proxy servers. Preprocessing involves data cleaning, user and session identification, and path detection. The pattern discovery stage applies data mining techniques, including classification, clustering, statistical analysis, and machine learning algorithms, to discover patterns in the preprocessed data. Finally, the pattern analysis stage further processes and filters the discovered patterns.
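The preprocessing step described above can be sketched as follows. This is an illustrative outline only, not the procedure used in the cited works: it assumes that each IP address stands for one user, that a gap of more than 30 minutes starts a new session, and that requests for images, stylesheets, and scripts are noise. The names `clean`, `sessionize`, `SESSION_TIMEOUT`, and `IGNORED_SUFFIXES`, as well as the sample records, are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise

def clean(records):
    """Data cleaning: keep successful requests for pages, drop embedded resources."""
    return [
        r for r in records
        if r["status"] == 200 and not r["page"].lower().endswith(IGNORED_SUFFIXES)
    ]

def sessionize(records):
    """User and session identification: one user per IP, split on long gaps."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, that user's current session)
    for r in sorted(records, key=lambda rec: rec["time"]):
        previous = last_seen.get(r["ip"])
        if previous is None or r["time"] - previous[0] > SESSION_TIMEOUT:
            current = []  # first request from this IP, or gap too long
            sessions.append(current)
        else:
            current = previous[1]
        current.append(r["page"])
        last_seen[r["ip"]] = (r["time"], current)
    return sessions

def rec(ip, hour, minute, page, status=200):
    """Build one invented log record for the example."""
    return {"ip": ip, "time": datetime(2012, 6, 30, hour, minute),
            "page": page, "status": status}

records = [
    rec("192.0.2.1", 10, 0, "/index.html"),
    rec("192.0.2.1", 10, 0, "/logo.gif"),
    rec("192.0.2.1", 10, 5, "/courses.html"),
    rec("192.0.2.2", 10, 6, "/index.html"),
    rec("192.0.2.2", 10, 7, "/missing.html", status=404),
    rec("192.0.2.1", 11, 0, "/results.html"),
]
sessions = sessionize(clean(records))
```

Each resulting session is the ordered list of pages one user requested, which is the kind of input that the pattern discovery stage works on. Identifying users by IP address alone is a simplification, since several users can share one address behind a proxy.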
For an educational institution, the website should be able to provide information to students, teachers, and parents. This becomes even more important for a distance learning institution, where students, teachers, and parents are separated by geographical distance. Web mining techniques can be applied to web usage data to discover hidden patterns. They can be used to gather important information, including the number of visitors, popular hours, and popular page hits. This information can be used to improve the website structure and navigation sequence and to publish productive academic information by enhancing the web services.
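As a rough illustration of the three measures named above, the sketch below counts distinct visitors, hits per hour, and hits per page from a handful of already-parsed records. It assumes that a visitor is identified by IP address; the function name `summarize`, the record layout, and the sample data are hypothetical and are not results from this study.

```python
from collections import Counter

def summarize(records):
    """Return (number of visitors, hits per hour, hits per page)."""
    visitors = len({r["ip"] for r in records})  # assumed: one visitor per IP
    hits_by_hour = Counter(r["hour"] for r in records)
    hits_by_page = Counter(r["page"] for r in records)
    return visitors, hits_by_hour, hits_by_page

# Invented records: (IP address, hour of day, requested page).
raw = [
    ("192.0.2.1", 9, "/index.html"),
    ("192.0.2.1", 9, "/admissions.html"),
    ("192.0.2.2", 9, "/index.html"),
    ("192.0.2.3", 14, "/index.html"),
    ("192.0.2.2", 14, "/results.html"),
]
records = [{"ip": ip, "hour": hour, "page": page} for ip, hour, page in raw]

visitors, hits_by_hour, hits_by_page = summarize(records)
busiest_hour = hits_by_hour.most_common(1)[0][0]
top_page = hits_by_page.most_common(1)[0][0]
```

On this invented sample there are three visitors, the busiest hour is 9, and the most requested page is `/index.html`.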