Higher education IT project managers have always relied on logged user activity data in one form or another. Summarized counts of users and performance trends serve as essential sources of information for those who need to analyze problems, monitor security, improve software, and perform capacity planning. With the reach of the Internet extending into all aspects of higher education research and teaching, however, new questions have arisen as to how, where, and when user activity gets captured and analyzed. Tracking and understanding remote users and their round-the-clock activities is a major technical and analytical challenge within today’s cyber-infrastructure. As open content publishing and open source development projects thrive in higher education, they bring side effects for usage analysis. This chapter examines how data mining solutions, particularly Web usage mining methods, are being taken up in three open systems project management contexts: digital libraries, online museums, and course management systems. In describing the issues and challenges that motivate data mining applications in these three contexts, the chapter provides an overview of how data mining integrates within project management processes. The chapter also touches on ways in which data mining can be augmented by the complementary practice of data visualization.
For decades, automatically generated logs have been a vital source of information within academic computing. From timeshare minicomputers to lab-based workstations, logs have provided data at the heart of acquisition, recharge, licensing, security, usability, and capacity planning processes. In the Internet era, Web log data has become an even more essential, and often the sole, source of information for all of these same reasons, together with the added urgency of supporting round-the-clock, remote access. Tracking users’ interactions with digital library content, online museum collections, or e-learning material is crucial. Tools for making sense of usage within these systems depend heavily on the underlying Web technology. The monitoring features available to administrators of these systems rely heavily on Web application server logs as a record of the visitor’s access. Further complicating the usage analysis problem is the trend within higher education towards open content, open source application frameworks, distance learning, and a more general embrace of open online research collaboration (Lynch, 2007). Academic institutions are also increasingly involved in collaborative efforts to develop open source alternatives to commercial applications such as repositories, portals, and collaboration environments (Olsen, 2003). This shift in the locus of software development away from commercial companies and into loosely organized consortia of higher education institutions, however, yields noticeably different processes and results.
Key Terms in this Chapter
Open Content: Open content usually refers to research or educational material that can be distributed and re-used freely. The types of content can range from previously published books and articles to educational software simulations and lesson plans. Key concerns for those who provide or use open content include ensuring that the material can be easily adapted, integrated, and reconfigured in new online settings.
Open Systems: Many computing devices that people encounter and use nowadays are not isolated. Especially in the Web era, computing involves systems of software applications, databases, personal computers, servers, etc., all working together. The openness of such systems refers in part to their accessibility by users throughout the world, but also to the ease with which the underlying system components are brought together. Rather than try to design entire systems from the top down, it has proven very powerful to design potential system components so that they can share and understand universal protocols and common services, allowing them to be combined in new and unanticipated ways later on.
Web Usage Mining: As a sub-field of data mining, Web usage mining focuses specifically on finding patterns relating to users of a Web-based system: who they are, what they tend to do, etc. In contrast, other types of Web data mining (e.g., Web text mining) might focus on finding patterns in the content itself. Web usage mining relies on data captured behind the scenes in server logs and databases.
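As a rough illustration of the first step in Web usage mining, the sketch below turns raw server log lines into a per-visitor list of requested pages. It assumes the server writes the widely used Common Log Format, and it treats the requesting host as a stand-in for the visitor, which is a simplification. The function name `pages_by_visitor` and the sample log lines are hypothetical and are not taken from any system discussed in the chapter.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
CLF_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
)


def pages_by_visitor(lines):
    """Group successfully served page paths by requesting host."""
    visits = defaultdict(list)
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        host, _time, _method, path, status, _size = match.groups()
        if status == "200":  # keep only successful requests
            visits[host].append(path)
    return dict(visits)


# Hypothetical sample data, using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2007:13:56:02 -0700] "GET /collections/42 HTTP/1.0" 200 5120',
    '198.51.100.7 - - [10/Oct/2007:14:01:10 -0700] "GET /collections/42 HTTP/1.0" 200 5120',
    '198.51.100.7 - - [10/Oct/2007:14:01:15 -0700] "GET /missing HTTP/1.0" 404 209',
    'this line is not a valid log entry',
]

if __name__ == "__main__":
    for host, pages in pages_by_visitor(SAMPLE_LOG).items():
        print(host, pages)
```

In practice, identifying visitors and sessions is harder than this suggests (shared addresses, proxies, and crawlers all blur the picture), so the grouping here should be read only as a starting point for further analysis.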
Data Mining: In common parlance, data mining often refers generally to the idea of probing deeply into some mountain of data. This informal use of the term usually says little about the techniques used to do the probing. In contrast, the more formal use of the term refers specifically to using computational techniques to uncover patterns in huge data sets. Here the techniques range widely from statistics to artificial intelligence. The range of data mining investigations is also varied and ever increasing, but some of the better-known approaches include clustering, classification, and affinity analysis.
Clustering Analysis: Clustering analysis is another common kind of data mining investigation. Often performed as part of an initial exploration of data, the goal is to see what natural groupings if any exist, i.e., what items in the data are alike. In the context of Web usage, a clustering analysis might reveal that the site’s users fall into two distinct groupings: those who use the site’s menu and those who go directly to specific pages within the site.
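The menu-versus-direct-access example can be sketched with a very small clustering routine. The code below is a simplified one-dimensional, two-cluster variant of k-means, chosen here only for illustration. It assumes each user has already been summarized by a single hypothetical feature: the share of that user's page views reached through the site menu. The function name `two_means` and the sample values are made up.

```python
def two_means(values, iterations=20):
    """Split 1-D values into two groups around a low and a high center."""
    low, high = min(values), max(values)  # deterministic starting centers
    labels = []
    for _ in range(iterations):
        # Assign each value to the nearer center (0 = low, 1 = high).
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group_low = [v for v, label in zip(values, labels) if label == 0]
        group_high = [v for v, label in zip(values, labels) if label == 1]
        # Move each center to the mean of its group; keep it if the group is empty.
        new_low = sum(group_low) / len(group_low) if group_low else low
        new_high = sum(group_high) / len(group_high) if group_high else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return labels, (low, high)


# Hypothetical share of page views that each user reached via the site menu.
MENU_SHARE = [0.90, 0.85, 0.95, 0.10, 0.05, 0.15]

if __name__ == "__main__":
    labels, centers = two_means(MENU_SHARE)
    print("labels:", labels)
    print("centers:", centers)
```

With these sample values, the first three users fall into the high-menu-use group and the last three into the direct-access group. Real usage data would normally involve several features per user and a library implementation of clustering rather than a hand-written loop.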
Data Warehouse: A data warehouse is typically a second home for data. In large corporate or institutional settings, data deemed important for reporting purposes is copied out of various production systems and brought together in the data warehouse, where it can be preserved and analyzed. In a university setting, a student data warehouse might contain historical data gathered from a variety of systems (admissions, housing, advising, degree audit, etc.).
Affinity Analysis: Affinity analysis is one kind of data mining investigation. In this approach, the goal is to see what association rules, if any, exist, i.e., what actions co-occur. In the context of Web usage, an affinity analysis might yield a rule such as ‘if page A is visited, then page D is visited’, which might indicate a previously unknown navigational path popular among users. Affinity analysis is also sometimes referred to as market basket analysis, as it can provide retailers with information about products that consumers purchase together.
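A rule such as "if page A is visited, then page D is visited" is usually judged by two standard measures: support (the fraction of sessions containing both pages) and confidence (the fraction of sessions containing A that also contain D). The sketch below computes both for every ordered pair of pages. The function name `association_rules`, the thresholds, and the sample sessions are hypothetical, and the code considers only single-page rules rather than the larger item sets that full association rule mining algorithms handle.

```python
from collections import Counter
from itertools import permutations


def association_rules(sessions, min_support=0.5, min_confidence=0.8):
    """Return (a, b, support, confidence) for rules 'a visited -> b visited'."""
    total = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(permutations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total  # share of sessions with both pages
        confidence = count / page_counts[a]  # share of a-sessions that include b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical sessions, each a list of the pages visited.
SESSIONS = [
    ["A", "D"],
    ["A", "B", "D"],
    ["A", "D", "C"],
    ["B", "C"],
    ["D", "C"],
]

if __name__ == "__main__":
    for a, b, support, confidence in association_rules(SESSIONS):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample sessions, only the rule from A to D passes both thresholds: every session that includes A also includes D, while D also appears in one session without A, so the reverse rule falls short on confidence.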
Open Source: Open source is a term used generally for software created by programmers who then allow the source code to be distributed freely. This form of distribution encourages other programmers to take up, modify, and contribute back improvements to the software. There are many variations on the openness involved in open source. In some cases, the code can be re-used in any way. In other cases, use of the code brings with it a requirement that any new system of which it becomes a part will, in turn, become open to all. Due to the aggregate nature of contributions, the major challenges for open source software product development involve organizational coordination and oversight. The community source process brings to open source a model of governance via an institutional consortium.