Speeding Up the Internet in Big Data Era: Exploiting Historical User Request Patterns for Web Caching to Reduce User Delays

Chetan (Chet) Kumar (California State University – San Marcos, USA)
DOI: 10.4018/978-1-4666-9787-4.ch062


Introduction

The Internet has witnessed tremendous growth in the amount of available information, and this trend of increasing traffic is likely to continue. The rapid rise of Big Data across the technology world has led to an explosion of data. According to McAfee and Brynjolfsson (2012), the key characteristics of Big Data that separate it from the analytics of the past are the volume, velocity, and variety of data. They note that more data now cross the Internet every second than were stored in the entire Internet 20 years ago. A few excerpts from McAfee and Brynjolfsson (2012) on the volume, velocity, and variety of Big Data are as follows:

  • Volume: As of 2012, about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so...This gives companies an opportunity to work with many petabytes of data in a single data set—and not just from the internet. For instance, it is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions.

  • Velocity: For many applications, the speed of data creation is even more important than the volume. Real-time or nearly real-time information makes it possible for a company to be much more agile than its competitors. For instance,…a group at MIT Media Lab used location data from mobile phones to infer how many people were in Macy’s parking lots on Black Friday…This made it possible to estimate the retailer’s sales on that critical day even before Macy’s itself had recorded those sales.

  • Variety: Big data takes the form of messages, updates, and images posted to social networks; readings from sensors; GPS signals from cell phones, and more. Many of the most important sources of big data are relatively new. The huge amounts of information from social networks, for example, are only as old as the networks themselves; Facebook was launched in 2004, Twitter in 2006. The same holds for smartphones and the other mobile devices that now provide enormous streams of data tied to people, activities, and locations. (McAfee and Brynjolfsson 2012)

According to the IDC Worldwide Big Data Technology and Services 2014–2018 Forecast, the trend of increasing Big Data applications online is expected to continue (IDC Forecast Report 2014). The report states: “IDC expects the Big Data technology and services market to grow at a 26.24% compound annual growth rate through 2018 to reach $41.52 billion.” Despite technological advances, this traffic increase can lead to significant user delays in web access (Sorn & Tsuyoshi, 2013; Zhao & Wu, 2013; Kumar, 2010; Hosanagar & Tan, 2004; Datta et al., 2003).

Web caching is one approach to reducing such delays. Caching involves temporary storage of web object copies at locations that are relatively close to the end user. As a result, user requests can be served faster than if they were served directly from the origin web server (Davison, 2013; Ali et al., 2012; Hosanagar & Tan, 2004).
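The idea above can be sketched in a few lines of code. This is a minimal illustration, not an implementation from the chapter; the class and function names (ProxyCache, fetch_from_origin) are hypothetical. A request found in the local store (a "hit") is served near the user; a miss incurs the slower round trip to the origin server and the copy is kept for later requests.

```python
# Minimal sketch of a cache placed close to the user (hypothetical names).
class ProxyCache:
    def __init__(self, fetch_from_origin):
        self.store = {}                      # url -> cached object copy
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        if url in self.store:                # cache hit: served locally, fast
            return self.store[url], "hit"
        obj = self.fetch_from_origin(url)    # cache miss: slow origin round trip
        self.store[url] = obj                # keep a copy for future requests
        return obj, "miss"

# Usage with a stand-in origin server that just echoes the URL.
cache = ProxyCache(fetch_from_origin=lambda url: f"<content of {url}>")
print(cache.get("/index.html"))  # first request: ('...', 'miss')
print(cache.get("/index.html"))  # repeat request: ('...', 'hit')
```

Repeated requests for the same object are exactly the case the chapter targets: only the first request pays the origin-server delay.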

Key Terms in this Chapter

Big Data: Vast amounts of data generated today characterized by volume, velocity, and variety.

Dynamic Documents: Web documents, such as website front pages, whose contents change frequently.

Proxy Caches: These caches are located at computer network access points for web users. Proxy caches can store copies of web objects and directly serve requests for them in the network. Therefore they reduce user delays by avoiding repeated requests to origin web servers.

Web 2.0 Technologies: Technologies whose web traffic primarily consists of user-generated content, such as video, social networking, and collaboration.

Static Documents: Those web documents that are unaltered in content and size.

Origin Web Server: The server where web content originates. User requests that are satisfied by the origin server typically have the longest waiting times.

Web Caching: This involves temporary storage of web object copies at locations that are relatively close to the end user. Consequently, user requests can be served faster than if they were served directly from the origin web server.

Least Recently Used (LRU) Caching Policy: LRU is a popular cache replacement strategy where the least recently requested object is evicted from the cache to make space for a new one.
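The LRU policy described above can be sketched with an ordered dictionary that keeps entries in access order; this is a generic illustration of the standard algorithm, not code from the chapter, and the capacity and URLs below are invented for the example.

```python
from collections import OrderedDict

# Sketch of an LRU cache with fixed capacity: on overflow, the least
# recently requested object is evicted to make space for the new one.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()          # least recently used entry first

    def get(self, url):
        if url not in self.objects:
            return None                       # miss
        self.objects.move_to_end(url)         # mark as most recently used
        return self.objects[url]

    def put(self, url, obj):
        if url in self.objects:
            self.objects.move_to_end(url)
        self.objects[url] = obj
        if len(self.objects) > self.capacity:
            self.objects.popitem(last=False)  # evict least recently used

# Usage: with capacity 2, adding a third object evicts the stalest one.
cache = LRUCache(capacity=2)
cache.put("/a", "A")
cache.put("/b", "B")
cache.get("/a")            # /a becomes most recently used
cache.put("/c", "C")       # evicts /b, the least recently used
print(cache.get("/b"))     # None: /b was evicted
print(cache.get("/a"))     # A: still cached
```

Note that a hit refreshes an object's position, so frequently re-requested documents tend to stay cached, which is why LRU works well under the recurring request patterns the chapter discusses.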

Historical User Request Patterns: These are user web object request patterns that have previously been observed. For example, at proxy level users typically re-access documents on a daily basis, and demand for a document spikes in multiples of 24 hours.
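The daily re-access pattern described above can be made visible by histogramming the gaps between successive requests for each document. The sketch below is purely illustrative: the request log and timestamps (in hours) are invented for the example, not data from the chapter.

```python
from collections import Counter

# Hypothetical request log: document -> request timestamps in hours.
requests = {
    "/news.html": [1, 25, 49, 73],   # re-accessed roughly daily
    "/promo.html": [3, 27, 51],
}

# Count the inter-request gaps across all documents.
gap_counts = Counter()
for url, times in requests.items():
    for earlier, later in zip(times, times[1:]):
        gap_counts[later - earlier] += 1

print(gap_counts.most_common(1))  # [(24, 5)]: 24-hour gaps dominate
```

A proxy cache that observes such a spike at 24-hour intervals could, for instance, retain objects long enough to cover the next day's expected re-access.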
