
Scalable Data Mining, Archiving, and Big Data Management for the Next Generation Astronomical Telescopes

Chris A. Mattmann, Andrew Hart, Luca Cinquini, Joseph Lazio, Shakeh Khudikyan, Dayton Jones, Robert Preston, Thomas Bennett, Bryan Butler, David Harland, Brian Glendenning, Jeff Kern, James Robnett

Source Title: Big Data Management, Technologies, and Applications

DOI: 10.4018/978-1-4666-4699-5.ch009


Abstract

Big data as a paradigm focuses on data volume, velocity, and the number and complexity of data formats and metadata (information that describes other data). This is nowhere better seen than in the development of software to support next-generation astronomical instruments, including the MeerKAT/KAT-7 Square Kilometre Array (SKA) precursor in South Africa; the Low Frequency Array (LOFAR) in Europe; two instruments led in part by the U.S. National Radio Astronomy Observatory (NRAO), the Expanded Very Large Array (EVLA) in Socorro, NM, and the Atacama Large Millimeter Array (ALMA) in Chile; and other instruments such as the Large Synoptic Survey Telescope (LSST), to be built in northern Chile. This chapter highlights the big data challenges in constructing data management systems for these instruments: integrating legacy science codes, handling data movement and triage, building flexible science data portals and user interfaces, allowing for flexible technology deployment scenarios, and automatically and rapidly reconciling differences in science data formats and metadata models. The authors discuss these challenges and suggest open source solutions based on software from the Apache Software Foundation, including Apache Object-Oriented Data Technology (OODT), Tika, and Solr. The authors have used these solutions to build many precursor and operational software systems quickly and effectively, both to handle data from these instruments and to prepare for the coming data deluge from those not yet constructed. The solutions are not specific to astronomy; they are already applicable to a number of science domains, including Earth science, planetary science, and biomedicine.

Chapter Preview


1. Introduction

The next generation of astronomical telescopes, including MeerKAT/KAT-7 in South Africa (Jonas, 2009), the Low Frequency Array (LOFAR) in Europe (De Vos, 2009), the Expanded Very Large Array (EVLA) in Socorro, New Mexico (Perley, 2011), the Atacama Large Millimeter Array (ALMA) in Chile (Wootten, 2003), and, over the next decade, the cross-continental Square Kilometre Array (SKA) (Hall, 2004) and the Large Synoptic Survey Telescope (LSST) in northern Chile (Tyson, 2002), will generate unprecedented volumes of data, ranging from nearly a terabyte (TB) per day for the EVLA at the lower bound to 700 TB per second for the SKA. These ground-based instruments will push the boundaries of Big Data (Lynch, 2008; Mattmann, 2013) in the several dimensions shown in Table 1. Table 1 presents the common challenges that educators, scientists, and other users face when working with astronomical data: its size (volume, velocity); its variety of formats (complexity); the geographically distributed nature of the telescopes; and the bandwidth limitations that prevent wide dissemination of the data to users around the world who want access to it. Big data is the buzzword of the day, used to describe data sets so large and complex that traditional data management systems have difficulty handling them. There are three main challenges when dealing with big data: the amount of data collected (volume), the speed at which data must be analyzed (velocity), and the array of different data formats collected (complexity).
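To make the span between the lower and upper bounds concrete, the rates quoted above can be put on a common per-day basis. The following is a minimal sketch, not part of the chapter: it assumes the quoted figures are decimal terabytes (1 EB = 10^6 TB) and rounds the EVLA rate to 1 TB/day. The constant and function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Rates quoted in the introduction. Assumption: decimal terabytes,
# with the EVLA's "near terabyte per day" rounded to 1 TB/day.
EVLA_TB_PER_DAY = 1.0
SKA_TB_PER_SECOND = 700.0


def tb_per_second_to_tb_per_day(rate_tb_s: float) -> float:
    """Convert a sustained rate in TB/s to a daily volume in TB."""
    return rate_tb_s * SECONDS_PER_DAY


def tb_to_eb(volume_tb: float) -> float:
    """Convert terabytes to exabytes (decimal units: 1 EB = 10^6 TB)."""
    return volume_tb / 1_000_000


ska_tb_per_day = tb_per_second_to_tb_per_day(SKA_TB_PER_SECOND)
ska_eb_per_day = tb_to_eb(ska_tb_per_day)
ratio = ska_tb_per_day / EVLA_TB_PER_DAY

print(f"SKA: {ska_tb_per_day:,.0f} TB/day = {ska_eb_per_day:.2f} EB/day")
print(f"SKA to EVLA daily volume ratio: {ratio:.3g}")
```

Under these assumptions the SKA figure works out to roughly 60 EB per day, about seven orders of magnitude above the EVLA. If the SKA rate is instead read as terabits per second, dividing by 8 gives roughly 7.6 EB per day; either reading is consistent with "exabytes in days".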

Table 1. Big data challenges and their mappings to upcoming or current astronomical instruments. Challenges are labeled C1, C2, and C3.

ID	Big Data Challenge	Description
C1	Volume	When it sees first light in 2020, the SKA will set the precedent for data volume across all science domains. For example, it will generate exabytes (10¹⁸ bytes) in days, eclipsing the size of the current Internet in that same time span. LOFAR is already in the petabyte (10¹⁵ bytes) per day range. EVLA is generating hundreds of terabytes per experiment and per month. ALMA will generate similar volumes.
C2	Velocity	Not only are these astronomical instruments generating large volumes, they are also doing so rapidly. For example, the SKA will generate 700 Tb/sec; LOFAR is already generating 138 Tb/day; other instruments such as EVLA are generating on the order of terabytes per day. Some processing stages have larger data rates (e.g., data staging of raw instrument measurements), while others (e.g., data reduction) may have comparatively smaller rates.
C3	Complexity	Each of these ground-based instruments stores data in a number of different formats and metadata models. For example, the EVLA and ALMA store data in a custom binary and metadata directory-based format called Measurement Sets (MS), and also in the FITS format (Hanisch, 2001). Some of these communities, e.g., LOFAR and the SKA South Africa project, have already made the transition to HDF5 (Fortner, 1998) for their image cubes. The need to automatically transform data between these different formats is also what characterizes these projects as Big Data.
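Challenge C3 implies that a data system must first recognize which format it has been handed before it can transform anything. The sketch below illustrates that first step with simple signature checks. It is not the chapter's implementation and does not use the Apache Tika API. The function name `sniff_format` and the returned labels are hypothetical. It relies on two documented signatures (a FITS file begins with the `SIMPLE  =` header card, and an HDF5 file begins with an 8-byte signature) and on one loudly labeled assumption: that a Measurement Set can be recognized as a directory containing a `table.dat` file.

```python
import tempfile
from pathlib import Path

FITS_MAGIC = b"SIMPLE  ="          # start of the first FITS header card
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def sniff_format(path: Path) -> str:
    """Guess a data product's format from its leading bytes or layout."""
    if path.is_dir():
        # Assumption: a Measurement Set is a directory holding table.dat.
        if (path / "table.dat").is_file():
            return "measurement-set"
        return "unknown"
    with path.open("rb") as fh:
        head = fh.read(16)
    # Only offset 0 is checked; HDF5 files with a user block are missed.
    if head.startswith(HDF5_MAGIC):
        return "hdf5"
    if head.startswith(FITS_MAGIC):
        return "fits"
    return "unknown"


# Demonstration on synthetic stand-ins (not real telescope data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "image.fits").write_bytes(b"SIMPLE  =                    T" + b" " * 50)
    (root / "cube.h5").write_bytes(HDF5_MAGIC + b"\x00" * 8)
    ms_dir = root / "obs.ms"
    ms_dir.mkdir()
    (ms_dir / "table.dat").write_bytes(b"")
    detected = {p.name: sniff_format(p) for p in sorted(root.iterdir())}

print(detected)
```

A production system would pair detection like this with per-format metadata extractors, so that downstream cataloging sees a single metadata model regardless of the source format.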
