A Study of Big Data Analytical Frameworks in Research Data Management Using Data Mining Techniques

Madhavi Arun Vaidya, Meghana Sanjeeva
DOI: 10.4018/978-1-7998-3476-2.ch004

Abstract

Research, which is an integral part of higher education, is undergoing a metamorphosis. Researchers across disciplines are increasingly using electronic tools to collect, analyze, and organize data. This “data deluge” creates a need for organisations to develop policies, infrastructures, and services that assist researchers in creating, collecting, manipulating, analysing, transporting, storing, and preserving datasets. Research is now conducted in the digital realm, with researchers generating and exchanging data among themselves. Research data management in the context of library data can also be treated as big data, owing to its large volume, high velocity, and obvious variety. In sum, big datasets need to be more useful, visible, and accessible. With new and powerful big data analytics, such as information visualization tools, researchers can look at data in new ways and mine it for the information they need.

Definitions

Research data management (RDM) is about “the organization of data, from its entry to the research cycle through the dissemination and archiving of valuable results” (Whyte and Tedds, 2011).

Big data is an evolving term that describes a large volume of structured, semi-structured and unstructured data that has the potential to be mined for information and used in machine learning projects and other advanced analytics applications.

Introduction

Research data management concerns the organization of data, from its entry to the research cycle through to the dissemination and archiving of valuable results. It is part of the research process and aims to make that process as efficient as possible while meeting the expectations and requirements of the university, research funders, and legislation. Pinfield, Cox and Smith (2014) note that RDM consists of a number of different activities and processes associated with the data lifecycle, involving the design and creation of data, storage, security, preservation, retrieval, sharing, and reuse, all taking into account technical capabilities, ethical considerations, legal issues and governance frameworks.

Data produced as part of research take a wide range of forms, from statistics and experimental results to interview recordings and transcripts (Borgman, 2012). Data could exist as physical records, as files on a researcher’s computer, or as terabytes of data on shared servers. This chapter is divided into several sections. Section I elaborates on the need for data processing, whereas Sections II, III and IV cover the literature review, the need for research data management, and the role of Big Data in research data management. Section V describes the role of research management in the context of Big Data in library management. Sections VI and VII elaborate on various techniques for Big Data analysis in research data management and on the cloud.

Before data is stored, it has to be processed, and it is essential to study why. Data is processed for several reasons. The data can be:

  • Incomplete: Lacking attribute values, or containing only aggregate data.

  • Noisy: Containing errors or outliers.

  • Inconsistent: Containing discrepancies in code or names.

Quality data should be available for analysis. Data processing is needed to obtain the required information from a huge, incomplete, noisy and inconsistent set of data. Data processing follows these steps:

  • Data Cleaning

  • Data Integration

  • Data Transformation

  • Data Reduction

  • Data Summarization
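
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own method: the records, the field names (id, dept, hours, method), the outlier threshold and the choice of median filling and min-max scaling are all hypothetical, chosen only to show one record set passing through each step.

```python
from statistics import mean, median

# Hypothetical records from two sources; every field name here is illustrative.
source_a = [
    {"id": 1, "dept": "Physics", "hours": 12.0},
    {"id": 2, "dept": "physics ", "hours": None},    # incomplete and inconsistent
    {"id": 3, "dept": "Chemistry", "hours": 900.0},  # noisy (outlier)
    {"id": 4, "dept": "Chemistry", "hours": 10.0},
]
source_b = {1: "survey", 2: "survey", 3: "interview", 4: "interview"}


def clean(records, max_hours=100.0):
    """Data cleaning: normalise names, drop outliers, fill missing values."""
    rows = [dict(r, dept=r["dept"].strip().title()) for r in records]
    # Assumed rule: values above max_hours are treated as noise and dropped.
    rows = [r for r in rows if r["hours"] is None or r["hours"] <= max_hours]
    fill = median(r["hours"] for r in rows if r["hours"] is not None)
    return [dict(r, hours=fill if r["hours"] is None else r["hours"]) for r in rows]


def integrate(records, methods):
    """Data integration: attach the collection method from a second source."""
    return [dict(r, method=methods.get(r["id"], "unknown")) for r in records]


def transform(records):
    """Data transformation: min-max scale hours into the range [0, 1]."""
    values = [r["hours"] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [dict(r, hours_scaled=(r["hours"] - lo) / span) for r in records]


def reduce_attributes(records, keep=("dept", "hours", "hours_scaled")):
    """Data reduction: keep only the attributes needed for analysis."""
    return [{k: r[k] for k in keep} for r in records]


def summarize(records):
    """Data summarization: mean hours per department."""
    groups = {}
    for r in records:
        groups.setdefault(r["dept"], []).append(r["hours"])
    return {dept: mean(vals) for dept, vals in groups.items()}


def run_pipeline(records, methods):
    """Apply the five steps in the order listed in the text."""
    reduced = reduce_attributes(transform(integrate(clean(records), methods)))
    return reduced, summarize(reduced)
```

Calling run_pipeline(source_a, source_b) returns the reduced records together with a per-department summary. In practice each step would be driven by the properties of the dataset at hand; for example, the outlier rule and the fill strategy are decisions that depend on the research domain.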

Key Terms in this Chapter

RDM: Research data management (or RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the management of research data during the lifetime of a research project.

Dataverse: The Dataverse repository platform enables the building of repositories without having to implement from scratch all the standards and best practices needed to fully support data sharing and archiving.

Curation: Data curation is the management of data, to ensure that data is reliably retrievable for future research purposes or reuse.

DMP Tool (Data Management Plan Tool): Guides researchers on how to create, review, and share data management plans that meet institutional and funder requirements.

Big Data: Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.
