Introduction to Data Science

DOI: 10.4018/978-1-7998-3053-5.ch001


This chapter introduces the field of data science. Data science is the area of study that extracts insights from vast amounts of data using various scientific methods, algorithms, and processes. The term emerged from the evolution of mathematical statistics, data analysis, and big data. Data science helps to discover hidden patterns in raw data. It makes it possible to translate a business problem into a research project and then translate the results back into a practical solution. The purpose of this chapter is to emphasize the integration and synthesis of concepts, techniques, applications, and tools for the various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modeling, descriptive modeling, data product creation, evaluation, and effective communication.
Chapter Preview


Data science is the science and art of using computational methods to identify and discover influential patterns in data. Its goal is to gain insight from data and often to inform decisions so that they become more reliable (Abbott, 2014).

Data is necessarily a record of past events, so by definition data science examines historical data. From this perspective, data science can be described as organizing the knowledge contained in data so that it can be used for experimentation and prediction. The need for data science has grown with the immense increase in the amount of raw data, such as images, text, and video.

Many fields, including engineering, mining, healthcare, hospitality, and energy, contribute to this ever-increasing volume of data. Data scientists develop algorithms and techniques to process and analyze this data and make the best use of it. In the past, the healthcare industry kept patient records and other medical information on paper; the shift to electronic records now lets doctors store this data in electronic form (Raghupathi & Raghupathi, 2014).

Recent space missions have produced large amounts of data. Data scientists have helped store this data and use it to make predictions that support future missions. Implementing business logic and strategies requires data analysis and prediction, which prediction algorithms now make much easier.

Energy companies use prediction algorithms to manage energy production according to demand and supply. These predictions help make efficient use of the available energy. In the coming years, machines will be able to predict and generate the required resources according to supply and demand. Artificial intelligence, a rapidly emerging technology, also benefits tremendously from data mining and prediction algorithms.
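To make demand-based prediction concrete, here is a minimal sketch that assumes a straight-line trend is an adequate model of demand: it fits an ordinary least-squares line to a short series of hypothetical monthly demand figures and extrapolates it. The data values and the function names `fit_linear_trend` and `forecast` are illustrative assumptions, not a method described in this chapter; practical energy forecasting would usually need richer models.

```python
# Hypothetical monthly energy demand figures (illustrative only, not real data).
demand = [100.0, 104.0, 108.0, 112.0, 116.0, 120.0]


def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of (t, y) and variance of t; their ratio is the least-squares slope.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend `steps_ahead` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    t_next = len(values) - 1 + steps_ahead
    return intercept + slope * t_next


print(forecast(demand))  # → 124.0
```

Because the sample series rises by exactly 4 units per month, the fitted line is 100 + 4t and the next-month forecast is 124.0; a producer could compare such a forecast with available supply when planning output.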

Key Terms in this Chapter

Data Science: Data Science is the science and art of using computational methods to identify and discover influential patterns in data.

Tableau: Tableau is data visualization software specializing in graphical analysis of data. It allows its users to create interactive visualizations and dashboards.

Model Evaluation: Model evaluation is an integral part of the model development process. It helps identify the model that best represents the data and estimate how well the chosen model will perform on future data.

Predictive Modeling: Predictive modeling is a commonly used statistical technique to predict future behavior.

Descriptive Modeling: Descriptive modeling is a mathematical process that describes real-world events and the relationships between factors responsible for them.

R: R is a scripting language that is specifically tailored for statistical computing. It is widely used for data analysis, statistical modeling, time-series forecasting, clustering, etc.

Support Vector Machine: The support vector machine (SVM) was proposed in 1995 by Cortes and Vapnik. Because of its outstanding learning performance, it is used to solve multidimensional classification and regression problems.
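The SVM and model evaluation entries above can be illustrated together. The sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the L2-regularized hinge loss (a swapped-in training method, not the quadratic-programming formulation of the original SVM, and without kernels), then evaluates it on a held-out test set. The toy data, the function names `train_linear_svm`, `predict`, and `accuracy`, and the hyperparameters `lam`, `lr`, and `epochs` are all hypothetical choices for illustration.

```python
# Hypothetical toy data: two well-separated 2-D clusters (illustrative only).
X_train = [(2, 3), (3, 2), (2, 2), (3, 3), (-2, -3), (-3, -2), (-2, -2), (-3, -3)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]
X_test = [(2.5, 2.0), (1.5, 3.0), (-2.5, -2.0), (-1.5, -3.0)]
y_test = [1, 1, -1, -1]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularization term (the bias is not regularized).
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Points inside the margin contribute a hinge-loss subgradient.
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / m
                grad_b -= yi / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Classify each point by the sign of the decision function w.x + b."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


w, b = train_linear_svm(X_train, y_train)
# Evaluate on points the model never saw during training.
print(accuracy(y_test, predict(w, b, X_test)))  # → 1.0
```

Holding out a test set is one simple way to estimate how a model will work on future data; on realistic, noisier data, cross-validation would usually give a more reliable estimate than a single small split.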
