Statistical Modelling and Analysis of the Computer-Simulated Datasets

M. Harshvardhan, Pritam Ranjan
DOI: 10.4018/978-1-5225-8407-0.ch011

Abstract

Over the last two decades, science has come a long way from relying only on physical experiments and observations to experimentation using computer simulators. This chapter focuses on the modelling and analysis of data arising from computer simulators. It turns out that traditional statistical metamodels are often not very useful for analyzing such datasets. For deterministic computer simulators, the realizations of Gaussian process (GP) models are commonly used for fitting a surrogate statistical metamodel of the simulator output. The chapter starts with a quick review of the standard GP-based statistical surrogate model. The chapter also emphasizes the numerical instability due to near-singularity of the spatial correlation structure in the GP model fitting process. The authors also present a few generalizations of the GP model, review methods and algorithms specifically developed for analyzing big data obtained from computer model runs, and review the popular analysis goals of such computer experiments. A few real-life computer simulators are also briefly outlined here.
Chapter Preview

Introduction

In the early days, when computers were not readily accessible to most people, statisticians and data analysts focussed on the development of innovative methodologies that were efficient for analyzing small datasets. Over the last two decades, we have come a long way from relying only on physical experiments and observations to experimentation using computer simulation models, commonly referred to as computer simulators or computer models. These simulators are software implementations of real-world processes, imitated based on a comprehensive understanding of the underlying phenomena. The applications range from simulating socioeconomic behaviour, the impact of a car crash, manufacturing a compound for drug discovery, climate and weather forecasting, population growth of certain pest species, cosmological phenomena like dark energy and the expansion of the universe, emulation of tidal flow for harnessing renewable energy, and the simulation of nuclear reactions. Given the easier access to high-performance computing power such as cloud computing and cluster grids, computer model data is now a reality in everyday life.

In this chapter, we focus on the modelling and analysis of datasets arising from such computer simulators. As in the physical experiments setup, the data obtained from computer simulator runs have to be modelled and analysed for a deeper understanding of the underlying process. However, traditional statistical metamodels are often not very useful for analyzing such datasets. This is because, many a time, these computer models are deterministic in nature; that is, repeated runs of such a computer simulator with a fixed input setting yield the same output/response. In other words, there is no replication error for deterministic computer simulators. Recall that in traditional statistical models, such as regression, the main driving force for the model fitting and inference parts of the methodology is the distribution of replication errors.

For deterministic computer simulators, the realizations of Gaussian process (GP) models, trained by the observed simulator data, are commonly used for fitting a surrogate statistical metamodel of the simulator output. This is particularly crucial if the simulator is expensive to run, which is the case for many complex real-life phenomena. The notion of GP models gained popularity in the late 1990s and early 2000s (e.g., Santner et al. (2003); Rasmussen and Williams (2006); Fang et al. (2005)), though it was first proposed in the seminal paper of Sacks et al. (1989). Section 2 of the chapter presents a quick review of the standard GP-based statistical surrogate model. We will also briefly discuss the implementation procedure using both the maximum likelihood method and the Bayesian approach.
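To make the workflow concrete, the following is a minimal sketch (not code from the chapter) of fitting a GP surrogate to a deterministic simulator by maximum likelihood. It assumes a squared-exponential correlation (the power-exponential family with power 2) and a hypothetical one-dimensional test function; scikit-learn's GaussianProcessRegressor estimates the hyper-parameters by maximizing the log marginal likelihood.

```python
# Sketch: GP surrogate for a deterministic simulator (illustrative assumptions only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def simulator(x):
    # Hypothetical one-dimensional deterministic simulator used only for illustration.
    return np.sin(2.0 * np.pi * x) + 0.5 * x

# A small design on [0, 1] and the corresponding (noise-free) simulator responses.
X_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# Process variance times squared-exponential correlation; the length-scale is
# estimated by maximizing the log marginal likelihood (the MLE approach).
kernel = ConstantKernel(1.0) * RBF(length_scale=0.2, length_scale_bounds=(1e-2, 1e1))
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-10, n_restarts_optimizer=5)
gp.fit(X_train, y_train)

# Predict the simulator output (with uncertainty) at unsampled inputs.
X_new = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
y_mean, y_sd = gp.predict(X_new, return_std=True)
```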

Almost all published research articles and books focus on the new methodologies and algorithms that can be used for analyzing computer simulator data, and not on the small nuances related to the actual implementation, which are extremely useful from a practitioner's standpoint. This chapter emphasizes such computational issues. In particular, Section 3 of the chapter discusses the numerical instability due to near-singularity or ill-conditioning of the spatial correlation structure, which is the key building block behind the flexibility of the GP-based surrogate model. In practice, the majority of researchers simply use a numerical fix to overcome this issue, but this inadvertently compromises other aspects of the model assumptions. We present an empirical study comparing different current practices for addressing this ill-conditioning problem. We also discuss best coding practices for implementing such a model fitting exercise, for instance, which of the matrix decomposition methods (LU, QR, SVD, or Cholesky) is recommended from an accuracy and time-efficiency perspective.
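The ill-conditioning issue is easy to reproduce. The sketch below, with an assumed one-dimensional design and correlation parameter, shows how a Gaussian correlation matrix built from closely spaced points has a very large condition number, and how adding a small nugget to the diagonal restores a stable Cholesky factorization; the exact numbers depend on the design and are for illustration only.

```python
# Sketch: near-singular correlation matrix, nugget fix, and Cholesky factorization.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=30))   # hypothetical 1-d design; nearby points are highly correlated
theta = 5.0                         # assumed correlation hyper-parameter

# Gaussian (power-exponential, power 2) correlation: R_ij = exp(-theta * (x_i - x_j)^2).
d = x[:, None] - x[None, :]
R = np.exp(-theta * d**2)
print("condition number without nugget:", np.linalg.cond(R))

nugget = 1e-8                       # small positive constant added to the diagonal
R_nug = R + nugget * np.eye(len(x))
print("condition number with nugget:   ", np.linalg.cond(R_nug))

# Cholesky is a common choice for solving R z = y and computing log|R| in the GP
# likelihood: it exploits symmetry and positive definiteness, and is cheaper than
# LU, QR, or SVD for matrices of the same size.
L = np.linalg.cholesky(R_nug)
log_det_R = 2.0 * np.sum(np.log(np.diag(L)))
```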

Given the revolution in computing power, it is now easy to collect and process datasets that are spatio-temporal and functional in nature. Dynamic computer models, i.e., simulators that return a time-series response (see Zhang et al. (2018b)), are a current hot topic of research in applied statistics and computer experiments. Section 4 of the chapter reviews several generalizations of the GP model that account for multiple sources of uncertainty in the simulation model, non-stationarity of the underlying processes, and the dynamic nature of such computer simulator outputs.

Key Terms in this Chapter

Stationarity: (here, weak stationarity) A process is weakly stationary if its mean is constant and the covariance between two points depends only on their separation. In the context of response surfaces, a process is said to be non-stationary if the surface exhibits abrupt changes in curvature and shape.

Correlation Length Parameter: The inverse of the correlation hyper-parameter in the power-exponential correlation function; it is used to quantify the smoothness of the fitted surrogate.

Gaussian Process: A stochastic process {Y(x)} is said to follow a Gaussian process (GP) if every finite collection of the random variables Y(x1), ..., Y(xn), for arbitrary n and inputs x1, ..., xn, follows a multivariate normal distribution.
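As a small illustration of this definition (not taken from the chapter), restricting a zero-mean GP to a finite set of inputs reduces it to a single multivariate normal draw whose covariance is built from an assumed Gaussian correlation function:

```python
# Sketch: a GP evaluated at finitely many inputs is just a multivariate normal draw.
import numpy as np

x = np.linspace(0.0, 1.0, 50)                      # arbitrary finite set of inputs
R = np.exp(-10.0 * (x[:, None] - x[None, :])**2)   # assumed Gaussian correlation matrix
R += 1e-8 * np.eye(50)                             # tiny nugget for numerical stability
sigma2 = 1.0                                       # assumed process variance
y = np.random.default_rng(1).multivariate_normal(np.zeros(50), sigma2 * R)
# y is one realization of the zero-mean GP evaluated at the 50 inputs.
```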

Best Linear Unbiased Predictor (BLUP): An unbiased linear predictor with minimum variance among the class of all linear unbiased predictors is called the best linear unbiased predictor. In some sense, it is the best linear unbiased estimator (BLUE) of the unobserved response.

Nugget: A small positive constant added to the diagonal of the correlation matrix to evade ill-conditioning of near-singular matrices.

Near-Singular Matrix: A matrix whose determinant is close to zero and whose numerical inverse is unreliable is called a near-singular, or ill-conditioned, matrix. The extent of ill-conditioning is measured by its condition number.
