# Use of Simulation in Statistics

## 20 September, 2019

### Simulation in Various Fields

• Farming simulation
• Traffic simulation
• Flood simulation
• Swimming simulation
• Simulation of posterior distributions

# Simulation is the creation of a model that can be manipulated logically to decide how the physical world works.

Dr. Richard Gran

Source: https://www.youtube.com/watch?v=OCMafswcNkY

# Simulation in Statistics

## Data Analysis Process

A model is a conceptual representation of a relationship, a system, or an aspect of the real world.

Example: the price of a second-hand car depends on how far it has travelled (its mileage); typically, the higher the mileage, the lower the price.

\begin{aligned} \text{price} &= \beta_0 + \beta_1\text{mileage} + \varepsilon\\ \varepsilon &\sim \textsf{N}(0, \sigma^2) \end{aligned}

## Reverse Engineering

• Fix the parameters (here β0, β1 and σ²)
• Identify a distribution (here, the normal distribution for ε)
• Sample the data from the model

Most distributions can be sampled by simulating values from a uniform distribution and transforming them.

\begin{aligned} \text{price} &= \beta_0 + \beta_1\text{mileage} + \varepsilon\\ \varepsilon &\sim \textsf{N}(0, \sigma^2) \end{aligned}
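The three steps above can be sketched in R for the car-price model. The parameter values, the sample size and the uniform distribution used for mileage are hypothetical choices made only for illustration; the slides do not specify them.

```r
# Step 1: fix the parameters (hypothetical values, for illustration only)
beta0 <- 20000   # price at zero mileage
beta1 <- -0.05   # change in price per unit of mileage
sigma <- 1500    # standard deviation of the error term

# Step 2: identify the distributions (mileage ~ uniform, error ~ normal)
set.seed(2019)
n <- 100
mileage <- runif(n, min = 10000, max = 150000)
epsilon <- rnorm(n, mean = 0, sd = sigma)

# Step 3: sample the data from the model
price <- beta0 + beta1 * mileage + epsilon
cars <- data.frame(mileage, price)

# Fitting the model to the simulated data should approximately recover beta0 and beta1
fit <- lm(price ~ mileage, data = cars)
coef(fit)
```

Because the true parameters are known, the fitted coefficients can be compared directly with them, which is what makes simulated data useful for checking a method.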

## Why Simulate

• Collecting real data to answer what-if questions is expensive
• Finding answers that are difficult, and sometimes impossible, to compute analytically
• Mimicking a real system (a scenario, a process, etc.)
• Generating data from a model
• Random variables are the basic components

## Deterministic and Stochastic Simulation

• Stochastic: contains at least one probabilistic component, so the output varies from run to run
• Deterministic: the output is fixed for a given input and cannot be generalized
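The difference can be illustrated with two toy functions. Both `deterministic_model` and `stochastic_model` are hypothetical names; the only difference between them is the N(0, 1) noise term added by `rnorm()`.

```r
# Hypothetical toy models used only to contrast the two kinds of simulation
deterministic_model <- function(x) 2 + 3 * x                      # no random component
stochastic_model    <- function(x) 2 + 3 * x + rnorm(length(x))   # adds N(0, 1) noise

deterministic_model(1)   # 5 on every call
deterministic_model(1)   # 5 again
stochastic_model(1)      # 5 plus random noise
stochastic_model(1)      # a different value on the next call
```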

## Random Variates

• Random variables are the building blocks of any complex system
• Random variables following any distribution can be obtained by sampling uniform random variates U(0,1) and transforming them.
• Generating samples from U(0,1) requires random numbers.
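As one example of transforming U(0,1) variates into another distribution, the inverse transform method can produce exponential variates. The exponential target and the rate λ = 2 are illustrative choices.

```r
# Inverse transform method: if U ~ U(0, 1) and F is a continuous CDF,
# then X = F^{-1}(U) has CDF F.
# For the exponential distribution, F(x) = 1 - exp(-lambda * x),
# so F^{-1}(u) = -log(1 - u) / lambda.
set.seed(2019)
lambda <- 2                       # rate parameter (chosen for illustration)
u <- runif(10000)                 # uniform random variates
x <- -log(1 - u) / lambda         # transformed to Exponential(rate = lambda)

mean(x)                           # should be close to 1 / lambda = 0.5
# Compare with R's built-in exponential generator
summary(rexp(10000, rate = lambda))
```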

## Random Numbers

• Building blocks of any stochastic simulation
• Are random numbers generated by computers really random?

### Pseudo-Random Numbers:

• Deterministic but unpredictable unless the generating mechanism is known
• Usually, a seed is used to reproduce the same random number sequence.
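A linear congruential generator (LCG) is a classic way to produce pseudo-random numbers and shows the "deterministic but unpredictable" idea clearly. The function `lcg` is a hypothetical toy, not the generator R uses (R's default is the Mersenne Twister); its constants are the commonly quoted Numerical Recipes values.

```r
# Toy linear congruential generator (LCG): x[k+1] = (a * x[k] + inc) mod m.
# The constants are the commonly quoted Numerical Recipes values (illustration only).
lcg <- function(n, seed, a = 1664525, inc = 1013904223, m = 2^32) {
  out <- numeric(n)
  state <- seed
  for (k in seq_len(n)) {
    state <- (a * state + inc) %% m   # exact in double precision for these constants
    out[k] <- state / m               # scale to [0, 1)
  }
  out
}

lcg(5, seed = 123)                                    # looks random...
identical(lcg(5, seed = 123), lcg(5, seed = 123))     # ...but TRUE: same seed, same sequence

# R's own generator behaves the same way: set.seed() reproduces a sequence
set.seed(123); u1 <- runif(3)
set.seed(123); u2 <- runif(3)
identical(u1, u2)                                     # TRUE
```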

## Random Variables from Different Distributions

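```r
set.seed(2019)                     # make the draws reproducible
runif(1000, min = 0, max = 1)      # uniform U(0, 1)
rchisq(1000, df = 2)               # chi-squared, 2 degrees of freedom
rnorm(1000, mean = 0, sd = 1)      # standard normal
rgamma(1000, shape = 2)            # gamma, shape 2 (rate defaults to 1)
rcauchy(1000, scale = 1.5)         # Cauchy, location 0, scale 1.5
rbeta(1000, 1.3, 2.4)              # beta, shape1 = 1.3, shape2 = 2.4
```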

Most statistical software can draw random samples from many different distributions.

# Uses of Simulation

## Monte Carlo Methods

Random sampling used to solve problems that might be deterministic in principle

Used in optimization, numerical integration, drawing samples from a probability distribution, etc.
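A classic example of using random samples to solve a deterministic problem is estimating π, which is a numerical integration problem. The sample size `n` is an arbitrary choice.

```r
# Monte Carlo estimate of pi: the quarter disc x^2 + y^2 <= 1 inside the
# unit square has area pi / 4, so 4 * (fraction of points inside) estimates pi.
set.seed(2019)
n <- 100000
x <- runif(n)
y <- runif(n)
inside <- as.numeric(x^2 + y^2 <= 1)   # 1 if the point falls in the quarter disc

pi_hat <- 4 * mean(inside)
se_hat <- 4 * sd(inside) / sqrt(n)     # Monte Carlo standard error of the estimate
c(estimate = pi_hat, std_error = se_hat)
```

The standard error shrinks like 1/√n, so each extra digit of accuracy costs roughly 100 times more samples.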

## Methods Based on Monte Carlo

• Inverse Transform Method
• Acceptance-Rejection Method
• Markov Chain Monte Carlo
• Metropolis-Hastings Algorithm
• Gibbs Sampling
• Bayesian Analysis
• Resampling Techniques
• Jackknifing
• Bootstrapping
• Permutation test

source: http://tiny.cc/eistcz
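The acceptance-rejection method can be sketched for a target density that is bounded on [0, 1]. Here the target is Beta(2, 2), with density f(x) = 6x(1 - x), and the proposal is U(0, 1); since f(x) ≤ 1.5, M = 1.5 is a valid envelope constant. The names `target_density` and `accept_reject` are hypothetical.

```r
set.seed(2019)
target_density <- function(x) 6 * x * (1 - x)   # Beta(2, 2) density on [0, 1]
M <- 1.5                                         # max of target_density, so f(x) <= M * g(x), g = U(0, 1)

accept_reject <- function(n) {
  samples <- numeric(0)
  while (length(samples) < n) {
    y <- runif(n)                        # candidates from the proposal g = U(0, 1)
    u <- runif(n)                        # uniforms for the accept/reject decision
    keep <- u <= target_density(y) / M   # accept with probability f(y) / (M * g(y))
    samples <- c(samples, y[keep])
  }
  samples[seq_len(n)]                    # drop any surplus accepted values
}

x <- accept_reject(10000)
mean(x)   # Beta(2, 2) has mean 0.5
var(x)    # and variance 0.05
```

On average only 1/M = 2/3 of the candidates are accepted, so a tighter envelope (smaller M) makes the method more efficient.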

## Methods That Use Simulation

• Inverse Transform Method
• Acceptance-Rejection Method
• Markov Chain Monte Carlo
• Metropolis-Hastings Algorithm
• Gibbs Sampling
• Bayesian Analysis
• Resampling Techniques
• Jackknifing
• Bootstrapping
• Permutation test
• Random Cross-validation
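Of the resampling techniques above, bootstrapping is perhaps the simplest to sketch: resample the observed data with replacement and recompute the statistic each time. The "observed" sample here is itself simulated (hypothetical), and the median and B = 2000 resamples are illustrative choices.

```r
set.seed(2019)
observed <- rexp(50, rate = 1)   # hypothetical "observed" sample, simulated for illustration

B <- 2000                        # number of bootstrap resamples
boot_medians <- replicate(B, median(sample(observed, replace = TRUE)))

sd(boot_medians)                          # bootstrap standard error of the median
quantile(boot_medians, c(0.025, 0.975))   # 95% percentile bootstrap interval
```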

## Generating Data

• Study the effect of potential problems when deploying a method or technique
• Assess the accuracy and weaknesses of new methods on difficult data structures

### Use of Generated Data

• R-packages: simrel, simulator, simTools, simglm, ...
• Python-packages: simpy, pysimrel, numpy
• Other software: Stata, SAS
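A small simulation study of this kind can be sketched by comparing two estimators of the centre of a distribution, the sample mean and the sample median, on well-behaved (normal) and heavy-tailed (Cauchy) data. The scenarios, sample size and number of replicates are hypothetical choices; with normal data the mean is expected to have the smaller error, while with Cauchy data, which has no finite mean, the median is expected to be far more accurate.

```r
set.seed(2019)
n_reps <- 2000   # number of simulated data sets per scenario
n      <- 30     # sample size of each data set

# Simulate n_reps data sets with true centre 0 and compute the MSE of two estimators
simulate_mse <- function(generator) {
  estimates <- replicate(n_reps, {
    x <- generator(n)
    c(mean = mean(x), median = median(x))
  })
  rowMeans(estimates^2)   # mean squared error around the true centre 0
}

mse_normal <- simulate_mse(function(k) rnorm(k))     # well-behaved data
mse_cauchy <- simulate_mse(function(k) rcauchy(k))   # heavy-tailed data

rbind(normal = mse_normal, cauchy = mse_cauchy)
```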

## Experimental Design

Proper use of experimental design makes the simulation more effective. Consider a model,

$$\mathbf{y} = \boldsymbol{\mu}_y + \boldsymbol{\beta}^t(\mathbf{x} - \boldsymbol{\mu}_x) + \boldsymbol{\varepsilon}$$
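One way to apply experimental design to a simulation is to treat the simulation parameters as factors and run every combination. The sketch below uses a full-factorial design over sample size n, number of predictors p and noise level sigma, and generates data from the model above with hypothetical values μ_y = 1, μ_x = 0 and only the first two predictors relevant. The factor levels and the function `simulate_linear` are assumptions for illustration, not the setup of any particular study.

```r
set.seed(2019)
# Hypothetical full-factorial design over the simulation parameters
design <- expand.grid(n = c(50, 200), p = c(5, 20), sigma = c(0.5, 2))

simulate_linear <- function(n, p, sigma, mu_y = 1) {
  mu_x <- rep(0, p)                                # hypothetical predictor means
  beta <- c(1, 0.5, rep(0, p - 2))                 # hypothetical: only 2 relevant predictors
  x <- matrix(rnorm(n * p), nrow = n) + rep(mu_x, each = n)
  eps <- rnorm(n, sd = sigma)
  y <- mu_y + sweep(x, 2, mu_x) %*% beta + eps     # y = mu_y + beta^t (x - mu_x) + eps
  list(x = x, y = drop(y))
}

# One replicate per design point: fit OLS and record the prediction error on fresh data
design$test_mse <- mapply(function(n, p, sigma) {
  train    <- simulate_linear(n, p, sigma)
  test_set <- simulate_linear(1000, p, sigma)
  fit      <- lm(train$y ~ train$x)
  pred     <- cbind(1, test_set$x) %*% coef(fit)
  mean((test_set$y - pred)^2)
}, design$n, design$p, design$sigma)

design
```

In a real study each design point would be replicated many times so that differences between factor levels can be separated from Monte Carlo noise.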

## Simulation in Research Studies

• DOI: 10.1016/j.chemolab.2019.05.004: Comparison of multi-response prediction methods
• DOI: 10.1080/00401706.2013.872700: Simultaneous Envelopes for Multivariate Linear Regression
• DOI: 10.1111/j.1467-9469.2011.00770.x: Near Optimal Prediction from Relevant Components

• Numerous simulation studies can be found, both past and present
• As computing power increases, simulation studies are becoming more common

## Modern Applications in Machine Learning

• Combine simulation with machine learning
• Generate artificial training samples
• Add domain-specific knowledge to machine learning through simulation

Following are some of these studies:

# Limitations of Simulation

## Cautious Simulation

• Fake data give fake results
• Simulated data is not a replacement for real data
• Trying to get something from nothing
• Making the analysis and results reproducible and open-source
• Unclear experimental design and poor reporting of results

### Simulation illuminates important points and builds up a picture of the landscape, but cannot illuminate the entire landscape.

- Patrick Landscape

• DOI: 10.1002/sim.8086: Using simulation studies to evaluate statistical methods

# References

1. Ripley, B. D. (2009). Stochastic simulation (Vol. 316). John Wiley & Sons.
2. Jones, O., Maillardet, R., & Robinson, A. (2014). Introduction to scientific programming and simulation using R. Chapman and Hall/CRC.
3. Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
4. Ross, S. M. (2014). Introduction to probability models. Academic Press.
5. Ripley, B. D. (1988). Uses and abuses of statistical simulation. Mathematical Programming, 42(1-3), 53-68.
6. Knežo, D., & Vagaská, A. (2019). Monte Carlo Method Application and Generation of Random Numbers by Usage of Numerical Methods. In Models and Theories in Social Systems (pp. 197-207). Springer, Cham.
7. Birta, L. G., & Arbez, G. (2007). Modelling and Simulation: Exploring Dynamic System Behaviour. Springer.
8. Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156.
9. Sæbø, S., Almøy, T., & Helland, I. S. (2015). simrel—A versatile tool for linear model data simulation based on the concept of a relevant subspace and relevant predictors. Chemometrics and Intelligent Laboratory Systems, 146, 128-135.