PhD Midway Seminar

Simulation Tool and its application

Raju Rimal

Supervisors:

Solve Sæbø

Trygve Almøy

08 March, 2017

Introduction

My PhD Plan

Why I am doing this

Important for:

Research
Education and
Method Evaluation

What I learn

Advanced Multivariate methods and their properties
Programming concepts for developing statistical packages and applications for various statistical methods
Extending and improving existing methods in statistics
And, obviously, to properly document what I have done

Today’s Special

Today I will talk about:

A comparative study of various estimation techniques on linear model data simulated with simulatr in the single-response situation
The simulation tool (simulatr) we are building
Demonstration

A comparative study of different estimation methods using simulated data

Overview

Four estimation methods were considered

Ordinary Least Squares (OLS)

Although unbiased, suffers heavily from multicollinearity
Widely used and can be used as reference for comparison

Partial Least Squares (PLS)

Well established and widely used method
Based on latent structure and free of the multicollinearity problem

Overview

Four estimation methods were considered

Envelope

Relatively new method (Cook, Helland, & Su, 2013), also based on reduction of the regression model
Based on maximum likelihood, but works better than OLS when \(p\) approaches \(n\)

Bayes PLS

Bayesian estimation of the regression coefficients
Promising performance was shown in previous studies (Helland, Sæbø, & Tjelmeland, 2012)

Simulation Design

Population Parameters were set as follows:

Number of sample observations: 50
Number of predictor variables: 15 and 40
Coefficient of determination \((R^2)\): 0.5 and 0.9
Level of multicollinearity: 0.5 and 0.9
Position of relevant components: 1 and 2; 1 and 3; 2 and 3; 1, 2 and 3

Combining the parameters above gives 32 designs. Each design was simulated with 5 replications, i.e. 160 datasets in total, with each group of 5 sharing the same population properties.
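The count of 32 designs and 160 datasets follows from crossing the factor levels listed above. A minimal sketch of that bookkeeping, where the dictionary keys (`n`, `p`, `R2`, `collinearity`, `relpos`) are names chosen here for illustration and not necessarily the argument names used by the simulation tool:

```python
from itertools import product

# Factor levels as listed on the slide (key names below are illustrative)
n_obs = [50]
n_pred = [15, 40]
r_squared = [0.5, 0.9]
collinearity = [0.5, 0.9]
relevant_positions = [(1, 2), (1, 3), (2, 3), (1, 2, 3)]
n_replicates = 5

# Full factorial design: one entry per combination of factor levels
designs = [
    {"n": n, "p": p, "R2": r2, "collinearity": c, "relpos": pos}
    for n, p, r2, c, pos in product(
        n_obs, n_pred, r_squared, collinearity, relevant_positions
    )
]

# Each design is replicated, giving one (design, replicate) pair per dataset
datasets = [(d, rep) for d in designs for rep in range(1, n_replicates + 1)]

print(len(designs), len(datasets))
```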

A Systematic Comparison

Bayes PLS outperformed the other methods
Envelope performed better than OLS
OLS prediction: very poor on noisy data

A Systematic Comparison

Bayes PLS approached its minimum error with very few components, and the error remained low for additional components
PLS showed moderate performance, but was better than envelope in many situations
OLS prediction is poor, especially with a large number of predictors
The envelope method reached its minimum error, after which the error increased with additional components

`simrel-m`: A versatile tool for simulating multi-response linear model data

`simrel-m`

It is an extension of the simrel R package (Sæbø, Almøy, & Helland, 2015) for simulating multi-response data

Based on the idea of reduction of the random regression model
It separates \(X\) into subspaces that are relevant and irrelevant for predicting each response
It re-parameterizes the population model, \[ \mathbf{Y} = \boldsymbol{\mu}_{Y} + \mathbf{B}^t\left(\mathbf{X} - \boldsymbol{\mu}_X\right) + \boldsymbol{\epsilon} \text{, where }\boldsymbol{\epsilon} \sim N(0, \boldsymbol{\Sigma}_{Y|X}) \]
It can simulate data of diverse nature with very few parameters
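
As a reminder of how the quantities in this model relate to a joint covariance matrix: assuming \(\mathbf{Y}\) and \(\mathbf{X}\) are jointly normal, and writing the blocks of their covariance as \(\boldsymbol{\Sigma}_{YY}\), \(\boldsymbol{\Sigma}_{XY}\) and \(\boldsymbol{\Sigma}_{XX}\) (notation introduced here for illustration), the standard results are \[ \mathbf{B} = \boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\Sigma}_{XY} \text{ and } \boldsymbol{\Sigma}_{Y|X} = \boldsymbol{\Sigma}_{YY} - \boldsymbol{\Sigma}_{XY}^t\boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\Sigma}_{XY} \] so specifying the joint covariance fixes both the true regression coefficients and the noise covariance.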

How it works

Collect input parameters from user
Make a covariance matrix satisfying those input parameters
Compute true population properties such as regression coefficients
Sample calibration and validation sets
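
The four steps above can be sketched with NumPy. This is only an illustration under stated assumptions, not the algorithm implemented in the package: the function `simulate_linear_model`, its arguments, and the example covariance values are hypothetical, and the covariance is written down directly here rather than constructed from parameters such as \(R^2\) or the positions of relevant components.

```python
import numpy as np


def simulate_linear_model(sigma_xx, sigma_xy, sigma_yy, n_train, n_test, seed=0):
    """Hypothetical helper: sample zero-mean, jointly normal (Y, X) data."""
    m = sigma_yy.shape[0]  # number of responses
    p = sigma_xx.shape[0]  # number of predictors

    # Step 2: joint covariance of (Y, X); Cholesky fails if it is not
    # positive definite, i.e. if the inputs are mutually inconsistent
    sigma = np.block([[sigma_yy, sigma_xy.T], [sigma_xy, sigma_xx]])
    np.linalg.cholesky(sigma)

    # Step 3: true population properties
    beta = np.linalg.solve(sigma_xx, sigma_xy)             # Sigma_xx^{-1} Sigma_xy
    r2 = np.diag(sigma_xy.T @ beta) / np.diag(sigma_yy)    # R^2 per response

    # Step 4: draw calibration (training) and validation (test) sets
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(m + p), sigma, size=n_train + n_test)
    Y, X = draws[:, :m], draws[:, m:]
    return {
        "beta": beta,
        "r2": r2,
        "X_train": X[:n_train], "Y_train": Y[:n_train],
        "X_test": X[n_train:], "Y_test": Y[n_train:],
    }


# Step 1: input parameters (illustrative values, not taken from the study)
p, rho = 3, 0.5
sigma_xx = (1 - rho) * np.eye(p) + rho * np.ones((p, p))  # equicorrelated predictors
sigma_xy = np.array([[0.5], [0.3], [0.0]])                # Cov(X, Y)
sigma_yy = np.array([[1.0]])

sim = simulate_linear_model(sigma_xx, sigma_xy, sigma_yy, n_train=50, n_test=200)
print(sim["beta"].ravel(), sim["r2"])
```

Because the true coefficients are known from the population covariance, estimates obtained from the calibration set can be compared directly against them, and prediction error can be measured on the validation set.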

Demonstration

`simulatr` Application

References

Cook, R. D., Helland, I. S., & Su, Z. (2013). Envelopes and partial least squares regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(5), 851–877.

Helland, I. S., Sæbø, S., & Tjelmeland, H. (2012). Near optimal prediction from relevant components. Scandinavian Journal of Statistics, 39(4), 695–713.

Sæbø, S., Almøy, T., & Helland, I. S. (2015). Simrel—A versatile tool for linear model data simulation based on the concept of a relevant subspace and relevant predictors. Chemometrics and Intelligent Laboratory Systems, 146, 128–135.
