# Introduction

Principal Component Analysis (PCA) and Ordinary Least Squares (OLS) are two important statistical methods, and they become even more useful when combined. We will explore both using matrix operations in R and introduce a basic Principal Component Regression (PCR) technique.

# Data generation

We will generate a simple data set of four highly correlated explanatory variables drawn from a multivariate Gaussian distribution, plus a response variable that is a linear combination of them with added random noise.

```r
> library('MASS')
> mu = rep(3, 4)
> sigma = matrix(.9, nrow = 4, ncol = 4) + diag(4) * 0.1
> set.seed(2021)
> data <- as.data.frame(mvrnorm(20, mu = mu, Sigma = sigma,
+                               empirical = T))
> …
```
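For readers who prefer Python, the same pipeline can be sketched with NumPy: simulate correlated predictors, run PCA via SVD of the centered design matrix, and regress the response on the leading components. The coefficient values and variable names below are our own illustration, not the R session's.

```python
import numpy as np

rng = np.random.default_rng(2021)

# Simulate 4 highly correlated predictors (compound-symmetric covariance)
n, p = 20, 4
mu = np.full(p, 3.0)
sigma = np.full((p, p), 0.9) + np.eye(p) * 0.1
X = rng.multivariate_normal(mu, sigma, size=n)

# Response: a linear combination of the predictors plus random noise
beta = np.array([1.0, 0.5, -0.5, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# PCA via SVD of the centered predictor matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # principal component scores

# Principal Component Regression: OLS of y on the first k components
k = 1
Z = np.column_stack([np.ones(n), scores[:, :k]])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
```

With predictors this strongly correlated, the first component captures most of the variance, which is exactly the situation in which PCR pays off.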

# Introduction

The bootstrap is a method of random sampling with replacement. Among its applications, such as hypothesis testing, it is a simple yet powerful approach for checking the stability of regression coefficients. In our previous article, we explored the permutation test, a related technique that resamples without replacement.

Linear regression relies on several assumptions, and under them the coefficient estimates are approximately normally distributed by the CLT: if we repeated the experiment thousands and thousands of times, the confidence intervals built around the fitted line would cover the true line the expected share of the time. The bootstrap approach does not rely on those assumptions*, but…
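To make the idea concrete, here is a minimal sketch in Python (the article itself uses R, and the data below are invented): resample the rows with replacement, refit the regression each time, and collect the slope estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x plus noise
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

def fit_slope(x, y):
    """OLS slope via least squares on [1, x]."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Bootstrap: resample (x, y) pairs WITH replacement and refit each time
n_boot = 2000
slopes = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    slopes[b] = fit_slope(x[idx], y[idx])

# Percentile confidence interval for the slope
lo, hi = np.percentile(slopes, [2.5, 97.5])
```

The spread of `slopes` is the bootstrap estimate of the coefficient's sampling variability, with no normality assumption required.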

# Introduction

To compare outcomes in experiments, we often use Student's t-test. It assumes that the data are randomly sampled from the population and either come in large samples (>30) or are normally distributed, with equal variances between the groups.
If these assumptions do not hold, we can use one of the simulation-based tests instead. In this article, we will introduce the Permutation Test.
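For reference, the classical test itself is a single call in Python's scipy (a sketch with simulated groups, not data from the article):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Two normally distributed groups with equal variances
control = rng.normal(loc=10.0, scale=2.0, size=40)
treatment = rng.normal(loc=11.5, scale=2.0, size=40)

# equal_var=True is the classic Student's t-test (equal-variance form)
t_stat, p_value = ttest_ind(control, treatment, equal_var=True)
```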

Rather than assuming an underlying distribution, the permutation test builds its own by breaking up the associations between or among groups. Often we are interested in the difference of means or medians between the groups, and the null hypothesis is that there is…
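A minimal sketch of that idea in Python (the article uses R; the group values here are invented): pool the observations, shuffle the group labels many times, and count how often the shuffled difference of means is at least as extreme as the observed one.

```python
import numpy as np

rng = np.random.default_rng(42)

group_a = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3])
group_b = np.array([10.2, 11.1, 10.8, 10.5, 11.3, 10.9])

observed = group_a.mean() - group_b.mean()

pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

# Build the permutation distribution by shuffling labels WITHOUT replacement
n_perm = 10_000
diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    diffs[i] = perm[:n_a].mean() - perm[n_a:].mean()

# Two-sided p-value: share of permuted diffs at least as extreme as observed
p_value = np.mean(np.abs(diffs) >= abs(observed))
```

No distributional assumption is made: the null distribution comes entirely from reshuffling the data we already have.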

# Prerequisite

When we perform traditional A/B testing, we need a randomized environment for the experiment. But what if we cannot randomly assign the participants?

In this article, we will explore two powerful techniques for estimating the effect in non-randomized experiments: difference-in-differences and propensity score matching. We will briefly introduce these methods using a classic case study conducted by David Card and Alan B. Krueger in 1994.
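The difference-in-differences estimate itself is simple arithmetic; a hypothetical sketch in Python (the numbers are invented, not from Card and Krueger's study):

```python
# Mean outcome in each cell (hypothetical numbers)
treated_before, treated_after = 20.4, 21.0
control_before, control_after = 23.3, 23.1

# DiD: the change in the treated group minus the change in the control
# group, which nets out the common time trend under parallel trends
did = (treated_after - treated_before) - (control_after - control_before)
```

Here the treated group improved by 0.6 while the control group drifted down by 0.2, so the estimated effect is 0.8.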

# Introduction

In our previous example, we estimated the effect of different sales approaches in a randomized environment.

Suppose now that we have a retail chain with a presence in different cities or countries. We want to…

# Decision Trees in Python

## Introduction

In this article, we will learn how to run a Decision Tree classifier using Python and the sklearn package. You will also see how to split the dataset into training and testing sets and how to measure the accuracy of the model. Finally, we will plot the model.

## Modeling in Python

For this algorithm, the whole example takes fewer than 25 lines of code, which is typical for many basic machine learning algorithms in sklearn.
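As a sketch of that workflow (using sklearn's built-in iris data rather than whatever dataset the article goes on to use):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a toy dataset and split it into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the classifier on the training set
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Measure accuracy on the held-out test set
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

For the plotting step, sklearn also ships `sklearn.tree.plot_tree`, which draws the fitted tree with matplotlib.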

# Four Asian Tigers (actually, 3)

## The role of R&D in the economy 