Linear algebra operations using R: Principal Component Analysis and Ordinary Least Squares

Photo by Vlado Paunovic on Unsplash

Introduction

Principal Component Analysis (PCA) and Ordinary Least Squares (OLS) are two important statistical methods, and they work well in combination. We will explore both methods using matrix operations in R and introduce a basic Principal Component Regression (PCR) technique.

Data generation

We will generate a simple data set of four highly correlated explanatory variables drawn from a multivariate Gaussian distribution, and a response variable that is a linear combination of them with added random noise.

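> library('MASS')

> # mean vector and covariance matrix (variance 1, covariance 0.9)
> mu=rep(3,4)
> sigma=matrix(.9, nrow=4, ncol=4) + diag(4)*0.1
> set.seed(2021)
> # empirical = TRUE is an argument of mvrnorm(), not as.data.frame()
> data <- as.data.frame(mvrnorm(20, mu = mu, Sigma = sigma,
+ empirical = TRUE))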
>…
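
As a rough illustration of the matrix operations mentioned in the introduction, the sketch below computes PCA through an eigen-decomposition of the covariance matrix, OLS through the normal equations, and a one-component PCR. It is not the article's own code: the data are simulated with base R only, and the coefficients in `beta_true` and the choice `k <- 1` are assumptions made for the example.

```r
# Simulated data (an assumption for this sketch, not the article's data set)
set.seed(2021)
n <- 20
sigma <- matrix(0.9, nrow = 4, ncol = 4) + diag(4) * 0.1
# Correlated Gaussian predictors via the Cholesky factor of sigma, mean 3
X <- matrix(rnorm(n * 4), nrow = n) %*% chol(sigma) + 3
colnames(X) <- paste0("x", 1:4)
beta_true <- c(1, 2, 3, 4)                 # hypothetical coefficients
y <- as.vector(X %*% beta_true + rnorm(n)) # response with random noise

# PCA: eigen-decomposition of the covariance of the centered predictors
Xc <- scale(X, center = TRUE, scale = FALSE)
eig <- eigen(cov(Xc))
scores <- Xc %*% eig$vectors               # principal component scores
var_explained <- eig$values / sum(eig$values)

# OLS via the normal equations: solve (X'X) b = X'y
Xd <- cbind(1, X)                          # add an intercept column
beta_hat <- solve(t(Xd) %*% Xd, t(Xd) %*% y)

# PCR: regress y on the first k principal component scores
k <- 1                                     # hypothetical choice
Z <- cbind(1, scores[, 1:k, drop = FALSE])
gamma_hat <- solve(t(Z) %*% Z, t(Z) %*% y)
```

The results can be compared against `lm(y ~ X)` and `prcomp(X)`, which should agree up to the sign of the components.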


Estimation of regression coefficients with implementation in R

Photo by Andrew Ridley on Unsplash

Introduction

The bootstrap is a method of random sampling with replacement. Besides applications such as hypothesis testing, it is a simple yet powerful approach for checking the stability of regression coefficients. In our previous article, we explored the permutation test, a related concept that resamples without replacement.
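
A minimal sketch of bootstrapping regression coefficients might look like the following. The data are simulated, and the true intercept and slope, the sample size, and the number of resamples `B` are assumptions chosen for illustration, not values from the article.

```r
# Simulated data (hypothetical: true intercept 2, true slope 0.5)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

B <- 2000  # number of bootstrap resamples (assumed value)

# Resample rows with replacement, refit the model, keep the coefficients
boot_coefs <- t(replicate(B, {
  idx <- sample(n, size = n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))
}))

# Percentile 95% intervals and bootstrap standard errors per coefficient
boot_ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
boot_se <- apply(boot_coefs, 2, sd)
```

A narrow spread in `boot_coefs` suggests stable coefficients; a wide or skewed spread suggests the estimates are sensitive to which observations happen to be in the sample.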

Linear regression relies on several assumptions, and the coefficient estimates are taken to be approximately normally distributed under the Central Limit Theorem (CLT). This implies that if we repeated the experiment thousands of times, the confidence intervals would cover the true regression line in roughly the stated share of repetitions. The bootstrap approach does not rely on those assumptions*, but…


Exploring a powerful simulation technique with implementation from scratch in R

Photo by Eric Prouzet on Unsplash

Introduction

To compare outcomes in experiments, we often use Student’s t-test. It assumes that the data are randomly sampled from the population and either come in large samples (n > 30) or are normally distributed, with equal variances between groups.
If we do not happen to meet these assumptions, we may use one of the simulation tests. In this article, we will introduce the Permutation Test.

Rather than assuming an underlying distribution, the permutation test builds its own by breaking up the associations between or among groups. Often we are interested in the difference of means or medians between the groups, and the null hypothesis is that there is…
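
The label-shuffling idea can be sketched from scratch as follows. This is an illustrative example rather than the article's implementation: the two groups are simulated, and the group sizes, means, and the number of permutations `n_perm` are assumptions.

```r
# Simulated measurements for two groups (hypothetical values)
set.seed(42)
a <- rnorm(15, mean = 10, sd = 2)
b <- rnorm(12, mean = 12, sd = 2)
values <- c(a, b)
group <- rep(c("A", "B"), times = c(length(a), length(b)))

# Test statistic: difference in group means (B minus A)
diff_means <- function(v, g) mean(v[g == "B"]) - mean(v[g == "A"])
observed <- diff_means(values, group)

# Shuffle the labels (sampling without replacement) to break any
# association between group and value, and recompute the statistic
n_perm <- 5000
perm_diffs <- replicate(n_perm, diff_means(values, sample(group)))

# Two-sided p-value; the +1 terms count the observed arrangement itself
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
```

Replacing `mean` with `median` inside `diff_means` gives the corresponding test for a difference in medians.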


Difference in differences and propensity score matching in R

Photo by Jason Yuen on Unsplash

Prerequisite

When we perform traditional A/B testing, we need a randomized environment for the experiment. But what if we cannot randomly choose the participants?

In this article, we will explore two powerful techniques for estimating an effect in non-randomized experiments: difference in differences and propensity score matching. We will briefly introduce these methods using a classic case study published by David Card and Alan B. Krueger in 1994.
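
To make the difference-in-differences idea concrete, here is a small sketch on simulated data. It does not use the Card and Krueger data set; the group sizes, baseline levels, and the assumed treatment effect of 3 are hypothetical. The estimate is computed twice: from the four cell means, and as the interaction coefficient of a linear model.

```r
# Simulated two-group, two-period data (all numbers are hypothetical)
set.seed(7)
n <- 100  # observations per group-period cell
panel <- expand.grid(unit = 1:n, treated = c(0, 1), post = c(0, 1))
panel$outcome <- 20 + 1.5 * panel$treated - 2 * panel$post +
  3 * panel$treated * panel$post + rnorm(nrow(panel))

# Mean outcome in each (treated, post) cell
cell <- tapply(panel$outcome, list(panel$treated, panel$post), mean)

# DiD: change in the treated group minus change in the control group
did_means <- (cell["1", "1"] - cell["1", "0"]) -
  (cell["0", "1"] - cell["0", "0"])

# The same estimate as the interaction term of a regression
fit <- lm(outcome ~ treated * post, data = panel)
did_lm <- unname(coef(fit)["treated:post"])
```

Both numbers coincide because the regression with a full interaction reproduces the four cell means; the regression form is convenient when covariates or standard errors are needed.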

Introduction

In our previous example, we estimated the effect of different sales approaches in a randomized environment.

Suppose now that we have a retail chain with a presence in different cities or countries. We want to…


A quick intro to this classification algorithm using sklearn

Photo by Tom Robertson on Unsplash

Introduction

In this article, we will learn how to run a Decision Tree classifier using Python and the sklearn package. We will also see how to split the dataset into training and testing sets and how to measure the accuracy of the model. Finally, we will plot the model.

Modeling in Python

The whole model takes fewer than 25 lines of code, which is typical for some machine learning algorithms.


The role of R&D in the economy

Photo by Jéan Béller on Unsplash

Introduction

The Four Asian Tigers are the economies of South Korea, Taiwan, Singapore, and Hong Kong. The World Bank database (https://data.worldbank.org) does not include data for Taiwan, so we will look at the economies of only three of the Tigers.

Although these Tigers are in the same club, in this article we will explore some of their differences. For our analysis, we have obtained various features such as GDP, Electric power consumption, Patent applications, Researchers in R&D, Export/Import, and many others. …

Serafim Petrov

Data Analytics in R and Python, Machine Learning, Management, Innovations, Mathematics, Chess, Kafka, Joyce, Stravinsky, Rossini, Bubble Tea and Ice Cream
