Boosting (originally called hypothesis boosting) refers to any ensemble method that can combine several weak learners into a strong learner.

The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor. There are many boosting methods available, but by far the most popular ones are AdaBoost (short for Adaptive Boosting) and Gradient Boosting.

In the picture below, the single model is a decision tree: you train just one model on your data, and that’s it.

With Bagging, we train a lot of different trees separately and then…

Random Forest is very similar to bagged trees (I recommend reading my blog post about bagged trees), but the random forest adds an extra element of randomness, and that is what sets it apart from bagged trees.

To construct a random forest estimator, what we need is:

1- Bootstrap the entire dataset

2- Build a tree using only a random subset of the features at each node from the bootstrapped dataset. (New Step)

3- Repeat steps 1 and 2 many, many times, and aggregate all the trees.

4- Output a prediction from each tree

5- For regression, take the average of the predictions. …
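The steps above can be sketched with scikit-learn, whose `RandomForestRegressor` bundles them together (the synthetic dataset and hyperparameter values here are made up for illustration):

```python
# A minimal sketch of the random forest steps using scikit-learn.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=8, noise=10, random_state=0)

# n_estimators repeats steps 1 and 2 many times; max_features limits the
# random subset of features considered at each split (the "new step").
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               random_state=0)
forest.fit(X, y)

# Step 5: for regression, the forest averages each tree's prediction.
print(forest.predict(X[:3]))
```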

Any method that combines more than one model is known as an Ensemble Method. In this post, we are going to talk about bagged trees, which are one kind of ensemble method.

For any bagged-trees model, these are the steps we follow to get our model running:

1- Bootstrap the entire dataset

2- Build a tree using the bootstrapped dataset

3- Repeat steps 1 and 2 many times and aggregate all the trees

4- Output a prediction from each tree

5- For regression, take the average of the predictions. For classification, take the majority-predicted value
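The five steps above can be sketched by hand with NumPy and a scikit-learn decision tree (the synthetic dataset and number of trees are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5, random_state=1)
rng = np.random.default_rng(1)

trees = []
for _ in range(50):                                     # step 3: repeat many times
    idx = rng.integers(0, len(X), len(X))               # step 1: bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # step 2: build a tree

# steps 4-5: collect each tree's prediction, then average for regression
preds = np.mean([t.predict(X[:3]) for t in trees], axis=0)
print(preds)
```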

We know…

In this post, we introduce and develop the concept of independence between events. The general idea is the following:

If I tell you that a certain event A has occurred, this will generally change the probability of some other event B. Probabilities will have to be replaced by conditional probabilities. But if the conditional probability turns out to be the same as the unconditional probability, then the occurrence of event A does not carry any useful information on whether event B will occur. In such a case, we say that events A and B are independent. …
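A quick sanity check of this idea, assuming two fair dice (the particular events A and B below are my own illustrative choices):

```python
from fractions import Fraction

# Sample space: two fair dice, all 36 outcomes equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 6          # event A: first die shows 6
B = lambda w: w[1] % 2 == 0      # event B: second die is even

p_b = prob(B)
p_b_given_a = prob(lambda w: A(w) and B(w)) / prob(A)

# Independence: conditioning on A leaves the probability of B unchanged.
print(p_b, p_b_given_a)   # both 1/2
```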

In this post, we are going to talk about one of the most important algorithms in machine learning, which is called Decision Trees. They are important because without them other state-of-the-art algorithms like Bagging and Boosting wouldn’t have been invented.

First, let’s go over the pros and cons, and then elaborate on them one by one as we go along.

1- Can be used for both classification and regression

2- Can be displayed graphically, which makes them easily interpretable

3- Non-parametric

4- Features don’t need scaling

5- Automatically account for interactions

1- Tend to not perform very well compared to the…
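As a small illustration of pros 2 and 4, here is a minimal scikit-learn sketch (the dataset and the `max_depth` value are assumptions made for the example):

```python
# A minimal sketch: fit a shallow decision tree and print its split rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the tree's splits as plain-text rules, which is what
# makes trees easily interpretable; note no feature scaling was needed.
print(export_text(tree))
```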

K-Nearest Neighbors (aka KNN) is built on distance. In other words, we compute the similarity of each data point to all the other data points in a data set by calculating the distances between them, in order to find which points are closest to that specific data point.

To calculate the distance between data points, there is a vast variety of distance calculations; we are going to study a handful of distance metrics that are useful for the KNN algorithm.

The Manhattan distance is the sum of the absolute differences of the Cartesian coordinates (just a sum of numbers, not…
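A minimal sketch of the Manhattan distance (with the Euclidean distance alongside for comparison), assuming points given as coordinate sequences:

```python
def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

print(manhattan((1, 2), (4, 6)))  # 3 + 4 = 7
print(euclidean((1, 2), (4, 6)))  # sqrt(9 + 16) = 5.0
```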

First, let me introduce you to some vocabularies in the language of A/B Testing and optimization:

• **Element** — a discrete unit on the page: a block of text, a form, a button, an image, etc.

• **Page** — a web page or landing page that is considered the control page for your test.

• **Variation** — a version of a page that has some changes made to page elements. Also referred to as a variant.

• **Test** — a hypothesis that one version of an element will change the conversion rate in a significant, hopefully beneficial, way.

**Conversion**—…

In this blog post, we will write about regularization: we will discuss its purpose and how it works.

If there is one thing that jeopardizes a perfect neural network, it is overfitting. Overfitting refers to situations where the model has fit the training data so well that it captures the noise and random fluctuations.

I assume we already know about using a validation set and early stopping to prevent overfitting from happening. Unfortunately, these approaches are not 100% reliable. There may be certain situations where the validation loss stays the same or…
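To make the idea concrete, here is a minimal sketch of L2 regularization on a plain linear model using scikit-learn’s `Ridge` (a simple stand-in for the neural-network case; the synthetic data and the `alpha` value are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.1, size=30)   # only feature 0 truly matters

plain = LinearRegression().fit(X, y)
reg = Ridge(alpha=10.0).fit(X, y)              # L2 penalty shrinks the weights

# The penalized model's weights are pulled toward zero, which limits how
# closely it can chase noise and random fluctuations in the training data.
print(np.sum(plain.coef_ ** 2), np.sum(reg.coef_ ** 2))
```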

There are so many different types of machine learning systems that it is useful to classify them into broad categories, based on the following criteria:

- Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised, and reinforcement learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)

These criteria are not exclusive; you can combine them in any…

For simplicity, I decided to base most of my examples on a simple regression model (one independent variable and the target variable). However, they can be applied to a multiple linear regression model, and can indeed be extended to other forms of general linear models with a single target variable, such as ANOVA, ANCOVA, and independent-samples t-tests.

To build a better model in this regard, the consistency and efficiency of the estimator play a vital role; take our estimation method to be Ordinary Least Squares (OLS), as is usually the case.
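Under OLS, the estimates for a simple regression model have a closed form; here is a minimal sketch with made-up data:

```python
import numpy as np

# Illustrative data: one predictor x and one target y.
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# OLS closed form for simple regression:
# slope = cov(x, y) / var(x); the intercept fixes the line at the means.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```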
