We use logistic regression to estimate the probability that an event will occur as a function of other variables. An example is the probability that a borrower will default as a function of the borrower's credit score, income, loan size, and current debts. We will be discussing classifiers in the next lesson. Logistic regression can also be considered
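As a minimal sketch of the borrower example, the logistic function turns a weighted sum of the inputs into a probability between 0 and 1. The function name `default_probability` and the coefficient values below are assumptions made up for illustration; they are not fitted to any real data.

```python
import math


def default_probability(credit_score, income, loan_size, current_debt,
                        weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + weights . features)))."""
    features = (credit_score, income, loan_size, current_debt)
    linear_score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical coefficients, chosen only for illustration (not fitted).
WEIGHTS = (-0.01, -0.00002, 0.00001, 0.00003)
INTERCEPT = 5.0

p = default_probability(credit_score=700, income=50_000, loan_size=100_000,
                        current_debt=20_000, weights=WEIGHTS,
                        intercept=INTERCEPT)
print(f"Estimated probability of default: {p:.3f}")
```

In practice the weights would be estimated from historical loan data rather than set by hand.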

## Regression – Relating input variables and outcome

The term “regression” was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). Specifically, regression analysis helps one understand how the value of the dependent variable (also referred

## Apriori Algorithm

Association Rules is another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include:

- Which of my products tend to be purchased together?
- What will other people who are like this person or product tend to buy/watch or click on for other products we may have to
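The "purchased together" question can be sketched with a small level-wise search in the spirit of Apriori: find the frequent single items first, then build larger candidate itemsets only from smaller ones that are already frequent. The basket data and the `min_support` threshold below are hypothetical, and this is a simplified illustration rather than an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search manageable: an itemset cannot be frequent if any of its subsets is infrequent, so such candidates are discarded before their support is counted.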

## Association Rules

Association Rules is another unsupervised learning method. No “prediction” is performed; instead, the method is used to discover relationships within the data. Example questions include:

- Which of my products tend to be purchased together?
- What will other people who are like this person or product tend to buy/watch or click on for other products we may have to
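One common way to score a candidate rule such as "customers who buy X also tend to buy Y" is with support, confidence, and lift. These measures are not named in the text above; they are standard choices assumed here for illustration, and the basket data and the helper `rule_metrics` are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= set(t) for t in transactions)
    count_c = sum(consequent <= set(t) for t in transactions)
    count_both = sum((antecedent | consequent) <= set(t) for t in transactions)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_both / n              # how often X and Y appear together
    confidence = count_both / count_a     # how often Y appears when X does
    lift = confidence / (count_c / n)     # confidence relative to Y's base rate
    return support, confidence, lift


# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
support, confidence, lift = rule_metrics(baskets, {"diapers"}, {"beer"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two itemsets occur together more often than they would if they were independent.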

## K-means clustering – Use Cases

K-means clustering is often used as a lead-in to classification. It is primarily an exploratory technique, used to discover structure in the data that you might not have noticed before and as a prelude to more focused analysis or decision processes. Some examples of the set of measurements based on which clustering can be performed are detailed in the slide.
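To make the exploratory use concrete, here is a minimal sketch of the usual k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The two-dimensional measurements and the starting centroids are hypothetical, and fixed starting centroids are used only so the run is repeatable.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical measurements, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The resulting groups are a starting point for further analysis; the choice of k and of the starting centroids can change the result.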

## Clustering

In machine learning, “unsupervised” refers to the problem of finding a hidden structure within unlabeled data. In this lesson and the following lesson we will be discussing two unsupervised learning methods: clustering and Association Rules. Clustering is a popular method used to form homogeneous groups within a data set based on their internal structure. Clustering is a method often used

## Hypothesis Testing: ANOVA

ANOVA (Analysis of Variance) is a generalization of the difference of means test. Here we have multiple populations, and we want to see whether any of the population means differ from the others. The null hypothesis is therefore that ALL the population means are equal. An example: suppose everyone who visits our retail website either gets one of
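A minimal sketch of the one-way ANOVA test statistic follows: the F statistic is the ratio of the variation between group means to the variation within groups. The three groups of purchase amounts are hypothetical numbers invented for illustration, and `one_way_anova_f` is a helper name assumed here.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group / within-group mean squares."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # n_total - k degrees of freedom
    return ms_between / ms_within


# Hypothetical purchase amounts for three groups of visitors.
groups = [
    [20, 22, 19, 21],
    [25, 27, 26, 24],
    [20, 21, 22, 19],
]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F suggests that at least one group mean differs from the others; turning F into a p-value requires the F distribution with (k - 1, n_total - k) degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f_oneway`) can provide.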

## Hypothesis – Null and Alternative Hypothesis

Here are some examples of null and alternative hypotheses that we would be testing during the analytic lifecycle. Once we have fit a model – does it predict better than always predicting the mean value of the training data? If we call the mean value of the training data “the null model”, then the null hypothesis is that the average
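The comparison against the null model can be sketched by computing the prediction error of both and looking at the ratio. The training values and the model predictions below are hypothetical, and this only illustrates the comparison itself; it is not a formal significance test.

```python
def sum_squared_error(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical training outcomes and fitted-model predictions.
y_train = [3.0, 5.0, 7.0, 9.0]
model_predictions = [3.2, 4.9, 7.1, 8.8]

# The null model always predicts the mean of the training data.
null_prediction = sum(y_train) / len(y_train)
sse_null = sum_squared_error(y_train, [null_prediction] * len(y_train))
sse_model = sum_squared_error(y_train, model_predictions)

# Fraction of the null model's error that the fitted model removes (R-squared).
r_squared = 1.0 - sse_model / sse_null
print(f"SSE null={sse_null:.2f}  SSE model={sse_model:.2f}  R^2={r_squared:.3f}")
```

A value near 0 means the model does little better than always predicting the mean; a value near 1 means it removes most of the null model's error.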

## Data Exploration Vs. Presentation

Finally, we want to touch on the difference between using visualization for data exploration, and for presenting results to stakeholders. The plots and tips that we’ve discussed try to make the details of the data as clear as possible for the data scientist to see structure and relationships. These technical graphs don’t always effectively convey the information that needs to

## Establishing Multiple Pairwise Relationships between Variables

There are times when it’s useful to see multiple values of a dataset in context in order to visually represent data relationships so as to magnify differences or to show patterns hidden within the data that summary statistics don’t reveal. In the graphic represented above, the variables sepal length, sepal width, petal length, and petal width are compared with three