How Netflix Uses Correlations to Predict What We Like

Statistics may not have been the most popular subject in school, but it’s a VIP in technology. You’re witnessing statistical correlations every time you see suggestions based on purchase history or recommendations based on “likes.” Netflix is perhaps the all-star when it comes to correlations. How many times have you asked yourself, “How did Netflix know I would love that show?!” To understand how Netflix’s algorithm uses correlations to recommend shows we may enjoy, we’ll break down the statistical concepts behind it.

Wait, what is a correlation again?

A correlation measures how closely 2 things move together. One of the best examples is summer temperatures and ice cream sales. Ice cream sales are positively correlated with the temperature: as the temperature goes up, so do ice cream sales. Correlations are so powerful because you can compare 2 things with completely different units of measurement. In this example, ice cream sales are measured in dollars and temperature is measured in degrees Fahrenheit.

Positive vs Negative Correlations

Positive Correlations

  • If a change in one is associated with a change in the other in the same direction.
  • Example: Weight and height. Taller people weigh more on average than shorter people.

Negative Correlations

  • If a change in one is associated with a change in the other in the opposite direction.
  • Example: Exercise and weight. On average, the more you exercise, the less you weigh.

How to Measure It

  • A correlation goes from -1 to 1.
  • A correlation of 1 is often described as a perfect positive correlation. That means that every change in one variable is associated with an equivalent change in the other variable in the same direction.
  • A correlation of -1 is a perfect negative correlation. That means that a change in one variable is associated with an equivalent change in the other variable in the opposite direction.
  • A correlation of 0 means that the variables have no meaningful linear association with one another. Example: The relationship between shoe size and SAT scores.
  • The closer to -1 or 1, the stronger the association.

How to Calculate a Correlation

1. Convert the first measurement to standard units: (measurement – mean) / standard deviation

2. Convert the second measurement to standard units: (measurement – mean) / standard deviation

3. For each observation, calculate the product: (result from #1) × (result from #2)

4. Calculate the correlation coefficient by using the sum of the products calculated above divided by the number of observations.
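Written compactly, the four steps above amount to the formula below. The symbols are our own notation rather than anything from Netflix: n is the number of observations, x̄ and ȳ are the two means, and σ_x and σ_y are the two standard deviations (calculated by dividing by n).

```latex
r = \frac{1}{n} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{\sigma_x} \right)
    \left( \frac{y_i - \bar{y}}{\sigma_y} \right)
```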

How Netflix uses this to tell you which movies you will like

Disclaimer: This is a simplified version of what Netflix actually uses. Netflix offered a $1 million prize to the team that came up with the best algorithm, and the real system is a much fancier version of using correlations. The current rating system also has just two options, thumbs up or thumbs down (and they can do this because they have a ginormous dataset). Netflix basically calculates a correlation between individuals based on how they rate the same movies. The more positively correlated you are to someone, the more likely you are to like a movie they have rated positively that you haven’t rated yet. The more negatively correlated you are to someone, the more likely you are to dislike a movie they have rated positively, so it shouldn’t show up in your feed of suggested movies.
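To make that idea concrete, here is a minimal sketch of one way it could work. This is our own simplification, not Netflix's algorithm: each person's opinion of an unseen movie is weighted by how correlated they are with you. The function name `predicted_lean` and all the numbers are hypothetical.

```python
def predicted_lean(neighbors):
    """Correlation-weighted vote on one movie you have not rated.

    neighbors: list of (correlation_with_you, their_rating, their_mean_rating).
    A positive result suggests the movie belongs in your feed;
    a negative result suggests it should be left out.
    """
    score = 0.0
    for correlation, rating, mean_rating in neighbors:
        # How much this person liked the movie relative to their own average
        lean = rating - mean_rating
        # Similar viewers pull the score toward their opinion;
        # negatively correlated viewers push it the other way
        score += correlation * lean
    return score


# Hypothetical viewers who have all rated a movie you have not seen
neighbors = [
    (0.9, 5, 3.5),   # very similar to you, loved it
    (-0.6, 1, 3.0),  # opposite taste, hated it
    (0.1, 3, 3.0),   # barely related, neutral
]
score = predicted_lean(neighbors)
print(round(score, 2))  # 2.55
```

Notice that the person with opposite taste hating the movie counts in its favor, which is exactly the behavior described above.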

Netflix Example

Suppose we have 4 people who have rated movies on a scale of 1 to 5 stars, where 1 means they disliked the movie and 5 means they loved it.

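| Movie | Adam | Lindsay | Austin | Sarah |
|---|---|---|---|---|
| Top Gun | 4 | 1 | 5 | 2 |
| Jurassic Park | 5 | 2 | 5 | 3 |
| Office Space | 5 | 3 | 5 | 1 |
| Message in a Bottle | 1 | 4 | 1 | 5 |
| Sleepless in Seattle | 1 | 5 | 1 | 1 |
| Titanic | 4 | 1 | 5 | 3 |
| Predator | 5 | 2 | 5 | 2 |
| Terminator | 5 | 3 | 5 | 2 |
| Anchorman | 5 | 4 | 5 | 2 |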

Correlate Adam to 3 others

Let’s say we wanted to know how closely Adam is correlated to the other 3 people.

First, we need to calculate the mean (average) of Adam’s ratings.

Note: In Excel, you can use the AVERAGE function.

1. Sum all the ratings:

4 + 5 + 5 + 1 + 1 + 4 + 5 + 5 + 5 = 35

2. Divide by number of ratings:

35 / 9 = 3.89

Next, we need to calculate the standard deviation of Adam’s ratings

Note: In Excel, you can use the STDEVP function.

1. For each rating, subtract the mean and square the result:

(4 - 3.89)^2 = 0.01
(5 - 3.89)^2 = 1.23
...

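2. Calculate the average of the results:

= (0.01 + 1.23 + 1.23 + 8.35 + 8.35 + 0.01 + 1.23 + 1.23 + 1.23) / 9
= 22.89 / 9
= 2.54

(The sum is 22.89 when computed from unrounded values; adding the rounded values shown gives 22.87, which still rounds to 2.54 after dividing by 9.)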

3. Take the square root of the Average

= √(2.54) = 1.59
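If you would rather check this arithmetic in code than in Excel, a short sketch using Python's standard `statistics` module looks like this. The variable names are ours; `pstdev` is the population standard deviation, which divides by the number of ratings just like STDEVP.

```python
from statistics import mean, pstdev

# Adam's ratings, in the order the movies appear in the table
adam = [4, 5, 5, 1, 1, 4, 5, 5, 5]

adam_mean = mean(adam)   # same idea as Excel's AVERAGE
adam_std = pstdev(adam)  # population standard deviation, like STDEVP

print(round(adam_mean, 2))  # 3.89
print(round(adam_std, 2))   # 1.59
```

Be careful not to swap in `statistics.stdev` (or Excel's STDEV). Those divide by one less than the number of ratings and would give a slightly larger standard deviation than the 1.59 used here.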

Convert Adam’s rating of each movie to standard units

This calculation is (rating – mean) / standard deviation.

Top Gun = (4 - 3.89) / 1.59 = 0.07
Jurassic Park = (5 - 3.89) / 1.59 = 0.70
...
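The same conversion can be sketched in a few lines of Python. The helper name `standard_units` is our own, and it assumes the population standard deviation used above.

```python
from statistics import mean, pstdev


def standard_units(ratings):
    """Convert each rating to standard units: (rating - mean) / standard deviation."""
    m = mean(ratings)
    sd = pstdev(ratings)
    return [(r - m) / sd for r in ratings]


movies = [
    "Top Gun", "Jurassic Park", "Office Space", "Message in a Bottle",
    "Sleepless in Seattle", "Titanic", "Predator", "Terminator", "Anchorman",
]
adam = [4, 5, 5, 1, 1, 4, 5, 5, 5]

adam_su = standard_units(adam)
for movie, su in zip(movies, adam_su):
    print(f"{movie}: {su:.2f}")
```

A handy sanity check is that standard units always add up to (approximately) zero, because the ratings above and below the mean cancel out.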

Next, we need to follow the previous 3 steps for each person

I won’t bore you by doing this over and over; I’ll just show you the results we have so far. Note: S/U = Std. Units

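| Movie | Adam | S/U | Lindsay | S/U | Austin | S/U | Sarah | S/U |
|---|---|---|---|---|---|---|---|---|
| Top Gun | 4 | 0.07 | 1 | -1.35 | 5 | 0.53 | 2 | -0.29 |
| Jurassic Park | 5 | 0.7 | 2 | -0.59 | 5 | 0.53 | 3 | 0.58 |
| Office Space | 5 | 0.7 | 3 | 0.17 | 5 | 0.53 | 1 | -1.15 |
| Message in a Bottle | 1 | -1.81 | 4 | 0.93 | 1 | -1.87 | 5 | 2.31 |
| Sleepless in Seattle | 1 | -1.81 | 5 | 1.69 | 1 | -1.87 | 1 | -1.15 |
| Titanic | 4 | 0.07 | 1 | -1.35 | 5 | 0.53 | 3 | 0.58 |
| Predator | 5 | 0.7 | 2 | -0.59 | 5 | 0.53 | 2 | -0.29 |
| Terminator | 5 | 0.7 | 3 | 0.17 | 5 | 0.53 | 2 | -0.29 |
| Anchorman | 5 | 0.7 | 4 | 0.93 | 5 | 0.53 | 2 | -0.29 |
| Mean | 3.89 | | 2.78 | | 4.11 | | 2.33 | |
| Std Deviation | 1.59 | | 1.31 | | 1.66 | | 1.15 | |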

Multiply the standard units of each person together

Adam's Top Gun Standard Unit (0.07) * 
Lindsay's Top Gun Standard Unit (-1.35)
= 0.07 * -1.35 = -0.09

Again, I don’t want to bore you, so let’s just show the results of Adam compared to Lindsay so far.

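| Movie | Adam | Lindsay | Adam: Std. Units | Lindsay: Std. Units | Product |
|---|---|---|---|---|---|
| Top Gun | 4 | 1 | 0.07 | -1.35 | -0.09 |
| Jurassic Park | 5 | 2 | 0.7 | -0.59 | -0.41 |
| Office Space | 5 | 3 | 0.7 | 0.17 | 0.12 |
| Message in a Bottle | 1 | 4 | -1.81 | 0.93 | -1.68 |
| Sleepless in Seattle | 1 | 5 | -1.81 | 1.69 | -3.06 |
| Titanic | 4 | 1 | 0.07 | -1.35 | -0.09 |
| Predator | 5 | 2 | 0.7 | -0.59 | -0.41 |
| Terminator | 5 | 3 | 0.7 | 0.17 | 0.12 |
| Anchorman | 5 | 4 | 0.7 | 0.93 | 0.65 |
| Mean | 3.89 | 2.78 | | | |
| Std Deviation | 1.59 | 1.31 | | | |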

Finally: Calculate the correlation coefficient

The Correlation Coefficient is the final number we need to compare 2 people. You simply take the average of the Product column.

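= (-0.09 + -0.41 + 0.12 + -1.68 + -3.06 + -0.09 + -0.41 + 0.12 + 0.65) / 9
= -4.88 / 9
= -0.54

(The sum is -4.88 when computed from unrounded products; adding the rounded values shown gives -4.85, which still rounds to -0.54 after dividing by 9.)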
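For anyone who wants to reproduce the Adam and Lindsay numbers without a spreadsheet, here is a small sketch in Python. The helper `standard_units` is our own name, and the code works from unrounded values.

```python
from statistics import mean, pstdev

adam = [4, 5, 5, 1, 1, 4, 5, 5, 5]
lindsay = [1, 2, 3, 4, 5, 1, 2, 3, 4]


def standard_units(ratings):
    """(rating - mean) / population standard deviation, for every rating."""
    m, sd = mean(ratings), pstdev(ratings)
    return [(r - m) / sd for r in ratings]


# One product per movie: Adam's standard unit times Lindsay's
products = [a * l for a, l in zip(standard_units(adam), standard_units(lindsay))]

# The correlation coefficient is the average of those products
r = sum(products) / len(products)

print(round(sum(products), 2))  # -4.88
print(round(r, 2))              # -0.54
```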

What does this mean?

Our Correlation Coefficient between Adam and Lindsay ended up being -0.54. This is a moderate negative correlation, which means that Adam and Lindsay will usually not like the same movies. If this had been -1, it would be a perfect negative correlation: any time Lindsay likes a movie, Adam would more than likely dislike it. Repeating the calculation for every pair of people gives a Matrix that makes it easier to compare any 2 people.

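| | Adam | Lindsay | Austin | Sarah |
|---|---|---|---|---|
| Adam | 1 | -0.54 | 0.97 | -0.34 |
| Lindsay | -0.54 | 1 | -0.7 | -0.1 |
| Austin | 0.97 | -0.7 | 1 | -0.31 |
| Sarah | -0.34 | -0.1 | -0.31 | 1 |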

From this, you can see that Adam and Austin are very positively correlated (0.97) so they should like very similar things.
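A matrix like this is tedious to build by hand, so here is a sketch of how it could be generated in Python from the ratings in the example. The names `ratings`, `correlation`, `matrix`, and `best_match` are ours, and the lookup of Adam's closest match is just one illustrative way to use the result.

```python
from statistics import mean, pstdev

ratings = {
    "Adam":    [4, 5, 5, 1, 1, 4, 5, 5, 5],
    "Lindsay": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "Austin":  [5, 5, 5, 1, 1, 5, 5, 5, 5],
    "Sarah":   [2, 3, 1, 5, 1, 3, 2, 2, 2],
}


def correlation(xs, ys):
    """Average of the products of the two people's standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / len(xs)


# Every person compared with every person (including themselves)
matrix = {
    a: {b: round(correlation(ratings[a], ratings[b]), 2) for b in ratings}
    for a in ratings
}

# The person whose taste is closest to Adam's
best_match = max(
    (person for person in ratings if person != "Adam"),
    key=lambda person: matrix["Adam"][person],
)

for person, row in matrix.items():
    print(person, row)
print(best_match)  # Austin
```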

Step-by-Step to Compare 2 People

1. Get the mean rating for all movies for Person 1

2. Get the standard deviation for all movies for Person 1

3. Convert the rating of each movie to standard units for Person 1: (rating – mean) / standard deviation

4. Get the mean rating for all movies for Person 2

5. Get the standard deviation for all movies for Person 2

6. Convert the rating of each movie to standard units for Person 2: (rating – mean) / standard deviation

7. Calculate the product for each movie rating (Standard Unit for Person 1’s rating of the movie * Standard Unit for Person 2’s rating of the movie)

8. Calculate the correlation coefficient (Sum of the products from #7 divided by the number of movies they rated)
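The eight steps above translate fairly directly into a single function. This is a minimal sketch under a few assumptions of our own: both people have rated exactly the same movies in the same order, and the function name `correlation` is just a label we picked.

```python
from statistics import mean, pstdev


def correlation(person1, person2):
    """Follow the eight steps for two equal-length lists of ratings."""
    if len(person1) != len(person2):
        raise ValueError("both people must have rated the same movies")

    # Steps 1-3: mean, standard deviation, and standard units for Person 1
    mean1 = mean(person1)
    std1 = pstdev(person1)
    units1 = [(rating - mean1) / std1 for rating in person1]

    # Steps 4-6: the same for Person 2
    mean2 = mean(person2)
    std2 = pstdev(person2)
    units2 = [(rating - mean2) / std2 for rating in person2]

    # Step 7: product of the two standard units for each movie
    products = [u1 * u2 for u1, u2 in zip(units1, units2)]

    # Step 8: sum of the products divided by the number of movies
    return sum(products) / len(products)


adam = [4, 5, 5, 1, 1, 4, 5, 5, 5]
austin = [5, 5, 5, 1, 1, 5, 5, 5, 5]
print(round(correlation(adam, austin), 2))  # 0.97
```

One limitation to be aware of: if someone gives every movie the same rating, their standard deviation is 0 and the division in steps 3 or 6 fails, so a real system would need to handle that case separately.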

Summary

Hopefully, this is of some use to you. Try to think of some scenarios where you might be able to use correlations to compare 2 things. We have a few projects where we need to recommend something or predict whether someone will like it, and correlations are a great fit for that.


