Movie Recommendation System Using ALS

In this tutorial we will develop a movie recommendation system using the Spark MLlib ALS (alternating least squares) algorithm. The workflow has the following steps:

  1. Load the sample data.

  2. Parse the data into the input format for the ALS algorithm.

  3. Split the data into two parts: one for building the model and one for testing the model.

  4. Run the ALS algorithm to train a user-product matrix factorization model.

  5. Make predictions with the training data and observe the results.

  6. Test the model with the test data.
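As an illustration of step 3, here is a minimal plain-Python sketch of a random train/test split. The function name `split_ratings`, the 80/20 ratio, and the seed are assumptions made for the example; the tutorial does not prescribe them. In Spark itself the same idea is usually expressed with the `randomSplit` method on an RDD or DataFrame.

```python
import random


def split_ratings(ratings, train_fraction=0.8, seed=42):
    """Shuffle the ratings and split them into (training, test) lists.

    train_fraction and seed are illustrative defaults, not values
    taken from the tutorial.
    """
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = list(ratings)       # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder items stand in for parsed rating records.
training, test = split_ratings(range(10))
print(len(training), len(test))
```

Keeping the test portion out of training is what makes the error measured in step 6 an estimate of how the model behaves on ratings it has not seen.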

MovieLens Dataset

In the following example, we load ratings data from the MovieLens dataset, where each row consists of a user, a movie, a rating and a timestamp. We then train an ALS model which assumes, by default, that the ratings are explicit (implicitPrefs is false). We evaluate the recommendation model by measuring the root-mean-square error (RMSE) of its rating predictions.
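To make the evaluation metric concrete, the sketch below computes the root-mean-square error over pairs of predicted and actual ratings in plain Python. The helper name `rmse` and the sample numbers are invented for illustration; in Spark ML the same metric is typically obtained from `RegressionEvaluator` with `metricName="rmse"`.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    squared_error = sum((predicted - actual) ** 2 for predicted, actual in pairs)
    return math.sqrt(squared_error / len(pairs))


# Made-up predictions paired with made-up true ratings.
sample = [(4.0, 5.0), (3.0, 3.0), (2.0, 4.0)]
print(rmse(sample))  # sqrt((1 + 0 + 4) / 3), about 1.291
```

A lower value means the predicted ratings sit closer to the ratings users actually gave.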

  • We will use two files from the MovieLens dataset: “ratings.dat” and “movies.dat”. All ratings are in “ratings.dat”, one per row, in the following format:

UserID::MovieID::Rating::Timestamp

  • Movie information is in the file “movies.dat” and is in the following format:

MovieID::Title::Genres
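The two formats above can be parsed by splitting each line on the `::` delimiter. The following is a minimal sketch in plain Python; the names `Rating`, `parse_rating`, and `parse_movie` and the sample lines are hypothetical, and the Genres field is kept as a raw string because its internal layout is not described here.

```python
from typing import NamedTuple, Tuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), float(rating), int(timestamp))


def parse_movie(line: str) -> Tuple[int, str, str]:
    """Parse one 'MovieID::Title::Genres' line into (id, title, genres)."""
    movie_id, title, genres = line.strip().split("::")
    return int(movie_id), title, genres


# Illustrative lines in the documented formats (not taken from the dataset).
print(parse_rating("1::10::4::978300000"))
print(parse_movie("10::Example Movie (1999)::Drama"))
```

The (user, movie, rating) triple from each parsed record is what the ALS training step consumes; the movie table is only needed to turn predicted movie IDs back into titles.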

Create training examples

To make recommendations for you, we will learn your taste by asking you to rate a few movies. We have selected a small set of movies that have received the most ratings from users in the MovieLens dataset.

Please copy personalRatings.txt.template to personalRatings.txt and replace ?s with your ratings.
