Practice 6: Spark MLLib - Movie Recomendation System
In this practice we will develop movie recomendation system using Spark MLlib ALS algorithm.The common workflow will have the following steps:
Load the sample data.
Parse the data into the input format for the ALS algorithm.
Split the data into two parts: one for building the model and one for testing the model.
Run the ALS algorithm to build/train a user product matrix model.
Make predictions with the training data and observe the results.
Test the model with the test data
1. Load MovieLens Data
Load MovieLens Data to SparkSQLContext and prepare the follwoing DataFrames: movies, ratings, personalRatings.
Please copy personalRatings.txt.template and replace <?> with your ratings.
For example:
0::1::0::1400000000::Toy Story (1995)
0::780::5::1400000000::Independence Day (a.k.a. ID4) (1996)
0::590::3::1400000000::Dances with Wolves (1990)
0::1210::2::1400000000::Star Wars: Episode VI - Return of the Jedi (1983)
0::648::2::1400000000::Mission: Impossible (1996)
2. Prepare training and testing sets
Split the ratings into two non-overlapping subsets, named training, test.
// Split dataset into training and testing parts
val Array(training, test) = ratingWithMyRats.randomSplit(Array(0.5, 0.5))
3. Initialize ALS() function and Train Model
Initialize ALS object and define names of corresponding colums of your dataframe:
// Build the recommendation model using ALS on the training data
val als = new ALS()
.setMaxIter(3)
.setRegParam(0.01)
.setUserCol("userId")
.setItemCol("movieId")
.setRatingCol("rating")
//Get trained model
val model = als.fit(training)
4. Calculate predictions on the test dataset:
val predictions = model.transform(test).na.drop
5. Evaluate the model by RMSE
val evaluator = new RegressionEvaluator()
.setMetricName("rmse")
.setLabelCol("rating")
.setPredictionCol("prediction")
val rmse = evaluator.evaluate(predictions)
println(s"Root-mean-square error = $rmse")
6. Add your personal ratings
Load your personalRating into dataframe and union with the original rating dataframe:
val ratingWithMyRats = ratings.union(myRating)
//Get My Predictions
val myPredictions = model.transform(myRating).na.drop