Assignment 2

In this assignment you will develop a Book manager and recommender.

Overview

Many companies use recommender systems to suggest products, information, and friends to their customers . These systems analyse usage patterns to suggest what a user would prefer from among a collection of things he has never experienced. Amazon Recommendations Recommender systems are big business - recently Netflix ran an open competition for a recommender algorithm. The winners prize was $1 million. In this assignment, you will develop a similar recommender algorithm for Movies but, unfortunately, there's no cash prize involved.

The main aims of this assignment is to write a Java program that can:

Book Data

You are provided with a small book data set. You will need to examine the data files and read the README file to design and develop suitable abstract data types to represent the data in your Java program (hint: User, Book and Rating classes). The data files are in CSV text format - you will have to write your own code that can parse the Movielens data. small book data set .

The following code example uses ADTs from Stanford University's standard library to read in a text file line by line, parse the data, and print the data to the standard output :

        File usersFile = new File("data/users.dat");
        In inUsers = new In(usersFile);
          //each field is separated(delimited) by a '|'
        String delims = "[|]";
        while (!inUsers.isEmpty()) {
            // get user and rating from data source
            String userDetails = inUsers.readLine();

            // parse user details string
            String[] userTokens = userDetails.split(delims);

            // output user data to console.
            if (userTokens.length == 7) {
                System.out.println("UserID: "+userTokens[0]+",First Name:"+
                        userTokens[1]+",Surname:" + userTokens[2]+",Age:"+
                        Integer.parseInt(userTokens[3])+",Gender:"+userTokens[4]+",Occupation:"+
                        userTokens[5]);

            }else
            {
                throw new Exception("Invalid member length: "+userTokens.length);
            }
        }

The above code produces output similar to the following:

UserID: 1, First Name:Leonard, Surname:Hernandez, Age:24, Gender:M, Occupation:technician UserID: 2, First Name:Melody, Surname:Roberson, Age:53, Gender:F, Occupation:other UserID: 3, First Name:Gregory, Surname:Newton, Age:23, Gender:M, Occupation:writer UserID: 4, First Name:Oliver, Surname:George, Age:24, Gender:M, Occupation:technician UserID: 5, First Name:Jenna, Surname:Parker, Age:33, Gender:F, Occupation:other

You will also need to use suitable data structures that will support the functionality of the program.

Serializer

Develop a serializer that can serialize/deserialize all the data into a single file. You must use a well known external data format such as XML, JSON or both.

Recommender API

Develop a recommender api that provides at least the following interface description. The interface desciption is in the form methodName(parameter list). The parameter list includes required parameters - you can include more if you wish.

addUser(firstName,lastName,age,gender,occupation)
removeUser(userID)
addBook(title, year, url)
addRating(userID, movieID, rating)
getBook(movieID)
getUserRatings(userID)
getUserRecommendations(userID)
getTopFiveBooks()
load()
write()

Recommender Application

Use the api to develop an end user application. This can be a simple command line interface that allows a user to access the functionality in the Recommender API(e.g. list books, rate a movie, get movie details.) . One consideration here is to include a user login facility in the application.

Making recommendations

The following are two possible approaches for implementing the recommendation function.

Approach 1:

A simple approach would be to make almost the same prediction for every member. In this case the program would simply calculate the average rating for all the books in the database, sort the books by rating and then from that sorted list, suggest the top 5 books that the member hasn't already rated. With this simple approach, the only information unique to the member used by the prediction algorithm was whether or not the member had seen a book.

Approach 2:

A better prediction about what a member might like by considering his actual ratings in the past and how these ratings compare to the ratings given by other customers. Consider how you decide on book recommendations from friends. If a friend tells you about a number of books that s(he) enjoyed and you also enjoyed them, then when your friend recommends another book that you have never seen, you probably are willing to go see it. On the other hand, if you and a different friend always tend to disagree about books, you are not likely to go to see a book this friend recommends. A program can calculate how similar two users are by treating their ratings as a vector and calculating the dot product of these two vectors (the dot product is just the sum of the products of each of the corresponding elements.) For example, suppose we had 3 books in our database and Paddy rated them [5, 3,-5], Clodagh rated them [1, 5,-3], Ronan rated them [5, -3, 5], and Michael rated them [1, 3, 0].

The similarity between Paddy and Ronan is: (5 x 5) + (3 x -3) + (-5 x 5) = 25 - 9 - 25 = -9. The similarity between Paddy and Clodagh is: (5 x 1) + (3 x 5) + (-5 x -3) = 5 + 15 + 15 = 35. The similarity between Paddy and Michael is: (5 x 1) + (3 x 3) + (-5 x 0) = 5 + 9 + 0 = 14.

We see that if both people like a book (rating it with a positive number) it increases their similarity and if both people dislike a book (both giving it a negative number) it also increases their similarity. Once you have calculated the similarity between Paddy and every other customer, you can then identify whose ratings are most similar to Paddy's. In this case Clodagh is most similar to Paddy, so we would recommend to Paddy the top books from Clodagh's list that Paddy hadn't already rated.

Testing

Approx 20% of marks in this assignment is for the testing strategy. You are required to provide the following:

Assignment Submission

Please submit assignment as a Github or Bitbucket repository. The following are expected to be included:

Deadlines

There are two assignment submission deadlines:

-13th June (before Summer holidays)

-1st September (after Summer holidays)

You must submit before 1st Sept. and be present for a code review in the 1st week of Septmber. Alternatively, if you wish to submit and finish before the Summer, you can submit anytime before 13th June and be available to attend a code review. However, You CANNOT submit twice (i.e. for both deadlines.)

Marking

All code should be internally documented and your repository should contain by a readme file that clearly and concisely explains the features of your program. General marking scheme, including above expectations, is as follows:

Load External Data: 10; Serialiser: 10; Add/Remove objects: 15; Get top 5 books: 10; Rate book: 10; Reccomendations: 25; Testing: 20.