Josh Adell recently published a blog post on the similarity-based recommendation engine he is building for rating and recommending beer, always a welcome service! His post shares his experience with Gremlin, a graph traversal language. I'm going to take his example and show how the same thing can be done using Cypher, Neo4j's query language.
Basically, we want to be able to recommend beers. But we don't just want the highly rated beers: a rating means much more to us if it comes from people whose tastes are similar to ours.
The goal is to answer two questions:
1. For a beer that I have not rated, what is the average rating given to it by people similar to me?
2. Which beers should I try and then rate? We want to recommend beers that have been rated 7 or higher by people similar to me.
To answer both questions, we first need to find people with similar tastes. For our purposes, Josh defines a similar user as one whose ratings are, on average, within 2 points of my ratings for the same items.
We'll use this simple graph to test our Cypher queries (the number on each RATED relationship is the value of its rating property):
(user1) -[:RATED 3]->(itemA)
(user2) -[:RATED 2]->(itemA)
(user3) -[:RATED 7]->(itemA)
(user1) -[:RATED 8]->(itemB)
(user2) -[:RATED 7]->(itemB)
(user3) -[:RATED 4]->(itemB)
(user2) -[:RATED 5]->(itemC)
(user3) -[:RATED 9]->(itemC)
(user2) -[:RATED 8]->(itemD)
(user3) -[:RATED 4]->(itemD)
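To check the sample data by hand, here is a minimal Python sketch. It is not part of Josh's code or of Neo4j; the `ratings` dictionary and the `average_gap` helper are names made up for illustration. It computes the average rating gap between user1 and every other user over the items both have rated:

```python
# Sample graph from above as a plain dictionary: user -> {item: rating}.
ratings = {
    "user1": {"itemA": 3, "itemB": 8},
    "user2": {"itemA": 2, "itemB": 7, "itemC": 5, "itemD": 8},
    "user3": {"itemA": 7, "itemB": 4, "itemC": 9, "itemD": 4},
}


def average_gap(me, other):
    """Mean absolute rating difference over items both users rated, or None."""
    shared = ratings[me].keys() & ratings[other].keys()
    if not shared:
        return None
    return sum(abs(ratings[me][i] - ratings[other][i]) for i in shared) / len(shared)


# Gap between user1 and each of the other users.
gaps = {u: average_gap("user1", u) for u in ratings if u != "user1"}
print(gaps)  # {'user2': 1.0, 'user3': 4.0}
```

The gap is 1.0 for user2 and 4.0 for user3, so only user2 falls within the 2-point threshold.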
Considering user1, the user with similar tastes is user2. Let's take a look at the Cypher query:
start me = node(user1) //Look up user1 via an index or some other means
match (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
where abs(myRating.rating-otherRating.rating)<=2
return u
The match clause finds all items that I rated which have also been rated by other users, represented by u.
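The where clause keeps only the pairs of ratings on the same item that differ by at most 2 points. Note that this compares ratings item by item rather than averaging the differences first. For user1, the result of this query is user2, returned once for each item that matches; return distinct u would collapse the duplicates.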
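The same pattern match can be mimicked outside the database. The following Python sketch only illustrates what the query does, assuming the graph is small enough to hold in a dictionary; `ratings` and `similar_rows` are hypothetical names, not Neo4j APIs. Like the match clause, it produces one row per shared item, so a user can appear more than once:

```python
# Sample graph as a plain dictionary: user -> {item: rating}.
ratings = {
    "user1": {"itemA": 3, "itemB": 8},
    "user2": {"itemA": 2, "itemB": 7, "itemC": 5, "itemD": 8},
    "user3": {"itemA": 7, "itemB": 4, "itemC": 9, "itemD": 4},
}


def similar_rows(me, max_gap=2):
    """One row per shared item on which another user's rating is within max_gap of mine."""
    rows = []
    for item, my_rating in ratings[me].items():
        for other, other_ratings in ratings.items():
            if other == me or item not in other_ratings:
                continue  # skip myself and users who did not rate this item
            if abs(my_rating - other_ratings[item]) <= max_gap:
                rows.append(other)
    return rows


rows = similar_rows("user1")
print(rows)               # ['user2', 'user2'], one row per matching item
print(sorted(set(rows)))  # ['user2'], the distinct similar users
```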
Let's try to answer question 1: what is the average rating (by similar people) of a beer not rated by me?
start item=node(x), //Look up item via an index or other means
similarUsers=node(u) //similarUsers here is the result received in the first query above
match (similarUsers)-[r:RATED]->(item)
return AVG(r.rating)
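As a rough cross-check of this query against the sample graph, the averaging step can be sketched in Python. The names `ratings`, `similar_users` and `average_rating` are assumptions for illustration, and `similar_users` is filled in by hand with the result of the first query for user1:

```python
# Sample graph as a plain dictionary: user -> {item: rating}.
ratings = {
    "user1": {"itemA": 3, "itemB": 8},
    "user2": {"itemA": 2, "itemB": 7, "itemC": 5, "itemD": 8},
    "user3": {"itemA": 7, "itemB": 4, "itemC": 9, "itemD": 4},
}

similar_users = ["user2"]  # result of the first query for user1


def average_rating(item, users):
    """Average rating given to item by the listed users, or None if none rated it."""
    scores = [ratings[u][item] for u in users if item in ratings[u]]
    return sum(scores) / len(scores) if scores else None


print(average_rating("itemC", similar_users))  # 5.0
print(average_rating("itemD", similar_users))  # 8.0
```

With only one similar user the average is simply that user's rating; with more similar users in the list, each of their ratings for the item would contribute equally.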
And finally, question 2: which beers (that I haven't tried) have been rated 7 or higher by users similar to me?
start me=node(user1), //Look up user1 via an index or other means
similarUsers=node(3) //similarUsers here is the result received in the first query above
match (similarUsers)-[r:RATED]->(item)
where r.rating >= 7 and not((me)-[:RATED]->(item))
return item
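To see what this query should return for the sample graph, here is a small Python sketch of the same filter. It is an illustration only, with the hypothetical names `ratings` and `recommend`, and it takes the list of similar users as given:

```python
# Sample graph as a plain dictionary: user -> {item: rating}.
ratings = {
    "user1": {"itemA": 3, "itemB": 8},
    "user2": {"itemA": 2, "itemB": 7, "itemC": 5, "itemD": 8},
    "user3": {"itemA": 7, "itemB": 4, "itemC": 9, "itemD": 4},
}


def recommend(me, users, min_rating=7):
    """Items rated min_rating or higher by the listed users that `me` has not rated."""
    found = set()
    for u in users:
        for item, rating in ratings[u].items():
            if rating >= min_rating and item not in ratings[me]:
                found.add(item)
    return sorted(found)


print(recommend("user1", ["user2"]))  # ['itemD']
```

itemB is also rated 7 by user2, but user1 has already rated it, so only itemD is recommended.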
There you go! Of course, there's much more to recommending beers. Read more about it in Josh's post.
-Luanne
