Collaborative filtering

From Citizendium
Revision as of 19:03, 11 August 2010 by imported>Yash Prabhu
Jump to navigation Jump to search
All unapproved Citizendium articles may contain errors of fact, bias, grammar etc. A version of an article is unapproved unless it is marked as citable with a dedicated green template at the top of the page, as in this version of the 'Biology' article. Citable articles are intended to be of reasonably high quality. The participants in the Citizendium project make no representations about the reliability of Citizendium articles or, generally, their suitability for any purpose.

Nuvola apps kbounce green.png
Nuvola apps kbounce green.png
This article is currently being developed as part of an Eduzendium student project. The course homepage can be found at CZ:Special_Topics_2010.
To provide students with experience in collaboration, you are warmly invited to join in here, or to leave comments on the discussion page. The anticipated date of course completion is 13 August 2010. One month after that date at the latest, this notice shall be removed.
Besides, many other Citizendium articles welcome your collaboration!


This article is a stub and thus not approved.
Main Article
Discussion
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and subject to a disclaimer.

Definition

Collaborative Filtering(CF) refers to the use of software algorithms for narrowing down a large set of choices by using collaboration among multiple agents, viewpoints, and data sources.

Overview

The term Collaborative Filtering was first coined by the makers of one of the earliest recommendation systems, Tapestry. The basic assumption in CF is that user A and user B's personal tastes are correlated if both users rate n items similarly.

Unlike Content-based filtering, Collaborative filtering does not require content; instead, CF requires ratings for an item to make a prediction for it. Rating is an association of a user and an item by means of a value. Rating can be either explicit or implicit. Explicit rating requires the user to rate an item. Implicit rating infers from the user's actions. (Refer paper)

Collaborative Filtering systems follow this approach to produce recommendations:
1. Gather ratings from users and maintain user's ratings in a database.
2. Compute the correlations between pairs of users to determine a user’s neighbors in taste space
3. Compute the ratings of these neighbors to make recommendations.

Collaborative Filtering Techniques

Collaborative Filtering techniques can be separated into 3 classes:

1. Memory-based(Heuristic) Recommendation Technique

Memory-based algorithms make predictions by operating on data (users, items and ratings) stored in memory. They can be classified into:
1. Nearest Neighbor algorithms.
2. Top-N recommendation algorithms

Nearest neighbor algorithms are the most commonly used Memory based CF algorithms. Users similar to the current user with respect to preferences are called as neighbors. Nearest neighbor algorithms can be further classified into User-based nearest neighbor and Item-based nearest neighbor algorithms.

User-based nearest neighbor algorithms generate predictions for a given user based on ratings provided by users in the neighborhood. A common approach to user-based nearest neighbor algorithm is:
1. Weight all users with respect to similarity with the current user. Similarity weighting is done by either using:

  • Pearson correlation coefficient between ratings for current user c, and another user, u.
  • Cosine-based correlation where in the two users c and u are considered to be two vectors in an m-dimensional space and the similarity between the two is measured by computing the cosine of the angle between them.

2. Select a subset of the users (neighbors) to use as predictors.

  • Neighbor selection is done by finding similar users or users having a similarity weighting above a certain threshold.

3. Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings.
4. Present items with highest predicted ratings as recommendations.

Item-based algorithms generate predictions based on similarities between items.
1. The correlation of each item ik with other items is computed.
2. For each user ui , the ratings of the items highly correlated with ik are aggregated.

Top-N recommendations
Top-N recommendation is to recommend a set of N top-ranked items that will be of interest to a certain user. Top-N recommendation techniques analyze the user-item matrix to discover relations between different users or items and use them to compute the recommendations.

User-based Top-N Recommendation Algorithms.

  • Identify the k most similar users (nearest neighbors) to the active user using the Pearson correlation or vector-space model
  • After the k most similar users have been discovered, their corresponding rows in the user-item matrix R are aggregated to identify a set of items, C, purchased by the group together with their frequency.
  • With the set C, user-based CF techniques then recommend the top-N most frequent items in C that the active user has not purchased.
  • User-based top-N recommendation algorithms have limitations related to scalability and real-time performance

Item-Based Top-N Recommendation Algorithms.

  • The algorithms firstly compute the k most similar items for each item according to the similarities;
  • then identify the set, C, as candidates of recommended items by taking the union of the k most similar items and removing each of the items in the set, U, that the user has already purchased;
  • then calculate the similarities between each item of the set C and the set U.
  • The resulting set of the items in C, sorted in decreasing order of the similarity, will be the recommended item-based Top-N list
  • Item-based top-N recommendation algorithms have been developed to address the scalability problem of user-based top-N recommendation algorithms.

Advantages of Memory-based algorithms:
1. It is easy to implement.
2. It can detect complex patterns easily.
3. It scales well with correlated items.
4. It does not require the content of items, only the ratings are sufficient.
5. New data can be added easily.

Disadvantages of Memory-based algorithms:
1. It depends on human ratings.
2. Correlations are skewed when data is sparse.
3. Time and memory requirements increase linearly with the number of users and ratings.
4. It cannot recommend for new users and items.

2. Model-based Recommendation Technique

Model-based algorithms use the collection of ratings to learn a model, which is then used to make rating predictions.

3. Hybrid Recommendation Technique

Challenges of Collaborative Filtering

  • Data sparsity (421425 all)
  • Scalability
  • Synonymy
  • Gray Sheep
  • Shilling
  • Privacy and Security
  • Trust 4321

Collaborative Filtering in e-commerce

CF algorithms are employed in several popular websites such as:

  • Amazon
  • Netflix
  • eBay
  • Google News
  • Hulu
  • Internet Movie Database


References