Datasets

lightfm.datasets.movielens.fetch_movielens(data_home=None, indicator_features=True, genre_features=False, min_rating=0.0, download_if_missing=True)

Fetch the Movielens 100k dataset.

The dataset contains 100,000 interactions from 943 users on 1,682 movies (roughly 1,000 users and 1,700 movies), and is described in detail in its README.

Parameters:
  • data_home (path, optional) – Path to the directory in which the downloaded data should be placed. Defaults to ~/lightfm_data/.
  • indicator_features (bool, optional) – Use an [n_items, n_items] identity matrix for item features. When True together with genre_features, indicator and genre features are concatenated into a single feature matrix of shape [n_items, n_items + n_genres].
  • genre_features (bool, optional) – Use an [n_items, n_genres] matrix for item features. When True together with indicator_features, indicator and genre features are concatenated into a single feature matrix of shape [n_items, n_items + n_genres].
  • min_rating (float, optional) – Minimum rating to include in the interaction matrix. Interactions with a lower rating are left out.
  • download_if_missing (bool, optional) – Download the data if not present. Raises an IOError if False and data is missing.
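
The effect of the feature flags and of min_rating can be illustrated on toy data. The following is a minimal sketch using scipy.sparse only. It does not call LightFM, and the toy matrices, the variable names, and the assumption that the threshold is inclusive (ratings equal to min_rating are kept) are illustrative rather than taken from the library source.

```python
import numpy as np
import scipy.sparse as sp

# Toy stand-ins (assumptions for illustration, not MovieLens data).
n_items, n_genres = 4, 3
genres = sp.csr_matrix(np.array([
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
], dtype=np.float32))

# indicator_features=True alone: one unique feature per item.
indicators = sp.identity(n_items, dtype=np.float32, format="csr")

# Both flags True: the two blocks are concatenated column-wise,
# giving shape [n_items, n_items + n_genres].
item_features = sp.hstack([indicators, genres]).tocsr()
print(item_features.shape)  # (4, 7)

# min_rating: keep only interactions whose rating reaches the threshold
# (assumed inclusive here).
ratings = sp.coo_matrix(np.array([
    [5, 0, 3],
    [0, 2, 4],
], dtype=np.float32))
min_rating = 4.0
keep = ratings.data >= min_rating
filtered = sp.coo_matrix(
    (ratings.data[keep], (ratings.row[keep], ratings.col[keep])),
    shape=ratings.shape,
)
print(filtered.nnz)  # 2
```

The filtered matrix keeps its original [n_users, n_items] shape; only the number of stored interactions changes.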

Returns:

A dictionary containing the following keys:
  • train (sp.coo_matrix of shape [n_users, n_items]) – Contains training set interactions.
  • test (sp.coo_matrix of shape [n_users, n_items]) – Contains testing set interactions.
  • item_features (sp.csr_matrix of shape [n_items, n_item_features]) – Contains item features.
  • item_feature_labels (np.array of strings of shape [n_item_features,]) – Labels of item features.
  • item_labels (np.array of strings of shape [n_items,]) – Items’ titles.
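
The shape relationships listed above can be checked programmatically. The sketch below is one possible way to do so; check_shapes and load_movielens_sizes are hypothetical helper names, not part of LightFM, and the toy dictionary uses made-up sizes so that the check runs without downloading anything.

```python
import numpy as np
import scipy.sparse as sp


def check_shapes(data):
    """Verify the documented shape relationships and return the three sizes."""
    n_users, n_items = data["train"].shape
    n_item_features = data["item_features"].shape[1]
    assert data["test"].shape == (n_users, n_items)
    assert data["item_features"].shape[0] == n_items
    assert data["item_feature_labels"].shape == (n_item_features,)
    assert data["item_labels"].shape == (n_items,)
    return n_users, n_items, n_item_features


# Toy dictionary with the documented keys (made-up sizes, not MovieLens data).
toy = {
    "train": sp.coo_matrix((3, 4), dtype=np.float32),
    "test": sp.coo_matrix((3, 4), dtype=np.float32),
    "item_features": sp.identity(4, dtype=np.float32, format="csr"),
    "item_feature_labels": np.array(["a", "b", "c", "d"]),
    "item_labels": np.array(["a", "b", "c", "d"]),
}
sizes = check_shapes(toy)
print(sizes)  # (3, 4, 4)


def load_movielens_sizes():
    """Hypothetical helper: fetch the real data (may download it) and check it."""
    # Imported lazily so the toy check above runs without LightFM installed.
    from lightfm.datasets import fetch_movielens
    return check_shapes(fetch_movielens())
```

Calling load_movielens_sizes() requires LightFM and, on first use, network access, since download_if_missing defaults to True.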