Dataset construction

class lightfm.data.Dataset(user_identity_features=True, item_identity_features=True)[source]

Bases: object

Tool for building interaction and feature matrices, taking care of the mapping between user/item ids and feature names and internal feature indices.

To create a dataset:

  • Create an instance of the Dataset class.

  • Call fit (or fit_partial), supplying the user/item ids and feature names that you want to use in your model. This will create internal mappings that translate the ids and feature names to internal indices used by the LightFM model.

  • Call build_interactions with an iterable of (user id, item id) or (user id, item id, weight) to build the interactions and weights matrices.

  • Call build_user_features or build_item_features with iterables of (user/item id, [features]) or (user/item id, {feature: feature weight}) to build feature matrices.

  • To add new user/item ids or features, call fit_partial again. You will need to resize your LightFM model to be able to use the new features.

Parameters
  • user_identity_features (bool, optional) – Create a unique feature for every user in addition to other features. If true (default), a latent vector will be allocated for every user. This is a reasonable default for most applications, but should be set to false if there is very little data for every user. For more details see the Notes in LightFM.

  • item_identity_features (bool, optional) – Create a unique feature for every item in addition to other features. If true (default), a latent vector will be allocated for every item. This is a reasonable default for most applications, but should be set to false if there is very little data for every item. For more details see the Notes in LightFM.

build_interactions(data)[source]

Build an interaction matrix.

Two matrices will be returned: a (num_users, num_items) COO matrix with interactions, and a (num_users, num_items) matrix with the corresponding interaction weights.

Parameters

data (iterable of (user_id, item_id) or (user_id, item_id, weight)) – An iterable of interactions. The user and item ids will be translated to internal model indices using the mappings constructed during the fit call. If weights are not provided they will be assumed to be 1.0.

Returns

(interactions, weights) – Two COO matrices: the interactions matrix and the corresponding weights matrix.

Return type

COO matrix, COO matrix

build_item_features(data, normalize=True)[source]

Build an item features matrix out of an iterable of the form (item id, [list of feature names]) or (item id, {feature name: feature weight}).

Parameters
  • data (iterable of (item id, [list of feature names]) or (item id, {feature name: feature weight})) – Item ids and feature names will be translated to the internal indices constructed during the fit call.

  • normalize (bool, optional) – If true, will ensure that feature weights sum to 1 in every row.

Returns

feature matrix – Matrix of item features.

Return type

CSR matrix (num items, num features)
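The effect of normalize=True can be illustrated with plain arithmetic. The helper below is a hypothetical stand-in written for this example, not the library's implementation, and it assumes non-negative feature weights.

```python
def normalize_row(weights):
    """Scale a {feature name: weight} row so that its weights sum to 1."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("cannot normalize a row with zero total weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical row: an identity feature plus two named features.
row = {"item:42": 1.0, "genre:rock": 1.0, "decade:1990s": 2.0}
normalized = normalize_row(row)
print(normalized)
```

Pass normalize=False when the absolute feature weights are meaningful and should reach the model unchanged.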

build_user_features(data, normalize=True)[source]

Build a user features matrix out of an iterable of the form (user id, [list of feature names]) or (user id, {feature name: feature weight}).

Parameters
  • data (iterable of (user id, [list of feature names]) or (user id, {feature name: feature weight})) – User ids and feature names will be translated to the internal indices constructed during the fit call.

  • normalize (bool, optional) – If true, will ensure that feature weights sum to 1 in every row.

Returns

feature matrix – Matrix of user features.

Return type

CSR matrix (num users, num features)

fit(users, items, user_features=None, item_features=None)[source]

Fit the user/item id and feature name mappings.

Calling fit a second time resets the existing mappings.

Parameters
  • users (iterable of user ids) –

  • items (iterable of item ids) –

  • user_features (iterable of user features, optional) –

  • item_features (iterable of item features, optional) –

fit_partial(users=None, items=None, user_features=None, item_features=None)[source]

Fit the user/item id and feature name mappings.

Calling fit_partial a second time adds new entries to the existing mappings.

Parameters
  • users (iterable of user ids, optional) –

  • items (iterable of item ids, optional) –

  • user_features (iterable of user features, optional) –

  • item_features (iterable of item features, optional) –

interactions_shape()[source]

Return a tuple of (num users, num items).

item_features_shape()[source]

Return the shape of the item features matrix.

Returns

(num item ids, num item features) – The shape.

Return type

tuple of ints

mapping()[source]

Return the constructed mappings.

Invert these to map internal indices to external ids.

Returns

(user id map, user feature map, item id map, item feature map)

Return type

tuple of dictionaries
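Inverting a mapping is a one-line dictionary comprehension. The sketch below uses a hypothetical dictionary shaped like the user id map (external id to internal index) rather than one returned by a fitted dataset.

```python
# Hypothetical mapping shaped like the user id map: external id -> internal index.
user_id_map = {"alice": 0, "bob": 1, "carol": 2}

# Invert it: internal index -> external id.
index_to_user = {index: user_id for user_id, index in user_id_map.items()}

# Translate internal indices (for example, rows of a model's output) back to ids.
internal_indices = [2, 0]
external_ids = [index_to_user[i] for i in internal_indices]
print(external_ids)
```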

model_dimensions()[source]

Return a tuple (num user features, num item features) giving the number of user and item feature embeddings a LightFM model needs for this dataset.

user_features_shape()[source]

Return the shape of the user features matrix.

Returns

(num user ids, num user features) – The shape.

Return type

tuple of ints