obp.dataset.real

Dataset Class for Real-World Logged Bandit Feedback.

Classes

OpenBanditDataset(behavior_policy, campaign, …)

Class for loading and preprocessing the Open Bandit Dataset.

class obp.dataset.real.OpenBanditDataset(behavior_policy: str, campaign: str, data_path: pathlib.Path = PosixPath('obd'), dataset_name: str = 'obd')[source]

Bases: obp.dataset.base.BaseRealBanditDataset

Class for loading and preprocessing the Open Bandit Dataset.

Note

Users are free to implement their own feature engineering by overriding the pre_process method.

Parameters
  • behavior_policy (str) – Name of the behavior policy that generated the logged bandit feedback data. Must be either ‘random’ or ‘bts’.

  • campaign (str) – Name of the ZOZOTOWN campaign to load. Must be one of “all”, “men”, or “women”.

  • data_path (Path, default=Path(‘./obd’)) – Path that stores Open Bandit Dataset.

  • dataset_name (str, default=’obd’) – Name of the dataset.
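The constraints on behavior_policy and campaign listed above can be checked before constructing the dataset. The following is a minimal sketch; check_dataset_args and the two constants are hypothetical names introduced for illustration and are not part of obp.

```python
# Hypothetical helper, not part of obp: checks the documented argument constraints.
VALID_BEHAVIOR_POLICIES = ("random", "bts")
VALID_CAMPAIGNS = ("all", "men", "women")


def check_dataset_args(behavior_policy: str, campaign: str) -> None:
    """Raise ValueError if an argument is outside the documented set of values."""
    if behavior_policy not in VALID_BEHAVIOR_POLICIES:
        raise ValueError(
            f"behavior_policy must be one of {VALID_BEHAVIOR_POLICIES}, "
            f"got {behavior_policy!r}"
        )
    if campaign not in VALID_CAMPAIGNS:
        raise ValueError(
            f"campaign must be one of {VALID_CAMPAIGNS}, got {campaign!r}"
        )


# A valid combination passes silently.
check_dataset_args("bts", "men")
```

Whether the class itself performs an equivalent check at construction time is not stated here, so validating up front is a defensive choice rather than a requirement.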

References

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. “Large-scale Open Dataset, Pipeline, and Benchmark for Bandit Algorithms”, 2020.

classmethod calc_on_policy_policy_value_estimate(behavior_policy: str, campaign: str, data_path: pathlib.Path = PosixPath('obd'), test_size: float = 0.3, is_timeseries_split: bool = False) → float[source]

Calculate on-policy policy value estimate (used as a ground-truth policy value).

Parameters
  • behavior_policy (str) – Name of the behavior policy that generated the log data. Must be either ‘random’ or ‘bts’.

  • campaign (str) – One of the three possible campaigns considered in ZOZOTOWN (i.e., “all”, “men”, and “women”).

  • data_path (Path, default=Path(‘./obd’)) – Path that stores Open Bandit Dataset.

  • test_size (float, default=0.3) – Proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.

  • is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.

Returns

on_policy_policy_value_estimate – Policy value of the behavior policy estimated by on-policy estimation, i.e., \(\mathbb{E}_{\mathcal{D}} [r_t]\), where \(\mathbb{E}_{\mathcal{D}}[\cdot]\) is the empirical average over the \(T\) observations in \(\mathcal{D}\). This value is used as a ground-truth policy value in the evaluation of OPE estimators.

Return type

float
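The empirical average \(\mathbb{E}_{\mathcal{D}} [r_t]\) is simply the mean of the logged rewards. The sketch below computes it on toy data with the standard library only; on_policy_value_estimate is a hypothetical helper and the reward values are made up, so this illustrates the formula rather than the library's implementation.

```python
from statistics import fmean
from typing import Sequence


def on_policy_value_estimate(rewards: Sequence[float]) -> float:
    """Empirical average of logged rewards, i.e. (1 / T) * sum over t of r_t."""
    if len(rewards) == 0:
        raise ValueError("rewards must contain at least one observation")
    return fmean(rewards)


# Toy rewards (1 = click, 0 = no click); the values are made up for illustration.
toy_rewards = [0, 1, 0, 0, 1, 0, 0, 0]
estimate = on_policy_value_estimate(toy_rewards)  # 2 clicks out of 8 rounds
```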

load_raw_data() → None[source]

Load the raw Open Bandit Dataset.

obtain_batch_bandit_feedback(test_size: float = 0.3, is_timeseries_split: bool = False) → Dict[str, Union[int, numpy.ndarray]][source]

Obtain batch logged bandit feedback.

Parameters
  • test_size (float, default=0.3) – Proportion of the dataset to include in the evaluation split. Must be between 0.0 and 1.0.

  • is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.

Returns

bandit_feedback – Batch logged bandit feedback collected by a behavior policy.

Return type

BanditFeedback
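A time-series split keeps the rounds in logged order and cuts them at a single point, so that the evaluation split contains only the most recent rounds. The following is a minimal sketch under that assumption; split_by_time is a hypothetical helper, and which part of the split the method returns is not specified here.

```python
from typing import List, Sequence, Tuple


def split_by_time(
    rounds: Sequence[int], test_size: float = 0.3
) -> Tuple[List[int], List[int]]:
    """Split rounds, assumed to be sorted by time, into earlier and later parts."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0.0 and 1.0")
    # The later part holds roughly a test_size fraction of all rounds.
    n_earlier = int(len(rounds) * (1.0 - test_size))
    return list(rounds[:n_earlier]), list(rounds[n_earlier:])


# Eight toy rounds indexed in chronological order.
earlier, later = split_by_time(list(range(8)), test_size=0.25)
```

No shuffling takes place, so every round in the later part follows every round in the earlier part.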

pre_process() → None[source]

Preprocess the raw Open Bandit Dataset.

Note

This is the default feature engineering. Override this method to implement your own preprocessing; see https://github.com/st-tech/zr-obp/blob/master/examples/examples_with_obd/custom_dataset.py for an example.
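The override pattern can be sketched with a small stand-in class. ToyDataset and ScaledDataset below are hypothetical and are not obp classes; the sketch assumes that the dataset loads the raw data and then calls pre_process during construction, and the raw rows are made up.

```python
from typing import Dict, List


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not obp's implementation."""

    def __init__(self) -> None:
        self.raw_rows: List[Dict[str, float]] = []
        self.context: List[List[float]] = []
        self.load_raw_data()
        self.pre_process()

    def load_raw_data(self) -> None:
        # Made-up rows standing in for the raw log.
        self.raw_rows = [{"age": 20.0, "hour": 9.0}, {"age": 35.0, "hour": 21.0}]

    def pre_process(self) -> None:
        # Default feature engineering: use the raw columns as they are.
        self.context = [[row["age"], row["hour"]] for row in self.raw_rows]


class ScaledDataset(ToyDataset):
    def pre_process(self) -> None:
        # Custom feature engineering: rescale each feature to roughly [0, 1].
        self.context = [
            [row["age"] / 100.0, row["hour"] / 24.0] for row in self.raw_rows
        ]


default_context = ToyDataset().context
custom_context = ScaledDataset().context
```

Only pre_process changes in the subclass; loading and all other behavior are inherited unchanged.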

sample_bootstrap_bandit_feedback(test_size: float = 0.3, is_timeseries_split: bool = False, random_state: Optional[int] = None) → Dict[str, Union[int, numpy.ndarray]][source]

Obtain bootstrap logged bandit feedback.

Parameters
  • test_size (float, default=0.3) – Proportion of the dataset to include in the evaluation split. Must be between 0.0 and 1.0.

  • is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.

  • random_state (int, default=None) – Controls the random seed in bootstrap sampling.

Returns

bandit_feedback – Logged bandit feedback sampled independently from the original data with replacement.

Return type

BanditFeedback
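Bootstrap sampling draws as many rounds as the original data contains, uniformly at random and with replacement, so some rounds may appear several times and others not at all. The sketch below uses the standard library's random module as a stand-in for whatever random number generator the library uses; bootstrap_indices is a hypothetical helper and the rewards are made up.

```python
import random
from typing import List, Optional


def bootstrap_indices(n_rounds: int, random_state: Optional[int] = None) -> List[int]:
    """Draw n_rounds indices uniformly at random with replacement."""
    rng = random.Random(random_state)  # a fixed seed makes the draw repeatable
    return [rng.randrange(n_rounds) for _ in range(n_rounds)]


# Toy rewards; the values are made up for illustration.
toy_rewards = [0, 1, 0, 0, 1]
idx = bootstrap_indices(len(toy_rewards), random_state=12345)

# Apply the same indices to every logged array to keep rounds aligned.
resampled_rewards = [toy_rewards[i] for i in idx]
```

Passing the same random_state reproduces the same sample, which is useful when comparing estimators on identical bootstrap draws.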

property dim_context

Dimensions of context vectors.

property len_list

Length of recommendation lists.

property n_actions

Number of actions.

property n_rounds

Total number of rounds contained in the logged bandit dataset.
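The four properties summarize the shape of the logged data. The following sketch shows one plausible way to derive such quantities from toy arrays, assuming that actions and positions are 0-indexed and that every action and position appears at least once; how the library computes them is not stated here, and all values are made up.

```python
# Toy logged data; the field names are illustrative and the values are made up.
action = [2, 0, 1, 2]    # index of the item shown in each round
position = [0, 1, 2, 0]  # slot in the recommendation list
context = [
    [0.1, 0.0, 1.0],
    [0.3, 1.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.9, 1.0, 1.0],
]

n_rounds = len(action)         # one entry per logged round
n_actions = max(action) + 1    # assumes 0-indexed actions, all observed
len_list = max(position) + 1   # assumes 0-indexed positions, all observed
dim_context = len(context[0])  # number of features per context vector
```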