obp.dataset.real

Dataset Class for Real-World Logged Bandit Feedback.

Classes

OpenBanditDataset(behavior_policy, campaign, …)

Class for loading and preprocessing Open Bandit Dataset.

class obp.dataset.real.OpenBanditDataset(behavior_policy: str, campaign: str, data_path: pathlib.Path = PosixPath('obd'), dataset_name: str = 'obd')

Bases: obp.dataset.base.BaseRealBanditDataset

Class for loading and preprocessing Open Bandit Dataset.

Note

Users are free to implement their own feature engineering by overriding the pre_process method.

Parameters

behavior_policy (str) – Name of the behavior policy that generated the logged bandit feedback data. Must be either ‘random’ or ‘bts’.
campaign (str) – Name of the ZOZOTOWN campaign. Must be one of “all”, “men”, or “women”.
data_path (Path, default=Path(‘./obd’)) – Path to the directory that stores the Open Bandit Dataset.
dataset_name (str, default=’obd’) – Name of the dataset.

References

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. “Large-scale Open Dataset, Pipeline, and Benchmark for Bandit Algorithms,” 2020.

classmethod calc_on_policy_policy_value_estimate(behavior_policy: str, campaign: str, data_path: pathlib.Path = PosixPath('obd'), test_size: float = 0.3, is_timeseries_split: bool = False) → float

Calculate on-policy policy value estimate (used as a ground-truth policy value).

Parameters

behavior_policy (str) – Name of the behavior policy that generated the log data. Must be either ‘random’ or ‘bts’.
campaign (str) – Name of the ZOZOTOWN campaign. Must be one of “all”, “men”, or “women”.
data_path (Path, default=Path(‘./obd’)) – Path to the directory that stores the Open Bandit Dataset.
test_size (float, default=0.3) – Proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.

Returns

on_policy_policy_value_estimate – Policy value of the behavior policy estimated by on-policy estimation, i.e., \(\mathbb{E}_{\mathcal{D}} [r_t]\), where \(\mathbb{E}_{\mathcal{D}}[\cdot]\) is the empirical average over \(T\) observations in \(\mathcal{D}\). This value is used as a ground-truth policy value in the evaluation of OPE estimators.

Return type

float
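The on-policy estimate described above is just the empirical mean of the observed rewards. A minimal sketch, using a made-up reward list rather than the real dataset (the helper name `on_policy_value` is hypothetical and not part of obp):

```python
def on_policy_value(rewards):
    """Empirical average of observed rewards, i.e. E_D[r_t] over T rounds."""
    if not rewards:
        raise ValueError("rewards must contain at least one observation")
    return sum(rewards) / len(rewards)


# Hypothetical click log from a behavior policy: 1 = click, 0 = no click.
rewards = [0, 1, 0, 0, 1, 0, 0, 0]
estimate = on_policy_value(rewards)
```

Because the rewards were generated by the behavior policy itself, no importance weighting is needed; this is what makes the value usable as a reference point when comparing OPE estimators.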

load_raw_data() → None: Load raw open bandit dataset.

obtain_batch_bandit_feedback(test_size: float = 0.3, is_timeseries_split: bool = False) → Dict[str, Union[int, numpy.ndarray]]

Obtain batch logged bandit feedback.

Parameters

test_size (float, default=0.3) – Proportion of the dataset to include in the evaluation split. Must be between 0.0 and 1.0.
is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.

Returns

bandit_feedback – Batch logged bandit feedback collected by a behavior policy.

Return type

BanditFeedback
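One way to read the test_size and is_timeseries_split parameters is as a chronological hold-out: the earliest rounds are kept and the last test_size fraction is set aside. The sketch below illustrates that reading only; it is an assumption about the intended behavior, not the library's implementation, and `timeseries_split` is a hypothetical helper.

```python
def timeseries_split(records, test_size=0.3):
    """Split records, assumed sorted by time, into earlier and later parts."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0.0 and 1.0")
    # Number of earliest rounds that stay out of the evaluation split.
    n_train = int(len(records) * (1.0 - test_size))
    return records[:n_train], records[n_train:]


# Eight hypothetical rounds, identified by their position in time.
rounds = list(range(8))
train, evaluation = timeseries_split(rounds, test_size=0.25)
```

Splitting in time order rather than at random avoids evaluating on rounds that precede the data a policy was trained on.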

pre_process() → None: Preprocess raw open bandit dataset.

Note

This is the default feature engineering. Override this method to implement your own preprocessing; see https://github.com/st-tech/zr-obp/blob/master/examples/examples_with_obd/custom_dataset.py for an example.
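The override pattern can be sketched without the real dataset. The classes below are hypothetical stand-ins, not part of obp, and the sketch assumes the constructor calls pre_process after loading the raw data, so that a subclass only needs to redefine that one method.

```python
class ToyBanditDataset:
    """Hypothetical stand-in for a dataset class; not the obp class."""

    def __init__(self, raw_rows):
        self.raw_rows = raw_rows
        # Assumption: preprocessing runs once at construction time.
        self.pre_process()

    def pre_process(self):
        # Default feature engineering: use the raw features unchanged.
        self.context = [list(row) for row in self.raw_rows]


class ToyDatasetWithInteraction(ToyBanditDataset):
    def pre_process(self):
        # Custom feature engineering: append the product of the two raw features.
        self.context = [[a, b, a * b] for a, b in self.raw_rows]


dataset = ToyDatasetWithInteraction([(1, 2), (3, 4)])
```

With the real class, the subclass would inherit from OpenBanditDataset instead and set whatever attributes the default pre_process sets; the linked custom_dataset.py shows the actual version.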

sample_bootstrap_bandit_feedback(test_size: float = 0.3, is_timeseries_split: bool = False, random_state: Optional[int] = None) → Dict[str, Union[int, numpy.ndarray]]

Obtain bootstrap logged bandit feedback.

Parameters

test_size (float, default=0.3) – Proportion of the dataset to include in the evaluation split. Must be between 0.0 and 1.0.
is_timeseries_split (bool, default=False) – If true, split the original logged bandit feedback data by time series.
random_state (int, default=None) – Controls the random seed in bootstrap sampling.

Returns

bandit_feedback – Logged bandit feedback sampled independently from the original data with replacement.

Return type

BanditFeedback
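Bootstrap sampling draws as many rounds as the original data contains, with replacement, and applies the same drawn indices to every field so that actions and rewards stay paired. A minimal sketch on a made-up feedback dictionary (the helper `sample_bootstrap` and the dictionary layout are illustrative assumptions, and plain lists stand in for NumPy arrays):

```python
import random


def sample_bootstrap(feedback, random_state=None):
    """Resample all fields of a feedback dict with one shared set of indices."""
    n_rounds = len(feedback["reward"])
    rng = random.Random(random_state)  # the seed controls reproducibility
    # Draw n_rounds indices with replacement.
    idx = [rng.randrange(n_rounds) for _ in range(n_rounds)]
    return {key: [values[i] for i in idx] for key, values in feedback.items()}


# Hypothetical logged feedback with five rounds.
feedback = {"action": [2, 0, 1, 1, 2], "reward": [0, 1, 0, 0, 1]}
sample_a = sample_bootstrap(feedback, random_state=12345)
sample_b = sample_bootstrap(feedback, random_state=12345)
```

Passing the same random_state twice yields the same resample, which is useful when the bootstrap is repeated to build confidence intervals for an estimator.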

property dim_context: Dimensions of context vectors.

property len_list: Length of recommendation lists.

property n_actions: Number of actions.

property n_rounds: Total number of rounds contained in the logged bandit dataset.