obp.policy.contextfree

Context-Free Bandit Algorithms.

Classes

  • BernoulliTS(n_actions, len_list, batch_size, …) – Bernoulli Thompson Sampling policy.

  • EpsilonGreedy(n_actions, len_list, …) – Epsilon Greedy policy.

  • Random(n_actions, len_list, batch_size, …) – Random policy.

class obp.policy.contextfree.BernoulliTS(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, alpha: Optional[numpy.ndarray] = None, beta: Optional[numpy.ndarray] = None, is_zozotown_prior: bool = False, campaign: Optional[str] = None, policy_name: str = 'bts')[source]

Bases: obp.policy.base.BaseContextFreePolicy

Bernoulli Thompson Sampling policy.

Parameters
  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

  • alpha (array-like, shape (n_actions, ), default=None) – Prior alpha parameters of the per-action Beta distributions.

  • beta (array-like, shape (n_actions, ), default=None) – Prior beta parameters of the per-action Beta distributions.

  • is_zozotown_prior (bool, default=False) – Whether to use the Beta prior hyperparameters that were used at the start of the data collection period on ZOZOTOWN.

  • campaign (str, default=None) – One of the three campaigns considered in ZOZOTOWN: “all”, “men”, or “women”.

  • policy_name (str, default=’bts’) – Name of bandit policy.
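
A minimal instantiation sketch (assuming obp is installed); the ten actions and the uniform Beta(1, 1) priors are illustrative values, not requirements:

    import numpy as np
    from obp.policy.contextfree import BernoulliTS

    # Bernoulli TS over 10 actions; 3 slots per impression as in the Open Bandit Dataset.
    bts = BernoulliTS(
        n_actions=10,
        len_list=3,
        batch_size=1,
        random_state=12345,
        alpha=np.ones(10),  # prior alpha of each action's Beta distribution (illustrative)
        beta=np.ones(10),   # prior beta of each action's Beta distribution (illustrative)
    )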

compute_batch_action_dist(n_rounds: int = 1, n_sim: int = 100000) → numpy.ndarray[source]

Compute the distribution over actions by Monte Carlo simulation.

Parameters
  • n_rounds (int, default=1) – Number of rounds in the distribution over actions (the size of the first axis of action_dist).

  • n_sim (int, default=100000) – Number of simulations in the Monte Carlo simulation to compute the distribution over actions.

Returns

action_dist – Probability estimates of each arm being the best one for each sample, action, and position.

Return type

array-like, shape (n_rounds, n_actions, len_list)
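
For instance, continuing from the sketch above, the action distribution can be approximated as follows (shapes follow the documented return type):

    # `bts` is the BernoulliTS instance from the earlier sketch.
    action_dist = bts.compute_batch_action_dist(n_rounds=1000, n_sim=100000)
    print(action_dist.shape)  # (1000, 10, 3) = (n_rounds, n_actions, len_list)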

initialize() → None

Initialize Parameters.

select_action() → numpy.ndarray[source]

Select a list of actions.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float) → None[source]

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.
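
A sketch of the interactive loop implied by select_action and update_params; the Bernoulli reward simulator below is a hypothetical stand-in for logged feedback and is not part of obp:

    import numpy as np

    rng = np.random.default_rng(0)
    true_ctr = rng.uniform(0.01, 0.10, size=10)  # hypothetical ground-truth click rates

    for _ in range(1000):
        selected_actions = bts.select_action()  # array of shape (len_list, )
        top_action = int(selected_actions[0])   # use feedback for the top slot only (a simplification)
        reward = float(rng.binomial(n=1, p=true_ctr[top_action]))
        bts.update_params(action=top_action, reward=reward)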

property policy_type

Type of the bandit policy.

class obp.policy.contextfree.EpsilonGreedy(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, epsilon: float = 1.0)[source]

Bases: obp.policy.base.BaseContextFreePolicy

Epsilon Greedy policy.

Parameters
  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

  • epsilon (float, default=1.) – Exploration hyperparameter; must take a value in the range [0., 1.].

  • policy_name (str, default=f’egreedy_{epsilon}’) – Name of bandit policy.
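
A minimal instantiation sketch; epsilon=0.1 is an illustrative choice (the default of 1.0 makes the policy explore uniformly at random):

    from obp.policy.contextfree import EpsilonGreedy

    egreedy = EpsilonGreedy(
        n_actions=10,
        len_list=3,
        batch_size=1,
        random_state=12345,
        epsilon=0.1,  # explore with probability 0.1, otherwise exploit the empirically best arms
    )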

initialize() → None

Initialize Parameters.

select_action() → numpy.ndarray[source]

Select a list of actions.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float) → None[source]

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.
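
As with BernoulliTS, the policy alternates select_action and update_params; the sketch below uses a hypothetical reward simulator and a larger batch_size to illustrate batched parameter updates (batch_size controls how many samples feed one update):

    import numpy as np
    from obp.policy.contextfree import EpsilonGreedy

    rng = np.random.default_rng(1)
    true_ctr = rng.uniform(0.01, 0.10, size=10)  # hypothetical ground-truth click rates

    egreedy_batched = EpsilonGreedy(
        n_actions=10, len_list=3, batch_size=10, epsilon=0.1, random_state=0
    )
    for _ in range(500):
        action = int(egreedy_batched.select_action()[0])
        reward = float(rng.binomial(n=1, p=true_ctr[action]))
        egreedy_batched.update_params(action=action, reward=reward)  # one update per batch_size samples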

property policy_type

Type of the bandit policy.

class obp.policy.contextfree.Random(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, epsilon: float = 1.0, policy_name: str = 'random')[source]

Bases: obp.policy.contextfree.EpsilonGreedy

Random policy.

Parameters
  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

  • epsilon (float, default=1.) – Exploration hyperparameter; must take a value in the range [0., 1.].

  • policy_name (str, default=’random’) – Name of bandit policy.
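
A minimal instantiation sketch; since Random subclasses EpsilonGreedy with epsilon defaulting to 1.0, select_action draws the list of actions uniformly at random:

    from obp.policy.contextfree import Random

    random_policy = Random(
        n_actions=10,
        len_list=3,
        random_state=12345,
    )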

compute_batch_action_dist(n_rounds: int = 1, n_sim: int = 100000) → numpy.ndarray[source]

Compute the distribution over actions by Monte Carlo simulation.

Parameters
  • n_rounds (int, default=1) – Number of rounds in the distribution over actions (the size of the first axis of action_dist).

Returns

action_dist – Probability estimates of each arm being the best one for each sample, action, and position.

Return type

array-like, shape (n_rounds, n_actions, len_list)
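
A sketch of the call; for a uniformly random policy each probability estimate is expected to be about 1 / n_actions (0.1 for ten actions), though the exact values depend on the implementation:

    action_dist = random_policy.compute_batch_action_dist(n_rounds=100)
    print(action_dist.shape)     # (100, 10, 3) = (n_rounds, n_actions, len_list)
    print(action_dist[0, :, 0])  # roughly uniform probabilities over the 10 actions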

initialize() → None

Initialize Parameters.

select_action() → numpy.ndarray

Select a list of actions.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float) → None

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.

property policy_type

Type of the bandit policy.