obp.policy.contextfree

Context-Free Bandit Algorithms.
Classes

    BernoulliTS      Bernoulli Thompson Sampling Policy
    EpsilonGreedy    Epsilon Greedy policy.
    Random           Random policy
class obp.policy.contextfree.BernoulliTS(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, alpha: Optional[numpy.ndarray] = None, beta: Optional[numpy.ndarray] = None, is_zozotown_prior: bool = False, campaign: Optional[str] = None, policy_name: str = 'bts')

Bases: obp.policy.base.BaseContextFreePolicy

Bernoulli Thompson Sampling Policy
Parameters:
    n_actions (int) – Number of actions.
    len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
    batch_size (int, default=1) – Number of samples used in a batch parameter update.
    random_state (int, default=None) – Controls the random seed in sampling actions.
    alpha (array-like, shape (n_actions, ), default=None) – Prior alpha parameters of the Beta distributions, one per action.
    beta (array-like, shape (n_actions, ), default=None) – Prior beta parameters of the Beta distributions, one per action.
    is_zozotown_prior (bool, default=False) – Whether to use the Beta-distribution hyperparameters that were used at the start of the data collection period in ZOZOTOWN.
    campaign (str, default=None) – One of the three campaigns considered in ZOZOTOWN: "all", "men", or "women".
    policy_name (str, default='bts') – Name of the bandit policy.
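A minimal usage sketch (not from the library docs; the action count and seed are illustrative):

    from obp.policy import BernoulliTS

    # Hypothetical setting: 10 actions, ranked lists of length 3
    # (the Open Bandit Dataset list length).
    policy = BernoulliTS(n_actions=10, len_list=3, random_state=12345)

    # With alpha/beta left as None, the priors default to Beta(1, 1).
    # Passing is_zozotown_prior=True together with campaign="all" would
    # instead reuse the priors from the ZOZOTOWN data collection period
    # (n_actions must then match that campaign's action set).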
compute_batch_action_dist(n_rounds: int = 1, n_sim: int = 100000) → numpy.ndarray

Compute the distribution over actions by Monte Carlo simulation.
Parameters:
    n_rounds (int, default=1) – Number of rounds in the distribution over actions (the size of the first axis of action_dist).
    n_sim (int, default=100000) – Number of simulations in the Monte Carlo simulation used to compute the distribution over actions.
Returns:
    action_dist – Probability estimates of each arm being the best one for each sample, action, and position.

Return type:
    array-like, shape (n_rounds, n_actions, len_list)
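A hedged sketch of typical use, continuing from the BernoulliTS instance above, e.g. to obtain an action_dist for off-policy evaluation:

    action_dist = policy.compute_batch_action_dist(n_rounds=5, n_sim=100000)
    assert action_dist.shape == (5, policy.n_actions, policy.len_list)
    # Since the policy is context-free, the estimate does not depend on
    # the round; the first axis repeats it n_rounds times. Each slice
    # action_dist[i, :, k] is a probability vector over actions at
    # position k.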
initialize() → None

Initialize parameters.
select_action() → numpy.ndarray

Select a list of actions.

Returns:
    selected_actions – List of selected actions.

Return type:
    array-like, shape (len_list, )
update_params(action: int, reward: float) → None

Update policy parameters.

Parameters:
    action (int) – Action selected by the policy.
    reward (float) – Observed reward for the chosen action and position.
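A minimal sketch of the online loop tying select_action and update_params together; simulate_reward is a hypothetical stand-in environment, not part of obp:

    import numpy as np
    from obp.policy import BernoulliTS

    policy = BernoulliTS(n_actions=10, len_list=3, random_state=12345)
    rng = np.random.default_rng(12345)

    def simulate_reward(action: int) -> float:
        # Hypothetical Bernoulli reward, for illustration only.
        return float(rng.random() < 0.1 + 0.05 * action)

    for _ in range(1000):
        selected_actions = policy.select_action()  # shape (len_list, )
        action = int(selected_actions[0])          # observe the top position
        policy.update_params(action=action, reward=simulate_reward(action))
    # With the default batch_size=1, each call to update_params refreshes
    # the Beta posterior immediately.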
property policy_type

Type of the bandit policy.
class obp.policy.contextfree.EpsilonGreedy(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, epsilon: float = 1.0)

Bases: obp.policy.base.BaseContextFreePolicy

Epsilon Greedy policy.
Parameters:
    n_actions (int) – Number of actions.
    len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
    batch_size (int, default=1) – Number of samples used in a batch parameter update.
    random_state (int, default=None) – Controls the random seed in sampling actions.
    epsilon (float, default=1.0) – Exploration hyperparameter; must be in the range [0.0, 1.0].
    policy_name (str, default=f'egreedy_{epsilon}') – Name of the bandit policy.
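A minimal sketch (the action count and epsilon value are illustrative): with probability epsilon the policy explores uniformly at random, and otherwise it exploits the empirically best actions:

    from obp.policy import EpsilonGreedy

    # Hypothetical setting: 10 actions with 10% exploration.
    egreedy = EpsilonGreedy(n_actions=10, epsilon=0.1, random_state=12345)
    # Per the default above, the policy names itself after its epsilon:
    # egreedy.policy_name == 'egreedy_0.1'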
initialize() → None

Initialize parameters.
select_action() → numpy.ndarray

Select a list of actions.

Returns:
    selected_actions – List of selected actions.

Return type:
    array-like, shape (len_list, )
update_params(action: int, reward: float) → None

Update policy parameters.

Parameters:
    action (int) – Action selected by the policy.
    reward (float) – Observed reward for the chosen action and position.
property policy_type

Type of the bandit policy.
class obp.policy.contextfree.Random(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None, epsilon: float = 1.0, policy_name: str = 'random')

Bases: obp.policy.contextfree.EpsilonGreedy

Random policy.
Parameters:
    n_actions (int) – Number of actions.
    len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
    batch_size (int, default=1) – Number of samples used in a batch parameter update.
    random_state (int, default=None) – Controls the random seed in sampling actions.
    epsilon (float, default=1.0) – Exploration hyperparameter; must be in the range [0.0, 1.0].
    policy_name (str, default='random') – Name of the bandit policy.
compute_batch_action_dist(n_rounds: int = 1, n_sim: int = 100000) → numpy.ndarray

Compute the distribution over actions by Monte Carlo simulation.

Parameters:
    n_rounds (int, default=1) – Number of rounds in the distribution over actions (the size of the first axis of action_dist).
    n_sim (int, default=100000) – Number of simulations in the Monte Carlo simulation used to compute the distribution over actions.
Returns:
    action_dist – Probability estimates of each arm being the best one for each sample, action, and position.

Return type:
    array-like, shape (n_rounds, n_actions, len_list)
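A minimal sketch of Random as a baseline, e.g. as a behavior policy in off-policy evaluation; the setting is illustrative:

    import numpy as np
    from obp.policy import Random

    random_policy = Random(n_actions=10, len_list=3, random_state=12345)
    action_dist = random_policy.compute_batch_action_dist(n_rounds=5)
    assert action_dist.shape == (5, 10, 3)
    # Since the policy picks actions uniformly at random, each entry
    # should be (approximately) 1 / n_actions.
    print(action_dist[0, :, 0])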
initialize() → None

Initialize parameters.
select_action() → numpy.ndarray

Select a list of actions.

Returns:
    selected_actions – List of selected actions.

Return type:
    array-like, shape (len_list, )
update_params(action: int, reward: float) → None

Update policy parameters.

Parameters:
    action (int) – Action selected by the policy.
    reward (float) – Observed reward for the chosen action and position.
property policy_type

Type of the bandit policy.