obp.policy.base

Base Interfaces for Bandit Algorithms.

Classes

BaseContextFreePolicy(n_actions, len_list, …)

Base class for context-free bandit policies.

BaseContextualPolicy(dim, n_actions, …)

Base class for contextual bandit policies.

BaseOfflinePolicyLearner(n_actions, len_list)

Base class for off-policy learners.

class obp.policy.base.BaseContextFreePolicy(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None)[source]

Bases: object

Base class for context-free bandit policies.

Parameters
  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

initialize() → None[source]

Initialize policy parameters.

abstract select_action() → numpy.ndarray[source]

Select a list of actions.

abstract update_params(action: int, reward: float) → None[source]

Update policy parameters.

property policy_type

Type of the bandit policy.
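For concreteness, here is a minimal sketch of a concrete subclass, assuming only the documented constructor arguments. The epsilon-greedy rule, the epsilon hyperparameter, and the attributes rng_, action_counts_, and reward_sums_ are illustrative additions made by this sketch, not part of the base class; it also assumes the base class is a dataclass with a __post_init__, as in obp's implementation:

    # A minimal sketch, not part of obp: an epsilon-greedy context-free policy.
    from dataclasses import dataclass

    import numpy as np
    from obp.policy.base import BaseContextFreePolicy


    @dataclass
    class SimpleEpsilonGreedy(BaseContextFreePolicy):
        epsilon: float = 0.1  # illustrative exploration rate, not a base-class field

        def __post_init__(self) -> None:
            super().__post_init__()  # run the base dataclass setup
            # per-action statistics kept by this subclass, not by the base class
            self.rng_ = np.random.RandomState(self.random_state)
            self.action_counts_ = np.zeros(self.n_actions, dtype=int)
            self.reward_sums_ = np.zeros(self.n_actions)

        def select_action(self) -> np.ndarray:
            """Return len_list distinct actions, exploring with probability epsilon."""
            if self.rng_.rand() < self.epsilon:
                return self.rng_.choice(self.n_actions, size=self.len_list, replace=False)
            means = self.reward_sums_ / np.maximum(self.action_counts_, 1)
            return np.argsort(-means)[: self.len_list]

        def update_params(self, action: int, reward: float) -> None:
            """Accumulate the observed reward for the chosen action."""
            self.action_counts_[action] += 1
            self.reward_sums_[action] += reward

A policy built this way is driven round by round: call select_action() to obtain a ranked list of actions, observe a reward, and feed it back through update_params().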

class obp.policy.base.BaseContextualPolicy(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None)[source]

Bases: object

Base class for contextual bandit policies.

Parameters
  • dim (int) – Number of dimensions of context vectors.

  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • alpha_ (float, default=1.) – Prior parameter for the online logistic regression.

  • lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

initialize() → None[source]

Initialize policy parameters.

abstract select_action(context: numpy.ndarray) → numpy.ndarray[source]

Select a list of actions.

abstract update_params(action: float, reward: float, context: numpy.ndarray) → None[source]

Update policy parameters.

property policy_type

Type of the bandit policy.
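As in the context-free case, a concrete subclass only has to supply select_action and update_params. Below is a minimal LinUCB-style sketch: the ucb_width hyperparameter and the per-action ridge statistics A_inv_ / b_ are illustrative additions; lambda_ is reused here as the ridge regularizer, while alpha_ (a prior for the online logistic regression, per the docstring above) is left unused:

    # A minimal sketch, not part of obp: a LinUCB-style contextual policy.
    from dataclasses import dataclass

    import numpy as np
    from obp.policy.base import BaseContextualPolicy


    @dataclass
    class SimpleLinUCB(BaseContextualPolicy):
        ucb_width: float = 1.0  # illustrative exploration hyperparameter

        def __post_init__(self) -> None:
            super().__post_init__()  # run the base dataclass setup
            # one ridge-regression model per action: inverse Gram matrix and b vector
            self.A_inv_ = np.array(
                [np.eye(self.dim) / self.lambda_ for _ in range(self.n_actions)]
            )
            self.b_ = np.zeros((self.n_actions, self.dim))

        def select_action(self, context: np.ndarray) -> np.ndarray:
            """Return the len_list actions with the highest UCB scores."""
            x = context.reshape(-1)
            theta = np.einsum("adk,ak->ad", self.A_inv_, self.b_)  # per-action weights
            mean = theta @ x
            width = self.ucb_width * np.sqrt(np.einsum("d,adk,k->a", x, self.A_inv_, x))
            return np.argsort(-(mean + width))[: self.len_list]

        def update_params(self, action: int, reward: float, context: np.ndarray) -> None:
            """Sherman-Morrison rank-one update of the chosen action's ridge model."""
            x = context.reshape(-1)
            A_inv_x = self.A_inv_[action] @ x
            self.A_inv_[action] -= np.outer(A_inv_x, A_inv_x) / (1.0 + x @ A_inv_x)
            self.b_[action] += reward * x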

class obp.policy.base.BaseOfflinePolicyLearner(n_actions: int, len_list: int = 1)[source]

Bases: object

Base class for off-policy learners.

Parameters
  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

abstract fit() → None[source]

Fit an offline bandit policy on the given logged bandit feedback data.

abstract predict(context: numpy.ndarray) → numpy.ndarray[source]

Predict best actions for new data.

Parameters

context (array-like, shape (n_rounds_of_new_data, dim_context)) – Context vectors for new data.

Returns

action – Action choices made by the policy trained via the fit method.

Return type

array-like, shape (n_rounds_of_new_data, n_actions, len_list)

property policy_type

Type of the bandit policy.
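Because the abstract fit() leaves its arguments to the subclass, the sketch below chooses an illustrative signature (context, action, reward) and fits one reward regressor per action, in the style of a direct-method learner; it assumes every action appears at least once in the logged data. predict() returns the one-hot (n_rounds_of_new_data, n_actions, len_list) array documented above:

    # A minimal sketch, not part of obp: a reward-model-based off-policy learner.
    import numpy as np
    from sklearn.linear_model import Ridge

    from obp.policy.base import BaseOfflinePolicyLearner


    class RewardModelLearner(BaseOfflinePolicyLearner):
        def fit(self, context: np.ndarray, action: np.ndarray, reward: np.ndarray) -> None:
            """Fit one reward regressor per action from the logged data."""
            # assumes every action index in range(n_actions) occurs in the logs
            self.models_ = []
            for a in range(self.n_actions):
                mask = action == a
                model = Ridge()
                model.fit(context[mask], reward[mask])
                self.models_.append(model)

        def predict(self, context: np.ndarray) -> np.ndarray:
            """Return deterministic one-hot choices of shape (n_rounds, n_actions, len_list)."""
            n_rounds = context.shape[0]
            # predicted reward for every (round, action) pair
            q_hat = np.column_stack([m.predict(context) for m in self.models_])
            # rank actions per round and fill the len_list positions with the best ones
            top = np.argsort(-q_hat, axis=1)[:, : self.len_list]
            choice = np.zeros((n_rounds, self.n_actions, self.len_list), dtype=int)
            for pos in range(self.len_list):
                choice[np.arange(n_rounds), top[:, pos], pos] = 1
            return choice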