obp.policy.base
Base Interfaces for Bandit Algorithms.
Classes

- BaseContextFreePolicy: Base class for context-free bandit policies.
- BaseContextualPolicy: Base class for contextual bandit policies.
- BaseOfflinePolicyLearner: Base class for off-policy learners.
class obp.policy.base.BaseContextFreePolicy(n_actions: int, len_list: int = 1, batch_size: int = 1, random_state: Optional[int] = None)

Bases: object
Base class for context-free bandit policies.
Parameters

- n_actions (int) – Number of actions.
- len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
- batch_size (int, default=1) – Number of samples used in a batch parameter update.
- random_state (int, default=None) – Controls the random seed used when sampling actions.
property policy_type

Type of the bandit policy.
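As an illustration, a concrete context-free policy built on this interface could look like the sketch below. It is self-contained (numpy only, no obp import) and mirrors the documented constructor parameters; the epsilon-greedy strategy, the `epsilon` parameter, and the `select_action`/`update_params` method names are assumptions made for the sketch, not guarantees about this page's API.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class EpsilonGreedySketch:
    """Hypothetical context-free policy mirroring the documented constructor."""

    n_actions: int
    len_list: int = 1
    batch_size: int = 1
    random_state: Optional[int] = None
    epsilon: float = 0.1  # exploration rate; not part of the base class

    def __post_init__(self) -> None:
        self.random_ = np.random.RandomState(self.random_state)
        self.action_counts = np.zeros(self.n_actions, dtype=int)
        self.reward_sums = np.zeros(self.n_actions)

    def select_action(self) -> np.ndarray:
        """Return `len_list` distinct actions, ranked by estimated reward."""
        if self.random_.rand() < self.epsilon:
            # explore: a uniformly random ranked list without repeats
            return self.random_.choice(
                self.n_actions, size=self.len_list, replace=False
            )
        # exploit: rank actions by their empirical mean reward
        means = self.reward_sums / np.maximum(self.action_counts, 1)
        return np.argsort(means)[::-1][: self.len_list]

    def update_params(self, action: int, reward: float) -> None:
        """Incrementally update the empirical mean reward of `action`."""
        self.action_counts[action] += 1
        self.reward_sums[action] += reward
```

Because the policy is context-free, `select_action` takes no arguments: the ranking depends only on the running per-action reward statistics.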
class obp.policy.base.BaseContextualPolicy(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None)

Bases: object
Base class for contextual bandit policies.
Parameters

- dim (int) – Number of dimensions of context vectors.
- n_actions (int) – Number of actions.
- len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
- batch_size (int, default=1) – Number of samples used in a batch parameter update.
- alpha_ (float, default=1.0) – Prior parameter for the online logistic regression.
- lambda_ (float, default=1.0) – Regularization hyperparameter for the online logistic regression.
- random_state (int, default=None) – Controls the random seed used when sampling actions.
abstract update_params(action: float, reward: float, context: numpy.ndarray) → None

Update policy parameters.
property policy_type

Type of the bandit policy.
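To show how the contextual interface fits together, here is a minimal sketch of a policy that implements the documented `update_params(action, reward, context)` signature. The page pairs `alpha_`/`lambda_` with online logistic regression; for brevity this sketch substitutes a per-action linear-Gaussian model with Thompson sampling, so the class name, the `select_action` method, and that modeling choice are all assumptions, not the library's implementation.

```python
import numpy as np


class LinearThompsonSketch:
    """Hypothetical contextual policy: one Bayesian linear model per action."""

    def __init__(self, dim, n_actions, len_list=1, batch_size=1,
                 alpha_=1.0, lambda_=1.0, random_state=None):
        self.dim = dim
        self.n_actions = n_actions
        self.len_list = len_list
        self.alpha_ = alpha_
        self.random_ = np.random.RandomState(random_state)
        # A[a]: regularized Gram matrix, b[a]: response vector for action a
        self.A = np.array([lambda_ * np.eye(dim) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, dim))

    def select_action(self, context: np.ndarray) -> np.ndarray:
        """Rank actions by a Thompson sample of the predicted reward."""
        scores = np.empty(self.n_actions)
        for a in range(self.n_actions):
            A_inv = np.linalg.inv(self.A[a])
            theta = self.random_.multivariate_normal(
                A_inv @ self.b[a], self.alpha_ * A_inv
            )
            scores[a] = context @ theta
        return np.argsort(scores)[::-1][: self.len_list]

    def update_params(self, action: int, reward: float,
                      context: np.ndarray) -> None:
        """Rank-one update of the chosen action's sufficient statistics."""
        self.A[action] += np.outer(context, context)
        self.b[action] += reward * context
```

Unlike the context-free case, both selection and the update take a `context` vector of length `dim`, matching the abstract `update_params` signature above.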
class obp.policy.base.BaseOfflinePolicyLearner(n_actions: int, len_list: int = 1)

Bases: object
Base class for off-policy learners.
Parameters

- n_actions (int) – Number of actions.
- len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
abstract fit() → None

Fits an offline bandit policy on the given logged bandit feedback data.
abstract predict(context: numpy.ndarray) → numpy.ndarray

Predict best actions for new data.

Parameters

- context (array-like, shape (n_rounds_of_new_data, dim_context)) – Context vectors for new data.

Returns

- action (array-like, shape (n_rounds_of_new_data, n_actions, len_list)) – Action choices made by a policy trained by calling the fit method.
property policy_type

Type of the bandit policy.
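The offline learner above can be sketched as follows. The abstract `fit()` is documented with no arguments, so this sketch assumes a concrete learner extends it with logged feedback arrays (`context`, `action`, `reward`, `pscore`); that extended signature, the class name, and the crude per-action weighted-mean estimator are all illustrative assumptions. What the sketch does follow faithfully is the documented `predict` output: a one-hot tensor of shape `(n_rounds_of_new_data, n_actions, len_list)`, where each `len_list` slot selects exactly one action.

```python
import numpy as np


class OfflineLearnerSketch:
    """Hypothetical off-policy learner mirroring the documented interface."""

    def __init__(self, n_actions: int, len_list: int = 1):
        self.n_actions = n_actions
        self.len_list = len_list

    def fit(self, context, action, reward, pscore) -> None:
        """Estimate per-action rewards from logged feedback, weighting each
        observation by the inverse of its logging propensity `pscore`."""
        self.q_hat_ = np.zeros(self.n_actions)
        for a in range(self.n_actions):
            mask = action == a
            if mask.any():
                self.q_hat_[a] = np.average(
                    reward[mask], weights=1.0 / pscore[mask]
                )

    def predict(self, context: np.ndarray) -> np.ndarray:
        """Return one-hot action choices of shape
        (n_rounds_of_new_data, n_actions, len_list)."""
        n_rounds = context.shape[0]
        # rank actions by estimated reward; one action per list slot
        ranking = np.argsort(self.q_hat_)[::-1][: self.len_list]
        choice = np.zeros((n_rounds, self.n_actions, self.len_list), dtype=int)
        for slot, a in enumerate(ranking):
            choice[:, a, slot] = 1
        return choice
```

This estimator ignores `context` at prediction time, which a real learner would not; it is kept context-independent only so the sketch stays short while still producing the documented output shape.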