obp.policy.logistic
Contextual Logistic Bandit Algorithms.
Classes

LogisticEpsilonGreedy – Logistic Epsilon Greedy.
LogisticTS – Logistic Thompson Sampling.
LogisticUCB – Logistic Upper Confidence Bound.
MiniBatchLogisticRegression – MiniBatch Online Logistic Regression Model.
class obp.policy.logistic.LogisticEpsilonGreedy(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, epsilon: float = 0.0)

Bases: obp.policy.base.BaseContextualPolicy
Logistic Epsilon Greedy.
- Parameters
dim (int) – Number of dimensions of context vectors.
n_actions (int) – Number of actions.
len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
batch_size (int, default=1) – Number of samples used in a batch parameter update.
alpha_ (float, default=1.) – Prior parameter for the online logistic regression.
lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.
random_state (int, default=None) – Controls the random seed in sampling actions.
epsilon (float, default=0.) – Exploration hyperparameter; must take a value in the range [0., 1.].
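Examples

A minimal usage sketch, not part of the original reference: the synthetic context and Bernoulli reward below are illustrative assumptions; the loop shows the intended select_action / update_params cycle.

import numpy as np
from obp.policy.logistic import LogisticEpsilonGreedy

dim, n_actions = 5, 3
policy = LogisticEpsilonGreedy(dim=dim, n_actions=n_actions, epsilon=0.1, random_state=12345)

rng = np.random.default_rng(12345)
for _ in range(100):
    context = rng.normal(size=(1, dim))             # shape (1, dim)
    action = int(policy.select_action(context)[0])  # len_list=1, so a single action
    reward = float(rng.binomial(1, 0.5))            # stand-in binary reward
    policy.update_params(action=action, reward=reward, context=context)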
initialize() → None

Initialize policy parameters.
select_action(context: numpy.ndarray) → numpy.ndarray

Select action for new data.
- Parameters
context (array-like, shape (1, dim_context)) – Observed context vector.
- Returns
selected_actions – List of selected actions.
- Return type
array-like, shape (len_list, )
update_params(action: int, reward: float, context: numpy.ndarray) → None

Update policy parameters.
- Parameters
action (int) – Selected action by the policy.
reward (float) – Observed reward for the chosen action and position.
context (array-like, shape (1, dim_context)) – Observed context vector.
property policy_type

Type of the bandit policy.
class obp.policy.logistic.LogisticTS(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, policy_name: str = 'logistic_ts')

Bases: obp.policy.base.BaseContextualPolicy
Logistic Thompson Sampling.
- Parameters
dim (int) – Number of dimensions of context vectors.
n_actions (int) – Number of actions.
len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
batch_size (int, default=1) – Number of samples used in a batch parameter update.
alpha_ (float, default=1.) – Prior parameter for the online logistic regression.
lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.
random_state (int, default=None) – Controls the random seed in sampling actions.
References
Olivier Chapelle and Lihong Li. “An Empirical Evaluation of Thompson Sampling,” 2011.
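Examples

A hedged sketch, not from the original documentation: posterior sampling happens inside select_action, so the caller's loop is the same as for LogisticEpsilonGreedy; the context and reward here are synthetic assumptions.

import numpy as np
from obp.policy.logistic import LogisticTS

policy = LogisticTS(dim=5, n_actions=3, random_state=12345)
context = np.random.default_rng(12345).normal(size=(1, 5))  # shape (1, dim)
actions = policy.select_action(context)                     # array of shape (len_list,)
policy.update_params(action=int(actions[0]), reward=1.0, context=context)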
initialize() → None

Initialize policy parameters.
select_action(context: numpy.ndarray) → numpy.ndarray

Select action for new data.
- Parameters
context (array-like, shape (1, dim_context)) – Observed context vector.
- Returns
selected_actions – List of selected actions.
- Return type
array-like, shape (len_list, )
update_params(action: int, reward: float, context: numpy.ndarray) → None

Update policy parameters.
- Parameters
action (int) – Selected action by the policy.
reward (float) – Observed reward for the chosen action and position.
context (array-like, shape (1, dim_context)) – Observed context vector.
property policy_type

Type of the bandit policy.
class obp.policy.logistic.LogisticUCB(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, epsilon: float = 0.0)

Bases: obp.policy.base.BaseContextualPolicy
Logistic Upper Confidence Bound.
- Parameters
dim (int) – Number of dimensions of context vectors.
n_actions (int) – Number of actions.
len_list (int, default=1) – Length of the list of actions recommended in each impression. When the Open Bandit Dataset is used, this should be set to 3.
batch_size (int, default=1) – Number of samples used in a batch parameter update.
alpha_ (float, default=1.) – Prior parameter for the online logistic regression.
lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.
random_state (int, default=None) – Controls the random seed in sampling actions.
epsilon (float, default=0.) – Exploration hyperparameter; must take a value in the range [0., 1.].
References
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. “A Contextual-Bandit Approach to Personalized News Article Recommendation,” 2010.
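Examples

A hedged sketch, not from the original documentation: per the parameter description above, epsilon is assumed to scale the optimistic exploration bonus added to each action's predicted reward; the zero context is purely illustrative.

import numpy as np
from obp.policy.logistic import LogisticUCB

policy = LogisticUCB(dim=5, n_actions=3, epsilon=0.1, random_state=12345)
context = np.zeros((1, 5))                 # shape (1, dim)
actions = policy.select_action(context)    # array of shape (len_list,)
policy.update_params(action=int(actions[0]), reward=0.0, context=context)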
initialize() → None

Initialize policy parameters.
select_action(context: numpy.ndarray) → numpy.ndarray

Select action for new data.
- Parameters
context (array-like, shape (1, dim_context)) – Observed context vector.
- Returns
selected_actions – List of selected actions.
- Return type
array-like, shape (len_list, )
update_params(action: int, reward: float, context: numpy.ndarray) → None

Update policy parameters.
- Parameters
action (int) – Selected action by the policy.
reward (float) – Observed reward for the chosen action and position.
context (array-like, shape (1, dim_context)) – Observed context vector.
property policy_type

Type of the bandit policy.
class obp.policy.logistic.MiniBatchLogisticRegression(lambda_: float, alpha: float, dim: int, random_state: Optional[int] = None)

Bases: object
MiniBatch Online Logistic Regression Model.
predict_proba(X: numpy.ndarray) → numpy.ndarray

Predict the expected probability using the expected coefficient.
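Examples

A hedged standalone sketch: this section documents only predict_proba, so the fit(X, y) mini-batch update used below is an assumption about the rest of the class's interface; the data are synthetic.

import numpy as np
from obp.policy.logistic import MiniBatchLogisticRegression

model = MiniBatchLogisticRegression(lambda_=1.0, alpha=1.0, dim=5, random_state=12345)
rng = np.random.default_rng(12345)
X = rng.normal(size=(32, 5))                     # 32 synthetic context vectors
y = rng.binomial(1, 0.5, size=32).astype(float)  # synthetic binary rewards
model.fit(X, y)                                  # assumed online update method (not shown above)
proba = model.predict_proba(X)                   # probability under the expected coefficient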