obp.policy.logistic

Contextual Logistic Bandit Algorithms.

Classes

LogisticEpsilonGreedy(dim, n_actions, …)

Logistic Epsilon Greedy.

LogisticTS(dim, n_actions, len_list, …)

Logistic Thompson Sampling.

LogisticUCB(dim, n_actions, len_list, …)

Logistic Upper Confidence Bound.

MiniBatchLogisticRegression(lambda_, alpha, …)

MiniBatch Online Logistic Regression Model.

class obp.policy.logistic.LogisticEpsilonGreedy(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, epsilon: float = 0.0)[source]

Bases: obp.policy.base.BaseContextualPolicy

Logistic Epsilon Greedy.

Parameters
  • dim (int) – Number of dimensions of context vectors.

  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • alpha_ (float, default=1.) – Prior parameter for the online logistic regression.

  • lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

  • epsilon (float, default=0.) – Exploration hyperparameter that must take a value in the range [0., 1.].
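A minimal instantiation sketch based on the constructor signature above; the dimensionality, number of actions, and hyperparameter values are illustrative assumptions, not recommendations:

    from obp.policy.logistic import LogisticEpsilonGreedy

    # Illustrative setup: 5-dimensional contexts, 10 candidate actions,
    # and a 10% chance of choosing a random action for exploration.
    policy = LogisticEpsilonGreedy(
        dim=5,
        n_actions=10,
        epsilon=0.1,
        random_state=12345,
    )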

initialize() → None

Initialize policy parameters.

select_action(context: numpy.ndarray) → numpy.ndarray[source]

Select action for new data.

Parameters

context (array-like, shape (1, dim_context)) – Observed context vector.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float, context: numpy.ndarray) → None[source]

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.

  • context (array-like, shape (1, dim_context)) – Observed context vector.
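A sketch of the online interaction loop built from select_action and update_params; the context distribution and the binary reward below are synthetic placeholders rather than a real environment:

    import numpy as np
    from obp.policy.logistic import LogisticEpsilonGreedy

    rng = np.random.default_rng(12345)
    policy = LogisticEpsilonGreedy(dim=5, n_actions=10, epsilon=0.1, random_state=12345)

    for _ in range(1000):
        context = rng.normal(size=(1, 5))                  # observed context, shape (1, dim_context)
        selected_actions = policy.select_action(context)   # selected actions, shape (len_list, )
        action = int(selected_actions[0])
        reward = float(rng.binomial(n=1, p=0.5))           # placeholder binary reward
        policy.update_params(action=action, reward=reward, context=context)

With batch_size=1 (the default), the policy parameters are updated after every observation; a larger batch_size defers the update until that many samples have been collected.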

property policy_type

Type of the bandit policy.

class obp.policy.logistic.LogisticTS(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, policy_name: str = 'logistic_ts')[source]

Bases: obp.policy.base.BaseContextualPolicy

Logistic Thompson Sampling.

Parameters
  • dim (int) – Number of dimensions of context vectors.

  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • alpha_ (float, default=1.) – Prior parameter for the online logistic regression.

  • lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

References

Olivier Chapelle and Lihong Li. “An Empirical Evaluation of Thompson Sampling,” 2011.
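A minimal sketch of LogisticTS producing a ranked action list, assuming 5-dimensional contexts and len_list=3 (as with the Open Bandit Dataset); the context below is synthetic and the values are illustrative:

    import numpy as np
    from obp.policy.logistic import LogisticTS

    policy = LogisticTS(dim=5, n_actions=10, len_list=3, random_state=12345)

    context = np.random.normal(size=(1, 5))            # observed context, shape (1, dim_context)
    selected_actions = policy.select_action(context)   # 3 selected actions, shape (len_list, )
    policy.update_params(action=int(selected_actions[0]), reward=1.0, context=context)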

initialize() → None

Initialize policy parameters.

select_action(context: numpy.ndarray) → numpy.ndarray[source]

Select action for new data.

Parameters

context (array-like, shape (1, dim_context)) – Observed context vector.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float, context: numpy.ndarray) → None[source]

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.

  • context (array-like, shape (1, dim_context)) – Observed context vector.

property policy_type

Type of the bandit policy.

class obp.policy.logistic.LogisticUCB(dim: int, n_actions: int, len_list: int = 1, batch_size: int = 1, alpha_: float = 1.0, lambda_: float = 1.0, random_state: Optional[int] = None, epsilon: float = 0.0)[source]

Bases: obp.policy.base.BaseContextualPolicy

Logistic Upper Confidence Bound.

Parameters
  • dim (int) – Number of dimensions of context vectors.

  • n_actions (int) – Number of actions.

  • len_list (int, default=1) – Length of the list of actions recommended in each impression. When Open Bandit Dataset is used, this should be set to 3.

  • batch_size (int, default=1) – Number of samples used in a batch parameter update.

  • alpha_ (float, default=1.) – Prior parameter for the online logistic regression.

  • lambda_ (float, default=1.) – Regularization hyperparameter for the online logistic regression.

  • random_state (int, default=None) – Controls the random seed in sampling actions.

  • epsilon (float, default=0.) – Exploration hyperparameter that must take a value in the range [0., 1.].

References

Lihong Li, Wei Chu, John Langford, and Robert E Schapire. “A Contextual-bandit Approach to Personalized News Article Recommendation,” 2010.
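A minimal instantiation sketch for LogisticUCB; epsilon is treated here as the weight of the exploration (confidence) bonus, and all values are illustrative assumptions:

    import numpy as np
    from obp.policy.logistic import LogisticUCB

    # Illustrative hyperparameters; epsilon=0.05 is assumed to give a small confidence bonus.
    policy = LogisticUCB(dim=5, n_actions=10, epsilon=0.05, random_state=12345)

    context = np.random.normal(size=(1, 5))            # observed context, shape (1, dim_context)
    selected_actions = policy.select_action(context)   # selected actions, shape (len_list, )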

initialize() → None

Initialize policy parameters.

select_action(context: numpy.ndarray) → numpy.ndarray[source]

Select action for new data.

Parameters

context (array-like, shape (1, dim_context)) – Observed context vector.

Returns

selected_actions – List of selected actions.

Return type

array-like, shape (len_list, )

update_params(action: int, reward: float, context: numpy.ndarray) → None[source]

Update policy parameters.

Parameters
  • action (int) – Selected action by the policy.

  • reward (float) – Observed reward for the chosen action and position.

  • context (array-like, shape (1, dim_context)) – Observed context vector.

property policy_type

Type of the bandit policy.

class obp.policy.logistic.MiniBatchLogisticRegression(lambda_: float, alpha: float, dim: int, random_state: Optional[int] = None)[source]

Bases: object

MiniBatch Online Logistic Regression Model.

fit(X: numpy.ndarray, y: numpy.ndarray)[source]

Update the coefficient vector using the given mini-batch data.

grad(w: numpy.ndarray, *args) → numpy.ndarray[source]

Calculate gradient.

loss(w: numpy.ndarray, *args) → float[source]

Calculate loss function.

predict_proba(X: numpy.ndarray) → numpy.ndarray[source]

Predict the expected probability using the expected coefficient vector.

predict_proba_with_sampling(X: numpy.ndarray) → numpy.ndarray[source]

Predict the expected probability using a sampled coefficient vector.

sample() → numpy.ndarray[source]

Sample a coefficient vector from the prior distribution.

sd() → numpy.ndarray[source]

Standard deviation for the coefficient vector.
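A usage sketch of the underlying regression model on a synthetic mini-batch; the data and hyperparameter values are placeholders chosen only to match the documented signatures:

    import numpy as np
    from obp.policy.logistic import MiniBatchLogisticRegression

    model = MiniBatchLogisticRegression(lambda_=1.0, alpha=1.0, dim=5, random_state=12345)

    X = np.random.normal(size=(32, 5))             # mini-batch of context vectors
    y = np.random.binomial(n=1, p=0.5, size=32)    # placeholder binary rewards
    model.fit(X, y)

    p_mean = model.predict_proba(X)                    # prediction with the expected coefficient vector
    p_sampled = model.predict_proba_with_sampling(X)   # prediction with a sampled coefficient vector (as used by LogisticTS)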