obp.simulator.simulator

Bandit Simulator.

Functions

run_bandit_simulation(bandit_feedback, policy)

Run an online bandit algorithm on the given logged bandit feedback data.

obp.simulator.simulator.run_bandit_simulation(bandit_feedback: Dict[str, Union[int, numpy.ndarray]], policy: Union[obp.policy.base.BaseContextFreePolicy, obp.policy.base.BaseContextualPolicy]) → numpy.ndarray

Run an online bandit algorithm on the given logged bandit feedback data.

Parameters
  • bandit_feedback (BanditFeedback) – Logged bandit feedback data used in the offline bandit simulation.

  • policy (BanditPolicy) – Online bandit policy to be evaluated in the offline bandit simulation (i.e., the evaluation policy).
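For illustration, the sketch below builds a toy logged dataset as a plain Python dictionary. The key names (`n_rounds`, `n_actions`, `context`, `action`, `position`, `reward`, `pscore`) are an assumption modeled on typical obp usage rather than a verified schema for `BanditFeedback`, and `make_toy_bandit_feedback` is a hypothetical helper. The data are random and come from a uniform logging policy.

```python
import numpy as np


def make_toy_bandit_feedback(n_rounds: int = 100, n_actions: int = 3,
                             dim_context: int = 2, len_list: int = 1,
                             seed: int = 0) -> dict:
    """Hypothetical helper: build random logged feedback for illustration only.

    The key layout is an assumption, not obp's documented schema.
    """
    rng = np.random.default_rng(seed)
    # Uniform-random logging policy: each action is logged with prob. 1 / n_actions.
    action = rng.integers(0, n_actions, size=n_rounds)
    # Position in the recommendation list where the logged action was shown.
    position = rng.integers(0, len_list, size=n_rounds)
    # Binary rewards (e.g., clicks) drawn independently of the action.
    reward = rng.binomial(1, 0.3, size=n_rounds)
    return {
        "n_rounds": n_rounds,
        "n_actions": n_actions,
        "context": rng.normal(size=(n_rounds, dim_context)),
        "action": action,
        "position": position,
        "reward": reward,
        # Propensity score of the logged action under the uniform logging policy.
        "pscore": np.full(n_rounds, 1.0 / n_actions),
    }


bf = make_toy_bandit_feedback()
print(bf["context"].shape, bf["action"].shape)  # (100, 2) (100,)
```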

Returns

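action_dist – Action choice probabilities of the evaluation policy for each round, action, and position in the recommendation list (they can be deterministic, i.e., 0 or 1).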

Return type

array-like, shape (n_rounds, n_actions, len_list)
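One common way to realize such an offline simulation is a replay loop over the logged rounds: the evaluation policy proposes a list of `len_list` actions, it is updated only when its action at the logged position agrees with the logged action, and its choices are one-hot encoded into an array of shape (n_rounds, n_actions, len_list). The following is a minimal numpy sketch of that idea, not obp's source. `ToyEpsilonGreedy` and `replay_simulation` are hypothetical names, only a context-free policy is shown, and the library function may differ in details such as input validation and the handling of contextual policies.

```python
import numpy as np


class ToyEpsilonGreedy:
    """Hypothetical context-free epsilon-greedy policy (not obp's class)."""

    def __init__(self, n_actions: int, len_list: int = 1,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.len_list = len_list
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_actions)
        self.reward_sums = np.zeros(n_actions)

    def select_action(self) -> np.ndarray:
        # Explore with a random ranking; otherwise rank by estimated mean reward.
        if self.rng.random() < self.epsilon:
            ranking = self.rng.permutation(self.n_actions)
        else:
            means = self.reward_sums / np.maximum(self.counts, 1)
            ranking = np.argsort(-means, kind="stable")
        return ranking[: self.len_list]

    def update_params(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        self.reward_sums[action] += reward


def replay_simulation(bandit_feedback: dict, policy) -> np.ndarray:
    """Replay-style sketch returning a one-hot action_dist."""
    n_rounds = bandit_feedback["n_rounds"]
    n_actions = bandit_feedback["n_actions"]
    selected = np.zeros((n_rounds, policy.len_list), dtype=int)
    for t in range(n_rounds):
        chosen = policy.select_action()
        selected[t] = chosen
        pos = bandit_feedback["position"][t]
        # Learn only from rounds where the policy agrees with the log.
        if chosen[pos] == bandit_feedback["action"][t]:
            policy.update_params(bandit_feedback["action"][t],
                                 bandit_feedback["reward"][t])
    # action_dist[t, a, k] = 1 iff action a was chosen at position k in round t.
    action_dist = np.zeros((n_rounds, n_actions, policy.len_list))
    rounds = np.arange(n_rounds)[:, None]
    positions = np.arange(policy.len_list)[None, :]
    action_dist[rounds, selected, positions] = 1.0
    return action_dist


rng = np.random.default_rng(42)
n_rounds, n_actions, len_list = 200, 4, 2
feedback = {
    "n_rounds": n_rounds,
    "n_actions": n_actions,
    "action": rng.integers(0, n_actions, size=n_rounds),
    "position": rng.integers(0, len_list, size=n_rounds),
    "reward": rng.binomial(1, 0.5, size=n_rounds),
}
policy = ToyEpsilonGreedy(n_actions, len_list)
dist = replay_simulation(feedback, policy)
print(dist.shape)  # (200, 4, 2)
```

Because the encoding in this sketch is deterministic, `dist.sum(axis=1)` equals 1 for every round and position.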