obp.simulator.simulator
Bandit Simulator.
Functions
| run_bandit_simulation(bandit_feedback, policy) | Run an online bandit algorithm on the given logged bandit feedback data. |
obp.simulator.simulator.run_bandit_simulation(bandit_feedback: Dict[str, Union[int, numpy.ndarray]], policy: Union[obp.policy.base.BaseContextFreePolicy, obp.policy.base.BaseContextualPolicy]) → numpy.ndarray
Run an online bandit algorithm on the given logged bandit feedback data.
- Parameters
bandit_feedback (BanditFeedback) – Logged bandit feedback data used in offline bandit simulation.
policy (BanditPolicy) – Online bandit policy evaluated in offline bandit simulation (i.e., evaluation policy).
- Returns
action_dist – Action choice probabilities of the policy for each round, action, and position (can be deterministic).
- Return type
array-like, shape (n_rounds, n_actions, len_list)
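
The following is a minimal, standard-library-only sketch of what a replay-style simulation over logged bandit feedback can look like and how the returned `action_dist` is laid out. It does not call obp. `EpsilonGreedySketch` and `replay_simulation_sketch` are hypothetical names introduced for illustration, the feedback dictionary holds made-up toy values, and the update rule (learn only from rounds where the policy's choice at the logged position equals the logged action) is an assumption about how such a simulation is typically run, not a statement of the library's exact implementation. Nested lists stand in for the `numpy.ndarray` the real function returns.

```python
import random
from typing import Dict, List


class EpsilonGreedySketch:
    """Hypothetical context-free policy for illustration (not obp's class)."""

    def __init__(self, n_actions: int, len_list: int = 1,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.len_list = len_list
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def select_action(self) -> List[int]:
        """Return len_list distinct actions, one per position."""
        if self.rng.random() < self.epsilon:
            return self.rng.sample(range(self.n_actions), self.len_list)
        ranked = sorted(range(self.n_actions), key=lambda a: -self.means[a])
        return ranked[: self.len_list]

    def update_params(self, action: int, reward: float) -> None:
        """Incrementally update the running mean reward of one action."""
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


def replay_simulation_sketch(feedback: Dict[str, list],
                             policy: EpsilonGreedySketch) -> List[List[List[float]]]:
    """Return nested lists laid out as (n_rounds, n_actions, len_list)."""
    action_dist = []
    rounds = zip(feedback["action"], feedback["position"], feedback["reward"])
    for action, position, reward in rounds:
        selected = policy.select_action()
        # Assumption: the logged reward is only usable when the policy's
        # choice at the logged position matches the logged action.
        if selected[position] == action:
            policy.update_params(action, reward)
        # One-hot encode the selection: a deterministic action distribution.
        round_dist = [[0.0] * policy.len_list for _ in range(policy.n_actions)]
        for pos, chosen in enumerate(selected):
            round_dist[chosen][pos] = 1.0
        action_dist.append(round_dist)
    return action_dist


# Toy logged feedback (made-up values, single position per round).
feedback = {
    "n_rounds": 4,
    "n_actions": 3,
    "action": [0, 2, 1, 2],
    "position": [0, 0, 0, 0],
    "reward": [0, 1, 0, 1],
}
policy = EpsilonGreedySketch(n_actions=3, len_list=1, epsilon=0.0)
action_dist = replay_simulation_sketch(feedback, policy)
```

In this sketch, `action_dist[i][a][k]` is the probability that action `a` is placed at position `k` in round `i`. Because each round records a single concrete selection, every position holds exactly one entry equal to 1.0, which is the deterministic case mentioned under Returns.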