obp.simulator.simulator

Bandit Simulator.

Functions

run_bandit_simulation(bandit_feedback, policy)

Run an online bandit algorithm on the given logged bandit feedback data.

obp.simulator.simulator.run_bandit_simulation(bandit_feedback: Dict[str, Union[int, numpy.ndarray]], policy: Union[obp.policy.base.BaseContextFreePolicy, obp.policy.base.BaseContextualPolicy]) → numpy.ndarray

Run an online bandit algorithm on the given logged bandit feedback data.

Parameters

bandit_feedback (BanditFeedback) – Logged bandit feedback data used in offline bandit simulation.
policy (BanditPolicy) – Online bandit policy evaluated in offline bandit simulation (i.e., evaluation policy).
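
As a rough illustration of the bandit_feedback argument, the sketch below builds a small synthetic dictionary that matches the annotated type Dict[str, Union[int, numpy.ndarray]]: integer entries for sizes and one array per logged quantity. The key names ("n_rounds", "n_actions", "action", "position", "reward") and the helper make_toy_bandit_feedback are assumptions chosen for illustration; they are not taken from this page, so check the dataset documentation for the keys the library actually expects.

```python
import numpy as np


def make_toy_bandit_feedback(n_rounds=6, n_actions=3, len_list=2, seed=0):
    """Build synthetic logged data (hypothetical helper, not part of obp)."""
    rng = np.random.default_rng(seed)
    return {
        # Integer entries describe the size of the log (assumed key names).
        "n_rounds": n_rounds,
        "n_actions": n_actions,
        # One array entry per logged quantity, each with one row per round.
        "action": rng.integers(n_actions, size=n_rounds),
        "position": rng.integers(len_list, size=n_rounds),
        "reward": rng.integers(2, size=n_rounds),
    }


toy_feedback = make_toy_bandit_feedback()
for key, value in toy_feedback.items():
    print(key, type(value).__name__, getattr(value, "shape", ""))
```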

Returns

action_dist – Action choice probabilities of the policy, indexed by round, action, and list position (can be deterministic).

Return type

array-like, shape (n_rounds, n_actions, len_list)
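
To make the returned shape concrete, here is a minimal, self-contained sketch of a replay-style simulation loop. It is not the library's source code: the class ToyEpsilonGreedy, the function replay_simulation, the method names select_action and update_params, and the dictionary keys are all assumptions made for illustration. The sketch assumes the policy is updated only in rounds where its chosen action matches the logged action at the logged position, which is one common way to run an online policy over logged data.

```python
import numpy as np


class ToyEpsilonGreedy:
    """Hypothetical context-free policy used only for this sketch."""

    def __init__(self, n_actions, len_list, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.len_list = len_list
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_actions)
        self.rewards = np.zeros(n_actions)

    def select_action(self):
        # Explore with probability epsilon, or while nothing has been observed.
        if self.counts.sum() == 0 or self.rng.random() < self.epsilon:
            return self.rng.choice(self.n_actions, size=self.len_list, replace=False)
        # Otherwise exploit: take the len_list actions with the best mean reward.
        means = self.rewards / np.maximum(self.counts, 1)
        return np.argsort(-means)[: self.len_list]

    def update_params(self, action, reward):
        self.counts[action] += 1
        self.rewards[action] += reward


def replay_simulation(feedback, policy):
    """Replay logged rounds and record the policy's choices (illustrative only)."""
    n_rounds, n_actions = feedback["n_rounds"], feedback["n_actions"]
    action_dist = np.zeros((n_rounds, n_actions, policy.len_list))
    slots = np.arange(policy.len_list)
    for t in range(n_rounds):
        selected = policy.select_action()
        # Deterministic choice: probability 1 for the action placed in each slot.
        action_dist[t, selected, slots] = 1.0
        # Assumed replay rule: learn only when the choice matches the log.
        if selected[feedback["position"][t]] == feedback["action"][t]:
            policy.update_params(feedback["action"][t], feedback["reward"][t])
    return action_dist


# Synthetic log with assumed key names.
rng = np.random.default_rng(42)
n_rounds, n_actions, len_list = 50, 4, 2
feedback = {
    "n_rounds": n_rounds,
    "n_actions": n_actions,
    "action": rng.integers(n_actions, size=n_rounds),
    "position": rng.integers(len_list, size=n_rounds),
    "reward": rng.integers(2, size=n_rounds),
}
action_dist = replay_simulation(feedback, ToyEpsilonGreedy(n_actions, len_list))
print(action_dist.shape)
```

In this sketch every slice action_dist[t, :, k] is one-hot, which is the deterministic case mentioned in the return description; a stochastic policy would instead store probabilities that sum to 1 over the action axis.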