# Selected Tutorials

# KDD Cup|Humanities Track Tutorial

This tutorial introduces participants to approaches for learning a sequence of intervention-based decisions (Actions) from a malaria modelling environment. We introduce the notions of State, Action and Reward in order to describe some approaches to this challenge.

### State

Observations for the challenge models occur over a 5-year timeframe, and each year of this timeframe may be considered a State of the system, with the possibility of taking one Action in each State. Note that this temporal State transition is fixed and therefore does not depend on the Action taken.

$$S \in \{1,2,3,4,5\}$$

### Action

Consider Actions as a combination of only two possible interventions, insecticide spraying (IRS) and distributing bed nets (ITN), based on our model description, with $a_{\mathrm{ITN}} \in [0,1]$ and $a_{\mathrm{IRS}}\in [0,1]$. Action values between 0 and 1 describe the coverage of the intervention across a simulated human population.

$$A_S = [a_{\mathrm{ITN}}, a_{\mathrm{IRS}}]$$
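
As a minimal sketch of this definition, the helper below builds an action pair and clips each coverage value into $[0,1]$. The name `make_action` is an assumption introduced here for illustration; it is not part of `netsapi`.

```python
def make_action(itn, irs):
    """Return [a_ITN, a_IRS] with each coverage value clipped to [0, 1].

    Hypothetical helper for illustration only; not a netsapi function.
    """
    def clip(x):
        return min(1.0, max(0.0, float(x)))
    return [clip(itn), clip(irs)]

action = make_action(0.55, 0.7)  # 55% ITN coverage, 70% IRS coverage
print(action)
```

Clipping is only one possible choice; an agent could equally reject out-of-range values instead of silently correcting them.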

### Reward

A reward function determines a stochastic Reward for a Policy over the entire episode. This function measures the health outcomes per unit cost of the interventions implemented in the Policy. So that a better Policy corresponds to a larger Reward, we negate this value, which makes the task one of maximising the Reward.

$$R_\pi \in (-\infty, \infty)$$

### Policy

A Policy ($\pi$) for this challenge therefore consists of a temporal sequence of Actions, one for each year, defined in the code as:

### Dependencies:

In [1]:
policies = []
policy = {}

policy['1']=[.55,.7]
policy['2']=[0,0]
policy['3']=[0,0]
policy['4']=[0,0]
policy['5']=[0,0]

policies.append(policy)
print(policies)

[{'1': [0.55, 0.7], '2': [0, 0], '3': [0, 0], '4': [0, 0], '5': [0, 0]}]
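
Before spending evaluations on a policy, it may help to check that it has the expected shape. The following is a sketch under the assumptions stated in this tutorial (keys `'1'` to `'5'`, each mapping to an `[ITN, IRS]` pair in $[0,1]$); `is_valid_policy` is a hypothetical helper, not part of `netsapi`.

```python
def is_valid_policy(policy, years=5):
    """Check that a policy has one [ITN, IRS] action in [0, 1] per year.

    Hypothetical helper for illustration only; not a netsapi function.
    """
    expected_keys = {str(y) for y in range(1, years + 1)}
    if set(policy) != expected_keys:
        return False
    for action in policy.values():
        if len(action) != 2:
            return False
        if not all(0 <= a <= 1 for a in action):
            return False
    return True

example = {'1': [0.55, 0.7], '2': [0, 0], '3': [0, 0], '4': [0, 0], '5': [0, 0]}
print(is_valid_policy(example))
```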

In [2]:
import os
from sys import exit, exc_info, argv
import random
import numpy as np
import pandas as pd

!pip3 install git+https://github.com/slremy/netsapi --user --upgrade

from netsapi.challenge import *


# Evaluating Policies as a Stochastic Multi-armed Bandit

In this section, we evaluate the same Policy several times and visualise the Rewards with Matplotlib to show their stochasticity.

### Evaluating a single known Policy

Let's start with a current intervention campaign of 55% ITN and 70% IRS coverage and obtain its Reward.

In [3]:
print(policies[0]['1']) #Action in Year 1

env = ChallengeEnvironment() #Initialise Challenge Environment
reward = env.evaluateReward(np.asarray(policies[0]['1'])) #This has been negated and any reward should be maximised

print(reward)

[0.55, 0.7]
1000  Evaluations Remaining
12.948810075957244


### Update to Sequential Decision Environment

In [4]:
envSeqDec = ChallengeSeqDecEnvironment() #Initialise a New Challenge Environment to post entire policy
reward = envSeqDec.evaluatePolicy(policies[0]) #Action in Year 1 only

print(reward)

1005  Evaluations Remaining
11.311099498167504


### Observing Stochastic Rewards for a single policy

Let's repeat the evaluation of the policy above and visualise the rewards as a boxplot (Matplotlib may be required, but you are free to visualise the data however you see fit).

In [5]:
rewards = [reward]
for i in range(10):
    reward = envSeqDec.evaluatePolicy(policies[0])
    rewards = np.append(rewards, reward)

print(rewards)

1000  Evaluations Remaining
995  Evaluations Remaining
990  Evaluations Remaining
985  Evaluations Remaining
980  Evaluations Remaining
975  Evaluations Remaining
970  Evaluations Remaining
965  Evaluations Remaining
960  Evaluations Remaining
955  Evaluations Remaining
[11.3110995  11.57090023 13.73156669 11.73709534 12.13090915 13.2724343
11.84748624 12.42272096 11.65866701 12.72114325 12.28536912]

In [6]:
import matplotlib.pyplot as plt

plt.boxplot(rewards)
plt.xlabel('Action [ITN,IRS]')
plt.ylabel('Rewards')

plt.show()

[Figure: boxplot of the rewards for the policy; x-axis "Action [ITN,IRS]", y-axis "Rewards"]
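
The spread across these repeated evaluations can also be summarised numerically. The sketch below uses the eleven reward values copied from the printed output above and reports the sample mean, sample standard deviation and standard error of the mean. It assumes the evaluations are independent draws, which the tutorial does not state explicitly.

```python
import numpy as np

# Rewards copied from the printed output of the repeated evaluations above.
rewards = np.array([11.3110995, 11.57090023, 13.73156669, 11.73709534,
                    12.13090915, 13.2724343, 11.84748624, 12.42272096,
                    11.65866701, 12.72114325, 12.28536912])

mean = rewards.mean()
std = rewards.std(ddof=1)           # sample standard deviation
sem = std / np.sqrt(len(rewards))   # standard error of the mean

print(f"n={len(rewards)} mean={mean:.2f} std={std:.2f} sem={sem:.2f}")
```

A single evaluation of a policy is therefore a noisy estimate; comparing two policies on one evaluation each can easily rank them the wrong way round.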

# Evaluating Policies as a Sequence of Actions

High-performing Policies may also be found by framing this as a sequential decision-making problem, which extends the task to the reinforcement learning paradigm using the State, Action and Reward notions described above. Rewards can be returned for each (State, Action) pair. Observations are the simplest case: only the year in which the decision is being made. A Reward may be observed for the Action taken in each State of the sequence.

In [7]:
episode_count = 1

reward = 0
for i in range(episode_count):
    envSeqDec.reset()
    episodic_reward = 0

    while True:

        #Agent Training Code here
        action = [abs(np.sin(reward)),abs(np.cos(reward))]
        envSeqDec.policy[str(envSeqDec.state)] = action

        ob, reward, done, _ = envSeqDec.evaluateAction(action)
        print('reward',reward)
        episodic_reward += reward

        if done:
            break

    policies.append(envSeqDec.policy)
    print('policy', envSeqDec.policy)
    print('episodic_reward', episodic_reward)

950  Evaluations Remaining
reward 103.53479064947626
949  Evaluations Remaining
reward -0.4566663940986153
948  Evaluations Remaining
reward 6.114871910871425
947  Evaluations Remaining
reward 14.060675679008984
946  Evaluations Remaining
reward 65.97544670812049
policy {'1': [0.0, 1.0], '2': [0.13733153601576423, 0.9905251381037994], '3': [0.4409585598249027, 0.8975274639347521], '4': [0.167519818317901, 0.9858687085361506], '5': [0.9970759695122792, 0.07641669334084424]}
episodic_reward 189.22911855337856
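
The placeholder action rule in the loop above could be replaced by a learning agent. As one possible sketch, the code below runs an epsilon-greedy agent that keeps a running mean reward for each (year, action) pair over a coarse grid of coverage values. It is not the challenge environment: `toy_reward` is a hypothetical stand-in with an arbitrary shape, and the grid, `epsilon` and `train` are assumptions made for illustration. In practice the call to `toy_reward` would be replaced by `envSeqDec.evaluateAction(action)`.

```python
import random

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
ACTIONS = [(itn, irs) for itn in GRID for irs in GRID]  # 25 discrete [ITN, IRS] pairs
YEARS = range(1, 6)

def toy_reward(year, action):
    """Hypothetical noisy reward, NOT the netsapi environment (ignores the year)."""
    itn, irs = action
    return 10 - 20 * ((itn - 0.2) ** 2 + (irs - 0.8) ** 2) + random.gauss(0, 1)

def train(episodes=20, epsilon=0.2, seed=0):
    random.seed(seed)
    # Running mean reward and visit count for each (year, action) pair.
    value = {(y, a): 0.0 for y in YEARS for a in ACTIONS}
    count = {(y, a): 0 for y in YEARS for a in ACTIONS}

    for _ in range(episodes):
        for year in YEARS:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: value[(year, a)])  # exploit
            r = toy_reward(year, action)
            count[(year, action)] += 1
            # Incremental update of the running mean.
            value[(year, action)] += (r - value[(year, action)]) / count[(year, action)]

    # Build the greedy policy from visited actions, in the tutorial's dict format.
    policy = {}
    for year in YEARS:
        visited = [a for a in ACTIONS if count[(year, a)] > 0]
        policy[str(year)] = list(max(visited, key=lambda a: value[(year, a)]))
    return policy

policy = train()
print(policy)
```

Treating each year independently fits the fixed State transition described earlier, but it ignores any effect earlier Actions may have on later Rewards, so it should be read as a baseline rather than a recommended method.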


# Generating a Submission file

### Description of Submission Process

Please do not alter anything outside of the generate() method, which is where your own Agent implementation goes. The submission file to be scored is generated from 10 runs of 20 episodes, each episode running a Policy of 5 Actions, which meets the 1000-evaluation constraint of the initialised environment. When you package your code for evaluation, you will be required to maintain this style.
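
The budget arithmetic can be checked directly. In the printed counters earlier in this tutorial, each `evaluatePolicy` call on a single policy lowers the remaining count by 5 and each `evaluateAction` call lowers it by 1, so the sketch below counts one evaluation per Action. The variable names are illustrative only.

```python
runs = 10                 # training runs used to build the submission
episodes_per_run = 20     # episodes in each run
actions_per_episode = 5   # one Action per year

evaluations_per_run = episodes_per_run * actions_per_episode
total_evaluations = runs * evaluations_per_run

print(evaluations_per_run, total_evaluations)  # 100 1000
```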

In [8]:
from sys import exit, exc_info, argv
import random  # needed for random.random() in generate()
import numpy as np
import pandas as pd

!pip3 install git+https://github.com/slremy/netsapi --user --upgrade

from netsapi.challenge import *

class CustomAgent:
    def __init__(self, environment):
        self.environment = environment

    def generate(self):
        best_policy = None
        best_reward = -float('Inf')
        candidates = []
        try:
            # Agents should make use of 20 episodes in each training run, if making sequential decisions
            for i in range(20):
                self.environment.reset()
                policy = {}
                for j in range(5): #episode length
                    policy[str(j+1)]=[random.random(),random.random()]
                candidates.append(policy)

            # Evaluate all candidate policies and keep the one with the highest reward
            rewards = self.environment.evaluatePolicy(candidates)
            best_policy = candidates[np.argmax(rewards)]
            best_reward = rewards[np.argmax(rewards)]

        except (KeyboardInterrupt, SystemExit):
            print(exc_info())

        return best_policy, best_reward


This Agent class may then be passed to the EvaluateChallengeSubmission class to generate a submission file:

In [ ]:
EvaluateChallengeSubmission(ChallengeSeqDecEnvironment, CustomAgent, "tutorial.csv")

1005  Evaluations Remaining
