At the beginning of Phase II we will clear the leaderboard.
We have to do this because we have reason to suspect that
many duplicate accounts were used and that hand labeling took place.
Also, we will be requesting two new things:
1. Affiliations: Please provide your school or work email
address as your affiliation. You can also participate
unaffiliated; however, you cannot later change from
unaffiliated to school or work.
2. Submission of code: You must submit your code along
with every submission you make.
Please note: We reserve the right to remove participants whom
we reasonably suspect of hand labeling or of holding multiple
accounts. Both are against the spirit of the competition (see the Rules page).
In recent years there has been an explosion of papers on time series anomaly detection appearing in SIGKDD and other data mining, machine learning and database conferences. Most of these papers test on one or more of a handful of benchmark datasets, including datasets created by NASA, Yahoo, Numenta and Tsinghua-OMNI (Pei’s Lab).
While the community should greatly appreciate the efforts of these teams to share data, a handful of recent papers [a] have suggested that these datasets are unsuitable for gauging progress in anomaly detection.
In brief, the two most compelling arguments against using these datasets are:
Beyond the issues listed above, and the possibility of a file drawer effect [b] and/or cherry-picking [c], we believe that the community has been left with a set of unsuitable benchmarks. With this in mind, we have created new benchmarks for time series anomaly detection as part of this contest.
The benchmark datasets created for this contest are designed to mitigate this problem. It is important to note our claim is “mitigate”, not “solve”. We think it would be wonderful for a large and diverse group of researchers to address this issue, much in the spirit of CASP [d].
In the meantime, the 250 datasets that are part of this challenge reflect more than 20 years of work surveying the time series anomaly detection literature and collecting datasets. Beyond the life of this competition, we hope that they can serve as a resource for the community for years to come, and inspire deeper introspection about the evaluation of anomaly detection.
Further, in order to keep the spirit of the competition high, we would like to thank Hexagon-ML for not only sponsoring the competition but also providing the prizes:
- First Prize : $2000 USD
- Second Prize : $1000 USD
- Third Prize : $500 USD
- The top 15 participants will receive a certificate with their rank.
- All other participants will receive a participation certificate.
We hope you will enter the contest, and have lots of fun!
Prof. Eamonn Keogh, UC Riverside and Taposh Roy, Kaiser Permanente
Cite this competition:
Keogh, E., Dutta Roy, T., Naik, U. & Agrawal, A. (2021).
Multi-dataset Time-Series Anomaly Detection Competition, SIGKDD 2021.
Evaluation for this competition will be done based on the outcomes of Phase II only.
There will be a public leaderboard showing your result as soon as you submit a submission file.
The private leaderboard, showing ranks and winners, will be released one week after the competition ends on April 16th, 2021. This will be the final leaderboard.
The evaluation metric is the percentage of files in which the anomaly is located correctly.
Every time-series has exactly one anomaly.
For every file in which you correctly identify the location of the anomaly you get 1 point; for every incorrect answer you get 0 points.
An answer counts as correct if it falls within the anomaly range extended by 100 locations on either side.
There are 250 files, so the maximum score you can obtain is 100% (as long as you do this in code using an algorithm, with no hand labeling). We reserve the right to disqualify any participant we suspect of hand labeling. Please see the rules for participating in the competition.
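The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers' evaluation code: the function name `score_submission`, the dictionary layout and the file names are hypothetical, and the anomaly range is assumed to be given as inclusive begin and end locations.

```python
def score_submission(predictions, truth_ranges, tolerance=100):
    """Return the percentage of files whose predicted location is correct.

    predictions:  {file_name: predicted_location}
    truth_ranges: {file_name: (anomaly_begin, anomaly_end)}, inclusive
    """
    correct = 0
    for name, (begin, end) in truth_ranges.items():
        predicted = predictions.get(name)
        # Assumption of this sketch: a missing prediction scores 0 points.
        if predicted is None:
            continue
        # Correct if inside the anomaly range widened by `tolerance` on each side.
        if begin - tolerance <= predicted <= end + tolerance:
            correct += 1
    return 100.0 * correct / len(truth_ranges)


# Hypothetical two-file example: the first guess is within 100 locations
# of the anomaly range, the second is not.
truth = {"file_001": (5200, 5300), "file_002": (900, 950)}
guess = {"file_001": 5120, "file_002": 1100}
print(score_submission(guess, truth))  # 50.0
```

With 250 files, each correct answer is worth 0.4 percentage points under this rule.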
We reserve the right to change the rules and data sets if deemed necessary.
Updates for Phase II (from description page):
(Q) Why must you submit code with every attempt?
(A) Recall the goal of the contest is not to find the anomalies. The goal of the contest is to produce a single algorithm that can find the anomalies [*]. If your submission turns out to be competitive, your submitted code allows you to create a post-hoc demonstration that it was the result of a clever algorithm.
(Q) Why must you use an official university or company email address to register?
(A) Experience in similar contests suggests that otherwise individuals may register multiple times to glean an advantage. It is hard to prevent multiple registrations, but this policy goes some way to limit the utility of an unfair advantage.
[*] Of course, the “single algorithm” can be an ensemble or a meta-algorithm that automatically chooses which algorithm and which parameters to use. However, having a human decide which algorithm or which parameters to use on a case-by-case basis is not allowed. This is not a test of human skill; it is a test of algorithms.
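As an illustration of automatic parameter choice, the sketch below picks a window length from the data itself by looking for the dominant period in the autocorrelation. It is a hypothetical example, not a method prescribed by the organizers: the function name `estimate_period` and the fallback value are assumptions, and the direct autocorrelation used here is quadratic in the series length, so an FFT-based version would be preferable for long series.

```python
import numpy as np


def estimate_period(series, min_lag=2):
    """Estimate the dominant period as the highest autocorrelation peak
    found after the autocorrelation first drops below zero."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocorrelation for lags 0 .. n-1 (non-negative half of the full result).
    acf = np.correlate(x, x, mode="full")[n - 1:]
    max_lag = n // 2
    negative = np.flatnonzero(acf[:max_lag] < 0)
    if acf[0] == 0 or negative.size == 0:
        # Constant or non-oscillating input: fall back to the minimum lag.
        return min_lag
    start = max(min_lag, int(negative[0]))
    return start + int(np.argmax(acf[start:max_lag]))


# A clean sine wave with a period of 50 samples.
t = np.arange(1000)
window = estimate_period(np.sin(2 * np.pi * t / 50))
print(window)  # 50
```

The choice depends only on an intrinsic property of the series being examined, so the same code path runs unchanged on every file.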
The spirit of the contest is to create a general-purpose algorithm for anomaly detection. It is against the spirit of this goal (and explicitly against the rules) to embed a human’s intelligence into the algorithm based on a human inspection of the contest data. For example, a human might look at the data and notice: “When the length of the training data is odd, algorithms that look at the maximum value are best. But when the length of the training data is even, algorithms that look at the variance are best.” The human might then code up a meta-algorithm like “If odd(length(training data)) then invoke … ”.
There is a simple test for the example above: if we duplicated just the first datapoint, would the outcomes be essentially identical? For a genuinely general-purpose algorithm they should be; for the parity rule above they would not.
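The duplication test described above can be sketched as follows. Everything here is hypothetical and for illustration only: `survives_duplication_test`, the two toy detectors, and the tolerance of 100 locations (borrowed from the scoring window) are assumptions, not part of the organizers' code review.

```python
import numpy as np


def survives_duplication_test(detector, series, tolerance=100):
    """Run `detector` on the series and on a copy whose first datapoint is
    duplicated, and check that both runs report essentially the same place."""
    series = np.asarray(series, dtype=float)
    duplicated = np.concatenate(([series[0]], series))
    original_location = detector(series)
    # The copy is one point longer, so shift its answer back by one.
    shifted_location = detector(duplicated) - 1
    return abs(original_location - shifted_location) <= tolerance


def max_jump_detector(series):
    """Toy detector: the index just after the largest jump between neighbors."""
    return int(np.argmax(np.abs(np.diff(series)))) + 1


def parity_detector(series):
    """Rule-violating toy: its behavior depends on whether the length is odd."""
    if len(series) % 2 == 1:
        return int(np.argmax(series))
    return int(np.argmin(series))


data = np.zeros(1001)
data[200] = 5.0
data[800] = -5.0
print(survives_duplication_test(max_jump_detector, data))  # True
print(survives_duplication_test(parity_detector, data))    # False
```

The parity-based detector switches from the maximum to the minimum once the length becomes even, so its answer moves by 600 locations and the check fails.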
Of course, an algorithm can be adaptive. If the representational power of your algorithm is able to discover and exploit some regularity, then that is fine. However, an algorithm that effectively memorizes which of the datasets it is looking at, and changes its parameters or behavior based on that observation (rather than on the intrinsic properties of the data it observes), is cheating. Our code review of the best-performing algorithms will attempt to discover any such deliberate overfitting to the contest problems.
To receive announcements and be informed of any change in rules, the participants must provide a valid email address to the challenge platform.
Conditions of participation:
Participation requires complying with the rules of the Competition. Prize eligibility is restricted by US government export regulations. The organizers, sponsors, their students, close family members (parents, siblings, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the Competition design giving them an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the Competition and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
The winners will be invited to attend a remote webinar organized by the hosts and present their method.
The participants must register on the platform and provide a valid email address. Teams must register only once and provide a group email, which is forwarded to all team members. Teams or solo participants registering multiple times to gain an advantage in the competition may be disqualified.
All work is open source.
The participants who do not present their results at the webinar can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to claim any prize they might win.
One account per participant:
You cannot sign up from multiple accounts and therefore you cannot submit from multiple accounts.
Max team size of 4.
No private sharing outside teams:
Privately sharing code or data outside of teams is not permitted. It's okay to share code if made available to all participants on the forums.
You may submit a maximum of 1 entry per day.
Use of external data is not permitted. This includes use of pre-trained models.
Hand-labeling is not permitted and will be grounds for disqualification.
Knowledge should not be shared between any two files; any violation of this will lead to disqualification.
Submissions should be reasonably constrained to standard open-source libraries (Python, R, Julia and Octave).
If submitted code cannot be run, the team may be contacted. If minor remediation or sufficient information to run the code is not provided, the submission will be removed.
If an algorithm is stochastic, please make sure you save the seeds.
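One way to follow the last two points, processing each file on its own and recording the seed behind every stochastic run, is sketched below. The names `stochastic_detector` and `run_reproducibly`, the base seed, and the file names are hypothetical; the random-candidate detector is only a stand-in for a real stochastic method.

```python
import json

import numpy as np


def stochastic_detector(series, rng):
    """Hypothetical stand-in for a stochastic method: score up to 20 randomly
    chosen positions and return the one that deviates most from the median."""
    series = np.asarray(series, dtype=float)
    candidates = rng.choice(len(series), size=min(20, len(series)), replace=False)
    deviations = np.abs(series[candidates] - np.median(series))
    return int(candidates[np.argmax(deviations)])


def run_reproducibly(files, base_seed=2021):
    """Process each file independently and record the seed used for it."""
    results = {}
    for offset, name in enumerate(sorted(files)):
        seed = base_seed + offset
        # Fresh generator per file: no random state is carried between files.
        rng = np.random.default_rng(seed)
        location = stochastic_detector(files[name], rng)
        results[name] = {"seed": seed, "location": location}
    return results


demo_files = {"file_001": np.arange(50.0), "file_002": np.ones(30)}
results = run_reproducibly(demo_files)
# Saving this record next to the submission lets the run be repeated exactly.
print(json.dumps(results, indent=2))
```

Because each generator is created from its recorded seed, rerunning `run_reproducibly` on the same files gives the same locations.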
|Rank|Participant|Score (%)|Entries|Last submission|
|1|poteman|70.80|5|April 14, 2021, 3:49 a.m.|
|2|gen|70.40|3|April 13, 2021, 6:46 a.m.|
|3|NeoZhao|69.60|2|April 13, 2021, 12:25 a.m.|
|4|HU WBI|64.80|5|April 15, 2021, 12:35 a.m.|
|5|yu|64.00|5|April 12, 2021, 1:37 a.m.|
|6|LUMEN|64.00|2|April 14, 2021, 10:19 a.m.|
|7|MSD|62.40|1|April 9, 2021, 1:12 p.m.|
|8|Old Captain|61.60|1|April 12, 2021, 6:21 a.m.|
|9|CASIA|60.40|2|April 8, 2021, 7:31 p.m.|
|10|Limos Team|60.40|3|April 15, 2021, 12:54 a.m.|
|11|Gidora|58.00|5|April 14, 2021, 2:04 p.m.|
|12|FirstDan|57.60|3|April 11, 2021, 8:58 p.m.|
|13|walyc|57.60|7|April 13, 2021, 6:10 a.m.|
|14|HI|55.20|6|April 12, 2021, 4:49 a.m.|
|15|syin1|54.40|3|April 12, 2021, 12:39 a.m.|
|16|haizhan|53.60|4|April 13, 2021, 1:33 a.m.|
|17|Kubota|53.20|3|April 13, 2021, 2:08 a.m.|
|18|kris13|53.20|1|April 14, 2021, 4:52 a.m.|
|19|KP|52.40|3|April 15, 2021, 2:48 a.m.|
|20|TAL_AI_NLP|52.00|5|April 10, 2021, 12:50 a.m.|
|21|hpad|52.00|3|April 11, 2021, 1:30 a.m.|
|22|whatsup|52.00|1|April 11, 2021, 9:09 a.m.|
|23|exp234|52.00|6|April 12, 2021, 1:45 a.m.|
|24|kddi_research|51.60|3|April 12, 2021, 6:13 a.m.|
|25|huangguo|50.80|4|April 14, 2021, 2:40 a.m.|
|26|xuesheng|50.80|5|April 14, 2021, 6:37 a.m.|
|27|insight|50.40|3|April 12, 2021, 5:28 a.m.|
|28|AIG_Mastercard|49.60|1|April 15, 2021, 1:52 a.m.|
|29|Jim|49.20|2|April 12, 2021, 11:33 p.m.|
|30|JJ|47.20|2|April 15, 2021, 3:17 a.m.|
|31|lansy|46.40|1|April 14, 2021, 7:25 p.m.|
|32|sion|45.60|5|April 10, 2021, 12:28 p.m.|
|33|willxu|45.20|2|April 12, 2021, 3:44 a.m.|
|34|demo_user|45.20|1|April 14, 2021, 6:27 a.m.|
|35|Alibey|44.40|2|April 9, 2021, 6:33 p.m.|
|36|wenj|44.00|2|April 12, 2021, 11:35 p.m.|
|37|LQKK|42.80|1|April 8, 2021, 8:05 a.m.|
|38|Ida|42.80|1|April 11, 2021, 5:44 a.m.|
|39|ralgond|42.40|8|April 14, 2021, 2:17 a.m.|
|40|linytsysu|42.00|1|April 12, 2021, 7:26 a.m.|
|41|166|42.00|1|April 13, 2021, 12:33 a.m.|
|42|UCM/INNOVA-TSN|41.60|2|April 8, 2021, 3:57 p.m.|
|43|Anony|40.00|2|April 13, 2021, 1:42 a.m.|
|44|yuanliu|39.20|1|April 10, 2021, 7:57 a.m.|
|45|Liu|39.20|1|April 10, 2021, 8:10 a.m.|
|46|Anony|38.00|2|April 13, 2021, 2:06 a.m.|
|47|Wakamoto|38.00|4|April 13, 2021, 8:25 p.m.|
|48|katsuhito|34.80|1|April 14, 2021, 5 p.m.|
|49|mzrske|34.40|2|April 14, 2021, 6:06 p.m.|
|50|runningz|34.00|1|April 8, 2021, 11:43 p.m.|
|51|yuanCheng|33.60|5|April 11, 2021, 10:33 a.m.|
|52|zsyjy|32.00|4|April 11, 2021, 6:36 p.m.|
|53|OnePiece|32.00|1|April 14, 2021, 2:31 a.m.|
|54|jin|31.20|3|April 14, 2021, 11:46 p.m.|
|55|NONE|28.80|1|April 9, 2021, 9:49 a.m.|
|56|KeepItUp|22.80|1|April 12, 2021, 10:51 p.m.|
|57|sad|22.40|3|April 13, 2021, 8:48 p.m.|
|58|AnomalyDetection|22.00|1|April 14, 2021, 12:44 a.m.|
|59|sdl-team|21.20|3|April 10, 2021, 6:29 a.m.|
|60|itouchz.me|20.80|4|April 13, 2021, 12:15 p.m.|
|61|Splunk Applied Research|20.00|1|April 8, 2021, 9:56 a.m.|
|62|BigPicture|17.20|2|April 14, 2021, 11:15 p.m.|
|63|tEST|12.40|2|April 14, 2021, 8:52 p.m.|
|64|Prarthi|12.00|1|April 11, 2021, 10:02 a.m.|
|65|Seemandhar|11.60|1|April 11, 2021, 9:46 a.m.|
|66|AD|11.60|2|April 11, 2021, 8:15 p.m.|
|67|WJH|10.80|1|April 14, 2021, 8:23 a.m.|
|68|Songpeix|9.20|1|April 11, 2021, 12:57 a.m.|
|69|Pooja|6.40|1|April 10, 2021, 12:40 p.m.|
|70|patpat|3.20|1|April 15, 2021, 5:54 a.m.|
|71|kmskonilg|2.40|2|April 10, 2021, 3:57 a.m.|
|72|guoy|1.60|1|April 15, 2021, 1:49 a.m.|
|73|Host|1.20|1|April 7, 2021, 11:10 p.m.|
|74|uday|0.00|1|April 7, 2021, 10:07 p.m.|
|75|Competition Host|0.00|1|April 7, 2021, 11:39 p.m.|
|76|PaulyCat|0.00|1|April 13, 2021, 6:30 a.m.|
|77|finlayliu|0.00|1|April 14, 2021, 12:25 a.m.|
|78|daintlab|0.00|1|April 14, 2021, 5:05 a.m.|
Overview of the Time Series Anomaly Detection Competition
Detecting anomalies in univariate time series is a challenge that has been around for more than 50 years. Several attempts have been made, but there is still no robust solution. This year, as part of KDD Cup 2021, Prof. Eamonn Keogh and Taposh Roy are hosting the multi-dataset time series anomaly detection competition. The goal of this competition is to encourage industry and academia to find a solution for univariate time series anomaly detection. Prof. Keogh has provided 250 datasets, collected over 20 years of research, to further this area. Please review the brief overview video developed by Dr. Keogh.
Here is a simple example showing how to find an anomaly in a single time series file.
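# import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matrixprofile as mp

# read the dataset (placeholder path: point it at one contest file;
# assumes a plain-text file of numeric values)
ts = np.loadtxt("path/to/contest_file.txt")

# set the window size (4 is the value named in the comment below; tune it to the data)
window_size = 4

# calculate the matrix profile with window size 4
profile = mp.compute(ts, windows=window_size)

# the top discord, the subsequence farthest from its nearest neighbor,
# is the anomaly candidate
profile = mp.discover.discords(profile, k=1)
print(profile["discords"])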
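For readers who want to see the underlying idea without the matrixprofile package, here is a brute-force sketch in plain NumPy. It looks for the top discord: the subsequence whose nearest non-overlapping neighbor, under z-normalized Euclidean distance, is farthest away. The function name `top_discord`, the half-window exclusion zone, and the synthetic sine wave with a planted flat segment are assumptions made for illustration; the loop is quadratic in the series length and is not meant for the full contest data.

```python
import numpy as np


def top_discord(series, window):
    """Return (start_index, distance) of the subsequence whose nearest
    non-overlapping neighbor is farthest away (z-normalized Euclidean)."""
    x = np.asarray(series, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(x, window)
    mean = subs.mean(axis=1, keepdims=True)
    std = subs.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero on flat subsequences
    z = (subs - mean) / std
    n_sub = len(z)
    exclusion = window // 2  # assumed zone that skips trivial self-matches
    best_index, best_distance = -1, -np.inf
    for i in range(n_sub):
        distances = np.linalg.norm(z - z[i], axis=1)
        lo, hi = max(0, i - exclusion), min(n_sub, i + exclusion + 1)
        distances[lo:hi] = np.inf
        nearest = distances.min()
        if np.isfinite(nearest) and nearest > best_distance:
            best_index, best_distance = i, float(nearest)
    return best_index, best_distance


# Synthetic example: a sine wave with a flat segment planted at 200..209.
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 40)
signal[200:210] = 0.0
index, distance = top_discord(signal, window=20)
print(index, distance)
```

Every window that avoids the planted segment has a near-identical copy one period away, so the reported index falls on a window that overlaps locations 200 to 209.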