Multi-dataset Time Series Anomaly Detection

Anomaly Detection in Time Series

Workshop Agenda (Aug 15th 2021)


9:00am-9:30am: Overview of the competition and announcement of the results for the top 5. [Dr. Eamonn Keogh & Taposh Roy]

9:30am-10:00am: Competition winner (team #1) presents their approach and work

10:00am-10:15am: Break

10:15am-11:30am: Lightning Talks - Teams (#2 - #7)

11:45am-12:00pm: Round Table / Feedback / Closing comments

* Note: All times are in Singapore time (GMT+8)

In recent years there has been an explosion of papers on time series anomaly detection appearing in SIGKDD and other data mining, machine learning and database conferences. Most of these papers test on one or more of a handful of benchmark datasets, including datasets created by NASA, Yahoo, Numenta and Tsinghua-OMNI (Pei’s Lab) etc.

While the community should greatly appreciate the efforts of these teams to share data, a handful of recent papers [a] have suggested that these datasets are unsuitable for gauging progress in anomaly detection.

In brief, the two most compelling arguments against using these datasets are:

  • Triviality: Almost all the benchmark datasets mentioned above can be perfectly solved, without the need to look at any training data, and with decade-old algorithms.
  • Mislabeling: The possibility of mislabeling in anomaly detection benchmarks can never be completely eliminated. However, some of the datasets mentioned above seem to have a significant number of false positives and false negatives in the ground truth. Papers have been published arguing that method A is better than method B because it is 5% more accurate on benchmark X. However, a careful examination of benchmark X suggests that more than 25% of the labels are wrong, a number that dwarfs the claimed difference between the algorithms being compared.
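
To make the triviality argument concrete, here is a minimal sketch on synthetic data (not on any of the benchmarks named above). It assumes a spike-type anomaly and simply reports the point with the largest absolute first difference, using no training data at all. The series, the injected spike and the function name `one_liner_detector` are illustrative assumptions, not part of the contest material.

```python
import numpy as np

def one_liner_detector(ts):
    """Return the index just after the largest absolute first difference."""
    return int(np.argmax(np.abs(np.diff(ts)))) + 1

# Synthetic series: a smooth sine wave with a little noise (illustrative only)
rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.01 * rng.standard_normal(2000)
ts[1200] += 3.0  # inject a single spike as the synthetic anomaly

predicted = one_liner_detector(ts)
print(predicted)
```

If a benchmark can be solved this way, a high score on it says little about the merit of a more elaborate method.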

Beyond the issues listed above, and the possibility of a file drawer effect [b] and/or cherry-picking [c], we believe that the community has been left with a set of unsuitable benchmarks. With this in mind, we have created new benchmarks for time series anomaly detection as part of this contest.

The benchmark datasets created for this contest are designed to mitigate this problem. It is important to note our claim is “mitigate”, not “solve”. We think it would be wonderful for a large and diverse group of researchers to address this issue, much in the spirit of CASP [d].

In the meantime, the 250 datasets that are part of this challenge reflect more than 20 years of work surveying the time series anomaly detection literature and collecting datasets. Beyond the life of this competition, we hope that they can serve as a resource for the community for years to come, and inspire deeper introspection about the evaluation of anomaly detection.

Further, in order to keep the spirit of the competition high, we would like to thank Hexagon-ML for not only sponsoring the competition but also providing the prizes:

  • First Prize : $2000 USD
  • Second Prize : $1000 USD
  • Third Prize : $500 USD
  • For the top 15 participants we will provide a certificate with rank.
  • For all other participants we will provide a participation certificate.

We hope you will enter the contest, and have lots of fun!

Best wishes, 

 

Prof. Eamonn Keogh, UC Riverside and Taposh Roy, Kaiser Permanente


Cite this competition:

Keogh, E., Dutta Roy, T., Naik, U. & Agrawal, A. (2021). Multi-dataset Time-Series Anomaly Detection Competition, SIGKDD 2021. https://compete.hexagon-ml.com/practice/competition/39/


References

[a] https://arxiv.org/abs/2009.13807 Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress. Wu and Keogh

[b] https://en.wikipedia.org/wiki/Publication_bias

[c] https://en.wikipedia.org/wiki/Cherry_picking

[d] https://en.wikipedia.org/wiki/CASP


Evaluation

  • Evaluation for this competition will be done based on the outcomes of Phase II only.

  • There will be a public leaderboard showcasing the results instantly as you submit a submission file. 

  • The private leaderboard showcasing rank and winner will be released one week after the competition is over on April 16th 2021. This will be the final leaderboard.

  • The scoring metric is the percentage of time series for which the anomaly is correctly located.

  • Every time-series has exactly one anomaly.

  • For every correct identification of the location of the anomaly you will get 1 point, and 0 points for every incorrect one.

  • An answer is awarded as correct if it falls within 100 locations on either side of the anomaly range.


    Example

    There are 250 files; for every correct answer you will get 1 point, and 0 for every incorrect one. The maximum score you can obtain is 100% (as long as you do this in code using an algorithm, with no hand labeling). We reserve the right to disqualify any participant we suspect of hand labeling. Please see the rules for participating in the competition.
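
The scoring rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the dictionary layout and the toy file names are hypothetical, the official scoring code is not shown on this page, and treating a missing prediction as incorrect is our assumption.

```python
def is_correct(predicted, anomaly_start, anomaly_end, tolerance=100):
    """True if the predicted location falls inside the anomaly range,
    widened by `tolerance` locations on either side."""
    return anomaly_start - tolerance <= predicted <= anomaly_end + tolerance

def score_percent(predictions, truth, tolerance=100):
    """Percentage of files whose single anomaly was correctly located.
    A file with no prediction is counted as incorrect (assumption)."""
    hits = sum(
        1
        for name, (start, end) in truth.items()
        if name in predictions
        and is_correct(predictions[name], start, end, tolerance)
    )
    return 100.0 * hits / len(truth)

# Toy ground truth: file name -> (anomaly start, anomaly end). Hypothetical values.
truth = {"a": (500, 520), "b": (1000, 1010), "c": (50, 60), "d": (3000, 3100)}
predictions = {"a": 415, "b": 1111, "c": 0, "d": 3200}

print(score_percent(predictions, truth))
```

With 250 files, each correct answer is worth 100 / 250 = 0.4 percentage points, which matches the 0.4 step between scores on the leaderboard.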

Rules

We reserve the right to change the rules and data sets if deemed necessary.


Updates for Phase II (from description page):


(Q) Why must you submit code with every attempt?

(A) Recall the goal of the contest is not to find the anomalies. The goal of the contest is to produce a single algorithm that can find the anomalies [*]. If your submission turns out to be competitive, your submitted code allows you to create a post-hoc demonstration that it was the result of a clever algorithm.

(Q) Why must you use an official university or company email address to register?

(A) Experience in similar contests suggests that otherwise individuals may register multiple times to glean an advantage. It is hard to prevent multiple registrations, but this policy goes some way toward limiting the utility of such an unfair advantage.

[*] Of course, the “single algorithm” can be an ensemble or a meta-algorithm that automatically chooses which algorithm and which parameters to use. However, having a human decide which algorithm or which parameters to use on a case-by-case basis is not allowed. This is not a test of human skill; it is a test of algorithms.

 


Note:

The spirit of the contest is to create a general-purpose algorithm for anomaly detection. It is against the spirit of this goal (and explicitly against the rules) to embed a human’s intelligence into the algorithm based on a human inspection of the contest data. For example, a human might look at the data and notice: “When the length of the training data is odd, algorithms that look at the maximum value are best. But when the length of the training data is even, algorithms that look at the variance are best.” The human might then code up a meta-algorithm like “If odd(length(training data)) then invoke … ”. There is a simple test for the example above: if we just duplicated the first datapoint, would the outcomes be essentially identical? Of course, an algorithm can be adaptive. If the representational power of your algorithm is able to discover and exploit some regularity, then that is fine. However, an algorithm that effectively memorizes which of the datasets it is looking at, and changes its parameters/behavior based on that observation (not on the intrinsic properties of the data it observes), is cheating. Our code review of the best-performing algorithms will attempt to discover any such deliberate overfitting to the contest problems.
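
The duplication test mentioned in the note can be sketched as follows. This is an illustrative sketch, not the organizers' review procedure: all function names are hypothetical, the data is synthetic, and interpreting “essentially identical” as “within 100 locations after accounting for the one-point shift” is our assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_value_detector(ts):
    """Location of the maximum value."""
    return int(np.argmax(ts))

def variance_detector(ts, window=50):
    """Start of the window with the largest variance."""
    variances = sliding_window_view(ts, window).var(axis=1)
    return int(np.argmax(variances))

def parity_meta_detector(ts):
    """The kind of length-parity rule described in the note (not allowed)."""
    if len(ts) % 2 == 1:
        return max_value_detector(ts)
    return variance_detector(ts)

def passes_duplication_test(detector, ts, tolerance=100):
    """Duplicate the first datapoint and check the answer barely moves."""
    original = detector(ts)
    shifted = detector(np.concatenate(([ts[0]], ts)))
    # Duplicating the first point shifts every later index by one.
    return abs((shifted - 1) - original) <= tolerance

# Synthetic series of odd length with a spike and a high-variance burst
ts = 0.1 * np.sin(np.linspace(0, 40 * np.pi, 2001))
ts[300] = 5.0
ts[1500:1550] = np.tile([2.0, -2.0], 25)

print(passes_duplication_test(max_value_detector, ts))
print(passes_duplication_test(parity_meta_detector, ts))
```

A detector that depends only on intrinsic properties of the data gives the same answer up to the one-point shift, while the parity rule jumps to a completely different location.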



Announcements:

To receive announcements and be informed of any change in the rules, participants must provide a valid email address to the challenge platform.

 

Conditions of participation:

Participation requires complying with the rules of the Competition. Prize eligibility is restricted by US government export regulations. The organizers, sponsors, their students, close family members (parents, siblings, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the Competition design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the Competition and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.

Dissemination:

The Winners will be invited to attend a remote webinar organized by the hosts and present their method.

Registration:

The participants must register on the platform and provide a valid email address. Teams must register only once and provide a group email address, which is forwarded to all team members. Teams or solo participants registering multiple times to gain an advantage in the competition may be disqualified.

 

Intellectual Policy:

All work is open source. 

 

 

Anonymity:

The participants who do not present their results at the webinar can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to claim any prize they might win. 


One account per participant:
You cannot sign up from multiple accounts and therefore you cannot submit from multiple accounts.  

 

Team Size:

Max team size of 4.

 

No private sharing outside teams:
Privately sharing code or data outside of teams is not permitted. It's okay to share code if made available to all participants on the forums.

 

Submission Limits:
You may submit a maximum of 1 entry per day.

 

Specific Understanding:

  1. Use of external data is not permitted. This includes use of pre-trained models.

  2. Hand-labeling is not permitted and will be grounds for disqualification.

  3. Knowledge must not be shared between any two files; any violation of this will lead to disqualification.

  4. Submissions should be reasonably constrained to standard open-source libraries (Python, R, Julia and Octave).

  5. If submitted code cannot be run, the team may be contacted; if minor remediation or sufficient information to run the code is not provided, the submission will be removed.

  6. If an algorithm is stochastic, please make sure you save the seeds.
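
One way to satisfy the last point is to drive all randomness from an explicitly recorded seed. The sketch below is only an illustration: the seed value, the toy detector and the record layout are hypothetical, and the contest does not prescribe any particular format.

```python
import json
import numpy as np

SEED = 20210531  # arbitrary illustrative value; record whatever seed you use

def run_stochastic_detector(ts, seed):
    """Toy stochastic detector: sample candidate positions at random and
    return the one that deviates most from the median."""
    rng = np.random.default_rng(seed)
    candidates = rng.choice(len(ts), size=min(100, len(ts)), replace=False)
    deviations = np.abs(ts[candidates] - np.median(ts))
    return int(candidates[np.argmax(deviations)])

ts = np.cos(np.linspace(0, 10, 500))

# Re-running with the same seed reproduces the same answer.
first = run_stochastic_detector(ts, SEED)
second = run_stochastic_detector(ts, SEED)

# Keep the seed next to the result so the run can be repeated later.
record = json.dumps({"seed": SEED, "prediction": first})
print(record)
```

Saving such a record alongside each submission makes it possible to rerun the code and obtain the same answer during a code review.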

Leaderboard

Rank Team Percentile Count Submitted Date
1 DBAI 88.40000000 39 May 29, 2021, 11:16 p.m.
2 Old Captain 87.60000000 25 May 31, 2021, 8:35 p.m.
3 JJ 87.20000000 17 May 31, 2021, 10:40 p.m.
4 MDTS 87.20000000 27 May 31, 2021, 10:43 p.m.
5 gen 86.80000000 40 May 31, 2021, 11:20 p.m.
6 insight 84.80000000 17 May 31, 2021, 11:05 p.m.
7 HU WBI 84.40000000 24 May 30, 2021, 12:04 a.m.
8 AutoAD 82.80000000 24 May 30, 2021, 8:34 a.m.
9 yu 80.00000000 23 May 30, 2021, 8:23 a.m.
10 yanxinyi 80.00000000 2 May 31, 2021, 5:06 a.m.
11 NSSOL Suzuki 76.80000000 21 May 27, 2021, 11:58 p.m.
12 poteman 76.00000000 14 April 23, 2021, 3:55 a.m.
13 Id&aLab 76.00000000 7 May 31, 2021, 11:46 p.m.
14 TAL_AI_NLP 75.60000000 26 May 31, 2021, 11:32 p.m.
15 ralgond 75.20000000 53 May 31, 2021, 12:17 a.m.
16 JulienAu 74.40000000 12 May 30, 2021, 11:25 a.m.
17 OWLs 73.60000000 7 May 26, 2021, midnight
18 JiaJia 72.00000000 3 May 13, 2021, 2:47 a.m.
19 willxu 72.00000000 10 May 13, 2021, 10:11 p.m.
20 hector 72.00000000 5 May 14, 2021, midnight
21 HI 71.60000000 16 May 18, 2021, 5:43 a.m.
22 WintoMT 71.20000000 23 May 30, 2021, 10:13 p.m.
23 ML_Noob 70.80000000 18 May 29, 2021, 2:01 a.m.
24 LUMEN 70.00000000 13 May 6, 2021, 3:19 p.m.
25 UCM/INNOVA-TSN 70.00000000 20 May 31, 2021, 11:03 p.m.
26 piggim 69.20000000 19 May 30, 2021, 5:18 a.m.
27 Shawn 68.80000000 21 May 31, 2021, 10:54 p.m.
28 deverest 68.40000000 10 May 17, 2021, 6:46 p.m.
29 quincyqiang 67.60000000 1 May 31, 2021, 7:44 p.m.
30 MSD 66.80000000 2 May 1, 2021, 3:12 a.m.
31 takuji 66.80000000 38 May 31, 2021, 11:57 p.m.
32 kddi_research 66.00000000 8 April 27, 2021, 6:20 p.m.
33 NVIDIA Giba 65.60000000 4 April 17, 2021, 10:15 a.m.
34 Vignesh 65.60000000 6 May 24, 2021, 2:02 a.m.
35 Pooja 64.80000000 20 May 26, 2021, 3:15 p.m.
36 WOW 64.40000000 13 May 31, 2021, 11:50 p.m.
37 tmz 64.00000000 7 May 10, 2021, 6:09 a.m.
38 syin1 63.60000000 7 May 14, 2021, 12:06 a.m.
39 hg2 63.60000000 13 May 14, 2021, 5:50 p.m.
40 MeisterMorxrc 63.20000000 18 April 24, 2021, 9:19 p.m.
41 lansy 62.80000000 3 April 18, 2021, 8:33 p.m.
42 NONE 62.80000000 4 May 31, 2021, 3:11 a.m.
43 huangguo 62.40000000 25 April 28, 2021, 6:19 a.m.
44 zzl 62.40000000 13 May 18, 2021, 5:54 a.m.
45 PaulyCat 62.00000000 19 April 26, 2021, 2:38 a.m.
46 lzc775269512 60.80000000 8 May 10, 2021, 11:58 p.m.
47 XYCat 60.80000000 3 May 11, 2021, 8:12 p.m.
48 CASIA 60.40000000 2 April 8, 2021, 7:31 p.m.
49 Limos Team 60.40000000 12 April 15, 2021, 12:54 a.m.
50 NTT DOCOMO LABS 60.00000000 86 May 18, 2021, 4:42 p.m.
51 Songpeix 60.00000000 9 May 28, 2021, 8:25 p.m.
52 Gidora 59.60000000 16 April 24, 2021, 10:20 p.m.
53 166 59.60000000 6 May 31, 2021, 2:32 a.m.
54 Alibey 59.20000000 4 April 21, 2021, 8:21 p.m.
55 AIG_Mastercard 58.80000000 18 April 26, 2021, 9:57 p.m.
56 KP 58.80000000 15 May 27, 2021, 4:56 a.m.
57 kitty 58.40000000 10 May 16, 2021, 9:39 a.m.
58 itouchz.me 58.40000000 15 May 22, 2021, 3:30 p.m.
59 jin 58.00000000 4 April 22, 2021, 7:27 p.m.
60 FirstDan 57.60000000 3 April 11, 2021, 8:58 p.m.
61 walyc 57.60000000 10 April 13, 2021, 6:10 a.m.
62 xus 57.60000000 4 May 23, 2021, 11:40 p.m.
63 Newborn Calves 57.60000000 23 May 25, 2021, 7:30 a.m.
64 BigPicture 57.20000000 7 April 27, 2021, 6:23 p.m.
65 gaodawn 56.80000000 9 May 17, 2021, 8:23 p.m.
66 Unicorns 56.80000000 3 May 25, 2021, 9:47 a.m.
67 varlam 56.40000000 1 April 18, 2021, 9:44 p.m.
68 xuesheng 56.40000000 26 April 19, 2021, 12:05 a.m.
69 haizhan 55.60000000 24 April 21, 2021, 6:05 a.m.
70 AD 55.60000000 21 May 28, 2021, 6:21 a.m.
71 uchicago 55.60000000 3 May 30, 2021, 11:45 p.m.
72 kdd_gcc 55.20000000 3 April 17, 2021, 7:37 a.m.
73 daintlab 54.80000000 22 May 18, 2021, 2:56 a.m.
74 hpad 54.40000000 6 May 10, 2021, 12:52 a.m.
75 kris13 53.20000000 2 April 14, 2021, 4:52 a.m.
76 tang 53.20000000 10 April 30, 2021, 5:18 p.m.
77 SoftLab 53.20000000 2 May 13, 2021, 9:25 a.m.
78 dd 53.20000000 1 May 31, 2021, 6:28 p.m.
79 fizzer 52.80000000 4 April 21, 2021, 11:28 p.m.
80 HanS 52.80000000 4 May 31, 2021, 8:07 a.m.
81 whatsup 52.00000000 1 April 11, 2021, 9:09 a.m.
82 exp234 52.00000000 6 April 12, 2021, 1:45 a.m.
83 darthvarder 52.00000000 2 May 31, 2021, 11:48 p.m.
84 88aaattt 50.80000000 12 April 29, 2021, 10:06 a.m.
85 DayDayUp 50.80000000 5 May 4, 2021, 6:10 a.m.
86 Sai Balaji 50.80000000 1 May 31, 2021, 9:11 a.m.
87 Hello 50.40000000 15 May 10, 2021, 7:49 p.m.
88 iamhlbx 50.40000000 1 May 19, 2021, 6:36 p.m.
89 Snowman 49.60000000 2 April 22, 2021, 4:39 a.m.
90 Jim 49.20000000 3 April 12, 2021, 11:33 p.m.
91 lyxiao 48.80000000 6 May 17, 2021, 4:25 a.m.
92 zeroshot 48.80000000 2 May 19, 2021, 2:11 a.m.
93 linytsysu 48.00000000 6 April 29, 2021, 1:20 a.m.
94 Luuuuu 47.60000000 4 May 31, 2021, 6:22 a.m.
95 zhou 47.20000000 3 May 31, 2021, 6:02 a.m.
96 sion 46.80000000 8 April 24, 2021, 12:57 a.m.
97 AOLeaf 46.00000000 1 April 15, 2021, 9:06 p.m.
98 hren927 46.00000000 4 May 3, 2021, 3:49 a.m.
99 sakami 45.60000000 1 April 29, 2021, 7:44 a.m.
100 Pabba 45.60000000 7 May 25, 2021, 11:16 a.m.
101 HOYO 45.60000000 5 May 29, 2021, 7:34 a.m.
102 demo_user 45.20000000 1 April 14, 2021, 6:27 a.m.
103 xiaoqiangteam 45.20000000 10 May 18, 2021, 12:05 a.m.
104 xuruiyu 44.40000000 5 May 19, 2021, 5:13 a.m.
105 chidata 44.40000000 9 May 26, 2021, 10:44 p.m.
106 AdamK 44.40000000 2 May 26, 2021, 11:15 p.m.
107 wenj 44.00000000 3 April 12, 2021, 11:35 p.m.
108 void 43.20000000 1 April 19, 2021, 7:59 a.m.
109 lyf 43.20000000 8 May 17, 2021, 6:40 p.m.
110 LQKK 42.80000000 1 April 8, 2021, 8:05 a.m.
111 Ida 42.80000000 1 April 11, 2021, 5:44 a.m.
112 Anony 42.80000000 3 April 20, 2021, 5:43 a.m.
113 mouyitian 41.60000000 6 May 24, 2021, 7:54 p.m.
114 Hector 41.20000000 3 April 28, 2021, 12:25 a.m.
115 MouMou1 41.20000000 2 May 9, 2021, 4:53 a.m.
116 Anony 40.00000000 4 April 13, 2021, 1:42 a.m.
117 yuanliu 39.20000000 1 April 10, 2021, 7:57 a.m.
118 Liu 39.20000000 1 April 10, 2021, 8:10 a.m.
119 Splunk Applied Research 38.80000000 5 May 5, 2021, 9:53 a.m.
120 Giba 38.40000000 4 April 15, 2021, 7:25 a.m.
121 Anony 38.00000000 3 April 13, 2021, 2:06 a.m.
122 Anony 37.20000000 1 April 19, 2021, 1:45 a.m.
123 Yb 37.20000000 5 May 18, 2021, 11:45 p.m.
124 UMAC 36.80000000 5 May 5, 2021, 1:06 a.m.
125 Turtelsyu 36.80000000 2 May 9, 2021, 6:44 p.m.
126 Andre 36.00000000 3 May 25, 2021, 3:15 a.m.
127 Emmitt 34.40000000 3 April 24, 2021, 11:56 p.m.
128 kail 34.40000000 3 May 9, 2021, 10:45 p.m.
129 ssgkirito 34.00000000 1 May 31, 2021, 10:22 p.m.
130 yuanCheng 33.60000000 5 April 11, 2021, 10:33 a.m.
131 BMul 33.20000000 5 April 21, 2021, 8:03 a.m.
132 fred 33.20000000 2 May 5, 2021, 1:56 a.m.
133 maradonam 33.20000000 12 May 27, 2021, 3:56 a.m.
134 zsyjy 32.00000000 6 April 11, 2021, 6:36 p.m.
135 hiro 31.20000000 4 May 28, 2021, 8:26 a.m.
136 WJH 30.80000000 4 April 17, 2021, 5:01 a.m.
137 First 28.80000000 1 May 24, 2021, 1:22 a.m.
138 Juanyong 28.40000000 1 May 24, 2021, 1:46 a.m.
139 truck 28.00000000 3 April 30, 2021, 3:03 p.m.
140 HL 26.80000000 7 May 2, 2021, 3:50 p.m.
141 BTDLOZC 25.60000000 1 April 20, 2021, 6:26 p.m.
142 Monkey D. Luffy 25.60000000 8 May 10, 2021, 3:45 p.m.
143 SYSU_HCP 25.60000000 10 May 23, 2021, 11:33 a.m.
144 jarvus 24.40000000 9 April 27, 2021, 1:24 a.m.
145 rushin 24.40000000 2 April 29, 2021, 2:33 a.m.
146 penguin 24.40000000 8 May 2, 2021, 12:07 a.m.
147 KeepItUp 22.80000000 2 April 12, 2021, 10:51 p.m.
148 AnomalyDetection 22.80000000 2 April 17, 2021, 10:41 p.m.
149 ZJU_Control 22.80000000 3 May 6, 2021, 4:34 a.m.
150 BPJ 22.40000000 6 May 30, 2021, 3:11 a.m.
151 bdbdg 22.00000000 1 May 20, 2021, 7:06 p.m.
152 dspreit 21.60000000 1 April 21, 2021, 9:45 p.m.
153 sdl-team 21.20000000 4 April 10, 2021, 6:29 a.m.
154 hot 18.00000000 2 April 17, 2021, 3:58 a.m.
155 Anony 17.60000000 1 April 19, 2021, 11:32 a.m.
156 kmskonilg 17.20000000 3 April 21, 2021, 12:42 a.m.
157 donaldxu 17.20000000 5 April 21, 2021, 8:30 a.m.
158 Rush B 16.40000000 3 April 26, 2021, 3:47 a.m.
159 mike 16.40000000 5 May 10, 2021, 12:34 a.m.
160 qustslxysdxx 14.80000000 3 May 18, 2021, 4:09 a.m.
161 yexxxxxx 12.80000000 1 May 10, 2021, 6:14 p.m.
162 tEST 12.40000000 2 April 14, 2021, 8:52 p.m.
163 Prarthi 12.00000000 1 April 11, 2021, 10:02 a.m.
164 Seemandhar 11.60000000 1 April 11, 2021, 9:46 a.m.
165 HS 11.20000000 1 May 24, 2021, 8:06 a.m.
166 Zoey 10.40000000 2 May 7, 2021, 1:17 a.m.
167 NiGala 10.40000000 1 May 26, 2021, 7:06 a.m.
168 piticli 9.60000000 3 May 4, 2021, 4:16 p.m.
169 axioma 9.60000000 16 May 15, 2021, 3:03 p.m.
170 MirrorCai 9.20000000 1 May 31, 2021, 8:50 a.m.
171 chunli 8.00000000 2 May 3, 2021, 6:24 a.m.
172 ceshi 7.20000000 1 April 24, 2021, 3:58 a.m.
173 zyz 6.80000000 3 April 18, 2021, 6:35 p.m.
174 fa 6.80000000 1 May 21, 2021, 7:02 a.m.
175 zch 6.40000000 7 May 23, 2021, 12:46 a.m.
176 TomatoMan 6.00000000 2 May 28, 2021, 10:15 p.m.
177 baidupedia 4.00000000 1 May 30, 2021, 6:53 a.m.
178 DirkieDye 4.00000000 1 May 30, 2021, 7:24 p.m.
179 Pradeep 4.00000000 1 May 31, 2021, 10:02 p.m.
180 patpat 3.20000000 1 April 15, 2021, 5:54 a.m.
181 nickil21 2.80000000 1 May 5, 2021, 1:23 a.m.
182 ereshkigal 1.60000000 2 May 7, 2021, 2:10 a.m.
183 Host 1.20000000 1 April 7, 2021, 11:10 p.m.
184 Yunfei 1.20000000 1 May 9, 2021, 11:14 p.m.
185 Support 0.40000000 1 April 18, 2021, 4:14 a.m.
186 baozi 0.40000000 1 April 23, 2021, 8:39 p.m.
187 iyoad 0.40000000 1 April 28, 2021, 12:35 a.m.
188 uday 0.00000000 1 April 7, 2021, 10:07 p.m.
189 Competition Host 0.00000000 1 April 7, 2021, 11:39 p.m.
190 finlayliu 0.00000000 1 April 14, 2021, 12:25 a.m.
191 LEARNING 0.00000000 1 May 3, 2021, 10:11 a.m.
192 AXiu 0.00000000 2 May 9, 2021, 4:58 a.m.
193 nvg 0.00000000 1 May 23, 2021, 11:39 p.m.

Getting Started



Overview of the Time Series Anomaly Detection Competition


Detecting anomalies in univariate time series is a challenge that has been around for more than 50 years. Several attempts have been made, but there is still no robust solution. This year, Prof. Eamonn Keogh and Taposh Roy are hosting the multi-dataset time series anomaly detection competition as part of KDD Cup 2021. The goal of this competition is to encourage industry and academia to find a solution for univariate time series anomaly detection. Prof. Keogh has provided 250 datasets collected over 20 years of research to further this area. Please review the brief overview video developed by Dr. Keogh.






Here is a simple example showcasing how to find the anomaly in a single time series file.


# Import the required libraries
import pandas as pd
import matrixprofile as mp

# Read the dataset (one value per line)
df = pd.read_csv('/Users/code/timeseries/ucr_competition_data/005_UCR_Anomaly_4000.txt', names=['values'])

# Set the subsequence (window) size
window_size = 100

# Compute the matrix profile for the chosen window size
profile = mp.compute(df['values'].values, window_size)

# Discover the top discord (the most anomalous subsequence);
# k=1 because every time series has exactly one anomaly
profile = mp.discover.discords(profile, k=1)
print(profile['discords'])
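
For readers who want to see what a discord is without installing the matrixprofile package, here is a brute-force NumPy sketch of the same idea. It is a simplified stand-in, not the library's algorithm: it z-normalizes every subsequence, finds each one's nearest non-overlapping neighbor, and reports the subsequence whose nearest neighbor is farthest away. The function name, the exclusion zone of half a window and the synthetic series are assumptions made for illustration, and the quadratic cost makes it suitable only for short series.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def brute_force_discord(ts, window):
    """Return the start index of the subsequence whose nearest
    non-overlapping neighbor is farthest away (the top discord)."""
    subs = sliding_window_view(np.asarray(ts, dtype=float), window)
    mean = subs.mean(axis=1, keepdims=True)
    std = subs.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero on flat subsequences
    z = (subs - mean) / std

    n = len(z)
    exclusion = window // 2  # ignore trivial matches that overlap heavily
    nearest = np.empty(n)
    for i in range(n):
        dist = np.linalg.norm(z - z[i], axis=1)
        lo, hi = max(0, i - exclusion), min(n, i + exclusion + 1)
        dist[lo:hi] = np.inf
        nearest[i] = dist.min()
    return int(np.argmax(nearest))

# Synthetic periodic series with a short flat segment as the anomaly
t = np.arange(2000)
ts = np.sin(2 * np.pi * t / 50)
ts[1000:1025] = 0.0

discord_start = brute_force_discord(ts, window=50)
print(discord_start)
```

The reported index is the start of a window that overlaps the flat segment, since every normal window has a near-identical copy one period away.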

Teams: 624
Competitors: 741
Submissions: 1,967