---
output: github_document
editor_options:
  chunk_output_type: console
---
[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/AnomalyDetection.svg?branch=master)](https://travis-ci.org/hrbrmstr/AnomalyDetection)
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "README-",
  fig.width = 8,
  fig.height = 6,
  fig.retina = 2
)
```
# AnomalyDetection
Anomaly Detection Using Seasonal Hybrid Extreme Studentized Deviate Test
## Description
A technique for detecting anomalies in seasonal univariate time series.
The methods used are statistically robust in the presence of
seasonality and an underlying trend. They can be applied in a
wide variety of contexts: for example, detecting anomalies in system metrics after
a new software release or in user engagement after an A/B test, or for problems in
econometrics, financial engineering, and the political and social sciences.
## About This Fork
Twitterfolks launched this package in 2014. Many coding and package standards
have changed. The package now conforms to CRAN standards.

The plots were nice and all but terribly unnecessary. The two core functions
have been modified to only return tidy data frames (tibbles, actually). This
makes it easier to chain them without having to deal with list element
dereferencing.
Shorter, snake-case aliases have also been provided:

- `ad_ts` for `AnomalyDetectionTs`
- `ad_vec` for `AnomalyDetectionVec`

The original names are still in the package but the `README` and examples
all use the newer, shorter versions.
The following outstanding PRs from the original repo are included:

- Added in PR [#98](https://github.com/twitter/AnomalyDetection/pull/98/) (@gggodhwani)
- Added in PR [#93](https://github.com/twitter/AnomalyDetection/pull/93) (@nujnimka)
- Added in PR [#69](https://github.com/twitter/AnomalyDetection/pull/69) (@randakar)
- Added in PR [#44](https://github.com/twitter/AnomalyDetection/pull/44) (@nicolasmiller)
- PR [#92](https://github.com/twitter/AnomalyDetection/pull/92) (@caijun) inherently resolved

If those authors find this repo, please add yourselves to the `DESCRIPTION` as
contributors.
## What's Inside The Tin
The following functions are implemented:

- `ad_ts`: Anomaly Detection Using Seasonal Hybrid ESD Test
- `ad_vec`: Anomaly Detection Using Seasonal Hybrid ESD Test
## How the package works
The underlying algorithm, referred to as Seasonal Hybrid ESD (S-H-ESD), builds
upon the Generalized ESD test for detecting anomalies. S-H-ESD can
be used to detect both global and local anomalies. This is achieved by
employing time series decomposition and using robust statistical metrics, viz.,
the median together with ESD. In addition, for long time series (say, 6 months of
minutely data), the algorithm employs piecewise approximation for anomaly
detection. This is rooted in the fact that trend extraction in the presence of
anomalies is non-trivial.
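
The steps described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's actual implementation: `sh_esd_sketch()` and its arguments are hypothetical names, the test is two-sided only, the median stands in for the trend, and the median absolute deviation (MAD) replaces the standard deviation in the Generalized ESD statistic.

```r
# Illustrative sketch only: a simplified S-H-ESD on a plain numeric vector.
# `sh_esd_sketch` and its arguments are hypothetical names, not package API.
sh_esd_sketch <- function(x, period, max_anoms = 0.02, alpha = 0.05) {
  n <- length(x)

  # 1. Estimate the seasonal component with a robust STL decomposition.
  fit <- stats::stl(stats::ts(x, frequency = period),
                    s.window = "periodic", robust = TRUE)
  seasonal <- as.numeric(fit$time.series[, "seasonal"])

  # 2. Remove seasonality and the median (a robust stand-in for the trend).
  resid <- x - seasonal - stats::median(x)

  # 3. Generalized ESD on the residuals, using median and MAD
  #    in place of mean and standard deviation.
  max_out <- max(1L, floor(n * max_anoms))
  keep <- seq_len(n)
  removed <- integer(0)
  n_anoms <- 0L
  for (i in seq_len(max_out)) {
    center <- stats::median(resid[keep])
    spread <- stats::mad(resid[keep])
    if (spread == 0) break
    dev <- abs(resid[keep] - center) / spread
    j <- which.max(dev)

    # Critical value for the i-th test, with k = n - i + 1 points remaining.
    k <- length(keep)
    t_crit <- stats::qt(1 - alpha / (2 * k), df = k - 2)
    lambda <- (k - 1) * t_crit / sqrt((k - 2 + t_crit^2) * k)

    # The number of anomalies is the largest i whose statistic exceeds lambda.
    if (dev[j] > lambda) n_anoms <- i
    removed <- c(removed, keep[j])
    keep <- keep[-j]
  }
  sort(removed[seq_len(n_anoms)])
}

# Synthetic hourly series (14 days, daily seasonality) with two injected spikes.
set.seed(42)
hours <- seq_len(24 * 14)
x <- 10 * sin(2 * pi * hours / 24) + rnorm(length(hours))
x[c(100, 250)] <- x[c(100, 250)] + 50

found <- sh_esd_sketch(x, period = 24)
print(found)
```

The function returns the positions of the flagged observations. Because the median and MAD are barely affected by a few extreme values, the injected spikes do not inflate the scale against which they are measured, which is what lets the test pick up anomalies that sit inside the seasonal range.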
Besides time series, the package can also be used to detect anomalies in a
vector of numerical values. We have found this very useful, as the
corresponding timestamps are often not available. The user can specify the
direction of anomalies and the window of interest (such as last day or last
hour), and can enable or disable piecewise approximation. The original package
also provided built-in plots; this fork returns tidy data frames instead, which
can be plotted with ggplot2 as in the examples below.
## Installation
You can install AnomalyDetection from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("hrbrmstr/AnomalyDetection")
```
```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```
## How to get started
```{r message=FALSE, warning=FALSE, error=FALSE}
library(AnomalyDetection)
library(hrbrthemes)
library(tidyverse)
```
```{r message=FALSE, warning=FALSE, error=FALSE}
data(raw_data)

res <- ad_ts(raw_data, max_anoms=0.02, direction='both')

glimpse(res)

# for ggplot2
raw_data$timestamp <- as.POSIXct(raw_data$timestamp)

ggplot() +
  geom_line(
    data=raw_data, aes(timestamp, count),
    size=0.125, color="lightslategray"
  ) +
  geom_point(
    data=res, aes(timestamp, anoms), color="#cb181d", alpha=1/3
  ) +
  scale_x_datetime(date_labels="%b\n%Y") +
  scale_y_comma() +
  theme_ipsum_rc(grid="XY")
```
From the plot, we observe that the input time series experiences both positive
and negative anomalies. Furthermore, many of the anomalies in the time series
are local anomalies within the bounds of the time series’ seasonality, and hence
cannot be detected using traditional approaches. The anomalies detected
using the proposed technique are marked on the plot. If the timestamps
for the plot above were not available, anomaly detection could instead be carried
out using the `ad_vec()` function (the alias for `AnomalyDetectionVec()`).
The equivalent call to the above would be:
```{r eval=FALSE}
ad_vec(raw_data[,2], max_anoms=0.02, period=1440, direction='both')
```
Often, anomaly detection is carried out on a periodic basis. For instance,
one may be interested in determining whether there was any anomaly
yesterday. To this end, the `only_last` argument restricts the results to
anomalies that occurred during the last day or last hour.
```{r message=FALSE, warning=FALSE, error=FALSE}
data(raw_data)

res <- ad_ts(raw_data, max_anoms=0.02, direction='both', only_last="day")

glimpse(res)

# for ggplot2
raw_data$timestamp <- as.POSIXct(raw_data$timestamp)

ggplot() +
  geom_line(
    data=raw_data, aes(timestamp, count),
    size=0.125, color="lightslategray"
  ) +
  geom_point(
    data=res, aes(timestamp, anoms), color="#cb181d", alpha=1/3
  ) +
  scale_x_datetime(date_labels="%b\n%Y") +
  scale_y_comma() +
  theme_ipsum_rc(grid="XY")
```
Anomaly detection for long-duration time series can be carried out by setting
the `longterm` argument to `TRUE`.
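
The idea behind piecewise approximation for long series can be illustrated with a short base R sketch. It is an assumption-laden toy, not the package's internal code: `piecewise_median()` is a hypothetical helper, and the fixed block size is chosen only for the example. Each block of observations gets its own median as a local baseline, so a level shift over time does not distort the baseline for the whole series.

```r
# Illustrative only: `piecewise_median` is a hypothetical helper, not package API.
piecewise_median <- function(x, window) {
  # Assign each observation to a consecutive block of `window` points.
  block <- ceiling(seq_along(x) / window)
  # Replace every value with the median of its block.
  stats::ave(x, block, FUN = stats::median)
}

x <- c(rep(1, 5), rep(10, 5))  # a level shift halfway through
x[3] <- 100                    # a single spike in the first block
baseline <- piecewise_median(x, window = 5)
print(baseline)
```

Each block's median tracks the local level and is unaffected by the single spike, so the spike stands out clearly once the baseline is subtracted.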
## Copyright & License
Copyright © 2015 Twitter, Inc. and other contributors
Licensed under the GPLv3
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.