Anomaly Detection Research Based on Similarity Analysis of Time Series

Author: ChenRan
Tutor: DaiQi
School: Southwest Jiaotong University
Course: Applied Computer Technology
Keywords: time series; pattern representation; similarity measure; anomaly detection
CLC: TP311.13
Type: Master's thesis
Year: 2011


With the rapid development of the economy and technology, people pay increasing attention to data of many kinds and rely on it more and more. How to manage and use massive amounts of data effectively, and how to discover the laws behind the data, has become a major concern of data mining researchers. As an important research topic in data mining, time series mining and forecasting has developed rapidly in recent years. Time series data mining can extract hidden and potentially useful knowledge from large amounts of data, knowledge that users might otherwise overlook.

Anomaly detection on time series is the main subject of this thesis. We study the pattern representation of time series, time series similarity measurement, time series anomaly detection, and related issues. The main research work and results are summarized as follows:

1. Segmentation algorithms based on the important points of a series preserve the global characteristics of the series well and fit it with high accuracy. However, traditional segmentation algorithms choose segment points only according to an error threshold and cannot produce a fixed number of segments, so they do not suit applications that require a fixed segment count. This thesis proposes an algorithm based on detecting a fixed number of PIPs (PLR_FPIP). It borrows the idea of level-order traversal of a binary tree to reorder the steps of the original method, and it represents the time series by the straight-line segments that connect the PIPs. Experimental results show that, for a fixed number of PIPs, the algorithm reflects the main characteristics of the time series, and that it is simple and fast with a low total error.

2. This thesis proposes a new time series similarity measure called DTPD (Dynamic Translation Pattern Distance), which consists of SPD (Single Pattern Distance) and FPD (Full Pattern Distance). SPD compares the similarity of two single patterns. FPD compares the similarity of two groups of patterns, that is, the overall similarity of two time series. FPD uses an idea similar to the dynamic time warping (DTW) distance: it integrates the SPDs into a single overall value, which serves as the similarity measure between candidate sequences. Experimental results show that the method is accurate and efficient when clustering the laboratory data sets.

3. After studying the LOF approach, we propose an improved approach called PLOF (Local Outlier Factor Based on Pattern). The method uses SPD to measure the similarity of pattern sequences, which greatly reduces the computation time of the original algorithm and filters out noise. It can therefore find 'abnormal' patterns from a global perspective. Experiments show that the method is accurate.
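
As a rough illustration of the fixed-count PIP idea in item 1, the following is a minimal sketch of the commonly described greedy selection of important points, not the thesis's PLR_FPIP algorithm itself. It assumes that the importance of a point is its vertical distance to the straight line joining its two neighbouring PIPs; the function names `vertical_distance` and `select_pips` are hypothetical, and the binary-tree reordering mentioned in the abstract is not reproduced because the abstract does not specify it.

```python
from typing import List, Sequence


def vertical_distance(series: Sequence[float], left: int, right: int, i: int) -> float:
    """Vertical distance from point i to the chord joining points left and right."""
    y_left, y_right = series[left], series[right]
    # Value of the chord at position i (linear interpolation between the two PIPs).
    chord = y_left + (y_right - y_left) * (i - left) / (right - left)
    return abs(series[i] - chord)


def select_pips(series: Sequence[float], n_pips: int) -> List[int]:
    """Return the sorted indices of n_pips important points (assumed greedy scheme)."""
    if n_pips < 2 or n_pips > len(series):
        raise ValueError("n_pips must be between 2 and len(series)")
    # The first and last points are always kept.
    pips = [0, len(series) - 1]
    while len(pips) < n_pips:
        best_idx, best_dist = -1, -1.0
        # Scan every gap between adjacent PIPs for the most distant point.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_dist:
                    best_idx, best_dist = i, d
        pips.append(best_idx)
        pips.sort()
    return pips
```

For the series `[0, 1, 5, 1, 0, -4, 0]` and four PIPs, this sketch selects indices 0, 2, 5 and 6. Joining consecutive selected points with straight lines then gives a piecewise linear representation with exactly three segments, which is the property that a threshold-driven segmentation cannot guarantee.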

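Item 2 says that FPD follows an idea similar to DTW. The abstract does not define SPD or FPD, so the sketch below shows only the standard DTW recurrence that FPD is said to resemble. The function name `dtw_distance` is hypothetical, and the absolute difference used as the local cost is an assumption standing in for whatever per-pattern distance the thesis uses.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: an assumed placeholder for a per-pattern distance.
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one element may be matched to several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to 0.0 even though the sequences differ in length. A pattern-level measure built on the same recurrence would replace the numeric elements with patterns and the local cost with a pattern distance.
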
