
Research on Outlier Detection Method and Its Key Techniques

Author: ChenBin
Tutor: ChenSongCan
School: Nanjing University of Aeronautics and Astronautics
Course: Applied Computer Technology
Keywords: outlier detection; support vector data description; robustness; weighted averaging; possibilistic C-means; multiview learning; AUC metric; manifold embedding
CLC: TP274
Type: PhD thesis
Year: 2013

Abstract


Outlier detection aims to detect and discover abnormal data patterns that do not conform to normal (expected) behavior in observed data. Depending on the application, these abnormal patterns are called outliers, inconsistent points, novelties or contaminants. In recent years, outlier detection has been widely applied in fault diagnosis, disease detection, intrusion detection, credit card (or insurance) fraud detection and person identification. In these areas, an abnormal pattern often implies significant (usually highly harmful or even fatal) behavior. For instance, abnormal traffic on the Internet may indicate the leakage of sensitive information from an attacked host, and credit card fraud can lead to great economic loss. Because of its great practical significance and value, outlier detection has become a very active research area that attracts close attention from many researchers.

Unlike other learning tasks, an outlier detection task usually has data patterns that conform to expected behavior (the target class) but few, or even no, data patterns that do not (the outlier class). The resulting extreme imbalance (outlier samples are far fewer than target samples) makes outlier detection very difficult. Recent research has therefore mainly focused on the unsupervised learning framework and on supervised learning methods that use very few labeled outlier samples. Based on an in-depth study of the principles of various outlier detection methods, robustness to outliers and the embedding of prior knowledge, the contributions of this thesis are as follows:

1. First, One-cluster Clustering based Data Description (OCCDD) is proposed. It employs the PCM (Possibilistic C-Means) algorithm with one cluster, that is, P1M (PCM, C=1), to compute sample weights, and then obtains an enclosing ball by weighted averaging. As a result, OCCDD avoids the sensitivity to outliers and the high training complexity that Support Vector Data Description (SVDD) incurs because of its minimax optimization. Second, it is proved theoretically that P1M attains a global optimum, a property that the original PCM (C>1) lacks. Finally, a multiview OCCDD is proposed to exploit the intrinsic multiview nature of text classification. Unlike conventional classifiers, which learn from a single view, multiview OCCDD learns from all views simultaneously, and performance improves because the views reinforce one another.

2. An SVDD regularized with the Area Under the ROC Curve (AUC) is proposed for the situation in which outliers lie around the target samples. The regularized SVDD incorporates the AUC measure into the optimization objective of SVDD and simultaneously optimizes the volume of the minimum enclosing ball and the AUC performance, so as to deal with the extreme imbalance in class distribution. Two speed-up techniques are then proposed to address the high training complexity that AUC regularization introduces.

3. A design framework for manifold-based classifiers, mXXX≈ISOMAP+XXX (where XXX denotes an existing learning algorithm based on Euclidean distance), is proposed. It replaces the Euclidean distance in the feature space obtained after ISOMAP dimensionality reduction with the geodesic distance in the input space, and thus implicitly performs ISOMAP without actually running the ISOMAP procedure. When the observed data lie on an underlying manifold, SVDD performance degrades because Euclidean distance cannot capture the true geometric structure, so the framework is extended to SVDD to derive an SVDD with Manifold Embedding (mSVDD). With manifold embedding, mSVDD has the following advantages: (1) by approximating the Euclidean distances in the feature space induced by ISOMAP, it solves the problem that an SVDD based on geodesic distance cannot be optimized directly; (2) it avoids the actual Multidimensional Scaling (MDS) step of ISOMAP and the selection of the dimension of the Euclidean space after ISOMAP; (3) unlike the conventional SVDD based on Euclidean distance, mSVDD is based on geodesic distance and implicitly performs ISOMAP, so it can find a manifold embedding.

4. The relationship between density estimation and domain-based outlier detectors is revealed, in particular the essential relation between kernel density estimation and two domain-based outlier detectors, One-Class Support Vector Machine (OCSVM) and SVDD, when they are induced by a Gaussian kernel. That is, domain-based outlier detectors fall within the framework of density estimation. Moreover, the density estimator induced by OCSVM and SVDD is consistent with the true density, and optimizing OCSVM and SVDD also reduces the Integrated Squared Error (ISE).
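
The weighting idea behind OCCDD in contribution 1 can be illustrated with a minimal sketch. It assumes the standard possibilistic C-means typicality formula with a single cluster, a fixed scale parameter `eta`, a fuzzifier `m = 2`, and a quantile-based radius. These choices, and the names `p1m_weights` and `fit_ball`, are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def p1m_weights(X, m=2.0, n_iter=200, tol=1e-9):
    """One-cluster possibilistic C-means: returns (centre, typicality weights)."""
    X = np.asarray(X, dtype=float)
    centre = X.mean(axis=0)                      # start from the plain mean
    # Assumption: eta is fixed to the mean squared distance from the initial centre.
    eta = np.mean(np.sum((X - centre) ** 2, axis=1))
    eta = eta if eta > 0 else 1.0                # guard against identical points

    def typicality(c):
        # Standard PCM typicality: close to 1 near the centre, close to 0 far away.
        d2 = np.sum((X - c) ** 2, axis=1)
        return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

    for _ in range(n_iter):
        u = typicality(centre)
        new_centre = (u ** m) @ X / np.sum(u ** m)   # weighted average of the samples
        shift = np.linalg.norm(new_centre - centre)
        centre = new_centre
        if shift < tol:
            break
    return centre, typicality(centre)

def fit_ball(X, quantile=0.9):
    """Hypothetical data description: P1M centre plus a quantile-based radius."""
    centre, weights = p1m_weights(X)
    dist = np.linalg.norm(np.asarray(X, dtype=float) - centre, axis=1)
    radius = np.quantile(dist, quantile)         # assumption: radius from a distance quantile
    return centre, radius, weights

# Five target samples near the unit square and one distant outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [20.0, 20.0]])
centre, radius, weights = fit_ball(X)
is_outlier = np.linalg.norm(X - centre, axis=1) > radius
```

In this toy data the distant point receives a much smaller weight than the others, so the weighted centre stays near the bulk of the samples while the plain mean is pulled toward the outlier.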
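
The geodesic distance that contribution 3 substitutes for Euclidean distance is commonly approximated, as in ISOMAP, by shortest paths over a nearest-neighbour graph. The sketch below shows only that distance computation, using plain NumPy and Floyd-Warshall. The function name `geodesic_distances` and the parameter `n_neighbors` are illustrative assumptions, and the sketch does not reproduce the mSVDD optimization itself.

```python
import numpy as np

def geodesic_distances(X, n_neighbors=5):
    """Approximate geodesic distances by shortest paths on a k-nearest-neighbour graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances in the input space.
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Neighbourhood graph: keep an edge from each point to its nearest neighbours.
    # Assumes distinct points, so that each point is its own closest match.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(euclid, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), nearest.shape[1])
    cols = nearest.ravel()
    graph[rows, cols] = euclid[rows, cols]
    graph[cols, rows] = euclid[rows, cols]       # make the graph undirected
    # Floyd-Warshall: allow each point in turn as an intermediate stop.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    return graph

# Points on a half circle of radius 1: a one-dimensional manifold in the plane.
theta = np.linspace(0.0, np.pi, 30)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
geo = geodesic_distances(arc, n_neighbors=2)
straight = np.linalg.norm(arc[0] - arc[-1])      # Euclidean distance between the endpoints
along_arc = geo[0, -1]                           # graph approximation of the arc length
```

For the two endpoints of the half circle, the straight-line distance cuts across the arc, whereas the graph distance follows the curve and comes close to the true arc length π. A distance-based learner fed `geo` instead of `euclid` therefore sees the manifold structure, which is the intuition behind the mXXX framework described above.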
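
The link between SVDD and kernel density estimation in contribution 4 can be seen from the standard SVDD decision rule. The following is a sketch under the usual dual formulation, with multipliers $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, centre $a = \sum_i \alpha_i \phi(x_i)$, radius $R$ and Gaussian kernel width $\sigma$. The notation is chosen here for illustration and is not taken from the thesis.

```latex
\begin{align}
\|\phi(z) - a\|^2 &= k(z,z) - 2\sum_i \alpha_i\, k(x_i, z) + \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j), \\
k(x, z) &= \exp\!\left(-\frac{\|x - z\|^2}{2\sigma^2}\right) \;\Rightarrow\; k(z,z) = 1, \\
\|\phi(z) - a\|^2 \le R^2 &\iff \sum_i \alpha_i\, k(x_i, z) \;\ge\; \tfrac{1}{2}\Bigl(1 + \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j) - R^2\Bigr).
\end{align}
```

The left-hand side of the last line is a weighted sum of Gaussian kernels with non-negative weights that sum to one, which is a weighted kernel density estimate up to the Gaussian normalizing constant. Accepting a point as a target sample therefore amounts to thresholding that estimate. The consistency and ISE results stated in the abstract are the thesis's own claims and are not derived here.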


CLC: Industrial Technology > Automation technology, computer technology > Automation technology and equipment > Automation systems > Data processing, data processing system