Research on Support Vector Machine Classification Method for Imbalanced Datasets

Author: YangZhiMing
Supervisor: PengXiYuan
School: Harbin Institute of Technology
Discipline: Instrument Science and Technology
Keywords: support vector machines; imbalanced dataset; kernel optimization; data preprocessing; intelligent fault diagnosis
CLC: TP181
Type: PhD thesis
Year: 2009

Abstract


Support Vector Machine (SVM) is a machine learning method based on statistical learning theory. Compared with traditional methods such as neural networks, SVM copes well with practical difficulties such as high dimensionality, nonlinearity and local minima, and it has therefore become a major research topic in machine learning. SVM rests on a strong theoretical foundation and can achieve excellent generalization even when the number of training samples is small. It is therefore well suited to fault diagnosis, which is a typical small-sample learning problem, and research on SVM-based fault diagnosis methods has both theoretical significance and practical engineering value.

In general, SVM achieves good results when the diagnosis dataset is balanced. In practical applications, however, fault samples are hard to acquire, so the diagnosis dataset is often highly imbalanced. In that situation the classification accuracy of SVM on fault samples is much worse than on normal samples, which limits the practical application of SVM to circuit fault diagnosis. This dissertation addresses the problem that SVM does not achieve satisfactory classification results on imbalanced datasets. The research covers two main aspects: data preprocessing methods for imbalanced datasets, and modifications of SVM itself for imbalanced datasets. These methods are then applied to analog circuit fault diagnosis to counter the deterioration in SVM classification accuracy caused by imbalanced diagnosis datasets in practice.

The main innovative contributions of this dissertation are as follows.

1. The Synthetic Minority Over-sampling Technique (SMOTE) is an effective over-sampling technique, but when generating synthetic samples it considers neither the true distribution of the minority samples nor the distribution of majority samples in the neighborhood of each minority sample, so its sample generation is somewhat blind. A new over-sampling technique, ASMOTE, is therefore proposed. Based on the distribution of the dataset, ASMOTE adjusts the neighbor selection strategy of SMOTE in order to control the quality of the new samples. Simulation results show that preprocessing the dataset with ASMOTE substantially improves the classification accuracy of the SVM classifier.

2. When handling boundary data, traditional sample-cutting techniques such as one-sided selection simply remove the boundary samples from the dataset, which loses classification information. To address this, the dissertation proposes a Fuzzy Sampling Cutting Technique based on the K-nearest-neighbor method. To address the loss of classification information in traditional random undersampling, it proposes a Guided Undersampling Technique based on unsupervised learning. Experimental results show that preprocessing datasets with these two methods substantially improves the classification accuracy of SVM on imbalanced datasets.

3. SVM can be ineffective at classifying minority samples when it learns from imbalanced datasets. To design a suitable SVM modification for this problem, the dissertation first analyzes the underlying cause. On this basis, an SVM modification method, μSVM, is proposed. In the new method, the decision region of the minority class is enlarged by adjusting the distance measurement rule in the classification decision function. Empirical studies show that μSVM effectively improves classification accuracy.

4. The theoretical foundation of SVM is a nonlinear mapping from the input space to a high-dimensional feature space in which the dataset becomes linearly separable, and the explicit form of this mapping is very hard, and sometimes impossible, to obtain. It is therefore difficult to modify SVM effectively in the feature space so that it suits imbalanced classification tasks. To address this, the thesis proposes a new SVM modification method, BEF-SVM. During kernel optimization, BEF-SVM uses the Biased Discriminant Analysis criterion to measure class separability for imbalanced datasets, so that class separability is enlarged, which in turn improves prediction accuracy on minority samples.

5. For practical fault diagnosis applications, the dissertation selects two typical circuits as diagnosis targets and simulates their output waveforms in the PSPICE environment. A three-stage data preprocessing method, consisting of the Haar wavelet transform, principal component analysis (PCA) and data normalization, is then applied to extract features from the circuits, and these features are used to build an SVM-based fault diagnosis system. To address the imbalanced classification problem that arises in practical circuit fault diagnosis, different parameter settings and sampling rates are used in the simulation to generate normal and fault samples, and the imbalanced-dataset classification methods proposed in the dissertation are applied to this imbalance. Finally, an SVM classification method suited to practical analog circuit fault diagnosis problems is developed.
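
Contribution 1 modifies the neighbor-selection step of SMOTE. For reference, the following is a minimal sketch of standard SMOTE only (a common randomized variant), not of ASMOTE, whose selection rule the abstract does not give. The function name `smote` and its parameters `n_synthetic`, `k` and `seed` are illustrative assumptions. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors; this neighbor choice ignores nearby majority samples, which is the "blindness" the abstract describes.

```python
import numpy as np

def smote(minority, n_synthetic, k=5, seed=None):
    """Standard SMOTE sketch (not ASMOTE): interpolate between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances between minority samples only;
    # majority samples are never consulted, unlike what ASMOTE proposes.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest minority neighbours
    synthetic = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # uniform in [0, 1)
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic
```

For example, `smote([[0, 0], [1, 0], [0, 1], [1, 1]], n_synthetic=10, k=2, seed=0)` returns a 10 x 2 array of new points inside the square spanned by the four minority samples. An ASMOTE-style method would change how `neighbours` is chosen, taking the local majority distribution into account.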
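
Contribution 4 notes that the explicit feature-space mapping is usually unavailable, yet BEF-SVM measures class separability in that space during kernel optimization. The abstract does not say how the Biased Discriminant Analysis criterion is evaluated, so the following is only an illustrative sketch under stated assumptions, not BEF-SVM itself. It assumes an RBF kernel, treats the minority class as the "positive" class that BDA keeps compact, and uses a trace-style ratio of mean squared distances to the minority centroid. The names `rbf_kernel`, `bda_separability` and `select_gamma` are hypothetical. The kernel trick makes the computation possible without ever forming the mapping: the squared distance from a mapped point to a class centroid can be written entirely in terms of kernel values.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def bda_separability(X_min, X_maj, gamma):
    """Hypothetical BDA-style score in the RBF feature space.

    Mean squared feature-space distance of majority samples to the minority
    centroid, divided by the same quantity for the minority samples. Larger
    values mean a compact minority class that sits far from the majority.
    Kernel trick: ||phi(x) - m||^2 = k(x, x) - 2 * mean_j k(x, x_j)
    + mean_ij k(x_i, x_j), where m is the minority centroid and
    k(x, x) = 1 for the RBF kernel.
    """
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    K_mm = rbf_kernel(X_min, X_min, gamma)
    centroid_sq_norm = K_mm.mean()  # ||m||^2
    d_min = 1.0 - 2.0 * K_mm.mean(axis=1) + centroid_sq_norm
    d_maj = 1.0 - 2.0 * rbf_kernel(X_maj, X_min, gamma).mean(axis=1) + centroid_sq_norm
    spread = d_min.mean()
    if spread <= 0.0:
        raise ValueError("need at least two distinct minority samples")
    return d_maj.mean() / spread

def select_gamma(X_min, X_maj, gammas):
    """Pick the kernel width from a candidate grid that maximises the score."""
    return max(gammas, key=lambda g: bda_separability(X_min, X_maj, g))
```

The selected width could then be passed to an ordinary SVM trainer, for instance scikit-learn's `SVC(kernel="rbf", gamma=...)`. One caveat of this simple ratio: for very large `gamma` every point becomes nearly orthogonal to every other, so the candidate grid should be kept to a sensible range.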
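
Contribution 5 describes a three-stage feature extraction chain: Haar wavelet transform, PCA, then normalization. The following is a minimal sketch of such a chain, assuming a multi-level orthonormal Haar transform, PCA via singular value decomposition, and min-max scaling to [0, 1]; the decomposition depth (`levels`), the number of retained components (`n_components`) and the function names are assumptions, since the abstract does not specify them.

```python
import numpy as np

def haar_dwt(signal, levels=3):
    """Multi-level orthonormal Haar DWT.

    Returns the final approximation followed by the detail coefficients,
    coarsest first. Odd-length stages are padded by repeating the last sample.
    """
    a = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if len(a) % 2:
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail (high-pass) part
        a = (even + odd) / np.sqrt(2)              # approximation (low-pass) part
    return np.concatenate([a] + details[::-1])

def extract_features(waveforms, n_components=2, levels=3):
    """Haar wavelet -> PCA -> min-max normalisation (illustrative sketch)."""
    # Stage 1: Haar wavelet coefficients for each simulated output waveform.
    W = np.array([haar_dwt(w, levels) for w in waveforms])
    # Stage 2: PCA via SVD of the mean-centred coefficient matrix.
    W_c = W - W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W_c, full_matrices=False)
    Z = W_c @ Vt[:n_components].T
    # Stage 3: scale each principal component to [0, 1].
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (Z - lo) / span
```

In a real diagnosis system the PCA basis and the min-max bounds would be fitted on the training waveforms only and then reused unchanged on test waveforms; the sketch fits them on whatever it is given, for brevity.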

© 2012 www.DissertationTopic.Net