
Data mining technology and classification algorithm

Author: LiuGang
Advisor: GuoJinGeng
School: PLA Information Engineering University
Major: Computer Software and Theory
Keywords: Data mining; KDD (Knowledge Discovery in Databases); SJEP-based classifiers; knowledge patterns
CLC: TP311.13
Type: PhD thesis
Year: 2004
Downloads: 3357
Citations: 16

Abstract


Data mining is a technique for analyzing and understanding large volumes of source data and revealing the knowledge hidden in them. It is regarded as an important step in the evolution of information processing. It has attracted growing attention from researchers and businesses because huge amounts of data are now widely available and there is an urgent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been proposed, and some of them have been studied in greater depth in the last few years. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization.

Essentially, data mining is a high-level analysis technology with a strong orientation toward business benefit. Unlike OLTP applications, it should provide in-depth data analysis and support for business decisions. Like other new technologies, however, data mining must develop gradually, from the creation of the concept, through recognition of its importance, wide discussion, and a few trial uses, to large-scale application. Most experts consider it to be in the wide-discussion phase today. It still needs theoretical study and algorithmic exploration: although some results have been achieved, many theoretical problems remain under investigation. In addition, data mining arises from real applications and must be combined with the business logic of a specific application to solve a specific problem, because different business fields have different mining needs and targets. Successful data mining systems are good combinations of data mining techniques and business logic, rather than tools designed merely to make the development of data mining applications convenient.

A data-rich but information-poor situation led to the emergence of data mining, and within a few years people in many different fields became interested in it. Classification, an important area of data mining, was studied earlier in statistics, machine learning, neural networks, and expert systems. Most of those algorithms, however, are memory resident and typically assume a small data size. As data volume and dimensionality grow, building an efficient classifier for large databases becomes a challenge.

Jumping emerging patterns (JEPs), a new kind of knowledge pattern, were recently proposed to capture crucial differences between a pair of datasets, and several JEP-based classifiers have been built. Previous studies show that these classifiers have good overall predictive accuracy and scale with data volume and dimensionality. They suffer, however, from the large number of mined JEPs, which makes the classifiers complex. This thesis proposes a special type of JEP, the most significant jumping emerging patterns (SJEPs), which are believed to have strong discriminating power and to be sufficient for building accurate classifiers. Because existing algorithms cannot mine such SJEPs directly, the thesis presents a novel algorithm that efficiently mines the SJEPs of both data classes, and then describes how to build a new classifier (SJEP_Classifier) based on them. Compared with previous JEP-based classifiers, the classifier based exclusively on SJEPs uses far fewer JEPs, yet it achieves almost the same or higher predictive accuracy and finishes the learning phase in a very short time (usually a few seconds). The experimental results also show that it generally outperforms both CBA and C4.5 in average accuracy.

In conclusion, this thesis analyzes the application architecture of data mining systems, creates new theoretical models for mining, and designs a new classifier (SJEP_Classifier) based on SJEPs.
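
As an illustration of the JEP idea mentioned in the abstract, the following Python sketch treats a JEP as an itemset that occurs in one class of transactions and never in the other. It is not the thesis's algorithm: the mining is brute force, the toy data and all function names (`support`, `mine_jeps`, `minimal_patterns`, `classify`) are hypothetical, and the abstract does not define SJEPs, so keeping only minimal JEPs is used here as an assumed stand-in for "most significant". Scoring a test instance by summing the supports of the matching patterns is likewise an assumption.

```python
from itertools import combinations


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def mine_jeps(target, background, max_len=3):
    """Brute-force search (illustration only): itemsets with nonzero support
    in `target` and zero support in `background`."""
    items = sorted(set().union(*target))
    jeps = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, target) > 0 and support(s, background) == 0:
                jeps.append(s)
    return jeps


def minimal_patterns(patterns):
    """Assumed stand-in for 'most significant': drop any pattern that has
    a proper subset which is also in `patterns`."""
    return [p for p in patterns if not any(q < p for q in patterns)]


def classify(instance, class_data):
    """Assumed scoring rule: for each class, sum the supports of its minimal
    JEPs that are contained in `instance`; predict the highest-scoring class."""
    scores = {}
    for label, data in class_data.items():
        others = [t for other, d in class_data.items() if other != label for t in d]
        patterns = minimal_patterns(mine_jeps(data, others))
        scores[label] = sum(support(p, data) for p in patterns if p <= instance)
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two classes of transactions over items a..d.
class_data = {
    "A": [frozenset("ab"), frozenset("ac"), frozenset("abc")],
    "B": [frozenset("bc"), frozenset("cd"), frozenset("bd")],
}

jeps_a = mine_jeps(class_data["A"], class_data["B"])
print(len(jeps_a), minimal_patterns(jeps_a))
print(classify(frozenset("ab"), class_data))
```

On this toy data, filtering to minimal patterns reduces the four JEPs of class A to a single one, which mirrors the abstract's point that a much smaller pattern set can still separate the classes. A practical miner would avoid enumerating all itemsets, since that grows exponentially with the number of items.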

