
Research and Implementation of Retrieval System on Massive Mail

Author: ShiXing
Tutor: LiuBingQuan
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: mass mail; distributed processing; information retrieval; index
CLC: TP393.098
Type: Master's thesis
Year: 2008


With the development of computers and networks, e-mail, as an important Internet application, has been widely adopted for its convenience and speed. Individuals, enterprises, governments and even the military communicate by e-mail in daily life and work. However, illegal businesses and criminals use e-mail to push advertisements, viruses, unhealthy content and information that undermines national stability, creating potential security hazards for individuals, enterprises and the nation. Mail filtering is a mature technology for filtering spam, but it cannot prevent the propagation of such harmful information. How to retrieve sensitive information from massive document collections and trace suspicious information and users has therefore become a research direction, and there is an urgent need to manage and monitor massive mail securely.

This paper analyzes the characteristics and special format of mail and presents a retrieval system for massive mail. The system lets the user search the fields of interest, such as the mail body text, the sender (From) and the recipient (To), so it addresses the monitoring of mail messages effectively. To improve the efficiency of processing massive mail, the paper mainly studies distributed mail parsing, indexing and searching. First, after introducing the theory of mail documents, it analyzes the mail format and proposes a VSM (Vector Space Model) for the mail document. Second, a traditional inverted index file is used to store the indices. Unlike a typical retrieval system, this system implements incremental indexing, which greatly reduces index update time. To speed up the processing of massive mail data, distributed processing is adopted in the system architecture. During mail pre-processing, a distributed algorithm runs a single task across several nodes, which speeds up parsing and indexing and also makes the search process stable and fast. Finally, the paper describes the data tests, comparing the parsing and indexing speed of the single-machine and distributed systems, and concludes that search time depends on the mail scale and the complexity of the query.

The system implements the series of user operations, namely mail parsing, indexing and searching, in combination with distributed parallel technology. It uses the inverted index to store and manage the mail indices and, to meet the demands of user queries, computes similarity using the mail VSM. In addition, the system provides good computing capability and an application development environment through its unified interfaces and methods.
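
The abstract gives no implementation details, so the following is only a minimal single-machine sketch of the ideas it names: field-aware mail parsing, an inverted index that is updated incrementally, and ranking by vector space similarity. The names `parse_mail`, `tokenize` and `MailIndex`, the TF-IDF weighting, and the choice of Python are all assumptions made for illustration and are not taken from the thesis. The distributed parsing and indexing it describes are not shown.

```python
import math
import re
from collections import Counter, defaultdict
from email import message_from_string


def parse_mail(raw):
    """Split a raw message into the fields to index (hypothetical helper)."""
    msg = message_from_string(raw)
    # Multipart bodies are ignored to keep the sketch short.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
    }


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MailIndex:
    """Inverted index keyed by (field, term); an assumed design, not the thesis's."""

    def __init__(self):
        # postings[(field, term)] -> {doc_id: term frequency}
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, raw):
        """Incremental update: only the postings of the new mail are touched."""
        doc_id = self.doc_count
        self.doc_count += 1
        for field, text in parse_mail(raw).items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[(field, term)][doc_id] = tf
        return doc_id

    def _idf(self, docs):
        return math.log(1 + self.doc_count / len(docs))

    def search(self, query, field="body"):
        """Rank mails by cosine similarity of TF-IDF vectors within one field."""
        scores = defaultdict(float)
        q_norm = 0.0
        for term, qf in Counter(tokenize(query)).items():
            docs = self.postings.get((field, term))
            if not docs:
                continue
            idf = self._idf(docs)
            q_norm += (qf * idf) ** 2
            for doc_id, tf in docs.items():
                scores[doc_id] += (qf * idf) * (tf * idf)
        if not scores:
            return []
        # Document norms are recomputed per query because IDF changes as mail
        # is added; a real system would cache or approximate them.
        d_norm = defaultdict(float)
        for (f, _term), docs in self.postings.items():
            if f != field:
                continue
            idf = self._idf(docs)
            for doc_id, tf in docs.items():
                if doc_id in scores:
                    d_norm[doc_id] += (tf * idf) ** 2
        ranked = [
            (doc_id, s / math.sqrt(q_norm * d_norm[doc_id]))
            for doc_id, s in scores.items()
        ]
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Calling `add` for each newly arrived message extends the postings without rebuilding the index, and `search("budget", field="subject")` restricts matching to a single mail field. Keying the postings by field is one simple way to support the From/To/content queries the abstract mentions; the thesis may well use a different layout.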

