Research on Multi-Email Automatic Summarization

Author: WangBaoXun
Tutor: WangXiaoLong
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: multi-email automatic summarization; maximal marginal relevance; sentence similarity calculation; Trie
CLC: TP391.1
Type: Master's thesis
Year: 2008

Abstract


With the rapid development of the Internet, more and more users benefit from e-mail services, and the volume of e-mail on the Internet keeps growing. Because e-mail is an everyday communication tool, some messages may contain large amounts of confidential information belonging to the state, enterprises, or individuals. E-mail content security technology bears directly on a country's political stability, an enterprise's data security, and the vital interests of individuals, and is therefore of great practical significance. In this context, this thesis studies automatic summarization oriented to e-mail content.

Multi-email automatic summarization extracts important or user-relevant information from a set of e-mails related to a given topic and automatically generates a fixed-length summary. A workable multi-email summarization system helps monitors process e-mail information faster and more precisely. This thesis presents and builds a multi-email automatic summarization system that operates on the retrieval results from a massive e-mail collection. The work focuses on the following issues.

First, the thesis presents an improved, user-query-oriented extraction method that takes into account the application environment and the differences between e-mail content and ordinary text. With this method, the summarization system meets its effectiveness and real-time requirements to a certain extent.

Second, the thesis uses the maximal marginal relevance (MMR) model to extract summary sentences, reducing redundancy in the summaries while keeping precision high. On this basis, we study in depth how the sentence relevance calculation and the linear interpolation factor affect the MMR model. We further present a HowNet-based sentence similarity calculation method and a self-adaptive factor selection model to improve the performance of the summarization system. Intrinsic evaluation shows that the improved system achieves higher summary quality.

Finally, the thesis investigates several related technologies. For access to e-mail information, we implemented automatic e-mail parsing and content decoding. Because useless information in e-mail content may adversely affect the summary, we propose the concept of e-mail content noise and remove it with a rule-based method. For high-speed Chinese word segmentation, the thesis shows how to use a Trie structure to build the segmentation dictionary automatically and to look up any word quickly, which significantly reduces the system's response time.
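
The MMR selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes whitespace-tokenized text, substitutes a plain bag-of-words cosine for the HowNet-based sentence similarity, and uses a fixed interpolation factor `lam` in place of the self-adaptive factor selection model. The function names `cosine` and `mmr_select` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for the HowNet-based measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def mmr_select(query: str, sentences: list[str], k: int, lam: float = 0.7) -> list[str]:
    """Greedily pick up to k sentences by maximal marginal relevance.

    Each round scores every remaining candidate s as
        lam * sim(s, query) - (1 - lam) * max(sim(s, t) for t already selected)
    and moves the best-scoring candidate into the summary.
    """
    selected: list[str] = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s: str) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(s, t) for t in selected), default=0.0)
            return lam * cosine(s, query) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A larger `lam` favors relevance to the query, while a smaller `lam` penalizes sentences that repeat what the summary already contains. This is the trade-off controlled by the linear interpolation factor.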

Related Dissertations

  1. Phone number classification softswitch platform,TN915.05
  2. Research on Frequent Itemset Mining Algorithms Based on Tilted-time Window,TP311.13
  3. Study on Packet Forwarding Engine for High Speed Router,TN915.05
  4. Fast and accurate method of structured machine learning,TP181
  5. Distributed firewall policy anomaly detection algorithm research,TP393.08
  6. NP-based multi-function gateway Research and Design,TP393.05
  7. High Performance Packet-classification Algorithm Study,TP393.01
  8. Hash-based word segmentation mechanism design and implementation,TP391.1
  9. Firewall policy based on decision tree algorithm research,TP393.08
  10. Research and Implementation of Packet Classification Technique Based on Varied Step Trie,TP393.07
  11. Categories Based E-commerce Navigation System Design and Implementation,TP311.52
  12. Design of TRIE-Based Soft-forward Route Lookup Module,TP393.02
  13. The Study on High-speed Algorithms of Packet Classification and Its Implement Techniques,TP393.03
  14. The Research of the IP Routing Lookup Scheme Based on Trie,TN915.05
  15. Geological Text Information Extraction Technology,P208
  16. The Research of Fast Packet Classification Algorithm,TP393.08
  17. Research and Implementation of Protocol Based on Content Delivery Network,TP393.04
  18. The Research of IPv6 Routing Lookup Algorithm Based on Hash Table and Multibit Trie,TP393.02
  19. Study on Efficient Indexing for Large Scale Chinese Text Retrieval Systems,TP391.3
  20. Research on Chinese Word Segmentation for Large Scale Information Retrieval,TP391.3

CLC: Industrial Technology > Automation Technology, Computer Technology > Computing Technology, Computer Technology > Computer Applications > Information Processing > Text Processing