
The Research on Cross Language Text Categorization Based on Interlingua Semantic

Author: BiWenXia
Advisor: WangMingWen
School: Jiangxi Normal University
Course: Computer System Architecture
Keywords: interlingua semantics; cross-language text classification; cross-language information retrieval; partial least squares; latent semantic variables
CLC: TP391.1
Type: Master's thesis
Year: 2008

Abstract


With the growth of the Internet, the web has become an important source of information. At the same time, the volume of information published by government departments, academia, and business has increased dramatically, and together it forms a multilingual knowledge base. Most people, however, search for information only in their mother tongue, so the part of the Internet they can actually understand is just the tip of the iceberg. Because online information is multilingual and any one person is fluent in only a few languages, language has become one of the major obstacles to accessing and understanding information. Cross-language text classification has emerged as a powerful means of organizing and managing multilingual text from government departments, academia, business, and international organizations, and it is attracting growing attention. It helps overcome the language barrier so that users can manage and locate the information they need more effectively.

Dictionary-based methods and machine translation have been the most actively studied approaches to cross-language text classification. Dictionary-based methods translate with a bilingual dictionary. Their main problem is word ambiguity: a word may have several meanings, so these methods face the same word-choice problem as a general machine translation system. A second problem is insufficient dictionary coverage: proper nouns such as the names of people, places, and institutions change constantly and are likely to be missing from the dictionary at translation time. Machine translation translates the documents themselves; its drawback is that translating a large text collection is inefficient and costly. Latent semantic indexing (LSI) requires no translation, but computing the singular value decomposition (SVD) is time-consuming, and the value of K can only be determined by repeated trials.

To address these problems, we propose a cross-language text classification model based on interlingua (intermediate) semantics. The model represents the parallel documents of a bilingual corpus within a unified framework and extracts the semantic correspondences between the two languages. The thesis explains the principle of the model in detail and examines how changes in the feature dimension and in the number of latent variable pairs affect the stability of classification performance. Cross-language classification is also compared with monolingual classification, and the experimental results show that classification based on interlingua semantics is stable and accurate. The contributions of this thesis are: first, a new interlingua-semantics-based cross-language text classification model built on an improved partial least squares technique; second, a Chinese-English parallel corpus, which lays a foundation for the future expansion of Chinese-English parallel corpora.
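The idea of extracting semantic correspondences from a parallel corpus with partial least squares can be illustrated with a minimal sketch. It is not the thesis's improved method: it assumes the plain PLS-SVD variant (paired weight vectors taken from the SVD of the cross-covariance between the two languages' term matrices), a nearest-centroid classifier with cosine similarity, and a tiny invented corpus. All function names, the four-term vocabularies, and the counts are hypothetical.

```python
import numpy as np

def fit_pls_svd(X, Y, k):
    """Fit k paired weight vectors from parallel document-term matrices X and Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    # Cross-covariance (up to a constant factor) between the two vocabularies.
    cross_cov = (X - x_mean).T @ (Y - y_mean)
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return x_mean, U[:, :k], y_mean, Vt[:k].T

def project(docs, mean, weights):
    """Map documents of one language into the shared latent space."""
    return (docs - mean) @ weights

def unit_rows(M):
    """Scale each row to unit length (small constant avoids division by zero)."""
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)

def nearest_centroid(train_scores, train_labels, test_scores):
    """Assign each test document to the class centroid with highest cosine similarity."""
    train_labels = np.array(train_labels)
    classes = sorted(set(train_labels.tolist()))
    centroids = np.array([train_scores[train_labels == c].mean(axis=0)
                          for c in classes])
    sims = unit_rows(test_scores) @ unit_rows(centroids).T
    return [classes[i] for i in sims.argmax(axis=1)]

# Hypothetical parallel corpus: row i of X and row i of Y are translations.
# Language A terms: [zuqiu, bisai, gupiao, shichang]
# Language B terms: [football, match, stock, market]
X = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]], dtype=float)
Y = np.array([[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 2]], dtype=float)

x_mean, W_a, y_mean, W_b = fit_pls_svd(X, Y, k=1)

# Labeled training documents exist only in language A.
train_a = np.array([[3, 1, 0, 0], [1, 3, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]], dtype=float)
train_labels = ["sports", "sports", "finance", "finance"]

# Unlabeled test documents are in language B; no translation step is used.
test_b = np.array([[1, 2, 0, 0], [0, 0, 2, 1]], dtype=float)

predicted = nearest_centroid(project(train_a, x_mean, W_a), train_labels,
                             project(test_b, y_mean, W_b))
print(predicted)  # ['sports', 'finance']
```

Because the two weight matrices come from the same decomposition, a document in either language lands in the same latent space, so a classifier trained on one language can score documents in the other. The number of latent components `k` plays the role of the parameter whose effect on stability the abstract says is studied.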

