
A Full-text Indexing Model Based on Suffix Array and Posting List

Author: GuoPengFei
Tutor: ZuoYouLi
School: Beijing Jiaotong University
Course: Computer Science and Technology
Keywords: Full-text Indexing Model; Phrase Query; Rank Query; Self-index; Inverted Index; Suffix Array
CLC: TP391.3
Type: Master's thesis
Year: 2014

Abstract


Full-text retrieval systems support fast retrieval of information from massive text collections and have important practical value. The full-text indexing model is the core of a full-text retrieval system: it determines which functions the system can provide and how well it performs, so its design is an important research issue. The performance criteria are query time, construction time and storage space; the functional criteria are self-indexing, rank query, phrase query and adaptability to languages without explicit word boundaries.

The inverted index model offers fast queries and small storage and supports rank query, but it handles phrase query poorly and does not adapt well to languages without explicit word boundaries, such as Chinese. Suffix tree and suffix array indexing models support phrase query and self-indexing and also handle such languages, but they do not support rank query. The ST-PL and CII indexing models combine the advantages of the suffix tree and the inverted index.

This thesis proposes SA-PL, an indexing model that combines a suffix array with posting lists. It takes advantage of the suffix array's support for phrase query, self-indexing and languages without explicit word boundaries, and of its smaller storage footprint compared with the suffix tree, while the combined model also supports rank query. The model aims to reduce query time and compress storage space while providing the same functionality as the ST-PL and CII indexing models. The SA-PL-0 indexing model is designed according to the SA-PL model. Building on SA-PL-0, the thesis proposes the SA-PL-1 indexing model, which reduces index space by removing short posting lists. The SA-PL-0, SA-PL-1, ST-PL and CII indexing models have been implemented. The experiments show that the SA-PL-0 and SA-PL-1 indexing models provide rank query, phrase query and self-indexing, and handle languages without explicit word boundaries well.
They have advantages in both time and space over the ST-PL and CII indexing models, and the SA-PL-1 indexing model outperforms all the other models.
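The core idea of pairing a suffix array with posting lists can be sketched in a few lines. The Python below is a toy, hypothetical illustration under stated assumptions, not the SA-PL implementation from this thesis: the class name `ToySuffixArrayIndex`, the separator `SEP`, the parameter `min_stored` and the raw-frequency ranking are all inventions for this example. A phrase query is answered by binary search over the suffix array, which needs no word segmentation; the matching suffix-array range is then turned into a posting list of (document, frequency) pairs that supports a simple rank query. `min_stored` reflects only one possible reading of "removing short posting lists" in SA-PL-1: short lists are not kept and are recomputed from the suffix array when needed.

```python
from collections import Counter

SEP = "\x00"  # assumed document separator; must not occur in documents or queries


class ToySuffixArrayIndex:
    """Suffix array over a document collection plus posting lists derived from it.

    Hypothetical illustration of the general idea only; this is not the
    SA-PL model from the thesis.
    """

    def __init__(self, docs, min_stored=2):
        # One text for the whole collection, documents separated by SEP.
        self.text = SEP.join(docs) + SEP
        # doc_of[p] = id of the document that text position p belongs to.
        self.doc_of = []
        for doc_id, doc in enumerate(docs):
            self.doc_of.extend([doc_id] * (len(doc) + 1))  # +1 for its separator
        # Naive construction by sorting suffixes; real systems use e.g. SA-IS.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])
        # Assumption: only posting lists with >= min_stored entries are kept;
        # shorter ones are rebuilt from the suffix-array range on every query.
        self.min_stored = min_stored
        self.stored = {}

    def _range(self, pattern):
        """Half-open range [lo, hi) of suffix-array slots starting with pattern."""
        text, sa, m = self.text, self.sa, len(pattern)
        lo, hi = 0, len(sa)
        while lo < hi:  # first slot whose m-prefix is >= pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] < pattern:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, len(sa)
        while lo < hi:  # first slot whose m-prefix is > pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def posting_list(self, pattern):
        """Phrase query: [(doc_id, occurrences)] sorted by doc_id."""
        if pattern in self.stored:
            return self.stored[pattern]
        lo, hi = self._range(pattern)
        counts = Counter(self.doc_of[self.sa[i]] for i in range(lo, hi))
        postings = sorted(counts.items())
        if len(postings) >= self.min_stored:
            self.stored[pattern] = postings
        return postings

    def rank(self, pattern, k=10):
        """Rank query by raw phrase frequency (a stand-in for tf-idf or BM25)."""
        return sorted(self.posting_list(pattern), key=lambda p: (-p[1], p[0]))[:k]
```

For instance, `ToySuffixArrayIndex(["banana band", "bandana", "cabana"]).rank("ana")` returns `[(0, 2), (1, 1), (2, 1)]`: "ana" is not a word, so a word-level inverted index would not find it, and both overlapping occurrences in "banana" are counted. The same code works unchanged on unsegmented Chinese text, because queries are matched character by character.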

Related Dissertations

  1. The Research on Full-Text Search and Related Technologies, TP391.3
  2. Research on Vertical Search Engine on Mobile Platform, TP391.3
  3. Content-Based Fast Audio Retrieval, TP391.3
  4. Improvement of Deep Packet Service Identification in P2P Network, TP393.02
  5. The Research of Fuzzy Query Based on Keywords, TP311.13
  6. Research on Secure Index Structure of Cipher-text Retrieval System, TP391.3
  7. Information retrieval system based on words associated with the degree of, TP391.3
  8. Research of Chinese Information Retrieval System and Document Reranking, TP391.3
  9. Research and Implementation of Ciphertext Retrieval System, TP391.3
  10. Research and analysis of the full-text index in index merge algorithm, G354.4
  11. Research and Implementation on User Interests Modeling for Personalized Information Retrieval, TP391.3
  12. Research and Application of MultiMedia Database Retrieval Technology, TP311.13
  13. Study based on the data compression of the information retrieval technique, TP391.3
  14. Research on Data Store of Search Engine, TP391.3
  15. A Research of Full-Text Retrieval Based on Inverted Index, TP311.13
  16. Algorithm Design and System Implementation of Search Engine Confederation, TP391.3
  17. An Efficient Web Traversal Pattern Mining Algorithm Based on Suffix Array, TP393.09
  18. Research on Full-Text Retrieval Technology for the Single Chinese Character, TP391.3
  19. Chinese Word Segment Based on Dictionary and Suffix Array, TP391.1
  20. Research on Unstructured Information Management of Digital TV, TN949.197
  21. A Restricted Domain Text Retrieval System, TP391.3

CLC: Industrial Technology > Automation technology, computer technology > Computing technology, computer technology > Computer applications > Information processing > Retrieval machine
© 2012 www.DissertationTopic.Net