Natural Language Processing Based Information Retrieval Aspects




Acknowledgement

I am heartily thankful to my advisor, Prof. [XYZ], whose encouragement, supervision and support from the preliminary to the concluding level enabled me to develop an understanding of the subject. His technical expertise was instrumental in my success and I owe him my deepest gratitude for the opportunity to work on this project.

I would also like to thank my co-advisor Prof. [abc] and Prof. [LMN]. This paper could not have been written without them.

Abstract

Many problems in information retrieval (IR) stem from the rich expressive power of natural language. Natural language ambiguities, including lexical, syntactic and semantic ambiguity, have been recognized as a major obstacle to information processing in general. A human usually has little difficulty finding the intended sense of an ambiguous expression. For machines, however, it remains a challenge to come anywhere near human comprehension of natural languages and general concepts. Furthermore, a human may use a number of variations in words and structure to describe the same concept, which makes natural languages even more complex for computers to process. In this paper, we focus on natural language processing based aspects of information retrieval.

1 Introduction

According to Bates, the probability of two persons using the same term to describe the same thing is less than 20%. Another study found that the probability of two subjects picking the same term for a given entity ranged from 7% to 18%. It is therefore not surprising that the ability of traditional, simple keyword search to handle natural language ambiguities is very limited. In classic keyword and Boolean search, the exact words are taken from the user, and the documents that contain those exact words are returned. The limitation of this approach is that the user input must be clear; otherwise, there is a mismatch between the user query and the documents [].
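
The mismatch problem can be illustrated with a minimal sketch of exact-word Boolean (AND) search. The two-document collection, the tokenizer, and the function names below are assumptions made up for illustration; they are not taken from any system discussed in this paper.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_and_search(query, documents):
    """Return ids of documents that contain every query word exactly."""
    query_terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items()
            if query_terms <= tokenize(text)]

# Hypothetical collection: both documents describe the same situation
# but use different words for it.
documents = {
    "d1": "The physician prescribed a new medicine.",
    "d2": "The doctor wrote a prescription for the patient.",
}

print(boolean_and_search("doctor", documents))           # ['d2']
print(boolean_and_search("physician", documents))        # ['d1']
print(boolean_and_search("doctor medicine", documents))  # []
```

Although both documents are about the same concept, each single-word query retrieves only the document that happens to use that exact word, and the combined query retrieves nothing.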

1.1 Background

Modern information retrieval begins with recognizing the right meaning of a given word in its context, which is referred to simply as concept search. The motivation is to enable a computer to "understand" the query and the information in the data repository before it reasons about a given search query. Formally speaking, concept search is an automated information retrieval approach that searches for information conceptually similar to the information provided in a search query, which makes the retrieval process more flexible and intelligent. Many efforts have been made to adopt the concept search approach to improve the performance of information retrieval. These efforts can be grouped into two types, query expansion and concept indexing, depending on whether the concept search works on the query or on the data repository [].
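
The difference between the two types can be sketched as follows. This is only a toy illustration under stated assumptions: the `CONCEPTS` table, the documents, and the function names are hypothetical, and a real system would derive concepts from a thesaurus, an ontology, or statistical analysis rather than a hand-written dictionary.

```python
# Hypothetical concept table mapping surface words to a concept label.
CONCEPTS = {
    "doctor": "PHYSICIAN",
    "physician": "PHYSICIAN",
    "medicine": "DRUG",
    "drug": "DRUG",
}

def words(text):
    """Split lowercase text on whitespace (the toy documents have no punctuation)."""
    return text.lower().split()

def expand_query(query):
    """Query side: add every word that shares a concept with a query word."""
    terms = set(words(query))
    for term in list(terms):
        concept = CONCEPTS.get(term)
        if concept is not None:
            terms.update(w for w, c in CONCEPTS.items() if c == concept)
    return terms

def build_concept_index(documents):
    """Repository side: index documents under concept labels, not raw words."""
    index = {}
    for doc_id, text in documents.items():
        for word in words(text):
            key = CONCEPTS.get(word, word)  # fall back to the word itself
            index.setdefault(key, set()).add(doc_id)
    return index

documents = {
    "d1": "the physician prescribed a drug",
    "d2": "the doctor recommended rest",
}

# Query expansion: the query changes, the documents stay as they are.
expanded = expand_query("doctor")
by_expansion = sorted(doc_id for doc_id, text in documents.items()
                      if expanded & set(words(text)))

# Concept indexing: the index changes, the query is mapped to a concept.
index = build_concept_index(documents)
by_index = sorted(index.get(CONCEPTS.get("doctor", "doctor"), set()))

print(sorted(expanded))  # ['doctor', 'physician']
print(by_expansion)      # ['d1', 'd2']
print(by_index)          # ['d1', 'd2']
```

In this toy case both routes retrieve the same documents; the design difference is where the extra work is done, at query time or at indexing time.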

Query expansion aims to reformulate the original query by further identifying the information. Although this approach is simply called "expansion", the exact techniques include not only extending but also deleting, revising and re-weighting, while query expansion works in the ...