Friday, December 30, 2022

Tokenization Issues - Information Retrieval

Some common tokenization issues are listed below:

1. One word or two words?
2. Numbers
3. No whitespace between words (e.g. Chinese)
4. Ambiguous segmentation (the same character sequence can be split into different words with different meanings, e.g. in Chinese)
5. Bidirectional text (e.g. Arabic)
6. Accents and diacritics
7. Case folding
8. Stop words
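A few of the issues above (accents and diacritics, case folding, stop words) can be illustrated with a minimal Python sketch. The stop-word list, the function names and the regular expression are assumptions made for this example only. It splits on anything that is not an ASCII letter or digit, so it does not address languages written without whitespace or bidirectional text.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "to", "and"}


def strip_accents(text: str) -> str:
    """Drop combining marks so that 'résumé' and 'resume' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str, remove_stop_words: bool = True) -> list[str]:
    """Case-fold, strip accents, then keep runs of ASCII letters and digits."""
    folded = strip_accents(text.casefold())
    tokens = re.findall(r"[a-z0-9]+", folded)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(tokenize("The Résumé of Hewlett-Packard, 2022"))
# → ['resume', 'hewlett', 'packard', '2022']
```

Note that "Hewlett-Packard" comes out as two tokens here, which is exactly the "one word or two words?" question; a different splitting rule would keep it as one.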


---------------------------------------------------------------------------- 
All the messages below are just forwarded messages if some one feels hurt about it please add your comments we will remove the post. 
Host/author is not responsible for these posts.

Merge Algorithm - Intersecting two posting lists - Information Retrieval
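A minimal sketch of the usual two-pointer merge for intersecting two posting lists, assuming each list holds document IDs sorted in ascending order. The function name and the sample IDs are chosen for this example only.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Return the document IDs present in both sorted posting lists."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same document in both lists: keep it and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Advance the pointer that sits on the smaller document ID.
            i += 1
        else:
            j += 1
    return answer


print(intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101]))
# → [2, 31]
```

Because each step advances at least one pointer, the number of comparisons is at most the sum of the two list lengths.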



---------------------------------------------------------------------------- 

Wednesday, December 28, 2022

Inverted index construction - Information Retrieval














---------------------------------------------------------------------------- 

Tuesday, December 27, 2022

Evaluation Measures - Information Retrieval








---------------------------------------------------------------------------- 

Functional View of Paradigm IR System - Information Retrieval




---------------------------------------------------------------------------- 

The Process of Retrieving Information -- Information Retrieval







---------------------------------------------------------------------------- 

Data Retrieval vs Information Retrieval


1. Matching

In data retrieval (DR) we normally look for an exact match; that is, we check whether an item is or is not present in the file.
Ex: SELECT * FROM Student WHERE per >= 8.0

In information retrieval (IR) we more generally want to find the items that partially match the request and then select a few of the best-matching ones.
Ex: Students having a CGPA of 8 or more
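The contrast between exact and partial matching can be sketched in Python. The records, the documents, the function names and the term-overlap score are all assumptions made for this illustration; real IR systems use more refined ranking functions.

```python
def data_retrieval(records: list[dict], threshold: float) -> list[dict]:
    """Exact match: a record either satisfies the predicate or it does not."""
    return [r for r in records if r["per"] >= threshold]


def information_retrieval(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Partial match: score each document by shared terms, return the best k."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical sample data.
students = [{"name": "A", "per": 8.4}, {"name": "B", "per": 7.9}]
docs = {
    "d1": "students with high cgpa",
    "d2": "cgpa of students above 8",
    "d3": "library opening hours",
}

print(data_retrieval(students, 8.0))                        # only student A qualifies
print(information_retrieval("students cgpa above 8", docs))  # → ['d2', 'd1']
```

The first function returns every record that satisfies the condition and nothing else; the second returns a ranked shortlist in which "d1" still appears even though it matches only part of the request.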

2. Inference

In data retrieval the inference is of the simple deductive kind; that is, if a R b and b R c, then a R c (for a transitive relation R).
In information retrieval the inference is inductive; relations are only specified with a degree of certainty or uncertainty, and hence our confidence in the inference is variable.

3. Model

Data retrieval is deterministic, whereas information retrieval is probabilistic.
Bayes' theorem is frequently invoked to carry out inferences in IR, but in DR probabilities do not enter into the processing.
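As one illustration of how Bayes' theorem might enter an IR computation, the sketch below updates the probability that a document is relevant after observing a piece of evidence (for example, that it contains a query term). The function name, parameter names and the numbers are assumptions for this example, not a specific retrieval model.

```python
def posterior_relevance(
    prior: float,
    p_evidence_given_rel: float,
    p_evidence_given_nonrel: float,
) -> float:
    """P(relevant | evidence) via Bayes' theorem for a two-class problem."""
    numerator = p_evidence_given_rel * prior
    # Total probability of the evidence over relevant and non-relevant documents.
    denominator = numerator + p_evidence_given_nonrel * (1.0 - prior)
    return numerator / denominator


# Hypothetical numbers: 10% of documents are relevant; the term occurs in
# 80% of relevant documents and in 20% of non-relevant ones.
print(round(posterior_relevance(0.1, 0.8, 0.2), 4))
# → 0.3077
```

The result is a degree of belief rather than a yes/no answer, which is the sense in which the IR model is probabilistic.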

4. Classification

In DR, monothetic classification is most likely to be used,
that is, one whose classes are defined by objects possessing attributes that are both necessary and sufficient for membership of a class.

In IR, polythetic classification is mostly used.
Each individual in a class possesses only a proportion of all the attributes possessed by all the members of that class.
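The difference between the two kinds of classification can be sketched with Python sets. The function names, the attribute labels and the 0.5 threshold are assumptions for this illustration; the text above does not fix what "a proportion" must be.

```python
def monothetic_member(item: set[str], required: set[str]) -> bool:
    """Member only if the item has every defining attribute of the class."""
    return required <= item


def polythetic_member(
    item: set[str], class_attributes: set[str], min_fraction: float = 0.5
) -> bool:
    """Member if the item shares at least min_fraction of the class attributes."""
    shared = len(item & class_attributes)
    return shared / len(class_attributes) >= min_fraction


# Hypothetical attribute sets.
class_attrs = {"a", "b", "c"}
item = {"a", "b"}

print(monothetic_member(item, class_attrs))  # → False (attribute "c" is missing)
print(polythetic_member(item, class_attrs))  # → True  (2 of 3 attributes shared)
```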

5. Query Language

The query language for DR is one with a restricted syntax and vocabulary.
In IR we prefer to use natural language, although there are some notable exceptions.

6. Query Specification

In DR the query is generally a complete specification of what is wanted.
In IR it is invariably incomplete.

7. Items Wanted

In IR we are searching for relevant documents, as opposed to exactly matching items in DR.

8. Error Response

DR is more sensitive to error, in the sense that an error in matching will fail to retrieve the wanted item, which implies a total failure of the system.
In IR, small errors in matching generally do not affect the performance of the system significantly.




----------------------------------------------------------------------------