Tuesday, January 3, 2023

Permuterm index - Information Retrieval




Link : Stanford Link 

---------------------------------------------------------------------------- 
All the messages below are just forwarded messages; if someone feels hurt by them, please add your comments and we will remove the post. The host/author is not responsible for these posts.

Difference between Word/term/token and type

Word – A delimited string of characters as it appears in the text.

Term – A “normalized” word (case, morphology, spelling, etc.); an equivalence class of words.
ex: the same word can appear many times in a text, but all of its occurrences map to a single term in the dictionary.

Token – An instance of a word or term occurring in a document.
ex: tokens are the only unit for which every occurrence of a word is counted separately.

Type – The same as a term in most cases: an equivalence class of tokens.
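
A quick Python sketch (my own illustration, not from the forwarded notes) of how tokens, types and terms differ for a toy sentence:

text = "To be, or not to be"

# Tokens: each occurrence of a word as it is cut out of the text (punctuation stripped).
tokens = [w.strip(",.") for w in text.split()]
print(tokens, len(tokens))          # ['To', 'be', 'or', 'not', 'to', 'be'] -> 6 tokens

# Types: distinct tokens.
types = set(tokens)
print(sorted(types), len(types))    # ['To', 'be', 'not', 'or', 'to'] -> 5 types

# Terms: normalized (here just case-folded) types, i.e. equivalence classes of tokens.
terms = {t.lower() for t in types}
print(sorted(terms), len(terms))    # ['be', 'not', 'or', 'to'] -> 4 terms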

---------------------------------------------------------------------------- 

Issues with Information Retrieval

Information Retrieval deals with uncertainty and vagueness in information systems.

Uncertainty: the available representation typically does not reflect the true semantics/meaning of the objects (text, images, video, etc.).

Vagueness: the user's information need lacks clarity and is only vaguely expressed in the query, feedback, or user actions.

Differs conceptually from database queries!




---------------------------------------------------------------------------- 

Information Retrieval vs Data Retrieval -- Tabular form

The classic comparison (after van Rijsbergen):

Criterion             Data Retrieval        Information Retrieval
Matching              exact match           partial (best) match
Inference             deduction             induction
Model                 deterministic         probabilistic
Query language        artificial            natural
Query specification   complete              incomplete
Items wanted          matching              relevant
Error response        sensitive             insensitive



---------------------------------------------------------------------------- 

Monday, January 2, 2023

Deep Learning - Mid Semester - Makeup - DSECLZG524

---------------------------------------------------------------------------- 

Sunday, January 1, 2023

Information Retrieval -- DSECLZG537 - Mid Semester Question Paper - June 2021

 

Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division

June 2021

Mid-Semester Test

(EC-1 Regular)

No. of Pages = 2
No. of Questions = 2

 


Course No.     : SS ZG537

Course Title   : INFORMATION RETRIEVAL

Nature of Exam : Closed Book

Weightage      : 30%

 

Note:

1. Please follow all the Instructions to Candidates given on the cover page of the answer book.

2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.

3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

 

Q1 – 2+5+3+5=15 Marks

A) Give an example of uncertainty and vagueness issues in Information Retrieval. [2 Marks]

 

B) Explain the merge algorithm for the query “Information Retrieval”. What is the best order of query processing for the query “BITS AND Information AND Retrieval”? Which documents will be returned as output from the 15 documents? [5 Marks]



 


Solution:

Merge Algorithm - Intersecting two posting lists : Algorithm


Output document - 11
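
The figure with the 15 posting lists is not reproduced here, so the sketch below (my own, with made-up posting lists) only illustrates the two-pointer merge and the usual heuristic of processing terms in increasing order of document frequency:

def intersect(p1, p2):
    """Merge two posting lists sorted by docID, keeping the common docIDs."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

# Hypothetical posting lists (docIDs out of 1..15), NOT the ones from the exam figure.
postings = {
    "BITS":        [1, 4, 10, 11, 12, 14],
    "Information": [2, 3, 7, 11, 13],
    "Retrieval":   [7, 11, 15],
}

# Best order for "BITS AND Information AND Retrieval": start with the rarest term
# (smallest posting list) so the intermediate results stay as short as possible.
order = sorted(postings, key=lambda t: len(postings[t]))
result = postings[order[0]]
for t in order[1:]:
    result = intersect(result, postings[t])
print(order, result)   # ['Retrieval', 'Information', 'BITS'] [11]  (with these made-up lists)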

 

C) [3 Marks]

 

D) Build an inverted index using Blocked Sort-Based Indexing (BSBI) for 50 million records. Explain the algorithm in detail with respect to indexing 50 million records. [5 Marks]
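
A rough Python sketch of the BSBI idea (my own illustration; the block size, the on-disk run files and the toy documents are assumptions, and real BSBI first maps terms to termIDs):

import heapq, os, pickle, tempfile

def parse_block(doc_stream, block_size):
    """Yield blocks of up to block_size (term, doc_id) pairs from the document stream."""
    pairs = []
    for doc_id, text in doc_stream:
        for term in text.lower().split():
            pairs.append((term, doc_id))
            if len(pairs) >= block_size:
                yield pairs
                pairs = []
    if pairs:
        yield pairs

def bsbi_index(doc_stream, block_size=1_000_000):
    # 1) Parse a block, sort its (term, doc_id) pairs in memory, write the sorted run to disk.
    run_files = []
    for pairs in parse_block(doc_stream, block_size):
        pairs.sort()
        run = tempfile.NamedTemporaryFile(delete=False, suffix=".run")
        pickle.dump(pairs, run)
        run.close()
        run_files.append(run.name)

    # 2) Merge all sorted runs into the final inverted index (term -> sorted posting list).
    runs = []
    for name in run_files:
        with open(name, "rb") as f:
            runs.append(pickle.load(f))
        os.remove(name)
    index = {}
    for term, doc_id in heapq.merge(*runs):
        postings = index.setdefault(term, [])
        if not postings or postings[-1] != doc_id:   # keep docIDs unique and sorted
            postings.append(doc_id)
    return index

# Toy usage; for 50 million records the point is that only one block is in memory at a time.
docs = [(1, "information retrieval at BITS"), (2, "deep learning"), (3, "information systems")]
print(bsbi_index(iter(docs), block_size=4))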

 

 

Q2 – 5+5+5=15 Marks

A) Assume a corpus of 10,000 documents. The following tables give the TF and DF values for 3 terms in the corpus. Calculate the logarithmic TF-IDF values. [5 Marks]

 

Term      Doc1   Doc2   Doc3
bits        15      5     20
pilani       2     20      0
mtech        0     20     15

Term      dft
bits      2000
pilani    1500
mtech      500
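
One way to compute the logarithmic tf-idf weights, assuming the usual convention w(t,d) = (1 + log10 tf) x log10(N/df) with weight 0 when tf = 0 (a sketch, not the official marking scheme):

import math

N = 10_000                                            # corpus size given in the question

tf = {                                                # term frequencies from the first table
    "bits":   {"Doc1": 15, "Doc2": 5,  "Doc3": 20},
    "pilani": {"Doc1": 2,  "Doc2": 20, "Doc3": 0},
    "mtech":  {"Doc1": 0,  "Doc2": 20, "Doc3": 15},
}
df = {"bits": 2000, "pilani": 1500, "mtech": 500}     # document frequencies from the second table

def tf_idf(tf_td, df_t, n=N):
    """Logarithmic tf-idf: (1 + log10 tf) * log10(N/df); 0 if the term is absent."""
    if tf_td == 0:
        return 0.0
    return (1 + math.log10(tf_td)) * math.log10(n / df_t)

for term, row in tf.items():
    weights = {doc: round(tf_idf(cnt, df[term]), 3) for doc, cnt in row.items()}
    print(term, weights)
# idf(bits)   = log10(10000/2000) ~ 0.699
# idf(pilani) = log10(10000/1500) ~ 0.824
# idf(mtech)  = log10(10000/500)  ~ 1.301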

 

 

 

 

 

B) Classify the test document d6 into c1 or c2 using a naïve Bayes classifier. The documents in the training set and their class labels are given below. [5 Marks]

                                                                                     

 

 

Docid   Words in document                 c = c1   c = c2
Training Set:
d1      positive                          Yes      No
d2      Very positive                     Yes      No
d3      Positive very positive            Yes      No
d4      very negative                     No       Yes
d5      negative                          No       Yes
Test Set:
d6      Negative positive very positive   ?        ?
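
A sketch of one standard way to work this out, a multinomial naïve Bayes classifier with add-one (Laplace) smoothing over the training table above (my own illustration, not necessarily the marking-scheme solution):

from collections import Counter
import math

train = [
    ("positive", "c1"),
    ("very positive", "c1"),
    ("positive very positive", "c1"),
    ("very negative", "c2"),
    ("negative", "c2"),
]
test_doc = "negative positive very positive"          # d6

# Class priors (from document counts) and per-class token counts.
class_docs = Counter(c for _, c in train)
token_counts = {c: Counter() for c in class_docs}
for text, c in train:
    token_counts[c].update(text.lower().split())
vocab = {w for text, _ in train for w in text.lower().split()}

def log_posterior(text, c):
    """log P(c) + sum over tokens of log P(w|c), with add-one smoothing."""
    logp = math.log(class_docs[c] / len(train))
    total = sum(token_counts[c].values())
    for w in text.lower().split():
        logp += math.log((token_counts[c][w] + 1) / (total + len(vocab)))
    return logp

scores = {c: log_posterior(test_doc, c) for c in class_docs}
print(scores)                          # c1 gets the higher (less negative) log score for d6
print(max(scores, key=scores.get))     # -> 'c1'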

 

C) The search engine ranked results on a 0-5 relevance scale: 2, 2, 3, 0, 5. Calculate the NDCG metric for this ranking. [5 Marks]
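
A sketch of the computation (DCG conventions differ between texts; this uses DCG = rel_1 + sum over ranks i >= 2 of rel_i / log2(i), one common choice, so treat the final number as convention-dependent):

import math

def dcg(rels):
    """DCG = rel_1 + sum_{i>=2} rel_i / log2(i)."""
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

ranked = [2, 2, 3, 0, 5]                 # relevance grades in ranked order
ideal = sorted(ranked, reverse=True)     # ideal ordering: [5, 3, 2, 2, 0]

ndcg = dcg(ranked) / dcg(ideal)
print(round(dcg(ranked), 3), round(dcg(ideal), 3), round(ndcg, 3))
# ~ 8.046, 10.262 and NDCG ~ 0.784 under this convention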

                                                                                                                      

 





---------------------------------------------------------------------------- 