Birla
Institute of Technology & Science, Pilani
Work-Integrated
Learning Programmes Division
June
2021
Mid-Semester
Test
(EC-1
Regular)
Course No. : SS ZG537
Course Title : INFORMATION RETRIEVAL
Nature of Exam : Closed Book
Weightage : 30%
Note:
1. Please
follow all the Instructions to Candidates given on the cover page of the
answer book.
2. All
parts of a question should be answered consecutively. Each answer should start
from a fresh page.
3. Assumptions
made if any, should be stated clearly at the beginning of your answer.
Q1
– 2+5+3+5=15 Marks
A) Give an example of uncertainty and
vagueness issues in Information retrieval [2
Marks]
B) Explain the merge algorithm
for the query “Information Retrieval”? What is the best order for query
processing for the query “BITS AND Information AND Retrieval”? What Documents
will be returned as output from the 15 documents? [5 Marks]
Solution:
Merge Algorithm - Intersecting two posting lists : Algorithm
Output document - 11
C)
[3 Marks]
D)
Build inverted index using Blocked
sort-based Indexing for 50 million records. Explain the algorithm in
detail with respect to indexing 50 million records. [5 Marks]
Q2 – 5+5+5=15 Marks
A) Assume
a corpus of 10000 documents. The
following table gives the TF and DF values for the 3 terms in the corpus of
documents. Calculate the logarithmic TF-IDF values. [5
Marks]
Term |
Doc1 |
Doc2 |
Doc3 |
bits |
15 |
5 |
20 |
pilani |
2 |
20 |
0 |
mtech |
0 |
20 |
15 |
Term |
dft
|
bits |
2000 |
pilani |
1500 |
mtech |
500 |
B)
Classify the test document d6 into c1 or c2 using naïve bayes classifier. The documents
in the training set and the appropriate class label is given below. [5 Marks]
|
Docid |
Words in document |
c= c1 |
c= c2 |
Training
Set |
d1 |
positive |
Yes |
No |
|
d2 |
Very
positive |
Yes |
No |
|
d3 |
Positive
very positive |
Yes |
No |
|
d4 |
very
negative |
No |
Yes |
|
d5 |
negative
|
No |
Yes |
Test
Set |
d6 |
Negative
positive very positive |
? |
? |
C)
The search engine ranked results on 0-5 relevance scale: 2, 2, 3, 0, 5.
Calculate the NDCG metric for the same. [5
Marks]