All the messages below are forwarded messages. If anyone feels hurt by them, please add a comment and we will remove the post. The host/author is not responsible for these posts.
Monday, January 30, 2023
BITS-WILP-DSAD-Regular-2023-Mid Semester
BITS-WILP-MFDS-Regular 2023 - Jan 2023- Mid semester
Tuesday, January 24, 2023
Midsemester - Information Retrieval -- DSECLZG537 - Jan 7th 2023
Monday, January 23, 2023
Regular - Mid Semester - Deep Learning - DSECLZG524 - 7th Jan 2023
Sunday, January 22, 2023
Midsemester - Regular - SPA - Question Paper
Monday, January 9, 2023
BITS-WILP-BDS-Regular 2023-Mid Semester
Q1. Discuss briefly 3 key issues that will impact the performance of a data parallel application and need careful optimization.
Q2. The CPU of a movie streaming server has L1 cache reference of 0.5 ns and main memory reference of 100 ns. The L1 cache hit during peak hours was found to be 23% of the total memory references. [Marks: 4]
- Calculate the cache hit ratio h.
- Find out the average time (Tavg) to access the memory.
- If the size of the cache memory is doubled, what will be the impact on h and Tavg.
- If there is a total failure of the cache memory, calculate h and Tavg.
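The arithmetic for Q2 can be sketched as follows. This is a minimal sketch that assumes the simple two-level model Tavg = h * Tcache + (1 - h) * Tmemory; some textbooks instead charge a miss with the cache time plus the memory time, which would change the numbers slightly. The function and variable names are illustrative and not part of the question.

```python
def average_access_time(hit_ratio, cache_ns, memory_ns):
    """Tavg = h * Tcache + (1 - h) * Tmemory (assumed two-level model)."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


L1_NS = 0.5        # L1 cache reference time given in the question
MEMORY_NS = 100.0  # main memory reference time given in the question

# 23% of all memory references hit the L1 cache, so h = 0.23.
h_peak = 0.23
t_avg_peak = average_access_time(h_peak, L1_NS, MEMORY_NS)

# Total cache failure: every reference goes to main memory, so h = 0.
h_failed = 0.0
t_avg_failed = average_access_time(h_failed, L1_NS, MEMORY_NS)

print(f"peak hours:   h = {h_peak}, Tavg = {t_avg_peak:.3f} ns")
print(f"cache failed: h = {h_failed}, Tavg = {t_avg_failed:.3f} ns")
```

The effect of doubling the cache cannot be computed from the given data. Qualitatively, a larger cache usually raises h and therefore lowers Tavg, but by how much depends on the workload.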
Q3. A travel review site stores (user, hotel, review) tuples in a data store. E.g. tuple is (“user1”, “hotel ABC”, “<review>”). The data analysis team wants to know which user has written the most reviews and the hotel that has been reviewed the most. Write MapReduce pseudo-code to answer this question. [Marks: 4]
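One way to approach Q3 is sketched below as a small in-memory simulation in Python rather than real Hadoop code. The helper `run_mapreduce`, the function names, and the sample tuples are all hypothetical. The mapper tags each key with its kind so that a single job counts both users and hotels, and the final maximum is taken in a plain function, which in a real cluster would typically be a second job with a single reducer.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


def count_mapper(record):
    """Emit one count for the user and one for the hotel of each review."""
    user, hotel, _review = record
    yield ("user", user), 1
    yield ("hotel", hotel), 1


def sum_reducer(key, values):
    """Add up the counts that were shuffled to this key."""
    return key, sum(values)


def top_key(counts, kind):
    """Pick the (name, count) pair with the highest count for one kind of key."""
    candidates = {name: n for (k, name), n in counts.items() if k == kind}
    return max(candidates.items(), key=lambda item: item[1])


# Made-up sample data, for illustration only.
reviews = [
    ("user1", "hotel ABC", "great"),
    ("user1", "hotel XYZ", "ok"),
    ("user2", "hotel ABC", "noisy"),
    ("user3", "hotel ABC", "clean"),
    ("user1", "hotel PQR", "fine"),
]

counts = run_mapreduce(reviews, count_mapper, sum_reducer)
top_user = top_key(counts, "user")
top_hotel = top_key(counts, "hotel")
print(top_user, top_hotel)
```

Because the reducer only sums, the same function could also serve as a combiner to cut shuffle traffic.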
Q4. An e-commerce site stores (user, product, rating) tuples for data analysis. E.g. tuple is (“user1”, “product_x”, 3), where rating is from 1-10 with 10 being the best. A user can rate many products and products can be rated by many users. Write MapReduce pseudo-code to find the range (min and max) of ratings received for each product. So each output record contains (<product>, <min rating> to <max rating>). [Marks: 4]
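A possible shape for the Q4 answer is sketched below as an in-memory Python simulation. The function names and the sample ratings are hypothetical. The mapper emits a (rating, rating) pair per product so that the reducer, which takes the minimum of the first elements and the maximum of the second, could also be reused as a combiner.

```python
from collections import defaultdict


def map_rating(record):
    """Emit (product, (min_so_far, max_so_far)); both start as the rating itself."""
    _user, product, rating = record
    yield product, (rating, rating)


def reduce_range(product, pairs):
    """Combine partial (min, max) pairs into one range for the product."""
    lows, highs = zip(*pairs)
    return product, (min(lows), max(highs))


def rating_ranges(records):
    """Simulate the shuffle phase in memory, then reduce each product's group."""
    shuffled = defaultdict(list)
    for record in records:
        for product, pair in map_rating(record):
            shuffled[product].append(pair)
    return dict(reduce_range(p, pairs) for p, pairs in shuffled.items())


# Made-up sample data, for illustration only.
ratings = [
    ("user1", "product_x", 3),
    ("user2", "product_x", 9),
    ("user1", "product_y", 7),
    ("user3", "product_x", 5),
]

ranges = rating_ranges(ratings)
for product, (low, high) in sorted(ranges.items()):
    print(f"({product}, {low} to {high})")
```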
Q5. Name a system and explain how it utilises the concepts of data and tree parallelism. [Marks: 3]
Q6. An enterprise application consists of a 2 node active-active application server cluster connected to a 2 node active-passive database (DB) cluster. Both tiers need to be working for the system to be available. Over a long period of time it has been observed that an application server node fails every 100 days and a DB server node fails every 50 days. A passive DB node takes 12 hours to take over from the failed active node. Answer the following questions. [Marks: 4]
- What is the overall MTTF of the 2-tier system?
- Assume only a single failure at any time, either in the App tier or in the DB tier, and an equal probability of an App or a DB node failure. What is your estimate of the availability of the 2-tier system?
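The numbers in Q6 can be worked through under one possible reading, sketched below. The assumptions are mine and not stated in the question: node failures are independent with constant rates, the system failure rate is the sum of the four node failure rates, an App node failure causes no downtime because the surviving active node keeps serving, and a DB node failure costs the 12 hour failover. Other readings of the question would give different figures.

```python
APP_NODE_MTTF_DAYS = 100.0  # each application server node fails every 100 days
DB_NODE_MTTF_DAYS = 50.0    # each DB server node fails every 50 days
DB_FAILOVER_DAYS = 12 / 24  # passive DB node needs 12 hours to take over

# Assumption: any node failure counts as a failure event, so rates add up
# over the two App nodes and the two DB nodes.
failure_rate_per_day = 2 / APP_NODE_MTTF_DAYS + 2 / DB_NODE_MTTF_DAYS
system_mttf_days = 1 / failure_rate_per_day

# Assumption: App and DB failures are equally likely (as the question says),
# an App failure has zero repair time, a DB failure costs the failover time.
expected_mttr_days = 0.5 * 0.0 + 0.5 * DB_FAILOVER_DAYS

availability = system_mttf_days / (system_mttf_days + expected_mttr_days)

print(f"MTTF = {system_mttf_days:.2f} days")
print(f"expected MTTR = {expected_mttr_days:.2f} days")
print(f"availability = {availability:.4f}")
```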
Q7. In the following application scenarios, point out what is most important - consistency or availability, when a system failure results in a network partition in the backend distributed DB. Explain briefly the reason behind your answer. [Marks: 4]
(a) A limited quantity discount offer on a product for 100 items at an online retail store is almost 98% claimed.
(b) An online survey application records inputs from millions of users across the globe.
(c) A travel reservation website is trying to sell rooms at a destination that is seeing very few bookings.
(d) A multi-player game with virtual avatars and users from all across the world needs a set of sequential steps between team members to progress across game milestones.
Q8. Assume that you have a NoSQL database with 3 nodes and a configurable replication factor (RF). R is the number of replicas that participate to return a Read request. W is the number of replicas that need to be updated to acknowledge a Write request. In each of the cases below explain why data is consistent or in-consistent for read requests. [Marks: 4]
1. RF=1, R=1, W=1.
2. RF=2, R=1, W=Majority/Quorum.
3. RF=3, R=2, W=Majority/Quorum.
4. RF=3, R=Majority/Quorum, W=3.
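The four cases in Q8 can be checked against the usual quorum overlap rule, sketched below: a read is guaranteed to see the latest acknowledged write when R + W > RF, because the read set and the write set must then share at least one replica. The sketch assumes that Majority/Quorum means floor(RF / 2) + 1; the function names are illustrative.

```python
def quorum(replication_factor):
    """Majority of the replicas: floor(RF / 2) + 1 (assumed definition)."""
    return replication_factor // 2 + 1


def reads_are_consistent(rf, r, w):
    """Read and write replica sets overlap whenever R + W > RF."""
    return r + w > rf


# (case number, RF, R, W) for the four cases listed in the question.
cases = [
    (1, 1, 1, 1),
    (2, 2, 1, quorum(2)),
    (3, 3, 2, quorum(3)),
    (4, 3, quorum(3), 3),
]

results = {}
for number, rf, r, w in cases:
    results[number] = reads_are_consistent(rf, r, w)
    verdict = "consistent" if results[number] else "possibly inconsistent"
    print(f"case {number}: RF={rf}, R={r}, W={w} -> {verdict}")
```

Under this rule a configuration such as RF=3, R=1, W=1 would be flagged as possibly inconsistent, since a read may land on a replica the write has not reached yet.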
BITS-WILP-SPA-Regular 2020-Mid Semester
Monday, January 2, 2023
Deep Learning - Mid Semester - Makeup - DSECLZG524
Sunday, January 1, 2023
Information Retrieval -- DSECLZG537 - Mid Semester Question Paper - June 2021
Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
June 2021
Mid-Semester Test
(EC-1 Regular)
Course No. : SS ZG537
Course Title : INFORMATION RETRIEVAL
Nature of Exam : Closed Book
Weightage : 30%
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.
Q1 – 2+5+3+5=15 Marks
A) Give an example of uncertainty and vagueness issues in Information Retrieval. [2 Marks]
B) Explain the merge algorithm for the query “Information Retrieval”. What is the best order for query processing for the query “BITS AND Information AND Retrieval”? Which documents will be returned as output from the 15 documents? [5 Marks]
Solution:
Merge Algorithm - Intersecting two posting lists : Algorithm
Output document - 11
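The merge step named in the solution can be sketched as follows. The posting lists for the exam's 15 documents are not reproduced in this post, so the lists below are made up for illustration and are not expected to reproduce the stated output. For a multi-term AND query, the sketch processes terms in order of increasing posting list length, which is the usual heuristic for keeping intermediate results small.

```python
def intersect(p1, p2):
    """Merge two sorted posting lists, keeping docIDs present in both."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_all(posting_lists):
    """AND several terms together, starting with the shortest posting list."""
    ordered = sorted(posting_lists, key=len)
    result = ordered[0]
    for postings in ordered[1:]:
        result = intersect(result, postings)
    return result


# Hypothetical posting lists (sorted docIDs), not the ones from the exam.
index = {
    "bits": [1, 3, 5, 7, 9],
    "information": [2, 3, 5, 8, 9, 12],
    "retrieval": [3, 9],
}

two_terms = intersect(index["information"], index["retrieval"])
three_terms = intersect_all([index["bits"], index["information"], index["retrieval"]])
print(two_terms, three_terms)
```

Each pairwise merge takes time linear in the combined length of the two lists, because both pointers only move forward.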
C) [3 Marks]
D) Build inverted index using Blocked sort-based Indexing for 50 million records. Explain the algorithm in detail with respect to indexing 50 million records. [5 Marks]
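The idea behind blocked sort-based indexing can be sketched on a toy scale as below. This is a simplified in-memory sketch, not a full implementation: real BSBI maps terms to termIDs, writes each sorted block to disk, and merges the on-disk runs, whereas here the blocks stay in memory and terms are used directly. The function names, the `block_size` parameter, and the sample documents are assumptions made for illustration.

```python
import heapq
from itertools import groupby, islice


def parse_pairs(docs):
    """Stream (term, docID) pairs from (docID, text) records."""
    for doc_id, text in docs:
        for term in text.lower().split():
            yield term, doc_id


def bsbi_index(docs, block_size):
    """Sort fixed-size blocks of pairs, then merge the sorted blocks."""
    pairs = parse_pairs(docs)
    blocks = []
    while True:
        block = list(islice(pairs, block_size))
        if not block:
            break
        block.sort()          # sort one block that fits in memory
        blocks.append(block)  # real BSBI would write this run to disk

    merged = heapq.merge(*blocks)  # k-way merge of the sorted runs
    index = {}
    for term, group in groupby(merged, key=lambda pair: pair[0]):
        postings = []
        for _term, doc_id in group:
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)  # skip duplicate docIDs
        index[term] = postings
    return index


# Made-up documents, for illustration only.
docs = [(1, "bits pilani"), (2, "pilani mtech bits"), (3, "mtech")]
index = bsbi_index(docs, block_size=2)
print(index)
```

For 50 million records the same structure applies, with the block size chosen so that one block of pairs fits in main memory and the final merge reading all runs from disk simultaneously.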
Q2 – 5+5+5=15 Marks
A) Assume a corpus of 10000 documents. The following table gives the TF and DF values for the 3 terms in the corpus of documents. Calculate the logarithmic TF-IDF values. [5 Marks]
Term | Doc1 | Doc2 | Doc3
bits | 15 | 5 | 20
pilani | 2 | 20 | 0
mtech | 0 | 20 | 15

Term | df_t
bits | 2000
pilani | 1500
mtech | 500
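The TF-IDF calculation can be sketched as below, using the values from the two tables. The sketch assumes the common log-weighting scheme w = (1 + log10(tf)) * log10(N / df) with w = 0 when tf = 0; the question does not spell out the exact variant, so treat the formula as an assumption. Variable and function names are illustrative.

```python
import math

N_DOCS = 10_000  # corpus size given in the question

# Term frequencies per document (Doc1, Doc2, Doc3) from the first table.
tf = {"bits": [15, 5, 20], "pilani": [2, 20, 0], "mtech": [0, 20, 15]}
# Document frequencies from the second table.
df = {"bits": 2000, "pilani": 1500, "mtech": 500}


def log_tf(count):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0


def idf(doc_freq, n_docs=N_DOCS):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(n_docs / doc_freq)


weights = {
    term: [log_tf(count) * idf(df[term]) for count in counts]
    for term, counts in tf.items()
}

for term, row in weights.items():
    print(term, [round(w, 3) for w in row])
```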
B) Classify the test document d6 into c1 or c2 using the Naïve Bayes classifier. The documents in the training set and the appropriate class labels are given below. [5 Marks]
Set | Docid | Words in document | c = c1 | c = c2
Training Set | d1 | positive | Yes | No
Training Set | d2 | Very positive | Yes | No
Training Set | d3 | Positive very positive | Yes | No
Training Set | d4 | very negative | No | Yes
Training Set | d5 | negative | No | Yes
Test Set | d6 | Negative positive very positive | ? | ?
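The classification of d6 can be sketched as below. This sketch assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing and case-insensitive tokens; the question does not state the smoothing scheme, so that choice is an assumption. Scores are kept in log space to avoid underflow, and all function names are illustrative.

```python
import math
from collections import Counter

# Training documents and labels from the table above.
training = [
    ("positive", "c1"),
    ("Very positive", "c1"),
    ("Positive very positive", "c1"),
    ("very negative", "c2"),
    ("negative", "c2"),
]
test_doc = "Negative positive very positive"  # d6


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def train(docs):
    """Count documents per class, term occurrences per class, and the vocabulary."""
    class_docs = Counter()
    term_counts = {}
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        counts = term_counts.setdefault(label, Counter())
        for term in tokenize(text):
            counts[term] += 1
            vocab.add(term)
    return class_docs, term_counts, vocab


def log_scores(text, class_docs, term_counts, vocab):
    """log P(c) + sum of log P(t | c), with add-one smoothing (assumed)."""
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        counts = term_counts[label]
        denominator = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for term in tokenize(text):
            if term in vocab:  # ignore terms never seen in training
                score += math.log((counts[term] + 1) / denominator)
        scores[label] = score
    return scores


model = train(training)
scores = log_scores(test_doc, *model)
predicted = max(scores, key=scores.get)
print(predicted, {label: math.exp(s) for label, s in scores.items()})
```

Under these assumptions the unnormalised score for c1 works out to 3/5 * 1/9 * (5/9)^2 * 3/9 and the score for c2 to 2/5 * 3/6 * (1/6)^2 * 2/6, so d6 is assigned to c1.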
C) The search engine ranked results on 0-5 relevance scale: 2, 2, 3, 0, 5. Calculate the NDCG metric for the same. [5 Marks]
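The NDCG calculation can be sketched as below. The sketch assumes the DCG variant that sums rel_i / log2(i + 1) over ranks i = 1..n; other variants exist (for example with a gain of 2^rel - 1, or with no discount at rank 1 and log2(i) afterwards) and would give a different value. The ideal ranking is obtained by sorting the same scores in descending order.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranking = [2, 2, 3, 0, 5]  # relevance scores from the question
print(f"DCG  = {dcg(ranking):.4f}")
print(f"IDCG = {dcg(sorted(ranking, reverse=True)):.4f}")
print(f"NDCG = {ndcg(ranking):.4f}")
```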