Showing posts with label Mid Semester. Show all posts

Monday, January 30, 2023

BITS-WILP-DSAD-Regular-2023-Mid Semester









---------------------------------------------------------------------------- 
All the messages below are just forwarded messages. If anyone feels hurt by them, please add a comment and we will remove the post. The host/author is not responsible for these posts.

BITS-WILP-MFDS-Regular 2023 - Jan 2023- Mid semester





---------------------------------------------------------------------------- 

Tuesday, January 24, 2023

Midsemester - Information Retrieval -- DSECLZG537 - Jan 7th 2023

Information Retrieval - Regular - Mid Semester conducted on 7th Jan 2023









---------------------------------------------------------------------------- 

Monday, January 23, 2023

Regular - Mid Semester - Deep Learning - DSECLZG524 - 7th Jan 2023








---------------------------------------------------------------------------- 

Sunday, January 22, 2023

Midsemester - Regular - SPA - Question Paper

BITS - Mid Semester - SPA - Regular - 21st Jan 2023







---------------------------------------------------------------------------- 

Monday, January 9, 2023

BITS-WILP-BDS-Regular 2023-Mid Semester

===================================================================

Name : MTDSE CLUSTER-MID SEM-1st Sem 2022-2023 EC-2R Jan23
Subject : DSECLZG522 
               BIG DATA SYSTEMS EC 2R
===================================================================

Q1. Discuss briefly 3 key issues that will impact the performance of a data parallel application and need careful optimization. 

Q2. The CPU of a movie streaming server has an L1 cache reference time of 0.5 ns and a main memory reference time of 100 ns. L1 cache hits during peak hours were found to be 23% of the total memory references. [Marks: 4]

  1. Calculate the cache hit ratio h.
  2. Find the average time (Tavg) to access the memory.
  3. If the size of the cache memory is doubled, what will be the impact on h and Tavg?
  4. If there is a total failure of the cache memory, calculate h and Tavg.
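
The arithmetic behind Q2 can be sketched as follows. This is a minimal sketch, not an official solution: it assumes the simple weighted-average model Tavg = h * Tc + (1 - h) * Tm, whereas some textbooks charge the cache lookup on every miss as well, which gives a slightly larger figure. The function and variable names are illustrative.

```python
def average_access_time(hit_ratio, t_cache_ns, t_memory_ns):
    """Weighted-average access time: Tavg = h * Tc + (1 - h) * Tm.

    Assumption: a miss costs only the main-memory reference time.
    """
    return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_memory_ns


T_CACHE_NS = 0.5     # L1 cache reference time from the question
T_MEMORY_NS = 100.0  # main memory reference time from the question

h = 0.23  # 23% of all memory references hit the L1 cache
t_avg = average_access_time(h, T_CACHE_NS, T_MEMORY_NS)

# Total cache failure: every reference goes to main memory, so h = 0.
t_avg_no_cache = average_access_time(0.0, T_CACHE_NS, T_MEMORY_NS)

print(f"h = {h}, Tavg = {t_avg:.3f} ns")
print(f"cache failed: h = 0, Tavg = {t_avg_no_cache:.1f} ns")
```

Under the same model, a larger cache would normally raise h and therefore lower Tavg, but by how much depends on the workload and cannot be derived from the numbers given.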

Q3. A travel review site stores (user, hotel, review) tuples in a data store. An example tuple is (“user1”, “hotel ABC”, “<review>”). The data analysis team wants to know which user has written the most reviews and which hotel has been reviewed the most. Write MapReduce pseudo-code to answer this question. [Marks: 4]
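
One possible shape for the Q3 job, written as a small in-memory Python simulation rather than real Hadoop code. It is only a sketch: the function names (`map_review`, `reduce_counts`, `run_job`, `top`) and the sample reviews are made up for illustration, and the shuffle phase is imitated with a dictionary.

```python
from collections import defaultdict


def map_review(record):
    """Map: emit one count keyed by the user and one keyed by the hotel."""
    user, hotel, _review = record
    yield ("user", user), 1
    yield ("hotel", hotel), 1


def reduce_counts(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Imitate the shuffle by grouping mapper output by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_review(record):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


def top(counts, kind):
    """Pick the name with the highest count among keys of one kind."""
    candidates = {name: n for (k, name), n in counts.items() if k == kind}
    return max(candidates.items(), key=lambda item: item[1])


# Hypothetical sample data, not taken from the exam.
reviews = [
    ("user1", "hotel ABC", "great stay"),
    ("user2", "hotel ABC", "average"),
    ("user1", "hotel XYZ", "noisy"),
]
counts = run_job(reviews)
top_user = top(counts, "user")
top_hotel = top(counts, "hotel")
print(top_user, top_hotel)
```

In a real cluster the final maximum would usually come from a second job (or a single reducer) that scans the per-key counts; here `top` plays that role.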

Q4. An e-commerce site stores (user, product, rating) tuples for data analysis. An example tuple is (“user1”, “product_x”, 3), where the rating is from 1 to 10, with 10 being the best. A user can rate many products and a product can be rated by many users. Write MapReduce pseudo-code to find the range (min and max) of ratings received for each product, so that each output record contains (<product>, <min rating> to <max rating>). [Marks: 4]
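
The Q4 job can be sketched the same way, again as an in-memory Python simulation under assumed names (`map_rating`, `reduce_range`, `rating_ranges`) and with made-up sample tuples. The mapper keys each rating by product and the reducer takes the minimum and maximum of the values it receives.

```python
from collections import defaultdict


def map_rating(record):
    """Map: drop the user and emit (product, rating)."""
    _user, product, rating = record
    yield product, rating


def reduce_range(product, ratings):
    """Reduce: compute the (min, max) rating seen for one product."""
    ratings = list(ratings)
    return product, (min(ratings), max(ratings))


def rating_ranges(records):
    """Group mapper output by product (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for product, rating in map_rating(record):
            groups[product].append(rating)
    return dict(reduce_range(p, values) for p, values in groups.items())


# Hypothetical sample data, not taken from the exam.
sample = [
    ("user1", "product_x", 3),
    ("user2", "product_x", 9),
    ("user1", "product_y", 7),
]
ranges = rating_ranges(sample)
for product, (low, high) in ranges.items():
    print(f"{product}: {low} to {high}")
```

Because min and max are associative, the same reducer logic could also run as a combiner on each mapper node to cut shuffle traffic.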

Q5. Name a system and explain how it utilises the concepts of data and tree parallelism.           [Marks: 3]

Q6. An enterprise application consists of a 2 node active-active application server cluster connected to a 2 node active-passive database (DB) cluster. Both tiers need to be working for the system to be available. Over a long period of time it has been observed that an application server node fails every 100 days and a DB server node fails every 50 days. A passive DB node takes 12 hours to take over from the failed active node. Answer the following questions.            [Marks: 4]

  1. What is the overall MTTF of the 2-tier system?
  2. Assume only a single failure at any time, either in the App tier or in the DB tier, and an equal probability of an App or a DB node failure. What is your estimate of the availability of the 2-tier system?

Q7. In the following application scenarios, point out what is most important - consistency or availability, when a system failure results in a network partition in the backend distributed DB. Explain briefly the reason behind your answer.          [Marks: 4]

(a) A limited quantity discount offer on a product for 100 items at an online retail store is almost 98% claimed.
(b) An online survey application records inputs from millions of users across the globe.
(c) A travel reservation website is trying to sell rooms at a destination that is seeing very few bookings.
(d) A multi-player game with virtual avatars and users from all across the world needs a set of sequential steps between team members to progress across game milestones.

Q8. Assume that you have a NoSQL database with 3 nodes and a configurable replication factor (RF). R is the number of replicas that must respond to serve a Read request. W is the number of replicas that must be updated before a Write request is acknowledged. In each of the cases below, explain why data is consistent or inconsistent for read requests. [Marks: 4]

1. RF=1, R=1, W=1.
2. RF=2, R=1, W=Majority/Quorum.
3. RF=3, R=2, W=Majority/Quorum.
4. RF=3, R=Majority/Quorum, W=3.
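
A common way to reason about the Q8 cases is the overlap rule R + W > RF: if the read set and the write set must share at least one replica, a read sees the latest acknowledged write. The sketch below applies only that rule, assuming quorum means floor(RF/2) + 1; it ignores failures, hinted handoff, and read repair, and the helper names are illustrative.

```python
def quorum(rf):
    """Majority of RF replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(rf, r, w):
    """Overlap rule: read and write replica sets intersect when R + W > RF."""
    return r + w > rf


cases = [
    ("RF=1, R=1, W=1", 1, 1, 1),
    ("RF=2, R=1, W=quorum", 2, 1, quorum(2)),
    ("RF=3, R=2, W=quorum", 3, 2, quorum(3)),
    ("RF=3, R=quorum, W=3", 3, quorum(3), 3),
]

results = {}
for label, rf, r, w in cases:
    results[label] = reads_see_latest_write(rf, r, w)
    print(f"{label}: R+W={r + w}, RF={rf}, overlap={results[label]}")
```

For contrast, `reads_see_latest_write(3, 1, 1)` is False: with RF=3, a single-replica read can miss a single-replica write.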





---------------------------------------------------------------------------- 

BITS-WILP-SPA-Regular 2020-Mid Semester

Birla Institute of Technology & Science, Pilani
Work Integrated Learning Programmes Division

Second Semester 2019-2020
Mid-Semester Test
(EC-2 Regular)

Course No. : DSECLZG556
Course Title : Stream Processing & Analytics
Nature of Exam : Closed Book
Weightage : 30%
Duration : 2 Hours
Date of Exam :

Note to Students:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.
----------------------------------------------------------------------------------------------------------------

Q.1. What are streaming data systems? Explain the Generalized Streaming Data architecture and
its various components. [6]

Q.2. For parliamentary elections vote counting updates, a system has been developed which can
be used by interested parties to receive the vote counting related updates. Each constituency is
divided into six blocks. Each block has several voting centers in it. Counting is done center wise
which approximately takes 30 minutes for each center. Once the counting for all the centers in a
block is done then the central system is notified about the latest state of votes received by various
candidates. Giving three reasons, justify whether the system described above is a case of streaming
data or not. [6]

Q.3 Compare the different streaming data delivery protocols with respect to the following points:
I. Message frequency
II. Communication direction
III. Message Latency
IV. Efficiency [6]

Q.4 Consider an international airline which operates in both the passenger segment and the cargo segment.
For every flight in the air, the airline captures a lot of data in real time which can be used for
live tracking of flight status, modelling the flight schedules, preventive maintenance
scheduling, etc. At the same time, the same data is used for various analytical purposes which
are oriented towards improving the airline operations, predicting passenger and cargo
loads in the near future, and devising marketing strategies around them. Identify the
appropriate data processing architecture that can help in achieving these use cases. With the help of
an architectural diagram, represent the proposed system architecture. [6]
No. of Questions = 05


Q.5 A producer produces messages which are fed to a Kafka topic that has three partitions.
Another producer produces messages which are fed to the same Kafka topic as
well as to a different Kafka topic that has two partitions. There are 6 Kafka brokers in the
system and 3 consumers, of which the first two listen to the partitions of the first topic
whereas the last one listens to the partitions of the second topic. For each topic partition, 2
replicas are maintained in the cluster. Draw a suitable Kafka cluster architectural diagram
fulfilling the above-mentioned requirements. [6]




---------------------------------------------------------------------------- 

Monday, January 2, 2023

Deep Learning - Mid Semester - Makeup - DSECLZG524



























---------------------------------------------------------------------------- 

Sunday, January 1, 2023

Information Retrieval -- DSECLZG537 - Mid Semester Question Paper - June 2021

 

Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division

June 2021

Mid-Semester Test

(EC-1 Regular)

No. of Pages = 2
No. of Questions = 2

 


Course No.                   : SS ZG537  

Course Title                  : INFORMATION RETRIEVAL  

Nature of Exam            : Closed Book

Weightage                    : 30%

 

Note:

1.       Please follow all the Instructions to Candidates given on the cover page of the answer book.

2.       All parts of a question should be answered consecutively. Each answer should start from a fresh page. 

3.       Assumptions made if any, should be stated clearly at the beginning of your answer.

 

Q1 – 2+5+3+5=15 Marks

A) Give an example of uncertainty and vagueness issues in Information retrieval [2 Marks]               

 

B) Explain the merge algorithm for the query “Information Retrieval”. What is the best order for query processing for the query “BITS AND Information AND Retrieval”? Which documents will be returned as output from the 15 documents? [5 Marks]



 


Solution:

Merge algorithm: intersecting two posting lists


Output document - 11
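
The posting lists from the original question are not reproduced in this post, so the sketch below uses made-up lists to show the standard two-pointer intersection and the usual heuristic of processing terms in order of increasing posting-list length. It is illustrative only and does not reproduce the exam's answer; the names `intersect` and `intersect_terms` are assumptions.

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists with two pointers, O(len(p1) + len(p2))."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_terms(index, terms):
    """AND several terms, starting with the shortest posting lists."""
    ordered = sorted(terms, key=lambda term: len(index[term]))
    result = index[ordered[0]]
    for term in ordered[1:]:
        result = intersect(result, index[term])
    return result


# Hypothetical posting lists (sorted doc IDs), not the ones from the exam.
index = {
    "bits": [2, 5, 9],
    "information": [1, 2, 4, 5, 7, 9, 12],
    "retrieval": [2, 4, 8, 9, 13, 14],
}
hits = intersect_terms(index, ["bits", "information", "retrieval"])
print(hits)
```

Starting with the rarest term keeps every intermediate result no longer than the shortest list, which is why the smallest document frequency is normally processed first.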

 

C) [3 Marks]

 

D) Build an inverted index using blocked sort-based indexing (BSBI) for 50 million records. Explain the algorithm in detail with respect to indexing 50 million records. [5 Marks]
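
The overall flow of blocked sort-based indexing can be sketched in a few lines. This is a toy, in-memory approximation: real BSBI maps terms to term IDs, writes each sorted block to disk as a run, and merges the runs from disk, whereas here the runs stay in Python lists. The function names and the four sample documents are assumptions for illustration.

```python
import heapq
from itertools import groupby


def parse_block(docs):
    """Turn one block of (doc_id, text) into sorted (term, doc_id) pairs."""
    pairs = [(term, doc_id) for doc_id, text in docs for term in text.lower().split()]
    pairs.sort()  # the in-memory sort step applied to each block
    return pairs


def bsbi_index(docs, block_size):
    """Split docs into blocks, sort each block, then merge all sorted runs."""
    runs = []
    for start in range(0, len(docs), block_size):
        # A real implementation would write this run to disk here.
        runs.append(parse_block(docs[start:start + block_size]))

    inverted = {}
    merged = heapq.merge(*runs)  # k-way merge of the sorted runs
    for term, group in groupby(merged, key=lambda pair: pair[0]):
        postings = []
        for _term, doc_id in group:
            if not postings or postings[-1] != doc_id:  # skip duplicate doc IDs
                postings.append(doc_id)
        inverted[term] = postings
    return inverted


# Hypothetical mini-collection standing in for the 50 million records.
docs = [
    (1, "bits pilani"),
    (2, "information retrieval"),
    (3, "bits information"),
    (4, "retrieval bits"),
]
inverted = bsbi_index(docs, block_size=2)
print(inverted)
```

For 50 million records the same steps apply, with the block size chosen so that one block of pairs fits in memory.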

 

 

Q2 – 5+5+5=15 Marks

A) Assume a corpus of 10000 documents. The following tables give the TF and DF values for the 3 terms in the corpus of documents. Calculate the logarithmic TF-IDF values. [5 Marks]

Term      Doc1   Doc2   Doc3
bits      15     5      20
pilani    2      20     0
mtech     0      20     15

Term      dft
bits      2000
pilani    1500
mtech     500
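
A short sketch of the calculation, assuming the common log-weighting scheme w = (1 + log10 tf) * log10(N / df), with w = 0 when tf = 0. Other variants (natural log, no "+1") exist, so the course's expected formula may differ. The TF and DF values are the ones from the tables above; the function names are illustrative.

```python
import math

N_DOCS = 10_000  # corpus size given in the question

tf = {
    "bits": [15, 5, 20],
    "pilani": [2, 20, 0],
    "mtech": [0, 20, 15],
}
df = {"bits": 2000, "pilani": 1500, "mtech": 500}


def log_tf(count):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0


def idf(doc_freq, n_docs=N_DOCS):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(n_docs / doc_freq)


def tf_idf_table(tf_counts, doc_freqs):
    """Return {term: [weight in Doc1, Doc2, Doc3]}."""
    return {
        term: [log_tf(c) * idf(doc_freqs[term]) for c in counts]
        for term, counts in tf_counts.items()
    }


weights = tf_idf_table(tf, df)
for term, row in weights.items():
    print(term, [round(w, 3) for w in row])
```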

 

 

 

 

 

B) Classify the test document d6 into c1 or c2 using a Naïve Bayes classifier. The documents in the training set and their class labels are given below. [5 Marks]

                                                                                     

 

 

               Docid   Words in document                  c = c1   c = c2
Training Set   d1      positive                           Yes      No
               d2      Very positive                      Yes      No
               d3      Positive very positive             Yes      No
               d4      very negative                      No       Yes
               d5      negative                           No       Yes
Test Set       d6      Negative positive very positive    ?        ?
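
The classification can be sketched with a multinomial Naïve Bayes model. This sketch assumes add-one (Laplace) smoothing and case-insensitive tokens, which the question does not state explicitly; the function names are illustrative. The training documents are the five rows of the table above.

```python
import math
from collections import Counter


def train(labelled_docs):
    """Count documents per class, term occurrences per class, and the vocabulary."""
    class_docs = Counter()
    term_counts = {}
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        term_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def classify(text, model):
    """Return (best class, log scores) using add-one smoothed multinomial NB."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for token in text.lower().split():
            likelihood = (term_counts[label][token] + 1) / (total_terms + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get), scores


training = [
    ("positive", "c1"),
    ("Very positive", "c1"),
    ("Positive very positive", "c1"),
    ("very negative", "c2"),
    ("negative", "c2"),
]
model = train(training)
predicted, scores = classify("Negative positive very positive", model)
print(predicted, scores)
```

Working in log space avoids multiplying many small probabilities; the class with the larger log score is the prediction.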

 

C) The search engine ranked results on 0-5 relevance scale: 2, 2, 3, 0, 5. Calculate the NDCG metric for the same. [5 Marks]
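
A sketch of the NDCG calculation for the ranked list above, assuming the DCG form sum(rel_i / log2(i + 1)) over ranks i = 1..n. Some texts use rel_1 + sum(rel_i / log2 i) or the gain 2^rel - 1 instead, which gives different numbers, so the expected answer depends on the formula taught. The function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """Normalise DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranked = [2, 2, 3, 0, 5]  # relevance grades in the order the engine returned them
score = ndcg(ranked)
print(f"DCG = {dcg(ranked):.4f}")
print(f"IDCG = {dcg(sorted(ranked, reverse=True)):.4f}")
print(f"NDCG = {score:.4f}")
```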

                                                                                                                      

 





---------------------------------------------------------------------------- 