
Monday, January 9, 2023

BITS-WILP-SPA-Makeup 2021- Final Semester

Birla Institute of Technology & Science, Pilani

Work Integrated Learning Programmes Division

Second Semester 2020-2021


Comprehensive Examination

(EC-3 Make-up)


Course No. :  DSECL ZG556

Course Title :  STREAM PROCESSING AND ANALYTICS

Nature of Exam :  Open Book 

Weightage :  45% 

Duration :  2 Hours 

Date of Exam :  11-09-2021  FN

Note to Students: 

  1. Please follow all the Instructions to Candidates given on the cover page of the answer book.

  2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.  

  3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

 


Q.1. Every day, a multinational online taxi dispatch company gathers terabytes of event data from its mobile users. By using Kafka, Spark Streaming, and HDFS to build a continuous ETL pipeline, the company can convert raw, unstructured event data into structured data as it is collected, and then use it for further, more complex analytics. [5 + 5 = 10]

  1. With this scenario in mind, explain, with the help of a clearly labelled architecture diagram, how Spark Streaming can be leveraged as the solution.

  2. List and briefly explain the Apache Spark APIs that can be used in this solution.
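The transform step of such a pipeline can be sketched in plain Python. This is not Spark code: it is a minimal stand-in, assuming hypothetical JSON events with `user_id`, `event_type` and `ts` fields and a hypothetical 5-second batch interval, that mimics how Spark Streaming slices an incoming stream into micro-batches and turns each raw record into a structured row. The load step (writing each batch to HDFS) is only marked with a comment.

```python
import json
from collections import defaultdict

# Hypothetical micro-batch interval in seconds (Spark Streaming calls this the batch interval).
BATCH_INTERVAL_S = 5


def parse_event(raw):
    """Transform step: turn one raw JSON event into a structured row.

    The field names user_id, event_type and ts are hypothetical.
    Returns None for malformed or incomplete records.
    """
    try:
        event = json.loads(raw)
        return {
            "user_id": str(event["user_id"]),
            "event_type": str(event["event_type"]).lower(),
            "ts": int(event["ts"]),
        }
    except (ValueError, KeyError, TypeError):
        # ValueError also covers json.JSONDecodeError and failed int() conversions.
        return None


def micro_batches(raw_events):
    """Group (arrival_second, raw_string) pairs into micro-batches and transform each record.

    Returns {batch_index: [structured rows]}; malformed records are dropped.
    """
    batches = defaultdict(list)
    for arrival, raw in raw_events:
        row = parse_event(raw)
        if row is not None:
            batches[arrival // BATCH_INTERVAL_S].append(row)
    return dict(sorted(batches.items()))


if __name__ == "__main__":
    stream = [
        (0, '{"user_id": 1, "event_type": "RIDE_REQUEST", "ts": 100}'),
        (3, "not json"),
        (6, '{"user_id": "u2", "event_type": "ride_end", "ts": 106}'),
    ]
    for batch, rows in micro_batches(stream).items():
        # In a real pipeline each batch would be written to HDFS here (load step, not shown).
        print(batch, rows)
```

Malformed records are dropped rather than failing the whole batch; a production job might instead route them to a separate location for inspection.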

Q.2. Consider the following Kafka Cluster description.                                      

  • 10 node cluster

  • Name of the Topic: Cluster

  • Number of Partitions: 4

  • Replication factor of ‘Bus’: 3

  • 7 producers

  • 5 consumers

  1. Draw Kafka’s architecture as a block diagram, clearly highlighting the producers, consumers, brokers, topic and partitions.

  2. How many consumer groups can be created for this configuration?

  3. What is the maximum number of consumers that each consumer group can have while ensuring maximum parallelism?

  4. What is the maximum number of server failures that this setup can handle?

[2 + 1 + 1 + 1 = 5]
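The arithmetic behind the questions on maximum parallelism and server failures can be checked with a small sketch. Within one consumer group, each partition is consumed by at most one consumer, so consumers beyond the partition count sit idle; with a replication factor of 3, each partition has copies on 3 distinct brokers, so its data survives up to 2 broker failures, assuming the surviving replica was in sync (write availability additionally depends on settings such as `min.insync.replicas`). The round-robin assignment below is a simplified illustration in the spirit of Kafka's `RoundRobinAssignor`, not its actual code, and the consumer names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin the partitions of one topic across the consumers of ONE group.

    Simplified illustration, not Kafka's actual assignor code.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def max_useful_consumers(num_partitions):
    # Within a group each partition goes to at most one consumer,
    # so consumers beyond the partition count receive nothing.
    return num_partitions


def tolerated_broker_failures(replication_factor):
    # Replicas of a partition sit on distinct brokers; the data stays available
    # while at least one replica survives (assuming it was in sync).
    return replication_factor - 1


if __name__ == "__main__":
    group = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical names for the 5 consumers
    print(assign_partitions(4, group))      # c5 ends up with no partition
    print(max_useful_consumers(4), tolerated_broker_failures(3))
```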


Q.3. Explain the various components of Apache Flink with a suitable real-time example. [10]


Q.4. Consider an online e-commerce portal where customers can search for products anonymously, but must have an account with the provider to place an order. While customers browse products on the portal, the provider monitors their online behavior. The provider has a business relationship with an online movie service provider whose movies are also displayed and sold on the provider’s platform, and users’ search queries are shared between the two providers. The search queries are also matched against the user’s profile to provide product and movie recommendations. For this purpose, the provider uses Apache Storm as its streaming platform. With the help of a suitable architectural diagram, show how this recommendation activity can be carried out. [8]
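The data flow of such a topology can be sketched with plain Python generators standing in for Storm components: a spout emits the shared search queries, one bolt enriches each query with the user's profile (anonymous users simply have none), and a second bolt matches the result against the combined product and movie catalogue. The names, profile data and tag-overlap matching rule are hypothetical, and this is not Storm's API; in Storm itself the spout and bolts would be wired together with a `TopologyBuilder`.

```python
# Hypothetical in-memory stand-ins for the profile store and the merged catalogue.
PROFILES = {"u1": {"likes": ["sci-fi"]}}
CATALOGUE = [
    {"id": "m1", "kind": "movie", "tags": ["sci-fi", "space"]},
    {"id": "p1", "kind": "product", "tags": ["space", "telescope"]},
    {"id": "p2", "kind": "product", "tags": ["kitchen"]},
]


def query_spout(queries):
    """Spout stand-in: emits one tuple per search query shared by both providers."""
    for user_id, text in queries:
        yield {"user_id": user_id, "terms": text.lower().split()}


def profile_bolt(tuples):
    """Bolt stand-in: merges query terms with profile interests; anonymous users have none."""
    for t in tuples:
        profile = PROFILES.get(t["user_id"], {"likes": []})
        yield {**t, "interests": set(t["terms"]) | set(profile["likes"])}


def recommend_bolt(tuples):
    """Bolt stand-in: recommends every product or movie sharing at least one tag."""
    for t in tuples:
        recs = [item["id"] for item in CATALOGUE if t["interests"] & set(item["tags"])]
        yield {"user_id": t["user_id"], "recommendations": recs}


if __name__ == "__main__":
    stream = [("u1", "Space"), ("anon", "kitchen")]  # hypothetical queries
    for result in recommend_bolt(profile_bolt(query_spout(stream))):
        print(result)
```

Chaining generators mirrors the spout-to-bolt stream grouping in spirit; a real topology would run each component in parallel across workers.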



Q.5. Consider the following streaming SQL query, in which an output record (row) is generated giving the updated minimum and maximum temperatures over the window W1, plus an incrementally updated average temperature over that period. [3 * 4 = 12]


    SELECT STREAM
        MIN(TEMP) OVER W1 AS WMIN_TEMP,
        MAX(TEMP) OVER W1 AS WMAX_TEMP,
        AVG(TEMP) OVER W1 AS WAVG_TEMP
    FROM WEATHERSTREAM
    WINDOW W1


Assume that the input weather stream has the following temperature values, arriving at a regular interval of two minutes:

{12, 14, 15, 13, 16, 20}


What will be the output of the above query (with proper explanation) if

  1. The window is defined as a sliding window of length 3.

  2. The window is defined as a batch window of length 3.

  3. The window is defined as a time-based sliding window of 4 minutes.

  4. The window is defined as a time-based batch window of 3 minutes.
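The four window definitions can be worked through with a short sketch. It assumes the first reading arrives at t = 0 minutes, that sliding windows emit a row on every arrival (including while the window is still filling), that a row-based batch window emits only for complete blocks of 3 rows, that the 4-minute sliding window covers the half-open interval (t − 4, t], and that 3-minute batch windows are aligned at [0, 3), [3, 6), and so on. Other conventions (for example an inclusive lower bound on the time window) change the results, so the chosen assumption should be stated in the answer.

```python
READINGS = [12, 14, 15, 13, 16, 20]
INTERVAL_MIN = 2  # readings arrive every 2 minutes

# Assumption: the first reading arrives at t = 0 minutes.
EVENTS = [(i * INTERVAL_MIN, temp) for i, temp in enumerate(READINGS)]


def window_stats(temps):
    """Return (min, max, avg) for one window, with avg rounded to 2 decimals."""
    return (min(temps), max(temps), round(sum(temps) / len(temps), 2))


def sliding_rows(events, n):
    """Row-based sliding window: emit on every arrival over the last n rows."""
    out = []
    for i in range(len(events)):
        window = [t for _, t in events[max(0, i - n + 1): i + 1]]
        out.append(window_stats(window))
    return out


def batch_rows(events, n):
    """Row-based batch (tumbling) window: one row per complete block of n readings."""
    return [window_stats([t for _, t in events[i:i + n]])
            for i in range(0, len(events) - n + 1, n)]


def sliding_time(events, minutes):
    """Time-based sliding window over (ts - minutes, ts], emitted on every arrival."""
    out = []
    for ts, _ in events:
        window = [t for s, t in events if ts - minutes < s <= ts]
        out.append(window_stats(window))
    return out


def batch_time(events, minutes):
    """Time-based batch window: buckets [k*minutes, (k+1)*minutes), one row per non-empty bucket."""
    buckets = {}
    for ts, temp in events:
        buckets.setdefault(ts // minutes, []).append(temp)
    return [window_stats(buckets[k]) for k in sorted(buckets)]


if __name__ == "__main__":
    print("1. sliding, 3 rows:", sliding_rows(EVENTS, 3))
    print("2. batch, 3 rows:  ", batch_rows(EVENTS, 3))
    print("3. sliding, 4 min: ", sliding_time(EVENTS, 4))
    print("4. batch, 3 min:   ", batch_time(EVENTS, 3))
```

Under these assumptions the (min, max, avg) rows are: (1) (12, 12, 12), (12, 14, 13), (12, 15, 13.67), (13, 15, 14), (13, 16, 14.67), (13, 20, 16.33); (2) (12, 15, 13.67), (13, 20, 16.33); (3) (12, 12, 12), (12, 14, 13), (14, 15, 14.5), (13, 15, 14), (13, 16, 14.5), (16, 20, 18); (4) (12, 14, 13), (15, 15, 15), (13, 16, 14.5), (20, 20, 20).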




---------------------------------------------------------------------------- 
All the messages below are just forwarded messages. If anyone feels hurt by them, please add your comments and we will remove the post. The host/author is not responsible for these posts.

BITS-WILP-SPA-Regular 2020-Mid Semester

Birla Institute of Technology & Science, Pilani
Work Integrated Learning Programmes Division

Second Semester 2019-2020
Mid-Semester Test
(EC-2 Regular)

Course No. : DSECLZG556
Course Title : Stream Processing & Analytics
Nature of Exam : Closed Book
Weightage : 30%
Duration : 2 Hours
Date of Exam :

Note to Students:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
----------------------------------------------------------------------------------------------------------------

Q.1. What are streaming data systems? Explain the generalized streaming data architecture and its various components. [6]

Q.2. For parliamentary election vote-counting updates, a system has been developed that interested parties can use to receive updates on the vote count. Each constituency is divided into six blocks, and each block has several voting centers. Counting is done center-wise and takes approximately 30 minutes per center. Once counting for all the centers in a block is complete, the central system is notified of the latest state of the votes received by the various candidates. Giving three reasons, justify whether or not the system described above is a case of streaming data. [6]

Q.3. Compare the different streaming data delivery protocols with respect to the following points: [6]
I. Message frequency
II. Communication direction
III. Message latency
IV. Efficiency

Q.4. Consider an international airline that operates in both the passenger and cargo segments. For every flight in the air, the airline captures a large amount of real-time data that can be used for live flight-status tracking, for modelling flight schedules, and for scheduling preventive maintenance, among other uses. At the same time, the same data is used for various analytical purposes aimed at improving airline operations, predicting passenger and cargo loads in the near future, and devising marketing strategies around them. Identify the appropriate data processing architecture for achieving these use cases. With the help of an architectural diagram, represent the proposed system architecture. [6]
No. of Questions = 05


Q.5. A producer publishes messages to a Kafka topic that has three partitions. Another producer publishes messages to that same topic as well as to a second Kafka topic that has two partitions. There are 6 Kafka brokers in the system and 3 consumers: the first two consume the partitions of the first topic, while the third consumes the partitions of the second topic. Two replicas are maintained in the cluster for each topic partition. Draw a suitable Kafka cluster architecture diagram that fulfils these requirements. [6]
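One possible layout for the diagram can be generated with a simplified round-robin placement: the k-th partition across both topics gets its leader replica on broker k mod 6 and its second replica on the next broker, so the two copies of a partition never share a broker. Kafka's own assignment also spreads replicas across brokers but starts from a randomly chosen broker, so this layout is an illustrative assumption, not what a real cluster would necessarily produce; the topic and consumer names are hypothetical. The consumer mapping uses the same round-robin idea.

```python
def place_replicas(topics, num_brokers, replication_factor):
    """Simplified round-robin replica placement (illustrative, not Kafka's exact algorithm).

    topics maps topic name -> partition count. The k-th partition overall gets its
    leader on broker k % num_brokers and its followers on the following brokers.
    """
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    k = 0
    for topic, partitions in topics.items():
        for p in range(partitions):
            placement[(topic, p)] = [(k + r) % num_brokers for r in range(replication_factor)]
            k += 1
    return placement


def assign(partitions, consumers):
    """Round-robin the partitions of one topic across the consumers reading it."""
    result = {c: [] for c in consumers}
    for p in range(partitions):
        result[consumers[p % len(consumers)]].append(p)
    return result


if __name__ == "__main__":
    # Hypothetical names: topic1 (3 partitions, fed by both producers),
    # topic2 (2 partitions, fed by the second producer only). Brokers are numbered 0-5.
    layout = place_replicas({"topic1": 3, "topic2": 2}, num_brokers=6, replication_factor=2)
    for (topic, p), brokers in layout.items():
        print(f"{topic}-p{p}: leader on broker {brokers[0]}, replica on broker {brokers[1]}")
    print(assign(3, ["consumer1", "consumer2"]))  # consumers of topic1
    print(assign(2, ["consumer3"]))               # consumer of topic2
```

With three partitions and two consumers, one consumer of the first topic necessarily reads two partitions while the other reads one.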



