forwards: 2022

Friday, December 30, 2022

Tokenization Issues - Information Retrieval

Some common tokenization issues are listed below:

1. One word or two words?

2. Numbers

3. No whitespace between words (e.g. Chinese)

4. Ambiguous segmentation (the same character sequence can be split into words in more than one way, with different meanings, e.g. Chinese)

5. Bidirectional text (e.g. Arabic)

6. Accents and diacritics

7. Case folding

8. Stop words
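A minimal Python sketch of how a few of these issues are often handled for space-delimited, Latin-script text: accents are stripped (issue 6), text is case-folded (issue 7), hyphenated words are kept as one token (one possible answer to issue 1) and stop words are dropped (issue 8). The tiny stop-word list and the regular expression are illustrative assumptions, and the sketch does not attempt word segmentation for languages such as Chinese (issues 3 and 4).

```python
import re
import unicodedata

# Illustrative stop-word list only; real systems use much larger lists (or none at all)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def strip_accents(text):
    # Decompose characters (e.g. "é" -> "e" + combining acute accent), then drop the combining marks
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    # Accent stripping + case folding
    text = strip_accents(text).lower()
    # ASCII letters/digits; hyphenated words and contractions stay as a single token
    tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text)
    # Stop-word removal
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `tokenize("The Café in São Paulo is state-of-the-art")` returns `['cafe', 'sao', 'paulo', 'state-of-the-art']`. Whether "state-of-the-art" should be one token or four is exactly the one-word-or-two-words decision from issue 1.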

----------------------------------------------------------------------------

All the messages below are just forwarded messages if some one feels hurt about it please add your comments we will remove the post.

Host/author is not responsible for these posts.

Merge Algorithm - Intersecting two posting lists - Information Retrieval
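The merge algorithm walks both sorted postings lists with one pointer each and always advances the pointer at the smaller docID, so intersecting lists of lengths x and y takes O(x + y) steps. A minimal Python sketch, assuming postings are plain sorted lists of integer docIDs (the example lists are illustrative):

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by docID, in O(len(p1) + len(p2))."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID present in both lists: keep it and advance both pointers
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # smaller docID cannot appear later in p2, so skip it
            i += 1
        else:
            j += 1
    return answer
```

For example, `intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101])` returns `[2, 31]`. The linear running time depends on both lists being sorted by docID.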

----------------------------------------------------------------------------

Wednesday, December 28, 2022

Inverted index construction - Information Retrieval
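A common textbook way to construct an inverted index is sort-based: emit (term, docID) pairs, sort them by term and then docID, and merge duplicates into one postings list per term. The sketch below assumes lower-casing and whitespace splitting as the only tokenization, and the three documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Sort-based construction: docs maps docID -> text; returns term -> sorted postings list."""
    # 1. Generate (term, docID) pairs (tokenization here is just lower-casing + whitespace split)
    pairs = [(term, doc_id) for doc_id, text in docs.items() for term in text.lower().split()]
    # 2. Sort by term, then by docID
    pairs.sort()
    # 3. Merge duplicate (term, docID) entries into one postings list per term
    index = defaultdict(list)
    for term, doc_id in pairs:
        postings = index[term]
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    return dict(index)


docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
```

Here `index["home"]` is `[1, 2, 3]` and `index["in"]` is `[2, 3]` (document 3 contains "in" twice but is listed once). The document frequency of a term is simply `len(index[term])`.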

----------------------------------------------------------------------------

Tuesday, December 27, 2022

Evaluation Measures - Information Retrieval

Confusion Matrix : https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
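Precision, recall and F1 can be read straight off the confusion-matrix counts. In retrieval terms, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch with hypothetical document IDs:

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant documents that were retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

With these IDs there are 2 true positives, 2 false positives and 1 false negative, giving precision 0.5, recall 2/3 and F1 = 4/7 (about 0.571).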

----------------------------------------------------------------------------

Functional View of Paradigm IR System - Information Retrieval

----------------------------------------------------------------------------

The Process of Retrieving Information -- Information Retrieval

----------------------------------------------------------------------------

Data Retrieval vs Information Retrieval

1. Matching

In data retrieval we are normally looking for an exact match, that is, we are checking to see whether an item is or is not present in the file.

Ex: SELECT * FROM Student WHERE per >= 8.0

In information retrieval, more generally, we want to find those items which partially match the request and then select from those a few of the best-matching ones.

Ex: a natural-language request such as "students with a CGPA of 8 or above"

2. Inference

In data retrieval, inference is of the simple deductive kind, that is, if aRb and bRc then aRc (for a transitive relation R).

In information retrieval, inference is inductive: relations are specified only with a degree of certainty or uncertainty, and hence our confidence in the inference is variable.

3. Model

Data retrieval is deterministic but information retrieval is probabilistic.

Bayes' theorem is frequently invoked to carry out inferences in IR, whereas in DR probabilities do not enter into the processing.

4. Classification:

In DR, monothetic classification is most likely to be used, that is, one with classes defined by objects possessing attributes that are both necessary and sufficient for membership of a class.

In IR, polythetic classification is mostly used: each individual in a class possesses only a proportion of all the attributes possessed by all the members of that class.

5. Query Language:

The query language for DR is one with restricted syntax and vocabulary.

In IR we prefer to use natural language, although there are some notable exceptions.

6. Query Specification:

In DR the query is generally a complete specification of what is wanted; in IR it is invariably incomplete.

7. Items Wanted:

In IR we are searching for relevant documents, as opposed to exactly matching items in DR.

8. Error Response:

DR is more sensitive to error: an error in matching will fail to retrieve the wanted item, which amounts to a total failure of the system.

In IR, small errors in matching generally do not affect the performance of the system significantly.
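To make point 1 (Matching) concrete, here is a small Python contrast between an exact-match filter (data retrieval) and a crude partial-match ranking (information retrieval). The student records, documents and the term-overlap score are hypothetical; real IR systems use weighted scores such as tf-idf or BM25 rather than raw overlap.

```python
# Data retrieval: a record either satisfies the predicate or it does not
students = [("S1", 8.4), ("S2", 7.9), ("S3", 8.0)]
exact_matches = [sid for sid, cgpa in students if cgpa >= 8.0]


def rank_by_overlap(query, docs):
    """Information retrieval (toy version): rank documents by how many query terms they share."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(text.lower().split())), doc_id) for doc_id, text in docs.items()]
    # Highest overlap first, ties broken by doc_id; documents sharing no term are dropped
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


docs = {
    "d1": "list of students with cgpa above 8",
    "d2": "hostel rules for students",
    "d3": "cgpa calculation rules",
}
ranking = rank_by_overlap("students with high cgpa", docs)
```

The exact filter returns `['S1', 'S3']` and nothing else, while the ranking returns `['d1', 'd2', 'd3']`: d1 shares three query terms and the other two share one each, so partially matching items are still returned, just lower down.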

----------------------------------------------------------------------------

Thursday, September 29, 2022

BITS-WILP-DSECLZG555 - Data Visualization and Interpretation - DVI - Final Question paper - 25092022

----------------------------------------------------------------------------

BITS-WILP-DSECLZG565 - Machine Learning - ML - Final Question paper - 25092022

----------------------------------------------------------------------------

BITS-WILP-DSECLZG523 - Introduction to Data Science - IDS - Final Question paper - 18092022

----------------------------------------------------------------------------

BITS-WILP-DSECLZC413- Introduction to Statistical Methods - ISM - Final Question paper - 18092022

----------------------------------------------------------------------------

Saturday, September 24, 2022

DSECLZG565- MACHINE LEARNING - Quick Calculators

SVM (linear and non-linear) : https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm

SVM formulation and derivation : https://towardsdatascience.com/support-vector-machine-formulation-and-derivation-b146ce89f28

K-means clustering calculator : https://people.revoledu.com/kardi/tutorial/kMean/Online-K-Means-Clustering.html
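K-means calculators of this kind run the standard (Lloyd's) iteration: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. A minimal 2-D Python sketch; the points and initial centroids are hypothetical, and keeping an empty cluster's old centroid is a design assumption.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (assumption: an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids, clusters = kmeans(points, [(1, 1), (2, 1)])
```

Starting from (1, 1) and (2, 1), the sketch settles after three passes on centroids (1.5, 1.0) and (4.5, 3.5), with clusters {(1, 1), (2, 1)} and {(4, 3), (5, 4)}. Different initial centroids can lead to different final clusters.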

----------------------------------------------------------------------------

DSECLZG555-DATA VISUALIZATION AND INTERPRETATION - Story Telling Strategies

Establishing Context

Who : the audience and you

What : action, mechanism and tone

How : data

Story Telling Strategies

------------------------

1. 3-Minute Story -- tell the story within 3 minutes, telling the audience only what they need to know. It does not depend on materials, visualizations, etc.

The storyteller needs to know exactly what the data is saying.

2. Big Idea -- boil the message down to the single most important sentence. It should articulate your unique point of view, convey what is at stake, and be a complete sentence.

3. Storyboarding -- establishes the structure of the communication as a visual outline of the content. Use a whiteboard, Post-it notes, etc.

----------------------------------------------------------------------------

DSECLZG555 - DATA VISUALIZATION AND INTERPRETATION - Gestalt Principles of Visual Perception

Law of Prägnanz (Simplicity)
Law of Similarity
Law of Continuity
Law of Focal Point
Law of Proximity
Law of Figure/Ground
Principle of Enclosure
Principle of Closure
Principle of Continuity
Principle of Connection
Principle of Proximity
Principle of Similarity

----------------------------------------------------------------------------

DSECLZG555 - DATA VISUALIZATION AND INTERPRETATION - Mistakes in Dashboard design

Mistakes in dashboard design

1. Design issues

a. Exceeding the screen

b. Meaningless variety

c. Cluttering the display

d. Unattractive visuals

2. Data Issues

a. Inadequate context for the data

b. Using a deficient measure

c. Incorrect data encoding

d. Poor data arrangement

e. Ineffective data highlighting

3. Display Issues

a. Inappropriate display media

b. Poorly designed display media

13 Design Mistakes

1. Exceeding the Boundaries of a Single Screen

2. Supplying Inadequate Context for the Data

3. Displaying Excessive Detail or Precision

4. Choosing a Deficient Measure

5. Choosing Inappropriate Display Media

6. Introducing Meaningless Variety

7. Using Poorly Designed Display Media

8. Encoding Quantitative Data Inaccurately

9. Arranging the Data Poorly

10. Highlighting Important Data Ineffectively

11. Cluttering the Display with Useless Decoration

12. Misusing or Overusing Color

13. Designing an Unattractive Visual Display

----------------------------------------------------------------------------

An easy way to convert a Google Colab .ipynb file to a PDF file

Perform the steps below to convert an .ipynb file to a PDF file.

1. Install the packages below (this is needed in every new Colab runtime)

!apt-get install -y texlive texlive-xetex texlive-latex-extra pandoc

!pip install pypandoc

2. Copy the path of the file [Ex: file name : abc.ipynb]

3. Execute the command below

!jupyter nbconvert <<path+ filename>> --to pdf

Ex : !jupyter nbconvert /content/drive/MyDrive/Colab/ForExam-Seaborn.ipynb --to pdf

----------------------------------------------------------------------------

Saturday, September 17, 2022

BITS WILP - DSECLZC413 - Introduction to Statistical Methods - Important calculators

Calculators

Chi Square Test Contingency Table : https://www.socscistatistics.com/tests/chisquare2/default2.aspx

Chi-Square Calculator for Goodness of Fit : https://www.socscistatistics.com/tests/goodnessoffit/default2.aspx

Easy Fisher Exact Test Calculator : https://www.socscistatistics.com/tests/fisher/default2.aspx

T-Test Calculator for 2 Independent Means : https://www.socscistatistics.com/tests/studentttest/default.aspx

T Test Calculator for 2 Dependent Means : https://www.socscistatistics.com/tests/ttestdependent/default.aspx

Single Sample T-Test Calculator : https://www.socscistatistics.com/tests/tsinglesample/default.aspx

Pearson Correlation Coefficient Calculator : https://www.socscistatistics.com/tests/pearson/default.aspx

Spearman's Rho Calculator : https://www.socscistatistics.com/tests/spearman/default.aspx

Linear Regression Calculator : https://www.socscistatistics.com/tests/regression/default.aspx

Multiple Regression Calculator (No residual calculation) : https://www.socscistatistics.com/tests/multipleregression/default.aspx

Maximum Likelihood Estimator : https://mathworld.wolfram.com/MaximumLikelihood.html

Z-test: One Population Proportion : https://mathcracker.com/z-test-for-one-proportion

4-year moving average calculator : https://atozmath.com/CONM/TimeSeries.aspx?q=smaf

Autocorrelation : https://www.easycalculation.com/statistics/autocorrelation.php

Autocovariance formula :

https://akshay-a.medium.com/basic-of-autocovariance-autocorrelation-and-partial-autocorrelation-explained-47840b065b92

https://www.easycalculation.com/statistics/autocorrelation.php
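For checking calculator output by hand, the moving-average and autocovariance/autocorrelation formulas are short enough to write out. This sketch assumes a trailing (non-centred) moving average and the 1/n convention for autocovariance, gamma(k) = (1/n) * sum of (x_t - mean)(x_{t+k} - mean), with r_k = gamma(k)/gamma(0). Individual calculators may centre the average or divide by n - k instead, so small differences are expected.

```python
def moving_average(xs, window=4):
    """Trailing simple moving average: mean of each run of `window` consecutive values."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def autocovariance(xs, lag):
    """gamma(k) = (1/n) * sum over t of (x_t - mean) * (x_{t+k} - mean), t = 0 .. n-k-1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag)) / n


def autocorrelation(xs, lag):
    """r_k = gamma(k) / gamma(0)."""
    return autocovariance(xs, lag) / autocovariance(xs, 0)


xs = [2, 4, 6, 8, 10]
ma = moving_average(xs, 4)
r1 = autocorrelation(xs, 1)
```

For the toy series [2, 4, 6, 8, 10], the 4-period moving average is [5.0, 7.0], gamma(0) = 8, gamma(1) = 3.2 and so r_1 = 0.4.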

----------------------------------------------------------------------------

Thursday, July 7, 2022

BITS-WILP-Machine Learning - ML - Comprehensive Examination-Regular - 2019-2020

Birla Institute of Technology & Science, Pilani

Work Integrated Learning Programmes Division

Second Semester 2019-20

M.Tech. (Data Science and Engineering)

Comprehensive Examination (Regular)

Course No. : DSECLZG565

Course Title : MACHINE LEARNING

Nature of Exam : Open Book

Weightage : 40%

Duration : 2 Hours

Date of Exam: July 12, 2020 Time of Exam: 10:00 AM – 12:00 PM

Note: Assumptions made, if any, should be stated clearly at the beginning of your answer.

Question 1. [3+3+2+3=11 marks]

Suppose you flip a coin with unknown bias θ, where P(x = H | θ) = θ, five times and observe the outcome HHHHH.

What is the maximum likelihood estimator for θ? [1 mark]

Would you think this is a good estimator? If not, why not? [2 marks]
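A worked sketch of the likelihood for the coin question, assuming the five flips are independent Bernoulli trials:

```latex
L(\theta) = P(\mathrm{HHHHH} \mid \theta) = \theta^{5}, \qquad
\ell(\theta) = \log L(\theta) = 5 \log \theta

\frac{d\ell}{d\theta} = \frac{5}{\theta} > 0 \ \text{for all } \theta \in (0, 1]
\quad\Longrightarrow\quad \hat{\theta}_{\mathrm{MLE}} = 1
```

More generally, h heads in n flips give the estimate h/n, here 5/5 = 1. One common argument that this is not a good estimator is that it declares tails impossible after only five flips, i.e. it overfits a tiny sample. A Bayesian alternative such as the posterior mean under a uniform prior, (h + 1)/(n + 2) = 6/7, avoids the extreme value.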

A disease has four symptoms, and a physician's past records contain the following data. Use a Naïve Bayes classifier to predict whether a new patient with the symptoms below has the disease. [2 marks]

Patient	Symp1	Symp2	Symp3	Symp4	Disease
1	yes	no	mild	yes	no
2	yes	yes	no	no	yes
3	yes	no	strong	yes	yes
4	no	yes	mild	yes	yes
5	no	no	no	no	no
6	no	yes	strong	yes	yes
7	no	yes	strong	no	no
8	yes	yes	mild	yes	yes

For a new patient:
Symp1	Symp2	Symp3	Symp4	Disease
yes	no	mild	yes	?
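A minimal Python sketch of the Naïve Bayes calculation for this table: each class is scored by its prior times the product of per-symptom conditional probabilities estimated by counting. No smoothing is applied (an assumption; Laplace smoothing would change the numbers). Exact fractions keep the arithmetic easy to check by hand.

```python
from collections import Counter
from fractions import Fraction

# Training rows from the table: (Symp1, Symp2, Symp3, Symp4, Disease)
DATA = [
    ("yes", "no", "mild", "yes", "no"),
    ("yes", "yes", "no", "no", "yes"),
    ("yes", "no", "strong", "yes", "yes"),
    ("no", "yes", "mild", "yes", "yes"),
    ("no", "no", "no", "no", "no"),
    ("no", "yes", "strong", "yes", "yes"),
    ("no", "yes", "strong", "no", "no"),
    ("yes", "yes", "mild", "yes", "yes"),
]


def naive_bayes_scores(rows, query):
    """Unnormalised P(class) * prod_i P(feature_i | class) for each class (no smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))             # prior P(class)
        in_class = [row for row in rows if row[-1] == label]
        for i, value in enumerate(query):
            matches = sum(1 for row in in_class if row[i] == value)
            score *= Fraction(matches, n_label)          # P(feature_i = value | class)
        scores[label] = score
    return scores


new_patient = ("yes", "no", "mild", "yes")
scores = naive_bayes_scores(DATA, new_patient)
prediction = max(scores, key=scores.get)
```

With these counts the unnormalised scores come out as 3/125 (0.024) for Disease = yes and 1/108 (about 0.0093) for Disease = no, so the sketch predicts yes.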

Can logistic regression be applied to a multi-class classification problem?

State true or false. [1 mark]

Why are log probabilities computed instead of probabilities? [1 mark]

To make computation consistent
To factor into smaller values of probabilities
To factor into larger values of probabilities
None of these
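One practical issue behind this question is floating-point underflow: multiplying many small probabilities quickly drops below the smallest representable double and becomes 0.0, while summing their logarithms stays in a comfortable range. The 1000 likelihoods of 1e-5 below are purely illustrative values.

```python
import math

# 1000 hypothetical per-feature likelihoods, each 1e-5 (illustrative values only)
probs = [1e-5] * 1000

product = 1.0
for p in probs:
    product *= p          # repeated multiplication underflows to 0.0 in double precision

log_sum = sum(math.log(p) for p in probs)   # the sum of logs is an ordinary finite negative number
```

After the loop, `product` is exactly 0.0, so every class would look equally impossible, whereas `log_sum` is about -11512.9 and can still be compared between classes.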

1. In a linear relationship y = m*x+b, y is said to be dependent on x when: [1 mark]

m is closer to zero.
m is far from zero.
b is far from zero.
b is closer to zero.

2. In a linear relationship between y and x, y is not dependent on x when: [1 mark]

The coefficient is closer to zero.
The coefficient is far from zero.
The intercept is far from zero.
The intercept is closer to zero.

3. In a linear regression model y = w0 + w1*x, if the true relationship between y and x is y = 7.5 + 3.2x, then w0 acts as: [1 mark]

Intercepts
Coefficients
Estimators
Residuals
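A small least-squares sketch showing which parameter plays which role: fitting y = w0 + w1*x to noise-free points generated from y = 7.5 + 3.2x recovers w0 close to 7.5 (the value of y at x = 0) and w1 close to 3.2 (the slope multiplying x). The sample x values are arbitrary.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = w0 + w1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx              # slope (coefficient of x)
    w0 = y_mean - w1 * x_mean   # intercept
    return w0, w1


xs = [0, 1, 2, 3, 4]
ys = [7.5 + 3.2 * x for x in xs]   # noise-free points on y = 7.5 + 3.2x
w0, w1 = fit_simple_linear_regression(xs, ys)
```

The same reasoning applies to questions 1 and 2: whether y depends on x is governed by how far the coefficient w1 (m) is from zero, not by the intercept.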

Question 2.

The following backpropagation network uses an activation function called leaky ReLU, which generates output = input if input >= 0, and output = 0.1 * input if input < 0. At a particular iteration, the weights are indicated in the following figure. Training error is given by E = 0.5*(t - y)^2, where t is the target output and y is the actual output from the network. What are the outputs of the hidden nodes and the actual final output y from the network with x1 = x2 = 1? What will be the weights w31 and w12 in the next iteration with learning rate = 0.1, x1 = x2 = 1, and target output t = 0? Assume the derivative of the activation function = 0 at input = 0, and zero bias at all nodes. [1+1+1+1.5+2.5=7 marks]
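The figure with the network's weights is not reproduced here, so the sketch below only covers the pieces defined in the text: the leaky ReLU (slope 0.1 for negative inputs), its derivative with the stated convention of 0 at input 0, and gradient-descent updates for a weight into the output node and a weight into a hidden node under E = 0.5*(t - y)^2. It assumes a single output node; function names are hypothetical and the numbers used for checking are placeholders, not the exam's weights.

```python
def leaky_relu(z):
    return z if z >= 0 else 0.1 * z


def leaky_relu_grad(z):
    # Convention from the question: the derivative is taken as 0 at z == 0
    if z > 0:
        return 1.0
    if z < 0:
        return 0.1
    return 0.0


def output_delta(y, t, net_out):
    """dE/d(net_out) for E = 0.5*(t - y)**2 with a leaky-ReLU output node."""
    return (y - t) * leaky_relu_grad(net_out)


def update_output_weight(w, hidden_out, delta_out, lr=0.1):
    # dE/dw = delta_out * (activation feeding this weight)
    return w - lr * delta_out * hidden_out


def update_hidden_weight(w, x_in, net_hidden, w_hidden_to_out, delta_out, lr=0.1):
    # Backpropagate through the (pre-update) hidden-to-output weight:
    # delta_hidden = delta_out * w(hidden->out) * f'(net_hidden)
    delta_hidden = delta_out * w_hidden_to_out * leaky_relu_grad(net_hidden)
    return w - lr * delta_hidden * x_in
```

For example, with hidden output 1, output-node input 2 (so y = 2), target t = 0 and learning rate 0.1, the output delta is 2 and a weight of 0.5 into the output node becomes 0.5 - 0.1 * 2 * 1 = 0.3.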

Question 3.

Consider training a boosting classifier using decision stumps on the following data set:

1. Circle the examples whose weights will be increased at the end of the first iteration. [2 marks]

2. How many iterations will it take to achieve zero training error? Explain. [3 marks]
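The data set for this question is in a figure that is not reproduced here, but the reweighting rule that decides which examples gain weight can be sketched. Assuming the standard AdaBoost update with labels in {+1, -1} and weights that sum to 1, examples misclassified by the current stump are multiplied by e^alpha and correctly classified ones by e^(-alpha), then all weights are renormalised. Misclassified examples are therefore exactly the ones whose weights increase. The four-example call below is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {+1, -1}.

    Assumes the weights sum to 1 and the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - error) / error)   # vote weight of this stump
    # t * p is +1 for a correct prediction and -1 for a mistake,
    # so mistakes are scaled by e^alpha and correct examples by e^-alpha
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

With four equally weighted examples and one mistake (weighted error 0.25), alpha = 0.5 ln 3 (about 0.549); after renormalising, the misclassified example holds weight 0.5 while each correctly classified one drops to 1/6.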

A new mobile phone service chain would like to open 20 service centres in Bangalore. Each service centre should cover at least one shopping centre and 5,000 households with an annual income over 75,000. Design a scalable algorithm that decides the locations of the service centres, taking all the aforementioned constraints into consideration. [5 marks]

Question 4.

In a clinical trial, the height and weight of patients are recorded as shown in the table below. For an incoming patient with weight = 58 kg and height = 180 cm, classify the patient as Under-weight or Normal using the KNN algorithm with K = 3. [5 marks]

Weight (in Kg)	Height (in cm)	Class
61	190	Under-weight
62	182	Normal
57	185	Under-weight
51	167	Under-weight
69	176	Normal
56	174	Under-weight
60	173	Normal
55	172	Normal
65	172	Normal
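A minimal k-NN sketch for this question, assuming plain Euclidean distance on the raw (unscaled) weight and height values and a simple majority vote. Scaling the features first could change the neighbours.

```python
import math
from collections import Counter

# (weight in kg, height in cm, class) rows from the table above
TRAIN = [
    (61, 190, "Under-weight"),
    (62, 182, "Normal"),
    (57, 185, "Under-weight"),
    (51, 167, "Under-weight"),
    (69, 176, "Normal"),
    (56, 174, "Under-weight"),
    (60, 173, "Normal"),
    (55, 172, "Normal"),
    (65, 172, "Normal"),
]


def knn_classify(train, query, k=3):
    """Majority vote among the k training rows closest to `query` (unscaled Euclidean distance)."""
    distances = sorted((math.dist((w, h), query), label) for w, h, label in train)
    neighbours = distances[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours


prediction, neighbours = knn_classify(TRAIN, (58, 180), k=3)
```

For (58 kg, 180 cm) the three nearest rows are (62, 182) Normal at a distance of about 4.47, (57, 185) Under-weight at about 5.10 and (56, 174) Under-weight at about 6.32, so the vote gives Under-weight.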

Question 5.

Consider the following data, where x1 and x2 are the features:

Positive Points: {(3, 1), (5, 2), (1, 1), (2, 2), (6, -1)}

Negative Points: {(-3, 1), (-2, 2), (0, 3), (-3, 4), (-1, 5)}

Derive the equation of the hyperplane and compute the model parameters. [7 marks]
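One way to check an answer for a hard-margin linear SVM is to verify the KKT conditions. Looking at the two point sets, the closest pair across the classes is (0, 3) and the segment between (1, 1) and (2, 2), which suggests (as an assumption to be checked, not a given) that these three points are the support vectors. With dual coefficients alpha = 2/9, 2/9, 4/9, the sketch recomputes w = sum of alpha_i * y_i * x_i and b, then asserts that every constraint holds.

```python
from fractions import Fraction as F

POS = [(3, 1), (5, 2), (1, 1), (2, 2), (6, -1)]
NEG = [(-3, 1), (-2, 2), (0, 3), (-3, 4), (-1, 5)]

# Assumed support vectors (point, label y, dual coefficient alpha), found by hand
SUPPORT = [((1, 1), +1, F(2, 9)), ((2, 2), +1, F(2, 9)), ((0, 3), -1, F(4, 9))]

# Stationarity: w = sum_i alpha_i * y_i * x_i
w = [sum(a * y * x[d] for x, y, a in SUPPORT) for d in range(2)]
# b from any support vector: y_s * (w . x_s + b) = 1  =>  b = y_s - w . x_s
x_s, y_s, _ = SUPPORT[0]
b = y_s - (w[0] * x_s[0] + w[1] * x_s[1])


def f(x):
    return w[0] * x[0] + w[1] * x[1] + b


# Remaining KKT checks for the hard-margin SVM
assert sum(a * y for _, y, a in SUPPORT) == 0        # sum_i alpha_i * y_i = 0
assert all(a > 0 for _, _, a in SUPPORT)             # alpha_i >= 0
assert all(f(x) >= 1 for x in POS)                   # positives on or beyond the +1 margin
assert all(f(x) <= -1 for x in NEG)                  # negatives on or beyond the -1 margin
assert all(y * f(x) == 1 for x, y, _ in SUPPORT)     # support vectors lie exactly on the margin
```

If all assertions pass, the sketch gives w = (2/3, -2/3) and b = 1, i.e. the hyperplane (2/3)x1 - (2/3)x2 + 1 = 0 (equivalently x1 - x2 + 1.5 = 0), with margin width 2/||w|| = 3/√2, about 2.12.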

----------------------------------------------------------------------------

All the messages below are just forwarded messages if some one feels hurt about it please add your comments we will remove the post. Host/author is not responsible for these posts.