Tuesday, April 11, 2023

Decaying Window

This algorithm allows you to identify the most popular elements (trending, in other words) in an incoming data stream.


The decaying window algorithm not only tracks the most recurring elements in an incoming data stream, but also discounts any random spikes or spam requests that might have boosted an element’s frequency. In a decaying window, you assign a score or weight to every element of the incoming data stream. Further, you need to calculate the aggregate sum for each distinct element by adding all the weights assigned to that element. The element with the highest total score is listed as trending or the most popular.


  1. Assign each element with a weight/score.
  2. Calculate aggregate sum for each distinct element by adding all the weights assigned to that element.


In a decaying window algorithm, you assign more weight to newer elements. For a new element, you first reduce the weight of all the existing elements by a constant factor k and then assign the new element with a specific weight. The aggregate sum of the decaying exponential weights can be calculated using the following formula:

t1i=0ati(1c)i

Here, c is usually a small constant of the order  or . Whenever a new element, say at+1 , arrives in the data stream you perform the following steps to achieve an updated sum:

1. Multiply the current sum/score by the value (1−c).
2. Add the weight corresponding to the new element.

Generally i is considered as 1 

Weight decays exponentially over time

In a data stream consisting of various elements, you maintain a separate sum for each distinct element. For every incoming element, you multiply the sum of all the existing elements by a value of (1−c). Further, you add the weight of the incoming element to its corresponding aggregate sum.

A threshold can be kept to, ignore elements of weight lesser than that.

Finally, the element with the highest aggregate score is listed as the most popular element.

Example

For example, consider a sequence of twitter tags below:
fifa, ipl, fifa, ipl, ipl, ipl, fifa

Also, let's say each element in sequence has weight of 1.
Let's c be 0.1
The aggregate sum of each tag in the end of above stream will be calculated as below:

fifa

fifa - 1 * (1-0.1) = 0.9
ipl - 0.9 * (1-0.1) + 0 = 0.81 (adding 0 because current tag is different than fifa)
fifa -  0.81 * (1-0.1) + 1 = 1.729 (adding 1 because current tag is fifa only)
ipl - 1.729 * (1-0.1) + 0 = 1.5561
ipl - 1.5561 * (1-0.1) + 0 = 1.4005
ipl - 1.4005 * (1-0.1) + 0 = 1.2605
fifa -  1.2605 * (1-0.1) + 1 = 2.135

ipl

fifa - 0 * (1-0.1) = 0
ipl - 0 * (1-0.1) + 1 = 1 
fifa -  1 * (1-0.1) + 0 = 0.9 (adding 0 because current tag is different than ipl)
ipl - 0.9 * (1-0.01) + 1 = 1.81
ipl - 1.81 * (1-0.01) + 1 = 2.7919
ipl - 2.7919 * (1-0.01) + 1 = 3.764
fifa - 3.764 * (1-0.01) + 0  = 3.7264

In the end of the sequence, we can see the score of fifa is 2.135 but ipl is 3.7264
So, ipl is more trending then fifa
Even though both of them occurred same number of times in input there score is still different.

Advantages of Decaying Window Algorithm:

  1. Sudden spikes or spam data is taken care.
  2. New element is given more weight by this mechanism, to achieve right trending output.

Similar Blog : Nitin Decaying Window

---------------------------------------------------------------------------- 
All the messages below are just forwarded messages if some one feels hurt about it please add your comments we will remove the post. Host/author is not responsible for these posts.

No comments:

Post a Comment