May 12, 2015

Binary Indexed Tree (BIT) Part-1



In this series we’re exploring the data structure - Binary Indexed Tree (BIT)  [Fenwick tree], BIT is used for storing frequencies and manipulating cumulative frequencies.
This series includes 3 parts. This is the first part of the series. 1. Binary Indexed Tree (BIT) Part-1 2. Binary Indexed Tree (BIT) Part-2 3. Binary Indexed Tree (BIT) Part-3 [In progress]

Part 1


Consider a frequency table F, that stores frequencies of numbers in the range [1...N].  Such a table can be represented using a one dimensional array.

F[i], 1 <= i <= N
F[i] stores frequency of number i.

In this representation frequency of a number i is stored at index i, this defines a one-to-one mapping between number and index.

Below is a sample frequency table F storing frequencies of number in the range [1..16]






For example element at index 5 stores frequency of number 5, which is 8.

Since we use number itself as table index and since the numbers ranges from 1 to N, we will not use 0th index of the table.

The problem we are trying to solve is - given such a table, we want answer following type of queries:

Query #1: Find cumulative frequency at a number m inclusive.
  int CumFreq(m)

Query #2: Update frequency of a number m by x.
  void Update(m, x)

Query #3: Find/Read frequency of a number m.
  int Read(m)

Query #4: Find a number with given frequency f.
  int FindByFrequency(f)

Naive apporach 1: Using F table


Since table F is an array which is indexed using number,  type  #2 and #3 queries can be answered in O(1) time.

#2 Update the frequency of number m by x
  Action<in, int> Update = (m, x) => { F[m] + x; }

#3 Read frequency of a number
  Func<int> Read = (m) => { return F[m]; }

The cumulative frequency at a number m can be calculated by adding the frequency of number m to the cumulative frequency of the previous number m - 1. The cumulative frequency for the number 1 is same as its frequency since there is no cumulative frequency before it.

By just using frequency table F, achieving #1 requires iterating through the table from 1 to m

             i=m  
CF(m) = Σ F(i)
             i=1

Func<int> CumFreq = (m) => {
 int cf = 0;
 for (int i = 1; i <= m; i++) { cf += F[i]; }
 return cf;
};

Using this method to find the cumulative frequency takes O(N).

Naive approach 2: Using CF table

Another approach is maintaining a table CF, where CF[j] stores the cumulative frequency at number j.












This approach will take O(N) time for pre-computing the CF table. 
The value stored at index j of table CF is:

            j  
CF(j) = Σ F[a]
           a=1

Using CF table type #1 and #3 queries can be answered in O(1).

#1 Cumulative frequency at number m

Func<int> CumFreq = (m) => { return CF[m]; }

#3 Find frequency of a number m
Func<int> Read = (m) => {
 if (m == 1) { return CF[m]; }
 return CF[m] - CF[m - 1];
}

#2 type queries takes O(N) in this case. Updating frequency of a number m by x requires adding x to cumulative frequency of all numbers following m, inclusive.

Action<int, int> Update = (m, x) => {
 for (int i = m; i < CF.Length; i++) { CF[i] += x; }
};

Comparison of naive approaches with BIT

Before we start exploring BIT, let’s compare running time of above two naive approach with upcoming BIT approach so we know what BIT guarantee.

Query
Using F table
Using CF table
Using BIT
int Read(int m)
O(1)
O(1)
O(log(N))
void Update(int m, int x)
O(1)
O(N)
O(log(N))
int CumFreq(int m)
O(N)
O(1)
O(log(N))




Notes:
  1. [1..N] is the range of numbers whose frequencies we are manipulating.
  2. N is the max value for ‘number’
  3. 1 <= m <= N


No comments:

Post a Comment