Introduction

The previous post introduced the Apriori algorithm and pointed out its major disadvantages. This article presents a more advanced method, the FP Growth algorithm. We will walk through the whole process of FP Growth and explain why it is better than Apriori.

Why is it good?

Recall from the previous post that the two major shortcomings of the Apriori algorithm are:

  • The set of candidate itemsets can be extremely large
  • Counting support is expensive, since the itemset database has to be scanned over and over again
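
To make the two costs concrete, here is a minimal sketch. It is not Apriori itself (there is no pruning step); it only shows how fast the candidate space grows and that every round of support counting is a full pass over the database. The toy `transactions` list and the helper name `count_support` are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def count_support(candidates, db):
    """Count each candidate's support with one full pass over the database."""
    return {c: sum(1 for t in db if c <= t) for c in candidates}


items = sorted(set().union(*transactions))

# Without pruning, the number of possible k-itemsets over n items is C(n, k).
n_pairs = comb(len(items), 2)          # 6 items -> 15 candidate pairs
n_triples_100_items = comb(100, 3)     # 100 items -> 161700 candidate triples

# Every level of candidates requires scanning the whole database again.
pair_candidates = [frozenset(p) for p in combinations(items, 2)]
pair_support = count_support(pair_candidates, transactions)
```

Even on this tiny example, counting pairs touches all five transactions for each of the 15 candidates, and the same full scan is repeated for every larger itemset size.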

To overcome these challenges, the biggest breakthrough of FP Growth is that

No candidate generation is required!

Both problems of Apriori can be addressed by leveraging the FP tree. To be more specific, the number of itemsets is no longer a problem, since all the data is stored in a much more compact form. Moreover, there is no need to scan the database over and over again. Instead, traversing the FP tree does the same job more efficiently.

FP Tree

The FP tree is the core concept of the whole FP Growth algorithm. Briefly speaking, the FP tree is a compressed representation of the itemset database. The tree structure not only preserves the itemsets in the database but also keeps track of the associations between items.

The tree is constructed by taking each itemset, one at a time, and mapping it to a path in the tree. The whole idea behind this construction is that

More frequently occurring items will have better chances of sharing nodes
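
The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `FPNode`, `build_fp_tree`, and `count_nodes`, the toy `transactions`, and the `min_support` threshold are all chosen here for the example. Items in each itemset are sorted by descending frequency before insertion, so the most frequent items sit near the root where paths can overlap.

```python
from collections import Counter


class FPNode:
    """One node of the FP tree: an item, a count, and links to parent/children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: map each transaction to a path in the tree.
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1   # a shared prefix just increments the counter
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical toy database of 10 itemsets.
transactions = [
    ["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"],
    ["a", "b", "c"], ["a", "b", "c", "d"], ["a"], ["a", "b", "c"],
    ["a", "b", "d"], ["b", "c", "e"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
total_items = sum(len(t) for t in transactions)   # 29 item occurrences
tree_size = count_nodes(root)                     # 14 nodes in the tree
```

In this example the 29 item occurrences in the database collapse into 14 tree nodes, because itemsets that start with the same frequent items share the same prefix path.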

We then mine the tree recursively to get the frequent patterns. Pattern growth, which gives the algorithm its name, is achieved by concatenating each suffix pattern with the frequent patterns generated from its conditional FP tree.
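
A compact sketch of this recursive mining step is shown below. It is one possible way to write it, under assumptions made for this example: the names `FPNode`, `build_tree`, and `fp_growth` are hypothetical, transactions are passed as `(items, count)` pairs so that the same function can build both the initial tree and every conditional tree, and a simple `header` dictionary stands in for the linked header table used in fuller implementations.

```python
from collections import Counter, defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP tree from (items, count) pairs.

    Returns a header table (item -> nodes carrying that item)
    and the support of every frequent item.
    """
    counts = Counter()
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, n in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) pairs by recursive pattern growth."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, support in frequent.items():
        # Grow the pattern: current suffix plus one frequent item.
        pattern = suffix | {item}
        yield pattern, support

        # Conditional pattern base: the prefix path above every node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))

        # Mine the conditional FP tree built from that base.
        yield from fp_growth(base, min_support, pattern)


# Hypothetical toy database; every itemset starts with weight 1.
db = [
    ["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"],
    ["a", "b", "c"], ["a", "b", "c", "d"], ["a"], ["a", "b", "c"],
    ["a", "b", "d"], ["b", "c", "e"],
]
patterns = dict(fp_growth([(t, 1) for t in db], min_support=2))
```

Here `patterns` maps each frequent itemset (a `frozenset`) to its support. No candidate itemsets are generated and tested against the database; every pattern comes from items that are already frequent inside some conditional tree.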


FP Growth: Frequent Pattern Generation in Data Mining with Python Implementation