We introduced the Apriori algorithm and pointed out its major disadvantages in the previous post. In this article, we will walk through a more advanced method, the FP Growth algorithm, step by step and explain why it is better than Apriori.
Recall from the previous post that the two major shortcomings of the Apriori algorithm are the huge number of candidate itemsets it generates and the repeated scans of the database.
To overcome these challenges, the biggest breakthrough of FP Growth is that
No candidate generation is required!
All the problems of Apriori can be solved by leveraging the FP tree. To be more specific, the number of itemsets is no longer a problem, since all the data is stored in a far more compact form. Moreover, there is no need to scan the database over and over again; traversing the FP tree does the same job more efficiently.
The FP tree is the core concept of the whole FP Growth algorithm. Briefly speaking, the FP tree is a compressed representation of the transaction database. The tree structure not only preserves the itemsets in the database but also keeps track of the associations between itemsets.
The tree is constructed by taking each itemset and mapping it to a path in the tree one at a time. The whole idea behind this construction is that
More frequently occurring items will have better chances of sharing branches near the root of the tree
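The two database scans described above can be sketched in plain Python. This is a minimal illustration, not a production implementation; names like `FPNode` and `build_fp_tree` are my own, not from any particular library. The first scan counts item frequencies; the second inserts each transaction as a path, with items sorted by descending frequency so that frequent items share prefixes near the root.

```python
from collections import defaultdict

class FPNode:
    """One node of the FP tree (illustrative sketch)."""
    def __init__(self, item, parent):
        self.item = item          # item stored at this node (None for the root)
        self.count = 1            # number of transactions passing through here
        self.parent = parent
        self.children = {}        # item -> FPNode

def build_fp_tree(transactions, min_support):
    # First scan: count how many transactions contain each item.
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1
    # Keep only frequent items.
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)    # header table: item -> nodes holding that item

    # Second scan: map each transaction to a path, items sorted by
    # descending frequency so frequent items share prefixes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            if item in node.children:
                node.children[item].count += 1
            else:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header, freq

transactions = [
    ["bread", "milk", "beer"],
    ["bread", "milk"],
    ["milk", "beer"],
    ["bread", "milk"],
]
root, header, freq = build_fp_tree(transactions, min_support=2)
```

On this toy data, "milk" (the most frequent item) ends up as a single node directly under the root with count 4, and all four transactions share that prefix, which is exactly the compression the sorted insertion is designed to achieve.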
We then mine the tree recursively to get the frequent patterns. Pattern growth, which gives the algorithm its name, is achieved by concatenating the frequent patterns generated from the conditional FP trees.
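The recursive mining step can be sketched as follows. To keep the example short, this sketch represents each conditional FP tree by its conditional database (lists of prefix itemsets with weights) instead of a physical tree; the recursion structure, growing a suffix pattern one frequent item at a time, is the same. All names here are illustrative, not from a specific library.

```python
from collections import defaultdict

def mine_patterns(transactions, min_support):
    """Pattern-growth sketch: recurse on conditional databases."""
    # First scan: fix a global ordering by descending item frequency.
    freq = defaultdict(int)
    for t in transactions:
        for i in set(t):
            freq[i] += 1
    order = {i: (-c, i) for i, c in freq.items()}

    def grow(db, suffix):
        # db is a list of (itemset, weight) pairs; count item supports.
        counts = defaultdict(int)
        for items, w in db:
            for i in items:
                counts[i] += w
        found = {}
        for item, count in counts.items():
            if count < min_support:
                continue
            # Concatenate the frequent item onto the current suffix.
            pattern = tuple(sorted(suffix + (item,)))
            found[pattern] = count
            # Conditional database: for each transaction containing `item`,
            # keep only the items that precede it in the global order.
            cond = [
                (tuple(i for i in items if order[i] < order[item]), w)
                for items, w in db if item in items
            ]
            cond = [(p, w) for p, w in cond if p]
            found.update(grow(cond, pattern))
        return found

    db = [(tuple(set(t)), 1) for t in transactions]
    return grow(db, ())

transactions = [
    ["bread", "milk", "beer"],
    ["bread", "milk"],
    ["milk", "beer"],
    ["bread", "milk"],
]
patterns = mine_patterns(transactions, min_support=2)
```

Each recursive call conditions on one item and only ever looks at items that come before it in the global frequency order, so every frequent pattern is generated exactly once and infrequent extensions are pruned immediately, with no candidate generation step.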
#python #machine-learning #data-science #data-mining #fp-growth