In this article, association analysis will be studied using the Orange Data Mining tool. The Apriori algorithm will be utilized for creating association rules. Algorithm steps will be shown on a small set of market shopping data.

Association Rules

Association analysis tries to uncover if-then rules hidden within a dataset. It usually yields good results with categorical data. The most common example of association analysis is basket analysis. It is also used in a wide range of other areas, such as bioinformatics, disease diagnosis, web mining and text mining.

Basket Analysis

In basket analysis, we keep the products bought by each shopper in a list and ask which products tend to be bought together.

The Data

Let’s say we have a dataset consisting of 5 transactions in a market:

1 Bread, Milk

2 Bread, Tea, Coffee, Eggs

3 Milk, Tea, Coffee, Coke

4 Bread, Milk, Tea, Coffee

5 Bread, Milk, Tea, Coke

In this dataset, three of the four shoppers who buy Tea also buy Coffee. Now, let’s show the dataset using one-hot encoding. The dataset can be downloaded from here.


One-hot encoding
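
One-hot encoding turns each transaction into a row with one 0/1 column per product. The following is a minimal Python sketch of that idea for the five transactions above. It uses only the standard library, and the column order (order of first appearance) is an arbitrary choice made for this example, not something prescribed by Orange.

```python
# The five example transactions from the article.
transactions = [
    ["Bread", "Milk"],
    ["Bread", "Tea", "Coffee", "Eggs"],
    ["Milk", "Tea", "Coffee", "Coke"],
    ["Bread", "Milk", "Tea", "Coffee"],
    ["Bread", "Milk", "Tea", "Coke"],
]

# Collect the distinct products in order of first appearance
# (an arbitrary column order chosen for this sketch).
products = []
for basket in transactions:
    for product in basket:
        if product not in products:
            products.append(product)

# One row per transaction: 1 if the product was bought, 0 otherwise.
one_hot = [
    [1 if product in basket else 0 for product in products]
    for basket in transactions
]

# Print the encoded table with a transaction ID column.
print("ID  " + "  ".join(f"{p:>6}" for p in products))
for tid, row in enumerate(one_hot, start=1):
    print(f"{tid:<4}" + "  ".join(f"{v:>6}" for v in row))
```

Each row sums to the number of products in that basket, which is a quick way to check the encoding.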

Some Definitions on Association Rules

**Product list:** A set of products that appear together in a basket, e.g. {Bread, Milk, Eggs}.

**Support count (σ):** The number of transactions that contain every product in the product list, e.g. σ({Milk, Tea, Coffee}) = 2

**Support rate (s):** The fraction of all transactions that contain the product list, e.g. s({Milk, Tea, Coffee}) = 2/5

**Frequent product list:** A product list whose support rate is at or above a specified threshold value.
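
The definitions above can be checked by hand on the five transactions. The sketch below does so in plain Python. The helper names `support_count`, `support_rate` and `is_frequent` are hypothetical names chosen for this example and are not part of Orange.

```python
# The five example transactions, stored as sets for subset tests.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]


def support_count(product_list, transactions):
    """Number of transactions containing every product in product_list."""
    wanted = set(product_list)
    return sum(1 for basket in transactions if wanted <= basket)


def support_rate(product_list, transactions):
    """Fraction of all transactions that contain product_list."""
    return support_count(product_list, transactions) / len(transactions)


def is_frequent(product_list, transactions, min_support):
    """True if the support rate reaches the chosen threshold."""
    return support_rate(product_list, transactions) >= min_support


example = {"Milk", "Tea", "Coffee"}
print(support_count(example, transactions))  # 2
print(support_rate(example, transactions))   # 0.4
print(is_frequent(example, transactions, min_support=0.6))  # False
```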

There is more information here on association rules. In this blog post, I will show how to apply association rules using the Orange tool.

Apriori Algorithm

The Apriori algorithm is one of the most widely used algorithms in basket analysis. The algorithm starts by specifying a threshold value. For example, let’s set the minimum support threshold to 60%.

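Step 1: List every product together with its frequency, that is, the number of transactions it appears in, and identify the products with the highest frequency. Multiply the number of transactions by the threshold value (5 × 0.6 = 3) and remove the products whose frequency is below the value you find. In our data, Eggs (1) and Coke (2) are removed.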

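Step 2: Form pairs from the products that remain, count the transactions that contain each pair, and apply the same rule: multiply the number of transactions by the threshold value and remove the pairs whose count is below the value you find.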
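
The first two passes can be sketched in plain Python on the five transactions from this article. This is a minimal illustration under two assumptions: the cut-off count is the threshold times the number of transactions (5 × 60% = 3), and the second pass builds pairs only from the products that survive the first pass, as in the standard Apriori formulation. The variable names are chosen for this example, and the threshold is kept as an integer percentage so that the comparison is exact.

```python
from collections import Counter
from itertools import combinations

# The five example transactions, stored as sets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Tea", "Coffee", "Eggs"},
    {"Milk", "Tea", "Coffee", "Coke"},
    {"Bread", "Milk", "Tea", "Coffee"},
    {"Bread", "Milk", "Tea", "Coke"},
]
min_support_percent = 60  # threshold from the article (60%)
n = len(transactions)


def meets_threshold(count):
    """True if count / n >= 60%, compared with integers to avoid rounding."""
    return count * 100 >= min_support_percent * n


# Step 1: count each single product and keep those that reach the threshold.
item_counts = Counter(p for basket in transactions for p in basket)
frequent_items = sorted(p for p, c in item_counts.items() if meets_threshold(c))

# Step 2: build candidate pairs from the surviving products only,
# count the transactions containing both products, and filter again.
pair_counts = {
    pair: sum(1 for basket in transactions if set(pair) <= basket)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = [pair for pair, c in pair_counts.items() if meets_threshold(c)]

print(dict(item_counts))
print(frequent_items)
print(pair_counts)
print(frequent_pairs)
```

Building pairs only from the surviving products is what keeps the search small: a pair that contains an infrequent product cannot itself be frequent, so it never needs to be counted.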

