Association Rules Mining

(srikant95mining.pdf) (techopedia, n.d.)

Done by KURAKULA AJAY BABU
Student ID: 13472219

Contents:
1. Introduction
2. What is Associative Rules Mining
3. Apriori Algorithm
4. Algorithms
5. Types of Association Rules
   5.1. Multidimensional Association Rules
   5.2. Quantitative Associative Rules
   5.3. Sequential Pattern Mining
6. Applications
7. Conclusion
8. References

Abstract:
Association rules mining is one of the most widely used techniques in data mining. It searches for interesting relationships among items in a given data set, especially in transactional databases. This report investigates what association rules mining is, its application areas, its variants, and so on. The problem of discovering association rules has received considerable research attention, and several fast algorithms for mining association rules have been developed. In practice, users are often interested in only a subset of association rules. For example, they may only want rules that contain a specific item, or rules that contain children of a specific item in a hierarchy. While such constraints can be applied as a post-processing step, integrating them into the mining algorithm can dramatically reduce the execution time.

1. Introduction

Data mining, also called knowledge discovery in databases, has been recognized as a new area for database research. The area is concerned with finding interesting rules in large sets of data. Given a set of transactions, where each transaction is a set of items, an association rule is an expression X => Y, where X and Y are sets of items. The meaning of such a rule is that transactions which contain the items in X tend to also contain the items in Y. An example of such a rule is that 98% of people who buy tires and auto accessories also buy some automotive services; the 98% here is called the confidence of the rule. The percentage of transactions that contain both X and Y is called the support of the rule X => Y. The problem of mining association rules is to find all the rules that satisfy a user-specified minimum support and minimum confidence.
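The definitions of support and confidence above can be checked on a small example. The following is a minimal sketch in Python; the five transactions and the helper names (count, support, confidence) are made up for illustration and do not come from the cited sources.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))


def support(x, y, transactions):
    """Support of X => Y: share of all transactions containing both X and Y."""
    return count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y: share of the transactions containing X that also contain Y."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


# Hypothetical toy database (assumption, for illustration only).
transactions = [
    {"tires", "auto accessories", "automotive services"},
    {"tires", "auto accessories", "automotive services"},
    {"tires", "auto accessories"},
    {"tires"},
    {"automotive services"},
]

x = {"tires", "auto accessories"}
y = {"automotive services"}

rule_support = support(x, y, transactions)        # 2 of 5 transactions
rule_confidence = confidence(x, y, transactions)  # 2 of the 3 transactions with X

print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
# prints: support = 0.40, confidence = 0.67
```

Note that confidence is undefined when no transaction contains X; this sketch would raise a ZeroDivisionError in that case.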

The applications of association rules include attached mailing, catalog design, cross-marketing, store layout, loss-leader analysis, and customer segmentation based on buying patterns.

In most cases, taxonomies over the items are available. As a simple example of using a taxonomy, we may infer that people who buy outerwear tend to buy hiking boots from the fact that people bought jackets with hiking boots and ski pants with hiking boots. Also, Outerwear => Hiking Boots may be a valid rule while a related rule at another level of the taxonomy, such as Clothes => Hiking Boots, is not, because the related rule may lack minimum support or minimum confidence.

Clothes
    Outerwear
        Jackets
        Ski pants
    Shirts
Footwear
    Shoes
    Hiking boots

Rules are mostly mined over the leaf-level nodes of the taxonomy rather than the parent nodes. However, finding rules across different levels of the taxonomy is valuable because:
1.) Taxonomies can be used to prune uninteresting or redundant rules.
2.) Rules at lower levels may not have minimum support.
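One common way to mine rules above the leaf level is to add the ancestors of every item to each transaction before counting. The sketch below assumes that approach; the PARENT mapping encodes the taxonomy shown above, while the six transactions and all function names are hypothetical.

```python
# Taxonomy from the figure above, stored as child -> parent.
PARENT = {
    "Jackets": "Outerwear",
    "Ski pants": "Outerwear",
    "Outerwear": "Clothes",
    "Shirts": "Clothes",
    "Shoes": "Footwear",
    "Hiking boots": "Footwear",
}


def ancestors(item):
    """All ancestors of `item` in the taxonomy."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def extend(transaction):
    """Transaction plus the ancestors of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    return frozenset(extended)


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


# Made-up transactions (assumption, for illustration only).
transactions = [
    {"Jackets", "Hiking boots"},
    {"Ski pants", "Hiking boots"},
    {"Shirts"},
    {"Shoes"},
    {"Jackets"},
    {"Shirts", "Shoes"},
]
extended = [extend(t) for t in transactions]
n = len(extended)

leaf_support = count({"Jackets", "Hiking boots"}, extended) / n
general_support = count({"Outerwear", "Hiking boots"}, extended) / n
conf_outerwear = count({"Outerwear", "Hiking boots"}, extended) / count({"Outerwear"}, extended)
conf_clothes = count({"Clothes", "Hiking boots"}, extended) / count({"Clothes"}, extended)

print(f"support(Jackets, Hiking boots)      = {leaf_support:.2f}")
print(f"support(Outerwear, Hiking boots)    = {general_support:.2f}")
print(f"confidence(Outerwear => Hiking boots) = {conf_outerwear:.2f}")
print(f"confidence(Clothes => Hiking boots)   = {conf_clothes:.2f}")
```

On this toy data, with a minimum support of 30% and a minimum confidence of 60% (both chosen only for the illustration), the leaf-level pair falls below minimum support, Outerwear => Hiking boots passes both thresholds, and Clothes => Hiking boots falls below minimum confidence.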

Few transactions may contain hiking boots together with one particular item of clothing, so a leaf-level rule can fall below minimum support. This does not mean that mining should be limited to leaf-level comparisons: we cannot find many association rules if we are limited to the leaf level. Consider a supermarket, where hundreds of products are available, but discounts are offered only on pairs of items that many people buy together.

2. What is Associative Rules Mining?

Associative rule mining is a technique used to find frequent patterns, correlations, associations, or causal structures in data sets held in different kinds of databases, such as relational databases, transactional databases, and other forms of data storage.

Given a set of transactions, association rule mining aims to find the rules that enable us to predict the occurrence of a specific item based on the occurrences of the other items in the transaction.

Association rule mining is the data mining process of finding the rules that may govern associations and causal structures between sets of items. So, in a given transaction with multiple items, it tries to find the rules that govern how or why such items are often bought together. For example, peanut butter and jelly are often bought together because a lot of people like to make PB&J sandwiches. More surprisingly, diapers and beer are bought in combination because, as it turns out, dads are often tasked with buying the groceries while the moms stay with the baby.

The main applications of association rule mining:
•   Basket data analysis – analyzing the association of items purchased in a single basket or single purchase.
•   Cross marketing – working with other organizations that complement your own, not competitors. For example, vehicle dealerships and manufacturers run cross-marketing campaigns with oil and gas companies for obvious reasons.
•   Catalog design – the items in a business's catalog are often chosen to complement each other, so that shopping for one item leads to buying another. These items are therefore often complements or closely related.

(techopedia, n.d.)

3. Apriori Algorithm

Mining for associations among items in a large database of sales transactions is an important database mining function.

For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is represented in the association rule below:

    Keyboard => Mouse   [support = 6%, confidence = 70%]

•   Apriori pruning principle: if there is any itemset which is infrequent, its supersets should not be generated or tested.
•   Method:
    – Initially, scan the DB once to get the frequent 1-itemsets
    – Generate length (k+1) candidate itemsets from length k frequent itemsets
    – Test the candidates against the DB
    – Terminate when no frequent or candidate set can be generated

Example:

Transactional database

    TID   Items
    10    A, C, D
    20    B, C, E
    30    A, B, C, E
    40    B, E

1st scan: C1 (candidate 1-itemsets with support counts)

    Itemset   sup
    {A}       2
    {B}       3
    {C}       3
    {D}       1
    {E}       3

L1 (frequent 1-itemsets)

    Itemset   sup
    {A}       2
    {B}       3
    {C}       3
    {E}       3

C2 (candidate 2-itemsets generated from L1)

    Itemset
    {A, B}
    {A, C}
    {A, E}
    {B, C}
    {B, E}
    {C, E}

2nd scan: C2 with support counts

    Itemset   sup
    {A, B}    1
    {A, C}    2
    {A, E}    1
    {B, C}    2
    {B, E}    3
    {C, E}    2

L2 (frequent 2-itemsets)

    Itemset   sup
    {A, C}    2
    {B, C}    2
    {B, E}    3
    {C, E}    2

C3 (candidate 3-itemsets generated from L2)

    Itemset
    {B, C, E}

3rd scan: L3 (frequent 3-itemsets)

    Itemset     sup
    {B, C, E}   2

DETAILS OF APRIORI
•   Generate candidates
    – Step 1: self-joining Lk
    – Step 2: pruning
•   Count supports of candidates
•   Example of candidate generation
    – L3 = {abc, abd, acd, ace, bcd}
    – Self-joining: L3*L3
        •   abcd from abc and abd
        •   acde from acd and ace
    – Pruning:
        •   acde is removed because ade is not in L3
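The level-wise method and the two candidate-generation steps (self-join, then prune) can be sketched in Python as follows. This is a minimal illustration rather than an optimized implementation: the function names are hypothetical, and the minimum support count of 2 is an assumption inferred from the tables in the example, since the text does not state the threshold.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Build (k+1)-candidates from frequent k-itemsets by self-join and prune."""
    prev = sorted(tuple(sorted(s)) for s in frequent_k)
    prev_set = set(prev)
    candidates = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Self-join: only merge itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune: every k-subset of the candidate must itself be frequent.
            if all(sub in prev_set for sub in combinations(candidate, len(a))):
                candidates.add(frozenset(candidate))
    return candidates


def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while candidates:
        # One pass over the database per level to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        candidates = apriori_gen(level)
    return frequent


# Transactional database from the example above (TIDs 10, 20, 30, 40).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

# min_count=2 is an assumed threshold, inferred from the example tables.
frequent = apriori(transactions, min_count=2)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])

# Candidate generation on the L3 from "Details of Apriori".
l3 = [frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")]
c4 = apriori_gen(l3)
print([sorted(c) for c in c4])  # prints: [['a', 'b', 'c', 'd']]
```

Under these assumptions the sketch reproduces the tables above: four frequent 1-itemsets, four frequent 2-itemsets, and {B, C, E} with a support count of 2.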
For L3 = {abc, abd, acd, ace, bcd}, the resulting candidate set is C4 = {abcd}.

BOTTLENECK OF APRIORI
•   Challenges
    – Multiple scans of the transaction database
    – Huge number of candidates
    – Tedious workload of support counting for candidates
•   Improving Apriori: general ideas
    – Reduce the number of transaction database scans
    – Shrink the number of candidates
    – Facilitate support counting of candidates

Possible ways of improving the performance of the algorithms
•   Implementation techniques
    - Use of good data structures
    - Fast implementation of basic operations
•   Algorithm improvement
    - Finding algorithms that are more efficient
•   Use of parallel processing
•   Sampling the transaction databases

Interactive Discovery
In ARM, the user plays an important role in the process:
•   The user is responsible for setting the initial minimum support and confidence thresholds
•   During the discovery, the user may decide to further fine-tune the thresholds
•   The user can specify what items are to appear on either or both sides of the resulting rules (for different purposes), e.g. {X} -> {nappies} or {nappies} -> {X} or {Beer} -> {nappies}
•   The user can exploit a category hierarchy of some kind among items, e.g.
    instead of "Bread" -> "Coke", "Bakery products" -> "Soft drinks"

[Figure: Visualization of association: plane graph]

[Figure: Visualization of association rules (SGI/MineSet 3.0)]

Measures of Interestingness
•   play basketball => eat cereal [40%, 66.7%] is misleading
    – The overall % of students eating cereal is 75% > 66.7%.
•   play basketball => not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence
•   Another measure of interestingness: lift

                  Basketball   Not basketball   Sum (row)
    Cereal        2000         1750             3750
    Not cereal    1000         250              1250
    Sum (col.)    3000         2000             5000
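Lift compares how often X and Y occur together with how often they would occur together if they were independent: lift(X => Y) = P(X and Y) / (P(X) * P(Y)). A value below 1 indicates a negative correlation, a value above 1 a positive one. The short Python sketch below applies this definition to the counts in the table above; the function and variable names are chosen for illustration only.

```python
def lift(n_xy, n_x, n_y, n_total):
    """Lift of X => Y computed from raw counts.

    n_xy: transactions with both X and Y
    n_x, n_y: transactions with X, with Y
    n_total: all transactions
    """
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return p_xy / (p_x * p_y)


# Counts taken from the basketball/cereal table above.
TOTAL = 5000
BASKETBALL = 3000
CEREAL = 3750
NOT_CEREAL = 1250
BASKETBALL_AND_CEREAL = 2000
BASKETBALL_AND_NOT_CEREAL = 1000

lift_cereal = lift(BASKETBALL_AND_CEREAL, BASKETBALL, CEREAL, TOTAL)
lift_not_cereal = lift(BASKETBALL_AND_NOT_CEREAL, BASKETBALL, NOT_CEREAL, TOTAL)

print(f"lift(basketball => cereal)     = {lift_cereal:.2f}")      # 0.89
print(f"lift(basketball => not cereal) = {lift_not_cereal:.2f}")  # 1.33
```

The first rule has a lift below 1 even though its support and confidence are high, which is why the text calls it misleading; the second rule has a lift above 1.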