Market Basket Analysis: A Guide to Understanding Consumer Behavior in the CPG Industry

Understanding consumer behavior is crucial to making effective business decisions at any CPG company. Market Basket Analysis (MBA) is a widely used technique in the CPG industry for analyzing consumer purchasing patterns and gaining insights into consumer behavior. In this blog, we will explain what MBA is, why it matters in the CPG industry, and how it can be used to improve business decisions.

What is Market Basket Analysis?

Market Basket Analysis is a technique that analyzes customer purchase behavior to identify relationships between products. It is a data mining method that helps identify which products are frequently purchased together and which are not. MBA can reveal correlations between products that may not be immediately apparent, providing insights into consumer behavior and preferences.

The basic methodology of MBA involves analyzing transactional data to identify frequently occurring product combinations. The analysis is based on the concept of association rules, which describe the co-occurrence of items in transactions. MBA relies on three important metrics: Support, Confidence, and Lift.

Support measures how frequently an itemset appears in the transactional data. It is the proportion of transactions containing a particular itemset.

Confidence measures the likelihood that an item B is purchased when item A is purchased. It is the ratio of transactions containing both item A and B to the number of transactions containing item A.

Lift measures the strength of association between items. It is the ratio of the observed support to the expected support if the items were independent.
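To make the three definitions concrete, here is a minimal Python sketch that computes them on a handful of baskets. The transactions and the function names are made up for illustration and are not taken from any particular MBA tool.

```python
# Toy transactions: each set is one basket (hypothetical data).
transactions = [
    {"chips", "soda"},
    {"chips", "soda", "bread"},
    {"bread", "milk"},
    {"chips", "bread"},
    {"soda", "milk"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(a, b, transactions):
    """Transactions containing both A and B, divided by transactions containing A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Observed joint support divided by the support expected if A and B were independent."""
    observed = support(set(a) | set(b), transactions)
    expected = support(a, transactions) * support(b, transactions)
    return observed / expected

s = support({"chips", "soda"}, transactions)
c = confidence({"chips"}, {"soda"}, transactions)
l = lift({"chips"}, {"soda"}, transactions)
print(round(s, 2), round(c, 2), round(l, 2))  # 0.4 0.67 1.11
```

A lift above 1, as here, suggests the two items appear together more often than independence would predict; a lift below 1 suggests the opposite.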

Why is Market Basket Analysis important in the CPG industry?

MBA is essential in the CPG industry as it can provide valuable insights into consumer behavior, preferences, and buying patterns. By analyzing consumer behavior, companies can identify which products are often purchased together and which are not.

This information can help companies create more effective marketing strategies, optimize product placement, and improve product bundling. For example, if a CPG company finds that customers who buy chips are likely to buy soda as well, they can place these two products next to each other to increase sales.

Moreover, it can help CPG companies in making pricing decisions. By analyzing customer buying patterns, companies can identify which products are price-sensitive and which are not. They can then optimize pricing to increase sales and maximize profits.

For example, if a CPG company finds that customers who buy bread are likely to buy milk as well, they can offer discounts on milk to increase its sales and maximize profits.

Examples of Market Basket Analysis in the CPG industry

Market Basket Analysis has several applications in the CPG industry. Here are a few examples:

Amazon.com: Amazon.com uses MBA to identify which products are often purchased together and recommends products based on the customer’s purchase history. This helps Amazon increase sales and improve customer satisfaction.

Tesco: Tesco, a UK-based supermarket chain, uses MBA to improve store layout and optimize product placement. By analyzing customer purchase data, Tesco can identify which products are often purchased together and place them close to each other to increase sales.

Coca-Cola: Coca-Cola reportedly used MBA to identify which products are often purchased together and launched a new product line based on the analysis. Coca-Cola found that customers who bought Coke were likely to buy popcorn, so it launched a new product line that combined Coke and popcorn.

Advantages and Limitations

MBA has several advantages that make it an essential tool in the CPG industry. It is easy to use, provides valuable insights into consumer behavior, and can help improve business decisions. However, MBA has some limitations that need to be considered. The results of MBA are based on transactional data, which may not be representative of the entire customer base.

MBA also does not provide insights into why customers purchase certain products together, which can limit the usefulness of the analysis.

How to perform Market Basket Analysis

Performing MBA involves several steps, including data preparation, data analysis, and interpretation of the results. Here are some factors to consider while performing MBA:

Choose the right data: MBA is based on transactional data, so it is essential to choose the right data source. The data should be clean, reliable, and representative of the entire customer base.

Define the scope: Determine the scope of the analysis and the products or product categories to be analyzed.

Set the metrics: Set the metrics to be used in the analysis, such as support, confidence, and lift.

Choose the tool: There are several MBA tools available in the market, such as Excel, SPSS, and R. Choose the tool that best fits your needs and expertise.

Interpret the results: Interpret the results of the analysis and draw insights from the data.
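The steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it looks at product pairs only, the baskets and thresholds are invented, and `pair_rules` is a hypothetical helper rather than a function from Excel, SPSS, R, or any other tool.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Return A -> B rules for product pairs that pass both thresholds."""
    n = len(transactions)
    # Count in how many baskets each item and each pair appears.
    item_counts = Counter(item for basket in transactions for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in transactions
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        sup = count / n
        if sup < min_support:
            continue
        # Each pair can yield two rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            conf = count / item_counts[x]
            lift = conf / (item_counts[y] / n)
            if conf >= min_confidence:
                rules.append(
                    {"rule": (x, y), "support": sup, "confidence": conf, "lift": lift}
                )
    # Strongest associations first.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)

# Hypothetical transactional data, one list per basket.
baskets = [
    ["chips", "soda"],
    ["chips", "soda", "bread"],
    ["bread", "milk"],
    ["chips", "bread"],
    ["soda", "milk"],
]
rules = pair_rules(baskets, min_support=0.4, min_confidence=0.6)
for r in rules:
    print(r["rule"], round(r["support"], 2), round(r["confidence"], 2), round(r["lift"], 2))
```

Raising `min_support` or `min_confidence` trims the rule list, which is usually the first lever to pull when interpreting results on real data.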

Market Basket Analysis is a powerful technique that can help CPG companies gain valuable insights into consumer behavior and preferences. However, performing MBA can be a complex and time-consuming process that requires expertise in data analysis. This is where Explorazor comes in.

Explorazor is a data exploration tool that can help CPG enterprises quickly and easily perform MBA and other types of data analysis. With Explorazor, you can ask a query in seconds and get insights on your data, without the need for extensive data science knowledge.

Moreover, Explorazor can also perform root cause analysis to help you identify the pain points in your data and take corrective actions to improve your business operations. By using Explorazor, CPG companies can gain a competitive advantage by making data-driven decisions based on reliable insights.

Try Explorazor today and discover how it can help you gain valuable insights into your data.

CPG Jargon Buster Master Article

Hello, and welcome to the knowledge hub that is the CPG Jargon Buster Master Article!

Here you will find direct links to many relevant jargon/concepts in the CPG Industry. Each term is explained in brief below, with a link to the detailed blog at the end of it. 

We keep adding more jargon as we write about them, so be sure to bookmark this page and keep learning! We’re also creating a FANTASTIC CPG-specific product for optimal and super-easy data exploration – you might want to check Explorazor out!

So far, we have covered:

  1. ACV

ACV stands for All Commodity Volume. It is used in the calculation of %ACV. The term ‘ACV’ is often used interchangeably with %ACV, so one needs to be mindful of which one is meant.

ACV is nothing but the total monetary sales of a store. Assessing the ACV of a retailer helps suppliers know which outlet presents the best sales potential based on its business health. 

Learn how to calculate ACV using Nielsen data and how ACV relates to %ACV 

Read more: What is ACV in CPG?


  2. %ACV

%ACV, or %ACV Distribution, helps managers understand the quality of their distribution networks; this blog is more comprehensive than the ACV blog above. You might wonder why a product is not selling well in a region despite being apparently well-distributed there. A deep analysis of metrics such as %ACV will help you resolve that.

Read the blog to understand how to calculate %ACV, and the 5 points to consider when performing the calculations:

Read more: What is %ACV?


  3. Velocity

Velocity is another metric to study distribution. It measures the rate at which products move off the store shelves once they are placed there.

Managers can take charge of sales by utilizing velocity fully, and understanding the two major velocity measures – Sales per Point of Distribution (SPPD) and Sales per Million. Refer to the blog to learn what these measures are, with examples to help. As Sales per Million is a complex concept we’ve also explained it separately in another blog:

Read more: ALL About Velocity / Sales Rate in CPG


  4. Average Items Carried

This is the average number of items that a retailer carries, be it of a segment, brand, category, etc. For example, suppose that Brand X has 5 products/items under its name. Average Items Carried (AIC) is measured from the retailers’ perspective: across the stores that carry Brand X, the average could be 2, 2.5, or 4 of its products.

AIC is one of the 2 components of Total Distribution Points (TDP), the other being %ACV Distribution. The blog explains the relationship between AIC and %ACV with respect to TDP (Total Distribution Points), using examples to simplify. 

Learn why %ACV and AIC are called the width and depth in distribution, and how to calculate AIC in Excel:

Read more: What is ‘Average Items Carried’ and How Does it relate to %ACV?


  5. Total Distribution Points – Basics

Total Distribution Points, or Total Points of Distribution, is again a distribution measure, considering both %ACV and Average items Carried to produce a TDP score that helps Brand Managers understand things like product distribution and store health, and base their future strategies accordingly. 

There’s also a method for managers to know whether their brand is being represented in a fair manner on the retailer’s shelf, using TDP. Learn how to calculate TDP and the special case of TDP if %ACV is 95 or above:

Read More: Basics of Total Distribution Points (TDP) in CPG


  6. Sales per Million

How do you compare two markets where one is many times larger than the other? Does a manager simply say “It’s a smaller market, thus sales are less” and be done with it? Shouldn’t s/he investigate if the products in the smaller market are moving as fast as they are in the larger market? 

Sales per million helps compare across markets, while controlling for distribution. It accounts for the varying Market ACVs and stabilizes them, so managers can find how each product is doing in each market, regardless of market size.

Learn how to calculate Sales per Million with a cross-market comparison example following it:

Read More: Sales per Million 
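The summary above does not spell out the formula, so the sketch below assumes a commonly used definition: sales divided by the ACV (in millions) of the stores actually selling the product, that is, Market ACV multiplied by %ACV distribution. The function name and all figures are hypothetical.

```python
def sales_per_million(sales, market_acv_mn, pct_acv):
    """Sales per million of ACV in the stores where the product scans.

    Assumed definition: sales / (market ACV in millions * %ACV / 100).
    """
    acv_selling_mn = market_acv_mn * pct_acv / 100
    return sales / acv_selling_mn

# Hypothetical large and small markets.
large = sales_per_million(sales=90_000, market_acv_mn=1_000, pct_acv=90)
small = sales_per_million(sales=12_000, market_acv_mn=100, pct_acv=80)
print(large, small)  # 100.0 150.0
```

In this made-up comparison the smaller market sells far less in total, yet the product moves faster there once market size and distribution are taken into account.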


  7. Panel Data Measures

Nielsen and IRI provide the numbers for these 4 measures, and even those who do not use Nielsen/IRI need to have an understanding of household-level analysis using these 4 measures.

Here are the one-line introductions:

  1. Household Penetration

How many households are buying my product?

  2. Buying Rate

How much is each household buying?

Purchase Frequency and Purchase Size are sub-components of Buying Rate.

  3. Purchase Frequency (Trips per Buyer)

(For each household) How often do they buy my product? 

  4. Purchase Size (Sales per Trip)

(For each household) How much do they buy at one time?

These 4 measures in table format can be used by managers to understand the consumer dynamics that drive the total sales for their product.

Understand these 4 measures in detail, and how they relate to sales:

Read More: Panel Data Measures
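One common way to tie the four panel measures to total sales is sketched below. It assumes that Buying Rate equals Purchase Frequency multiplied by Purchase Size, and that total sales equal the number of buying households multiplied by Buying Rate. The function name and the numbers are hypothetical.

```python
def sales_from_panel(total_households, penetration, trips_per_buyer, sales_per_trip):
    """Decompose total sales into the four household-level measures.

    penetration      share of households buying the product (0 to 1)
    trips_per_buyer  purchase frequency per buying household
    sales_per_trip   purchase size per trip
    """
    buying_households = total_households * penetration
    buying_rate = trips_per_buyer * sales_per_trip  # sales per buying household
    return buying_households * buying_rate

# Hypothetical market: 1 million households, 20% penetration,
# 4 trips per buyer, 5.0 in sales per trip.
total_sales = sales_from_panel(1_000_000, 0.20, 4, 5.0)
print(round(total_sales))  # 4000000
```

Changing one input at a time shows which lever (more buyers, more trips, or bigger baskets) would move total sales the most.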


  8. Market Basket Analysis

Market Basket Analysis (MBA) is a powerful data mining technique used in the CPG industry to analyze customer purchase behavior and identify relationships between products.

Learn how Market Basket Analysis can help you gain valuable insights into consumer behavior in the CPG industry.

Read more on: Market Basket Analysis


  9. Point of Sale

The consumer packaged goods (CPG) industry is a highly competitive market, and companies need to make informed decisions to stay ahead.

One tool that CPG companies use to make data-driven decisions is Point of Sale (POS) data.

Learn how CPG and Pharma companies optimize their performance using Point of Sale


  10. Customer Segmentation

Customer segmentation is a technique that helps you divide your audience into distinct groups based on their characteristics, behavior, or preferences.

By doing so, enterprises can tailor their strategies to each segment’s specific needs, improving their chances of success.

Read more on: Customer Segmentation


  11. Price Elasticity of Demand

Price elasticity of demand is calculated by dividing the percentage change in the quantity demanded of a product by the percentage change in the price of that product. 

The resulting number is a measure of how sensitive the quantity of the product demanded is to changes in its price. 

The formula for calculating Price Elasticity of Demand is:

Price Elasticity of Demand = (% Change in Quantity Demanded) / (% Change in Price)

Check out our blog on how CPG companies make decisions based on Price Elasticity.
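As a quick illustration of the Price Elasticity of Demand formula, here is a small Python sketch. The function name and the price and quantity figures are hypothetical.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """(% change in quantity demanded) / (% change in price)."""
    pct_change_quantity = (q_new - q_old) / q_old * 100
    pct_change_price = (p_new - p_old) / p_old * 100
    if pct_change_price == 0:
        raise ValueError("Price did not change, so elasticity is undefined.")
    return pct_change_quantity / pct_change_price

# Hypothetical case: price rises from 2.00 to 2.20 (+10%),
# quantity falls from 1000 to 900 units (-10%).
elasticity = price_elasticity(q_old=1000, q_new=900, p_old=2.00, p_new=2.20)
print(round(elasticity, 2))  # -1.0
```

A value near -1 means quantity falls roughly in proportion to the price increase; values further from zero indicate a more price-sensitive product.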

Take an Interactive Product Tour of Explorazor Today!

What is ACV in CPG?

ACV stands for All Commodity Volume. Bear in mind that ACV is often used as a synonym for %ACV, but the two are not interchangeable: ACV is an input to the calculation of %ACV.

WHAT IS ALL COMMODITY VOLUME?

The definition of ACV is simple: It is the total monetary sales of a store. To explain further, ACV includes everything that a retailer sells in his outlet – across products, across categories. 

Thus, ACV, All Commodity Volume, is not based on the physical size of an outlet. Rather, the total business of that outlet is the yardstick of ACV.

With that clear, let us understand, 

WHY SHOULD SUPPLIERS CARE ABOUT ACV?

A CPG manager can go “Shouldn’t I be concerned about the sales of my product, and how much of my product the store sells, instead of assessing everything that the store sells?”

A fair question, and there is an answer. Assessing the ACV of a retailer helps suppliers 

  1. Know which outlet’s business health is the best
  2. Know which outlet has the maximum growth and sales potential, based on its business health trend

Essentially, ACV tells you which outlet to care most about with respect to distribution/presence, so you can optimize your distribution efforts and optimize sales.

A couple of points about the ACV calculation done by Nielsen/IRI:

The data collection done by Nielsen and IRI around ACV excludes some departments like pharmacy, gasoline, and lottery because not all stores contain these departments. This is the reason behind the variance between the numbers in annual reports and those in Nielsen/IRI data. Your data supplier may be able to provide you with a list of all the departments included in, or excluded from, the ACV calculation. This will help you reconcile the numbers.

The second point is that ACV figures are usually updated annually. 

HOW DOES ACV RELATE TO %ACV?

%ACV is nothing but ‘ACV weighted distribution’. (Read all about %ACV)

ACV is used mostly as an input to calculate %ACV. Let’s see how ACV lets us calculate %ACV:

A market contains 3 stores. As we mentioned in the %ACV blog linked above, we consider only the stores where our product has scanned, and not those stores where our product is not moving off the shelf. 

Store      ACV (millions)   Did Product Y scan/sell here?
Store 1    20               Yes
Store 2    40               No
Store 3    80               Yes

Total Market ACV: 

20 + 40 + 80 = 140 Million

Distribution for Product Y can be measured in two ways: Weighted (%ACV) and Unweighted.

Unweighted distribution for Product Y is the % of stores in which the product is selling/scanned. This would be 2 stores out of 3, hence we arrive at a distribution of 67%.

Weighted distribution for Product Y (%ACV) =  Total ACV of stores where Product Y is sold/scanned divided by Total ACV across all stores.

(20 + 80) divided by 140.

Converting into a percentage, it’s 71.43%.
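The same three-store example can be checked with a short Python sketch. The data structure and function names are only illustrative.

```python
# The three stores from the example: ACV in millions and whether Product Y scanned.
stores = [
    {"name": "Store 1", "acv_mn": 20, "scanned": True},
    {"name": "Store 2", "acv_mn": 40, "scanned": False},
    {"name": "Store 3", "acv_mn": 80, "scanned": True},
]

def unweighted_distribution(stores):
    """Percentage of stores where the product scanned."""
    return 100 * sum(1 for s in stores if s["scanned"]) / len(stores)

def pct_acv(stores):
    """ACV of stores where the product scanned, as a percentage of total market ACV."""
    total_acv = sum(s["acv_mn"] for s in stores)
    selling_acv = sum(s["acv_mn"] for s in stores if s["scanned"])
    return 100 * selling_acv / total_acv

unweighted = unweighted_distribution(stores)
weighted = pct_acv(stores)
print(round(unweighted, 2))  # 66.67
print(round(weighted, 2))    # 71.43
```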

The aim is to get the product placed in high-ACV stores. A retail chain can have fewer outlets than its competitors and still have a higher ACV. Only a calculation that uses ACV will identify the retail chain or outlet where one’s product/brand most needs to scan. Once managers understand the importance of ACV, they begin to use it in other measures as well, such as velocity and promotion measures.


What is %ACV?

In this blog, we’ll understand precisely what %ACV distribution is, and why you as a manager should be paying maximum attention to it. The contents of the blog are as follows:

  • What is %ACV
  • Why managers should care about it
  • How to calculate %ACV, and
  • Some points to consider when using it in your data analysis

The total sales made is the bottom line of all your efforts, but for a commodity to sell, it needs to be present in stores, and the right ones at that. Distribution, therefore, is widely considered as the most important sales driver, and %ACV helps you get your distribution right.

WHY SHOULD MANAGERS CARE ABOUT %ACV?

Here are some reasons: 

%ACV helps managers understand the quality of their distribution networks, so they are not deceived into feeling cozy because their products are seemingly well-distributed, when, in fact, they might be well-distributed only at the surface level. %ACV can answer why certain products are not selling in an area despite widespread distribution in that area.  

On the other end of things, CPG managers get to know which retailers are the fastest at moving products off their shelves, and categorize them as such. Managers can then focus on specifically targeting these stores and ensuring distribution’s on point there. Knowing which stores are the best performing also provides a blueprint which can be referred to and possibly replicated.

If managers care about their distribution goals, and what’s really going on at the store-level, they should care about %ACV. 

WHAT IS %ACV DISTRIBUTION?

%ACV Distribution, often shortened to %ACV, stands for percent All Commodity Volume. It is a metric that can be understood as the ‘percentage of stores selling, where each store is prioritized based on its size’. This figure can then be compared across retailers, territory by territory.

Now, size here means the total annual sales of the store, called All Commodity Volume (ACV). This means that the larger the store you are present in, by (ACV) size, the more weight is assigned to it. 

However,

IT’S ALL ABOUT SCANNING

Being present in a large store means nothing if your product is not getting scanned. Your brand may have a dedicated shelf or shelf tag in a store, but if

  • The product’s out of stock, or
  • Is in stock, but is not moving out (customers aren’t purchasing it)

it won’t be captured under %ACV distribution in the Nielsen and the IRI data.   

%ACV distribution helps managers understand the quality of their distribution networks. The golden word to gauge quality is ‘scanning’. 

HOW IS %ACV CALCULATED?

The formula to get Retailer %ACV is this:

(ACV of that retailer/ ACV of all the retailers) * 100

The City of Mumbai has 3 retailers (oh, the oversimplification) selling your brand. 

Assume the details are as such:

Retailer          No. of stores   ACV (Rupees)   % of stores   %ACV
Retail Store 1    50              80 Mn          50%           40
Retail Store 2    30              100 Mn         30%           50
Retail Store 3    20              20 Mn          20%           10
Total             100             200 Mn         100%          100

Now, if your brand is present in Retail Store 1 and Retail Store 2, then the distribution by % of stores is 80%, but the distribution by %ACV is 90%.

How we arrived at 90 for %ACV is thus:

[(ACV of Store 1 + ACV of Store 2) divided by Total ACV]

(80 + 100) divided by 200 = 0.90, which is 90%.

The entire column of ‘%ACV’ is similarly calculated.

Similarly, if your brand is present in Store 1 and Store 3, then the distribution by % of stores would be 70%, but the distribution by %ACV would only be 50%. 
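Both scenarios can be reproduced with a small Python sketch built on the Mumbai table. The dictionary layout and the `distribution` helper are illustrative only.

```python
# Retailer figures from the table: number of stores and ACV in millions of rupees.
retailers = {
    "Retail Store 1": {"stores": 50, "acv_mn": 80},
    "Retail Store 2": {"stores": 30, "acv_mn": 100},
    "Retail Store 3": {"stores": 20, "acv_mn": 20},
}

def distribution(present_in, retailers):
    """Return (% of stores, %ACV) for the retailers where the brand is present."""
    total_stores = sum(r["stores"] for r in retailers.values())
    total_acv = sum(r["acv_mn"] for r in retailers.values())
    pct_stores = 100 * sum(retailers[name]["stores"] for name in present_in) / total_stores
    pct_acv = 100 * sum(retailers[name]["acv_mn"] for name in present_in) / total_acv
    return pct_stores, pct_acv

scenario_1 = distribution(["Retail Store 1", "Retail Store 2"], retailers)
scenario_2 = distribution(["Retail Store 1", "Retail Store 3"], retailers)
print(scenario_1)  # (80.0, 90.0)
print(scenario_2)  # (70.0, 50.0)
```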

Studying these two scenarios in light of the %ACV distribution metric helps us understand the classic ‘I am present in many stores, therefore I should be selling more’ mistake that a manager may make. Store 2 is clearly the most valuable store, even with fewer outlets (30) than Store 1 (50).

To recap, %ACV values stores based on their ACV size, which is the total annual sales of a store, so that you can target the largest stores.

SOME POINTS TO CONSIDER WHEN PERFORMING %ACV DISTRIBUTION 

When using %ACV distribution in your data analysis, keep in mind the following points:

  1. Scanning = Quality of distribution

An actual product scan is what counts – and it’s all that counts. Nielsen does not consider your product to be distributed when it is sitting in a store shelf and not moving out, and you should follow the same reasoning. Retail authorization means nothing – our focus is on the quality of distribution

  2. Can’t add distribution up

%ACV distribution is non-additive, meaning that if one UPC (Universal Product Code) has 20% distribution and another has 25%, you can’t just add up and conclude that total distribution is 45%. Neither markets, nor products, nor periods can be added. If you do, that would be incorrect, not to mention you may end up with a distribution of more than 100%.

Use the periods, markets, and products available in your database for analysis, without adding them up
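A small Python sketch shows why distribution cannot simply be added up. The stores, ACV figures, and UPC names are hypothetical; the point is that two items often scan in overlapping stores, so their combined distribution is based on the union of those stores, not the sum of the two percentages.

```python
# Hypothetical market: store ACV in millions.
store_acv_mn = {"A": 50, "B": 30, "C": 20}

# Stores where each UPC scanned (hypothetical).
scans = {
    "UPC 1": {"A"},
    "UPC 2": {"A", "B"},
}

def pct_acv(stores_selling):
    """%ACV for a set of stores, relative to the whole market."""
    total = sum(store_acv_mn.values())
    return 100 * sum(store_acv_mn[s] for s in stores_selling) / total

upc_1 = pct_acv(scans["UPC 1"])
upc_2 = pct_acv(scans["UPC 2"])
combined = pct_acv(scans["UPC 1"] | scans["UPC 2"])  # stores selling either UPC
naive_sum = upc_1 + upc_2

print(upc_1, upc_2)  # 50.0 80.0
print(combined)      # 80.0
print(naive_sum)     # 130.0, which is impossible as a distribution figure
```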

  3. Don’t go weekly for non-perishable items

For non-perishable items, you might want to look at longer distribution periods such as 12 weeks for slow-moving products, or 4 weeks for relatively faster-moving products. Conducting 1-week analysis for slow-moving products, for example, will lead to grossly incorrect conclusions, because these products have longer purchase cycles and do not get scanned on a weekly basis. As such, you might be finding faults with your distribution infrastructure when there are none.

Of course, as you widen the territory of analysis on a weekly basis, you will see units being sold, but micro-analysis at retailer-level or for a specific item is not possible in this manner

  4. Be careful with 52-week analysis as well

Longer periods of distribution-related numbers are often extrapolated from smaller data chunks. Now, if the current distribution is fluctuating, i.e. moving up or down rather than being stable, and the small data chunk is relatively stable, the extrapolation will not represent the current fluctuation. The extrapolation may consider the average or the maximum of the week/s within the smaller data chunk, and produce a year-long picture on that basis.

  5. Individual item distribution vs Total brand distribution

Total brand distribution will always be higher than individual item distribution, since every store will hold your brand, but not all stores will hold every product variety you produce. Discrepancy is to be expected, except for super-seller products which every store wants to keep.

Until next time!


Explorazor, the data exploration tool for Brand Managers, is a product of vPhrase Analytics.
