Market basket analysis by solving the inverse Ising problem: Discovering pairwise interaction strengths among products
Introduction
Over the last twenty years, statistical physics models have been used to understand and discover properties of economic systems. For example, some pioneering studies detected hierarchical structures among stocks traded in financial markets, using tools and procedures developed to model physical systems by determining correlations among assets [1], [2], [3]. Some of these studies have had a greater impact on quantitative finance, introducing new methodologies and models to explain price fluctuations [4], [5], to identify the noise in financial correlation matrices in order to improve portfolio optimization [6], and to describe the dynamics and properties of assets in which scaling laws have been found [7], [8], [9].
In the field of marketing and consumer behavior, transactional databases hold large volumes of data that record each purchase of a customer over time. The availability of these data makes it interesting to study purchasing patterns at an aggregate level. A recent line of research on purchasing patterns using networks has revealed that the buying phenomenon exhibits scaling properties. The networks of products built from the list of transactions tend to have the typical scale-free structure [10], [11], [12]. However, unlike in quantitative finance, there appears to have been no further development in the analysis of purchasing behavior using more sophisticated techniques.
In this paper, we investigate purchases from transactional databases. These data contain information on market preferences, which can be seen as a complex system composed of many spins (or market offers) that interact with each other. The products purchased reveal the preferences of consumers over the set of brands and products available at a given time. In a physical sense, a market basket is equivalent to a spin configuration in the system. Our approach is based on modeling purchases statistically using the maximum entropy principle [13], [14]. The resulting model allows us to understand how purchasing units interact with each other and to observe recurring behavior patterns in different product samples.
The maximum entropy principle has been a useful tool for studying scale and structure properties in neuroscience. For instance, it makes it possible to model collective neuron activations under different stimuli [15], [16], [17], [18]. In these cases, the idea is to characterize the distribution of the physical system by taking the first moments and the pairwise correlations between the variables of the observed distribution, and to use them as constraints. Then, through entropy maximization, one seeks a distribution that is consistent with those moments and correlations but does not assume the existence of higher-order interactions. The resulting distribution has the form of a Boltzmann distribution. Inferring the parameters of this distribution is difficult in general, and it corresponds to the inverse Ising problem [19].
Along these lines, we demonstrate that the Ising model approach is also feasible for describing the aggregate purchasing behavior of a large number of consumers. This approach allows an economic interpretation, in which the inferred parameters of the distribution can be understood as a measure of economic activity among the elements that constitute the system [20].
The results indicate that models of this kind recover the distribution of the states satisfactorily. This makes it possible to validate the hypothesis that the pairwise interactions contain enough information to explain the phenomenon of purchases, as has been found in financial markets and in the functioning of biological neurons. As an example of the application of the inferred parameters, we use the couplings to obtain the hierarchical structure of the collective purchasing system.
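For reference, the pairwise maximum entropy distribution described above is usually written in the following form. The notation (spins $s_i$, fields $h_i$, couplings $J_{ij}$, partition function $Z$) is the conventional one and is assumed here rather than taken from the paper; sign conventions and the choice of $\pm 1$ versus $0/1$ variables differ between authors.

```latex
P(\mathbf{s}) = \frac{1}{Z} \exp\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j \Big),
\qquad
Z = \sum_{\mathbf{s}} \exp\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j \Big),
\\[4pt]
\text{subject to}\quad
\langle s_i \rangle_{P} = \langle s_i \rangle_{\text{data}},
\qquad
\langle s_i s_j \rangle_{P} = \langle s_i s_j \rangle_{\text{data}} .
```

The inverse Ising problem is then the task of finding the $h_i$ and $J_{ij}$ for which both sets of constraints hold.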
Section snippets
Model and learning
Using the empirical data, it is necessary to find a probability distribution whose number of parameters is less than the possible number of purchase states. The states of the collective system of purchases are all possible combinations of elements that make up a market basket. Under the approach of the Ising-based model, we represent a market basket with the state . Each spin represents the presence or absence of the product in the market basket. The probability of observing
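As a rough illustration of the spin encoding described above, the sketch below turns a list of baskets into spin vectors and computes the empirical means and pairwise correlations that a pairwise model would be asked to reproduce. The $+1$/$-1$ convention, the function names, and the toy product list are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def baskets_to_spins(baskets, products):
    """Encode each basket as a vector of +1 (product present) / -1 (absent).

    The +1/-1 convention is an assumption; a 0/1 encoding is equally common.
    """
    index = {p: k for k, p in enumerate(products)}
    spins = -np.ones((len(baskets), len(products)), dtype=int)
    for row, basket in enumerate(baskets):
        for item in basket:
            if item in index:
                spins[row, index[item]] = 1
    return spins

def empirical_moments(spins):
    """Return the mean of each spin and the matrix of pairwise averages <s_i s_j>."""
    means = spins.mean(axis=0)
    pair = spins.T @ spins / spins.shape[0]
    return means, pair

# Hypothetical toy data: three products, four baskets.
products = ["bread", "milk", "beer"]
baskets = [{"bread", "milk"}, {"beer"}, {"bread", "milk", "beer"}, {"milk"}]

spins = baskets_to_spins(baskets, products)
means, pair = empirical_moments(spins)
print(spins)
print(means)
print(pair)
```

With N products there are 2^N possible baskets but only N fields and N(N-1)/2 couplings, which is why a pairwise model has far fewer parameters than there are purchase states.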
The data
To carry out our analysis, we took a transactional database of 179 610 purchase records from 17 093 regular customers of a branch of a supermarket chain in Santiago, Chile. These data involve purchases between July 2010 and November 2011. This branch is located in a sector of high public affluence, and it is usually used by consumers to make on-the-spot purchases, i.e., purchases of a low number of products and not for the end-of-the-month shopping to replenish household food.
The database
Model consistency
To show that the pairwise model of maximum entropy is consistent with the actual transactions, we carry out, for each of the 250 samples, a Boltzmann learning process of 2000 steps with an initial learning rate with a decay of 0.01. In each of these Boltzmann learning steps, we computed the means and correlations from 40 000 Monte Carlo sampling steps, using binomial initial configuration as with probabilities .
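A minimal sketch of a Boltzmann learning loop of this kind is given below. It is not the authors' implementation: the single-spin-flip Metropolis sampler, the $\pm 1$ spins, the decay schedule `eta0 / (1 + decay * k)`, the value of `eta0`, and the much smaller step counts are all assumptions chosen so that the example runs quickly on synthetic data.

```python
import numpy as np

def metropolis_sample(h, J, n_steps, rng, p_init=0.5):
    """Sample spins from P(s) proportional to exp(h.s + 0.5 * s.J.s).

    J is assumed symmetric with zero diagonal. The chain starts from a
    binomial (independent) configuration with P(s_i = +1) = p_init.
    """
    n = len(h)
    s = np.where(rng.random(n) < p_init, 1, -1)
    samples = np.empty((n_steps, n), dtype=int)
    for t in range(n_steps):
        i = rng.integers(n)
        # Change in log-probability if spin i were flipped.
        delta = -2.0 * s[i] * (h[i] + J[i] @ s)
        if delta >= 0 or rng.random() < np.exp(delta):
            s[i] = -s[i]
        samples[t] = s
    return samples

def moments(samples):
    """Means <s_i> and pairwise averages <s_i s_j> of a sample matrix."""
    return samples.mean(axis=0), samples.T @ samples / len(samples)

def boltzmann_learning(data, n_iter, n_mc, eta0, decay, seed=0):
    """Adjust h and J until model moments approach the data moments."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    data_means, data_corr = moments(data)
    h, J = np.zeros(n), np.zeros((n, n))
    upper = np.triu_indices(n, k=1)
    errors = []
    for k in range(n_iter):
        model_means, model_corr = moments(metropolis_sample(h, J, n_mc, rng))
        d_means = data_means - model_means
        d_corr = data_corr - model_corr
        # Track the mean squared error over all fitted moments.
        errors.append(float(np.mean(np.concatenate([d_means, d_corr[upper]]) ** 2)))
        eta = eta0 / (1.0 + decay * k)  # assumed decay schedule
        h += eta * d_means
        np.fill_diagonal(d_corr, 0.0)   # no self-couplings
        J += eta * d_corr
    return h, J, errors

# Synthetic check: draw data from a known (hypothetical) model, then refit it.
rng = np.random.default_rng(1)
h_true = np.array([0.8, -0.5, 0.3])
J_true = np.array([[0.0, 0.8, 0.0],
                   [0.8, 0.0, -0.6],
                   [0.0, -0.6, 0.0]])
data = metropolis_sample(h_true, J_true, 5000, rng)
h, J, errors = boltzmann_learning(data, n_iter=60, n_mc=1500, eta0=0.2, decay=0.01)
print("first error:", errors[0], "last error:", errors[-1])
```

The update rule is plain gradient ascent on the log-likelihood: each parameter moves in proportion to the gap between a data moment and the corresponding Monte Carlo estimate, so the recorded error should shrink as learning proceeds, up to sampling noise.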
We track the mean squared error (MSE2
Application to a transactional sample dataset
Recently, it has been found that, similarly to the stock market, the aggregated purchasing behavior of retail consumers tends to form its own hierarchical structure [28]. Constructing a minimum spanning tree (MST) with Prim's algorithm, based on the correlations between product activity, is a good way to discover this hierarchical structure. From a graph-oriented perspective, every element of the system is a node and edges between them are obtained from a transformation of
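The MST construction can be sketched as follows. Because the exact transformation used in the paper is not shown in this excerpt, the example assumes the distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$ that is common in the financial MST literature; the correlation values and product labels are made up for illustration.

```python
import heapq
import math

def correlation_to_distance(corr):
    """Map correlations to distances with d = sqrt(2 * (1 - rho)) (assumed transform)."""
    n = len(corr)
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j]))) for j in range(n)]
            for i in range(n)]

def prim_mst(dist):
    """Prim's algorithm on a dense distance matrix; returns edges (u, v, weight)."""
    n = len(dist)
    in_tree = [False] * n
    edges = []
    heap = [(0.0, 0, 0)]  # (weight, from_node, to_node); start at node 0
    while heap and len(edges) < n - 1:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was reached by a cheaper edge
        in_tree[v] = True
        if u != v:
            edges.append((u, v, w))
        for k in range(n):
            if not in_tree[k]:
                heapq.heappush(heap, (dist[v][k], v, k))
    return edges

# Hypothetical correlation matrix for four products.
products = ["bread", "butter", "beer", "chips"]
corr = [[1.0, 0.8, 0.1, 0.2],
        [0.8, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.7],
        [0.2, 0.1, 0.7, 1.0]]

edges = prim_mst(correlation_to_distance(corr))
for u, v, w in edges:
    print(f"{products[u]} - {products[v]}: {w:.3f}")
```

Strongly correlated products end up adjacent in the tree, so clusters such as bread/butter and beer/chips appear as branches joined by a single weaker link.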
Conclusions
This study has shown that it is possible to describe transactional data using the maximum entropy principle, in which the aggregate purchasing system of a multitude of transactions can be analyzed with a pairwise Ising model. Using Boltzmann learning, it was possible to find the field parameters of each product and the couplings between pairs of products that allow us to parameterize the distribution of market baskets. The maximum entropy distribution is consistent with the structure of
Acknowledgments
The authors would like to thank CONICYT-Chile under grant Fondecyt 11160072 (M.A.V.) and Basal (CONICYT)-CMM, Fondecyt 1180706 (G.A.R.) for financially supporting this research.
References (30)
The application of continuous-time random walks in finance and economics, Physica A (2006)
et al., Price fluctuations from the order book perspective - empirical facts and a simple model, Physica A (2001)
Market structure explained by pairwise interactions, Physica A (2013)
et al., Market basket analysis: Complementing association rules with minimum spanning trees, Expert Syst. Appl. (2018)
et al., An ever-closer union? Examining the evolution of linkages of European equity markets via minimum spanning trees, Physica A (2008)
Hierarchical structure in financial markets, Eur. Phys. J. B (1999)
et al., Taxonomy of stock market indices, Phys. Rev. E (2000)
et al., Dynamic asset trees and portfolio analysis, Eur. Phys. J. B (2002)
et al., Modelling the short term herding behaviour of stock markets, New J. Phys. (2014)
et al., Random matrix approach to cross correlations in financial data, Phys. Rev. E (2002)
The power of patience: a behavioural regularity in limit-order placement, Quant. Finance
Statistical properties of stock order books: empirical results and models, Quant. Finance
Modeling a store
Market basket analysis with networks, Soc. Netw. Anal. Min.
Scale-free networks: a decade and beyond, Science
Cited by (6)
Inverse problem for the quartic mean-field Ising model
2023, European Physical Journal Plus
The minimal representation of a system with interacting units using Boltzmann machines
2022, ACM International Conference Proceeding Series
Clustering based approach to enhance association rule mining
2021, Conference of Open Innovation Association, FRUCT
Market Basket Analysis Using Boltzmann Machines
2019, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)