European Journal of Operational Research 195 (2009) 1–16
Invited Review
Quantitative models for direct marketing: A review from systems perspective
Indranil Bose a, Xi Chen b,*
a School of Business, The University of Hong Kong, Pokfulam Road, Hong Kong, China
b School of Management, Zhejiang University, Hangzhou, China
Article info
Article history: Received 9 September 2006; Accepted 1 April 2008; Available online 9 April 2008
Keywords: Marketing; Data mining; Customer profiling; Customer targeting; Statistical modeling; Performance evaluation
Abstract
In this paper, quantitative models for direct marketing are reviewed from a systems perspective. A systems view consists of input, processing, and output and the six key activities of direct marketing that take place within these constituent parts. A discussion about inputs for direct marketing models is provided by describing the various types of data used, by determining the significance of the data, and by addressing the issue of selection of appropriate data. Two types of models, statistical and machine learning based, are popularly used for conducting direct marketing activities. The advantages and disadvantages of these two approaches are discussed along with enhancements to these models. The evaluation of output for direct marketing models is done on the basis of accuracy and profitability. Some challenges in conducting research in the area of quantitative direct marketing models are listed and some significant research questions are proposed. © 2008 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +86 571 88206858; fax: +852 2858 5614. E-mail addresses: [email protected] (I. Bose), [email protected] (X. Chen).
0377-2217/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2008.04.006
1. Introduction
For product advertising and promotions, two approaches are mainly used in practice: mass marketing and direct marketing. Mass marketing employs mass media to broadcast product related information to current as well as potential customers. The mass media used by marketers include television, radio, magazines, and newspapers. Mass marketing targets large groups of customers. It does not discriminate between customers within a group and the information delivered to customers is uniform. Direct marketing is different from mass marketing in that it targets individuals or households. Different customers are subjected to different marketing information. The Direct Marketing Association (DMA) defines direct marketing as “... communications where data are used systematically to achieve quantifiable marketing objectives and where direct contact is made, or invited, between a company and its customers and prospective customers”. Roddy (2002) defined direct marketing as “the delivery of a marketing message or proposition to a target customer or potential customer, in a customer favourable format, put to the customer from the seller or the seller’s agents (including call centers) without an intermediary person or indirect media involved”. It follows from the two definitions that direct marketing classifies customers so that personalized advertising and promotional activities can be targeted to specific classes of customers.
Direct marketing is growing in importance. Barwise and Farley (2005) reported that in some European countries, expenditures
on direct marketing have increased from 2001 to 2004. For example, the rate of increase is 14.6% in Germany, 73.6% in the UK, and 5.5% in France, respectively. The direct marketing expenditure in the US is quite significant as well. According to a report issued by Direct Marketing Association (Johnson and Frankel, 2005), the total direct marketing advertising expenditure in 2005 was about $161.3 billion. This is expected to result in $1.85 trillion in increased sales in 2005 accounting for 7% of the $26 trillion in total sales in the US. Direct marketing activities made up 10.3% of the US GDP in 2005. Direct marketing is marked by high efficiency. It is estimated that in 2005 the average profit margin of an investment of $1 in direct marketing advertising was about $11.49 across all industries. When compared to the profit margin of $10.99 in 2003 and $10 in 1999, it can be said that the profitability of direct marketing advertising is growing at a fast pace. Direct marketing is especially profitable for consumer direct marketing. For 2005, it is expected that, on an average, $1 spent on consumer direct marketing would yield $12.66 compared to $10.10 for B2B direct marketing. The use of the Internet has lowered the operational cost for direct marketers and so even a low response rate from customers is often sufficient to guarantee the success of a marketing campaign. Gopal et al. (2001) reported that unsolicited commercial email became cost-effective at a low response rate of 0.5%. Due to the growing popularity of direct marketing in industry, academic interest and research in direct marketing is flourishing. Sophisticated direct marketing models can help marketers conduct direct marketing campaigns effectively. Elsner et al. (2003) reported the case of Rhenania, a direct marketing company, which continuously moved up in the market after adopting their Dynamic Multilevel Modeling approach. According to Nash
(1984), the success of a direct mail campaign depended on the offer, the communication elements, the timing, and the selection of customers. These four issues proposed by Nash are the main motivating factors for research in the area of quantitative models for direct marketing. While Nash (1984) only addressed one end of direct marketing, the marketers, Rao and Steckel (1995) proposed a conceptual framework for direct marketing which also included the other end, the customers. In their framework, the main activities customers take part in are receiving solicitations and deciding whether to buy or not. These customer activities update the marketers’ information and influence their decisions. In summary, quantitative models for direct marketing make use of customers’ characteristics and customers’ responses to help marketers make decisions related to direct marketing activities.
This paper reviews research in the area of quantitative models for direct marketing. In the context of this paper, we define quantitative models as models that use statistical approaches or machine learning based data mining approaches for direct marketing. We first identify a series of activities conducted in direct marketing. We then group those activities into input, processing, and output, the three basic constructs of a system (Wasson, 2006). This description is simple but generalizes well. To supplement the abstract nature of the input–processing–output model, the review is conducted by systematically answering several research questions that direct marketers face when carrying out practical direct marketing campaigns, as shown in Fig. 1.
2. Activities in direct marketing
Direct marketing develops interactions between a company and its customers. A typical interaction between direct marketers and customers begins when direct marketers send solicitations to customers for the purpose of selling products or services. After receiving the solicitations the customers need to decide whether to buy the suggested product or not. By observing the responses from the customers (i.e., buy or not buy), direct marketers adjust their strategy and carry out new rounds of direct marketing activities. In direct marketing the following activities take place sequentially. The first activity is the collection of customer data. Since the revenue of direct marketing depends on how many customers respond to the solicitations, the selection of target customers is the next important activity of direct marketing. Before target selection, usually sophisticated customer profiling is conducted to make the selection more efficient. Selection needs to be optimized in order to satisfy various business requirements and maximize the generated revenue. The final activity is the performance evaluation of the direct marketing activities.
2.1. Collection of data
Direct marketers are concerned about what type of data to collect about customers that may reveal meaningful information on customers’ preferences. Since data is easily available, they are not worried about the methods of data collection, but at the same time they want to ensure that the collected data is clean, meaningful, and sufficient. The key research questions related to this activity are: What types of data are used? What is the significance of the different types of data? How can the marketer choose the most appropriate data?
2.2. Selection of target customers
Selection of target customers is the core activity of direct marketing. A target can be an individual or a household (Bult, 1993; Bult and Wansbeek, 1995). Two important research questions need to be answered by quantitative models in direct marketing: Who should be selected as a target for direct marketing? What techniques should be used for the selection of targets? Usually, a score is generated for an individual customer by the selection techniques. The score could take a binary value like one or zero indicating whether a customer would respond or not. It could also take an integer value representing the number or type of products the customer would buy. The score could also take a continuous value between zero and one, thereby representing the probability that the customer will complete a purchase, or a value representing the revenue that the customer might produce. There are three types of revenues generated by customers that direct marketing models try to estimate. The first is the amount of money a customer might spend as a response to a solicitation. The second is the expected monetary value of a customer’s spending, which is measured by multiplying the response probability with revenue. The third measure of revenue is the life time value (LTV) of a customer, which is mostly used in multi-mailing problems. LTV is calculated as the sum of possible monetary amounts a customer would spend in response to all current and future solicitations. However, in addition to the revenue generated by the customer it is also important to consider the cost of the solicitation. A solicitation cost that is too high can offset the revenue generated by the customer, resulting in a negative contribution for the customer.
Fig. 1. A systems perspective of direct marketing models.
Input. Activity: Collection of data. Questions: What types of data are used? What is the significance of different types of data? How can marketers choose appropriate data?
Processing. Activities: Selection of target customers; Customer profiling; Cross-selling and up-selling; Direct marketing strategy planning. Questions: Who should be selected and how? How to understand customers better? What products or services should be offered to customers? How can direct marketing activities satisfy business requirements? Is single mailing enough?
Output. Activity: Performance evaluation. Questions: Is the model accurate? Is the model profitable?
2.3. Customer profiling
The key research question that is related to this topic: How should marketers profile customers for effective target selection? Customer profiling includes customer clustering and customer pattern recognition. Customer clustering groups similar customers together and separates dissimilar ones. Customer clustering allows
further analysis on smaller groups of customers, or segments, that represent various sub-markets for products. Different selection models can be built for different segments. Customer pattern recognition identifies relationships between different characteristics of customers. Market basket analysis determines what a customer buys at the same time and establishes relationships between those items. It is a type of customer pattern recognition method that shows customers’ preferences and is better than examining the customers’ purchasing records for a single item.
2.4. Cross-selling and up-selling
From transaction records of customers, marketers can know what they bought in the past. This is a direct and easy way to decide what is to be offered to customers in a solicitation. However, just marketing products already existing in customers’ transaction records is not enough. Marketers need to explore the changing needs of customers and then decide what products or services should be offered to target customers. Two of the most commonly used marketing strategies are cross-selling and up-selling. Cross-selling is the practice of suggesting similar products or services to a customer who is likely to buy something. This could include displaying on a Web page a list of books similar to those purchased by other customers whose interests are similar to the customer’s. In fact, Amazon.com recommends books to customers in this way. Up-selling is the practice of suggesting products or services (with higher price, more features, and better quality) to a customer who is considering a purchase, such as a gym membership with more privileges, or a faster computer. Both cross-selling and up-selling activities are concerned with estimating the potential buying behavior of customers based on sophisticated profiling, such as market basket analysis, that discovers existing preference patterns of customers.
2.5. Direct marketing strategy planning
Direct marketing is a business activity that is not a one-time event. Strategies should be planned with a long-run scope. Like any other business activity, the goal of direct marketing strategy is to optimize various business requirements, such as profit maximization. On the other hand, marketers are also restricted by resource constraints such as cash, mailing cost, and inventory. Therefore the optimization problems should take those constraints into consideration. There are two main research questions related to this topic: How can direct marketing satisfy the business requirements of marketers? Is a single mailing enough in direct marketing?
2.6. Performance evaluation
As a type of business activity, direct marketing campaigns are evaluated in terms of their revenue contribution to the company. The revenue contribution depends on the efficiency of the selection of target customers. Therefore, the accuracy of the selection models should also be evaluated. The corresponding research questions related to this topic are: How to measure the accuracy of the proposed models? How to estimate the profitability accruing from models for the direct marketers?
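The revenue measures of Section 2.2 and the resource constraints of Section 2.5 can be illustrated with a minimal sketch. It assumes that a response probability and an expected spend are already available for each prospect, that every solicitation costs the same amount, and that the only constraint is a mailing budget. The names `Prospect`, `net_contribution`, `lifetime_value`, and `select_targets` are hypothetical and are not taken from any of the reviewed papers.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    customer_id: str
    response_prob: float   # assumed estimate of the response probability (0..1)
    expected_spend: float  # assumed estimate of spend if the customer responds


def expected_revenue(p: Prospect) -> float:
    """Expected monetary value: response probability times revenue."""
    return p.response_prob * p.expected_spend


def net_contribution(p: Prospect, solicitation_cost: float) -> float:
    """Expected revenue minus the cost of one solicitation."""
    return expected_revenue(p) - solicitation_cost


def lifetime_value(expected_spend_per_solicitation) -> float:
    """LTV as the sum of expected spending over current and future solicitations."""
    return sum(expected_spend_per_solicitation)


def select_targets(prospects, solicitation_cost: float, budget: float):
    """Mail the highest-contribution prospects first, skip unprofitable ones,
    and stop once the mailing budget is used up."""
    ranked = sorted(
        prospects,
        key=lambda p: net_contribution(p, solicitation_cost),
        reverse=True,
    )
    selected, spent = [], 0.0
    for p in ranked:
        if net_contribution(p, solicitation_cost) <= 0:
            break  # every remaining prospect is unprofitable as well
        if spent + solicitation_cost > budget:
            break  # budget constraint reached
        selected.append(p.customer_id)
        spent += solicitation_cost
    return selected


# Illustrative, made-up prospects.
prospects = [
    Prospect("A", 0.10, 80.0),
    Prospect("B", 0.02, 50.0),
    Prospect("C", 0.05, 120.0),
    Prospect("D", 0.30, 15.0),
]
chosen = select_targets(prospects, solicitation_cost=2.0, budget=4.0)
print(chosen)
```

With these made-up numbers, prospect B has a negative contribution and is never mailed, and the budget covers only the two most valuable prospects, A and C. Because the cost per solicitation is assumed to be uniform, ranking by contribution is sufficient here; unequal costs or additional constraints such as inventory would call for a proper optimization model.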
3. Inputs for direct marketing models
3.1. Types of data
Typically, there are two kinds of data on customers’ characteristics that are used in quantitative models for direct marketing. The first type of data includes customers’ geographic, demographic, lifestyle, and socio-graphic characteristics (Bult, 1993; Bult and Wittink, 1996). Demographic data includes customers’ age, sex, family size, etc. Lifestyle data includes customers’ habits, subscription to a certain magazine, leisure interests, etc. Geographic data includes the location of customers’ home, office, and business. These three types of data are referred to as external data (Van der Sheer, 1998).
The second type of data includes customers’ interactive behavior with marketers. Data on customers’ behavior includes customers’ transaction records, feedback from customers, and Web browsing records. The commonly used RFM variables are often extracted from transaction records of customers (Baumgartner and Hruschka, 2005; Bhattacharyya, 2000; Bitran and Mondschein, 1996; Bult and Wansbeek, 1995; Colombo and Jiang, 1999; Cui et al., 2006; Gonul and Shi, 1998; Jonker et al., 2004; Malthouse, 1999; Piersma and Jonker, 2004; Suh et al., 1999; Wedel et al., 1993). R stands for ‘recency’ and measures the length of time that has elapsed since a customer’s last purchase activity or the number of consecutive solicitations without response from the customer after the last purchase. F stands for ‘frequency’ or the number of products bought by a customer during a period of time. M stands for ‘monetary’, which represents the money value of a customer’s buying activity during a given time or in the last purchase transaction. RFM information helps estimate the probability that customers will buy a certain product. Customers’ feedback to marketers is also used by researchers to discover customers’ attitudes towards products or services (Bult et al., 1997; Ha et al., 2005; Hansotia and Wang, 1997; Viaene et al., 2001a; Viaene et al., 2001b).
Due to the ubiquity of Internet access and e-commerce, marketers often carry out marketing activities online. E-commerce companies, such as Amazon.com, list products on their Web pages with related descriptions and prices.
Sometimes, there are also links to Web pages of similar products. When a customer wants to buy a product, (s)he might browse the Web pages in order to gather enough comparative information and then submit his/her order online after making a decision. Browsing and purchasing behaviors reflect the customers’ interests and preferences. Data on customers’ browsing behavior include, among others, customers’ IP address, operating system, key words used for searching, URLs of Web pages visited, time of page visit, and length of time spent on a Web page. Suh et al. (2004) used Web browsing information such as IP address, time of access, and URL, to build a model that predicted the probability of customer purchase. Van den Poel and Buckinx (2005) used Web browsing data as well as customers’ demographic data and purchase records to predict customers’ purchases. They divided Web browsing data into two types, general clickstream data and detailed clickstream data. General clickstream data included information on recency, frequency, and time of customers’ visits. Detailed clickstream data included information on the number of pages visited related to products, supply procedures, and personal information.
Some other factors are also believed to influence the decision on whether to buy a product or not. One such factor is the characteristics of products or services. Desarbo and Ramaswamy (1994) included a variable representing characteristics of products in their Customer Response based Interactive Segmentation Procedure model without specifying the exact characteristics of products. With increasing use of databases in modern businesses, transaction records containing detailed information on customers’ purchasing behaviors are available to marketers. Besides RFM information, what customers buy in each transaction can be known quite easily.
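As an illustration of how RFM variables and per-transaction purchase contents might be derived from raw transaction records, the following sketch uses made-up records. The record layout and the function names `rfm` and `co_purchase_counts` are assumptions introduced for this example. Recency is measured in days since the last purchase, frequency is simplified to the number of purchase transactions, and monetary is the total amount spent.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations

# Hypothetical transaction records: (customer_id, purchase_date, amount, items)
transactions = [
    ("c1", date(2008, 1, 5), 40.0, {"book", "pen"}),
    ("c1", date(2008, 3, 1), 25.0, {"book", "lamp"}),
    ("c2", date(2008, 2, 10), 60.0, {"book", "pen", "lamp"}),
]


def rfm(records, as_of):
    """Summarise each customer as (recency in days, frequency, monetary)."""
    by_customer = defaultdict(list)
    for customer, day, amount, _items in records:
        by_customer[customer].append((day, amount))
    summary = {}
    for customer, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        summary[customer] = (
            (as_of - last_purchase).days,       # R: days since last purchase
            len(rows),                          # F: number of transactions
            sum(amount for _, amount in rows),  # M: total amount spent
        )
    return summary


def co_purchase_counts(records):
    """Count how often each pair of items appears in the same transaction."""
    pairs = Counter()
    for _customer, _day, _amount, items in records:
        for pair in combinations(sorted(items), 2):
            pairs[pair] += 1
    return pairs


summary = rfm(transactions, as_of=date(2008, 4, 1))
pairs = co_purchase_counts(transactions)
print(summary)
print(pairs.most_common(2))
```

The pair counts are only the raw ingredient of a market basket analysis; measures such as support or confidence would be computed on top of them.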
Some researchers have profiled products according to product characteristics and customers according to their transaction records (Changchien et al., 2004; Chen et al., 2005; Liao and Chen, 2004; Min and Han, 2005; Suh et al., 2004; Weng and Liu, 2004). By comparing product profiles and
customers’ profiles, they have come up with plans to make personalized recommendations to customers to enable cross-selling or up-selling.
Another factor that might influence customers’ choice is the characteristics of a solicitation. A solicitation could be a postal mail, an email, a short message on a mobile phone, or a recommendation on a Web page. Researchers have studied the influences of characteristics of postal mail on customers’ responses. Hansotia and Wang (1997) included dummy variables to represent different postal mail packages in their model. Bult et al. (1997) studied the influence of characteristics of postal mails on raising funds. Some of the characteristics were availability of a payment device, presence of brochure, position of illustration, amount of amplifier, content of post scriptum, owner of signature, and position of printed address. Van der Sheer (1998) studied some other characteristics of postal mail such as size of envelope, type of paper, type of extra imprint, line of text, format of sender’s name, and whether there is additional information about lottery. Very little research exists that examines factors that influence customers’ responses to types of solicitation other than postal mail.
In Table 1, we list the types of data and number of attributes used in different studies. For the column titled ‘Other’, a ‘P’ indicates that product characteristics were used in the related paper and an ‘S’ indicates that solicitation characteristics were used in the related paper.
3.2. Significance of different types of data
3.2.1. Value of purchase data
Behavioral data has gained the favor of most researchers. Very few papers use only external data for direct marketing (Bult, 1993; Bult and Wittink, 1996). Many researchers have found value in using behavioral data in various situations, such as catalog sales, online transactions, coupon promotions, etc.
(Baesens et al., 2002; Buckinx et al., 2004; Changchien et al., 2004; Chen et al., 2005; Desarbo and Ramaswamy, 1994; Gonul et al., 2000; Haughton and Oulabi, 1997; Kaefer et al., 2005; Kim and Street, 2004; Kim et al., 2005; Kwon and Moon, 2001; Levin and Zahavi, 1998; Rao and Steckel, 1995; Shin and Sohn, 2004; Van der Sheer, 1998). Rossi et al. (1996) and Kaefer et al. (2005) have studied why the combination of two types of data is popular among researchers. Rossi et al. (1996) examined the impact of demographic data, demographic data plus the record of one purchase made by a consumer, and demographic data plus the entire purchasing history of a consumer collected from point-of-sale on a direct marketing activity like target couponing. They found that purchasing histories (with an average of 13 purchase records) or even a single purchase record can increase the revenue generated from target couponing compared to blanket couponing. This is because purchase records have a greater ability to explain customers’ choice compared to demographic data and also reveal customers’ sensitivity to price. Kaefer et al. (2005) obtained better results when they used purchasing information as well as demographic information. From the above discussion, it can be concluded that purchasing records are important for direct marketing: by including them in direct marketing models, marketers can predict the choices of customers more effectively than by using only demographic data.
Due to the popularity of database technologies in business, direct marketers do not worry much about the insufficiency of behavioral data. Bar code scanning devices exist in almost every supermarket and these devices help marketers and market researchers such as A.C. Nielsen collect huge amounts of transaction data (Balasubramanian et al., 1998). Shaw et al. (2001) remarked that information technology has made large volumes of data on customers available for business use.
As an example, they reported that Wal-Mart maintained a customer database which contained 43 terabytes of data at any time. But the collection and processing
Table 1
Type of data and number of variables used in direct marketing models
Columns: References; External; RFM and transaction; Feedback; Other (P = product characteristics, S = solicitation characteristics); Number of variables. The X marks of the External, RFM and transaction, and Feedback columns are not reproduced.
References | Other | Number of variables
Baesens et al. (2002) | | 22
Baumgartner and Hruschka (2005) | | 7
Bhattacharyya (2000) | | N/A
Bitran and Mondschein (1996) | | 28
Buckinx et al. (2004) | | 98
Bult (1993) | | 7
Bult et al. (1997) | S | 7
Bult and Wansbeek (1995) | | 6
Bult and Wittink (1996) | | 3
Changchien et al. (2004) | P | N/A
Chen et al. (2005) | P | N/A
Colombo and Jiang (1999) | | 5
Cui et al. (2006) | | 361
Desarbo and Ramaswamy (1994) | P | 8
Gonul et al. (2000) | | 8
Ha et al. (2005) | | 14
Hansotia and Wang (1997) | S | N/A
Haughton and Oulabi (1997) | | 30
Heilman et al. (2003) | | 6
Kaefer et al. (2005) | | 5
Kim and Street (2004) and Kim et al. (2005) | | 93
Kwon and Moon (2001) | | 50
Levin and Zahavi (1998) | | N/A
Liao and Chen (2004) | P | N/A
Malthouse (1999) | | 161
Min and Han (2005) | P | N/A
Rao and Steckel (1995) | | N/A
Shin and Sohn (2004) | | 55
Suh et al. (1999) | | 27
Suh et al. (2004) | P | 26
Van den Poel and Buckinx (2005) | | 92
Van der Sheer (1998) | S | N/A
Viaene et al. (2001a), Viaene et al. (2001b) | | 18/25
Wedel et al. (1993) | | 6
Weng and Liu (2004) | P | 7
of these data require money and time. Should marketers use as much purchasing information as is available? Heilman et al. (2003) studied the tradeoffs between the costs of using additional purchase data to profile prospective customers and improved accuracy. They found that marketers could achieve optimal profit by using more than one purchasing observation but less than the entire purchase history because of the cost of using additional data. One important point to be noted is that purchasing information is available only after a customer has made his/her first purchase. As a result, when marketers need to acquire new customers, they have to rely only on external data. Besides, collection of purchase data needs time. Waiting for more purchase data could result in loss of marketing opportunities. 3.2.2. Value of proxy information In addition to classification by meaning (e.g., external data and purchasing records) data can be also classified according to level of detail. For example, data could be about an individual customer or customers residing in an area with same postal code. The former
Table 2
Summary of data used in direct marketing models
Object | Type of data | Significance | Accessibility | Variability
Customer | Demographic, lifestyle, socio-graphic | Low | External | High
Behavior | Transaction records, feedbacks, Web browsing log files | High | Internal, accumulating | High
Product | Size, color, price | High | Internal | Low
Solicitation | Design style | Low | Internal | Low
usually reflects perfect information about customers and is very expensive to collect. The latter, usually called postal information, reflects imperfect information and is easily available. Sometimes marketers use postal information in order to save costs and sometimes they do so because perfect information is not available. The postal information that often substitutes for individual information is called proxy information. Table 2 provides a summary of data used in quantitative models for direct marketing. The first column lists the objects which are of concern in direct marketing: customers, customer behaviors, products, and solicitations. The second column contains types of data which are used to describe those objects. The third column assesses significance of data on a relative basis and this represents the ability of data to influence the choices of customers. From the discussion in the previous sections, we believe behavioral data is very significant for direct marketing. In contrast, external data is less powerful. To the best of our knowledge no work has been done to compare the significance of product characteristics and solicitation characteristics together with customer data and behavioral data. But we believe that products’ characteristics reflect customers’ preferences and are closely related to customers’ behaviors. Further, product characteristics are important for cross-selling and up-selling activities when comparison is made between products that have been consumed by customers and products that have not. So the significance of product characteristics is marked as ‘High’ in Table 2. In contrast, solicitation characteristics can influence customers’ responses but they do not reflect customers’ preferences directly. Hence, the significance of solicitation characteristics is marked as ‘Low’ in comparison with that of product characteristics and customers’ behaviors. The column ‘Accessibility’ measures how easy it is to collect the data. 
Customers’ behavior, product, and solicitation data can be obtained by companies when there are transactions; this is internal data. On the other hand, business transactions between customers and companies seldom disclose demographic, lifestyle, or socio-graphic data of customers, so companies have to get them from third-party sources, such as market research firms. This is external data. It is worth noting that behavioral data can be accumulated over time because transactions occur repeatedly over time. The more information about customers’ behavior the company records, the better the company understands customers’ preferences. ‘Variability’ indicates the degree of variation that exists between different objects on a relative scale. In direct marketing, the number of customers that marketers handle is much higher than the number of products or solicitations. Thus, relatively speaking, we believe that the variability of products or solicitations is lower than the variability of the characteristics and behaviors of customers.
3.3. Choice of most appropriate data for direct marketing
Due to advances in databases and data capture technologies, marketers are able to record transactions and other information about customers. It is common to find a customer record in a database that contains hundreds of attributes. These attributes can all be inputs to direct marketing models as explanatory variables. However, there are three reasons for not using all of them. First, using all of them is not feasible and may introduce errors due to noise in the data or correlation between variables. Secondly, fewer attributes ease the effort of profiling customers and make customers’ preferences more understandable. The third reason is the curse of dimensionality that data mining techniques suffer from (Bellman, 1961). It influences the learning ability of machine learning techniques when they are trying to capture good patterns. As a result, it is necessary to select the most relevant variables.
In direct marketing, two types of variable selection methods are used: wrapper methods and filter methods. Wrapper methods integrate the selection process with the actual target learning technique to see whether it is appropriate to include a certain variable based on target performance. Stepwise selection methods are a type of wrapper variable selection method where each variable is evaluated sequentially to decide whether it should enter the model or not (Kim and Street, 2004; Kaefer et al., 2005; Kim, 2006; Viaene et al., 2001a; Viaene et al., 2001b). Filter methods operate independently from the target learning algorithm. Undesirable inputs are filtered out of the data before learning commences. Buckinx et al. (2004) used the Relief-F algorithm to select variables. Relief-F examines the importance of a variable according to its relevance in distinguishing classes. After using feature selection, the number of variables considered in the model is significantly reduced. Viaene et al. (2001a) selected 6 variables out of 18, Viaene et al. (2001b) selected 9 out of 25 variables, Buckinx et al. (2004) selected 17 out of 98 variables, Kim and Street (2004) selected 6–7 variables out of 93 variables, and Kaefer et al. (2005) selected 3 out of 5 variables. It can be observed from all these papers that the best model is not the model that used all the variables. Rather, the performance is much better when some less critical variables are removed by using feature selection.
3.4. Data for different direct marketing activities
Some differences in the usage of data can be observed for the various direct marketing activities. For building target selection models for existing or past customers, summarized data such as RFM is usually used. This can be obtained by aggregating detailed transaction data over a period of time. For customer profiling, the direct marketers may examine transaction records of customers to determine what products are bought by customers at the same time so that the preferences of customers can be analyzed. Data on products’ characteristics is important for cross-selling and up-selling activities because similarities between products bought by customers and products to be recommended to customers can be determined based on this information (Weng and Liu, 2004; Li et al., 2005). In contrast, customer profiling and target selection need to know the types of products instead of detailed product characteristics such as size or color. Direct marketing strategy involves the planning of multiple solicitation activities. As a result, the data for strategy planning needs to include temporal information. For example, it is necessary to calculate RFM information for each stage of solicitation in order to appropriately train the models (Bitran and Mondschein, 1996).
4. Processing with direct marketing models
4.1. Methods for selection of targets for direct marketing
Tables 3–5 list the various techniques commonly used by researchers. The techniques listed in Table 3 are traditional statistical techniques. They include linear regression, Logit/Probit, Tobit, Beta-binomial model, Gamma-Poisson model, and discriminant
I. Bose, X. Chen / European Journal of Operational Research 195 (2009) 1–16
Table 3 Basic statistical techniques
Techniques | Type of response | Type of score | References
Linear regression | Response interest | Continuous | Levin and Zahavi (1998)
Linear regression | Revenues | Continuous | Malthouse (1999)
Logit/Probit | Binary choice | Binary | Bodapati and Gupta (2004), Bult et al. (1997), Bult and Wansbeek (1995), Hansotia and Wang (1997), Levin and Zahavi (1998), Van den Poel and Buckinx (2005)
Logit/Probit | Number of products | Integer | Levin and Zahavi (1998)
Logit/Probit | Categorical choice | Integer | Heilman et al. (2003)
Tobit | Revenues | Continuous | Hansotia and Wang (1997), Levin and Zahavi (1998)
Beta/Gamma | Binary choice | Binary | Rao and Steckel (1995)
Discriminant analysis | Binary choice | Binary | Bult (1993), Bult and Wittink (1996)
Table 4 Advanced statistical techniques
Techniques | Type of response | Type of score | References
Two stage Beta + Gamma | Binary choice and Revenues | Continuous | Colombo and Jiang (1999)
Two stage Logit + Linear | Binary choice and Revenues | Continuous | Levin and Zahavi (1998), Van der Sheer (1998)
Two stage Probit + Non-linear | Binary choice and Revenues | Continuous | Baumgartner and Hruschka (2005)
Latent class model Logit | Revenues | Continuous | Gonul et al. (2000)
Latent class model Logit | Binary choice | Continuous | Desarbo and Ramaswamy (1994)
Latent class model Poisson | Number of products | Integer | Wedel et al. (1993)
Latent class model Probit | Binary choice | Binary | Bitran and Mondschein (1996)
Table 5 Machine learning techniques
Techniques | Type of response | Type of score | References
ANN | Response probability | Continuous | Kim and Street (2004), Kim et al. (2005), Shin and Sohn (2004)
ANN | Binary choice | Binary | Ha et al. (2005), Kaefer et al. (2005), Viaene et al. (2001a), Zahavi and Levin (1997)
ANN | Categorical choice | Integer | Heilman et al. (2003)
Bayesian ANN | Binary choice | Binary | Baesens et al. (2002)
CHAID/CART | Binary choice | Binary | Haughton and Oulabi (1997)
DT and Naive Bayes | Response probability | Continuous | Ling and Li (1998)
DT | Binary choice | Binary | Buckinx et al. (2004)
LS-SVM | Binary choice | Binary | Viaene et al. (2001b)
GP | Binary choice | Binary | Kwon and Moon (2001)
GA | Revenue | Continuous | Bhattacharyya (1999)
GA and GP | Binary choice and revenue | Binary and continuous | Bhattacharyya (2000)
Hybrid f(ANN + Logit + RFM) | Response probability | Continuous | Suh et al. (1999)
Hybrid f(ANN + DT + Logit) | Response probability | Continuous | Suh et al. (2004)
Hybrid f(BBN + GP) | Response probability | Continuous | Cui et al. (2006)
analysis. Table 4 includes advanced statistical techniques that consist of two-stage models or latent class models. The techniques listed in Table 5 are data mining techniques based on machine learning. They are artificial neural networks (ANN), genetic algorithms (GA), genetic programming (GP), decision trees (DT), and support vector machines (SVM).
4.1.1. Basic statistical techniques
Regression models are the most commonly used techniques. They are simple and have good explanatory ability: the coefficients of regression models represent the degree of influence of explanatory variables. Linear regression, Logit, Probit, and Tobit models are all regression models. Linear regression models can generate continuous scores like response interest (Levin and Zahavi, 1998) and the estimated amount of spending of a customer (Malthouse, 1999). When selecting target customers based on the scores given by a linear regression model, a threshold is usually set beforehand. If the score of a customer is greater than the
threshold, he/she is selected. The percentage of customers that are selected is referred to as the response rate of the regression model. Logit, Probit, and Tobit models are different from linear regression models in that they can deal with discrete response directly by using latent variables. Logit and Probit models can produce a binary choice score (Bult et al., 1997; Bult and Wansbeek, 1995; Hansotia and Wang, 1997; Levin and Zahavi, 1998; Van den Poel and Buckinx, 2005) or categorical values if there are more than two choices (Heilman et al., 2003). Bodapati and Gupta (2004) termed the way Logit/Probit models predicted discretized response as direct approach and that used by linear regression models indirect approach. According to them, the direct approach achieved better predictive performance for large samples because the bias was less for the direct approach. Latent variables are used in Tobit models to truncate values that are greater or lesser than a threshold. For example, in direct marketing, customers either generate revenues larger than zero or no revenue at all and so negative
values generated by models do not have any meaning. Tobit models convert negative values to zero and keep only the positive values. Tobit models usually model items with continuous values such as the amount of money spent by a customer (Hansotia and Wang, 1997; Levin and Zahavi, 1998). Other types of statistical techniques are also used for direct marketing, such as discriminant analysis (Bult, 1993; Bult and Wittink, 1996), which sometimes gives more accurate results than regression, and Beta-binomial models (Rao and Steckel, 1995). However, different statistical techniques require different assumptions. Violation of those assumptions can cause inaccurate estimation of the parameters of those models, which in turn results in inaccurate predictions.
4.1.2. Advanced statistical techniques
Some more advanced statistical models have also been used. These advanced models combine two simple statistical models in order to leverage the strengths of each model while overcoming the weaknesses of both. Two stage models estimate the probability of response in the first stage and, in the second stage, the monetary value a customer might spend in response to direct marketing activities. The score developed by a two stage model is the expected revenue the customer might generate. Colombo and Jiang (1999) used a beta distribution model to deal with the probability of customer response and a gamma distribution model to deal with monetary value. Levin and Zahavi (1998) and Van der Sheer (1998) used Logit models to estimate customer response and linear regression models to estimate monetary value. Baumgartner and Hruschka (2005) used a Probit model to distinguish between customers who had made a purchase and those who had made a purchase and sent back the goods. They followed this analysis with nonlinear regression models to estimate the monetary value of the purchased products and the returned products of customers.
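As a rough illustration of how a two stage score can be formed, the sketch below multiplies a first-stage response probability by a second-stage spending estimate to obtain an expected revenue. It assumes a Logit first stage and a linear second stage. The function names, the coefficient values, the customer attributes, and the flooring of negative spending estimates at zero are assumptions made for this example and are not taken from any of the cited studies.

```python
import math


def response_probability(features, weights, bias):
    """Stage 1 (assumed Logit form): probability that the customer responds."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def conditional_spend(features, coefs, intercept):
    """Stage 2 (assumed linear form): spending estimate given a response.

    Negative estimates are floored at zero (an assumption of this sketch).
    """
    return max(0.0, intercept + sum(c * x for c, x in zip(coefs, features)))


def two_stage_score(features, logit_params, linear_params):
    """Expected revenue = P(response) * E[spend | response]."""
    weights, bias = logit_params
    coefs, intercept = linear_params
    p = response_probability(features, weights, bias)
    return p * conditional_spend(features, coefs, intercept)


# Hypothetical customer described by (recency in months, number of past orders).
customer = [2.0, 5.0]
score = two_stage_score(
    customer,
    logit_params=([-0.3, 0.4], -1.0),  # illustrative coefficients only
    linear_params=([0.0, 6.0], 10.0),  # illustrative coefficients only
)
print(round(score, 2))
```

Ranking customers by this product, rather than by response probability alone, favors customers who are both likely to respond and likely to spend more once they do.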
A more complicated technique like latent class analysis is used when segments exist among customers. Latent class analysis performs segmentation and builds selection models within segments simultaneously. The benefit of latent class analysis is that it can derive market segments and can model customers’ response at the same time. Desarbo and Ramaswamy (1994) used latent class analysis which built Logit models with each segment to solve binary response problems. Wedel et al. (1993) also used latent class analysis to build a Poisson regression model within each segment for estimation of the quantity of products a customer might buy. Latent class analysis can be used to determine customers’ response as well as time of response. Gonul et al. (2000) included the Logit model in their latent class analysis model and studied the impact of timing of solicitations on customers’ response. Bitran and Mondschein (1996) used Probit models in the latent class analysis model to estimate LTV of customers over multiple solicitations. 4.1.3. Machine learning techniques Another group of quantitative models is based on machine learning. According to Mitchell (1997), ‘‘Machine learning is a process in which computer programs can learn to improve their performance from experience of doing certain tasks”. Kantardzic (2003) differentiated the statistical and machine learning based approaches in two ways. Firstly, statistical models put more emphasis on mathematics and formalizations compared to machine learning based approaches. Secondly, modern statistics is almost entirely driven by the postulated structure, or an approximation to a structure, which could have led to the data. In contrast, machine learning tends to emphasize algorithms that are usually in the form of ‘learning’ processes that extract rules from data. The various machine learning techniques used for direct marketing are listed in Table 5.
Among these techniques listed in Table 5 ANN is good at learning non-linear relationships between the input and output. DT is a rule-based classification technique. SVM can perform either linear or nonlinear classifications (Burges, 1998). The most popular application of those techniques in direct marketing is in binary classification of customers. The applications of ANN for determining target customers can be found in Kim and Street (2004), Kim et al. (2005), Viaene et al. (2001a), and Zahavi and Levin (1997). Buckinx et al. (2004) and Ling and Li (1998) used DT to select customers. Viaene et al. (2001b) used least squares support vector machine (LS-SVM is a type of SVM (Suykens et al., 2002)) for binary classification of responders and non-responders of postal mail direct marketing. ANN and DT have been used for classification with more than two classes. For example, Heilman et al. (2003) used ANN to classify customers into three classes in order to identify the brands that they were loyal to; Haughton and Oulabi (1997) used DT techniques to produce treelike segmentation of customers in direct marketing. ANN is the most popular data mining technique used in direct marketing. Besides customer selection, it is also used for other purposes. Shin and Sohn (2004) incorporated data envelopment analysis techniques with ANN to evaluate the profitability of each customer. Kaefer et al. (2005) used ANN to classify customers in such a way that the timing of direct marketing improved. DT has strong ability to deal with categorical input variables. When processing categorical data with more than two levels of value, ANN, SVM, and statistical techniques create dummy variables for each level of value of the related input variable, and this adds a computational burden to the processing of those models. In contrast, DT can derive rules by making use of categorical data directly without creating dummy variables. 
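To make the dummy variable step concrete, the following minimal sketch expands one categorical input into one 0/1 indicator per level. The function name and the example channel values are hypothetical. In practice many statistical models drop one level as a reference category, which this sketch does not do.

```python
def dummy_encode(values):
    """Expand a categorical variable into one 0/1 indicator per level."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical categorical input: preferred contact channel of four customers.
channels = ["mail", "phone", "email", "mail"]
levels, encoded = dummy_encode(channels)
for channel, row in zip(channels, encoded):
    print(channel, dict(zip(levels, row)))
```

A variable with k levels becomes k input columns here, which is the source of the additional computational burden mentioned above.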
However, DT cannot use continuous variables directly and has to convert them into categorical data. The fourth machine learning technique that is gaining popularity in direct marketing, and that is quite different from the previously introduced techniques, is the family of approaches simulating natural evolutionary processes such as Genetic Algorithms (GA), Genetic Programming (GP), and Evolutionary Programming (EP). These techniques are based on a search procedure that simulates natural selection and evolution and includes the following steps: selection, crossover, and mutation (Goldberg, 1989). GA, GP, and EP are suitable for optimization problems such as selection of solicitation targets while satisfying business requirements. Kwon and Moon (2001) proposed GP models for the target selection problem for email based direct marketing. Bhattacharyya (1999) and Bhattacharyya (2000) proposed GA and GP based approaches, respectively, to search for best solutions for direct marketing which maximized cumulative response rate and revenue at different mailing depths. Mailing depth refers to the top percentage of customers considered for the solicitation process. For example, 10% mailing depth means the top 10% of customers are used for solicitation. In mailing depth analysis, individual customers are ranked in descending order according to a certain criterion such as response probability or revenue.
4.1.4. Ensemble and hybrid techniques
Usually, a data analysis technique can generate more than one possible solution because of different distributions of training data. One approach to dealing with this is to evaluate the performance of each model and to select the best one. Another approach is to use the ensemble of all possible models. The ensemble approach tries to combine the results of all possible models based on certain principles. There are two basic ensemble approaches used in direct marketing: bagging and boosting.
The bagging approach replicates training sets by sampling them and combines the results of all solutions using averaging or majority voting. The boosting
approach assigns weights to the instances in the training set. The higher the weight, the larger the influence an instance has on the learning process. The weights are adjusted iteratively over trials. Bagging and boosting usually show good performance on training data because they combine multiple solutions which individually may exhibit some bias. Ha et al. (2005) used bagging ANN to predict whether a customer will respond to a direct marketing solicitation. Ling and Li (1998) used boosting DT for the same problem. Kim and Street (2004) used an ensemble approach that combined the above approaches. The ensemble techniques described above make use of one specific technique. However, besides combining the results of the same technique, different techniques can also be combined using a hybrid approach with the purpose of eliminating the bias of single techniques. Suh et al. (1999) analyzed the correlation coefficients among predicted response probabilities obtained from ANN, Logit, and a targeting approach that used the RFM attributes of customers for segmentation. It was found that the correlation coefficient between the results of ANN and those of RFM targeting and that between the results of RFM targeting and Logit were low, whereas the correlation coefficient between the results of ANN and Logit was high. Subsequently, it was found that the combination of ANN and RFM in a single model achieved improved classification accuracy whereas the combination of RFM and Logit could not do so. Zahavi and Levin (1997) proposed a Best Double-Scoring (BDS) method to combine the scores of an ANN and a Logit model for each customer in direct marketing using functions such as Min [A,B], Max [A,B], and A + B, where A and B represented the scores received by customers from two different scoring techniques. Suh et al. (1999) adopted and modified BDS in their work for customer list segmentation in direct marketing. Besides ANN and Logit models, Suh et al.
(1999) also used RFM based approach in their hybrid models. They modified BDS so that it could combine three different techniques, namely ANN, Logit, and RFM. Lin and McClean (2001) proposed a hybrid approach that used accuracy as weight for each binary classifier to calculate a combined score for prediction of failure of a company. Suh et al. (2004) adapted Lin and McClean’s approach for real time direct Web marketing. The techniques used by Suh et al. (2004) were ANN, Logit, and DT. Cui et al. (2006) developed another type of hybrid approach. The techniques used in the research included Bayesian Belief Networks (BBN) and GP. First a directed acyclic graph was constructed that linked the variables to examine causal relations between them. Then the model was evaluated using Minimum Description-Length (MDL) (Rissanen, 1978). The MDL was then used as the fitness measure to help GP find the best BBN topology. The work done by Cui et al. (2006) was different from other hybrid approaches because instead of scoring customers separately using each technique and combining the results, they used GP to help construct better BBN model in an iterative manner.
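The score-combination ideas described above can be sketched in a few lines. The first function applies the Min, Max, and A + B rules mentioned for BDS to two scores; the second forms an accuracy-weighted combination in the spirit of the approach attributed to Lin and McClean (2001). The function names, the example scores, and the normalization by the sum of accuracies are assumptions of this sketch and are not the published procedures.

```python
def combine_two_scores(score_a, score_b, rule="sum"):
    """Combine two customer scores with a Min, Max, or A + B rule."""
    rules = {
        "min": min,
        "max": max,
        "sum": lambda a, b: a + b,
    }
    if rule not in rules:
        raise ValueError("unknown rule: " + rule)
    return rules[rule](score_a, score_b)


def accuracy_weighted_score(scores, accuracies):
    """Weight each classifier's score by its accuracy.

    Dividing by the sum of accuracies keeps the result on the same scale
    as the inputs (an assumption of this sketch).
    """
    if len(scores) != len(accuracies) or not scores:
        raise ValueError("scores and accuracies must be non-empty and aligned")
    total = sum(accuracies)
    return sum(s * w for s, w in zip(scores, accuracies)) / total


# Hypothetical response scores for one customer from ANN, Logit, and DT.
scores = [0.62, 0.55, 0.40]
accuracies = [0.80, 0.75, 0.70]  # hypothetical validation accuracies
print(combine_two_scores(scores[0], scores[1], rule="min"))
print(round(accuracy_weighted_score(scores, accuracies), 3))
```

Both functions score each customer separately with every technique and merge the results afterwards, which is the pattern that the BBN and GP approach of Cui et al. (2006) departs from.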
4.2. Methods for customer clustering and pattern recognition
Customer clustering aims to classify customers into different groups. Customers within the same group are more similar to each other than to customers in different groups. Various approaches have been used for profiling. One type of approach selected a certain cutoff for independent variables so that the resulting segmentation could have the highest gain in profit. One example of this approach was RFM profiling. Jonker et al. (2004) and Bitran and Mondschein (1996) classified customers using RFM values. Customers were first divided into groups based on their R value. Within each group, customers were subdivided according to their F value and then further subdivided based on their M value. The second type of profiling approach applied statistical models such as latent class analysis. Further profiling was done by constructing regression models that used the segment of customers as dependent variable and customers’ characteristics as independent variables (Desarbo and Ramaswamy, 1994). The third type of profiling approach involved the use of unsupervised data mining techniques such as K-means clustering and Self Organizing Map (SOM). Min and Han (2005) used SOM to cluster customers with similar interests at different periods of time. SOM consists of a set of artificial neurons whose weights are adjusted to match input vectors in a training set (Kohonen, 1995). Weng and Liu (2004) used a two-stage clustering approach which integrated SOM and K-means clustering. The K-means technique classified customers into a specified number of clusters while SOM could decide the number of clusters automatically. In the two stage model, the number of clusters generated by SOM was used as an input to the K-means algorithm. The results of their experiments showed that two stage clustering had higher concentration of customers (measured by the coefficient of variation of intra cluster dispersion) in a cluster than SOM clustering. Customer pattern recognition is used to discover correlations of events which can be represented as probabilistic rules within each cluster of customers. Association rule based techniques are often used to find customers’ behavior patterns. Association rule mining was first proposed by Agrawal and Srikant (1994) and was used to discover the correlations of events that could be represented as probabilistic rules. In direct marketing, the events could be customers’ characteristics or customers’ purchases of certain related products (Changchien et al., 2004; Chen et al., 2005). Chen et al. (2005) used association rules mining to analyze the correlations between demographic characteristics of customers and their purchasing behavior represented by RFM data at different periods of time in order to see variations in behavior.
Customers’ Web browsing behavior could also act as important events for association rule mining (Liao and Chen, 2004; Suh et al., 2004). Suh et al. (2004) used association rules to find patterns in customers’ Web log files, which recorded several items related to Web browsing actions of customers at an e-commerce Web site, such as pages requested, timestamps, IP addresses, URLs, and orders placed, and related those patterns to customers’ purchases. Two types of pattern were found. First, some important pages were identified. Those important pages appeared frequently in customers’ visiting paths and ended with the customers’ purchase of certain products. The second pattern was related to the sequence of Web page visits. These patterns were used as inputs to a hybrid model that consisted of three data mining techniques (DT, NN, and LR) to predict customers’ purchase probability. The hybrid model determined the correlations between customers’ browsing behaviors and purchasing behaviors. Identification of these patterns helped marketers by identifying meaningful categories of products and target customers.
4.3. Methods for cross-selling and up-selling
To enable cross-selling and up-selling, recommendation systems are often used. There are mainly two kinds of recommendation approaches: content based (Lang, 1995) and collaborative filtering (Shardanand and Maes, 1995). The content based approach measures the degree of similarity between candidate items and items included in users’ profiles. The collaborative filtering approach tries to find similarities between users’ preferences. In the context of direct marketing, the item is the product or products’ characteristics. Users’ profiles and preferences are found by analyzing customers’ transaction records. Products which are similar to the customers’ profile or are bought by other customers having a similar profile are usually recommended. Thus, recommendation approaches require both customer profiling and product profiling. Product profiling is similar to customer profiling. Products are described by items representing different characteristics and then
these items are clustered. The clustering techniques used are the same as those used for customer clustering and described in the previous section. Cheung et al. (2003) used SVM in their content based approach to decide what products are to be offered to a customer. Min and Han (2005) used collaborative filtering for developing their recommendation system. They identified which cluster a customer belonged to in different time periods in order to find time-varying profiles of customers. Li et al. (2005) conducted collaborative filtering analysis for both customers’ transaction records and product characteristics in order to carry out recommendations. Content based approaches and collaborative filtering have their own advantages and disadvantages. A content based approach can recommend products that suit customers’ taste but it cannot process multimedia information and cannot recommend products with distinct characteristics that are different from the products existing in customers’ transaction records. A collaborative filtering approach recommends products with distinct characteristics but cannot recommend products that don’t exist in transaction records of any customers within the same cluster. Therefore, hybrid approaches that integrate content based and collaborative filtering approaches have been used to improve the performance of customer profiling and product profiling (Changchien et al., 2004; Weng and Liu, 2004). Customers’ preferences towards certain products and similarities between customers are calculated at the same time by comparing between customers’ profiles and products’ profiles and between different customers’ profiles. By comparing customers’ profile and products’ profile, products that had not existed in any transaction records of customers can be recommended. By comparing different customers’ profiles, products bought by some customers can be recommended to others who have a similar profile and who have not purchased these products as yet. 4.4. 
Methods for optimization of business requirements of marketers
Mathematical programming and evolutionary algorithms are usually used for optimization in direct marketing. They take into account profit or cost parameters for estimation of model related coefficients. Bult and Wansbeek (1995) and Van der Sheer (1998) proposed a profit maximization approach in which profit and cost factors are included in the estimation function for Logit models. Prinzie and Van den Poel (2005) proposed a model with mailing depth constraints that reflected financial constraints. They modified ordinary binary Logit by using Weighted Maximum-Likelihood estimation which ranked customers according to response probabilities. By doing this, cost-effectiveness was taken into consideration when estimating regression coefficients. Bult (1993) and Bult and Wittink (1996) included cost factors in score estimation using discriminant analysis. They studied symmetric and asymmetric loss situations. Symmetric loss problems assume that the cost of misclassification of a customer that would have responded to the mail solicitation but did not receive the mailing (false negatives) is the same as the cost of misclassification of a customer that received the mail solicitation but did not respond (false positives). Asymmetric loss problems consider the costs of the two types of losses to be different. If solicitations are sent to people who are not interested in buying the products, the loss incurred by the marketers includes the postal expenses. However, if people who actually want to buy the products do not receive solicitations, the loss to the marketers is the potential revenue those people would have contributed had they received solicitations. Typically in real cases of direct marketing, the loss due to false negatives is much larger than the loss due to false positives. Mathematical programming consists of an objective function and a set of constraints.
The results from selection models such as the ones we discussed in the previous section are used as inputs
to the mathematical programming models as possible solutions (Cohen, 2004). Using the input, the mathematical program selects the best solution that can optimize the objective function while satisfying the constraints. Cohen (2004) used mathematical programming models to decide what types of products should be sent to what type of customers. The objective was profit maximization and the constraints included available amounts of products and limits on available monetary resources, among others. A group of studies used stochastic programming models to deal with multiple solicitation problems in direct marketing because direct marketing is not a one time activity. In a multi-solicitation problem, the fundamental problem still revolves around selection of customers. However, it is not reasonable to abandon a customer if (s)he does not respond to a single solicitation because (s)he may respond to later ones. Thus, marketers have to use different criteria to evaluate target customers. LTV of a customer is often used as the criteria in multi-solicitation problems, which, according to the definition provided by Pearson (1994), is ‘‘the net present value of the stream of contributions to profit resulting from the revenues from customer transactions and allowing for the costs of delivering products, services and promised rewards to the customer”. Stochastic programming models are used to evaluate LTV of customers by modeling customers’ evolution through multiple runs of solicitations. Bitran and Mondschein (1996) and Gonul and Shi (1998) used Markov chain models to represent the evolution of customers’ status characterized by their RFM values. While Bitran and Mondschein (1996) had only one objective function for profit maximization, Gonul and Shi (1998) included two objective functions. 
One of them tried to maximize profit for direct marketers and another tried to maximize utility for customers by capturing customers’ change in behavior resulting from changes in mailing policies. Piersma and Jonker (2004) also used Markov decision models to model the multi-solicitation problem. The difference is that they focused on the problem of optimizing the long-term frequency of mailing and allowed multiple responses to multi-mailings. One limitation of Gonul and Shi’s work was that the problem solving process involved Maximum-Likelihood estimation which resulted in high computational cost. In contrast, Simester et al. (2006) estimated customers’ state evolution directly from historical data. They constructed the state space and segmented the state space using binary tree like methods. To determine the optimal mailing policy for each state, they estimated the profit and transition probability by observing average rewards in historical data and the proportion of times transition occurred. The advantage of that approach was that no functional form assumption was imposed on the problem. Also, they tracked customers’ purchases with reference to different periods of mailing instead of looking for specific catalogs that customers ordered and this made the model even more realistic. Van der Sheer (1998) studied the problem of mailing frequency from another point of view. He decomposed the purchasing behavior of customers according to three key components: inter-purchase time, purchase acceleration, and direct marketing product choice. Inter-purchase time measured the time lapse between consecutive purchases. Purchase acceleration considered the fact that customers might accelerate their purchase timing due to direct marketing solicitations. Direct marketing product choice examined whether a customer made purchases through direct marketers or at a store. 
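The idea of estimating customers’ state evolution directly from historical data, as described above for Simester et al. (2006), can be illustrated with a small sketch: average rewards and transition proportions are computed for every observed combination of state and mailing action. The record layout, the state and action labels, and the function name are hypothetical, and the sketch does not reproduce the authors’ construction or segmentation of the state space.

```python
from collections import defaultdict


def estimate_from_history(records):
    """Estimate mean reward and transition proportions per (state, action).

    records: iterable of (state, action, reward, next_state) tuples.
    """
    visits = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))

    for state, action, reward, next_state in records:
        key = (state, action)
        visits[key] += 1
        reward_sum[key] += reward
        next_counts[key][next_state] += 1

    # Average observed reward for each (state, action) pair.
    mean_reward = {key: reward_sum[key] / visits[key] for key in visits}
    # Proportion of times each transition occurred.
    transition = {
        key: {nxt: n / visits[key] for nxt, n in counts.items()}
        for key, counts in next_counts.items()
    }
    return mean_reward, transition


# Hypothetical mailing history: (state, action, profit, next state).
history = [
    ("recent", "mail", 20.0, "recent"),
    ("recent", "mail", 0.0, "lapsed"),
    ("lapsed", "mail", 0.0, "lapsed"),
    ("lapsed", "no_mail", 0.0, "lapsed"),
]
mean_reward, transition = estimate_from_history(history)
print(mean_reward[("recent", "mail")])
print(transition[("recent", "mail")])
```

Because the estimates are simple averages and proportions, no functional form is imposed, but state and action pairs that are rarely observed in the history will have noisy estimates.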
Evolutionary algorithms incorporate target selection in the process of parameter optimization through genetic algorithms and similar techniques. The business objectives and constraints are defined as fitness functions in the process. Bhattacharyya (2000) used both genetic algorithms and genetic programming techniques to search for best solutions along the Pareto-frontier (that consisted of a set of solutions) for direct marketing activities. This solution represented maximum cumulative response rate and revenue at
the corresponding mailing depth. Kim and Street (2004) presented a combined GA/ANN approach to select target customers while achieving two principal goals at a specific target point: model interpretability and predictive accuracy. ANN was used to score customers and GA was used to search for possible combinations of features as inputs to ANN which could optimize multiple objectives. Kim et al. (2005) used a similar approach as Kim and Street (2004). The main difference was that they used Evolutionary Local Selection Algorithm (ELSA) approach, an improved version of standard GA. ELSA performed better than standard GA because it searched the Pareto-frontier completely, and gave decision makers more information on trade-offs among different objectives. For the first approach, profit and costs can only be included in the estimation process as factors and it was not possible to model multiple objectives. Among the three approaches for optimization in direct marketing research, mathematical programming is the most flexible because it can incorporate business objectives and constraints. Evolutionary algorithms can optimize multiple objectives but no research has been done to show their ability to incorporate various business related constraints. To define constraints as fitness functions might be one solution but a problem may arise when there are a lot of constraints. 5. Evaluation of direct marketing models There are various performance evaluation criteria for judging the output of direct marketing models. The process of model building and evaluation for direct marketing involves two steps, training and testing. Two different data sets are used in the two steps. The training data set is used for building the direct marketing model and the validation data set with unknown outcomes is used for checking the performance of the predictive direct marketing model. 
In this section we focus on the criteria that are used to evaluate direct marketing models from a managerial perspective, and the criteria used are accuracy and profitability. If no specific information is given, we are discussing the performance of direct marketing models on validation data sets.
5.1. Determination of accuracy of direct marketing models
When the accuracy of the models is evaluated, a confusion matrix such as the one shown in Table 6 is produced. In Table 6, ‘Actual’ stands for the value in real situations whereas ‘Prediction’ represents the outcome of direct marketing models. From the confusion matrix, several accuracy measures can be developed. A in the confusion matrix is sometimes referred to as true positive and it represents the number of correctly identified actual responders. B is known as false negative or the number of actual responders identified as non-responders. C is referred to as false positive which represents the number of actual non-responders identified as responders. D is called true negative or the number of correctly identified actual non-responders. One measure of accuracy is the ratio of correct identification, or PCC (Percentage Correctly Classified instances), which is defined as the ratio of A + D (sum of diagonal) over the total (i.e. A + B + C + D).
Table 6 Confusion matrix
 | Prediction: Positive | Prediction: Negative | Sum
Actual: Positive | A | B | A + B
Actual: Negative | C | D | C + D
Sum | A + C | B + D | A + B + C + D
There are several studies that use a confusion matrix to evaluate models or compare between different models. Kaefer et al. (2005)
observed higher accuracy for both ANN and Logit models using purchase information in addition to demographic information. Their results showed that ANN performed better than Logit with an average classification advantage of 21.5% over 20 purchases. Suh et al. (2004) compared the misclassification rates of DT, ANN, LR, and a hybrid model of the three. The result showed the hybrid model had the lowest misclassification rate. A problem that can occur when using PCC as a measure of accuracy is that it does not say whether the model is able to determine true positives and true negatives equally well. For example, Gonul et al. (2000) showed that their model had PCC of 81%, but the ratio of true positive instances to total was less than 15%. This implied that their model was much better at identifying true negatives. However, only the knowledge of PCC could not reflect that. A typical measure of classification accuracy like PCC assumes symmetric misclassification costs for false positive and false negative predictions. However, in direct marketing, the cost of misclassification is usually asymmetric. To deal with this problem, some researchers have included a coefficient in their model that represented the ratio of two kinds of misclassification costs and made the loss function more general (Bult, 1993; Bult and Wittink, 1996). Bult (1993) compared a discriminant analysis model with a Logit model and showed that for both models asymmetric loss cases could cost less money than symmetric loss cases. Bult and Wittink (1996) further accommodated heterogeneity into the asymmetric loss function. They reported that the accuracy of the heterogeneous asymmetric loss function model was not significantly higher but the heterogeneous asymmetric loss function model was more capable of identifying customers who actually responded. 
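The point about PCC hiding the split between true positives and true negatives, and about asymmetric misclassification costs, can be illustrated with a minimal sketch. The cell names follow Table 6; the class name `ConfusionMatrix`, the counts, and the cost figures are hypothetical and chosen only for illustration, and the cost function is a simple weighted sum rather than the loss function of Bult (1993).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts named as in Table 6 (hypothetical helper class)."""
    a: int  # true positives: responders predicted as responders
    b: int  # false negatives: responders predicted as non-responders
    c: int  # false positives: non-responders predicted as responders
    d: int  # true negatives: non-responders predicted as non-responders

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d

    def pcc(self) -> float:
        """Percentage correctly classified: (A + D) / (A + B + C + D)."""
        return (self.a + self.d) / self.total

    def true_positive_share(self) -> float:
        """Share of all instances that are correctly identified responders."""
        return self.a / self.total

    def misclassification_cost(self, fn_cost: float, fp_cost: float) -> float:
        """Total cost when the two error types are priced separately."""
        return self.b * fn_cost + self.c * fp_cost


# Hypothetical counts for 1,000 customers (not taken from any cited study).
cm = ConfusionMatrix(a=60, b=90, c=100, d=750)
print(f"PCC                 = {cm.pcc():.2f}")
print(f"True positive share = {cm.true_positive_share():.2f}")
# Symmetric costs treat a missed responder like a wasted mailing.
print(f"Symmetric cost      = {cm.misclassification_cost(1.0, 1.0):.0f}")
# Assumed asymmetry: a missed responder costs ten times a wasted mailing.
print(f"Asymmetric cost     = {cm.misclassification_cost(10.0, 1.0):.0f}")
```

With these illustrative counts the PCC is high because of the large number of true negatives, while the true positive share stays small, which is the pattern described above for Gonul et al. (2000).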
Levin and Zahavi (1998) observed that the linear regression model tended to ‘over-predict’, which resulted in false positives, whereas the Tobit model tended to ‘under-predict’, which resulted in false negatives. The question still remains: what are the best criteria for comparing quantitative models? In general, two measures, precision and recall, are used. Precision, which is the ratio of A to A + C, measures the percentage of true positives among all predicted positives. Recall, the ratio of A to A + B, measures the percentage of all actual positives that are correctly identified. Precision, which is also known as response rate in direct marketing, is usually given more importance since solicitation cost is usually much smaller than profit generated by customers. Response rate is a popular measure that is used in traditional direct mail marketing applications (Haughton and Oulabi, 1997; Levin and Zahavi, 1998). Haughton and Oulabi (1997) compared the performance of a CHAID model and a CART model in terms of response rate. Their results showed that the performance of the two models was remarkably similar. Levin and Zahavi (1998) compared the response rate of a Logit model, a Tobit model, a linear regression model, and a two stage model. Their results showed that the Logit model performed the best. Response rate can also be used to evaluate models developed exclusively for new direct marketing channels. Kwon and Moon (2001) used response rate as the measure of performance for direct email marketing. Their model had a 20% higher response rate compared to random targeting on average. Some researchers have used response rate to evaluate the performance of models for multiple solicitation problems. Piersma and Jonker (2004) compared the performance of their Markov decision model that considered mailing frequency with the model proposed by Bitran and Mondschein (1996) and found that their model’s response rates were not high enough.
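The two ratios built from the cells of Table 6 can be computed as in the sketch below. It follows the usual information-retrieval convention, in which A / (A + C) is called precision and A / (A + B) is called recall; the response rate of a mailing corresponds to A / (A + C). The function names and the counts are assumptions made for this illustration.

```python
def precision(a: int, c: int) -> float:
    """A / (A + C): share of predicted responders who actually respond.

    In a mailing this is the response rate among the customers targeted.
    """
    if a + c == 0:
        raise ValueError("no customers were predicted as responders")
    return a / (a + c)


def recall(a: int, b: int) -> float:
    """A / (A + B): share of actual responders that the model identifies."""
    if a + b == 0:
        raise ValueError("there are no actual responders")
    return a / (a + b)


# Hypothetical campaign: 160 customers mailed (A + C), 60 of them respond (A),
# and 90 customers who would have responded were not mailed (B).
a, b, c = 60, 90, 100
print(f"precision / response rate = {precision(a, c):.3f}")
print(f"recall                    = {recall(a, b):.3f}")
```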
A relatively new use of response rate can be found in the evaluation of recommendation systems. Weng and Liu (2004) used response rate as the measure to compare their recommendation methods, which combined customer profile and customer cluster profile models, with the original method using customer profile on their own. Response
rate was defined by the authors as the ratio of the number of products bought by the customers to the number of products recommended to customers. The results showed that the combined method exhibited highest values for recall and precision. Response rate can also be measured in terms of cumulative gains charts that are also known as lift curves or ‘banana charts’. A typical cumulative gains chart is shown in Fig. 2. The x-axis represents deciles while the y-axis represents response rate. This chart plots cumulative response rate over deciles of target customers. The higher the curve the better is the model in terms of response rate. Ha et al. (2005) reported that ANN models outperformed Logit models in terms of lift curve of response rate and bagging ANN models outperformed single ANN models. Ling and Li (1998) illustrated the performance of boosted Naive Bayes and boosted DT in terms of response rate and lift curve. It was found that over-sampling did not affect the performance of Naive Bayes but increased the response rate of DT. One problem that is encountered with respect to lift curves is that it is difficult to tell whether one lift curve is higher than another because two curves can intersect and this implies that at some deciles one model has higher response rate than the other but the relationship reverses at other deciles. Thus it is more meaningful to compare cumulative response rate at different deciles. Prinzie and Van den Poel (2005) compared their constrained optimization approach with the unconstrained approach and observed that the constrained model outperformed the unconstrained one up to a mailing depth of 48%. Kim and Street (2004) exhibited the performance of their ensemble (GA/ANN) model using a lift curve and demonstrated that the ensemble model had best performance at five different target points and second best performance at one particular target point. 
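A cumulative gains chart of the kind described above can be tabulated with a few lines of code. The sketch below is only an illustration under stated assumptions: customers are ranked by a model score, cut into ten groups of nearly equal size, and the cumulative response rate is reported for each mailing depth. Dividing by the overall response rate gives the ratio to what random targeting would be expected to achieve. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def cumulative_response_rates(scores: Sequence[float],
                              responded: Sequence[int],
                              n_bins: int = 10) -> List[float]:
    """Cumulative response rate after targeting the top 1..n_bins groups.

    Customers are ranked by score (highest first). Entry k is the response
    rate among all customers in the top k + 1 groups.
    """
    if len(scores) != len(responded) or len(scores) == 0:
        raise ValueError("scores and responded must be non-empty and aligned")
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    rates = []
    for k in range(1, n_bins + 1):
        cutoff = max(1, round(k * n / n_bins))  # customers mailed at depth k
        rates.append(sum(ranked[:cutoff]) / cutoff)
    return rates


def ratio_to_random(rates: Sequence[float],
                    responded: Sequence[int]) -> List[float]:
    """Each cumulative rate divided by the overall (random) response rate."""
    overall = sum(responded) / len(responded)
    if overall == 0:
        raise ValueError("no responders in the data")
    return [rate / overall for rate in rates]


# Toy data: 20 customers with scores 20..1 and 6 responders in total.
scores = list(range(20, 0, -1))
responded = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
rates = cumulative_response_rates(scores, responded)
ratios = ratio_to_random(rates, responded)
for depth, (rate, ratio) in enumerate(zip(rates, ratios), start=1):
    print(f"top {depth * 10:3d}%: response rate {rate:.2f}, ratio {ratio:.2f}")
```

Comparing two models at a given mailing depth then amounts to comparing the corresponding entries of their `rates` lists, which avoids the ambiguity of intersecting curves.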
The ensemble model also performed better than single ANN models for the top deciles of targeted customers but was not as good as single ANN models at middle deciles. The authors remarked that oversearching was responsible for the inferior performance of their model for middle target points. Kim et al. (2005) found that their ANN model performed best for the top 20% of targeted customers. The value of lift can also be used to evaluate the accuracy of models. Lift is calculated by computing the ratio of response rate at top deciles achieved by direct models to the response rate achieved by random targeting and represented by the blue line in Fig. 2. The value of lift can be used to determine the improvement obtained using direct marketing models over random marketing. Cui et al. (2006) used lift values to compare between BNN augmented by EP, ANN, CART, and Latent
Fig. 2. A sample lift curve.
class models. The highest lift value was obtained by the BNN augmented by EP at top deciles. To distinguish between false positives and false negatives it is important to examine the accuracy of a model used in direct marketing using an ROC curve. A sample ROC curve is shown in Fig. 3. The ROC curve is usually plotted as a two-dimensional graph representing sensitivity along the y-axis and 1-specificity along the x-axis for various classification threshold values. Sensitivity is the ratio of true positives over all actual positives while specificity is the ratio of true negatives over all actual negatives and 1-specificity is the ratio of false positives over all actual negatives. The higher the ROC curve the better is the model. One measure related to the ROC curve is AUROC which stands for Area under ROC Curve. A larger AUROC represents a better model. Sometimes the ROC curve is plotted with 1-sensitivity along the y-axis to study the trade-off between false positives and false negatives. Ha et al. (2005) compared the performance of bagging ANN, single ANN, and Logit using both confusion matrix and ROC curve. Their results showed that the bagging ANN performed best in worst and median cases and its performance was similar to single ANN but better than Logit for the best case. Viaene et al. (2001a) observed that elimination of redundant and irrelevant inputs could reduce the complexity of models significantly without degrading the predictive generalization ability measured in terms of PCC and AUROC. Baesens et al. (2002) compared the performance of Logit, linear discriminant analysis, quadratic discriminant analysis, and Bayesian ANN models using PCC and AUROC and found that the Bayesian ANN outperformed others in terms of both PCC and AUROC. 5.2. Profitability of models Profitability measures the ability of the direct marketing model to generate profits or revenues. Each direct marketing solicitation has a cost associated with it and generates revenues. 
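As an aside on the ROC measures defined in Section 5.1, the following minimal sketch shows how the points of an ROC curve and the AUROC can be obtained from model scores. It assumes binary labels (1 for responder, 0 for non-responder) and uses the trapezoidal rule for the area; the function names and the six-customer example are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one responder and one non-responder")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Customers tied at the same score are classified together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auroc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: three responders and three non-responders.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(f"AUROC = {auroc(curve):.4f}")
```

In this toy example eight of the nine responder and non-responder pairs are ranked in the right order, so the area equals 8/9.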
The revenue obtained from each customer can be predicted directly or indirectly. Models that score customers in terms of potential revenues are known as direct approaches, such as a Tobit model. There is also the indirect approach that classifies customers as responders or non-responders using a binary value and then calculates expected revenues from them, such as a two stage model. Both Van der Sheer (1998) and Levin and Zahavi (1998) compared two stage models and Tobit model and showed that the former performed better in terms of revenues. The relationship between revenue and cost can influence profitability. Kim and Street (2004) compared the expected net profit
Fig. 3. A sample ROC curve.
generated by ANN, GA/ANN ensemble model, and random targeting under different mailing costs. It was found that when the mailing cost was low, even random targeting could generate a positive profit. However, when the mailing cost was high random targeting resulted in monetary loss. The GA/ANN ensemble model generated highest expected profit among the three methods when the mailing cost was either high or low. Most direct marketing models assume that the cost of each solicitation and the revenue each customer can generate are homogenous across all customers. However in real life this assumption is not always true. Bult and Wittink (1996) reported a better performance in terms of profitability and accuracy with the discriminant analysis model using the heterogeneous asymmetric loss function. Discriminant analysis was used because it had better tolerance for heterogeneous distribution of parameters compared to other models. In multiple mailing cases, researchers care more about the LTV of each customer or the revenue that the customer can contribute in the long run. Bitran and Mondschein (1996) used simulation based on real data and examined the performance of their model with budget constraints and inventory costs. It was shown that multi-mailing outperformed single mailing in terms of revenues earned in the long run. Piersma and Jonker (2004) showed that their Markov decision model which was an extension of the model proposed by Bitran and Mondschein (1996) could generate higher net profit in the long run. Profit can be interpreted in other alternative ways. Gonul et al. (2000) chose marginal profit instead of general profit and showed that their hazard model exhibited higher expected marginal profit than actual observed marginal profit. An indirect way of measuring profit would be to find out if the model was able to save cost. 
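The dependence of profit on mailing cost, as in the comparison reported by Kim and Street (2004), can be illustrated with a minimal sketch. It assumes homogeneous cost and revenue across customers, which is the simplifying assumption noted above, and every number in it (response rates, revenue, costs) is hypothetical rather than taken from that study.

```python
def expected_net_profit(n_mailed: int, response_rate: float,
                        revenue_per_response: float,
                        cost_per_mailing: float) -> float:
    """Expected revenue from responders minus total solicitation cost."""
    return n_mailed * (response_rate * revenue_per_response - cost_per_mailing)


def break_even_response_rate(revenue_per_response: float,
                             cost_per_mailing: float) -> float:
    """Response rate at which a mailing exactly pays for itself."""
    return cost_per_mailing / revenue_per_response


N_MAILED = 10_000
REVENUE = 50.0                                   # assumed revenue per response
RATES = {"random targeting": 0.02, "model-based targeting": 0.06}  # assumed

for cost in (0.5, 2.0):                          # assumed low and high costs
    print(f"mailing cost {cost:.2f}, break-even rate "
          f"{break_even_response_rate(REVENUE, cost):.3f}")
    for name, rate in RATES.items():
        profit = expected_net_profit(N_MAILED, rate, REVENUE, cost)
        print(f"  {name:22s}: {profit:10.0f}")
```

With these illustrative figures random targeting is profitable at the low mailing cost and loss-making at the high one, because its response rate falls below the break-even rate.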
Bhattacharyya (1999) and Bhattacharyya (2000) measured the profitability of the proposed GA/GP Pareto model by calculating the expected revenue that could be saved by retaining those customers that were likely to churn. The expected-revenue-saved was obtained by multiplying the churner capture rate by the revenues generated at chosen deciles. The lift curve can also be used to illustrate the profitability of a model by plotting cumulative profit over deciles of targeted customers. Colombo and Jiang (1999) measured the performance of a two stage beta/gamma distribution model in terms of the LTV that could be generated by the model. Their results showed that the expected LTV lift curve was above the random selection line with Gini index of 0.68. (Gini index is the ratio of the area between the lift curve and the 45° line to the area above the line). Malthouse (1999) showed that the ridge regression model was substantially more profitable at key quantiles and also more stable than least square regression models.
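The Gini index as defined in the parenthesis above can be computed from a cumulative curve as in the sketch below. It assumes that both axes are expressed as shares between 0 and 1 (share of customers targeted, share of total profit or LTV captured), so that the area above the 45° line is 0.5, and it uses the trapezoidal rule. The function name and the decile values are hypothetical.

```python
from typing import Sequence


def gini_index(share_targeted: Sequence[float],
               share_captured: Sequence[float]) -> float:
    """(Area between the curve and the 45-degree line) / (area above the line).

    Both sequences are cumulative shares in (0, 1], sorted by mailing depth,
    and the last point is expected to be (1.0, 1.0).
    """
    if len(share_targeted) != len(share_captured) or len(share_targeted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    xs = [0.0] + list(share_targeted)   # the curve starts at the origin
    ys = [0.0] + list(share_captured)
    area_under_curve = sum((x1 - x0) * (y0 + y1) / 2.0
                           for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))
    area_under_diagonal = 0.5           # also the area above the diagonal
    return (area_under_curve - area_under_diagonal) / 0.5


# Hypothetical cumulative shares of profit captured at each decile.
deciles = [k / 10 for k in range(1, 11)]
captured = [0.35, 0.55, 0.68, 0.78, 0.85, 0.90, 0.94, 0.97, 0.99, 1.00]
print(f"Gini index = {gini_index(deciles, captured):.3f}")
```

A curve that coincides with the random selection line gives a value of 0, and a curve that captures everything in the first few customers approaches 1.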
5.3. Other measures
Some other measures have also been used to evaluate analytical models in direct marketing. The complexity of the models is always a concern for researchers. Bult et al. (1997) employed the Consistent Akaike’s Information Criteria (CAIC) when selecting between models that used mailing characteristics and customers’ characteristics as independent variables. CAIC used an explicit trade-off between model complexity and accuracy. Interested readers can refer to Bozdogan (1987) for more information about CAIC. Kim et al. (2005) evaluated the profitability of their model and also measured the complexity of the model in terms of number of input attributes, computation time, and scalability. Their ANN model performed well in terms of profitability but was computationally intensive. They established the scalability of the model by adapting it for analysis of large customer databases. R², which is the ratio of the variance of the dependent variables explained by the model to the total variance of the dependent variables, is a popular measure for evaluating statistical models. However, it is not that commonly used for evaluating quantitative models in direct marketing due to two reasons. Firstly, accuracy and profitability measures are easily understandable and enough to satisfy marketers’ requirements. Secondly, R² is not a popular performance measure adopted by machine learning researchers, who prefer measures like accuracy.
6. Discussion
6.1. Collection of data
The main purpose of using quantitative models for direct marketing is to understand customers in order to predict whether a customer will buy products/services offered to them or not. In other words, the marketers want to measure the customers’ desire to purchase. Unfortunately desire cannot be measured directly therefore alternative information is used to measure desire indirectly. Different types of data have different abilities in explaining customers’ desire to purchase and we refer to this as interpretability of data. From the previous discussion of the literature on direct marketing we identified two dimensions which influenced interpretability of data: level of detail and type of data. The relationship between them is shown in Fig. 4, where the y-axis represents level of detail and the x-axis represents type of data. Level of detail indicates whether the data is about an individual or a group. If each customer is described by his/her own unique information, the data reveals perfect information about the particular customer. However, data such as proxy data are aggregated sometimes and in that case, the data reveals imperfect information about a particular customer. The higher the level of detail the higher is the interpretability. Two types of data are represented along the x-axis – external and behavioral. Behavioral data has higher interpretability than external data. So, the interpretability of data increases along the diagonal in Fig. 4. Thus, data which is located in quadrant III in Fig. 4 has the highest interpretability. However, it is more difficult to collect this type of data. In Fig. 4, the effort of data collection decreases along the direction of the dotted line. Individual data exhibits a large number of missing values and high cost of data collection. Behavioral data is usually collected during business transactions and owned by companies who sell products or services to customers. Behavioral data is regarded as confidential data. Either the owners do not want to share this data with researchers or high prices are charged for selling this data. Collecting behavioral data from individuals by means of surveys or interviews is also a difficult endeavor. People don’t want to disclose their preferences or even if they are willing to do so sometimes they may not be able
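Fig. 4. Interpretability of data and effort required for data collection. (Axes: level of detail on the y-axis, from aggregate data to individual data; type of data on the x-axis, from external data to behavioral data. Quadrants: I, individual external data; II, aggregate external data; III, individual behavioral data; IV, aggregate behavioral data. Arrows in the figure are labeled ‘Interpretability of data’ and ‘Effort’.)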
Table 7
Comparison of statistical and data mining models for direct marketing

Characteristics                            Statistical models    Data mining models
Estimation function of model               Fixed                 Flexible
Distribution function of input data        Must be known         Knowledge not needed
Model construction                         Estimation            Learning algorithms
Influence of distribution of input data    High                  Low
Interpretability                           Good                  Moderate
Optimization                               Multi objective       Multi objective
to provide accurate data about their behaviors due to weak recollections. If it is not feasible to obtain data located in quadrant III, then direct marketers have to use more aggregated data in quadrant IV and depend more on external data that is located in quadrant I.
6.2. Strengths and weaknesses of direct marketing models
From the review presented in this paper it can be seen that basically there are two types of quantitative models that are used for direct marketing, namely statistical models and data mining models. We summarize some characteristics of these two types of models and compare them in Table 7. The first difference is that statistical models used in direct marketing usually have to specify explicitly the form of mathematical function that links explanatory variables and dependent variables. Also, statistical models make assumptions such as the distribution of variables and distribution of error terms. If these assumptions are violated, the results obtained from the models can be severely affected. In contrast to statistical models, most data mining models do not require a fixed mathematical function to show the relationship between explanatory variables and dependent variables. The parameters in the data mining models, such as the weight of each connection between nodes in different layers in ANN and the number of splits in DTs, are obtained using learning algorithms. Data mining tools such as ANN can be regarded as nonparametric and nonlinear models (Zahavi and Levin, 1997). Therefore they are not as sensitive to the distribution of variables as parametric or semiparametric statistical models. However the infrastructure of data mining algorithms is also a type of assumption which may add to the vulnerability of the model to real data. Baesens et al. (2002) argued that ANN makes assumptions about the infrastructure, such as the number of hidden layers, the number of nodes in each layer, and the weights of connections.
To overcome the limitations of these assumptions, they applied the Bayesian learning algorithm during the training phase of ANN and improved the accuracy of ANN models. In Table 1, we also provided the number of variables used in the different statistical and data mining models. It is observed that in most of the studies conducted using statistical models, the number of variables used is small. In contrast, data mining models are able to handle a larger number of variables, ranging from 50 to 100 and sometimes even more. Interpretability of a model is the ability of direct marketing models to establish the relationship between independent variables and dependent variables. This is also an important aspect of direct marketing because marketers want to find business rules from direct marketing models. Because of their fixed functional form, statistical models have good interpretability and the relationships between explanatory variables and dependent variables are clear. However, data mining models do not do very well in terms of interpretability. The estimation models for ANN, GA/GP, and SVM remain invisible to users as they identify the output but do not capture the interactions between variables. DT is different because it presents the interactions between variables using rules instead of explicit mathematical functions. Statistical models have the advantage in interpretability in that they show the relationship between dependent variables and independent variables by the value and sign of parameters.
6.3. Improving the performance
A major challenge in direct marketing is that the distribution of responders and non-responders is usually unbalanced. There are many more non-responders than responders. If an unbalanced data set is used in the training phase for the purpose of model construction, models may not be able to capture the relationships between explanatory variables and non-responders’ negative reply. The result is that the models do not predict true positives accurately. This problem can be solved by oversampling, which means selecting more responders into the sample in order to make the distribution of responders and non-responders in the sample balanced. Ling and Li (1998) showed that oversampling improved the performance of DT and Naive Bayes model for direct marketing. Ha et al. (2005) used a bagging approach and an oversampling approach to help ANN deal with unbalanced data sets. No matter whether the direct marketers use statistical models or data mining models it must be remembered that all these tools have bias due to factors such as the specific form of statistical tools, the structure of a neural network, or the kernel functions of SVM, etc. These biases are inevitable. However, Bayesian rules can be used to deal with uncertainties caused by the bias of models (Baesens et al., 2002; Van der Sheer, 1998). Incorporating managerial issues into models may help improve the profitability and accuracy of the models. Gonul et al. (2000), Bult (1993), and Bult and Wittink (1996) reported improvement in both accuracy and profitability by introducing factors related to managerial considerations such as costs.
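The oversampling remedy described in Section 6.3 can be sketched as follows. This is a generic random oversampling of responders with replacement, given here only as an illustration and not as the specific procedure of Ling and Li (1998) or Ha et al. (2005). The record layout and the function name are assumptions. Such resampling would normally be applied to the training data only, so that the validation data keep the original distribution of responders.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (features, label), with label 1 = responder.
Record = Tuple[Sequence[float], int]


def oversample_responders(data: Sequence[Record], seed: int = 0) -> List[Record]:
    """Resample responders with replacement until both classes are equal.

    The data are returned unchanged (as a new list) if there are no
    responders or if responders are not the minority class.
    """
    rng = random.Random(seed)
    responders = [rec for rec in data if rec[1] == 1]
    non_responders = [rec for rec in data if rec[1] == 0]
    if not responders or len(responders) >= len(non_responders):
        return list(data)
    extra = rng.choices(responders, k=len(non_responders) - len(responders))
    balanced = list(data) + extra
    rng.shuffle(balanced)  # avoid blocks of duplicated responders
    return balanced


# Toy training set: 5 responders and 95 non-responders.
training = ([([float(i)], 1) for i in range(5)]
            + [([float(i)], 0) for i in range(95)])
balanced = oversample_responders(training, seed=42)
n_pos = sum(1 for _, label in balanced if label == 1)
n_neg = sum(1 for _, label in balanced if label == 0)
print(f"responders: {n_pos}, non-responders: {n_neg}")
```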
However, it is difficult to identify a typical performance for direct marketing models because there are different areas of application, different numbers of classes to predict, and different ratios of respondents between classes, etc. Despite the popularity of ANN models, conflicting results are reported in the literature. For example, Baesens et al. (2002), Ha et al. (2005), Heilman et al. (2003), Kaefer et al. (2005), and Kim et al. (2005) all demonstrated that ANN models achieved higher accuracy than Logit and discriminant analysis models. However, Linder et al. (2004) found from their experiments using simulated data that when the sample size and the complexity of data increased, Logit models performed better than ANN in terms of AUROC of the top 50% of customers. The sizes of the data sets used in these studies vary. Complexity of data is not reported in the first five papers. It is difficult to draw firm conclusions but the results of Linder et al. (2004) showed that ANN is quite stable and independent of the variation of size and complexity of data. Another consideration is the method of evaluation of the various techniques. The result achieved by Kim et al. (2005) was restricted to the top 20% of customers. Ha et al. (2005) showed that at top deciles such as top 10% and top 20%, ANN performed better. In contrast, the results obtained by Linder et al. (2004) considered the top 50% of customers. In direct marketing, due to budget issues, it is common for marketers to choose only part of the customers from the whole list. In such a case, ANN may be a better choice compared to the other methods. However, methodological conclusions will depend more on the size and nature of the data sets.
6.4. Relationship with credit scoring
When we study the literature on direct marketing, it is difficult to ignore the similarities between direct marketing and consumer
credit risk management. The goal of credit risk management is to evaluate whether a loan should be granted to a customer by a credit granting agency or not (Hand and Henley, 1997). Both direct marketing and consumer credit risk are based on the idea that companies need to interact with customers and treat them individually based on their profile. These two types of activities can be conducted over the same channels such as postal mail, telephone, email, Internet etc. Also, the same types of data related to customer demographics are used as inputs for models in both areas and even similar statistical and data mining models are commonly used. However, the two areas are different in some respects. Firstly, consumer credit risk management is targeted to personal financial services whereas direct marketing can be used for various products and services, including but not limited to financial services. Secondly, managers of direct marketing worry about whether or not a customer will buy the product. In contrast, managers of consumer credit risk face three possible outcomes: customers decide not to take the credit (not to buy), customers decide to take credit and pay the loan on time (good buy), customers decide to take credit but cannot pay the loan on time (bad buy). In other words, risk management and loss control are critical in credit scoring because the repayment of loan is usually divided into multiple installments and takes a long time whereas this is not a big issue in the case of direct marketing. 6.5. Conclusion and future research directions This paper reviewed quantitative models used in direct marketing research. We analyzed research questions currently being studied in this field. We have identified the key issues which need to be carefully considered for improving the performance of direct marketing models. 
This research will help direct marketers decide which models should be used and which managerial issues should be taken into consideration in order to achieve better performance in their direct marketing campaigns. Finally, we identify the gaps that still remain in the research literature related to direct marketing. Direct marketing has its benefits and is increasingly gaining attention from researchers and marketers. It has been successfully used by marketers for several decades and now with advances in communication technologies and data mining techniques, direct marketing is likely to evolve to a newly mature and advanced stage. As new developments take place in the world of communication technologies and data analysis several research questions will continue to dominate the field of direct marketing in future. We identified eight such future research directions in the following paragraphs. 6.5.1. Data preparation for direct marketing Data preparation is always an indispensable job before building a model (Han and Kamber, 2001). It is especially important for direct marketing research because real life business transaction data always has missing values and noise. Presently most research is focused on model building and not too much emphasis is put on data preparation. Crone et al. (2006) conducted a survey on data preparation for data mining projects and found that almost no competitive evaluation of data preparation techniques has been done. They conducted experiments to test the influence of data preprocessing techniques such as scaling of continuous variables, and sampling and coding of categorical variables. They observed that different data preparation methods had varying influences on the data mining techniques. Researchers can explore this direction for future research and try to find answers to the following questions. Firstly, how does quality of data influence the performance of direct marketing models? 
Secondly, what type of approaches can be used to replace missing values and overcome noise in data? Thirdly, what is the impact of data preparation on the performance
of direct marketing models? Finally, how can an analyst find an appropriate data preparation approach that will clean the data with the best effort while minimizing the unwanted impact of overfitting on model building?
6.5.2. Conditions in data that trigger use of quantitative models
From the content of this paper it is clear that a plethora of models have been used for direct marketing. However, to the best of our knowledge there is no research paper that provides detailed guidelines on which model should be used under what conditions for direct marketing. Since marketers may be limited to data sets of different sizes and quality, some questions need to be answered urgently before data analysis can proceed. What techniques should be used when there is only aggregated customer data? What techniques should be used when plenty of data about individual customers is available? What techniques are most suitable for analysis of customers’ behavioral data? What techniques are most suitable for analysis of customers’ demographic or socio-graphic data? These questions need to be answered in future.
6.5.3. Use of alternative data analysis techniques
Data analysis techniques such as rough sets (RS) and Bayesian Belief Networks (BBN) are seldom used in direct marketing. As noted before, BBN determines conditional dependence among variables. Thus, it is believed that BBN can perform classification with high accuracy and provide clear interpretation at the same time (Baesens et al., 2002). RS classifies the domain of interest into disjoint categories called upper and lower approximations to deal with concepts that have an element of vagueness associated with them (Jensen and Shen, 2004). RS can identify interesting rules among data sets and reduce the dimensionality of the problem without overlooking the semantic relationship between attributes. These two techniques are quite promising and are likely to find their use in future direct marketing applications.
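The lower and upper approximations mentioned for rough sets can be illustrated with a minimal sketch of the textbook construction; it is not the specific method of Jensen and Shen (2004). Customers with identical attribute values are treated as indiscernible. The lower approximation collects the groups that lie entirely inside the target set (here, responders), and the upper approximation collects the groups that overlap it. The customer identifiers, attributes, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def approximations(attributes: Dict[Hashable, Tuple],
                   target: Set[Hashable]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return the (lower, upper) approximations of `target`.

    Objects with identical attribute tuples form one equivalence class.
    """
    classes = defaultdict(set)
    for obj, values in attributes.items():
        classes[values].add(obj)
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for members in classes.values():
        if members <= target:      # class lies entirely inside the target
            lower |= members
        if members & target:       # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical customers described by (age band, bought before).
customers = {
    "c1": ("young", "yes"), "c2": ("young", "yes"),
    "c3": ("old", "no"),    "c4": ("old", "yes"),
    "c5": ("old", "yes"),   "c6": ("old", "no"),
}
responders = {"c1", "c2", "c4"}
lower, upper = approximations(customers, responders)
print("certainly responders:", sorted(lower))
print("possibly responders: ", sorted(upper))
print("boundary region:     ", sorted(upper - lower))
```

In this toy example c4 and c5 share the same attribute values but only c4 responds, so both fall in the boundary region where the available attributes cannot decide.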
More research needs to be conducted on application of these techniques for direct marketing. 6.5.4. Scalability of quantitative models Researchers have often experimented with simulated data or ‘toy’ problems for testing the efficacy of the various models. However, they have often remained silent on how applicable the results are on real life data. In real life, the customer databases are usually quite large and contain a large number of features. When the size of data set is large, new concerns often emerge such as computation time, software and hardware support, appropriate explanation of results etc. All these concerns can add constraints to the use of a cost-effective direct marketing model. Larger data sets imply that the corresponding software and hardware to be used for analysis is more expensive. At the same time, for some applications, marketers may need a fast update of direct marketing models. The model may be highly accurate but if re-training of the model takes a long time, it may not be applicable in the real situation. Not much work has been done in this area and in fact in our review we identified only one paper (Kim et al., 2005) that addressed the issue of scalability. 6.5.5. Use of hybrid data mining models Although hybrid models are promising in terms of their performance (Suh et al., 1999; Suh et al., 2004; Cui et al., 2006) there are some limitations to the use of hybrid models. Firstly, the data mining or statistical techniques that are included as part of a hybrid model are only limited to a small number of techniques like ANN, DT, and Logit. One way to improve this is to include experimentation with more data mining techniques such as RS, BBN, and SVM. Secondly, the hybrid approaches developed are relatively simple. More sophisticated hybrid approaches need to be
developed with different roles in data analysis being assigned to different techniques. For example, some techniques can be used for overall classification and some techniques can be dedicated to identification of false negative classifications.

6.5.6. Validation of direct marketing models

Direct marketing models are developed for practical use. Whether a model is good or not should be validated using real-life data. Real-life validation is relatively easy to achieve for single mailing problems. It is difficult for multiple mailing problems because the process takes a long time and costs a lot of money. However, it may still be worthwhile because a multiple direct mailing campaign is not a simple combination of several single mailing events. It is a dynamic stochastic process. Each stage within this stochastic process is influenced by previous stages and influences the succeeding stages. At the same time, multiple-mailing models often need to be updated based on customers’ responses. So validation using real-life data for multiple mailing problems is critical for building and maintaining direct marketing models. Both Gonul et al. (2000) and Simester et al. (2006) used historical data from real direct marketing in their research. But the data was collected from direct marketing activities without the guidance of their models. Since no data was collected after the implementation of their dynamic multi-mailing models, it was not clear how their devised models actually influenced customers’ behaviors.

6.5.7. Comparative direct marketing

Due to competition in the market, similar products or services are often designed by different companies. Because of the pervasiveness of the Internet, customers are now able to search for product or service information online and make comparisons before making actual purchases. But usually the search process is time consuming and painful.
To save customers’ effort, marketers should consider providing comparative information on products as a type of direct marketing activity through a comparative shopping engine (Yuan, 2003). It is likely that such an approach will win the customers’ favor. Marketers will need to decide what types of products or services customers want to compare and which features they want to compare the products or services on. Quantitative models that help identify the comparative features of products and services based on customers’ browsing patterns will be quite valuable for future Web direct marketing.

6.5.8. Cross-sell and up-sell direct marketing

Cross-selling and up-selling are important marketing activities that explore the purchase potential of customers. Recommendation systems suggest products that match customers’ preferences or products bought by other customers with similar preferences (Schafer et al., 2001). At present they can do cross-selling but are not good at up-selling, because up-selling is concerned with promoting better versions of products or services with higher prices and new features that do not show any similarity to those of products previously purchased by the customers. It is worth trying to develop quantitative recommendation systems that are dedicated to up-selling activities. Also, research needs to be done on the use of quantitative models for simultaneous cross-selling and up-selling.

Acknowledgments

The authors want to thank the two anonymous referees for suggesting many changes that have improved the clarity and readability of the paper. The first author gratefully acknowledges financial support received from the University of Hong Kong in the form of the CRCG Grant (Project Code 200611159102) under the Seed Funding Program for Basic Research.
References

Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules. In: Proceedings of the Twentieth International Conference on Very Large Databases, Santiago, Chile, pp. 487–499.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., Dedene, G., 2002. Bayesian neural network learning for repeat purchase modeling in direct marketing. European Journal of Operational Research 138 (1), 191–211.
Balasubramanian, S., Gupta, S., Kamakura, W., Wedel, M., 1998. Modeling large data sets in marketing. Statistica Neerlandica 52 (3), 303–323.
Barwise, P., Farley, J.U., 2005. The state of interactive marketing in seven countries: Interactive marketing comes of age. Journal of Interactive Marketing 19 (3), 67–80.
Baumgartner, B., Hruschka, H., 2005. Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research 162 (3), 839–849.
Bellman, R.E., 1961. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton.
Bhattacharyya, S., 1999. Direct marketing performance modeling using genetic algorithms. INFORMS Journal on Computing 11 (3), 248–257.
Bhattacharyya, S., 2000. Evolutionary algorithms in data mining: Multi-objective performance modeling for direct marketing. In: Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pp. 465–473.
Bitran, G.R., Mondschein, S.V., 1996. Mailing decisions in the catalog sales industry. Management Science 42 (9), 1364–1381.
Bodapati, A., Gupta, S., 2004. A direct approach to predicting discretized response in target marketing. Journal of Marketing Research 41 (1), 73–85.
Bozdogan, H., 1987. Model selection and Akaike’s information criterion: The general theory and its analytical extensions. Psychometrika 52 (3), 345–370.
Buckinx, W., Moons, E., Van den Poel, D., Wets, G., 2004. Customer-adapted coupon targeting using feature selection. Expert Systems with Applications 26 (4), 509–518.
Bult, J.R., 1993. Semiparametric versus parametric classification models: An application to direct marketing. Journal of Marketing Research 30 (3), 380–390.
Bult, J.R., Wansbeek, T., 1995. Optimal selection for direct mail. Marketing Science 14 (4), 378–393.
Bult, J.R., Wittink, D.R., 1996. Estimating and validating asymmetric heterogeneous loss functions applied to health care fund raising. International Journal of Research in Marketing 13 (3), 215–226.
Bult, J.R., Van der Scheer, H., Wansbeek, T., 1997. Interaction between target and mailing characteristics in direct marketing, with an application to health care fund raising. International Journal of Research in Marketing 14 (4), 301–308.
Burges, C.J.C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2 (2), 121–167.
Changchien, S.W., Lee, C.-F., Hsu, Y.-J., 2004. On-line personalized sales promotion in electronic commerce. Expert Systems with Applications 27 (1), 35–52.
Chen, M.-C., Chiu, A.-L., Chang, H.-H., 2005. Mining changes in customer behavior in retail marketing. Expert Systems with Applications 28 (4), 773–781.
Cheung, K.-W., Kwok, J.T., Law, M.H., Tsui, K.-C., 2003. Mining customer product ratings for personalized marketing. Decision Support Systems 35 (2), 231–243.
Cohen, M.-D., 2004. Exploiting response models – optimizing cross-sell and up-sell opportunities in banking. Information Systems 29 (4), 327–341.
Colombo, R., Jiang, W., 1999. A stochastic RFM model. Journal of Interactive Marketing 13 (3), 2–12.
Crone, S.F., Lessmann, S., Stahlbock, R., 2006. The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research 173 (3), 781–800.
Cui, G., Wong, M.L., Lui, H.-K., 2006. Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Management Science 52 (4), 597–612.
Desarbo, W.S., Ramaswamy, V., 1994. CRISP: Customer response-based iterative segmentation procedures for response modeling in direct marketing. Journal of Direct Marketing 8 (3), 7–20.
Elsner, R., Krafft, M., Huchzermeier, A., 2003. Optimizing Rhenania’s mail-order business through dynamic multilevel modeling. Interfaces 33 (1), 50–66.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing, Boston.
Gonul, F., Shi, M.Z., 1998. Optimal mailing of catalogs: A new methodology using estimable structural dynamic programming models. Management Science 44 (9), 1249–1262.
Gonul, F., Kim, B.D., Shi, M.Z., 2000. Mailing smarter to catalog customers. Journal of Interactive Marketing 14 (2), 2–16.
Gopal, R.D., Walter, Z., Tripathi, A.K., 2001. Admediation: New horizons in effective email advertising. Communications of the ACM 44 (12), 91–96.
Ha, K., Cho, S., Maclachlan, D., 2005. Response models based on bagging neural networks. Journal of Interactive Marketing 19 (1), 17–30.
Han, J.W., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
Hand, D.J., Henley, W.E., 1997. Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society. Series A 160 (3), 523–541.
Hansotia, B.J., Wang, P., 1997. Analytical challenges in customer acquisition. Journal of Direct Marketing 11 (2), 7–19.
Haughton, D., Oulabi, S., 1997. Direct marketing modeling with CART and CHAID. Journal of Direct Marketing 11 (4), 42–52.
Heilman, C.M., Kaefer, F., Ramenofsky, S.D., 2003. Determining the appropriate amount of data for classifying consumers for direct marketing purposes. Journal of Interactive Marketing 14 (3), 5–28.
Jensen, R., Shen, Q., 2004. Semantics-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches. IEEE Transactions on Knowledge and Data Engineering 16 (12), 1457–1471.
Johnson, P.A., Frankel, A.B., 2005. U.S. Direct Marketing Today: Economic Impact 2005, Direct Marketing Association, NY, USA. Available from ).
Jonker, J.-J., Piersma, N., Van den Poel, D., 2004. Joint optimization of customer segmentation and marketing policy to maximize long-term profitability. Expert Systems with Applications 27 (2), 159–168.
Kaefer, F., Heilman, C.M., Ramenofsky, S.D., 2005. A neural network application to consumer classification to improve the timing of direct marketing activities. Computers & Operations Research 32 (10), 2595–2615.
Kantardzic, M., 2003. Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-Interscience, Hoboken.
Kim, Y., 2006. Toward a successful CRM: Variable selection, sampling and ensemble. Decision Support Systems 41 (2), 542–553.
Kim, Y., Street, W.N., 2004. An intelligent system for customer targeting: A data mining approach. Decision Support Systems 37 (2), 215–228.
Kim, Y., Street, W.N., Russell, G.J., Menczer, F., 2005. Customer targeting: A neural network approach guided by genetic algorithms. Management Science 51 (2), 264–276.
Kohonen, T., 1995. Self-organizing Maps. Springer, Berlin.
Kwon, Y.-K., Moon, B.-R., 2001. Personalized email marketing with a genetic programming circuit model. In: Proceedings of the Genetic and Evolutionary Computation Conference, San Francisco, CA, USA, pp. 1352–1358.
Lang, K., 1995. Newsweeder: Learning to filter Netnews. In: Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA, USA, pp. 331–339.
Levin, N., Zahavi, J., 1998. Continuous predictive modeling – a comparative analysis. Journal of Interactive Marketing 12 (2), 5–22.
Liao, S.-H., Chen, Y.-J., 2004. Mining customer knowledge for electronic catalog marketing. Expert Systems with Applications 27 (4), 521–532.
Li, Y., Lu, L., Li, X.F., 2005. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-commerce. Expert Systems with Applications 28 (1), 67–77.
Lin, F.Y., McClean, S., 2001. A data mining approach to the prediction of corporate failure. Knowledge-based Systems 14 (3), 189–195.
Linder, R., Geier, J., Kolliker, M., 2004. Artificial neural networks, classification trees and regressions: Which method for which customer base? Journal of Database Marketing & Customer Strategy Management 11 (4), 344–356.
Ling, C.X., Li, C.-H., 1998. Data mining for direct marketing: Problems and solutions. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 73–79.
Malthouse, E.C., 1999. Ridge regression and direct marketing scoring models. Journal of Interactive Marketing 13 (4), 10–23.
Min, S.-H., Han, I., 2005. Detection of the customer time-variant pattern for improving recommender systems. Expert Systems with Applications 28 (2), 189–199.
Mitchell, T.M., 1997. Machine Learning. McGraw-Hill, New York.
Nash, E.L., 1984. The Direct Marketing Handbook. McGraw-Hill, New York.
Pearson, S., 1994. How to achieve return on investment from customer loyalty – Part I. Journal of Targeting, Measurement and Analysis for Marketing 3 (1), 39–47.
Piersma, N., Jonker, J.-J., 2004. Determining the optimal direct mailing frequency. European Journal of Operational Research 158 (1), 173–182.
Prinzie, A., Van den Poel, D., 2005. Constrained optimization of data-mining problems to improve model performance: A direct-marketing application. Expert Systems with Applications 29 (3), 630–640.
Rao, V.R., Steckel, J.H., 1995. Selecting, evaluating, and updating prospects in direct mail marketing. Journal of Direct Marketing 9 (2), 20–31.
Rissanen, J., 1978. Modeling by shortest data description. Automatica 14 (5), 465–471.
Roddy, M., 2002. Direct Marketing: A Step-by-step Guide to Effective Planning and Targeting. Kogan Page, London.
Rossi, P.E., McCulloch, R.E., Allenby, G.M., 1996. The value of purchase history data in target marketing. Marketing Science 15 (4), 321–340.
Schafer, J.B., Konstan, J.A., Riedl, J., 2001. E-commerce recommendation applications. Data Mining and Knowledge Discovery 5 (1–2), 115–153.
Shardanand, U., Maes, P., 1995. Social information filtering: Algorithms for automating “Word of Mouth”. In: Proceedings of the Conference on Human Factors in Computing Systems, New York, NY, USA, pp. 210–217.
Shaw, M.J., Subramaniam, C., Tan, G.-W., Welge, M.E., 2001. Knowledge management and data mining for marketing. Decision Support Systems 31 (1), 127–137.
Shin, H.W., Sohn, S.Y., 2004. Multi-attribute scoring method for mobile telecommunication subscribers. Expert Systems with Applications 26 (3), 363–368.
Simester, D.I., Sun, P., Tsitsiklis, J.N., 2006. Dynamic catalog mailing policies. Management Science 52 (5), 683–696.
Suh, E., Noh, K.C., Suh, C.K., 1999. Customer list segmentation using the combined response model. Expert Systems with Applications 17 (2), 89–97.
Suh, E., Lim, S., Hwang, H., Kim, S., 2004. A prediction model for the purchase probability of anonymous customers to support real time Web marketing: A case study. Expert Systems with Applications 27 (2), 245–255.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific, Singapore.
Van den Poel, D., Buckinx, W., 2005. Predicting online-purchasing behaviour. European Journal of Operational Research 166 (2), 557–575.
Van der Scheer, H.R., 1998. Quantitative approaches for profit maximization in direct marketing, PhD thesis, University of Groningen, Netherlands.
Viaene, S., Baesens, B., Van den Poel, D., Dedene, G., Vanthienen, J., 2001a. Wrapped input selection using multilayer perceptrons for repeat-purchase modeling in direct marketing. International Journal of Intelligent Systems in Accounting, Finance & Management 10 (2), 115–126.
Viaene, S., Baesens, B., Van Gestel, T., Suykens, J.A.K., Van den Poel, D., Vanthienen, J., De Moor, B., Dedene, G., 2001b. Knowledge discovery in a direct marketing case using least squares support vector machines. International Journal of Intelligent Systems 16 (9), 1023–1036.
Wasson, C.S., 2006. System Analysis, Design, and Development: Concepts, Principles, and Practices. Wiley-Interscience, New Jersey.
Wedel, M., Desarbo, W.S., Bult, J.R., Ramaswamy, V., 1993. A latent class Poisson regression model for heterogeneous count data. Journal of Applied Econometrics 8, 397–411.
Weng, S.-S., Liu, M.-J., 2004. Feature-based recommendation for one-to-one marketing. Expert Systems with Applications 26 (4), 493–508.
Yuan, S.-T., 2003. A personalized and integrative comparison-shopping engine and its applications. Decision Support Systems 34 (2), 139–156.
Zahavi, J., Levin, N., 1997. Applying neural computing to target marketing. Journal of Direct Marketing 11 (4), 76–93.