Forecasting customer behaviour in a multi-service financial organisation: A profitability perspective

Forecasting customer behaviour in a multi-service financial organisation: A profitability perspective

International Journal of Forecasting 28 (2012) 507–518 Contents lists available at SciVerse ScienceDirect International Journal of Forecasting journ...

714KB Sizes 0 Downloads 36 Views

International Journal of Forecasting 28 (2012) 507–518

Contents lists available at SciVerse ScienceDirect

International Journal of Forecasting journal homepage: www.elsevier.com/locate/ijforecast

Forecasting customer behaviour in a multi-service financial organisation: A profitability perspective Alena Audzeyeva a,∗ , Barbara Summers b , Klaus Reiner Schenk-Hoppé b,c a

Keele Management School, Keele University, UK

b

Leeds University Business School, University of Leeds, UK

c

School of Mathematics, University of Leeds, UK

article

info

Keywords: Profitability forecasting Adaptive segmentation Bootstrap Customer lifetime value Financial services

abstract This paper proposes a novel approach to the estimation of Customer Lifetime Value (CLV). CLV measures give an indication of the profit-generating potential of customers, and provide a key business tool for the customer management process. The performances of existing approaches are unsatisfactory in multi-service financial environments because of the high degree of heterogeneity in customer behaviour. We propose an adaptive segmentation approach which involves the identification of ‘‘neighbourhoods’’ using a similarity measure defined over a predictive variable space. The set of predictive variables is determined during a cross-validation procedure through the optimisation of rank correlations between the observed and predicted revenues. The future revenue is forecast for each customer using a predictive probability distribution based on customers exhibiting behavioural characteristics similar to previous periods. The model is developed and implemented for a UK retail bank, and is shown to perform well in comparison to other benchmark models. © 2011 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction Customer Lifetime Value (CLV), an indicator of the expected future customer profitability for an organisation, is a well established concept in the academic and business literature. It is of particular interest to organisations which aim to maximise the value of a customer relationship over a particular time period.1 In contrast to the prevailing practice of setting separate performance objectives for different business functions (which may conflict with each other), CLV offers a holistic decision support tool which gives ‘‘the same focus throughout the different decision making areas of the organisation’’ (Thomas,



Corresponding author. Tel.: +44 0 1782 73 3271. E-mail address: [email protected] (A. Audzeyeva).

1 Despite the name, Customer Lifetime Value models do not estimate the value over the entire lifetime of a relationship in a situation where the length of the relationship is indeterminate.

2000). Applications of this concept include customer relationship management (CRM), which seeks to improve the long-term retention and profitability of both those customers who are currently highly profitable and those who are currently less profitable, or even unprofitable, but have the capacity to increase their profitability in the future (Zeithaml, Rust, & Lemon, 2001). Measuring the customer profit generating potential, along with credit scoring, which is an established decision support tool, is increasingly becoming an integral part of companies’ lending decisions, providing a focus which is truly relevant to a company’s objective of profit maximisation (Finlay, 2010; Fishelson-Holstine, 1998; Oliver, 1993; Thomas, 2000). Another important application of customer lifetime value modelling is in customer segmentation, where it facilitates the differentiation of levels and the volume of customer servicing in line with the revenue-generating levels and cost optimisation (Zeithaml et al., 2001). The implementation of our CLV modelling approach in a UK

0169-2070/$ – see front matter © 2011 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.ijforecast.2011.05.005

508

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

bank led to a better understanding of the needs and preferences of some customer groups, and to the development of tailored customer propositions.2 Companies have been wrestling with the concept of CLV for a long time, as it is fraught with complexity in the context of a multiservice financial organisation. Firstly, the multidimensional nature of customer behaviour introduces challenges for behavioural models, as not only future purchase decisions but also their volumes need to be predicted (Donkers, Verhoef, & de Jong, 2007). Secondly, customers often purchase more than one product, and these purchasing decisions are not independent; these interdependencies should be taken into account when modelling the customer lifetime value (Kamakura, Ramaswami, & Srivastava, 1991; Kamakura, Wedel, de Rosa, & Mazzon, 2003; Knott, Hayes, & Scott, 2002; Li, Sun, & Wilcox, 2005). Thirdly, retail financial services organisations offer a wide variety of financial products (some rather complex) which differ in both their nature and their revenue generating pattern. Finally, a customer can switch between products or even between product providers at any point in time (Kamakura et al., 2003). Intense competition and technological advances have enabled historically ‘‘monogamous’’ retail bank customers (i.e., those conducting business only with one provider) to move increasingly towards the ‘‘always a share’’ relationship, in which a customer conducts business with several financial services providers simultaneously and can switch between them relatively easily. Due to these specifics of the retail financial industry, only a limited number of existing modelling approaches can be applied in this context. A number of approaches to forecasting CLV have been discussed in the literature; for example, those focusing on the absolute level of individual customer values, the relative ordering of the individual customer values, and the aggregate CLV of all of the company’s customers. The main focus of this paper is to differentiate between individual customers with high versus low profit generating potentials. This addresses a business need, as noted by Donkers et al. (2007), who stress that retail financial companies are interested mainly in identifying customers with high CLVs, rather than predicting the precise levels of their CLVs. We propose a novel approach to the modelling of the customer lifetime value which is based on adaptive customer segmentation using a neighbourhood concept. Our approach identifies customer neighbourhoods (local segments) which are (1) small enough to ensure customer homogeneity by capturing only customers who have similar characteristics and past behaviour, and (2) sufficiently large to ensure robust forecasts of customers’ future behaviour. We use a similarity measure which is defined over a predictive variable space to establish a customer’s

2 A segment of the over-50s customer age group was identified as having a high future revenue generating potential, which was subsequently recognised by the organisation as being a prime target for product cross-sale and customer acquisition and retention activities. This understanding has also contributed to the organisation’s compliance with the UK Financial Services Agency’s regulatory requirements for treating customers fairly.

neighbourhood. The customers’ neighbourhoods are chosen so as to include customers who had predictive characteristics in a given past period which are similar to the most recently observed characteristics of the customer in question. The optimal local segment size and set of predictive variables are determined using a cross-validation procedure which employs the rank correlation between the observed and predicted profitability as the optimisation criterion. The one-period-ahead profitability is forecast for each customer using a predictive probability distribution estimated over the population from her neighbourhood. Multi-period forecasts are produced by the convolution of one-period predictive probability distributions, which is implemented using bootstrap simulations. Our segmentation approach can be applied to a range of predictive tasks, such as the forecasting of the conditional profitability (given some other customer characteristics, e.g., credit risk) or the prediction of other customer-related characteristics (e.g., product purchasing decisions or account balances). Thus, it provides a powerful tool in the development of tailored customer acquisition and retention strategies. The ranking approach has benefits during times of significant economic change. Our model is applied to predict and validate the customer profitability using customer data for a UK retail bank for the period 2005–2008. We also compare our results with those obtained using various benchmark models: simple and multiple linear regression models for predicting the revenues from individual customers, and a probit model for identifying customers whose revenues are likely to increase, decrease or remain stable in the future. Our model performs better than the benchmark models. We are able to make robust predictions of individual customer revenues using a small number of variables. The remainder of this paper is structured as follows. Section 2 offers a short review of the previous literature on modelling approaches in the context of a multi-service financial organisation. The details of our modelling approach are provided in Section 3. Section 4 offers an implementation algorithm which enhances the computational efficiency of our approach. The estimation results and a discussion of potential applications are given in Sections 5 and 6, respectively. Finally, Section 7 concludes. 2. Review of previous approaches in the context of a multi-service financial organisation Gupta et al. (2006) and Jain and Singh (2002) provide comprehensive overviews of the existing lifetime value models. In the context of a multiservice financial organisation, the empirical evidence suggests that there is no advantage of complex service-level models of customer behaviour over simple models when predicting the individual customer lifetime value (Donkers et al., 2007). These simple models use aggregate customer data and make the restrictive assumption that the customer profitability is constant over time, or else exhibits a linear trend in profitability (e.g. Berger & Nasr, 1998; Malthouse & Blattberg, 2005). In the context of retail banking, this assumption can only be satisfactory if profitability margins do not vary over time and revenues are generated by regular payments from

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

customers, such as for customers who continuously hold one or more long-term products (e.g., a savings account and a personal loan or a mortgage). The approach does not apply to customers who are either purchasing new products or switching between products or product providers before product maturity, or when interest rates (and hence, profit margins) change. This is a major drawback from a business perspective, because one would like to focus on customers who have a high propensity to buy new products or leave an organisation, given that immediate action is required in these cases in order to ensure the maximisation of their CLV. The more complex models of customer behaviour which are used for customer lifetime value predictions adapt the univariate or multivariate probit modelling frameworks in order to predict purchase probabilities. The multivariate probit model accounts for dependencies between the purchasing decisions arising from either a hierarchy in the decisions to add new services to the current portfolio, or cross-sales promotions or the sale of financial packages (e.g. Kamakura et al., 1991, 2003). The main weaknesses of this model class are that (1) only one-period-ahead predictions can be made, and (2) only a purchase decision is predicted, with a separate model being needed to estimate the purchase value (which varies greatly in the context of a retail bank, and introduces additional complexity to the model). This complexity contributes to larger prediction errors and a weaker forecasting ability for these more complex models of individual customer profitability than for the simpler models discussed above (Donkers et al., 2007). The Pareto/NBD model, which was initially proposed by Schmittlein, Morrison, and Colombo (1987), allows the forecasting of the customer attrition (using the Pareto timing model), and also of the next period purchasing decision if the customer is still active (using the NBD counting model). This forecasting approach employs only information on the past timing and frequency of transactions. It has been developed for non-contractual settings, where transactions occur at random and customers’ attrition is unobserved. Although attractive for such settings, the Pareto/NBD model is of limited use to a retail banking organisation. Firstly, many of the product-level relationships are contractual and long-term, e.g., fixed-term deposits and personal and home loans. Secondly, purchasing decisions are often interdependent, and, as in the case of the probit model, the purchase quantity needs to be forecasted separately, which is a nontrivial task in itself. An alternative BG/NBD model proposed by Fader, Hardie, and Lok Lee (2005) reduces the computational burden compared to the Pareto/NBD model, but exhibits similar weaknesses in the context of our study. Another widely discussed and applied approach in customer relationship and behavioural models employs the Markov Chain methodology (e.g. Morrison, Chen, Karpis, & Britney, 1982; Pfeifer & Carraway, 2000). This class of models allows for additional flexibility in modelling customer behaviour relative to earlier models. One important restrictive limitation of the Markov Chain methodology, however, is that the robustness of profitability estimation depends on the existence of a relatively small number of

509

meaningful and homogenous customer segments. In the retail banking industry, this condition does not hold because there is a significant degree of diversity in customer purchasing behaviours, which may result in a wide range of revenues. This is even true for customers who are similar in terms of characteristics such as age, tenure as a customer, earning capacity etc., which are often used as a basis for customer segmentation. As a result, the robustness of the estimation suffers. Allocating customers to a limited number of segments may also lead to a considerable loss of information, which again affects the robust estimation of the model parameters. In the end, there might not be enough information available to satisfy the business needs, in terms of differentiating between groups of customers for marketing decisions. 3. Modelling customer behaviour In this section we describe our approach to predicting the customer lifetime values of individual customers in the context of a multiservice financial organisation. As was discussed above, in such organisations, a customer’s relationship with a company is typically long-term, and customer purchasing behaviour is rather complex. An additional difficulty in retail banking is that purchases of the same product (say, a personal loan or fixedterm deposit) by two different customers may result in substantially different revenues, as the revenues vary not only with the decision to purchase, but also across the wide range of possible purchase volumes. As was also discussed above, the existing models of customer behaviour using a Markov Chain methodology, Pareto/NBD models and probabilistic models developed within a probit modelling framework do not offer a satisfactory solution to this problem. We therefore propose an adaptive segmentation approach which uses information from the historic probability distributions of a set of predictive variables for a close customer neighbourhood to predict the future revenues associated with a given customer. Our approach allows for a wide range of customer purchasing behaviours without compromising the robustness of the estimation. 3.1. General CLV model We first provide a general definition of the customer lifetime value as the net present value of future cash flows associated with a customer over the time period of her relationship with a company, or over a given time horizon. Many articles provide slightly differing formulations of the customer lifetime value model (e.g. Berger & Nasr, 1998; Gupta et al., 2006; Jain & Singh, 2002; Reinartz & Kumar, 2003). The formula below provides a generic formulation. The customer lifetime value is given by CLV i =

T  (Riτ − Cτi )(1 − qiτ )Dτ − AC0i ,

(1)

τ =1

where Riτ denotes the predicted revenue from customer i in period τ , given that the customer continues his or her relationship with the company in period τ , and Cτi is the

510

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

direct cost of servicing the customer in this period. AC0i denotes the cost of acquisition for a new customer. Dτ is the discount factor in period τ , and T is the number of periods for which CLV is estimated. qiτ is the projected probability that customer i will terminate the relationship with the company in period τ , τ < T ; this is sometimes called the customer attrition rate. In the current application, we focus on predicting future revenues from a customer, Riτ , assuming that the direct costs Cτi associated with servicing a customer during 1 ≤ τ ≤ T and the costs of customer acquisition AC0i are known. The attrition rate qiτ can be either inferred from historic data or estimated by the adaptive segmentation approach presented in this paper. Considering that our approach can be applied to a wide range of forecasting problems, we next present a general formulation for the proposed approach. 3.2. The adaptive segmentation approach For each customer, we aim to obtain the conditional probability distribution of observing the variable of interest yt +τ , τ = 1, 2, . . . , T , conditional on a K dimensional vector of variables xt = (x1 , . . . , xK )t . Both the target variable yt +τ and any element in the vector of conditioning variables xt can be either continuous or discrete. In the current application, yt +τ represents the customer revenue Rit +τ , a continuous variable. The vector xt , which can include any elements of the historic information set for a customer available at time t, provides the predictive variables in our model. In the current application, it contains a set of both continuous and discrete variables from the company’s customer database, which are predictive of customers’ future revenue, and may include variables observed at all or some of time periods t , t − 1, t − 2, . . . , t − L, where L is the number of past periods for which historic customer data is available. Typical predictive variables may include current and past customer revenues, age, tenure with the company, credit score, and current and past product holdings. The conditional probability distribution of observing the target variable yt +τ , given the current customer state xt , is p(yt +1 |xt ), p(yt +2 |xt ), . . . , p (yt +T |xt ). We call these distributions predictive distributions from now on. Note that the probability p (yt +τ |xt ) is conditional on the current customer state xt , and also on the customer continuing his or her relationship with the company until and throughout the time between periods t and t + τ . We estimate these distributions empirically using an adaptive segmentation approach, because the analytical forms of the predictive probability distributions are not known. Given an information set xt for a customer, we locate a group of customers that were in a similar state at time t0 < t (if historic variable observations are included in xt , these customers also have a similar history at t0 ), and use their outcomes at t0 + 1 as a basis for predicting the outcome for the customer of interest. This group of customers will be referred to as a local segment (or customer neighbourhood), and the time t0 as the base period. The empirical conditional distribution of the target variable for the local segment at time t0 + 1 serves as an estimate of the one-period-ahead predictive distribution of

this variable. This is typically the most recent time period for which customer data are available. The empirical joint distribution of the conditioning variables within a local segment at time t0 + 1 is used as an estimate of the oneperiod-ahead predictive distribution of the conditioning variables. This one-period-ahead predictive distribution is then used in obtaining multi-period predictions. Fig. 1 illustrates this concept. This approach is based on the conjecture that customers with similar characteristics and past behaviours will tend to exhibit similar behaviours in future time periods. We identify a group of customers (the local segment) whose characteristics in a previous period are similar to those of a particular customer in the current time period, then use the local segment’s outcomes one period forward as a basis for predicting that customer’s future behaviour. The size of a local segment is chosen so that it is (a) small enough to ensure customer homogeneity within a local segment by capturing only customers with similar characteristics and past behaviour, and (b) sufficiently large to ensure the robustness of the forecasts of future behaviour. The local segmentation is based on the concept of a similarity measure, which is defined over the predictive variable space to describe the customer neighbourhood. The similarity measure D is defined as the weighted Euclidean distance between the conditioning variables of two customers:



j

D xit , xt0



  K   2  = wk xik,t − xjk,t0 .

(2)

k=1

This measure is used to determine the customers in the neighbourhood of customer i for whom a forecast is being made. Note that the customer in question, i, is being considered at time period t, whereas the customer’s neighbourhood is being considered at the base time-period t0 . The weights wk , k = 1, . . . , K , are employed to standardize the predictive variables, which all have different measurement scales; details are given in Section 3.4. For example, the customer age may vary between 18 and 100 years, whereas the spot balance on a money transmission account can take values of up to several million pounds. Without weighting, the similarity measure will be dominated by the variable with the largest measurement scale. For a continuous target variable, e.g. the period revenue, the empirical predictive distributions are used to estimate the mean and median values and confidence intervals. For a discrete target variable, the frequency of observing a customer in the local segment with a given value of the variable at t0 + 1 is employed as an estimate of the corresponding probability. This estimate is then compared against some cut-off value, in order to provide a value forecast. The cut-off value is estimated during the model validation stage using a ROC-analysis framework.3 This analysis is a standard tool originating from signal detection

3 ROC stands for Receiver Operating Characteristics, and is also known as Relative Operating Characteristics.

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

511

Fig. 1. Estimation of the empirical predictive distributions with a bivariate vector of conditioning variables (K = 2). N is the number of customers in the customer database.

theory. It is used across a wide range of practical applications for organizing and selecting classifiers on the basis of their performances (Fawcett, 2006). Section 6 contains an example of how to apply our approach to predicting a discrete target variable. In the example, we assess the behaviour of a number of standard statistics, e.g. sensitivity and specificity, against a range of cut-off values, and focus on searching for a cut-off which maximises the classification accuracy. The overall predictive power of our model (across a range of cut-offs) is assessed against alternative model specifications using the value of the area under the ROC curve. This area is equivalent to the Wilcoxon-MannWhitney test statistic (Hand, 1997). 3.3. Multi-period-ahead forecasting Under the assumption of time-homogeneity, a multiperiod-ahead predictive distribution can be obtained from the one-period-ahead predictive distributions for the target variable and predictive variables. For example, the twoperiod-ahead predictive marginal probability distribution can be obtained as a convolution of the corresponding oneperiod-ahead distributions for the target variable and predictive variables: p (yt +2 |xt ) =

 

=

p (yt +2 , xt +1 |xt ) dxt +1 p (yt +2 |xt +1 ) p (xt +1 |xt ) dxt +1 ,

(3)

and the multi-period-ahead predictive distribution is obtained by the convolution of the one-period-ahead distributions: p (yt +τ |xt ) =

 

...



p (yt +τ |xt +τ −1 )

× p (xt +τ −1 |xt +τ −2 ) . . . p (xt +1 |xt ) dxt +τ −1 . . . dxt +1 . (4) It follows from Eq. (4) that knowing the one-period-ahead predictive distributions p (yt +1 |xt ), 1 ≤ τ ≤ T , for the target variable and p (xt +1 |xt ) for the conditioning variables is sufficient for obtaining multi-period-ahead predictions,

with such predictions being based on empirical predictive probability distributions, in complete analogy to the case of one-period-ahead forecasts. We begin by describing our approach for a two-periodahead forecast, before addressing the general case. Given a current customer state xt , a random vector x˜ t +1 is generated from the one-period predictive distribution p (xt +1 |xt ). This can be done by either using re-sampling techniques such as bootstrapping, as in this paper, or sampling from an approximating analytical distribution, for example by using copulas or kernel smoothing. Using the simulated vector of conditioning variables x˜ t +1 , we generate the prediction y˜ t +2 for the next period from the empirical distribution p (yt +1 |xt ) using a technique analogous to that in the previous step. The resulting pair x˜ t +1 , y˜ t +2 has the joint probability distribution p (yt +2 |xt +1 ) p (xt +1 |xt ). The marginalisation over xt +1 is achieved by pooling y˜ t +2 for all simulated values x˜ t +1 . A similar approach is used to estimate the empirical predictive distribution for the periods t + τ , with 1 ≤ τ ≤ T . The sequence x˜ t +1 , x˜ t +2 , . . . , x˜ t +τ −1 is generated by successive sampling from the one-period-ahead predictive distribution p (xt +1 |xt ). At the next step, y˜ t +τ is generated from the one-period ahead predictive distribution p (yt +1 |xt ) using x˜ t +τ −1 , and the marginalisation over possible realisations xt +1 , xt +2 , . . . , xt +τ −1 is achieved by pooling all simulated values y˜ t +τ . The realisation of our multi-period forecast can be illustrated as follows. For the sake of simplicity, we discuss the case where the set of predictive variables consists of two variables only: the most recently observed revenue from a customer (‘‘revenue’’) and the average balance on her money transmission account (‘‘balance’’). The forecast variable is the revenue in the next period. Assume that the base period t0 for determining the local segment is given by the year 2006 and that the size of the local segment is set to 1000 customers. For a customer A whose most recent revenue is £25 and balance is £1200, we search the customer database and retrieve the 1000 customers (her local segment) who, in 2006, had the closest revenue and balance to those of A (as measured by the similarity

512

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

measure introduced in Section 3.2). These 1000 customers provide the predictive population for determining A’s future revenue. The 2007 revenues from this population define a one-year predictive distribution for the next-year revenue. For a two-year forecast, we locate a local segment for each member of the local segment of A (1000 local segments altogether) using the revenue and balance data from the base period as discussed above. All local segments are pooled together (realising the marginalisation over the one-period forecast) to give a two-year predictive population for A. As a result, we obtain a 1000 × 1000 two-year predictive population. Next, we ‘‘downsize’’ this population to the local segment size of 1000 by applying a standard non-parametric bootstrap procedure with replacement (Efron & Tibshirani, 1993). This twoyear predictive population will give an empirical twoyear predictive distribution for the revenue. Third year predictions can be obtained by repeating this procedure, starting with the two-year predictive population; and so on for subsequent year predictions. 3.4. Data-specific considerations for assigning variable weights in the similarity measure The analysis of our data shows that our continuous predictive variables are non-normal; their distributions are asymmetric and leptokurtic.4 This implies that the routinely employed standardisation of variables involving division by the standard deviation (which is used for normally distributed variables) is not appropriate for the calculation of our similarity measure. The standard deviation (or its multiple) is not representative of the true range for non-normal variables. The non-normality of variables might be caused by the nature of the financial services business. For example, some customers have very large revenues, adding to the asymmetry and fat tails of the distribution. Errors and inaccuracies, which are unavoidable in real-life company data collection and recording processes, might also contribute to the fat tails. A substantial amount of data, e.g., customer personal data, is added to the database via manual entries which are prone to ‘‘human’’ errors and omissions. Even more importantly, data collection processes within a company are generally driven by operational needs, for which a less than perfect level of accuracy might be satisfactory, and, as a result, a company might not have a strong incentive for improving the quality of the data. Thorough data cleaning just for modelling purposes might prove too expensive and impractical for many organisations. Our method for assigning weights wk in Eq. (2) lowers the impact of outliers, including errors and inaccuracies, and accounts for data-specific considerations. We scale our predictive variables by dividing by the variable range. The variable range is limited here to 95% of the full range (2.5% of the observations are trimmed off each tail), to lower the impact of the outliers on the predictive

4 Exact descriptions of the variables and their distributions cannot be reported due to the commercial sensitivity of these data, but some general information is given in Section 5.1.

power of our model. The 95% variable range is judged to be representative of the ‘‘typical’’ variable range for most continuous variables, and is therefore used in our weighting. The weights for the continuous variables in Eq. (2) are defined so that the variable values in the typical range are constrained to be in the interval [0, 1]:

wk =

1 Range (xk )

,

(5)

where Range (xk ) = Percentile (xk , 0.975) − Percentile(xk , 0.025). For discrete variables, wk = 0 for equal values and wk = ∞ for non-equal values, so that only customers with a given value of a discrete variable enter a local segment.5 3.5. Discussion of the adaptive segmentation approach Our adaptive-segmentation approach to modelling offers a number of advantages over existing methods in the context of a multiservice financial organisation. First, our approach is not dependent on the presence of a particular type of distribution, and also works with different correlation structures (no limiting assumptions are imposed). This is an important feature, given that most variables in our commercial context are not normally distributed and exhibit a non-trivial dependence structure (e.g., strong nonlinear correlation and autoregressive effects are detected in the dataset). Second, all information contained in the variable probability distribution is preserved, which allows the estimation of confidence intervals and other important statistics. Third, unlike Markov Chain models and other probabilistic models which use customer segments, our model adaptively searches for homogenous customer segments without loss of information about variable distributions. Fourth, the model can work with partial information and missing variable values, and can still produce meaningful forecasts if some values of the predictive variables are not available. This is especially important in two real-world situations in particular: the prediction of future revenue for new-to-bank customers, for whom only partial information is available initially, and overcoming the problem of missing values in datasets due to the imperfections of the company data collection process. Finally, the effects of errors leading to outliers in the raw company data are reduced in our adaptive segmentation approach by scaling. A potential disadvantage of our non-parametric model is that additional analyses would be required to draw generalizing inferences about the relationships in the data if this is needed for the company’s decision process, but there are many applications where this is not an issue. 4. Optimisation of computational efficiency In a practical application, the computational intensity of our adaptive segmentation approach is a potential

5 If the number of possible values of a discrete variable is large, one may consider reducing its dimensions using problem-specific considerations or existing data-driven methods (e.g. Greenacre, 1988, 1993).

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

issue. We therefore use an algorithm for prior coarse segmentation, which substantially reduces the computational demands. As a pre-processing step, a coarse segmentation is applied to the customer dataset that will be used for local segmentation using the set of predictive variables. The boundaries of the coarse segments are chosen so that the coarse segments are non-overlapping and equally populated. We use the following procedure to obtain the boundaries. At the first step, the customer dataset for local segmentation is ordered by the first continuous predictive variable and divided into q equally sized subsets.6 Here, √ k q = S, where S is the number of coarse segments and k is the number of continuous predictive variables. In step two, each of the q groups is ordered by the second predictive variable and divided into q equally sized subsets. At the end of step two, q2 bins are obtained. In step three, each bin from step two is ordered by the third continuous predictive variable and divided into q subsets. This process is repeated k times altogether (for each continuous predictive variable), so that we end up with qk bins. The bins from the last split give the coarse segments, and we record their boundaries and centres. If one or more discrete variables are included in the predictive variable set of the model, a different approach is required. The customer dataset is first divided into c1 subsets, where c1 is the number of categories within the first discrete variable; second, each subset from the previous step is divided further into c2 subsets, where c2 is the number of categories within the second discrete variable. The latter procedure is repeated a number of times equal to the number of discrete variables. A procedure for continuous variables as described above is next applied to the bins obtained from the segmentation using discrete variables. The similarity measure is then calculated only for the central points of the coarse segments instead of for all entries in the dataset, drastically reducing the required computations. Only the coarse segments which are the closest to the current state of the customer in question are selected; the entries from these segments are then pooled together and a local segment is produced out of the pooled entries. Fig. 2 illustrates this algorithm. The proposed approach could potentially exclude a small number of entries on the external border of a local segment from the area available for local segmentation, even though they belong to the local segment. Such excluded entries are illustrated by the top part of the circle defining the local segment (no hatching), which lies outside the red-shaded area (vertical hatching) in Fig. 2; although this is part of the local segment, it is not included in the selection of coarse segments. However, the excluded area is very small and has a negligible effect on the estimated values. The segment size is the same across all coarse segments, in order to ensure equal representation.7 The coarse

6 The ordering of the predictive variables in the pre-processing step can be arbitrary. 7 If discrete variables are included in the set of predictive variables, coarse segments will be equally populated for a given value of a discrete variable.

513

Fig. 2. Adaptive segmentation: optimisation of computation. The large red dots and red lines correspond to the centres and boundaries of the coarse segments; the green dot (medium sized dot enclosed by a circle representing the boundary of the local segment) represents the current state of a customer whose revenue is forecast; the red shaded area (vertical hatching) covers the closest coarse segments, and the grey shaded area (horizontal hatching) shows the overlap between the local segment and the closest coarse segments. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

segment size is chosen with the objective of minimising the computation required at the √ segmentation stage. A coarse segment size close to 1.5 N satisfies this objective in our dataset for a given size of the local segment (where N is the number of customer entries in the local segmentation dataset).8 5. Model estimation and validation This section presents the model estimation and validation results using data from a UK retail bank. Model validation in our approach has two objectives: (a) to choose the set of predictive variables with the best predictive power, and (b) to evaluate the predictive performance of the model. The predictive performance of our model is also compared to the performances of two benchmark models: the simple and multiple regression models. Donkers et al. (2007) find that a simple profit regression model performs well compared to more complex relationship-level and service-level models when predicting CLV within a retail financial services organisation. 5.1. The data The model estimation and validation are performed using historic revenues and other retail customer data for 467 789 retail banking customers of National Australia Group Europe over the period 2005–2008. Our dataset covers five major product groups: money transmission accounts, savings accounts, unsecured personal

8 The function for the number of computations at the segmentation stage √ obtains a global minimum when the coarse segment size is close to 1.5 N; details are available from the authors upon request.

514

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

Table 1 Training and out-of-sample and out-of-time validation datasets covering three financial year periods: 2005–2006, 2006–2007 and 2007–2008. Customer dataset

Predictive variables: financial year

Revenue: financial year

One-period-ahead prediction Training 2005–2006 Validation 2006–2007

2006–2007 2007–2008

Two-period-ahead prediction Training 2005–2006 Validation 2005–2006

2006–2007 2007–2008

loans, home loans, and credit cards. For each of these product groups, we have records of customers’ monthly balances and turnovers, numbers of product holdings within each product group, dates of product purchases and product maturities, and also the initial volume of purchase (if applicable). We also have other customer-level information which is updated monthly: a customer’s age, the length of time a customer has been with the bank, her internal marketing segment, banking relationship segment, product holdings, frequency of transactions across all products, average debit and average credit turnovers, absolute value of the customer’s business with the bank, several indicators of customer credit risk, provision for bad and doubtful debt, and the revenue which a customer has generated for the bank in a given month at the product and customer levels. Only customers who stayed with the bank, i.e. kept at least one account active, over the entire threeyear period are included in our dataset.9 The post-estimation validation of our model is done using the out-of-sample and out-of-time observations (Table 1). Approximately 10% of our dataset is set aside for the validation set and is not included in the estimation. Setting aside a greater number of observations for the outof-sample validation will change the density of observations in our training sample and may affect the accuracy of estimation using our approach. 5.2. Input selection The main purpose of this study is to predict the customer revenue as an outcome of the customer behaviour. However, the customer revenue, expressed in absolute figures, is affected by external factors, even if the customer behaviour stays the same in different time periods. For example, the change in the base rate of the Bank of England has a direct impact on the sales margin of a financial services provider. Many long-term financial products are sold with a fixed interest rate, and products which are sold prior

9 The historic attrition rate, capturing customers who have left the bank by closing all of their accounts, is quite low. Most customers in our database open a money transmission account with the bank as a matter of convenience, even if their actual business with the bank relates to some other product (e.g., a personal loan), and often these accounts are retained with a small balance even if the customer moves their balances on other products elsewhere. This hidden attrition occurs at a considerably higher rate than complete attrition. The hidden attrition process is captured in our dataset by the presence of customers with residual dormant accounts, allowing this sort of outcome to be considered in the CLV estimation process.

to the change in the base rate will bear the old price (either increasing or decreasing the amount of the sales margin, depending on the direction of this change). In addition, the interest rates on retail financial products are not always adjusted in a direct correspondence with changes in the base rate. The most recent significant mismatch between the Bank of England rate, the costs of wholesale borrowing and interest rates on retail financial products was observed throughout the second half of 2008, at the peak of the financial crisis. Intense competition may also cause a financial provider to reduce their sales margins. In order to account for changes in the future revenue due to external factors, the set of predictive variables in our model is chosen by maximizing the rank correlation statistics between the predicted and actual values. Kendall’s τ is used for this purpose. This statistic has the advantage of being insensitive to any variable transformation which does not change the ordering in the population. Using this statistic for choosing the model ensures that the final model will best preserve the relative ordering in the target variable (in our case, the revenue from a customer) and accurately predict customers with higher future revenues versus customers with lower future revenues, even if the revenue value itself is affected by external factors in the period to which the prediction relates. We apply an iterative algorithm for input selection and for the determination of the optimal local segment size. In the first step, we employ a procedure for input selection which is similar to the standard stepwise procedure used in the multiple linear regression models. The procedure starts with an initial model. At each step, candidate variables are systematically added to and removed from the model; and Kendall’s τ of incrementally larger and smaller models are compared. This procedure is stopped when Kendall’s τ reaches its maximum, and adding or removing candidate variables does not produce a statistically significant improvement. We use the data from our validation dataset, which is set aside prior to estimation, for the out-of-sample assessment of alternative model specifications during the input selection process. Note that this assessment is performed using out-of-sample but not out-of-time observations. Given the non-parametric nature of the adaptive segmentation approach, our input selection process resembles a process of validation.10 During input selection, the size of the local segment is set equal to an arbitrary reasonable value. In the next step, we determine an optimal local segment size. Kendall’s τ is again used to assess the sensitivity of the accuracy of prediction to the size of the local segment. When increasing the size of the local segment, Kendall’s τ first increases, then reaches a maximum value, plateaus at the maximal level, and eventually begins to decrease. This pattern can be explained as follows. With a small segment size, the empirical predictive distribution is dominated by customers that closely match the current

10 We repeat our input selection procedure with a limited number of the most likely candidate variables, using a number of randomly drawn outof-sample test sets to ensure that the results are not specific to a given test set.

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

5.3. Model validation After the set of input variables and the optimal segment size have been chosen, we validate our one-period-ahead and two-period-ahead model predictions against actual values. We are limited by data availability from performing the validation of predictions for a greater number of periods ahead. The model provides robust predictions of customer revenue for one and two periods ahead. In Fig. 3, the slope of the line around which the model predictions are grouped indicates the predictive power of our model.11 Predictions along the diagonal line indicate perfect forecasts where the predicted revenue is exactly equal to the actual revenue. A low predictive power results in a near-zero slope, where the predicted revenues are scattered around the mean revenue in the population. A higher predicted than actual revenue value for customers in the lower revenue range could be due to the decrease in sales margins on many retail products which occurred in the second half of 2007 and in 2008. This trend should not, however, affect the robustness of the prediction of relative revenue ranking, as discussed above. The predictive power of our adaptive segmentation approach is compared to those of two alternative models: the simple and multiple regression models (Tables 2 and 3). In the simple regression model, the customer revenue at time t is a linear function of the customer revenue at t − 1. The multiple regression model includes additional variables, namely those which are used in our adaptive segmentation model. The same training and validation datasets are employed for assessing the out-of-sample and out-of-time predictive performances of all three models. In the two-year prediction exercise using the multiple regression model, the values of the predictive variables (except for the revenue) at time t + 1 are set equal to their values at time t, as suggested by Donkers et al. (2007). An improvement in both the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) is measured by a reduction in the prediction error relative to the

11 The actual revenue figures and predictive variables in our model cannot be revealed, because this is commercially sensitive information.

One-period-ahead prediction

Prediced revenue

103 102 Model prediction 2.5% prediction percentile 97.5% prediction percentile Ideal forecast

101 100 0 10

101

102

103

Actual revenue Two-period-ahead prediction 103 Prediced revenue

state. The lack of adequate variation in the underlying data may therefore lead to a prediction which is very similar to the current state. Over-expanding the size of the local segment, however, allows too many customers with different behavioural characteristics into a local segment, thus shifting the predictive distribution for this segment towards the population mean. Both situations lead to an inferior predictive power of the model. A local size segment is chosen on the basis of these considerations. At the final step, we repeat our stepwise input selection procedure, only this time we use a subset of the most likely candidate variables which contributed to an increase in Kendall’s τ at the first stage. The last step verifies that the set of predictive variables which was chosen at stage one, and which includes four variables in our case, has not changed with the adjustment in the size of the local segment.

515

102 Model prediction 2.5% prediction percentile 97.5% prediction percentile Ideal forecast

101 100 100

101

102

103

Actual revenue Fig. 3. Actual versus predicted customer revenue; logarithmic scale.

benchmark model (the simple regression model) divided by the mean revenue and multiplied by 100. Thus, a positive value signals a superior performance compared to the benchmark model in Tables 2 and 3. The value can be interpreted as an improvement in percentage points of average revenue relative to the benchmark model. For both one- and two-period-ahead predictions, the adaptive segmentation model outperforms the two alternative models. For one-period-ahead prediction, the adaptive segmentation approach improves the MAE by 9.2% and the RMSE by 6.2% of the average value of the customer revenue, compared to the simple regression model. The improvement in both MAE and RMSE over the benchmark model is higher for the two-period-ahead predictions. The corresponding figures are 15.2% and 9.3%. The out-of-sample R2 is also higher for our model.12 The adaptive segmentation approach is also better at predicting the relative orderings of future revenues from customers compared to the benchmark model, according to the fraction of the concordant and discordant pairs at both horizons and Kendall’s τ at the two-period horizon. The multiple regression model marginally improves the one-period-ahead prediction of the simple regression model according to the MAE, RMSE, and out-of-sample R2 , but not in terms of the indicators of the relative ordering. The two regression models show similar performances at the two-year horizon. From a business point of view, the main aim of this forecast exercise is to be able to spot customers whose revenues are likely to increase in the coming years. To

12 We adopt the Goyal and Welch (2008) approach to the calculation MSE of an out-of-sample R2 : R2 = 1 − MSE P , where MSEP is the MSE of H the out-of-sample predictions of the model and MSEH is the MSE of the historical sample mean (the mean revenue of the training set, in our case). It indicates how a model performs relative to the historical sample mean.

516

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

Table 2 Prediction accuracy: out-of-sample and out-of-time validation of the one- and two-period-ahead predictions of the revenue from individual customers. Model

One-period-ahead prediction Adaptive segmentation Simple regression Multiple regression Two-period ahead prediction Adaptive segmentation Simple regression Multiple regression

Improvement in (in percentage points)

R2

Kendall’s τ

Concordant pairsa

Discordant pairsa

MAE

RMSE

9.2 0.0 0.3

6.2 0.0 1.9

0.69 0.65 0.66

0.78 0.78 0.76

0.85 0.84 0.84

0.09 0.10 0.10

15.2 0.0 1.6

9.3 0.0 −0.2

0.47 0.40 0.40

0.69 0.68 0.65

0.81 0.80 0.79

0.14 0.15 0.15

a Concordant and discordant pairs do not add up to one because of ties. The ties in our estimation are mostly caused by the presence of ties in the observed revenues in our data sample.

Table 3 Improvement in prediction accuracy over the benchmark model, simple regression, separately for two customer groups: those whose actual revenue increased during the forecast period and the remaining customers. Model

Improvement in: (in percentage points) MAE

R2

RMSE

Customers whose actual revenue increased One-period-ahead prediction Adaptive segmentation Simple regression Multiple regression Two-period-ahead prediction Adaptive segmentation Simple regression Multiple regression

8.6 0.0 1.7

11.3 0.0 2.2

0.73 0.70 0.70

14.8 0.0 −0.2

23.4 0.0 0.6

0.57 0.48 0.49

9.8 0.0 0.1

2.7 0.0 1.7

0.67 0.65 0.66

13.3 0.0 1.6

−0.1

0.33 0.33 0.24

Customers whose actual revenue did not increase One-period-ahead prediction Adaptive segmentation Simple regression Multiple regression Two-period-ahead prediction Adaptive segmentation Simple regression Multiple regression

0.0 −4.2

this end, Table 3 presents the improvements in MAE and RMSE, and also the out-of-sample R2 for two subsets of customers: those whose actual revenue increased considerably (as measured by a given threshold) during the forecast periods (here called dynamic customers), and those whose revenue did not increase in these periods.13 The adaptive segmentation approach performs particularly well for the dynamic subset of customers whose revenue increased significantly in the two forecast periods, compared to the two alternative models. The advantage of our model in respect to the predictive performance is greater at the longer (two-year) forecast horizon for this subset of customers. The improvements in MAE and RMSE over the benchmark model are 14.8% and 23.4% per average revenue. Our model also outperforms

13 Stable customers whose revenue does not change significantly from year to year constitute a large part of our dataset. The statistics for the two customer subsets are therefore calculated using the out-of-time but not out-of-sample forecast values, in order to ensure that our subset of dynamic customers is large enough to make the sample statistics meaningful.

the other models for the more stable subset of customers at the one-period forecast horizon, as well as for the two-period forecast horizon in terms of MAE. This means that the improved prediction for dynamic customers can be capitalized on without compensatory problems in the more stable elements of the customer base. The adaptive segmentation model and the simple regression model give comparable prediction accuracies in terms of the RMSE and R2 at the two-year forecast horizon for customers whose revenues have not increased. The out-of-sample R2 values are lower for the twoperiod-ahead predictions for all three models. This is to be expected, as more distant periods are more difficult to predict. Part of our data for this validation comes from the year 2008, which coincides with the banking crisis. During that year, there was a market-wide misalignment between the cost of funds and the lending interest rate, and also an unusually high growth in bad and doubtful debts. Our model demonstrates a strong performance in this nontrivial environment. 6. Other applications With a view to directing the customer management effort and marketing spend toward customers with the greatest future profit generating potential, it is useful for management to identify those customers whose revenue is likely to significantly increase or decrease in future periods. We can therefore use our modelling approach to address this by the prediction of binary variables indicating whether the revenue from a customer is likely to increase (decrease) above (below) a given threshold or remain between the thresholds in the next period(s). Given that companies often do not possess information on the details of their customers’ activities with other providers, this objective can be rather challenging when traditional models are used. By matching a customer to the right local segment, our adaptive segmentation approach allows the extraction of relevant information from the behavioural patterns of the customers in their local segment, who possess similar characteristics and past behaviours. This can augment the directly-measurable information that companies normally have available. In order to assess the performance of our model in this context relative to a benchmark model, the three binary target variables were also predicted using the routinely used probit model with the same set of predictive

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

517

J

Fig. 4. ROC curves for the one-period-ahead prediction of customer behaviour. Table 4 Model statistics: the out-of-time assessment of the one-period-ahead forecasts of three binary variables indicating (1) an increase in revenue above the minimum upper threshold (‘‘Jump up’’), (2) a decrease in revenue below the minimum lower threshold (‘‘Jump down’’), and (3) a change in revenue within the thresholds (‘‘Stable’’).

states, and its ranking approach makes it useful in situations where the environmental characteristics are unstable or unpredictable. It is also suitable in situations where the data quality is not ideal.

Forecast variable

Model

Area under the curve

Accuracy

7. Conclusion

Jump up

Adaptive segmentation Probit Adaptive segmentation Probit Adaptive segmentation Probit

0.8725 0.8126 0.9322 0.9229 0.9360 0.9040

0.8078 0.7556 0.8704 0.8565 0.8761 0.8619

This paper proposes an adaptive segmentation approach to the modelling of the lifetime value for individual customers in a multi-service financial organisation. The main contribution of our modelling approach is the adaptive location of homogeneous customer segments, avoiding a loss of information about variable distributions which would compromise the accuracy of the predictions (as in Markov Chain and other probabilistic models using customer segments). This is important, because customer behaviours are very varied in the financial sector, even for customers who are similar in terms of the characteristics which are normally used for customer segmentation (e.g., customer age and personal income). Customers purchase more than one product or service, and these purchases are not independent. The revenue from a customer is determined not only by her purchase decision, but also by the purchase amount, which varies widely between customers. Other advantages of our modelling approach are that it does not require any assumptions about the shape of the variable distributions or the correlation structure between them. The model can work with partial information and missing variables, producing a meaningful forecast even if some values for the predictive variables are not available. This allows the prediction of future revenue for newto-bank customers, for whom only partial information is initially available, and also overcomes the problem of the missing values which occur in many company datasets due to the imperfections of the company data collection process. These features are extremely desirable in a practical context.

Stable Jump down

variables as in our adaptive segmentation model.14 , 15 The training and validation datasets for assessing the out-ofsample and out-of-time predictive performances of the two models are constructed in a way similar to that described in Table 1 for one-period-ahead predictions (Table 4 and Fig. 4). The adaptive segmentation approach outperforms the probit model for all three variables, with an advantage that is greater for customers whose revenue increased or decreased in a future period, according to the area under the curve. Our approach can be employed similarly for the prediction of the customer attrition rates which act as an input variable for CLV estimation in Eq. (1). Another potential application of our adaptive segmentation approach would be for predicting the propensity for a customer to purchase a financial product and the associated purchase volume. In general terms, this modelling approach can be applied in a range of predictive applications, whether of values or

14 A separate binary probit regression model is estimated for each of the three target variables. 15 We also used the logit model; it produced results which were very similar to those from the probit model, and therefore we do not report these findings here.

518

A. Audzeyeva et al. / International Journal of Forecasting 28 (2012) 507–518

We apply our model to the estimation of future customer revenue in a UK retail bank over the period 2005–2008. Our model provides robust predictions of the revenues from individual customers, and also of significant changes in their revenues (compared to benchmark models), using a small number of predictive variables. The accurate identification of dynamic customers is particularly useful from a practical viewpoint, as these are a focus of attention in business terms, and need to be differentiated from the larger volume of stable customers. The validation of our 2-period-ahead forecasts using 2008 data, which coincides with the crisis in the banking sector, confirms that the relative ranking approach still produces robust results during times of significant economic change. Other potential applications include the prediction of other customer-related characteristics, which could be either continuous or discrete variables. This provides a powerful tool in the development of tailored customer acquisition and retention strategies. Although the approach has been developed in the context of the financial services sector, it has a broad potential for predictive applications in other complex business environments for which the standard approaches do not offer a satisfactory solution. Acknowledgments This research is supported by the Knowledge Transfer Partnership between the University of Leeds and National Australia Group Europe, trading as Clydesdale Bank and Yorkshire Bank. The authors thank Martin Allton, Head of Customer Knowledge, and Lucy Marshall, Customer Insight Manager, of National Australia Group Europe for their valuable contributions to this project. The authors thank participants at the Credit Scoring and Credit Control XI conference, Edinburgh, 2009, for helpful comments on an earlier version of this work. References Berger, P. D., & Nasr, N. I. (1998). Customer lifetime value: marketing models and applications. Journal of Interactive Marketing, 12, 17–29. Donkers, B., Verhoef, P. C., & de Jong, M. (2007). Modeling CLV: a test of competing models in the insurance industry. Quantitative Marketing and Economics, 5, 163–190. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall/CRC. Fader, P. S., Hardie, B. G. S., & Lok Lee, K. (2005). ‘‘Counting your customers’’ the easy way: an alternative to the Pareto/NBD model. Marketing Science, 24, 275–284. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874. Finlay, S. (2010). Credit scoring for profitability objectives. European Journal of Operational Research, 202, 528–537. Fishelson-Holstine, H. (1998). Case studies in credit risk model development. In E. Mays (Ed.), Credit risk modeling (pp. 169–180). Chicago: Glenlake Publishing. Goyal, A., & Welch, I. (2008). A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies, 21, 1455–1508. Greenacre, M. J. (1988). Clustering the rows and columns of a contingency table. Journal of Classification, 5, 39–51.

Greenacre, M. J. (1993). Correspondence analysis in practice. London: Academic Press. Gupta, S., Hassens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., et al. (2006). Modeling customer lifetime value. Journal of Service Research, 9, 139–155. Hand, D. J. (1997). Construction and assessment of classification rules. In Wiley Series in Probability and Statistics, London: John Wiley & Sons. Jain, D., & Singh, S. S. (2002). Customer lifetime value research in marketing: a review and future directions. Journal of Interactive Marketing, 16, 34–46. Kamakura, W. A., Ramaswami, S., & Srivastava, R. K. (1991). Applying latent trait analysis in the evaluation of prospects for cross selling of financial service. International Journal of Research in Marketing, 8, 329–349. Kamakura, W. A., Wedel, M., de Rosa, F., & Mazzon, J. A. (2003). Crossselling through database marketing: a mixed data factor analyser for data augmentation and prediction. International Journal of Research in Marketing, 20, 45–65. Knott, A., Hayes, A., & Scott, A. N. (2002). Next-product-to-buy models for cross-selling applications. Journal of Interactive Marketing, 16, 59–75. Li, S., Sun, B., & Wilcox, R. T. (2005). Cross-selling sequentially ordered products: an application to consumer banking services. Journal of Marketing Research, 42, 233–239. Malthouse, E. C., & Blattberg, R. C. (2005). Can we predict customer lifetime value? Journal of Interactive Marketing, 19, 2–16. Morrison, D. G., Chen, R. D. H., Karpis, S. L., & Britney, K. E. A. (1982). Modelling retail customer behaviour at Merrill Lynch. Marketing Science, 1, 123–141. Oliver, R. M. (1993). Effects of calibration and discrimination on profitability scoring. In Proceedings of credit scoring and credit control III. Credit Research Centre, University of Edinburgh. Pfeifer, P. E., & Carraway, R. L. (2000). Modeling customer relationships as Markov chains. Journal of Interactive Marketing, 14, 43–55. Reinartz, W., & Kumar, V. (2003). The impact of customer relationship characteristics on profitable lifetime duration. Journal of Marketing, 67, 77–99. Schmittlein, D. C., Morrison, D. G., & Colombo, R. (1987). Counting your customers: who are they and what will they do next? Management Science, 33, 1–24. Thomas, L. C. (2000). A survey of credit and behavioural scoring: forecasting financial risk of lending consumers. International Journal of Forecasting, 16, 149–172. Zeithaml, V., Rust, R. T., & Lemon, K. N. (2001). The customer pyramid: creating and serving profitable customers. California Management Review, 43, 118–142.

Alena Audzeyeva is a Lecturer in Finance at Keele University. Her current research interests include modelling customer behaviour, estimation of credit rating transitions, and explaining and forecasting the termstructure of yield spreads on defaultable debt. Prior to coming to Keele Alena held a position of a KTP Associate with the University of Leeds and National Australia Group Europe. She has seven years of experience in the banking industry in the UK and abroad.

Barbara Summers is a member of the Centre for Decision Research at Leeds University Business School. Her recent work has focused on issues such as the impact of emotion (affect) on financial decision making, how individuals evaluate investments, and factors affecting consumers’ financial choices. Prior to her move to an academic career she spent a number of years working in software development, including over six years in the financial services industry, rising to the position of Head of Systems Development at Equifax Europe, a UK credit reference agency.

Klaus Reiner Schenk-Hoppé holds the Centenary Chair in Financial Mathematics at the University of Leeds, a joint post between the Business School and the School of Mathematics. His research interests are in finance, economics and applied mathematics, with a particular focus on evolutionary models of financial markets and investment. Klaus has published extensively in international journals and was co-editor of the volume ‘‘Financial Markets: Dynamics and Evolution’’ for the NorthHolland Handbooks in Finance series in 2009.