Cost-benefit factor analysis in e-services using bayesian networks


Expert Systems with Applications 36 (2009) 4617–4625

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Jie Lu a,*, Chenggang Bai b, Guangquan Zhang a

a Faculty of Information Technology, University of Technology, Sydney, P.O. Box 123, Broadway, NSW 2007, Australia
b Department of Automatic Control, Beijing University of Aeronautics and Astronautics, China

Article info

Keywords: e-Services; Bayesian networks; Evaluation; Cost-benefit factor analysis; Inference

Abstract

This study applies Bayesian network techniques to analyze and verify the relationships among cost factors and benefit factors in e-service systems. It first establishes a Bayesian network for e-service cost-benefit factor relationships based on our previous study [Lu, J. & Zhang, G. Q. (2003). Cost benefit factor analysis in e-services. International Journal of Service Industry Management (IJSIM), 14(5), 570–595]. It then calculates conditional probability distributions among the factors shown in the Bayesian network. Finally, it runs a Junction-tree algorithm to conduct inference for verifying these cost-benefit factor relationships, with the data collected through a survey used as evidence in the inference process. Through this application of Bayesian network techniques, a set of useful findings is obtained about the costs involved in e-service developments against the benefits received by adopting these e-service systems. The case of ‘increased investments in maintaining e-services’ would significantly contribute to ‘enhancing perceived company image’, and the case of ‘increased investments in security of e-service systems’ would bring high benefits in ‘building customer relationships’ and ‘improving cooperation between companies’. These findings have great potential to improve the strategic planning of businesses by determining more effective investment items and adopting more suitable development activities in e-service systems and applications. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Since the mid-1990s, businesses have spent considerable time, money and effort developing web-based electronic service (e-service) systems. These systems assist businesses in building more effective customer relationships and gaining competitive advantage by providing interactive, personalized, faster e-services to customers (Chidambaram, 2001). Businesses in the earlier stages of employing web-based e-service systems had little data, knowledge, and experience for assessing and evaluating the potential impacts and benefits of e-services for organizations. Organizational efforts were largely geared toward customer service provision, with little thought given to identifying and measuring the costs involved in e-service development against the benefits received by adopting e-services. After several years’ experience of e-service provision, businesses now urgently need to carry out such an assessment in order to plan their further development in e-services. Importantly, businesses have by now accumulated operational data and knowledge from running e-service systems, which can directly help identify which items of investment in an e-service system effectively contribute to which benefit aspects of business objectives.

* Corresponding author. Tel.: +61 02 95141838. E-mail addresses: [email protected] (J. Lu), [email protected] (C. Bai), zhangg@it.uts.edu.au (G. Zhang). 0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.05.018

With the wide development of e-services, researchers have expressed increasing interest in evaluating the success, quality, usability and benefit of e-service systems from various views and using various methods (DeLone & McLean, 2004; Wade & Nevo, 2005). A major focus in this area is the evaluation of the features, functions or usability of e-service systems. Typical approaches used are testing, inspection and inquiry (Hahn & Kauffman, 2002) through a web search or a desk survey, such as the results reported in Ng, Pan, and Wilson (1998), Smith (2001), and Lu, Tang, and McCullough (2001). Another type of related research is the evaluation of customers’ satisfaction with e-services. Questionnaire-based surveys and multi-criteria evaluation systems are widely used to conduct this kind of research, for example in Lin (2003) and Srinivasan et al. (2002). Moreover, some significant results are reported on the establishment of e-service evaluation models and frameworks, such as the results shown in Lee, Seddon, and Corbitt (1999), Zhang and von Dran (2000) and Giaglis et al. (1999). However, the research discussed above focuses only on the evaluation of an e-service system itself from the user point of view, by measuring either customer satisfaction or the functionality of the e-service system. Some research does address the view of e-service providers: Giaglis, Paul, and Doukidis (1999) presented a case study of e-commerce investment evaluation, Drinjak et al. (2001) and Piris et al. (2004) investigated the perceived business benefits of investing in e-service systems, and Amir, Awerbuch, and Borgstrom (2000) created a cost-benefit framework for online system evaluation. Nevertheless, there is a lack of exploration and deep analysis of the possible relations that link these investment items with related business benefits.

Furthermore, businesses would like to know if their investments in e-service systems are successful by conducting cost and benefit analysis. The investments (costs) include e-service related software development, database maintenance, website establishment, staff training and other items. Similarly, the benefits obtained through e-service applications include many aspects, such as increasing the number of customers, better business image, and competitive advantages. Therefore, businesses, as e-service providers, need to know which item(s) of their investments are more important and effective than other items for achieving their business objectives, and also which item(s) of their investments can bring more obvious benefits for certain aspect(s) of the businesses. These results will directly or indirectly support better business strategy making in e-service application developments.

Our previous research (Lu & Zhang, 2003) identified some inter-relationships and interactive impacts among the costs and benefits of providing e-services to customers by using linear regression and ANOVA analysis approaches. Since some inter-relationships among the above mentioned cost-benefit factors may be non-linear, as a further study, this paper reports how these cost-benefit factor relationships are verified and how uncertain relationships are identified by applying Bayesian network techniques.

After the introduction, this paper outlines our previous work, including an e-service cost-benefit factor framework, the data collection process, and a cost-benefit factor-relation model, in Section 2. Section 3 analyses how Bayesian network techniques are applied in finding relationships among cost and benefit factors. The detailed process of establishing a cost-benefit Bayesian network and conducting inference among cost and benefit factors is presented in Section 4. Section 5 reports our findings on the relationships between cost and benefit factors in e-service systems. Conclusions and further study are discussed in Section 6.

2. Previous research review

2.1. e-Service cost-benefit factor framework

e-Service cost (C) is the expense incurred in adopting e-services, such as the expense of setting up and maintaining an e-service. e-Service benefit (B) is concerned with the benefits gained through employing e-services. Fig. 1 shows 16 benefit factors and 8 cost factors of e-service system developments and applications. All these factors have been identified and described in Lu and Zhang (2003).

Fig. 1. e-Service cost-benefit factor framework.

E-service benefits (B)

| Factor | Description |
|---|---|
| B1 | Building customer relationships |
| B2 | Broadening market reach |
| B3 | Lowering of entry barrier to new markets |
| B4 | Lowering the cost of acquiring new customers |
| B5 | Global presence |
| B6 | Reducing information dissemination costs |
| B7 | Reducing advertising media costs |
| B8 | Reducing operation (transaction) costs |
| B9 | Reducing transaction time |
| B10 | Reducing delivery time |
| B11 | Gathering information to create customer profiles |
| B12 | Customer and market research facility |
| B13 | Cooperation between companies to increase services |
| B14 | Enhancing perceived company image |
| B15 | Realizing business strategies |
| B16 | Gaining and sustaining competitive advantages |

E-service costs (C)

| Factor | Description |
|---|---|
| C1 | Expense of setting up E-service |
| C2 | Expense of maintaining E-service |
| C3 | Internet connection cost |
| C4 | Hardware/software cost |
| C5 | Security concerns cost |
| C6 | Legal issues cost |
| C7 | Training cost |
| C8 | Rapid technology change cost |

2.2. Data collection

This study collected data concerning e-service development costs and benefits from a sample of Australian companies (e-service providers). In order to select the sample, this study first conducted a web search to find companies which had adopted e-services at an appropriate level and for an appropriate period. A total of 100 companies were randomly selected from Yellow Pages Online (New South Wales, Australia) http://www.yellowpages.com.au under the Tourism/Travel (including Accommodation and Entertainment) and IT/Communication (including Information Services) categories. A survey was then conducted by sending a questionnaire to these sample companies. Out of 34 questions in the questionnaire, some were related to the costs of developing an e-service application, and some were related to the benefits obtained from developing an e-service application. A total of 48 completed responses were used in this study.

In the questionnaire, all cost related questions use a five-point Likert scale: 1 – not important at all, 5 – very important. For example, if a company thinks the cost of maintaining an e-service is very important, it records the degree of importance of the factor as 5. A five-point scale is also used for present benefit assessment: 1 – low benefit, 5 – very high benefit. For example, if a company considers that, currently, its e-service only helps a little in customer relationship management, then the company would perhaps score 1 or 2 on the present benefit assessment for benefit factor B1 (building customer relationships). Tables 1 and 2 summarize the data collected and used in the study.

The survey result was first used to identify why companies adopt e-service systems, what the main cost factors of current e-service systems are, and what kinds of benefits have been obtained. It was also used to create an initial cost-benefit factor-relation model, which identifies the relationships among cost factors and benefit factors, as described below.

2.3. Initial cost-benefit factor-relation model

By completing a set of ANOVA tests on the data collected from the survey, a set of relationships between cost and benefit factors has been obtained (Lu & Zhang, 2003). These relationships reflect that certain cost factors have a significant effect on certain benefit factors. These effects are presented in a cost-benefit factor-relation model (Fig. 2). The lines in the model express the ‘effect’ relationships between related cost factors and benefit factors. Although

Table 1 Summary of the data on benefit factors collected (number of companies’ responses for each benefit factor on the 5-point scale)

| Benefit factor | 1 – not important | 2 | 3 | 4 | 5 – very important | NA |
|---|---|---|---|---|---|---|
| B1 | 2 | 7 | 16 | 14 | 9 | 0 |
| B2 | 1 | 6 | 21 | 12 | 8 | 0 |
| B3 | 6 | 13 | 18 | 7 | 3 | 1 |
| B4 | 4 | 13 | 15 | 9 | 7 | 0 |
| B5 | 9 | 10 | 8 | 13 | 8 | 0 |
| B6 | 5 | 8 | 17 | 12 | 5 | 1 |
| B7 | 6 | 7 | 13 | 15 | 6 | 1 |
| B8 | 7 | 7 | 18 | 8 | 7 | 1 |
| B9 | 7 | 10 | 12 | 10 | 9 | 0 |
| B10 | 7 | 9 | 13 | 8 | 9 | 2 |
| B11 | 10 | 7 | 10 | 14 | 5 | 2 |
| B12 | 9 | 12 | 17 | 3 | 5 | 2 |
| B13 | 7 | 8 | 12 | 13 | 6 | 2 |
| B14 | 1 | 5 | 12 | 12 | 17 | 1 |
| B15 | 3 | 7 | 18 | 12 | 7 | 1 |
| B16 | 2 | 8 | 15 | 12 | 10 | 1 |

Table 2 Summary of the data on cost factors collected (number of companies’ responses for each cost factor on the 5-point scale)

| Cost factor | 1 – not important | 2 | 3 | 4 | 5 – very important | NA |
|---|---|---|---|---|---|---|
| C1 | 7 | 5 | 10 | 17 | 9 | 0 |
| C2 | 6 | 10 | 12 | 17 | 3 | 0 |
| C3 | 15 | 10 | 15 | 7 | 1 | 0 |
| C4 | 6 | 7 | 15 | 13 | 7 | 0 |
| C5 | 7 | 9 | 11 | 12 | 7 | 2 |
| C6 | 9 | 11 | 14 | 8 | 4 | 2 |
| C7 | 9 | 9 | 17 | 9 | 2 | 2 |
| C8 | 8 | 7 | 9 | 19 | 3 | 2 |

Fig. 2. Cost-benefit factor-relation model. [Diagram: benefit factors B1–B16 linked to cost factors C1–C8 by ‘effect relationships’.]

every cost factor can contribute, directly or indirectly, to all benefit factors to a certain degree, some cost factors are more important for the improvement of particular benefit factors than others. However, these initially identified relationships could be uncertain, so it is necessary to verify them and check their consistency with the data collected. Bayesian network techniques can effectively support these tasks.

3. Bayesian network techniques

Bayesian network techniques are powerful tools for knowledge representation and reasoning under conditions of uncertainty. A Bayesian network $B = \langle N, A, \Theta \rangle$ is a directed acyclic graph (DAG) $\langle N, A \rangle$ with a conditional probability distribution (CPD) for each node, collectively represented by $\Theta$. Each node $n \in N$ represents a variable, and each arc $a \in A$ between nodes represents a probabilistic dependency (Pearl, 1988). In a practical application, the nodes of a Bayesian network represent uncertain factors, and the arcs are the causal or influential links between these factors. Associated with each node is a set of CPDs that models the uncertain relationships between the node and its parent nodes.

Using Bayesian networks to model uncertain relationships has been well discussed in theory by researchers such as Heckerman (1990) and Jensen (1996). Many applications have also shown that the Bayesian network is a powerful technique for reasoning about the relationships among a number of variables under uncertainty. For example, Heckerman, Mamdani, and Wellman (1995) applied Bayesian network techniques successfully to lymph-node pathology diagnosis, and Heckerman (1997) applied them to data mining. Breese and Blake (1995) applied Bayesian network techniques in the development of computer default diagnosis.

Compared with other inference analysis approaches, Bayesian network techniques have four main advantages in applications. Firstly, unlike the neural network approach, which usually appears to users as a ‘black box’, all the parameters in a Bayesian network have an understandable semantic interpretation (Myllymaki, 2002). This feature helps users construct a Bayesian network directly by using their domain knowledge. Secondly, Bayesian network techniques can learn the relationships among related variables. This not only lets users observe the relationships among the variables easily, but also handles some missing-data issues (Heckerman, 1997). Thirdly, Bayesian network techniques can conduct inference inversely. Many intelligent systems (such as feed-forward neural networks and fuzzy logic) are strictly one-way: when a model is given, the output can be predicted from a set of inputs, but not vice versa. Bayesian networks, in contrast, can conduct inference in both directions. Fourthly, Bayesian network techniques can combine prior information with current knowledge to conduct inference, as they have both causal and probabilistic semantics. This is an ideal representation for users to supply prior knowledge, which often comes in a causal form (Heckerman, 1997).

These features make Bayesian networks a good way to verify the initially identified uncertain relationships between cost factors and benefit factors in the development of e-services.
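The two-way inference described above can be illustrated with a minimal sketch. It assumes a hypothetical two-node network, one cost node C pointing to one benefit node B, each with two states; all probabilities are made-up illustrative values and none of them come from the survey data.

```python
# Hypothetical two-node network C -> B; every number here is illustrative only.
prior_c = {"low": 0.6, "high": 0.4}          # P(C)
cpd_b_given_c = {                            # P(B | C), one row per state of C
    "low": {"low": 0.7, "high": 0.3},
    "high": {"low": 0.2, "high": 0.8},
}


def predict_b(prior, cpd):
    """Forward inference: P(B = b) = sum over c of P(C = c) * P(B = b | C = c)."""
    states_b = next(iter(cpd.values())).keys()
    return {b: sum(prior[c] * cpd[c][b] for c in prior) for b in states_b}


def diagnose_c(prior, cpd, observed_b):
    """Inverse inference via Bayes' rule: P(C = c | B = observed_b)."""
    joint = {c: prior[c] * cpd[c][observed_b] for c in prior}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


marginal_b = predict_b(prior_c, cpd_b_given_c)
posterior_c = diagnose_c(prior_c, cpd_b_given_c, "high")
print(marginal_b, posterior_c)
```

With these assumed numbers, observing a high benefit raises the probability of a high cost from 0.4 to 0.64, which is the kind of reverse reasoning a strictly one-way model cannot provide.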

4. Bayesian networks based cost-benefit factor relationship analysis

In general, there are three main steps when applying Bayesian network techniques to analyze a set of relationships for a practical problem: (1) creating a graphical Bayesian network structure for the problem, (2) calculating related conditional probabilities to establish a Bayesian network, and (3) using the established Bayesian network to conduct inference for finding possible relationships among the factor nodes of the Bayesian network. The following sub-sections describe the three steps in detail.

4.1. Creating a graphical structure for cost-benefit factor relationships

Fig. 2 shows no connections from benefit factors B3, B4, B6, B8, and B12 to any cost factor node, which means that the effects of the cost factors on these benefit factors are not significant. A graphical Bayesian network structure of cost and benefit factor relationships can be created by removing these unlinked factor nodes, as shown in Fig. 3. The lines in the graphical Bayesian network structure express the significant ‘effect’ relationships between the factor nodes. The nodes and relationships shown in Fig. 3 are considered as a result obtained from domain knowledge. In order to test these established relationships, structural learning is needed to improve the Bayesian network by using real data collected from e-service applications. The data discussed in Section 2 is therefore used to complete the structure learning of the Bayesian network.

A suitable structural learning algorithm is first selected. Since the number of DAGs is super-exponential in the number of nodes of the Bayesian network, a local search algorithm, Greedy Hill-Climbing (Heckerman, 1996), is selected for the structural learning. The algorithm starts at a specific point in the search space, checks all nearest neighbors, and then moves to the neighbor that has the highest score (Cooper & Herskovits, 1992). If all neighbors’ scores are lower than that of the current point, a local maximum has been reached, and the algorithm stops or restarts from another point of the space. By running the Greedy Hill-Climbing algorithm for structure learning from the collected data (see Tables 1 and 2), an improved Bayesian network is obtained, as shown in Fig. 4. Compared with Fig. 3, the link between C2 and B1 is removed in Fig. 4 after the structural learning from the real data. The Bayesian network shown in Fig. 4 is therefore more consistent with the real data.
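The greedy hill-climbing search described above can be sketched as follows. This is only an illustration under stated assumptions: the three node names are borrowed from the factor framework, the neighborhood consists of adding or removing a single arc while keeping the graph acyclic, and `TOY_GAIN` is a made-up stand-in for a data-driven score, not the scoring function used in the study.

```python
from itertools import permutations

NODES = ("C2", "C5", "B13")
# Hypothetical per-arc gains standing in for a real data-driven score.
TOY_GAIN = {("C2", "B13"): 2.0, ("C5", "B13"): 1.5}


def is_acyclic(edges):
    """Repeatedly strip nodes with no incoming arc; failure means a cycle."""
    remaining = set(NODES)
    while remaining:
        roots = {n for n in remaining
                 if not any(v == n and u in remaining for u, v in edges)}
        if not roots:
            return False
        remaining -= roots
    return True


def neighbors(edges):
    """All DAGs reachable by adding or removing exactly one arc."""
    for edge in permutations(NODES, 2):
        candidate = edges - {edge} if edge in edges else edges | {edge}
        if is_acyclic(candidate):
            yield candidate


def score(edges):
    """Toy score: listed arcs earn their gain, every other arc costs 1."""
    return sum(TOY_GAIN.get(edge, -1.0) for edge in edges)


def hill_climb(start):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:          # local maximum reached
            return current, current_score
        current, current_score = best, best_score


learned_edges, learned_score = hill_climb(frozenset())
print(sorted(learned_edges), learned_score)
```

Starting from the empty graph, the search adds the two rewarded arcs and then stops because every remaining single-arc change lowers the toy score. A practical implementation would also consider arc reversals and random restarts.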
This improved Bayesian network has 19 nodes and 16 links, and will be used for the Bayesian rule based inference below.

4.2. Calculating the conditional probability distributions

Now, let $X = (X_0, \ldots, X_m)$ be a node set and $X_i$ ($i = 0, 1, \ldots, m$) be a discrete node (variable) in the Bayesian network $B$ ($m = 19$) shown in Fig. 4. The CPD of the node $X_i$ is defined as $\theta^B_{x_i \mid Pa_i} = P(X_i = x_i \mid Pa_i = pa_i)$ (Heckerman, 1996), where $Pa_i$ is the parent set of node $X_i$, $pa_i$ is a configuration (a set of values) for the parent set $Pa_i$ of $X_i$, and $x_i$ is a value that $X_i$ takes. Based on the data collected in our survey, the CPDs of all nodes shown in Fig. 4 are calculated. Before a Bayesian network can be used to conduct inference, the parameters $\theta^B_{x_i \mid Pa_i}$ must be learned from the data collected. In general, the easiest method to estimate the parameters $\theta^B_{x_i \mid Pa_i}$ is to use frequency. However, as the size of data used in the study is not very large, a frequency method may not be very effective. This study therefore selected the Bayes

Fig. 3. Initial cost-benefit factor-relation graphical Bayesian network structure. [Network diagram; node labels: C1–C8, B1, B2, B5, B7, B9, B10, B11, B13, B14, B15, B16.]

Fig. 4. Cost-benefit factor-relation Bayesian network after structure learning from data collected. [Network diagram.]

method for establishing related parameters. Based on the suggestions of Heckerman (1996), the Dirichlet distribution is chosen as the prior distribution for $\theta^B_{x_i \mid Pa_i}$ when using the Bayes method. The Dirichlet distribution is the conjugate prior of the parameters of the multinomial distribution. The probability density of the Dirichlet distribution for variable $\theta = (\theta_1, \ldots, \theta_n)$ with parameter $\alpha = (\alpha_1, \ldots, \alpha_n)$ is defined by

$$
\mathrm{Dir}(\theta \mid \alpha) =
\begin{cases}
\dfrac{\Gamma\!\left(\sum_{i=1}^{n} \alpha_i\right)}{\prod_{i=1}^{n} \Gamma(\alpha_i)} \displaystyle\prod_{i=1}^{n} \theta_i^{\alpha_i - 1}, & \theta_1, \ldots, \theta_n \ge 0,\ \sum_{i=1}^{n} \theta_i = 1, \\[2ex]
0, & \text{otherwise},
\end{cases}
$$

where $\theta_1, \ldots, \theta_n \ge 0$, $\sum_{i=1}^{n} \theta_i = 1$, and $\alpha_1, \ldots, \alpha_n > 0$. The parameter $\alpha_i$ can be interpreted as a ‘prior observation count’ for events governed by $\theta_i$.

Let $\alpha_0 = \sum_{i=1}^{n} \alpha_i$. The mean value and variance of the distribution for $\theta_i$ can be calculated by (Gelman, Carlin, Stern, & Rubin, 1995)

$$
E(\theta_i) = \frac{\alpha_i}{\alpha_0}, \quad \text{and} \quad \mathrm{Var}(\theta_i) = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)}.
$$

When $\alpha_i \to 0$, the distribution becomes non-informative. The means of all $\theta_i$ ($i = 1, \ldots, n$) stay the same if all $\alpha_i$ are scaled by the same constant. If we do not know the difference among the $\theta_i$, let $\alpha_1 = \cdots = \alpha_n$. The variances of the distributions become smaller as the parameters $\alpha_i$ increase. As a result, if prior information cannot be obtained, $\alpha_i$ should be assigned a small value.

After the prior distributions are determined, the Bayes method also requires calculating the posterior distributions of $\theta^B_{x_i \mid Pa_i}$ and then completing the Bayes estimations of $\theta_i$. To conduct this calculation, this study assumes that the state of each node can be one of five values: 1 (very low), 2 (low), 3 (medium), 4 (high), and 5 (very high). Through running the approach, the CPDs of all cost and benefit factor nodes shown in Fig. 4 are obtained. The paper presents only six typical results of the relationships among some main cost factors and benefit factors. Table 3 shows the CPDs for node B5 (global presence) under cost factors C1 (expense of setting up e-service) and C4 (hardware/software cost). Table 4 shows the CPDs for node B7 (reducing advertising media costs) under cost factor C2 (expense of maintaining e-service). Table 5 shows the CPDs for node B11 (gathering information to create customer profiles) under cost factor C3 (internet connection cost). Table 6 shows the CPDs for node B13 (cooperation between companies to increase services) under cost factors C2 (expense of maintaining e-service) and C5 (security concerns cost). Table 7 shows the CPDs for node B15 (realizing business strategies) under cost factor C7 (training cost). Table 8 shows the CPDs for node B16 (gaining and sustaining competitive advantages) under cost factor C2 (expense of maintaining e-service).
Taking Table 5 as an example, the benefit factor ‘Gathering information to create customer profiles’ is affected by the cost factor ‘Internet connection cost’. As a result, more investment in ‘Internet connection’ may benefit businesses in ‘Creating customer profiles’.
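The Bayes estimation with a Dirichlet prior described above can be sketched in a few lines. The code is a minimal illustration, not the study's implementation: the observation counts and the symmetric prior value 0.04 are assumptions chosen for the example, and the function names are hypothetical.

```python
def dirichlet_mean_var(alpha):
    """Prior mean and variance of each theta_i under Dir(alpha)."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    var = [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alpha]
    return mean, var


def dirichlet_posterior_mean(counts, alpha):
    """Bayes estimate of theta_i: (n_i + alpha_i) / (N + alpha_0).

    Conjugacy gives the posterior Dir(alpha + counts); its mean is returned.
    """
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]


# Assumed counts of a child node's five states for one parent configuration.
counts = [0, 2, 0, 3, 1]
# Assumed small symmetric prior, reflecting the absence of prior information.
alpha = [0.04] * 5

prior_mean, prior_var = dirichlet_mean_var(alpha)
estimate = dirichlet_posterior_mean(counts, alpha)
print([round(p, 4) for p in estimate])
```

Unlike a plain frequency estimate, states that were never observed receive a small non-zero probability, which matters when only a handful of responses fall under a given parent configuration. Rounded to four decimals, these assumed inputs give 0.0065, 0.3290, 0.0065, 0.4903 and 0.1677, which resemble the C2 = 1 row of Table 4, although neither the counts nor the prior value are stated in the source.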

Table 3 The conditional probabilities for node B5, Pr(B5 | C1, C4)

| C1 | C4 | B5 = 1 | B5 = 2 | B5 = 3 | B5 = 4 | B5 = 5 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.2495 | 0.0020 | 0.2495 | 0.0020 | 0.4970 |
| 2 | 1 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 3 | 1 | 0.0039 | 0.4941 | 0.0039 | 0.4941 | 0.0039 |
| 4 | 1 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 5 | 1 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 1 | 2 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 2 | 2 | 0.2000 | 0.0016 | 0.3984 | 0.2000 | 0.2000 |
| 3 | 2 | 0.0077 | 0.0077 | 0.0077 | 0.9692 | 0.0077 |
| 4 | 2 | 0.0077 | 0.9692 | 0.0077 | 0.0077 | 0.0077 |
| 5 | 2 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 1 | 3 | 0.0077 | 0.0077 | 0.0077 | 0.0077 | 0.9692 |
| 2 | 3 | 0.0077 | 0.0077 | 0.9692 | 0.0077 | 0.0077 |
| 3 | 3 | 0.2495 | 0.4970 | 0.0020 | 0.0020 | 0.2495 |
| 4 | 3 | 0.1669 | 0.1669 | 0.3325 | 0.1669 | 0.1669 |
| 5 | 3 | 0.3316 | 0.0026 | 0.3316 | 0.3316 | 0.0026 |
| 1 | 4 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 2 | 4 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 3 | 4 | 0.0026 | 0.3316 | 0.0026 | 0.3316 | 0.3316 |
| 4 | 4 | 0.2852 | 0.1432 | 0.2852 | 0.1432 | 0.1432 |
| 5 | 4 | 0.0026 | 0.6605 | 0.0026 | 0.3316 | 0.0026 |
| 1 | 5 | 0.0039 | 0.0039 | 0.4941 | 0.4941 | 0.0039 |
| 2 | 5 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 3 | 5 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 4 | 5 | 0.6605 | 0.0026 | 0.0026 | 0.3316 | 0.0026 |
| 5 | 5 | 0.0039 | 0.4941 | 0.0039 | 0.4941 | 0.0039 |

Table 4 The conditional probabilities for node B7, Pr(B7 | C2)

| C2 | B7 = 1 | B7 = 2 | B7 = 3 | B7 = 4 | B7 = 5 |
|---|---|---|---|---|---|
| 1 | 0.0065 | 0.3290 | 0.0065 | 0.4903 | 0.1677 |
| 2 | 0.1020 | 0.0039 | 0.2980 | 0.4941 | 0.1020 |
| 3 | 0.1672 | 0.2492 | 0.4131 | 0.0033 | 0.1672 |
| 4 | 0.1767 | 0.0605 | 0.4093 | 0.2930 | 0.0605 |
| 5 | 0.0125 | 0.3250 | 0.3250 | 0.0125 | 0.3250 |

Table 5 The conditional probabilities for node B11, Pr(B11 | C3)

| C3 | B11 = 1 | B11 = 2 | B11 = 3 | B11 = 4 | B11 = 5 |
|---|---|---|---|---|---|
| 1 | 0.0684 | 0.1342 | 0.4632 | 0.2658 | 0.0684 |
| 2 | 0.2980 | 0.0039 | 0.3961 | 0.2000 | 0.1020 |
| 3 | 0.3316 | 0.2658 | 0.0684 | 0.2658 | 0.0684 |
| 4 | 0.1444 | 0.1444 | 0.1444 | 0.4222 | 0.1444 |
| 5 | 0.0333 | 0.0333 | 0.8667 | 0.0333 | 0.0333 |

Table 6 The conditional probabilities for node B13, Pr(B13 | C2, C5)

| C2 | C5 | B13 = 1 | B13 = 2 | B13 = 3 | B13 = 4 | B13 = 5 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.0026 | 0.3316 | 0.3316 | 0.3316 | 0.0026 |
| 2 | 1 | 0.0026 | 0.0026 | 0.3316 | 0.6605 | 0.0026 |
| 3 | 1 | 0.0077 | 0.0077 | 0.9692 | 0.0077 | 0.0077 |
| 4 | 1 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 5 | 1 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 1 | 2 | 0.0077 | 0.0077 | 0.9692 | 0.0077 | 0.0077 |
| 2 | 2 | 0.4970 | 0.0020 | 0.0020 | 0.4970 | 0.0020 |
| 3 | 2 | 0.0039 | 0.0039 | 0.4941 | 0.0039 | 0.4941 |
| 4 | 2 | 0.0039 | 0.0039 | 0.4941 | 0.4941 | 0.0039 |
| 5 | 2 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 1 | 3 | 0.0077 | 0.9692 | 0.0077 | 0.0077 | 0.0077 |
| 2 | 3 | 0.0077 | 0.0077 | 0.9692 | 0.0077 | 0.0077 |
| 3 | 3 | 0.2495 | 0.2495 | 0.2495 | 0.0020 | 0.2495 |
| 4 | 3 | 0.1669 | 0.1669 | 0.3325 | 0.1669 | 0.1669 |
| 5 | 3 | 0.9692 | 0.0077 | 0.0077 | 0.0077 | 0.0077 |
| 1 | 4 | 0.0077 | 0.0077 | 0.0077 | 0.9692 | 0.0077 |
| 2 | 4 | 0.0039 | 0.4941 | 0.0039 | 0.4941 | 0.0039 |
| 3 | 4 | 0.3316 | 0.3316 | 0.3316 | 0.0026 | 0.0026 |
| 4 | 4 | 0.0020 | 0.2495 | 0.2495 | 0.4970 | 0.0020 |
| 5 | 4 | 0.0039 | 0.0039 | 0.4941 | 0.0039 | 0.4941 |
| 1 | 5 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 2 | 5 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |
| 3 | 5 | 0.0039 | 0.0039 | 0.9843 | 0.0039 | 0.0039 |
| 4 | 5 | 0.2000 | 0.2000 | 0.0016 | 0.3984 | 0.2000 |
| 5 | 5 | 0.2000 | 0.2000 | 0.2000 | 0.2000 | 0.2000 |

Table 7 The conditional probabilities for node B15, Pr(B15 | C7)

| C7 | B15 = 1 | B15 = 2 | B15 = 3 | B15 = 4 | B15 = 5 |
|---|---|---|---|---|---|
| 1 | 0.0043 | 0.1130 | 0.3304 | 0.4391 | 0.1130 |
| 2 | 0.1130 | 0.0043 | 0.5478 | 0.1130 | 0.2217 |
| 3 | 0.0542 | 0.2625 | 0.3667 | 0.2104 | 0.1063 |
| 4 | 0.1130 | 0.1130 | 0.4391 | 0.3304 | 0.0043 |
| 5 | 0.0182 | 0.0182 | 0.0182 | 0.4727 | 0.4727 |

Table 8 The conditional probabilities for node B16, Pr(B16 | C2)

| C2 | B16 = 1 | B16 = 2 | B16 = 3 | B16 = 4 | B16 = 5 |
|---|---|---|---|---|---|
| 1 | 0.0065 | 0.1677 | 0.3290 | 0.3290 | 0.1677 |
| 2 | 0.2000 | 0.0039 | 0.2000 | 0.2000 | 0.3961 |
| 3 | 0.0033 | 0.3311 | 0.3311 | 0.1672 | 0.1672 |
| 4 | 0.0023 | 0.1186 | 0.3512 | 0.2930 | 0.2349 |
| 5 | 0.0125 | 0.0125 | 0.6375 | 0.3250 | 0.0125 |

However, the results listed in Tables 3–8 show that the relationships among these cost and benefit factor nodes are hardly linear. Traditional linear regression methods are therefore not very effective or suitable for expressing these relationships, and conditional probabilities could be more suitable in this situation.

4.3. Inference

We have now created a cost-benefit factor-relation Bayesian network with both its structure and all conditional probabilities defined for its nodes. We can use it to conduct inference on the relationships among the identified cost factors and benefit factors. The inference process is handled by fixing the states of observed variables and then propagating the beliefs around the network until all the beliefs (in the form of conditional probabilities) are consistent. Finally, the desired probability distributions can be shown in the network.

There are a number of algorithms for conducting inference in Bayesian networks, which make different tradeoffs between speed, complexity, generality, and accuracy. The Junction-tree algorithm, proposed by Lauritzen and Spiegelhalter (1988), is one of the most popular. This algorithm uses an auxiliary data structure, called a junction tree, and builds on a deep analysis of the connections between graph theory and probability theory. As it is suitable for small and medium-sized samples, this study applied the Junction-tree algorithm for conducting inference.

The Junction-tree algorithm computes the joint distribution for each maximal clique in a decomposable graph. It contains three main steps: construction, initialization, and message passing (propagation). The construction step converts a Bayesian network into a junction tree. The junction tree is then initialized so as to provide a localized representation of the overall distribution. After the initialization, the junction tree can receive evidence, which consists of asserting some variables to specific states.

Based on the evidence obtained for the factor nodes, we can conduct inference by using the established Bayesian network to analyse in depth and find valuable relationships between cost factors Ci (i = 1, 2, . . . , 8) and benefit factors Bj (j = 1, 2, 5, 7, 9, 10, 11, 13, 14, 15, 16) in this study. Table 9 shows the marginal probabilities of all nodes in the Bayesian network.
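For a single parent-child pair, the effect of fixing evidence can be checked by direct enumeration, without building a junction tree. The sketch below is not the Junction-tree algorithm itself; it assumes that C3 is the only parent of B11, as the layout of Table 5 suggests, and copies the numbers from Table 5 and Table 9. The function names are hypothetical.

```python
# Marginal distribution of C3 over states 1..5, copied from Table 9.
p_c3 = [0.3102, 0.2082, 0.3102, 0.1469, 0.0245]
# Pr(B11 = 4 | C3 = c) for c = 1..5, copied from Table 5.
p_b11_high_given_c3 = [0.2658, 0.2000, 0.2658, 0.4222, 0.0333]


def marginal(p_parent, p_child_given_parent):
    """No evidence: sum over parent states of P(parent) * P(child | parent)."""
    return sum(p * q for p, q in zip(p_parent, p_child_given_parent))


def with_evidence(p_child_given_parent, observed_state):
    """Hard evidence on the parent: read off the matching CPD entry (1-based)."""
    return p_child_given_parent[observed_state - 1]


prior_b11_high = marginal(p_c3, p_b11_high_given_c3)
posterior_b11_high = with_evidence(p_b11_high_given_c3, 4)
print(round(prior_b11_high, 4), posterior_b11_high)
```

The prior evaluates to about 0.2694, which agrees with the Table 9 entry for B11 = 4, and clamping C3 to state 4 gives 0.4222. In a network with shared parents and longer paths this shortcut no longer applies, which is where message passing on the junction tree is needed.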

Table 10 Probabilities of the nodes when C2 (maintaining e-service) = 4 (high) 1

2

3

4

5

5. Result analysis

State Pr( ) Node C1 C2 C3 C4 C5 C6 C7 C8 B1 B2 B5 B7 B9 B10 B11 B13 B14 B15 B16

0.1469

0.1265

0.2082

0.1673

0.3102 0.1265 0.1469 0.1878 0.1878 0.1673 0.1413 0.0023 0.1729 0.1767 0.1469 0.1469 0.2082 0.1050 0.0030 0.0653 0.0023

0.2082 0.1469 0.1878 0.2490 0.1878 0.1469 0.1684 0.1186 0.2388 0.0605 0.2082 0.1878 0.1469 0.1666 0.0517 0.1469 0.1186

0.3102 0.3102 0.2694 0.3102 0.3918 0.2286 0.2603 0.5256 0.1930 0.4093 0.2898 0.3510 0.2898 0.2741 0.2290 0.3918 0.3512

0.3510 1 0.1469 0.2694 0.2490 0.1673 0.1878 0.3918 0.2273 0.1767 0.2201 0.2930 0.1878 0.1469 0.2694 0.3494 0.3667 0.2694 0.2930

Of all the inference results obtained by running the Junction-tree algorithm, five significant results (Results 1–5) are discussed in this paper. These results are obtained under the evidence that a cost factor node takes a ‘high’ value. Similar results have been obtained for the other situations, such as under the evidence that the node value is ‘low’.

Result 1. Assuming the cost factor C2 (maintaining e-service) = 4 (high), we obtain the probabilities of the other factor nodes under this evidence. The result is shown in Table 10 and drawn in Fig. 5 for further explanation. When the value of C2 (maintaining e-service) is ‘high’ (= 4), the probability of a high B13 (cooperation between companies to increase services) increases from 0.2427 to 0.3494. This increase indicates that C2 and B13 are correlated to some extent; that is, a company’s high investment in maintaining its e-service system tends to clearly increase its cooperation with other companies in related services. The probability of a high B14 (enhancing perceived company image) also increases from 0.2670 to 0.3667, and that of a high B16 (gaining and sustaining competitive advantages) increases from 0.2490 to 0.2930. These results mean that a business’s high investment in e-service maintenance will also bring a significant enhancement of its company image (B14) and competitive advantages (B16).

Result 2. Assuming C3 (internet connection cost) = 4 (high), we obtain the probabilities of the other nodes under this evidence (Table 11). Fig. 6 also shows the result. When the value of internet connection cost is ‘high’, the probability of a high B11 (gathering information to create customer profiles) increases from 0.2694 to 0.4222. This suggests that internet connection cost has an obvious effect on the creation of customer profiles; that is, high investment in internet connection can bring high benefit to businesses in the establishment of their customer profiles.

Result 3. When C4 (hardware/software cost) = 4 (high), we obtain the probabilities of the other nodes under this evidence (Table 12).

Table 9
Marginal probabilities of all nodes in the cost-benefit Bayesian network, Pr(node = state)

Node   State 1   State 2   State 3   State 4   State 5
C1     0.1469    0.1265    0.2082    0.3510    0.1673
C2     0.1265    0.2082    0.2490    0.3510    0.0653
C3     0.3102    0.2082    0.3102    0.1469    0.0245
C4     0.1265    0.1469    0.3102    0.2694    0.1469
C5     0.1469    0.1878    0.2694    0.2490    0.1469
C6     0.1878    0.2490    0.3102    0.1673    0.0857
C7     0.1878    0.1878    0.3918    0.1878    0.0449
C8     0.1673    0.1469    0.2286    0.3918    0.0653
B1     0.1413    0.1684    0.2603    0.2273    0.2027
B2     0.0449    0.1061    0.4327    0.2490    0.1673
B5     0.1729    0.2388    0.1930    0.2201    0.1753
B7     0.1265    0.1469    0.3306    0.2694    0.1265
B9     0.1469    0.2082    0.2898    0.1878    0.1673
B10    0.1469    0.1878    0.3510    0.1469    0.1673
B11    0.2082    0.1469    0.2898    0.2694    0.0857
B13    0.1289    0.1785    0.3468    0.2427    0.1031
B14    0.0412    0.0948    0.2403    0.2670    0.3566
B15    0.0653    0.1469    0.3918    0.2694    0.1265
B16    0.0449    0.1469    0.3306    0.2490    0.2286
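As a quick consistency check on Table 9, each node's five state probabilities should sum to one, up to the rounding of the published four-decimal values. The following is a minimal sketch of that check for three of the nodes; the function name `is_normalized` and the tolerance are illustrative assumptions, not part of the original study.

```python
# Marginal distributions copied from Table 9 for three nodes (states 1..5).
marginals = {
    "C2": [0.1265, 0.2082, 0.2490, 0.3510, 0.0653],
    "B13": [0.1289, 0.1785, 0.3468, 0.2427, 0.1031],
    "B14": [0.0412, 0.0948, 0.2403, 0.2670, 0.3566],
}


def is_normalized(probs, tol=1e-3):
    """Return True if the probabilities sum to 1 within a rounding tolerance.

    The tolerance is an assumption chosen to absorb four-decimal rounding.
    """
    return abs(sum(probs) - 1.0) <= tol


for node, probs in marginals.items():
    # Print the node name, the sum of its state probabilities, and the check.
    print(node, round(sum(probs), 4), is_normalized(probs))
```

The same check can be applied to any row of the conditional tables, where the evidence node is expected to put all of its probability on the observed state.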

0.0245 0.1469 0.1469 0.0857 0.0449 0.0653 0.2027 0.1767 0.1753 0.0605 0.1673 0.1673 0.0857 0.1050 0.3496 0.1265 0.2349

Fig. 5. Prior and posterior probability when C2 (maintaining e-service) = 4 (high). [Bar chart: prior probability and posterior probability, on a vertical axis from 0 to 1, for nodes C1 to B16.]

Table 11
Probabilities of the nodes when C3 (internet connection cost) = 4 (high), Pr(node = state | C3 = 4)

Node   State 1   State 2   State 3   State 4   State 5
C1     0.1469    0.1265    0.2082    0.3510    0.1673
C2     0.1265    0.2082    0.2490    0.3510    0.0653
C3     0         0         0         1         0
C4     0.1265    0.1469    0.3102    0.2694    0.1469
C5     0.1469    0.1878    0.2694    0.2490    0.1469
C6     0.1878    0.2490    0.3102    0.1673    0.0857
C7     0.1878    0.1878    0.3918    0.1878    0.0449
C8     0.1673    0.1469    0.2286    0.3918    0.0653
B1     0.1413    0.1684    0.2603    0.2273    0.2027
B2     0.0449    0.1061    0.4327    0.2490    0.1673
B5     0.1729    0.2388    0.1930    0.2201    0.1753
B7     0.1265    0.1469    0.3306    0.2694    0.1265
B9     0.1469    0.2082    0.2898    0.1878    0.1673
B10    0.1469    0.1878    0.3510    0.1469    0.1673
B11    0.1444    0.1444    0.1444    0.4222    0.1444
B13    0.1289    0.1785    0.3468    0.2427    0.1031
B14    0.0412    0.0948    0.2403    0.2670    0.3566
B15    0.0653    0.1469    0.3918    0.2694    0.1265
B16    0.0449    0.1469    0.3306    0.2490    0.2286
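The prior and conditional values for C3 and B11 can also be read in the opposite direction with Bayes' rule. The sketch below is an illustrative derivation only, not a result reported in the study: it combines Pr(C3 = 4) and Pr(B11 = 4) from Table 9 with Pr(B11 = 4 | C3 = 4) from Table 11 to estimate Pr(C3 = 4 | B11 = 4). The helper name `bayes_invert` is hypothetical.

```python
def bayes_invert(p_b_given_c, p_c, p_b):
    """Bayes' rule: Pr(C | B) = Pr(B | C) * Pr(C) / Pr(B)."""
    return p_b_given_c * p_c / p_b


p_c3_high = 0.1469            # Pr(C3 = 4), from Table 9
p_b11_high = 0.2694           # Pr(B11 = 4), from Table 9
p_b11_high_given_c3 = 0.4222  # Pr(B11 = 4 | C3 = 4), from Table 11

# Belief in a high internet connection cost after observing a high B11.
p_c3_high_given_b11 = bayes_invert(p_b11_high_given_c3, p_c3_high, p_b11_high)
print(round(p_c3_high_given_b11, 4))
```

Under these published values, observing a high B11 raises the probability of a high C3 above its prior of 0.1469, which is the mirror image of the increase described in Result 2.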

Fig. 7 further shows the effect of observing a ‘high’ hardware/software cost. The probability of a high B5 (global presence) increases from 0.2201 to 0.2295, which suggests

Fig. 6. Prior probability and posterior probability when C3 (internet connection cost) = 4 (high). [Bar chart: prior probability and posterior probability, on a vertical axis from 0 to 1, for nodes C1 to B16.]

Fig. 7. Prior probability and posterior probability when C4 = 4 (high). [Bar chart: prior probability and posterior probability, on a vertical axis from 0 to 1, for nodes C1 to B16.]

Table 12
Probabilities of the nodes when C4 = 4 (high), Pr(node = state | C4 = 4)

Node   State 1   State 2   State 3   State 4   State 5
C1     0.1469    0.1265    0.2082    0.3510    0.1673
C2     0.1265    0.2082    0.2490    0.3510    0.0653
C3     0.3102    0.2082    0.3102    0.1469    0.0245
C4     0         0         0         1         0
C5     0.1469    0.1878    0.2694    0.2490    0.1469
C6     0.1878    0.2490    0.3102    0.1673    0.0857
C7     0.1878    0.1878    0.3918    0.1878    0.0449
C8     0.1673    0.1469    0.2286    0.3918    0.0653
B1     0.1413    0.1684    0.2603    0.2273    0.2027
B2     0.0449    0.1061    0.4327    0.2490    0.1673
B5     0.1558    0.2845    0.1558    0.2295    0.1744
B7     0.1265    0.1469    0.3306    0.2694    0.1265
B9     0.3061    0.0030    0.3818    0.1545    0.1545
B10    0.1469    0.1878    0.3510    0.1469    0.1673
B11    0.2082    0.1469    0.2898    0.2694    0.0857
B13    0.1289    0.1785    0.3468    0.2427    0.1031
B14    0.0412    0.0948    0.2403    0.2670    0.3566
B15    0.0653    0.1469    0.3918    0.2694    0.1265
B16    0.0449    0.1469    0.3306    0.2490    0.2286

Table 13
Probabilities of the nodes when C5 (security concerns cost) = 4 (high), Pr(node = state | C5 = 4)

Node   State 1   State 2   State 3   State 4   State 5
C1     0.1469    0.1265    0.2082    0.3510    0.1673
C2     0.1265    0.2082    0.2490    0.3510    0.0653
C3     0.3102    0.2082    0.3102    0.1469    0.0245
C4     0.1265    0.1469    0.3102    0.2694    0.1469
C5     0         0         0         1         0
C6     0.1878    0.2490    0.3102    0.1673    0.0857
C7     0.1878    0.1878    0.3918    0.1878    0.0449
C8     0.1673    0.1469    0.2286    0.3918    0.0653
B1     0.1238    0.1437    0.2557    0.2708    0.2061
B2     0.0449    0.1061    0.4327    0.2490    0.1673
B5     0.1729    0.2388    0.1930    0.2201    0.1753
B7     0.1265    0.1469    0.3306    0.2694    0.1265
B9     0.1469    0.2082    0.2898    0.1878    0.1673
B10    0.1469    0.1878    0.3510    0.1469    0.1673
B11    0.2082    0.1469    0.2898    0.2694    0.0857
B13    0.0853    0.2742    0.2042    0.4009    0.0354
B14    0.0412    0.0948    0.2403    0.2670    0.3566
B15    0.0653    0.1469    0.3918    0.2694    0.1265
B16    0.0449    0.1469    0.3306    0.2490    0.2286

that high investments of a business in e-service hardware and software can significantly improve the business’s global presence.

Result 4. When C5 (security concerns cost) = 4 (high), we obtain the probabilities of the other nodes under this evidence (Table 13). Fig. 8 further shows that when the value of security cost is ‘high’, the probability of a high B1 (building customer relationships) increases from 0.2273 to 0.2708, and that of a high B13 (cooperation between companies to increase services) increases from 0.2427 to 0.4009. This result indicates that security in e-service systems has an obvious effect on building customer relationships and improving cooperation between companies. That is, high security-related investments in e-services can bring high benefits to businesses in both building their customer relationships and cooperating with other companies.

Fig. 8. Prior probability and posterior probability when C5 (security concerns cost) = 4 (high). [Bar chart: prior probability and posterior probability, on a vertical axis from 0 to 1, for nodes C1 to B16.]

Result 5. Table 14 shows the probabilities of all the other nodes under the evidence C7 (training cost) = 4 (high). Fig. 9 shows the effect of observing a ‘high’ training cost. The probability of a high B15 (realizing business strategies) increases from 0.2694 to 0.3304, suggesting that high investments in training staff to work better with e-service systems will significantly contribute to the realization of current business strategies.

Fig. 9. Prior probability and posterior probability when C7 (training cost) = 4 (high). [Bar chart: prior probability and posterior probability, on a vertical axis from 0 to 1, for nodes C1 to B16.]

These inference results were obtained by running the Junction-tree algorithm with the evidence collected through our survey. They identify and explore some valuable relationships between cost factors and benefit factors in the application of e-service systems. More results will be discussed in the following section.
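The prior-to-posterior changes quoted in Results 1–5 can be placed side by side to compare their sizes. The sketch below only restates the probabilities given in the text and ranks them by absolute increase; the tuple layout and the names `reported` and `ranked` are illustrative assumptions, and the ranking is a reading aid rather than a result of the study.

```python
# (evidence node = 4, benefit node, prior Pr(node = 4), posterior Pr(node = 4))
# All values are the ones quoted in Results 1-5.
reported = [
    ("C2", "B13", 0.2427, 0.3494),
    ("C2", "B14", 0.2670, 0.3667),
    ("C2", "B16", 0.2490, 0.2930),
    ("C3", "B11", 0.2694, 0.4222),
    ("C4", "B5", 0.2201, 0.2295),
    ("C5", "B1", 0.2273, 0.2708),
    ("C5", "B13", 0.2427, 0.4009),
    ("C7", "B15", 0.2694, 0.3304),
]

# Sort by absolute increase in the probability of a 'high' benefit value.
ranked = sorted(reported, key=lambda row: row[3] - row[2], reverse=True)

for cost, benefit, prior, posterior in ranked:
    increase = posterior - prior   # absolute change in probability
    ratio = posterior / prior      # relative change (posterior over prior)
    print(f"{cost} -> {benefit}: +{increase:.4f} (x{ratio:.2f})")
```

Listing both the absolute increase and the ratio makes it easier to see which cost-benefit pairs move most under the evidence and which move only slightly.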


Table 14
Probabilities of the nodes when C7 (training cost) = 4 (high), Pr(node = state | C7 = 4)

Node   State 1   State 2   State 3   State 4   State 5
C1     0.1469    0.1265    0.2082    0.3510    0.1673
C2     0.1265    0.2082    0.2490    0.3510    0.0653
C3     0.3102    0.2082    0.3102    0.1469    0.0245
C4     0.1265    0.1469    0.3102    0.2694    0.1469
C5     0.1469    0.1878    0.2694    0.2490    0.1469
C6     0.1878    0.2490    0.3102    0.1673    0.0857
C7     0         0         0         1         0
C8     0.1673    0.1469    0.2286    0.3918    0.0653
B1     0.1547    0.1408    0.3343    0.2230    0.1471
B2     0.0449    0.1061    0.4327    0.2490    0.1673
B5     0.1729    0.2388    0.1930    0.2201    0.1753
B7     0.1265    0.1469    0.3306    0.2694    0.1265
B9     0.1469    0.2082    0.2898    0.1878    0.1673
B10    0.2217    0.0043    0.5478    0.1130    0.1130
B11    0.2082    0.1469    0.2898    0.2694    0.0857
B13    0.1289    0.1785    0.3468    0.2427    0.1031
B14    0.0412    0.0948    0.2403    0.2670    0.3566
B15    0.1130    0.1130    0.4391    0.3304    0.0043
B16    0.0449    0.1469    0.3306    0.2490    0.2286

6. Conclusions and further study

By applying Bayesian network techniques, this study explored and verified a set of relationships between cost factors and benefit factors in the application of e-service systems. A cost-benefit factor-relation model proposed in our previous study was used as domain knowledge, and the data collected through a survey was used as evidence to conduct the inference-based verification. By calculating CPDs among these cost and benefit factors, we found that certain cost factors are more important than others in achieving certain aspects of benefits:

(1) Compared with other cost items, increased investments in maintaining e-service systems would significantly contribute to three benefit aspects of businesses: cooperation with other companies in related services, company image, and competitive advantages.
(2) Compared with other cost items, increased investments in Internet connection would significantly help businesses in the establishment of customer profiles.
(3) Compared with other cost items, increased investments in e-service hardware and software would significantly improve a business’s global presence.
(4) Compared with other cost items, high security-related investments in e-service systems would bring high benefits to businesses in both building customer relationships and cooperating with other companies.
(5) Compared with other cost items, increased investments in training staff to work better with e-service systems would significantly contribute to the realization of current business strategies.

More results include:

(6) Compared with other cost items, more attention to legal issues within e-service applications would significantly help businesses in the establishment of their customer relationships.
(7) Compared with other cost items, increased investments in keeping up with rapid technology change in e-service systems would significantly enhance a business’s image.

Based on these findings, if a company plans to improve its perceived company image through an e-service application, it would be appropriate for the company to invest considerably in maintaining e-service systems and updating related information technologies. If the aim is to build better customer relationships through the development of e-services, an appropriate way is to increase investments in system security, legal issues and staff training rather than in other items. These findings therefore provide practical recommendations to e-service providers when they develop business strategies to reduce current e-service costs, increase e-service benefits, and enhance e-service functionality. The findings can also directly help e-service system developers to make their proposals when designing new applications.

There are two limitations within this study. One is that the data used is limited to e-service businesses in New South Wales, Australia, and was collected from a small number of businesses. The other is that this study has only identified and analyzed the relationships between cost and benefit factors. As a further study, we will address the relationships between cost, benefit, and customer satisfaction for e-services by using Bayesian network techniques. The relationships, both positive and negative, between the functionality provided in e-service systems and customer satisfaction will also be explored to help businesses find more effective ways to provide personalised e-service functions.

Acknowledgement

This research is partially supported by the Australian Research Council (ARC) under Discovery Grants DP0557154 and DP0559213.

References

Amir, Y., Awerbuch, B., & Borgstrom, R. S. (2000). A cost-benefit framework for online management of a metacomputing system. Decision Support Systems, 28(1–2), 155–164.
Breese, J., & Blake, R. (1995). Automating computer bottleneck detection with belief nets. In Proceedings of the conference on uncertainty in artificial intelligence (pp. 36–45), Morgan Kaufmann, San Francisco, CA.
Chidambaram, L. (2001). The editor’s column: Why e-service journal. e-Service Journal, 1(1), 1–3.
Cooper, G., & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4), 309–347.
DeLone, H. W., & McLean, R. E. (2004). Measuring e-commerce success: Applying information systems success model. International Journal of Electronic Commerce, 9(1), 31.
Drinjak, J., Altmann, G., & Joyce, P. (2001). Justifying investments in electronic commerce. In Proceedings of the twelfth Australia conference on information systems (Vols. 4–7, pp. 187–198), December 2001, Coffs Harbour, Australia.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (1995). Bayesian data analysis. Boca Raton: Chapman and Hall.
Giaglis, G. M., Paul, R. J., & Doukidis, G. I. (1999). Dynamic modelling to assess the business value of electronic commerce. International Journal of Electronic Commerce, 3(3), 35–51.
Hahn, J., & Kauffman, R. J. (2002). Evaluating selling web site performance from a business value perspective. In Proceedings of international conference on e-Business, May 23–26, 2002 (pp. 435–443), Beijing, China.
Heckerman, D. (1990). An empirical comparison of three inference methods. In R. Shachter, T. Levitt, L. Kanal, & J. Lemmer (Eds.), Uncertainty in artificial intelligence (pp. 283–302). New York: North-Holland.
Heckerman, D. (1996). A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.
Heckerman, D. (1997). Bayesian networks for data mining. Data Mining and Knowledge Discovery, 1(1), 79–119.
Heckerman, D., Mamdani, A., & Wellman, M. (1995). Real-world applications of Bayesian networks. Communications of the ACM, 38(3), 25–26.
Jensen, F. V. (1996). An introduction to Bayesian networks. UCL Press.
Lauritzen, S., & Spiegelhalter, D. (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society Series B, 50(2), 157–224.
Lee, C., Seddon, P., & Corbitt, B. (1999). Evaluating business value of internet-based business-to-business electronic commerce. In Proceedings of 10th Australia conference on information systems (pp. 508–519).
Lin, C. (2003). A critical appraisal of customer satisfaction and e-commerce. Managerial Auditing Journal, 18(3), 202–212.
Lu, J., Tang, S., & McCullough, G. (2001). An assessment for internet-based electronic commerce development in businesses of New Zealand. Electronic Markets: International Journal of Electronic Commerce and Business Media, 11(2), 107–115.
Lu, J., & Zhang, G. Q. (2003). Cost benefit factor analysis in e-services. International Journal of Service Industry Management (IJSIM), 14(5), 570–595.
Myllymaki, P. (2002). Advantages of Bayesian networks in data mining and knowledge discovery. http://www.bayesit.com/docs/advantages.html.
Ng, H., Pan, Y. J., & Wilson, T. D. (1998). Business use of the world wide web: A report on further investigations. International Journal of Management, 18(5), 291–314.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, Palo Alto, CA.
Piris, L., Fitzgerald, G., & Serrano, A. (2004). Strategic motivators and expected benefits from e-commerce in traditional organizations. International Journal of Information Management, 24, 489–506.
Smith, K. (2001). Applying evaluation criteria to New Zealand government websites. International Journal of Information Management, 21, 137–149.
Srinivasan, S., Anderson, R., & Ponnavolu, K. (2002). Customer loyalty in e-commerce: An exploration of its antecedents and consequences. Journal of Retailing, 78, 41–50.
Wade, R. M., & Nevo, S. (2005). Development and validation of a perceptual instrument to measure e-commerce performance. International Journal of Electronic Commerce, 10(2), 123.
Zhang, P., & von Dran, G. (2000). Satisfiers and dissatisfiers: A two-factor model for website design and evaluation. Journal of American Association for Information Science (JASIS), 51(14), 1253–1268.