Available online at www.sciencedirect.com
Expert Systems with Applications 34 (2008) 1618–1629 www.elsevier.com/locate/eswa
Developing an intelligent web information system for minimizing information gap in government agencies and public institutions
Tae Hyun Kim a,*, Gye Hang Hong b,1, Sang Chan Park a,2
a Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea
b Dongbu CNI, #28th Floor, Dongbu Financial Center, 891-10 Daechi-dong, Gangnam-gu, Seoul 135-523, Republic of Korea
Abstract
A purpose of web information services in government agencies and public institutions is to provide various kinds of public information that support people's decision-making. However, people differ in their information access environment, their ability to understand information, their desire to pursue information, and so on, and the resulting information gap leads to profit gaps among them. We therefore propose an intelligent web information system for minimizing the information gap in government agencies and public institutions, so that disadvantaged people can understand web contents and make more profit in their economic behavior. To remove the information gap, varied and useful contents should be designed and managed so that disadvantaged people can understand them easily and use them beneficially. Government agencies and public institutions should provide disadvantaged people who experience the information gap with the requisite web contents in a form they can understand. This paper discusses a desirable web information system for government agencies and public institutions, based on web intelligence technologies such as web mining and web personalization tools, that provides the information-disadvantaged class with the requisite web contents and supports their successful practical use. We show an application of the web information system to the Ministry of Agriculture & Forestry (MAF) in Korea.
© 2007 Elsevier Ltd. All rights reserved.
Keywords: Electronic government; Information gap; Data mining; Personalization
1. Introduction
In this paper, we propose an intelligent web information system for minimizing the information gap in government agencies and public institutions. It delivers personalized web contents that disadvantaged people can understand and from which they can make more profit in their economic behavior. To develop the system, we identify disadvantaged people, who have large total losses and a high probability of loss per transaction, by analyzing the transaction data of all markets. We then identify the information gap between disadvantaged people and advantaged people, and redesign the contents of web pages for disadvantaged people so as to close the gap and make the information easy to understand.

* Corresponding author. Tel.: +82 42 869 2960; fax: +82 42 869 3110. E-mail addresses: [email protected] (T.H. Kim), kaistduck@dongbu.com (G.H. Hong), [email protected] (S.C. Park).
1 Tel.: +82 2 3011 5648; fax: +82 2 3011 5551.
2 Tel.: +82 42 869 2920; fax: +82 42 869 3110.
0957-4174/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2007.01.041

As one means of realizing electronic government, the construction of websites for public sectors such as government agencies and public institutions/enterprises has been actively promoted. Electronic government aims at quick and accurate service to citizens, effective government work, innovation by
T.H. Kim et al. / Expert Systems with Applications 34 (2008) 1618–1629
redesigning work processes, and higher national competitiveness through improved productivity (Lee & Jung, 2004). Since e-government (electronic government) first appeared in the United States of America in 1993, several countries around the world, including South Korea, have been pursuing e-government projects that actively apply information and communication technologies to government administration. These projects seek more efficient government administration and customer-oriented service through the use of those technologies. To implement e-government, the construction of websites for government agencies and public institutions is being actively promoted. These websites provide citizens with various information and services and are the most important means of collecting public opinion. The websites of government agencies and public institutions ultimately differ from private-sector websites in that they pursue the public interest and equity; that is, they are constructed and operated to realize public benefit and equity. They provide interactive electronic administration services and act as a channel for disseminating administrative information, for communication, and for gathering and reflecting citizens' opinions. They also raise the transparency of administration and help stimulate the national and local economies. As electronic administration and public information services expand with the implementation of e-government, an information gap arises wherever access to those services is difficult. In other words, government agencies and public institutions should provide public information so that all people can easily access electronic administration and public information services and take advantage of the necessary information regardless of economic, physical, and regional differences.
However, when universal access to, use of, and benefit from these public information services are not secured, an information gap arises. The main causes of the information gap are considered to be differences in information access environment, ability to understand information, desire to pursue information, and so on, which stem from differences in economic status, education level, physical disability, and region. It has recently been pointed out that when the contents offered on a website are hard to understand and cannot be used beneficially, they combine with these causes to produce an even more serious information gap. To remove the information gap, varied and useful contents should be designed and managed so that disadvantaged people can understand them easily and use them beneficially. Therefore, government agencies and public institutions should provide disadvantaged people who experience the information gap with the requisite web contents in a form they can understand. This paper discusses a desirable web information system for government agencies and public institutions
based on web intelligence technologies such as web mining and web personalization tools, with the aim of providing the information-disadvantaged class with the requisite web contents and supporting their successful practical use. We show an application of the web information system to the Ministry of Agriculture & Forestry (MAF) in Korea. The application is designed to support farmers' decision-making, because their information access environments differ widely and the information gaps among them are therefore serious. As a result of the information gap, their profits differ greatly even though they sell products of the same quality.

2. Web information system in MAF and research background
In this section, we describe the research background of e-government and then the current public institution's web information system in MAF. Lastly, we present the issues that we address in developing an intelligent web information system in MAF.

2.1. Research background
E-government is defined as the application of IT to government service. E-government is a global phenomenon, and public servants around the world are adopting novel ways to leverage IT to better serve their constituents. E-government services can be divided into three categories: transaction, citizen participation, and access to information (Marchionini, Sanan, & Brabdt, 2003). Of these, access to information is the most common e-government application, and our e-government project belongs to this category. The purpose of such a service is to provide various kinds of public information to support the decision-making of various users, such as researchers, public servants in the agency that operates the web information system and in other government agencies, citizens, and so on.
Because people differ in education level, desire to pursue information, and other factors that affect their access to public information, government agencies should collect various data, transform the data into various kinds of information, and deliver that information to them. It is therefore very important for government agencies to consider how to integrate the vast volumes of data managed in heterogeneous systems and how to manage the data effectively (Bouguettaya, Ouzzani, & Cameron, 2001). They should identify users' needs and determine what kinds of web contents can satisfy those needs. Lastly, they should consider how to design and deliver web contents that users want and can understand. Two technologies can be considered to solve these problems: web page design and personalization. Marchionini and Levi (2003) discussed design issues in their BLS (Bureau of Labor Statistics) project. They introduced four design philosophies: use a highly structured, information-intensive display, use dynamic
representations and control mechanisms, use end-user vocabulary, and use multiple representations and views. That is, they emphasized a design method that puts the maximum information into the minimum number of pages and presents one kind of information in several different forms so that users can understand it. Personalization uses information about a specific user to build an electronic profile from different types of user-specific information, so that users benefit from reduced search time. Personalization is mainly used by private-sector information and service providers such as Yahoo and Amazon.com. Public-sector organizations such as government agencies and public institutes, however, do not apply the technology to their web systems because of legal restrictions on the use of personal information and on privacy. Increasing user demand may overcome these legal restrictions and allow the technology to be applied to public-sector systems (Hinnant & O'Looney, 2003; Medjahed, Rezgui, Bouquettaya, & Ouzzani, 2003). Therefore, a public-sector system must deliver beneficial information that takes the user's level into account, so that users make more profit by using the information in their decision-making and come to trust the system.

2.2. Web information system in MAF
The current public institution's web information system in MAF was built to supply beneficial agricultural information, such as shipping information related to the distribution of farm products and cultural information about farm village life, in order to contribute to the development of farm villages and the welfare of farmers.
As shown in Fig. 1, MAF collects transaction data in the markets and transmits the data to the main transaction data server every day. The transaction data consist of market name, buyer name, supplier name, product name, quality, quantity, and price. The data are summarized into the many kinds of information in which people are interested. For example, some people may be interested in information summarized along the two dimensions of month and city, while others may be interested in information summarized along the three dimensions of week, market, and item. We therefore find the feasible dimensions and summarize the data according to feasible combinations of those dimensions. The information is built into a data warehouse with On-Line Analytical Processing (OLAP) and delivered to farmers through web pages.

2.3. Defining issues
We considered the following three issues for an intelligent web information system that can improve the overall profit of users such as farmers.
First, people use only a few kinds of information even though seventy or more kinds are delivered. This is because many kinds of information are either not useful for their decision-making or not designed to be easily understood. Because this is a web design issue, we should design web contents that contain the maximum information in the minimum number of pages, and we should provide a method for presenting one kind of information in several different forms according to the user's ability to understand.
Fig. 1. Public institution's web system in MAF subsidiary. [Figure: the Ministry of Agriculture and Forestry (MAF) investigates transaction data (price, quantity, quality, etc.) in all markets (Market 1 to Market n) and stores them in the transaction data server; the data are summarized (price, quantity) along the dimensions time, market, and region into a data warehouse and delivered through web pages to people (farmers, merchants, customers).]
Second, users differ in their ability to understand the same information because of differences in education, experience, and so on. Consequently, each person refers to different kinds of information when making decisions and ends up with a different profit or loss. We suggest a method for delivering valuable web information that improves the profits of disadvantaged users.
Third, the value of information can change over time because the market environment changes. However, many users are not aware of the changed value of information. They may fail to adapt to the changed market environment and incur many losses in the market. As a result, they may not trust the system or allow the use of their personal information. Therefore, we suggest a method that monitors changes in the market environment, identifies newly important information, and delivers that valuable information.

3. Intelligent web information system
In this section, we describe the framework of an intelligent web information system (IWIS) for government and all modules of the IWIS. First, we define the main terms, following Han and Kamber (2001):
– Cuboid: a combination of dimensions. It represents a different degree of summarization.
– Number of dimensions of a cuboid: it represents the complexity level of a kind of information.
– Concept level of a dimension: the summary level of a dimension in a cuboid.
– Summarization schema: a cuboid together with the concept levels of its dimensions.
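The definitions above can be made concrete with a minimal sketch. It enumerates the cuboids over a set of dimensions and summarizes one decision variable by a chosen cuboid. The records, the field names, and the helper functions `enumerate_cuboids` and `summarize` are hypothetical and serve only as illustration; the concept level of the time dimension is fixed at "week" here, whereas a full summarization schema would also vary it (daily, weekly, monthly).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transaction records; the field names are illustrative only.
transactions = [
    {"week": "W1", "market": "A", "item": "apple", "price": 100, "quantity": 10},
    {"week": "W1", "market": "B", "item": "apple", "price": 120, "quantity": 5},
    {"week": "W2", "market": "A", "item": "apple", "price": 110, "quantity": 8},
    {"week": "W2", "market": "A", "item": "pear", "price": 90, "quantity": 4},
]

DIMENSIONS = ("week", "market", "item")


def enumerate_cuboids(dimensions):
    """Return every non-empty combination of dimensions (one per cuboid)."""
    return [
        cuboid
        for size in range(1, len(dimensions) + 1)
        for cuboid in combinations(dimensions, size)
    ]


def summarize(records, cuboid, variable):
    """Average `variable` over the records grouped by the cuboid's dimensions."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[dimension] for dimension in cuboid)
        groups[key].append(record[variable])
    return {key: sum(values) / len(values) for key, values in groups.items()}


cuboids = enumerate_cuboids(DIMENSIONS)
weekly_market_price = summarize(transactions, ("week", "market"), "price")
```

The number of dimensions in the chosen cuboid (two in `weekly_market_price`) corresponds to the complexity level of the resulting information.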
As shown in Fig. 2, users refer to various kinds of web information delivered by the IWIS and then decide on an item, its quantity, and the market in which to sell it. We collect users' transactions in the markets, evaluate users' decision-making in terms of profit, and then segment users into an advantaged group and a disadvantaged group. We define the advantaged group as the group that makes more profit than the average profit of users in the markets. Next, we identify the difference in the web information used by the two groups. That is, we find which web pages the users of each group mainly refer to, what features the information on those pages has, and how well the users are able to understand it. Lastly, we deliver redesigned web pages to improve the profit of the disadvantaged group. The intelligent web information system is thus proposed to help the disadvantaged group improve their profits.
As shown in Fig. 3, the intelligent web information system consists of two modules: the web-information creation module (WCM) and the personalization module (PM). The WCM identifies the summarization schemas and decision variables of all clusters, finds the best summarization schema and decision variables among them, and redesigns the web information accordingly. The WCM consists of three sub-modules: monitoring change of market environment, identifying key summarization schema and decision variable, and redesigning web pages. The monitoring sub-module finds the important market environment variables from the behavior of the main customers in the market, compares them with the environment variables of the previous period, and triggers the next sub-module to rerun if the two environments differ.
Fig. 2. Framework of intelligent web information system. [Figure: users refer to various kinds of web information from the web service and make a decision; transaction data are collected in the markets and users are segmented by profit into an advantaged group and a disadvantaged group; the difference in used web information between the two groups is identified from web logs and web pages; the web information is redesigned to improve the profit of the disadvantaged group.]
Fig. 3. Main modules of intelligent web information system. [Figure: the Web-information Creation Module (WCM) comprises monitoring change of market environment (from market data), identifying key decision variables and summarization schema (from users' behavior data and web logs), and redesigning web pages; it updates the web pages repository with the key summarization schema and key decision variables of each cluster. The Personalization Module (PM) starts at user log-in: a newcomer receives all designed web pages; for other users it checks whether the user's profit has improved and, if not, searches new web pages for the user; personalized web pages are then packaged for the web service.]
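The branching of the personalization module drawn in Fig. 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's implementation: the `User` class, the `select_pages` function, and the fallback to the current pages when the repository offers nothing new are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    registered: bool
    profit_improved: bool = True  # assumed outcome of the profit monitoring step
    personalized_pages: list = field(default_factory=list)


def select_pages(user, all_pages, repository):
    """Choose the pages to package for one log-in, following Fig. 3."""
    if not user.registered:
        # Newcomer: deliver every designed page so usage behavior can be observed.
        return list(all_pages)
    if user.profit_improved:
        # Registered user whose profit improved: keep the personalized pages.
        return list(user.personalized_pages)
    # Profit did not improve: look for pages the user has not been given yet.
    candidates = [page for page in repository if page not in user.personalized_pages]
    # Assumption: fall back to the current pages if nothing new is available.
    return candidates or list(user.personalized_pages)


pages = ["p1", "p2", "p3"]
newcomer = User("n1", registered=False)
satisfied = User("r1", True, True, ["p1"])
struggling = User("r2", True, False, ["p1"])
```

How the new pages are ranked within the repository is left open here; the sketch only returns those the user has not yet received.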
Identifying key summarization schema and decision variable is the sub-module that identifies the key summarization schema and decision variables of every cluster by analyzing web-log data and users' behavior data. It then finds the best summarization schema and decision variables among them for profitable decision-making. Redesigning web pages is the sub-module that assesses users' ability to use information and then transforms the information summarized with the best summarization schema and decision variables into a different type of information that users can understand.
The PM identifies whether a user is registered. If the user is a new visitor, it delivers all web information in order to learn the user's web usage behavior and ability to understand. If the user is a registered visitor, it delivers the user's personalized web pages. The PM monitors the user's profit by analyzing on-line/off-line transaction data after the web information has been used. If the user's profit has not improved, the PM searches the web page repository for a new type of web page and delivers the newly packaged web pages to the user.
So far, we have described the two main modules of the intelligent web information system. In the following subsections, we describe the sub-modules of the WCM and their methods in detail.

3.1. Identifying key summarization schema and decision variable
As noted in the previous section, the value of information can differ according to users' ability to understand it. We
should evaluate the user's behavior in the market after the user has referred to the information, in order to measure the value of the information to that user. To measure the value of information, we distinguish advantaged users from the other users. We transform users' behavior data into three factors:
– Total profit (TP): Profit is defined as the difference between the user's transaction price per item and the average market transaction price per item. Total profit is obtained by summing the profit from all of the user's transactions during a specific period.
– Probability for profit (PFP): the number of transactions in which the user makes a profit divided by the total number of transactions during a specific period.
– Probability for loss (PFL): the number of transactions in which the user makes a loss divided by the total number of transactions during a specific period.
Then, we segment users into several clusters with similar features using the self-organizing map (SOM), which is one of the clustering methods (Kohonen, 1982). The SOM is an unsupervised learning scheme for training a neural network. Unsupervised learning comprises those techniques for which the resulting actions or desired outputs of the training sequences are not known. The network is given only the input vectors and self-organizes these inputs into categories.
As shown in Table 1, users of clusters 1 and 2 make more total profit than users of clusters 3 and 4. In particular, users of cluster 1 have a much higher probability for profit per transaction than users of the other clusters. On the other hand, users of cluster 4 have a much lower probability for profit per transaction and a much lower total profit. Users of cluster 3 have almost the same probability for profit as users of cluster 2, but a lower total profit. They can improve their total profit if they reduce the average amount of loss per transaction. These differences among clusters are caused by users' abilities to understand information and by the kinds of information they refer to for decision-making. We must therefore find out which web pages they mainly use and with which summarization schema and decision variables those pages are summarized.

Table 1
Summary of cluster characteristics
Cluster         Fraction of users (%)   Total profit   Probability for profit   Probability for loss
1               12.5                    38,725         0.86                     0.04
2               28.1                    7082           0.63                     0.33
3               34.4                    25,213         0.63                     0.39
4               25.0                    33,404         0.33                     0.94
Total average   100.0                   3203           0.61                     0.43

The WCM identifies the important web pages that distinguish clusters of advantaged users from clusters of disadvantaged users. We use C4.5, which is one of the classification methods (Quinlan, 1993). Classification is the process that finds the features of a newly presented object in a database and assigns it to one of a set of predefined classes (Fig. 4). The WCM discovers the key summarization schema and decision variables of the best of the advantaged clusters (e.g. cluster 1) and those of the other clusters, as shown in Fig. 5.
First, we build a 'user to web page' matrix by analyzing web logs. It records access counts by user and web page.
Second, we transform the user to web page matrix into a matrix of users and schemas. The new matrix takes one of five categories (High, Above Average, Average, Below Average, and Low), obtained by transforming the access counts. To do this, we classify web pages by kind of schema and then measure the degree of web usage of each schema. Each web page consists of a summarization schema and decision variables, as shown in Fig. 6. For example, 'webpage 05' in Fig. 6 is made by summarizing transaction data according to the two decision variables of price and quantity and the two dimensions of time and market. If we find the web pages that characterize the best cluster and understand the schema and decision variables of those pages, we can identify the important decision variables and how the data of those decision variables should be summarized for good decision-making.
Third, we identify the schema and decision variables of the web information that characterizes each cluster. We apply the C4.5 classification method and obtain a decision tree as shown in Fig. 5. The characteristics of each cluster follow from the nodes of the decision tree and the conditions on
Fig. 4. Sub-processes of web information creation module. [Figure: (1) Identifying key decision variables and summarization schema: users' behavior data are transformed into the three factors TP (total profit), PFP (probability for profit), and PFL (probability for loss) and clustered into several clusters with similar features; web logs occurring between user behaviors are summarized into each user's degree of access by web page; together with each user's cluster number these yield the decision variables and summarization schema of each cluster. (2) Monitoring change of market environment: the main customers of the market are found from market data in terms of RFM and their characteristics are identified as environment variables of the market; if the market environment has changed, the identification step is rerun. (3) Redesigning web pages: web pages that compare the alternatives of each cluster with those of the most valuable cluster are created and stored in the web pages repository.]
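The three factors named in Fig. 4 can be computed as in the following minimal sketch, which follows the definitions of TP, PFP, and PFL given in Section 3.1. The sample transactions, the tuple layout `(user, item, unit price)`, and the function names are hypothetical. The market average is taken over the sample itself, and the subsequent SOM clustering step is not shown.

```python
from collections import defaultdict

# Hypothetical transactions: (user, item, unit price). Values are illustrative.
transactions = [
    ("u1", "apple", 120.0),
    ("u1", "apple", 110.0),
    ("u2", "apple", 90.0),
    ("u2", "apple", 80.0),
    ("u1", "pear", 50.0),
    ("u2", "pear", 50.0),
]


def market_average_prices(records):
    """Average transaction price per item over all users."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, item, price in records:
        totals[item][0] += price
        totals[item][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}


def profit_factors(records):
    """Return TP, PFP, and PFL for every user in the period covered by `records`."""
    averages = market_average_prices(records)
    profits_by_user = defaultdict(list)
    for user, item, price in records:
        # Profit: the user's price minus the market average price for the item.
        profits_by_user[user].append(price - averages[item])
    factors = {}
    for user, profits in profits_by_user.items():
        count = len(profits)
        factors[user] = {
            "TP": sum(profits),
            "PFP": sum(p > 0 for p in profits) / count,
            "PFL": sum(p < 0 for p in profits) / count,
        }
    return factors


factors = profit_factors(transactions)
```

A transaction priced exactly at the market average counts as neither profit nor loss in this sketch, so PFP and PFL need not sum to one.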
Fig. 5. Extracting summarization schema and decision variable. [Figure: access counts of web pages are summarized in a matrix of users (User01 to User k) by web pages (Web01 to Web m); the counts are transformed into access categories per summarization schema (A01 to Am), with H: High, AA: Above Average, A: Average, BA: Below Average, L: Low, and each user's cluster number is attached; a decision tree then identifies the decision variables and summarization schema of each cluster. In the tree, A01 > AA leads to Class 4; A01 ≤ AA leads to node A04; A04 > AA leads to node A15 (> A: Class 1); A04 ≤ AA leads to node A06 (> H: Class 2; ≤ H: Class 3).]
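The first two transformations in Fig. 5 can be sketched as follows: page-level access counts are aggregated per summarization schema and then mapped to the five categories. The paper does not state how the category boundaries are chosen, so the equal-width binning in `categorize` is purely an assumption, as are the page-to-schema mapping and the function names.

```python
CATEGORIES = ("L", "BA", "A", "AA", "H")  # Low ... High, as in Fig. 5


def to_schema_counts(page_counts, page_to_schema):
    """Aggregate one user's page access counts by summarization schema."""
    schema_counts = {}
    for page, count in page_counts.items():
        schema = page_to_schema[page]
        schema_counts[schema] = schema_counts.get(schema, 0) + count
    return schema_counts


def categorize(count, max_count):
    """Map a count to one of five equal-width bins over [0, max_count].

    Equal-width binning is an assumption made for this sketch only.
    """
    if max_count <= 0:
        return "L"
    index = min(int(5 * count / max_count), 4)
    return CATEGORIES[index]


# Hypothetical data for one user.
page_counts = {"Web01": 3, "Web02": 7, "Web03": 4}
page_to_schema = {"Web01": "A01", "Web02": "A01", "Web03": "A04"}

schema_counts = to_schema_counts(page_counts, page_to_schema)
max_count = max(schema_counts.values())
categories = {s: categorize(c, max_count) for s, c in schema_counts.items()}
```

Rows of this form, one per user and labeled with the user's cluster number, would be the input to a decision tree learner such as C4.5.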
Fig. 6. Schema and decision variables of web pages. [Figure: summarization schemas present cuboids and their concept levels, from 1st cuboids (1 dimension, 1 concept hierarchy, e.g. time: daily (A1), weekly (A2), monthly (A3)) through 2nd cuboids (2 dimensions, e.g. market: Market A (A4), Market B (A5)) to n-th cuboids (d dimensions, up to Am), over the dimensions time, market, supplier, and item; each web page (Webpage 01 to Webpage m) combines a summarization schema with decision variables for decision-making such as price, price trend, quantity, and quantity trend.]
their branches, as shown in Table 2. Users of cluster 1, the most advantaged cluster, can be characterized as follows: they are able to understand information summarized along four dimensions, and they consider the two decision variables of quantity and its trend. The values of the decision variables are summarized into weekly averages. On the other hand, users of cluster 4, the lowest-profit cluster, are able to understand information summarized along three dimensions. So, if we deliver information summarized along four or more dimensions to these users, they may not understand the meaning of the information and may be misled into errors in decision-making. And the users
Table 2
Summary of schema and decision variables

Cluster 1
  No. of schema   Degree of reference   Description for schema and decision variable
  A01             ≤ AA                  Cuboid: time (daily), item, market. Decision variables: price and quality
  A04             > AA                  Cuboid: time (week), market, and item. Decision variables: price and quality
  A15             > A                   Cuboid: time (week), market, supplier, item. Decision variables: quantities and its trend
  Characteristics: Maximum degree of dimension: 4. Key summarization: (1) Summarizing data of quantities and its trend according to week, market, supplier and item; (2) Summarizing data of price and quality according to week, market, and item

Cluster 2
  No. of schema   Degree of reference   Description for schema and decision variable
  A01             ≤ AA                  Cuboid: daily, item, market. Decision variables: price and quality
  A04             ≤ AA                  Cuboid: week, market, and item. Decision variables: price and quality
  A06             > AA                  Cuboid: week, market, customer, and item. Decision variables: price and quantity
  Characteristics: Maximum degree of dimension: 4. Key summarization: Summarizing data of price and quantity according to week, market, customer, and item

Cluster 3
  No. of schema   Degree of reference   Description for schema and decision variable
  A01             ≤ AA                  Cuboid: daily, item, market. Decision variables: price and quality
  A04             ≤ AA                  Cuboid: week, market, and item. Decision variables: price and quality
  A06             ≤ AA                  Cuboid: week, market, supplier, and item. Decision variables: price and quantity
  A03             > AA                  Cuboid: daily, market, supplier, item. Decision variables: price and quality
  Characteristics: Maximum degree of dimension: 4. Key decision variable: Summarizing data of price and quality according to daily, market, supplier and item

Cluster 4
  No. of schema   Degree of reference   Description for schema and decision variable
  A01             > AA                  Cuboid: daily, item, market. Decision variables: price and quality
  Characteristics: Maximum degree of dimension: 3. Key decision variable: Summarizing data of price and quality according to daily, market and item
consider the decision variables of price and quality, summarized into daily averages of transactions. As a result, we can see that advantaged users give weight to the transacted quantity of an item and its weekly trend in their decision-making, while disadvantaged users give weight to the daily price and daily quantity of an item. Advantaged users also compare their transactions with those of other suppliers, while disadvantaged users do not consider this important. Therefore, if disadvantaged users give weight to information that compares their transactions with other suppliers' quantities and weekly trends, they may improve their total profit and probability for profit. We need to redesign the information into a form that they can understand.

3.2. Monitoring change of market environment
As described in the previous section, the environment variables of the market change over time. However, users may not be aware of the changed market environment and may keep considering the same kinds of information that they used for decision-making in the previous period. This is why they incur many losses and have a lower probability for profit per transaction. Therefore, we should monitor the market environment and inform users of the new environment variables. To monitor the market environment, we summarize the market's transaction data into recency, frequency, and monetary (RFM) data. Using summarized RFM
data, we are able to find the main customers of the market and identify their consumption pattern. The RFM clustering method is one of the analysis methods for discovering customer patterns (Bult & Wansbeek, 1995; Woo, Bae, & Park, 2005). We define RFM as follows.
– Recency: the time period of the most recent purchase during the analysis period.
– Frequency: the number of purchases during the analysis period.
– Monetary: the amount of money spent during the analysis period.
After summarizing the data into RFM, we segment customers with the SOM into several groups that have similar consumption patterns. The segmentation yields the results shown in Table 3. In period P1, the main customer group is cluster 2, because of its recency and its relative importance in total sales. We can see that in period P1 these customers buy a small quantity of items of below-average quality at an average price. In period P2, however, the characteristics of the main customer group change to high quality, high price, and large transaction quantities. Therefore, we should monitor changes in the profits of advantaged users, because the web information that is important for good decision-making may have changed. If their profits decrease, we should find new advantaged users
Table 3
Identifying change of market environment
Period   Cluster         Recency   Frequency   Monetary   Relative importance of total sales (%)
P1       Cluster 1       1.0       0.23        0.72       6
P1       Cluster 2       1.0       0.04        0.01       47
P1       Cluster 3       1.0       1.0         0.53       12
P1       Cluster 4       0.33      0.64        0.39       35
P1       Total average   0.83      0.48        0.41
P2       Cluster 1       0.66      0.30        0.54       51.95
P2       Cluster 2       0.38      0.98        0.72       13.68
P2       Cluster 3       0.22      0.17        0.08       16.62
P2       Cluster 4       0.76      0.09        0.05       17.75
P2       Total average   0.51      0.39        0.35
Market environment (= consuming pattern of main customers in each cluster):
P1: Cluster 2. Quality: 5th–7th level; Frequency: average; Price: average; Quantity: below average
P2: Cluster 1. Quality: 1st level; Frequency: above average; Price: high price; Quantity: many quantity
and their key summarization schema and decision variables. The sub-module for monitoring change of market environment thus allows users to keep making a sufficient profit.
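The RFM summarization described in this section can be illustrated with a minimal sketch. It computes raw recency, frequency, and monetary values per customer and each customer's share of total sales. The purchase tuples and function names are hypothetical, periods are assumed to be positive integers, and neither the normalization to the 0 to 1 scale used in Table 3 nor the SOM clustering is shown.

```python
from collections import defaultdict

# Hypothetical purchases: (customer, period index within the analysis window, amount).
purchases = [
    ("c1", 1, 100.0),
    ("c1", 3, 50.0),
    ("c2", 2, 300.0),
    ("c3", 1, 50.0),
]


def rfm(records):
    """Summarize purchases into raw recency, frequency, and monetary values."""
    table = defaultdict(lambda: {"recency": 0, "frequency": 0, "monetary": 0.0})
    for customer, period, amount in records:
        row = table[customer]
        row["recency"] = max(row["recency"], period)  # most recent purchase period
        row["frequency"] += 1                         # number of purchases
        row["monetary"] += amount                     # total amount spent
    return dict(table)


def sales_share(table):
    """Each customer's share of total sales (relative importance)."""
    total = sum(row["monetary"] for row in table.values())
    return {customer: row["monetary"] / total for customer, row in table.items()}


table = rfm(purchases)
shares = sales_share(table)
```

Comparing the profile of the group with the largest share between two periods is one simple way to flag a change in the market environment.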
3.3. Redesigning web pages
After extracting the summarization schema and decision variables of each cluster, we can see the difference between those of the advantaged cluster and those of the others. Disadvantaged users may improve their profit if we deliver to them the information summarized with the summarization schema and decision variables of the advantaged cluster. However, the users may not understand the information if we deliver it in the same form, because users differ in their ability to understand. Therefore, we must redesign the information to take users' ability to understand into account. As shown in Fig. 7, we redesign the web information to improve the profits of disadvantaged users. We retrieve the data of the alternatives related to the decision variables of the best advantaged cluster and summarize them according to the summarization schema of that cluster. Then, we segment the alternatives into several similar groups and measure the value of each group. The value of a group is calculated from a linear weighting model (de Boer, Labro, & Morlacchi, 2001) as follows:
$$TV_i = w_{i1} l_{i1} + w_{i2} l_{i2} + \cdots + w_{ij} l_{ij}$$
where i is the label of the group, j is the decision variable, w is the weight of decision variable j of group i, and l is average
Fig. 7. Searching the best alternatives for decision-making. [Flow diagram. For the best advantaged cluster and for each other cluster, the values (0 or 1) of the decision variables (frequency, price, price trend, quantity, quantity trend, quality) and the summarization schema are used to search for the best alternatives. The information on the best alternatives of each user is converted into information summarized with the summarization schema of the best advantaged cluster, and the two sets of alternatives are combined to compare the best alternatives of the best advantaged users with those of each user and to redesign the web information.]
Fig. 8. Redesigning a web page. (a) Web information and its schema to which users of cluster 1 refer for decision-making. (b) Delivering the modified web information of (a) so that users of cluster 4 can select a better alternative.
value of decision variable j of group i. We measure the values of all groups in order to compare them and recommend the alternatives of the best group for decision-making. Because this comparison involves multiple variables, we use a linear weighting model, which takes the relative importance of every variable into account. The weight of each decision variable is set according to the market environment. We compare the values of all groups and choose the alternatives of the best group. These alternatives are the optimal decision that the best advantaged users can choose. In the example of Fig. 8a, the best advantaged users refer to a web page summarized with the decision variables quantity and quantity trend and a schema with four dimensions: item, market, week, and supplier. The alternative, a market in this case, that has a higher quantity and quantity trend than the others is circled in the graph of Fig. 8a. Using the same method, we search for the alternatives that users of each cluster regard as the optimal decision. A user of a disadvantaged cluster may refer to web information summarized with the decision variables price and quantity and a schema with three dimensions (day, item, and market), as in the left web page of Fig. 8b. However, the optimal alternative according to this web information differs from that of the best advantaged users. We should inform
disadvantaged users of this difference in decision-making and guide them to reconsider their decision. Therefore, we convert the web information on the alternatives that a disadvantaged user regards as optimal into a different type of web information, summarized with the decision variables and schema of the best cluster. When converting the web information, we take the disadvantaged user’s ability to understand into account. As shown on the right of Fig. 8b, we find the information on the optimal alternative with the decision variables and schema of the best cluster and then convert it into a different form, presented with three dimensions (item, market, and week) and one fixed dimension (supplier). The new information is combined with the information on the alternatives found from the best cluster, to show that an alternative that appears optimal on the left of Fig. 8b is not optimal on the right of Fig. 8b. As a result, disadvantaged users may change their decisions and come to understand other important decision variables and schemas.

4. Evaluation

To analyze the effectiveness of the intelligent web information system in government, we used three measures: total profit, probability for profit, and probability for
Fig. 9. Comparing the control group with the treatment group. (a) Relative profit of clusters over time. (b) Comparison of total profits among clusters. (c) Comparison of probability for profit/loss among clusters.
loss. We randomly selected twenty users from cluster 4 and assigned half of them to a control group and the other half to a treatment group. We then redesigned the information summarized with the schema and decision variables of cluster 1 into a form that users of the treatment group can understand easily, and served it to the treatment group. After testing the two groups for 10 weeks, we obtained the following results. As shown in Fig. 9a, users of the best cluster had higher profits than both groups during almost all periods. Users of the treatment group showed no difference in profit from the control group in the first period, but had higher profits than the control group during the remaining periods. Fig. 9b compares the total profit of the treatment group and the control group: the profits of the treatment group improved sharply. In terms of the probability for profit/loss (Fig. 9c), the probability for profit of the treatment group increased and its probability for loss decreased. The probability for profit of cluster 1 and of the treatment group is 70%, because they made a profit in seven
periods during the 10 weeks. However, the control group made a profit only once during the 10 weeks. These results show that differences in users’ ability to understand information, and in the information they use for decision-making, strongly affect the success or failure of their economic behavior. Therefore, an intelligent web information system in government should be designed to deliver effective information to people with a low ability to understand.

5. Conclusion

The purpose of an intelligent web information system in government agencies and public institutions is to provide various kinds of public information to support people’s decision-making. Because people differ in education, experience, and so on, they differ in their ability to understand information. Even when given the same information for decision-making, they make different decisions and therefore realize different profits or losses. Therefore, we suggested an intelligent web information system in government to help disadvantaged users make
more profit in their economic behaviors. We defined the important issues for developing an intelligent web information system in government effectively: design of web contents, personalization, and responding to changes in the market environment. To develop the system, we collect users’ transactions in markets, evaluate users’ decision-making in terms of profit, and then segment users into advantaged and disadvantaged groups. Then, we identify the differences in the web information used by the two groups. That is, we find which pages users of each group mainly refer to, what features of information those pages include, and each user’s ability to understand information. Lastly, we redesign the web information to remedy the information weakness of the disadvantaged group. We then used three measures (total profit, probability for profit, and probability for loss) to analyze the effectiveness of the intelligent web information system in government. The evaluation showed that differences in users’ ability to understand information, and in the information they use for decision-making, strongly affect the success or failure of their economic behavior. If a government web system helps disadvantaged users understand the information that advantaged users refer to for decision-making, users may come to trust the system because it helps them make more profit.
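The three evaluation measures named above (total profit, probability for profit, and probability for loss) can be computed from a group's per-period profits, as in the sketch below. The weekly figures are hypothetical values chosen only so that the treatment series is profitable in seven of ten periods and the control series in one, matching the proportions reported in the evaluation; they are not the paper's data, and the function name is an assumption.

```python
def evaluate(profits):
    """Summarize one group's per-period profits (negative values are losses)."""
    n = len(profits)
    return {
        "total_profit": sum(profits),
        # Share of periods with a strictly positive profit.
        "p_profit": sum(p > 0 for p in profits) / n,
        # Share of periods with a strictly negative profit (a loss).
        "p_loss": sum(p < 0 for p in profits) / n,
    }


# Hypothetical weekly profits over a 10-week test (illustrative only).
treatment = [-2, 5, 3, -1, 4, 6, 2, -3, 5, 1]
control = [-2, -1, -3, 2, -4, -1, -2, -3, -1, -2]

for name, series in (("treatment", treatment), ("control", control)):
    print(name, evaluate(series))
```

A period with exactly zero profit counts toward neither probability in this sketch, so the two probabilities need not sum to one.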
References

de Boer, L., Labro, E., & Morlacchi, P. (2001). A review of methods supporting supplier selection. European Journal of Purchasing and Supply Management, 7, 75–89.
Bouguettaya, A., Ouzzani, M., & Cameron, J. (2001). Managing government databases. IEEE Computer, 34(2), 56–64.
Bult, J. R., & Wansbeek, T. J. (1995). Optimal selection for direct mail. Marketing Science, 14(4), 378–394.
Han, J., & Kamber, M. (2001). Data mining – Concepts and techniques. San Francisco: Morgan Kaufmann.
Hinnant, C. C., & O’Looney, J. A. (2003). Examining pre-adoption interest in on-line innovations: An exploratory study of e-service personalization in public sector. IEEE Transactions on Engineering Management, 50(4), 436–447.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
Lee, J. A., & Jung, J. W. (2004). Strategy for implementing high level e-government based on customer relationship management. Korean National Computerization Agency.
Marchionini, G., & Levi, M. (2003). Digital government information services: The bureau of labor statistics case. Interactions, 10, 18–27.
Marchionini, G., Samet, H., & Brandt, L. (2003). Digital government. Communications of the ACM, 46(1), 25–27.
Medjahed, B., Rezgui, A., Bouguettaya, A., & Ouzzani, M. (2003). Infrastructure for e-government web services. IEEE Internet Computing, 7(1), 58–65.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Woo, J. Y., Bae, S. M., & Park, S. C. (2005). Visualization method for customer targeting using customer map. Expert Systems with Applications, 28(4), 763–772.