Information and Software Technology 44 (2002) 313–330
www.elsevier.com/locate/infsof
Performance testing of a negotiation platform ☆

Charles Hélou a,*, Rachida Dssouli a, Teodor-Gabriel Crainic b

a Université de Montréal, DIRO, CP 6128, Montréal, Qué., Canada H3C 3J7
b Centre de Recherche sur les Transports, CP 6128, Montréal, Qué., Canada H3C 3J7
Abstract

Accessible from all over the world, electronic commerce (EC) has become an indispensable element of our society. It allows the use of electronic systems to exchange products, services and information between users. During these exchanges, it is very important to assure a good quality of service. However, the enormous growth in the number of Internet users pushes its resources to their limits, which provokes, in many cases, an important degradation in performance. Consequently, it is essential to analyze the capacity of servers to handle heavy workloads that grow considerably with the number of users. It is, therefore, necessary to conduct performance tests before servers are deployed, in order to detect any imperfection and predict their behavior under stress. In this context, this paper presents a simplified performance evaluation of the "alpha" version (September 2000) of a negotiation platform called the Generic Negotiation Platform (GNP). This platform is still under development. Many performance factors could be examined in such an evaluation; we considered only the response time factor because of its important impact on auction and negotiation applications. We mostly oriented this study towards the variation of the average response time of the server as a function of the number of users and the type of transaction. We also evaluated the effect of the auction rules on the server's average response time, limiting our study to closed and open second-price auctions. This study followed the traditional way of doing performance tests: we fixed our test objectives and criteria and then created our own scripts. Once the nature of the server's workload was specified, we created an adequate benchmark to generate requests to the server. Afterwards, the average response time of each considered transaction was collected. In order to interpret these results properly, we calculated the standard deviation and the coefficient of variation of each set of values. © 2002 Elsevier Science B.V. All rights reserved.
1. Introduction

Electronic commerce (EC) has many definitions [1–3]. In Ref. [4], it is defined as "the use of electronic systems in the exchange of goods/services/information". We encounter two important categories of EC on the Internet: consumer-oriented EC and business-to-business oriented EC. The number of users, on both sides, is increasing year after year. Business-oriented EC is expected to grow from $114 billion in 1999 to $1.5 trillion in 2004 [4,5]. In order to assure a high QoS to the user, we have to invest a considerable effort in testing the performance of the available servers. This is necessary to offer users a fast service and keep them connected to the site or platform. To face this problem, we have to test in advance the capacity of the server to handle heavy workloads. These tests should take into consideration the response time of the server, its capacity, its saturation and its behavior under a

☆ This work was sponsored by Bell Canada.
* Corresponding author. Tel.: +1-514-343-7599; fax: +1-514-343-5834.
E-mail addresses: [email protected] (C. Hélou), [email protected] (R. Dssouli), [email protected] (T.-G. Crainic).
large workload. They could also help in making decisions at different levels. For an administrator, they allow him to choose the best system for his applications. For a designer, they give him the opportunity to compare the efficiency of different systems. For a user, they mean gains in response time. They could also help in determining whether some enhancement could be added to the system in order to increase its efficiency and productivity. In this perspective, this paper treats a performance evaluation case study of the "alpha" version of a negotiation platform called the Generic Negotiation Platform (GNP) that supports different auction rules. It aims at studying the average response time when applying the open and closed auction rules at the second price. To do this, we considered a set of scenarios and built our own benchmark corresponding to certain objectives. Instead of looking at the general behavior of the server, this study looked into elementary transactions. This helped us to understand the effect of each of them on the variation of the average response time of the server, and to conceive and propose some solutions. This paper is organized as follows. Section 2 gives a general overview of the market place side of EC and
0950-5849/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0950-5849(01)00215-4
Fig. 1. Distributed architecture for Electronic Commerce.
on the most used auction mechanisms. Section 3 outlines the steps that analysts should follow in any performance test procedure. Section 4 describes some important aspects of the server. The approach adopted in this study is briefly presented in Section 5. Results and interpretations are exposed in Section 6, and finally Section 7 gives a summary with some concluding remarks.

2. E-commerce: the market place side

No factor has affected the way of doing commerce as much as the spectacular expansion of the Internet. The Internet introduced new electronic ways and methods of doing business: it gave the possibility to exchange products, money and documents online between companies and their clients or between clients themselves. It offers a good service to clients at competitive prices, and reduces the risk of errors and the delays of commercial procedures. The most important factors that contribute to the success of an EC system are classified under three categories: strategic, technical and functional. The strategic side determines the gain and the marketing policies and tries to assure an attractive environment for clients. The technical category takes care of the QoS in terms of response time, throughput, availability, reliability, quality of image and sound, etc. The last category takes into consideration the functional side of the system, such as the buying, bidding, searching and negotiating procedures.
2.1. Architecture of hosted e-commerce systems

An EC system is based on three essential actors: the first represents the clients or buyers in general, the second represents the companies and suppliers, and the third represents the service suppliers over the Internet. A client can consult the web page of a supplier and decide to buy a product by simply clicking on the "buy" button. At this point, the client sends the credit card information and product reference to the seller. The latter should then check the validity of the credit card with the concerned financial institution. This role is usually reserved for the third actor, whose role is to gather the different kinds of requested information and to prepare the necessary steps to deliver the purchased item. A general architecture of the EC market place is given in Fig. 1. The principal components are [6]:

Web-site construction tools: used to create Internet pages containing the order forms.
Web-site hosting and publishing: the site can be published by any server that the commerce-service provider allows. The latter should provide the seller with on-line transaction functions.
Transaction server: handles credit and debit card transactions using secure EC standards technology. It deals with the necessary authorization requests and records the transaction and the settlement between the merchant, the credit card company and the user.
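The authorization round-trip handled by the transaction server can be sketched as follows. This is a minimal illustration of the flow described above, not a real payment API: the class name, the flat credit-limit check and the log format are all invented for this example.

```python
# Illustrative sketch of the purchase flow: the merchant forwards card
# details to a transaction server, which records an authorization decision.
# A real server would contact the financial institution instead of the
# simple limit check used here.
from dataclasses import dataclass, field

@dataclass
class TransactionServer:
    credit_limit: float = 1000.0
    log: list = field(default_factory=list)

    def authorize(self, card_number: str, amount: float) -> bool:
        # Stand-in for the real authorization request to the institution.
        approved = amount <= self.credit_limit
        # Record the transaction for later settlement and reconciliation.
        self.log.append((card_number, amount, approved))
        return approved

server = TransactionServer()
print(server.authorize("4111-xxxx", 250.0))   # True
print(server.authorize("4111-xxxx", 5000.0))  # False
```

The log kept by the server corresponds to the settlement records mentioned above, which must be saved for reconciliation, adjustments and reporting.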
Security products and services: used to submit secure electronic transactions and to provide message integrity, authentication of all financial data, and encryption of sensitive information.
Payment systems: should be installed at the end customer's location, the merchant's transaction system location and the financial institution's location. Records of detailed transaction payment information must be saved to facilitate reconciliation, adjustments and reporting.
Call center: a "central place where customer and other telephone calls are handled by an organization, usually with some amount of computer automation. Typically, a call center has the ability to handle a considerable volume of calls at the same time, to screen calls and forward them to someone qualified to handle them, and to log calls. Call centers are used by mail-order catalog organizations, telemarketing companies, computer product help desks, and any large organization that uses the telephone to sell or service products and services" [29].

EC has become a part of everyday life. Its rapid expansion is due to many factors. The most important are:
New platforms have become more accessible and easy to use.
Flexibility has reduced the execution time of operations.
Geographic distances between the different countries of the globe have been eliminated.
Companies continually need to conquer new markets.

2.2. Auction mechanisms

Usually, auction mechanisms are considered as commercial exchange mechanisms. The word itself comes from the Latin root augere, which means "to increase" [7]. Auction theory is a complex economic subject and only a brief introduction can be given here; for more details see Refs. [8–13]. Each type of auction is characterized by its rules, which affect, in many ways, the execution of a negotiation. The type of an auction should be chosen according to the nature of the product, the kind of market we are going to deal with and our objectives.
There are two major classes of auctions: auctions with common values and auctions with private values. In the first class, items are bought to be resold on secondary markets, and the bid depends on the evolution of the market; in the second, products are bought for personal use, and the bid depends on the buyer's or seller's own valuation. There are many differences between the existing types of auctions. We distinguish:
Simple auction: orders are accepted from only one side of the market: sellers or buyers.
Double auction: suppliers and buyers can submit orders at the same time.
Sequential auction: items are put up for sale one after the other.
Simultaneous auction: items are put up for sale at the same time. Prices and quantities are revealed only at the end of the auction.
Multi-unit auction: many units of the same product are present in the auction.
Multi-item auction: many different items are present in the auction.
Simple sequential auction: applies the rules of both simple and sequential auctions.
Simple simultaneous auction: both the rules of simple and simultaneous auctions are applied. We consider in this category the closed and open auctions. In the first, no information is given to bidders during the negotiation period; in the second, bids are transparent to bidders.
Combinatorial auction: the majority of on-line auctions are simple single-item auctions, but in many cases the market consists of multiple, identical or non-identical items. In this case, the price that a bidder is willing to offer for one item may critically depend on the other items that he could win. An elegant solution to this dilemma is the concept of combinatorial auctions: bidders may bid on combinations of objects to explicitly express complementarities or substitutabilities.

A complete auction-based trading process is composed of six basic activities [14]:
1. Initial buyer and seller registration: this step permits the authentication of trading parties, the exchange of cryptographic keys and the creation of a profile for each trader when needed.
2. Setting up a particular auction event: at this stage, the description of the item being sold or acquired is given and the rules of the auction are chosen. These rules specify the type of auction being conducted, the kind of negotiated parameters and the starting and closing date and time of the auction.
3. Scheduling and advertising: in order to attract potential buyers, popular auctions must be separated from the ones that group expensive items. Items of the same category should be auctioned together on a regular schedule.
4. Bidding: this step handles the collection of bids according to the bid control rules of the auction.
5. Evaluation of bids and closing the auction: at this level, the auction rules are applied and the winners and losers of the auction are notified.
6. Trade settlement: this final step handles the payment to the seller, the transfer of goods to the buyer and the payment of fees to the auctioneer and other agents when the seller is not the auctioneer.
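The evaluation step (activity 5) for the second-price auctions studied in this paper can be sketched in a few lines. This is a generic sealed-bid second-price (Vickrey) evaluation, not GNP's actual implementation; the Bid structure and the tie-breaking by sort order are illustrative assumptions.

```python
# Sketch of bid evaluation for a sealed-bid second-price auction: the
# highest bidder wins, but pays the second-highest bid.
from dataclasses import dataclass

@dataclass
class Bid:
    bidder: str
    price: float

def evaluate_second_price(bids):
    """Return (winner, price paid), or None if there are no bids."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.price, reverse=True)
    winner = ranked[0]
    # With a single bid, the winner pays his own bid.
    price_paid = ranked[1].price if len(ranked) > 1 else winner.price
    return winner.bidder, price_paid

bids = [Bid("alice", 120.0), Bid("bob", 150.0), Bid("carol", 100.0)]
print(evaluate_second_price(bids))  # ('bob', 120.0)
```

In the closed variant studied later in the paper, these bids stay hidden until evaluation; in the open variant, the current best price would be published in each round's quote.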
Fig. 2. The auction business model.
2.3. Business models

Internet EC business models affect the EC benchmark application through the way the types and makeup of business transactions are distributed, the frequency with which individual transactions are invoked, and the number of accesses to databases over the Internet. A business model has been defined as "an approach a company takes to make money online" [15]. Current EC business models are classified into three classes: cybermediary, manufacturer and auction models [16].

The cybermediary model is characterized by an intermediary between suppliers of goods or services and customers. It permits a vendor to add value to its on-line supplier sites in different ways: by marketing a large range of similar products from one site, by offering comparison shopping, through industry coalitions, or through the creation of communities that facilitate customization.

The manufacturer model considers marketing and distribution to be part of the company's operations. Unlike the cybermediary model, where goods are bought from suppliers and resold, manufacturers make their own offers through their internal manufacturing processes. This model is better suited to organizations that have a solid marketing team and elaborate customer service processes.

The auction model works like a stock auction market: the buyer sets the price of the product through the submission of bids, and suppliers are ready to sell their products at the bidding price. Companies that use this model should have large customer bases because, in general, the margins on products are kept low in order to be competitive. Businesses make their profits from charging small fees to sellers when a sale is made through their site, and from the sale of advertisements. Fig. 2 shows how the auction model functions [17]:
1. The client sends his order.
2. Internal treatment of the client's order.
3. Notification of the winner.
4. The seller who has made the sale is notified of the customer's identity.
5. The seller and the buyer transact the final settlement without the intervention of the enterprise.
6. The supplier pays a transaction fee to the business.
7. The seller adjusts his listed items at the site.

The creation of benchmarks for these business models differs because they support different workloads and the business logic within the transactions is not the same. It is important to test, under heavy loads, the performance of servers and the different entities involved in order to avoid saturation and bottlenecks.

3. Server performance tests

Conceiving scenarios to test a server's performance requires a deep knowledge of the system and an adequate selection of parameters, methodology, workloads and tools. The challenge of studying the performance of a server consists of discovering, in a limited time, the true behavior of the server under heavy loads by choosing the right method, technique and metrics. Each analyst has his own style: given the same problem, two analysts may choose different performance metrics and methodologies. Often, problems detected after putting a server into service are not related to a dysfunction or a software failure, but to a considerable degradation of its performance under heavy loads. Usually, these systems are tested only from the functionality point of view, and performance tests are totally neglected. EC systems are architecturally composed of many entities that are decentralized and autonomous. Testing the performance of these components requires sophisticated procedures, tools and approaches. Indeed, testers have
to deal with concurrency, synchronization and communication issues [27].

3.1. Capacity planning

"Capacity planning is the process of predicting when future load levels will saturate the system and of determining the most cost-effective way of delaying system saturation as much as possible" [18]. This prediction should take into consideration the evolution of the workload. The absence or lack of a continuous capacity planning procedure may be financially devastating to a company: an unexpected unavailability caused by a bogged-down server could cause the loss of millions of dollars in just a short period of service interruption. Also, the poor performance of a system may lead to customer dissatisfaction and could damage the external image of the company. Another reason to keep a capacity planning procedure is that solving a performance problem may not be instantaneous. Even if the company has the financial capacity to replace the hardware or software needed to solve a capacity problem, it may take a while to find the best technical approach and components [18]. Metrics, workload and evaluation techniques usually change from one problem to another. However, there is a systematic approach common to all performance evaluation projects, whose steps are as follows [19]:
1. Define goals and characterize the system.
2. Define the environment and list services.
3. Select metrics.
4. List parameters.
5. Select factors to study.
6. Select evaluation techniques.
7. Select workload.
8. Design experiments.
9. Analyze and interpret data.
10. Present results.
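Step 8, designing experiments, is often done as a full-factorial design: every combination of the levels of the selected factors becomes one experimental run. The following sketch uses invented factor names and levels (they are not those of the GNP study) to show how such a design can be enumerated.

```python
# A minimal full-factorial experiment design: every combination of the
# selected factors (step 5) and their levels yields one run (step 8).
from itertools import product

factors = {
    "num_users": [10, 50, 100],
    "auction_type": ["open", "closed"],
}

def design_experiments(factors):
    """Enumerate one run configuration per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

runs = design_experiments(factors)
print(len(runs))   # 3 levels x 2 levels = 6 runs
print(runs[0])     # {'num_users': 10, 'auction_type': 'open'}
```

Each resulting configuration would then be executed against the workload selected in step 7, and its measurements analyzed in step 9.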
3.2. Characterizing the workload

"The workload of a system can be defined as the set of all inputs that the system receives from its environment during any given period of time" [18]. Because it is cumbersome to deal with a large number of elements, testers need to build a workload model that captures the most relevant characteristics of the real workload according to the objectives of the study. Even though each system may demand a specific approach to characterize its workload, there are some common guidelines that can be applied to all types of systems. These guidelines include the following steps [18]:
1. Specification of a point of view from which the workload will be analyzed.
2. Choice of the set of parameters that captures the most
relevant characteristics of the workload for the purpose of the study.
3. Monitoring the system to obtain the raw performance data.
4. Analysis and reduction of the performance data.
5. Construction of the workload model.

3.3. Using benchmarks

Using a benchmark is one of the traditional ways of doing performance testing. A benchmark is a workload that is representative of how the system will be used in the field; the system's behavior on the benchmark can then reveal how the system will behave in production. It is not easy to obtain a representative workload, for several reasons [20]:
The first problem is to determine where the data come from. The best approach is to have an operational profile that describes how the system has historically been used in the field, and how it is going to be used in the future.
The second issue is the type of workload: should the benchmark reflect an average or a heavy workload? It is important to consider the point of observation from which the average or heavy loads will be taken.
Another issue is to determine whether or not the system to be performance tested will run in isolation on the hardware platform.

Time and rate are the basic measures of system performance. From the user's viewpoint, application execution time is the best indicator of system performance: users always look for fast response times. From the management viewpoint, the performance of a system is defined by the rate at which it can perform work [18]. To be useful, a benchmark should have the following attributes [18,20]:
Relevant: it must provide meaningful performance measures.
Understandable: results should be simple and easy to understand.
Scalable: benchmark tests must be applicable to a wide range of systems.
Acceptable: it should present unbiased results recognized by both users and vendors.

The emerging EC business models affect EC benchmark development.
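The generic benchmarking loop described above — issue repeated requests, record response times, then summarize them — can be sketched as follows. The simulated transaction is a placeholder for a real server request, and the statistics reported (mean, standard deviation and coefficient of variation) are the same ones used later in Section 3.4.1.

```python
# Sketch of a benchmark driver: run a transaction repeatedly, time each
# run, and summarize the samples. The sleep stands in for a real request.
import random
import statistics
import time

def simulated_transaction():
    time.sleep(random.uniform(0.001, 0.003))  # placeholder for a server call

def run_benchmark(transaction, repetitions=30):
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        transaction()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    # The coefficient of variation is the ratio of the standard deviation
    # to the mean; a value above 0.5 suggests the results are too variable
    # to be considered significant.
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}

result = run_benchmark(simulated_transaction)
print(result)
```

A real EC benchmark would replace the simulated transaction with scripted requests drawn from the business model's operational profile.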
To each individual EC business model correspond different EC benchmarks. A benchmark consists of two major components: the business logic and the performance metric definitions. The first requires an accurate representation of the EC business model used, the supporting web navigational and database designs, transaction definitions,
Fig. 3. Selecting a proper index.
and integration with payment systems and security. The second component takes into consideration the multimedia environment and the interactions with servers and network components. It involves quality of service parameters such as response time, throughput and reliability [17].

3.4. Measuring performance

Measuring performance is an essential step in capacity planning because it collects data for performance analysis and modeling. The main source of information is the set of performance measurements collected from different reference points, carefully chosen to observe and monitor the environment under study [18]. Measurement data have three main uses in performance management: operational problem detection, performance tuning, and capacity planning. To follow the behavior of a system, testers rely on run-time monitors, which continuously collect measurement data from the system and display the status of some key variables that may indicate potential problems. Analysts, on the other hand, examine historical performance data to discover possible causes of performance problems that may affect the system's operation. Capacity planning, as mentioned earlier, makes use of measurement data to develop models that predict the performance of future system configurations or anticipate problems [18,21].

3.4.1. Selecting meaningful indices

In general, we use a single number to represent the characteristics of a set of data. To be meaningful, this number should be representative of the major part of the data set. The most popular indices that specify the center of the
distribution of the observations in a sample are the average, the median and the mode. To select a proper index of central tendency, analysts should essentially take into consideration the type of the variable and whether the total of all observations is meaningful. If the variable is categorical, the mode is the single measure that best describes the data. If the total of all observations is of any interest, then the mean is a proper index of central tendency. Otherwise, one can choose between the median and the mode. The flow chart of Fig. 3 gives a guideline for selecting a proper index [19].

Most performance measurements are random quantifications of the performance: the result would not be the same if the measurement were repeated, and the variance of the results is high, especially if there are many uncontrollable factors that impact those results. In such cases, we have to repeat the experiment many times and calculate means. However, means alone are not enough when comparing two random results or when there is large variability in the data. Overlapping confidence intervals are generally enough to deduce that two quantities are statistically indistinguishable. Variability is represented by the variance or by the standard deviation; the latter is often more meaningful because it is expressed in the same units as the mean. Analysts can calculate the ratio of the standard deviation to the mean, called the coefficient of variation, to evaluate the variability. Generally, if this ratio is greater than 0.5, the results are not considered significant.

3.4.2. Measurement techniques

"System measurement can be viewed as a process that observes the operation of a system over a period of time and registers the values of those variables that are relevant for a quantitative understanding of the system" [22]. A measurement process involves four major steps [18]:
1. Specify reference points: first, analysts should specify the points from which performance data will be collected.
2.
Specify measurements: at this step, they have to decide on the performance variables to be measured.
3. Instrument and collect data: after choosing the variables to be observed, programmers should configure and install measurement tools in order to measure the variables during the observation period, while the required information is recorded.
4. Analyze and transform data: measurement tools collect a considerable quantity of data, corresponding to a detailed observation of the system. To be useful, this information must be analyzed and transformed into significant results such as the average response time or the average size of requested files.

The measurement techniques used to gather data can be divided into two categories: event mode and sampling mode [18]. In the first mode, information is collected at
the occurrence of specific events. Upon detection of an event, special code calls an appropriate routine that generates a record containing the values of variables chosen beforehand by the analysts. When the event rate becomes too high, the event handling routines are executed very often, which may introduce a large overhead in the measurement process. In this case, the measurement overhead becomes unpredictable, which constitutes one of the major disadvantages of this technique. In the sampling mode, information about the system is collected at predefined time instants. The data collection routines of a sampling software tool are activated at predetermined times specified by the analysts in advance. In this case, the overhead depends on the number of variables measured at each sampling point and on the size of the sampling interval. Testers have to find the best way of choosing an accurate sampling size and monitoring interval to obtain low overhead and significant results. Compared to the event trace mode, the sampling mode provides less detailed information on the systems being tested [23].

4. GNP: overview

The fast and significant progress of information technologies is enormously changing the structure of the economy. This is seen clearly, through EC, in the way in which economic negotiations progress and in the exchange of products and services. The creation of an effective electronic marketplace gives rise to several challenges: EC should do more than just reproduce the classical way of trade, and should reconsider the way in which decisions are made and markets operate. Moreover, the negotiation system must be able to support many different rules and algorithms of negotiation. In this context, the Generic Experimentation Engine (GEE) was developed. "It supports game-oriented experimentation and allows for the study of human behavior under various game situations. The more recent GNP is a more focused version of GEE that will support experimentation with alternative market designs and with various types of negotiations" [24].

Fig. 4. Separation of economic and software levels.

4.1. Origin
GNP is thus a widened version of the GEE server. The latter is based on a platform especially designed to study human behavior under various game situations, and is characterized by its flexibility and its capacity to carry out several types of games. Whereas GEE is a game-oriented engine, GNP is an auction-oriented platform [24]. GEE is a flexible engine that allows the implementation of many games. Its major task is to orchestrate a sequence of rounds. In each round, each participant receives a set of public and private information and actions. Depending on the structure of the game, a new round is initiated when a preset number of players have transmitted their choices or when a specific delay has expired. At the beginning of each round, the information and actions are adapted according to the rules of the game. At the end of the game, each player receives his score [24]. The flexibility of GEE comes from the fact that it is an easy-to-handle system that does not impose too many economic restrictions. It was conceived with the objective of creating an open electronic environment that could reproduce various economic mechanisms involving several players interacting with each other.

4.2. Flexibility of economic and business rules

GNP is a generic negotiation platform allowing the use of several market designs and various types of negotiations. It is conceived to be easy to use: no client software installation is required other than a web browser. The generic software container and the economic aspects are clearly separated in order to allow the rapid implementation of any type of negotiation. Specific business and economic rules are implemented as XML documents, and the economic algorithms are Java classes written in JPython. This separation is shown in Fig. 4.

4.3. Architecture

The architecture of GNP uses several concepts and technologies already adopted for GEE. It is based on a multi-tier web application using servlets, JSP pages and JPython scripts.
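The separation described in Section 4.2 — economic parameters in an XML document, interpreted by ordinary code — can be illustrated with a small sketch. The element and attribute names below are invented for this example; GNP's actual rule schema is the one shown in Fig. 6.

```python
# Illustration of rules/engine separation: a hypothetical XML rule
# document is parsed into parameters that a negotiation algorithm
# could consume. The schema here is invented for the sketch.
import xml.etree.ElementTree as ET

rule_xml = """
<negotiationRules type="auction">
  <choreography kind="english"/>
  <pricing scheme="second-price"/>
  <round inactivityTimeout="60"/>
</negotiationRules>
"""

def load_rules(text):
    """Extract the economic parameters from the rule document."""
    root = ET.fromstring(text)
    return {
        "type": root.get("type"),
        "choreography": root.find("choreography").get("kind"),
        "pricing": root.find("pricing").get("scheme"),
        "timeout": int(root.find("round").get("inactivityTimeout")),
    }

print(load_rules(rule_xml))
```

The point of this arrangement, as the paper notes, is that a new type of negotiation can be introduced by writing a new rule document and algorithm without touching the generic software container.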
It is a server component written in Java 1.2 as an Enterprise JavaBeans (EJB) application, using a relational database management system (RDBMS) to store information. The various elements of this architecture are shown in Fig. 5. Each negotiation is treated as a document, and any interaction is a manipulation of that document. This approach supports the use of scripts, since the internals of the system are open and accessible. In order to facilitate the handling of these internal documents, two API levels, together called the Negotiation Toolkit (NTK), are provided. The first level,
Fig. 5. GNP architecture.
referred to as NTK-1, contains the storage management functions, such as loading, searching, and saving information about a negotiation. The second level, denoted NTK-2, provides objects and functions frequently used by the game designer, such as negotiation and bid [24]. Another advantage of this approach is that the use of XML for negotiation information exchange helps to build more responsive user interfaces, since the most recent generation of web browsers can process and format XML documents. In this case, a small Java applet located in the player's browser could listen for events on the EJB server (using RMI or CORBA) and, when necessary, update just a part of the DOM tree contained in the browser, without having to do a complete web page reload [24].

4.4. Negotiation documents

At this time, GNP has seven negotiation documents, distributed as follows: three input documents, one internal document and three output documents [25]. The input documents are:
Rules: contains all the constant information for the negotiation. It is read only after the creation of the negotiation.
Product reference: contains references and information concerning the products on which the negotiation is taking place.
Order: keeps the data sent by a negotiator. Non-relevant attributes are filtered out to keep only the major components of an order: name, product reference, price and quantity.

The internal document, called Negotiation, keeps all the important information of an active negotiation. After each round, this information is updated. Only the auctioneer can see this document. The output documents are:
Quote: contains all the public information that each participant can see (such as the price to beat and the expiration time of a round). This document is built by the auctioneer after each round.
Response: holds private information for each participant. The auctioneer creates it after receiving an order.
Adjudication: at the end of the negotiation, the auctioneer creates this document to preserve important details of the negotiation, such as the buyers' and sellers' orders.

4.5. Negotiation steps

The creation of an XML document, the announce, initializes a negotiation on GNP. It contains all the elements necessary to launch a negotiation. Various models of negotiation exist on GNP. A participant wishing to initialize a negotiation has only to choose the appropriate model by filling in the corresponding HTML announce form. This enables him to define the parameters of the selected model and guarantees flexibility of implementation. Once completed and sent, the announce document is instantly created. The following steps summarize the initialization sequence of a negotiation:
1. Choice of an announcement model by the participant.
2. Filling in and sending the corresponding form.
3. Creation of the announce document in XML.

The announce document is divided into three parts: rules, productReference and order. The rules part establishes the fixed parameters of the negotiation, such as the closing and opening hours. The productReference part defines the nature and the description of the product to be negotiated. The order part contains the data of the initial setting made by the participant, such as the starting price and the quantity of the product [25]. At this point, the announce session will create the objects
Fig. 6. Structure of an XML economic rule document.
rules, negotiation, productReference and order with their corresponding XML documents.

4.6. Economic rules

Economic rules are also handled in XML documents. An example of the structure of these documents is shown in Fig. 6. GNP can adopt several auction rules. The system is built around the following economic concepts [25]:

Forward, reverse and double-sided auctions.
Negotiation conducted through phases and rounds: each round is a sequence of three basic events: a quote, bids from the participants and the auctioneer's evaluation. The next round starts with a quote showing the result of the previous round.
Multiple choreographies:
  English auction: ask for a new price at each round, receive one or more bids in each round and close after an inactivity period.
  Dutch auction: ask for a new price at each round, and close after receiving one bid.
  Cadenced auction: ask for a new price at each round and close when nobody follows.
  Synchronous auction: use the same choreography for driving the negotiation on related items.
Markets with quality factors, complex scoring and ranking: when the quality of the items and of the companies involved in a negotiation varies, market makers, buyers or sellers integrate quality factors into the process and build a scoring function for ranking the orders.
Multi-unit auctions.
Multi-item, multi-attribute and synchronized auctions.

Only the closed auction and the open auction at the second price are considered in our study. A closed auction (at the second price) is a process through which the participants are invited, during a single round, to submit their bids without knowing those of their competitors. The adjudication is created when the end of the round is reached. It is declared when all orders of the participants are
collected (according to the number of orders indicated in <waitForOrder>), or when the time planned for the bidding has elapsed (according to the time expressed in seconds, or in the form of a date, in <durationTimeout units="S">). A value of -1 means that the corresponding option is not considered; in this case, the phase finishes when the single round comes to its end. The parameter <waitForOrder> specifies the number of orders that the negotiation must accept before closing. The winner of this auction is the participant with the best limit price; he obtains the second-best price.

An open auction (at the second price) is a process through which the participants are invited to give a better price than the last bid. Participants must therefore be regularly informed about the bids already submitted. Each new bid generates a new round. The parameter <increment> in <allocationParameters> increments the selling price by 1 after each bid. The end of a round is announced with the first received bid, while the end of the phase is announced when the value of the parameter <durationTimeout> or <instantTimeout> is reached. The adjudication is created once the delay of the auction has expired. The leader of this auction, who has the best limit price, obtains the second-best price plus or minus the increment, according to the direction of the market.
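The second-price rule shared by both auction types can be sketched as follows. This is a simplification for illustration only: GNP's actual evaluation code is not shown here, and the increment handling of the open auction is omitted.

```python
def second_price_winner(bids):
    """bids: list of (participant, limit_price) pairs.
    The highest limit price wins, but the winner pays the
    second-highest price (Vickrey rule). With a single bid,
    the winner pays his own price."""
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    winner, _ = ranked[0]
    price_paid = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price_paid

print(second_price_winner([("alice", 120), ("bob", 150), ("carol", 100)]))
# -> ('bob', 120)
```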
5. Our approach

Considering the complexity and diversity of system performance testing, and the fact that GNP was under construction when we ran our tests, we limited our study to a quite precise direction: evaluating the behavior of the current version of the server using a simple approach. To this end, we defined our parameter and objectives in a simple way that respects the approach common to all performance projects, already evoked in Section 3. The response time, the most important parameter for distributed systems, was chosen as the metric of this study. It is defined as "the total individual response time of each transaction divided by the number of transactions during the measurement interval" [15]. The response time does not take into consideration human interaction times, such as key time and think time. It plays a crucial role in the success of EC applications. Indeed, following the fluctuations of the market and participating efficiently in an EC application are directly affected by the capacity of the server to handle requests, and by its speed in retrieving and displaying the requested information. The response time becomes critical when the number of customers increases considerably. The unit chosen to represent the response time is the second (s). Our test strategy primarily aims to study:

1. The maximum number of users that GNP can handle before a considerable deterioration in its response time is detected.
2. The variation of the response time according to the number of users and the type of the considered transactions.
3. The effect of the auction rule on the response time of the server. Two auction rules are retained: the closed and open auctions at the second price.

In general, the purpose of performance tests is to study the behavior of a system under heavy loads, and particularly its response time from the user's viewpoint. In this context, and given the current status of GNP, we considered its six most important functional transactions:

1. submitAnnounce(): initializes an announce and creates a negotiation.
2. submitOrder(): submits an order.
3. getOrder(): displays an order already submitted by a user.
4. getOrdersForProductReference(): fetches the reference of the requested product.
5. getResponsesDocumentForOrder(): regularly indicates the status of a negotiation.
6. deleteOrder(): used by the administrator to remove an announce.
5.1. Considered scenarios

For each of the two auction rules considered, the following test scenarios were conceived. To determine the maximum number of orders that a negotiation can support before a significant deterioration of the response time appears, the tests of group A1 were launched. The tests of group A2 aim at determining the effect of the six transactions combined on the variation of the average response time of the server. In order to interpret the results obtained in A2, it was necessary to understand the effect of each transaction, taken individually, on the average response time of the server; this motivated the set of tests of group A3. After collecting the results of group A3, we judged it important to consider a more realistic and significant combination of these transactions. The absence of historical usage data pushed us to adopt a distribution that we believe is close to reality. The stages of group A4 are identical to those of group A2, except that the proportions of the transactions respect this distribution. Let us note that:

The closed and open auctions at the second price will be referred to simply as the closed and open auctions.
The letter A represents the economic algorithm related to a closed auction, and the letter B that related to an open auction.
5.3. Benchmark architecture
Table 1
List of scenarios

Series      Description
A1 and B1   Studying the average response time of the server for N orders during one negotiation. N varies between 5 and 50.
A2 and B2   Studying the variation of the average response time of the server for N transactions chosen in an arbitrary way. N varies between 5 and 50.
A3 and B3   Studying the variation of the average response time of the server for each type of existing transaction. N varies between 5 and 50.
A4 and B4   Studying the variation of the average response time of the server for a realistic distribution of the different existing transactions. N varies between 5 and 50.
The average response time is given as the total individual response time of each transaction divided by the number of transactions during the measurement interval.
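Both this average and the coefficient of variation used later to check the significance of the results can be computed as in the following small sketch (population statistics are assumed, since each measurement interval yields a complete set of transaction times):

```python
import statistics

def summarize(times):
    """Return the average response time, the standard deviation and the
    coefficient of variation (std/mean) for one measurement interval."""
    mean = sum(times) / len(times)
    std = statistics.pstdev(times)  # population standard deviation
    return mean, std, std / mean

mean, std, cov = summarize([1.0, 2.0, 3.0])
print(round(mean, 3), round(std, 3), round(cov, 3))  # -> 2.0 0.816 0.408
```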
5.2. Test environment

Our tests were executed in the Laboratoire Universitaire de Bell (LUB). The machines used were Pentium III, 733 MHz computers with 128 MB of RAM. The operating system was Linux and the database system was PostgreSQL. In order to minimize the effect of transmission time over the Internet, we used a closed local area network. The influence of the network transmission time on the server is not a goal of this study; however, at a more advanced stage, its effect on the average response time of the server could be an interesting parameter to evaluate.
Fig. 7. Benchmark architecture.
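The thread-per-participant measurement scheme used by our scripts (one request per thread, timed individually, then averaged) can be approximated in plain Python as follows; `send_request` is a placeholder standing in for a real GNP transaction call, which is not shown here:

```python
import threading
import time

def run_benchmark(send_request, n):
    """Launch n concurrent 'participants'; each sends one request and its
    elapsed response time is recorded. Returns the average response time,
    i.e. the sum of the elementary times divided by n."""
    results = [0.0] * n

    def worker(i):
        start = time.perf_counter()
        send_request()                      # one transaction against the server
        results[i] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results) / n

# Demo with a stub standing in for a real GNP transaction:
avg = run_benchmark(lambda: time.sleep(0.01), 10)
print(f"average response time: {avg:.3f} s")
```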
Taking into consideration the steps to follow in characterizing a representative workload, mentioned in Section 3.2, and the attributes of a useful benchmark, evoked in Section 3.3, we conceived a script for each scenario and created our own benchmark. Scripts were written in the JPython language. Compatible with Java, JPython is an easy and efficient scripting language. Each script creates and launches N threads (Table 1). Each thread corresponds to a transaction or a participant: it sends a request to the server, waits for the reply, and measures the time the server took to answer. This time is preserved in a result file. When all the threads have finished their jobs, the average of all these elementary response times is calculated by dividing their sum by the number (N) of transactions. Let us note that this average is significant because all the elements are homogeneous. The general architecture of the benchmark is represented in Fig. 7.

6. Results and interpretations

After launching our scripts, we obtained a set of results. In order to verify their significance, we calculated the standard deviation and the coefficient of variation for each average obtained. The role of the coefficient of variation is to indicate the magnitude of the standard deviation with respect to the mean. If the ratio is not too high, we can consider that the values are acceptable and significant; otherwise, there is an important variation between the values, which means that the mean does not reflect the real situation. We obtained the following results.

6.1. Series A1 and B1

In Fig. 8(a), the average curve of series A1 shows that the average response time of the server increases quickly at the beginning, reaching 2.784 s with 25 participants. Once this threshold is exceeded, the steepness of the slope tends to decrease.
This is probably due to the effect of the cache memory: after a certain time, part of the required information is preserved in the cache, so the server takes less time to reach it and therefore wastes less time answering requests. This could partly explain the flattening of the curve. This state of stability changes with 45 participants (or orders): at this stage, one notices a clear increase in the average response time, which suddenly reaches 3.954 s. This reflects the tendency of the response time to increase in a periodic way. The behavior of the average curve of B1 does not differ much from that of A1; here too, we can see that the average response time tends to increase periodically. If we suppose that a reasonable average time to submit an order is around 3.5 s, we could say that GNP can handle
Fig. 8. Variation of the average response time of series A1 and B1.
simultaneously 40 orders before the response time starts to become long for a user. However, even if the variation of the two curves seems identical, the margins of the standard deviation of the two curves do not enable us to confirm which of the two economic algorithms is better in terms of response time. Both of them, however, reflect the non-stability of the server: sometimes the growth of the average response time is fast, and sometimes it is slow (Table 2).

6.2. Series A2 and B2

The variation of the A2 and B2 curves (Fig. 9) can only confirm the idea that there is a big difference, in terms of
Table 2
Results of series A1 and B1 (times in seconds)

Step    Users    Series A1                          Series B1
number           Avg      Std dev   Coef. var.      Avg      Std dev   Coef. var.
1       5        1.825    0.476     0.261           1.593    0.487     0.306
2       10       1.956    0.477     0.244           1.797    0.613     0.341
3       15       2.107    0.683     0.324           2.194    0.666     0.304
4       20       2.323    0.696     0.299           2.468    0.675     0.274
5       25       2.784    0.750     0.269           2.562    0.719     0.281
6       30       3.055    0.901     0.295           2.820    0.891     0.316
7       35       3.278    0.928     0.283           3.125    0.964     0.309
8       40       3.356    0.956     0.285           3.301    1.053     0.319
9       45       3.954    1.022     0.258           3.754    1.077     0.287
10      50       4.291    1.299     0.303           4.527    1.211     0.268
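As a quick sanity check of the 3.5 s comfort threshold assumed earlier, the load at which the reported A1 averages first exceed it can be located programmatically (the data below are simply transcribed from Table 2):

```python
# Average response times of series A1 (Table 2), for 5..50 users in steps of 5.
USERS = list(range(5, 55, 5))
A1_AVG = [1.825, 1.956, 2.107, 2.323, 2.784, 3.055, 3.278, 3.356, 3.954, 4.291]

def first_over(threshold, users, avgs):
    """Smallest measured load whose average response time exceeds the threshold,
    or None if the threshold is never exceeded."""
    for n, avg in zip(users, avgs):
        if avg > threshold:
            return n
    return None

print(first_over(3.5, USERS, A1_AVG))  # -> 45
```

This is consistent with the observation that GNP remains comfortable up to about 40 simultaneous orders.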
Fig. 9. Variation of the average response time of series A2 and B2.
average response time, between the various considered transactions; the chaotic variation of the two curves confirms it. Consequently, no consistent analysis or significant comparison can be made at this stage, especially since the differences between the obtained values are rather large. The series of tests A3 and B3 contain a detailed description of the variation of the average response time of the server for each considered function and can, consequently, better inform us on how each function affects the behavior of the server (Table 3).

6.3. Series A3 and B3

At this level, we studied the variation of the average response time of each considered transaction. We observed a large difference between the execution times of these
Table 3
Results of series A2 and B2 (times in seconds)

Step    Users    Series A2                          Series B2
number           Avg      Std dev   Coef. var.      Avg      Std dev   Coef. var.
1       5        1.025    0.585     0.570           1.226    0.577     0.471
2       10       1.158    0.676     0.583           1.024    0.669     0.653
3       15       1.364    0.791     0.580           1.187    0.758     0.639
4       20       1.125    0.924     0.821           1.627    0.856     0.526
5       25       1.569    1.074     0.684           1.456    1.132     0.778
6       30       1.651    1.260     0.763           1.858    1.060     0.571
7       35       1.449    1.258     0.868           1.637    1.178     0.720
8       40       1.937    1.338     0.691           1.857    1.218     0.656
9       45       1.714    1.327     0.774           2.157    1.190     0.552
10      50       2.121    1.504     0.709           1.905    1.310     0.688
Fig. 10. Variation of the average response time of series A3.1 and B3.1.
transactions. For example, the submitAnnounce() transaction had the highest average response time, peaking at 11.375 s for the closed auction and 12.152 s for the open auction (Fig. 10), whereas the getOrdersForProductReference() transaction had the lowest average response time, peaking at 0.421 s for the closed auction and 0.446 s for the open auction (Fig. 11). The curves of series A3 and B3 confirm two facts:

There is a considerable difference between the execution times of the different tested transactions. Indeed, when the number of requests reaches 50, the submitAnnounce() and submitOrder() functions take execution times close to 12 and 4.5 s, respectively, while the execution time of the other functions does not exceed 0.7 s. The proportion of these two functions relative to the others could therefore considerably affect the response time of the server: the higher this proportion, the longer the response time, and vice versa.

In spite of the margins of the average response times obtained, there is an interesting conformity in the behavior of the server with respect to the two adopted auction rules, and this for each selected function. This phenomenon could be explained by the fact that the execution of these two algorithms affects the process of a negotiation from the user's point of view but not from the server's point of view. The number of rounds, for example, changes from one type of auction to the other but, in both cases, the execution time of an order or an announce, under the same conditions, will be almost the same for the server. This result could be verified through simulation.

In order to have a general idea about the behavior of the server in the field, we considered a case, which we believe is close to reality, in which the various functions are suitably distributed so as to obtain a significant distribution. In the absence of any historical data, we chose the following distribution:

submitAnnounce(): 10%
submitOrder(): 20%
getOrder(): 10%
getOrdersForProductReference(): 23%
getResponsesDocumentForOrder(): 35%
deleteOrder(): 2%

We made the following assumptions:

1. The number of orders is twice the number of announces.
2. The number of getOrder() calls is related to the number of
Fig. 11. Variation of the average response time of series A3.4 and B3.4.
orders: we suppose that for every two orders the user makes only one getOrder().
3. For each order, there are several responses informing the user of the current state of the negotiation; we assume 1.75 calls to getResponsesDocumentForOrder() per order.
4. The function deleteOrder() is used, in general, by the administrator, and the probability that it will be called is very low; that is why we assign it a weight of 2%.
5. The function getOrdersForProductReference() is related to the number of announces. An announce can contain several products; we suppose that, on average, each announce contains two products (Tables 4 and 5).
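This assumed mix can be turned into a synthetic workload by weighted sampling, e.g.:

```python
import random

# Assumed transaction mix from the study (weights sum to 100).
MIX = {
    "submitAnnounce": 10,
    "submitOrder": 20,
    "getOrder": 10,
    "getOrdersForProductReference": 23,
    "getResponsesDocumentForOrder": 35,
    "deleteOrder": 2,
}

def sample_workload(n, seed=None):
    """Draw n transaction names according to the assumed distribution."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = list(MIX.values())
    return [rng.choices(names, weights=weights)[0] for _ in range(n)]

print(sample_workload(5, seed=1))
```

Each sampled name would then be mapped to the corresponding request in the benchmark script.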
6.4. Series A4 and B4

With this distribution, we notice that the impulsive behavior of the server remains unchanged (Fig. 12).
Table 4
Results of series A3.1 and B3.1 (times in seconds)

Step    Users    Series A3.1                        Series B3.1
number           Avg      Std dev   Coef. var.      Avg      Std dev   Coef. var.
1       5        4.861    0.822     0.169           5.162    0.858     0.166
2       10       5.405    0.795     0.147           5.602    1.005     0.179
3       15       5.957    0.982     0.165           5.767    1.204     0.209
4       20       6.571    1.146     0.174           6.780    1.133     0.167
5       25       7.062    1.093     0.155           7.425    1.065     0.143
6       30       7.728    1.099     0.142           7.815    1.017     0.130
7       35       8.064    1.079     0.134           7.950    1.159     0.146
8       40       8.437    1.233     0.146           8.097    1.377     0.170
9       45       9.324    1.436     0.154           9.565    1.320     0.138
10      50       11.375   1.659     0.146           12.152   1.786     0.147
Fig. 12. Variation of the average response time of series A4 and B4.
However, starting from 40 requests, the average response time starts to increase exponentially, which indicates a kind of saturation of the server. Another outstanding element of these results is that the server's average response time slightly exceeded the bar of 2.5 s. If we admit that "a widely accepted rule of thumb puts the waiting threshold for average users at 8 s" and that "a Web page that takes any longer than eight seconds to load is at high risk of losing its audience" [28], we can say that these results are acceptable for a user who is not too impatient. However, if we take into consideration the transmission time through the network, this average time risks exceeding the acceptable delay for a participant. In spite of the presence of several types of transactions in these series of tests, the variation of the average response time is almost the same for the two applied economic algorithms. This pushes us to believe that, even if the margins of the standard deviation do not prove it, neither of our two auction rules permits a better performance of the server (Table 6). From what precedes, and within our
Table 5
Results of series A3.4 and B3.4 (times in seconds)

Step    Users    Series A3.4                        Series B3.4
number           Avg      Std dev   Coef. var.      Avg      Std dev   Coef. var.
1       5        0.075    0.025     0.327           0.078    0.025     0.326
2       10       0.105    0.040     0.383           0.120    0.039     0.327
3       15       0.152    0.049     0.319           0.144    0.054     0.378
4       20       0.196    0.069     0.351           0.187    0.073     0.392
5       25       0.236    0.079     0.334           0.258    0.089     0.346
6       30       0.283    0.097     0.343           0.291    0.104     0.359
7       35       0.307    0.113     0.368           0.323    0.116     0.360
8       40       0.328    0.124     0.378           0.347    0.131     0.377
9       45       0.364    0.138     0.380           0.372    0.129     0.346
10      50       0.421    0.145     0.344           0.446    0.136     0.306
Table 6
Results of series A4 and B4 (times in seconds)

Step    Users    Series A4                          Series B4
number           Avg      Std dev   Coef. var.      Avg      Std dev   Coef. var.
1       5        0.602    0.105     0.174           0.526    0.098     0.186
2       10       0.726    0.140     0.193           0.659    0.118     0.179
3       15       0.832    0.178     0.214           0.892    0.193     0.216
4       20       1.018    0.288     0.283           1.095    0.205     0.187
5       25       1.097    0.247     0.225           1.165    0.223     0.191
6       30       1.192    0.314     0.263           1.283    0.293     0.229
7       35       1.470    0.302     0.205           1.373    0.319     0.232
8       40       1.591    0.366     0.230           1.498    0.359     0.240
9       45       1.898    0.563     0.296           1.943    0.560     0.288
10      50       2.412    0.605     0.251           2.522    0.583     0.231
considered test context, we can draw the following conclusions:

1. The server tends to become saturated when the number of users exceeds 40.
2. The two economic algorithms used do not considerably affect the behavior of the server.
3. The response times of the submitAnnounce() and submitOrder() transactions are rather high compared with those of the other considered functions. This is probably due to the number of XML documents they handle during their execution. Consequently, their proportions relative to the other types of transactions could affect the behavior of GNP. The performance of the server, in terms of response time, is therefore acceptable when the negotiation does not contain a high proportion of these two transactions.
7. Conclusion

Very often, performance tests of servers focus on their general aspects and neglect to study the effect of their elementary transactions. However, these transactions and functions constitute the core and the basis of the general behavior of servers and can affect them in different ways. This study showed that, when there is a big difference between the execution times of the various existing transactions, the average response time of the server depends on the distribution of the transactions to be treated. It would be interesting if testers were able to estimate, in advance, the number and the proportion of each type of request that the server should handle. This could ensure fast data processing and help to find effective solutions to avoid, as much as possible, the saturation of servers. In this context, in a distributed system it would be interesting, for example, to create intelligent agents whose role would be to analyze each request and convey it to the appropriate server. It is important that these routing decisions take into consideration the delay of information transmission through the Internet, as well as the peak hours during a day.

This study could be completed in different ways. It would be interesting, in order to follow the behavior of the server periodically, to create a test generator that could be launched regularly and execute the scenarios conceived by analysts. These tests could be enriched by considering other parameters and metrics, such as the throughput, the CPU processing time and the capacity of the cache memory. Certainly, this test generator could also contain several other kinds of tests, such as functionality, conformance or security tests. It is also important, especially for EC applications, to study the effect of the adopted business rules on server performance. Even if our study did not reveal a significant influence of the considered auction rules on GNP, it is important to take into consideration their influence on each transaction, especially since the time factor plays a major role in auctions and EC applications. The best way of testing these kinds of applications is simulation, which permits measuring and better understanding the role of the time factor in various situations. A High Level Architecture (HLA) simulator [26] is well placed to simulate this kind of application, since it allows the representation and synchronization of time. Finally, server performance testing is far from being perfect: the need to create and find effective, reliable, significant and revealing test methods is more necessary now than ever before.

Acknowledgements

Thanks to Robert Gérin-Lajoie, who provided us with access to the server and all the hardware necessary to successfully complete our testing. This research project was sponsored by Bell Canada.
References

[1] V. Ahuja, Secure Commerce on the Internet, AP Professional, Academic Press, New York, 1997.
[2] R. Clark, EDI is but one element of electronic commerce, in: Proceedings of the 6th International EDI Conference, Bled, Slovenia, June 1993.
[3] R. Kalakota, A.B. Whinston, Frontiers of Electronic Commerce, Addison-Wesley, New York, 1996.
[4] P. John, J. Baron, M. Shaw, A. Bailey, Web-based e-catalog systems in B2B procurement, Communications of the ACM 43 (5) (2000).
[5] S. Goldman, Goldman Sachs Investment Research, 1999.
[6] IBM Electronic Commerce Tutorial, Web ProForums, October 2000. http://www.webproforum.com/e_commerce/tpic05.html
[7] B. Carrie, S. Arie George, Electronic Negotiation through Internet-based Auctions, CITM Working Paper 96-WP-1019, University of California, Berkeley, 1996.
[8] W. Vickrey, Counterspeculation, auctions, and competitive sealed tenders, The Journal of Finance (1961) 9-37.
[9] P. Milgrom, Auctions and bidding: a primer, Journal of Economic Perspectives (Summer 1989) 3-22.
[10] V. Smith, Auctions, in: J. Eatwell, M. Milgate, P. Newman (Eds.), The New Palgrave: A Dictionary of Economics, vol. 1, The Stockton Press, New York, 1987.
[11] R.P. McAfee, J. McMillan, Auctions and bidding, Journal of Economic Literature (June 1987) 699-738.
[12] P. Milgrom, R.J. Weber, A theory of auctions and competitive bidding, Econometrica (September 1982) 1089-1122.
[13] R. Wilson, Strategic analysis of auctions, in: R. Aumann, S. Hart (Eds.), Handbook of Game Theory with Economic Applications, vol. 1, Elsevier Science Publishers, Amsterdam, 1992.
[14] M. Kumar, S.I. Feldman, Business Negotiations on the Internet, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY.
[15] D. Jutla, P. Bodorik, Y. Wang, Developing Internet E-Commerce Benchmarks, Saint Mary's University and DalTech, Dalhousie University, Halifax, Nova Scotia, Canada, August 1999.
[16] D. Jutla, P. Bodorik, C. Hajnal, C. Davis, Making business sense of electronic commerce, IEEE Computer 32 (3) (1999) 67-75.
[17] D.N. Jutla, P. Bodorik, Y. Wang, Developing Internet e-commerce benchmarks, Information Systems Journal 24 (6) (1999) 475-493.
[18] D.A. Menascé, V.A.F. Almeida, L.W. Dowdy, Capacity Planning and Performance Modeling: From Mainframes to Client-Server Systems, Prentice Hall, Upper Saddle River, NJ, 1994.
[19] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Digital Equipment Corporation, Littleton, MA, 1991.
[20] J. Gray, The Benchmark Handbook for Database and Transaction Processing Systems, 2nd ed., Morgan Kaufmann, San Mateo, CA, 1993.
[21] J.P. Buzen, A.N. Shum, Beyond bandwidth: mainframe style capacity planning for networks and Windows NT, in: Proceedings of the 1996 Computer Measurement Group (CMG) Conference, Orlando, FL, 8-13 December 1996, pp. 479-485.
[22] C. Rose, A measurement procedure for queueing network models of computer systems, Computing Surveys 10 (3) (1978).
[23] P. Heidelberger, S. Lavenberg, Computer performance evaluation methodology, IEEE Transactions on Computers C-33 (12) (1984).
[24] M. Benyoucef, R.K. Keller, S. Lamoureux, J. Robert, V. Trussart, Towards a generic e-negotiation platform, in: Proceedings of the Sixth International Conference on Re-Technologies for Information Systems, Zurich, Switzerland, February 2000, Austrian Computer Society, pp. 95-109.
[25] R. Gérin-Lajoie, V. Trussart, S. Lamoureux, I. Therrien, GNP V1.0: Architecture Document, CIRANO, July 2000, v0.2.
[26] F. Kuhl, R. Weatherly, J. Dahmann, Creating Computer Simulation Systems: An Introduction to the High Level Architecture, Prentice Hall, Englewood Cliffs, NJ, 1999.
[27] K. Saleh, R. Probert, Issues in Testing Electronic Commerce Systems, University of Ottawa.
[28] W. Craig, Understanding Web Site Speed and Reliability, Progressive Strategies, Inc., New York, 2000.
[29] Whatis.techtarget.com.