Computer Networks 33 (2000) 179–195 www.elsevier.com/locate/comnet
An interchange format for cross-media personalized publishing
Patrick van Amstel, Pim van der Eijk *, Evert Haasdijk, David Kuilman
Cap Gemini Nederland BV, Daltonlaan 400, P.O. Box 2575, 3500 GN Utrecht, Netherlands
Abstract
Web sites are rapidly becoming the medium of choice for one-to-one marketing, communication and commerce. Many commercial solutions in this area share two drawbacks: they force companies to implement systems within a single, highly vendor-specific framework, and they do not allow content to be reused for other media. In this paper, we introduce i*Doc, a simple XML interchange format for content-level conditionalization based on a variant of the MIL-PRF-87269 standard for class IV–V IETMs. This format can serve as an integration format in multi-vendor CRM solutions and offers consistent cross-media publishing to multiple lower-level delivery channels such as direct mail, ASP, JSP, and WML. Personalization is driven by properties that can be bound to intelligent external systems and determined dynamically. As a showcase for i*Doc, we have developed a demonstrator of an on-line wine shop, where i*Doc serves to transport information between a database of product descriptions and generated ASP pages. The Web site is highly dynamic, as its behavior is controlled by properties that are re-computed using predictive models generated by the OMEGA predictive data mining (PDM) system. The use of i*Doc allows content to be rapidly retargeted towards other delivery platforms, such as JSP, direct mail or mobile Internet. © 2000 Published by Elsevier Science B.V. All rights reserved.
Keywords: Customer relationship management (CRM); Electronic commerce; Extensible markup language (XML); Interactive electronic technical manuals (IETM); Personalization; Predictive data mining (PDM)
* Corresponding author. E-mail: [email protected]
1389-1286/00/$ – see front matter © 2000 Published by Elsevier Science B.V. All rights reserved. PII: S1389-1286(00)00049-9
1. Introduction
Customer relationship management (CRM) concerns acquiring, establishing and retaining a mutual business relationship based on knowledge the company has acquired from customer behavior, preferences and response. Coupling knowledge of company processes to insight into customer behavior is a key factor in establishing effective communication, transaction and processing of customer sessions. Knowing, on the basis of data mining techniques and predictive modeling, how a customer wants to be treated and what triggers interest is an important ingredient of CRM. Making connections to back-office sales strategies and content repositories completes the opportunity to build a true one-to-one experience with the customer.
The World Wide Web is rapidly becoming a medium of choice to achieve personalized marketing and commerce. Within a company, one-to-one communication affects many departments and is being addressed at many organizational levels, ranging from strategic sales, marketing analysis and decision making down to implementing their implications for back-end information systems and front-end Web engineering. A range of commercial products are positioned as frameworks for development of one-to-one Web communication solutions. In Section 3, we will provide a brief overview of the technical approach shared by some of these products, and will argue that they have several important drawbacks.
Reaching the users of tomorrow will not be restricted to the means we know today; more likely, different media will be used transparently, depending on the situation the user is in. One could argue that communication will be driven by the requirements of the customer, not by the means of communication. Ideally, information interchange will adapt to the needs of the user, even if the user is not aware of it. As an attempt to provide for this concept, we have developed i*Doc, an XML-based format that encodes a snapshot of a company’s portfolio and marketing strategy. i*Doc is heavily based on approaches developed for interactive electronic technical manuals (IETM), as discussed in Section 4.
In contrast to existing approaches, i*Doc is neither a marketing data management solution nor a delivery format but an interchange format. Delivery formats like Microsoft Active Server Pages (ASP) or JavaServer Pages (JSP) can be generated fully automatically, thus offering significant flexibility in deployment options at potentially lower cost. As an XML-based interchange format, i*Doc offers a simple interface to integrate information systems. Finally, i*Doc offers simple integration with intelligent external customer modeling systems. As a particularly relevant example of this, we discuss the OMEGA predictive data mining system [9], which can be used to generate intelligent models of customer behavior.
To demonstrate the capabilities of i*Doc, we have developed a sample E-commerce site for on-line personalized wine selling. In Section 5, we will discuss the motivation for this showcase, and discuss how i*Doc content is generated from a wine product database, transformed to ASP using an i*Doc compiler, and combined with OMEGA-derived models to offer intelligent personalization.
2. CRM and content management
Control is key to providing the conditions necessary to meet the high and diverse demands we place on information. Content management (CM) is often mentioned in this context to scope the functional domain of managing information in chunks (or components), irrespective of its purpose or use later in the (IT) life-cycle. Bridging the gap between customer expectation and business response to individual needs and circumstances can only be properly addressed with the following prerequisites:
• a content management system enabling control over arbitrarily fine-grained information units (CM);
• user profiling through real-time feedback and off-line feedback (e.g. through predictive data mining);
• business rules that capture supply and demand mechanisms (e.g. a customer profile will match if a product has the right pricing);
• a model of user interaction in a framework that makes profiling, coupling of business rules and content collation possible in a consistent manner.
What is needed is the capability to automate the process of collecting and collating information from all the operational systems that manage interaction with the customer, such as front-office sales automation systems, call centers (including telesales and telemarketing), order processing, customer support, shipping, etc. A standard to define the relationships and enable interchange between systems is of paramount importance. XML seems the most likely contender to address this need [3]. A content management system based on XML offers an environment in which information components can be maintained at an arbitrarily fine-grained level, so that components can be addressed, queried and retrieved, and ‘fused’ with business rules. Dynamic and user-driven communication can be further extended with the use of predictive data mining (PDM). Results of interaction are constantly updated through the CRM life-cycle (knowing, targeting, selling and designing) and are stored or routed to business logic rules.
One can argue that modeling of business rules and content matter on the component level constitutes the required business intelligence and technological basis for CRM. This paper will argue that it is necessary to encode business rules, objects, logic and content in one platform-independent format: XML. Fig. 1 shows how one-to-one marketing is enabled by acquiring knowledge about customers and managing content as components that can be assembled according to the profile of the individual customer.
Fig. 1. Pre-requisites for one-to-one marketing.
3. Current approaches
Systems that are engineered to establish a one-to-one relation with customers are usually based on application areas such as the Web, call centers, direct mail, etc. The type of media dictates the mix of ingredients and underlying architecture used within the CRM system. Databases are used for storing customer profiles, data-mining techniques are applied to detect patterns in customer behavior, filtering techniques are used for information dissemination, and scripting languages connect different information sources. There are also more monolithic systems that attempt to do all or most of these things. What is lacking is a consistent approach that is independent of application area and can be transposed to multiple application environments such as the Web, WAP and direct mail simultaneously.
3.1. Scope of personalized publishing
A number of personalization techniques can be used and related to business logic rules. These techniques have varying effectiveness depending on the situation and purpose. The following list gives an idea of possible personalization techniques:
• rules-based matching (club members, frequent visitors, Gold-card owners, etc.);
• matching agents (an established profile can be matched with other profiles displaying similar purchasing behavior);
• feedback and learning (fields of interest);
• community ratings (others help distinguish good from bad);
• attribute searches (all books with reduced prices);
• full-text search;
• collaborative filtering (feedback on products and services defines groups of individuals with similar interests).
Encoding the above techniques can be achieved in many ways, usually based on dedicated application software working on content fragments that supply transformed and converted content elements on the fly. Within the context of this discussion, we have decided to handle these techniques as external procedures that convey values to the core variables within an i*Doc (see Section 4.5).
3.2. Web-oriented systems
Today’s Web-oriented systems are tools that extend the functionality of Web server applications. Site management systems offer the interface between server-based repositories and controlled client Web delivery. Depending on the level of sophistication of the tool, site developers are able to build server-side applications that respond to client behavior and implement the calling of these ‘remote procedures’ from client Web pages. ASP and JSP technologies are widely used to encode intelligence within Web pages to deliver a personalized experience. Tools like BroadVision One-to-One [8] and Vignette StoryServer are examples of commercial products based on this concept. The approach both applications share is the separation of business rules and content and the invocation of these rules from Web pages. Fig. 2 gives an example of such code embedded in a Web page.
Some CD’s you might want to check out:
Artist Title | Label |
<%
content = matchObject.matchContent("match_rule", "MusicAdvice", "CDS", visitor, Session.Profile, 100);
if (content != null && content.length > 0) {
  var x;
  for (x = 0; x [...]" + item.get("TITLE") + " | ");
    Response.write("" + item.get("ARTIST") + " | ");
    Response.write("" + item.get("LABEL") + " | ");
  }
} else {
  Response.write("No content available! | ");
}
%>
Fig. 2. Example of BroadVision invocation of a business rule for personalized music advice.
Business rules are encoded as methods that use relational table definitions as parameters. Usually, a GUI generates a creation script, as shown in Fig. 3. The supplied scripting language can manipulate objects that have been defined in the ‘Business Manager’ workbench. These objects are user-defined entities that map to database records and fields. Setting business rules on business objects is handled in a proprietary fashion: constraints are defined describing boundaries that determine the content delivered to the user (or community). Matching agents use the rules (or sets of rules) to fill HTML templates for one-to-one publishing. Agents are implemented as server-side methods that are invoked from function calls that reside in Web templates. Within templates, ASP scripting (or JSP) is commonly used for arithmetic functions and iteration over stored objects. Direct access to services on the server makes this approach very efficient but also very dependent on the implementation of the data and application layers.
Fig. 3. A GUI enables the specification of business rules.
Two seemingly distinct sets of information, business rules and content components, are still tied together by application logic that maps directly into the tables of the business rules. In this sense, controlled content delivery is a matter of delivering templates with proprietary application logic that execute matching agents. If a certain business rule is changed, all occurrences of the code throughout the site have to be edited. Control over information is set at a level corresponding to the granularity of the database schemas. However, real-world applications require more control, for instance over ‘tone-of-voice’ and conditional texts. Creating support for these features in a relational database schema would prove intractable. Deeply nested recursive structures do not map well onto the relational paradigm.
The Web orientation implies the maintenance of a format that is at the end of its life-cycle. In order to adapt to future market change and communicate across media, it will be necessary to abstract from low-level access to information components, and switch to a more generic format for defining business intelligence. Another issue is the limited distinction between content, lay-out and business information in the aforementioned tools. Cascading style sheets (CSS) provide a clear separation of lay-out information and content elements, but have the limitation that style-sheet information can only be bound to elements within an Internet environment. Current personalized Web delivery tools can therefore be considered low-level tools that rely on trained staff to make changes to the system when business or market conditions demand it.
3.3. Conclusion
We have seen that current approaches account for conditionalization at the level of HTML templates rather than at the level of content, which causes high maintenance costs and rules out cross-media publishing. Second, systems integration is product-specific and costly. Third, customer modeling is largely based on static, handmade profiles rather than on dynamic models derived by mining the company’s warehouse of historical sales data. The challenge we are faced with is devising a format that can integrate multiple sources and serve multiple purposes in a uniform way. The format must also support extensibility: user behavior and tracking data must be fed back into the system, enabling a more knowledgeable communication with the customer.
The systems we discussed still rely on dedicated, proprietary architectures that make it difficult to leverage the effort of building a one-to-one system and to re-use that effort in multiple, future environments and applications. The imminent WAP revolution is a good example of the requirement to maintain business information and customer communication on a higher level for cross-media purposes. The i*Doc format proposes to capture and implement business rules on this level. The ability to use a higher-level vocabulary (e.g. isLoyalCustomer, likesNewWorldWine) and a higher-level abstraction over data sources (e.g. CustomerDatabase, LegalSite, PredictiveModelingTool, AccountStatements) is vital to make sure that content can be re-purposed to meet future business demands. Business rules, objects, application logic and related content should be made independent of technical issues to meet today’s and tomorrow’s fast-growing demands on information delivery. The XML interchange standard is designed for text encoding at any required level for generic purposes and can therefore be considered the technology of choice for encoding information for multi-purpose, cross-media delivery of content.
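The separation between a higher-level property vocabulary and the data sources behind it can be sketched as follows. This is a minimal illustration in Python and not part of i*Doc: the PropertyRegistry class, the provider functions, the sample data and the thresholds are all assumptions made for the example; only the property names isLoyalCustomer and likesNewWorldWine come from the text above.

```python
from typing import Callable, Dict, Optional

# A provider maps a customer id to True, False, or None (value unknown).
Provider = Callable[[str], Optional[bool]]


class PropertyRegistry:
    """Hypothetical lookup layer: content refers to property names only."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def bind(self, name: str, provider: Provider) -> None:
        # Rebinding a name replaces the data source without touching content.
        self._providers[name] = provider

    def value(self, name: str, customer_id: str) -> Optional[bool]:
        provider = self._providers.get(name)
        return None if provider is None else provider(customer_id)


# Assumed sample data standing in for a customer database and a predictive model.
ORDER_COUNTS = {"c1": 12, "c2": 1}
LOYALTY_SCORES = {"c1": 0.35, "c2": 0.80}


def loyal_by_order_count(customer_id: str) -> Optional[bool]:
    count = ORDER_COUNTS.get(customer_id)
    return None if count is None else count >= 10  # threshold is illustrative


def loyal_by_model_score(customer_id: str) -> Optional[bool]:
    score = LOYALTY_SCORES.get(customer_id)
    return None if score is None else score >= 0.5  # threshold is illustrative


registry = PropertyRegistry()
registry.bind("isLoyalCustomer", loyal_by_order_count)
static_view = registry.value("isLoyalCustomer", "c1")

# Later, the same property name is rebound to a model-based source.
registry.bind("isLoyalCustomer", loyal_by_model_score)
model_view = registry.value("isLoyalCustomer", "c1")
```

In this sketch, content that tests isLoyalCustomer never changes when the binding is swapped; an unbound property or an unknown customer simply yields None.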
4. The i*Doc document format
4.1. Objectives
Our interest in working on i*Doc is to improve development of intelligent Web-based systems that offer one-to-one communication and commerce to support CRM. The use of an XML-based interchange format allows us to avoid some of the shortcomings of existing systems identified in Section 3, as detailed below.
• CRM applications typically require integration of multiple existing back-end systems for managing content, inventory, marketing and customer data. i*Doc is an interchange format that can integrate information from multiple sources in a uniform and consistent way.
• i*Doc is based on XML. This means that standard XML interfaces like DOM [4] and SAX [22] can be used to develop software interfaces with back-end systems and publishing systems, and that i*Doc content can be converted using generic (license-free) XML transformation languages like Omnimark [26] or XSLT [11].
• Conditionalized constructs can be nested and can apply to any level of granularity, ranging from top-level elements down to individual characters in running text.
• Declarative specification of conditionalization offers flexibility in the choice of delivery platform. Using XML transformation tools, both conditional constructs and content can be transformed automatically to multiple delivery formats like HTML, ASP, JSP or WML, or used to control a more traditional hardcopy-based direct mail distribution channel. This enables companies to communicate consistently with their customers irrespective of channel.
• i*Doc only standardizes a conditionalization vocabulary; it does not prescribe any standard encoding for content. This means that a common i*Doc architecture can be used for vastly different types of content, provided that content is expressed in an XML format.
• Conditionalization information is encoded using an expression language that references properties associated with customers. On-demand, run-time evaluation of expressions allows personalization on the basis of very dynamic properties, such as user navigation history.
• Reference to properties in the i*Doc expression language is separated cleanly from binding of properties to external systems. This means that simple i*Doc-based systems that are based on relatively simple, fixed profiles can evolve into more complex systems where properties are (re-)computed dynamically by external intelligent systems, without the need to change i*Doc content.
In Section 5, we illustrate the use of i*Doc in a simple demonstrator system that features properties determined dynamically by an advanced predictive data mining system. A production system might keep track of hundreds of properties, many of which are customer-specific, and a significant subset of which are dynamically (re-)computed using intelligent external systems.
4.2. Interactive electronic technical manuals
The i*Doc concept is heavily based on results of developments in the context of interactive electronic technical manuals (IETM). IETM is a concept developed at the U.S. Department of Defense to support operation and maintenance of complex technical systems [17]. The DoD uses a scale of five classes of IETM systems to distinguish various levels of functionality offered. Classes I to III are basic electronically viewable documents, ranging from page-based display (class I), via electronically scrolling documents (class II), to linearly structured IETMs (class III).
Common delivery platforms for these are PDF or TIFF viewers for class-I IETMs (often scanned legacy documents), and HTML viewers for class-II and class-III IETMs (the latter with extensive use of frames and limited scrolling). Class IV–V IETMs offer high-end functionality described in the DoD MIL-PRF-87269 standard document. These classes require the use of SGML [19] as source format, use of databases for storage, and support for context-sensitive navigation and content delivery.
In the IETM context, context-sensitivity is important for various reasons. One application is to support situations where different system versions or configurations require different maintenance procedures. Another application is support for different types of users. Some IETMs offer multiple versions of particular information components for novice and expert users. The description for novice users may provide more details, introductory information, and instructions on when to call in expert help. The description for expert users can be more succinct or even be limited to a check list, and may describe alternative actions that require special expert skills. Via a user interface control, expert users can switch down to the detailed description, but novice users cannot switch up to the expert view. While the novice/expert distinction is often the only way to segment the user base of an IETM, the personalization mechanisms for context filtering offered in MIL-PRF-87269 are very generic. i*Doc applies these concepts to Web site personalization.
4.3. Building morphing Web sites using i*Doc
As a metaphor, morphing illustrates a process where content adapts dynamically to the individual who accesses the site, thus reducing the need to use hyperlink traversal or search engines to access relevant content and increasing the commercial interest of the site. The i*Doc format is intended to act as a flexible data format to encode content for morphing Web sites. Technically, the requirements on content for class IV–V IETMs are very similar to the requirements for morphing Web sites.
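As a rough illustration of why declaratively encoded conditions can be retargeted to several delivery formats (the flexibility claimed in Section 4.1), the following Python sketch renders one condition both as an ASP fragment and as a JSP fragment. It is not the i*Doc compiler: the Conditional class, the to_asp and to_jsp functions, the property name and the choice to read the property from the session object are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Conditional:
    """One declarative condition: a boolean property and two content branches."""
    prop: str
    when_true: str
    when_false: str


def to_asp(c: Conditional) -> str:
    # VBScript-flavoured ASP; reading the property from Session is an assumption.
    return (
        f'<% If Session("{c.prop}") = True Then %>{c.when_true}'
        f'<% Else %>{c.when_false}<% End If %>'
    )


def to_jsp(c: Conditional) -> str:
    # JSP scriptlet; doubled braces produce literal { and } in an f-string.
    return (
        f'<% if (Boolean.TRUE.equals(session.getAttribute("{c.prop}"))) {{ %>'
        f'{c.when_true}<% }} else {{ %>{c.when_false}<% }} %>'
    )


offer = Conditional(
    prop="highValueCustomer",  # illustrative property name
    when_true="<p>Special offer</p>",
    when_false="<p>Daily specials</p>",
)
asp_page = to_asp(offer)
jsp_page = to_jsp(offer)
```

The point of the sketch is that the condition is stated once, independently of any delivery platform, and each target format is produced by a separate, mechanical rendering step.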
We have been using the term i*Doc to refer to the use of content encoded using an XML version of MIL-PRF-87269 in combination with intelligent external systems. Apart from adapting the SGML specification to XML, i*Doc makes explicit use of three-valued logic and uses namespaces to merge content from multiple sources. The format also supports definition of access to ODBC [25] and external COM [24] objects.
The i*Doc format is a declarative XML-based vocabulary to express conditional content. Unlike HTML, i*Doc standardizes only a markup sublanguage: in itself, it provides only the limited set of constructs that express conditionalization. These elements need to be complemented with content-bearing elements. The distribution of i*Doc tags is governed by the i*Doc document type definition (DTD). This schema is a simplified version derived from the MIL-PRF-87269 IETM standard. Distribution of content elements is governed by application-specific content DTDs. i*Doc can be used with both higher-level content DTDs like DocBook [12] and lower-level DTDs like HTML [18,32] or WML [31]. Content and conditionalization can be separated cleanly using the namespace mechanism [7]. Fig. 4 shows an example of an i*Doc frame, as it might be displayed graphically in a (hypothetical) marketeer’s workbench. The i*Doc notation is shown in Fig. 6.
Fig. 4. An i*Doc flow-diagram.
4.4. The MIL-PRF-87269 language
MIL-PRF-87269 offers a standard language to express conditional context filtering [13,15]. It offers data structures corresponding to standard programming language constructs using an SGML-based syntax (in this paper converted to XML syntax for consistency). In this subsection, we will briefly summarize (a simplified variant of) this language.
Expressions can reference properties that can be typed as integers, strings or booleans. Constants can be defined and referenced using dedicated constant elements. Operators can be used to test property values or to construct complex expressions. There are integer comparison operators, such as greater-than and less-than, that yield boolean values. Boolean values can be combined using boolean connectives. String-valued properties can be tested for equality. Fig. 5 displays an example from a hypothetical IETM that tests whether a property SerialCode is less than 16230. This test might conditionalize content that only relates to the first instances of a particular product (that may suffer from a defect that has been remedied for later releases). Note that in this example no namespace prefixes are used.
These expressions can be used within IETM content to conditionalize document sections. MIL-PRF-87269 contains the following constructs that express conditions.
[Listing: XML expression comparing the property SerialCode with the constant 16230]
Fig. 5. Example of MIL-PRF-87269 expression of the condition ‘SerialCode’ < 16230.
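To make the expression language and the three-valued logic mentioned in Section 4.3 more concrete, the following Python sketch evaluates a small XML expression against a table of property values. It is an illustration under stated assumptions, not the MIL-PRF-87269 or i*Doc vocabulary: the element names (and, or, lt, gt, eq, property, integer, string), the Model property and the evaluate function are invented for this example, and unknown values are combined using Kleene's strong three-valued logic, which is one possible reading of "three-valued".

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

Value = Union[int, str, bool, None]  # None stands for "unknown"

# Element names below are invented for this sketch.
EXPRESSION = """
<and>
  <lt><property name="SerialCode"/><integer>16230</integer></lt>
  <eq><property name="Model"/><string>A</string></eq>
</and>
"""


def evaluate(node: ET.Element, props: Dict[str, Value]) -> Value:
    """Evaluate an expression tree; None propagates as the third truth value."""
    if node.tag == "property":
        return props.get(node.get("name", ""))  # unbound property -> None
    if node.tag == "integer":
        return int(node.text or "0")
    if node.tag == "string":
        return node.text or ""
    args = [evaluate(child, props) for child in node]
    if node.tag == "and":
        if any(a is False for a in args):
            return False  # one false operand decides the conjunction
        return None if any(a is None for a in args) else True
    if node.tag == "or":
        if any(a is True for a in args):
            return True  # one true operand decides the disjunction
        return None if any(a is None for a in args) else False
    if node.tag in ("lt", "gt", "eq"):
        left, right = args
        if left is None or right is None:
            return None  # cannot compare an unknown value
        if node.tag == "lt":
            return left < right
        if node.tag == "gt":
            return left > right
        return left == right
    raise ValueError(f"unsupported element: {node.tag}")


tree = ET.fromstring(EXPRESSION)
early_unit = evaluate(tree, {"SerialCode": 15000, "Model": "A"})
late_unit = evaluate(tree, {"SerialCode": 20000})
unknown_model = evaluate(tree, {"SerialCode": 15000})
```

With these inputs, the first evaluation is true, the second is false even though Model is unbound (the serial-code test alone decides the conjunction), and the third is unknown, which is the case where evaluation would stall on a missing property value.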
• An if construct consists of an expression and two container elements for the ‘true’ and ‘false’ branches. Our work extends MIL-PRF-87269 in allowing explicitly for a third branch that is taken when expression evaluation is stalled because property values are unknown.
• A NodeAlts element consists of a number of container elements, each of which must contain a precond daughter node. This is similar to a ‘switch’ or ‘case’ programming language construct. It resolves to the container element whose precond evaluates to boolean ‘true’.
The node element can be used inside a conditional element as a generic container for content. It contains an optional first daughter node precond. A precond element contains an expression that, when evaluated at run-time, should return boolean ‘true’ (unless the node is contained in a NodeAlts). Nodes can be referenced via unique identification attributes. The i*Doc compiler used in Section 5 can generate separate output units (HTML pages, WML decks) for various nodes and convert ID/IDREF cross-references to hypertext links.
Fig. 6 shows the use of namespaces to differentiate i*Doc elements from content elements, in this case elements from the XHTML DTD [32]. The example shows a conditional that limits access to a special offer to high-value customers. This is done by only displaying a hyperlink to users that have the boolean property highValueCustomer set to the value true. This corresponds to the graphical representation shown in Fig. 4.
The MIL-PRF-87269 standard assumes a run-time interpreter that has knowledge of the syntax and semantics of the expression language, or compilation to a format that provides equivalent functionality. The interpreter should validate the precond conditions on node elements and evaluate the expressions contained in conditional constructs. It should also maintain a lookup table associating properties with values. All properties have global scope and need not be initialized. Properties can obtain values in one of several ways.
First, a value can be asserted in an assertion statement nested within a node. This is shown in Fig. 7 with an example that might be used in an on-line shopping site such as the wine site discussed in Section 5. This assertion records that a customer has visited a particular page containing a Champagne offer. This illustrates a simple way to selectively record user navigation, which might subsequently be used to generate content (such as related offers) sensitive to site navigation patterns.
[Listing: conditional on the property highValueCustomer having the value true; link texts ‘Read all about our special vintage Champagne offer’ and ‘View our daily specials’]
Fig. 6. Conditional hyperlinks to nodes ‘specialOffer’ or ‘dailySpecials’.
In IETMs, a second way a property value can be set is through user interaction. The run-time interpreter is required to detect references to properties that are not assigned values. In MIL-PRF-87269, an element can have a dialogRef attribute, which references a