The Journal of Systems and Software 76 (2005) 221–235 www.elsevier.com/locate/jss
Workflow analysis for web publishing using a stage-activity process model Jiannong Cao *, Catherine Chan, Keith Chan Internet Computing and E-Commerce Lab, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Received 17 December 2003; received in revised form 16 April 2004; accepted 22 May 2004 Available online 23 July 2004
Abstract Web publishing has been the core business of ICP (Internet Content Provider) companies. It is also a major functional component in other Internet industry sectors. However, although there have been on-going developments in this area, up to our knowledge, very little work has been done on web publishing process models and methodologies. In this paper we describe a general process model for web publishing based on workflow analysis. Using a case study approach, the existing web publishing methods used by major Hong Kong ICP companies have been evaluated. Based on the evaluation results and our observations, the issues and problems of the existing methods are identified. A general process model, called the stage-activity model, is proposed which facilitates the development of a framework for automating the web publishing process with the incorporation of workflow management. The results of this research contribute to a clear understanding of what challenges ICPs are currently facing and a solution for building an automating web publishing system. The framework can also be used to evaluate and select the web publishing tools in the market. Ó 2004 Elsevier Inc. All rights reserved. Keywords: Web publishing; Process model; Workflow management; E-commerce
1. Introduction The continuous advance in the Internet technology and related services has created many opportunities for business organizations and companies to conduct electronic business on the Internet. Shortly after the Internet became a hot industry since late 1990s, there have been a large number of Internet companies appearing in the market that offer a variety of services in different areas of e-commerce, such as B2B (Business-toBusiness), B2C (Business-to-Customer), and C2C (Customer-to-Customer). Among the most common business in the B2C arena is the Internet Content Provider (ICP). The main service offered by ICP companies is to provide
*
Corresponding author. Tel.: +852 27667275; fax: +852 22740842. E-mail address:
[email protected] (J. Cao).
0164-1212/$ - see front matter Ó 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2004.05.028
on their web site quality contents, including news, stories, articles, etc., on a timely basis (Dewan et al., 1998). A number of attributes of the Internet in general and the World Wide Web (WWW) in particular, including global reach, distance and content-independent pricing, standard browsers, and easy advertising via Web links, are particularly attractive to proprietary content providers. These features made it easy for firms to market and sell contents to a large marketplace (Dewan et al., 1998; Ciolek, 1996). Web publishing has been the core business of ICP companies. As a matter of fact, to certain extend, different areas in the Internet industry as mentioned above also require publishing contents/information on the Web. For example, an e-retailing company engages in publishing new product information, special promotions news, and so on from time to time. A company providing web services needs to publish information based on
222
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
its web hosting packages. For example, an online directory company involves in publishing directory services along with advertiser messages. Thus, web publishing is vital to the Internet companies operating in all areas, although the technique has a different degree of significance in each area. For Internet content, as for most information goods, there is a large cost of developing and providing the content. Furthermore, efforts are needed to ensure the quality of the content which is provided to users. Therefore, it is usually required to have a team of producers working together during the whole web publishing process, conducting quality content development and management, collaboration and control over the site design. However, interestingly enough, not many ICPs in Hong Kong are able to manage and publish their contents in an efficient and quality way. In general, this is also true for ICPs worldwide. Web publishing remains one of the most chaotic areas (Aceto, 1999; Choy and Morris, 1996). Although there have been on-going developments on web publishing technologies and software, up to our knowledge, very little work has been done in the area of process models and methodologies for automating whole web publishing process. Most existing work on web publishing have been focused on formats, markup languages, HTTP servers, browsers, and other technical issues which are of concern to web authors or the coordination of workflow in some stage (not whole) of web publishing. So far, a lot of the technical implementations were done on ad hoc basis. There is not sufficient systematic approach and support available to facilitate the coordination of collaborative work amongst the web publishing team. In this paper, we propose a general framework that helps Internet companies, ICPs in particular, to streamline their web publishing operations following a systematic approach based on workflow management. More specifically, we describe a stage-activity process model for web publishing. Using a case study approach, the existing web publishing processes undergoing in major Hong Kong ICP companies have been evaluated. Based on the evaluation results and our observations, the issues and problems of the existing methods are identified and then a stage-activity process model is proposed. A practical process model should be supported with automation schemes (Jacobson et al., 1999). Therefore, we have also developed a framework based on the proposed process model to facilitate the automation of the web publishing process. The framework is based on web publishing workflow analysis and management, and a prototype of automating web publishing system has been implemented on networked PCs. The rest of this paper is organized as follows. Section 2 provides the background of our study and describes the related work. In Section 3, we report case studies for evaluating the web publishing methods currently
adopted in major Hong Kong ICP companies. Section 4 proposes a framework for streamlining and automating the web publishing operations. First, the stage-activity process model is proposed. Then, we discuss how the model can be used to develop a computer-based system for automating the web publishing process. We describe the architecture of an automating web publishing system as well as the workflow management model based on the model. Section 5 introduces a prototypical implementation of the automating web publishing system. Section 6 concludes the paper and proposes the future work.
2. Background and related work In this paper, we refer to Internet Service Providers (ISPs) that primarily provide content as Internet Content Providers (ICPs). ICPs are one of the major parts in the Internet economy model (Dewan et al., 1998). They collect, process, organize, index, update, and provide the data in a format usable by customers. An ICP can have a vertical focus or a horizontal focus. A vertical ICP provides content pertaining to a particular industry or area, e.g., a financial site providing contents on financial news, stock quote, and so on. A horizontal ICP provides contents on a wider range of topics, e.g., a community site focusing on providing a variety of contents as in daily news, horoscope, city events, and so on. There are ICPs that adopt a full cyber marketing approach and whose core business is web publishing. Such an ICP develops and maintains a quality web site, where the production of web contents starts within the company from scratch by using various languages and authoring tools such as HTML, DHTML, ASP, Frontpage, Dreamweaver, CGI, etc. Apart from this type of ICPs, there are ICPs that adopt a partial cyber marketing approach, where its web publishing business is only an extension of its core business in other industry. The production of web contents does not start from the ICP division of the company since the contents originate from its core business. A traditional newspaper company is an example of these companies, which publishes its newspaper through both traditional print media and on the web. In general, the web publishing methods used by these ICPs are specific to its core business. That is, the web publishing process starts from transferring the contents that make up of the printed products to their web site. The related work in web publishing can be classified into three main categories: electronic document management (Eriksen, 1997; IEEE Standards Association, 1999), underlying process models (Choy and Morris, 1996; Green, 1996), and content management tools (Aceto, 1999; Foo and Lim, 1997; Vaughan-Nichols, 1995; Lie and Saarela, 1999). Although all of them provide insights and may also be applied to our research, in
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
this paper we focus on the development of web publishing process models and systems. In our research, we endeavor to identify the activities in web publishing process and the key components that are required for an integrated web publishing system that support workflow management. A process is defined as a description and ordering of work activities across time and space that is designed to yield specific products or services while ensuring the organizationÕs overall objectives (Sommerville, 1996). It provides a conceptual basis for the integration and coordination of distributed resources, tasks, and individuals (Cichocki et al., 1998). The web publishing process aims to deliver timely and accurate contents using an integrated and coordinated system model that involves both people and technology. Timely content, in this context, means the up-to-date content, where the status of the content is defined according to the site policy. Accurate content refers to the content free from error such as broken or wrong links, missing or wrong images, wrong aggregation of materials, wrong data/ materials, distorted layouts and so on. Furthermore, the investment on human resources for maintaining an accurate and timely site is usually large, which raises a continuous problem of management. Schedule delay and long development time are frequently encountered. In addition, the factors such as lack of involvement of skilled technical people, inappropriate tools, and uncertainty on how WWW technology can be integrated with the existing organization business model also contribute to the difficulty of creation and maintenance of electronic document (Eriksen, 1997; Seesing, 1996). A general model of online publishing has been proposed by Green (1996) with the focus on the sequence of major activities involved. Five steps, namely submission, acquisition, quality control, publication, and delivery, have been identified as the main steps in the web publishing process. The model, however, does not include the stage required prior to submission, which includes the activities covering production work by web editors and web designers to produce the contents/materials needed. As pointed out by Seesing (1996), there are lots of pitfalls in the content production stage, and the right process workflow plan for this stage requires much thought and attention. As we will see, the bottleneck and needs for additional resources often occur in this part of web publishing process, which result in overall schedule delay and budget overrun. The lack of integration of the production stage into the process model makes the total control of web publishing difficult. There have been commodity content management systems available in the market. Most of the products support building up the contents of web sites, e.g., ecommerce, as well as keeping the integrity and maintainability of web pages. RedDot (RedDot Solutions, 2004) is a well-known content management system. It supports
223
the administration, workflow management, and web page maintenance in enterprises. RedDot provides two types of workflow management capabilities to project administrators. One is multi-level workflow in which project administrators can assign a workflow to every content area in a project. The workflow system is very simple to administer, based on an action-reaction formula. For example, RedDot can e-mail specific users to request a page be released for publishing after it has been created. The workflow can be nested as much as desired, to create a release process that contains multiple levels of hierarchy. The other is a straight-line workflow process in which all content is routed to the administrator to approve for release to the live site. Spinblue provides easy web content management, such as the capability of easy adding, editing and deleting web pages, for small businesses and organizations. It does not support workflow management. Vignette (2004) provides content management and portal solutions to enable organizations to rapidly build, manage and deploy web applications to the real-time enterprises. Its primary workflow management empowers a team to get the right staff involved in information creation, review and approval before it is published. The web content management systems above only provide limited support to workflow management. They have not implemented the automatic workflow management throughout the entire web publishing process. So far, there is no robust web publishing system to support the coordination of collaborative work among web publishing teams in which the tasks range from the initial stage of production to the final stage of upload and maintenance of the web site. Usually, the text editor and graph designer use different design tools and applications to produce the contents. The submission and approval of the content is accomplished in person rather than via an integrated web publishing system. The operation managers rely on technical team to upload and maintain the contents of the web site. There is not an integrated platform for the web publishing process. Hence, the web content is mostly developed, uploaded and maintained on an ad hoc basis. With regard to the high requirement on workflow management found by our investigation, we have developed a web publishing process model in this paper based on a workflow analysis approach to facilitate the workflow management for web publishing. In general, a workflow management system has been considered as a computer-automated infrastructure where a group of people participate together to achieve a common goal following some predefined rules and task assignments (Cichocki et al., 1998). In the context of this paper, we aim at identifying the key activities carried out by the respective persons/entities at each stage and proposing a framework for coordinating the activities. Our model extends the general model of online publishing
224
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
developed by Green (1996) in order to come up with a more comprehensive and systematic framework that helps ICPs to streamline the web publishing operations from workflow management perspective. The model serves as a guideline for the ICPs to build their web publishing systems.
3. An evaluation of web publishing methods used by Hong Kong ICPs In this section, we describe case studies of three major Hong Kong ICPs. The main purpose of the studies is to evaluate the web publishing methods currently used by these ICP companies so that we can identify the key issues and problems in existing practice and propose potential areas for improvement. The results of the evaluation and our observations derived from the studies are discussed, which provide the basis of analyzing the web publishing workflow and developing the stage-activity process model proposed in next section. Using an extended model of the web publishing process based on GreenÕs work (Green, 1996), we divide the web publishing operations into six stages: (a) Production––creation of the contents by web editor and web designer using web publishing tools and the development of templates to hold these contents. (b) Submission––submission of the produced contents for approval and publishing. (c) Acquisition––reception and retrieval of the submitted contents. (d) Quality control––quality check on the contents as well as the approval of the contents for advancing to next stage. (e) Publication––publication of approved contents to preview platform for preview. (f) Delivery––final release of approved contents for public access. As we can see from the model in Fig. 1, the web publishing process follows a linear but sometimes iterative model. Each stage is carried out in sequence. In the case where the produced contents do not pass the quality Production
Submission
Acquisition & Retrieval
control, the process iterates back to the previous stages to fine-tune the operations. This situation is common in web publishing as most content is developed through a series of fine-tuning and approval activities. The iteration from Acquisition and Retrieval stage back to Submission stage only happens when system error occurs during the content is being acquired. In this case, files have to be re-registered. Three major Hong Kong ICP companies, where one of the co-authors once worked as an employer, have been evaluated. The first case is the on-line division of a major directory company in Hong Kong. The company has an existing core business in print media and it is in the third year of running the online business. At the very beginning, the directory site was developed for the purpose of offering only the online search function of directory listing. With the recent take-off in the Internet industry, the company keeps on adding new features and channels to the site. Up till now, there is no Internet technology expertise in the online division to support the team. Most of the technical development and web publishing tasks are outsourced to major IT vendors and supported and maintained by the division. The main functions for the site are the search engine for directory listing and the search engine for advertising listing of specific industries. The second case is the portal group in the ISP division of a major Telecommunication company in Hong Kong, which has an existing core business of providing mobile communication services and it is in the second year of operation in the ISP business. The portal group focuses on building and maintaining the portal site of the ISP division. It has a business model as an ICP, that is, it provides quality contents to attract traffic to the site and monetizes the traffic of the site by revenue generated from online advertisements. In order to make the site attractive, more and more new channels with many sub-channels are being added to the site. For example, it has channels that focus on the contents in travel, IT, entertainment, financial information, e-commerce services, etc. The companyÕs initial strategy for building and maintaining the site was leveraging the resources from the existing core business. The group lacks Internet-related technical expertise. Most members of the Quality Control
Publication
Delivery Release
Approve
Correct
Re-register
Live Site
Legend: = Processs
= Required Action
= Final deliverable from the Process
Fig. 1. Web publishing process model.
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
content development team (i.e., web editors and web designers) within the portal group are hired from outside. Outsourcing in the form of using free lance editors for content development is also frequently adopted in this company. The last case is a pure ICP company (i.e., the company does not have other core business except its ICP business). Its main business focuses on producing quality web site and making revenue out of online advertisement. The company is in the second year of operation at the time of interview. This company adopts two different approaches in developing contents. Its web site contains many different channels and sub-channels. In particular, there is a news channel that is updated daily and produces about 30 news articles per day. The other channels, including e-commerce channel, business computing channel, games review channel, etc., are produced and updated less frequently. These channels have a rough schedule of producing around three to six articles per week. As a common finding in all of the three cases, the content development and content/site management involve the following areas of expertise: Web Designer––responsible for the production of graphic contents. Web Editor––responsible for the production of textual contents. Content Manager––responsible for overseeing all of content development activities of the site. Web Specialist/Web Master––responsible for technical support to content development team, programming markup language to facilitate content development, and managing the contents in uploading, downloading, updating, backing up, and archiving the contents. The staffs involved in each stage of the web publishing process were interviewed. A well-prepared questionnaire was used to conduct the interviews (see Appendix A). There are two main aspects covered in the evaluation study: 1. The workflow of the web publishing process which focuses on the major phases involved––from production to submission and to delivery, and the activities conducted by the members in the team; 2. The mechanisms that are used for facilitating and integrating each phase in the web publishing process. The questionnaire was designed to facilitate the identification of key issues related to the two aspects. It consists of two main parts, namely content development and content/site management, which contain a total of 49 questions covering operations, entities, activities, and
225
management issues involved in different stages of web publishing. Content development involves the tasks that carry out the development of contents for the site. Referring to the web publishing process in Fig. 1, the content development involves activities from stage one to stage four, namely, production, submission, acquisition, and quality control. Content/site management involves the management of the contents from submission to delivery. That is, it involves activities that manage the contents to be produced, submitted, and acquired but not approved yet. It further involves the activities that facilitate the quality control, publication, and delivery. Through the case studies, we found that the underlining web publishing process adopted by each company matches quite closely with the six stages in Fig. 1, although there was not such a clear cut between the six stages and the ways to carry out the activities within these companies were quite different. In the evaluation, the issues like inappropriate segregation of activities, non-cohesive and unstructured web publishing process, lack of collaborative platform to facilitate the whole process, and the system developed for inappropriate user group were found as the major concerns. This is mainly because of two reasons: the web publishing methods adopted by the ICP companies are still very primitive and, as a consequence, planning and management are difficult. The companies in cases 1 and 2 actually adopt a fully manual approach to coordinate the web publishing activities. The company in the third case adopts a semi-automatic method. It uses manual approach to publish almost all the channels and sub-channels of the site except the news channel. An automated approach is used to publish the news channel which demands high volume of contents being delivered every day. The content/site management team developed a small-scale news publishing system, called News Admin System, to support the content development phase of publishing news articles. The system has a content management tool by which web editor/free lancer can produce news articles using templates and then store them into a single database. However, there are many limitations in their system. There is no access control over the system and no version control, and thus no way to verify who produced the articles. Also, there is no support for the maintenance of quality and integrity of the contents being produced and stored in the database. More importantly, there is no integration between this system and the rest of the publishing process for other channels. The system does not facilitate the collaboration among the team members. Coordination of activities involved in content development and content/site management remains to be manual-driven either over the phone or via e-mail. In summary, publishing the whole site is still considered very fragmented that needs a lot of interpersonal steps to support smooth operation.
226
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Based on the case studies, the workflow of the web publishing methods in these companies can be summarized in Fig. 2. Table 1 explains the activities, limitations, and issues in the workflow in detail. As we see, in general, the most of the development activities are done separately by different entities at different stages. There lacks systematic approach and support to facilitate the coordination of collaborative work amongst the web publishing teams. For example, the text editors and graphic designers often use different design and authoring tools for the production of the contents. The submission and approval of the produced content is done in person rather than via an integrated web publishing system. The editors and operation managers rely on technical teams to upload and maintain the contents of the site. There are not standard operational procedures or schedules in operating and maintaining the site. Most of the time, the content/site management is done on an ad hoc basis either by internal team or outsourced vendor. Furthermore, there is no centralized repository for storing submitted contents, which is essential for automating the process (Dzlina, 1999). Consequently, significant effort is required and a large amount of time is spent on manually coordinating the activities of different people and integrating the outcomes of different stages. As one consequence of the lack of an integrated web publishing process model and associated supporting system, all three companies find it very difficult to build and maintain the site within operational budget and resource constraints when the sites of these companies are expanding. The integration of the works done individually by the web specialist/web master is very repetitive and tedious that wastes a lot of technical resources. Moreover, the content is often approved at last minute, which puts a lot of stress on the team for maintaining updated contents on the site. Under the situation where the content development team uses free lancers for content development, it makes the process even more complicated and time consuming. As this practice accumulates, it demoralizes the team spirit and results in high turnover of staff in these companies. As a matter of fact, most ICP companies face this same situation.
4. The stage-activity process model and workflow management for web publishing The problems that we have identified in the existing web publishing methods urge the companies to seek solutions for building a web publishing system to manage the site in order to reduce the technical staff, production time, and money expense. The availability of an integrated web publishing system can also help to control and maintain high quality contents on the site. However, building such a supporting system requires
the development of a streamlined web publishing process model first. A general process model, called stage-activity process model, has been developed based on the results of the case studies. The stage-activity model is built by analyzing the workflow in a streamlined web publishing process, which facilitates the automation of web publishing process. Publishing a live site involves many people in cooperation to produce contents that pass through several quality control phases at various stages. Therefore, a workflow management based approach for quality control and communication activities can greatly improve the productivity of the web publishing process. In the rest of this section, we provide an overview of the proposed process model and give a detailed discussion on each component of the proposed model. The tasks involved in web publishing range from the initial stage of production to the final stage of uploading and maintenance of the site. As illustrated in Fig. 1, a web publishing process comprises six stages and follows a sequence of activities. When the activities of each stage are carried out smoothly, the whole process presents a linear model. In practice, however, the process usually proceeds with iterations of the activities. According to the workflow shown in Fig. 1, the stage-activity process model organizes the activities into six stages. Table 2 defines the objective of each stage. Table 3 shows the activities conducted at each stage. Given the stage-activity process model, a streamlined system can be developed for automating the web publishing process. The automating system is expected to streamline the process and increase the productivity of content development and content/site management teams by providing user-friendly tools for producing and managing contents, storing contents and related information in a common repository, and automating labor-intensive repetitive tasks. It is also expected to increase the quality of the contents by applying access control to the system, ensuring integrity of the submitted contents, facilitating and ensuring quality control on the contents, and ensuring timely delivery and regular backup and archive of old contents. The proposed automating web publishing system consists of three major components, namely, Staging System, Publication and Delivery System, and Central Archive System. Fig. 3 illustrates the structure of the proposed system at highlevel system architecture. In Fig. 3, the Staging System is the core component of the proposed system, which manages all activities prior to publication and delivery stage. In order to accomplish all the activities up to quality control stage, the Staging System is composed of three subsystems, namely Content Management Sub-system (CMS), Staging Workflow Sub-system (SWFS), and Central Content Repository (CCR). The Content Management Sub-system consists of the following modules:
227
Fig. 2. Workflow of existing web publishing methods.
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Phases
Activities
228
Table 1 The activities, limitations and issues on the existing web publishing methods Limitations and issues
1. Web editor and designer, using their own computers, create contents based on job requests from content manager/sales order 2. In cases where a web editor is also responsible for disseminating job orders, coordination with designer is done manually 3. Web designer reviews the graphics with web editor. This process repeats until no more change needs to be made on the graphics 4. In the case where free lancers are used, web editor and web designer have to coordinate job orders and changes manually
a. Content files are not stored in a repository but rather stored in web editor/web designerÕs personal computers. Thus, no file can be shared b. All coordination of job orders and changes are done manually which is highly time-consuming. There is no workflow system that facilitates the coordination c. When free lancers are used, web editor and web designer need to make extra personal coordination effort, which consumes a lot of time. Web editor and web designerÕs time should be spent on development work rather than on coordination. Because of this, it also demoralized them
Submission
5. Web editor seeks web specialist/web master to develop templates for integrating the textual and graphical contents 6. Web specialist/web master develop templates using the contents. This process repeats until no change is needed on the developed templates 7. In case of free lancers are used, the free lancers have to deliver the files by hand or by e-mail to web editor. All coordination between free lancers and web editor are done in person 8. Web editor manually keeps track on the amount of work done and payment owning to free lancers
d. Coordination for template development is done manually, which is time consuming e. Review process on templates by web editor is done manually, which is time consuming f. Submission of free lancerÕs work is done manually, again very time consuming and tedious g. Administration tasks for tracking free lancerÕs work and payment are done manually by web editor. Sometime that demoralizes web editorÕs spirit and spends a lot of the time
Acquisition and retrieval
9. Web specialist/web master uploads the templates to a subdirectory of the live site 10. Web specialist/web master reviews the templates with web editor. This adjustment process repeats until no change is needed for the templates
h. Templates waiting for review by web editor are uploaded to a subdirectory of the live site, where the templates can be easily access by outsiders. This greatly cause concerns on security
Quality control
11. Web editor manually seeks content managerÕs approval on the templates 12. Content manager requests for changes, if necessary 13. If changes are needed, web editor, web designer and web specialist/web master make changes on the contents and templates respectively. This adjustment process repeats until no change is needed 14. Content manager approves the uploading of the templates to live platform via e-mail, phone, or in person
i. Approval process and change requests on templates are made manually between web editor and content manager, which is time consuming and often results in delayed uploading thus outdated contents on web site
Publication and delivery stage
15. Web specialist/web master confirms with web editor on upload schedule and display order of the templates once receiving approval from content manager 16. Web specialist/web master sends the approved templates to live platform by FTP based on upload schedule and display order given to him manually by web editor 17. Web specialist/web master performs backup and archive of old contents on a periodic basis
j. Information on upload schedule and display order is confirmed personally. There is no system that keeps track of the information, thus creating a lot of mistakes in arranging the contents properly on the site k. FTP activities are done manually on a daily basis, which could be pre-programmed and run automatically l. Backup and archive activities are not done frequently enough, which sometime results in lose of data
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Production
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
229
Table 2 The stage-activity model for web publishing process: stages Stage
Objective
Production (S1)
Automate and streamline the production process for textual content development Provide input function for upload schedule and display order of the contents
Submission (S2)
Facilitate automatic submission process for all templates and contents (both textual and graphical) Facilitate automatic submission of upload schedule and display order for submitted contents
Acquisition and retrieval (S3)
Ensure submitted contents be received properly and version control applied thus ensuring integrity of the contents Write (store) and read all submitted contents to and from central content repository
Quality control (S4)
Coordinate all job orders and acknowledgements Coordinate all approval requests and ensure all submitted contents obtaining proper approval before advancing to the next stage of the process
Publication (S5)
Create preview site using approved contents and its upload schedule and display order
Delivery (S6)
Upload preview site to live platform upon approval Perform backup and archive of old contents based on pre-programmed schedule and procedures
Table 3 The stage-activity model for web publishing process: activities Stage
Activities
S1 Production
S1 S1 S1 S1
S2 Submission
S2 A1––Submit site template layout S2 A2––Submit textual and graphical contents S2 A3––Submit upload schedule and display order of submitted contents
S3 Acquisition and retrieval
S3 S3 S3 S3
A1––Receive submitted site templates and contents A2––Perform file version control on submitted items A3––Store templates and contents in central content repository A4––Retrieve submitted templates and contents from repository
S4 Quality control
S4 S4 S4 S4 S4
A1––Coordinate job order and acknowledgement for development works A2––Coordinate approval request and status on submitted contents A3––Ensure submitted contents be approved before advancing to publication stage A4––Coordinate adjustment requests on submitted contents A5––Coordinate and ensure final approval on preview site
S5 Publication
S5 A1––Production platform receives approved contents S5 A2––Production platform creates and displays preview site with approved contents based on upload schedule and display order S5 A3––Initiate final approval on preview site S5 A4––Transfer approved preview site to live platform
S6 Delivery
S6 A1––Receive uploaded files from preview platform S6 A2––Generate live site for public access S6 A3––Back up and archive old data based on pre-programmed schedule and procedure
A1––Develop site template upon receiving job order A2––Develop textual contents upon receiving job order A3––Develop graphical contents upon receiving job order A4––Input upload schedule and display order of the content
Site set-up. This module sets up the architecture of a web site. The content manager designs the site architecture
and the system administrator constructs the site based on the design such as the channels and sub-channels with the
230
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235 STAGING SYSTEM StagingWorkflow WorkflowSub Sub- System Staging
List of web editor/web designer
Signal for final quality check
Submit contents, upload schedule & display order
Preview site, approval notice, non-approved status
Job order & coordination, approval request
Retrieve review / approved files
File store Content ContentManagement ManagementSub Sub-System -System File retrieve
Central Content Repository
Store user statistic report
PUBLICATION & DELIVERY SYSTEM Preview Sub-System Sub-System
Delivery Sub-System Sub-System User statistic report
File backup/archive System Administration Module File retrieve
User identification, access right
CENTRAL ARCHIVE SYSTEM Backup & Archive Sub-System Sub-System
Backup Database & Tapes for Archive Archive data data
Web site access control
Fig. 3. Overall structure of the proposed automating web publishing system.
upper bound on the number of articles to be presented in the channels/sub-channels. The web editor and web designer can customize, arrange, and change the template layouts and subsequent contents based on the architecture. The web specialist and web master can upload and maintain the templates of the channels and sub-channels according to the design of templates. The files of the templates and contents are all stored in the Central Content Repository. Content protection and submission. This module performs S1A2 to S1A4 activities (see Table 3) that include the production of textual contents, upload schedule and display order. A user-friendly interface should be provided to capture and maintain the textual contents. When the user is a free lancer, the interface should also record the time spent on developing the contents; therefore the correspondent manager can monitor the productivity of the free lancer. When the textual content is produced and the related information is obtained, the web editor triggers the submission of the contents and information to the Central Content Repository through the interface. After the content submission, the web editor can use another interface to define the upload schedule and display order for all textual contents, which in turn determine the final arrangement of the textual contents in the site. At meantime, the graphical contents produced by the web designer can be uploaded accordingly along with the related textual contents. The module also allows the content manager to override the submitted contents and associated upload schedule and display order after reviewing the contents and related information upon the notification from the Staging Workflow Sub-system. File acquisition and retrieval. This module handles the activities in stage S3 for the acquisition and retrieval of files. It interacts with the Central Content Repository
for the registration and storage of the submitted template and content files as well as retrieving the files from the repository. The model handles the organization of the files according to the definition of site architecture. The module enforces version control on the files. The web editor and content manager can retrieve any version of the contents for adjustments. The Staging Workflow Sub-system handles all activities of Quality Control in stage S4. It provides a collaborative platform among the content development and content/site management teams to ensure quality control on all contents in the site. This subsystem allows system administrator to set up workflow procedures to govern the quality control activities throughout the web publishing process. It provides an interface for the definition of approval procedure by specifying the entity and the related activities carried out by the entity. The subsystem routes the job orders to appropriate members in the content development and content/site management team and tracks the job acknowledgements on the job orders. It also coordinates all approval requests and the status of the submitted contents such as channel/ sub-channel templates and textual/graphical contents. When a content file is submitted and registered, an approval request is sent to the content manager to review the content. If the content is not approved, a ‘‘non-approved’’ status is routed to the appropriate member in the content development or content/site management team for the revision of the content. If the content is approved, it is considered as passing the quality control procedure. The approved contents can be grouped as an upload batch for the preparation of a preview site. Then, the subsystem sends an approval request to the content manager again for the final review of the preview site. If the content manager issues an ‘‘approved’’ status to the preview site, the Staging Workflow
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Sub-system will activate the Publication and Delivery System to publish the preview site to live platform for public access. The Central Content Repository provides a database to store and manage the files that are shared among different subsystems and modules. The files include definition of site architecture, site templates and submitted textual and graphical contents, upload schedule and display orders, user identification and access right, log of user activities, etc. The repository also supplies templates, textual and graphical files to the Publication and Delivery System for the generation of preview and live sites. The repository provides a comprehensive search function for web editor to retrieve the files from the repository. The Publication and Delivery System handles all activities in the publication stage S5 and the activities S6A1 and S6A2 in the delivery stage. It aims at generating preview and live versions of a site once the content manager grants approval. This component is made up of Preview Sub-system and Delivery Sub-system. The Preview Sub-system interacts with the Central Content Repository to retrieve the relevant files and generates the preview version of a site. Then, it passes a signal
231
to the Staging Workflow Sub-system to initiate the final quality check on the preview site. If the latter grants an approval, the Preview Sub-system uploads the final contents to the Delivery Sub-system, which generates the live site and delivers it for public access. The Delivery Sub-system also implements the basic functions of a web server. It handles the requests from public viewers for the site. The Central Archive System is responsible for the S6A3 activity to back up, archive and restore all data stored in the Central Content Repository. It has two basic components: the Backup and Archive Sub-system which performs the backup and archive functions at predefined time interval and the Backup Database Subsystem which stores the data. In addition to the major components discussed above, the automating web publishing system contains another module called System Administration Module that administrates the overall administration activities of the proposed system. The main user of this module is the system administrator. Based on the content managerÕs instruction on the assignment of user identification and access right, the system administrator uses this module to create and maintain the user identification
Fig. 4. Workflow of the proposed web publishing process model.
232
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
and access right to the staff (including free lancers) in the content development team and content/site management team for them to access the relevant modules and subsystems in the web publishing system. According to the access right, the module compiles and updates a list of web editors and web designers involved in the Content Management Sub-system. It also identifies the free lancers from in-house staffs based on the user identification and provides functions to manage the free lancerÕs per hourly rate for use by other systems in the company (e.g., Accounting system). Finally, the module produces administration log based on the activities of the user. Fig. 4 illustrates the workflow of whole web publishing process using the proposed automating system. It demonstrates how different entities within the content development and content/site management teams carry out the various web publishing activities based on the system. Interested reader can find more details of the model and the workflow in (Chan, 2000). To validate the proposed web publishing process model, a second round of interview has been conducted. The proposed stage-activity process model and the automating web publishing system closely match the requirements of the companies that have been interviewed. It is expected that the introduction of the automating system can help streamline the web publishing process, separate the responsibilities and concerns of team members, ease the tasks involved in different stages, facilitate the collaboration and coordination, and reduce the difficulties and cost of maintaining quality contents. Several considerations on implementation have also been identified during the interview. Those include ease of integration with the rest of the companyÕs technical setup, organization restructuring, change of plan, and user interface issues.
commands by the middle tier. The middle tier controls the data access and update made by the users. The Central Content Repository is implemented using MS SQL database. Other modules access the database through Tomcat 3.2 (Java servlet container) and jConnect 7.2 (Java database connection). The Content Archive System is implemented in the similar way as a database system. The Staging Workflow Sub-system provides the interfaces and APIs for content manager to control the quality of contents and coordinate the collaborative work among different staff in content development and content/site management teams. The Apache 1.3 web server is installed on the PC to implement the Publication and Delivery System module which acts as the web server for preview and live site. The prototype has provided a preliminary userfriendly environment for automating the web publishing
Fig. 5. Template layout setting.
5. Prototypical implementation We have implemented a prototype of the proposed automating web publishing system using WAT, a generic workflow automation tool developed by us. The prototypical system is developed on PIII PCs with Windows ME (a Windows version for PC networking). The prototype is implemented in Java. It consists of four major modules in correspondence to the proposed system architecture shown in Fig. 3. In the Staging System module, the Content Management Sub-system is developed as a three-tier architecture. The front tier provides interfaces for different users including content manager, web editor and web designer to access different parts of the systems to define template, create/submit content, issue request/approval, as well as perform system setting and administration. It provides users with high-level APIs that will be translated to lower level
Fig. 6. Approval of web content: (a) specify the approval members and methods, (b) grant the approval.
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Fig. 7. Define upload schedule for approved contents.
process throughout a web project. A large amount of contents including data, texts, graphics and video clips can be handled. Fig. 5 shows a user interface in the site set-up module of the Content Management Sub-system for web editor/designer to set the layout of template. Fig. 6 shows the user interfaces in the Staging Workflow Sub-system for approving web contents. Fig. 6(a) is the interface to assign the content managers who are responsible for approving web contents and select the methods for the content approval. Fig. 6(b) is the interface for granting the approval. Fig. 7 is an interface in the content protection and submission module of the Content Management Sub-system to define the upload schedule for publishing the approved contents. The prototype can also implement reliable service by using replicate servers.
6. Conclusions and future work In this paper, we have described a general process model and a workflow management based automating system for web publishing. This work was motivated by the insufficient adoption of automatic workflow management in Hong Kong ICP companies. In general, there is no integrated platform for the web publishing process. The web content is usually developed, uploaded, and maintained separately on an ad hoc basis. The results and outcome of this research contribute to a clear understanding of the challenges that the ICPs are currently facing and the solution for automating the web publishing process. Using case studies we were able to identify the key factors that contribute to the difficulties and problems in each stage of the web publishing process. Then, based on workflow analysis, we have proposed a stage-activity process model and discussed how the model can be used to facilitate the development of an integrated computer supported system for automating the web publishing process.
233
The research reported here does not cover discussion on the operations related to publishing contents to several regions, rather, it focuses only on local operations of the ICPs with the assumption that the ICPs only have local web sites to maintain. Furthermore, in our study, the contents being developed for the site include text and graphic contents only. The development and management of contents/functions (e.g., Polling) that requires writing programs is not within the main scope of this study. In our future work, we will investigate how the proposed model can be extended to cover these issues. Our future work also includes the improvement of the prototypical software platform for an integrated web publishing system. The prototype will be used to further validate the practical usage and applicability of our findings and framework.
Appendix A. Questionnaire used in the case studies A.1. Section One: Content Development features: (Interviewees: Web Designer, Web Editors, Content Manager, Web Specialist/Web Master) Note: This part of the questionnaire is designed to find out the front-end requirements of a web publishing system. Part A––Questions relating to Production Stage (questions are divided according to the sub-process of this stage): Sub-process 1––Initiating job request 1. How are Production jobs initiated and what tools are being used for communicating job requests to appropriate staffs? Are these tools a part of a collaborative platform? 2. Is there a collaborative platform that allows the management and co-ordination of activities and the assignment of tasks/jobs through a workflow system? Sub-process 2––Producing raw contents 3. Who will be responsible for producing the raw contents once job request is received? 4. What text editors/tools are used in accomplishing the production of textual contents in the production stage? 5. What design tools are used in accomplishing the production of graphical contents in the production stage? 6. (a) Are the above tools part of the existing web publishing system (if any)? (b) If yes, is there a provision for producing these contents (for example, a user interface for inputting text contents to the site and creating and uploading graphics contents to the site)?
234
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
Sub-process 3––Production of templates using raw contents 7. Once both graphical and textual contents are produced, what tools are used for assembling the contents together? Is the assembling tool a part of the existing web publishing system (if any)? 8. Who will be involved in assembling the contents? 9. What programming language is used for integrating the contents? 10. Is there any standard used in the approach for assembling the contents or site? 11. Is there a collaborative platform that enabled editors (from in house) and free lancers (working outside of the company) to submit raw contents for template production? Part B––Questions related to Submission Stage: 12. After templates are produced, how are they being submitted? 13. What is the tool used for submitting the templates for approval? Is this tool a part of the existing web publishing system (if any)? 14. If yes, is there a provision for submitting these templates (for example, a user interface for submitting templates? 15. What types of file format does the tool support? 16. Who is involved in submitting the templates? Part C––Questions related to Acquisition Stage: 17. When templates are submitted, how are these templates and all the raw contents that made up the templates being acquired? 18. Is there any version control functions to keep track of the submitted contents, or are files simply overridden by the latest submission? 19. How are these templates and all the raw contents that made up the templates being validated? 20. What is the tool used for acquiring the templates for approval? 21. Upon acquisition, where will these templates be received and stored? (eg. live site, temporary directory, database system, etc.)? 22. Who is responsible for acquiring the submitted contents? Part D––Questions related to Quality Control Stage: 23. After templates are acquired, how are they being approved? 24. What is the tool used for this approval process? Does this tool facilitate a collaborative platform? If not, how is successful approval being informed? 25. Who will be involved in approving the templates? 26. Who will be involved in storing the approved/disapproved templates? 27. What tools are used for storing approved/disapproved templates? Does this tool facilitate a collab-
orative platform? If not, how is successful and nonsuccessful approval being informed? 28. Is there any version control functions to keep track of the approval process? 29. How are approved templates being stored? And how are the disapproved templates being sent back for adjustment? Part E––Questions related to Publication and Delivery Stage: 30. Once templates are approved, how are templates generated onto the site? 31. What is the tool used for publishing the approved templates? Is this tool a part of the existing web publishing system (if any)? 32. Is validating approved templates part of this process? 33. If yes, are the approved templates uploaded directly to live server after it is validated? Or does it have a preview function on the approved templates before sending them to live server? 34. Who is responsible for publishing contents on the site? 35. Is this process done manually and on an ad hoc basis? 36. How are old contents/templates being archived? 37. Who is responsible for archiving old contents once new contents/templates are available for publishing? 38. What tool is used for archiving contents/templates? 39. How are archiving functions being carried out by the existing process? 40. Is the archiving process done manually and on an ad hoc basis? A.2. Section Two: Content/Site Management features: (Interviewee: Web Master) Note: This part of the questionnaire attempts to find out the back end requirements of a web publishing system. Storing and sharing of contents: (1) What tool is used to store the contents of each of the different stages being described above? Is it organized and stored within a database system? (2) If a database system is used, how is the database architecture being set up (eg. does it has a centralized database where different staffs of the content development team can access to it? Managing different channels of the site: (3) How are the contents of different channels within the same site being handled? (4) How are different templates (i.e. look and feel) of the web sites being managed once they are produced?
J. Cao et al. / The Journal of Systems and Software 76 (2005) 221–235
File management/document management system: (5) What kind of file management feature does the existing system have? (6) Does the existing content development process provide any document life cycle management as check in/check out, versioning, approval, etc.? (7) Is there any search engine to search through all content database? If there is, is it accessible by non-technical staffs? Content distribution function: (8) How is the existing process catered for distribution of contents (partial or complete) to business partners or among multiple publications across several web sites within the same company? Content management and other enterprise information system: (9) If there is an existing content management system, does the system integrate with other enterprise information systems? If yes, please specify which system is integrated with content management system. If no, is there a need for integration with other system?
References Aceto, C., 1999. Managing web content. Document World 4, 18–23. Chan, C., 2000. An evaluation of web publishing methods adopted by major ICPs in Hong Kong. M.Sc. thesis, The Hong Kong Polytechnic University, Hong Kong.
235
Choy, D., Morris, R., 1996. Services and architectures for electronic publishing. COMPCON 1996, Santa Clara, California, pp. 291–297. Cichocki, A., Helal, A., Rusinkiewicz, M., Woelk, D., 1998. Workflow and Process Automation: Concepts and Technology. Kluwer Academic Publishers, Boston. Ciolek, T., 1996. TodayÕs WWW–tomorrowÕs MMM? The specter of multi-media mediocrity. IEEE Computer 29, 106–108. Dewan, R., Freimer, M., Seidmann, A., 1998. Internet service providers, proprietary content, and the battle for userÕs dollars. Communications of the ACM 41, 43–48. Dzlina, D., 1999. Publishers look to the database for efficient content repackaging, Folio, July, pp. 88–89. Eriksen, L., 1997. Digital documents, work and technology, three cases of Internet news publishing. In: Proc. 13th Hawaii IntÕl Conference on System Sciences, vol. 6, pp. 87–96. Foo, S., Lim, E., 1997. A hypermedia database to manage WorldWide-Web documents. Information and Management 31, 235–249. Green, D., 1996. A general model for on-line publishing. In: Proc. AUUGÕ96 and Asia Pacific World Wide Web Õ96 Conference, Australian Unix Users Group, Sydney, pp. 152–158. IEEE Standards Association, 1999. IEEE Recommended Practice for Internet Practices––Web Page Engineering––Intranet/Extranet Applications, IEEE Computer Society. Jacobson, I., Booch, G., Rumbaugh, J., 1999. The Unified Software Development Process. Addison-Wesley. Lie, H., Saarela, J., 1999. Multipurpose Web publishing using HTML, XML, and CSS. Communications of the ACM 42, 95–101. RedDot Solutions, 2004. Extended Content Management Suite. Available from
. Seesing, P., 1996. Electronic publication on the World-Wide Web: the process and the pitfalls. In: Proc. IPCCÕ96, Saratoga Springs, New York, pp. 145–155. Sommerville, I., 1996. Software Engineering, fifth ed. Addison-Wesley. Spinblue. Available from
. Vaughan-Nichols, S., 1995. Internet publishing tools proliferate, Byte, March. Vignette, 2004. Available from .