A visual programming language for XML manipulation




Journal of Visual Languages and Computing 24 (2013) 110–135

Contents lists available at SciVerse ScienceDirect

Journal of Visual Languages and Computing journal homepage: www.elsevier.com/locate/jvlc

A visual programming language for XML manipulation ☆

Gilbert Tekli a, Richard Chbeir b,*, Jacques Fayolle a

a Telecom St Etienne, University Jean Monnet, 25 rue Dr Remy Annino, 42000 St Etienne, France
b LIUPPA Laboratory, University of Pau et des Pays de l’Adour (UPPA), 64200 Anglet, France

Article info

Article history: Received 13 April 2011; received in revised form 1 August 2012; accepted 18 November 2012; available online 4 February 2013.

Keywords: Visual languages; Language syntax and specification; Colored Petri Nets; Composition; XML data manipulation; Concurrency

Abstract

XML data flow has reached beyond the world of computer science and has spread to other areas such as data communication, e-commerce and instant messaging. Therefore, allowing non-expert programmers to manipulate this data is becoming imperative, and two alternatives have emerged. On one hand, Mashups emerged a few years ago, providing users with visual tools for web data manipulation that are not necessarily XML specific. Mashups have been leaning towards functional composition, but no formal definitions have been given yet. On the other hand, visual languages for XML have been emerging since the standardization of XML, mostly relying on querying XML data for extraction or structure transformations. These languages are mainly based on existing textual XML languages; they have limited expressiveness and do not provide non-expert programmers with means to manipulate XML data. In this paper, we define a generic visual language called XCDL, based on Colored Petri Nets, allowing non-expert programmers to compose manipulation operations. The XML manipulations range from simple data selection/projection to data modification (insertion, removal, obfuscation, etc.). The language is oriented to deal with XML data (XML documents and fragments), providing users with means to compose XML oriented operations. The language core syntax is presented here along with an implemented prototype based on it. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

XML is now widespread in the world of computers and is present in most of its fields (e.g., internet, networks, information systems, software and operating systems). Furthermore, XML has reached beyond the computer domain and is being used to communicate crucial data in different areas such as e-commerce, data communication, identification, information storage, instant messaging and others. Therefore, due to the extensive use of textual information transmitted in the form of XML by expert and non-expert users, it is becoming essential to allow any user (programmers, scientists, writers, etc.) to manipulate the corresponding XML data according to personal requirements. The manipulation operations may include but are not limited to data selection/projection, filtering, restructuring, extraction, modification, insertion, removal, and obfuscation. As an example, consider a cardiologist who shares medical records of his patients with some of his colleagues. For confidentiality reasons, he wishes to omit personal information concerning his patients (e.g., name, social security number, address, etc.). In this case, the manipulation required is data omission, which can be done via data encryption, removal, substitution or others depending on the operations provided by the system and the requirements of the user (the cardiologist in this case).

☆ This paper has been recommended for acceptance by Shi Kho Chang.
* Corresponding author. Tel.: +33 5 59 57 43 37.
E-mail addresses: [email protected] (G. Tekli), [email protected], [email protected] (R. Chbeir), [email protected] (J. Fayolle).

1045-926X/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jvlc.2012.11.001


To address this matter, two main approaches have emerged in the literature: Mashups and XML oriented visual languages. Both approaches try to provide expert and non-expert users with the ability to write/draw data manipulations by means of visual elements. On one hand, while there is no common definition of Mashups, existing Mashup tools mainly aim at composing manipulation operators (e.g., RSS filters) for different types of web data (e.g., HTML, web content, etc.), but are not specific to XML. Since Mashups have not been formally defined yet, as stated in [1], no Mashup related languages providing visual functional compositions have emerged. On the other hand, XML oriented visual languages are already formalized and mainly based on existing XML transformation languages (e.g., XSLT) or querying languages (e.g., XQuery). They provide visual means for non-expert users to write manipulation operations specific to XML data. The main goal of these languages is data extraction and structure transformation, normally expressed through selection/projection queries. The expressiveness of existing XML oriented visual languages is limited in two ways: they cannot visually express all the operations of the languages they are based upon (e.g., aggregation functions), and they cannot go beyond the operations included in those languages. Aside from these limitations, they normally require the user to have some knowledge in areas such as data querying, which renders the task more difficult. So far and to the best of our knowledge, XML visual languages are not viewed as visual functional composition languages, which are considered closest to natural human thinking.
In this paper, we aim at solving these issues by defining a new visual language, the XML-Oriented Composition Definition Language (XCDL), initially introduced in [2], which provides a generic and formal definition of visual function-based composition. It is based on Colored Petri Nets, which allow it to express complex compositions with concurrency. We present the definitions and specifications of the XCDL language along with a prototype developed in order to evaluate and validate our proposal.

The rest of this paper is organized as follows. Section 2 motivates our research with a few scenarios. Section 3 discusses related work and available approaches. In Section 4, we give some background and preliminaries. In Section 5, we present the definitions and specifications of the XCDL language. We then present the prototype. Finally, we conclude and state some future work.

2. Motivating scenarios

To further motivate our research and describe some of the issues which need to be addressed by visual languages, two scenarios are described which illustrate different perspectives of XML data manipulation. Consider a media company running different departments locally and internationally (reporting department, publishing department, communication department, etc.). Different control scenarios are required either in a single department or between departments.

2.1. Scenario 1: news gathering and report generation

A journalist working in the reporting department is writing an article covering an event. The journalist wishes to acquire all information being transmitted by different media sources (television channels, radio channels, journals, etc.) in the form of RSS feeds, filter their content based on the topic (s)he is interested in, and then compare the resulting feeds. Based on the comparison results, a report covering relevant facts of the event will be generated. To achieve this, several techniques would be required:

1. XML filtering: Filter XML data provided by several sources having the same structure (RSS Schema) based on a specific topic.
2. XML content similarity: Compare the filtered XML data for content similarities and retrieve significant data.
3. Automated XML generation: Generate an XML file reporting the filtered XML data.
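The first step of this scenario, topic-based filtering of an RSS feed, can be illustrated with a minimal Python sketch. It is not part of XCDL: the feed content, the function name `filter_items` and the plain keyword match on `title` and `description` are all assumptions made for illustration, and the similarity and report-generation steps are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed used only for this illustration.
FEED = """<rss version="2.0"><channel><title>Source A</title>
<item><title>Election results announced</title><description>Final count.</description></item>
<item><title>Weather update</title><description>Rain expected.</description></item>
</channel></rss>"""


def filter_items(rss_text, topic):
    """Keep only the items whose title or description mentions the topic.

    A case-insensitive substring match stands in for a real topic filter.
    """
    root = ET.fromstring(rss_text)
    needle = topic.lower()
    for channel in root.iter("channel"):
        # Iterate over a copy because items are removed while looping.
        for item in list(channel.findall("item")):
            text = " ".join(
                (item.findtext(tag) or "") for tag in ("title", "description")
            )
            if needle not in text.lower():
                channel.remove(item)
    return root


filtered = filter_items(FEED, "election")
titles = [item.findtext("title") for item in filtered.iter("item")]
```

The feed keeps its RSS structure; only the non-matching `item` elements are dropped.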

2.2. Scenario 2: sensitive data obfuscation

The Communication department posts information and news concerning its activities in the form of RSS feeds over the internet. The company wants to keep sensitive parts of the information exclusive to its employees and partners. However, the information needs to be partially available worldwide over the internet. In other words, sensitive data in the RSS feeds are to be encrypted by the information provider (the communication department), decrypted by the corresponding readers (employees and partners), and obfuscated for the rest. The feeds should remain RSS standardized. To achieve this, several techniques would be required:

XML granular content encryption and signature:

• Encrypt and sign part of the data content transmitted in an XML file without altering the structure (e.g., <description>38SUJujdgxxvES decided to sign the contract on the Wx34zs5sdZD.</description>).
• Decrypt the encrypted data by the corresponding users.
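To make the idea of granular, structure-preserving obfuscation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the technique used in XCDL: a truncated SHA-256 digest stands in for real encryption (so it is one-way and cannot be decrypted by partners), and the feed, the list of sensitive words and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical feed used only for this illustration.
RSS = """<rss version="2.0"><channel><title>News</title>
<item><title>Contract</title><description>ACME decided to sign the contract on Monday.</description></item>
</channel></rss>"""


def obfuscate_words(text, sensitive):
    """Replace each sensitive word by a short digest; other words are untouched."""
    out = []
    for word in text.split(" "):
        core = word.strip(".,;:")  # ignore trailing punctuation when matching
        if core in sensitive:
            digest = hashlib.sha256(core.encode("utf-8")).hexdigest()[:12]
            word = word.replace(core, digest)
        out.append(word)
    return " ".join(out)


def obfuscate_feed(xml_text, tag, sensitive):
    """Obfuscate the text content of every <tag> element, leaving the tree as is."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        if elem.text:
            elem.text = obfuscate_words(elem.text, sensitive)
    return ET.tostring(root, encoding="unicode")


result = obfuscate_feed(RSS, "description", {"ACME", "Monday"})
```

Only element values change, so the output remains a feed with the same elements in the same order, which is the property the scenario asks for.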


These scenarios can be solved neither with existing Mashups nor with XML visual languages. Mashups have two main drawbacks: (i) they are not XML specific tools and therefore do not offer data-centric or text-centric XML operators, and (ii) they are server tools and cannot manipulate desktop data. As for XML visual languages, they do not really offer manipulation operators, but mostly extract data and transform the XML structure. Consider instead a generic visual composition language, adaptable to different data types, that lets the user create data manipulation operations from functions defined either in system libraries or in external services. Such a language can be embedded in any XML manipulation framework, implemented on either the server or the client side, and used to create different manipulations based on the functions these frameworks provide.

3. Related work

With the spread of information over the web, in particular XML-based data, researchers have been working on controlling and managing this data, such as in [3–5]. Regarding XML data in particular, two main approaches have emerged that aim at providing non-expert programmers with tools allowing them to define their own control or manipulation operators for XML data: Mashups and XML-oriented visual languages.

3.1. Mashups

Mashup is a new application development approach that allows users to aggregate multiple services, each serving its own purpose, to create a service that serves a new purpose. Mashups are built on the idea of reusing and combining existing services. They are mainly designed for data circulating on the web. Their objective is to target non-expert users; therefore a graphical interface is generally offered to the user to express most operations. Mashup applications [1,6–8] can include Mashups using maps (e.g., Google Maps and Yahoo Maps), multimedia content (e.g., YouTube and Flickr videos), e-commerce services (e.g., amazon.com and ebay.com) and news feeds (e.g., RSS and ATOM). The latter is the focus of most emerging Mashup tools nowadays. So far and to the best of our knowledge, the Mashup approach has not been formally defined; nevertheless, based on the existing Mashup tools, a preliminary common architecture is elaborated in [1]. The Mashup architecture was defined from three main criteria:

• Integration between the different types of data (data flow).
• Communication with the components and interaction among them.
• Displaying of the content to the end-user.

Therefore, three main components were defined:

(a) Data mediation level: consists of all possible data manipulations (conversion, filtering, format transformation, etc.) needed to integrate different data sources, where each manipulation could be done by analyzing both syntactic and semantic requirements.
(b) Process mediation level: defines the choreography between the involved applications. The integration is done at the application layer and the composed process is developed by combining functions, generally exposed by the services through APIs.
(c) Presentation level: is used to extract user information as well as to display intermediate and final process information to the user. Results can be presented to the user as a simple HTML page, or as a more complex web page developed with Ajax, JavaScript, etc.

The languages used to implement user interface components and the front-end visualization support both server-side and client-side approaches. But due to the cross-domain problem, using a server-side approach such as ASP or JSP is inevitable. To the best of our knowledge, no tool yet provides information regarding performance analysis. All the tools are supposed to target non-expert users, but some programming knowledge is usually required, depending on the tool.

Several tools have emerged, such as Popfly [8], Apatar [1], MashMaker [6], Damia [7] and Yahoo Pipes [1]. Popfly [8] is used to visualize data associated with social networks such as Flickr and Facebook. Popfly is a framework for creating web pages containing dynamic and rich visualizations of structured data retrieved from the web through REST web services. Apatar [1] helps users join and aggregate data from sources such as MySQL, Oracle and others with the web through REST web services. MashMaker [6] is used for editing, querying and manipulating data from web pages. Its goal is to suggest to the user some enhancements, if available, for the visited web pages. Damia [7] and Yahoo Pipes [1] are mainly designed to manipulate data feeds such as RSS feeds. In this study, our interest mainly falls on Yahoo Pipes and Damia, seeing that they allow manipulations of XML-based data, which is not the case for the other tools.


XML-capable Mashup tools such as IBM Damia and Yahoo Pipes share some main advantages and disadvantages with regard to XML manipulations by non-experts. The advantages are:

• The majority of tools have internal data models based on XML, which makes them more flexible to use, especially for programmers, even if more programming is required to implement operations on them [1].
• Mashups offer operators for data elaboration such as filtering and sorting.
• Mashup tools are all extensible, even though special requirements (e.g., specific programming knowledge such as PHP) are necessary.

The disadvantages are:

• They are mainly designed to handle Web data, which can be a disadvantage since the user’s data, generally available on desktops, cannot be accessed and used.
• The offered operators are not easy to use, at least from a naive user’s point of view.
• The tools do not offer powerful expressiveness since they allow expressing only simple operations.

To summarize, existing Mashup tools are (i) mainly designed to handle online Web data, which is restrictive in several scenarios since the user’s data, generally available on desktops, cannot be accessed and used, (ii) not specifically designed for XML data manipulation and therefore do not provide XML specific operations for querying, updating and modifying all types of XML data, and (iii) going towards functional compositions (e.g., Damia and Yahoo Pipes), which allows them to increase their expressiveness in comparison with the tools following the query by example paradigm [1]. The latter have limited operations and are considered more complex for non-expert users because some knowledge of data querying is required.

3.2. Visual languages for XML

Since the emergence of the XML standard and its spread beyond the computer domain, researchers have been trying to provide visual languages allowing the manipulation of XML data. These visual languages are mainly extensions of existing approaches such as XML query languages and transformation languages. Their main contribution is to allow non-expert users to extract sensitive data from XML documents and restructure the output document. Several languages have been developed over the years, such as Xing [9], XML-GL [10,11], XQBE [12] and VXT [13].

On one hand, Xing [9] and XML-GL [10,11] were developed before XQuery was standardized and took the SQL querying approach by following the three main components of a regular query: selecting, filtering and restructuring the data. XML-GL was one of the first graphical querying languages designed for XML documents. The main purpose was to provide users, mainly non-expert programmers, with the ability to restructure and extract sensitive data from XML files. Nonetheless, due to the limitations of the querying languages existing at the time, in this case SQL and in particular SQL selection queries, XML-GL’s queries were very limited. Xing [9] was defined formally as a visual representation for querying XML data by following the selection projection paradigm. It was defined conceptually based on the SQL selection querying paradigm. Even though its name stands for XML in graphics, it is not based only on visual representations but also on textual ones, defined in tabular forms. Similar to XML-GL, its expressiveness is limited seeing that it is based on SQL.

On the other hand, XQBE [12] was developed after XQuery and is based on it. Its expressiveness is greater than that of previous approaches since it allows the creation of complex queries containing aggregation functions, ordering of results and negation expressions. Nonetheless, its expressiveness is limited to data extraction and query reconstruction in XQuery and does not include textual data manipulation operations such as value modification, insertion and deletion. As for VXT, it was designed to express selection projection queries similar to other XML visual languages but from a different perspective. VXT dropped the idea of building its visual syntax on a querying language and went towards a transformation language instead, so that it can be more expressive by introducing some transformation rules. VXT was based on XSLT [14,15], which is mainly used for XML data restructuring and not textual data manipulation.

From a visual perspective, all of these approaches followed the same pattern, dividing their workspace into two main sections, left and right. The left section contains the source file with the extraction rules. The right section defines the structure of the output file. The query is defined by mapping the element to be extracted from the left section to the element to be constructed in the right section, as shown in Fig. 1.

To summarize, existing visual languages successfully bridged the gap between the complexities of XML data querying and non-expert users, but were limited to data extraction, filtering and restructuring. They mainly provided non-expert programmers with the ability to create XML structural transformations along with data extraction and filtering, but did not deal with XML value manipulations such as textual insertion, deletion, modification, filtering, etc. Table 1 summarizes the different criteria of the Mashups and XML oriented visual languages.


Fig. 1. Examples of existing visual languages for XML.

Table 1
Properties of Mashups and XML oriented languages.

Properties                      Mashups                  XML visual languages
XML specific                    No                       Yes
Manipulate online data          Yes                      Yes
Manipulate desktop data         No                       Yes
Expressiveness                  High                     Low
Based on formal languages       No                       Yes
Functional composition          Yes                      No
Composition-based functions     No                       No
Extending functions             Dependent on the tool    Limited

4. Preliminaries and definitions

In [16,17], the term ‘‘Visual Languages’’ is used to describe several types of languages: languages manipulating visual information, languages for supporting visual interactions, and languages for programming with visual expressions. The latter generally refers to visual programming languages, which is the case of the XCDL provided here. Visual programming languages define programs from pictures, as defined in [16]. A visual language is a set of pictures. A picture is a collection of picture elements. A picture element is a primitive graphical object such as a line, a generic shape or a text string. The syntax of a visual language is specified by distinguishing the set of pictures forming the language. A visual language is mainly divided into three levels:

• The graphical representation model, which defines the graphical elements that will be used in the language (e.g., basic shapes: lines, circles, etc.).
• The language syntax, which is normally defined based on an existing grammar (in our case Colored Petri Nets).
• The transformation syntax, which is used to map the language syntax to the graphical model.

In this paper, we present the XCDL language defined according to the three previously stated levels. Our language is based on Colored Petri Nets (CP-Nets). As stated in [18,19], a Petri Net is first and foremost a mathematical description, but it is also a visual or graphical representation of a system. Petri nets allow the definition of the state and behavior of a language simultaneously, in contrast with most specification languages. They provide an explicit description of both the states and the actions. Petri nets were mainly designed as a graphical and mathematical tool for describing and studying information processing systems with concurrent, asynchronous, distributed, parallel, non-deterministic and stochastic behaviors. They consist of a number of places and transitions, with tokens distributed over places. Arcs are used to connect transitions and places. When every input place of a transition contains a token, the transition is enabled and may fire. The result of firing a transition is that a token from every input place is consumed and a token is placed into every output place. CP-Nets have developed from a promising theoretical model into a full-fledged language for the design, specification, simulation, validation and implementation of large software systems.
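The firing rule just described can be sketched in a few lines of Python. This is a minimal illustration of an uncolored place/transition net, not the XCDL implementation; the class name `PetriNet` and the place and transition names are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Place/transition net with indistinguishable tokens (no colors)."""

    def __init__(self, marking):
        self.marking = Counter(marking)  # place name -> number of tokens
        self.transitions = {}            # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (tuple(inputs), tuple(outputs))

    def enabled(self, name):
        """A transition is enabled when every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        needed = Counter(inputs)  # a place listed twice needs two tokens
        return all(self.marking[place] >= n for place, n in needed.items())

    def fire(self, name):
        """Consume one token per input arc and produce one per output arc."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] += 1


# Two input places, each holding one token, feed transition t, which fills p3.
net = PetriNet({"p1": 1, "p2": 1})
net.add_transition("t", inputs=["p1", "p2"], outputs=["p3"])
was_enabled = net.enabled("t")
net.fire("t")
```

After the firing, `p1` and `p2` are empty and `p3` holds one token, so `t` is no longer enabled.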


In a CP-Net:

• The states are represented by means of places (drawn as ellipses).
• The actions are represented by means of transitions (drawn as rectangles).
• An incoming arc indicates that the transition may remove tokens from the corresponding place, while an outgoing arc indicates that the transition may add tokens.
• The exact number of tokens and their data values are determined by arc expressions (positioned next to the arcs).
• The data types are referred to as color sets.
• A transition has an expression guard (with variables) attached to it defining its operation.

A CP-Net is formally defined as follows:

Definition 1. Colored Petri Net or CP-Net: it is an 8-tuple represented as $CP\text{-}Net = (\Sigma, P, T, A, C, G, E, I)$ where

• $\Sigma$ is a finite set of non-empty types, also called color sets
• $P$ is a finite set of places
• $T$ is a finite set of transitions
• $A$ is a finite set of arcs such that $P \cap T = P \cap A = T \cap A = \emptyset$
• $C$ is a color function. It is defined from $P$ into $\Sigma$
• $G$ is a guard function. It is defined from $T$ into expressions such that $\forall t \in T: [Type(G(t)) \subseteq \Sigma]$
• $E$ is an arc expression function. It is defined from $A$ into expressions such that $\forall a \in A: [Type(E(a)) = C(p) \wedge Type(Var(E(a))) \subseteq \Sigma]$, where $p$ is the input place of $a$
• $I$ is an initialization function. It is defined from $P$ into closed expressions such that $\forall p \in P: [Type(I(p)) = C(p)]$
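As a rough illustration of Definition 1, the Python sketch below models color sets as membership tests, checks the initial marking against the color function, and fires one transition with a guard and an output arc expression. Several simplifications are assumptions of this sketch: each input arc binds the first token of its place (no search over bindings), each transition has a single output place, and the guard and arc expression of the example are invented, since the expressions of the paper's figure are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


def is_int(value):
    """Color set Int."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_int_string(value):
    """Color set Int x String (a pair)."""
    return (isinstance(value, tuple) and len(value) == 2
            and is_int(value[0]) and isinstance(value[1], str))


@dataclass
class ColoredTransition:
    inputs: list                 # input place names, in argument order
    output: str                  # single output place (simplification)
    guard: Callable[..., bool]   # G(t), evaluated on the bound tokens
    expr: Callable[..., object]  # arc expression on the output arc


@dataclass
class CPNet:
    colors: dict                 # C: place -> color-set membership test
    marking: dict                # place -> list of tokens (initially I(p))
    transitions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Type(I(p)) = C(p): every initial token must belong to its place's color set.
        for place, tokens in self.marking.items():
            for token in tokens:
                if not self.colors[place](token):
                    raise TypeError(f"token {token!r} has the wrong color for {place}")

    def fire(self, name):
        t = self.transitions[name]
        if any(not self.marking[p] for p in t.inputs):
            raise ValueError("an input place is empty")
        args = [self.marking[p][0] for p in t.inputs]
        if not t.guard(*args):
            raise ValueError("guard evaluates to false")
        result = t.expr(*args)
        if not self.colors[t.output](result):
            raise TypeError("arc expression result has the wrong color")
        for p in t.inputs:
            self.marking[p].pop(0)
        self.marking[t.output].append(result)
        return result


# Three places: two of type Int x String, one of type Int.
net = CPNet(
    colors={"A": is_int_string, "B": is_int, "C": is_int_string},
    marking={"A": [(1, "x")], "B": [2], "C": []},
)
net.transitions["t"] = ColoredTransition(
    inputs=["A", "B"],
    output="C",
    guard=lambda pair, i: i > 0,                  # hypothetical guard
    expr=lambda pair, i: (pair[0] + i, pair[1]),  # hypothetical arc expression
)
produced = net.fire("t")
```

The transition consumes one pair token and one integer token and produces one pair token, and every token is checked against the color set of the place that receives it.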

The types of a variable v and an expression expr are denoted Type(v) and Type(expr) respectively. Var(expr) designates the variables of an expression expr. Also, we denote by $|X|$ the number of elements in a set X. An example of a CP-Net is depicted in Fig. 2. This CP-Net has three places: two of them have the type $Int \times String$, and one has the type $Int$. The transition takes one token of the pair type and one of the integer type, and produces one token of the pair type.

Fig. 2. An example of a CP-Net.

In XCDL, both the language syntax and graphical model are based on CP-Nets with some adjustments and restrictions. Next, we present our approach by giving first an informal definition of the XCDL and then presenting its formal definition and specifications.

5. XML-oriented composition definition language (XCDL)

Our approach follows the same spirit as both Mashups and XML visual languages, as shown in Fig. 3. On one hand, it has a similar architecture to Mashups and takes advantage of the functional composition paradigm. On the other hand, it is a formally defined language and separates the inputs and outputs into source and destination structures. The approach targets both expert and non-expert users. The language can be adapted to composition based Mashup tools and visual functional composition frameworks. Nevertheless, our language is XML-oriented and covers all XML data (documents and fragments, user-based and grammar-based). It is based on CP-Nets, which allow us to define the language following the natural human thinking process while remaining highly expressive and providing information regarding performance analysis and error handling.

Fig. 3. XCDL mapping to related works.

Fig. 4. Several sample functions defined in XCDL.

To render the language generic, extensible and user friendly, it should fulfill the following main properties:

• Simplicity: providing an easy to use language, practical for all users: non-programmers, novices and expert programmers.
• Expressiveness: allowing users to describe basic operations (e.g., String search) as well as complex ones (e.g., granular encryption).
• Flexibility: enabling its integration into and application to different systems (e.g., web applications, desktop applications, etc.).
• Scalability: allowing it to be extended and enriched with new operators and functionalities.
• Adaptability: allowing it to be adapted to different domains (in our research, to textual and XML data).

In order to satisfy simplicity, we defined the language as visual and based on functional composition. It is based on simple drag and drop actions of graphical components in order to compose manipulation operations. To provide expressiveness, flexibility and scalability, we based the syntax of the XCDL on CP-Nets, which are commonly known to:

• Have a well-defined semantics that unambiguously defines the behavior of each CP-Net.
• Have very few, but powerful, primitives (e.g., the transition firing rule).
• Have a semantics which builds upon true concurrency, instead of interleaving.
• Integrate the description of control and synchronization with the description of data manipulation.
• Have a large number of formal analysis methods which can be used for performance analysis.

As for adaptability, we have separated the composition from the inputs and outputs, which allows us to render the language adaptable to different data types. In our research, we defined a tree structure representing XML data and rendered the language XML-oriented.

5.1. Informal definition of the XCDL

The XCDL is a visual functional composition language based on system defined functions and oriented towards XML data manipulations [20]. We denote by system defined functions (SD-functions) the functions which will be identified in the language environment. These SD-functions can be provided by DLL/JAR files or services (e.g., Web services).
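To give a feel for what composing SD-functions could look like outside the visual editor, the following Python sketch represents an SD-function as a named function with typed inputs and a single typed output, and a sequence operator that only connects an output to an input of the same type. This is an assumption-laden analogy, not the XCDL syntax: Python types stand in for the colors of places, and `SDFunction`, `sequence` and the three sample functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SDFunction:
    name: str
    input_types: Tuple[type, ...]  # one entry per input place
    output_type: type              # a single output place
    body: Callable


def sequence(producer, consumer, port):
    """Map the producer's output onto input number `port` of the consumer."""
    if producer.output_type is not consumer.input_types[port]:
        raise TypeError(
            f"{producer.name} outputs {producer.output_type.__name__}, but "
            f"{consumer.name} expects {consumer.input_types[port].__name__}"
        )
    n = len(producer.input_types)

    def body(*args):
        # The producer's arguments come first, then the consumer's remaining ones.
        produced = producer.body(*args[:n])
        rest = list(args[n:])
        rest.insert(port, produced)
        return consumer.body(*rest)

    inputs = (producer.input_types
              + consumer.input_types[:port]
              + consumer.input_types[port + 1:])
    return SDFunction(f"{producer.name}->{consumer.name}", inputs,
                      consumer.output_type, body)


# Hypothetical SD-functions.
upper = SDFunction("upper", (str,), str, str.upper)
length = SDFunction("length", (str,), int, len)
concat = SDFunction("concat", (str, str), str, lambda a, b: a + b)

serial = sequence(upper, length, 0)   # upper -> length
partial = sequence(upper, concat, 0)  # upper feeds the first input of concat
```

Here `serial.body("abc")` evaluates to 3 and `partial.body("ab", "cd")` to "ABcd", while `sequence(length, upper, 0)` is rejected because an integer output cannot be mapped onto a string input.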


Fig. 5. Functional composition in XCDL.

Fig. 6. XML document to XCD-tree example. Left panel: XML document books.xml (text values: Charles Dickens, A Christmas Carol, 17-12-1843; James Joyce, Ulysses, 2-2-1922, An epic Greek myth.). Right panel: XCD-tree representation.

XCDL is divided into two main parts:

• The inputs/outputs (I/O).
• The SD-functions and the composition, which together constitute the XCDL core.

The I/O are defined as XML Content Description trees (XCD-trees) [21], which are Ordered Labeled Trees (OLT) summarizing the structure of XML documents or XML fragments, or representing a DTD or an XML schema, in the form of tree views as shown in Fig. 5.

SD-functions are each defined as a CP-Net, with the inputs and outputs defined as places and represented graphically as circles, each filled with a single color defining its type. It is important to note that, in our case, a function can have one or multiple inputs but only one output. The operation of the function itself is represented by a transition which transforms the inputs into the output. Graphically, it is represented as a rectangle with an image embedded inside it describing the operation. Input and output places are linked to the transition via arcs represented by direct lines. Several sample functions are shown in Fig. 4.

The composition is also based on CP-Nets. It is defined by a sequential mapping between the output and an input of SD-functions. It is represented by a combination of graphical functions which are dragged and dropped, and then linked together with a sequence operator, which is represented by a direct line between the output of a function and an input of another having the same color, as shown in Fig. 5. As a result, on one hand, a composition might be a serial one, meaning that all the functions are linked sequentially and to each function one and only one function can be mapped, as illustrated in Fig. 11a. In this case, the sequential operator is enough. On the other hand, the composition might contain concurrency, that is, several functions can be mapped to a single one, as depicted in Fig. 11b. In this case we introduce an abstract operator, the concurrency operator, in order to indicate that the functions are concurrent.

The geometric properties of the functions are shown in Fig. 11: input places are drawn symmetrically with regard to the X-axis, which is considered to be situated in the middle of the transition. The distance between the circles is calculated automatically as described in Section 5.4.

Fig. 7. XML fragment to XCD-tree example.

Fig. 8. XCDL overview.

Fig. 9. XCDL-GR components.

Fig. 10. Graphical representations of the XCDL core components (SD-function and sequence).

Fig. 11. Composition in XCDL. (a) Concurrent composition: (SDF1//SDF2//SDF3)-SDF4 and (b) serial composition: SDF1-SDF2-SDF3.

Now we give a formal definition of the I/O of the XCDL and the syntax of the language core.

5.2. XCD-tree IO of XCDL

XCDL aims at manipulating XML data, whether they are user based XML documents or fragments, DTD (Document Type Definition) based documents or XSD (XML Schema Definition) based documents. We introduce a representation called XCD-tree (XML Content Description tree), depicted in Fig. 6. It is based on the tree model defined in the standardized W3C DOM model. It represents an XML based document as a node-tree where all nodes in the tree have a relationship to each other. In our research, we design the XCD-tree as an ordered labeled tree allowing us to represent the structure defining the content of XML data. It is imperative to note that the XCD-tree does not represent the content of XML data itself but its structure. The content of XML data is defined by the XML elements, attributes, and element/attribute values, which we assume to be textual data in our study (similarly to most approaches targeting XML data management, e.g., search, indexing, etc., which disregard the various types of values that could occur in XML documents, e.g., Decimal, Integer, Date, etc., for the sake of simplicity). The XCD-tree allows us to represent any type of XML data, data-centric and text-centric: DTDs and XSDs as well as XML files and XML data fragments. The XCD-tree representation of DTDs and XSDs is straightforward since they already give a structural view of XML documents. As for XML files and fragments, we use structural summarization techniques with repetition reduction [22] in order to extract the structure of the XCD-tree. It is important to note that our purpose is to have a representation describing the structure of XML data (ELEMENT, ATTRIBUTE and TEXT nodes).
Their grammar constraints, such as max occurrence and min occurrence, are out of the scope of this paper.

Definition 2. XCD-tree: it is a root node with a set of ordered sub-trees defined as
$XCDtree = \langle N_X, T_X, L_X, f_X, A_X \rangle$
where
• $N_X$ is the set of nodes in the XCD-tree (i.e., XCD-nodes)
• $T_X = \{ELEMENT, ATTRIBUTE, TEXT\}$ is the set of node types
• $L_X$ is a set of labels associated to XCD-nodes
• $f_X: N_X \rightarrow L_X \times T_X$ is the function associating a label and a type to each node
• $A_X \subseteq N_X \times N_X$ is the set of arcs, each associating 2 nodes together

Definition 3. XCD-node $\in N_X$: it is represented by a doublet as
$XCDnode = \langle type, label \rangle$
where
• $type \in T_X$
• $label \in L_X$

A node can have one and only one parent, except for the root node which has no parent. Each node has a list of child nodes. Attributes are child nodes of their elements. A node with an empty list of child nodes is a leaf node, and Text nodes are the only leaf nodes. We denote by $R_{XCD\text{-}tree}$ the root node of the XCD-tree. If the XML data is a fragment of XML and contains no unique root element, then a virtual root node is inserted with the label ‘‘v_root’’.
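Definitions 2 and 3 can be read as a small data structure. The following is a minimal Python sketch under our own assumptions: the names `NodeType`, `XCDNode`, `make_root` and `arcs` are hypothetical and are not part of XCDL or of the prototype. It only illustrates the doublet (type, label), the single-parent rule and the insertion of the virtual root.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    """T_X: the three node types of an XCD-tree."""
    ELEMENT = "ELEMENT"
    ATTRIBUTE = "ATTRIBUTE"
    TEXT = "TEXT"


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class XCDNode:
    """An XCD-node <type, label> with one parent and an ordered child list."""
    type: NodeType
    label: str
    parent: Optional["XCDNode"] = field(default=None, repr=False)
    children: List["XCDNode"] = field(default_factory=list)

    def add_child(self, child: "XCDNode") -> "XCDNode":
        child.parent = self          # a node has one and only one parent
        self.children.append(child)  # insertion order keeps the tree ordered
        return child


def make_root(top_level: List[XCDNode]) -> XCDNode:
    """Return the unique root, or wrap a rootless fragment under 'v_root'."""
    if len(top_level) == 1:
        return top_level[0]
    root = XCDNode(NodeType.ELEMENT, "v_root")
    for node in top_level:
        root.add_child(node)
    return root


def arcs(root: XCDNode) -> List[tuple]:
    """Enumerate A_X as (parent label, child label) pairs in pre-order."""
    result, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            result.append((node.label, child.label))
        stack.extend(reversed(node.children))
    return result
```

For a fragment with two top-level elements, `make_root` returns a node labeled `v_root` whose children are those two elements, in their original order.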


The ELEMENT and ATTRIBUTE typed nodes represent structural data, their labels denoting the corresponding element/attribute tag names. As for a TEXT typed node, it represents data content, and is thus assigned a TEXT label in our tree representation model (since we are only interested in the structure, and not the content values themselves). To illustrate this, we present Figs. 6 and 7 depicting respectively XCD-trees originating from an XML document (books.xml) and an XML fragment with no single root. In the following section, we present the syntax of the XCDL core.

5.3. XCDL core

As discussed in the previous sections, the XCDL is a visual language defined in three levels as shown in Fig. 8.

5.3.1. XCDL-graphical representation model (XCDL-GR)

The XCDL-GR model defines the graphical components used to represent the language syntax visually to the user. We define the following components: Point, AD (Abstract drawing), Color, Circle, Line and Rectangle. Fig. 9 shows the class diagram of the XCDL-GR components. These components are formally defined as follows.

Definition 4. Point: it is a spatial point defined by two coordinates as
$Point = \langle x, y \rangle$
where
• x and y are Integers defining the Cartesian coordinates respectively over the X-axis and the Y-axis
• P.x and P.y denote respectively the coordinates of the Point P

We define AD as an abstract drawing type which has no representation and is used as a super type for the subsequent drawing types Color, Circle, Line and Rectangle.

Definition 5. AD: it is an abstract drawing type defined as a doublet
$AD = \langle P_1, P_2 \rangle$
where
• $P_1$ and $P_2$ are 2 Points which define the reference points for the sub-types of AD
• $AD_1.P_1$ and $AD_1.P_2$ respectively denote the values $P_1$ and $P_2$ of the abstract drawing $AD_1$

Definition 6. Color: it is an abstract drawing type defining an RGB color as
$Color = \langle c \rangle$
where
• c is an Integer $\in [0, 16777215]$ defining an RGB color

Definition 7. Circle: it is a drawing type, sub-type of AD, represented by an ellipse shape and is defined as
$Circle = \langle AD_1, radius, color \rangle$
where
• $AD_1$ is an instance of AD where $AD_1.P_1 = AD_1.P_2$ defines the center of Circle
• radius is an Integer defining the radius of Circle
• color is a Color which is used to fill Circle

Definition 8. Line: it is a drawing type, sub-type of AD, represented by a segmented line shape and is defined as
$Line = \langle AD_1, style \rangle$
where
• $AD_1$ is an instance of AD where $AD_1.P_1$ and $AD_1.P_2$ define respectively the starting and ending points of Line
• $style \in \{dashed, normal\}$ defines the style of the line

Definition 9. Rectangle: it is a drawing type, sub-type of AD, represented by a rectangular shape enveloping an image as
$Rectangle = \langle AD_1, w, h, img \rangle$
where
• $AD_1$ is an instance of AD where $AD_1.P_1 = AD_1.P_2$ defines the point of the upper left corner of Rectangle
• w and h are Integers defining respectively the width and height of Rectangle
• img is a thumbnail image resized proportionally to w and h and drawn in the middle of Rectangle

If D is an instance of a drawing type and t one of the elements of its tuple, we write D.t to refer to that element (e.g., if r is a Rectangle, r.img retrieves the img of r). The following section presents the syntax of the XCDL core based on CP-Nets.
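The XCDL-GR components of Definitions 4 to 9 map naturally onto a small class hierarchy. The sketch below is illustrative only: the Python class layout, the representation of `img` as a file path and the `rgb()` helper (which assumes that c packs the channels as 0xRRGGBB) are our assumptions and are not stated in the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class AD:
    """Abstract drawing: two reference points shared by all sub-types."""
    p1: Point
    p2: Point


@dataclass(frozen=True)
class Color:
    c: int  # Integer in [0, 16777215]

    def __post_init__(self):
        if not 0 <= self.c <= 16777215:
            raise ValueError("c must be in [0, 16777215]")

    def rgb(self):
        # Assumption: channels are packed as 0xRRGGBB.
        return ((self.c >> 16) & 0xFF, (self.c >> 8) & 0xFF, self.c & 0xFF)


@dataclass(frozen=True)
class Circle(AD):
    radius: int
    color: Color

    def __post_init__(self):
        if self.p1 != self.p2:  # P1 = P2 is the center
            raise ValueError("a Circle requires p1 == p2")


@dataclass(frozen=True)
class Line(AD):
    style: str  # 'dashed' or 'normal'

    def __post_init__(self):
        if self.style not in ("dashed", "normal"):
            raise ValueError("style must be 'dashed' or 'normal'")


@dataclass(frozen=True)
class Rectangle(AD):
    w: int
    h: int
    img: str  # hypothetical: path to the thumbnail image

    def __post_init__(self):
        if self.p1 != self.p2:  # P1 = P2 is the upper left corner
            raise ValueError("a Rectangle requires p1 == p2")
```

With this layout, the dotted notation D.t of the text corresponds to plain attribute access, e.g. `r.img` for a Rectangle `r`.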

5.3.2. Syntax definition

Before we introduce the XCDL core syntax, we define its grammar. The syntax is based on a grammar defined by the CP-Nets' algebra (and therefore retains their properties, such as the Petri Net firing rule and the incidence matrix). It is important to note that the semantics of our language is simply inherited from CP-Nets.

Definition 10. XCGN (standing for XML oriented Composition Grammar Net): it represents the grammar of the XCDL, which is compliant with CP-Nets. It is defined as
$XCGN = (\Sigma, P, T, A, C, G, E, I)$
where
• $\Sigma$ is a set of data types available in the XCDL
  – The XCDL defines 6 main data types, $\Sigma = \{Char, String, Integer, Double, Boolean, XML\text{-}Node\}$, where Char, String, Integer, Double and Boolean designate the standard types of the same name. XML-Node defines a super-type designating an XML component (cf. Definition 11).
• P is a finite set of places defining the input and output states of the functions used in the XCDL.
• T is a finite set of transitions representing the behavior of the XCDL functions and operators.
• $A \subseteq (P \times T) \cup (T \times P)$ is a set of directed arcs associating input places to transitions and vice versa.
  – $\forall a \in A$: a.p and a.t denote the place and transition linked to arc a.
• $C: P \rightarrow \Sigma$ is the function associating a color to each place.
• $G: T \rightarrow S$ is the function associating an SD-function to a transition where:
  – S is the set of SD-functions, which are operations performed by functions identified in the development platform's libraries (e.g., concat(string,string)).
• $E: A \rightarrow Expr$ is the function associating an expression $expr \in Expr$ to an arc such that:
  – $\forall a \in A: Type(E(a)) = C(a.p)$.
• $I: P \rightarrow Value$ is the function associating initial values from Value to the I/O places such that:
  – $\forall p \in P, \forall v \in Value: [Type(I(p)) = C(p) \wedge Type(v) \in \Sigma]$

In the remainder of our work, XCGN(X) denotes the CP-Net representation of X conforming to the XCGN grammar.

Definition 11. XML-Node: it is a super type designating an XML component. It has three main sub-types defined in the XCD-tree, namely XCD-Node:Element, XCD-Node:Attribute and XCD-Node:Text, where
• XCD-Node:Element defines the type XML Element
• XCD-Node:Attribute defines the type XML Attribute
• XCD-Node:Text defines the type XML Element/Attribute Value


We define a marking M, which will be used in the rest of this paper, over a place p allowing us to retrieve the value in p.

Definition 12. M: it is a marking over p defined as $M(p)$ where
• $p \in P$ and $M(p)$ is the value of the token in place p
• $M_0$ denotes the set of initial markings of P and $M_0(p)$ the initial marking of p, where $M_0(p) = I(p)$

Before defining the syntax of our language, we define an empty CP-Net ‘‘F’’.

Definition 13. F: it is an empty CP-Net and is defined as
$F = (\Sigma, P, T, A, C, G, E, I)$
where
• $\Sigma = \emptyset$
• $P = \emptyset$
• $T = \emptyset$
• $A = \emptyset$

Since the CP-Net is empty, its functions do not perform any operations. We now define the syntax of the XCDL core. As mentioned previously, the core of the language is defined by SD-functions, a sequential operator, a concurrency operator, and the composition realized between different instances of SD-functions and of sequential and concurrent operators. Thus, we first introduce the three main components of XCDL, as defined with XCGN: (i) SD-function, (ii) sequence operator ‘‘$\rightarrow$’’ and (iii) concurrency operator ‘‘$//\rightarrow$’’. The latter is an abstract operator denoting that the functions are concurrent. Then, we introduce the composition, which is defined mainly by two types: (i) a serial composition, which is a sequential composition between multiple instances of SD-functions and sequence operators as shown in Fig. 11a, and (ii) a concurrent composition, which maps multiple instances of SD-functions, through sequence operators, to a single instance of an SD-function as shown in Fig. 11b. The concurrency operator is used in this case to indicate that the SD-functions are concurrent. We next formally define an SD-function, which represents a function defined in the system's library, through a DLL file or web services, having one or multiple inputs and a single output.

Definition 14. SD-function: it is a system defined function based on CP-Nets, describing an operation based on an identified function in the system's library, and is defined as
$SD\text{-}function = (\Sigma, P, T, A, C, G, E, I)$
where
• $\Sigma$ is the set of colors defining the types of data available in the SD-function.
  – $\Sigma \subseteq XCGN.\Sigma$
• P is a finite set of places defining the input and output states of the SD-function.
  – $P = P_{In} \cup P_{Out}$ and $P_{In} \cap P_{Out} = \emptyset$ where $P_{In} = \{p_{In_0}, p_{In_1}, \ldots, p_{In_n}\}$ and $P_{Out} = \{p_{Out}\}$. $P_{In}$ and $P_{Out}$ represent respectively the set of input and output places of an SD-function.
• T is a finite set of transitions representing the behavior of the SD-function.
  – $T = \{t\}$ where t contains the operation to be executed.
• $A \subseteq (P_{In} \times \{t\}) \cup (\{t\} \times P_{Out})$ is a set of directed arcs associating input places to transitions and vice versa, where $P_{In} \times \{t\}$ indicates the set of arcs linking the input places to t and $\{t\} \times P_{Out}$ the arc linking t to the output place.
• $C: P \rightarrow \Sigma$ is the function associating a color to each place.
• $G: \{t\} \rightarrow S$ is the function associating an operation to t where $Type(G(t)) = C(p_{Out})$. The operation can be retrieved with a URI to the DLL file or a web service.
• $E: A \rightarrow Expr$ is the function associating an expression $expr \in Expr$ to $a \in A$:
  – Expr is a set of expressions where:
$$\forall expr \in Expr: \; expr = \begin{cases} M(a.p) & \text{if } a.p \neq p_{Out} \\ G(a.t) & \text{otherwise} \end{cases}$$
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places.


In addition to the graphical representation and mathematical syntax of a component in XCDL, it can have a detailed textual expression which can be used for analysis purposes. This expression is defined as follows. Consider a as an instance of an SD-function. a has two input places ($i_1$ and $i_2$) and a single output place (o). a is represented textually as
$a^{\{i_1, i_2\}}_{\{o\}}$
where the function name is written in the middle, the input places are written in a set of elements placed as a superscript on the right of the function name, and the output place is written in a set of elements as a subscript on the right of the function name. In Fig. 10, we give a graphical representation example of an SD-function and a sequence operator. We now define a Sequence operator, represented by ‘‘$\rightarrow$’’, which is used to map an output place of an SD-function to an input place of another.

Definition 15. Sequence ‘‘$\rightarrow$’’: it is an operator mapping two places together and is defined as
$Sequence = (\Sigma, P, T, A, C, G, E, I)$
where
• $\Sigma$ is the set of CP-Net colors (data types) where $|\Sigma| = 1$
• P is a set of 2 places defining the input and output states of the Sequence operator: $P = P_{In} \cup P_{Out}$ and $P_{In} \cap P_{Out} = \emptyset$ where $P_{In} = \{p_{In}\}$ and $P_{Out} = \{p_{Out}\}$, $p_{In}$ being the input place and $p_{Out}$ the output place
• $T = \{t\}$ where t contains the sequence operator
• $A = (\{p_{In}\} \times \{t\}) \cup (\{t\} \times \{p_{Out}\}) = \{a_{In}, a_{Out}\}$, where $a_{In}$ and $a_{Out}$ are directed arcs associating respectively the input place $p_{In}$ to transition t and t to the output place $p_{Out}$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where $C(p_{In}) = C(p_{Out})$
• G is a function over T where $Type(G(t)) = C(p_{In}) \wedge G(t) = M(p_{In})$, $M(p_{In})$ being the marking of $p_{In}$
• $E: A \rightarrow Expr$ is the function associating an expression $expr \in Expr$ to $a \in A$:
  – Expr is a set of expressions where:
$$\forall expr \in Expr: \; expr = \begin{cases} M(a.p) & \text{if } a.p \neq p_{Out} \\ G(a.t) & \text{otherwise} \end{cases}$$
• $I: P_{Out} \rightarrow Value$ is the function associating initial values to the output place
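To make Definitions 14 and 15 more concrete, the following Python sketch models an SD-function as a net with a single transition and a Sequence operator as a transition that forwards a token unchanged. It is a simplified illustration, not the prototype's implementation: markings are plain dictionaries, colors are Python types, and all class and place names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Marking = Dict[str, Any]  # M: place name -> token value


@dataclass
class SDFunction:
    name: str
    inputs: Tuple[str, ...]        # P_In
    output: str                    # P_Out = {p_Out}
    colors: Dict[str, type]        # C: place -> color (a Python type here)
    operation: Callable[..., Any]  # G(t), the operation held by transition t

    def enabled(self, m: Marking) -> bool:
        """The transition is enabled once every input place holds a token."""
        return all(p in m for p in self.inputs)

    def fire(self, m: Marking) -> Marking:
        if not self.enabled(m):
            raise RuntimeError(f"{self.name}: missing input token")
        args = [m[p] for p in self.inputs]
        for place, value in zip(self.inputs, args):
            if not isinstance(value, self.colors[place]):
                raise TypeError(f"{place}: token does not match its color")
        result = self.operation(*args)
        if not isinstance(result, self.colors[self.output]):
            raise TypeError("Type(G(t)) must equal C(p_Out)")
        # Consume the input tokens and produce the output token.
        new_marking = {p: v for p, v in m.items() if p not in self.inputs}
        new_marking[self.output] = result
        return new_marking


@dataclass
class Sequence:
    p_in: str
    p_out: str

    def fire(self, m: Marking) -> Marking:
        """G(t) = M(p_In): move the token from p_In to p_Out unchanged."""
        new_marking = dict(m)
        new_marking[self.p_out] = new_marking.pop(self.p_in)
        return new_marking
```

For example, a `concat` function with input places `i1` and `i2` fires on the marking `{"i1": "XC", "i2": "DL"}` and yields `{"o": "XCDL"}`; a `Sequence("o", "next.i1")` then moves that token to the input place of the next function.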

Similar to the SD-function, a sequence has a textual expression defined as follows. Consider $\rightarrow_1$ as an instance of a sequence. $\rightarrow_1$ has a single input place (i) and a single output place (o). $\rightarrow_1$ is represented textually as
${}^{\{i\}}_{\{o\}}\rightarrow_1$
where the sequence name is written in the middle, the input place is written in a set of elements placed as a superscript on the left of the sequence name, and the output place is written in a set of elements as a subscript on the left of the sequence name. We next define the concurrency operator.

Definition 16. Concurrency ‘‘$//\rightarrow$’’: it is an abstract operator indicating that multiple instances of SD-functions are concurrent. It is composed of two operators:
• Parallel ‘‘$//$’’ is an abstract operator denoting that SD-functions are parallel and independent from each other
• Sequence ‘‘$\rightarrow$’’ is a sequence operator

The concurrency operator is an abstract operator and thus has neither a graphical representation nor a detailed textual expression, since it does not have any input or output places. In the XCDL core, we separate the composition into a Serial Composition (SC), mapping sequentially several instances of SD-functions, and a Concurrent Composition (CC), mapping several instances of SD-functions sequentially to a single instance of an SD-function. Fig. 11a and b illustrate respectively a serial composition and a concurrent composition.


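Definition 17. SC: it is a Serial Composition, $SC = \prod_{i=0}^{n} SDF_i \rightarrow_i$, linking sequentially n instances of SD-functions using n − 1 instances of Sequence operators, and is compliant with CP-Nets. It is defined as
$SC = \prod_{i=0}^{n} SDF_i \rightarrow_i = (\Sigma, P, T, A, C, G, E, I)$
where
• $SDF_i$ is an SD-function where $\forall i, j \in [0, n]$, $SDF_i \neq SDF_j$ for $i \neq j$
• $\rightarrow_i$ is a Sequence operator where:
  – $\rightarrow_i.\Sigma \subseteq SDF_i.\Sigma$
  – $\rightarrow_i.P_{In} = SDF_i.P_{Out}$ and $\rightarrow_i.P_{Out} \in SDF_{i+1}.P_{In}$
  – $\rightarrow_n = (\emptyset, \emptyset, \emptyset, \emptyset, C, G, E, I)$ is an empty CP-Net
• $\Sigma = \bigcup_{i=0}^{n} SDF_i.\Sigma$
• $P = P_{In} \cup P_{Out}$ where $P_{In} = \bigcup_{i=0}^{n} SDF_i.P_{In}$ and $P_{Out} = \bigcup_{i=0}^{n} SDF_i.P_{Out}$
• $T = \bigcup_{i=0}^{n} (SDF_i.T \cup \rightarrow_i.T)$
• $A = \bigcup_{i=0}^{n} (SDF_i.A \cup \rightarrow_i.A)$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where
$$\forall t \in T, \; G(t) = \begin{cases} SDF_i.G(t), & t \in \bigcup_{i=0}^{n} SDF_i.T \\ \rightarrow_i.G(t), & t \in \bigcup_{i=0}^{n} \rightarrow_i.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I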

Definition 18. CC: it is a Concurrent Composition, $CC = \prod_{i=0}^{n} (SDF_i \rightarrow_i SDF_{n+1})//$, linking n instances of SD-functions using n instances of Sequence operators concurrently to an instance of an SD-function, and is compliant with CP-Nets. It is defined as
$CC = \prod_{i=0}^{n} (SDF_i \rightarrow_i SDF_{n+1})// = (\Sigma, P, T, A, C, G, E, I)$
where
• $SDF_i$ and $SDF_{n+1}$ are SD-functions where $\forall i \in [0, n+1]$ and $\forall j \in [0, n+1]$, $SDF_i \neq SDF_j$ for $i \neq j$
• $\rightarrow_i$ is a Sequence operator where:
  – $\rightarrow_i.\Sigma \subseteq SDF_i.\Sigma$
  – $\rightarrow_i.P_{In} = SDF_i.P_{Out}$ and $\rightarrow_i.P_{Out} \in SDF_{n+1}.P_{In}$
• $\Sigma = \bigcup_{i=0}^{n+1} SDF_i.\Sigma$
• $P = P_{In} \cup P_{Out}$ where $P_{In} = \bigcup_{i=0}^{n+1} SDF_i.P_{In}$ and $P_{Out} = \bigcup_{i=0}^{n+1} SDF_i.P_{Out}$
• $T = \bigcup_{i=0}^{n} (SDF_i.T \cup \rightarrow_i.T) \cup SDF_{n+1}.T$
• $A = \bigcup_{i=0}^{n} (SDF_i.A \cup \rightarrow_i.A) \cup SDF_{n+1}.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where
$$\forall t \in T, \; G(t) = \begin{cases} SDF_i.G(t), & t \in \bigcup_{i=0}^{n+1} SDF_i.T \\ \rightarrow_i.G(t), & t \in \bigcup_{i=0}^{n} \rightarrow_i.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I
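The intent of the two compositions can be sketched with ordinary Python callables standing in for SD-functions. This is only an analogy under our own assumptions: the helpers `serial` and `concurrent` are hypothetical names, each branch is taken to have a single input, and a thread pool is used merely to convey that the branches are independent. The actual semantics of XCDL are those of CP-Net firing, not of Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def serial(*functions):
    """SDF_0 -> SDF_1 -> ... : each output token feeds the next function."""
    def composed(value):
        for function in functions:
            value = function(value)
        return value
    return composed


def concurrent(branches, target):
    """(SDF_0 // ... // SDF_n) -> target.

    Each branch is evaluated independently on its own input; the branch
    outputs become the inputs of `target`, in branch order.
    """
    def composed(*inputs):
        if len(inputs) != len(branches):
            raise ValueError("one input per branch is expected")
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(b, x) for b, x in zip(branches, inputs)]
            results = [f.result() for f in futures]
        return target(*results)
    return composed
```

For instance, `serial(str.strip, str.upper)` behaves like a two-step serial composition, while `concurrent([str.strip, str.upper], lambda a, b: a + b)` behaves like two independent functions mapped to a concatenation.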

Now that we have defined the main components and compositions in XCDL, we give a full illustration of the textual representations in XCDL. A composition in XCDL can have a detailed textual expression which can be used for analysis purposes. This expression is shown in the following example. Consider a, b and c as instances of SD-functions. a and c have two inputs ($i_1$ and $i_2$) and a single output (o), and b has a single input ($i_1$) and a single output (o). a and b are in concurrency with c as shown in Fig. 12. This representation will be detailed in future work, where we discuss analysis and optimization techniques. In the following section, we identify the properties of the XCDL algebra.


Fig. 12. $(a//b) \rightarrow c$.

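Table 2
XCDL algebra properties.

Property                               Expression
Associative property of sequence       $(a \rightarrow b) \rightarrow c = a \rightarrow (b \rightarrow c)$
Associative property of concurrency    $((a//b)//c) \rightarrow d = (a//(b//c)) \rightarrow d$
Commutative property of concurrency    $(a//b) \rightarrow c = (b//a) \rightarrow c$
Distributive property of concurrency   $(a//b) \rightarrow c = ((a \rightarrow c)//(b \rightarrow c))$
Sequence identity property             $a \rightarrow F = a$
Concurrency identity property          $a //\rightarrow F = a$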

5.4. XCDL algebra properties

Since the XCDL is a visual language and the composition is done via drag and drop, the order in which the user adds functions and maps them together is arbitrary. Nonetheless, this does not affect the resulting composition. We show this by proving that the composition is associative, along with the other properties stated below. Consider a, b, c and d instances of SD-functions. We identify the properties presented in Table 2. The proofs of the algebra properties are given below with regard to the operators defined previously (sequence ‘‘$\rightarrow$’’ and concurrency ‘‘$//\rightarrow$’’). It is important to note that the concurrency operator is composed of two operators, as defined earlier.

5.4.1. Associative property of sequence: $(a \rightarrow b) \rightarrow c = a \rightarrow (b \rightarrow c)$

Consider the following compositions SC1, SC2, SC and SC' where:
• $SC_1 = (sdf_1 \rightarrow_1 sdf_2)$
• $SC_2 = (sdf_2 \rightarrow_2 sdf_3)$
• $SC = (sdf_1 \rightarrow_1 sdf_2) \rightarrow_2 sdf_3$
• $SC' = sdf_1 \rightarrow_1 (sdf_2 \rightarrow_2 sdf_3)$

In order to prove the associative property of sequence (SC = SC'), we need to prove that XCGN(SC) = XCGN(SC').

Proof:
• $SC = (\Sigma, P, T, A, C, G, E, I)$
• $sdf_1$, $sdf_2$ and $sdf_3$ are SD-functions
• $\{\rightarrow_1, \rightarrow_2\}$ are Sequence operators where:
  – $\rightarrow_1.\Sigma \subseteq sdf_1.\Sigma = SC'.\rightarrow_1.\Sigma$
  – $\rightarrow_1.p_{In} \in sdf_1.P_{Out}$ and $\rightarrow_1.p_{Out} \in sdf_2.P_{In} = SC'.\rightarrow_1.P$
  – $\rightarrow_2.\Sigma \subseteq sdf_2.\Sigma = SC'.\rightarrow_2.\Sigma$
  – $\rightarrow_2.p_{In} \in SC_1.P_{Out}$, $\rightarrow_2.p_{In} \in sdf_2.P_{Out}$ and $\rightarrow_2.p_{Out} \in sdf_3.P_{In} = SC'.\rightarrow_2.P$
• $\Sigma = SC_1.\Sigma \cup sdf_3.\Sigma = sdf_1.\Sigma \cup sdf_2.\Sigma \cup sdf_3.\Sigma = sdf_1.\Sigma \cup SC_2.\Sigma = SC'.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = SC_1.P_{In} \cup sdf_3.P_{In} = sdf_1.P_{In} \cup sdf_2.P_{In} \cup sdf_3.P_{In} = sdf_1.P_{In} \cup SC_2.P_{In} = SC'.P_{In}$
  – $P_{Out} = SC_1.P_{Out} \cup sdf_3.P_{Out} = sdf_1.P_{Out} \cup sdf_2.P_{Out} \cup sdf_3.P_{Out} = sdf_1.P_{Out} \cup SC_2.P_{Out} = SC'.P_{Out}$
• $T = SC_1.T \cup \rightarrow_2.T \cup sdf_3.T = sdf_1.T \cup \rightarrow_1.T \cup sdf_2.T \cup \rightarrow_2.T \cup sdf_3.T = sdf_1.T \cup \rightarrow_1.T \cup SC_2.T = SC'.T$
• $A = SC_1.A \cup \rightarrow_2.A \cup sdf_3.A = sdf_1.A \cup \rightarrow_1.A \cup sdf_2.A \cup \rightarrow_2.A \cup sdf_3.A = sdf_1.A \cup \rightarrow_1.A \cup SC_2.A = SC'.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where:
$$\forall t \in T, \; G(t) = \begin{cases} SD\text{-}function.G(t), & t \in sdf_1.T \cup sdf_2.T \cup sdf_3.T \\ Sequence.G(t), & t \in \rightarrow_1.T \cup \rightarrow_2.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(SC) = (SC'.\Sigma, SC'.P, SC'.T, SC'.A, C, G, E, I) = XCGN(SC')$.
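The proof above reduces the associativity of sequence to the associativity of set union over the components of the nets. The following Python sketch checks this on a toy instance. It is illustrative only: it keeps just the places, transitions and arcs of each net, and the `Net`, `sdf` and `seq` helpers, as well as the naming of places, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    """Only P, T and A are kept; colors and expressions are omitted."""
    places: frozenset
    transitions: frozenset
    arcs: frozenset

    def union(self, other: "Net") -> "Net":
        return Net(self.places | other.places,
                   self.transitions | other.transitions,
                   self.arcs | other.arcs)


def sdf(name: str, n_inputs: int) -> Net:
    """An SD-function: one transition, n input places, one output place."""
    inputs = [f"{name}.in{k}" for k in range(n_inputs)]
    output = f"{name}.out"
    arcs = {(p, name) for p in inputs} | {(name, output)}
    return Net(frozenset(inputs + [output]), frozenset([name]), frozenset(arcs))


def seq(label: str, source: str, target_place: str) -> Net:
    """A Sequence operator from the output place of `source` to `target_place`."""
    p_in = f"{source}.out"
    return Net(frozenset([p_in, target_place]), frozenset([label]),
               frozenset({(p_in, label), (label, target_place)}))


def sequence_is_associative() -> bool:
    a, b, c = sdf("a", 1), sdf("b", 1), sdf("c", 1)
    s1 = seq("->1", "a", "b.in0")
    s2 = seq("->2", "b", "c.in0")
    left = a.union(s1).union(b).union(s2).union(c)   # (a -> b) -> c
    right = a.union(s1).union(b.union(s2).union(c))  # a -> (b -> c)
    return left == right
```

Because every component of the composed net is built by set union, the grouping of the operands cannot change the result, which is what `sequence_is_associative()` confirms for this instance.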

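5.4.2. Distributive property of concurrency: $(a//b) \rightarrow c = ((a \rightarrow c)//(b \rightarrow c))$

Consider the following compositions CC, CC' where:
• $CC = (sdf_1 // sdf_2) \rightarrow sdf_3$
• $CC' = (sdf_1 \rightarrow_1 sdf_3) // (sdf_2 \rightarrow_2 sdf_3)$

In order to prove the distributive property of concurrency (CC = CC'), we need to prove that XCGN(CC) = XCGN(CC').

Proof:
• $CC = (\Sigma, P, T, A, C, G, E, I)$
• $sdf_1$, $sdf_2$ and $sdf_3$ are SD-functions
• $\rightarrow$ is a set of Sequence operators, $\rightarrow = \{\rightarrow_1, \rightarrow_2\}$ where:
  – $\rightarrow.\Sigma = \{\rightarrow_1.\Sigma \mid \rightarrow_1.\Sigma \subseteq sdf_1.\Sigma\} \cup \{\rightarrow_2.\Sigma \mid \rightarrow_2.\Sigma \subseteq sdf_2.\Sigma\}$
  – $\rightarrow.P = \rightarrow.P_{In} \cup \rightarrow.P_{Out}$
  – $\rightarrow.P_{In} = \{(\rightarrow_1.p_{In} \mid \rightarrow_1.p_{In} \in sdf_1.P_{Out}), (\rightarrow_2.p_{In} \mid \rightarrow_2.p_{In} \in sdf_2.P_{Out})\}$
  – $\rightarrow.P_{Out} = \{(\rightarrow_1.p_{Out} \mid \rightarrow_1.p_{Out} \in sdf_3.P_{In}), (\rightarrow_2.p_{Out} \mid \rightarrow_2.p_{Out} \in sdf_3.P_{In})\}$
• $\Sigma = sdf_1.\Sigma \cup sdf_2.\Sigma \cup sdf_3.\Sigma = CC'.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = sdf_1.P_{In} \cup sdf_2.P_{In} \cup sdf_3.P_{In} = CC'.P_{In}$
  – $P_{Out} = sdf_1.P_{Out} \cup sdf_2.P_{Out} \cup sdf_3.P_{Out} = CC'.P_{Out}$
• $T = sdf_1.T \cup \rightarrow_1.T \cup sdf_2.T \cup \rightarrow_2.T \cup sdf_3.T = CC'.T$
• $A = sdf_1.A \cup \rightarrow_1.A \cup sdf_2.A \cup \rightarrow_2.A \cup sdf_3.A = CC'.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where:
$$\forall t \in T, \; G(t) = \begin{cases} SD\text{-}function.G(t), & t \in sdf_1.T \cup sdf_2.T \cup sdf_3.T \\ Sequence.G(t), & t \in \rightarrow_1.T \cup \rightarrow_2.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(CC) = (CC'.\Sigma, CC'.P, CC'.T, CC'.A, C, G, E, I) = XCGN(CC')$.

5.4.3. Associative property of concurrency: $((a//b)//c) \rightarrow d = (a//(b//c)) \rightarrow d$

Consider the following compositions CC1, CC2, CC, CC' where: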

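• $CC_1 = (sdf_1 // sdf_2) \rightarrow sdf_4 = (sdf_1 \rightarrow_1 sdf_4) // (sdf_2 \rightarrow_2 sdf_4)$
• $CC_2 = (sdf_2 // sdf_3) \rightarrow sdf_4 = (sdf_2 \rightarrow_2 sdf_4) // (sdf_3 \rightarrow_3 sdf_4)$
• $CC = ((sdf_1 // sdf_2) // sdf_3) \rightarrow sdf_4$
• $CC' = (sdf_1 // (sdf_2 // sdf_3)) \rightarrow sdf_4$

In order to prove the associative property of concurrency (CC = CC'), we need to prove that XCGN(CC) = XCGN(CC').

Proof: $CC = (\Sigma, P, T, A, C, G, E, I)$
• $sdf_1$, $sdf_2$, $sdf_3$ and $sdf_4$ are SD-functions
• $\rightarrow$ is a set of Sequence operators, $\rightarrow = \{\rightarrow_{12}, \rightarrow_3\} = \{\rightarrow_1, \rightarrow_2, \rightarrow_3\}$ where:
  – $\rightarrow.\Sigma = \{\rightarrow_1.\Sigma \mid \rightarrow_1.\Sigma \subseteq sdf_1.\Sigma\} \cup \{\rightarrow_2.\Sigma \mid \rightarrow_2.\Sigma \subseteq sdf_2.\Sigma\} \cup \{\rightarrow_3.\Sigma \mid \rightarrow_3.\Sigma \subseteq sdf_3.\Sigma\}$
  – $\rightarrow.P = \rightarrow.P_{In} \cup \rightarrow.P_{Out}$
  – $\rightarrow.P_{In} = \{(\rightarrow_1.p_{In} \mid \rightarrow_1.p_{In} \in sdf_1.P_{Out}), (\rightarrow_2.p_{In} \mid \rightarrow_2.p_{In} \in sdf_2.P_{Out}), (\rightarrow_3.p_{In} \mid \rightarrow_3.p_{In} \in sdf_3.P_{Out})\}$
  – $\rightarrow.P_{Out} = \{(\rightarrow_1.p_{Out} \mid \rightarrow_1.p_{Out} \in sdf_4.P_{In}), (\rightarrow_2.p_{Out} \mid \rightarrow_2.p_{Out} \in sdf_4.P_{In}), (\rightarrow_3.p_{Out} \mid \rightarrow_3.p_{Out} \in sdf_4.P_{In})\}$
• $\Sigma = CC_1.\Sigma \cup sdf_3.\Sigma \cup sdf_4.\Sigma = sdf_1.\Sigma \cup sdf_2.\Sigma \cup sdf_3.\Sigma \cup sdf_4.\Sigma = sdf_1.\Sigma \cup CC_2.\Sigma \cup sdf_4.\Sigma = CC'.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = CC_1.P_{In} \cup sdf_3.P_{In} \cup sdf_4.P_{In} = sdf_1.P_{In} \cup sdf_2.P_{In} \cup sdf_3.P_{In} \cup sdf_4.P_{In} = sdf_1.P_{In} \cup CC_2.P_{In} \cup sdf_4.P_{In} = CC'.P_{In}$
  – $P_{Out} = CC_1.P_{Out} \cup sdf_3.P_{Out} \cup sdf_4.P_{Out} = sdf_1.P_{Out} \cup sdf_2.P_{Out} \cup sdf_3.P_{Out} \cup sdf_4.P_{Out} = sdf_1.P_{Out} \cup CC_2.P_{Out} \cup sdf_4.P_{Out} = CC'.P_{Out}$
• $T = CC_1.T \cup \rightarrow_{12}.T \cup sdf_3.T \cup \rightarrow_3.T \cup sdf_4.T = sdf_1.T \cup \rightarrow_1.T \cup sdf_2.T \cup \rightarrow_2.T \cup sdf_3.T \cup \rightarrow_3.T \cup sdf_4.T = sdf_1.T \cup \rightarrow_1.T \cup CC_2.T \cup \rightarrow_{23}.T \cup sdf_4.T = CC'.T$
• $A = CC_1.A \cup \rightarrow_{12}.A \cup sdf_3.A \cup \rightarrow_3.A \cup sdf_4.A = sdf_1.A \cup \rightarrow_1.A \cup sdf_2.A \cup \rightarrow_2.A \cup sdf_3.A \cup \rightarrow_3.A \cup sdf_4.A = sdf_1.A \cup \rightarrow_1.A \cup CC_2.A \cup \rightarrow_{23}.A \cup sdf_4.A = CC'.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where:
$$\forall t \in T, \; G(t) = \begin{cases} SD\text{-}function.G(t), & t \in sdf_1.T \cup sdf_2.T \cup sdf_3.T \cup sdf_4.T \\ Sequence.G(t), & t \in \rightarrow_1.T \cup \rightarrow_2.T \cup \rightarrow_3.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(CC) = (CC'.\Sigma, CC'.P, CC'.T, CC'.A, C, G, E, I) = XCGN(CC')$.

5.4.4. Commutative property of concurrency: $(a//b) \rightarrow c = (b//a) \rightarrow c$

Consider the following compositions CC, CC' where:
• $CC = (sdf_1 // sdf_2) \rightarrow sdf_3$
• $CC' = (sdf_2 // sdf_1) \rightarrow sdf_3$

In order to prove the commutative property of concurrency (CC = CC'), we need to prove that XCGN(CC) = XCGN(CC').

Proof: $CC = (\Sigma, P, T, A, C, G, E, I)$
• $sdf_1$, $sdf_2$ and $sdf_3$ are SD-functions
• $\rightarrow$ is a set of Sequence operators, $\rightarrow = \{\rightarrow_1, \rightarrow_2\}$ where:
  – $\rightarrow.\Sigma = \{\rightarrow_1.\Sigma \mid \rightarrow_1.\Sigma \subseteq sdf_1.\Sigma\} \cup \{\rightarrow_2.\Sigma \mid \rightarrow_2.\Sigma \subseteq sdf_2.\Sigma\} = \{\rightarrow_2.\Sigma \mid \rightarrow_2.\Sigma \subseteq sdf_2.\Sigma\} \cup \{\rightarrow_1.\Sigma \mid \rightarrow_1.\Sigma \subseteq sdf_1.\Sigma\} = CC'.\rightarrow.\Sigma$
  – $\rightarrow.P = \rightarrow.P_{In} \cup \rightarrow.P_{Out}$
  – $\rightarrow.P_{In} = \{(\rightarrow_1.p_{In} \mid \rightarrow_1.p_{In} \in sdf_1.P_{Out}), (\rightarrow_2.p_{In} \mid \rightarrow_2.p_{In} \in sdf_2.P_{Out})\} = \{(\rightarrow_2.p_{In} \mid \rightarrow_2.p_{In} \in sdf_2.P_{Out}), (\rightarrow_1.p_{In} \mid \rightarrow_1.p_{In} \in sdf_1.P_{Out})\} = CC'.\rightarrow.P_{In}$
  – $\rightarrow.P_{Out} = \{(\rightarrow_1.p_{Out} \mid \rightarrow_1.p_{Out} \in sdf_3.P_{In}), (\rightarrow_2.p_{Out} \mid \rightarrow_2.p_{Out} \in sdf_3.P_{In})\} = \{(\rightarrow_2.p_{Out} \mid \rightarrow_2.p_{Out} \in sdf_3.P_{In}), (\rightarrow_1.p_{Out} \mid \rightarrow_1.p_{Out} \in sdf_3.P_{In})\} = CC'.\rightarrow.P_{Out}$
• $\Sigma = sdf_1.\Sigma \cup sdf_2.\Sigma \cup sdf_3.\Sigma = sdf_2.\Sigma \cup sdf_1.\Sigma \cup sdf_3.\Sigma = CC'.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = sdf_1.P_{In} \cup sdf_2.P_{In} \cup sdf_3.P_{In} = sdf_2.P_{In} \cup sdf_1.P_{In} \cup sdf_3.P_{In} = CC'.P_{In}$
  – $P_{Out} = sdf_1.P_{Out} \cup sdf_2.P_{Out} \cup sdf_3.P_{Out} = sdf_2.P_{Out} \cup sdf_1.P_{Out} \cup sdf_3.P_{Out} = CC'.P_{Out}$
• $T = sdf_1.T \cup \rightarrow_1.T \cup sdf_2.T \cup \rightarrow_2.T \cup sdf_3.T = sdf_2.T \cup \rightarrow_2.T \cup sdf_1.T \cup \rightarrow_1.T \cup sdf_3.T = CC'.T$
• $A = sdf_1.A \cup \rightarrow_1.A \cup sdf_2.A \cup \rightarrow_2.A \cup sdf_3.A = sdf_2.A \cup \rightarrow_2.A \cup sdf_1.A \cup \rightarrow_1.A \cup sdf_3.A = CC'.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where:
$$\forall t \in T, \; G(t) = \begin{cases} SD\text{-}function.G(t), & t \in sdf_1.T \cup sdf_2.T \cup sdf_3.T \\ Sequence.G(t), & t \in \rightarrow_1.T \cup \rightarrow_2.T \end{cases}$$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(CC) = (CC'.\Sigma, CC'.P, CC'.T, CC'.A, C, G, E, I) = XCGN(CC')$.

5.4.5. Identity property of sequence: $a \rightarrow F = a$

Consider the following composition:
• $SC = sdf \rightarrow F$

In order to prove the identity property of sequence (SC = sdf), we need to prove that XCGN(SC) = XCGN(sdf).

Proof: $SC = (\Sigma, P, T, A, C, G, E, I)$
• sdf is an SD-function and F is an empty net.
• $\rightarrow$ is a Sequence operator where, based on the Serial Composition $SC = \prod_{i=0}^{0} SDF_i \rightarrow_i$:
  – $\rightarrow_0 = (\emptyset, \emptyset, \emptyset, \emptyset, C, G, E, I)$ is an empty CP-Net
• $\Sigma = sdf.\Sigma \cup F.\Sigma = sdf.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = sdf.P_{In} \cup F.P_{In} = sdf.P_{In}$
  – $P_{Out} = sdf.P_{Out} \cup F.P_{Out} = sdf.P_{Out}$
• $T = sdf.T \cup \rightarrow.T \cup F.T = sdf.T$
• $A = sdf.A \cup \rightarrow.A \cup F.A = sdf.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where: $\forall t \in T, \; G(t) = SD\text{-}function.G(t), \; t \in sdf.T$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(SC) = (sdf.\Sigma, sdf.P, sdf.T, sdf.A, C, G, E, I) = XCGN(sdf)$.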

□

5.4.6. Identity property of concurrency: $a //\rightarrow F = a$

Consider the following composition:
• $CC = sdf //\rightarrow F$

In order to prove the identity property of concurrency (CC = sdf), we need to prove that XCGN(CC) = XCGN(sdf).

Proof: $CC = (\Sigma, P, T, A, C, G, E, I)$
• sdf is an SD-function and F is an empty net.
• $\Sigma = sdf.\Sigma \cup F.\Sigma = sdf.\Sigma$
• $P = P_{In} \cup P_{Out}$ where:
  – $P_{In} = sdf.P_{In} \cup F.P_{In} = sdf.P_{In}$
  – $P_{Out} = sdf.P_{Out} \cup F.P_{Out} = sdf.P_{Out}$
• $T = sdf.T \cup \rightarrow.T \cup F.T = sdf.T$
• $A = sdf.A \cup \rightarrow.A \cup F.A = sdf.A$
• $C: P \rightarrow \Sigma$ is the function associating a color to each place where C = SD-function.C
• G is a function over T where: $\forall t \in T, \; G(t) = SD\text{-}function.G(t), \; t \in sdf.T$
• $E: A \rightarrow Expr$ is the function associating an expression expr to an arc a where E = SD-function.E
• $I: P_{In} \rightarrow Value$ is the function associating initial values to the input places, I = SD-function.I

And thus, $XCGN(CC) = (sdf.\Sigma, sdf.P, sdf.T, sdf.A, C, G, E, I) = XCGN(sdf)$.

In the following section, we define the transformation syntax which will allow us to transform the XCDL core syntax into a graphical representation based on the components defined in the XCDL-GR model.

5.5. Transformation syntax

Before defining the transformation syntax, we formally introduce an abstract syntax which will ease the treatment of the graphical components and the transformation to the XCDL core syntax. Since the XCDL is based on CP-Nets, we identify the following main components: Color, Place, Transition and Arc.

Definition 19. AS: it is the abstract syntax for XCDL and is defined as
$AS = \langle F_\Sigma, F_P, F_T, F_A \rangle$
where
• $F_\Sigma: \Sigma \rightarrow C$ is a function associating an abstract drawing type Color to a type $e \in \Sigma$
• $F_P: P \rightarrow O$ is a function associating a drawing type Circle to a place $p \in P$
• $F_T: T \rightarrow R$ is a function associating a drawing type Rectangle to a transition $t \in T$
• $F_A: A \rightarrow L$ is a function associating a drawing type Line to an arc $a \in A$

Now, we define the transformation syntax which allows us to transform the XCDL syntax into the XCDL-GR model with the aid of the AS.


Definition 20. T is the transformation syntax and is defined as
$T = \langle TF_S, TF_F \rangle$
where
• $TF_S$ is a transformation function used to translate sequence operators into graphical data, $TF_S = \langle x_1, y_1, x_2, y_2, F_S \rangle$, where:
  – $x_1$, $y_1$, $x_2$ and $y_2$ are integers representing the values of 2 spatial points provided by the user's mouse clicks
  – $F_S: S \rightarrow D$ is the function applying the transformation from a drawing type $AD_1$ to a sequence $\rightarrow$, where $a_{In} = \rightarrow.a_{In}$ and $a_{Out} = \rightarrow.a_{Out}$, as:
$$\begin{cases} F_A(a_{In}).AD_1.P_1.x = x_1 \\ F_A(a_{In}).AD_1.P_1.y = y_1 \\ F_A(a_{In}).AD_1.P_2.x = \frac{x_1 + x_2}{2} \\ F_A(a_{In}).AD_1.P_2.y = \frac{y_1 + y_2}{2} \\ F_A(a_{Out}).AD_1.P_1.x = \frac{x_1 + x_2}{2} \\ F_A(a_{Out}).AD_1.P_1.y = \frac{y_1 + y_2}{2} \\ F_A(a_{Out}).AD_1.P_2.x = x_2 \\ F_A(a_{Out}).AD_1.P_2.y = y_2 \\ F_A(a).style = dashed, \; \forall a \in \rightarrow.A \end{cases}$$
• $TF_F$ is a transformation function used to translate an SD-function into graphical data and is defined as $TF_F = \langle x_1, y_1, x_2, y_2, h, w, h_t, w_t, img, F_F \rangle$, where:
  – $x_1$, $y_1$, $x_2$ and $y_2$ are integers representing the values of 2 points provided by the user's mouse clicks
  – h is an integer representing the maximum height between the first and last input places
  – w is an integer representing the distance between the transition and a place on the x-axis
  – $h_t$ and $w_t$ are integers representing respectively the height and width of a rectangle representing a transition
  – img is an image representing an SD-function
  – $F_F: F \rightarrow D$ is the function applying the transformation from a drawing type $AD_1$ to an SD-function, SDf, as:
    – $F_\Sigma(e)$ where $e \in SDf.\Sigma$
    – $F_P(p_i)$: for $n = |P_{In}|$, $i \in [0, n[$, $p_i \in SDf.P_{In}$ and $dy = \frac{h}{n}$,
$$\begin{cases} F_P(p_i).AD_1.P_1.y = y_1 - \frac{h}{2} + i \cdot dy, & i < \frac{n}{2} \\ F_P(p_{n-1-i}).AD_1.P_1.y = y_1 + \frac{h}{2} - i \cdot dy, & i < \frac{n}{2} \\ F_P(p_{n/2}).AD_1.P_1.y = y_1, & n \bmod 2 = 1 \\ F_P(p_i).AD_1.P_1.x = x_1 - \frac{w_t}{2} - w \end{cases}$$
      for $p_0 \in SDf.P_{Out}$,
$$\begin{cases} F_P(p_0).AD_1.P_1.y = y_1 \\ F_P(p_0).AD_1.P_1.x = x_1 + \frac{w_t}{2} + w \end{cases}$$
      and for $n = |P_{In}| + |P_{Out}|$, $i \in [0, n+m[$, $p_i \in SDf.P_{In} \cup SDf.P_{Out}$: $F_P(p_i).color = F_\Sigma(C(p_i))$
    – $F_T(t)$, $t \in SDf.T$:
$$\begin{cases} F_T(t).AD_1.P_1.x = x_1 - \frac{w_t}{2} \\ F_T(t).AD_1.P_1.y = y_1 - \frac{h_t}{2} \\ F_T(t).img = img \end{cases}$$
    – $F_A(a_i)$, $a_i \in SDf.A$, for $n = |P_{In}| + |P_{Out}|$, $i \in [0, n+m[$, $p_i \in P_{In} \cup P_{Out}$:
$$\begin{cases} F_A(a_i).AD_1.P_1.x = F_P(p_i).AD_1.P_1.x \\ F_A(a_i).AD_1.P_1.y = F_P(p_i).AD_1.P_1.y \\ F_A(a_i).AD_1.P_2.x = \begin{cases} F_T(t).AD_1.P_1.x - \frac{w}{2}, & p_i \in P_{In} \\ F_T(t).AD_1.P_1.x + \frac{w}{2}, & \text{otherwise} \end{cases} \\ F_A(a_i).AD_1.P_2.y = F_T(t).AD_1.P_1.y \\ F_A(a_i).style = normal \end{cases}$$

The transformations based on $TF_S$ and $TF_F$ are depicted in Fig. 13a and b respectively. In this section we defined a generic composition language which allows users to visually create functional compositions. The language syntax was defined in CP-Nets. The components and the composition results are all CP-Nets, which allows the composition to express true concurrency and parallelism.
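The geometry of the transformation function for sequence operators can be illustrated with a few lines of Python. This is a sketch under our own assumptions: the function name `sequence_arcs` and the dictionary layout are hypothetical, and integer division is used for the midpoint because the coordinates are defined as Integers (the rounding rule is not specified in the definition).

```python
def sequence_arcs(x1: int, y1: int, x2: int, y2: int):
    """Return the two dashed line segments drawn for a sequence operator.

    The input arc runs from the first mouse click to the midpoint of the
    two clicks, and the output arc from that midpoint to the second click.
    """
    # Assumption: the midpoint is rounded down to stay on integer coordinates.
    mid = ((x1 + x2) // 2, (y1 + y2) // 2)
    a_in = {"p1": (x1, y1), "p2": mid, "style": "dashed"}
    a_out = {"p1": mid, "p2": (x2, y2), "style": "dashed"}
    return a_in, a_out
```

Both segments share the midpoint, which is where the transition of the sequence operator sits between the two mapped places.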

6. Prototype and evaluation

To validate our language, we developed a prototype called Visual X-Manip, based on the XCDL core grammar, allowing us to draw XML oriented manipulation operations based on functions defined in the prototype's system libraries. The functions defined in the prototype were mainly functions allowing different string manipulations. The prototype was developed in VB.Net and therefore the string functions used were the ones defined in the VB.Net String Class (e.g., concat, trim, hashcode, etc.). The architecture of the prototype is shown in Fig. 14. The prototype is composed of four main modules, detailed below: (i) the internal data model, (ii) the library, (iii) the I/O XCD-trees and (iv) the composition platform.

Fig. 13. Transformation functions.

Fig. 14. Prototype architecture.

Fig. 15. Relational schemas compliant with XCGN.

6.1. Internal data model

The internal data model is based on XML and conforms to XCGN. Thus, the internal data model is defined with respect to CP-Nets as defined in Definition 10. It is composed of two relational schemas as shown in Fig. 15. The first one is the library schema, which defines the internal model of the SD-functions defined in the prototype library based on the syntax definition of SD-functions (cf. Definition 14). The second schema defines the internal model of the composition based on the composition syntax in Definitions 17 and 18.

6.2. Library

The library module is a set of graphical forms which allow the customization of the language and define the functions to be embedded in the library. The customizations are done by choosing which data types to include in the language, which color to give to each type, the dimensions to give to a transition ($h_t$ and $w_t$), the maximum height for the places in a function (h), the images used to describe the functions, etc. The functions are defined via a set of forms that initiate a function definition, define its transition (containing the operation to be executed) and define its I/O places. Fig. 16 presents some of these forms. In the current prototype, several basic functions for manipulating XML data have been embedded in the form of DLL files and web services. These functions range from simple selections to filtering, insertion and modification of XML data and their values.


Fig. 16. Library configuration forms.

Fig. 17. Composition platform modules.

Additional functions are being embedded, such as semantic filtering and XML similarity computations on both the structural and content sides. These functions have been conceptualized and defined in [23,24].

6.3. I/O representation

In this module, we defined an algorithm that generates an XCD-tree representing the structure of an XML document with repetition reduction. It takes an XML file as input and generates an ordered labeled tree as output, based on XCD-trees as shown in Fig. 6.
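The idea of a structure tree with repetition reduction can be illustrated with a minimal Python sketch. It is not the algorithm of the prototype (which was written in VB.Net and is not reproduced in this section). It assumes that repetition reduction means merging sibling nodes that share the same label, so that repeated elements such as RSS items appear once in the tree. The names XCDNode, build_xcd_tree and outline, the "#text" label and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class XCDNode:
    """One node of the structure tree (hypothetical model of an XCD-node)."""
    label: str
    kind: str  # "element", "attribute" or "text"
    children: List["XCDNode"] = field(default_factory=list)

    def child(self, label: str, kind: str) -> "XCDNode":
        """Return the child with this label and kind, creating it if absent."""
        for existing in self.children:
            if existing.label == label and existing.kind == kind:
                return existing
        node = XCDNode(label, kind)
        self.children.append(node)  # first-occurrence order is preserved
        return node


def _merge(element: ET.Element, node: XCDNode) -> None:
    """Fold the structure of `element` into `node`, merging repeated siblings."""
    for name in element.attrib:
        node.child(name, "attribute")
    if element.text and element.text.strip():
        node.child("#text", "text")  # text after child elements (tail) is ignored
    for sub in element:
        _merge(sub, node.child(sub.tag, "element"))


def build_xcd_tree(xml_text: str) -> XCDNode:
    """Parse an XML string and return its repetition-reduced structure tree."""
    root = ET.fromstring(xml_text)
    tree = XCDNode(root.tag, "element")
    _merge(root, tree)
    return tree


def outline(node: XCDNode, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per node."""
    lines = ["  " * depth + f"{node.label} ({node.kind})"]
    for sub in node.children:
        lines.extend(outline(sub, depth + 1))
    return lines


# Hypothetical simplified RSS document with two items.
RSS = """<rss version="2.0"><channel><title>News</title>
<item><title>A</title><description>x</description></item>
<item><title>B</title><description>y</description></item>
</channel></rss>"""

xcd_tree = build_xcd_tree(RSS)
print("\n".join(outline(xcd_tree)))
```

In this sketch the two item elements collapse into a single item node whose children are title and description, which is the kind of compact structure a user would map functions onto.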


Fig. 18. Composition platform.

6.4. Composition platform

The XCDL composition platform is the visual language editor where users can define their compositions, functions and I/O XCD-trees. It is a visual editor that instantiates SD-functions visually and maps them together along with the I/O XCD-trees, as discussed below. The platform, shown in Fig. 18, provides the user with a graphical interface for composing operations from the functions defined in the Library module, as shown in Fig. 17a. The platform is divided into four main sections: the XCD-tree:In, which shows the input XML structure; the XCD-tree:Out, which shows the output structure (cf. Fig. 17b); the SD-functions, which lists the SD-functions defined in the Library; and the Composition Workspace, where the composition is drawn by dragging and dropping SD-functions and sequentially mapping them by selecting an output and then an input. The composition is straightforward:

1. The user selects the desired XML document or fragment as an input feed.
2. An XML document, fragment or grammar is selected that defines the structure of the flow output.
3. The required manipulation operation is defined by dragging and dropping different instances of SD-functions into the workspace, then mapping them together.
4. The input feed is then mapped to the SD-functions present in the workspace by selecting a node from the input XCD-tree and mapping it to an SD-function.
5. The SD-functions are mapped to the output XCD-tree through simple mouse clicks.
6. The publish button is used to compile the composition and test it.
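The composition steps listed here can be sketched as a small data-flow structure in Python. This is only an illustration under stated assumptions, not the prototype's implementation: the class names SDFunction and Composition, the methods add, connect and publish, the node names and the Upper function are hypothetical. It is also assumed that each function has a single output and that a mapping is accepted only when the data types of its two ends agree, in the spirit of colored places.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SDFunction:
    """Hypothetical library function: typed inputs, one typed output."""
    name: str
    input_types: List[str]
    output_type: str
    operation: Callable[..., object]


@dataclass
class Composition:
    input_nodes: Dict[str, str]   # input XCD-node name -> data type
    output_nodes: Dict[str, str]  # output XCD-node name -> data type
    instances: Dict[str, SDFunction] = field(default_factory=dict)
    wires: Dict[Tuple[str, int], str] = field(default_factory=dict)
    results: Dict[str, str] = field(default_factory=dict)

    def add(self, instance_id: str, fn: SDFunction) -> None:
        """Step 3: drop an instance of a library function into the workspace."""
        self.instances[instance_id] = fn

    def _type_of(self, source: str) -> str:
        if source in self.input_nodes:
            return self.input_nodes[source]
        return self.instances[source].output_type

    def connect(self, source: str, target: str, slot: int = 0) -> None:
        """Steps 3 to 5: map an output (source) to an input (target)."""
        to_output = target in self.output_nodes
        expected = (self.output_nodes[target] if to_output
                    else self.instances[target].input_types[slot])
        if self._type_of(source) != expected:
            raise TypeError(f"{source} does not match the type of {target}")
        if to_output:
            self.results[target] = source
        else:
            self.wires[(target, slot)] = source

    def publish(self, feed: Dict[str, object]) -> Dict[str, object]:
        """Step 6: evaluate the composition on one input feed."""
        def value(source: str) -> object:
            if source in self.input_nodes:
                return feed[source]
            fn = self.instances[source]
            args = [value(self.wires[(source, i)])
                    for i in range(len(fn.input_types))]
            return fn.operation(*args)

        return {node: value(src) for node, src in self.results.items()}


# Steps 1 and 2: declare the input and output structures (names assumed unique).
composition = Composition({"rss/title": "TEXT"}, {"atom/title": "TEXT"})
composition.add("upper#1", SDFunction("Upper", ["TEXT"], "TEXT", str.upper))
composition.connect("rss/title", "upper#1")
composition.connect("upper#1", "atom/title")
published = composition.publish({"rss/title": "pollution"})
print(published)
```

Checking types when a mapping is made, rather than when the composition is published, mirrors the way a visual editor can refuse an invalid connection at the moment the user draws it.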

To illustrate the use of XCDL, we go back to scenario 1: "A journalist wants to filter RSS feeds based on a specific topic (e.g., pollution), check for information redundancy by comparing the similarity of the feeds and generate an ATOM XML based report with the retrieved sensitive data". Fig. 17 shows the composed operation defined based on CP-Nets. To define the composition operation, we first define the input and output content description structures in the form of XCD-trees. The generated input and output XCD-trees represent, respectively, a simplified RSS structure and an ATOM structure. Different SD-functions are shown in the SD-functions section. In this scenario, to create the composition, the user first checks the available SD-functions. The user starts by selecting the textual content values of the Elements title and description, as shown above, using the Extract Data function, which extracts the values from the selected XCD-nodes. Then, the user adds a separator character "*" to the value extracted from the Element title using the Concat function. The user concatenates the data extracted from the Element description with the title using another instance of the Concat function.
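The stage described so far (two Extract Data instances followed by two Concat instances) can be mimicked with a few lines of Python. This is a hedged sketch rather than the behavior of the prototype's functions: extract_data and concat are hypothetical stand-ins for the Extract Data and Concat SD-functions, and the sample item is invented for illustration.

```python
import xml.etree.ElementTree as ET


def extract_data(item: ET.Element, tag: str) -> str:
    """Stand-in for Extract Data: text of the child element `tag`, or ""."""
    return item.findtext(tag, default="")


def concat(left: str, right: str) -> str:
    """Stand-in for Concat: join two text values."""
    return left + right


# Hypothetical RSS item used as the input feed.
ITEM = ET.fromstring(
    "<item><title>Air pollution rises</title>"
    "<description>Levels doubled.</description></item>"
)

title = extract_data(ITEM, "title")
description = extract_data(ITEM, "description")

# First Concat adds the separator to the title, the second appends the description.
merged = concat(concat(title, "*"), description)
print(merged)
```

Keeping the separator between the two values is what allows a later stage to recover the title and the description from the merged text.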


At this stage, the composition merges two TEXT nodes together with the separator character "*" and discards the rest of the RSS data. The user then sequentially adds a Select function to the merged output, which discards all data not containing the specified topic. To compare the feeds, the user inserts a Wait For function and gives it two as an argument, so that it waits until it receives two data values and then transmits them simultaneously as separate outputs. At this point, the user is able to filter the feeds based on a certain topic and extract the sensitive data in textual format. Next, the user adds a Redundancy removal function. After removing all redundancies, the user directs the output to a Split operator, which splits the data based on the separating character "*" and maps the data to the TEXT-typed XCD-nodes, title and description, in the output XCD-tree as shown in Fig. 15. The Split operator, along with other operators, will be detailed in future work.

6.5. Evaluation

In order to evaluate our language, we studied different analysis and evaluation models such as [25,26] and defined the VPL evaluation model shown in Fig. 19. We provided four use case scenarios that were executed by a number of participants. A questionnaire survey technique was used to collect data. The collected data was then analyzed based on the evaluation model. The language was tested, validated and evaluated alongside YahooPipes and IBM Damia, which are the closest to our work. After evaluating the quality of visualization, interaction and use, we derived the quality of language for XCDL, shown in Fig. 20. The XCDL quality of use was evaluated to be better than that of IBM Damia and YahooPipes in terms of XML-oriented visual manipulations. While XCDL received over 78% positive feedback on each of the quality of visualization, interaction and use, both IBM Damia and YahooPipes received less than 78% favorable feedback on all the quality factors. Nevertheless, in XCDL the quality of interaction received the least positive feedback of the three, which was anticipated given the lack of error handling and the existing bugs in the Visual X-Man prototype. It is interesting to note that the quality of visualization was assessed as positive by over 87% of the participants, which is remarkable since both IBM Damia and YahooPipes scored below 79%. A detailed discussion of the evaluation and the evaluation framework can be found online at http://www.xman.jsho.org/evaluation/.

7. Conclusion and future works

In this paper, we discussed the issues regarding XML data manipulation by non-expert users and presented our language, XCDL, as a visual language for XML manipulations. In the literature, we identified two main approaches: (i) Mashups and (ii) XML visual languages. On one hand, while Mashups provide non-expert programmers with means to draw manipulation operations for web data by means of compositions, they are not XML specific and have not been formally defined yet. On the other hand, XML visual languages are mainly oriented towards XML; nevertheless, they are limited to data extraction and structure transformation and have limited expressiveness due to the languages they are based upon. To solve these issues, we presented the XCDL language, an XML-oriented visual composition language for defining XML manipulation operations by functional compositions. The language was based on CP-Nets in order to provide scalability, flexibility, adaptability, expressivity, and performance analysis. The language was implemented in a prototype

Fig. 19. VPL evaluation model.


Fig. 20. Quality of language. (A) Quality of visualization; (B) Quality of interaction and (C) Quality of use.

developed in VB.Net, which allows users to customize the language and compose XML manipulation operations. The main track for future work is to extend the XCDL core to include composed functions. In other words, a user should be able to define and embed functions based on existing compositions. A second track is to provide decision and loop operators, such as if and for, which will increase the expressiveness of the language.

References

[1] G.D. Lorenzo, et al., Data integration in mashups, SIGMOD Record 38 (2009) 59–66.
[2] G. Tekli, et al., XCDL: an XML-oriented visual composition definition language, in: the 12th International Conference on Information Integration and Web-based Applications and Services (iiWAS2010), 2010, pp. 134–143.
[3] P. Bottoni, et al., e-document management in situated interactivity: the WIL approach, Universal Access in the Information Society 8 (2009) 137–153.
[4] F. Ferri, et al., Using shape to index and query web document contents, Journal of Visual Languages and Computing 13 (2002) 355–373.
[5] P. Bottoni, et al., Specifying dialog control in visual interactive systems, Journal of Visual Languages and Computing 9 (1998) 535–564.
[6] R.J. Ennals, M.N. Garofalakis, MashMaker: mashups for the masses, in: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, 2007, pp. 1116–1118.
[7] D.E. Simmen, et al., Damia: data mashups for intranet applications, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, 2008, pp. 1171–1182.
[8] T. Loton, Introduction to Microsoft Popfly, No Programming Required, Lotontech Limited, 2008, p. 128, ISBN-10: 0-9556764-3-6, ISBN-13: 978-0-9556764-3-7.
[9] M. Erwig, A visual language for XML, in: IEEE Symposium on Visual Languages, 0, 2000, pp. 47–54.
[10] S. Ceri, et al., Complex queries in XML-GL, in: Proceedings of the 2000 ACM Symposium on Applied Computing, vol. 2, Como, Italy, 2000, pp. 888–893.
[11] S. Ceri, et al., XML-GL: a graphical language for querying and restructuring XML documents, in: SEBD, 1999, pp. 151–165.
[12] D. Braga, et al., XQBE (XQuery By Example): a visual interface to the standard XML query language, ACM Transactions on Database Systems 30 (2005) 398–443.
[13] E. Pietriga, et al., VXT: a visual approach to XML transformations, in: Proceedings of the 2001 ACM Symposium on Document Engineering, Atlanta, Georgia, USA, 2001, pp. 1–10.
[14] M. Kay, XSL Transformations (XSLT) Version 2.0. Available from: http://www.w3.org/TR/2007/REC-xslt20-20070123/, 2007.
[15] W3C, Extensible Stylesheet Language Transformations-XSLT 1.0. Available from: http://www.w3.org/TR/xslt, 1999.
[16] E.J. Golin, S.P. Reiss, The specification of visual language syntax, Journal of Visual Languages and Computing 1 (1990) 141–157.


[17] D.D. Hils, Visual languages and computing survey: data flow visual programming languages, Journal of Visual Languages and Computing 3 (1992) 69–101.
[18] K. Jensen, An introduction to the theoretical aspects of coloured Petri Nets, in: A Decade of Concurrency, Reflections and Perspectives, REX School/Symposium, 1994, pp. 230–272.
[19] T. Murata, Petri Nets: properties, analysis and applications, in: Proceedings of the IEEE, 1989, pp. 541–580.
[20] G. Tekli, et al., XA2C framework for XML alteration/adaptation, in: S.Y. Shin, et al. (Eds.), Reliable and Autonomous Computational Science, Springer Basel, USA, 2010, pp. 327–346.
[21] G. Tekli, et al., Towards an XML adaptation/alteration control framework, in: Proceedings of the 2010 Fifth International Conference on Internet and Web Applications and Services, 2010, pp. 248–255.
[22] T. Dalamagas, et al., A methodology for clustering XML documents by structure, Information Systems 31 (2006) 187–228.
[23] F.G. Taddesse, et al., Semantic-based merging of RSS items, World Wide Web 13 (2010) 169–207.
[24] J. Tekli, R. Chbeir, A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics, Web Semantics 11 (2012) 14–40.
[25] T.R.G. Green, M. Petre, Usability analysis of visual programming environments: a cognitive dimensions framework, Journal of Visual Languages and Computing 7 (1996) 131–174.
[26] D. Marghescu, et al., Evaluating the quality of use of visual data-mining tools, in: Proceedings of the 11th European Conference on IT Evaluation (ECITE 2004), Netherlands, 2004, pp. 239–250.