Journal of Statistical Planning and Inference 133 (2005) 69–93. www.elsevier.com/locate/jspi

Temporal aggregation in chain graph models

Juan Ferrándiz (a), Enrique F. Castillo (b), Pilar Sanmartín (c,*)

(a) Departament d'Estadística i Investigació Operativa, Universitat de València Estudi General, Spain
(b) Departamento de Matemática Aplicada y Ciencias de la Computación, Universidad de Cantabria, Spain
(c) Departamento de Matemática Aplicada y Estadística, Universidad Politécnica de Cartagena, Paseo Alfonso XIII 52, Cartagena, Spain

* Corresponding author. Fax: +34 968 325694. E-mail address: [email protected] (P. Sanmartín).

Received 18 September 2001; accepted 10 March 2004. Available online 9 June 2004.
This paper is dedicated to Juan Ferrándiz, with friendship.
doi:10.1016/j.jspi.2004.03.012

Abstract

The dependence structure of an observed process induced by temporal aggregation of a time-evolving hidden spatial phenomenon is addressed. Data are described by means of chain graph models, and an algorithm to compute the chain graph resulting from the temporal aggregation of a directed acyclic graph is provided. This chain graph is the best graph covering the independencies of the resulting process within the chain graph class. A sufficient condition that produces a memory loss of the observed process with respect to its hidden origin is analyzed. Some examples are used to illustrate the algorithms and results.
© 2004 Elsevier B.V. All rights reserved.

MSC: 62H11; 62H05; 68R10

Keywords: Factorization; Independence graphs; Spatial–temporal models
1. Introduction

When modelling spatio–temporal data in environmental studies, observations often arise by temporal aggregation of unobserved underlying values (see Irwin et al., 2002). This is usually the case in epidemiological studies, where mortality and morbidity counts in geographical regions are reported as sums of data produced at a higher rate. As Cressie (1991) points out, "the change in time scale is crucial in determining whether any
space–time model should have a purely spatial component". If observations are made at a high frequency, the assumption of no instantaneous spread of disease seems quite natural. This implies that spatial observations at any time are conditionally independent of each other given the past data. However, if data are subject to the above-mentioned temporal aggregation, perhaps due to the complexity of the collecting procedures, new temporal and contemporary spatial dependences can appear between geographical regions.

The spatio–temporal dependence structure of the data can be described by means of chain independence graphs, which are natural extensions of the undirected independence graphs used in the purely spatial case (see Besag, 1974; Cressie, 1991; Darroch et al., 1980; Winkler, 1995; Lauritzen, 1996). Guyon (1995) introduces the concept of the Markovian dynamic of a Markovian field, which can also be handled in this way: sets of simultaneously observed values are taken as elements of a partition of the set of vertices defining the dependence chain. Lacruz et al. (2000) consider dynamic graphical models for the study of nonhomogeneous hidden Markov models. For causality issues see Pearl (2002).

An alternative for the study of multivariate data in time is given by Dahlhaus (2000), who proposes partial correlation undirected graphs to describe the linear dependence structure of multivariate time series, in which every node represents a univariate time-series component. Moreover, Dahlhaus and Eichler (2003) introduce the concept of a time-series chain graph based on the AMP Markov property (Andersson et al., 2001). They refer to Lynggaard and Walther (1993) for an alternative definition based on the classical LWF Markov property for chain graphs (Lauritzen and Wermuth, 1989; Frydenberg, 1990a). This particular use of chain graphs can complement those given by Studeny and Bouckaert (1998) and Lauritzen (1996) (see also Buntine, 1995; Meek, 1995; Richardson, 1998; Wermuth and Lauritzen, 1990; Whittaker, 1990; Mohamed et al., 1998). More recently, Lauritzen and Richardson (2002) gave a detailed study of chain graph models and their causal interpretations, showing that it is difficult, but possible, to give examples where chain graphs capture all the independence structure of a causally generated system.

The aim of this paper is to study the dependence structure of the observable process resulting from aggregation of a time-evolving hidden random vector. We adhere to the chain graph approach and focus our attention on the particular case of spatio–temporal data as a motivating starting point. In spite of the natural representation of these data by means of chain graphs, these graphical structures fail, in general, to represent all the independence statements obtained after marginalizing and conditioning in acyclic directed graphs (see e.g. Proposition 3 of Lauritzen and Richardson, 2002). In this paper we are concerned with the best chain graph cover, and we try to state our results as generally as possible with the aim of being able to apply them to other graphical contexts. This work can also be related to similar ideas developed in the context of probabilistic inference and influence diagrams (see Shachter, 1988), where algorithms to obtain unidimensional conditional distributions for partially observable processes are developed.
To represent the full set of conditional independencies generated by aggregation in acyclic directed graphs, other kinds of graphical structure, like ancestral graphs (Richardson and Spirtes, 2002; Richardson, 2003), summary graphs (Cox and Wermuth, 1996; Wermuth et al., 1999; Wermuth and Cox, 2000) or MC graphs (Koster, 2002) are needed. But all of them can lead to a representation of the aggregated model which does not preserve the natural temporal blocking of the structures under study (see for example the graph model in Fig. 5).
The paper is structured as follows. In Section 2 we consider the case of temporal aggregation of spatio–temporal data, set up the notation and terminology, introduce the concept of aggregated chain graphs in this context, and summarize some results on independence graph models that will be used in later sections. The main results of the paper are presented in Sections 3 and 4. Section 3 introduces the aggregation algorithm producing the aggregated chain graph, discusses how this graph describes the independence properties induced by the aggregation process, and compares it with the aggregation structures previously mentioned. Section 4 gives a sufficiency condition producing a loss of memory of the observed process with respect to the hidden one. The resulting methods are illustrated by means of some examples.

2. Aggregated chain graph

In this section we describe temporal aggregation of spatio–temporal data by means of graphical models. This requires some basic concepts of independence graphs that will be introduced as they become necessary to understand the problem at hand. We do not attempt to be exhaustive in this introduction; instead, we rely on the existing literature (for a complete discussion of graphical models see for example Lauritzen, 1996; Cowell et al., 1999).

Let YU be a vector of unobservable random variables corresponding to L spatial locations through T times, that is, U = L × T. Letting U(t) = L × {t}, YU(t) is the subvector of the contemporary variables at time t. We can consider {YU(t)}_{t∈T} as a hidden multivariate time series. Given a vertex u = (l, t), to recover the spatial and temporal coordinates when needed we use the functions l(u) = l and t(u) = t, where l and t are the corresponding location and time, respectively. Although it is not strictly necessary in the derivation of the forthcoming results, for the sake of simplicity we assume for the moment that contemporary variables are independent given their past. Thus, following Dawid's (1980) notation, we write

t(u) = t(u′) ⇒ Y_u ⊥⊥ Y_{u′} | {Y_v : t(v) < t(u)}.   (1)
For instance, we can think of these variables as morbidity counts of an infectious disease on a set of geographical sites. If we make observations very frequently, infection between different sites will take place over time intervals greater than our inter-observational ones, so that no "contemporary" dependence appears between regions. In addition, we assume that each variable depends on the past only through a small set of neighboring locations (see Besag (1977) and Chadoeuf et al. (1992) for similar approaches).

All these stochastic relationships are better described through an acyclic directed graph GU = (U, E) with set of vertices U (representing the variables Y_u) and set E of directed edges (arrows) pointing from parents to children (the vertices directly influenced by the parents). If pa(u) stands for the set of parents of the vertex u, condition (1) can be expressed as

∀u ∈ U,   pa(u) ⊆ {v ∈ U : t(v) < t(u)}.
More generally, given a subset A of vertices, pa(A) denotes the set of vertices not in A which are parents of vertices in A.
Fig. 1. Starting graph.
Fig. 2. Enlarged graph of the starting graph.
Example 1. Fig. 1 shows an example for two locations and six times, where location 1 at time 2 influences its own future at time 3, and location 2 at time 2 influences location 1 at time 3. The set of parents of y13 is {y12, y22}. Darkened vertices are not directly observable (all vertices in this example).

As we have said, the vector YU is not directly observed. We only have access to it through some kind of aggregated vector data, denoted by ZS. Each Z_s, s ∈ S, represents the variable observed at the spatial location l(s) which results from aggregating the values of Y for k consecutive times. That is, we consider the partition of U induced by a map π from U onto S. Each set U(s) = π^{-1}(s) corresponds to a unique location l(s) and k consecutive times. For every s ∈ S there is a measurable map h_s : R^{|U(s)|} → R carrying Y_{U(s)} into Z_s. We call this the aggregation partition of U.

Example 2. The whole situation can be illustrated with Fig. 2. Starting from Fig. 1, assume that we can only observe the sum of each consecutive pair of hidden variables. Then y_{i,2j−1} and y_{i,2j} give rise to z_{i,j}, the observable variables, which are children of the corresponding hidden ones. In this case k = 2, and the aggregation function h_s is the sum of the parents.
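For concreteness, the objects just introduced can be written down directly. The following minimal Python sketch (ours, not the authors'; the full arrow set of Fig. 1 is only partially described in the text, so only the arrows stated explicitly are filled in) encodes the hidden DAG as a dict of parent sets, together with the aggregation map pi, the enumeration tau of the new time scale (introduced formally below), and the aggregation function h_s:

    # Hidden DAG G_U as a dict "vertex -> set of parents"; vertices are
    # (location, time) pairs. Only the arrows stated in Example 1 are
    # listed; the remaining arrows of Fig. 1 would be added analogously.
    G_U = {
        (1, 3): {(1, 2), (2, 2)},   # pa(y_13) = {y_12, y_22}
        # ... rest of the arrows of Fig. 1
    }

    # Aggregation partition map pi : U -> S of Example 2 (k = 2 consecutive
    # times per location); the 'z' tag keeps S disjoint from U.
    def pi(u):
        l, t = u
        return ('z', l, (t + 1) // 2)

    # Temporal enumeration tau : S -> {1, ..., T} on the aggregated scale.
    def tau(s):
        return s[2]

    # Aggregation function h_s: in Example 2, the sum of the hidden values.
    def h_s(values):
        return sum(values)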
This kind of partition arises, for instance, when administrative limitations impede a detailed follow-up of the spread of an infectious disease. We can distinguish the administrative locations where counts have been taken, but temporal aggregation is usually inescapable due to the complex tasks required to gather all the information.

As Example 2 shows, we can also express the relationships between YU and ZS by means of a directed graph G_{U∪S}. However, as we only observe ZS, we are interested in its marginal independence structure (in the absence of YU). In addition, given the spatio–temporal structure of the data, we would like to reflect, at any time t, the independence structure of the contemporary variables Z_s given the past. In this case we cannot guarantee that the vector ZS verifies condition (1). Since spatial contemporary interactions between sites can appear as a consequence of the aggregation process, we need a more general graph to reflect symmetric dependence relationships.

Chain graph models allow for the presence of directed (arrows) and undirected edges (lines) in a graph representation of a problem. Thus, we will describe the qualitative nature of the stochastic structure of ZS by means of a chain graph GS = (S, E) where the set of vertices S is partitioned into numbered sets corresponding to blocks of vertices of contemporary variables. This enumeration of S corresponds to the new time scale resulting from the aggregation of the original one related to the hidden variables. Abusing notation, we use the same symbols t and T for the new set of times. Then the enumeration τ : S → T induces a partition of S into blocks S(t) = τ^{-1}(t) that we call the temporal partition of S. In chain graph terminology, these blocks S(t) are said to form a dependence chain. Moreover, only undirected edges are allowed between vertices in the same block, and existing arrows, reflecting the influence of parents on children, always point from a block with lower number t towards a block with higher number t′ > t.

Chain graphs are a generalization of directed and undirected graphs. Note that a directed acyclic graph is a chain graph with all blocks consisting of only one vertex. Likewise, an undirected graph is a special case of a chain graph without arrows. In our case, the chain graph GS resulting from the aggregation process will be called the aggregated chain graph from GU according to S.

Example 3. Fig. 4(j) shows the aggregated chain graph resulting from the graph of Example 1 according to the partition described in Example 2.

In a chain graph GV = (V, E), the relationship between two nodes connected by a line is recognized as symmetric, and we say that the connected vertices are neighbors. The boundary bd(A) of a subset A ⊆ V is the set of vertices in V\A that are parents or neighbors of vertices in A. A set A is complete if all of its vertices are joined to each other by an edge. A path of length n from vertex v to vertex w is a sequence of distinct vertices {v_i}_{i=0}^{n} such that v_0 = v, v_n = w and (v_{k−1}, v_k) ∈ E for all k = 1, . . . , n. When v = w the path becomes a cycle. A directed path is a path with at least one arrow among its edges. If there is a directed path from u to w but no directed path from w to u, we say that u is an ancestor of w. Moreover, if there is a path from u to w and another from w to u, we say that u and w are connected. It is easy to check that this last relationship is an equivalence relation, partitioning the set of vertices into equivalence classes which we call connectivity components.
The set of such components in a chain graph GV will be denoted by C(GV ).
The previously mentioned restrictions imposed on the direction of edges in a chain graph imply that the blocks forming any dependence chain are unions of connectivity components. Moreover, no directed cycles can be found in a chain graph.

To complete the description of the aggregation process in forthcoming sections we also need the concepts of subgraph, union graph and marginal graph. Given a subset A of U, the subgraph GA is the graph with set of vertices A and set of edges E_A = E ∩ (A × A). Given two graphs GV1 = (V1, E1) and GV2 = (V2, E2), we define their union as the graph G_{V1∪V2} = (V1 ∪ V2, E1 ∪ E2). We can now define the marginal graph associated with a subset of vertices A (see the previous work by Castillo et al., 1998; Studeny, 1997):

Definition 4 (Marginal graph). For an undirected graph GV and a subset A of V, the marginal graph G^{ma}_A is the undirected graph with vertex set A whose edge set is that of the subgraph GA, enlarged with all lines needed to make the boundaries of the connectivity components of G_{V\A} complete, that is,

E_A ∪ ⋃_{γ} [bd(γ) × bd(γ)],

where γ varies over the set C(G_{V\A}) of connectivity components of G_{V\A}.

Example 5. Fig. 3(c) shows the marginal graph of the graph in Fig. 3(b) with respect to the subset A = {z11, z12}.
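Definition 4 translates directly into code. The sketch below (an illustration under the dict-of-neighbour-sets representation of the earlier sketch, not the authors' implementation) computes the connectivity components of the graph restricted to V\A and completes their boundaries:

    from itertools import combinations

    def connectivity_components(G, nodes):
        """Connectivity components of the subgraph of the undirected
        graph G (dict: vertex -> set of neighbours) induced by `nodes`."""
        nodes, comps = set(nodes), []
        while nodes:
            stack, comp = [nodes.pop()], set()
            while stack:
                v = stack.pop()
                comp.add(v)
                new = G[v] & nodes       # unvisited neighbours inside `nodes`
                nodes -= new
                stack.extend(new)
            comps.append(comp)
        return comps

    def marginal_graph(G, A):
        """Marginal graph G^ma_A of Definition 4: the subgraph induced on
        A, enlarged so that the boundary of every connectivity component
        of G restricted to V\\A becomes complete."""
        A = set(A)
        Gma = {a: G[a] & A for a in A}               # subgraph on A
        for comp in connectivity_components(G, set(G) - A):
            bd = {a for a in A if G[a] & comp}       # boundary (lies in A)
            for x, y in combinations(bd, 2):         # complete the boundary
                Gma[x].add(y)
                Gma[y].add(x)
        return Gma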
2.1. Independence properties

Chain graphs represent independence statements of the probability distributions defined on the set of vertices. The absence of an edge between two vertices can be interpreted as conditional independence of these vertices in a pairwise, local or global sense (see Lauritzen, 1996; Frydenberg, 1990a; Studeny and Bouckaert, 1998). The graphical structure can be interpreted as well in terms of factorization properties of the corresponding probability densities. We have adhered to this last approach in order to avoid some limitations in the absence of the positivity condition (required for the different Markovian properties of the graph to be equivalent). The results of this section are stated without the positivity condition, but the proofs are limited to discrete probability distributions. Probability density functions are therefore taken with respect to the counting measure, and we will call them probability mass functions (p.m.f.). These results can be extended to absolutely continuous distributions (see Sanmartín, 1997), but the proofs are quite different in nature and would increase the technical part of the paper without any significant contribution to the main ideas.

Let PY stand for a probability distribution of the vector YU. In general, we will denote by MF(GU) the set of probability distributions PY fulfilling some factorization property F for the graph GU. The factorization property for chain graphs can be stated as follows (see Frydenberg, 1990a; Lauritzen, 1996):
Definition 6 (Chain factorization property). A probability distribution P of the random vector YU admits a chain factorization according to a chain graph GU if its probability mass function p factorizes according to

p(y) = ∏_{γ ∈ C(GU)} p(y_γ | y_{pa(γ)})   (2)

and each factor in (2) can be further factorized as

p(y_γ | y_{pa(γ)}) = ∏_c ψ_c(y_c),   (3)

where c varies over all subsets of γ ∪ pa(γ) which are complete in G*_{γ∪pa(γ)}, the undirected graph obtained from the subgraph G_{γ∪pa(γ)} by adding all lines needed to make pa(γ) complete. In this case we say that P verifies the chain factorization property (CF) with respect to GU, and we write P ∈ MCF(GU).

If GU is a directed acyclic graph, the previous factorization reduces to expression (2); we say that P admits a recursive factorization according to GU and write P ∈ MDF(GU). This is the case of YU in the preceding subsection if it verifies condition (1). When GU is an undirected graph the factorization can be expressed as in (3); we say that P admits a factorization according to GU and write P ∈ MF(GU). Briefly, a chain factorization is a recursive factorization between connectivity components jointly with a factorization within each connectivity component (enlarged with its parents).

The chain factorization property has several equivalent formulations. We next consider one that will appear in forthcoming proofs. Let V(1), . . . , V(T) be the blocks of a dependence chain in the chain graph GU. We define the family of concurrent sets of vertices relative to this partitioning as the collection C(t) = ∪_{r≤t} V(r), for t = 1, . . . , T. This is an increasing family growing from C(1) = V(1) to the whole C(T) = U. We can define an undirected graph G*_{C(t)} on each concurrent set C(t) by keeping the graphical structure of the last block V(t) unchanged and making the preceding concurrent set C(t−1) complete. More precisely, in G*_{C(t)} we put an undirected edge (u, v) whenever there is an edge of GU or u, v ∈ C(t−1). Now the chain factorization property can be formulated as follows (see Proposition 3.30 in Lauritzen, 1996).

Proposition 7. The probability distribution PY verifies the chain factorization property according to the chain graph GU iff, for some dependence chain leading to concurrent sets C(t), PY admits a p.m.f. factorizing as

p(y) = ∏_{t=1}^{T} p(y_{C(t)})/p(y_{C(t−1)})   (4)

(with the convention p(y_{C(0)}) ≡ 1), and each numerator factorizes according to the undirected graph G*_{C(t)}.
In the aggregation process described in the previous subsection we cannot observe the underlying vector YU. In the resulting graph only the vector ZS remains. This makes clear the necessity of marginalizing out YU from the joint distribution of YU and ZS. The next result, relating marginal distributions and graph factorization, was proved in Castillo et al. (1998). It can also be deduced using results on collapsibility (see Frydenberg, 1990b):

Theorem 8. Let GV be an undirected graph and PY a probability distribution of the vector Y factorizing according to GV. If A ⊂ V and PA is the marginal distribution associated with the subvector YA, then PA factorizes according to the marginal graph G^{ma}_A.

3. Aggregation algorithm

In the preceding section we motivated the aggregation process by means of spatio–temporal graph models. Nevertheless, the results we present in this section can be applied to any other graph model with a similar partitioning of its vertices. Let us state the problem in general terms, without any explicit mention of spatio–temporal coordinates.

Let GU be an acyclic directed graph on the set of vertices U, and let YU be a random vector whose probability distribution PY admits a recursive factorization with respect to GU. Let the map π from U onto S define a partition of U, and let us denote its elements by U(s) = π^{-1}(s). Consider now the enumeration τ : S → {1, 2, . . . , T} establishing a partition of S into contemporary subsets. We assume the compatibility of this "temporal" partition with the natural ordering induced by the arrows in GU; that is, setting

S_t = {s : τ(s) ≤ t},   (5)

U_t = ∪_{s∈S_t} U(s)   (U_T = U),   (6)

we assume

an(U_t) ⊆ U_t,   t = 1, . . . , T.   (7)
This compatibility makes the U_t ancestral sets, in the sense that they contain their own ancestors. For each s ∈ S there is a measurable function h_s : R^{|U(s)|} → R mapping the subvector Y_{U(s)} into the aggregated variable Z_s. Joining all these partial aggregation functions h_s into a whole transformation h, we can write z = h(y). The observable vector ZS inherits its probability distribution PZ from the hidden random vector YU, and we want to build up a chain graph GS showing its independence properties while respecting the "contemporary" relationship induced by the enumeration τ(s). The following definition will help in future statements about the aggregated graph we are to build.

Definition 9 (Link graph). Given an undirected graph G = (V, E) and a subset A ⊆ V, the A-link graph L_A(G) is obtained by the following procedure: delete all edges in G except those
involving at least one vertex in A, and change any edge with only one vertex in A into an arrow pointing to that vertex.

Example 10. Fig. 3(f) shows the link graph of the graph in Fig. 3(e) with respect to A = {z12, z22}.

Now it is time to formulate the aggregation algorithm in the terms stated above.

Algorithm 1. Aggregation algorithm
Input: A directed acyclic graph GU and the partition induced by a map π : U → S on the set of vertices U under the conditions previously defined.
Output: A chain graph GS.
Initial step: Enlarge the original graph GU by adding a vertex for each s ∈ S, and draw an arrow pointing to s from each vertex u ∈ U(s). Denote the resulting graph by G_{U∪S}. Let G^0 = ∅.
For k = 1, . . . , T do the following:
Step 1: Take G^{1k} to be the subgraph of G_{U∪S} corresponding to U_k ∪ S_k.
Step 2: Obtain G^{2k} as the moral graph of G^{1k}; that is, delete directions and complete the set of parents of each vertex of G^{1k}.
Step 3: Obtain G^{3k}, the marginal graph of G^{2k} with respect to S_k.
Step 4: Obtain G^{4k}, the τ^{-1}(k)-link graph of G^{3k}.
Step 5: Set G^k = G^{4k} ∪ G^{k−1}.
The resulting graph GS = G^T will be called the aggregated chain graph from GU according to S.

Before proving that the resulting graph GS obeys the independence properties of the distribution PZ, let us consider an example based on the spatio–temporal graph presented in Fig. 1; a Python sketch of the whole procedure is given after the figures.

Example 11. Figs. 3 and 4 show the successive steps of the aggregation algorithm starting with the graph presented in Fig. 1 and the aggregation problem presented in Example 2. Fig. 2 shows the initial step with the enlarged graph G_{U∪S}. For k = 1 we have G^{11}, G^{21} and G^{31} in subfigures (a)–(c) of Fig. 3. Note that in this case G^{31} = G^{41} = G^1. To reduce the size of the figures we have drawn pairs of intermediate graphs in the same drawings. To distinguish between superimposed graphs we have used solid and dashed lines. Solid lines mark edges shared by both graphs, although possibly considered as arrows or lines depending on the graph. The cycle for k = 2 appears in subfigures (d)–(g). (d) G^{12} is the directed graph with solid arrows. We can see G^{22} by omitting directions and considering the new prescribed lines (dashed lines). (e) The marginal graph G^{32}. (f) The τ^{-1}(2)-link graph G^{42}. (g) The resulting graph of this cycle, G^2. The main steps of the cycle k = 3 are shown in Figs. 4(h)–(j). (h) G^{13} and G^{23} share the same picture. G^{13} corresponds to the solid arrows, while to see G^{23} we have to omit directions and add the lines indicated with dashed lines.
Fig. 3. Aggregation algorithm: (a) k = 1, Initial Step; (b) k = 1 Step 2; (c) k = 1 Steps 3–5; (d) k = 2 Initial Step to Step 2; (e) k = 2 Step 3; (f) k = 2 Step 4; (g) k = 2 Step 5.
(i) G^{33}, the marginal graph, appears in this figure if we omit directions and consider the dashed lines. To see the τ^{-1}(3)-link graph G^{43} in the same place we have to consider only the solid arrows and lines. (j) Finally, the aggregated chain graph.
Fig. 4. Aggregation algorithm: (h) k = 3, Initial Step to Step 2; (i) k = 3 Steps 3 and 4; (j) k = 3 Step 5.
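The following Python sketch assembles the whole of Algorithm 1 from the pieces above. It reuses marginal_graph from Section 2 and the maps pi and tau from the earlier sketch; it is again an illustration under our chosen representation, not the authors' code. The DAG is a dict "vertex -> set of parents" with an entry (possibly empty) for every vertex, and the returned chain graph is a pair of edge sets:

    from itertools import combinations

    def moralize(parents):
        """Moral graph of a DAG given as dict "child -> set of parents":
        drop directions and marry the parents of each vertex."""
        nodes = set(parents) | {p for ps in parents.values() for p in ps}
        M = {v: set() for v in nodes}
        for child, ps in parents.items():
            for p in ps:                       # drop the direction
                M[child].add(p); M[p].add(child)
            for x, y in combinations(ps, 2):   # complete the parent set
                M[x].add(y); M[y].add(x)
        return M

    def aggregated_chain_graph(G_U, pi, tau, T):
        """Algorithm 1. Returns (lines, arrows): lines are frozensets
        {s, s'}, arrows are pairs (s, s') pointing from s to s'."""
        S = {pi(u) for u in G_U}
        # Initial step: add a vertex s for each block, with pa(s) = U(s).
        enlarged = dict(G_U)
        for s in S:
            enlarged[s] = {u for u in G_U if pi(u) == s}
        lines, arrows = set(), set()
        for k in range(1, T + 1):
            Sk = {s for s in S if tau(s) <= k}
            Uk = {u for u in G_U if tau(pi(u)) <= k}   # ancestral by (7)
            W = Uk | Sk
            G1 = {v: enlarged[v] & W for v in W}       # Step 1: subgraph
            G2 = moralize(G1)                          # Step 2: moral graph
            G3 = marginal_graph(G2, Sk)                # Step 3: marginalize
            Hk = {s for s in Sk if tau(s) == k}
            for x in G3:                               # Step 4: H_k-link graph
                for y in G3[x]:
                    if x in Hk and y in Hk:
                        lines.add(frozenset((x, y)))   # line inside block k
                    elif y in Hk:
                        arrows.add((x, y))             # arrow into block k
            # Step 5: the union over k is the running (lines, arrows) pair.
        return lines, arrows

    # For Examples 1-2 (with the full arrow set of Fig. 1 filled in) one
    # would call: lines, arrows = aggregated_chain_graph(G_U, pi, tau, T=3).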
Now we can state how the chain graph GS describes the independence structure of the random vector Z. To prove the corresponding theorem we need a collection of lemmas mirroring the successive steps of the aggregation algorithm.

Lemma 12. If PY admits a recursive factorization according to GU, then the joint distribution PYZ of Y and Z admits a recursive factorization according to the enlarged graph G_{U∪S} of the initial step of Algorithm 1.
Proof. Let p(y) be the p.m.f. of PY. From the aggregation functions {h_s : R^{|U(s)|} → R}_{s∈S} we can write

p(z|y) = ∏_{s∈S} I_{h_s^{-1}(z_s)}(y_{U(s)}),   (8)

where I_B(·) stands for the characteristic function of the set B. Since PY has a recursive factorization according to GU,

p(y) = ∏_{u∈U} p(y_u | y_{pa(u)}),   (9)

so that

p(y, z) = ∏_{s∈S} I_{h_s^{-1}(z_s)}(y_{U(s)}) ∏_{u∈U} p(y_u | y_{pa(u)}).   (10)

By construction, the sets pa(u) are the same in both graphs GU and G_{U∪S}, and pa(s) = U(s). Expression (10) shows that PYZ ∈ MDF(G_{U∪S}), and the lemma is proved. Note that PYZ does not satisfy the positivity condition, but the factorization property remains true.

Lemma 13. Under the same hypotheses as Lemma 12, PYZ factorizes according to the moral graph G^m_{U∪S} (the undirected graph obtained from G_{U∪S} by deleting the directions of arrows and completing the set of parents of each vertex).

Proof. By Lemma 3.21 in Lauritzen (1996), any distribution admitting a recursive factorization according to a directed acyclic graph G factorizes according to the moral graph G^m. In combination with Lemma 12, the desired result is proved.

Using Theorem 8 of the previous section and Lemma 13, the following result is easily stated.

Lemma 14. Under the hypotheses of Lemma 12, the marginal distribution PZ factorizes according to the marginal graph (G^m_{U∪S})^{ma}_S.

The next lemma is related to the link graph used in Step 4 of the aggregation algorithm.

Lemma 15. Consider an undirected graph GV = (V, E) and, given A ⊂ V, let L*_A(GV) be the graph obtained from the A-link graph L_A(GV) by making the subgraph induced on V\A complete. If P factorizes according to GV, then it satisfies the chain graph factorization for L*_A(GV) as well.

Proof. We have to check equality (4) of Proposition 7. Given the dependence chain V(1) = V\A and V(2) = A, we have the concurrent sets C(1) = V(1) and C(2) = V. Let p(y) be the probability mass function as before. Then

p(y) = p(y_{V\A}) p(y_A | y_{V\A}) = p(y_{C(1)}) (p(y_{C(2)})/p(y_{C(1)})).
P factorizes according to G*_{C(2)} = L*_A(GV), because it does for GV and L*_A(GV) has the same set of vertices with some additional edges. P(Y_{C(1)}) factorizes according to G*_{C(1)}, the subgraph of L*_A(GV) induced on V\A, because it is a complete graph. Note that the previous result has been proved without assuming the positivity condition for P; we have used only factorization properties.

As a direct consequence of Lemmas 14 and 15, we can state

Lemma 16. With the same notation as in Algorithm 1, let H = τ^{-1}(T) ⊆ S be the last contemporary subset of S. The probability distribution PZ satisfies the chain graph factorization property for L*_H(((G_{U∪S})^m)^{ma}_S), the H-link graph with all vertices in S\H directly connected to each other.

Now let us go back to the stepwise procedure of Algorithm 1. Let P^k_Y be the probability distribution of Y_{U_k}. Proposition 3.22 in Lauritzen (1996) states the recursive factorization of the marginal distribution PA according to the subgraph GA whenever A is an ancestral set. Our sets U_k are ancestral sets for k = 1, . . . , T by condition (7). Then,

Lemma 17. If P factorizes according to G, then P^k_Y factorizes according to G_{U_k}.

Note that, as a consequence of Lemma 17, the statements in Lemmas 12–16 remain valid if we consider G_{U_k} instead of the whole GU. For any k = 1, . . . , T just replace S, PY, PYZ, PZ with their analogues S_k, P^k_Y, P^k_{YZ} and P^k_Z, which appear when we consider G_{U_k} instead of G. Here P^k_Y, P^k_{YZ} and P^k_Z stand for the probability distributions of the corresponding subvectors Y_{U_k} and Z_{S_k}.

Theorem 18. If PY admits a recursive factorization with respect to GU, then the probability distribution PZ satisfies the chain graph factorization property for GS.

Proof. We have to check that PZ verifies condition (4) in Proposition 7 with respect to the graph G^T. Let us consider the dependence chain V(k) = τ^{-1}(k) for k = 1, . . . , T in G^T. The sets of concurrent variables are C(k) = S_k as stated in (5). Let G^{4T} be the graph resulting from the last step of Algorithm 1. From Lemma 16, PZ ∈ MCF((G^{4T})*), where (G^{4T})* denotes G^{4T} with S_{T−1} made complete; i.e.,

p(z) = p(z_{S_{T−1}}) p(z_{S\S_{T−1}} | z_{S_{T−1}}) = p(z_{C(T−1)}) (p(z_{C(T)})/p(z_{C(T−1)})),   (11)

where the numerator p(z_{C(T)}) factorizes according to the graph G*_{C(T)} = (G^{4T})*. We can now proceed analogously with p(z_{C(T−1)}) = p(z_{S_{T−1}}) and G^{T−1}. Arguing as before,

p(z_{S_{T−1}}) = p(z_{S_{T−2}}) p(z_{S_{T−1}\S_{T−2}} | z_{S_{T−2}}) = p(z_{C(T−2)}) (p(z_{C(T−1)})/p(z_{C(T−2)})),   (12)

and p(z_{C(T−1)}) factorizes according to G*_{C(T−1)} = (G^{4(T−1)})*, as was commented in the paragraph following Lemma 17.
Reiterating this argument for S_t as t decreases towards 1, we get

p(z) = p(z_{C(1)}) ∏_{t=2}^{T} (p(z_{C(t)})/p(z_{C(t−1)}))

and each p(z_{C(t)}) factorizes according to G*_{C(t)} = (G^{4t})*, as was to be proved.
Note that the recursive nature of G^T makes the last proof valid for G^k as well, replacing PZ by P^k_Z, thus leading to P^k_Z ∈ MCF(G^k), k = 1, . . . , T.

3.1. Best chain graph cover

Theorem 18 shows that the aggregated model obeys the independencies of the aggregated chain graph. Unfortunately, the aggregated chain graph does not reflect in general all the independencies present in the aggregated model. For instance, in the model displayed in Fig. 2, z12 and z23 are independent, but this is hidden in the aggregated chain graph given in Fig. 4(j). We could think that the arrow from z12 to z13 is not necessary, but if we delete it, the conditional independence of z12 and z23 given z22 and z13 can be read off the graph, and it is not fulfilled by the model in Fig. 2. The problem arises from the fact that chain graphs fail to represent the independence structure generated by aggregation of DAG variables, as we previously mentioned.

In spite of these comments, the interest of this result is that the aggregated chain graph identifies the "best chain graph cover" compatible with the temporal blocking, in the following sense: if we delete any edge in the aggregated graph, we can find some independence statement that is violated by the aggregated variables in the original graph G_{U∪S}. To see this, suppose that we delete an existing edge in GS between vertices s and s′. Taking k = max(τ(s), τ(s′)), the resulting graph would assert

Z_s ⊥⊥ Z_{s′} | {Z_j : j ∈ S_k\{s, s′}}.   (13)

But, by construction, there is an edge between s and s′ in G^k if and only if there is a path between them in G^{2k} with all vertices except the endpoints in U_k; and G^{2k} is the moral graph of G^{1k} = G_{U_k∪S_k}, which is just the minimal ancestral graph in G_{U∪S} containing S_k. This means that Z_s is not conditionally independent of Z_{s′} given {Z_j : j ∈ S_k\{s, s′}}, in contradiction with the statement given in (13).

As mentioned earlier, in order to represent the full set of conditional independencies generated by aggregation, ancestral graphs, summary graphs or MC graphs are required (see Richardson and Spirtes (2002) for a comprehensive comparative analysis of these structures and their connections with chain graphs). All of them consider more types of edges than those allowed in chain graphs, leading to a representation of the aggregated model which does not necessarily preserve the natural temporal blocking of the structures under study (see for example Fig. 5). Moreover, the separation criteria for them are equivalent, although the MC graphs comprise a more general class of models. Ancestral graphs are the minimal class of graphs closed under marginalizing and conditioning which contains the directed acyclic graphs. Only a very special class of chain graphs, those in which the chain
Fig. 5. Ancestral and MC graph after marginalizing the graph in Fig. 2.
components with more than one vertex have no parents (recursive causal graphs), are also ancestral graphs (for instance, the graphs in Figs. 6(c) and (e)). In general, there are chain graphs that are not Markov equivalent to any ancestral graph (the aggregated chain graph in Fig. 4(j) and the s-aggregated chain graph in Fig. 9 are not ancestral graphs).

Example 19. Fig. 5 shows the ancestral graph corresponding to the graph in Fig. 2 after marginalizing out YU, which in this case coincides with the MC graph. The summary graph can be obtained by substituting every bidirected edge with a dashed line. Applying the separation criteria associated with each of these graphs, we can deduce that z12 and z23 are independent, whereas z12 is not conditionally independent of z23 given z22 and z13.
4. A sufficiency condition for memory loss

The aggregation process can produce edges connecting vertices s, s′ even when there is no direct connection between nodes u ∈ U(s) and u′ ∈ U(s′). In Example 11 we can see an arrow from z11 to z13 although, in the original graph, no edge connects y11 or y12 directly to y15 or y16. This phenomenon happens whenever the information on Z_s from the past is not exhausted by the aggregation of Y_{pa(U(s))} and needs to be complemented with information from previous Y_u's. Conversely, if the dependence of Y_{U(s)} on Y_{pa(U(s))} is carried exclusively by the results of previous aggregated variables, these aggregated variables act as "sufficient" predictors of Y_{U(s)} and, therefore, of Z_s as well. Our purpose in this section is to explore the implications of this sufficiency criterion in the temporal aggregation of acyclic directed graphs.

Abusing notation, for every vertex u ∈ U we write U(u) for the partition element containing u, that is, U(u) = π^{-1}(π(u)); it can be distinguished easily from U(s) = π^{-1}(s) by context. The meaning of U(·) depends on whether its argument belongs to U or S, but its value is always an element of the partition induced by π on the set U. Consider next the following partition of pa(u), the set of parents of u in GU: (i) pa_1(u) = U(u) ∩ pa(u), the set of those parents sharing with u the same partition element U(u), and (ii) pa_2(u) = pa(u)\pa_1(u), those belonging to previous partition elements.
Fig. 6. Aggregation algorithm vs. s-aggregation algorithm: (a) initial graph; (b) initial step of the aggregation algorithm; (c) aggregated graph; (d) initial step of the s-aggregation algorithm; (e) s-aggregated graph.
In this section we assume the following sufficiency condition for the aggregation family {h_s}_{s∈S}:

U(s) ∩ pa_2(u) ≠ ∅ ⇒ U(s) ⊆ pa_2(u),   (14)

p(y_u | y_{pa(u)}) = p(y_u | {h_s(y_{U(s)}) : U(s) ⊆ pa_2(u)}, y_{pa_1(u)}).   (15)
Hereafter M_{hs}(GU) will denote the set of probability distributions in MDF(GU) satisfying the sufficiency conditions (14) and (15) for the acyclic directed graph GU.
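A simple concrete instance, anticipating the Gaussian set-up of Section 5.1, may help. Suppose each hidden variable is normal and its conditional mean depends on the earlier blocks only through the aggregated values, say

Y_u | Y_{pa(u)} ~ N( Σ_{s : U(s) ⊆ pa_2(u)} β_s h_s(Y_{U(s)}) + γ′ Y_{pa_1(u)}, σ_u² )

for some coefficients β_s, γ (our illustrative choice, not taken from the paper). Then the conditional density of Y_u is a function of {h_s(y_{U(s)}) : U(s) ⊆ pa_2(u)} and y_{pa_1(u)} only, so (15) holds by construction; (14) additionally requires every parent block entering pa_2(u) to do so as a whole.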
Obviously, GU must reflect the dependence structure implied by (14). For instance, in Example 20 (see also Fig. 6(a)) both nodes y11 and y12 are directly connected to the node y13. To take advantage of this sufficiency condition, we modify the aggregation algorithm as follows.

Algorithm 2. Sufficient aggregation algorithm
Input: A directed acyclic graph GU and the partition induced by π : U → S on U (under the conditions previously defined).
Output: A chain graph G̃S.
Initial step: Enlarge the original graph GU by adding a vertex for each s ∈ S, and draw an arrow pointing to s from each vertex u ∈ U(s). Replace any existing arrow from a vertex u to u′ verifying τ(π(u)) < τ(π(u′)) with an arrow from π(u) to u′. Denote the resulting graph by G̃_{U∪S}.
Steps 1–5: As in the aggregation Algorithm 1, but denoting the resulting graphs G̃^{jk} for j = 1, 2, 3, 4.
The resulting graph G̃S will be called the s-aggregated chain graph from G according to S.

Example 20. The initial steps of the aggregation Algorithms 1 and 2 for the graph given in Fig. 6(a) are shown in Figs. 6(b) and (d), respectively.

Paralleling Theorem 18, we can state a similar result for the s-aggregation algorithm (Theorem 23 below). To prove it we first need the following lemma.

Lemma 21. If PY ∈ M_{hs}(G) then PYZ ∈ MDF(G̃_{U∪S}).

Proof. If PY satisfies the sufficiency condition for {h_s}_{s∈S} we can write

p(y) = ∏_{u∈U} p(y_u | {h_s(y_{U(s)}) : U(s) ⊆ pa_2(u)}, y_{pa_1(u)})
     = ∏_{u∈U} p(y_u | {z_s : U(s) ⊆ pa_2(u)}, y_{pa_1(u)}).   (16)

Using characteristic functions of the relevant sets as in the proof of Lemma 12, from (8) and (16) we get

p(y, z) = ∏_{s∈S} I_{h_s^{-1}(z_s)}(y_{U(s)}) ∏_{u∈U} p(y_u | {z_s : U(s) ⊆ pa_2(u)}, y_{pa_1(u)}).   (17)

Since in G̃_{U∪S} we have

pa_{G̃_{U∪S}}(s) = U(s),   (18)

pa_{G̃_{U∪S}}(u) = {s : U(s) ⊆ pa_2(u)} ∪ pa_1(u),   (19)

expression (17) shows that PYZ ∈ MDF(G̃_{U∪S}).
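Only the initial step differs from Algorithm 1, so the earlier Python sketch carries over once the enlarged graph is rewired. A possible rendering in the same representation (again ours, with pi and tau as before):

    def s_enlarged_graph(G_U, pi, tau):
        """Initial step of Algorithm 2: add the aggregation vertices s
        with pa(s) = U(s), and replace every arrow u -> u' that crosses
        temporal blocks by the arrow pi(u) -> u'."""
        enlarged = {}
        for u, ps in G_U.items():
            pa1 = {p for p in ps if tau(pi(p)) == tau(pi(u))}  # same block
            pa2 = ps - pa1                                     # earlier blocks
            # under (14), pa2 is a union of whole partition elements U(s)
            enlarged[u] = pa1 | {pi(p) for p in pa2}           # rewire to S
        for s in {pi(u) for u in G_U}:
            enlarged[s] = {u for u in G_U if pi(u) == s}
        return enlarged
    # Steps 1-5 then run exactly as in the Algorithm 1 sketch, starting
    # from this enlarged graph instead of the one built there.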
With a similar argument, it is easy to prove the following lemma, which extends the previous result to the intermediate graphs of the recursive s-aggregation algorithm.

Lemma 22. If P^k_Y ∈ M_{hs}(G_{U_k}) then P^k_{YZ} ∈ MDF(G̃_{U_k∪S_k}) for k = 1, . . . , T.

Theorem 23. If PY factorizes according to G and verifies the sufficiency condition for {h_s}_{s∈S}, then PZ satisfies the chain graph factorization for G̃S.

Proof. From Lemma 21, the statements in Lemmas 13, 14 and 16 hold with G_{U∪S} replaced by G̃_{U∪S}. Consider the s-aggregation algorithm in the second step of its kth cycle, when G̃^{2k} = (G̃_{U_k∪S_k})^m. Lemmas 22 and 13 imply P^k_{YZ} ∈ MF(G̃^{2k}). Thus, by Lemma 14, P^k_Z ∈ MF(G̃^{3k}). Finally, by Lemma 16 it follows that P^k_Z ∈ MCF(G̃^{4k}). The rest of the proof runs as in Theorem 18, replacing G^T with G̃^T.

Using analogous arguments, and thanks to Lemma 22, we can extend Theorem 23 to the intermediate graphs of the recursive aggregation algorithm.

Corollary 24. If PY ∈ M_{hs}(G), then P^k_Z ∈ MCF(G̃^k) for k = 1, . . . , T.

Example 25. Continuing with Example 20, Figs. 6(c) and (e) show the aggregated chain graph and the s-aggregated chain graph, respectively.

Once we have stated how to translate the sufficiency condition into the aggregation algorithm, we can ask what advantages have been achieved. The following proposition is a first step in this direction.

Proposition 26. For any k = 1, . . . , T, the set of edges of the graph G̃^k is a subset of that of the graph G^k.

Proof. Consider the graph G̃^{1k} for k = 1, . . . , T. The sets S_k separate vertices in U_k belonging to different "contemporary" blocks U_t\U_{t−1}. More precisely,

τ(π(u)) < τ(π(u′)) ≤ k ⇒ Y_u ⊥⊥ Y_{u′} | Z_{S_k};

that is, there is no arrow between u and u′. Note that in each step of the kth cycle both algorithms perform the same operations, but the cycle starts from G^{1k} or G̃^{1k} depending on the algorithm, and these graphs have the same nodes but possibly differ in their set of edges.

Let k = 1. All s ∈ S_1 share the same "time" τ(s) = 1. In this case there is no arrow from any s-vertex to a u-vertex in G̃^{11}, and the resulting graphs G^1 and G̃^1 coincide.

Now we proceed by induction. Suppose that the G̃-graphs have edges always present in the corresponding G-graphs up to cycle k − 1; that is,

E(G̃^t) ⊆ E(G^t),   t = 1, . . . , k − 1.   (20)

By construction, G^k = G^{4k} ∪ G^{k−1} and G̃^k = G̃^{4k} ∪ G̃^{k−1}. Take s and s′ verifying τ(s) < τ(s′) = k. An arrow s → s′ appears in G̃^{4k} if and only if there exist u ∈ U(s)
and u′ ∈ U(s′) directly connected in G_{U_k}. In G^{4k} this condition is only sufficient but not necessary: as Example 11 shows, an arrow s → s′ can appear even when no arrow connects a vertex of U(s) to a vertex of U(s′). When τ(s) = τ(s′) the construction is identical in both algorithms. Thus, in general,

E(G̃^{4k}) ⊆ E(G^{4k}),   (21)

and the proposition follows.
Comparing Figs. 6(c) and (e) of Example 25, we can see a line joining z21 to z12 in the aggregated graph that is not present in the s-aggregated graph. This shows that the inclusion of edges in Proposition 26 can be strict. We can conclude that, under the sufficiency condition for the h_s, s-aggregated graphs describe independence properties more accurately than aggregated graphs.
5. Examples of application

5.1. Damage of a reinforced structure

In this example the objective is to assess the damage of reinforced concrete structures of buildings. The example, taken from Liu and Li (1994) (see also Castillo et al., 1997), is slightly modified for illustrative purposes. The goal variable (the damage of a reinforced concrete beam) is denoted by X1. A civil engineer initially identifies 16 variables (X9, . . . , X24) as the main variables influencing the damage of reinforced concrete structures. In addition, the engineer identifies seven intermediate unobservable variables (X2, . . . , X8) that define some partial states of the structure. Table 1 shows the list of variables and their definitions.

Table 1. Definitions of the variables related to damage assessment of reinforced concrete structures

X1: Damage assessment
X2: Cracking state
X3: Cracking state in shear domain
X4: Steel corrosion
X5: Cracking state in flexure domain
X6: Shrinkage cracking
X7: Worst cracking in flexure domain
X8: Corrosion state
X9: Weakness of the beam
X10: Deflection of the beam
X11: Position of the worst shear crack
X12: Breadth of the worst shear crack
X13: Position of the worst flexure crack
X14: Breadth of the worst flexure crack
X15: Length of the worst flexure crack
X16: Cover
X17: Structure age
X18: Humidity
X19: PH value in the air
X20: Content of chlorine in the air
X21: Number of shear cracks
X22: Number of flexure cracks
X23: Shrinkage
X24: Corrosion

In our example, the engineer specifies the following cause–effect relationships. The goal variable X1 is related primarily to three factors: X9, the weakness of the beam available in the form of a damage factor; X10, the deflection of the beam; and X2, its cracking state. The cracking state, X2, is related to four variables: X3, the cracking state in the shear domain; X6, the evaluation of the shrinkage cracking; X4, the evaluation of the steel corrosion; and X5, the cracking state in the flexure domain. Shrinkage cracking, X6, is related to shrinkage, X23, and the corrosion state, X8. Steel corrosion, X4, is related to X13, X24 and X5. The cracking state in the shear domain, X3, is related to four factors: X11, the position of the worst shear crack; X12, the breadth of the worst shear crack; X21, the number of shear cracks; and X8. The cracking state in the flexure domain, X5, is affected by three variables: X13, the position of the worst flexure crack; X22, the number of flexure cracks; and X7, the worst cracking state in the flexure domain. The variable X7 is a function of four variables: X14, the breadth of the worst flexure crack; X15, the length of the worst flexure crack; X16, the cover; and X17, the structure age. The variable X8 is related to three variables: X18, the humidity; X19, the PH value in the air; and X20, the content of chlorine in the air. A graphical representation of the damage problem is shown in Fig. 7.

Suppose that instead of the above variables the civil engineer only has access to certain weighted averages of them (we denote the weighted average functions by h_i, i = 1, . . . , 6):

z1 = h1(x14, x15, x16, x17),
z2 = h2(x18, x19, x20, x8, x11, x12, x21),
z3 = h3(x7, x22, x13, x5, x24, x4),
z4 = h4(x3, x23, x6),
z5 = h5(x9, x2),
z6 = h6(x10, x1),

and that this information becomes available to the engineer sequentially: first he knows the values of z1 and z2, then z3 and z4 are given simultaneously, and finally the values of both z5 and z6. The temporal ordering of ZS is compatible with the original graph; after applying the aggregation algorithm, the resulting aggregated chain graph is given in Fig. 8.
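Under the same assumptions as the earlier sketches, this problem can be fed to the aggregated_chain_graph function of Section 3. A possible transcription (ours; the parent sets follow the cause–effect description above, and the blocks follow the definition of z1, . . . , z6):

    # Parent sets of Fig. 7 (child -> parents); X9, ..., X24 are roots.
    parents = {1: {9, 10, 2}, 2: {3, 6, 4, 5}, 3: {11, 12, 21, 8},
               4: {13, 24, 5}, 5: {13, 22, 7}, 6: {23, 8},
               7: {14, 15, 16, 17}, 8: {18, 19, 20}}
    parents.update({x: set() for x in range(9, 25)})

    blocks = {('z', 1): {14, 15, 16, 17},
              ('z', 2): {18, 19, 20, 8, 11, 12, 21},
              ('z', 3): {7, 22, 13, 5, 24, 4}, ('z', 4): {3, 23, 6},
              ('z', 5): {9, 2}, ('z', 6): {10, 1}}
    times = {('z', 1): 1, ('z', 2): 1, ('z', 3): 2,
             ('z', 4): 2, ('z', 5): 3, ('z', 6): 3}

    pi_map = {x: s for s, xs in blocks.items() for x in xs}
    lines, arrows = aggregated_chain_graph(parents, pi_map.get,
                                           times.get, T=3)
    # `lines` and `arrows` should reproduce the chain graph of Fig. 8.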
Fig. 7. Original graph.
Fig. 8. Aggregated chain graph.
Fig. 9. S-aggregated chain graph.
Consider now weighted average functions h_i, i = 1, . . . , 8, given in the following way:

z1 = h1(x14, x15, x16, x17),
z2 = h2(x18, x19, x20),
z3 = h3(x8, x11, x12, x21),
z4 = h4(x24, x23),
z5 = h5(x7, x22, x13),
z6 = h6(x5, x4, x6, x3),
z7 = h7(x9, x2),
z8 = h8(x10, x1).

As before, this information becomes available to the engineer sequentially: first he knows the values of z1, z2 and z3, then z4, z5 and z6 are given simultaneously, and finally the values of both z7 and z8. As in the previous case, the temporal ordering of ZS is compatible with the original graph. Suppose, on the other hand, that all variables have Gaussian distributions and that the dependence of each variable on its parents is given by

X_i | X_{pa(i)} ~ N(h_i(x_{pa(i)}), σ_i²).

In this case the sufficiency condition holds. The resulting s-aggregated chain graph is given in Fig. 9.
Fig. 10. Mortality counts in Alava (i = 1) and Guipuzcoa (i = 2).
Fig. 11. Final annual chain graph.
5.2. Meningitis mortality data

Meningitis mortality data are observed in two neighboring counties in Spain, Alava and Guipuzcoa. The original data are aggregated over six-month periods at each location. A descriptive analysis of the original bivariate time series, using partial cross-correlation and partial autocorrelation functions, suggests the dependence structure shown in Fig. 10. Suppose that, because of administrative limitations, data are only available for annual periods. The final annual aggregated chain graph is given in Fig. 11. The absence of an edge between the two locations at times t = 1 and 2 in the final annual graph shows that the influence of meningitis mortality in Alava on meningitis mortality in Guipuzcoa has a delay greater than three semesters on the original scale.
6. Conclusions

The proposed aggregation algorithm and the resulting aggregated chain graph provide a natural description of the independence properties of temporal aggregation of spatio–temporal data. We have seen that new spatial and temporal dependencies can appear as a consequence
of aggregation. The sufficiency condition given in Section 4 allows for the preservation of special temporal independence properties. Unfortunately, the aggregated chain graph does not identify all the independencies in the aggregated model. However, it satisfies a relevant property: within the chain graph class, the aggregated chain graph is the best graph that covers them.

All the results have been proved assuming discrete distributions. They can be extended to the absolutely continuous case (see Sanmartín, 1997), but the arguments become more abstract and the algorithmic steps cannot be introduced in such a natural way. The use of factorization properties to characterize statements of independence between variables avoids technical problems in the absence of the positivity condition.

The main results have been stated in a more general setting than temporal aggregation of spatio–temporal data. They can be applied to every chain graph where an aggregation process is present. In fact, the starting acyclic directed graph could be replaced by a chain graph and the aggregation algorithm would still remain valid (if we assume the chain factorization of the initial hidden random vector with respect to this starting chain graph); only some minor technical details would change in the proofs. We have chosen the directed graph version for a better motivation of the problem.

The algorithm can also be used in the reverse sense: given the resulting observed aggregated process, we can apply the algorithm to different starting graphs and try to identify the original hidden dependencies.

Acknowledgements

The final version of this paper was carried out while the third author was visiting CREST-Ensae, whose hospitality is gratefully acknowledged. The authors also wish to express their thanks to the associate editor and referees for their helpful suggestions and comments.

References

Andersson, S.A., Madigan, D., Perlman, M.D., 2001. Alternative Markov properties for chain graphs. Scand. J. Statist. 28, 33–85.
Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36, 192–225.
Besag, J., 1977. Some methods of statistical analysis for spatial data. Bull. Internat. Statist. Inst. 47, 77–92.
Buntine, W., 1995. Chain graphs for learning. In: Besnard, P., Hanks, S. (Eds.), Uncertainty in Artificial Intelligence, Vol. 11. Morgan Kaufmann, San Francisco, CA, pp. 46–54.
Castillo, E., Gutiérrez, J.M., Hadi, A.S., 1997. Expert Systems and Probabilistic Network Models. Springer, New York.
Castillo, E., Ferrándiz, J., Sanmartín, P., 1998. Marginalizing in undirected graph and hypergraph models. In: Cooper, G.F., Moral, S. (Eds.), Uncertainty in Artificial Intelligence, Vol. 14. Morgan Kaufmann, San Francisco, CA, pp. 69–78.
Chadoeuf, J., Nandris, D., Geiger, J., Nicole, M., 1992. Modélisation spatio–temporelle d'une épidémie par un processus de Gibbs: estimation et tests. Biometrics 48, 1165–1175.
Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J., 1999. Probabilistic Networks and Expert Systems. Springer, New York.
Cox, D.R., Wermuth, N., 1996. Multivariate Dependencies: Models, Analysis and Interpretation. Chapman & Hall, London.
Cressie, N., 1991. Statistics for Spatial Data. Wiley, New York.
Dahlhaus, R., 2000. Graphical interaction models for multivariate time series. Metrika 51, 157–172.
Dahlhaus, R., Eichler, M., 2003. Causality and graphical models in time series analysis. In: Green, P., Hjort, N., Richardson, S. (Eds.), Highly Structured Stochastic Systems. Oxford University Press, Oxford, pp. 115–134.
Darroch, J., Lauritzen, S.L., Speed, T., 1980. Markov fields and log-linear interaction models for contingency tables. Ann. Statist. 8, 522–539.
Dawid, A.P., 1980. Conditional independence for statistical operations. Ann. Statist. 8, 598–617.
Frydenberg, M., 1990a. The chain graph Markov property. Scand. J. Statist. 17, 333–353.
Frydenberg, M., 1990b. Marginalization and collapsibility in graphical interaction models. Ann. Statist. 18, 790–805.
Guyon, X., 1995. Random fields on a network: modelling, statistics and applications. In: Kelly, F.P., Williams, R.J. (Eds.), Stochastic Networks. The IMA Volumes in Mathematics and its Applications. Springer-Verlag, New York.
Irwin, M.E., Cressie, N., Johannesson, G., 2002. Spatial–temporal nonlinear filtering based on hierarchical models. Test 11, 249–302.
Koster, J.T.A., 2002. Marginalizing and conditioning in graphical models. Bernoulli 8, 817–840.
Lacruz, B., Lasala, P., Lekuona, A., 2000. Dynamic graphical models and nonhomogeneous hidden Markov models. Statist. Probab. Lett. 49, 377–385.
Lauritzen, S.L., 1996. Graphical Models. Oxford Statistical Science Series, Vol. 17. Oxford University Press, Oxford.
Lauritzen, S.L., Richardson, T.S., 2002. Chain graph models and their causal interpretations (with discussion). J. Roy. Statist. Soc. Ser. B 64, 321–361.
Lauritzen, S.L., Wermuth, N., 1989. Graphical models for association between variables, some of which are qualitative and some quantitative. Ann. Statist. 17, 31–57.
Liu, X., Li, Z., 1994. A reasoning method in damage assessment of buildings (special issue on Uncertainty in Expert Systems). Microcomput. Civil Eng. 9, 329–334.
Lynggaard, H., Walther, K.H., 1993. Dynamic modelling with mixed graphical association models. Master's Thesis, Aalborg University.
Meek, C., 1995. Strong completeness and faithfulness in Bayesian networks. In: Horvitz, E., Jensen, F. (Eds.), Uncertainty in Artificial Intelligence, Vol. 12. Morgan Kaufmann, San Francisco, CA, pp. 40–48.
Mohamed, W., Diamond, I.D., Smith, W.F., 1998. The determinants of infant mortality in Malaysia: a graphical chain modelling approach. J. Roy. Statist. Soc. Ser. A 161, 349–366.
Pearl, J., 2002. Statistics and causal inference: a review. Test 12, 281–345.
Richardson, T.S., 1998. Chain graphs and symmetric associations. In: Jordan, M.I. (Ed.), Learning in Graphical Models. Kluwer, Dordrecht, pp. 231–260.
Richardson, T.S., 2003. Markov properties for acyclic directed mixed graphs. Scand. J. Statist. 30, 145–157.
Richardson, T.S., Spirtes, P., 2002. Ancestral graph Markov models. Ann. Statist. 30, 962–1030.
Sanmartín, P., 1997. Agregación temporal en modelos de grafos cadena. Ph.D. Thesis, Departament d'Estadística i I.O., Universitat de València.
Shachter, R.D., 1988. Probabilistic inference and influence diagrams. Oper. Res. 36, 589–603.
Studeny, M., 1997. On marginalization, collapsibility and precollapsibility. In: Benes, V., Stepan, J. (Eds.), Distributions with Given Marginals and Moment Problems. Kluwer, Dordrecht, pp. 191–198.
Studeny, M., Bouckaert, R., 1998. On chain graph models for description of conditional independence structures. Ann. Statist. 26, 1434–1495.
Wermuth, N., Cox, D., 2000. A sweep operator for triangular matrices and its statistical applications. Technical Report 00-04, ZUMA Institute, Mannheim, Germany.
Wermuth, N., Cox, D., Pearl, J., 1999. Explanations for multivariate structures derived from univariate recursive regressions. Technical Report, Revision of 94-1, University of Mainz, Germany.
Wermuth, N., Lauritzen, S.L., 1990. On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J. Roy. Statist. Soc. Ser. B 52, 21–72.
Whittaker, J., 1990. Graphical Models in Applied Multivariate Statistics. Wiley, New York.
Winkler, G., 1995. Image Analysis, Random Fields and Dynamic Monte Carlo Methods: A Mathematical Introduction. Springer, Berlin.