Identifiability of Path-Specific Effects

Chen Avin, Ilya Shpitser, Judea Pearl
Cognitive Systems Laboratory
Department of Computer Science
University of California, Los Angeles
Los Angeles, CA 90095
{avin, ilyas, judea}@cs.ucla.edu

This research was partially supported by AFOSR grant #F49620-01-1-0055, NSF grant #IIS-0097082, and ONR (MURI) grant #N00014-00-1-0617.

Abstract

Counterfactual quantities representing path-specific effects arise in cases where we are interested in computing the effect of one variable on another only along certain causal paths in the graph (in other words by excluding a set of edges from consideration). A recent paper [Pearl, 2001] details a method by which such an exclusion can be specified formally by fixing the value of the parent node of each excluded edge. In this paper we derive simple, graphical conditions for experimental identifiability of path-specific effects, namely, conditions under which path-specific effects can be estimated consistently from data obtained from controlled experiments.

1 Introduction

Total, direct and indirect effects are important quantities in practical causal reasoning about legal, medical, and public policy domains, among others. The task of explicating and computing these quantities has been successfully addressed in the framework of linear structural equation models (SEM), but encountered difficulties in non-linear as well as non-parametric models. See for instance [Robins and Greenland, 1992], [Galles and Pearl, 1995], [Pearl, 2001].

In the linear SEM framework, the total effect of $Z$ on $Y$ is the response of $Y$ to a unit change in the setting of $Z$. On the other hand, the direct effect is the effect of $Z$ on $Y$ not mediated by any other variable in the model, while the indirect effect is the effect of $Z$ on $Y$ excluding the direct effect.

In non-parametric models, we can define the controlled direct effect as the change in the measured response of $Y$ to a change in $Z$, while all other variables in the model, henceforth called context variables, are held constant. Unfortunately, there is no way to construct an equivalent notion of controlled indirect effects, since it is not clear to what values other variables in the model need to be fixed in order to measure such an effect.

Recently, a novel formulation of natural [Pearl, 2001] or pure [Robins and Greenland, 1992] effects was proposed which defined effects in a more refined way by holding variables constant not to predetermined values, but to values they would have attained in some situation. For example, the natural direct effect of $Z$ on $Y$ is the sensitivity of $Y$ to changes in $Z$, while the context variables are held fixed to the values they would have attained had no change in $Z$ taken place. Similarly, the natural indirect effect is the sensitivity of $Y$ to changes the context variables would have undergone had $Z$ been changed, while $Z$ is actually being fixed.

Being complex counterfactual quantities, natural effects tend to have intricate verbal descriptions. It is often easier to explain such effects using the visual intuitions provided by graphical causal models. Graphical causal models represent causal assumptions as graphs, with vertices representing variables, and edges representing direct causal paths. In such models, the natural direct effect can be interpreted as the effect along the edge $Z \to Y$, with the effect along all other edges 'turned off.' Similarly, the natural indirect effect can be interpreted as the effect along all edges except the one between $Z$ and $Y$. Using this interpretation, the suggestive next step in the study of natural effects is to consider effects along a select subset of edges between $Z$ and $Y$, which are called path-specific effects.

1.1 A Motivating Example

Consider the following example, inspired by [Robins, 1997]. A study is performed on the effects of the AZT drug on AIDS patients. AZT is a harsh drug known to cause a variety of complications. For the purposes of the model, we restrict our attention to two: pneumonia and severe headaches. In turn, pneumonia can be treated with antibiotics, and severe headache sufferers can take painkillers. Ultimately, all the above variables, except headache, are assumed to have a direct effect on the survival chances of the patient. The graphical causal model for this situation is shown in Fig. 1.

[Figure 1: The AZT example. $A$: AZT, $P$: pneumonia, $H$: headaches, $B$: antibiotics, $K$: painkillers, $S$: survival]

The original question considered in this model was the total effect of AZT and antibiotics treatment on survival. However, a variety of other questions of interest can be phrased in terms of natural effects. For instance, we may ask what the direct effect of AZT on survival would be if AZT produced no side effects in the patient; this is just the natural direct effect of AZT on survival. See Fig. 2 (a). Similarly, we might be interested in how just the side effects of AZT affect survival, independent of the effect of AZT itself. This corresponds to the natural indirect effect of AZT on survival. See Fig. 2 (b).

[Figure 2: (a) Natural direct effect (b) Natural indirect effect]

Furthermore, certain interesting questions cannot be phrased in terms of either direct or indirect natural effects. For example we might be interested in the interactions between antibiotics and AZT that negatively affect survival. To study such interactions, we might consider the effect of administering AZT on survival in the idealized situation where the antibiotics variable behaved as if AZT was not administered, and compare this to the total effect of AZT on survival.
Graphically, this amounts to 'blocking' the direct edge between antibiotics and survival or, more precisely, keeping the edge functioning at the level it would have had no AZT been given, while letting the rest of the edges function as usual. This is shown graphically in Fig. 3 (a). The edges which we wish to block will be crossed out in the graph.

[Figure 3: Path specific effects, panels (a) and (b)]

1.2 Outline and Discussion of Our Approach

Our goal is to study and characterize situations where path-specific effects like the one from the previous section can be computed uniquely from the data available to the investigator. Our main result is a simple, necessary, graphical condition for the identifiability of path-specific effects from experimental data. Furthermore, our condition becomes sufficient for models with no spurious correlations between observables, also known as Markovian models.

The condition can be easily described in terms of blocked and unblocked paths as follows. Let $X$, $Y$ be variables in a causal model $M$ inducing a graph $G$. Then given a set of blocked edges $g$, the corresponding path-specific effect of $X$ on $Y$ cannot be identified if and only if there exists a node $W$ with an unblocked directed path from $X$ to $W$, an unblocked directed path from $W$ to $Y$, and a blocked directed path from $W$ to $Y$. For instance, the effects of $A$ on $S$ are identifiable in Fig. 2 (a), (b), and Fig. 3 (b), but not in Fig. 3 (a). Therefore, in general we cannot study the interactions of AZT and antibiotics in the way described above, but we can study the interactions of AZT and painkillers. The latter case is made tractable by an absence of blocked and unblocked paths sharing edges.

Our condition also shows that all identifiable path-specific effects are 'equivalent', in a sense made precise later, to effects where only root-emanating edges are blocked. Thus identifiable path-specific effects are a generalization of both natural direct effects, where a single root-emanating edge is unblocked, and of natural indirect effects, where a single root-emanating edge is blocked.

To obtain this result formally, we treat effects as probabilities of statements in a certain counterfactual logic. However, rather than manipulating these probabilities directly, we convert them to subgraphs of the original causal model, and reason about and perform manipulations on the subgraphs. We then introduce simple counterfactual formulas whose probabilities are not identifiable, and prove that certain simple graphical conditions must be described by such formulas, and lack of such conditions leads to subgraphs corresponding to identifiable effects.

Due to space considerations, the proofs of some lemmas have been omitted, while the proofs included generally are missing some technical details. Our technical report contains the complete proofs.

2 Preliminaries

This paper deals extensively with causal models and counterfactuals. We reproduce their definitions here for completeness. A full discussion can be found in [Pearl, 2000]. For the remainder of the paper, variables will be denoted by capital letters, and their values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of values by bold small letters. We will also make use of some graph theoretic abbreviations. We will write $Pa(A)_G$, $De(A)_G$, and $An(A)_G$ to mean the set of parents, descendants (inclusive), and ancestors (inclusive) of node $A$ in graph $G$. $G$ will be omitted from the subscript when assumed or obvious. If a variable is indexed, i.e. $V^i$, we will sometimes denote the above sets as $Pa_i$, $De_i$, and $An_i$, respectively.

2.1 Causal Models and Counterfactual Logic

Definition 1 A probabilistic causal model (PCM) is a tuple $M = \langle \mathbf{U}, \mathbf{V}, \mathbf{F}, P(\mathbf{u}) \rangle$, where
(i) $\mathbf{U}$ is a set of background or exogenous variables, which cannot be observed or experimented on, but which can influence the rest of the model.
(ii) $\mathbf{V}$ is a set $\{V^1, ..., V^n\}$ of observable or endogenous variables. These variables are considered to be functionally dependent on some subset of $\mathbf{U} \cup \mathbf{V}$.
(iii) $\mathbf{F}$ is a set of functions $\{f^1, ..., f^n\}$ such that each $f^i$ is a mapping from a subset of $\mathbf{U} \cup \mathbf{V} \setminus \{V^i\}$ to $V^i$, and such that $\bigcup \mathbf{F}$ is a function from $\mathbf{U}$ to $\mathbf{V}$.
(iv) $P(\mathbf{u})$ is a joint probability distribution over the variables in $\mathbf{U}$.

A causal model $M$ induces a directed graph $G$, where each variable corresponds to a vertex in $G$ and the directed edges are from the variables in the domain of $f^i$ (i.e. $Pa_i$) to $V^i$ for all the functions. For the remainder of this paper, we consider causal models which induce directed acyclic graphs.

A Markovian causal model $M$ has the property that each exogenous variable $U$ is in the domain of at most one function $f$. A causal model which does not obey this property is called semi-Markovian. By convention, nodes corresponding to variables in $\mathbf{U}$ are not shown in graphs corresponding to Markovian models.

For the purposes of this paper, we will represent counterfactual statements in a kind of propositional modal logic, similar to the one used in [Halpern, 2000]. Furthermore, the distribution $P(\mathbf{u})$ will induce an additional probabilistic interpretation on the statements in the logic.

Definition 2 (atomic counterfactual formula) Let $M$ be a causal model, let $X$ be a variable and $\mathbf{Z}$ be a (possibly empty) set of variables. Then for any value $x$ of $X$, and values $\mathbf{z}$ of $\mathbf{Z}$, $x$ is a term, and $X_{\mathbf{z}}(\mathbf{u})$ is a term, taken to mean 'the value $X$ attains when $\mathbf{Z}$ is forced to take on values $\mathbf{z}$, and $\mathbf{U}$ attain values $\mathbf{u}$.' For two terms $t_1$ and $t_2$, an atomic counterfactual formula has the form $t_1 = t_2$. We will abbreviate formulas of the form $X_{\mathbf{z}}(\mathbf{u}) = x$ as $x_{\mathbf{z}}(\mathbf{u})$.

The 'forcing' of the variables to $\mathbf{z}$ is called an intervention, and is denoted by $do(\mathbf{z})$ in [Pearl, 2000]. Counterfactual formulas are constructed from atomic formulas using conjunction and negation.

Definition 3 (counterfactual formula)
(i) An atomic formula $\phi(\mathbf{u})$ is a counterfactual formula.
(ii) If $\phi(\mathbf{u})$ is a counterfactual formula, then so is $(\neg\phi)(\mathbf{u})$.
(iii) If $\phi(\mathbf{u})$ and $\psi(\mathbf{u})$ are counterfactual formulas, then so is $(\phi \wedge \psi)(\mathbf{u})$.

The satisfaction of counterfactual formulas by causal models is defined in the standard way, which we reproduce from [Halpern, 2000].

Definition 4 (entailment) A causal model $M$ satisfies a counterfactual formula $\phi(\mathbf{u})$, written $M \models \phi(\mathbf{u})$, if all variables appearing in $\phi$ are in $M$ and one of the following is true:
(i) $\phi(\mathbf{u}) \equiv t_1 = t_2$ and for the given setting of $\mathbf{u}$, the terms $t_1$ and $t_2$ are equal in $M$.
(ii) $\phi(\mathbf{u}) \equiv (\neg\psi)(\mathbf{u})$ and $M \not\models \psi(\mathbf{u})$.
(iii) $\phi(\mathbf{u}) \equiv (\psi \wedge \theta)(\mathbf{u})$ and $M \models \psi(\mathbf{u})$ and $M \models \theta(\mathbf{u})$.

Thus a formula $\phi(\mathbf{u})$ has a definite truth value in $M$. If the values $\mathbf{u}$ are unknown, we cannot in general determine the truth of $\phi$. However, we can easily define a natural notion of probability of $\phi$ in $M$ as follows:

$$P(\phi \mid M) = \sum_{\{\mathbf{u} \,\mid\, M \models \phi(\mathbf{u})\}} P(\mathbf{u}) \tag{1}$$

We will omit the conditioning on $M$ if the model in question is assumed or obvious.

If we consider each value assignment $\mathbf{u}$ as a possible world, then we can view $P(\mathbf{u})$ as describing our degree of belief that a particular world is true, and $P(\phi)$ as our belief that a particular statement is true in our causal model if viewed as a type 2 probability structure [Halpern, 1990].

2.2 Submodels and Identifiability

Definition 5 (submodel) For a causal model $M = \langle \mathbf{U}, \mathbf{V}, \mathbf{F}, P(\mathbf{u}) \rangle$, an intervention $do(\mathbf{z})$ produces a new causal model $M_{\mathbf{z}} = \langle \mathbf{U}, \mathbf{V}_{\mathbf{z}}, \mathbf{F}_{\mathbf{z}}, P(\mathbf{u}) \rangle$, where $\mathbf{V}_{\mathbf{z}}$ is a set of distinct copies of variables in $\mathbf{V}$, and $\mathbf{F}_{\mathbf{z}}$ is obtained by taking distinct copies of functions in $\mathbf{F}$, but replacing all copies of functions which determine the variables in $\mathbf{Z}$ by constant functions setting the variables to values $\mathbf{z}$.

The joint distribution $P(\mathbf{V}_{\mathbf{z}})$ over the endogenous variables in $M_{\mathbf{z}}$ is called an interventional distribution, and is sometimes denoted as $P_{\mathbf{z}}$. For a given causal model $M$, define $P_*$ as $\{P_{\mathbf{z}} \mid \mathbf{Z} \subseteq \mathbf{V}, \mathbf{z} \text{ a value assignment of } \mathbf{Z}\}$. In other words, $P_*$ is the set of all possible interventional (or experimental) distributions of $M$.

Intuitively, the submodel is the original causal model, minimally altered to render $\mathbf{Z}$ equal to $\mathbf{z}$, while preserving the rest of its probabilistic structure.

Because there is no requirement that interventions in atomic counterfactuals in a formula be consistent with each other, it is in general impossible to alter the original model using only interventions in such a way as to make the entire formula true. Thus, we introduce a causal model which encompasses the 'parallel worlds' described by the counterfactual formula.

Before doing so, we give a simple notion of union of submodels, as follows:

Definition 6 (causal model union) Let $M_{\mathbf{x}}$ and $M_{\mathbf{z}}$ be submodels derived from $M$. Then $M_{\mathbf{x}} \cup M_{\mathbf{z}}$ is defined to be $M_{\mathbf{x}}$ if $\mathbf{z} = \mathbf{x}$, and $\langle \mathbf{U}, \mathbf{V}_{\mathbf{x}} \cup \mathbf{V}_{\mathbf{z}}, \mathbf{F}_{\mathbf{x}} \cup \mathbf{F}_{\mathbf{z}}, P(\mathbf{u}) \rangle$ otherwise.

Definition 7 (parallel worlds model) Let $M$ be a causal model, $\phi$ a counterfactual formula. Then the parallel worlds model $M_\phi$ is the causal model union of the submodels corresponding to atomic counterfactuals of $\phi$.

We call the joint distribution $P(\mathbf{V}_\phi)$ over the endogenous variables in $M_\phi$ a counterfactual distribution, and will sometimes denote it as $P_\phi$. In the language of the potential outcomes framework [Rubin, 1974], we can view $P_\phi$ as the joint distribution over the unit-response variables mentioned in $\phi$.

The parallel worlds model is a generalization of the twin network model, first appearing in [Balke and Pearl, 1994], to more than two possible worlds. It displays independence assumptions between counterfactual quantities in the same way a regular causal model displays independence assumptions between observable quantities: by positing that counterfactuals are independent of their non-descendants given their parents.

Given a causal model $M$ and a formula $\phi$, we are interested in whether the corresponding counterfactual joint distribution $P_\phi$ (or its marginal distributions) can be computed uniquely from the set of joint distributions available to the investigator. The formal statement of this question is as follows:

Definition 8 (identifiability) Let $M$ be a causal model from a set of models $\mathbf{M}$ inducing the same graph $G$, $M_\phi$ a parallel worlds model, and $Q$ be a marginal distribution of the counterfactual joint distribution $P_\phi$. Let $K$ be a set of known probability distributions derived from $M$. Then $Q$ is $K$-identifiable in $\mathbf{M}$ if it is unique and computable from $K$ in any $M \in \mathbf{M}$.

It follows from the definition that if we can construct two models in $\mathbf{M}$ with the same $K$ but different $Q$, then $Q$ is not identifiable. An important, well-studied special case of this problem, which we call evidential identifiability of interventions, assumes $\phi$ is an atomic counterfactual, and $K$ is the joint distribution over the endogenous variables in $M$, or $P(\mathbf{V})$. Being able to identify an interventional marginal in this way is being able to compute the effects of an intervention without having to actually perform the intervention, and instead relying on passive, observational data.

In this paper we are concerned with identifying probabilities of counterfactual formulas using the set $P_*$ of all interventional distributions of $M$ as a given. In other words, we are interested in computing probabilities of counterfactuals from experimental and observational probabilities.

3 Path-Specific Effects

Our aim is to provide simple, graphical conditions for the $P_*$-identifiability of path-specific effects. To do so, we must formalize such effects as counterfactual formulas, and translate the identifiability conditions on the formula to conditions on the graph.

The following is the formalization of the notion of path-specific effect in terms of a modified causal model, as it appears in [Pearl, 2001]:

Definition 9 (path-specific effect) Let $G$ be the causal graph associated with model $M$, and let $g$ be an edge-subgraph of $G$ containing the paths selected for effect analysis (we will refer to $g$ as the effect subgraph). The $g$-specific effect of $z$ on $Y$ (relative to reference $z^*$) is defined as the total effect of $z$ on $Y$ in a modified model $M_g$ formed as follows. Let each parent set $PA_i$ in $G$ be partitioned into two parts $PA_i = \{PA_i(g), PA_i(\bar{g})\}$, where $PA_i(g)$ represents those members of $PA_i$ that are linked to $V^i$ in $g$, and $PA_i(\bar{g})$ represents the complementary set. We replace each function $f^i$ in $M$ with a new function $f^i_g$ in $M_g$, defined as follows: for every set of instantiations $pa_i(g)$ of $PA_i(g)$, $f^i_g(pa_i(g), u) = f^i(pa_i(g), pa_i(\bar{g})^*, u)$, where $pa_i(\bar{g})^*$ takes the value of $PA_i(\bar{g})_{z^*}(u)$ in $M$. The collection of modified functions forms a new model $M_g$. The $g$-specific effect of $z$ on $Y$, denoted $SE_g(z, z^*; Y, u)_M$, is defined as the total effect (abbreviated as TE) of $z$ on $Y$ in the modified model:

$$SE_g(z, z^*; Y, u)_M = TE(z, z^*; Y, u)_{M_g} \tag{2}$$

where $TE(z, z^*; Y, u)_{M_g} = Y_z(u)_{M_g} - Y_{z^*}(u)_{M_g}$.

If we wish to summarize the path-specific effect over all settings of $u$, we should resort to the expectation of the above difference, or the expected path-specific effect. To identify this effect, we need to identify $P(y_z)$ and $P(y_{z^*})$ in $M_g$. For our purposes we can restrict our attention to $P(y_z)$, as the second term corresponds to the quantity $P(y_{z^*})$ in the original model $M$, and so is trivially $P_*$-identifiable.

In this paper we assume, without loss of generality, that edges in $\bar{g} = G \setminus g$ are all along directed paths between $Z$ and $Y$. The next theorem states that any path specific effect, expressed as a total effect in the modified model $M_g$, can be expressed as a counterfactual formula in the original model $M$.

Theorem 1 Every path specific effect $P(y_z)_{M_g}$ has a corresponding counterfactual formula $\phi$ in $M$ s.t. for every $u$,

$$M_g \models y_z(u) \iff M \models \phi(u)$$

Proof outline: The proof is for causal models with finite domains. Fix $M$, $u$, $y$, $z$ and $g$. To prove the theorem, we need to 'unroll' $y_z$ and remove any implicit references to modified functions in $M_g$, while preserving the truth value of the statement. Our proof will use the axiom of composition, known to hold true for causal models under consideration. In our language, the axiom states that for any three variables $Z$, $Y$, $W$, and any settings $u$, $z$, $w$, $y$, $(W_z = w \Rightarrow Y_{z,w} = Y_z)(u)$.

Fix $u_1$. Let $S = An(Y) \cap De(Z)$. Then by the axiom of composition, $y_z(u_1)$ has the same truth value as a conjunction of atomic formulas of the form $v^i_{pa_i(g)}$, where $V^i \in S$, $PA_i(g)$ is the set of parents of $V^i$ in $M_g$, and $pa_i(g)$ and $v^i$ are suitably chosen constants. Denote this conjunction $\phi_1$.

For every term $v^i_{pa_i(g)}$ in $\phi_1$ corresponding to $V^i$ with $PA_i(g) \subset PA_i$, replace it by $v^i_{pa_i(g), pa_i(\bar{g})^*} \wedge pa_i(\bar{g})^*_{z^*}$ in the conjunction, where $pa_i(\bar{g})^*$ takes the value of $PA_i(\bar{g})_{z^*}(u_1)$ in $M$. Denote the result $\phi^*_1$. Note that $\phi^*_1$ is in $M$ and $M_g \models y_z(u_1) \iff M \models \phi^*_1(u_1)$. We construct a similar conjunction $\phi^*_j$ for every instantiation $u_j$ in $M$. Let $\phi = \bigvee_j \phi^*_j$. It is easy to see the claim holds for $\phi$ by construction. □

An easy corollary of the theorem is, as before, that $P(y_z)_{M_g} = P(\phi)_M$. Note that the different $\phi^*_i$ in the proof only differ in the values they assign to variables in $S$. Since $M$ is composed of functions, the values of variables in $S$ are fixed given $u$, and since $P(\phi) = \sum_{\{u \,\mid\, M \models \bigvee_i \phi^*_i(u)\}} P(u)$ by definition, we can express $P(\phi)$ as a summation over the variables in $S \setminus \{Y\}$.

For instance, the first term of the path-specific effect in Fig.
2 (a) can be expressed as

$$P(s_a)_{M_{g_{2a}}} = \sum_{k,b,p,h} P(s_{k,b,p,a} \wedge k_h \wedge b_p \wedge p_{a^*} \wedge h_{a^*}) = \sum_{h,p} P(s_{a,h,p} \wedge h_{a^*} \wedge p_{a^*}) \tag{3}$$

which is just the direct effect. The more general case of Fig. 3 (a) can be expressed as (see footnote 1):

$$P(s_a)_{M_{g_{3a}}} = \sum_{k,b,p,h} P(s_{k,b,p,a} \wedge k_h \wedge b_{a^*} \wedge p_a \wedge h_a) = \sum_{b} P(s_{a,b} \wedge b_{a^*}) \tag{4}$$

Footnote 1: Note that Eq. (4) is different from $\sum_{b_{a^*}} P(s_{a,b} \wedge b_{a^*})$, which is just a marginalization over the counterfactual variable $b_{a^*}$.

It looks as if the expressions in Eq. (3) and (4) for the two effects are very similar; moreover, we know that direct effects are always $P_*$-identifiable in Markovian models. Surprisingly, the path specific effect of Fig. 3 (a) and Eq. (4) is not $P_*$-identifiable, as we will show later.

We will find it useful to modify the effect subgraph $g$ while preserving the value of the path-specific effect. We do so by means of the following two rules. Let $M$ be a causal model with the graph $G$, $g$ an effect subgraph of $G$, and $\bar{g} = G \setminus g$. For a node $V$, let $in(V)$ denote the set of edges incoming into $V$, and $out(V)$ denote the set of edges outgoing from $V$, in $G$.

R1: If there is a node $V$ in $G$ such that $out(V) \subseteq \bar{g}$, then $R_1(g)$ is the effect subgraph whose set of blocked edges is $(\bar{g} \setminus out(V)) \cup in(V)$. See Fig. 4 (a).

R2: If there is an edge $e \in \bar{g}$, such that for all directed paths from $Z$ to $Y$ which include $e$, there exists another edge $e' \in \bar{g}$, which occurs 'upstream' from $e$, then $R_2(g)$ is the effect subgraph whose set of blocked edges is $\bar{g} \setminus \{e\}$. See Fig. 4 (b).

[Figure 4: Bold edges represent directed paths (a) R1 Rule (b) R2 Rule]

Theorem 2 (Effect-Invariant Rules) If R1 is applicable, the $R_1(g)$-specific effect is equal to the $g$-specific effect. If R2 is applicable, the $R_2(g)$-specific effect is equal to the $g$-specific effect.

Proof outline: The proof is by induction on graph structure, and is an easy consequence of the definition of $g$-specific effect, and the R1 and R2 rules. □

Intuitively, R1 'moves' the blocked edges closer to the manipulated variable $Z$, and R2 removes redundant blocked edges. Thus, it is not surprising these two identities cannot be applied forever in a dag.

Lemma 1 Let $M$ be a causal model, $g$ an effect subgraph. Then any sequence of applications of R1 and R2 to $g$ will reach a fixed point $g^*$.

4 Problematic Counterfactual Formulas

Identification of a distribution must precede its estimation, as there is certainly no hope of estimating a quantity not uniquely determined by the modeling assumptions. Furthermore, uniqueness frequently cannot be guaranteed in causal models. For instance, when identifying interventions from observational data, a particular graph structure, the 'bow-arc', has proven to be troublesome. Whenever the graph of a causal model contains the bow-arc, certain experiments become unidentifiable [Pearl, 2000]. Our investigation revealed that a similarly problematic structure exists for experimental identifiability, which we call the 'kite graph', due to its shape. The kite graph arises when we try to identify counterfactual probabilities of the form $P(r_{z^*} \wedge r'_z)$.

Lemma 2 Let $M$ be a causal model, let $Z$ and $R$ be variables such that $Z$ is a parent of $R$. Then $P(r_{z^*} \wedge r'_z)$ is not $P_*$-identifiable if $z^* \neq z$.

Proof outline: The proof is by counterexample. We let $\phi = r_{z^*} \wedge r'_z$, and construct two causal models $M^1$ and $M^2$ that agree on the interventional distribution set $P_*$, but disagree on $P(\phi)$. In fact, we only need 2 variables. The two models agree on the following: $Z$ is the parent of $R$; $U_Z$, $Z$ and $R$ are binary variables; $U_R$ is a ternary variable; $f_Z = U_Z$; and $P(u_Z)$ and $P(u_R)$ are uniform. The two models only differ on the functions $f_R$, which are given by Table 1. It is easy to verify our claim holds for the two models for any values $z^* \neq z$ of $Z$. □

Table 1: The functions $f^1_R$ and $f^2_R$

  Z   U_R   R = f^1_R(z, u_R)   R = f^2_R(z, u_R)
  0   1     0                   1
  0   2     1                   1
  0   3     1                   0
  1   1     1                   1
  1   2     0                   0
  1   3     0                   0

The next theorem shows how a particular path-specific effect leads to problematic counterfactuals from the previous lemma.

Theorem 3 The $g$-specific effect of $Z$ on $Y$ as described in Fig. 5 (a) is not $P_*$-identifiable.

[Figure 5: (a) Problematic effect (b) The kite graph]

Proof: We extend models $M^1$ and $M^2$ from the previous proof with additional variables $V$, $Y$, and $U_Y$. We assume $P(u_Y)$ is uniform, and both $P(V, Y \mid R)$ and the functions which determine $V$ and $Y$ are the same in both models.

Note that since all variables are discrete, the conditional probability distributions can be represented as tables. If we require $|R| = |V|$ and $|Y| = |V| \cdot |R|$, then the conditional probabilities are representable as square matrices. We fix the functions $f_V$ and $f_Y$, as well as the exogenous parents of $V$ and $Y$, such that the matrices corresponding to $P(Y \mid R, V)$ and $P(V \mid R)$ are invertible.

Call the extended models $M^3$ and $M^4$. Note that by construction, the two models are Markovian. Since $M^1$ and $M^2$ have the same $P_*$, and since the two extended models agree on all functions and distributions not in $M^1$ and $M^2$, they must also have the same $P_*$.

Consider the $g$-specific effect shown in Fig. 5 (a). From Theorem 1 we can express the path-specific effect in $M^3_g$ in terms of $M^3$. In particular:

$$P(y_z)_{M^3_g} = \sum_{r,v} P(y_{rv} \wedge r_{z^*} \wedge v_z)_{M^3} = \sum_{r,v,r'} P(y_{rv} \wedge r_{z^*} \wedge v_{r'} \wedge r'_z)_{M^3} = \sum_{r,v,r'} P(r_{z^*} \wedge r'_z)_{M^3} \, P(y_{rv})_{M^3} \, P(v_{r'})_{M^3}$$

The last step is licensed by the independence assumptions encoded in the parallel worlds model of $y_{rv} \wedge r_{z^*} \wedge v_{r'} \wedge r'_z$. The same expression can be derived for $P(y_z)_{M^4_g}$. Note that since $P_*$ is the same for both models, they have the same values for the interventional distributions $P(y_{rv})$ and $P(v_{r'})$. Note that since $P(Y \mid R, V)$ and $P(V \mid R)$ are square matrices, the summing out of $P(Y \mid R, V)$ and $P(V \mid R)$ can be viewed as a linear transformation. Since the matrices are invertible, the transformations are one to one, and so is their composition. Since $P(y_{rv}) = P(y \mid r, v)$ and $P(v_{r'}) = P(v \mid r')$, and since $P(r_{z^*} \wedge r'_z)$ is different in the two models, we obtain that $P(y_z)_{M^3_g} \neq P(y_z)_{M^4_g}$. Since adding directed or bidirected edges to a graph cannot help identifiability, the result also holds in semi-Markovian models. □

5 Main Result

The main result of this section is that a simple graphical criterion for identifiability exists, which is necessary in general and sufficient in Markovian models. This condition is easily stated and can be derived from the effect subgraph $g$ in linear time. By contrast, the only other methods known to us for obtaining identifiability results of probabilities of general counterfactual logic formulas are proof search procedures based on results in [Galles and Pearl, 1998], [Halpern, 2000]. Such procedures are far less intuitive, do not have running time bounds, and cannot be used to obtain non-identifiability proofs. First let us define this criterion:

Definition 10 (Recanting witness criterion) Let $R \neq Z$ be a node in $G$, such that there exists a directed path in $g$ from $Z$ to $R$, a directed path from $R$ to $Y$ in $g$, and a directed path from $R$ to $Y$ in $G$ but not $g$. Then $Z$, $Y$, and $g$ satisfy the recanting witness criterion with $R$ as a witness.

The recanting witness criterion is illustrated graphically as the 'kite pattern' in Fig. 5 (b). The name 'recanting witness' comes from the behavior of the variable $R$ in the center of the 'kite.' This variable, in some sense, 'tries to have it both ways.' Along one path from $R$ to $Y$, $R$ behaves as if the variable $Z$ was set to one value, but along another path, $R$ behaves as if $Z$ was set to another value. This 'changing of the story' of $R$ is what causes the problem, and as we will show it essentially leads to the existence of a non-$P_*$-identifiable expression of the type discussed in Section 4.

To proceed, we must make use of the following helpful lemmas. Let $g$ be an effect subgraph of $G$ and $g^*$ the fixed point of R1 and R2. Let $\bar{g}^* = G \setminus g^*$.

Lemma 3 $g$ satisfies the recanting witness criterion iff $g^*$ does. Moreover, if $g^*$ does satisfy the criterion, then there exists a witness $R$ s.t. $out(R) \cap \bar{g}^* \neq \emptyset$. If $g^*$ does not, then $\bar{g}^* \subseteq out(Z)$.

Lemma 3 states that repeated applications of rules R1 and R2 preserve the satisfaction of the recanting witness criterion. Moreover, if the witness exists in the fixed point $g^*$, then some outgoing edge from it is blocked. If the witness does not exist in $g^*$, then only root-emanating edges are blocked.

Lemma 4 Assume the $g$-specific effect of $Z$ on $Y$ is $P_*$-identifiable. Let $E$ be any set of edges in $\bar{g}$. Let $g' = E \cup g$. Then the $g'$-specific effect of $Z$ on $Y$ is $P_*$-identifiable.

Lemma 4 states that if a path specific effect is not identified, then adding blocked directed edges 'does not help,' in that the effect remains unidentified. Now we can state and prove the main results:

Theorem 4 If $g$ satisfies the recanting witness criterion, then the $g$-specific effect of $Z$ on $Y$ is not $P_*$-identifiable.

Proof: Let $M$ be our model and assume that $g$ satisfies the recanting witness criterion. By Lemma 3 so does $g^*$; let $R$ be the witness from the lemma s.t. $e = R \to V$ is in $\bar{g}^*$. Assume that the $g$-specific effect is identifiable. By Theorem 2 so is the $g^*$-specific effect. Let $g'$ be the path specific effect obtained by adding all edges to $g^*$, but $e$. By Lemma 4 the $g'$-specific effect is also $P_*$-identifiable. Now by composing the functions in $g'$ we can obtain a new model $M'$ which is exactly the model of Fig. 5 (a) (see footnote 2) and $P(y_z)_{M_{g'}} = P(y_z)_{M'_{g'}}$. From Theorem 3 we know that $P(y_z)_{M'_{g'}}$ is not $P_*$-identifiable, and hence, neither is $P(y_z)_{M_{g'}}$ and the $g'$-specific effect is not $P_*$-identifiable. Contradiction. □

Footnote 2: or a similar model where we "cut" the edge $R \to V$ and not the edge $R \to Y$.

To illustrate the use of the theorem, consider the example in Eq. (4) from Section 3.
The expression can be expanded as

$$\sum_{b} P(s_{a,b} \wedge b_{a^*}) = \sum_{b,p} P(s_{a,b} \wedge b_p \wedge p_{a^*}) = \sum_{b,p,p'} P(s_{a,b,p'} \wedge b_p \wedge p_{a^*} \wedge p'_a) = \sum_{b,p,p'} P(s_{a,b,p'} \wedge b_p) \, P(p_{a^*} \wedge p'_a) \tag{5}$$

The first two steps are by definition; the last step is licensed by the parallel worlds model corresponding to the formula in Eq. 5. The theorem shows that, as in this example, non-identifiability arises because formulas of the form $p_{a^*} \wedge p'_a$ appear whenever the recanting witness criterion holds.

Theorem 5 If $g$ does not satisfy the recanting witness criterion, then the $g$-specific effect of $Z$ on $Y$ is $P_*$-identifiable in Markovian models.

Proof: From Theorem 2 we have that $P(y_z)_{M_g} = P(y_z)_{M_{g^*}}$. Since $g$ does not satisfy the recanting witness criterion, by Lemma 3 all the edges in $\bar{g}^*$ emanate from $Z$. From Theorem 1 there is a formula $\phi(g^*)$ corresponding to $P(y_z)_{M_{g^*}}$ that contains only atomic counterfactuals of the form $v^i_{pa_i}$. Since all blocked edges emanate from $Z$, it can be easily observed that for each two atomic counterfactuals $v^i_{pa_i}$, $v^j_{pa_j}$ in $\phi(g^*)$, $i \neq j$. This follows, since we only introduce atomic counterfactuals with $do(z^*)$ where we cut edges. Now since in Markovian models any two different variables are independent if you set all their parents, all the atomic counterfactuals in $\phi(g^*)$ are independent of each other, which makes the expression $P_*$-identifiable. □

For example, we stated earlier that the $g$-specific effect of Fig. 3 (b) is identifiable; this is true since $g$ does not satisfy the recanting witness criterion. In particular the expression for the path-specific effect is:

$$P(s_a)_{M_{g_{3b}}} = \sum_{k,b,p,h} P(s_{k,b,p,a} \wedge k_h \wedge b_a \wedge p_a \wedge h_{a^*}) = \sum_{h} P(s_{h,a} \wedge h_{a^*}) = \sum_{h} P(s_{h,a}) \, P(h_{a^*}) \tag{6}$$

As before, the first two steps are by definition, and the last step is licensed by the parallel worlds model corresponding to the formula in Eq. 6. But now note that $P(s_{h,a}), P(h_{a^*}) \in P_*$, therefore the above expression can be computed from experiments.

6 Conclusions

Our paper presented sufficient and necessary graphical conditions for the experimental identifiability of path-specific effects, using tools from probability theory, graph theory, and counterfactual logic. We related identifiable path-specific effects to direct and indirect effects by showing that all such effects only block root-emanating edges.

While it is possible to give a sufficient condition for identifiability of general counterfactual formulas in our language, using induction on formula structure, this does not give a single necessary and sufficient condition for semi-Markovian models. The search for such a condition is a good direction for future work.

Another interesting direction is to consider special cases of causal models where path-specific effects can be identified even in the presence of the 'kite'; this is true in linear models, for instance.

Finally, our result assumes causal models with finite domains, and 'small' graphs. An interesting generalization is to consider causal models with 'large' or infinite graphs and infinite domains. Such models may require adding first-order features to the language.

7 Acknowledgements

The authors would like to thank Brian Gaeke and Paul Twohey for proofreading earlier versions of this paper.

References

[Balke and Pearl, 1994] Alexander Balke and Judea Pearl. Counterfactual probabilities: Computational methods, bounds and applications. In Proceedings of UAI-94, pages 46-54, 1994.

[Galles and Pearl, 1995] David Galles and Judea Pearl. Testing identifiability of causal effects. In Proceedings of UAI-95, pages 185-195, 1995.

[Galles and Pearl, 1998] David Galles and Judea Pearl. An axiomatic characterization of causal counterfactuals. Foundation of Science, 3:151-182, 1998.

[Halpern, 1990] Joseph Y. Halpern. An analysis of first-order logics of probability. Artificial Intelligence, 46(3):311-350, 1990.

[Halpern, 2000] Joseph Halpern. Axiomatizing causal reasoning. Journal of A.I. Research, pages 317-337, 2000.

[Pearl, 2000] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

[Pearl, 2001] Judea Pearl. Direct and indirect effects. In Proceedings of UAI-01, pages 411-420, 2001.

[Robins and Greenland, 1992] James M. Robins and Sander Greenland. Identifiability and exchangeability of direct and indirect effects. Epidemiology, 3:143-155, 1992.

[Robins, 1997] James M. Robins. Causal inference from complex longitudinal data. In Latent Variable Modeling and Applications to Causality, volume 120, pages 69-117, 1997.

[Rubin, 1974] D. B. Rubin. Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology, 66:688-701, 1974.
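
As an illustration of the recanting witness criterion (Definition 10), the check can be carried out mechanically on small graphs. The following is a minimal sketch under stated assumptions, not the linear-time procedure mentioned in Section 5: it simply recomputes reachability for every candidate node. The function names, the edge-list encoding of the AZT graph (read off the verbal description in Section 1.1), and the choice of the painkillers-to-survival edge as the blocked edge for the AZT/painkillers question are assumptions made for illustration only.

```python
from collections import defaultdict


def reachable(edges, source):
    """Return all nodes reachable from `source` along directed edges.

    `source` itself is included, so a node trivially reaches itself.
    """
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def recanting_witnesses(G, blocked, Z, Y):
    """Return every node R that witnesses the criterion of Definition 10.

    G       -- iterable of directed edges (u, v) of the causal graph
    blocked -- the edges of G that are NOT in the effect subgraph g
    Z, Y    -- manipulated variable and outcome variable
    """
    G, blocked = set(G), set(blocked)
    g = G - blocked  # effect subgraph: the unblocked edges
    nodes = {n for edge in G for n in edge}
    from_Z_in_g = reachable(g, Z)
    witnesses = set()
    for R in nodes - {Z}:
        # (1) a directed path from Z to R inside g
        if R not in from_Z_in_g:
            continue
        # (2) a directed path from R to Y inside g
        if Y not in reachable(g, R):
            continue
        # (3) a directed path from R to Y in G that uses a blocked edge
        below_R = reachable(G, R)
        if any(u in below_R and Y in reachable(G, v) for u, v in blocked):
            witnesses.add(R)
    return witnesses


# Hypothetical encoding of the AZT graph of Fig. 1 (assumption: edges are
# taken from the description in Section 1.1).
AZT_GRAPH = {
    ("A", "H"), ("A", "P"), ("A", "S"),
    ("H", "K"), ("P", "B"),
    ("P", "S"), ("K", "S"), ("B", "S"),
}
```

Under this encoding, `recanting_witnesses(AZT_GRAPH, {("B", "S")}, "A", "S")` returns `{"P"}`, matching the non-identifiable effect of Fig. 3 (a) and the troublesome pair of pneumonia counterfactuals in Eq. (5). Blocking the edges that define the natural direct or indirect effect yields no witness, in line with Theorem 5.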