Discussion
Comments on “Likelihood-based belief function: Justification and some extensions to low-quality data” by Thierry Denœux

Serafín Moral
Dpto. Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, 18071 Granada, Spain
Article history:
Received 13 December 2013
Received in revised form 1 April 2014
Accepted 2 April 2014
Available online xxxx

Keywords:
Consonant belief functions
Likelihood
Statistical inference
Imprecise probability
Bayesian statistics
Abstract

The paper by Denœux justifies the use of a consonant belief function to represent the information provided by a likelihood function and proposes some extensions to low-quality data. In my comments I consider the point of view of imprecise probabilities for the representation of likelihood information and the relationships with the proposal in the paper. I also argue for some alternatives to the use of consonant belief functions. Finally, I add some clarifications about the comparison with the Bayesian approach.

© 2014 Published by Elsevier Inc.
1. Comments

The problem considered in Denœux’s paper [1] is very important because, in many practical situations, we have a parameter space Θ and the only available information is provided by an observation X ∈ X, which defines a likelihood on Θ through a conditional probability distribution p(x|θ). If there is a prior distribution on Θ, then the problem can be solved using Bayesian statistics, but in many situations this prior distribution is not available. In fact, in the paper [2] of this issue, we consider a similar problem, studied within the theory of imprecise probabilities. In Denœux’s paper a sound treatment is given using the theory of belief functions.

I agree that the profile likelihood (i.e., the consonant belief function with a contour function equal to the relative likelihood) is a distinguished solution to the problem, but I have some doubts about the uniqueness of the solution. The argument provided in the paper is that this plausibility is the least informative one in B_X, the set of all plausibility functions that are compatible with Bayes’ rule. This compatibility is satisfied when the combination of the plausibility with any prior probability on Θ using Dempster’s rule is equal to the posterior of that prior given the likelihood. However, Section 2.4 discusses an incompatibility with Dempster’s rule when two conditionally independent pieces of information are available, each of them providing a likelihood: transforming each of them into a plausibility function and then combining the results with Dempster’s rule is not the same as treating them as a joint piece of information that defines the product likelihood and computing the plausibility associated with this product likelihood. The only possible explanation for this incompatibility is that different kinds of evidence might require different combination mechanisms. If this argument is accepted in this case, why not accept it for prior and likelihood information? These are, in fact, very specific types of information which could have specific combination procedures. Then, if compatibility with Bayesian inference is transformed into a weaker requirement in which the
combination of prior (not necessarily Bayesian) and likelihood information is not carried out by Dempster’s rule, this opens the possibility of other ways of assigning a plausibility to a likelihood function. In [3] other rules are proposed. If we denote by pl the plausibility defined in the paper and by pl1 the one defined by
$$\mathrm{pl}_1(A) = \frac{\sum_{\theta \in A} L(\theta; x)}{\sum_{\theta \in \Theta} L(\theta; x)}, \qquad (1)$$
which is in fact a probability equal to the posterior with respect to the uniform distribution on Θ, then the plausibility
$$\mathrm{pl}_3(A) = \mathrm{pl}(A) + \mathrm{pl}_1(A) - \mathrm{pl}(A)\,\mathrm{pl}_1(A) \qquad (2)$$
corresponds to the disjunctive combination of pl and pl1 [4]; it is a plausibility less informative than pl and could be an alternative to it. It has an additional property: it avoids sure loss [3] when plausibilities are given a behavioural interpretation (the plausibility of A is the infimum of the selling prices for event A). In [3], other rules were also proposed, such as
$$P_4(A) = \max\{\mathrm{pl}(A), \mathrm{pl}_1(A)\}. \qquad (3)$$
However, in this case we do not always obtain a plausibility function, but an upper probability. With this rule, consistency with Bayesian inference can be obtained if we consider the following procedure to combine the likelihood information with a hypothetical prior plausibility pl^π on Θ (a numerical sketch of these rules follows the list below):
• Compute the pignistic probability π associated with pl^π on Θ.
• Let pl1^π be the posterior of π with respect to L(θ; x):

$$\mathrm{pl}_1^\pi(A) = \frac{\sum_{\theta \in A} \pi(\theta) L(\theta; x)}{\sum_{\theta \in \Theta} \pi(\theta) L(\theta; x)}. \qquad (4)$$

• Let pl2^π be the combination by Dempster’s rule of pl (as defined in the paper) and pl^π.
• Compute P4^π using the maximum rule of Eq. (3), but now with pl2^π and pl1^π instead of pl and pl1.
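The following is a minimal numerical sketch of the rules pl, pl1, pl3 and P4 on a small finite Θ; the likelihood values below are hypothetical choices of mine, used only for illustration.

import numpy as np

# Hypothetical likelihood values L(theta; x) on a finite parameter space,
# for one fixed observation x.
L = np.array([0.1, 0.4, 0.8, 0.2])
Theta = range(len(L))

def pl(A):
    # Consonant plausibility of the paper: contour function equal to the
    # relative likelihood, pl(A) = max_{theta in A} L / max_{theta} L.
    return max(L[t] for t in A) / L.max()

def pl1(A):
    # Eq. (1): the posterior probability with respect to the uniform prior.
    return sum(L[t] for t in A) / L.sum()

def pl3(A):
    # Eq. (2): disjunctive combination of pl and pl1.
    return pl(A) + pl1(A) - pl(A) * pl1(A)

def P4(A):
    # Eq. (3): maximum rule; an upper probability, not always a plausibility.
    return max(pl(A), pl1(A))

for A in [{2}, {0, 3}, set(Theta)]:
    print(sorted(A), pl(A), pl1(A), pl3(A), P4(A))

Note that when pl^π is vacuous the pignistic probability π is uniform, so pl1^π reduces to pl1 and pl2^π to pl, and the procedure above returns P4, in agreement with the next paragraph.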
With this alternative rule, we obtain P4 when pl^π is vacuous, and Bayesian inference is recovered when the prior plausibility pl^π on Θ is Bayesian. We do not know how to achieve consistency with Bayesian inference in the case of rule pl3. In conclusion, in this comment I want to stress that even if there are important arguments in favour of pl, I do not think they are strong enough to rule out other plausibilities as representations of a likelihood function.

A point that is not fully clear to me is Remark 1, in which the difference with the Bayesian approach is discussed. It is said that if a prior probability π is available on Θ, then the posterior probability conditional on a basic belief assignment m_X on X is given by
$$f(\theta \mid m_X) = \pi(\theta) \sum_{A \subseteq X} \frac{P_X(A \mid \theta)}{P_X(A)}\, m_X(A). \qquad (5)$$
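To make this formula concrete, the following is a small sketch evaluating Eq. (5); the densities f(x; θ), the prior and the mass assignment m_X are hypothetical numbers of mine, not taken from the paper.

import numpy as np

# Hypothetical ingredients: rows of f are values of f(x; theta) for each
# theta, columns are the three points of X; m is a mass assignment on 2^X.
f = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3]])
prior = np.array([0.5, 0.5])                  # pi(theta)
m = {frozenset({0}): 0.6,
     frozenset({0, 1}): 0.3,
     frozenset({0, 1, 2}): 0.1}

def P_X_given_theta(A, t):
    return sum(f[t, x] for x in A)            # P_X(A | theta)

def P_X(A):
    # Marginal probability of A under the prior pi.
    return sum(prior[t] * P_X_given_theta(A, t) for t in range(len(prior)))

# Eq. (5): f(theta | m_X) = pi(theta) * sum_A [P_X(A|theta) / P_X(A)] m_X(A).
post = np.array([prior[t] * sum(P_X_given_theta(A, t) / P_X(A) * mA
                                for A, mA in m.items())
                 for t in range(len(prior))])
print(post, post.sum())   # sums to 1: Eq. (5) is already normalised

The sum is 1 because, for each A, the expectation of P_X(A | θ)/P_X(A) under π is exactly 1.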
My point is that this is a possible Bayesian solution, but there are alternative ones, and in my view the most natural one coincides with the solution provided in the paper. To me, conditioning on m_X is somewhat ambiguous, so I prefer to assume that we are conditioning on the fact that we have received a piece of evidence E which has defined m_X. Imagine that E depends on the true observation X ∈ X and on an unknown value ω in a set Ω. Assume that Ω and X are independent and that E is obtained with the following probability:
$$P(E \mid x, \omega) = \begin{cases} 1 & \text{if } x \in \Gamma(\omega), \\ 0 & \text{otherwise}, \end{cases} \qquad (6)$$
where Γ is the multivalued function giving rise to m_X. Note that, given E, each ω ∈ Ω defines a set in X, namely {x : P(E|x, ω) = 1}, which coincides with Γ(ω); so we can consider that E also defines the mass assignment m_X. Under these conditions, and assuming that E is independent of θ given x and ω, the conditional information on Θ given E is proportional to π(θ)P(E|θ), and it is easy to verify that
$$P(E \mid \theta) = \sum_{\omega, x} P(\{\omega\})\, P(E \mid \omega, \theta, x)\, f(x; \theta) = \sum_{\omega, x} P(\{\omega\})\, P(E \mid \omega, x)\, f(x; \theta) = \sum_{x} f(x; \theta)\, \mathrm{Pl}(x), \qquad (7)$$

i.e., the same expression as (33) in the paper.
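The identity in Eq. (7) is easy to check numerically under the evidence model of Eq. (6); in this sketch the multivalued mapping Γ, the source probabilities P({ω}) and the densities f(x; θ) are again hypothetical.

import numpy as np

# Hypothetical model: three sources omega with probabilities p_omega, each
# mapped by Gamma to a subset of X = {0, 1, 2}, and densities f(x; theta).
f = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3]])
p_omega = {0: 0.6, 1: 0.3, 2: 0.1}
Gamma = {0: {0}, 1: {0, 1}, 2: {0, 1, 2}}

# Middle expression of Eq. (7): P(E | omega, x) = 1 iff x is in Gamma(omega),
# as specified by Eq. (6).
middle = np.array([sum(p_omega[w] * f[t, x]
                       for w, A in Gamma.items() for x in A)
                   for t in range(f.shape[0])])

# Right-hand side: Pl(x) is the contour function, the total probability of
# the sources omega whose set Gamma(omega) contains x.
Pl = np.array([sum(p for w, p in p_omega.items() if x in Gamma[w])
               for x in range(f.shape[1])])
rhs = f @ Pl

print(middle, rhs)   # identical: P(E | theta) = sum_x f(x; theta) Pl(x)

Both computations give the same vector, since exchanging the sums over ω and x turns the middle expression of Eq. (7) into its right-hand side.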
The Bayesian analysis in Remark 1 can be obtained if we assume that we have four variables: one taking values in Θ, another taking values in 2^X (the set of subsets of X), a third one which is the evidence E, and a fourth one which is ω. It is also assumed that E is independent of θ given X and ω, that the value Γ(ω) is obtained with probability 1 given ω and E, and that, given A ⊆ X, the probability of θ is equal to π(θ)P_X(A|θ)/P_X(A). Under these conditions, the analysis in the paper provides the result of Eq. (42) in Denœux’s paper [1]. However, these assumptions are not quite natural: first, it has been necessary to transform X into 2^X. The conditional information on Θ given A has been computed with the original probabilistic information, but not all the original specifications have been taken into account. If in this model we compute the probability P(A|θ), what is obtained is proportional to m_X(A) Σ_{x∈A} f(x; θ), which is not the same as what would be expected in the original model.

References

[1] T. Denœux, Likelihood-based belief function: justification and some extensions to low-quality data, Int. J. Approx. Reason. (2014), http://dx.doi.org/10.1016/j.ijar.2013.06.007 (in this issue).
[2] A.R. Masegosa, S. Moral, Imprecise probability models for learning multinomial distributions from data. Applications to learning credal networks, Int. J. Approx. Reason. (2014), http://dx.doi.org/10.1016/j.ijar.2013.09.019 (in this issue).
[3] P. Walley, S. Moral, Upper probabilities based only on the likelihood function, J. R. Stat. Soc. B 61 (1999) 831–847.
[4] D. Dubois, H. Prade, A set-theoretic view of belief functions – logical operations and approximations by fuzzy sets, Int. J. Gen. Syst. 12 (1986) 191–226.