An interval estimator for the unmixing of mixtures with set-based source descriptions


International Journal of Approximate Reasoning 60 (2015) 71–92


Jan Verwaeren*, Michael Rademaker, Bernard De Baets

KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, Ghent, Belgium

Article history: Received 14 November 2014; Received in revised form 10 March 2015; Accepted 12 March 2015; Available online 16 March 2015

Keywords: Mixtures; Interval estimation; Mathematical optimization

Abstract. The unmixing of mixtures, i.e., the computation of the proportional contribution of a set of sources to a mixture, is of interest to a multitude of applied research disciplines. We consider an unmixing problem in which a mixture is represented by a point in R^n, and the sources are represented by subsets of R^n. We argue that the proportional contribution of a source of interest to that mixture can naturally be estimated by an interval estimator, as opposed to a more traditional point estimator. An interval estimator is proposed and it is shown that this estimator can be computed efficiently in a variety of cases by solving an optimization problem. © 2015 Elsevier Inc. All rights reserved.

1. Introduction

We start this introduction with an example of an unmixing problem. Assume you have been given a cup of water with a salinity of 21 ppt (parts per thousand). Moreover, you are told that the water in this cup is a blend of fresh water (salinity of 2 ppt) and sea water (salinity of 40 ppt). Subsequently, you are asked to estimate the proportional amount of fresh water (x1) in the cup. It may seem rather intuitive to give x1 = 50% as an answer to this problem. Indeed, when the proportional amount reflects the ratio of the number of fresh water molecules to the total number of molecules in the cup, this answer is valid. We now make this reasoning more formal. In the example above, the mixture, the fresh water and the sea water are represented by their salinities, i.e., the representation space is [0, 1000]. Let us denote the representations of the mixture, the fresh water and the sea water, respectively, as y, c1 and c2. To obtain the answer x1 = 50%, we can solve the following system of linear equations:



    y = x1 c1 + x2 c2 ,
    1 = x1 + x2 .

Naturally, the solution of this system should satisfy x1 , x2 ≥ 0. Even though this description may seem rather trivial, it expresses the assumption that we are working with linear mixtures, and is generally referred to as the linear mixing model (LMM) assumption (see for instance [1]). Loosely speaking, this assumption implies a linear relationship between the fractional abundance of the fresh water (x1 ) in the mixture and the representation of this mixture (the salinity in this case). The first equation in the system above expresses this assumption mathematically. The second equation states that x1 and

* Corresponding author. Tel.: +32 92645931; fax: +32 92646220. E-mail addresses: [email protected] (J. Verwaeren), [email protected] (M. Rademaker), [email protected] (B. De Baets).

http://dx.doi.org/10.1016/j.ijar.2015.03.004 0888-613X/© 2015 Elsevier Inc. All rights reserved.


x2 should be proportions and are the only constituents of the mixture. Correspondingly, they should add up to one. In this example, the LMM assumption trivially holds. However, this is not necessarily the case in general [1,2]. We will generally refer to the fresh water and the sea water as sources, and to the vectors c1 and c2 as prototype vectors of these sources; y is called the mixture vector. In this example, where the representations of the mixture and the sources are scalars, one can readily obtain an estimate for x1 and x2. However, in most applications, the information about the sources is less explicit. For instance, we might only know that c1 ∈ C1 = [1, 3] and c2 ∈ C2 = [30, 45]. In this setting, any solution (x1, x2, c1, c2) to the system

    y = x1 c1 + x2 c2 ,
    1 = x1 + x2 ,
    x1, x2 ≥ 0 ,
    c1 ∈ C1 ,
    c2 ∈ C2 ,

can be considered as a possible estimate for x1 and x2. This means that the information on the sources lacks the required precision to allow a unique x1 and x2 to be extracted.

We now briefly elaborate on the origin of this imprecision. Firstly, we state what it is not: the imprecision is not due to some measurement error, neither does it result from an underlying stochastic process. Instead, the imprecision is inherent to the way in which the source is defined. Here, the sources are stated to be fresh and sea water. Naturally, there exists a variety of fresh and sea water sources. For example, the salinity of water taken from the Mediterranean Sea differs from the salinity of water taken from the North Sea. Therefore, a precise answer to the question raised before cannot be given. Instead, we can provide a set of values (proportions) for x1 that can explain our observation. More precisely, for each proportion in this set, there exists at least one couple of prototype vectors c1 ∈ [1, 3] and c2 ∈ [30, 45] that could have been used to create a mixture such that y = x1 c1 + (1 − x1) c2. In this example, the set of possible proportions can easily be computed. However, the problem becomes more challenging as the number of sources increases, and when the sources or the mixture are represented in a high(er)-dimensional space, i.e., y ∈ R^q with q > 1 and C1, ..., Cn (generic) subsets of R^q. The main result of this paper is the formulation of an optimization problem that can be used to compute the set of possible mixing proportions in a very general setting.

The remainder of this paper is organized as follows. In Section 2, we give a formal problem description. In Section 3, we review related work and discuss application areas. In Section 4, we prove that, under rather general conditions, the set of possible proportions is an interval and provide a characterization of the optimization problem that can be used to compute this interval. In Section 5, an equivalent optimization problem is proposed that can be used to compute the set of possible proportions more efficiently. In Section 6, we illustrate our approach. We conclude in Section 7.

2. Formal problem description

Let the prototype vectors c1, ..., cn ∈ R^q be the representations of n sources in a feature space (throughout this paper, all vectors are column vectors). Moreover, let the mixture vector y ∈ R^q be the representation of a mixture that is formed by mixing (or blending) these n sources. The linear mixing model (LMM) assumes the following relationship between the representations of the sources and the representation of the mixture:

    y = ∑_{i=1}^n xi ci ,   (1)

where xi ∈ [0, 1] is the proportional contribution of the ith source to the mixture, and ∑_{i=1}^n xi = 1. Note that (1) is a vector equality. When the prototype vectors are given (and linearly independent), deriving the vector of mixing proportions x = (x1, ..., xn) is trivial. In practice, however, prototype vectors are rarely known explicitly. Instead, we often only have a rough description of these prototype vectors. In some cases, probability density functions can be used to model this lack of knowledge (either assumed to be given or estimated from data), and Bayesian modeling approaches are used to derive estimates for xi, or credibility intervals if needed. However, often the amount of information or data that is available is insufficient to specify (or accurately estimate) the required probability density functions. In such cases, one must resort to a less expressive way of describing the sources. One option consists of representing a source by a set, i.e., the ith source is represented by the set Ci ⊆ R^q, representing the knowledge that the ith prototype vector ci ∈ Ci. Now, let C1, ..., Cn ⊆ R^q and y ∈ R^q be given. The feasible set X(y, (Ci)_{i=1}^n) is defined as

    X(y, (Ci)_{i=1}^n) = { x ∈ S^n | (∃ (ci)_{i=1}^n ∈ ∏_{i=1}^n Ci) ( y = ∑_{i=1}^n xi ci ) } ,   (2)


where S^n = {x ∈ [0, 1]^n | ∑_{i=1}^n xi = 1} is the n-dimensional simplex and ∏_{i=1}^n Ci is the Cartesian product of the sets C1, ..., Cn. This means that each element in X(y, (Ci)_{i=1}^n) is a vector of mixing proportions that respects our knowledge about the sources and the LMM. Moreover, we define the following projection of X(y, (Ci)_{i=1}^n):





    Xk(y, (Ci)_{i=1}^n) = { z ∈ [0, 1] | (∃ x ∈ X(y, (Ci)_{i=1}^n)) (xk = z) } .

The set Xk(y, (Ci)_{i=1}^n) can be interpreted as the set of all possible values for the proportional contribution of the kth source to the mixture represented by y. As such, any element from Xk(y, (Ci)_{i=1}^n) can be used as a point estimate for xk.
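To make this concrete, consider again the fresh/sea water example from the introduction. The following brute-force sketch (ours, not part of the original paper, and a naive stand-in for the optimization approach developed in Section 5) approximates X1 for that example by scanning a grid of candidate prototype pairs:

    import numpy as np

    # Set-based source descriptions from the introductory example (q = 1).
    C1 = (1.0, 3.0)     # fresh water salinity range (ppt)
    C2 = (30.0, 45.0)   # sea water salinity range (ppt)
    y = 21.0            # observed salinity of the mixture (ppt)

    # For a candidate pair (c1, c2), the LMM y = x1*c1 + (1 - x1)*c2 gives
    # x1 = (c2 - y)/(c2 - c1); the pair is feasible when x1 lies in [0, 1].
    x1_values = []
    for c1 in np.linspace(*C1, 301):
        for c2 in np.linspace(*C2, 301):
            x1 = (c2 - y) / (c2 - c1)
            if 0.0 <= x1 <= 1.0:
                x1_values.append(x1)

    print(min(x1_values), max(x1_values))   # roughly 0.310 and 0.571

Every value in between is attainable as well, in line with the interval property established in Section 4.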

Without making further assumptions, one cannot select a single best element out of Xk(y, (Ci)_{i=1}^n). Because of that, we propose Xk as a set estimator for xk (note that the arguments of Xk are dropped here for notational convenience). Fortunately, as we will show later, under quite general conditions we have that Xk = [inf(Xk), sup(Xk)]. Consequently, in these cases we will refer to our estimator as an interval estimator; it can be considered as a natural extension of the more familiar point estimator of xk. Here, the computation of this interval can be performed by solving two optimization problems. Interestingly, it will turn out that these problems can be solved globally in an efficient manner.

We conclude this section by presenting three properties of X(y, (Ci)_{i=1}^n) that will be used later in this manuscript. These properties trivially follow from the definition of X(y, (Ci)_{i=1}^n):

(i) If Ci ⊆ Ci', for i = 1, ..., n, then it holds that X(y, (Ci)_{i=1}^n) ⊆ X(y, (Ci')_{i=1}^n).
(ii) Let conv(⋃_{i=1}^n Ci) be the convex hull of ⋃_{i=1}^n Ci. If y ∉ conv(⋃_{i=1}^n Ci), then it holds that X(y, (Ci)_{i=1}^n) = ∅.
(iii) If y ∈ Ck, then it holds that sup(Xk(y, (Ci)_{i=1}^n)) = 1.

3. Related work and application areas

3.1. Statistical estimation approaches

We briefly describe two popular statistical approaches for estimating the mixing coefficients in the LMM. First, we consider a constrained least squares estimation approach, which is very popular in several applied domains (for instance, oil mixtures in the food industry [3], remote sensing [4] and medicine [5]). Generally, least squares methodologies assume that an observed mixture vector ỹ can be written as a linear combination of n prototype vectors plus an additional error term, leading to the following model:

    ỹ = ∑_{i=1}^n xi ci + ε ,   (3)

which differs from the LMM (1) by a q × 1 error vector ε. When y and c1, ..., cn are given, the constrained least squares estimator x̂ is the optimal point of the following optimization problem:



    minimize_{x ∈ R^n}   ‖y − ∑_{i=1}^n xi ci‖_2^2
    subject to           ∑_{i=1}^n xi = 1 ,
                         x1, ..., xn ≥ 0 .

This formulation mainly applies to situations where q ≫ n, such that the solution of the optimization problem is uniquely defined. Three main differences exist between our set estimator described in the introduction and the setting to which the constrained least squares estimator applies. Firstly, the least squares approach requires a single-point description of the sources; as such, our approach can be seen as more generally applicable. Secondly and thirdly, the least squares approach is generally designed for high-dimensional and noisy data, whereas our set estimator mainly applies to situations where q < n.

Bayesian modeling methods have proved to be very effective, mainly for the analysis and deconvolution of spectral data (see for instance [6] for Bayesian modeling with nuclear magnetic resonance (NMR) data, or [7] for an application in the setting of Raman spectroscopy). In Astle et al. [6], a Bayesian method is proposed for predicting the abundance of metabolites in a complex biological mixture based on the H-NMR spectrum of that mixture. Here as well, the authors use the LMM (3). To define priors over the sources, a library containing H-NMR profiles of frequently occurring metabolites is used. Moreover, their method takes into account that some metabolites may not be present in the library. Estimates (or credibility intervals) for the abundance of the metabolites of interest are obtained by likelihood maximization. The work of Astle et al. [6] is related to the interval estimates that were proposed in the introduction, as the data-driven construction of priors in [6] provides an imprecise, probabilistic description of the sources (the metabolites). In the same spirit, the sets C1, ..., Cn provide a set-based alternative for the imprecise description of sources.
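As an aside, the constrained least squares estimator described above is easy to reproduce with generic tools. The sketch below (ours; the toy prototypes are hypothetical) uses SciPy's SLSQP solver to handle the simplex constraints:

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical data: n = 2 known prototypes in a q = 4 feature space.
    C = np.array([[2.0, 0.5, 1.0, 0.0],     # prototype c1
                  [40.0, 3.0, 0.2, 1.5]])   # prototype c2
    y = 0.5 * C[0] + 0.5 * C[1] + 0.01 * np.random.randn(4)  # noisy mixture, cf. (3)

    # minimize ||y - sum_i xi ci||_2^2  subject to  sum_i xi = 1, xi >= 0
    result = minimize(lambda x: np.sum((y - x @ C) ** 2),
                      x0=np.full(2, 0.5), method="SLSQP",
                      bounds=[(0.0, 1.0)] * 2,
                      constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])
    print(result.x)   # close to (0.5, 0.5)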


Fig. 1. Comparison of three estimators: the least squares point estimate x̂_k^LS, the Bayesian (estimated) posterior distribution (dotted line), and the set estimate [inf(X̂k), sup(X̂k)].

3.2. Point estimators, probability estimators or set estimators

The constrained least squares estimator and the Bayesian modeling approach can be related to our set estimator. We argue that these estimators mainly differ in the assumptions that are made. As such, these methods can complement each other, rather than compete. Firstly, the constrained least squares estimator assumes that the sources are known exactly, and that the observed mixture vector ỹ is a noisy observation of the true mixture vector y, as expressed by (3). In this perspective, the constrained least squares method provides a point estimate of the mixing coefficients. The uncertainty on the estimate x̂ only results from the noise in the observation. If needed, by making several assumptions about the noise term, this uncertainty can be translated into a confidence interval. Secondly, Bayesian modeling methods can be seen as a conceptual generalization of the constrained least squares estimator, as they also allow uncertainty on the sources to be included (in addition to the noise term). Bayesian modeling methods express this uncertainty by defining priors on the sources. When, as in the motivating example or in [6], data are available about these sources, these data can be used to define a data-driven prior. The estimated posterior distribution on x, resulting from this modeling method, can be used to assess the uncertainty on the mixing coefficients in a probabilistic manner (if needed, by using credibility intervals). However, the adequacy of this assessment strongly depends on the appropriateness of the priors that are used. Thirdly, our set estimator requires that the uncertainty about the sources is described in an epistemic manner (using sets) rather than a probabilistic one. The set estimate that is obtained has a clear possibilistic interpretation: it represents the range of values for xi that can explain our observation, i.e., possible values, in contrast to the Bayesian approach, which gives an interval of likely values, i.e., probable values for the mixing coefficient. As mentioned above, the successful application of the Bayesian modeling method requires that the priors over the sources are specified correctly. This contrasts with our set-based approach, which requires that the sets C1, ..., Cn corresponding to these sources can be specified correctly. Fig. 1 illustrates the connection between the three estimators.

3.3. A set-based representation of sources

To be able to use the set estimator Xk(y, (Ci)_{i=1}^n), its arguments need to be specified. In a practical setting, it can be assumed that y is given. However, the set-based representation (Ci)_{i=1}^n of the sources may not be directly available in most

applications. Fortunately, there exist several ways to obtain estimates (Ĉi)_{i=1}^n of (Ci)_{i=1}^n indirectly. In some of these applications, data are available that capture the (natural) variability in these sources. In these cases, the convex hull of the data points can be used to obtain inner approximations of (Ci)_{i=1}^n, as illustrated in Fig. 2. To motivate this approach, let us return to the introductory problem setting. Assume that fresh water and sea water are characterized by q components. Moreover, let two matrices C1 ∈ R^{m1×q} and C2 ∈ R^{m2×q} be given in which the rows represent observations of fresh water (C1) and sea water (C2). We refer to these observations as observed prototypes. Fig. 2 shows a scatter plot of these datasets (q = 2) as well as their convex hulls. Now let a and b be two elements of C1. Recall that a and b are representations of two types of fresh water (i.e., two prototype vectors). These prototypes of fresh water can be mixed to obtain a third prototype by choosing θ ∈ [0, 1] and defining d = θ a + (1 − θ) b (we typically say that d is a convex combination of a and b). Naturally, a blend of two types of fresh water results in a mixture that solely consists of fresh water. Therefore, d represents fresh water as well. Interestingly, the convex hull conv(C1) of the observed prototype vectors in C1 contains all vectors that can be constructed using this strategy. Therefore, it is a natural set to consider, as long as linear mixtures of prototypes are considered to characterize the sets capturing the variability of a source. The reasoning above also suggests that (Ci)_{i=1}^n are convex sets that can be approximated naturally by convex sets. Let the matrix Ci ∈ R^{mi×q} be such that each row represents an observed prototype of the ith source. Assuming a noise-free environment, we know that conv(Ci) ⊆ Ci. Therefore, we say that conv(Ci) is an inner approximation of Ci. It follows that Xk(y, (conv(Ci))_{i=1}^n) ⊆ Xk(y, (Ci)_{i=1}^n).
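In practice, the inner approximation by convex hulls only requires the hull vertices, since these are what the linear programs of Section 5.4 consume. A minimal sketch (ours, with hypothetical data):

    import numpy as np
    from scipy.spatial import ConvexHull

    # Hypothetical observed prototypes of one source: m = 50 points in R^2.
    rng = np.random.default_rng(0)
    observed = rng.uniform(low=[0.5, 0.0], high=[0.9, 0.35], size=(50, 2))

    # conv(observed) is an inner approximation of the (unknown) set C1;
    # only its vertices are needed later on.
    hull = ConvexHull(observed)
    vertices = observed[hull.vertices]
    print(len(vertices), "vertices describe the inner approximation")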



Fig. 2. Scatter plots of two artificial datasets C1 (left) and C2 (right) as well as their convex hulls. The solid dots represent observed points; the red dot is a convex combination of a and b. The red line segment groups all points that can be written as convex combinations of a and b. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Some applications¹ might require that the approximations (Ĉi)_{i=1}^n of (Ci)_{i=1}^n are chosen such that Xk(y, (Ĉi)_{i=1}^n) ⊇ Xk(y, (Ci)_{i=1}^n). In those settings, we could try to find outer approximations of (Ci)_{i=1}^n. Unfortunately, finding such outer approximations is often hard. Nevertheless, the data in Ci can be used to construct, for instance, minimum spanning ellipsoids [8] or multifocal ellipses [9]. Even though these approximations are not guaranteed to be outer approximations, they will lead to intervals that extend the estimates that are obtained when approximating Ci by the convex hull of Ci.

¹ For example, consider a setting in which we want to assess whether a mixing proportion exceeds a given threshold θ ∈ [0, 1]. There are three options: (i) θ < inf(Xk(y, (Ci)_{i=1}^n)); (ii) θ ∈ Xk(y, (Ci)_{i=1}^n); and (iii) θ > sup(Xk(y, (Ci)_{i=1}^n)). In case (i) the threshold is exceeded, in case (ii) we are inconclusive, and in case (iii) the threshold is certainly not exceeded. When we use inner approximations Ĉi of Ci, we might have that Xk(y, (Ĉi)_{i=1}^n) ⊆ Xk(y, (Ci)_{i=1}^n) and, in some cases, wrongly conclude that a threshold is exceeded (which may not be desirable).

Unmixing problems appear in several research areas. For example, in remote sensing, unmixing approaches are used to estimate the relative amounts of land-use classes in mixed pixels [1,4,10], using spectral information to describe the sources and the mixture. In the field of remote sensing, an approach was presented [11] that is related to our set estimator; however, that work is limited to sources that can be described by convex polytopes. In food sciences, unmixing approaches are often applied to detect fraudulent adulteration of vegetable oils [3]; the sources and the mixture can be represented by their triacylglyceride composition. In molecular biology, the LMM is used by [6] to estimate the relative abundance of several metabolites in complex biological mixtures; there, the mixture and the sources are typically described by nuclear magnetic resonance spectra. In principle, all of the aforementioned applications allow set-based unmixing approaches to be used, especially those in which the sources show (natural) variability that prevents a representation by a single point, or in which the knowledge on the variability of the sources is too limited to be represented by a probability distribution.

From a methodological point of view, the set estimator can be seen as the result of reasoning with set-valued information. In view of the position paper of Couso and Dubois [12], the sets C1, ..., Cn can be interpreted as ontic sets, e.g., a set can be the union of the salinities of different oceans. From a mathematical point of view, the set estimator borrows several ideas from interval calculus [13]. Moreover, the set-based representation of the sources can, for a given mixture, be interpreted as a way to describe the uncertainty associated with the sources. This uncertainty is propagated to the prediction of the composition of the mixture, illustrating the relationship with (mainly possibilistic) uncertainty propagation (see for instance [14–17]). Interestingly, many methodologies for uncertainty propagation lead to numerical optimization problems. This is also the approach that is taken here. However, as opposed to most papers (which often rely on nonlinear programming problems that are hard to solve), we provide a procedure that is guaranteed to find the global optimum of the problems that are encountered.

4. A characterization of the interval estimator

In the previous sections, a set estimator was introduced and related to existing unmixing approaches. In this section, we will show that, under fairly general conditions, our set estimator can be computed efficiently. We start with a characterization of Xk. In what follows, we will assume that C1, ..., Cn are compact and convex subsets of R^q. Moreover, given a set A ⊂ R^q, the convex hull of A is denoted conv(A). We now present several propositions that can be used to characterize Xk; their proofs are given in Appendix A.

Proposition 1. Let y ∈ conv(⋃_{i=1}^n Ci). It holds that

    X1(y, (Ci)_{i=1}^n) = X1(y, (C1, conv(⋃_{i=2}^n Ci))) .

As a result of this proposition, any unmixing problem that involves multiple sources can be translated into an equivalent unmixing problem that involves two sources only, leading to the same set estimator. Therefore, we can limit our analysis



to settings that involve only two sources. Let us now define the following optimization problem (we refer to this problem as M1 ):

    M1(y, C1, C2):  maximize_{x1, x2 ∈ R, c1, c2 ∈ R^q}   x1
                    subject to   y = x1 c1 + x2 c2 ,
                                 1 = x1 + x2 ,
                                 0 ≤ x1, x2 ,
                                 c1 ∈ C1 ,  c2 ∈ C2 .

We call y, C1 and C2 the data (or the inputs) of this optimization problem. It is clear that the optimal value of M1(y, C1, C2) is equal to sup(X1(y, C1, C2)). Now consider the following definitions.

Definition 1. A scalar x1 is called M1(y, C1, C2)-feasible (or, shorthand, M1-feasible) if there exists a point (x1, x2, c1, c2) ∈ R × R × R^q × R^q that is a feasible point of M1(y, C1, C2).

Definition 2. A couple (c1, c2) ∈ R^q × R^q is called M1(y, C1, C2)-feasible (or, shorthand, M1-feasible) if there exists a point (x1, x2, c1, c2) ∈ R × R × R^q × R^q that is a feasible point of M1(y, C1, C2).

These definitions are used in the following propositions.

Proposition 2. For every M1(y, C1, C2)-feasible couple (c1, c2) ∈ R^q × R^q such that c1 ≠ c2, there exists exactly one couple (x1, x2) ∈ [0, 1]^2 such that (x1, x2, c1, c2) is a feasible point of M1. Moreover, for this couple we have

    x1 = ‖c2 − y‖_2 / (‖c1 − y‖_2 + ‖c2 − y‖_2) .

Proof. This is a trivial consequence of the LMM: from y = x1 c1 + (1 − x1) c2 we get y − c2 = x1 (c1 − c2), so ‖c2 − y‖_2 = x1 ‖c1 − c2‖_2 and, similarly, ‖c1 − y‖_2 = (1 − x1) ‖c1 − c2‖_2; adding both identities and dividing yields the stated expression. □

Remark 1. For an M1(y, C1, C2)-feasible couple (c1, c2) ∈ R^q × R^q such that c1 = c2, it trivially holds that y = c1 = c2. This can only occur when y ∈ C1 ∩ C2, and it then trivially holds that X1(y, C1, C2) = [0, 1]. Moreover, this case can easily be detected in practice by solving two optimization problems, one to check whether y ∈ C1 and one to check whether y ∈ C2 (see for instance [18]).
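The membership tests of Remark 1 are plain convex feasibility problems. For a source described by the convex hull of observed prototypes, a sketch (ours; the paper itself only refers to [18] for this step):

    import numpy as np
    from scipy.optimize import linprog

    def in_convex_hull(y, P):
        """Test y in conv(rows of P) by LP feasibility: find lam >= 0
        with sum(lam) = 1 and P.T @ lam = y."""
        m = P.shape[0]
        A_eq = np.vstack([P.T, np.ones((1, m))])
        b_eq = np.append(y, 1.0)
        res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0.0, None)] * m)
        return res.status == 0   # status 0: a feasible point was found

    # The introductory mixture is not pure sea water:
    print(in_convex_hull(np.array([21.0]), np.array([[30.0], [45.0]])))  # False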

Proposition 3. If two scalars x̄1 and ẋ1 are M1(y, C1, C2)-feasible, then any scalar z = α x̄1 + (1 − α) ẋ1, with α ∈ [0, 1], is M1(y, C1, C2)-feasible as well.

Corollary 1. The set Xk(y, (Ci)_{i=1}^n) is an interval.

5. Computing the set of proportional contributions

In this section, we turn our attention to solving M1. It should be noted that solving M1 directly (for instance with a general-purpose nonlinear solver) is potentially hard, as the constraint y = x1 c1 + x2 c2 is a bilinear vector equality constraint. The presence of this type of constraint complicates the search for a solution [19]. In the previous section, we have proved that Xk is an interval. This property implies that, given an M1-feasible scalar, a bisection approach can be used to find inf(Xk) and sup(Xk). In this section, we derive a more in-depth characterization of the solution of M1 that will allow us to gain further insight into the problem and derive a more general optimization procedure. It should be stated that the main results are rather intuitive and can easily be understood using the basic geometry of the problem. Therefore, the general intuition behind the solution strategy is presented first, followed by the technical details.

5.1. Computing Xk when q = 1

As a starting point, we consider the case q = 1, which implies that C1 = [b1^l, b1^u] and C2 = [b2^l, b2^u] are intervals. An example of this case is presented in Fig. 3. We now focus on the computation of sup(Xk). The geometry behind the computation of sup(Xk) is rather simple. Recall from Proposition 2 that, for a given M1(y, C1, C2)-feasible couple (c1, c2), the proportional contribution x1 of c1 is

    x1 = ‖c2 − y‖_2 / (‖c1 − y‖_2 + ‖c2 − y‖_2) = (1 + ‖c1 − y‖_2 / ‖c2 − y‖_2)^(−1) .


Fig. 3. Visualization of the search space when q = 1. We use the notation b1^l = inf(C1) (resp. b2^l = inf(C2)) and b1^u = sup(C1) (resp. b2^u = sup(C2)).

Geometrically, the couple (c1, c2) for which x1 is maximal is the couple for which c2 is as far away from y as possible and c1 is as close to y as possible. In the example in Fig. 3, this amounts to setting c1 = b1^u and c2 = b2^u; as a result, we obtain that sup(X1) = (b2^u − y)/(b2^u − b1^u). This closed-form solution is applicable to any case in which b1^u < y ≤ b2^u (this is formally shown (and generalized) below). Later in this section, we will see that this simple case serves as a building block for computing sup(Xk) in settings where the dimensionality q is greater than one. The following proposition provides a characterization of M1 in the case q = 1 (the proof is given in Appendix B).

Proposition 4. Let C1 = [b1^l, b1^u], C2 = [b2^l, b2^u] and y ∈ conv(C1 ∪ C2) \ C1. Let (x1*, x2*, c1*, c2*) be an optimal point of M1(y, C1, C2). It holds that









    (x1*, x2*, c1*, c2*) =
      ( (y − b2^u)/(b1^u − b2^u) , 1 − (y − b2^u)/(b1^u − b2^u) , b1^u , b2^u ) ,   if b1^u < y < b2^u ,   (a)
      ( 0 , 1 , · , y ) ,                                                           if b1^u < y = b2^u ,   (b)
      ( (y − b2^l)/(b1^l − b2^l) , 1 − (y − b2^l)/(b1^l − b2^l) , b1^l , b2^l ) ,   if b2^l < y < b1^l ,   (c)
      ( 0 , 1 , · , y ) ,                                                           if b2^l = y < b1^l ,   (d)

where the dot indicates that there are multiple optimal points. Moreover, there are no (other) locally optimal points.
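Proposition 4 translates directly into a small routine. The sketch below (ours) returns sup(X1) for interval sources under the assumptions of the proposition, i.e., y ∈ conv(C1 ∪ C2) \ C1; inf(X1) follows by symmetry as 1 − sup(X2), with the roles of the two sources swapped:

    def sup_x1_q1(y, C1, C2):
        """sup(X_1) for q = 1, C1 = [b1l, b1u], C2 = [b2l, b2u],
        assuming y in conv(C1 u C2) but not in C1 (Proposition 4)."""
        b1l, b1u = C1
        b2l, b2u = C2
        if b1u < y <= b2u:    # cases (a) and (b): y lies above C1
            return (y - b2u) / (b1u - b2u)
        if b2l <= y < b1l:    # cases (c) and (d): y lies below C1
            return (y - b2l) / (b1l - b2l)
        raise ValueError("y violates the assumptions of Proposition 4")

    # Introductory example: sup(X_1) = (21 - 45)/(3 - 45) = 4/7.
    print(sup_x1_q1(21.0, (1.0, 3.0), (30.0, 45.0)))          # 0.5714...
    print(1.0 - sup_x1_q1(21.0, (30.0, 45.0), (1.0, 3.0)))    # inf(X_1) = 0.3103...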

5.2. Computing Xk when q > 1

The results that were obtained above can now be used to characterize the solution of M1 when q > 1. To gain insight into this general case, we will often resort to the geometrical interpretation of the problem for q = 2. Nevertheless, the results obtained here are not limited to q = 2. As a running example, we will use the case presented in Fig. 4(a). Moreover, we assume that y ∈ conv(C1 ∪ C2) \ C1. Recall the discussion in Section 5.1, where it was stated that, for a given M1(y, C1, C2)-feasible couple (c1, c2), the proportional contribution x1 of c1 is

    x1 = ‖c2 − y‖_2 / (‖c1 − y‖_2 + ‖c2 − y‖_2) .

Geometrically, the couple (c1, c2) for which x1 is maximal is the couple for which c2 is as far away from y as possible and c1 is as close to y as possible. Naturally, for (c1, c2) to be feasible when q > 1, c1 and c2 should be collinear with y. This complicates the analysis of the problem. Nevertheless, this geometrical interpretation leads to insights into the problem. We start our analysis by formally introducing notation for a line segment [a, b] and a line ℓ(a, b); for any two distinct vectors a, b ∈ R^q, we will denote

    [a, b] = {v ∈ R^q | (∃ t ∈ [0, 1]) (v = t a + (1 − t) b)} ,
    ℓ(a, b) = {v ∈ R^q | (∃ t ∈ R) (v = a + t (a − b))} .

Note that, when fixing c1 ∈ C1 and c2 ∈ C2 such that y ∈ [c1, c2], the solution of M1 directly follows from Proposition 4. Moreover, we define several sets that will be used in the reasoning hereafter:

    C1' = {c1 ∈ C1 | (∃ c2 ∈ C2) (∃ t ∈ [0, 1]) (y = t c1 + (1 − t) c2)} ,
    C2' = {c2 ∈ C2 | (∃ c1 ∈ C1) (∃ t ∈ [0, 1]) (y = t c1 + (1 − t) c2)} .

For an illustration, see Fig. 4(b). Note that C1' is a subset of C1 that is obtained by removing all elements from C1 that can never occur in feasible points (due to the bilinear equality constraint). Consequently, replacing C1 by C1' in the optimization problem will not reduce the feasible space. Next, we define the convex sets C̄1 and C̄2:

    C̄1 = {c ∈ R^q | (∃ c1 ∈ C1') (∃ t ∈ [1, ∞[) (c − y = t (c1 − y))} ,
    C̄2 = {c ∈ R^q | (∃ c2 ∈ C2') (∃ t ∈ [1, ∞[) (c − y = t (c2 − y))} .


Fig. 4. (a) An example of the sets C1, C2 and a mixture vector y for q = 2. (b) Illustration of the sets C1' and C2'. (c) Illustration of the sets C̄1 and C̄2. (d) Illustration of the set C1*. (e) Illustration of the set G_e. (f) Illustration of the upper level set Ḡ_e of f6 (at level e) and the optimal source vector c2*.

For an illustration, see Fig. 4(c). Finally, we define the set C1*:

    C1* = {c ∈ C1' | ¬(∃ c' ∈ C1') (∃ t ∈ ]0, 1]) (c' = t y + (1 − t) c)} .

From this definition, it is clear that C1* is a subset of the boundary of C1'. For an illustration, see Fig. 4(d). When we fix ċ2 ∈ C2', the couple (c1, ċ2) is an M1-feasible couple if and only if c1 ∈ ℓ(y, ċ2) ∩ C1, which is a line segment. Moreover, from Proposition 4, it follows that on this line segment the point c1 that maximizes the objective function can be found at the endpoint that is closest to y. As such, if (x1*, c1*, c2*) is locally optimal, then c1* will be located at the boundary of C1. More precisely, we have that c1* ∈ C1*. Combining the reasoning above with Proposition 2 allows us to reformulate the original optimization problem in the following way:


    M2:  maximize_{x1 ∈ R, c1, c2 ∈ R^q}   ‖c2 − y‖_2 / (‖c1 − y‖_2 + ‖c2 − y‖_2)
         subject to   y = x1 c1 + (1 − x1) c2 ,
                      0 ≤ x1 ,  x1 ≤ 1 ,
                      c1 ∈ C1* ,  c2 ∈ C2' .

Optimization problem M2 can be simplified further. Indeed, for a fixed vector c2 ∈ C̄2, there exists only one c1 ∈ C1* such that the bilinear equality constraint of M2 holds. As such, c1 can be written as a (vector) function of c2. This vector function is denoted²

    g : C̄2 → C1* : c2 ↦ ℓ(y, c2) ∩ C1* .

² Strictly speaking, the function that is presented here maps a vector to a set. However, this set is always a singleton, and the unique element in this set is a q-dimensional vector. As a result, we identify this mapping with a vector function that maps an input to a q-dimensional vector.

As the definition of g ensures that the bilinear constraints are always satisfied, these constraints can be left out of the optimization problem. Additionally, as the objective function of M2 is automatically constrained to the unit interval, the bounds on x1 can be dropped as well (in fact, x1 can be eliminated from the optimization problem). As such, we arrive at the following reformulation:

    M6:  maximize_{c2 ∈ R^q}   ‖c2 − y‖_2 / (‖g(c2) − y‖_2 + ‖c2 − y‖_2)
         subject to   c2 ∈ C2' .

As we will be using the objective function of M6 several times in the remainder of this section, we denote this function as f6. We now show that the set of contour vectors of the objective function of M6, i.e., the vectors c2 for which f6(c2) = e (where e ∈ [0, 1] is a constant), equals

    G_e = {c2 ∈ R^q | (∃ c1 ∈ C1*) (c2 − y = (e/(e − 1)) (c1 − y))} .

For an illustration, see Fig. 4(e). In the remainder of this subsection, we will assume that y = 0_q (which can always be obtained by a translation). Given c2 ∈ C2', let us choose d such that g(c2) = d c2 (it is easy to show that such a d < 0 exists and is unique). Now let us choose c1 = g(c2). Setting t = 1/d, we obtain t c1 = c2. This means that (c1, c2) = (c1, t c1) = (g(c2), c2). We then have

    f6(c2) = ‖c2‖_2 / (‖g(c2)‖_2 + ‖c2‖_2) = (−t ‖c1‖_2) / (‖c1‖_2 − t ‖c1‖_2) = −t / (1 − t) .   (4)

This equality shows that, for a fixed t, any c2 for which there exists a c1 ∈ C1* such that c2 = t c1 satisfies f6(c2) = −t/(1 − t). To obtain the contour f6(c2) = e, we simply choose t = e/(e − 1). This shows that every vector in G_e has the same objective function value. For each c2 ∈ C̄2, there exists a unique d > 0 and g_e ∈ G_e such that c2 = d g_e. Moreover, for each g_e ∈ G_e, there exists a unique c1* ∈ C1* such that g_e = (e/(e − 1)) c1*. We have that g(g_e) = g(c2) = c1*, leading to:

    f6(c2) = ‖c2‖_2 / (‖g(c2)‖_2 + ‖c2‖_2) = d ‖g_e‖_2 / (‖c1*‖_2 + d ‖g_e‖_2) = (−d e/(e − 1)) / (1 − d e/(e − 1)) .   (5)

From (5) we can conclude that, for any c2 = d g_e (where d > 0 and g_e ∈ G_e), the following holds:

– If d ∈ [0, 1[, then f6(c2) < e.
– If d = 1, then f6(c2) = e.
– If d ∈ ]1, ∞[, then f6(c2) > e.

These implications show that if c2 ∈ C2 \ G_e, then f6(c2) ≠ e, so G_e contains all contour vectors, and no others. Let us now define the following set:

    Ḡ_e = {c ∈ C̄2 | (∃ c2 ∈ C2) (∃ t ∈ [1, ∞[) (f6(c2) = e ∧ c = t c2)} .


For an illustration, see Fig. 4(f). The implications given above show that for any c2 ∈ Ḡ_e, we have that f6(c2) ≥ e, i.e., Ḡ_e is the upper level set of f6. Moreover, the following equivalence holds:



    c2 ∈ Ḡ_e ⟺ (∃ c1 ∈ C̄1) (c2 = (e/(e − 1)) c1) .   (6)

From this equivalence, we have that Ḡ_e can be seen as a point reflection of C̄1 through the origin followed by an isotropic scaling. As each of these operations preserves the convexity of a set, it can be concluded that Ḡ_e is a convex set. As such, C2 ∩ Ḡ_e, which is the upper level set of the objective function on the feasible domain, is a convex set. This means that optimization problem M6 is a quasi-concave maximization problem (see for instance [18]), having (except for some degenerate cases) one local (and thus global) optimal point.³ The optimal value is attained by increasing e up to the point where C2 ∩ Ḡ_e is a singleton. For this case, we have that e = sup(X1). Moreover, when the optimal point of M1 is denoted as (x1*, x2*, c1*, c2*), we have that {c2*} = Ḡ_{x1*} ∩ C2 (see Fig. 4(f) for an illustration).

³ It is well known that quasi-convex optimization problems can have locally optimal points, i.e., points at which the null vector is a (sub)gradient of the objective function, that are not globally optimal. However, from (5) and the adjoining implications described above, there are no regions in which the null vector is a (sub)gradient of f6. It is possible, however, that there are multiple globally optimal points (forming a connected region). This can occur when C1 and C2 are polytopes that have parallel faces.

The reasoning set forth in this section has several important implications:

(i) Firstly, instead of solving M1 directly, we can solve an equivalent quasi-concave optimization problem (which leads to a series of convex feasibility problems). As such, it generalizes the bisection approach that was suggested in the previous section, as this procedure does not require an initial feasible point.
(ii) Secondly, it lays the foundations for a deeper characterization of the solution of M1: the optimal value occurs when e is chosen such that the upper level set C2 ∩ Ḡ_e is the singleton {c2*}. For this value, i.e., e = x1*, we have that the convex set Ḡ_e touches C2. This means that there exists at least one hyperplane passing through C2 ∩ Ḡ_e that is a supporting hyperplane for both C2 and Ḡ_e at c2* (see Fig. 5(a) for an illustration). Moreover, as Ḡ_e is a scaled point reflection of C̄1, there exists a hyperplane that is parallel to the former one, and is supporting for C1 at ℓ(c2*, y) ∩ C1* = {c1*} (see Fig. 5(b) for an illustration).
(iii) We can now (orthogonally) project C1, C2 and y on a line that is orthogonal to these hyperplanes (see Fig. 5(c) for an illustration). Moreover, it is easy to see that when solving the optimization problem in this projected space, the optimal objective function value is identical to the original one. In the following section, we will derive a procedure that allows us to compute this direction very efficiently.
(iv) Lastly, as a consequence of the previous point, in the special case that C1 and C2 are convex polytopes (that do not have parallel faces⁴), we have that at least one of c1* and c2* is located at a vertex of C1 or C2 when q = 2.

⁴ In the case that the polytopes have parallel faces, the maximizer may not be unique. However, in that case we can always find an optimal pair (c1*, c2*) such that at least one of c1* and c2* is located at a vertex of C1 or C2 when q = 2.

5.3. A reformulation as a fractional program

In this section, we will show that M1(y, C1, C2) is equivalent to the following fractional program:

    M3:  minimize_{a ∈ R^q, b1, b2 ∈ R}   (b2 − ⟨a, y⟩) / (b2 − b1)
         subject to   ‖a‖_2 = 1 ,
                      b1 ≥ ⟨c1, a⟩ ,  for all c1 ∈ C1 ,
                      b2 ≥ ⟨c2, a⟩ ,  for all c2 ∈ C2 ,
                      b1 < ⟨a, y⟩ ,
                      b2 ≥ ⟨a, y⟩ .

The reasoning behind M3 is the following. When the sets C1 and C2 and the vector y are projected orthogonally onto the direction a, M1 is transformed into a one-dimensional format. More precisely, given a unit vector a ∈ R^q, the coordinate of a point d ∈ R^q on the axis that is defined by a can be computed (and denoted) as follows:

    P_a(d) = ⟨a, d⟩ ;

moreover, the projection of a set D ⊆ R^q on a is

    P_a(D) = {⟨a, d⟩ | d ∈ D} .


Fig. 5. (a) An illustration (blue line) of the supporting hyperplane of the sets C2 and Ḡ_{x1*} at c2*. (b) An illustration (green line) of the supporting hyperplane of C1 at c1*, parallel to the former hyperplane (blue line). (c) An illustration of the projection of C1, C2 and y on a direction that is orthogonal to the supporting hyperplanes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

This projection can be used to transform M1(y, C1, C2) into a new optimization problem M1(P_a(y), P_a(C1), P_a(C2)). Interestingly, the optimal value of M1(P_a(y), P_a(C1), P_a(C2)) is greater than or equal to the optimal value of M1(y, C1, C2) (we show this hereafter). On the other hand, from the previous section, we know that there exists a direction such that the optimal values of both optimization problems are equal. In this perspective, M3 searches for the direction a that minimizes the optimal value of M1(P_a(y), P_a(C1), P_a(C2)). The following proposition states that M1 and M3 are equivalent. The optimal direction is illustrated in Fig. 6.

Proposition 5. Let C1 and C2 be subsets of R^q such that M1(0_q, C1, C2) has feasible points and 0_q ∉ C1. Then we have the following correspondence between the optimal point (x1*, c1*, c2*) of M1(0_q, C1, C2) and the optimal point (a*, b1*, b2*) of M3(0_q, C1, C2):

    x1* = b2* / (b2* − b1*) .

With respect to the proposition above, it should be noted that for (a*, b1*, b2*) to be a strict local minimizer, there should exist exactly one pair of parallel supporting hyperplanes at c1* and c2*. Indeed, if multiple pairs of parallel supporting hyperplanes exist, each of these pairs can be used to obtain a unit vector that turns inequality (B.14) in the proof of Proposition 5 into an equality. Loosely speaking, we can say that if the boundary of C1 in the neighborhood of c1*, or that of C2 in the neighborhood of c2*, is smooth, then the local minimizer is strict. From Proposition 5, we have the equivalence between M1 and M3. This means that M3 can be used as a substitute for M1. When dropping the norm constraint in M3, a fractional program (M4) is obtained that is equivalent⁵ to M3 (however, implicitly we need that ‖a‖_2 ≠ 0). Formally, we define M4 as:

⁵ It is the result of dividing the numerator and the denominator in the objective function, as well as the inequality constraints, by the norm of a.



Fig. 6. Derivation of a direction a such that M1(y, C1, C2) and M1(P_a(y), P_a(C1), P_a(C2)) are equivalent. Here, we let y = 0_2.

    M4:  minimize_{a ∈ R_0^q, b1, b2 ∈ R}   (b2 − ⟨a, y⟩) / (b2 − b1)
         subject to   b1 ≥ ⟨c1, a⟩ ,  for all c1 ∈ C1 ,
                      b2 ≥ ⟨c2, a⟩ ,  for all c2 ∈ C2 ,
                      b1 < ⟨a, y⟩ ,
                      b2 ≥ ⟨a, y⟩ ,

where R_0^q = R^q \ {0_q}. In M4, the objective function is the ratio of two affine functions of b1 and b2. Moreover, the affine function in the denominator is strictly positive in the feasible domain. As such, the objective function of M4 is quasi-convex. All problem constraints are affine, making M4 a potentially tractable optimization problem. We say potentially, as the number of affine inequality constraints can be infinite. However, several measures can be taken to overcome this problem. We elaborate on these cases in the following section.

5.4. Efficiently solving the fractional program

As a starting point of this section, consider the constraint(s):

    b1 ≥ ⟨c1, a⟩ ,  for all c1 ∈ C1 .   (7)

As stated at the end of the previous section, the expression above potentially implies an infinite number of affine constraints. Fortunately, these constraints can easily be translated into a single constraint:

    b1 ≥ sup_{c1 ∈ C1} ⟨c1, a⟩ .

These observations could suggest the use of cutting plane algorithms [20] to solve M4. Even though this approach potentially allows for very generic forms of C1 and C2 for which M4 can be solved efficiently, we will focus on two specific cases here that do not require the use of cutting plane algorithms.

(i) Notably, when C1 and C2 are convex polytopes, the constraints of M4 reduce to a finite set of linear inequalities. More precisely, the first set of constraints can be replaced by

    ⟨c1, a⟩ ≤ b1 ,  for all vertices c1 of C1 .

In this case, M4 reduces to a linear fractional program. It is well known that linear fractional programs can be transformed into equivalent linear programs that can be solved efficiently.⁶

(ii) As a second interesting situation, we consider the case where C1 and C2 are ellipsoids. An ellipsoid in a q-dimensional space can be described by a q-dimensional vector and a positive definite q × q matrix. Therefore, let the vectors v1 and v2 as well as the positive definite matrices V1 and V2 be given such that:

    C1 = {v1 + V1 u | u ∈ R^q ∧ ‖u‖_2 ≤ 1} ,
    C2 = {v2 + V2 u | u ∈ R^q ∧ ‖u‖_2 ≤ 1} .

⁶ As we do not wish to disturb the flow of this paper, we defer a description of this transformation to Appendix C. Moreover, Appendix D contains guidelines on how the resulting optimization problems can be solved with standard linear programming or SOCP software.
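Both cases hinge on the support function sup_{c ∈ C} ⟨c, a⟩ of the source set, which has a simple closed form in either case; a sketch (ours, anticipating the ellipsoidal formula derived next):

    import numpy as np

    def support_polytope(a, P):
        # sup of <c, a> over conv(rows of P) is attained at a vertex
        return float(np.max(P @ a))

    def support_ellipsoid(a, v, V):
        # sup of <c, a> over {v + V u : ||u||_2 <= 1} equals <v, a> + ||V a||_2
        return float(v @ a + np.linalg.norm(V @ a))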


Table 1. Synthetic data representing the observed prototypes of two sources.

    Source nr.   Feature 1   Feature 2
    1            0.5264      0.1082
    1            0.8298      0.0266
    1            0.8677      0.0708
    1            0.8648      0.0800
    1            0.7780      0.2179
    1            0.6153      0.3398
    1            0.5435      0.2047
    2            0.1125      0.5038
    2            0.3911      0.5399
    2            0.3849      0.6085
    2            0.0502      0.9242
    2            0.0268      0.6835

It is well known that [18]:

    max{⟨a, c1⟩ | c1 ∈ C1} = ⟨v1, a⟩ + ‖V1 a‖_2 .

Therefore, constraint (7) reduces to the following second order cone constraint⁷:

    ⟨v1, a⟩ + ‖V1 a‖_2 ≤ b1 .

⁷ This constraint is called a second order cone constraint as it can be written as a generalized inequality using the second order Lorentz cone.

An equivalent procedure can be applied to C2. Note that, when C1 and C2 are polytopes, we argued that the resulting optimization problem can be transformed into an equivalent linear programming problem. Fortunately, the presence of the second order cone constraint does not prevent us from applying a similar strategy here. Therefore, once this transformation is applied, second order cone programming solvers can be used to solve the resulting program efficiently.

Remark 2. Finally, we note that we have shown the equivalence of M1 and M4 only for the case where y ∈ conv(C1 ∪ C2) \ C1. We know that, if y ∉ conv(C1 ∪ C2), then it holds that Xk(y, C1, C2) = ∅. Therefore, we could attempt to detect such cases prior to solving M4. However, when y ∉ conv(C1 ∪ C2), there exists a hyperplane that separates y from both C1 and C2. In this case, we can choose a to be the normal vector of that hyperplane, b2 = ⟨y, a⟩ and b1 < ⟨y, a⟩. Such a point will be optimal for M4. Therefore, an algorithm that solves M4 will find such a solution, and we can simply verify whether b2 = ⟨y, a⟩ to conclude whether Xk(y, C1, C2) = ∅.

6. An illustration

In this section, the use of the interval estimator is illustrated on synthetic and real-life data. All experiments were performed with MATLAB (version R2013a), using CVX, a package for specifying and solving convex programs [21,22]. Second order cone programs were solved using SDPT3 [23]. Linear programs were solved using the linear programming solver of MATLAB. All experiments were performed on a PC (Intel Dual Core 2.66 GHz, 4 GB RAM).

6.1. An illustration on synthetic data

In this section, we illustrate our interval estimator using artificially generated data. As a starting point, consider a setting with two sources (referred to as source 1 and source 2) that are represented by two sets C1 and C2 in a 2-dimensional feature space (the boundaries of these sets are shown in Fig. 7). In Table 1, the coordinates of 12 observed prototype vectors of these sources are given. We now assume that the two sources have been used to create a mixture with (observed) mixture vector y = (0.4, 0.4). The observed prototypes can be used to estimate C1 and C2. As argued in Section 3, the convex hulls of these observed prototypes provide natural estimates of C1 and C2. Therefore, let Ĉ1 and Ĉ2 be the convex hulls of the observed prototype vectors. As the feasible sets Ĉ1 and Ĉ2 are convex polytopes of which the vertices are known, X1(y, Ĉ1, Ĉ2) can be computed using optimization problem M4(y, Ĉ1, Ĉ2). As this optimization problem is a linear fractional program, it can be transformed into an equivalent linear program (see Appendix C). The resulting linear program can be solved using any linear programming solver, resulting in X1(y, Ĉ1, Ĉ2) ≈ [0.247, 0.713] (where we use ≈ to denote potential rounding errors). Fig. 7(a) shows the location of the 'optimal' prototype vectors corresponding to the minimal (red) and maximal (blue) elements of X1. Alternatively, we can estimate the sets C1 and C2 by the minimum volume enclosing ellipsoids C̃1 and C̃2 of the observed prototype vectors (see Fig. 7(b)). The minimum volume enclosing ellipsoid of a set of points can be computed in O(m^{3.5} ln(m ε^{−1})) time (where m is the number of points and ε is the error with respect to the optimal ellipsoid) using an algorithm proposed in [24]. We used an implementation of this algorithm by [25].
In this setting, we obtain X1(y, C̃1, C̃2) ≈ [0.100, 0.826] by solving a second order cone program. As Ĉ1 ⊂ C̃1 and Ĉ2 ⊂ C̃2, it follows from the properties of the set estimator (see Section 2) that X1(y, Ĉ1, Ĉ2) ⊆ X1(y, C̃1, C̃2), which is clearly illustrated by this experiment.
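The experiments of this section are easy to reproduce with generic convex optimization tools. The sketch below is ours (the authors used MATLAB/CVX); the normalization b2 − b1 = 1 is a Charnes–Cooper-style substitution standing in for the transformation of Appendix C, and inf(X1) is obtained as 1 − sup(X2) by swapping the two sources:

    import cvxpy as cp
    import numpy as np

    def sup_x1_polytope(y, P1, P2):
        """sup(X_1) for C_i = conv(rows of P_i), via the normalized M4;
        the strict constraint b1 < <a, y> is relaxed by a small margin."""
        a, b1, b2 = cp.Variable(len(y)), cp.Variable(), cp.Variable()
        cons = [P1 @ a <= b1, P2 @ a <= b2,
                b1 <= a @ y - 1e-9, a @ y <= b2, b2 - b1 == 1]
        return cp.Problem(cp.Minimize(b2 - a @ y), cons).solve()

    def sup_x1_ellipsoid(y, v1, V1, v2, V2):
        """Same, for ellipsoids C_i = {v_i + V_i u : ||u||_2 <= 1};
        the support-function constraints make this a small SOCP."""
        a, b1, b2 = cp.Variable(len(y)), cp.Variable(), cp.Variable()
        cons = [v1 @ a + cp.norm(V1 @ a, 2) <= b1,
                v2 @ a + cp.norm(V2 @ a, 2) <= b2,
                b1 <= a @ y - 1e-9, a @ y <= b2, b2 - b1 == 1]
        return cp.Problem(cp.Minimize(b2 - a @ y), cons).solve()

With the Table 1 vertices and y = (0.4, 0.4), the polytope routine should approximately reproduce the interval [0.247, 0.713] reported above.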
84

J. Verwaeren et al. / International Journal of Approximate Reasoning 60 (2015) 71–92

Fig. 7. Visualization of the experiment with synthetic data. The coordinates observed prototype vectors are given in Table 1. (a) Visualization of the location of the ‘optimal’ prototype vectors corresponding to the minimal (red) and maximal (blue) elements of X 1 (y, Cˆ 1 , Cˆ 2 ). (b) Visualization of the location of the ‘optimal’ prototype vectors corresponding to the minimal (red) and maximal (blue) elements of X 1 (y, C˜ 1 , C˜ 2 ). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Log–log plot of the time needed to compute an interval estimate as a function of the number of observed prototypes (m) per source and the dimensionality q of the representation space. Each point represents the average of 30 repetitions.

6.2. Computational complexity In this section, an experiment is presented to assess the computational complexity of the interval estimator. In this experiment, we consider two sources and a varying dimensionality q of the feature space. C 1 is a q-dimensional hypersphere with radius 1, centered at 2q (a q × 1 vector of twos), C 2 is a q-dimensional hypersphere with radius 1, centered at −2q and y = 0q . To obtain an interval estimate: (1) m observed prototypes were sampled from each hypersphere (where m ranges between 25 and 2000), and (2) the interval estimate is computed following the procedure in Appendix D (C 1 and C 2 are approximated by means of the convex hulls of the observed prototypes). For each combination of q and m, this procedure was repeated 30 times. The average computation times are reported in Fig. 8. From this figure it can be seen that, even in high-dimensional settings (q = 100) that count 2000 observed prototypes per source, the set estimator can be computed in only 100 seconds, suggesting that interval estimates can be computed efficiently in most practical settings. 6.3. An illustration on real-life data We now turn our attention to a setting in which we use real-life data. Therefore, we consider the problem setting discussed in [26]. For a given oil mixture, the authors consider the problem of detecting fraudulent adulteration of mustard seed oil with (cheaper) soybean oil, using the information that is contained in the fatty acid composition of the mixture.

J. Verwaeren et al. / International Journal of Approximate Reasoning 60 (2015) 71–92

85

Fig. 9. Three-dimensional scatter plot of the observed prototypes (based on the mass concentrations of fatty acids C16:1, C18:0 and C18:2). Mustard seed oils are depicted in black, soybean oils are depicted in gray. Fatty acid concentrations were centered and scaled to have unit variance.

We can easily translate this problem into a set-based unmixing problem. Let y ∈ R3 be the fatty acid composition of an oil mixture, and let x1 , resp. x2 be the proportional amounts of mustard seed oil (source 1, represented by C 1 ⊂ R3 ) and soybean oil (source 2, represented by C 2 ⊂ R3 ) in that mixture. Moreover, let θ ∈ [0, 1] be a given threshold that defines an allowable level of adulteration. In that case, the question whether a mixture is the result of fraudulent adulteration reduces to assessing whether x2 > θ . For our interval estimator, this translates into: (i) If θ < inf( X 2 (y, C 1 , C 2 )), then the adulteration level x2 of y is above θ . (ii) If θ > sup( X 2 (y, C 1 , C 2 )), then the adulteration level x2 of y is below θ . (iii) If θ ∈ X 2 (y, C 1 , C 2 ), then it cannot be concluded whether the adulteration level x2 is above or below the threshold θ . To obtain estimates of C 1 and C 2 , we use a dataset [26] of 37 observations (observed prototype vectors) of mustard seed oils and 22 observations of soybean oils. The convex hulls or the minimum volume enclosing ellipsoids of these data can be used to estimate C 1 and C 2 . A scatter plot of the observed prototype vectors is presented in Fig. 9. A small experiment was set up to test the applicability of our interval estimator in this setting. Firstly, a mixture vector was created by randomly selecting one observed prototype vector from the set of mustard seed oils (we refer to this random observation as C 1 ) and one observed prototype vector (C 2 ) from the set of soybean oils. Setting x1 = 3/4 and x2 = 1/4, we can compute Y = x1 C 1 + x2 C 2 , where Y represents the random vector representing the mixture. Thirdly, the two observed prototype vectors that were used to create Y are removed from the datasets and the remaining observed prototype vectors are used to estimate C 1 and C 2 by convex hulls (Cˆ 1 and Cˆ 2 ) or minimum volume enclosing ellipsoids (C˜ 1 and C˜ 2 ). Lastly, we compute X 2 (Y , Cˆ 1 , Cˆ 2 ) and X 2 (Y , C˜ 1 , C˜ 2 ). Figs. 10(a–d) show histograms of inf( X 2 ) and sup( X 2 ) for 200 repetitions of this approach. From these figures, it can be seen that in most cases the true adulteration level 1/4 ∈ X 2 (Y , Cˆ 1 , Cˆ 2 ). The average width of X 2 (Y , Cˆ 1 , Cˆ 2 ) was 0.208. The average width of X 2 (Y , C˜ 1 , C˜ 2 ) was 0.365. Moreover, there were 8 runs in which X 2 (Y , Cˆ 1 , Cˆ 2 ) = ∅.8 For an allowed adulteration level of θ = 0.30 (which means that all mixtures are legitimate), we can conclude from Fig. 10(c) that for the setting where ellipsoids are used to estimate C 1 and C 2 , the interval estimator will suggest that none of the mixtures violates the threshold. Using convex hulls, 4 mixtures are falsely considered to be illegitimate. On the other hand, consider the setting where θ = 0.15 (which means that all mixtures are illegitimate). In this case, our estimator (using convex hulls to estimate C 1 and C 2 ) will consider 72 mixtures to be illegitimate, whereas only 25 mixture are considered to be illegitimate when minimum spanning ellipsoids are used.9 These results can be verified by considering Fig. 11, which shows the estimated probability that a mixture will be considered to be legitimate, illegitimate or the analysis is inconclusive for different values of θ . From Fig. 11, it can also be seen that for values of θ that are close to x2 = 1/4, the probability that the result of an analysis will be inconclusive is high. 
For θ = 0.2, for example, approximately 85% of the samples will result in an inconclusive analysis result (when C 1 and C 2 are approximated by the convex hulls of the observed prototypes). Only

⁸ In these cases, the mixture vector cannot be written as a convex combination of the observed prototype vectors. This is a consequence of using an inner approximation of C1 and C2 (in this case, the convex hulls of the observed prototypes).
⁹ Note that, as Ĉ1 ⊂ C̃1 and Ĉ2 ⊂ C̃2, we obtain that X2(Y, Ĉ1, Ĉ2) ⊆ X2(Y, C̃1, C̃2). As a result, when using minimum spanning ellipsoids to approximate C1 and C2 instead of convex hulls, it becomes less likely that a mixture will falsely be considered to be illegitimate. On the other hand, when a mixture is truly illegitimate, it is more likely that this will be detected when using convex hulls. Clearly, if we want to prevent that a mixture is falsely considered to be illegitimate, outer approximations of C1 and C2 are needed.


Fig. 10. Results of the oil-unmixing problem. (a, b) Frequency counts for inf(X2(Y, Ĉ1, Ĉ2)) and sup(X2(Y, Ĉ1, Ĉ2)), based on 200 repetitions; 8 of these repetitions resulted in X2(Y, Ĉ1, Ĉ2) = ∅. (c, d) Frequency counts for inf(X2(Y, C̃1, C̃2)) and sup(X2(Y, C̃1, C̃2)), based on 200 repetitions; none of these repetitions resulted in X2(Y, C̃1, C̃2) = ∅.

Fig. 11. (a) Visualization of the probability that a mixture will be considered illegitimate (Pr(inf(X2(Y, Ĉ1, Ĉ2)) > θ)), legitimate (Pr(sup(X2(Y, Ĉ1, Ĉ2)) ≤ θ)) or that the analysis is inconclusive (Pr(θ ∈ X2(Y, Ĉ1, Ĉ2))), as a function of θ, for the setting where C1 and C2 are approximated by convex hulls (Ĉ1 and Ĉ2) of the observed prototypes. (b) The same probabilities for the setting where C1 and C2 are approximated by ellipsoids (C̃1 and C̃2).

Only 15% of the mixtures will be considered illegitimate. For some applications, such a result may be too weak. We argue that the number of illegitimate mixtures that is detected can only be increased by adding information to the problem. Such information can be included by making assumptions (for instance, distributional assumptions) or by extending the characterization of the sources and the mixtures, i.e., by taking more fatty acids into account.
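To make the decision rule explicit, the following Python sketch (ours, not part of the original study) applies cases (i)–(iii) to precomputed interval endpoints. The function name classify_mixture and the encoding of the empty interval estimate are our own conventions, and the endpoints in the usage example are hypothetical; computing the endpoints themselves is the subject of Appendix D.

```python
def classify_mixture(theta, x2_inf, x2_sup):
    """Apply the decision rule (i)-(iii) to an interval estimate
    [x2_inf, x2_sup] of the adulteration level x2. Pass None for both
    endpoints to encode an empty interval estimate (cf. footnote 8)."""
    if x2_inf is None or x2_sup is None:
        return "empty"          # mixture not representable with these source sets
    if theta < x2_inf:
        return "illegitimate"   # case (i): x2 is certainly above theta
    if theta > x2_sup:
        return "legitimate"     # case (ii): x2 is certainly below theta
    return "inconclusive"       # case (iii): theta lies inside the interval

# With hypothetical interval endpoints inf(X2) = 0.18 and sup(X2) = 0.27:
print(classify_mixture(0.30, 0.18, 0.27))  # legitimate
print(classify_mixture(0.15, 0.18, 0.27))  # illegitimate
print(classify_mixture(0.20, 0.18, 0.27))  # inconclusive
```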


7. Conclusions and future work

We have presented an interval estimator for the unmixing of mixtures with set-based source descriptions. It was shown that interval estimation leads to a tractable optimization problem when the sources can be represented by convex subsets of R^q. The presented interval estimator is complementary to more traditional distribution estimators or point estimators.

It should be noted that our interval estimator is mainly developed for low-dimensional settings in which the data are not too noisy. Indeed, in high-dimensional settings, due to the curse of dimensionality, a large number of observed prototype vectors may be needed to estimate C1 and C2 with sufficient accuracy. It is interesting to study how noise in the data influences the interval estimates and how noise can be inherently incorporated into the interval estimator. Assuming, for example, a probabilistic error model, this would make it possible to combine epistemic uncertainty (the descriptions by sets) and statistical uncertainty (the error in the observed prototypes).

Appendix A. Proofs of the propositions in Section 4

This appendix contains the proofs of the propositions in Section 4 of the paper.

Proof of Proposition 1.

(i) We first prove that $x_1 \in X_1(\mathbf{y}, (C_i)_{i=1}^{n}) \Rightarrow x_1 \in X_1(\mathbf{y}, (C_1, \mathrm{conv}(\bigcup_{i=2}^{n} C_i)))$:

Let $x_1 \in X_1(\mathbf{y}, (C_i)_{i=1}^{n})$. We can always choose $(x_1, \ldots, x_n) \in S_n$ and $\mathbf{c}_1 \in C_1, \ldots, \mathbf{c}_n \in C_n$ such that $\mathbf{y} = \sum_{i=1}^{n} x_i \mathbf{c}_i$. We now define $\mathbf{c} = \sum_{i=2}^{n} \frac{x_i \mathbf{c}_i}{1 - x_1}$. It easily follows that $\mathbf{c} \in \mathrm{conv}(\bigcup_{i=2}^{n} C_i)$. Moreover, as $(x_1, 1 - x_1) \in S_2$ and $x_1 \mathbf{c}_1 + (1 - x_1)\,\mathbf{c} = \mathbf{y}$, we have that $(x_1, 1 - x_1) \in X(\mathbf{y}, (C_1, \mathrm{conv}(\bigcup_{i=2}^{n} C_i)))$ and thus $x_1 \in X_1(\mathbf{y}, (C_1, \mathrm{conv}(\bigcup_{i=2}^{n} C_i)))$.

(ii) We now prove that $x_1 \in X_1(\mathbf{y}, (C_i)_{i=1}^{n}) \Leftarrow x_1 \in X_1(\mathbf{y}, (C_1, \mathrm{conv}(\bigcup_{i=2}^{n} C_i)))$:

Let $x_1 \in X_1(\mathbf{y}, (C_1, \mathrm{conv}(\bigcup_{i=2}^{n} C_i)))$. We can always choose $(x_1, x_2) \in S_2$, $\mathbf{c}_1 \in C_1$ and $\mathbf{c} \in \mathrm{conv}(\bigcup_{i=2}^{n} C_i)$ such that $\mathbf{y} = x_1 \mathbf{c}_1 + x_2 \mathbf{c}$. As $C_2, \ldots, C_n$ are convex and $\mathbf{c} \in \mathrm{conv}(\bigcup_{i=2}^{n} C_i)$, we can always find $\mathbf{c}_2 \in C_2, \ldots, \mathbf{c}_n \in C_n$ and $(x'_2, \ldots, x'_n) \in S_{n-1}$ such that $\mathbf{c} = \sum_{i=2}^{n} x'_i \mathbf{c}_i$. It easily follows that $\mathbf{y} = x_1 \mathbf{c}_1 + (1 - x_1) \sum_{i=2}^{n} x'_i \mathbf{c}_i$. As a result, we have that $(x_1, (1 - x_1)x'_2, \ldots, (1 - x_1)x'_n) \in X(\mathbf{y}, (C_i)_{i=1}^{n})$ and thus $x_1 \in X_1(\mathbf{y}, (C_i)_{i=1}^{n})$. □

Proof of Proposition 3. In this proof, we will assume that $\mathbf{y} = \mathbf{0}_q$. This does not affect the generality of the proof, as this case can always be obtained by translating the coordinate system (which does not affect the optimal value). As $\bar{x}_1$ and $\dot{x}_1$ are M1-feasible, there exist at least two points $(\bar{x}_1, \bar{x}_2, \bar{\mathbf{c}}_1, \bar{\mathbf{c}}_2)$ and $(\dot{x}_1, \dot{x}_2, \dot{\mathbf{c}}_1, \dot{\mathbf{c}}_2)$ that are feasible points of M1. Let us define the vector functions $\mathbf{h}_1 : [0,1] \to \mathbb{R}^q$ and $\mathbf{h}_2 : [0,1] \to \mathbb{R}^q$ as follows:

$$\mathbf{h}_1(\beta) = \bar{\mathbf{c}}_1 \beta + \dot{\mathbf{c}}_1 (1 - \beta)\,, \qquad \mathbf{h}_2(\beta) = \bar{\mathbf{c}}_2 \beta + \dot{\mathbf{c}}_2 (1 - \beta)\,.$$

For any $\beta \in [0,1]$, we have that $\mathbf{h}_1(\beta) \in C_1$ and $\mathbf{h}_2(\beta) \in C_2$. Moreover, as $(\bar{\mathbf{c}}_1, \bar{\mathbf{c}}_2)$ (resp. $(\dot{\mathbf{c}}_1, \dot{\mathbf{c}}_2)$) is an M1-feasible couple, we have that

$$\bar{\mathbf{c}}_2 = \bar{\mathbf{c}}_1\, \bar{t} \quad \text{and} \quad \dot{\mathbf{c}}_2 = \dot{\mathbf{c}}_1\, \dot{t}\,, \tag{A.1}$$

for some scalars $\bar{t}, \dot{t} < 0$. Additionally, these equalities can be used to rewrite $\mathbf{h}_2$ as follows:

$$\mathbf{h}_2(\beta) = \bar{\mathbf{c}}_1\, \bar{t}\, \beta + \dot{\mathbf{c}}_1\, \dot{t}\, (1 - \beta)\,.$$

Now consider the vector

$$\mathbf{m} = \left( x_1(\beta),\; x_2(\beta),\; \mathbf{h}_1(\beta),\; \mathbf{h}_2\!\left( \left( \frac{1-\beta}{\beta}\, \bar{t}\, \dot{t}^{-1} + 1 \right)^{-1} \right) \right),$$

where

$$x_1(\beta) = \frac{1}{1 - \frac{(1-\beta)\,\bar{t} + \beta\,\dot{t}}{\bar{t}\,\dot{t}}} \quad \text{and} \quad x_2(\beta) = 1 - x_1(\beta)\,.$$

It can easily be verified that $\mathbf{m}$ is a feasible point of M1 for any $\beta \in [0,1]$. Moreover, by applying Proposition 2, we obtain that

$$x_1(0) = \dot{x}_1 = \frac{1}{1 - 1/\dot{t}} \quad \text{and} \quad x_1(1) = \bar{x}_1 = \frac{1}{1 - 1/\bar{t}}\,. \tag{A.2}$$

As $x_1(\beta)$ is a continuous, monotone function of $\beta$, we know (from the intermediate value theorem) that the range of $x_1(\beta)$ is $[\min(\bar{x}_1, \dot{x}_1), \max(\bar{x}_1, \dot{x}_1)]$. This means that, for any $z = \alpha\, \bar{x}_1 + (1 - \alpha)\, \dot{x}_1$ with $\alpha \in [0,1]$, there exists a $\beta \in [0,1]$ such that $z = x_1(\beta)$. □
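As a sanity check on this construction (our own illustration, not part of the paper), the following Python sketch verifies numerically, for randomly drawn scalars $\bar{t}, \dot{t} < 0$ and $q = 1$, that the point $\mathbf{m}$ is M1-feasible for every $\beta$ and that the homotopy endpoints recover $\bar{x}_1$ and $\dot{x}_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two M1-feasible points for y = 0 with q = 1: c2 = t * c1 with t < 0.
t_bar, t_dot = -rng.uniform(0.2, 5.0, size=2)
c1_bar, c1_dot = rng.uniform(0.5, 2.0, size=2)
c2_bar, c2_dot = t_bar * c1_bar, t_dot * c1_dot

x1_bar = 1.0 / (1.0 - 1.0 / t_bar)   # Proposition 2
x1_dot = 1.0 / (1.0 - 1.0 / t_dot)

def h1(beta):
    return c1_bar * beta + c1_dot * (1.0 - beta)

def h2(gamma):
    return c2_bar * gamma + c2_dot * (1.0 - gamma)

def x1(beta):
    return 1.0 / (1.0 - ((1.0 - beta) * t_bar + beta * t_dot) / (t_bar * t_dot))

for beta in np.linspace(1e-6, 1.0, 50):
    gamma = 1.0 / (((1.0 - beta) / beta) * (t_bar / t_dot) + 1.0)
    # m is M1-feasible: the mixing equation x1*c1 + (1 - x1)*c2 = y = 0 holds.
    assert abs(x1(beta) * h1(beta) + (1.0 - x1(beta)) * h2(gamma)) < 1e-9

# The endpoints of the homotopy recover the two given feasible points.
assert np.isclose(x1(1.0), x1_bar) and np.isclose(x1(1e-12), x1_dot)
print("x1(beta) sweeps the interval between", x1_dot, "and", x1_bar)
```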


Appendix B. Proofs of the propositions in Section 5

To prove Proposition 4, we will need several lemmas. The general goal here is to show that the points that are given in Proposition 4 are the only points that satisfy the Karush–Kuhn–Tucker (KKT) conditions.

Lemma B.1. Let $C_1 = [b_1^l, b_1^u]$, $C_2 = [b_2^l, b_2^u]$, and $b_1^u < y < b_2^u$; then every feasible point of M1$(y, C_1, C_2)$ satisfies the LICQ (see footnote 10).

Proof. We start by slightly rephrasing M1, by eliminating $x_2$ and transforming it into a simpler form. This leads to the following optimization problem:

minimize over $x_1 \in \mathbb{R}$, $c_1 \in \mathbb{R}^q$, $c_2 \in \mathbb{R}^q$:

$$-x_1 \tag{B.1}$$

subject to

$$x_1 c_1 + (1 - x_1)\, c_2 - y = 0\,, \tag{B.2}$$
$$-x_1 \le 0\,, \tag{B.3}$$
$$-1 + x_1 \le 0\,, \tag{B.4}$$
$$-c_1 + b_1^l \le 0\,, \tag{B.5}$$
$$-c_2 + b_2^l \le 0\,, \tag{B.6}$$
$$-b_1^u + c_1 \le 0\,, \tag{B.7}$$
$$-b_2^u + c_2 \le 0\,. \tag{B.8}$$

To show that the LICQ holds, it suffices to compute the partial derivatives of the constraints and show that the partial derivatives of active constraints are always linearly independent. This can easily be proved to be the case. □

Lemma B.2. Let $C_1 = [b_1^l, b_1^u]$, $C_2 = [b_2^l, b_2^u]$, and $b_2^l < y < b_1^l$; then every feasible point of M1$(y, C_1, C_2)$ satisfies the LICQ.

Proof. The proof is analogous to the proof of the lemma above. □

Lemma B.3. Let $C_1 = [b_1^l, b_1^u]$ and $C_2 = [b_2^l, b_2^u]$, and $b_1^u < y < b_2^u$. The point

$$(x_1^*, x_2^*, c_1^*, c_2^*) = \left( \frac{y - b_2^u}{b_1^u - b_2^u},\; 1 - \frac{y - b_2^u}{b_1^u - b_2^u},\; b_1^u,\; b_2^u \right)$$

is the only KKT point of M1$(y, C_1, C_2)$.

Proof. As a starting point, it is trivial to verify that $(x_1^*, x_2^*, c_1^*, c_2^*)$ is a feasible point. To simplify the proof, we eliminate $x_2$ by replacing it by $1 - x_1$ (which follows from the summation constraint), such that $(x_1, x_2, c_1, c_2) \equiv (x_1, c_1, c_2)$. Moreover, we rewrite the maximization problem in the form of Eqs. (B.1)–(B.8). For this problem, the Lagrangian $L$ becomes

$$L((x_1, c_1, c_2), \lambda, \gamma) = -x_1 + \lambda_1 (x_1 c_1 + c_2 - x_1 c_2 - y) - \gamma_1 x_1 - \gamma_2 (1 - x_1) - \gamma_3 (c_1 - b_1^l) - \gamma_4 (c_2 - b_2^l) - \gamma_5 (b_1^u - c_1) - \gamma_6 (b_2^u - c_2)\,.$$

The partial derivatives of the Lagrangian with respect to $x_1$, $c_1$ and $c_2$ are:

$$\frac{\partial L}{\partial x_1} = -1 + \lambda_1 (c_1 - c_2) - \gamma_1 + \gamma_2\,, \tag{B.9}$$
$$\frac{\partial L}{\partial c_1} = \lambda_1 x_1 - \gamma_3 + \gamma_5\,, \tag{B.10}$$
$$\frac{\partial L}{\partial c_2} = \lambda_1 (1 - x_1) - \gamma_4 + \gamma_6\,. \tag{B.11}$$

It can easily be seen that $(x_1^*, c_1^*, c_2^*)$ is a feasible point; moreover, at this point, only the constraints $x_1 c_1 + c_2 - x_1 c_2 = y$, $b_1^u - c_1 \ge 0$ and $b_2^u - c_2 \ge 0$ are active, which means (using the complementarity conditions of the KKT conditions) that $\gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = 0$. Setting the partial derivatives of the Lagrangian equal to zero, taking into account the complementarity conditions and plugging in $(x_1^*, c_1^*, c_2^*)$, leads to the following Lagrange multipliers:

$$\lambda_1 = \frac{1}{c_1^* - c_2^*} < 0\,, \qquad \gamma_5 = -\lambda_1\, \frac{y - b_2^u}{b_1^u - b_2^u} > 0\,, \qquad \gamma_6 = -\lambda_1 \left( 1 - \frac{y - b_2^u}{b_1^u - b_2^u} \right) > 0\,.$$

As such, the KKT conditions hold at $(x_1^*, c_1^*, c_2^*)$, which means that the necessary conditions hold.

10 Linear independence constraint qualification; see for instance [27] for a definition.


We now show that there exist no other points that satisfy the KKT conditions. Firstly, consider a generic point $(x_1^0, c_1^0, c_2^0)$. When this point is restricted to the interior (not at the bounds) of the search space, none of the Lagrange multipliers are zero. It can easily be shown that solving the KKT system in this case leads to Lagrange multipliers, for the inequality constraints, of which at least one is negative. The same holds when only one of the variables is located at its bounds, as well as when both $c_1^0 = b_1^l$ and $c_2^0 = b_2^l$. □

Lemma B.4. Let $C_1 = [b_1^l, b_1^u]$, $C_2 = [b_2^l, b_2^u]$ and $b_2^l < y < b_1^l$; then the point

$$(x_1^*, x_2^*, c_1^*, c_2^*) = \left( \frac{y - b_2^l}{b_1^l - b_2^l},\; 1 - \frac{y - b_2^l}{b_1^l - b_2^l},\; b_1^l,\; b_2^l \right)$$

is the only KKT point of M4$(y, C_1, C_2)$.

Proof. The proof is analogous to the proof of the lemma above. □
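Before turning to Proposition 4, the closed-form points of Lemmas B.3 and B.4 are easy to validate numerically. The following Python sketch is our own illustration, with arbitrarily chosen interval bounds satisfying $b_1^u < y < b_2^u$; it compares the closed-form value of Lemma B.3 against a brute-force grid search over $(c_1, c_2)$.

```python
import numpy as np

# Hypothetical interval sources C1 = [b1l, b1u], C2 = [b2l, b2u] with b1u < y < b2u.
b1l, b1u = 0.0, 2.0
b2l, b2u = 3.0, 6.0
y = 4.0

# Closed-form KKT point of Lemma B.3: x1* = (y - b2u)/(b1u - b2u), attained at (b1u, b2u).
x1_closed = (y - b2u) / (b1u - b2u)

# Brute force: the mixing equation y = x1*c1 + (1 - x1)*c2 gives
# x1 = (y - c2)/(c1 - c2); keep the largest feasible value over a grid of (c1, c2).
best = -np.inf
for c1 in np.linspace(b1l, b1u, 401):
    for c2 in np.linspace(b2l, b2u, 401):
        x1 = (y - c2) / (c1 - c2)
        if 0.0 <= x1 <= 1.0:
            best = max(best, x1)

print(f"closed form: {x1_closed:.4f}, grid search: {best:.4f}")
assert abs(best - x1_closed) < 1e-6
```

The maximum is attained at the upper bounds $(b_1^u, b_2^u)$, as the lemma predicts.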

We can now prove Proposition 4.

Proof of Proposition 4. The proof of Proposition 4 largely follows from the lemmas above. We only prove case (a); the proofs for cases (b–d) are analogous. Lemmas B.1 and B.3 show that the proposed point is the only point that satisfies the first-order necessary conditions. Therefore, it is trivial to see that this point is the only local, and thus global, minimum of M1. □

Proof of Proposition 5. From the discussion in Section 5.2, we know that there exists at least one pair of parallel hyperplanes such that the first hyperplane of this pair is a supporting hyperplane of $C_1$ at $\mathbf{c}_1^*$ and the second hyperplane of this pair is a supporting hyperplane of $C_2$ at $\mathbf{c}_2^*$. We now assume that $\mathbf{a}^*$ is the normal vector of these hyperplanes (we show later that this assumption is correct). We now have that

$$\frac{\langle \mathbf{a}^*, \mathbf{c}_2^* \rangle}{\langle \mathbf{a}^*, \mathbf{c}_2^* \rangle - \langle \mathbf{a}^*, \mathbf{c}_1^* \rangle} = x_1^*\,.$$

Moreover, we have that $\langle \mathbf{a}^*, \mathbf{c}_1 \rangle \le \langle \mathbf{a}^*, \mathbf{c}_1^* \rangle$ for all $\mathbf{c}_1 \in C_1$, and $\langle \mathbf{a}^*, \mathbf{c}_2 \rangle \le \langle \mathbf{a}^*, \mathbf{c}_2^* \rangle$ for all $\mathbf{c}_2 \in C_2$. This means that, for M3, the point $(\mathbf{a}^*, b_1, b_2)$ is feasible, with

$$b_1 = \langle \mathbf{a}^*, \mathbf{c}_1^* \rangle\,, \tag{B.12}$$
$$b_2 = \langle \mathbf{a}^*, \mathbf{c}_2^* \rangle\,. \tag{B.13}$$

The procedure described above allows, given $(x_1^*, \mathbf{c}_1^*, \mathbf{c}_2^*)$, for the construction of a point $(\mathbf{a}^*, b_1, b_2)$ that is feasible for M3, and moreover

$$x_1^* = \frac{b_2}{b_2 - b_1}\,.$$

We will now show that:

(i) $\mathbf{a}^*$ is the normal vector of a pair of parallel supporting hyperplanes such that the first hyperplane of this pair is a supporting hyperplane of $C_1$ at $\mathbf{c}_1^*$ and the second hyperplane of this pair is a supporting hyperplane of $C_2$ at $\mathbf{c}_2^*$,
(ii) $b_1^* = \langle \mathbf{a}^*, \mathbf{c}_1^* \rangle$,
(iii) $b_2^* = \langle \mathbf{a}^*, \mathbf{c}_2^* \rangle$.

Firstly, when $\mathbf{a}$ is fixed (instead of being an optimization variable of M3), it is easy to see that setting $b_1 = \max(P_{\mathbf{a}}(C_1))$ and $b_2 = \max(P_{\mathbf{a}}(C_2))$ optimizes M3. Moreover, when $\mathbf{a}$ is the normal vector of the supporting hyperplanes at $\mathbf{c}_1^*$ and $\mathbf{c}_2^*$, it is easy to see that $b_1 = \max(P_{\mathbf{a}}(C_1)) = \langle \mathbf{a}, \mathbf{c}_1^* \rangle$ and $b_2 = \max(P_{\mathbf{a}}(C_2)) = \langle \mathbf{a}, \mathbf{c}_2^* \rangle$.

We now prove that $\mathbf{a}^*$ is the normal vector of these hyperplanes (and optimal for M3). For any unit vector $\mathbf{a}$, it can easily be seen (as $\mathbf{c}_1^* = t\, \mathbf{c}_2^*$ for some $t \in \mathbb{R}$) that

$$\frac{\langle \mathbf{a}, \mathbf{c}_2^* \rangle}{\langle \mathbf{a}, \mathbf{c}_2^* \rangle - \langle \mathbf{a}, \mathbf{c}_1^* \rangle} = x_1^*\,.$$

Moreover, as $P_{\mathbf{a}}(\mathbf{c}_1^*) \in P_{\mathbf{a}}(C_1)$, we have that $P_{\mathbf{a}}(\mathbf{c}_1^*) \le \max(P_{\mathbf{a}}(C_1))$ (and, equivalently, $P_{\mathbf{a}}(\mathbf{c}_2^*) \le \max(P_{\mathbf{a}}(C_2))$). This means that, when $\mathbf{a}$ is feasible,

$$x_1^* = \frac{\langle \mathbf{a}, \mathbf{c}_2^* \rangle}{\langle \mathbf{a}, \mathbf{c}_2^* \rangle - \langle \mathbf{a}, \mathbf{c}_1^* \rangle} \le \frac{\max(P_{\mathbf{a}}(C_2))}{\max(P_{\mathbf{a}}(C_2)) - \max(P_{\mathbf{a}}(C_1))}\,. \tag{B.14}$$

More precisely, when choosing $\mathbf{a}$ orthogonal to the supporting hyperplane of $C_1$ (resp. $C_2$) passing through $\mathbf{c}_1^*$ (resp. $\mathbf{c}_2^*$), inequality (B.14) becomes an equality. This completes the proof that the point in (i)–(iii) is a minimizer of M3. □


Appendix C. Equivalent linear programs and SOCPs

In Section 5.4 it was argued that, in some cases, M4 can be transformed into an equivalent linear program or second-order cone program. In this appendix, this transformation is briefly described. The transformation that is used here is generally known as the Charnes and Cooper transformation [28]. It was originally developed for transforming a linear fractional program into a linear program. When the sets $C_1$ and $C_2$ are convex polytopes, M4 reduces to a linear fractional program and the Charnes and Cooper transformation can be applied directly. Instead of presenting the complete transformation, we apply it directly to M4. As a starting point, consider the following transformation of the vector of optimization variables $(b_1, b_2, \mathbf{a}^\top)^\top$:

$$\mathbf{z} = \frac{1}{b_2 - b_1}\begin{pmatrix} b_1 \\ b_2 \\ \mathbf{a} \end{pmatrix}.$$

It can easily be seen that, using this variable, the objective function of M4 can be rewritten as

$$\begin{pmatrix} 0 & 1 & -\mathbf{y}^\top \end{pmatrix}\mathbf{z}\,.$$

The inequality constraints can be rewritten as:

$$\begin{pmatrix} -1 & 0 & \mathbf{c}_1^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for all } \mathbf{c}_1 \in C_1\,,$$
$$\begin{pmatrix} 0 & -1 & \mathbf{c}_2^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for all } \mathbf{c}_2 \in C_2\,,$$
$$\begin{pmatrix} 1 & 0 & -\mathbf{y}^\top \end{pmatrix}\mathbf{z} < 0\,, \qquad \begin{pmatrix} 0 & -1 & \mathbf{y}^\top \end{pmatrix}\mathbf{z} \le 0\,.$$

However, $\mathbf{z}$ is not a free variable. The transformation that is used in the definition of $\mathbf{z}$ implies that

$$\begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\mathbf{z} = \begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\frac{1}{b_2 - b_1}\begin{pmatrix} b_1 \\ b_2 \\ \mathbf{a} \end{pmatrix} = 1\,.$$

Therefore, the constraint

$$\begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\mathbf{z} = 1$$

should be included as well. This transformation leads to the following optimization problem:

M5: minimize over $\mathbf{z} \in \mathbb{R}^{q+2}$:

$$\begin{pmatrix} 0 & 1 & -\mathbf{y}^\top \end{pmatrix}\mathbf{z}$$

subject to

$$\begin{pmatrix} -1 & 0 & \mathbf{c}_1^\top \\ 0 & -1 & \mathbf{c}_2^\top \end{pmatrix}\mathbf{z} \le \mathbf{0}_2\,, \quad \text{for all } \mathbf{c}_1 \in C_1 \text{ and } \mathbf{c}_2 \in C_2\,,$$
$$\begin{pmatrix} 1 & 0 & -\mathbf{y}^\top \\ 0 & -1 & \mathbf{y}^\top \end{pmatrix}\mathbf{z} \le \mathbf{0}_2\,,$$
$$\begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\mathbf{z} = 1\,.$$

The equivalence between M4 and M5 follows from [28]. Unfortunately, just as M4, the new optimization problem M5 has an infinite number of inequality constraints. However, as argued before, when $C_1$ is a convex polytope,

$$\begin{pmatrix} -1 & 0 & \mathbf{c}_1^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for all } \mathbf{c}_1 \in C_1\,,$$

reduces to

$$\begin{pmatrix} -1 & 0 & \mathbf{c}_1^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for each vertex } \mathbf{c}_1 \text{ of } C_1\,.$$

Moreover, when $C_1$ is ellipsoidal, i.e.,

$$C_1 = \{\mathbf{v}_1 + \mathbf{V}_1\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^q \wedge \lVert\mathbf{u}\rVert_2 \le 1\}\,,$$

where $\mathbf{v}_1$ is a $q$-dimensional vector and $\mathbf{V}_1$ a $q \times q$ positive definite matrix,

$$\begin{pmatrix} -1 & 0 & \mathbf{c}_1^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for all } \mathbf{c}_1 \in C_1\,,$$

reduces to

$$\mathbf{v}_1^\top \tilde{\mathbf{z}} + \lVert \mathbf{V}_1 \tilde{\mathbf{z}} \rVert_2 \le z_1\,, \qquad \text{with } \tilde{\mathbf{z}} = (z_3, \ldots, z_{q+2})^\top\,,$$

since $\sup_{\lVert\mathbf{u}\rVert_2 \le 1} (\mathbf{v}_1 + \mathbf{V}_1\mathbf{u})^\top \tilde{\mathbf{z}} = \mathbf{v}_1^\top \tilde{\mathbf{z}} + \lVert \mathbf{V}_1^\top \tilde{\mathbf{z}} \rVert_2$ and $\mathbf{V}_1$ is symmetric.

This inequality is a second-order cone constraint.

Appendix D. Practical guidelines for computing $X_k$

In this appendix, we provide guidelines on how to compute $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2))$ in practice (the infimum can be computed analogously). We distinguish three cases:

(i) $\mathbf{y} \in \hat{C}_1$: $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2)) = 1$;
(ii) $\mathbf{y} \notin \mathrm{conv}(\hat{C}_1 \cup \hat{C}_2)$: $X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2) = \emptyset$;
(iii) $\mathbf{y} \in \mathrm{conv}(\hat{C}_1 \cup \hat{C}_2) \setminus \hat{C}_1$: $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2))$ is the optimal value of M1 (or, equivalently, M4).

Therefore, a practical implementation needs to discover (1) which case applies to a given problem and (2), in case (iii), solve M4.

D.1. Sources approximated by means of polytopes

When $\hat{C}_1$ and $\hat{C}_2$ are polytopes, M4 reduces to a linear program (M5, see Appendix C) that can be solved using general-purpose linear programming solvers. We assume to have been given a set of $m_1$ observed prototype vectors $\{\bar{\mathbf{c}}_i^1\}_{i=1}^{m_1}$ of the first source and $m_2$ observed prototype vectors $\{\bar{\mathbf{c}}_i^2\}_{i=1}^{m_2}$ of the second source, and choose

$$\hat{C}_1 = \mathrm{conv}(\{\bar{\mathbf{c}}_i^1\}_{i=1}^{m_1}) \quad \text{and} \quad \hat{C}_2 = \mathrm{conv}(\{\bar{\mathbf{c}}_i^2\}_{i=1}^{m_2})\,.$$

We can compute $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2))$ as follows:

1. Solve the following linear program to check if $\mathbf{y} \in \mathrm{conv}(\{\bar{\mathbf{c}}_i^1\}_{i=1}^{m_1})$:

minimize over $t \in \mathbb{R}$, $(a_i)_{i=1}^{m_1}$: $\quad t$

subject to

$$-\mathbf{1}_q t \le \mathbf{y} - \sum_{i=1}^{m_1} a_i \bar{\mathbf{c}}_i^1 \le \mathbf{1}_q t\,, \qquad \sum_{i=1}^{m_1} a_i = 1\,, \qquad 0 \le a_i\,.$$

Let $t^*$ be the optimal value of this optimization problem. If $t^* \le \epsilon$ (where $\epsilon$ is a tolerance parameter, for instance $\epsilon = 10^{-4}$), return $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2)) = 1$ (this corresponds to case (i)); else go to step 2.

2. Solve the following linear program to check if $\mathbf{y} \in \mathrm{conv}(\{\bar{\mathbf{c}}_i^1\}_{i=1}^{m_1} \cup \{\bar{\mathbf{c}}_i^2\}_{i=1}^{m_2})$:

minimize over $t \in \mathbb{R}$, $(a_i)_{i=1}^{m_1}$, $(b_i)_{i=1}^{m_2}$: $\quad t$

subject to

$$-\mathbf{1}_q t \le \mathbf{y} - \Big(\sum_{i=1}^{m_1} a_i \bar{\mathbf{c}}_i^1 + \sum_{i=1}^{m_2} b_i \bar{\mathbf{c}}_i^2\Big) \le \mathbf{1}_q t\,, \qquad \sum_{i=1}^{m_1} a_i + \sum_{i=1}^{m_2} b_i = 1\,, \qquad 0 \le a_i, b_i\,.$$

Let $t^*$ be the optimal value of this optimization problem. If $t^* > \epsilon$, return $X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2) = \emptyset$ (this corresponds to case (ii)); else go to step 3.

3. Solve the following optimization problem:

minimize over $\mathbf{z} \in \mathbb{R}^{q+2}$:

$$\begin{pmatrix} 0 & 1 & -\mathbf{y}^\top \end{pmatrix}\mathbf{z}$$

subject to

$$\begin{pmatrix} -1 & 0 & (\bar{\mathbf{c}}_i^1)^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for } i = 1, \ldots, m_1\,,$$
$$\begin{pmatrix} 0 & -1 & (\bar{\mathbf{c}}_i^2)^\top \end{pmatrix}\mathbf{z} \le 0\,, \quad \text{for } i = 1, \ldots, m_2\,,$$
$$\begin{pmatrix} 1 & 0 & -\mathbf{y}^\top \\ 0 & -1 & \mathbf{y}^\top \end{pmatrix}\mathbf{z} \le \mathbf{0}_2\,, \qquad \begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\mathbf{z} = 1\,.$$

The optimal value of this optimization problem yields $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2))$.
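As a concrete illustration of this three-step procedure, here is a minimal Python sketch (our own, not part of the paper) using scipy.optimize.linprog. The function names in_hull and sup_X1 are hypothetical, the tolerance eps mirrors the parameter ε above, and no safeguards for solver failures are included; returning None encodes the empty interval of case (ii).

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(y, points, eps=1e-4):
    """Membership LP of steps 1-2: is y in conv(points)? points has shape (m, q)."""
    m, q = points.shape
    # Variables (t, a_1, ..., a_m); minimize t with |y - P^T a| <= t componentwise.
    c = np.zeros(m + 1)
    c[0] = 1.0
    A_ub = np.vstack([
        np.hstack([-np.ones((q, 1)), points.T]),   #  P^T a - t <= y
        np.hstack([-np.ones((q, 1)), -points.T]),  # -P^T a - t <= -y
    ])
    b_ub = np.concatenate([y, -y])
    A_eq = np.hstack([[0.0], np.ones(m)]).reshape(1, -1)   # sum a_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * m)
    return res.fun <= eps

def sup_X1(y, protos1, protos2, eps=1e-4):
    """Compute sup(X1(y, conv(protos1), conv(protos2))) via steps 1-3."""
    if in_hull(y, protos1, eps):
        return 1.0                                    # case (i)
    if not in_hull(y, np.vstack([protos1, protos2]), eps):
        return None                                   # case (ii): X1 is empty
    q = y.size                                        # case (iii): solve M5
    # Variables z in R^{q+2}; objective (0, 1, -y^T) z.
    c = np.concatenate([[0.0, 1.0], -y])
    rows = [np.concatenate([[-1.0, 0.0], p]) for p in protos1]   # vertices of C1-hat
    rows += [np.concatenate([[0.0, -1.0], p]) for p in protos2]  # vertices of C2-hat
    rows += [np.concatenate([[1.0, 0.0], -y]),
             np.concatenate([[0.0, -1.0], y])]
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))
    A_eq = np.concatenate([[-1.0, 1.0], np.zeros(q)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * (q + 2))
    return res.fun

# Toy usage with q = 2 and hypothetical prototypes:
p1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
p2 = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
print(sup_X1(np.array([1.5, 1.5]), p1, p2))
```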


D.2. Sources approximated by means of ellipsoids

When $\hat{C}_1$ and $\hat{C}_2$ are ellipsoidal, M4 reduces to an SOCP (M5, see Appendix C) that can be solved using general-purpose SOCP solvers. We assume to have been given two ellipsoids:

$$\hat{C}_1 = \{\mathbf{v}_1 + \mathbf{V}_1\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^q \wedge \lVert\mathbf{u}\rVert_2 \le 1\}\,, \qquad \hat{C}_2 = \{\mathbf{v}_2 + \mathbf{V}_2\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^q \wedge \lVert\mathbf{u}\rVert_2 \le 1\}\,.$$

1. If $\mathbf{y} \in \hat{C}_1$ (this can easily be checked), return $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2)) = 1$; else go to step 2.

2. Solve the following optimization problem:

minimize over $\mathbf{z} \in \mathbb{R}^{q+2}$:

$$\begin{pmatrix} 0 & 1 & -\mathbf{y}^\top \end{pmatrix}\mathbf{z}$$

subject to

$$\mathbf{v}_i^\top \tilde{\mathbf{z}} + \lVert \mathbf{V}_i \tilde{\mathbf{z}} \rVert_2 \le z_i\,, \quad \text{for } i = 1, 2\,, \qquad \text{with } \tilde{\mathbf{z}} = (z_3, \ldots, z_{q+2})^\top\,,$$
$$\begin{pmatrix} 1 & 0 & -\mathbf{y}^\top \\ 0 & -1 & \mathbf{y}^\top \end{pmatrix}\mathbf{z} \le \mathbf{0}_2\,, \qquad \begin{pmatrix} -1 & 1 & \mathbf{0}_q^\top \end{pmatrix}\mathbf{z} = 1\,.$$

If the optimal value of this optimization problem is zero, return $X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2) = \emptyset$ (case (ii), see Remark 2); else, the optimal value yields $\sup(X_1(\mathbf{y}, \hat{C}_1, \hat{C}_2))$.
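For completeness, a minimal Python sketch of step 2, using the cvxpy modelling package rather than the Matlab CVX/SDPT3 toolchain used in the paper [21,23]. The function name, the tolerance for detecting a zero optimal value, and the encoding of the empty set as None are our own choices; no solver-failure handling is included.

```python
import cvxpy as cp
import numpy as np

def sup_X1_ellipsoids(y, v1, V1, v2, V2):
    """Solve the SOCP of step 2 for ellipsoidal source approximations.

    y  : mixture vector in R^q
    vi : centre of ellipsoid i; Vi : its q x q shape matrix
    """
    q = y.size
    z = cp.Variable(q + 2)
    zt = z[2:]  # the last q entries of z, written z_3 ... z_{q+2} in the text
    constraints = [
        v1 @ zt + cp.norm(V1 @ zt, 2) <= z[0],  # all c1 in C1: (-1 0 c1^T) z <= 0
        v2 @ zt + cp.norm(V2 @ zt, 2) <= z[1],  # all c2 in C2: (0 -1 c2^T) z <= 0
        z[0] - y @ zt <= 0,                     # (1 0 -y^T) z <= 0
        -z[1] + y @ zt <= 0,                    # (0 -1 y^T) z <= 0
        -z[0] + z[1] == 1,                      # (-1 1 0^T) z = 1
    ]
    prob = cp.Problem(cp.Minimize(z[1] - y @ zt), constraints)
    prob.solve()
    # A zero optimal value corresponds to case (ii): the empty interval.
    return None if abs(prob.value) < 1e-8 else prob.value
```

The infimum can be computed analogously, as noted at the start of this appendix.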
References

[1] N. Keshava, J.F. Mustard, Spectral unmixing, IEEE Signal Process. Mag. 19 (2002) 44–57.
[2] C. Jutten, J. Karhunen, Advances in nonlinear blind source separation, in: Proceedings of the 4th Symposium on Blind Signal Separation, 2003, pp. 245–256.
[3] M.H. van Vliet, G.M.P. van Kempen, M.J.T. Reinders, D. de Ridder, Computational estimation of the composition of fat/oil mixtures containing interesterifications from gas and liquid chromatography data, J. Am. Oil Chem. Soc. 82 (2005) 707–716.
[4] D.C. Heinz, C. Chang, Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 39 (2001) 529–545.
[5] M.H. van Vliet, L.F.A. Wessels, M.J.T. Reinders, Knowledge driven decomposition of tumor expression profiles, BMC Bioinform. 10 (Supplement 1) (2009).
[6] W. Astle, M. De Iorio, S. Richardson, D. Stephens, T. Ebbels, A Bayesian model of NMR spectra for the deconvolution and quantification of metabolites in complex biological mixtures, J. Am. Stat. Assoc. 107 (2012) 1259–1271.
[7] N. Dobigeon, S. Moussaoui, J. Tourneret, C. Carteret, Bayesian separation of spectral sources under non-negativity and full additivity constraints, Signal Process. 89 (2009) 2657–2669.
[8] M.J. Post, Minimum spanning ellipsoids, in: Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, 1984, pp. 108–116.
[9] P. Erdös, I. Vincze, On the approximation of convex, closed plane curves by multifocal ellipses, J. Appl. Probab. 19 (1982) 89–96.
[10] B. Somers, G.P. Asner, L. Tits, P. Coppin, Endmember variability in spectral mixture analysis: a review, Remote Sens. Environ. 115 (2011) 1603–1616.
[11] T. Bajjouk, J. Populus, B. Guillaumont, Quantification of subpixel cover fractions using principal component analysis and a linear programming method: application to the coastal zone of Roscoff (France), Remote Sens. Environ. 64 (1998) 153–165.
[12] I. Couso, D. Dubois, Statistical reasoning with set-valued information: ontic vs. epistemic views, Int. J. Approx. Reason. 55 (2014) 1502–1518.
[13] R.E. Moore, R. Baker Kearfott, M.J. Cloud, Introduction to Interval Analysis, Society for Industrial and Applied Mathematics, 2009.
[14] E. Kriegler, H. Held, Utilizing belief functions for the estimation of future climate change, Int. J. Approx. Reason. 39 (2005) 185–209.
[15] C. Baudrit, I. Couso, D. Dubois, Joint propagation of probability and possibility in risk analysis: towards a formal framework, Int. J. Approx. Reason. 45 (2007) 82–105.
[16] H. Vernieuwe, N.E.C. Verhoest, H. Lievens, B. De Baets, Possibilistic soil roughness identification for uncertainty reduction on SAR-retrieved soil moisture, IEEE Trans. Geosci. Remote Sens. 49 (2011) 628–638.
[17] H. Vernieuwe, N. Verhoest, B. De Baets, K. Scheerlinck, Practical computing with interactive fuzzy variables, Appl. Soft Comput. 22 (2014) 518–527.
[18] S.P. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[19] F.A. Al-Khayyal, Generalized bilinear programming: Part I. Models, applications and linear programming relaxation, Eur. J. Oper. Res. 60 (1992) 306–314.
[20] R. Horst, H. Tuy, Global Optimization: Deterministic Approaches, Springer, 1996.
[21] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.0, http://cvxr.com/cvx, Mar. 2013.
[22] M. Grant, S. Boyd, Graph implementations for nonsmooth convex programs, in: Recent Advances in Learning and Control, 2008, pp. 95–110.
[23] K.C. Toh, M.J. Todd, R.H. Tütüncü, SDPT3 – a Matlab software package for semidefinite programming, version 1.3, Optim. Methods Softw. 11 (1999) 545–581.
[24] L.G. Khachiyan, Rounding of polytopes in the real number model of computation, Math. Oper. Res. 21 (1996) 307–320.
[25] N. Mostagh, Minimum volume enclosing ellipsoids, Tech. rep., University of Pennsylvania, 2009.
[26] K. Shrestha, B. De Meulenaer, Computational estimation of soybean oil adulteration in Nepalese mustard seed oil based on fatty acid composition, Commun. Agric. Appl. Biol. Sci. 76 (2011) 211–214.
[27] J. Nocedal, S.J. Wright, Numerical Optimization, Springer, 1999.
[28] A. Charnes, W.W. Cooper, Programming with linear fractional functionals, Nav. Res. Logist. 9 (1962) 181–186.