Identifying the asymmetry of finite support probability distributions on the basis of the first two moments

Measurement 149 (2020) 106968 Contents lists available at ScienceDirect Measurement journal homepage: www.elsevier.com/locate/measurement Identifyi...


Measurement 149 (2020) 106968

Contents lists available at ScienceDirect

Measurement journal homepage: www.elsevier.com/locate/measurement

Grzegorz Smołalski

Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, PL50-370 Wrocław, Poland

Article info

Article history: Received 19 January 2018; Received in revised form 5 August 2019; Accepted 17 August 2019; Available online 22 August 2019

Keywords: Skewness; Uncertainty interval; Coverage interval

Abstract

The asymmetry of the probability distribution of a measurement result may be recognised, among other methods, on the basis of various numerical characteristics of the distribution. A skewness parameter, although suitable for this task, often converges poorly when it is estimated from a sample. In this paper, the outer assessment of the uncertainty interval for skewness is determined on the basis of knowing the limit points of the distribution support together with its initial two moments. For some of their values the uncertainty interval of the skewness contains solely either significantly negative or significantly positive values. In all such cases the distribution must be asymmetrical. The proposed method for determining the asymmetry of a probability distribution is suitable for cases where no sample data is available and only prior knowledge of the initial two moments is given. Examples are provided of the method's practical applications to asymmetry detection of measurement result distributions. © 2019 Elsevier Ltd. All rights reserved.

E-mail address: [email protected]
https://doi.org/10.1016/j.measurement.2019.106968
0263-2241/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

One of the sub-areas of metrology in which the symmetry of a probability distribution is of importance is the estimation of coverage interval limits. If the probability distribution of the measurement results is Gaussian, or close to it, then the determination of the coverage interval can follow a standard procedure [9] and usually presents no difficulties. If, however, there are doubts regarding the normality of the distribution, an investigation into the actual distribution should be undertaken. For this, a histogram of the results is usually created. During the estimation of coverage intervals those results usually come from a simulation performed using a Monte Carlo (MC) method [7,14]. This numerical, and in principle universal, method also has inconveniences:

a) It requires multiple repetitions of the measurement model evaluation, after which the results have to be sorted. The number of necessary trials is usually of the order of $10^4$ to $10^8$ [7].
b) Because of the very large number of random numbers needed to simulate the changes of the model input quantities, high-quality random generators should be used.

The symbols and abbreviations used in the text are listed in Table 1.

Nowadays, because of the availability of many tools for carrying out the MC method, the above-mentioned inconveniences are no longer severe. The main difficulty that might still be important during the MC simulation is the lack of a direct analytical description of the measurement. An involved description, given in the form of an equation or algorithm and requiring a repeated (of the order of $10^6$) numerical solution of equations, may still be cumbersome or lengthy. In the case of such difficulties, the decision as to whether to undertake a full investigation into the probability distribution of the result should be justified. Therefore, any argument signifying that the framework of the Guide to the Expression of Uncertainty in Measurement (GUM) ([9] and [7]: 5.6–5.8) is not fulfilled would be useful, particularly an argument based on information items which are usually available anyway.

For a random variable $X$ describing measurement results, the assumption that the support of its distribution is infinite is quite typical in measurement science. The support is the set of values for which the probability or the PDF is non-zero. Distributions obtained from theoretical and analytical considerations often have infinite supports. These types of distributions are usually easily manageable analytically. However, from the physical, technical, technological and physiological points of view, all distributions modelling real objects should have finite supports. The values of the bounds of those supports $[x_l, x_u]$ generally come from our prior knowledge. This, in fact, usually means that it was acquired by prior measurements, but often indirectly, i.e., with additional assumptions and calculations [26]. For instance, the bounds of a voltage at some point of an electric circuit can be evaluated on the basis of the measured value of the supplying voltage(s) and the known structure of the circuit. Also, the value of a given property of any product is usually kept in a quite narrow interval determined by the characteristics of a given technological process.


G. Smołalski / Measurement 149 (2020) 106968

Table 1. Symbols and abbreviations in order of their appearance in the text.

MC(M): the Monte Carlo (method)
GUM: Guide to the Expression of Uncertainty in Measurement; cf. also [9]
PDF: a Probability Density Function
$f(\cdot)$: generally means some function, often this is a PDF
$f_{lh}(\cdot)$: a likelihood function
$X$, $x$: a random variable describing the measurement result, and its particular realisation, respectively; in Subsection 6.1, $x$ denotes a normalised measure of the residual time $\tau/T$
$x_l$, $x_u$: the lower and upper bound, respectively, of the random variable $X$
$x(t)$, $v(t)$: numerical functions describing temporal changes of the corresponding quantity $x$ or $v$; temporal signals
$T_{obs}$: an interval of the signal observation
$T$: an actual value of the signal period; in Subsection 6.2, $T$ signifies an absolute temperature
$T_{max}$: the estimation of the period $T$ from above
$x_i$, $i = 1, 2, \ldots$: realisations of different physical quantities
$m_i$: the $i$-th order moment of a random variable about the origin
$m_1$: a mean value
$m_2$: a mean-square value
$m_{01}$, $m_{02}$: some particular values of $m_1$ and $m_2$, respectively
$\mu_i$: the $i$-th order moment about the random variable's mean; a central moment
$\mu_2$: a variance
$\sigma$: a standard deviation of the random variable; $\sigma = \sqrt{\mu_2}$
$\gamma_3$: a skewness, i.e. a standardised central moment of the third order: $\gamma_3 = \mu_3/\sigma^3$
$c_1$, $c_2$: values assumed by the Bernoulli distribution with probabilities $p$ and $q = 1 - p$, respectively; in Eq. (4) the coefficient $c_1$ assumes the values $x_l$ or $x_u$
$LL$, $UL$: the lower and the upper limit of the coverage interval, respectively
$\beta_1$, $\beta_2$: parameters of the beta distribution
$\hat{m}$: the value of a given parameter defined by Eq. (6); the parameter averages a temporal signal $v(t)$ from $t_s$ to $t_s + T_a$
$T_a$: the length of an interval of averaging
$n$: a number of full signal periods included in $T_a$
$\tau$: a residual time of averaging: $\tau = T_a - nT$
$\Delta_{s\,min}$, $\Delta_{s\,max}$: the extremes of the segmentation error PDF's support
$\Delta\varphi$, $\beta$: parameters of the segmentation error distribution; cf. Eqs. (8) and (7)
$T_0$: a value of the period known to an experimenter
$R$: resistance
$T(R)$: dependence of the thermistor absolute temperature on its resistance
$a$, $b$, $c$, $d$: coefficients in a thermistor's Hoge-2 empirical model; cf. Eq. (10)
$\sigma_T$: a standard deviation of a temperature value caused by the limited resolution of a resistance measurement of the thermistor
$\kappa$: the coefficient of asymmetry of the position of the interval $[\gamma_{3\,min}, \gamma_{3\,max}]$ about the origin
$V_r$: a reference value used during a calibration procedure
$\sigma_r$: a standard uncertainty associated with a reference value $V_r$
$V_v$: the indication of the verified instrument obtained for a reference $V_r$
$\delta$: a normalised error of the verified instrument: $\delta = (V_v - V_r)/\sigma_r$

On the other hand, when the investigated realisation of a quantity reveals significant temporal changes $x(t)$, measuring the limits of these changes requires additional knowledge that allows for the evaluation of a time interval $T_{obs}$ after which accepting $x_l = \min_t x(t)$ and $x_u = \max_t x(t)$ becomes credible. For instance, when $x(t)$ is known to be almost periodic, at least an overestimation $T_{max}$ of its period is necessary in order to observe a sufficiently long signal segment $T_{obs} > T_{max}$. At times, the limits of some quantity $x_1$ can be evaluated on the basis of the known limits of other quantities $x_2, x_3, \ldots, x_m$, because the dependence $x_1 = f(x_2, x_3, \ldots, x_m)$ is known. Sometimes the function $f(\cdot)$ is itself limited (cf., e.g., [25], pp. 28–29), which additionally facilitates finding the bounds of $x_1$.

Nevertheless, from the practitioner's point of view, distributions with unlimited supports are acceptable in the not so rare cases when the mean value of the measurand lies sufficiently far from both bounds $x_l$ and $x_u$, i.e., when both differences $m_1 - x_l$ and $x_u - m_1$ are greater than several standard deviations $\sigma$.

One feature signifying that the PDF of the measurement result does not fulfil some assumptions of the GUM uncertainty framework ([7]: 5.6–5.8) is PDF asymmetry. It may be caused by a significant non-linearity of the mathematical model of measurement ([7]: 5.8.1 b), 4.1 and also [9]: G1.5, 4.1) or by a distribution asymmetry of one of the dominating input quantities of the model. The asymmetry of the probability distribution may be recognised both by using specialised functions [6,23] and on the basis of different numerical parameters [12,16,29,44,28], e.g., by comparing the interquartile ranges, or by investigating the distribution skewness [32,47,48] $\gamma_3 = \mu_3/\sigma^3$, where $\mu_3 = m_3 - 3 m_1 m_2 + 2 m_1^3$, $\sigma = \sqrt{\mu_2}$, $\mu_2 = m_2 - m_1^2$, and $m_i$, $i = 1, 2, 3$, stand for the moments about the origin of the first three orders, i.e., the mean, the mean-square value, and the third moment, while $\mu_i$ denote the corresponding central moments. Although well suited to the identification of distribution asymmetry, the skewness is often troublesome in practice, as it is very sensitive to outliers and thus reveals large standard errors when estimated [31].

On the other hand, a sample of the measurement results is sometimes not available to the researcher. The reasons for this may be various. Firstly, the population from which the results are to be drawn may be beyond the reach of the experimenter, or the population may even no longer exist. Secondly, obtaining a sufficiently large sample may be unacceptably costly, e.g., when destructive testing is necessary, or the investigation is dangerous. Thirdly, when samples are to describe some large-scale natural phenomena (e.g., they could come from weather, climate, seismological, geological, or astronomical observations), the period necessary to collect a data set of an adequate volume may be too long. In all such cases one has to content oneself with the moment values that one already has. When a preliminary analysis, like the one proposed in this paper, indicates that the distribution deviates from a Gaussian one, then elaborating a model that quite closely describes the physical process becomes unavoidable. The simulated samples provided by the model, in place of the unavailable real values, constitute a source of some insight into the actual distribution.

The values of the skewness $\gamma_3$ generally depend on $m_1$, $m_2$, and $m_3$. If the values of $x_l$, $x_u$, $m_1$, and $m_2$ are the only known quantities, then $\gamma_3$ cannot be strictly determined. However, the limits $\gamma_{3\,min}$ and $\gamma_{3\,max}$, calculated as a logical consequence of knowing $x_l$, $x_u$, $m_1$, $m_2$, may lie close enough to each other, or have values that render them useful in some practical applications. If, for example, it were known that $\gamma_{3\,max} < -1$, it would signify that, although the shape of the PDF is unknown, it reveals a significant left tail. On the other hand, for $\gamma_{3\,min} > +1$, a heavy right tail would be expected. The purpose of this work is to determine such pairs of moments of the first two orders for which the assumption of distribution symmetry may be directly rejected (see footnote 1).

Footnote 1: From the point of view of further applications, the usage of the central second moment $\mu_2$ (i.e., the variance) instead of $m_2$ would probably be more convenient, since the value of the variance is often directly available in measurements. However, the derivation of the main results of this paper became simpler when both moments about the origin were used (cf. Appendix and Fig. 3). Therefore, the values of $m_1$ and $m_2$ are generally used here. Nevertheless, in all the cases where the substitution $m_2 = \mu_2 + m_1^2$ provides simpler, easier to interpret or directly usable results, the variance $\mu_2$ or the standard deviation $\sigma$ is also used (cf., e.g., Eq. (4), or the application examples in Section 6).
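The relations between the moments about the origin and the skewness quoted in the Introduction, $\mu_2 = m_2 - m_1^2$, $\mu_3 = m_3 - 3 m_1 m_2 + 2 m_1^3$ and $\gamma_3 = \mu_3/\sigma^3$, can be illustrated with a minimal Python sketch. The function names and the two-point test distribution below are assumptions introduced only for illustration; they do not come from the paper.

```python
def skewness_from_raw_moments(m1, m2, m3):
    """Return gamma_3 = mu_3 / sigma**3 from the moments about the origin."""
    mu2 = m2 - m1 ** 2                          # variance, mu_2 = m_2 - m_1^2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3    # third central moment
    if mu2 <= 0.0:
        raise ValueError("the variance m2 - m1**2 must be positive")
    return mu3 / mu2 ** 1.5


def raw_moment(values, probabilities, order):
    """Moment about the origin of a discrete distribution (illustrative helper)."""
    return sum(p * x ** order for x, p in zip(values, probabilities))


# Hypothetical two-point distribution: value 0 with probability 0.8, value 1 with 0.2.
values, probabilities = (0.0, 1.0), (0.8, 0.2)
m1, m2, m3 = (raw_moment(values, probabilities, k) for k in (1, 2, 3))
gamma3 = skewness_from_raw_moments(m1, m2, m3)
print(round(gamma3, 6))  # 1.5, a distribution with a heavy right tail
```

The third moment $m_3$ is needed here, which is exactly the quantity assumed to be unknown in the remainder of the paper.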


The paper is organised in the following way. In Section 2, the method of probability support bounding is introduced. In Section 3 and in the Appendix, the procedure of the $\gamma_{3\,min}$ and $\gamma_{3\,max}$ calculation is described. In Section 4, the total error of the determination of the coverage interval limits is defined and its dependence on the skewness of the probability distribution is shown. Then, in Section 5, sub-sets in which the modulus of the skewness is substantially large are found within the entire set of possible values of $\{m_1, m_2\}$. For these sub-sets, an assumption about the distribution symmetry would lead to unacceptably large errors in determining the coverage interval. Then, in Section 6, three different metrological applications of the method are presented. Finally, in Section 7, various aspects of applying the presented method are discussed.

2. A method of probability support bounding

Besides the MC numerical simulation, properties of a random variable distribution may also be investigated analytically, on the basis of all of the available relevant information. Probably the most complete description of the state of knowledge concerning the value of $\gamma_3$, when the values of $x_l$, $x_u$, $m_1$, $m_2$ are known, would be provided by the a posteriori PDF $f_{posterior}(\gamma_3 \mid m_1 = m_{01}, m_2 = m_{02})$, where $m_{01}$ and $m_{02}$ denote some particular, known values of the corresponding moments. This PDF may be evaluated on the basis of Bayes' theorem which, for the case under consideration, would take the form [21,3,20,8,2,24,18]:

$$f_{posterior}(\gamma_3 \mid m_1 = m_{01}, m_2 = m_{02}) \propto f_{lh}(m_1, m_2 \mid \gamma_3 = \gamma_{03}) \, f_{prior}(\gamma_3) \qquad (1)$$

In the equation given above, the sign $\propto$ means "is proportional to", while the indices lh and prior stand for likelihood and a priori, respectively. Obtaining credible values of the PDFs on the right-hand side of (1) often faces difficulties, some of which have been reported in [38]. Additionally, the assumption that $f_{prior}(\gamma_3)$ is almost constant over the support of $f_{lh}(m_1, m_2 \mid \gamma_3 = \gamma_{03})$, which is referred to as the assumption of dominating likelihood ([3], pp. 21–24) and often simplifies calculations, may not be maintainable. Such a situation takes place, e.g., when the parameters assumed to be known (here $m_1$ and $m_2$) are not sufficiently similar to the parameter which is going to be estimated (here $\gamma_3$).

If one abandons the attempt to discover the particular shape of $f_{posterior}$ and contents oneself with knowing only the limits of the $f_{posterior}$ support, then the application of Bayes' theorem is not necessary. Instead, the proposed method of PDF support propagation may be used. The essence of this method is to search for the global extrema of the estimated parameter ($\gamma_3$) under the constraints described by the available knowledge ($m_1 = m_{01}$, $m_2 = m_{02}$). The method of support propagation may also be considered as the first stage of a full investigation into the posterior PDF. Sometimes, this initial stage may prove sufficient.

The starting point of this method is to assume that the probability distribution of the random variable describing experimental results has a finite support. For such a random variable the moments of its distribution, as long as they exist, are limited as well. In practice, the limit points of the moments are of interest. For instance, if a random variable $X$ takes values from the interval $[x_l, x_u]$, the mean value of this variable belongs to the same interval, $m_1 \in [x_l, x_u]$, whereas the mean-square value takes values between the limit points [38]:

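$$m_{2\,min} = \begin{cases} \min\left(x_l^2, x_u^2\right) & \text{for } x_l x_u > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad m_{2\,max} = \max\left(x_l^2, x_u^2\right). \qquad (2)$$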
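The bounds of Eq. (2) depend on the support limits only: the largest mean-square value is reached at the support end farther from zero, and the smallest is zero whenever the support contains the origin. A minimal sketch follows; the function name is an assumption made for this illustration.

```python
def mean_square_bounds(x_l, x_u):
    """Limits (m2_min, m2_max) of the mean-square value for a support [x_l, x_u]."""
    if x_l > x_u:
        raise ValueError("x_l must not exceed x_u")
    m2_max = max(x_l ** 2, x_u ** 2)
    # If both bounds have the same sign, zero lies outside the support
    # and the mean-square value cannot fall below the smaller squared bound.
    m2_min = min(x_l ** 2, x_u ** 2) if x_l * x_u > 0 else 0.0
    return m2_min, m2_max


print(mean_square_bounds(-2.0, 3.0))  # support containing zero
print(mean_square_bounds(1.0, 3.0))   # support entirely on the positive side
```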


If $x_l$, $x_u$, and $m_1$ are known, the limits of the interval for $m_2$ are given by [38]:

$$m_{2\,min} = m_1^2, \qquad m_{2\,max} = m_1 (x_l + x_u) - x_l x_u, \qquad (3)$$

and bounds for $\gamma_3$ should be looked for only for points from this area.

3. Bounding surfaces of the skewness when the mean and the mean-square values are known

In the case under consideration, where random variables have bounded supports, the most extreme values of the central moments are revealed by discrete distributions. For such distributions, the point masses of probability are significantly distant from each other, thus providing relatively large moment values. The limit values of the skewness, $\gamma_{3\,min}$ and $\gamma_{3\,max}$, are reached by random variables with a Bernoulli (i.e., two-point) distribution. For such random variables the limit values of the skewness were found to be (for details cf. the Appendix):

$$\gamma_{3\,min/max} = \frac{c_1^2 - 2 c_1 m_1 + 2 m_1^2 - m_2}{(c_1 - m_1)\sqrt{m_2 - m_1^2}} = \frac{c_1 - m_1}{\sigma} - \frac{\sigma}{c_1 - m_1}, \qquad (4)$$

where $c_1$ is equal to $x_l$ for $\gamma_{3\,min}$ or to $x_u$ for $\gamma_{3\,max}$, while $\sigma$ stands for the standard deviation, as previously. The obtained limits are in accordance with simplified evaluations known for standardised, finite support random variables [33]. Unfortunately, as they are general, the limits (4) are wider than those known for unimodal distributions [40]. Exemplary limits $\gamma_{3\,min}$ and $\gamma_{3\,max}$ are presented in Fig. 1. The cross sections of the bounding surfaces (4) are given there for several values of $m_1$. For instance, if $m_1 = 1.5$ and $m_2 = 3.5$, the values of $\gamma_{3\,min}$ and $\gamma_{3\,max}$ depicted in Fig. 1 constitute the outer assessment of the skewness uncertainty interval. It could be thought of as the coverage interval corresponding to a coverage probability equal to one ([42], entries 2.36 and 2.37). It is also seen in Fig. 1 that for some pairs $\{m_1, m_2\}$ one can be certain that the skewness is substantially positive or negative. In such cases, the assumption that the probability distribution is, e.g., Gaussian or Student's t would result in an erroneous evaluation of the coverage interval limits. To investigate this problem quantitatively, the influence of the PDF skewness on the error of the coverage interval determination must first be found.

4. An illustrative example of the total error of the coverage interval determination caused by probability distribution asymmetry

In cases where the probability distribution of the measurement results is asymmetric, the limits of the coverage interval lie asymmetrically in relation to the obtained result even if determined in a probabilistically symmetric manner ([7], entry 3.15). If, despite the actual PDF asymmetry, the assumption of the distribution similarity to the normal distribution is maintained, then the corresponding limits of the coverage interval are determined with the following errors: $LL_N - LL_{act}$ and $UL_N - UL_{act}$, where the symbols $LL$ and $UL$ designate the lower and upper limits, respectively, while the indices $N$ and $act$ mean that the limit was determined for the quasi-normal or for the actual distribution of the results. Then, the total relative error of the coverage interval limits may be expressed as:

$$\delta_L = \frac{\left|LL_N - LL_{act}\right| + \left|UL_N - UL_{act}\right|}{UL_{act} - LL_{act}}. \qquad (5)$$

Fig. 1. The cross sections of the bounding surfaces of the skewness, $\gamma_{3\,min}$ and $\gamma_{3\,max}$, calculated according to Eq. (4) for the exemplary values of $x_l = -2$ and $x_u = 3$ and for the given values of $m_1$. The case of $m_1 = 1.5$ is elaborated in the text.

To investigate the influence of the PDF asymmetry on the error (5), some exemplary distribution must be chosen in which the skewness may be changed over a relatively wide range by altering, preferably, a single parameter. However, to make the analysis selective, the distribution should be constructed so as to maintain constant values of the mean, of the variance, and also of the kurtosis $\mu_4/\sigma^4$ while the skewness is changed. As an example of a Pearson distribution with a finite support, the two-parameter beta distribution was chosen. Its PDF reads: $f_U(u; \beta_1, \beta_2) = K u^{\beta_1} (1 - u)^{\beta_2}$, where $K$ is the normalisation constant depending on both $\beta_1$ and $\beta_2$. To keep the kurtosis constant while the skewness is being changed, the coefficient $\beta_2$ was made dependent on $\beta_1$ so as to maintain $\mu_4/\sigma^4 = 3$, i.e., a kurtosis equal to that of the Gaussian distribution. Then, finally, the distribution was centred and normalised to yield $m_1 = 0$ and $\sigma = 1$. Thus, the skewness of the beta distribution was changed in such a manner as to always maintain the values of the mean, the variance, and the kurtosis identical to those of the normalised Gaussian distribution. Equality of these three moments preserves the visual similarity of the PDFs of both considered distributions. Their substantial difference (asymmetry and a finite support of the beta distribution vs symmetry and an infinite support of the Gaussian) is reflected in different values of their higher order moments.

Then, for the PDF constructed in this way, which has a controllable value of the skewness, the quantiles of the orders $(1 - p_c)/2$ and $(1 + p_c)/2$ were calculated numerically for two values of the coverage probability, $p_c = 0.95$ and $p_c = 0.99$. Those quantiles constitute the actual limits $LL_{act}$ and $UL_{act}$ of the coverage interval for the asymmetrical distribution. Subsequently, values of the error (5) were found. While doing this, the following limits for the Gaussian distribution were used: $-LL_N = UL_N \approx 1.95996$ for $p_c = 0.95$ or $2.57583$ for $p_c = 0.99$. Fig. 2 presents the obtained values of the error for different values of skewness.
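The procedure behind the error (5) can be sketched in a few lines. The Gaussian limits are taken from the standard library; the skewed test distribution is a standardised exponential one, chosen here only because its quantiles have a closed form. It is an assumption of this illustration and is not the kurtosis-matched beta distribution used for Fig. 2, so the resulting number is not comparable with that figure.

```python
import math
from statistics import NormalDist


def gaussian_limits(p_c):
    """Symmetric coverage limits (LL_N, UL_N) of the standard normal distribution."""
    z = NormalDist().inv_cdf((1.0 + p_c) / 2.0)
    return -z, z


def total_relative_error(ll_act, ul_act, p_c):
    """Total relative error of Eq. (5) when normality is wrongly assumed."""
    ll_n, ul_n = gaussian_limits(p_c)
    return (abs(ll_n - ll_act) + abs(ul_n - ul_act)) / (ul_act - ll_act)


def standardised_exponential_quantile(p):
    """Quantile of an exponential distribution shifted to zero mean and unit
    standard deviation (hypothetical example, skewness equal to 2)."""
    return -math.log(1.0 - p) - 1.0


p_c = 0.95
ll_act = standardised_exponential_quantile((1.0 - p_c) / 2.0)
ul_act = standardised_exponential_quantile((1.0 + p_c) / 2.0)
delta_l = total_relative_error(ll_act, ul_act, p_c)
print(f"delta_L = {100.0 * delta_l:.1f} %")
```

The same two functions can be reused with quantiles of any other zero-mean, unit-variance distribution supplied by the reader.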

Note that, if the limits of the coverage interval are to be determined with a total error of less than 10%, for example, the modulus of the distribution skewness should be $|\gamma_3| < 0.37$ for the coverage probability $p_c = 0.95$ or $|\gamma_3| < 0.26$ for $p_c = 0.99$ (please refer to Fig. 2). If greater errors $\delta_L \le 20\%$ are acceptable, then the skewness moduli must be: $|\gamma_3| < 0.63$ for $p_c = 0.95$ or $|\gamma_3| < 0.48$ for $p_c = 0.99$. The above values of $\delta_L$ cannot be regarded as maximum or minimum. The choice of the beta distribution proved to be convenient for modelling asymmetrical distributions with limited supports. It is also asymptotically normal (for large values of the $\beta_1$ and $\beta_2$ coefficients). So the orders of magnitude of the obtained values of $\delta_L$ are likely to be typical. Summarising, a modulus of $\gamma_3$ even less than unity may already signify the necessity of a detailed distribution investigation. Therefore, a question arises whether such small asymmetries can be recognised solely on the basis of knowing the values of $x_l$, $x_u$, $m_1$, and $m_2$.

5. The pairs of moment values for which distribution investigation is necessary

Fig. 1 shows that for some pairs $\{m_1, m_2\}$ the range of possible values of the skewness $\gamma_3$ is situated entirely above or below zero. Sometimes, this allows recognition of distribution asymmetry. For instance, if for a given pair of $\{m_1, m_2\}$ values $\gamma_{3\,min} = +0.5$ is observed, then a heavy right tail of the distribution can be expected, for which both limits of the coverage interval are shifted to the right in relation to those that would be obtained for a Gaussian or Student's t-distribution. For example, in the region of all possible values of $\{m_1, m_2\}$ bounded by (3), contour lines have been determined for $\gamma_{3\,min} = 0, 0.25, 0.5, 1$ and also for $\gamma_{3\,max} = 0, -0.25, -0.5, -1$ (see the shadowed areas in Fig. 3). In all those areas the sign of the skewness is strictly determined. The areas between the contour lines correspond to the pairs $\{m_1, m_2\}$ for which $\gamma_{3\,min} \in (0, 0.25), (0.25, 0.5), (0.5, 1), (1, \infty)$, where $\gamma_3$ must be positive, or $\gamma_{3\,max} \in (-\infty, -1), (-1, -0.5), (-0.5, -0.25), (-0.25, 0)$, where $\gamma_3$ must be negative. Minimum values of the error $\delta_L$ based on Eq. (5) for all those sub-areas may be found in Fig. 2. For example, in the regions corresponding to skewness moduli greater than 0.5, the errors $\delta_L$ are greater than 14% for $p_c = 0.95$ and greater than 21% for $p_c = 0.99$. Such large values of the error $\delta_L$ are usually unacceptable, which means that the assumptions about the distribution symmetry, and thus also about its normality, are both erroneous and that a detailed investigation of the distribution is necessary. It can also be seen in Fig. 3 that the percentage of the $\{m_1, m_2\}$ pairs for which the necessity of a distribution investigation may be decided directly is relatively high. For the shadowed areas of Fig. 3 it varies from 15% to 50%.

The right-hand side, symmetrical form of Eq. (4) provides an additional way of understanding when a skewed PDF should be expected. For this, a normalised difference $u = (c_1 - m_1)/\sigma$ between the upper ($c_1 = x_u$) or lower ($c_1 = x_l$) limit value and the mean should be considered. For the evaluation of $\gamma_{3\,min}$ the value of $u$ is negative, while for $\gamma_{3\,max}$ it is positive. An investigation of the function $f(u) = u - 1/u$ shows that $\gamma_{3\,min} > 0$ when $u > -1$, whereas $\gamma_{3\,max} < 0$ when $u < +1$. This means that in both these cases of skewed PDFs, our evaluation of the distribution support limit $x_l$ or $x_u$ should differ from the mean value by less than the standard deviation $\sigma$.

Fig. 2. The errors (5) of the evaluation of the coverage interval's limits for an asymmetrical PDF revealing the skewness $\gamma_3$ when the assumption of the distribution normality is accepted.

6. Applications

6.1. Investigation of distribution asymmetry of the segmentation error

Averaged parameters of periodic signals are frequently measured in the time domain as [39]:

$$\hat{m} = \frac{1}{T_a} \int_{t_s}^{t_s + T_a} \varphi[v(t)] \, dt, \qquad (6)$$

where $\varphi[\cdot]$ describes the type of the measured parameter of the voltage signal $v(t)$, while $t_s$ and $T_a$ denote a starting instant and an averaging time, respectively. Such a procedure yields repeatable and accurate results when performed for an integer multiple of the real period $T$ of the investigated signal, i.e. $T_a = nT$. On the other hand, when $T_a \ne nT$, the averaging (6) provides results that depend on $t_s$ and usually differ from the proper value $m$ of the measured parameter. The difference $\Delta_s = \hat{m} - m$, where $\hat{m}$ denotes the value of (6) obtained for $T_a \ne nT$, while $m$ is the value obtained for $T_a = nT$ or $T_a \to \infty$, is called a segmentation error.

In situations where our knowledge of the period $T$ of the investigated phenomenon is scanty, for instance when it is merely known that $T < T_{max}$, and the instrumentation used neither detects the period nor controls the starting instant $t_s$ in (6) to be always in the same phase of the measured phenomenon, the obtained results $\hat{m}$ are usually not repeatable. The probability distribution revealed by $\hat{m}$ depends both on the distribution of the instants $t_s$ and on the shape of the function $\varphi[v(t)]$. For signals with a limited range of variability, the support of the segmentation error PDF is limited as well. However, the possibility of evaluating the segmentation error bounds depends directly on our ability to assess the variability limits of the function $\varphi[v(t)]$ integrated in (6):

$$\varphi_{min} \doteq \min_{t \in [0, T)} \varphi[v(t)], \qquad \varphi_{max} \doteq \max_{t \in [0, T)} \varphi[v(t)]. \qquad (7)$$
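The dependence of the averaged value (6) on the starting instant $t_s$ when $T_a \ne nT$ can be shown with a small numerical sketch. The midpoint-rule integration, the sinusoidal test signal and the identity function used for $\varphi$ are assumptions chosen for illustration only.

```python
import math


def averaged_parameter(phi, v, t_s, t_a, steps=20000):
    """Midpoint-rule approximation of the time average of phi[v(t)]
    over the interval [t_s, t_s + t_a]."""
    h = t_a / steps
    total = sum(phi(v(t_s + (i + 0.5) * h)) for i in range(steps))
    return total * h / t_a


PERIOD = 1.0                                          # real period T (hypothetical)
signal = lambda t: math.sin(2.0 * math.pi * t / PERIOD)
identity = lambda value: value                        # phi: plain mean value

# Averaging over an integer number of periods: the result does not depend on t_s.
exact = [averaged_parameter(identity, signal, t_s, 3.0 * PERIOD) for t_s in (0.0, 0.2, 0.5)]

# Averaging over 3.25 periods: the result changes with the starting instant.
biased = [averaged_parameter(identity, signal, t_s, 3.25 * PERIOD) for t_s in (0.0, 0.5)]

print(exact)
print(biased)
```

Since the proper mean value of the sine is zero, the values in `biased` are direct realisations of the segmentation error for two starting instants.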

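For a uniform distribution of the starting instants over $t_s \in [0, T)$, the limit values of the segmentation error reveal a distribution that is uniform as well, but has two Dirac deltas at the extremes [37,35]. The expected value of $\Delta_s$ is then zero. The extremes of the support of the segmentation error PDF, which are important here, equal:

$$\Delta_{s\,min} = \begin{cases} -\Delta\varphi \, \dfrac{\beta x}{n + x} & \text{for } x \in [0, 1 - \beta) \\[2mm] -\Delta\varphi \, \dfrac{(1 - \beta)(1 - x)}{n + x} & \text{for } x \in [1 - \beta, 1) \end{cases} \qquad \Delta_{s\,max} = \begin{cases} \Delta\varphi \, \dfrac{(1 - \beta) x}{n + x} & \text{for } x \in [0, \beta) \\[2mm] \Delta\varphi \, \dfrac{\beta (1 - x)}{n + x} & \text{for } x \in [\beta, 1) \end{cases} \qquad (8)$$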
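The piecewise extremes of Eq. (8) can be written as a short function. The sketch assumes that the lower extreme is non-positive and the upper one non-negative, as required by a zero expected value of the segmentation error; the function and argument names are assumptions of this illustration.

```python
def segmentation_error_extremes(delta_phi, beta, x, n):
    """Extremes (ds_min, ds_max) of the segmentation error support.

    delta_phi : phi_max - phi_min
    beta      : relative position of the averaged value, 0 <= beta <= 0.5
    x         : normalised residual time tau / T, 0 <= x < 1
    n         : number of full periods contained in the averaging time
    """
    if not (0.0 <= beta <= 0.5 and 0.0 <= x < 1.0 and n >= 1):
        raise ValueError("parameters outside the range assumed here")
    if x < 1.0 - beta:
        ds_min = -delta_phi * beta * x / (n + x)
    else:
        ds_min = -delta_phi * (1.0 - beta) * (1.0 - x) / (n + x)
    if x < beta:
        ds_max = delta_phi * (1.0 - beta) * x / (n + x)
    else:
        ds_max = delta_phi * beta * (1.0 - x) / (n + x)
    return ds_min, ds_max


# Hypothetical values: unit range of phi, beta = 1/4, three full periods.
for x in (0.0, 0.1, 0.5, 0.9):
    print(x, segmentation_error_extremes(1.0, 0.25, x, 3))
```

Both extremes vanish for $x = 0$, and the support is asymmetric about zero for small $x$ whenever $\beta \ne 1/2$.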


Fig. 3. Regions in the $\{m_1, m_2\}$ plane in which the modulus of the skewness $|\gamma_3|$ is greater than 0, 0.25, 0.5, and 1. The darker the region's shading, the more skewed the probability distribution. In this example $x_l = -2$ and $x_u = 3$.
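The share of admissible $\{m_1, m_2\}$ pairs for which the sign of the skewness is certain can be estimated on a grid. The sketch below assumes that this share is understood as an area fraction of the region $m_1^2 < m_2 < m_1(x_l + x_u) - x_l x_u$ in the $\{m_1, m_2\}$ plane, and it uses the limits in the form $u - 1/u$ with $u = (c_1 - m_1)/\sigma$; the function names and the grid resolution are assumptions of this illustration.

```python
import math


def skewness_limits(x_l, x_u, m1, m2):
    """Skewness limits (gamma3_min, gamma3_max) in the form u - 1/u."""
    sigma = math.sqrt(m2 - m1 ** 2)
    u_l = (x_l - m1) / sigma
    u_u = (x_u - m1) / sigma
    return u_l - 1.0 / u_l, u_u - 1.0 / u_u


def certain_asymmetry_fraction(x_l, x_u, threshold, grid=400):
    """Fraction of admissible {m1, m2} grid points for which the skewness
    is certainly above +threshold or certainly below -threshold."""
    m2_top = max(x_l ** 2, x_u ** 2)
    feasible = certain = 0
    for i in range(grid):
        m1 = x_l + (i + 0.5) * (x_u - x_l) / grid
        m2_low = m1 ** 2
        m2_high = m1 * (x_l + x_u) - x_l * x_u
        for j in range(grid):
            m2 = (j + 0.5) * m2_top / grid
            if not m2_low < m2 < m2_high:
                continue                      # pair not admissible for this support
            feasible += 1
            g_min, g_max = skewness_limits(x_l, x_u, m1, m2)
            if g_min > threshold or g_max < -threshold:
                certain += 1
    return certain / feasible


for threshold in (0.0, 1.0):
    share = certain_asymmetry_fraction(-2.0, 3.0, threshold)
    print(f"|gamma3| > {threshold}: {100.0 * share:.0f} % of admissible pairs")
```

Under the area-fraction assumption the two thresholds give values close to the 50% and 15% quoted for the shadowed areas of Fig. 3.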

All properties of the segmentation error distribution depend on three parameters: on the product $\beta \, \Delta\varphi$, on the dimensionless parameter $x = \tau/T$, and on the number $n$ of averaged periods. The coefficient $\beta$ describes the position of the averaged parameter $m$ in relation to the values $\varphi_{min}$ and $\varphi_{max}$ of (7): $\beta = (m - \varphi_{min})/(\varphi_{max} - \varphi_{min})$; $\Delta\varphi = \varphi_{max} - \varphi_{min}$. The extremes (8) correspond to $\beta \in [0, 0.5)$. The residual time of averaging $\tau$ denotes the interval by which the averaging time $T_a$ exceeds an integer multiple of the real period $T$, i.e. $T_a = nT + \tau$. The coefficient $x \in [0, 1)$ constitutes a normalised measure of the residual time.

It is worth considering how the residual time $\tau$ depends on our knowledge of the value of the period $T$ of the investigated signal $v(t)$. The real value of $T$ is assumed here to be practically constant during the measurement but different from its known value $T_0$ by some value $\Delta T > 0$, usually a small part of $T$. Let, for instance, $T = T_0 - \Delta T$, i.e. the known value $T_0$ of the period overestimates its real value $T$. In this case, an attempt to make $T_a$ an integer multiple of the known value $T_0$ leads to an averaging that is too long, since $T_a = nT_0 = nT + n\Delta T$. The residual time then equals $\tau = n\Delta T$, and is further assumed to be still smaller than $T$. On the other hand, when the real period is longer than its known value $T_0$, i.e. $T = T_0 + \Delta T$, then $T_a = nT_0 = (n - 1)T + (T - n\Delta T)$. Now, for sufficiently small values of $n\Delta T$, the residual time $\tau = T - n\Delta T$ becomes close to $T$, while its normalised value $x$ becomes close to unity.

On the basis of the distribution parameters (8), the coefficient of skewness $\gamma_3$ for the extreme values of the segmentation error was calculated and plotted for some exemplary values of $\beta$ (Fig. 4, the inner line for each value of $\beta$). In practice, we are frequently not sure whether the known value $T_0$ of the signal period constitutes an over- or underestimation of its real value $T$. As discussed above, a transition of $T_0$ from overestimation to underestimation corresponds to the normalised residual time $x$ passing from values close to zero to values close to unity. A comparison of the formulas (8) with the plots in Fig. 4 shows that as $x$ approaches zero, both extreme values $\Delta_{s\,min}$ and $\Delta_{s\,max}$ of the segmentation error vanish, but the asymmetry of the PDF becomes the strongest. Moreover, a sharp, stepwise change of the sign of $\gamma_3$, corresponding to a transition of $x$ from 0 to 1, signifies a change in the character of the asymmetry.
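The two cases discussed above, a known period $T_0$ that over- or underestimates the real period $T$, can be checked with a few lines of code. The function name and the numerical values are hypothetical; the residual time is computed directly from its definition $T_a = (\text{number of full periods}) \cdot T + \tau$ with $T_a = nT_0$.

```python
import math


def residual_time(t_known, t_real, n):
    """Return (full_periods, tau, x) for an averaging time T_a = n * t_known,
    where tau is the residual time and x = tau / t_real its normalised value."""
    t_a = n * t_known
    full_periods = math.floor(t_a / t_real)
    tau = t_a - full_periods * t_real
    return full_periods, tau, tau / t_real


T0, DELTA_T, N = 1.0, 0.01, 5       # hypothetical known period, deviation, multiple

# T0 overestimates the real period: T = T0 - dT, so tau = n * dT and x is near 0.
print(residual_time(T0, T0 - DELTA_T, N))

# T0 underestimates the real period: T = T0 + dT, so tau = T - n * dT and x is near 1.
print(residual_time(T0, T0 + DELTA_T, N))
```

The jump of $x$ from a value near zero to a value near unity is the transition that accompanies the stepwise sign change of the skewness described in the text.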

Then, the limits c3 min ; c3 max were found for the distribution skewness. For this purpose, in the formula (4) the limits Ds min and Ds max and also the value of the standard deviation r were used, all calculated on the basis of a known, complete distribution of the segmentation error extreme values, [37,35]. Because the PDF’s asymmetry coefficient c3 is dimensionless, its limit values c3 min and c3 max depend solely on coefficients x and b, both of which are dimensionless as well. In Fig. 4, the lower and upper draws for each value of b describe just the limit values of c3 for each particular value of the residual time x. It can be seen from the graph that the range of x values for which PDF must be asymmetric, i.e. when c3 min > 0 or c3 max < 0, is the broader, the smaller is the value of b, i.e. the closer the measured value m lies to the value of umin . For b ¼ 1=2 the distribution asymmetry of the segmentation error can no longer be ascertained with certainty for any value of x. Subsequently, the possibility of detecting the PDF’s asymmetry was verified for signals for which the value of b is neither close to zero nor to unity, but also not very close to 1=2. Instead of choosing a random example, an important class of band-limited signals, was considered. Obtaining the values of the averaged parameter m very close to any extreme umin , or umax (7), requires that u½v ðt Þ in (6) must assume the shape of a narrow pulse. Such signals occupy a wide range of frequencies. Because signals we come across in practice often have a constrained bandwidth, their averaged parameters must lie somewhere in a central subinterval of the range ½umin ; umax . The PDF skewness c3 for a segmentation error of such signals may assume values of both signs which makes asymmetry detection less obvious (Fig. 4). Moreover, for band limited signals, the real limits of the segmentation error are usually closer to zero than the extremes (8). 
As a result, the obtainable limits (4) of the skewness are wider than those actually possible for such signals.

Fig. 4. The skewness $\gamma_3$ of the distribution of the segmentation error extreme values vs. the normalised residual time $x$. For each value of $b$, the inner line represents the real value of $\gamma_3$, while the upper and lower lines represent the limits $\gamma_{3\max}$ and $\gamma_{3\min}$ evaluated from (4).

As an example, a signal of limited bandwidth containing only the first three harmonics was considered. From this class of signals, the one revealing the highest possible asymmetry of the placement of its averaged value between $\varphi_{\min}$ and $\varphi_{\max}$ was selected. As presented in [36], the value of $b$ obtainable by signals containing only the first $m$ non-zero harmonics is bounded to the range $b \in \left[\frac{1}{m+1}, \frac{m}{m+1}\right]$, which for $m = 3$ yields $b \in \left[\frac{1}{4}, \frac{3}{4}\right]$. A composed function $\varphi[v(t)]$ revealing an extreme value of $b = \frac{1}{4}$ or $b = \frac{3}{4}$ should have the structure

$$\varphi[v(t)] = \sum_{k=1}^{m} \frac{m-k+1}{m}\,\sin\!\left(k\omega t - (k-1)\frac{\pi}{2}\right),$$

where $\omega = 2\pi/T$. The segmentation error for such a function,

$$\Delta_s = \hat{m} - m = \frac{1}{T_a}\int_{t_s}^{t_s+\tau} \varphi[v(t)]\,dt,$$

depends not only on $t_s$ but also on $\tau$ and on the number $n$ of averaged periods. For given values of $\tau$ and $n$, the function $\Delta_s(t_s)$ transforms the uniform distribution of $t_s$ into the sought distribution of the segmentation error. For the example considered, a conventional analytical evaluation of the $\Delta_s$ distribution would be rather involved, because the transforming function $\Delta_s(t_s)$ has as many as six local extrema. Therefore, a Monte Carlo method was used here to visualise the distribution. A histogram obtained for $10^6$ sample values is presented in Fig. 5. The histogram has the form of two U-shaped functions, one inside the other, both visibly asymmetrical. To verify the effectiveness of the proposed method of PDF asymmetry identification, the limits of the skewness were evaluated on the basis of formula (4):

$$\gamma_{3\min} \approx 1.026, \qquad \gamma_{3\max} \approx 1.287. \tag{9}$$

These values, obtained without any use of the Monte Carlo method, clearly indicate the right tail of the PDF ($\gamma_{3\min} > 0$). In the assessment (9), the extreme values (8) of the segmentation error were applied. In them, the real values of the bounds of the function $\varphi[v(t)]$ were used, and also the real value of $\sigma$, which is taken as known by assumption. In practice, however, only approximate assessments of these values are usually available. Nevertheless, since the difference $\gamma_{3\max} - \gamma_{3\min}$ is here a relatively small fraction of $\gamma_{3\min} > 0$, a small overestimation of $\Delta\varphi$ in formulae (8) is acceptable and should not preclude a certain identification of the PDF asymmetry. This example illustrates that, for periodic signals, amplitude constraints, even relatively loose ones, give rise to a finite support of the resulting segmentation error. The function $\Delta_s(t_s)$, relating the segmentation error to the instant $t_s$ at which averaging starts, is not only strongly non-linear but even reveals multiple local extrema. In consequence, the PDF of the segmentation error not only has finite support but is often asymmetrical.
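The three-harmonic example can be probed numerically with a short sketch. It is only an illustration under stated assumptions, not the author's procedure: the normalised residual time is taken as $x = \tau/T$, the averaging time as $T_a = (n + x)T$, the phase term of $\varphi[v(t)]$ is used with a minus sign, and a uniform grid of start instants $t_s$ stands in for Monte Carlo sampling. All function names are hypothetical.

```python
import math

def phi_antiderivative(t, m=3, T=1.0):
    """Antiderivative of phi(t) = sum_k (m-k+1)/m * sin(k*w*t - (k-1)*pi/2)."""
    w = 2.0 * math.pi / T
    return sum(
        -(m - k + 1) / m / (k * w) * math.cos(k * w * t - (k - 1) * math.pi / 2)
        for k in range(1, m + 1)
    )

def segmentation_error(t_s, x=0.06, n=3, m=3, T=1.0):
    """Delta_s(t_s); tau = x*T and T_a = (n + x)*T are assumptions of this sketch."""
    tau = x * T
    T_a = (n + x) * T
    return (phi_antiderivative(t_s + tau, m, T) - phi_antiderivative(t_s, m, T)) / T_a

def skewness(values):
    """Sample skewness mu_3 / mu_2**1.5 (population form)."""
    count = len(values)
    mean = sum(values) / count
    mu2 = sum((v - mean) ** 2 for v in values) / count
    mu3 = sum((v - mean) ** 3 for v in values) / count
    return mu3 / mu2 ** 1.5

# t_s uniformly distributed over one period, represented by a midpoint grid.
N = 20000
errors = [segmentation_error((i + 0.5) / N) for i in range(N)]
gamma3 = skewness(errors)
print(f"skewness of the segmentation error: {gamma3:.3f}")
```

Because the skewness is scale-invariant, the assumed value of $T_a$ does not affect the result; under these assumptions the computed skewness can be compared directly with the limits (9).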

Identification of this asymmetry with the aid of the values of $m_1$ and $m_2$ (or, equivalently, $m_1$ and $\sigma$), which are typically used for uncertainty assessment, is possible and much simpler than employing any conventional method.

6.2. Distribution asymmetry caused by the non-linearity of the measurement function

The purpose of this example is to show that the proposed method is able to detect the asymmetry of a measurement result PDF for which the skewness $|\gamma_3|$ is of the order of 0.5, thus signalling the need for a closer investigation of this PDF. A univariate transfer function of a thermistor was used as an example of a non-linear measurement function. For temperature measurement, the Hoge-2 empirical equation [13,4]

$$1/T = a + b\ln(R) + c\ln^2(R) + d\ln^3(R) \tag{10}$$

constitutes a widely accepted model of a thermistor's $T(R)$ dependence, where $T$ denotes, in this example, the temperature in kelvin. To obtain numerical values of $\gamma_3$, the coefficients $a, \ldots, d$ in (10) were computed for a glass-encapsulated thermistor of $R_{\mathrm{nom}} = 20\ \mathrm{k\Omega}$ at 25 °C, specifically for temperature measurements ranging from −80 °C to +300 °C [10]. The values obtained were: $a \approx 0.001161$, $b \approx 0.0002124$, $c \approx -1.061 \times 10^{-7}$, $d \approx 1.029 \times 10^{-7}$. Putting them into (10) provides a complete $T(R)$ function for the thermistor. Then, the end points of the range of temperatures to be measured were assumed to be −50 °C and +300 °C, for which the resistances of the considered thermistor are 1 646 200 Ω and 15.55 Ω, respectively. If resistances from this interval are to be measured by a single-range ohmmeter able to resolve 100 000 resistance values (i.e., its analog-to-digital converter has a 17-bit resolution), then a resolution of $\Delta R = 20\ \Omega$ is obtained. The most asymmetrical PDFs of the temperature results are obtained for temperatures greater than 240 °C, which correspond to thermistor resistances less than 40 Ω. For such small values the most important part of the ohmmeter uncertainty is its additive component, caused mainly by the limited resolution of the device. A uniform probability distribution is usually assumed for the resolution-caused uncertainty component. Therefore, for a resistance result of 40 Ω, the interval (20 Ω, 60 Ω) constitutes the support of the resistance distribution. Then, the corresponding support for temperatures is $(T(60\ \Omega), T(20\ \Omega)) \approx (491.3\ \mathrm{K}, 556.0\ \mathrm{K})$, whose end points may be used as $x_l$ and $x_u$ in (4).

Fig. 5. A histogram of the segmentation error values for the function $\varphi[v(t)]$ in (6) containing solely the first three harmonics and revealing the smallest possible value of $b = 1/4$; an example for $n = 3$ periods, a normalised residual time $x = 0.06$, and $10^6$ samples.

To evaluate the limits $\gamma_{3\min/\max}$ of the temperature distribution one also needs the values of $m_1$ and $m_2$. As a rough approximation of $m_1$, the value of $T(40\ \Omega)$, which equals 513.4 K, may be used. The value of $m_2$ may be evaluated as $m_2 = m_1^2 + \sigma_T^2$. As an approximation of the standard deviation $\sigma_T$ of the temperature distribution, a linear approximation of $T(R)$ is often used, which leads to $\sigma_T \approx \left|\frac{dT}{dR}\right| \sigma_R$ [9]. As a consequence of the assumption of a uniform distribution for $R$, we have $\sigma_R = \Delta R/\sqrt{3} \approx 11.55\ \Omega$. Then, the derivative of $T(R)$ at $R = 40\ \Omega$ was found to be −1.422 K/Ω, which gives $\sigma_T \approx 16.42$ K. Finally, the interval of the temperature distribution skewness was found from (4): $\gamma_3 \in (-0.604, +2.21)$. It is worth noting that although the majority of this interval lies above zero, the minimum $\gamma_{3\min}$ is still negative here. This means that, in principle, $\gamma_3$ may be either positive or negative in this case.

On the other hand, because the strong non-linearity of the measurement function (10) suggests a positive skewness of the temperature PDF, the value of $\gamma_3$ was also evaluated conventionally. An exact procedure for distribution transformation [27] requires calculation of the inverse of the measurement function followed by differentiation of the result. In the case under consideration, handling the function $T(R)$ (10), although tedious, is still analytically manageable. This classic procedure leads here to a PDF described by a highly involved function that decreases approximately exponentially with temperature within the support (491.3 K, 556.0 K). It is presented as a solid line in Fig. 6.² Its shape was also approximated by a histogram obtained by a Monte Carlo method. As can be seen in Fig. 6, both results are consistent and right-skewed.
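The chain of calculations in this example can be retraced with a minimal sketch. Two points are assumptions rather than statements of the paper: formula (4) is not reproduced in this section, so the bounds are written in the closed form that follows from the two-point distribution of Appendix A (Eq. (17)), and the coefficient $c$ is taken as negative because only that sign reproduces 25 °C at 20 kΩ. Function names are hypothetical.

```python
import math

# Hoge-2 coefficients as quoted in the text (four significant digits only).
# The negative sign of C is an assumption of this sketch.
A, B, C, D = 0.001161, 0.0002124, -1.061e-7, 1.029e-7

def temperature(R):
    """Temperature in kelvin for a resistance R in ohms, Eq. (10)."""
    u = math.log(R)
    return 1.0 / (A + B * u + C * u**2 + D * u**3)

def dT_dR(R):
    """Analytical derivative of T(R) in K/ohm."""
    u = math.log(R)
    return -temperature(R) ** 2 * (B + 2 * C * u + 3 * D * u**2) / R

def skewness_limits(x_l, x_u, m1, sigma):
    """Skewness bounds from extreme two-point distributions (assumed form of (4))."""
    a_lo = (m1 - x_l) / sigma   # normalised distance to the lower support limit
    a_up = (x_u - m1) / sigma   # normalised distance to the upper support limit
    return 1.0 / a_lo - a_lo, a_up - 1.0 / a_up

R0, dR = 40.0, 20.0
x_l, x_u = temperature(R0 + dR), temperature(R0 - dR)
m1 = temperature(R0)                  # rough approximation of the mean
sigma_R = dR / math.sqrt(3)           # uniform distribution on (R0 - dR, R0 + dR)
sigma_T = abs(dT_dR(R0)) * sigma_R    # linear propagation
g_min, g_max = skewness_limits(x_l, x_u, m1, sigma_T)

print(f"support = ({x_l:.1f} K, {x_u:.1f} K), m1 = {m1:.1f} K")
print(f"sigma_T = {sigma_T:.2f} K, gamma_3 in ({g_min:.3f}, {g_max:.3f})")
```

Since the coefficients are quoted to four significant digits, the results should land close to, but not exactly on, the values given in the text.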
Subsequently, not only were the exact values of the distribution moments found, $m_1 = 516.6$ K, $\sigma_T = 17.95$ K, $m_2 = 267\,165\ \mathrm{K}^2$, but the actual value of the skewness was also obtained: $\gamma_3 = 0.464$. The latter confirms the right tail of the PDF evident in Fig. 6. Thus we can see that, for some distributions whose actual values of $\gamma_3$ are substantially positive, the assumed criterion for determining PDF skewness, $\gamma_{3\min} > 0$, seems to be too stringent.

² Of course, to achieve similar transformations, a measurement function must be calculable at least numerically. A numerical, approximate value of its derivative is also necessary. Although much more laborious, such numerical approximations are usually obtainable even if the description of the measurement function is not given directly but, e.g., in the form of an equation or of an algorithm.
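The exact moments quoted above can be cross-checked without the analytical PDF. The sketch below is an assumption-laden stand-in: it replaces Monte Carlo sampling with a deterministic midpoint grid over the uniform resistance interval, uses the rounded coefficients from the text, and takes $c$ as negative. The helper names are hypothetical.

```python
import math

# Rounded Hoge-2 coefficients from the text; the sign of C is an assumption.
A, B, C, D = 0.001161, 0.0002124, -1.061e-7, 1.029e-7

def temperature(R):
    """Temperature in kelvin for a resistance R in ohms, Eq. (10)."""
    u = math.log(R)
    return 1.0 / (A + B * u + C * u**2 + D * u**3)

def moments_of_T(R_lo=20.0, R_hi=60.0, N=100_000):
    """Mean, standard deviation and skewness of T(R) for R uniform on (R_lo, R_hi)."""
    step = (R_hi - R_lo) / N
    T = [temperature(R_lo + (i + 0.5) * step) for i in range(N)]
    m1 = sum(T) / N
    mu2 = sum((t - m1) ** 2 for t in T) / N
    mu3 = sum((t - m1) ** 3 for t in T) / N
    return m1, math.sqrt(mu2), mu3 / mu2 ** 1.5

m1, sigma_T, gamma3 = moments_of_T()
print(f"m1 = {m1:.1f} K, sigma_T = {sigma_T:.2f} K, gamma_3 = {gamma3:.3f}")
```

Small deviations from the published values are to be expected because of the coefficient rounding; the sign and order of magnitude of the skewness are what the check is meant to confirm.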

If for some pair $\{m_1, m_2\}$ the value of $\gamma_{3\min}$ is positive, then one is unable to construct any distribution that would have a negative value of $\gamma_3$. However, even if $\gamma_{3\min}$ is negative in some cases, the $\gamma_3$ value of the majority of distributions may still be substantially positive. It should therefore be noted that, for distributions such as the one in this example, the major part of the interval $[\gamma_{3\min}, \gamma_{3\max}]$ lies to the right of zero. Thus, the conditions $\gamma_{3\min} > 0$ or $\gamma_{3\max} < 0$ may be relaxed by assuming that a probability distribution should be suspected of being asymmetrical if a sufficiently large fraction of the interval $[\gamma_{3\min}, \gamma_{3\max}]$ lies on the side of zero where the majority of that interval is. For this purpose the following coefficient $\kappa$ is determined:

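$$\kappa = \begin{cases} +1, & \gamma_{3\min} > 0 \\[4pt] \dfrac{\gamma_{3\max}}{\gamma_{3\max} - \gamma_{3\min}}, & \gamma_{3\max}\,\gamma_{3\min} < 0 \ \text{and}\ |\gamma_{3\max}| > |\gamma_{3\min}| \\[8pt] \dfrac{\gamma_{3\min}}{\gamma_{3\max} - \gamma_{3\min}}, & \gamma_{3\max}\,\gamma_{3\min} < 0 \ \text{and}\ |\gamma_{3\max}| < |\gamma_{3\min}| \\[4pt] -1, & \gamma_{3\max} < 0 \end{cases} \tag{11}$$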
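The piecewise definition (11) translates directly into a small function. The sketch below follows the four branches as written; the handling of the boundary cases that (11) leaves open (a limit exactly equal to zero, or limits of equal modulus) is an assumption of this sketch, and the function name is hypothetical.

```python
def asymmetry_coefficient(g_min, g_max):
    """Coefficient kappa of Eq. (11) for skewness limits g_min <= g_max."""
    if g_min > g_max:
        raise ValueError("g_min must not exceed g_max")
    if g_min > 0:
        return 1.0
    if g_max < 0:
        return -1.0
    width = g_max - g_min
    # A limit exactly equal to zero is not covered by (11); it is treated
    # here by the same ratio, which then evaluates to +1 or -1 (assumption).
    if abs(g_max) > abs(g_min):
        return g_max / width
    if abs(g_max) < abs(g_min):
        return g_min / width
    raise ValueError("kappa is undefined for limits of equal modulus")

# Thermistor example from the text: gamma_3 in (-0.604, +2.21)
print(round(asymmetry_coefficient(-0.604, 2.21), 3))
```

With a chosen threshold such as 0.75, a call like `abs(asymmetry_coefficient(g_min, g_max)) > 0.75` would flag a distribution as suspected of being asymmetrical.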

By definition, the modulus of $\kappa$ satisfies $\frac{1}{2} < |\kappa| \le 1$. For instance, the value $\kappa = 0.75$ could be chosen as sufficiently large to suspect a positive skewness of the PDF. For the distribution considered here, such a choice results in asymmetry recognition for an actual value of $\gamma_3$ equal to 0.464, since then $\kappa$ equals 0.785. Thus, loosely speaking, a quite reasonable sensitivity of the method is reached. Finally, it was considered how the estimation errors of $m_1$ and $\sigma_T$ influence the value of the asymmetry coefficient $\kappa$. In the investigated example, both $m_1$ and $\sigma_T$ were underestimated. It was found that the simultaneous influence of both these underestimates causes a net increase of $\kappa$. This seems to be a fortunate circumstance, since with the estimated values of $m_1$ and $\sigma_T$ the proposed procedure already signals the need to investigate the distribution for PDFs slightly less skewed than is required.

In this example, the influence of a typical sensor non-linearity on the PDF asymmetry of the uncertainty resulting from the limited resolution of an analog-to-digital converter was considered. It was demonstrated that even if a 17-bit resolution converter is used, the uncertainty distribution, besides being limited, is also asymmetrical. Its asymmetry ($\gamma_3 \approx 0.464$) is not so marked as in the previous example. Therefore, the limits of $\gamma_3$ assessed from (4) have opposite signs and do not indicate asymmetry for certain. Nevertheless, the proposed coefficient (11) indicates that the majority of the interval $[\gamma_{3\min}, \gamma_{3\max}]$ lies to the right of zero ($\kappa > 0.75$), thus showing the right tail of the PDF. It is to be emphasised that the detection procedure, involving the use of (4) and (11), is quite easy and does not require any data that is not already available. Then, a full investigation of this PDF may be performed either analytically or using a Monte Carlo method, or else, when possible, a wide range of temperature measurement should be divided into several subranges, thus making the PDF tops in Fig. 6 sufficiently flat.

Fig. 6. PDF of temperature measurement with a thermistor sensor. The solid line corresponds to the PDF obtained analytically, while the bars correspond to a histogram obtained using a Monte Carlo method for $10^7$ samples.

6.3. Verification of a measuring instrument calibration

During a calibration procedure a series of reference quantities is measured by the indicating measuring instrument to be verified. For a particular reference realisation, with which a value $V_r$ and a standard uncertainty $\sigma_r$ are associated, the indication of the verified instrument is $V_v$. The standard uncertainty $\sigma_r$ of the reference is assumed to be a sufficiently small part of the declared standard uncertainty $\sigma$ of the verified instrument, e.g., $\sigma_r < \sigma/3$. In the following, a case is considered where the information items provided by the instrument being tested and by the reference are mutually non-contradictory. A simplified condition for this reads: $V_v - 3\sigma < V_r - 3\sigma_r$ and $V_v + 3\sigma > V_r + 3\sigma_r$, which may be condensed to:

$$|\Delta V| = |V_v - V_r| < 3(\sigma - \sigma_r). \tag{12}$$

Our purpose is to deduce, on the basis of the available information, the asymmetry of the PDF attributed to the measurand by the verified indication $V_v$. The mean value $m_1$ of the distribution under consideration may be estimated as the indication $V_r$ of the reference instrument. Consequently, the PDF that is attributed to the measurand has the following mean-square value:

$$m_2 = m_1^2 + \sigma^2 = V_r^2 + \sigma^2, \tag{13}$$

where $\sigma$ denotes the standard uncertainty associated with the verified instrument, provided by its producer and describing the constructional and technological quality of this instrument. Moreover, $\sigma$ depends on the imposed environmental constraints (e.g., on the prescribed temperature range) and also on the time interval that has passed since the previous verification. Therefore, the value of $\sigma$ should not be subject to change, even during a calibration procedure. Finally, the bounds $[x_l, x_u]$ of the distribution support may be evaluated with near certainty as $x_l = V_v - 3\sigma$ and $x_u = V_v + 3\sigma$. Putting the above values of $m_1$, $m_2$, $x_l$, and $x_u$ into Eq. (4) results in the following bounds for the PDF skewness:

$$\gamma_{3\min/\max} = \frac{(\delta \pm 2)(\delta \pm 4)}{\delta \pm 3}, \tag{14}$$

where the minus signs correspond to $\gamma_{3\min}$ and the plus signs to $\gamma_{3\max}$, and $\delta$ denotes the error of the verified instrument indication, $\Delta V = V_v - V_r$, related to the value of the standard uncertainty: $\delta = \Delta V/\sigma$. Fig. 7, where the bounds (14) are depicted, shows that for all values of $\delta \ne 0$ the limits of $\gamma_3$ become asymmetrical, indicating a right tail of the PDF for $V_v > V_r$ or a left one for $V_v < V_r$. Since the value of $\sigma$ in (13) is fixed, the range of possible values of $m_1$ is now constrained both by the upper limit $m_{2\max}$ in (3) and by the parabola (13), corresponding to $\delta \in [-2\sqrt{2}, 2\sqrt{2}]$, which is slightly narrower than the interval $[-3, 3]$ that would apply if (13) were not imposed. For $|\delta| > 2$, i.e., for $|V_v - V_r| > 2\sigma$, the distribution of the result must be asymmetrical, because then either $\gamma_{3\min} > 0$ or $\gamma_{3\max} < 0$. Moreover, it can be found that a substantial asymmetry of the $\gamma_3$ limit values in relation to zero may be expected already for $|V_v - V_r| > 1.154\,\sigma$, since then the asymmetry coefficient (11) satisfies $|\kappa| > 0.75$. In other words, although error values $2\sigma < |V_v - V_r| < 3\sigma$ do not yet signify a contradiction between the two instrument indications, they do mean that the PDF of the verified instrument must be asymmetrical, so that symmetrical uncertainty limits are inadequate. In such cases the result should be corrected by the value $-\Delta V = V_r - V_v$, either numerically or by an appropriate physical adjustment. It would be advisable to introduce this correction already for errors greater than $1.154\,\sigma$, since from that point many PDFs are asymmetrical.

Fig. 7. The limits of the skewness $\gamma_3$ of the PDF associated with a measurand for different values of the normalised error $\delta = (V_v - V_r)/\sigma$ of the verified instrument.

The above result may easily be explained and justified intuitively. For symmetric distributions with limited support, the maximum value of the variance (and thus also of $m_2$) is determined by the smaller of the distances $m_1 - x_l$ and $x_u - m_1$. That maximum variance is reached by a symmetric ($p = 0.5$) Bernoulli distribution. If the observed value of $m_2$ exceeds the maximum value obtainable in a symmetric manner, then the random variable must assume some values more or far more distant from $m_1$ than the closest bound, in the direction opposite to that of the nearest bound. This, however, signifies a PDF asymmetry, and that is exactly what the conditions $\gamma_{3\min} > 0$ or $\gamma_{3\max} < 0$ mean for Eq. (14).

Finally, the required relation $\sigma/\sigma_r$ should be discussed. Of course, greater values of this quotient are technically advantageous but usually costly. It seems reasonable to assume that for an error of $|V_v - V_r| = 2\sigma$ the lack of contradiction between the two instrument indications should be certain, i.e., the coverage interval for the reference value should still lie entirely within the corresponding interval for the verified instrument. For this, the requirement $\sigma/\sigma_r \ge 3$ is minimal but sufficient. To conclude: when the error is $|\Delta V| > 2\sigma$, the assumption concerning the bounds of the uncertainty interval of a verified instrument, together with the fixing of its standard uncertainty $\sigma$, contradicts the symmetry requirement for the measurement result PDF. To re-symmetrise the PDF, it is advisable that the verified instrument indications be corrected already for $|\Delta V| > 1.154\,\sigma$.

A normalised measure of the error $\delta$ is commonly used for detecting an excessive discrepancy of a result from a reference value. When the latter is a weighted mean of a set of values, the normalised deviations of all values from that reference (then referred to as $E_n$-values) are used as components of the statistic for the inconsistency test of the entire set of values [5]. The above example shows that such a normalised error $\delta$ may also be used for distribution asymmetry identification.
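The two thresholds quoted in this example, $|\delta| > 2$ for certain asymmetry and $|\delta| > 1.154$ for $|\kappa| > 0.75$, can be retraced with a brief sketch. It evaluates the bounds (14) and the coefficient (11) and locates the second threshold by bisection; the function names, the bracketing interval and the tolerance are hypothetical choices, and the sketch is only meant for $|\delta| < 3$.

```python
def skewness_bounds(delta):
    """Eq. (14): minus signs give gamma_3min, plus signs give gamma_3max."""
    g_min = (delta - 2.0) * (delta - 4.0) / (delta - 3.0)
    g_max = (delta + 2.0) * (delta + 4.0) / (delta + 3.0)
    return g_min, g_max

def kappa(delta):
    """Asymmetry coefficient (11) evaluated on the bounds (14)."""
    g_min, g_max = skewness_bounds(delta)
    if g_min > 0:
        return 1.0
    if g_max < 0:
        return -1.0
    dominant = g_max if abs(g_max) > abs(g_min) else g_min
    return dominant / (g_max - g_min)

def kappa_threshold(level=0.75, lo=0.01, hi=2.0, tol=1e-9):
    """Smallest positive delta with kappa(delta) >= level, found by bisection.

    Assumes kappa crosses the level exactly once on (lo, hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"kappa = 0.75 is reached at delta = {kappa_threshold():.3f}")
print(f"bounds at delta = 2.5: {skewness_bounds(2.5)}")
```

For negative $\delta$ the picture is mirrored: `kappa(-d)` equals `-kappa(d)`, which corresponds to a left tail when $V_v < V_r$.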

7. Discussion and conclusions

It is to be noted that the proposed method may also be useful in other applications of the assessment of the skewness $\gamma_3$. For instance, in physics, models of electric charge transport are verified by investigating the shot component of a noise signal. Shot noise typically reveals an asymmetrical probability distribution but is usually masked by normally distributed thermal noise. The skewness of the compound noise signal is thus a measure of the shot noise contribution [30,41]. Next, $\gamma_3$ may be used as a powerful test for the presence of outliers [22].

In many metrology applications there is no direct access to individual samples, while the outputs may consist of the mean and mean-square values (or, equivalently, of the mean and the standard deviation). That is why the moments $m_1$ and $m_2$ were considered here as known parameters. This is a very typical case not only when averaging of the signal is performed in the domain of a quantity (see, e.g., [43,11,15]) but also when averaging is performed in the numerical domain, still on-line, i.e., without storing sample values. Less common examples of on-line measurements of $m_1$ and/or $m_2$ include the SEM technique [45] and stochastic comparison [34]. In all these methods the user obtains values of $m_1$ and/or $m_2$ but has no possibility of directly obtaining the value of the third-order moment. Nevertheless, as has been shown here, a relatively good and useful approximation of $\gamma_3$ may sometimes be obtained. A complete Bayesian model of the PDF evolution, from its prior shape up to its final a posteriori form, suffers from particular difficulties, either computational or related to assumptions [38]. Therefore, a simplified method was proposed here, in which no specific assumptions are made about the shapes of prior distributions. Instead, the analysis is based solely on the assumption that some prior distributions have finite supports and that the limits of those supports are known. In particular, the support limits together with the values of the first two moments were used here for recognising the asymmetry of a probability distribution. A two-point Bernoulli distribution was found to have the extreme values of the skewness, and formulas (4) have been found specifically for it. These formulas constitute the most general and universal result of the paper.
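The role of the two-point distribution can be illustrated with a short sketch that builds the extreme distributions from the conditions (17) of Appendix A and evaluates their skewness through (15). The support, mean and standard deviation used below are hypothetical illustration values, not data from the paper, and the function names are likewise assumptions.

```python
def extreme_two_point(c1, m1, m2):
    """Second mass point c2 and probability p of the point c1, as in Eq. (17)."""
    c2 = (c1 * m1 - m2) / (c1 - m1)
    p = (m2 - m1**2) / (c1**2 - 2.0 * c1 * m1 + m2)
    return c2, p

def two_point_skewness(c1, c2, p):
    """Skewness of a two-point distribution, following Eqs. (15) and (16)."""
    m1 = c1 * p + c2 * (1.0 - p)
    m2 = c1**2 * p + c2**2 * (1.0 - p)
    m3 = c1**3 * p + c2**3 * (1.0 - p)
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    return mu3 / (m2 - m1**2) ** 1.5

# Hypothetical inputs: support [0, 10], mean 4, standard deviation 2.
x_l, x_u, m1, sigma = 0.0, 10.0, 4.0, 2.0
m2 = m1**2 + sigma**2

c2_max, p_max = extreme_two_point(x_u, m1, m2)   # one mass point fixed at x_u
c2_min, p_min = extreme_two_point(x_l, m1, m2)   # one mass point fixed at x_l
g_max = two_point_skewness(x_u, c2_max, p_max)
g_min = two_point_skewness(x_l, c2_min, p_min)
print(f"gamma_3 lies in [{g_min:.3f}, {g_max:.3f}]")
```

For these inputs the same limits follow from the equivalent closed form $(x_u - m_1)/\sigma - \sigma/(x_u - m_1)$ for the maximum and $\sigma/(m_1 - x_l) - (m_1 - x_l)/\sigma$ for the minimum, which is a convenient cross-check of the construction.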
Summarising, if the actual distribution of a measurement result random variable is considerably skewed, the typical assumption about its similarity to the normal distribution leads to an erroneous evaluation of the coverage interval limits. Therefore, the total relative error of the determination of those limits has been defined (5). Then, its dependence on the skewness value has been investigated (Fig. 2). It was noted that distribution asymmetries for which the modulus of the skewness is $|\gamma_3| \approx 0.5$ may already lead to an error (5) ranging from ten to twenty percent. Next, the sets of values $\{m_1, m_2\}$ have been delimited for which the conclusion that the distribution is substantially skewed is just a logical consequence of the knowledge of the values of $x_l$, $x_u$, $m_1$, and $m_2$ (please refer to Fig. 3). It has been shown that distribution asymmetry may be recognised in this way for a substantial part (15–50%) of all the possible values of $\{m_1, m_2\}$. The examples considered in Section 6, together with Fig. 3, lead to the conclusion that the effectiveness of the proposed method depends on the variability of the random variable. Distributions exhibiting greater values of variance usually have a better chance of being identified as asymmetrical. For distributions revealing small variability, i.e., those for which $m_2$ is only slightly above its minimum value $m_1^2$, asymmetry detectability requires that one of the distribution support limits be determined sufficiently closely, cf. Fig. 3. The application examples given in Section 6 describe circumstances in which asymmetrical distributions with $|\gamma_3| \approx 0.5$ can be met. It was shown that the proposed method is able to detect such asymmetries.

It is worth noting that the requirement that the mean and mean-square values be known exactly may be relaxed by assuming that the value of each parameter belongs to some limited interval. This does not violate the assumption that the distribution supports of $m_1$ and $m_2$ are finite, thus enabling the application of the proposed method. Then, the values of $\gamma_{3\min}$ and $\gamma_{3\max}$ should be sought as global extrema of the functions (4) over the sets of values $[m_{1\min}, m_{1\max}]$ and $[m_{2\min}, m_{2\max}]$ instead of for particular exact values. The presented method of probability support propagation needs much weaker, though physically well-justified, assumptions than the methods in which the propagation of a complete probability distribution is investigated [46,21,17,19]. Although the proposed method may be considered as an initial stage of a full investigation of the PDF shape, this stage is often sufficient for distribution asymmetry identification.

Funding

This work was supported by Wrocław University of Science and Technology (Grant No. 0401/0040/18).

Declaration of Competing Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The author would like to express his deep gratitude to Professor D. Robert Iskander, from Wrocław University of Science and Technology, for his encouragement, help and valuable discussions.

Appendix A. The limit values of skewness - the derivation of Eq. (4)

It is assumed that the support of the probability distribution is constrained to the interval $[x_l, x_u]$ with known end points. The values of the first two moments $m_1$ and $m_2$ are also assumed to be known. However, the assumption of distribution unimodality is omitted here, since the minimum and maximum values of the skewness $\gamma_3$ for unimodal distributions are known [40]. The value of $\gamma_3$ consists of two components. A positive component is produced by the values exceeding the mean $m_1$, while a negative one is produced by the values lying below it. Maximisation of the skewness $\gamma_3$ requires maximisation of the positive contribution, while the modulus of the negative one should be minimised. To maximise the positive contribution, the whole probability mass taken by the random variable above the mean should be concentrated at one point $c_1$ located at the greatest possible distance from $m_1$, i.e. at the right extreme of the support, $c_1 = x_u$. Then, to minimise the negative contribution to $\gamma_3$, the whole probability mass taken below $m_1$ should also be concentrated at one point, $c_2$. This guarantees that the required value of $m_2$ will be obtained with a minimum distance $|m_1 - c_2|$, on the third power of which the skewness depends. Thus, the distribution maximising $\gamma_3$ must be of a Bernoulli type, i.e., the probability should be concentrated at two points, $c_1$ and $c_2$, assumed with the probabilities $p$ and $q = 1 - p$, respectively. For all such Bernoulli distributions the values of $c_2$ and $p$ can still be chosen in many ways, leading to different values of $\gamma_3$. Thus, we have the problem of finding the extremum of the function:

$$\gamma_3(c_2, p) = \frac{\mu_3}{\sigma^3} = \frac{c_1^3\, p + c_2^3\,(1-p) - 3 m_1 m_2 + 2 m_1^3}{\left(m_2 - m_1^2\right)^{3/2}} \tag{15}$$

with the constraints:

$$m_1 = c_1\, p + c_2\,(1-p), \qquad m_2 = c_1^2\, p + c_2^2\,(1-p). \tag{16}$$

The numerator of the right-hand side of (15) was obtained by substituting $m_3 = c_1^3\, p + c_2^3\,(1-p)$ into the formula $\mu_3 = m_3 - 3 m_1 m_2 + 2 m_1^3$. When the maximum value of the skewness is searched for, the value of $c_1$ is kept at the level $x_u$, and therefore the function (15) depends solely on two variables, $\gamma_3(c_2, p)$. The method of Lagrange multipliers [1] was used to find the maximum of (15) subject to (16), which yields the following necessary conditions for the location of the extremum:

$$c_2 = \frac{c_1 m_1 - m_2}{c_1 - m_1} \quad \text{and} \quad p = \frac{m_2 - m_1^2}{c_1^2 - 2 c_1 m_1 + m_2}, \tag{17}$$

where $c_1 = x_u$ in this case. Substitution of these values into (15) gives (4). Then, it was verified numerically that the extremum found was really a maximum and not a minimum. Finally, the right-hand side version of formula (4) was obtained by the simple substitution $m_2 = \sigma^2 + m_1^2$. In the search for the minimum $\gamma_{3\min}$, the reasoning was generally similar to that presented above. Minimisation of $\gamma_3$ requires, however, that the entire probability mass taken below the mean be concentrated at the smallest possible value, i.e. at $c_1 = x_l$, thus maximising the modulus of the negative contribution to $\gamma_3$, which is proportional to $(c_1 - m_1)^3$. Similarly, the whole probability mass above the mean, constituting the positive contribution to $\gamma_3$ that is to be minimised in this case, should also be concentrated at one point. Searching for the minimum of the function $\gamma_3(c_2, p)$ (15) with the constraints (16) leads again to the structure (4), in which $c_1 = x_l$ should now be assumed.

References

[1] G.B. Arfken, H.J. Weber, Mathematical Methods for Physicists, Academic Press, San Diego, 1995.
[2] J. Beyerer, The Value of Additional Knowledge in Measurement – A Bayesian Approach, Measurement 25 (1) (1999) 1–7.
[3] G.E. Box, G.C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Company, Reading, 1973.
[4] C. Chen, Evaluation of resistance-temperature calibration equations for NTC thermistors, Measurement 42 (7) (2009) 1103–1111.
[5] M.G. Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia 44 (2007) 187–200.
[6] F. Critchley, M. Jones, Asymmetry and gradient asymmetry functions: density-based skewness and kurtosis, Scand. J. Stat. 35 (2008) 415–437.
[7] Evaluation of measurement data – Supplement 1 to the "Guide to the expression of uncertainty in measurement" – Propagation of distributions using a Monte Carlo method, Joint Committee for Guides in Metrology (JCGM 101:2008).
[8] C. Elster, Calculation of uncertainty in the presence of prior knowledge, Metrologia 44 (2007) 111–116.
[9] Guide to the Expression of Uncertainty in Measurement (GUM), International Organization for Standardization (1993, 1995).
[10] DO-41 standard glass encapsulated thermistors. Part number 203RH1K. Available at http://www.ussensor.com (2017).
[11] R. Goyal, B.T. Brodie, Recent advances in precision ac measurements, IEEE Trans. Instrum. Meas. 33 (3) (1984) 164–167.
[12] R.A. Groeneveld, G. Meeden, Measuring skewness and kurtosis, J. R. Stat. Soc. Ser. D 33 (4) (1984) 391–399.
[13] H.J. Hoge, Useful procedure in least squares, and tests of some equations for thermistors, Rev. Scientific Instrum. 59 (6) (1988) 975–979.
[14] J. Jakubiec, System oriented mathematical model of single measurement result, Metrol. Meas. Syst. 13 (4) (2006) 407–419.
[15] M. Kampik, M. Klonz, T. Skubis, Thermal converter with quartz crystal temperature sensor for ac-dc transfer, IEEE Trans. Instrum. Meas. 46 (2) (1997) 387–390.
[16] T.-H. Kim, H. White, On more robust estimation of skewness and kurtosis, Finance Res. Lett. 1 (2004) 56–73.
[17] K. Klauenberg, G. Wübbeler, B. Mickan, P. Harris, C. Elster, A tutorial on Bayesian Normal linear regression, Metrologia 52 (2015) 878–892.
[18] I. Lira, Dealing with prior knowledge about the measurand, Measurement 78 (2016) 344–347.
[19] I. Lira, The GUM revision: the Bayesian view toward the expression of measurement uncertainty, Eur. J. Phys. 37 (2) (2016) 025803.
[20] I. Lira, W. Wöger, Bayesian evaluation of the standard uncertainty and coverage probability in a simple measurement model, Meas. Sci. Technol. 12 (2001) 1172–1179.
[21] I. Lira, W. Wöger, Comparison between the conventional and Bayesian approaches to evaluate measurement data, Metrologia 43 (2006) S249–S259.
[22] J.H. Livesey, Kurtosis provides a good omnibus test for outliers in small samples, Clin. Biochem. 40 (2007) 1032–1036.
[23] H.L. MacGillivray, Skewness and asymmetry: measures and orderings, Ann. Stat. 14 (3) (1986) 994–1011.
[24] L. Mari, Notes towards a qualitative analysis of information in measurement results, Measurement 25 (1999) 183–192.
[25] A. O'Hagan, Probability: Methods and Measurement, Chapman and Hall, London, New York, 1988.
[26] A. O'Hagan, Eliciting and using expert knowledge in metrology, Metrologia 51 (2014) S237–S244.
[27] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 1965.
[28] P.N. Patil, D. Bagkavos, A.T.A. Wood, A measure of asymmetry based on a new necessary and sufficient condition for symmetry, Sankhya: Indian J. Stat. 76A (1) (2014) 123–145.
[29] R.H. Randles, M.A. Fligner, G.E. Policello, D.A. Wolfe, An asymptotically distribution-free test for symmetry versus asymmetry, J. Am. Stat. Assoc. 75 (369) (1980) 168–172.
[30] B. Reulet, J. Senzier, D.E. Prober, Environmental effects in the third moment of voltage fluctuations in a tunnel junction, Phys. Rev. Lett. 91 (19).
[31] H. Shore, Fitting a distribution by the first two moments, partial and complete, Comput. Stat. Data Anal. 19 (1995) 563–577.
[32] C.H. Sim, M.H. Lim, Evaluating expanded uncertainty in measurement with a fitted distribution, Metrologia 45 (2008) 178–184.
[33] J. Simpson, B. Welch, Table of the bounds of the probability integral when the first four moments are given, Biometrika 47 (3/4) (1960) 399–410.
[34] G. Smołalski, A stochastic comparison used in any moment of the signal measurement, Naučnaja Apparatura – Scientific Instrum. I (3) (1986) 91–99.
[35] G. Smołalski, Probabilistic properties of the segmentation error in an averaged parameters measurement of periodic signals, in: National Congress of Metrology, Gdańsk, Poland 15–18.09.1998; Proceedings, Technical University of Gdańsk, 1998, pp. 25–34, vol. 2.
[36] G. Smołalski, Analysis of the uncertainty of the mean value of strictly periodic and periodic with noise signals with a known bandwidth, Metrol. Meas. Syst. VIII (2) (2001) 197–211.
[37] G. Smołalski, Segmentation error in averaged parameters' measurements of periodic signals: its upper limits and general probabilistic properties, Measurement 29 (1) (2001) 21–30.
[38] G. Smołalski, Measurability conditions of the signal parameter for a given prior knowledge, Measurement 42 (4) (2009) 583–603.
[39] G. Smołalski, Parameters of the signal: their non-locality vs. averaging character, Meas. Autom. Monit. 62 (11) (2016) 354–360.
[40] F. Teuscher, V. Guiard, Sharp inequalities between skewness and kurtosis for unimodal distributions, Stat. Prob. Lett. 22 (1995) 257–260.
[41] A.V. Timofeev, M. Meschke, J.T. Peltonen, T.T. Heikkilä, J.P. Pekola, Wideband detection of the third moment of shot noise by a hysteretic Josephson junction, Phys. Rev. Lett. 98 (2007) 207001.
[42] International vocabulary of metrology – Basic and general concepts and associated terms (VIM), Joint Committee for Guides in Metrology (JCGM 200:2008).
[43] B.P. van Drieënhuizen, R.F. Wolffenbuttel, Integrated micromachined electrostatic true rms-to-dc converter, IEEE Trans. Instrum. Meas. 44 (2) (1995) 370–373.
[44] C. Versluis, S. Straetmans, Skewness measures for the Weibull distribution. Available at SSRN: http://ssrn.com/abstract=2590356.
[45] W. Wehrmann, A new line of stochastic-ergodic measuring instruments, NORMA Tech. Inf. VIII (II) (1971) 3–12.
[46] K. Weise, W. Wöger, A Bayesian theory of measurement uncertainty, Meas. Sci. Technol. 3 (1992) 1–11.
[47] R. Willink, A procedure for the evaluation of measurement uncertainty based on moments, Metrologia 42 (2005) 329–343.
[48] R. Willink, Uncertainty analysis by moments for asymmetric variables, Metrologia 43 (2006) 522–530.
