Atmospheric Research 112 (2012) 1–11
Contents lists available at SciVerse ScienceDirect
Atmospheric Research journal homepage: www.elsevier.com/locate/atmos
Estimation of raindrop size distribution parameters by maximum likelihood and L-moment methods: Effect of discretization Marzuki a, c, e,⁎, Walter L. Randeu a, Toshiaki Kozu b, Toyoshi Shimomai b, Michael Schönhuber d, Hiroyuki Hashiguchi c a b c d e
Institute of Broadband Communication, Graz University of Technology, Austria Interdisciplinary Faculty of Science and Engineering, Shimane University, Japan Research Institute for Sustainable Humanosphere (RISH), Kyoto University, Japan Joanneum Research Graz, Austria Department of Physics, Andalas University, Padang, Indonesia
a r t i c l e
i n f o
Article history: Received 24 October 2011 Received in revised form 15 March 2012 Accepted 10 April 2012 Keywords: Raindrop size distribution Distrometer Bin size Discretization
a b s t r a c t Disdrometer has been widely used to estimate raindrop size distribution (DSD) for broad list of application. However, it classifies or bins drops automatically into size groups and provides the DSD at nominal drop diameters that correspond to the mean of bin width. Selection of the bin width may influence shape and parameterization of DSD since the exact size of individual drop in each bin, of course, is not the same as the mean of its bin size. Therefore, we present a comprehensive follow-up of a previous studyon the effect of bin width selection of 2D-Video Distrometer data. We applied the L-moment method, along with the moment and maximum likelihood methods, to samples taken from simulated and measured gamma raindrop populations. It is found that L-moment is less sensitive to bin width selection than maximum likelihood and moment methods. The bias due to bin width selection for L-moment and maximum likelihood methods is not much influenced by the mean sample size in comparison with that of moment method . With samples from the DSD having large number of raindrop or a larger shape parameter μ, the bias due to bin width selection can be small or negligible. Using the midsize of bin as the representative value for the class (bin) of binned data (ΔD) was acceptable because it gives the parameters closer to drop-by-drop data basis than using mean, mode and median of raindrop size. © 2012 Elsevier B.V. All rights reserved.
1. Introduction Raindrop size distribution (DSD) is a key attribute to connect rainfall rate with cloud processes and remote sensing observations. Studies of DSD have motivated some engineers and meteorologists to develop instruments to observe the raindrop (e.g., Joss and Waldvogel, 1969; Löffler-Mang and Joss, 2000; Schönhuber et al., 2008). There are many different types of instruments available today and disdrometer seems to be the most widely used one. Rainfall properties estimated from disdrometer show great variability. There are three factors ⁎ Corresponding author at: Department of Physics, Andalas University, Padang, Indonesia. E-mail address:
[email protected] (Marzuki). 0169-8095/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.atmosres.2012.04.003
that can explain such variability: (1) climatological factors, (2) physical factors and (3) instrumental factors (Uijlenhoet et al., 2006). The latter includes any instrument malfunctioning, instrument limitations (resolution and sensitivity), and undersampling effect (e.g., Joss and Waldvogel, 1969; Uijlenhoet et al., 2006). In addition, disdrometers provide data at nominal drop diameters that correspond to the mean of bin sizes (or discretization interval) in order to reduce the amount of the data. Discretization procedure and model bias can generate error when estimating the DSD and integral rainfall parameters from disdrometer data (e.g., Kliche et al., 2008; Marzuki et al., 2010a). If we choose a too large bin size, binned data would not represent the shape of the underlying distribution (Shimazaki and Shinomoto, 2007). In this paper, we investigated the magnitude of variability caused by the
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
bin size selection. The analysis was restricted to the special (limiting) case of uniform bin size as applied in the 2D-Video Disdrometer (2DVD). The 2DVD is a precipitation gauge, working on the basis of video cameras. It is a new instrument, capable of measuring not only size and fall velocity but also shape of hydrometeors (Schönhuber et al., 2008). The DSD for a size category i is computed by the formula
j¼1
1 ; Aj vj
0.8
ML−nb (5.18) LM−nb (5.09) ML−b0.2 (5.09)
0.6
LM−b0.2 (5.09) ML−b0.3 (4.98) LM−b0.3 (5.10)
0.4
ML−b0.4 (4.83) LM−b0.4 (5.10)
0.2
ML−b0.5 (4.64) LM−b0.5 (5.12)
ð1Þ
0 3
where mi is the number of drops within size class i, Δt is the integration time, ΔD is the width of the size class, Aj is the disdrometer effective measurement area during the collection of drop j and vj is the fall speed of drop j. The user can adjust ΔD in the software which is provided by the manufacturer, based on their preferences, for example Marzuki et al. (2010b) used ΔD=0.1 mm and Marzuki et al. (in press) used 0.2 mm. Some instruments such as Joss-Waldvogel disdrometer (JWD, Joss and Waldvogel, 1969) and Parsivel (Löffler-Mang and Joss,
4
5
6
7
8
1
Cumulative distribution
1 Ni ¼ : ΔtΔD
mi X
1
Cumulative distribution
2
0.8 ML−nb (4.13) LM−nb (4.07)
0.6
ML−b0.2 (4.07) LM−b0.2 (4.07) ML−b0.3 (4.00)
0.4
LM−b0.3 (4.06) ML−b0.4 (3.89) LM−b0.4 (4.08)
0.2
ML−b0.5 (3.76) LM−b0.5 (4.08)
Cumulative distribution
1
0 2.5
(a)
3.5
4
4.5
5
5.5
6
ˆ Λ
0.8 ML−nb (2.09)
Fig. 2. Same as Fig. 1, but population DSD: gamma, μ = 5, Λ = 4, NT = 100.
LM−nb (2.05) ML−b0.2 (1.93)
0.6
LM−b0.2 (2.04) ML−b0.3 (1.80)
0.4
LM−b0.3 (2.03) ML−b0.4 (1.70) LM−b0.4 (1.99) ML−b0.5 (1.65)
0.2
LM−b0.5 (1.94)
0
1.5
1
2
2.5
3
3.5
1
(b) Cumulative distribution
3
0.8 ML−nb (4.13) LM−nb (4.07)
0.6
ML−b0.2 (3.91) LM−b0.2 (4.07)
0.4
ML−b0.3 (3.75) LM−b0.3 (4.05) ML−b0.4 (3.61)
0.2 0 2.5
LM−b0.4 (4.00) ML−b0.5 (3.56) LM−b0.5 (3.94)
3
3.5
4
4.5
5
5.5
6
ˆ Λ Fig. 1. Cumulative distribution of estimated gamma shape ( μ^ ) and scale parameter (Λ^ ) using ML and LM methods for different values of bin width indicated in the legend. Letter "nb" denotes drop-by-drop data (without binning), b0.2, b0.3, b0.4 and b0.5 indicate the data are binned at 0.2, 0.3, 0.4 and 0.5 mm, respectively. Vertical dashed line indicates the population value. Population DSD: gamma, μ = 2, Λ = 4, NT = 100. Total sample for each initial parameter is 104. Values in parentheses correspond to the mean estimated parameter.
2000) employ nonuniform bin size in which the bin sizes increase as raindrop sizes increase. Furthermore, they can not measure terminal velocity and calculation procedure assumes terminal velocity as a function of drop diameter. Hence, further errors may introduce due to wrong bin width selection. However, the possible error due to bin width selection for other instruments is beyond the scope of this article. Fraile et al. (2009) elucidated the possible effect of discretization for the exponential distribution of hailstone by means of moment method (MM). Recently, such effect for the modified gamma distribution of DSD was studied by Marzuki et al. (2010a). Moment estimator is inherently biased in which it tends to give erroneous DSD parameters when the total number concentration of drops are small (Smith and Kliche, 2005). In general, Marzuki et al. (2010a) found that mean DSD parameters increased with increasing bin width and for very large number of raindrop which is normally accompanied by heavy rain, the bias due to bin width selection was relatively small. It seems the variability of DSD parameters due to bin size selection through MM is in tune with that of the MM error. Therefore, the variability of DSD parameters in Fraile et al. (2009) and Marzuki et al. (2010a) are probably influenced by the bias due to the moment error. Although it is in principle difficult to distinguish accurately between both types of bias, a better knowledge of the magnitude of the variability associated with bin width selection, that is of the fluctuations in DSD parameters in the absence of method bias, would certainly provide a step in the right direction. Current study is a follow up of Marzuki et al. (2010a), by applying the L-moment (LM) estimators (Hosking, 1990) and maximum likelihood (ML)
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
20
100.32 oE, 865 m above mean sea level). Both simulated and real DSD were binned at 0.20, 0.30, 0.40, and 0.50 mm. M234−nb M346−nb ML−nb LM−nb M234−b0.2 M346−b0.2 ML−b0.2 LM−b0.2 M234−b0.3 M346−b0.3 ML−b0.3 LM−b0.3 M234−b0.4 M346−b0.4 ML−b0.4 LM−b0.4 M234−b0.5 M346−b0.5 ML−b0.5 LM−b0.5
18 16 14 12
Mean ˆ
3
M346
10 8
M234
6
2. Methodology 2.1. Modified Gamma Distribution The DSD is not a distribution in the probabilistic sense but a function N(D) in units of m − 3 mm − 1, such that N(D)dD is the number of drops of diameter D to D + dD mm per unit volume (m 3) of air. In terms of probability density function, f(D ; θ), DSD can be written as a product of f(D ; θ) with the total number concentration of raindrops (NT), given by NðDÞ ¼ NT :f ðD; θÞ:
4 ML & LM
2 0 101
102
103
NT Fig. 3. Variation of the mean μ^ for different values of bin width indicated in the legend and the drop-by-drop case (nb), using LM, ML, and MM, with sample size. Population DSD: gamma, μ = 2, Λ = 4.
estimators that are asymptotically unbiased and comparing the result with the MM estimators (e.g., Smith and Kliche, 2005; Kliche et al., 2008). First, the ability of each method for several selected bin sizes, to recover the parameters of known DSDs from which samples are taken, was examined. This step must be done by computer simulation, because DSD parameters in nature are inherently unknown. Second, we analyzed several examples of real raindrop spectra collected by the 2DVD at Kototabang, West Sumatra in the Republic of Indonesia (0.20oS,
ð2Þ
Although the modified gamma distribution is not general enough to perfectly represent the full range of observed DSD, this distribution has been widely accepted by the radar meteorology and cloud physics communities (e.g., Haddad et al., 1996; Tokay & Short). Moreover, Mallet and Barthes (2009) reinforced that approximately 91% of DSDs in nature are of the gamma type. Therefore, in this paper we employed the modified gamma distribution and f(D ; θ) for this distribution is: f ðD; Λ; μ Þ ¼
Λ μþ1 μ −ΛD D e ; Γ ðμ þ 1Þ
ð3Þ
where Λ and μ is the slope parameter in mm− 1 and shape parameter of the distribution function, respectively. The simulation procedure to generate artificial DSD in the present work is the same as that used in Marzuki et al. (2010a). The simulation was started by generating drops (D) that follow the gamma probability density function for selected parameters (NT, μ, Λ). For simplicity's sake, the sampling volume was assumed 1 m3. Smith and Kliche (2005) found that the usage of drop size dependent for sampling volume did not much influence the results as compared to the results obtained using
Table 1 Mean and root mean square error (RMSE) estimated shape parameter for a gamma DSD (Λ = 4, μ = 2 ) as a function of the mean sample size. NT
M234 nb
M346
ML
LM
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
Mean μ^ 10 8.47 20 5.66 50 3.76 100 3.08 200 2.6 500 2.3 1000 2.16
8.62 5.75 3.82 3.13 2.64 2.34 2.2
9.15 5.95 3.95 3.23 2.74 2.42 2.28
10.46 6.46 4.23 3.46 2.93 2.6 2.45
14.12 7.42 4.77 3.88 3.28 2.92 2.75
17.09 11.04 6.84 5.21 4.02 3.13 2.71
17.22 11.11 6.89 5.21 4.04 3.16 2.73
18.31 11.22 6.96 5.27 4.1 3.2 2.77
21.44 11.61 7.1 5.37 4.17 3.27 2.84
36.28 13.17 7.44 5.6 4.34 3.42 2.98
3.21 2.49 2.18 2.09 2.04 2.01 2.01
3.07 2.33 2.02 1.93 1.88 1.86 1.85
2.88 2.17 1.88 1.8 1.76 1.73 1.73
2.66 2.01 1.77 1.7 1.66 1.64 1.64
2.41 1.9 1.71 1.65 1.62 1.61 1.6
2.61 2.25 2.1 2.05 2.02 2.01 2
2.66 2.26 2.1 2.05 2.02 2 2
2.71 2.26 2.08 2.03 2 1.99 1.98
2.76 2.24 2.05 1.99 1.96 1.95 1.94
2.62 2.18 2 1.94 1.91 1.89 1.89
RMSE μ^ 10 8.53 20 4.94 50 2.74 100 1.9 200 1.34 500 0.91 1000 0.68
8.81 5.05 2.79 1.94 1.37 0.93 0.69
12.17 5.3 2.93 2.04 1.45 0.98 0.73
16.4 6.02 3.24 2.26 1.6 1.1 0.84
27.41 7.71 3.84 2.68 1.93 1.36 1.07
17.76 10.59 5.97 5.14 2.97 1.97 1.51
18.06 10.69 6.03 4.19 2.96 1.99 1.5
40.64 10.85 6.1 4.25 3.02 2.02 1.53
54.29 12.21 6.27 4.35 3.08 2.08 1.58
103.88 19.99 6.73 4.62 3.26 2.2 1.68
2.77 1.32 0.67 0.43 0.3 0.18 0.13
2.73 1.28 0.64 0.42 0.31 0.23 0.2
2.66 1.21 0.6 0.43 0.36 0.31 0.29
2.66 1.05 0.56 0.44 0.41 0.38 0.37
2.26 0.86 0.51 0.44 0.42 0.41 0.41
2.29 1.22 0.67 0.44 0.31 0.2 0.14
2.5 1.28 0.69 0.46 0.32 0.2 0.14
2.87 1.36 0.72 0.48 0.33 0.21 0.15
3.65 1.48 0.75 0.49 0.34 0.22 0.16
3.89 1.54 0.75 0.49 0.35 0.24 0.19
4
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
12
2.2. Moment Method Estimation The moment method uses combinations of moment of the DSD to estimate the DSD parameters. The i th moment Mi of the DSD is calculated by
10
∞
i
Mi ¼ ∫ D NðDÞdD:
ð5Þ
0
8
M234−nb M346−nb ML−nb LM−nb M234−b0.2 M346−b0.2 ML−b0.2 LM−b0.2 M234−b0.3 M346−b0.3 ML−b0.3 LM−b0.3 M234−b0.4 M346−b0.4 ML−b0.4 LM−b0.4 M234−b0.5 M346−b0.5 ML−b0.5 LM−b0.5
Mean ˆ
M346
6 M234
4
ML & LM
2
If the DSD follows the gamma distribution, Mi is Mi ¼ NT
2
4
6
8
10
ð6Þ
where Γ(y) is the complete gamma function. Explanation of moment estimators for the DSD and drop-by-drop data (without grouping the data into classes) was given by Marzuki et al. (2010a). Two moment estimators are again used here. The M234 estimator is based on the 2nd, 3 rd, 4th order moments. Similarly another estimators is referred to M346, which is based on the 3nd, 4 rd, 6th order moments. 2.3. Maximum Likelihood Estimation
0 0
Γ ði þ μ þ 1Þ : Λ i Γ ðμ þ 1Þ
12
Fig. 4. Variation of the mean μ^ for different values of bin width indicated in the legend and the drop-by-drop case (nb), using LM, ML, and MM, with shape parameter. Population DSD: gamma, μ = 0 − 12, Λ = 4, NT = 100.
Suppose there is a set D1, D2, … , DC of C observations, coming from a certain distribution. The idea behind the method of maximum likelihood is to first find the joint density function for all observations which is called likelihood function defined by C
the same sampling volume for all diameters. By combining (2) and the equation of ∫f(D;θ)dD = 1, we got NðDÞ ∫ dD ¼ 1: NT
ð4Þ
For the total number of raindrops NT, the probability of drops within the interval D ±(ΔD/2) is Cb/NT, where Cb is the number of drops in the specified interval. If we relate this basic definition to (4), the DSD is obtained. We generated 104 drop spectra for each initial value.
φðθÞ ¼ f ðD1 ; D2 ; …; DC ; θÞ ¼ ∏ f ðDi ; θÞ:
ð7Þ
i¼1
2.3.1. For DSD In case of DSD (quantized data), D and C in (7) is the midsize and the number of bins in the sample, respectively. The likelihood function in term of DSD is (Haddad et al., 1996) C
φðθÞ ¼ ∏ ½ f ðDi ; θÞ
Ni
ð8Þ
i¼1
where Ni corresponds to DSD. In practice it is always more convenient to work with the scaled logarithm of the likelihood
Table 2 Mean and root mean square error (RMSE) estimated shape parameter for a gamma DSD (Λ = 4, NT = 102) as a function of the shape parameter. μ
M234 nb
M346
ML
LM
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
nb
b0.2
b0.3
b0.4
b.05
^ Mean μ 0 1.51 2 3.08 4 4.98 6 7.01 8 8.96 10 11.05 12 13.03
1.89 3.13 4.99 7 8.94 11.03 13.06
2.26 3.23 5 6.99 8.93 11.01 13.02
2.42 3.46 5.02 6.98 8.9 10.96 12.95
2.16 3.88 5.06 6.96 8.86 10.91 12.88
5 5.21 6.66 8.52 10.32 12.33 14.25
5.17 5.21 6.68 8.52 10.32 12.33 14.31
5.67 5.27 6.71 8.53 10.32 12.32 14.29
6.7 5.37 6.75 8.54 10.31 12.3 14.22
7.97 5.6 6.77 8.55 10.31 12.27 14.18
0.03 2.09 4.16 6.22 8.26 10.33 12.36
0.73 1.93 4.06 6.13 8.19 10.26 12.31
1.38 1.8 3.93 6.03 8.1 10.18 12.22
2.37 1.7 3.75 5.9 7.97 10.06 12.1
3.92 1.65 3.52 5.72 7.82 9.91 11.95
0.01 2.05 4.08 6.11 8.12 10.18 12.16
0.5 2.05 4.09 6.11 8.13 10.18 12.17
1.33 2.03 4.09 6.12 8.13 10.19 12.18
3.07 1.99 4.09 6.12 8.13 10.19 12.17
4.98 1.94 4.1 6.12 8.14 10.19 12.17
RMSE μ^ 0 2 2 1.9 4 2.15 6 2.44 8 2.72 10 3.04 12 3.31
2.42 1.94 2.15 2.44 2.7 3.02 3.25
2.8 2.04 2.15 2.43 2.7 3.01 3.24
2.93 2.26 2.17 2.42 2.68 2.99 3.21
2.66 2.68 2.2 2.41 2.66 2.96 3.2
5.64 5.14 4.04 4.23 4.41 4.73 4.99
5.84 4.19 4.05 4.23 4.41 4.71 4.94
6.48 4.25 4.08 4.23 4.41 4.7 4.94
8.05 4.35 4.11 4.24 4.4 4.7 4.91
9.92 4.62 4.13 4.24 4.4 4.68 4.91
0.13 0.43 0.75 1.05 1.36 1.65 1.91
0.75 0.42 0.74 1.03 1.34 1.63 1.89
1.41 0.43 0.74 1.02 1.33 1.61 1.87
2.46 0.44 0.78 1.02 1.31 1.59 1.85
4.13 0.44 0.87 1.05 1.32 1.58 1.84
0.16 0.44 0.75 1.05 1.35 1.64 1.89
0.54 0.46 0.76 1.06 1.36 1.65 1.9
1.41 0.48 0.78 1.08 1.38 1.66 1.91
3.31 0.49 0.8 1.1 1.39 1.68 1.93
5.87 0.49 0.83 1.12 1.42 1.7 1.96
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
Cumulative distribution
=2Λ=4
0.8 0.6 0.4
M234
0.2 0
2
0
4
0.6
nb midsize
0.4
mean mode
0.2
median
0
2
4
6
8
10
12
1
Cumulative distribution
Cumulative distribution
1 0.8 0.6 0.4 0.2
M346
0
0
2
4
6
8
0.8 0.6 0.4 0.2 0
10
0
5
10
15
2
4
6
8
1
Cumulative distribution
1
Cumulative distribution
0.8
0
6
=5Λ=4
Population
1
Cumulative distribution
Population
1
5
0.8 0.6
ML(Black) LM(Gray)
0.4 0.2 0
0.8 0.6 0.4 0.2 0
0
1
2
3
4
Fig. 5. Cumulative distribution of μ^ using ML, LM and MM methods, obtained by using several representative values of ΔD. ”Midsize”, ”mean”, ”mode” and ”median” denote midsize of bin (0.2 mm), mean, mode and median of drops in each class are employed as ΔD, respectively. Vertical dashed line indicates the population value. Population DSD: gamma, μ = 2 and 5, Λ = 4, NT = 100. Total sample for each initial parameter is 104.
function called the log-likelihood, lnφ(θ). The ML method tries to find θ that maximizes φ(θ) or lnφ(θ). From (3), φ(θ) and lnφ(θ) for the gamma distribution, respectively, are "
C
φðθÞ ¼ ∏ i¼1
μþ1
Λ μ −ΛDi D e Γ ðμ þ 1Þ i
lnφðθÞ ¼ ðμ þ 1ÞlnΛ
#N
The scale parameter (Λ) can be obtained by solving the derivative of (10) with respect to Λ, given by Λ¼
i
;
ðμ þ 1Þ∑Ci¼1 N i : ∑Ci¼1 Ni Di
ð11Þ
ð9Þ Substituting (11) in (10), the shape parameter (μ) can be then determined numerically by solving the derivative of (10), with respect to μ, and yielding the following equation:
C X
Ni
i¼1
C C X X −lnΓ ðμ þ 1Þ Ni þ μ Ni lnDi C X −Λ N i Di : i¼1
i¼1
i¼1
ð10Þ
C
∑i¼1 Ni lnDi ∑Ci¼1 Ni C C X X þln Ni Di −ln N i ;
lnðμ þ 1Þ−Ψðμ þ 1Þ ¼ −
i¼1
i¼1
ð12Þ
6
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
where Ψ is the "psi" or "digamma" function defined by ΨðxÞ ¼
d Γ ′ ðxÞ lnΓ ðxÞ ¼ : dx Γ ðxÞ
that X1 : C ≤ X2 : C … XC : C, then rth L-moment of X is defined as (Hosking, 1990) ð13Þ lr ≡ r
Posing α = μ + 1, (12) can be resolved recursively, as detailed in Kliche et al. (2008)
α jþ1 ¼ α j
ln α j −Ψ α j Y
;
α1 ¼
1þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ 4expðY Þ=3 : 4expðY Þ
k
ð−1Þ
k¼0
r−1 EfX r−k:r g; r ¼ 1; 2; … k
ð20Þ
ð14Þ
ð15Þ
br ¼ C
−1
C X ði−1Þði−2Þ…ði−rÞ x : ð C−1 ÞðC−2Þ…ðC−r Þ i:C i¼1
ð21Þ
Writing the L-moments in terms of PWMs we have ð16Þ
After getting α, the shape parameter is calculated by μ = α − 1. It can be seen in (8) that there is no ML estimator for NT. The value of NT can be estimated by the 0 th moment of DSD (M0). 2.3.2. For drop-by-drop data Follow the same procedure as above but taking the likelihood function in (7), we get Λ for drop-by-drop data as Λ¼
r −1 X
where E{.} is the expectation of an order statistic. Since Lmoments are the linear function of Probability Weighted Moments (PWMs) a simple description of L-moments can be expressed in terms of the PWMs (Hosking, 1990). Using order statistics, an unbiased estimate of sample PWMs can be computed by
with Y and α1 given by ! ∑Ci¼1 Ni Di ∑Ci¼1 Ni lnDi ; Y ¼ ln − ∑Ci¼1 Ni ∑Ci¼1 Ni
−1
ðμ þ 1ÞC ðμ þ 1Þ ¼ ; D ∑Ci¼1 Di
l1 ¼ b0 ;
ð22Þ
l2 ¼ 2b1 −b0 ;
ð23Þ
l3 ¼ 6b2 −6b1 þ b0 ;
ð24Þ
l4 ¼ 20b3 −30b2 þ 12b1 −b0 :
ð25Þ
Let D1, D2, …, DC be a random sample of drop size with total number of drops C. Only the first two L-moments (l1 and l2) are needed to calculate the two parameters of the gamma
ð17Þ
where for this case C is the total number of raindrops and D is the average drop diameter. The equation to estimate μ for drop-by-drop data is 0 B B lnðμ þ 1Þ−Ψðμ þ 1Þ ¼ lnB @
1 D C
∏ Di
C C : 1=C C A
ð18Þ
i¼1
Eq. (18) as in (12) can be resolved recursively with Y given by 0
1
B B Y ¼ lnB @
D C
∏ Di
C C : 1=C C A
ð19Þ
i¼1
2.4. L-Moment Estimation L-moment estimators are "linear" combinations of the observations that do not require squaring or cubing of the observations. Therefore, they are less sensitive to the largest observations in a sample than product moment. Let X1, X2, …, XC be a random sample of size C with cumulative distribution function F(x) and quantile function x(F). If the samples are sorted into ascending order, such
Fig. 6. Time-raindrop size cross section of DSD from 2DVD observation on 12 April 2006 (a) and 12 April 2007 (b) at Kototabang. Black circled line is rainfall rate for each event.
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
distribution (μ, Λ), and l1 as well as l2 for this distribution are (Gupta and Kundu, 2003) l1 ¼
μ þ1 ; Λ
ð26Þ
l2 ¼
1 μ þ 1:5 : Λ Γ ð0:5ÞΓ ðμ þ 1Þ
ð27Þ
(30) are replaced with the midsize and the number of bins in the sample, respectively. In addition, number of i is replaced with the total number of raindrops in each class in which for the simulation data it is the same as M0 (NT). For measured DSD, C is not the same as NT (C > NT) because the sampling volume is dependent on the drop size. Thus, C for binned data must be calculated when generating the DSD from the HYD file. This file contains the hydrometer information of 2DVD observation.
The value of μ is estimated by using an iterative procedure from the dimensionless ratio of l2/l1. Once μ is determined, Λ is calculated by Λ¼
μ þ1 : l1
3. Results 3.1. Simulation
ð28Þ
3.1.1. ML and LM: Population μ = 2 and μ = 5 Fig. 1 shows an example of cumulative distribution for the ^ in a case estimated gamma shape (μ^ ) and scale parameter (Λ) of samples from initial parameters of μ = 2, Λ = 4 and NT = 100. The figure includes the results for binned and drop-bydrop samples calculated by the ML and LM methods. The legend of the figure also shows the average μ^ and Λ^ . Although ML and LM estimator give the results close to the true population value, the effect of bin width selection on the DSD parameters is clearly visible. The smaller the bin size, the closer the mean parameters to the initial values. Moment estimators overestimates the DSD parameters and the binning
The value of b0 and b1 which are needed to calculate l1 and l2 are given by b0 ¼
C 1X D ; C i¼1 i:C
ð29Þ
b1 ¼
C X 1 ði−1ÞDi:C : C ðC−1Þ i¼1
ð30Þ
Eqs. (29) and (30) are only applicable for the drop-bydrop data. In case of the quantized data, D and C in (29) and b0.2
20
b0.3
b0.4
b0.5
20
20
M234
20
ML
M346
LM
15
15
15
15
10
10
10
10
5
5
5
5
0
ˆ Bin Λ
7
0
10
20
0
0
10
20
0
0
10
20
0
25
25
25
25
20
20
20
20
15
15
15
15
10
10
10
10
5
5
5
5
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
0
10
10
20
20
ˆ Drop Λ
Fig. 7. Scatter plot of μ^ and Λ^ from the DSD collected by 2DVD data on 12 April 2006, for different values of bin width indicated in the legend and the drop-bydrop case (x-axis).
8
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
With small number of raindrop, the bias due to bin width selection for the MM is very large. However, the bias is small in the opposite case, consistent with the result in Marzuki et al. (2010a). The bias due to bin width selection for the LM and ML methods at small NT is also relatively more than at large NT. However, the bias is much more smaller in comparison with the moment estimator as inferred clearly by the RMSE (Table 1). Thus, bias due to bin width selection for the LM and ML methods is not much influenced by the mean sample size. The moment error and error due to bin width selection decrease as the number of drops increases so that at large NT (103) MM provides the parameter close to the initial value and that of ML and LM methods. Fig. 4 is the same as Fig. 3 but for several initial shape parameters and the mean μ^ and RMSE of this figure are summarized in Table 2. As discussed above that small μ indicates the DSD shifted more to the left, which implies more small drops, viceversa. The bin width selection has less effect on either estimator when μ is larger as aforementioned discussion. For all estimators, the mean μ^ variability among several bin sizes is more clearly observed at μ = 0 and slightly apparent at μ = 2 particularly for moment estimator. Thus, with samples from large μ, the bias due to bin width selection can be small or negligible (for the same NT). However, for μ = 0 (exponential case), the effect of bin size selection is very significant including for ML and LM methods. Population of μ = 0 indicates that the DSD consists of many small drops. As mentioned above,the population of
procedure increases the overestimation (Marzuki et al., 2010a). On the other hand, the ML and LM methods underestimate the parameters and the binning procedure increases the underestimation of the parameters. Fig. 2 is the same as Fig. 1 but for the gamma population with μ = 5. The bin width selection has less effect on either estimator when the population has a larger μ. This is due to the fact that the population of small-sized drops would mainly determine the parameters of the distribution if we use the ML and LM methods (Kliche et al., 2008). The relative error on drop diameter due to bin width selection is much more important for small than for larger drops. In case of μ = 5 the DSD is shifted more to the right, which implies fewer small drops and therefore gives more weight to large drops. Figs. 1 and 2 show that the effect of bin width selection is more significant for the ML than for the LM methods. The difference between the DSD parameters obtained from the DSD binned at 0.20 mm and 0.50 mm is more significant for the ML than for LM methods. Thus, the LM method is less sensitive to bin width selection than the ML method. 3.1.2. Comparison MM, ML and LM for several shapes of DSD Fig. 3 illustrates the variation of mean μ^ with NT for MM, ML and LM methods by applying several bin widths. The mean and root mean square error (RMSE) are given in Table 1. The ML and LM estimators give the results closest to the population value, while the moment estimators give the greatest bias (overestimation) as previously reported in Kliche et al. (2008). b0.2
20
b0.4
b0.5
20
M234
20
ML
M346
LM
15
15
15
15
10
10
10
10
5
5
5
5
0
ˆ Bin Λ
b0.3
20
0
10
20
0
0
10
20
0
10
0
20
0
25
25
25
25
20
20
20
20
15
15
15
15
10
10
10
10
5
5
5
5
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
10
20
0 0
10
20
ˆ Drop Λ
Fig. 8. The same as Fig. 7, but for rain event on 12 April 2007.
0
10
ˆ Drop Λ
20
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
9
of drop size as ΔD in moment estimator provide μ^ close to the population value, this method are known to be biased. From the unbiased method (ML and LM), the result of mode of drop size is far from the population value and drop-by-drop data basis. The bias of midsize of 0.2-0.3 mm bin is relatively small in comparison with larger bin sizes (Figs. 1–4). Therefore, with its simplicity, using midsize of bin as the representative value for the bin size of grouped data (ΔD) particularly for small bin sizes (0.20-0.30 mm) are acceptable, instead of using mean and median of drops that must be calculated for each spectra and each class.
small-sized drops would mainly determine the distribution parameters of ML and LM methods. Thus, binning the data of μ = 0 particularly by large bin size such as 0.4 and 0.5 mm would significantly influence the parameters of ML and LM as it is clearly observed from RMSE (Table 2). Fig. 4 again reinforce that the ML and LM methods provide the results that are closer to the initial value than moment estimators for all values of μ and bin sizes with the exception of μ = 0 for bin of 0.4 and 0.5 mm. Note that the current analysis is based on the same NT assumption for all values of μ. In nature, it is quite possible some DSDs having same μ but different NT and the effect of bin size selection will be also controlled by NT.
3.2. Measured DSD 3.1.3. Mean, modus and mean of drop size Actual size of drop distributed in the DSD (binned data), of course, is not exactly the same as the midsize of the bin. Consequently, if we choose a too large bin size, the difference between actual size of drop and the midsize of the bin will be large. Moreover, because drops also follow gamma distribution in each class, the mean value of drops in that class is not the midsize of the bin. Fig. 5 presents the corresponding cumulative distribution of μ^ for the gamma population with μ = 2 and 5, obtained by using several representative values of ΔD. It can be seen that employing midsize, mean and median of drop size as ΔD give the results closest to that of drop-by-drop data, for all estimators. Although usage of mode
midsize
mean
20
20
mode
median
20
M234
20
M346
ML
LM
15
15
15
15
10
10
10
10
5
5
5
5
0
ˆ Bin Λ
In this section we analyze the bias due to bin width selection of real DSD for two rain events collected by 2DVD at Kototabang on 12 April 2006 and 12 April 2007 (Fig. 6). Marzuki et al. (in press) classified these event as shallow convective and stratiform, respectively. They also discussed the DSD characteristics of these events and it is beyond the scope of this article to revisit it. Mean rainfall rate and drop count per minute of rain event on 12 April 2006 are about 19 mm/h and 1942 drops, respectively. Fig. 7 shows scatter plot of the DSD parameters (μ^ , Λ^ ) of binned data versus drop by drop data basis. Consistent with the simulation (Fig. 3), the effect of binning is not clearly visible for moment estimator
0
10
20
0
0
10
20
0
0
10
20
0
25
25
25
25
20
20
20
20
15
15
15
15
10
10
10
10
5
5
5
5
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
0
10
10
20
20
ˆ Drop Λ
Fig. 9. Scatter plot of μ^ ) and Λ^ from the DSD collected by the 2DVD on 12 April 2006, obtained by using several representative values of ΔD. ”Midsize”, ”mean”, ”mode” and ”median” denote that midsize of bin (0.2 mm), mean, mode and median of drops in each class are employed as ΔD, respectively.
10
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
NT, MM is less sensitive than ML method. However, as discussed above, the moment estimator is inherently biased in which it tends to give erroneous DSD parameters unless the total number concentration of drops are small (Fig. 3) and it applies to all shape of DSD (Fig. 4). As we observed from this study, the method bias more influences the DSD parameter than bin width selection. Although μ^ obtained by ML in Figs. 3 and 4 are biased by the bin size selection, its parameter is still closer to the initial value than all parameters obtained by MM. Thus, even though ML is sensitive to bin width selection, the method is still better than MM in which it gives parameters closer to the initial value (for all bins), particularly for small NT. When the number of drop is large (> 10 3), the moment error is small so that the bias due to bin width selection of ML is more visible than moment estimators as in Figs. 7 and 8.
and LM method because the number of drop is very large (> 10 3) and more robust in ML method. For moment estimator, such effect can be observed at large μ^ (> 10) which is belong to the light rain (small NT) at the end of this event. From this event we can see again that the parameters of small bin sizes (0.2 and 0.3 mm) are very close to that obtained from the drop-by-drop data basis. Mean rainfall rate and the number of drops of rain event on 12 April 2007 are smaller than during 12 April 2006 which are about 5.4 mm/h and 1317 drops, respectively. Thus, the effect of binning procedure is more clearly observed in which more scatter is apparent (Fig. 8). From this two events we can see again that the DSD parameters of small bin sizes (0.2 and 0.3 mm) are very close to that of drop-by-drop data basis. Figs. 9 and 10 present the scatter plot of μ^ and Λ^ obtained by using several representative values of ΔD versus those obtained from drop by drop data, for rain event on 12 April 2006 and 12 April 2007, respectively. Consistent with the simulation, employing midsize as ΔD give the results closest to that of drop-by-drop data, for all estimators. Furthermore, the usage of mode, mean and median of drop size as ΔD give much different parameters from that of drop-by-drop data basis. Therefore, it emphasizes that using midsize of bin as the representative value for the bin of grouped data (ΔD) are better than using mean, mode and median of drops. From Figs. 7 and 8 as if the performance of MM is better than ML method. In term of the bin width selection at large midsize
20
The main goal for the present work was to evaluate the biases and uncertainties due to bin width selection in estimating the parameters of population raindrop size distribution functions, using the ML, LM and MM methods. This paper is a comprehensive follow-up of the previous study on the bias due to bin width selection which employed MM estimators that are known to be biased. Therefore, comparison of MM and two less biased methods (ML and LM methods) can help us to clarify the
mean
mode
median
20
20
20
M234
M346
ML
LM
15
15
15
15
10
10
10
10
5
5
5
5
0
ˆ Bin Λ
4. Conclusions
0
10
20
0
0
10
20
0
0
10
20
0
25
25
25
25
20
20
20
20
15
15
15
15
10
10
10
10
5
5
5
5
0
0
10
ˆ Drop Λ
20
0
0
10
ˆ Drop Λ
20
0
0
10
20
ˆ Drop Λ
Fig. 10. The same as Fig. 9, but for rain event on 12 April 2007.
0
0
0
10
10
ˆ Drop Λ
20
20
Marzuki et al. / Atmospheric Research 112 (2012) 1–11
previously reported result because we can study the fluctuations in the DSD parameters in the absence of method bias. Our result shows that calculating the parameter of the gamma distribution from data grouped into classes using MM method, considerable relative errors resulted and consistent with the previous study, particularly for small number of drops. In the other hand, LM method is less sensitive to the bin width selection than maximum likelihood and moment methods. The bias due to bin width selection for LM method and ML methods is not much influenced by the mean sample size in comparison with that of MM . Thus, because MM is more sensitive to bin width selection, it is important to keep in mind the sensitivity to the bin width selection by which DSDs are determined when comparing the DSD and integral rainfall parameters from various studies calculated by MM, all of which may have different bin size for quantization of DSD. However, with samples from large number of drops or from the DSD having a larger shape parameter μ, the bias due to bin width selection for all estimators can be small or negligible. Finally, in case of 2DVD, using the midsize of bin as the representative value for the class (bin) of binned data was acceptable because it gives the results close to drop-bydrop data in comparison with mean, mode and median of raindrop size. Unlike the 2DVD, the widths of the bin used in some other instruments such as the Joss and Waldvogel disdrometer and Parsivel are not uniform and increase as drop size increases. The effect of discretization for nonuniform bin size will be reported on at a later date. Acknowledgements 2DVD observation at Kototabang is supported by Grant-Aid for Scientific Research on Priority Areas funded by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan. The first author was with Graz University of Technology, Austria, and was supported by an ASEA-Uninet scholarship from the Austrian Government from Oct 2007 - Sept 2010, and now with the Research Institute for Sustainable Humanosphere
11
(RISH), Kyoto University, Japan, as a postdoctoral fellowship of the Japan Society for the Promotion of Science. References Fraile, R., Palencia, C., Castro, A., Giaiotti, D., Stel, F., 2009. Fitting an exponential distribution: Effect of discretization. Atmos. Res. 93, 636–640. Gupta, R.D., Kundu, D., 2003. Closeness of gamma and generalized exponential distribution. Commun. Stat. Theory Methods 32, 705–721. Haddad, Z.S., Durden, S.L., Im, E., 1996. Parameterezing the raindrop size distribution. J. Appl. Meteorol. 35, 3–13. Hosking, J.R.M., 1990. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B 52, 105–124. Joss, J., Waldvogel, A., 1969. Raindrop size distribution and sampling size errors. J. Atmos. Sci. 26, 566–569. Kliche, D.V., Smith, P.L., Johnson, R.W., 2008. L-Moment estimators as applied to gamma drop size distributions. J. Appl. Meteor. Climatol. 47, 3117–3130. Löffler-Mang, M., Joss, J., 2000. An optical disdrometer for measuring size and velocity of hydrometeors. J. Atmos. Oceanic Technol. 17, 130–139. Mallet, C., Barthes, L., 2009. Estimation of Gamma Raindrop Size Distribution Parameters: Statistical Fluctuations and Estimation Errors. J. Atmos. Oceanic Technol. 17, 1572–1584. Marzuki, Randeu, W.L., Schonhuber, M., Bringi, V.N., Kozu, T., Shimomai, T., 2010a. Raindrop size distribution parameters of distrometer data with different bin sizes. IEEE Trans. Geosci. Remote. Sens. 48, 3075–3080. Marzuki, Kozu, T., Shimomai, T., Hashiguchi, H., Randeu, W.L., Vonnisa, M., 2010b. Raindrop size distributions of convective rain over equatorial Indonesia during the first CPEA campaign. Atmos. Res. 96, 645–655. Marzuki, Randeu, W.L., Schonhuber, M., Kozu, T., Hashiguchi, H., Shimomai, T., in press. Raindrop axis ratios, fall velocities and size distribution over Sumatra from 2D-Video Disdrometer measurement. Atmos. Res., http:// dx.doi.org/10.1016/j.atmosres.2011.08.006 Schönhuber, M., Lammer, G., Randeu, W.L., 2008. The 2-D-Video-Disdrometer. In: Michaelides, S. (Ed.), Precipitation: Advances in Measurement, Estimation and Prediction. Springer. ISBN: 978-3-540-77654-3. Shimazaki, H., Shinomoto, H., 2007. A method for selecting the bin size of a time histogram. Neural Comput. 19, 1503–1527. Smith, P.L., Kliche, D.V., 2005. The bias in moment estimators for parameters of drop size distribution functions: sampling from exponential distributions. J. Appl. Meteorol. 40, 1195–1205. Tokay, A., Short, D.A., 1996. Evidence from tropical raindrop spectra of the origin of rain from stratiform versus convective clouds. J. Appl. Meteorol. 35, 355–371. Uijlenhoet, R., Porrà, J.M., Sempere-Torres, D., Creutin, J.-D., 2006. Analytical solutions to sampling effects in drop size distribution measurements during stationary rainfall: Estimation of bulk rainfall variables. J. Hydrol. 328, 65–82.