Proceedings of the 20th World Congress
The International Federation of Automatic Control
Toulouse, France, July 9-14, 2017

Available online at www.sciencedirect.com
ScienceDirect
IFAC PapersOnLine 50-1 (2017) 5474–5479

The Novel Dimensional Reduction Method and Tikhonov Regularisation in Parameter Identification of Non-Linear Ill-Posed Problems
Shaun M. Davidson*, Paul D. Docherty*, Jörn Kretschmer**, Rua Murray*** Shaun Shaun M. M. Davidson*, Davidson*, Paul Paul D. D. Docherty*, Docherty*, Jörn Jörn Kretschmer**, Kretschmer**, Rua Rua Murray*** Murray*** Shaun M. Davidson*, Paul D. University Docherty*,ofJörn Kretschmer**, Rua Murray*** *Department of Mechanical Engineering, Canterbury, Christchurch, New Zealand (e-mail: *Department of Mechanical Engineering, University of Canterbury, Christchurch, New (e-mail:
[email protected],
[email protected]) *Department of Mechanical Engineering, University of Canterbury, Christchurch, New Zealand Zealand (e-mail: *Department ofTechnical Mechanical Engineering, University
[email protected]) Canterbury, Christchurch, New Zealand (e-mail:
[email protected], **Institute of Medicine, Furtwangen University, 78054 Villingen-Schwenningen, Germany
[email protected],
[email protected])
[email protected],
[email protected]) **Institute Medicine, Furtwangen University, 78054
[email protected]) **Institute of of Technical Technical Medicine, (e-mail: Furtwangen University, 78054 Villingen-Schwenningen, Villingen-Schwenningen, Germany Germany **Institute of Technical Medicine, Furtwangen University, 78054 Villingen-Schwenningen, Germany (e-mail:
[email protected]) ***Department of Mathematics and Statistics, of Canterbury, Christchurch, New Zealand (e-mail: (e-mail: University
[email protected]) (e-mail:
[email protected]) ***Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
[email protected]) ***Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand (e-mail: (e-mail: ***Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand (e-mail:
[email protected])
[email protected])
[email protected]) Abstract: One of the most popular means of addressing premature declaration of convergence due to low Abstract: One most means declaration of due to error gradients in the parameter identification Tikhonovpremature regularisation. Tikhonov regularisation Abstract: One of of the most popular popular means of ofisaddressing addressing premature declaration of convergence convergence dueutilises to low low Abstract: One of the most popular means of addressing premature declaration of convergence due to low error gradients in parameter identification is Tikhonov regularisation. Tikhonov regularisation utilises additional a-priori information to modify the objective surface, and thus is not a true least squares error gradients in parameter identification is Tikhonov regularisation. Tikhonov regularisation utilises error gradients in parameter identification is regularisation. Tikhonov utilises additional a-priori information to the objective surface, and thus not aa true least squares approach. The Dimensional Reduction Method is a framework beis around parameter additional a-priori information to modify modify theTikhonov objective surface, that and can thus is placed notregularisation true least squares additional a-priori information to modify the objective surface, and thus is not a true least squares approach. The Dimensional Reduction Method is a framework that can be placed around parameter identification approaches, restricting iteration to hyperplanes where low error gradients exist, thus allowing approach. The Dimensional Reduction Method is a framework that can be placed around parameter approach.within Theapproaches, Dimensional Reduction Method framework that can be placedexist, around identification approaches, restricting iteration to hyperplanes hyperplanes where low error error gradients exist, thusparameter allowing iteration these areasrestricting without modification ofisthea objective surface. A comparison between the ability identification iteration to where low gradients thus allowing identification approaches, restricting iteration to hyperplanes where low error gradients exist, thusthe allowing iteration these without modification of surface. A between ability of the twowithin methods to areas accurately identify parameters onobjective the highly non-linear pulmonary recruitment model iteration within these areas without modification of the the objective surface. A comparison comparison between the ability iteration within these areas without modification of the objective surface. A comparison between the ability of the two methods to accurately identify parameters on the highly non-linear pulmonary recruitment model was The Reduction Method able tonon-linear produce lower residuals in the majority of theundertaken. two methods to Dimensional accurately identify parameters onwas the highly pulmonary recruitment model of the two methods to accurately identify parameters on the highly non-linear pulmonary recruitment model was undertaken. The Dimensional Reduction Method was able to produce lower residuals in the majority and statistically significant improvement in able bothtoparameter errorresiduals and model wascases, undertaken. The Dimensional Reduction Method was produce lower in theresiduals. majority was undertaken. The Dimensional Reduction Method was able to produce lower residuals in the majority of cases, and statistically significant improvement in both parameter error and model residuals. 
Additionally, the two approaches can be implemented simultaneously, which further improved model of cases, and statistically significant improvement in both parameter error and model residuals. of cases,and andparameter significant insimultaneously, both parameter error and model residuals. Additionally, the two can implemented which further improved model residuals errors, demonstrating the modularity of each approach and that both approach illAdditionally, thestatistically two approaches approaches can be beimprovement implemented simultaneously, which further improved model Additionally, the two approaches can be implemented simultaneously, which further improved model residuals and parameter errors, demonstrating the modularity of each approach and that both approach posed problems in a distinct manner. Overall, this provides a strong case for the advantages of the residuals and parameter errors, demonstrating the modularity of each approach and that both approach illillresiduals and Reduction parameter errors, manner. demonstrating the this modularity of aaeach approach and the thatadvantages both approach illposed in Overall, provides strong case for of Dimensional Method over current parameter identification methods for addressing posed problems problems in aa distinct distinct manner. Overall, this provides strong case designed for the advantages of the the problems in a distinct Overall, this provides a strong case designed for the advantages of the Dimensional Reduction Methodmanner. over current current parameter identification methods designed for addressing addressing illposed problems. Dimensional Reduction Method over parameter identification methods for illDimensional Reduction Method over current parameter identification methods designed for addressing illposed problems. posed problems. © 2017,problems. IFAC (International of Automatic Hosting by Elsevier Identification, Ltd. All rightsIdentifiability reserved. Keywords: Gradient Methods,Federation Regularisation, ParameterControl) Identification, Least-Squares posed Keywords: Gradient Methods, Regularisation, Parameter Identification, Least-Squares Identification, Identifiability Keywords: Gradient Methods, Regularisation, Parameter Identification, Least-Squares Identification, Identifiability Keywords: Gradient Methods, Regularisation, Parameter Identification, Least-Squares Identification, Identifiability
1. INTRODUCTION

Parameter identification algorithms are designed to determine the parameter values that fit a model to data of some observed phenomenon (Docherty, 2009, Schranz et al., 2011, Revie et al., 2013, Delforge et al., 1990) or performance criteria (Fischer et al., 1987). The error field of model-data residuals generated by varying each of n model parameters is an n-dimensional surface termed the objective surface. Parameter sets that provide the best fit between the model and data, or the desired performance, occur at the minimum value of the objective surface, and are termed global minima. While a variety of parameter identification methods have been developed to efficiently and accurately locate this global minimum, all will occasionally converge to suboptimal results that are distant from the global minimum.

The popular Levenberg-Marquardt (LMQ) algorithm (Marquardt, 1963, Levenberg, 1944) is fast and robust, and has become established as a preferred method for researchers. This method iterates quickly using second order approximations of the objective surface, and converts to first order iteration if the objective surface is poorly estimated by second order approximations. Despite the widespread use of LMQ, some poorly conditioned models can cause the algorithm to halt convergence prior to identification of the global minimum. This can be caused by convergence to a local minimum in the objective surface (Goldstein and Price, 1971) or because LMQ arrives at a point on the objective surface where the direction of further gradient descent is poorly defined (Docherty et al., 2012a, Davidon, 1991, Docherty et al., 2012b, Azarm, 2004).

A wide variety of gradient based parameter identification methods are affected by premature declaration of convergence due to low error gradients, and a considerable amount of work has been done to develop methods addressing this problem. One of the most popular of these is Tikhonov Regularisation (Golub et al., 1999, Calvetti et al., 2000, Engl et al., 1989), also known as L2 Regularisation or Ridge Regression, which involves the addition of a-priori information about expected parameter values in order to modify the objective surface. This can have the effect of increasing gradients near where the expected solution lies, allowing gradient descent to continue iterating in an area it would otherwise consider 'flat'.

However, Tikhonov regularisation is not a pure least squares method of addressing this issue, instead relying on a-priori information and modification of the objective surface. A method to address this issue of premature convergence in ill-conditioned regions of the objective surface, while preserving the exact model-data error field, was desired. The Dimensional Reduction Method (DRM) is a framework that can be placed around the LMQ, which undertakes n convergence paths from random initial conditions in an n-dimensional parameter space (Davidson et al., 2017).
A hyperplane is declared based on the terminal points of these paths. On the (n-1) hyperplane, (n-1) convergence paths are undertaken. This process of dimensional reduction is continued until convergence is undertaken on a single line defined across the n-dimensional parameter space. In this way, DRM restricts LMQ to operating only within these low gradient hyperplanes, thus allowing LMQ to detect smaller error gradients that are otherwise drowned out by large changes in the objective surface in other directions.

This paper presents a comparison between the performance of LMQ augmented by Tikhonov Regularisation and the performance of LMQ augmented by the DRM. This analysis is undertaken using a-priori data on a 5 parameter, ill-conditioned physiological model, and is designed to determine whether the DRM provides any benefit over the typical LMQ method when regularisation is used. This methodology may enable the LMQ to escape the premature convergence points encountered by the standard LMQ algorithm.

2. METHODS

2.1 The Dimensional Reduction Method

The dimensional reduction method (DRM) is designed to overcome problems with premature declaration of convergence that have been observed with some gradient descent methods (Docherty et al., 2014, Docherty et al., 2012b). The methodology forms a framework around a parameter identification method; in this research the DRM uses the Levenberg-Marquardt algorithm as a base. The method iteratively reduces the number of dimensions that convergence occurs on by eliminating directions that iteration halts on. Note that all vectors are column vectors.

Given a model with an output u, dependent on a set of parameters y ∈ ℝⁿ, the model objective surface, ψ, relative to a set of observations u_S at times t_S can be expressed:

\psi(y) = u(t_S, y) - u_S    (1)

The optimal set of parameters, y^opt, which provides the closest match to measured behaviour can be found by determining:

y^{opt} = \mathrm{argmin}_y \, \| \psi(y) \|^2    (2)

2.1.1 Initialisation of the Method

Given an expected range of values for each element of y, [y_{1,min}, y_{1,max}] × ⋯ × [y_{n,min}, y_{n,max}], the parameter set y can be transformed into the normalised parameter set x ∈ ℝⁿ by transforming the domain to the unit cube in ℝⁿ:

y = A_0 x + \zeta_0    (3a)

\zeta_0 := y_{min}    (3b)

A_0 := \mathrm{diag}(y_{max} - y_{min})    (4)
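As an illustration of this initialisation, the sketch below (Python/NumPy; our own reading of Eqs. (3a)-(4) rather than the authors' implementation, with hypothetical function and variable names) builds the affine normalisation and draws the n random start points in the unit cube.

```python
import numpy as np

def initialise_drm(y_min, y_max, rng=np.random.default_rng(0)):
    """Build the affine normalisation of Eqs. (3a)-(4), y = A0 x + zeta0, which maps
    the unit cube onto the expected parameter ranges, and draw n random start points."""
    y_min = np.asarray(y_min, dtype=float)
    y_max = np.asarray(y_max, dtype=float)
    n = y_min.size
    A0 = np.diag(y_max - y_min)      # Eq. (4)
    zeta0 = y_min.copy()             # Eq. (3b)
    X0 = rng.random((n, n))          # columns are the n start points x_j^(0) in [0, 1]^n
    return A0, zeta0, X0

# Hypothetical 3-parameter example with expected ranges [0, 1] x [5, 15] x [0.01, 0.1]
A0, zeta0, X0 = initialise_drm([0.0, 5.0, 0.01], [1.0, 15.0, 0.1])
y_first_start = A0 @ X0[:, 0] + zeta0    # Eq. (3a): de-normalise the first start point
```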
Select n sets of random initial points within the normalised parameter set x_j^(0) ∈ [0,1]ⁿ (j = 1, …, n), and for each initial point use gradient descent to determine argmin_x ‖ψ(A₀x + ζ₀)‖². Let this set of n gradient descent determined optima be denoted {x_j^(0),opt}_{j=1}^{n}. The affine span of these n optima is an (n − 1)-dimensional hyperplane, and parametrisation of this hyperplane allows dimensional reduction and continued parameter optimisation restricted to this hyperplane.

2.1.2 First Dimension Reduction

Define vector θ^(n) ∈ ℝⁿ normal to the affine span of {x_j^(0),opt}_{j=1}^{n} such that:

(x_j^{(0),opt})^T \theta^{(n)} = 1    (5)

for j = 1, …, n, a system of linear algebraic equations that uniquely defines θ^(n). Using θ^(n), define i* = argmax_i |θ_i^(n)|, corresponding to the component of x^(0) that is most perpendicular to the hyperplane being defined, and thus likely the most accurately identified at this point. Parametrisation of the hyperplane is performed using the (n − 1) coordinates excluding i*. Define an (n − 1) × n matrix B^(n):

B_{ij}^{(n)} = \begin{cases} 1, & i < i^*,\ j = i \\ 1, & i \ge i^*,\ j = i + 1 \\ 0, & \text{otherwise} \end{cases}

Let x^(1) = B^(n) x^(0) ∈ ℝ^(n−1) be the result of removing the i* component of x^(0). Converting back from x^(1) to x^(0) is possible via Eq. 6:

x^{(0)} = \left( I_n - \frac{e^{(n)} \theta^{(n)T}}{\theta^{(n)T} e^{(n)}} \right) B^{(n)T} x^{(1)} + \frac{e^{(n)}}{\theta^{(n)T} e^{(n)}}    (6)

where e^(n) = (0, …, 0, 1, 0, …, 0) ∈ ℝⁿ is the coordinate vector for the axis corresponding to i* and I_n is an identity matrix. To allow conversion between x^(1) and valid model parameters y, update:

A_1 := A_0 \left( I_n - \frac{e^{(n)} \theta^{(n)T}}{\theta^{(n)T} e^{(n)}} \right) B^{(n)T}    (7a)

\zeta_1 = \zeta_0 + A_0 \frac{e^{(n)}}{\theta^{(n)T} e^{(n)}}    (7b)

and let y = A₁x^(1) + ζ₁. Define j* ≔ argmax_j ‖ψ(A₀x_j^(0),opt + ζ₀)‖², the previous optimum with the highest ψ value. This optimum is excluded as a start point for the next iteration, the set of which is defined:

\{ x_k^{(1)} \}_{k=1}^{n-1} = \{ B^{(n)} x_j^{(0),opt} \}_{j \ne j^*}    (8)
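The linear algebra of this reduction step (Eqs. (5)-(8)) is compact enough to sketch directly. The following Python/NumPy function is an illustrative interpretation of ours (the function and variable names are not from the paper); it takes the current optima, the current affine map and the residual norms, and returns the reduced map and start points.

```python
import numpy as np

def reduce_dimension(X_opt, A, zeta, psi_norms):
    """One DRM reduction step. X_opt: columns are the current optima x_j^opt (assumed
    affinely independent); A, zeta: current affine map y = A x + zeta; psi_norms:
    ||psi|| at each optimum. Returns the updated map and the reduced start points."""
    d = X_opt.shape[0]                                   # current dimension
    theta = np.linalg.solve(X_opt.T, np.ones(d))         # Eq. (5): (x_j^opt)^T theta = 1
    i_star = int(np.argmax(np.abs(theta)))               # coordinate to eliminate
    B = np.delete(np.eye(d), i_star, axis=0)             # (d-1) x d selection matrix
    e = np.zeros(d); e[i_star] = 1.0
    P = np.eye(d) - np.outer(e, theta) / (theta @ e)
    A_next = A @ P @ B.T                                 # Eq. (7a)
    zeta_next = zeta + A @ e / (theta @ e)               # Eq. (7b)
    j_star = int(np.argmax(psi_norms))                   # drop the worst previous optimum
    keep = [j for j in range(X_opt.shape[1]) if j != j_star]
    X_start_next = B @ X_opt[:, keep]                    # Eq. (8)
    return A_next, zeta_next, X_start_next
```

Since the later reductions of Section 2.1.3 have exactly the same form with n − m in place of n, the same routine can be reused unchanged at every step.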
2.1.3 Subsequent Dimension Reduction

Given a set of n − m initial conditions {x_j^(m)}_{j=1}^{n−m} ⊂ ℝ^(n−m), a matrix A_m ∈ ℝ^(n×(n−m)) and a vector ζ_m ∈ ℝⁿ as the output of the previous algorithm iteration, gradient descent is used to find the optima {x_j^(m),opt}_{j=1}^{n−m} corresponding to this set of initial conditions, by minimising ‖ψ(A_m x + ζ_m)‖².

If m = n − 1, a single optimum, x^opt = x_1^(n−1),opt, is arrived at. The optimum parameter choice as determined by the DRM is y^opt = A_m x^opt + ζ_m, and the algorithm stops.

Otherwise, iteration continues by defining the affine span of {x_j^(m),opt}_{j=1}^{n−m} through definition of a vector θ^(n−m):

(x_j^{(m),opt})^T \theta^{(n-m)} = 1 \quad (j = 1, \ldots, n - m)    (9)

and selecting i* = argmax_i |θ_i^(n−m)|. A (n − m − 1) × (n − m) matrix B^(n−m) is defined by removing the i* row of I_(n−m). Letting e^(n−m) = (0, …, 0, 1, 0, …, 0) ∈ ℝ^(n−m) be the coordinate vector for the i* coordinate axis, define:

A_{m+1} := A_m \left( I_{n-m} - \frac{e^{(n-m)} \theta^{(n-m)T}}{\theta^{(n-m)T} e^{(n-m)}} \right) B^{(n-m)T}    (10a)

\zeta_{m+1} = \zeta_m + A_m \frac{e^{(n-m)}}{\theta^{(n-m)T} e^{(n-m)}}    (10b)

Now define j* ≔ argmax_j ‖ψ(A_m x_j^(m),opt + ζ_m)‖², and select a set of start points for the subsequent iteration:

\{ x_k^{(m+1)} \}_{k=1}^{n-m-1} = \{ B^{(n-m)} x_j^{(m),opt} \}_{j \ne j^*}    (11)

This process is summarised in Figure 1.

Fig. 1. Summary of the Dimensional Reduction Method
2.2 Tikhonov Regularisation

Regularisation refers to a process of introducing additional a-priori information in order to solve an ill-posed problem or avoid overfitting. Tikhonov Regularisation (TKN) (Golub et al., 1999, Calvetti et al., 2000, Engl et al., 1989), also known as ridge regression or L2 regularisation, accomplishes this by penalising parameter divergence from a set of expected parameter values y_e. This process can be expressed as a modification of the model residual ψ, such that:

\psi_{TKN} = \psi + \alpha \| y - y_e \|^2    (11)

where α is the Tikhonov factor (which modifies the magnitude of the regulatory effect), and y_e is the a-priori expectation for parameter values.
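In a least-squares implementation, Eq. (11) is commonly realised by appending weighted penalty terms to the residual vector, so that the minimised quantity becomes ‖ψ(y)‖² + α‖y − y_e‖². The sketch below is our interpretation of that construction (not the authors' code); the names are illustrative.

```python
import numpy as np

def tikhonov_residual(residual, y, y_expected, alpha):
    """Augment the model residual psi(y) with Tikhonov penalty terms, so that the
    least-squares objective becomes ||psi(y)||^2 + alpha * ||y - y_expected||^2."""
    psi = residual(y)
    penalty = np.sqrt(alpha) * (np.asarray(y, float) - np.asarray(y_expected, float))
    return np.concatenate([psi, penalty])

# Usage with any least-squares solver (model_residual, y_e and y_start are placeholders):
# least_squares(lambda y: tikhonov_residual(model_residual, y, y_e, alpha=0.01), y_start)
```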
2.3 The Pulmonary Recruitment Model

The Pulmonary Recruitment (PR) model is a lung model which features five parameters and thirty alveolar layers designed to open at different pressure levels (Schranz et al., 2013, Schranz et al., 2012). The opening of these alveoli is modelled via a Heaviside function, which leads to discontinuities in the error surface (Docherty et al., 2014). The model is defined:

p_{aw}(t) = R \dot{V}(t) + p_a(t)    (12)

\dot{p}_a(t) = \dot{V}(t) \left( C_1 e^{-k p_a(t)} + \sum_{\xi} \mathbb{H}_{\xi}(t) \, C_N e^{-k (p_a(t) - TOP - 0.5\xi)} \right)^{-1}    (13)

where

\mathbb{H}_{\xi}(t) = \begin{cases} 0, & p_a(t) < (0.5\xi + TOP) \\ 1, & p_a(t) \ge (0.5\xi + TOP) \end{cases}    (14)

ξ = 1, 2, 3, …, 30 and y = [C_1, C_N, k, R, TOP]^T.

Model nomenclature is described in Table 1:

Table 1. PR Model Variable Summary

Symbol | Meaning                                             | Units
p_aw   | Airway Pressure                                     | mbar
V̇      | Flow Rate into Lung                                 | mL·s⁻¹
p_a    | Alveolar Pressure                                   | mbar
H_ξ    | Heaviside Function                                  | ∈ [0,1]
R      | Flow Resistance                                     | mbar·mL⁻¹
C_1    | Compliance of Initially Open Alveoli                | mL·mbar⁻¹
C_N    | Compliance of Recruitable Alveoli in the nth Layer  | mL·mbar⁻¹
k      | Distention Threshold                                | mbar⁻¹
TOP    | Opening Pressure                                    | mbar

The PR Model objective function was created, as in (Docherty et al., 2014), using an input function of:

\dot{V}(t) = \begin{cases} 200, & t < 10\ \mathrm{s} \\ 200 \cos\!\left( \frac{2\pi}{20} (t - 10) \right), & t \ge 10\ \mathrm{s} \end{cases}    (15)

and parameter values of C1 = 10, CN = 2, R = 0.1, TOP = 10 and k = 0.02, with 0.5 mbar standard deviation normally distributed noise at 100 Hz added to the input function.
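For concreteness, a minimal simulation of Eqs. (12)-(15) is sketched below in Python/NumPy. The forward-Euler integration, the zero initial alveolar pressure, the 100 Hz step size and the function name are our assumptions; the paper does not specify its numerical scheme.

```python
import numpy as np

def simulate_pr_model(y, t, v_dot):
    """Minimal simulation of the PR model, Eqs. (12)-(14).
    y = [C1, CN, k, R, TOP]; t: sample times (s); v_dot: flow into the lung (mL/s).
    Returns the airway pressure p_aw at the sample times."""
    C1, CN, k, R, TOP = y
    xi = np.arange(1, 31)                       # 30 recruitable alveolar layers
    p_a = np.zeros_like(t)                      # assumed zero initial alveolar pressure
    for i in range(1, t.size):
        H = (p_a[i - 1] >= 0.5 * xi + TOP).astype(float)            # Eq. (14)
        compliance = C1 * np.exp(-k * p_a[i - 1]) + np.sum(
            H * CN * np.exp(-k * (p_a[i - 1] - TOP - 0.5 * xi)))
        dp_a = v_dot[i - 1] / compliance                              # Eq. (13)
        p_a[i] = p_a[i - 1] + dp_a * (t[i] - t[i - 1])                # forward Euler
    return R * v_dot + p_a                                            # Eq. (12)

# Input flow of Eq. (15) sampled at 100 Hz over 20 s, with the parent parameter values
t = np.arange(0.0, 20.0, 0.01)
v_dot = np.where(t < 10.0, 200.0, 200.0 * np.cos(2 * np.pi / 20.0 * (t - 10.0)))
p_aw = simulate_pr_model([10.0, 2.0, 0.02, 0.1, 10.0], t, v_dot)
```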
2.4 Analysis of Results

All analysis was performed using MATLAB® (R2014a, 64-bit, The Mathworks®, Natick, MA, USA) on an Intel® Core™ i7-2600 CPU @ 3.40 GHz, with 16 GB of RAM. Comparison between the methods was achieved using a Monte Carlo methodology and the PR Model. For each run within the Monte Carlo analysis, random noise was added to the simulated data, and random start points within ±20% of the a-priori parameter values were selected. Two comparisons were made within the Monte Carlo analysis:

- A comparison across the least-squares values identified by the LMQ gradient descent with DRM and the least-squares values from the LMQ gradient descent with TKN. As DRM uses 5 start points and TKN only 1, TKN was run 5 times for every DRM run, with each of these 5 runs provided equivalent computational power to the single DRM run.
- A comparison across the least-squares values identified by LMQ gradient descent with DRM and TKN, and the least-squares values from LMQ gradient descent with TKN. Once again, TKN was run 5 times for every DRM+TKN run.

In order to optimise TKN performance, trials were run using a variety of Tikhonov factors, α, and the best performing value for α selected. While there is a variety of literature surrounding the selection of α, no fully generalizable method exists (Calvetti et al., 2000). y_e for regularisation was set equivalent to the randomly generated start point specific to that trial. This decision was made as the start points are within the range (±20%) of the global minima regularly encountered when working with physiological models, and typically parameter identification is begun at the expected value of the solution. Use of this definition of y_e demonstrated improved performance relative to LMQ without TKN, as in (Davidson et al., 2017).

For each trial, DRM was allocated 75 LMQ iterations per step (a total of 1125 iterations), while TKN was allocated 1125 iterations for every single start point. The model was run for 1000 start points (200 unique DRM runs and 1000 unique TKN runs). No convergence criteria were set, as the goal was to compare the ability of the methods to accurately locate global minima given equivalent computational resources. However, it is worth noting that, after the first dimensional reduction, individual DRM iterations begin to use less computational power than TKN iterations, and the dimension of the Jacobian matrices that must be calculated for each iteration is also reduced.
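The shape of a single Monte Carlo trial described above (noise injection, a start point within ±20% of the parent values, and a paired comparison of residuals) can be sketched as follows. The helpers identify_drm and identify_tkn are placeholders for DRM and Tikhonov-regularised LMQ runs such as those sketched earlier; the paired Student's t-test is applied to the resulting residual norms.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)
y_true = np.array([10.0, 2.0, 0.02, 0.1, 10.0])        # parent values [C1, CN, k, R, TOP]

def one_trial(identify):
    """One Monte Carlo trial: noisy data, a start point within +/-20% of the parent
    values, and the residual norm returned by the supplied identification routine."""
    noise = rng.normal(0.0, 0.5, size=2000)              # 0.5 mbar noise at 100 Hz, 20 s
    y_start = y_true * rng.uniform(0.8, 1.2, size=5)     # start within +/-20%
    return identify(y_start, noise)

# Paired comparison over many trials (identify_drm / identify_tkn are placeholders):
# psi_drm = np.array([one_trial(identify_drm) for _ in range(200)])
# psi_tkn = np.array([one_trial(identify_tkn) for _ in range(200)])
# print(ttest_rel(psi_drm, psi_tkn))                     # paired Student's t-test
```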
3. RESULTS

Table 2 shows a comparison between the residual error values for the DRM, TKN and DRM+TKN approaches. As can be seen, at α = 0.01 TKN appears to have converged to near its maximum performance. DRM is still able to provide lower residuals in 79.2% of cases, and statistically significantly lower residuals overall (paired Student's t-test, p << 10⁻³). Figure 2 shows these same residual values on a scatter plot. As can be seen, both methods find similar minimum residuals when performing optimally, but regularisation has a longer tail of poor Ψ values than DRM does. The minimum residuals present in Figure 2 are a product of the injection of noise into the simulated data and the nature of the numeric approach. They represent a small but finite minimum achievable residual when convergence is declared.
Fig. 2. Comparison of Paired DRM and TKN Residuals
Table 2. Comparison of Absolute Values of DRM and TKN Residuals, median (25th perc. – 75th perc.)

α     | Ψ_DRM                 | Ψ_TKN                 | Ψ_DRM < Ψ_TKN | p-value
0.005 | 32.46 (31.79 - 33.45) | 36.21 (33.21 - 48.15) | 83.4%         | <<10⁻³
0.01  | 32.35 (31.86 - 33.27) | 35.46 (32.93 - 45.31) | 79.2%         | <<10⁻³
0.02  | 32.29 (31.86 - 33.11) | 36.68 (33.28 - 45.54) | 84.1%         | <<10⁻³

α     | Ψ_DRM+TKN             | Ψ_TKN                 | Ψ_DRM+TKN < Ψ_TKN | p-value
0.01  | 31.80 (31.52 - 32.19) | 35.16 (32.73 - 44.17) | 99.7%             | <<10⁻³

Table 3 shows that the lower residuals produced by DRM are not a result of overfitting, as the differences between the parent values and the optimal values are significantly lower in all cases (paired Student's t-test, p << 10⁻³), regardless of the introduction of noise to the simulated data.

Table 3. Comparison of Absolute Percentage Errors in Optimal Parameters, DRM and TKN

y   | Abs. Percentage Error, DRM | Abs. Percentage Error, TKN | p-value
C1  | 6.08 (3.12 - 10.23)        | 11.80 (5.71 - 19.35)       | <<10⁻³
CN  | 0.83 (0.38 - 1.54)         | 1.78 (0.81 - 2.99)         | <<10⁻³
R   | 0.65 (0.30 - 1.19)         | 1.65 (0.70 - 3.03)         | <<10⁻³
TOP | 0.06 (0.03 - 0.11)         | 0.09 (0.04 - 0.26)         | <<10⁻³
k   | 2.83 (1.41 - 4.64)         | 8.27 (3.84 - 13.21)        | <<10⁻³

Figure 3 and Table 4 show a comparison between the performance of the combined DRM and TKN method as compared to a pure TKN approach. Here, as shown in Table 2, the residuals are significantly lower over all cases (paired Student's t-test, p << 10⁻³), and lower in 997 of 1000 cases (with the last 3 cases having identical residual values). DRM eliminates the long tail of higher Ψ values present in TKN regularisation. Similarly, the absolute percentage errors in parameter estimation are significantly reduced in all cases (paired Student's t-test, p << 10⁻³).

Fig. 3. Comparison of Paired DRM+TKN and TKN Residuals

Table 4. Comparison of Absolute Percentage Error in Parameters, DRM+TKN and TKN

y   | Abs. Percentage Error, DRM+TKN | Abs. Percentage Error, TKN | p-value
C1  | 4.24 (2.08 - 7.66)             | 10.57 (4.99 - 18.26)       | <<10⁻³
CN  | 0.49 (0.26 - 0.93)             | 1.60 (0.73 - 2.83)         | <<10⁻³
R   | 0.41 (0.18 - 0.81)             | 1.41 (0.65 - 2.80)         | <<10⁻³
TOP | 0.04 (0.02 - 0.08)             | 0.10 (0.04 - 0.26)         | <<10⁻³
k   | 1.92 (0.80 - 3.66)             | 8.11 (3.94 - 12.79)        | <<10⁻³
4. DISCUSSION

This research has determined how the DRM performs in comparison to TKN regularisation. In particular, it was hypothesised that the benefit of the DRM could be rendered obsolete by the normalising influence of TKN regularisation. It is possible that the regularisation would enable the LMQ algorithm to navigate out of the sub-optimal convergence points in the parameter space. The findings of this study show that the DRM does, in fact, provide significant benefits over TKN regularisation.

Table 2 demonstrates this ability of DRM to provide advantages relative to TKN regularisation, even in cases where several Monte-Carlo trials have been run to optimise the Tikhonov factor. With lower residuals from equivalent starting points in 79.2% of cases and statistically significantly lower residuals overall, without requiring additional a-priori information, the DRM demonstrates effectiveness at finding a least squares solution. Figure 2 and the larger difference in 75th percentiles in Table 2 also show that the DRM's performance is more consistent, without the long tail of higher Ψ values that can occur with TKN regularisation.

This long tail of higher Ψ values is due to the increased likelihood of TKN regularisation, compared to the DRM, converging to a point that was distant from the true minima. At such points, the LMQ gradient descent method was unable to determine the direction of further gradient descent. This inability to determine the direction for continued convergence was due to the highly ill-conditioned nature of the PR model. The DRM was specifically designed to overcome this problem by continuing iteration on a hyperplane that contains the final values of several optimisation runs. Hence, the method provides the gradient descent algorithm with an indication of the direction to continue iteration upon. Ultimately, this choice typically requires more computational time, and is significantly more complex to apply in a computational framework than the typical methods.

Table 3 demonstrates that the lower residuals provided by DRM benefit the actual identification of parameters, rather than being due to overfitting. Despite random noise being applied to the measured data for each trial, DRM could produce statistically significantly lower percentage differences for all 5 parameters compared to TKN regularisation.

DRM+TKN is shown in Table 2 to outperform pure TKN regularisation in 99.7% of cases (with the remaining 0.3% of cases yielding identical results). Table 4 shows that DRM+TKN provides statistically significantly lower absolute percentage errors for all 5 parameters than pure TKN regularisation, and Figure 3 again shows the tail of poor Ψ values in the TKN results being removed by the inclusion of DRM. These results stress that DRM and TKN address the issue of parameter trade-off in different fashions, and thus the benefits of each are not mutually exclusive. DRM and TKN have both been designed to be easily adapted for use with various models and gradient descent methodologies, thus both can be easily combined to provide an advantageous hybrid approach.

Only one model has been used for the comparison of the two approaches. While this model is highly non-linear, there is a wide variety of models in use across various fields, and a comparison between DRM and TKN regularisation using some of these would be beneficial. Secondly, the model used in the study employed simulated rather than experimental data. This data had random noise added to provide realistic conditions for the parameter identification algorithms. The use of a-priori true parameter values allowed a comparison to parent values that determined whether overfitting was occurring.

Overall, the DRM has been shown to have advantages over current parameter identification methods designed for addressing ill-posed problems. This is further bolstered by the fact that DRM is generic and approaches these problems in a different manner from methods such as regularisation, allowing the easy combination of both. As such, the DRM has clear potential to provide benefits in a variety of situations involving the identification of parameters in models with significant degrees of parameter trade-off.

5. REFERENCES

AZARM, S. 2004. Non-gradient based parameter sensitivity estimation for single objective robust design optimization. Journal of Mechanical Design, 126, 395-402.
CALVETTI, D., MORIGI, S., REICHEL, L. & SGALLARI, F. 2000. Tikhonov regularization and the L-curve for large discrete ill-posed problems. Journal of Computational and Applied Mathematics, 123, 423-446.
DAVIDON, W. C. 1991. Variable metric method for minimization. SIAM Journal on Optimization, 1, 1-17.
DAVIDSON, S. M., DOCHERTY, P. D. & MURRAY, R. 2017. The dimensional reduction method for identification of parameters that trade-off due to similar model roles. Mathematical Biosciences, 285, 119-127.
DELFORGE, J., SYROTA, A. & MAZOYER, B. M. 1990. Identifiability analysis and parameter identification of an in vivo ligand-receptor model from PET data. IEEE Transactions on Biomedical Engineering, 37, 653-661.
DOCHERTY, P. D., CHASE, J. G. & DAVID, T. 2012a. Characterisation of the iterative integral parameter identification method. Medical & Biological Engineering & Computing, 50, 127-134.
DOCHERTY, P. D., SCHRANZ, C., CHASE, J. G., CHIEW, Y. S. & MÖLLER, K. 2012b. Traversing the Fuzzy Valley: problems caused by reliance on default simulation and parameter identification programs for discontinuous models. IFAC Proceedings Volumes, 45, 490-494.
DOCHERTY, P. D., SCHRANZ, C., CHASE, J. G., CHIEW, Y. S. & MÖLLER, K. 2014. Utility of a novel error-stepping method to improve gradient-based parameter identification by increasing the smoothness of the local objective surface: a case-study of pulmonary mechanics. Comput Methods Programs Biomed, 114, e70-8.
DOCHERTY, P. D., CHASE, J. G., LOTZ, T., HANN, C. E., SHAW, G. M., BERKELEY, J. E., MANN, J. I. & MCAULEY, K. 2009. DISTq: An iterative analysis of glucose data for low-cost, real-time and accurate estimation of insulin sensitivity. The Open Medical Informatics Journal.
ENGL, H. W., KUNISCH, K. & NEUBAUER, A. 1989. Convergence rates for Tikhonov regularisation of non-linear ill-posed problems. Inverse Problems, 5, 523.
FISCHER, U., SCHENK, W., SALZSIEDER, E., ALBRECHT, G., ABEL, P. & FREYSE, E. J. 1987. Does Physiological Blood Glucose Control Require an Adaptive Control Strategy? IEEE Transactions on Biomedical Engineering, BME-34, 575-582.
GOLDSTEIN, A. & PRICE, J. 1971. On descent from local minima. Mathematics of Computation, 25, 569-574.
GOLUB, G. H., HANSEN, P. C. & O'LEARY, D. P. 1999. Tikhonov regularization and total least squares. SIAM Journal on Matrix Analysis and Applications, 21, 185-194.
LEVENBERG, K. 1944. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2, 164-168.
MARQUARDT, D. W. 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11, 431-441.
REVIE, J. A., STEVENSON, D. J., CHASE, J. G., HANN, C. E., LAMBERMONT, B. C., GHUYSEN, A., KOLH, P., SHAW, G. M., HELDMANN, S. & DESAIVE, T. 2013. Validation of subject-specific cardiovascular system models from porcine measurements. Comput Methods Programs Biomed, 109, 197-210.
SCHRANZ, C., DOCHERTY, P., CHIEW, Y., CHASE, J. & MÖLLER, K. 2013. A Time-Continuous Model of Respiratory Mechanics of ARDS Patients. World Congress on Medical Physics and Biomedical Engineering, May 26-31, 2012, Beijing, China. Springer, 2166-2169.
SCHRANZ, C., DOCHERTY, P. D., CHIEW, Y. S., CHASE, J. G. & MÖLLER, K. 2012. Structural identifiability and practical applicability of an alveolar recruitment model for ARDS patients. IEEE Transactions on Biomedical Engineering, 59, 3396-3404.
SCHRANZ, C., KNOBEL, C., KRETSCHMER, J., ZHAO, Z. & MOLLER, K. 2011. Hierarchical parameter identification in models of respiratory mechanics. IEEE Transactions on Biomedical Engineering, 58, 3234-3241.