A multi-parallel multiscale computational framework for chemical vapor deposition processes

N. Cheimarios c,*, G. Kokkoris a,b, A.G. Boudouvis a

a School of Chemical Engineering, National Technical University of Athens, Athens 15780, Greece
b Institute of Nanoscience and Nanotechnology, NCSR Demokritos, Athens 15310, Greece
c Scienomics SARL, Paris 75008, France

* Corresponding author at: 16, rue de l'Arcade, Paris 75008, France. E-mail address: [email protected] (N. Cheimarios).

Article history: Received 11 April 2015; Received in revised form 17 August 2015; Accepted 30 August 2015; Available online xxx

Keywords: Parallel multiscale; Chemical vapor deposition; Domain decomposition; Synchronous master-worker

Abstract

A multiscale computational framework for coupling the multiple length scales in chemical vapor deposition (CVD) processes is implemented for studying the effect of the prevailing conditions inside a CVD reactor (macro-scale) on the film growth on a wafer with predefined topography (micro-scale). A multi-parallel method is proposed for accelerating the computations. It combines domain decomposition methods for the macro-scale (reactor scale) model, which is based on partial differential equations (PDEs), with a synchronous master-worker scheme for the parallel computation of the boundary conditions (BCs) for the PDEs; the BCs come from the micro-scale model describing film growth on the predefined topography.

1. Introduction

Chemical vapor deposition (CVD) is the primary process for producing thin solid films, e.g. for semiconductors and micro- or nano-electro-mechanical systems. The process is performed in reactors equipped with specially designed surfaces, the wafers, where the solid film is deposited via surface reactions. The size of CVD reactors is of the order of some centimeters and, in some cases, especially in industrial applications, can reach several meters. On the other hand, the uniformity of a film inside a feature, such as those used in the semiconductor industry, or the roughness development of a coating are key parameters in thin film fabrication that refer to micro- and/or nano-scales. Multiscale modeling techniques ought to be implemented for studying the physical/chemical phenomena in these multiple, co-existing scales.

In the present work, a multiscale computational framework [1] is utilized for studying a CVD process in a predefined topography of long, initially rectangular, micro-trenches (see Fig. 1). It consists of a reactor scale model (RSM) [1] and a feature scale model (FSM) [2]. The RSM is based on the partial differential equations (PDEs) of the conservation of mass, momentum, energy and chemical species, and is used to describe the physical/chemical phenomena in the macro-scale of the CVD reactor.


For numerically solving the aforementioned set of PDEs, the Ansys/Fluent (henceforth Fluent) [3] solver is used. The FSM is based on ballistic transport [4] combined with a surface reaction kinetics model and the level set method [2] for tracking the film profile evolution inside the micro-trenches. However, the multiscale computational framework is not enough on its own; efficient parallel methodologies [5–8] that can handle the associated high computational demands are also required. For accelerating the present computations, we propose a multi-parallel method that combines different parallel techniques at the multiple scales of interest: domain decomposition techniques to accelerate the computations with the RSM, along with a "master-worker" scheme to calculate the appropriate boundary conditions (BCs) for the RSM by using the FSM. A different number of processors is used for each parallel methodology, since the computational demands of each kind of computation differ. The aim of the present work is to produce a useful parallel tool for the multiscale analysis of realistic problems in realistic time by combining two different sources of acceleration.

2. Reactor scale model (RSM), feature scale model (FSM) and the coupling methodology

Concerning the RSM, the governing equations describing the physical/chemical phenomena are the continuity, momentum, energy, and species transport equations [1]. This set of equations is discretized via the finite volume method and is solved numerically with Ansys/Fluent for the velocities, pressure, temperature, and species mass fractions inside the CVD reactor.
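For reference, the species transport equation solved by the RSM can be written in the standard steady-state convection–diffusion–reaction form below; this is a generic textbook form under the assumption of Fickian diffusion, not necessarily the exact formulation used in [1]:

\nabla \cdot (\rho \mathbf{u}\, \omega_i) = \nabla \cdot (\rho D_i \nabla \omega_i) + M_i \sum_{k} \nu_{ik}\, R_k^{g},

where ρ is the gas density, u the velocity vector, ω_i and D_i the mass fraction and diffusion coefficient of species i, M_i its molecular weight, ν_ik the stoichiometric coefficient of species i in reaction k, and R_k^g the rate of gas-phase reaction k.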



then the number of discretized volumes can be very high – of the order of millions – and the solution procedure can be very time consuming on a single processor. For this reason, many researchers turn to parallel computing techniques to solve this kind of problem [9]. For the numerical study of CVD processes in particular, another factor that can increase the computational demands is the number of chemical reactions: if detailed chemistry is used – of the order of tens or hundreds of reactions – the number of species mass balances that have to be incorporated into the set of equations increases, and with it the number of degrees of freedom (DOF) of the system. Salinger et al. [10] and later Pawlowski et al. [11] performed detailed high performance computing (HPC) analyses of single-scale CVD processes by using a Galerkin/least-squares finite element code designed for massively parallel architectures. In our previous work [12], we also studied how efficient parallel computation techniques are in CVD analysis.

Concerning the FSM, it is an integrated, in-house computational framework [1,2,13] developed in C++ and Python. At the low pressure conditions of the present work, the RSM cannot be used to model the transport phenomena inside the features because the Knudsen number is greater than unity and the continuum hypothesis breaks down. The FSM is therefore based on ballistic transport – described via an integral equation [13] – for the calculation of the local fluxes of each reactant inside the features (trenches), a surface chemistry model for the calculation of the deposition rate, and a profile evolution algorithm based on the level set method [14] for 'growing' the film inside the trenches. The parameter that most affects the computational time of the FSM is the aspect ratio – defined as depth/width – of the trenches. As the aspect ratio increases, the number of DOF for solving the integral equation and the level set equation can become high, thus increasing the computational time. Still, no parallel technique is used for accelerating the standalone FSM computations; in the present work, the trenches are shallow and the computational time for a single trench is small compared to the RSM computations.

The coupling of the RSM with the FSM is performed through the boundary condition for the species equation; more specifically, the consumption rate of each species on the wafer is modified in order to take into account, at the level of the macro-scale, the existence of the topography, without the trenches being included in the computational domain of the macro-scale. During the calculations, the information 'flows' from the macro-scale to the micro-scale and vice versa. The coupling of the two scales is shown schematically in Fig. 1. Each boundary cell on the wafer neighbors the top surface of a feature cluster [see Fig. 1(b)]. A denotes the total surface of the trenches in the cluster and Ā the surface of the boundary cell through which the information is transferred from the macro-scale to the micro-scale and vice versa. The boundary condition for the consumption of the species is imposed on Ā.

Fig. 1. Schematic of the coupling methodology. (a) Macro-scale: CVD reactor. The decomposition of the computational domain over 5 processors is shown in different colors. (b) Boundary cells at the top of clusters of trenches. The computations in each boundary cell are assigned to a processor.

The coupling methodology starts with the numerical solution of the equations in the macro-scale. The boundary condition for the species equation, Eq. (1), i.e. the consumption of species on the wafer, is corrected in order to take into account the increased consumption of species inside the topography of the wafer. The boundary condition imposed at each boundary cell j of the wafer (see Fig. 1) is

D_i \, \mathbf{n} \cdot \nabla \omega_i = M_i \sum_{k=1}^{m} \nu_{ik} \, r^{S}_{\mathrm{macro},k} = M_i \sum_{k=1}^{m} \nu_{ik} \, \varepsilon_k \, r^{S}_{k} \big|_{\bar{A}}    (1)

D_i and M_i are the diffusion coefficient and molecular weight of species i, respectively, n the unit vector normal to the surface of the wafer, m the number of chemical reactions species i participates in, r^S_k|Ā the rate of surface reaction k, ε_k the rate correction coefficient of surface reaction k, and ν_ik the stoichiometric coefficient of species i in surface reaction k. The density ρ, the temperature T, and the mass fraction ω_i of species i are calculated by the RSM and fed to the FSM. The local rates of all surface reactions are calculated by the FSM. Then, the rate correction coefficient of surface reaction k is calculated at the face of each boundary cell j with the fixed point iteration method [7]. After convergence, i.e. after the relative Euclidean norm of ε_k [7] – computed over the successive iterations of the RSM and the FSM – drops below 10^-3, the profile evolution inside the trenches is performed with the level set method for a time step Δt. Δt is chosen so as to satisfy the stability criterion for the solution of the level set equation. The iterative procedure is performed at each time step of the simulation and for every boundary cell on the wafer, i.e. the computations with the FSM must be repeated for every boundary cell on the wafer surface in order to compute the boundary condition for the RSM. The latter increases the computational time significantly, since the boundary cells can range from a few tens in simple 2D reactor models to some thousands in an industrial, 3D reactor.
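A minimal sketch of the per-time-step coupling loop is given below. It only reproduces the structure described above: alternating RSM and FSM evaluations, an update of the correction coefficients ε_k at every boundary cell, and a convergence test on the relative Euclidean norm with tolerance 10^-3. The callables solve_rsm and solve_fsm_corrections are hypothetical stand-ins for the actual Fluent-based RSM and the in-house FSM; the plain update is only one possible fixed point formulation.

import numpy as np

def fixed_point_coupling(eps0, solve_rsm, solve_fsm_corrections,
                         tol=1e-3, max_iter=100):
    """Fixed point iteration coupling the RSM and the FSM for one time step.

    eps0: initial guess for the rate correction coefficients, array of shape
          (n_boundary_cells, n_surface_reactions).
    solve_rsm: callable; given eps, solves the macro-scale problem with the
          boundary condition of Eq. (1) and returns (rho, T, omega) at the
          face of every boundary cell.
    solve_fsm_corrections: callable; given (rho, T, omega), runs the FSM for
          every boundary cell and returns new correction coefficients.
    """
    eps = np.asarray(eps0, dtype=float)
    for _ in range(max_iter):
        rho, T, omega = solve_rsm(eps)                   # macro-scale solve
        eps_new = solve_fsm_corrections(rho, T, omega)   # micro-scale solve
        # Relative Euclidean norm of the update, as used in [7].
        if np.linalg.norm(eps_new - eps) / np.linalg.norm(eps_new) < tol:
            return eps_new
        eps = eps_new
    return eps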

3. The multi-parallel method

The computations in the macro-scale with the RSM are parallelized via domain decomposition, namely the principal axes technique [3], using the parallel solver of Fluent. The boundary conditions for the RSM are computed through the FSM independently for each boundary cell. To exploit this characteristic of the FSM, a synchronous master-worker parallel technique is implemented. The master node in this work is the processor responsible for executing the RSM. The term 'synchronous' implies that all processors must have finished their computations before communicating with the master node. The computational load, i.e. the computations in each boundary cell, is partitioned among the processors, the workers [see Fig. 1(b)]. Each processor handles the computations for one or more boundary cells and accesses only the information needed for its assigned problem, which is passed from the RSM to the FSM. When these computations are finished, the master node gathers the appropriate information from all the workers and returns it to the RSM. RSM computations are idle during the FSM computations and vice versa. The exchange of computational information between the processors is performed using the Message Passing Interface (MPI).
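A minimal sketch of the synchronous master-worker scheme described above is shown below, using mpi4py for the MPI communication. The partitioning of boundary cells among the ranks, the blocking gather on the master and the synchronous structure follow the description in the text; run_fsm_for_cell is a hypothetical placeholder for the FSM computation of one boundary cell, and the data layout (including the master taking a share of the cells) is purely illustrative.

from mpi4py import MPI

def masterworker_bc_step(cell_data, run_fsm_for_cell):
    """Synchronous master-worker computation of the RSM boundary conditions.

    cell_data: on the master (rank 0), a list with one entry per boundary
    cell holding the macro-scale quantities (rho, T, omega) passed from the
    RSM to the FSM; ignored on the workers.
    run_fsm_for_cell: callable executing the FSM for one boundary cell and
    returning the rate correction coefficients eps_k for that cell.
    """
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # The master partitions the boundary cells among the ranks; with load
    # imbalance, some ranks simply receive one extra cell (round-robin).
    if rank == 0:
        chunks = [cell_data[r::size] for r in range(size)]
    else:
        chunks = None
    my_cells = comm.scatter(chunks, root=0)

    # Each rank handles only its assigned cells and sees only the data
    # needed for them.
    my_results = [run_fsm_for_cell(cell) for cell in my_cells]

    # Synchronous step: gather blocks until every rank has finished, then
    # the master collects all corrections and returns them to the RSM.
    gathered = comm.gather(my_results, root=0)
    if rank == 0:
        # Undo the round-robin partitioning to restore the cell ordering.
        eps = [None] * len(cell_data)
        for r, res in enumerate(gathered):
            for i, val in zip(range(r, len(cell_data), size), res):
                eps[i] = val
        return eps
    return None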


4. Case study

The case study is Si CVD from silane (SiH4). The kinetics of the volumetric and surface reactions are described in [15]. The reactor is described in [16]. The gas mixture enters the reactor with mass flow rate Qin = 1000 sccm and uniform temperature Tin = 300 K. The wafer diameter is 0.24 m and its temperature is Tw = 1200 K. The operating pressure is Pop = 133 Pa. Concerning the conditions in the micro-scale, the topography consists of long rectangular micro-trenches of uniform density (8 trenches per 32 μm) along the wafer. The width of the trenches is kept constant at 1 μm. For studying the deposition inside the trenches the initial depth is 1.5 μm (and so is the aspect ratio), while for the multi-parallel method four cases of initial trench depth are studied: 0.5, 1.5, 3, and 5 μm (i.e. aspect ratios from 0.5 to 5). All computations are performed on a 16-node distributed memory cluster, Pegasus [17], where each node consists of 2 Intel Xeon processors at 3 GHz with 2 GB of RAM. The nodes are interconnected with a Myrinet network.
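For readability, the operating conditions of the case study can be collected in a small configuration structure such as the one sketched below; the keys are hypothetical and only mirror the values listed above.

# Hypothetical configuration mirroring the case study of Section 4.
CASE_STUDY = {
    "chemistry": "Si CVD from SiH4",          # kinetics from [15]
    "reactor_geometry": "as described in [16]",
    "Q_in_sccm": 1000.0,       # inlet mass flow rate
    "T_in_K": 300.0,           # inlet gas temperature
    "T_wafer_K": 1200.0,       # wafer temperature (900 K for the conformal case)
    "P_op_Pa": 133.0,          # operating pressure
    "wafer_diameter_m": 0.24,
    "trench_width_um": 1.0,
    "trench_density": "8 trenches per 32 um",
    "trench_depths_um": [0.5, 1.5, 3.0, 5.0],  # aspect ratios 0.5 to 5
}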

5. Results and discussion

5.1. Film deposition inside the trenches

In Fig. 2a the deposition rate of Si, DR_Si, along the wafer with a predefined micro-topography is shown at various time instances. The existence of the micro-topography increases the area where Si can be deposited and the deposition rate is reduced compared with the case of a flat wafer (no micro-topography). The latter is due to the loading phenomenon [18,19] observed in the diffusion-limited regime; the micro-topography increases the area where Si can be deposited and leads to the depletion of the reactants. Upon trench filling, either compactly or by forming a void, the area available for deposition approaches that of a flat surface, and DR_Si approaches the value without micro-topography.

Since the deposition rate varies along the wafer, the profile evolution inside trenches at distinct positions along the wafer is not expected to be identical. Indeed, the results presented in Fig. 2b and c verify this: the profile evolution inside the trenches at two characteristic positions (center, edge) on the wafer is shown. Due to the low deposition rates at the center of the wafer, the trenches there fill later, at t = 81 s, than the trenches at the edge, which fill at t = 67 s. The time instances at which the trenches along the wafer radius fill are shown in Fig. 2d. The difference in the time required for filling is the reason for the jump in the deposition rate observed in Fig. 2a at t = 80 s: only the trenches at a distance greater than 8 cm from the wafer center are filled at t = 80 s.

The conditions for the results presented in Fig. 2a–d favor the formation of a void (non-conformal deposition). Conformal deposition and compact filling of the trenches can be achieved by reducing the wafer temperature. In Fig. 2e, the profile evolution at the cluster of trenches extending from 0 to 5 mm from the center of the wafer is shown for Tw = 900 K (compare with Fig. 2b). The conformal deposition in Fig. 2e is due to the low value (∼10^-5) of the effective sticking coefficient [19,20] of SiH4, which is the reactant of the dominant surface reaction. A low effective sticking coefficient means that the number of re-emissions, and consequently the redistribution of flux inside the trench, is high. As a result, the flux is almost the same for every elementary surface of the trench and the deposition is isotropic. For high Tw the value of the effective sticking coefficient of SiH4 increases (∼10^-2), the redistribution of flux is not as effective, and a void is formed. Note that at Tw = 1200 K the process is performed in the diffusion-limited regime [19]. For a thorough investigation of the effect of the process parameters, i.e. the operating conditions of the reactor, on the film conformality inside the trenches, the interested reader is referred to [20].

Fig. 2. (a) Deposition rate of Si, DR_Si, along the wafer at different time instances. (b) Trench profile evolution at the cluster of trenches extending from 0 to 5 mm from the center of the wafer (Tw = 1200 K). Void formation occurs at 81 s. (c) Trench profile evolution at the cluster of trenches extending from 0.115 to 0.12 m (Tw = 1200 K). Void formation occurs at 67 s. The curves in (b) and (c) are at equidistant time intervals (20 s). (d) Time required for trench filling along the wafer radial distance. (e) Trench profile evolution at the cluster of trenches extending from 0 to 5 mm from the center of the wafer (Tw = 900 K). The curves are at equidistant time intervals (600 s).

5.2. Efficiency of the multi-parallel method

The acceleration in terms of speedup is investigated for (a) the master-worker scheme and (b) the multi-parallel multiscale computations. The number of processors for the RSM parallel computations is denoted as np,macro, while the number of processors for the parallel computations with the master-worker scheme and the FSM is np,micro. All the time measurements reported are wall clock times.

Firstly, the acceleration – in terms of speedup – of the master-worker parallel scheme is studied (Fig. 3a). For that, the axial symmetry of the reactor [Fig. 1(a)] is exploited, so the number of cells and DOF for the RSM is relatively small (1783 cells and 14,624 DOF), and no parallel computations for the RSM are performed (np,macro = 1). In Fig. 3a the speedup achieved by applying only the master-worker parallel scheme is shown versus np,micro (1, 5, 10, 15, 20, and 25). The speedup is calculated by dividing the serial run time by the run time with np,micro processors. In Fig. 3a, the circles correspond to the parallel runs with balanced load; the number of boundary cells is a multiple of the number of processors. The rectangles correspond to the cases with load imbalance, where some processors undertake extra computational load. For example, if the total number of processors is 10 and there are 25 boundary cells (as in the present case), then 5 of the processors each perform the computations for 2 boundary cells and the remaining 5 processors for 3 boundary cells each. The time of the overall computations is thus bounded by the time needed by the 5 processors handling 3 boundary cells each. The maximum speedup (∼23) is achieved when the number of processors is equal to the number of boundary cells. In this case, the serial run time is 2.8 days and it is reduced to 2.94 h on 25 processors. The deviation from the ideal speedup is attributed to two main reasons: Input/Output (I/O) operations during the execution of the FSM code (reading and storing the solution of the level set equation at each time step) and the time required for the serial run of Fluent.

Furthermore, to investigate the scalability of the master-worker parallel scheme for the FSM, computations are performed for different trench aspect ratios, i.e. for different computational loads on each processor. The speedup versus trench aspect ratio is shown in Fig. 3b; np,micro is 25 (equal to the number of boundary cells) and, as in Fig. 3a, no parallel computations for the RSM are performed (np,macro = 1). As the aspect ratio of the trenches, and hence the computational load on each processor, increases, the speedup approaches the ideal. The processors spend most of the computational time on the actual FSM computations; the I/O time and the computational time of Fluent become insignificant as the computational load increases. For the case with trench aspect ratio 5, the serial run takes 89.1 h, which is reduced to 3.7 h on 25 processors with the master-worker scheme, a remarkable decrease of the computational time.

Finally, multi-parallel multiscale computations are performed: domain decomposition techniques are used to accelerate the computations with the RSM, and the master-worker scheme is implemented to calculate the BCs for the RSM by using the FSM. In order to have a sufficiently large computational problem for the RSM, the computations are expanded to a 2D computational domain of the reactor [Fig. 1(a)], without taking advantage of the axial symmetry, and a much denser grid is utilized. The computational domain is discretized with 1,263,166 cells (10,105,328 DOF), which exceeds the number of cells of a 3D computational grid for this particular reactor [21]. In this case the number of boundary cells is 60. The computations are performed for one time step, namely at t = 0 s. At t = 0 s, and particularly for the first iteration step of the convergence procedure, the initial guess for the solution provided to the RSM is in general far from the actual solution. As a consequence, the number of iterations for the convergence of the RSM is at its maximum. For the next steps, the initial guess is conveniently provided by the solution at the previous step, which leads to relatively fast convergence [7]. For the speedup results of Fig. 3c, np,macro varies from 1 to 10 while np,micro is kept constant and equal to 10, 20, or 30. When np,macro = np,micro = 1, the run time is 19.11 h; this is the serial time used to compute the speedup of Fig. 3c. As can be seen from Fig. 3c, when np,micro = 10 the speedup quickly reaches a plateau at a poor value; this value of np,micro is not adequate to accelerate the FSM computations and becomes a bottleneck for the multiscale computations. For np,micro = 20 and 30, and np,macro = 10, the achieved speedup is satisfactory, with a resulting acceleration of 5.6 and 5.9 respectively, i.e. a decrease of the run time to 2.6 h and 2.4 h.
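The effect of load imbalance described above can be quantified with a simple bound: with identical per-cell cost, the parallel time is set by the rank holding the most cells, so the attainable speedup is roughly n_cells / ceil(n_cells / n_p). The short sketch below evaluates this bound; it is an idealized estimate that ignores the I/O and serial Fluent times, which the measurements show to be the main sources of deviation.

from math import ceil

def speedup_bound(n_cells, n_procs):
    """Idealized speedup bound for a synchronous master-worker scheme
    with n_cells equal-cost tasks on n_procs processors."""
    max_cells_per_proc = ceil(n_cells / n_procs)
    return n_cells / max_cells_per_proc

# Example for the 25 boundary cells of Fig. 3a:
# 10 processors -> bound 25/3 ~ 8.3 (imbalanced case),
# 25 processors -> bound 25/1 = 25 (measured maximum ~23).
print(speedup_bound(25, 10), speedup_bound(25, 25))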


Fig. 3. (a) Speedup measurements with the master-worker parallel scheme for the parallel computation of the boundary conditions. np,micro varies from 1 to 25 and np,macro is 1. The number of boundary cells on the wafer surface is 25. The speedup is computed for the multiscale code performing 30 time steps. The trench aspect ratio is 1.5. (b) Speedup measurements with the master-worker scheme for increasing computational load (trench aspect ratio) on the processors. np,micro is 25 and np,macro is 1. The number of boundary cells on the wafer surface is 25. The speedup is computed for 30 time steps. The trench aspect ratio varies from 0.5 to 5. (c) Speedup measurements using the multi-parallel multiscale computational method. np,macro varies from 1 to 10 and np,micro from 10 to 30. The number of boundary cells on the wafer surface is 60. The speedup is computed for the multiscale code performing 1 time step. The trench aspect ratio is 1.5.



The deviation from the ideal speedup comes mostly – since the computations with the master-worker scheme are close to the ideal (see Fig. 3a) – from the RSM and from the communication between the processors in order to exchange computational information [12]. It is worth mentioning that the speedup results for np,micro = 30 are very close to those obtained with the standalone parallel RSM, i.e. the parallel solver of Fluent.

6. Conclusions

Two example results are presented by utilizing a multi-parallel computational framework for the CVD of Si. In the first one, the effect of the micro-topography on the deposition rate along the wafer is demonstrated. In the second, the effect of a (macro-scale) operating condition, namely the wafer temperature, on the deposition inside long rectangular trenches is discussed. Concerning the parallel computations, the present analysis, through measurements of speedup, shows that the proposed multi-parallel method provides a parallel tool that looks promising for enabling complex, realistic, multiscale computations. Although OpenMP can be used as an alternative to MPI for the implementation of the proposed multi-parallel method for 'small' 2D problems, industrial scale problems dictate the use of MPI.

References

[1] N. Cheimarios, G. Kokkoris, A.G. Boudouvis, Multiscale modeling in chemical vapor deposition processes: coupling reactor scale with feature scale computations, Chem. Eng. Sci. 65 (2010) 5018–5028.
[2] G. Kokkoris, A. Tserepi, A.G. Boudouvis, E. Gogolides, Simulation of SiO2 and Si feature etching for microelectronics and microelectromechanical systems fabrication: a combined simulator coupling modules of surface etching, local flux calculation, and profile evolution, J. Vac. Sci. Technol. A 22 (2004) 1896–1902.
[3] Ansys Fluent v12 Documentation, Ansys Inc., USA, 2010.
[4] T.S. Cale, G.B. Raupp, A unified line-of-sight model of deposition in rectangular trenches, J. Vac. Sci. Technol. B 8 (1990) 1242–1248.
[5] A. Nakano, M.E. Bachlechner, R.K. Kalia, E. Lidorikis, P. Vashishta, G.Z. Voyiadjis, T.J. Campbell, S. Ogata, F. Shimojo, Multiscale simulation of nanosystems, Comput. Sci. Eng. 3 (2001) 56–66.
[6] T.O. Drews, S. Krishnan, J.C. Alameda Jr., D. Gannon, R.D. Braatz, R.C. Alkire, Multiscale simulations of copper electrodeposition onto a resistive substrate, IBM J. Res. Dev. 49 (2005) 49–63.
[7] N. Cheimarios, G. Kokkoris, A.G. Boudouvis, An efficient parallel iteration method for multiscale analysis of chemical vapor deposition processes, Appl. Numer. Math. 67 (2013) 78–88.
[8] J. Borgdorff, M. Mamonski, B. Bosak, K. Kurowski, M. Ben Belgacem, B. Chopard, D. Groen, P.V. Coveney, A.G. Hoekstra, Distributed multiscale computing with MUSCLE 2, the Multiscale Coupling Library and Environment, J. Comput. Sci. 5 (2014) 719–731.
[9] H.D. Simon, Partitioning of unstructured problems for parallel processing, Comput. Syst. Eng. 2 (1991) 135–148.
[10] A.G. Salinger, J.N. Shadid, S.A. Hutchinson, G.L. Hennigan, K.D. Devine, H.K. Moffat, Analysis of gallium arsenide deposition in a horizontal chemical vapor deposition reactor using massively parallel computations, J. Cryst. Growth 203 (1999) 516–533.


[11] R.P. Pawlowski, A.G. Salinger, L.A. Romero, J.N. Shadid, Computational design and analysis of MOVPE reactors, J. Phys. IV 11 (2001) 197–204.
[12] N. Cheimarios, A.N. Spyropoulos, A.G. Boudouvis, Simulation of chemical vapor deposition processes on high-performance computational clusters, in: CD-ROM Proceedings of the 6th GRACM International Congress on Computational Mechanics, Thessaloniki, Greece, 19–21 June, 2008.
[13] G. Kokkoris, A.G. Boudouvis, E. Gogolides, Integrated framework for the flux calculation of neutral species inside trenches and holes during plasma etching, J. Vac. Sci. Technol. A 24 (2006) 2008–2020.
[14] S. Osher, R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, Springer, New York, 2003.
[15] C.R. Kleijn, A mathematical model of the hydrodynamics and gas-phase reactions in silicon LPCVD in a single-wafer reactor, J. Electrochem. Soc. 138 (1991) 2190.
[16] H. van Santen, C.R. Kleijn, H.E.A. van den Akker, On multiple stability of mixed-convection flows in a chemical vapor deposition reactor, Int. J. Heat Mass Transfer 44 (2001) 659–672.
[17] http://febui.chemeng.ntua.gr/pegasus.htm.
[18] R.A. Gottscho, C.W. Jurgensen, D.J. Vitkavage, Microscopic uniformity in plasma etching, J. Vac. Sci. Technol. B 10 (1992) 2133–2147.
[19] N. Cheimarios, G. Kokkoris, A.G. Boudouvis, Multiscale computational analysis of the interaction between the wafer micro-topography and the film growth regimes in chemical vapor deposition processes, ECS J. Solid State Sci. Technol. 1 (2012) P197–P203.
[20] N. Cheimarios, S. Garnelis, G. Kokkoris, A.G. Boudouvis, Linking the operating parameters of chemical vapor deposition reactors with film conformality and surface nano-morphology, J. Nanosci. Nanotechnol. 11 (2011) 8132–8137.
[21] E.D. Koronaki, N. Cheimarios, H. Laux, A.G. Boudouvis, Non-axisymmetric flow fields in axisymmetric CVD reactor setups revisited: influence on the film's non-uniformity, ECS Solid State Lett. 3 (2014) P37–P40.

Nikolaos (Nikos) Cheimarios is a research scientist at Scienomics SARL. He received his B.Sc. in Chemical Engineering from the University of Patras (2006). He holds an M.Sc. in "Computational Mechanics" (2008) and a PhD in Chemical Engineering (2012), both from the National Technical University of Athens (NTUA). His research interests are in multiscale modeling and systemic analysis of chemical vapor deposition processes and in high performance (CPU and GPU) computing. Currently, at Scienomics SARL, he is developing a connectivity altering Monte Carlo software for the prediction of properties of polymer systems.

George Kokkoris received the B.Sc. in Chemical Engineering from the National Technical University of Athens (NTUA) in 1998, the M.Sc. in Microelectronics in 2000 from the University of Athens and the Ph.D. in 2005 from NTUA. He is a research fellow at the Institute of Nanoscience & Nanotechnology of NCSR Demokritos. His research interests are in multiscale modeling of plasma etching and chemical vapor deposition processes and in surface roughness formation during micro- or nano-fabrication processes. He is the author or co-author of ∼50 peer-reviewed journal articles.

Andreas G. Boudouvis is a Professor in the School of Chemical Engineering of the National Technical University of Athens (NTUA) and, since March 2013, the Dean of the School.
He is the Director of the Computer Center of the School, the Director of the Inter-Departmental Graduate Studies Program "Computational Mechanics" of NTUA and formerly the Head of the Department of Process Analysis and Plant Design of the School of Chemical Engineering. He holds a Diploma from NTUA (1982) and a PhD from the University of Minnesota, USA (1987), both in chemical engineering. His research interests are in: computational transport phenomena; interfacial phenomena and especially electromagnetic effects at fluid interfaces; nonlinear phenomena including instabilities and pattern formation; multiscale analysis; large-scale scientific computing.
