Future Generation Computer Systems 10 (1994) 183-188
Science and industry in HPCN

Pietro Rossi *

CRS4, Via N. Sauro 10, I-09123 Cagliari, Italy

* Email: [email protected]
Abstract

The interplay of science and industry in the development and exploitation of a mutually beneficial HPCN policy is a non-issue: everyone, I believe, is willing to subscribe to this view. Just a tad harder is to outline in detail the precise lines along which such a partnership should evolve and what the advantages are for both partners. CRS4 is a research institute based in Cagliari (Sardegna) with a vocation for mathematical modeling and numerical simulation. Research at CRS4 means applied research in close contact with industry. We would therefore seem to occupy a privileged position from which to entertain thoughts and develop theories and models on the interplay of science and industry in HPCN. Nonetheless, our experience is too brief to allow me to draw any general conclusions on the subject. In this paper I will first speculate on the kind of development we will see in the coming years in the field of High Performance Computing and Networking. I will then describe the interaction between CRS4 and industry, and extrapolate from it some of industry's needs from HPCN. I will close by describing our view of the interaction between HPCN and CRS4.

Key words: Applied research; CRS4; Distributed computing; LAN; Mathematical modelling; Parallel computing; WAN
1. Introduction

The interplay of science and industry in the development and exploitation of a mutually beneficial HPCN policy is a non-issue: everyone, I believe, is willing to subscribe to this view. Just a tad harder is to outline in detail the precise lines along which such a partnership should evolve and what the advantages are for both partners. It should come as no surprise that the Commission of the European Communities has decided to require
from an advisory committee, under the direction of Prof. Rubbia, a set of guidelines on how to implement a focussed European HPCN policy [1]. CRS4 is a research institute based in Cagliari (Sardegna) with a vocation for mathematical modeling and numerical simulation. It consists of a permanent structure made up of 'groups', which form the backbone of the center and provide services and expertise to the approved research activities, and of 'projects', which are focussed, specialized activities engaged with the primary purpose of responding to some clearly identified need of the scientific and industrial world. The active groups are: Mathematical Modeling, Scientific Visualization, Networks and Parallel Computing;
ongoing projects are: Combustion, Computational Chemistry, Macromolecular Biology, Environmental Modeling and Computational Genetics. Groups are also strongly urged to seek industrial contracts. Research at CRS4 means applied research in close contact with industry. We would therefore seem to occupy a privileged position from which to entertain thoughts and develop theories and models on the interplay of science and industry in HPCN. Nonetheless, our experience is too brief to allow me to draw any general conclusions on the subject. In this paper I will first speculate on the kind of development we will see in the coming years in the field of High Performance Computing and Networking. I will then describe the interaction between CRS4 and industry, and extrapolate from it some of industry's needs from HPCN. I will close by describing our view of the interaction between HPCN and CRS4.
2. Guessing HPCN hardware and software evolution

HPCN will be here whether we like it or not, but I believe there are good reasons to like it. Technology screams parallel, but it is quite soft-spoken when you ask 'in what form'. This is the first thing we will try to guess in order to progress in our endeavor. Possible scenarios are guided by two fundamental principles: Portability and Scalability. I would like to take issue with the second:
• For a given amount of purchasing power we will buy more and more powerful computers, but the maximum amount of dollars available is finite.
• HW evolution is much faster than funding cycles.
The result of these considerations is that every time we get new money to purchase HW we go with the latest model and do not expand the old machine. The old machine gets demoted (or promoted) to trustworthy workhorse or teaching tool. For the single user (read: funding agency) the issue is not how to get a machine that can be
scaled arbitrarily large but: what is the best I can buy with my money? MPP manufacturers are well advised to listen to this suggestion. Scalability, which I have just thrown out of the door, sneaks back in through the window. Even if I cannot buy a large machine, sure enough, once I am through writing my code I'd like to be able to run it on one, perhaps by using remote access to a suitable site. Since the machine is larger, I do expect my code to run faster. If the manufacturer has cleverly concocted its creature to scale properly I will be happy; otherwise I'll write a paper claiming that the machine does not scale. Quite possibly, the larger machine I access will not be the same as mine. If this is the case, only a portable programming style will be able to rescue me. Let us stop a moment to consider the scenario we are envisioning: a widely distributed environment of small to medium sized machines devoted to development, and hopefully some large machine where production can be run. We could imagine the existence of supercomputing centers providing this type of facility; many already exist. They will be, and mostly are, connected by high speed networks. A question can be asked: why not pool most of the available financial resources into these supercenters in order to assemble a monster-like machine, so that when we really need computing muscle we can find it? After all, if networking keeps its promises, there is not a big difference between having the machine next door or on another continent. Technically this seems to me very sound. Not politically. I believe, extrapolating from past experience, that it is very unlikely for computer centers based in different countries to join their financial strengths. This might be understandable across national boundaries, since you would have to cope with different regulations, funding cycles, administrative bureaucracies, etc. A solution could be advocated through the intervention of the European Community, but who would assume the responsibility of choosing one country rather than another? The natural outcome would be, inevitably, the creation of a number of Supercenters,
thus taking what is called a 360 degree turn, back to where we started. I do not see any solution to the riddle of joining financial resources across Supercenters to create the largest facility possible. Invoking national barriers was only meant to grant some form of dignity to the ensuing quarrel. I really do believe that unifying financial resources would turn into a political disaster even within national borders. It seems unlikely to me that a computer center would contribute its financial assets to an operation based somewhere else if all it got in return were network access to a machine somewhat larger than the one the center could obtain on its own. Support for this conjecture comes from the European Teraflop Initiative (ETI). This is an initiative of European scientists trying to keep up with a similar initiative on the part of American scientists. The ETI group, originally started by scientists active in Lattice QCD research, eventually broadened to a wider scientific scope. The proposal that emerged from the working group was presented to Brussels in September 1991 and called for a first procurement of four 100-200 Gflops machines by the end of 1993 and a Tflops machine by the end of 1995. The first phase of this procurement certainly stems in part from technical reasons related to the content of the proposed research but, in my view, mainly from the consideration that it would be a lot easier, politically speaking, to gather support for a proposal calling for a large installation in each of the larger EEC countries. This being the trend, does it mean that we have to forgo the possibility of pooling resources to scale up our computing facility? For the moment yes, but in the future there might be a way out. We have to go back to my opening statement that technology screams parallel and portability is a major issue. Let us consider the issue of portability for a while. Portability, in its crudest form, means that you only have to maintain one source code and it will run, and perform well, on every interesting platform, where 'interesting' is, more often than not, defined by the computational power of the
platform itself. This concept is extremely important for two reasons, both deeply tied in with the economics of software development. Developing a different code for each target machine is unthinkably expensive and nobody does it. Besides, it is largely unnecessary, since high-level languages do a good job of ensuring portability to a large extent. The next most expensive activity is code maintenance. None of the software houses I spoke to is even remotely interested in maintaining two or more versions of the same code. But this is where we are at. When we step into the realm of parallel computing we are, sadly, doing away with the concept of portability. We are hit extremely hard by the lack of standards and by the profusion of different beliefs on the part of the MPP manufacturers. This need not be so. There is enough software technology to define standards for parallel machines that would make porting across different architectures a mere question of recompiling [2,3]. I would like to postpone my view on this issue a little bit longer and dive fully into the consequences of such an achievement. Programming a massively parallel machine is largely invariant with respect to the architecture at hand. A few basic concepts apply whether the machine is a tightly coupled MPP or a loosely coupled cluster of workstations. The next step is obvious. Would it be the same if the nodes of our metamachine were distributed over a very wide-area network? Probably yes. It is my conviction that we could conceive, and successfully realize, a machine geographically distributed over all of Europe (or, for that matter, the whole world) whose processing elements were the supercomputers housed in the various computing centers, industrial research laboratories and so on.
Let us take a moment to see what I mean. Assuming the development of a programming model that encompasses geographically distributed clusters, we could envisage a metacomputer that grows slowly but steadily as partners, located in different parts of the globe, decide to join in. This idea would immediately realize the closest approximation
to the scalability of funds. In this framework a computing center, regardless of its size, would not have to give up its local computing capability in order to contribute to something larger but remote. By contributing some proportion of its resources it would, in turn, provide its users, when needed, with the largest system available.
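To make the portable, architecture-blind programming style concrete, here is a minimal sketch written against the Fortran binding of the draft Message-Passing Interface [3]. The program and its names are purely illustrative: each node contributes a partial value and a single collective call combines them, so the same source should recompile unchanged on a tightly coupled MPP, a cluster of workstations or, in principle, the kind of WAN-distributed metacomputer envisaged above.

      program globsum
c     Minimal sketch of architecture-independent message passing,
c     using the Fortran binding of the draft MPI standard [3].
      include 'mpif.h'
      integer rank, nprocs, ierr
      real part, total
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
c     Each node contributes a partial value (its rank, for brevity).
      part = real(rank)
c     One collective call combines them; the interconnect topology
c     is entirely the library's business, not the program's.
      call MPI_ALLREDUCE(part, total, 1, MPI_REAL, MPI_SUM,
     &                   MPI_COMM_WORLD, ierr)
      if (rank .eq. 0) print *, 'sum over', nprocs, 'nodes:', total
      call MPI_FINALIZE(ierr)
      end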
3. Industrial collaborations at CRS4

After such non-committal speculations it is high time to come back to earth and delve into reality. At CRS4 we have been quite adamant in stating that our goal is to introduce supercomputing into the daily activity of industrial research, and our bet is laid fully on MPP technology. The common denominator of our relationships with industrial partners is the consideration that industry (at least in Italy) is unwilling to shoulder all the risk connected with the introduction of a new technology. There are a number of reasons for this, but mainly three: the HW scenario is not sufficiently defined to justify a massive SW investment; there is some lingering doubt about the validity of the technology; and it is difficult to recruit skilled personnel to work with the new technology. In this list rests the substance of our industrial relationships. CRS4 is seen as an experimenter with new technology that can provide guidance on the emerging architectures and, by integrating over a complex and diversified experience, shorten the time needed to develop an application. As a prototype of this concept we can look at the relationship we are establishing with the Italian petroleum company AGIP on the development of MPP algorithms for seismic migration. The research agreement we are about to sign provides for CRS4 to be the test bed for different architectures and to serve as a competence center advising on SW and HW technology. We will develop parallel implementations of the most widely used algorithms on various platforms and transmit the feedback to the industrial partner.
This competence across machines would be difficult to acquire within the standard research structure of AGIP, which is more sensitive to pressure from the production department and is certainly unwilling to experiment with different architectures. As a result of our partnership, we will provide not only code but training for their staff and, finally, recommendations concerning which platforms best suit their needs. This is the model that we are trying to import into our daily activity. We want to identify an end-user who could profit from MPP technology and establish with them some sort of scientific-technical partnership on a well-defined project. A strong point of the CRS4 approach is that we have internally acquired strong competence in various fields of potential industrial interest, such as CFD, combustion modeling and computational chemistry, and we can provide, besides expertise in MPP program development, a solid background in modeling and insight into a variety of non-trivial scientific issues. To offer a product whose content is scientific research, we cannot limit ourselves to porting existing applications; we have to maintain and develop in-house a high-level research activity from which competence can be drawn and transmitted to the industrial world.
4. Lessons from our one-year experience

Approaching industrial applications from the software point of view, there is one issue that needs an immediate answer: the role of porting. This is a crucial issue, since the code in use in the industrial world has been conceived for sequential machines. Two views confront each other. Some people seem to believe that the way to go is to provide 'smart' compilers that, when confronted with a sequential, possibly vectorized, f77 code, would produce a 'parallelized' version. My personal point of view is that parallelization is not a compiler issue but an algorithmic one, and as such it is the programmer's task to obtain it.
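As a concrete, if hypothetical, illustration of parallelization as an algorithmic decision rather than a compiler transformation, consider a one-dimensional domain decomposition of a Jacobi-type relaxation, again sketched in the Fortran binding of the draft MPI [3]. The decomposition, the halo cells and the single nearest-neighbor exchange per sweep are all choices made by the programmer; no compiler discovers them in the sequential source.

      subroutine relax(u, unew, n, left, right)
c     Hypothetical sketch: the grid is split into contiguous blocks
c     of n points per process; u(0) and u(n+1) are halo copies of
c     the neighbors' edge values.  left and right are neighbor
c     ranks, MPI_PROC_NULL at the physical boundaries [3].
      include 'mpif.h'
      integer n, left, right, i, ierr
      integer status(MPI_STATUS_SIZE)
      real u(0:n+1), unew(n)
c     One nearest-neighbor exchange per sweep: two words of
c     communication however large n grows -- locality pays.
      call MPI_SENDRECV(u(1), 1, MPI_REAL, left, 0,
     &                  u(n+1), 1, MPI_REAL, right, 0,
     &                  MPI_COMM_WORLD, status, ierr)
      call MPI_SENDRECV(u(n), 1, MPI_REAL, right, 1,
     &                  u(0), 1, MPI_REAL, left, 1,
     &                  MPI_COMM_WORLD, status, ierr)
c     Purely local update; the caller swaps u and unew each sweep.
      do 10 i = 1, n
         unew(i) = 0.5 * (u(i-1) + u(i+1))
 10   continue
      return
      end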
Accepting this point of view implies that we are willing to make the software investment needed to tackle the large set of problems ahead of us. There are, in my view, a few compelling reasons to do so. Primarily we must realize that most of the existing so-called 'dusty decks' are applications developed a number of years ago, targeting machines capable of delivering sustained performance in the hundreds-of-Mflops range. The complexity of the models described by these codes is consistent with that limitation. We are now looking at a technological revolution that aims at much higher performance. It would seem excessively reductive to limit ourselves to that type of code. Together with the programming effort of defining native parallel algorithms we must also complete the job and introduce more sophisticated models aimed at a closer representation of the physical phenomena. The level of detail attainable at 100 Gflops sustained performance must be qualitatively better than what is available in most existing codes. It would be an exceedingly reductive interpretation of the meaning of the MPP revolution if we limited ourselves to a mechanical translation of existing applications. The potential hidden in a new technology must be exploited by a creative effort on the modeling side as well. Relationships between science, engineering and industry must become closer than ever to generate the appropriate pool of competence and to identify real needs. It seems a plain waste of energy to port a commercial combustion code to a parallel machine when that code has very little to do with the phenomena it is trying to describe. It goes without saying that we cannot tackle this task if we are not given reasonable assurance that this effort will be protected, at least across the lifetime of MPP technology. Parallel machines will keep evolving, but some basic concepts will remain stable. They will have physically distributed memory, with more or less hardware support for a shared memory model, but the substance of exploiting locality will stay. Minimizing communication will remain the bread and butter of parallel programming. The topology of the interconnect might change; it is not even
obvious that the details of this topology will be relevant. In any case, we cannot ask that the burden of worrying about these details be left to the programmer; this would imply that for every new architecture the whole game would need to be played again. Some responsibility has to be left to the compiler. I do not mean to go back to the position that claims we can compile the old f77 code effectively; I simply mean that if the programmer provides enough information to the compiler, the compiler writer will be able to write an effective one. Existing languages simply do not convey that information. This is the root of my skepticism about the possibility of success of an effort based on the existing f77 code. Such code does not contain enough information for the compiler to exploit in order to optimize the parallel version of a typical program effectively. Jumping to a higher level of architectural sophistication, distributed processors and distributed memory, we see that an essential ingredient of the algorithm is how we intend to move our data. An array whose usage is foreseen as mostly nearest-neighbor communication will need to be laid out differently than an array undergoing mostly fast Fourier transforms, and the details of the layout are critically machine dependent. Since we maintain that machine dependency cannot be the programmer's worry, the programmer must feed enough information to the compiler. It is inevitable that we will be looking at some form of extension of the existing languages. The ongoing effort on High Performance Fortran [2] could provide sensible answers to these questions, and I urge manufacturers to commit serious resources to its success. Without such a tool I do not at present see any way to promote acceptance, by industrial partners, of the ambitious plan of developing applications on MPPs.
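As a sketch of the kind of layout information at stake, in the directive syntax of the draft HPF specification [2] (the array names, sizes and processor arrangement here are hypothetical):

c     Illustrative data-layout directives in draft HPF syntax [2].
      real u(1024, 1024)
      complex f(1024, 1024)
!HPF$ PROCESSORS p(4, 4)
c     u is stencil-bound: a two-dimensional BLOCK layout keeps all
c     nearest-neighbor references on-node except at block faces.
!HPF$ DISTRIBUTE u(BLOCK, BLOCK) ONTO p
c     f undergoes column FFTs: leave dimension 1 undistributed so
c     each one-dimensional transform is communication-free, paying
c     for one transpose only when rows must be transformed instead.
!HPF$ DISTRIBUTE f(*, BLOCK)

The point is precisely the one argued above: the directives state how the data will move, and the compiler, not the programmer, turns that statement into a machine-dependent layout.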
5. HPCN and CRS4

CRS4 is a place where science and industry should meet, and the meeting ground will be high performance computing and networks. After just one year of operation we can
claim an impressive list of successes. We are connected to the outside world by a 2-Mbit line, the computing power installed is already quite substantial, and we are in the process of installing a number of parallel machines. We have built credibility with the industrial world by enlisting first-rate scientists and endorsing a program of applied research. The goal is to provide research capabilities in mathematical modeling and numerical simulation that can be reutilized in industrial environments. Research topics are selected either as a result of a contract with an outside partner or in the framework of acquiring competences that we deem likely to attract future contracts. As an example, in the parallel computing group, besides other topics, we have ongoing research on the simulation of liquid water. In making this choice we considered, besides the obvious academic interest of the subject, the value of the experience we would gain in programming N-body algorithms (a minimal kernel of this kind is sketched at the end of this section) and the proximity of this subject to themes of interest to the pharmaceutical industry. The full interplay between scientific research and applied research, between internally generated interests and industrial contracts, and the relative weight that all these activities should have within CRS4, is not fully clear in my mind. We are looking for balance as we move along. There are not many experiences like ours from which we can draw wisdom. It is clear, though, that we have made a distinct choice. The introduction of supercomputing into industrial processes must be
accelerated, and my vision of supercomputing is massively parallel supercomputing.
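As promised above, a purely illustrative kernel of the N-body kind that underlies the liquid-water work: an O(N**2) pairwise force loop, with a Lennard-Jones interaction in reduced units standing in for a realistic water potential. A production code would add neighbor lists, periodic boundaries and a parallel decomposition; none of that is shown here.

      subroutine forces(x, y, z, fx, fy, fz, n)
c     Hypothetical sketch of the pairwise kernel at the heart of
c     N-body molecular simulation; Lennard-Jones with unit sigma
c     and epsilon stands in for a realistic water model.
      integer n, i, j
      real x(n), y(n), z(n), fx(n), fy(n), fz(n)
      real dx, dy, dz, r2, r6i, ff
      do 10 i = 1, n
         fx(i) = 0.0
         fy(i) = 0.0
         fz(i) = 0.0
 10   continue
      do 30 i = 1, n - 1
         do 20 j = i + 1, n
            dx = x(i) - x(j)
            dy = y(i) - y(j)
            dz = z(i) - z(j)
            r2 = dx*dx + dy*dy + dz*dz
            r6i = 1.0 / (r2*r2*r2)
c           Magnitude of the Lennard-Jones force divided by r.
            ff = 24.0 * r6i * (2.0*r6i - 1.0) / r2
c           Newton's third law: update both particles at once.
            fx(i) = fx(i) + ff*dx
            fy(i) = fy(i) + ff*dy
            fz(i) = fz(i) + ff*dz
            fx(j) = fx(j) - ff*dx
            fy(j) = fy(j) - ff*dy
            fz(j) = fz(j) - ff*dz
 20      continue
 30   continue
      return
      end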
Acknowledgements

It is a pleasure to acknowledge the critical reading of the manuscript by A. Scheinine, M. Guanziroli, E. Bonomi and C. Nardone, and the helpful suggestions that came from it. The work done at CRS4 is partially supported by the Sardegna Regional Authorities.
References

[1] Commission of the European Communities, Report of the High Performance Computing and Networking Advisory Committee, Oct. 1992.
[2] High Performance Fortran language specification, Rice University, Houston, TX, Nov. 1992.
[3] Draft document for a standard Message-Passing Interface, March 1993.

Dr. P. Rossi received his 'Laurea' in physics at the University of Parma in 1979 and a Ph.D. from New York University in 1984. After spending two years at Cornell University as a postdoctoral research associate in the Physics Department, he went to the University of California, San Diego, first as a postdoctoral research associate and later as a scientific research associate. In April 1990 he joined Thinking Machines Corporation, working on the porting of applications to MPPs, and in June 1992 he became director of the parallel computing group at CRS4. His research activity has been in the field of particle physics, with particular emphasis on numerical simulations. He has also authored a number of papers on mathematical physics, and recently his interests have moved towards parallel algorithms.