Future Generation Computer Systems 12 (1997) 335-344
Intelligent storage devices for scalable information management systems

Robert Kukla*, Jon Kerridge
Department of Computer Studies, Napier University, 219 Colinton Road, Edinburgh EH14 1DJ, UK
* Corresponding author. Email: r.kukla@dcs.napier.ac.uk
Abstract

For most large commercial organisations the ability to store and manipulate massive volumes of data is a key operational requirement. Database systems are fundamental to the efficient provision of a range of business tasks, from on-line transaction processing and decision support to enterprise information systems. High-performance multiprocessor database servers open up new possibilities for the realisation of these commercial requirements. If these opportunities are to be realised, intelligent storage devices will be required which provide a uniform interface regardless of implementation and which also directly support the basic need for scalability. In this paper we explore some of the issues for such a device.

Keywords: Database management systems; Parallel processing; Storage device; ODBC; Transputer
1. Introduction and rationale
The goal of any designer building a scalable parallel database machine is to ensure that the interface between the relational processing part of the machine and the storage system is independent of the actual storage media. The interface [1] should be at as high a level as possible to ensure that as much processing as possible can be encapsulated in the storage-level components. The storage design should also consider such aspects as scalability, access optimisation, backup and recovery, concurrency management, mixed on-line processing and decision support, support for triggered actions and low-level support for the full range of Structured
Query Language (SQL) data manipulations. The design should also consider the requirements for advanced data types to support multi-media, object-based and scientific applications and other applications requiring abstract data types. These requirements are motivated by the increasing commercial requirement for scalable information systems that hold data which is to be used both for on-line transaction processing (OLTP) and decision support systems (DSS). By scalability we identify two basic needs: firstly, the ability to size the computer system to the initial needs of the application with an appropriate performance and, secondly, the ability to change the size of the system easily as organisational needs impose changes on the performance and capacity of the installed system. Previous projects have shown how this requirement can be achieved [2], but only for those applications where the amount of decision support is limited.
Access optimisation, commonly in the form of indexes, is crucial if the overall performance of the system is to be adequate. The requirements for OLTP and DSS differ widely and it is thus not sensible to assume that a single indexing strategy will suffice. The system implementor must have the capability of selecting from a range of such optimisations.

As systems become larger, in terms both of the amount of data held and its rate of change, aspects such as backup and recovery become more critical. In particular, such systems cannot have long periods of non-availability while part of the system is being backed up. Thus we must have a system in which each storage device has a multi-level backup and recovery capability. This can be achieved, at a primary level, by having autonomous backup and recovery located within each storage device.

Similarly, as DSS applications become more sophisticated, some form of support for monitoring the state of the database which is not dependent on continual querying of the database by the application becomes essential. The move to active databases has identified the commercial need, but there is still a need for low-level support, within the storage device itself, if such a capability is not to impose too large an overhead.

Underpinning these changes there has to be a radical redesign of the way in which interaction between transactions is managed. Traditional concurrency management systems rely on the maintenance of locks. In general, lock managers are hard to scale, as is evidenced by the amount of effort database suppliers have expended in trying to implement parallel lock managers. A more radical approach is to use an optimistic strategy [3], which can be distributed to individual storage devices more easily and which requires limited inter-storage-device communication only when an isolation level violation is detected.

Many SQL data manipulation operations require access only to a single table, which means that this processing can be off-loaded to the table storage device if it has some form of processing capability. Furthermore, if the storage device contains its own meta-data repository it is easy to extend the data types which can be manipulated, because appropriate processing code to support abstract data types can easily be incorporated into the processor associated with the storage device. We have thus provided the motivation for a storage device which contains its own processing capability and which is able to undertake the processing requirements outlined above.
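To make the optimistic strategy concrete, the sketch below shows one conventional form of backward validation performed at commit time. The data structures and the validation rule are illustrative assumptions for this paper, not the algorithm used by the concurrency manager described later.

/* Minimal sketch of optimistic (backward) validation: a transaction may
 * commit only if no overlapping committed transaction wrote a row it read.
 * All names and structures are illustrative, not the CLM's actual design. */
#include <stdio.h>

#define MAX_KEYS 8

typedef struct {
    int read_set[MAX_KEYS];    /* row numbers read by the transaction    */
    int n_read;
    int write_set[MAX_KEYS];   /* row numbers written by the transaction */
    int n_write;
} Txn;

static int validate(const Txn *t, const Txn *committed, int n_committed)
{
    for (int c = 0; c < n_committed; c++)
        for (int w = 0; w < committed[c].n_write; w++)
            for (int r = 0; r < t->n_read; r++)
                if (committed[c].write_set[w] == t->read_set[r])
                    return 0;  /* conflict: abort and restart */
    return 1;                  /* no conflict: apply the write set */
}

int main(void)
{
    Txn committed[1] = { { {0}, 0, {42}, 1 } };  /* someone updated row 42 */
    Txn t = { {42, 7}, 2, {7}, 1 };              /* we read rows 42 and 7  */
    printf("commit allowed: %d\n", validate(&t, committed, 1));
    return 0;
}

Because the check only needs the read and write sets held locally, it can run inside each storage device, with inter-device communication required only when a violation is detected.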
2. Fundamental concepts for an intelligent storage device

We propose a low-level Data Access Component (DAC) as shown in Fig. 1. Each DAC comprises a processor, local code space and working memory, bulk table storage, which can be provided by any storage medium such as disk or semi-conductor memory, and finally non-volatile backup storage for storing transaction recovery information and checkpoints. This provides first-level backup and recovery local to the DAC. For a more secure system a second level of remote backup capability will be required. Depending on the types of storage chosen, the DAC will have different performance characteristics which can be exploited by the database machine designer.

A database machine will comprise many such DACs, with tables partitioned over the DACs to improve intra-query parallelism. Performance is gained by having many processors working at the same time on their own data sets. This is the basis of first-generation parallel database machines such as the Teradata machines. These machines tended to have fixed hardware and be limited in the way in which data could be placed on the hardware. They had
Fig. 1. Structure of a generic DAC.
a limited view of how the data were to be processed, being optimised either for OLTP or DSS but not both at the same time. Much more flexibility is required in the way data are placed on the storage devices if systems designers are to gain acceptable performance. For example, the designer must be able to choose an appropriate partitioning strategy such as range, round-robin or hash key-based partitioning, where each has benefits in different application environments. The key to the performance gain is that the designer must be able to split the data over a large number of suitably sized storage devices. This performance gain will only be achieved if each storage device has its own processor. The improvement in processor performance has not been matched by an equivalent improvement in disk performance: disks have become larger and cheaper, but their access performance has not improved by the same amount as processor performance.

The key to the design is the level at which the access interface interacts with the DAC. We have chosen to use a protocol which is much closer to an SQL command than, say, the SCSI access to a disk. The interface refers to tables and columns and passes predicates as reverse polish expressions. The access interface is at a sufficiently high level that the detailed way in which data are accessed in the bulk table storage is devolved to the DAC [1]. The access interface, called W-SQL, is based in the main upon a synchronous client-server style in which a DAC receives a request for some operation to be undertaken and then responds to that request with a reply containing the result of the operation. SQL SELECT requests for data from a DAC always result in (part of) a table being returned to the client. Relational join processing is always undertaken external to the DAC. The DAC does, however, maintain system-wide unique row-identifiers which permit a sophisticated form of foreign-key processing, which means that foreign-key joins can be processed very efficiently without the use of indexes. The same mechanism can be used to implement the star schema commonly found in data warehouses [4]. SQL INSERT, DELETE and UPDATE requests have the desired effect on the table and any associated indexes. A DAC can deal with concurrent requests from different transactions but always
responds to a request from a transaction before accepting the next request from that transaction. Apart from undertaking relational select and project operations, the DAC also provides low-level support for cursors, grouping and aggregation, though the total effect of such an operation may only be realisable after further processing in which the results from different DACs are combined.

The only asynchronous operation within the DAC results when the monitoring system detects a state which causes a monitor to fire. A message containing the requested data for that monitor is sent to an external process, which has to be able to accept the data asynchronously. The monitors currently implemented can fire pre- or post-commit, can take account of the type of SQL command that effected the change and can differentiate between cumulative, relative and fixed-point changes in values in the database. The monitor is provided as low-level support; complex active rules involving several table partitions are resolved by further processing external to the DAC. The monitor capability permits the implementation of SQL2 triggered actions but also enables the construction of more complex active rule-based systems.
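The exact W-SQL message format is not reproduced in this paper. As an illustration of the idea of shipping predicates as reverse polish expressions over numbered columns, the following sketch evaluates such an expression against a single row; the token set and layout are assumptions made for the example, not the actual W-SQL encoding.

/* Illustrative evaluation of a predicate shipped in reverse polish form,
 * with columns referred to by number rather than name. */
#include <stdio.h>

typedef enum { PUSH_COL, PUSH_CONST, OP_GT, OP_EQ, OP_AND } TokType;
typedef struct { TokType type; int arg; } Token;

/* Evaluate the token list against one row; returns 1 if the row matches. */
static int eval(const Token *p, int n, const int *row)
{
    int stack[16], sp = 0;
    for (int i = 0; i < n; i++) {
        switch (p[i].type) {
        case PUSH_COL:   stack[sp++] = row[p[i].arg];              break;
        case PUSH_CONST: stack[sp++] = p[i].arg;                   break;
        case OP_GT:  sp--; stack[sp-1] = stack[sp-1] >  stack[sp]; break;
        case OP_EQ:  sp--; stack[sp-1] = stack[sp-1] == stack[sp]; break;
        case OP_AND: sp--; stack[sp-1] = stack[sp-1] && stack[sp]; break;
        }
    }
    return stack[0];
}

int main(void)
{
    /* WHERE column 1 > 100 AND column 2 = 3, in reverse polish form */
    Token pred[] = { {PUSH_COL,1}, {PUSH_CONST,100}, {OP_GT,0},
                     {PUSH_COL,2}, {PUSH_CONST,3},   {OP_EQ,0}, {OP_AND,0} };
    int row[] = { 7, 250, 3 };
    printf("row matches: %d\n", eval(pred, 7, row));
    return 0;
}

Passing predicates in this pre-parsed form keeps the per-row evaluation inside the DAC simple and avoids shipping table or column names with every request.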
3. Disk-based implementation: DAC[d]

The first successful implementation was a memory-based DAC[m] built around a specialised microprocessor, the transputer [5]. Its internal design is shown in Fig. 2. The DAC[m] comprises the components
Fig. 2. Internal structure of the DAC[m].
introduced in Fig. 1 (bulk memory, backup storage), but also gives details of the process structure. It can be viewed as a number of concurrently running sequential processes communicating via message-passing channels. An external access process (AP) communicates with a message handler (MH) that distributes tasks to transaction processes (TRs), a concurrency and lock manager (CLM) or the disk interface (DI). The tables are stored in memory, organised in shared arrays of pointers to data structures that can be easily processed by the TRs. However, only the CLM may change the data, and only at commit time. Together with an optimistic concurrency strategy this guarantees a consistent data set. The DI is responsible for transaction log management as well as backup and recovery. Note that there is no operating system or file system: the transputer provides its own hardware-implemented scheduler, and disk access is performed via direct access to the SCSI hardware. The DAC has full control over the processor and its resources.

The DAC integrates important parts from different layers of commonly known database technology into a single low-level unit. When used as part of a DBMS it frees the higher layers from secondary tasks, such as backup and concurrency control, allowing them to be more efficient in data processing. Only pre-filtered data needs to be analysed, an important advantage in data mining environments. However, the success of the DAC[m] implementation also made it clear that further work was needed to transform the DAC concept into a unit that can be a basic building block in a parallel database machine. It showed, for example, that it was not feasible to store large data volumes in memory for reasons of cost. Another aspect is that with growing data volumes checkpointing becomes a critical factor, slowing down a memory-based system.

The concept is not restricted to a memory-based unit, and so the next logical step was to build a disk-based DAC. In this context 'disk-based' means that not all the data is held in memory at one time; instead it is retrieved from a mass storage device (disk) on demand. The decrease in bandwidth has to be compensated for by software and by the fact that access to all the data at one time is typically not needed. In real-world applications only a selection of tables are
being worked on with respect to a specific task. Using an intelligent caching system, these could be held in memory after the first access. Sections 3.1-3.3 look at the work that has been done to satisfy the requirements of the DAC[d].

3.1. Fast disk access mechanism

Moving data tables from memory to an external disk means that individual rows cannot be manipulated directly. Instead a local copy has to be created in memory first and, in the case of an update, written back to disk later. The fact that disk access is based on fixed-length blocks/sectors not only generates a massive overhead when the data item to be read/written is small, but is also a possible cause of inconsistency in a parallel environment, because there may be multiple copies of the same disk block in memory. The way to cope with both problems is to have a central instance that manages disk access. This disk interface process (DIP) acts as a server to the TRs and the CLM and makes sure there are no multiple copies of the same data in memory. Fig. 3 shows how this idea is incorporated in the DAC. When a row, or a number of rows, is loaded and locked in memory (MEM), TR and CLM proceed as usual, until the block is released at commit time. As multiple processes can access the same data at a given time, a reference count needs to be maintained. Blocks need only be written back to disk if the data have changed; otherwise they can be invalidated and the space reused. Writes can be delayed if a logging mechanism ensures data safety. Holding blocks in memory as long as possible reduces disk
Fig. 3. Internal structure of the DAC[d].
access if data from the same block are requested later. This principle is known as caching. If the blocks which are needed next can be anticipated, data can be pre-fetched into memory. The algorithms to do so are part of other research projects; obvious candidates are to read whole cylinders of a disk in one revolution, or to pre-fetch blocks of tables that are referenced by the current block. With the DIP solution, every access to the data tables is replaced by a dialogue with the DIP, leaving the rest of the code working as it was. The DIP is designed for the maximal possible throughput of the disk bus, as message handling and disk access are decoupled. Requests for data blocks can either be served from memory or are queued by a buffer process, where they can be reordered to optimise disk access. The design of the DIP conforms to concurrent programming requirements, i.e. it is deadlock-free and its CPU utilisation is low [6].
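The following sketch illustrates the reference-counting and delayed write-back ideas behind the DIP: one in-memory copy per disk block, pinned while in use and written back only if dirty. The structures, sizes and names are our own illustrative assumptions, not the DIP implementation.

/* Sketch of a reference-counted block cache: one copy of each disk block
 * in memory, shared by all requesters, written back only when dirty. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 4

typedef struct {
    long block_no;              /* -1 when the slot is free            */
    int  refcount;              /* number of TR/CLM processes using it */
    int  dirty;                 /* needs writing back before reuse     */
    char data[BLOCK_SIZE];
} CacheSlot;

static CacheSlot cache[CACHE_SLOTS];

/* Disk access stubs standing in for the real SCSI-level routines. */
static void read_from_disk(long block_no, char *buf)      { (void)block_no; memset(buf, 0, BLOCK_SIZE); }
static void write_to_disk(long block_no, const char *buf) { (void)block_no; (void)buf; }

/* Pin a block in memory; requests for the same block share one copy. */
static CacheSlot *acquire(long block_no)
{
    CacheSlot *victim = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].block_no == block_no) { cache[i].refcount++; return &cache[i]; }
        if (cache[i].refcount == 0 && !victim) victim = &cache[i];
    }
    if (!victim) return NULL;                       /* all slots pinned: caller must wait */
    if (victim->dirty) write_to_disk(victim->block_no, victim->data);
    read_from_disk(block_no, victim->data);
    victim->block_no = block_no;
    victim->refcount = 1;
    victim->dirty = 0;
    return victim;
}

static void release(CacheSlot *s, int modified)
{
    if (modified) s->dirty = 1;                     /* write-back is delayed */
    s->refcount--;
}

int main(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) cache[i].block_no = -1;
    CacheSlot *a = acquire(17);
    CacheSlot *b = acquire(17);                     /* same block, one copy */
    printf("shared copy: %d, refcount: %d\n", a == b, a->refcount);
    release(b, 1);
    release(a, 0);
    return 0;
}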
3.2. Support for multi-DAC transactions

In a multi-DAC environment transactions may or may not involve more than one DAC. For a given query there are two possibilities: either the location of the record is known from the placement strategy, or it is unknown. In the first case the query can be directed to the DAC in question; the second involves searches in all DACs, which can be carried out concurrently. Often, however, the location of a record is known only indirectly, that is, the record is referenced from another record, usually via a foreign-key reference. Such a query can be implemented as two reads of the previously mentioned types, where the predicate for the second is the result of the first. Joining tables like this can become a time-consuming task. A solution is to store a pointer-type reference instead of the foreign-key reference. This makes it necessary to introduce a system-wide unique identifier for every row. The pointers are stored in hidden columns (semi-)automatically and are defined during database design. For the implementation of these so-called row-ids, a combination of DAC number, table number and row number is used.

The interface definition allows the use of the extended row-id and permits integrity checks over a number of DACs. There are dialogues that allow a read, an update or a delete based on a row-id. The concept introduces a number of problems. To avoid references to non-existent rows, a reference check has to be made for each delete; this is supported by the design. Care must be taken when reusing row-ids, if it is allowed at all, as the row-id is derived from physical parameters. The biggest advantage is gained with databases that are append-only (like the Shell example described in Section 5) or do not require frequent updates. The concept of row-ids also allows support for complex data structures (objects), as it is possible to store row-ids in tables for another level of indirection. Although research is being undertaken [7], not all implications of this mechanism have been investigated yet.
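As an illustration of how such a row-id could be composed from a DAC number, a table number and a row number, consider the sketch below. The field widths and the 64-bit packing are assumptions made for the example, not the actual W-SQL row-id format.

/* Illustrative packing of a system-wide unique row-id. */
#include <stdio.h>
#include <stdint.h>

typedef uint64_t RowId;

#define DAC_BITS   12   /* up to 4096 DACs            */
#define TABLE_BITS 12   /* up to 4096 tables per DAC  */
#define ROW_BITS   40   /* rows within a partition    */

static RowId make_rowid(unsigned dac, unsigned table, uint64_t row)
{
    return ((RowId)dac   << (TABLE_BITS + ROW_BITS)) |
           ((RowId)table << ROW_BITS) |
           (row & ((1ULL << ROW_BITS) - 1));
}

static unsigned rowid_dac(RowId id)   { return (unsigned)(id >> (TABLE_BITS + ROW_BITS)); }
static unsigned rowid_table(RowId id) { return (unsigned)((id >> ROW_BITS) & ((1u << TABLE_BITS) - 1)); }
static uint64_t rowid_row(RowId id)   { return id & ((1ULL << ROW_BITS) - 1); }

int main(void)
{
    /* A hidden foreign-key column stores this value instead of the key,
     * so the referenced row can be addressed directly without an index. */
    RowId id = make_rowid(3, 17, 123456);
    printf("dac=%u table=%u row=%llu\n",
           rowid_dac(id), rowid_table(id), (unsigned long long)rowid_row(id));
    return 0;
}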
3.3. Local backup and recovery

Recovery as implemented in the DAC[m] allows any individual DAC to recover to its latest consistent state. However, this approach is not sufficient in a multi-DAC system: a failure during a commit involving multiple DACs can lead to a corrupted transaction that cannot be detected on restart. Possible solutions were considered (central logging, a log entry on prepare-to-commit) and the decision was made to implement centrally controlled recovery with a decentralised (distributed) logging system. A list of all DACs that are involved in a transaction is stored in the log at commit time. On start-up a central agent (usually the DBMS) ensures that only those transactions are rolled forward that have successfully committed on all involved DACs.

Checkpoints are used to create a system-wide consistent state; each checkpoint is uniquely identified and numbered so that the time order of checkpoints can be determined. A checkpoint guarantees that transactions are either committed on all DACs or on none. For each DAC this state can be restored, avoiding the need to roll forward the whole transaction log. The central agent must ensure that any failure to receive a
satisfactory response to a checkpoint request from any DAC results in suspension of transaction processing. The machine may then be allowed to shut down ungracefully: as long as the transaction log is intact, the system will recover on start-up once the cause of the failure has been rectified. In order to cope with failures during checkpointing there must always be the possibility of restoring to a previous checkpoint. This needs to be controlled centrally to ensure that all DACs roll forward to the same checkpoint, which requires a modified start-up dialogue.

A DAC may be in one of two recovery modes, referred to as latest and external. If all DACs are in the latest recovery mode, no special action is taken as they all roll forward to their latest consistent state; this method is not recommended for multi-DAC systems. If at least one DAC is in external recovery mode (indicating a mixed environment or a disruption of the mode change), all such DACs will explicitly roll forward to their latest consistent state. If all DACs are in external recovery mode, the latest common checkpoint has to be determined by querying the DACs, and all DACs then roll forward to the identified checkpoint. Log entries are applied for all transactions that have committed on all DACs involved in the transaction. It is suggested, but not obligatory, to perform a checkpoint after a successful recovery in external recovery mode.

This mechanism aids recovery from a machine fault or power failure. In addition, backups will take place off-line, supported by additional software that allows a fall-back to a consistent (but not necessarily up-to-date) state. Provided the log is still intact, transactions can be rolled forward as described above. A backup should only be created when all DACs are off-line with a checkpoint at the end of the log (the same checkpoint on every DAC), which should be checked at backup time. This means that a backup is only valid (and useful) if the last entry in the log of each DAC is a checkpoint. Backups should be marked with the checkpoint identifier, and a set of backups is only valid if they all carry the same checkpoint identifier. Backups can be made centrally or individually for each DAC.
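A condensed sketch of the central agent's start-up decision is given below. The structures are illustrative, and taking the minimum of the per-DAC latest checkpoint numbers is used here only as a stand-in for determining the latest common checkpoint.

/* Sketch of the start-up decision: if every DAC is in external recovery
 * mode, all roll forward to the latest checkpoint they have in common;
 * otherwise each recovers to its own latest consistent state. */
#include <stdio.h>

typedef enum { MODE_LATEST, MODE_EXTERNAL } RecoveryMode;

typedef struct {
    RecoveryMode mode;
    long latest_checkpoint;   /* highest checkpoint number in this DAC's log */
} DacState;

/* Returns the checkpoint to roll forward to, or -1 for "latest local state". */
static long choose_recovery_target(const DacState *dacs, int n)
{
    long common = -1;
    for (int i = 0; i < n; i++) {
        if (dacs[i].mode == MODE_LATEST)
            return -1;                        /* latest-only or mixed system */
        if (common < 0 || dacs[i].latest_checkpoint < common)
            common = dacs[i].latest_checkpoint;
    }
    return common;                            /* latest common checkpoint */
}

int main(void)
{
    DacState dacs[] = { {MODE_EXTERNAL, 12}, {MODE_EXTERNAL, 11}, {MODE_EXTERNAL, 12} };
    printf("roll forward to checkpoint %ld\n", choose_recovery_target(dacs, 3));
    return 0;
}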
4. Integrating the storage device

4.1. A scalable database machine

A more detailed description of the structure of a scalable database machine (IRISS) which incorporates DACs is given by Walter [4,8]. The machine comprises a number of fundamental subsystems, see Fig. 4, each of which is itself scalable.

Transaction Managers (TM). Each transaction is allocated a TM which controls the execution of all the statements within that transaction. Each statement is passed to the TM as an optimised execution plan; the decomposition and optimisation of SQL statements is performed by other processes within the control subsystem. Resources for transactions are allocated at this stage, and no transaction is allowed to start for which resources are not available.

Base Table Handlers (THb). These perform operations on base tables, either by applying changes to the base tables or by extracting data from them. In either case the operations are viewed as table transformations. In the former case, base tables are transformed from one state to another by changing the data values; in the latter case, base tables are transformed into derived tables which are typically processed further by other parts of the system. THb processes work at the table level, hiding details of
Fig. 4. The architecture of the IRISS database machine.
the underlying data partitioning from the other processes in the Table Transformation System.

Base Table Volume Handlers (BTVH). These are the underlying storage controllers which deal with data access at the partition level and are implemented using DAC technology.

Derived Table Handlers (THd). These perform two main functions: they act as buffers for the temporary storage of derived table data, and they perform transformations upon derived tables. Storage is designed to be flexible. At one extreme, small amounts of data can be transferred very quickly using fast solid-state memory; at the other, large tables can be stored in their entirety using partitioned backing store facilities. The details of data storage are transparent to the rest of the system and are not discussed at length in this paper. Transformations performed by THd include ordering, grouping, duplicate elimination and selection/projection. Again, the storage is implemented using DAC technology, though the concurrency management aspect is not required, as a derived table can only be manipulated by a single transaction. These DACs have been further specialised to have a large semi-conductor memory as well as disk-based table storage, so that small intermediate results do not get stored on disk, except that they will be stored on the backup storage in case of failure.

Relational Elements (RE). These perform transformations which have more than one input table and a single output table. This includes several different join algorithms, merge, and certain operations to support the processing of ANY/ALL type subqueries. Unlike the THd processes, RE processes have no data storage capability, working entirely on streamed data.

The RE, THd and THb processes are all highly replicated within the architecture. The level of inter-transaction concurrency is dictated by the number of TM processes. The level of physical data partitioning is dictated by the number of BTVH processes, which will typically be mapped onto physical volumes on a one-to-one basis; that is, the BTVH process is the access process to the DAC which implements that volume.

In general, it is often advantageous to perform as much select/project processing as possible at the
BTVH level. This exploits the parallelism of the partitioned data storage and the associated processing elements, and reduces the amount of data flowing around the rest of the system. Rows of derived tables are then pipelined around the system according to the execution plan. THd processes are always used to buffer the output of THb and RE processes, even when no transformation is required; this ensures free flow through processes which themselves have no storage capability. Note that many THd operations are pipelineable, i.e. output can begin before input is completed. Some operations, however, require the whole of the input derived table to be stored before any output can be produced.

The software architecture described above is designed to be mapped onto a massively parallel distributed-memory hardware architecture. The only assumptions made about the architecture are that it has multiple processors, each with a local memory, and that the processors communicate via a high-speed interconnect fabric. Exact details of the process-to-processor mapping will depend on the configuration in question. The development platform for the IRISS project was a 32-node system based on Inmos T9000 processors and C104 crossbar switches [5].

4.2. Interfacing to commercial software packages

We have shown how the DAC can be used within an environment that was specifically designed to support all its features. The question now is whether it is also feasible to use it as a 'stand-alone' unit, attached to a plain workstation or PC, and whether it is beneficial to do so. Two aspects are to be considered: how the unit can be physically connected to the machine, and how it can be interfaced from commercial software packages.

The existing DAC implementations are based on the transputer, a specialised microprocessor with integrated serial links which allow the easy construction of processor networks. Interface boards exist that fit into a PC and allow the sending and receiving of data packets via those links at different speeds. While a PCI interface is the most effective in terms of transfer rate and latency, a 16-bit ISA interface is more economical but slower. In both cases low-level drivers need to be written to utilise the hardware.
Fig. 5. ODBC components.
The Microsoft Open Database Connectivity (ODBC) interface [9] is an industry standard that allows ODBC-enabled applications to access data from database management systems using SQL. ODBC hides the underlying data source behind a uniform C language programming interface. ODBC has four components: the application, which calls ODBC functions to submit SQL statements and retrieve results; the driver manager, which loads drivers on behalf of the application and provides parameter validation for ODBC calls; the driver, which establishes the connection to the data source, forwards requests and results, and also modifies requests to conform with the syntax required by the DBMS; and the data source, which contains the data the user wants to access.

In order to connect a DAC to an application via ODBC, we need to create a driver that translates from ODBC and generates the appropriate W-SQL calls utilising the low-level driver discussed earlier. Comparisons show a near one-to-one match between most of the transaction messages. However, there are some problems that have to be solved.
- W-SQL is not SQL. A parser is required to transform the ASCII-text SQL into the binary format of W-SQL. The main differences are: reverse polish notation for predicates and conditions, and table and column references by number (not name).
- The DAC does not allow dynamic table creation or deletion. The initial table definition will be handled by a separate application, which is also needed to define the data placement in the case of a multi-DAC system. Other 'oddities', such as pre-defined joins that are not part of the SQL standard, will also be handled by this application. The user will be able to manipulate (query, update) the defined tables as standard SQL tables via ODBC.
- DAC maintenance. It is anticipated that the program code for the DAC will either be booted from ROM or loaded via an external program. Start-up and recovery also need to be controlled by an entity that is not necessarily part of the driver.

The resulting structure is shown in Fig. 6, which replaces the driver and data source of Fig. 5. First, the ODBC calls are transformed into W-SQL interface commands. In some cases the query has to be replicated, if it is not clear which DAC contains the required data. W-SQL commands are then directed to the correct DAC by the routing and run-time control system using a low-level driver; run-time control also deals with error conditions. In the simplest case the run-time control communicates with a single DAC directly connected to the interface. It is also possible to have a number of DACs connected via a routing network: network routing processes can extract destination information from W-SQL commands and direct the messages to the correct DAC. The system management layer can communicate with all DACs via the same route as the transaction messages.

Fig. 6. W-SQL ODBC driver.
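A small sketch of the routing decision described above is given below: a command goes to a single DAC when the placement strategy identifies the partition, and is replicated to every DAC otherwise. The hash placement function and all names are assumptions made for the example, not the run-time control implementation.

/* Sketch of W-SQL command routing: direct when the partitioning key is
 * fixed by the predicate, replicated to all DACs when it is not. */
#include <stdio.h>

#define N_DACS 4

typedef struct {
    int  table;
    int  has_key;     /* does the predicate fix the partitioning key? */
    long key;
} WsqlCommand;

static void send_to_dac(int dac, const WsqlCommand *c)
{
    printf("table %d -> DAC %d\n", c->table, dac);
}

static void route(const WsqlCommand *c)
{
    if (c->has_key) {
        send_to_dac((int)(c->key % N_DACS), c);   /* hash placement */
    } else {
        for (int d = 0; d < N_DACS; d++)          /* replicate the query */
            send_to_dac(d, c);
    }
}

int main(void)
{
    WsqlCommand exact = { 1, 1, 123456 };   /* key known: one DAC     */
    WsqlCommand scan  = { 1, 0, 0 };        /* key unknown: all DACs  */
    route(&exact);
    route(&scan);
    return 0;
}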
5. Example applications
The IRISS architecture is being evaluated in two main application areas. The first relates to a retail banking environment where account information is being updated by on-line transactions whilst, at the same time, the transactions are being analysed for fraudulent access using on-line analytical processing techniques. The data set can be scaled using a synthetic data generation system which mimics the real application [10].

The database machine is also being used to store data resulting from a large-scale measurement activity being undertaken by Shell Research Ltd. They are taking measurements using LIDAR to determine escaping gases from oil refineries and other chemical plants. Once in full-scale operation this equipment will result in 500 Mbytes of data being saved per day per oil refinery. The data will be analysed to determine whether there is any consistent pattern in which such gases are being released from different plants and whether this is dependent on the age of the plant. The data are captured from the measuring equipment as a sequence of 10000 integers per firing of the tuned laser which forms the kernel of the LIDAR equipment. Such firings can occur every 2 s in the most usual operational mode of the equipment, where the laser is moved after every firing. The data are then preprocessed to extract useful information concerning the presence or otherwise of the gases to which the laser has been tuned. Over a period of time the shape of the cloud of any escaping gas can be determined; this shape is also influenced by the prevailing weather conditions.

Chemical and petroleum refining companies tend to build many similar processing plants throughout the world. Apart from providing a record of what was happening to a particular plant at a particular time, the data will also identify changes in the plant that are associated with component wear. This is achieved by looking at information from the same and similar plants over a period of time. Thus the data are to be used in both OLTP and DSS modes.
6. Conclusion

We have described a component which can support the basic functionality of an SQL database machine. The design is inherently parallel and exploits the advantages that such a design gives to produce a scalable database machine. With the development of the disk-based version, the DAC concept extends to application areas that have to cope with large data volumes and also becomes more flexible to use in a multi-DAC system. The next step will be to add different optimisation strategies for fast data access in the form of caching algorithms. Future extensions are planned and will include support for complex data structures using the ADT capability inherent in the DAC design.

Acknowledgements

Most of the DAC development work is carried out with funding from the EPSRC as part of the OBADIAH project. The LPD project, funded by the European Union, also contributed to its advancement. We wish to acknowledge our collaborators H. Walmsley from Shell Research Ltd. and S. Smith from Transtech Parallel Systems Ltd.
References

[1] J.M. Kerridge, D. Walter and R. Guiton, "W-SQL: An interface for scalable, highly parallel database machines", in: Advances in Databases, eds. C. Goble and J. Keane, Lecture Notes in Computer Science, vol. 940 (Springer, Berlin, 1995) 263-216.
[2] J.M. Kerridge, "IDIOMS: A multi-transputer database machine", in: Emerging Trends in Database and Knowledge-base Machines, eds. M. Abdelguerfi and S. Lavington (IEEE Computer Science Press, Silver Spring, MD, 1995) 9-23.
[3] S.W. Waithe and J.M. Kerridge, "A scalable massively parallel architecture for database concurrency control", Proc. Int. Conf. on Concurrent Engineering: Research and Applications (Pittsburgh, June 1994).
[4] D. Walter and J.M. Kerridge, "Relational query processing on the IRISS parallel database machine", HPCS96 (Ottawa, Canada, May 1996).
[5] INMOS, The T9000 Transputer Products Overview Manual (SGS-Thomson Microelectronics, 1991).
[6] R. Kukla and J. Kerridge, "A plug-in disk-interface-process for the W-SQL data access controller", in: Parallel Processing Developments, ed. B. O'Neill (IOS Press, Amsterdam, 1996) 75-88.
[7] K. Alnaljan and J.M. Kerridge, "A first step towards implementing an object interface to a parallel shared-nothing database machine", submitted for publication.
[8] D. Walter, "Parallel database machine design overview", Design Document IRISS/DD1, version 3, NTSC, University of Sheffield, 1994.
[9] Microsoft, ODBC 2.0 Programmer's Reference and SDK Guide (Microsoft Press, 1994).
[10] C. Bates, I. Jelly and J.M. Kerridge, "Modelling test data for performance evaluation of large parallel database machines", Distributed and Parallel Databases 4 (1996) 5-23.
Robert Kukla is a Research Fellow at the Department of Computer Studies at Napier University. He received his degree in Electronic Engineering from the Humboldt University Berlin in 1992 and his M.Sc. in Computer Science from Sheffield University in 1993. Since 1993 he has been working on various database and transputer related projects. His research interests include parallel processing, data management and client-server applications.

Jon Kerridge is Professor in Computer Studies at Napier University in Edinburgh, a post to which he was appointed in January 1996. Prior to that he had been at the University of Sheffield, where he was a senior lecturer in Computer Systems. During the period 1988-1995 he was the Director of the National Transputer Support Centre and was responsible for a number of parallel database machine projects funded by the UK Government and also by the European Union. His research interests lie in the application of fine-grain parallel processing techniques to data management and road traffic modelling. He is a past member of the ISO committee that standardised SQL and SQL2.