Microprocessing and Microprogramming 41 (1995) 315-330

Supporting user mobility in the Distributed Logical Machine System

Jim-Min Lin a,*, Shang Rong Tsai b

a Department of Information Engineering, Feng Chia University, Taichung City 40724, Taiwan
b Computer System Laboratory, Department of Electrical Engineering, National Cheng Kung University, Tainan City 70101, Taiwan

Received 30 August 1994; revised 13 February 1995; accepted 7 April 1995
Abstract
User or program mobility in distributed computing systems is becoming increasingly significant in the modern community, since users may change their working locations frequently. Job migration is supplementary to remote login in the support of user mobility. However, the migration facility is not yet a common feature in distributed systems, mainly due to the inherent complexity of implementing such a facility. This paper proposes a logical machine migration mechanism that can effectively support software environment migration. The basic idea behind logical machine migration is to migrate a logical machine, including the running processes and their execution environment, by a single mechanism. Thus most of the migration difficulties due to dependency on the operating system kernel are eliminated. We have realized an experimental system, called DLMS386, which successfully demonstrates this idea.

Keywords: Distributed operating systems; Job migration; Distributed logical machine systems; Logical machine migration
1. Introduction

User or program mobility in distributed computing systems is becoming increasingly significant in the modern community, since users may change their working locations frequently.
* Corresponding author. Email: [email protected]
Remote login and job migration are two approaches to supporting user mobility. By using remote login facilities one may connect to a remote host (using a simple terminal emulation program, for example VTERM, KERMIT, or the rlogin command) wherever he is. This is the simplest and most popular approach to supporting user mobility. As multimedia networking applications become more and more popular, network traffic will unavoidably become much heavier. Thus the response of remote jobs will not always be satisfactory to users.
Furthermore, remote terminal emulation programs cannot be used for window-based applications if there is no support for special terminals, say X-terminals, at the remote site.

Job migration is supplementary to remote login in the support of user mobility. With job migration, users can change the location of an executing job; remote login merely provides a way to connect to a remote host. With the mobility of an executing job, users gain more benefits. For instance, when a host is to be shut down, the jobs executing on it would normally have to be terminated immediately. If we can migrate all the executing jobs before the host is shut down, the jobs can be continued on other hosts without disconnected operation or loss of computing power. This is especially significant for long-running jobs. Consider a scenario that illustrates such a situation: a user starts a long job on a computer at his home. For some reason, he must go to his office immediately. He issues a command to migrate all the jobs to the host at his office, goes to his office, and then continues his jobs there. In addition to supporting user mobility, job migration is also an important mechanism for balancing system load in a distributed system.

Although job migration is a desirable function, it is not yet a common feature. The term job migration actually means the migration of executing software. As shown in Fig. 1, software migration may take place at three different levels, namely at the process interface, the real machine interface, or the system level interface. One commonly used software migration scheme is to migrate a process [1-5]. However, implementing a totally transparent and efficient process migration facility is not trivial. Therefore, to date process migration is not widely implemented in distributed systems. The difficulties in designing process migration are mainly due to the statefulness of most operating systems. The kernel has to keep track of many process states.
(A) Migration of a process on the OS. (B) Migration of a software on the real machine. (C) Migration of an LM on the LMM.
Fig. 1. Migrating software on different levels.
In many systems, the process states are distributed among a number of system tables. This makes it hard to extract information from the source host and create corresponding entries on the destination host. For example, in a UNIX-like system, the states of an open file are distributed over three tables: a process file table, a system open file table, and an i-node table. When a process is migrated, rebuilding these pointers is necessary and needs careful handling, as the sketch below illustrates.
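To make the pointer chain concrete, here is a minimal C sketch with hypothetical names (not any actual kernel's declarations) of the three-table open-file state:

    /* Hypothetical sketch of the three-table open-file state in a
       UNIX-like kernel; names are illustrative only. */
    #define NOFILE 20

    struct inode { short dev; unsigned inum; int ref_count; };       /* i-node table entry     */
    struct file  { struct inode *ip; long offset; int ref_count; };  /* system open file table */
    struct proc  { struct file *ofile[NOFILE]; };                    /* per-process file table */

    /* A migrated process's ofile[] slots point into the *source*
       kernel's tables; the destination kernel must find or re-create
       matching file and i-node entries, then patch every pointer. */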
Another approach to migrating jobs would be to migrate the whole software running on a real machine interface. A Virtual Machine (VM) [6,7] is such a system, running on a real machine interface supported by a control program called a Virtual Machine Monitor (VMM). Since the software directly controls all the physical devices, its states are highly hardware-related. The difficulties in migrating whole real machine code are due to its dependency on the real machine. The states of the whole software on a real machine may include the in-memory image, CPU register context, hardware control status, and I/O device states. The first two can simply be detached and restored. On the other hand, the hardware control status and I/O device states cannot always be extracted or restored on another machine. The difficulty is due to the incompatibility of hardware devices between the source machine and the destination machine. For example, different hard disk types, different hard disk controller types (AT-bus, ESDI, or SCSI), or different screen monitor types (Monochrome, EGA, or VGA) can make the migration difficult. Even different settings of the I/O configuration parameters (for example the I/O port address of a controller) can make the migration fail. If there is no special mechanism supporting device simulation or I/O request redirection, the migrated software may not be able to continue, or its execution may be incorrect. Thus, migrating whole software from one real machine to another is almost impossible in practice.

To alleviate the difficulties encountered in process migration and real-machine-level software migration, we propose to migrate a Logical Machine (LM) [8-11] in a Distributed Logical Machine System (DLMS) [12,13].
/’
//
Logical
/,l
I
/
‘-
/
Machine Real
Machine
Monitor
7 : : :-
Lagzcal MachLne Monttor Baszc Machtne
lntG?rfm
Fig. 2. Overall structure of the Logical Machine System.
Fig. 3. Overall structure of the Distributed Logical Machine System (logical machines on multiple real machines connected by a LAN).
A logical machine is a highly machine-independent abstract machine running under the control of a control program called the Logical Machine Monitor (LMM) (see Fig. 2). As we apply the concept of the logical machine to a distributed environment, more potential benefits can be achieved [14,15]. Instead of migrating an individual process or whole real machine software, LM migration migrates a whole logical machine that runs on a system level interface. This system interface has the advantage of being highly hardware independent. All devices in a LM are virtual, and only a few LM states are kept in an underlying control program, the Distributed Logical Machine Monitor (DLMM). Thus, a LM is a system object supported by the DLMM and is highly machine independent.

LM migration is basically supported by the DLMS. The core of the DLMS (see Fig. 3) is the DLMM, which controls the system resources and provides the execution environments for one or multiple logical machines. Within a logical machine one can execute a logical machine operating system (LMOS) and a number of user processes. The LMOS provides services and execution environments to user processes. The DLMM manages LMs rather than individual processes.
It supports logical machine migration rather than individual process migration. To tolerate a machine crash, we need either to replicate a machine using a standby unit or to checkpoint and recover the machine state. The LM migration mechanism can also be used as an effective means to checkpoint the machine state for fault tolerance. The basic idea is to migrate a logical machine, including the running processes and their execution environment, by a single mechanism. Most of the migration difficulties due to dependency on the operating system kernel are thus eliminated. Compared to conventional process migration, the most significant advantage of our approach is that it simplifies the migration of cooperative processes running on the same logical machine. In the design of a mechanism to migrate a concurrent program, one should carefully consider the correctness of process interaction after migration. Migrating a whole machine rather than individual processes eliminates many of the problems we will discuss later.

The rest of the paper is organized as follows: We first discuss the design considerations for the LM migration mechanism in Section 2. The implementation details of the LM migration mechanism are described in Section 3. Section 4 shows some performance measurements of the experimental system, the DLMS386. Finally, Section 5 summarizes and concludes the paper.
2. Design considerations

In the design of LM migration, one should consider both the migration mechanism and the migration policy. The migration mechanism is concerned with 'what to migrate' and 'how to migrate', whereas the migration policy is concerned with 'where to migrate', 'when to migrate', and 'which LM to migrate'. The LM migration policy is not the subject of this article; we concentrate on how the LM migration mechanism is implemented.
2.1. Design goal

The system was designed to support user mobility, permitting a user to carry his execution environment (the logical machine) to the target host. In addition to user mobility, the LM migration mechanism can also be used for checkpointing a logical machine state for fault tolerance. Instead of migrating a logical machine to another host's main memory, we can use a similar mechanism to extract the whole machine state and save it in the memory of a remote host. On detection of a node crash, the logical machine state can be restored and continued.

2.2. Design considerations

In the design of the LM migration mechanism, four important factors are considered, namely design complexity, residual dependency, transparency, and performance. We particularly emphasize the first three factors in our experimental system. We also compare conventional process migration and the proposed LM migration on the basis of the four factors listed above.

2.2.1. Design complexity

One of the important advantages of LM migration, as compared with traditional single process migration, is its design simplicity. As described in Section 1, the difficulties of process migration are mainly due to the statefulness of most operating systems. In a logical machine system, all the internal logic of a LMOS, such as file processing, interprocess communication within the same LMOS, relative timing management, and so on, is integrated within the individual LMOS context and is totally transparent to the DLMM. Thus the DLMM can ignore all the internal logic of a LMOS. From the viewpoint of the DLMM, the logical machine is just a simple system object that contains a logical space, a virtual processor and several logical devices.
This makes LM migration much easier.

In a conventional operating system, an open file may include control information, such as file references and the access position, in main memory. This control information is used for referring to the file and controlling accesses to it. In addition, there might also be some cache buffers containing the data of this file. In the case of process migration, one should carefully collect, transfer, and rebuild the control information separately for each open file. If the file server supports a client-side cache, the dirty cache of the migrated file should be flushed to preserve cache coherence. Flushing back the dirty cache buffers may increase system overhead, since many commonly used cache buffers must be written to disk before migration and re-cached from disk into memory after migration. As a result, these cache buffers may be moved twice between the client node and the file server node. Thus a lot of system overhead would be incurred by migration as the size of the cache buffers increases. (Unfortunately, it is not uncommon for the cache of an open file to be larger than the process itself.) In the case of LM migration, by contrast, the file states, including the cache, are transferred intact and atomically when the whole LM address space is transferred. There is no problem of rebuilding pointers to keep file states intact, or of flushing and re-caching the buffers.

The interactions between processes might be implemented with various forms of interprocess communication (IPC), such as messages, signals, and shared memory. In process migration, IPC incurs other critical system design problems. For example: (A) to ensure the correct handling of message/signal deliveries, some complicated algorithm must be applied to reconstruct the migrated process's communication links; (B) if several processes communicate through shared memory, it is difficult to migrate the cooperative processes unless the operating system supports distributed shared memory.
Alternatively, if we apply the LM migration approach, all the relevant states of the cooperative processes within the same host are migrated as a single object. It is not necessary to consider the reconstruction of the communication links between processes or between processes and the operating system. Additionally, the complicated problems of handling pending messages/signals do not exist in LM migration if the messages/signals are transported within the same machine. Similarly, migration of cooperative processes that communicate through shared memory does not incur problems in LM migration, since the sharing processes are moved together and kept intact with their shared memory during migration.

In some cases, restoring the whole picture of a video display is helpful for users to recall and recover their interrupted jobs. For example, in a control or monitoring system, a screen attached to a host is used to display system output that may change at any time. When the host crashes, we can try to restore the control program on another host and restore the output on another screen. In LM migration, the video frame buffer associated with each LM is treated as LMOS level shared memory, and the movement of the video frame buffer is handled in a similar way to the ordinary memory within a LM. Thus the screen output can be restored at the new site.
2.2.2. Residual dependency

Residual dependency is defined in [3] as 'an ongoing need for a host to maintain data structures or provide functionality for a process even after the process migrates away from the host'. Residual dependencies are undesirable for three reasons: reliability, performance, and complexity. Although residual dependencies have many disadvantages, it is difficult to eliminate them all in process migration.
Examples are the use of a device installed in a specific host, the clock reference if the clocks are not synchronized between hosts, and so on. The greatest drawback of residual dependencies is that they seriously damage the reliability of the migrated process. This is a particularly important issue in fault tolerant systems. Therefore, residual dependency should be kept as low as possible.

In LM migration, all of the residual dependencies can be eliminated except the access to some hardware devices on the original node. The reasons are: (a) In DLMS, the DLMM does not recognize process semantics, and the process states are kept entirely in the LMOS. The whole LMOS is moved with the processes to the destination host, and the user processes get their services entirely from the same LMOS. Therefore, residual dependency at the process level does not exist. (b) Only a few LM states (in our experimental system, only hundreds of bytes) are kept in the DLMM. These states can be completely and simply extracted from the source host and restored on the destination host. (c) Since the DLMMs on each host are identical, some global data structures are replicated in every LMM. These data structures include the LM location table and the global logical device mapping tables. Therefore a LMOS can get the necessary system services from the local LMM in the destination node. (d) There is a logical clock associated with each LM to support the related timing services, for example, to get the time of day or to schedule a time event. A logical clock maintains the time value of a LM from its startup. It is used as a timing reference on a particular logical machine. If a LM is migrated, it should keep using the same logical clock as its timing reference. For example, suppose a process issues a request to do a job at a particular time instant; if this process is migrated before the time event occurs, the process should still be notified when the event occurs, using the same timing reference. In the DLMS system the logical clock is part of the LM state and is transferred to the destination host. We have proposed a method [13] to implement the logical clock in the DLMS system; a minimal sketch follows.
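As an illustration only, here is a hypothetical C sketch (not the DLMS386 source) of a per-LM logical clock that migrates with the rest of the LM state:

    /* Hypothetical sketch: a per-LM logical clock maintained by the DLMM. */
    struct lm_clock {
        unsigned long ticks;    /* ticks elapsed since this LM started */
    };

    /* Called from the DLMM timer interrupt handler for the LM that is
       currently scheduled; the clock thus counts the LM's own lifetime,
       independently of which host it happens to run on. */
    void lm_clock_tick(struct lm_clock *c)
    {
        c->ticks++;
    }

    /* During migration the structure is copied verbatim with the other
       LM states, so pending time events keep their timing reference. */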
Therefore, for the reasons listed above, LM level residual dependency does not exist. This is an important feature for designing a fault-tolerant system. In our implementation, the migrated LM can continue its execution as usual, even for interactive types of job, say editing a program or document, after its original host is powered off (crashed). Therefore, we can say there is no residual dependency in LM migration except for the access to some specific devices connected to the original node.
2.2.3. Transparency

Regarding transparency, three conditions should be considered. First, no extra code needs to be added inside any LMs or processes to handle additional exceptions caused by LM migration; that is, LM migration should be transparent to both process level and LM level code. Furthermore, the behavior of a LM and the associated processes should not be affected by migration. Finally, the appearance of a LM and the associated processes on this LM to the rest of the system should not be affected by migration.

Transparency can be classified into two levels, namely process level transparency and LMOS level transparency. All the necessary run-time support for a process is provided by the LMOS. Since the kernel (LMOS) of a migrant LM is moved together with the associated processes, the execution environment of the processes does not change after migration. User processes are unaware of any difference due to migration. Thus all three conditions described above are met for process level transparency.

We analyze the LMOS level transparency in three aspects: (a) The interface between the LMOS and the DLMM consists of the DLMM primitives. The DLMM primitives are primarily used by LMOSs for accessing logical resources. One large set of primitives is for I/O device access. Another would be for real memory allocation and deallocation.
No memory allocation and deallocation operations are required by the LMOSs if the DLMM pre-allocates a memory space (of fixed size) for each LM. (In our DLMS386 system, each LM is allocated a 1MB memory space for a MINIX logical machine.) Consequently, the DLMM primitives used are for I/O device access only. The DLMM primitives for device access can be designed to be access transparent. That is, the primitives issued by a LMOS generate the same results independently of the location of the LMOS. (b) The names of the logical resources connected to the real machines collectively form a global name space. Every logical disk is assigned a unique and global name. Thus, a LM can access its logical devices uniformly throughout the system. (c) The DLMM imposes protection on LMs. The interactions between LMs are done via the network transparent ILMC (Inter-Logical Machine Communication) supported by the DLMM. Thus, the interface between LMs remains unchanged, and the migration of a LM does not affect the integrity of other LMs.

With the support of transparent LM migration in the DLMM, a LMOS executing on it inherits the migration benefits without any effort of its own. LM migration is just a simple mechanism in the DLMS, as described in the section discussing design complexity. To support transparent process migration, some distributed services are desirable, such as a global name server for process identification, a network transparent global file system, network transparent IPC and device access, and proper cooperation between kernels. These supports are the characteristics of a distributed operating system. Thus, a great deal of effort is needed to support transparent process migration on a traditional centralized operating system. By contrast, it is much easier to support transparent migration in the DLMS using the LM migration mechanism. The centralized operating systems executed on it enjoy the benefits of migration.
2.2.4. Performance

In comparison with process migration, the extra overhead of LM migration is mainly caused by the transfer of the LMOS itself. In our experience, it takes about 792 ms to migrate a bare MINIX V1.3 LMOS of 272KB. Although this might be a significant migration overhead, we consider it acceptable when multiple processes or a large process are to be migrated. One of the objectives of our system is to support user mobility rather than to balance system load. Thus the extra overhead of migrating the LMOS is not significant.
3. Implementation of logical machine migration

In this section, our experiences in implementing the LM migration mechanism are described. Currently, we have implemented both 'push' and 'pull' types of migration. In 'push' type migration, the source DLMM issues a migration request and dominates the progress of the migration, whereas in 'pull' type migration, the destination DLMM issues and dominates a migration. The mechanisms used in the two schemes are very similar. In this section we only show the mechanism used in 'push' migration to illustrate LM migration.

3.1. Basic DLMM supports for LM migration

In the implementation of LM migration, some basic DLMM supports are helpful:

3.1.1. Global logical disk support

The support of global logical devices, such as logical disks, is a desirable feature to facilitate LM migration. We have successfully implemented logical disks with a replication capability, supported by global logical disk management. The DLMM is responsible for logical disk replication.
Fig. 4. Two-level Logical Disk Mapping (Level-1: local-to-global mapping; Level-2: logical-to-physical mapping).
Fig. 5 shows the three mapping tables: (A) the Local Logical Disk (LLD) table, one for each LM, mapping the LM's local disk names (e.g. hd1) to global logical disk indices and access modes (READ/WRITE or READ); (B) the Global Logical Disk (GLD) table, one for the whole system and replicated in every LMM, recording for each GLD index the available flag, lock flag, disk size, and the (real_machine_ID, LPD ID) pairs it maps to; (C) the Local Physical Disk (LPD) table, one for each LMM, recording each physical disk address (SC#: starting cylinder number; SPC_size: sectors per cylinder) and file system type (MINIX).
Fig. 5. Logical Disk Mapping tables.
Information regarding disk mapping is kept in a system file 'CONFIG.LMM', which is consulted when a LM is created. Mapping a logical disk to the corresponding physical disk(s) is done with a two-level mapping scheme (see Fig. 4). The first level maps the name of a local logical disk within a LM to the corresponding global logical disk identifier. A global logical disk identifier is represented by an index number into the global logical disk table. While a local logical disk is mapped to exactly one global logical disk, a global logical disk can be mapped from more than one local logical disk. Thus a global logical disk can be shared among LMs (within the same operating system domain). The second level maps this global logical disk to the physical disk(s) at the corresponding site(s). The structures of the local logical disk table, the global logical disk table, and the physical disk table are shown in Fig. 5. For the sake of independence, each logical disk is given a globally unique name. The names of the logical disks collectively form a global and uniform name space. We implemented the disk name mapping as a two-level scheme to simplify logical disk management and speed up the look-up of mapping information. The global logical disk table is configured identically in every node of the system. Owing to the two-level disk name mapping scheme, transferring the states of a local logical disk during LM migration is simple. When a LM is migrated, the local logical disk table of this LM is moved to the destination node. Thereafter, the LM can access its logical disks as if it were still in the source node. All these tasks are done by the DLMM and are totally transparent to LMOSs.
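A minimal C sketch of the two-level look-up, assuming hypothetical table layouts in the spirit of Figs. 4 and 5 (the actual DLMS386 structures may differ):

    #include <string.h>

    struct lld_entry { char name[8]; int gld_id; };      /* level 1: per-LM table     */
    struct gld_entry { int lpd_id; int node; };          /* level 2: replicated table */
    struct lpd_entry { int start_cyl; int spc_size; };   /* per-node physical disks   */

    /* Resolve a LM-local disk name (e.g. "hd1") to its physical disk
       entry; the owning node is returned through *node_out. */
    const struct lpd_entry *resolve_disk(const struct lld_entry *lld, int nlld,
                                         const struct gld_entry *gld,
                                         const struct lpd_entry *lpd,
                                         const char *name, int *node_out)
    {
        int i;
        for (i = 0; i < nlld; i++)
            if (strcmp(lld[i].name, name) == 0) {        /* level-1 mapping */
                const struct gld_entry *g = &gld[lld[i].gld_id];
                *node_out = g->node;                     /* level-2 mapping */
                return &lpd[g->lpd_id];
            }
        return 0;   /* no such local logical disk */
    }

Because only the small level-1 table is private to a LM, moving a LM means moving just that table; the replicated level-2 table already exists at the destination.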
3.1.2. Network transparent Inter-LM Communication (ILMC) support

In the DLMS, multiple LMs can coexist on a node. In our design, we regard each LM as an independent machine, so no shared memory is provided among different LMs, even if they are located on the same node. To provide communication functions to the LMs we implemented the ILMC facility. The ILMC is basically a message-passing mechanism supported by the DLMM. The DLMM provides a virtual Ethernet interface [20] for the communication between LMs. A LM communicates with another LM in a location-independent way. In the current implementation, the ILMC is supported on the Ethernet. The virtual Ethernet interface is necessary to emulate the functions of the controller that the MINIX system (the LMOS that runs under the DLMM) supports. A detailed description of the implementation of the ILMC can be found in [20].

3.1.3. I/O access transparency

In our system model, all I/O should be performed through DLMM primitive calls. Upon receiving the I/O requests, all the DLMMs will cooperate (if necessary) and complete these requests. Thus, I/O accesses in DLMS are location transparent. We have implemented the DLMM primitives for disk I/O accesses. Some character devices, such as the keyboard and printer, are directly mapped onto the corresponding local physical devices. Currently, only text mode video display is supported by our logical video device. A 4KB and an 8KB memory space are used as video RAM buffers for Monochrome and VGA devices, respectively. In its LM migration support, the DLMS386 that we have implemented supports I/O access transparency for disks but not for screen-type terminals. The migrated logical machine will use the terminal at the new site. Thus if the new terminal is not compatible with that at the source site, the users will notice the difference. In practice, terminal emulation is possible but usually not efficient. We leave this as future research.
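As a sketch of how such a location-transparent disk primitive could be organized (the helper names are hypothetical, not the actual DLMM interface):

    /* Hypothetical helpers; the real DLMM internals are not shown here. */
    extern int gld_of(int lm_id, const char *local_disk);   /* level-1 mapping */
    extern int home_node_of(int gld);                       /* level-2 mapping */
    extern int this_node(void);
    extern int local_disk_read(int gld, long block, void *buf);
    extern int remote_disk_read(int node, int gld, long block, void *buf);

    /* The LMOS names only its own local logical disk; whether the blocks
       are served locally or fetched from a remote DLMM is invisible to it,
       so the primitive returns the same result wherever the LM runs. */
    int dlmm_disk_read(int lm_id, const char *local_disk, long block, void *buf)
    {
        int gld  = gld_of(lm_id, local_disk);
        int node = home_node_of(gld);
        if (node == this_node())
            return local_disk_read(gld, block, buf);
        return remote_disk_read(node, gld, block, buf);
    }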
3.2. LM states

The DLMS386 is built on an 80386-based architecture where the DLMM runs in the 80386 protected mode, whereas a LM is executed as a Virtual-8086 (V86) task. The reasons why a LMOS is not built as an 80386 task in the 80386 protected mode are: (1) The target LMOS we chose is MINIX V1.3, which is designed to run on an 8086 architecture. (2) Building a LM as a V86 task makes LM migration more convenient to implement than an 80386 task would, since fewer memory management states need to be maintained. For instance, the GDT (Global Descriptor Table), LDTs (Local Descriptor Tables), and IDT (Interrupt Descriptor Table) are 80386 system tables that are not used by a V86 task. In addition, the relocation of a GDT entry for a migrated 80386 task would be another problem in the design of migration. The states of a LM simply include the TSS (Task State Segment) (see Fig. 6), the local logical disk table, the LM memory space, and the associated video frame buffer.
Fig. 6. Logical Machine task state segment (including the page table pointer and the I/O permission map).
3.2.1. TSS

The TSS is a collection of data structures representing a snapshot of the states of a virtual CPU in an 80386 system. Each task in the 80386 system is associated with a TSS structure. The TSS structure of a V86 task contains two parts: a hardware-related part of fixed size (104 bytes), which is specified by the 80386 processor and holds the internal registers of the corresponding virtual 8086 processor; and a task-related part of variable size (408 bytes in the DLMS386 system), which is specified by the operating system and holds, for example, the execution priority and file descriptors.

3.2.2. Local logical disk table

This table records the mapping of local logical disks to the corresponding global logical disk identifiers.
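Taken together with the memory space, video frame buffer, and ILMC buffers described in the following subsections, the migratable state of a LM could be pictured as in this hypothetical C sketch (the field names and LLD table size are invented; the byte sizes are those quoted for the DLMS386 prototype):

    /* Hypothetical sketch of a LM's complete migratable state. */
    struct lld_entry { char name[8]; int gld_id; };

    struct lm_state {
        unsigned char    tss_hw[104];     /* 80386-defined part of the TSS       */
        unsigned char    tss_task[408];   /* DLMM-defined part (priority etc.)   */
        struct lld_entry lld[4];          /* local logical disk table (3.2.2)    */
        unsigned char   *mem;             /* 1MB space on a 1MB boundary (3.2.3) */
        unsigned char    video[4096];     /* 4KB text-mode frame buffer (3.2.4)  */
        unsigned char    ilmc_send[4096]; /* pending ILMC send buffer (3.2.5)    */
        unsigned char    ilmc_recv[8192]; /* pending ILMC receive buffer (3.2.5) */
    };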
3.2.3. LM memory space

The DLMM supports the execution environment of LMs, including their logical spaces. In an early experimental logical machine system, LM286 [8,9], the logical machine monitor supported a dynamic memory allocation service for the LMs. With this support, a LMOS can request its storage on demand; the storage that a LM acquires is its logical space. In the DLMS386 system, the target system (MINIX) executes within a 1MB space, which is quite small, so in our prototype system the DLMM allocates a 1MB space for a LM when it is created. The dynamic memory allocation service could also be supported in our system. The memory space of each LM is aligned at a fixed boundary, such as 1M, 2M, 3M, and so on (see Fig. 7). This facilitates the collection and restoration of a LM's memory space.

Fig. 7. Memory allocation in the DLMS386.

3.2.4. Video frame buffer

A memory space is allocated as the logical terminal output (video frame buffer) of a LM when the LM is created. All video output of a LM is redirected into the associated video frame buffer. The video frame buffer is displayed on the video screen when this LM is scheduled and executed as the foreground machine. By transferring the LM video frame buffer as a LM is migrated, we can migrate interactive jobs gracefully. People can continue their jobs on a new machine with exactly the same screen image. In our system, a 4KB video frame buffer is allocated to each LM. Currently, only text mode video is supported by the DLMM.
3.2.5. ILMC message buffer

During the migration of a logical machine, there may be ILMC messages being sent to it by another logical machine, so the DLMM must handle these pending messages. The pending ILMC messages are kept temporarily in specific system queues in the DLMM space. In DLMS386, each LM is associated with a 12KB buffer (a 4KB send buffer and an 8KB receive buffer) for storing pending ILMC messages.
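A minimal sketch of holding one pending message, assuming a simple byte-buffer layout (the actual DLMM queueing is not published in this detail):

    #include <string.h>

    /* Hypothetical sketch: buffering an ILMC message that arrives for a
       LM that is frozen for migration. */
    struct ilmc_buf { unsigned char data[8192]; int used; };

    int ilmc_hold(struct ilmc_buf *b, const void *msg, int len)
    {
        if (len < 0 || b->used + len > (int)sizeof b->data)
            return -1;          /* full: drop; the sender resends later */
        memcpy(b->data + b->used, msg, len);
        b->used += len;         /* the buffer migrates with the LM */
        return 0;
    }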
3.3. LM state transfer

The LM state transfer progresses in four steps (see Fig. 8):

(1) Requesting migration permission and transferring the TSS structure and local logical disk table.
When the DLMM is going to migrate a LM to a specific destination node, it should first request the agreement of the destination DLMM. If there are enough resources available in the destination node, i.e. memory space and free logical disks, the contract is made; otherwise the request is rejected. Since the size of the TSS structure plus the LM logical disk table does not exceed 1KB, this data is piggy-backed on the migration request message.

(2) Transferring the LM's memory space.

In transferring the LM's memory, the DLMM suspends the execution of the LM before it moves the memory space. Since the LM memory space resides in a contiguous area, the space transfer and restore are simple. Although the memory space of a LM may be up to 640KB, it is unnecessary to move all of it: only the pages with significant contents need to be moved. When we apply the LM migration mechanism to checkpointing a LM periodically, only the dirty pages need to be transferred.

(3) Transferring the logical machine video frame buffer.
The movement of the 4KB video frame buffer is the same as that of the LM's memory space.

(4) Transferring the ILMC message buffer.

As in the movement of the LM address space and the LM video frame buffer, the 12KB ILMC message buffer associated with the migrated LM is migrated to the destination node.
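The four steps on the source side can be summarized by the following C sketch of a 'push' migration; all message and helper names are hypothetical, not the DLMS386 source:

    #define GRANTED 1

    extern void freeze_lm(int lm);
    extern void unfreeze_lm(int lm);
    extern int  request_permission(int dst, int lm);  /* TSS + LLD table piggy-backed */
    extern void transfer_memory(int dst, int lm);     /* only pages in use            */
    extern void transfer_video(int dst, int lm);      /* 4KB frame buffer             */
    extern void transfer_ilmc(int dst, int lm);       /* 12KB pending-message buffer  */
    extern void send_migration_ok(int dst, int lm);
    extern void free_lm_slot(int lm);

    int lm_push_migrate(int lm, int dst)
    {
        freeze_lm(lm);                                /* suspend the LM's execution  */
        if (request_permission(dst, lm) != GRANTED) { /* step (1)                    */
            unfreeze_lm(lm);                          /* destination refused: resume */
            return -1;
        }
        transfer_memory(dst, lm);                     /* step (2)                    */
        transfer_video(dst, lm);                      /* step (3)                    */
        transfer_ilmc(dst, lm);                       /* step (4)                    */
        send_migration_ok(dst, lm);                   /* destination relinks devices */
        free_lm_slot(lm);                             /* no residual state remains   */
        return 0;
    }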
3.4. LM state restoration and ILMC message redirection
In restoring the LM state, all the destination DLMM must do is: (1) restore the LM memory space and the associated video frame buffer; (2) restore the LM logical disk table of the migrant LM; and (3) adjust some system related table entries and pointers, including the pointer to the corresponding PAGE DIRECTORY (i.e. the CR3 field in the TSS) of the LM, and the I/O MAP BASE in the TSS.
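A sketch of the destination side, with hypothetical helpers; the CR3 and I/O map base adjustments are the system-dependent part named above:

    /* Hypothetical helpers for the destination DLMM. */
    extern void restore_memory(int slot, const void *mem);  /* into slot's 1MB area */
    extern void restore_video(int slot, const void *video);
    extern void install_lld_table(int slot, const void *lld);
    extern unsigned long  page_directory_of(int slot);
    extern unsigned short iomap_base_of(int slot);
    extern void tss_set_cr3(int slot, unsigned long cr3);
    extern void tss_set_iomap_base(int slot, unsigned short base);

    void lm_restore(int slot, const void *mem, const void *video, const void *lld)
    {
        restore_memory(slot, mem);           /* (1) memory space             */
        restore_video(slot, video);          /* (1) video frame buffer       */
        install_lld_table(slot, lld);        /* (2) local logical disk table */
        /* (3) re-point system-dependent TSS fields at the new slot:         */
        tss_set_cr3(slot, page_directory_of(slot));
        tss_set_iomap_base(slot, iomap_base_of(slot));
    }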
Source node: Ts0: freeze the migrated LM, start LM migration. Ts1: send migration request to destination node. Ts2: collect the LM's LMM data and Logical Device configuration data; receive migration permission. Ts3: transfer the LM address space. Ts4: transfer the LM screen frame buffer; send migration_OK PKT to the destination node.
Destination node: TD1: receive LM migration request; check for a free LM slot. TD2: reply permission (or reject the request); restore the LM's LMM data into a free LM slot; restore the Logical Device configuration data. TD3: receive and restore the LM's address space PKT. TD4: receive and restore the LM's screen frame buffer PKT. TD5: receive migration_OK PKT; relink logical devices. TD6: start running the migrated LM.
Fig. 8. Logical Machine migration scenario.
ILMC message redirection is an important issue. To reconstruct the communication links of a migrated process, two typical schemes, address change handshaking and message forwarding, may be applied. The former changes the process location record in the communication peers so that messages are sent to the new host after a process is migrated. The latter does not change the process location information in the communication peers: the original host receives messages for the migrated process as usual and forwards them to the new host. Approaches that use the message forwarding technique require a 'home machine' as termed in Sprite [3], or an 'original site' as termed in LOCUS [15]. Changing process location records is not a simple job, especially in a large scale system. Logically, it must be done atomically; otherwise chaos in message delivery may occur. If the underlying communication system does not support multicast, it is necessary to use unicast to inform all the hosts of the changed location of a migrated process. The scheme using message forwarding does not need to concern itself with this problem. However, it may leave residual dependencies on the original host. It also incurs considerable overhead in forwarding messages, especially when a process is migrated from node to node many times. Considering our intention to support fault tolerance, message forwarding is evidently unacceptable. Therefore, the address changing approach is used in our implementation. In a LAN we can use multicasting to notify all hosts of the address changes.

Process identifiers are another issue. If the system does not provide global process identifiers, an address conflict may occur because the process identifier of the migrated process may conflict with one that already exists at the new site. In the DLMS386, the DLMM supports globally unique logical machine identifiers, which are mapped to physical machine identifiers dynamically. There is a LM_location_table kept in every DLMM to keep track of the physical locations of all the LMs. When a LM is migrated, the associated entry in the table is changed. In our implementation, ILMC message redirection is done by broadcasting a LM_MOVEMENT message to every node in the system. On receiving the LM_MOVEMENT message, the new LM-to-host mapping in the LM_location_table is updated by every DLMM. Meanwhile, the ILMC messages sent to the migrated LM will be lost. These ILMC messages will be resent using the new destination address after migration is completed. This strategy is similar to the 'message loss recovery' strategy used in the V-system [5,21].
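A sketch of the receiving side of the broadcast (the message layout and table are hypothetical):

    #define MAX_LMS 64

    /* Hypothetical sketch: the replicated location table and the handler
       run by every DLMM on receipt of a broadcast LM_MOVEMENT message. */
    struct lm_movement { int lm_id; int new_node; };

    static int lm_location_table[MAX_LMS];    /* LM id -> physical node */

    void on_lm_movement(const struct lm_movement *m)
    {
        lm_location_table[m->lm_id] = m->new_node;
        /* ILMC messages sent during the update window are lost; senders
           simply resend them to the new address ('message loss recovery'). */
    }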
4. Performance

We performed some preliminary measurements to assess the performance of the DLMS386 system. We measured the basic migration cost, which includes no ILMC and no remote disk access. The testing environment for LM migration is made up of three identical 386-AT personal computers connected by a 10 Mb/s Ethernet. The CPU in each host is an 80386-20 processor. Each desired operation was measured by looping it a large number of times and reading the elapsed time from the system timer. According to the measurements (see Fig. 9), we model the average elapsed time spent in migration by the following formula:

LM migration time = 2.77 + 777.92 + 2.86S + 11.5 = 792.19 + 2.86S (ms)
Step 1 (a. requesting migration permission, b. transferring the TSS structure, c. transferring the LLDT table; 550 bytes, 65536 (10000H) loops, 181.6 s elapsed): average elapsed time 2.77 ms.
Step 2 (transferring the LM's memory space; 100 loops): average elapsed time 2.86 ms per KB.
Step 3 (transferring the LM's 4KB video frame buffer; 1000 loops, 11.5 s elapsed): average elapsed time 11.5 ms.
Fig. 9. LM migration performance measurements.
where 2.77 ms is the elapsed time for collecting and transferring a LM's TSS and local logical disk table; 777.92 ms is the overhead for transferring the basic execution environment of MINIX, including the MINIX operating system core image, I/O tasks, system server processes, and shell process (272KB in total); 2.86 ms is the average elapsed time for transferring 1KB of memory space; S denotes the total size (in KB) of the user part memory space of the user processes in the migrated LM; and 11.5 ms is the elapsed time spent in transferring the 4KB video frame buffer of a LM. This formula shows that the LM migration cost depends mainly on the total size of the user part memory space of the user processes in the migrated LM. We have also implemented a process migration facility for the Distributed MINIX system [16,17]. We compare the performance of the migration facilities of the two systems in the Appendix.
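As a back-of-the-envelope aid, the cost model can be written as a one-line C helper (constants as measured above):

    /* Measured DLMS386 cost model; S_kb is the user-part memory size in KB. */
    double lm_migration_ms(double S_kb)
    {
        return 2.77 + 777.92 + 2.86 * S_kb + 11.5;   /* = 792.19 + 2.86*S_kb */
    }
    /* Example: S = 100KB gives 792.19 + 286 = 1078.19 ms. */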
5. Conclusions

This paper proposes LM migration as an approach supplementary to conventional process migration. LM migration focuses on user mobility and fault tolerance. The basic idea behind LM migration is to migrate processes together with their execution environment as a whole, eliminating most of the difficulties associated with process migration, for example, the transfer of open file states, the reconstruction of communication links, and shared memory among concurrent processes. LM migration has several advantages, such as supporting mobility of a complete user environment; simplifying the migration of concurrent processes within a host; reducing residual dependencies through the uniform and simple system interface; and supporting totally transparent migration for both process level and LM level code. We have demonstrated the feasibility of LM migration.
The MINIX system, which is a centralized system, can be ported to execute within a logical machine environment and profit from the benefits of job migration. LM migration would be a useful approach for migrating systems that contain several cooperative long-running processes. In addition, the LM migration mechanism can also be used to checkpoint a whole LM for fault tolerance.
Acknowledgements

We would like to thank the referees for their valuable comments. We also very much appreciate referee A's patience in correcting errors in this paper.
Appendix

We assess the performance of process migration in Distributed MINIX (DMINIX) [16,17], a MINIX extension developed in our laboratory. DMINIX is a process-based, UNIX-like distributed operating system. The process migration facility in DMINIX is used to increase system throughput. DMINIX enhances the traditional MINIX system with a distributed file system and a process migration facility. The DMINIX file system provides a global file naming tree. Thus, a process can be moved during its execution and continue on another processor with continuous access to all its resources. However, the migrated process must be a sequential process. In the DMINIX system, several 80386-based microcomputers are connected by an Ethernet, and DMINIX runs in 80386 real mode. We had two teams developing migration facilities, one for DMINIX and one for the DLMS386. We found that the migration facilities on the DLMS386 system were completed more rapidly. On the basis of the DLMM kernel and the distributed disk service of the DLMS386, only one man-month was required to complete the LM migration facility (with no ILMC support).
From the measurements, we model the performance of DMINIX process migration by the following formula [17]:

DMINIX process migration time = 53.6 + 12.1s (ms)

where 53.6 ms is the elapsed time spent in transferring a null process that contains no user image; 12.1 ms/KB is the average time for transferring 1KB of user image; and s is the size (in KB) of the user image. The average transfer time of DMINIX process migration is much higher than that of our system (about 3.2 times). The reason is the considerable overhead of moving message data between the user process space and the spaces of the cooperative MINIX server processes, such as MM, FS, the Amoeba task, and the kernel task, and of the context switches among these server processes during process migration. These overheads do not exist in the DLMS386 system, in which the migration is performed entirely by the lowest level monolithic control program, the DLMM. To compare the two migration schemes, we assume there is one process with an image size X (and S = s = X) to be migrated. From the two formulas, we find that the elapsed times of the two systems are the same when the process's image size is 79.8KB. For a process with a smaller image size, the process migration scheme has better migration performance; otherwise, LM migration is superior. Another comparison was made for the migration of multiple processes. In this case, the formula for DMINIX process migration should be modified as follows:

DMINIX process migration time (multiple process migration) = 53.6p + 12.1S (ms)
Fig. 10. Formula chart for LM migration and process migration in DMINIX (T: migration time; S: total size of user part memory space (KB); p: number of migrated processes).
where p is the number of migrated processes and S is the total size of the images of the migrated processes. From Fig. 10, we conclude that LM migration is particularly valuable for migrating processes with a large total user image size.
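The break-even point quoted above follows from equating the two cost models for a single process (p = 1, S = s = X):

    \begin{align*}
      53.6 + 12.1X &= 792.19 + 2.86X\\
      9.24X &= 738.59\\
      X &\approx 79.9\,\mathrm{KB}
    \end{align*}

which matches the 79.8KB figure quoted above up to rounding of the measured constants.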
References

[1] Y. Artsy and R. Finkel, Designing a process migration facility: the Charlotte experience, IEEE Comput. 22 (9) (1989) 47-56.
[2] F. Douglis and J. Ousterhout, Process migration in the Sprite operating system, Proc. 7th IEEE Int. Conf. on Distributed Computing Systems (1987) 18-24.
[3] F. Douglis and J. Ousterhout, Transparent process migration: Design alternatives and the Sprite implementation, Software-Practice and Experience 21 (8) (Aug. 1991) 757-784.
[4] J.M. Smith, A survey of process migration mechanisms, ACM Operating Syst. Rev. 22 (3) (July 1988) 28-40.
[5] M. Theimer, K. Lantz and D. Cheriton, Preemptable remote execution facilities for the V-system, Proc. 10th ACM Symp. on Operating Systems Principles (1985) 2-12.
[6] R.P. Goldberg, Architecture principles for virtual computer systems, Ph.D. Thesis, Division of Engineering and Applied Physics, Harvard University, Cambridge, Massachusetts, 1972.
[7] R.P. Goldberg, Architecture of virtual machines, Proc. AFIPS National Computing Conf. (1973) 309-318.
[8] S.R. Tsai, A logical machine system to support concurrent operating environments, Ph.D. Dissertation, National Cheng Kung University, Tainan, Taiwan, ROC, Nov. 1987.
[9] S.R. Tsai, L.M. Tseng and M.D. Beer, A microprocessor-based logical machine system, Microprocessing and Microprogramming 19 (1987) 375-384.
[10] S.R. Tsai, L.M. Tseng and C.N. Chen, On the architecture support for logical machine systems, Microprocessing and Microprogramming 22 (1988) 81-96.
[11] S.R. Tsai and L.J. Tsai, A logical machine monitor supporting an environment for development and execution of operating systems, J. Systems and Software 21 (1993) 27-39.
[12] S.R. Tsai and J.M. Lin, Distributed Logical Machine System, Proc. 8th IASTED Int. Symp. Applied Informatics (1990) 395-397.
[13] Jim-Min Lin, A distributed logical machine monitor supporting data replication and logical machine migration, Ph.D. Dissertation, Department of Electrical Engineering, National Cheng Kung University, Nov. 1992.
[14] J.M. Lin and S.R. Tsai, On the Logical Disk replication mechanism of the Distributed Logical Machine System, Proc. Nat. Computer Symp., Chung-Li, Taiwan, ROC (Dec. 1991) 306-311.
[15] G. Popek and B.J. Walker, The LOCUS Distributed System Architecture (MIT Press, 1985).
[16] S.R. Tsai, DMINIX - A Distributed MINIX Operating System, Proc. 8th IASTED Int. Symp. Applied Informatics (1990) 390-394.
[17] H.C. Tsai, Design and implementation of process migration in distributed MINIX system, M.S. Thesis, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, ROC, June 1991.
[18] J.M. Lin and S.R. Tsai, On the fault-tolerant support of the Distributed Logical Machine Migration System, IEEE Singapore ISITA '92 (Nov. 16-20, 1992).
[19] R. Zicari, Operating system support for software migration in a distributed system, Proc. 5th IEEE Symp. on Reliability in Distributed Software and Database Systems (1986) 178-187.
[20] J.M. Lin and S.R. Tsai, The implementation of a network transparent Logical Machine communication mechanism, Proc. 1992 Int. Computer Symp., Taichung, Taiwan, ROC (Dec. 1992) 569-575.
[21] D.R. Cheriton, The V Kernel: A software base for distributed systems, IEEE Software 1 (2) (April 1984) 19-42.
[22] Intel 80386 Programmer's Reference Manual, Intel Inc., 1986.
[23] DP8390/NS32490 network interface controller reference manual, National Semiconductor Corporation, 1987.

Jim-Min Lin is an associate professor at the Department of Information Engineering, Feng Chia University, Taiwan. He received his BS in engineering science in 1985, and his MS and PhD in electrical engineering in 1987 and 1992 respectively, all from National Cheng Kung University, Taiwan. His research interests include operating systems, distributed systems, and software integration. Jim-Min Lin is a member of IEEE.

Shang Rong Tsai is a professor at the Department of Electrical Engineering, National Cheng Kung University. He received his BS in Electrical Engineering from National Taiwan University in 1976 and his PhD in Electrical Engineering from National Cheng Kung University in 1987. His research interests include distributed systems, operating systems, and computer networking. He is a member of the IEEE Computer Society.