Future Generation Computer Systems 23 (2007) 116–122 www.elsevier.com/locate/fgcs
A global and parallel file system for grids

Félix García-Carballeira∗, Jesús Carretero, Alejandro Calderón, J. Daniel García, Luis M. Sanchez

Computer Architecture Group, Computer Science Department, Universidad Carlos III de Madrid, Leganes, Madrid, Spain

Received 20 December 2005; received in revised form 18 April 2006; accepted 7 June 2006. Available online 2 August 2006.
Abstract

Data management is one of the most important problems in grid environments. Most efforts in grid data management have focused on data replication, a practical and effective method for achieving efficient data access in grids. However, all data replication schemes fall short of providing a grid file system, and one important challenge facing grid computing is the design of such a system. The Global Grid Forum defines a Grid File System as a human-readable resource namespace for management of heterogeneous distributed data resources that can span multiple autonomous administrative domains. This paper describes a new Grid File System, following the Global Grid Forum recommendations, that integrates heterogeneous data storage resources in grids using standard grid technologies: GridFTP and the Resource Namespace Service, both defined by the Global Grid Forum. To obtain high performance, we apply the parallel I/O techniques used in traditional parallel file systems.

© 2006 Elsevier B.V. All rights reserved.
Keywords: Data grids; Parallel I/O; Data declustering; High performance I/O; GridFTP; RNS
1. Introduction

Currently there is great interest in the concept of grid computing. The term usually denotes a distributed computational infrastructure for advanced science and engineering [8]. A grid is composed of geographically sparse resources that join to form a virtual computer. The resources (computers, networks, storage devices, etc.) that make up the grid are heterogeneous and reside in different administrative domains. This kind of system differs from other distributed environments, such as clusters or local area networks, in several aspects: (1) The resources are located in several administrative domains. (2) The communication network used is the Internet, which allows a grid to be built from resources placed, for example, in Europe, America, or Asia. (3) The different resources of the grid have a high degree of heterogeneity and must be accessible from any other part of the grid.
∗ Corresponding author. Tel.: +34 916249060; fax: +34 916249129.
E-mail address: [email protected] (F. García-Carballeira). URL: http://arcos.inf.uc3m.es (F. García-Carballeira).
0167-739X/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2006.06.004
(4) The grid must be transparent to its users, who should not need to know where their programs are executed or where their data are stored.

Many grid applications require access to large amounts of data. For this kind of application, most data management efforts in grids have focused on data replication, a practical and effective method for achieving efficient data access. File replicas are multiple copies of a file spread across the grid, used to improve data access. However, all the data replication schemes in the literature fall short of providing a global file system for accessing files in a data grid environment. One important challenge facing grid computing is a true, global file system for grid applications running in different administrative domains. The Global Grid Forum [11] defines a Grid File System as a human-readable resource namespace for management of heterogeneous distributed data resources that can span multiple autonomous administrative domains, and that can include:
• A logical resource namespace across multiple administrative domains.
• Standard interfaces.
• A virtual namespace of a WAN file system.
• Independence of physical data access/transport and authentication mechanisms.
Furthermore, a grid file system must provide high performance data access. Parallel I/O has been the traditional technique for improving data access in clusters and multiprocessors. Parallelism in file systems is obtained by using several independent servers and striping data among these nodes to allow parallel access to files. There are many parallel file systems for these kinds of platforms; however, very few parallel file systems have been provided for grid environments. The main contribution of this paper is to describe a grid file system based on the Expand Parallel File System [12], designed by the authors. Expand is a parallel file system for clusters that uses standard servers and protocols to build parallel and distributed partitions over which files are distributed. We have extended this parallel file system to provide a parallel file system for grids. The new version of Expand for grids uses standard grid technologies: GridFTP for data access and the Resource Namespace Service for naming. The system allows the integration of existing heterogeneous servers that span multiple administrative domains, providing parallel I/O services through standard interfaces such as POSIX and MPI-IO. The new version has been implemented using Globus [10], one of the most important middlewares for building grid applications.

The rest of the paper is organized as follows: Section 2 presents related work. Section 3 describes the standard grid technologies used to implement the new version of Expand. Section 4 presents the main design aspects of Expand for grids. Section 5 shows some evaluation results. Finally, Section 6 presents conclusions and future work.

2. Related work

In the context of replication, most work has focused on data availability and high performance data access. File-level and dataset-level replication and replica management have been studied in several works [17,19,23,2,22].
The use of replication raises two main problems: it makes intensive use of resources, not only for storage but also for management, and it is not appropriate for applications that modify the same set of data, as must be supported in collaborative environments. A way to improve I/O performance is parallel I/O. The use of parallelism in file systems is based on the fact that a distributed and parallel system consists of several nodes with storage devices. Parallelism in file systems is obtained using several independent server nodes, each supporting one or more secondary storage devices. Data are striped among these nodes and devices to allow parallel access to different files, as well as parallel access to the same file. The use of parallel file systems and parallel I/O libraries has been studied on a great number of systems and platforms: Vesta [6], ParFiSys [5], Galley [14], PVFS [4], GPFS [20], and MAPFS [16]. Armada [15] is a parallel file system for computational grids, but it does not use standard grid services. There are several high-performance file systems supporting more than a thousand clients, such as Lustre [13] and the Google File System [9]. GFARM [24] is
a file system designed for file sharing and high-performance distributed and parallel data computing in a grid across administrative domains. However, GFARM implements an I/O server running on every file system node and does not use standard grid services.

3. Grid technologies used

The main objective of the grid parallel file system described in this paper is to use standard grid technologies. Using standard technologies allows easy deployment of the parallel file system in practically any grid environment. With this aim, we use services defined by Global Grid Forum recommendations, one of them implemented in the Globus Toolkit. We use two main technologies for the grid parallel file system: the GridFTP protocol for transferring data, and the Resource Namespace Service (RNS) for building the directory service. The following sections describe these elements.

3.1. GridFTP

GridFTP [1,10] is a data transfer protocol defined by a Global Grid Forum recommendation that provides secure and high performance data movement in grid systems. The Globus Toolkit provides the most commonly used implementation of the protocol, though others exist (primarily tied to proprietary internal systems). The protocol extends the standard FTP protocol with the following features:
• Grid Security Infrastructure (GSI) support.
• Third-party control and data transfer.
• Parallel data transfer using multiple TCP streams.
• Striped data transfer using multiple servers.
• Partial file transfer, and support for reliable and restartable data transfer.
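As a point of reference for the last feature: GridFTP's reliable and restartable transfers generalize the restart mechanism of standard FTP. The sketch below uses plain FTP (Python's ftplib), not GridFTP, and the host and path names are placeholders; it resumes a download from the byte offset already present on disk.

```python
import os
from ftplib import FTP

def resume_offset(local_path):
    """Bytes already downloaded: resume from the local file's size."""
    return os.path.getsize(local_path) if os.path.exists(local_path) else 0

def resume_download(host, remote_path, local_path, user="anonymous", passwd=""):
    offset = resume_offset(local_path)
    ftp = FTP(host)
    ftp.login(user, passwd)
    with open(local_path, "ab") as f:
        # REST <offset> asks the server to restart the transfer at this byte;
        # GridFTP generalizes this idea with restart markers and adds
        # parallel and striped transfers on top.
        ftp.retrbinary(f"RETR {remote_path}", f.write, rest=offset)
    ftp.quit()
```

A partially transferred file can thus be completed without re-sending the bytes already received, which is the behaviour GridFTP makes reliable at grid scale.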
Data transfer in our parallel file system is based on this protocol and on the implementation provided by the Globus Toolkit. Globus gives access to the protocol through two libraries: the Globus FTP Control library and the Globus FTP Client library. The first provides the low-level services needed to implement FTP clients and servers; the API it provides is protocol specific. The Globus FTP Client library provides a convenient way of accessing files on remote FTP servers.

3.2. Resource namespace service

The Resource Namespace Service (RNS) is a specification of the Grid File System Working Group (GFS-WG) of the Global Grid Forum that allows the construction of a uniform, global, hierarchical namespace. It is a web service described by an RNS WSDL [18]. The GFS-WG proposes an RNS profile for use with Grid File Systems. RNS is a three-tier naming architecture (see Fig. 1) consisting of human interface names, logical reference names, and endpoint references.

Fig. 1. RNS three-tier naming architecture.

RNS allows two levels of indirection. The first maps human interface names directly to endpoint references. The second maps human interface names to logical names, which are in turn mapped to logical references. Representing a logical reference by a logical name has the advantage that logical names may be referenced and resolved independently of the hierarchical namespace: a logical name may be used as a globally unique logical resource identifier and be referenced directly both by the RNS namespace and by other services. RNS is composed of two main components: virtual directories and junctions. Directories are virtual because they have no corresponding representation outside of the namespace. A junction is an RNS entry that links a reference to an existing resource into the global namespace. There are two types of junction: referral junctions, used to graft RNS namespaces together, and logical reference junctions, which contain a unique logical name. For the latter, the RNS service returns an endpoint reference (EPR); this endpoint is resolved using the Resource Endpoint Resolution Service, or RNS resolver. RNS provides several types of operations: operations for querying namespace entry information; operations for creating, removing, renaming, and updating entries; and operations for managing the properties or status of an entry. The RNS resolver is a service independent of RNS that holds the mapping between logical names and endpoint references (addresses). For each logical name, the resolver may store several endpoint references. It offers operations for resolving names to endpoint references, and for creating, removing, and updating logical references.
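The two levels of indirection can be illustrated with a toy model; the table contents and names below are invented for the example (a real deployment uses the RNS WSDL operations, not Python dictionaries).

```python
# Namespace entries: a human name maps either directly to an endpoint
# reference (EPR) or, via a logical reference junction, to a logical name.
namespace = {
    "/grid/docs/readme.txt": ("epr", "gsiftp://host1/readme.txt"),
    "/grid/data/results.txt": ("logical", "ln:0001"),
}

# RNS resolver: an independent service mapping each logical name to
# one or more endpoint references.
resolver = {
    "ln:0001": ["gsiftp://host1/results.txt", "gsiftp://host2/results.txt"],
}

def resolve(human_name):
    """Return the endpoint references for a human interface name."""
    kind, target = namespace[human_name]
    if kind == "epr":
        # First level of indirection: direct mapping to an EPR.
        return [target]
    # Second level: the logical name is resolved independently of the
    # hierarchical namespace, and may yield several EPRs.
    return resolver[target]
```

Because `ln:0001` is resolved outside the namespace, the same logical name can be referenced by other services, and new replicas only require updating the resolver, not the namespace.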
We have implemented a prototype web service for Globus that implements the RNS specification; it is used in Expand to provide a global namespace in grids.

4. Expand design for grid environments

To provide a global and parallel file system for grid computing, the authors have modified the Expand Parallel File System to use the standard grid technologies described in the previous sections: the GridFTP protocol and RNS.
Fig. 2. Expand architecture for grid environments.
Fig. 2 shows the architecture of Expand. The figure shows how Expand can be used for data management in cluster environments and how it can access several sites using the GridFTP protocol. File data are striped by Expand among several servers using different protocols, with blocks of different sizes as the striping unit. Client processes use an Expand library to access an Expand distributed partition. Expand offers an interface based on POSIX system calls. This interface, however, is not appropriate for parallel applications using striped patterns with small access sizes [14]. For parallel applications, we use ROMIO [21] to support the MPI-IO interface [3], implementing the appropriate Expand ADIO. The next sections describe data distribution, file structure, naming, metadata management, and parallel access to files in Expand, using the GridFTP protocol and the RNS service specification.

4.1. Data distribution and files

Expand combines several GridFTP servers (see Fig. 2) in order to provide a generic distributed partition. The use of GridFTP allows us to use servers located in different administrative domains. Each server provides one or more directories that are combined to build a distributed partition
through the grid. All files in the system are striped across all GridFTP servers to facilitate parallel access, each server storing conceptually a subfile of the parallel file. A file thus consists of several subfiles, one per GridFTP server. All subfiles are fully transparent to Expand users. On a distributed partition, the user can create striped files with a cyclic layout, in which blocks are distributed across the partition following a round-robin pattern. This structure is shown in Fig. 3.

Fig. 3. The file structure and directory mapping in Expand.

4.2. Naming and metadata management

Partitions in Expand are defined using a small configuration file. For example, the following configuration file defines two partitions:

/xpn1 8 4
gsiftp://host1/export/home1
gsiftp://host2/export/home2
gsiftp://host3/export/home3
gsiftp://host4/home

/xpn2 4 2
nfs://server1/users
nfs://server2/export/home/users
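The mapping implied by such a partition definition can be sketched in Python as follows. This is a minimal model, not Expand's implementation: the field interpretation (partition root, striping unit in kB, number of servers) follows the example above, and all helper names are ours.

```python
# Model of the first partition definition above.
PARTITION = {
    "name": "/xpn1",      # partition root path
    "stripe_kb": 8,       # striping unit in kB (assumed field meaning)
    "servers": [
        "gsiftp://host1/export/home1",
        "gsiftp://host2/export/home2",
        "gsiftp://host3/export/home3",
        "gsiftp://host4/home",
    ],
}

def subfiles(path, part):
    """Map an Expand path to its subfiles, one per server."""
    rel = path[len(part["name"]):]          # strip the partition root
    return [base + rel for base in part["servers"]]

def block_location(block, part):
    """Cyclic layout: block b lives on server b mod N, at subfile
    offset (b // N) * stripe."""
    n = len(part["servers"])
    stripe = part["stripe_kb"] * 1024
    return part["servers"][block % n], (block // n) * stripe
```

For instance, `block_location(5, PARTITION)` places logical block 5 on the second server at subfile offset 8192, since blocks are dealt round-robin across the four servers.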
This configuration file defines two Expand partitions. The first uses 4 servers (host1, host2, host3, and host4) with a default striping unit of 8 kB; it uses the GridFTP protocol and the Globus Toolkit to access files located at different sites. The second partition uses 2 NFS servers and a striping unit of 4 kB; this type of partition is appropriate for cluster data access. The path /xpn1 is the root of the first partition, and /xpn2 is the root of the second. Thus, the Expand file /xpn1/dir/data.txt is mapped onto the following subfiles:

gsiftp://host1/export/home1/dir/data.txt
gsiftp://host2/export/home2/dir/data.txt
gsiftp://host3/export/home3/dir/data.txt
gsiftp://host4/home/dir/data.txt

Each subfile of an Expand file (see Fig. 3) has a small header at the beginning that stores the file's metadata: the stripe size, the base node (the server where the first block of the file resides), and the file distribution pattern used. At the moment, we only use files with a cyclic layout. All subfiles have a header for metadata, although only one node, called the master node (described below), stores the current metadata; the master node can be different from the base node. To simplify naming and reduce potential bottlenecks, Expand does not use a metadata manager such as the one used in PVFS [4]. Fig. 3 shows how directory mapping is done in Expand. The naming service is provided by a prototype web service that implements the RNS specification. Fig. 4 shows the naming and data access process. The metadata of a file resides in the header of a subfile stored on a GridFTP server; this server is the master node of the file, similar to the mechanism used in the Vesta Parallel File System [6]. To obtain the master node of a file, Expand hashes the file name to a node number. The hash function used in the current prototype is:

Server(name_file) = ( sum_{i=1}^{strlen(name_file)} name_file[i] ) mod numServers
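In Python, this hash amounts to summing the character codes of the file name and taking the result modulo the number of servers:

```python
def master_node(file_name, num_servers):
    """Expand's master-node hash: sum of the character codes of the
    file name, modulo the number of servers in the partition."""
    return sum(ord(c) for c in file_name) % num_servers
```

For example, `master_node("ab", 4)` yields 3, since 97 + 98 = 195 and 195 mod 4 = 3. Any client can thus locate a file's metadata from its name alone, without a central metadata manager.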
Because the master node is determined from the file name, renaming a file changes its master node. The algorithm used in Expand to rename a file is the following:

rename(oldname, newname) {
    oldmaster = hash(oldname)
    newmaster = hash(newname)
    move the metadata from oldmaster to newmaster
}

Expand offers two different interfaces. The first is based on POSIX system calls. This interface, however, is not appropriate for parallel applications using striped patterns with small access sizes [14]. Parallel applications can also use Expand through MPI-IO [3]: Expand has been integrated inside ROMIO [21] and can be used with MPICH. Portability in ROMIO is achieved using an abstract-device interface for I/O (ADIO).

Fig. 4. Naming and data access process.

4.3. Parallel access and authentication
All file operations in Expand use a virtual filehandle, which is the reference Expand uses for all operations. When Expand needs to access a subfile, it uses the appropriate per-server handle; for GridFTP, this is the handle managed by the GridFTP Client library provided by Globus. To enhance I/O performance, user requests are split by the Expand library into parallel subrequests sent to the servers involved. When a request involves k GridFTP servers, Expand issues k requests in parallel, using threads to parallelize the operations. The same criterion is used in all Expand operations: a parallel operation on k servers is divided into k individual operations, each carried out through the Globus GridFTP Client library on the corresponding subfile. This process is shown in Fig. 4. Access control and authentication are guaranteed in Expand for grid environments because GridFTP uses the Grid Security Infrastructure (GSI) provided by Globus. GSI uses public key cryptography as the basis for its functionality and provides secure communication (authenticated and, optionally, confidential) between elements of a computational grid, security across organizational boundaries without requiring a centrally managed security system, and single sign-on for users of the grid, including delegation of credentials for computations that involve multiple resources and/or sites.
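The request-splitting step can be sketched as follows. This is a simplified Python model with threads; `read_subfile` is a hypothetical stand-in for the Globus GridFTP client call, and the cyclic layout assumed is the one described in Section 4.1.

```python
from concurrent.futures import ThreadPoolExecutor

def read_subfile(server, offset, size):
    # Placeholder for a per-server transfer; a real client would issue
    # a GridFTP partial-file read on this subfile region.
    return b"x" * size

def parallel_read(servers, offset, size, stripe):
    """Split one logical read into per-server subrequests and issue
    them in parallel with threads, as the Expand library does."""
    n = len(servers)
    first = offset // stripe
    last = (offset + size - 1) // stripe
    per_server = {}  # server -> logical block numbers it stores
    for block in range(first, last + 1):
        per_server.setdefault(servers[block % n], []).append(block)
    with ThreadPoolExecutor(max_workers=len(per_server)) as pool:
        # For a contiguous logical range, each server's blocks are
        # contiguous within its subfile: block b sits at (b // n) * stripe.
        futures = {
            srv: pool.submit(read_subfile, srv,
                             (blocks[0] // n) * stripe,
                             len(blocks) * stripe)
            for srv, blocks in per_server.items()
        }
        return {srv: f.result() for srv, f in futures.items()}
```

A request touching k servers thus becomes k concurrent subrequests, one per involved subfile, whose results are reassembled by the library.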
5. Performance evaluation

The main motivation for our performance tests is to study the feasibility of using grid services to implement a parallel file system. With this aim, the evaluation has been made in two scenarios: the first evaluates Expand with grid services in a typical grid computing environment; the second analyzes the use of the system by parallel MPI applications running in a cluster. For the grid scenario we have defined a benchmark consisting of 500 jobs scheduled on 4 workstations. Each job accesses a random number of files (between 1 and 10) chosen from among 1000 files of 500 MB each. This benchmark has been run on different configurations:
• All files stored on one GridFTP server (1 Site GridFTP in the figure), accessed using globus-url-copy, the command line tool provided by Globus.
• The files distributed among 4 GridFTP servers (4 sites Distributed Replicas in the figure). Each server stores 250 files, accessed using globus-url-copy.
• All files replicated on the 4 GridFTP servers (4 sites Full replication in the figure). Each server stores 1000 files, accessed sequentially using globus-url-copy.
• All files replicated on the 4 GridFTP servers (4 sites Full replication-parallel access in the figure). Each server stores 1000 files, but each file is accessed in parallel using the 4 servers.
• Expand with the GridFTP protocol (GridExpand in the figures) and several numbers of servers in the distributed partition (1, 2, and 4). The files are accessed using POSIX system calls.

For the second scenario, we have used the FLASH I/O benchmark [7]. This benchmark uses the parallel HDF5
interface, which in turn uses the MPI-IO interface. The FLASH I/O benchmark performs three separate performance tests: checkpoint, plotfile without corners, and plotfile with corners. The benchmark is write-intensive and allows us to analyze file system behaviour. For this benchmark, we have compared the performance of Expand with GridFTP, Expand with NFS, and PVFS. In the evaluation we have used 4 workstations for running the processes and 4 workstations for the GridFTP servers, all with the Globus Toolkit 4 installed.

Fig. 5. Performance results for the grid benchmark.

Fig. 5 shows the results obtained for the grid benchmark. The best results are obtained for 4 sites Full replication-parallel access and for Expand with 4 servers. This result shows that parallel I/O is an effective technique for improving data access in grids; furthermore, Expand does not require full replication to obtain these results.

Fig. 6. Performance results for the Flash-IO benchmark.

Fig. 6 shows the performance obtained for the Flash-IO benchmark with 4 and 8 processes. As can be seen, the performance of Expand using the NFS protocol (XPN-NFS) and of PVFS is better than that of Expand using the GridFTP protocol (XPN-GridFTP). Although GridFTP does not provide good results for this benchmark, it demonstrates that it is feasible to use this approach to provide a grid file system. The system is appropriate for building grid file systems that span multiple administrative domains; in such environments, Expand with NFS and PVFS cannot be applied.

6. Conclusions and future work
In this paper we have described a new parallel file system for grids that follows the Global Grid Forum recommendations. The system is based on the Expand Parallel File System designed by the authors. The new version of Expand for grids uses standard grid technologies, GridFTP for data access and the Resource Namespace Service for naming, and allows the integration of existing heterogeneous servers spanning multiple administrative domains to provide parallel I/O services through standard interfaces such as POSIX and MPI-IO. The performance results demonstrate that it is feasible to use this system to provide a grid file system. Future work includes fault-tolerance support and the study of new data allocation schemes and prefetching algorithms for data grids.

Acknowledgment

This work has been supported by the Spanish Ministry of Education and Science under contract TIN2004-02156.

References

[1] B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, Secure, efficient data transport and replica management for high performance data-intensive computing, in: Proceedings of the Eighteenth IEEE Symposium on Mass Storage Systems and Technologies, April 17–20, 2001, pp. 13–28.
[2] W.H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger, F. Zini, OptorSim — A Grid simulator for studying dynamic data replication strategies, International Journal of High Performance Computing Applications 17 (4) (2003).
[3] A. Calderon, F. Garcia, J. Carretero, J.M. Perez, J. Fernandez, An implementation of MPI-IO on Expand: A parallel file system based on NFS servers, in: 9th PVM/MPI European Users Group, Johannes Kepler University Linz, Austria, September 29–October 2, 2002, pp. 306–313.
[4] P.H. Carns, W.B. Ligon III, R.B. Ross, R. Thakur, PVFS: A parallel file system for Linux clusters, Tech. Rep. ANL/MCS-P804-0400, 2000.
[5] J. Carretero, F. Perez, P. de Miguel, F. Garcia, L. Alonso, Performance increase mechanisms for parallel and distributed file systems, in: Parallel Computing: Special Issue on Parallel I/O Systems, no. 3, Elsevier, 1997, pp. 525–542.
[6] P. Corbett, S. Johnson, D. Feitelson, Overview of the Vesta parallel file system, ACM Computer Architecture News 21 (5) (1993) 7–15.
[7] FLASH I/O benchmark routine — Parallel HDF5, http://flash.uchicago.edu/~zingale/flash_benchmark_io/.
[8] I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
[9] S. Ghemawat, H. Gobioff, S. Leung, The Google file system, in: Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.
[10] The Globus Toolkit, http://www.globus.org.
[11] Global Grid Forum, The GGF file system architecture workbook, http://www.globalgridforum.net, 2002.
[12] F. Garcia, A. Calderon, J. Carretero, J.M. Perez, J. Fernandez, The design of the Expand parallel file system, International Journal of High Performance Computing Applications 17 (1) (2003) 21–37.
[13] Cluster File Systems, Inc., Lustre: A scalable, high-performance file system, http://www.lustre.org.
[14] N. Nieuwejaar, D. Kotz, The Galley parallel file system, in: Proceedings of the 10th ACM International Conference on Supercomputing, May 1996.
[15] R. Oldfield, D. Kotz, Armada: A parallel file system for computational Grids, in: International Symposium on Cluster Computing and the Grid, Brisbane, Australia, May 2001, IEEE Computer Society Press, 2001, pp. 194–201.
[16] M.S. Perez, J. Carretero, F. Garcia, J.M. Peña, V. Robles, MAPFS: A flexible multiagent parallel file system for clusters, Future Generation Computer Systems 22 (2006) 620–632.
[17] K. Ranganathan, I. Foster, Identifying dynamic replication strategies for high performance data grids, in: Proceedings of the International Workshop on Grid Computing, Denver, November 2002, pp. 75–86.
[18] M. Pereira, O. Tatebe, L. Luan, T. Anderson, J. Xu, Resource Namespace Service specification, November 2005, http://www.globalgridforum.net.
[19] A.S. Tosun, H. Ferhatosmanoglu, Optimal parallel I/O using replication, in: Proceedings of the International Workshop on Parallel Processing, ICPP, Vancouver, Canada, 2002, pp. 506–513.
[20] F. Schmuck, R. Haskin, GPFS: A shared-disk file system for large computing clusters, in: Proceedings of the Conference on File and Storage Technologies, FAST'02, January 28–30, 2002, Monterey, CA, pp. 231–244.
[21] W. Gropp, R. Thakur, E. Lusk, An abstract-device interface for implementing portable parallel-I/O interfaces, in: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation, October 1996, pp. 180–187.
[22] M. Tang, B.-S. Lee, X. Tang, C. Ueo, The impact of data replication on job scheduling performance in the Data Grid, Future Generation Computer Systems 22 (2006) 254–268.
[23] O. Tatebe, Worldwide fast file replication on Grid Datafarm, in: Proceedings of the 2003 Computing in High Energy and Nuclear Physics, CHEP03, March 2003.
[24] O. Tatebe, N. Soda, Y. Morita, S. Matsuoka, S. Sekiguchi, Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing, in: Proceedings of the 2004 Computing in High Energy and Nuclear Physics, CHEP04, Switzerland, September 2004.
Félix García-Carballeira received the M.S. degree in Computer Science in 1993 from the Universidad Politecnica de Madrid, and the Ph.D. degree in Computer Science in 1996 from the same university. From 1996 to 2000 he was an associate professor in the Department of Computer Architecture at the Universidad Politecnica de Madrid. He is currently an associate professor in the Computer Science Department at the Universidad Carlos III de Madrid. His research interests include high performance computing and parallel file systems. He is coauthor of 9 books and has published some 70 articles in journals and conferences.

Jesús Carretero got his Computer Science degree and his Ph.D. at the Universidad Politecnica de Madrid. Since 1989, he has been teaching Operating Systems and Computer Architecture at several universities. During 1997 and 1998 he held a visiting scholar position at Northwestern University, in Chicago. He has been a full professor at the Universidad Carlos III de Madrid, Spain, since 2001. His research interests focus on parallel and distributed systems, especially data storage systems, real-time systems, and multimedia techniques. He is the author of several educational books and has published papers in major journals in this area, such as Parallel Computing and the Journal of Parallel and Distributed Computing.

Alejandro Calderón got his M.S. in Computer Science at the Universidad Politecnica de Madrid in 2000 and his Ph.D. in 2005 at the Universidad Carlos III de Madrid. He is an associate professor in the Department of Computer Science at the Carlos III University of Madrid, Spain. His research interests include high performance computing and parallel file systems. Alejandro has participated in the implementation of MiMPI, a multithreaded implementation of MPI, and of the Expand parallel file system.

J. Daniel García got his Computer Science degree at the Universidad Politecnica de Madrid in 2001 and his Ph.D. in 2005 at the Universidad Carlos III de Madrid. He has been an assistant professor at the Universidad Carlos III de Madrid since 2002, teaching Computer Architecture and Operating Systems. His research interests focus on parallel and distributed systems, especially data storage systems, and real-time systems.

Luis M. Sanchez got his Computer Science degree at the Universidad Carlos III de Madrid in 2003. He has been an assistant professor at the Universidad Carlos III de Madrid since 2003, teaching Computer Architecture and Operating Systems. His research interests focus on parallel and distributed systems, especially data storage systems.