Volunteer computing: requirements, challenges, and solutions

Journal of Network and Computer Applications 39 (2014) 369–380 Contents lists available at ScienceDirect Journal of Network and Computer Application...


Journal of Network and Computer Applications 39 (2014) 369–380

Contents lists available at ScienceDirect

Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca

Review

Volunteer computing: requirements, challenges, and solutions

Muhammad Nouman Durrani*, Jawwad A. Shamsi

Systems Research Laboratory, Department of Computer Science, FAST NU, Karachi, Pakistan

Article info

Article history: Received 3 August 2012; received in revised form 3 May 2013; accepted 18 July 2013; available online 29 July 2013.

Keywords: Volunteer computing; Performance; Task retrieval; Scheduling; Voting; Classification

Abstract

Volunteer computing is a form of network-based distributed computing that allows public participants to share their idle computing resources and helps run computationally expensive projects. Many existing volunteer computing platforms consist of millions of users, providing a huge amount of memory and processing power. With the rapid growth of volunteer computing projects, more researchers have been attracted to study and improve existing volunteer computing systems. However, the progress of concurrently running projects has slowed down due to the increasing competition for volunteers. Moreover, because of high computational needs and the low participation rate of volunteers, attracting more volunteers and using their resources more efficiently have become extremely important if volunteer computing is to remain a feasible method. To use the huge number of volunteered resources competently, analysis of workers and efficient task retrieval policies are important. The purpose of this paper is to assess the strengths and requirements of current volunteer computing platforms. The paper analyses different issues relating to volunteer computing, such as the analysis of workers, the effectiveness of workers, how their communication and computation can be modeled, and how the effectiveness of task distribution and result verification policies is analyzed. At the end, some research directions in the form of partial results and their intermediate verification are presented, which may improve the performance of the overall system. Moreover, this survey will enable the research community to study the available schemes used in volunteer computing and help them fill gaps in existing research. © 2013 Elsevier Ltd. All rights reserved.

Contents

1. Introduction
2. Volunteer computing platforms
2.1. Evolution of volunteer computing
2.2. Requirements for volunteer computing platforms
2.3. Computation model of volunteer computing
3. Research issues and challenges
3.1. Analysis of volunteer computing workers
3.2. How to identify resource patterns?
3.3. How to improve resource management in VC by applying knowledge of availability patterns?
3.3.1. Existing task distribution policies
3.3.2. Solutions: beyond task distribution policies
3.3.3. Role of trust and security
3.3.4. Security mechanisms
4. Open problems and future research directions
5. Conclusion
References

* Corresponding author. Tel.: +92 33 33887366; fax: +92 21 34100549. E-mail addresses: [email protected], [email protected] (M. Nouman Durrani), [email protected] (J.A. Shamsi).
1084-8045/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jnca.2013.07.006

1. Introduction

Volunteer computing (VC) is a form of distributed computing that allows public participants to share their idle computing resources and helps run computationally expensive projects (Volunteer computing, 2013; Desell et al., 2009; Watanabe et al., 2011). VC provides a bulk of inexpensive resources that can supply more computing power to science than any other type of computing. It boosts public interest in scientific research and gives scientists a voice in determining directions of scientific research that could not be pursued otherwise. A research project that has limited funding but large public appeal can harness the massive computing power of VC. Many existing VC platforms consist of millions of users, providing a huge amount of memory and processing power (Volunteer computing, 2013).

VC offers a scalable, low-cost, reliable, and powerful computing platform in which expensive tasks are decomposed into small chunks for parallel execution. Volunteers (also known as workers, hosts or nodes) provide a portion of their storage or computational resources to form a resource cloud. To run computationally expensive projects, a middleware is needed that collects chunks (workunits, i.e. portions of data that need to be processed) from servers and sends these workunits to volunteers for storage or processing (Toth and Finkel, 2009; Toth, 2008).

In VC, users have heterogeneous resources with varying capabilities (Schmohl and Baumgarten, 2008). The foremost challenge is the proper distribution of tasks according to the capacity of each volunteer, so that jobs complete on time. Such distribution would also ensure efficient utilization of resources, reduced cost of operation, and enhanced user satisfaction (Anderson et al., 2005; Toth and Finkel, 2008). Heterogeneity-aware distribution of workload would further permit priority-based execution of tasks and improve real-time behavior. In addition, it may increase the pool of applications that can be executed on the VC platform.
Various task scheduling policies, such as earliest-deadline-first, work fetch, buffer multiple tasks, weighted round robin, work send and job completion estimation, have already been practiced in VC (Toth and Finkel, 2009; Toth, 2008; Toth and Finkel, 2008; Kondo et al., 2007; Anderson, 2011). However, most of these policies aim to reduce the overall execution time of parallel computation without considering the correctness of the computational results. Furthermore, VC projects may vary in terms of job length, error sensitivity, and result verification requirements (Estrada et al., 2009a; Sarmenta, 2001; Sarmenta, 2002).

Large numbers of inexpensive resources are available to store and process big data, and numerous schemes are used to utilize these resources. However, the progress of concurrently running projects has slowed down due to inefficient use of voluntary resources. The purpose of this paper is to evaluate the strengths and requirements of current volunteer computing platforms. The paper analyses different issues relating to volunteer computing, such as the analysis of workers, the effectiveness of workers, how their communication and computation can be modeled, and how the effectiveness of task distribution and result verification policies is analyzed. At the end, some research challenges are also identified. Moreover, this paper will enable the research community to study the available schemes used in VC and help them fill gaps in existing research.

The rest of this paper is organized as follows: Section 2 discusses volunteer computing platforms, Section 3 presents the research issues and challenges, Section 4 covers open problems and future research directions, and Section 5 concludes the paper.

2. Volunteer computing platforms

This section explains the background of VC. It begins with a brief introduction and leads to a detailed discussion of BOINC, which is regarded as the most popular VC middleware.

2.1. Evolution of volunteer computing

In the 1990s, there was a burst of research activity related to VC, including ATLAS, Charlotte, Bayanihan, ParaWeb, Popcorn, Javelin, Gucha, Distriblets, and others (Toth, 2008; Sarmenta, 2001). At the end of 2000, interest in the field shifted towards the more productive area of grid computing, and only a few research projects remained active, owing to their low-cost, large-scale distributed computing. In grid computing, resources are owned and shared within and between organizations, whereas in VC, resources are sporadic and unreliable in nature: they may leave the system at any instant, and few guarantees can be made about the machine and network. Using this bulk of inexpensive resources, the initial goals of projects like GIMPS, distributed.net and grid.org were achieved. Different middleware platforms were designed to build VC projects. Of these, BOINC (Berkeley Open Infrastructure for Network Computing) (Volunteer computing, 2013; Yi et al., 2010; Estrada et al., 2006) and XtremWeb continued to move forward, with BOINC having become the leading framework for building volunteer computing projects (Toth, 2008; Toth and Finkel, 2008). Following the successful development of middleware platforms, the following projects were initiated:

GIMPS: in January 1996, George Woltman started the Great Internet Mersenne Prime Search (GIMPS), the oldest of the major VC projects. On December 20, 2012, GIMPS finished double-checking every Mersenne number smaller than M(25964951), proving that this prime is indeed the 42nd Mersenne prime (Great Internet Mersenne Prime Search (GIMPS), 2013).

distributed.net: founded in 1997, distributed.net was the Internet's first general-purpose distributed computing project. Distributed.net used VC to win several cryptographic challenges against well-known encryption algorithms. Having successfully completed Optimal 26-Mark Golomb Rulers, DES-III, DES II-2, DES II-1, RC5-56 and RC5-64, it is now working on the RC5-72 encryption algorithm, RSA prime factoring, Fermat numbers and the elliptic curve cryptosystem (ECC) (Distributed.net, 2013).

SETI@home: SETI@home (setiathome.berkeley, 2013; Anderson et al., 2002) searches massive amounts of radio telescope data for signs of extra-terrestrial intelligence. Started in May 1999, SETI@home is one of the best-known VC projects. As of January 2013, over 3.32 million users were participating in this project, and it was able to compute at over 925 teraFLOPS.

grid.org: established in 2001, grid.org was a website and online community service that operated several different VC projects and allowed workers to donate their free computing resources to valuable projects. Its first project was a cancer research project, in which it successfully screened billions of target molecules against known cancer target proteins. Following the cancer project, it worked on other projects to identify treatments for smallpox and anthrax. Currently, grid.org is working on analyzing human protein folding, hidden Markov modeling and Webload testing (grid.org, 2013).

Folding@home: Folding@home (What is protein folding, 2013) studies protein folding, misfolding, and related diseases, with a minor emphasis on protein structure prediction. The primary purpose of this project is to determine the mechanisms of protein folding, the process by which proteins reach their final 3-D structure. It also examines the roots of protein misfolding. The project has pioneered the use of PlayStation 3s, GPUs and MPI for distributed computing and scientific research. Folding@home uses a statistical simulation methodology that is a complete paradigm shift from traditional computational approaches. Moreover, Folding@home is one of the world's fastest growing computing systems, with a speed of 1 petaFLOPS, greater than that of all projects running on BOINC. At the end of 2012, the project was also the world's most powerful molecular dynamics simulator. The increase in computational power has also enabled researchers to run atomic-level simulations of protein folding that are a thousand times more computationally expensive.

The Quake-Catcher Network: QCN (2013) uses accelerometers connected to personal computers to detect earthquakes and to educate the public about seismology.

Besides the projects currently running on VC platforms, there are hundreds of large-scale projects that need huge computational resources. Only the power of VC can process those projects inexpensively.

2.2. Requirements for volunteer computing platforms

Generally, VC follows a master-worker parallel computing model (Watanabe et al., 2011; Sarmenta, 2001; Heien et al., 2009; Watanabe and Fukushi, 2010), as shown in Fig. 1. In this model, the master decomposes massive tasks into small chunks and distributes these chunks among workers. Workers perform the required computation and send results back to the master. The master then verifies the results and aggregates them to compute the final result. All operations of a VC system are handled by a middleware.
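The master-worker cycle described above (split, distribute, verify, aggregate) can be sketched in a few lines. This is a minimal illustration rather than the behavior of any particular middleware: the function names, the sum-of-squares workload, the local (in-process) workers, and the use of majority voting among replicas as the verification step are all assumptions made for the example.

```python
from collections import Counter

def split_into_workunits(data, chunk_size):
    """Master side: break a large task into fixed-size workunits."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def worker_compute(workunit):
    """Worker side: a hypothetical computation (sum of squares)."""
    return sum(x * x for x in workunit)

def verify(results):
    """Accept the value returned by a strict majority of replicas, else None."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(results) // 2 else None

def run_master(data, chunk_size=4, replicas=3):
    """Distribute each workunit to `replicas` workers, verify, then aggregate."""
    partial_results = []
    for workunit in split_into_workunits(data, chunk_size):
        # In a real deployment these calls would run on remote volunteers.
        results = [worker_compute(workunit) for _ in range(replicas)]
        accepted = verify(results)
        if accepted is None:
            raise RuntimeError("no majority: the workunit must be reissued")
        partial_results.append(accepted)
    return sum(partial_results)

total = run_master(list(range(10)))
```

Replication with voting is only one possible verification strategy; it trades extra computation for protection against erroneous or malicious results.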
A VC system must satisfy the following requirements:
(i) efficient division of tasks into small chunks, so that they can be executed resourcefully by workers;
(ii) robust scheduling of tasks and resources, so that the overall throughput of the system increases;
(iii) modeling of the communication and computation of a huge number of sporadic resources;
(iv) trust: volunteers trust that the middleware of the different projects will not harm their systems or invade their privacy;
(v) truthfulness: volunteers trust that the project is truthful about its claimed applications, the work claimed to be done by its middleware, and how the processing and storage resources (intellectual property) will be used; and
(vi) security: volunteers trust the projects to use appropriate security policies, so that hackers cannot use the project as a vehicle for malicious activities (Volunteer computing, 2013; Security Threats, 2012).

Currently, BOINC is the most popular platform. It is an open-source platform for VC. The BOINC model involves volunteers and projects. Volunteers are users who participate by contributing their resources in the form of storage and processing. Projects are organizations (normally academic research groups) that need computational power. Each project runs its own BOINC server. Volunteers contribute by running the BOINC client on their computers and other devices.


The popularity of BOINC stems from making it possible for scientists and researchers to tap into the massive processing power of heterogeneous devices around the world. BOINC has been developed by a team based at the Space Sciences Laboratory, University of California, Berkeley. As a high-performance parallel distributed computing platform, BOINC had about 8,374,328 hosts, of which 540,130 were active, with an average processing rate of 7.4 petaFLOPS as of January 2013 (Volunteer computing, 2013). As shown in Fig. 2, a VC system consists of workers and server-side middleware. Workers consist of a middleware and project-specific execution code. Workunits are executed on the BOINC workers using the project-specific execution code. A BOINC server comprises a data server, a scheduling server, a database server, and BOINC daemon processes: the work generator (generates new workunits), the feeder (reorganizes the scheduler's database access), the transitioner (generates new results as needed and identifies error conditions), the validator (validates results), the assimilator (handles newly found canonical results), the deletor (deletes input and output files that are no longer needed) and the database purger (removes job and instance database entries that are no longer needed) (Anderson et al., 2005; Estrada et al., 2009a; Yi et al., 2010; Taufer et al., 2007; Lee et al., 2010).

2.3. Computation model of volunteer computing

This section discusses the computational model of VC. Generally, it consists of X sets of tasks denoted by S1, …, SX. Each set Si has N tasks J1, …, JN. The submission time, computation time and deadline of each task i are represented by Subi, Ti and Di respectively. Fig. 3 illustrates the model of tasks and sets used in this paper. A task that is completed and reported before its deadline is a satisfied task, and its set is called a satisfied set (Heien et al., 2009). In VC systems, there are P workers W1, …, WP to compute the tasks of a set. When a worker initially becomes available, it requests a connection. It then requests a connection again after specified intervals; this interval is known as the reconnection time. A worker at any given time is either available or unavailable,

Fig. 1. Master-worker model of VC. [Diagram: a master connected to workers W1, W2, …, WP.]

Fig. 2. Client and server side of BOINC middleware. [Diagram: the worker runs project-specific execution code on top of the volunteer computing middleware; the server comprises the scheduling server, data server and database server together with the work generator, feeder, transitioner, validator, assimilator, deletor and purger.]

[Figure 3: sets S1, S2, …, SX laid out along a time axis; each set holds jobs J1, J2, …, JN, and each job Ji is labeled with its triple (Subi, Ti, Di).]

3.1. Analysis of volunteer computing workers

Fig. 3. Computational model of VC with X sets S, each having N jobs J. The submission time Sub, computation time T and deadline D of each job in a set may differ from one another.


priority-based execution of tasks and improve real time behavior. In addition, it may increase the pool of applications that may be executed on the VC platform. In order to efﬁciently utilize the huge number of volunteered resources, workers analysis, efﬁcient size of a workunit, workers communication, computation, task distribution and results veriﬁcation are important.


Fig. 4. Worker availability over time (horizontal axis: days, 0 to 350). Black sections indicate worker availability, white sections indicate worker unavailability, and the striped bars show the beginning and end of each worker's lifespan.

whereas the project server is always available. A period in which a worker is in an uninterrupted available state is called an availability interval, and a period in which the worker is in an uninterrupted unavailable state is called an unavailability interval. Hence, the lifespan of a worker is the interval between the start of the initial availability interval, say AI, and the end of the latest availability interval, AL. In large-scale distributed systems like VC, the key focus is the average availability, which is the fraction of a worker's lifespan spent in the available state. Heien et al. (2009) analyzed the average availability of three different workers (A, B and C) at the BOINC server and found it to be 95%, 65% and 33% respectively, as shown in Fig. 4. Through checkpointing, a task is resumed from the same point when a worker transitions from the unavailable state to the available state.
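The definitions of lifespan and average availability given above translate directly into a small calculation. The sketch below assumes, for illustration only, that a worker trace is available as a sorted list of non-overlapping (start, end) availability intervals in days; the function name and this representation are hypothetical.

```python
def average_availability(intervals):
    """Fraction of a worker's lifespan spent in the available state.

    `intervals` is a sorted list of non-overlapping (start, end) availability
    intervals. The lifespan runs from the start of the first interval to the
    end of the last one; gaps between intervals count as unavailability.
    """
    if not intervals:
        raise ValueError("at least one availability interval is required")
    lifespan = intervals[-1][1] - intervals[0][0]
    if lifespan <= 0:
        raise ValueError("lifespan must be positive")
    available = sum(end - start for start, end in intervals)
    return available / lifespan

# Hypothetical trace: available on days 0-10, 20-30 and 35-40.
trace = [(0, 10), (20, 30), (35, 40)]
fraction = average_availability(trace)
```

In this made-up trace the worker is available for 25 of its 40 days of lifespan.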

The worker traces allow us to know when a worker is available, when a powered-on worker is unavailable for work, and when a worker's computer is powered off (Estrada et al., 2006; Heien et al., 2009; Taufer et al., 2007). To accomplish effective task distribution, the analysis of workers is the foremost challenge. Specifically, the following issues are discussed in this section: (i) How to measure CPU availability across millions of workers? (ii) How to identify patterns of availability among voluntarily dedicated resources? and (iii) How to improve resource management in VC by applying knowledge of availability patterns?

How to measure CPU availability across millions of workers?

Middleware such as BOINC, XtremWeb, Xgrid and Grid MP is mainly used for gathering and measuring data at large scale. The BOINC client is normally used for measuring CPU availability. It records the start and stop times of CPU availability intervals and thus captures the temporal structure of CPU availability. Toth and Finkel (2009) wrote a service that queried the operating system every 10 s to determine whether the screensaver was running. After a collection period of 28 days, 140 traces had been recorded, of which 68 (49%) came from public computer labs, 25 (18%) from students, 20 (14%) from home users and 27 (19%) from business users. These results are shown in Fig. 5.

To examine the nature of workers, CPU availability, and patterns of availability among volunteered resources, a modified BOINC client was distributed among 112,268 users. After a collection period of about seven months, the log files of these workers were collected at the BOINC server for the SETI@home project. The logs collected during this period traced 16,293 years of CPU availability. From these log files, the following observations were deduced:

Nature of workers: about 81% of the workers were at home, 17% at offices, and 2% at schools (Heien et al., 2009).

CPU availability: many P2P availability studies focus on host availability but not on CPU availability (Kondo et al., 2008; Javadi et al., 2009; Zheng and Subhlok, 2008). It has been observed that resources may have 100% host availability but 0% CPU availability.

Volatility of resources: the volatility of resources refers to the fraction of time they are available. It has been observed that 50% of workers are available for less than 40% of their lifespan, and only a small fraction of workers (about 4%) are available for more than 85% of their lifespan. This downtime and the highly unpredictable nature of workers make it difficult to use existing scheduling algorithms (Kondo et al., 2008; Byun et al., 2005).

3. Research issues and challenges

In this section, the main issues of existing VC systems are described. At the end, some possible solutions described in the literature are also discussed. The foremost challenge in the efficient utilization of hosts is that users have heterogeneous resources with varying capabilities. Proper distribution of tasks according to the capacity of each volunteer is desired in order to achieve timely completion of jobs. Further, this would ensure efficient utilization of resources, reduced cost of operation, and enhanced user satisfaction. Heterogeneity-aware distribution of workload would also permit

Fig. 5. Average period durations of workers.


Workers lifespan: an analysis of VC projects shows that the lifespan of workers generally follows an exponential distribution with a mean of 90 days. The mean interval length of 60% of workers is less than 24 h, which suggests the need for a fault-tolerance mechanism for long-running applications. The mean and median lifespan of a worker is roughly 2–3 months. This means that tasks shorter than 15 h will fail less than 1% of the time due to permanent worker unavailability (Heien et al., 2009; Kondo et al., 2008; Estrada et al., 2008).

Groups of resources with similar availability and CPU speed: Kondo et al. (2008) performed a series of experiments over these traces and found the following: (i) nearly 10,000 workers with a processing speed of approximately 2 × 10^9 FPOPS are more than 90% available; (ii) about 2500 workers with a CPU frequency of 2 × 10^9 are 30% available; (iii) there is little correlation between host speed and CPU availability; (iv) workers that are available more than 90% of the time contribute 40% of the total CPU time; (v) less-available workers still contribute a lot, i.e. workers with availability between 50% and 85% contribute about 45%; and (vi) dedicated workers with availability above 95% contribute less than 5% of their computational power.
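The claim that tasks shorter than 15 h fail less than 1% of the time can be checked with a short calculation. The sketch below assumes that permanent departure follows the exponential lifespan model with a mean of 90 days, so that the probability of a worker leaving for good during a task of length t is 1 − exp(−t / mean); the function name is hypothetical and the memoryless assumption is a simplification.

```python
import math

def permanent_failure_probability(task_hours, mean_lifespan_days=90.0):
    """Probability that a worker leaves permanently during a task,
    assuming exponentially distributed lifespans (memoryless model)."""
    mean_lifespan_hours = mean_lifespan_days * 24.0
    return 1.0 - math.exp(-task_hours / mean_lifespan_hours)

# A 15 h task against a mean lifespan of 90 days (2160 h).
p_fail = permanent_failure_probability(15.0)
```

Under these assumptions the probability is roughly 0.7%, consistent with the bound of less than 1% quoted in the text.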

Table 1
Worker attributes in a volunteer computing system.

Worker attribute | Description
Processor type | Type of processor
OS | Operating system
Worker type | Desktop user, mobile user, video game console
No. of cores (Cn) | No. of cores
CPU frequency (CPUs) | Processing speed
Cores free (Cf) | Percentage of free cores
CPU free (CPUf) | Percentage of free CPU
Memory (Ms) | Memory of the system
Free memory (Mf) | Free available memory
Storage (Sf) | Free storage
Availability session (Aws) | Worker's availability since the last connection
Timeout workunits (Tw) | Number of timeout workunits
Workunit completed session (Wcs) | Count of workunits completed since last connection
Total workunit completed (wct) | Without error or timeout over WUI distributed
Reliability (Rws) | Worker/application WUI valid over WUI collected since the last worker connection
Availability (Awp) | Worker WUI completed without error or timeout over WUI distributed since worker joined the project
Reliability (Rwp) | Worker/application WUI valid over WUI since worker joined the project
Lifespan | Worker number of hours since the worker joined the project
Average turnaround | Worker average time to return a WUI for a given worker
Acquired workunits | Worker/application number of WUIs given to the worker in the last connection
Average dedicated time | Worker average time dedicated to VC project
Workunits in progress | Worker number of WUIs still in progress
Workunit in progress | Application average size of WUs in flops

3.2. How to identify resource patterns?

As discussed in the previous section, task scheduling is important for efficiently utilizing a large number of computing resources. To distribute a set of tasks efficiently among workers, the available resources may be classified. Nouman Durrani and Shamsi (Durrani, 2013) described that attribute-based resource grouping is important for knowing the pattern of available, reliable and computational resources. Some of the worker attributes are listed in Table 1. These attributes may be static or random. Static attributes are those that do not change in a short span of time, for example CPU frequency, memory, operating system and number of cores. Random attributes exhibit random behavior and change within a time interval t, for example free CPU, number of free cores, availability, reliability, average dedicated time and lifespan. Given resource traces at time t, the weighted sums of the static and random attributes of a worker x may be calculated as

$$S_x(T) = \sum_{i=0}^{n} S_{x,i}\, W_{x,i} \tag{1}$$

$$R_x(t) = \sum_{i=0}^{n} R_{x,i}\, W_{x,i} \tag{2}$$

$$W_x(t) = (1-\alpha)\, W_x(t-1) + \alpha\, R_x(t) \tag{3}$$

$$C_x(t) = S_x(T) + W_x(t) \tag{4}$$

where $S_x(T)$ is the weighted sum of all static variables ($T \gg t$), $R_x(t)$ is the weighted sum of all random variables at time $t$, $W_x(t-1)$ is the weighted sum of worker $x$ at time $t-1$, and $\alpha$ ($0 < \alpha \le 1$) is the sensitivity parameter. Based on the weight distribution, resources are grouped after calculating $C_x(t)$ for all resources.

3.3. How to improve resource management in VC by applying knowledge of availability patterns?

After analyzing and identifying resource patterns, the next important issue is how to improve resource management. Task distribution policies may address the resource management issue and utilize all the resources efficiently. The next subsection discusses different task distribution policies in detail.
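The grouping score of Eqs. (1)–(4) in Section 3.2 can be sketched as follows. The attribute values, the weights and the function names are hypothetical; the code only mirrors the structure of the equations: a weighted sum of static attributes, a weighted sum of random attributes, an exponentially smoothed random term with sensitivity α, and their sum.

```python
def weighted_sum(values, weights):
    """Weighted sum used by Eq. (1) for static and Eq. (2) for random attributes."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))

def smoothed_random_term(previous_w, random_sum, alpha):
    """Eq. (3): W_x(t) = (1 - alpha) * W_x(t - 1) + alpha * R_x(t)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return (1.0 - alpha) * previous_w + alpha * random_sum

def capability_score(static_vals, static_w, random_vals, random_w, previous_w, alpha):
    """Eq. (4): C_x(t) = S_x(T) + W_x(t). Returns (C_x(t), W_x(t))."""
    s = weighted_sum(static_vals, static_w)             # Eq. (1)
    r = weighted_sum(random_vals, random_w)             # Eq. (2)
    w = smoothed_random_term(previous_w, r, alpha)      # Eq. (3)
    return s + w, w

# Hypothetical worker: two static attributes and two random attributes,
# all assumed to be normalized beforehand.
score, new_w = capability_score(
    static_vals=[2.0, 4.0], static_w=[0.5, 0.25],
    random_vals=[0.8, 0.6], random_w=[0.5, 0.5],
    previous_w=0.5, alpha=0.5,
)
```

A larger α makes the score react faster to the latest trace, while a smaller α gives more weight to the worker's history. Workers could then be grouped by ranges of the resulting score, which is left out of this sketch.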

3.3.1. Existing task distribution policies

The rapid growth of VC projects has attracted more researchers to study and improve the existing platforms. The progress of concurrently running projects has slowed down due to the increasing competition for hosts. According to a recent report, there are 360 million internet-connected users, of whom less than 2% participate in VC (Volunteer computing, 2013). Because of the high computational needs and the low participation rate, it has become extremely important to attract more participants and use their resources more efficiently.

In the context of task scheduling, the quality of a schedule should be guaranteed in spite of a certain level of performance fluctuation, such as resource performance degradation and incorrect estimates of task completion times (Lee et al., 2010). The authors addressed the task scheduling problem with a proactive reallocation scheme that enables output schedules to tolerate a certain degree of performance degradation in a VC system. In their scheme, rescheduling occurs only if a new task-resource match improves robustness without increasing the makespan. Furthermore, it has been pointed out that task retrieval policies might force clients to complete a number of successful tasks. Hence, the second main issue is the use of an efficient task retrieval policy.

Young and Heien et al. (Heien et al., 2009; Lee et al., 2010) proposed a communication model of task requests for VC workers. Task requests from VC workers are modeled as a Poisson process, and the request rate of the process is controlled through the workers' reconnection period (T). Hence, rather than scheduling task requests for individual workers, the entire worker pool is treated as a tunable stream. Each worker is given a reconnection time

$$\mathrm{Reconnect}(T) = \mathrm{Current}(T) + \frac{T}{P_{active}} \tag{5}$$


where $P_{\mathrm{active}}$ is the number of workers that were active in the last 2T seconds. They found that the average time gap between connections, 3.76 s, closely matches the expected time gap of 4 h/4025 workers = 3.58 s. Moreover, according to this task distribution scheme, “the time gap between connections to the master can be modeled as an exponential distribution function (EDF)”. Best results are obtained when a Kolmogorov–Smirnov (KS) test with a minimum p-value of 0.05 is applied to all simulation results.

Based on this communication model of task requests for VC workers, researchers have proposed different task retrieval policies. In VC projects, most existing task distribution policies are based on heuristics and can be categorized into two classes: naive and knowledge-based. Naive scheduling policies assign computational tasks without taking the history of the workers into account. The following are examples of naive task distribution policies:

(i) First-come-first-served (FCFS): in this policy, workunits for computation are sent to any worker that requests them (Estrada et al., 2009a, 2006; Taufer et al., 2007). The main disadvantage of the FCFS policy is that tasks are assigned even to workers that have shown low availability and reliability in the past.

(ii) Locality scheduling policy: here, tasks are preferably sent to workers that already have the data necessary to accomplish the work. Workers with poor availability and reliability may degrade the overall performance of the system under this policy (Anderson et al., 2005; Estrada et al., 2008).

(iii) Random assignment: in this scheme, workers are selected at random and workunits are sent to them (Kondo et al., 2007; Anderson, 2011). As the selection is random, the performance of the system fluctuates over time. In order to get steady results, workers may be grouped according to their availability and reliability patterns, and the tasks may then be distributed among them randomly (Estrada et al., 2006, 2008; Heien et al., 2009; Lee et al., 2010).
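The communication model around Eq. (5) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the reconnection rule is coded exactly as the equation is printed, and the gaps between connections are drawn from an exponential distribution whose mean is T divided by the number of active workers, which is the expected gap quoted in the text (4 h / 4025 workers).

```python
import random


def expected_gap_seconds(period_s: float, active_workers: int) -> float:
    """Expected time between consecutive connections to the master: T / P_active."""
    if active_workers <= 0:
        raise ValueError("active_workers must be positive")
    return period_s / active_workers


def reconnect_time(current_s: float, period_s: float, active_workers: int) -> float:
    """Eq. (5) as printed: Reconnect(T) = Current(T) + T / P_active."""
    return current_s + expected_gap_seconds(period_s, active_workers)


def simulate_gaps(period_s: float, active_workers: int, n: int, seed: int = 0) -> list:
    """Draw n gaps between connections, assuming a Poisson request process
    (exponential gaps with mean T / P_active)."""
    rng = random.Random(seed)
    rate = active_workers / period_s  # requests per second for the whole pool
    return [rng.expovariate(rate) for _ in range(n)]


if __name__ == "__main__":
    T = 4 * 3600  # reconnection period of 4 h, in seconds
    print(round(expected_gap_seconds(T, 4025), 2))
    gaps = simulate_gaps(T, 4025, 20000)
    print(sum(gaps) / len(gaps))
```

Treating the whole pool as one stream means the server only has to tune T to change the aggregate request rate, which is the design choice the model relies on.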

Knowledge-based task distribution policies take into account the history of the worker requesting computational tasks as well as that of the whole community. The following are examples of knowledge-based task distribution policies:

(i) World community grid: this scheduling policy distributes workunits based on the average turnaround time of the client. The average turnaround time is the average time a client needs to return a result before its deadline (Toth and Finkel, 2009; Toth, 2008; Estrada et al., 2008).

(ii) Weighted round robin: this policy cycles round-robin between the projects of different servers, weighted according to their resource share (Kondo et al., 2007; Anderson, 2011, 2007).

(iii) Work fetch: in this policy, the worker downloads and queues enough tasks to last until the next connection interval, and divides the queued jobs between the different projects based on resource share, with each project always having at least one job queued or running. This policy works well in some cases, but fails badly in others (Kondo et al., 2007; Anderson, 2007).

(iv) Threshold-based policies: in such policies, two thresholds determine the distribution of tasks to workers. One threshold relates to the availability of the worker and the other to its reliability. If the availability and reliability values of a worker are less than the minimum defined thresholds, tasks are not distributed to that worker. Threshold-based policies are divided into fixed-threshold and variable-threshold policies. In the fixed-threshold scheduling scheme, the server calculates the availability and reliability values of the requesting worker. If these values are greater than the defined thresholds, the server assigns the requested tasks to that worker. In the variable-threshold policy, the scheduler varies the thresholds (reliability and availability values) at runtime (Estrada et al., 2009a, 2006, 2008). Here, if few requests are generated by workers and many tasks (with strict deadlines) are waiting for distribution, the threshold of the system is decreased; otherwise, it is increased. The main disadvantage of this policy is that it starves workers with low availability and reliability values. As these workers are assumed to be unsuitable for computation, they are removed.

(v) Priority round robin policy: in priority round robin scheduling, workers having similar resources are grouped together to guarantee the correctness of results and hence to reduce the overall execution time of tasks. As the user can interrupt the volunteer task at any time, the authors assume that, for each computer i, the task execution time is a Gaussian random variable with parameters $(\mu_i, \sigma_i^2)$. The workers are sorted in increasing order of their mean computation time $\mu_i$ and placed in a priority queue. When a worker becomes available, it is inserted into the queue. The worker with the highest priority is popped and assigned an uncomputed task. Upon receiving a result, the server performs credibility checking to verify whether the task result is above the minimum threshold (Ngo et al., 2008).

(vi) Buffer none policy: this policy instructs each host to retrieve one task at a time from the project's server. When a worker completes an assigned task and returns the result to the project's server, it retrieves another task. In this policy, CPU cycles are idle and completely wasted while the new task is being requested and downloaded.
This idle time becomes more significant when the task requires a large data file (Toth and Finkel, 2009; Toth, 2008; Toth and Finkel, 2008). As shown in Table 2, grid.org and Xtremweb clients use the buffer none method.

(vii) Buffer N days: also known as the buffer multiple task retrieval policy. It instructs each worker to retrieve multiple tasks from the project's server at a time. The retrieved tasks are stored in a buffer and processed one at a time. When a task is completed, the result is sent to the project's server and processing of the next buffered task starts. When the buffered tasks have been executed and the buffer is almost empty, the worker retrieves another set of tasks from the server. This policy is called buffer N because the worker buffers N days (or hours) of work (Toth and Finkel, 2009; Toth, 2008; Toth and Finkel, 2008). The clients buffer the number of tasks they could possibly complete before the deadline D. Distributed.net's projects and BOINC-based projects allow clients to download multiple tasks at one time. For example, BOINC allows volunteers to configure their client by specifying how much work they want to download at one time. BOINC nevertheless takes priority over the user's settings by restricting the client to download only tasks that it would complete before their deadline. This limit makes the policy considerably better: it reduces the number of uncompleted and half-finished tasks, thereby decreasing the cycles wasted on tasks that would not complete and hence improving the overall throughput of the system. Figure 6 shows the differences amongst buffer none, download early and buffer N tasks. It also shows the percentage of tasks completed using these policies.

Table 2. Projects, middleware and task retrieval policies.

Project                    Framework used to create project   Task retrieval policy
SETI@home                  BOINC                              Buffer N days
Folding@home               BOINC                              Buffer none
Einstein@home              BOINC                              Buffer N days
QMC@home                   BOINC                              Buffer N days
LHC@home                   BOINC                              Buffer N days
Rosetta@home               BOINC                              Buffer N days
Grid.org                   GridMP                             Buffer none
Climateprediction.net      BOINC                              Buffer N days
SIMAP                      BOINC                              Buffer N days
The Riesel Sieve Project   BOINC                              Buffer multiple
World Community Grid       BOINC                              Buffer multiple

Comparing wasted time for the systems: Toth (2008) derived an equation for comparing the wasted time experienced by clients under different task retrieval policies. He found that the wasted time is given by

$$\text{Wasted time} = T_D + T_U + T_W \tag{6}$$

where $T_W$ is the number of idle cycles, $T_D$ is the file downloading time, and $T_U$ is the time the client spent working on tasks that were not completed on time. It has been found that for projects using the buffer 1 method, $T_W = 0$, as at least one task is always downloaded before the last task is completed. In contrast to buffer 1, for grid.org and Xtremweb projects, $T_W$ is the sum of the idle cycles that occur while tasks are being downloaded.

Fig. 6. Differences amongst buffer none, download early and buffer N tasks. (Policies compared: buffer none, download early, buffer 1 task, and buffer 1, 3.5, 7 and 14 days of tasks; series: 4 h 300 kbps, 4 h 10 mbps, 24 h 300 kbps, 24 h 10 mbps; vertical axis: 0.00% to 3.50%.)

(viii) Super-optimal policy: in this policy, the result of the previous task is returned first and then a new task is downloaded, but the download is assumed to be instantaneous. This policy does not buffer any task. Also, it does not try to execute any task that will not be completed before its deadline. As the super-optimal policy requires knowledge of the future availability of the CPU, it cannot be implemented in volunteer computing clients (Toth and Finkel, 2009; Toth, 2008; Toth and Finkel, 2008). However, this policy gives an upper bound on the number of tasks that clients could complete in a given amount of time. The upper bound works as a standard, allowing us to find out whether there is enough motivation to try to develop better policies, and to gauge the effectiveness of other policies in VC.

For threshold-based policies, experiments of the P@H project for two applications, MFold and CHARMM (Estrada et al., 2006, 2009b), are illustrated in Fig. 7. It has been found that with an availability/reliability threshold of 0.75/0.75, the percentage of successful results is high. The figure also shows that at this value the percentage of valid results is greater and the error rate is minimal.

Fig. 7. Results of MFold and CHARMM for different availability and reliability values. (Panels: CHARMM and MFold; horizontal axis: availability/reliability thresholds 0.95/0.95, 0.75/0.75, 0.50/0.50, 0.25/0.25, 0/0 (FCFS), best and worst; series: successful, unsuccessful, valid and invalid results, in %.)

(ix) Distributed evolutionary method: this method automatically generates a set of project-independent scheduling policies with maximum throughput and minimum errors. It comprises a genetic algorithm in which the representation of individuals, the fitness function, and the genetic operators are specifically tailored to obtain effective policies in a short time (Estrada et al., 2008; Desell et al., 2010).

Both naive and knowledge-based task distribution policies use Homogeneous Redundancy (HR) for distributing multiple instances of a job. HR distributes instances of the same task to workers that are computationally equivalent, meaning that they have the same set of resources, i.e. operating system and vendor (Kondo et al., 2007; Estrada et al., 2006, 2008; Atlas et al., 2009).
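The fixed-threshold and variable-threshold policies described in this section can be sketched as follows. This is a minimal illustration, not the scheduler used in the cited studies: the `Worker` record, the function names and the adjustment step of 0.05 are assumptions made for the example. Setting both thresholds to 0 makes every worker eligible, which corresponds to the 0/0 (FCFS) case in Fig. 7.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    availability: float  # fraction of time the worker is available, in [0, 1]
    reliability: float   # fraction of valid results returned, in [0, 1]


def eligible(worker: Worker, avail_threshold: float, rel_threshold: float) -> bool:
    """Fixed-threshold rule: serve the worker only if both values reach the thresholds."""
    return (worker.availability >= avail_threshold
            and worker.reliability >= rel_threshold)


def adjust_threshold(threshold: float, pending_requests: int,
                     waiting_tasks: int, step: float = 0.05) -> float:
    """Variable-threshold rule: lower the threshold when waiting tasks outnumber
    worker requests, otherwise raise it. The step size is a hypothetical choice."""
    if pending_requests < waiting_tasks:
        threshold -= step
    else:
        threshold += step
    return min(1.0, max(0.0, threshold))  # keep the threshold in [0, 1]


if __name__ == "__main__":
    w = Worker("host-1", availability=0.8, reliability=0.9)
    print(eligible(w, 0.75, 0.75))
    print(adjust_threshold(0.75, pending_requests=2, waiting_tasks=10))
```

Lowering the threshold under load trades result quality for throughput, while raising it again when requests are plentiful restores the stricter selection.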

3.3.2. Solutions: beyond task distribution policies

As discussed in the previous section, resource grouping is important for identifying the patterns of computationally powerful, available and highly reliable resources. To classify resources, knowledge of the attributes and of their level of significance is needed so that each attribute can be assigned a proper weight. In this regard, Durrani and Shamsi (Durrani, 2013) used a transformation scheme to calculate the level of significance, known as the weight of an attribute. Mathematically, this transformation scheme $W_{Y_i}$ is given by

$$W_{Y_i} = \frac{Y_i}{\sum_{i=0}^{n} Y_i} \tag{7}$$

where $Y_i$ is the value of an attribute Y for the ith worker. Once the weights are assigned, the next step is the formation of resource groups to deal with different sets of tasks. For example, deadline-sensitive and error-sensitive tasks may be computed with the highly available A(t) and the reliable R(t) resource groups, respectively. Hence, resource grouping is important for utilizing the available resources efficiently.

(i) Resource groups for deadline sensitive set of tasks

As discussed earlier, a satisfied task is one that is completed and reported before its deadline, and a set of such tasks is called a satisfied set. To complete tasks on time, efficient task distribution is important; otherwise, a large number of resources may be wasted. In Durrani (2013), the authors classified all the resources and found that resource groups with C(t)α = 0.25 and above are more useful for massive computing and deadline-sensitive sets of tasks. As shown in Table 3 and Fig. 8(a), no significant difference is found between C(t)α = 0.25, C(t)α = 0.5, C(t)α = 0.75 and C(t)α = 0.875. Experimentally, it has been found that as task sensitivity decreases, the number of workers in the high computational resource groups, i.e. A, B and C, increases. Only 16% of the workers have been identified for deadline-sensitive tasks.

(ii) Resource groups on the basis of availability

Resource availability is essential for real-time sensitive sets of tasks, e.g. weather forecasts. Even resourceful hosts are not considered in VC if their availability is low. Regarding group-based availability, we have found the following results: (i) no worker is available for 87% or more of its life span, (ii) only 20% of the workers are available for more than 72% of their life span, (iii) only 58.34% of the workers are available for 26% or more of their life span, and (iv) only 48% of the workers are available at time t. These results are shown in Table 4 and Fig. 9. Due to this low availability of hosts, it is difficult to run delay-sensitive and real-time sets of tasks.

Table 3. Resource groups at different values of α (number of workers per class).

Class name   α = 0.125   α = 0.25   α = 0.5   α = 0.75   α = 0.875
A            90          0          0         0          0
B            278         41         44        45         44
C            454         344        317       322        317
D            560         688        724       723        724
E            422         740        746       736        746
F            260         312        281       289        281
G            111         50         63        60         63

Fig. 8. (a) Classification on the basis of attributes. Significant difference is found only at α = 0.125. Classification at all other values is the same. (b) Resource groups at α = 0.25. (c) Resource groups at α = 0.875. (Axes: C(t) classes A to G against number of workers.)


(iii) Resource groups on the basis of reliability

Result verification and prior knowledge of malicious users are important for utilizing dedicated resources efficiently; otherwise, the overall performance of the system is compromised and degraded. Classification of workers on the basis of reliability is required to deal with error-sensitive sets of jobs.

Regarding group-based reliability, we have found the following results: (i) only 29% of the total number of hosts are reliable, and (ii) only 6% of the hosts are highly reliable, available and computationally powerful. These results are presented in Table 5 and Fig. 10. Due to the small number of reliable workers, a framework is required to verify data results. The framework should also provide security for the intellectual resources of hosts participating in VC systems.
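The attribute weight of Eq. (7) in this section is a simple normalization, which the following sketch makes concrete. The function name and the sample values are hypothetical; the only assumption beyond the equation is that attribute values are non-negative and not all zero, so that the weights are well defined and sum to 1.

```python
def attribute_weights(values):
    """Eq. (7): the weight of the i-th worker's attribute value is Y_i divided
    by the sum of that attribute over all workers."""
    total = sum(values)
    if total <= 0:
        raise ValueError("attribute values must sum to a positive number")
    return [v / total for v in values]


if __name__ == "__main__":
    # Hypothetical values of one attribute (for example, CPU speed) for three workers.
    print(attribute_weights([2.0, 6.0, 2.0]))
```

Because the weights are relative shares, attributes measured in different units can be compared on the same scale before workers are grouped.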

Table 4. Resource groups are shown in the second column when the value of α = 0.125 is used. In the third column, the classification of active workers in C(t)α = 0.125 at time t is shown. In the fourth and last column, the classification is based purely on availability values.

Class name   Total number of workers,   Available workers in C(t)α = 0.125,   Availability-based classification
             C(t)α = 0.125              A(t)α = 0.125                         (availability classes)
A            90                         49                                    0
B            278                        137                                   0
C            454                        221                                   454
D            560                        272                                   626
E            422                        195                                   643
F            260                        134                                   452
G            111                        56                                    0

Fig. 9. (a) Classification on the basis of availability and currently active workers in the resource groups C(t)α = 0.125 are shown and (b) availability-based classes/resource groups are shown.

Table 5. Resource groups are shown in the second column when the value of α = 0.125 is used. In the third column, the classification on the basis of reliability is shown.

Class name   Total number of workers,   Reliability-based classification   Reliable workers in C(t)α = 0.125,
             C(t)α = 0.125              (reliability classes)              R(t)α = 0.125
A            90                         0                                  0
B            278                        0                                  0
C            454                        479                                141
D            560                        609                                174
E            422                        611                                178
F            260                        476                                138
G            111                        0                                  0

Fig. 10. Reliable workers classification in resource group C(t)α = 0.125.

3.3.3. Role of trust and security

Volunteers in VC systems are effectively anonymous. They are not connected to a real-world identity, although an email address or other information is required to register. Because of their anonymity, volunteers are not answerable to projects. In this regard, VC systems use policy-based trust for all projects. Likewise, volunteers must trust projects in several ways: (i) the volunteers trust that the middleware of the different projects will not harm their systems or attack their privacy, (ii) the volunteers trust that the project is truthful about the claimed credit, processing and storage resources (intellectual property usage), and (iii) the volunteers trust the projects to use appropriate security policies, so that hackers cannot use the project as a source for malicious activities. A successful attack could discredit all projects and degrade the overall performance of public-participation computing. A series of possible security threats has been identified for projects running under a VC platform. These threats are summarized as follows:

(i) Result falsification: the attackers return incorrect results or send false information regarding data results.

(ii) Credit falsification: the attacker returns results claiming more CPU usage than was actually used.

(iii) Malicious executable distribution: in this type of attack, the attackers take control of the distribution server, modify the database and files, and attempt to distribute their own executable (e.g. a malicious program) disguised as a science application. Further, the attacker may steal project files, email addresses and other account information of participants.

(iv) Overrun of data server: in this type of Denial of Service (DoS) attack, attackers repeatedly send large files to the servers, filling up their disks and rendering them unusable. The attackers may also exhaust other resources of the project servers, such as bandwidth and memory.

(v) Intentional abuse of participant hosts by projects: a project deliberately sends a malicious program that abuses participants' hosts, e.g. by stealing sensitive information stored in files.

(vi) Accidental abuse of participant hosts by projects: a project may also release an application that accidentally or unintentionally abuses participant hosts, e.g. by modifying configuration files within the host, deleting the host's files or causing the system to crash (Volunteer computing, 2013; Security Threats, 2012).

3.3.4. Security mechanisms

Anonymous networks are untrusted networks; hence, they need to be secured against cheating, sabotage, and espionage. Also, result verification and prior knowledge of malicious users are essential for efficiently utilizing voluntarily dedicated resources. VC systems provide the following security mechanisms to reduce the likelihood of some of these attacks:

(i) Result falsification: to prevent this type of attack, a project running under the VC platform must have the following routines: a validator, to confirm the validity of the returned results, and an assimilator, to handle valid results. To confirm the validity of returned results, redundant computing and result verification mechanisms are used. Traditional verification techniques, such as parity and checksum schemes, are not effective against pre-planned saboteurs. For this purpose, sabotage-tolerance mechanisms such as voting and spot-checking were proposed to detect the presence of malicious saboteurs. Some of the task verification schemes in use are discussed below:

(a) Voting: in voting, each workunit is sent to several hosts and the best result is selected through voting. Two types of voting are used, namely m-first voting and majority voting. In majority voting, the result group with the highest number of matching results is selected as the final result group. In m-first voting, the execution of a task is repeated until the server collects m matching results. BOINC makes use of the m-first voting method (Watanabe et al., 2011, 2010; Sarmenta, 2001; Watanabe and Fukushi, 2010). Because the same task is executed on multiple machines, voting wastes resources on redundant computing.

(b) Spot-checking: in spot-checking, the server sometimes assigns a host a spotter job whose result the server already knows, in order to randomly verify the results of the host and hence to check whether the host is a saboteur. If a host returns an erroneous result for that spotter job, the worker is caught as a saboteur. Once a host is caught as a saboteur, two methods, namely backtracking and blacklisting, are used. As discussed in (Toth, 2008; Sarmenta, 2001; Sarmenta, 2002; Watanabe and Fukushi, 2010; Watanabe et al., 2010), with backtracking all the results processed and returned by the saboteur host are invalidated, and with blacklisting the saboteur host is not allowed to return results or get any more jobs in future.

(c) Credibility-based voting: Sarmenta (Watanabe et al., 2011, 2010; Sarmenta, 2001; Watanabe and Fukushi, 2010) studied the reliability of volunteer computing systems. He pointed out that anonymous networks are untrusted networks and that they need to be secured against cheating, sabotage, and espionage. He presented a new voting-based method, known as “credibility-based voting”, which uses spot-checking. In this method, the credibility of a result for a work entry is defined as the conditional probability of that result being correct. As long as the credibility of a result is smaller than some threshold, the result for that entry is recomputed by other workers, and hence the system can mathematically guarantee that the error rate will not exceed a given acceptable value. In this method, a credibility value that represents its correctness is assigned to each worker, task, result and result group.

(ii) Credit falsification: credit falsification can be probabilistically detected using redundant computing and credit verification schemes. Credit is granted if the result returned is within the quorum of results. Other algorithms, such as calculating the mean or median of the claimed credits, are also used.

(iii) Malicious executable distribution: to prevent malicious distribution, the BOINC system uses code signing. Each project running under BOINC periodically changes its code signing key pair. The project generates a new key pair and then, using the old private key, generates a signature for the new public key. The client accepts a new public key only if it is signed with the old private key. In this way, even if attackers break into the project servers, they will not be able to distribute false keys or cause clients to accept a malicious code file.

(iv) DoS attacks on data servers: projects running under BOINC use an upload authentication key pair. Further, each result data file is assigned an associated maximum size. The data server verifies the authentication key and the file description, and then ensures that the uploaded result data file does not exceed the maximum size.

(v) Theft of participant account information: VC projects should also address the theft of private account information. All project servers should disable unused network services and must be protected by a firewall. Moreover, encrypted protocols such as SSH should be used to access these servers. Further, the servers should be subjected to regular security audits. Sophisticated attackers sniffing network traffic could obtain volunteers' account keys and use them to change volunteers' preferences or to obtain email addresses. Popular middleware systems like BOINC do nothing to prevent this eavesdropping.

(vi) Theft of project files: project data files are not encrypted by middleware such as BOINC; applications may encrypt the files themselves. This mechanism is still insecure, as the data resides in memory in clear text and can easily be accessed through a debugger.

(vii) Intentional or accidental abuse of participant hosts by projects: participants must realize that they are entrusting the security of their systems to the project. Two important approaches are used to prevent intentional abuse of volunteer hosts by projects. In the first approach, known as sandboxing, the project is banned from accessing any files outside of the BOINC environment. In the second approach, the middleware monitors the amount of resources being used by the project. If the project is using too much disk space, memory or processing time, and the resource usage is not in accordance with the file description, the project is closed.
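The two voting schemes described under result falsification can be sketched in a few lines. The sketch below is illustrative only: the function names are hypothetical, results are compared by simple equality, and it is not the validator of BOINC or of any other middleware.

```python
from collections import Counter


def majority_vote(results):
    """Majority voting: return the value of the largest group of matching
    results together with the size of that group."""
    if not results:
        raise ValueError("at least one result is required")
    value, count = Counter(results).most_common(1)[0]
    return value, count


def m_first_vote(result_stream, m):
    """m-first voting: consume results one by one until some value has been
    returned m times. Returns (value, results_used), or (None, results_used)
    if the stream ends before any value reaches m matches."""
    counts = Counter()
    used = 0
    for result in result_stream:
        used += 1
        counts[result] += 1
        if counts[result] >= m:
            return result, used
    return None, used


if __name__ == "__main__":
    print(majority_vote(["a", "b", "a", "c", "a"]))
    print(m_first_vote(iter(["x", "y", "x", "y", "y"]), m=2))
```

The second return value of `m_first_vote` shows how many executions were spent on one task, which is the redundancy cost mentioned in the text.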

Muralidharan and Kumar (2012) proposed a novel reputation management system for projects deployed in volunteer clouds. Their system uses a reputation metric based on three parameters, namely performance in finishing a job, crash history and correctness of results, to calculate the reputation of the volunteer nodes when considering them for service deployment. Participants need to trust the project that is utilizing their resources in two ways. First, they have to be assured that the project itself is secure and uses a secure infrastructure, so that the volunteer node will not become a target for executing or storing any malicious code. Secondly, they have to be assured that the project is trustworthy in its resource allocation and scheduling policies.

Efficient scheduling of computational loads on participants' hosts can also raise a project's trustworthiness. Policy-based scheduling obeys the volunteering profile of the volunteer node by either freeing its resources or reducing their usage as required. Furthermore, the chances of accidental abuse can be minimized by testing applications on all platforms before release. To improve the performance of credibility-based voting with round-robin scheduling, Watanabe et al. (2011, 2010) and Watanabe and Fukushi (2010) proposed a job scheduling method that considers the progress of each job. For each job, two metrics were defined: the expected credibility and the expected number of results. The next section describes some unexplored research challenges that need serious attention from the research community in order to enhance the utilization of volunteered resources.

4. Open problems and future research directions

While VC is gaining popularity, there are several issues that require serious consideration. For example, the existing VC platforms cannot support workunits especially designed for heterogeneous devices volunteer computing (HDVC). This is because there are different hardware architectures with varying capabilities, such as the limited power, processing speed and memory of mobile nodes. Hence, an efficient and reliable middleware needs to be designed that can incorporate both the requirements and the challenges of existing VC and HDVC. Some possible challenges of volunteer computing for heterogeneous devices are given below:

(i) Hardware heterogeneity: hardware heterogeneity means different devices with different power, memory and processing capabilities, as well as different communication interfaces (Sarmenta, 2001). As different devices have different capabilities, it is hard to classify them and design workunits accordingly. Moreover, it is also difficult to group these devices according to their capabilities in VC.

(ii) Network heterogeneity: network heterogeneity reflects environments where network interconnections do not share any common architectural characteristics (Desell et al., 2009; Schmohl and Baumgarten, 2008) and where the volunteer computing devices use different network services with different capabilities. For example, prominent network technologies are next generation networks, 3G and 4G networks, etc. Depending on the network capability or service used by heterogeneous devices, different issues may arise while submitting results to, or receiving strongly dependent, real-time sensitive workunits from, the server(s). Further, scheduling on a network requires complex rules, which makes it difficult to build a mathematical model to estimate the response time, queuing time and transfer time for highly dependent, real-time sensitive workunits. Planning reliable capacities in advance is still a hard task.

(iii) Load distribution: in the existing VC, load distribution is static. Workunits of different sizes may easily be created and distributed among volunteers. However, designing efficient workunits that are compatible with a wide range of devices, and distributing them, is a hard task.

(iv) No use of partial results: nodes receive workunits from servers and submit their results only after completely processing these workunits. The results of partially computed workunits are not considered in the middle of such processing.

(v) No intermediate result verification: in the existing VC platform, volunteers perform their required computation and send data results back to the master. The master then verifies the data results and discards bad or erroneous results. In this way, massive computation is wasted, as result verification is done at the end of processing. This is due to the lack of an intermediate result verification mechanism in the existing architecture.

(vi) Result aggregation: result verification and aggregation are of primary importance. Results obtained in the existing VC can easily be verified and aggregated. However, data results and partial results obtained from heterogeneous devices cannot be easily aggregated and verified. Hence, a framework is required to handle the aggregation of data results and partial results in HDVC.

5. Conclusion Due to the high computational needs and low participation rate of volunteers, attracting more volunteers and using their resources more efﬁciently has become extremely important, if VC is to remain a feasible method. In this paper, workers analysis, task distribution and role of trust and security have been discussed. At the end some open problems like efﬁcient load distribution, partial results, intermediate result veriﬁcation and result aggregation have been brieﬂy discussed. This paper will also enable the research community to study the available schemes and open problems and help them ﬁlling research gaps. References Anderson DP. Local scheduling for volunteer computing. In: Proceedings of the IEEE international parallel and distributed processing symposium, IPDPS 2007; 2007. Anderson DP. Emulating volunteer computing scheduling policies. In: Proceedings of the IEEE international symposium on parallel and distributed processing workshops and Ph.D. forum (IPDPSW); 2011. Anderson DP, et al. SETI@ home: an experiment in public-resource computing. Communications of the ACM 2002;45(11):56–61. Anderson, DP, Korpela, E, Walton R.. High-performance task distribution for volunteer computing. In: Proceedings of the ﬁrst IEEE international conference one-science and grid computing; 2005. Atlas J, et al., Balancing scientist needs and volunteer preferences in volunteer computing using constraint optimization. In: Proceedings of the international conference on computational science – ICCS. Springer; 2009. p. 143–52. Byun E, et al. Scheduling scheme based on dedication rate in volunteer computing environment. In: Proceedings of the 4th international symposium onparallel and distributed computing, ISPDC; 2005. Desell T, et al. Robust asynchronous optimization for volunteer computing grids. In: Proceedings of the ﬁfth IEEE international conference one-science, e-Science′ 09; 2009. Desell T, et al. Validating evolutionary algorithms on volunteer computing grids. 
Distributed applications and interoperable systems. Springer; 2010. Distributed.net. Available from: 〈http://www.distributed.net/Main_Page〉 [cited 17.04.13]. Durrani NM. Towards efﬁcient resource grouping in heterogeneity-aware volunteer computing. Stanford Undergraduate Research Journal 2013. Estrada T, et al. The effectiveness of threshold-based scheduling policies in BOINC projects. In: Proceedings of the second IEEE international conference onescience and grid computing, e-Science′06; 2006. Estrada T, Fuentes O, Taufer M. A distributed evolutionary method to design scheduling policies for volunteer computing. ACM SIGMETRICS Performance Evaluation Review 2008;36(3):40–9. Estrada T, Taufer M, Anderson DP. Performance prediction and analysis of BOINC projects: an empirical study with EmBOINC. Journal of Grid Computing 2009a;7 (4):537–54. Estrada T, Taufer M, Reed K. Modeling job lifespan delays in volunteer computing projects. In: Proceedings of the 2009 9th IEEE/ACM international symposium on cluster computing and the grid. IEEE Computer Society; 2009. Great Internet Mersenne Prime Search (GIMPS). Available from: 〈www.mersenne. org〉; 2013 [cited 17.04.13]. grid.org Available from: 〈www.grid.org〉; 2013 [cited 17.04.13]. Heien EM, Anderson DP, Hagihara K. Computing low latency batches with unreliable workers in volunteer computing environments. Journal of Grid Computing 2009;7(4):501–18. Javadi B. et al. Mining for statistical models of availability in large-scale distributed systems: an empirical study of seti@ home. In: Proceedings of the 2009 IEEE international symposium on modeling, analysis & simulation of computer and telecommunication systems MASCOTS′09; 2009.


Kondo D, Anderson DP, McLeod J. Performance evaluation of scheduling policies for volunteer computing. In: Proceedings of the IEEE international conference on e-Science and grid computing; 2007.
Kondo D, Andrzejak A, Anderson DP. On correlated availability in internet-distributed systems. In: Proceedings of the 2008 9th IEEE/ACM international conference on grid computing. IEEE Computer Society; 2008.
Lee YC, Zomaya AY, Siegel HJ. Robust task scheduling for volunteer computing systems. The Journal of Supercomputing 2010;53(1):163–81.
Muralidharan SP, Kumar V. A novel reputation management system for volunteer clouds. In: Proceedings of the IEEE international conference on computer communication and informatics (ICCCI); 2012.
Ngo SH, et al. Efficient scheduling schemes for sabotage-tolerance in volunteer computing systems. In: Proceedings of the 22nd international conference on advanced information networking and applications, AINA; 2008.
Sarmenta LF. Sabotage-tolerance mechanisms for volunteer computing systems. Future Generation Computer Systems 2002;18(4):561–72.
Sarmenta LFG. Volunteer computing. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology; 2001.
Schmohl R, Baumgarten U. Heterogeneity in mobile computing environments. In: Proceedings of the 2008 international conference on wireless networks, ICWN′08; 2008.
Security Threats. Volunteer computing environments using the Berkeley Open Infrastructure for Network Computing (BOINC). International Journal of Computer Technology and Applications 2012;3(3).
setiathome.berkeley.edu; 2013 [cited 17.04.13].
Taufer M, et al. Moving volunteer computing towards knowledge-constructed, dynamically-adaptive modeling and scheduling. In: Proceedings of the IEEE international parallel and distributed processing symposium, IPDPS; 2007.

The Quake-Catcher Network: QCN. Available from: 〈qcn.stanford.edu〉; 2013 [cited 17.04.13].
Toth D, Finkel D. Increasing the amount of work completed by volunteer computing projects with task distribution policies. In: Proceedings of the IEEE international symposium on parallel and distributed processing, IPDPS; 2008.
Toth D, Finkel D. Improving the productivity of volunteer computing by using the most effective task retrieval policies. Journal of Grid Computing 2009;7(4):519–35.
Toth DM. Improving the productivity of volunteer computing. Worcester Polytechnic Institute; 2008.
Volunteer computing. Available from: 〈https://boinc.berkeley.edu/volunteer.php〉; 2013 [cited 17.04.13].
Watanabe K, Fukushi M. Generalized spot-checking for sabotage-tolerance in volunteer computing systems. In: Proceedings of the 10th IEEE/ACM international conference on cluster, cloud and grid computing (CCGrid); 2010.
Watanabe K, Fukushi M, Horiguchi S. Expected-credibility-based job scheduling for reliable volunteer computing. IEICE Transactions on Information and Systems 2010;93(2):306–14.
Watanabe K, Fukushi M, Kameyama M. Adaptive group-based job scheduling for high performance and reliable volunteer computing. Journal of Information Processing 2011;19:39–51.
What is protein folding? Available from: 〈www.folding.stanford.edu〉 [cited 17.04.13].
Yi S, Kondo D, Anderson DP. Toward real-time, many-task applications on large distributed systems. Euro-Par 2010-parallel processing. Springer; p. 355–66.
Zheng R, Subhlok J. A quantitative comparison of checkpoint with restart and replication in volatile environments. Technical Report UH-CS-08-06. University of Houston; 2008.
