Batch allocation for decomposition-based complex task crowdsourcing e-markets in social networks


Knowledge-Based Systems xxx (xxxx) xxx

Contents lists available at ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

Batch allocation for decomposition-based complex task crowdsourcing e-markets in social networks✩

Jiuchuan Jiang a,∗, Yifeng Zhou b, Yichuan Jiang b, Zhan Bu a, Jie Cao a

a Jiangsu Provincial Key Laboratory of E-Business, School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210003, China
b School of Computer Science and Engineering, Southeast University, Nanjing 211189, China

Article info

Article history:
Received 18 June 2019
Received in revised form 10 January 2020
Accepted 13 January 2020
Available online xxxx

Keywords:
Crowdsourcing e-markets
Task allocation
Complex task
Decomposition
Social network

Abstract

In existing studies on decomposition-based complex task crowdsourcing e-markets, a complex task is first decomposed into a flow of simple subtasks, and the decomposed subtasks are then allocated independently to different individual workers. However, such retail-style independent allocation of decomposed subtasks costs much time, and the intermediate results of subtasks cannot be reused across subtasks; moreover, independent allocation considers neither the cooperation among assigned workers nor the time-dependency relations among subtasks. To solve this problem, this paper presents a novel batch allocation approach for decomposition-based complex task crowdsourcing in social networks, in which the similar subtasks of complex tasks are integrated into a batch that is allocated to the same workers. In the presented approach, a batch of subtasks is preferably allocated to workers within the same group or to workers with closer relations in a social network; moreover, the allocation considers the time constraints of subtasks so that the deadlines of the whole complex tasks can be satisfied. This batch allocation optimization problem is proved to be NP-hard. Two types of heuristic approaches are then designed: the lateral approach, which does not consider the subordination relationship between subtasks and complex tasks, and the longitudinal approach, which considers such relationships. The experiments on real-world crowdsourcing datasets show that the two presented heuristic approaches outperform the traditional retail-style allocation approach in terms of total payment by requesters, average income of assigned workers, cooperation efficiency of assigned workers, and task allocation time.

© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Crowdsourcing is a new task allocation paradigm in which requesters allocate tasks to workers chosen from a population [1–3]. As tasks grow more complex [4], involving many computational operations that individual non-professional workers cannot complete directly, complex task crowdsourcing has attracted much attention [5–8]. Among existing related studies, decomposition-based complex task crowdsourcing has been extensively studied [6,9]: a complex task is first decomposed into a flow of simple subtasks, and the decomposed subtasks are then allocated to individual workers; the results of all subtasks are aggregated to obtain the final result for the complex task.

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105522.
∗ Corresponding author. E-mail address: [email protected] (J. Jiang).

In general, existing related studies on decomposition-based complex task crowdsourcing typically have the following characteristics.

(1) Each subtask of a complex task is assigned and executed independently from scratch. This approach may involve repeatedly solving for the optimal matching between subtasks and workers, which may cost much time, especially when the number of complex tasks is high. Moreover, the requester needs to pay in full for each subtask that is completed independently from scratch.

(2) The independent allocation of different subtasks decomposed from a complex task may assign workers without past cooperation experience. Therefore, the assigned workers undertaking different subtasks may not cooperate smoothly and effectively on the whole complex task. For example, in an outsourced software-development project, the code of one worker assigned to a subtask may not be compatible with that of another worker assigned to another subtask.

https://doi.org/10.1016/j.knosys.2020.105522 0950-7051/© 2020 Elsevier B.V. All rights reserved.

Please cite this article as: J. Jiang, Y. Zhou, Y. Jiang et al., Batch allocation for decomposition-based complex task crowdsourcing e-markets in social networks, KnowledgeBased Systems (2020) 105522, https://doi.org/10.1016/j.knosys.2020.105522.


J. Jiang, Y. Zhou, Y. Jiang et al. / Knowledge-Based Systems xxx (xxxx) xxx

(3) The allocation of subtasks of a complex task does not systematically consider the effects of the time constraints of subtasks and the execution time of workers. Therefore, the deadline of a complex task may sometimes not be satisfied.

Our solution to the above drawbacks is motivated by the following observations in popular crowdsourcing e-markets.

(1) Many tasks in the same category at a complex task-oriented crowdsourcing website are often similar, and many workers undertake no more than one task contemporaneously [10]. For example, by analyzing a random sample of 4950 tasks in the category of web and mobile development at www.upwork.com, we find that 769 tasks involve WordPress, 731 tasks involve PHP, and 683 tasks involve website development; we also find that the period during which workers contemporaneously undertake no more than one task occupies 61.58% of the workers’ total duration at the Upwork website and 80.16% at the Freelancer website. Therefore, it is possible to integrate similar subtasks into a batch and allocate them to the same workers. With such an approach, significant allocation time can be saved. Moreover, the real execution cost of each subtask can be reduced since the partial execution results of one subtask can be reused by another similar subtask in the batch; thus the system can discount the real payment for each task so that the requesters pay less while the workers also receive higher payment.

(2) Workers are often connected through social networks and are organized in groups. For example, workers often communicate via phone, forums, chat, Facebook, or in person to share information about tasks and requesters, and workers can self-report their social connections to other workers [11,12]; moreover, we find that workers affiliated with a group (i.e., agency freelancers) constitute 47% of all workers at the Upwork website.
The workers within a social network or a group are often more cooperative [13,14]; thus we can try to allocate the subtasks of a complex task to workers with close social relations or within a group so that the assigned workers can cooperate smoothly and effectively.

(3) The completion time of a worker for a task can be estimated by the system [15], and it is possible to calculate the deadline of each subtask. Therefore, the allocation of each subtask should take into account the estimated time and deadline of each subtask, with the aim of satisfying the deadline of the whole complex task.

Next, we describe an example scenario on software development task crowdsourcing to demonstrate our idea. Currently, at www.upwork.com, there are three complex tasks: (1) task 1, ‘Financial Android App Development’, which can be decomposed into four subtasks, ‘UI Design’, ‘Android Dev.’, ‘Back-End Dev.’, and ‘Database’; (2) task 2, ‘Mobile app to be build for ios & andriod’, which can be decomposed into three subtasks, ‘UI Design’, ‘Android Dev.’, and ‘IOS Dev.’; and (3) task 3, ‘Java Servlet JDBC development’, which can be decomposed into four subtasks, ‘UI Design’, ‘Front-End Dev.’, ‘Back-End Dev.’, and ‘Database’. We can see that all three complex tasks involve the subtask ‘UI Design’ requiring Java and JavaScript skills, two tasks involve the subtask ‘Android Dev.’ requiring Java skill, two tasks involve the subtask ‘Back-End Dev.’ requiring Java and SQL skills, and two tasks involve the subtask ‘Database’ requiring Java and SQL skills. Therefore, the system can integrate some subtasks of different complex tasks and allocate them in batch to workers who excel in the required skills. For example, the ‘UI Design’ subtasks of the three complex tasks can be allocated in batch to a worker with excellent Java skills (such as the highest-rated Java developer at www.upwork.com), which can significantly reduce the allocation and execution time of the ‘UI Design’ subtasks in the three complex tasks. Moreover, the allocation of the four subtasks in task 1 should try to consider workers within the same group or with close social relations; the same applies to tasks 2 and 3. Certainly, the allocation of all subtasks should consider the deadlines of the three complex tasks.

Therefore, this paper presents a novel batch allocation approach for decomposition-based complex task crowdsourcing in social networks. The presented approach integrates the similar subtasks of different complex tasks into a batch and allocates them to the same workers; a batch of subtasks is preferably allocated to workers within the same group or to workers with closer relations in a social network; and the allocation considers the time constraints of subtasks so that the deadline of the whole complex task can be satisfied.

This paper is different from our previous work [10], which implemented batching directly for the complex tasks. In other words, the approach in [10] directly integrates more than one entire complex task into a batch, whereas this paper mainly focuses on the batching of subtasks decomposed from complex tasks.

In summary, the batch allocation approach presented in this paper has the following advantages: (1) the subtasks are allocated in batches, which lets requesters pay less since the task execution costs are reduced and lets workers receive higher payment since each worker undertakes more subtasks contemporaneously; (2) the assigned workers are more cooperative because they are often within the same group or have close social relations; (3) the allocation time can be significantly reduced and the deadline of a complex task can be ensured.

Finally, this paper conducts a series of experiments on real-world crowdsourcing datasets.
The experimental results show that the presented batch allocation approach outperforms traditional retail-style allocation for decomposition-based complex task crowdsourcing in terms of total payment by requesters, average income of assigned workers, cooperation efficiency of assigned workers, and task allocation time.

The remainder of this paper is organized as follows. In Section 2, we compare our work with related work; in Section 3, we present the problem description; in Section 4, we present two approaches for the batch formation and allocation of subtasks; in Section 5, we present the approach for finding assistant workers for a batch; in Section 6, we provide the experimental results; in Section 7, we present managerial implications; and finally, we conclude our paper in Section 8.

2. Related work

We now introduce four types of work that are related to the subject of this paper: retail-style task allocation in crowdsourcing, decomposition-based crowdsourcing for complex tasks, crowdsourcing-based source quality inference, and optimization in operations and logistics.

2.1. Retail-style task allocation in crowdsourcing

Retail sale is a common market mechanism in which a product is sold entirely to an individual consumer for his own use [16]. By the same rule, retail-style task allocation means that a task is assigned entirely to an individual worker who will execute the task independently [10]. Existing studies on crowdsourcing often adopt the retail-style allocation approach, in which all tasks are outsourced individually and independently. In detail, retail-style task allocation in crowdsourcing includes two types: one is the retail-style task allocation of simple tasks; the other is the retail-style task allocation of complex tasks.


In traditional crowdsourcing markets, the tasks are often simple and micro. Many crowdsourcing platforms, such as Amazon’s Mechanical Turk, are oriented to micro-tasks [17]. Micro-tasks are atomic computation operations and can be completed in minutes by non-professional individual workers [18]. In the retail-style crowdsourcing of micro-tasks, each task is redundantly allocated to more than one individual worker to improve accuracy; each assigned worker fully satisfies the required skills of the task and executes the task independently. Finally, the requester selects the correct result from the multiple answers returned by the redundantly allocated individual workers [18]. Therefore, many existing studies mainly investigated how to maximize the number of workers who are assigned to a micro-task under a predefined budget constraint or how to achieve a trade-off between budget and quality in completing a micro-task [19].

Complex tasks involve many computational operations and need multiple skills. One typical method for the retail-style crowdsourcing of complex tasks is team formation, in which a collection of people with different skills form a team to execute a complex task. In many existing studies, the team formation of workers is centrally controlled by the requester: interested candidate workers advertise their skills and bid for participation on the team. Liu et al. [20] presented an efficient method that is implemented through profitable and truthful pricing mechanisms. Kargar et al. [21] presented a team formation method to satisfy two objectives in social networks: finding a team of experts that covers all the required skills for the tasks and minimizing the communication cost between workers in the team. Moreover, a small number of studies exist on self-organized team formation, in which some workers organize a team to bid for the task [22]. For example, Lykourentzou et al. [23] presented a self-organized team formation strategy where the hired workers can select their teammates themselves. Another typical method for crowdsourcing complex tasks is to decompose a complex task into simple subtasks and then assign them to individual workers, which will be introduced in Section 2.2.

In comparison with previous retail-style task allocation in crowdsourcing, this paper integrates the similar subtasks of different complex tasks into a batch and allocates them to the same workers, which lets requesters pay less since the task execution costs are reduced and lets workers receive higher payment since each worker undertakes more subtasks contemporaneously. This paper is different from our previous work [10] because the work in [10] only considers the similarity among entire complex tasks, whereas this paper further considers the similarity and time constraints among subtasks decomposed from complex tasks.

2.2. Decomposition-based crowdsourcing for complex tasks

Decomposition allocation of a complex task means that the task is first decomposed into a flow of simple subtasks and the decomposed subtasks are then allocated to individual workers [24]. This approach is mainly used in the following two situations: in crowdsourcing systems that are oriented to micro-task markets, such as Amazon’s Mechanical Turk, and when the workers are non-professional and can only complete simple or micro-tasks.

Current related work often focuses on the decomposition methods of tasks and the distribution of budget among subtasks. Tran-Thanh et al. [24] proposed a crowdsourcing algorithm, BudgetFix, to specify the number of micro-tasks to be allocated at each phase of a workflow and dynamically allocate its budget to each micro-task. Bernstein et al. [25] used Mechanical Turk to


present a word processing interface that can be used for proofreading and editing documents, which decomposed a complex task into three stages: Find, in which workers identify patches of the requester’s task that need more attention; Fix, in which other workers revise the patches that were identified in the previous stage; and Verify, in which newly allocated workers vote on the best answers from the Fix stage and perform quality control on the revisions.

In comparison, this paper is also based on decomposition-based crowdsourcing but integrates the similar decomposed subtasks into batches for batch allocation.

2.3. Crowdsourcing-based source quality inference

Because source reliability is a key factor to consider during the task allocation process, there is much related work on crowdsourcing-based source quality inference. Next we introduce some representative works. Li et al. [26] proposed a budget allocation framework, Requallo, that allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The experiments on real-world tasks and simulated tasks both showed that the proposed framework outperforms existing methods. Zhang et al. [27] investigated the reliability of data sources in social sensing and developed a Scalable Streaming Truth Discovery (SSTD) solution to address the issues of dynamics, scalability, heterogeneity, and unpredictability in truth discovery of social sensing. The experiments on real-world data showed that the proposed SSTD solution is scalable and outperforms existing methods. Moreover, Zhang et al. [28] addressed two important challenges in truth discovery, misinformation spread and data sparsity, and then developed a robust scheme. Meng et al. [29] investigated truth discovery in crowd sensing to identify the true information and the reliable users.
They considered the correlation among entities and proposed sequential and parallel solutions. Experiments show that the proposed approach can perform well in real-world crowd sensing applications.

In comparison, the above related studies mainly considered simple tasks, whereas this paper mainly considers complex tasks. Therefore, in this paper, the decomposition of complex tasks and the dependency among subtasks bring new challenges in task allocation, such as the complexity of the algorithm and the constraints in the optimization process.

2.4. Optimization in operations and logistics

There are many studies on optimization in operations and logistics; see especially [30]. Next, we introduce some related representative works in this area. Sayyadi and Awasthi [31,32] presented a simulation-based optimization approach for identifying key determinants for sustainable transportation planning. They mainly adopted a system dynamics (SD) [33] based conceptual model to understand the system and analyzed its sensitivity. Gharaei et al. [34] investigated the optimal supply chain batch-sizing policy and presented a multiproduct, multi-buyer model with real stochastic constraints. Moreover, Gharaei et al. [35] dealt with the Joint Economic Lot-Sizing Problem (JELP) in the context of the supply chain to minimize the total inventory cost. Duan et al. [36] investigated selective maintenance scheduling under stochastic maintenance quality with multiple maintenance actions and used a simulated annealing algorithm to solve the optimization problem.


Table 1
Definitions of notations.

Notation                          Definition
$CT_m$                            A complex task
$T_m$                             The set of subtasks in $CT_m$
$R_m$                             The set of temporal sequence relationships among the subtasks in $CT_m$
$S_m$                             The set of skills necessary to complete $CT_m$
$b_m$                             The budget provided by the requester of $CT_m$
$d_m$                             The deadline of $CT_m$
$t_{mx}$                          A subtask in $CT_m$
$S_{mx}$                          The set of skills necessary to complete $t_{mx}$
$l_{mx}$                          The workload of $t_{mx}$
$b_{mx}$                          The divided budget for $t_{mx}$
$d_{mx}$                          The deadline of $t_{mx}$
$G_m$                             The graph of the temporal sequence relationships among all subtasks within $CT_m$
$\mho^1_{t_{mx}}$                 The set of subtasks with the 1st-order dependency sequence relation linking $t_{mx}$
$\bigcup_n \mho^n_{t_{mx}}$       The set of all subtasks that should be completed before $t_{mx}$
$\Omega^1_{t_{mx}}$               The set of subtasks with the 1st-order domination sequence relation linking $t_{mx}$
$\bigcup_n \Omega^n_{t_{mx}}$     The set of all subtasks that should be completed after $t_{mx}$
$I_{mx}$                          The temporal sequence importance of $t_{mx}$ within $CT_m$
$B_k$                             A batch of tasks
$w_i$                             A worker
$P_{w_i}(t_{mx})$                 Real payment to an assigned worker $w_i$ for completing subtask $t_{mx}$
$P_{w_i}(B_k)$                    The real payment that $w_i$ gets from $B_k$
$C$                               Batching scheme of subtasks
$CV_{w_i}(B_k)$                   Crowdsourcing value of $w_i$ for $B_k$
$AV_{w_i}(B_k)$                   Crowdsourcing value of $w_i$ for assisting $w_j$ to complete $B_k$
$time_{w_i}(t_{mx})$              The completion time of $t_{mx}$ by $w_i$
$time(CT_m)$                      The final completion time of $CT_m$
$\delta(t_{mx}, t_{ny})$          The skill distance between two subtasks, $t_{mx}$ ($t_{mx} \in T_m$) and $t_{ny}$ ($t_{ny} \in T_n$)
$\delta(B_1, B_2)$                The skill distance between two batches, $B_1$ and $B_2$
$N = \langle B, E \rangle$        The skill distance graph among batches, where $B$ is the set of batches and $E$ is the set of edges between all pairs of batches
$S_{w_i}$                         The set of skills possessed by $w_i$

Rabbani et al. [37] introduced a fuzzy group decision model into sustainable supply chain management, which can evaluate the sustainability performance of suppliers. Sarkar and Giri [38] investigated a stochastic supply chain model with imperfect production and a controllable defective rate, which aims to find the optimal decisions and the best investment policy to minimize the total cost. Tsao [39] presented a continuous approximation method for formulating the supply-chain-network design problem under carbon emission considerations and trade credits, and then used a nonlinear optimization algorithm to solve the problem. Dubey [40] generated theory using total interpretive structural modeling for sustainable manufacturing. Kazemi et al. [41] investigated sensitivity analysis for economic order quantity models. Yin et al. [42] used Stackelberg game theory to model the supply chain coordination problem involving one manufacturer and multiple suppliers.

In summary, the business framework in this paper is somewhat similar to the multiproduct, multi-buyer model. However, the optimization objective in these related works is to minimize related costs such as storage, movement, and carbon emission in logistics; in comparison, this paper mainly aims to minimize the payment of requesters. Moreover, this paper mainly focuses on the optimal task batching and allocation scheme, whereas those related works mainly focus on the optimal planning of the supply chain.

3. Problem description

Table 1 summarizes the notations used in this paper. We now describe the problem investigated in this work.


3.1. Decomposition of complex tasks

A complex task involves many computational operations and requires multiple skills, so it often needs a professional worker or more than one worker to complete.

Definition 1. A complex task, denoted $CT_m$, can be described by a tuple $\langle T_m, R_m, S_m, b_m, d_m \rangle$, where:

• $T_m$ denotes the set of subtasks in $CT_m$, $T_m = \{t_{mx} \mid 1 \le x \le \lambda\}$, where $\lambda$ denotes the number of subtasks in $CT_m$.
• $R_m$ denotes the set of temporal sequence relationships among the subtasks in $CT_m$. A binary sequence relationship from subtask $t_{mx}$ to subtask $t_{my}$ denotes that $t_{mx}$ should be completed before $t_{my}$.
• $S_m$ denotes the set of skills necessary to complete $CT_m$.
• $b_m$ denotes the budget provided by the requester of $CT_m$.
• $d_m$ denotes the deadline of $CT_m$.

A complex task can be decomposed into a set of subtasks.

Definition 2. Let there be a complex task $CT_m$. A subtask within $CT_m$, $t_{mx}$, can be described by a tuple $\langle S_{mx}, l_{mx}, b_{mx}, d_{mx} \rangle$, where:

• $S_{mx}$ denotes the set of skills necessary to complete $t_{mx}$.
• $l_{mx}$ denotes the workload of $t_{mx}$, which can be estimated by the system or the workers. When a worker wants to bid for a subtask, he/she will estimate the workload of the task according to the current real situation and his/her experience. In fact, the estimated completion time of a task is often used to measure the workload of that task [15].
• $b_{mx}$ denotes the divided budget for subtask $t_{mx}$.
• $d_{mx}$ denotes the deadline of $t_{mx}$, which is calculated based on the deadline of $CT_m$ and the constraints of temporal sequence relationships among the subtasks in $CT_m$.
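The two tuples of Definitions 1 and 2 map naturally onto record types. The following is a minimal sketch, not the authors' implementation: the class and field names, the string identifiers for subtasks, and the choice to derive $S_m$ as the union of the subtasks' skill sets are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """t_mx = <S_mx, l_mx, b_mx, d_mx> (Definition 2)."""
    name: str               # identifier such as "t11" (naming is an assumption)
    skills: frozenset       # S_mx
    workload: float         # l_mx, e.g. an estimated completion time
    budget: float = 0.0     # b_mx, filled in once the budget b_m is divided
    deadline: float = 0.0   # d_mx, derived later from d_m and R_m


@dataclass
class ComplexTask:
    """CT_m = <T_m, R_m, S_m, b_m, d_m> (Definition 1)."""
    subtasks: dict          # T_m, keyed by subtask name
    precedence: set         # R_m as (before, after) pairs of subtask names
    budget: float           # b_m
    deadline: float         # d_m

    @property
    def skills(self) -> frozenset:
        # Assumption: S_m is taken as the union of the subtasks' skill sets.
        return frozenset().union(*(t.skills for t in self.subtasks.values()))

    def validate(self) -> None:
        # Every relation in R_m must refer to subtasks that exist in T_m.
        for before, after in self.precedence:
            if before not in self.subtasks or after not in self.subtasks:
                raise ValueError(f"unknown subtask in relation ({before}, {after})")


# Hypothetical two-subtask task: UI design must finish before the database work.
ui = Subtask("t11", frozenset({"Java", "JavaScript"}), workload=3.0)
db = Subtask("t14", frozenset({"Java", "SQL"}), workload=2.0)
task = ComplexTask({"t11": ui, "t14": db}, {("t11", "t14")}, budget=100.0, deadline=10.0)
task.validate()
```

Keeping $R_m$ as a set of ordered pairs keeps the later graph computations (dependency and domination sets) straightforward.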


Based on our previous work on modeling the decision structures among agents [43,44], we now model the temporal sequence relationships among subtasks. These relationships can be represented as a graph in which the vertices denote the subtasks and the edges denote their temporal sequence relationships. For each subtask, the ‘in’ relations denote that it will be performed after other subtasks, and the ‘out’ relations denote that it should be completed before other subtasks.

Definition 3. The dependency sequence structure of a subtask within a complex task is defined as the set of ‘in’ relations linking this subtask with other subtasks. Let the graph of the temporal sequence relationships among all subtasks within a complex task $CT_m$ be $G_m = \langle T_m, R_m \rangle$. Then, the set of subtasks with the 1st-order dependency sequence relation linking $t_{mx}$, $t_{mx} \in T_m$, is the union of all subtasks with immediate ‘in’ links to $t_{mx}$:

\[ \mho^1_{t_{mx}} = \{ t_{my} \mid t_{my} \in T_m \wedge \langle t_{my}, t_{mx} \rangle \in R_m \} \tag{1} \]

Accordingly, the set of subtasks with the 2nd-order dependency sequence relation linking $t_{mx}$ is:

\[ \mho^2_{t_{mx}} = \{ t_{mz} \mid t_{my} \in T_m \wedge t_{mz} \in T_m \wedge \langle t_{my}, t_{mx} \rangle \in R_m \wedge \langle t_{mz}, t_{my} \rangle \in R_m \} \tag{2} \]

Then, the set of subtasks with the nth-order dependency sequence relation linking $t_{mx}$ can be defined as:

\[ \mho^n_{t_{mx}} = \{ t_{mz} \mid t_{mz} \in T_m \wedge t_{my} \in \mho^{n-1}_{t_{mx}} \wedge \langle t_{mz}, t_{my} \rangle \in R_m \} \tag{3} \]

Therefore, the set of all subtasks that should be completed before $t_{mx}$ is:

\[ \bigcup_n \mho^n_{t_{mx}} \tag{4} \]

On the other hand, a subtask can influence the start time of other subtasks; this influence is captured by the domination sequence structure (Definition 4).

Fig. 1. An example of the temporal sequence relationships among subtasks within a complex task.

Definition 5. The temporal sequence importance of a subtask, $t_{mx}$, within a complex task, $CT_m$, is defined as the ratio of the number of $t_{mx}$’s all-orders domination subtasks to the average number of all subtasks’ all-orders domination subtasks:

\[ I_{mx} = \frac{\left| \bigcup_n \Omega^n_{t_{mx}} \right|}{\frac{1}{\lambda} \sum_{t_{my} \in T_m} \left| \bigcup_n \Omega^n_{t_{my}} \right|} \tag{9} \]
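Eq. (9) can be evaluated by counting, for every subtask, the subtasks reachable from it along ‘out’ relations. The sketch below is illustrative only: the function names are hypothetical, the edge list is an assumed graph chosen to be consistent with Example 1 (Fig. 1 itself is not reproduced here), and returning 0 when no subtask has any domination subtask is an added convention, since Eq. (9) is undefined in that case.

```python
def all_orders_domination(edges, node):
    """Union over n of the nth-order domination sets: everything reachable from node."""
    successors = {}
    for before, after in edges:
        successors.setdefault(before, set()).add(after)
    seen, stack = set(), [node]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sequence_importance(subtasks, edges):
    """I_mx = |dom(t_mx)| / ((1 / lambda) * sum over t_my of |dom(t_my)|), as in Eq. (9)."""
    counts = {t: len(all_orders_domination(edges, t)) for t in subtasks}
    mean = sum(counts.values()) / len(subtasks)
    if mean == 0:
        # Assumed convention: with no precedence relations, all importances are 0.
        return {t: 0.0 for t in subtasks}
    return {t: c / mean for t, c in counts.items()}


# Assumed edges, consistent with the sets listed in Example 1.
EDGES = {("t11", "t12"), ("t11", "t14"), ("t12", "t13"),
         ("t14", "t13"), ("t13", "t15"), ("t15", "t16")}
SUBTASKS = ["t11", "t12", "t13", "t14", "t15", "t16"]
importance = sequence_importance(SUBTASKS, EDGES)
```

Because each count is divided by the mean count, the importances of a task's subtasks average to 1, so values above 1 mark subtasks that hold up more of the remaining work than average.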

The budget of the complex task is divided among its subtasks according to each subtask’s required skills, workload, and sequence importance. Let $b_m$ be the budget provided by the requester of complex task $CT_m$. Let $t_{mx}$ be a subtask in $CT_m$. The budget distributed to $t_{mx}$ is:

\[ b_{mx} = \frac{\alpha_1 \cdot I_{mx} + \alpha_2 \cdot L_{mx} + \alpha_3 \cdot \frac{|S_{mx}|}{|S_m|}}{\sum_{t_{my} \in T_m} \left( \alpha_1 \cdot I_{my} + \alpha_2 \cdot L_{my} + \alpha_3 \cdot \frac{|S_{my}|}{|S_m|} \right)} \cdot b_m \tag{10} \]

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are three parameters that determine the relative importance of the three factors, with $\alpha_1 + \alpha_2 + \alpha_3 = 1$.

Then, we design an algorithm to calculate the deadline of each subtask, shown as Algorithm 1. If a subtask has no domination subtasks, we set the subtask’s deadline to be the same as the deadline of the complex task. The other subtasks’ deadlines are then set according to their temporal sequence relations with the subtasks whose deadlines are already settled.
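The budget split of Eq. (10) and the deadline propagation described above can be sketched as follows. This is a sketch under stated assumptions rather than a reproduction of Algorithm 1, which is not listed here: the workload score $L_{mx}$ is taken as a caller-supplied number, and the backward rule used for deadlines (a subtask must finish early enough for each direct successor to run for its estimated duration before that successor's own deadline) is an assumed reading of the text. All function and parameter names are hypothetical, and the precedence graph is assumed to be acyclic.

```python
def divide_budget(total_budget, importance, workload_score, skills, all_skills,
                  alphas=(0.4, 0.3, 0.3)):
    """Split b_m over subtasks in proportion to the weighted score of Eq. (10)."""
    a1, a2, a3 = alphas
    if abs(a1 + a2 + a3 - 1.0) > 1e-9:
        raise ValueError("alpha weights must sum to 1")
    score = {
        t: a1 * importance[t] + a2 * workload_score[t]
           + a3 * len(skills[t]) / len(all_skills)
        for t in importance
    }
    total = sum(score.values())
    return {t: total_budget * s / total for t, s in score.items()}


def subtask_deadlines(task_deadline, edges, duration):
    """Assumed backward pass: d_x = min over direct successors y of (d_y - duration[y])."""
    successors = {t: set() for t in duration}
    for before, after in edges:
        successors[before].add(after)
    deadline = {}

    def settle(t):
        if t not in deadline:
            if not successors[t]:
                # No domination subtasks: inherit the complex task's deadline.
                deadline[t] = task_deadline
            else:
                deadline[t] = min(settle(y) - duration[y] for y in successors[t])
        return deadline[t]

    for t in duration:
        settle(t)
    return deadline


# Hypothetical chain a -> b -> c with illustrative scores and durations.
budgets = divide_budget(
    270.0,
    importance={"a": 2.0, "b": 1.0, "c": 0.0},
    workload_score={"a": 0.5, "b": 1.0, "c": 1.5},
    skills={"a": {"Java"}, "b": {"Java", "SQL"}, "c": {"SQL"}},
    all_skills={"Java", "SQL"},
)
deadlines = subtask_deadlines(10.0, {("a", "b"), ("b", "c")}, {"a": 1.0, "b": 2.0, "c": 3.0})
```

The normalization in `divide_budget` guarantees that the divided budgets sum to $b_m$ regardless of how the three factors are scaled.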

Definition 4. The domination sequence structure of a subtask within a complex task is defined as the set of ‘out’ relations linking this subtask with other subtasks. Then, the set of subtasks with the 1st-order domination sequence relation linking $t_{mx}$, $t_{mx} \in T_m$, is the union of all subtasks with immediate ‘out’ links to $t_{mx}$:

\[ \Omega^1_{t_{mx}} = \{ t_{my} \mid t_{my} \in T_m \wedge \langle t_{mx}, t_{my} \rangle \in R_m \} \tag{5} \]

Accordingly, the set of subtasks with the 2nd-order domination sequence relation linking $t_{mx}$ is:

\[ \Omega^2_{t_{mx}} = \{ t_{mz} \mid t_{my} \in T_m \wedge t_{mz} \in T_m \wedge \langle t_{mx}, t_{my} \rangle \in R_m \wedge \langle t_{my}, t_{mz} \rangle \in R_m \} \tag{6} \]

Therefore, the set of subtasks with the nth-order domination sequence relation linking $t_{mx}$ can be defined as:

\[ \Omega^n_{t_{mx}} = \{ t_{mz} \mid t_{mz} \in T_m \wedge t_{my} \in \Omega^{n-1}_{t_{mx}} \wedge \langle t_{my}, t_{mz} \rangle \in R_m \} \tag{7} \]

Therefore, the set of all subtasks that should be completed after $t_{mx}$ is $t_{mx}$’s all-orders domination subtasks, which is defined as:

\[ \bigcup_n \Omega^n_{t_{mx}} \tag{8} \]

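Example 1. Fig. 1 shows an example to model the dependency/domination sequence structures. Now we will consider the sequence structures of subtask $t_{13}$. We have:

\[ \mho^1_{t_{13}} = \{t_{12}, t_{14}\}, \quad \mho^2_{t_{13}} = \{t_{11}\}, \quad \bigcup_n \mho^n_{t_{13}} = \{t_{12}, t_{14}, t_{11}\}; \]

\[ \Omega^1_{t_{13}} = \{t_{15}\}, \quad \Omega^2_{t_{13}} = \{t_{16}\}, \quad \bigcup_n \Omega^n_{t_{13}} = \{t_{15}, t_{16}\}. \]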
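The recursive definitions in Eqs. (1)–(3) and (5)–(7) amount to walking the precedence graph one level at a time. The sketch below computes the per-order sets for either direction; the function name and the edge list are assumptions (the edges are one graph consistent with the sets stated in Example 1, since Fig. 1 is not reproduced here).

```python
def nth_order_sets(edges, node, direction="in"):
    """Return [1st-order set, 2nd-order set, ...] for `node`.

    direction="in"  follows Eqs. (1)-(3): subtasks that must finish before `node`.
    direction="out" follows Eqs. (5)-(7): subtasks that must finish after `node`.
    """
    step = {}
    for before, after in edges:
        if direction == "in":
            step.setdefault(after, set()).add(before)
        else:
            step.setdefault(before, set()).add(after)
    nodes = {n for edge in edges for n in edge}
    levels, current = [], {node}
    # In an acyclic graph no path is longer than the number of nodes,
    # so this bound also protects against accidental cycles in the input.
    for _ in range(len(nodes)):
        current = {z for y in current for z in step.get(y, ())}
        if not current:
            break
        levels.append(current)
    return levels


# Assumed edges, consistent with Example 1.
EDGES = {("t11", "t12"), ("t11", "t14"), ("t12", "t13"),
         ("t14", "t13"), ("t13", "t15"), ("t15", "t16")}

dependency = nth_order_sets(EDGES, "t13", "in")    # [{t12, t14}, {t11}]
domination = nth_order_sets(EDGES, "t13", "out")   # [{t15}, {t16}]
before_t13 = set().union(*dependency)              # Eq. (4)
after_t13 = set().union(*domination)               # Eq. (8)
```

Note that under these definitions a subtask may appear at more than one order if it is reachable by paths of different lengths; the unions in Eqs. (4) and (8) remove such duplicates.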

3.2. Discounting of payment due to batch allocation

This paper aims to present a new batch allocation approach for the subtasks decomposed from complex tasks. In the approach, a set of subtasks with similar skill requirements are integrated into a batch and are assigned to the same workers. However, if more than one subtask waits at the same workers, some of them will be delayed. Therefore, to encourage the requesters to accept the delayed service due to batch allocation, a discounting mechanism needs to be introduced to reduce the real payment of requesters. On the other hand, the real task execution costs can be reduced because the partial execution results of similar subtasks in a batch can be reused by each other; moreover, workers can receive higher payment since each worker undertakes more subtasks contemporaneously. Therefore, both the requesters and the workers will welcome such a batch allocation approach with a discounting mechanism.

Discounting is a general market mechanism in which the payment can be discounted in exchange for batch service [45]. The larger a batch is, the more the service is delayed and the more execution cost is saved, and thus the larger the discount. Formally, a requester’s real payment to an assigned worker $w_i$ for completing subtask $t_{mx}$ can be discounted according to the number of batched subtasks that are queueing for $w_i$, $|B_{w_i}|$. We define the discounting function as follows:

\[ P_{w_i}(t_{mx}) = \psi(|B_{w_i}|) \cdot b_{mx} \tag{11} \]

where ψ (X ) (0 ≤ ψ (X ) ≤ 1) is a discounting function decreasing monotonically from 1 to 0 with the increase of X , which is defined as exp(σ ·(−|Bwi | + 1)) and σ is a given discounting factor; bmx is the original budget of tmx . Let there be a batch of subtasks, Bk = {tmx }, allocated to a free worker wi ; If the subtasks in Bk are executed sequentially, the real payment that wi can get from batch Bk is



Pwi (Bk ) =

(ψ (x) · bmx )
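As a minimal sketch of the discounting mechanism in Eqs. (11) and (12), the following Python fragment evaluates the exponential discount and the resulting payment for a sequentially executed batch. The function names and the list-of-budgets representation are illustrative assumptions, not part of the paper.

```python
import math
from typing import Sequence


def discount(batch_size: int, sigma: float) -> float:
    """psi(X) = exp(sigma * (-X + 1)): 1 for a single subtask, smaller for larger X."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.exp(sigma * (-batch_size + 1))


def subtask_payment(budget: float, queue_length: int, sigma: float) -> float:
    """Eq. (11): real payment for one subtask when |B_wi| subtasks are queued."""
    return discount(queue_length, sigma) * budget


def batch_payment(budgets: Sequence[float], sigma: float) -> float:
    """Eq. (12): the subtask executed in position x (1-based) is paid psi(x) * budget."""
    return sum(discount(x, sigma) * b for x, b in enumerate(budgets, start=1))
```

For instance, with a discounting factor of ln 2 each additional position halves the payment, so two subtasks with a budget of 100 each yield 100 + 50 = 150 for the worker, instead of the 200 that two independent allocations would cost the requesters.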

3.3. Optimization objective

A batch of subtasks is a set of subtasks decomposed from one or more than one complex task. Let there be a set of complex tasks $\{CT_m\}$ and the set of subtasks of these complex tasks be $\{T_m\}$. Let there be a batch of subtasks, which can be denoted as: $B_k = \{t_{mx}\}$. A batching scheme of the subtasks decomposed from different complex tasks is the combination scheme of these subtasks, which is denoted as:

$$C = \Big\{B_k \;\Big|\; \bigcup_k B_k = \bigcup_m T_m\Big\} \tag{13}$$

The crowdsourcing system will assign each batch of subtasks, $B_k$, to a worker with the maximum probability to successfully complete these subtasks. The probability of completing a batch of subtasks is denoted as the crowdsourcing value of such worker for the batch of subtasks, $CV_{w_j}(B_k)$. Therefore, the batch $B_k$ will be allocated to the following worker:

$$w_i = \arg\max_{w_j \in W}\big(CV_{w_j}(B_k)\big) \tag{14}$$

where $W$ denotes a crowd of workers. The total real payment that the requesters of $B_k$ pay to worker $w_i$ is $P_{w_i}(B_k)$. Then, for a batching scheme $C$, the total real payment that the requesters pay to the worker is:

$$P_{w_i}(C) = \sum_{B_k \in C} P_{w_i}(B_k) \tag{15}$$

The main objective of this paper is to minimize the real payments of requesters by conducting a proper batch allocation for the subtasks decomposed from complex tasks. According to the discounting function in Section 3.2, the larger a batch is, the less the requesters will really pay to the assigned worker, and the more the assigned worker will also earn. Therefore, we will try to enlarge each batch; the reason is that reducing the real payments of requesters is associated with increasing the batch size, which might also lead to higher earnings of workers. However, if a batch is too large, i.e., too many subtasks are allocated to the same worker simultaneously, the following problems may arise:

(1) Some skill requirements of the subtasks in the batch may be difficult to be satisfied by the assigned worker and his/her cooperators in the social network, $Cont(w_i)$ (Constraint 1).
(2) If too many subtasks decomposed from complex tasks are waiting at the same worker, the completion time of some subtasks may exceed their deadline, which may influence the completion time of other subtasks with any domination sequence relations linking to these waiting subtasks (Constraint 2).
(3) If too many subtasks decomposed from complex tasks are waiting at the same worker, the final completion time of some associated complex tasks may exceed their deadlines (Constraint 3).
(4) The payment will be heavily discounted and the real payment may be too low to satisfy the candidate worker's reservation wage (Constraint 4).

Therefore, the final optimization objective in this paper is to minimize the real payments of requesters under the constraints that the above four problems should be solved. Formally, the objective is to find a batching scheme as follows.

$$C^* = \arg\min_{C}\Big(\sum_{B_k \in C} P_{w_i}(B_k)\Big) \tag{16}$$

where

$$w_i = \arg\max_{w_j \in W}\big(CV_{w_j}(B_k)\big) \tag{17}$$

subject to:

$$S_{mx} \subseteq \Big(\Big(\bigcup_{w_j \in Cont(w_i)} S_{w_j}\Big) \cup S_{w_i}\Big), \quad \forall B_k \in C,\ t_{mx} \in B_k \tag{18}$$

$$time_{w_i}(t_{mx}) \le d_{mx}, \quad \forall B_k \in C,\ t_{mx} \in B_k \tag{19}$$

$$time(CT_m) \le d_m \tag{20}$$

$$P_{w_i}(t_{mx}) \ge \gamma_{w_i}, \quad \forall B_k \in C,\ t_{mx} \in B_k \tag{21}$$

where $time_{w_i}(t_{mx})$ denotes the completion time of subtask $t_{mx}$ by worker $w_i$, $time(CT_m)$ denotes the final completion time of complex task $CT_m$, $P_{w_i}(t_{mx})$ denotes the real payment that $w_i$ receives from the requester of subtask $t_{mx}$, $Cont(w_i)$ denotes the contextual workers of $w_i$ in the social network, and $\gamma_{w_i}$ denotes the reservation wage of worker $w_i$. Eqs. (18)–(21) are used to satisfy Constraints 1–4 respectively.

3.4. Property analyses of the problem

Theorem 1. The batch allocation problem for decomposition-based complex task crowdsourcing with the optimization objective in Eqs. (16)–(21) is NP-hard.

Proof Sketch. The batch allocation problem for decomposition-based complex task crowdsourcing includes two sub-problems, the batching allocation of subtasks within each complex task and the one across different complex tasks. In fact, the first sub-problem is similar to the batching allocation of general tasks by additionally considering the temporal sequence relations among tasks. The batching allocation of general tasks without temporal sequence relations has already been proved to be NP-hard in previous work [10], thus the research problem on batching allocation of subtasks within a complex task is also NP-hard. Now, the batch


allocation problem for decomposition-based complex task crowdsourcing involves this NP-hard sub-problem in combination with another sub-problem by additionally considering the constraints of temporal sequence relations among subtasks and the deadline of complex tasks, thus we have Theorem 1. □

3.5. Crowdsourcing value

To measure the probability that an assigned worker successfully completes a batch of subtasks, we define the concept of crowdsourcing value, which also denotes the priority of a given worker being selected to undertake a batch of subtasks. To satisfy the optimization objective and constraints in Eqs. (16)–(21), the design of the crowdsourcing value of a worker should consider the following factors: (1) the skills possessed by the worker; (2) the estimated completion time of the worker for the subtasks in the batch; and (3) the worker's reservation wage. Based on the modeling method in our previous work [10], we now model the three factors by considering the temporal sequence importance of each subtask, as follows.

(1) The coverage degree of the worker's skills for the skills required by the subtasks in the batch. Let $B_k$ be a batch of subtasks. The skill coverage degree of $w_i$ for $B_k$ considers each subtask's temporal sequence importance in $B_k$ and the skill matching degree between $w_i$ and each subtask in $B_k$:

$$Cover(w_i, B_k) = \sum_{t_{mx} \in B_k} \Big(\frac{I_{mx}}{\sum_{t_{mx} \in B_k} I_{mx}} \cdot |S_{w_i} \cap S_{mx}|\Big) \tag{22}$$

(2) The estimated completion time of the worker for subtasks in the batch. The estimated completion time of a worker for a subtask can be estimated by the system or the worker [15]. Moreover, the temporal sequence importance of each subtask in $B_k$ should also be considered. Let $f(w_i, t_{mx})$ be the estimated completion time of worker $w_i$ for subtask $t_{mx}$. The index can be calculated as follows:

$$Est(w_i, B_k) = \Big(\sum_{t_{mx} \in B_k} \frac{f(w_i, t_{mx})}{I_{mx}}\Big) \Big/ |B_k| \tag{23}$$

(3) The occupancy rate of the worker's reservation wage on each subtask's real payment. Suppose the real payment that the requester of subtask $t_{mx}$ will pay to $w_i$ is $P_{w_i}(t_{mx})$. Let the reservation wage of $w_i$ be $\gamma_{w_i}$. Then, the occupancy rate of $w_i$'s reservation wage on batch $B_k$'s payment by considering each subtask's temporal sequence importance is

$$Occ(w_i, B_k) = \Big(\sum_{t_{mx} \in B_k} \Big(\frac{\gamma_{w_i}}{P_{w_i}(t_{mx})} \cdot \frac{\sum_{t_{mx} \in B_k} I_{mx}}{I_{mx}}\Big)\Big) \Big/ |B_k| \tag{24}$$

If the assigned worker $w_i$ has a lower $Occ(w_i, B_k)$, $w_i$ may have more potential to distribute more utility to other workers who assist $w_i$ in executing subtasks in $B_k$.

People are often connected by social networks [11,12,46]. If a worker cannot complete a batch of tasks by himself/herself, he/she needs to seek the help of other workers through his/her social networks. Moreover, the phenomenon of grouped workers is common in real crowdsourcing systems, i.e., workers are often naturally organized into groups through social connections [13,14]. The workers within a social network or a group are often more cooperative, so we can try to allocate the subtasks of a complex task to the workers with close social relations or within a group so that the assigned workers can cooperate smoothly and effectively. Therefore, we present the following definition of crowdsourcing value to measure the priority of a worker being assigned a batch of subtasks, which considers the above three factors for the following three types of workers: the worker himself/herself, the workers within the same group, and the workers in his/her social network contexts.

Definition 6. The crowdsourcing value of a worker, $w_i$, for a batch of subtasks, $B_k$, is defined by considering the three factors in Eqs. (22)–(24), shown as follows.

$$\begin{aligned}
CV_{w_i}(B_k) ={}& \alpha_1 \Big(\frac{\beta_1 \cdot Cover(w_i, B_k)}{\beta_2 \cdot Est(w_i, B_k) + \beta_3 \cdot Occ(w_i, B_k)}\Big) \\
&+ \alpha_2 \left(\frac{\sum_{w_j \in (WG(w_i) - \{w_i\})} \dfrac{\beta_1 \cdot Cover(w_j, B_k)}{\beta_2 \cdot Est(w_j, B_k) + \beta_3 \cdot Occ(w_j, B_k)} \Big/ d_{ij}}{|WG(w_i) - \{w_i\}|}\right) \\
&+ \alpha_3 \left(\frac{\sum_{w_j \in (W - WG(w_i) - \{w_i\})} \dfrac{\beta_1 \cdot Cover(w_j, B_k)}{\beta_2 \cdot Est(w_j, B_k) + \beta_3 \cdot Occ(w_j, B_k)} \Big/ d_{ij}}{|W - WG(w_i) - \{w_i\}|}\right)
\end{aligned} \tag{25}$$

where $W$ denotes the crowd of workers in the social network context and $WG(w_i)$ denotes the group containing $w_i$; both $W$ and $WG(w_i)$ include $w_i$. $d_{ij}$ denotes the distance between $w_i$ and $w_j$ in the social network, which can be defined as the length of the shortest path between the two workers in the social network. $\alpha_1 + \alpha_2 + \alpha_3 = 1$; $\beta_1 + \beta_2 + \beta_3 = 1$. Moreover, since we want to allocate the subtasks, as far as possible, to the same worker or to workers within the same group, who may be more cooperative, we can set $\alpha_1 > \alpha_2 > \alpha_3$.

Because the assigned worker may not have all necessary skills for completing all subtasks in the batch, he/she has to seek help from other workers within the same group or the social network. To measure the probability that a worker can help other workers complete the task, we have the following definition:

Definition 7. Assistant crowdsourcing value of a worker. Let $w_i$ be the worker assigned for a batch of subtasks, $B_k$. Let $Lacking\_S_{B_k}$ be the set of skills for $B_k$ that $w_i$ currently lacks. The assistant crowdsourcing value of a worker, $w_j$, for helping $w_i$ to perform the subtasks in $B_k$ is defined as:

$$AC_{w_j}(w_i, B_k) = \frac{|S_{w_j} \cap Lacking\_S_{B_k}| \,/\, |Lacking\_S_{B_k}|}{\gamma_{w_j} \cdot d_{ij}} \tag{26}$$
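The quantities in Eqs. (22)–(26) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Subtask` container, the per-subtask lists of estimated times and payments, and the convention that an empty group or an empty set of outside workers contributes zero are all choices made here, not details given in the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Subtask:
    skills: FrozenSet[str]   # S_mx, skills required by the subtask
    importance: float        # I_mx, temporal sequence importance


def cover(worker_skills: FrozenSet[str], batch: Sequence[Subtask]) -> float:
    """Eq. (22): importance-weighted number of matched skills."""
    total = sum(t.importance for t in batch)
    return sum(t.importance / total * len(worker_skills & t.skills) for t in batch)


def est(times: Sequence[float], batch: Sequence[Subtask]) -> float:
    """Eq. (23): times[k] is f(w_i, t) for the k-th subtask of the batch."""
    return sum(f / t.importance for f, t in zip(times, batch)) / len(batch)


def occ(wage: float, payments: Sequence[float], batch: Sequence[Subtask]) -> float:
    """Eq. (24): payments[k] is P_wi(t) for the k-th subtask of the batch."""
    total = sum(t.importance for t in batch)
    return sum(wage / p * total / t.importance for p, t in zip(payments, batch)) / len(batch)


def score(c: float, e: float, o: float, betas: Tuple[float, float, float]) -> float:
    """The ratio beta1*Cover / (beta2*Est + beta3*Occ) used in every term of Eq. (25)."""
    b1, b2, b3 = betas
    return b1 * c / (b2 * e + b3 * o)


def mean_over_distance(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average of score_j / d_ij; an empty set contributes 0 (assumption)."""
    return sum(s / d for s, d in pairs) / len(pairs) if pairs else 0.0


def crowdsourcing_value(own, group, others, alphas):
    """Eq. (25): own score plus distance-discounted scores of group and other workers."""
    a1, a2, a3 = alphas
    return a1 * own + a2 * mean_over_distance(group) + a3 * mean_over_distance(others)


def assistant_value(helper_skills, lacking, wage, distance):
    """Eq. (26): share of lacking skills covered, per unit of wage and distance."""
    return len(helper_skills & lacking) / len(lacking) / (wage * distance)
```

Here `group` and `others` are lists of `(score, distance)` pairs, one per worker, so the social-network distances are supplied by the caller rather than computed inside the sketch.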

4. Batch formation and allocation of subtasks

This paper presents two approaches for batch formation and allocation of subtasks: the lateral and longitudinal approaches. The lateral approach does not consider the subordination relationship between subtasks and complex tasks, whereas the longitudinal approach considers these relations when it conducts batch formation and allocation.

4.1. The lateral approach

In the lateral batch formation and allocation approach, the subtasks of different complex tasks are all considered flatly, i.e., two subtasks may be combined into a batch no matter whether they are decomposed from the same complex task. Fig. 2 shows the flow chart of the lateral approach: at first the complex tasks are decomposed into subtasks, and the decomposed subtasks are mixed together; then we conduct batch formation on the mixed subtasks regardless of their host complex tasks and assign each batch of subtasks to a worker.


Fig. 2. Flow chart of the lateral approach.

Definition 8. The skill distance between two subtasks. Let there be two subtasks, $t_{mx}$ ($t_{mx} \in T_m$) and $t_{ny}$ ($t_{ny} \in T_n$). The sets of necessary skills required by $t_{mx}$ and $t_{ny}$ are $S_{mx} = \{s_{x1}, s_{x2}, \ldots\}$ and $S_{ny} = \{s_{y1}, s_{y2}, \ldots\}$, respectively. Then, the skill distance between subtasks $t_{mx}$ and $t_{ny}$ is

$$\delta(t_{mx}, t_{ny}) = 1 - \frac{|S_{mx} \cap S_{ny}|}{|S_{mx} \cup S_{ny}|} \tag{27}$$

Let there be a set of complex tasks $\{CT_m\}$. The whole set of subtasks is $T = \{t_{mx} \mid \forall T_m \wedge (\forall t_{mx} \in T_m)\}$.

Definition 9. The skill distance between two batches of subtasks. Let there be two batches of subtasks, $B_1$ and $B_2$. The skill distance between $B_1$ and $B_2$ is the minimum skill distance between all subtasks in $B_1$ and the ones in $B_2$, shown as follows:

$$\delta(B_1, B_2) = \min_{t_{mx} \in B_1,\ t_{ny} \in B_2} \delta(t_{mx}, t_{ny}) \tag{28}$$

Definition 10. The skill distance graph among batches is a weighted graph $N = \langle B, E \rangle$, where $B$ is the set of batches and $E$ is the set of edges between all pairs of batches. The graph is fully connected, and each edge between two batches, $\langle B_1, B_2 \rangle \in E$, is associated with a weight, $Weight(\langle B_1, B_2 \rangle) = \delta(B_1, B_2)$, which denotes the skill distance between the two batches.

Now we present the algorithm for lateral batch formation and allocation, shown as Algorithm 2. In Algorithm 2, the initial weighted graph $N = \langle T, E \rangle$ is first taken as input, in which each node represents a subtask and the weight associated with each edge is the skill distance between the two subtasks. In the algorithm, we first set each subtask as an initial batch; these batches form the nodes in the graph $N$. We select the two nodes with the minimum skill distance and integrate them into a candidate batch. Then, we check whether there exists a worker satisfying the following two constraints: the estimated completion time of the worker for the subtasks in the candidate batch can satisfy the deadlines of all subtasks in the batch; and the real discounted payment by the requester of each subtask in the candidate batch can satisfy the reservation wage of the worker.

• If the constraints of deadlines of subtasks and the reservation wages of workers can be satisfied with the candidate batch, the candidate batch will be accepted. Then, the graph will be reformed by calling Algorithm 3, as follows: the batch will be regarded as a new node in the graph and the two original nodes are deleted; then we will re-calculate the skill distances among all nodes in the new graph. (Step 1)
• If the constraints cannot be satisfied with the candidate batch, we will select the two nodes with the second minimum skill distance and integrate them into a candidate batch to judge whether such a batch can satisfy the constraints. This process will be repeated until two nodes can be found to be integrated into a new batch satisfying the constraints. Then, Step 1 will be called.

The above process will be repeated until no two batches can be found to be integrated together to satisfy the constraints. Algorithm 4 is used to integrate two batches, which is implemented according to the deadlines of the subtasks in the two batches.
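The merging procedure described above can be sketched in Python as follows. This is a hedged illustration rather than the paper's Algorithm 2: the `feasible` callback is a hypothetical stand-in for the check that some worker can meet the deadlines and reservation-wage constraints of a candidate batch, and the graph is rebuilt implicitly by recomputing all pairwise batch distances in each round.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Batch = FrozenSet[str]  # a batch is a set of subtask identifiers


def skill_distance(s1: Set[str], s2: Set[str]) -> float:
    """Eq. (27): one minus the Jaccard similarity of the two skill sets."""
    union = s1 | s2
    return 1.0 - len(s1 & s2) / len(union) if union else 0.0


def batch_distance(b1: Batch, b2: Batch, skills: Dict[str, Set[str]]) -> float:
    """Eq. (28): minimum skill distance over all cross-batch subtask pairs."""
    return min(skill_distance(skills[x], skills[y]) for x in b1 for y in b2)


def lateral_batches(skills: Dict[str, Set[str]],
                    feasible: Callable[[Batch], bool]) -> List[Batch]:
    """Greedily merge the closest pair of batches that passes the feasibility check."""
    batches: List[Batch] = [frozenset([t]) for t in skills]  # one batch per subtask
    while True:
        # Consider pairs in ascending order of skill distance.
        pairs = sorted(combinations(batches, 2),
                       key=lambda p: batch_distance(p[0], p[1], skills))
        for b1, b2 in pairs:
            merged = b1 | b2
            if feasible(merged):
                # Replace the two original nodes with the merged node.
                batches = [b for b in batches if b not in (b1, b2)] + [merged]
                break
        else:
            return batches  # no pair can be merged without violating constraints
```

A caller would supply `feasible` from its own worker model; for a quick experiment, a size cap such as `lambda b: len(b) <= 2` is enough to see subtasks with identical skill sets merged first.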

Theorem 2. Let the batching scheme of the subtasks using our presented lateral approach be $C$. Let the mean skill distance between all pairs of subtasks within a batch $B_j$ be $md(B_j)$:

$$md(B_j) = \frac{1}{2 \cdot |B_j| \cdot (|B_j| - 1)} \sum_{t_{mx},\, t_{ny} \in B_j} \delta(t_{mx}, t_{ny})$$


Fig. 3. Flow chart of the longitudinal approach.

It is assumed that there is another batching scheme of the subtasks, $C'$, which can satisfy all constraints in Eqs. (18)–(21). Then, we have:

$$\neg\big((\exists B_j \in C \wedge \exists B_k \in C') \Rightarrow (B_j \subseteq B_k \wedge md(B_j) > md(B_k))\big)$$

Proof Sketch. We can use reductio ad absurdum to prove Theorem 2. Assume there is another batching scheme, $C'$, which can satisfy all constraints in Eqs. (18)–(21). If there is a batch, $B_k$, in $C'$, which contains a batch in $C$, $B_j$, it means that there exist two batches with less skill distance that can satisfy all constraints but are not integrated into a batch, while two other batches with higher skill distance that can satisfy all constraints are integrated into a batch. However, from Step 4 in Algorithm 2, we can see that in each round we first consider the batch pair with less skill distance; only when the batch pair with less skill distance cannot satisfy the constraints will the batch pair with higher skill distance be considered. Therefore, the above assumption cannot hold when using Algorithm 2. Thus, we have the theorem. □

4.2. The longitudinal approach

The subtasks decomposed from the same complex task may call the results of each other. If the subtasks decomposed from the same complex task are undertaken by the same workers, the results of these subtasks may be more compatible. Therefore, we here present a longitudinal batch formation and allocation approach considering the host complex tasks of subtasks. Fig. 3 shows the flow chart of the longitudinal approach. In this approach, we first try to implement batch formation and allocation within a complex task. After the batch formation and allocation within all complex tasks are finished, we conduct batch formation and allocation for the existing batches of different complex tasks. Therefore, the longitudinal approach includes two parts: one is the batch formation and allocation within each complex task, and the other is the batch formation and allocation among different complex tasks. The batch formation and allocation within each complex task is based on the temporal sequence among subtasks.
At first we select the subtask which has the minimum deadline to form an initial batch; then, we select the subtask which has the minimum deadline among the remaining subtasks that have not been integrated into any batch; if the constraints in Eqs. (18)–(21) can be satisfied, the subtask is integrated into the batch. This process is repeated until no more subtasks satisfying the constraints can be found for the batch. Then, we select a new subtask with the minimum deadline from the remaining subtasks to form a new batch and conduct the above iterations for the new batch. Finally, every subtask is integrated into some batch. The whole process is shown as Algorithm 5; the time complexity of Algorithm 5 is $O(|T_m|^2)$.
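The deadline-ordered batching within one complex task can be sketched as below. This is an illustrative reading of the description, not the paper's Algorithm 5: `feasible` is a hypothetical callback standing in for the constraint check of Eqs. (18)–(21), and a single pass per batch is used on the assumption that adding a subtask never turns an infeasible batch into a feasible one.

```python
from typing import Callable, Dict, List


def batches_within_task(deadlines: Dict[str, float],
                        feasible: Callable[[List[str]], bool]) -> List[List[str]]:
    """Form batches inside one complex task, always starting from the earliest deadline."""
    remaining = sorted(deadlines, key=deadlines.get)  # subtask ids, earliest deadline first
    batches: List[List[str]] = []
    while remaining:
        batch = [remaining.pop(0)]        # seed the batch with the earliest deadline
        leftover: List[str] = []
        for subtask in remaining:         # try the others in ascending deadline order
            if feasible(batch + [subtask]):
                batch.append(subtask)
            else:
                leftover.append(subtask)
        remaining = leftover
        batches.append(batch)
    return batches
```

Each batch scans the remaining subtasks once, so the number of feasibility checks is at most quadratic in the number of subtasks, in line with the stated complexity.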

After batch formation is finished for the subtasks within each complex task, we design the algorithm for batch formation and allocation among complex tasks, shown as Algorithm 6. In Algorithm 6, $time(B_k)$ denotes the estimated completion time of a batch, which is the maximum deadline of all subtasks within the batch. In the algorithm, the batching of multiple batches is implemented according to the deadlines of the batches.


Let there be a set of original batches, $B$. At first we select the batch with the minimum deadline to form an initial host batch; then the other batches are considered for integration into this host batch in ascending order of their deadlines, by checking whether the constraints can be satisfied. If the host batch cannot be enlarged any more, we select the batch with the minimum deadline from the remaining original batches to form a new host batch and repeat the process. This batching process is repeated until all original batches are considered.

Theorem 3. Let there be two subtasks in complex task $CT_m$, $t_{mx}$ and $t_{my}$. If $t_{mx}$ and $t_{my}$ are not integrated into the same batch during the batch formation within $CT_m$, they are also not integrated into the same batch during the batch formation between $CT_m$ and other complex tasks.

Proof Sketch. If $t_{mx}$ and $t_{my}$ are not integrated into the same batch during the batch formation within $CT_m$, it means that at least one of the following situations occurs for the batch including only $t_{mx}$ and $t_{my}$: (1) the deadline of $t_{mx}$ cannot be satisfied; (2) the deadline of $t_{my}$ cannot be satisfied; (3) no workers can accept the real payment from $t_{mx}$; and (4) no workers can accept the real payment from $t_{my}$. Now, if any subtasks of other complex tasks are considered to be integrated into the batch of $t_{mx}$ and $t_{my}$, the new completion times of $t_{mx}$ and $t_{my}$ will not be less than the original ones; moreover, the new real payments of $t_{mx}$ and $t_{my}$ may also be reduced since the new batch is larger. Therefore, the constraints of $t_{mx}$ or $t_{my}$ in the new batch cannot be satisfied by any worker. Thus, we have the theorem. □

5. Finding assistant workers for a batch

After each batch is assigned to a principal worker by calling Algorithm 2 or 6, the assigned principal worker will seek other assistant workers, since the principal worker alone may not possess all necessary skills to complete the assigned batch of tasks.
Some recent notable studies [11,12] have shown that workers often communicate via social networks and that workers can self-report their social connections to other workers. Because the workers within a social network are often cooperative [46,47], this paper assumes that the assigned principal worker will seek other assistant workers from his/her social network. Moreover, current workers are often naturally organized into groups through social networks [13,14]. For example, at www.upwork.com, we find that the workers affiliated with any groups (i.e., agency freelancers) can constitute 47% of the total workers; at the Github website, there were 226 449 groups registered within half a year [13]. Because the workers within the same group have common characteristics and often have rich cooperation experience, it is more probable that these workers can cooperate smoothly and effectively in performing a task.

Now we design the algorithm for finding assistant workers to complete a batch. At first the assigned principal worker seeks assistant workers within his/her group; only when the principal worker cannot find enough assistant workers within his/her group will he/she seek other assistant workers outside the group from the whole social network. Moreover, to minimize the communication cost between allocated workers, which significantly influences performance in completing the outsourced task [21], we let the principal worker seek assistant workers from near to far, both within and outside the group, in his/her social network. The algorithm is shown as Algorithm 7.
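The group-first, near-to-far search described above can be sketched as follows. It is a hedged illustration and not the paper's Algorithm 7: the adjacency-list `graph`, the `skills` mapping, and the greedy rule of accepting any candidate who covers at least one lacking skill are assumptions made here, and hop count from a breadth-first search is used as the notion of distance.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def bfs_order(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Workers reachable from start, listed by increasing hop distance."""
    seen = {start}
    queue = deque([start])
    order: List[str] = []
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


def find_assistants(principal: str, lacking: Set[str],
                    graph: Dict[str, List[str]],
                    skills: Dict[str, Set[str]],
                    group: Set[str]) -> Tuple[List[str], Set[str]]:
    """Pick assistants for the lacking skills: group members first, nearest first."""
    still_lacking = set(lacking)
    by_distance = bfs_order(graph, principal)
    candidates = ([w for w in by_distance if w in group] +
                  [w for w in by_distance if w not in group])
    chosen: List[str] = []
    for worker in candidates:
        if not still_lacking:
            break
        covered = skills[worker] & still_lacking
        if covered:                      # the worker contributes at least one skill
            chosen.append(worker)
            still_lacking -= covered
    return chosen, still_lacking         # leftover skills could not be covered
```

The second return value makes it explicit when the reachable part of the network cannot supply every lacking skill, a case the caller would have to handle separately.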

6. Experiments

We now evaluate our approaches by comparing them with the traditional benchmark retail-style approach, which assigns each decomposed subtask individually and independently. Moreover, we compare our two batch allocation approaches: the lateral and longitudinal batch formation and allocation approaches. We collect the data of complex tasks from Upwork.com and decompose them into subtasks. For example, ‘Financial Android App Development’ can be decomposed into four sub-tasks, ‘UI


Design’, ‘Android Dev.’, ‘Back-End Dev.’, and ‘Database’; ‘Mobile app to be build for ios & andriod’ can be decomposed into three subtasks, ‘UI Design’, ‘Android Dev.’, and ‘IOS Dev.’; ‘Java Servlet JDBC development’ can be decomposed into four sub-tasks, ‘UI Design’, ‘Front-End Dev.’, ‘Back-End Dev.’, and ‘Database’. By referring to the previous work [10] and considering the optimization objective of this paper, we define four indices to evaluate the performance of the different approaches, as follows.

• Total Payment by Requesters: We define this index as the sum of all requesters' real payments to evaluate our optimization objective of minimizing the requesters' total real payment.
• Average Income of Assigned Workers: We define this index as the average of all assigned workers' real earnings within a given duration to evaluate another aspect of our optimization objective: improving the real hourly wages of workers.
• Cooperation Efficiency of Assigned Workers: We use a weighted graph to denote the cooperation relations among assigned workers. If two assigned workers are in the same group, there is an edge between the two workers; the weight of an edge is 0.5 if the two workers undertake subtasks attributed to the same complex task, or 1 if they undertake subtasks attributed to different complex tasks. If there are any isolated workers in the graph, the distance of these workers to other workers would be infinite; however, to ensure that all workers are connected, we assign a large value, 10, to connect the isolated workers with the main branch of the graph. We use the inverse of the minimum distance between two workers in the graph to denote their cooperation efficiency, and we use the average of the pairwise efficiencies of all pairs of workers in the set of assigned workers to denote the cooperation efficiency of all assigned workers.
• Task Allocation Time (Running Time of Algorithm): This index is used to measure the computational efficiency of the task allocation approaches. The running time of our approaches includes the batch formation and allocation processes.
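The cooperation efficiency index can be sketched in Python as below. It is a simplified illustration under an explicit assumption: instead of adding an edge of weight 10 from each isolated worker to the main branch, any pair of workers with no connecting path is simply given distance 10. The edge dictionary and function name are likewise chosen here for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def cooperation_efficiency(workers: List[str],
                           edges: Dict[Tuple[str, str], float],
                           isolated_distance: float = 10.0) -> float:
    """Average inverse shortest-path distance over all pairs of assigned workers."""
    inf = float("inf")
    dist = {u: {v: (0.0 if u == v else inf) for v in workers} for u in workers}
    for (u, v), weight in edges.items():   # 0.5: same complex task, 1: different tasks
        dist[u][v] = dist[v][u] = min(dist[u][v], weight)
    for k in workers:                      # Floyd-Warshall all-pairs shortest paths
        for i in workers:
            for j in workers:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    pairs = list(combinations(workers, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        d = dist[u][v]
        if d == inf:                       # unreachable pair: use the large fixed distance
            d = isolated_distance
        total += 1.0 / d
    return total / len(pairs)
```

Floyd-Warshall is used only because it is short; for large sets of assigned workers, a per-source shortest-path search would be the more economical choice.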

Fig. 4. The performance on total payments by requesters.

Fig. 5. The performance on average income of workers.

Our experiments are implemented in Python 2.7 and tested on an Intel(R) Core(TM) i7-4770 CPU at 3.4 GHz with 16 GB of memory. Next we give the experimental results on the above four performance indices for our two batching approaches and the traditional benchmark retail-style approach.

6.1. Tests on the performance of three approaches

Fig. 4 shows the experimental results on the total payment by requesters. From the results, we can see that both of our approaches outperform the retail-style approach in terms of the total payment by requesters; therefore, our batch approaches achieve the optimization objective better than the traditional retail-style approach. Moreover, our lateral approach outperforms the longitudinal approach in terms of this performance index; the potential reason is that the lateral approach conducts batching on all subtasks no matter whether the subtasks are attributed to the same complex tasks, thus the average size of batches formed by the lateral approach is larger than that formed by the longitudinal approach. Fig. 5 shows the experimental results on the average income of workers. From the results, we can see that both of our approaches outperform the retail-style approach in terms of the average income of workers; and we also see that the lateral approach

Fig. 6. The performance on cooperation efficiency.

outperforms the longitudinal approach in terms of this performance index. The reason is the same as the above one for the performance index of total payment by requesters. Therefore, it shows that the optimization objective of reducing total payment by requesters is compatible with the one of improving average income of workers. Fig. 6 shows the experimental results on the cooperation efficiency among assigned workers. From the results, we can see that both of our approaches can achieve assigned workers with


higher cooperation efficiency than the traditional retail-style approach. Moreover, our longitudinal batching approach outperforms the lateral batching approach in terms of the cooperation efficiency among assigned workers; the potential reason is that the longitudinal batching approach considers the subordination relation between subtasks and their host complex tasks, thus it is more probable that two adjacent workers undertake the subtasks attributed to the same complex task. Fig. 7 shows the experimental results on the task allocation time. From the results, we can see that both of our approaches can significantly reduce the task allocation time; the reason is that we make the allocation for a batch of subtasks at once, so that significant allocation time can be saved. However, there is no obvious difference between the lateral and longitudinal batching approaches.

Fig. 8 shows the tests comparing the approach considering the grouping of workers and the approach without considering the grouping of workers. In the former approach, the assigned principal worker seeks other assistant workers according to Algorithm 7, i.e., he/she first seeks assistant workers within his/her group; only when he/she cannot find enough assistant workers within his/her group will he/she seek other assistant workers outside the group from the whole social network. In the latter approach, the assigned principal worker seeks other assistant workers directly from the whole set of workers, which is often explored in previous work [46]. From the results, we can see that our approach using Algorithm 7 can reduce the total communication time of completing tasks and improve the cooperation efficiency of workers compared with the approach without considering the grouping of workers.

Fig. 7. The performance on task allocation time.

6.2. Tests on the effects of the number of crowd sources

In the real world, the crowd is dynamic and the number of crowd sources may often vary over time. Now we conduct a new series of experiments to test the effects of the number of crowd sources (i.e., the number of workers), shown in Table 2. From the experimental results, we can see that three performance indices (total payments by requesters, average income of workers, cooperation efficiency) improve as the number of workers increases, i.e., the total payments by requesters decline, and the average income and cooperation efficiency of workers increase. The reason is that a larger number of workers results in more available candidate workers, thus more suitable workers can be selected to optimize the three performance indices. However, when the worker number exceeds a certain degree, the three performance indices remain relatively steady; the reason is that a majority of suitable workers can already be selected within a certain number of candidate workers. Moreover, the task allocation time always increases with the number of workers. The reason is that the task allocation considers all candidate workers, thus more candidate workers result in more task allocation time.

7. Managerial implication

Crowdsourcing is a new business solution for outsourcing, which means that tasks can be outsourced by a new business model that can harness the skills, knowledge and other resources of a crowd of people via an open call [47]. Therefore, crowdsourcing has been significantly investigated in the management area [48–51]. Many laws and mechanisms have been used to ensure the management efficiency of crowdsourcing. For example, regardless of the identities of the crowdsourcing companies and requesters, one of their most important objectives is to reduce costs; thus, many related studies have investigated efficient management mechanisms for realizing this objective [54].
In general, the managerial perspective can enable crowdsourcing systems to have managerial agilities that provide efficient and economic tools for utilizing human intelligence for real management applications; moreover, it can effectively utilize economic and management forces to shape crowdsourcing systems, which is crucial for the success of crowdsourcing markets. At current popular crowdsourcing websites, the outsourced complex tasks are often large scale. Therefore, it is a challenge to manage these large-scale crowdsourcing tasks. Traditional decomposition-based retail style crowdsourcing mechanism

Fig. 8. Tests comparing the approaches considering/without considering grouping of workers.

Please cite this article as: J. Jiang, Y. Zhou, Y. Jiang et al., Batch allocation for decomposition-based complex task crowdsourcing e-markets in social networks, KnowledgeBased Systems (2020) 105522, https://doi.org/10.1016/j.knosys.2020.105522.

J. Jiang, Y. Zhou, Y. Jiang et al. / Knowledge-Based Systems xxx (xxxx) xxx

13

Table 2 Effects of the number of crowd sources. Worker number

400

500

600

700

800

900

1000

Total payments by requesters

① ② ③

180 305 120 890 150 982

174 125 103 408 139 901

164 125 90 231 120 895

150 058 80 000 110 000

149 058 76 027 100 098

140 058 71 293 98 705

139 199 70 902 96 091

Average income of workers

① ② ③

583 890 710

692 997 866

758 1054 937

806 1110 1058

831 1204 1101

849 1286 1128

852 1299 1133

Cooperation efficiency

① ② ③

0.112 0.379 0.451

0.135 0.391 0.521

0.158 0.426 0.598

0.163 0.465 0.671

0.165 0.470 0.681

0.166 0.472 0.685

0.166 0.473 0.686

Task allocation time

① ② ③

183 92 95

216 102 99

239 121 125

252 131 129

294 149 158

325 163 167

391 192 203

① Retail-style allocation. ② Lateral batching allocation. ③ Longitudinal batching allocation.

needs to decompose each complex task into a series of simple sub-tasks, which may produce massive computational costs when the complex tasks are large scale; moreover, the independent allocation and execution of subtasks are difficult to manage. With the proposed approaches in this paper, similar subtasks are allocated and executed in a batch, which can effectively manage the crowdsourcing of large-scale tasks to reduce the computational costs and the real total payments of requesters. Moreover, the proposed approaches are sensitive to the numbers of crowd sources and tasks, and they outperform the traditional retail style crowdsourcing mechanism when the numbers of crow sources and tasks increase. Therefore, it is concluded that the proposed approaches can be applied to effectively manage the large-scale crowdsourcing e-markets. 8. Conclusion In previous decomposition-based complex task crowdsourcing where each complex task is decomposed into a set of subtasks, the retail-style task allocation approach is often used in which each subtask is allocated independently to individual workers. However, such traditional approach has the following typical drawbacks: independent allocation of decomposed subtasks will cost much time and the intermediate results of subtasks cannot be utilized by each other; the independent allocation does not consider the cooperation among assigned workers and the time constraints among subtasks. To solve the first drawback, this paper presents a novel batch allocation approach that can integrate similar subtasks into a batch and allocate such batch to the same workers. To solve the second drawback, the assigned workers will seek other assistant workers within the same group or with closer relations in a social network; moreover, the time constraints of subtasks are considered in the allocation. Then, two heuristic approaches are presented, the lateral and longitudinal approaches. 
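The contrast between retail-style and batch allocation summarized above can be illustrated with a minimal Python sketch. It is not the lateral or longitudinal heuristic of this paper: it assumes that each subtask needs exactly one skill, that subtasks with the same skill form one batch, and that a worker charges the full wage for the first subtask of a batch and a discounted wage for each further one. All names (`Subtask`, `Worker`, `discount`, `form_batches`, `allocate_batches`, `retail_cost`) and all numbers are hypothetical, and social-network relations among workers are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    task_id: str   # complex task the subtask was decomposed from
    skill: str     # single required skill (simplifying assumption)
    deadline: int  # latest allowed finish time


@dataclass(frozen=True)
class Worker:
    name: str
    skills: frozenset
    wage: float      # asking price for one subtask
    discount: float  # assumed price factor for each further subtask in a batch


def form_batches(subtasks):
    """Group subtasks needing the same skill; earliest deadline first."""
    batches = defaultdict(list)
    for s in sorted(subtasks, key=lambda s: s.deadline):
        batches[s.skill].append(s)
    return dict(batches)


def batch_cost(worker, size):
    """Full wage for the first subtask, discounted wage for each further one."""
    return worker.wage * (1 + (size - 1) * worker.discount)


def allocate_batches(batches, workers):
    """Give each batch to the qualified worker with the lowest batch cost."""
    allocation = {}
    for skill, batch in batches.items():
        qualified = [w for w in workers if skill in w.skills]
        if not qualified:
            continue  # this sketch simply leaves such a batch unallocated
        best = min(qualified, key=lambda w: batch_cost(w, len(batch)))
        allocation[skill] = (best.name, batch_cost(best, len(batch)))
    return allocation


def retail_cost(subtasks, workers):
    """Retail-style baseline: every subtask is paid separately at full wage."""
    total = 0.0
    for s in subtasks:
        wages = [w.wage for w in workers if s.skill in w.skills]
        if wages:
            total += min(wages)
    return total


# Hypothetical toy instance: two complex tasks, four subtasks, two workers.
subtasks = [
    Subtask("T1", "translate", 5),
    Subtask("T1", "design", 8),
    Subtask("T2", "translate", 3),
    Subtask("T2", "translate", 9),
]
workers = [
    Worker("A", frozenset({"translate"}), 10.0, 0.5),
    Worker("B", frozenset({"translate", "design"}), 12.0, 0.5),
]

allocation = allocate_batches(form_batches(subtasks), workers)
batch_total = sum(cost for _, cost in allocation.values())
retail_total = retail_cost(subtasks, workers)
print(allocation, batch_total, retail_total)
```

In this toy instance the batch total is lower than the retail total only because of the assumed discount; the sketch makes no claim about the size of the savings reported in the experiments.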
Extensive experiments on real-world datasets were then conducted. The results show that, compared with the traditional benchmark retail-style allocation approach, the two presented approaches improve performance in terms of total payment by requesters, average income of assigned workers, cooperation efficiency of assigned workers, and task allocation time. Moreover, a comparison between the two presented approaches shows that the lateral approach achieves better performance in terms of total payment by requesters and average income of assigned workers, whereas the longitudinal approach achieves better performance in terms of cooperation efficiency of assigned workers. Finally, a series of experiments testing the effects of the number of crowd sources (i.e., the number of workers) shows that the three performance indices improve as the number of workers increases.

In the future, we will consider the situation in which the groups and social networks of workers are dynamic. Moreover, this paper focuses only on the optimization of allocation after decomposition; in the future, we will study how to optimize the decomposition of complex tasks together with the batch allocation of the decomposed subtasks.

CRediT authorship contribution statement

Jiuchuan Jiang: Methodology, Conceptualization, Formal analysis, Writing - original draft. Yifeng Zhou: Investigation, Validation, Software. Yichuan Jiang: Formal analysis, Methodology, Writing - review & editing. Zhan Bu: Software, Visualization. Jie Cao: Writing - review & editing.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2019YFB1405000), the National Natural Science Foundation of China (No. 61932007, No. 61806053, No. 61807008, No. 61472079, No. 71871109, and No. 91646204), and the Natural Science Foundation of Jiangsu Province of China (No. BK20171363, BK20180356, BK20180369).

References

[1] L. Shamir, D. Diamond, J. Wallin, Leveraging pattern recognition consistency estimation for crowdsourcing data analysis, IEEE Trans. Hum.-Mach. Syst. 46 (3) (2016) 474–480.
[2] J.C. Bongard, P.D.H. Hines, D. Conger, P. Hurd, Z. Lu, Crowdsourcing predictors of behavioral outcomes, IEEE Trans. Syst. Man Cybern. Syst. 43 (1) (2013) 176–185.
[3] J. Zhang, X. Wu, V.S. Sheng, Active learning with imbalanced multiple noisy labeling, IEEE Trans. Cybern. 45 (5) (2015) 1095–1107.
[4] K. Mao, L. Capra, M. Harman, Y. Jia, A survey of the use of crowdsourcing in software engineering, J. Syst. Softw. 126 (2017) 57–84.
[5] Z. Pan, H. Yu, C. Miao, C. Leung, Efficient collaborative crowdsourcing, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI-16, Phoenix, Arizona, USA, February 12–17, 2016, pp. 4248–4249.
[6] Long Tran-Thanh, Trung Dong Huynh, Avi Rosenfeld, Sarvapali D. Ramchurn, Nicholas R. Jennings, Crowdsourcing complex workflows under budget constraints, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI-15, Austin, Texas, USA, Jan 25–30, 2015, pp. 1298–1304.
[7] A. Kittur, B. Smus, R. Kraut, Crowdforge: crowdsourcing complex work, in: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST-11, Santa Barbara, CA, USA, October 16–19, 2011, pp. 43–52.
[8] M.A. Valentine, D. Retelny, A. To, N. Rahmati, T. Doshi, M.S. Bernstein, Flash organizations: crowdsourcing complex work by structuring crowds as organizations, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI-17, Denver, Colorado, USA, May 6–11, 2017, pp. 3523–3537.


[9] A. Kulkarni, M. Can, B. Hartmann, Collaboratively crowdsourcing workflows with turkomatic, in: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW-12, Seattle, Washington, USA, February 11–15, 2012, pp. 1003–1012.
[10] J. Jiang, B. An, Y. Jiang, P. Shi, Z. Bu, J. Cao, Batch allocation for tasks with overlapping skill requirements in crowdsourcing, IEEE Trans. Parallel Distrib. Syst. 30 (8) (2019) 1737.
[11] M. Yin, M.L. Gray, The communication network within the crowd, in: Proceedings of the 25th International World Wide Web Conference, WWW-16, Montreal, Canada, April 11–15, 2016, pp. 1293–1303.
[12] M.L. Gray, S. Suri, The crowd is a collaborative network, in: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing, CSCW-16, San Francisco, USA, February 27–March 02, 2016, pp. 134–147.
[13] J. Jiang, B. An, Y. Jiang, C. Zhang, Z. Bu, J. Cao, Group-oriented task allocation for crowdsourcing in social networks, IEEE Trans. Syst. Man Cybern. Syst. (2019) http://dx.doi.org/10.1109/TSMC.2019.2933327 in press.
[14] C. Wang, L. Cao, C.H. Chi, Formalization and verification of group behavior interactions, IEEE Trans. Syst. Man Cybern. Syst. 45 (8) (2015) 1109–1124.
[15] J. Wang, S. Faridani, P.G. Ipeirotis, Estimating the completion time of crowdsourced tasks using survival analysis models, in: Proceedings of the WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining, CSDM-11, Hong Kong, China, Feb. 9, 2011.
[16] B. Afshar-Nadjafi, H. Mashatzadeghan, A. Khamseh, Time-dependent demand and utility-sensitive sale price in a retailing system, J. Retail. Consum. Serv. 32 (2016) 171–174.
[17] J.T. Jacques, P.O. Kristensson, Crowdsourcing a HIT: Measuring workers’ pre-task interactions on microtask markets, in: Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing, HCOMP-13, Palm Springs, CA, USA, November 7–9, 2013, pp. 86–93.
[18] M.C. Yuen, I. King, K.S. Leung, A Survey of crowdsourcing systems, in: Proceedings of the IEEE Third International Conference on Social Computing, SocialCom-11, Boston, USA, Oct. 9–11, 2011, pp. 766–773.
[19] Long Tran-Thanh, M. Venanzi, A. Rogers, N.R. Jennings, Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks, in: Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, AAMAS-13, Saint Paul, USA, May 6–18, 2013, pp. 901–908.
[20] Qing Liu, Tie Luo, An Efficient and truthful pricing mechanism for team formation in crowdsourcing markets, in: Proceedings of the 2015 IEEE International Conference on Communications, ICC-15, London, UK, June 8–12, 2015, pp. 567–572.
[21] Mehdi Kargar, Aijun An, Morteza Zihayat, Efficient bi-objective team formation in social networks, in: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD-12, LNCS, Vol. 7524, Bristol, UK, September 24–28, 2012, pp. 483–498.
[22] Dayong Ye, Minjie Zhang, Athanasios V. Vasilakos, A survey of self-organisation mechanisms in multi-agent systems, IEEE Trans. Syst. Man Cybern. Syst. 47 (3) (2017) 441–461.
[23] I. Lykourentzou, R.E. Kraut, S. Wang, S.P. Dow, Team Dating: A self-organized team formation strategy for collaborative crowdsourcing, in: Proceedings of the 2016 ACM CHI Conference on Human Factors in Computing Systems, CHI-16, San Jose, CA, USA, May 07–12, 2016, pp. 1243–1249.
[24] Long Tran-Thanh, Trung Dong Huynh, Avi Rosenfeld, Sarvapali Ramchurn, Nicholas R. Jennings, Budgetfix: budget limited crowdsourcing for interdependent task allocation with quality guarantees, in: Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS-14, Paris, France, May 5–9, 2014, pp. 477–484.
[25] M.S. Bernstein, G. Little, R.C. Miller, B. Hartmann, M.S. Ackerman, D.R. Karger, D. Crowell, K. Panovich, Soylent: A word processor with a crowd inside, Commun. ACM 58 (8) (2015) 313–322.
[26] Q. Li, F. Ma, J. Gao, L. Su, C.J. Quinn, Crowdsourcing high quality labels with a tight budget, in: Proceedings of 9th ACM International Conference on Web Search and Data Mining, WSDM-16, San Francisco, USA, Feb. 22–25, 2016, pp. 237–246.
[27] D. Zhang, C. Zheng, D. Wang, D. Thain, C. Huang, X. Mu, G. Madey, Towards scalable and dynamic social sensing using A distributed computing framework, in: Proceedings of the IEEE 37th International Conference on Distributed Computing Systems, ICDCS-17, Atlanta, GA, USA, June 5–8, 2017, pp. 966–976.

[28] D. Zhang, R. Han, D. Wang, C. Huang, On robust truth discovery in sparse social media sensing, in: Proceedings of the 2016 IEEE International Conference on Big Data, IEEE BigData-16, Washington, USA, Dec. 5–8, 2016, pp. 1076–1081.
[29] C. Meng, W. Jiang, Y. Li, J. Gao, L. Su, H. Ding, Y. Cheng, Truth discovery on crowd sensing of correlated entities, in: Proceedings of The 13th ACM Conference on Embedded Networked Sensor Systems, SenSys-15, Seoul, South Korea, Nov. 1–4, 2015, pp. 169–182.
[30] N.H. Shah, U. Chaudhari, L.E. Cárdenas-Barrón, Integrating credit and replenishment policies for deteriorating items under quadratic demand in a three Echelon supply chain, Int. J. Syst. Sci. Oper. Logist. (2020) http://dx.doi.org/10.1080/23302674.2018.1487606 in press.
[31] R. Sayyadi, A. Awasthi, A simulation-based optimisation approach for identifying key determinants for sustainable transportation planning, Int. J. Syst. Sci. Oper. Logist. 5 (2) (2018) 161–174.
[32] R. Sayyadi, A. Awasthi, An integrated approach based on system dynamics and ANP for evaluating sustainable transportation policies, Int. J. Syst. Sci. Oper. Logist. (2020) http://dx.doi.org/10.1080/23302674.2018.1554168 in press.
[33] S. Ashkan, H. Shekarabi, A. Gharaei, M. Karimi, Modelling and optimal lot-sizing of integrated multi-level multi-wholesaler supply chains under the shortage and limited warehouse space: Generalised outer approximation, Int. J. Syst. Sci. Oper. Logist. 6 (3) (2019) 237–257.
[34] A. Gharaei, M. Karimi, S. Ashkan, H. Shekarabi, An integrated multi-product, multi-buyer supply chain under penalty, green, and quality control polices and a vendor managed inventory with consignment stock agreement: The outer approximation with equality relaxation and augmented penalty algorithm, Appl. Math. Model. 69 (2019) 223–254.
[35] A. Gharaei, M. Karimi, S.A.H. Shekarabi, Joint economic lot-sizing in multi-product multi-level integrated supply chains: Generalized benders decomposition, Int. J. Syst. Sci. Oper. Logist. (2020) http://dx.doi.org/10.1080/23302674.2019.1585595 in press.
[36] C. Duan, C. Deng, A. Gharaei, J. Wu, B. Wang, Selective maintenance scheduling under stochastic maintenance quality with multiple maintenance actions, Int. J. Prod. Res. 56 (23) (2018) 7160–7178.
[37] M. Rabbani, N. Foroozesh, S.M. Mousavi, H. Farrokhi-Asl, Sustainable supplier selection by a new decision model based on interval-valued Fuzzy sets and possibilistic statistical reference point systems under uncertainty, Int. J. Syst. Sci. Oper. Logist. 6 (2) (2019) 162–178.
[38] S. Sarkar, B.C. Giri, Stochastic supply chain model with imperfect production and controllable defective rate, Int. J. Syst. Sci. Oper. Logist. (2020) http://dx.doi.org/10.1080/23302674.2018.1536231 in press.
[39] Y.C. Tsao, Design of a Carbon-efficient supply-chain network under trade credits, Int. J. Syst. Sci. Oper. Logist. 2 (3) (2015) 177–186.
[40] R. Dubey, A. Gunasekaran, Sushil, T. Singh, Building theory of sustainable manufacturing using total interpretive structural modelling, Int. J. Syst. Sci. Oper. Logist. 2 (4) (2015) 231–247.
[41] N. Kazemi, S.H. Abdul-Rashid, R.A.R. Ghazilla, E. Shekarian, S. Zanoni, Economic order quantity models for items with imperfect quality and emission considerations, Int. J. Syst. Sci. Oper. Logist. 5 (2) (2018) 99–115.
[42] S. Yin, T. Nishi, G. Zhang, A game theoretic model for coordination of single manufacturer and multiple suppliers with quality variations under uncertain demands, Int. J. Syst. Sci. Oper. Logist. 3 (2) (2016) 79–91.
[43] Y. Jiang, J. Hu, D. Lin, Decision making of networked multiagent systems for interaction structures, IEEE Trans. Syst. Man Cybern. A 41 (6) (2011) 1107–1121.
[44] Y. Jiang, J.C. Jiang, Understanding social networks from a multiagent perspective, IEEE Trans. Parallel Distrib. Syst. 25 (10) (2014) 2743–2759.
[45] Praveen K. Kopalle, Carl F. Mela, Lawrence Marsh, The dynamic effect of discounting on sales: Empirical analysis and normative pricing implications, Mark. Sci. 18 (3) (1999) 317–332.
[46] J. Jiang, B. An, Y. Jiang, D. Lin, Context-aware reliable crowdsourcing in social networks, IEEE Trans. Syst. Man Cybern. 50 (2) (2020) 617–632, http://dx.doi.org/10.1109/TSMC.2017.2777447.
[47] J. Jiang, B. An, Y. Jiang, D. Lin, Z. Bu, J. Cao, Z. Hao, Understanding crowdsourcing systems from a multiagent perspective and approach, ACM Trans. Auton. Adapt. Syst. 13 (2) (2018) Article 8.
[48] T. Kohler, M. Nickel, Crowdsourcing business models that last, J. Bus. Strategy 38 (2) (2017) 25–32.
[49] J. Bloodgood, Crowdsourcing: Useful for problem solving, but what about value Capture? Acad. Manag. Rev. 38 (3) (2013) 455–457.
[50] H. Piezunka, L. Dahlander, Idea rejected, tie formed: Organizations’ feedback on crowdsourced ideas, Acad. Manag. J. 62 (2) (2019) 503–530.
[51] Y. Papanastasiou, K. Bimpikis, N. Savva, Crowdsourcing exploration, Manage. Sci. 64 (4) (2017) 1727–1746.
