Resource Representative Model for Educational Resource Recommendation

Zhaoli Zhang, Di Zhang, Hai Liu and Xiaoxuan Shen

National Engineering Research Center for E-learning, Central China Normal University, Wuhan 430079, Hubei, China


ABSTRACT Educational resource recommendation has become an increasingly important problem. Learners need to find appropriate, high-quality educational resources amid massive amounts of redundant information and resources of uneven quality. In this paper, we propose a resource representative model based on a user-resource network, which selects representative resources from a large user-resource network according to users' behaviors. The selected representative resources can provide coarse-grained resource recommendations for learning-beginners and supply a pre-selected resource set for the personalized recommendation of non-learning-beginners, so that personalized recommendation can offer better resources. The model defines the concept of resource representative degree and proposes a method to calculate it that takes into account the relative influence, similarity and connectivity between resources. To optimize the proposed model efficiently, we present a greedy heuristic algorithm with provable approximation guarantees and evaluate it on the public dataset "citeulike". Experimental results show that the proposed method significantly outperforms traditional methods in representative resource selection.

CCS Concepts • Applied computing ➝ Distance learning • Open & E-learning technological issues ➝ Intelligent e-learning technology.

Keywords Resource Representative Model; Representative Degree; Resource Recommendation; User-Resource Network.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICDEL '18, May 26–28, 2018, Beijing, China © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6431-7/18/05…$15.00

DOI: https://doi.org/10.1145/3231848.3231866

1. INTRODUCTION With the development of online course platforms such as MOOCs (massive open online courses) [1] and innovations in learning patterns [2-3], the digitization of educational resources has become the norm and educational big data has flourished. However, the growth of educational big data has also caused the problems of redundant information and uneven quality of educational resources. When choosing educational resources, learners not only face the problem of finding resources that match their needs among a massive number of resources, but also the problem of identifying and judging high-quality ones. Finding suitable, high-quality educational resources for learners in this mass of learning resources is therefore an urgent problem.

Personalized recommendation is an effective and promising way to select appropriate educational resources for learners. Personalized recommendation methods are generally divided into two categories: historical data-based recommendation (HDBR) and content-based recommendation (CBR) [4-7]. Collaborative filtering is one of the most prominent approaches [8-9]. Many researchers have studied recommendation methods for learning resources. Mou [10] took learners' personality characteristics as the analysis dimensions and, by analyzing the structure of the e-schoolbag learning system and the functions of its database, constructed a learner model based on the e-schoolbag learning system database. Taking this learner model as the object of analysis, they designed a personalized learning resource recommendation framework. To achieve personalized recommendation of e-learning knowledge resources, Zhao [11] used knowledge-based recommendation and ontology technology to build a learner knowledge model; the results showed that students found ontology-driven personalized recommendation of knowledge resources very useful, as it satisfied individual learning requirements, stimulated learning motivation, promoted learner autonomy and optimized the study process. Shen [12] proposed an automatic learning resource recommendation algorithm based on a convolutional neural network (CNN) and a latent factor model with L1-norm regularization; quantitative experiments on a public database showed significant improvements over conventional methods.

The selection of high-quality educational resources is essentially similar to Top-N resource recommendation, and there are several relevant studies. Yin [13] modeled five kinds of academic resources using four content characteristics (resource type, subject distribution, keyword distribution and LDA topic distribution) and modeled users' interests by integrating user behavior; at the same time, three non-content characteristics (authority, community popularity and novelty of academic resources) were used for quality assessment. Finally, the top-N resources were recommended based on the interest score and quality value of each academic resource. Yang [14] put forward a recommendation strategy for high-quality resources based on statistical thinking, which provided users with high-quality resource recommendations through statistical analysis of historical data and user learning records. In summary, personalized recommendation can effectively solve the problem of recommending appropriate educational resources to learners, but it focuses on the interest similarity between users and resources and gives little consideration to resource quality. Furthermore, personalized recommendation suffers from the "cold-start" problem, which makes it poorly suited to recommending resources to learning-beginners.

In this paper, we propose a resource representative model (RRM) based on users' behavior in a user-resource network. Considering the relative influence, similarity and connectivity among resources, the model selects the most representative high-quality resources from a large-scale user-resource network. The selected resources can not only provide coarse-grained resource recommendations for learning-beginners, but also serve as a candidate set for personalized recommendation, compensating for the quality problem of personalized recommendation and partially alleviating the "cold-start" issue.

2. RESOURCE REPRESENTATIVE MODEL

In this section, we propose a resource representative model to select representative resources from a large user-resource network. Figure 1 depicts the structure and effect of RRM. RRM consists of three parts: influence, similarity and connectivity. The representative resources selected by this model are mainly used to recommend resources to learners. The problem formulation and model optimization are described below.

Figure 1. Effect of resource representative model (RRM). (Figure labels: user u; educational resource v; influence f(vi, vj); similarity T(vi, vj); connectivity C(vi, vj); RRM; representative educational resources; beginner; non-beginner; Personalized Recommendation Engine.)

2.1 Problem Formulation To find the most representative resource set S in a large user-resource network, we define a user-resource network G = (V, U, E), where V is the set of resources, U is the set of users, and E is the set of directed or undirected edges between users and resources. To identify the most representative resources, we define the representative degree R(S, vi) ∈ [0, 1] of a resource set S with respect to a resource vi: S perfectly represents vi if R(S, vi) = 1, and R(S, vi) = 0 when S = ∅. Our goal is to find the most representative resources in the user-resource network; we use an example to illustrate the motivation of this work.

Figure 2 shows an intuitive example of selecting representative resources from a user-resource network. The left part shows a heterogeneous network composed of users (u1, u2, u3, u4, u5, u6, u7, u8, u9) and resources (v1, v2, v3, v4, v5, v6, v7). The right part shows the resource representative network generated by the model: v3, v4 and v5, highlighted with rectangles, are the representative resources selected by RRM. The weighted directed edge from a representative resource to another resource indicates how likely the selected resource is to represent that resource. For example, resource v4 has a higher representative degree for resource v2 than v3.

Figure 2. Example of selecting representative resources from user-resource network.

In this paper, we investigate the problem of selecting representative resources from a large user-resource network. To address this problem, we first define several functions.

Resource Influence function. Educational resources are special resources that are enlightening and inheritable: they have the ability to influence other educational resources, which we call the influence of educational resources. We define f(vi, vj) as the relative influence of vi on vj. Intuitively, the larger the relative influence of a resource, the more likely it is to represent other resources. The influence value is determined by attribute values such as the number of views, clicks and collections. Let the attributes of resource vi be the n-dimensional vector Ai = [aik], k = 1, ..., n, where n is the number of selected attributes. The attribute value Hi of resource vi is defined as

$$H_i = \sum_{k=1}^{n} w_k \, a_{ik}$$

where wk is the weight of attribute ak, indicating the importance of the attribute, with 0 ≤ wk ≤ 1. The relative influence between resources is

$$f(v_i, v_j) = 1 - \exp\left(-\frac{H_i}{H_j}\right)$$

Resource Similarity function. The resource similarity defined here is the content similarity between resources. Let T(vi, vj) denote the resource similarity and Z = (z1, z2, z3, …, zm) the resource feature vector, where m is the number of features. Resource similarity is calculated as the cosine similarity:

$$T(v_i, v_j) = \cos(Z_{v_i}, Z_{v_j}) = \frac{Z_{v_i} \cdot Z_{v_j}}{\lVert Z_{v_i} \rVert \, \lVert Z_{v_j} \rVert}$$

where T(vi, vj) ∈ [0, 1] for non-negative feature vectors. T(vi, vj) = 0 indicates that the two resources are not similar at all, so the probability that one represents the other is essentially 0; T(vi, vj) = 1 indicates that the two resources are identical, so they can represent each other.
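The influence and similarity functions defined above are simple enough to sketch directly. The Python below is a minimal illustration, not the authors' code: the function names, the toy attribute values (views, clicks, collections) and the weights are hypothetical, and it assumes Hj > 0 so that f(vi, vj) is defined.

```python
import math
from typing import Sequence


def attribute_value(attrs: Sequence[float], weights: Sequence[float]) -> float:
    """H_i = sum_k w_k * a_ik, where each weight w_k is expected to lie in [0, 1]."""
    if len(attrs) != len(weights):
        raise ValueError("attrs and weights must have the same length")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight w_k must lie in [0, 1]")
    return sum(w * a for w, a in zip(weights, attrs))


def relative_influence(h_i: float, h_j: float) -> float:
    """f(v_i, v_j) = 1 - exp(-H_i / H_j); assumes H_j > 0."""
    if h_j <= 0:
        raise ValueError("H_j must be positive")
    return 1.0 - math.exp(-h_i / h_j)


def cosine_similarity(z_i: Sequence[float], z_j: Sequence[float]) -> float:
    """T(v_i, v_j) = (Z_i . Z_j) / (||Z_i|| * ||Z_j||); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    norm_i = math.sqrt(sum(a * a for a in z_i))
    norm_j = math.sqrt(sum(b * b for b in z_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical attributes (views, clicks, collections) and weights.
weights = [0.2, 0.3, 0.5]
h1 = attribute_value([120, 40, 10], weights)  # 41.0
h2 = attribute_value([60, 20, 5], weights)    # 20.5
print(round(relative_influence(h1, h2), 4))   # 0.8647, i.e. 1 - exp(-2)
print(round(cosine_similarity([1.0, 0.0, 2.0], [2.0, 0.0, 4.0]), 4))  # 1.0 (parallel vectors)
```

Because f depends only on the ratio Hi/Hj, rescaling all attribute values by a common factor leaves the relative influence unchanged; a resource twice as "heavy" as another always gets 1 - exp(-2) toward it.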

Resource Connectivity function. In the user-resource network G = (V, U, E), E represents the connections between users and resources. Let e(u, v) be the edge function between a user and a resource: e(ui, vj) = 1 when user node ui is connected to resource node vj, and e(ui, vj) = 0 otherwise. The edge connection of all users U to resource vj is

$$E_{U, v_j} = \sum_{u_i \in U} e(u_i, v_j)$$

Based on users' behavior, we define the connectivity C(vi, vj) between resources. Let the user behavior vector be P = {p1, p2, …, pl}, where l is the number of types of user actions, such as click, view, collect, comment and buy. The connectivity between two resources equals the total number of users who performed the same type of action on both resources:

$$C(v_i, v_j) = \sum_{p_i \in P} \sum_{\{v_i \in V, v_j \in V\}} E_{U, v_i} \cap E_{U, v_j}$$

Resource representative degree. Combining the above factors, the resource representative degree is defined as

$$R(v_i, v_j) = \frac{C(v_i, v_j)}{2 \max C_{v_i, v_j}} \left[ \alpha f(v_i, v_j) + T(v_i, v_j) \right]$$

where α is a weighting parameter that adjusts the balance between influence and similarity according to the actual distribution of the data, making the selected resources more representative and diverse.

2.2 Model Optimization The resource representative model is used to generate a set of representative resources, and we design a quality function Q to evaluate the selected set. The Top-N selection problem has been proved to be NP-hard [15]. To solve the model efficiently, we employ a greedy heuristic algorithm: in each iteration, we traverse all remaining resources and select the one with the largest quality function value. The quality function Q is defined as

$$Q(S, U, V, E) = \max R(V, v_i)$$

The approximate algorithm for RRM is given in detail in Algorithm 1.

Algorithm 1: Resource representative algorithm.
Input: the set of users U; the set of resources V; the set of attributes A; the values of R(vi, vj) for every pair of resources; the value of parameter α; the number of resources to find, k (at most the total number of resources).
Output: a set S of k representative resources.
S = ∅
while |S| < k do
  max = 0
  for each vi ∉ S do
    repsum = 0
    for each vj ∈ vi.neighbor do
      r(vi, vj) = R(vi, vj)
      repsum += r(vi, vj)
    end
    if (repsum > max) then
      v = vi
      max = repsum
    end
  end
  S = S ∪ {v}
end
return S

Based on this approximate algorithm, in each iteration we select the resource node with the largest representative degree, as evaluated by the quality function, until the subset S contains k resources that represent the entire resource network as fully as possible.

3. EXPERIMENTAL RESULTS AND DISCUSSION

3.1 Experimental Setup To validate the effectiveness of the proposed model, we test the RRM algorithm on the public dataset "citeulike". This dataset was used in the article "Collaborative Topic Regression with Social Regularization" [16] and was collected from the CiteULike website and Google Scholar; it includes users' collections of digital resources and the title and abstract of each resource (see the dataset details at http://www.citeulike.org/faq/data.adp). Note that the citation counts of all resources were re-collected. A brief description of the dataset is given in Table 1.

Table 1. Introduction of dataset
Items                                   Values
Number of users                         5551
Number of resources                     16980
Number of user-resource pairs           204987
Number of resource-resource pairs       11253566
Number of resource abstracts            16980
Number of citations                     16980

After cleaning incomplete and erroneous values, we finally obtain 16980 resources and 11253566 resource relations. The samples are first preprocessed: for the abstract text of each resource, the TF-IDF algorithm is used to extract an 8000-dimensional word vector. Because this vector is very sparse, it is reduced to 10 dimensions by an auto-encoder, and the resulting 10-dimensional text feature vector is used to calculate the similarity.

3.2 Experimental Results

In this section, the proposed model is used to select representative resources in a large user-resource network. To verify its validity, RRM is applied to the public dataset and compared with traditional methods. In the experiment, the top 5% of resources by citation count are regarded as the target subset of high-quality resources. We compare the resources selected by the different methods with this target subset, using precision as the evaluation metric.

Traditional Methods. For comparison, the following methods are used.

InDegree. It selects the resources with the most links as representative resources.

PageRank. This algorithm was invented by Larry Page and Sergey Brin in 1998 to estimate the importance of web pages. This method selects the nodes with the highest PageRank scores as representative resources.

HITS. This algorithm, proposed by Jon Kleinberg [17], calculates two values for each node: an authority score, representing the authority of the node, and a hub score, representing its out-links to other nodes. Hits_a and Hits_h select representative resources according to the authority value and the hub value, respectively.

S3. The S3 algorithm was proposed by Tang [18] to sample representative users from social networks according to different user attribute distributions.

Table 2. Performance of different methods for the beginner (precision, %)
Methods     P@5      P@10     P@15     P@20
InDegree    40.00    50.00    40.00    45.00
PageRank    40.00    50.00    46.67    55.00
Hits_a      40.00    40.00    33.33    30.00
Hits_h      40.00    40.00    33.33    30.00
S3          20.00    20.00    13.33    25.00
RRM         60.00    60.00    53.33    60.00
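To make the selection procedure behind the RRM rows concrete, the connectivity function C(vi, vj), the representative degree R(vi, vj) and Algorithm 1 from Section 2 can be combined on a toy network. This is a sketch under stated assumptions, not the authors' implementation: user behavior is assumed to arrive as (user, resource, action) triples, the intersection in C(vi, vj) is read as "users who performed the same action on both resources", max C is taken over all resource pairs, and a resource's neighbors are the resources it shares at least one such user with. All names and toy values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(z_i, z_j):
    """T(v_i, v_j): cosine similarity of two feature vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    norm = math.sqrt(sum(a * a for a in z_i)) * math.sqrt(sum(b * b for b in z_j))
    return dot / norm if norm else 0.0


def connectivity(logs):
    """C(v_i, v_j): per action type, count users who acted on both resources; sum over types.
    `logs` is an iterable of (user, resource, action) triples (assumed input format)."""
    actors = defaultdict(set)  # (resource, action) -> users who performed that action
    for user, res, action in logs:
        actors[(res, action)].add(user)
    resources = sorted({res for res, _ in actors})
    actions = {action for _, action in actors}
    conn = {}
    for vi, vj in combinations(resources, 2):
        c = sum(len(actors.get((vi, a), set()) & actors.get((vj, a), set())) for a in actions)
        if c > 0:  # only connected pairs become neighbours
            conn[(vi, vj)] = conn[(vj, vi)] = c
    return conn


def representative_degree(conn, H, Z, alpha):
    """R(v_i, v_j) = C / (2 * max C) * (alpha * f + T), with f = 1 - exp(-H_i / H_j)."""
    c_max = max(conn.values())
    R = {}
    for (vi, vj), c in conn.items():
        f = 1.0 - math.exp(-H[vi] / H[vj])
        R[(vi, vj)] = c / (2 * c_max) * (alpha * f + cosine(Z[vi], Z[vj]))
    return R


def greedy_select(R, k):
    """Algorithm 1: add the resource outside S whose summed R over its neighbours is largest."""
    neighbours = defaultdict(list)
    for vi, vj in R:
        neighbours[vi].append(vj)
    S = []
    while len(S) < min(k, len(neighbours)):
        best, best_sum = None, -1.0  # -1 rather than 0 so some candidate is always chosen
        for vi in neighbours:
            if vi in S:
                continue
            repsum = sum(R[(vi, vj)] for vj in neighbours[vi])
            if repsum > best_sum:
                best, best_sum = vi, repsum
        S.append(best)
    return S


# Hypothetical toy network: (user, resource, action) triples.
logs = [
    ("u1", "v1", "view"), ("u1", "v2", "view"),
    ("u2", "v1", "view"), ("u2", "v2", "view"), ("u2", "v3", "collect"),
    ("u3", "v2", "collect"), ("u3", "v3", "collect"),
    ("u4", "v3", "view"), ("u4", "v4", "view"),
]
H = {"v1": 4.0, "v2": 2.0, "v3": 2.0, "v4": 1.0}              # made-up attribute values H_i
Z = {"v1": [1, 0], "v2": [1, 0], "v3": [0, 1], "v4": [1, 1]}  # made-up feature vectors

conn = connectivity(logs)
R = representative_degree(conn, H, Z, alpha=0.1)  # alpha = 0.1 performed best in the paper
print(greedy_select(R, 2))  # ['v1', 'v2']
```

Because repsum in Algorithm 1 does not depend on the resources already in S, the loop amounts to ranking resources by their summed representative degree and keeping the top k; in this toy network it picks v1 and v2 even though they have identical feature vectors. Recomputing each candidate's score relative to S would be a different, marginal-gain design and is not part of the paper's algorithm.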

Figure 4. Precision curve for beginners.

Precision curve. Figures 3 and 4 illustrate the precision curves of the different methods. RRM outperforms the traditional methods in both settings on this dataset. When recommending resources for beginners, the curve is relatively stable, with only a small decrease; when providing the candidate resource set for non-beginners' personalized recommendation, however, precision decreases more quickly.
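The precision metric behind these curves and Tables 2 and 3 can be sketched as follows: the target subset is the top share of resources by citation count (5% in the paper), and P@k is the percentage of the first k selected resources that fall in that subset. Function names and the toy data are hypothetical, and rounding the target-set size with round() is an assumption the paper does not specify.

```python
def high_quality_targets(citations, fraction=0.05):
    """Target subset: the top `fraction` of resources by citation count."""
    n = max(1, round(len(citations) * fraction))
    ranked = sorted(citations, key=citations.get, reverse=True)
    return set(ranked[:n])


def precision_at_k(selected, targets, k):
    """P@k in percent: share of the first k selected resources that are in the target subset."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for res in selected[:k] if res in targets)
    return 100.0 * hits / k


citations = {f"r{i}": i for i in range(20)}                # made-up citation counts
targets = high_quality_targets(citations, fraction=0.25)  # larger share so the toy set is non-trivial
selected = ["r19", "r3", "r17", "r16", "r2"]
print(precision_at_k(selected, targets, 5))  # 60.0
print(precision_at_k(selected, targets, 2))  # 50.0
```

Under this definition, a value such as P@15 = 53.33 in Table 2 corresponds to 8 of the 15 selected resources being in the target subset (8/15 ≈ 53.33%).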

Table 3. Performance of different methods for the non-beginner (precision, %)
Methods     P@50     P@100    P@150    P@200
InDegree    30.00    29.00    26.67    26.00
PageRank    42.00    33.00    30.67    28.50
Hits_a      26.00    19.00    15.33    12.50
Hits_h      26.00    19.00    15.33    12.50
S3          14.00    11.00     7.33     7.50
RRM         52.00    39.00    32.00    29.50

Table 2 and Table 3 list the results of the compared methods on the dataset, leading to the following observations.

High Precision. Our model achieves the best performance for both learning-beginners and non-beginners. For learning-beginners, in terms of P@5, P@10, P@15 and P@20, our model outperforms the traditional methods by 5 to 20 percentage points over the best baseline at each cutoff, and it also performs well for non-beginners (+1% ~ 26% by P@20, for example). The higher precision indicates the effectiveness of our model: for learning-beginners, RRM can effectively recommend high-quality resources, and for non-beginners, it can pick a candidate resource set for personalized recommendation that provides better-quality educational resources.

Effect of parameter α. To examine the effect of α on the results, we vary α over the range [0, 1]. The model is not very sensitive to α: the results for different values differ only slightly, and the model performs best when α = 0.1.

Table 4. Examples of representative resource IDs found by different methods
RRM      HITS     S3       InDegree   PageRank
3982     16046    12866    3982       16046
29       4180     12478    16046      29
16046    16       5162     29         4180
478      768      5954     15         15
15       1539     5439     478        478
16       1200     16076    9104       3982
7546     14716    1825     14608      1201
1201     15816    14113    4180       14608
4180     29       14934    1201       16
18       1674     12446    227        1193
NOTE: The numbers in the table are resource IDs in the dataset.

Table 4 shows the representative resources found by the different methods on our dataset; for each method, we list the top 10 representative resources. The advantage of the proposed model is that, by considering more resource factors and combining them with user behavior, it selects representative resources that are both high-quality and diverse.
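A quick way to read Table 4 is to count how many of RRM's top-10 IDs each baseline also selects. The sketch reads Table 4 row by row, one column per method; the overlap count is an illustrative check, not a diversity metric defined in the paper, and the helper name is hypothetical.

```python
# Top-10 lists from Table 4 (resource IDs, in table order, one list per column).
table4 = {
    "RRM": [3982, 29, 16046, 478, 15, 16, 7546, 1201, 4180, 18],
    "HITS": [16046, 4180, 16, 768, 1539, 1200, 14716, 15816, 29, 1674],
    "S3": [12866, 12478, 5162, 5954, 5439, 16076, 1825, 14113, 14934, 12446],
    "InDegree": [3982, 16046, 29, 15, 478, 9104, 14608, 4180, 1201, 227],
    "PageRank": [16046, 29, 4180, 15, 478, 3982, 1201, 14608, 16, 1193],
}


def overlap_with(reference, other):
    """Number of resource IDs two top-k lists have in common."""
    return len(set(reference) & set(other))


rrm = table4["RRM"]
for method, ids in table4.items():
    if method != "RRM":
        print(method, overlap_with(rrm, ids))

# Resources chosen by RRM but by none of the baselines.
others = set().union(*(ids for method, ids in table4.items() if method != "RRM"))
print(sorted(set(rrm) - others))  # [18, 7546]
```

On these lists, RRM shares 8 IDs with PageRank, 7 with InDegree, 4 with HITS and none with S3, and two of its picks (18 and 7546) appear in no baseline list.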

Figure 3. Precision curve for candidate set for non-beginners.

4. CONCLUSION In this paper, we propose a resource representative model (RRM) to select high-quality resources based on users' behavior. First, we propose the concept of resource representative degree, which measures how likely a resource is to represent other resources. Furthermore, according to users' behavior and the properties of resources, we propose an algorithm to calculate the representative degree between resources that considers relative influence, similarity and connectivity. To demonstrate the validity of RRM, a greedy heuristic algorithm is used to solve the model. Experimental results show that our model performs better than traditional methods.

The major novelty of this paper is modeling the relationship between users and resources as a large heterogeneous network. Moreover, we define a resource representative degree to quantify how likely a resource is to represent other resources, and adopt an approximate algorithm to select the representative resource set in the network. The selected resources are more authoritative, and the model can provide better guidance resources for learning-beginners to achieve higher teaching efficiency. Experiments show that the proposed model can effectively provide coarse-grained recommendations for learning-beginners and a better candidate resource set for non-beginners' personalized recommendation. We believe that the RRM method has practical value in e-learning systems. Although the resources considered here are educational resources, the method can also be applied to other types of resources.

5. ACKNOWLEDGMENTS This work was supported in part by the National Natural Science Foundation of China (No. 61505064), the Hong Kong Scholars Program (No. XJ2016063), the National Natural Science Foundation of Hubei Province (No. 2016CFB497), the National Key Research and Development Program (No. 2017YFB1401301, 2017YFB1401303, 2013BAH72B01), the Cultivating Excellent Doctoral Dissertations Program of CCNU (No. 2017YBZZ009), and the Specific Funding for Education Science Research by Self-determined Research Funds of CCNU (No. CCNU16JYKX031, CCNU16JYKX027).

6. REFERENCES

[1] M. Mitchell Waldrop. 2013. Online Learning: Campus 2.0. Nature. 495, 7440 (March. 2013), 160-163. DOI= http://dx.doi.org/10.1038/495160a.

[2] Hai Liu, Deli Kong, Zhaoli Zhang, Jiangbo Shu and Taihe Cao. 2017. Cloud-Class Blended Learning Pattern Innovation and its Application. In 2017 International Symposium on Educational Technology (Hong Kong, China, June 27 - 29, 2017). IEEE, 19-23. DOI= http://dx.doi.org/10.1109/ISET.2017.13.

[3] Hai Liu, Yingying Chen, Zhaoli Zhang, Jiangbo Shu and Zhifei Li. 2017. Cloud-Terminal Integration Learning Platform and its Application in Blended Learning. In 2017 International Symposium on Educational Technology (Hong Kong, China, June 27 - 29, 2017). IEEE, 71-73. DOI= http://dx.doi.org/10.1109/ISET.2017.24.

[4] P. Lops, M. De Gemmis, and G. Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender Systems Handbook. Springer. ch 3, 73-105.

[5] M. Balabanović and Y. Shoham. 1997. Fab: content-based, collaborative recommendation. Communications of the ACM. 40, 3 (March. 1997), 66-72. DOI= http://doi.acm.org/10.1145/245108.245124.

[6] Y. Blanco-Fernandez, J. J. Pazos-Arias, A. Gil-Solla, M. Ramos-Cabrer, and M. Lopez-Nores. 2008. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender system. IEEE Transactions on Consumer Electronics. 54, 3 (May. 2008), 727-735. DOI= http://dx.doi.org/10.1109/TCE.2008.4560154.

[7] R. Krestel and P. Fankhauser. 2012. Personalised topic-based tag recommendation. Neurocomputing. 76, 1 (January. 2012), 61-70. DOI= http://doi.org/10.1016/j.neucom.2011.04.034.

[8] Y. Koren and R. Bell. 2011. Advances in Collaborative Filtering. In Recommender Systems Handbook. Springer. ch 5, 145-186.

[9] A. Salah, N. Rogovschi, and M. Nadif. 2016. A dynamic collaborative filtering system via a weighted clustering approach. Neurocomputing. 175, A (January. 2016), 206-215. DOI= http://doi.org/10.1016/j.neucom.2015.10.050.

[10] Zhijia Mou and Fati Wu. 2015. Research on personalized learning resources recommendation based on learner model in e-schoolbag. e-Education Research. 1 (January. 2015), 69-76. DOI= http://dx.chinadoi.cn/10.13811/j.cnki.eer.2015.01.010.

[11] Wei Zhao, Qiang Jiang, Pengjiao Wang and Liping Wang. 2015. Ontology-driven Personalized Recommendation of Knowledge Resource in e-Learning. China Educational Technology. 5 (May. 2015), 84-89.

[12] Xiaoxuan Shen, Baolin Yi, Zhaoli Zhang, Jiangbo Shu and Hai Liu. 2016. Automatic Recommendation Technology for Learning Resources with Convolutional Neural Network. In 2016 International Symposium on Educational Technology (Beijing, China, July 19 - 21, 2016). IEEE, 30-34. DOI= http://dx.doi.org/10.1109/ISET.2016.12.

[13] Liling Yin, Baisong Liu and Yangyang Wang. 2017. Research on cross-type excellent recommendation algorithm for academic resources. Journal of the China Society for Scientific and Technical Information. 7 (July. 2017), 715-722.

[14] Zhuo Yang, Ludong Zhou, Fengqi Li and Feng Xia. 2013. Research on recommendation strategy of quality educational resources. The Chinese Journal of ICT in Education. 11 (November. 2013), 26-28.

[15] David Kempe, Jon Kleinberg and Eva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (Washington, D.C., August 24 - 27, 2003). ACM, New York, NY, 137-146. DOI= http://doi.acm.org/10.1145/956750.956769.

[16] Hao Wang, Binyi Chen and Wu-Jun Li. 2013. Collaborative topic regression with social regularization for tag recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (Beijing, China, August 03 - 09, 2013). AAAI Press, 2719-2725.

[17] Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM. 46, 5 (September. 1999), 604-632. DOI= http://dx.doi.org/10.1145/324133.324140.

[18] Jie Tang, Chenhui Zhang, Keke Cai, Li Zhang and Zhong Su. 2015. Sampling representative users from large social networks. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (Austin, Texas, January 25 - 30, 2015). AAAI Press, 304-310.