A user-centered approach for integrating social data into groups of interest

DATAK-01505; No of Pages 14 Data & Knowledge Engineering xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Data & Knowledge Engineering journal homepage: www.elsevier.com/locate/datak

Editorial

A user-centered approach for integrating social data into groups of interest

Xuan-Truong Vu, Marie-Hélène Abel, Pierre Morizet-Mahoudeaux

Heudiasyc CNRS 7253, University of Technology of Compiegne, Compiegne, France

Article info

Available online xxxx

Keywords: Social network sites; Groups of interest; Information sharing; Information organization; Collaboration; Information retrieval

Abstract

Social network sites with large-scale public networks, such as Facebook, Twitter, or LinkedIn, have become a very important part of our daily life. Users are increasingly connected to these services to publish and share information and content with others. Social network sites have therefore become a powerful source of content of interest, part of which may fall within the scope of interest of a given group. So far, no efficient solution has been proposed for a group of interest to tap into social data, especially when these data are protected by and scattered across different social network sites. We have therefore proposed a user-centered approach for integrating social data into groups of interest. This approach makes it possible to aggregate the social data of the group's members and to extract from these data the information relevant to the group's topics of interest. Moreover, it follows a user-centered design that allows each member to personalize his/her sharing settings and interests within his/her respective groups. In this paper, we describe the conceptual and technical components of the proposed approach. To further illustrate the approach, a web-based prototype is also presented. A preliminary test using this prototype was carried out and showed encouraging results.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

During the past years, social network sites with large-scale public networks, such as Facebook, Twitter, and LinkedIn, have become a very important part of our daily life. Hundreds of millions of users are highly connected to these websites for networking, communicating, publishing, and sharing with each other. An enormous amount of data, generally called social data and including users' conversations, personal updates, and shared information (e.g. news, web contents) [26], is increasingly generated by users. This makes social network sites powerful sources of information, news, and content of interest.

Meanwhile, people are often part of different groups of people sharing common interests. They join a group to take advantage of its collective knowledge, which is built on every individual contribution. The more the members contribute to the group, the more they can learn for themselves. For the group to progress, each member is supposed to push relevant information into the group. However, people have become used to posting any content of interest directly on their social profiles, thus leaving it available only on their designated social networks. There are two reasons for this: posting on social networks lets them maintain their presence on the corresponding social networks and promote interactions with their social contacts, whereas sharing with a group requires extra effort to select information relevant to its scope of interest. Since the social networks and the groups of a user are not identical, even though some groups can be formed within a social network site, most information shared by users on different social networks is probably missed by their groups. The question is: how can we make this information available to the group?

E-mail addresses: [email protected] (X.-T. Vu), [email protected] (M.-H. Abel), [email protected] (P. Morizet-Mahoudeaux).

http://dx.doi.org/10.1016/j.datak.2015.04.004 0169-023X/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: X.-T. Vu, et al., A user-centered approach for integrating social data into groups of interest, Data Knowl. Eng. (2015), http://dx.doi.org/10.1016/j.datak.2015.04.004

We propose an answer in this paper, based on a user-centered approach for integrating social data into groups of interest. This approach allows users to aggregate their social data from different social networks like Facebook, Twitter, and LinkedIn and to share some parts of the aggregated data with their respective groups of interest. Users are also able to personalize their sharing settings and interests within their respective groups according to their own preferences.

The paper is organized as follows. In the next section, we further elaborate on our motivations for integrating social data into groups of interest. Then, we introduce two data models supporting social data aggregation and information sharing within groups of interest, respectively. The technical aspects are then detailed in Section 4. Next, we present our first prototype and a preliminary test using it. We discuss related work and what distinguishes our approach in Section 6. Before giving our conclusions, we outline our future work.

2. Motivation

In this section, we first present the definition of social network sites and the definition of groups of interest. Then, we show the underlying motivation for integrating social data into groups of interest.

2.1. Social network sites

Social network sites, also called social networking sites, are open web-based services whose main functionality is to connect people. Basically, they allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system [7]. Social network sites have gradually rolled out new features to meet users' emerging demands, such as sending instant or private messages, posting statuses, sharing links, creating events, and so forth. A large number of social network sites are available for users to choose from.
Among others, Facebook, Twitter, LinkedIn, and Google+ are the most successful examples in terms of number of active users, traffic volume, and amount of generated content [4]. Their coverage and focus are not identical. While Facebook and Google+ are general-purpose networks, Twitter is devoted to micro-blogging, and LinkedIn is oriented toward the professional community. As such, it is common for a single user to be simultaneously connected to several of these social network sites in order to take advantage of the different free services offered by each of them.

Recently, these originally profile-centric platforms have increasingly played pivotal roles in supporting content production and diffusion. Firms, organizations, and media companies use them intensively as an efficient advertising and marketing channel to engage in a timely and direct way with a very broad audience. Likewise, users are turning to the same websites as their primary source of information, news, and content of interest. A large part of users' social activities therefore consists of reporting news and sharing information or links [21].

2.2. Groups of interest

Unlike social network sites, which are individual-centric services, communities of interest are group-centered, which means that they are held together and driven by a common interest. This interest may be a hobby, something that the community members are passionate about, a common goal, a common project, or merely the preference for a similar lifestyle, geographical location, or profession [38]. Taking part in the community enables its members to exchange information, to obtain answers to personal questions or problems, to improve their understanding of a subject, to share common passions, or to play [19]. Due to the multifaceted lifestyle of modern living, any individual is often part of many different communities [38]. Communities of interest are restricted neither to a particular geographical area nor to a given number of members.
They can be created and maintained on-line and/or off-line. Their forms also vary from small, closed groups, such as those within a larger organization, to very large, open communities on the Web, such as the Wikipedia, Youtube, and Flickr communities. A given community may furthermore contain many nested communities. In this work, we are mainly concerned with small-sized communities of interest, which we will refer to, in the remainder of this paper, as groups of interest. The underlying reason is provided in the next subsection.

2.3. Integrating social data into groups of interest

Both social network sites and groups of interest play important roles in various areas. With regard to the information discovery and filtering process, each has advantages and disadvantages.

Social network sites provide a powerful multidomain source to which recent information is constantly added. However, the volume and heterogeneity of this information often overwhelm users' limited cognitive processing capacity. Moreover, due to the imposed privacy rules, users are limited to their personal circles of social connections, which means that interesting information from outside these circles is not shown to them.

Groups of interest impose a group setting which ensures that the members share, in a single place, only contents related to one or several particular topics. This makes it much easier to discover interesting information and useful contents. Nevertheless, the degree of commitment to the group differs among members. Often, only a small number of members actively generate contents, while the majority of members passively consume them. A group may therefore run short of good contents if its active members are no longer active. This is more and more common, as people have become used to systematically pushing interesting information to social network sites while forgetting to also share it with their interested groups.

Actually, there is a significant overlap between social network sites and groups of interest. In particular, many group members are also social network users, and vice versa. Nevertheless, there is no guarantee that the members of a given group are connected to the same social network sites, or connected to each other. Interesting information published by a member on a particular social network is not necessarily visible to other members. Certain social network sites like Facebook or LinkedIn provide features allowing users to create and join groups, and to ask for and share specific information within these groups. However, these groups are exclusive to the corresponding social network. Interaction between two groups, for example a Facebook group and a LinkedIn group, is not directly possible, even though they share a common interest and many identical members. A member has to manually copy the same information to the two groups to ensure that it can be seen by all members.

So far, there has not been an efficient solution for a group of interest to tap into social data, especially when these data are protected by and scattered across different social network sites. Our proposed solution consists of a user-centered approach for integrating social data into groups of interest. We are convinced that this solution could improve the group's efficiency and competitiveness, in particular with respect to information sharing.

The proposed approach allows a given group of interest to use the social data generated by its members to empower its internal information sharing process. In particular, every member can be an active contributor to his/her group while continuing to use the social network sites normally. Interesting information published by the members on any supported social network site is automatically retrieved into the interested group without requiring any extra effort.
It works even if the members are not active in publishing contents on social network sites, since it also considers the information shared with the members by their respective circles of friends, given that people with similar interests are more likely to be connected.

In addition, being able to access its members' social data can also open up advanced possibilities for a group. One of them is to enhance the group's partial knowledge of each individual member. Based on what a member shares and whom he/she follows on social network sites, the group could learn more about the member, for example:

• Members' degrees of interest: The frequency of information sharing may be used to evaluate how much a member is interested in a given topic of interest.
• Members' affinity: The social relationships may give extra indicators for determining the affinity between one member and another.

On the other hand, taking part in a number of groups of interest and sharing social data with these groups indirectly helps the users to cope with the information overload that they face on social network sites [13,18]. The interesting information is automatically extracted from the social data and distributed among the different groups. The users can therefore easily track down the expected information in the corresponding groups.

It is important to note that this approach is not a substitute for either social network sites or groups of interest. It should be considered as a bridge between social network sites and groups of interest, with the objective of combining the strengths of both. It therefore enables groups of interest to extend their internal collaboration to social network sites.

Given that the users and their social data are the central elements of our approach, we have adopted a user-centered design which lets the users decide what to do with their social data.
Such a setting is furthermore believed to be more suitable for groups with a small number of people, in which a member may feel more comfortable and at ease sharing his/her social data with other members. These could be formal or informal groups, depending on the topics of interest, which are collectively defined by the individual members.

3. Modeling

In this section, we present two data models. The first model represents the users and their social data aggregated from different social network sites. The second model represents the groups of interest, their interests, and their shared contents.

3.1. User's aggregated social data

A user's social data include data published by, involving, or shared with the user on social network sites. They therefore comprise a wide range of information such as profile information, social connections, postings, interests, and so forth. These data differ from one social network site to another with respect to their scope and their completeness. For example, the user profile on Twitter is currently very bare. It only includes the name, bio, and location of the member. The user profile on Facebook is more elaborate. It includes: basic information such as the name, photo, age, birthday, and relationship status; personal information such as interests, favorite music and TV shows, movies, books, and quotations; contact information such as mobile phone, landline phone, school mailbox, and address; and education and work information such as the names of schools attending/attended and the current employer.

Moreover, each social network site uses its own syntax and terms for representing users' social data. For example, a piece of text published by a user is called a “tweet” on Twitter but a “post” on Facebook. Therefore, to represent social data from different social network sites, it is necessary to define a common model.
Given the diversity of social data, such a model should be generic as well as extensible, so that it supports the most frequent social data and can easily be extended to accept
new kinds of social data. Based on a comparative study of the social data available on Facebook, Twitter, LinkedIn, and Google+, we have identified the five most frequent dimensions as follows:

1. The Profile Information dimension includes basic information about the user such as name, description, city, email, gender, and location;
2. The Friend dimension represents connections established between the user and others;
3. The Group dimension contains information about the groups in which the user is involved;
4. The Interest dimension lists the user's interests;
5. The Post dimension represents all contents shared by or shared with the user.

Taking these important dimensions into consideration, and based on ActivityStream¹, which is a standardization effort for syndicating activities taken by users in different social network sites, we have built an adapted model as illustrated in Fig. 1. According to the model, a user can have several social accounts from different social networks. Each social account contains a number of attributes corresponding to the Profile Information dimension. It also includes a number of timestamped social activities taken in the same social network. There are at this time four types of Social Activity: (i) the social account publishes a post, (ii) it receives a following post (i.e. a post published by another social network member), (iii) it befriends a social network member, and (iv) it adds an interest. Each social activity refers to some given social data: activities of type (i) are related to the Post-type social data; activities of type (ii) are related to the Post-type and the Member-type social data; activities of type (iii) are related to the Member-type social data; and activities of type (iv) are related to the Interest-type social data, which incorporate the Interest and Group dimensions. The social activities are unique to the corresponding social account whereas the social data are unique to the original social network.
This means that social activities from different social accounts may refer to the same piece of social data. For example, two social accounts may befriend the same social network member and thus receive the same posts from this member.

For the sake of simplicity, in the rest of this paper, we will use the term social data to refer to the associations of Social Activity and Social Data, which are what the users can actually share with their groups. Accordingly, there are four shareable types of social data: Friends, Interests, Posts, and Following Posts.

Note that every subclass of Social Data contains at least one text-valued property, which is used as input for the text-based retrieval techniques that filter social data. For example, a member includes a description. An interest includes a name and a description. A post contains a text and, if it also contains a link, the title and description of the referenced web page are considered as well.

This general social data model is intended to be extensible. If we later identify other important types of social data and would like to include them in the model, all we have to do is add them as subclasses of the class Social Data and declare their corresponding social activities. The existing model is not affected.

3.2. Group's interests and shared contents

3.2.1. Personalized sharing settings

As mentioned above, a group's shared information is made up of the information extracted from its members' social data, which may contain sensitive content that the users do not want to reveal. It is therefore important to give the users control over what they are ready to share with the group instead of systematically sharing all of their aggregated social data. Thus, the proposed model contains features for personalizing membership settings. As shown in Fig. 2, the two classes User and Group are linked through the association class called memberOf, which contains three specific attributes reflecting a member's sharing settings. These are (i) authorized accounts, (ii) authorized data, and (iii) review.

The first and second attributes allow a user to restrict the scope of the social data to share with a given group. For example, let us consider Table 1, where the columns represent different types of social data (e.g. Friends, Interests, Posts, Following Posts) and the rows represent different social accounts (e.g. Facebook, Twitter, LinkedIn). As a member of the group, the user can freely choose which social accounts and which types of social data will be shared with the group. The only rule is that the user has to open at least the Posts-type social data of one of his/her social accounts. As shown in the table, the user decided to share his Twitter and LinkedIn accounts. Consequently, the Posts-type social data from these two social accounts will by default be matched against the group's topics of interest in order to extract the relevant information. Moreover, the user decided to also share the Following Posts-type social data. Other social data like friends and interests will not be disclosed to the group.

The third attribute, called “review”, is optional and complementary to the first two. If enabled, it prevents any information detected as relevant to the group from being immediately shared with the group, and holds it for the user's verification. During the next visit to the group, the user therefore needs to review such information and can delete any sensitive information. This option can furthermore be used as a collaborative filter to remove “false positive” information that the automated filters let through.

It is worth noting that we consider all the members of a group equally. There is at this time no particular need for further specifying different membership types and their respective roles.
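The sharing settings described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Membership`, its field names, and the status strings are hypothetical and are not taken from the paper's prototype. The sketch encodes the three attributes of the memberOf association and the single rule stated in the text (at least the Posts of one social account must be shared).

```python
from dataclasses import dataclass, field
from typing import Set

POSTS = "Posts"
DATA_TYPES = {"Friends", "Interests", POSTS, "Following Posts"}


@dataclass
class Membership:
    """Hypothetical stand-in for the memberOf association class."""
    user: str
    group: str
    authorized_accounts: Set[str] = field(default_factory=set)
    # Posts are selected by default, as in Table 1.
    authorized_data: Set[str] = field(default_factory=lambda: {POSTS})
    review: bool = False

    def validate(self) -> None:
        unknown = self.authorized_data - DATA_TYPES
        if unknown:
            raise ValueError(f"unknown data types: {sorted(unknown)}")
        # Rule from the text: open at least the Posts of one account.
        if not self.authorized_accounts or POSTS not in self.authorized_data:
            raise ValueError("share at least the Posts of one social account")

    def is_shared(self, account: str, data_type: str) -> bool:
        """True if this kind of data from this account may reach the group."""
        return (account in self.authorized_accounts
                and data_type in self.authorized_data)

    def initial_status(self) -> str:
        # With review enabled, matching content waits for the member's check.
        return "pending_review" if self.review else "shared"


# The settings of Table 1: Twitter and LinkedIn, Posts and Following Posts.
settings = Membership("alice", "cars", {"Twitter", "LinkedIn"},
                      {"Posts", "Following Posts"}, review=True)
settings.validate()
```

Modeling the two first attributes as independent sets mirrors the text, where accounts and data types are chosen separately; a per-account choice of data types would need a mapping instead.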
¹ http://www.activitystrea.ms/.

Fig. 1. A general social user modeling.

3.2.2. Topic and selector

To extract the interesting information from the members' social data and to organize the shared contents within a given group, we have applied a two-level structure. The first level, called topics, corresponds to the topics of interest of the group. The second level, called selectors, consists of technical specifications of the corresponding topics. While the topics allow the group to organize and categorize its shared contents by themes, the selectors configure the filtering process. As illustrated in Fig. 3, a group can be interested in a number of topics, each of which contains one or several selectors. There are at this time three types of Selector: (i) Keyword, (ii) Hashtag, and (iii) Concept, which we examine in detail in the next section. A piece of social data is considered as shared content of the group when it matches at least one of the group's selectors. A piece of shared content can therefore be assigned to several topics.
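As a rough illustration of the two-level structure, the following Python sketch assigns a piece of text to every topic that has at least one matching selector. The class and function names are hypothetical, and the matching rule (a case-insensitive comparison of whitespace-separated tokens) is a deliberately naive assumption; the paper's actual filtering relies on indexed search queries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Selector:
    kind: str   # "keyword", "hashtag" or "concept"
    value: str

    def matches(self, text: str) -> bool:
        # Naive assumption: exact token match, ignoring case.
        # Punctuation and query expansion are not handled here.
        return self.value.lower() in text.lower().split()


@dataclass
class Topic:
    name: str
    selectors: List[Selector] = field(default_factory=list)


def assign_topics(text: str, topics: List[Topic]) -> List[str]:
    """Names of all topics with at least one selector matching the text."""
    return [t.name for t in topics
            if any(s.matches(text) for s in t.selectors)]


topics = [
    Topic("Cars", [Selector("keyword", "car"), Selector("hashtag", "#auto")]),
    Topic("Energy", [Selector("keyword", "electric")]),
]
print(assign_topics("New electric car unveiled today", topics))
```

A text that matches selectors of two topics is returned under both, which reflects the statement that shared content can be assigned to several topics.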

3.2.3. Collective and personalized interests

In our approach, the group's topics of interest are not static but dynamic. Furthermore, they are collectively added and specified by the members. In other words, any member can propose a new topic whenever he/she finds it relevant to the group. This collective principle allows the group to benefit from the expertise of each of its members. It is especially useful when dealing with social contents, the topics of which are constantly changing. If the members are active enough, the group will have enriched, up-to-date topics and precise selectors, and will thus end up with more appropriate contents.

However, this collective way of defining the group's topics of interest may lead to a large number of topics. It is unlikely that all topics fit the needs of all members. A given member may personally find some topics too broad, too specific, or even unsuitable. The proposed model therefore includes features allowing each member to personalize his/her interests inside a given group. The member does not need to accept all proposed topics, but can accept only the subset of topics that interest him/her the most. As illustrated in Fig. 4, all the topics of a group are created by the users who are members of the group. Each member is able to follow a new topic or to unfollow an old topic. Following a topic implies default acceptance of its current selectors, but the member can later deselect certain selectors if he/she wants to. The member can moreover suggest new selectors for the topic. Other members following the topic can accept or ignore these new selectors according to their preferences.
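The follow, unfollow, and deselect operations described above can be sketched as follows. The nested-dictionary layout and all function names are assumptions made for this illustration only. The two counting helpers show one possible way to measure how many members follow a topic or a selector, which the paper uses as a relevance signal in the discussion that follows Table 1.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# follows[member][topic] = selectors the member has kept for that topic
Follows = Dict[str, Dict[str, Set[str]]]


def follow(follows: Follows, member: str, topic: str,
           current_selectors: Iterable[str]) -> None:
    # Following a topic accepts all of its current selectors by default.
    follows.setdefault(member, {})[topic] = set(current_selectors)


def deselect(follows: Follows, member: str, topic: str, selector: str) -> None:
    follows[member][topic].discard(selector)


def unfollow(follows: Follows, member: str, topic: str) -> None:
    follows.get(member, {}).pop(topic, None)


def topic_followers(follows: Follows) -> Counter:
    """Number of members following each topic."""
    return Counter(t for topics in follows.values() for t in topics)


def selector_followers(follows: Follows) -> Counter:
    """Number of members keeping each (topic, selector) pair."""
    return Counter((t, s) for topics in follows.values()
                   for t, sels in topics.items() for s in sels)


follows: Follows = {}
follow(follows, "alice", "Cars", ["car", "#auto"])
follow(follows, "bob", "Cars", ["car", "#auto"])
deselect(follows, "bob", "Cars", "#auto")
print(topic_followers(follows), selector_followers(follows))
```

Accepting a selector suggested by another member would simply add it to the member's set for that topic; it is omitted here to keep the sketch short.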

Fig. 2. Membership settings.


Table 1. Example of a member's sharing settings: the light-gray color means that the element has to be shared by default.

Authorized profiles   Authorized social data
                      Friends   Interests   Posts   Following posts
FB                    No        No          No      No
TW                    No        No          Yes     Yes
LI                    No        No          Yes     Yes

These personalization features have a twofold purpose. Firstly, they prevent the members from facing an overload of topics, and subsequently an overload of shared contents. Secondly, they provide the group with a simple means of measuring the relevance of each topic and selector: the more members follow a topic or a selector, the more relevant it is.

4. Proposed system

In this section, we present the two principal modules of a system that is able to support our approach. These are (see Fig. 5): (1) social data aggregation and (2) relevant information filtering. Both modules are detailed in the following subsections.

4.1. Social data aggregation

The social data aggregation module is responsible for gathering the users' social data from the social network sites to which they subscribe. Some techniques consist of automatically searching for the users' various social profiles and crawling those pages to extract the needed information [10]. Such techniques are however not efficient. Firstly, they have to identify unique users across social network sites, which is not a trivial task. Secondly, only a small amount of user information is made public by the social network providers for crawling.

We have adopted another method, which requires the users to explicitly authenticate and authorize access to their different profiles. To this end, we have created a number of programs called aggregators, which are able to deal with the different APIs (e.g. [15,32]) provided by the social network site providers. Each aggregator is dedicated to a specific social network site. Given the users' permissions, the aggregators can request the users' social data from the corresponding APIs at any time. This straightforward method enables extended access to most of the users' social data. All five aforementioned dimensions of social data are thus retrievable. The aggregation is technically possible thanks to the mapping rules hard-coded within each aggregator.
These rules indicate which social data are to be requested and how to assign them to the corresponding entities defined by our data model. To support a new social network site, we have to add an aggregator and include in it the corresponding mapping rules. The aggregators are scheduled to regularly (e.g. once per day, or every two days) retrieve the newly created social data. Alternatively, certain social network sites like Facebook and Twitter provide a very helpful feature called real-time update, to which the aggregators can subscribe in order to receive the users' new social data within a couple of minutes of their occurrence.

4.2. Relevant information filtering

The objective of the relevant information filtering module is to extract from the members' social data the information relevant to the groups' topics of interest. For that purpose, we have applied information retrieval techniques which consider the members' social data as a collection of information resources and the groups' selectors as search queries. Basically, the process is composed of three main steps (see part 2 of Fig. 5): (i) social data indexing, (ii) information searching, and (iii) information indexing.

Step (i) gathers all the newly aggregated social data, enriches them, and indexes them. To enrich social data, we have applied a method that consists of appending to the social data containing external links the content extracted from the referenced web pages (e.g. the titles and the descriptions). This simple technique seems to be very useful, since much social data refers to external web resources. Each piece of enriched social data is considered as a document with multiple fields including its textual contents, its type, its provenance, its owner, and its timestamps. All these fields are then indexed for searching purposes.

Step (ii) is executed per group while taking into consideration the members' sharing settings and personalized interests.
Its main goal is to build suitable queries and to run them against the saved indexes in order to return matching contents. In practice, the list of members of a group, with their respective sharing settings and personalized interests, is first retrieved. Then, the filtering module goes through this list and creates, for each member, as many queries as there are selectors that the member follows. The queries are generated differently according to the type of selector, as follows:

1. A keyword-based selector accepts as value a single word or multiple terms combined with boolean operators (i.e. “AND”, “OR”, “NOT”). The initial value is expanded with its derived forms (e.g. plural or singular) and/or its synonyms using dedicated dictionaries. The final query is the OR-concatenation of the initial value and all of its derived terms, for example “automobile OR automobiles OR car OR auto”.
2. A hashtag-based selector expects as value a valid hashtag, which is a word or a phrase prefixed with the symbol “#”. The hashtag-based selection is actually an “upstream” effort. When posting content on a social network site, a user may append a commonly agreed hashtag to it to directly indicate its relevance. Therefore, we do not need to expand the hashtag value further.
3. A concept-based selector requires a referenceable concept belonging to an ontology that is publicly accessible via a SPARQL query endpoint. A group is encouraged to use its domain-specific ontology. Otherwise, it may be interested in DBpedia², which is a generic knowledge base containing millions of multi-language and multi-domain entities. The configured concept is used to request its formal definition, based on which the final query is created. At this time, the final query is the OR-concatenation of the multi-language labels of the concept. For example, the concept DBpedia:automobile leads to the query “automobile OR automobil OR automóvil”. In the future, the final query could be further expanded with the labels of concepts related to the initial concept, making it less ambiguous and more precise.

Fig. 3. Topic and selector structure.

The created queries are at this stage not complete, as they do not yet consider the member's sharing settings. They should therefore incorporate additional conditions, for example “type:Post or FollowingPost”, “provenance:Twitter”. The matching contents returned by these multi-field queries are then indexed during step (iii). This indexing step is important for improving and speeding up access to the shared contents within a group. Note that, when a member follows some particular selectors, it also means that the member will see only the contents matching these selectors.
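The query construction described above can be sketched in Python as follows. Several parts are assumptions made for illustration: the two lookup tables stand in for the dedicated dictionaries and for the labels that would be fetched from a SPARQL endpoint, the function names are hypothetical, and the Lucene-like `field:(...)` syntax with a `text` field is only one plausible rendering of the multi-field conditions quoted in the text. Keyword values that already contain boolean operators are not handled.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-ins for dictionary and ontology lookups.
DERIVED_FORMS: Dict[str, List[str]] = {
    "automobile": ["automobiles", "car", "auto"],
}
CONCEPT_LABELS: Dict[str, List[str]] = {
    "DBpedia:automobile": ["automobile", "automobil", "automóvil"],
}


def or_join(terms: Iterable[str]) -> str:
    return " OR ".join(terms)


def selector_query(kind: str, value: str) -> str:
    """Text part of the query for a single selector."""
    if kind == "keyword":
        # Initial value followed by its derived forms and synonyms.
        return or_join([value] + DERIVED_FORMS.get(value, []))
    if kind == "hashtag":
        return value  # used as-is, no expansion
    if kind == "concept":
        # OR-concatenation of the concept's multi-language labels.
        return or_join(CONCEPT_LABELS[value])
    raise ValueError(f"unknown selector kind: {kind}")


def member_query(kind: str, value: str, data_types: Iterable[str],
                 accounts: Iterable[str]) -> str:
    """Complete query: selector terms plus the member's sharing settings."""
    return (f"text:({selector_query(kind, value)})"
            f" AND type:({or_join(data_types)})"
            f" AND provenance:({or_join(accounts)})")


print(member_query("keyword", "automobile",
                   ["Post", "FollowingPost"], ["Twitter"]))
```

One such query would be built per member and per followed selector, so the sharing settings act as a filter on top of the selector terms rather than as a separate pass.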
Each piece of shared content contains the reference of the corresponding social data and three additional indexes: the member who owns it, the group with which it is shared, and the selector that it matches. With these indexes, it is easy and quick to answer queries like:

• Return all contents shared with a given group,
• Return all contents shared by a given member with a given group,
• Return all contents shared by a given member with a given group and matching a given topic,
• Return all contents matching a given selector,
• Return all contents matching a given topic.
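The five queries listed above can be sketched with a small in-memory index. This is only an illustration under stated assumptions: the class and field names are hypothetical, selector identifiers are assumed to be unique, a content item matching several selectors is stored once per selector, and the prototype's actual indexing technology is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class SharedContent:
    data_ref: str   # reference to the underlying social data item
    member: str     # index 1: the member who owns it
    group: str      # index 2: the group it is shared with
    selector: str   # index 3: the selector it matches


class SharedContentIndex:
    def __init__(self, topic_selectors: Dict[str, Iterable[str]]):
        # topic -> selectors, mirroring the topic and selector structure
        self.topic_selectors = {t: set(s) for t, s in topic_selectors.items()}
        self.by_group = defaultdict(set)
        self.by_member = defaultdict(set)
        self.by_selector = defaultdict(set)

    def add(self, content: SharedContent) -> None:
        self.by_group[content.group].add(content)
        self.by_member[content.member].add(content)
        self.by_selector[content.selector].add(content)

    def matching_selector(self, selector: str) -> Set[SharedContent]:
        return set(self.by_selector.get(selector, ()))

    def matching_topic(self, topic: str) -> Set[SharedContent]:
        # A topic matches whatever any of its selectors matches.
        result: Set[SharedContent] = set()
        for selector in self.topic_selectors.get(topic, ()):
            result |= self.by_selector.get(selector, set())
        return result

    def shared_with(self, group: str, member: Optional[str] = None,
                    topic: Optional[str] = None) -> Set[SharedContent]:
        # Intersect the per-index sets instead of scanning all contents.
        result = set(self.by_group.get(group, ()))
        if member is not None:
            result &= self.by_member.get(member, set())
        if topic is not None:
            result &= self.matching_topic(topic)
        return result
```

Calling `shared_with(group)`, `shared_with(group, member=m)`, and `shared_with(group, member=m, topic=t)` covers the first three queries, while `matching_selector` and `matching_topic` cover the last two.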

5. Web-based prototype and preliminary test

5.1. Web-based prototype

Based on the previous system architecture, we have implemented a first prototype called SoCoSys, standing for social and collective system. This web-based service, accessible online (see footnote 3), allows the user to register with a unique email and a password, and then to create an aggregated profile. At this time, the user can authorize SoCoSys to access his/her Facebook, LinkedIn, and Twitter profiles, and thus to aggregate his/her social data.

Footnote 2: http://dbpedia.org/About.
Footnote 3: http://212.129.40.98/scs/.


Fig. 4. Collective and personalized interests within a group.

Fig. 5. System modules: (1) social data aggregation and (2) relevant information filtering.


Fig. 6. The user's aggregated data: (1) different views.

Once this is completed, the user can view his/her aggregated social data, which are arranged into five views: profile information, friends, posts, following posts, and interests, corresponding to the social account entity and the four types of social data (see Fig. 6). These social data are regularly updated according to the actual aggregation frequency. Moreover, they are exclusive to the user, and no one else can see them.

Then, the user can view the list of groups that he/she belongs to. The user can also create a new group, or search for and join an existing group (see Fig. 7). We have distinguished two types of groups: (i) private groups and (ii) open groups. The former are accessible only by invitation, whereas the latter are open to any SoCoSys user. The user can therefore choose to create either open or private groups. We have also added special features called notifications. These notifications show the user the latest news of each group, for example the number of new members, the number of new topics, the number of newly shared contents, and the number of contents to review. They thus help the user to decide which groups to visit first.

After joining a group, the user is encouraged to edit the default sharing settings according to his/her preferences. The user then needs to select some topics to follow among all recent topics, and subsequently to choose the appropriate selectors for each followed topic (see Fig. 8). Furthermore, the user can propose his/her own topics and selectors as well.

Fig. 7. The user's groups: (1) groups that the user belongs to, (2) keyword-based group search feature, (3) group creation feature, (4) group recommendation feature, (5) notification features.


Fig. 8. The user's personalized interests: (1) followed topics, (2) topic creation feature, (3) selectors suggested and followed by the different members, (4) selector suggestion feature, (5) accepted selectors.

Finally, the user can view all the shared contents matching his/her personalized interests in chronological order. It is possible to filter contents by selecting a particular topic (see Fig. 9). For each shared content, the user can furthermore vote it as “relevant” or “irrelevant”, which helps others to focus on the most relevant contents first. If a piece of content receives a certain number of irrelevant votes (e.g. 2 or 3 if the group size does not exceed 10 members), it will be deleted from the group's shared contents. The user can also permanently delete a piece of content if it was extracted from his/her social data.

5.2. Preliminary test

We invited a small group of users to test this prototype. The test was not meant to be a real experiment, but to observe how the users understand the proposed approach and use the provided services. The test group consisted of ten volunteer international PhD students at the University of Technology of Compiègne, who are regular users of social network sites. They were introduced to our user-centered approach for integrating social data into groups of interest and to the operation and specific features of the prototype. After one month of observation (from June 1st to 30th, 2014), we found that:

1. All 10 participants granted access to their Facebook profiles, and 6 of them connected at least one other profile. This suggests that the participants consider Facebook an important source of information and contents of interest.


Fig. 9. The group's shared contents: (1) shared contents, (2) the user's followed topics.

2. On average, the participants had more than 300 friends on Facebook and received nearly 180 following posts per day. Such a large volume of content would require the participants to spend considerable time and effort to select the good contents.
3. About 90% of the saved following posts contained at least one URL. This is consistent with the fact that social network sites play an increasing role in supporting the production and diffusion of information and content.
4. During the test, the participants created 10 groups in total, of which 6 were private groups and 4 were open groups.
5. 8 out of 10 participants joined open groups, and most of them did not edit their respective sharing settings and also disabled the review option. This could be explained by the fact that they already knew and trusted each other.
6. The topics of interest of the groups varied widely, from general areas like football and politics to specialized areas like social media and social responsibility. Furthermore, every group received a number of contents extracted from its members' social data (e.g. between 34 and 300 pieces of content). This indicates that social network sites can provide a wide range of contents.
7. The two most “successful” groups were the two open groups football and politics, which had 6 and 4 members respectively. This is understandable given the very broad and common centers of interest of these two groups.
8. Within the open groups, the participants mainly created topics following major real-world events, for example “the FIFA world cup” or “the Brazilian general election”, while in the private groups they preferred more static topics, for example “photography” or “guitar”.
9. The members of the open groups were quite active in suggesting selectors. They mostly used keyword-based and hashtag-based selectors, with which they were already familiar. Moreover, they did not systematically accept every selector, but chose those corresponding to their interests.
10. There were two behaviors when suggesting selectors: (i) adding multi-language or synonymous terms, and (ii) adding specialized terms. For example, in the case of “the FIFA world cup”, some added the three keywords “world cup”, “coupe du monde”, and “copa do mundo” to be able to follow the event in different languages, while others added “England football team” to keep track of a specific element of the event.

These findings support our initial assumptions for a user-centered approach for integrating social data into groups of interest. The participants were clearly aware of the potential of social network sites as sources of interesting information and useful contents. They also saw the value of sharing with and learning from others in a group setting.

6. Discussion

With the proliferation of social web sites, especially social network sites, the number of related studies in different fields, ranging from information systems and communication to marketing, is constantly increasing [5]. Kwak and Lee [22] have investigated the current state of research related to social network sites. By adapting the New Media Evolutionary Model proposed by Wimmer and Dominick [37], the authors have attempted to divide studies on social network sites into four research trends: (1) focus on social network sites themselves (e.g. definition and history, features), (2) focus on uses and users of social network sites (e.g. user behaviors, motivations), (3) focus on effects of social network sites (e.g. privacy and security issues, social outcomes, education effects), and (4) focus on how social network sites can be improved (e.g. social games, social commerce). However, this division and the most cited studies principally focus on the social aspects of social network sites.

Another, more application-oriented research direction on social network sites also exists, which tries to extract and exploit the data generated by social network sites. Social network analysis is one stream of this direction. It gathers works that first crawl the underlying graph from social network sites [8] and then apply graph algorithms to investigate latent structures and organization inside this graph. They thereby make it possible to build real applications such as community detection [24], information propagation analysis [23], and influential user identification [36,30].

Some specific works can be classified as social prediction research. These works take advantage of the real-time nature and large volume of social data to identify or predict real-world events. For example, Sakaki et al. [29] have proposed an earthquake reporting system, which is intended to detect an earthquake in Japan with a high probability merely by monitoring tweets. Bollen et al. [6] have extracted public mood states from public tweets to improve the accuracy of the prediction of the daily closing values of the stock market. Other similar works have been introduced in [40,33,12].

While these two research streams are oriented to the crowd on social network sites, the third one is rather user-centric. It exploits a user's information available on social network sites to learn more about the user and then adapt the user experience within third-party systems. Recently, most related works have been recommendation-oriented, for example book and movie recommenders [31], news recommenders [3,27], and people recommenders [20,25,35].
The first part of our work belongs to this user-centric research stream. It focuses on user-centered social network aggregation. Unlike systems called social network aggregators, such as [16,17,39], which allow users to aggregate their different social accounts in a single location for personal use [34], ours is extended to also support information sharing. To our knowledge, this is the first time that a user-centered approach for integrating social data into groups of interest has been proposed.

Our research requires the definition of a common data model. There are many user modeling approaches in the literature, ranging from general-purpose models [9] to specialized models dedicated to social web sites [1]. As mentioned above, our proposed model is designed as an extensible model so that it can handle the most frequent social data dimensions and easily include upcoming additional dimensions.

Another originality of our work is the adoption of a semi-automated solution for filtering information. The centers of interest of a group are collectively added and specified by its members over time. This allows the group to closely follow the emerging topics in social network sites. The solution furthermore relies on user-friendly methods like hashtags and keywords, which do not require the users to have specific knowledge and are generic enough to cope with the heterogeneous and text-based nature of social data.

7. Future work

Our ongoing work focuses on two major tasks. The first task is to carry out a more detailed evaluation of the proposed system. The second task consists of studying the benefit and the feasibility of some advanced features beyond the purpose of enhancing information sharing.

7.1. Evaluation

Although our preliminary test provided statistical indications of the users' interest in our approach and of its usefulness, we would like to better evaluate the proposed system, SoCoSys, on more qualitative aspects such as ease of use, satisfaction, best practices, and so on. Thus, we have started another test with a slightly larger set of users, namely 13 computer engineering students at the University of Technology of Compiègne. After the observation period, we will also send them a questionnaire to fill out with their own opinions and notes. This explicit feedback will help us to evaluate and improve our system.

7.2. Advanced features

We have identified some possible advanced features that extend the primary use of our approach, which is to enhance the information sharing process within groups of interest. These features are especially dedicated to group awareness, which is the understanding of the activities of each member of the group [14]. We discuss three of them below.

7.2.1. New topic discovering

This feature makes it possible to discover new topics, which are emerging keywords from the group's shared contents. A keyword is defined as emerging if it frequently occurs in a specified time interval, but not in previous ones [11]. In our case, the considered time interval can be a week or a couple of weeks. When a new topic is created from a frequent keyword, the system will propose it to all members of the group so that they can set it as one of their topics of interest. They may then also complete it by adding additional selectors.

7.2.2. Members' interest profiling

With the proposed group model, it is statistically possible to find out which topic has recently interested the members of a given group the most, or which members are the most interested in a given topic. However, it does not give a picture of a given member's evolving interests and, subsequently, of his/her degrees of interest toward different topics. For that purpose, a user interest profiling feature is needed. This feature can compute the user's degree of interest in a given topic based on the number of shared contents that belong to the user and are related to the topic, taking their timestamps into consideration as well. We could rely on related works such as [2,28] for implementing this feature.
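The two features of Sections 7.2.1 and 7.2.2 can be sketched together as follows. This is a minimal illustration, not the method of [11] or of [2,28]: the thresholds `min_count` and `ratio`, the exponential decay, and the 14-day half-life are assumptions chosen for the example, and the keyword lists are assumed to have been extracted from the shared contents beforehand.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Sequence


def emerging_keywords(current: Sequence[str],
                      previous: Sequence[Sequence[str]],
                      min_count: int = 3,
                      ratio: float = 2.0) -> List[str]:
    """Keywords frequent in the current interval but not in previous ones.

    `current` holds one entry per keyword occurrence in the latest interval;
    `previous` holds one such list per earlier interval.
    """
    now = Counter(current)
    baseline = {}
    if previous:
        totals = Counter()
        for interval in previous:
            totals.update(interval)
        # Average number of occurrences per earlier interval.
        baseline = {k: v / len(previous) for k, v in totals.items()}
    return sorted(
        k for k, n in now.items()
        if n >= min_count and n >= ratio * baseline.get(k, 0.0)
    )


def interest_degree(timestamps: Iterable[datetime],
                    now: datetime,
                    half_life_days: float = 14.0) -> float:
    """Time-weighted count of a member's shared contents on one topic.

    Each content contributes 1 when just shared and half as much after
    every `half_life_days`, so recent activity counts more.
    """
    degree = 0.0
    for ts in timestamps:
        age_days = max((now - ts).total_seconds() / 86400.0, 0.0)
        degree += 0.5 ** (age_days / half_life_days)
    return degree
```

With weekly intervals, a keyword seen four times this week and once over the two previous weeks would be proposed as a new topic, whereas a keyword that appears three times every week would not.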

7.2.3. Group-related decision making support

A decision support feature consists of providing decision-makers with key information and/or its synthetic forms about a given problem. In the scope of our approach, we are especially interested in three types of group-related decisions: (1) expert finding, (2) project team building, and (3) group competency.

In the first case, the task is to find the right member for a given task requiring specific knowledge and skills. Based on the members' interest profiles, this task can be easily solved. The more complete and up to date the profiles are, the better the decision will be. The second task follows from the first one. Once people with the right expertise have been identified for a project, it is interesting to select among them the best candidates to form a good team. In this case, we can also base the decision on their mutual affinity, with the assumption that the more they interact, the better they know each other and the more efficiently they could work together. The last type of decision, group competency, is the most interesting and is really strategic for any professional group. By putting all members of the group and their respective interests at the same level, we can obtain an up-to-date overview of the group's available competency. We therefore know the topics on which the group is the most and the least competent, according to the number of members interested in each topic.

There are different approaches to displaying key information and its synthetic forms. Given our types of decisions and the available information, we can choose a visual approach which consists of representing a graph of the members and the topics of interest of a group. For example, Fig. 10 illustrates some focuses on such a graph: (1) a focus on a member's relationships with other members of the group, (2) a focus on the topics a member is interested in, and (3) a focus on a topic and the members who are interested in it. A missing edge between two elements means that we do not have the corresponding information. The size of related elements shows their importance relative to the considered element.
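The three types of decisions can be illustrated with a short Python sketch. It assumes that each member's interest profile is available as a mapping from topic to degree of interest and that pairwise interaction counts are known; these data structures, the function names, and the exhaustive team search are hypothetical choices for the example, suitable only for small groups.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Profiles = Dict[str, Dict[str, float]]  # member -> {topic: interest degree}


def find_experts(profiles: Profiles, topic: str, k: int = 3) -> List[str]:
    """Expert finding: the k members with the highest interest in a topic."""
    scored = [(p.get(topic, 0.0), m) for m, p in profiles.items()
              if p.get(topic, 0.0) > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [member for _, member in scored[:k]]


def best_team(candidates: Sequence[str],
              interactions: Dict[FrozenSet[str], int],
              size: int) -> Tuple[str, ...]:
    """Team building: the team of `size` candidates with the highest total
    pairwise interaction count (requires size <= len(candidates))."""
    def affinity(team: Tuple[str, ...]) -> int:
        return sum(interactions.get(frozenset(pair), 0)
                   for pair in combinations(team, 2))
    return max(combinations(sorted(candidates), size), key=affinity)


def group_competency(profiles: Profiles) -> List[Tuple[str, int]]:
    """Group competency: topics ranked by the number of interested members."""
    counts: Dict[str, int] = {}
    for profile in profiles.values():
        for topic, degree in profile.items():
            if degree > 0:
                counts[topic] = counts.get(topic, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The same mappings could also feed the graph view of Fig. 10, with node sizes derived from the interest degrees or interaction counts.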

8. Conclusion

In this paper, we have introduced a user-centered approach for integrating social data into groups of interest. This approach enables a group of interest to extend its internal collaboration to social network sites, in particular large-scale social networks. More specifically, it makes it possible to aggregate the social data of the group's members and to extract from these data the information relevant to the group's interests.

Our first contribution is a social data model and a group model. The first model represents the users and their social data aggregated from different social network sites. It is generic enough to handle the most important dimensions of social data available on social network sites, and extensible enough to easily include upcoming additional dimensions. The second model represents the groups of interest, their topics of interest, and their shared contents. It also contains features allowing each member to adapt his/her sharing settings and personalize his/her interests within a given group. Note that the topics of interest of the group are collectively added and specified by its members over time thanks to a topic-selector structure.

Our second contribution is the system structure able to support our approach. This system is basically composed of two main modules responsible for two particular tasks: (i) social data aggregation and (ii) relevant information filtering. The first task is straightforward. The second task is essentially carried out with information retrieval techniques. Three different classes of selectors (i.e. keyword-based, concept-based, and hashtag-based selectors) have thus been developed. The keyword-based and hashtag-based selectors are already user-friendly, whereas the concept-based selectors require knowledge of some domain ontologies.

We have implemented a first web-based prototype called SoCoSys, which supports three social network sites, namely Facebook, Twitter, and LinkedIn. A preliminary test using SoCoSys has been carried out with a small group of volunteer participants. The findings indicate the users' interest in our approach, as well as its usefulness.

Fig. 10. Members/topics graphs: (1) one's relationship, (2) one's interested topics, (3) members who are interested by a given topic.


References

[1] A. Abdel-Hafez, Y. Xu, A survey of user modelling in social media websites, Comput. Inf. Sci. 6 (2013) 59–71, http://dx.doi.org/10.5539/cis.v6n4p59.
[2] F. Abel, Q. Gao, G. Houben, K. Tao, Analyzing temporal dynamics in twitter profiles for personalized recommendations in the social web, Proceedings of the ACM WebSci'11, Koblenz, Germany 2011, pp. 1–8.
[3] F. Abel, Q. Gao, G.J.G. Houben, K. Tao, Analyzing user modeling on twitter for personalized news recommendations, Proceedings of the 19th International Conference on User Modeling, Adaption, and Personalization, Springer-Verlag, Berlin, Heidelberg 2011, pp. 1–12.
[4] H. Ajmera, Social media facts, figures and statistics 2013, URL: http://blog.digitalinsights.in/social-media-facts-and-statistics-2013/0560387.html, 2013.
[5] K. Berger, J. Klier, M. Klier, F. Probst, A review of information systems research on online social networks, Commun. Assoc. Inf. Syst. 35 (2014) 145–172.
[6] J. Bollen, H. Mao, X.J. Zeng, Twitter Mood Predicts the Stock Market, 2010. (CoRR abs/1010.3003).
[7] D.M. Boyd, N.B. Ellison, Social network sites: definition, history, and scholarship, J. Comput.-Mediat. Commun. 13 (2007) 210–230, http://dx.doi.org/10.1111/j.1083-6101.2007.00393.x.
[8] F. Buccafurri, G. Lax, A. Nocera, D. Ursino, Moving from social networks to social internetworking scenarios: the crawling perspective, Inf. Sci. 256 (2014) 126–137, http://dx.doi.org/10.1016/j.ins.2013.08.046.
[9] F. Carmagnola, F. Cena, C. Gena, User model interoperability: a survey, User Model. User-Adap. Inter. 21 (2011) 285–331, http://dx.doi.org/10.1007/s11257-011-9097-5.
[10] F. Carmagnola, F. Osborne, I. Torre, User data distributed on the social web: how to identify users on different social systems and collecting data about them, Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems, ACM, New York, NY, USA 2010, pp. 9–15, http://dx.doi.org/10.1145/1869446.1869448.
[11] M. Cataldi, L. Di Caro, C. Schifanella, Emerging topic detection on twitter based on temporal and social terms evaluation, Proceedings of the Tenth International Workshop on Multimedia Data Mining, ACM, New York, NY, USA 2010, pp. 4:1–4:10, http://dx.doi.org/10.1145/1814245.1814249.
[12] Y. Chun, H. Hwang, C. Kim, Development of a disaster information extraction system based on social network services, Int. J. Multimed. Ubiquit. Eng. 9 (2014) 255–264, http://dx.doi.org/10.14257/ijmue.2014.9.1.24.
[13] R. Cohen, N. Sardana, K. Rahim, D.Y. Lam, M. Li, O. Maccarthy, E. Woo, G. Guo, Reducing information overload in social networks through streamlined presentation: a study of content-centric and person-centric contexts towards a generalized algorithm, in: S. Marsh, J. Zhang, C. Jensen, Z. Noorian, Y. Liu (Eds.), 3rd Workshop on Incentives and Trust in E-Communities, Québec City, Québec, Canada 2014, pp. 13–18.
[14] P. Dourish, V. Bellotti, Awareness and coordination in shared workspaces, Proceedings of the 1992 ACM Conference on Computer-supported Cooperative Work, ACM, New York, NY, USA 1992, pp. 107–114, http://dx.doi.org/10.1145/143457.143468.
[15] Facebook-Graph-API, URL: https://developers.facebook.com/docs/reference/api/.
[16] FriendFeed, Friendfeed is the easiest way to share online, URL: http://friendfeed.com/.
[17] Gathera, 2013, With Gathera, you access all your web accounts in one place, URL: http://www.gathera.com/.
[18] M. Gomez-Rodriguez, K.P. Gummadi, B. Schölkopf, Quantifying Information Overload in Social Media and its Impact on Social Contagions, 2014. (CoRR abs/1403.6838).
[19] F. Henri, B. Pudelko, Understanding and analysing activity and learning in virtual communities, J. Comput. Assist. Learn. 19 (2003) 474–487, http://dx.doi.org/10.1046/j.0266-4909.2003.00051.x.
[20] D. Horowitz, S.D. Kamvar, The anatomy of a large-scale social search engine, Proceedings of the 19th International Conference on World Wide Web, ACM, New York, NY, USA 2010, pp. 431–440, http://dx.doi.org/10.1145/1772690.1772735.
[21] A. Java, X. Song, T. Finin, B. Tseng, Why we twitter: understanding microblogging usage and communities, Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, ACM, San Jose, California 2007, pp. 56–65.
[22] H. Kwak, H.G. Lee, A review of research on social network services using the new media evolutionary model, Informatization Policy 18 (2011) 3–24.
[23] K. Lerman, R. Ghosh, T. Surachawala, Social Contagion: an Empirical Study of Information Spread on Digg and Twitter Follower Graphs, 2012. (CoRR abs/1202.3162).
[24] K.H. Lim, A. Datta, Finding twitter communities with common interests using following links of celebrities, Proceedings of the 3rd International Workshop on Modeling Social Media, ACM, New York, NY, USA 2012, pp. 25–32, http://dx.doi.org/10.1145/2310057.2310064.
[25] P.D. Meo, A. Nocera, G. Terracina, D. Ursino, P. De Meo, Recommendation of similar users, resources and social networks in a social internetworking scenario, Inf. Sci. 181 (2011) 1285–1305, http://dx.doi.org/10.1016/j.ins.2010.12.001.
[26] M. Naaman, J. Boase, C.h. Lai, Is it really about me? Message content in social awareness streams, Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work 2010, pp. 189–192.
[27] S. O'Banion, L. Birnbaum, K. Hammond, Social media-driven news personalization, Proceedings of the 4th ACM RecSys Workshop on Recommender Systems and the Social Web, ACM, New York, NY, USA 2012, pp. 45–52, http://dx.doi.org/10.1145/2365934.2365943.
[28] F. Orlandi, J. Breslin, A. Passant, Aggregated, interoperable and multi-domain user profiles for the social web, Proceedings of the 8th International Conference on Semantic Systems, ACM, New York, NY, USA 2012, pp. 41–48, http://dx.doi.org/10.1145/2362499.2362506.
[29] T. Sakaki, M. Okazaki, Y. Matsuo, Earthquake shakes twitter users: real-time event detection by social sensors, Proceedings of the 19th International Conference on World Wide Web, ACM, New York, NY, USA 2010, pp. 851–860, http://dx.doi.org/10.1145/1772690.1772777.
[30] D. Schall, Expertise ranking using activity and contextual link measures, Data Knowl. Eng. 71 (2012) 92–113, http://dx.doi.org/10.1016/j.datak.2011.08.001.
[31] B. Shapira, L. Rokach, S. Freilikhman, Facebook single and cross domain data for recommendation systems, User Model. User-Adap. Inter. 23 (2013) 211–247, http://dx.doi.org/10.1007/s11257-012-9128-x.
[32] Twitter-REST-API, URL: https://dev.twitter.com/docs/api/1.1.
[33] K.N. Vavliakis, A.L. Symeonidis, P.A. Mitkas, Event identification in web social media through named entity recognition and topic modeling, Data Knowl. Eng. 88 (2013) 1–24, http://dx.doi.org/10.1016/j.datak.2013.08.006.
[34] C. Virmani, A. Pillai, D. Juneja, Study and analysis of social network aggregator, Optimization, Reliability, and Information Technology (ICROIT), India 2014, pp. 145–148, http://dx.doi.org/10.1109/ICROIT.2014.6798314.
[35] T. Vu, A. Baid, Ask, don't search: a social help engine for online social network mobile users, 2012 35th IEEE Sarnoff Symposium, IEEE 2012, pp. 1–5, http://dx.doi.org/10.1109/SARNOF.2012.6222758.
[36] J. Weng, E.P. Lim, J. Jiang, Q. He, TwitterRank: finding topic-sensitive influential twitterers, Proceedings of the Third ACM International Conference on Web Search and Data Mining, ACM, New York, NY, USA 2010, pp. 261–270, http://dx.doi.org/10.1145/1718487.1718520.
[37] R. Wimmer, J. Dominick, Mass media research: an introduction, Wadsworth Series in Mass Communication and Journalism, Wadsworth Pub, 2000. (URL: http://books.google.fr/books?id=o7UXFfIazWYC).
[38] M. Wu, Community vs. social network, URL: http://lithosphere.lithium.com/t5/science-of-social-blog/Community-vs-Social-Network/ba-p/5283, 2010.
[39] J. Zhang, Y. Wang, J. Vassileva, SocConnect: a personalized social network aggregator and recommender, Inf. Process. Manag. 49 (2013) 721–737, http://dx.doi.org/10.1016/j.ipm.2012.07.006.
[40] B. Zhao, Z. Zhang, W. Qian, A. Zhou, Identification of collective viewpoints on microblogs, Data Knowl. Eng. 87 (2013) 374–393, http://dx.doi.org/10.1016/j.datak.2013.05.003.
