Expert Systems with Applications 41 (2014) 2345–2352
A defence scheme against Identity Theft Attack based on multiple social networks
Bing-Zhe He a, Chien-Ming Chen b, Yi-Ping Su a, Hung-Min Sun a,*
a Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, ROC
b Innovative Information Industry Research Center, School of Computer Science and Technology, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Article info
Keywords: Identity Theft Attack; Social networks; Multi-dimensional social network
Abstract
On-line social networking sites have recently become increasingly popular, and people like to share personal information such as their name, birthday and photos on these public sites. However, this personal information can be misused by attackers. This paper addresses one such attack on on-line social networking sites, the Identity Theft Attack. After collecting a victim's personal information, the attacker creates a fake identity to impersonate the victim and deceive the victim's friends, destroying the trust relationships on on-line social networking sites. In this paper, we propose a scheme to protect users from Identity Theft Attacks. With our scheme, users' personal information can still be kept public, so the scheme does not violate the nature of social networks. Compared with previous works, the proposed scheme incurs less overhead for users, and experimental results demonstrate its practicality.
© 2013 Elsevier Ltd. All rights reserved.
* Corresponding author. Address: No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan 30013, ROC. Tel.: +886 3 5742968; fax: +886 3 5714787.
E-mail addresses: [email protected] (B.-Z. He), chienming.taiwan@gmail.com (C.-M. Chen), [email protected] (Y.-P. Su), [email protected] (H.-M. Sun).
0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.09.032
1. Introduction
With the popularity of on-line social networking sites, more and more people are becoming members of these sites and sharing their personal information (Boyd & Ellison, 2007; Zhou, Nie, & Li, 2010). Users of on-line social networking sites exhibit three behaviors. First, people tend to share personal information such as their name, birthday, E-mail address and photos on on-line social networking sites. The default setting for a user's profile is usually public, and most users never change it; this large amount of public personal data attracts attackers' attention. Second, people tend to believe information shared by their friends. If an attacker manages to impersonate a friend of a victim and spread information, the victim may be deceived by it. Third, it is common for a user to register accounts on multiple on-line social networking sites with the same public username. As a result, protecting users' personal information and identities from misuse has become an important issue for on-line social networking sites (Cutillo, Manulis, & Strufe, 2010; Gross & Acquisti, 2005; Su, 2011; Zheleva and Getoor, 2009). Adversaries have developed many ways to attack on-line social networks for profit. One attack, called the "Identity Theft Attack", was introduced by Bilge, Strufe, Balzarotti, and Kirda. Their work presents two attacks: the profile cloning attack and the cross-site profile cloning attack. In the profile cloning attack, an attacker creates a fake account with the victim's name and photograph on the same social networking site and sends friend requests to the victim's friends. If the victim's friends do not realize that the request comes from a fake identity, they accept the attacker as a friend. The attacker can then rebuild the victim's friend network and make the fake identity look even more like the victim. In addition, because the attacker can browse the profiles of the victim's friends, she can create more fake identities from these data. In the cross-site profile cloning attack, on the other hand, an attacker steals the victim's information and creates a fake account on another social networking site that the victim does not use. The attacker then contacts the victim's friends who are on both sites. In their experiments, both attacks were successful, which also demonstrates that most users are not aware that their identities on on-line social networking sites may be stolen. The Identity Theft Attack violates the victims' rights and undermines the trust relationships they build on on-line social networking sites. Improving privacy settings is one way to resist the Identity Theft Attack: if no user disclosed personal information on social networking sites, the attack might not be a problem. However, this violates the nature of social networks and turns them into "non-social" networks. Besides on-line social networking sites, E-mail and Instant Messengers such as Windows Live Messenger or Skype also store
a user's social networks. Based on our questionnaires, we study the relationships among a user's multi-dimensional social networks, including on-line social networking sites, E-mail and Instant Messenger, and identify several characteristics that help protect users from Identity Theft Attacks. We divide a user's on-line social network into three one-dimensional networks: on-line social networking sites, E-mail and Instant Messenger. By exploiting the interaction among these networks, we propose a scheme to protect users from Identity Theft Attacks. Specifically, we define a model for Identity Theft Attacks in multi-dimensional social networks and then propose a defence scheme against them. In the proposed scheme, users' personal information can still be kept public, so the scheme does not violate the nature of social networks. Compared with previous works, the proposed scheme incurs less overhead for users, and experimental results demonstrate its practicality. The rest of this paper is organized as follows. We discuss related work in Section 2. The system model is defined in Section 3. In Section 4, the proposed scheme is described. The experiments and analysis are presented in Section 5. We discuss limitations and issues in Section 6. Finally, we conclude this work in Section 7.
2. Related work
2.1. Defending against Identity Theft Attack
To protect users from Identity Theft Attacks, some on-line social networking sites provide privacy settings that help users manage their personal information and prevent profile cloning. On Facebook, users can edit their "Privacy settings" to decide what information may be browsed and who may browse it. On other social networking sites, such as Wretch.cc and Pixnet, users have similar control over their privacy. However, studies show that most users keep the default settings rather than customizing them (Gross & Acquisti, 2005; Strater & Lipford, 2008; Fang & LeFevre, 2010; Church, Anderson, Bonneau, & Stajano, 2009). One reason for this behavior is the complexity of the user interface. Fang and LeFevre propose a template design for a Privacy Wizard that configures a user's privacy settings automatically. The wizard asks the user to assign privacy "labels" to some friends, uses this input to build a machine learning classifier, and lets the classifier automatically assign privileges to the user's remaining, unlabelled friends. They claim that users can complete their privacy settings with limited input. Although improving privacy settings is one way to resist Identity Theft Attacks, users have to trade availability off against security. Bilge et al., on the other hand, suggest three solutions. The first is to add extra information, such as the country inferred from the IP address and the profile creation time. The second is to increase the difficulty of CAPTCHAs. The third is for on-line social network service providers to use behavior-based anomaly detection techniques to detect and block malicious activities. Jin, Takabi and Joshi (2011) proposed two approaches, based on attribute similarity and friend network similarity, to discover suspicious identities.
Their attribute-similarity approach measures the fraction of profile attributes that are identical between the victim and the suspicious account. The scheme assumes that the victim and the fake identity exist on the same social networking site, so it cannot detect cross-site profile cloning. Detecting cross-site profile cloning is difficult because the victim does not know that her identity has been spoofed on other social networking sites that she has never used. Once the attacker launches the Identity Theft Attack with
cross-site profile cloning, the victim's friends become victims of fraud. Very recently, Goga, Lei, Parthasarathi, Friedland, Sommer and Teixeira also studied the problem of identifying the same user across sites. They use three features (geo-location, post timestamps and writing style) on Yelp, Flickr and Twitter. Their results have significant privacy implications, showing that posts can provide enough information to correlate accounts. However, this method requires collecting a large amount of data, such as ZIP codes, timing patterns and users' posts, which seems impractical for individual users. Identity theft occurs not only on on-line social networking sites but also in real life (Definition of Identity Theft and Attack), where it often results in financial loss (Atkins & Huang, 2013). In the U.S., identity theft is one of the fastest growing crimes (Identity Theft). Some companies, such as Identity Guard, TrustedID, LifeLock and Protect my ID, help people protect their identities with services covering customers' credit, Internet use, computers, mobile devices, etc.
2.2. E-mail address as a unique identifier
The Login Account As An Identifier approach stems from our observation of on-line social networking sites, E-mail and Instant Messengers, from which we conclude that a user's E-mail address can represent his or her identity. Balduzzi et al. use a similar idea to launch automated user profiling attacks (Balduzzi, Platzer, Holz, Kirda, Balzarotti, & Kruegel, 2010). Most on-line social networking sites provide a friend-finding feature that lets a user search for friends by entering their E-mail addresses. Balduzzi et al. propose an attack that abuses this feature to identify the same user on different sites, and they implement it on eight social networking sites: Facebook, MySpace, Twitter, LinkedIn, Friendster, Badoo, Netlog and XING.
Starting with 10.4 million E-mail addresses, they queried the social networking sites for registered E-mail addresses. A crawler then collected the personal information available on each profile. Finally, a correlator determined whether profiles belong to the same person, using the E-mail address as a unique identifier. Their results show that they could identify more than 1.2 million user profiles. They also discussed countermeasures; Facebook and XING now limit the number of friend-finding queries from the same source to address this weakness.
2.3. Multi-dimensional social network
A person's social network covers many kinds of relationships, such as friendship, kinship and co-workership. Based on these relationships, a person's social network can be divided into many one-dimensional social networks. The interaction among these dimensions is an important issue because it can reveal the same individuals across networks. Zhao, Yen, Ngamassi, Maitland and Tapia analyzed inter-organizational networks, modeled the emergence of collaboration, and showed how changes in the communication network influence the collaboration network. In their case study, they used the event-based multi-agent team formation model defined in their previous research (Zhao, Yen, Maitland, Tapia & Tchouakeu, 2010) to simulate the collaboration network among humanitarian organizations, and then observed the emergence phenomenon with agent-based simulation (Berry, Kiel, & Elliott, 2002). Such simulation reveals structures and patterns that result from the interactions and decisions of heterogeneous agents. Leaders of organizations benefit from agent-based simulation because they can foresee the impact of their decisions. This research helps humanitarian organizations collaborate with others more effectively and efficiently.
In our work, according to the type of communication platform, we divide a user's on-line social network into three one-dimensional networks: on-line social networking sites, E-mail and Instant Messenger. By exploiting the interaction of these three networks, we design a scheme to protect users from Identity Theft Attacks.
3. Problem definition
3.1. System model
We consider multiple social networks; the $n$-th one is $G^n = (V^n, E^n)$, where $V^n$ is a set of nodes and $E^n$ is a set of relationships between nodes. Each node $v^n_i \in V^n$ is an individual in the social network $G^n$, and an edge $e^n_{(i,j)} \in E^n$ indicates that $v^n_i$ and $v^n_j$ are friends in $G^n$. Everyone may have different identities and relationships in different social networks. The login account of $v^n_i$ is $v^n_i.login$, which can be one of $v^n_i$'s E-mail addresses or a combination of letters and digits chosen by $v^n_i$. The set of $v^n_i$'s friends' nodes in $G^n$ is
$$v^n_i.fl = \{\, v^n_j \mid v^n_j \in V^n,\ e^n_{(i,j)} \in E^n \,\}.$$
For example, a user Alice exists in three social networks $G^1 = (V^1, E^1)$, $G^2 = (V^2, E^2)$ and $G^3 = (V^3, E^3)$, and she uses E-mail addresses as her login account in $G^1$. Her identity in $G^1$ is $v^1_{Alice}$, her login account is $v^1_{Alice}.login$ = alice@abc.com, and her friend list in $G^1$ is $v^1_{Alice}.fl$. In $G^2$, she is represented by $v^2_{Alice}$, with login account $v^2_{Alice}.login$ = alice and friend list $v^2_{Alice}.fl$. In $G^3$, her login account is $v^3_{Alice}.login$ = alice@xyz.com and her friend list is $v^3_{Alice}.fl$. The lists $v^1_{Alice}.fl$, $v^2_{Alice}.fl$ and $v^3_{Alice}.fl$ are not the same, because her friends in $G^1$ do not always exist in $G^2$ and $G^3$. From previous work and our questionnaires (presented in Section 5), we observe the following phenomena, which form the basis of our work.
1. To make identities easy to remember, users usually use similar identities for different social networks. The identity can be a login account or a unique attribute of a user's profile, such as an E-mail address. For example, Alice may use her name to register for a mail service, and then use this mail account,
[email protected], to register for other social networks. Since the core value of social networks is information sharing, users need to leave some personal information that other users can search for, so that social links can be built easily. Making the identity public is therefore natural: Alice's friends can easily construct social links with her by searching for her identity. For this reason, we examine the effect of using similar identities across social networks on Identity Theft Attacks and take this phenomenon into account when designing our method.
2. Normally, a user can obtain her friends' friend lists in a social network. Moreover, a user can often see other users' friend lists as well, since friend lists are public by default. Hence, it is possible to obtain other users' friend lists on social networking sites.
3.2. Problem description
Here we define the "Identity Theft Attack". Suppose Alice has a friend, Bob, in $G^1$. Bob's identity in $G^1$ is $v^1_{Bob}$, and his login account is his E-mail address, $v^1_{Bob}.login$ = bob@xyz.com. There are three scenarios.
Scenario 1. Bob is not registered in $G^2$: $v^1_{Bob} \in v^1_{Alice}.fl$, but there is no $v^2_{Bob} \in v^2_{Alice}.fl$. When an attacker observes this situation, she can launch an Identity Theft Attack. According to Mannan and van Oorschot's work, 74 percent of on-line social networking site users exposed their personal information, so the attacker is likely to obtain Bob's information in $G^1$ and create a fake identity $v^2_{Bob'}$ in $G^2$. Then, $v^2_{Bob'}$ sends a
friend request to $v^2_{Alice}$. When $v^2_{Alice}$ receives the friend request from $v^2_{Bob'}$, Alice cannot confirm whether it was sent by the Bob she knows in $G^1$. If Alice accepts the request, the attack succeeds and $v^2_{Bob'} \in v^2_{Alice}.fl$.
Scenario 2. In this scenario, $v^2_{Bob}$ exists in $G^2$, but $v^2_{Bob} \notin v^2_{Alice}.fl$. The attacker creates a fake identity $v^2_{Bob'}$ with Bob's personal information collected from $G^1$ and $G^2$ and then sends a friend request to $v^2_{Alice}$. Because $v^2_{Bob} \notin v^2_{Alice}.fl$, Alice may take $v^2_{Bob'}$ to be her friend Bob from $G^1$. If Alice accepts the request, $v^2_{Bob'} \in v^2_{Alice}.fl$ and the attacker achieves her goal.
Scenario 3. Suppose that Bob exists in $G^2$ and $v^1_{Bob} \in v^1_{Alice}.fl \wedge v^2_{Bob} \in v^2_{Alice}.fl$. As in Scenarios 1 and 2, an attacker can create a fake identity $v^2_{Bob'}$ to cheat Alice, for example by claiming that $v^2_{Bob'}$ is Bob's new account in $G^2$. When sending the friend request to $v^2_{Alice}$, she attaches a message such as "Hi, I'm your friend Bob. I changed my account from $v^2_{Bob}$ to $v^2_{Bob'}$. Please add $v^2_{Bob'}$ to your friend list." If Alice is not cautious, she might add $v^2_{Bob'}$ to her friend list, and the attack succeeds. Once the attacker becomes Alice's friend in $G^2$, she can collect more details about Alice through $v^2_{Bob'}$. She can also use a similar method to obtain personal information about more of Bob's friends in $G^2$. As a result, the attacker can build a new identity that substitutes for Bob. If Bob does not notice that his identity in $G^2$ has been forged, his rights are violated.
4. The proposed schemes
We propose a scheme consisting of three approaches to counter the Identity Theft Attack. The scheme is based on multiple social networks: the more social networks a user has, the better it works. The three approaches are Challenge, Login Account As An Identifier and Friend Network Similarity.
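The system model of Section 3.1 and the three scenarios of Section 3.2 can be made concrete with a small data model. The Python sketch below is illustrative only. It assumes friend lists are stored as sets of login accounts, and the `Identity` class, the `classify_scenario` helper and the $G^2$ login `bob` are hypothetical names introduced here rather than part of the scheme. It takes an observer's view that knows Bob's genuine accounts, which Alice does not have; its point is that the scenario an attacker can exploit depends only on set membership.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Identity:
    """A node v_i^n: one person's identity in social network G^n (sketch)."""
    network: str                               # name of G^n, e.g. "G1"
    login: str                                 # v_i^n.login: an E-mail address or a chosen variable
    fl: Set[str] = field(default_factory=set)  # v_i^n.fl, friends stored by their login accounts


def classify_scenario(alice_g1: Identity, alice_g2: Identity,
                      bob_g1: Identity, bob_g2: Optional[Identity]) -> int:
    """Return the Section 3.2 scenario (1, 2 or 3) that a fake v^2_{Bob'} would exploit.

    bob_g2 is Bob's genuine identity in G^2, or None if Bob is not registered there.
    """
    if bob_g1.login not in alice_g1.fl:
        raise ValueError("Section 3.2 assumes Bob is already Alice's friend in G^1")
    if bob_g2 is None:
        return 1  # Bob is not registered in G^2
    if bob_g2.login not in alice_g2.fl:
        return 2  # Bob is in G^2 but not (yet) Alice's friend there
    return 3      # Bob is already Alice's friend in G^2; the attacker claims a "new account"


# Alice and Bob from Sections 3.1 and 3.2
alice_g1 = Identity("G1", "alice@abc.com", {"bob@xyz.com"})
bob_g1 = Identity("G1", "bob@xyz.com", {"alice@abc.com"})
alice_g2 = Identity("G2", "alice")
bob_g2 = Identity("G2", "bob")  # hypothetical G^2 login of the genuine Bob

print(classify_scenario(alice_g1, alice_g2, bob_g1, None))    # 1
print(classify_scenario(alice_g1, alice_g2, bob_g1, bob_g2))  # 2
alice_g2.fl.add(bob_g2.login)
print(classify_scenario(alice_g1, alice_g2, bob_g1, bob_g2))  # 3
```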
We introduce these approaches in the following paragraphs.
4.1. Challenge
Challenge is the simplest way to confirm whether or not a user is registered in a specific social network; it is direct and reliable. Users can send messages on most on-line social networking sites, and communication is the main purpose of E-mail and Instant Messengers, so users have many channels for sending Challenge messages to each other. Challenge works in both directions: it serves Bob as well as Alice. They can send Challenge messages over any social network in which they already have a relationship. If $G^1$ is an E-mail-based social network, Alice and Bob can send challenges via E-mail; if $G^1$ is based on an Instant Messenger, they can send them via the Instant Messenger; and if $G^1$ is an on-line social networking site, they can use its message-sending service. In Scenarios 1 and 2, since $v^1_{Bob} \in v^1_{Alice}.fl$, Alice can ask $v^1_{Bob}$ whether he is registered in $G^2$. Because $e^1_{(Alice,Bob)}$ exists, Alice can use this trusted relationship to confirm the new relationship in $G^2$. The real Bob can also take the initiative and use $e^1_{(Alice,Bob)}$ to tell Alice about his account in $G^2$, which helps him gain Alice's trust so that she is willing to add $v^2_{Bob}$ to $v^2_{Alice}.fl$. In Scenario 3, Alice can ask both $v^1_{Bob}$ and $v^2_{Bob}$, and Bob can provide more information to confirm his identity. Although Challenge is a convenient way to identify users, it has a limitation: Alice has to wait for Bob's response. If Bob forgets to answer Alice's challenge message, or seldom uses the social network in which they already have a relationship ($G^1$ in Scenarios 1 and 2; $G^1$ or $G^2$ in Scenario 3), Alice will have difficulty deciding whether to accept the friend request in $G^2$.
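The decision Alice faces after sending a Challenge can be sketched as follows, assuming Bob answers over the network where $e^1_{(Alice,Bob)}$ already exists by naming his account in the target network. The `Verdict`, `Challenge` and `evaluate_reply` names, the reply convention, and the login `bob_new` are hypothetical; the scheme does not prescribe a message format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"    # Bob confirmed the requesting account over the trusted channel
    REJECT = "reject"    # Bob denied it or named a different account
    PENDING = "pending"  # no answer yet: the limitation of Challenge


@dataclass
class Challenge:
    trusted_network: str  # network where e_(Alice,Bob) already exists, e.g. "G1"
    target_network: str   # network where the friend request arrived, e.g. "G2"
    claimed_login: str    # login of the account that sent the friend request


def evaluate_reply(challenge: Challenge, reply: Optional[str]) -> Verdict:
    """Decide on a friend request from Bob's reply received over the trusted network.

    reply is the target-network login Bob says is his, "" if he says he has no
    account there, or None if he has not answered.
    """
    if reply is None:
        return Verdict.PENDING
    if reply == challenge.claimed_login:
        return Verdict.ACCEPT
    return Verdict.REJECT


ch = Challenge("G1", "G2", claimed_login="bob_new")  # hypothetical login of v^2_{Bob'}
print(evaluate_reply(ch, "bob_new"))  # Verdict.ACCEPT
print(evaluate_reply(ch, "bob"))      # Verdict.REJECT (Scenario 3: Bob kept his old account)
print(evaluate_reply(ch, None))       # Verdict.PENDING
```

The `PENDING` outcome mirrors the limitation described above: without Bob's answer, Challenge alone cannot settle the request.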
4.2. Login account as an identifier
When people want to use Internet services such as E-mail, E-commerce websites and on-line social networking sites, most service providers ask them to register accounts. People also have to create accounts before using Instant Messengers. In general, people tend to use the same or similar accounts because they are easier to remember. In the past, a user had to create an account on each web service site, and the resulting problem is that a user has to manage multiple accounts; even when the same or similar accounts are reused, account management is still inconvenient. Several solutions, such as OpenID and account management software, have been proposed, but they are not popular among users. E-mail is the most popular web service: in the United States, 94 percent of Internet users send or read E-mail (Email and webmail statics). In other words, almost every Internet user has an E-mail address, and most people have more than one. In our survey, each user has 3.45 E-mail addresses on average. Some on-line social networking sites, such as Facebook, Pixnet and Plurk, allow users to use their E-mail addresses as login accounts. Yahoo! Mail, Windows Live Hotmail and Gmail are the top three free E-mail providers in some countries (Email and webmail statics; Stats), and many web service providers let users log in with E-mail addresses from these providers. For example, when a user chooses to log in to Yahoo! with her Gmail account, Yahoo! contacts the Gmail server to verify the user's identity. According to Alexa, Facebook is the most popular on-line social networking site in the world. Instant Messenger is also one of the most popular communication tools (Wikipedia); the top three Instant Messenger providers are Windows Live Messenger, Yahoo! Messenger and Skype (Yahoo! Messenger; Skype). We also find that many Instant Messenger users use their E-mail addresses as their usernames. In brief, because E-mail is so widely used, we can regard a user's E-mail address as his or her identity.
When creating a new account, a user chooses a variable to be her username, so a login account can be either an E-mail address or such a variable, depending on the service provider's rules. For Alice, $v^1_{Alice}.login$ and $v^3_{Alice}.login$ are E-mail addresses, while $v^2_{Alice}.login$ is a variable. $v^1_{Alice}.login$ and $v^3_{Alice}.login$ are registered with different mail servers, abc.com and xyz.com, but their common part is the account name alice, which is the same as $v^2_{Alice}.login$. Our statistics show that 80 percent of users tend to use the same account and 75 percent of users have at least two accounts that they use often. Our survey also shows that 61 percent of E-mail users have their own contacts lists. In addition, most on-line social networking sites provide friend lists, and Instant Messenger users only communicate with those in their contacts lists. In other words, most people have created friend lists on on-line social networking sites, E-mail and Instant Messengers. Based on these observations, we consider Login Account As An Identifier, applied across a user's Instant Messengers, E-mail and on-line social networking sites, to be an effective way to confirm her friends' identities. In Scenarios 1 and 2, because $v^1_{Bob} \in v^1_{Alice}.fl$, Alice checks whether the requester's login $v^2_{Bob'}.login$ is the same as the known $v^1_{Bob}.login$. If $v^2_{Bob'}.login = v^1_{Bob}.login$, it indicates that $v^2_{Bob'}$ belongs to Bob. The same idea applies to Scenario 3: Alice can compare $v^1_{Bob}.login$, $v^2_{Bob}.login$ and $v^2_{Bob'}.login$ to verify $v^2_{Bob'}$. The disadvantage of this approach is that a user's login accounts are not always the same: if her usual login account has already been registered by someone else, she has to choose another variable as her username.
4.3. Friend network similarity
In this approach, we use a similarity measure to calculate the friend network similarity, defined as follows.
Definition 1. Let $v^n_t$ be the target identity, i.e. the friend's known identity in $G^n$, and let $v^m_c$ be the candidate identity in $G^m$. We define the friend network similarity between $v^n_t$ and $v^m_c$ as $S$:
$$S\left(v^n_t, v^m_c\right) = \frac{CF\left(v^n_t, v^m_c\right)}{\left|EX\left(G^m, v^n_t.fl\right)\right|} \qquad (1)$$
where $CF(v^n_t, v^m_c)$ is the number of common friends of $v^n_t$ and $v^m_c$, and $|EX(G^m, v^n_t.fl)|$ is the number of $v^n_t$'s friends who are in $G^m$.
Calculating the similarity yields a value $\alpha = S(v^n_t, v^m_c)$; the larger $\alpha$ is, the more alike the friend networks of $v^n_t$ and $v^m_c$ are. We evaluate $\alpha$ to estimate the probability that $v^m_c$ is genuine in $G^m$.
Definition 2. We define the probability of $v^m_c$'s validity as $P$:
$$P\left(S\left(v^n_t, v^m_c\right)\right) = k \qquad (2)$$
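The two automatic checks, comparing login accounts (Section 4.2) and computing Eq. (1), can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: friends are assumed to be keyed by E-mail address so they can be matched across networks, the set of $G^m$ members is assumed to be known, and because Eq. (2) does not say how $k$ is derived from $\alpha$, `looks_genuine` uses a hypothetical threshold of 0.5. All function names and the example addresses other than Alice's and Bob's are hypothetical.

```python
from typing import Set


def login_prefix(login: str) -> str:
    """Account-name part of a login: text before '@' for an E-mail address, else the login itself."""
    return login.split("@", 1)[0]


def same_login(claimed: str, known: str, prefix_only: bool = False) -> bool:
    """Section 4.2: compare the requester's login with the login Alice already knows."""
    if prefix_only:
        return login_prefix(claimed) == login_prefix(known)
    return claimed == known


def friend_network_similarity(target_fl: Set[str], candidate_fl: Set[str],
                              members_of_gm: Set[str]) -> float:
    """Eq. (1): S = CF(v_t^n, v_c^m) / |EX(G^m, v_t^n.fl)|."""
    ex = target_fl & members_of_gm  # EX: the target's friends who also exist in G^m
    if not ex:
        return 0.0                  # Eq. (1) is undefined here; treated as "no evidence"
    cf = ex & candidate_fl          # common friends of v_t^n and v_c^m
    return len(cf) / len(ex)


def looks_genuine(alpha: float, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for Eq. (2), which leaves P(S) = k unspecified."""
    return alpha >= threshold


# Bob's friends in G^1, keyed by E-mail address (assumption)
bob_g1_fl = {"alice@abc.com", "carol@abc.com", "dave@abc.com", "erin@abc.com", "frank@abc.com"}
g2_members = {"alice@abc.com", "carol@abc.com", "dave@abc.com", "erin@abc.com", "mallory@xyz.com"}
fake_fl = {"carol@abc.com", "mallory@xyz.com"}              # v^2_{Bob'} built by an attacker
real_fl = {"carol@abc.com", "dave@abc.com", "erin@abc.com"}  # the genuine Bob in G^2

print(same_login("alice@abc.com", "alice@xyz.com"))                    # False
print(same_login("alice@abc.com", "alice@xyz.com", prefix_only=True))  # True
print(friend_network_similarity(bob_g1_fl, fake_fl, g2_members))       # 0.25
print(friend_network_similarity(bob_g1_fl, real_fl, g2_members))       # 0.75
```

In this toy data, four of Bob's $G^1$ friends exist in $G^2$; the clone shares only one of them ($\alpha = 0.25$), while the genuine account shares three ($\alpha = 0.75$).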
We use a simple example to explain this approach. Suppose that Alice and Bob have already constructed a social link on social network $G^1$, so Alice can obtain $v^1_{Bob}.fl$. Next, Alice receives a friend request on social network $G^2$ from an account $v^2_{Bob'}$ that claims to be Bob. If Alice can obtain $v^2_{Bob'}.fl$, she can evaluate the friend network similarity $\alpha$ with Eq. (1). Based on Eq. (2), Alice can then decide whether to believe that $v^2_{Bob'}$ is valid by evaluating its probability $k$.
4.4. Further discussion
In this section, we have proposed three approaches to counter the Identity Theft Attack. Users can select the most suitable approach according to the information available to them. For instance, in the example of Section 4.3, if Alice is not permitted to obtain $v^2_{Bob'}.fl$, she can still apply the other two approaches to validate $v^2_{Bob'}$ on $G^2$. A cautious user can even apply all three approaches in turn to check whether a friend's identity is valid.
5. Experiment
The main purpose of the experiments is to validate the scheme. They consist of two parts: (1) a user behavior survey, and (2) an evaluation of the friend network similarity. Details are given below.
5.1. Experiment 1: the user behavior survey
We investigate the similarities between user accounts on on-line social networking sites, E-mail and Instant Messengers by questionnaires. We collected 200 samples from Internet users on campus, most of them under 30 years old.
5.1.1. The usage of E-mail
Our statistics show that everyone has 3.515 E-mail addresses on average; 60 percent of people use the same prefix (the part before @) for their E-mail addresses, 6 percent have only one E-mail address, and 34 percent use different prefixes. We conclude that people commonly use E-mail: most people have more than one E-mail address, and people usually register their E-mail addresses with the same prefix.
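The three E-mail categories reported above can be read as a simple rule over each respondent's addresses. The sketch below assumes that "same prefix" means all of a respondent's addresses share the part before @; the questionnaire's exact wording is not given here, so this rule, the `prefix_category` name and the sample addresses are hypothetical.

```python
from typing import List


def prefix_category(addresses: List[str]) -> str:
    """Bucket one respondent's E-mail addresses into the three groups of Section 5.1.1."""
    if not addresses:
        raise ValueError("a respondent must report at least one address")
    if len(addresses) == 1:
        return "only one address"
    # Prefix = the part before '@'; assumed rule: "same prefix" means all prefixes are equal.
    prefixes = {addr.split("@", 1)[0] for addr in addresses}
    return "same prefix" if len(prefixes) == 1 else "different prefixes"


for addrs in (["alice@abc.com", "alice@xyz.com"], ["bob@xyz.com"], ["carol@abc.com", "c.lee@xyz.com"]):
    print(prefix_category(addrs))  # same prefix, then only one address, then different prefixes
```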
5.1.2. The usage of login accounts on on-line social networking sites
We chose the 20 most popular on-line social networking sites in Taiwan (Business Next) and checked how users log in to them. Table 1 lists these sites; 10 of them allow users to use their E-mail addresses as login identifiers. The results show that, on average, every user is registered on approximately 5.2 of the 20 sites, registering on 2.285 sites with the same E-mail address and on 1.755 sites with the same account prefix. Generally speaking, most users register on multiple on-line social networking sites, and they tend to use the same E-mail address or account name as their login account across several sites.
5.1.3. The usage of login accounts for Instant Messenger
We chose Windows Live Messenger, Yahoo! Messenger and Skype as targets to examine the usage of login accounts for Instant Messenger. We discovered that 98 percent of users' login accounts are their E-mail addresses, and that 41 percent of users use two different Instant Messengers, 63.41 percent of whom use the same E-mail address for both. The result is shown in Fig. 1. In addition, 71 percent of users have the same login account for on-line social networking sites and Instant Messengers. From these results, we draw the following conclusions about Instant Messenger login accounts: most people use Instant Messenger; most users' login accounts are their E-mail addresses; about 60 percent of users who use two or more Instant Messengers use the same E-mail address as their login account; and users' Instant Messenger login accounts are often the same as their login accounts for on-line social networking sites. We also investigate the relationships between users' login accounts for E-mail, Facebook and Windows Live Messenger. 88 percent of users' login accounts of Facebook and E-mail are the
Table 1
The top 20 on-line social networking sites in Taiwan.
Name | URL
Facebook | http://www.facebook.com
Wretch | http://www.wretch.cc/
Gamer | http://www.gamer.com.tw/
Eyny | http://www.eyny.com/
Pixnet | http://www.pixnet.net/
Windows Live | http://home.live.com/
Mobile01 | http://www.mobile01.com/
Atlaspost | http://www.atlaspost.com/
Plurk | http://www.plurk.com/
Xuite | http://www.xuite.net/
Wikipedia | http://www.wikipedia.org/
iPartment | http://www.i-part.com.tw/
Blogger | http://www.blogger.com/
CK101.com | http://www.ck101.com/
Gamebase | http://www.gamebase.com.tw/
YouthWant | http://www.youthwant.com.tw/
Flickr | http://www.flickr.com/
FC2 | http://www.fc2.com/
Wahas | http://www.wahas.com/
iPeen | http://www.ipeen.com.tw/
Fig. 1. The ratio of E-mail as login accounts for Instant Messenger.
same. 73 percent of users use their E-mail addresses as their Windows Live Messenger accounts, and 58 percent of users' Facebook login accounts are the same as their Windows Live Messenger accounts. In addition, 52 percent of users use the same E-mail address as their login account for both Facebook and Windows Live Messenger. The result is shown in Fig. 2.
5.1.4. Users' contacts lists
61 percent of users have their own E-mail contacts lists. 68 percent of users think their contacts lists are similar between E-mail and on-line social networking sites, 81 percent consider their E-mail and Instant Messenger contacts lists to be similar, and 80 percent think their on-line social networking site contacts lists are similar to their Instant Messenger contacts lists. The detailed results are illustrated in Fig. 3.
5.1.5. Friend requests
We also investigate whether or not a user checks the source of a friend request she receives on on-line social networking sites and Instant Messengers. The result is shown in Fig. 4.
5.1.6. Users' accounts
Here we examine users' habits when they register a new account. About 80 percent of users tend to use the same variable as their login account, and only 4 percent do not. 75 percent of users have at least two variables, and 21 percent have only one variable, as their login accounts.
5.1.7. Summary
In summary, our statistics lead to the following conclusions:
p p p p
p
p p p
Fig. 2. The ratio of the same account among E-mail, Facebook and Windows Live Messenger.
2350
B.-Z. He et al. / Expert Systems with Applications 41 (2014) 2345–2352
and others by Eq. 3. Then we check everyone’s maximum a in each combination. We find out that the maximum value indicates the same person in common. Finally, we evaluate the cumulative distribution of a and generate the figures. The partial results are shown as Fig. 5–7. The probability of real identity k depends on a. We discovered that a becomes higher in each combination. The reason is that users tend to use the same prefix. Some conclusions from our experimental result concluded.
Fig. 3. The ratio of the similarity of the contact lists between E-mail and on-line social networking sites.
We find that the results of some combinations, namely (Gmail, Facebook) and (Gmail, Windows Live Messenger), are not good enough. A possible reason is that some Gmail contacts, such as a user's co-workers, do not appear in the other two networks: not every social network displays a user's complete friend network. For example, users usually do not add their boss or co-workers to their Facebook or Windows Live Messenger networks. Because data collection is difficult, our results are rough; with more samples, they would be more accurate.
6. Discussion
In this section, we discuss some issues about this work.
6.1. Special cases on on-line social networking sites
Fig. 4. The ratio of users who check friend requests in on-line social networking sites and Instant Messenger.
- Most people have more than one social network, including E-mail, on-line social networking sites and Instant Messengers.
- Many people use their E-mail addresses as their login accounts on on-line social networking sites and Instant Messengers.
- People tend to use the same variables as their login accounts.
- People's friend lists on E-mail, on-line social networking sites and Instant Messengers are similar.
5.2. Experiment 2: the result of friend network similarity
We collected the contact lists of a volunteer and her 22 friends on Gmail, Facebook and Windows Live Messenger. Facebook users have to export their contact lists via their Yahoo! accounts, and only those who have a Windows Live ID can export their contact lists. These limitations increase the difficulty of data collection. Another limitation is that, in reality, we cannot know the complete social networks of Gmail, Facebook and Windows Live Messenger. We therefore modified Eq. 1 to estimate the friend network similarity. The modified equation is:
$$S\left(v^{n}_{t}, v^{m}_{c}\right) = \frac{CF\left(v^{n}_{t}, v^{m}_{c}\right)}{\left|v^{n}_{t}.\mathit{fl}\right|} \tag{3}$$
Famous people are hard to validate. Because their personal information is easy to collect, an attacker can easily create fake accounts for them, and most users like to add their idols as friends. In general, a user cannot use our scheme to verify a famous person's identity. If a user creates multiple identities on the same social network, her friends may become confused by them. In this case, her friends can use our scheme to validate the user. If the user wants to add her friends as friends of her other identities, she can also send a Challenge to her friends to confirm her identity. Facebook allows a user to change her login account. Our scheme still works in this case: her friend list does not change and the login account still belongs to her, so her friends can validate her identity with Challenge and Friend network similarity. Our scheme does not work for those who never use computers or the Internet, but this is an extremely special case.
Because $|v^{n}_{t}.\mathit{fl}| \geq |EX(G_m, v^{n}_{t}.\mathit{fl})|$, the experimental result gives a lower bound of $a$. We compare the contact lists of the 23 people on Gmail, Facebook and Windows Live Messenger, using the E-mail addresses in each contact list as identifiers. With the candidate identity in one network and the known identity in another, there are six combinations: Gmail and Facebook, Gmail and Windows Live Messenger, Facebook and Gmail, Facebook and Windows Live Messenger, Windows Live Messenger and Gmail, and Windows Live Messenger and Facebook. First, we compute each person's friend network similarity with themselves
Fig. 5. The probability of real identity between Facebook and Gmail.
idea, we can introduce weights into our similarity measure. In such a scheme, users could pre-define a weight for each of their friends. We believe this better reflects real relationships and provides more information; this part can be improved in future work. Our similarity measure compares an identity across two social networks, and it could be extended to three or more. Ideally, more social networks would provide more information, which is a direction for our future research.
7. Conclusions
Fig. 6. The probability of real identity between Facebook and Windows Live Messenger.
This paper focuses on the Identity Theft Attack. Of course, if no user disclosed personal information on social networking sites, the Identity Theft Attack might not be a problem. However, this violates the nature of social networks and turns them into non-social networks. In this paper, we proposed three approaches to counter the Identity Theft Attack. Users can select the most suitable approach to prevent this attack according to the information they have obtained. In addition, the proposed approaches can easily be applied to real social networking sites, and our experiments demonstrate that the scheme is practical.
Acknowledgments
The work of C.-M. Chen was supported in part by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology, and the Shenzhen Peacock Project, China, under Contract KQC201109020055A. The work of H.-M. Sun was supported in part by the National Science Council, Taiwan, R.O.C., under Grants NSC 100–2628-E-007–018-MY3 and NSC 101–2221E-007–026-MY3.
References
Fig. 7. The probability of real identity between Gmail and Facebook.
6.2. Friend network similarity issue
In general, the maximum of $a$ should belong to the same person; however, there are exceptions. In our experiment, we discovered that the maximum $a$ does not always indicate the same person. This can happen when two people's friend networks are too similar. For example, Alice and Bob are classmates, so their friend lists might be similar. Although their common friends exist in both $G_1$ and $G_2$, they do not add all of their common friends in both networks, and Alice may add more classmates than Bob in $G_2$. If most of their friends in $G_2$ are their classmates, the friend network similarity $S(v^{G_1}_{Bob}, v^{G_2}_{Alice})$ is bigger than $S(v^{G_1}_{Bob}, v^{G_2}_{Bob})$. As a result, $v^{G_1}_{Bob}$ is more similar to $v^{G_2}_{Alice}$ than to $v^{G_2}_{Bob}$. This case shows that the complexity of users' social networks may influence the experimental results. Because of the limitations of data collection, most of our data belong to the same group; in the real world, a person's social network becomes more complicated with life experience. With more data, our friend network similarity experiment might show different results. In this work, we provide the basic concept of friend network similarity. Our similarity measure depends on the number of common friends between the target identity and the candidate identity. In real life, however, a person's relationships with his/her friends differ: some friends are closer and more important than others. For instance, family members are more important than co-workers. This shows that people implicitly assign an importance to each relationship. With this
Alexa: Top 500 Global Sites.
Atkins, B., & Huang, W. (2013). A study of social engineering in online frauds. Open Journal of Social Sciences, 1, 23–32.
Balduzzi, M., Platzer, C., Holz, T., Kirda, E., Balzarotti, D., & Kruegel, C. (2010). Abusing social networks for automated user profiling. In Recent Advances in Intrusion Detection, LNCS (vol. 6307, pp. 422–441).
Berry, B. J., Kiel, L. D., & Elliott, E. (2002). Adaptive agents, intelligence, and emergent human organization: Capturing complexity through agent-based modeling. Proceedings of the National Academy of Sciences of the United States of America, 99, 7187–7188.
Bilge, L., Strufe, T., Balzarotti, D., & Kirda, E. (2009). All your contacts are belong to us: Automated identity theft attacks on social networks. In Proceedings of the 18th international conference on World wide web, WWW ’09 (pp. 551–560).
Boyd, D., & Ellison, N. (2007). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13, 210–230.
Business Next No.202 (2011). Top 100 Popular Websites in Taiwan.
Church, L., Anderson, J., Bonneau, J., & Stajano, F. (2009). Privacy stories: Confidence in privacy behaviors through end user programming. In Proceedings of the 5th symposium on usable privacy and security, SOUPS ’09 (pp. 20:1–20:1).
Cutillo, L., Manulis, M., & Strufe, T. (2010). Security and privacy in online social networks. Handbook of Social Network Technologies and Applications (pp. 497–522).
Definition of Identity Theft and Attack.
Email and Webmail Statistics.
Fang, L., & LeFevre, K. (2010). Privacy wizards for social networking sites. In Proceedings of the 19th international conference on World wide web, WWW ’10 (pp. 351–360).
Goga, O., Lei, H., Parthasarathi, S. H. K., Friedland, G., Sommer, R., & Teixeira, R. (2013). Exploiting innocuous activity for correlating users across sites. In Proceedings of the 22nd international conference on World Wide Web, WWW ’13 (pp. 447–458).
Gross, R., & Acquisti, A. (2005). Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM workshop on Privacy in the electronic society, WPES ’05 (pp. 71–80).
Identity Guard.
Identity Theft Protection.org: Keeping personal information always personal.
Jin, L., Takabi, H., & Joshi, J. B. (2011). Towards active detection of identity clone attacks on online social networks. In Proceedings of the first ACM conference on Data and application security and privacy, CODASPY ’11 (pp. 27–38).
LifeLock.
Mannan, M., & van Oorschot, P. C. (2008). Privacy-enhanced sharing of personal content on the web. In Proceedings of the 17th international conference on World Wide Web, WWW ’08 (pp. 487–496).
OpenID.
Pixnet.
Plurk.
Protect my ID.
Skype.
Stats: Hotmail still on top worldwide; Gmail gets bigger.
Strater, K., & Lipford, H. R. Strategies and struggles with privacy in an online social networking community. In Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, BCS-HCI ’08 (vol. 1, pp. 111–119).
Su, Y. P. (2011). A defence scheme against Identity Theft Attack based on multiple social networks. Master’s thesis, National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
TrustedID.
Wikipedia: Instant Messaging.
Wretch.cc.
Yahoo! Messenger.
Zhao, K., Yen, J., Maitland, C., Tapia, A., & Tchouakeu, L. M. N. (2009). A formal model for emerging coalitions under network influence in humanitarian relief coordination. In Proceedings of the 2009 Spring Simulation Multiconference, SpringSim ’09 (pp. 10:1–10:7).
Zhao, K., Yen, J., Ngamassi, L. M., Maitland, C., & Tapia, A. (2010). From communication to collaboration: Simulating the emergence of interorganizational collaboration network. In 2010 IEEE Second International Conference on Social Computing (SocialCom) (pp. 413–418).
Zheleva, E., & Getoor, L. (2009). To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th international conference on World wide web, WWW ’09 (pp. 531–540).
Zhou, X., Nie, Z., & Li, Y. (2010). Fast algorithm for searching adjacent communities and its application in hierarchical community discovery. Journal of Information Hiding and Multimedia Signal Processing, 1, 261–268.