Anonymity and privacy: a guide for the perplexed


…with massive flexibility in their adoption and use of cloud services. Using OPE, an organisation can protect numeric or alphanumeric fields while preserving functionality such as sorting and range queries in a cloud service. Practitioners should realise that leaking order implies leaking related information, though such leakage is difficult to quantify. In addition to revealing the order of plaintexts, ciphertexts created with any OPE scheme leak the relative distance between the underlying plaintexts (e.g., small ciphertexts are likely to correspond to small plaintexts). Given a ciphertext, it is relatively easy to guess the approximate value of the underlying plaintext. Therefore, while OPE preserves sorting functions in an application, the potential visibility into the underlying data must be weighed seriously as a risk trade-off.
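The order and distance leakage described above can be demonstrated with a toy order-preserving scheme. This is an illustrative sketch, not a real OPE construction — the key generation and parameters here are invented for the example. Any strictly increasing secret mapping preserves order, so an observer who sees only ciphertexts can still rank them and estimate plaintext magnitudes without the key:

```python
import random

def keygen(domain_max, max_gap=8, seed=1234):
    # Secret key: a random strictly increasing table mapping each plaintext
    # in [0, domain_max] to a ciphertext. Monotonicity makes the scheme
    # order-preserving by construction.
    rng = random.Random(seed)
    table, total = [], 0
    for _ in range(domain_max + 1):
        total += rng.randint(1, max_gap)
        table.append(total)
    return table

def encrypt(key, m):
    return key[m]

key = keygen(1000)
plaintexts = [640, 300, 999, 17]
ciphertexts = [encrypt(key, m) for m in plaintexts]

# Leak 1: sorting the ciphertexts sorts the plaintexts -- true of any OPE.
assert sorted(ciphertexts) == [encrypt(key, m) for m in sorted(plaintexts)]

# Leak 2: relative magnitude. Scaling a ciphertext by the largest possible
# ciphertext gives a fair estimate of the plaintext, with no key at all.
top = encrypt(key, 1000)
estimate = round(ciphertexts[1] / top * 1000)  # attacker's guess for 300
assert abs(estimate - 300) < 60                # close, despite no key
```

Real OPE schemes are far more sophisticated than this sketch, but the two leaks asserted above are inherent to the order-preserving property itself, which is why they survive in every such scheme.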

Data tokenisation

Data tokenisation involves creating tokens for each plaintext, storing the data and tokens locally, and then passing the tokens to the cloud application. Using this approach, a great deal of application functionality can be preserved: for example, searching for keywords and sorting the data on the server are both possible. The security drawbacks are similar to those of searchable and order-preserving encryption. In addition, the local store holding the data and the corresponding tokens must itself be protected. Another weakness is that the tokenisation database must also be accessible by users, which may cause issues for remote or mobile users. This method works well if you’d like something similar to searchable deterministic encryption but need to obey compliance rules for data residency.
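A minimal sketch of the approach, with invented names (`TokenVault`, `tokenise`): plaintexts and their tokens stay on-premises, and only opaque tokens reach the cloud. Making tokens deterministic per value is what preserves equality search — and is also why the leakage profile resembles deterministic encryption:

```python
import secrets

class TokenVault:
    """Toy on-premises tokenisation vault. Illustrative sketch only:
    a production vault needs durable, protected storage and access control."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value: str) -> str:
        # Deterministic per value, so equality searches over tokens still
        # work in the cloud -- the same trade-off as deterministic encryption.
        tok = self._value_to_token.get(value)
        if tok is None:
            tok = secrets.token_hex(8)   # opaque, carries no plaintext info
            self._value_to_token[value] = tok
            self._token_to_value[tok] = value
        return tok

    def detokenise(self, token: str) -> str:
        return self._token_to_value[token]

vault = TokenVault()
cloud_record = {"name": vault.tokenise("Alice Smith"),
                "ssn": vault.tokenise("078-05-1120")}

# The cloud application stores only tokens; the local vault reverses them.
assert vault.detokenise(cloud_record["ssn"]) == "078-05-1120"
# Equality search: tokenising the query reproduces the stored token.
assert vault.tokenise("Alice Smith") == cloud_record["name"]
```

The two dictionaries in this sketch are exactly the local state the article warns about: they must be protected, and they must be reachable by every user who needs to detokenise, including remote and mobile ones.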

Fully homomorphic encryption

Fully Homomorphic Encryption (FHE) allows encrypted ciphertexts to be computed over by an untrusted party. In theory, this allows the client to ask the server to search the encrypted data for any function of the plaintexts (e.g., a given substring), or to ask the server to compute, say, the average of all encrypted numbers in a database field. The server will not learn anything about the data.

FHE has long been considered the holy grail of encryption. While the technology holds promise, higher-level operations and real-world functionality are still many years away. Despite this, some vendors have claimed that they can deliver homomorphic encryption in practice today; technical evaluators would be prudent to tread carefully before giving any credence to these claims. Even when FHE becomes feasible to use, its implementation will require significant code changes on the server side, and each search query will be linear in the size of the database, which may be unacceptable for large databases.
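FHE itself cannot be sketched in a few lines, but the core idea — a server computing on data it cannot read — can be illustrated with a much weaker toy: an additively homomorphic one-time pad. Everything here (the modulus, the salary figures) is invented for the illustration; the scheme supports only addition and is emphatically not FHE:

```python
import random

# Toy illustration only: an additively homomorphic one-time pad.
# Each value m is encrypted as c = (m + k) mod N with a fresh random key k.
# The sum of ciphertexts equals the sum of plaintexts plus the sum of keys,
# so the client can recover the true total while the server sees only noise.
N = 2**32
rng = random.Random(42)

plaintexts = [30_000, 45_000, 52_000]          # e.g. salaries in a DB field
keys = [rng.randrange(N) for _ in plaintexts]   # kept by the client
ciphertexts = [(m + k) % N for m, k in zip(plaintexts, keys)]

# Untrusted server: adds the ciphertexts without learning anything.
server_sum = sum(ciphertexts) % N

# Client: strips the accumulated key material to recover the real total.
total = (server_sum - sum(keys)) % N
assert total == sum(plaintexts)   # 127000
```

The toy also shows why the real thing is hard: here the client must keep one key per value, whereas genuine FHE lets a single key holder decrypt the result of arbitrary server-side computation.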

Conclusion

Given the current cloud computing environment and concerns around privileged access to data, it’s critical to understand how to weigh security against access trade-offs in order to decide on a suitable encryption scheme.

About the authors

Alexandra Boldyreva is an associate professor in the School of Computer Science of the College of Computing at Georgia Tech, where she carries out research in cryptography and information security. She is affiliated with the Georgia Tech Information Security Centre (GTISC) and the Algorithms, Combinatorics and Optimisation (ACO) programme. She received her PhD in Computer Science from the University of California at San Diego, and BS and MS degrees in Applied Mathematics from the St Petersburg State Technical University, Russia.

Paul Grubbs is a cryptography engineer at Skyhigh Networks, responsible for the cryptography and key management modules for its cloud security software. He received his BS degrees in Mathematics and Computer Science from Indiana University, where he did research on secure multi-party computation. He will begin his PhD in Computer Science at the University of Virginia next year.

Anonymity and privacy: a guide for the perplexed

Danny Bradbury, freelance journalist

Network Security, October 2014

Anonymity and privacy are two separate concepts that are often confused. It is a common problem, made worse by the fact that the two are often connected and related. Which should be used when, and what do they mean?

“One is about hiding the content, and one is about hiding who is saying it,” says Cooper Quintin, staff technologist at the Electronic Frontier Foundation (EFF). A person can happily post material in clear text on a publicly accessible forum while using an anonymous handle. Conversely, two people using their real names can send information between each other in encrypted form. That is an act of privacy, and the two are clearly distinguishable.

Rich history

Anonymity has a rich history in our culture, points out Harry Lewis, Gordon McKay Professor of Computer Science in the School of Engineering and Applied Sciences at Harvard University. “We have classical examples of anonymity, where speech is simply untraceable,” argues Lewis, who lectures regularly on privacy and anonymity issues. “Some of the most important early tracts that built support for the American revolution were written under a pseudonym, adopting the names of fictional or mythical people.” Thomas Paine published Common Sense, the pamphlet that inspired the revolution, anonymously, at the urging of Benjamin Rush.

Many people won’t see anonymity as particularly important in their everyday lives, argues Micah Lee, a former staff technologist for the EFF and a founder and board member of the Freedom of the Press Foundation. “If you’re going about your daily work day, then unless you are managing a separate anonymous identity, you don’t want anonymity in that case,” he says. “However, you probably really do want privacy.”

The need for privacy runs through daily life. We need to protect everything from our Twitter account passwords to our social security numbers, and most of us would like our emails to remain private, too. The use cases for anonymity are more nuanced. Cases where a person could be rebuked for speaking out are good examples: whistleblowing on corporate transgressions and reporting harassment in the workplace both fall into this category. Journalists reporting in oppressive regimes, and their sources, are all in need of anonymous communication methods.

Anonymity is naturally a goal for bad actors. Terrorists and cyber-criminals will go to great lengths to hide their identities, and in many cases they may also seek privacy, to protect the contents of sensitive communications. But it is a goal for good actors, too. Law enforcement officials have been known to use anonymity measures to track their targets stealthily. This latter example points to a broad use case: being able to operate in plain sight without revealing a role. Everyone from cops to celebrities may wish to engage in public spaces – either online or off – without attracting unwanted attention from others.

There are other, less obvious applications for anonymity. “A lot of people don’t like targeted advertising, and don’t like those companies selling to them online,” Lee says. “So you might want to be anonymous to the advertising companies, so that they don’t know it’s you looking at a particular website.”

There are various tools and techniques for preserving anonymity in this latter case, and under other circumstances too. “The easiest way isn’t to be anonymous to the ad network, but just to block it,” suggests Lee. Browser add-ons such as Privacy Badger will block the pieces of websites that try to track visitors, or will block their cookies.1 “So maybe the website itself might know who you are, but the third-party resources on the site like the advertisers won’t be able to get their cookies to track you,” Lee explains. “It’s possible, but it’s tricky, and it’s always an arms race.”

The Privacy Badger browser plugin helps prevent users being tracked across the web using cookies.

Autonomous networks

Decentralised (or distributed) autonomous communities (DACs) are another, relatively new way to hide identities online. These networks are typified by Bitcoin, the payment network and currency developed by Satoshi Nakamoto.2 Used correctly, DACs offer a unique combination of verification and anonymity. When a node makes a transaction (for example, sending a bitcoin to another node), the transaction is cryptographically verified by a large proportion of the other nodes. Anyone can create a network node – and a bitcoin wallet – without revealing their identity, meaning that bitcoins can be sent between nodes without anyone revealing who is behind them.

However, decentralised autonomous networks aren’t without their dangers. These networks are entirely transparent, with every transaction – and its corresponding addresses – stored in a public ledger called the blockchain. A rookie mistake for those wishing to keep bitcoin transactions anonymous is to keep transacting from the same bitcoin address. Over time, this enables sharp-eyed analysts to infer relationships between addresses.3
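The address-linking analysis mentioned above can be sketched with the common-input-ownership heuristic that blockchain analysts apply: addresses spent together as inputs to one transaction are presumed to belong to the same wallet. The transaction data below is hypothetical, and real analyses layer many more heuristics on top:

```python
def cluster_addresses(transactions):
    """Union-find over addresses: addresses that appear as inputs to the
    same transaction are merged into one presumed-owner cluster."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for tx_inputs in transactions:
        find(tx_inputs[0])                 # register single-input txs too
        for addr in tx_inputs[1:]:
            parent[find(addr)] = find(tx_inputs[0])

    clusters = {}
    for a in parent:
        clusters.setdefault(find(a), set()).add(a)
    return list(clusters.values())

# Hypothetical input-address lists from three transactions. Reusing
# "1AliceChange" across transactions links all of Alice's addresses.
txs = [["1Alice", "1AliceChange"],
       ["1AliceChange", "1AliceSavings"],
       ["1Bob"]]
assert {"1Alice", "1AliceChange", "1AliceSavings"} in cluster_addresses(txs)
```

This is why address reuse is the rookie mistake: a single shared input is enough to collapse otherwise unlinked addresses into one cluster in the public ledger.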


Onion routing

Bitcoin is one form of peer-to-peer network. Another is Tor, the anonymity network that uses onion routing to protect the identity of its members. Lee warns that Tor falls wholly on one side of the privacy/anonymity fence. “Tor is entirely about shielding who you are. It’s a common misconception, when you have people who use Tor thinking that this makes them totally secure,” he says. A person logging into a website unprotected by SSL, for example, might be even less secure doing it via Tor, because a malicious exit node might snoop on their traffic.

Used properly, Tor has been one of the most successful anonymity networks to date, but it isn’t without its flaws. In July 2014, an attack was identified that found the identities of hidden services – anonymous websites – operating on the network. It used approximately 6% of the nodes on the network to launch a traffic confirmation attack, which times the passage of packets on the network to identify their routes. It also tagged packets, making them easier to identify.4

Nevertheless, Tor is still one of the most robust anonymity tools available. “If it’s not 100%, it’s as close as you’ll get in this life,” argues Larry Kearley, president of the Canadian Access and Privacy Association, and vice-president of the Canadian Institute of Access and Privacy Professionals. “The various cracks all involve software being out of date. If you keep your software up to date and use it properly, you’re safer.”

Threats to anonymity

There are other, simpler ways to defeat Tor anonymity. Several sites, such as Pinterest, specifically block visitors from Tor-based addresses from interacting, and a list is being developed of the sites that block or limit Tor users.5 Wikipedia, for example, allows visitors from Tor sources to read its pages but prevents them from making edits. The rationale, according to Lee, is that many people wanting to abuse these services will do so via Tor to avoid being identified.

Tor’s Roger Dingledine worries that this activity could end up siloing the Internet, as sites implement mismatched policies that exclude anonymous users. The problem is exacerbated when websites providing centralised services used by many other sites implement anti-anonymity policies: he lists anti-DDoS provider CloudFlare, content distribution network Akamai and discussion platform Disqus as examples. He suggests enlisting an individual to engage sites with anti-anonymity blocking policies and to offer them a range of solutions, spanning reputation management, the use of anonymous credentials, and social procedures such as community moderation, to help solve the problem.6

Nymwars

Simply blocking Tor addresses isn’t the only kind of policy decision helping to dismantle anonymity online. Forcing individuals to use their real names in online interactions is another tactic. In 2011, with the launch of the Google+ social networking service, Google announced that it would only allow users to sign up using their real names.7 Nicknames and pseudonyms were not allowed. This prompted a backlash from worried users in what became known as the ‘nymwars’, and the feminist blog Geek Feminism put together a list of those likely to be affected, including victims of physical abuse trying to avoid being tracked by violent ex-partners.8

Google finally lifted the real-names policy completely in July 2014, three years after the service launched.9 By that stage, however, its mission had already been accomplished: the social network had been seeded with people using their real names – 343 million people were registered on the service.10

Leading social networks worldwide as of June 2014, ranked by number of active users. Source: Statista.

Light leakage

But perhaps one of the most insidious threats to anonymity spans the realm of privacy as well: metadata may be the key to unlocking our identities online. “It’s hard to function in modern society without having electronic transactions of various kinds, including social and commercial,” argues Lewis. “Every one of those transactions leaves digital footprints or fingerprints of your activities which can be used to reconstruct who you are.”

We have already seen this deconstruction in action, several times. One of the most famous cases involved AOL, which released an ‘anonymised’ search history for researchers to analyse and extract insights from. The initiative backfired, as researchers found ways to cross-reference searches and produce detailed histories for individuals. This quickly led them to identify Thelma Arnold – along with online searches revealing her interests.11 Researchers at the University of Texas did something similar with Netflix datasets.12

Finding identities in anonymous data sets is a function of entropy. In information science, entropy is the degree of uncertainty about a particular piece of data. Pinning down more details about a data item decreases its entropy; decreasing the entropy to zero results in complete de-anonymisation. Arvind Narayanan, one of the two researchers who broke the Netflix dataset, points out that with 6.6 billion people on the planet, we need only 33 bits of information to number them all. If I know that you live in a town of 100,000 people, that gives me 16 bits of information about you, leaving just 17 bits to find.13

Since then, with the advent of mobile phones, the growth of social networking and the rise of big data, the amount of data we generate as part of our online activities – our ‘digital exhaust’ – has grown significantly.14

There are less technical ways to find those who would rather stay hidden online. Analysing communications can lead to surprising insights. Simple context clues, such as posting communications at similar times, can give away hints about your location. Such mistakes reduce your entropy, helping those searching for your identity to narrow you down. Other giveaways are more subtle: using similar phrasing and writing styles can help investigators to match communications made by the same person using different online identities. “The human element is the weak link in any system,” warns Kearley. “If you’re going to be doing really sneaky things you have to be incredibly careful.”
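Narayanan’s arithmetic is easy to reproduce. The figures below follow the article’s numbers (6.6 billion people, a town of 100,000); the entropy of a uniform choice among n possibilities is log2(n) bits:

```python
import math

world = 6_600_000_000
town = 100_000

# ~33 bits suffice to give every person on the planet a unique number.
assert round(math.log2(world)) == 33

# Learning the target's town reveals log2(world / town) bits and leaves
# log2(town) bits of uncertainty still to resolve.
revealed = math.log2(world / town)
remaining = math.log2(town)
assert round(revealed) == 16 and round(remaining) == 17

# The two parts always recombine: revealed + remaining == log2(world).
assert math.isclose(revealed + remaining, math.log2(world))
```

Each context clue an investigator gathers shaves off a few more of those remaining bits, which is exactly what “reducing your entropy” means here.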


Building a social network map from email activity. Source: Bits of Freedom – see reference 14.

Other anonymity tools

Other anonymity tools are emerging, targeting consumers exclusively, often with specific use cases. Leak, for example, was a service for sending anonymous emails to others, although it has now shut down.15 Whisper is a mobile app that allows users to post ‘anonymous’ messages – usually confessions – which are packaged for consumption by others.16 However, Whisper uses user IDs (installed on the device with the app) to track posts from the same person, and also collects each user’s IP address, along with other information such as their web browser, operating system, ISP and the pages they view. Together, these could quickly narrow a user’s entropy to zero, especially with the help of law enforcement.17

Other ‘anonymous’ apps also have their flaws. Secret.ly is a social networking app, described as a cross between Instagram and Twitter, that allows people to post their secrets anonymously. Rhino Labs found that it was possible to identify users through a simple flaw in the app’s API, which uploads a phone’s contact list upon account creation so that the user sees the secrets that friends are posting without knowing exactly who is posting what. Creating a fake account (the site requires no verification) and uploading just one contact enables a user to identify that contact’s secrets.18

True anonymity

With anonymity apps and networks proving to be less than perfect, will we ever gain true anonymity online? “It’s an economic question,” says Lewis. “That is to say, the extent to which identity is kept secret is inversely proportional to the value that it has.” The more valuable a piece of information is, the more money and resources particular actors – especially powerful ones such as nation states – will spend to acquire it.

The other issue is the trade-off. Lewis loves Uber, the ride-sharing service that lets passengers call up cheap rides on their cellphones. It’s convenient, and often less expensive than a taxicab, he says. It lets you rate your driver, and also build up a reputation as the driver rates you. “So the driver can’t get away with taking you the long way, and you can’t get away without making a mess in the back of the cab,” he says. But the advantages that Uber has over taxis come at the cost of the anonymity that taxi cabs give you. With Uber, you can’t simply jump into the back of an anonymous vehicle and wave cash at a driver who is unaware of your identity.



The Whisper website.

In the end, the most damaging threat to anonymity could be us, and our craving for the latest, shiny features and services. And there’s no easy technical fix for that.

About the author

Danny Bradbury is a freelance technology writer with over 20 years’ experience. He has written extensively for publications including the Guardian, the Independent, the Financial Times and the National Post. He also works as a documentary film maker and writing coach.

References

1. Magid, Larry. ‘EFF Launches Free Privacy Badger For Firefox And Chrome To Block Hidden Trackers’. Forbes, 21 Jul 2014. Accessed Oct 2014. www.forbes.com/sites/larrymagid/2014/07/21/eff-launches-free-privacy-badger-for-firefox-and-chrome-to-block-hidden-trackers/.
2. Nakamoto, Satoshi. ‘Bitcoin: A Peer-to-Peer Electronic Cash System’. Bitcoin.org, 2009. Accessed Oct 2014. https://bitcoin.org/bitcoin.pdf.
3. Bradbury, Danny. ‘How Anonymous is Bitcoin?’. CoinDesk, 7 Jun 2013. Accessed Oct 2014. www.coindesk.com/how-anonymous-is-bitcoin/.
4. Dingledine, Roger. ‘Tor security advisory: “relay early” traffic confirmation attack’. Tor Project Blog, 30 Jul 2014. Accessed Oct 2014. https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack.
5. ‘List Of Services Blocking Tor’. Tor Project. Accessed Sep 2014. https://trac.torproject.org/projects/tor/wiki/org/doc/ListOfServicesBlockingTor.
6. Dingledine, Roger. ‘A call to arms: Helping Internet services accept anonymous users’. Arma’s blog, Tor Project site, 29 Aug 2014. Accessed Oct 2014. https://blog.torproject.org/blog/call-arms-helping-Internet-services-accept-anonymous-users.
7. Galperin, Eva. ‘2011 in Review: Nymwars’. EFF, 26 Dec 2011. Accessed Oct 2014. www.eff.org/deeplinks/2011/12/2011-review-nymwars.
8. ‘Who is harmed by a “Real Names” policy?’. Geek Feminism Wiki. Accessed Sep 2014. http://geekfeminism.org/2011/07/08/anti-pseudonym-bingo/.
9. Google+ post, Google, 16 Jul 2014. Accessed Oct 2014. https://plus.google.com/+googleplus/posts/V5XkYQYYJqy.
10. ‘Leading social networks worldwide as of June 2014, ranked by number of active users (in millions)’. Statista, Jun 2014. www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
11. Barbaro, Mike; Zeller, Tom. ‘A Face Is Exposed for AOL Searcher No. 4417749’. New York Times, 9 Aug 2006. Accessed Oct 2014. www.nytimes.com/2006/08/09/technology/09aol.html?_r=0.
12. Narayanan, Arvind; Shmatikov, Vitaly. ‘Robust De-anonymization of Large Sparse Datasets’. University of Texas, 2008. Accessed Oct 2014. www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf.
13. Narayanan, Arvind. ‘What this blog is about’. 33 Bits of Entropy blog, 29 Sep 2008. http://33bits.org/2008/09/29/serious-content-coming-soon/.
14. de Zwart, Hans. ‘How Your Innocent Smartphone Passes On Almost Your Entire Life To the Secret Service’. Bits of Freedom, 30 Jul 2014. Accessed Oct 2014. www.bof.nl/2014/07/30/how-your-innocent-smartphone-passes-on-almost-your-entire-life-to-the-secret-service/.
15. Leak website. Accessed Sep 2014. https://justleak.it/.
16. Whisper website. Accessed Sep 2014. www.whisper.sh.
17. Olson, Parmy. ‘3 Reasons To Be Wary Of Secret-Sharing App Whisper’s Claim To Anonymity’. Forbes, 24 Jan 2014. www.forbes.com/sites/parmyolson/2014/01/24/3-reasons-to-be-wary-of-secret-sharing-app-whispers-claim-to-anonymity/.
18. Benjamin C. ‘Speak Carefully: How We Hacked (Your) Secret’. Rhino Security Labs, 21 Aug 2014. Accessed Oct 2014. www.rhinosecuritylabs.com/keeping-secret/.