Computer Law & Security Review 31 (2015) 490–505
Available online at www.sciencedirect.com
ScienceDirect www.compseconline.com/publications/prodclaw.htm
Malicious web pages: What if hosting providers could actually do something…
Huw Fryer b,*, Sophie Stalla-Bourdillon a, Tim Chown b
a Institute for Law and the Web, University of Southampton, UK
b ECS, Faculty of Physical Sciences and Engineering, University of Southampton, UK
Abstract
The growth in use of Internet based systems over the past 20 years has seen a corresponding growth in criminal information technologies infrastructures. While previous “worm” based attacks would push themselves onto vulnerable systems, a common form of attack is now that of drive-by download. In contrast to email or worm-based malware propagation, such drive-by attacks are stealthy as they are ‘invisible’ to the user when doing general Web browsing. They also increase the potential victim base for attackers since they allow a way through the user's firewall as the user initiates the connection to the Web page from within their own network. This paper introduces some key terminology relating to drive-by downloads and assesses the state of the art in technologies which seek to prevent these attacks. This paper then suggests that a proactive approach to preventing compromise is required. The roles of different stakeholders are examined in terms of efficacy and legal implications, and it is concluded that Web hosting providers are best placed to deal with the problem, but that the system of liability exemption deriving from the E-Commerce Directive reduces the incentive for these actors to adopt appropriate security practices.
© 2015 Huw Fryer, Sophie Stalla-Bourdillon and Tim Chown. Published by Elsevier Ltd. All rights reserved.
Keywords: Web security; Drive-by download; Malware; E-Commerce directive; Immunities; Internet intermediaries; Hosting providers; ISP; Search engines
1. Introduction
The ability of cyber criminals to compromise networked computer systems through the spread of malware allows the creation of significant criminal information technologies (IT) infrastructures or ‘botnets’. The systems comprising such infrastructures can be used to harvest credentials, typically through keylogging malware, or provide a cover for illegal activities by making victim computers perform criminal acts initiated by others, such as distributed denial of service (DDoS) attacks. A single compromise may result in an infected system that is used in multiple criminal activities, and the cumulative effect of these activities and the resources dedicated to prevention can be considerable.1 This paper explains how the phenomenon of drive-by downloads has evolved to become a significant threat to both Internet users and third party systems. To effect a compromise via a drive-by, a criminal will create a malicious Web page which, when visited, attempts to
1 See e.g. Ross Anderson and others, “Measuring the Cost of Cybercrime”, Proceedings (online) of the 11th Workshop on the Economics of Information Security (WEIS), Berlin, Germany (2012).
* Corresponding author. ECS, Faculty of Physical Sciences and Engineering, University of Southampton, Highfield, Southampton, SO17 1BJ, UK. E-mail address: [email protected] (H. Fryer).
http://dx.doi.org/10.1016/j.clsr.2015.05.011
0267-3649/© 2015 Huw Fryer, Sophie Stalla-Bourdillon and Tim Chown. Published by Elsevier Ltd. All rights reserved.
exploit vulnerabilities on the user's computer automatically. In contrast to email or worm-based malware propagation, such drive-by attacks are stealthy as they are ‘invisible’ to the user when doing general Web browsing. They also increase the potential victim base for attackers since they allow a way through the user's firewall, as the user initiates the connection to the Web page from within their own network.
The phenomenon of drive-by downloads is not a new one, but remains one of the significant threats to the security of the Web, with prominent malware variants being distributed in this way.2 The perception that malware only resides on ‘suspect’ sites, such as file sharing sites or those carrying pornography, is now far from reality. Commonly, an attacker will seek to compromise an otherwise legitimate website and use that to distribute malware. They may also attempt to place malware on a cheap throwaway domain name, but it is harder for ISPs or authorities to take measures against a legitimate website, and using one also increases the probability of a potential victim visiting it. Where the target is a website on a trending topic, the risk of exposure is even greater. With the rise of blogging and similar content creation, there is also a significant risk of vulnerabilities in common blogging platforms, such as WordPress, exposing visitors to such sites to potential drive-by malware.
This article provides a review of the existing strategies being used to mitigate this problem, and explains why they are not enough. We suggest that simple actions by Web intermediaries, in particular companies providing hosting services, could significantly reduce the number of malicious web pages, and force the criminals to use a smaller, more readily identifiable set of platforms to spread their malware. We conclude that laws excluding liability for intermediaries, such as the E-commerce Directive in the European Union, do not necessarily give an incentive to hosting providers to engage in such security practices, and legitimate use of the Web suffers as a result.
2. Background
Like any other technology, computers have turned out to be used extensively by criminals as well as for legitimate purposes. The problem has been more severe than with previous technology, due to the combination of two factors. Firstly, computers have increased the speed at which a task can be automated. Secondly, the Web has removed the majority of the geographic limitations on finding more victims, so this automation can be put to good (or rather malicious) use. An example of this automation in action comes from the volume of spam, which despite having reduced considerably from a high of 92.6%, still represents 75.2% of all emails.3
The main way that criminal groups are able to maintain infrastructure which can send this volume of spam, or perform other undesirable actions, is through the use of malicious software (malware). Malware takes over a victim's computer, and having done that can either attack the user directly, or recruit the computer into a botnet, i.e. a distributed network of computers which is of great value to an attacker. Targeting the users might include something as simple as altering search results to gain advertising revenue, or spying on browsing habits to target adverts. More seriously, it can steal credentials for online banking, or render a user's computer unusable (e.g. through encrypting all their files) unless they pay a ransom. Distributed computing offers the opportunity to conduct distributed denial of service attacks, send spam and, more recently, mine bitcoins.4
Over the years, the tactics that criminals have used to distribute malware have evolved and now different strategies are required to combat them. This section provides some background of this evolution, up to the primary focus of the paper: that of “drive-by” downloads. The distinctions between different types of malware are often unhelpful, since many of them do not fit neatly into one category, and incorporate elements of different types of malware. The reason for the distinctions in this section is to emphasise the differences in propagation methods, and the differences in strategy which are required to combat them.
2 Chris Grier and others, “Manufacturing Compromise: The Emergence of Exploit-as-a-Service”, Proceedings of the 2012 ACM conference on Computer and communications security (2012).
3 Trustwave, “Trustwave 2013 Global Security Report” (2013) accessed July 22, 2014.
2.1. Exploitation vs social engineering
In order to work, malware needs to be able to run on a victim machine. One method to infect a victim is known as social engineering, which is simply to make the user voluntarily run the malicious code.5 This can be accomplished through the use of Trojan style malware. As the name suggests, this is a reference to the Trojan horse from Greek legend, which was let into Troy and allowed the Greeks hiding within to sneak out and open the gates of the besieged city from the inside. In the context of security, this might comprise an application purporting to perform a certain task, whilst an application hidden within attempts to subvert the machine it is run on.
Another method is to exploit a vulnerability on the machine. A vulnerability is a flaw, or bug, in a piece of software which amounts to a security weakness. Vulnerabilities will have a greater or lesser degree of severity, but the most serious are those which allow Remote Code Execution (RCE). These vulnerabilities allow an attacker to run their own code rather than the code intended by the application. This is done by confusing the program into accepting input as commands to be executed, rather than as data to be manipulated. An exploit is a piece of code which takes advantage of the vulnerability, in order to run the desired code. In traditional computer based applications, this will be done by corrupting the memory, but in Web applications there are many other methods through which this can be achieved, and other less serious vulnerabilities which exist.6
The two methods can also be used together. For example, a common attack vector is that of a malicious attachment sent in an email. Sending an executable file would simply be social engineering,7 and that would be used to infect the computer. For the most part though, to give the user a plausible reason to open the attachment, a file type not typically associated with being malicious (such as a Word document or PDF file) is used. As the victim opens the maliciously crafted file, a vulnerability in the file reading software is exploited and that is used to take over the computer. A typical example is an email purporting to be from a delivery company, with information in a PDF file on the location of a parcel sent to the victim.
4 Bitcoins are a virtual currency, a part of which relies on solving a “hard” mathematical problem, for which the miner is compensated. The power requirements for doing this are significant, so using a network of victim computers can save a considerable amount of money.
5 In this context, code refers to the series of instructions written by the programmer which gets converted into “machine code” (a series of 0s and 1s) that the computer can understand.
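The idea of a program being confused into treating input as commands rather than data can be illustrated with a small sketch. The example below is our own illustration and is not drawn from any attack discussed in this paper: it uses SQL injection against an in-memory SQLite database, which is an injection flaw rather than remote code execution, but rests on the same confusion. The table, the user names and the functions `lookup_unsafe` and `lookup_safe` are hypothetical.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding two hypothetical user rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is pasted into the command text, so quote
    # characters inside `name` can change the structure of the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the placeholder keeps the input as data; it is never parsed as SQL.
    rows = conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


conn = build_db()
crafted = "nobody' OR '1'='1"              # input written to be read as a command
leaked = lookup_unsafe(conn, crafted)      # the WHERE clause is rewritten by the input
contained = lookup_safe(conn, crafted)     # the input is only compared as a string
```

In the unsafe version the crafted input becomes part of the query's logic and every stored secret is returned; in the safe version the same input is simply a name that matches nobody.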
2.2. Viruses
Early malware was mostly computer viruses, a term now incorrectly used by laypeople for all forms of malware. A virus was characterised by the fact that it would attach itself to a previously benign file, and then spread from file to file on the computer. The concept originally came from Von Neumann, and then Cohen analysed the properties of computer viruses in more detail.8 An early Masters thesis by Kraus considered a biological analogy: code did not satisfy the requirements for being classed as alive, whereas a virus was simpler than most other organisms, had the ability to reproduce, and hence became a workable analogy.9
The virus would spread from file to file on a computer, and then transfer to different computers through the physical transfer of floppy disks between users. This was the most logical way for it to spread, since use of networks in general, and in particular the Web, was in its infancy.10 The nature of the spread of real world viruses, and actions to limit them, appeared to hold with computer viruses too,11 which led to a substantial body of work on epidemiology of computer networks. Given the similarities between virtual and physical viruses, a direct analogy was drawn and strategies for minimising the spread came from that. Kephart & White were amongst the first to look at this,12 and other ideas like selective immunisation or quarantine were also tried with varying levels of success.13
6 See infra, Section 4.1.
7 Mostly. It is possible to rely on the fact that Windows hides file extensions by default, so filename.doc.exe would display as filename.doc, whereas running it would cause the executable file to run.
8 Fred Cohen, “Computer Viruses: Theory and Experiments” (1987) 6 Computers & security 22.
9 Kraus, 1988 Masters thesis, translated by D Bilar and E Filiol, “On Self-Reproducing Computer Programs” (2009) 5 Journal in computer virology 9.
10 Interestingly this approach is still used with memory sticks, in particular against sensitive machines which are not connected to the Internet to avoid malware. For example the Stuxnet malware used this method: Ralph Langner, “Stuxnet: Dissecting a Cyberwarfare Weapon” (2011) 9 Security & Privacy, IEEE 49. It was also a propagation method of the Conficker worm, see Phillip Porras, Hassen Saidi and Vinod Yegneswaran, “Conficker C Analysis” [2009] SRI International.
11 William H Murray, “The Application of Epidemiology to Computer Viruses” (1988) 7 Computers & Security 139.
12 Jeffrey O Kephart and Steve R White, “Measuring and Modelling Computer Virus Prevalence”, Research in Security and Privacy, 1993. Proceedings., 1993 IEEE Computer Society Symposium on (1993).
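To make the epidemiological analogy of Section 2.2 concrete, the following is a minimal sketch of a susceptible-infected-susceptible (SIS) model of the general kind used in that literature. It is not taken from any of the cited works; it assumes every machine is equally likely to contact every other, and the function name, the infection rate `beta` and the cure rate `delta` are hypothetical choices made for illustration.

```python
def simulate_sis(beta: float, delta: float, i0: float = 0.01,
                 dt: float = 0.01, steps: int = 20000) -> float:
    """Return the infected fraction after `steps` Euler steps of size `dt`.

    Model (an assumption of this sketch): di/dt = beta * i * (1 - i) - delta * i,
    where i is the fraction of machines currently infected.
    """
    i = i0
    for _ in range(steps):
        new_infections = beta * i * (1.0 - i)  # infected machines meeting clean ones
        cures = delta * i                      # infected machines being disinfected
        i += dt * (new_infections - cures)
    return i


# When infection outpaces cure, the model tends towards 1 - delta / beta.
endemic = simulate_sis(beta=0.5, delta=0.1)
# When cure outpaces infection, the infection dies out.
dies_out = simulate_sis(beta=0.1, delta=0.5)
```

In such a model, measures like immunisation or quarantine can be thought of as lowering the effective infection rate or raising the cure rate, which is why a threshold between the two rates decides whether an infection persists.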
2.3. Worms
Like a virus, a worm also self-propagates. The difference is that whilst a virus was constrained in having to go from file to file, a worm could install itself only once on a computer and then scan for other computers to push itself onto. Rather than relying on physical devices to propagate, it automated the process using network or Internet connections. It required no intervention from the user: it would simply scan for vulnerable machines and exploit the same vulnerability on each one.
As usage of the Web began to grow significantly in the early 2000s, worms were incredibly common and incredibly effective. Few of the computers connecting to the Internet had adequate security to deal with these attacks. Firewalls were not commonly installed, meaning that it was possible to push this malware without any barriers, and many computers were directly accessible on the Internet to an attacker.14 Operating system vendors also took time to adjust to the nature of the threat, in that their products were under such relentless attack. For example, it was not until 2003 that Microsoft introduced a regular patching cycle, and even then the update mechanism required users to opt in rather than being automatic, which meant that a lot of the time updates never happened.
This attack method has since fallen out of favour, and there are several possible reasons for this. The first is that operating system vendors have caught up with the threats and the hostile environment they have to work within, and have introduced additional security and updates into their products. As such, worm exploits (which attacked operating systems) are not so easy to find. Modern operating systems also have firewalls installed by default,15 which largely solves the problem of malware “pushing” onto a machine. Similarly, the depletion of IP addresses also led to the adoption of Network Address Translation (NAT) hardware, which enabled multiple computers on a local network to share the same IP address on the Internet, as is the case on most home networks. A side effect of this is that the NAT hardware will not accept unsolicited communications from the Internet, and this can block worm based attacks.
13 See e.g. Giuseppe Serazzi and Stefano Zanero, “Computer Virus Propagation Models”, Performance Tools and Applications to Networked Systems (Springer 2004); and Chenxi Wang, John C Knight and Matthew C Elder, “On Computer Viral Infection and the Effect of Immunization”, Computer Security Applications, 2000. ACSAC '00. 16th Annual Conference (2000).
14 At a certain point, it was not possible for even vigilant users to configure their computers before becoming infected, Scott Granneman, “Infected in 20 Minutes” (The Register, 2004) accessed June 20, 2014.
15 Joe Davies (Microsoft), “New Networking Features in Microsoft Windows XP Service Pack 2” (2004) accessed June 20, 2014.
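The side effect of NAT described in Section 2.3 can be sketched with a toy translation table. This is a simplified illustration under our own assumptions rather than a description of any particular device: real NAT behaviour varies between implementations, and the class `ToyNat`, its port numbering and the example addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ToyNat:
    """Toy NAT: inbound packets are forwarded only if they answer an
    earlier outbound connection made from inside the local network."""
    public_ip: str
    next_port: int = 40000
    # public port -> (internal ip, internal port, remote ip, remote port)
    table: dict = field(default_factory=dict)

    def outbound(self, internal_ip: str, internal_port: int,
                 remote_ip: str, remote_port: int) -> int:
        """An internal host opens a connection; record it, return the public port."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (internal_ip, internal_port, remote_ip, remote_port)
        return public_port

    def inbound(self, remote_ip: str, remote_port: int,
                public_port: int) -> Optional[Tuple[str, int]]:
        """Return the internal (ip, port) to forward to, or None to drop the packet."""
        entry = self.table.get(public_port)
        if entry is None:
            return None  # unsolicited: nobody inside asked for this traffic
        internal_ip, internal_port, expected_ip, expected_port = entry
        if (remote_ip, remote_port) != (expected_ip, expected_port):
            return None  # the mapping belongs to a different remote host
        return (internal_ip, internal_port)


nat = ToyNat(public_ip="203.0.113.1")
# A worm scanning the public address finds no mapping, so its probe is dropped.
worm_probe = nat.inbound("198.51.100.7", 4444, 445)
# A user inside the network browses to a web server, which creates a mapping.
port = nat.outbound("192.168.0.10", 51000, "192.0.2.80", 80)
web_reply = nat.inbound("192.0.2.80", 80, port)
```

The probe is dropped because nothing inside the network asked for it, while the reply to the user's own request is forwarded to the machine that made it.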
2.4. Drive-by downloads
Attackers reacted to the defences against worms by taking the opposite approach with drive-by downloads. A drive-by download waits for the user to come to it rather than attempting to force itself onto other machines (using a “pull” rather than “push” propagation mechanism). The process requires that a user visits a website which is under the control of an attacker. Once the website has been visited, malicious code on the website will attempt to subvert the user's browser, and take over the computer that way, since it is the browser which is used to access the website. This might be done by using JavaScript to corrupt the browser itself, or by exploiting one of the plugins which the browser is running. Plugins are additional features added to the browser, often to play multimedia content, such as Flash, Adobe Reader or Java. As with operating systems, these will also contain vulnerabilities, so the attacker will attempt to exploit these in the same way to get their malicious code to run.
This offers some significant advantages to an attacker over a worm-based attack. Firstly, any logs which exist of a user visiting a malicious website will be virtually indistinguishable from normal Web browsing, so the compromise has less chance of being discovered. Secondly, although a firewall can block attacks from the Internet, it has to let some traffic through in order to make Web browsing possible. Using a website can therefore offer the attacker a way through the user's firewall, and therefore increase the potential victim base.16
The phenomenon of drive-by downloads is not a new one, but it has remained a significant threat. Provos et al. performed a detailed analysis of drive-by attacks as early as March 2006–07,17 and also January–October 2007.18 They found that 1.3% of results in Google searches were malicious, and that 0.6% of the most popular 1 million URLs had, at some point, been used to host malicious content.
A typical attack will use a previously benign website which is compromised by the attacker to include malicious content. This is a separate part of the attack, which takes place before the victim browses to the website, and will use some weakness in the website or the server it is hosted on, commonly including out of date software, malicious advertising, or exploits using unchecked user data.19 Following the exploitation of the website, the content will then be changed to include malicious code, usually to redirect the victim to an attack website, which contains the code performing the exploit.20
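One way a compromised page can redirect a visitor is by loading the attack website in a frame the user cannot see. The sketch below is a deliberately naive heuristic of our own, not a technique taken from the works cited here: it flags iframes which point at another host and are either hidden or given a tiny size. The class `HiddenFrameFinder`, the function `find_hidden_frames` and the example hosts are hypothetical, and a real attack would usually obfuscate the redirect (for example with JavaScript), so a check this simple would miss many cases and wrongly flag others.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HiddenFrameFinder(HTMLParser):
    """Collect iframes that load another host while being hidden or tiny."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        # Attribute values can be None (e.g. <iframe hidden>), so normalise them.
        a = {key: (value or "") for key, value in attrs}
        host = urlparse(a.get("src", "")).netloc
        external = bool(host) and host != self.page_host
        style = a.get("style", "").replace(" ", "").lower()
        tiny = a.get("width") in ("0", "1") or a.get("height") in ("0", "1")
        hidden = "display:none" in style or "visibility:hidden" in style
        if external and (tiny or hidden):
            self.suspicious.append(a["src"])


def find_hidden_frames(html: str, page_host: str) -> list:
    """Return the src of each external iframe that is hidden or tiny."""
    finder = HiddenFrameFinder(page_host)
    finder.feed(html)
    return finder.suspicious


page = (
    '<html><body><p>Holiday photos</p>'
    '<iframe src="http://attack.example/exploit" width="0" height="0"></iframe>'
    '<iframe src="/local/widget" width="0"></iframe>'
    '<iframe src="http://video.example/embed" width="640" height="360"></iframe>'
    '</body></html>'
)
flagged = find_hidden_frames(page, page_host="blog.example")
```

Only the zero-sized frame pointing at a different host is flagged; the local widget and the visible embedded video are left alone.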
16 Niels Provos and others, “The Ghost in the Browser: Analysis of Web-Based Malware”, Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets (2007).
17 Ibid.
18 Niels Provos and others, “All Your iFRAMEs Point to Us”, Proceedings of the 17th Conference on Security Symposium (USENIX Association 2008).
19 See infra Section 4.1.
20 An