Designing flexible sandboxing solutions to adapt to new malware trends


Matteo Cafasso and Mathieu Tarral, F-Secure

Computer Fraud & Security, February 2018

Every day, security organisations analyse thousands of new files and URLs, identifying the harmful ones to constantly improve their knowledge of computer threats. It is hard to guess whether a file or URL could be harmful or not without executing it, and executing unknown malicious software is dangerous.

Unknown threats

When dealing with unknown threats, such as malicious files or web URLs, one of the best analysis techniques is to execute them in order to study their behaviour. This approach does not come without risks. A typical mitigation is to execute the objects within a sandbox. Sandboxes are restricted environments carefully designed for safely running dangerous programs and recording the changes that they produce. Their implementation varies according to the specific use case; hardware-assisted virtualisation is a common choice. Virtualisation has many advantages: it is a mature and stable technology; it allows a large number of environments to be provisioned quickly; it scales very well; and it provides a good degree of isolation when running dangerous software. Other examples of sandbox implementations are emulators and containers.

IT security researchers and analysts have been using sandboxes for several years in order to study the behaviour of interesting samples. Many IT security companies rely on automated sandboxes to analyse large numbers of files and web URLs. Despite their popularity among IT security specialists, though, the commercialisation of sandboxes has been slow.

One of the main reasons for this appears to be the high maintenance cost of this type of solution. Behavioural analysis sandboxes are required to execute a large variety of programs in order to identify the malicious ones. Considering the number of different devices, operating systems, executables and media types that are available today, this is not an easy task [1]. Another challenging aspect is the continuous evolution of malware, which forces forensic experts to constantly perfect their feature extraction tools. Such tools are required to produce the data on which a verdict about the nature of a sample is based. These tools must be quickly integrated within the sandboxes themselves without affecting their operational status.

It seems clear that there is a need to reduce the maintenance and operational costs of such solutions. This article explores design patterns and technologies suitable for this kind of problem, aiming to help the security industry bring sandbox technologies closer to their customers. We’ll look at an architectural design aiming to produce a flexible and maintainable sandboxing platform. Then we’ll discuss a software development kit (SDK) called Sandboxed Execution Environment, whose design reproduces what is discussed in the first part. Finally, we’ll evaluate virtual machine introspection techniques as a means of providing cost-effective behavioural extraction features to sandboxes. To conclude this last part, we’ll introduce a virtual machine introspection tool called Nitro.

Requirements

For this research, requirements elicitation was conducted with malware analysts, security researchers and behavioural analysis experts. After several sessions, the requirements were gathered and sorted by relevance. A brief summary follows.

In the past, malware was prevalent mainly on Microsoft Windows operating systems. In recent years, with the introduction of portable devices such as smartphones and tablets, the trend has started shifting towards modular malware capable of affecting multiple platforms at once. Operating systems such as Android, macOS and Linux are becoming more and more interesting for attackers. One of the reasons is the adoption of such platforms in enterprises and other organisations, which makes them attractive targets for information exfiltration. Support for different operating systems is therefore critical nowadays for the IT security business. A natural consequence is the need to support different hardware platforms and devices. Virtualisation technologies are of great benefit for these kinds of problems. Yet the lack of standards in this field often forces the adoption of multiple virtualisation solutions, rather than a single one.

While observing malware analysts at work, it became evident that the use of different forensic tools and technologies plays a key role in the analysis of a sample. Behavioural analysis encompasses multiple fields of expertise, such as memory and file system forensics or operating system internals. Each of these fields allows the behaviour of a program to be observed from a specific angle. The co-operation of these competences is of primary importance and should be taken into account when designing these types of solutions. It should be easy for malware experts to integrate their knowledge and skills into the behavioural analysis platform.

Finally, a straightforward requirement is the development of a simple and well-documented framework enabling users and developers to achieve what is mentioned above.

A flexible design

The event-driven architectural pattern is based on a simple mechanism consisting of the production and consumption of events. Several entities act as producers and consumers of events that are shared between all the components of the platform. An event represents a change of state within the architecture [2]. The transition may be triggered internally by one of the platform’s components or externally by a request from a user. Events may be simple entities, such as strings or integers, or more complex messages with headers and further metadata, for example names, timestamps and payloads.

To be a complete event-driven architecture (EDA), the system must share the events between all its components. It requires a common buffer able to dispatch the events and, if necessary, queue them when the consumers are not consuming them quickly enough. EDAs are loosely coupled and support modular design [3]. In an EDA, the events are aware neither of what triggered them nor of the consequences they might generate once consumed. In addition, each component of the architecture can limit its knowledge to its own responsibilities, ignoring the rest of the system internals.

The observer pattern is a good fit for implementing the event flow control, and is one of the simplest design patterns for implementing EDAs [4]. An observer-based architecture consists of multiple observer entities subscribing to one or more observable objects. In this context, the sandbox plays the observable role, whereas the forensic tools are the observers. Each observer can subscribe, through the observable sandbox, different handlers to specific events. Any time a change occurs in the sandbox state, a related event is triggered, allowing the forensic tools to act accordingly. Some examples of events are the powering on of the execution environment, the insertion or removal of a device, or a specific API call from the test sample.

A protocol consisting of the formal description of the events and their approximate sequence order is introduced. The protocol is the contract that the developers (the forensic analysis experts) need to honour and the only thing they need to be aware of. As the observers are totally decoupled from each other, there is little or no risk of interference. Such a design promotes the development of generic forensic plugins that are well isolated and can be reused in different contexts. The end user can run multiple tests employing different forensic plugins according to need.
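The observer-based design described above can be illustrated with a minimal Python sketch. It is not taken from any existing framework: the class names (`Event`, `ObservableSandbox`, `MemoryDumpPlugin`, `DeviceLoggerPlugin`) and the event names are assumptions made purely for illustration. The sandbox queues events in a common buffer and dispatches them to whichever plugins subscribed to them, while the plugins know nothing about each other.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event carrying a name, a timestamp and an optional payload."""
    name: str
    payload: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class ObservableSandbox:
    """Hypothetical observable: queues events and dispatches them to handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def trigger(self, name, **payload):
        # Events are queued first, so slow consumers do not lose any of them.
        self._queue.append(Event(name, payload))

    def dispatch(self):
        # Deliver queued events in order of arrival.
        while self._queue:
            event = self._queue.popleft()
            for handler in list(self._handlers[event.name]):
                handler(event)


class MemoryDumpPlugin:
    """Hypothetical forensic plugin: only knows the events it subscribes to."""

    def __init__(self, sandbox):
        self.dumps = []
        sandbox.subscribe("sample_executed", self.on_sample_executed)

    def on_sample_executed(self, event):
        self.dumps.append(f"memory snapshot after {event.payload['sample']}")


class DeviceLoggerPlugin:
    """Hypothetical plugin recording device insertions and removals."""

    def __init__(self, sandbox):
        self.log = []
        sandbox.subscribe("device_inserted", self.on_device)
        sandbox.subscribe("device_removed", self.on_device)

    def on_device(self, event):
        self.log.append((event.name, event.payload["device"]))


sandbox = ObservableSandbox()
memory_plugin = MemoryDumpPlugin(sandbox)
device_plugin = DeviceLoggerPlugin(sandbox)

sandbox.trigger("device_inserted", device="usb0")
sandbox.trigger("sample_executed", sample="sample.exe")
sandbox.trigger("device_removed", device="usb0")
sandbox.dispatch()
```

Because each plugin touches only the `subscribe` call and its own handlers, either plugin could be removed or replaced without changing the other, which is the decoupling the protocol is meant to guarantee.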

Sandboxed Execution Environment

The Sandboxed Execution Environment (SEE) is a framework that implements what has been illustrated above [5]. Developed in Python, SEE offers a unified interface for managing different virtualisation technologies. It allows for easy execution and control of multiple sandboxes. Each sandbox is paired with a plugin loading interface and an observer pattern-based event flow controller. The user can describe each test case, specifying the sandbox type and characteristics, as well as the forensic plugins to be used. Once launched, the framework will apply the designed protocol and route the events to the plugins.

Each plugin has access to the sandbox resources. The resources consist of the virtual machine memory, the CPU and its registers, the attached devices, the disk and the network interfaces. At any point during the execution, the resources can be inspected to extract relevant information. It is also possible to change the state of such resources. Each state change should result in specific events being triggered, making sure all plugins are aware of what is happening.

Internally, SEE relies on the Libvirt Virtualisation API to support multiple virtualisation technologies [6]. It currently supports QEMU/KVM, VirtualBox and Linux Containers. More virtualisation technologies can be easily added by exposing the related Libvirt API.
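To make the idea of a test-case description more concrete, here is a small self-contained sketch. It deliberately does not use the SEE API: the configuration keys, the `Plugin` class, the event names and the image file name are all hypothetical. It only shows how a description naming a sandbox type and a set of plugins could be turned into plugin instances, and how a fixed protocol of events could be routed to them.

```python
# Hypothetical test-case description: sandbox type plus the plugins to load.
TEST_CASE = {
    "sandbox": {"type": "qemu", "memory_mb": 2048, "image": "windows10.qcow2"},
    "plugins": [
        {"name": "screenshot", "events": ["sample_started"]},
        {"name": "network_trace", "events": ["vm_started", "vm_stopped"]},
    ],
}

# The protocol: the formal, ordered list of events a test run goes through.
PROTOCOL = ["vm_started", "sample_started", "sample_finished", "vm_stopped"]


class Plugin:
    """Hypothetical forensic plugin interested in a subset of events."""

    def __init__(self, name, events):
        self.name = name
        self.events = set(events)
        self.seen = []

    def handle(self, event, sandbox):
        # A real plugin would inspect sandbox resources (memory, disk, network).
        self.seen.append((event, sandbox["type"]))


def load_plugins(test_case):
    """Instantiate one plugin per entry in the test-case description."""
    return [Plugin(p["name"], p["events"]) for p in test_case["plugins"]]


def run(test_case, protocol):
    """Apply the protocol and route each event to the interested plugins."""
    plugins = load_plugins(test_case)
    for event in protocol:
        for plugin in plugins:
            if event in plugin.events:
                plugin.handle(event, test_case["sandbox"])
    return {plugin.name: plugin.seen for plugin in plugins}


results = run(TEST_CASE, PROTOCOL)
```

Keeping the description declarative means the same plugins can be reused across test cases that differ only in sandbox type or characteristics.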

VMI for flexible sandboxing

In this second part, Virtual Machine Introspection (VMI) technologies are evaluated with the goal of improving flexibility in sandboxing solutions. VMI as a concept is introduced first, followed by its possible applications and an explanation of why it reduces the maintenance and development cost of modern sandboxes.

Core concepts

The term VMI first appeared in a publication in 2003 and is described as “Inspecting a virtual machine from the outside, for the purpose of analysing the software running inside it” [7]. This first research was driven by the desire to create a new type of intrusion detection system (IDS), combining the attack resistance of a network-based solution with the greater visibility offered by an endpoint-based one. The technology itself has evolved since, but the core concepts remain the same and can be summarised as follows.

The main goal is to monitor the execution of a virtual machine in the most transparent and seamless way possible, without requiring any modification of the execution environment or the installation of an intrusive guest agent. The key element is the Virtual Machine Monitor (VMM), which is modified to provide an API giving a VMI application complete access to the guest hardware state (CPU registers, memory, etc). The API can then be integrated within an event-driven architecture (EDA), allowing an application to subscribe and listen to guest hardware events (page table modifications, read/write access to CR/MSR registers, etc). When such an event occurs in the virtual machine, it is forwarded to the VMI application if the application has subscribed to it. The guest execution is suspended while the VMI application processes the event, and resumes only once the application sends an explicit acknowledgement.

The VMI application must solve a problem known as the ‘semantic gap’ by querying the guest hardware state, inspecting the memory and rebuilding the high-level context (current process name, API being called and so on). Only then can the event be passed to the actual VMI application logic, which may be as simple as monitoring.
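The suspend, process and acknowledge cycle described above can be sketched as a small simulation. Nothing here talks to a real hypervisor: `FakeVMM`, `HardwareEvent`, the event kinds and the process table are stand-ins invented for illustration, and the semantic gap is reduced to a single dictionary lookup.

```python
from dataclasses import dataclass


@dataclass
class HardwareEvent:
    kind: str    # for example "cr3_write"
    value: int   # raw value observed by the hypervisor


class FakeVMM:
    """Stand-in for a modified hypervisor; this is not a real VMM API."""

    def __init__(self, events, process_table):
        self._events = list(events)
        self.process_table = process_table  # page-table base -> process name
        self.guest_running = True
        self.trace = []

    def next_event(self):
        if not self._events:
            return None
        self.guest_running = False   # the guest is suspended on each event
        self.trace.append("suspend")
        return self._events.pop(0)

    def acknowledge(self):
        self.guest_running = True    # explicit acknowledgement resumes it
        self.trace.append("resume")


def bridge_semantic_gap(vmm, event):
    # Rebuild high-level context (here, just a process name) from raw state.
    return vmm.process_table.get(event.value, "<unknown>")


def monitor(vmm, subscribed_kinds):
    """Process events one by one, acknowledging each so the guest resumes."""
    observed = []
    while (event := vmm.next_event()) is not None:
        if event.kind in subscribed_kinds:
            observed.append((event.kind, bridge_semantic_gap(vmm, event)))
        vmm.acknowledge()
    return observed


vmm = FakeVMM(
    [
        HardwareEvent("cr3_write", 0x1000),
        HardwareEvent("msr_write", 0x10),
        HardwareEvent("cr3_write", 0x2000),
    ],
    {0x1000: "explorer.exe", 0x2000: "sample.exe"},
)
observed = monitor(vmm, {"cr3_write"})
```

Note that every event is acknowledged, including those the application ignores; otherwise the simulated guest would remain suspended.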

Figure 1: SEE architecture overview.

Figure 2: Architecture of a VMI application.

Key advantages

VMI has become popular in computer security research because of two main advantages:
• Isolation: the observer (the VMI application) is totally isolated from the observable (the virtual machine), and this isolation is provided and enforced by the CPU itself through virtualisation.
• OS agnosticism: once a VMM has been modified to allow VM introspection, introspection is applicable to every VM being virtualised on the same hardware architecture, regardless of the operating system.

All use cases for VMI are security related:
• Robust and trustworthy monitoring.
• VM-based intrusion detection.
• Malware analysis.
• Memory forensics.

Hypervisor adoption

The development and implementation of VMI capabilities in a given hypervisor takes a lot of effort. First, the VMM itself is complex. Second, finding and implementing a mechanism that forces the guest OS to exit to the VMM in the right context involves a substantial amount of research beforehand. Furthermore, side issues such as filling the semantic gap have to be addressed for the whole solution to be usable.

As of today, Xen is leading VMI research, a consequence of its having been the first open source hypervisor available, which allowed researchers to explore the field and demonstrate the usefulness of such modifications. But it was not until recently that the industry started to seriously consider VMI as a key technology to analyse and counter the most advanced malware trends. As a consequence, the first memory inspection API was officially integrated into Xen in 2011, followed by a more complete API in the 4.6 release in 2015 [8, 9]. Other hypervisors, such as KVM, have followed this trend: very recently, a set of patches was posted on the Linux kernel development mailing list with the aim of making the same features as in Xen available in KVM [10].

Reducing development costs

This section illustrates the development and maintenance cost of guest agents from the perspective of multi-platform support. First, operating system-specific internals must be learned and kept up to date in order to understand what to observe when it comes to software behaviour. Then an in-depth knowledge of the OS low-level facilities is required to develop and maintain the agent, increasing its robustness and reliability, as well as hiding its presence with techniques such as DKOM (Direct Kernel Object Manipulation).

When it comes to the monitoring itself, various techniques have been described, in both user and kernel space, to trace API calls system-wide. Most of the time, they revolve around hacks rather than official and documented APIs. Furthermore, some well-known techniques (like inline hooking) suffer from synchronisation issues, making them not thread-safe. Microsoft has introduced hot patch points to counter these issues but they are only compatible with the latest versions of Windows [11]. This situation leads to a very complex piece of software designed for a specific platform.

From a maintenance angle, the complexity only increases. Take the example of adapting a driver to support platforms from Windows XP to Windows 10. The evolution of operating system internal APIs makes the driver’s code mostly incompatible across versions, forcing the developer to maintain multiple versions of the same driver. The hooking techniques and frameworks also have to be adapted, since newly introduced security mechanisms might break the workarounds previously set up to intercept API calls. Finally, on top of this, adding support for other platforms, such as Linux or Android, becomes all but inconceivable.

Leveraging SLAT

Second Level Address Translation (SLAT) is the generic name given to a hardware virtualisation technology that avoids the overhead of maintaining software-emulated shadow page tables in a hypervisor by adding a new set of hardware page tables specifically designed to translate a GPA (guest physical address) to an HPA (host physical address) [12]. It was implemented first by AMD as RVI (Rapid Virtualisation Indexing), followed by Intel with EPT (Extended Page Tables) [13].

As page table entries contain permission flags, it is possible to take advantage of them by changing the permissions of a specific page in order to generate a fault on purpose. Doing so forces the guest to yield control back to the hypervisor whenever a process tries to read, write or execute the content of that page.

This mechanism can be leveraged to hide traditional breakpoint injections from the guest. The breakpoint is injected at a specific address in memory and the page is marked as non-readable. If the guest tries to read the page, a trap is generated, returning control to the hypervisor. The hypervisor restores the original instruction replaced by the breakpoint, as well as the page read permission. It then single-steps the execution, so that the guest reads the original content, and restores the injected breakpoint and the non-readable permission of the page. This technique has been demonstrated by the DRAKVUF project and allows a VMI application to stop the guest execution on any memory address access [14].

By coupling this technology with a powerful memory introspection framework (such as Rekall) that makes use of debug symbols, it is possible to know the exact address of every system call handler (via the SSDT), as well as of every userland API, by parsing the export table of every loaded DLL.
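The breakpoint-hiding sequence can be followed step by step in a toy model. The sketch below is a pure Python simulation under simplifying assumptions: a single page is represented by a `bytearray`, a boolean stands in for the second-level read permission, and the hypervisor's single-step is collapsed into one function call. The class and method names are hypothetical and unrelated to the DRAKVUF code base.

```python
BREAKPOINT = 0xCC  # x86 INT3 opcode


class GuestPage:
    """Toy model of one guest page guarded by second-level permissions."""

    def __init__(self, content):
        self.memory = bytearray(content)
        self.readable = True
        self.original = {}  # address -> original byte hidden by a breakpoint

    def inject_breakpoint(self, address):
        self.original[address] = self.memory[address]
        self.memory[address] = BREAKPOINT
        self.readable = False  # reads now trap to the hypervisor

    def guest_read(self, address):
        if self.readable:
            return self.memory[address]
        # Read violation: control returns to the hypervisor.
        return self._hypervisor_handle_read(address)

    def _hypervisor_handle_read(self, address):
        # 1. Restore the original bytes and the read permission.
        for addr, byte in self.original.items():
            self.memory[addr] = byte
        self.readable = True
        # 2. Single-step the guest so the faulting read completes.
        value = self.memory[address]
        # 3. Re-inject the breakpoints and remove the read permission.
        for addr in self.original:
            self.memory[addr] = BREAKPOINT
        self.readable = False
        return value

    def guest_execute(self, address):
        # Executing the patched byte hits the breakpoint and traps.
        return "trap" if self.memory[address] == BREAKPOINT else "run"


page = GuestPage(bytes([0x55, 0x8B, 0xEC, 0x90]))
page.inject_breakpoint(0)
```

In this model a guest that reads address 0 gets the original byte back, while executing the same address still traps, so an integrity check running inside the guest would not notice the patch.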

The semantic gap

The core concepts of VMI are not new to IT security; in fact, they have been around for a while. The main reason for the slow adoption of VMI is the challenge of filling the semantic gap. By moving the source of information from the execution environment to the hypervisor, the entire high-level context provided by the OS facilities is lost. Bridging the gap between the low-level details that the hypervisor can observe and the high-level semantics that security analysts are looking for is the missing component required to make VMI techniques successful in detecting malware.

The Rekall memory forensic framework can come to the rescue here. By downloading official debug symbols directly from the OS developers’ servers and applying them to memory images, it can provide a high-level view on top of a raw memory dump [15]. LibVMI, a popular VMI library, already integrates with Rekall, allowing a VMI application to translate OS-specific symbols [16].
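As a rough illustration of how symbol information turns raw bytes into a high-level view, the sketch below walks a linked list of process records inside a fabricated memory dump. It does not use Rekall or LibVMI. The profile layout, the offsets and the record format are invented for the example, and the list is singly linked for brevity, unlike the structures a real kernel maintains.

```python
import struct

# Hypothetical symbol profile, in the spirit of what debug symbols provide:
# kernel symbol addresses and structure field offsets. All values are made up.
PROFILE = {
    "symbols": {"PsActiveProcessHead": 0x40},
    "process": {"pid": 0x00, "next": 0x08, "name": 0x10, "name_len": 16},
}


def read_u64(memory, address):
    """Read a little-endian 64-bit integer from the raw dump."""
    return struct.unpack_from("<Q", memory, address)[0]


def read_name(memory, address, length):
    """Read a NUL-padded ASCII string of at most `length` bytes."""
    return memory[address:address + length].split(b"\x00", 1)[0].decode("ascii")


def list_processes(memory, profile):
    """Walk a singly linked list of toy process records in a raw dump."""
    fields = profile["process"]
    address = read_u64(memory, profile["symbols"]["PsActiveProcessHead"])
    processes = []
    while address:  # a null pointer terminates the toy list
        pid = read_u64(memory, address + fields["pid"])
        name = read_name(memory, address + fields["name"], fields["name_len"])
        processes.append((pid, name))
        address = read_u64(memory, address + fields["next"])
    return processes


def make_record(pid, next_address, name):
    """Pack one 32-byte toy record: pid, next pointer, 16-byte name."""
    return struct.pack("<QQ16s", pid, next_address, name)


# Build a 256-byte raw "memory dump" holding two records and the list head.
dump = bytearray(256)
dump[0x80:0xA0] = make_record(4, 0xC0, b"System")
dump[0xC0:0xE0] = make_record(1337, 0, b"sample.exe")
struct.pack_into("<Q", dump, 0x40, 0x80)

processes = list_processes(bytes(dump), PROFILE)
```

Without the profile, the dump is just 256 opaque bytes; with it, the same bytes yield a process list, which is the essence of bridging the semantic gap.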

Nitro and building an SDK

Starting as a master’s thesis, the Nitro project aimed to intercept system calls on top of KVM [17]. By leveraging the x86 architecture mechanisms involved in implementing fast system calls, the operating system can be forced to yield control back to the hypervisor at the desired moment. The result is the ability to intercept any given system call at the beginning and end of its execution in kernel space. The project went unmaintained for a few years, until it was folded into the KVM-VMI project [18].

The KVM-VMI project first aimed to provide a semantic translation layer, which was missing from the original project. This contribution identifies the system call name and rebuilds the execution context by examining the memory state. To that end, Nitro incorporates two projects well known in the VMI community. First, Rekall, briefly presented in the previous section, provides the identification of well-known data structures in memory, such as the list of EPROCESS structures maintained by the Windows kernel, together with the offsets needed to read their internal fields. Then, LibVMI provides advanced read/write access to the memory, effectively extracting the data from which the context is rebuilt. Furthermore, a system call argument mapping combined with a dynamic hooking system allows users to easily react to events and to extend Nitro with their own VMI application logic.

Finally, the KVM-VMI project now aims to integrate standardised VMI capabilities into the QEMU/KVM virtualisation solution, providing a VMI interface similar to that of DRAKVUF.
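The combination of semantic translation, argument mapping and dynamic hooking can be sketched as follows. This is not Nitro's API: the `SyscallDispatcher` class, the decorator-based registration, the system call numbers and the truncated argument lists are assumptions chosen only to show the shape of such a layer.

```python
from collections import defaultdict

# Hypothetical tables; real numbers and signatures depend on the OS build,
# and the argument lists below are deliberately truncated.
SYSCALL_NAMES = {0x33: "NtOpenFile", 0x0F: "NtClose"}
SYSCALL_ARGS = {
    "NtOpenFile": ["FileHandle", "DesiredAccess", "ObjectAttributes"],
    "NtClose": ["Handle"],
}


class SyscallDispatcher:
    """Translates raw syscall events and routes them to registered hooks."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def hook(self, name, direction):
        """Decorator registering a callback for a syscall entry or exit."""
        def register(callback):
            self._hooks[(name, direction)].append(callback)
            return callback
        return register

    def on_raw_event(self, direction, number, raw_args):
        # Semantic translation: number -> name, positional values -> names.
        name = SYSCALL_NAMES.get(number)
        if name is None:
            return
        args = dict(zip(SYSCALL_ARGS[name], raw_args))
        for callback in self._hooks[(name, direction)]:
            callback(args)


dispatcher = SyscallDispatcher()
opened = []


@dispatcher.hook("NtOpenFile", "enter")
def log_open(args):
    # User-defined logic sees named arguments instead of raw register values.
    opened.append(args["DesiredAccess"])


dispatcher.on_raw_event("enter", 0x33, [0x0, 0x80100080, 0x7FFE0000])
dispatcher.on_raw_event("exit", 0x33, [0x0, 0x80100080, 0x7FFE0000])
dispatcher.on_raw_event("enter", 0x0F, [0x44])
```

Only the entry of the hooked call reaches `log_open`; the exit event and the unrelated system call are translated but have no subscribers, so they are dropped.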

Conclusion

Sandboxing solutions are expensive in terms of development and maintenance, yet their value for IT security has been proven over several years. Our belief is that new technologies, such as VMI, paired with effective design patterns, can significantly reduce this cost, allowing security analysts and researchers to focus on the real value delivered by such technology. By combining the VMI features provided by the KVM-VMI solution with an event-driven framework (like SEE), a modern and flexible platform can be designed. The platform would allow analysts to quickly add forensic capabilities while maintaining fine-grained control over the execution flow. This previously required specific knowledge of operating system platforms and virtualisation technologies. These details are now abstracted away from users, allowing them to focus on extracting the actual behavioural features necessary to drive their investigations.

About the authors

Mathieu Tarral is a software engineer working for F-Secure Labs’ Advanced Protection Solutions team, where he focuses on the development of behavioural analysis platforms. His areas of expertise are malware analysis and virtualisation.

Matteo Cafasso is a senior software engineer who also works for F-Secure Labs’ Advanced Protection Solutions team. He focuses on the design and development of automated malware analysis systems. His areas of expertise are distributed programming in cloud environments and the development of secured environments for handling hazardous software.

Figure 3: NtOpenFile system call hooking using Nitro.

References

1. Sikorski, M; Honig, A. ‘Practical Malware Analysis’. No Starch Press, February 2012, pp.40-41.
2. Mani Chandy, K. ‘Event-Driven Applications: Costs, Benefits and Design Approaches’. California Institute of Technology, 2006.
3. Michelson, B. ‘Event-Driven Architecture Overview’. Patricia Seybold Group, 2 February 2006. DOI 10.1571/bda2-2-06cc.
4. Gamma, E; Helm, R; Johnson, R; Vlissides, J. ‘Observer’, in the book ‘Design Patterns’, edited by BW Kernighan. Addison-Wesley, 1994, pp.293-303.
5. Sandboxed Execution Environment, home page. Accessed Jan 2018. https://github.com/F-Secure/see.
6. Libvirt Virtualisation API, home page. Accessed Jan 2018. https://libvirt.org.
7. Garfinkel, T; Rosenblum, M. ‘A Virtual Machine Introspection Based Architecture for Intrusion Detection’. In proceedings, Network and Distributed Systems Security Symposium, 2003. Accessed Jan 2018. https://suif.stanford.edu/papers/vmi-ndss03.pdf.
8. Virtual Machine Introspection, home page. Accessed Jan 2018. https://wiki.xenproject.org/wiki/Virtual_Machine_Introspection.
9. ‘Xen Project Virtualisation Updated with Improved VMI and Security’. Linux Foundation, 13 Oct 2015. Accessed Jan 2018. www.xenproject.org/users/why-the-xen-project/about/in-the-news/194-xen-project-4-6-virtualisation-updated-with-improved-vmi-and-security.html.
10. ‘[RFC PATCH v3 0/1] VM introspection’. KVM mailing list. Accessed Jan 2018. https://marc.info/?l=kvm&m=150514457912721&w=2.
11. Chen, R. ‘Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?’. Microsoft, 21 Sept 2011. Accessed Jan 2018. https://blogs.msdn.microsoft.com/oldnewthing/20110921-00/?p=9583.
12. ‘Second Level Address Translation’. Wikipedia. Accessed Jan 2018. https://en.wikipedia.org/wiki/Second_Level_Address_Translation.
13. Bhatia, N. ‘Performance Evaluation of Intel EPT Hardware Assist’. VMware, 2008. Accessed Jan 2018. https://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf.
14. Drakvuf, home page. Accessed Jan 2018. https://drakvuf.com.
15. Rekall, home page. Accessed Jan 2018. https://github.com/google/rekall.
16. LibVMI, home page. Accessed Jan 2018. http://libvmi.com.
17. Nitro, home page. Accessed Jan 2018. http://nitro.pfoh.net.
18. KVM-VMI, home page. Accessed Jan 2018. https://github.com/KVM-VMI/kvm-vmi.