Digital Investigation 14 (2015) 63-75
Digital Investigation journal homepage: www.elsevier.com/locate/diin
BrowStEx: A tool to aggregate browser storage artifacts for forensic analysis
Abner Mendoza (a), Avinash Kumar (b), David Midcap (b), Hyuk Cho (b), Cihan Varol (b, *)
(a) Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
(b) Department of Computer Science, Sam Houston State University, Huntsville, TX, USA
Article info
Article history: Received 7 November 2014; Received in revised form 25 July 2015; Accepted 1 August 2015; Available online xxx
Abstract
Web storage (or browser storage), a new client-side data storage feature, was recommended as part of the HTML5 specifications and is now widely adopted by major web browser vendors. Because it lets web applications keep larger volumes of persistent data on the client with native browser support, web storage has substantially changed the paradigm of web application development. Web storage is therefore poised to become an area of particular interest for forensic investigators, since client-side web browser artifacts may contain critical information. However, the literature on web browser forensics has traditionally focused on browsing history, browser cache, and cookie files (Oh et al., 2011). In this paper, we first discuss the prevalence of web storage in widely used websites. We then compare and contrast the web storage implementations in the five major web browsers: Google Chrome, Internet Explorer, Mozilla Firefox, Opera, and Apple's Safari. Finally, to provide more insight into web storage and to enable unified forensic analysis, we describe the implementation of a proof-of-concept tool, BrowStEx (Browser Storage Extractor). The commonalities, differences, and proof-of-concept tool discussed in this paper can be useful in developing advanced forensic tools that extract browser storage artifacts. © 2015 Elsevier Ltd. All rights reserved.
Keywords: HTML5; Local storage; Persistent storage; Web browser forensics; Web storage
* Corresponding author. E-mail address: [email protected] (C. Varol).
http://dx.doi.org/10.1016/j.diin.2015.08.001
1742-2876/© 2015 Elsevier Ltd.

Introduction
With the increasing reliance on the web to sustain a modern lifestyle, a wealth of information can be harvested by analyzing a user's web browsing activities. Indeed, the bulk of a user's interactions with end-user workstations today is related to Internet communications (Altheide and Carvey, 2011). Accordingly, web browser forensics is fast becoming an important investigation topic within the computer forensics field. As web technologies evolve and generate explosive amounts of data, the methods and tools for extracting useful information need to handle the volume, velocity, and variety of that data. Computing devices are becoming more cloud-based and are expected to work seamlessly with and without an Internet connection; web developers have therefore adapted to this paradigm shift by using HTML5 (HyperText Markup Language 5) APIs (Application Programming Interfaces) that support persistent data storage even when devices are offline. The most popular of these persistent data storage mechanisms is a new HTML5 API known as Web storage, sometimes also referred to as Browser Storage, HTML5 Storage, Local Storage, Offline Storage, or DOM (Document Object Model) Storage (Laine, 2012). For consistency, we refer to these collectively as Web storage in this paper. Web storage is a browser-based API bundled in
HTML5 specifications; it allows persistent client-side storage for web applications. As major web browser vendors adopted this new storage mechanism, developers realized the benefits of being able to store persistent data on the client side. This mechanism is appealing not only because it is natively supported by browsers, but also because it offers much larger storage capacity than older mechanisms such as cookies. Additionally, unlike cookies, web storage data is not transmitted with each request to and from a web server, which reduces bandwidth overhead.
Despite the increasing use of web storage among today's popular websites, to the best of our knowledge there is almost no detailed discussion of this technology in the browser forensics literature. This paper therefore discusses the prevalence of web storage in current major web browsers, emphasizes the high potential of finding information in client-side web storage data generated by multiple web browsers, and offers guidance for developing web storage forensic tools. We first compare client storage mechanisms and describe the characteristics of HTML5 web storage. We then provide evidence of how widely web storage has been adopted by popular websites. In particular, we compare and contrast the details and nuances of web storage implementation among the five major web browsers: Chrome, Internet Explorer, Firefox, Opera, and Safari. Furthermore, to provide more insight and guidance for developing web storage forensic tools, we describe a proof-of-concept tool, BrowStEx (Browser Storage Extractor), which extracts data artifacts from web storage files and presents them in an aggregated view for unified analysis.
The remainder of the paper is organized as follows. In Section Prior work, we discuss related work on web storage.
In Sections Persistent storage and HTML5 web storage, we summarize persistent storage and the details of HTML5 client-side storage in different browsers. Our findings on web storage implementation and stored data are discussed in Section Web storage implementation and utilization. In Section BrowStEx: design and implementation, we introduce the BrowStEx application. Finally, in Section 7 we conclude the paper with a summary and a brief discussion of future research directions.

Prior work
Other than the W3C (World Wide Web Consortium) draft specification (W3C, 2013), browser vendors have not disclosed many details about how web storage is implemented in their respective browsers. Moreover, there appears to be no comprehensive research that compares and contrasts how web storage is implemented and incorporated in different browsers. Prior research on HTML5 web storage has mostly focused on the ways in which web storage may be used, such as the variety of stored data, and specifically on the surrounding security and privacy issues. West and Pulimood (2012) provided an analysis of the web storage specification and its usage in the context of
privacy and security issues. The authors implemented a simple budget management web application that uses web storage to store data locally on the client machine and periodically synchronizes the data with the server. Their experimental study provides insight into how improper use of web storage could expose sensitive data to other users of the same web application on the same system, such as users in an Internet cafe. The study exemplifies a potential breach of personal data and also hints at an opportunity for digital forensics. For example, if a website records search queries, browsing history, and other user information in web storage, that data could provide invaluable information in a digital investigation, especially if the user has attempted to hide his or her browsing activities by clearing the history with the browser's built-in history deletion feature. In such a case, the web storage contents would be a better target for investigation.
Silo (Mickens, 2010) leveraged web storage to cache JavaScript and CSS (Cascading Style Sheets) chunks on the client side, improving the performance of web applications by reducing the data transmitted between the browser and the remote web server. Bogaard et al. (2011) explored how malicious web developers could exploit web storage as a covert channel to distribute sensitive information across the Internet and retrieve it later. The premise of this technique is that a malicious user who does not want to store information on their own machine or on a server can instead store it on unsuspecting client machines, leaving no evidence of the data on their own machine. The authors built a web application that broke a file into 26 portions and distributed each portion to a large number of Internet clients visiting the web application.
The goal of the experiment was to show that the file could eventually be reconstructed and to determine how many copies of each portion needed to be distributed to increase the probability that the file could be restored on subsequent visits. The authors exploited two facts: web storage usage by a given web application is often invisible to the user, and the process for manually removing web storage data is not intuitive. They also highlighted that almost any type of data can be stored in web storage simply by converting it to a text-based representation that can later be converted back to its native format. This makes it difficult to prevent a malware distributor from using web storage to hold a malware payload, or code that could be injected into the browser to download malware. Similarly, a nefarious developer could easily use a binary-to-text encoding tool to store binary files on a client machine. Lekies and Johns (2012) showed how malicious users might use web storage to inject JavaScript payloads into web applications. The authors provided an excellent overview of three attack scenarios, which shed light on the vulnerabilities associated with web storage. They also investigated the top 500,000 domains, as ranked by Alexa (2013), and showed that web storage was already widely implemented, especially among the most popular websites. These previous studies show the versatility of web storage. Current trends indicate that web storage is mostly
used as a means of preserving state in offline situations and for caching and performance purposes (Mickens, 2010; Hanna et al., 2010). Only a few research articles discuss how web storage technology is being used; however, web storage adoption is evidently increasing, as discussed in Lekies and Johns (2012). Digital investigators must therefore understand this technology and recognize the importance of analyzing data stored by this mechanism on the host system. Despite this, leading browser forensic tools such as NetAnalysis (Digital Detective Group Ltd, 2014) and Web Historian (Mandiant Corporation, 2014) do not yet support the analysis of web storage data.

Persistent storage
As the web has evolved, so has the need for a native method of storing data on the client that persists across browser and operating system restarts. Web storage is a recently developed technology that fills this gap (Anthes, 2012). An important feature of web storage is that its data is not transmitted to the server with each HTTP request. This feature improves performance, but it may also introduce security concerns, such as the ability to modify locally stored content and use it to manipulate the application on the client side while effectively bypassing any integrity checks on the server (Bogaard et al., 2011). Other security issues with web storage applications are discussed in the literature (West and Pulimood, 2012; Lekies and Johns, 2012; Bogaard et al., 2011; Mickens, 2010; Hanna et al., 2010). Several client-side data storage mechanisms have been developed over the years (Microsoft, 2014; Pilgrim, 2010), but none was widely accepted and consistently adopted by browser vendors. This is mostly due to the infamous browser wars and the reluctance of browser vendors to agree on any single protocol (e.g., HTML5) (Pilgrim, 2010).
Web applications have not traditionally enjoyed the luxury of persistent and abundant client-side data storage. Early on, cookies were developed to compensate for the stateless nature of HTTP (Hypertext Transfer Protocol), and web developers began using them as a means of persistent storage (Pilgrim, 2010). Table 1 compares five widely used client storage mechanisms. The mechanisms and products developed over the years for local persistent data storage were either browser-specific or dependent on browser plugins. For example, Flash
Table 1
Comparison of widely-used client storage mechanisms.
Mechanism | Plugin required | Persistent | Standards specs | Data size
Cookies | | ✓ | ✓ | 4 KB
Web storage | | ✓ | ✓ | >5 MB
Session storage | | | ✓ | >5 MB
Flash Local Shared Objects (LSO) | ✓ | ✓ | | 100 KB
IE UserData | | ✓ | | 64 KB
Cookies, also known as Flash Local Shared Objects (LSO) (Ayenson et al., 2011; Soltani et al., 2010), were widely used because the Adobe Flash plugin was available on a large number of browser installations. However, this dependency on the Flash plugin means that the decline of Flash also results in the decline of Flash LSO usage. An extreme case of persisting data on the client side is the evercookie (Kamkar, 2010), which leverages the fact that browsers offer various mechanisms for storing data locally and attempts to use all of them. Evercookie abstracts this multitude of mechanisms behind a single JavaScript API that a website developer can use to write to every storage mechanism available in the client browser, ensuring that data persists across restarts. Until recently, there was no consensus among browser vendors on how persistent storage should be implemented. However, the development of the HTML5 standard has changed that, and web storage is now implemented in recent versions of all major browsers on both desktop and mobile platforms (CanIUse.com, 2015). Web storage has an advantage over previous methods specifically because it is native to the browser and therefore does not depend on any plugin installation.

HTML5 web storage
HTML5 Web Storage comprises two browser-based APIs (localStorage and sessionStorage) bundled in the specifications that allow persistent client-side storage for web applications (W3C, 2013; Anthes, 2012). These native browser APIs allow websites to store domain-specific key/value pairs of data. The two APIs differ mainly in that localStorage data persists across browser sessions, while sessionStorage data is discarded when the browser session ends. Although we compare local and session storage in Section Local storage vs. session storage, in this paper we focus primarily on localStorage, since our work and previous work (Lekies and Johns, 2012) show that the localStorage API is the one most commonly used by leading websites. Therefore, when we refer to web storage, we specifically mean the localStorage API.
While it is tempting to consider web storage a replacement for cookies, the two technologies differ significantly both in how they are implemented and in how they can potentially be used. These differences are noted in Table 1. The first notable difference is the amount of data that can be stored. While cookies allow a small amount of data (4 kilobytes), web storage allows 5 megabytes or more of storage space per domain (West and Pulimood, 2012). Note that the specification does not mandate a particular amount of storage space for web storage, but implementations thus far have converged on a default of 5 megabytes per domain. From our analysis of different browsers, we find that most browsers follow this general guideline, with some exceptions. For example, Crowley (2010) notes that Internet Explorer provides 10 megabytes of web storage by default. Another advantage of web storage over cookies is the ability to track a user's progress locally. Specifically, if one closes the web browser while playing an
online game, the progress may not be saved locally when only cookies are used (Curran and George, 2012). HTML5 web storage, however, allows game events and progress to be stored locally, so the game can remember where the user left off. As a result, the user's previous moves and positions in the game can be traced and retrieved.

Table 2
Adoption of web storage API in Top 15 US Websites (a).
Rank | Website | Dec. '12 | Jul. '13 | Oct. '14
1 | Facebook | ✗ | ✓ | ✓
2 | Google | ✓ | ✓ | ✓
3 | Youtube | ✓ | ✓ | ✓
4 | Yahoo! | ✓ | ✓ | ✓
5 | Amazon | ✗ | ✓ | ✓
6 | eBay | ✗ | ✓ | ✓
7 | Wikipedia | ✗ | ✓ | ✓
8 | Craigslist | ✗ | ✓ | ✓
9 | Twitter | ✓ | ✓ | ✓
10 | Live | ✓ | ✓ | ✓
11 | LinkedIn | ✗ | ✗ | ✓
12 | Go | ✗ | ✓ | ✓
13 | Blogspot | ✗ | ✗ | ✓
14 | Bing | ✗ | ✓ | ✓
15 | Pinterest | ✗ | ✗ | ✓
(a) Ranking based on statistics as of July 2013, from Alexa (2013).

Adoption of web storage API
Lekies and Johns (2012) showed that web storage had been widely adopted in the most popular web applications. In our research, we looked more closely at the desktop versions of the top 15 US websites (Table 2), as ranked by Alexa, in December 2012, July 2013, and October 2014. In December 2012, we found that five websites had implemented web storage in their desktop-optimized versions. This was confirmed by manual examination of each site: each site was loaded into a desktop browser, and common user actions such as scrolling and clicking were performed to invoke the site's dynamic functionality that could trigger the use of the web storage API. The web storage contents were then examined to determine whether the site used web storage on its landing page. In July 2013, repeating the same procedure, we found that twelve desktop-optimized sites employed web storage. A final examination in October 2014 showed that all 15 listed websites were using web storage. Thus, in just under two years, the number of top 15 US websites using web storage rose from five to all fifteen. This study therefore aims to raise awareness of the widespread adoption of this technology and to provide a way to extract useful information from web storage artifacts.

Browser implementation
We further investigated the use of web storage in the top five browsers (Internet Explorer, Chrome, Firefox, Opera, and Safari) on three operating systems: Windows (7 and 8.1), Mac OS X Yosemite 10.10.2, and Ubuntu 14.04. Collectively, these five browsers held 96.2% of the web browser market share in the first quarter of 2015, according to statistics compiled by StatCounter (2015). As shown in Table 3, local data is stored identically across these operating systems, except that Internet Explorer is available on neither Mac OS X nor Ubuntu, and Safari is not available on Ubuntu. Table 4 summarizes the paths of web storage files on each operating system. For each browser, we identified the storage location, file type, and format used to store the data on the local hard drive. For example, on Windows we used the Process Monitor tool from Microsoft SysInternals (2015) to identify where on disk each browser writes its data files as the web storage API is invoked. Note that these files may be located under several folders on a disk image, depending on user-specific folder structures. We therefore specify the locations using the Windows “%userprofile%” environment variable, which references the folder on disk where user-specific data is stored. Consequently, there may be multiple locations on disk, depending on the number of user accounts on the target computer. Additionally, all browsers separate client data storage by website origin, a fundamental concept that web browsers use to uniquely identify and separate website code for security policy enforcement. An origin is defined as a triple consisting of protocol, domain, and port (Jackson and Barth, 2008).
If no port is specified in the Uniform Resource Identifier (URI), the default port for the protocol is used (80 for HTTP and 443 for HTTPS). For example, if a user browses http://www.google.com, the browser uses the triple (http, www.google.com, 80) as the web origin for enforcing privilege separation. In each browser implementation, a unique web storage space is created for each web origin. This can take the form of a separate file on disk, or the origin can serve as the primary key in a database schema. If a website is
Table 3
Comparison of web storage implementation for different browsers on different operating systems.
Browser | Standard naming schema | File extension | File type | Number of files
Firefox | webappsstore | .sqlite | SQLite | 1
Chrome | [Protocol]_[Domain]_0 | .localstorage, .localstorage-journal | SQLite | 2 per origin
Opera | [Protocol]_[Domain]_0 | .localstorage, .localstorage-journal | SQLite | 2 per origin
Safari (a) | [Protocol]_[Domain]_0 | .localstorage | SQLite | 1 per origin
Internet Explorer (a) | [Domain][1] | .xml | XML | 1 per origin
(a) Internet Explorer is available on neither Mac OS X (discontinued since 2003) nor Ubuntu. Safari is not available on Ubuntu.
Table 4
File system location of web storage files for different browsers across different operating systems.
Operating system | Web browser | Location of web storage (a)
Windows 7 & 8.1 | Chrome | “%userprofile%\AppData\Local\Google\Chrome\User Data\Default\Local Storage\”
Windows 7 & 8.1 | Internet Explorer | “%userprofile%\AppData\LocalLow\Microsoft\Internet Explorer\DOMStore\”
Windows 7 & 8.1 | Firefox | “%userprofile%\AppData\Roaming\Mozilla\Firefox\Profiles\”
Windows 7 & 8.1 | Opera | “%userprofile%\AppData\Roaming\Opera Software\Opera Stable\Local Storage\”
Windows 7 & 8.1 | Safari | “%userprofile%\AppData\Local\Apple Computer\Safari\Local Storage\”
Mac OS X 10.10.2 | Chrome | “%userprofile%/Library/Application Support/Google/Chrome/Default/Local Storage/”
Mac OS X 10.10.2 | Internet Explorer | N/A
Mac OS X 10.10.2 | Firefox | “%userprofile%/Library/Application Support/Firefox/Profiles/”
Mac OS X 10.10.2 | Opera | “%userprofile%/Library/Application Support/com.operasoftware.Opera/Local Storage/”
Mac OS X 10.10.2 | Safari | “%userprofile%/Library/Safari/LocalStorage/”
Ubuntu 14.04 | Chrome | “/home/ubuntu/.config/google-chrome/Default/Local Storage/”
Ubuntu 14.04 | Internet Explorer | N/A
Ubuntu 14.04 | Firefox | “/home/ubuntu/.mozilla/firefox/[...].default/”
Ubuntu 14.04 | Opera | “/home/ubuntu/.config/opera/Local Storage/”
Ubuntu 14.04 | Safari | N/A
(a) %userprofile% is an environment variable that refers to the user-specific settings folder; the corresponding file path on disk differs for each user on the system.
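Because Table 4 expresses each location relative to a user's profile folder, an examiner working on a disk image typically has to expand those locations once per user account found on the image. The Python sketch below does this for the Windows, Mac OS X, and Ubuntu entries of Table 4; the dictionary and function names are hypothetical, the list of home folders is assumed to come from the examiner's own enumeration of the image, and the Firefox entries deliberately stop at the profiles folder because the profile subfolder name varies.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical lookup table: web storage folders from Table 4, relative to a
# user's profile folder (the folder %userprofile% or /home/<user> points to).
# Browsers marked N/A in Table 4 are simply absent.
RELATIVE_LOCATIONS = {
    "windows": {
        "Chrome": r"AppData\Local\Google\Chrome\User Data\Default\Local Storage",
        "Internet Explorer": r"AppData\LocalLow\Microsoft\Internet Explorer\DOMStore",
        "Firefox": r"AppData\Roaming\Mozilla\Firefox\Profiles",
        "Opera": r"AppData\Roaming\Opera Software\Opera Stable\Local Storage",
        "Safari": r"AppData\Local\Apple Computer\Safari\Local Storage",
    },
    "macosx": {
        "Chrome": "Library/Application Support/Google/Chrome/Default/Local Storage",
        "Firefox": "Library/Application Support/Firefox/Profiles",
        "Opera": "Library/Application Support/com.operasoftware.Opera/Local Storage",
        "Safari": "Library/Safari/LocalStorage",
    },
    "ubuntu": {
        "Chrome": ".config/google-chrome/Default/Local Storage",
        "Firefox": ".mozilla/firefox",
        "Opera": ".config/opera/Local Storage",
    },
}


def candidate_storage_dirs(user_homes, os_name):
    """Return (home, browser, folder) for every user folder and browser.

    user_homes: profile folders found on the image, one per user account.
    os_name: "windows", "macosx", or "ubuntu".
    """
    # Windows paths need Windows separators; the other systems use POSIX paths.
    path_type = PureWindowsPath if os_name == "windows" else PurePosixPath
    results = []
    for home in user_homes:
        for browser, relative in RELATIVE_LOCATIONS[os_name].items():
            results.append((home, browser, str(path_type(home) / relative)))
    return results
```

For a Windows image with two accounts, the function yields ten candidate folders (five browsers for each user), matching the observation that one browser can leave web storage in several user-specific locations.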
composed of content from multiple origins, each origin likewise has a separate data storage space. Since the web storage scheme and the stored information do not vary across operating systems for a given browser, we limit our discussion to web storage observations on Windows 8.1; our main observations apply generally to all three operating systems.

Chrome
Chrome has supported web storage since version 22.0 (CanIUse.com, 2015); in our experiment, we used version 23. Chrome stores most of its data artifacts in SQLite database files (SQLite.org, 2013), and web storage is no exception. On Windows 8.1, web storage data is stored in the “%userprofile%\AppData\Local\Google\Chrome\User Data\Default\Local Storage\” folder (see Table 4 for the other operating systems). Chrome stores one file per origin, named [protocol]_[domain]_0.localstorage, where [protocol] is the protocol used (http or https) and [domain] is the domain of the website. For example, a file containing data related to http://m.facebook.com would be named http_m.facebook.com_0.localstorage. This file can be opened with a SQLite database browser to inspect its contents. Within the SQLite database file, Chrome uses a simple structure with two fields, “key” and “value”, which hold the identifier specified by the web application code and the value stored for that identifier, respectively. An arbitrarily large number of key/value pairs can be stored for any domain, bounded only by the size limit the browser enforces for the entire domain.

Internet Explorer
Internet Explorer has supported web storage since version 8 (CanIUse.com, 2015); we conducted our experiments using version 10. Browsing artifacts for Internet Explorer have traditionally been stored in binary files on disk, requiring more steps to extract the data (Jones, 2003).
Web storage data, however, is stored in plain-text XML files, which makes it easier to analyze with various tools. On Windows 8.1, Internet Explorer stores web storage data files under the “%userprofile%\AppData\LocalLow\Microsoft\Internet Explorer\DOMStore\” folder (see Table 4 for the other operating systems). A folder with a random name is created for each session, and the XML files are stored in these folders. The file naming convention is domain[1].xml, where domain is the domain name of the website that stores the data. The structure of the XML files is as follows:
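Because these files are plain XML, a few lines of Python are enough to pull out the stored pairs. The minimal sketch below assumes only what is stated about the format (each stored item carries a name attribute holding the key and a value attribute holding the data), so it scans every element for that attribute pair instead of relying on particular tag names. It also recovers the domain from the domain[1].xml naming convention, assuming the bracketed counter may be any number. The function names are hypothetical.

```python
import os
import re
import xml.etree.ElementTree as ET

# domain[1].xml naming convention; we assume the bracketed counter may vary.
IE_NAME = re.compile(r"^(?P<domain>.+)\[\d+\]\.xml$")


def domain_from_filename(path):
    """Return the website domain encoded in a DOMStore file name, or None."""
    match = IE_NAME.match(os.path.basename(path))
    return match.group("domain") if match else None


def read_ie_domstore(path):
    """Return (key, value) pairs from one Internet Explorer DOMStore XML file.

    Assumption: every stored item is an element with both a "name" attribute
    (the key) and a "value" attribute (the stored data); tag names are ignored.
    """
    root = ET.parse(path).getroot()
    return [
        (elem.get("name"), elem.get("value"))
        for elem in root.iter()  # walks the root and all descendants in document order
        if "name" in elem.attrib and "value" in elem.attrib
    ]
```

Scanning by attribute rather than by tag keeps the sketch usable even if the surrounding element names differ between Internet Explorer versions.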
The name attribute contains the “id” of the storage item, and the “value” attribute contains the data stored for that id. Internet Explorer 10 allows up to 10 megabytes of storage for each domain (Microsoft MSDN Library, 2012; Crowley, 2010).

Firefox
Mozilla Firefox has supported web storage since version 15 (CanIUse.com, 2015). For our experiment, we used version 16 on Windows 8.1. Firefox stores all web storage data in a single SQLite database named webappsstore.sqlite, located in the user's profile folder under “%userprofile%\AppData\Roaming\Mozilla\Firefox\Profiles\” (see Table 4 for the other operating systems). Unlike Chrome, which also uses SQLite files, Firefox uses a single file for all web storage data. Each record in the database includes an additional field, scope, which specifies the domain name of the website that owns the data as well as the protocol (such as HTTP or HTTPS) and the port (usually 80). This makes it easier for an investigator to collect the data, since everything is located in one file.

Opera
Opera has included web storage since version 24 (CanIUse.com, 2015), which is the version we used in our experiments. Opera stores its web storage data as SQLite
files in the “%userprofile%\AppData\Roaming\Opera Software\Opera Stable\Local Storage\” folder (see Table 4 for the other operating systems). Opera stores the web storage for each domain in a separate file, using a naming convention similar to Chrome's (see Section 4.2.1). Opera's SQLite files for web storage contain a single table, ItemTable, made up of two fields, “key” and “value”, where key is the id used by the web application for that data and value is the actual data stored.

Safari
Safari has supported web storage since version 4 (CanIUse.com, 2015). For our experiment, we used version 5 on Windows 8.1. Safari stores its web storage data as SQLite files in the “%userprofile%\AppData\Local\Apple Computer\Safari\Local Storage\” folder (see Table 4 for the other operating systems). Safari uses a separate file for each origin, employing a naming convention similar to Chrome's (see Section Chrome). The SQLite database file for each origin contains a single table, ItemTable, with two fields, “key” and “value”, where key is the id of the web application data and value is the actual data associated with that key.

Web storage implementation and utilization
The World Wide Web Consortium (W3C) developed the web storage specification and proposed the following recommendations for implementation and usage (W3C, 2013):
* Persistent storage can be implemented for each website.
* The preferred space limit is 5 MB for each website.
* Data should be stored in key/value pairs.
* No particular specification is set for the data type.
* User agents must have a set of web storage areas, one for each origin.
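The per-origin SQLite layout described for Chrome, Opera, and Safari, together with Firefox's single shared database, can be read with Python's built-in sqlite3 module. The sketch below is illustrative rather than definitive and rests on assumptions beyond the text: that the trailing _0 in a .localstorage file name is a port field in which 0 stands for the protocol's default port; that Chrome's files use the same ItemTable layout reported for Opera and Safari; that values stored as BLOBs are UTF-16LE text; and that Firefox keeps its rows in a table named webappsstore2 whose scope column holds the reversed host name followed by protocol and port (e.g. moc.elpmaxe.www.:http:80), as in older Firefox releases. All function names are hypothetical. Every reader returns rows tagged with a (protocol, domain, port) origin triple, so HTTP and HTTPS data for the same site stay distinct, mirroring the one-storage-area-per-origin rule above.

```python
import os
import re
import sqlite3
from contextlib import closing
from pathlib import Path

DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumed file name layout: [protocol]_[domain]_[port].localstorage
WEBKIT_NAME = re.compile(
    r"^(?P<proto>[a-z][a-z0-9+.-]*)_(?P<host>.+)_(?P<port>\d+)\.localstorage$")


def open_read_only(path):
    """Open an SQLite evidence file in read-only mode so it is not modified."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def origin_from_webkit_filename(path):
    """Map e.g. http_m.facebook.com_0.localstorage to ('http', 'm.facebook.com', 80)."""
    match = WEBKIT_NAME.match(os.path.basename(path))
    if match is None:
        return None  # e.g. the companion .localstorage-journal file
    proto, host = match.group("proto"), match.group("host")
    port = int(match.group("port"))
    if port == 0:  # assumed to mean "default port for this protocol"
        port = DEFAULT_PORTS.get(proto, 0)
    return (proto, host, port)


def as_text(value):
    """Decode BLOB values, assumed to be UTF-16LE; leave TEXT values unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-16-le", errors="replace")
    return value


def read_webkit_localstorage(path):
    """Return (origin, key, value) rows from one Chrome/Opera/Safari file."""
    origin = origin_from_webkit_filename(path)
    with closing(open_read_only(path)) as conn:
        rows = conn.execute("SELECT key, value FROM ItemTable ORDER BY rowid")
        return [(origin, key, as_text(value)) for key, value in rows]


def origin_from_firefox_scope(scope):
    """Map an older-style Firefox scope such as 'moc.elpmaxe.www.:http:80'."""
    reversed_host, proto, port = scope.rsplit(":", 2)
    return (proto, reversed_host[::-1].lstrip("."), int(port))


def read_firefox_webappsstore(path):
    """Return (origin, key, value) rows from the shared webappsstore.sqlite file."""
    with closing(open_read_only(path)) as conn:
        rows = conn.execute(
            "SELECT scope, key, value FROM webappsstore2 ORDER BY rowid")
        return [(origin_from_firefox_scope(s), k, as_text(v)) for s, k, v in rows]
```

Because both readers produce rows of the same shape, an aggregated, browser-independent view can be built by concatenating their outputs and sorting by origin.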
Differences and similarities among browsers
There are important differences, as well as similarities, in how different browsers implement web storage. Chrome, Safari, and Opera are all built on variants of the WebKit rendering engine and thus have similar implementations (Garsiel and Irish, 2011). Internet Explorer uses a proprietary rendering engine, and its implementation differs most from the others. The only commonality across all five browsers is the JavaScript API used to access web storage, which enables browser-independent use of web storage in web applications.

File types and data variety
Among the top five web browsers, Internet Explorer uses the XML file format, and the other four browsers use the SQLite format for web storage. Differences between SQLite and XML storage are summarized in Table 5. Chrome, Opera, and Safari store a user's browsing activities in local (web) storage in a similar manner, using one
Table 5
SQLite vs. XML.
Criteria | SQLite | XML
Speed | Faster for larger datasets | Slower for larger datasets
Access | Requires an SQL query to access the data | Can be accessed directly with any text editor
Search | Data can be searched easily and quickly | Difficult to search data in large datasets
SQLite file for each web origin; the only difference is the location of the folders on disk. Although Firefox uses the same file type, it generates only one SQLite file for all visited websites. Besides the key/value pair, it records additional attributes, such as scope and origin, which contain the domain name along with the port and protocol used for the accessed website. Internet Explorer also stores the information in key/value format, but it uses an XML schema instead of SQLite. During a website visit, custom JavaScript code embedded in the website uses the web storage API to create multiple key/value pairs for different operations. When a particular website is visited multiple times, some key/value pairs are not modified; this logic is dictated by the website's JavaScript code. In some instances, the web storage keeps the original key, but the corresponding value is overwritten on each subsequent reload of the same website. The presence of some key/value data depends on the activities, or sequences of activities, and the events triggered by user actions on a website. For example, if a user clicks on an ad, a JavaScript event may be triggered to record details of the click to web storage; this key/value pair would not be present if the user did not perform that action. In all cases, the database table stores a unique key for each saved data artifact, and keys are not duplicated for any given origin. As a result, web storage data contains only the latest value for each key. We observed that some websites, such as LinkedIn, Facebook, Amazon, and eBay, stored data to web storage only if the user logged into the website. Conversely, other websites accumulated web storage data as soon as the user visited the site. Also, the web storage data for some websites, such as yahoo.com and youtube.com, does not change much regardless of the user's login state.
In many cases, the only change is to a timestamp, which is overwritten for most of the keys. Additionally, since browsers use the web origin concept to create separate files, some data could be duplicated if a user first accesses a website over HTTP, then switches to HTTPS and performs similar actions on the website. For example, when browsing Craigslist.com as a guest user, the browser creates a web storage file for the HTTP origin. When the user logs in to the website, a new web storage file is created for the HTTPS origin. The new file contains completely new key/value pairs and does not overwrite the previously created HTTP file. Another difference lies in the stored information itself: there are no common trends in the types of data stored in web storage by different websites. For example, CNN.com stores the user's location, IP address, and number of visits, while YouTube.com stores other technical details such as the
screen size selected by the user, the volume level, and the channel that was watched, as shown in Table 6. Additionally, some key/value pairs that are not recorded for an activity in one browser, such as “kxtech” in Opera when visiting CNN, can be found in web storage for the same action in another browser (Chrome). This is due to JavaScript code branches conditioned on the browser's user agent string. Accordingly, the information obtained from web storage is not always consistent: what is stored in web storage is determined not only by the browser's web storage implementation but also by the website developers.
Timelines
Timelines can be useful in digital forensics for reducing data volume as well as for identifying suspicious activities that occurred on a target computer. A suspect's web browsing activities stored in web storage can be collected and arranged in chronological order, so that a timeline of the websites visited and actions taken by the user can be traced and analyzed further. Oh et al. (2011) reported that the five web browsers use different time formats in their log files, such as cache and web history files. Specifically, Internet Explorer uses the FILETIME format, Firefox uses PRTime, Chrome uses WebKit time, Safari uses CF Absolute Time, and Opera stores information in UNIX time. In addition, they found that all five leading web browsers store times in Coordinated Universal Time (UTC); thus, the stored time stamps are not the user's local time. They therefore suggested that investigators apply a time zone correction based on the crime and investigation location in order to identify exactly when a certain activity occurred on a computer. We performed three specific tests to explore the nature of time stamps in localStorage.
First, we compared key/value pairs from YouTube.com and CNN.com across all five browsers. Regardless of the browser, time stamp values were recorded in UNIX format in milliseconds. This is consistent with our general observation that the management of localStorage data is determined by the website, not by the browser. Second, we examined time stamp values stored by Chrome across a multitude of websites. The vast majority of the time stamps found were stored in the same millisecond UNIX format. However, there were some exceptions. For example, the key/value pair found in the file http_android.stackexchange.com_0 included a UNIX time stamp in seconds, not milliseconds. From this, it can be inferred that localStorage time stamps are not restricted to one time format; the format most likely depends on the website's implementation (e.g., JavaScript's “Date.now()” function, which returns milliseconds since the UNIX epoch). Third, we tested whether the time stamps were taken from the server side or the client side. To do this, the “startdate” localStorage value was obtained twice: once with the current system clock time and once with the system clock set seven days later. The difference between the two time stamps corresponded exactly to the change made to the system clock, which shows that the time stamp is obtained on the client side.
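To merge localStorage time stamps with the browser log artifacts described by Oh et al. (2011) on a single timeline, each format can be converted to a time zone-aware UTC value. The Python sketch below is a minimal illustration, not BrowStEx code: the function names are hypothetical, the epochs are the standard ones for each format, and the seconds-versus-milliseconds threshold is an assumption introduced for this sketch (prompted by the android.stackexchange.com example), not a rule established by the tests above.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

UTC = timezone.utc
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=UTC)
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=UTC)  # used by FILETIME and WebKit time
CF_EPOCH = datetime(2001, 1, 1, tzinfo=UTC)       # used by CF Absolute Time


def from_filetime(v: int) -> datetime:
    """Internet Explorer: 100-nanosecond ticks since 1601-01-01 UTC."""
    return WINDOWS_EPOCH + timedelta(microseconds=v // 10)


def from_prtime(v: int) -> datetime:
    """Firefox: microseconds since 1970-01-01 UTC."""
    return UNIX_EPOCH + timedelta(microseconds=v)


def from_webkit(v: int) -> datetime:
    """Chrome: microseconds since 1601-01-01 UTC."""
    return WINDOWS_EPOCH + timedelta(microseconds=v)


def from_cf_absolute(v: float) -> datetime:
    """Safari: seconds since 2001-01-01 UTC."""
    return CF_EPOCH + timedelta(seconds=v)


# Assumed heuristic: 1e11 seconds lies in the year 5138, while 1e11 milliseconds
# is early 1973, so realistic values in either unit fall on the correct side.
MS_THRESHOLD = 10**11


def from_unix_auto(v: Union[int, float, str]) -> datetime:
    """Opera logs and localStorage values: UNIX time in seconds or milliseconds."""
    seconds = float(v)  # localStorage values are strings, so accept text as well
    if abs(seconds) >= MS_THRESHOLD:
        seconds /= 1000  # treat large values as milliseconds (e.g. Date.now())
    return UNIX_EPOCH + timedelta(seconds=seconds)


def to_local(dt: datetime, utc_offset_hours: float) -> datetime:
    """Apply the time zone correction suggested by Oh et al. (2011)."""
    return dt.astimezone(timezone(timedelta(hours=utc_offset_hours)))
```

With this normalization, the clock-shift test above becomes a simple check: two “startdate” readings taken seven days apart on the system clock differ by exactly timedelta(days=7) after conversion.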
In summary, unlike log files, which store data with a UTC time stamp (Oh et al., 2011), we observed that localStorage records web browsing activities with a time stamp taken from the client's system clock, in UNIX time format, regardless of web browser type. Accordingly, the time information extracted from localStorage for a specific web browsing activity reflects the client system's clock at the moment the action was taken. This makes arranging timelines of web browsing activities stored in localStorage easier than for other stored web data, such as history and log files. However, this becomes problematic if the computer's system clock has been manipulated. Furthermore, several limitations of web storage can negatively impact a digital investigation. For instance, if incognito mode is used during browsing, no web storage files are generated on the local hard drive, a finding also supported by Ohana and Shashidhar (2013). Therefore, even an image of the hard drive will not reveal any information related to web storage. Although traces of web storage information can be found in a memory dump, file carving is necessary to recover them from memory. Another major problem is the variety of key/value formats used for the same activity on different websites. For instance, CNN records the session start time in a dedicated “startdate” key/value pair, whereas on Yahoo the same information is embedded in another key/value pair. Nevertheless, web storage can contain valuable information, such as the user's IP address, the location of the computer used for web browsing, and computer and operating system profiles. Moreover, the user's actions on a web browser and the user's preferences can also be obtained from web storage.
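Because the same kind of value, such as a session start time, may be a top-level key on one site and embedded inside another key's value on another, a search has to look inside values as well as keys. The Python sketch below assumes the embedding site stores a JSON-encoded object; that is an assumption (other serializations would need their own parser), and the function names and example key names are hypothetical.

```python
import json
from typing import Any, Iterator


def find_key(entries: dict[str, str], target: str) -> Iterator[tuple[str, Any]]:
    """Yield (path, value) wherever `target` occurs in one origin's key/value pairs,
    either as a top-level key or nested inside a JSON-encoded value."""
    for key, raw in entries.items():
        if key == target:
            yield key, raw
        try:
            parsed = json.loads(raw)
        except (TypeError, ValueError):
            continue  # plain-text value: nothing nested to search
        yield from _walk(parsed, target, key)


def _walk(node: Any, target: str, path: str) -> Iterator[tuple[str, Any]]:
    """Depth-first search through decoded JSON objects and arrays."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == target:
                yield child, v
            yield from _walk(v, target, child)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from _walk(v, target, f"{path}[{i}]")
```

For example, given a hypothetical entry session_blob whose value is '{"user": {"startdate": 1420070400000}}', find_key yields the path session_blob.user.startdate together with the number 1420070400000, while a top-level “startdate” key is reported under its own name.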
Accordingly, comprehensive user web browsing profiles can be built by matching the IP address with browser configuration details, such as the user agent, plugins, and screen size, which can be vital for investigations involving web browsing. Therefore, associating the data collected from web storage with other stored web data can reveal valuable information that was not available before web storage was introduced.
Local storage vs. session storage
As stated earlier, HTML5 web storage consists of the localStorage and sessionStorage APIs, which differ in scope and lifetime. Session storage is designed to allow separate instances of the same web application to run in different windows or tabs without interfering with each other. For example, if a user is buying books on the same website in two different windows or tabs and only cookies are used for storage, the book currently being purchased in one window could “leak” into the other. This may result in the user buying the same book twice without noticing. This problem can be addressed by using session storage. Session storage is per-origin-per-window and is limited to the lifetime of the window (Sorensen, 2013). Even so, an investigator will likely be able to retrieve session storage information from a forensic image of a physical drive, unless it has been overwritten by other files. Other notable differences that we have observed are: 1) Session storage does not
Table 6 YouTube and CNN local storage samples.
Fig. 1. Schematic diagram of web browser forensics using BrowStEx (browsers: Internet Explorer, Google Chrome, Mozilla Firefox, Safari, Opera; web storage files: XML, SQLite; BrowStEx parser; case files; aggregated data).
have any official length limit; 2) When the same web page is opened in two different tabs, each tab has its own session storage, which is not the case for local storage, as discussed earlier; 3) Limiting data access to a single tab and deleting the data when the tab is closed makes session storage more secure than local storage; and 4) Session storage implementations differ among web browsers. While local storage is implemented consistently across the major browsers, Chrome stores session storage in the same way as its implementation of IndexedDB (Jakus et al., 2010). This consists mostly of plaintext files with names such as LOCK, LOG, and CURRENT, along with various .ldb and .txt files whose names are six-digit strings. Firefox, in contrast, holds session storage information in a JavaScript file. Session storage can also contain information valuable to an investigator, such as the number of times the user clicked a button or the contents of filled-in text fields, but few websites implement it. Out of the 15 tested websites, only six (Facebook, Google, YouTube, Amazon, Wikipedia, and Twitter) create data in session storage. While Yahoo, CNN, and others do not store any session storage data, Google stores valuable data such as search terms, Google's suggestions, and location information. However, as with local storage, if incognito mode is used, not even temporary files are generated for session storage.
BrowStEx: design and implementation
In this section, we discuss the motivation for building a tool to extract web storage data and describe the design and implementation of the proof-of-concept tool, BrowStEx, which remains open to further improvement in future work.
Motivation
As discussed, HTML5 web storage technology has been adopted at a rapid pace by most popular websites. Lekies and Johns (2012) found that web storage was used most commonly by the most popular websites on the Internet.
This fact alone makes web storage an important source of information that could be useful during the course of a digital investigation. For example, when searching for web storage artifacts from Chrome, an investigator may encounter hundreds of individual SQLite files and would have to open each one to examine its contents. As described in sections 4 and 5, web storage is implemented differently in each of the major browsers; however, no tools are currently available to aggregate web storage data from
multiple browsers. Additionally, popular web browser forensic tools, such as Web Historian (Mandiant Corporation, 2014), NetAnalysis (Digital Detective Group Ltd, 2014), and WEFA (Oh et al., 2011), do not currently include web storage as a source of data artifacts. As a result, digital investigators using such tools could overlook critical data artifacts in web storage.
Architecture and scope
The overall process of web browsing forensics using BrowStEx is illustrated in Fig. 1. Most web browsers store localStorage data in either XML or SQLite format. Therefore, both XML and SQLite files are parsed to collect the user's browsing activities from the five browsers. All files and artifacts that are properly formatted as in Table 3 are collected from the folders specified in Table 4, and each key/value pair is then stored as an entry in the aggregated data set. BrowStEx stores all results of each case in an XML-formatted file, which can be opened later for further analysis. XML gives investigators both interoperability and flexibility, since the case file can be opened in BrowStEx, a text editor, or any other tool that can read XML files. The basic outline of a case file is shown below:
Table 7 Aggregated data model.

| Field name | Data type | Description |
|---|---|---|
| Browser | Text | Name of browser that saved the data |
| Site | Text | Website or domain that stored the data |
| Key | Text | Id used to identify the saved data |
| Value | Text | Contents of the data stored |
| LastModified | Date/Time | Last date/time the data or the data file was modified |
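The data model in Table 7 maps naturally onto a record type. The Python sketch below pairs it with a case-file writer that places the records under an ExtractedData element and appends an MD5 digest of the aggregated data set, plus a date/time filter of the kind offered by the TimeLine Analysis tab. It is an illustration under stated assumptions, not BrowStEx's implementation: apart from ExtractedData, the element names, the canonical JSON rendering that is hashed, and all function names are hypothetical, and BrowStEx's own case-file outline is not reproduced here.

```python
import hashlib
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StorageRecord:
    """One entry of the aggregated data set (fields of Table 7)."""
    browser: str
    site: str
    key: str
    value: str
    last_modified: datetime


def digest_records(records: list[StorageRecord]) -> str:
    """MD5 over a canonical JSON rendering of the records (assumed scheme)."""
    md5 = hashlib.md5()
    for r in records:
        row = [r.browser, r.site, r.key, r.value, r.last_modified.isoformat()]
        md5.update(json.dumps(row, ensure_ascii=False).encode("utf-8") + b"\n")
    return md5.hexdigest()


def build_case_file(case_info: dict[str, str], records: list[StorageRecord],
                    generated_at: datetime) -> str:
    """Serialize case metadata, the records, and their MD5 digest as XML text."""
    root = ET.Element("Case")  # hypothetical root element name
    for tag, text in case_info.items():  # e.g. CaseNumber, Investigator (hypothetical)
        ET.SubElement(root, tag).text = text
    ET.SubElement(root, "Date").text = generated_at.isoformat()
    extracted = ET.SubElement(root, "ExtractedData")
    for r in records:
        item = ET.SubElement(extracted, "Record")
        for tag, text in (("Browser", r.browser), ("Site", r.site), ("Key", r.key),
                          ("Value", r.value),
                          ("LastModified", r.last_modified.isoformat())):
            ET.SubElement(item, tag).text = text
    ET.SubElement(root, "MD5").text = digest_records(records)
    return ET.tostring(root, encoding="unicode")


def in_time_frame(records: list[StorageRecord], start: datetime,
                  end: datetime) -> list[StorageRecord]:
    """Records whose LastModified lies in [start, end], oldest first."""
    hits = [r for r in records if start <= r.last_modified <= end]
    return sorted(hits, key=lambda r: r.last_modified)
```

Hashing a canonical rendering of the records, rather than the XML text itself, keeps the digest stable if the case file is later re-indented or re-encoded by another XML tool.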
The case number, case name, investigator name, and description of the case shown in the XML report are entered by the user of the tool before web storage data are retrieved. The date is added by the tool as soon as the user initiates generation of the case file. BrowStEx also appends the MD5 hash value of the aggregated data set
to the case file, which helps verify the integrity of the aggregated data set. Other extracted data, such as the browser name, visited website, each key/value pair, and the last modified date/time stamp, are stored within an “ExtractedData” tag in the XML file, corresponding to the data model of the aggregated data set shown in Table 7. Fig. 2 shows the class diagram of BrowStEx. It consists of six classes: one main class, named BrowStEx, and five browser-specific classes dedicated to the five major browsers under consideration, Mozilla Firefox, Google Chrome, Microsoft Internet Explorer, Apple Safari, and Opera. The browser-specific classes share the same set of methods but differ in how they locate the localStorage data file, extract data from it, and store the information in a data table for their respective browsers. The main class consists of the user interface
Fig. 2. Class diagram of BrowStEx.
design of BrowStEx and includes event handlers for a variety of user actions, such as checkbox selections and button clicks. Moreover, the main class uses the methods defined in the five browser-specific classes to show the data extracted from their local storage in a unified format. For example, btnGetData_Click() is an event handler that presents the data extracted from the selected browser in a grid view. Overall, since the main class obtains localStorage data from separate classes designed for each web browser, the tool can be extended to aggregate data from other web browsers simply by adding one corresponding class to the implementation. BrowStEx is implemented with ASP.NET 4.5 using Visual Basic 2012 as the programming language. Although BrowStEx was originally designed for 64-bit Windows operating systems, the general concepts and methodologies for locating and extracting data artifacts could be applied to other operating systems on which modern browsers are used. The only modification needed to make the tool work on a different operating system is to update the file locations in the browser-specific classes that point to the localStorage files.
User interface and functionalities
Usability is an important and often overlooked aspect of any digital forensic tool. Hibshi et al. (2011) performed a
study among forensic practitioners and found a number of usability issues that should be considered when developing forensic tools. They identified graphical user interfaces as a preferred feature among some practitioners. Additionally, graphical user interfaces help to reduce the number of human errors, omissions, and other hindrances that may arise with command-line tools. The experts interviewed by Hibshi et al. (2011) also indicated a desire for a “Get Evidence” button in a forensic tool. We incorporated these suggestions into BrowStEx in an effort to increase its usability. As in other well-known digital investigation tools, such as FTK (Forensic Toolkit (FTK), 2014) version 5.0 and EnCase (EnCase Forensic, 2014) version 7.0, BrowStEx prompts the user to enter case- and investigator-related information upon starting a new case. After selecting the appropriate checkboxes for the five major browsers, the user can start retrieving localStorage data by clicking the “Get Data” button. This button executes the “btnGetData_Click” method in the BrowStEx main class and presents the data in a grid view, as shown in Fig. 3. Besides retrieving all web storage data, the user can also search for particular data by entering a keyword in the Search Filter textbox on the right side of the main screen. Another functionality added to BrowStEx is TimeLine Analysis. The TimeLine Analysis tab allows the user to
Fig. 3. BrowStEx user interface.
search and present web storage data from the five web browsers within a given date/time frame, as shown in Fig. 4. The marked rectangle in the figure contains drop-down menus by which the time stamps can be filtered. As discussed earlier, localStorage records web browsing activities with a time stamp from the client's system clock. Therefore, BrowStEx bases its timeline analysis on the local computer's system time rather than on the time stamps stored within the web storage data. The rationale behind this choice is that, although date/time information can be found in the contents of web storage, both the key names used for time stamps and the time formats are inconsistent.
Conclusion and future work
We presented a brief overview of the evolution of persistent storage mechanisms on websites and described the web storage feature introduced with the HTML5 specifications. The main contribution of this paper is to identify the means by which different browsers implement web storage, and to show that information can be extracted from web storage artifacts that may not be present in other browser artifacts, such as cookies and history, which are traditionally extracted during the course of a digital investigation. We observed the widespread
adoption of web storage technology, which served as the main motivation for developing a forensic tool to aggregate data from web storage files. We designed and implemented a tool, BrowStEx, with which one can analyze web storage artifacts on the Windows platform. It parses both the SQLite and the XML files used for web storage by the five major web browsers. Presenting the information in an aggregated view can help digital investigators review and analyze the data in a unified manner. In this paper, we limited our investigation to the Windows operating system and the desktop versions of websites; future research will delve deeper into other platforms, such as Unix/Linux and Mac OS. Web storage is also of future interest for mobile forensics, since mobile web browsers and applications have been early adopters of the HTML5 specifications. Therefore, the web storage implementation on mobile websites needs to be explored as well. Moreover, a new client-side storage standard, Indexed Database (IDB), has recently been introduced (Jakus et al., 2010). It was initially proposed by Oracle in 2009 and shares many similarities with web storage. The basic structure of an IDB is a database file, specifically .ldb for Chrome and Opera and .sqlite for Firefox, which stores key/value pairs. The primary difference, however, is that IDBs are, as the name implies, indexed, which allows a web
Fig. 4. BrowStEx TimeLine search.
application to efficiently access and modify the database. To be more specific, a B-tree is implemented in order to efficiently modify and traverse the database. Even though IDB is designed to be more efficient than web storage, the majority of widely visited websites do not yet use IDB to store local data. However, it is likely to receive more attention in the near future, and it is therefore worth exploring in future work.
Acknowledgment
Funding and support for this research project was provided by the Office of Research and Sponsored Programs (ORSP) award number: 290067 and The Center of Excellence in Digital Forensics at Sam Houston State University.
References
Alexa. Alexa US website rankings. 2013. URL, http://www.alexa.com/topsites.
Altheide C, Carvey H. Digital forensics with open source tools: using open source platform tools for performing computer forensics on Target Systems: Windows, Mac, Linux, Unix, etc. Elsevier; 2011.
Anthes G. HTML5 leads a web revolution. Commun ACM 2012;55(7):16–7.
Ayenson M, Wambach DJ, Soltani A, Good N, Hoofnagle CJ. Flash cookies and privacy II: now with HTML5 and ETag respawning. Social Science Research Network; 2011.
Bogaard D, Johnson D, Parody R. Browser web storage vulnerability investigation: HTML5 localStorage object. In: 2012 International Conference on Security and Management; 2011.
CanIUse.com. Web storage compatibility. 2015. URL, http://caniuse.com/#feat=namevalue-storage. Accessed 4/4/2015.
Crowley M. Interoperability and compatibility. In: Pro Internet Explorer 8 & 9 Development. Springer; 2010. p. 39–53.
Curran K, George C. The future of web and mobile game development. Int J Cloud Comput Serv Sci (IJ-CLOSER) 2012;1(1):25–34.
Digital Detective Group Ltd. NetAnalysis. 2014. URL, http://www.digitaldetective.co.uk/netanalysis.asp.
EnCase Forensic. EnCase. 2014. URL, https://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx.
Forensic Toolkit (FTK). FTK. 2014. URL, http://accessdata.com/solutions/digital-forensics/forensic-toolkit-ftk/.
Garsiel T, Irish P. How browsers work: behind the scenes of modern web browsers. Google Project, August. 2011.
Hanna S, Shin R, Akhawe D, Boehm A, Saxena P, Song D. The emperor's new APIs: on the (in)secure usage of new client-side primitives. In: Proceedings of the Web, Vol. 2; 2010.
Hibshi H, Vidas T, Cranor LF. Usability of forensics tools: a user study. In: IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on. IEEE; 2011. p. 81–91.
Jackson C, Barth A. Beware of finer-grained origins. Web 2.0 Security and Privacy Workshop. 2008.
Jakus G, Jekovec M, Tomažič S, Sodnik J. New technologies for web development. Ljubljana, Slovenia: Electrotechnical Review; 2010.
Jones KJ. Forensic analysis of Internet Explorer activity files. Forensic Analysis of Microsoft Windows Recycle Bin Records. 2003.
Kamkar S. Evercookie: virtually irrevocable persistent cookies. 2010. URL, https://github.com/samyk/evercookie.
Laine M. Client-side storage in web applications. 2012.
Lekies S, Johns M. Lightweight integrity protection for web storage-driven content caching. In: 6th Workshop on Web, IEEE Symposium on Security and Privacy, Vol. 2; 2012.
Mandiant Corporation. Web Historian. 2014. URL, https://www.mandiant.com/resources/download/web-historian.
Mickens J. Silo: exploiting JavaScript and DOM storage for faster page loads. In: Proceedings of the 2010 USENIX conference on Web application development. USENIX Association; 2010. p. 9.
Microsoft. userData storage. 2014. URL, http://msdn.microsoft.com/en-us/library/ms531424(VS.85).aspx.
Microsoft MSDN Library. Web storage. 2012. URL, http://msdn.microsoft.com/en-us/library/bg142799.aspx.
Microsoft SysInternals. Process Monitor. 2015. URL, http://live.sysinternals.com.
Oh J, Lee S, Lee S. Advanced evidence collection and analysis of web browser activity. J Dig Invest 2011;8(0):S62–70. The Proceedings of the Eleventh Annual DFRWS Conference, 11th Annual Digital Forensics Research Conference. URL, http://www.sciencedirect.com/science/article/pii/S1742287611000326.
Ohana DJ, Shashidhar N. Do private and portable web browsers leave incriminating evidence?: a forensic analysis of residual artifacts from private and portable web browsing sessions. EURASIP J Inf Secur 2013;2013(1):1–13.
Pilgrim M. The past, present, and future of local storage for web applications. 2010. URL, http://diveintohtml5.info.
Soltani A, Canty S, Mayo Q, Thomas L, Hoofnagle CJ. Flash cookies and privacy. In: AAAI Spring Symposium: Intelligent Information Privacy Management; 2010.
Sorensen O. Zombie-cookies: case studies and mitigation. In: Internet Technology and Secured Transactions (ICITST), 2013 8th International Conference for. IEEE; 2013. p. 321–6.
SQLite.org. SQLite. 2013. URL, http://www.sqlite.org/.
StatCounter. Browser market share. 2015. URL, http://gs.statcounter.com/#browser-ww-monthly-201501-201503-bar.
W3C. Web storage specification API. July 2013. URL, http://www.w3.org/TR/2013/REC-webstorage-20130730/.
West W, Pulimood SM. Analysis of privacy and security in HTML5 web storage. J Comput Sci Coll 2012;27(3):80–7.