Dr Andrew Blyth School of Computing, University of Glamorgan, Pontypridd CF37 1DL, UK. Tel +44 1443 48 2245, E-mail: [email protected]
An XML-based architecture to perform data integration and data unification in vulnerability assessments

Abstract

One of the problems facing penetration testers is that a test can generate vast quantities of information that need to be stored, analysed and cross-referenced for later use. Consequently, this paper will present an architecture based on the encoding of information within an XML document. We will also demonstrate how, through application of the architecture, large quantities of security-related information can be captured within a single database schema. This database can then be used to ensure that systems are conforming to an organisation’s network security policy.

Keywords: Vulnerability assessment; Penetration testing; XML; XML encoding; Information unification and retrieval.
1 Introduction

The growth of the Internet over the past 20 years has resulted in more and more companies going online in an attempt to engage in electronic commerce. Within Europe and the UK, data protection legislation has been enacted to force companies to take due care in protecting an individual’s personal data. One of the effects of this legislation is that companies are now performing security/vulnerability assessments of their IT infrastructure. While various definitions of security, threat and vulnerability have been proposed [1–4], for the purpose of this paper I will define a vulnerability and a vulnerability assessment as follows:
A vulnerability is a weakness in the system that could allow security to be violated.

A vulnerability assessment is the systematic identification and validation of the vulnerabilities that exist within a given infrastructure.

Various methods, such as those in [5] and [6], have been proposed to help people and organisations perform a vulnerability assessment. Some methods, such as that in [5], even support the use of deception in a network to confuse the attacker. The problem that companies encounter when performing this assessment is how to store and analyse the large quantities of data generated by the large number of tools used to perform a security assessment. Tools such as Nessus, Retina, Net-recon and Whisker, to name but a few, all attempt to identify vulnerabilities on a target system and produce information in different formats. None of these tools share information and, consequently, they do not interoperate easily. Typically, an analyst performing a penetration test uses pencil and paper to record all salient information. This process can become extremely difficult, tedious and prone to error when a large number of hosts are being analysed. Consequently, what is required is a way to combine these outputs into a single representation of the vulnerability state of a machine. In this paper I will develop an open architecture based on XML and demonstrate how this architecture can be used to integrate disparate data from a variety of tools into a single, unified view of the security state of a system. XML is a general-purpose information mark-up language that supports the encoding of information [8–10]. By using XML as the communication medium, the proposed architecture will also support the concept of plug-and-play with reference to the addition of new security tools.

Figure 1 (architecture diagram; labels: Parser, SQL, Database).
2 The architecture

At its simplest level, the architecture functions to encode data from a data source into a set of predefined XML documents. These documents can then be parsed and insertion statements generated that comply with a single database schema. The database schema functions to define the unified vulnerability state of the target system (see Figure 1). By encoding the results of a security tool into an XML document, the architecture supports the concept of plug-and-play, as any tool that can produce an XML-compliant document can be used to capture information relating to the security state of the system. In addition, tools that do not provide native support for the production of XML can have a wrapper produced for them. This wrapper would take the native information produced by the tool, parse it, and then use the parsed information to create an XML document. This use of wrapping tools/systems in applications that can produce XML has been applied successfully in the area of legacy systems’ management [11]. Consequently, it is conceivable that any tool could be used to capture security-related information. The ability of the architecture to support plug-and-play of new security tools is derived from the ability of an XML encoder to be created that parses the information produced by the security tool and inserts it into the predefined XML documents. For example, we could write an XML encoder that parses the output produced by nmap [12] and then, using this information, constructs an XML document (see Figure 2) which is semantically equivalent to the output produced by the nmap tool while being syntactically different.

Currently, the architecture only makes use of two types of XML documents. The first type of document is used to define whether a TCP/UDP port is open or closed on the target system; the second type of document is used to record the existence of a vulnerability on the target system. In Figure 2 we can see that the port scanning tool called strobe is being used to execute a TCP Connect scan against the target of www.target.com on port 80. Having performed a port scan against the target and identified that port 80 is open, we would now use an HTTP vulnerability scanning tool to identify any vulnerabilities in the service on that port.

Figure 2: Type-one XML document (element values: 18-12-2003 06:37:15; www.my-target.com; Strobe; TCP Connect; Open).
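The wrapper idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element and attribute names (scan, timestamp, target, tool, type, port) are hypothetical and are not taken from the paper's DTD, and the input is assumed to be a single nmap-style result line of the form "80/tcp open http".

```python
import xml.etree.ElementTree as ET


def encode_port_scan(timestamp, target, tool, scan_type, result_line):
    """Wrap one nmap-style result line (e.g. '80/tcp open http') in XML.

    All element and attribute names below are hypothetical stand-ins
    for the type-one document format.
    """
    port_proto, state, service = result_line.split()
    port, protocol = port_proto.split("/")

    scan = ET.Element("scan")
    ET.SubElement(scan, "timestamp").text = timestamp
    ET.SubElement(scan, "target").text = target
    ET.SubElement(scan, "tool").text = tool
    ET.SubElement(scan, "type").text = scan_type
    port_el = ET.SubElement(scan, "port", number=port,
                            protocol=protocol, service=service)
    port_el.text = state  # 'open' or 'closed'
    return ET.tostring(scan, encoding="unicode")


doc = encode_port_scan("18-12-2003 06:37:15", "www.my-target.com",
                       "Strobe", "TCP Connect", "80/tcp open http")
print(doc)
```

Because the wrapper only depends on the tool's textual output, supporting a new tool would mean writing a new parsing step while leaving the XML construction unchanged.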
Information Security Technical Report. Vol. 8, No. 4
Vulnerability Assessment
Figure 3: Type-two XML document (element values: 18-12-2003 06:37:15; www.target.com; MTX-Scanner; CVE-1999-1011; 529; Windows-NT; Microsoft; Access Validation Error).

The output from the port scanning tool, or the vulnerability scanning tool, is used to create the XML document that is then passed to the parser, which uses it to create a DOM tree [10]. The parser parses the XML documents with reference to their document type definitions (DTDs) to check that the documents are well formed and valid. The Document Object Model is a platform-neutral and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. A DOM tree is a tree-structured directed graph that provides access to the elements/tags, attributes and information contained in an XML document. The DOM tree is used by the parser as a vehicle through
which information can be extracted and SQL insert statements generated. Figure 4 defines the database schema that is used to store the snapshot of the target system with reference to the network services that are being executed. The database definition of the network services allows us to capture the list of network services that the target system is making available. The main relation that is used to store information regarding a port scan is the scan table. This relation stores information about when the scan took place, the target of the scan, the type of scan, and the result of the scan. The state of each port is stored in the port table. This table stores information about the protocol that is used to connect to the port, the port number, and the name of the service that is normally associated with that port number. Information on the target of the scan is stored in the system table. This relation defines the IP address of the target [13], the DNS name of the target, and the DNS record type of the target [14]. These three relations allow us to capture the state of the machine from the perspective of its network services. Consequently, we can track this state and, through the use of triggers in our database, identify when a system violates the network security policy.

CREATE TABLE port (
    port_id   INT4 NOT NULL,
    port      INT4 NOT NULL,
    protocol  CHAR(1) NOT NULL,
    name      VARCHAR(8),
    PRIMARY KEY (port_id));
CREATE TABLE system (
    sysm_id   SERIAL NOT NULL,
    ip        CIDR NOT NULL,
    target    VARCHAR(32),
    type      VARCHAR(4),
    PRIMARY KEY (sysm_id));
CREATE TABLE type (
    type_id [...]
    PRIMARY KEY (scan_id));

Figure 4: Network services database schema.
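The step from DOM tree to SQL insert statements can be illustrated with a short Python sketch. It is not the xmldb implementation: the XML element and attribute names are hypothetical, SQLite (via the standard sqlite3 module) stands in for the PostgreSQL database, the column types are simplified accordingly, and only the port and system tables are populated because the scan table definition is incomplete in Figure 4.

```python
import sqlite3
from xml.dom.minidom import parseString

# Hypothetical type-one document; names are assumptions, not the paper's DTD.
XML_DOC = """<scan>
  <timestamp>2003-12-25 07:57:55</timestamp>
  <target ip="10.63.19.13">mail.my-victim.com</target>
  <tool>psxml</tool>
  <type>TCP Connect</type>
  <port number="22" protocol="T" service="ssh">open</port>
  <port number="80" protocol="T" service="http">closed</port>
</scan>"""


def text_of(node):
    """Concatenate the text children of a DOM element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE).strip()


def build_inserts(xml_text):
    """Walk the DOM tree and return (sql, params) pairs."""
    root = parseString(xml_text).documentElement
    target = root.getElementsByTagName("target")[0]
    statements = [("INSERT INTO system (ip, target) VALUES (?, ?)",
                   (target.getAttribute("ip"), text_of(target)))]
    for port in root.getElementsByTagName("port"):
        statements.append(
            ("INSERT INTO port (port, protocol, name) VALUES (?, ?, ?)",
             (int(port.getAttribute("number")),
              port.getAttribute("protocol"),
              port.getAttribute("service"))))
    return statements


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE port (port_id INTEGER PRIMARY KEY, port INTEGER NOT NULL,
                   protocol CHAR(1) NOT NULL, name VARCHAR(8));
CREATE TABLE system (sysm_id INTEGER PRIMARY KEY, ip TEXT NOT NULL,
                     target VARCHAR(32), type VARCHAR(4));
""")
for sql, params in build_inserts(XML_DOC):
    conn.execute(sql, params)  # parameterised, so values are never spliced into SQL

rows = conn.execute("SELECT port, protocol, name FROM port ORDER BY port").fetchall()
print(rows)
```

Passing the values as parameters rather than building SQL strings is a design choice made in this sketch; it keeps hostile or malformed scanner output from altering the statements.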
References [5] and [6] show that the first part of any penetration test and vulnerability assessment is identifying the reachable open TCP and UDP ports on the target system. This can be achieved via a variety of tools such as nmap [12]. Having utilised a port scanner to identify a list of open ports, the next step is to identify whether any of the reachable ports are offering services that contain vulnerabilities. The vulnerability database schema is based around the vuln table. This relation
captures information relating to when a vulnerability assessment was performed, the target of the assessment, the tool used to make the assessment, and the state of the assessment. The relation state is used to hold information regarding the individual vulnerabilities that have been identified on the target system. This relation stores information relating to the CVE number of the vulnerability, the operating system, vendor and object that are vulnerable, and whether the vulnerability is local or remote. The relation CVE provides a detailed breakdown of the information associated with a particular CVE/CAN number. The relation vendor provides information pertaining to the version number and name of the specific application that is vulnerable. The relation OS provides a detailed breakdown of the type and version of the operating system that is executing on the vulnerable system.

CREATE TABLE cve (
    cve_id       SERIAL NOT NULL,
    cve_no       VARCHAR(16),
    bugtraq_id   INT8,
    description  VARCHAR(388),
    PRIMARY KEY (cve_id));
CREATE TABLE os (
    os_id [...]

Figure 5: Vulnerability database schema (partial).

Using the Python programming language [15], a variety of simple tools have been developed that support the identification of vulnerability-related information and its encoding in XML. For example, the tool psxml is a simple port scanning tool that uses the three-way TCP handshake to identify if a port is open or closed on the target system. This information is then
encoded and used in the XML document defined in Figure 2. This XML document is then passed to the back-end XML database system called xmldb. This tool uses the XML documents defined in Figures 2 and 3 to produce a Document Object Model (DOM) tree for the XML documents. This DOM tree is then queried and a set of SQL insert/query statements is generated that complies with the database schema defined in Figures 4 and 5. Figure 6 shows an example of the psxml and xmldb tools running in verbose mode. Both psxml and xmldb work in silent mode by default, only echoing error messages to standard error. The standard error channel is also used to display information when the psxml tool is executed in verbose mode. By default, the psxml tool writes the XML definition of the results to standard output. This XML document is then read as input by the xmldb tool. The psxml tool provides a set of parameters that allows us to specify the ports, and the host or network address, that we would like to scan. If no host or network address is provided, then the tool will scan the local host (IP: 127.0.0.1). If no ports are specified, then the list of
ports found in /etc/services are used.

$ psxml -p 80 -v -h 10.63.19.12 | xmldb -v -c /etc/xmldb.conf
The PortScanning XML Tool Version 1.0 ([email protected])
Interesting ports on www.my-victim.com (10.63.19.12)
Port    State    Service
80      open     http
Connecting to database xmldb on host db.my-hacker.ac.uk
Inserting Information regarding port: 80/open/http

Figure 6: Port scanning and database tools output.

The following is an example of the psxml tool running in silent mode and the XML document that it produces on the standard output.

$ psxml -p 80 -p 25 -p 22 -s -h mail.my-victim.com
(XML element values: 25-12-2003 07:57:55; mail.my-victim.com; psxml; TCP Connect; Open; Open; closed)

Figure 7: Port scanning and the resulting XML.

Examining the XML output produced by the psxml scanning tool, we observe that the scan took place on 25 December 2003 at 07:57:55 GMT. The target was mail.my-victim.com with an IP address of 10.63.19.13. Only the three ports 22, 25 and 80 were scanned, with the result that port 22 and port 25 were identified as being open and port 80 was identified as being closed. The port scanning technique used was a TCP Connect scan. Typically, this XML document would then be fed as standard input into the xmldb tool. The xmldb tool takes an XML document as input and, from that input,
produces a DOM tree. From that DOM tree it derives a set of SQL insert statements that it then executes. By using a standard tool to insert information into the database, the tool-set becomes flexible and extendible. In addition, the database will hold an error record of the vulnerability results as detected by the various tools. In using XML as the medium through which a scanner records information, the tool-set can easily be extended. When a new security vulnerability assessment tool is created, all that is required is that a wrapper is produced that can parse its output and produce a valid XML document. The xmldb tool can be configured at the command line to run in verbose or silent mode. By default, it will always execute in silent mode and get the database configuration information from the file /etc/xmldb.conf. At the command line, we can also specify the location of a different configuration file.
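The TCP Connect technique that psxml is described as using can be sketched with the standard socket module. This is a minimal illustration, not the psxml source: the function name tcp_connect_state and the one-second timeout are assumptions, and the demonstration probes a listener that the sketch itself opens on the local host so that nothing external is scanned.

```python
import socket


def tcp_connect_state(host, port, timeout=1.0):
    """Return 'open' if a full TCP handshake with host:port succeeds, else 'closed'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure
        return "open" if sock.connect_ex((host, port)) == 0 else "closed"


# Demonstration against a throwaway listener on the local host.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_state = tcp_connect_state("127.0.0.1", demo_port)
listener.close()
closed_state = tcp_connect_state("127.0.0.1", demo_port)
print(demo_port, open_state, closed_state)
```

A scanner built this way completes the handshake for every open port, which makes it simple and unprivileged but also easy for the target to log.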
$ cat /etc/xmldb.conf
# xmldb configuration file create on the 31st August 2003
# Configuration file created by A Blyth([email protected])
#
# Define the host that is running the database
host      db.my-hacker.ac.uk
# Define the name of database
dbname    xmldb-05
# Define the username used to connect to the database
username  postgres
# Define the password used to authenticate to the database
passwd    Babylon5
# End of Configuration File

Figure 8: An xmldb configuration file.

The xmldb configuration file is used to define the host that is executing the database, along with the database name and the username and password used for authentication purposes. Figure 8 illustrates a typical configuration file that might be used on a vulnerability assessment exercise. The keyword host defines the name of the machine that is running the database. This name can be given in DNS format [14] or IP format [13]. The keyword dbname uniquely identifies the database. The keyword username identifies the user, on the machine named by the host keyword, who has the right to insert data into the database named by the dbname keyword. The keyword passwd gives the password used to authenticate that user. If, for any reason, any of these parameters are not provided, then the program will assume that: (a) the database is executing on the local machine, (b) the database name is xmldb, and (c) the username and password used for
authentication purposes are both postgres. The xmldb tool echoes any error messages, relating to parsing the XML document or to connecting to or inserting information into the database, to standard error. The ability to change the name of the database where the vulnerability assessment information is being stored allows information relating to multiple tests performed at multiple times to be stored and contrasted. Specific vulnerability assessment tools can be developed that test for specific vulnerabilities and then record that information via an XML document. For example, the tool webxml is designed to test for the set of vulnerabilities that can exist on web servers. This tool is simply the Whisker tool with an XML wrapper. It echoes the XML document to the standard output and any error messages to standard error. In addition, the tool can also execute in verbose mode, in which case information relating to identified vulnerabilities will also
be echoed to standard error. Figure 9 is an example of how we can use webxml to identify vulnerabilities on a target system. By default, the webxml tool will always execute in silent mode.

$ webxml -p 80 -h www.my-victim.com
(XML element values: 25-12-2003 07:57:55; www.my-victim.com; webxml; CVE-2000-0457; Microsoft Windows NT Server; Microsoft; Boundary Condition Error)

Figure 9: The webxml vulnerability assessment tool.

In Figure 9 we can see that the webxml tool is scanning the host www.my-victim.com on port 80. The tool has identified only one vulnerability, with the CVE number CVE-2000-0457. The tool has also identified the version number of the IIS server that is running on port 80 and the build of the operating system that is executing the IIS server. The tool does not know the BugTraq ID associated with the CVE number. When the xmldb tool parses the output produced by the webxml vulnerability scanner it will also query the database to fill in any bits of information
that the scanner has failed to identify. The output produced by this tool is printed on standard output and can thus be used as input for the xmldb tool. Once we have used these tools to establish how vulnerable the system is, we can then query the database to ascertain the level of threat posed by the vulnerabilities present. In addition, we can use the database to check that the target system adheres to the specified security policy. For example, we can have the security policy that, on our corporate Intranet, only certain specified systems may run a web service, and that those systems must have the latest security patches applied. We can use the psxml tool to scan TCP port 80/http on every system on our corporate Intranet to identify
all the systems that are running a web server. Then we can use the webxml vulnerability assessment tool to scan every machine on the corporate Intranet on port 80 and identify if, and how, any system is vulnerable to attack. Once we have completed these scans, we can then query the database to identify the results and, via a visual comparison with the approved security policy, we can identify any machines that are in breach of it.

$ /usr/local/pgsql/bin/psql xmldb-05
xmldb-05=# select scan.timestamp, system.ip, system.target
           from port, system, scan
           where port.port=80 and port.port_id=scan.port
           and scan.target=system.sysm_id;
     Time-stamp      |      IP      |        Target
---------------------+--------------+-----------------------
 2003-12-25 07:32:27 | 10.63.19.12  | www.my-victim.com
 2003-12-25 07:32:27 | 10.63.128.37 | wkstn07.my-victim.com
 2003-12-25 07:32:27 | 10.63.128.37 | wkstn13.my-victim.com

Figure 10: Querying the vulnerability database for a service.

By analysing the output contained in Figure 10 we can see that three computer systems are running web servers on port 80 on our Intranet. If it is the policy that only the system www.my-victim.com can run a web server, then we can say that we have identified two systems that have violated the security policy. By having a time-stamp on when the scan was performed, we can track a system over time. Having identified the systems that are running a web service, we can now start drilling down on a specified target to identify the vulnerabilities that exist on the target. A specific tool has been created to allow us to generate text reports for the vulnerability state of a specific machine.
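The visual comparison against the security policy could also be automated. The following Python sketch is one possible way to do so, under stated assumptions: SQLite stands in for PostgreSQL, the scan table is reduced to the three columns that the Figure 10 query uses (its full definition is not available), the rows are the Figure 10 results entered by hand, and the allow-list ALLOWED_WEB_SERVERS is a hypothetical representation of the policy.

```python
import sqlite3

# Hypothetical policy: the only host permitted to run a web server.
ALLOWED_WEB_SERVERS = {"www.my-victim.com"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE port (port_id INTEGER PRIMARY KEY, port INTEGER,
                   protocol TEXT, name TEXT);
CREATE TABLE system (sysm_id INTEGER PRIMARY KEY, ip TEXT, target TEXT);
-- Simplified scan table: only the columns referenced by the Figure 10 query.
CREATE TABLE scan (scan_id INTEGER PRIMARY KEY, timestamp TEXT,
                   target INTEGER, port INTEGER);

INSERT INTO port VALUES (1, 80, 'T', 'http');
INSERT INTO system VALUES (1, '10.63.19.12', 'www.my-victim.com');
INSERT INTO system VALUES (2, '10.63.128.37', 'wkstn07.my-victim.com');
INSERT INTO system VALUES (3, '10.63.128.37', 'wkstn13.my-victim.com');
INSERT INTO scan VALUES (1, '2003-12-25 07:32:27', 1, 1);
INSERT INTO scan VALUES (2, '2003-12-25 07:32:27', 2, 1);
INSERT INTO scan VALUES (3, '2003-12-25 07:32:27', 3, 1);
""")

# Same join as in Figure 10, with an ORDER BY added for stable output.
rows = conn.execute("""
    SELECT scan.timestamp, system.ip, system.target
    FROM port, system, scan
    WHERE port.port = 80
      AND port.port_id = scan.port
      AND scan.target = system.sysm_id
    ORDER BY system.sysm_id
""").fetchall()

violators = [row for row in rows if row[2] not in ALLOWED_WEB_SERVERS]
for timestamp, ip, target in violators:
    print(f"{timestamp}  {ip}  {target}  violates the web-server policy")
```

With the Figure 10 data, the two workstation entries are reported, matching the two violations identified by inspection in the text.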
From Figure 11 we can see that the computer system in question is running Microsoft Windows 2000 Advanced Server with IIS 5.0 functioning as its web server. The target system currently has three vulnerabilities: CAN-2003-0224, CAN-2003-0225 and CAN-2003-0226. All three of these vulnerabilities can be fixed by applying the Microsoft patch Q811114. The database currently supports the use of Common Vulnerabilities and Exposures (CVE) numbers and CVE candidate (CAN) numbers as a dictionary for defining a vulnerability.

$ vbmxlrep 10.63.19.12
Vulnerability Assessment Tool - Version 1.0 ([email protected])
Date and Time of Report Generation: 2003-12-25 12:54:56
Target Name/IP:   www.my-victim.com/10.63.19.12
OS/Vendor:        Windows 2000 Advanced Server/Microsoft
Service:          Microsoft IIS 5.0
------------------------------------------------------------
Vulnerability:    CAN-2003-0224 (Remote)
Description:      Microsoft IIS ASP Header Denial Of Service Vulnerability
Microsoft Patch:  Q811114
------------------------------------------------------------
Vulnerability:    CAN-2003-0225 (Remote)
Description:      Microsoft IIS SSINC.DLL Server Side Includes Buffer Overflow Vulnerability
Microsoft Patch:  Q811114
------------------------------------------------------------
Vulnerability:    CAN-2003-0226 (Remote)
Description:      Microsoft IIS WebDAV PROPFIND and SEARCH Method Denial of Service Vulnerability
Microsoft Patch:  Q811114
------------------------------------------------------------

Figure 11: Querying the vulnerability database for vulnerabilities.

3 Summary and conclusions

The role and function of this tool-set is to allow system administrators to measure the number of vulnerabilities that exist on Intranets via the use of open and extendible tools. Via the use of XML, tools can share data between themselves, and that data can be integrated and unified with other data sources (see Figure 1). XML provides us with the ideal vehicle for encoding information. This encoding functions as a universal language from which other tools can parse and extract information. By encoding this information into an SQL database, we then gain the ability to capture, query and manipulate vast quantities of information, thus removing the need for a penetration tester to keep detailed paper and pencil notes of the state of a security test. By using a single consistent database design that adheres to the XML/DTD
definition of what data can be encoded, we can create a picture of the vulnerability state of a system and use this picture to enforce and measure the level of adherence to a network security policy. Security management standards such as BS7799 and ISO 17799 [16] force us to create security policies that encompass network security.
Via the encoding of vulnerability-related information in an XML document and the integration/unification of many such documents into a single database schema, we can start to measure the number of vulnerabilities that exist on a network.
4 References

[1] John M. Carroll, 1996, Computer Security, Third Edition, Butterworth-Heinemann, ISBN 0-7506-9600-1.
[2] Charles P. Pfleeger and Shari Lawrence Pfleeger, 2003, Security in Computing, Third Edition, Prentice-Hall, ISBN 0-13-035548-8.
[3] Andrew Blyth and Gerald L. Kovacich, 2001, Information Assurance: Surviving in the Information Environment, Springer-Verlag, ISBN 1-85233-326-X.
[4] Andy Jones and Iain Sutherland, 2003, Threats to Information Systems and the Way we Deal with Them, Information Security Bulletin, Volume 8, Number 4, Pages 143-155.
[5] Stuart McClure, Joel Scambray and George Kurtz, 2003, Hacking Exposed, Osborne, ISBN 0-07-222742-7.
[6] Pete Herzog, 2000, Open-Source Security Testing Methodology Manual, Version 2.0, August 2000, http://www.isecom.org/projects/osstmm.htm
[7] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen and Eve Maler (Eds), 2000, Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation 6 October 2000, http://www.w3.org/TR/REC-xml
[8] Takeshi Imamura, Blair Dillaway and Ed Simon, 2002, XML Encryption Syntax and Processing, W3C Recommendation 10 December 2002, http://www.w3.org/TR/xmlenc-core/
[9] Phillip Hallam-Baker (Ed.), 2003, XML Key Management Specification (XKMS), W3C Working Draft 18 April 2003, http://www.w3.org/TR/xkms2
[10] Arnaud Le Hors, Philippe Le Hégaret, Lauren Wood, Gavin Nicol, Jonathan Robie, Mike Champion and Steve Byrne, 2000, Document Object Model (DOM) Level 2 Core Specification, Version 1.0, W3C Recommendation 13 November 2000, http://www.w3.org/TR/DOM-Level-2-Core
[11] Michael Stonebraker and Joseph M. Hellerstein, 2001, Content Integration for E-Business, Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, ACM Press, May 2001.
[12] The NMAP Network Mapper, http://www.insecure.org/nmap/index.html
[13] Douglas Comer, 2000, Internetworking with TCP/IP: Principles, Protocols and Architectures, Volume 1, Fourth Edition, Prentice Hall, ISBN 0-13-018380-6.
[14] Paul Albitz and Cricket Liu, 1998, DNS and BIND, Third Edition, O’Reilly, ISBN 1-56592-512-2.
[15] Mark Lutz, 2001, Programming Python, Second Edition, O’Reilly, ISBN 0-596-00085-5.
[16] B. Dodwell, 1997, Managing Information Security: Achieving BS7799 (Financial Times Management Briefings), Financial Times Prentice Hall, December 1997.