ECS 235A Project - NVD Visualization Using TreeMaps
Kevin Griffin
Email: kevgriffin@ucdavis.edu
December 12, 2013

1 Introduction
The National Vulnerability Database (NVD) is a continuously updated United States Government repository of vulnerability data [2]. The repository contains a large set of data dating from around 1997 to the present. The NVD is also a multivariate dataset containing attributes such as vulnerability score, attack vector, access complexity, and integrity impact. The NVD website provides an interface that lets users with a priori knowledge and clues conduct targeted searches of the underlying data. There are also applications, like Nessus (http://www.tenable.com/products/nessus), that use various components of this data. However, what is missing is a way to explore and visualize the underlying dataset without a priori knowledge and clues, in order to find trends and vulnerabilities of interest for analysis and hypothesis generation.

Traditional visualizations fall short for two main reasons. First, visualization components like bar, line, and pie charts are not space filling, so only a very limited amount of data can be visualized at once. This is an issue with NVD since it contains over fifteen years of vulnerability data. Second, most traditional visualizations can only handle data with a single attribute. NVD is a multivariate dataset that reveals a lot of information to the user when subsets of its attributes are visualized together.

The purpose of this research is to demonstrate how a lesser-known and less frequently used visualization, the treemap [3] [7] [8], can overcome these shortcomings: because it is a space-filling visualization that can use the entire display space, it can visualize large datasets, and it is able to handle multivariate data.
Multivariate data is visualized with treemaps by mapping the various attributes of the NVD data to the visual attributes of the treemap, such as size, shape, color, and height. The main contributions of this project are:
1. Understanding the treemap's utility for visualizing large datasets
2. Measuring the treemap's utility for visualizing multivariate data
3. Showing the treemap's advantages over traditional visualizations (i.e., line and bar charts)
4. Visual Analysis Tool. The current system provides a simple, interactive visual analysis environment to explore the NVD data.

Coordinated Visualization Views. The system consists of a main overview, using a treemap (a visualization invented in the early 1990s by Ben Shneiderman at the University of Maryland), and two secondary bar chart views. All of these views are integrated and allow the user to perform detailed analysis of the NVD data.

Filtering. Programmatic filtering of the NVD data has been implemented and is based on the year the vulnerability of interest was discovered. Future enhancements will allow the user to filter on other attributes of the data, like vendor, product, and access complexity, in real time from the user interface. This will give the
user the ability to explore the underlying data, without a priori knowledge, to find trends and vulnerabilities of interest for analysis and hypothesis generation.

2 Related Work
The work done by [5] uses NVD along with other security metrics (Nessus scans, router configurations, and firewall rules) to create custom security metrics (Patch Risk, Criticality, Security Score, Time Series) and visualize them using scatter graphs, pie charts, ring graphs, bar charts, histograms, and quartiles (see Figure 1). They also provide a modest what-if visual analysis of security changes to the computers and networks.

Figure 1: Automatic Security Analysis Dashboard

The Scientific Applications & Visualization Group within the National Institute of Standards and Technology (NIST) created a tool, NVDvis (see Figure 2), that reads the latest version of the National Vulnerability Database [4]. The user can choose Common Vulnerabilities and Exposures (CVE) 1.2 or 2.0. The tool does an initial analysis that is displayed in its Data Analysis pane. It displays which CVE database was selected and how many entries there were. It provides the average vulnerability score as well as the distribution of the scores. NVDvis also gives the number of elements and the percentage for each value of the six attributes that make up the score, as well as for the part, the Common Weakness Enumeration Identifier (CWE-ID), and the distribution of date-time. The tool enables the user to:
1. Filter the data in a variety of ways. NVDvis can filter on the vulnerability score as well as the six attributes that contribute to the score: access vector, access complexity, authentication, confidentiality impact, integrity impact, and availability impact. It also provides access to part (application, hardware, operating system), CWE-ID, date-time, and vendor. After each filtering operation, both the Data Analysis pane and the visualization are updated.
2. Plot the data with parallel coordinates. These plots are a way to visualize multidimensional data; they were invented by Alfred Inselberg, who has a tutorial online. The NVDvis visualization can be viewed both on the desktop and in NIST's immersive environment.
3. Output data in csv, arff, or binary format for further analysis.

Figure 2: NVDvis

Other visualization work using this type of data has primarily taken the form of attack graphs. The work by [6] is an example of this type of work. CVE data, which is a subset of the NVD data, is used to identify hosts in a network that have vulnerabilities. An attack graph is then generated that shows the sequence of hosts that an attacker can exploit to gain access to a system. Figure 3 illustrates this type of visualization with the CVE data overlaid on the graph.
Figure 3: Attack Graph

3 System Architecture
The overall system architecture is illustrated in Figure 4. The database is initially populated with data from the NVD XML data feed with Common Vulnerability Scoring System (CVSS) and Common Platform Enumeration (CPE) mappings (version 2.0). The feed is split into XML files of the form nvdcve-2.0-[year].xml, where year ranges from 2002 to 2013, plus nvdcve-2.0-recent.xml and nvdcve-2.0-modified.xml. The file nvdcve-2.0-[year].xml contains all of the vulnerabilities found in that year, nvdcve-2.0-recent.xml contains all of the recently published vulnerabilities, and nvdcve-2.0-modified.xml contains all of the recently published and recently updated vulnerabilities. The files are parsed using a SAX parser and inserted into a MySQL (http://www.mysql.com) database. The complete dataset contains over sixteen years of vulnerability data, totaling more than 1.5 million database records. Finally, once the view is ready to be made visible, the data is formatted and placed into an appropriate data structure by the Viz Pre-Processor. The pre-processor then hands the data off to the visualization interface.

Figure 4: System Architecture

3.1 Data Storage
The data is stored in a MySQL database using the schema shown in Figure 5. The entity table contains most of the data parsed from the XML file except for the vulnerable software information and the CWE identifiers. The entity table contains over 58,000 records. The software table stores, along with other attributes, the names of the vendors, the vendors' products, and the product versions affected by vulnerabilities stored in the entity table. The software table contains over 148,000 records. The join table between entity and software maps each CVE vulnerability in the entity table to the vulnerable product in the software table. This table is the largest, with approximately 1.6 million entries.
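The parse-and-load path above can be illustrated with a small sketch. It streams a feed with Python's xml.sax, splits each CPE URI into vendor, product, and version, and fills three tables shaped like the entity, software, and join tables. Several things here are assumptions rather than details from this project: the XML excerpt is hypothetical and heavily simplified (element names such as vuln:product and cvss:score are assumed, namespace URIs are placeholders), SQLite stands in for MySQL, and the table and column names (entity_software, cve_id, software_id, and so on) are illustrative. The top_vendors query hints at the kind of aggregation a pre-processor could hand to a "top ten vendors" bar chart.

```python
import sqlite3
import xml.sax

# Hypothetical, heavily simplified excerpt shaped like an NVD 2.0 feed; the element
# names are assumptions and the namespace URIs are placeholders.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<nvd xmlns:vuln="urn:example:vuln" xmlns:cvss="urn:example:cvss">
  <entry id="CVE-2013-0001">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/a:microsoft:internet_explorer:8</vuln:product>
      <vuln:product>cpe:/a:microsoft:internet_explorer:9</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cvss><cvss:base_metrics><cvss:score>9.3</cvss:score></cvss:base_metrics></vuln:cvss>
  </entry>
  <entry id="CVE-2013-0002">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/a:microsoft:internet_explorer:9</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cvss><cvss:base_metrics><cvss:score>4.3</cvss:score></cvss:base_metrics></vuln:cvss>
  </entry>
</nvd>"""


class NvdHandler(xml.sax.ContentHandler):
    """Streams through the feed and collects one dict per <entry>."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None
        self._text = []

    def startElement(self, name, attrs):
        # Without the namespace feature, SAX reports raw qualified names like "vuln:product".
        if name == "entry":
            self._current = {"id": attrs.get("id"), "score": None, "products": []}
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        self._text = []
        if self._current is None:
            return
        if name == "vuln:product":
            self._current["products"].append(text)
        elif name == "cvss:score":
            self._current["score"] = float(text)
        elif name == "entry":
            self.entries.append(self._current)
            self._current = None


# Hypothetical three-table layout mirroring the entity / software / join-table design.
SCHEMA = """
CREATE TABLE entity (cve_id TEXT PRIMARY KEY, score REAL);
CREATE TABLE software (id INTEGER PRIMARY KEY, vendor TEXT, product TEXT, version TEXT,
                       UNIQUE (vendor, product, version));
CREATE TABLE entity_software (cve_id TEXT REFERENCES entity, software_id INTEGER REFERENCES software);
"""


def load(conn, entries):
    conn.executescript(SCHEMA)
    for e in entries:
        conn.execute("INSERT INTO entity VALUES (?, ?)", (e["id"], e["score"]))
        for cpe in e["products"]:
            # cpe:/a:vendor:product:version; padding lets short URIs still unpack
            vendor, product, version = (cpe.split(":") + ["", "", ""])[2:5]
            # Each distinct (vendor, product, version) is stored once in software.
            conn.execute("INSERT OR IGNORE INTO software (vendor, product, version) VALUES (?, ?, ?)",
                         (vendor, product, version))
            (sw_id,) = conn.execute("SELECT id FROM software WHERE vendor = ? AND product = ? AND version = ?",
                                    (vendor, product, version)).fetchone()
            conn.execute("INSERT INTO entity_software VALUES (?, ?)", (e["id"], sw_id))
    conn.commit()


def top_vendors(conn, n=10):
    """Vulnerability count per vendor, i.e. data for a 'top N vendors' bar chart."""
    return conn.execute("""SELECT s.vendor, COUNT(DISTINCT es.cve_id) AS vulns
                           FROM entity_software es JOIN software s ON s.id = es.software_id
                           GROUP BY s.vendor ORDER BY vulns DESC LIMIT ?""", (n,)).fetchall()


handler = NvdHandler()
xml.sax.parseString(SAMPLE_FEED, handler)
conn = sqlite3.connect(":memory:")
load(conn, handler.entries)
print(top_vendors(conn))  # [('microsoft', 2)]
```

The event-driven SAX style is what keeps memory flat for multi-megabyte yearly files: only the entry currently being read is held in memory. On the real dataset the join table dominates, which is consistent with it being the largest table described above.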
Figure 5: Database Schema

3.2 Visualizing Large Data Sets
As Figure 6 shows, treemaps are very good at displaying large datasets because of their space-filling characteristics. The treemap visualization on the left displays over 10,000 software products. In contrast, the bar charts on the right, top and bottom, display only 20 products/vendors combined. Increasing that number to just 100 makes the two bar chart visualizations almost unreadable.

Figure 6: Microsoft

3.3 Visualizing Multivariate Data
As stated earlier, NVD is a multivariate dataset. With multivariate data, a subset of the attributes must be visualized together before the user can start extracting useful meaning from the underlying dataset. For example, Figure 7 shows vulnerability data for Apple in both the treemap display on the left and the bar chart at the top right. The bar chart gives the vulnerability count for each Apple product. While this gives the user some information, it falls short of providing a complete understanding of the underlying data. In particular, it doesn't answer questions like: What type of vulnerabilities are they? How many vulnerabilities were severe (root access) or just minor nuisances? Which vulnerabilities are easy to exploit?

If we assume that the size of each treemap node indicates how difficult or easy a vulnerability is to exploit, and the color (red = severe, green = minor) indicates the severity of the exploit, we start to get a better understanding of the underlying NVD dataset. At a glance we get a rough idea of how many severe vulnerabilities each product has, how easy they are to exploit, and how the vulnerabilities of each product compare to each other. Furthermore, if other attributes were mapped to the height of each node, we would get an even richer visual interpretation of the underlying dataset. Because multiple data attributes can be mapped to treemap attributes at once, treemaps are considerably better than bar charts at conveying the full meaning of the underlying dataset.

Figure 7: Apple

3.4 Visual Analysis Tool
3.4.1 Overview
The visual analysis tool was designed with a treemap visualization as its main display and coordinated bar chart views that provide detailed information on selected nodes (see Figure 8). There are two groupings used for the treemap visualization.
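To make the space-filling and multivariate arguments of Sections 3.2 and 3.3 concrete, the sketch below lays out a two-level (vendor, then product) hierarchy of vulnerability leaves with the slice-and-dice algorithm from Shneiderman's original treemap paper [7]. The report does not say which layout algorithm the tool uses, so this is an illustrative stand-in, not the tool's implementation. The mapping is assumed: leaf size comes from a hypothetical exploitability score (bigger means easier to exploit) and leaf color from a hypothetical 0-10 severity score on a green-to-red ramp. The vendor, product, and CVE names and all numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A treemap node: vendor, product, or (at the leaves) one vulnerability."""
    name: str
    weight: float = 0.0    # leaf size; assumed here to be an exploitability score (bigger = easier)
    severity: float = 0.0  # leaf color input; assumed here to be a 0-10 severity score
    children: list = field(default_factory=list)

    def total(self):
        # Internal nodes are as large as the sum of their leaves.
        if not self.children:
            return self.weight
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: give each child a strip proportional to its total
    weight, alternating vertical and horizontal cuts from one level to the next.
    Returns a list of (leaf, x, y, width, height) tuples that tile the rectangle."""
    if out is None:
        out = []
    if not node.children:
        out.append((node, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:  # even levels: cut the rectangle into vertical strips
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:               # odd levels: cut it into horizontal strips
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


def severity_to_rgb(severity):
    """Linear green (minor) to red (severe) ramp over an assumed 0-10 scale."""
    t = max(0.0, min(severity / 10.0, 1.0))
    return (round(255 * t), round(255 * (1 - t)), 0)


# Made-up example data: vendor -> product -> vulnerability (weight, severity).
root = Node("NVD", children=[
    Node("apple", children=[
        Node("safari", children=[Node("CVE-A", 8.6, 10.0), Node("CVE-B", 3.9, 2.9)]),
        Node("itunes", children=[Node("CVE-C", 10.0, 6.4)]),
    ]),
    Node("microsoft", children=[
        Node("internet_explorer", children=[
            Node("CVE-D", 8.6, 10.0), Node("CVE-E", 8.6, 6.4), Node("CVE-F", 4.9, 2.9),
        ]),
    ]),
])

for leaf, x, y, w, h in slice_and_dice(root, 0, 0, 800, 600):
    print(f"{leaf.name}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f} rgb={severity_to_rgb(leaf.severity)}")
```

Every leaf's area equals its share of the total weight times the display area, and the leaves tile the whole rectangle with no unused space, which is the space-filling property Section 3.2 relies on. Slice-and-dice is the simplest such layout; it tends to produce thin slivers, which is one reason later variants such as the ordered and quantum treemaps in [3] were developed.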
The main grouping is based on the vendor (e.g., Microsoft) and the subgrouping is based on the vendor's product (e.g., Internet Explorer). Each node in the treemap represents one vulnerability mapped to one vendor product. A semitransparent tooltip dialog shows additional details for each node as the user probes the treemap. The top right bar chart provides the vulnerability count for the selected vendor's top ten products. The bottom right bar chart provides the overall vulnerability count for the top ten vendors. The JFreeChart [1] API was used to implement the bar charts.

3.4.2 Future Work
Real-Time Filtering: Currently the data is only filtered by the vulnerability discovery year. A very useful enhancement would be to let the user filter the data, in real time, on the various attributes of the dataset. The NVD XSD file (nvd.nist.gov/schema/nvdcve-feed 2.0.xsd) can be viewed for the complete
list of attributes to filter on.

Figure 8: NVD Visualization

TreeMap Enhancements: Additional enhancements to the treemap include mapping dataset attributes to the height of the treemap nodes, semantic zooming, the ability to drill up/down on a particular group or subgroup of the treemap, and ordering of the treemap nodes based on node characteristics such as size.

Automated Analysis: Future work in this area will include automatically inferring trends and patterns in the data. Important things to infer would be:
1. Vendors/Products that are the worst/best at providing a particular capability (e.g., Web Server)
2. Products that are potential targets of the next round of zero-day exploits
3. The Vendors/Products most susceptible to a certain type of exploit (e.g., buffer overflow)

4 Conclusion
This project allowed me to experiment with visualizing a large, multivariate dataset using treemaps. The preliminary results showed some of the advantages of using treemaps over traditional visualizations. In particular, treemaps proved to be very effective at visualizing large quantities of data and at providing a more accurate visual interpretation of the underlying dataset. Future enhancements will provide a more robust exploration and visualization capability for the National Vulnerability Database.
References
[1] JFreeChart. http://www.jfree.org/jfreechart/.
[2] National Vulnerability Database (NVD). http://nvd.nist.gov/home.cfm.
[3] Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Trans. Graph., 21(4):833-854, 2002.
[4] Judith E. Terrill, Kevin Rawlings, John Hagedorn, Styvens Belloge, Terence Griffin, and Sandy Ressler. Visualization and analysis of the National Vulnerability Database. http://www.nist.gov/itl/math/hpcvg/nvdvis.cfm.
[5] Sun Kun, S. Jajodia, J. Li, Cheng Yi, Tang Wei, and A. Singhal. Automatic security analysis using security metrics. In Military Communications Conference, 2011 (MILCOM 2011), pages 1207-1212.
[6] O. Sheyner and J. Wing. Tools for generating and analyzing attack graphs. In Formal Methods for Components and Objects, pages 344-371. Springer.
[7] Ben Shneiderman. Tree visualization with treemaps: 2-D space-filling approach. ACM Trans. Graph., 11(1):92-99, 1992.
[8] Ben Shneiderman. Treemaps for space-constrained visualization of hierarchies, 2009.