Federal Mobile App Vetting Center for Assured Software Pilot Nick Valletta
MOBILE SECURITY: A REAL ISSUE Slide 2 of 20
NOTEWORTHY ANDROID APP TRENDS [Chart: Trends Among Top 5000 Google Play Apps] These trends highlight the readily apparent need to develop a mobile application security testing methodology. Slide 3 of 20
UNCLASSIFIED//FOR OFFICIAL USE ONLY CENTER FOR ASSURED SOFTWARE MISSION To substantially increase the degree of confidence that software used within DoD's critical systems is free from exploitable vulnerabilities, either intentionally or unintentionally introduced, using: Scalable Tools Scalable Techniques Scalable Processes Scalable => Timely => Effective Slide 4 of 20
CAS BACKGROUND IN MOBILITY The CAS surveyed the market, looking for the leading commercial, open source, academic, and free software tools for the analysis of mobile applications. The CAS performed an initial test of the top ten tools from this survey by analyzing open source applications and comparing each tool's findings against the original Java source code (the 10x10 Study). One of the major conclusions of the 10x10 Study was that no single tool is adequate for mobile application testing. Furthermore, the dynamic nature of mobile app development and deployment necessitates a quick, cost-effective method for assessing mobile software assurance. Slide 5 of 20
WHY A PILOT? The CAS is working on a mobile application testing pilot with the goal of answering the following questions: How do we efficiently scale mobile application assessments? How do we trust that a given tool's finding is accurate? How can we create a testing infrastructure that is platform-agnostic, application-agnostic, criteria-agnostic, etc., and still have confidence that a given app is being assessed properly and accurately? The Pilot will leverage the capabilities of the best commercial, free, academic, and open source tools in order to assess Android applications. Slide 6 of 20
PILOT OBJECTIVE Create and validate a scalable, efficient, and automated mobile application software assurance testing process that can be implemented across Government, through the use of: Multiple tools to increase coverage Automation to reduce manual review Defined processes to speed decision making A repository of test results with metadata Slide 7 of 20
TESTING WALKTHROUGH App submitted from the App Store → SHA256 Lookup. If the SHA256 is found, the app's stored reports are pulled from the database. If no SHA256 is found: Multiple SwA Tool Tests → Report Scrape → Combine Results with Confidence and Severity → Automated Analysis: Pass/Fail Recommendation → Analyst Review → Analyst Report & Recommendation → Management Adjudication → Go Decision / No Go Decision → Update Database. Slide 8 of 20
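A minimal sketch of this walkthrough in Python, assuming a dictionary-style database, tools exposed as callables, and illustrative field names; none of this is the Pilot's actual implementation.

```python
import hashlib

def vet_app(apk_bytes, db, tools, analyst, management):
    """Illustrative flow only; all names and data shapes here are assumptions."""
    digest = hashlib.sha256(apk_bytes).hexdigest()

    # SHA256 lookup: if the hash is already in the database, reuse the stored results.
    if digest in db:
        return db[digest]

    # No match: run each SwA tool, scrape its report, and pool the findings.
    findings = [f for tool in tools for f in tool(apk_bytes)]

    # Automated analysis produces a pass/fail recommendation, which the analyst
    # reviews and management adjudicates into a Go / No Go decision.
    recommendation = "fail" if any(f["severity"] == "fatal" for f in findings) else "pass"
    report = analyst(findings, recommendation)
    decision = management(report)

    db[digest] = {"report": report, "decision": decision}  # update the database
    return db[digest]
```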
TOOL CONFIDENCE CONCEPT: How do we trust a tool's output if we don't know whether the tool is accurate? PROCESS: Test x open source applications using the desired tools. An analyst manually reviews each finding by comparing it to the original source code. The ratio α, where 0 ≤ α ≤ 1, is simply defined as the total number of accurate findings divided by the total number of findings. Example: Tool A makes Y findings for a vulnerability. The analyst reviews them and notes that X findings are correct (where X ≤ Y). Therefore, α = X/Y, which is the tool's confidence score for the given weakness. Slide 9 of 20
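A worked sketch of the confidence ratio, assuming each finding has simply been marked accurate or inaccurate by the analyst (the data layout is an assumption):

```python
def confidence_score(findings):
    """α = accurate findings / total findings for one tool and one weakness class.

    `findings` is a list of booleans: True if the analyst confirmed the finding
    against the original source code, False otherwise. (Layout is an assumption.)
    """
    if not findings:
        return None  # no findings, so no confidence score can be computed
    return sum(findings) / len(findings)

# Example: Tool A reports Y = 10 findings for a weakness; the analyst confirms X = 8.
alpha = confidence_score([True] * 8 + [False] * 2)
print(alpha)  # 0.8
```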
TOOL CONFIDENCE ADVANTAGES If a confidence score is high enough or low enough, the findings for those vulnerabilities can be accepted or rejected automatically. Only confidence scores in the middle range (not high, not low) will necessitate manual review of the associated findings. Through the use of tool confidence scores, only a subset of the total findings is flagged for manual review, which significantly expedites the processing of a given application. Current estimates suggest that less than 25% of tool findings will be flagged for review (which makes the Pilot's methodology roughly 75% more efficient than traditional solutions based on manual verification). Slide 10 of 20
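A minimal triage sketch built on the confidence score above; the 0.9 / 0.2 cutoffs are placeholders, not values from the Pilot:

```python
def triage(alpha, accept_at=0.9, reject_at=0.2):
    """Route a tool's findings for a weakness based on its confidence score α."""
    if alpha >= accept_at:
        return "auto-accept"   # high confidence: accept findings automatically
    if alpha <= reject_at:
        return "auto-reject"   # low confidence: discard findings automatically
    return "manual-review"     # middle range: flag for analyst review

print(triage(0.95))  # auto-accept
print(triage(0.50))  # manual-review
```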
CRITERIA SPECIFICATION CONCEPT: With so many differing lists of weakness criteria for evaluating mobile applications, which one should I use? PROCESS: The Pilot does not advocate any specific list of criteria, but instead demonstrates a process that, theoretically, allows any list of evaluation criteria to be used to test apps. For the Pilot, we are using the CAS-defined weaknesses list*, the DISA SRG, the OWASP Mobile Top 10 list, and MITRE CWEs. * Available on request Slide 11 of 20
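One way a process can stay criteria-agnostic is to map normalized findings (for example, CWE IDs) onto whichever criteria list is in force. The mapping below is a hypothetical illustration, not the Pilot's actual data:

```python
# Hypothetical mapping from a normalized finding identifier (here, a CWE ID)
# to entries in different criteria lists. The specific entries are illustrative.
CRITERIA_MAPS = {
    "OWASP Mobile Top 10": {"CWE-312": "M2: Insecure Data Storage"},
    "CAS Weaknesses":      {"CWE-312": "Sensitive data stored unencrypted"},
}

def apply_criteria(findings, criteria_name):
    """Re-express tool findings in terms of the selected criteria list."""
    mapping = CRITERIA_MAPS[criteria_name]
    return [mapping[f] for f in findings if f in mapping]

print(apply_criteria(["CWE-312"], "OWASP Mobile Top 10"))
```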
WEAKNESS RANKING CONCEPT: Weaknesses encountered during app testing/certification should be treated differently, depending on the impact they can have on a given system. PROCESS: The CAS examined each criterion in each weakness specification listing to determine the weakness's severity. Weaknesses were rated as Low, Medium, High, or Fatal. Depending on a given environment, thresholds can be established for weakness severities: one fatal finding may equal a failure, but a handful of high findings may be tolerable before the app is rejected. Slide 12 of 20
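A sketch of how environment-specific severity thresholds might be applied; the limits shown are examples only, not thresholds from the Pilot:

```python
# Example thresholds for one environment: any fatal finding fails the app,
# and more than three high findings also fails it. Values are illustrative.
THRESHOLDS = {"fatal": 0, "high": 3}

def passes_severity_policy(severity_counts, thresholds=THRESHOLDS):
    """Return True if the app stays within the environment's tolerances."""
    for severity, limit in thresholds.items():
        if severity_counts.get(severity, 0) > limit:
            return False
    return True

print(passes_severity_policy({"fatal": 1}))              # False: one fatal finding fails
print(passes_severity_policy({"high": 2, "medium": 7}))  # True: within tolerance
```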
WEAKNESS RANKING PROCESS

Weakness                                 | Source | High | Medium | Low | Fatal | N/A
CAS Weaknesses                           | CAS    | 15   | 13     | 10  | 0     | 0
DISA Mobile App SRG                      | DISA   | 15   | 45     | 13  | 0     | 0
OWASP Top Ten                            | CAS    | 8    | 1      | 0   | 0     | 1
Mobile Specific CWEs                     | CAS    | 9    | 4      | 1   | 0     | 0
CAS Traditional Weakness Classes (CWEs)  | CAS    | 5    | 6      | 1   | 0     | 0

Slide 13 of 20
DATABASE AND HASH CREATION CONCEPT: A database of which apps have been tested prevents duplication of effort. PROCESS: All data from the application tests are saved to a shared database. When an app is submitted through the Pilot's processes, a SHA256 hash is created. This hash is compared to the hashes stored in the database. If there is a match, the app does not need to be evaluated; instead, the app's reports are pulled from the database. Slide 14 of 20
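A minimal sketch of the hash-and-lookup step using Python's standard hashlib; the database is stubbed as a dictionary, and the function names are illustrative:

```python
import hashlib

def sha256_of_apk(path):
    """Compute the SHA256 hash of an app package, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stub "database" of previously tested apps: hash -> stored reports.
tested_apps = {}

def lookup_or_queue(path):
    h = sha256_of_apk(path)
    if h in tested_apps:
        return tested_apps[h]   # match: pull existing reports, skip re-testing
    return None                 # no match: the app must go through evaluation
```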
SWID TAGS AND QUERIES CONCEPT: A centralized database of testing results allows multiple agencies to share, upload, and query app results, thus reducing duplication of effort. PROCESS: Upon completion of each evaluation, a software identification (SWID) tag is created. The tag contains metadata for the app and the evaluation, and it allows other agencies to quickly query the SWID database to find information about apps and prior evaluations. Slide 15 of 20
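A sketch of generating a minimal SWID-style tag with Python's standard library; the specific metadata fields are assumptions, and a production tag would follow the full ISO/IEC 19770-2 schema:

```python
import xml.etree.ElementTree as ET

def build_swid_tag(app_name, version, tag_id, evaluator):
    """Build a minimal SWID-style tag carrying app and evaluation metadata.

    Field choices here are illustrative; a real tag follows ISO/IEC 19770-2.
    """
    tag = ET.Element("SoftwareIdentity", {
        "xmlns": "http://standards.iso.org/iso/19770/-2/2015/schema.xsd",
        "name": app_name,
        "version": version,
        "tagId": tag_id,
    })
    ET.SubElement(tag, "Entity", {
        "name": evaluator,      # agency that performed the evaluation
        "role": "tagCreator",
    })
    return ET.tostring(tag, encoding="unicode")

print(build_swid_tag("ExampleApp", "1.2.0", "example-app-1.2.0-eval", "Evaluating Agency"))
```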
STATUS OF PILOT Process documents are complete (pending comments and validation) Weakness severities are complete Tools are in-house and operational Tool report scraping is complete Tool trust is in progress: all tools' results are being evaluated System coding is in progress Methodology document is in progress Slide 16 of 20
MILESTONES Initial Tool Trust (25 Apps) Feb 28 (DONE) Initial Code Development April 15 App Testing (75 Apps) April 30 (DONE) Process Testing (250 Apps) May 31 Revise Processes May 31 Publish Testing Methodology June 30 Release Software June 30 Slide 17 of 20
INTERESTED IN LEARNING MORE? Contact: cas@nsa.gov Slide 18 of 20
PILOT OVERVIEW [Process flow diagram] Process steps: P1 SHA256 Build and Compare; P2 App Store Data Scrape; P3 Build Specific Test Criteria Database; P4 Initiate Tool Runs; P5 Report Scraping; P6 Build SWID Tag / SWID Tag Creation; P7 App Searches & Lookups; P8 Tool Confidence Data Collection & Calculation. Data inputs: I1 Manual Data Input; I2, I3, I4, I5 Data Input Tools. Reporting: R1 Results Analysis & Report Generation (Automated Reports); R2 Management Report Generation; R3 Performance Reports & Metric Reporting. Other elements: App Store, Agency Apps, Commercial Apps; Agency Evaluation Criteria, Specific Situation Criteria; App Data Repository/Metadata; Multiple SwA Tool Tests; Align Results with Criteria and Order by Risk; Automated Analysis: Pass/Fail; Analyst Review; Analyst Report & Recommendation; Management Adjudication; Go Decision / No Go Decision; App and Process Analytics. Slide 19 of 20