How to be a CSI (encoding Crime Scene Investigator)




Tex Texin, Internationalization Architect, Yahoo Inc.
30th Internationalization and Unicode Conference

Objectives for Crime Scene Investigation
- Have some fun
- Prevent death by bullet points
- Introduce strategies for analyzing character corruption
- Compare TV show and software investigation of mutilated character corpses
- The latest version of the presentation is at www.i18nguy.com/how-o-be-a-csi.pdf

Our CSI Heroes: TV CSI vs. Software CSI
- Gil Grissom, Catherine Willows, Nick Stokes, Sara Sidle, Greg Sanders, Warrick Brown, Dr. Al Robbins, Capt. Brass

[Slide: Our Software CSI Heroes]

Crime Discovery on TV
- First steps
- Close off crime scene and gather clues
- Take lots of odd pictures
- Collect potential forensic evidence
- Identify the victim(s) (or what's left of them)
- Interview witnesses: They lie! They presume! They are prejudiced!
- Put down either the jaded or newbie cop: "If it's suicide then how do you explain..."

Crime Discovery in Software
- A woman calls 911 yelling: "There has been a Mojibake! Someone committed Mojibake!" *
- Characters have died. Decomposition has already set in. Who did it?
- * Mojibake = Japanese for garbage characters.

Crime Discovery in Software
- Characters corrupted: wrong characters, white or black boxes
- Collect forensic evidence: screen captures, keystrokes, environment
- Witnesses/Users: "It happens when I..." "The entire application is busted." They lie, they presume, they're prejudiced.
- The newbie or jaded developer: "That vendor always does that." "Our code is fine."

Collecting Images
- Often included in bug reports
- Better than nothing: you can detect patterns
- But often not helpful
- Better to collect:
  - Expected bytes/chars
  - Actual bytes/chars
  - Fonts referenced, available
  - Expected/detected encoding
  - Keystrokes
  - A way to reproduce
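
The "better to collect" list above asks for expected and actual bytes/chars rather than screenshots. A minimal sketch of how a bug reporter might capture that in Python follows. The function name `evidence` and the choice of UTF-8 as the default encoding are assumptions for illustration, not part of the original slides.

```python
import unicodedata


def evidence(label: str, text: str, encoding: str = "utf-8") -> str:
    """Describe a string as code points, character names, and encoded bytes.

    `label` is a free-form tag such as "expected" or "actual" (hypothetical).
    """
    lines = [f"{label}: {text!r}"]
    for ch in text:
        # unicodedata.name() raises for unnamed characters unless a default is given.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"  U+{ord(ch):04X} {name}")
    # backslashreplace keeps unencodable characters visible instead of failing.
    raw = text.encode(encoding, errors="backslashreplace")
    lines.append(f"  bytes ({encoding}): {raw.hex(' ').upper()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(evidence("expected", "ç"))
    print(evidence("actual", "Ã§"))
```

Attaching both reports to a bug gives the investigator code points and bytes to compare, which a screen capture alone cannot provide.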

Example Case History
- Developers refused to debug based on an urban legend: "Crime happens. Can't catch these guys." "It's Microsoft IE's fault. Our software is fine."
- Instead of:
    <meta http-equiv=Content-Type content="text/html; charset=utf-8">
- The code said:
    <meta http-equiv=Content-Type content="text/html; charset=utf8">
- (Note the missing hyphen in "utf8".)

Analysis on TV
- Autopsy: more data collection
- Possibilities: What could have happened?
- Conjecture: How could it have happened?
- Reconstruction: Does it fit the data?
- Research: Does other data fit the theory?
  - Always ask the most narrowing question last
- Experiments
  - Measurements and proof of concept
  - Decay rates under abnormal circumstances

Analysis on TV
- Anatomy and decay science
- Autopsy
- Typical computer queries
  - How many trucks on the road?
  - How many on route 1?
  - How many on route 1 at 3pm?
  - How many stopped at Joe's bar?
- Trajectory reconstruction
- Experiments with controls

Analysis in Software
- Core dumps, architecture and debugging
- Typical computer queries
  - Names of encodings
  - Relations between encodings
  - Variations of encodings
  - Quality of encoding conversions
- Data flow analysis and points of conversion
- Known data injection
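
The case history above turned on a single missing hyphen in a charset label. A minimal sketch of a check that would have flagged it is shown below. The allowlist `KNOWN_LABELS`, the regular expression, and the function name are all assumptions made for this example; a real project would take its list of acceptable labels from whatever its target browsers and tools document.

```python
import re

# Hypothetical allowlist: the labels this sketch treats as safe to emit.
KNOWN_LABELS = {
    "utf-8", "iso-8859-1", "windows-1252",
    "shift_jis", "euc-kr", "gb2312", "big5",
}

# Matches charset=... inside a meta tag or a Content-Type header value.
CHARSET_PATTERN = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:\-]+)", re.IGNORECASE)


def check_charset_label(markup: str):
    """Return (label, is_known); label is None when no charset is declared."""
    match = CHARSET_PATTERN.search(markup)
    if match is None:
        return None, False
    label = match.group(1)
    return label, label.lower() in KNOWN_LABELS


if __name__ == "__main__":
    good = '<meta http-equiv=Content-Type content="text/html; charset=utf-8">'
    bad = '<meta http-equiv=Content-Type content="text/html; charset=utf8">'
    print(check_charset_label(good))
    print(check_charset_label(bad))
```

The check is deliberately strict: some consumers accept `utf8` as an alias, but the point of the case history is that not every consumer does.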

Analysis in Software
- Architecture: The Web is complex
  - Clients, browsers mix technologies (DOM, Java, CSS, PHP, ActiveX, applets, etc.), *ML standard and non-standard behaviors, OS dependencies, etc.
  - Servers, protocols are also mixed
- Language, locale, encoding are ambiguous
- Negotiation tactics are unreliable
  - But must be understood and kept up-to-date
  - See Web Internationalization Tutorial, et al.

Character Encoding Negotiation
[Diagram: a Unix user and a Windows user exchanging HTML, CSS, and JavaScript through the DOM. Labels: HTML Encoding, CSS Encoding, JavaScript Encoding, DOM Encoding, GB2312, 1252.]

Architecture
- Data flow and points of (mis)conversion
- Interfaces are common failure points
  - Between legacy, new and 3rd party software
  - API, web services, protocols, devices
  - Data sources, sinks
- Types of misconversion
  - Missing conversion: Source -> Source
  - Incorrect conversion: Source -> X
  - Backwards conversion: Target -> Source
  - Double conversion: (Source -> Target) -> Target

Architecture
- Other contributors to conversion errors
  - Missing or incorrect encoding identifier
  - False detection, e.g. ASCII vs UTF-7
    - Too little data to detect
    - Text not like corpus used for detection statistics
  - Encoding name variations
- "I didn't know it was a crime. I just did what I was told!"
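
The four types of misconversion listed above can be reproduced on a single character. The sketch below assumes, purely for illustration, that the source encoding is ISO 8859-1 and the target is UTF-8, and it uses cp1251 as an arbitrary wrong encoding for the "incorrect" case. The helper names `convert` and `as_seen` are hypothetical.

```python
SOURCE = "iso-8859-1"  # encoding the data actually arrives in (assumed)
TARGET = "utf-8"       # encoding the consumer expects (assumed)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode from src and re-encode as dst, replacing anything unmappable."""
    return data.decode(src, errors="replace").encode(dst, errors="replace")


def as_seen(data: bytes) -> str:
    """What a consumer that assumes TARGET would display for these bytes."""
    return data.decode(TARGET, errors="replace")


original = "ç".encode(SOURCE)  # the single byte E7

results = {
    "correct": convert(original, SOURCE, TARGET),
    # Missing conversion: Source -> Source, bytes passed through untouched.
    "missing": original,
    # Incorrect conversion: Source -> X, decoded with the wrong encoding.
    "incorrect": convert(original, "cp1251", TARGET),
    # Backwards conversion: the Target -> Source converter applied to source data.
    "backwards": convert(original, TARGET, SOURCE),
    # Double conversion: (Source -> Target) -> Target.
    "double": convert(convert(original, SOURCE, TARGET), SOURCE, TARGET),
}

if __name__ == "__main__":
    for kind, data in results.items():
        print(f"{kind:10} {data.hex(' ').upper():12} {as_seen(data)!r}")
```

Each failure leaves a different byte pattern behind, which is what makes the later pattern-matching approach workable.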

Encoding Name Variations
- E.g. ISO 8859-1 vs Windows-1252
- 80-9F: control codes in ISO 8859-1

Encoding Name Variations
- E.g. ISO 8859-1 vs Windows-1252
- 80-9F: 1252-specific characters

Encoding Naming Variations
- Several variations of big-5, shift-jis, etc.
  - E.g. Microsoft, Apple added chars to shift-jis
  - Then there is big5-hkscs (Hong Kong Supplementary Character Set)
- Application dependent names
  - Windows IE expects KSC5601 or Korean, not cp949, windows-949
  - Firefox would expect euc-kr
- Sometimes the error is font, not encoding

Encoding Name Variations
- Poor naming conventions in the industry
- Application dependent
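
The ISO 8859-1 vs Windows-1252 difference above can be listed directly. The following sketch uses Python's built-in codecs, so the results reflect Python's mapping tables, which may differ in detail from those of other converters. The function name `c1_differences` is an assumption for this example.

```python
import codecs


def c1_differences():
    """Map each byte in 0x80-0x9F to its (ISO 8859-1, Windows-1252) decoding.

    Bytes that Python's cp1252 table leaves undefined are reported as None.
    """
    table = {}
    for value in range(0x80, 0xA0):
        raw = bytes([value])
        latin1_char = raw.decode("iso-8859-1")  # always a C1 control code
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None
        table[value] = (latin1_char, cp1252_char)
    return table


if __name__ == "__main__":
    # One codec, many labels: Python resolves aliases to a canonical name.
    for label in ("utf8", "UTF-8", "latin1", "ISO-8859-1", "windows-1252"):
        print(f"{label:14} -> {codecs.lookup(label).name}")
    for value, (latin1_char, cp1252_char) in c1_differences().items():
        print(f"0x{value:02X}  U+{ord(latin1_char):04X}  {cp1252_char!r}")
```

Alias resolution is specific to each library, so a label that Python accepts is not evidence that a browser or database accepts it.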

Encoding Conversions
- Correct conversion is not always obvious:
  - 0x5C: is it a file separator \ or a yen sign? (or Won, or other currency)
  - Half-width or full-width?
- Sometimes you need different converters

Missing Character Conversions
- A character with no mapping in the target converts to ? without an error code.

Encoding Conversions
- The family secret: versioning
- Unicode is an evolving standard
- As characters are added, the opportunity for improved conversions exists
- Which Unicode version is the converter for?
- If I convert data today, how will it compare with the same data converted 3 years ago?
- How long after death before rigor mortis sets in, and when do the maggots come?

Breaking Multibyte Characters
- By not treating all bytes as one character
  - Don't insert bytes in the middle
  - Don't delete 1 byte of a multi-byte char
  - Be careful with block boundaries
- Don't treat individual bytes as characters
  - E.g. uppercase a trailing byte
  - E.g. treat trailing byte 0x5C as \ in file pathnames
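
Two of the warnings above, block boundaries and the 0x5C trailing byte, are easy to demonstrate. The sketch below is illustrative only: the function names, the block size, and the sample file name are assumptions, and the byte values come from Python's own `shift_jis` codec.

```python
import codecs


def naive_decode_in_blocks(data: bytes, encoding: str, block_size: int) -> str:
    """Decode each block on its own: characters that straddle a boundary break."""
    blocks = (data[i:i + block_size] for i in range(0, len(data), block_size))
    return "".join(block.decode(encoding, errors="replace") for block in blocks)


def decode_in_blocks(data: bytes, encoding: str, block_size: int) -> str:
    """Decode block by block while carrying partial characters across boundaries."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for i in range(0, len(data), block_size):
        parts.append(decoder.decode(data[i:i + block_size]))
    parts.append(decoder.decode(b"", final=True))  # flush and detect truncation
    return "".join(parts)


if __name__ == "__main__":
    text = "日本語"
    data = text.encode("utf-8")  # 9 bytes, 3 per character
    print(naive_decode_in_blocks(data, "utf-8", 4))
    print(decode_in_blocks(data, "utf-8", 4))

    # The trail byte of this Shift-JIS character is 0x5C, the ASCII backslash.
    name = "表.txt".encode("shift_jis")
    print(name.hex(" ").upper(), b"\\" in name)
```

Any code that scans the encoded bytes of that file name for a path separator will find one that is not there, which is why the slide says to work on characters rather than bytes.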

Making Mojibake
Splitting multi-byte characters:
  Characters:  日      本      語
  Bytes:       93 FA   96 7B   8C EA
  Byte type:   L  T    L  T    L  T
Inserting "a" (61) as the second byte:
  Bytes:       93 61   FA 96   7B   8C EA
  Byte type:   L  T    L  T    S    L  T
  Characters:  殿      坙      {    語
Byte types: L = Lead Byte, T = Trail Byte

Encoding Pathology and Forensics
- Piecing together who did what to whom
- Knowing what was expected and what resulted, and comparing with typical patterns of failure, we can deduce what occurred
- Do all characters fail or certain ones?
- How do they fail? Wrong characters, question marks, black boxes, etc.

Encoding Forensics and Pathology
- Question marks
  - Conversion to an encoding that doesn't have the characters; they are replaced by "?"
- All non-ASCII characters are mojibake
  - Missing, wrong, backwards, or 2x conversion
  - Expected: ç. Actual: Ã§
  - Expected 0xE7 (ç). Actual: 0xC3 0xA7 (Ã§)
  - Note that U+00E7 (ç) in UTF-8 is 0xC3 0xA7
  - Conclusion: UTF-8 bytes displayed as ISO 8859-1

Encoding Forensics and Pathology
- Only certain characters misconverted
  - Variant encoding, out of date conversion, data flow, incomplete font
  - E.g. only euro, trademark, smart quotes, OE ligature, ...
  - Used iso-8859-1 to utf-8 conversion instead of windows-1252 to utf-8
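
The three failure signatures above (question marks, all non-ASCII characters wrong, only certain characters wrong) can each be reproduced in a few lines. This is a sketch under stated assumptions: the helper name `displayed_as` and the sample strings are invented for illustration, and the results reflect Python's codec tables.

```python
def displayed_as(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way and decode it another, as a mislabeled pipeline would."""
    return text.encode(written_as).decode(read_as, errors="replace")


# All non-ASCII characters are mojibake: UTF-8 bytes displayed as ISO 8859-1.
seen = displayed_as("ç", "utf-8", "iso-8859-1")
# Reversing the mistake recovers the text, which confirms the diagnosis.
repaired = seen.encode("iso-8859-1").decode("utf-8")

# Question marks: the target encoding has no such characters.
lossy = "日本語 ok".encode("iso-8859-1", errors="replace")

# Only certain characters fail: Windows-1252 data read as ISO 8859-1.
sample = "€5 “café” ™"
wrong = displayed_as(sample, "cp1252", "iso-8859-1")
damaged = [ch for ch, got in zip(sample, wrong) if ch != got]

if __name__ == "__main__":
    print(repr(seen), repr(repaired))
    print(lossy)
    print(damaged)
```

In the last case the accented letter survives because both encodings agree on it; only the characters in the 80-9F range are damaged, which points at the variant encoding.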

Encoding Error Patterns
- Comparing conversion error patterns is like identifying the gun that fired a bullet by the shared striation marks
  - It's encoding DNA
- Often there are obvious candidate patterns
  - Round up all the usual suspects
- A stoolie (er, tool) for testing conversions
  - Given an input string, try the usual and expected error conversions
  - See if any of the results match
- Other tools: encoding validator, byte to %hh

Encoding Research
- How many trucks stopped at Joe's bar?
- Look for variants you might be incurring
  - www.iana.org/assignments/character-sets
- Conversions supported
  - See iconv, ICU, or the documentation for your converter
- Check for font versions/updates

Data Injection
- Testing with live data feeds is like being on a stakeout: hoping the criminal will come by while you wait
- Create a pseudo-feed or simulate data entry for controlled experiments
  - Known, repeatable values simplify debugging
  - Less threatening for debugging foreign languages
- Inject directly into subsystems to eliminate other components as suspects

Setting Traps for Criminals
- Use validation routines at key points of data acceptance or conversion
- Especially useful for UTF-8, which has illegal byte patterns
- Reference: www.w3.org/international/questions/qaforms-utf-8.html
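
The "stoolie" described above, which tries the usual error conversions and reports any that match, might look like the sketch below. It is not the author's tool. The suspect list, the function names, and the decision to test single and double misreadings are assumptions, and a UTF-8 validator and a byte-to-%hh helper are included because the slides mention them as companion tools.

```python
from itertools import permutations
from urllib.parse import quote_from_bytes

# The usual suspects; extend with whatever encodings the data flow touches.
SUSPECTS = ["utf-8", "iso-8859-1", "cp1252", "shift_jis", "euc-kr"]


def find_culprits(expected: str, actual: str):
    """Return (written_as, read_as, passes) triples that turn expected into actual."""
    hits = []
    for written_as, read_as in permutations(SUSPECTS, 2):
        text = expected
        for passes in (1, 2):  # 2 passes models a double conversion
            try:
                text = text.encode(written_as).decode(read_as)
            except UnicodeError:
                break  # this pair cannot produce the observed text
            if text == actual:
                hits.append((written_as, read_as, passes))
                break
    return hits


def is_valid_utf8(data: bytes) -> bool:
    """Validation trap: reject byte sequences that are illegal in UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def to_percent_hex(data: bytes) -> str:
    """Show every byte as %hh so that evidence survives copy and paste."""
    return quote_from_bytes(data, safe="")


if __name__ == "__main__":
    print(find_culprits("ç", "Ã§"))
    print(is_valid_utf8(b"\xc3\xa7"), is_valid_utf8(b"\xe7"))
    print(to_percent_hex("ç".encode("utf-8")))
```

More than one suspect can match the same evidence, as with ISO 8859-1 and Windows-1252 here, so a hit narrows the investigation rather than closing it.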

Conclusions: How to be a CSI
- Gather forensic evidence
- Be objective, be thorough
- Understand the architecture, the data flow, the points of conversion
- Identify the potential patterns of failure

Thank You! Questions?
"Concentrate on what cannot lie. The evidence." -Gil Grissom

Note: Images from the CSI TV show and those of CSI cast members are courtesy of CBS Broadcasting Inc. All Rights Reserved, and/or the actual copyright owners. The owner and copyright for these photos were generally not documented on the sites from which I downloaded them. If notified I will update this file to include the correct copyright.