Caml Virtual Machine File & data formats Document version: 1.4 http://cadmium.x9c.fr



Similar documents
Chapter 7D The Java Virtual Machine

WIZnet S2E (Serial-to-Ethernet) Device s Configuration Tool Programming Guide

Application Note. Introduction AN2471/D 3/2003. PC Master Software Communication Protocol Specification

Xbox 360 File Specifications Reference

1 The Java Virtual Machine

The programming language C. sws1 1

Hagenberg Linz Steyr Wels. API Application Programming Interface

USB Card Reader Configuration Utility. User Manual. Draft!

Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI)

Windows 7 Security Event Log Format

Binary Representation

Table 1 below is a complete list of MPTH commands with descriptions. Table 1 : MPTH Commands. Command Name Code Setting Value Description

Number Representation

A JAVA VIRTUAL MACHINE FOR THE ARM PROCESSOR

A White Paper about. MiniSEED for LISS and data compression using Steim1 and Steim2

This section describes how LabVIEW stores data in memory for controls, indicators, wires, and other objects.

Sources: On the Web: Slides will be available on:

[MS-OXTNEF]: Transport Neutral Encapsulation Format (TNEF) Data Algorithm

Pemrograman Dasar. Basic Elements Of Java

Exam 1 Review Questions

Informatica e Sistemi in Tempo Reale

This text refers to the 32bit version of Windows, unfortunately I don't have access to a 64bit development environment.

Java Virtual Machine, JVM

Application Note AN0008. Data Logging Extension. For. Venus 8 GPS Receiver

Introduction to Java

INTERNATIONAL TELECOMMUNICATION UNION

DNA Data and Program Representation. Alexandre David

Nemo 96HD/HD+ MODBUS

Instruction Set Architecture (ISA)

How To Write Portable Programs In C

A Catalogue of the Steiner Triple Systems of Order 19

1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal:

Section 1.4 Place Value Systems of Numeration in Other Bases

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Extensible Storage Engine (ESE) Database File (EDB) format specification

EWF specification. Expert Witness Compression Format specification. By Joachim Metz

C++ Wrapper Library for Firebird Embedded SQL

Smart Card Application Standard Draft

MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b CONTENTS

Everything is Terrible

Useful Number Systems

WAVE PCM soundfile format

Communications Protocol for Akai APC40 Controller

The Answer to the 14 Most Frequently Asked Modbus Questions

Type 2 Tag Operation Specification. Technical Specification T2TOP 1.1 NFC Forum TM NFCForum-TS-Type-2-Tag_

Microsoft SQL Server Connector for Apache Hadoop Version 1.0. User Guide

CTNET Field Protocol Specification November 19, 1997 DRAFT

MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b3 CONTENTS

Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis

Volume Serial Numbers and Format Date/Time Verification

Extensible Storage Engine (ESE) Database File (EDB) format specification

Lecture N -1- PHYS Microcontrollers

DEBT COLLECTION SYSTEM ACCOUNT SUBMISSION FILE

David Cowen Matthew Seyer G-C Partners, LLC

CS61: Systems Programing and Machine Organization

ELEG3924 Microprocessor Ch.7 Programming In C

Tamper protection with Bankgirot HMAC Technical Specification

[MS-RDPESC]: Remote Desktop Protocol: Smart Card Virtual Channel Extension

The Windows Shortcut File Format as reverse-engineered by Jesse Hager Document Version 1.0

CSI 333 Lecture 1 Number Systems

SRF08 Ultra sonic range finder Technical Specification

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

MarshallSoft AES. (Advanced Encryption Standard) Reference Manual

Extracting, Storing And Viewing The Data From Dicom Files

CS101 Lecture 11: Number Systems and Binary Numbers. Aaron Stevens 14 February 2011

EtherNet/IP Modbus XPort, NET232, and NET485

Programming languages C

OPEN MODBUS/TCP SPECIFICATION

GOM 3D File Format. Version 1.1b GOM mbh, July GOM mbh Mittelweg Braunschweig

sqlite driver manual

RFID MODULE Mifare Reader / Writer SL030 User Manual Version 2.6 Nov 2012 StrongLink

Version 4 MAT-File Format

ACR122 NFC Contactless Smart Card Reader

Erasure Codes Made So Simple, You ll Really Like Them

Consult protocol, Nissan Technical egroup, Issue 6

Implementation Aspects of OO-Languages

Numbering Systems. InThisAppendix...

Application Note: AN00141 xcore-xa - Application Development

HOST Embedded System. SLAVE EasyMDB interface. Reference Manual EasyMDB RS232-TTL. 1 Introduction

HP Service Virtualization

Sample EHG CL and EHG SL10 16-bit Modbus RTU Packet

Technical Support Bulletin Nr.18 Modbus Tips

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

Java Interview Questions and Answers

Preface. DirX Document Set

Jonathan Worthington Scarborough Linux User Group

AKD EtherNet/IP Communication

Java Programming. Binnur Kurt Istanbul Technical University Computer Engineering Department. Java Programming. Version 0.0.

Java Crash Course Part I

Computer Science 281 Binary and Hexadecimal Review

Simple Image File Formats

Bluetooth HID Profile

The New IoT Standard: Any App for Any Device Using Any Data Format. Mike Weiner Product Manager, Omega DevCloud KORE Telematics

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Introduction to MIPS Assembly Programming

20 Using Scripts. (Programming without Parts) 20-1

Memory Systems. Static Random Access Memory (SRAM) Cell

Efficient Low-Level Software Development for the i.mx Platform

RS-485 Protocol Manual

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

Transcription:

Caml Virtual Machine File & data formats Document version: 1.4 http://cadmium.x9c.fr Copyright c 2007-2010 Xavier Clerc cadmium@x9c.fr Released under the LGPL version 3 February 6, 2010 Abstract: This document describes the binary formats used by Caml 1 in its 3.11.2 version for both bytecode files and marshalled data. This document is structured in two parts: the first one exposes the format of bytecode files, and the second one exposes the format of marshalled data. Bytecode file format The format of bytecode files is summarized by figure 1. Unused header is commonly os-executable code that looks for ocamlrun executable and launch it on the file. Trailer identifies the file as a caml bytecode file by magic ( Caml1999X008 ) and indicates the number of sections in the file. One should notice that datas and descriptors of sections do not need to be in the same order. All character and string values use the ISO-8859-1 encoding. The remainder of this section lists possible sections with their contents. CODE section (mandatory) contains the bytecode to be executed. Its size must be a multiple of 4, as the code is composed of 4-byte integers (in little-endian representation). These integers are either insctructions bytecodes or instructions arguments. The list of intructions with related arguments is given in another document Caml Virtual Machine Instruction set that can be downloaded at http://cadmium.x9c.fr. DATA section (mandatory) contains the global data for the program, in the format defined in the second part of this document. PRIM section (mandatory) contains a null-character-terminated list of null-characterterminated strings. Each string is the name of a primitive requested for program execution. The order of these strings defines the primitive integer identifiers: the first requested primitive is given the 0 integer identifier, the second one is given the 1 integer identifier, etc. DLLS section contains a null-character-terminated list of null-character-terminated strings. Each string is the name of a linked library requested for program execution. 1 The official Caml website can be reached at caml.inria.fr and contains the full development suite (compiler, tools, virual machine, etc.) as well as links to third-party contributions. 1

File format: unused header data for section 1 data for section N descriptor for section 1 descriptor for section N trailer table of contents actual data Section description format: name four 8-bit chars Trailer format: length one 32-bit integer (unsigned) # of sections magic one 32-bit integer (unsigned) twelve 8-bit chars Figure 1: File format. DLPT section contains a null-character-terminated list of null-character-terminated strings. Each string is a path for linked library search. DBUG section contains an unsigned 32-bit integer indicating the number of debug elements. Elements follow, each being a couple containing an offset (as an unsigned 32-bit integer) and a marshalled value representing an Instruct.debug-event instance. Marshalled data format The format of marshalled values is summarized by figure 2. The following paragraphs give some precisions about particular data formats. Integer values are stored in big-endian format. They encode values of the int type (int32, int64 and nativeint types are coded as custom values). String values are stored using ISO-8859-1 encoding. Float values are stored using IEEE 754 encoding. According to code, values may be either big-endian (0x0B, 0x0D, and 0x0F codes) or little-endian (0x0C, 0x0E, and 0x07 codes). Code offset values (0x10) consist in the offset of a code address, relative to code start. Block values are serialized as shown in figure 3. For atoms, no additional data needs to be stored as the tag is given by the header. Other s are stored by serializing their fields in ascending order, the size (number of s) being given by the header. Color is used by garbage collector and is set to zero in serialized data. 2

code 0x00 value signed 8-bit integer integers 0x01 0x02 signed 16-bit integer signed 32-bit integer 0x03 signed 64-bit integer supported only on 64-bit architectures shared elements 0x04 0x05 0x06 unsigned 8-bit offset unsigned 16-bit offset unsigned 32-bit offset s 0x08 0x13 32-bit header (unsigned) 64-bit header (signed) supported only on 64-bit architectures strings 0x09 0x0A unsigned 8-bit length unsigned 32-bit length 0x0B 0x0C 64-bit float (IEEE 754 encoding) floats 0x0D 0x0E unsigned 8-bit length sequence of 64-bit floats (IEEE 754 encoding) 0x07 0x0F unsigned 32-bit length sequence of 64-bit floats (IEEE 754 encoding) miscellaneous 0x10 0x11 0x12 unsigned 32-bit offset 16-byte checksum unsigned 32-bit offset closure 0-terminated string (8-bit ISO-8859-1 characters) custom data Figure 2: Data format. 3

32-bit header: size (color) tag 31 10 9 8 7 0 64-bit header: size (color) tag 63 10 9 8 7 0 Block content: size = 0 size > 0 nothing atom index is given by tag field 0 field 1 field size - 2 field size - 1 Figure 3: Block values. Custom values are stored in two parts: the first one is the custom identifier (as a null-characterterminated string, using ISO-8859-1 encoding), the second one is custom-specific data. Shared values are references to elements already (de)serialized of the current value. The offset defines this reference, zero pointing to the last read object, one pointing to the preceding object, etc. Small elements are used to shorten value representation. They are stored using the specific encodig depicted in figure 4. A code from 0x20 to 0x3F indicates a small string value, a code from 0x40 to 0x7F indicates a small int value, and a code from 0x80 to 0xFF indicates a small value. 4

code 0x20 0x3F value length = code & 0x1F 0x40 0x7F value = code & 0x3F 0x80 0xFF size = (code >> 4) & 0x07 tag = code & 0x0F Figure 4: Small elements. 5