Writing an 8086 emulator in Python
Cesare Di Mauro
PyCon 2015, Florence, April 2015

The geek experience

Writing your own o.s.: a few steps for a minimal o.s. example:
- write an 8086 boot loader (MBR for floppy)
- take control of the hardware (clear/set interrupts, etc.)
- write some text on the screen from the main code
- loop forever (or halt the execution)

The result

Credits to Ben Barbour's article: http://www.benbarbour.com/write-your-own-hello-world-bootloaderos/

What's next

The geek's dream!

Credits to Wikipedia: http://en.wikipedia.org/wiki/hal_9000

What's really next: some graphics!

A mode 13h (320x200, 256 colors) example

Credits to Wikipedia: http://en.wikipedia.org/wiki/mode_13h

Beyond 8086: Protected Mode

The old 8086 (Real Mode) was very limited:
- 16-bit code only
- 1MB, segmented address space (64KB segments)
- no paging (MMU) & virtualization

The Protected Mode offers:
- 16/32-bit code or 32/64-bit (with Long Mode)
- 4GB or 256TB (virtual) linear address space
- Paging & virtualization

Using BIOS services: no chance!

Many of them don't work in Protected Mode!

Possible solutions:
- Switch back to 8086 (Real) mode
- Use an 8086 Virtual Monitor (vm8086)

Many drawbacks:
- Interrupts are served by Real Mode code, or disabled!
- Some 8086 BIOS calls can switch to Protected Mode
- Some 8086 BIOS calls can directly use the hardware
- No vm8086 in Long Mode

Best compromise: 8086 emulator

Pros:
- Works in Protected Mode, Long Mode, and even on different architectures (ARM, MIPS, PowerPC, etc.)
- Simple routing of hardware accesses (port I/O)
- Simple routing of interrupt disable/enable requests
- Perfect sandboxing (full control of the emulator)

Cons:
- Slow (not the most important thing; optimizations are possible)
- A lot of work (writing and testing)

Why another 8086 emulator?

- Existing emulators can be difficult to adapt
- Licensing issues (GPL = viral)
- Neither weak nor perfect emulation needed: good enough!
- Easy to read, maintain, modify/experiment with
- Reasonable speed for common cases (make them fast!)
- Fun!

Planning

- 8086 emulator prototype + unit tests (in Python)
- Simple/minimal PC emulator (in Python too)
- Final C version (Windows DLL for testing)
- Integration into a hobby o.s. (AROS)

AROS: http://www.aros.org/

Why AROS?

- A lightweight and small o.s.
- Lets you easily experiment with ideas
- Derived from/inspired by the Amiga o.s.
- Passion!
- Fun!

Credits to Eric W. Schwartz: http://aros.sourceforge.net/downloads/kitty/

What AROS needs

- Drivers. Drivers. Have I said drivers?
- Difficult port: the Amiga o.s. APIs/ABI differ from Windows or Unix/POSIX
- Primary need: graphics drivers. Few cards are supported
- Primary fallback for graphics drivers: handling VESA modes

VESA mode (not modes!)

Problems:
- Only one VESA mode is selectable at boot (from GRUB)
- Changing the VESA mode requires rebooting the entire o.s.

Solution: calling the VESA BIOS APIs (INT 10h) lets you:
- List available screen modes
- Change the current mode
- Set/get palette colors
- Set the screen display position inside the framebuffer (virtual screens)
- More

How the trick works

- Boot time: the AROS driver initializes the 8086 emulator
- The AROS driver calls the emulator's INT 10h (VESA BIOS services)
- The emulator calls the AROS driver's callbacks when needed
- The AROS driver gets the results from the emulator call

AROS: http://www.aros.org/

Emulator overview

- Some APIs are exposed to set the emulator status & callbacks
- A couple of APIs run/stop the execution
- Execution might hang by design!
- External events & emulation status are controlled by the caller
- It's an emulator, not a full PC: no hardware is emulated!

The 8086 architecture

General purpose registers:
- AX (AH, AL)
- BX (BH, BL)
- CX (CH, CL)
- DX (DH, DL)

Index registers:
- SI: Source Index
- DI: Destination Index
- SP: Stack Pointer
- BP: Base Pointer

Segment registers:
- CS: Code Segment
- DS: Data Segment
- SS: Stack Segment
- ES: Extra Segment

Program counter:
- IP

Status register:
- Flags

All registers are 16-bit.

Registers representation

Array with 16 16-bit values:

    Index  Register
     0     AX
     1     CX
     2     DX
     3     BX
     4     SP
     5     BP
     6     SI
     7     DI
     8     (INTERNAL) ES
     9     (INTERNAL) CS
    10     (INTERNAL) SS
    11     (INTERNAL) DS
    12     *NOT USED*
    13     *SCRATCH PAD*
    14     (INTERNAL) IP
    15     (INTERNAL) FLAGS

Registers definition

    # General purpose registers
    AX, CX, DX, BX, SP, BP, SI, DI = xrange(8)

    # Segment registers
    # Cannot be written normally! Use proper write_segment function
    INTERNAL_ES, INTERNAL_CS, INTERNAL_SS, INTERNAL_DS = xrange(8, 12)

    # Special registers
    # Cannot be read or written normally!
    # Use proper read/write_ip or read/write_flags functions
    INTERNAL_TEMP_REG, INTERNAL_IP, INTERNAL_FLAGS = xrange(13, 16)

Registers class

    class Registers(object):
        def __init__(self):
            # 16 registers, 2 bytes each
            self._pointer = Pointer(16 * 2)

        def __getitem__(self, index):
            return internal_read_word(self._pointer, index * 2)

        def __setitem__(self, index, value):
            internal_write_word(self._pointer, index * 2, value)

        def __len__(self):
            return 16

        def __add__(self, other):
            # Pointer to the register with index `other`
            return self._pointer + other * 2

Accessing registers like in C

    registers = Registers()  # The unique/global registers data structure

    def registers_as_bytes_pointer():
        return registers + 0

    def pointer_to_byte_register(reg):
        return registers_as_bytes_pointer() + register_byte_index_to_offset[reg]

    # AL, CL, DL, BL, AH, CH, DH, BH
    register_byte_index_to_offset = 0, 2, 4, 6, 1, 3, 5, 7

    def inc_reg(reg):
        inc_operand_16(registers + reg)

    registers[AX]

Registers public interface

    def read_register(reg):
        return registers[reg]

    def write_register(reg, value):
        # The 0xffff masking can be avoided in C
        registers[reg] = value & 0xffff

    def read_byte_register(reg):
        return pointer_to_byte_register(reg)[0]

    def write_byte_register(reg, value):
        # The 0xff masking can be avoided in C
        pointer_to_byte_register(reg)[0] = value & 0xff

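The byte-register offsets (0, 2, 4, 6, 1, 3, 5, 7) follow from storing each 16-bit register little-endian: the low byte (AL) sits at the even offset and the high byte (AH) right after it. The sketch below illustrates that aliasing on a plain bytearray. It is written for Python 3, and the names regs, BYTE_OFFSET, read8, write8, read16 and write16 are illustrative assumptions, not the talk's own code.

```python
# Illustrative sketch: names below are assumptions, not the talk's API.
AX, CX, DX, BX = range(4)
AL, CL, DL, BL, AH, CH, DH, BH = range(8)

# Byte offset of each 8-bit register inside the little-endian register file.
BYTE_OFFSET = (0, 2, 4, 6, 1, 3, 5, 7)

regs = bytearray(16 * 2)  # 16 slots of 16 bits each


def write16(reg, value):
    """Store a 16-bit value, low byte first."""
    value &= 0xFFFF
    regs[reg * 2] = value & 0xFF
    regs[reg * 2 + 1] = value >> 8


def read16(reg):
    """Load a 16-bit value stored low byte first."""
    return regs[reg * 2] | (regs[reg * 2 + 1] << 8)


def read8(reg):
    return regs[BYTE_OFFSET[reg]]


def write8(reg, value):
    regs[BYTE_OFFSET[reg]] = value & 0xFF


write16(AX, 0x1234)
write8(AH, 0xAB)  # changes only the high byte of AX
print(hex(read16(AX)), hex(read8(AL)))
```

Writing AH changes the upper half of AX without touching AL, which is the behavior the offset table is meant to provide.
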
The 8086 memory model

Physical address = Segment * 16 + Offset

Credits to Brock University: http://www.cosc.brocku.ca/~bockusd/3p92/local_pages/8086_achitecture.htm

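As a worked example of the formula above, here is a small sketch in Python 3. The function name physical_address and its wrap parameter are assumptions made for illustration. A real 8086 has 20 address lines, so addresses wrap at 1MB; without wrapping, the highest reachable address is 0xFFFF * 16 + 0xFFFF = 0x10FFEF.

```python
def physical_address(segment, offset, wrap=False):
    """Real-mode address: segment * 16 + offset.

    wrap=True mimics the 20 address lines of a real 8086 (wrap at 1MB);
    wrap=False treats the result as a plain linear address.
    """
    address = (segment << 4) + offset
    return address & 0xFFFFF if wrap else address


print(hex(physical_address(0xFFFF, 0x0000)))  # address of the first instruction after reset
print(hex(physical_address(0xB800, 0x0000)))  # color text-mode video memory
# Different segment:offset pairs can name the same byte.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))
```
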
Segments public interface

    def read_segment(segment):
        return read_register(segment + INTERNAL_ES)

    def write_segment(segment, value):
        write_register(segment + INTERNAL_ES, value)
        cache_segment(segment)

... and private!

    def cache_segment(segment):
        segments_addresses[segment] = memory + \
            read_register(segment + INTERNAL_ES) * 16

Pointer class, part 1

    class Pointer(object):
        def __init__(self, size=0, buffer=None, position=0):
            self._buffer = buffer or bytearray(size)
            self._position = position

        def __getitem__(self, address):
            return self._buffer[self._position + address]

        def __setitem__(self, address, value):
            self._buffer[self._position + address] = value

        def __add__(self, other):
            return Pointer(buffer=self._buffer,
                           position=self._position + other)

        def __sub__(self, other):
            if isinstance(other, Pointer):
                # Pointer - Pointer gives a distance in bytes
                return self._position - other._position
            else:
                return Pointer(buffer=self._buffer,
                               position=self._position - other)

Pointer class, part 2

    class Pointer(object):
        # [...]

        def __iadd__(self, other):
            self._position += other
            return self

        def __isub__(self, other):
            self._position -= other
            return self

        def __int__(self):
            return self._position

Memory data structures

    # 1MB + 128KB to protect the upper memory access
    memory = Pointer((1024 + 128) * 1024)

    # 64KB I/O space + 2 bytes to protect the upper I/O access
    ports = Pointer(64 * 1024 + 2)

    # Caches the linear address for every segment
    # The current instruction is cached as segment #6 == IP
    segments_addresses = [memory + 0 for i in xrange(8)]

Memory public interface

    def fill_memory(start_address, length, value):
        for address in xrange(start_address, start_address + length):
            memory[address] = value

    def read_byte(address):
        return memory[address]

    def write_byte(address, value):
        memory[address] = value

    def read_word(address):
        return internal_read_word(memory, address)

    def write_word(address, value):
        internal_write_word(memory, address, value)

Accessing words (16-bit data)

    def internal_read_word(pointer, address):
        # WARNING: pointer + address are treated as a linear address,
        # so it can cross the 64KB segment limit, like 80286+
        return pointer[address] + (pointer[address + 1] << 8)

    def internal_write_word(pointer, address, value):
        # The 0xff masking can be avoided in C
        pointer[address] = value & 0xff
        # WARNING: 64KB segment cross
        # The 0xff masking can be avoided in C
        pointer[address + 1] = (value >> 8) & 0xff

Resetting the emulator

CS=0xffff, IP=0x0000 -> first instruction at 0xffff0

    def reset_8086():
        for i in xrange(len(registers)):
            write_register(i, 0)
        write_register(INTERNAL_CS, 0xffff)
        write_register(INTERNAL_FLAGS, 0xf002)
        for i in xrange(4):
            cache_segment(i)  # For all four segments. See below
        segments_addresses[IP] = memory + read_register(INTERNAL_CS) * 16

    ES, CS, SS, DS = xrange(4)
    TEMP_REG, IP, FLAGS = xrange(5, 8)

8086 Instructions

Instructions may be preceded by one or more prefixes:
- LOCK
- Data segment override (ES:, CS:, SS:, DS:)
- String repeat (REP/REPE, REPNE)
- WAIT (for FPU instructions)

Instruction decoding & execution

- By design, LOCK and WAIT prefixes are ignored (NOPs)
- Segment and string prefixes must be stored
- Prefixes must be cleared after execution
- Opcodes are grouped to simplify decoding and execution
- MOV to the SS register should be atomic with the next instruction

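The prefix rules above can be sketched as a small pre-decoder that consumes prefix bytes, remembers the segment override and the repeat kind, and skips LOCK and WAIT. This is only one possible arrangement, written for Python 3. The function split_prefixes and the table names are hypothetical; the slides do not show how the talk's emulator stores prefixes. The byte values are the standard 8086 encodings.

```python
# Hypothetical helper; the 8086 prefix encodings themselves are standard.
SEGMENT_PREFIXES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}
REPEAT_PREFIXES = {0xF2: "REPNE", 0xF3: "REP"}
IGNORED_PREFIXES = {0xF0, 0x9B}  # LOCK and WAIT, treated as NOPs


def split_prefixes(code, position=0):
    """Return (segment_override, repeat, opcode_position) for one instruction."""
    segment = None
    repeat = None
    while True:
        byte = code[position]
        if byte in SEGMENT_PREFIXES:
            segment = SEGMENT_PREFIXES[byte]
        elif byte in REPEAT_PREFIXES:
            repeat = REPEAT_PREFIXES[byte]
        elif byte not in IGNORED_PREFIXES:
            # First non-prefix byte: this is the opcode.
            return segment, repeat, position
        position += 1


# F3 26 A4 encodes REP ES: MOVSB
print(split_prefixes(bytes([0xF3, 0x26, 0xA4])))
```

Because the function returns fresh values for every instruction, the stored prefixes are implicitly cleared once the instruction has executed.
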
The main loop!

    def run_8086():
        global running, segment_override, rep_prefix
        running = True
        segment_override = NO_SEGMENT_OVERRIDE
        rep_prefix = NO_REPEAT
        while running:
            macro_opcode, parameter = split_opcode[get_byte()]
            macro_opcode_execute[macro_opcode](parameter)

    NO_SEGMENT_OVERRIDE = 0
    SEGMENT_OVERRIDE_ENABLED = 8
    NO_REPEAT, REPEAT_ZERO, REPEAT_NOT_ZERO = xrange(3)

(Macro)Grouping opcodes

    split_opcode = (
        # 0x
        (BINARY_MEM_REG_8, ADD), (BINARY_MEM_REG_16, ADD),
        (BINARY_REG_MEM_8, ADD), (BINARY_REG_MEM_16, ADD),
        (BINARY_AL_IMM_8, ADD), (BINARY_AX_IMM_16, ADD),
        (PUSH_REG, INTERNAL_ES), (POP_SEG, ES),
        (BINARY_MEM_REG_8, OR), (BINARY_MEM_REG_16, OR),
        (BINARY_REG_MEM_8, OR), (BINARY_REG_MEM_16, OR),
        (BINARY_AL_IMM_8, OR), (BINARY_AX_IMM_16, OR),
        (PUSH_REG, INTERNAL_CS), (POP_REG, INTERNAL_CS),
        # 1x
        (BINARY_MEM_REG_8, ADC), (BINARY_MEM_REG_16, ADC),
        # [...]
        # 9x
        (XCHG_REG, AX),  # NOP!
        (XCHG_REG, CX), (XCHG_REG, DX), (XCHG_REG, BX),
        (XCHG_REG, SP), (XCHG_REG, BP), (XCHG_REG, SI), (XCHG_REG, DI),
        (INSTRUCTION, CBW), (INSTRUCTION, CWD),
        (INSTRUCTION, CALL_FAR_IMM16_IMM16), (INSTRUCTION, WAIT),
        (INSTRUCTION, PUSHF), (INSTRUCTION, POPF),
        (INSTRUCTION, SAHF), (INSTRUCTION, LAHF),
        # Ax
        (INSTRUCTION_IMM_16, MOV_AL_FROM_DIRECT),
        (INSTRUCTION_IMM_16, MOV_AX_FROM_DIRECT),
        # [...]
    )

Executing a (macro)instruction

    def add_16bit(source, target, result):
        global flags_first_operand, flags_second_operand, \
            flags_result, flags_operation
        flags_first_operand = source
        flags_second_operand = target
        flags_result = source + target
        write_word_to_location(result, flags_result)
        flags_operation = FLAGS_ADD16

    binary_16_execute = (
        add_16bit, or_16bit, adc_16bit, sbb_16bit, and_16bit,
        sub_16bit, xor_16bit, cmp_16bit, mov_16bit, test_16bit,
    )

    def binary_mem_reg_16(code):
        address, register_pointer = decode_modrm_16bit()
        binary_16_execute[code](read_word_from_location(address),
                                read_word_from_location(register_pointer),
                                address)

The ModR/M byte

    Bit:    7 6 | 5 4 3 | 2 1 0
    Field:  MOD |  REG  |  R/M

REG -> 8 or 16-bit register

MOD:
- 00 -> Memory, no displacement*
- 01 -> Memory, 8-bit displacement
- 10 -> Memory, 16-bit displacement
- 11 -> 8 or 16-bit register

R/M:
- 000 -> [SI+BX+Displacement]
- 001 -> [DI+BX+Displacement]
- 010 -> [SI+BP+Displacement]
- 011 -> [DI+BP+Displacement]
- 100 -> [SI+Displacement]
- 101 -> [DI+Displacement]
- 110 -> [BP+Displacement]*
- 111 -> [BX+Displacement]

*MOD=00 with R/M=110 -> [Direct Address]

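To make the bit layout concrete, the following Python 3 sketch splits a ModR/M byte into its three fields and prints a readable description for the 16-bit register case. The helper names split_modrm and describe_modrm are illustrative assumptions and do not come from the talk's code.

```python
REG16 = ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI")
RM_BASE = ("BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX")


def split_modrm(modrm):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7


def describe_modrm(modrm):
    """Return (register operand, register-or-memory operand) as text."""
    mod, reg, rm = split_modrm(modrm)
    if mod == 3:
        operand = REG16[rm]            # register-to-register form
    elif mod == 0 and rm == 6:
        operand = "[disp16]"           # special case: direct address
    else:
        suffix = ("", "+disp8", "+disp16")[mod]
        operand = "[" + RM_BASE[rm] + suffix + "]"
    return REG16[reg], operand


for byte in (0xD8, 0x46, 0x06, 0x00):
    print(hex(byte), split_modrm(byte), describe_modrm(byte))
```
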
Decoding the ModR/M

    def decode_modrm_16bit():
        modrm = get_byte()
        rm = modrm & 7
        mod_ = modrm >> 6
        offset = modrm_offset[rm + segment_override]()
        address = modrm_address_16bit[mod_](offset, rm)
        reg = (modrm >> 3) & 7
        register_pointer = registers + reg
        return address, register_pointer

    modrm_address_16bit = (
        mod0_no_displacement,
        mod1_8bit_displacement,
        mod2_16bit_displacement,
        mod3_16bit_register,
    )

    modrm_offset = (
        rm0_bx_si_ds, rm1_bx_di_ds, rm2_bp_si_ss, rm3_bp_di_ss,
        rm4_si_ds, rm5_di_ds, rm6_bp_ss, rm7_bx_ds,
        rm0_bx_si, rm1_bx_di, rm2_bp_si, rm3_bp_di,
        rm4_si, rm5_di, rm6_bp, rm7_bx,
    )

The FLAGS register

    Bit:   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    Value:  1  1  1  1  O  D  I  T  S  Z  0  A  0  P  1  C

- O: Overflow
- D: Direction
- I: Interrupt
- T: Trace
- S: Sign
- Z: Zero
- A: Auxiliary Carry
- P: Parity
- C: Carry
- 1: RESERVED, always 1
- 0: RESERVED, always 0

Flags update

- Arithmetic operations usually update 6 flags (O, S, Z, A, P, C)
- Exceptions: INC/DEC don't update the Carry!
- Logical instructions update S, Z, P; clear O, C; A is undefined
- Rotates only update O and C!
- Luckily, many times some flags are undefined
- Updating flags has a HUGE impact on performance!

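To show what updating all six arithmetic flags involves, here is a sketch of an eager computation for an 8-bit ADD in Python 3. The function flags_after_add8 is a hypothetical helper, not the talk's implementation (which uses lookup tables and masks); the bit positions are those of the 8086 FLAGS register.

```python
# Bit masks of the six arithmetic flags in the 8086 FLAGS register.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11


def flags_after_add8(a, b):
    """Return (result, flags) for the 8-bit addition a + b."""
    full = a + b
    result = full & 0xFF
    flags = 0
    if full > 0xFF:
        flags |= CF                      # carry out of bit 7
    if bin(result).count("1") % 2 == 0:
        flags |= PF                      # even number of 1 bits
    if (a & 0x0F) + (b & 0x0F) > 0x0F:
        flags |= AF                      # carry out of bit 3
    if result == 0:
        flags |= ZF
    if result & 0x80:
        flags |= SF
    if (a ^ result) & (b ^ result) & 0x80:
        flags |= OF                      # both operands differ in sign from the result
    return result, flags


print([hex(v) for v in flags_after_add8(0x7F, 0x01)])
print([hex(v) for v in flags_after_add8(0xFF, 0x01)])
```

Six separate tests per instruction is a lot of work for an interpreter, which gives a feel for why flag updates weigh so heavily on performance.
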
Calculating flags: THE nightmare!

    def common_read_flags_8bit(auxiliar):
        global flags_operation
        flags_operation = FLAGS_NORMAL
        # The 0xff masking can be avoided in C
        result = flags_result & 0xff
        carry = (flags_result >> 8) & CARRY_MASK
        parity = parity_table[result & 0xff]
        zero = (result == 0) << ZERO_FLAG
        sign = result & SIGN_MASK
        overflow = ((flags_first_operand ^ flags_second_operand ^ result)
                    << (OVERFLOW_FLAG - 7)) & OVERFLOW_MASK
        flags = (read_register(INTERNAL_FLAGS) &
                 (~(CARRY_MASK | PARITY_MASK | AUXILIARY_MASK |
                    ZERO_MASK | SIGN_MASK | OVERFLOW_MASK |
                    RESERVED3_MASK | RESERVED5_MASK))) | \
            (carry | parity | auxiliar | zero | sign | overflow)
        write_register(INTERNAL_FLAGS, flags)
        return flags

    def read_auxiliary_add():
        return ((flags_first_operand & 0x0f) +
                (flags_second_operand & 0x0f)) & \
            AUXILIARY_MASK

A quantum approach to flags

- Operations do NOT calculate flags every time
- Operands, result, and rough operation are saved
- Flags status collapses only when needed
- If few flags are needed, calculate ONLY them!
- If more flags are needed, the full calculation is made
- Rotates always calculate flags

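The lazy approach can be sketched as follows in Python 3: an operation only records its operands and raw result, and each flag is derived on demand. The LazyFlags class and its method names are assumptions made for illustration; the talk's code uses module-level globals and per-operation dispatch tables instead.

```python
class LazyFlags(object):
    """Hypothetical sketch: keep the last operation, derive flags on demand."""

    def __init__(self):
        self.first = self.second = self.result = 0
        self.width = 16

    def record(self, first, second, result, width=16):
        # Cheap: no flag is calculated here. The result is kept unmasked,
        # so a carry or borrow is still visible above the top bit.
        self.first, self.second = first, second
        self.result, self.width = result, width

    def zero(self):
        return (self.result & ((1 << self.width) - 1)) == 0

    def carry(self):
        # Works for additions (carry) and subtractions (borrow): a negative
        # Python int has all its upper bits set after the shift.
        return bool((self.result >> self.width) & 1)


flags = LazyFlags()
flags.record(5, 5, 5 - 5)        # CMP 5, 5
print("JNZ taken:", not flags.zero())
flags.record(5, 7, 5 - 7)        # CMP 5, 7
print("JNZ taken:", not flags.zero(), "borrow:", flags.carry())
```

A conditional jump such as JNZ then pays only for the zero test, and none of the other five flags are ever calculated.
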
An example: CMP + JNZ

    def cmp_16bit(source, target, result):
        global flags_first_operand, flags_second_operand, \
            flags_result, flags_operation
        flags_first_operand = source
        flags_second_operand = target
        flags_result = source - target
        flags_operation = FLAGS_SUB16

    jump_short_execute = (
        cc_o, cc_no, cc_c, cc_nc, cc_z, cc_nz, cc_be, cc_nbe,
        # [...]
    )

    def jump_short(jump_type):
        if jump_short_execute[jump_type]():
            # [...]

    def cc_nz():
        return not read_zero_for_logical_from_operation[flags_operation]()

    read_zero_for_logical_from_operation = (
        # [...]
        read_zero_for_logical_generic_16bit,
    )

    def read_zero_for_logical_generic_16bit():
        return not(flags_result & 0xffff)

Testing the beast

- Unit tests developed alongside the regular code!
- One feature or opcode -> one or multiple tests written
- All public APIs tested as well
- Exercise as many scenarios as possible
- Tests make it safe (enough!) to experiment
- Standard library unittest module used (supported by PTVS)

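A minimal sketch of the "one opcode, several tests" style with the standard unittest module, in Python 3. The function under test, inc16, and the test names are hypothetical stand-ins; the talk's actual test suite is not shown in the slides.

```python
import io
import unittest


def inc16(value):
    """Hypothetical stand-in for a 16-bit INC: wraps around at 0xFFFF."""
    return (value + 1) & 0xFFFF


class Inc16Tests(unittest.TestCase):
    def test_simple_increment(self):
        self.assertEqual(inc16(0x1233), 0x1234)

    def test_wraps_to_zero(self):
        self.assertEqual(inc16(0xFFFF), 0x0000)


# Run the suite programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Inc16Tests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, "tests run, success:", result.wasSuccessful())
```
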
Tests in action

Testing the final C version

- C 8086 emulator compiled as a DLL
- DLL imported by a Python wrapper (ctypes)
- Python callbacks provided to the DLL
- Tests transparently run with the regular test suite

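The callback side of such a wrapper can be sketched with ctypes alone, in Python 3. The callback signature (a 16-bit port number in, a byte out) and the names BYTE_INPUT_HANDLER, port_values and on_byte_input are assumptions; the DLL's real exported names and signatures are not given in the slides, so loading the DLL is left out.

```python
import ctypes

# Assumed C signature: uint8_t handler(uint16_t port)
BYTE_INPUT_HANDLER = ctypes.CFUNCTYPE(ctypes.c_uint8, ctypes.c_uint16)

# Fake hardware state served by the Python side (illustrative values).
port_values = {0x3DA: 0x08}


def on_byte_input(port):
    """Python callback: return the byte read from an I/O port."""
    return port_values.get(port, 0xFF)


# Wrap the Python function as a C function pointer. Keep a reference to it
# for as long as C code may call it, or it can be garbage collected.
c_handler = BYTE_INPUT_HANDLER(on_byte_input)

# The wrapped pointer is callable from Python too, which helps in tests.
print(c_handler(0x3DA), c_handler(0x60))
```

In a real wrapper, c_handler would be passed to the function that the DLL exports for registering its byte-input handler.
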
The callbacks

    def set_on_disable_interrupts(handler):
        global on_disable_interrupts
        on_disable_interrupts = handler

    def set_on_enable_interrupts(handler):
        global on_enable_interrupts
        on_enable_interrupts = handler

    def set_on_byte_input(handler):
        global on_byte_input
        on_byte_input = handler

    def set_on_byte_output(handler):
        global on_byte_output
        on_byte_output = handler

What's missing

- Unary operations (NOT, NEG, shifts, etc.)
- String operations (REP MOVS, REP STOS, etc.)
- Tracing instructions (not needed; easy to implement)
- 8086-specific behaviors (too much effort; almost zero return)
- Much more test coverage

Thanks to

My family, for putting up with me...

Q & A