On Demand Loading of Code in MMUless Embedded System



Similar documents
OPERATING SYSTEM - VIRTUAL MEMORY

OPERATING SYSTEM - MEMORY MANAGEMENT

An Introduction to the ARM 7 Architecture

Introducción. Diseño de sistemas digitales.1

Overview of the Cortex-M3

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Plug and Play Solution for AUTOSAR Software Components

Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software

We r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

CPU Organization and Assembly Language

Software based Finite State Machine (FSM) with general purpose processors

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

Chapter 12. Development Tools for Microcontroller Applications

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

NIOS II Based Embedded Web Server Development for Networking Applications

Peter J. Denning, Naval Postgraduate School, Monterey, California

Computer Systems Structure Main Memory Organization

Embedded Software development Process and Tools: Lesson-3 Host and Target Machines

ARM Microprocessor and ARM-Based Microcontrollers

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

MICROPROCESSOR AND MICROCOMPUTER BASICS

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

(Refer Slide Time: 00:01:16 min)

Chapter 7 Memory Management

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

İSTANBUL AYDIN UNIVERSITY

Computer Architecture

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs

An Overview of Stack Architecture and the PSC 1000 Microprocessor

Chapter 01: Introduction. Lesson 02 Evolution of Computers Part 2 First generation Computers

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

Exception and Interrupt Handling in ARM

8-Bit Flash Microcontroller for Smart Cards. AT89SCXXXXA Summary. Features. Description. Complete datasheet available under NDA

A3 Computer Architecture

How To Trace

Full and Para Virtualization

PROGRAMMABLE LOGIC CONTROLLERS Unit code: A/601/1625 QCF level: 4 Credit value: 15 OUTCOME 3 PART 1

3 SOFTWARE AND PROGRAMMING LANGUAGES

Hardware Assisted Virtualization

Keil C51 Cross Compiler

Virtual vs Physical Addresses

Memory Isolation in Many-Core Embedded Systems

Microprocessor & Assembly Language

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008

Embedded Software development Process and Tools:

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?

AN1229. Class B Safety Software Library for PIC MCUs and dspic DSCs OVERVIEW OF THE IEC STANDARD INTRODUCTION

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

FPGA area allocation for parallel C applications

Having read this workbook you should be able to: recognise the arrangement of NAND gates used to form an S-R flip-flop.

HY345 Operating Systems

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance

Chapter 1 Computer System Overview

Instruction Set Design


a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

CPU Organisation and Operation

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

Fine-Grained User-Space Security Through Virtualization. Mathias Payer and Thomas R. Gross ETH Zurich

Lumousoft Visual Programming Language and its IDE

CSE 141L Computer Architecture Lab Fall Lecture 2

Thomas Jefferson High School for Science and Technology Program of Studies Foundations of Computer Science. Unit of Study / Textbook Correlation

Programming Logic controllers

CHAPTER 1 ENGINEERING PROBLEM SOLVING. Copyright 2013 Pearson Education, Inc.

ARM Virtualization: CPU & MMU Issues

The Classical Architecture. Storage 1 / 36

1 The Java Virtual Machine

Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation

CHAPTER 7: The CPU and Memory

EMBEDDED SYSTEM BASICS AND APPLICATION

ELEC 5260/6260/6266 Embedded Computing Systems

C Programming. for Embedded Microcontrollers. Warwick A. Smith. Postbus 11. Elektor International Media BV. 6114ZG Susteren The Netherlands

Processor Architectures

1/20/2016 INTRODUCTION

CS5460: Operating Systems

Embedded Linux development with Buildroot training 3-day session

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

IA-64 Application Developer s Architecture Guide

Computer Architecture TDTS10

Wiggins/Redstone: An On-line Program Specializer

I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich

Four Keys to Successful Multicore Optimization for Machine Vision. White Paper

Programming NAND devices

Hypervisors and Virtual Machines

Chapter 2 Logic Gates and Introduction to Computer Architecture

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

SmartFusion csoc: Basic Bootloader and Field Upgrade envm Through IAP Interface

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

PROBLEMS #20,R0,R1 #$3A,R2,R4

Hardware Task Scheduling and Placement in Operating Systems for Dynamically Reconfigurable SoC

Transcription:

On Demand Loading of Code in MMUless Embedded System Sunil R Gandhi *. Chetan D Pachange, Jr.** Mandar R Vaidya***, Swapnilkumar S Khorate**** *Pune Institute of Computer Technology, Pune INDIA (Mob- 8600867094; e-mail sunil.gandhi007@gmail.com). ** Pune Institute of Computer Technology, Pune INDIA ( e-mail:pachange.chetan@gmail.com) *** Pune Institute of Computer Technology, Pune INDIA ( e-mail:mandar2174@gmail.com) **** Pune Institute of Computer Technology, Pune INDIA ( e-mail:swapnilkhorate@gmail.com) } Abstract: The paper proposes a technique for on-demand loading of code in MMUless embedded system using binary re-writing. This technique will be used for re-writing of code so that it runs on low-end embedded systems that lack MMU-support. Such re-writing of code can be helpful for processors having constrained on-chip memory and lack support for virtualization. This will reduce cost and add functionality to hardware using system on chip. We first analyze the problem using Mathematical modeling and based on analysis we propose the system for modifying the code. We put forth problems that need to be handled during loading of code and possible solutions to them. 1. INTRODUCTION Many embedded applications use low end processors due to its low cost and low power consumption. These processors usually have constrained resources. They lack MMU support, are small in size and have less on-chip memory. Processor speed is increasing and so is possibility of running complex programs on these embedded processors. But memory size in low-end processor remains the same. Even though processing power is present, this power cannot be utilized due to lack of memory and MMU support. Usually embedded system programmers reinvent on-demand loading techniques every time they face problem of memory constraint. We try to solve this problem by our code generation mechanism which will support on demand loading. Thus code modified by our system can run even if it is bigger than on-chip RAM available. This will reduce the need of external RAM and thus will also reduce its cost. 2. APPROACHES FOR ON-DEMAND LOADING OF CODE On demand loading in MMUless processors can be achieved using two approaches: 1) Static translation and 2) Dynamic Translation. Dynamic translation is done using technique called Dynamic Binary Translation (DBT). This technique translates the code at run-time just before it gets executed. This type of translation incurs run-time overhead of translating code every time it jumps to new location which is not already translated. This overhead can be reduced by use of code cache. But since low-end processors like ARM cortex-m3 don t have cache, huge run-time overhead will be incurred in DBT. Also the memory required for DBT is large compared to static translation and we are designing system for low-end microcontrollers which have less on-chip RAM, thus use of dynamic translation is not recommend. But, since DBT and static translation have some problems in common, we have reviewed DBT and tried to understand its working. Static translation is done during compilation of code. In this the modified binary is generated which can run even if its size is bigger than size of RAM. Although static translation does not incur runtime overhead, it can be used only for code regions that can be identified statically. It cannot be used for self modifying code and indirect control transfers. Since, Both approaches have their limitations, we are using both approaches in our system, so that minimum overhead is incurred and all cases are handled. We are modifying code during its compilation, But for indirect control transfers, a fault will be generated which will handle on demand loading. 3.1 System on chip 3. RELATED TERMS The System-on-a-chip or system on chip (SoC or SOC) is integrated circuit (IC) which will have all the functionality like flash, RAM, processor on single integrated chip. This is done to reduce the size, cost and power consumption of embedded devices. It also increases speed of execution. Use of SOC in embedded systems is increasing day by day.

.3.2 Binary re-writing A binary re-writing means modifying a program to functionally equivalent program which has some extra features like it is optimized or has smaller code size. We are using Binary re-writing to get new program which can be executed on RAM smaller than program size..3.3 On-demand loading of code A code is loaded into memory only when that code is referenced. Such type of loading of code is called on-demand loading. 4. MATHEMATICAL MODELLING Mathematical modelling is way of expressing a problem in form of set theory, relations and functions. This way of expressing problem helps us to understand system better and determine its input, output, success, failure and other related parameters. It helps in finding out efficient ways of implementing the system and various algorithms using which it can be implemented. 4.1 Mathematical Model Following is mathematical model of loading data in size constraint embedded system, Let S be a System, S ={ Mc 1,Mc 2, R size, P size, P, P,P, Pc, Ap, N, M } Mc 1: Input Binary Program Mc 2 : : Output Binary Program R size : Ram size P size : Page size P: Set of partitions in secondary memory P : Set of partitions in Ram P : Set of modified partitions Pc : Program Counter Ap :Active partition N : Number of partitions that Ram can hold M : Number of partitions present in program Here, each partition is of same and fixed size. P = { P 1, P 2,........., P n } where 0<n< constant P i = { I 1, I 2,........, I n } I i = { Ad, Mne, [Reg], [X] } Where, Ad = Address Mne = Mnemonic Reg = Register X = Register memory address. 4.1.1 Mapping Functions A Initially, P = { ɸ } where P = set of partitions modified RAM F (PC, A p ) I i (1) Program counter PC fetches the instruction at address from active partition A p. F 1 ( I i ) P j (2) Instruction I i is executed and for branch gives the next required partition. F 2 ( PC, P j ) A p (3) Routine to denote partition is in RAM that makes partition active F 2 ( PC, P j, P ) A p (4) Loads partition from secondary memory into primary memory. F 3 ( P ) P (5) Denotes partitions loaded into the RAM. F 4 ( P) P (6) Denotes the partition is modified. 4.1.2 Polymorphism F 2 ( PC, P j ) A p F 2 ( PC, P j, P ) A p

If the required partitions are in RAM then processor will fetch next instruct otherwise it will load partition from secondary memory jump to loaded functions. In this system the corresponding address instruction will be replaced by the code to handle the memory access. limitations. So, we will use both approaches to get optimum performance and ensure that all cases like indirect control transfer are handled. Fig1 shows all phase s code has to go through before on-demand loading. Thus our system works at two phases namely 1) Code Generation Phase and 4.1.4 Search Once the address have been modified, the list has to be searched sequentially and compared with the initial address of current instruction and if an operand has address greater than the one present in initial address register modify the address. 4.1.5 Success State P = { P 1, P 2, P 3........, P n } All the partitions have been successfully modified. 4.1.6 Failure State F 1 ( I i ) P j where j>m Or If the address in the PC is out of program i.e. invalid address then system goes in invalid state and in order to come back to valid state it aborts the execution.. 2) Execution Phase. Working of These phases is as follows: 5.1 Code Generation Phase: Our System works at code generation phase as shown in fig.1. In this phase, we link an (interrupt service routine) ISR to input code of an exception which will be generated whenever there is jump no execute (NX) region. Then we will modify all addresses in binary file to point to NX region. Following are the steps followed in Code Generation Phase: 1. Write ISR routine for NX fault and link it to input code. 2. Change address of each jump to point to NX region. We can trick compiler in doing this by placing our code in NX region by modifying linker script. 4.1.7 System constraint Output file will be produced for single target machine architecture. Fig 1: Phases of code for on-demand loading 5. ON DEMAND LOADING IN SIZE CONSTRAINED MEMORY We can achieve on-demand loading of code using both static and dynamic approaches. But they both have their own Fig 2: flow of function call in execution phase 5.2 Execution Phase:

In this phase, we have to handle all control transfer instructions (CTI) and ensure that code is loaded in RAM before executing it. Also the context in which the CTI instruction is executed must not change. Otherwise it may result in erroneous results especially in case of conditional jumps. This can be achieved through exceptions or faults. Before every control transfer instruction NX fault is generated which will check if code to be executed is loaded in RAM or not. We ensure that fault is occurred by changing branch addresses to point to NX region in code generation phase. Since exception handlers will be frequently used, we will always keep all the data and exception handlers in RAM. The Table is created and Initialized with name of every function and appropriate values of logical address, physical address, used bit and size of function.execution of code in MMUless processor takes place as follows: 1. Every instruction is executed step by step until control transfer instruction. 2. Since we have modified addresses in CTI instructions, NX fault will be generated. 3. Inside Fault we can easily re-generate address of branch instruction. 4. Check whether corresponding function is present in RAM or not. 5. If it is present in RAM then find actual address of loaded function. 6. Copy the jump instruction to other location with actual address. We can use table based approach for replacement of algorithm. In this approach we will store logical address, physical address, used bit and size of function in table at bottom of RAM. Logical address represents the address which is present in actual code. This address will point to NX region. Physical address represents address where function is loaded in RAM. Whenever there is nesting of functions calling function used bit is set to one. This function is not replaced from memory. In this approach we can also consider the size of free memory above and below a function to be replaced. This helps in reducing external fragmentation. But it cannot be completely avoided. We can also use buddy system for loading of functions. Fragmentation problem will be solved using buddy system. 6. LIMITATIONS This algorithm will not work in 2 conditions. These conditions are as follows: 1. Whenever size of one function is bigger than RAM our algorithm fails. 2. Whenever all the functions are nested. And total size of active functions is greater than RAM. But both these conditions are very rare. So, according to us, it is not necessary to handle these conditions as it would increase execution overhead. 7. Change Link register to point to this instruction. 8. And execute the branch instruction to transfer control to loaded code. 9. Else 10. Use Buddy system or any other replacement algorithm to load the function. 11. Copy the jump instruction to other location with actual address. 12. Change Link register to point to this instruction. 13. And execute the branch instruction to transfer control to loaded code. This flow of execution is diagrammatically represented in Fig.2. It shows processing done whenever there is a function call. 6. REPLACEMENT ALGORITHMS 7. RELATED WORK Hae-woo Park, Kyoungjoo Oh, Soyoung Park, Myoung-min Sim, Soonhoi Ha. in Dynamic code overlay of SDFmodeled programs on low-end embedded systems have proposed 2 techniques for Dynamic loading in MMU-less embedded systems namely full shadowing and DBT Full shadowing: - This is a most commonly used technique for running code in MMU-less embedded systems. Full shadowing copies the entire contents of a program s binary from flash to main memory. This approach can be adopted when the size of main memory is greater than size of program's binary. Also with this approach the loading time of program increases as it has to load complete program beforehand. DBT: - Dynamic binary translation looks at a short sequence of code typically on the order of a single basic block then translates it and caches the resulting sequence. Code is only translated as it is discovered and when possible, and branch instructions are made to point to already translated and saved code (memorization).

1. This technique is difficult to implement in size constraints embedded system because of following reasons: 2. It requires setting up table for mapping between addresses in cache and original program. 3. There must be a DBT present in cache always to translate the instruction in cache. 4. MMU support is not present. Due to constrained memory it is difficult to use DBT for dynamic loading. Siddharth Choudhuri and Tony Givargis in Software virtual memory management for MMU-less embedded systems shows various approaches in which data can be virtualized, their advantages and disadvantages. But it only explains virtualization of data. The system proposed in this paper is applicable to both data and code. 8. CONCLUSIONS This paper presents a concept for dynamic loading of code and data in MMU-less embedded system using binary rewriting. 9. REFERENCES Moore, Hae-woo Park, Kyoungjoo Oh, Soyoung Park, Myoung-min Sim, Soonhoi Ha., 2006. Dynamic code overlay of SDF-modeled programs on low-end embedded systems. In 1st conference on Design, automation and test, in Europe. José A. Baiocchi, Bruce R. Childers., 2011. Demand code paging for NAND flash in MMU-less embedded systems In Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 USA. Mathias Payer, Thomas R Gross., 2010. Generating lowoverhead dynamic binary translators. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference on SYSTOR 10. Siddharth Choudhuri and Tony Givargis., 2005. Software virtual memory management for MMU-less embedded systems. In Technical Report, Center for Embedded Computer Systems, University of California, Irvine.