Outline. mass storage hash functions. logical key values nested tables. storing information between executions using DBM files



Similar documents
CSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global

Main Memory & Backing Store. Main memory backing storage devices

Storage in Database Systems. CMPSCI 445 Fall 2010

Chapter 8 Memory Units

Chapter 4 System Unit Components. Discovering Computers Your Interactive Guide to the Digital World

Discovering Computers Living in a Digital World

How To Store Data On A Computer (For A Computer)

CSCA0201 FUNDAMENTALS OF COMPUTING. Chapter 5 Storage Devices

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping

Lesson Plan. Preparation

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Chapter 2 Data Storage

Physical Data Organization

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Here are my slides from lecture, along with my notes about each slide.

Devices and Device Controllers

File System Management

Record Storage and Primary File Organization

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7)

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Block diagram of typical laptop/desktop

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Mass Storage Structure

Chapter 3: Computer Hardware Components: CPU, Memory, and I/O

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Database Fundamentals

& Data Processing 2. Exercise 2: File Systems. Dipl.-Ing. Bogdan Marin. Universität Duisburg-Essen

Computer Logic (2.2.3)

lesson 1 An Overview of the Computer System

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner

Communicating with devices

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師

Discovering Computers Chapter 7 Storage

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Data Storage - I: Memory Hierarchies & Disks

Primary Memory. Input Units CPU (Central Processing Unit)

What's in a computer?

ELECTRONIC DOCUMENT IMAGING

Chapter 7 Types of Storage. Discovering Computers Your Interactive Guide to the Digital World

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Sistemas Operativos: Input/Output Disks

Chapter 12: Secondary-Storage Structure

Computer Organization & Architecture Lecture #19

Unit Storage Structures 1. Storage Structures. Unit 4.3

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Lecture 16: Storage Devices

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Data Storage. 1s and 0s

Chapter 10: Mass-Storage Systems

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware

System Architecture. CS143: Disks and Files. Magnetic disk vs SSD. Structure of a Platter CPU. Disk Controller...

University of Dublin Trinity College. Storage Hardware.

6. Storage and File Structures

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Database 2 Lecture I. Alessandro Artale

1.1 Electronic Computers Then and Now

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

Lecture 1: Data Storage & Index

Introduction to the Mathematics of Big Data. Philippe B. Laval

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

William Stallings Computer Organization and Architecture 7 th Edition. Chapter 6 External Memory

NOVA COLLEGE-WIDE COURSE CONTENT SUMMARY ITE INTRODUCTION TO COMPUTER APPLICATIONS & CONCEPTS (3 CR.)

Chapter 2: Basics on computers and digital information coding. A.A Information Technology and Arts Organizations

Solid State Drive Architecture

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

William Stallings Computer Organization and Architecture 8 th Edition. External Memory

Computer Storage. Computer Technology. (S1 Obj 2-3 and S3 Obj 1-1)

Computer Peripherals

Chapter 8. Secondary Storage. McGraw-Hill/Irwin. Copyright 2008 by The McGraw-Hill Companies, Inc. All rights reserved.

Today we will learn about:

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

Cloud storage Megas, Gigas and Teras

FUJIFILM Recording Media U.S.A., Inc. LTO Ultrium Technology

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

To understand how data is processed, by a computer, we can draw a simple analogy between computers and humans.

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Outline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS Data Storage

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

1 Computer hardware. Peripheral Bus device "B" Peripheral device. controller. Memory. Central Processing Unit (CPU)

Hardware: Input, Processing, and Output Devices. A PC in Every Home. Assembling a Computer System

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Prof. Dr. Ing. Axel Hunger Dipl.-Ing. Bogdan Marin. Operation Systems and Computer Networks Betriebssysteme und Computer Netzwerke

K Hinds Page No. 1. Lecture 3 ASCII

Read this before starting!

Platter. Track. Index Mark. Disk Storage. PHY 406F - Microprocessor Interfacing Techniques

Revised: November GCSE Computer Science

Counting in base 10, 2 and 16

Introduction to I/O and Disk Management

Chapter 12: Mass-Storage Systems

Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture

Transcription:

Outline 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments MCS 260 Lecture 7 Introduction to Computer Science Jan Verschelde, 27 January 2016 Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 1 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 2 / 32

Mass Storage tapes and disks Mass storage means 1 the data is persistent 2 large capacity: giga or terabytes We distinguish modes of access: sequential access: one must rewind tapes direct access: read disks from any position We distinguish two different technologies: a magnetic file covers disk and tape surfaces optical disc media rely on laser technology Compression software also helps increasing capacity. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 3 / 32

Units to measure Capacity 1 byte = 8 bits. Large quantities are expressed in thousands (kilo), millions (mega), billions (giga), and trillions (tera). units value value in full Kb = kilobyte 2 10 10 3 1, 024 Mb = megabyte 2 20 10 6 1, 048, 576 Gb = gigabyte 2 30 10 9 1, 073, 741, 824 Tb = terabyte 2 40 10 12 1, 099, 511, 627, 776 The same prefixes (kilo, mega, giga, tera) measure clock speed of the CPU, or other frequencies. 1 hertz = 1 cycle per second 1 kilohertz = 2 10 cycles per second 1 megahertz = 2 20 cycles per second 1 gigahertz = 2 30 cycles per second Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 4 / 32

Disk Organization platters, tracks, sectors, cylinders A disk consists of a number of horizontal platters, covered by a magnetic coating. Data is stored on the two surfaces of each platter. Tracks are concentric circles on a surface. Sectors are track segments of equal size. Disk formatting: writing start and end of sectors. A cylinder is a set of tracks equidistant from the center of all surfaces. Consecutive data is placed in sequence on the same cylinder. Disks rotate and there is one moving read/write head per surface. An input/output block is a group of contiguous data read or written in one single input/output operation. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 5 / 32

I/O Disk Operations reading from and writing information to disk A buffer in main memory holds the entire block of data prior to writing to or after being read from disk. The seek is the movement of the heads towards the required track. The seek time is the time of a seek. The latency time is the time to wait for the required sector to pass beneath the read/write head. On average this equals half the rotation time. Time needed for one i/o operation: t i/o = t seek + t latency + t transfer. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 6 / 32

Flash Drives the memory stick Commonly used portable mass storage. connect to USB port, which powers the drive USB = Universal Serial Bus capacity goes to several gigabytes sends electronic signals to chambers of silicon dioxide, altering the characteristics of small electronic circuits Advantages and disadvantages: + unlike a disk drive, there is no movement, sometimes faster than optical disks can sustain only limited number of write and erase cycles Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 7 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 8 / 32

File Organization records and blocks Data is organized in logical records. One record in a phone book has three fields: name address phone number An input/output block can contain several records. The usage factor is # bytes allocated to logical records # bytes of physical blocks on file Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 9 / 32

Sequential File Organization order records sequentially Every record on file has a key. Records are stored in order of the keys. In a phone book, with names sorted alphabetically, the key is usually the name. Binary search is an efficient way to search through a sorted data collection. The main problem with sequential file organization is the insertion of new elements. Solutions to this problems are 1 store changes in a separate file that is then periodically merged with the main file 2 leave free blocks between records 3 use an overflow zone to insert new data Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 10 / 32

Hash-based File Organization order of records is computed Keys are generated by a hash algorithm. The hash algorithm defines a hash function, mapping logical key values (like a name) to a physical address (or a position). Goal: even distribution of keys over addresses. Mapping names into addresses via combinations of the ASCII codes of the characters in the strings representing the names is a first step. Advantage: fast access, reduced search speed. Disadvantage: two different key values could be mapped to the same address. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 11 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 12 / 32

Using Dictionaries choosing a key as index To select from a list or tuples, the index must be a number. But very often, we list data using names as indices. Consider for example a telephone directory. A dictionary is an unordered set of key:value pairs, where value can be of any data type. The type ofkey must admit an ordering, it must be hashable. For example, list summer sales according to month: >>> sales = { jun :123, aug :342, sep : 212 } >>> sales { jun : 123, aug : 342, sep : 212} >>> sales[ aug ] 342 Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 13 / 32

Operations on Dictionaries Modifying a dictionary: >>> sales { jun : 123, aug : 342, sep : 212} >>> sales[ jun ] = 321 >>> sales { jun : 321, aug : 342, sep : 212} Order a dictionary: >>> sales { jun : 321, aug : 342, sep : 212} >>> sales.keys() [ jun, aug, sep ] >>> ind = sales.keys() >>> sales[ind[2]] 212 By assigning the keys to an ordered listind, we have placed an order on the dictionary. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 14 / 32

Deleting and Adding Example continued... >>> sales.values() [321, 342, 212] >>> sales.keys() [ jun, aug, sep ] >>> len(sales) 3 on holiday in August... delete August sales >>> del sales[ aug ] >>> sales { jun : 321, sep : 212} >>> len(sales) 2 We continued in October... add October sales: >>> sales[ oct ] = 99 >>> sales { jun : 321, oct : 99, sep : 212} Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 15 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 16 / 32

Mileage Tables an application of dictionaries A mileage table lists the number of miles between several major cities. We store the distance between Chicago and 3 cities: Los Angeles, Miami, and New York, in a dictionary. The result ofinput() can immediately be used as the key to query the dictionary. running the program mileage.py: $ python mileage.py Give a city : Miami Chicago - Miami : 1237 miles $ Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 17 / 32

Mileage Tables distances between cities The program saved as mileage.py: # L-7 MCS 260 : a mileage table """ A dictionary records the distance from Chicago to 3 other cities. The result of a raw input is key to query the distances in the dictionary. """ CHICAGO = { Los Angeles : 2047, Miami : 1237, \ New York : 807} CITY = input( Give a city : ) print( Chicago -, CITY, :, CHICAGO[CITY], miles ) Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 18 / 32

Nested Dictionaries more useful mileage tables Mileage tables are two dimensional: we use two cities as index and obtain on return the distance. Build a dictionarydistance and query it as DISTANCE[CITY1][CITY2] wherecity1 andcity2 are 2 strings, holding the names of 2 cities. running the program miletab.py: $ python miletab.py Give first city : Los Angeles Give second city : Miami Los Angeles - Miami : 2780 miles $ Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 19 / 32

Nesting Dictionaries a 4-by-4 mileage table The name DISTANCE refers to a dictionary of dictionaries: DISTANCE = { \ Chicago : { Los Angeles : 2047, \ Miami : 1237, New York : 807}, \ Los Angeles : { Chicago : 2047, \ Miami : 2780, New York : 2787}, \ Miami : { Chicago : 1237, \ Los Angeles : 2780, New York : 1346}, \ New York : { Chicago : 807, \ Los Angeles : 2787, Miami : 1346} \ } >>> distance[ Los Angeles ][ Miami ] 2780 Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 20 / 32

miletab.py the complete code DISTANCE = { \ Chicago : { Los Angeles : 2047, \ Miami : 1237, New York : 807}, \ Los Angeles : { Chicago : 2047, \ Miami : 2780, New York : 2787}, \ Miami : { Chicago : 1237, \ Los Angeles : 2780, New York : 1346}, \ New York : { Chicago : 807, \ Los Angeles : 2787, Miami : 1346} \ } CITY1 = input( Give first city : ) CITY2 = input( Give second city : ) print CITY1, -, CITY2, :, \ DISTANCE[CITY1][CITY2], miles Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 21 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 22 / 32

Persistent Data storing information between executions Data that is persistent outlives programs. Objects constructed by a script are lost as soon as the script ends. Two extremes to make data persistent: 1 files: store string representations, 2 MySQL: store data in tables in a database. Intermediate solution: DBM files. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 23 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 24 / 32

using DBM files DBM files are standard in the Python library (Python2: anydbm). $ python >>> import dbm >>> libdb = dbm.open( library, c ) opened a new dbm with read-write access (flag = c ). Some issues about the location of files... The filelibrary will be in the current working directory. If you have no permissions to write in the current directory, then opening a dbm with read-write access will fail. (Useos.getcwd() to see the current working directory.) To create the dbm filelib in the/tmp directory: >>> libdb = dbm.open( /tmp/lib, c ) Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 25 / 32

adding data to a DBM file >>> libdb[ 0 ] = str({ author : Miller & Ranum,... title : Python Programming }) keys and values must be of type string >>> libdb.keys() [ 0 ] >>> libdb.values() ["{ title : Python Programming, author : Miller & $ ls library is a file in current directory. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 26 / 32

adding and selecting books using the key >>> import dbm >>> mylib = dbm.open( library, c ) >>> mylib.keys() [ 0 ] >>> mylib[ 1 ] = str({ author : Brookshear,... title : Computer Science: an overview }) >>> mylib.values() ["{ title : Python Programming, author : Miller & Ranum "{ title : Computer Science: an overview, author : Bro Selecting the author of book with key1: >>> V = mylib.values() >>> d = V[int(mylib.keys()[1])] >>> eval(d)[ author ] Brookshear Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 27 / 32

DBM File Operations an overview Python code description import dbm load moduledbm f = dbm.open( n, c ) create or open dbm file with namen f[ key ] = value assignvalue forkey f.keys() returns the keys value = f[ key ] loadvalue forkey count = len(f) number of entries stored found = key in f see if entry forkey del f[ key ] remove entry forkey f.close() close dbm file Typical use: every record in database has unique key, values are dictionaries, stored as strings. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 28 / 32

mass storage dictionaries in Python 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming storing rules in dictionaries 5 Summary + Assignments Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 29 / 32

Rule Based Programming: rules in dictionaries Some rules for differentiation: >>> D = { sin(x) : cos(x), \... cos(x) : -sin(x) } To prevent evaluation, the keys and values are strings. Applying the rules = consulting the dictionary. >>> D[ sin(x) ] cos(x) >>> D[ cos(x) ] -sin(x) Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 30 / 32

Differentiation and Integration with Sage and SymPy For differentiation and integration (calculus), Sage containsmaxima (Lisp) andsympy (Python). SymPy can be downloaded and installed separately. Some examples: sage: diff(cos(x),x) -sin(x) sage: integral(sin(x),x) -cos(x) Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 31 / 32

Summary + Assignments In this lecture we covered sections 1.2,1.3 in Computer Science: an overview more of chapter 4 of Python Programming in Context Assignments: 1 For the computers in the lab, find the clock speed, the capacity of the internal memory and disk. 2 Use a dictionary to record state capitols. 3 Store the money exchange rates between dollar, euro, and yen in a dictionary and illustrate how to convert any sum of money. 4 Make a dictionaryito store the antiderivation rules for common trigonometric functions, sin, cos, and tan. 5 Give the Python commands to usedbm for storing the mileage tables of this lecture. Intro to Computer Science (MCS 260) mass storage and dictionaries L-7 27 January 2016 32 / 32