Software Design Document
Software Design Document
TESS SPOC Array Database
EXP-TESS-ARC-SW-0020
January 29, 2015
Sean McCauliff, Sr. Software Engineer (Author), Date
Masoud Mansouri-Samani, TESS-SPOC Systems Engineer, ARC, Date
David Lung, TESS-SPOC System Safety and Mission Assurance Lead, ARC, Date
Jon Jenkins, TESS-SPOC Analysis Lead, ARC, Date
Dwight Sanderfer, TESS-SPOC Manager, ARC, Date
Contents

1 Introduction
    Purpose and Scope
    Intended Audience
    Related Documents
        Compliance Documents
        Reference Documents
2 Architectural Design
    Context
    Functional Descriptions
    Heritage
3 Architecture
    Application Programming Interface
    Storage Allocation
    Deployment
Appendices
    A Requirements
    B Test Cases
List of Figures

1 Array Database Context
2 ADB Client API CRUD and Transaction Control Methods
3 ADB Client API Data Objects
Change History

Date          Notes
4 Jun 2014    You-are-here diagram. Heritage section. Add SDD to the build system. Initial revision.
6 Jun 2014    The need for a new storage allocator.
11 Jun 2014   Grammar. Architecture section. Data objects API. ADB client API.
12 Jun 2014   ADB Document ID.
14 Jun 2014   Modified the publish task to put the docid into the published pdf filenames. Modified the build task to create build/doc-id.sty files specific to each tex document and in turn modified those documents to use it. Added ext.docid to all the docs/sources build.gradle files.
23 Oct 2014   Modify documents to use new automatic editing history. Use length of history entry to estimate the number of lines needed for the history entry.
29 Oct 2014   Update tests. Use ArrayId instead of ObjectId and other minor renaming issues.
28 Jan 2015   Add draft marker. Update signatures page.
1 Introduction

This document describes the software design of the TESS Science Processing and Operations Center (SPOC) Array Database (ADB) CSCI.

1.1 Purpose and Scope

The purpose of this document is to provide a description of the Array Database CSCI including the (1) architectural design, (2) applicable requirements, (3) functional decomposition, (4) test cases, (5) traceability of requirements to functions that realize them and test cases that verify them, (6) data flow, (7) interface and input/output classes, (8) execution sequence, and (9) planned utilization of NASA Advanced Supercomputing (NAS) Pleiades cluster resources.

The TESS SPOC pipeline derives heavily from the Kepler SOC pipeline, which has been instrumental in the discovery of thousands of extrasolar planets and candidates and is largely a proven commodity. Differences in the ADB CSCI between Kepler and TESS are emphasized in this document; these are driven by differences in the science data and data acquisition rate between the two transit photometry missions.

1.2 Intended Audience

The intended audience of this document is composed of systems engineers, software engineers, scientists and operations staff of TESS.

1.3 Related Documents

Compliance Documents

This Software Design Description will be compliant with the following documents.

Functional Specification (EXP-TESS-ARC-SW-0018)
Test Cases (EXP-TESS-ARC-SW-0019)
Software Requirements Document (SRD) (EXP-TESS-ARC-RQMT-0010)
System Architecture Document (EXP-TESS-ARC-SW-0003)

Reference Documents

Pipeline Infrastructure Software Design Document (SDD) (EXP-TESS-ARC-SW-0004)
2 Architectural Design

2.1 Context

A block diagram of the SPOC pipeline is shown in figure 1. The ADB CSCI is highlighted along with Oracle (RDBMS). Together ADB and Oracle compose the Datastore component. This is the long-term storage for all SPOC data transmitted between pipeline components.

[Figure 1: Array Database Context. Block diagram of the SPOC science pipeline and other infrastructure: Data Receipt (DR), Calibration (CAL), Compression (COMP), Compute Optimal Aperture (COA), Photometric Analysis (PA), Presearch Data Conditioning (PDC), Transiting Planet Search (TPS), Data Validation (DV), Photometer Performance Assessment (PPA), Archive (AR), Models (MOD), Pipeline Infrastructure (PI), Pipeline GUI, End-to-End Model (Lilith), and the Datastore (ADB/Oracle), with exports to MAST and TSO.]

2.2 Functional Descriptions

The Array Database CSCI is decomposed into a set of functions that enable it to realize the requirements allocated to it. The ADB requirements are enumerated in appendix A. Descriptions of the ADB functions follow, with references to the requirements that they realize.

F.ADB (L5): Array Data Base (ADB)
Description: The ADB server is a specialized database management system used to provide an abstraction and simple query interface over the accumulated TESS mission data. The ADB is one component of the datastore, which also includes a relational database component. The ADB and the relational database combined store the minimal necessary data for running the TESS pipeline. We use the ADB instead of a relational database for performance reasons. For the data types that the ADB handles, it is 8 to 80 times faster than a relational database.
Realizes: R.SPOC.10: The SPOC shall provide automation, persistence, and configuration management services to pipeline algorithms.

F.ADB.1 (L6): Provide network-based client interface
Parent: F.ADB: Array Data Base (ADB)
Description: The ADB should allow access over a network to the data stored within it. Specifically, this interface should provide a mechanism for storing arrays, sparse arrays and arbitrary binary data (BLOBs), or some more mission-specific version of those data types.

F.ADB.2 (L6): Provide persistence to disk
Parent: F.ADB: Array Data Base (ADB)
Description: When a transaction is complete, its data should be stored on disk rather than in memory or some other volatile storage medium.

F.ADB.3 (L6): Provide management interface
Parent: F.ADB: Array Data Base (ADB)
Description: This allows an administrator or an engineer to assess the current state of the ADB server and provides for limited control over the current transactions. More specifically, the management interface lists the uptime of the server; the number of connected clients; the total number of threads allowed to perform I/O and a means to increase that number; and the active transactions and a means to force the rollback of stuck transactions. Some management functionality is available via the ADB command line interface; full functionality is available via a Java Management Extensions (JMX) interface. Should the ADB server be implemented in some language other than Java, then the management interface can be exported via HTTP or some other remote management service.

F.ADB.4 (L6): Provide transaction support
Parent: F.ADB: Array Data Base (ADB)
Description: A transaction is the unit of work for the ADB. A transaction must maintain the following properties, often referred to as ACID: Atomic, Consistent, Isolated and Durable. Atomicity means that the I/O operations in a transaction either all succeed or all fail; there can be no partial success. Consistency ensures that integrity constraints are always obeyed. Isolation gives each concurrent process a view of the database in which no other concurrent processes appear to be running. Durability means that a transaction is not complete until all its data has been written to a persistent storage medium.

The ADB server has specific interpretations of some of the ACID properties. Atomicity can be enforced on a single write issued by a client or on multiple writes. The only constraint enforced by the ADB server is that data provenance is kept consistent with the data it is associated with. Transactions have approximately the "READ_COMMITTED" level of transaction isolation. At this isolation level a transaction can read data that another transaction has completely committed, even though the reading transaction is itself not yet complete. The ADB does not lock arrays that have been read as part of a transaction. Durability is handled under a separate function.

2.3 Heritage

Kepler used ADB (then poorly named the filestore, or FS, in Kepler nomenclature) to store one-dimensional arrays, sparse arrays and regular files (also known as Binary Large Objects or BLOBs). Compared with a traditional RDBMS, ADB has two advantages. First, basic read and write operations are about 8 to 80 times faster than writing the same data into an RDBMS with an appropriate schema. Second, the natural representation of the data is a much closer match to an array database than to a relational database.

Compared with flat files, the ADB has the advantage of maintaining the transactional ACID properties (Atomic, Consistent, Isolated and Durable) over the data. ADB read and write operations are atomic since all operations that are part of the same transaction either succeed or fail together. They are consistent in the sense that originator identifiers (see the PI SDD) always identify the source of every array element or BLOB. This is part of the data accountability requirement. Isolation means that each client of the database appears to have its own copy of the database. ADB implements a looser version of the READ_COMMITTED transaction isolation level.
That is, transaction A may read transaction B's changes after B has committed even though transaction A has not yet committed. Durability is the ability to have permanent changes remain permanent. ADB is resilient to software errors that would cause the ADB process to crash; the ADB server can be killed at any time without threat of data corruption. ADB cannot guarantee durability in the face of power failures or OS kernel panics.

[F.ADB.4] TESS ADB will maintain all these features. There are a few additional features that are needed for more efficient TESS pipeline operations.

Often we want to represent data that is attached to some cadence or time interval. For example, we want to associate a motion polynomial BLOB with a start cadence of 1000 and an end cadence of [...]. Over time we build up these motion polynomial BLOBs, and so ADB clients want to query over a large interval of time and get back every BLOB in that interval. Kepler handles this problem in a very awkward manner: the relational database is asked to process the query and create the list of things that need to be fetched. For TESS we would bring this into ADB itself.

Unlike Kepler, TESS ADB will need to be able to delete a large amount of data in order to keep the total storage utilization within the bounds set by the mission. This is about three months of data storage. Since some stars will be observed for longer periods, only data for stars that will not be re-observed after three months will need to be dropped.

TESS will have much more pixel data. TESS pixel masks are larger than Kepler's (generally 10 x 10), and TESS will be storing full frame images as arrays. Efficiently storing this data means exploiting the data locality implicit in the arrangement of pixels.

These two differences (highly selective delete and a need for greater data locality) entail a new storage allocation algorithm for ADB. The new storage allocator will store a large number of arrays in the same file. Deleting an array or part of an array will involve revoking all or some of the file system blocks that are allocated to it. This is similar to how file systems allocate disk blocks to files.
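The cadence-interval query described in this section (return every BLOB whose validity interval intersects a query interval) can be illustrated with a minimal Java sketch. This is an in-memory illustration only, not the ADB implementation: the class `IntervalBlobIndex`, its `put` and `query` methods, and the cadence values in `main` are all hypothetical, and intervals are assumed to be inclusive at both ends.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for interval BLOB storage; not the ADB implementation. */
class IntervalBlobIndex {
    /** A BLOB plus the inclusive cadence interval over which it is valid (assumed convention). */
    static final class Entry {
        final int startCadence;
        final int endCadence;
        final byte[] blob;

        Entry(int startCadence, int endCadence, byte[] blob) {
            if (startCadence > endCadence) {
                throw new IllegalArgumentException("start cadence after end cadence");
            }
            this.startCadence = startCadence;
            this.endCadence = endCadence;
            this.blob = blob;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void put(int startCadence, int endCadence, byte[] blob) {
        entries.add(new Entry(startCadence, endCadence, blob));
    }

    /** Returns every entry whose interval overlaps [queryStart, queryEnd], in insertion order. */
    List<Entry> query(int queryStart, int queryEnd) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            // Two closed intervals overlap when each one starts no later than the other ends.
            if (e.startCadence <= queryEnd && queryStart <= e.endCadence) {
                hits.add(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        IntervalBlobIndex index = new IntervalBlobIndex();
        index.put(1000, 1999, new byte[] {1}); // illustrative cadence values only
        index.put(2000, 2999, new byte[] {2});
        index.put(5000, 5999, new byte[] {3});
        System.out.println(index.query(1500, 2500).size()); // prints 2
    }
}
```

The linear scan keeps the sketch short; a server holding many BLOBs would presumably answer the same query from an index rather than by visiting every entry.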
As disk capacity becomes larger we need fewer spinning hard drives to store the same amount of data. This is generally a good thing, but fewer spinning drives means that the random access performance of the storage system decreases; it falls linearly with the number of hard disks in the storage system. As a result, TESS will need to exploit more data locality by making reads and writes more contiguous. The Kepler ADB only had support for one-dimensional arrays. By implementing multidimensional arrays for TESS ADB we can ensure that I/O operations are less frequent and larger.

[F.ADB.3] ADB provides for remote monitoring over Java Management Extensions (JMX) and a custom command line interface.

3 Architecture

3.1 Application Programming Interface

[F.ADB.1] The ADB application programming interface provides three kinds of interfaces: (1) Create, Read, Update, Delete (CRUD) methods, (2) the data objects that those methods manipulate, and (3) transactions over the CRUD methods. To begin a transaction, the client calls begintransaction() on the ADB client instance. Committing or rolling back a transaction is accomplished by calling committransaction() or rollbacktransaction() on the ADB client instance. Transactions will automatically roll back after a configurable transaction timeout.

Figure 2 is a UML class diagram that shows the transaction API and the CRUD API. Blue elements indicate new functionality for TESS.

Figure 3 shows the data objects managed by ADB. Arrays also have subclasses for double precision numbers in addition to those shown in figure 3. Additional array types supporting any Java primitive type are trivial to add. Blue elements indicate new functionality for TESS.

The MultidimensionalArrayClient allows for the manipulation of MultidimensionalArrays. We show only a single precision float implementation, but we also plan for a 32-bit signed integer multidimensional array implementation.
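The begin, commit and roll back call pattern described above might be used as in the following Java sketch. Everything here is an assumption made for illustration: `InMemoryAdbClient` is a single-threaded stand-in rather than the real ADB client, the method names are written in camel case, `writeArray` and `readArray` are hypothetical CRUD methods, and the automatic transaction timeout is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded stand-in for an ADB client; only the call pattern matters. */
class InMemoryAdbClient {
    private final Map<String, float[]> committed = new HashMap<>();
    private Map<String, float[]> pending; // null when no transaction is open

    void beginTransaction() {
        if (pending != null) {
            throw new IllegalStateException("transaction already open");
        }
        pending = new HashMap<>();
    }

    void writeArray(String arrayId, float[] values) {
        requireTransaction();
        pending.put(arrayId, values.clone());
    }

    /** Reads see this transaction's own writes first, then committed data. */
    float[] readArray(String arrayId) {
        if (pending != null && pending.containsKey(arrayId)) {
            return pending.get(arrayId).clone();
        }
        float[] values = committed.get(arrayId);
        return values == null ? null : values.clone();
    }

    void commitTransaction() {
        requireTransaction();
        committed.putAll(pending); // all buffered writes become visible together
        pending = null;
    }

    void rollbackTransaction() {
        requireTransaction();
        pending = null; // discard every buffered write
    }

    private void requireTransaction() {
        if (pending == null) {
            throw new IllegalStateException("no open transaction");
        }
    }
}

class TransactionDemo {
    /** Writes one array; commits on success and rolls back if the write throws. */
    static void store(InMemoryAdbClient client, String arrayId, float[] values) {
        client.beginTransaction();
        try {
            client.writeArray(arrayId, values);
            client.commitTransaction();
        } catch (RuntimeException e) {
            client.rollbackTransaction();
            throw e;
        }
    }

    public static void main(String[] args) {
        InMemoryAdbClient client = new InMemoryAdbClient();
        store(client, "flux/1", new float[] {1.0f, 2.0f});

        // An overwrite that is rolled back leaves the committed values untouched.
        client.beginTransaction();
        client.writeArray("flux/1", new float[] {9.0f, 9.0f});
        client.rollbackTransaction();

        System.out.println(client.readArray("flux/1")[0]); // prints 1.0
    }
}
```

Buffering writes until commit is one simple way to obtain all-or-nothing behaviour in a toy model; the document does not say how the ADB server achieves it.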
The CRUD methods associated with MultidimensionalArrays allow the caller to specify which indices along each array dimension should be returned. A query spec for a MultidimensionalArray can be constructed from strings written in MATLAB style. For example, accessing rows one through 10 and all columns of a two-dimensional array can be written as "1:10, :".

IntervalBlobs are arbitrary binary data, usually MAT files, that are too complicated to store as simple arrays and that would require considerable schema engineering to store in the relational database. An IntervalBlob also has a cadence interval associated with it that determines the time interval over which it is valid. The CRUD API allows for the specification of an object identifier and an interval over which BLOBs should be fetched. It is possible that multiple BLOBs could be valid over the specified query interval. The value returned, BlobFileSeries, associates an originator id and a cadence interval with each BLOB returned.

Figure 2: ADB Client API CRUD and Transaction Control Methods

3.2 Storage Allocation

[F.ADB.2] The Kepler ADB storage allocator attempts to pack multiple 1D arrays into a single file. It does this in a way that assumes it knows the order in which bytes will be written in the future. This turns out to work well most of the time, but as the ADB file system instance for Kepler has aged, more files have become fragmented and sparse. This leads to more random access reads than are really necessary.

The TESS ADB storage allocator will use our existing B+-tree index implementation to store an ArrayId and the extent information for each array. An extent is simply a tuple of the logical array address, the start address of the extent in the file, and a length. This length is the number of blocks allocated to the array. When the space for an extent is exhausted, a new extent is allocated from an index of free space in the file. When an array (or part of an array) needs to be deleted, its extents are deleted from the index.

Efficient storage of originator identifiers can be realized by using an R*-tree. This is similar to a B+-tree, but it also stores information on the hyper-rectangles which bound an area within the multidimensional array. This would be used to associate originator ids with arbitrary areas.

3.3 Deployment

ADB is never deployed within the NAS supercomputing environment. It always exists on a server within the SPOC network.
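The extent bookkeeping described under Storage Allocation can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the planned allocator: a `TreeMap` stands in for the B+-tree index, a `String` stands in for an ArrayId, all extents have the same fixed size, the free-space index is a plain stack of freed extents, and only whole-array deletion is shown. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of extent bookkeeping; a TreeMap stands in for the B+-tree index. */
class ExtentAllocator {
    /** (logical array address, start address in the file, length), all counted in blocks. */
    static final class Extent {
        final long logicalBlock;
        final long fileBlock;
        final int length;

        Extent(long logicalBlock, long fileBlock, int length) {
            this.logicalBlock = logicalBlock;
            this.fileBlock = fileBlock;
            this.length = length;
        }
    }

    private final int extentBlocks; // assumed fixed extent size, in blocks
    private final TreeMap<String, List<Extent>> index = new TreeMap<>(); // keyed by array id
    private final Deque<Long> freeExtents = new ArrayDeque<>(); // start blocks of freed extents
    private long endOfFile = 0; // first block that has never been allocated

    ExtentAllocator(int extentBlocks) {
        if (extentBlocks <= 0) {
            throw new IllegalArgumentException("extent size must be positive");
        }
        this.extentBlocks = extentBlocks;
    }

    /** Adds extents until the array covers logical blocks [0, blocksNeeded). */
    void reserve(String arrayId, long blocksNeeded) {
        List<Extent> extents = index.computeIfAbsent(arrayId, k -> new ArrayList<>());
        long covered = (long) extents.size() * extentBlocks;
        while (covered < blocksNeeded) {
            long start;
            if (freeExtents.isEmpty()) {
                start = endOfFile; // no free space: grow the file
                endOfFile += extentBlocks;
            } else {
                start = freeExtents.pop(); // reuse space released by a delete
            }
            extents.add(new Extent(covered, start, extentBlocks));
            covered += extentBlocks;
        }
    }

    /** Removes the array's extents from the index and returns their blocks to free space. */
    void delete(String arrayId) {
        List<Extent> extents = index.remove(arrayId);
        if (extents == null) {
            return;
        }
        for (Extent e : extents) {
            freeExtents.push(e.fileBlock);
        }
    }

    List<Extent> extentsOf(String arrayId) {
        List<Extent> extents = index.get(arrayId);
        return extents == null ? new ArrayList<Extent>() : new ArrayList<>(extents);
    }

    long fileBlocks() {
        return endOfFile;
    }

    public static void main(String[] args) {
        ExtentAllocator alloc = new ExtentAllocator(4);
        alloc.reserve("a", 6); // two extents of 4 blocks
        alloc.reserve("b", 4); // one extent
        alloc.delete("a");     // releases two extents
        alloc.reserve("c", 8); // reuses both released extents
        System.out.println(alloc.fileBlocks()); // prints 12
    }
}
```

In the usage above the file does not grow when "c" is stored, because the blocks released by deleting "a" are reused; that is the property the selective-delete requirement depends on.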
Figure 3: ADB Client API Data Objects
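A MATLAB-style query spec such as "1:10, :" for a MultidimensionalArray could be parsed roughly as in the Java sketch below. The grammar here is an assumption, since the document gives only that one example: each comma-separated part is ":" (the whole dimension), "a:b" (one-based, inclusive) or a single index. The names `QuerySpecParser`, `Range`, `parse` and `select` are hypothetical and are not taken from the ADB client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for MATLAB-style query specs; the supported grammar is an assumption. */
class QuerySpecParser {
    /** One-based, inclusive index range along one dimension; all == true selects everything. */
    static final class Range {
        final boolean all;
        final int first;
        final int last;

        Range(boolean all, int first, int last) {
            this.all = all;
            this.first = first;
            this.last = last;
        }

        int from() {
            return all ? 1 : first;
        }

        int to(int dimensionLength) {
            return all ? dimensionLength : last;
        }
    }

    /** Parses one range per comma-separated dimension. */
    static List<Range> parse(String spec) {
        List<Range> ranges = new ArrayList<>();
        for (String part : spec.split(",", -1)) {
            String p = part.trim();
            if (p.equals(":")) {
                ranges.add(new Range(true, 0, 0));
                continue;
            }
            String[] bounds = p.split(":", -1);
            if (bounds.length > 2) {
                throw new IllegalArgumentException("strides are not supported: " + p);
            }
            // parseInt rejects empty or non-numeric bounds with NumberFormatException.
            int first = Integer.parseInt(bounds[0].trim());
            int last = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : first;
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("bad range: " + p);
            }
            ranges.add(new Range(false, first, last));
        }
        return ranges;
    }

    /** Applies a two-dimensional spec (rows, columns) to a rectangular array. */
    static float[][] select(float[][] data, String spec) {
        List<Range> ranges = parse(spec);
        if (ranges.size() != 2) {
            throw new IllegalArgumentException("expected two dimensions");
        }
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        int rowFrom = ranges.get(0).from();
        int rowTo = ranges.get(0).to(rows);
        int colFrom = ranges.get(1).from();
        int colTo = ranges.get(1).to(cols);
        if (rowTo > rows || colTo > cols) {
            throw new IndexOutOfBoundsException("spec exceeds array bounds");
        }
        float[][] out = new float[rowTo - rowFrom + 1][];
        for (int i = rowFrom; i <= rowTo; i++) {
            // Convert one-based inclusive bounds to zero-based, end-exclusive bounds.
            out[i - rowFrom] = Arrays.copyOfRange(data[i - 1], colFrom - 1, colTo);
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] data = new float[12][3];
        float[][] selected = select(data, "1:10, :");
        System.out.println(selected.length + " x " + selected[0].length); // prints 10 x 3
    }
}
```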
Appendices

A Requirements

SPOC requirements which have been allocated to ADB are enumerated below with references to the functions which realize them. The ADB functions are enumerated in section 2.2. The test cases which verify the requirements are listed in appendix B.

R.ADB.1 (L6): ADB shall provide long-term persistent storage for pipeline data
Parent: R.SPOC.10: The SPOC shall provide automation, persistence, and configuration management services to pipeline algorithms
Rationale: Provides a common source for pipeline modules and other code to access pipeline data with distributed access.

R.ADB.1.1 (L7): ADB shall provide CRUD for arrays, sparse arrays and blobs.
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: This is the bulk of the TESS data.
Verified By: T.1: Automated End-to-end Pipeline Test

R.ADB.1.2 (L7): ADB shall maintain ACID properties for managed data types
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: Allow safe concurrent access to data.
Verified By: T.18: ADB Test 1. T.19: ADB Test 2.

R.ADB.1.3 (L7): ADB shall provide query interface for managed data types.
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: Provide flexible access to pipeline data.
Verified By: T.1: Automated End-to-end Pipeline Test

R.ADB.1.4 (L7): ADB shall provide a management interface (JMX, command line)
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: Support troubleshooting.

R.ADB.1.5 (L7): ADB shall allow concurrent access to data.
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: Support concurrent, distributed pipeline processing.
Verified By: T.1: Automated End-to-end Pipeline Test. T.19: ADB Test 2.

R.ADB.1.6 (L7): ADB shall be sized to hold at least 3 months' worth of mission data at a time
Parent: R.ADB.1: ADB shall provide long-term persistent storage for pipeline data
Rationale: Allow enough time for data to be processed, reviewed, reprocessed if necessary, and exported before taking the data offline.
Verified By: T.21: ADB Test 4

SPOC_22 (L4): The SPOC hardware shall be sized to store at least 3 sectors' worth of data.
Rationale: Ensures that we have enough storage capacity to accommodate the data we receive and process.

B Test Cases

The test cases which are used to verify ADB requirements are listed below. For further information on the requirements themselves, see appendix A.

T.18 (L5): ADB Test 1
Description: Write multiple data types in a single transaction to the ADB server. Verify that all the data are written correctly by reading back the values that were written. Stop the server and restart it, then read back the values that were written. Start a transaction and overwrite existing data, then roll back instead of committing the transaction. The data should still have its original values once all transactions have been terminated.
Verifies: R.ADB.1.2: ADB shall maintain ACID properties for managed data types

T.19 (L5): ADB Test 2
Description: Create two transactions in different threads (A and B). Thread A will write data to ADB and then wait until thread B releases thread A. Thread B will write different data to the same ArrayIds as thread A. Thread B will commit and verify that it sees B's data but not A's data. Thread B releases thread A. Thread A will verify that it still sees thread A's data and not thread B's data.
Verifies: R.ADB.1.2: ADB shall maintain ACID properties for managed data types. R.ADB.1.5: ADB shall allow concurrent access to data.
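The thread choreography of T.19 can be sketched in Java with two latches. This is an illustration of the test's ordering only, run against a toy in-memory store: `Store`, `Tx` and `IsolationSketch` are hypothetical names, the store keeps each transaction's uncommitted writes in a private buffer, and it makes no attempt to commit several arrays atomically as the real server must.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch of the T.19 ordering against a toy store; not the ADB test code. */
class IsolationSketch {
    static final class Store {
        final Map<String, Integer> committed = new ConcurrentHashMap<>();

        Tx begin() {
            return new Tx(this);
        }
    }

    /** A transaction that buffers its writes privately until commit. */
    static final class Tx {
        private final Store store;
        private final Map<String, Integer> writes = new HashMap<>();

        Tx(Store store) {
            this.store = store;
        }

        void write(String arrayId, int value) {
            writes.put(arrayId, value);
        }

        /** Own uncommitted writes take precedence over committed data. */
        Integer read(String arrayId) {
            Integer own = writes.get(arrayId);
            return own != null ? own : store.committed.get(arrayId);
        }

        void commit() {
            store.committed.putAll(writes);
            writes.clear();
        }
    }

    /** Runs the scenario and returns {value thread A saw, value thread B saw}. */
    static int[] run() {
        final Store store = new Store();
        final CountDownLatch aHasWritten = new CountDownLatch(1);
        final CountDownLatch releaseA = new CountDownLatch(1);
        final int[] seen = new int[2];

        Thread a = new Thread(() -> {
            Tx tx = store.begin();
            tx.write("array-1", 1);       // A writes but does not commit
            aHasWritten.countDown();
            awaitQuietly(releaseA);       // A waits until B releases it
            seen[0] = tx.read("array-1"); // A should still see its own data
        });
        Thread b = new Thread(() -> {
            awaitQuietly(aHasWritten);
            Tx tx = store.begin();
            tx.write("array-1", 2);       // different data, same array id
            tx.commit();
            seen[1] = tx.read("array-1"); // B should see B's data, not A's
            releaseA.countDown();         // B releases A
        });

        a.start();
        b.start();
        try {
            a.join(); // join makes the threads' writes to seen[] visible here
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints 1 2
    }
}
```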
T.20 (L5): ADB Test 3
Description: Browse the status of a running ADB server with the management interfaces.
Verifies: R.COMP.2.1: COMP shall compute a mean black table

T.21 (L5): ADB Test 4
Description: Maintain a spreadsheet of the estimated size of the file system space consumed by ADB.
Verifies: R.ADB.1.6: ADB shall be sized to hold at least 3 months worth of mission data at a time
