Batch Processing How-To, or The Single-Threaded Batch Processing Paradigm




Stefan Rufer, Netcetera
Matthias Markwalder, SIX Card Solutions
6840

Speakers
> Stefan Rufer
  Studied business IT at the University of Applied Sciences in Bern
  Senior Software Engineer at Netcetera
  Main interest: server-side application development using JEE
> Matthias Markwalder
  Graduated from ETH Zurich
  Senior Developer + Framework Responsible at SIX Card Solutions
  Main interest: high performance and quality batch processing

Why are we here?
> Let's learn how to bake an omelet.

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

What do we do
> Credit / debit card transaction processing
> Back-office batch processing application, 24x7x365
> 1.7 million card transactions a day
> Volume will double by end of 2010: be ready
> Migrated from Forté UDS to JEE
> More agile code base now

How do we do it
> Transactional integrity at any time
> Custom batch processing framework (not Spring Batch)
> 1 controller builds the jobs
> 35 workers process the steps of jobs (or as many as you want and your system can take)
> 1 application server (12 cores)
> 1 database server (12 cores, 1.5 TB SAN)

Batch Processing Basics
> It's simple, but parallel: read file(s), process a bit, write file(s)
> Terminology from Spring Batch

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

Bake an omelet
> 200 g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt
> Stir well, wait 30 min
> Stir again
> Put a little butter in a heated pan
> Add 1 dl dough
> Bake until slightly brown, flip over, bake again half as long
> Put cheese / marmalade / applesauce / ... on top, fold
> Enjoy

Jobs run in parallel
> Load balancing
> Complete yesterday's reports while doing today's business
How to achieve
> Use a batch scheduling application that controls your entire processing.
> Read / modify categorization of jobs

Load limitations
> Load balancing
> Generate 70 reports, but max 20 in parallel
How to achieve
> Number of workers one job can use
> Priorities of the steps of a job
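The per-job worker cap can be pictured as simple bookkeeping in the controller. The following is a minimal sketch, not the framework's actual code: the class JobWorkerLimiter and its method names are hypothetical, and it assumes that only the single controller thread calls it, so no synchronization is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks how many workers each job currently occupies.
// Assumption: called only from the single controller thread.
class JobWorkerLimiter {
    private final Map<String, Integer> maxWorkersPerJob = new HashMap<>();
    private final Map<String, Integer> busyWorkersPerJob = new HashMap<>();

    void setLimit(String jobId, int maxWorkers) {
        maxWorkersPerJob.put(jobId, maxWorkers);
    }

    // Reserves a worker slot and returns true if the job is below its cap.
    boolean tryStartStep(String jobId) {
        int max = maxWorkersPerJob.getOrDefault(jobId, Integer.MAX_VALUE);
        int busy = busyWorkersPerJob.getOrDefault(jobId, 0);
        if (busy >= max) {
            return false; // cap reached: the step stays queued
        }
        busyWorkersPerJob.put(jobId, busy + 1);
        return true;
    }

    // Frees the slot when a worker reports that a step has finished.
    void stepFinished(String jobId) {
        int busy = busyWorkersPerJob.getOrDefault(jobId, 0);
        if (busy > 0) {
            busyWorkersPerJob.put(jobId, busy - 1);
        }
    }
}
```

With a limit of 20 for the report job, the controller could offer all 70 report steps and only 20 would be started until one of them finishes.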

Decouple controller + workers
> Scalability
> SETI@home

Step trees, Sequential, Fail on Exception
> Avoid structuring steps in code
> Collect data, afterwards write a file.
How to achieve
> Sequential execution
> Fail on exception (rollback entire step)

Step trees, Parallel, Continue on Exception
> Minimize work left
> Process 30,000 transactions in 3 steps.
How to achieve
> Parallel execution
> Continue on exception (still rollback entire step)

Parallelize reading
> Speedup
> A file of 200,000 credit card authorisations and transactions has to be read into the database.
How to achieve
> Cut the input file into pieces of 10,000 lines each. btw: perl, sort are unbeaten for this...
> Process each piece in a parallel step.
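The cutting step can be sketched in Java as follows. This is only an illustration under assumptions: the class LineChunker is hypothetical, and it works on lines already held in memory, whereas the slide suggests external tools such as perl or sort for real files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a list of input lines into pieces for parallel steps.
class LineChunker {
    // Splits the input into consecutive pieces of at most chunkSize lines.
    static List<List<String>> split(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int from = 0; from < lines.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, lines.size());
            // Copy the slice so that each step owns its piece independently.
            chunks.add(new ArrayList<>(lines.subList(from, to)));
        }
        return chunks;
    }
}
```

Each returned piece would then be handed to one step; the last piece may be shorter than the others.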

Parallelize processing
> Speedup
> Summarize accounting data and store result in database again.
How to achieve
> Group data in chunks of 10,000 and process each chunk in a parallel step.
> Choose grouping criteria carefully:
  No overlapping data areas
  Pass along data that you had to read for the grouping process

Parallelize processing: how to group
> Structuring your data in parallelizable chunks
> Load balancing
> Parallelize processing by client as data is distinct by design.
How to achieve
> Group by client
> Group by keys: ranges or ids
  Ranges (1..5) can grow very large
  Keys (1, 2, 3, 4, 5) can become very many
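Grouping by client could look roughly like the sketch below. The types CardTxn and ClientGrouping are hypothetical and exist only for this illustration; the point is that every group holds the data of exactly one client, so groups do not overlap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type, for illustration only.
class CardTxn {
    final String clientId;
    final long amountCents;

    CardTxn(String clientId, long amountCents) {
        this.clientId = clientId;
        this.amountCents = amountCents;
    }
}

class ClientGrouping {
    // Builds one group per client; each group can become one parallel step.
    static Map<String, List<CardTxn>> byClient(List<CardTxn> txns) {
        Map<String, List<CardTxn>> groups = new LinkedHashMap<>();
        for (CardTxn txn : txns) {
            groups.computeIfAbsent(txn.clientId, key -> new ArrayList<>()).add(txn);
        }
        return groups;
    }
}
```

Because the records are already attached to their group, a step does not have to read them again, which matches the advice to pass along data read during grouping.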

Parallelize writing
> Transactional integrity while writing files.
> Easy recovery while writing files.
> Collect data for the payment file.
How to achieve
> Collect data in parallel and write to a staging table.
> Staging table content very close to target file format.
> In a last step dump entire content of staging table to file.

Different processes write in parallel
> Don't lock out each other
> Account information changes while account balance grows.
How to achieve
> No optimistic locking
> Modify deltas on sums and counters
> Keep distinct fields for different parallel jobs
> Be aware of deadlock potential

Avoid insert and update in same table and step
> Speedup
> Avoid DB locks
> Summary rows in same table as the raw data.
How to achieve
> Normalize your database.

Let the database work for you
> Simple code
> Speedup
> Sorting or joining arrays in memory.
How to achieve
> Code review.
> Book SQL course.

Read long, write short
> Keep lock contention on database minimal
> Keep transactional DB overhead minimal
> Fully process the whole batch of 1,000 records before starting to write to DB.
How to achieve
> 1 (one) "writing" database transaction per step.
interface IModifyingStepRunner {
    void prepareData();  // read and compute, no writes yet
    void writeData();    // the single writing transaction
}
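How a framework might drive such a step runner is sketched below. Only the interface comes from the slide (method names written in camelCase here); RecordingTransaction, SummingStepRunner and StepExecutor are hypothetical stand-ins, and the "transaction" merely records the order of events instead of talking to a database.

```java
import java.util.ArrayList;
import java.util.List;

interface IModifyingStepRunner {
    void prepareData(); // long phase: read and compute, no write transaction open
    void writeData();   // short phase: only writes, inside the single write transaction
}

// Hypothetical stand-in for a database transaction; it only records events.
class RecordingTransaction {
    final List<String> log = new ArrayList<>();
    void begin() { log.add("begin"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

// Hypothetical step: sums amounts in memory, then "writes" the total.
class SummingStepRunner implements IModifyingStepRunner {
    private final List<Long> amounts;
    private final RecordingTransaction tx;
    private long total; // result is held in memory until writeData()

    SummingStepRunner(List<Long> amounts, RecordingTransaction tx) {
        this.amounts = amounts;
        this.tx = tx;
    }

    public void prepareData() {
        for (long amount : amounts) {
            total += amount;
        }
        tx.log.add("prepared");
    }

    public void writeData() {
        tx.log.add("write total=" + total);
    }
}

class StepExecutor {
    // Runs the long phase first, then wraps only the write phase in a transaction.
    static void run(IModifyingStepRunner step, RecordingTransaction tx) {
        step.prepareData();
        tx.begin();
        try {
            step.writeData();
        } catch (Throwable t) {
            tx.rollback();
            throw t;
        }
        tx.commit();
    }
}
```

The design choice illustrated here is that locks are only held between "begin" and "commit", which covers the short write phase and nothing else.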

This omelet did not taste like grandma's!
> Despite following the recipe, there are hidden corners
> Let's have a look at some pitfalls

Don't forget to catch Error
> Application integrity delegated to DB
> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.
How to fix
try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    // Throwable, not Exception: an Error must trigger the rollback too
    transactionManager.rollback(status);
    throw err;
}
transactionManager.commit(status);
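The effect of catching Throwable can be tried out without a database. The sketch below is an assumption-laden illustration: FakeTransactionManager and TransactionRunner are hypothetical names, and the manager only records whether commit or rollback was called.

```java
import java.util.function.Supplier;

// Hypothetical minimal transaction manager that only records the outcome.
class FakeTransactionManager {
    boolean committed;
    boolean rolledBack;

    void commit() { committed = true; }
    void rollback() { rolledBack = true; }
}

class TransactionRunner {
    // Commits only if the action completes normally; rolls back on anything thrown.
    static <T> T execute(FakeTransactionManager tm, Supplier<T> action) {
        T result;
        try {
            result = action.get();
        } catch (Throwable err) {
            // Throwable covers Error subclasses such as OutOfMemoryError,
            // which "catch (Exception e)" would let slip past the rollback.
            tm.rollback();
            throw err; // precise rethrow: only unchecked types can arrive here
        }
        tm.commit();
        return result;
    }
}
```

An action that throws OutOfMemoryError leaves the manager rolled back and not committed, and the error still propagates to the caller.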

Use BufferedReader / BufferedWriter
> Speedup (file reading time cut in half)
> Forgot to use BufferedReader in file reading framework.
How to fix
> Code review.
> Profile if performance "feels not right".
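For reference, wrapping the raw stream in a BufferedReader looks like this. The class BufferedLineCounter is a hypothetical example, and UTF-8 is an assumed encoding; the slides do not say which encoding the input files use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedLineCounter {
    // Counts lines, reading through a BufferedReader instead of the raw stream.
    static long countLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

The buffer turns many small reads into few large ones; how much time that saves depends on the file system and file size, so profiling remains the way to confirm it.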

Use 1 thread only
> Simplicity for the programmer
> Safety (no concurrent access)
> Singleton, synchronized blocks, static variables, stateful step runners: we had it all...
How to achieve
> Configure framework to use one JVM per worker.

Cache wisely
> Speedup
> Limit memory use
> Tax rates do not change during a processing day: cache them long.
> Customer data will be reused when processing transactions of the same customer: cache it short.
How to achieve
> Cache per worker
> Cache lifetimes: worker / step / on demand
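A per-worker cache with two of the lifetimes named above might be sketched like this. WorkerCache, its Lifetime enum and the endOfStep hook are hypothetical; the sketch relies on the one-thread-per-JVM setup, so plain HashMaps are used without synchronization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-worker cache; not thread-safe by design (one thread per JVM).
class WorkerCache<K, V> {
    enum Lifetime { WORKER, STEP }

    private final Map<K, V> workerScoped = new HashMap<>(); // lives as long as the worker
    private final Map<K, V> stepScoped = new HashMap<>();   // cleared after every step

    // Returns the cached value or loads it once and stores it for the given lifetime.
    V get(K key, Lifetime lifetime, Function<K, V> loader) {
        Map<K, V> target = (lifetime == Lifetime.WORKER) ? workerScoped : stepScoped;
        return target.computeIfAbsent(key, loader);
    }

    // To be called by the (hypothetical) framework whenever a step ends.
    void endOfStep() {
        stepScoped.clear();
    }
}
```

Tax rates would go in with the WORKER lifetime and customer data with STEP, so memory used for customers is released as soon as the step is done.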

Support JDBC batch operations
> Speedup
List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);
How to achieve
> Enhance your database layer with a built-in JDBC batch facility.
> Execute batch after 1000 items added.
> Automatically re-run failed batch using single JDBC statements
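The flush-every-N and fallback logic can be shown independently of a database. In the sketch below, BatchingWriter is a hypothetical class and the two callbacks are placeholders: in a real database layer the batch callback would typically call PreparedStatement.addBatch() per item followed by executeBatch(), and the single callback would issue one executeUpdate() per item. It also assumes that a failed batch leaves no partial writes behind, for example because it was rolled back before the re-run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching facility; the executors are supplied by the database layer.
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchExecutor; // writes many items in one round trip
    private final Consumer<T> singleExecutor;      // writes one item, used as fallback
    private final List<T> pending = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> batchExecutor, Consumer<T> singleExecutor) {
        this.batchSize = batchSize;
        this.batchExecutor = batchExecutor;
        this.singleExecutor = singleExecutor;
    }

    // Queues an item and flushes automatically once the batch is full.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Writes everything queued so far; call once more at the end of the step.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            batchExecutor.accept(batch);
        } catch (RuntimeException batchFailure) {
            // Re-run item by item so the offending item fails on its own.
            for (T item : batch) {
                singleExecutor.accept(item);
            }
        }
    }
}
```

With a batch size of 1000, a caller adds items one by one and calls flush() at the end; the single-statement re-run makes it visible which item caused the batch to fail.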

Structured patching
> Risk management
> Stay agile in production
> Bug found, fixed and unit tested. Deploy to production asap.
How to achieve
> Eclipse wizard to create patch (all files involved to fix a bug)
> Patch script that applies .class file / SQL script / whatever...

Never, ever, update primary keys
> Good database design
> Speedup
> Homemade library always wrote entire row to database.
How to fix
> Only write changed fields (dirty flags).
> Make primary keys immutable on your objects.
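Dirty flags and an immutable key can be combined in a small entity class. This is a sketch under assumptions: the class TrackedAccount, the table name account and the column names are hypothetical and not taken from the slides.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity: the primary key is final, setters record changed columns.
class TrackedAccount {
    private final long id; // primary key: no setter, cannot change after construction
    private String owner;
    private long balanceCents;
    private final Set<String> dirtyColumns = new LinkedHashSet<>();

    TrackedAccount(long id, String owner, long balanceCents) {
        this.id = id;
        this.owner = owner;
        this.balanceCents = balanceCents;
    }

    long getId() { return id; }

    void setOwner(String newOwner) {
        if (!Objects.equals(owner, newOwner)) {
            owner = newOwner;
            dirtyColumns.add("owner");
        }
    }

    void setBalanceCents(long newBalance) {
        if (balanceCents != newBalance) {
            balanceCents = newBalance;
            dirtyColumns.add("balance_cents");
        }
    }

    // Builds an UPDATE for the changed columns only; null means nothing to write.
    String buildUpdateSql() {
        if (dirtyColumns.isEmpty()) {
            return null;
        }
        StringBuilder sql = new StringBuilder("UPDATE account SET ");
        String separator = "";
        for (String column : dirtyColumns) {
            sql.append(separator).append(column).append(" = ?");
            separator = ", ";
        }
        return sql.append(" WHERE id = ?").toString();
    }
}
```

Since the key column never appears in the SET clause, the database is never asked to rewrite the primary key, and unchanged rows produce no statement at all.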

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

Future
> Scalability is an issue with a single database server.
  Partitioning options used, but not to the end.
  Will Moore's law save us again?
> Processing double the volume still to be proven...

If you remember just three things...
Java batch processing works and is cool :-)
Trade-offs:
> Do not stock the work, start.
> Single threaded, many JVMs.
> Designing for scalability, stability needs experts.
http://www.google.ch/search?q=how+to+flip+an+omelet

Stefan Rufer, Netcetera AG, stefan.rufer@netcetera.ch, www.netcetera.ch
Matthias Markwalder, SIX Card Solutions, matthias.markwalder@six-group.com, www.six-group.com

Links / References
> http://en.wikipedia.org/wiki/Batch_processing
> http://static.springframework.org/spring-batch/
> http://www.bmc.com/products/offering/control-m.html
> http://www.javaspecialists.eu/
And to really learn how to bake fine omelets, buy a book:
> http://de.wikipedia.org/wiki/Marianne_Kaltenbach
> http://www.oreilly.de/catalog/geeksckbkger/

Other batch processing frameworks (public only)
> http://www.bmap4j.org/
> http://freshmeat.net/projects/jppf
> http://hadoop.apache.org/