Big Data Optimization at SAS



Similar documents
Why is SAS/OR important? For whom is SAS/OR designed?

TOMLAB - For fast and robust largescale optimization in MATLAB

Introduction: Models, Model Building and Mathematical Optimization The Importance of Modeling Langauges for Solving Real World Problems

Dantzig-Wolfe bound and Dantzig-Wolfe cookbook

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

Optimization Theory for Large Systems

ANALYTICS IN BIG DATA ERA

The Master s Degree with Thesis Course Descriptions in Industrial Engineering

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Nonlinear Optimization: Algorithms 3: Interior-point methods

Integrating Benders decomposition within Constraint Programming

Data-Driven Decisions: Role of Operations Research in Business Analytics

ANALYTICS IN BIG DATA ERA

Big Data - Lecture 1 Optimization reminders

Fast Analytics on Big Data with H20

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) (

Make Better Decisions with Optimization

Report: Declarative Machine Learning on MapReduce (SystemML)

Support Vector Machines Explained

Tutorial: Operations Research in Constraint Programming

GAMS, Condor and the Grid: Solving Hard Optimization Models in Parallel. Michael C. Ferris University of Wisconsin

SAS for the Department of Business Studies, ASB

Research Statement Immanuel Trummer

High Performance Predictive Analytics in R and Hadoop:

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008

Efficient and Robust Allocation Algorithms in Clouds under Memory Constraints

The Gurobi Optimizer

Solving convex MINLP problems with AIMMS

OPTIMIZED STAFF SCHEDULING AT SWISSPORT

RevoScaleR Speed and Scalability

Optimized Scheduling in Real-Time Environments with Column Generation

Technical Paper. Performance of SAS In-Memory Statistics for Hadoop. A Benchmark Study. Allison Jennifer Ames Xiangxiang Meng Wayne Thompson

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Advanced In-Database Analytics

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

Simulation-based Optimization Approach to Clinical Trial Supply Chain Management

Machine Learning over Big Data

Noncommercial Software for Mixed-Integer Linear Programming

Large-Scale Reservoir Simulation and Big Data Visualization

INTEGER PROGRAMMING. Integer Programming. Prototype example. BIP model. BIP models

Simplified Benders cuts for Facility Location

Solving NP Hard problems in practice lessons from Computer Vision and Computational Biology

Massive Data Classification via Unconstrained Support Vector Machines

Using diversification, communication and parallelism to solve mixed-integer linear programs

Load Balancing. Load Balancing 1 / 24

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

A fast multi-class SVM learning method for huge databases

A Survey of Shared File Systems

Federated Optimization: Distributed Optimization Beyond the Datacenter

Mesh Generation and Load Balancing

Support Vector Machines with Clustering for Training with Very Large Datasets

Distributed R for Big Data

Solving Large Scale Optimization Problems via Grid and Cluster Computing

Support Vector Machine (SVM)

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

Compression algorithm for Bayesian network modeling of binary systems

Best practices for efficient HPC performance with large models

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Performance and Scalability Overview

An Introduction on SemiDefinite Program

A Branch and Bound Algorithm for Solving the Binary Bi-level Linear Programming Problem

An Improved Dynamic Programming Decomposition Approach for Network Revenue Management

Understanding the Benefits of IBM SPSS Statistics Server

The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS?

Several Views of Support Vector Machines

Industrial and Systems Engineering (ISE)

Chapter 13: Binary and Mixed-Integer Programming

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Lecture 11: 0-1 Quadratic Program and Lower Bounds

Optimal Scheduling for Dependent Details Processing Using MS Excel Solver

Distributed R for Big Data

A numerically adaptive implementation of the simplex method

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

Proximal mapping via network optimization

Optimization applications in finance, securities, banking and insurance

Interior-Point Algorithms for Quadratic Programming

Using the analytic center in the feasibility pump

Advanced Big Data Analytics with R and Hadoop

Optimization Modeling for Mining Engineers

High-Performance Analytics

MODELS AND ALGORITHMS FOR WORKFORCE ALLOCATION AND UTILIZATION

Long-Term Security-Constrained Unit Commitment: Hybrid Dantzig Wolfe Decomposition and Subgradient Approach

A Constraint Programming based Column Generation Approach to Nurse Rostering Problems

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1

Multi-layer MPLS Network Design: the Impact of Statistical Multiplexing

Optimization approaches to multiplicative tariff of rates estimation in non-life insurance

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Solving polynomial least squares problems via semidefinite programming relaxations

Transcription:

Big Data Optimization at SAS Imre Pólik et al. SAS Institute Cary, NC, USA Edinburgh, 2013

Outline 1 Optimization at SAS 2 Big Data Optimization at SAS The SAS HPA architecture Support vector machines Quantile regression Marketing Optimization Local search optimization 3 Distributed/parallel optimization Decomposition Miscellaneous tools 4 Future plans

About SAS The company Leader in business analytics software and services About $3 billion worldwide revenue Largest private software company in the world World's Best Multinational Workplace in 2012 More than 11,000 employees, 400 oces and 600 alliances SAS customers or their aliates represent over 90% of the top 100 FORTUNE 500 companies The software Originally created for basic statistics by professors at NCSU Extended tremendously over the decades Covers all aspects of analytics and business intelligence

SAS/OR Oerings Optimization modelling and solvers Algebraic Modelling Language with all the usual solvers (LP, QP, MILP, NLP, CP, Scheduling, Decomposition,... ) Other tools Graph and network algorithms Discrete event simulation + the rest of SAS Solutions Marketing Optimization, Service Parts Optimization, Revenue Optimization, Size Optimization,... Services Technical Support, Training, Professional Services, Consulting Platforms Windows, Linux, Solaris x64/sparc, HPUX, AIX, z/os

What is Big Data? Nathan Brixius (in a recent blog post) A big data analytics application is simply an analytics application where SAS the required data does not t on a single machine and needs to be considered in full to produce a result. Big data is relative; it applies whenever an organization's need to handle, store and analyze data exceeds its current capacity. Related concepts/tools large-scale optimization distributed optimization parallel optimization

High Performance Analytics at SAS

Parallel Implementations and Determinism Non-determinism a dirty word in the commercial world Sources of non-determinism adding columns in a dierent order aggregating results in a dierent order arbitrary random number seeds dierent machines in the pool time limits Workarounds operations in a xed order deterministic criteria (nodes, iterations) deterministic ticks (see Xpress/Cplex) Determinism comes with a performance penalty.

Big Data Optimization at SAS Quadratic programming Support vector machine Linear programming Quantile regression Mixed-integer linear programming Marketing optimization Derivative-free optimization Local search optimization

Support Vector Machines http://www.cac.science.ru.nl/people/ustun/svm.jpg

Linear SVM Problem Primal formulation minimize w,z,β subject to 1 2 w 2 2 + τet z Y w βd e z z 0. Dual formulation minimize v e T v + 1 2 vt Y Y T v subject to d T v = 0, 0 v τe.

Using Primal-Dual Interior-Point Approach Dominant cost per iteration is forming/solving Newton system Many more observations than columns/features (I + Y T Ω 1 k Y v kvk T ) w = r w Y T Ω 1 Y must be formed every iteration k Many more columns/features than observations ( Y Y T ) ( ) ( ) + Ω k d v ρβ d T = 0 β Y Y T constant for all iterations r Ω

SVM: Parallel Performance

SVM: Features and Plans Features frequency/weight term iterative (PCG) or direct (Cholesky, threaded) method to solve the Newton system balance threads to avoid cache misses balance number of compute nodes to limit communication In progress nonlinear SVM build a distributed QP solver Available soon in SAS/OR

Quantile Regression Goal Approximate the median or some other quantile of the response variable of a number of observations min τu + + (1 τ)u A(β + β ) + u + + u = b β +, β, u +, u 0 A: observations in rows, fully dense b: response variable τ: quantile level Problem size Up to 10 8 observations each of dimension 10 4

Quantile Regression Features distributed IPM, similar to the SVM case Newton system solved directly or iteratively dierent preconditioner Plans categorical variables sparse observations nonlinear quantile regression build a distributed dense LP/IPM solver Available soon in SAS/OR

Marketing Optimization Problem Assign ads/oers to customers based on budget, policy, user preferences, history and other kinds of constraints. Formulation Typical sizes: millions of customers, hundreds of oers Formulated as a MILP with millions of binary variables Solution Special decomposition Subproblems are solved distributed on the grid Subgradient algorithm for the master Available as a SAS solution.

Marketing Optimization Typical data (telecommunications) 15 million customers 910 communications 14 aggregate constraints 19 rolling contact policies (per day, per week, per month) 90 million oers in the contact history Performance Used to take 10 hours on a single machine with regular MO Solved in 2 minutes on an EMC Greenplum appliance (32 nodes, 24 threads, 48GB ram) Allows for scenario analysis

Local Search Optimization Algorithm GA-guided pattern search Continuous, discrete and categorical variables Up to about 100 variables Implementation Classical: each worker evaluates the function at a given point Big data: each worker computes part of the function value from its own data, then these are aggregated Function value cache Leading to simulation-based optimization Available as part of SAS/OR

Parallel Optimization Not Big Data, but uses the same infrastructure Decomposition Multistart NLP Option tuner for MILP

Decomposition Outline Algorithm Dantzig-Wolfe Decomposition embedded in B&B a specic variant of column generation Relax A and subproblem B becomes tractable (even separable) Find convex combinations of extreme points of subproblems that satisfy the continuous relaxation of the master constraints Iterate between master A (reformulated space) and B Blocks From user, network or auto Available in SAS/OR A 1 A 2 A K B 1 B 2... B K

Decomposition Parallel Implementation Implementation Shared (Threaded) and/or Distributed Memory (Gridded) Subproblems use a standard queue Areas of parallelism Branch & Bound Heuristics (non-blocking price-and-branch) Subproblem solves (across subproblems) Master solve (IPM or concurrent) Subproblem solves (for each subproblem) Factors aecting parallel performance percentage of time in subprob vs master (modeling) load balance aggregate subproblems enforce balance with time limits? (non-deterministic) MPI overhead jobs must be signicant

Other parallel optimization applications Network tools Graph centrality, community detection Social network analysis for fraud detection Marketing analysis for telecommunications Multistart NLP (Global optimization) Standard NLP solvers started from dierent points Function evalutions and solvers are distributed Option tuner for MILP Find the best option setting for a set of MILP problems Continuous, discrete and categorical options are all included

Future Plans Extend the list of HP enabled procedures driven by customers' needs distributed LP/QP distributed graph algorithms parallel MILP parallel solves in OPTMODEL simulation-based optimization

Thank you for your attention. imre@polik.net