make connections, share ideas, be inspired

Use SAS to process and analyze your data in Hadoop
Mikael Turvall

Architecture

[Architecture diagram: SAS Visual Analytics and SAS Visual Statistics, SAS In-Memory Statistics for Hadoop. Components shown:
- Web-based clients: SAS VA/VS, SAS Studio
- Metadata server (optional), mid-tier, workspace server
- Blade environment: SAS LASR Analytic Server as the in-memory store
- MPP datastore: Hadoop (Cloudera, Hortonworks) with the SAS Embedded Process
- Data sources: Hadoop, Teradata, Pivotal, Oracle, RDBMS, nonrelational, click stream, PC files, other]

Why?

- Hadoop as a platform for data
- Hadoop as the core of the next-generation analytics platform

[Diagram: the analytics lifecycle. Identify / formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate / monitor results]

From data to decisions

- Manage data: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Data Loader for Hadoop
- Explore data: SAS Visual Analytics, SAS In-Memory Statistics for Hadoop
- Develop models: SAS HPA products, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Enterprise Miner
- Deploy & monitor: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop

Get started quickly

Possibilities
- Transparent access to Hadoop tables through ordinary SAS libraries
- You can program in SAS SQL and the SAS DATA step as usual
- You can manage Hadoop from SAS:
  - Native HDFS commands
  - MapReduce, Pig, and HiveQL

Benefits
- You do not need to be an expert in Hadoop-specific syntax
- Switching to Hadoop is as easy as changing a libname
- Existing SAS programs, reports, etc. can be reused
- Many different ways of accessing the data give IT different options for using the capacity

YOU CAN START TODAY

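As an illustration of managing Hadoop from SAS with native HDFS commands, here is a minimal sketch using PROC HADOOP. The configuration file location, the credentials, the local file, and the HDFS paths are all placeholders chosen for this example. Depending on the SAS release, the configuration may instead be supplied with CFG= or through environment variables, so treat this as a starting point rather than a verified setup.

```sas
/* Hypothetical location of the Hadoop cluster configuration file */
filename cfg 'C:\users\hadoop_config.xml';

/* Run HDFS commands from SAS; user, password and paths are placeholders */
proc hadoop options=cfg username='hadoop' password='hadoop' verbose;
   /* Create a directory in HDFS */
   hdfs mkdir='/user/hadoop/demo';
   /* Copy a local file (hypothetical) into the new HDFS directory */
   hdfs copyfromlocal='C:\temp\sales.csv'
        out='/user/hadoop/demo/sales.csv';
run;
```

The same procedure also accepts MAPREDUCE and PIG statements, which is how MapReduce jobs and Pig scripts can be submitted from a SAS session.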
Where do I get Hadoop?

SAS/ACCESS to Hadoop

[Diagram: the SAS server sends HiveQL to Hadoop]

Move parts of the job into Hadoop

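One way to move part of a job into Hadoop is explicit SQL pass-through, where the inner query is written in HiveQL and runs in Hive, and only the result comes back to SAS. The sketch below reuses the connection values from the deck's libname example; the table `employees` and its column `dept` are hypothetical.

```sas
proc sql;
   /* Connection values taken from the deck's libname example */
   connect to hadoop (server="sascldserv02" port=10000
                      user=hadoop password="hadoop");

   /* The inner query is HiveQL and is executed by Hive.       */
   /* Table EMPLOYEES and column DEPT are hypothetical names.  */
   create table work.dept_counts as
   select *
   from connection to hadoop
      ( select dept, count(*) as n_rows
        from employees
        group by dept );

   disconnect from hadoop;
quit;
```

Only the aggregated rows are transferred to the SAS server, which is the point of pushing the work down.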
Getting started with Hadoop

    libname elefant hadoop
       PORT=10000
       SERVER=sascldserv02
       USER=hadoop
       PASSWORD="hadoop";

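Once the libname is assigned, Hive tables can be used like any other SAS table. The following sketch assumes a hypothetical Hive table `employees` with a column `dept`. The SASTRACE options write the SQL that SAS/ACCESS generates to the log, which makes it possible to check how much of the query was passed down to Hive.

```sas
/* Assign the library as in the deck's example */
libname elefant hadoop port=10000 server=sascldserv02
        user=hadoop password="hadoop";

/* Show the SQL that SAS/ACCESS sends to Hive in the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* List the tables visible through the library */
proc contents data=elefant._all_ nods;
run;

/* Ordinary SAS SQL; EMPLOYEES and DEPT are hypothetical names */
proc sql;
   create table work.dept_summary as
   select dept, count(*) as n_rows
   from elefant.employees
   group by dept;
quit;
```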
Hadoop FILENAME statement

Define a fileref:

    FILENAME hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
       cfg="c:\users\hadoop_config.xml"
       user='hadoop';

Use it as usual:

    DATA my_analysis_data;
       INFILE hdpfile1;
       INPUT;
    RUN;

Note: do not move ALL of the data over into a SAS table.

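In line with the warning about not copying everything, a DATA step can filter while it reads, so that only the rows of interest end up in a SAS table. This is a minimal sketch: the line length of 200 and the search term are arbitrary choices made for illustration.

```sas
/* Fileref as defined on the slide */
filename hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
   cfg="c:\users\hadoop_config.xml"
   user='hadoop';

data work.matching_lines;
   infile hdpfile1 truncover;
   /* Read each line as text; 200 characters is an assumed maximum */
   input line $char200.;
   /* Keep only lines containing the (arbitrary) search term */
   if find(line, 'science', 'i') > 0;
run;
```

Note that the file is still streamed to the SAS server line by line; the filter only limits what is stored.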
Hadoop File Reader

SAS 9.4 can read non-Hive files as tables.

File formats
- Delimited
- CSV
- XML
- JSON (experimental)
- Binary files

Multiple files in a directory

Hadoop File Reader

Define a libname:

    libname HDP hadoop user=hadoop pw=hadoop
       config       = '/home/sasinst/hadoop_config.xml'
       hdfs_tempdir = '/user/hadoop/tmp'
       hdfs_metadir = '/user/hadoop/metadata'
       hdfs_permdir = '/user/hadoop/dataload';

Specify the file format:

    proc hdmd name=hdp.pipedata_dept
       format=delimited
       sep = '|'   /* separator lost in the source; pipe assumed from the file name */
       DATA_FILE='pipedata_dept.txt';
       COLUMN col1 int;
       COLUMN col2 char(15);
    run;

Use it as usual:

    proc print data=hdp.pipedata_dept;
    run;

DI Studio

- Access data in Hadoop
- Transform data inside Hadoop using HiveQL
- Create new data in Hadoop

SPDE - Traditional file system

    libname spdlib spde '/path';
    proc print data=spdlib.mytab;
    run;

[Diagram: SPDE opens, reads, and closes each of the files mytab.mdf, mytab.dpf1, and mytab.dpf2 on the file system]

SPDE - Hadoop HDFS

    libname spdlib spde '/path' hdfshost=default;
    proc print data=spdlib.mytab;
    run;

[Diagram: SPDE opens, reads, and closes mytab.mdf, mytab.dpf1, and mytab.dpf2 through the HDFS client. The HDFS client gets the data block locations from the NameNode and then gets the data (blocks M1, D1, D2) from the DataNodes]

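Writing to an SPDE library in HDFS looks the same as writing to any other SAS library. The sketch below assumes that the SAS session already knows where the Hadoop configuration and JAR files are (typically through the SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH environment variables) and that the HDFS directory, which is a made-up path here, exists and is writable.

```sas
/* Hypothetical HDFS directory for the SPDE files */
libname spdlib spde '/user/hadoop/spde' hdfshost=default;

/* Write a small sample table into HDFS in SPDE format */
data spdlib.class;
   set sashelp.class;
run;

/* Read it back as usual */
proc print data=spdlib.class(obs=5);
run;
```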
Next step - SAS jobs in Hadoop

[Diagram: the SAS server sends SAS DATA step and DS2 code to Hadoop]

- SAS Data Loader for Hadoop
- SAS Code Accelerator for Hadoop
- SAS Scoring Accelerator for Hadoop

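To give an idea of what running DS2 inside Hadoop can look like, here is a sketch of a DS2 thread program submitted with DS2ACCEL=YES. It assumes that SAS Code Accelerator for Hadoop and the SAS Embedded Process are installed on the cluster and that the `elefant` Hadoop library from the earlier slide is assigned. The table `patients` and its columns `weight` and `height` are hypothetical.

```sas
proc ds2 ds2accel=yes;
   /* Thread program: the part that is eligible to run inside Hadoop */
   thread compute_bmi / overwrite=yes;
      dcl double bmi;
      method run();
         /* PATIENTS, WEIGHT and HEIGHT are hypothetical names */
         set elefant.patients;
         bmi = weight / (height * height);
      end;
   endthread;

   /* Data program: collects the thread output into a new Hadoop table */
   data elefant.patients_bmi (overwrite=yes);
      dcl thread compute_bmi t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Because both the input and the output table live in Hadoop, the rows do not need to pass through the SAS server when the code runs in the cluster.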
SAS Data Director / SAS Data Loader for Hadoop

[Screenshot: "What directive do you want to perform?" with the following directives]

- Saved Directives: Open a previously created directive to run, view, or edit.
- Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
- Chain Directives Together: Run a number of directives in a specific order.
- Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
- Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
- Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
- Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
- Transform Data in Hadoop: Transform the data in a Hadoop data file.
- Verify Mailing Address: Check the validity of the mailing address data in a table.
- Profile Data: Create a report profiling the data in a table.
- Generate Business Rules: Analyze data in a table and generate business rules.
- Send Data for Remediation: Select data to send to the remediation queue for further action.

mikael.turvall@sas.com