Evaluating Apache Hadoop with the #TalendSandbox for Big Data
Julien Clarysse, @whatdoesdatado, @talend
2015 Talend Inc.
Connecting the Data-Driven Enterprise
Talend Overview
- Founded in 2006
- 400 employees in 7 countries
- Dual HQ in Los Altos, CA and Paris, France
- Open core business model: subscription license, services & training
Growth stages on a 2007 to 2014 timeline: brand awareness, vibrant community, customer loyalty, monetization
Key messages
- Talend helps data-driven companies succeed
- Easiest and fastest computation with native code generation
- State-of-the-art open source technology standards
- Start small, think big, with a future-proof unified architecture
- Predictable investment with the lowest TCO
The Power of Hadoop
Hadoop Strategy: Divide and Conquer
Monolithic: hugely expensive, precious engineering, single points of failure, geographically isolated
Parallel: commodity hardware, self-healing and redundant, geographically distributed
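To make the divide-and-conquer idea concrete, here is a minimal, single-process Python sketch of the MapReduce pattern it rests on: the input is split into chunks, each chunk is mapped independently (on a cluster, each could run on a different commodity node), the intermediate pairs are grouped by key, and a reduce step combines them. The function names and the default chunk size are illustrative assumptions; this is neither Hadoop's API nor code generated by Talend.

```python
from collections import defaultdict
from typing import Iterable


def map_chunk(lines: list[str]) -> list[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group the intermediate values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce phase: combine all values that share a key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: list[str], chunk_size: int = 2) -> dict[str, int]:
    # Divide: split the input into fixed-size chunks (HDFS likewise stores large files as blocks).
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Conquer: on a cluster each chunk could be mapped on a different node;
    # here the chunks are simply processed one after another in a single process.
    intermediate = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return reduce_counts(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big value", "hadoop"]))
    # prints {'big': 3, 'data': 2, 'value': 1, 'hadoop': 1}
```

Because the map step only ever sees one chunk, the result does not depend on how the input is split, which is what lets the work be spread across many cheap machines instead of one expensive one.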
Examples of Hadoop Use Cases
Big data (velocity, variety, volume) → analysis → value
Typical analyses:
- Social media
- Customer/clickstream
- Operational/server logs
- Fraud & compliance
- Sensor/machine
- Geographic
- etc.
Inserting Hadoop into Your IT Environment
Sources: NoSQL, web logs, IoT, ERP, DBMS/EDW, legacy systems
Hadoop distro + metadata
Consumers: NoSQL, DWH/data marts, standard reports, ad-hoc query tools, data mining, MDD/OLAP, analytical applications
Drivers: data explosion, batch to real-time, longer active data
The Value of Talend
The Hadoop Ecosystem Today
Ambari, Chukwa, Drill, Flume, Ganglia, Giraph, Hadoop, HBase, HCatalog, HDFS, Hive, MapReduce, Mahout, Oozie, Pig, Spark, Sqoop, Storm, Whirr, YARN, ZooKeeper
Source: j2eedev.org
Brief History of Hadoop and Talend
Hadoop:
- 2006: Apache project established
- Enterprise Hadoop distribution vendors: Hortonworks, Cloudera, Pivotal, 10Gen
- 2014: New performance capabilities
- Widely adopted concepts and technologies
Talend:
- 2006: 1st open source integration software
- 2008: 1st on Hadoop (HDFS + MapReduce)
- 2014: 1st on YARN, Hive, Spark and Storm
- 2015: Preferred solution for big data integration
→ Talend matches and supports the Hadoop ecosystem natively
Talend Big Data Integration
Design, Scale, Collaborate, Manage, Deploy
- Visual, drag-and-drop UI
- 800+ pre-built connectors
- Generates MapReduce, Java or SQL
- Runs at cluster scale
- Load balancing & failover
- Code optimization
- Supports big data management consoles
- Integrates with native security
- Native support for Kerberos
- Centralized scheduling, monitoring and management
- Shared repository
- Auto-documentation
- Zero Talend install on Hadoop
- Cleanse and enrich
Zero to Big Data in 10 Minutes!
Talend Big Data Sandbox
→ Free virtual image* including:
- A ready-to-run Platform for Big Data (30-day evaluation included)
- A distribution of Apache Hadoop based on Cloudera, Hortonworks or MapR
- A step-by-step Big Data Insights Cookbook with four ready-to-run big data scenarios
* Runs on Oracle VirtualBox 4.2+, VMware Fusion 5.0+ (Mac) or VMware Player (Windows)
Talend Big Data Architecture
Sources: NoSQL, web logs, Internet of Things, ERP, DBMS/EDW, legacy systems
Ingestion into Talend Big Data: map, profile, parse, match, cleanse, standardize, share, native change data capture, machine learning
Develop and test in the Studio; the operations team schedules
Access: NoSQL, standard reports, ad-hoc query tools, data mining, MDD/OLAP, analytical applications
Benefits: increased productivity, lowest TCO, future-proof architecture
Talend Big Data Scenarios
- Clickstream analysis
- Sentiment analysis with social media data
- Log stream analysis using Apache weblogs
- ETL offloading with Hadoop
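For the log stream analysis scenario, the first step is usually turning raw Apache web server log lines into fields. The Python sketch below is an illustration, not the Cookbook's implementation: it assumes lines in the Apache Common Log Format, parses them with a regular expression, and counts requests per HTTP status code, skipping lines that do not match.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Apache Common Log Format: host ident authuser [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def status_counts(lines: Iterable[str]) -> Counter:
    """Count requests per HTTP status code, skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["status"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
        "not a log line",
    ]
    print(status_counts(sample))
```

Since `re.match` only anchors at the start of the line, lines in the Combined Log Format also parse; their trailing referrer and user-agent fields are simply ignored. In a Hadoop setting, `parse_line` is the kind of per-record logic that would run in the map step.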
Example Scenario: Sentiment Analysis with Social Media Data
Sentiment Analysis: Overview
#Hashtag → Twitter API → sentiment dictionary → time zones → Google Geocharts
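The scoring and aggregation steps of this pipeline can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the sandbox's actual job: it assumes each tweet has already been fetched from the Twitter API into a dict with hypothetical `text` and `time_zone` keys, scores each tweet by summing the values of its words in a tiny made-up sentiment dictionary, and averages the scores per time zone. The final rows use a header-first table layout; mapping time zones to the regions a Google Geocharts map expects is left out.

```python
import re
from collections import defaultdict

# Hypothetical miniature sentiment dictionary (word -> score); a real job
# would load a much larger word list.
SENTIMENT_WORDS = {"great": 1, "love": 1, "fast": 1, "slow": -1, "bad": -1, "hate": -1}


def score_tweet(text: str) -> int:
    """Sum the dictionary scores of the words in a tweet; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(SENTIMENT_WORDS.get(word, 0) for word in words)


def sentiment_by_time_zone(tweets: list[dict]) -> dict[str, float]:
    """Average tweet score per time zone; tweets without a time zone are skipped."""
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for tweet in tweets:
        zone = tweet.get("time_zone")
        if not zone:
            continue
        totals[zone] += score_tweet(tweet["text"])
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def to_chart_rows(averages: dict[str, float]) -> list[list]:
    """Header row first, then one [time zone, average] row per zone, sorted by name."""
    return [["Time zone", "Average sentiment"]] + [
        [zone, averages[zone]] for zone in sorted(averages)
    ]


if __name__ == "__main__":
    tweets = [
        {"text": "Love the #CeBIT keynote, great demo", "time_zone": "Berlin"},
        {"text": "WiFi is slow and bad", "time_zone": "Berlin"},
        {"text": "#IoT is great", "time_zone": "London"},
        {"text": "No time zone on this one", "time_zone": None},
    ]
    print(to_chart_rows(sentiment_by_time_zone(tweets)))
    # prints [['Time zone', 'Average sentiment'], ['Berlin', 0.0], ['London', 1.0]]
```

A plain word-list lookup like this ignores negation and sarcasm; it is shown only because it matches the "sentiment dictionary" step named in the overview.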
Sentiment Analysis: Anatomy
Takeaways
Key messages
- Talend helps data-driven companies succeed
- Easiest and fastest computation with native code generation
- State-of-the-art open source technology standards
- Start small, think big, with a future-proof unified architecture
- Predictable investment with the lowest TCO
Do you like it?
Julien Clarysse, Pre-Sales Consultant
Office: +49(0)228 76 37 76 0
Mobile: +49(0)170 5768201
Email: jclarysse@talend.com
Skype: jclarysse_talend
Twitter: whatdoesdatado
www.talend.com
Talend Germany GmbH, Servatiusstraße 53, 53175 Bonn, Germany
Screenshot
Tweets with #CeBIT
Tweets with #IoT