Defense Industry & Open Source & BigData




Elbit Systems Land and C4I: Defense Industry & Open Source & Big Data

Lecturer: German Gavrilov (germang@elbit.co.il), Elbit Systems Land and C4I, Intelligence Directorate, Cyber Domain

Defense Industry & Open Source & Big Data
(Diagram labels: Big Data, Open Source, Defense Industry)

Agenda
- Need: growth in global data volume
- Need: intelligence systems
- What is Big Data?
- 3V Model of Big Data
- Scale up / Scale out
- CAP theorem
- Types of solutions
- The Apache Hadoop project
- HDFS
- MapReduce
- Hadoop projects
- Example architecture of an information system built with Hadoop

Need: Growth in Global Data Volume
- Twitter produces over 340 million tweets per day, with over 500 million registered users as of 2012.
- Over 32 billion searches were performed on Twitter last month.
- Facebook creates over 30 billion pieces of content, ranging from web links and news to blogs and photos.
- Zynga processes 1 petabyte of content for players every day.
- More than 2 billion videos are watched on YouTube every day.
- By 2015, nearly 3 billion people will be online, pushing the data created and shared to nearly 8 zettabytes.

Need: Growth in Global Data Volume
[Figure: quantity of global data]

Need: Intelligence Systems
- Ability to ingest large volumes of data in a short time (near real-time)
- Ability to ingest different types of data
- Ability to process large volumes of information
- Ability to run different analyses, each tailored to the type of information
- Ability to investigate information and present it clearly, quickly and conveniently
The customer wants to be able to read the information that exists in the world conveniently.

Need: Intelligence Systems
[Figure: examples of pictures people uploaded to their Twitter accounts]

What Is Big Data?
What is data? Data is information in raw or unorganized form, such as letters, numbers or symbols.
What is Big Data? Big Data refers to datasets so large that they are difficult to store, manage and analyze. Every day we create about 2.5 quintillion bytes of data. So much is created that 90% of the data in the world today has been created in the last two years alone.

What Is Big Data?
- O'Reilly Radar definition: "Big data is when the size of the data itself becomes part of the problem."
- EMC/IDC definition of big data: "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis."
- IBM says that three characteristics define big data:
  - Volume (terabytes -> zettabytes)
  - Variety (structured -> semi-structured -> unstructured)
  - Velocity (batch -> streaming data)

3V Model of Big Data

Distributing Data Across Machines
- Scale up / vertical scaling: adding resources to a single node in a system, typically more CPUs or memory in a single computer.
- Scale out / horizontal scaling / distributed systems: adding more nodes to a system, such as adding a new computer to a distributed software application.
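The following minimal Python sketch makes horizontal scaling concrete. It assumes records are spread over nodes by hashing each key and taking the result modulo the number of nodes. The node names and the `node_for` / `partition` helpers are hypothetical and not taken from any particular product.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Choose the node that stores `key` (hypothetical modulo-hash placement).

    hashlib is used instead of the built-in hash() because Python randomizes
    string hashes per process, which would make placement unstable.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node responsible for them."""
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"record-{i}" for i in range(12)]
# Scaling out: the same 12 records spread over 2 nodes, then over 3 nodes.
for cluster in (["node-a", "node-b"], ["node-a", "node-b", "node-c"]):
    counts = {node: len(owned) for node, owned in partition(keys, cluster).items()}
    print(counts)
```

This approach has a design caveat. With plain modulo hashing, changing the number of nodes moves most keys to a different node. For this reason, many distributed stores use consistent hashing instead.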

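CAP theorem: a distributed data store can guarantee at most two of Consistency, Availability and Partition tolerance.
- CA: RDBMSs (MySQL, …), Greenplum, Vertica, Aster Data
- CP: HBase, MongoDB, Terrastore, BigTable, MemcacheDB
- AP: Cassandra, CouchDB, SimpleDB, Dynamo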
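A toy Python model can illustrate the trade-off the CAP theorem describes. It assumes a store with just two replicas and a flag that simulates a network partition. `ToyStore` and its "CP"/"AP" modes are hypothetical simplifications, not how any of the listed databases is actually implemented.

```python
class ToyStore:
    """Two replicas of one key-value store (hypothetical toy, not a real database)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # each replica is a plain dict
        self.partitioned = False   # True simulates a network partition between replicas

    def write(self, replica: int, key: str, value: str) -> bool:
        if not self.partitioned:
            for data in self.replicas:  # healthy network: update both copies
                data[key] = value
            return True
        if self.mode == "CP":
            return False  # refuse the write so replicas stay consistent (availability lost)
        self.replicas[replica][key] = value  # stay available; the other copy is now stale
        return True

    def read(self, replica: int, key: str):
        return self.replicas[replica].get(key)


for mode in ("CP", "AP"):
    store = ToyStore(mode)
    store.write(0, "x", "1")
    store.partitioned = True
    accepted = store.write(0, "x", "2")
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
# CP False 1 1
# AP True 2 1
```

During the partition, the CP store gives up availability, because the second write is rejected. The AP store gives up consistency, because the two replicas now return different values.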

Types of Solutions (store type: description; conceptual structure)
- Key-value stores: schema-less. Structure: key -> value.
- Column-oriented databases: storage by column. Example, with each attribute stored as its own column:
    Target: Israel | Italy | Turkey
    Weight: 2.85 kg | 1.23 kg | 3.76 kg
    Price: $24.00 | $17.50 | $27.30
- Graph databases: use nodes and edges to represent data. Structure: data nodes connected to one another.
- Document-oriented databases: store semi-structured documents; often XML databases. Structure: key -> structured document (XML).
- Sharded RDBMS (MPP databases): data spread across several RDBMS instances. Structure: RDBMS | RDBMS | RDBMS.
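The difference between row and column storage can be sketched in Python using the slide's example data. The sketch assumes an in-memory dictionary of lists stands in for on-disk column files. This is a simplification, since real column stores add compression and encoding. The `to_columns` helper is hypothetical.

```python
# Row-oriented layout: each record's fields are kept together.
rows = [
    {"target": "Israel", "weight_kg": 2.85, "price_usd": 24.00},
    {"target": "Italy", "weight_kg": 1.23, "price_usd": 17.50},
    {"target": "Turkey", "weight_kg": 3.76, "price_usd": 27.30},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
# An aggregate over one attribute only has to scan that attribute's column.
average_price = sum(columns["price_usd"]) / len(columns["price_usd"])
print(columns["target"])        # ['Israel', 'Italy', 'Turkey']
print(round(average_price, 2))  # 22.93
```

Reading only the `price_usd` column is the point of column orientation. Analytical queries that touch a few attributes of many records avoid reading the rest of each row.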

Types of Solutions: Comparison
(columns: performance / horizontal scalability / flexibility in data variety / complexity of operation / functionality)
- Key-value stores (Berkeley DB, Scalaris, MemcacheDB): high / high / high / none / variable (none)
- Column-oriented databases (Cassandra, HP Vertica, BigTable, HBase, OrientDB): high / high / moderate / low / minimal
- Graph databases (Neo4j, InfiniteGraph, Titan, OrientDB): variable / variable / high / high / graph theory
- Document-oriented databases (CouchDB, MongoDB, SimpleDB, Redis): high / variable (high) / high / low / variable (low)
- Sharded RDBMS (MPP) (HP Vertica, EMC Greenplum, Aster Data): variable / variable / low / moderate / relational

The Apache Hadoop Project
- hadoop.apache.org: "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model."
- wikipedia.org: "Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Hadoop provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. It enables applications to work with thousands of computation-independent computers and petabytes of data."
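The idea that each fragment of work "may be executed or re-executed on any node" can be sketched as a simple retry loop. This is a toy scheduler under stated assumptions: node failures are simulated by a hypothetical `failing_nodes` set, and fragments are assigned round-robin. Hadoop's real scheduler also weighs data locality and detects failures through heartbeats.

```python
def run_fragments(fragments, nodes, failing_nodes, work):
    """Run each fragment on some node, re-executing it elsewhere if that node fails.

    `failing_nodes` is a hypothetical stand-in for crashed machines.
    Returns {fragment_index: (node_that_succeeded, result)}.
    """
    results = {}
    for i, fragment in enumerate(fragments):
        # Start at a round-robin position so work is spread across the cluster.
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            if node in failing_nodes:
                continue  # the attempt is lost with the node; try the next node
            results[i] = (node, work(fragment))
            break
        else:
            raise RuntimeError(f"fragment {i} failed on every node")
    return results


results = run_fragments([[1, 2], [3, 4], [5, 6]], ["n1", "n2", "n3"], {"n2"}, sum)
print(results)  # {0: ('n1', 3), 1: ('n3', 7), 2: ('n3', 11)}
```

Because a fragment has no side effects beyond its result, it is safe to run it again on another node. That property is what lets the framework recover from a lost machine simply by re-executing the lost work.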

The Apache Hadoop Project
Companies that ship Hadoop in their products:
- IBM: InfoSphere BigInsights
- Oracle: Big Data Appliance
- EMC: Pivotal HD
- Microsoft: HDInsight
- Others
Organizations using Hadoop to run large distributed computations: Facebook.com, Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, eBay, Hortonworks, Federal Reserve Board of Governors, Foursquare, Yahoo!, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, Riot Games, The New York Times, SAP AG, SAS Institute, StumbleUpon, Twitter, Yodlee, Fox Interactive Media, Gemvara, Google, Hewlett-Packard, IBM.

The HDFS Project
Apache Hadoop HDFS (Hadoop Distributed File System) is a distributed, scalable and portable file system. It is designed to store large amounts of data across the many servers of a cluster.
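The following sketch shows how HDFS splits a file into blocks and keeps several copies of each block. The 128 MB block size and replication factor of 3 are the usual defaults (`dfs.blocksize`, `dfs.replication`) in Hadoop 2.x and later. The round-robin placement and the `plan_blocks` helper are hypothetical simplifications; real HDFS uses a rack-aware placement policy decided by the NameNode.

```python
import math

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # dfs.blocksize default in Hadoop 2.x and later
REPLICATION = 3        # dfs.replication default


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` distinct nodes.

    Placement is plain round-robin (an assumption for illustration only).
    Returns a list of (block_index, block_length_in_bytes, [datanodes]).
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = max(1, math.ceil(file_size / block_size))
    plan = []
    for b in range(num_blocks):
        length = min(block_size, file_size - b * block_size)  # last block may be short
        nodes = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append((b, length, nodes))
    return plan


for block, length, nodes in plan_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
    print(block, length // MB, nodes)
# 0 128 ['dn1', 'dn2', 'dn3']
# 1 128 ['dn2', 'dn3', 'dn4']
# 2 44 ['dn3', 'dn4', 'dn1']
```

Losing any single DataNode in this example still leaves at least two copies of every block. Replication is how HDFS tolerates machine failures.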

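The MapReduce Project
Apache Hadoop MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster. It proceeds in three steps:
- A map step turns input records into intermediate key/value pairs.
- The framework groups those pairs by key.
- A reduce step combines the values for each key.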
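MapReduce's data flow (map, shuffle/group by key, reduce) can be sketched in plain Python with the classic word-count example. This runs in a single process and only mimics the data flow. On a real cluster, the map and reduce functions would be written against Hadoop's Java MapReduce API or run as scripts through Hadoop Streaming, and the shuffle would happen over the network. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data big clusters", "Data on clusters"]))
# {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```

Each `map_phase` call depends only on its own line, and each `reduce_phase` call only on one key's values. That independence is what allows the engine to spread both phases across many nodes.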

The Apache Hadoop Project: Components
- Data access / query abilities: Pig (simple query language), Hive (SQL-like queries), Cascading (software abstraction layer), Mahout (machine learning), Hama (scientific computation), Avro (data serialization system)
- MapReduce / distributed processing: Hadoop MapReduce implementation
- Management tools: Ambari (deploying, managing and monitoring tool), Sqoop (data transfer tool), Oozie (workflow scheduler system), ZooKeeper (coordination service), Flume (framework for populating Hadoop)
- Storage / data structure: Hadoop Distributed File System, Hue (file browser for HDFS), HBase (column-oriented database), HCatalog (table/storage management service)

Hadoop Ecosystem

Example: Architecture of an Information System Built with Hadoop

Summary
- Growth in global data volume
- The need for intelligence systems
- Big Data solutions
- Implementation with Apache Hadoop
Thank You!