Big Data Drupal. Commercial Open Source Big Data Tool Chain



Transcription:

Big Data Drupal Commercial Open Source Big Data Tool Chain

How did I prepare?

MapReduce Field Work

About Me: Nicholas Roberts; 10+ years web; Webmaster, Project & Product Manager; Australian; Sonoma County; www.niccolox.org; Niccolo on drupal.org

What Is Big Data? Big data sets: batched or streamed; planet scale data; Google search index; Facebook social graph; intelligence. Technology: Hadoop (Spark); HDFS; HBase; MapReduce; Oozie; Zookeeper.

How Big? 1,000 gigabytes is a terabyte. 1,000 terabytes is a petabyte. 1,000 petabytes is an exabyte. 1,000 exabytes is a zettabyte. 1,000 zettabytes is a yottabyte.
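The ladder of units above is plain powers of 1,000, which a few lines of Python can make concrete. This is only an illustrative sketch: the function names `gigabytes_in` and `largest_unit` are made up for this example and do not come from the talk.

```python
# Decimal storage units from the slide, each 1,000 times the previous one.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]


def gigabytes_in(unit: str) -> int:
    """Return how many gigabytes make up one `unit`."""
    steps = UNITS.index(unit)  # number of 1,000x steps above a gigabyte
    return 1000 ** steps


def largest_unit(gigabytes: float) -> tuple[float, str]:
    """Express a size given in gigabytes in the largest unit that keeps the value below 1,000."""
    value, unit = float(gigabytes), UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1000:
            break
        value, unit = value / 1000, next_unit
    return value, unit


if __name__ == "__main__":
    print(gigabytes_in("petabyte"))   # 1000000
    print(largest_unit(2_500_000))    # (2.5, 'petabyte')
```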

Google Oregon

Utah Data Center

Big Data Drupal? Intelligent Automation, Inc: NEIMiner. NEIMiner consists of four components. The NEI modeling framework defines the scope of NEI modeling and the strategy of integrating NEI models to form a layered, comprehensive predictability similar to the Framework for Risk Analysis of Multi-Media Environmental Systems (FRAMES). The data integration layer brings together heterogeneous data sources related to NEI via automatic web services and web scraping technologies. The data management and access layer reuses and extends a popular Content Management System (CMS), Drupal, and consists of modules that model and enable interactions over a complex data structure for NEI related bibliography and characterization data. The model discovery and composition layer provides an analysis capability for NEI data.

Software Developer. DESIRED SKILLS: Experience in database modeling and access methodologies, SQL and persistent object technologies; ability to utilize cloud computing and Hadoop technologies; expertise with LAMP, PHP and Drupal; knowledge of graphical user interface design and GIS systems; experience in mobile app development on Android OS. JOB DUTIES: Support software development for projects in multiple areas of data mining, informatics, human-computer interface, mobile app development and cloud computing. https://home2.eease.adp.com/recruit2/?id=6069912&t=1

BigDataDrupal.com

BigDataDrupal.org

Tool-chain Proxmox KVM vm server Debian 7 Solr 3.6 Jetty Aegir BOA Cloudera Nutch 1.6 MapReduce 4 Nutch jobs Hadoop Hue search facets Aegir BOA Open Outreach ApacheSolr Nutch Multisite Views

Proxmox: KVM / OpenVZ virtualization server; Debian 7 based distro; bare-metal installer or Debian 7 packages; per-CPU-socket commercial licenses; AGPL open source

Cloudera: 50% of engineering donated to open source; Doug Cutting; Jeff Hammerbacher; Cloudera Manager; Hadoop ecosystem distro; Hue; Impala; etc.

Hadoop: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Google, Yahoo, Facebook, Twitter, Amazon, NSA

MapReduce parallel processing of large data sets
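To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word count, written in plain Python, not against the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. On a real cluster the map and reduce calls would run in parallel on many machines and the framework would do the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data Drupal"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be split across machines without coordination, which is where the parallelism comes from.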

Hadoop MapReduce Job UI

Hue: Hadoop UI

Nutch: Apache Nutch is a highly extensible and scalable open source web crawler. Nutch 1.x: a mature, production-ready crawler. 1.x enables fine-grained configuration, relying on Apache Hadoop data structures, which are great for batch processing. Nutch 2.x: draws inspiration from 1.x but differs in one key area: storage is abstracted away from any specific underlying data store by using Apache Gora for handling object-to-persistent mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) in a number of NoSQL storage solutions.
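The storage abstraction described for Nutch 2.x can be pictured with a small Python sketch. This is not Apache Gora's API (Gora is a Java library); `PageStore`, `InMemoryStore` and `record_fetch` are hypothetical names used only to show the pattern. The crawler code talks to an interface, and any backend that satisfies the interface can be swapped in.

```python
from typing import Optional, Protocol


class PageStore(Protocol):
    """Hypothetical storage interface: the crawler depends only on these two methods."""

    def put(self, url: str, record: dict) -> None: ...

    def get(self, url: str) -> Optional[dict]: ...


class InMemoryStore:
    """Toy backend for illustration; a real one might wrap HBase or another NoSQL store."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def put(self, url: str, record: dict) -> None:
        self._rows[url] = dict(record)  # copy so later caller mutations do not leak in

    def get(self, url: str) -> Optional[dict]:
        return self._rows.get(url)


def record_fetch(store: PageStore, url: str, status: str, content: str) -> None:
    """Crawler-side code: stores fetch results without knowing which backend is used."""
    store.put(url, {"status": status, "content": content})


if __name__ == "__main__":
    store = InMemoryStore()
    record_fetch(store, "http://example.org/", "fetched", "<html></html>")
    print(store.get("http://example.org/"))
```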

Aegir: Drupal hosting; Drupal, Drush based; Provision; Aegir BOA: commercial open source installer; Open Hosting Platform As A Service; Omega8cc

Open Outreach

Drupal & ApacheSolr: Nutch 1.6 > Solr 3.6 > Drupal; ApacheSolr Integration Module; ApacheSolr Examples; ApacheSolr Nutch Multisite; FacetAPI; FacetAPI Pretty Paths; ApacheSolr Views
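On the Drupal side, search facets come down to a faceted query against the Solr index that Nutch has filled. The sketch below only builds such a query URL using Solr's standard `q`, `rows`, `wt`, `facet` and `facet.field` parameters. The host, the `/solr` path and the field names `site` and `type` are assumptions for illustration, not values from the talk, and the Drupal modules listed above normally generate these requests for you.

```python
from urllib.parse import urlencode


def build_facet_query(base_url: str, text: str, facet_fields: list[str], rows: int = 10) -> str:
    """Build a Solr select URL that asks for matching documents plus facet counts."""
    params = [
        ("q", text),        # the search text
        ("rows", rows),     # number of documents to return
        ("wt", "json"),     # response format
        ("facet", "true"),  # turn faceting on
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and field names.
    print(build_facet_query("http://localhost:8983/solr", "drupal", ["site", "type"]))
```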

Future: Search API; Cloudera CDK / Kiji; Remote Entities API; Kettle / Pentaho; Nutch 2.x / HBase; Mulesoft; Drupal 8; RapidMiner; HyperDrupal; R; GuzzlePHP; Bonita Soft; REST / Thrift API

Thanks & Credits: Mitchell Tannenbaum, Forest Mar, Chris McCafferty, Peter Wolanin, David Stuart, Ryan Szrama, Doug Cutting

Contacts: Nicholas Roberts; www.niccolox.org; Niccolo.roberts@gmail.com; 1. 510 684 8264; Sonoma County; www.BigDataDrupal.com; www.BigDataDrupal.org