Hadoop. MPDL Breakfast (MPDL-Frühstück), 9 December 2013, MPDL internal


Milo Martin



Understanding Hadoop

What's Hadoop about?
- Apache Hadoop project (Apache top-level project since 2008)
- downloadable open-source software library (current release 2.2.0)
- a framework for reliable, scalable, distributed computing that processes large data sets using simple programming models

Distributed Computing Keywords
- parallel computing, massively parallel computing, cluster computing, grid computing, high-performance computing
- fault-tolerant, highly available, dynamic, flexible distributed systems
- distributed file systems, distributed queries, distributed databases

Hadoop's Base Components
- utilities (Common)
- distributed file system (HDFS)
- job scheduling and resource management (YARN)
- parallel processing system (MapReduce)
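Hadoop's MapReduce component runs a job in three stages: a map phase that turns each input record into key/value pairs, a shuffle/sort phase that groups those pairs by key, and a reduce phase that aggregates each group. The following is a minimal single-process Python sketch of that dataflow using the classic word-count example. It does not use Hadoop's actual (Java) API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. On a real cluster each phase runs in parallel on many nodes, and the shuffle moves intermediate data over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, sorted by key.

    In Hadoop this grouping happens between mapper and reducer nodes;
    here it is simply an in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: aggregate all counts collected for one key."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over all input records."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))
    # → {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because the mapper sees only one record and the reducer only one key at a time, the same two functions could in principle run unchanged on any number of nodes; that independence is what lets MapReduce scale out.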


Hadoop Use Cases
- major users (AOL, Facebook, Google, IBM, Twitter)
- processing large data (analysis, validation, conversion, filtering, aggregation)
- data storage (universal, parallel, scalable, replicated, distributed file system)

Conclusions
- we should set up a distributed file system (which is much more than a cloud) and maybe a replicated database (possibly built on top of it)
- but not necessarily Hadoop (because there are other options)
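HDFS provides replicated, distributed storage by splitting each file into fixed-size blocks and storing several copies of every block on different DataNodes (the default replication factor is 3). The sketch below models only that idea in plain Python. The helper names are hypothetical, the block size is tiny for illustration, and the round-robin placement is a simplifying assumption (real HDFS uses a rack-aware placement policy). It also shows why replication matters: a file stays readable as long as every block has at least one replica on a live node.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes.

    Simplifying assumption: plain round-robin over the node list.
    Real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def is_readable(placement: Dict[int, List[str]], failed_nodes: Set[str]) -> bool:
    """A file stays readable while every block has at least one live replica."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=4)  # 4 + 4 + 2 bytes
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(placement)
    print(is_readable(placement, failed_nodes={"dn1", "dn2"}))         # True
    print(is_readable(placement, failed_nodes={"dn1", "dn2", "dn3"}))  # False
```

With three distinct replicas per block, losing any two nodes never makes a block unavailable, which is the fault tolerance the slide lists under data storage.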

