Hadoop
MPDL Breakfast, 9 December 2013
MPDL internal
Understanding Hadoop
What's Hadoop about?
- Apache Hadoop project (top-level Apache project since 2008)
- downloadable open-source software library (current release: 2.2.0)
- a framework for reliable, scalable, distributed computing that processes large data sets using simple programming models

Distributed Computing Keywords
- parallel computing, massively parallel computing, cluster computing, grid computing, high-performance computing
- fault-tolerant, highly available, dynamic, flexible distributed systems
- distributed file systems, distributed queries, distributed databases

Hadoop's Base Components
- utilities (Hadoop Common)
- distributed file system (HDFS)
- job scheduling and resource management (YARN)
- parallel processing system (MapReduce)
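The "simple programming model" behind Hadoop's parallel processing is MapReduce: the user writes a mapper and a reducer, and the framework handles grouping and distribution. The following is a minimal single-process sketch in plain Python. It is not Hadoop's Java API, and all function names are hypothetical. It simulates the map, shuffle and reduce stages of the classic word-count job.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str], mapper: Callable[[str], List[Pair]]) -> List[Pair]:
    """Apply the mapper to each input record independently.

    In Hadoop, map tasks run in parallel on different nodes; here they run sequentially.
    """
    pairs: List[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all intermediate values by key (the framework's "shuffle" step)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Call the reducer once per key, visiting keys in sorted order."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> List[Pair]:
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word: str, counts: List[int]) -> int:
    """Sum the partial counts collected for one word."""
    return sum(counts)


def run_job(records: Iterable[str],
            mapper: Callable[[str], List[Pair]],
            reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Chain the three stages into one (hypothetical) job run."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


if __name__ == "__main__":
    lines = ["Hadoop stores data", "Hadoop processes data"]
    print(run_job(lines, word_count_mapper, word_count_reducer))
```

Running the script prints `{'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}`. Only the mapper and reducer are job-specific. Because each mapper call looks at a single record and each reducer call at a single key, a framework is free to spread both across many machines.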
Hadoop Use Cases
- large-scale users (AOL, Facebook, Google, IBM, Twitter)
- processing large data sets (analysis, validation, conversion, filtering, aggregation)
- data storage (universal, parallel, scalable, replicated, distributed file system)

Conclusions
- we should set up a distributed file system (which is much more than a cloud) and perhaps a replicated database (possibly built on top of it)
- but not necessarily Hadoop, because there are other options
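A replicated, distributed file system can be illustrated with a toy model. Files are cut into blocks, and each block is stored on several nodes so that the data survives the loss of a machine. The sketch below is simplified and uses hypothetical names, a hypothetical round-robin placement and a tiny block size. Real HDFS uses much larger blocks, a rack-aware placement policy and, by default, three replicas per block.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if not 0 < replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """The file stays readable if every block keeps at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas) for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=4)
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes, replication=2)
    print(placement)
    print(readable_after_failure(placement, {"node2"}))
    print(readable_after_failure(placement, {"node2", "node3"}))
```

With two replicas per block, losing one node never makes the file unreadable. Losing two nodes can, if they happen to hold both copies of the same block. This is the trade-off that a higher replication factor addresses.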