Green Big Data. A Green IT / Green IS Perspective on Big Data





Agenda
  1. Starting Point and Research Question
  2. Subject of Analysis
  3. Research Methodology
  4. Results
  5. Conclusion

Starting Point and Research Question
  Increasing use of the term Big Data in science and in practice
  Inconsistent understanding, resulting from the extensive use of the term in marketing contexts
  Relevance of Big Data differs among scientific disciplines
  Research questions
    Are resource-efficient processes for Big Data applications discussed in recent publications?
    To what extent can aspects of Big Data be applied to EMIS?

The Big Data Concept
  Improved understanding of the notion Big Data by identifying characterizing dimensions using a deductive approach

Selection of Methodology
  Problem: Which method can be used to identify whether, and to what extent, a new concept (Big Data) whose characteristics are not yet consistently defined already exists in a given field of research?
  Traditional Literature Analysis
    Manual identification of recent fields of research within the current EMIS / Green IT / Green IS literature
  Generative Literature Analysis
    Automated identification of recent fields of research within the current EMIS / Green IT / Green IS literature

Data Basis
  Data source: Scopus
  Keywords: EMIS, Green IT, Green IS in Title, Abstract, Keywords
  Period under consideration: 2007-2012
  Number of resulting documents: 1055
  Processed data: Abstracts
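The selection criteria above could be expressed as a Scopus advanced-search string. The following is only a sketch: the slides do not give the exact query, so the way the keywords are combined (a plain OR inside `TITLE-ABS-KEY`) and the helper name `build_scopus_query` are assumptions made for illustration.

```python
def build_scopus_query(keywords, first_year, last_year):
    """Build a Scopus advanced-search string (hypothetical helper).

    Assumes the keywords are OR-combined and searched in title,
    abstract, and keywords; the original query is not documented.
    """
    terms = " OR ".join(f'"{keyword}"' for keyword in keywords)
    # PUBYEAR uses strict comparisons, so widen the bounds by one year.
    return (
        f"TITLE-ABS-KEY({terms}) "
        f"AND PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}"
    )


query = build_scopus_query(["EMIS", "Green IT", "Green IS"], 2007, 2012)
print(query)
```

Keeping the query in one function makes it easy to rerun the search for a different period or keyword set when the data basis is extended.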

Underlying Assumptions of Topic Models
  Topics are probability distributions over words
  A probability distribution over the contained topics can be defined for each document
  Each document is represented by a list of words (word vector)
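The word-vector representation can be illustrated with a small bag-of-words sketch. The tokenizer, the stop-word list, and the two sample abstracts are assumptions chosen for illustration; they are not taken from the study's corpus or preprocessing.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real study would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "is"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def word_vectors(abstracts):
    """Return the sorted vocabulary and one count vector per abstract."""
    tokenized = [tokenize(a) for a in abstracts]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


# Hypothetical sample abstracts.
abstracts = [
    "Green IT reduces energy use in data centers.",
    "Energy-efficient data processing for green data centers.",
]
vocab, vectors = word_vectors(abstracts)
print(vocab)
print(vectors)
```

Word order is discarded in this representation; only the counts per vocabulary entry remain, which is the input a topic model works on.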

Application of Topic Models
  Parameter calculation
    1. Defining an a-priori distribution over topics
    2. Defining an a-priori distribution over words for each topic
    3. Applying Latent Dirichlet Allocation to calculate the latent variables based on a corpus
  Corpus: abstracts of the identified publications
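The three steps can be sketched with a minimal collapsed Gibbs sampler for Latent Dirichlet Allocation. This is not the implementation used in the study, which the slides do not name. The symmetric Dirichlet priors `alpha` (over topics) and `beta` (over words), the function name `lda_gibbs`, and the toy corpus are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Estimate document-topic and topic-word distributions for tokenized docs."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # counts per document and topic
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts per topic and word
    topic_total = [0] * num_topics                              # tokens assigned to each topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the latent distributions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta)
         for v in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus of tokenized abstracts.
docs = [
    ["energy", "cluster", "hadoop", "energy"],
    ["emission", "soil", "water", "emission"],
    ["hadoop", "cluster", "scheduling"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is a document's distribution over topics and each row of `phi` is a topic's distribution over the vocabulary, matching the assumptions stated for topic models. In practice an established library would normally be preferred over a hand-written sampler.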

Results and Discussion of the Topic Models
  The period 2008-2010 is marked by natural-science-related topics
  Rise of an application-oriented perspective since 2011, which implicitly contains the first aspects of the Big Data concept

Hadoop: A Green IT Perspective on Big Data
  Goiri et al. (2012): GreenHadoop: Leveraging green energy in data-processing frameworks
  Mao et al. (2012): GreenPipe: A Hadoop Based Workflow System on Energy-efficient Clouds
  Hadoop
    Based on MapReduce, a framework for distributed computation developed by Google, focused on scalability
    Contains, among other components, a file system (HDFS) and a column-oriented database (HBase); runs on commodity hardware
    Basis for numerous Big Data products
  Application of Hadoop in the field of Green IT
    Energy-efficient control of Hadoop clusters
    Scheduling of MapReduce jobs according to the availability of green energy
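The idea of scheduling jobs according to the availability of green energy can be sketched with a simple greedy heuristic. This is not the algorithm of GreenHadoop or GreenPipe. The job tuples, the per-slot forecast, the one-slot job duration, and the function name `schedule_jobs` are assumptions for illustration only.

```python
def schedule_jobs(jobs, green_forecast):
    """Assign each job to the slot with the most remaining green energy.

    jobs: list of (name, energy_kwh, deadline_slot); each job is assumed to
          fit into a single time slot.
    green_forecast: forecast green energy in kWh per slot (must be non-empty).
    Returns the slot chosen per job and the energy that had to come from
    non-green ("brown") sources.
    """
    remaining = list(green_forecast)
    plan = {}
    brown_used = 0.0
    # Handle the most urgent jobs first (earliest deadline first).
    for name, energy, deadline in sorted(jobs, key=lambda job: job[2]):
        last_slot = min(deadline, len(remaining) - 1)
        # Among the slots up to the deadline, pick the one with the most
        # green energy left; ties go to the earliest slot.
        slot = max(range(last_slot + 1), key=lambda s: remaining[s])
        green = min(energy, remaining[slot])
        remaining[slot] -= green
        brown_used += energy - green
        plan[name] = slot
    return plan, brown_used


# Hypothetical forecast (kWh per slot) and jobs.
forecast = [0.0, 2.0, 5.0, 1.0]
jobs = [("index", 3.0, 3), ("report", 2.0, 1), ("backup", 2.0, 3)]
plan, brown = schedule_jobs(jobs, forecast)
print(plan, brown)
```

Delaying flexible jobs toward slots with a high green-energy forecast reduces the share of brown energy, while the deadline bound keeps urgent jobs from waiting too long.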

Possible areas of application for Big Data in the field of Green IS
  Aspects of Big Data cannot be found in Green IS publications so far
  Outlook
    Possible area of application: exploitation and use of new data sources for calculating environmental impact
    Company-internal
      Potential data sources for the development of Petri nets: event logs from ERP systems and sensor data from the production environment
    Company-external
      Identification of the environmental impact of upstream supply chain members using databases such as Ecoinvent
      Application of text mining / ontologies for the analysis of unstructured data
        Closing the semantic gaps resulting from inconsistent naming standards across product databases
      Incorporating publicly available data sources (Open Data Initiative)
        Data collection of the Sustainability Consortium within the Open IO projects

Results and outlook
  Conclusion
    The generative approach has proved useful for the analysis of emerging research fields
    Big Data cannot be found explicitly in the field of Green IT / Green IS, but has already arrived in the form of Hadoop for Green IT applications
      Focus on the energy-efficient control of Hadoop clusters
  Outlook
    Data basis
      Utilizing further data sources
      Discipline/dimension-specific data gathering (infrastructure, method, etc.)
    Method
      Validation of the results using an intrusion approach

Thank you very much for your attention.