Mapping the Universe: A Challenge for Big Data Science




Mapping the Universe: A Challenge for Big Data Science
Dr. Utain Sawangwit, National Astronomical Research Institute of Thailand
utane@narit.or.th

Observational cosmology relies on large surveys and, very often, full-sky reconstructions of the physical universe. The tools cosmologists use span a wide range of techniques, from observing the earliest radiation after the Big Bang to exploiting the gravitational lensing effect and supernova explosions. In this talk I will review the challenges that modern cosmology has overcome and those it will face in the near future, as even larger telescopes, and hence bigger surveys and datasets, come online in the coming decades.

DOM: A Big Data Analytics Framework for Mining Thai Public Opinions
Assoc. Prof. Dr. Tiranee Achalakul, King Mongkut's University of Technology Thonburi

This work is part of the From Research to Product program at the Department of Computer Engineering. We, human beings, have never been more connected, thanks to the emergence of social networks. Social networks, in terms of both data and users, have grown exponentially and connect our lives in many dimensions. We can reach people across the planet with the touch of a finger. Every second, hundreds of thousands of messages are shared through social media such as Facebook, Twitter, and Foursquare. They are about our lives, feelings, experiences, and opinions. This is the 21st-century face of our civilization: the era of the social network.

Social media networks generate huge volumes of data, which have been used in applications including public health, emergency coordination, news recommendation, and stock market prediction. The data gathered from social media networks fall under the catch-all term "big data." However, as much as 90% of the data stored is unstructured, meaning that it is spontaneously generated and not easily captured and classified. Big data is only valuable if it tells a story, and the organizations that can use stories to make sense of big data are the ones that will excel.

In this talk, I will present the development of DOM (Data and Opinion Mining), a big data analytics engine capable of mining Thai public opinions on specific issues discussed on social network sites, together with its corresponding mobile solution for answering public opinions about events and locations. The engine ingests data from multiple well-known social network sources, then processes it using MapReduce, a keyword-based sentiment analysis technique, and an influencer analysis algorithm to determine public opinions and sentiments on given topics. The system's sentiment prediction accuracy was evaluated by matching predicted results against human-assigned sentiment, and the engine was tested on various case studies. The effectiveness of the approach demonstrates the practical applicability of the engine.
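The keyword-based sentiment step described in the abstract can be sketched in map/reduce style. The keyword lists, messages, and function names below are illustrative assumptions, not DOM's actual implementation or Thai lexicon:

```python
# Hypothetical sketch of keyword-based sentiment analysis in map/reduce style:
# the map step scores one message by keyword hits; the reduce step aggregates
# labels into counts. Keyword sets and sample messages are invented.
from collections import Counter

POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "bad", "angry"}

def map_sentiment(message):
    """Map step: label one message from positive/negative keyword counts."""
    words = message.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def reduce_sentiment(messages):
    """Reduce step: aggregate per-message labels into overall counts."""
    return Counter(map_sentiment(m) for m in messages)

counts = reduce_sentiment([
    "I love this event",
    "bad traffic angry crowd",
    "the venue is great",
])
print(counts["positive"])  # 2
```

In a real deployment, the map and reduce functions would run as Hadoop MapReduce jobs over the ingested social-media streams rather than over an in-memory list.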

Data Management for the CMS Collaboration
Dr. Kittikul Kovitanggoon, Department of Physics, Faculty of Science, Chulalongkorn University
kovitang.cern@gmail.com

The Compact Muon Solenoid (CMS) experiment at CERN is recognized not only for the complexity of its detector construction, operations, and physics discoveries, but also for the large quantity of data and computing resources involved. One of the experiment's main technical challenges is to organize the enormous data volume collected and to process it efficiently for physics exploration. It would be difficult to meet the computing and storage requirements at any single site. Since most CMS collaborators are spread around the world and have significant local resources of their own, the CMS computing system and environment are built as a distributed system of computing services and resources that interact with each other. The Tier-0 computing centre at CERN is directly connected to the experiment for initial processing and data archiving, while considerable processing is done at remote Tier-1 centres and analysis at Tier-2 and Tier-3 centres, all interconnected using Grid technology. In this talk, I will present data management in the CMS collaboration. Starting with data recording after proton-proton collisions at CERN's Tier-0, the presentation will explain how the collaboration handles this large amount of data and distributes it across the world to Tier-1, Tier-2, and Tier-3 centres via the Grid. The collaboration of Chulalongkorn University (CU), as a member of the CMS experiment, with the National Electronics and Computer Technology Center (NECTEC) on the development of a CMS Tier-2 at the National Science and Technology Development Agency (NSTDA) and a Tier-3 at CU will also be presented.

Reproducible e-Science Experiments with Docker
Asst. Prof. Dr. Chanwit Kaewkasi, Department of Computer Engineering, School of Engineering, Suranaree University of Technology
Email: chanwit@sut.ac.th

e-Science involves complex computational research, and it is usually hard for reviewers to prepare a computing environment exactly the same as the authors'. Nature Biotechnology has announced that it is seeking a more consistent way to evaluate computational tools, and the journal is exploring the adoption of Docker, a software tool that allows authors to share their computing environment. We discuss how to use Docker for e-science research, and how our team at SUT's Aiyara Cluster laboratory contributes to the Docker project.
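As a minimal sketch of the idea, an author could capture the experiment's environment in a Dockerfile so reviewers can rebuild it exactly. The base image, file names, and entry script here are illustrative assumptions, not a setup described by the authors or the journal:

```dockerfile
# Illustrative only: pin a base image and dependencies so a reviewer can
# rebuild the authors' exact computing environment with `docker build`.
FROM python:3.9-slim
WORKDIR /app
# Pinned dependency list (hypothetical file) installed first for layer caching.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# The experiment code itself; run_experiment.py is a hypothetical entry point.
COPY . .
CMD ["python", "run_experiment.py"]
```

A reviewer would then run `docker build -t experiment .` followed by `docker run --rm experiment` to reproduce the computation without manually recreating the environment.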

HPC Experiences from the ASC15 Asian Student Supercomputer Challenge and Implications for HPC Education
Asst. Prof. Dr. Putchong Uthayopas, Assoc. Prof. Dr. Chantana Chantrapornchai, HPCNC, Department of Computer Engineering, Faculty of Engineering, Kasetsart University
Email: putchong@ku.th, chantrapornchai@gmail.com

The ASC Student Supercomputer Challenge is an annual supercomputing competition held mostly in China, whose goal is to promote HPC applications and education worldwide. This year's competition, ASC15, was the largest in the world, with more than 150 universities registered. Our team from Kasetsart University was chosen as one of 16 teams worldwide to compete in the final round. Each team had to design and construct a small HPC system and tune it for selected HPC applications under a power budget of 3,000 watts. This presentation shares our experiences in preparation and competition, focusing on how we maximized the performance of leading-edge HPC applications such as HPL, NAMD, WRF-Chem, Palabos, Gridding, and HPCG while keeping power consumption under 3,000 watts. This includes an optimal HPC system design that balances extreme performance against power consumption, as well as application performance tuning. The experience is very valuable for educating the future generation of HPC engineers.

Image Post-processing of Weather Forecast Models with High Performance Computing
Dr. Prattana Deeprasertkul, Hydro and Agro Informatics Institute (Public Organization)
prattana@haii.or.th

The Weather Research and Forecasting (WRF) modeling system, the Regional Atmospheric Modeling System (RAMS), the Regional Ocean Modeling System (ROMS), the coupled WRF-ROMS system, and the Simulating WAves Nearshore (SWAN) model are designed and developed for understanding the climate system and forecasting the weather. These models produce forecast outputs every day, over various timescales and domains. A large volume of images is then generated to display the hourly and daily forecasts through the image post-processing stage of the modeling systems. This image-processing computation requires larger-scale computing capability, so High Performance Computing architectures are needed to process these large volumes of images. Moreover, a parallel implementation of the geometric image transformation algorithm will be applied to run the models' post-processing on a cluster computer.
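The parallel post-processing idea can be sketched as workers each applying a geometric transformation to one image tile. This is a minimal single-machine illustration, not HAII's cluster implementation; the tile data and rotation function are invented:

```python
# Minimal sketch of parallelizing a geometric image transformation:
# each worker rotates one tile, and tiles are processed concurrently,
# mirroring per-image work distributed across a cluster.
from concurrent.futures import ThreadPoolExecutor

def rotate90(tile):
    """Rotate a tile (list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]

tiles = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]

# Map the transformation over all tiles in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    rotated = list(pool.map(rotate90, tiles))

print(rotated[0])  # [[3, 1], [4, 2]]
```

On a cluster, the same map pattern would be distributed across nodes (e.g. via MPI or a batch scheduler), with each node handling a subset of the forecast images.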