BIG DATA @ EMITLAB & CIDSE. K. Selçuk Candan candan@asu.edu





Name: K. Selçuk Candan
- Professor of Computer Science and Engineering at (CIDSE) ASU
- Director, Enterprise, Media, and Information Technologies Labs (EmitLab)
- Fulton Schools of Engineering Exemplar Faculty
- Senior Sustainability Scientist, Global Institute of Sustainability

EmitLab
- Xiaolan Wang, Ex-MS (now at U.Mass)
- Sriram Rathinavelu, Ex-MS
- Mijung Kim, Ex-PhD (now at HP Labs)
- Aneesha Bhat, MS
- Jung Hyun Kim, PhD
- Mithila Nagendra, Ex-PhD (now at Akamai)
- Yash Garg, MS
- Parth Nagarkar, PhD
- Xinsheng Liu, PhD
- Sicong Liu, PhD
- Marco Berchiatti, MS (U. Torino)
- Shengyu Huang, PhD
- Adam Tse, Undergrad
- Xilun Chen, PhD
- Leonardo Allisio, MS (U. Torino)
- Silvestro Poccia, Research Technologist
- Ilaria Dal Grande, MS (U. Torino)
- Rosaria Rossini, PhD (U. Torino)
- KSC
- Maria Luisa Sapino, Professor (U. Torino)
- Claudio Schifanella, Ex-Post-doc (now at RAI)
- Antonio Penta, Post-doc (U. Torino)

Research Overview

Recent Relevant Grants/Projects:
- [NSF] National Science Digital Library (NSDL) Middleware for Network- and Context-aware Recommendations
- [KRA] A Framework for Real-time Context Monitoring in Sensor-rich Personal Mobile Environments
- [NSF] AURA: Design of Dense RFID Systems for Indexing in the Physical World across Space, Time, and Human Experience

Ongoing Grants/Projects:
- [with SHESC, NSF] Data Management for Real-Time Data Driven Epidemic Simulations
- [with SHESC, NSF] Understanding the Evolution Patterns of the Ebola Outbreak in West-Africa and Supporting Real-Time Decision Making and Hypothesis Testing through Large Scale Simulations
- [NSF] RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis
- [with JCI, NSF] Analysis and Optimization for Building Energy Management
- [NSF] An Infrastructure to Support Complex Financial Patterns (CFP) based Real-Time Services Delivery and Visual Analytics
- [NSF] One Size Does Not Fit All: Empowering the User with User-Driven Integration
- [NSF-IGERT] Person-centered Technologies and Practices for Individuals with Disabilities

What do I do?
- Executive Committee member, ACM Special Interest Group on Management of Data (SIGMOD)
- Associate editor, ACM Transactions on Database Systems (TODS)
- Associate editor, IEEE Transactions on Multimedia
- Associate editor, the Very Large Data Bases journal (2005-2012)
- Associate editor, Journal of Multimedia
- General Chair, IEEE International Conference on Cloud Engineering (IC2E) 2015
- Workshops Chair, International Conference on Extending Database Technology (EDBT) 2014
- Organizing Committee Member, ACM SIG Multimedia Conference 2013
- Panels Chair, Very Large Data Bases (VLDB) Conference 2012
- Publicity Chair, ACM SIG Multimedia Conference 2012
- General Chair, ACM SIGMOD Conference 2012
- General Chair, ACM SIG Multimedia Conference 2011
- Program Group leader, ACM SIG Management of Data (SIGMOD) Conference 2010
- PC Chair, the ACM International Conference on Image and Video Retrieval (CIVR) 2010
- PC Chair, Workshop on Information & Software as Services (WISS) 2010
- Chair, Workshop on Information & Software as Services (WISS) 2009
- Chair, Workshop on Real-Time Business Intelligence (RTBI) 2009
- PC Chair, ACM Workshop on Ambient Media Computing (iwam) 2009
- PC Chair, ACM SIG Multimedia Conference 2008

Today, the amount of data being generated is massive. This necessitates engineering of new data architectures with lots of processing power and tools that can match the scale of the data and support split-second decision making, through data fusion and integration and analysis and forecasting algorithms, to help non-data-experts (both government and commercial) make decisions and generate value.

"Hunting for the Value Gaps in Data Management, Services, and Analytics," ACM SIGMOD blog; http://wp.sigmod.org/

Challenges
- Cisco estimates we'll see 1.3 zettabytes of traffic annually over the internet in 2016.
- Sensors from a Boeing jet engine create 20 terabytes of data every hour.
- 500 terabytes of new data of all forms are ingested in Facebook every day.

3Vs: [V]olume, [V]elocity, [V]ariety
ISQP: [I]mprecision, [S]parsity, [Q]uality, [P]rivacy
HMLE: [H]igh-dimensional, [M]ulti-modal, inter-[L]inked, [E]volving
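To put the three volumes quoted above on a common footing, they can be converted into average sustained rates. The sketch below is only an illustration: it assumes decimal (SI) units and a 365-day year, neither of which is stated on the slide, and the helper name `bytes_per_second` is made up for this example.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 ZB = 10**21 bytes.
TB = 10**12
ZB = 10**21

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def bytes_per_second(total_bytes: int, seconds: int) -> float:
    """Average rate needed to move total_bytes within the given period."""
    return total_bytes / seconds


# Figures as quoted on the slide; 1.3 ZB is written as 13 * ZB // 10
# to stay in exact integer arithmetic.
rates = {
    "internet traffic (1.3 ZB/year)": bytes_per_second(13 * ZB // 10, SECONDS_PER_YEAR),
    "jet engine sensors (20 TB/hour)": bytes_per_second(20 * TB, SECONDS_PER_HOUR),
    "Facebook ingest (500 TB/day)": bytes_per_second(500 * TB, SECONDS_PER_DAY),
}

for label, rate in rates.items():
    print(f"{label}: {rate / 10**9:,.2f} GB/s")
```

Under these assumptions, the jet engine and Facebook figures both work out to between 5 and 6 GB/s on average, while the internet-wide figure is several orders of magnitude higher.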

[Figure: topic map of data management and analytics areas. Labels, in extraction order:]
Management; Analytics; Dimensionality reduction/feature selection; Classification, clustering; Summarization; Visual analytics; Feature extraction/media analysis; Temporal/spatial analysis; Text analysis/NLP; Web/social networks; Recommender systems; Scalable/real time; Performance and Scalability; Consistency, quality, cleaning; models; Organization and Schema; Integration; Cloud, DaaS; Streaming; Parallel/Distributed DM; MapReduce/Hadoop; Pregel/Hama; Other parallel DBMS; Multitenant, Virtualization; Security, privacy, assurance; Mobile, Sensor; Visualization; Extraction, filtering; Rowstores; Column Stores; Key-value stores; NoSql; Relational; OO; XML; Spatial; Temporal; Sequence; Graph; Fuzzy/uncertain; Text, image, video

[Figure: the topic map of data management and analytics areas, with two highlighted captions:]
- management/mining techniques for supporting scalable, real-time, distributed analysis and retrieval systems
- systems for scalable data/query processing, data streaming/mining/fusion

[Figure: the topic map of data management and analytics areas, overlaid with the following text:]
Most data in the real world are imprecise, multi-modal, and subjective anyhow. So can we leverage techniques from data and media analysis to tackle the so-called traditional data management/mining challenges?

CENTER/CONSORTIUM FOR ASSURED AND SCALABLE DATA ENGINEERING (CASCADE) (CONSTRUCTION STAGE)

Focus and vision

CASCADE NSF I/UCRC Center (Proposal)

Academic Partners
- Arizona State Univ. (KS Candan, H Davulcu, G Ahn, M Sapino)
- University of Maryland, College Park (Louiqa Raschid)

The potential industrial members of the proposed NSF I/UCRC Center for Assured and SCAlable Data Engineering (CASCADE) include:
- ASU site members: American Express, Early Warning, JCI, HP Labs, MapR, NEC America Labs, Oracle, Computational Analysis & Network Enterprise Solutions (CAaNES), Arizona Cyber Threat Response Alliance (ACTRA)
- UMD site members: Unscrambl, Leidos, JP Morgan Chase, Applied Communication Sciences (ACS), John Bottega, State Street, IBM
- Other potential partners: Rengen Orion Health

Core CS Faculty working on Data

Name | Title | Area(s) of Specialization as they relate to proposed concentration
K. Selcuk Candan | Professor | Scalable data management and analysis
Hasan Davulcu | Assoc. Professor | Databases and data extraction
Huan Liu | Professor | Data mining and analysis
Ross Maciejewski | Assistant Professor | Data visualization
Baoxin Li | Professor | Statistical machine learning, visual data
Rao Kambhampati | Professor | Data integration, data cleaning
Chitta Baral | Professor | Knowledge representation, NLP
Dijiang Huang | Associate Professor | Data clouds
Hanghang Tong | Assistant Professor | Graph structured data
Mohamed Sarwat | Assistant Professor | Data management systems
Jingrui He | Assistant Professor | Data analysis and sparse learning
Paulo Shakarian | Assistant Professor | Data and network analysis

Relevant faculty at CIDSE/ASU
1. Gail-Joon Ahn: risk management, access control, and security architecture for distributed systems
2. Ron Askin: scheduling, operations research; applied statistics
3. Chitta Baral: knowledge representation, bioinformatics, and text analysis
4. Rida Bazzi: distributed computing, fault tolerance, dynamic schema update in data clouds
5. K. Selcuk Candan: scalable data management, integration and retrieval, data management and processing systems, multimedia retrieval, accessibility
6. Partha Dasgupta: distributed systems, security, and resilience
7. Sandeep Gupta: parallel and distributed computing, data centers, energy-efficient, reliable data dissemination, and caching
8. Dijiang Huang: security, virtualization, mobile cloud computing
9. Subbarao Kambhampati: data integration, data cleaning, and planning
10. Baoxin Li: statistical inference for visual tracking, feature selection for data/sensor fusion, image/video retrieval
11. Huan Liu: data mining, machine learning, feature selection, classification, subspace clustering, and social computing
12. Ross Maciejewski: geo-spatial and spatio-temporal visualization, visual analytics for healthcare/pandemics, law enforcement
13. Pitu Mirchandani: water distribution systems, urban planning, transportation, forecasting, dynamic systems, remote sensing
14. Sethuraman Panchanathan: ubiquitous multimedia analysis, accessibility
15. Andrea Richa: ad hoc networks, algorithms, self-organizing systems, wireless communication
16. George Runger: statistical learning, process control, data mining for massive, multivariate data sets
17. Arunabha Sen: network analysis, social, biological, transportation, communication networks
18. Esma Gel: applied probability techniques for modeling, design and control of production systems and supply chain
19. Hari Sundaram: multimedia and social-media analytics
20. Yalin Wang: data visualization, medical imaging, statistical pattern recognition
21. Peter Wonka: data visualization, geo-spatial visualization, modelling, image analysis
22. Teresa Wu: decision making under uncertainty, biomedical informatics
23. Guoliang Xue: privacy, smart grid, cloud computing, network science
24. Steve Yau: service-based systems, information assurance, security, QoS monitoring
25. Jieping Ye: machine learning, data mining, dimensionality reduction, biomedical informatics
26. Nong Ye: cyber- and network security


Big Data Systems Concentration for MS in Computer Science

CIDSE MS/MCS Concentration in Big Data Systems
15 credits of coursework in data engineering and data analytics

Required
- Database Management System (DBMS) Implementation
- Distributed and Parallel Systems
- Data Mining

Elective (2 out of 5)
- Virtualization and Cloud Computing
- Semantic Web Mining
- Data Visualization
- Multimedia and Web Databases
- Statistical Machine Learning

Key knowledge gaps
Six most critical knowledge competency groups (in terms of the value gap, i.e., the difference between current and desired states of the knowledge area): temporal and spatial analyses, summarization, cleaning, visualization, anomaly detection, real-time processing for streaming data, media analytics, representations and fusion for unstructured/structured data, semantic Web, make unstructured data queryable, prioritize and rank data, correlate and identify the gaps in the data, graph-based models, social networks, entity analytics, (social and other) network analytics, performance and scalability, distributed architectures.

"Hunting for the Value Gaps in Data Management, Services, and Analytics," ACM SIGMOD blog; http://wp.sigmod.org/

Key Tools
Tools that can
- support federated and scalable data storage, analysis, and modeling
- make unstructured data queryable, prioritize and rank data, correlate and identify the gaps in the data
- support entity analytics, (social and other) network analytics, and media analytics
- take into account known models, but also adapt to new emerging patterns
- go back in history to validate models and go forward into the future to support forecasting and if-then hypothesis testing

Data engineers must have a solid algorithmic and mathematical background, complemented with excellent data management, programming, and system development/integration skills.

Data engineers should be able to make informed architectural decisions based on a good understanding of how available technologies differ and complement each other.

[Figure: technology landscape. Labels: MapReduce/Hadoop, RanKloud, Spark, MongoDB, NetworkX, GraphLab, MADlib, Hadoop-Online; a pipeline of Map tasks feeding a Reduce task; feature extraction; clustering/classification]
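As a rough illustration of the Map and Reduce stages that appear in the figure, the following single-process Python sketch applies the MapReduce programming model to word counting. It does not use Hadoop or any cluster API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data systems", "big data analytics"])
print(counts)
```

In a real deployment the map calls would run in parallel on different partitions of the input and the shuffle would move data between machines; here everything runs sequentially, so the sketch shows only the shape of the computation.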

Data engineers should also be able to identify data that is important, restructure data to make it useful, interpret data, formulate observation strategies and relevant data queries, and ask new questions based on the observations and results, including "what happened?", "why did it happen?", and "what happens next?".

Data engineers need to have the necessary skills to communicate with non-data-scientist/engineer co-workers, including domain experts and business executives.

Key learning outcomes
- make informed architectural decisions based on a good understanding of how available technologies differ and complement each other and what scalability/consistency trade-offs they provide.
- be able to pick and deploy the appropriate data management, processing, and analysis systems (including commercial and open-source) with the suitable structured or unstructured data model for the particular task and domain application needs.
- make informed decisions regarding data storage, indexing, querying, and retrieval.
- reason about optimization and execution alternatives and be able to plan within the trade-offs introduced by concurrency control, transaction management, and recovery protocols and algorithms.

Key learning outcomes
- use tools and develop frameworks for federated and cloud-based data storage, analysis, and modeling and mediated data services delivery.
- use as well as develop high-performance distributed and/or parallel data architectures that can match the scale of the data and support split-second decision making, through data fusion and integration and analysis and forecasting algorithms.
- use as well as develop real-time, on-line data processing systems for temporally and spatially distributed observations for data in motion in applications, including those that include mobile applications, location-aware services, and human behavior modeling at individual and population scales.
- use as well as develop scalable batch processing systems for data at rest.

Key learning outcomes
- have knowledge regarding cutting-edge algorithms and systems for temporal and spatial data analyses, summarization, cleaning, anomaly detection, representations and fusion for unstructured/structured data, semantic Web, graph-based models, social networks, and multi-dimensional data visualization.
- use as well as develop tools that support entity analytics, (social and other) network analytics, text analytics, and media analytics not only for traditional applications like monitoring and security, but also for emerging applications, including enabling interest detection for retail/advertisement, social media, energy, healthcare, and finance.

Key learning outcomes
- use and develop algorithms, techniques, and tools for reducing the size and/or dimensionality of the data to make data amenable to analysis.
- make unstructured data queryable, prioritize and rank data, correlate and identify the gaps in the data, highlight what is normal and not normal, and automate the ingest of the data.
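As one small, concrete instance of reducing the dimensionality of data, the sketch below performs variance-threshold feature selection: columns that barely vary across records are dropped. This is only one of many possible techniques and is not named on the slide; the function name `select_by_variance`, the sample data, and the threshold value are assumptions made for illustration.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def select_by_variance(
    rows: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[int], List[List[float]]]:
    """Keep only the columns whose population variance exceeds threshold.

    Returns the indices of the kept columns and the reduced rows.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    kept = [
        j for j in range(n_cols)
        if pvariance([row[j] for row in rows]) > threshold
    ]
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced


# Hypothetical records: column 1 is constant and column 2 is nearly constant.
data = [
    [1.0, 5.0, 0.10],
    [2.0, 5.0, 0.11],
    [3.0, 5.0, 0.09],
    [4.0, 5.0, 0.10],
]
kept, reduced = select_by_variance(data, threshold=0.01)
print(kept, reduced)
```

With this data only the first column survives, so each record shrinks from three values to one. Projection methods such as principal component analysis pursue the same goal by combining columns rather than dropping them.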

Key learning outcomes
- The graduates will be able to design and develop adaptive systems that take into account known models, but also adapt the models to new emerging patterns.
- use tools and develop systems that can go back in history to validate models and go forward into the future to support forecasting and if-then hypothesis testing.
- The graduates will have the necessary skills to communicate with technical and non-technical co-workers.
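The idea of going back in history to validate a model before using it to forecast can be sketched as a simple backtest. The example below is a minimal illustration under stated assumptions: the model is a plain moving average, which the slides do not prescribe, and the names `moving_average_forecast` and `backtest` as well as the sample series are hypothetical.

```python
from typing import List, Sequence


def moving_average_forecast(history: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest(series: Sequence[float], window: int) -> float:
    """Replay history: forecast each point from the points before it and
    return the mean absolute error of those one-step-ahead forecasts."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    errors: List[float] = []
    for t in range(window, len(series)):
        predicted = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - predicted))
    return sum(errors) / len(errors)


# Hypothetical observations with a steady upward trend.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

mae = backtest(series, window=2)                          # validate on the past
next_value = moving_average_forecast(series, window=2)    # forecast forward
print(mae, next_value)
```

On a trending series like this one, the backtest shows that the moving average consistently lags behind the data, which is exactly the kind of evidence that would prompt adapting the model to the emerging pattern.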