BIG DATA @ EMITLAB & CIDSE K. Selçuk Candan
Name: K. Selçuk Candan
Professor of computer science and engineering at (CIDSE), ASU
Senior Sustainability Scientist, Global Institute of Sustainability
Director, Enterprise, Media, and Information Technologies Labs (EmitLab)
EmitLab
Sriram Rathinavelu, MS
Mijung Kim
Jung Hyun Kim
Yash Garg, MS
Parth Nagarkar
Mithila Nagendra
Shengyu Huang
Sicong Liu
Xilun Chen
Rosaria Rossini (U. Torino)
Xinsheng Liu
Maria Luisa Sapino, Professor (U. Torino)
Claudio Schifanella, Post-doc. (U. Torino)
Antonio Penta, Post-doc. (U. Torino)
What do I do??
Executive Committee member, ACM Special Interest Group on Management of Data (SIGMOD)
Associate editor, IEEE Transactions on Multimedia
Associate editor, the Very Large Data Bases journal (2005-2012)
Associate editor, Journal of Multimedia
General Chair, IEEE International Conference on Cloud Engineering (IC2E) 2015
Workshops Chair, International Conference on Extending Database Technology (EDBT) 2014
Organizing Committee Member, ACM SIG Multimedia Conference 2013
Panels Chair, Very Large Databases (VLDB) Conference 2012
Publicity Chair, ACM SIG Multimedia Conference 2012
General Chair, ACM SIGMOD Conference 2012
General Chair, ACM SIG Multimedia Conference 2011
Program Group leader, ACM SIG Management of Data (SIGMOD) Conference 2010
PC Chair, the ACM International Conference on Image and Video Retrieval (CIVR) 2010
PC Chair, Workshop on Information & Software as Services (WISS) 2010
Chair, Workshop on Information & Software as Services (WISS) 2009
Chair, Workshop on Real-Time Business Intelligence (RTBI) 2009
PC Chair, ACM Workshop on Ambient Media Computing (iwam) 2009
PC Chair, ACM SIG Multimedia Conference 2008
What do I do?? How can we provide the relevant data/information to the right person/application fast???
Data, data
[Figure: data volume in exabytes (2^60 bytes); labels: 400GB per person, 200GB per person]
Data in the real world?
Application domains: energy, rehabilitation, training, security, smart-offices, smart-rooms, production, life-sciences, defense, sports, robotics, elderly-care, retail, child-care, supply-chain, entertainment, personal-data management, transportation, education, space exploration, pet-care, health-care, arts, business/enterprise, sciences, advertisement
VOLUME: Cisco estimates we'll see 1.3 zettabytes of traffic annually over the internet in 2016.
VELOCITY: Sensors from a Boeing jet engine create 20 terabytes of data every hour.
VARIETY: 500 terabytes of new data of all forms are ingested in Facebook every day.
Data challenges
Cisco estimates we'll see 1.3 zettabytes of traffic annually over the internet in 2016.
Sensors from a Boeing jet engine create 20 terabytes of data every hour.
500 terabytes of new data of all forms are ingested in Facebook every day.
IST: [I]mprecision, [S]parsity, (lack of) [T]rust
3Vs: [V]olume, [V]elocity, [V]ariety
HMLE: [H]igh-dimensional, [M]ulti-modal, inter-[l]inked, [E]volving
Big Data Systems space
[Diagram labels, in extracted order]
Data Management
Data Analytics
Dimensionality reduction/feature selection
Classification, clustering
Summarization
Visual analytics
Feature extraction/media analysis
Temporal/spatial analysis
Text Analysis/NLP
Web/social networks
Recommender systems
Scalable/real time
Performance and Scalability
Consistency, quality, cleaning
Data models
Data Organization
Data and Schema Integration
Cloud, DaaS
Data Streaming
Parallel/Distributed DM
MapReduce/Hadoop
Pregel/Hama
Other parallel DBMS
Multitenant, Virtualization
Security, privacy, assurance
Mobile, Sensor
Visualization
Extraction, filtering
Row-stores
Column Stores
Key-value stores
NoSql
Relational
OO
XML
Spatial
Temporal
Sequence
Graph
Fuzzy/uncertain
Text, image, video
Research Overview
Ongoing Grants/Projects:
[NSF] RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis
[NSF] National Science Digital Library (NSDL) Middleware for Network- and Context-aware Recommendations
[NSF] One Size Does Not Fit All: Empowering the User with User-Driven Integration
[NSF] The Complexities of Ecological and Social Diversity: A Long-Term Perspective
[with JCI, NSF] Data Analysis and Optimization for Building Energy Management
[with SHESC, NSF] Data Management for Real-Time Data Driven Epidemic Simulations
NSF-IGERT: Person-centered Technologies and Practices for Individuals with Disabilities
Newer/Other Efforts:
[with West Point] SHARK: Searching Huge Attribute and Relational Knowledgebases
Data management techniques for supporting scalable, real-time integration, analysis, and retrieval of large data sets
CS Faculty working on Data
Name                 Title                 Area(s) of Specialization as they relate to proposed concentration
K. Selcuk Candan     Professor             Databases and data management
Hasan Davulcu        Assoc. Professor      Databases and data extraction
Huan Liu             Professor             Data mining and analysis
Ross Maciejewski     Assistant Professor   Data visualization
Jieping Ye           Assoc. Professor      Data analysis
Rao Kambhampati      Professor             Data integration, data cleaning
Chitta Baral         Professor             Knowledge representation, NLP
Dijiang Huang        Assoc. Professor      Data clouds
Relevant faculty at CIDSE/ASU
1. Gail-Joon Ahn: risk management, access control, and security architecture for distributed systems
2. Ron Askin: scheduling, operations research; applied statistics
3. Chitta Baral: knowledge representation, bioinformatics, and text analysis
4. Rida Bazzi: distributed computing, fault tolerance, dynamic schema update in data clouds
5. K. Selcuk Candan: scalable data management, integration and retrieval, data management and processing systems, multimedia retrieval, accessibility
6. Partha Dasgupta: distributed systems, security, and resilience
7. Sandeep Gupta: parallel and distributed computing, data centers, energy-efficient, reliable data dissemination, and caching
8. Dijiang Huang: security, virtualization, mobile cloud computing
9. Subbarao Kambhampati: data integration, data cleaning, and planning
10. Baoxin Li: statistical inference for visual tracking, feature selection for data/sensor fusion, image/video retrieval
11. Huan Liu: data mining, machine learning, feature selection, classification, subspace clustering, and social computing
12. Ross Maciejewski: geo-spatial and spatio-temporal visualization, visual analytics for healthcare/pandemics, law enforcement
13. Pitu Mirchandani: water distribution systems, urban planning, transportation, forecasting, dynamic systems, remote sensing
14. Sethuraman Panchanathan: ubiquitous multimedia analysis, accessibility
15. Andrea Richa: ad hoc networks, algorithms, self-organizing systems, wireless communication
16. George Runger: statistical learning, process control, data mining for massive, multivariate data sets
17. Arunabha Sen: network analysis, social, biological, transportation, communication networks
18. Esma Gel: applied probability techniques for modeling, design and control of production systems and supply chain
19. Hari Sundaram: multimedia and social-media analytics
20. Yalin Wang: data visualization, medical imaging, statistical pattern recognition
21. Peter Wonka: data visualization, geo-spatial visualization, modelling, image analysis
22. Teresa Wu: decision making under uncertainty, biomedical informatics
23. Guoliang Xue: privacy, smart grid, cloud computing, network science
24. Steve Yau: service-based systems, information assurance, security, QoS monitoring
25. Jieping Ye: machine learning, data mining, dimensionality reduction, biomedical informatics
26. Nong Ye: cyber- and network security