Big Data in The Web. Agenda. Big Data Asking the Right Questions Wisdom of Crowds in the Web The Long Tail Issues and Examples Concluding Remarks
|
|
- Brian Holmes
- 8 years ago
- Views:
Transcription
1 Big Data in The Web Ricardo Baeza-Yates Yahoo! Labs Barcelona & Santiago de Chile Agenda Big Data Asking the Right Questions Wisdom of Crowds in the Web The Long Tail Issues and Examples Concluding Remarks - 3-1
2 Big Data Capture, transfer, store, search, share, analyze, and visualize large data in reasonable time Large volume and growth Petabytes to exabytes Growth is estimated in 3 exabytes per day Structured vs. non-structured data Diversity Types, formats, complexity, topics, etc. Best Public Data Example: The Web Content: text, multimedia Structure: graphs Usage: real time streams Big Data Focus on analytics Many storage technologies: DBs, DWs, distributed file systems, Many processing technologies: Cloud computing, map-reduce (Hadoop), Data mining, clustering, classification, Machine learning, A/B testing, NLP, Simulation Several technology providers Initial best practices (see TDWI report, 2011) Main challenges: scalability, online
3 Big Data: The Five V s Characteristic Data Issue Computing Issue Volume Variety Veracity Scale, Redundancy Heterogeneity, Complexity Completeness, Bias, Sparsity, Noise, Spam Scalability Adaptability, Extensibility Reliability, Trust Velocity Real time Online Value Usefulness, Privacy Business dependent Asking the Right Questions Problem Driven What data we need? How much? How we collect it? How we store and transfer it? Understanding the Data How sparse is the data? How much noise? There is redundancy? There are biases? There is spam? Any outliers? Analyzing the Data Any privacy issues? Do we need to anonymize? How well our algorithms scale? Can we visualize the results?
4 Too Much Data Available The Web is a database! Data does not imply information Many analyses for the sake of it (data driven) Analyzing data is not CS per se Publish in the right forum! Big Data or Right Data? The Different Facets of the Web
5 The Structure of the Web Big Data in the Web Wordnet Explicit Metadata RDF Wikipedia ODP Y! Answers Flickr UGC Blogs, Groups Implicit Text Anchors + links Logs (Clicks+Queries) Private Quality? Scale
6 What is in the Web? How Good it is? Quantity Usergenerated Traditional publishing Quality What else is in the Web?
7 Noise and Spam Noise may come from many places: Instruments that measure How we interpret the data (example later) Spam is everywhere Web Spam Deceiving text, links, clicks due to an economic incentive Depending on the goal and the data, spam is easier to generate Depending on the type & target data, spam is easier to fight Disincentives for spammers? Social Economical Web Spam is NOT Mail Spam
8 Content and Metadata Trends [Ramakrishnan and Tomkins 2007]
9 Web Data Trends User Generated Content Massive (quality vs. quantity) Social Networks Real time (people + physical sensors) Impact Fragmentation of ownership Fragmentation of access (longer heavy tail) Fragmentation of right to access Viability Business model based in advertising The Wisdom of Crowds James Surowiecki, a New Yorker columnist, published this book in 2004 Under the right circumstances, groups are remarkably intelligent Importance of diversity, independence and decentralization Aggregating data large groups of people are smarter than an elite few, no matter how brilliant they are better at solving problems, fostering innovation, coming to wise decisions, even predicting the future
10 6/28/13 Web Data Mining Content: text & multimedia mining Structure: link analysis, graph mining Usage: log analysis, query mining Relate all of the above Web characterization Particular applications Flickr: Clustering Pictures
11 6/28/13 Popularity Flickr: Geo-tagged pictures
12 Crowd Sourcing Web-based peer production has produced a number of successful products and communities Wikipedia, Y! Answers, YouTube, Flickr, Digg,... Can this form of production be harnessed for other ends? Existing successes are hard to replicate at will Amazon Mechanical Turk (AMT) Like outsourcing, but in a micro-distributed fashion Thousands of turkers working on hundreds of HITS (tasks) Rates are typically few cents per task Quality of their work is positively evaluated (e.g. in IR) The Wisdom of (Large) Crowds Crucial for Search Ranking Text: Web Writers & Editors not only for the Web! Links: Web Publishers Tags: Web Taggers Queries: All Web Users! The crowd implicitly knows the experts! (! action Queries and actions (or no
13 Scalability How to scale? Doubling the data in the best case will double the time Time complexity vs. result quality trade-off Example: entity detection in linear time at almost state of the art quality That implies that there exists a text size n* for which the linear algorithm will produce more correct entities Distributed parallel processing Map-reduce not always works Parallelism is problem dependent Online processing needs a different approach Redundancy and Bias There is any dependency in the data? There is any duplication? Lexical duplication in the Web is around 25% Semantic duplication is larger Are there any biases? Example 1: clicks in search engines Bias to the ranking and the interface There is a ranking bias in the Web content Example 2: tag recommendation
14 We can suggest tags: nice but Privacy Example: AOL Query Logs Release Incident A Face Is Exposed for AOL Searcher No , By MICHAEL BARBARO and TOM ZELLER Jr, The New York Times, Aug No conducted hundreds of searches over a three-month period on topics ranging from numb fingers to 60 single men. Other queries: landscapers in Lilburn, Ga, several people with the last name Arnold and homes sold in shadow lake subdivision gwinnett county georgia. Data trail led to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends medical ailments and loves her three dogs
15 Risks of Privacy (ZIP code, date of birth, gender) is enough to identify 87% of US citizens using public DB (Sweeney, 2001) K-anonymity Suppress or generalize attributes until each entry is identical to at least k-1 other entries Federal Trade Commission in US: Privacy policies should address the collection of data itself and not just how the data is used, Dec Data Protection Directive in EU Risks of Privacy: Query Logs Profile: [Jones, Kumar, Pang, Tompkins, CIKM 2007] Gender: 84% Age (±10): 79% Location (ZIP3): 35% Vanity Queries: [Jones et al, CIKM 2008] Partial name: 8.9% Complete: 1.2% More information: A Survey of query log privacy-enhancing techniques from a policy perspective [Cooper, ACM TWEB 2008] A good anonymization is still an open problem
16 Sparsity The Long Tail is always Sparse Why there is a long tail? When the crowd dominates Empowering the tail Example: Relations from Query Logs The Wisdom of Crowds Popularity Diversity Quality Coverage Long tail Heavy tail
17 The Long Tail Most measures in the Web follow a power law Heavy tail of user interests Many queries, each asked very few times, make up a large fraction of all queries Movies watched, blogs read, words used, One explanation Interests People Normal people 42 Weirdos
18 Heavy tail of user interests Many queries, each asked very few times, make up a large fraction of all queries Applies to word usage, web page access, We are all partially eclectic The reality Interests People Broder, Gabrilovich, Goel, Pang; WSDM Example: Click Distribution User interaction is a power law! (Zipf s principle of minimal effort)
19 When the crowd dominates Kills the long tail See (obsolete now) shwarzneger example Empowering the Tail The Filter Bubble, Eli Pariser Avoid the Poor get Poorer Syndrome Solutions: Diversity Novelty Serendipity Explore & Exploit
20 How to Circumvent Sparsity? Wisdom of ad-hoc crowds? Aggregate data in the right way When data is sparse Aggregate users around same intent, task, facet,. Change granularity ad hoc Middle age men Fans of Messi Example: Mining Geo/time Data Optimal Touristic Paths from Flickr Good for tourists and locals De Choudhury et al, HT
21 Aggregating in the Long Tail The long tail is important not only for e- commerce, but because we are all there Personalization vs. Contextualization User interaction is another long tail Interests People Epilogue l The Web is scientifically young l The Web is intellectually diverse l The technology mirrors the economic, legal and sociological reality l Data must be interesting! (Gerhard Weikum) l Problem driven l Plenty of challenges
22 Mirror of Society Exports/Imports vs. Domain Links Baeza-Yates & Castillo, WWW
23 Questions? ASIST 2012 Book of the Year Award Contact: Thanks to many people at Yahoo! Labs 23
Big Data or Right Data?
Big Data or Right Data? Ricardo Baeza-Yates Yahoo! Labs Barcelona & Web Research Group, DTIC, Univ. Pompeu Fabra Barcelona, Spain rbaeza@acm.org Abstract. Big data nowadays is a fashionable topic, independently
More informationSpam & The Power of Social Networks
Spam & The Power of Social Networks Ricardo Baeza-Yates Yahoo! Research Barcelona, Spain & Santiago, Chile Thanks to Tim Converse (Yahoo! Search) & R. Ravi (CMU) Models of Trust Workshop: May 22, 2006
More informationCS377: Database Systems Data Security and Privacy. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Security and Privacy Li Xiong Department of Mathematics and Computer Science Emory University 1 Principles of Data Security CIA Confidentiality Triad Prevent the disclosure
More informationWhy big data? Lessons from a Decade+ Experiment in Big Data
Why big data? Lessons from a Decade+ Experiment in Big Data David Belanger PhD Senior Research Fellow Stevens Institute of Technology dbelange@stevens.edu 1 What Does Big Look Like? 7 Image Source Page:
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationSocial Data Science for Intelligent Cities
Social Data Science for Intelligent Cities The Role of Social Media for Sensing Crowds Prof.dr.ir. Geert-Jan Houben TU Delft Web Information Systems & Delft Data Science WIS - Web Information Systems Why
More informationWhat to Mine from Big Data? Hang Li Noah s Ark Lab Huawei Technologies
What to Mine from Big Data? Hang Li Noah s Ark Lab Huawei Technologies Big Data Value Two Main Issues in Big Data Mining Agenda Four Principles for What to Mine Stories regarding to Principles Search and
More informationInternational Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More informationMLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group
Big Data and Its Implication to Research Methodologies and Funding Cornelia Caragea TARDIS 2014 November 7, 2014 UNT Computer Science and Engineering Data Everywhere Lots of data is being collected and
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationINTRO TO BIG DATA. Djoerd Hiemstra. http://www.cs.utwente.nl/~hiemstra. Big Data in Clinical Medicinel, 30 June 2014
INTRO TO BIG DATA Big Data in Clinical Medicinel, 30 June 2014 Djoerd Hiemstra http://www.cs.utwente.nl/~hiemstra WHY BIG DATA? 2 Source: http://en.wikipedia.org/wiki/mount_everest 3 19 May 2012: 234 people
More informationChallenges and Opportunities in Data Mining: Personalization
Challenges and Opportunities in Data Mining: Big Data, Predictive User Modeling, and Personalization Bamshad Mobasher School of Computing DePaul University, April 20, 2012 Google Trends: Data Mining vs.
More informationBIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
More informationBig Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
More informationBig Data Analytics. Prof. Dr. Lars Schmidt-Thieme
Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,
More informationSURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
More informationExploring Big Data in Social Networks
Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
More informationBig Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs
1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE
More informationData Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
More informationCAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationIEEE JAVA Project 2012
IEEE JAVA Project 2012 Powered by Cloud Computing Cloud Computing Security from Single to Multi-Clouds. Reliable Re-encryption in Unreliable Clouds. Cloud Data Production for Masses. Costing of Cloud Computing
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
More informationA Professional Big Data Master s Program to train Computational Specialists
A Professional Big Data Master s Program to train Computational Specialists Anoop Sarkar, Fred Popowich, Alexandra Fedorova! School of Computing Science! Education for Employable Graduates: Critical Questions
More informationBig Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
More informationHow To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
More informationTrends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationEXECUTIVE REPORT. Big Data and the 3 V s: Volume, Variety and Velocity
EXECUTIVE REPORT Big Data and the 3 V s: Volume, Variety and Velocity The three V s are the defining properties of big data. It is critical to understand what these elements mean. The main point of the
More informationCollaborations between Official Statistics and Academia in the Era of Big Data
Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationDAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY
Big Data Analytics DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Tom Haughey InfoModel, LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755 3350 tom.haughey@infomodelusa.com
More informationAGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
More informationData-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
More informationBig Data? Definition # 1: Big Data Definition Forrester Research
Big Data Big Data? Definition # 1: Big Data Definition Forrester Research Big Data? Definition # 2: Quote of Tim O Reilly brings it all home: Companies that have massive amounts of data without massive
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationExploiting the power of Big Data
Exploiting the power of Big Data Timos Sellis School of Computer Science and Information Technology timos.sellis@rmit.edu.au ITECHLAW Asia-Pacific Conference, February 26-28, 2014 Melbourne Australia Timeline
More informationDe la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data
De la Business Intelligence aux Big Data Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris 22/01/14 Séminaire Big Data 1 Agenda EvoluHon of Business Intelligence SemanHc Technologies
More informationManifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
More informationCustomized Report- Big Data
GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.
More informationwww.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
More informationChapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationImproving Data Processing Speed in Big Data Analytics Using. HDFS Method
Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India
More informationOnline Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011
Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with
More informationSecuring Big Data Learning and Differences from Cloud Security
Securing Big Data Learning and Differences from Cloud Security Samir Saklikar RSA, The Security Division of EMC Session ID: DAS-108 Session Classification: Advanced Agenda Cloud Computing & Big Data Similarities
More informationAn Overview of Computational Advertising
An Overview of Computational Advertising Evgeniy Gabrilovich in collaboration with many colleagues throughout the company 1 What is Computational Advertising? New scientific sub-discipline that provides
More informationBig Data The Next Phase Lessons from a Decade+ Experiment in Big Data
Big Data The Next Phase Lessons from a Decade+ Experiment in Big Data David Belanger PhD Senior Research Fellow Stevens Institute of Technology dbelange@stevens.edu 1 Outline Big Data Overview Thinking
More informationSEAIP 2009 Presentation
SEAIP 2009 Presentation By David Tan Chair of Yahoo! Hadoop SIG, 2008-2009,Singapore EXCO Member of SGF SIG Imperial College (UK), Institute of Fluid Science (Japan) & Chicago BOOTH GSB (USA) Alumni Email:
More informationbigdata Managing Scale in Ontological Systems
Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural
More informationBIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationBig Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
More informationManaging Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
More informationChapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationHow To Create A Data Science System
Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002
More information1. Understanding Big Data
Big Data and its Real Impact on Your Security & Privacy Framework: A Pragmatic Overview Erik Luysterborg Partner, Deloitte EMEA Data Protection & Privacy leader Prague, SCCE, March 22 nd 2016 1. 2016 Deloitte
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationTraining for Big Data
Training for Big Data Learnings from the CATS Workshop Raghu Ramakrishnan Technical Fellow, Microsoft Head, Big Data Engineering Head, Cloud Information Services Lab Store any kind of data What is Big
More informationIndustry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
More informationTechnical challenges in web advertising
Technical challenges in web advertising Andrei Broder Yahoo! Research 1 Disclaimer This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. 2 Advertising
More informationReal Time Analytics for Big Data. NtiSh Nati Shalom @natishalom
Real Time Analytics for Big Data A Twitter Inspired Case Study NtiSh Nati Shalom @natishalom Big Data Predictions Overthe next few years we'll see the adoption of scalable frameworks and platforms for
More informationLeveraging Social Media
Leveraging Social Media Social data mining and retargeting Online Marketing Strategies for Travel June 2, 2014 Session Agenda 1) Get to grips with social data mining and intelligently split your segments
More informationBig Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
More informationPrivacy and Privacy-Enhancing Technologies for Big Data Analytics
Privacy and Privacy-Enhancing Technologies for Big Data Analytics Trial lecture, NTNU, April 29 2014 Jostein Jensen Case: Law and order Predictive Policing Foothill'division'LAPD' Crime:'dropped'by'30%'
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationHadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)
Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationOur Data & Methodology. Understanding the Digital World by Turning Data into Insights
Our Data & Methodology Understanding the Digital World by Turning Data into Insights Understanding Today s Digital World SimilarWeb provides data and insights to help businesses make better decisions,
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationAlexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
More informationCrowdsourcing for Big Data Analytics
KYOTO UNIVERSITY Crowdsourcing for Big Data Analytics Hisashi Kashima (Kyoto University) Satoshi Oyama (Hokkaido University) Yukino Baba (Kyoto University) DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY
More informationQLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
More informationBig Data Research in the AMPLab: BDAS and Beyond
Big Data Research in the AMPLab: BDAS and Beyond Michael Franklin UC Berkeley 1 st Spark Summit December 2, 2013 UC BERKELEY AMPLab: Collaborative Big Data Research Launched: January 2011, 6 year planned
More informationSustainable Development with Geospatial Information Leveraging the Data and Technology Revolution
Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights
More informationFoundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources
More informationAnalyzing HTTP/HTTPS Traffic Logs
Advanced Threat Protection Automatic Traffic Log Analysis APTs, advanced malware and zero-day attacks are designed to evade conventional perimeter security defenses. Today, there is wide agreement that
More informationIntroduction to Data Mining
Bioinformatics Ying Liu, Ph.D. Laboratory for Bioinformatics University of Texas at Dallas Spring 2008 Introduction to Data Mining 1 Motivation: Why data mining? What is data mining? Data Mining: On what
More informationBIG Big Data Public Private Forum
DATA STORAGE Martin Strohbach, AGT International (R&D) THE DATA VALUE CHAIN Value Chain Data Acquisition Data Analysis Data Curation Data Storage Data Usage Structured data Unstructured data Event processing
More informationOf all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014
What is Big Data? Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 Data in the Twentieth Century and before In 1663,
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationPrivacy & data protection in big data: Fact or Fiction?
Privacy & data protection in big data: Fact or Fiction? Athena Bourka ENISA ISACA Athens Conference 24.11.2015 European Union Agency for Network and Information Security Agenda 1 Privacy challenges in
More informationA STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
More informationAre You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
More informationHow to Drive More Traffic to Your Event Website
Event Director s Best Practices Webinar Series Presents How to Drive More Traffic to Your Event Website Matt Clymer Online Marketing Specialist December 16 th 2010 Today s Speakers Moderator Guest Speaker
More informationKeywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationExecutive Summary By Blake Johnson
Executive Summary By Blake Johnson Creating Business Value with Enterprise Data and Analytics A one-day event presented by: Stanford s Management Science and Engineering department, The Global Supply Chain
More informationTowards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems
Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Volker Markl volker.markl@tu-berlin.de dima.tu-berlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On
More informationBig Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India
Big Data and Semantic Web in Manufacturing Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Outline Big data in Manufacturing Big data Analytics Semantic web technologies Case
More informationWhat do Big Data & HAVEn mean? Robert Lejnert HP Autonomy
What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy Much higher Volumes. Processed with more Velocity. With much more Variety. Is Big Data so big? Big Data Smart Data Project HAVEn: Adaptive Intelligence
More informationAre You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
More informationBig Data Security Challenges and Recommendations
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-1 E-ISSN: 2347-2693 Big Data Security Challenges and Recommendations Renu Bhandari 1, Vaibhav Hans 2*
More informationBig Data: Study in Structured and Unstructured Data
Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More information