TAAI 2012 Panel Discussion: Big Data. About Me: Chin Yew Lin




Transcription:

TAAI 2012 Panel Discussion: Big Data
Chin Yew Lin
cyl@microsoft.com
Microsoft Research Asia

About Me: Chin Yew Lin
Senior Researcher, Knowledge Mining Group, Microsoft Research Asia
Areas of Interest
  Natural language understanding
  Knowledge mining
  Social computing
Planning AFNLP SIG on Semantics and Knowledge
Most recently
  Program co-chair of ACL 2012
  Program co-chair of AAAI AI & the Web 2011
Previously
  ROUGE: automatic evaluation of summaries


* Gartner Hype Cycle Big Data 2012

Decide.com: 25GB per day (Nov 29, 2011)
  ~Read 150K books per day
  ~500 pages per day = 800KB of data
  100 TB = 600M books
* http://www.npr.org/2011/11/29/142521910/the-digital-breadcrumbs-that-lead-to-big-data

Largest Facebook cluster: 100 PB (Nov 8, 2012)
  ½ PB new data per day
  60,000 Hive queries/day
* https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

* http://www.engadget.com/2011/06/29/visualized-a-zettabyte/
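The figures on this slide can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes decimal (SI) units, takes Facebook's daily intake as half a petabyte (an assumption about the slide's unit), and uses a hypothetical helper name, `bytes_per_book`.

```python
# Decimal (SI) units are an assumption; the slide does not state a convention.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15


def bytes_per_book(total_bytes: float, books: float) -> float:
    """Average size of one book implied by a total volume and a book count."""
    return total_bytes / books


# Decide.com: 25 GB per day described as roughly 150K books per day.
decide_book = bytes_per_book(25 * GB, 150_000)

# The slide's second equivalence: 100 TB described as 600M books.
archive_book = bytes_per_book(100 * TB, 600_000_000)

# Facebook: a 100 PB cluster growing by an assumed 0.5 PB of new data per day.
days_of_intake = (100 * PB) / (0.5 * PB)

print(f"Decide.com figures imply {decide_book / KB:.0f} KB per book")
print(f"100 TB / 600M books implies {archive_book / KB:.0f} KB per book")
print(f"100 PB equals {days_of_intake:.0f} days of intake at 0.5 PB/day")
```

Under these assumptions both book equivalences imply the same average book size, so the two figures on the slide are mutually consistent.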

janedoe2012

Profile sections: 1. Basic Information; 2. Personal life; 3. Relationship; 4. Interests

Basic Information
  Jane Doe
  Female, 52 years old, Married
  Live: New York, New York, USA
  Hometown: Boston, MA, USA

Observations
  She liked to watch the America's Got Talent show.
  She shared lots of travel experience.
  She has a second-hand guitar.
  Users celebrated her birthday online.
  Places traveled.
  She found it was difficult to get along with her son.
  Mentions of her family
  Bought Cigarette online. A smoker?

Sites: Cruisecritic, Youtube, Gibson, Ozcruiseclub, Flickr, ebay, Schizophrenia

Aggregation of Dynamic User Activity Information
Categories: Basic Information, Personal Life, Conversations, Interests
Sources: Facebook, Twitter, Flickr, ebay, Youtube, Cruisecritic, Schizophrenia, Ozcruiseclub

Profile 1
  Jane Doe
  Female, 52 years old, Married
  Live: New York, New York, USA
  Hometown: Boston, USA

Profile 2
  Female, 56 years old, Married
  Live: North Haven, New South Wales, Australia
  Hometown: Moss Vale, New South Wales

Activity
  A second-hand guitar
  Buy Cigarette on ebay
  Family issues
  Post travel experience
  Visit the Pacific Dawn on Dec.19.2007
  Celebrate birthday online
  Like the America's Got Talent show

Aggregation of Dynamic User Activity Information
Categories: Basic Information, Personal Life, Conversations, Interests
Sources: Facebook, Twitter, Flickr, ebay, Youtube, Cruisecritic, Schizophrenia, Ozcruiseclub

Not a network of social relationships but a network of shared interests

Jen Bailey (Jennifer Blackie)
  Female, 56 years old, Married
  Live: North Haven, New South Wales, Australia
  Hometown: Moss Vale, New South Wales
  She bought Cigarette online, so she does smoke.
  Her son takes drugs.
  She went to visit the Pacific Dawn on Dec.19.2007.
  She found it was difficult to get along with her son.
  Users celebrate her birthday online.

* http://www.engadget.com/2011/06/29/visualized-a-zettabyte/

RGB Data
  R: Right
  G: Good
  B: Big

Most Interesting Task
A.I.
H.I.

NeedleSeek: Computable Knowledge
Mining open-domain semantic knowledge from web-scale data sources
Empower apps with computable knowledge
Find needles in a haystack

[Figure: knowledge graph spanning the Virtual World and the Real World. Labels: Head, Tail, Decrease, Improve, Increase, CEO, Revenue, Habitat, Cat, Animal, News Channel, Founder, Company, High-Tech Company, Profit, Fortune 500 Company, Underwear, Athlete, Breed, Dog, Bark, Fox, Apple, Microsoft, Product, Boxer, Beagle, Owner, Windows, OS]

NeedleSeek: Current Status

Data Scale
  V2.0 (May 2010): Terms: 20 million; Links: 1 billion; Categories: 10M; Head labels: 300K
  V2.5: Terms: 80M (EN); 40M (CN); 12M (JP); Links: 2.4B (EN); 1.5B (CN); 0.6B (JP); Categories: 30M (EN); Head labels: 500K (EN)
Freshness
  V2.0: Mar 2009 data
  V2.5: Mar 2012 data
Language
  V2.0: English
  V2.5: English; Chinese; Japanese; English+Chinese
Knowledge Types
  V2.0: Peer similarity; Entity cluster (semantic classes); Entity property: Hypernymy (IsA); Attributes
  V2.5: All V2.0 knowledge types; General relations; Feature vectors for entities; entity key sentences
External Data Integration
  Freebase; available structured databases

Glossary
  Term: Literal string ("iron"; "gone with the wind"; "狗"; "Lumia 800")
  Type: city; animal; book; film; person; actress
  Entity: Something that we refer to with a specific type
  Property: Peer(fox, dog); IsA(fox, animal); Attr(city, population)
  Entity Cluster with Peer Similarity: {Beijing: Beijing, Shanghai, Guangzhou}; {apple: apple, pear, orange, watermelon}

* V2.0 Demo: http://needleseek.msra.cn
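The glossary's property notation, Peer(fox, dog), IsA(fox, animal), Attr(city, population), maps naturally onto a small in-memory store. The sketch below is only an illustration of that notation, not NeedleSeek's actual design: the class `TinyKnowledgeBase` and all of its method names are hypothetical, and the example facts beyond those on the slide's glossary are assumptions added for demonstration.

```python
from collections import defaultdict


class TinyKnowledgeBase:
    """Hypothetical toy store for the three property kinds in the glossary."""

    def __init__(self):
        self.is_a = defaultdict(set)    # term -> types, e.g. fox -> {animal}
        self.peers = defaultdict(set)   # term -> peer terms (symmetric)
        self.attrs = defaultdict(set)   # type -> attributes, e.g. city -> {population}

    def add_is_a(self, term, type_name):
        self.is_a[term].add(type_name)

    def add_peer(self, a, b):
        # Peer similarity is treated as symmetric here (an assumption).
        self.peers[a].add(b)
        self.peers[b].add(a)

    def add_attr(self, type_name, attribute):
        self.attrs[type_name].add(attribute)

    def cluster(self, seed):
        """Entity cluster for a seed term: the seed plus its direct peers."""
        return sorted({seed} | self.peers.get(seed, set()))


kb = TinyKnowledgeBase()
kb.add_peer("fox", "dog")
kb.add_is_a("fox", "animal")
kb.add_attr("city", "population")
for fruit in ("pear", "orange", "watermelon"):
    kb.add_peer("apple", fruit)

print(kb.cluster("apple"))
print(sorted(kb.is_a["fox"]), sorted(kb.attrs["city"]))
```

A real system would score peers by similarity instead of storing them as an unweighted set; the set keeps the example short.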

Project SOUL
Mining big social data for information discovery and recommendation
  Build a big entity database (knowledge)
    Open domain + domain specific curated databases
  Build a big people database (profiles)
    People who act on the web
  Build a big event database (logs)
    Social interaction records on the web
    Who does what to whom, when, where, how, and why (intent)
    Ex: review, QA, comment, tag, share, like, tweet, blog
  Design algorithms leveraging these databases
  Develop services & apps enabled by these databases and algorithms

SOUL: Toward Big Social Data Analytics
People Database
  People centric: people discovery, people selection, people indexing, people ranking
  Cover people and their friends
Event Database
  Event centric: event discovery, event selection, event indexing, event ranking
  Cover events; link entities to people and vice versa
Entity Database
  Entity centric: entity discovery, entity selection, entity indexing, entity ranking
  Cover documents and solutions
Algorithms, Services, Applications
PEN Graph
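The event database described on this slide records who acted on what, along with when, where, how, and why, and links entities to people in both directions. A minimal sketch of such a record and a two-way index follows. It is an illustration under stated assumptions, not the SOUL implementation: `Event`, `EventLog`, every field name, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One social interaction record; field names are hypothetical."""
    who: str                     # person performing the action
    what: str                    # action type, e.g. review, tag, share, like
    whom: str                    # entity (or person) acted upon
    when: str                    # timestamp kept as a plain string for brevity
    where: Optional[str] = None  # site or place, if known
    how: Optional[str] = None    # channel or device, if known
    why: Optional[str] = None    # inferred intent, if known


class EventLog:
    """Indexes events by person and by entity so links work both ways."""

    def __init__(self):
        self.by_person = defaultdict(list)
        self.by_entity = defaultdict(list)

    def add(self, event):
        self.by_person[event.who].append(event)
        self.by_entity[event.whom].append(event)

    def entities_of(self, person):
        return sorted({e.whom for e in self.by_person.get(person, [])})

    def people_of(self, entity):
        return sorted({e.who for e in self.by_entity.get(entity, [])})


log = EventLog()
log.add(Event("alice", "review", "Lumia 800", "2012-11-01", where="blog"))
log.add(Event("bob", "like", "Lumia 800", "2012-11-02"))
log.add(Event("alice", "share", "Pacific Dawn", "2012-11-03", why="travel"))

print(log.entities_of("alice"))
print(log.people_of("Lumia 800"))
```

Keeping two indexes doubles the write cost but makes both lookups, from person to entities and from entity to people, a single dictionary access.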

* Nokia City Lens

Urban Computing With City Dynamics
Yu Zheng, Jing Yuan, Xing Xie
Data Management, Analytics, and Services Group

What's Urban Computing
Urban Computing: Sensing, Mining, Understanding, Improving

Everything in urban areas is used to sense city dynamics and to create a city-wide computing graph to tackle the challenges in serving citizens and cities.

Route Construction from Uncertain Trajectories: KDD '12 and ICDE '12
Finding Smart Driving Directions: ACM SIGSPATIAL GIS '10 best paper runner-up, KDD '11
Discovery of Functional Regions: KDD '12
Passengers-Cabbie Recommender system: Ubicomp '11
Anomalous Events Detection: KDD '11 and ICDM '12
Urban Computing for Urban Planning: Ubicomp '11 best paper nominee

Volume, Velocity, Variety