
TV INSIGHTS: APPLICATION OF BIG DATA TO TELEVISION
AN ARRIS WHITE PAPER
BY: BHAVAN GANDHI, ALFONSO MARTINEZ-SMITH, & DOUG KUHLMAN

TABLE OF CONTENTS

ABSTRACT
INTRODUCTION - INTERSECTION OF TV & BIG DATA
    Overview of Big Data
    TV Sources for Big Data
DERIVING INSIGHT FOR EMERGING TV SERVICES
    Quantifying User Interaction
    Operational Analytics for Quantifying Network DVR (nDVR)
    Media Analysis - Deriving Sub-program Content Structure
EXAMPLE INSIGHTS
    User Recording & Playback Behavior
    Operational Insight - nDVR
    Bridging User Behavior with Content Structure
CONCLUSIONS
ACKNOWLEDGEMENTS

ABSTRACT

Television continues to evolve from purely consumptive linear viewing to web-like interactive experiences. Emerging applications such as multi-screen discovery, consumption, and network DVR all allow and support user interaction. This is possible because TV application architectures have evolved from dedicated, hardware-centric functionality to a combination of hardware appliances controlled by modular software services. Big Data technologies can support these rich, interactive TV experiences by collecting, storing, and analyzing federated events and turning them into usable information for end consumers, operators, and programmers. This paper provides an application-level view into how Big Data enables insight from user behavior, network and appliance operations, and content analysis.

INTRODUCTION - INTERSECTION OF TV & BIG DATA

Overview of Big Data

The advent of the Internet, and the resulting volume of data that needs to be managed, stored, and analyzed, led to the emergence of modern Big Data technologies. Yahoo and Google developed breakthrough techniques for processing Internet-scale data and storing the results on distributed commodity servers to achieve high availability and scalability. The Google File System (GFS) [4] and Bigtable [5] represent early pioneering work on storing such data, and they are the precursors of the mainstream, open-source Big Data technologies that have since been developed and evolved by communities such as Apache: Apache HBase [6] and the Apache Hadoop Distributed File System (HDFS) [7] are the open-source equivalents of Bigtable and GFS, respectively.

A typical Big Data technology stack comprises an event intake layer that accepts multi-format information from a variety of sources, and a core Hadoop ecosystem for storing and processing the collected information. Analysis of the collected data then provides meaningful understanding and insight through dashboards, reports, or application-facing interfaces.

Figure 1 - The TV Ecosystem
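The paper describes the intake layer only at this architectural level. Purely as an illustrative sketch (the event formats, field names, and JSON-lines staging are assumptions, not details of the ARRIS stack), the snippet below shows how heterogeneous events might be normalized into one common record before being landed in HDFS for the Hadoop ecosystem to process.

```python
import json
import time

# Hypothetical intake-layer helper: normalize events arriving in different
# formats (JSON from client apps, key=value lines from appliances) into one
# common record before it is landed in HDFS for batch or real-time processing.
COMMON_FIELDS = ("source", "event_type", "timestamp", "payload")

def normalize_event(raw, source):
    """Map a raw event string into the assumed common intake schema."""
    if raw.strip().startswith("{"):                      # JSON-formatted client event
        body = json.loads(raw)
    else:                                                # key=value appliance log line
        body = dict(kv.split("=", 1) for kv in raw.split())
    return {
        "source": source,
        "event_type": body.pop("event", "unknown"),
        "timestamp": float(body.pop("ts", time.time())),
        "payload": body,                                 # remaining fields kept as-is
    }

if __name__ == "__main__":
    events = [
        ('{"event": "session_start", "ts": 1425060000, "device": "stb"}', "client_app"),
        ("event=segment_packaged ts=1425060007 asset=abc123", "jitp_packager"),
    ]
    # In practice these records would be appended to HDFS; here we simply print
    # the normalized JSON lines a downstream Hadoop job could consume.
    for raw, source in events:
        print(json.dumps(normalize_event(raw, source)))
```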

TV Sources for Big Data

The end-to-end TV ecosystem is a complex audio and video delivery system: source multimedia content is encoded and distributed to a head-end, where it is further manipulated or formatted for delivery over the core network to users on their customer premises equipment (CPE) devices. This end-to-end functionality can be thought of as a vertical decomposition into application, control, video, and network layers. Instrumenting these layers gives access to rich information that can be harnessed with Big Data.

Client-facing applications resident on a CPE device or in a web application can be instrumented to collect user interaction and content consumption information, providing insight into user behavior. An instrumented control plane, used for managing subscribers and creating network-centric applications such as nDVR and targeted ad insertion, can help drive operational understanding and efficiencies. Instrumented video and network layers provide insight into sub-program information, appliance operation (encoders, network recorders, packagers, and delivery), and network QoS.

DERIVING INSIGHT FOR EMERGING TV SERVICES

Quantifying User Interaction

Instrumented client applications, whether on traditional CPE or multi-screen devices, help quantify user interaction behavior. They enable measuring traditional events such as user navigation, content discovery, session start, tuner change, and direct interaction during content playback. The collected events can also be analyzed to ascertain more abstract, aggregated consumption behavior such as program popularity by geographic region.

Event monitoring and tracking systems facilitate user interaction logging in an unprecedented way. To address user privacy concerns, well-known hashing and security techniques are employed to protect subscriber anonymity and to secure the data. Even in this form, meaningful operational and content-utilization information can be determined for prescriptive and predictive purposes. Examples of such analyses include the number of active users at any given date and time, program and channel popularity, channel tuning persistence rates, and peak playback statistics for recorded content. This information is operationally useful in itself and is typically presented via dashboards and reports.
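The paper does not specify which hashing technique is used to protect subscriber anonymity. As one hedged illustration of the idea, the sketch below applies a keyed SHA-256 digest to the subscriber identifier before an interaction event leaves the device; the field names and key handling are assumptions for illustration only.

```python
import hashlib
import hmac
import json
import time

# Illustrative only: a keyed SHA-256 digest stands in for the "known hashing
# and security techniques" the paper mentions. A real deployment would have the
# operator provision and rotate this key, never embed it in client code.
ANONYMIZATION_KEY = b"operator-provisioned-secret"

def anonymize_subscriber(subscriber_id: str) -> str:
    """Return a stable, non-reversible token for a subscriber identifier."""
    return hmac.new(ANONYMIZATION_KEY, subscriber_id.encode(), hashlib.sha256).hexdigest()

def log_interaction(subscriber_id: str, event_type: str, **details) -> str:
    """Build an interaction event with the subscriber identity hashed out."""
    event = {
        "subscriber": anonymize_subscriber(subscriber_id),  # no raw ID leaves the device
        "event_type": event_type,                           # e.g. tune, trick_play, session_start
        "timestamp": time.time(),
        **details,
    }
    return json.dumps(event)

# Example: a channel-change event ready to be pushed to the event intake layer.
print(log_interaction("subscriber-001", "tune", channel="ESPN"))
```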

Collecting and analyzing subscribers' consumption and usage is an essential step in creating targeted experiences. Consumer-facing applications and services may also access actionable information for individual subscribers through defined interfaces, allowing customized user experiences and applications. Subscriber consumption patterns can be supplemented with subscriber profile information accessed through APIs to filter and construct customized application views for each user. An example would be dynamically creating a "My Favorite Channels" view by combining the above with a user's channel-usage patterns, for instance through a recommender system that applies machine-learning techniques to the mined data and usage patterns. A minimal sketch of the underlying channel-affinity computation is shown below.
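The paper does not describe the recommender itself. Assuming only anonymized tune events with a channel name and a watch duration (both field names are hypothetical), the simplest ingredient of such a view is a per-subscriber ranking of channels by total watch time:

```python
from collections import defaultdict

def favorite_channels(tune_events, top_n=5):
    """Rank channels for one (anonymized) subscriber by total watch time.

    tune_events: iterable of dicts with assumed fields
                 {"channel": str, "duration_sec": float}
    """
    watch_time = defaultdict(float)
    for event in tune_events:
        watch_time[event["channel"]] += event["duration_sec"]
    ranked = sorted(watch_time.items(), key=lambda kv: kv[1], reverse=True)
    return [channel for channel, _seconds in ranked[:top_n]]

# Example: a handful of tune events for a single anonymized subscriber.
events = [
    {"channel": "ESPN", "duration_sec": 5400},
    {"channel": "HBO", "duration_sec": 9000},
    {"channel": "ESPN", "duration_sec": 1800},
    {"channel": "CNN", "duration_sec": 600},
]
print(favorite_channels(events, top_n=3))   # ['HBO', 'ESPN', 'CNN']
```

In a production system this simple ranking would, as the paper notes, be blended with subscriber profile data and more sophisticated machine-learning models.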

Operational Analytics for Quantifying Network DVR (nDVR)

As TV continues to move from a linear, consumptive model to one that offers interactivity, nDVR is an emerging application that gives end consumers the flexibility to record in the cloud and subsequently play back content on a variety of devices. It also drives efficiencies for the operator, enabling thinner end clients while optimizing content storage and delivery. Conceptually, an nDVR system comprises consumer-facing applications that interface with Backoffice services, which in turn control the physical recording and delivery sub-systems.

Figure 2 - nDVR High-level Analytics Architecture

Figure 2 shows a high-level architecture of an analytics system for nDVR. The nDVR sub-components are instrumented to push events into the Data Intake Layer in a federated manner. The instrumented sub-components and services are:

- Client Application - the client (user-facing) application captures user interactions with the application and content, including navigation and trick-play behavior
- Backoffice Services - these comprise subscriber management (SMS), content guide / electronic program guide (EPG) information, content recording (Scheduler), and content playback / fulfilment (FM) transactions
- Data Plane Appliances - the recorders are instrumented to provide physical recording statistics, and the just-in-time packaging (JITP) system provides physical delivery information

Events are collected from these sub-components and services in a variety of formats. The transactions are stored in HDFS and processed in real time or in batches to create actionable insight; the resulting analysis jobs feed dashboards and reports that provide useful operational and usage information. The objective of any analytics system is to drive meaningful understanding of how the application, services, and hardware are being used. Operationally, such a system could also be used to predict the scaling needed to meet demand. A minimal sketch of one such batch aggregation is shown below.
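The paper does not show the analysis jobs themselves. As a hedged sketch, assuming the federated events land in HDFS as JSON lines with `event_type` and `timestamp` fields (an assumption, not the ARRIS schema), the following counts daily recording and playback transactions the way a simple batch job might; in practice this would run as a MapReduce, Hive, or Spark job over HDFS rather than plain Python over an extracted file.

```python
import json
from collections import Counter
from datetime import datetime, timezone

def daily_activity(event_lines):
    """Count recording and playback events per calendar day.

    event_lines: iterable of JSON strings with assumed fields
                 {"event_type": "record_start" | "playback_start" | ...,
                  "timestamp": unix seconds}
    """
    counts = Counter()
    for line in event_lines:
        event = json.loads(line)
        if event["event_type"] not in ("record_start", "playback_start"):
            continue  # ignore other federated events for this report
        day = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc).date()
        counts[(day.isoformat(), event["event_type"])] += 1
    return counts

# Example over a few synthetic events; a real job would stream a day's worth
# of HDFS-resident logs through the same aggregation to feed a dashboard.
sample = [
    '{"event_type": "record_start", "timestamp": 1425081600}',
    '{"event_type": "record_start", "timestamp": 1425085200}',
    '{"event_type": "playback_start", "timestamp": 1425088800}',
]
for (day, event_type), n in sorted(daily_activity(sample).items()):
    print(day, event_type, n)
```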

Media Analysis - Deriving Sub-program Content Structure

Beyond usage and operational transactions, usable information can be derived from the content itself by employing media analysis techniques on its various components [8].

Figure 3 - Media Analysis High-Level Process

Figure 3 shows a high-level processing framework for analyzing content. The Media Analysis process is a suite of event detectors (audio, video, statistical, natural language) that feed a fusion engine, with the goal of semantically labelling the content with actionable metadata. Combining different types of detectors provides a simple yet powerful mechanism for aligning events from different modalities to yield interesting results; for example, detecting black video frames together with short intervals of audio silence can be used to infer ad boundaries, as in the sketch below. Other detector combinations, often arranged hierarchically, can be used to obtain semantically rich events that are present in the content but would otherwise be hard to extract. Examples include creating sports highlight reels, harvesting news, or classifying weather stories by topic; likewise, logos, camera angles, telestrations, and speech patterns can be combined to infer a particular channel's creative signature across TV productions of different genres.
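The fusion logic is not described in the paper. As a hedged illustration of the black-frame-plus-silence example, the following intersects the time intervals reported by two hypothetical detectors and keeps overlaps long enough to be plausible ad boundaries; the detector output format and the 0.5-second threshold are assumptions.

```python
def overlap(a, b):
    """Return the overlapping (start, end) of two intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if end > start else None

def infer_ad_boundaries(black_frames, silences, min_overlap_sec=0.5):
    """Fuse black-video and audio-silence intervals into candidate ad boundaries.

    black_frames, silences: lists of (start_sec, end_sec) tuples, as a detector
    suite might emit them for a single asset.
    """
    boundaries = []
    for video_gap in black_frames:
        for audio_gap in silences:
            both = overlap(video_gap, audio_gap)
            if both and both[1] - both[0] >= min_overlap_sec:
                boundaries.append(both)   # black frames and silence coincide
    return boundaries

# Example: one coincident black/silent gap around t = 600 s suggests an ad break.
black = [(599.8, 601.2), (1805.0, 1805.2)]
silent = [(600.0, 601.0), (902.0, 903.5)]
print(infer_ad_boundaries(black, silent))   # [(600.0, 601.0)]
```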

While different TV program features are interesting in and of themselves, combining usage behavior and operational information with semantically labelled content can provide meaningful insight across the various vertical layers of the TV ecosystem.

EXAMPLE INSIGHTS

Information that is collected and processed must provide actionable value to the user, operator, or programmer. This section shows example dashboards, derived from actual data, that illustrate meaningful insights.

User Recording & Playback Behavior

Understanding how users interact with content on their CPE device or application is a fundamental starting point for understanding user behavior. Figure 4 shows an aggregate view of typical user behavior over a time span, covering DVR recordings, DVR playback, and VOD plays.

Figure 4 - User Activity Dashboard

The dashboard shows that DVR usage dominates traditional VOD consumption: roughly ten times as many programs are viewed via a user's DVR than via VOD. Recordings also consistently outnumber playbacks, indicating that not all recordings are played back. Lastly, recordings show a cyclic pattern while playbacks are more uniformly distributed, which corresponds to more recordings being scheduled during the week than on weekends. Detailed empirical evidence such as this informs how DVR systems should be sized and what bandwidth efficiencies could be gained if an operator moves to an nDVR system; detailed recording and playback behavior also provides indirect insight into content affinity.

Operational Insight - nDVR

Operationally, it is important to understand how the data plane is being utilized to help with capacity planning and management. Peak usage information is more insightful than average or total activity.

Figure 5 - Peak User Recording & Playback Activity

Figure 5 shows the trend of peak user recording and playback rates over time. Analysis of this data indicates that both playback and recording rates are increasing. Unsurprisingly, recording peaks surpass playback peaks and show cyclical behavior similar to Figure 4. Analysis of this sort is invaluable for capacity management and for planning additional resources. In this example, if the system can support a peak of 100K concurrent sessions for both recordings and playbacks, the recording peak is in imminent danger of reaching that maximum; peaks tend to be spiky, and a single major program event could exceed the system's limits. The playback rate is currently safer, though it is growing faster (additional analysis quantifies the rate of growth) and will probably surpass 200K concurrent sessions before recordings do. A sketch of how such peak-concurrency figures can be derived from session events is shown below.
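The paper does not detail how the peak rates in Figure 5 are computed. Assuming recording and playback sessions can be reconstructed from start/stop events as (start, end) time pairs (an assumption about the event model), a standard sweep-line pass gives the peak number of concurrent sessions:

```python
def peak_concurrency(sessions):
    """Compute the maximum number of simultaneously active sessions.

    sessions: iterable of (start_ts, end_ts) pairs, e.g. recording or playback
    sessions reconstructed from start/stop events in the intake layer.
    """
    # Classic sweep line: +1 at each start, -1 at each end, track the running sum.
    deltas = []
    for start, end in sessions:
        deltas.append((start, +1))
        deltas.append((end, -1))
    deltas.sort()                      # at equal timestamps, ends sort before starts
    active = peak = 0
    for _ts, change in deltas:
        active += change
        peak = max(peak, active)
    return peak

# Example: three recording sessions, two of which overlap in time.
recordings = [(0, 3600), (1800, 5400), (7200, 9000)]
print(peak_concurrency(recordings))    # 2
```

Running this per hour or per day over the recording and playback session sets would yield the kind of peak trend lines plotted in Figure 5.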

Bridging User Behavior with Content Structure

Big Data enables analyzing information from disparate sources to help glean insight across the different verticals of the TV ecosystem.

Figure 6 - User Behavior Correlated with Content Structure

Figure 6 shows the insight afforded by combining content structure information with usage behavior. Content analysis, performed by the Media Analysis process described earlier, was used to automatically characterize and annotate content segments as ad or program segments. Usage analytics for the same programming provides insight into the way consumers interact with that content. Superimposing the two sets of processed events gives content owners, producers, and operators actionable insight into the actual value consumers place on specific content segments; from this dashboard, the percentage of viewers who skipped through the ads can be determined. The data presented here is for a specific asset over a two-week window of operational data in 2015. Beyond simply characterizing ad-skipping behavior, information about content structure and ad segments allows operators and programmers to estimate the relative value of sub-program segments by tabulating the number of viewers who watched a segment, fast-forwarded through it, or rewound to watch it again; a minimal sketch of that correlation is shown below.
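The exact correlation the dashboard uses is not given in the paper. Assuming the media analysis step yields ad segments as time intervals and the client events yield per-viewer fast-forward intervals (both shapes are assumptions), the ad-skip percentage can be computed as the fraction of viewers whose trick-play activity overlaps any ad segment:

```python
def intervals_overlap(a, b):
    """True if the two (start, end) intervals overlap at all."""
    return max(a[0], b[0]) < min(a[1], b[1])

def ad_skip_rate(ad_segments, viewers_ff_intervals):
    """Fraction of viewers who fast-forwarded through at least one ad segment.

    ad_segments: list of (start_sec, end_sec) intervals labelled "Ad" by media analysis.
    viewers_ff_intervals: dict mapping an anonymized viewer id to the list of
    (start, end) fast-forward intervals reconstructed from trick-play events.
    """
    if not viewers_ff_intervals:
        return 0.0
    skippers = 0
    for _viewer, ff_intervals in viewers_ff_intervals.items():
        if any(intervals_overlap(ad, ff) for ad in ad_segments for ff in ff_intervals):
            skippers += 1
    return skippers / len(viewers_ff_intervals)

# Example: two ad breaks; two of three viewers fast-forwarded through one of them.
ads = [(600, 720), (1800, 1920)]
viewers = {
    "a1f3": [(595, 725)],        # skipped the first break
    "9bc0": [(1795, 1925)],      # skipped the second break
    "77de": [],                  # watched everything
}
print(f"{ad_skip_rate(ads, viewers):.0%} of viewers skipped at least one ad")  # 67%
```

The same overlap test, tallied per segment instead of per viewer, would give the relative value of individual sub-program segments described above.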

CONCLUSIONS

Big Data technologies were developed primarily to address the specific needs of the Internet. However, as TV delivery systems grow more complex and their services become more web-like, these technologies apply directly and flexibly to television systems. Actionable insight derived from collecting, processing, and analyzing federated sets of events provides several levels of valuable information: user behavior prediction, operational planning, and program and sub-program affinity are just some of the areas this approach can inform. In this paper, we provided an overview of the Big Data technology stack, discussed the sources of events in the TV system, and gave examples of meaningful conclusions that can be drawn from analyzed events across the various vertical layers of the application, control, network, and content planes. The combination of these insights, together with state-of-the-art recommender systems and machine-learning strategies, will be essential in driving operational efficiencies.

ACKNOWLEDGEMENTS

The authors wish to thank their colleagues in the Applied Research Center at ARRIS for their contributions to this work.

REFERENCES

[1] Arthur, L. (2013, August 15). What is big data? Retrieved from http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/
[2] Couts, A. (2012, September 25). Breaking down 'big data' and Internet in the age of variety, volume, and velocity. Retrieved from http://www.digitaltrends.com/web/state-of-the-web-big-data/
[3] Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety. In Application Delivery Strategies. Stamford, CT: META Group, Inc.
[4] Ghemawat, S., Gobioff, H., & Leung, S. (2003, October). The Google file system. ACM SOSP '03, Bolton Landing, NY.
[5] Chang, F., Ghemawat, S., Hsieh, W., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., & Gruber, R. E. (2006, November). Bigtable: A distributed storage system for structured data. OSDI '06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA.
[6] Apache HBase. (2014, March 17). Retrieved from http://hbase.apache.org
[7] Apache Hadoop. (2014, March 14). Retrieved from http://hadoop.apache.org
[8] Liu, Z., & Wang, Y. (1998). Audio feature extraction and analysis for scene segmentation and classification. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 20(1-2), 61-79. Kluwer Academic Publishers.