Spark use case at Telefonica CBS



Similar documents
StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón

Unified Batch & Stream Processing Platform

Big Data Analytics Nokia

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

Real-time Big Data Analytics with Storm

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Upcoming Announcements

What s New in Security Analytics Be the Hunter.. Not the Hunted

How To Create An Insight Analysis For Cyber Security

How To Handle Big Data With A Data Scientist

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Reference Architecture, Requirements, Gaps, Roles

GROW WITH BIG DATA Third Eye Consulting Services & Solutions LLC.

Client Overview. Engagement Situation. Key Requirements

Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience

Big Data Use Cases. To Start Today. Paul Scholey Sales Director, EMEA. 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866)

Azure Data Lake Analytics

Real Time Big Data Processing

Three Open Blueprints For Big Data Success

Ganzheitliches Datenmanagement

Big Data & Security. Aljosa Pasic 12/02/2015

Where every interaction matters.

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

The Future of Data Management

Integrating a Big Data Platform into Government:

THE 2014 THREAT DETECTION CHECKLIST. Six ways to tell a criminal from a customer.

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Getting Real Real Time Data Integration Patterns and Architectures

Databricks. A Primer

Cyber Security. BDS PhantomWorks. Boeing Energy. Copyright 2011 Boeing. All rights reserved.

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Information Technology Policy

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Identity and Access Management Integration with PowerBroker. Providing Complete Visibility and Auditing of Identities

Towards Smart and Intelligent SDN Controller

Syslog Analyzer ABOUT US. Member of the TeleManagement Forum

SANS Top 20 Critical Controls for Effective Cyber Defense

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

SAP and Hortonworks Reference Architecture

Oracle Big Data Strategy Simplified Infrastrcuture

A New Era Of Analytic

Databricks. A Primer

The Cyber Threat Profiler

redborder IPS redborder Just common sense IPS overview Common sense

Huawei One Net Campus Network Solution

The 4 Pillars of Technosoft s Big Data Practice

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

Hadoop & Spark Using Amazon EMR

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop

Five keys to a more secure data environment

Real Time Data Processing using Spark Streaming

How To Manage Security On A Networked Computer System

Environment. Attacks against physical integrity that can modify or destroy the information, Unauthorized use of information.

How To Make Data Streaming A Real Time Intelligence

Machine-to-Machine Exchange of Cyber Threat Information: a Key to Mature Cyber Defense

Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Massive Cloud Auditing using Data Mining on Hadoop

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

EnterpriseLink Benefits

Big Data Use Case: Business Analytics

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

SGT Technology Innovation Center Dasvis Project

The Future of Data Management with Hadoop and the Enterprise Data Hub

WHITE PAPER. FortiWeb and the OWASP Top 10 Mitigating the most dangerous application security threats

Dominik Wagenknecht Accenture

Comprehensive Analytics on the Hortonworks Data Platform

Big Data Pipeline and Analytics Platform

ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

Bellevue University Cybersecurity Programs & Courses

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Time-Series Databases and Machine Learning

Hayri Tarhan, Sr. Manager, Public Sector Security, Oracle Ron Carovano, Manager, Business Development, F5 Networks

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Cloudera Enterprise Data Hub in Telecom:

BIG DATA & DATA SCIENCE

The Internet of Things and Big Data: Intro

From Spark to Ignition:

RSA envision. Platform. Real-time Actionable Security Information, Streamlined Incident Handling, Effective Security Measures. RSA Solution Brief

This module provides an overview of service and cloud technologies using the Microsoft.NET Framework and the Windows Azure cloud.

Logical Operations CyberSec First Responder: Threat Detection and Response (CFR) Exam CFR-110

Introduction to Cybersecurity Overview. October 2014

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

The Hillstone and Trend Micro Joint Solution

Big Data must become a first class citizen in the enterprise

BIG DATA ANALYTICS For REAL TIME SYSTEM

Transcription:

CiberSecurity Spark use case at Telefonica CBS Telefónica Digital Digital Services

WHOAMI o Francisco J. Gomez o Worker at Telefónica (Spain) o Securityholic o @ffranz

WHY

WHY

WHY

CiberSecurity

Spark use case at Telefonica CBS Contents 01 Telefonica 02 Our Skills 03 Stratio 03 Upcoming Challenges What Does Cyber-Security Mean? Our skills Stratio distribution Spark Stratio Streaming What does it mean for us? What do we need? Architecture SQL Like Interface Cyber-Security in numbers Data adquisition Sinfonier Show me the reality Data fusion Traditional Approach Batch New Approach

1 CiberSecurity Telefonica

Telefonica What Does Cyber-Security Mean? Cybersecurity is the collection of tools, policies, security concepts, security safeguards, guidelines, risk management approaches, actions, training, best practices, assurance and technologies that can be used to protect the cyber environment and organization and user s assets. Organization and user s assets include connected computing devices, personnel, infrastructure, applications, services, telecommunications systems, and the totality of transmitted and/or stored information in the cyber environment. Cybersecurity strives to ensure the attainment and maintenance of the security properties of the organization and user s assets against relevant security risks in the cyber environment. The general security objectives comprise the following: Availability; Integrity, which may include authenticity and non-repudiation; Confidentiality ITU-T X.1205, Overview of cybersecurity (8) CYBERSECURITY THREAT. The term cybersecurity threat means any action that may result in unauthorized access to, manipulation of, or impairment to the integrity, confidentiality, or availability of an information system or information stored on or transiting an information system, or unauthorized exfiltration of information stored on or transiting an information system. DEPARTMENT OF HOMELAND SECURITY CYBERSECURITY AUTHORITY

Telefonica What does it mean for us? Cybersecurity is the collection of tools, policies capabilities to protect the cyber environment and organization and user s assets. Cybersecurity strives to ensure unauthorized access to, manipulation of the integrity, confidentiality, or availability of an information, or unauthorized exfiltration of information.

Telefonica Cyber-Security in numbers Hacktivism Cyber Crime Cyber Warfare Cyber Espionage DDoS (23%) SQLi (19%) Defacement (14%) Account Hijacking (9%) Unknown (18%)

Telefonica Show me the reality Storm UI World map Wordpress

Telefonica Storm UI

Telefonica Check point

Telefonica Traditional Approach

Telefonica Events, Logs and Alerts: Correlation Event Event Event Event Event Event Normalize Centralize Correlate Engine Storage Alerts Audit Event

Telefonica New Approach

Telefonica Context, Behavior, Anomalies: Processing and Storage Information Information Information Information Information Information Normalize Queue Process Store Access Early alerts Reports Trends Support Visualization

Telefonica Before / After

2 CiberSecurity Our Skills

OUR SKILLS But

OUR SKILLS We need skills in: Big Data Cloud NoSql

3 Detección de Amenazas Stratio

Stratio Whoami o Oscar Mendez Soto o CEO Stratio o CEO Paradigma tecnológico o Pon aquí lo que quieras!!!

Stratio Stratio distribution Spark is: A unifier data hub Combine historical data and real time data streaming in a single query Multi-application Multi-Data Concentrate all and any type of data into a single Data Hub that allows the implementation of any use case or application Big Data a child s play An easy SQL interface to access all the power and capabilities of the platform

Stratio Stratio distribution Spark

Stratio Fewer components: SDS 100 1/2 Components = 1/2 Odds of failure

Stratio Our architecture We adapt our platform to this Telefonica project. We have three step: Ingestion: We use Kafka Data fusion: We use Storm. Batch: We use Cassandra+Spark

Stratio Data Adquisition Data are in several sources: DNS traffic Data sources IP Social media Underground sources Goverment sources Sources Sources Sources Sources Sources Sources Sources Sources PULL Sources with the information. API The data are going to Kafka. The volume is totally variable. KAFKA

Stratio Data fusion We use Storm to process and normalized the information. The system must to generate alert to the customer. This use case required a Big Data component capable of processing the data and extract its information in real-time Warnings and alerts are time-sensitive in order to deal efficiently with security attacks.

Stratio Batch The data are saved in Cassandra. We use Cassandra direcly for the easy queries. And we used Spark to extract the information not accesible to cassandra directly. Data process INTEGRACIÓN INTEGRACIÓN INTEGRACIÓN

Stratio Spark+Cassandra Spark working over Cassandra Two plus two is four? Sometimes Sometimes it is five. G. Orwell

Stratio Spark+Cassandra FEW USERS / HIGH VOLUME OF DATA Usually when an analyst or a single user wants to access a big data repository, you need to distribute the information because it s too big. The main solution is to use tools based on MapReduce distributed processing like Spark. But the process is very hard and the cluster can t support many users/many concurrent operations.

Stratio Spark+Cassandra MANY USERS WITH HIGH VOLUME OF DATA You have many users and a high volume of data. In this case, you must design the queries correctly, because you need a database that is prepared for specific queries. The system will work well for the predesigned queries. This is the perfect case for Cassandra. This case is common in a lot of samples Machine to machine communication, financial transfers, mobile apps, log monitoring, network sensors, surveillance systems, ad hoc applications

Stratio With Spark-Cassandra we are covering a more complete use case A lot of users accessing a lot of data from applications or predefined reports. The needs of data analysts that can transform,analyze, and query openly high volumes of data with a more powerful data manipulation tool in their hands.

Stratio Spark-Cassandra enables the implementation of any use case or application with any number of users Applications or dashboards with many users and much data using a predefined set of queries, perfectly solved with Cassandra, using very few cluster resources. BI applications or tools with few users (BI analysts or similar) executing open queries, perfectly solved with Spark over Cassandra using the remaining power of the cluster. With Spark-Cassandra, Spark integrated with Cassandra, you combine the best of both solutions

Stratio Spark-Cassandra allows selecting just the initial data Spark needs from the Cassandra data store With the integration Spark-Cassandra we can leverage the power of Cassandra s main indexes, and especially secondary indexes in order to only and efficiently fetch the data we need And moreover In Spark-Cassandra we have improved the use of Cassandra s secondary indexes in order to speed up any interactive query, therefore we have maximum efficiency for the initial recovery of data. In Spark-Cassandra we have extended the filters and queries of Cassandra s secondary indexes to any logical operation or almost any sql sentence

4 CyberSecurity Upcoming challenges

Upcoming challenges Data fusion Change Storm to Spark Streaming in data fusion layer.

Upcoming challenges Use SQL-Like interface for analyst queries An SQL-Like language to simplify the use and combination of historical data and data streaming SQL-Like language that allows making interactive queries that combine queries over batch/historic data, with queries over real time data streaming in the easiest way. SQL + DSL (Linq) abstractions. Domain specific language: Business oriented, easy and scalable. Evolves and adapts to the business needs of any customer with UDFs (user defined functions). Script extension to allow imperative programming for smart scripts. Easy interface to all Stratio Dsitribution Spark modules. Meta is an SQL abstraction for distributed programming that combines stream and batch process (abstraction of tables and streams).

Upcoming challenges Sinfonier

Upcoming challenges Sinfonier: Simplify Building Process

Cierre