Big data security on.nl: infrastructure and one application
|
|
|
- Rosalyn Gardner
- 10 years ago
- Views:
Transcription
1 Big data security on.nl: infrastructure and one application Giovane C. M. Moura, Maarten Wullink, Moritz Mueller, and Cristian Hesselman SIDN Labs IEPG Meeting IETF 94 Yokohama, Nov. 1st, 2015
2 Open.nl DNS datasets SIDN is the.nl registry; nonprofit 5.6M domains registered; 5th cctld in zone size 2.5M DNSSec signed domains; 1st worldwide aggregated.nl auth servers data (DNS/IP/DNSSEC...) 18 months + ; daily updated open for research collab: talk to me [email protected]
3 Background SIDN is the.nl registry; nonprofit SIDN Labs research arm This presentation: big data security Contains parts of material submitted to NOMS 2016 and PAM 2016 conferences Mini-bio: joined SIDN last May; in academia before
4 Introduction Newly registered malicious domains have an abnormal initial DNS lookup [1] Random Sample Jan Mar, 2015 Phishing Daily Queries Daily Queries Days After Registration Days After Registration Figure:.nl DNS lookups - 20K Random vs Netcraft Phishing
5 Introduction Why is that? Assumption: spam-based business model Automated Maximize profit before being taken down Question: can we use this to improve security in the.nl zone? Or build an early warning system for newly registered domains? We have both registration and DNS traffic data Registry role Privacy framework and board that oversees it
6 Introduction What we need: 1. High-performance data analytics platform 2. Efficient algorithm that can be used in production This presentation covers both things
7 ENTRADA: our big data platform Hadoop cluster data streaming warehouse interactive response times 5K USD/EUR per node; low cost Store traffic data from.nl auth servers Enable production applications Based on open-source, can be deployed even in a cloud environment only one part (one converter) we develop in house studying open-source it
8 ENTRADA: our big data platform By definition, a data-streaming warehouse must deliver interactive response times pcap storage& analysis wouldn t fly not at low cost So, what are the alternatives? Our requirements: Usability = SQL Extensibility: no vendor lock-in Security: Dependability Low cost High performance
9 ENTRADA: our big data platform Engine Usab. Exten. Perf. Scal. Dep. HBase(HDFS) Elasticsearch MongoDB Hadoop+MapReduce(HDFS) PostgreSQL Impala+Parquet(HDFS) Table: Comparison of Data Query Engines (1 = matches our requirements, 0 does not match) Two Core parts: 1. Optimized Apache Parquet file format (based on Google s Dremel [2]) Column-based storage; reads only necessary columns convert pcap to parquet 2. MPP query engine (Impala [3]) multi parallel queries
10 ENTRADA: data flow Users (I) Recursive DNSes (II) Internet (III).nl Auth Servers (IV) pcap files STAGING pcap to parquet converter parquet files (V) Impala (VII) Daily Buffer (VI) Spark (VII) Data warehouse (VIII) Analyst and Applications HDFS Hadoop Cluster Figure: ENTRADA data sequence flow
11 ENTRADA: evaluation Query: select concat_ws('-',day,month,year), count(1) from dns.queries where ipv=4 and year = 'X' and month = 'Y' and day='z' group by concat_ws('-',day,month,year). 10 parallel queries (1 per day) 52 TB of pcap data = 2.2 TB of parquet; Time: 3.5 minutes on 4 data nodes Conclusion: fast, and cheap; and open-source
12 Part 2: Early Warning System (I) Pull new domains (II) Stats for domains (III) Classify into 1st-timer and re-regs. 1st-timers domains Suspicious Normal (IV) k-means Figure: ndews Architecture Bad domains are likely to be more popular k-means clustering algorithm: unsupervised, classifies according to features Req, IPs, CC, ASes Run it every day
13 Evaluation 1,5+ years of DNS data on ENTRADA 78B DNS request/responses All registration database Key Value Interval Jan 1st, 2015 to Aug 30th 2015 Average.nl zone size 5,500,000 new domains 586,201 New domains - first timers 476,040(81.2%) New domains - re-registered 110,161 (18.8%) Total DNS Requests 32,864,402,270 DNS request new domains (24h) 826,740 DNS request new domains - first-timers (24h) 420,362 Table: Evaluated datasets (from one.nl auth server)
14 Evaluation Cluster Size Req IPs CC ASes Normal 132, Suspicious 2, Table: Mean values for features and clusters - excluding domains with 1 request - 1st Timers
15 Validation Were those suspicious domains really malicious? Very hard to verify on historical data: if they had pages; they might be gone or diff by now Results on historical data: Content analysis: 148 shoes stores, 17 adult/malware 19 phishing domains (out of 49 reported by Netcraft on the same period) VirusTotal: 25 domains matched Results on current data: By far the shoes sites dominate it Adult and malware is also detected; we now download screenshots and content as we classify False positives: rapidly popular political websites and others
16 Discussion Why so many (5 10) new shoes stores per day? Probably concocted websites [4] Automatically created; spam based
17 Why shoes? Most counterfeit product = 40% of US Border seizures [5] Large demand Re-current registration suggest profitability; one site down does not affect operations Online fraud is the NL: 5.3 billion EUR in 2 years; many from site websites [6] Evade industry s tools/techniques: Solutions for phishing and malware exist Users left unprotected Shoes are a smart play: high demand, and low penalties Currently: studying how to share/handle this
18 Summary We showed ENTRADA, our data streaming warehouse Fast & cheap Anyone can deploy it; even on a cloud We showed one application of it: new malicious domains early warning system Under the radar abuse form (shoes) Can be detected by their lookup patterns Run it on a daily basis; have to reduce false positives Studying pilot studies to handle that information More big-data based security applications to come
19 Questions? Contact: Looking for collaboration to : build and validate systems to improve security; write measurement papers Thank you for your attention
20 Bibliography I Hao, Shuang and Feamster, Nick and Pandrangi, Ramakant, Monitoring the initial dns behavior of malicious domains, in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, ser. IMC 11. New York, NY, USA: ACM, 2011, pp S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis, Dremel: Interactive Analysis of Web-scale Datasets, Proc. VLDB Endow., vol. 3, no. 1-2, pp , Sep M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs et al., Impala: A modern, open-source SQL engine for Hadoop, in Proceedings of the Conference on Innovative Data Systems Research (CIDR 15), 2015.
21 Bibliography II A. Abbasi and H. Chen, A comparison of tools for detecting fake websites, Computer, no. 10, pp , N. Schmidle, Inside the Knockoff-Tennis-Shoe Factory - The New York Times, FraudHelpdesk.nl, Ruim miljoen Nederlanders opgelicht (in Dutch), ruim-miljoen-nederlanders-opgelicht-2/, Dec 2014.
Malicious domains: Automatic Detection with DNS traffic analysis
Malicious domains: Automatic Detection with DNS traffic analysis Giovane C. M. Moura, Maarten Wullink, Moritz Müller, and Cristian Hesselman SIDN Labs {first.lastname}@sidn.nl Network Machine Learning
DNS Big Data Analy@cs
Klik om de s+jl te bewerken Klik om de models+jlen te bewerken! Tweede niveau! Derde niveau! Vierde niveau DNS Big Data Analy@cs Vijfde niveau DNS- OARC Fall 2015 Workshop October 4th 2015 Maarten Wullink,
ENTRADA: Enabling DNS Big Data Applications
ENTRADA: Enabling DNS Big Data Applications Maarten Wullink - SIDN APWG ecrime 2016 June 2 nd 2016 - Toronto What if You have many TB s of network data? And you want to: 1. Store it efficiently 2. Query
.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken
Klik om de s+jl te bewerken Klik om de models+jlen te bewerken Tweede niveau Derde niveau Vierde niveau.nl ENTRADA Vijfde niveau CENTR-tech 33 November 2015 Marco Davids, SIDN Labs Wie zijn wij? Mijlpalen
TLD Data Analysis. ICANN Tech Day, Dublin. October 19th 2015 Maarten Wullink, SIDN. Klik om de s+jl te bewerken
Klik om de s+jl te bewerken Klik om de models+jlen te bewerken Tweede niveau Derde niveau Vierde niveau TLD Data Analysis Vijfde niveau ICANN Tech Day, Dublin October 19th 2015 Maarten Wullink, SIDN Wie
Cymon.io. Open Threat Intelligence. 29 October 2015 Copyright 2015 esentire, Inc. 1
Cymon.io Open Threat Intelligence 29 October 2015 Copyright 2015 esentire, Inc. 1 #> whoami» Roy Firestein» Senior Consultant» Doing Research & Development» Other work include:» docping.me» threatlab.io
Big Data in Action: Behind the Scenes at Symantec with the World s Largest Threat Intelligence Data
Big Data in Action: Behind the Scenes at Symantec with the World s Largest Threat Intelligence Data Patrick Gardner VP Engineering Sourabh Satish Distinguished Engineer Symantec Vision 2014 - Big Data
Domain Name Abuse Detection. Liming Wang
Domain Name Abuse Detection Liming Wang Outline 1 Domain Name Abuse Work Overview 2 Anti-phishing Research Work 3 Chinese Domain Similarity Detection 4 Other Abuse detection ti 5 System Information 2 Why?
Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs
Decoding DNS data Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs The Domain Name System (DNS) is a core component of the Internet infrastructure,
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
WE KNOW IT BEFORE YOU DO: PREDICTING MALICIOUS DOMAINS Wei Xu, Kyle Sanders & Yanxin Zhang Palo Alto Networks, Inc., USA
WE KNOW IT BEFORE YOU DO: PREDICTING MALICIOUS DOMAINS Wei Xu, Kyle Sanders & Yanxin Zhang Palo Alto Networks, Inc., USA Email {wei.xu, ksanders, yzhang}@ paloaltonetworks.com ABSTRACT Malicious domains
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.
Ali Ghodsi Head of PM and Engineering Databricks
Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP
OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP 1 KALYANKUMAR B WADDAR, 2 K SRINIVASA 1 P G Student, S.I.T Tumkur, 2 Assistant Professor S.I.T Tumkur Abstract- Product Review System
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
We Know It Before You Do: Predicting Malicious Domains
We Know It Before You Do: Predicting Malicious Domains Abstract Malicious domains play an important role in many attack schemes. From distributing malware to hosting command and control (C&C) servers and
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting
Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting A.Nappa, M.Z.Rafique, J.Caballero IMDEA Software Institute Madrid, Spain July 18, 2013 Drive-by-downloads 1 Visit to
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
IANA Functions to cctlds Sofia, Bulgaria September 2008
IANA Functions to cctlds Sofia, Bulgaria September 2008 Kim Davies Internet Assigned Numbers Authority Internet Corporation for Assigned Names & Numbers What is IANA? Internet Assigned Numbers Authority
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
Big Data on Cloud Computing- Security Issues
Big Data on Cloud Computing- Security Issues K Subashini, K Srivaishnavi UG Student, Department of CSE, University College of Engineering, Kanchipuram, Tamilnadu, India ABSTRACT: Cloud computing is now
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Building your Big Data Architecture on Amazon Web Services
Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha [email protected] AWS Services Deployment & Administration Application Services Compute Storage Database Networking
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
A Privacy framework for DNS big data. November 28, 2014 Jelte Jansen
A Privacy framework for DNS big data November 28, 2014 Jelte Jansen SIDN.nl (Registry voor Nederland) 5.5M domain names, >1.600 registrars > 1.300.000.000 DNS queries per day Private foundation with public
Basheer Al-Duwairi Jordan University of Science & Technology
Basheer Al-Duwairi Jordan University of Science & Technology Outline Examples of using network measurements /monitoring Example 1: fast flux detection Example 2: DDoS mitigation as a service Future trends
Understanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
Internet Monitoring via DNS Traffic Analysis. Wenke Lee Georgia Institute of Technology
Internet Monitoring via DNS Traffic Analysis Wenke Lee Georgia Institute of Technology 0 Malware Networks (Botnets) 1 From General-Purpose to Targeted Attacks 11/14/12 2 Command and Control l Botnet design:
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
In-Database Analytics
Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing
Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP
Operates more like a search engine than a database Scoring and ranking IP allows for fuzzy searching Best-result candidate sets returned Contextual analytics to correctly disambiguate entities Embedded
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Full-text Search in Intermediate Data Storage of FCART
Full-text Search in Intermediate Data Storage of FCART Alexey Neznanov, Andrey Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia [email protected],
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
Big Data on Google Cloud
Big Data on Google Cloud Using Cloud Dataflow, BigQuery, and friends to process data the Cloud way William Vambenepe, Lead Product Manager for Big Data, Google Cloud Platform @vambenepe / [email protected]
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience
Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience 黃 振 修 (Chris Huang) SPN 主 動 式 雲 端 截 毒 技 術 架 構 師 About Me SPN 主 動 式 雲 端 截 毒 技 術 架 構 師 SPN Hadoop 基 礎 運 算 架 構 師 Hadoop in Taiwan
BOTNET Detection Approach by DNS Behavior and Clustering Analysis
BOTNET Detection Approach by DNS Behavior and Clustering Analysis Vartika Srivastava, Ashish Sharma Dept of Computer science and Information security, JIIT Noida, India Abstract -Botnets are one of the
The Flash-Transformed Financial Data Center. Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014
The Flash-Transformed Financial Data Center Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014 Forward-Looking Statements During our meeting today we will
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
Whose IP Is It Anyways: Tales of IP Reputation Failures
Whose IP Is It Anyways: Tales of IP Reputation Failures SESSION ID: SPO-T07 Michael Hamelin Lead X-Force Security Architect IBM Security Systems @HackerJoe What is reputation? 2 House banners tell a story
Dashboard Engine for Hadoop
Matt McDevitt Sr. Project Manager Pavan Challa Sr. Data Engineer June 2015 Dashboard Engine for Hadoop Think Big Start Smart Scale Fast Agenda Think Big Overview Engagement Model Solution Offerings Dashboard
Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
Parquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li [email protected] Software engineer, Cloudera Impala Outline Context from various
Hadoop and its Usage at Facebook. Dhruba Borthakur [email protected], June 22 rd, 2009
Hadoop and its Usage at Facebook Dhruba Borthakur [email protected], June 22 rd, 2009 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed on Hadoop Distributed File System Facebook
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at [email protected].
WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution
WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)
FAQ (Frequently Asked Questions)
FAQ (Frequently Asked Questions) Specific Questions about Afilias Managed DNS What is the Afilias DNS network? How long has Afilias been working within the DNS market? What are the names of the Afilias
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
Constructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
Know Your Foe. Threat Infrastructure Analysis Pitfalls
Know Your Foe Threat Infrastructure Analysis Pitfalls Who Are We? Founders of PassiveTotal Analysts/researchers with 10+ years of collective experience Interested in Better UX/UI for security systems Improving/re-thinking
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
Beyond Batch Processing: Towards Real-Time and Streaming Big Data
Beyond Batch Processing: Towards Real-Time and Streaming Big Data Saeed Shahrivari, and Saeed Jalili Computer Engineering Department, Tarbiat Modares University (TMU), Tehran, Iran [email protected],
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Data Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
Next-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY
Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark- project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley» 50- person
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
CentralNic Privacy Policy Last Updated: July 31, 2012 Page 1 of 12. CentralNic. Version 1.0. July 31, 2012. https://www.centralnic.
CentralNic Privacy Policy Last Updated: July 31, 2012 Page 1 of 12 CentralNic Privacy Policy Version 1.0 July 31, 2012 https://www.centralnic.com/ CentralNic Privacy Policy Last Updated: February 6, 2012
International Journal of Applied Science and Technology Vol. 2 No. 3; March 2012. Green WSUS
International Journal of Applied Science and Technology Vol. 2 No. 3; March 2012 Abstract 112 Green WSUS Seifedine Kadry, Chibli Joumaa American University of the Middle East Kuwait The new era of information
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding
Stanford Computer Security Lab. TrackBack Spam: Abuse and Prevention. Elie Bursztein, Peifung E. Lam, John C. Mitchell Stanford University
Abuse and Prevention Stanford University Stanford Computer Security Lab TrackBack Spam: Introduction Many users nowadays post information on cloud computing sites Sites sometimes need to link to each other
Big Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity
Performance and Scalability Overview
Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics Platform. Contents Pentaho Scalability and
Examining How The Great Firewall Discovers Hidden Circumvention Servers
Examining How The Great Firewall Discovers Hidden Circumvention Servers Roya Ensafi, David Fifield, Philipp Winter, Nick Feamster, Nicholas Weaver, and Vern Paxson Oct 29, 2015 1 Circumventing Internet
Approaches for parallel data loading and data querying
78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies [email protected] This paper
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
Customized Report- Big Data
GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.
A USE CASE OF BIG DATA EXPLORATION & ANALYSIS WITH HADOOP: STATISTICS REPORT GENERATION
A USE CASE OF BIG DATA EXPLORATION & ANALYSIS WITH HADOOP: STATISTICS REPORT GENERATION Sumitha VS 1, Shilpa V 2 1 M.E. Final Year, Department of Computer Science Engineering (IT), UVCE, Bangalore, [email protected]
Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee
Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee {yhlee06, lee}@cnu.ac.kr http://networks.cnu.ac.kr/~yhlee Chungnam National University, Korea January 8, 2013 FloCon 2013 Contents Introduction
Networking in the Hadoop Cluster
Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop
Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
