CRITEO INTERNSHIP PROGRAM 2015/2016




A. List of topics

PLATFORM

Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component.
Challenge(s): Working with the latest web technologies, delivering value week after week. Opportunity to bring UX creativity to the table.

Topic 2: Build the plugins for major e-commerce platforms to help our customers set up tags & feeds for their technical integration.
Challenge(s): Playing with different platforms in different languages.

Topic 3: Build a proper graphic library for Criteo web applications.
Challenge(s): Working with a UX designer, having a significant visual impact.
Duration: 3 to 4 months

Topic 4: Test platform implementation.
Challenge(s): Working on the full e-mail retargeting platform.

ENGINE

Topic 1: Study the impact of user data update collisions on prediction performance; design, prototype and benchmark workaround solutions.
Challenge(s): Synchronizing billions of updates for billions of users without losing a single byte of data.

Topic 2: Improve Criteo's offline ecosystem to support time-based incremental jobs; A/B test the performance improvement over the existing workflow.
Challenge(s): Hadoop jobs on terabyte logs, comparison of persistence layers (HDFS flat files, HBase...).

Topic 3: Provide metrics, analysis and methodologies to study the impact of graphical decisions in our banners. Extract guidance for future enhancements.
Challenge(s): Tons of data to explore for the first time, a new El Dorado. Test theories and assumptions against our billion users through A/B tests.

Topic 4: Build a modern web interface to administer our creative offer.
Challenge(s): The latest web technologies can be explored and used in production.

Topic 5: Build a platform for data scientists to experiment with and analyze different ML techniques.
Challenge(s): State-of-the-art data processing technologies, including data management, processing engine and visualization.

Topic 6: Implement and test new ways to predict which products the user is ultimately going to buy.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.

Topic 7: Improve the reactivity of our prediction models.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.

Topic 8: New optimization techniques for model training.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.

Topic 9: Deep learning for feature extraction.
Challenge(s): State-of-the-art machine learning techniques.

Topic 10: Platform monitoring.
Challenge(s): Transverse vision of the prediction platform; state-of-the-art UX design.

Topic 11: Model learning technical analysis; prediction metrics analysis.
Challenge(s): Transverse vision of the prediction platform; leverage anomaly detection.

Topic 12: TestFwk Webservices & Jupyter Notebook.
Challenge(s): Transverse vision of the prediction platform; state-of-the-art client/backend integration.

SCALABILITY

Topic 1: Improve or develop a mutation testing tool in C# in order to help us increase the quality of our tests.
Challenge(s): Dive into the arcana of .NET while handling big data problems: the solution developed must be usable on a code base of several million lines. The resulting code will likely be open sourced.

Topic 2: Develop a safety net to ease deletion of legacy features.
Challenge(s): Work on production C# code. The resulting approach can be used as raw material for a technical paper.

SITE RELIABILITY

Topic 1: Extensive metrics on cluster usage: anomaly and trend detection.

Topic 2: Cluster indexing of technical/job logs for analysis.

Topic 3: Explore alternatives to Tableau for dashboarding.

Topic 4: Create a way to easily configure our Couchbase/Memcached clusters.
Challenge(s): DevOps topics.

Topic 5: Next generation of the software persisting all of Criteo's input (kafka2hdfs v2).
Challenge(s): Real-world data and mission-critical constraints. If it works, all of Criteo's input will go through it. Using Spark, Kafka and Hadoop on Criteo's big data.
Duration: 4 months

Topic 6: Improve the Kafka MirrorMaker used by many companies to implement HA.
Challenge(s): Real-world data and mission-critical constraints. If it works, all of Criteo's input will go through it. Very high visibility open source project.
Duration: 3 months

Topic 7: Deployment of persistent frameworks over Apache Mesos.
Challenge(s): DevOps problems.

Topic 8: Tracking resource usage and isolation with Mesos and Docker.
Challenge(s): DevOps problems, using Docker.

Topic 9: Manage developers' workstations through Chef.
Challenge(s): Development with Ops affinities, Chef, Linux and Windows environments.
Duration: 3 months

Topic 10: Create a personal dashboard for developers.
Challenge(s): Provision of a configurable dashboard for each developer, which will make it easy and safe to change code.

Topic 11: Developers tool - Enhance IDE productivity.
Challenge(s): Working on the main development tool, using different languages and environments.

Topic 12: Exploring the field of predictive monitoring (i.e. the use of prediction algorithms to identify issues before they happen on the platform).
Challenge(s): Exploratory, direct impact on the on-duty team, broad application, using development and machine learning.

Topic 13: Automated monitoring check configuration.
Challenge(s): Exploratory, huge impact on usability of the whole system, machine learning.

Topic 14: Extend dashboarding system: The dashboard solution needs some improvements (addition of new data sources, definition of new widgets, etc.), and this internship aims at continuing the development of Dashing and making those improvements.
Challenge(s): UX, diversity of data sources, advanced HTML/JS/... use, very visible output.

ANALYTICS INFRASTRUCTURE

Topic 1: SQL Similarity and Recommendation: The goal of this internship is to provide, for a given input query, suggested queries ranked by similarity and performance.
Challenge(s): Automate a typically very manual DBA task in order to improve the lives of hundreds of users. Open source it and reap fame and glory.

INFRASTRUCTURE & OPERATIONS

Topic 1: Improve the escalation report dashboard with a JIRA extraction.
Challenge(s): Automation, development, JIRA/REST API improvement.

B. How to apply

We are considering candidates in their final year of studies, expected to graduate in 2016. Internship start dates are flexible; applicants will be considered on a rolling basis. The duration of the internship may vary between 3 and 6 months, according to the topic. All internship opportunities are based in Paris.

Please send your resume to r&drecruitment@criteo.com.