CRITEO INTERNSHIP PROGRAM 2015/2016

A. List of topics

PLATFORM
Topic 1: Build an API and a web interface on top of it to manage the back-end of our third-party demand component.
Challenge(s): Working with the latest web technologies, delivering value week after week. Opportunity to bring UX creativity to the table.
Topic 2: Build plugins for major e-commerce platforms to help our customers set up tags & feeds for their technical integration.
Challenge(s): Playing with different platforms in different languages.
Topic 3: Build a proper graphic library for Criteo web applications.
Challenge(s): Working with a UX designer, having a significant visual impact.
Duration: 3 to 4 months
Topic 4: Test platform implementation.
Challenge(s): Working on the full e-mail retargeting platform.

ENGINE
Topic 1: Study the impact of user data update collisions on prediction performance; design, prototype and benchmark workaround solutions.
Challenge(s): Synchronizing billions of updates for billions of users without losing a single byte of data.
Topic 2: Improve the Criteo offline ecosystem to support time-based incremental jobs; A/B test the performance improvement over the existing workflow.
Challenge(s): Hadoop jobs on terabyte logs, comparison of persistence layers (HDFS flat files, HBase...).
Topic 3: Provide metrics, analysis and methodologies to study the impact of graphical decisions in our banners. Extract guidance for future enhancements.
Challenge(s): Tons of data to explore for the first time, a new El Dorado. Confront theories and assumptions with our billion users through A/B tests.
Topic 4: Build a modern web interface to administer our creative offer.
Challenge(s): The freshest web technologies can be explored and used in production.
Topic 5: Build a platform for data scientists to experiment with and analyze different ML techniques.
Challenge(s): State-of-the-art data processing technologies, including data management, processing engine and visualization.
Topic 6: Implement and test new ways to predict which products the user is ultimately going to buy.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.
Topic 7: Improve the reactivity of our prediction models.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.
Topic 8: New optimization techniques for model training.
Challenge(s): Hadoop jobs on massive datasets, applied machine learning.
Topic 9: Deep learning for feature extraction.
Challenge(s): State-of-the-art machine learning techniques.
Topic 10: Platform monitoring.
Challenge(s): Cross-cutting view of the prediction platform; state-of-the-art UX design.
Topic 11: Model learning technical analysis; prediction metrics analysis.
Challenge(s): Cross-cutting view of the prediction platform; leverage anomaly detection.
Topic 12: TestFwk Webservices & Jupyter Notebook.
Challenge(s): Cross-cutting view of the prediction platform; state-of-the-art client / backend integration.

SCALABILITY
Topic 1: Improve or develop a mutation testing tool in C# in order to help us increase the quality of our tests.
Challenge(s): Dive into the arcana of .NET while handling big data problems: the solution developed must be usable on a code base of several million lines. The resulting code will likely be open sourced.
Topic 2: Develop a safety net to ease the deletion of legacy features.
Challenge(s): Work on production C# code. The resulting approach can be used as raw material for a technical paper.

SITE RELIABILITY
Topic 1: Extensive metrics on cluster usage: anomaly and trend detection.
Topic 2: Cluster indexing of technical / job logs for analysis.
Topic 3: Explore alternatives to Tableau for dashboarding.
Topic 4: Create a way to easily configure our Couchbase/Memcached clusters.
Challenge(s): DevOps topics.
Topic 5: Next generation of the software persisting all of Criteo's input (kafka2hdfs v2).
Challenge(s): Real-world data and mission-critical constraints. If it works, all of Criteo's input will go through it. Using Spark, Kafka and Hadoop on Criteo's big data.
Duration: 4 months
Topic 6: Improve the Kafka MirrorMaker, used by many companies to implement HA.
Challenge(s): Real-world data and mission-critical constraints. If it works, all of Criteo's input will go through it. Very high visibility open source project.
Duration: 3 months
Topic 7: Deployment of persistent frameworks over Apache Mesos.
Challenge(s): DevOps problems.
Topic 8: Tracking resource usage and isolation with Mesos and Docker.
Challenge(s): DevOps problems, using Docker.
Topic 9: Manage developers' workstations through Chef.
Challenge(s): Development with Ops affinities, Chef, Linux and Windows environments.
Duration: 3 months
Topic 10: Create a personal dashboard for developers.
Challenge(s): Provision of a configurable dashboard for each developer, which will make it easy and safe to change code.
Topic 11: Developer tools - Enhance IDE productivity.
Challenge(s): Working on the main development tool, using different languages and environments.
Topic 12: Explore the field of predictive monitoring (i.e. the use of prediction algorithms to identify issues before they happen on the platform).
Challenge(s): Exploratory, direct impact on the on-duty team, broad application, using development and machine learning.
Topic 13: Automated monitoring check configuration.
Challenge(s): Exploratory, huge impact on the usability of the whole system, machine learning.
Topic 14: Extend the dashboarding system: the dashboard solution needs some improvements (addition of new data sources, definition of new widgets, etc.), so this internship aims at continuing the development of Dashing to make those improvements.
Challenge(s): UX, diversity of data sources, advanced HTML/JS/... use, very visible output.

ANALYTICS INFRASTRUCTURE
Topic 1: SQL Similarity and Recommendation: the goal of this internship is to provide, for a given input query, suggested queries ranked by similarity and performance.
Challenge(s): Automate a typically very manual DBA task in order to improve the lives of hundreds of users. Open source it and reap fame and glory.

INFRASTRUCTURE & OPERATIONS
Topic 1: Improve the escalation report dashboard with a JIRA extraction.
Challenge(s): Automation, development, JIRA/REST API improvement.

B. How to apply

We are considering candidates in their final year of studies who expect to graduate in 2016. Internship start dates are flexible; applicants will be considered on a rolling basis. The duration of the internship may vary between 3 and 6 months, depending on the topic. All internship opportunities are based in Paris.

Please send your resume to r&drecruitment@criteo.com.