CURRICULUM VITAE

PERSONAL DATA

Name: Lars Francke, Diplom-Wirtschaftsinformatiker (FH)
Address: Sülldorfer Kirchenweg 34, 22587 Hamburg, Germany
Phone: +49 172 / 4554978
E-Mail: mail@lars-francke.de
Date of birth: 1981-12-17
Homepage: http://lars-francke.de

PROJECTS

since 02.2014  Cloudera, Palo Alto, USA
I am a Certified Cloudera Consultant working with Cloudera EMEA on customer engagements. This entails deployment and optimization of Hadoop clusters based on CDH, including security using Kerberos.
Projects:
- Retailer, Great Britain (October 2015)
  - Setting up High Availability for Hue, Hive, Oozie and others using HAProxy
  - Backup & Disaster Recovery plan and implementation using BDR & DistCp
  - YARN resource pool configuration
  - Benchmarking
  - Spark application debugging
- Financial institution, Great Britain (August–September 2015)
  - Setup of secure Hadoop on two CDH clusters
    - Active Directory integration
    - Sentry
    - SSL/TLS encryption
    - LDAP authentication
- Financial institution, Great Britain (August 2015)
  - Setup of a CDH 5.4 cluster
  - Integration of the company's Active Directory
- Online ticketing and event company, Germany (November 2014)
  - Installation of CDH 5.2 on a new cluster (Ubuntu 14.04, Spark on YARN)
  - Integration of the company's Active Directory to provide secure Hadoop
- Software development company, Poland (September 2014)
  - Installation of CDH5 on a new virtual cluster
    - Installation of a local MIT KDC and setting up Hadoop security
  - Training on Cloudera Manager
  - Integration of Pig in a C++ application
  - Consultation around best practices in Hadoop development
- Price comparison site, United Kingdom (June–July 2014)
  - Installation of CDH5 on a new cluster
  - Review and optimization of the configuration
  - Import of existing MongoDB databases in BSON format to HDFS
  - Processing of the BSON files using Hive, Impala and Pig; transformation of the data to Avro and Parquet
  - Export of data to SQL Server using Sqoop
- Online marketplace for car sales, Germany (May 2014)
  - Migration of an existing CDH4 cluster, which was installed using packages with Puppet, to Cloudera Manager and parcels
  - Design and deployment of a security concept using Kerberos with Active Directory integration and Sentry
  - Optimization of an existing Flume infrastructure
  - Setup and demonstration of Hue, Oozie and Impala
- Telecommunications company, Belgium (March–April 2014)
  - Certification of an existing CDH4 cluster
  - Review and optimization of the cluster
  - Design and deployment of a security concept using Kerberos with Active Directory integration and Sentry
- IT consulting company, France (February 2014)
  - Advice on hardware selection as well as network design
  - Preparation of the operating system (CentOS)
  - Installation and optimization of CDH4
  - Training of employees in using Hue and Hadoop as well as in the development of Hive UDFs

08.2015  Euroclear, Brussels, Belgium
- Setup of a CDH 5.4 cluster
- Integration of the company's Active Directory, including Sentry

07.2015  LeanBI, Stettlen, Switzerland
- Spark & Hadoop consulting and troubleshooting

07.2015  P3 Communications, Aachen, Germany
- Spark consulting
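Several of the engagements above centre on Hadoop training and development best practices. The MapReduce model at the heart of such training can be sketched in miniature in plain Python (a purely illustrative toy, not code from any client project; all function names are mine):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit (word, 1) pairs, as a Hadoop mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reducer: sum the counts per key; the dict stands in for the shuffle."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

counts = reduce_phase(map_phase(["the quick fox", "the lazy dog"]))
print(counts["the"])  # 2
```

In a real cluster the shuffle groups keys across machines; the local dictionary here only mimics that grouping on a single node.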
06.2015  Roche Diagnostics, Mannheim, Germany
- Hadoop consultancy
- Maintenance of a cluster based on Amazon's EC2

05.2015  OTTO, Hamburg, Germany
- Consultancy around the BRAIN project (new BI platform): HBase, Hadoop, Spark, real time

04.2015  simpli.fi, Fort Worth, USA
- Consultancy around Hadoop, Spark, best practices, Kafka
- Review of an architecture based on Kafka, Flume, Hadoop
- Review of an existing cluster regarding best practices and performance
- Cluster sizing based on predicted usage

04.2015 – 06.2015  T-Systems Iberia, Barcelona, Spain & Deutsche Telekom, Bremen, Germany
- Review of a proposed Hadoop-based architecture to replace an Oracle & Informatica based data warehouse and ETL process
- Consultancy and training on all things Hadoop, Spark, HBase
- Setup of a development Hadoop cluster

03.2015  SDG Consulting, Hamburg, Germany
- Consultancy and training on all things Hadoop, Spark and Big Data
- Development of Spark applications and Hive UDFs for PoC projects
- Tableau & Spark integration
- Installation of a Hadoop cluster on Microsoft Azure

01.2015  GfK SE, Nuremberg, Germany
- Documentation and consultation around making and validating informed decisions on the following topics:
  - HBase vs. Accumulo, Spark, SQL-on-Hadoop
  - Backup and high availability of Hadoop clusters
  - PaaS, IaaS and bare-metal deployments in public and private cloud scenarios
- Development of code for HBase-backed projects
- General Hadoop and Spark consultancy

09.2014  xplosion Interactive, Hamburg, Germany
- Migration of a Hadoop installation (set up using Chef) to Cloudera Manager
- Setup of Kerberos with Samba4 & Univention UCS for Hadoop security
- Troubleshooting
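One of the items above is cluster sizing based on predicted usage. A back-of-the-envelope version of that estimate can be sketched as follows; every parameter (3x replication, 25% working-space overhead, 24 TB of disk per node, 70% maximum fill) is an illustrative assumption of mine, not a figure from the engagement:

```python
import math

def nodes_needed(raw_data_tb, replication=3, overhead=1.25,
                 disk_per_node_tb=24, max_fill=0.7):
    """Rough worker-node count for a given raw data volume (TB).

    Capacity required = raw data x replication x working-space overhead;
    usable capacity per node = physical disk x maximum fill ratio.
    """
    required_tb = raw_data_tb * replication * overhead
    usable_per_node_tb = disk_per_node_tb * max_fill
    return math.ceil(required_tb / usable_per_node_tb)

print(nodes_needed(100))  # 23 nodes for 100 TB of raw data
```

A real sizing exercise would also weigh CPU, memory, ingest rate and growth, but disk capacity is usually the first constraint checked.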
09.2014 – 12.2014  advanced STORE, Berlin, Germany
- Consultancy around Big Data solutions for tools in the real-time bidding world (e.g. generating models)
- Development of a prototype/proof of concept in Java using Dropwizard, Aerospike, RxJava and MongoDB
- Focus on pre-processing data using MongoDB and Aerospike and on low-latency Java web applications
- Setup of the ELK stack (Elasticsearch, Logstash, Kibana)

11.2014  CartoDB, Madrid, Spain
- Big Data consultancy around a scalable solution for ingesting and processing large amounts of geospatial data
- Prototyping using Amazon's Elastic MapReduce and Cloudera Director

09.2014 – 10.2014  Land Resource Management Unit, JRC, European Commission, Ispra, Italy
- Review and optimization of the cluster (YARN/MRv2)
- Training of employees on YARN and concepts such as HDFS High Availability
- Development of Hive UDFs and queries to process large amounts of GIS data using the ESRI Spatial Framework for Hadoop

05.2013 – 01.2015  Collins GmbH & Co. KG, Hamburg, Germany
The project started with building an infrastructure for a newly formed BI team as well as the development of applications:
- Hardware selection for a new Hadoop cluster
- Installation of the operating system (CentOS) as well as CDH4
- Ingestion of data from various sources (MySQL, Elasticsearch, MongoDB, CSV files and others)
- Preparation and provision of the data for analysis with Hive, Pig, Impala, Scalding and other tools
- PoC for a realtime infrastructure to analyse clickstream data using Storm, Kafka and Elasticsearch
- Implementation of a recommendation engine using Hadoop, Mahout, Elasticsearch and other components
- Ad-hoc analysis as well as regular reports
- Migration of the cluster's operating system from CentOS to Debian while keeping the cluster running
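The JRC engagement above involved Hive UDFs over GIS data. The kind of scalar logic such a UDF wraps can be illustrated with a great-circle distance in plain Python; this is only an illustration of the pattern, as the actual work used the ESRI Spatial Framework for Hadoop rather than a hand-rolled function:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of longitude at the equator is roughly 111.2 km.
print(round(haversine_km(0, 0, 0, 1), 1))
```

In Hive such a function would be registered as a UDF and applied per row in a query; the Python body above is just the per-row computation.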
10.2010 – 12.2013  Global Biodiversity Information Facility (GBIF), Copenhagen, Denmark
- Migration of a batch-oriented, MySQL-based workflow to process biodiversity data to a Hadoop-based solution
- Installation of CDH3 using Puppet
- Upgrade of the cluster from CDH3 to CDH4 (including HBase and Solr) and migration from Puppet to Cloudera Manager
- Management and troubleshooting of the Hadoop cluster
- Provided Hadoop training
- Introduction of Maven, Nexus, Jenkins and SonarQube
- Design and implementation of a crawler for biodiversity data using DiGIR, BioCASe, TAPIR and DwC-A

08.2010 – 09.2010  Adternity GmbH, Dortmund, Germany
- Design of a data warehousing concept for the online advertisement business on the basis of open source technologies (namely Hadoop and Hive)
- Installation of CDH3
- Implementation of the concept using Hadoop and Hive

03.2010 – 06.2010  VZnet Netzwerke Ltd., Berlin, Germany
- Architecture of projects and highly scalable systems concerning geolocation for the StudiVZ platform
- Implementation using Java (Jersey, Jackson) and Python
- Consultancy around Hadoop and HBase

IT SKILLS

Core skills: Big Data (Hadoop ecosystem); software architecture and development in Java
Programming languages: Java, Scala, Python, JavaScript
Big Data: Experience using Hadoop, HBase and other related tools (Oozie, Sqoop, Hive, ZooKeeper, Spark, Storm, Kafka, Cloudera CDH etc.); Elasticsearch
Open source (since 2009): Participating in projects (mailing lists, reviews, patches); Hive Committer; attending conferences and meetups
Details:
- Maven, Jenkins, SonarQube, Nexus
- Jersey (JAX-RS), Jackson, Avro, Dropwizard, Play, Akka, as well as the usual libraries and frameworks
- HBase, PostgreSQL, PostGIS, MySQL, Berkeley DB, Cassandra, MongoDB (mongo-hadoop), Redis, SQL
- Linux with a focus on CentOS/RedHat, Puppet, Ansible, Foreman, Fabric, Logstash, Kibana, Graylog2, Ganglia, Graphite
- JIRA, Confluence, Fisheye, Crucible, Git, OpenStreetMap (OSM), RabbitMQ, Varnish, Vagrant, Docker, Kerberos