TEXATA 2015 PREPARATION GUIDE

This booklet provides participants, educators and event partners with a preparation guide for TEXATA, the 2015 Big Data Analytics World Championships. TEXATA is a fun, independent and challenging business education competition for Big Data Analytics. The mission is to improve well-rounded technical skills, awareness and understanding of the Big Data Analytics disciplines in business. We seek to celebrate the world's best organizations, business leaders and community partners. We hope to give students and young professionals the courage to pursue exciting career paths within Big Data, Data Science and Business Analytics and to collaborate with event partners.

The competition involves two Online Qualification Rounds, followed by a Live World Finals event in Austin, Texas, USA. This preparation booklet outlines core concepts to be tested during TEXATA 2015. Testing will examine a diverse range of practical, technical and business themes at the heart of Big Data Analytics, including Sentiment Analysis, Machine Learning, Statistical Methods, Predictive Modeling and Analytics Insights. Round 1 and Round 2 Qualification questions will combine multiple-choice, short-answer and real-world business implementation case studies. The World Finals is an advanced business case study challenge, with in-depth interviews and face-to-face presentations before leading global judging authorities and industry leaders.

We hope you enjoy Round 1 on Saturday, September 26, worldwide. Good luck!

The TEXATA Team
Competition Structure (1 of 2)

The Online Qualification Rounds

Round 1 (4 hours): Saturday, September 26, 2015

Round 1 will be multiple choice, with a mix of theory questions and practical questions. Datasets will be open data. Final scores for Round 1 will be determined by the weighted proportion of correct answers, with additional marks available for prompt submission (earlier is better). Depending on performance, the top 20% to 50% of participants in Round 1 will progress to Round 2.

Round 2 (4 hours): Saturday, October 10, 2015

Round 2 will have a greater focus on case studies and real-world practical questions. Theoretical questions involve a detailed treatment of technical concepts covered previously in Round 1. Machine Learning principles and algorithms will also be explored more deeply in Round 2. Practical questions involve competitors implementing predictive models on very large structured and unstructured datasets. Big data sets will be open source. Competitors are welcome to include any other public domain data they feel may improve their answers. Round 2 scores will be determined by a combination of multiple-choice answer correctness, free-text answer assessment against a rubric, predictive modeling score, and submission time (earlier is better).
Competition Structure (2 of 2)

World Finals (Austin, Texas, 6 hours): November 8-9, 2015

As part of the technical presentations, Finalists will perform a complete data analysis workflow, beginning with user interviews and ending with a results presentation. As part of the business presentations, Finalists will be interviewed by a variety of industry leaders and judging panelists on their proposed creative Big Data Analytics solution and on real-world business challenges. Finalists will have access to real business data to solve the issues identified. Finalists are responsible for their own problem definition, scope, execution, and communication of business insights.

TEXATA 2015 winners will be decided by a panel of judges and the Question Design Team (including the problem owner). Evaluation will be based on a rubric covering problem identification and decomposition, approach to solution, implementation effectiveness and clarity of results communication.
Technical Requirements (1 of 2)

Programming Capabilities

Competitors will be required to write code to compete in TEXATA 2015. Competitors are free to use any languages and frameworks with which they are familiar and comfortable. Competitors will need to be comfortable performing numerical computations over data (e.g. "What is the mean of value X in this dataset?"), data processing such as aggregating and normalizing data, and working with geospatial data. More advanced machine learning and predictive modeling skills will be applicable in Round 2 and the World Finals. Whilst we are not focused on code quality or style in either Round 1 or Round 2, judges may request a code review as part of their overall assessment during the judging panel interviews and presentations at the Live World Finals in Texas.

Business Results

TEXATA 2015 explores the commercial impacts and real-world business insights of Big Data Analytics. The competition focuses on applications in business industries (e.g. financial services, e-commerce and mobility). Round 1 and Round 2 performances will be assessed on objective, fact-driven results and business insights.
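As an illustration of the kind of numerical computation and data processing described under Programming Capabilities, the following is a minimal Python sketch using only the standard library. The records, the field names "region" and "x", and the helper function names are hypothetical and chosen purely for illustration; they are not taken from any TEXATA dataset, and competitors remain free to use any language or framework.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
records = [
    {"region": "north", "x": 10.0},
    {"region": "north", "x": 20.0},
    {"region": "south", "x": 30.0},
    {"region": "south", "x": 60.0},
]


def mean_of(rows, field):
    """Answer a question such as 'What is the mean of value X?'."""
    return mean(row[field] for row in rows)


def aggregate_mean(rows, key, field):
    """Group rows by `key` and return the mean of `field` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {group: mean(values) for group, values in groups.items()}


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


overall_mean = mean_of(records, "x")
mean_by_region = aggregate_mean(records, "region", "x")
normalized_x = min_max_normalize([row["x"] for row in records])

print(overall_mean)
print(mean_by_region)
print(normalized_x)
```

Min-max scaling is only one of several common normalization choices; z-score standardization would be an equally reasonable alternative depending on the question being asked.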
Technical Requirements (2 of 2)

Amazon Web Services

Big Data sets used in the TEXATA 2015 Online Rounds will be hosted by Amazon Web Services. Competitors should be comfortable accessing and/or processing data stored in Amazon S3. Competitors are welcome to download the data from S3 to their preferred storage solution. Access details for the datasets will be provided in the days prior to the competition. TEXATA will not provide technical support for accessing the datasets beyond basic connection details.

Competition Interface

TEXATA Rounds 1 and 2 will be conducted through a web browser. Participants will have 4 hours (240 minutes) to complete each Round. Participants are expected to have access to a computer with internet access and to their preferred big data analytics environment for the duration of each Round. The competition is independent and product agnostic: every participant can use any technological tool, methodology and process to produce their competition solution. Competitors will enter their multiple-choice answers and written case study answers via the HackerRank technology competition platform.
Skills & Expertise

Competitors preparing to enter TEXATA should review the following topics and skill areas. This list is neither exhaustive nor definitive. TEXATA has a strong industry focus, so don't be too concerned if you're not very experienced in matrix algebra: as long as you have the technical skills to implement big data analytics, and the business understanding to apply them effectively, you will be a strong competitor.

Statistics: Probability theory; Probability distributions; Precision, recall, accuracy measures; A/B(/n) testing experiment design & interpretation

Computer Science: Algorithm description & identification; Linear algebra; Database fundamentals; Map/Reduce program design; Big data system design; Linux command line tools

Machine Learning: Geospatial data analysis; Social network analysis; Mobile data analysis; Text analytics

Business Skills: Big data industry awareness; Stakeholder engagement; Communication of results; Data visualization
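For competitors revising the precision, recall and accuracy measures listed under Statistics, the following is a minimal Python sketch of how the three measures are derived from a binary confusion matrix. The label lists and function names are hypothetical examples, not competition material, and the choice to return 0.0 when a denominator is zero is an assumption made here for simplicity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) as floats.

    Assumption: a measure with a zero denominator is reported as 0.0.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)
```

In this example there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all three measures happen to equal 0.75. On imbalanced data the measures usually diverge, which is why accuracy alone can be misleading.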