BigData Platform @ Flipkart
Raju Shetty
Dir. of Engg, Data Platform
About Flipkart
20 million products in 70+ categories
30 exclusive brand associations
33,000 people strong
30 million registered users
10 million daily visits
8 million shipments per month
In-a-Day Guarantee in 50 cities / Same-Day Guarantee in 13 cities
Alexa traffic ranking of 6
Mobile accounts for more than 70% of our traffic
Image Source: http://blogs-images.forbes.com/steveolenski/files/2015/01/customerexperiencepuzzle.jpg
Why Data Platform at Flipkart?
Technology
Customer Delight
We are building strong and intelligent systems that can take complex, vague information and translate it into timely and accurate actions.
Challenges
Diversity: India's population is diverse in social, economic, cultural and geographic terms, which shows up in buying patterns and expectations in e-commerce.
Lack of organized retail: India is believed to have skipped the organized-retail phase of evolution, so the e-commerce industry has no established trends and patterns to learn from.
Nascent ecosystem: Under-developed infrastructure and support services lead to fuzziness in the ecosystem.
Exponential growth: The Indian e-commerce market grew tenfold in the last 3 years and is estimated to grow another tenfold in the next 2-3 years, i.e. 100 times in 5-6 years.
Expectation of intelligent behavior for customer satisfaction: Today's customers expect intelligent behavior from the technology systems they use.
Internet traffic moving from desktop to mobile: India's internet revolution is happening on handheld devices, so mobile-specific solutions for personalization and recommendation become very important.
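The growth claim above is compound arithmetic, and a short sketch makes the implied yearly rate explicit. The function name is hypothetical, and the only inputs are the figures quoted on the slide (tenfold over 3 years, then tenfold over a further 2-3 years).

```python
def annual_growth_factor(total_multiple: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_multiple over `years`."""
    return total_multiple ** (1.0 / years)

# Two successive tenfold phases multiply, they do not add.
combined_multiple = 10 * 10

# Yearly multiplier implied by the first phase (10x in 3 years).
first_phase = annual_growth_factor(10, 3)

# Yearly multiplier implied by the combined 100x, for the 5-year and 6-year readings.
fast_case = annual_growth_factor(combined_multiple, 5)
slow_case = annual_growth_factor(combined_multiple, 6)

print(f"combined growth: {combined_multiple}x")
print(f"implied yearly multiplier, first phase: {first_phase:.2f}")
print(f"implied yearly multiplier, 100x over 5 to 6 years: {slow_case:.2f} to {fast_case:.2f}")
```

Under these figures the market would need to a little more than double every year, which is the scale the platform has to plan for.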
Opportunities
Personalisation
Recommendation
Pricing
Promotion
Demand forecasting
Supply chain optimization
Online marketing optimization
Fraud and risk estimation
Marketplace intelligence
Accounting
Data Platform Numbers
Generate 5 TB/day (~40% instrumentation)
Will soon be hitting about 1 PB/month (raw and processed data)
Big Data cluster size: 400 nodes, moving to 2,000 nodes
Process billions of events a day in real time
Process thousands of jobs a day
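To put the volume figures on a common scale, here is a rough conversion sketch. It assumes decimal units (1 PB = 1000 TB) and a 30-day month, neither of which the slide states, and the helper names are hypothetical. Note that the 5 TB/day figure is generated data while the 1 PB/month figure covers raw and processed data, so the two are not directly comparable.

```python
TB_PER_PB = 1000        # assumption: decimal units
DAYS_PER_MONTH = 30     # assumption: 30-day month

def monthly_tb(daily_tb: float) -> float:
    """Monthly volume in TB for a given daily volume in TB."""
    return daily_tb * DAYS_PER_MONTH

def daily_tb(monthly_pb: float) -> float:
    """Average daily volume in TB for a given monthly volume in PB."""
    return monthly_pb * TB_PER_PB / DAYS_PER_MONTH

generated_per_month = monthly_tb(5)      # 5 TB/day of generated data
instrumentation_per_day = 5 * 0.40       # ~40% of generated data is instrumentation
target_per_day = daily_tb(1)             # 1 PB/month of raw and processed data

print(f"generated: {generated_per_month:.0f} TB/month")
print(f"instrumentation: about {instrumentation_per_day:.0f} TB/day")
print(f"1 PB/month averages {target_per_day:.1f} TB/day")
```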
Data usage patterns
Exploration & Experimentation
Reports and dashboards
Analytics & Insights
Scenario planning
Systemic Intelligence
Data Applications
Anomaly, causality and correlation, and a lot more
Consumers of Data
1. Product teams
2. Analysts
3. Data scientists
4. Business teams
5. End customers
6. Research partners
7. Third-party partners (vendors, brands, logistics partners, etc.)
Data Warehouse? Data Platform?
Motivation
Democratise data access, processing & intelligence within Flipkart
Enable teams to focus on building data applications instead of building & managing data infra
Lower the barrier for validating data-backed hypotheses (identifying business opportunities, product features)
Key Beliefs
Data is the true IP of an internet company
Data is the algorithm
When it comes to data, the value of the whole is greater than the sum of its parts
The platform model scales rapidly and drives efficiency (compared to E2E implementation by a single team)
Traditional Way
Data Governance
Data Governance (contd.)
1. Identify end-to-end business processes
2. Express them as nouns and verbs (entities and events)
3. Push vs pull model of ingestion
4. Everything in real time
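The noun/verb modelling and the push vs pull distinction above can be illustrated with a minimal in-memory sketch. Every name here (`Entity`, `Event`, `Ingestor`, the field names, the sample order id) is hypothetical and chosen for illustration only; the slides do not describe Flipkart's actual schema or ingestion API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Entity:
    """A noun in the business process, e.g. an order or a shipment."""
    kind: str
    entity_id: str

@dataclass(frozen=True)
class Event:
    """A verb: something that happened to an entity at a point in time."""
    verb: str
    subject: Entity
    timestamp_ms: int

class Ingestor:
    """Toy ingestion layer that appends events to an in-memory log."""

    def __init__(self) -> None:
        self.log: List[Event] = []

    def push(self, event: Event) -> None:
        # Push model: the producing system sends each event as it happens.
        self.log.append(event)

    def pull(self, fetch: Callable[[], Iterable[Event]]) -> int:
        # Pull model: the platform asks a source for whatever it has accumulated.
        events = list(fetch())
        self.log.extend(events)
        return len(events)

order = Entity(kind="order", entity_id="OD-1")
ingestor = Ingestor()
ingestor.push(Event("order_placed", order, timestamp_ms=1_000))
pulled = ingestor.pull(lambda: [Event("order_shipped", order, timestamp_ms=2_000)])
print([e.verb for e in ingestor.log])
```

In this sketch push delivers events one at a time as they occur, while pull delivers them in batches on the platform's schedule, which is why a push model is the more natural fit for the "everything in real time" goal.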
Tech Challenges
1. Integrating disparate & clunky tools
2. Hard to navigate, non-uniform experience
3. Difficult to develop and deploy data apps
4. Evolving too fast
5. Licensing cost vs generating IP
Capability view
Architecture
Context
Data Flow
Tech Stack
It's time to leapfrog.
Yesterday: Descriptive analytics (what happened in the past?)
Today: Predictive analytics (what can happen in the future?)
Tomorrow: Prescriptive analytics (provide me advice based on predictions)
Flipkart has taken the plunge. Would you like to be a part of this action?
Work with us: data-jobs@flipkart.com
Collaborate with us: data-research-collab@flipkart.com