DATA VISUALIZATION: FINDING PICTURES IN NUMBERS




Pratap Vardhan (@PratapVardhan), Data Scientist, Gramener

A DATA VISUALISATION CHALLENGE

You will see 3 questions. You have 30 seconds. Try it! Your timer starts now.

HOW MANY NUMBERS ARE ABOVE 100?
HOW MANY NUMBERS ARE BELOW 10?
WHICH QUADRANT HAS THE HIGHEST TOTAL?

Each question is shown over the same grid of numbers:

23 17 37 62 101 39 11 75 12 29 37 46 3 48 32 21 8 55 56 53 12 10 52 56 23 10 46 56 107 59 45 22 36 69 41 10 25 5 43 19 39 25 72 44 14 64 26 67 69 58 57 30 102 37 50 58 68 12 33 43 26 70 51 104 33 21 11 50 57 22 87 51 41 55 94 48 94 77 7 96 70 81 64 11 84 69 73 97 2 92 88 66 65 95 65 91 77 20 14 58 78 82 59 66 84 81 66 84 63 76 70 18 103 6 73 92 81 78 101 63 9 16 40 92 93 98 82 91 87 88 98 91 79

A DATA VISUALISATION CHALLENGE

The same questions again, but with a few visual cues. See how long it takes now. Your timer starts now.

[The same three questions over the same grid of numbers, now with visual cues.]
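The second round relies on visual cues to make the relevant numbers pop out. A plain-text analogue is to mark the values that answer a question, so the eye no longer has to read every number. The Python sketch below only illustrates that idea and is not how the slides were made; `highlight` and `count_matching` are hypothetical helpers, and `sample` is the first 28 numbers of the grid as transcribed here.

```python
from typing import Callable, Iterable


def highlight(values: Iterable[int], predicate: Callable[[int], bool]) -> str:
    """Join values with spaces, wrapping those that satisfy predicate in [brackets]."""
    return " ".join(f"[{v}]" if predicate(v) else str(v) for v in values)


def count_matching(values: Iterable[int], predicate: Callable[[int], bool]) -> int:
    """Count the values that satisfy predicate."""
    return sum(1 for v in values if predicate(v))


# First 28 numbers of the challenge grid, as transcribed in this document.
sample = [23, 17, 37, 62, 101, 39, 11, 75, 12, 29, 37, 46, 3, 48,
          32, 21, 8, 55, 56, 53, 12, 10, 52, 56, 23, 10, 46, 56]

# Question 1 style: mark and count the values above 100.
print(highlight(sample, lambda v: v > 100))
print(count_matching(sample, lambda v: v > 100))  # 1
```

Swapping the predicate (for example `lambda v: v < 10`) handles the second question the same way; the quadrant question would also need each number's row and column position.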

WHY VISUALISE?

YOU WILL BE SHOWN A SET OF NUMBERS ALONG WITH A SUMMARY (AVERAGE, ETC.). CAN YOU MAKE SENSE OF THE FIGURES?

DO THESE FOUR CITIES LOOK IDENTICAL TO YOU?

Take a look at the sales report below. A company has branches in 4 cities, and each branch changes the product price every month, which leads to a corresponding change in sales. Here is the performance of the 4 branches, with their price and sales for each month. Looking at the averages, the four branches have identical performance. DO YOU AGREE?

2010     Boston         Chicago        Detroit        New York
Month    Price  Sales   Price  Sales   Price  Sales   Price  Sales
Jan       10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
Feb        8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
Mar       13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
Apr        9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
May       11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
Jun       14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
Jul        6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
Aug        4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
Sep       12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
Oct        7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
Nov        5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
Average    9.0   7.50     9.0   7.50     9.0   7.50     9.0   7.50
Variance  10.0   3.75    10.0   3.75    10.0   3.75    10.0   3.75

Average price is the same. Variance in price is the same. Average sales are the same too, and so is the variance in sales.

ARE THEY REALLY IDENTICAL? CHECK AGAIN

[Charts for Boston, Chicago, Detroit and New York]

In fact, the four cities behave completely differently. Boston's sales have generally increased with price. Detroit shows a nearly perfect increase in sales with price, except for one aberration. Chicago's sales decline beyond a price of about 11. New York's sales fluctuate despite a nearly constant price.
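The four branches' figures are Anscombe's quartet (1973), a dataset built to have near-identical summary statistics but very different shapes. The Python sketch below recomputes the summaries with the standard library, assuming the table's Variance row is the population variance (dividing by n = 11), which is what reproduces 10.0 and 3.75; the sample variance would give 11.0 and about 4.13 instead. The names `CITIES` and `summarize` are illustrative.

```python
from statistics import fmean, pvariance

# Monthly (price, sales) per city, Jan to Nov: Anscombe's datasets I to IV,
# relabelled as the four branches.
PRICES = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
CITIES = {
    "Boston": (PRICES, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Chicago": (PRICES, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Detroit": (PRICES, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "New York": ([8.0] * 7 + [19.0] + [8.0] * 3,
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, population variances, Pearson correlation and least-squares line."""
    mx, my = fmean(x), fmean(y)
    vx, vy = pvariance(x, mx), pvariance(y, my)
    # Population covariance, consistent with pvariance (divide by n).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    slope = cov / vx
    return {
        "mean_price": mx,
        "var_price": vx,
        "mean_sales": my,
        "var_sales": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for city, (x, y) in CITIES.items():
    stats = summarize(x, y)
    print(f"{city:<9}", " ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Every city prints the same rounded line: mean price 9.00, price variance 10.00, mean sales 7.50, sales variance 3.75, correlation 0.82, and a fitted line of roughly sales = 3.00 + 0.50 × price. Only a chart of each city reveals the trend, the curve, the outlier and the single high-price month.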

GRAMENER: A DATA ANALYTICS AND VISUALISATION COMPANY

We handle terabyte-size data. Gramener analyses your data with non-traditional analytics and visualises it in real time, transforming it into concise dashboards that make your business problem and its solution visually obvious. Based on cognitive research, our visualisations help you find insights quickly and guide you towards actionable decisions.

INDIAN ODI BATTING GRAMENER.COM/CRICKET/

100 YEARS OF INDIA'S WEATHER
[Chart: months (Jan to Dec) against years (from 1901, labelled by decade)]


IN THE 2014 ELECTIONS, WHICH STATE PRODUCED THE MOST CROREPATI CANDIDATES? AND WHICH STATE HAS THE HIGHEST % OF CROREPATI CANDIDATES? (A crorepati is a person worth at least one crore rupees, i.e. 10 million rupees.)

GEOGRAPHY OF CANDIDATE WEALTH
[Maps: Number of Candidates; Percentage of Crorepati Candidates]

Uttar Pradesh, with over 400 crorepati candidates, tops the list by number. The Northeastern states have the largest percentage of crorepati candidates.
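The two maps rank states differently because one shows a count and the other a share. The Python sketch below computes both from candidate records; the states, asset figures and field layout are hypothetical, made up only to show how a large state can lead on count while a small one leads on percentage.

```python
from collections import Counter

CRORE = 10_000_000  # one crore = 10 million rupees

# Hypothetical (state, declared assets in rupees) records, for illustration only.
candidates = [
    ("State A", 25_000_000), ("State A", 500_000), ("State A", 12_000_000),
    ("State A", 800_000), ("State A", 300_000), ("State A", 15_000_000),
    ("State B", 40_000_000), ("State B", 11_000_000),
]

total = Counter(state for state, _ in candidates)
crorepatis = Counter(state for state, assets in candidates if assets >= CRORE)

for state in total:
    share = 100 * crorepatis[state] / total[state]
    print(f"{state}: {crorepatis[state]} of {total[state]} candidates ({share:.0f}%)")
```

Here State A has more crorepatis (3 against 2) while State B has the higher share (100% against 50%), the same kind of contrast as Uttar Pradesh against the Northeast.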

AMONG THE MAINSTREAM PARTIES, WHICH PARTY HAS THE HIGHEST % OF CRIMINAL CANDIDATES?

CRIMINAL CASES
[Chart: size = number of candidates; color = % of criminal candidates]

MNS seems to be the winner here, closely followed by RJD and MDMK.

AND, ONE MORE THING... NAMESAKES OF THE 2014 ELECTIONS

CHANDU LALS OF MAHASAMUND
Winner's margin: 1,217 votes
Namesakes polled: ,000+ votes

MOST OF WHAT I DO TODAY IS VISUALISING DATA ANOMALIES. YOU DON'T NEED SOPHISTICATED ANALYSES FOR THIS: IT CAN BE EASY TO SPOT THEM.

PREDICTING MARKS (EDUCATION)

What determines a child's marks? Do girls score better than boys? Does the choice of subject matter? Does the medium of instruction matter? Does community or religion matter? Does their birthday matter? Does the first letter of their name matter?

LET'S LOOK AT YEARS OF US BIRTH DATA

This dataset has been around for several years and has been studied extensively. Yet a visualization can reveal patterns that are neither obvious nor well known. For example: Are birthdays uniformly distributed? Do doctors or parents exercise the C-section option to move dates? Is there any day of the month with unusually high or low births? Are there any months with relatively high or low births?

[Chart: average births for each day of the year (from 1975 to 10), shaded from more births to fewer births]

Some special days, like April Fool's Day, are avoided, but Valentine's Day is quite popular. People seem to avoid having children on the 13th of any month, which is considered unlucky. There are relatively few births during the Christmas and Thanksgiving holidays, as well as on New Year's Day and Independence Day. Births are very high in September, but this is fairly well known: conceptions peak during the winter holiday season.
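The chart averages births for each calendar day across all the years in the data. A minimal Python sketch of that aggregation, assuming the input maps each date to its number of births; the function name and the counts in the example are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def average_by_calendar_day(daily_births):
    """Average the births recorded on each (month, day) across all years.

    daily_births: dict mapping datetime.date -> number of births on that date.
    February 29 ends up averaged over leap years only, since it exists only then.
    """
    by_day = defaultdict(list)
    for d, births in daily_births.items():
        by_day[(d.month, d.day)].append(births)
    return {key: fmean(counts) for key, counts in by_day.items()}


# Hypothetical counts for two years, for illustration only.
sample = {
    date(2001, 2, 13): 90, date(2002, 2, 13): 94,
    date(2001, 2, 14): 110, date(2002, 2, 14): 118,
}
averages = average_by_calendar_day(sample)
print(averages[(2, 13)], averages[(2, 14)])  # 92.0 114.0
```

Shading each (month, day) cell by how far its average sits above or below the overall mean would give a calendar heat map in which low days, such as the 13th, stand out.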

THE PATTERN IN INDIA IS QUITE DIFFERENT

This birth-date dataset is obtained from school admission data for over 10 million children. When we compare it with births in the US, we see none of the same patterns. For example: Is there an aversion to the 13th, or is there a local cultural nuance? Are holidays avoided for births? Which months have a higher propensity for births, and why? Are there any patterns not found in the US data?

[Chart: average births for each day of the year (from 2007 to 2013), shaded from more births to fewer births]

We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, on round-numbered dates. Such round-numbered patterns are a typical indication of fraud. Here, birth dates are brought forward to aid early school admission.

Very few children are born in August and the months after it. Most births are concentrated in the first half of the year.
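One simple way to put a number on the round-date pattern (an illustrative choice, not necessarily the method behind the chart) is to compare average births on the 5th, 10th, 15th, 20th and 25th with the average on the other days of the month. In the Python sketch below, `round_day_ratio` is a hypothetical helper and the dates are synthetic.

```python
from collections import Counter
from datetime import date

ROUND_DAYS = {5, 10, 15, 20, 25}


def round_day_ratio(birth_dates):
    """Average births per round-numbered day divided by the average per other day.

    Only days 1-28 are used, because not every month has a 29th, 30th or 31st.
    A ratio near 1 means no heaping on round numbers; well above 1 suggests
    dates were moved onto them. Assumes some births fall on non-round days.
    """
    per_day = Counter(d.day for d in birth_dates if d.day <= 28)
    other_days = [d for d in range(1, 29) if d not in ROUND_DAYS]
    round_avg = sum(per_day[d] for d in ROUND_DAYS) / len(ROUND_DAYS)
    other_avg = sum(per_day[d] for d in other_days) / len(other_days)
    return round_avg / other_avg


# Synthetic data: one birth on every day 1-28 of each month of 2008...
honest = [date(2008, m, d) for m in range(1, 13) for d in range(1, 29)]
# ...plus three extra births moved onto each round-numbered day.
moved = [date(2008, m, d) for m in range(1, 13) for d in ROUND_DAYS for _ in range(3)]
nudged = honest + moved

print(round_day_ratio(honest))  # 1.0
print(round_day_ratio(nudged))  # 4.0
```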

THIS ADVERSELY IMPACTS CHILDREN'S MARKS

It is a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, they are younger than their records suggest, and these younger children suffer.

[Chart: average marks for children born on a given day of the year (from 2007 to 2013), shaded from higher marks to lower marks]

Children born on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks. Children born on round-numbered days score lower marks on average, due to a higher proportion of younger children among them.
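The comparison on this slide is a pair of group averages: mean marks for children whose recorded birthday falls on a round-numbered day against everyone else. A minimal Python sketch with hypothetical records; the exact set of round-numbered days is also an assumption based on the days the slide lists.

```python
from statistics import fmean

# Assumed set of round-numbered days (the slide lists 1st, 5th, 10th, 15th etc.).
ROUND_DAYS = {1, 5, 10, 15, 20, 25}

# Hypothetical (day of month of recorded birth date, marks out of 100).
records = [(5, 58), (10, 61), (15, 55), (1, 60),
           (3, 70), (17, 66), (22, 72), (8, 68)]


def mean_marks_by_group(records, round_days=ROUND_DAYS):
    """Return (mean marks for round-numbered birthdays, mean marks for all others)."""
    round_marks = [marks for day, marks in records if day in round_days]
    other_marks = [marks for day, marks in records if day not in round_days]
    return fmean(round_marks), fmean(other_marks)


print(mean_marks_by_group(records))  # (58.5, 69.0)
```

A gap between the two averages shows an association only; attributing it to age rests on the slide's explanation that these children are younger than their records say.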

EXPLORING THE MAHABHARATA

How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics? Can this unstructured data be processed to extract analytical insights? What does sentiment analysis of this tome convey? Is there a better way to explore relations between characters? How can the closeness of characters be analysed and visualised?
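The slide does not say how closeness was measured. A common approach, assumed here, is to count how often two characters are named in the same passage; frequent co-occurrence suggests a close relationship and can be drawn as edge weights in a character network. The passages in this Python sketch are placeholders, not text from the epic, and `co_occurrence` is a hypothetical helper.

```python
import re
from collections import Counter
from itertools import combinations

CHARACTERS = {"Arjuna", "Krishna", "Bhima", "Duryodhana"}

# Placeholder passages for illustration; a real run would split the full text.
passages = [
    "Krishna and Arjuna",
    "Arjuna and Bhima",
    "Bhima and Duryodhana",
    "Arjuna, then Krishna",
]


def co_occurrence(passages, characters):
    """Count how often each pair of characters is named in the same passage."""
    pairs = Counter()
    for text in passages:
        # Whole-word matches only, so a name inside a longer word is not counted.
        present = sorted(characters & set(re.findall(r"\w+", text)))
        pairs.update(combinations(present, 2))
    return pairs


for (a, b), weight in co_occurrence(passages, CHARACTERS).most_common():
    print(f"{a} - {b}: {weight}")
```

The passage size is a design choice: chapter-level windows link many more characters than sentence-level ones, so it is worth comparing both.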

MMS SPEECHES https://gramener.com/speechopedia

AAP DONATIONS https://gramener.com/aapdonations

FLAGS OF THE WORLD https://gramener.com/flags

CALVIN AND HOBBES

DETECTING FRAUD AT AN ENERGY UTILITY

We know meter readings are incorrect, for various reasons. We don't, however, have the concrete proof we need to start the process of meter-reading automation. Part of our problem is the volume of data that needs to be analysed. The other part is our inexperience with the tools and analyses needed to identify such patterns.

BILLING FRAUD AT AN ENERGY UTILITY

An energy utility (with over 50 million subscribers) had 10 years' worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight.

A simple histogram (or frequency distribution) of usage levels shows the frequency of all meter readings from Apr-2010 to Mar-2011. Each bar represents the number of customers with a specific bill amount, in units (kWh).

Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units, so people have a strong incentive to stay at or just below a slab boundary. An unusually large number of readings are aligned with the slab boundaries. This can happen in one of two ways. First, people may be monitoring their usage very carefully and turn off their lights and fans the instant their usage hits the slab boundary. Or, more realistically, there is probably some level of corruption involved, where customers pay a small sum to the meter-reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
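The spikes in this histogram can also be flagged automatically by comparing the number of readings exactly at each slab boundary with the typical count at nearby values. In the Python sketch below, the boundaries (100, 200 and 300 units), the 5-unit window and the readings are all hypothetical; only the 100-unit example comes from the slide, and `boundary_spikes` is an illustrative helper.

```python
from collections import Counter

# Hypothetical slab boundaries in units (kWh); real tariffs vary.
SLAB_BOUNDARIES = [100, 200, 300]


def boundary_spikes(readings, boundaries, window=5):
    """Ratio of readings exactly at each boundary to the mean count nearby.

    "Nearby" means integer readings within `window` units on either side,
    excluding the boundary itself. Returns inf when there are no nearby readings.
    """
    hist = Counter(readings)
    ratios = {}
    for b in boundaries:
        nearby = [hist[v] for v in range(b - window, b + window + 1) if v != b]
        baseline = sum(nearby) / len(nearby)
        ratios[b] = hist[b] / baseline if baseline else float("inf")
    return ratios


# Synthetic readings: one customer at every level from 1 to 400 units,
# plus 20 extra readings parked exactly on the 100-unit boundary.
readings = list(range(1, 401)) + [100] * 20
print(boundary_spikes(readings, SLAB_BOUNDARIES))  # {100: 21.0, 200: 1.0, 300: 1.0}
```

A ratio well above 1 at a boundary, with ratios near 1 elsewhere, is the numeric counterpart of the spikes in the chart; it shows where to look rather than proving fraud.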

LINKS
GitHub: https://github.com/pratapvardhan
Elections: https://gramener.com/election/
Speechopedia: https://gramener.com/speechopedia/
AAP: https://gramener.com/aapdonations/
Cricket: https://gramener.com/cricket/
Flags: https://gramener.com/flags/

VISUALISE DATA YOURSELF!
Try it! All you need is some data and some curiosity.
@PratapVardhan | Pratap.Vardhan@gramener.com | +91-837-4-9651