How Eastern Bank Uses Big Data to Better Serve & Protect its Customers
  Brian Griffith, Principal Data Engineer
Agenda
  Introduction
  Eastern Bank & the banking industry
  Data architecture and our big data journey
  Challenges
  Use case: debit card anomaly detection
@bwgriffith
  Database developer and engineer for 15 years
  Working in the big data space for about 5 years
    Blizzard Entertainment, Irvine, CA
    Localytics, Boston, MA
  Now at Eastern Bank, helping engineer their next-generation data platform
  b.griffith@easternbank.com
Eastern Bank
  197-year-old mutual bank (largest of its kind in the country)
  Leader in corporate social responsibility
    8th most charitable business in Massachusetts
  ~1 million customers
  4 organizations:
    Banking: Eastern Bank
    Insurance: Eastern Insurance Group
    Wealth: Eastern Wealth Management
    R&D and product development: Eastern Labs
Banking is Evolving
  Customer activity moving more into the mobile space
  Diverse services continuously emerging
  Customers value personalized service
  Relevant value-added services
  Personal relationships
Positioned for the Best of Both Worlds
  Like larger banks, leverage data in a manner that allows us to offer improved features and convenience
  Like smaller banks, leverage data in a manner that allows us to offer more customized services and relationships
Past Data Architecture Issues
  Customer data lives in transaction silos
  3 major data entities: insurance, wealth, and banking
  Data access via in-house or outsourced solution
  Impedes analysis
  Regulatory compliance
  Technical debt
  Auditing
  3rd-party dependencies
Data Architecture Goals
  Abstraction from source systems
  Scale horizontally, not vertically
  Complete ownership of depth and breadth of our data
  Improve data quality and stewardship
  Drive iterative analytics throughout the enterprise
  Make the bank smarter
Data Architecture
  Tiers (diagram labels): Tx, Data Warehouse, Customer Master, Big Data Store
  Eastern endeavors to be relationship-driven, not transaction-driven.
  In a digital economy, face-to-face interactions continue to decline.
  We need to rely on data integration and analytics to know our customers and best meet their evolving needs.
  Our data architecture is built on four interdependent tiers, each with its own capabilities and contributions to the overall enterprise platform.
Hadoop
  Tiers (diagram labels): Tx, Data Warehouse, Customer Master, Big Data Store
  Can be a significant driver of customer intimacy in an increasingly digital world
  Allows us to leverage data we've never thought of as customer data before
  Goes beyond what a customer has with us: gives visibility into what a customer does with us through behavioral analytics
  Scales ability to store with ability to process
  Platform natively supports data analytics languages and machine learning tools
  Fast processing enables iterative exploration
Architecture Diagram
  (diagram not reproduced)
Big Data Challenges
Challenges
  Governance
    Ingestion
    Data lineage
    Data quality
  Managing growth
    Balancing what data we can keep vs. data we should keep
  Security
    Personally identifiable information (PII)
    Mask and limit view of data
  Driving consumption
    "If you build it, they will come" does not work by itself
    Constant evangelism
    Need to demonstrate value!
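The "mask and limit view of data" point can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not Eastern Bank's actual approach: the function names, the choice to keep the last four characters visible, and the salted SHA-256 token used so that masked records can still be joined.

```python
import hashlib


def mask_account(account_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with 'x' (hypothetical helper)."""
    if visible <= 0:
        return "x" * len(account_number)
    hidden = max(len(account_number) - visible, 0)
    return "x" * hidden + account_number[hidden:]


def pseudonymize(value: str, salt: str) -> str:
    """Return a stable, non-reversible token so records can be joined without exposing PII."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]  # truncated for readability; the length is an arbitrary choice


masked = mask_account("1234567890")
token = pseudonymize("1234567890", salt="example-salt")
```

Analysts would then see only `masked` or `token`, while the raw value stays in a restricted zone.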
Data Science
Hadoop Data Science
  Fraud Detection Proof of Concept
Fraud in the Financial Industry: An Introduction
  In 2012, there were 31.1 million fraudulent transactions, with a value of $6.1 billion [1]
  [1] The 2013 Federal Reserve Payments Study
Debit Card Fraud
  Industry-wide debit card fraud has been rising at a significant rate: > 400% in the last 3 years
  Mostly due to breaches at large, national retailers
Use Case Generation
  Develop process to work in conjunction with existing fraud detection tools
  Existing tools mostly rules-based
  Leverage Hadoop to traverse broad customer history for anomalous patterns
  Behavioral analysis
Fraud Use Case Workflow
  Stages (diagram labels): DATA, FEATURES, TRAINING, TESTING
  Sample transactions & claims to build training data
  Identify account behavior patterns indicative of fraud
  Test and validate features iteratively
  Scoring model will identify suspicious accounts the day after fraud happens
Data
  Claims
    Customer reported
    Only use customer's first claim
  Model trained on all available transaction data
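The "only use customer's first claim" rule can be sketched as a labeling step. This is a minimal illustration under stated assumptions: the `(account_id, claim_date)` tuple layout, the function names, and the idea that an account-day is labeled fraud when it matches the first claim date are all hypothetical, not taken from the talk.

```python
from datetime import date


def first_claim_per_account(claims):
    """Keep only the earliest claim date per account.

    `claims` is an iterable of (account_id, claim_date) pairs (assumed layout).
    """
    first = {}
    for account_id, claim_date in claims:
        if account_id not in first or claim_date < first[account_id]:
            first[account_id] = claim_date
    return first


def label_account_days(account_days, first_claims):
    """Label each (account_id, day) pair 1 if it is the account's first claim date, else 0."""
    return [
        (account_id, day, 1 if first_claims.get(account_id) == day else 0)
        for account_id, day in account_days
    ]


claims = [("A", date(2015, 4, 30)), ("A", date(2015, 3, 1)), ("B", date(2015, 4, 2))]
first = first_claim_per_account(claims)
labeled = label_account_days(
    [("A", date(2015, 3, 1)), ("A", date(2015, 4, 30)), ("C", date(2015, 4, 30))],
    first,
)
```

Later claims on the same account are ignored, so one customer with repeated disputes does not dominate the training data.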
Features
  Variables indicative of fraud, formatted for machine learning
  Example: dollarratio = ratio of dollar spend today vs. history (hx)
  Values calculated by comparing variables today vs. history
  Ratios, log(n), binary, etc.
  Higher value = more suspicious
  Hadoop performance
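A minimal sketch of the three feature shapes named above (ratio, log(n), binary). Only the name `dollarratio` comes from the slides. The exact definitions are assumptions: using the mean historical daily spend as the denominator, returning 0.0 when there is no history, using `log1p` so that a count of zero maps to zero, and the "new merchant" flag as the binary example.

```python
import math


def dollar_ratio(today_total, history_daily_totals):
    """Today's spend divided by the mean historical daily spend (assumed definition)."""
    if not history_daily_totals:
        return 0.0  # assumption: no history means no evidence of anomaly
    mean_history = sum(history_daily_totals) / len(history_daily_totals)
    if mean_history == 0:
        return 0.0
    return today_total / mean_history


def log_count(n):
    """log(1 + n), a damped count feature that is 0 when n is 0."""
    return math.log1p(n)


def is_new_merchant(merchant, known_merchants):
    """Binary feature: 1 if the account has never used this merchant before, else 0."""
    return 0 if merchant in known_merchants else 1


ratio = dollar_ratio(2142.0, [10.0, 8.0, 9.0])
```

In each case a higher value means the day looks less like the account's history, which matches the "higher value = more suspicious" convention.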
Building and Evaluating the Model
  [Figure: "ROC for TestModel". x-axis: Total Accounts (0% to 100%); y-axis: Fraud Detection Rate (0% to 100%); series: training, testing, reference]
  [Figure: "False Positive Rate for TestModel". x-axis: Fraud Detection Rate (0% to 100%); y-axis: False Positive Ratio (0 to 140); series: testing]
  Receiver operating characteristic shows model tuning. Reviewing 20% of accounts finds ~80% of anomalies. Reference line shows predicted result of random sample.

                 Weight   Std Error   Z        p(>|Z|)
  (Intercept)    -3.44    0.051       -66.93   < 2e-16
  dollarratio     0.09    0.007        11.75   < 2e-16
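The claim that reviewing 20% of accounts finds about 80% of anomalies corresponds to one point on a curve of detection rate against share of accounts reviewed. A minimal sketch of how such a point could be computed follows; the function name and the toy scores and labels are made up for illustration and are not the talk's data.

```python
def detection_rate_at(scores, labels, review_fraction):
    """Share of all fraud cases found when reviewing the top-scored fraction of accounts.

    `scores` are model scores (higher = more suspicious); `labels` are 1 for fraud, 0 otherwise.
    """
    total_fraud = sum(labels)
    if total_fraud == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_review = int(round(review_fraction * len(ranked)))
    caught = sum(label for _, label in ranked[:n_review])
    return caught / total_fraud


# Toy data: 10 accounts, 3 of them fraudulent (illustrative only).
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.05, 0.6, 0.5]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rate_at_20 = detection_rate_at(scores, labels, 0.20)
```

A random ordering would find roughly the same share of fraud as the share of accounts reviewed, which is what the reference line in the first figure represents.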
Scoring
  How anomalous were a day's transactions?
  Value range: 0.00 to 1.00
  Comparing a day to customer's history
  Assigned to each unique account
  Function of weights & feature values
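A score in the range 0 to 1 that is a "function of weights & feature values" is consistent with a logistic regression, which the coefficient table on the model slide (intercept, Z, p-values) also suggests, although the talk does not name the algorithm. Under that assumption, a minimal sketch using the two coefficients shown in that table looks like this. The real model evidently has more features, and the function names here are hypothetical.

```python
import math

# Coefficients from the table on the model slide; other features are not shown there.
INTERCEPT = -3.44
WEIGHTS = {"dollarratio": 0.09}


def sigmoid(z):
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Anomaly score for one account-day: sigmoid of intercept plus weighted feature sum."""
    z = intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return sigmoid(z)


baseline = score({})                        # a day with no unusual spend
extreme = score({"dollarratio": 237.747})   # a very large spend ratio
```

With these two coefficients alone, an unremarkable day scores low and a very large `dollarratio` pushes the score toward 1.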
Results & Testing
  ACCOUNT    Score    1      2       3      4        5       6
  xxxxxxxx   1        0.693  0.105   0.105  0.105    0.105   237.747
  xxxxxxxx   0.9997   0.693  0.713   0.316  1.379    0.036   129.467
  xxxxxxxx   0.9994   0.693  0.486   4.847  169.688  35.87   0.29
  xxxxxxxx   0.9979   0      14.844  3.088  52.461   41.066  1
  xxxxxxxx   0.9803   0.693  0.356   0.421  0.224    0.817   86.446
Results & Testing
  dollarratio = column 6
  ACCOUNT    Score    1      2       3      4        5       6
  xxxxxxxx   1        0.693  0.105   0.105  0.105    0.105   237.747
  xxxxxxxx   0.9997   0.693  0.713   0.316  1.379    0.036   129.467
  xxxxxxxx   0.9994   0.693  0.486   4.847  169.688  35.87   0.29
  xxxxxxxx   0.9979   0      14.844  3.088  52.461   41.066  1
  xxxxxxxx   0.9803   0.693  0.356   0.421  0.224    0.817   86.446
Results & Testing
  ACCOUNT    Score    1      2       3      4        5       6
  xxxxxxxx   1        0.693  0.105   0.105  0.105    0.105   237.747

  Merchant       Amount      Timestamp
  JETBLUE AIRW   $2,142.00   4/30/15 9:35 AM
Results & Testing
  ACCOUNT    Score    1      2       3      4        5       6
  xxxxxxxx   0.9979   0      14.844  3.088  52.461   41.066  1

  Merchant          Amount   Timestamp
  Internet Vendor   $12.25   4/30/15 3:42 AM
  Internet Vendor   $3.01    4/30/15 3:42 AM
  Internet Vendor   $2.46    4/30/15 3:42 AM
  Internet Vendor   $1.49    4/30/15 3:42 AM
  Internet Vendor   $18.95   4/30/15 3:42 AM
Iterating
  Build new features
  Remove ineffective features
  Address feature interaction
  Minimize false positives
  Try different algorithms
Next Steps
  Real time with Spark & MLlib
  Get closer to when fraud actually occurs
  Expanded customer reach via notifications
  Improved customer service
  More agile feedback loop based on customer assessment
Other Uses
  Comparing customer behaviors day over day carries over to many use cases:
    Predicting churn
    Customer segmentation & personas
    Predicting Customer Lifetime Value (CLV)
Wrap Up
  Banking is evolving
  Hadoop addresses a very large gap in our architecture
  Empowers us to know more about our customers through all of their interactions with us
  Needs to be governed
  Customer fraud detection is only the tip of the iceberg
Special Thanks
  Mark Leonard (Eastern Bank), SVP, Data & Development Director
  Joe Blue (MapR), Data Scientist
Thank You!