A Point of View

Mortgage Market & Big Data Analytics

About this Viewpoint

The financial crisis and the prevailing economic downturn have put the mortgage market in turmoil in the recent past. The malfunctions of the market can be attributed to multiple factors. The most important ones are recklessness and a lack of scientific intelligence about market behavior and upcoming trends. More importantly, an unconnected approach to decision making has often led to the downfall of the system, resulting in the bankruptcy of various enterprises. Analyzing available data to gain insights that help in decision making is the call of the era. While the availability of data is not often a challenge, the right usage of data has often faced road blocks in terms of limitations of systems, lack of processing power, slowness of processing and/or a lack of application of scientific analytics to business. This paper aims to analyze some of the possibilities that have emerged due to path breaking technology and the combination of business and scientific analysis. It will look at better decision making capabilities and business opportunities that could emerge through the use of Big Data analytics in the mortgage sector.

Introduction

Every single day, business realms keep expanding and the need for true business analytics grows till it becomes an invisible string that ties data with business in a partnership. For smaller enterprises, the need for agility is higher and technology pretty much keeps up with business expectations. However, the same is not true for relatively large enterprises as they struggle with legacy issues. With volume being the key constraint, provisioning the data on time for business analytics and reporting is the biggest challenge for major enterprises. With the US and Europe still reeling under the economic crisis, credit and mortgage are playing a key role in the economy.
In the US Banking and Financial Services industry, the Office of the Comptroller of the Currency (OCC) and other regulators are pressing hard for details around exposures (mortgage debt, home equity lines of credit (HELOCs), credit cards, commercial loans) with a much quicker turnaround time than previously. Additionally, there are heavy penalties if errors are discovered in the data reported. With increased focus on foreclosures and their impact on the overall economy, US banks have created specific units to work through bankruptcy and foreclosure related challenges. At the same time, with the introduction of new government mandated programs for mortgage/HELOC products, there is an increased flux of changes in business processes. These changes again come with stringent timelines for implementation. However, the main challenges lie in accessing the right source of data and building scalable platforms that can provision and enable analytics.

Current State Challenges

Existing IT architectures face challenges and scalability issues. Struggling with legacy issues, various business processes run in disparate applications managed by multiple IT groups; the applications also have convoluted dependencies on each other. With challenging deadlines, the IT groups struggle to inform other groups on how one application might impact the business process on other applications. Some applications have moved to the cloud (Salesforce - Case Management, CRM), but integration and large data movement still remain an issue.

The other, bigger issue is around exponential data growth. Regulations mandate maintenance of account holder information in the system for seven years. Consequently, the data would keep exploding at all levels, starting from origination, underwriting, fulfillment, servicing, modifications and post-closing to bankruptcy and foreclosures. Undertaking deeper trending analysis necessitates that data be maintained forever (e.g. reports that need to provide details of losses at borrower and loan level, or fees that were charged to the customer during the life cycle of a loan).
In these cases, it is expected that the data for inactive loans is also stored for detailed analysis, which adds to the volume of data, which has now reached terabytes. Regulators have also started requesting relevant loan documents at each loan processing stage to check if loan processing adhered to rules. With regulatory compliance issues driving the demand for improving time-to-market, archiving the data to tapes and retrieving it for reporting is costly. This means data sources would have to maintain historical and transactional history online. As a result, there would be no reprieve for analytical applications, which would have to run their queries on large databases for hours and may not have access to the data required for analysis.

How Big Data Solutions Help

This gives rise to a perfect business case for Big Data products and for leveraging their analytics to realize business benefits. Big Data technology applies the power of massively parallel distributed computing to capture and sift through data gone wild, that is, data at an extreme scale of volume, velocity, and variability. This is where the open source Apache Hadoop software framework helps by offering advanced analytics using distributed file systems for analyzing structured and unstructured data. But Hadoop alone does not provide database services; it would need to be combined with NoSQL databases to facilitate the MapReduce framework, which otherwise would be very cumbersome to implement in Java, C++ or Python. There are a few relational databases which implement the MapReduce framework (e.g. Teradata AsterData) and have demonstrated the ability to crunch huge amounts of data and provide analytics.

The Hadoop framework should be used to build a centralized hub (Operational Data Store) for storing and managing transactional data. However, the master and reference data should be stored in their current database platforms. The transactional data should include the loan application movement from origination through underwriting, fulfillment and servicing. The other miscellaneous transactions should include other critical events relating to fees, credit reversal and loan modification. The ETL processes of e-commerce transactions should also be moved to Hadoop based solutions. Additionally, other consumer transactions - deposits, cards, trading transactions, and high value investment - should also be brought onto this centralized hub. These transactions would be huge in size and Hadoop, by virtue of its huge data crunching ability, would be able to maintain and manage this data. This platform would have to be evangelized across different groups and would serve as data as a service for multiple applications. The current applications would not have to migrate onto the new platform and can continue leveraging the existing database platform.
For transactional data, banks would be able to connect onto the Hadoop platform, and existing ETL and analytical frameworks would not have to be modified. Also, a summarized view can be moved from the Hadoop platform to a relational database, where reports could be pulled out using existing Cognos/MicroStrategy/SSRS tools. With all the transactional data available on a single platform and analytical power provided by Hive and HBase, organizations could look to transform the data to gain actionable business insights.

Mortgage Analytics on Big Data

While Big Data directly advertises the volume, velocity and variety of data processing, the most important business use comes from the need for sampling data. Big Data solutions can be leveraged to run analytics on existing data along with the possibility of new sources of overlaps (like market data, sourcing agency, tribunals) to come up with intelligent decision making opportunities. The use cases given below are probably limited examples of the scope of Big Data analytics.

Enable B2B - Client Centric Analysis

Figure 1: Consolidation of transactional data in a single hub (diagram labels: Deposits; Cards; Mortgage; Trade and Investment Transactions; Big Data Solution; Master Data; Reference Data; BI Reporting Solution - Cognos, MicroStrategy, SSRS)

Taking mortgage data consolidation to the next level, Consumer to Business (C2B) transactions across multiple lines of business will help better understand consumer cash flows, enabling capture of Business to Business (B2B) relations through transaction data. On average, the data in these transaction repositories can exceed 200 million transactions per month. All in all, this enables client centric analysis using Big Data.
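
The summarized view described in this section can be illustrated with a small map and reduce sketch in plain Python. This is only an in-memory illustration of the pattern, not a Hadoop job: the record layout (`client_id`, `line`, `month`, `amount`) and the sample values are assumptions made for the example, and on a real cluster the same grouping would more likely be expressed as a Hive query or a MapReduce job.

```python
from collections import defaultdict

# Hypothetical transaction records; field names and values are assumptions.
TRANSACTIONS = [
    {"client_id": "C1", "line": "deposits", "month": "2013-01", "amount": 1200.0},
    {"client_id": "C1", "line": "cards", "month": "2013-01", "amount": -300.0},
    {"client_id": "C1", "line": "mortgage", "month": "2013-01", "amount": -900.0},
    {"client_id": "C2", "line": "deposits", "month": "2013-01", "amount": 500.0},
    {"client_id": "C1", "line": "cards", "month": "2013-01", "amount": -50.0},
]


def map_phase(records):
    """Emit ((client, line of business, month), amount) pairs."""
    for record in records:
        key = (record["client_id"], record["line"], record["month"])
        yield key, record["amount"]


def reduce_phase(pairs):
    """Count transactions and sum amounts for each key."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for key, amount in pairs:
        totals[key]["count"] += 1
        totals[key]["total"] += amount
    return dict(totals)


def summarize(records):
    """Build the per-client, per-line, per-month summary."""
    return reduce_phase(map_phase(records))


SUMMARY = summarize(TRANSACTIONS)

if __name__ == "__main__":
    for key, row in sorted(SUMMARY.items()):
        print(key, row)
```

A summary of this shape is small enough to load into a relational table, which is what lets existing reporting tools keep working against it.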

Expand View of Prospective Borrowers

Figure 2: Enable full spectrum view of borrower, by combining borrower info from internal and external sources (diagram labels: Borrower Info - With Bank; Borrower Info - External Sources; Batch Interface; Real Time Interface; Big Data Solution; Update back to SOR; Decisioning; Enable Predictive Analytics)

When it comes to Credit Risk Management, the appraisal process may not be able to access all the details pertaining to the property, the borrower's current portfolio with the bank, undisclosed liens, property and tax liens, judgments and child-support obligations, and bankruptcies. In most credit reports, data is refreshed only once in 60-90 days, which means there is a critical missing piece in the appraisal process. This data can be made available via alternate sources (e.g. CoreScore FICO) and can be merged into the existing scoring process. The existing scoring process can also be tuned to incorporate payment trends as noticed for other products.

Similarly, there is a potential correlation between delinquencies, liquidations and changes in a customer's credit score. FICO offers analytics to help predict strategic defaults by a borrower. These analytics require data for the calculation of the FICO Score, the utilization percentage of credit cards, retail balance and home price depreciation. There is a need to understand if a borrower is up to date on repayments in other accounts such as credit card, auto or HELOCs, but delinquent in real estate.

Despite US banks executing Loan Modification Programs to help borrowers recover, there is a considerable chance that a borrower may not be able to make all payments during the three-month trial period. An analysis of borrower data including the FICO score, credit card and other loan transactions would help anticipate and predict such situations. Most banks have set up a dedicated case / relationship manager to manage defaulting borrowers. These analytics would help them understand the situation better and provide better assistance to borrowers.
The answer is to get ahead of the problem by allowing banks to work with borrowers to find a solution to loan delinquency, help provide alternative solutions to foreclosure and tackle default risk proactively. Banks should leverage predictive analytics to identify situations where risk of default is imminent and take appropriate action to avoid the high cost of foreclosure. By acting early and optimizing the remediation treatment for each borrower, banks can significantly reduce default and improve the existing traditional and manual approach to loan modifications. This would serve as an early warning system at the borrower level.

Enhance Risk Management

Delinquency Models and Contagion Analysis

Banks in the USA have started building delinquency data models that can help predict loans likely to become delinquent in the next three to six months and help initiate proactive action. However, this delinquency model requires analysis of the customer transaction table (~1 billion rows), combining it with the fees transaction data (~800 million rows) and borrower and loan relation data (active as well as archives). Such analysis consumes significant processing power and Input/Output (I/O) and cannot be provided in real time. Moving this processing over to Big Data solutions can save CPU costs.
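
As a rough illustration of how the three data sets named above come together, the following Python sketch joins payment transactions, fee transactions and the loan-to-borrower relation into one feature row per loan. The table layouts, the sample rows and the `min_late_payments` review rule are all assumptions made for the example; the rule is a placeholder for a real predictive model, and at the row counts quoted above the same grouping would run on the Big Data platform, not in memory.

```python
# Hypothetical sample rows; the layouts are assumptions for illustration.
PAYMENTS = [  # (loan_id, days past due when the payment was received)
    ("L1", 0), ("L1", 35), ("L1", 65),
    ("L2", 0), ("L2", 0),
]
FEES = [("L1", 25.0), ("L1", 40.0), ("L2", 10.0)]  # (loan_id, fee charged)
LOAN_BORROWER = {"L1": "B1", "L2": "B2"}  # loan_id -> borrower_id


def build_loan_features(payments, fees, loan_borrower):
    """Combine the three sources into one feature row per loan."""
    features = {
        loan_id: {
            "borrower_id": borrower_id,
            "late_payments": 0,
            "max_days_past_due": 0,
            "total_fees": 0.0,
        }
        for loan_id, borrower_id in loan_borrower.items()
    }
    for loan_id, days_past_due in payments:
        row = features.get(loan_id)
        if row is None:
            continue  # payment for a loan missing from the relation data
        if days_past_due >= 30:
            row["late_payments"] += 1
        row["max_days_past_due"] = max(row["max_days_past_due"], days_past_due)
    for loan_id, fee in fees:
        row = features.get(loan_id)
        if row is not None:
            row["total_fees"] += fee
    return features


def flag_for_review(features, min_late_payments=2):
    """Placeholder rule standing in for a trained delinquency model."""
    return sorted(
        loan_id
        for loan_id, row in features.items()
        if row["late_payments"] >= min_late_payments
    )


FEATURES = build_loan_features(PAYMENTS, FEES, LOAN_BORROWER)

if __name__ == "__main__":
    print(FEATURES)
    print(flag_for_review(FEATURES))
```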

Enhance Sub Servicing

Large banks in the US have initiated transfer of servicing rights to sub-servicers for specific default loans. However, identifying loans which would have to be sub-serviced is a challenge, as this requires detailed analysis of the history of the loan, the borrower and the loan documents available in the Document Management System. The analytics require a thorough understanding of the payments made by the borrower, the number of 30 day, 60 day and 90 day delinquencies at the loan and borrower level, and the documents available in the system to review the transactions. Building an automated analytics program that runs over huge transaction logs and groups them at borrower and loan level would help identify the loans that could be sub-serviced. The analytics can also help identify the gaps in terms of available loan documents in Document Management Systems. This would enable intelligent prioritization based on an analysis of borrowers' propensity to default. Currently, banks execute this process manually and it takes roughly seven to eight weeks to decide if a loan can be sub-serviced. With predictive analytics this can be reduced to two to three weeks.

Predict Mortgage Frauds

Banks have struggled to identify mortgage frauds despite availability and easy accessibility of most of the relevant data. Some mortgage frauds that can be prevented are property valuation fraud, occupancy fraud and short sale fraud (detailed alongside Figure 3). Essentially, banks need to leverage predictive analytics to help create heat maps that identify regions that are known for mortgage frauds, with details available at zip code level. This will enable effective analysis when the appraisal process is being conducted for a new loan application.

Identify Household Spending Patterns

Most banks have a process in place for identifying households that are consumers of their products. Taking the solution to the next step, banks can analyze transaction logs for all their products and identify spending trends at the household level. This will enable a better view of the end customer's true ability to pay back loans and identify future cross selling opportunities.
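
The household-level roll-up described above could look something like the following Python sketch. The account-to-household mapping, the record layout and the sample figures are assumptions made for the example; how a bank actually assigns accounts to households is not specified here.

```python
from collections import defaultdict

# Hypothetical mapping of accounts to households and sample spend records.
ACCOUNT_HOUSEHOLD = {"CHK-1": "H1", "CARD-7": "H1", "CHK-2": "H2"}
SPEND = [  # (account_id, month, amount spent)
    ("CHK-1", "2013-01", 400.0),
    ("CARD-7", "2013-01", 250.0),
    ("CARD-7", "2013-02", 350.0),
    ("CHK-2", "2013-01", 100.0),
]


def household_monthly_spend(spend, account_household):
    """Roll account-level spend up to household and month."""
    totals = defaultdict(lambda: defaultdict(float))
    for account_id, month, amount in spend:
        household = account_household.get(account_id)
        if household is None:
            continue  # account not yet linked to a household
        totals[household][month] += amount
    return {household: dict(months) for household, months in totals.items()}


def average_monthly_spend(monthly):
    """Average spend per month, for months with at least one transaction."""
    return {
        household: sum(months.values()) / len(months)
        for household, months in monthly.items()
    }


MONTHLY = household_monthly_spend(SPEND, ACCOUNT_HOUSEHOLD)
AVERAGES = average_monthly_spend(MONTHLY)

if __name__ == "__main__":
    print(MONTHLY)
    print(AVERAGES)
```

Comparing the monthly series for a household against its loan obligations is one simple way to approximate the ability-to-pay view mentioned above.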

Figure 3: Enable Fraud Detection using Big Data Solution (diagram labels: Borrower Info - With Bank; AVM Property Valuation; Fraud Info - External Sources; Online Interface; Batch Interface; Big Data Solution; Update back to SOR; Decisioning; Identify Property Valuation, Occupancy & Short Sale Fraud)

a) Property Valuation Fraud
This impacts the Loan-to-Value ratio (LTV), and hence the underwriting and Loan Modification processes. LTV can be predicted by adding an alternate valuation source and enabling cross verification with the collateral value provided during the appraisal process.

b) Occupancy Fraud
This impacts the interest rates of loans. It can be prevented by identifying borrowers who have multiple real estate properties and leveraging data that provide details on landlord/tenant relationships and evictions.

c) Short Sale Fraud
As more and more servicers are leaning towards short sale, fraud detection has to become more relevant and up to date.
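
The cross verification described under property valuation fraud can be sketched in Python as follows. LTV is the loan amount divided by the property value; the sketch computes it once with the appraised value and once with an alternate valuation (for instance the AVM value shown in Figure 3) and flags large gaps. The 10% tolerance and the function names are assumptions made for the example, not thresholds taken from this paper.

```python
def loan_to_value(loan_amount, property_value):
    """LTV = loan amount / property value."""
    if property_value <= 0:
        raise ValueError("property_value must be positive")
    return loan_amount / property_value


def cross_verify_valuation(loan_amount, appraised_value, alternate_value,
                           tolerance=0.10):
    """Compare the appraisal with an alternate valuation source.

    Flags the loan when the appraisal exceeds the alternate value by more
    than `tolerance` (an assumed threshold for illustration).
    """
    if alternate_value <= 0:
        raise ValueError("alternate_value must be positive")
    gap = (appraised_value - alternate_value) / alternate_value
    return {
        "ltv_appraised": loan_to_value(loan_amount, appraised_value),
        "ltv_alternate": loan_to_value(loan_amount, alternate_value),
        "valuation_gap": gap,
        "flag_for_review": gap > tolerance,
    }


# Hypothetical loan: 240,000 borrowed, appraised at 300,000, AVM says 250,000.
RESULT = cross_verify_valuation(240_000, 300_000, 250_000)

if __name__ == "__main__":
    print(RESULT)
```

In this made-up case the appraised value gives an LTV of 80%, while the alternate valuation gives 96%, which is the kind of gap that would change an underwriting decision.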

Implementation Challenges

While there are use cases for applying mortgage data analysis, Big Data projects also face several challenges that could potentially delay their implementation. Some of these challenges are:

Talent Acquisition
Big Data analytics is complex and requires knowledge of data analysis tools and Big Data solutions. Finding suitable resources is an industry wide challenge. Hortonworks, Cloudera and a few other Big Data solution providers are offering certification programs, but the supply is not able to match demand and the gap is widening every day. Large organizations also struggle to identify and cross train their existing staff in Big Data analysis, as resources require both technical knowhow and a good understanding of Big Data solutions and analysis methodologies. Further, trained resources are susceptible to head-hunting by competitors.

Big Data Analysis Solutions are in Incubation Phase
Data analysis requires an understanding of Pig, MapReduce concepts and Java programming in addition to SQL. These concepts are new to the market and organizations are still learning how to include them in their associate cross training programs. Unless there is a stable framework that can reduce the learning curve for data analysts, adoption would be slow.

Cost of Opportunity
Mortgage banks and servicers already run loan origination, fulfillment and servicing applications as well as data warehouse appliances. Moving the processing to a new solution requires a business case along with funding and senior management approval.

Analytics as Solution

The idea of a holistic analysis is what the Enterprise Data Warehouse was always about. For many, true business analytics has been an elusive dream. Large organizations already possess huge amounts of data about their customers and products, and it is just a matter of using them in the right combination to deliver the right solution to the business. However, organizations struggle to provide access to the right data on a platform that would enable true analytics that would aid business with the right decisions.
Big Data based solutions have the capability to provide the right analytics; however, unless organizations make efforts to recognize the use case, bring the right talent together and provide funding to work on a proof of concept, they would continue to struggle with building a true analytics solution.

About the Author

Arvind Radhakrishnen
Arvind Radhakrishnen works in the world of information management and has expertise in analyzing data, deciphering its meaning and deciding the usefulness of information extracted from unrelated data. His interests include architecting data warehouses, contextualizing time series data and applying newer technologies to his field of interest. Arvind has been working in the area of information management with various industry tools and technologies for the last six years and has taken special interest in devising solutions for the mortgage area. He is a valuable contributor to TCS' efforts in building solution accelerators and thought papers in the area of Big Data and Analytics across various domains.

Contact
For more information about TCS' Banking & Financial Services, email us at bfs.marketing@tcs.com

About Tata Consultancy Services Ltd (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.

For more information, visit us at www.tcs.com

IT Services | Business Solutions | Outsourcing
TCS Design Services | M | 03 | 13