The Big Data Integration and Analytics Revolution in Agricultural Finance, Risk, and Insurance

Joshua D. Woodard
Assistant Professor and Zaitz Faculty Fellow in Agribusiness and Finance
Dyson School of Applied Economics and Management, Cornell University
IARFIC Keynote, June 8th, 2015

Introduction
- Overview, society and data
- The data integration problem
- Ongoing system development efforts
- Challenges and considerations
- Purpose is to provide a high-level overview relevant to the ag finance, risk, and insurance field

The Data Integration Problem
- Analysts typically source data from many different government and nongovernment sources, at different temporal and spatial resolutions
- Relevant data are spread over a wide variety of operational/transaction-based databases, data marts, unstructured text files, etc.
- Sources all have different data storage and formatting protocols, APIs, levels of temporal and spatial aggregation, etc.
- Existing infrastructures cannot be queried jointly, and some cannot be queried at all
- Data are not processed to scales appropriate for most uses
- Typical approach is a one-off for every study:
  - At a point in time, download slices of data from several different sources
  - Then format individual data sets (often by hand or copy/paste) and mash them together (may take days or weeks; not automated, replicable, or documented)
  - Perform a one-off analysis
  - To expand or update the analysis, the entire process must be repeated by hand

A Fairly Small Sampling

The Data Integration Problem
- This makes it impossible for most to conduct analysis of today's programs, and imposes large costs on agencies and others
- Results in duplication of effort, redundancy, error, and difficulty/waste in sourcing data
- Limits use and usability of data
- Pushes out many potential users
- Very difficult to recreate and update analyses
- Renders research and analysis less credible (see recent blunders)
- Data analysis versus management/integration/processing
- Yet, very little focus in the community on building such systems to date

Advantages of Data Warehousing and Integration Systems
- Acts as a clearinghouse for data to support policy makers, oversight, research & development, and business intelligence
- Not a transactional database, but rather used for informational/reporting/research purposes
- Integrated and centralized
- Subject oriented and optimized to give answers to diverse questions
- Data are processed in various ways for a variety of uses; flexible
- Non-volatile, meaning data are never deleted and the warehouse is always growing
- Consistent data storage and formatting protocols within the warehouse (reconciles source data)

AgDB Data Warehousing Overview
[Diagram: data flow from sources to clients]
- Data chunks: external databases, datastores, datastreams (RMA, USGS, NRCS, AMS, ERS, PRISM, CME, NASA, NASS, FSA, FAS, etc.); OLTP data
- Scheduled jobs to download and extract from source over the web
- Integration services: filter/clean; preprocessing/aggregation/interpolation/transformation; load
- Data auditing & validation
- AgDB Data Warehouse
- Web data services, OLAP, data marts
- External clients, web decision tools
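The flow in the diagram above (extract, filter/clean, aggregate, validate, load) can be sketched in a few lines. This is only an illustration under stated assumptions, not the AgDB implementation: the rows are made-up values, the table name `precip_monthly` and all function names are hypothetical, and an in-memory SQLite database stands in for the SQL Server warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source file:
# (county FIPS code, date, precipitation in mm). All values are made up.
RAW_ROWS = [
    ("17113", "2015-06-01", "4.0"),
    ("17113", "2015-06-02", "6.0"),
    ("17113", "2015-06-03", ""),      # missing reading, dropped during cleaning
    ("17019", "2015-06-01", "2.5"),
    ("17019", "2015-07-01", "1.5"),
]


def extract():
    """Stand-in for a scheduled job that downloads data from a source."""
    return list(RAW_ROWS)


def clean(rows):
    """Drop rows with missing readings and convert strings to numbers."""
    return [(fips, date, float(value)) for fips, date, value in rows if value]


def aggregate_monthly(rows):
    """Aggregate daily readings to county-month totals."""
    totals = {}
    for fips, date, value in rows:
        key = (fips, date[:7])  # 'YYYY-MM'
        totals[key] = totals.get(key, 0.0) + value
    return [(fips, month, total) for (fips, month), total in sorted(totals.items())]


def validate(rows):
    """Basic audit: keys must be unique and totals non-negative."""
    keys = [(fips, month) for fips, month, _ in rows]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate county-month keys")
    if any(total < 0 for _, _, total in rows):
        raise ValueError("negative precipitation total")
    return rows


def load(conn, rows):
    """Write the processed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS precip_monthly ("
        "fips TEXT, month TEXT, total_mm REAL, PRIMARY KEY (fips, month))"
    )
    conn.executemany("INSERT OR REPLACE INTO precip_monthly VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> clean -> aggregate -> validate -> load in order."""
    rows = validate(aggregate_monthly(clean(extract())))
    load(conn, rows)
    return rows


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_pipeline(connection):
        print(row)
```

Because every step is a function, the whole job can be rerun on a schedule, which is the property the one-off, copy/paste workflow lacks.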

Advantages of Data Warehousing and Integration Systems
- Resilient to change, additions, updates
- Data from different sources can be joined, integrated, and queried with low effort
- Wide degree of control and consistency in aggregating, interpolating, and cross-referencing among and between different types of data
- Improved data integrity (auditing, cleaning, validation)
- Increases utility, usability, and access to data
- Results in lower costs and more reliability for analysts and policymakers who use the data
- Overall: save time, save money, increase capabilities
- Facilitates user tools and access to data that farmers, researchers, and policymakers want
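As a rough illustration of joining data from different sources with low effort: once two sources share consistent keys inside the warehouse, a single SQL query can combine them. The sketch below assumes hypothetical tables `insurance` and `weather` keyed by county FIPS code and year, uses made-up values, and uses SQLite only as a stand-in for the actual DBMS.

```python
import sqlite3


def build_demo_warehouse():
    """Create two hypothetical source tables that share (fips, year) keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE insurance (fips TEXT, year INTEGER, insured_acres REAL);
        CREATE TABLE weather (fips TEXT, year INTEGER, season_precip_mm REAL);
        """
    )
    # Made-up values, for illustration only.
    conn.executemany(
        "INSERT INTO insurance VALUES (?, ?, ?)",
        [("17019", 2014, 1000.0), ("17113", 2014, 2000.0), ("17113", 2015, 2100.0)],
    )
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [("17019", 2014, 500.0), ("17113", 2014, 450.0), ("17113", 2015, 610.0)],
    )
    conn.commit()
    return conn


def joined_rows(conn, year):
    """Return insurance and weather data side by side for one year."""
    query = """
        SELECT i.fips, i.year, i.insured_acres, w.season_precip_mm
        FROM insurance AS i
        JOIN weather AS w ON w.fips = i.fips AND w.year = i.year
        WHERE i.year = ?
        ORDER BY i.fips
    """
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    for row in joined_rows(build_demo_warehouse(), 2014):
        print(row)
```

The join is cheap to write only because the warehouse has already reconciled keys and aggregation levels; that reconciliation is the work the integration layer does once on behalf of every analyst.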

Current Efforts, AgDB Data Warehousing System at Cornell
- Genesis of system & motivation for Open Data Warehouse
- Pulls in data from disparate sources and consolidates them in a single repository (primarily various USDA data, but also others)
- Basic ETL:
  - Data extraction and/or sourcing
  - Extraction, preprocessing, and transformation before loading
  - Filter, transform, integrate, classify, aggregate, summarize
- DBMS: Microsoft SQL Server
  - SSIS and other programs for transformation (Python, Matlab, ArcGIS, ArcPy, etc.)
  - Built-in spatial libraries, SSDT, etc.
  - Other candidates: Oracle, PostgreSQL, MySQL, MongoDB, other custom
- Deployed on CIT servers at Cornell (moving to Azure this summer)
- Data Access:
  - Web API: virtually any language or stat program (e.g., Matlab, Python, Excel, STATA, Java)
  - Point and click interfaces (also generate API calls for replicability)
  - Others: RMA rating calculator API, spot/basis interpolation, Dairy Margin Tool, Grape Vine Cost tool, etc.
  - See Forum for code samples
- Web interfaces in development, test site at: http://agfinance.dyson.cornell.edu/agriskmanagement/
- Some qualifiers
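One way a point and click interface could "generate API calls for replicability" is to turn the user's selections into a deterministic query URL that can be saved alongside the analysis. The sketch below is purely illustrative: the base URL, the `dataset` parameter, and the filter names are hypothetical placeholders, not the actual AgDB Web API.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; NOT the real AgDB service address.
BASE_URL = "https://example.org/api/query"


def build_api_call(dataset, **filters):
    """Turn a set of selections into a reproducible query URL.

    Filters are sorted by name so the same selections always produce
    the same URL, regardless of the order in which they were chosen.
    """
    params = {"dataset": dataset}
    params.update(sorted(filters.items()))
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Parameter names and values here are assumptions for illustration.
    print(build_api_call("prism_monthly", fips="17113", start="2010-01", end="2014-12"))
```

Saving the generated URL with the results lets another analyst rerun exactly the same request later.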

Abridged/Partial Summaries of Major Datasets/Sources Currently in AgDB

Data Source and Item: Description

IPCC Climate Change Projections: Future temperature and precipitation projections across different emission scenarios and percentiles of the 16 General Circulation Models (GCMs). Raw or spatially processed data.

National Climatic Data Center Drought Data: Monthly PDSI drought index data available at the climate district level of aggregation. Data are available from 1895 to present, by NCDC District.

PRISM Climate Group: Monthly and daily historical temperature and precipitation data, as well as GDD/HDD processed data. Monthly data are available from 1895 to present. Daily weather data are available from 1981 to present. 800 meter resolution (raw) and processed by FIPS, Township, and in certain cases CLU (pre-2008) available.

Chicago Mercantile Exchange: Daily historical futures and options data for agricultural commodities from the Chicago Mercantile Exchange (CME), Chicago Board of Trade (CBOT), and Kansas City Board of Trade (KCBOT). Data are available from 1959 to present, updated daily.

Risk Management Agency (RMA): Agricultural insurance price and participation data available at the county level of aggregation. Data are available from 1989 to present from Summary of Business. Other data are also loaded from various unstructured text files (including historical discovery prices, GRIP yields, etc.).

US Census Bureau: County-level and township-level geographical coordinates, land area size, water area size, and population data.

USDA Economic Research Service (ERS): Annual farm structural and financial data available at state-level aggregation for the 15 Agricultural Resource Management Survey (ARMS) states. Data are available from 1996 to present. Various other datasets are also sourced from the ad hoc ERS tools and APIs.

USDA Agricultural Marketing Service (AMS): Monthly data on the volume, pricing, and utilization of raw milk received by handlers regulated under Federal milk orders from dairy farmers. All tables in the Public MMO database.

USDA National Agricultural Statistics Service (NASS): Census and survey data available at regional, state, and county level aggregation. The broad categories of data available are crops, animals and products, economics, demographics, and environmental. Data are available from 1926 to present. Obtained via FTP bulk download from QuickStats. CDL data processed against ready-to-map gSSURGO NRCS data by crop also available (raw and county processed).

USDA Foreign Agricultural Service: Data on production, supply, and distribution of agricultural commodities for the U.S. and key producing and consuming countries.

USDA Natural Resources Conservation Service (NRCS): Soil data for the continental US from gSSURGO, raw and processed, available at various levels of aggregation.
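The "GDD processed data" mentioned for the temperature series refers to growing degree days derived from daily temperatures. A minimal sketch of the common simple-averaging method follows; the 10 °C base temperature and the function names are assumptions for illustration, and the source does not say which GDD variant (for example, one with an upper temperature cap) is actually stored in AgDB.

```python
def daily_gdd(tmin_c, tmax_c, base_c=10.0):
    """Growing degree days for one day using the simple averaging method.

    GDD = max(0, (Tmin + Tmax) / 2 - Tbase). The default base of 10 C is an
    assumption for illustration; the appropriate base depends on the crop.
    """
    mean_temp = (tmin_c + tmax_c) / 2.0
    return max(0.0, mean_temp - base_c)


def cumulative_gdd(daily_min_max, base_c=10.0):
    """Sum daily GDD over a sequence of (tmin, tmax) pairs in degrees C."""
    return sum(daily_gdd(tmin, tmax, base_c) for tmin, tmax in daily_min_max)


if __name__ == "__main__":
    # Made-up temperatures for three days.
    season = [(10.0, 20.0), (12.0, 24.0), (2.0, 8.0)]
    print(cumulative_gdd(season))
```

Storing such derived series next to the raw temperatures means each analyst does not have to recompute them, and everyone works from the same definition.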

Applications & Accessing Data
- Applications: virtually anything
  - Insurance
  - Conservation and climate change
  - Policy analysis and oversight
  - Farm Bill program analysis
  - Product development
- Different tools for different users
  1) Direct DB connect (or bulk download for external users): Matlab, R, Python, or web apps, or standard SQL connections (ODBC, BCP, etc.)
  2) Web API and data services for analysts
  3) Interactive web tools for farmers, consumers of research
- Workflows for web tools
- Enables extension of research and tool development

Soil Rating/Insurance
[Figure: Soil Productivity Index kernel density among Common Land Units (CLUs), McLean County, Illinois (SSURGO data). Horizontal axis: Soil Productivity Index (IL 810 Circular); vertical axis ranges from 0.00 to 0.07.]
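For readers unfamiliar with the kind of plot shown on this slide, a kernel density estimate smooths a set of observations into a continuous density curve. The sketch below uses a Gaussian kernel with a fixed bandwidth; the index values are made up, the bandwidth is an arbitrary choice, and nothing here reproduces the estimator or data behind the original figure.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate.

    density(x) = 1 / (n * h * sqrt(2 * pi)) * sum(exp(-0.5 * ((x - s) / h) ** 2))
    where n is the number of samples and h is the bandwidth.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


if __name__ == "__main__":
    # Made-up productivity index values for a handful of land units.
    index_values = [98.0, 105.0, 110.0, 112.0, 118.0, 121.0, 125.0, 130.0]
    density = gaussian_kde(index_values, bandwidth=5.0)
    for x in range(90, 141, 10):
        print(x, round(density(x), 4))
```

The bandwidth controls how smooth the curve is: a larger value blurs differences between land units, while a smaller one produces a spikier curve.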

Ongoing Efforts and Priorities
- Recently received a Microsoft Azure Research Grant for use of the Azure cloud platform; conversion to the cloud platform in progress (early to mid-summer)
- Additional datasets, APIs, tools (ongoing)
- Open Source launch (mid-summer)
- Upgraded data portal interface (late summer) for faster and more flexible cataloging/access
- Movement towards and incorporation of NoSQL platforms
- Identify various user needs, partners, and collaborators

Challenges, Policy Considerations, and Opportunities
- Technical and training
  - Some degree of learning curve, but frankly minimal
  - Technical limitations are eroding quickly, political ones are not
- Expanding purview
- Still inherently a public good, so without intervention will be underprovisioned
  - Marginal cost curse
- Coordination within the community
- Improving access to government data (incentives and bandwidth)
  - What data are made available or opened up
  - How data are made available
  - Privacy concerns
- Field is at an interesting vantage point compared to many others, given its mix of market, business, environmental, and other natural systems data

Thank you
Questions?