Making Good Use of Data at Hand: Government Data Projects
Mark C. Cooke, Ph.D., Tax Management Associates
Tax Management Associates
Privately held company serving state and local government
Markets in eighteen (18) states with more than 500 clients
34 years in business
Staff of ~135
Four national offices; HQ in Charlotte, NC
Making Good Use of Data Mark C Cooke - Tax
Tax Management Associates
Services include:
Staff of credentialed, specialized auditors for various local tax types
Technology solutions, including SaaS applications: Taxscribe (online tax listing service) and CAVS (modeling application)
Data-based solutions for business intelligence, fraud detection, and revenue optimization
Data Solutions
Problem: Governments' legacy data collection, storage, and management systems
Problem: Data is spread across departments and agencies
Problem: Governments have no direct access to the data
Data Solutions
[Diagram: government data domains, including Revenue and Tax, Fire and Police, Governance, Budget, 911, Voter Services, Revenues, Crime, and Econ. Dev.]
Open Data Concept
1. Governments collect enormous quantities of useful data.
2. Making data available to a wide audience leverages insights from industry and academia.
3. Open data and business intelligence can also be consumed internally by governments.
Open Data
Data Scientist
Doing Data the Old Way
Data is locked inside systems :-(
Software systems are designed to wrap a Graphical User Interface (GUI) around data.
Historically, every report, view, and analysis has had to be programmed into the GUI.
The GUI is driven by the single purpose of the software, but the data has many purposes.
Open Data: The Way Forward
Using data in ways for which it was never intended
Connecting data across multiple platforms
Using data for novel insight
Better governance through the data at hand and rapid development of analytics
Data Science
[Diagram: Real Property, Personal Property, Permits, Text-Based Data, Sales/Use, Billing & Collections, Police & Fire, and Expenditures data feed into Insight (Business Intelligence) + Answers]
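Connecting departmental data usually comes down to joining records on a shared identifier. The sketch below is a minimal Python illustration, not the author's Knime workflow; it assumes a hypothetical `parcel_id` key common to a real property extract and a permits extract, and all field names and values are made up.

```python
from collections import defaultdict

# Hypothetical extracts from two departmental systems; names and values are illustrative.
real_property = [
    {"parcel_id": "P-001", "owner": "Acme LLC", "sq_ft": 12000},
    {"parcel_id": "P-002", "owner": "Baker Inc", "sq_ft": 3500},
    {"parcel_id": "P-003", "owner": "Cole Farms", "sq_ft": 80000},
]
permits = [
    {"parcel_id": "P-001", "permit_type": "electrical"},
    {"parcel_id": "P-001", "permit_type": "roofing"},
    {"parcel_id": "P-003", "permit_type": "demolition"},
]

def left_join(left, right, key):
    """Left outer join: keep every left row and attach all right rows sharing its key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # .get() avoids inserting empty entries for keys with no match
    return [{**row, "matches": index.get(row[key], [])} for row in left]

combined = left_join(real_property, permits, "parcel_id")
for rec in combined:
    print(rec["parcel_id"], rec["owner"], len(rec["matches"]))
```

A left join keeps parcels with no permits (P-002 above), which matters when the question is what is missing rather than what matches.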
Data Science
What is the output?
Business Intelligence (BI): actionable information that drives business decisions through insight
Actionable insights from existing data
Visualizations: making results consumable by a nonspecialist audience
According to Friedman (2008), the "main goal of data visualization is to communicate information clearly and effectively through graphical means." (http://en.wikipedia.org/wiki/data_visualization)
The advantages of Knime:
Reads data from multiple sources in real time and re-executes analyses on demand
Simple GUI-based analysis environment for users who do not program
Results can be written out to tables or visualizations, depending on the context
A web portal allows non-technical end users to consume the output
The advantages of Knime:
Rapid development environment
Very powerful processing that handles large datasets on commodity hardware
Allows 100% data samples (every record rather than a subset), up to millions of rows
Nodes provide access to complex statistical and machine learning algorithms
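Working with a 100% sample on commodity hardware is feasible largely because many summaries can be computed in a single streaming pass that keeps only running totals in memory. The Python sketch below illustrates that idea with the standard `csv` module; it is not how Knime works internally, and the column names (`account_id`, `county`, `assessed_value`) and rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def summarize_by_county(lines):
    """Read every row once; memory holds only per-county running totals."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        county = row["county"]
        counts[county] += 1
        totals[county] += float(row["assessed_value"])
    return {c: (counts[c], totals[c]) for c in counts}

# Hypothetical in-memory sample standing in for a large accounts file.
sample = io.StringIO(
    "account_id,county,assessed_value\n"
    "1,Alachua,1000.50\n"
    "2,Baker,250\n"
    "3,Alachua,499.50\n"
)
summary = summarize_by_county(sample)
print(summary)
```

For a file on disk, the same function accepts an open handle, e.g. `with open("accounts.csv", newline="") as f: summarize_by_county(f)` (the file name is hypothetical). Memory use grows with the number of counties, not the number of rows.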
Knime Integrates with R
R integration is key to expanding the data analysis and visualization capabilities of Knime
R supports ingestion of complex file formats (including ESRI)
R supports complex data manipulation and statistical analysis
R supports a wide variety of highly customizable visualizations
So, what is R, exactly?
R Project for Statistical Computing (www.r-project.org)
R is an open source scripting language that can run inside Knime or independently from the command line
Several GUIs for R exist, such as RStudio, a group that provides software for using R as well as training and extension packages (www.rstudio.com)
Community contributions make up the bulk of R packages, which now number more than 4,700
Applications
Case examples for working with county data:
Combine real property data with 911 services so that responders know the size, shape, and details of a property
Identify holes in the tax base: entities that report one tax type but not others
Measure the revenue impact of policy changes on a 100% sample
Productivity analyses of units produced over time
Revenue sources and time series of annual revenue cycles across the entire revenue base, compared year on year
Crime patterns for research and predictive policing
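Finding holes in the tax base, as listed above, is essentially a set comparison across tax rolls. A minimal Python sketch, assuming each roll has already been reduced to a set of business identifiers that match across systems (the IDs and roll names here are hypothetical; in practice, matching entities across systems is often the hard part):

```python
# Hypothetical rolls: identifiers of businesses appearing on each tax roll.
personal_property_filers = {"B100", "B101", "B102", "B105"}
sales_use_filers = {"B100", "B102", "B103"}
business_license_holders = {"B100", "B101", "B102", "B103", "B104", "B105"}

def tax_base_gaps(rolls):
    """For each roll, list entities seen on any other roll but missing from this one."""
    everyone = set().union(*rolls.values())
    return {name: sorted(everyone - members) for name, members in rolls.items()}

gaps = tax_base_gaps({
    "personal_property": personal_property_filers,
    "sales_use": sales_use_filers,
    "business_license": business_license_holders,
})
for roll, missing in gaps.items():
    print(roll, missing)
```

A gap is a lead for review, not proof of noncompliance, since an entity may legitimately owe one tax type and not another.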
Demonstrations
Data: Florida, 67 counties, more than 1.24 million personal property accounts
Demonstrations
Data: database state change table (events table), more than 90k events
Output: produced a 200-300% performance improvement
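The slide does not describe the analysis performed on the events table. One common use of a state change table is measuring how long records spend in each state; the Python sketch below assumes a hypothetical schema of (entity ID, state, ISO timestamp) rows and treats each entity's latest state as still open, so that state contributes no duration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events table: one row per state change.
events = [
    ("A1", "received", "2014-01-02T09:00"),
    ("A1", "under_review", "2014-01-03T09:00"),
    ("A1", "closed", "2014-01-08T09:00"),
    ("A2", "received", "2014-01-05T12:00"),
    ("A2", "closed", "2014-01-06T00:00"),
]

def time_in_state(rows):
    """Total hours spent in each state, measured from one event to the entity's next event."""
    by_entity = defaultdict(list)
    for entity, state, ts in rows:
        by_entity[entity].append((datetime.fromisoformat(ts), state))
    hours = defaultdict(float)
    for changes in by_entity.values():
        changes.sort()  # chronological order per entity
        # Pair each event with the next one; the final state is open-ended and skipped.
        for (start, state), (end, _) in zip(changes, changes[1:]):
            hours[state] += (end - start).total_seconds() / 3600
    return dict(hours)

print(time_in_state(events))
```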
Questions?
Thank you for your time and attention. I am always happy to discuss data, so please feel free to reach out using any of the contact information below.
Mark C Cooke
Mark.Cooke@tma1.com
704.847.1234 (office)
704.953.6349 (cell)
www.linkedin.com/in/markccooke