Building patient-level predictive models Martijn J. Schuemie, Marc A. Suchard and Patrick Ryan
Contents

1 Introduction
  1.1 Specifying the cohort of interest and outcomes
2 Installation instructions
3 Data extraction
  3.1 Configuring the connection to the server
  3.2 Preparing the cohort and outcome of interest
  3.3 Extracting the data from the server
  3.4 Saving the data to file
4 Fitting the model
  4.1 Train-test split
  4.2 Fitting the model on the training data
  4.3 Model evaluation
  4.4 Inspecting the model
5 Acknowledgments

1 Introduction

This vignette describes how you can use the PatientLevelPrediction package to build patient-level prediction models. We will walk through all the steps needed to build an exemplar model, and we have selected the well-studied topic of predicting re-hospitalization. To reduce the scope a bit, we have limited ourselves to a type 2 diabetes population.

1.1 Specifying the cohort of interest and outcomes

The PatientLevelPrediction package requires longitudinal observational healthcare data in the OMOP Common Data Model format. The user will need to specify two things:

1. Time periods for which we wish to predict the occurrence of an outcome. We will call this the cohort of interest, or cohort for short. One person can have multiple time periods, but time periods should not overlap.
2. Outcomes for which we wish to build a predictive model.

The cohort and outcomes should be provided as data in a table on the server, where the table should have the same structure as the cohort table in the OMOP CDM, meaning it should have the following columns:
- cohort_concept_id (CDM v4) or cohort_definition_id (CDM v5+), a unique identifier for distinguishing between different types of cohorts, e.g. cohorts of interest and outcome cohorts.
- subject_id, a unique identifier corresponding to the person_id in the CDM.
- cohort_start_date, the start of the time period where we wish to predict the occurrence of the outcome.
- cohort_end_date, which can be used to determine the end of the prediction window. Can be NULL for outcomes.

The prediction window always starts on the cohort_start_date. When the useCohortEndDate argument is set to TRUE, the prediction window will end on the cohort_end_date plus the number of days specified using the windowPersistence argument. If the useCohortEndDate argument is set to FALSE, the prediction window will end on the cohort_start_date plus the number of days specified using the windowPersistence argument. This logic is sketched below.

The package will use data from the time period preceding (and including) the cohort_start_date to build a large set of features that can be used to predict outcomes during the prediction window (including the first and last day of the prediction window). These features can include binary indicators for the occurrence of any individual drug, condition, or procedure, as well as demographics and comorbidity indices.

Important: Since the cohort_start_date is included in both the time used to construct predictors and the time when outcomes can occur, it is up to the user to either remove outcomes occurring on the cohort_start_date, or to remove predictors that are in fact indicators of the occurrence of an outcome on the cohort_start_date.
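To make the window logic concrete, here is a minimal sketch in plain R (not part of the package; the dates and argument values are made up for illustration):

# Illustration of how the prediction window is determined for one cohort entry:
cohortStartDate   <- as.Date("2010-01-01")
cohortEndDate     <- as.Date("2010-01-15")
useCohortEndDate  <- TRUE
windowPersistence <- 30

windowStart <- cohortStartDate
windowEnd   <- if (useCohortEndDate) {
  cohortEndDate + windowPersistence    # cohort end date plus persistence window
} else {
  cohortStartDate + windowPersistence  # cohort start date plus persistence window
}
windowStart  # 2010-01-01
windowEnd    # 2010-02-14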
2 Installation instructions

Before installing the PatientLevelPrediction package make sure you have Java available. Java can be downloaded from the Java website. For Windows users, RTools is also necessary; RTools can be downloaded from CRAN.

The PatientLevelPrediction package is currently maintained in a GitHub repository, and has dependencies on other packages in GitHub. All of these packages can be downloaded and installed from within R using the devtools package:

install.packages("devtools")
library(devtools)
install_github("ohdsi/SqlRender")
install_github("ohdsi/DatabaseConnector")
install_github("ohdsi/OhdsiRTools")
install_github("ohdsi/Cyclops")
install_github("ohdsi/PatientLevelPrediction")

Once installed, you can type library(PatientLevelPrediction) to load the package.

3 Data extraction

The first step in running PatientLevelPrediction is extracting all necessary data from the database server holding the data in the Common Data Model (CDM) format.

3.1 Configuring the connection to the server

We need to tell R how to connect to the server where the data are. PatientLevelPrediction uses the DatabaseConnector package, which provides the createConnectionDetails function. Type ?createConnectionDetails for the specific settings required for the various database management systems (DBMS). For example, one might connect to a PostgreSQL database using this code:

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/ohdsi",
                                             user = "joe",
                                             password = "supersecret")

cdmDatabaseSchema <- "my_cdm_data"
cohortsDatabaseSchema <- "my_results"
cdmVersion <- "4"

The last three lines define the cdmDatabaseSchema and cohortsDatabaseSchema variables, as well as the CDM version. We'll use these later to tell R where the data in CDM format live, where we want to create the cohorts of interest, and which CDM version is used. Note that for Microsoft SQL Server, database schemas need to specify both the database and the schema, so for example cdmDatabaseSchema <- "my_cdm_data.dbo".
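Putting that together, a SQL Server configuration might look like this (a sketch; the server name, credentials, and schema names are hypothetical):

connectionDetails <- createConnectionDetails(dbms = "sql server",
                                             server = "my_server",
                                             user = "joe",
                                             password = "supersecret")

cdmDatabaseSchema <- "my_cdm_data.dbo"
cohortsDatabaseSchema <- "my_results.dbo"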
3.2 Preparing the cohort and outcome of interest

Before we can start using the PatientLevelPrediction package itself, we need to construct the cohort of interest for which we want to perform the prediction, and the outcome: the event that we would like to predict. We do this by writing SQL statements against the CDM that populate a table containing the persons and events of interest. The resulting table should have the same structure as the cohort table in the CDM. For CDM v4, this means it should have the fields cohort_concept_id, cohort_start_date, cohort_end_date, and subject_id. For CDM v5+, the cohort_concept_id field must be called cohort_definition_id.

For our example study, we need to create the cohort of diabetics that have been hospitalized and have a minimum amount of observation time available before and after the hospitalization. We also need to define re-hospitalizations, which we define as any hospitalizations occurring after the original hospitalization. For this purpose we have created a file called HospitalizationCohorts.sql with the following contents:

/***********************************
File HospitalizationCohorts.sql
***********************************/
IF OBJECT_ID('@cohortsDatabaseSchema.rehospitalization', 'U') IS NOT NULL
  DROP TABLE @cohortsDatabaseSchema.rehospitalization;

SELECT visit_occurrence.person_id AS subject_id,
  MIN(visit_start_date) AS cohort_start_date,
  DATEADD(DAY, @post_time, MIN(visit_start_date)) AS cohort_end_date,
  1 AS cohort_concept_id
INTO @cohortsDatabaseSchema.rehospitalization
FROM @cdmDatabaseSchema.visit_occurrence
INNER JOIN @cdmDatabaseSchema.observation_period
  ON visit_occurrence.person_id = observation_period.person_id
INNER JOIN @cdmDatabaseSchema.condition_occurrence
  ON condition_occurrence.person_id = visit_occurrence.person_id
WHERE place_of_service_concept_id IN (9201, 9203)
  AND DATEDIFF(DAY, observation_period_start_date, visit_start_date) > @pre_time
  AND visit_start_date > observation_period_start_date
  AND DATEDIFF(DAY, visit_start_date, observation_period_end_date) > @post_time
  AND visit_start_date < observation_period_end_date
  AND DATEDIFF(DAY, condition_start_date, visit_start_date) > @pre_time
  AND condition_start_date <= visit_start_date
  AND condition_concept_id IN (
    SELECT descendant_concept_id
    FROM @cdmDatabaseSchema.concept_ancestor
    WHERE ancestor_concept_id = 201826) /* Type 2 DM */
GROUP BY visit_occurrence.person_id;

INSERT INTO @cohortsDatabaseSchema.rehospitalization
SELECT visit_occurrence.person_id AS subject_id,
  visit_start_date AS cohort_start_date,
  visit_end_date AS cohort_end_date,
  2 AS cohort_concept_id
FROM @cohortsDatabaseSchema.rehospitalization rehospitalization
INNER JOIN @cdmDatabaseSchema.visit_occurrence
  ON visit_occurrence.person_id = rehospitalization.subject_id
WHERE place_of_service_concept_id IN (9201, 9203)
  AND visit_start_date > cohort_start_date
  AND visit_start_date <= cohort_end_date
  AND cohort_concept_id = 1;

This is parameterized SQL which can be used by the SqlRender package. We use parameterized SQL so we do not have to pre-specify the names of the CDM and result schemas. That way, if we want to run the SQL on a different schema, we only need to change the parameter values; we do not have to change the SQL code. By also making use of the translation functionality in SqlRender, we can make sure the SQL code can be run in many different environments.
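To see what SqlRender's parameter substitution does, here is a minimal sketch (the one-line query and the @my_schema parameter are made up for illustration):

library(SqlRender)
# Every @parameter in the SQL is replaced by the value supplied as an argument:
renderSql("SELECT COUNT(*) FROM @my_schema.person;", my_schema = "my_cdm_data")$sql
# [1] "SELECT COUNT(*) FROM my_cdm_data.person;"

We now apply the same mechanism to our cohort definition file: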
library(SqlRender)
sql <- readSql("HospitalizationCohorts.sql")
sql <- renderSql(sql,
                 cdmDatabaseSchema = cdmDatabaseSchema,
                 cohortsDatabaseSchema = cohortsDatabaseSchema,
                 post_time = 30,
                 pre_time = 365)$sql
sql <- translateSql(sql, targetDialect = connectionDetails$dbms)$sql

connection <- connect(connectionDetails)
executeSql(connection, sql)

In this code, we first read the SQL from the file into memory. In the next line, we replace four parameter names with the actual values. We then translate the SQL into the dialect appropriate for the DBMS we already specified in the connectionDetails. Next, we connect to the server, and submit the rendered and translated SQL.

If all went well, we now have a table with the events of interest. We can see how many events there are per type:

sql <- paste("SELECT cohort_concept_id, COUNT(*) AS count",
             "FROM @cohortsDatabaseSchema.rehospitalization",
             "GROUP BY cohort_concept_id")
sql <- renderSql(sql, cohortsDatabaseSchema = cohortsDatabaseSchema)$sql
sql <- translateSql(sql, targetDialect = connectionDetails$dbms)$sql
querySql(connection, sql)

  cohort_concept_id  count

3.3 Extracting the data from the server

Now we can tell PatientLevelPrediction to extract all necessary data for our analysis:

covariateSettings <- createCovariateSettings(useCovariateDemographics = TRUE,
    useCovariateConditionOccurrence = TRUE,
    useCovariateConditionOccurrence365d = TRUE,
    useCovariateConditionOccurrence30d = TRUE,
    useCovariateConditionOccurrenceInpt180d = TRUE,
    useCovariateConditionEra = TRUE,
    useCovariateConditionEraEver = TRUE,
    useCovariateConditionEraOverlap = TRUE,
    useCovariateConditionGroup = TRUE,
    useCovariateDrugExposure = TRUE,
    useCovariateDrugExposure365d = TRUE,
    useCovariateDrugExposure30d = TRUE,
    useCovariateDrugEra = TRUE,
    useCovariateDrugEra365d = TRUE,
    useCovariateDrugEra30d = TRUE,
    useCovariateDrugEraOverlap = TRUE,
    useCovariateDrugEraEver = TRUE,
    useCovariateDrugGroup = TRUE,
    useCovariateProcedureOccurrence = TRUE,
    useCovariateProcedureOccurrence365d = TRUE,
    useCovariateProcedureOccurrence30d = TRUE,
    useCovariateProcedureGroup = TRUE,
    useCovariateObservation = TRUE,
    useCovariateObservation365d = TRUE,
    useCovariateObservation30d = TRUE,
    useCovariateObservationCount365d = TRUE,
    useCovariateMeasurement = TRUE,
    useCovariateMeasurement365d = TRUE,
    useCovariateMeasurement30d = TRUE,
    useCovariateMeasurementCount365d = TRUE,
    useCovariateMeasurementBelow = TRUE,
    useCovariateMeasurementAbove = TRUE,
    useCovariateConceptCounts = TRUE,
    useCovariateRiskScores = TRUE,
    useCovariateRiskScoresCharlson = TRUE,
    useCovariateRiskScoresDCSI = TRUE,
    useCovariateRiskScoresCHADS2 = TRUE,
    useCovariateRiskScoresCHADS2VASc = TRUE,
    useCovariateInteractionYear = FALSE,
    useCovariateInteractionMonth = FALSE,
    excludedCovariateConceptIds = c(),
    deleteCovariatesSmallCount = 100)
plpData <- getDbPlpData(connectionDetails = connectionDetails,
                        cdmDatabaseSchema = cdmDatabaseSchema,
                        oracleTempSchema = oracleTempSchema,
                        cohortDatabaseSchema = cohortsDatabaseSchema,
                        cohortTable = "rehospitalization",
                        cohortIds = 1,
                        washoutWindow = 183,
                        useCohortEndDate = TRUE,
                        windowPersistence = 0,
                        covariateSettings = covariateSettings,
                        outcomeDatabaseSchema = cohortsDatabaseSchema,
                        outcomeTable = "rehospitalization",
                        outcomeIds = 2,
                        firstOutcomeOnly = FALSE,
                        cdmVersion = cdmVersion)

There are many parameters, but they are all documented in the PatientLevelPrediction manual. In short, we are using getDbPlpData to get all data on the cohort of interest, outcomes, and covariates from the database. The resulting plpData object uses the package ff to store information in a way that ensures R does not run out of memory, even when the data are large.

We can get some overall statistics using the generic summary() method:

summary(plpData)

PlpData object summary

Cohort ID: 1
Outcome concept ID(s): 2
Using cohort end date: TRUE
Window persistence: 0
Persons:
Windows:

Outcome counts:
  Event count  Window count

Covariates:
Number of covariates:
Number of non-zero covariate values:

3.4 Saving the data to file

Creating the plpData object can take considerable computing time, and it is probably a good idea to save it for future sessions. Because plpData uses ff, we cannot use R's regular save function. Instead, we'll have to use the savePlpData() function:

savePlpData(plpData, "rehosp_plp_data")

We can use the loadPlpData() function to load the data in a future session.

4 Fitting the model

4.1 Train-test split

Typically, we not only want to build our model, we also want to know how good it is. Because evaluation using the same data on which the model was fitted can lead to overestimation, one uses a train-test split of the data or cross-validation. We can use the splitData() function to split the data in a 75%-25% split:

parts <- splitData(plpData, c(0.75, 0.25))

The parts variable is a list of plpData objects. We can now fit the model on the first part of the data (the training set):

4.2 Fitting the model on the training data

model <- fitPredictiveModel(parts[[1]], modelType = "logistic")

The fitPredictiveModel() function uses the Cyclops package to fit a large-scale regularized regression. To fit the model, Cyclops needs to know the hyperparameter value which specifies the variance of the prior. By default, Cyclops will use cross-validation to estimate the optimal hyperparameter. However, be aware that this can take a really long time. You can use the prior and control parameters of fitPredictiveModel() to specify Cyclops' behaviour, including using multiple CPUs to speed up the cross-validation.
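For example, the prior and control could be specified explicitly like this (a sketch; the arguments shown are assumptions based on Cyclops' createPrior() and createControl() functions, and the values are arbitrary):

library(Cyclops)
# Laplace prior with its variance chosen through 10-fold cross-validation,
# run on 4 CPU threads:
prior <- createPrior("laplace", useCrossValidation = TRUE)
control <- createControl(cvType = "auto", fold = 10, threads = 4)
model <- fitPredictiveModel(parts[[1]],
                            modelType = "logistic",
                            prior = prior,
                            control = control)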
4.3 Model evaluation

We can evaluate how well the model is able to predict the outcome. For example, on the test data we can compute the area under the ROC curve:

prediction <- predictProbabilities(model, parts[[2]])
computeAuc(prediction, parts[[2]])

And we can plot the ROC curve itself:

plotRoc(prediction, parts[[2]])

[ROC plot: sensitivity versus specificity]

We can also look at the calibration:

plotCalibration(prediction, parts[[2]], numberOfStrata = 10)

[Calibration plot: observed fraction versus predicted probability]
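As a sanity check for overfitting, the same evaluation functions can be applied to the training set; a training AUC that is much higher than the test AUC suggests the model will not generalize well. A minimal sketch, reusing the API shown above:

# Compare performance on the training set (parts[[1]]) with the test set:
predictionTrain <- predictProbabilities(model, parts[[1]])
computeAuc(predictionTrain, parts[[1]])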
4.4 Inspecting the model

Now that we have some idea of the operating characteristics of our predictive model, we can investigate the model itself:

modelDetails <- getModelDetails(model, plpData)
head(modelDetails)

  coefficient  id  covariateName
  [table rows not legible in the source; they included the intercept and covariates such as "Condition group: Livebirth ... in 365d on or prior to cohort index" and "Cardiac arrest ... in 365d on or prior to cohort index"]

This shows the strongest predictors in the model with their corresponding betas.
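If we want to inspect a specific predictor, the returned data frame can be filtered by name (a sketch; the covariateName column is taken from the output above, and the search term is arbitrary):

# Find all coefficients for covariates mentioning "Cardiac arrest":
modelDetails[grep("Cardiac arrest", modelDetails$covariateName), ]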
5 Acknowledgments

Considerable work has been dedicated to provide the PatientLevelPrediction package.

citation("PatientLevelPrediction")

To cite package 'PatientLevelPrediction' in publications use:

  Martijn J. Schuemie, Marc A. Suchard and Patrick B. Ryan (2015).
  PatientLevelPrediction: Package for patient level prediction using data
  in the OMOP Common Data Model. R package version 1.0.0.

A BibTeX entry for LaTeX users is:

  @Manual{,
    title = {PatientLevelPrediction: Package for patient level prediction using data in the OMOP Common Data Model},
    author = {Martijn J. Schuemie and Marc A. Suchard and Patrick B. Ryan},
    year = {2015},
    note = {R package version 1.0.0},
  }

ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.

Further, PatientLevelPrediction makes extensive use of the Cyclops package.

citation("Cyclops")

To cite Cyclops in publications use:

  Suchard MA, Simpson SE, Zorych I, Ryan P and Madigan D (2013). "Massive
  parallelization of serial inference algorithms for complex generalized
  linear models." _ACM Transactions on Modeling and Computer Simulation_,
  *23*, pp. 10.

A BibTeX entry for LaTeX users is:

  @Article{,
    author = {M. A. Suchard and S. E. Simpson and I. Zorych and P. Ryan and D. Madigan},
    title = {Massive parallelization of serial inference algorithms for complex generalized linear models},
    journal = {ACM Transactions on Modeling and Computer Simulation},
    volume = {23},
    pages = {10},
    year = {2013},
  }

This work is supported in part through the National Science Foundation grant IIS.
