Integration of biospecimen data with clinical data mining
1 Astrid Genet 24 Oct, 2014
2 The origins of Big Data in biomedicine: As in many other fields, recently emerged state-of-the-art biomedical technologies generate huge and heterogeneous amounts of digital health care information of all types, accumulated from patients. These large quantities of data are referred to as "Big Data".
3 What makes Big Data special? They cannot be managed and processed by conventional methods, for the following reasons: Volume: Big Data implies enormous volumes of data; Variety: data typically come from multiple sources (images, free text, measurements from monitoring devices, audio records, etc.) and can therefore have various formats; Velocity: the speed at which the data are generated is massive and sometimes continuous ("real-time data"); Variability: the possible biases, noise, abnormalities or time inconsistencies in the data (volume and velocity make variability even harder to handle).
4 Some of the recent technologies that came with an associated large collection of experimental data: Microarrays: gene expression data are being generated by the gigabyte all over the world; Next-generation sequencing (NGS) has exponentially increased the rate of biological data generation in the last 4 years: a project considered small, with 10 to 20 whole-genome sequencing samples, can generate about 4 TB of raw data; Mass spectrometry also generates massive amounts of complex proteomic data: the ProteomicsDB database 1, a mass-spectrometry-based draft of the human proteome, represents terabytes of data; Medical imaging: in just 20 years, MRI has revolutionised medical imaging by producing diagnostic images of photographic quality, and one year of imaging is over 15 TB (with a very low acquisition to analysis ratio); Patient data management systems: comprehensive software recording measurements from ICUs, where static and temporal data are stored from one of the most data-intensive environments in medicine (admission data, monitoring devices, laboratory analyses, annotations from the medical staff, etc.). 1 Mathias Wilhelm, Judith Schlegl, Hannes Hahne, Amin Moghaddas Gholami, Marcus Lieberenz, Mikhail M Savitski, Emanuel Ziegler, Lars Butzmann, Siegfried Gessulat, Harald Marx, et al. Mass-spectrometry-based draft of the human proteome. Nature, 509(7502): , 2014
5 What do we do with it all? Modern technologies make it possible to generate huge quantities of complex, high-quality data at a reasonable price. But do they really make it possible to get more for less in terms of disease classifiers, analyses of shape and improved diagnostic accuracy? Emerging challenges have to be faced: storage and organisation of the volume of information (requires hardware, maintenance, physical space); concerns over privacy and security of patient data; bioinformatics and biostatistics processing tools must be adapted to the size and complexity of the data.
6 Storage, maintenance and organisation: The question is quite well settled for microarray gene expression data. Measurement data are stored in (public or subscription-based) repositories called microarray databases, which also maintain a searchable index and make the data available for analysis and interpretation. Standards have also been created for reporting microarray experiments in a reliable form: the MIAME (Minimum Information About a Microarray Experiment) standard; the MAQC (MicroArray Quality Control) project.
7 Storage, maintenance and organisation: Storage remains a major challenge for NGS, medical imaging and mass spectrometry data, which represent larger amounts of data (by the TB). There is not yet an established standard for storing and exchanging them. Centralized storage should allow: everything to be in one place; everything to be in one format; analysis tools to read and interact directly with the data. The concerns with time, expense and security that arise from those requirements, given the size of the data, are still an issue.
8 Challenges in biostatistics and bioinformatics: The ultimate goal of clever storage and organisation of biological data is to turn them into usable information for mining and real knowledge. Challenges faced by traditional biostatistics and bioinformatics: exploration and cleaning of large and incomplete datasets (variable transformations, relationships among variables, verification and quality control) are time-consuming and difficult or impossible to complete fully, with a risk of overlooked relationships, errors or omissions; traditional statistical models, software programs and visualisation tools do not scale to large-scale data; insufficient computer processing power causes extreme time delays when running complex models; interpretation of analytical results and their clinical applications: analysts might need effective clinical support to guide them.
9 Emerging solutions: computational facilities for analysing Big Data. New tools are continually emerging, and solutions are appearing that relate to computing performance, computing environment and analysis algorithms. High-performance computing solutions include: highly optimised multicore CPU workstations; Graphics Processing Units (GPUs), which significantly speed up the processing of mining algorithms on workstations; parallel processing on multiple processor cores, also an option to reduce computation time; cloud-based computing, which moves computation to resources delivered over the Internet ("renting" computational power by the hour saves the acquisition of expensive resources).
10 Emerging solutions: environments for statistical computing. Specific extensions of computing environments make it possible to handle large datasets. For example, R (64-bit) offers the following facilities: high-performance CPU and GPU parallel computing (doMC, gputools); file-based access to datasets that are too large to be loaded into R's internal memory (RAM) (ff, bigmemory); easy transfer of R objects to efficient C or C++ functions via dynamically linked libraries (.C(), Rcpp); flexible and fast visualisation methods to explore and analyse large multivariate datasets (bigvis). Equivalent possibilities also exist in similar environments like Perl, Python and Matlab.
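The file-based access idea is not tied to R. As a minimal sketch, assuming a CSV export and using only the Python standard library, the code below streams a file in fixed-size chunks and keeps running statistics, so the full dataset never has to sit in RAM. The function name `streaming_mean_var`, the column name `creatinine` and the chunk size are illustrative assumptions; this is not how ff or bigmemory work internally.

```python
import csv
import io
from itertools import islice


def streaming_mean_var(rows, column, chunk_size=1000):
    """Mean and sample variance of one numeric column, read chunk by chunk.

    `rows` is any iterator of dicts (for example csv.DictReader over a large
    file). Only `chunk_size` rows are held in memory at a time. Empty cells
    are skipped. Uses Welford's online update for numerical stability.
    """
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        for row in chunk:
            cell = row[column].strip()
            if cell == "":
                continue  # missing measurement
            x = float(cell)
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# Tiny in-memory stand-in (hypothetical data) for a file too large to load.
demo = io.StringIO("patient_id,creatinine\n1,0.9\n2,1.1\n3,\n4,1.3\n")
n, mean, var = streaming_mean_var(csv.DictReader(demo), "creatinine", chunk_size=2)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle; the memory footprint is then bounded by `chunk_size` rather than by the file size.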
11 Data analysis of Big Data: Regarding the processing of data (if supported by adequate computational resources), flexible models from the machine learning field adapt better to large datasets than statistical models with highly structured forms (linear regression, logistic regression, discriminant analysis) because they enable inference in non-standard situations: non-i.i.d. data; semi-supervised learning; learning with structured data; etc. Examples of machine learning algorithms suitable for mining large and complex datasets: neural networks, classification and regression trees (decision trees), naive Bayes, k-nearest neighbor, support vector machines, etc.
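To make one of the listed algorithms concrete, here is a minimal brute-force k-nearest-neighbor sketch in plain Python. The data points and the labels "control" and "case" are made up for illustration, and the function name `knn_predict` is an assumption. The classifier assumes no particular functional form for the decision boundary, which is the kind of flexibility the slide contrasts with highly structured models.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated groups.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["control", "control", "control", "case", "case", "case"]

pred_a = knn_predict(train_x, train_y, (1.1, 1.0))
pred_b = knn_predict(train_x, train_y, (4.0, 4.0))
```

This brute-force version computes a distance to every training point for each query, so its cost grows with the size of the training set; on large data an index structure or an approximate search would likely be needed.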
12 Dealing with the complexity of biomedical data: still an open issue. Pre-processing the data often requires the biggest effort in a data-mining study. Most issues concern either: the structure of the data: missing meta-information (field meanings, keys, units), class label imbalance (controls/cases), repeated measurements, etc.; or the quality of the data: typos, multiple formats, changes in scale, gaps in time series, missing values, duplicated measurements, etc. Some tools exist that help deal with those situations (multiple imputation, resampling techniques, etc.), but the volume and velocity of the data hamper the solutions available for smaller datasets and make them highly resource-consuming. So far there is no consensus regarding the right way (and order) to deal with the complexity of medical datasets. Care must be taken: inappropriate pre-processing can destroy or mutilate information and lead to misleading or biased results.
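Some of the quality issues listed above can at least be counted before any correction is attempted. The sketch below is an assumption-laden illustration, not a recommended pipeline: the record layout `(patient_id, minute, value)`, the 60-minute gap threshold and the function name `quality_report` are all hypothetical. It only reports duplicates, missing values and time-series gaps, and changes nothing in the data.

```python
from collections import defaultdict


def quality_report(records, max_gap_minutes=60):
    """Count exact duplicates and missing values, and list gaps per patient.

    `records` is an iterable of (patient_id, minute, value) tuples;
    value is None when the measurement is missing.
    """
    seen = set()
    duplicates = 0
    missing = 0
    times = defaultdict(list)
    for rec in records:
        if rec in seen:
            duplicates += 1
            continue
        seen.add(rec)
        patient_id, minute, value = rec
        if value is None:
            missing += 1
        else:
            times[patient_id].append(minute)
    gaps = []
    for patient_id, minutes in times.items():
        minutes.sort()
        for earlier, later in zip(minutes, minutes[1:]):
            if later - earlier > max_gap_minutes:
                gaps.append((patient_id, earlier, later))
    return {"duplicates": duplicates, "missing": missing, "gaps": gaps}


# Hypothetical temperature readings for two patients.
records = [
    ("p1", 0, 36.8),
    ("p1", 30, 37.1),
    ("p1", 30, 37.1),   # duplicated measurement
    ("p1", 60, None),   # missing value
    ("p1", 180, 38.0),  # 150 minutes after the last observed value
    ("p2", 0, 36.5),
    ("p2", 45, 36.6),
]
report = quality_report(records)
```

Keeping the report separate from any repair step makes it easier to review what would be altered before information is removed or filled in.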
13 Patient-specific modelling: Large healthcare datasets also open new avenues for the development of personalized diagnostics and therapeutics: if data are mined from billions of persons, each patient can be surrounded by a "virtual cloud" of cases matching his or her own health status. Patient-specific modelling (PSM) is an emerging field in biostatistics. Modelling techniques are rather specific to the medical field (bones, heart and circulation, brain, diagnostics, surgical planning, etc.) 1; the common goal, however, is to develop computational models that are influenced by the particular history, symptoms, laboratory results, etc. of the patient and that perform better than population-wide learners 2. 1 Amit Gefen. Patient-Specific Modeling in Tomorrow's Medicine, volume 9. Springer. 2 Shyam Visweswaran, Derek C. Angus, Margaret Hsieh, Lisa Weissfeld, Donald Yealy, and Gregory F. Cooper. Learning patient-specific predictive models from clinical data. Journal of Biomedical Informatics, 43(5): , 2010
14 Project PATIENTS. Goal: help understand and diagnose early postoperative liver and kidney failure, a leading cause of death in surgical ICUs whose risk factors, causes and prognosis are not fully understood. Efforts are devoted to establishing a reliable methodology that uses state-of-the-art mining procedures to develop accurate and robust clinical prediction models, meant to complement medical reasoning in decision-making tasks.
15 PATIENTS database: Learning algorithms are developed and tested on clinical data from the PDMS (patient data management system) of the COPRA System company, installed at the intensive care unit of the University Hospital of Rostock (Germany). The data consist of: almost cases; admitted for major surgery between 2008 and 2011; up to parameters measured at different time intervals; parameters include demographic, clinical and laboratory information.
16 Clinical translation of data mining results: In line with the increasing demand for translational medicine, biomedical research results must be conveyed to health-care providers in a manner that is fast and easy to understand and apply to patient care. We want to make this happen by developing a further module for integration into the COPRA PDMS system, thereby turning it into a clinical decision support system (CDSS) that can: carry out predefined decision rules based on data in the PDMS; help predict potential events; assist the nurse or physician in their diagnosis and in the choice of an appropriate course of treatment; increase ICU patient safety and compliance.
17 Multi-disciplinary collaborations: Reaching this ultimate goal requires expertise in many disciplines and multi-disciplinary collaborations: medical research and clinical care gather data from clinical studies and offer guidance from biological reasoning; data analysts bring mathematical and statistical expertise, with methods for analysis and interpretation of results; computer scientists and developers make the results available and usable to healthcare providers.
18 Experimental methodology: development of clinical prediction models. Benchmarking classification models: flexible models from machine learning (tree, SVM, BN); statistical models (LDA, LR), which are easier to interpret. Special care will be taken to: avoid over-fitting, by producing a combined model averaged over a large number of single classifiers (reduced variance); get accurate performance estimates, by using resampling methods to inject variation into the system and better approximate performance on future samples. Adapted pre-processing of the data: the class imbalance problem is addressed at the data level using resampling (SMOTE algorithm); missing values in the data are filled in with plausible values using multiple imputation (10-20 repetitions); all pre-processing steps are included within the resampling loop to ensure fair performance estimates.
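The last point, keeping pre-processing inside the resampling loop, can be illustrated with a small sketch. It is deliberately simplified and rests on several assumptions that are not in the slides: single mean imputation stands in for multiple imputation, a nearest-centroid classifier stands in for the benchmarked models, SMOTE is omitted, and the data are synthetic. What it does show is that the imputation parameters are re-estimated from the training fold only and then applied unchanged to the held-out fold.

```python
import math
import random


def fit_imputer(rows):
    """Column means computed from the observed (non-None) training values."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace each None with the corresponding training-fold mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def fit_centroids(rows, labels):
    """Nearest-centroid classifier (stand-in model): one mean vector per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {lab: [sum(c) / len(c) for c in zip(*grp)] for lab, grp in grouped.items()}


def predict(centroids, row):
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))


def cross_validate(rows, labels, n_folds=5, seed=0):
    """k-fold accuracy with the imputer re-fitted inside every fold."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        test_idx = set(held_out)
        train_idx = [i for i in order if i not in test_idx]
        # Pre-processing is learned from the training fold only ...
        means = fit_imputer([rows[i] for i in train_idx])
        train_x = impute([rows[i] for i in train_idx], means)
        # ... and then applied unchanged to the held-out fold.
        test_x = impute([rows[i] for i in held_out], means)
        model = fit_centroids(train_x, [labels[i] for i in train_idx])
        hits = sum(predict(model, x) == labels[i] for x, i in zip(test_x, held_out))
        scores.append(hits / len(held_out))
    return scores


# Synthetic two-class data with some missing first-feature values.
rng = random.Random(1)
rows, labels = [], []
for i in range(40):
    label = i % 2
    row = [rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0)]
    if i % 7 == 0:
        row[0] = None  # simulate a missing measurement
    rows.append(row)
    labels.append(label)
scores = cross_validate(rows, labels)
```

Fitting the imputer once on the full dataset before splitting would let information from the held-out cases leak into training, which tends to make the performance estimate optimistic.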
19 Experimental methodology: development of patient-specific algorithms. A well-suited methodology: selection of a subset of the single learners most relevant for the patient at hand; patient prediction averaged over the optimised subset. This is a way to make use of the population-wide methodology and limit the computational burden while improving the accuracy of the outcome. Computational solutions: multicore CPU workstation and extensions to the R software environment for applications on large data.
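One possible reading of the selection-then-averaging step is sketched below. The slides do not say how "most relevant" is measured, so the criterion used here (Euclidean distance between the patient's features and the centroid of each learner's training data) is purely an assumption, as are the function name `patient_specific_prediction`, the learner dictionaries and all numbers.

```python
import math


def patient_specific_prediction(learners, patient, n_selected=2):
    """Average the predictions of the learners judged most relevant to `patient`.

    ASSUMPTION: relevance is the Euclidean distance between the patient's
    features and the centroid of the data each learner was trained on.
    The selection criterion is not specified in the slides.
    """
    ranked = sorted(learners, key=lambda lr: math.dist(lr["centroid"], patient))
    selected = ranked[:n_selected]
    risk = sum(lr["predict"](patient) for lr in selected) / len(selected)
    return risk, [lr["name"] for lr in selected]


# Hypothetical single learners, e.g. members of an averaged ensemble.
# Each constant predictor stands in for a fitted model's risk estimate.
learners = [
    {"name": "A", "centroid": [60.0, 1.0], "predict": lambda p: 0.20},
    {"name": "B", "centroid": [62.0, 1.2], "predict": lambda p: 0.30},
    {"name": "C", "centroid": [30.0, 0.7], "predict": lambda p: 0.90},
]
risk, used = patient_specific_prediction(learners, [60.5, 1.0])
```

Because the single learners come from the population-wide ensemble, nothing has to be retrained per patient; only the ranking and the average are computed at prediction time. In practice the features would need to be put on comparable scales before distances are meaningful.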
20 Thank you for your attention
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationCLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationData Science, Predictive Analytics & Big Data Analytics Solutions. Service Presentation
Data Science, Predictive Analytics & Big Data Analytics Solutions Service Presentation Did You Know That According to the new research from GE and Accenture*: 87% of companies believe Big Data analytics
More informationDr Alexander Henzing
Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander
More informationHow To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn
More informationA Case Study on the Use of Unstructured Data in Healthcare Analytics. Analysis of Images for Diabetic Retinopathy
A Case Study on the Use of Unstructured Data in Healthcare Analytics Analysis of Images for Diabetic Retinopathy A Case Study on the Use of Unstructured Data in Healthcare Analytics: Analysis of Images
More informationUsing the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova
Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel
More informationHigh Performance Compu2ng Facility
High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,
More informationIEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationHealthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
More informationCourse Requirements for the Ph.D., M.S. and Certificate Programs
Course Requirements for the Ph.D., M.S. and Certificate Programs PhD Program The PhD program in the Health Informatics subtrack inherits all course requirements of the Informatics PhD program, that is,
More informationLearning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
More informationHPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr
More informationKNIME Enterprise server usage and global deployment at NIBR
KNIME Enterprise server usage and global deployment at NIBR Gregory Landrum, Ph.D. NIBR Informatics Novartis Institutes for BioMedical Research, Basel 8 th KNIME Users Group Meeting Berlin, 26 February
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationEHR CURATION FOR MEDICAL MINING
EHR CURATION FOR MEDICAL MINING Ernestina Menasalvas Medical Mining Tutorial@KDD 2015 Sydney, AUSTRALIA 2 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Agenda Motivation the potential
More informationAlgorithmic Scoring Models
Applied Mathematical Sciences, Vol. 7, 2013, no. 12, 571-586 Algorithmic Scoring Models Kalamkas Nurlybayeva Mechanical-Mathematical Faculty Al-Farabi Kazakh National University Almaty, Kazakhstan Kalamkas.nurlybayeva@gmail.com
More informationData Mining and Pattern Recognition for Large-Scale Scientific Data
Data Mining and Pattern Recognition for Large-Scale Scientific Data Chandrika Kamath Center for Applied Scientific Computing Lawrence Livermore National Laboratory October 15, 1998 We need an effective
More informationHow To Get A Computer Science Degree
MAJOR: DEGREE: COMPUTER SCIENCE MASTER OF SCIENCE (M.S.) CONCENTRATIONS: HIGH-PERFORMANCE COMPUTING & BIOINFORMATICS CYBER-SECURITY & NETWORKING The Department of Computer Science offers a Master of Science
More information