APPLICATION OF POPULATION-BASED TECHNOLOGY IN SELECTION OF GLYCAN MARKERS FOR CANCER DETECTION

APPLICATION OF POPULATION-BASED TECHNOLOGY IN SELECTION OF GLYCAN MARKERS FOR CANCER DETECTION

A Thesis Presented to the Faculty of San Diego State University

In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science

by Haofei Fang

Summer 2012


Copyright 2012 by Haofei Fang. All Rights Reserved.

DEDICATION

This thesis is dedicated to my dear fiancée, who supported me when I was in perplexity during the thesis preparation. She is one of the great powers pushing me forward. It is also dedicated to my parents, who taught me how to face difficulties. Their support encourages me to find my way to success.

ABSTRACT OF THE THESIS

Application of Population-Based Technology in Selection of Glycan Markers for Cancer Detection
by Haofei Fang
Master of Science in Computer Science
San Diego State University, 2012

Recent advances in computer technology and in molecular biology have greatly influenced and promoted the field of bioinformatics. Among these advances are new high-throughput platforms for biomarker discovery and new algorithms for feature selection and classification. This thesis is dedicated to a class of feature selection and classification algorithms that are based on a new paradigm of artificial intelligence and pattern recognition known as swarm intelligence. The particular algorithm considered is Ant Colony Optimization (ACO), which is applied to a recently emerged biomarker platform based on printed glycan arrays (PGA). The thesis proposes an implementation of ACO that is specially tuned for the diagnosis of cancer using PGA data. The implementation is evaluated on real clinical data obtained from the School of Medicine of NYU, which contain 65 control samples of high-risk subjects exposed to asbestos and 5 subjects diagnosed with malignant mesothelioma. The results are compared to those obtained on artificially generated data whose general characteristics are similar to the original real data.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

CHAPTER

1 INTRODUCTION

2 MESOTHELIOMA STUDY AND PRINTED GLYCAN ARRAY
  Mesothelioma Study, Demographics and Goals
  Printed Glycan Array
  General Information
  Structure of Data for MATLAB
  Data Preprocessing
  Normalization
  Quantile Normalization
  Intra-Slide Normalization
  Inter-Slide Normalization
  Transformation
  Transformation Necessity
  Implementation and Result

3 FEATURE SELECTION AND CLASSIFICATION
  Univariate Feature Selection
  Multivariate Feature Selection
  Forward Sequential Feature Selection (FWD)
  Recursive Feature Elimination (RFE)
  Genetic Algorithm (GA)
  Ant Colony Optimization (ACO)
  Classification and Regression Trees (C&RT/C4.5)
  Random Forest Trees (RF)
  Classifiers
  Multiple Logistic Regression (MLR)
  Generalized Linear Model (GLM)
  Linear Discriminant Analysis (LDA)
  Support Vector Machines (SVM)
  Naive Bayes/Mahalanobis Distance
  K-Nearest Neighbor (KNN)
  Classifier Performance Measures
  Accuracy
  Area Under the ROC Curve (AUC)
  Cross Validation
  Leave-One-Out Cross Validation (LOOCV)
  K-fold Cross Validation
  Hold-Out Cross Validation

4 ANT COLONY OPTIMIZATION ALGORITHM
  Theory and the Algorithm
  Implementation
  Optimization Objective
  M-Files
  Step 1: Initialization
  Step 2: Population
  Step 3: Evaluation
  Step 4: Deposition
  Step 5: Preparation for New Iteration
  Optional Step: Randomization
  Empirical Tuning of ACO Parameters
  Number of Ants
  Stopping Criteria
  Application of ACO to Artificial Data
  Generation of Artificial Data
  Contaminated Artificial Data
  4.4.3 Results
  Application of ACO to Mesothelioma Study
  Experiment Design
  Results

5 COMPARISON OF ACO WITH OTHER APPROACHES IN CLASSIFICATION
  Genetic AUC Optimizer (GAUC)
  Experiment Design
  Efficiency Test
  Stability Test
  Cross-Validation Performance on Raw and Contaminated Mesothelioma Data
  Results
  Efficiency Test
  Stability Test
  Cross-Validation Performance on Raw and Contaminated Mesothelioma Data

6 CONCLUSION

REFERENCES

LIST OF TABLES

Table 3.1. WMW Rank Calculation Demonstration
Table 3.2. Result of Applying WMW to Mesothelioma Data Set
Table 4.1. Parameters to be Initialized for ACO
Table 4.2. Summary of Number of Ants Tuning (m = 4)
Table 4.3. Summary of Number of Ants Tuning Verification (m = 6)
Table 4.4. OCI-GID Reference Table
Table 4.5. Contamination Parameters for Artificial Data
Table 4.6. Selected Values of Parameters for Contamination
Table 4.7. The Performance of WMW on Artificial Data without/with Contamination
Table 4.8. Performance of WMW and ACO on Artificial Data
Table 4.9. Performance of WMW and ACO Applied on Artificial Data - Bootstrap
Table 4.10. Result of Comparing ACO Repeats and WMW on Mesothelioma Data Set
Table 4.11. Result of Comparing ACO Repeats and WMW on Subsampled Mesothelioma Data Set
Table 5.1. Execution Time for Each Combination
Table 5.2. Average AUC Values for Each Combination
Table 5.3. Results of Stability Experiment
Table 5.4. Cross Validation on Mesothelioma Data - Best AUC Values
Table 5.5. Cross Validation on Mesothelioma Data - Features at Best AUC Values
Table 5.6. Cross Validation on Mesothelioma Data - Best Stability
Table 5.7. Cross Validation on Mesothelioma Data - Features at Best Stability
Table 5.8. Compare ACO and WMW on Raw Mesothelioma Data and Normalized Data
Table 5.9. Compare ACO and WMW on Contaminated Mesothelioma Datasets - Repeated Training
Table 5.10. Compare ACO and WMW on Contaminated Mesothelioma Datasets - Bootstrap

LIST OF FIGURES

Figure 2.1. The data structure of mesothelioma PGAs data for MATLAB.
Figure 2.2. Diagrammatic explanation of quantile normalization of training and test data
Figure 2.3. Raw data and transformed data with different lambda using Box-Cox transformation
Figure 3.1. Best features distribution plot for mesothelioma data set
Figure 3.2. Plotting GA Fitness (Best and Average Values)
Figure 3.3. Maximum-margin hyperplane and margins for an SVM trained with samples from two classes
Figure 3.4. ROC curve space
Figure 4.1. The flow chart of ACO for feature selection
Figure 4.2. Flow chart of moving ants for ACO
Figure 4.3. Flow chart for solution evaluation and pheromone table updating
Figure 4.4. Plot of ACO performance, using 1 ants to select 4 features in 1 iterations
Figure 4.5. Plot of ACO performance, using 25 ants to select 4 features in 1 iterations
Figure 4.6. Plot of ACO performance, using 5 ants to select 4 features in 1 iterations
Figure 4.7. Plot of ACO performance, using 1 ants to select 4 features in 1 iterations
Figure 4.8. Plot of ACO performance, using 2 ants to select 4 features in 1 iterations
Figure 4.9. Plot for ACO stopping criteria analysis demonstrating the trend of the ACO performance with iteration increasing
Figure 4.10. Distribution of the best AUC values in the 1 repeats of ACO function without/with stopping criteria
Figure 4.11. Plots of patients distributions for the best features of artificial data without noise contamination
Figure 4.12. Plots of patients distributions for the best features of artificial data with tiny contamination level
Figure 4.13. Plots of patients distributions for the best features of artificial data with medium contamination level
Figure 4.14. Plots of patients distribution for the best features of artificial data with high contamination level
Figure 4.15. Histogram of selected features obtained by repeated ACO applied to artificial data without contamination
Figure 4.16. Histogram of AUC values obtained by repeated ACO applied to artificial data without contamination
Figure 4.17. Histogram of selected features obtained by repeated ACO applied to artificial data with tiny contamination
Figure 4.18. Histogram of AUC values obtained by repeated ACO applied to artificial data with tiny contamination
Figure 4.19. Histogram of selected features obtained by repeated ACO applied to artificial data with medium contamination
Figure 4.20. Histogram of AUC values obtained by repeated ACO applied to artificial data with medium contamination
Figure 4.21. Histogram of selected features obtained by repeated ACO applied to artificial data with heavy contamination
Figure 4.22. Histogram of AUC values obtained by repeated ACO applied to artificial data with heavy contamination
Figure 4.23. Repeated ACO applied to re-sampled artificial data without contamination
Figure 4.24. Repeated ACO applied to re-sampled artificial data with tiny contamination
Figure 4.25. Repeated ACO applied to re-sampled artificial data with medium contamination
Figure 4.26. Repeated ACO applied to re-sampled artificial data with heavy contamination
Figure 4.27. Repeated WMW applied to re-sampled original artificial data
Figure 4.28. Repeated WMW applied to re-sampled artificial data with tiny contamination
Figure 4.29. Repeated WMW applied to re-sampled artificial data with medium contamination
Figure 4.30. Repeated WMW applied to re-sampled artificial data with heavy contamination
Figure 4.31. Histogram of repeated ACO on original mesothelioma data
Figure 4.32. Histogram of AUC values obtained by repeated ACO applied to subsampled mesothelioma data
Figure 4.33. Histogram of selected features obtained by repeated ACO applied to subsampled mesothelioma data
Figure 4.34. Histogram of AUC values obtained by repeated WMW applied to subsampled mesothelioma data
Figure 4.35. Histogram of selected features obtained by repeated WMW applied to subsampled mesothelioma data
Figure 5.1. The fitness progress of GAUC on mesothelioma data
Figure 5.2. Histogram for selected features in stability experiment, ACO-GLM
Figure 5.3. Histogram for selected features in stability experiment, ACO-SVM
Figure 5.4. Histogram for selected features in stability experiment, ACO-GA
Figure 5.5. Histogram for selected features in stability experiment, ACO-FLD
Figure 5.6. Histogram for selected features in stability experiment, WMW-GLM
Figure 5.7. Histogram for selected features in stability experiment, WMW-SVM
Figure 5.8. Histogram for selected features in stability experiment, WMW-GA
Figure 5.9. Histogram for selected features in stability experiment, WMW-FLD
Figure 5.10. Histogram for selected features in stability experiment, GA-GLM
Figure 5.11. Histogram for selected features in stability experiment, GA-SVM
Figure 5.12. Histogram for selected features in stability experiment, GA-GA
Figure 5.13. Histogram for selected features in stability experiment, GA-FLD
Figure 5.14. Histogram for selected features in stability experiment, FWD-GLM
Figure 5.15. Histogram for selected features in stability experiment, FWD-SVM
Figure 5.16. Histogram for selected features in stability experiment, FWD-GA
Figure 5.17. Histogram for selected features in stability experiment, FWD-FLD
Figure 5.18. Cross validation results on contaminated mesothelioma dataset

ACKNOWLEDGEMENTS

Dr. Marko Vuskovic has been the ideal thesis supervisor. His sage advice, patient encouragement and cogent criticisms aided the writing of this thesis. I would also like to thank Dr. Joseph Lewis, whose suggestions for this study were greatly needed.

CHAPTER 1

INTRODUCTION

With growing computing power and the deployment of advanced algorithms, biomarker discovery is becoming an important topic in bioinformatics and computational biology, including applications such as gene and SNP selection from high-dimensional data. The stability of such selection processes with respect to sampling variation, that is, their robustness, has recently received attention. Robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validation.

Besides feature selection, classification also plays an important role in biomarker discovery. It is usually used to evaluate performance based on the result of feature selection. A number of methods can be involved in this process, including logistic regression, the Fisher linear discriminant, support vector machines and many others. Recently, Ant Colony Optimization and the Genetic Algorithm have also been introduced to implement the classification.

The investigators at the Glycomic Laboratory of the NYU School of Medicine [1] are evaluating a novel means of detecting mesothelioma and lung cancer early through what could ultimately be a simple blood test. They have developed a unique cancer diagnostic approach that utilizes a printed glycan array (PGA). This new high-throughput platform contains 286 carbohydrate molecules (glycans) that are often expressed on the surfaces of human cells, including abnormal sugars produced by lung cancer cells in response to changes induced by the cancer process. Researchers can measure antibodies against these abnormal glycans in the blood of people with mesothelioma or lung adenocarcinoma, or of those at risk for these diseases. This test could also be a tool for identifying new therapeutic targets.

The scientists are developing this array as a global way of looking at molecules that may serve as very early markers indicating that something is wrong inside lung or mesothelium cells. This information could be used to determine whether someone is at risk for mesothelioma or lung cancer, or whether someone who already has the disease is likely to do poorly and may need more aggressive therapy.

One of the basic problems in bioinformatics is that biomarker platforms generally deal with a large number of features, ranging from hundreds to thousands. Most of these features are non-informative and ineffective in discriminating between control and case patients. Feature selection is therefore used to select the most relevant glycans, or to remove the noisy ones.

Several feature selection algorithms are available today. They can be divided into two groups: univariate and multivariate. Univariate methods treat the candidate features individually. The performance of each feature in discrimination is evaluated separately, all features are then ranked by their performance, and the top features are used to train the classifier. In multivariate methods, features are treated as a group of dependent variables. Many algorithms for multivariate feature selection have been developed, such as Recursive Feature Accumulation (RFA), Recursive Feature Elimination (RFE) and sequential forward/backward feature selection. These algorithms were developed as a compromise, because global optimization becomes infeasible when the number of features is large. There are, however, heuristic algorithms for feature selection that come close to true global optimization, such as the Genetic Algorithm and Ant Colony Optimization. The latter is the focus of this thesis.

Ant Colony Optimization was initially proposed by Marco Dorigo in 1992 in his Ph.D. thesis [2]. It is a probabilistic technique for solving computational problems that can be reduced to finding a good path through a graph. As the ants move over a graph derived from the data model, they communicate with each other and pass on information about the goal. In this research, ACO is used to find an optimal subset of the candidate features. The goal of this thesis is to apply the ideas of ACO to the diagnosis of cancer based on data obtained from PGA.
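The univariate approach described above can be illustrated with a short sketch. It is not the thesis implementation (which, judging from the data chapter, was written in MATLAB); it is a minimal Python illustration that scores each feature by its AUC, computed from pairwise case/control comparisons, and ranks the features by that score. The function names `auc` and `rank_features` and the toy matrix are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher; ties count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def rank_features(X, labels):
    """Score every column of X on its own and return
    (column index, separation) pairs, best first."""
    ranked = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        a = auc(column, labels)
        # A feature that is lower in cases separates just as well,
        # so use the distance from 0.5 in either direction.
        ranked.append((j, max(a, 1.0 - a)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy data (invented): 4 patients x 3 features, labels 0 = control, 1 = case.
X = [
    [1.0, 5.0, 2.0],
    [2.0, 1.0, 2.5],
    [3.0, 4.0, 9.0],
    [4.0, 2.0, 8.0],
]
y = [0, 0, 1, 1]
top = rank_features(X, y)
```

Because each column is scored in isolation, this kind of ranking cannot see features that discriminate only in combination, which is the gap that multivariate methods such as ACO are meant to address.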
The study includes the implementation of ACO-based algorithms, analysis of performance and tuning of algorithmic parameters, and a demonstration of the developed software on the diagnosis of mesothelioma and lung cancer. The implementation of ACO computes an important classification performance measure, the area under the Receiver Operating Characteristic curve (AUC), directly, as opposed to computing the AUC after feature selection and projection. By comparing the results of applying ACO and other feature selection (F/S) methods to PGA data, this study provides a better view of this new approach to cancer detection. Although ACO does not achieve the best performance among the methods compared, it performs well with noisy data, where other algorithms fail.

The material in this thesis is organized as follows. Chapter 1 introduces the general concepts of the technologies used in the research and the organization of the thesis. Chapter 2 gives details about the PGA data for the mesothelioma study. Chapter 3 discusses other feature selection and classification techniques, including univariate and multivariate feature selection algorithms, classification models and methods for classifier evaluation. Chapter 4 discusses the implementation of ACO, including parameter tuning and the evaluation of ACO on both artificial data and real mesothelioma data. Chapter 5 describes the experiments designed to evaluate the performance of different feature selection methods combined with different classification algorithms. Chapter 6 presents the conclusions drawn from the experiments and discusses possible future work enabled by the research in this thesis.

CHAPTER 2

MESOTHELIOMA STUDY AND PRINTED GLYCAN ARRAY

2.1 MESOTHELIOMA STUDY, DEMOGRAPHICS AND GOALS

Mesothelioma, more precisely malignant mesothelioma (MM), is a rare form of cancer that develops in the mesothelium, the protective lining that covers many of the body's internal organs. It is usually caused by exposure to asbestos [3]. Its most common site is the pleura (the outer lining of the lungs and internal chest wall), but it may also occur in the peritoneum (the lining of the abdominal cavity), the heart, the pericardium (a sac that surrounds the heart) [4] or the tunica vaginalis.

Most people who develop mesothelioma have worked in jobs where they inhaled asbestos and glass particles, or have been exposed to asbestos dust and fiber in other ways. Unlike lung cancer, mesothelioma has no association with smoking, but smoking greatly increases the risk of other asbestos-related cancers [5]. Those who have been exposed to asbestos often use attorneys to collect damages for asbestos-related disease, including mesothelioma. Compensation via asbestos funds or lawsuits is an important issue in mesothelioma.

The symptoms of mesothelioma include shortness of breath due to pleural effusion (fluid between the lung and the chest wall) or chest wall pain, and general symptoms such as weight loss. The diagnosis may be suspected from a chest X-ray or CT scan, and is confirmed with a biopsy (tissue sample) and microscopic examination.

Diagnosing mesothelioma is often difficult, because the symptoms are similar to those of a number of other conditions. Diagnosis begins with a review of the patient's medical history. A history of exposure to asbestos may increase clinical suspicion of mesothelioma. A physical examination is performed, followed by chest X-rays and often a lung function test.

The life expectancy of mesothelioma patients is generally reported as less than one year following diagnosis. However, a patient's prognosis is affected by several factors, including how early the cancer is diagnosed and how aggressively it is treated.

If a problem is suspected, a physician may request several diagnostic tests. These typically include medical imaging techniques such as X-rays, CT scans, PET scans and MRI scans. A combination of these tests is often used to determine the location, size and type of cancer. Biopsy procedures are often requested following an imaging scan to test samples of fluid and tissue for the presence of cancerous cells.

In this research we demonstrate early detection and/or diagnosis of malignant mesothelioma based on Printed Glycan Arrays (PGAs). The mesothelioma study [6] includes 65 subjects exposed to asbestos but not diagnosed with MM, and 5 patients diagnosed with MM. The data were obtained from serum collected by Prof. Harvey Pass, MD, at the School of Medicine at NYU, and developed on PGAs at Cellexicon, Inc., La Jolla, CA. The data and related results were part of the NIH-NCI grant [7] and have been published in several publications, including [8] and [6, 9]. In the following sections we describe the PGAs and their functionality, and the various data preprocessing algorithms that are used before ACO-based feature selection and classification.

2.2 PRINTED GLYCAN ARRAY

In medicine, a biomarker can be a traceable substance that is introduced into an organism as a means to examine organ function or other aspects of health. It can also be a substance whose detection indicates a particular disease state. For example, the presence of an antibody may indicate an infection. More specifically, in this research a glycan biomarker indicates a change in the expression or state of the immune system that correlates with the risk or progression of mesothelioma, or with the susceptibility of the disease to a given treatment.

Biochemical biomarkers are often used in clinical trials, where they are derived from bodily fluids that are easily available to early-phase researchers.

General Information

In the last five years, a new biomarker-discovery platform based on glycan arrays has emerged [9], which has some advantages over nucleic acid-based and other platforms. Printed glycan arrays are similar to DNA microarrays, but contain deposits of various carbohydrate structures (glycans) instead of spotted DNA. Most of these glycans can be found on the surfaces of normal human cells, human cancer cells, and many human infectious agents such as bacteria, viruses and other pathogenic microorganisms.

Transformation of cells from healthy to pre-malignant and malignant is associated with the appearance of abnormal glycosylation on proteins and lipids presented on the surface of these cells. The malignancy-related abnormal glycans are called tumor-associated carbohydrate antigens (TACA). There is growing evidence that numerous TACAs are immunogenic, and that the human immune system can generate antibodies against them. Since multiple glycans arrayed on PGAs are either known TACAs or closely related structures, the antibodies in human sera that bind to glycans on PGAs can indicate the status of the immune system's response to human malignancies.

A printed glycan array (PGA) consists of a glass slide coated with a chemically reactive surface, to which various glycans are covalently attached using standard amino-coupling chemistry and contact printing technology. A PGA slide contains several sub-arrays, each an identical duplicate of the entire currently available glycan library, in the form of microscopic glycan deposits about 8 microns in size. For each slide, the data from each sub-array are treated as the raw data to which the processing and classification algorithms are applied.

The advantages of a potential PGA-based serum test [9] for early detection of cancer and cancer risk can be summarized as follows: (a) minimal invasiveness of serum sampling; (b) minimal sampling variability, in contrast to the well-known heterogeneity of solid tissue samples; (c) stability of antibodies; (d) low cost of the technology; (e) low labor intensity and short duration of the test; (f) broad scope of the test, i.e. the test does not have to be narrowly targeted to a particular disease, e.g. a cancer type. All these advantages make the PGA platform attractive for early detection of disease and for potential application in screening of the general population.

Generally, five steps are needed to obtain the PGA data: printing of the glycan arrays, development of the arrays with serum samples, scanning, quantification and data aggregation. After these steps, a data structure can be formed from the quantified PGA data. The details of the data structure are discussed in the next section.

Because individual glycans on PGAs have only moderate discriminatory power (see Chapter 4), an ant-based feature selection algorithm is needed to find an optimal combination of several biomarkers for classification, modeling and other purposes. Results comparing the discriminatory power of individual glycans with that of combinations of glycans are discussed in a later chapter.

Structure of Data for MATLAB

We work with the data as a structure consisting mainly of a two-dimensional matrix, two row vectors and two column vectors, plus other auxiliary data (see Figure 2.1). One of the row vectors is called the Original Column Index (OCI). It holds the original index of each feature in the matrix after data quantification and before any features are extracted from the matrix. These indices correspond to the order of the glycans in the PGA library. The second row vector contains the Glycan Identification (GID) assigned to each feature. Each GID is a unique three-digit number that denotes a specific glycan structure used in the array. One of the column vectors is the Patient Identification (PID). Each patient is assigned a unique ID for further exploration. The second column vector (y) contains binary labels, i.e. membership of the control or case class for each patient.

[Figure: OCI (1 x d), GID (1 x d), PID (n x 1), X (n x d), Y (n x 1)]
Figure 2.1. The data structure of mesothelioma PGAs data for MATLAB.
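The layout in Figure 2.1 can be sketched as a small container type. The thesis stores these fields in a MATLAB structure; the Python version below is only an illustration of the same shapes, and the class name `PGAData`, the method `select_features` and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PGAData:
    """Illustrative container mirroring the fields of Figure 2.1."""
    oci: List[int]        # 1 x d, original column index of each feature
    gid: List[int]        # 1 x d, glycan identification per feature
    pid: List[str]        # n x 1, patient identification per row
    X: List[List[float]]  # n x d, intensities (patients x glycans)
    y: List[int]          # n x 1, binary label: 0 = control, 1 = case

    def __post_init__(self):
        n, d = len(self.X), len(self.oci)
        if len(self.gid) != d or any(len(row) != d for row in self.X):
            raise ValueError("OCI, GID and the columns of X must all have length d")
        if len(self.pid) != n or len(self.y) != n:
            raise ValueError("PID and y must have one entry per row of X")

    def select_features(self, columns):
        """Keep only the given column positions; OCI still records
        where each kept feature sat in the original matrix."""
        return PGAData(
            oci=[self.oci[j] for j in columns],
            gid=[self.gid[j] for j in columns],
            pid=list(self.pid),
            X=[[row[j] for j in columns] for row in self.X],
            y=list(self.y),
        )


# Invented example: 2 patients, 3 glycans.
data = PGAData(
    oci=[1, 2, 3],
    gid=[101, 205, 317],
    pid=["P001", "P002"],
    X=[[120.0, 30.5, 8.0], [95.0, 41.0, 12.5]],
    y=[0, 1],
)
subset = data.select_features([0, 2])
```

Keeping OCI and GID alongside X is what allows a feature subset chosen by a selection algorithm to be traced back to specific glycans after columns have been dropped.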

Each element of the matrix X represents a fluorescence intensity in relative units (FRU) associated with the binding of anti-glycan antibodies from the serum of a patient (rows) to a glycan (columns). The most critical part of the above data structure is the two-dimensional matrix X. Since the samples for different patients could be collected by different physicians, with different equipment, at different locations and even in different years, the data might not be comparable. These differences could introduce biases and other unexpected effects, which would reduce the reliability of the glycans (features) and even cause incorrect classification. An additional step is therefore necessary: the preprocessing of raw data, explained in the following section.

2.3 DATA PREPROCESSING

In the real world, data are generally incomplete, noisy and inconsistent [1]. They may lack attribute values, lack certain attributes of interest, or contain errors and outliers. Sometimes the data contain discrepancies caused by differences in equipment and environment. Data preprocessing describes any type of processing performed on raw data to prepare it for another processing procedure [11]. Commonly used as a preliminary data mining practice, data preprocessing transforms the data into a format that can be processed more easily and effectively for the purpose of the user.

A number of different tools and methods are used for preprocessing. The most common are normalization and transformation. In this study, the ultimate goal is to extract the most significant information from the PGAs in order to classify patients correctly. From this point of view, every step before the classification, including PGA development, data normalization and transformation as well as feature selection, can be considered preprocessing.

Since the main objective of this study is to evaluate the performance of the ant-based algorithm in feature selection, we consider only normalization and transformation as preprocessing steps. In the following two sub-sections we discuss normalization, which organizes data for more efficient access, and transformation, which manipulates raw data to produce a single input.

Normalization

In one usage in statistics, normalization is the process of isolating statistical error in repeatedly measured data. In another usage, normalization refers to the division of multiple sets of data by a common variable in order to negate that variable's effect on the data, thus allowing the underlying characteristics of the data sets to be compared. This allows data on different scales to be compared by bringing them to a common scale. For example, in this study a PGA image is developed with a patient's serum and glycans on a glass slide, which is scanned by a laser scanner for quantification. The images for different patients could vary because of differences in equipment, location or even climate. To handle this problem, normalization is necessary to ensure that the data for different patients are comparable, so that the following steps, feature selection and classification, achieve reliable results.

In this particular study we use three different normalization methods: Quantile Normalization, Intra-slide Normalization and Inter-slide Normalization. These three methods are applied to the raw data, and the resulting processed data are used in the subsequent feature selection procedure.

QUANTILE NORMALIZATION

The goal of quantile normalization is to ensure that the distribution of intensities across all variables (glycans) is the same for each patient [12]. The method is motivated by the idea that a quantile-quantile plot shows that two data vectors have the same distribution if the plot is a straight diagonal line, and different distributions otherwise. This concept is extended to n dimensions: if all n data vectors have the same distribution, then plotting the quantiles in n dimensions gives a straight line along the diagonal. This suggests that a set of data can be given the same distribution by projecting the points of the n-dimensional quantile plot onto the diagonal.

The critical part of this normalization is finding the reference distribution. Generally, a reference distribution is one of the standard statistical distributions, such as the Gaussian or the Poisson distribution. The reference distribution can be generated randomly or by taking regular samples from the cumulative distribution function of the distribution. However, any reference distribution can be used.
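The projection onto the diagonal described above can be sketched in a few lines. The following Python example is a minimal illustration rather than the thesis code. It assumes, as one common choice, that when no reference distribution is supplied the reference is the mean of the sorted patient rows, and it breaks ties by column position. The function name `quantile_normalize` and the sample values are hypothetical.

```python
def quantile_normalize(X, reference=None):
    """Give every row (patient) of X the same distribution.

    X         : list of rows, each with d intensities.
    reference : optional list of d values defining the target
                distribution; if omitted, the mean of the sorted rows
                is used (an assumption, not prescribed by the text).
    """
    n, d = len(X), len(X[0])
    if reference is None:
        sorted_rows = [sorted(row) for row in X]
        reference = [sum(r[k] for r in sorted_rows) / n for k in range(d)]
    else:
        reference = sorted(reference)
    if len(reference) != d:
        raise ValueError("reference must have one value per column")

    normalized = []
    for row in X:
        # Column positions ordered from the smallest to the largest value.
        order = sorted(range(d), key=lambda j: row[j])
        new_row = [0.0] * d
        for rank, j in enumerate(order):
            # The k-th smallest value in the row becomes the k-th reference quantile.
            new_row[j] = reference[rank]
        normalized.append(new_row)
    return normalized


# Invented example: 2 patients x 3 glycans.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
norm = quantile_normalize(raw)
```

After normalization, every row contains exactly the reference values, arranged according to that row's original ranking. Passing an explicit `reference` would allow a distribution derived from one data set to be reused on another, for example on held-out test samples.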


More information

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision

More information

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1 BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects

More information

Beating the NCAA Football Point Spread

Beating the NCAA Football Point Spread Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over

More information

Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

More information

Decision Trees What Are They?

Decision Trees What Are They? Decision Trees What Are They? Introduction...1 Using Decision Trees with Other Modeling Approaches...5 Why Are Decision Trees So Useful?...8 Level of Measurement... 11 Introduction Decision trees are a

More information

Understanding Pleural Mesothelioma

Understanding Pleural Mesothelioma Understanding Pleural Mesothelioma UHN Information for patients and families Read this booklet to learn about: What is pleural mesothelioma? What causes it? What are the symptoms? What tests are done to

More information

The Best of Both Worlds:

The Best of Both Worlds: The Best of Both Worlds: A Hybrid Approach to Calculating Value at Risk Jacob Boudoukh 1, Matthew Richardson and Robert F. Whitelaw Stern School of Business, NYU The hybrid approach combines the two most

More information

Asbestos Related Diseases

Asbestos Related Diseases Asbestos Related Diseases Asbestosis Mesothelioma Lung Cancer Pleural Disease Asbestosis and Mesothelioma (LUNG CANCER) Support Group 1800 017 758 www.amsg.com.au ii Helping you and your family through

More information

Asbestos and your lungs

Asbestos and your lungs This information describes what asbestos is and the lung conditions that are caused by exposure to it. It also includes information about what to do if you have been exposed to asbestos, and the benefits

More information

NATIONAL GENETICS REFERENCE LABORATORY (Manchester)

NATIONAL GENETICS REFERENCE LABORATORY (Manchester) NATIONAL GENETICS REFERENCE LABORATORY (Manchester) MLPA analysis spreadsheets User Guide (updated October 2006) INTRODUCTION These spreadsheets are designed to assist with MLPA analysis using the kits

More information

Employer Health Insurance Premium Prediction Elliott Lui

Employer Health Insurance Premium Prediction Elliott Lui Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than

More information

Transcript for Asbestos Information for the Community

Transcript for Asbestos Information for the Community Welcome to the lecture on asbestos and its health effects for the community. My name is Dr. Vik Kapil and I come to you from the Centers for Disease Control and Prevention, Agency for Toxic Substances

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

بسم هللا الرحمن الرحيم

بسم هللا الرحمن الرحيم بسم هللا الرحمن الرحيم Updates in Mesothelioma By Samieh Amer, MD Professor of Cardiothoracic Surgery Faculty of Medicine, Cairo University History Wagner and his colleagues (1960) 33 cases of mesothelioma

More information

Asbestos & Mesothelioma Cases. Presented by Sara Salger On behalf of Gori, Julian & Associates, P.C., Edwardsville, IL

Asbestos & Mesothelioma Cases. Presented by Sara Salger On behalf of Gori, Julian & Associates, P.C., Edwardsville, IL Asbestos & Mesothelioma Cases Presented by Sara Salger On behalf of Gori, Julian & Associates, P.C., Edwardsville, IL What you know about Asbestos & Mesothelioma Insert Clip Here Definition of Asbestos

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Applied Multivariate Analysis - Big data analytics

Applied Multivariate Analysis - Big data analytics Applied Multivariate Analysis - Big data analytics Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of

More information

What is Mesothelioma?

What is Mesothelioma? What is Mesothelioma? Mesothelioma is a rare type of cancer that develops in the mesothelial cells found in one s body. These cells form membranous linings that surround and protect the body s organs and

More information

OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis

OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis Thang V. Pham and Connie R. Jimenez OncoProteomics Laboratory, Cancer Center Amsterdam, VU University Medical Center De Boelelaan 1117,

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

If you are signing for a minor child, you refers to your child throughout the consent document.

If you are signing for a minor child, you refers to your child throughout the consent document. CONSENT TO PARTICIPATE IN A CLINICAL RESEARCH STUDY Adult Patient or Parent, for Minor Patient INSTITUTE: National Cancer Institute PRINCIPAL INVESTIGATOR: Raffit Hassan, M.D. STUDY TITLE: Tissue Procurement

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Nine Common Types of Data Mining Techniques Used in Predictive Analytics 1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better

More information

Asbestos Related Diseases. Asbestosis Mesothelioma Lung Cancer Pleural Disease. connecting raising awareness supporting advocating

Asbestos Related Diseases. Asbestosis Mesothelioma Lung Cancer Pleural Disease. connecting raising awareness supporting advocating Asbestos Related Diseases Asbestosis Mesothelioma Lung Cancer Pleural Disease connecting raising awareness supporting advocating 1800 017 758 www.asbestosassociation.com.au Asbestos lagging was widely

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

More information

An ACO Approach to Solve a Variant of TSP

An ACO Approach to Solve a Variant of TSP An ACO Approach to Solve a Variant of TSP Bharat V. Chawda, Nitesh M. Sureja Abstract This study is an investigation on the application of Ant Colony Optimization to a variant of TSP. This paper presents

More information

testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello

testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello Extracting Knowledge from Biomedical Data through Logic Learning Machines and Rulex Marco Muselli Institute of Electronics, Computer and Telecommunication Engineering National Research Council of Italy,

More information

Malignant Mesothelioma

Malignant Mesothelioma Malignant mesothelioma is a tumour originating from mesothelial cells. 85 95% of mesotheliomas are caused by asbestos exposure. It occurs much more commonly in the chest (malignant pleural mesothelioma)

More information

The Optimality of Naive Bayes

The Optimality of Naive Bayes The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

More information

Non-Inferiority Tests for One Mean

Non-Inferiority Tests for One Mean Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Computer-Aided Multivariate Analysis

Computer-Aided Multivariate Analysis Computer-Aided Multivariate Analysis FOURTH EDITION Abdelmonem Af if i Virginia A. Clark and Susanne May CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London New York Washington, D.C Contents Preface

More information

Ant Colony Optimization and Constraint Programming

Ant Colony Optimization and Constraint Programming Ant Colony Optimization and Constraint Programming Christine Solnon Series Editor Narendra Jussien WILEY Table of Contents Foreword Acknowledgements xi xiii Chapter 1. Introduction 1 1.1. Overview of the

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Functional Data Analysis of MALDI TOF Protein Spectra

Functional Data Analysis of MALDI TOF Protein Spectra Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF

More information

IBM SPSS Missing Values 22

IBM SPSS Missing Values 22 IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

James Rhio O Conner Memorial Scholarship Essay

James Rhio O Conner Memorial Scholarship Essay Farris 1 James Rhio O Conner Memorial Scholarship Essay Cancer is a growing medical phenomenon that is continuing to increase and take the lives of many people. There are several beliefs and opinions as

More information

Fingerprinting the Datacenter: Automated Classification of Performance Crises

Fingerprinting the Datacenter: Automated Classification of Performance Crises Fingerprinting the Datacenter: Automated Classification of Performance Crises Peter Bodík University of California, Berkeley Armando Fox University of California, Berkeley Moises Goldszmidt Microsoft Research

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

Hamers S O L I C I T O R S. Jim Wyatt jwyatt@hamers.com. Freephone: 0800 591 999. 5 Earls Court, Priory Park, East, Hull HU4 7DY

Hamers S O L I C I T O R S. Jim Wyatt jwyatt@hamers.com. Freephone: 0800 591 999. 5 Earls Court, Priory Park, East, Hull HU4 7DY Hamers S O L I C I T O R S Jim Wyatt jwyatt@hamers.com Freephone: 0800 591 999 5 Earls Court, Priory Park, East, Hull HU4 7DY Tel: 01482 326666 Fax: 01482 324432 www.hamers.com Hamers Solicitors LP is

More information

SOLiD System accuracy with the Exact Call Chemistry module

SOLiD System accuracy with the Exact Call Chemistry module WHITE PPER 55 Series SOLiD System SOLiD System accuracy with the Exact all hemistry module ONTENTS Principles of Exact all hemistry Introduction Encoding of base sequences with Exact all hemistry Demonstration

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring Delivering up to 2500 MRM Transitions per LC Run Christie Hunter 1, Brigitte Simons 2 1 AB SCIEX,

More information

HEALTH EFFECTS. Inhalation

HEALTH EFFECTS. Inhalation Health Effects HEALTH EFFECTS Asbestos can kill you. You must take extra precautions when you work with asbestos. Just because you do not notice any problems while you are working with asbestos, it still

More information

Beating the NFL Football Point Spread

Beating the NFL Football Point Spread Beating the NFL Football Point Spread Kevin Gimpel kgimpel@cs.cmu.edu 1 Introduction Sports betting features a unique market structure that, while rather different from financial markets, still boasts

More information

Getting insights about life cycle cost drivers: an approach based on big data inspired statistical modelling

Getting insights about life cycle cost drivers: an approach based on big data inspired statistical modelling Introduction A Big Data applied to LCC Conclusion, Getting insights about life cycle cost drivers: an approach based on big data inspired statistical modelling Instituto Superior Técnico, Universidade

More information

Hong Kong Stock Index Forecasting

Hong Kong Stock Index Forecasting Hong Kong Stock Index Forecasting Tong Fu Shuo Chen Chuanqi Wei tfu1@stanford.edu cslcb@stanford.edu chuanqi@stanford.edu Abstract Prediction of the movement of stock market is a long-time attractive topic

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR. Results Interpretation Guide

PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR. Results Interpretation Guide PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR Results Interpretation Guide Pathogen Detection Systems by Real Time PCR Microbial offers real time PCR based systems for the detection of pathogenic bacteria

More information

Dynamic Predictive Modeling in Claims Management - Is it a Game Changer?

Dynamic Predictive Modeling in Claims Management - Is it a Game Changer? Dynamic Predictive Modeling in Claims Management - Is it a Game Changer? Anil Joshi Alan Josefsek Bob Mattison Anil Joshi is the President and CEO of AnalyticsPlus, Inc. (www.analyticsplus.com)- a Chicago

More information

Multivariate Analysis. Overview

Multivariate Analysis. Overview Multivariate Analysis Overview Introduction Multivariate thinking Body of thought processes that illuminate the interrelatedness between and within sets of variables. The essence of multivariate thinking

More information

CLOUD DATABASE ROUTE SCHEDULING USING COMBANATION OF PARTICLE SWARM OPTIMIZATION AND GENETIC ALGORITHM

CLOUD DATABASE ROUTE SCHEDULING USING COMBANATION OF PARTICLE SWARM OPTIMIZATION AND GENETIC ALGORITHM CLOUD DATABASE ROUTE SCHEDULING USING COMBANATION OF PARTICLE SWARM OPTIMIZATION AND GENETIC ALGORITHM *Shabnam Ghasemi 1 and Mohammad Kalantari 2 1 Deparment of Computer Engineering, Islamic Azad University,

More information

SACOC: A spectral-based ACO clustering algorithm

SACOC: A spectral-based ACO clustering algorithm SACOC: A spectral-based ACO clustering algorithm Héctor D. Menéndez, Fernando E. B. Otero, and David Camacho Abstract The application of ACO-based algorithms in data mining is growing over the last few

More information

Adaptation of the ACO heuristic for sequencing learning activities

Adaptation of the ACO heuristic for sequencing learning activities Adaptation of the ACO heuristic for sequencing learning activities Sergio Gutiérrez 1, Grégory Valigiani 2, Pierre Collet 2 and Carlos Delgado Kloos 1 1 University Carlos III of Madrid (Spain) 2 Université

More information

Occupational respiratory diseases due to Asbestos. Dirk Dahmann, IGF, Bochum

Occupational respiratory diseases due to Asbestos. Dirk Dahmann, IGF, Bochum Occupational respiratory diseases due to Asbestos Dirk Dahmann, IGF, Bochum Contents Introduction Diseases Further Effects Preventive Strategies Conclusion Asbestos minerals Woitowitz, 2003 Imports (+

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks

Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks This version: December 12, 2013 Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks Lawrence Takeuchi * Yu-Ying (Albert) Lee ltakeuch@stanford.edu yy.albert.lee@gmail.com Abstract We

More information

Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects

Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects Journal of Computer Science 2 (2): 118-123, 2006 ISSN 1549-3636 2006 Science Publications Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects Alaa F. Sheta Computers

More information

Targeting Specific Cell Signaling Pathways for the Treatment of Malignant Peritoneal Mesothelioma

Targeting Specific Cell Signaling Pathways for the Treatment of Malignant Peritoneal Mesothelioma The Use of Kinase Inhibitors: Translational Lab Results Targeting Specific Cell Signaling Pathways for the Treatment of Malignant Peritoneal Mesothelioma Sheelu Varghese, Ph.D. H. Richard Alexander, M.D.

More information

IBM SPSS Direct Marketing 20

IBM SPSS Direct Marketing 20 IBM SPSS Direct Marketing 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This edition applies to IBM SPSS Statistics 20 and to

More information

Predictive Analytics Certificate Program

Predictive Analytics Certificate Program Information Technologies Programs Predictive Analytics Certificate Program Accelerate Your Career Offered in partnership with: University of California, Irvine Extension s professional certificate and

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

. 1/ CHAPTER- 4 SIMULATION RESULTS & DISCUSSION CHAPTER 4 SIMULATION RESULTS & DISCUSSION 4.1: ANT COLONY OPTIMIZATION BASED ON ESTIMATION OF DISTRIBUTION ACS possesses

More information

A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine November 3, 2008

A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine November 3, 2008 A Cross-Sectional Study of Asbestos- Related Morbidity and Mortality in Vermonters Residing Near an Asbestos Mine 108 Cherry Street, PO Box 70 Burlington, VT 05402 802.863.7200 healthvermont.gov A Cross-Sectional

More information

INTEGER PROGRAMMING. Integer Programming. Prototype example. BIP model. BIP models

INTEGER PROGRAMMING. Integer Programming. Prototype example. BIP model. BIP models Integer Programming INTEGER PROGRAMMING In many problems the decision variables must have integer values. Example: assign people, machines, and vehicles to activities in integer quantities. If this is

More information

Ant Colony Optimization (ACO)

Ant Colony Optimization (ACO) Ant Colony Optimization (ACO) Exploits foraging behavior of ants Path optimization Problems mapping onto foraging are ACO-like TSP, ATSP QAP Travelling Salesman Problem (TSP) Why? Hard, shortest path problem

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

Treatment Guide Lung Cancer Management

Treatment Guide Lung Cancer Management Treatment Guide Lung Cancer Management The Chest Cancer Center at Cleveland Clinic, which includes specialists from the Respiratory Institute, Taussig Cancer Institute and Miller Family Heart & Vascular

More information

Gene Selection for Cancer Classification using Support Vector Machines

Gene Selection for Cancer Classification using Support Vector Machines Gene Selection for Cancer Classification using Support Vector Machines Isabelle Guyon+, Jason Weston+, Stephen Barnhill, M.D.+ and Vladimir Vapnik* +Barnhill Bioinformatics, Savannah, Georgia, USA * AT&T

More information

Mesothelioma Understanding your diagnosis

Mesothelioma Understanding your diagnosis Mesothelioma Understanding your diagnosis Mesothelioma Understanding your diagnosis When you first hear that you have cancer, you may feel alone and afraid. You may be overwhelmed by the large amount of

More information

Imputing Values to Missing Data

Imputing Values to Missing Data Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data

More information

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti Data deluge (and its applications) Prologue Data is becoming cheaper and cheaper to produce and store Driving mechanism is parallelism on sensors, storage, computing Data directly produced are complex

More information

Classification Techniques for Remote Sensing

Classification Techniques for Remote Sensing Classification Techniques for Remote Sensing Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara saksoy@cs.bilkent.edu.tr http://www.cs.bilkent.edu.tr/ saksoy/courses/cs551

More information

Asbestos Brochure. Jim Wyatt - jwyatt@hamers.com Stephen Ball - sball@hamers.com. Freephone: 0800 591 999. www.hamers.com

Asbestos Brochure. Jim Wyatt - jwyatt@hamers.com Stephen Ball - sball@hamers.com. Freephone: 0800 591 999. www.hamers.com Jim Wyatt - jwyatt@hamers.com Stephen Ball - sball@hamers.com Freephone: 0800 591 999 5 Earls Court, Priory Park East, Hull, HU4 7DY Tel: 01482 326666 Fax: 01482 324432 Aspect Court, 47 Park Square East,

More information

Asbestos Disease: An Overview for Clinicians Asbestos Exposure

Asbestos Disease: An Overview for Clinicians Asbestos Exposure Asbestos Asbestos Disease: An Overview for Clinicians Asbestos Exposure Asbestos: A health hazard Exposure to asbestos was a major occupational health hazard in the United States. The first large-scale

More information

An analysis method for a quantitative outcome and two categorical explanatory variables.

An analysis method for a quantitative outcome and two categorical explanatory variables. Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

Contents. Abstract...i. Committee Membership... iii. Foreword... vii. 1 Scope...1

Contents. Abstract...i. Committee Membership... iii. Foreword... vii. 1 Scope...1 ISBN 1-56238-584-4 Volume 25 Number 27 ISSN 0273-3099 Interference Testing in Clinical Chemistry; Approved Guideline Second Edition Robert J. McEnroe, PhD Mary F. Burritt, PhD Donald M. Powers, PhD Douglas

More information

Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance

Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer

More information

Credit Risk Models. August 24 26, 2010

Credit Risk Models. August 24 26, 2010 Credit Risk Models August 24 26, 2010 AGENDA 1 st Case Study : Credit Rating Model Borrowers and Factoring (Accounts Receivable Financing) pages 3 10 2 nd Case Study : Credit Scoring Model Automobile Leasing

More information

Identification of noisy variables for nonmetric and symbolic data in cluster analysis

Identification of noisy variables for nonmetric and symbolic data in cluster analysis Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

More information

An ant colony optimization for single-machine weighted tardiness scheduling with sequence-dependent setups

An ant colony optimization for single-machine weighted tardiness scheduling with sequence-dependent setups Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006 19 An ant colony optimization for single-machine weighted tardiness

More information