Computer-assisted diagnosis of breast cancer using a data-driven Bayesian belief network

International Journal of Medical Informatics 54 (1999)

Xiao-Hui Wang, Bin Zheng, Walter F. Good *, Jill L. King, Yuan-Hsiang Chang

Imaging Research Division, Department of Radiology, University of Pittsburgh, A439 Scaife Hall, Pittsburgh, PA, USA

Accepted 4 December 1998

* Corresponding author.

Abstract

This study investigates a simple Bayesian belief network for the diagnosis of breast cancer, and specifically addresses the question of whether integrating image and non-image based features into a single network can yield better performance than hybrid combinations of independent networks. From a dataset of 419 cases, including 92 malignancies, 13 features relating to mammographic findings, physical examinations and patients' clinical histories were extracted to build three Bayesian belief networks. The scenarios tested included a network incorporating all features and two hybrids which combined the outputs of sub-networks corresponding to the image or non-image features. Average areas (A_z) under the corresponding ROC curves were used as measures of performance. The network incorporating only image based features performed better (A_z = 0.81) than that using non-image features (A_z = 0.71). Both hybrid classifiers yielded better performance (A_z = 0.85 for averaging and A_z = 0.87 for logistic regression), but neither hybrid was as accurate as the network incorporating all features (A_z = 0.89). This preliminary study suggests that, like human observers who concurrently consider different types of information, a single classifier that simultaneously evaluates both image and non-image information can achieve better diagnostic performance than the hybrid combinations considered here.

© 1999 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Bayesian belief network; Breast cancer; Computer-assisted diagnosis; Classifier; Cross-validation; Machine learning

1. Introduction

Mammography is currently the most effective diagnostic tool for the early detection of breast cancer. However, because of the complexity of tissue patterns represented in mammograms, combined with the low prevalence of cancer in screening environments, the early diagnosis of breast cancer can be a difficult task. Clinical studies have shown that radiologists may initially miss 10-30% of breast cancers that are visible in mammograms [1], and that fewer than 30% of the patients who undergo biopsy are found to have breast cancer [2]. One effective approach for improving diagnostic accuracy is the independent double-reading of mammograms [3]; however, this approach is both inefficient and costly. As an alternative, computerized decision aids which can be used to assist physicians in the diagnosis of breast cancer have become a topic of extensive research during the past decade [4,5], and many of these approaches have demonstrated potential value for improving diagnosis.

A number of decision systems have been developed which use mammographic features described by radiologists and other relevant clinical information for predicting the risk or probability of having breast cancer. An early attempt at providing a decision aid to mammographers applied discriminant analysis to identify lists of perceptual features, which were weighted according to their relative importance and evaluated by a computer-based classifier [6-8]. More recently, a similar method was adopted to combine evidence from diaphanography and mammography [9]. Cook et al. have developed a rule based expert system which incorporates features from mammograms, as well as from clinical and patient history data [10]. Various efforts have also been undertaken to apply neural network technology to breast cancer diagnosis. These include the development of a system to classify features extracted from mammograms by radiologists [11] and the development of a classification scheme which incorporates patient age as well as mammographic features [12-14].
Much of the current interest in the development of decision systems centers around techniques employing Bayesian networks, and these networks have been applied to a number of problems similar to those for which neural networks have traditionally been used. These networks, which are often called belief networks, Bayesian belief networks, or probabilistic causal networks, are represented as directed acyclic graphs, where the nodes correspond to variables and the links between nodes relate to the independence assumptions which hold between the nodes. For each node there is a probability function which specifies, for each value of the variable and for each combination of values of its parent variables, the conditional probability of the node given its parents (i.e. P(Node | Parent_i)) [15-19]. Once input parameters to the network have been specified, the network generates a hypothesis about the values of the remaining parameters, which is optimal in the sense that no other hypothesis is more likely [15,16].

Bayesian networks are very attractive for medical diagnostic systems because they can be applied to make inferences in cases where the input data are incomplete. This is the situation in many clinical settings, where diagnostic decisions must be made on limited data but can be revised at a later time as more data become available. The most significant application of Bayesian networks to breast cancer detection reported to date is that of Kahn et al. [20,21], who used Bayesian networks as the basis of a system which combined 15 mammographic features, five patient-history features and two physical findings into a decision process for predicting the likelihood of breast cancer. The probabilities required by their network were derived largely from published statistics and the subjective estimates of expert mammographers. This system was evaluated on a set of 77 cases which included 25 malignancies and attained a performance, as measured by receiver operating characteristic (ROC) analysis, of A_z = [...], where A_z is the average area under the ROC curve. This level of performance was actually higher than that achieved by radiologists reading a largely overlapping set of cases in an independent study [11].

Compared to artificial neural networks, Bayesian belief networks have certain unique advantages, in addition to their ability to work with incomplete information, mentioned above. One such advantage is that they can provide explanations of their decisions [15,22-24]. Because they provide a flexible capability for specifying dependence and independence of variables in a natural way through the network topology, their structure tends to reflect the logical structure inherent in a decision task. In contrast, neural networks can be viewed, to a large extent, as a black box whose machine-learned internal decision structure is generally incomprehensible to human observers. The probability values for links between nodes in Bayesian networks reflect degrees of dependence between variables. This makes it possible for the structure of these networks to be examined by human experts to uncover relationships between the variables, hence enabling the assessment of the reasonableness of the decision process. Confirmation by an expert can provide a level of confidence for Bayesian networks that is not attainable in neural network implementations, and this is likely to be an important factor in their gaining acceptance in the field of medical diagnosis.
An additional advantage of the Bayesian network paradigm is that, instead of using an iterative optimization approach, as is the case when training artificial neural networks with procedures such as back-propagation, the weights on the links between nodes can be derived from subjective estimates of the probabilities, taken from statistical reports in the medical literature, or determined from datasets by using probabilistic learning methods [16,25]. These processes can accommodate knowledge about the prior probabilities of alternative hypotheses and the probability of observing various data given some hypothesis.

Despite these very positive qualities, Bayesian networks have certain limitations as compared to neural networks. Current implementations of Bayesian networks require that nodes be assigned discrete values. This means that continuous variables must be quantized, though in practice, given the limited accuracy with which continuous variables are usually known, this does not necessarily reduce the precision of the input parameters significantly. Using a large number of possible states for a parameter increases its potential precision, but at the same time increases the size of the probability tables that must be derived and retained. A more fundamental difficulty in applying Bayesian networks relates to the computational complexity of evaluating these networks. Although the singly-connected structure of all the networks considered in this study permits them to be evaluated in a reasonable time [16,26], this is not true of Bayesian networks having a more general structure, for which probabilistic inference has been shown to be NP-hard [27]. The issue of computational complexity is currently an active topic of research, and improved approximation algorithms should become available in the future [27].
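The trade-off between precision and table size noted above can be made concrete with a small sketch. It is not taken from the study: the continuous variable, the cut points, and the helper names `quantize` and `cpt_size` are all hypothetical, chosen only to show how adding states to one node enlarges its conditional probability table.

```python
from bisect import bisect_right
from math import prod

def quantize(value, cut_points):
    """Map a continuous value to a discrete state index (cut points must be sorted)."""
    return bisect_right(cut_points, value)

def cpt_size(n_states, parent_states):
    """Entries in a node's conditional probability table:
    its own state count times the number of parent configurations."""
    return n_states * prod(parent_states)

# Hypothetical continuous measurement; the cut points are illustrative only
# and do not come from the paper.
coarse_cuts = [10.0]                   # 2 states
fine_cuts = [5.0, 10.0, 15.0, 20.0]    # 5 states

state_coarse = quantize(12.3, coarse_cuts)
state_fine = quantize(12.3, fine_cuts)

# Table size for a node with one two-state parent.
size_coarse = cpt_size(len(coarse_cuts) + 1, [2])
size_fine = cpt_size(len(fine_cuts) + 1, [2])

print(state_coarse, state_fine, size_coarse, size_fine)
```

With a single two-state parent the table grows linearly in the number of states; for a node with several multi-state parents the growth is multiplicative, which is why finer quantization needs more training cases.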
In current applications of Bayesian networks to the diagnosis of breast cancer, the networks are relatively small and the problem of computational complexity has not been prohibitive. Nevertheless, as these systems evolve they will likely become more complex and the computational issue will have to be addressed. In this case, there may be a significant computational advantage to partitioning networks into multiple sub-networks and forming a hybrid combination of the outputs of these individual sub-networks. The question arises as to how to decompose the network optimally and whether such a partitioning will have a significantly adverse impact on overall performance.

As part of our ongoing endeavor to apply intelligent system techniques to diagnostic tasks in radiology, we have begun an investigation of certain questions that have arisen in our effort to develop a Bayesian network decision mechanism for breast cancer detection. The two issues which we have attempted to address in this preliminary investigation relate to assessing the impact of a seemingly natural decomposition of a Bayesian network for breast cancer diagnosis, and to determining the feasibility of using machine learning to configure networks by directly analyzing large databases of cases. Specifically, in this study we investigated the application of simple Bayesian networks to the diagnosis of breast cancer, based on features derived from mammographic findings, physical examination findings, and other relevant data from patients' clinical histories. The networks used in this study were built automatically by applying machine learning methods to a set of training cases. With these networks, we investigated a possible decomposition of the decision task into subtasks related to the image and non-image components of the feature set, as well as the question of how best to use the combined data in a single decision aid. A description of the approach, along with the preliminary test results of a fivefold cross-validation on a set of 419 clinical cases, is reported below.

2. Materials and methods

The cases used in this study were selected, in order, from the film library of Magee-Womens Hospital's Breast Care Centers in Pittsburgh, PA, and correspond to mammographic examinations performed between 1987 and [...]. Cases were only used if complete follow-up documentation was available. Of the total of 419 cases selected, 92 are positive for malignancy. The verification of positive cases consisted of biopsy and/or surgical reports, while establishing a negative case required a negative follow-up for at least a 2-year period.

Our database contains both mammographic and non-mammographic features. The non-mammographic features, which are related to patient history and physical exam, were extracted from patient files. To obtain mammographic features we employed a specially designed computerized scoring form, into which experienced mammographers reported their findings as they read the films. A detailed description of the design and creation of this database has been reported elsewhere [28].

Because we consider this to be primarily a feasibility study, in anticipation of more elaborate investigations to optimize both the network topology and feature set, we employed a preliminary feature set and topology which were specified based on subjective evaluation and on general radiological experience. Although such a subjectively designed system is not optimal, it is sufficient to define a lower bound on the performance that can be expected from this type of system. When applying a data-driven machine learning algorithm to medical diagnosis, the size of the input feature set must be limited by the training sample size if robust performance is to be achieved [29]. At present, only a relatively small number of cases are available for this study. Thus, we limited this study to 13 features from the database, which were used to train and test the networks. The selected features, and their possible states in our networks, are summarized in Table 1. The features include four from the mammographic findings, four from the general physical examination, and five from other patient clinical history data.

Based on consideration of the dependence and independence of the selected features, we adopted the singly connected structure shown in Fig. 1 for the topology of our network. The network was built by using commercially available software, Hugin Demo [15]. By the definition of Bayesian belief networks [15], which includes the properties of acyclic connection and d-separation, there are no feedback loops between nodes. The absence of a link (or path) between two nodes indicates that, although the variables are not necessarily statistically independent, whatever dependence exists is assumed not to be important in the particular decision process being modeled.

To build the network, we first determined a series of prior and conditional probabilities according to the network's topology. A common practice in applying Bayesian networks is to represent these probabilities in a conditional probability table [15]. Basically, our network (as shown in Fig. 1) consists of three layers. In the first layer, there are five features derived from patients' clinical histories. Since each of these features has two possible states, yes and no, as shown in Table 1, the probability table for this layer contains ten values, only five of which are needed to completely determine this layer, because the two probabilities for each feature sum to one.
Table 1
Definition of features and their states in the Bayesian belief networks (a)

Category                Node description                                    State description
Diagnosis               Breast cancer                                       Yes, no
Clinical history        Habit of drinking alcoholic beverages and smoking   Yes, no
                        Taking female hormones                              Yes, no
                        Have gone through menopause                         Yes, no
                        Have ever been pregnant                             Yes, no
                        Family member has breast cancer                     Yes, no
Physical findings       Nipple discharge                                    Present, absent
                        Skin thickening                                     Present, absent
                        Breast pain                                         Present, absent
                        Have a lump(s)                                      Present, absent
Mammographic findings   Architectural distortion                            Present, absent
                        Mass                                                Score from one to three, score from four to five, absent
                        Microcalcification cluster                          Score from one to three, score from four to five, absent
                        Asymmetry                                           Present, absent

(a) Scores for both masses and microcalcification clusters are based on a scale of one to five, where one is definitely benign and five is very suspicious for malignancy.

Fig. 1. Topology of a Bayesian belief network to diagnose breast cancer. The first layer includes five features related to patients' clinical history, the second layer contains one diagnostic feature, breast cancer, and the third layer involves eight features from both mammographic findings and general physical examination findings.

Breast cancer is the only node in the second layer, and has five parent nodes (Y_i, i = 1, ..., 5) from the first layer. In this layer the conditional probabilities P(Cancer | Y_1, Y_2, Y_3, Y_4, Y_5) must be computed. Since each of the five parent nodes has two states in the configuration shown in Table 1, the probability table will contain 64 values, but because not all are independent, the following 32 different combinations of conditional probabilities are sufficient to specify this probability table:

P_1(Cancer = yes | Y_1 = yes, Y_2 = yes, Y_3 = yes, Y_4 = yes, Y_5 = yes),
P_2(Cancer = yes | Y_1 = yes, Y_2 = yes, Y_3 = yes, Y_4 = yes, Y_5 = no),
P_3(Cancer = yes | Y_1 = yes, Y_2 = yes, Y_3 = yes, Y_4 = no, Y_5 = yes),
...,
P_32(Cancer = yes | Y_1 = no, Y_2 = no, Y_3 = no, Y_4 = no, Y_5 = no).

The breast cancer node also has eight daughter nodes (X_i, i = 1, ..., 8) represented by the third layer, and in this preliminary experiment these nodes are assumed independent of each other. Since six nodes in this layer have two possible states and the remaining two have three possible states (see Table 1), and each is conditioned on the two states of the breast cancer node, the probability table for this layer will contain 36 values, 20 of which are independent and sufficient to completely specify the layer. Thus, in order to fully specify the Bayesian network as shown in Fig. 1, a table containing 110 probability values must be determined, but only 57 of these values are independent. In this study, all of the necessary probabilities were automatically computed from the cases selected for network training.
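The table sizes quoted above (10, 64 and 36 values, 110 in total, 57 of them independent) can be re-derived with a few lines of arithmetic. The sketch below is not part of the original study; it assumes only the state counts given in the text and Table 1, and the variable names are chosen here for illustration.

```python
from math import prod

# State counts per node, as described in the text (assumed layout of Fig. 1).
history = [2, 2, 2, 2, 2]            # five clinical-history nodes (layer 1)
cancer = 2                            # breast cancer node (layer 2)
findings = [2, 2, 2, 2, 2, 2, 3, 3]   # six two-state and two three-state nodes (layer 3)

# Layer 1: root nodes, one prior per state; one value per node is redundant.
layer1_total = sum(history)
layer1_free = sum(s - 1 for s in history)

# Layer 2: one distribution over the cancer states per parent configuration.
parent_configs = prod(history)
layer2_total = cancer * parent_configs
layer2_free = (cancer - 1) * parent_configs

# Layer 3: each finding is conditioned only on the cancer node.
layer3_total = sum(s * cancer for s in findings)
layer3_free = sum((s - 1) * cancer for s in findings)

total = layer1_total + layer2_total + layer3_total
free = layer1_free + layer2_free + layer3_free

print(layer1_total, layer2_total, layer3_total, total)  # 10 64 36 110
print(layer1_free, layer2_free, layer3_free, free)      # 5 32 20 57
```

The second layer dominates the count because its table grows with the product of the parents' state counts, whereas the third layer grows only with the sum over its nodes.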
In the event that a conditional probability needed in the second layer could not be calculated from our database because of its limited size, the default probability values P(Cancer = yes | Y_1, ..., Y_5) = 0.5 and P(Cancer = no | Y_1, ..., Y_5) = 0.5 were used.
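A minimal sketch of how such probabilities might be estimated from training cases by simple frequency counting is given below. It is an illustration under stated assumptions, not the authors' implementation (the paper used Hugin Demo): the functions `learn` and `posterior` and the toy cases are hypothetical, the parent nodes are assumed to be fully observed, and the uniform fallback for third-layer tables is an assumption added here. Only the 0.5 default for an unseen parent configuration comes from the text. Findings marked `None` are treated as unobserved and simply drop out of the product, which is how this topology accommodates incomplete input.

```python
from collections import Counter

def learn(cases, n_child_states):
    """Estimate the tables by counting.
    cases: list of (y, c, x) with y a tuple of parent states, c in {0, 1},
    and x a tuple of finding states."""
    parent_counts = Counter()   # (y, c) -> number of cases
    class_counts = Counter()    # c -> number of cases
    child_counts = Counter()    # (i, c, state) -> number of cases
    for y, c, x in cases:
        parent_counts[(y, c)] += 1
        class_counts[c] += 1
        for i, s in enumerate(x):
            child_counts[(i, c, s)] += 1

    def p_cancer(c, y):
        n = parent_counts[(y, 0)] + parent_counts[(y, 1)]
        if n == 0:
            return 0.5          # default stated in the text for unseen configurations
        return parent_counts[(y, c)] / n

    def p_child(i, s, c):
        if class_counts[c] == 0:
            return 1.0 / n_child_states[i]   # assumption: uniform fallback (not from the paper)
        return child_counts[(i, c, s)] / class_counts[c]

    return p_cancer, p_child

def posterior(p_cancer, p_child, y, x):
    """P(Cancer = 1 | y, observed findings); entries of x set to None are skipped."""
    scores = []
    for c in (0, 1):
        p = p_cancer(c, y)
        for i, s in enumerate(x):
            if s is not None:
                p *= p_child(i, s, c)
        scores.append(p)
    z = scores[0] + scores[1]
    return 0.5 if z == 0 else scores[1] / z

# Toy data, invented for illustration: two parents, two findings (2 and 3 states).
cases = [
    ((1, 0), 1, (1, 2)),
    ((1, 0), 0, (0, 0)),
    ((0, 0), 0, (0, 0)),
    ((0, 0), 0, (1, 0)),
    ((0, 1), 1, (1, 1)),
]
p_cancer, p_child = learn(cases, n_child_states=[2, 3])
post = posterior(p_cancer, p_child, y=(1, 0), x=(1, None))  # second finding unobserved
print(round(post, 3))
```

Plain frequency estimates become exactly zero for combinations never seen in training, so with small datasets a smoothing scheme may be preferable; the text does not say how such cases were handled for the third layer.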

Different cross-validation methods have commonly been employed to evaluate the performance of statistical pattern recognition systems in general [30], and computer-assisted diagnosis schemes for mammography in particular [29,31]. Considering the size limitation of our database, a fivefold cross-validation (CV) framework, which has been used in our previous studies [32], was adopted for this experiment. The original database of 419 cases was divided randomly into five mutually exclusive partitions. Four of the partitions each contained 66 negative and 19 positive cases; the remaining partition contained 63 negative and 16 positive cases. A series of five experimental cycles was performed. In each cycle a different group of four partitions was used to train each network (deriving all 110 probability values), and the group of positive and negative cases in the remaining partition was subsequently used for testing. Thus, in these five experimental cycles, each partition was used for training in four cycles and for testing in one cycle. ROC curves were produced by combining the test results from the five experimental cycles, and the areas under these ROC curves (A_z values) were computed by using the program ROCFIT [33].

In this study we also investigated the relative contributions of image and non-image based features to the decision process. This involved comparing the performance changes attained when applying different methods to integrate all the features into a single decision outcome. Two methods for feature integration were compared in this study. First, we evaluated the performance of a network which incorporated both mammographic and non-mammographic based features into a single comprehensive Bayesian network.
Second, we produced a hybrid decision system in which separate Bayesian networks for mammographic and non-mammographic based features were created, and the outputs of these two networks were combined. Both a simple average of the outputs and a technique based on logistic regression for combining the outputs of the separate networks were tested. Specifically, we divided our original Bayesian network (as shown in Fig. 1) into two sub-networks. The first sub-network used only the non-mammographic based features and the breast cancer node, excluding the four mammographic features (i.e. architectural distortion, mass, microcalcification cluster, and asymmetry) from the network. In contrast, the second sub-network contained only the four mammographic based nodes and the breast cancer node. The fivefold cross-validation method was used to train and test these two sub-networks. The hybrid classifiers, which combined results of the two sub-networks, were also tested. Areas under the ROC curves of the hybrid classifiers were compared to that from the comprehensive Bayesian network which utilized all features. This comparison was intended to indicate whether incorporating features in a single classifier yielded better performance than using a hybrid system of two, presumably independent, classifiers.

3. Results

Fig. 2 shows three ROC curves, computed from the detection results of a Bayesian belief network incorporating all 14 nodes (see Fig. 1) as well as from two sub-networks incorporating only image or non-image related nodes. Average areas under the three ROC curves were 0.89, 0.81, and 0.71, respectively. The 0.10 increase in the A_z value for the network utilizing four image based features, as compared to the network utilizing non-image based features, suggests that the mammographic features made a greater contribution to the final diagnosis. Furthermore, using the combined feature set as the basis for a network yields a significant (P < 0.05) improvement (ΔA_z ≥ 0.08) in diagnostic performance over either of the individual sub-networks.

Fig. 2. The ROC curves of three Bayesian belief networks using a fivefold cross-validation testing method.

Fig. 3 compares the ROC curve for the results of a single Bayesian network using all features to two curves corresponding to the results from hybrid classifiers using either averaging or logistic regression to combine results from the sub-networks. The areas under these ROC curves, achieved by the hybrid classifiers, are A_z = 0.85 and A_z = 0.87, respectively. Although these values are higher than those yielded by either single sub-network, they are significantly (P < 0.05) lower than the performance of the single network that incorporates all features.

Fig. 3. ROC curve for the original Bayesian network (Fig. 1) compared to ROC curves for two hybrid classifiers, which were based on combining the outputs of sub-networks by averaging or through logistic regression.

4. Discussion

The higher performance achieved by the use of all features in our complete network, as compared to either of the two sub-networks, suggests that the classification potential of the image based features and that of the non-image based features are at least partially independent. Given this independence, the question arises as to whether there is a synergistic effect when both sets of features are used concurrently, as opposed to making separate decisions using each feature subset individually and then combining the two decisions. The degree of such an effect can be assessed quantitatively, as was done in this experiment, by a comparison of the areas under the ROC curves generated in the two scenarios. Because the combination of the two sub-networks is actually a special case of the more general complete network, to the extent that the complete network is optimal, it would not be possible for the combination of sub-networks to perform better than the complete network.

In the diagnosis of breast cancer, human observers (mammographers) consolidate information from different views of mammograms (i.e. mammograms from left and right breasts with cranio-caudal and mediolateral oblique views) and other sources of information such as the patient's clinical examination or history. In contrast, many current computer-assisted diagnosis schemes for breast cancer either deal with only a single type of information or process each individual type of information separately and then combine the individual results to form a final decision. In fact, our study indicates a significantly better performance for the complete network, which coincides more closely with the diagnostic process of human observers. This suggests that a synergistic effect is indeed possible, but it is not proven by this study. Furthermore, the clinical importance of differences of the size found in this study depends on how the decision mechanism is ultimately incorporated into medical practice.

It is easy to appreciate the possibility of such an effect. Each sub-network has condensed all of its input parameters to a one-dimensional variable, and the combination of the two sub-networks represents the decision space as only a two-dimensional manifold. It is easy to contrive a decision process having only three input variables, with each assuming two possible states, in which no two can be combined without completely neutralizing the decision process. Consider, for example, a process in which each input variable takes a value of either zero or one, the input states (1, 0, 0), (0, 1, 0), (0, 0, 1) and (1, 1, 1) correspond to an output of zero, and the remaining input combinations correspond to an output of one. Then combining any pair of input parameters into a sub-network (to produce a one-dimensional result) causes the overall decision process to be ineffective. Thus, the existence of a synergistic effect is not surprising.

The A_z = 0.89 achieved by our combined network is close to the value (A_z = [...]) reported by Kahn et al. [20]. A similar result (A_z = 0.89) has been previously reported for an artificial neural network which incorporated 14 input features [11]. Although a direct comparison of these results is dubious, since completely different methods and sets of cases were used in the studies, the consistency of these results suggests that this is a level of performance that can reasonably be expected from these kinds of decision systems.

The previously reported study by Kahn et al. [20,21] used a Bayesian network whose probabilities were not learned automatically from sample cases, but were assigned either from statistical data available in published sources or directly by experts [20]. Our preliminary study has demonstrated that we can successfully apply machine learning methods to derive the required probabilities from a reasonably small training set. Although in this study we limited our network to relatively few nodes, this can be increased as larger training sets become available.

Several encouraging results have been demonstrated in this investigation.
We must emphasize, however, that this was a very preliminary study, and it is unlikely that the simple network designs described here, which were based on a small set of features and small training sets, would be sufficient to yield any significant clinical utility. Further investigations on many of the issues discussed, including the selection and investigation of features as well as robustness of performance, are required. Nevertheless, we have demonstrated that it is feasible to use machine learning techniques to develop Bayesian networks for the diagnosis of breast cancer, and that even simple Bayesian networks, based on small training sets, can achieve performance levels which are comparable to those of more established paradigms [11,20,28]. We have also demonstrated that questions related to the possible partitioning of a network into sub-networks, for the purpose of alleviating the computational burden, merit further study.

Acknowledgements

The authors wish to thank the staff of Magee-Womens Hospital for their extensive assistance in developing the dataset used in this study. This work is sponsored in part by grants CA77850 and CA62800 from the National Cancer Institute, National Institutes of Health.

References

[1] R.E. Bird, T.W. Wallace, B.C. Yankaskas, Analysis of cancers missed at screening mammography, Radiology 184 (1992).
[2] D.B. Kopans, The positive predictive value of mammography, Am. J. Radiol. 158 (1991).

[3] E.L. Thurfjell, K.A. Lernevall, A.S. Taube, Benefit of independent double reading in a population-based mammography screening program, Radiology 191 (1994).
[4] C.J. Vyborny, M.L. Giger, Computer vision and artificial intelligence in mammography, Am. J. Radiol. 162 (1994).
[5] C.J. Vyborny, Can computers help radiologists read mammograms?, Radiology 191 (1994).
[6] D.J. Getty, R.M. Pickett, C.J. D'Orsi, J.A. Swets, Enhanced interpretation of diagnostic images, Invest. Radiol. 23 (1988).
[7] J.A. Swets, D.J. Getty, R.M. Pickett, C.J. D'Orsi, S.E. Seltzer, B.J. McNeil, Enhancing and evaluating diagnostic accuracy, Med. Decis. Mak. 11 (1) (1991).
[8] C.J. D'Orsi, D.J. Getty, J.A. Swets, R.M. Pickett, S.E. Seltzer, B.J. McNeil, Reading and decision aids for improved accuracy and standardization of mammographic diagnosis, Radiology 184 (3) (1992).
[9] S.E. Seltzer, B.J. McNeil, C.J. D'Orsi, D.J. Getty, R.M. Pickett, J.A. Swets, Combining evidence from multiple imaging modalities: a feature-analysis method, Comput. Med. Imaging Graph. 16 (6) (1992).
[10] H.M. Cook, M.D. Fox, Application of expert systems to mammographic image analysis, Am. J. Physiol. Imag. 4 (1) (1989).
[11] Y. Wu, M.L. Giger, K. Doi, C.J. Vyborny, R.A. Schmidt, C.E. Metz, Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology 187 (1993).
[12] C.E. Floyd Jr, J.Y. Lo, A.J. Yun, D.C. Sullivan, P.J. Kornguth, Prediction of breast cancer malignancy using an artificial neural network, Cancer 74 (11) (1994).
[13] J.Y. Lo, J.A. Baker, P.J. Kornguth, C.E. Floyd Jr, Computer-aided diagnosis of breast cancer: artificial neural network approach for optimized merging of mammographic features, Acad. Radiol. 2 (10) (1995).
[14] J.Y. Lo, J.A. Baker, P.J. Kornguth, J.D. Iglehart, C.E. Floyd Jr, Predicting breast cancer invasion with artificial neural networks on the basis of mammographic features, Radiology 203 (1) (1997).
[15] F.V. Jensen, An Introduction to Bayesian Networks, Springer Verlag, New York, NY.
[16] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, CA.
[17] D.E. Heckerman, E.H. Shortliffe, From certainty factors to belief networks, Artif. Intell. Med. 4 (1992).
[18] D. Heckerman, Bayesian networks for data mining, Data Min. Knowl. Discov. 1 (1997).
[19] G.F. Cooper, Current research directions in the development of expert systems based on belief networks, Appl. Stoch. Models 5 (1989).
[20] C.E. Kahn, L.M. Roberts, K.A. Shaffer, P. Haddawy, Construction of a Bayesian network for mammographic diagnosis of breast cancer, Comput. Biol. Med. 27 (1997).
[21] C.E. Kahn, L.M. Roberts, K. Wang, D. Jenks, P. Haddawy, Preliminary investigation of a Bayesian network for mammographic diagnosis of breast cancer, Proc. Annu. Symp. Comput. Appl. Med. Care (1995).
[22] P. Haddawy, J. Jacobson, C.E. Kahn, Generating explanations and tutorial problems from Bayesian networks, Proc. Annu. Symp. Comput. Appl. Med. Care (1994).
[23] M. Henrion, M.J. Druzdzel, Qualitative propagation and scenario-based approaches to explanation of probabilistic reasoning, in: P.P. Bonissone, M. Henrion, L.N. Kanal, J.F. Lemmar (Eds.), Uncertainty in Artificial Intelligence 6, Elsevier, New York.
[24] H.J. Suermondt, G.F. Cooper, An evaluation of explanations of probabilistic inference, Comput. Biomed. Res. 26 (1993) 242.
[25] T.M. Mitchell, Machine Learning, WCB McGraw-Hill Company, Boston, MA, 1997 (Chapter 6) pp. l97.
[26] E. Neapolitan, Probabilistic Reasoning in Expert Systems, Wiley, New York, NY.
[27] G.F. Cooper, Probabilistic inference using belief networks is NP-hard, Technical Report 87-27, Medical Computer Science Group, Stanford University (1987).
[28] K.M. Harris, B.C. Good, J.L. Kong, D. Toma, D. Gur, Z.S. Ilkhanipour, M.J. Staiger, J.H. Oliver, P.W. Wintz, M.A. Ganott, C.A. Britton, W.H. Straub, Exploring computerized mammographic reporting with feedback, Proc. SPIE 1899 (1993).
[29] G.D. Tourassi, C.E. Floyd, The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis, Med. Decis. Making 17 (1997).

12 126 X.-H. Wang et al. / International Journal of Medical Informatics 54 (1999) [30] B. Efron, G. Gong, A leisurely look at the bootstrap, the jackknife, and cross-validation, Am. Stat. 37 (1983) [31] W. Zhang, D. Kunio, M.L. Giger, R.M. Nishikawa, R.A. Schmidt, Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network, Med. Phys. 21 (1994) [32] R. Rymon, B. Zheng, Y.H. Chang, D. Gur, Incorporation of a set enumeration trees-based classifier into a hybrid computer-assisted diagnosis scheme for mass detection, Acad. Radiol. 5 (1998) [33] C.E. Metz, H.B. Kronman, P.L. Wang, J.H. Shen, ROCF11: A modified maximum likelihood algorithm for estimating a binormal ROC curve from confidence-rating data, University of Chicago, Chicago (1985)..

Preliminary Investigation of a Bayesian Network for Mammographic Diagnosis of Breast Cancer

Preliminary Investigation of a Bayesian Network for Mammographic Diagnosis of Breast Cancer Preliminary Investigation of a Bayesian Network for Mammographic Diagnosis of Breast Cancer Charles E. Kahn, Jr., M.D., Linda M. Roberts, M.S., Kun Wang, B.S., Deb Jenks, M.S.N., Peter Haddawy, Ph.D. The

More information

Cross-Institutional Evaluation of BI-RADS Predictive Model for Mammographic Diagnosis of Breast Cancer

Cross-Institutional Evaluation of BI-RADS Predictive Model for Mammographic Diagnosis of Breast Cancer Cross-Institutional Evaluation of BI-RADS Predictive Model for Mammographic Diagnosis of Breast Cancer Joseph Y. Lo 1,2 Mia K. Markey 1,2 Jay A. Baker 1 Carey E. Floyd, Jr. 1,2 OBJECTIVE. Given a predictive

More information

A decision support system for breast cancer detection in screening programs

A decision support system for breast cancer detection in screening programs A decision support system for breast cancer detection in screening programs Marina Velikova and Peter J.F. Lucas 2 and Nivea Ferreira 2 and Maurice Samulski and Nico Karssemeijer Abstract. The goal of

More information

In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999

In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999 In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999 A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka

More information

Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

More information

A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel, Ph.D., 1 and Hanna Wasyluk, M.D.,Ph.D.

A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel, Ph.D., 1 and Hanna Wasyluk, M.D.,Ph.D. Research Report CBMI-99-27, Center for Biomedical Informatics, University of Pittsburgh, September 1999 A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel,

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Prototype Internet consultation system for radiologists

Prototype Internet consultation system for radiologists Prototype Internet consultation system for radiologists Boris Kovalerchuk, Department of Computer Science, Central Washington University, Ellensburg, WA 98926-7520, USA borisk@tahoma.cwu.edu James F. Ruiz

More information

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

More information

Breast Imaging Made Brief and Simple. Jane Clayton MD Associate Professor Department of Radiology LSUHSC New Orleans, LA

Breast Imaging Made Brief and Simple. Jane Clayton MD Associate Professor Department of Radiology LSUHSC New Orleans, LA Breast Imaging Made Brief and Simple Jane Clayton MD Associate Professor Department of Radiology LSUHSC New Orleans, LA What women are referred for breast imaging? Two groups of women are referred for

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

Evaluating Data Mining Models: A Pattern Language

Evaluating Data Mining Models: A Pattern Language Evaluating Data Mining Models: A Pattern Language Jerffeson Souza Stan Matwin Nathalie Japkowicz School of Information Technology and Engineering University of Ottawa K1N 6N5, Canada {jsouza,stan,nat}@site.uottawa.ca

More information

Agenda. Interface Agents. Interface Agents

Agenda. Interface Agents. Interface Agents Agenda Marcelo G. Armentano Problem Overview Interface Agents Probabilistic approach Monitoring user actions Model of the application Model of user intentions Example Summary ISISTAN Research Institute

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

GeNIeRate: An Interactive Generator of Diagnostic Bayesian Network Models

GeNIeRate: An Interactive Generator of Diagnostic Bayesian Network Models GeNIeRate: An Interactive Generator of Diagnostic Bayesian Network Models Pieter C. Kraaijeveld Man Machine Interaction Group Delft University of Technology Mekelweg 4, 2628 CD Delft, the Netherlands p.c.kraaijeveld@ewi.tudelft.nl

More information

Methodologies for Evaluation of Standalone CAD System Performance

Methodologies for Evaluation of Standalone CAD System Performance Methodologies for Evaluation of Standalone CAD System Performance DB DSFM DCMS OSEL DESE DP DIAM Berkman Sahiner, PhD USFDA/CDRH/OSEL/DIAM AAPM CAD Subcommittee in Diagnostic Imaging CAD: CADe and CADx

More information

Variability and Accuracy in Mammographic Interpretation Using the American College of Radiology Breast Imaging Reporting and Data System

Variability and Accuracy in Mammographic Interpretation Using the American College of Radiology Breast Imaging Reporting and Data System Variability and Accuracy in Mammographic Interpretation Using the American College of Radiology Breast Imaging Reporting and Data System Karla Kerlikowske, Deborah Grady, John Barclay, Steven D. Frankel,

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

A Probabilistic Causal Model for Diagnosis of Liver Disorders

A Probabilistic Causal Model for Diagnosis of Liver Disorders Intelligent Information Systems VII Proceedings of the Workshop held in Malbork, Poland, June 15-19, 1998 A Probabilistic Causal Model for Diagnosis of Liver Disorders Agnieszka Oniśko 1, Marek J. Druzdzel

More information

Chapter 7.30 Retrieving Medical Records Using Bayesian Networks

Chapter 7.30 Retrieving Medical Records Using Bayesian Networks 2274 Chapter 7.30 Retrieving Medical Records Using Bayesian Networks Luis M. de Campos Universidad de Granada, Spain Juan M. Fernández Luna Universidad de Granada, Spain Juan F. Huete Universidad de Granada,

More information

Medicare Part B. Mammograms - Updated Billing Guide for Screening and Diagnostic Tests

Medicare Part B. Mammograms - Updated Billing Guide for Screening and Diagnostic Tests Mammograms - Updated Billing Guide for Screening and Diagnostic Tests This article from Medicare B News Issue 223 dated October 21, 2005 is being updated and reprinted to ensure that the Noridian Administrative

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

More information

Mammography. What is Mammography?

Mammography. What is Mammography? Scan for mobile link. Mammography Mammography is a specific type of breast imaging that uses low-dose x-rays to detect cancer early before women experience symptoms when it is most treatable. Tell your

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

VI. FREQUENTLY ASKED QUESTIONS CONCERNING BREAST IMAGING AUDITS

VI. FREQUENTLY ASKED QUESTIONS CONCERNING BREAST IMAGING AUDITS ACR BI-RADS ATLAS VI. FREQUENTLY ASKED QUESTIONS CONCERNING BREAST IMAGING AUDITS American College of Radiology 55 ACR BI-RADS ATLAS A. All Breast Imaging Modalities 1. According to the BI-RADS Atlas,

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms

Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms Shima Ghassem Pour 1, Peter Mc Leod 2, Brijesh Verma 2, and Anthony Maeder 1 1 School of Computing, Engineering

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Journal of Information

Journal of Information Journal of Information journal homepage: http://www.pakinsight.com/?ic=journal&journal=104 PREDICT SURVIVAL OF PATIENTS WITH LUNG CANCER USING AN ENSEMBLE FEATURE SELECTION ALGORITHM AND CLASSIFICATION

More information

DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS

DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS Hacettepe Journal of Mathematics and Statistics Volume 33 (2004), 69 76 DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS Hülya Olmuş and S. Oral Erbaş Received 22 : 07 : 2003 : Accepted 04

More information

Digital Mammogram National Database

Digital Mammogram National Database Digital Mammogram National Database Professor Michael Brady FRS FREng Medical Vision Laboratory Oxford University Chairman: Mirada Solutions Ltd PharmaGrid 2/7/03 ediamond aims construct a federated database

More information

Evaluation of Stylus for Radiographic Image Annotation

Evaluation of Stylus for Radiographic Image Annotation Evaluation of Stylus for Radiographic Image Annotation Gautam S. Muralidhar, 1 Gary J. Whitman, 2 Tamara Miner Haygood, 2 Tanya W. Stephens, 2 Alan C. Bovik, 3 and Mia K. Markey 1 We evaluated the use

More information

ARTICLE IN PRESS. European Journal of Operational Research xxx (2004) xxx xxx. Discrete Optimization. Nan Kong, Andrew J.

ARTICLE IN PRESS. European Journal of Operational Research xxx (2004) xxx xxx. Discrete Optimization. Nan Kong, Andrew J. A factor 1 European Journal of Operational Research xxx (00) xxx xxx Discrete Optimization approximation algorithm for two-stage stochastic matching problems Nan Kong, Andrew J. Schaefer * Department of

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Healthcare Data Mining: Prediction Inpatient Length of Stay

Healthcare Data Mining: Prediction Inpatient Length of Stay 3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract

More information

Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening

Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening , pp.169-178 http://dx.doi.org/10.14257/ijbsbt.2014.6.2.17 Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening Ki-Seok Cheong 2,3, Hye-Jeong Song 1,3, Chan-Young Park 1,3, Jong-Dae

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

Selecting Pedagogical Protocols Using SOM

Selecting Pedagogical Protocols Using SOM Selecting Pedagogical Protocols Using SOM F. Salgueiro, Z. Cataldi, P. Britos, E. Sierra and R. García-Martínez Intelligent Systems Laboratory. School of Engineering. University of Buenos Aires Educational

More information

Electronic health records to study population health: opportunities and challenges

Electronic health records to study population health: opportunities and challenges Electronic health records to study population health: opportunities and challenges Caroline A. Thompson, PhD, MPH Assistant Professor of Epidemiology San Diego State University Caroline.Thompson@mail.sdsu.edu

More information

Neural network models: Foundations and applications to an audit decision problem

Neural network models: Foundations and applications to an audit decision problem Annals of Operations Research 75(1997)291 301 291 Neural network models: Foundations and applications to an audit decision problem Rebecca C. Wu Department of Accounting, College of Management, National

More information

Specific Usage of Visual Data Analysis Techniques

Specific Usage of Visual Data Analysis Techniques Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia

More information

A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Breast Cancer Screening

Breast Cancer Screening Breast Cancer Screening The American Cancer Society and Congregational Health Ministry Team October Module To access this module via the Web, visit www.cancer.org and type in congregational health ministry

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation

More information

Nan Kong, Andrew J. Schaefer. Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA

Nan Kong, Andrew J. Schaefer. Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA A Factor 1 2 Approximation Algorithm for Two-Stage Stochastic Matching Problems Nan Kong, Andrew J. Schaefer Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA Abstract We introduce

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Neural Network Applications in Stock Market Predictions - A Methodology Analysis

Neural Network Applications in Stock Market Predictions - A Methodology Analysis Neural Network Applications in Stock Market Predictions - A Methodology Analysis Marijana Zekic, MS University of Josip Juraj Strossmayer in Osijek Faculty of Economics Osijek Gajev trg 7, 31000 Osijek

More information

Up/Down Analysis of Stock Index by Using Bayesian Network

Up/Down Analysis of Stock Index by Using Bayesian Network Engineering Management Research; Vol. 1, No. 2; 2012 ISSN 1927-7318 E-ISSN 1927-7326 Published by Canadian Center of Science and Education Up/Down Analysis of Stock Index by Using Bayesian Network Yi Zuo

More information

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

Breast Ultrasound: Benign vs. Malignant Lesions

Breast Ultrasound: Benign vs. Malignant Lesions October 25-November 19, 2004 Breast Ultrasound: Benign vs. Malignant Lesions Jill Steinkeler,, Tufts University School of Medicine IV Breast Anatomy Case Presentation-Patient 1 62 year old woman with a

More information

Predicting Bankruptcy with Robust Logistic Regression

Predicting Bankruptcy with Robust Logistic Regression Journal of Data Science 9(2011), 565-584 Predicting Bankruptcy with Robust Logistic Regression Richard P. Hauser and David Booth Kent State University Abstract: Using financial ratio data from 2006 and

More information

Sustaining a High-Quality Breast MRI Practice

Sustaining a High-Quality Breast MRI Practice Sustaining a High-Quality Breast MRI Practice Christoph Lee, MD, MSHS Associate Professor of Radiology Adjunct Associate Professor, Health Services University of Washington September 11, 2015 Overview

More information

On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

More information

How To Create A Decision Support System For A Patient Care System

How To Create A Decision Support System For A Patient Care System DOI: 10.7763/IPEDR. 2013. V63. 3 Design of a Decision Support System in Electronic Medical Record Using Structured Query Language Muhammad Asif +, Mohammad Jamil Sawar, and Umair Abdullah Barani Institute

More information

Software Engineering of NLP-based Computer-assisted Coding Applications

Software Engineering of NLP-based Computer-assisted Coding Applications Software Engineering of NLP-based Computer-assisted Coding Applications 1 Software Engineering of NLP-based Computer-assisted Coding Applications by Mark Morsch, MS; Carol Stoyla, BS, CLA; Ronald Sheffer,

More information

D. FREQUENTLY ASKED QUESTIONS

D. FREQUENTLY ASKED QUESTIONS ACR BI-RADS ATLAS D. FREQUENTLY ASKED QUESTIONS 1. Under MQSA, is it necessary to include a numeric assessment code (i.e., 0, 1, 2, 3, 4, 5, or 6) in addition to the assessment category in all mammography

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francis@data-mines.cm

More information

Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Knowledge Based Descriptive Neural Networks

Knowledge Based Descriptive Neural Networks Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: jtyao@cs.uregina.ca Abstract This paper presents a

More information

Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1

Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1 Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1 Maitreya Natu Dept. of Computer and Information Sciences University of Delaware, Newark, DE, USA, 19716 Email: natu@cis.udel.edu

More information

BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites

BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites BOR 6335 Data Mining Course Description This course provides an overview of data mining and fundamentals of using RapidMiner and OpenOffice open access software packages to develop data mining models.

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Bayesian Networks and Classifiers in Project Management

Bayesian Networks and Classifiers in Project Management Bayesian Networks and Classifiers in Project Management Daniel Rodríguez 1, Javier Dolado 2 and Manoranjan Satpathy 1 1 Dept. of Computer Science The University of Reading Reading, RG6 6AY, UK drg@ieee.org,

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Visualizing class probability estimators

Visualizing class probability estimators Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Real Time Traffic Monitoring With Bayesian Belief Networks

Real Time Traffic Monitoring With Bayesian Belief Networks Real Time Traffic Monitoring With Bayesian Belief Networks Sicco Pier van Gosliga TNO Defence, Security and Safety, P.O.Box 96864, 2509 JG The Hague, The Netherlands +31 70 374 02 30, sicco_pier.vangosliga@tno.nl

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

More information

A Learning Algorithm For Neural Network Ensembles

A Learning Algorithm For Neural Network Ensembles A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República

More information
