From Data Mining to Knowledge Discovery in Databases

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth

Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.

Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).

At an abstract level, the KDD field is concerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more compact (for example, a short report), more abstract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for example, a predictive model for estimating the value of future cases). At the core of the process is the application of specific data-mining methods for pattern discovery and extraction.¹

This article begins by discussing the historical context of KDD and data mining and their intersection with other related fields. A brief summary of recent KDD real-world applications is provided. Definitions of KDD and data mining are provided, and the general multistep KDD process is outlined.
This multistep process has the application of data-mining algorithms as one particular step in the process. The data-mining step is discussed in more detail in the context of specific data-mining algorithms and their application. Real-world practical application issues are also outlined. Finally, the article enumerates challenges for future research and development and in particular discusses potential opportunities for AI technology in KDD systems.

AI Magazine, Fall 1996. Copyright 1996, American Association for Artificial Intelligence. All rights reserved.

Why Do We Need KDD?

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a quarterly basis. The specialists then provide a report detailing the analysis to the sponsoring health-care organization; this report becomes the basis for future decision making and planning for health-care management. In a totally different type of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloging such geologic objects of interest as impact craters. Be it science, marketing, finance, health care, retail, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and serving as an interface between the data and the users and products.

For these (and many other) applications, this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains. Databases are increasing in size in two ways: (1) the number N of records or objects in the database and (2) the number d of fields or attributes to an object. Databases containing on the order of N = 10⁹ objects are becoming increasingly common, for example, in the astronomical sciences. Similarly, the number of fields d can easily be on the order of 10² or even 10³, for example, in medical diagnostic applications. Who could be expected to digest millions of records, each having tens or hundreds of fields? We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.

The need to scale up human analysis capabilities to handling the large number of bytes that we can collect is both economic and scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services to customers. Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to computational techniques to help us unearth meaningful patterns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital information era made a fact of life for all of us: data overload.

Data Mining and Knowledge Discovery in the Real World

A large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week, Newsweek, Byte, PC Week, and other large-circulation periodicals. Unfortunately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.

In science, one of the primary application areas is astronomy. Here, a notable success was achieved by SKICAT, a system used by astronomers to perform image analysis, classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski, and Weir 1996). In its first application, the system was used to process the 3 terabytes (10¹² bytes) of image data resulting from the Second Palomar Observatory Sky Survey, where it is estimated that on the order of 10⁹ sky objects are detectable. SKICAT can outperform humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a survey of scientific applications.

In business, the main KDD application areas include marketing, finance (especially investment), fraud detection, manufacturing, telecommunications, and Internet agents.

Marketing: In marketing, the primary application is database marketing systems, which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimated that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for example, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-basket analysis (Agrawal et al. 1996): systems that find patterns such as, "If a customer bought X, he/she is also likely to buy Y and Z." Such patterns are valuable to retailers.

Investment: Numerous companies use data mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million; since its start in 1993, the system has outperformed the broad stock market (Hall, Mani, and Barr 1996).

Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of accounts. The FAIS system (Senator et al. 1995), from the U.S. Treasury Financial Crimes Enforcement Network, is used to identify financial transactions that might indicate money-laundering activity.

Manufacturing: The CASSIOPEE troubleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major European airlines to diagnose and predict problems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innovative applications (Manago and Auriol 1996).

Telecommunications: The telecommunications alarm-sequence analyzer (TASA) was built in cooperation with a manufacturer of telecommunications equipment and three telephone networks (Mannila, Toivonen, and Verkamo 1995). The system uses a novel framework for locating frequently occurring alarm episodes from the alarm stream and presenting them as rules. Large sets of discovered rules can be explored with flexible information-retrieval tools supporting interactivity and iteration. In this way, TASA offers pruning, grouping, and ordering tools to refine the results of a basic brute-force search for rules.

Data cleaning: The MERGE-PURGE system was applied to the identification of duplicate welfare claims (Hernandez and Stolfo 1995). It was used successfully on data from the Welfare Department of the State of Washington.

In other areas, a well-publicized system is IBM's ADVANCED SCOUT, a specialized data-mining system that helps National Basketball Association (NBA) coaches organize and interpret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Supersonics, which reached the NBA finals.

Finally, a novel and increasingly important type of discovery is one based on the use of intelligent agents to navigate through an information-rich environment. Although the idea of active triggers has long been analyzed in the database field, really successful applications of this idea appeared only with the advent of the Internet. These systems ask the user to specify a profile of interest and search for related information among a wide variety of public-domain and proprietary sources.
For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like. CRAYON allows users to create their own free newspaper (supported by ads); NEWSHOUND (sjmercury.com/hound/) from the San Jose Mercury News and FARCAST automatically search for information in a wide variety of sources, including newspapers and wire services, and deliver relevant documents directly to the user. These are just a few of the numerous such systems that use KDD techniques to automatically produce useful information from large masses of raw data. See Piatetsky-Shapiro et al. (1996) for an overview of issues in developing industrial KDD applications.

Data Mining and KDD

Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining has mostly been used by statisticians, data analysts, and the management information systems (MIS) communities. It has also gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first KDD workshop in 1989 (Piatetsky-Shapiro 1991) to emphasize that knowledge is the end product of a data-driven discovery. It has been popularized in the AI and machine-learning fields.

In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The distinction between the KDD process and the data-mining step (within the process) is a central point of this article. The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data.
Blind application of data-mining methods (rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of meaningless and invalid patterns.

The Interdisciplinary Nature of KDD

KDD has evolved, and continues to evolve, from the intersection of research fields such as machine learning, pattern recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high-performance computing. The unifying goal is extracting high-level knowledge from low-level data in the context of large data sets.

The data-mining component of KDD currently relies heavily on known techniques from machine learning, pattern recognition, and statistics to find patterns from data in the data-mining step of the KDD process. A natural question is, How is KDD different from pattern recognition or machine learning (and related fields)? The answer is that these fields provide some of the data-mining methods that are used in the data-mining step of the KDD process. KDD focuses on the overall process of knowledge discovery from data, including how the data are stored and accessed, how algorithms can be scaled to massive data sets
and still run efficiently, how results can be interpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (besides machine learning) to contribute to KDD. KDD places a special emphasis on finding understandable patterns that can be interpreted as useful or interesting knowledge. Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and robustness properties of modeling algorithms for large noisy data sets.

Related AI research fields include machine discovery, which targets the discovery of empirical laws from observation and experimentation (Shrager and Langley 1990) (see Kloesgen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery), and causal modeling for the inference of causal models from data (Spirtes, Glymour, and Scheines 1993). Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al. [1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quantifying the uncertainty that results when one tries to infer general patterns from a particular sample of an overall population. As mentioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data), one can find patterns that appear to be statistically significant but, in fact, are not. Clearly, this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct relevance to KDD. Thus, data mining is a legitimate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical aspects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree possible) the entire process of data analysis and the statistician's art of hypothesis selection.

A driving force behind KDD is the database field (the second D in KDD). Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fundamental importance to KDD. Database techniques for gaining efficient data access, grouping and ordering operations when accessing data, and optimizing queries constitute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memory and pay no attention to how the algorithm breaks down if only limited views of the data are possible.

A related field evolving from databases is data warehousing, which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2) data access.

Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they possess, they have to address the issues of mapping data to a single naming convention, uniformly representing and handling missing data, and handling noise and errors when possible.

Data access: Uniform and well-defined methods must be created for accessing the data and providing access paths to data that were historically difficult to get to (for example, stored offline).

Once organizations and individuals have solved the problem of how to store and access their data, the natural next step is the question, What else do we do with all the data? This is where opportunities for KDD naturally arise.

A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles proposed by Codd (1993). OLAP tools focus on providing multidimensional data analysis, which is superior to SQL in computing summaries and breakdowns along many dimensions. OLAP tools are targeted toward simplifying and supporting interactive data analysis, but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.

Basic Definitions

KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).

Figure 1. An Overview of the Steps That Compose the KDD Process. (Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Interpretation/Evaluation → Knowledge.)

Here, data are a set of facts (for example, cases in a database), and pattern is an expression in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; finding structure from data; or, in general, making any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the average value of a set of numbers.

The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and potentially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.

The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possible to define measures of certainty (for example, estimated prediction accuracy on new data) or utility (for example, gain, perhaps in dollars saved because of better predictions or speedup in response time of a system). Notions such as novelty and understandability are much more subjective. In certain contexts, understandability can be estimated by simplicity (for example, the number of bits to describe a pattern).
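The statisticians' concern described above is that a long enough search through any data set, even randomly generated data, turns up patterns that look significant. The requirement that patterns be valid on new data with some degree of certainty addresses that concern, and a small simulation shows both. The sketch below is an illustration built on stated assumptions, not a procedure from this article:

* Every attribute and the target are independent Gaussian noise.
* The "mining" step picks the attribute most correlated with the target on one half of the cases.
* The other half stands in for new data.

The function names `pearson` and `dredge`, the sizes, and the seed are all hypothetical choices.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def dredge(n_cases=200, n_attrs=1000, seed=0):
    """Search pure-noise attributes for the one best correlated with a noise target.

    Returns (best_attribute, train_corr, holdout_corr). Because every attribute
    is independent of the target, any correlation found is spurious by design.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_cases)]
    attrs = [[rng.gauss(0, 1) for _ in range(n_cases)] for _ in range(n_attrs)]
    half = n_cases // 2
    # Mining step: exhaustive search over attributes, using the first half only.
    best = max(range(n_attrs),
               key=lambda j: abs(pearson(attrs[j][:half], target[:half])))
    train_r = pearson(attrs[best][:half], target[:half])
    # Validity check: the same pattern measured on cases the search never saw.
    holdout_r = pearson(attrs[best][half:], target[half:])
    return best, train_r, holdout_r


best, train_r, holdout_r = dredge()
print(f"attribute {best}: search-half r = {train_r:+.2f}, held-out r = {holdout_r:+.2f}")
```

On noise-only data, the selected attribute typically shows a clearly nonzero correlation on the half used for the search and a much weaker one on the held-out half. Estimating a pattern's accuracy or strength on cases that played no part in finding it is one simple guard against data dredging.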
An important notion, called interestingness (for example, see Silberschatz and Tuzhilin [1995] and Piatetsky-Shapiro and Matheus [1994]), is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Interestingness functions can be defined explicitly or can be manifested implicitly through an ordering placed by the KDD system on the discovered patterns or models.

Given these notions, we can consider a pattern to be knowledge if it exceeds some interestingness threshold, which is by no means an attempt to define knowledge in the philosophical or even the popular view. As a matter of fact, knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data. Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that can be explored by a data-mining algorithm.

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process (figure 1) includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process also includes all the additional steps described in the next section.

The notion of an overall user-driven process is not unique to KDD: analogous proposals have been put forward both in statistics (Hand 1994) and in machine learning (Brodley and Smyth 1996).

The KDD Process

The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:

First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.

Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.
Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method, such as summarization, classification, regression, or clustering; these methods are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).

Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).

Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.

Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

The KDD process can involve significant iteration and can contain loops between any two steps.
The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1. Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component, which has, by far, received the most attention in the literature.
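The automatable steps above can be sketched as a chain of small functions, together with the earlier idea that a pattern counts as knowledge only if it exceeds an interestingness threshold. This is a toy illustration under hypothetical assumptions:

* The loan records, their values, and every function name (`select`, `preprocess`, `transform`, `mine`, `evaluate`, `kdd`) are invented for the example.
* Dropping incomplete records stands in for a missing-data strategy.
* The only pattern family is "predict default when debt/income exceeds a threshold."
* Interestingness is taken, purely for illustration, as accuracy (validity) multiplied by the share of defaulters caught (usefulness).

Steps 1, 5, 6, and 9 rest on human judgment and are not modeled.

```python
from dataclasses import dataclass

# Hypothetical loan records (income and debt in thousands); None marks a missing value.
RAW = [
    {"income": 20, "debt": 18, "defaulted": True, "branch": "north"},
    {"income": 25, "debt": 22, "defaulted": True, "branch": "south"},
    {"income": 30, "debt": 9, "defaulted": False, "branch": "north"},
    {"income": 35, "debt": None, "defaulted": False, "branch": "east"},
    {"income": 40, "debt": 30, "defaulted": True, "branch": "south"},
    {"income": 50, "debt": 12, "defaulted": False, "branch": "east"},
    {"income": 60, "debt": 15, "defaulted": False, "branch": "north"},
    {"income": 70, "debt": 50, "defaulted": True, "branch": "east"},
    {"income": 80, "debt": 10, "defaulted": False, "branch": "south"},
]


@dataclass(frozen=True)
class Rule:
    """Pattern family: predict default when debt/income exceeds `threshold`."""
    threshold: float

    def predicts_default(self, record):
        return record["ratio"] > self.threshold


def select(records):
    """Step 2: build the target data set from the fields relevant to the goal."""
    return [{k: r[k] for k in ("income", "debt", "defaulted")} for r in records]


def preprocess(records):
    """Step 3: one simple missing-data strategy, dropping incomplete records."""
    return [r for r in records if None not in r.values()]


def transform(records):
    """Step 4: project income and debt onto a single derived feature."""
    return [dict(r, ratio=r["debt"] / r["income"]) for r in records]


def mine(records):
    """Step 7: enumerate candidate patterns, one rule per observed ratio."""
    return [Rule(t) for t in sorted({r["ratio"] for r in records})]


def interestingness(rule, records):
    """Hypothetical measure: accuracy (validity) times recall on defaulters (usefulness)."""
    accuracy = sum(rule.predicts_default(r) == r["defaulted"] for r in records) / len(records)
    defaulters = [r for r in records if r["defaulted"]]
    recall = sum(rule.predicts_default(r) for r in defaulters) / len(defaulters)
    return accuracy * recall


def evaluate(rules, records, min_score):
    """Step 8: keep patterns whose interestingness exceeds the user's threshold."""
    scored = [(interestingness(rule, records), rule) for rule in rules]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, rule) for score, rule in scored if score > min_score]


def kdd(raw, min_score):
    data = transform(preprocess(select(raw)))
    return evaluate(mine(data), data, min_score)


# Prints two rules here: thresholds 0.30 (score 1.000) and 0.25 (score 0.875).
for score, rule in kdd(RAW, min_score=0.8):
    print(f"default if debt/income > {rule.threshold:.2f} (interestingness {score:.3f})")
```

Raising `min_score` keeps fewer patterns as knowledge. This mirrors the point that what counts as knowledge is determined by the functions and thresholds the user chooses.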

The Data-Mining Step of the KDD Process

The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algorithms that incorporate these methods.

The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification, the system is limited to verifying the user's hypothesis. With discovery, the system autonomously finds new patterns. We further subdivide the discovery goal into prediction, where the system finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presentation to a user in a human-understandable form. In this article, we are primarily concerned with discovery-oriented data mining.

Data mining involves fitting models to, or determining patterns from, observed data. The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the overall, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeterministic effects in the model, whereas a logical model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applications given the typical presence of uncertainty in real-world data-generating processes.

Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewildering to both the novice and the experienced data analyst.
It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fundamental techniques. The actual underlying model representation being used by a particular method typically comes from a composition of a small number of well-known options: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primarily in the goodness-of-fit criterion used to evaluate model fit or in the search method used to find a good fit.

In our brief overview of data-mining methods, we try in particular to convey the notion that most (if not all) methods can be viewed as extensions or hybrids of a few basic techniques and principles. We first discuss the primary methods of data mining and then show that the data-mining methods can be viewed as consisting of three primary algorithmic components: (1) model representation, (2) model evaluation, and (3) search.

In the discussion of KDD and data-mining methods, we use a simple example to make some of the notions more concrete. Figure 2 shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past. The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, and so on). The data have been classified into two classes: (1) the xs represent persons who have defaulted on their loans and (2) the os represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent a historical data set that can contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications, there are typically many more dimensions (as many as several hundreds) and many more data points (many thousands or even millions).

Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes (axes: Income, Debt).
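The sketch below separates the three components explicitly to make them concrete on the loan example. It is a minimal illustration under assumptions that are not from the article:

* The cases are hypothetical, not the 23 cases of figure 2.
* The representation is a straight line, debt = slope · income + intercept, with cases strictly above the line predicted to default. This is similar in spirit to the linear boundary of figure 3.
* The evaluation criterion is the count of misclassified cases.
* The search is a brute-force grid over slope and intercept.

The names `predicts_default`, `errors`, and `grid_search` are illustrative only.

```python
from itertools import product

# Hypothetical (income, debt, defaulted) cases in thousands of dollars;
# these are NOT the 23 cases of figure 2.
CASES = [
    (15, 20, True), (22, 25, True), (30, 35, True), (45, 40, True), (40, 12, True),
    (20, 5, False), (35, 10, False), (50, 15, False), (65, 20, False), (80, 25, False),
    (55, 30, False),
]


# (1) Model representation: a straight line in (income, debt) space;
#     cases strictly above the line are predicted to default.
def predicts_default(model, income, debt):
    slope, intercept = model
    return debt > slope * income + intercept


# (2) Model evaluation: goodness of fit measured as the number of misclassified cases.
def errors(model, cases):
    return sum(predicts_default(model, income, debt) != defaulted
               for income, debt, defaulted in cases)


# (3) Search: exhaustive grid over the two parameters of the representation.
def grid_search(cases, slopes, intercepts):
    return min(product(slopes, intercepts), key=lambda model: errors(model, cases))


slopes = [s / 10 for s in range(11)]   # 0.0, 0.1, ..., 1.0
intercepts = range(-10, 31, 2)         # -10, -8, ..., 30
best = grid_search(CASES, slopes, intercepts)
print("best (slope, intercept):", best, "misclassified:", errors(best, CASES))
```

On these hypothetical cases no line reaches zero errors, because the defaulter at (40, 12) cannot be separated while (35, 10), (50, 15), and (55, 30) stay below the line. The search therefore settles for a line with one misclassified case. Keeping the representation and swapping in a different evaluation function or search procedure yields a different algorithm. Examples are weighting missed defaulters more heavily, or hill climbing instead of a grid. This is the sense in which algorithms differ mainly in fit criterion and search method.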

The purpose here is to illustrate basic ideas on a small problem in two-dimensional space.

Data-Mining Methods

The two high-level primary goals of data mining in practice tend to be prediction and description. As stated earlier, prediction involves using some variables or fields in the database to predict unknown or future values of other variables of interest, and description focuses on finding human-interpretable patterns describing the data. Although the boundaries between prediction and description are not sharp (some of the predictive models can be descriptive, to the degree that they are understandable, and vice versa), the distinction is useful for understanding the overall discovery goal. The relative importance of prediction and description for particular data-mining applications can vary considerably. The goals of prediction and description can be achieved using a variety of particular data-mining methods.

Classification is learning a function that maps (classifies) a data item into one of several predefined classes (Weiss and Kulikowski 1991; Hand 1981). Examples of classification methods used as part of knowledge discovery applications include the classifying of trends in financial markets (Apte and Hong 1996) and the automated identification of objects of interest in large image databases (Fayyad, Djorgovski, and Weir 1996). Figure 3 shows a simple partitioning of the loan data into two class regions; note that it is not possible to separate the classes perfectly using a linear decision boundary. The bank might want to use the classification regions to automatically decide whether future loan applicants will be given a loan or not.

Figure 3. A Simple Linear Classification Boundary for the Loan Data Set. The shaded region denotes class no loan. (Axes: Income, Debt; regions: No Loan, Loan.)

Figure 4. A Simple Linear Regression for the Loan Data Set. (Axes: Income, Debt; fitted regression line.)

Regression is learning a function that maps a data item to a real-valued prediction variable.
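The sketch below illustrates regression as just defined, using a closed-form ordinary least-squares fit of total debt as a linear function of income, the kind of model drawn in figure 4. The data are hypothetical. Least squares is an assumed choice of goodness-of-fit criterion, since the article does not say how its regression line was fitted. The coefficient of determination R² is added as one common way to quantify how well, or how poorly, the line fits. `least_squares` and `r_squared` are illustrative names.

```python
from statistics import fmean


def least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical residuals."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance of ys explained by the fitted line (1 = perfect fit)."""
    my = fmean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical incomes and total debts, in thousands of dollars.
income = [15, 20, 22, 30, 35, 40, 45, 50, 55, 65, 80]
debt = [20, 5, 25, 35, 10, 12, 40, 15, 30, 20, 25]

slope, intercept = least_squares(income, debt)
r2 = r_squared(income, debt, slope, intercept)
print(f"debt = {slope:.3f} * income + {intercept:.2f}   (R^2 = {r2:.2f})")
```

On these hypothetical numbers R² comes out small, which is the situation described for figure 4: a weak correlation between income and debt leaves most of the variation in debt unexplained by the line.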
Regression applications are many, for example, predicting the amount of biomass present in a forest given remotely sensed microwave measurements, estimating the probability that a patient will survive given the results of a set of diagnostic tests, predicting consumer demand for a new product as a function of advertising expenditure, and predicting time series where the input variables can be time-lagged versions of the prediction variable. Figure 4 shows the result of simple linear regression where total debt is fitted as a linear function of income: The fit is poor because only a weak correlation exists between the two variables.

Clustering is a common descriptive task
where one seeks to identify a finite set of categories or clusters to describe the data (Jain and Dubes 1988; Titterington, Smith, and Makov 1985). The categories can be mutually exclusive and exhaustive or consist of a richer representation, such as hierarchical or overlapping categories. Examples of clustering applications in a knowledge discovery context include discovering homogeneous subpopulations for consumers in marketing databases and identifying subcategories of spectra from infrared sky measurements (Cheeseman and Stutz 1996). Figure 5 shows a possible clustering of the loan data set into three clusters; note that the clusters overlap, allowing data points to belong to more than one cluster. The original class labels (denoted by xs and os in the previous figures) have been replaced by a + to indicate that the class membership is no longer assumed known.

Closely related to clustering is the task of probability density estimation, which consists of techniques for estimating from data the joint multivariate probability density function of all the variables or fields in the database (Silverman 1986).

Summarization involves methods for finding a compact description for a subset of data. A simple example would be tabulating the mean and standard deviations for all fields. More sophisticated methods involve the derivation of summary rules (Agrawal et al. 1996), multivariate visualization techniques, and the discovery of functional relationships between variables (Zembowicz and Zytkow 1996). Summarization techniques are often applied to interactive exploratory data analysis and automated report generation.

Dependency modeling consists of finding a model that describes significant dependencies between variables. Dependency models exist at two levels: (1) the structural level of the model specifies (often in graphic form) which variables are locally dependent on each other and (2) the quantitative level of the model specifies the strengths of the dependencies using some numeric scale.
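The two levels of a dependency model can be sketched with a deliberately simple stand-in, under these assumptions:

* The data are hypothetical.
* Pearson correlation serves as the numeric scale.
* An arbitrary cutoff of |r| ≥ 0.5 decides which pairs count as dependent.

Under these assumptions, the quantitative level is the table of pairwise correlations, and the structural level is the set of variable pairs (graph edges) that pass the cutoff. This uses marginal correlation only. The probabilistic dependency networks discussed next rely on conditional independence, which this sketch does not test. `dependency_model` is an illustrative name.

```python
from itertools import combinations
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def dependency_model(table, cutoff=0.5):
    """Return (strengths, edges) for a dict mapping variable name -> list of values.

    strengths: quantitative level, the correlation of every pair of variables.
    edges: structural level, the pairs whose |correlation| reaches the cutoff.
    """
    strengths = {(a, b): pearson(table[a], table[b])
                 for a, b in combinations(sorted(table), 2)}
    edges = {pair for pair, r in strengths.items() if abs(r) >= cutoff}
    return strengths, edges


# Hypothetical records: income and spending in thousands of dollars, age in years.
table = {
    "income": [20, 35, 50, 65, 80, 95],
    "spending": [18, 30, 41, 55, 66, 80],
    "age": [34, 51, 29, 45, 38, 40],
}

strengths, edges = dependency_model(table)
for pair, r in strengths.items():
    print(pair, f"r = {r:+.2f}", "dependent" if pair in edges else "")
```

Here only income and spending end up linked. Changing `cutoff` changes the structure while leaving the quantitative strengths untouched, which is one way to see that the two levels are separate modeling decisions.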
Probabilistic dependency networks, for example, use conditional independence to specify the structural aspect of the model and probabilities or correlations to specify the strengths of the dependencies (Glymour et al. 1987; Heckerman 1996). These networks are increasingly finding applications in areas as diverse as the development of probabilistic medical expert systems from databases, information retrieval, and modeling of the human genome.

Change and deviation detection focuses on discovering the most significant changes in the data from previously measured or normative values (Berndt and Clifford 1996; Guyon, Matic, and Vapnik 1996; Kloesgen 1996; Matheus, Piatetsky-Shapiro, and McNeill 1996; Basseville and Nikiforov 1993).

The Components of Data-Mining Algorithms

The next step is to construct specific algorithms to implement the general methods we outlined. One can identify three primary components in any data-mining algorithm: (1) model representation, (2) model evaluation, and (3) search. This reductionist view is not necessarily complete or fully encompassing; rather, it is a convenient way to express the key concepts of data-mining algorithms in a relatively unified and compact manner. Cheeseman (1990) outlines a similar structure.

Model representation is the language used to describe discoverable patterns. If the representation is too limited, then no amount of training time or examples can produce an accurate model for the data. It is important that a data analyst fully comprehend the representational assumptions that might be inherent in a particular method. It is equally important that an algorithm designer clearly state which representational assumptions are being made by a particular algorithm. Note that increased representational power for models increases the danger of overfitting the training data, resulting in reduced prediction accuracy on unseen data.

Figure 5. A Simple Clustering of the Loan Data Set into Three Clusters. Note that the original labels are replaced by a single common symbol.

Model-evaluation criteria are quantitative statements (or fit functions) of how well a particular pattern (a model and its parameters) meets the goals of the KDD process. For example, predictive models are often judged by the empirical prediction accuracy on some test set. Descriptive models can be evaluated along the dimensions of predictive accuracy, novelty, utility, and understandability of the fitted model.

The search method consists of two components: (1) parameter search and (2) model search. Once the model representation (or family of representations) and the model-evaluation criteria are fixed, the data-mining problem has been reduced to purely an optimization task: Find the parameters and models from the selected family that optimize the evaluation criteria. In parameter search, the algorithm must search for the parameters that optimize the model-evaluation criteria given observed data and a fixed model representation. Model search occurs as a loop over the parameter-search method: The model representation is changed so that a family of models is considered.

Some Data-Mining Methods

A wide variety of data-mining methods exist, but here, we only focus on a subset of popular techniques. Each method is discussed in the context of model representation, model evaluation, and search.

Decision Trees and Rules

Decision trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend. However, the restriction to a particular tree or rule representation can significantly restrict the functional form (and, thus, the approximation power) of the model. For example, figure 6 illustrates the effect of a threshold split applied to the income variable for a loan data set: It is clear that using such simple threshold splits (parallel to the feature axes) severely limits the type of classification boundaries that can be induced.

Figure 6. Using a Single Threshold on the Income Variable to Try to Classify the Loan Data Set.
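A minimal sketch of a single-threshold classifier like the one in figure 6 shows the three components at work. The records are invented for illustration (they are not the article's loan data); the representation is the rule "if income > t then loan", the evaluation criterion is training-set accuracy, and the search is an exhaustive scan over thresholds placed midway between consecutive distinct income values.

```python
from typing import List, Tuple

# Invented (income, debt, label) records for illustration; label 1 = loan, 0 = no loan.
# As in figure 6, only income is used by the rule; debt is carried along unused.
DATA: List[Tuple[float, float, int]] = [
    (20.0, 15.0, 0),
    (25.0, 30.0, 0),
    (32.0, 10.0, 0),
    (38.0, 25.0, 1),
    (41.0, 40.0, 0),
    (47.0, 20.0, 1),
    (55.0, 35.0, 1),
    (60.0, 12.0, 1),
]


def accuracy(threshold: float, data: List[Tuple[float, float, int]]) -> float:
    """Evaluation: fraction of records the rule 'income > threshold => loan' gets right."""
    correct = sum((income > threshold) == bool(label) for income, _, label in data)
    return correct / len(data)


def candidate_thresholds(data: List[Tuple[float, float, int]]) -> List[float]:
    """Representation space: midpoints between consecutive distinct income values."""
    incomes = sorted({income for income, _, _ in data})
    return [(lo + hi) / 2 for lo, hi in zip(incomes, incomes[1:])]


def best_threshold(data: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Search: score every candidate and keep the first one with the highest accuracy."""
    best = max(candidate_thresholds(data), key=lambda t: accuracy(t, data))
    return best, accuracy(best, data)


threshold, acc = best_threshold(DATA)
print(threshold, acc)  # 35.0 0.875
```

On these records the best threshold is 35 with accuracy 0.875 (a tie with 44 goes to the first candidate found). The record with income 41 and no loan sits between loan holders at 38 and 47, so no single axis-parallel cut can classify every record correctly.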
If one enlarges the model space to allow more general expressions (such as multivariate hyperplanes at arbitrary angles), then the model is more powerful for prediction but can be much more difficult to comprehend. A large number of decision tree and rule-induction algorithms are described in the machine-learning and applied statistics literature (Quinlan 1992; Breiman et al. 1984). To a large extent, they depend on likelihood-based model-evaluation methods, with varying degrees of sophistication in terms of penalizing model complexity. Greedy search methods, which involve growing and pruning rule and tree structures, are typically used to explore the superexponential space of possible models. Trees and rules are primarily used for predictive modeling, both for classification (Apte and Hong 1996; Fayyad, Djorgovski, and Weir 1996) and regression, although they can also be applied to summary descriptive modeling (Agrawal et al. 1996).

Nonlinear Regression and Classification Methods

These methods consist of a family of techniques for prediction that fit linear and nonlinear combinations of basis functions (sigmoids, splines, polynomials) to combinations of the input variables. Examples include feedforward neural networks, adaptive spline methods, and projection pursuit regression (see Elder and Pregibon [1996], Cheng and Titterington [1994], and Friedman [1989] for more detailed discussions). Consider neural networks, for example. Figure 7 illustrates the type of nonlinear decision boundary that a neural network might find for the loan data set. In terms of model evaluation, although networks of the appropriate size can universally approximate any smooth function to any desired degree of accuracy, relatively little is known about the representation properties of fixed-size networks estimated from finite data sets. Also, the standard squared error and cross-entropy loss functions used to train neural networks can be viewed as negative log-likelihood functions for regression and classification, respectively (Ripley 1994; Geman, Bienenstock, and Doursat 1992). Back propagation is a parameter-search method that performs gradient descent on this loss in parameter (weight) space, which amounts to seeking a local maximum of the likelihood function, starting from random initial conditions. Nonlinear regression methods, although strong in representational power, can be difficult to interpret. For example, although the classification boundaries of figure 7 might be more accurate than the simple threshold boundary of figure 6, the threshold boundary has the advantage that the model can be expressed, to some degree of certainty, as a simple rule of the form if income is greater than threshold, then loan will have good status.

Example-Based Methods

The representation is simple: Use representative examples from the database to approximate a model; that is, predictions on new examples are derived from the properties of similar examples in the model whose prediction is known. Techniques include nearest-neighbor classification and regression algorithms (Dasarathy 1991) and case-based reasoning systems (Kolodner 1993). Figure 8 illustrates the use of a nearest-neighbor classifier for the loan data set: The class at any new point in the two-dimensional space is the same as the class of the closest point in the original training data set. A potential disadvantage of example-based methods (compared with tree-based methods) is that a well-defined distance metric for evaluating the distance between data points is required. For the loan data in figure 8, this would not be a problem because income and debt are measured in the same units. However, if one wished to include variables such as the duration of the loan, sex, and profession, then it would require more effort to define a sensible metric between the variables.
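The distance-metric issue can be seen in a minimal 1-nearest-neighbor sketch. The two training records, the query, and the third field (a loan duration in months, standing in for the kind of extra variable mentioned above) are all hypothetical, and dividing each field by its range is only one of several common ways to put fields on a comparable scale.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[Tuple[float, ...], str]

# Hypothetical training records: (income in $1000s, debt in $1000s, duration in months).
TRAIN: List[Record] = [
    ((55.0, 10.0, 12.0), "loan"),
    ((25.0, 30.0, 60.0), "no loan"),
]


def distance(a: Sequence[float], b: Sequence[float], scale: Sequence[float]) -> float:
    """Euclidean distance after dividing each field's difference by its scale factor."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))


def nearest_neighbor(query: Sequence[float], train: List[Record], scale: Sequence[float]) -> str:
    """1-NN rule: return the label of the closest training record."""
    _, label = min(train, key=lambda rec: distance(query, rec[0], scale))
    return label


def field_ranges(train: List[Record]) -> List[float]:
    """Per-field range (max - min); 1.0 is used for a constant field to avoid dividing by 0."""
    columns = zip(*(features for features, _ in train))
    return [(max(col) - min(col)) or 1.0 for col in columns]


query = (52.0, 12.0, 60.0)
raw = nearest_neighbor(query, TRAIN, scale=[1.0, 1.0, 1.0])
scaled = nearest_neighbor(query, TRAIN, scale=field_ranges(TRAIN))
print(raw, "|", scaled)  # no loan | loan
```

With raw units, the 48-month gap in duration outweighs the income and debt differences, so the query is labeled "no loan"; after range scaling, income and debt count for as much as duration and the label flips to "loan". Neither answer is privileged: the prediction depends on a metric that the analyst has to choose.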
Model evaluation is typically based on cross-validation estimates (Weiss and Kulikowski 1991) of prediction error: Parameters of the model to be estimated can include the number of neighbors to use for prediction and the distance metric itself. Like nonlinear regression methods, example-based methods are often asymptotically powerful in terms of approximation properties but, conversely, can be difficult to interpret because the model is implicit in the data and not explicitly formulated. Related techniques include kernel-density estimation (Silverman 1986) and mixture modeling (Titterington, Smith, and Makov 1985).

Figure 7. An Example of Classification Boundaries Learned by a Nonlinear Classifier (Such as a Neural Network) for the Loan Data Set.

Figure 8. Classification Boundaries for a Nearest-Neighbor Classifier for the Loan Data Set.

Probabilistic Graphic Dependency Models

Graphic models specify probabilistic dependencies using a graph structure (Whittaker 1990; Pearl 1988). In its simplest form, the model specifies which variables are directly dependent on each other. Typically, these models are used with categorical or discrete-valued variables, but extensions to special cases, such as Gaussian densities, for real-valued variables are also possible. Within the AI and statistical communities, these models were initially developed within the framework of probabilistic expert systems; the structure of the model and the parameters (the conditional probabilities attached to the links of the graph) were elicited from experts. Recently, there has been significant work in both the AI and statistical communities on methods whereby both the structure and the parameters of graphic models can be learned directly from databases (Buntine 1996; Heckerman 1996). Model-evaluation criteria are typically Bayesian in form, and parameter estimation can be a mixture of closed-form estimates and iterative methods depending on whether a variable is directly observed or hidden. Model search can consist of greedy hill-climbing methods over various graph structures. Prior knowledge, such as a partial ordering of the variables based on causal relations, can be useful in terms of reducing the model search space. Although still primarily in the research phase, graphic model induction methods are of particular interest to KDD because the graphic form of the model lends itself easily to human interpretation.

Relational Learning Models

Although decision trees and rules have a representation restricted to propositional logic, relational learning (also known as inductive logic programming) uses the more flexible pattern language of first-order logic. A relational learner can easily find formulas such as X = Y. Most research to date on model-evaluation methods for relational learning is logical in nature. The extra representational power of relational models comes at the price of significant computational demands in terms of search. See Dzeroski (1996) for a more detailed discussion.

Discussion

Given the broad spectrum of data-mining methods and algorithms, our overview is inevitably limited in scope; many data-mining techniques, particularly specialized methods for particular types of data and domains, were not mentioned specifically. We believe the general discussion on data-mining tasks and components has general relevance to a variety of methods. For example, consider time-series prediction, which traditionally has been cast as a predictive regression task (autoregressive models, and so on). Recently, more general models have been developed for time-series applications, such as nonlinear basis functions, example-based models, and kernel methods. Furthermore, there has been significant interest in descriptive graphic and local data modeling of time series rather than purely predictive modeling (Weigend and Gershenfeld 1993). Thus, although different algorithms and applications might appear different on the surface, it is not uncommon to find that they share many common components. Understanding data mining and model induction at this component level clarifies the behavior of any data-mining algorithm and makes it easier for the user to understand its overall contribution and applicability to the KDD process.

An important point is that each technique typically suits some problems better than others. For example, decision tree classifiers can be useful for finding structure in high-dimensional spaces and in problems with mixed continuous and categorical data (because tree methods do not require distance metrics).
However, classification trees might not be suitable for problems where the true decision boundaries between classes are described by a second-order polynomial (for example). Thus, there is no universal data-mining method, and choosing a particular algorithm for a particular application is something of an art. In practice, a large portion of the application effort can go into properly formulating the problem (asking the right question) rather than into optimizing the algorithmic details of a particular data-mining method (Langley and Simon 1995; Hand 1994).

Because our discussion and overview of data-mining methods have been brief, we want to make two important points clear: First, our overview of automated search focused mainly on automated methods for extracting patterns or models from data. Although this approach is consistent with the definition we gave earlier, it does not necessarily represent what other communities might refer to as data mining. For example, some use the term to designate any manual search of the data or search assisted by queries to a database management system or to refer to humans visualizing patterns in data. In other communities, it is used to refer to the automated correlation of data from transactions or the automated generation of transaction reports. We chose to focus only on methods that contain certain degrees of search autonomy.

Second, beware the hype: The state of the art in automated methods in data mining is still in a fairly early stage of development. There are no established criteria for deciding which methods to use in which circumstances, and many of the approaches are based on crude heuristic approximations to avoid the expensive search required to find optimal, or even good, solutions. Hence, the reader should be careful when confronted with overstated claims about the great ability of a system to mine useful information from large (or even small) databases.

Application Issues

For a survey of KDD applications as well as detailed examples, see Piatetsky-Shapiro et al. (1996) for industrial applications and Fayyad, Haussler, and Stolorz (1996) for applications in science data analysis. Here, we examine criteria for selecting potential applications, which can be divided into practical and technical categories. The practical criteria for KDD projects are similar to those for other applications of advanced technology and include the potential impact of an application, the absence of simpler alternative solutions, and strong organizational support for using the technology. For applications dealing with personal data, one should also consider the privacy and legal issues (Piatetsky-Shapiro 1995).

The technical criteria include considerations such as the availability of sufficient data (cases). In general, the more fields there are and the more complex the patterns being sought, the more data are needed. However, strong prior knowledge (see discussion later) can reduce the number of needed cases significantly. Another consideration is the relevance of attributes.
It is important to have data attributes that are relevant to the discovery task; no amount of data will allow prediction based on attributes that do not capture the required information. Low noise levels (few data errors) are another consideration. High amounts of noise make it hard to identify patterns unless a large number of cases can mitigate random noise and help clarify the aggregate patterns. Changing and time-oriented data, although making the application development more difficult, make the application potentially much more useful because it is easier to retrain a system than a human. Finally, prior knowledge is perhaps one of the most important considerations. It is useful to know something about the domain: what are the important fields, what are the likely relationships, what is the user utility function, what patterns are already known, and so on.

Research and Application Challenges

We outline some of the current primary research and application challenges for KDD. This list is by no means exhaustive and is intended to give the reader a feel for the types of problems that KDD practitioners wrestle with.

Larger databases: Databases with hundreds of fields and tables, millions of records, and a multigigabyte size are commonplace, and terabyte (10^12 bytes) databases are beginning to appear. Methods for dealing with large data volumes include more efficient algorithms (Agrawal et al. 1996), sampling, approximation, and massively parallel processing (Holsheimer et al. 1996).

High dimensionality: Not only is there often a large number of records in the database, but there can also be a large number of fields (attributes, variables); so, the dimensionality of the problem is high. A high-dimensional data set creates problems in terms of increasing the size of the search space for model induction in a combinatorially explosive manner. In addition, it increases the chances that a data-mining algorithm will find spurious patterns that are not valid in general.
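A small simulation, offered as an illustration rather than a method from the text, shows how more fields invite spurious patterns. The record and field counts are arbitrary choices; every field and the target are independent random noise, so any correlation found is spurious. The sketch reports the strongest absolute correlation found when searching over the first 5, 50, and 1,000 fields of one fixed random table; because the searches are nested, the maximum can only stay the same or grow as more fields are searched.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

N_RECORDS = 50     # hypothetical number of records
MAX_FIELDS = 1000  # hypothetical number of candidate fields

# The target and every field are independent Gaussian noise: no real pattern exists.
target = [random.gauss(0.0, 1.0) for _ in range(N_RECORDS)]
fields = [[random.gauss(0.0, 1.0) for _ in range(N_RECORDS)] for _ in range(MAX_FIELDS)]


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


abs_corr = [abs(pearson(field, target)) for field in fields]

# Strongest spurious correlation found when the search covers the first d fields.
best = {d: max(abs_corr[:d]) for d in (5, 50, 1000)}
for d, r in best.items():
    print(f"{d:5d} fields searched -> max |r| = {r:.2f}")
```

Exact values depend on the random seed, but with only 50 records the best correlation found among 1,000 noise fields is typically much larger than the best among 5, large enough to look like a pattern if the size of the search were ignored.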
Approaches to this problem include methods to reduce the effective dimensionality of the problem and the use of prior knowledge to identify irrelevant variables.

Overfitting: When the algorithm searches for the best parameters for one particular model using a limited set of data, it can model not only the general patterns in the data but also any noise specific to the data set, resulting in poor performance of the model on test data. Possible solutions include cross-validation, regularization, and other sophisticated statistical strategies.

Assessing statistical significance: A problem (related to overfitting) occurs when the system is searching over many possible models. For example, if a system tests N models at the 0.001 significance level, then on average, with purely random data, N/1000 of these models will be accepted as significant.
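The N/1000 figure can be checked with a short simulation. It relies on a standard fact rather than on any particular test: when a valid test with a continuous test statistic is applied to data with no real effect, its p-value is uniformly distributed on [0, 1]. The sketch therefore draws such p-values directly for a hypothetical N = 100,000 models; dividing the 0.001 level by N (a Bonferroni adjustment) is included for comparison.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

N_MODELS = 100_000  # hypothetical number of models tested (N in the text)
ALPHA = 0.001       # per-test significance level

# With no real effect and a continuous test statistic, a valid test's p-value is
# uniform on [0, 1], so null p-values are sampled directly instead of running tests.
p_values = [random.random() for _ in range(N_MODELS)]

naive = sum(p < ALPHA for p in p_values)                  # about N/1000 expected
bonferroni = sum(p < ALPHA / N_MODELS for p in p_values)  # about 0.001 expected in total

print("expected false positives:", N_MODELS / 1000)
print("accepted at the 0.001 level:", naive)
print("accepted after Bonferroni adjustment:", bonferroni)
```

The count accepted at the 0.001 level lands near N/1000 = 100 even though every model is pure noise, while the Bonferroni-adjusted count is almost always 0.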

This point is frequently missed in initial attempts at KDD. One way to deal with this problem is to use methods that adjust the test statistic as a function of the search, for example, Bonferroni adjustments for independent tests or randomization testing.

Changing data and knowledge: Rapidly changing (nonstationary) data can make previously discovered patterns invalid. In addition, the variables measured in a given application database can be modified, deleted, or augmented with new measurements over time. Possible solutions include incremental methods for updating the patterns and treating change as an opportunity for discovery by using it to cue the search for patterns of change only (Matheus, Piatetsky-Shapiro, and McNeill 1996). See also Agrawal and Psaila (1995) and Mannila, Toivonen, and Verkamo (1995).

Missing and noisy data: This problem is especially acute in business databases. U.S. census data reportedly have error rates as great as 20 percent in some fields. Important attributes can be missing if the database was not designed with discovery in mind. Possible solutions include more sophisticated statistical strategies to identify hidden variables and dependencies (Heckerman 1996; Smyth et al. 1996).

Complex relationships between fields: Hierarchically structured attributes or values, relations between attributes, and more sophisticated means for representing knowledge about the contents of a database will require algorithms that can effectively use such information. Historically, data-mining algorithms have been developed for simple attribute-value records, although new techniques for deriving relations between variables are being developed (Dzeroski 1996; Djoko, Cook, and Holder 1995).

Understandability of patterns: In many applications, it is important to make the discoveries more understandable by humans. Possible solutions include graphic representations (Buntine 1996; Heckerman 1996), rule structuring, natural language generation, and techniques for visualization of data and knowledge.
Rule-refinement strategies (for example, Major and Mangano [1995]) can be used to address a related problem: The discovered knowledge might be implicitly or explicitly redundant.

User interaction and prior knowledge: Many current KDD methods and tools are not truly interactive and cannot easily incorporate prior knowledge about a problem except in simple ways. The use of domain knowledge is important in all the steps of the KDD process. Bayesian approaches (for example, Cheeseman [1990]) use prior probabilities over data and distributions as one form of encoding prior knowledge. Others employ deductive database capabilities to discover knowledge that is then used to guide the data-mining search (for example, Simoudis, Livezey, and Kerber [1995]).

Integration with other systems: A stand-alone discovery system might not be very useful. Typical integration issues include integration with a database management system (for example, through a query interface), integration with spreadsheets and visualization tools, and accommodation of real-time sensor readings. Examples of integrated KDD systems are described by Simoudis, Livezey, and Kerber (1995) and Stolorz, Nakamura, Mesrobian, Muntz, Shek, Santos, Yi, Ng, Chien, Mechoso, and Farrara (1995).

Concluding Remarks: The Potential Role of AI in KDD

In addition to machine learning, other AI fields can potentially contribute significantly to various aspects of the KDD process. We mention a few examples of these areas here:

Natural language presents significant opportunities for mining in free-form text, especially for automated annotation and indexing prior to classification of text corpora. Limited parsing capabilities can help substantially in the task of deciding what an article refers to. Hence, the spectrum from simple natural language processing all the way to language understanding can help substantially. Also, natural language processing can contribute significantly as an effective interface for stating hints to mining algorithms and visualizing and explaining knowledge derived by a KDD system.
Planning considers a complicated data analysis process. It involves conducting complicated data-access and data-transformation operations; applying preprocessing routines; and, in some cases, paying attention to resource and data-access constraints. Typically, data processing steps are expressed in terms of desired postconditions and preconditions for the application of certain routines, which lends itself easily to representation as a planning problem. In addition, planning ability can play an important role in automated agents (see next item) to collect data samples or conduct a search to obtain needed data sets.

Intelligent agents can be fired off to collect necessary information from a variety of sources. In addition, information agents can be activated remotely over the network or can trigger on the occurrence of a certain event and start an analysis operation. Finally, agents can help navigate and model the World-Wide Web (Etzioni 1996), another area growing in importance.

Uncertainty in AI includes issues for managing uncertainty, proper inference mechanisms in the presence of uncertainty, and the reasoning about causality, all fundamental to KDD theory and practice. In fact, the KDD-96 conference had a joint session with the UAI-96 conference this year (Horvitz and Jensen 1996).

Knowledge representation includes ontologies and new concepts for representing, storing, and accessing knowledge. Also included are schemes for representing knowledge and allowing the use of prior human knowledge about the underlying process by the KDD system.

These potential contributions of AI are but a sampling; many others, including human-computer interaction, knowledge-acquisition techniques, and the study of mechanisms for reasoning, have the opportunity to contribute to KDD.

In conclusion, we presented some definitions of basic notions in the KDD field. Our primary aim was to clarify the relation between knowledge discovery and data mining. We provided an overview of the KDD process and basic data-mining methods. Given the broad spectrum of data-mining methods and algorithms, our overview is inevitably limited in scope: There are many data-mining techniques, particularly specialized methods for particular types of data and domains. Although various algorithms and applications might appear quite different on the surface, it is not uncommon to find that they share many common components. Understanding data mining and model induction at this component level clarifies the task of any data-mining algorithm and makes it easier for the user to understand its overall contribution and applicability to the KDD process. This article represents a step toward a common framework that we hope will ultimately provide a unifying vision of the common overall goals and methods used in KDD.
We hope this will eventually lead to a better understanding of the variety of approaches in this multidisciplinary field and how they fit together.

Acknowledgments

We thank Sam Uthurusamy, Ron Brachman, and KDD-96 referees for their valuable suggestions and ideas.

Note

1. Throughout this article, we use the term pattern to designate a pattern found in data. We also refer to models. One can think of patterns as components of models, for example, a particular rule in a classification model or a linear component in a regression model.

References

Agrawal, R., and Psaila, G. 1995. Active Data Mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), 3–8. Menlo Park, Calif.: American Association for Artificial Intelligence.
Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; and Verkamo, I. 1996. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Apte, C., and Hong, S. J. 1996. Predicting Equity Returns from Securities Data with Minimal Rule Generation. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Basseville, M., and Nikiforov, I. V. 1993. Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, N.J.: Prentice Hall.
Berndt, D., and Clifford, J. 1996. Finding Patterns in Time Series: A Dynamic Programming Approach. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Berry, J. Database Marketing. Business Week, September 5.
Brachman, R., and Anand, T. 1996. The Process of Knowledge Discovery in Databases: A Human-Centered Approach. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 37–58. Menlo Park, Calif.: AAAI Press.
Breiman, L.; Friedman, J. H.; Olshen, R. A.; and Stone, C. J. 1984. Classification and Regression Trees. Belmont, Calif.: Wadsworth.
Brodley, C. E., and Smyth, P. Applying Classification Algorithms in Practice. Statistics and Computing. Forthcoming.
Buntine, W. 1996. Graphical Models for Discovering Knowledge. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Cheeseman, P. 1990. On Finding the Most Probable Model. In Computational Models of Scientific Discovery and Theory Formation, eds. J. Shrager and P. Langley. San Francisco, Calif.: Morgan Kaufmann.
Cheeseman, P., and Stutz, J. 1996. Bayesian Classification (AUTOCLASS): Theory and Results. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Cheng, B., and Titterington, D. M. 1994. Neural Networks: A Review from a Statistical Perspective. Statistical Science 9(1).
Codd, E. F. Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate. E. F. Codd and Associates.
Dasarathy, B. V. 1991. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Washington, D.C.: IEEE Computer Society.
Djoko, S.; Cook, D.; and Holder, L. 1995. Analyzing the Benefits of Domain Knowledge in Substructure Discovery. In Proceedings of KDD-95: First International Conference on Knowledge Discovery and Data Mining. Menlo Park, Calif.: American Association for Artificial Intelligence.
Dzeroski, S. 1996. Inductive Logic Programming for Knowledge Discovery in Databases. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Elder, J., and Pregibon, D. 1996. A Statistical Perspective on KDD. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Etzioni, O. 1996. The World Wide Web: Quagmire or Gold Mine? Communications of the ACM (Special Issue on Data Mining). November 1996. Forthcoming.
Fayyad, U. M.; Djorgovski, S. G.; and Weir, N. 1996. From Digitized Images to On-Line Catalogs: Data Mining a Sky Survey. AI Magazine 17(2).
Fayyad, U. M.; Haussler, D.; and Stolorz, Z. 1996. KDD for Science Data Analysis: Issues and Examples. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). Menlo Park, Calif.: American Association for Artificial Intelligence.
Fayyad, U. M.; Piatetsky-Shapiro, G.; and Smyth, P. 1996. From Data Mining to Knowledge Discovery: An Overview. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Fayyad, U. M.; Piatetsky-Shapiro, G.; Smyth, P.; and Uthurusamy, R. 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park, Calif.: AAAI Press.
Friedman, J. H. 1989. Multivariate Adaptive Regression Splines. Annals of Statistics 19.
Geman, S.; Bienenstock, E.; and Doursat, R. 1992. Neural Networks and the Bias/Variance Dilemma. Neural Computation 4:1–58.
Glymour, C.; Madigan, D.; Pregibon, D.; and Smyth, P. 1996. Statistics and Data Mining. Communications of the ACM (Special Issue on Data Mining). November 1996. Forthcoming.
Glymour, C.; Scheines, R.; Spirtes, P.; and Kelly, K. 1987. Discovering Causal Structure. New York: Academic.
Guyon, O.; Matic, N.; and Vapnik, N. 1996. Discovering Informative Patterns and Data Cleaning. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Hall, J.; Mani, G.; and Barr, D. 1996. Applying Computational Intelligence to the Investment Process. In Proceedings of CIFER-96: Computational Intelligence in Financial Engineering. Washington, D.C.: IEEE Computer Society.
Hand, D. J. 1994. Deconstructing Statistical Questions. Journal of the Royal Statistical Society A 157(3).
Hand, D. J. Discrimination and Classification. Chichester, U.K.: Wiley.
Heckerman, D. 1996. Bayesian Networks for Knowledge Discovery. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Hernandez, M., and Stolfo, S. 1995. The MERGE-PURGE Problem for Large Databases. In Proceedings of the 1995 ACM-SIGMOD Conference. New York: Association for Computing Machinery.
Holsheimer, M.; Kersten, M. L.; Mannila, H.; and Toivonen, H. 1996. Data Surveyor: Searching the Nuggets in Parallel. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Horvitz, E., and Jensen, F. 1996. Proceedings of the Twelfth Conference of Uncertainty in Artificial Intelligence. San Mateo, Calif.: Morgan Kaufmann.
Jain, A. K., and Dubes, R. C. 1988. Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice-Hall.
Kloesgen, W. 1996. A Multipattern and Multistrategy Discovery Assistant. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Kloesgen, W., and Zytkow, J. 1996. Knowledge Discovery in Databases Terminology. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Kolodner, J. 1993. Case-Based Reasoning. San Francisco, Calif.: Morgan Kaufmann.
Langley, P., and Simon, H. A. 1995. Applications of Machine Learning and Rule Induction. Communications of the ACM 38.
Major, J., and Mangano, J. 1995. Selecting among Rules Induced from a Hurricane Database. Journal of Intelligent Information Systems 4(1).
Manago, M., and Auriol, M. Mining for OR. ORMS Today (Special Issue on Data Mining), February.
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1995. Discovering Frequent Episodes in Sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95). Menlo Park, Calif.: American Association for Artificial Intelligence.
Matheus, C.; Piatetsky-Shapiro, G.; and McNeill, D. 1996. Selecting and Reporting What Is Interesting: The KEFIR Application to Healthcare Data. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. San Francisco, Calif.: Morgan Kaufmann.
Piatetsky-Shapiro, G. 1995. Knowledge Discovery in Personal Data versus Privacy: A Mini-Symposium. IEEE Expert 10(5).
Piatetsky-Shapiro, G. Knowledge Discovery in Real Databases: A Report on the IJCAI-89 Workshop. AI Magazine 11(5).
Piatetsky-Shapiro, G., and Matheus, C. 1994. The Interestingness of Deviations. In Proceedings of KDD-94, eds. U. M. Fayyad and R. Uthurusamy. Technical Report WS-03. Menlo Park, Calif.: AAAI Press.
Piatetsky-Shapiro, G.; Brachman, R.; Khabaza, T.; Kloesgen, W.; and Simoudis, E. 1996. An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), eds. J. Han and E. Simoudis. Menlo Park, Calif.: American Association for Artificial Intelligence.
Quinlan, J. 1992. C4.5: Programs for Machine Learning. San Francisco, Calif.: Morgan Kaufmann.
Ripley, B. D. 1994. Neural Networks and Related Methods for Classification. Journal of the Royal Statistical Society B 56(3).
Senator, T.; Goldberg, H. G.; Wooton, J.; Cottini, M. A.; Umarkhan, A. F.; Klinger, C. D.; Llamas, W. M.; Marrone, M. P.; and Wong, R. W. H. The Financial Crimes Enforcement Network AI System (FAIS): Identifying Potential Money Laundering from Reports of Large Cash Transactions. AI Magazine 16(4).
Shrager, J., and Langley, P., eds. 1990. Computational Models of Scientific Discovery and Theory Formation. San Francisco, Calif.: Morgan Kaufmann.
Silberschatz, A., and Tuzhilin, A. 1995. On Subjective Measures of Interestingness in Knowledge Discovery. In Proceedings of KDD-95: First International Conference on Knowledge Discovery and Data Mining. Menlo Park, Calif.: American Association for Artificial Intelligence.
Silverman, B. 1986. Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.
Simoudis, E.; Livezey, B.; and Kerber, R. 1995. Using Recon for Data Cleaning. In Proceedings of KDD-95: First International Conference on Knowledge Discovery and Data Mining. Menlo Park, Calif.: American Association for Artificial Intelligence.
Smyth, P.; Burl, M.; Fayyad, U.; and Perona, P. 1996. Modeling Subjective Uncertainty in Image Annotation. In Advances in Knowledge Discovery and Data Mining. Menlo Park, Calif.: AAAI Press.
Spirtes, P.; Glymour, C.; and Scheines, R. Causation, Prediction, and Search. New York: Springer-Verlag.
Stolorz, P.; Nakamura, H.; Mesrobian, E.; Muntz, R.; Shek, E.; Santos, J.; Yi, J.; Ng, K.; Chien, S.; Mechoso, C.; and Farrara, J. 1995. Fast Spatio-Temporal Data Mining of Large Geophysical Datasets. In Proceedings of KDD-95: First International Conference on Knowledge Discovery and Data Mining. Menlo Park, Calif.: American Association for Artificial Intelligence.
Titterington, D. M.; Smith, A. F. M.; and Makov, U. E. 1985. Statistical Analysis of Finite-Mixture Distributions. Chichester, U.K.: Wiley.
U.S. News. Basketball's New High-Tech Guru: IBM Software Is Changing Coaches' Game Plans. U.S. News and World Report, 11 December.
Weigend, A., and Gershenfeld, N., eds. 1993. Predicting the Future and Understanding the Past. Redwood City, Calif.: Addison-Wesley.
Weiss, S. I., and Kulikowski, C. 1991. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Networks, Machine Learning, and Expert Systems. San Francisco, Calif.: Morgan Kaufmann.
Whittaker, J. 1990. Graphical Models in Applied Multivariate Statistics. New York: Wiley.
Zembowicz, R., and Zytkow, J. 1996. From Contingency Tables to Various Forms of Knowledge in Databases. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park, Calif.: AAAI Press.
Usama Fayyad is a senior researcher at Microsoft Research. He received his Ph.D. in 1991 from the University of Michigan at Ann Arbor. Prior to joining Microsoft in 1996, he headed the Machine Learning Systems Group at the Jet Propulsion Laboratory (JPL), California Institute of Technology, where he developed data-mining systems for automated science data analysis. He remains affiliated with JPL as a distinguished visiting scientist. Fayyad received the JPL 1993 Lew Allen Award for Excellence in Research and the 1994 National Aeronautics and Space Administration Exceptional Achievement Medal. His research interests include knowledge discovery in large databases, data mining, machine-learning theory and applications, statistical pattern recognition, and clustering. He was program cochair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining). He is general chair of KDD-96, an editor in chief of the journal Data Mining and Knowledge Discovery, and coeditor of the 1996 AAAI Press book Advances in Knowledge Discovery and Data Mining.

Gregory Piatetsky-Shapiro is a principal member of the technical staff at GTE Laboratories and the principal investigator of the Knowledge Discovery in Databases (KDD) Project, which focuses on developing and deploying advanced KDD systems for business applications. Previously, he worked on applying intelligent front ends to heterogeneous databases. Piatetsky-Shapiro received several GTE awards, including GTE's highest technical achievement award for the KEFIR system for health-care data analysis. His research interests include intelligent database systems, dependency networks, and Internet resource discovery. Prior to GTE, he worked at Strategic Information developing financial database systems. Piatetsky-Shapiro received his M.S. in 1979 and his Ph.D. in 1984, both from New York University (NYU). His Ph.D. dissertation on self-organizing database systems received NYU awards as the best dissertation in computer science and in all natural sciences. Piatetsky-Shapiro organized and chaired the first three (1989, 1991, and 1993) KDD workshops and helped develop them into successful conferences (KDD-95 and KDD-96). He has also been on the program committees of numerous other conferences and workshops on AI and databases. He edited and coedited several collections on KDD, including two books, Knowledge Discovery in Databases (AAAI Press, 1991) and Advances in Knowledge Discovery in Databases (AAAI Press, 1996), and has many other publications in the areas of AI and databases. He is a coeditor in chief of the new Data Mining and Knowledge Discovery journal. Piatetsky-Shapiro founded and moderates the KDD Nuggets electronic newsletter (kdd@gte.com) and is the web master for Knowledge Discovery Mine (< ~kdd /index.html>).

Padhraic Smyth received a first-class-honors Bachelor of Engineering from the National University of Ireland in 1984 and an MSEE and a Ph.D. from the Electrical Engineering Department at the California Institute of Technology (Caltech) in 1985 and 1988, respectively. From 1988 to 1996, he was a technical group leader at the Jet Propulsion Laboratory (JPL). Since April 1996, he has been a faculty member in the Information and Computer Science Department at the University of California at Irvine. He is also currently a principal investigator at JPL (part-time) and is a consultant to private industry. Smyth received the Lew Allen Award for Excellence in Research at JPL in 1993 and has been awarded 14 National Aeronautics and Space Administration certificates for technical innovation since …. He was coeditor of the book Advances in Knowledge Discovery and Data Mining (AAAI Press, 1996). Smyth was a visiting lecturer in the Computational and Neural Systems and Electrical Engineering Departments at Caltech (1994) and regularly conducts tutorials on probabilistic learning algorithms at national conferences (including UAI-93, AAAI-94, CAIA-95, and IJCAI-95). He is general chair of the Sixth International Workshop on AI and Statistics, to be held in …. Smyth's research interests include statistical pattern recognition, machine learning, decision theory, probabilistic reasoning, information theory, and the application of probability and statistics in AI. He has published 16 journal papers, 10 book chapters, and 60 conference papers on these topics.

AI Magazine Volume 17 Number 3 (1996) (© AAAI)


More information

HP ExpertOne. HP2-T21: Administering HP Server Solutions. Table of Contents

HP ExpertOne. HP2-T21: Administering HP Server Solutions. Table of Contents HP ExpertOne HP2-T21: Administering HP Server Slutins Industry Standard Servers Exam preparatin guide Table f Cntents Overview 2 Why take the exam? 2 HP ATP Server Administratr V8 certificatin 2 Wh shuld

More information

This report provides Members with an update on of the financial performance of the Corporation s managed IS service contract with Agilisys Ltd.

This report provides Members with an update on of the financial performance of the Corporation s managed IS service contract with Agilisys Ltd. Cmmittee: Date(s): Infrmatin Systems Sub Cmmittee 11 th March 2015 Subject: Agilisys Managed Service Financial Reprt Reprt f: Chamberlain Summary Public Fr Infrmatin This reprt prvides Members with an

More information

Professional Leaders/Specialists

Professional Leaders/Specialists Psitin Prfile Psitin Lcatin Reprting t Jb family Band BI/Infrmatin Manager Wellingtn Prfessinal Leaders/Specialists Band I Date February 2013 1. POSITION PURPOSE The purpse f this psitin is t: Lead and

More information

Lean Continuous Process Improvement Training Strategy and Capacity Building Efforts at EPA

Lean Continuous Process Improvement Training Strategy and Capacity Building Efforts at EPA Lean Cntinuus Prcess Imprvement Training Strategy and Capacity Building Effrts at EPA July 1, 2015 Prepared by: United States Envirnmental Prtectin Agency Office f Plicy, Office f Strategic Envirnmental

More information

Fund Accounting Class II

Fund Accounting Class II Fund Accunting Class II BS&A Fund Accunting Class II Cntents Gvernmental Financial Reprting Mdel - Minimum GAAP Reprting Requirements... 1 MD&A (Management's Discussin and Analysis)... 1 Basic Financial

More information

Annuities and Senior Citizens

Annuities and Senior Citizens Illinis Insurance Facts Illinis Department f Insurance January 2010 Annuities and Senir Citizens Nte: This infrmatin was develped t prvide cnsumers with general infrmatin and guidance abut insurance cverages

More information

CONTRIBUTION TO T1 STANDARDS PROJECT. On Shared Risk Link Groups for diversity and risk assessment Sudheer Dharanikota, Raj Jain Nayna Networks Inc.

CONTRIBUTION TO T1 STANDARDS PROJECT. On Shared Risk Link Groups for diversity and risk assessment Sudheer Dharanikota, Raj Jain Nayna Networks Inc. Bulder, CO., March 26-28, 2001 /2001-098 CONTRIBUTION TO T1 STANDARDS PROJECT TITLE SOURCE PROJECT On Shared Risk Link Grups fr diversity and risk assessment Sudheer Dharanikta, Raj Jain Nayna Netwrks

More information

Online Learning Portal best practices guide

Online Learning Portal best practices guide Online Learning Prtal Best Practices Guide best practices guide This dcument prvides Micrsft Sftware Assurance Benefit Administratrs with best practices fr implementing e-learning thrugh the Micrsft Online

More information

Issue Brief. SBC Distribution Rules for Employer Sponsored Health Plans October 2012. Summary. Which Plans Are Required to Provide the SBC?

Issue Brief. SBC Distribution Rules for Employer Sponsored Health Plans October 2012. Summary. Which Plans Are Required to Provide the SBC? Issue Brief SBC Distributin Rules fr Emplyer Spnsred Health Plans Octber 2012 Summary The Affrdable Care Act (ACA) expands ERISA's disclsure requirements by requiring that a summary f benefits and cverage

More information

Importance and Contribution of Software Engineering to the Education of Informatics Professionals

Importance and Contribution of Software Engineering to the Education of Informatics Professionals Imprtance and Cntributin f Sftware Engineering t the Educatin f Infrmatics Prfessinals Dr. Tick, József Budapest Plytechnic, Hungary, tick@bmf.hu Abstract: As a result f the Blgna prcess a new frm f higher

More information

Symantec User Authentication Service Level Agreement

Symantec User Authentication Service Level Agreement Symantec User Authenticatin Service Level Agreement Overview and Scpe This Symantec User Authenticatin service level agreement ( SLA ) applies t Symantec User Authenticatin prducts/services, such as Managed

More information

Developing Expertise as Coaches of Teachers

Developing Expertise as Coaches of Teachers Develping Expertise as Caches f Teachers Presented by: Elaine M. Bukwiecki, Ed.D. Assciate Prfessr f Literacy Educatin Presented at: 11 th Internatinal Writing Acrss the Curriculum Cnference Savannah,

More information

ONGOING FEEDBACK AND PERFORMANCE MANAGEMENT. A. Principles and Benefits of Ongoing Feedback

ONGOING FEEDBACK AND PERFORMANCE MANAGEMENT. A. Principles and Benefits of Ongoing Feedback ONGOING FEEDBACK AND PERFORMANCE MANAGEMENT A. Principles and Benefits f Onging Feedback While it may seem like an added respnsibility t managers already "full plate," managers that prvide nging feedback

More information

HOW TO SELECT A LIFE INSURANCE COMPANY

HOW TO SELECT A LIFE INSURANCE COMPANY HOW TO SELECT A LIFE INSURANCE COMPANY There will prbably be hundreds f life insurance cmpanies t chse frm when yu decide t purchase a life insurance plicy. Hw d yu decide which ne? Mst cmpanies are quite

More information

Getting Started Guide

Getting Started Guide AnswerDash Resurces http://answerdash.cm Cntextual help fr sales and supprt Getting Started Guide AnswerDash is cmmitted t helping yu achieve yur larger business gals. The utlined pre-launch cnsideratins

More information

Recognition of Prior Learning (RPL) TAE40110 Certificate IV in Training and Assessment

Recognition of Prior Learning (RPL) TAE40110 Certificate IV in Training and Assessment Recgnitin f Prir Learning (RPL) TAE40110 Certificate IV in Training and Assessment What is RPL? RPL recgnises that yu may already have the skills and knwledge needed t meet natinal cmpetency standards.

More information

Using PayPal Website Payments Pro with ProductCart

Using PayPal Website Payments Pro with ProductCart Using PayPal Website Payments Pr with PrductCart Overview... 2 Abut PayPal Website Payments Pr & Express Checkut... 3 What is Website Payments Pr?... 3 Website Payments Pr and Website Payments Standard...

More information

March 2016 Group A Payment Issues: Missing Information-Loss Calculation letters ( MILC ) - deficiency resolutions: Outstanding appeals:

March 2016 Group A Payment Issues: Missing Information-Loss Calculation letters ( MILC ) - deficiency resolutions: Outstanding appeals: The fllwing tpics were discussed in the March 24, 2016 meeting with law firms representing VCF claimants. Grup A Payment Issues: We cntinue t fcus n paying Grup A claims in full and are meeting the schedule

More information

Secretary of Energy Steven Chu, U.S. Department of Energy. Acting Under Secretary David Sandalow, U.S. Department of Energy

Secretary of Energy Steven Chu, U.S. Department of Energy. Acting Under Secretary David Sandalow, U.S. Department of Energy T: Cc: Secretary f Energy Steven Chu, U.S. Department f Energy Acting Under Secretary David Sandalw, U.S. Department f Energy Frm: Steven Ashby, Deputy Directr fr Science & Technlgy, Pacific Nrthwest Natinal

More information

The AppSec How-To: Choosing a SAST Tool

The AppSec How-To: Choosing a SAST Tool The AppSec Hw-T: Chsing a SAST Tl Surce Cde Analysis Made Easy GIVEN THE WIDE RANGE OF SOURCE CODE ANALYSIS TOOLS, SECURITY PROFESSIONALS, AUDITORS AND DEVELOPERS ALIKE ARE FACED WITH THE QUESTION: Hw

More information

To achieve these objectives we will use a combination of lectures, cases, class discussion, and exercises.

To achieve these objectives we will use a combination of lectures, cases, class discussion, and exercises. 95-730 E-business Technlgy and Management Curse Descriptin The Internet, and assciated technlgies, are nw an established element f the IT prtfli f rganizatins in bth the public and private sectrs. Experiments

More information

Why Can t Johnny Encrypt? A Usability Evaluation of PGP 5.0 Alma Whitten and J.D. Tygar

Why Can t Johnny Encrypt? A Usability Evaluation of PGP 5.0 Alma Whitten and J.D. Tygar Class Ntes: February 2, 2006 Tpic: User Testing II Lecturer: Jeremy Hyland Scribe: Rachel Shipman Why Can t Jhnny Encrypt? A Usability Evaluatin f PGP 5.0 Alma Whitten and J.D. Tygar This article has three

More information