Data De-duplication from the data sets using Similarity functions
- Samuel Bennett
- 8 years ago
M. Chitrarupa 1, V. Muniraj Naidu 2
1 M.Tech Student, Department of CSE, Audisankara College of Engineering & Technology, Gudur, Nellore district, India
2 Assoc. Prof., Department of CSE, Audisankara College of Engineering & Technology, Gudur, Nellore district, India

Abstract: The fundamental issue of duplicate detection is that inexact duplicates in a database may refer to the same real-world object due to errors and missing data. Duplicate elimination is hard because duplicates are caused by different types of errors, such as typographical errors, missing values, abbreviations, and different representations of the same logical value. The recent popularity of merging data from multiple data sources has introduced the problem of duplicate records in databases. For instance, one can find two records that have different syntactic representations of data, yet describe the same real-world entity. Such duplicates weaken the databases that contain them and often negatively influence the results of data analysis. In this paper we use the Levenshtein edit-distance similarity function for strings.

Keywords: Duplicate detection, Similarity function: Edit-distance, K-means clustering, Syntactic and Entity.

1. Introduction

Data quality problems arise with the constantly increasing quantity of data stored in real-world databases, and data quality is assured by the vital data cleaning process. The fundamental element of data cleaning is usually termed duplicate record identification, that is, the process of identifying record pairs that signify the same entity (duplicate records). In this paper, we have developed a domain-independent approach to detect duplicates. Normally, organizations become conscious of disparities or inconsistencies while integrating data from diverse sources to implement a data warehouse. Such problems belong to the category called data heterogeneity.
Erroneous duplication of data occurs when information from diverse data sources that store overlapping information is integrated. Errors such as spelling mistakes, conflicting customs across data sources, and omitted fields normally exist in the records of large databases. Because the data sources are independent, they may follow practically incompatible standards, and this shows up in the data accepted at the data warehouse from external sources. In the training phase, the record-level similarity is specified by the feature vector used as input for training. The experimentation is performed on real-world datasets and the performance is evaluated with the evaluation metrics.

Data warehouses, which are archives of data gathered from numerous data sources, constitute the foundation of the majority of existing decision support applications and CRM (Customer Relationship Management). Preciseness of decision support analysis on data warehouses is critical because important business decisions are influenced by such analysis. Incoming data tuples from external sources need validation and refinement to provide high data quality. Data quality calls for an "error-free" procedure in the data warehouse, so data cleaning techniques are essential to improve the quality of data. Data cleaning, also called data cleansing or scrubbing, enhances the quality of data by identifying and eradicating errors and inconsistencies from the data [5]. It aims at enhancing the overall data compatibility by eliminating discrepancies in data contents and minimizing data repetition. Record
duplicates, omitted values, record and field resemblances, and duplicate eradications are detected by current data cleaning techniques [8], [4]. The duplicate record detection process identifies two or more records that signify one distinct real-world entity or object [2, 31]. The problem of duplicate detection is to find out, for all objects represented in the database, whether the same real-world object is represented by two or more distinct database entries. Record linkage, object identification, and record matching are alternative names for "duplicate detection". It is a heavily researched topic and has high importance in fields such as master data management, data warehousing and ETL (Extraction, Transformation and Loading), customer relationship management, and data integration [2]. The two innate problems that must be addressed by duplicate detection are quick detection of all duplicates in large data sets (efficiency) and proper determination of duplicates and non-duplicates (effectiveness) [9].

Clustering is the categorization of objects into diverse groups, or more exactly, the division of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, frequently proximity according to a defined distance measure. The process of splitting a database into a set of mutually exclusive subsets (blocks), such that matches do not occur across blocks, is commonly termed blocking. Blocking increases the efficiency of duplicate detection because it substantially speeds up the comparison process [12]. For example, classifying a set of people records based on the zip codes in the address fields avoids comparing records that have different zip codes [11]. Fingerprint-based and full text-based methods are two common types of duplicate detection methods [10].
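The zip-code blocking example above can be illustrated with a minimal sketch. The record layout, the `candidate_pairs` helper, and the sample values are hypothetical and are not taken from the paper; the sketch only shows how grouping by a blocking key restricts comparisons to pairs inside the same block.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key):
    """Group records by a blocking key and yield only within-block pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    for block in blocks.values():
        # Records in different blocks are never compared.
        yield from combinations(block, 2)


# Hypothetical people records (illustrative values only).
people = [
    {"id": 1, "name": "A. Kumar", "zip": "524101"},
    {"id": 2, "name": "A Kumar", "zip": "524101"},
    {"id": 3, "name": "B. Rao", "zip": "500072"},
    {"id": 4, "name": "B Rao", "zip": "500072"},
    {"id": 5, "name": "C. Das", "zip": "110001"},
]

pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(people, key=lambda r: r["zip"])]
all_pairs = len(list(combinations(people, 2)))
print(pairs, "instead of", all_pairs, "pairs")
```

With five records, an exhaustive comparison needs 10 pairs, while blocking on the zip code leaves 2 candidate pairs. The price is that duplicates whose blocking keys differ (for example because of a typo in the zip code) are never compared.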
Textual similarity, typically quantified using a similarity function [20] such as edit distance or cosine similarity, is utilized by most of the current approaches to determine whether two representations are duplicates. In this paper, we employ an efficient approach that detects duplicates by comparing pairs of records using attribute similarities. The system is trained to discriminate between pairs of records corresponding to duplicates and non-duplicates using the training vectors generated from these attribute similarities.

Our proposed approach has two phases: a training phase and a duplicate detection phase. In the training phase, the dataset contains labeled duplicates and non-duplicates. First, we consider a pair of duplicate or non-duplicate records and compute the similarity using three similarity measures to obtain the similarity distance between the two individual records. After that, we combine the similarity values to generate a feature vector. These two steps are repeated for all pairs of records in the training dataset to obtain a set of feature vectors. We then train the system on these feature vectors.

In the duplicate detection phase, K-means clustering is used to partition the dataset into small partitions based on some common features. This reduces the time taken for record comparison and so increases the efficiency of the duplicate detection process, because the similarity computations and other processes are performed only on the particular cluster in which a record falls. Thus, we identify the duplicate and non-duplicate records in the input dataset to improve the quality of our database. The experimentation is carried out on a real dataset containing duplicate records, and the results demonstrate that our approach improves the accuracy of duplicate detection.

2 Duplicate Record Detection Using Similarity Metrics

Duplicate detection that depends on the particular domain is complicated for two reasons.
First, duplicate representations are not identical, as they differ slightly in their values. Second, the theoretically required comparison of all pairs of records is impractical for huge quantities of data.

2.1 Proposed Work:

Let us consider a database D that contains records composed of k different fields: D = {R1, R2, ..., Rn}, where each record Ri includes k fields such as name, address, street and city. The fields are often called attributes in data cleaning. First, the proposed work takes the training dataset and directly computes the similarity values for each record using edit distance. There exist, however, m different similarity metrics, for example edit distance [15, 16], cosine similarity, ID5 and many more.

2.2 Similarity functions
Similarity functions are formulas that measure the distance between two data points. In this paper, the similarity functions are used to compute the similarity between two values of an attribute. The similarity is a value that indicates how close the two elements are, depending on how big or small the value is. A distinction needs to be drawn here between the terms distance function and similarity function. Given two strings s and t, a distance function generates a comparative value v, where a bigger v represents less similarity and a smaller v represents greater similarity between the two strings. In contrast, a similarity function run on two strings generates a bigger value only if the two strings are judged to be very similar. In this paper, the two terms are used without distinction unless clearly specified.

In the duplicate detection phase, we use the K-means clustering method [16] to avoid the similarity computation between all pairs of records. The similarity computation is then performed only on the particular cluster in which the record falls, which is useful for the fast identification of duplicate records.

Duplicate detection process
1. Similarity computation for records using the Levenshtein distance function
2. Combining similarity values
3. Clustering of the input dataset
4. Duplicate detection and elimination of duplicate records

Step 1. Similarity computation for records using the Levenshtein distance function

In this paper we use the Levenshtein distance to measure the amount of difference between two sequences (i.e. an edit distance). The Levenshtein distance between two strings is the minimum number of edits required to transform one string into the other, where the edit operations are insertion, deletion or substitution of a single character. This edit distance is normalized by dividing it by the maximum of the lengths of the two strings. The cost of computing this distance is proportional to the product of the two string lengths.
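The definition above corresponds to the standard dynamic-programming formulation of the Levenshtein distance. The following is a minimal sketch of that formulation together with the normalization by the longer string length; the function names `levenshtein` and `normalized_distance` are illustrative choices, not names used in the paper.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn s into t."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(s: str, t: str) -> float:
    """Edit distance divided by the length of the longer string (0.0 to 1.0)."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


print(levenshtein("kitten", "sitting"))
print(normalized_distance("kitten", "sitting"))
```

The two nested loops visit every pair of character positions once, which is why the running time is proportional to the product of the two string lengths. Keeping only the previous row keeps the memory use proportional to the length of the second string.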
In our example, we compare the strings "Divanov" and "Diane".

Table 1. Levenshtein distance between two strings (D I V A N O V against D I A N E)

The distance between these two strings is 3 (i.e. a minimum of three edit operations is required to transform the first string into the second).

Let us consider the sample dataset in Table 2, which contains a set of records of student details: Name, Address, Street and City.

Name           Address    Street          City
Mrs.Sunitha    D/oJohn    51Bharat Nagar  d-5
Mrs.Suneetha   D/oJohn    Bharat Nagar    d-5
Haritha        D/oRam     HyderNagar      d
Raj            S/osom     shivajiroad     Delhi-01
Harita         D/oRam     HyderNagar      d
Raj            S/swami    BhagyaNagar     Delhi-01

Table 2. Sample dataset

We calculate the Levenshtein distance for each pair of records in the dataset. Here, the similarity computation process is explained for the two records R1 and R2 in the table shown above. Let us consider only the "Name" attribute of these two records. The distance between these two strings is 2 (i.e. a minimum of two edit operations is required to transform the first string into the second). In this way we find the distance between all pairs of records in the dataset.
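As a small illustration of the pairwise computation described above, the sketch below computes the edit distance for every pair of values in the Name column. The name spellings are an assumption, since the transcription of the sample dataset is damaged, and the record labels R1 to R6 simply follow the row order. The memoized recursive `edit_distance` helper is an illustrative formulation, not the authors' code.

```python
from functools import lru_cache
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via memoized recursion over prefix lengths."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j  # insert the remaining j characters of t
        if j == 0:
            return i  # delete the remaining i characters of s
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)

    return d(len(s), len(t))


# Name column of the sample dataset (spellings assumed, see lead-in).
names = {
    "R1": "Mrs.Sunitha",
    "R2": "Mrs.Suneetha",
    "R3": "Haritha",
    "R4": "Raj",
    "R5": "Harita",
    "R6": "Raj",
}

# One distance per unordered pair of records.
distances = {
    (a, b): edit_distance(names[a], names[b])
    for a, b in combinations(names, 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

Six records give 15 unordered pairs. The same loop would be repeated for the Address, Street and City fields to obtain one distance per field and pair.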
International Journal of Advanced Engineering and Global Technology

[Levenshtein distance table for the Name values of R1 and R2; the cell values are not recoverable from this transcription.]

Step 2. Combining similarity values of all the multiple fields

We combine the similarity values obtained from the different similarity measures to compute the distance between any two records. The similarity between any pair of records is represented by a feature vector in which each component holds the similarity value between the two records under one of the similarity measures. For a dataset that contains records composed of n different fields and a set of m distance metrics, the similarity between any pair of records can be represented by an m-length vector. Each component of the vector is the similarity value between the two records computed using one of the m distance metrics.

Step 3. Clustering

The clustering of data records is carried out using k-means clustering, a clustering method that is widely accepted in the data mining community. The basic steps for clustering are:
(1) Initialize k centroids, one for each cluster.
(2) Compute the similarity of each centroid with the data records in the dataset.
(3) Assign each data record to the cluster Ci whose similarity measure is highest.
(4) Update the k centroids.
(5) Repeat steps 2 to 4 until there is no movement of data records between the clusters.
We apply K-means clustering to the data set to group the records that have a similar average distance.

Step 4. Detection and elimination of duplicate records

In the duplicate detection phase, each data record is compared only with the data records in the same cluster, which reduces the number of record comparisons. Pairs of records that fall in the same cluster are candidates for a full similarity computation, and the record comparison is then performed using the previous steps in the same way.
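The five clustering steps listed above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes Euclidean distance (the closest centroid is treated as the most similar one), initializes the centroids with the first k points, and runs on hypothetical two-dimensional feature vectors.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns (cluster index per point, centroids)."""
    # Step 1: initialize k centroids (assumption: the first k points).
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Steps 2-3: assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop when no record moves between clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical feature vectors, one per record (illustrative values only).
vectors = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25), (0.85, 0.90), (0.20, 0.10)]
labels, centers = kmeans(vectors, k=2)
print(labels)
print(centers)
```

Only records that receive the same label would then be compared in Step 4. Taking the first k points as initial centroids keeps the sketch deterministic; random or more careful seeding is common in practice.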
Here, the similarity metrics are used to calculate distances for each field of each pair of potential duplicate or non-duplicate records. These values are compared against a threshold to obtain a binary matrix that distinguishes duplicate from non-duplicate records. We take the threshold value to be 0.9 (i.e. 90%). If the similarity of two records is at least 90%, the second record is considered a duplicate record; all remaining records, whose similarity is below the threshold, are non-duplicate records.

Results Analysis:

The results of the proposed approach for duplicate record detection are explained in this section. The experimental results for Table 2 are as follows:

R1 & R2: Distance vector = [ ]; similarity measures mapped into the range 0 to 1: [ ]; resulting similarity 0.91
R1 & R3: Distance vector = [ ]; similarity measures mapped into the range 0 to 1: [ ]; resulting similarity 0.5

The complete similarity matrix (rows and columns R1 to R6; the cell values are missing from this transcription):
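The thresholding rule described above can be sketched as follows. Only the three similarity values that survive in the text are used (0.91 for R1 and R2, 0.5 for R1 and R3, 0.96 for R3 and R5); treating every other pair as 0.0 is an assumption made so the sketch can run, and the `remove_duplicates` helper is hypothetical.

```python
THRESHOLD = 0.9  # threshold value stated in the text (90%)


def remove_duplicates(record_ids, similarity, threshold=THRESHOLD):
    """Keep a record unless it is at least `threshold` similar to a record
    that was kept earlier; the later record of a pair is the one dropped."""
    kept = []
    for rid in record_ids:
        is_duplicate = any(
            similarity.get((k, rid), similarity.get((rid, k), 0.0)) >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(rid)
    return kept


records = ["R1", "R2", "R3", "R4", "R5", "R6"]

# Pair similarities recoverable from the text; missing pairs default to 0.0
# (an assumption, because the full matrix is lost in the transcription).
similarity = {
    ("R1", "R2"): 0.91,
    ("R1", "R3"): 0.50,
    ("R3", "R5"): 0.96,
}

remaining = remove_duplicates(records, similarity)
print(remaining)
```

Because the first record of each similar pair is kept and the later one is dropped, the outcome depends on the order in which the records are scanned.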
With the threshold value of 0.9 (i.e. 90%), a record whose similarity to an earlier record is at least 90% is considered a duplicate, and all remaining records are non-duplicates. Here R1 and R2 are at least 90% similar, so R2 is removed; R3 and R5 are 96% similar, so R5 is removed. After removing the duplicate records, the data set consists of R1, R3, R4 and R6.

Conclusion:

We followed a simple approach to detect the duplicate records that occur in large databases. The approach makes use of similarity functions to detect the duplicate records. The similarity was computed with the help of the Levenshtein distance in order to identify the duplicate records accurately. Finally, the experimentation was carried out using real-world datasets, and the performance of the proposed approach was evaluated based on the Symmetric metrics. The experimental evaluation indicated that the proposed approach detects duplicates efficiently.

References

[1] Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti and Rajeev Motwani, "Robust and Efficient Fuzzy Match for Online Data Cleaning", In Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, USA.
[2] Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis and Vassilios S. Verykios, "Duplicate Record Detection: A Survey", IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 1, pp. 1-16, January.
[3] Surajit Chaudhuri, Anish Das Sarma, Venkatesh Ganti and Raghav Kaushik, "Leveraging Aggregate Constraints for Deduplication", In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 437-448, New York, USA, 2007.
[4] J. Jebamalar Tamilselvi and Dr. V. Saravanan, "A Unified Framework and Sequential Data Cleaning Approach for a Data Warehouse", International Journal of Computer Science and Network Security, Vol. 8, No. 5, May.
[5] Erhard Rahm and Hong Hai Do, "Data Cleaning: Problems and Current Approaches", IEEE Data Engineering Bulletin, Vol. 23, No.
4, December.
[6] Matsakis, Nicholas E and Leslie Pack Kaelbling, "Active duplicate detection with Bayesian non-parametric models", Massachusetts Institute of Technology, Thesis (Ph. D.), MIT libraries.
[7] RIDDLE data repository from
[8] Arthur D. Chapman, "Principles and Methods of Data Cleaning: Primary Species and Species-Occurrence Data", Version 1.0, Report for the Global Biodiversity Information Facility, Copenhagen.
[9] Uwe Draisbach and Felix Naumann, "A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection", In Proceedings of the 7th International Workshop on Quality in Databases at VLDB, Lyon, France.
[10] Hui Yang and Jamie Callan, "Near-Duplicate Detection by Instance-Level Constrained Clustering", In Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 421-428, Seattle, WA, USA, 2006.
[11] Steven Euijong Whang, David Menestrina, Georgia Koutrika, Martin Theobald, and Hector Garcia-Molina, "Entity Resolution with Iterative Blocking", In Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, pp. 219-232, Rhode Island, USA, June 29 to July 2, 2009.
[12] J. Jebamalar Tamilselvi and V. Saravanan, "Token-Based Method of Blocking Records for Large Data Warehouse", Advances in Information Mining, Vol. 2, No. 2, pp. 5-10.
[13] Microsoft Research "Data Cleaning" from lt.aspx.
[14] Israr Ahmed, Abdul Aziz, "Dynamic Approach for Data Scrubbing Process", International Journal on Computer Science and Engineering (IJCSE), Vol. 2, No. 2, pp. 416-423, 2010.
[15] Jaro, M.: Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association (1989).
[16] Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, pp. 707-710 (1966).
[17] MacQueen, J.B., "Some Methods for Classification and Analysis of Multivariate Observations", in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297, 1967.
More informationFast and Interactive Analytics over Hadoop Data with Spark
NETWORKED SYSTEMS Fast and Interactive Analytics over Hadoop Data with Spark MATEI ZAHARIA, MOSHARAF CHOWDHURY, TATHAGATA DAS, ANKUR DAVE, JUSTIN MA, MURPHY MCCAULEY, MICHAEL J. FRANKLIN, SCOTT SHENKER,
More informationpersonal income insurance product disclosure statement and policy Preparation date: 26/03/2004
personal income insrance prodct disclosre statement and policy Preparation date: 26/03/2004 personal income Insrer CGU Insrance Limited ABN 27 004 478 371 AFS Licence No. 238291 This is an important docment.
More informationDeploying Network Load Balancing
C H A P T E R 9 Deploying Network Load Balancing After completing the design for the applications and services in yor Network Load Balancing clster, yo are ready to deploy the clster rnning the Microsoft
More information2.1 Unconstrained Graph Partitioning. 1.2 Contributions. 1.3 Related Work. 1.4 Paper Organization 2. GRAPH-THEORETIC APPROACH
Mining Newsgrops Using Networks Arising From Social Behavior Rakesh Agrawal Sridhar Rajagopalan Ramakrishnan Srikant Yirong X IBM Almaden Research Center 6 Harry Road, San Jose, CA 95120 ABSTRACT Recent
More informationMake the College Connection
Make the College Connection A college planning gide for stdents and their parents Table of contents The compelling case for college 2 Selecting a college 3 Paying for college 5 Tips for meeting college
More informationCRM Customer Relationship Management. Customer Relationship Management
CRM Cstomer Relationship Management Farley Beaton Virginia Department of Taxation Discssion Areas TAX/AMS Partnership Project Backgrond Cstomer Relationship Management Secre Messaging Lessons Learned 2
More informationPosition paper smart city. economics. a multi-sided approach to financing the smart city. Your business technologists.
Position paper smart city economics a mlti-sided approach to financing the smart city Yor bsiness technologists. Powering progress From idea to reality The hman race is becoming increasingly rbanised so
More informationPreparing your heavy vehicle for brake test
GUIDE Preparing yor heavy vehicle for brake test A best practice gide Saving lives, safer roads, ctting crime, protecting the environment Breaking the braking myth Some people believe that a locked wheel
More informationHealth Benefits Coverage Under Federal Law...
covers Labor Compliance 2014mx.pdf 1 11/19/2014 2:05:01 PM Compliance Assistance Gide Health Benefits Coverage Under Federal Law... The Affordable Care Act Health Insrance Portability and Accontability
More informationModeling Roughness Effects in Open Channel Flows D.T. Souders and C.W. Hirt Flow Science, Inc.
FSI-2-TN6 Modeling Roghness Effects in Open Channel Flows D.T. Soders and C.W. Hirt Flow Science, Inc. Overview Flows along rivers, throgh pipes and irrigation channels enconter resistance that is proportional
More informationResearch on Staff Explicitation in Organizational Knowledge Management Based on Fuzzy Set Similarity to Ideal Solution
Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Jornal, 015, 9, 139-144 139 Open Access Research on Staff Explicitation in Organizational Knowledge Management Based
More informationDIRECT TAX LAWS Taxability of Capital Gains on By-back of Shares - Debate ignites after AAR s rling in RST s case BACKGROUND 1. Recently, the Athority for Advance Rlings ( AAR ) in the case of RST, In
More information11 Success of the Help Desk: Assessing Outcomes
11 Sccess of the Help Desk: Assessing Otcomes I dread sccess... I like a state of continal becoming, with a goal in front and not behind. George Bernard Shaw Key Findings Respondents help desks tend to
More informationPgrading To Windows XP 4.0 Domain Controllers and Services
C H A P T E R 8 Upgrading Windows NT 4.0 Domains to Windows Server 2003 Active Directory Upgrading yor domains from Microsoft Windows NT 4.0 to Windows Server 2003 Active Directory directory service enables
More informationUpgrading Windows 2000 Domains to Windows Server 2003 Domains
C H A P T E R 9 Upgrading Windows 2000 Domains to Windows Server 2003 Domains Upgrading yor network operating system from Microsoft Windows 2000 to Windows Server 2003 reqires minimal network configration
More informationCandidate: Shawn Mullane. Date: 04/02/2012
Shipping and Receiving Specialist / Inventory Control Assessment Report Shawn Mllane 04/02/2012 www.resorceassociates.com To Improve Prodctivity Throgh People. Shawn Mllane 04/02/2012 Prepared For: NAME