EXTENDING UML FOR MODELLING OF DATA MINING CASES




Prof. LADISLAV BURITA, VOJTECH ONDRYHAL

The article describes a possible approach to modelling data mining cases. Its aim is to show how a standard modelling language can be used for the data mining process, so that data mining work stays compatible with other projects based on an incremental approach, especially those using the Unified Process and UML. Common UML elements such as use cases, classes, interfaces, components and nodes can be specialized through extension mechanisms, including stereotypes and named (tagged) values. A new set of UML elements is provided and described for a data mining process that covers the whole project lifecycle. One example is the data mining model element, which extends the UML element class with new named values such as input data, output data and model parameters. UML schemas, syntax, semantics and usage examples for the new elements are included in the paper.
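The extension mechanism named in the abstract can be illustrated outside any modelling tool. The following Python sketch is only an analogy, not the authors' UML profile: the classes `Stereotype` and `ModelElement`, the stereotype name `DataMiningModel` and the element `ChurnClassifier` with its values are all hypothetical. Only the three named values (input data, output data, model parameters) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Stereotype:
    """A named specialization of a UML base element (hypothetical helper)."""
    name: str                 # e.g. "DataMiningModel"
    base_element: str         # UML element being extended, e.g. "Class"
    allowed_tags: tuple = ()  # names of the named (tagged) values it introduces


@dataclass
class ModelElement:
    """A model element carrying one stereotype and its named values."""
    name: str
    stereotype: Stereotype
    tagged_values: dict = field(default_factory=dict)

    def set_tag(self, tag, value):
        # Reject named values that the stereotype does not declare.
        if tag not in self.stereotype.allowed_tags:
            raise KeyError(f"{tag!r} is not defined by <<{self.stereotype.name}>>")
        self.tagged_values[tag] = value

    def render(self):
        # Textual stand-in for the diagram notation: <<stereotype>> name {tag = value}
        tags = ", ".join(f"{k} = {v}" for k, v in sorted(self.tagged_values.items()))
        return f"<<{self.stereotype.name}>> {self.name} {{{tags}}}"


# The stereotype extends "Class" with the named values listed in the abstract.
DATA_MINING_MODEL = Stereotype(
    name="DataMiningModel",
    base_element="Class",
    allowed_tags=("input data", "output data", "model parameters"),
)

# Hypothetical element and values, for illustration only.
churn = ModelElement("ChurnClassifier", DATA_MINING_MODEL)
churn.set_tag("input data", "customers_prepared")
churn.set_tag("output data", "churn_scores")
print(churn.render())
```

The check in `set_tag` mirrors the idea that a stereotype fixes which named values an element may carry; a real modelling tool would enforce this through the profile definition.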
Development Process Methodology

A methodology formally defines the process used to gather requirements, analyze them, and design an application that meets them. There are many methodologies, each differing from the others in some way, and one may suit a particular project better than another. For example, some are better suited to large enterprise applications, while others are built for designing small embedded or safety-critical systems. Some methods better support large numbers of architects and designers working on the same project, while others work better when used by one person or a small group.

Unified Process and UML Language

The Unified Process and UML (Unified Modelling Language) are quickly becoming the de facto standards for the development process (software development methodology) within the object-oriented and component-based software communities. The UML is a graphical language for visualizing, specifying, constructing, and documenting the artefacts of a software-intensive system. It offers a standard way to write a system's blueprints, including conceptual things such as business processes and system functions as well as concrete things such as programming language statements, database schemas, and reusable software components [www-uml].

Figure 1 [RUP2000] displays the key concepts of the Rational Unified Process (RUP). The aim of the research is to reuse those concepts in building a data mining methodology.

Figure 1 Key Concepts of Rational Unified Process

Data mining development process methodologies

In the data mining world, several methodologies for data mining projects can be recognized. They are usually tightly connected with software producers such as SAS, SPSS, Oracle or Microsoft. Among these approaches, the CRISP-DM methodology is probably the leader among industry-independent methodologies. The whole process is described by a four-level hierarchical process model consisting of sets of tasks at the following levels: phase, generic task, specialized task, process instance. Figure 2 shows the common representation of a data mining project based on CRISP-DM. The data lie at the centre of the process.

Figure 2 Project lifecycle according to CRISP-DM methodology

Integration

In a project where data mining technology is only part of a whole solution, an integrated environment has to be set up. As already mentioned, the Unified Process and UML provide an environment already accepted within software development communities. The next part of the article introduces a possible approach for integrating data mining cases into corporate projects. All the main phases have been refactored, and models have been created according to Unified Process guidelines. The following changes and additions have been made to the CRISP-DM methodology:

- Roles were introduced. Roles are not explicitly defined in CRISP-DM; introducing them helps to assign responsibilities to persons properly. For example, the role Analyst is required in the Data Understanding workflow.
- Outputs and products from phases have been transformed into artefacts.
- The number of independent deliverables was significantly reduced. Outputs from tasks were integrated and a list of suggested documents was created.
- Templates in HTML and RTF formats were defined for all documents.
- A modelling tool (Enterprise Architect) was used to model the data mining process. Subsequent documentation can be generated from the tool, which unifies the output.

Process model packages (phases). Package: Data Mining Process Model. Version: 1.0. Author: Vojtěch Ondryhal.

Tasks and Deliverables
Business Understanding: Project Plan, Requirements, Terminology, Vision
Data Understanding: Analysis Report
Data Preparation: Data Set, Data Set Description
Modelling: Model, Model Description, Model Parameters Settings, Test Design
Evaluation: Evaluation Report, Final Report
Deployment: Deployment Plan, Monitoring And Maintenance Plan

Figure 3 Data Mining Process Model Overview
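The package overview in Figure 3 is essentially a mapping from phases to artefacts, and it can be handy to hold it in a machine-readable form, for instance to check which deliverables of a phase are still outstanding. The sketch below is one possible encoding under stated assumptions: the phase and artefact names follow the figure, with the word "Data" restored in phase and data set names according to the CRISP-DM phase names, while the helper functions `phase_of` and `missing_artefacts` are hypothetical and not part of the methodology.

```python
# Phases (packages) and their artefacts, transcribed from the package overview.
PROCESS_MODEL = {
    "Business Understanding": ["Project Plan", "Requirements", "Terminology", "Vision"],
    "Data Understanding": ["Analysis Report"],
    "Data Preparation": ["Data Set", "Data Set Description"],
    "Modelling": ["Model", "Model Description", "Model Parameters Settings", "Test Design"],
    "Evaluation": ["Evaluation Report", "Final Report"],
    "Deployment": ["Deployment Plan", "Monitoring And Maintenance Plan"],
}


def phase_of(artefact):
    """Return the phase whose package contains the given artefact."""
    for phase, artefacts in PROCESS_MODEL.items():
        if artefact in artefacts:
            return phase
    raise KeyError(artefact)


def missing_artefacts(phase, delivered):
    """List the artefacts of a phase that are not in the delivered collection."""
    return [a for a in PROCESS_MODEL[phase] if a not in delivered]


print(phase_of("Test Design"))
print(missing_artefacts("Evaluation", {"Evaluation Report"}))
```

A plain dictionary is enough here because each artefact belongs to exactly one package in the overview; a richer model would be needed to express that the Final Report is touched again during deployment.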

Business understanding

The artefacts produced during this work are:

- The vision document provides a first insight into the project. It includes the following parts: background, business objectives, business success criteria, inventory of resources, risks and contingencies, costs and benefits.
- The requirements document includes requirements, assumptions and constraints, data mining goals and data mining success criteria.
- The terminology repository (in the form of a document or a model glossary in a tool) holds the relevant business terminology and data mining terminology.
- The project plan document, for example in the form of a Gantt chart. The plan lists stages, duration, resources, inputs, outputs and dependencies, including an initial assessment of tools and techniques.

Workflow detail. Package: Business Understanding. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Determine Business Objectives, Determine Data Mining Goal, Assess Situation, Produce Project Plan
Roles: Business Analyst, Project Manager
Artefacts: Vision, Terminology, Requirements, Project Plan

Figure 4 Business understanding workflow detail

Data understanding

The artefact produced in this phase is the Analysis Report document, which contains a report on the initial data collection, a description of the data, and reports on data exploration and data quality.
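The data description and data quality parts of the Analysis Report are typically backed by simple counts over the collected data. As a rough illustration, the following Python sketch computes, per column, the number of rows, missing values and distinct values. The function name `describe`, the record layout and the sample records are assumptions made for this example and do not come from the methodology.

```python
def describe(records):
    """Summarise a list of dict records column by column.

    A value counts as missing when the key is absent, or when the value
    is None or an empty string.
    """
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical sample of initially collected data.
sample = [
    {"id": 1, "age": 34, "city": "Brno"},
    {"id": 2, "age": None, "city": "Brno"},
    {"id": 3, "age": 51},
]

for column, stats in describe(sample).items():
    print(column, stats)
```

Counts like these would feed the data quality section of the report; the exploration section usually needs richer statistics than this sketch provides.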

Workflow detail. Package: Data Understanding. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Collect Initial Data, Describe Data, Explore Data, Verify Data Quality
Role: Analyst
Artefact: Analysis Report

Figure 5 Data understanding workflow detail

Data preparation

This phase creates the data sets that will be used for modelling in the next phases. Each activity displayed in Figure 6 contributes a chapter to the Data Set Description document. The Data Set contains real data prepared as input for modelling. The data are properly selected and cleaned, new data items are created where needed, and the data are merged and formatted.

Workflow detail. Package: Data Preparation. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Select Data, Clean Data, Construct Data, Integrate Data, Format Data
Role: Designer
Artefacts: Data Set, Data Set Description

Figure 6 Data preparation workflow detail
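The five preparation activities (select, clean, construct, integrate, format) can be read as a small pipeline in which each step takes rows and returns rows. The sketch below shows one possible shape under stated assumptions: the function bodies, the derived item `is_senior`, the age limit of 65 and the sample data are all hypothetical, and a real project would apply whatever rules its Data Set Description records.

```python
def select(rows, columns):
    """Select data: keep only the columns relevant for modelling."""
    return [{c: r.get(c) for c in columns} for r in rows]


def clean(rows):
    """Clean data: here, simply drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def construct(rows):
    """Construct data: derive a new item (hypothetical flag 'is_senior')."""
    return [{**r, "is_senior": r["age"] >= 65} for r in rows]


def integrate(rows, lookup, key):
    """Integrate data: merge attributes from a second source keyed on `key`."""
    return [{**r, **lookup.get(r[key], {})} for r in rows]


def format_rows(rows, column_order):
    """Format data: emit tuples in the column order the modelling tool expects."""
    return [tuple(r.get(c) for c in column_order) for r in rows]


# Hypothetical input sources.
customers = [
    {"id": 1, "age": 70, "note": "x"},
    {"id": 2, "age": None, "note": "y"},
    {"id": 3, "age": 30, "note": "z"},
]
regions = {1: {"region": "north"}, 3: {"region": "south"}}

rows = select(customers, ["id", "age"])
rows = clean(rows)
rows = construct(rows)
rows = integrate(rows, regions, "id")
data_set = format_rows(rows, ["id", "age", "is_senior", "region"])
print(data_set)
```

Keeping each activity as a separate function matches the statement that each activity contributes its own chapter to the description document: the rule applied in a step can be documented next to the step itself.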

Modelling

At the start of the workflow, tests are created for model validation, training and testing. The model itself is run on the prepared data set to produce results. The Model Parameters Settings artefact lists the parameters the model requires and their values. A model usually behaves differently for different sets of values, so all variants of the settings should be captured and described.

Workflow detail. Package: Modelling. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Select Modelling Technique, Generate Test Design, Build Model, Assess Model
Role: Mining Engineer
Artefacts: Test Design, Model, Model Description, Model Parameters Settings, Data Set

Figure 7 Modelling workflow detail

Evaluation

The evaluation report indicates how well the results meet the business criteria defined in the Business Understanding phase. During evaluation, models are approved or rejected.

Workflow detail. Package: Evaluation. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Evaluate Results, Review Process, Determine Next Steps
Roles: Business Analyst, Project Manager, Quality Assurance Manager
Artefacts: Evaluation Report, Final Report

Figure 8 Evaluation phase workflow detail
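Capturing every variant of the parameter settings pays off during evaluation, when each variant is approved or rejected against the business criteria. The following Python sketch shows one way this could look. It is an assumption-based illustration: the class `ModelVariant`, the function `evaluate`, the single numeric threshold standing in for a business success criterion, and all parameter values and scores are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelVariant:
    """One captured variant of the model parameters settings."""
    name: str
    parameters: dict
    test_score: float  # result obtained with the test design (made-up values below)


def evaluate(variants, success_threshold):
    """Split variants into approved and rejected lists.

    A variant is approved when its test score reaches the threshold.
    Approved variants are returned best first.
    """
    approved = sorted(
        (v for v in variants if v.test_score >= success_threshold),
        key=lambda v: v.test_score,
        reverse=True,
    )
    rejected = [v for v in variants if v.test_score < success_threshold]
    return approved, rejected


# Hypothetical variants; the scores are illustrative, not measured results.
variants = [
    ModelVariant("tree-d3", {"max_depth": 3}, 0.71),
    ModelVariant("tree-d5", {"max_depth": 5}, 0.78),
    ModelVariant("tree-d8", {"max_depth": 8}, 0.74),
]

approved, rejected = evaluate(variants, success_threshold=0.75)
print([v.name for v in approved])
print([v.name for v in rejected])
```

In practice a business success criterion is rarely a single number, so the threshold comparison would be replaced by whatever check the requirements document defines.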

The final report contains a review of the whole process and checks whether all required activities have been finished. It also includes a list of possible actions in the project and the decisions on these actions.

Deployment

Deployment is the last workflow in the data mining development process. Deployment packages and a deployment plan for the target environment are created. The monitoring and maintenance plan defines a method of day-to-day result checking in order to assure the correctness of the produced results.

Workflow detail. Package: Deployment. Version: 1.0. Author: Vojtěch Ondryhal.
Activities: Plan Deployment, Plan Monitoring And Maintenance, Review Project (finalize Final Report)
Roles: Deployment Manager, Project Manager
Artefacts: Deployment Plan, Monitoring And Maintenance Plan, Final Report

Figure 9 Deployment phase workflow details

Conclusion

A possible approach for modelling data mining cases based on UML and CRISP-DM was introduced in the paper. The paper provides insight into more detailed work that includes a detailed description of deliverables, templates and examples. The methodology is based on prototypes that were tried out at the Communication and Information Systems Department of the University of Defence. The advantage of this approach is the unification of project administration (templates, work descriptions, etc.) with other development projects.

References

1. [www-ea] Enterprise Architect web site. http://www.sparxsystems.com.au/
2. [BOHTH05] Buřita L., Ondryhal V., Hodický J., Trunda M., Hlaváček M.: Information Systems, University of Defence, 2005, U-3099 [in Czech language]
3. [CD01] CRISP-DM, Step by Step Data Mining Guide v. 1.0, CRISP-DM Consortium. http://www.crisp-dm.org/
4. [RUP2000] Rational Unified Process 2000, online documentation
5. [www-vo] Web pages of the author. http://dcs.unob.cz/~vojtech.ondryhal/ [in Czech language]
6. [www-uml] Unified Modelling Language Resource Page. http://www.uml.org/

VÉDELMI INFOKOMMUNIKÁCIÓ