COCOMO II and Big Data



Rachchabhorn Wongsaroj*, Jo Ann Lane, Supannika Koolmanojwong, Barry Boehm
*Bank of Thailand and Center for Systems and Software Engineering
Computer Science Department, Viterbi School of Engineering, University of Southern California
28th International Forum on COCOMO and System/Software Cost Modeling

Outline
- Big Data concept
- COCOMO II cost factors
- COCOMO II cost factors and Big Data
- Future work

Big Data concept
Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute).
The 3Vs of Big Data (IBM):
- Volume: the amount of data generated
- Variety: the different data types and sources
- Velocity: the speed at which data is generated, flows in and out, and moves around
Source: IBM

Big Data concept
[Figure: Volume, Variety, and Velocity across people-to-people, people-to-machine, and machine-to-machine data. Examples shown: 8 billion messages/day and 845M active users; 20 hours of video uploaded every minute; 340 million tweets/day and 140M active users. Source: IBM]

Big Data Landscape
[Figure. Sources: Sajal Das, Keith Marzullo; IBM]

Big Data Landscape (cont.)
[Figure. Source: blogs.forbes.com/davefeinleib]

Big Data problems
[Figure: world interconnection; data quality; data quantity (lots of data is being created and collected); data timeliness; data variety]

COCOMO Black Box Model
Inputs to COCOMO II:
- Product size estimate
- Product, process, platform, and personnel attributes
- Reuse, maintenance, and increment parameters
- Organizational project data
Outputs of COCOMO II:
- Development and maintenance cost and schedule estimates
- Cost and schedule distribution by phase, activity, and increment
- Recalibration to organizational data
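As a minimal sketch of the black box, assuming the published COCOMO II.2000 Post-Architecture equations (PM = A × Size^E × ΠEM with E = B + 0.01 × ΣSF, and TDEV = C × PM^F with F = D + 0.2 × (E − B)) and their calibration constants A = 2.94, B = 0.91, C = 3.67, D = 0.28, the function below turns a size estimate plus rating-derived inputs into effort and schedule. Reuse, maintenance, increments, and SCED schedule compression are left out, and the function name is hypothetical.

```python
# Minimal sketch of the COCOMO II black box (COCOMO II.2000 constants assumed).
# Reuse, maintenance, increments, and SCED schedule compression are not modeled.

A, B = 2.94, 0.91  # effort-equation calibration constants
C, D = 3.67, 0.28  # schedule-equation calibration constants


def cocomo_ii_estimate(ksloc: float, sf_sum: float, em_product: float = 1.0) -> tuple[float, float]:
    """Return (effort in person-months, schedule in calendar months).

    ksloc: product size in thousands of source lines of code
    sf_sum: sum of the five scale-factor values (PREC, FLEX, RESL, TEAM, PMAT)
    em_product: product of the effort multipliers, excluding SCED (1.0 = all Nominal)
    """
    e = B + 0.01 * sf_sum              # effort exponent from the scale drivers
    pm = A * ksloc ** e * em_product   # effort at a nominal schedule
    f = D + 0.2 * (e - B)              # schedule exponent
    tdev = C * pm ** f                 # development time, no compression applied
    return pm, tdev


if __name__ == "__main__":
    pm, tdev = cocomo_ii_estimate(100, sf_sum=18.97)  # 100 KSLOC, all drivers Nominal
    print(f"effort = {pm:.1f} person-months, schedule = {tdev:.1f} months")
```

With 100 KSLOC and every driver at Nominal (scale-factor sum 18.97, multiplier product 1.0), this gives roughly 465 person-months over about 26 months.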

COCOMO II cost factors
Significant factors of development cost:
- Scale drivers are sources of exponential effort variation.
- Cost drivers are sources of linear effort variation: product, platform, personnel, and project attributes, with an effort multiplier associated with each cost driver rating.
Each factor is rated from Very Low to Very High (Extra High for some factors) according to the rating guidelines.
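To make the exponential/linear distinction concrete, the sketch below assumes the COCOMO II.2000 effort form PM = A × Size^E × ΠEM, where E = B + 0.01 × ΣSF and the all-Nominal scale-factor sum is 18.97. It compares two changes: raising the scale-factor sum by 5 points versus applying a single effort multiplier of 1.17. The helper names and the specific deltas are illustrative assumptions.

```python
# Sketch: why scale drivers act exponentially and cost drivers linearly.
# Assumes the COCOMO II.2000 effort form PM = A * Size**E * prod(EM).

A, B = 2.94, 0.91       # COCOMO II.2000 calibration constants
NOMINAL_SF_SUM = 18.97  # sum of the five scale factors, all rated Nominal


def effort_pm(ksloc: float, sf_sum: float, em_product: float = 1.0) -> float:
    """Effort in person-months with E = B + 0.01 * sum(SF)."""
    return A * ksloc ** (B + 0.01 * sf_sum) * em_product


def effort_ratios(ksloc: float, sf_increase: float = 5.0, em: float = 1.17) -> tuple[float, float]:
    """Effort growth from worse scale factors vs. from one cost-driver multiplier."""
    base = effort_pm(ksloc, NOMINAL_SF_SUM)
    scale_ratio = effort_pm(ksloc, NOMINAL_SF_SUM + sf_increase) / base
    cost_ratio = effort_pm(ksloc, NOMINAL_SF_SUM, em_product=em) / base
    return scale_ratio, cost_ratio


if __name__ == "__main__":
    for size in (10, 100, 1000):
        s, c = effort_ratios(size)
        print(f"{size:>5} KSLOC: scale-factor ratio {s:.3f}, cost-driver ratio {c:.3f}")
```

The scale-factor ratio works out to Size^0.05, so it grows from about 1.12 at 10 KSLOC to about 1.41 at 1000 KSLOC, while the cost-driver ratio stays at 1.17 at every size.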

Scale Drivers
- Precedentedness (PREC): degree to which the system is new and past experience applies
- Development Flexibility (FLEX): need to conform with specified requirements
- Architecture/Risk Resolution (RESL): degree of design thoroughness and risk elimination
- Team Cohesion (TEAM): need to synchronize stakeholders and minimize conflict
- Process Maturity (PMAT): SEI CMM process maturity rating


Scale Drivers
Scale Factors (W_i) | Very Low | Low | Nominal | High | Very High | Extra High
Precedentedness (PREC) | thoroughly unprecedented | largely unprecedented | somewhat unprecedented | generally familiar | largely familiar | thoroughly familiar
Development Flexibility (FLEX) | rigorous | occasional relaxation | some relaxation | general conformity | some conformity | general goals
Architecture/Risk Resolution (RESL)* | little (20%) | some (40%) | often (60%) | generally (75%) | mostly (90%) | full (100%)
Team Cohesion (TEAM) | very difficult interactions | some difficult interactions | basically cooperative interactions | largely cooperative | highly cooperative | seamless interactions
Process Maturity (PMAT) | weighted average of "Yes" answers to the CMM Maturity Questionnaire
* % of significant module interfaces specified, % of significant risks eliminated.
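The ratings in the table become numbers before they enter the effort equation. The sketch below assumes the published COCOMO II.2000 scale-factor values (check them against the COCOMO II Model Definition Manual before relying on them) and computes the effort exponent E = B + 0.01 × ΣSF with B = 0.91. Unrated drivers default to Nominal; the table and function names are illustrative.

```python
# Sketch: turning the five scale-driver ratings into the effort exponent E.
# Values are the published COCOMO II.2000 scale factors (verify before use).

RATINGS = ("Very Low", "Low", "Nominal", "High", "Very High", "Extra High")

SCALE_FACTORS = {
    "PREC": (6.20, 4.96, 3.72, 2.48, 1.24, 0.00),
    "FLEX": (5.07, 4.05, 3.04, 2.03, 1.01, 0.00),
    "RESL": (7.07, 5.65, 4.24, 2.83, 1.41, 0.00),
    "TEAM": (5.48, 4.38, 3.29, 2.19, 1.10, 0.00),
    "PMAT": (7.80, 6.24, 4.68, 3.12, 1.56, 0.00),
}

B = 0.91  # COCOMO II.2000 exponent base


def effort_exponent(ratings: dict[str, str]) -> float:
    """E = B + 0.01 * sum(SF_j); drivers missing from `ratings` count as Nominal."""
    total = 0.0
    for driver, values in SCALE_FACTORS.items():
        level = ratings.get(driver, "Nominal")
        total += values[RATINGS.index(level)]
    return B + 0.01 * total


if __name__ == "__main__":
    print(effort_exponent({}))               # all Nominal
    # Hypothetical Big Data project that is largely unprecedented (PREC = Low).
    print(effort_exponent({"PREC": "Low"}))
```

With every driver at Nominal the exponent is about 1.0997, so effort grows slightly faster than size; rating PREC Low for a largely unprecedented Big Data system would raise it to about 1.1121.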

Precedentedness (PREC)
Elaboration of the PREC rating scales:
Feature | Very Low | Nominal / High | Extra High
Organizational understanding of product objectives | General | Considerable | Thorough
Experience in working with related software systems | Moderate | Considerable | Extensive
Concurrent development of associated new hardware and operational procedures | Extensive | Moderate | Some
Need for innovative data processing architectures, algorithms | Considerable | Some | Minimal

Cost Drivers
- Product factors: Reliability (RELY), Data (DATA), Complexity (CPLX), Reusability (RUSE), Documentation (DOCU)
- Platform factors: Time constraint (TIME), Storage constraint (STOR), Platform volatility (PVOL)
- Personnel factors: Analyst capability (ACAP), Programmer capability (PCAP), Applications experience (APEX), Platform experience (PLEX), Language and tool experience (LTEX), Personnel continuity (PCON)
- Project factors: Software tools (TOOL), Multisite development (SITE), Required schedule (SCED)

Cost Drivers and Big Data
- Product factors: Reliability (RELY), Data (DATA), Complexity (CPLX), Reusability (RUSE), Documentation (DOCU)
- Platform factors: Time constraint (TIME), Storage constraint (STOR), Platform volatility (PVOL)
- Personnel factors: Analyst capability (ACAP), Programmer capability (PCAP), Applications experience (APEX), Platform experience (PLEX), Language and tool experience (LTEX), Personnel continuity (PCON)
- Project factors: Software tools (TOOL), Multisite development (SITE), Required schedule (SCED)
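Cost drivers combine multiplicatively into a single effort-multiplier product. The sketch below assumes the published COCOMO II.2000 multipliers for five product and platform drivers only (the other twelve are omitted for brevity; verify the values against the Model Definition Manual) and treats any unrated driver as Nominal (1.00). The table and function names are illustrative.

```python
# Sketch: combining cost-driver ratings into the effort-multiplier product.
# Subset of drivers only; COCOMO II.2000 published values (verify before use).

from math import prod

RATINGS = ("Very Low", "Low", "Nominal", "High", "Very High", "Extra High")

# None marks a rating level that the driver does not define.
EFFORT_MULTIPLIERS = {
    "RELY": (0.82, 0.92, 1.00, 1.10, 1.26, None),
    "DATA": (None, 0.90, 1.00, 1.14, 1.28, None),
    "CPLX": (0.73, 0.87, 1.00, 1.17, 1.34, 1.74),
    "TIME": (None, None, 1.00, 1.11, 1.29, 1.63),
    "STOR": (None, None, 1.00, 1.05, 1.17, 1.46),
}


def em_product(ratings: dict[str, str]) -> float:
    """Multiply the multipliers of the rated drivers; unrated drivers are Nominal."""
    factors = []
    for driver, level in ratings.items():
        value = EFFORT_MULTIPLIERS[driver][RATINGS.index(level)]
        if value is None:
            raise ValueError(f"{driver} has no {level} rating")
        factors.append(value)
    return prod(factors)  # prod([]) == 1, i.e. everything Nominal


if __name__ == "__main__":
    # Hypothetical ratings for a large custom Big Data analytics system.
    print(em_product({"DATA": "Very High", "CPLX": "Very High"}))
```

With those hypothetical ratings, nominal effort is multiplied by 1.28 × 1.34 ≈ 1.72, independent of product size.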

Product Factors (cont'd)
Required Software Reliability (RELY): measures the extent to which the software must perform its intended function over a period of time. Ask: what is the effect of a software failure?
RELY descriptors:
- Very Low: slight inconvenience
- Low: low, easily recoverable losses
- Nominal: moderate, easily recoverable losses
- High: high financial loss
- Very High: risk to human life
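Rating RELY amounts to answering the slide's question with one of its descriptors. The sketch below maps each descriptor to its rating and an effort multiplier, assuming the published COCOMO II.2000 RELY values (verify before use); the function name is hypothetical.

```python
# Sketch: answer "what is the effect of a software failure?" with a descriptor
# and look up the RELY rating and multiplier (COCOMO II.2000 values assumed).

RELY_SCALE = {
    "slight inconvenience": ("Very Low", 0.82),
    "low, easily recoverable losses": ("Low", 0.92),
    "moderate, easily recoverable losses": ("Nominal", 1.00),
    "high financial loss": ("High", 1.10),
    "risk to human life": ("Very High", 1.26),
}


def rely_rating(failure_effect: str) -> tuple[str, float]:
    """Return (rating, effort multiplier) for a failure-effect descriptor."""
    key = failure_effect.strip().lower()
    if key not in RELY_SCALE:
        raise ValueError(f"unknown descriptor: {failure_effect!r}")
    return RELY_SCALE[key]


if __name__ == "__main__":
    # Hypothetical analytics pipeline whose failure causes high financial loss.
    print(rely_rating("High financial loss"))
```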


Product Factors (cont'd)
Data Base Size (DATA): captures the effect that large test-data requirements have on development, i.e., the effort needed to generate the test data used to exercise the program. Calculate the data/program size ratio (D/P):
\[ \frac{D}{P} = \frac{\text{DataBaseSize (bytes)}}{\text{ProgramSize (SLOC)}} \]
IBM: the database size of Big Data scales from terabytes to zettabytes.
DATA ratings (DB bytes / Pgm SLOC):
- Low: D/P < 10
- Nominal: 10 ≤ D/P < 100
- High: 100 ≤ D/P < 1000
- Very High: D/P ≥ 1000
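The D/P calculation is simple enough to sketch directly. The thresholds below follow the DATA rating table; the multipliers are assumed to be the published COCOMO II.2000 values (verify before use), and the 500 KSLOC program size in the example is hypothetical.

```python
# Sketch of the DATA cost driver: compute D/P and map it to a rating.
# Multipliers are COCOMO II.2000 published values (verify before use).

def data_rating(db_bytes: float, program_sloc: float) -> tuple[str, float, float]:
    """Return (rating, effort multiplier, D/P) for a test database and program."""
    dp = db_bytes / program_sloc
    if dp < 10:
        return "Low", 0.90, dp
    if dp < 100:
        return "Nominal", 1.00, dp
    if dp < 1000:
        return "High", 1.14, dp
    return "Very High", 1.28, dp  # highest level defined: there is no Extra High


if __name__ == "__main__":
    TB, PB = 10**12, 10**15
    for size in (TB, PB):
        rating, em, dp = data_rating(size, 500_000)  # hypothetical 500 KSLOC system
        print(f"D/P = {dp:.0e}: {rating} (EM {em})")
```

A terabyte and a petabyte test database both land on Very High, which illustrates why the scale saturates for terabyte-to-zettabyte projects.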


[Figure. Source: (c)2012 Enterprise Strategy Group]

Product Factors (cont'd)
Product Complexity (CPLX): complexity is divided into five areas: control operations, computational operations, device-dependent operations, data management operations, and user interface management operations. Select the area or combination of areas that characterizes the product or a subsystem of the product.

Product Factors (cont'd)
Module complexity ratings vs. type of module. Use a subjective weighted average of the attributes, weighted by their relative product importance.
Control Operations
- Very Low: Straight-line code with a few non-nested structured programming operators: DOs, CASEs, IF-THEN-ELSEs. Simple module composition via procedure calls or simple scripts.
- Low: Straightforward nesting of structured programming operators. Mostly simple predicates.
- Nominal: Mostly simple nesting. Some intermodule control. Decision tables. Simple callbacks or message passing, including middleware-supported distributed processing.
- High: Highly nested structured programming operators with many compound predicates. Queue and stack control. Homogeneous, distributed processing. Single-processor soft real-time control.
- Very High: Reentrant and recursive coding. Fixed-priority interrupt handling. Task synchronization, complex callbacks, heterogeneous distributed processing. Single-processor hard real-time control.
- Extra High: Multiple resource scheduling with dynamically changing priorities. Microcode-level control. Distributed hard real-time control.
Computational Operations
- Very Low: Evaluation of simple expressions: e.g., A=B+C*(D-E)
- Low: Evaluation of moderate-level expressions: e.g., D=SQRT(B**2-4.*A*C)
- Nominal: Use of standard math and statistical routines. Basic matrix/vector operations.
- High: Basic numerical analysis: multivariate interpolation, ordinary differential equations. Basic truncation, round-off concerns.
- Very High: Difficult but structured numerical analysis: near-singular matrix equations, partial differential equations. Simple parallelization.
- Extra High: Difficult and unstructured numerical analysis: highly accurate analysis of noisy, stochastic data. Complex parallelization.

Product Factors (cont'd)
Device-dependent Operations
- Very Low: Simple read, write statements with simple formats.
- Low: No cognizance needed of particular processor or I/O device characteristics. I/O done at GET/PUT level.
- Nominal: I/O processing includes device selection, status checking and error processing.
- High: Operations at physical I/O level (physical storage address translations; seeks, reads, etc.). Optimized I/O overlap.
- Very High: Routines for interrupt diagnosis, servicing, masking. Communication line handling. Performance-intensive embedded systems.
- Extra High: Device timing-dependent coding, micro-programmed operations. Performance-critical embedded systems.
Data Management Operations
- Very Low: Simple arrays in main memory. Simple COTS-DB queries, updates.
- Low: Single file subsetting with no data structure changes, no edits, no intermediate files. Moderately complex COTS-DB queries, updates.
- Nominal: Multi-file input and single file output. Simple structural changes, simple edits. Complex COTS-DB queries, updates.
- High: Simple triggers activated by data stream contents. Complex data restructuring.
- Very High: Distributed database coordination. Complex triggers. Search optimization.
- Extra High: Highly coupled, dynamic relational and object structures. Natural language data management.
User Interface Management Operations
- Very Low: Simple input forms, report generators.
- Low: Use of simple graphic user interface (GUI) builders.
- Nominal: Simple use of widget set.
- High: Widget set development and extension. Simple voice I/O, multimedia.
- Very High: Moderately complex 2D/3D, dynamic graphics, multimedia.
- Extra High: Complex multimedia, virtual reality.
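The CPLX guidance calls for a subjective weighted average across the five areas. The sketch below is one way to make that mechanical, under stated assumptions: ratings are coded 1 (Very Low) to 6 (Extra High), the weighted score is rounded half-up back to a level, and the multipliers are the published COCOMO II.2000 CPLX values (verify before use). The coding, the rounding rule, and the example weights are illustrative, not part of the model definition.

```python
# Sketch of the CPLX weighted average. Numeric coding (1..6), half-up rounding,
# and the example weights are assumptions; multipliers are COCOMO II.2000 values.

RATINGS = ("Very Low", "Low", "Nominal", "High", "Very High", "Extra High")
CPLX_EM = (0.73, 0.87, 1.00, 1.17, 1.34, 1.74)


def cplx_rating(area_ratings: dict[str, tuple[str, float]]) -> tuple[str, float]:
    """area_ratings maps a complexity area to (rating, importance weight)."""
    total_weight = sum(w for _, w in area_ratings.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum((RATINGS.index(r) + 1) * w for r, w in area_ratings.values()) / total_weight
    index = int(score + 0.5) - 1  # round half up, then back to a 0-based level
    return RATINGS[index], CPLX_EM[index]


if __name__ == "__main__":
    # Hypothetical Big Data subsystem dominated by data management operations.
    print(cplx_rating({
        "data management": ("Very High", 0.5),
        "computational": ("High", 0.3),
        "control": ("Nominal", 0.2),
    }))
```

In the example the weighted score is 4.3, which rounds to High.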

[Figure. Source: (c)2012 Enterprise Strategy Group]


Platform Factors
Execution Time Constraint (TIME): measures the constraint imposed upon a system in terms of the percentage of available execution time expected to be used by the system or subsystem consuming the execution time resource.
TIME ratings (use of available execution time):
- Nominal: ≤ 50%
- High: 70%
- Very High: 85%
- Extra High: 95%
Source: http://www.parstream.com/product/
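The TIME scale can be applied mechanically once an expected utilization is known. The sketch below assumes each percentage is an upper bound for its level (so 60% rates High) and uses the published COCOMO II.2000 TIME multipliers (verify before use); the function name is hypothetical.

```python
# Sketch of the TIME cost driver. Treating each percentage as an upper bound
# is an assumption; multipliers are COCOMO II.2000 published values.

# (upper bound on % of execution time used, rating, effort multiplier)
TIME_LEVELS = (
    (50, "Nominal", 1.00),
    (70, "High", 1.11),
    (85, "Very High", 1.29),
    (95, "Extra High", 1.63),
)


def time_rating(percent_used: float) -> tuple[str, float]:
    """Return (rating, effort multiplier) for the expected execution-time use."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    for bound, rating, em in TIME_LEVELS:
        if percent_used <= bound:
            return rating, em
    return TIME_LEVELS[-1][1:]  # above 95%: still Extra High


if __name__ == "__main__":
    # Hypothetical near-real-time analytics job expected to use 80% of CPU time.
    print(time_rating(80))
```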

[Figure. Source: (c)2012 Enterprise Strategy Group]

Platform Factors
Main Storage Constraint (STOR): measures the degree of main storage constraint imposed on a software system or subsystem.
STOR ratings (use of available storage):
- Nominal: ≤ 50%
- High: 70%
- Very High: 85%
- Extra High: 95%
The largest Big Data practitioners (Google, Facebook, Apple, etc.) run what are known as hyperscale computing environments.
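For STOR the utilization usually has to be derived from capacities first. The sketch below computes the percentage of available main storage used and maps it to a rating, assuming each percentage is an upper bound for its level and using the published COCOMO II.2000 STOR multipliers (verify before use). The memory figures in the example are hypothetical.

```python
# Sketch of the STOR cost driver: derive the % of main storage used, then rate it.
# Upper-bound reading of the thresholds is an assumption; COCOMO II.2000 multipliers.

STOR_LEVELS = (
    (50, "Nominal", 1.00),
    (70, "High", 1.05),
    (85, "Very High", 1.17),
    (95, "Extra High", 1.46),
)


def stor_rating(bytes_used: float, bytes_available: float) -> tuple[str, float, float]:
    """Return (rating, effort multiplier, percent of main storage used)."""
    if bytes_available <= 0:
        raise ValueError("bytes_available must be positive")
    percent = 100.0 * bytes_used / bytes_available
    for bound, rating, em in STOR_LEVELS:
        if percent <= bound:
            return rating, em, percent
    return "Extra High", 1.46, percent  # above 95%: still Extra High


if __name__ == "__main__":
    GiB = 2**30
    # Hypothetical in-memory analytics node using 230 GiB of 256 GiB main memory.
    print(stor_rating(230 * GiB, 256 * GiB))
```

About 90% utilization falls in the Extra High band under this reading.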

Big Data Storage
The key requirements of Big Data storage are that it:
- must be capable of handling large volumes of data
- must be scalable to accommodate growth
- must provide the input/output operations per second (IOPS) needed to deliver data to analytic tools

Personnel Factors
Analyst Capability (ACAP): analysts work on requirements, high-level design, and detailed design. Consider analysis and design ability, efficiency and thoroughness, and the ability to communicate and cooperate.
ACAP ratings: Very Low: 15th percentile; Low: 35th percentile; Nominal: 55th percentile; High: 75th percentile; Very High: 90th percentile.
Programmer Capability (PCAP): evaluate the capability of the programmers as a team rather than as individuals. Consider ability, efficiency and thoroughness, and the ability to communicate and cooperate.
PCAP ratings: Very Low: 15th percentile; Low: 35th percentile; Nominal: 55th percentile; High: 75th percentile; Very High: 90th percentile.
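Because ACAP and PCAP are anchored at percentiles, an estimated team percentile has to be snapped to a level. The sketch below assumes nearest-anchor snapping (ties go to the lower level) and the published COCOMO II.2000 ACAP and PCAP multipliers (verify before use); the function names are hypothetical.

```python
# Sketch of ACAP/PCAP: snap a team percentile to the nearest rating anchor.
# Nearest-anchor snapping is an assumption; multipliers are COCOMO II.2000 values.

ANCHORS = (15, 35, 55, 75, 90)  # percentiles for Very Low .. Very High
RATINGS = ("Very Low", "Low", "Nominal", "High", "Very High")
ACAP_EM = (1.42, 1.19, 1.00, 0.85, 0.71)
PCAP_EM = (1.34, 1.15, 1.00, 0.88, 0.76)


def capability_level(percentile: float) -> int:
    """Index of the anchor closest to `percentile` (ties go to the lower level)."""
    return min(range(len(ANCHORS)), key=lambda i: abs(ANCHORS[i] - percentile))


def personnel_capability_em(analyst_pct: float, programmer_pct: float) -> float:
    """Combined ACAP x PCAP effort multiplier for the team."""
    return ACAP_EM[capability_level(analyst_pct)] * PCAP_EM[capability_level(programmer_pct)]


if __name__ == "__main__":
    # Hypothetical team: strong analysts (80th percentile), average programmers (50th).
    print(RATINGS[capability_level(80)], RATINGS[capability_level(50)])
    print(personnel_capability_em(80, 50))
```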

Personnel Factors (cont'd)
Applications Experience (APEX): assess the project team's equivalent level of experience with this type of application.
APEX ratings: Very Low: ≤ 2 months; Low: 6 months; Nominal: 1 year; High: 3 years; Very High: 6 years.

[Figure. Source: (c)2012 Enterprise Strategy Group]

Personnel Factors (cont'd)
Platform Experience (PLEX): assess the project team's equivalent level of experience with this platform, including the OS, graphical user interface, database, networking, and distributed middleware.
PLEX ratings: Very Low: ≤ 2 months; Low: 6 months; Nominal: 1 year; High: 3 years; Very High: 6 years.
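APEX and PLEX share the same experience anchors, so one helper can rate both. The sketch below assumes a floor rule (a team gets the highest level whose anchor it has reached, so 4 months still rates Very Low) and the published COCOMO II.2000 APEX and PLEX multipliers (verify before use); the names and the example team are hypothetical.

```python
# Sketch of APEX/PLEX: rate experience (months) by the highest anchor reached.
# The floor rule is an assumption; multipliers are COCOMO II.2000 published values.

from bisect import bisect_right

ANCHOR_MONTHS = (2, 6, 12, 36, 72)  # 2 months, 6 months, 1 year, 3 years, 6 years
RATINGS = ("Very Low", "Low", "Nominal", "High", "Very High")
APEX_EM = (1.22, 1.10, 1.00, 0.88, 0.81)
PLEX_EM = (1.19, 1.09, 1.00, 0.91, 0.85)


def experience_level(months: float) -> int:
    """Index into RATINGS; anything below the 6-month anchor counts as Very Low."""
    return max(0, bisect_right(ANCHOR_MONTHS, months) - 1)


if __name__ == "__main__":
    # Hypothetical team: 3 years in this application domain, 4 months on a new Big Data platform.
    apex, plex = experience_level(36), experience_level(4)
    print(RATINGS[apex], RATINGS[plex], APEX_EM[apex] * PLEX_EM[plex])
```

Under these assumptions, deep domain experience (APEX High) only partly offsets a team that is new to the Big Data platform (PLEX Very Low).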

[Two figures. Source: (c)2012 Enterprise Strategy Group]

Conclusion: Scale Drivers and Big Data
Scale drivers assessed for COCOMO II coverage: Precedentedness (PREC), Development Flexibility (FLEX), Architecture/Risk Resolution (RESL), Team Cohesion (TEAM), Process Maturity (PMAT).
(c) USC CSSE

Conclusion: Cost Drivers and Big Data
Cost drivers assessed for COCOMO II coverage / future work: Reliability (RELY), Data (DATA), Complexity (CPLX), Reusability (RUSE), Documentation (DOCU), Time constraint (TIME), Storage constraint (STOR), Platform volatility (PVOL).
Future work:
- Data (DATA): define an Extra High cost rating for terabyte-to-zettabyte data projects.
- Complexity (CPLX): covered, but more detail is needed for Big Data custom-developed solutions (25% of all projects).

Conclusion: Cost Drivers and Big Data
Cost drivers assessed for COCOMO II coverage / future work: Analyst capability (ACAP), Programmer capability (PCAP), Applications experience (APEX), Platform experience (PLEX), Language and tool experience (LTEX), Personnel continuity (PCON), Software tools (TOOL), Multisite development (SITE), Required schedule (SCED).

References
- Boehm, B. W., et al. (2000). Software Cost Estimation with COCOMO II. Prentice Hall, New Jersey.
- Boehm, B. W. (1981). Software Engineering Economics. Prentice Hall, New Jersey.
- McKinsey Global Institute (June 2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. www.mckinsey.com/mgi
- Zikopoulos, P., and Eaton, C. (2011). Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media.
- Enterprise Strategy Group. Research Report: The Convergence of Big Data Processing and Integrated Infrastructure.
- http://en.wikipedia.org/wiki/big_data