A Software Tool for Automatically Verified Operations on Intervals and Probability Distributions
Daniel Berleant and Hang Cheng


Abstract
We describe a software tool for performing automatically verified arithmetic operations on independent operands when the operands are intervals, or probability distribution functions, or one operand is an interval and the other is a distribution. Intervals and distributions are expressed using the same technique, so the algorithms do not need to distinguish between intervals and distributions in their operation. The tool can calculate common arithmetic operations with guaranteed results (as well as confidence limits on a distribution if the distribution is empirically estimated from samples). A previous paper (Berleant 1993 [1]) discusses the concepts, algorithms, and related work. Here we emphasize a software tool that implements the algorithms, interacts with the user via a graphical user interface, and saves, retrieves, and prints the results of its calculations.

1 Introduction to the Representation
We use histograms to represent, correctly, both probability distributions and intervals. We take an interval to be an incompletely specified description of value such that there is a probability of one that the actual but unknown value falls within the interval. A probability distribution may be in the form of either a probability density function (PDF) or its integral, a cumulative distribution function (CDF). Next, we show how to represent both distribution functions and intervals, correctly, using the same representational technique.

1.1 Representing a Distribution Function
Histograms are a natural way for people to create and edit probability distributions, and are used for this purpose by our software tool. While a histogram discretizes the underlying PDF, in our technique this discretization introduces
no error, and thereby maintains correctness. Instead, it introduces information loss. To see this, consider a histogram discretization of a PDF to be a partitioning of the PDF's domain into a set of intervals, each of which is associated with a probability mass equal to the area under the PDF across the range of that interval. The graphical depiction of a histogram is then simply a convention for describing this set of intervals and their associated masses. A histogram graphic (Figure 1) is user friendly but has the disadvantage that a histogram bar is conventionally shown with a flat top, which might give the impression that the probability mass associated with the interval for that bar is assumed to be distributed evenly over its range. In fact, in our technique no assumption is made about how the probability mass associated with an interval is distributed over that interval; hence a given histogram bar is consistent with any such distribution, including the PDF that the histogram might have been intended to discretize but also an infinite number of others. This is explained in more detail in Berleant (1993 [1]), but the main point to consider is that this view of a histogram allows it to discretize a PDF correctly but with information loss, and displaying the bars with flat tops is merely a graphical convention. The more bars in the histogram, the less the information loss.

The software tool we report here includes the capability of allowing users to graphically edit histograms. To underline their information-losing but correctness-preserving properties, the tool also allows display of the same information in another graphical form, the integral (or cumulated probability) of a histogram. Just as a PDF can be integrated into a corresponding CDF, the histogram discretization of a PDF we have described can also be integrated and this cumulated form displayed graphically (Figure 1). The graphic for the cumulation of the histogram consists of two bounding CDFs.
The higher of the two assumes the extreme case where the probability mass associated with each bar of the histogram is concentrated at the low bound of its interval. The lower of the two assumes the other extreme, where each probability mass is concentrated at the high bound of its corresponding interval. Any other of the infinite number of distributions consistent with the histogram, when integrated, results in a curve that stays within the two stairstep-shaped bounding CDFs exemplified in Figure 1. The space between these bounding CDFs reflects the information loss associated with the discretization.

1.2 Representing an Interval
An interval is represented in our technique as a histogram with one bar. Since the interval representation of value does not assume any particular distribution of probability mass within the interval, the interval representation is consistent with any PDF whose value is zero outside the bounds of the interval, and with any CDF whose value is zero below the interval and one above it (Figure 2). To represent the interval using bounding CDFs, we must determine the space of CDFs consistent with the interval. The two most extreme CDFs that bound
Figure 1: A histogram edited by a user and its two corresponding CDF bounds (one solid and one dotted).
this space are the one that rises faster than any other CDF consistent with the interval, and the one that rises slower than any other CDF consistent with the interval. The one that rises fastest corresponds to the PDF which is an impulse of mass 1 at the interval's low bound (that is, the variable's value is certainly equal to the interval's low bound). Likewise, the one that rises slowest corresponds to the PDF which is an impulse of mass 1 at the interval's high bound. The interval is therefore representable as a family of CDFs whose two bounding CDFs each have one "step." Figure 2 shows an interval and its representation as a bounded family of CDFs; the CDF that bounds this family from above jumps from zero to one at the interval's low bound, and the CDF that bounds it from below jumps from zero to one at the interval's high bound.

2 Introduction to Arithmetic Operations
The algorithmic approach to performing an arithmetic operation on histograms is exemplified in Figure 3. An arithmetic operation produces as a result a Cartesian product consisting of a set of intervals that in general contains overlaps. Due to the overlaps this set cannot be shown directly as a histogram, but can be integrated and shown correctly as a pair of stepwise, bounding CDFs. (The software tool will however allow a result to be displayed as an approximating histogram if desired, due to the visual impact and intuitive appeal of histograms.) However displayed, this result may in turn be used as an operand in further arithmetic operations. The representational technique of using a set of intervals (nonoverlapping in the case of PDFs, singleton in the case of an interval operand, and overlapping in the case of results of arithmetic operations), with a probability mass associated with each interval in the set, is the key to our algorithm.
We term this representation an intermediate distribution; it mediates between PDFs, intervals, and bounding CDFs, and is the underlying representation used by the algorithm for its computations. A fuller description of the algorithm is given in Berleant (1993 [1]). Related algorithms appeared beginning in 1968 (Ingram et al. [6]); however, these algorithms were not automatically verifying. A recent constraint satisfaction approach to the problem is in Hyvönen (1995 [4]). An approach to using CDF bounds in computer systems administration appears in Post and Diltz (1986 [8]) under the rubric of the stochastic dominance field, which recognizes the usefulness of bounded families of CDFs. Another area of related work is robust statistics. We have not been able to find work by others discussing automatically verified operations where one operand is an interval and the other is a distribution function. A fuller review of related work appears in Berleant (1993 [1]).
Figure 2: The interval [4, 7] and its representation as two CDFs bounding the family of all CDFs consistent with the interval.
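The bounding CDFs of Sections 1.1 and 1.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool's implementation: the function name `cdf_bounds` and the list-of-triples layout `(low, high, mass)` are assumptions made here, and exact `Fraction` masses stand in for whatever numeric type the tool uses.

```python
from fractions import Fraction

def cdf_bounds(dist, x):
    """Return (lower, upper) bounds on P(value <= x).

    dist is a list of (low, high, mass) triples (hypothetical layout).
    Upper bound: every mass sits at the low bound of its interval.
    Lower bound: every mass sits at the high bound of its interval.
    """
    upper = sum(mass for low, high, mass in dist if low <= x)
    lower = sum(mass for low, high, mass in dist if high <= x)
    return lower, upper

# The interval [4, 7] of Figure 2 is a histogram with a single bar of mass 1.
interval = [(4, 7, Fraction(1))]

# A two-bar histogram: mass 1/2 on [1, 2] and mass 1/2 on [2, 4].
histogram = [(1, 2, Fraction(1, 2)), (2, 4, Fraction(1, 2))]

for x in (3, 5, 7):
    print(x, cdf_bounds(interval, x))
```

For the interval, the two bounds are both 0 below 4, are 0 and 1 anywhere from 4 up to (but not including) 7, and are both 1 from 7 onward, which is the single-step pair of CDFs described in Section 1.2.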
Figure 3a: Let this histogram describe uncertain value X. It depicts the intermediate distribution {p([1, 2]) = 1/2, p([2, 4]) = 1/2}.
Figure 3b: Let this histogram describe uncertain value Y. It depicts the intermediate distribution {p([2, 3]) = 1/4, p([3, 4]) = 1/2, p([4, 5]) = 1/4}.

Cartesian      Operand 1 (X_i)          Operand 2 (Y_j)          Result intermediate distribution
product term   interval   probability   interval   probability   interval   probability
#1             [1, 2]     1/2           [2, 3]     1/4           [2, 6]     1/8
#2             [1, 2]     1/2           [3, 4]     1/2           [3, 8]     1/4
#3             [1, 2]     1/2           [4, 5]     1/4           [4, 10]    1/8
#4             [2, 4]     1/2           [2, 3]     1/4           [4, 12]    1/8
#5             [2, 4]     1/2           [3, 4]     1/2           [6, 16]    1/4
#6             [2, 4]     1/2           [4, 5]     1/4           [8, 20]    1/8

Figure 3c: Multiplying the intermediate distributions for X and Y leads to another intermediate distribution, shown in the last column. (This intermediate distribution contains overlapping intervals, as is typical of intermediate distributions resulting from arithmetic operations.)
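The Cartesian product of Figure 3c can be reproduced with a short Python sketch. The names `interval_mul` and `combine` are hypothetical and chosen here for illustration; the sketch uses exact integers and `Fraction` masses, so it sidesteps the outward rounding that a verified floating-point implementation would need.

```python
from fractions import Fraction
from itertools import product

def interval_mul(a, b):
    """Product of two closed intervals (low, high): min and max of endpoint products."""
    candidates = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(candidates), max(candidates))

def combine(xs, ys, op):
    """Apply op to every pair of intervals from two independent operands.

    Each operand is a list of (low, high, mass) triples. Because the
    operands are assumed independent, the masses of each pair multiply.
    """
    return [(*op((xl, xh), (yl, yh)), xm * ym)
            for (xl, xh, xm), (yl, yh, ym) in product(xs, ys)]

half, quarter = Fraction(1, 2), Fraction(1, 4)
X = [(1, 2, half), (2, 4, half)]                        # Figure 3a
Y = [(2, 3, quarter), (3, 4, half), (4, 5, quarter)]    # Figure 3b

XY = combine(X, Y, interval_mul)
for low, high, mass in XY:
    print(f"[{low}, {high}]  {mass}")
```

The six printed terms match the last column of Figure 3c, and their masses sum to one. Passing a different interval function as `op` would give the other arithmetic operations in the same way.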
Figure 3d: Integrating the result intermediate distribution in the last column of Figure 3c produces these stairstep-shaped bounding CDFs.

Figure 3 (a-d): Multiplying two operands. The user created two histograms X (Figure 3a) and Y (Figure 3b) using a graphical editor. These histograms are represented internally as intermediate distributions, that is, lists of intervals and associated probability masses. The intermediate distributions are used to calculate X × Y, whose intermediate distribution is shown in the last column of Figure 3c. Integrating this result produces the CDF bounds shown in Figure 3d.

3 A Software Tool
A software calculating tool that runs under Microsoft Windows has been written. This tool does automatically verified arithmetic operations on operands, each of which may be either an interval or a distribution function.
3.1 Creating and Displaying Operands
The tool allows the user to graphically create and edit histograms using the mouse. A mouse-click-based menu can be used to increase or decrease the number of bars in the histogram. Clicking the left button above a bar increases its height (and hence its area, which defines its probability mass) in proportion to how high above the bar the cursor is when the left button is pressed. Clicking the left button below the top of the bar decreases its height analogously. The width of a bar can also be increased or decreased, by clicking the right (instead of the left) mouse button above or below the top of the bar. As an alternative to mouse-based graphical editing, numerical values can be typed in directly when desired, by clicking on the Edit button at the top of the screen (visible e.g. in Figure 1) and invoking a dialog box. Clicking on the File button allows saving a histogram to a disk file, or reading one in from a disk file. The intent is to provide a user interface that is easy to use yet quite flexible.

An input histogram, as well as the result of an arithmetic operation on histogram operands, is represented internally as an intermediate distribution. This set of intervals, each associated with a probability mass, is stored on disk as a list of triples. Each triple specifies an interval low bound, an interval high bound, and a probability mass for the interval. This list can be edited using an ordinary text editor if desired. All intermediate distributions may be displayed graphically as histograms. These histograms are labeled Approximate when the intermediate distribution has overlapping intervals, as is usually the case for the result of an arithmetic operation. Approximate histograms cannot be edited, and like other histograms can alternatively be displayed correctly (not approximately) as a pair of bounding CDFs.

3.2 Operations on Operands
The tool provides several primitive operations.
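Returning to the storage format of Section 3.1, the list of triples and the Approximate label can be sketched as follows. The paper does not give the exact file syntax, so the whitespace-separated "low high mass" layout, one triple per line, is purely an assumption, as are the function names `parse_triples` and `has_overlaps`.

```python
def parse_triples(text):
    """Parse lines of 'low high mass' (assumed layout) into float triples."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        low, high, mass = (float(token) for token in line.split())
        if low > high or mass < 0:
            raise ValueError(f"bad triple: {line!r}")
        triples.append((low, high, mass))
    return triples

def has_overlaps(triples):
    """True if any two intervals share more than a single endpoint.

    After sorting by low bound it is enough to compare neighbours: if any
    pair overlaps, some adjacent pair overlaps as well.
    """
    ordered = sorted(triples)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))

edited = parse_triples("1 2 0.5\n2 4 0.5\n")
product_terms = parse_triples("2 6 0.125\n3 8 0.25\n4 10 0.125\n")

print(has_overlaps(edited))          # a user-edited histogram
print(has_overlaps(product_terms))   # part of a multiplication result
```

Under these assumptions, a histogram whose bars only touch at their endpoints is not flagged, while a result such as the first terms of Figure 3c is, which is the condition under which the tool labels a histogram Approximate.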
A menu of these operations is invoked by clicking on the Operation button at the top of the screen (Figure 4). Operations in {+, −, ×, ÷} take the top two panels A and B as operands and place the result in the lowest of the three panels, panel C (Figure 5), overwriting anything that may already be in C. If this result is to be used as an operand in a subsequent operation, it may be moved into panel A or B using one of the Exchange operations (Figure 4). An operand or result may be viewed as either a histogram or a pair of CDF bounds and may be viewed in detailed form (e.g. Figure 1), with or without a grid.

As an added feature, the tool can apply results of Kolmogorov (1941 [7]) and others to obtain bounds on the CDF for a random variable for which some samples are provided. The samples must be provided in a text file ending in ".smp" and containing their numerical values. This file may be read into panel A and displayed in cumulated form as an empirically estimated CDF for the underlying random variable. Then, the operation C = A's Confidence Limits...
Figure 4: The operations offered by the tool.
Figure 5: Panel C contains the result of applying an operation (division in this case) to A and B. Panel C, labeled Approximate as displayed in histogram form, can alternatively be exactly displayed as a pair of bounding CDFs.
Figure 6: Confidence limits around the CDF in panel A, if generated from samples of a random variable (several dozen student test scores in this case), may be calculated and shown in panel C. The confidence level is chosen from a menu.

(Figure 4) may be invoked to place in panel C bounds on the actual but unknown CDF for the random variable, to a given confidence level (Figure 6). Various functions of the software tool may be grouped together and partially automated using a primitive batch command facility.

4 Future Work
We now have a tool for doing common arithmetic operations on distribution function and interval operands. However, much more remains to be done to extend and apply the work. A next major stage in the research is to identify applications and apply the tool to those applications. Once identified, an application can serve not only to demonstrate the practical value of an idea but also to guide its further extension. Digital signal processing and highway maintenance decision support are among
those that have been suggested. Applications of stochastic dominance are also possible candidates, as are applications of robust statistics. The world is full of incompletely known values which need to be taken into account, underlining the importance and potential value of research dealing with incompletely known information.

A number of extensions to the tool described here would be useful:

Handling calculations whose intermediate results are dependent on one another could be done by implementing not only the common arithmetic operations as we have done, but also combinations of those so that, for example, given histograms in panels A and B, (B − A)/A could be calculated even though (B − A) and A are dependent on each other. This would require a simple expression parser and, although excess width might occur in constituent interval calculations in many cases, the result would still be correct (Berleant 1993 [1]).

Providing exponentiation, logarithms, trigonometric and other unary and binary operations would be useful.

Handling distribution functions that have tails extending out to infinity could be done by applying the work of others on handling intervals containing bounds of ∞ and −∞.

The tool is presently modeled after a calculator, in which only a small number of operands can be handled at once. A useful extension to the calculator concept is the spreadsheet concept. Interval mathematics has been implemented in spreadsheets (Hyvönen 1994 [5]) and distribution functions have also been shown to be feasible as cells in spreadsheets in commercial products such as Crystal Ball [3] [9].

5 Conclusion
The present paper describes a computer tool with graphical user interface capabilities, for allowing users to specify interval and distribution function operands as histograms and allowing them to perform automatically verified arithmetic operations on those operands.
A previous paper (Berleant 1993 [1]) reviews related work and describes in more detail how automatically verified arithmetic operations may be carried out on intervals and distribution functions. An important next stage in this research is to find applications for the techniques in order to demonstrate practical use of the tool and its underlying ideas. Details on the implementation appear in Cheng (1994 [2]). The software is available at no charge (for noncommercial purposes) from the authors.
6 Acknowledgements
We thank Robert Babcock for helping to facilitate support for one of the authors (Cheng), Vladik Kreinovich for his encouragement, Dan Layne for originally suggesting the interval mathematics forums, and Bill Walster for suggesting some possible application directions.

References
[1] D. Berleant, "Automatically Verified Reasoning with Both Intervals and Probability Density Functions," Interval Computations, 1993 No. 2, pp. 48-70.
[2] H. Cheng, "A Software Tool for Automatically Verified Reasoning with Intervals and Cumulative Distribution Functions," Master's Thesis, Dept. of Computer Systems Engineering, U. of Arkansas, Fayetteville, Dec. 1994.
[3] Crystal Ball, software product, Decisioneering Inc., 1380 Lawrence St. Suite 520, Denver CO.
[4] E. Hyvönen, "Constraint Reasoning on Probability Distributions," manuscript (1995), author is at VTT Information Technology, P.O. Box 1201, VTT, Finland.
[5] E. Hyvönen, "Spreadsheets Based on Interval Constraint Satisfaction," Artificial Intelligence for Engineering Design, Analysis, and Manufacturing, Vol. 8, 1994, pp. 27-34.
[6] G. E. Ingram, E. L. Welker, and C. R. Herrmann, "Designing for Reliability Based on Probabilistic Modeling Using Remote Access Computer Systems," Proceedings 7th Reliability and Maintainability Conference, American Society of Mechanical Engineers, 1968, pp. 492-500.
[7] A. Kolmogoroff (a.k.a. Kolmogorov), "Confidence Limits for an Unknown Distribution Function," Annals of Mathematical Statistics, Vol. 12, No. 4, 1941.
[8] G. V. Post and J. D. Diltz, "A Stochastic Dominance Approach to Risk Analysis of Computer Systems," Management Information Systems Quarterly, Vol. 10, No. 4, Dec. 1986, pp. 363-375.
[9] Software product, Palisade Corp., 31 Decker Rd., Newfield, NY.
Contact: Daniel Berleant, Dept. of Computer Systems Engineering, University of Arkansas, Fayetteville, AR