Modelling and Resolving Software Dependencies




<dburrows@debian.org>
June 15, 2005

Abstract

Many Linux distributions and other modern operating systems feature the explicit declaration of (often complex) dependency relationships between the pieces of software that may be installed on a system. Resolving incompatibilities between different pieces of software is an NP-complete problem, and existing solutions require the user to manually resolve many simple dependency problems. I present a simplified, abstract model of dependency relationships, and a restartable technique based on best-first search to calculate resolutions.

Note. This is a work in progress; it sometimes lags behind or jumps ahead of the current state of the software it documents, and some of the details may be incomplete or unattended to. However, I hope that it provides some more insight into the direction in which aptitude's problem resolver is headed, and in which I believe that other installation frontends should also consider heading.

1 Introduction

It is common nowadays for hundreds or thousands of software packages to be installed on a single computer system, and for many of these software packages to interact with one another. Because some combinations of software packages will not function properly (for instance, an application program might require a particular version of a graphics library), installing software manually while avoiding unexpected breakage is an increasingly unpleasant chore. To address this problem, programs known as package systems were developed. A package system typically manages packages that consist of the files of a program or library, along with metadata such as the name and version of the package, a brief description of what it contains, and (most importantly for our purposes) a list of which other packages it requires or is incompatible with. The package installation software warns the user upon any attempt to install or remove software that would violate these constraints.
Unfortunately, the early versions of these tools replaced the chore of manual software installation with the chore of dependency resolution: for instance, installing a package of the popular game wesnoth might produce an error indicating that the user should find, download, and install a new version of the SDL graphics library. A new version of the kmail mail client might require the user to upgrade his or her entire operating system, indicating this fact by a slew of cascading error messages.

As a result of this so-called "dependency hell", new and more automated tools, such as apt and up2date, were developed. These tools maintain a database of software installed on the user's system, along with software available from any number of remote sites. To install or remove a piece of software, the user issues a request to, for instance, "install wesnoth" or "upgrade kmail". The installation tool will proceed to find a set of package installations or removals which leads to a consistent result. Typically, it then presents this list of actions to the user and prompts for confirmation; the user can either accept the proposed solution, or reject it and proceed to fix the problem in a fully manual way. Once the user is satisfied with the proposed changes, the tool will download any new software packages and install them.

This approach has two major drawbacks:

1. The user interface for resolving dependencies is a "take it or leave it" proposition: there is no way for the user to ask the algorithm to find another solution. This means that if the algorithm makes a poor or undesired choice (which, as I will argue below, will inevitably occur from time to time), the user is forced to fall back to fully manual operation.

2. In at least some cases (particularly apt), the algorithm used in resolving dependency conflicts deals poorly (which is a euphemism for "not at all") with situations in which there are more than two versions of a package to choose from.[1] For instance, if versions 1, 2, and 3 of package A are available, with 2 being the default version of the package, and if package B requires version 3 of package A, then when the user tries to install package B, he or she will receive an error message indicating that the dependency on A cannot be fulfilled.
Another general difficulty in solving dependencies in these systems is that the package systems contain many features which, although they are arguably syntactic sugar, tend to cause algorithms that operate on packages to become strewn with complex iteration constructs and unpleasant corner cases. Although some attempts have been made to find general models of package dependencies (for instance, the internal structures of apt can represent either Debian or Red Hat packages), the models with which I am familiar work by taking a "greatest upper bound" of the systems that they cover, leading to a generic framework that is, if anything, even more convoluted than the individual package systems that it covers.

Note. I have not yet performed an extensive survey of package systems, and it may be that there already exist systems that fix one, two, or all of the drawbacks listed above.

[1] More precisely, if more than one version other than the currently installed version (if any) exists.

2 Example: the Debian Package System

The Debian package system is implemented by a low-level tool known as dpkg. Debian packages are files with the extension .deb; dpkg can install a .deb file that has already been retrieved, or remove a package that is currently installed on the system. If dependency constraints are violated, dpkg will print error messages and abort the installation after unpacking the packages. The usual user interface to the package system is through one of the programs in the apt suite. apt is a high-level library which allows C++ programs to examine the set of installed packages, determine what actions are to be performed, and execute these actions (by, for instance, downloading package files and calling dpkg to install them). apt-based installation tools typically refuse to even begin any actions that will result in an inconsistent system state, and all of them provide a basic algorithm that resolves inconsistencies by adjusting package states until all dependencies are fixed.

In the Debian package system, each package may have one or more versions, but at most one version of each package may be installed at any given time. The basic relationships between packages are dependencies and conflicts. For instance, version 6.14.00-1 of the tcsh command shell depends on version 2.3.2.s-4 or greater of the libc6 package and version 5.4-1 or greater of the libncurses5 package: it may not be installed unless an appropriate version of each of these packages is installed. On the other hand, the same package conflicts with all versions of the tcsh-kanji and tcsh-i18n packages: tcsh may not be installed at the same time as either of these packages.

A single dependency may name several packages, combined with an OR operator (indicated by a vertical bar). For instance, version 1.4.48 of the debconf package depends upon debconf-i18n | debconf-english; in order to install debconf, you must also install one of these two packages.

Last but not least, dpkg supports what are known as virtual packages.
Each version of each package may provide one or more package names; each named package will become a virtual package. Virtual packages can have the same name as a normal package, in which case they are known as mixed virtual, or they can exist only through being provided by a normal package, in which case they are known as pure virtual. An unversioned dependency on a virtual package will be satisfied if any provider of the name is installed, while an unversioned conflict will require that all providers of the name be removed; as a special case, direct or implicit self-conflicts are ignored. Versioned dependencies and conflicts do not, as of this writing, follow provided package names. For instance, the virtual package mail-transport-agent is used to identify packages containing mail transport agents. Every such package both provides and conflicts with the virtual package, and packages that require a mail transport agent depend on it.[2]

[2] Due to a quirk in how apt resolves dependencies, dependencies on a virtual package are required to include a real package as an alternative: for instance, bugzilla depends on sendmail | mail-transport-agent.
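The Debian rules above (OR groups, versioned lower bounds, and the fact that only unversioned relations follow provided names) can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not dpkg's or apt's logic: versions are integer tuples rather than Debian version strings, only ">=" constraints are modelled, conflicts and self-conflicts are ignored, and provided names are recorded per installed package rather than per version. The names `Alternative`, `alternative_satisfied` and `dependency_satisfied`, and the example versions, are hypothetical.

```python
# Hypothetical sketch of Debian-style dependency semantics (Section 2).
# Assumption: versions are int tuples; real Debian version comparison is richer.
from dataclasses import dataclass
from typing import Optional

Version = tuple[int, ...]


@dataclass(frozen=True)
class Alternative:
    """One member of an OR group, e.g. libc6 (>= 2.3) or mail-transport-agent."""
    name: str
    min_version: Optional[Version] = None  # None means an unversioned relation


def alternative_satisfied(installed: dict[str, Version],
                          provides: dict[str, set[str]],
                          alt: Alternative) -> bool:
    # A real package with this name counts if its version meets the bound.
    if alt.name in installed:
        if alt.min_version is None or installed[alt.name] >= alt.min_version:
            return True
    # Versioned relations do not follow provided names; unversioned ones do.
    if alt.min_version is not None:
        return False
    return any(alt.name in provides.get(pkg, set()) for pkg in installed)


def dependency_satisfied(installed: dict[str, Version],
                         provides: dict[str, set[str]],
                         or_group: list[Alternative]) -> bool:
    """An OR group (A | B | ...) is satisfied if any alternative is."""
    return any(alternative_satisfied(installed, provides, a) for a in or_group)


if __name__ == "__main__":
    installed = {"bugzilla": (2, 18), "exim4": (4, 50)}  # hypothetical versions
    provides = {"exim4": {"mail-transport-agent"}}
    mta = [Alternative("sendmail"), Alternative("mail-transport-agent")]
    print(dependency_satisfied(installed, provides, mta))  # True
    print(dependency_satisfied(installed, provides,
                               [Alternative("mail-transport-agent", (1,))]))  # False
```

Here bugzilla's "sendmail | mail-transport-agent" dependency is satisfied through the provider exim4, whereas a versioned dependency on mail-transport-agent is not, because versioned relations do not follow provided names.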

3 An Abstract Model of Dependency Relationships

As can be seen in the previous section, real dependency systems are complex. This tends to complicate the business of reasoning about how to find solutions to dependency problems, and to cause algorithms that manipulate dependencies to become horribly messy. In situations like this, it is often a good idea to find a simpler, more mathematical model of the problem being analyzed. Of course, it is well known that package dependencies can be reduced to the satisfaction of Boolean equations, but such a reduction is arguably too extreme: it certainly results in a mathematical model, but the model it produces hides the structure of the original problem. The following section describes an alternate model which is sufficient to capture any dependency problem (at least in Debian) and retains the structure of a package system.

3.1 Basic Concepts

In this simplified model, the only objects in the world are packages, versions of packages, and dependencies. Packages will typically be denoted by $p_1, \ldots$; versions will typically be denoted by $v_1, \ldots$; and dependencies are of the form $v \rightarrow \{v_1, \ldots\}$, indicating that the version $v$ requires at least one of the versions $\{v_1, \ldots\}$. The package associated with a version $v$ is denoted by $\mathrm{PkgOf}(v)$. To represent the state of the entire system, the following sets are defined:

- $P$ is the set of all packages.
- $V$ is the set of all package versions.
- $D$ is the set of all dependencies.[3]

Throughout this paper, I will assume that $P$ and $V$ (and hence $D$) are finite.

[3] Not the set of all potential dependencies, but the set of all dependencies asserted in the current package system.

3.2 Reduction of Debian Dependencies to the Model

As claimed above, it is possible to reduce a Debian dependency system to this abstracted model. The reduction proceeds in approximately the following way:

- $P$ is the set of Debian packages.
- $V$ is the set of versions of those packages, plus one additional version for each package. This version represents the state of the package when it is not installed. Versions corresponding to versions of the Debian package are indicated by $p{:}n$, where $n$ is the version number, while the uninstalled version is indicated by $p_u$.
- For each dependency of the version $v$ of a package on $A_1 \mid A_2 \mid \ldots$, accumulate a set $S$ containing all the matching versions of each named package, combined with every package version that provides a named package (if the dependency is unversioned). For instance, if $v$ declares a dependency on A (>= 3.2) | B, versions 3.1, 3.2, and 3.3 of package A are available, versions 1, 2, and 3 of package B are available, and package C version 3.14 provides B, then $S = \{A{:}3.2, A{:}3.3, B{:}1, B{:}2, B{:}3, C{:}3.14\}$. $D$ contains $v \rightarrow S$ for every such dependency.
- For each conflict declared by a version $v$ on the package $p$, accumulate a set $S$ containing all the non-matching versions of $p$, including the uninstalled version, and insert $v \rightarrow S$ into $D$. Furthermore, if the conflict is not versioned, then for each package $p'$ and version $v'$ of $p'$ such that $v'$ provides $p$, let $S = \{v'' \mid \mathrm{PkgOf}(v'') = p' \wedge v'' \text{ does not provide } p\}$ and insert $v \rightarrow S$ into $D$. For instance, if $v$ conflicts with A, of which versions 3.2 and 3.3 are available, versions 2 and 3 of B provide A, and no other versions of B are available, then $S = \{A{:}3.2, A{:}3.3, A{:}\mathrm{UNINST}, B{:}2, B{:}\mathrm{UNINST}\}$.

Note. In reality, extra care must be taken to screen out self-conflicts in this process, but the description above is complicated enough as it stands!

Remark. Although the above reduction is complicated to describe, its major steps must be performed whenever any program is analyzing dependencies: for instance, when listing all the versions that can fulfill a dependency, it is necessary to iterate over all members of each OR and to search their providing packages as necessary. Thus, an on-the-fly reduction in an algorithm written for the generic model is conceivably almost as efficient as an algorithm that works with the Debian package structure directly.

3.3 Installations

An installation represents a choice about which package versions to install; it is a function that maps each package to a version of that package.

Definition 1. An installation $I$ installs a version $v$, written $I \models v$, if $I(\mathrm{PkgOf}(v)) = v$.
Definition 2. An installation $I$ satisfies a dependency $d = v \rightarrow S$, written $I \models d$, if either $I \not\models v$ or $I \models v'$ for some $v' \in S$.

Definition 3. An installation $I$ is consistent if $I \models d$ for all $d \in D$.

Definition 4. If $I$ is an installation, then $I; p \mapsto v$ is a new installation which installs the version $v$ of the package $p$ and leaves all other packages unchanged:

$$(I; p \mapsto v)(p') = \begin{cases} I(p'), & p' \neq p \\ v, & p' = p \end{cases} \qquad (1)$$

As a shorthand, the following notation indicates that a particular version of a package is to be installed:

$$I; v = I; \mathrm{PkgOf}(v) \mapsto v \qquad (2)$$

4 The Dependency Resolution Problem

4.1 Problem Definition

Let $I_0$ be an inconsistent installation. We would like to find a consistent installation that is "similar to" $I_0$. This is the dependency resolution problem. In a real package manager, it corresponds to a situation in which the user's requests have resulted in an invalid state (with unsatisfied dependencies or conflicts); the goal of the package manager in this situation is to find a "small" and "good" set of changes that result in a valid state.

Note. This problem is poorly defined: "small" and "good" are not precise terms. The goal, from a UI point of view, is to not change too many packages, but to make "reasonable" decisions: for instance, if the user has requested that some packages be installed and these installations cause dependency clashes, solving the problem by cancelling the installations is probably not the desired result. However, while it might have obviously wrong solutions, this problem has no principled "correct" solution, because if several different users view a single dependency problem, each may prefer a different solution from the others. In other words, some of the information necessary to find the "best" solution is inside the user's head. Thus, the best we can do is to define some criteria for "goodness" (to prioritize solutions that are more likely to interest the user) and allow the user to see alternatives to an undesired solution.

4.2 Dependency Resolution is NP-complete

In order to find a good solution, we must first find any solution to the existing set of dependencies. Unfortunately, as shown below, this is an NP-complete problem.

Theorem 5. Dependency resolution is NP-complete.

Proof. Proof is by reduction from CNF-SAT to the problem "does a consistent installation $I$ exist?". Create one package for each variable and for each clause in the SAT problem.
For each variable $x$, let the versions of the corresponding package be $x{:}0$ and $x{:}1$; for each clause, create exactly one version. For each SAT clause, let $v_c$ be the single version of the package corresponding to the clause, and insert $v_c \rightarrow S$ into $D$, where, for each literal of a variable $x$ appearing in the clause, $S$ contains $x{:}0$ if the literal is negative and $x{:}1$ if it is positive. This reduction is clearly polynomial-time; I claim that a solution to this set of dependencies exists if and only if a solution to the corresponding SAT problem exists.

Suppose that there is an assignment that solves the SAT problem. Define an installation $I$ as follows: if $p$ corresponds to a clause, $I(p)$ is the single version of $p$; if $p$ corresponds to a variable $x$, $I(p) = x{:}0$ if $x$ is FALSE in the SAT solution and $I(p) = x{:}1$ if $x$ is TRUE in the SAT solution. Now, consider any dependency $d = v \rightarrow S$. From the construction above, $S$ and $v$ correspond to a clause of the SAT instance. At least one literal in this clause must be assigned the value TRUE (otherwise the clause is not satisfied); let $x$ be the corresponding variable. If the literal is positive, then (by construction) $S$ contains $x{:}1$; since $x$ must be assigned the value TRUE, $I \models x{:}1$. Hence, $I \models d$. On the other hand, if the literal is negative, then $S$ contains $x{:}0$, and since $x$ is assigned FALSE, $I \models x{:}0$, so $I \models d$. Thus, $I$ is a consistent installation.

Conversely, suppose that there is a consistent installation $I$. For all variables $x$, let $p$ be the corresponding package; if $I(p) = x{:}0$, assign FALSE to $x$, and if $I(p) = x{:}1$, assign TRUE to $x$. Now consider any clause in the SAT problem: from the construction above, $D$ contains a dependency $v_c \rightarrow S$, where $v_c$ is the single version of the package corresponding to the clause. Since we must have $I \models v_c$ and since $I$ is consistent, there must be a version $v \in S$ such that $I \models v$. But from the construction, there is some $x$ such that $v$ corresponds to either $x{:}1$, where $x$ appears as a positive literal in the clause, or $x{:}0$, where $x$ appears as a negative literal in the clause. Thus, the clause is satisfied, and so the assignment described above satisfies all clauses.

Finally, a proposed installation can be checked for consistency in polynomial time, so the problem is also in NP. Therefore, dependency resolution is NP-complete.
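The definitions of Section 3 and the construction used in the proof of Theorem 5 are small enough to sketch directly. The following Python fragment is illustrative only and rests on a few assumptions of its own: a version is a (package, label) pair so that PkgOf(v) is its first component, clauses are written with DIMACS-style signed integers, and consistency is found by brute force, which is exponential in the number of packages and only suitable for tiny instances. All names (`Dep`, `sat_to_deps`, `find_consistent`, and so on) are hypothetical.

```python
# Hypothetical sketch: the abstract model (Definitions 1-3) and the
# CNF-SAT reduction from the proof of Theorem 5.
from itertools import product
from typing import NamedTuple, Optional

Version = tuple[str, str]        # (package, version label); PkgOf(v) = v[0]
Installation = dict[str, str]    # package -> label of its installed version


class Dep(NamedTuple):
    """A dependency v -> S: if v is installed, some member of S must be."""
    v: Version
    s: frozenset[Version]


def installs(i: Installation, v: Version) -> bool:
    return i[v[0]] == v[1]                                    # Definition 1


def satisfies(i: Installation, d: Dep) -> bool:
    return not installs(i, d.v) or any(installs(i, w) for w in d.s)  # Def. 2


def consistent(i: Installation, deps: list[Dep]) -> bool:
    return all(satisfies(i, d) for d in deps)                 # Definition 3


def sat_to_deps(n_vars: int, clauses: list[list[int]]):
    """Literal k means x_k and -k means NOT x_k (DIMACS-style)."""
    # One package per variable, with versions "0" and "1".
    versions = {f"x{k}": ["0", "1"] for k in range(1, n_vars + 1)}
    deps = []
    for j, clause in enumerate(clauses):
        pkg = f"c{j}"
        versions[pkg] = ["only"]              # one single-version package per clause
        s = frozenset((f"x{abs(lit)}", "1" if lit > 0 else "0") for lit in clause)
        deps.append(Dep((pkg, "only"), s))
    return versions, deps


def find_consistent(versions: dict[str, list[str]],
                    deps: list[Dep]) -> Optional[Installation]:
    """Exhaustive search over all installations; fine for tiny inputs only."""
    pkgs = sorted(versions)
    for choice in product(*(versions[p] for p in pkgs)):
        i = dict(zip(pkgs, choice))
        if consistent(i, deps):
            return i
    return None
```

For the formula $(x_1 \vee \neg x_2) \wedge x_2$, `find_consistent(*sat_to_deps(2, [[1, -2], [2]]))` returns the installation with both variable packages at version "1", matching the only satisfying assignment; for the unsatisfiable $x_1 \wedge \neg x_1$ it returns `None`.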
4.3 Don't Panic

Although the problem at hand is NP-complete in general, there is good reason to think that the instances that arise in practice are tractable. It is well known that many NP-complete problems have "easy" and "hard" instances: some instances of the problem can be solved quickly by relatively naive algorithms, while others are intractable even using the most sophisticated techniques available. In the particular case of package dependencies, the traditions that have grown up around package tools seem to encourage the creation of easy instances of the dependency problem; furthermore, the user's desired installation is typically consistent or "almost consistent" (meaning that few dependencies are violated). It is usually straightforward, when solving problems in an ad hoc way, to isolate a small part of the dependency graph in which the problem occurs; for instance, by informally applying a constraint such as "don't solve dependencies by removing core library packages". Once this is done, the problem can be declared either solvable or unsolvable on the basis of a quick analysis of that region of the graph.

In fact, when even relatively basic search techniques are applied to many typical dependency problems, the difficulties that arise are related not to a paucity of solutions, but rather to an excess of them. That is, finding a solution is easy, but finding the right solution is more problematic. Indeed, in the Debian framework there is always at least one solution: removing every package on the system will satisfy all the dependencies. However, for obvious reasons, this is not a solution that we want to produce!

4.4 Solving Dependencies Through Best-First Search

This problem statement suggests the use of a relatively simple algorithm, best-first search, to resolve dependencies. To briefly review, best-first search works by keeping a priority queue, known as the open queue, of potential (or partial) solutions. The priority queue is sorted according to some heuristic function that quantifies the "goodness" of each node (often in terms of nearness to a full solution). In each iteration of the algorithm, the best partial solution is removed from the queue. If it is a full solution, the algorithm terminates; otherwise, each successor node is generated and placed in the queue.

There are two main issues to resolve: How should successors be generated? What heuristic should be used?

To generate successors, we could simply enqueue all possible changes to a single package. However, this would result in a gigantic branching factor (over 1500 branches at each step in the current Debian archive), and it would cause the algorithm to consider adjusting packages that were utterly irrelevant to the problem at hand, as well as changing a package multiple times (which can lead to choices being made for reasons that are obscure to the user). A more focused approach is needed.

Similarly, we could simply use the number of currently unsatisfied dependencies as our heuristic, but this does not provide any guidance as to how dependencies should be resolved.
If A depends on B, A is installed, and B is not installed, it is usually better to install B than to remove A; however, a straight count of broken dependencies would consider both solutions to be equally good.

4.4.1 Generating Successors to a Partial Solution

An obvious way of generating the successors of a given solution is to do it on the basis of unsatisfied dependencies. If the installation $I$ does not satisfy the dependency $v \rightarrow S$, we know that $v$ is installed but no member of $S$ is. To resolve this dependency, we can either install a different version of $\mathrm{PkgOf}(v)$ or install any element of $S$. Applying this rule to each broken dependency in turn will produce a set of successors that each solve at least one dependency (although they may break others in the process).

However, this approach still has the potential to run in circles by installing one version of a package, encountering broken dependencies, and then moving to a different version (possibly after resolving some dependencies of the intermediate version). The problem resolver of apt, for instance, sometimes confuses users by exhibiting this behavior. To fix this, I enforce a simple rule in generating solutions: a solution should never modify a package twice.

Definition 6. If the original installation was $I_0$, then for any $I$ and any $d = v_d \rightarrow S \in D$ such that $I \not\models d$, the installation $I' = I; v$ is a successor of $I$ for $d$ if $v$ is either a version of $\mathrm{PkgOf}(v_d)$ other than $v_d$ or an element of $S$, $v \neq I_0(\mathrm{PkgOf}(v))$, and $I(\mathrm{PkgOf}(v)) = I_0(\mathrm{PkgOf}(v))$.

One might wonder whether this approach risks overlooking solutions: for instance, maybe it really is necessary to go in circles in order to find a particular solution. However, as shown below, if a solution cannot be generated through the application of the successor rule defined above, then there is a simpler version of that solution (one which modifies the states of fewer packages) that can be generated. To prove this, I will first introduce some definitions and notation.

Definition 7. Let $I_1, I_2$ be installations. The following notation is used to denote the distance from $I_1$ to $I_2$ (defined as the number of packages whose mappings differ between $I_1$ and $I_2$):

$$\|I_1, I_2\| = |\{p \mid I_1(p) \neq I_2(p)\}| \qquad (3)$$

Definition 8. Let $I_1, I_2$ be installations. An installation $I'$ is a hybrid of $I_1$ and $I_2$ if for all $p$, either $I'(p) = I_1(p)$ or $I'(p) = I_2(p)$.

Note. An alternative phrasing is that if $I'$ is a hybrid of $I_1$ and $I_2$, then for all $v$ such that $I' \models v$, either $I_1 \models v$ or $I_2 \models v$.

Definition 9. If $I'$ is a successor of $I$ with respect to $I_0$ for the dependency $d$, then $I \xrightarrow{d}_{I_0} I'$. If there exist $I_1, \ldots, I_n$ and $d_1, \ldots, d_{n-1}$ such that $I_1 \xrightarrow{d_1}_{I_0} I_2 \xrightarrow{d_2}_{I_0} \cdots \xrightarrow{d_{n-1}}_{I_0} I_n$, then $I_1 \xrightarrow{*}_{I_0} I_n$.

Lemma 10. Let $I_c$ be any consistent installation (if one exists) and $I_0$ be any installation. For all hybrids $I$ of $I_0$ and $I_c$ and all dependencies $d \in D$ such that $I \not\models d$, there exists an $I'$ such that $I \xrightarrow{d}_{I_0} I'$, $I'$ is a hybrid of $I_0$ and $I_c$, and $\|I', I_c\| < \|I, I_c\|$.

Proof. Consider any hybrid $I$ of $I_0$ and $I_c$ such that $I$ is not a solution, and any $d = v \rightarrow S \in D$ such that $I \not\models d$.
where I I0 Suppose that I c v. Since I is a hybri of I 0 an I c, I 0 v. Thus, I I0 I, I = I; I c (P kgof(v)) (4) On the other han, if I c v for some v S, then I 0 v. Therefore, I, where 9

I = I; v (5) In either case, clearly I is a hybri of I 0 an I c an I, I c < I, I c, proving the lemma. Theorem 11. For any consistent installation I c an any inconsistent installation I 0, there exists a consistent installation I c such that I c is a hybri of I 0 an I c, an I 0 I 0 I c. Proof. Proof is by repeate application of the previous lemma. Consier any inconsistent hybri I of I 0 an I c. Let I + be the I shown to exist in the previous lemma for an arbitrary such that I, an efine a sequence I 1,... as follows: { I k 1 if I k 1 is consistent I k = (6) otherwise I + k I claim that this sequence converges; i.e., that for some finite n an all m > n, I n = I m. Proof: let D k = I k, I c an n = I 0, I c. By the previous lemma, D k D k+1 for all k, an D k = D k+1 if an only if I k is a solution. Thus, if I k is not a solution, we have D k n k. But by efinition, D k 0 for all k, so clearly I n+1 is a solution (else we have 0 D n+1 1). Therefore, the theorem hols with I c = D n+1. 4.4.2 Scoring The secon key ingreient of a best-first search is a scheme for orering search noes, typically by assigning a numerical value to each prospective solution. In oing so, we must balance two priorities: the esire to fin a solution quickly, an the esire to fin a goo solution. The most obvious way to guie the search towars a solution is to rewar avenues of inquiry that ecrease the number of unsatisfie epenencies. This is not, of course, guarantee to prouce a solution quickly; however, in practice, it seems to be a sufficient hint for the algorithm to reach a goal noe in a reasonable number of steps 4. Fining goo solutions is somewhat more ifficult, not least because of the fact that goo is an ill-efine property. The experimental implementation of this algorithm in aptitue uses the following general criteria to assign scores to noes: Each version of each package is assigne a separate score. 
By efault, removing any package is heavily penalize, altering packages which were automatically installe recieves a smaller penalty, maintaining the state of an automatic package makes no contribution to the score, an maintaining the state of a manually installe package receives a bonus. 5 4 Most searches seem to converge in uner 5000 steps. 5 In actuality, all that is calculate is the ifference between the initial total version score an the final total version score. 10

A penalty is applie to each search noe base on its istance from the root of the search. This works to favor simpler solutions an penalize more complex ones. Noes that resolve all epenencies are given an aitional bonus usually a relatively minor one. Goal noes are move through the priority queue in the normal manner, rather than being floate irectly to its hea, in orer to ensure that solutions that are particularly ba are not prouce unless it is absolutely necessary to o so. Thus, letting B(I) be the set of epenencies that are broken (not satisfie) by I an letting h(v) be the score of the version v, the total score of an installation is h(i) = α B B(I) + α L I, I 0 + α G δ(0, B(I) ) + p P h(i(p)) (7) where α B, α L, an α G are weighting factors an δ is the Kronacker elta function (i.e., δ(i, j) is 1 if i = j an 0 otherwise). In the current implementation, α B = 100, α L = 10, an α G = 50. 5 Reucing the Branching Factor 5.1 One Depenency at a Time The algorithm lai out above is sufficient to solve many of the epenency problems that are encountere in practice. However, some problems still cause the search to take an unacceptable amount of time to reach a solution. The problems observe fall into two main categories: 1. Too many reverse epenencies. In orer to calculate the score of a successor of an installation (an of course to analyze that solution later on) it is necessary to generate the set of epenencies which are not satisfie by that successor. However, there are some one hunre thousan epenencies in the Debian archive; so that it completes in a reasonable amount of time, the current implementation uses the obvious optimization of only testing those epenencies which either were previously broken, which impinge on the package version being remove, or which impinge on the package version being installe. 6 Unfortunately, some packages have very many reverse epenencies. 
For instance, if $I$ removes the system C library, over a thousand dependencies will be unsatisfied, and simply generating the successors of this node will require time at least quadratic in the number of reverse dependencies of libc. This can impose a significant performance penalty on the process of successor generation.

6 Recall that a successor to $I$ will install version $v$ of $p$, removing $I(p)$ in the process.

2. Removing the bottom of a dependency chain. When an important library such as GTK is removed, it is necessary to propagate the removal up the dependency tree. However, the search technique outlined above will search exponentially many installations before settling on this solution. Aside from the goal node of "keep the library on the system", the first step of the search will enqueue one node for each package depending on GTK; each node will remove its corresponding package. As these nodes are processed, pairs of packages will be removed from the system; then triples, and so on, until the full power set of all packages depending (directly or indirectly) on GTK is generated. Worse, at each step, solutions that suggest installing GTK (and removing many packages) will be generated.

There is a simple solution to both of these problems. Instead of generating successors for every dependency, it is sufficient to generate successors for a single, arbitrary dependency (as shown in Theorem 11). In theory, this could lead to a somewhat less optimal ordering of generated solutions, but this doesn't seem to be a major problem in practice, and the decrease in the problem's branching factor is well worth it.

5.2 Exclude Supersets of Solutions

One simple way to trim the search tree is to drop any search node $I$ that is a superset of a full solution $I_c$, meaning that $I_c$ is a hybrid of $I$ and $I_0$. This has the additional beneficial effect of preventing solutions from being offered to the user which are just a previously displayed solution with some extra, redundant actions added to it.

5.3 Forbidden versions

If we have a choice between removing $p$ and installing $q$, and we choose to remove $p$, why should we ever install $q$? This question leads to yet another way of reducing the problem's branching factor. To each solution node $I$, attach a set $F$ of forbidden versions; the successors of $I$ are restricted to those which do not install any version in $F$.
For every successor $(I', F')$ of $(I, F)$, let $F \subseteq F'$; furthermore, if $I'$ is generated by removing the source version of a dependency, then all of the targets of that dependency are members of $F'$. This new successor relationship is formally defined in Figure 2 on page 15. This has the effect of forcing the algorithm to stick with a decision to forgo installing the targets of a dependency in favor of shifting its source.

Note. This technique could just as well be applied by expanding the forbidden set when generating successors for the targets of a dependency: that is, by forbidding a different version of the source of the dependency from being installed. The decision regarding which exclusion principle to use was made on the basis of a conjecture that we are more likely to encounter a hard dependency problem when moving up a dependency chain than when moving down it.

$$\frac{I \not\models d \quad d = v \rightarrow S \quad I(v) = I_0(v) \quad \mathrm{Pkgof}(v') = \mathrm{Pkgof}(v) \quad v' \notin F}{(I, F) \rightarrow_{I_0} (I; v', F \cup S)}$$

$$\frac{I \not\models d \quad d = v \rightarrow S \quad v' \in S \quad I(v') = I_0(v') \quad v' \notin F}{(I, F) \rightarrow_{I_0} (I; v', F)}$$

Figure 1: Successor generation with forbidden versions

Of course, it is important to verify that cutting off wide swathes of the search space in this manner does not impede our ability to generate solutions:

Theorem 12. Let $I_c$ be any consistent installation (if one exists) and $I_0$ be any installation. There exists an $I'_c$ such that $I'_c$ is a hybrid of $I_0$ and $I_c$ and $(I_0, \emptyset) \rightarrow^{*}_{I_0} (I'_c, F)$ for some $F$.

Proof. Let $F_0 = \emptyset$. I claim that there exists a sequence $(I_1, F_1), \ldots$ such that for all $k \geq 0$: (i) $I_c \not\models v$ for all $v \in F_k$; (ii) $(I_0, F_0) \rightarrow^{*}_{I_0} (I_k, F_k)$; and (iii) either $k = 0$, or $I_{k-1}$ is consistent and $I_k = I_{k-1}$, or $\lVert I_k, I_c \rVert < \lVert I_{k-1}, I_c \rVert$.

Proof is by induction on $k$. Suppose that a sequence $(I_1, F_1), \ldots, (I_k, F_k)$ exists satisfying the above conditions. If $I_k$ is consistent, then let $I_{k+1} = I_k$ and $F_{k+1} = F_k$; the conditions hold for $k + 1$ immediately. Otherwise, consider any $d = v \rightarrow S \in D$ such that $I_k \not\models d$ (since $I_k$ is inconsistent, at least one such $d$ exists). If there is a $v' \in S$ such that $I_c \models v'$, then let $I_{k+1} = I_k; v'$ and $F_{k+1} = F_k$. Clearly $I_c \not\models v$ for all $v \in F_{k+1}$ and $\lVert I_{k+1}, I_c \rVert < \lVert I_k, I_c \rVert$; since we additionally have $(I_k, F_k) \rightarrow_{I_0} (I_{k+1}, F_k)$, the conditions hold for $k + 1$. If instead $I_c \not\models v'$ for all $v' \in S$, then, since $I_c$ is consistent, $I_c(\mathrm{Pkgof}(v)) \neq v$. Let $I_{k+1} = I_k; I_c(\mathrm{Pkgof}(v))$ and $F_{k+1} = F_k \cup S$. Then $I_c \not\models v'$ for all $v' \in S$ by definition, and clearly $\lVert I_{k+1}, I_c \rVert < \lVert I_k, I_c \rVert$. In addition, $(I_k, F_k) \rightarrow_{I_0} (I_{k+1}, F_{k+1})$ by Figure 2; therefore, the conditions hold for $k + 1$.

Thus, the claim is established: such a sequence exists. Following the logic of Theorem 11, we can see that for $n = \lVert I_0, I_c \rVert$, $I_n$ is a consistent installation. Furthermore, from the construction above, $I_n$ is a hybrid of $I_0$ and $I_c$. Thus, the theorem is established with $I'_c = I_n$.
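The two rules of Figure 1 can be sketched in Python for a single broken dependency, which also mirrors the one-dependency-at-a-time strategy of Section 5.1. This is a minimal illustration under stated assumptions, not aptitude's C++ code: the names `Dep`, `successors`, and `versions_of` are hypothetical, versions are assumed to be `(package, label)` pairs, removal is assumed to be modelled as a distinguished `"<removed>"` version, and $I(v) = I_0(v)$ is read as "the package of $v$ is untouched".

```python
from typing import Dict, FrozenSet, Iterator, List, NamedTuple, Tuple

# Hypothetical model: a version is a (package, label) pair, and removing a
# package is treated as installing its "<removed>" version.
Version = Tuple[str, str]
Installation = Dict[str, str]                     # package name -> installed label
State = Tuple[Installation, FrozenSet[Version]]   # a search node (I, F)


class Dep(NamedTuple):
    source: Version               # v in d = v -> S
    targets: FrozenSet[Version]   # S


def installs(inst: Installation, v: Version) -> bool:
    return inst.get(v[0]) == v[1]


def is_broken(inst: Installation, d: Dep) -> bool:
    # I does not satisfy d: the source is installed and no target is.
    return installs(inst, d.source) and not any(installs(inst, t) for t in d.targets)


def successors(inst: Installation, forbidden: FrozenSet[Version],
               initial: Installation, versions_of: Dict[str, List[str]],
               d: Dep) -> Iterator[State]:
    """Yield the successors (I', F') of (I, F) that address the broken d."""
    pkg = d.source[0]
    # Rule 1: shift the source to another version of its (untouched) package;
    # every target of d becomes forbidden in the new node.
    if inst[pkg] == initial[pkg]:
        for label in versions_of[pkg]:
            alt = (pkg, label)
            if alt != d.source and alt not in forbidden:
                yield {**inst, pkg: label}, forbidden | d.targets
    # Rule 2: install an untouched, non-forbidden target; F is unchanged.
    for t in sorted(d.targets):
        if inst[t[0]] == initial[t[0]] and t not in forbidden:
            yield {**inst, t[0]: t[1]}, forbidden
```

For example, with `versions_of = {"app": ["1", "<removed>"], "lib": ["1", "2", "<removed>"]}`, `I0 = {"app": "1", "lib": "1"}` and the broken dependency `Dep(("app", "1"), frozenset({("lib", "2")}))`, the generator yields two nodes: one removing `app` with `("lib", "2")` now forbidden, and one installing `lib` version 2 with the forbidden set unchanged. Counting the successors a dependency admits also gives a direct way to spot the cases described in Section 5.4: none means an essential conflict, and exactly one means the choice is forced.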

5.4 Use Logical Necessity

In combination with the tracking of forbidden versions, it is also possible to detect forced installations and essential conflicts. A forced installation is one which is logically necessary given $I$ and $F$: for instance, if we have $d = v \rightarrow \{v_1, v_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0(v)$), $v_1 \notin F$, and $v_2 \in F$, then the only permissible successor given $d$ is $(I; v_1)$. An essential conflict is a dependency for which no successors can be generated: for instance, if in the previous example we also had $v_1 \in F$, then $d$ would be an essential conflict. If any essential conflicts exist in an installation $I$, it is discarded immediately (rather than, for instance, generating successors for all the solvable dependencies). If any forced installations exist, they are accumulated, and a successor formed by adding these installations to $I$ is placed into the open queue.

6 Non-mandatory Dependencies

In addition to the standard Depends metadata, Debian also has a class of dependencies known as recommendations. In the words of section 7.2 of Debian's technical policy:

"Recommends: This declares a strong, but not absolute, dependency. The Recommends field should list packages that would be found together with [the recommending package] in all but unusual installations."

Package management frontends adopt a variety of strategies to deal with recommendations, ranging from completely ignoring them to treating them nearly as strictly as dependencies. The current best practice seems to be the rule "install recommendations when a package is first installed; ignore them otherwise." In this section, I will propose one way in which the above theory and algorithm can be extended to accommodate these non-mandatory relationships.

6.1 Hard and Soft

The information content of a recommendation is equivalent to that of a dependency, and so it makes sense to represent a recommendation in our formal model as a special type of dependency. I will divide dependencies into two classes: hard dependencies and soft dependencies. Soft dependencies, of course, represent recommendations.⁷
Now, although soft dependencies need not be satisfied in an eventual solution, we would like the algorithm to at least try to satisfy them, and in fact it should make a reasonably significant effort to satisfy them. In order to ensure that this is done, I suggest the following techniques:

7 Or, to be more precise, recommendations of packages that are not presently installed, in accordance with the abovementioned rule.

1. Extend the state of search nodes with an additional set $C$, representing the dependencies that have been closed by being examined at least once.

2. As shown in Figure 2, extend successor generation to permit the algorithm to give up on any open soft dependency: in addition to generating successors for the various ways of solving that dependency, it will also generate a successor in which no package states are changed, but the dependency is closed anyway.

3. Penalize broken soft dependencies, to reward solutions that fulfill soft dependencies. This has not yet been tested, but will likely require some rebalancing of the various weighting factors previously discussed in order to produce reasonable results.

$$\frac{d \notin C \quad I \not\models d \quad d = v \rightarrow S \quad I(v) = I_0(v) \quad \mathrm{Pkgof}(v') = \mathrm{Pkgof}(v) \quad v' \notin F}{(I, F, C) \rightarrow_{I_0} (I; v', F \cup S, C)}$$

$$\frac{d \notin C \quad I \not\models d \quad d = v \rightarrow S \quad v' \in S \quad I(v') = I_0(v') \quad v' \notin F}{(I, F, C) \rightarrow_{I_0} (I; v', F, C)}$$

$$\frac{I \not\models d \quad d \text{ is soft}}{(I, F, C) \rightarrow_{I_0} (I, F, C \cup \{d\})}$$

Figure 2: Successor generation with soft dependencies

7 Implementation

A prototype implementation of this resolver algorithm exists in the experimental branch of aptitude. The implementation is composed of two pieces, which are assembled via C++ templates: a search algorithm for a generic dependency problem, and a runtime translation of APT dependencies into the generic form outlined above. It does not implement soft dependencies, although their future inclusion is planned. The current implementation seems to perform reasonably well: in the cases that I have tested, solutions are generated quickly enough for interactive use. However, the order in which solutions are offered is sometimes surprising: for instance, if the installation of a package causes problems, it is common for the first generated solution to be "cancel this installation". While, as noted above, there is no perfect solution even in principle, and any static weighting is likely to occasionally produce odd results, I expect that some of these problems can be fixed through adjustments of the score function.
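Since adjusting the score function is the main lever mentioned here, a small sketch of equation (7) makes its tunable parts explicit. This is illustrative only, not aptitude's implementation: the names `score` and `distance` are hypothetical, $\lVert I, I_0 \rVert$ is assumed to count the packages whose version differs from $I_0$, the set $B(I)$ is assumed to be computed elsewhere, and $\alpha_B$ and $\alpha_L$ are taken as negative (-100 and -10), since the text describes both terms as penalties while $\alpha_G = 50$ is a bonus.

```python
from typing import AbstractSet, Callable, Dict, Tuple

Version = Tuple[str, str]      # hypothetical (package, label) pair
Installation = Dict[str, str]  # package name -> installed label


def distance(inst: Installation, initial: Installation) -> int:
    """Assumed ||I, I_0||: the number of packages whose version was changed."""
    return sum(1 for pkg in initial if inst[pkg] != initial[pkg])


def score(inst: Installation, initial: Installation, broken: AbstractSet[object],
          version_score: Callable[[Version], int],
          alpha_b: int = -100, alpha_l: int = -10, alpha_g: int = 50) -> int:
    """Total score h(I) of equation (7); higher scores are better."""
    n_broken = len(broken)                # |B(I)|
    is_goal = 1 if n_broken == 0 else 0   # Kronecker delta(0, |B(I)|)
    version_total = sum(version_score((pkg, label)) for pkg, label in inst.items())
    return (alpha_b * n_broken
            + alpha_l * distance(inst, initial)
            + alpha_g * is_goal
            + version_total)              # sum over p of h(I(p))
```

With all version scores at zero, changing one package and leaving nothing broken scores -10 + 50 = 40, while leaving $I_0$ unchanged with one broken dependency scores -100. Footnote 5 notes that the implementation only computes the difference between the initial and final total version scores; summing every version score, as this sketch does, ranks nodes identically, because the two differ by the constant total version score of $I_0$.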

8 Future Work

As noted above, the score function needs to be adjusted and soft dependencies need to be supported. In addition, some consideration of the following questions seems worthwhile to me:

1. Is it ever possible to divide and conquer a dependency problem? Rationale: as noted towards the beginning of this paper, informal analyses of dependency problems often seem to adopt a divide-and-conquer approach. Moreover, such an approach would have several important user-interface benefits: for instance, it would avoid the tendency of the algorithm to produce the Cartesian product of all the different ways to solve each isolated group of dependency problems. I do not, however, see an obvious, simple way of performing such a division.

2. Can and should overly similar solutions be detected and dropped? When a solution implicates a large number of packages, the current algorithm tends to produce many solutions which differ only slightly from one another. From a user-interface perspective, it might be desirable to drop some of these solutions. What metric, if any, should be used to perform this dropping?