Progress Report to ONR on MURI Project
Building Interactive Formal Digital Libraries of Algorithmic Mathematics

Robert L. Constable
Cornell University
February 2003

Project Web Page
http://www.cs.cornell.edu/info/projects/nuprl/html/digital Libraries.html

Accomplishments since the May 2002 Project Review

1. Further Design of the Prototype Formal Digital Library (FDL)

Stuart Allen has written a key technical article on the notion of abstract object identifiers and certificates and a substantial companion design document. The technical article has been submitted for publication, and the design documents are a blueprint for our on-going work. These articles are highly original and conceptual, and the ideas developed in them are essential to a sound FDL. Stuart's treatment of every topic breaks new ground, and we have had to spend time to educate the project and publish ideas to elicit feedback.

We have been exploring the use of reflection based on the term structure of the FDL. We can now demonstrate a result that will hold in all of the theories that we connect. It is a result critical to Tarski's theorem. It represents a significant improvement in a logical treatment of reflection made possible by computer support.

2. Software

Expanding the Formal Digital Library (FDL) Prototype
We have added Library navigation functions and documented them. This entailed a great deal of programming and technical writing. The extension to the FDL represents nearly 1,000 person-hours of work.

Linking Nuprl 5, Nuprl 4, MetaPRL, and PVS to the FDL

We have spent many programmer hours linking these provers to the Library.

3. Adding content to the FDL

We have added new content in two areas, both essential. First, we are developing new material that is directly relevant to software infrastructure protection, namely protocol verification, system security, and basic formal data structures. Second, we are incorporating PVS material into the Library.

Critical infrastructure protection of software

Our approach to this area is unique and important; together we can point this out to OSD. The basic argument is that verification tools will be vastly more useful if they have access to large knowledge bases. Moreover, the latest DARPA thrust in intelligent networks will require the knowledge base that we will assemble.

We know the importance of digital knowledge bases from research in AI: intelligent systems need access to large amounts of knowledge. We also know it from our work with protocol verification. Once we established a large knowledge base in distributed systems, we could verify protocol designs as fast as the systems people could produce them.

The investigations we are conducting now are modeled after the research program we pursued with ONR funding in the 1990s for the investigation of correct-by-construction functional programming with Nuprl. The area created with NSF and ONR funding is still very active and is having a significant impact, spawning related research programs and systems and eventually leading to a decade-long period of practical work supported by DARPA. The Nuprl book from 1986 remains among the top twenty-five most cited documents on the Web according to Citeseer, and the current Nuprl Web page brings that book up to date.
There are many lines of theoretical work that can be traced back to this earlier project, including the solution of two open problems in theoretical computer science and mathematics (Howe and Murthy), the creation of closely related proof development systems, such as Alf, Coq, Lego and MetaPRL, and many significant practical results in both industry and government. Dozens of PhD theses have been written on this topic along with hundreds of articles and reports.

We believe that the research agenda we are proposing now has the same or greater potential to produce strong practical applications and unexpected discoveries over another twenty-year period. In addition, we have picked a topic on which there is common interest with the Naval Research Laboratory: protocol verification using IO automata and protocol synthesis. Also, Jason's popular expository writing about O'Caml spreads our formally grounded writing ideas to a wide audience.

We have developed a new capability to extract distributed systems from constructive proofs that specified behaviors are achievable. In some sense this solves a problem that has been under investigation by many researchers since 1990. It will have a significant impact on our ability to create reliable and secure software, and it depends vitally on a large formal digital library of the kind we are building.

Accumulating content from multiple provers

Jason's work on a formal O'Caml compiler developed inside MetaPRL is highly original and complements our formal O'Caml semantics.

We have imported PVS proofs into the FDL. We did this instead of posting the O'Caml semantics to the Web because, as you noted, we are burning researcher cycles as fast as we can and still have a large stack of subgoals awaiting attention. Our ability to replay and import PVS proofs extends our PVS capability considerably. We have written a CADE paper on this. The import required a substantial amount of programming and experimentation, again hundreds of hours. We are working on posting this material to the Web.

4. Presenting FDL content on the Web

The ability to display formal content on the Web has proven to be important to advertising the project. Thus it has taken on a higher priority with me and you than the technology now supports when applied to massive content from multiple systems.
Progress has been slow against hard technical problems. We are writing internal notes about the mechanisms and problems and discussing them in our research meetings and seminars. There are serious problems of scale as we try to deal with tens of thousands of objects.

Presenting PVS proofs on the Web at the same quality level as Nuprl and MetaPRL proofs is an additional technical challenge and requires us to extend Stuart's tools still further.

5. Engaging with a Community

Our discussions with people at the review and with you suggested the importance of identifying a community on which we can have a significant impact. We are committed to at least one such community, Mathematical Knowledge Management (MKM). In June 2002 I gave an invited talk at the first meeting of the North American branch, NA-MKM. This talk is posted at the project home page. Jim Caldwell is engaged with the European MKM, which has ties to the North American branch, and I will meet some of the leaders in July 2003.

We will also look to identify a second community that complements the first. At the North American MKM meeting that we attended en masse, this was one of the liveliest topics. We have remained in contact with the North American and European leaders of this group. I expect that we will host a meeting of the North American group next year if they are willing, and we might ask to use some of our ONR funds to help organize.

A related effort in which we also participate is QPQ (QED Pro Quo), a repository of open source deductive software. I am on the Advisory Board.

As part of community building, I have given technical lectures on the FDL at these places:

North American Mathematical Knowledge Management meeting in Hamilton, Ontario
Automath meeting at Heriot-Watt University in Edinburgh
University of Reading, England
Ben-Gurion University, Israel

I will be lecturing on the topic at the Marktoberdorf Summer School in July 2003 and at an ICALP workshop.
6. Personnel

Cornell             CalTech          Wyoming
Stuart Allen        Jason Hickey     James Caldwell
Robert Constable    Aleksey Nogin    Vitali Khaikine
Christoph Kreitz    Xin Yu           John Cowles
Richard Eaton
Lori Lorigo
Eli Barzilay

7. Publications

We list fourteen publications on the project Web page (including the FDL manual). There will also be two PhD theses, one from August 2002 and one from August 2003. This is a very high publication rate given the amount of software and formal mathematical content we produce as well.

Among the most significant papers are these.

(a) Abstract Identifiers and Textual Reference (2002)
(b) Notes on the Design and Purpose of the FDL (2002)
(c) Logic of Events (2003)
(d) Nuprl-PVS Connection: Integrating Libraries of Formal Mathematics (2002)
(e) Reflecting Higher-Order Abstract Syntax in Nuprl (2002)
(f) Sequent Schema for Derived Rules (2002)
(g) Theory and Implementation of an Efficient Tactic-Based Logical Framework (2002)

Paper (a) lays the foundations for a multi-logic library of formalized computational mathematics and computer science. It discusses an issue that many people have not recognized as critical to such a system, namely the need for abstract treatment of the name space. Paper (b) translates the ideas of (a) into FDL design decisions.

Paper (c) shows the value of this work to the problem of building secure-by-construction distributed system software. This is a core problem in CIP/SW. Cornell has another MURI in the area, on language-based security. Our work is important to that MURI as well. We make a direct connection between the FDL and the creation of secure code.
Paper (d) discusses some of the early issues discovered as we imported PVS into the FDL. The PVS prover creates a great deal of formal mathematics, and some of it is useful in algorithm construction; but the PVS prover is based on an entirely different logic from the one used in Alf, Nuprl, and Coq, the provers for computational mathematics. We are showing how to incorporate two very different logics into the FDL. This will increase its value considerably. It is a very hard logical and software engineering problem.

Paper (e) deals with the problem of reflecting formal logics. This capability is widely recognized as important in practice, but it is extremely complex and has not been fully deployed in any prover. We proposed using insights from computational mathematics to simplify the task. A thesis is being written on this topic. Its results will apply to all logics of the FDL.

Papers (f) and (g) describe the mechanisms of MetaPRL for derived rules, which will benefit all implemented formal logics.