Monday, June 16, 2008

Parnas OOPSLA keynote podcast notes

David Lorge Parnas delivered a keynote speech on October 24, 2007 at OOPSLA in Montreal. The speech is mostly about documentation and the communication required to build large projects. It touches on other issues in building such projects, but documentation is the focus.

I listened to it twice, and took notes. I haven't seen his slides. Sorry, this isn't prose, it's notes. Expect sentence fragments. They're loosely grouped by topic, not in the order you'd hear them in the show. Some repetition was removed. It hasn't escaped my notice that, as documentation, these notes of mine suck.

Listen to the show. It's one of the better talks about computing i've heard in the last decade or so.

Object orientation is not a language thing; it's a description thing. Dijkstra's THE system did this in assembler. But OO languages have made it harder. Parnas has done it in Fortran. Yet younger people see it as a language issue.

  • Abstraction - what is the "more abstract than" relationship? Subsetability: sequence of virtual machines.
  • Information hiding, inspired by managers who wanted to break up a task into work packages, without incurring too much communication. (Parnas). Information hiding as a way to do composition.
  • Documenting module interfaces.
  • "Information hiding is an empirical result." No, it's mathematical. If module A uses B, and you know only a limited amount about B, you can tell whether A needs to change or not.
  • Hence, Information Hiding Theorem.
  • So you have the theory, but must also check it out in practice, because "theoretically" means "not really". "Theoretically, we can all run a 4 minute mile."
  • Unhappy with this definition: Modules are collections of procedures - invoked by procedure call.
  • Abstract data types. Instead of one data structure per module, a module may manage more than one.
  • Components - really about sellable bits. Modules aren't components.
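To make information hiding concrete, here is a minimal Python sketch (not from the talk; every class and function name is hypothetical). Two modules offer the same interface, `push`, `pop` and `is_empty`, but hide different secrets. The client `reverse` relies only on the interface, so, as in the note about modules A and B above, you can tell without reading either implementation that swapping one secret for the other does not require changing `reverse`.

```python
# Two hypothetical modules with the same interface (push, pop, is_empty)
# but different secrets. Clients may rely only on the interface.

class ListStack:
    """Secret: the stack is a Python list, top at the end."""

    def __init__(self) -> None:
        self._items = []

    def push(self, x: int) -> None:
        self._items.append(x)

    def pop(self) -> int:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


class LinkedStack:
    """Secret: the stack is a chain of (value, rest) pairs."""

    def __init__(self) -> None:
        self._head = None  # None, or a (value, rest) pair

    def push(self, x: int) -> None:
        self._head = (x, self._head)

    def pop(self) -> int:
        if self._head is None:
            raise IndexError("pop from empty stack")
        x, self._head = self._head
        return x

    def is_empty(self) -> bool:
        return self._head is None


def reverse(values: list, stack) -> list:
    """Module A: uses a stack only through its interface."""
    for v in values:
        stack.push(v)
    out = []
    while not stack.is_empty():
        out.append(stack.pop())
    return out
```

Nothing in `reverse` mentions lists or pairs; that is the limited knowledge of B that A is allowed to have.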


Young people may have more energy. Older people have the experience to know what might change.

Documentation

How can you document a project so that, if a manager hands the document to three people, he gets the same thing back from each? When it fails, there's no one to blame. It was difficult, not in theory, but in practice.

Sometimes, combining modules is the right answer.

Mathematical relations.

Documentation for other engineering projects has multiple dimensions. There are multiple documents for a bridge, etc.

Coffee stain test - if the document is used, it's good. These days, it would be a document web hit count or something.

Design through documentation. Building a bridge: the documentation gets refined until the builders can use it to build the bridge. In engineering, these documents assist in inspection and maintenance, enable systematic design review and verification, and are binding. In software, it's the program that counts, and the documents may or may not be updated to match.

Documentation must be accurate - if it isn't right, people will stop using it.

Documentation must be precise. A precise document that is wrong is better than a vague document. A vague document is not even wrong. (One complaint i have about the talk is that it's more than a bit vague. That might be down to the slides i missed; he clearly wasn't reading from them, so whatever was on them is lost to a podcast listener.) Yet the questions at the end were from people who clearly knew what he was talking about.

Documentation must be consistent. Free from contradiction.

Everything must be only in one place. Otherwise one will be changed, and you lose consistency.

Documentation must be complete. It must answer questions you have.

Documentation must be easy to search. You must be able to find the answers to the questions you ask. Search tools can help you find things in electronic documents.

Notation and Organization

Documentation must use simple, closed form mathematics.

No introductory sections. People will read the introduction, ignore the body, and then misinterpret the document. Extreme example: introductions were recorded as audio and could only be played once. They were meant to help you understand the body, but could not be used as a reference. Of course, clever users figured out workarounds.

Source code isn't documentation for end users.

Documents show a separation of concerns. At least three views for a solid object. There is a set of documents.

Documentation is not a crisis: it's a chronic disease. Documentation needs to be a disciplined effort.

Documents should be good enough that you don't have to look at the code.

Documents are practical tools.

Documents must be authoritative. They must be organized so that you know where things go, or can be found. They are reference, like a dictionary, rather than a tutorial.

The test: ask someone a question about a program. If they go to the documentation first, it's good documentation. If they go to the code first, it isn't.

Documentation needs structure that avoids inconsistency.

Documentation needs to be better than "let's just try it".

Documentation needs to be used before you write the code, while you write the code, and after the code is finished.

All these documentation rules are easier said than done.

Each document should have a clearly defined role. Description, partial specification, full specification.

Three types of documents
  • Description: facts. Some facts are requirements.
  • Specification: only requirements.
  • Full specification: all requirements.

The same notation may be used by all three types, so there is no separate specification language.

If you look at a document, you can't tell if it is a description or a specification unless someone tells you.

It's not models. (!)

It's not formal documentation.

Source code clearly describes the program, but not the program's intent. It's not a specification. So it doesn't describe how the program might change. (On the other hand, specifications can change too. And, some programs do document intent via comments. One could argue that any other comments are pointless.)

The word 'theory' often means 'not really'. The program works, in theory.

All words are bendable (or ambiguous).

How to define what information should be in a document?
  • Must specify the content.
  • Each document is the representation of a mathematical relation.
  • Ordered pairs.
  • Program proving. Preconditions and postconditions.
  • If you can start in x, it should end in y.
  • Lots of examples of format and notation. Must agree on content.
  • Need to know all the variables you sense and control.
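A small sketch of "a document as a mathematical relation", in Python, using a made-up example (integer square root) rather than anything from the talk. The precondition says which starting values `x` are covered; the postcondition is a relation on ordered pairs `(x, y)`; any program whose pairs fall inside the relation is acceptable. Checking a finite range is only a spot check, not a proof.

```python
import math


def pre(x: int) -> bool:
    """Precondition: the starting values the specification covers."""
    return x >= 0


def post(x: int, y: int) -> bool:
    """Postcondition: a relation on (input, output) pairs."""
    return y * y <= x < (y + 1) * (y + 1)


def isqrt_impl(x: int) -> int:
    """A candidate program: binary search, invariant lo*lo <= x < hi*hi."""
    lo, hi = 0, x + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= x:
            lo = mid
        else:
            hi = mid
    return lo


def satisfies(program, inputs) -> bool:
    """If you start in x (pre holds), you must end in a y with (x, y) in the relation."""
    return all(post(x, program(x)) for x in inputs if pre(x))


if __name__ == "__main__":
    # The relation does not prefer one program: both of these meet it.
    print(satisfies(isqrt_impl, range(1000)), satisfies(math.isqrt, range(1000)))
```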

Counter example: Document form vs. content.

Great anecdote. Manager offers prize for documents conforming to specification: Seven parts on A4 with 1 cm margins, pages numbered, etc., but without talking about content. Winner submitted document in seven sections that said the same thing. Description: the module copies data from here to there. Design: Bytes are copied from here to there. Detailed design: Byte one is copied before byte two... and so on. Manager never noticed.

There are two relations:
  • Nature: the set of things that are possible without the system.
  • Requirements: the restriction of those cases to the ones the system allows.

Document should be able to be checked for feasibility.
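One way to read the two relations and the feasibility check, sketched in Python over small finite sets. The names `nat` and `req` follow the NAT/REQ naming in Parnas's later four-variable-model papers, but this is a simplified reading; the heater example, its values, and the thermal-cutout assumption are all hypothetical. Feasibility here means: for every input nature allows, some output satisfies both nature and the requirements.

```python
# Hypothetical heater controller over small finite domains.
TEMPS = [10, 15, 20, 25]        # monitored: room temperature
POWER = ["off", "low", "high"]  # controlled: heater setting


def nat(m: int, c: str) -> bool:
    """Nature: what is possible regardless of the software.
    Assumption: a thermal cutout makes 'high' impossible above 20 degrees."""
    return not (m > 20 and c == "high")


def req(m: int, c: str) -> bool:
    """Requirements: the subset of cases the system is allowed to produce."""
    if m < 15:
        return c == "high"
    if m < 25:
        return c in ("low", "off")
    return c == "off"


def req_bad(m: int, c: str) -> bool:
    """An infeasible variant: demands 'high' when it is hot, which nature forbids."""
    return c == "high" if m >= 25 else req(m, c)


def feasible(nat, req, ms, cs) -> bool:
    """For every monitored value nature allows, some controlled value
    must satisfy both nature and the requirements."""
    for m in ms:
        possible = any(nat(m, c) for c in cs)
        allowed = any(nat(m, c) and req(m, c) for c in cs)
        if possible and not allowed:
            return False
    return True
```

The infeasible variant is the kind of error this check is meant to catch on paper, before anyone tries to build a program that meets it.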

Software: modules, objects, components.
  • Distinct but related concepts.
  • Module is a work assignment.
  • Component is a unit for sale.
  • There can be many copies of an object in a system.
  • The same kind of document techniques can be used for all of these.

Describing things operation by operation is natural to programmers. It's not very helpful to others.

A7 aircraft documentation project. Behavior was described output by output, as a function of the history of inputs: you get this output under these circumstances.

It's harder to write for managers.

Microsoft EU court case. Anti-trust, with fines of $3 million a day. MS was told to document what other vendors needed to know to interoperate. MS attempted to document their interfaces and failed. In Parnas's opinion, MS did try. MS offered the code instead, but the other vendors refused, partly over intellectual property, but also because the code can change, and the spec would change with it.

Inability to write documentation is key to some of the big expensive problems today.

Specifications are not wish lists, or lists of features.

Documentation is not a list of facts about the code. In engineering, a specification is the requirements for the product.

Requirements should not be repeated. Else, inconsistencies creep in.

Describe the data structure. Describe how the data structure is interpreted: (interface) abstraction relation. Describe what each program on the interface does to the data structure.
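The three parts of that recipe, sketched in Python for a hypothetical integer-set module (the representation choice and all names are assumptions, not from the talk): the data structure, an abstraction function standing in for the abstraction relation, and the effect of one interface program. `commutes` checks that running the program and then abstracting gives the same set as abstracting first and applying the set-level description, the commuting-diagram condition in its simplest form.

```python
import bisect

# 1. The data structure: a sorted list with no duplicates (hypothetical choice).


def invariant(d: list) -> bool:
    return all(a < b for a, b in zip(d, d[1:]))


# 2. The abstraction function: how to read the data structure
#    as the abstract object it represents, a mathematical set.
def abstraction(d: list) -> frozenset:
    return frozenset(d)


# 3. What one interface program does to the data structure.
def insert(d: list, x: int) -> list:
    i = bisect.bisect_left(d, x)
    if i < len(d) and d[i] == x:
        return list(d)  # already present: unchanged copy
    return d[:i] + [x] + d[i:]


# The same program described only in terms of the abstract object.
def insert_spec(s: frozenset, x: int) -> frozenset:
    return s | {x}


def commutes(d: list, x: int) -> bool:
    """insert-then-abstract must equal abstract-then-insert_spec."""
    result = insert(d, x)
    return invariant(result) and abstraction(result) == insert_spec(abstraction(d), x)
```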

Traces - history of what has been done. Commuting diagram. (More math references). If it commutes, then it can be implemented.
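A rough sketch of the trace idea in Python, using the classic stack example rather than anything shown in the talk. The specification never mentions a data structure: it rewrites a history of calls with the equivalence "`push(a)` followed by `pop()` is the same as nothing", and defines `top()` from what survives. A hypothetical array-based implementation is then compared against it on random legal traces. Parnas's actual trace assertion method is a formal notation; this is only an executable approximation.

```python
import random


def reduce_trace(trace: list) -> list:
    """Rewrite a trace using the equivalence  T.push(a).pop()  ==  T."""
    out = []
    for event in trace:
        if event[0] == "pop" and out and out[-1][0] == "push":
            out.pop()  # the push and the pop cancel
        else:
            out.append(event)
    return out


def top_spec(trace: list):
    """top() after a legal, non-empty trace: the argument of the last surviving push."""
    reduced = reduce_trace(trace)
    if not reduced or any(op != "push" for op, _ in reduced):
        raise ValueError("top() is not defined after this trace")
    return reduced[-1][1]


class ArrayStack:
    """Hypothetical implementation under test: a fixed-size array plus a counter."""

    def __init__(self, capacity: int = 64) -> None:
        self._buf = [0] * capacity
        self._n = 0

    def push(self, x: int) -> None:
        self._buf[self._n] = x
        self._n += 1

    def pop(self) -> None:
        self._n -= 1

    def top(self) -> int:
        return self._buf[self._n - 1]


def check(trials: int = 1000, seed: int = 0) -> bool:
    """Run random legal traces (at most 20 calls) and compare top() with the spec."""
    rng = random.Random(seed)
    for _ in range(trials):
        stack, trace, depth = ArrayStack(), [], 0
        for _ in range(rng.randint(1, 20)):
            if depth > 0 and rng.random() < 0.4:
                stack.pop()
                trace.append(("pop", None))
                depth -= 1
            else:
                x = rng.randint(-9, 9)
                stack.push(x)
                trace.append(("push", x))
                depth += 1
        if depth > 0 and stack.top() != top_spec(trace):
            return False
    return True
```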

A document should be a description of a set of relations.

In practice, documentation is often inconsistent and has errors.

Describe mathematical concepts in a readable way. Predicate pairs. But how to write it so people can read it?

Bad example:

21 pages of inconsistent and erroneous testing documentation. The testers ignored the errors. Following testing instructions is dangerous. Everyone preferred this bad document. The document was readable, but managers didn't know it was wrong because they never tried to use it.

These two things are needed for practical, sound documentation.
  • Tabular expressions. Better for math full of "if"s and "but"s.
    Discrete math. The tables parse out the dimensions of the problem.
  • Relational model of documentation. Tells you what information must be
    in the document.

Requires significant training. But not a PhD. Fewer mistakes. Requires an engineering background. Precise and checkably complete. Pilots can read it (for A7) likely due to engineering training. Can be used as a slow (non-real time) prototype.
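To show why tables help with "if"-heavy math, here is a Python sketch of a one-dimensional function table for a hypothetical function `f`, not one from the talk. Each row pairs a condition with a result; `check_table` looks for gaps (incompleteness) and overlaps (conditions that are not disjoint) over a finite sample of the domain, and `evaluate` runs the table directly, which is the "slow prototype" use. Real tabular expressions have a formal semantics and these checks can be done by proof; sampling is a weaker stand-in.

```python
# Hypothetical function table for f(x):
#   Condition     | Result
#   x < 0         | -x
#   0 <= x < 10   | x * x
#   x >= 10       | 100
TABLE = [
    (lambda x: x < 0, lambda x: -x),
    (lambda x: 0 <= x < 10, lambda x: x * x),
    (lambda x: x >= 10, lambda x: 100),
]


def check_table(table, domain) -> list:
    """Report points where no row applies (gap) or several rows apply (overlap)."""
    problems = []
    for x in domain:
        hits = sum(1 for condition, _ in table if condition(x))
        if hits == 0:
            problems.append(f"incomplete at x={x}")
        elif hits > 1:
            problems.append(f"overlapping conditions at x={x}")
    return problems


def evaluate(table, x):
    """Use the table itself as a (slow) prototype of f."""
    for condition, result in table:
        if condition(x):
            return result(x)
    raise ValueError(f"table does not cover x={x}")
```

Each row can be read and checked on its own, which is much of the appeal over nested if/else prose.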

Down on Z (pronounced "Zed"), VDM and Alloy (formal specification languages). Alloy is relational, but that is not the problem; tabular notation is needed. These languages translate the code into other code. You may as well use a Turing Machine implementation. These tools are not practical. If it's easier to read the code, then the documentation isn't good enough. These tools are code. Should use engineering documentation as a guide.

Down on documentation extracted from source after the fact. For one thing, it's descriptive, not prescriptive.

Managers often do not have the engineering background needed to read documentation. Writing documentation for managers is harder than for programmers or users.

When you're done, the program should do what the document says it should. If you test the program from the document, they can't get out of sync.

Testing

Test coverage.

Reliability estimation via testing.

Mutation testing.
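Mutation testing in miniature, as a Python sketch with hand-written mutants (real mutation tools generate them automatically; everything here is hypothetical). Each mutant is one small deliberate change; a test suite "kills" a mutant if some test fails on it, and the fraction killed measures how sharp the suite is. The "strict comparison" mutant is equivalent to the original, so no test can kill it, one of the known practical headaches of the technique.

```python
def max_of(a: int, b: int) -> int:
    """Program under test."""
    return a if a >= b else b


# Hand-written mutants: each is one small, deliberate change to max_of.
MUTANTS = {
    "flip comparison": lambda a, b: a if a <= b else b,
    "strict comparison": lambda a, b: a if a > b else b,  # equivalent: same result for all inputs
    "return first": lambda a, b: a,
}


def run_suite(f, tests) -> bool:
    """tests: list of ((a, b), expected) pairs."""
    return all(f(a, b) == expected for (a, b), expected in tests)


def mutation_score(tests) -> tuple:
    """Return (mutants killed, total mutants) for a suite the original passes."""
    if not run_suite(max_of, tests):
        raise ValueError("the suite fails on the original program")
    killed = sum(1 for mutant in MUTANTS.values() if not run_suite(mutant, tests))
    return killed, len(MUTANTS)
```

With only the test `max_of(3, 1) == 3`, one mutant of three is killed; adding `max_of(1, 3) == 3` kills two, and the equivalent mutant survives regardless.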

Test case generation. Interesting points:
  • Extreme values
  • Cross zero
  • Discontinuities
  • Equation solving for boundary values
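A Python sketch of picking those interesting test points, under the assumption (mine, not the talk's) that the program's conditions are linear inequalities `a*x + b >= 0` over an integer domain. It collects the extremes of the domain, points around zero, and points on both sides of each boundary, where each boundary comes from solving `a*x + b == 0`; those boundaries are where the discontinuities live.

```python
import math


def boundary_points(lo: int, hi: int, conditions: list) -> list:
    """conditions: hypothetical linear conditions a*x + b >= 0, given as (a, b) pairs."""
    points = {lo, hi}  # extreme values
    if lo <= 0 <= hi:
        points.update({-1, 0, 1})  # crossing zero
    for a, b in conditions:
        if a == 0:
            continue  # constant condition: no boundary to solve for
        root = -b / a  # equation solving: where a*x + b == 0
        below, above = math.floor(root), math.ceil(root)
        points.update({below - 1, below, above, above + 1})  # both sides of the discontinuity
    return sorted(p for p in points if lo <= p <= hi)
```

For a domain of -100 to 100 and the single condition `x - 10 >= 0`, this yields `[-100, -1, 0, 1, 9, 10, 11, 100]`.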

Precondition/postcondition doesn't scale up as well as relational models.

Errors escape reviews. Must use the code to answer questions. What would you have to change in the code to do something?

Separation of concerns (Dijkstra). Look at small bits of code at a time. Important for testing.

(more testing:)
A "display"
  • what it should do
  • what the invoked programs do
  • right or wrong on its own.


Then, for large programs, testing uses lots of these displays. Scalable - lots of people can do little bits.

"The smallness of the human skull." - Dijkstra. "You should never look at a long program." Use divide and conquer, with precise documentation to hook things together. It's not stepwise refinement: in stepwise refinement, the program gets longer. Instead, keep the pieces separate.

If it's an object and a component, use the trace function method. If it's a program within a component, use the display concept with documentation.
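The display concept, approximated in Python. Parnas's displays are documents meant for inspection, not test harnesses, and every name here is hypothetical. A display holds three things: the specification of the program, the specifications of the programs it invokes, and the program text. The invoked program appears only through its specification, so the display can be judged right or wrong on its own. Here a selection sort is checked using a stand-in `argmin` that meets `argmin_spec` but deliberately picks the last minimum, to show the sort's correctness doesn't depend on which valid `argmin` is eventually written.

```python
from collections import Counter


# Part 1: specification of the displayed program.
def sort_spec(before: list, after: list) -> bool:
    """`after` is a sorted permutation of `before`."""
    same_items = Counter(before) == Counter(after)
    ordered = all(a <= b for a, b in zip(after, after[1:]))
    return same_items and ordered


# Part 2: specification (not code) of the invoked program.
def argmin_spec(v: list, i: int, j: int) -> bool:
    """j indexes a smallest element of v[i:]."""
    return i <= j < len(v) and all(v[j] <= x for x in v[i:])


# Part 3: the program text; the invoked program is a parameter.
def selection_sort(v: list, argmin) -> None:
    for i in range(len(v)):
        j = argmin(v, i)
        v[i], v[j] = v[j], v[i]


def last_argmin(v: list, i: int) -> int:
    """A stand-in that satisfies argmin_spec, picking the last minimum on purpose."""
    m = min(v[i:])
    return max(k for k in range(i, len(v)) if v[k] == m)


def check_display(cases) -> bool:
    """Check the display using only a spec-conforming stand-in for the invoked program."""
    def checked_argmin(v, i):
        j = last_argmin(v, i)
        assert argmin_spec(v, i, j), "stand-in violates argmin_spec"
        return j

    for case in cases:
        v = list(case)
        selection_sort(v, checked_argmin)
        if not sort_spec(case, v):
            return False
    return True
```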

Tools for documentation:
  • Functional and imperative programming are combined.
  • Functional specs for imperative programs.
  • Don't worry about efficiency of functional programs.
  • Don't worry about illegibility of imperative programs.
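Functional specs for imperative programs, sketched in Python with a hypothetical example: counting inversions in a list. The functional version is short, obviously right, and quadratic; the imperative one is an O(n log n) bottom-up merge-sort count that is much harder to read. Using the first as the oracle for the second is one way to test the program from the document, so the two can't drift apart.

```python
import random


def inversions_spec(v: list) -> int:
    """Functional specification: clear, quadratic, not meant to be fast."""
    return sum(1 for i in range(len(v)) for j in range(i + 1, len(v)) if v[i] > v[j])


def inversions_impl(v: list) -> int:
    """Imperative implementation: bottom-up merge sort that counts inversions."""
    a = list(v)
    n = len(a)
    buf = [0] * n
    count = 0
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, lo
            while i < mid and j < hi:
                if a[i] <= a[j]:
                    buf[k] = a[i]
                    i += 1
                else:
                    buf[k] = a[j]
                    j += 1
                    count += mid - i  # every remaining left element exceeds a[j]
                k += 1
            while i < mid:
                buf[k] = a[i]
                i += 1
                k += 1
            while j < hi:
                buf[k] = a[j]
                j += 1
                k += 1
        a, buf = buf, a  # the merged pass becomes the input of the next pass
        width *= 2
    return count


def agree(trials: int = 300, seed: int = 1) -> bool:
    """Use the functional spec as the test oracle for the imperative program."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.randint(0, 9) for _ in range(rng.randint(0, 30))]
        if inversions_impl(v) != inversions_spec(v):
            return False
    return True
```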

Object orientation for hiding the right things.

To Change
  • Vague diagrams that don't answer the right questions.
  • Write the documentation first, to help build the programs.
  • From group reading to systematic inspection of the doc.
  • Test generation from the documentation.
  • Anyone who knows two languages is a software engineer - a myth.
  • Need to train people in documentation.

Summary
  • Need to take a serious view of documentation.
  • It needs to be seen as normal, expected and effective.
  • Needs to be taught.
  • Think about why people don't do it (i.e., it's not taught).
  • Licenses for software engineering, including testing documentation skill.


Questions from the audience

In theory it will work everywhere.

Interpreting specifications, not executing them. Theorem provers, etc.

Deliberately introduce errors into program to find errors. Deliberately introduce errors into documentation to find errors - Mutation.

Movies: storyboarding. Rapid prototyping is similar. Knowing what you want by creating an approximation. Is the prototype a document?

The prototype should be bound by the specs, just as the real program.

Tables of documentation as an interface for talking to the customers.

Evolution of documents. The document evolves with the program. Use information hiding in the document as well as the code. Allows easier evolution.

User interfaces and complex database interfaces: the approach works for these too.

Teach professionals the tabular notation? Need to focus on a method that works - not just about methods. Need to teach better fundamentals. The math - 'for all', etc. Lectures and labs. Lectures give theory. Labs teach why there's theory.

Questions i might have asked if i'd been there

  • So if the documentation is really good, is the code write-only?
  • What does Parnas think of use cases? Aren't they requirements?
  • Many ideas (like using documentation to show who's to blame) are wrong because the idea itself is shallow.
