documenting language bindings

I had lunch with Jao before he left for Zurich, throwing around ideas over copious wine. He made the observation that a lack of good documentation is a limit to software's potential impact, with particular reference to the pile of code that I've written or maintained. Point taken! With that thought in mind, I've been trying to focus more on documentation. In this blagpost I will summarize work on documenting language bindings.

haskell

Duncan Coutts, the gtk2hs maintainer, called me out at FOSDEM earlier this year. I was presenting about Guile-GNOME, and sheepishly mentioned that the entire project was undocumented. Duncan was kind enough to point me to how gtk2hs does things.

Their bindings are automatically generated at the beginning, then tweaked and maintained by hand. The Haskell folks have developed a documentation system, Haddock, that involves specially formatted comments in the code, similar to Javadoc or gtk-doc. When the bindings are first autogenerated, their generator produces the documentation as well.

The documentation actually comes from the docbook files produced by gtk-doc; that is to say, from the upstream C documentation. They run some basic search and replace operations on the output so that it is more "Haskelly". Seems to be a reasonable way to bootstrap the documentation, and the HTML output certainly looks good.
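To make the search-and-replace idea concrete, here is a minimal sketch in Python. It is not gtk2hs's actual code: the gtk_ prefix stripping and the camelCase rule are assumptions about what "Haskelly" might mean, and both function names are made up for illustration.

```python
import re

def c_to_haskellish(name, prefix="gtk_"):
    """Turn a C function name such as gtk_window_set_title into windowSetTitle."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

# Matches identifiers like gtk_window_set_title, optionally followed by "()".
C_FUNCTION = re.compile(r"\bgtk_[a-z0-9_]+\b(\(\))?")

def haskellify(text):
    """Rewrite every gtk_* function mention in a documentation string."""
    return C_FUNCTION.sub(
        lambda m: c_to_haskellish(m.group(0).rstrip("()")), text
    )

print(haskellify("Call gtk_window_set_title() to set the title."))
```

A real pass would also need to handle type names, constants and cross-reference markup, which is where most of the munging effort presumably goes.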

gtkmm

I asked Murray what gtkmm does, and it seems that they do something similar. The difference is that they combine the C documentation from the docbook files into one XML API-cum-documentation file, then generate their documentation from that.

Also, presumably since C++ is similar enough to C, gtkmm bindings regenerate their documentation all the time, customizing the documentation for only about 5% of the functions. Customizations are maintained in a separate overrides file. The output documentation is made with Doxygen, which looks OK but not as nice as Haddock.
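The overrides idea can be sketched roughly as follows. This is only an illustration of the merge step, not gtkmm's actual file format or tooling; the function name apply_overrides and the sample documentation strings are invented.

```python
def apply_overrides(generated, overrides):
    """Return docs for every generated function, preferring hand-written text."""
    merged = dict(generated)
    for name, text in overrides.items():
        # An override for a function that no longer exists is probably stale.
        if name not in generated:
            raise KeyError(f"override for unknown function: {name}")
        merged[name] = text
    return merged

# Regenerated from upstream on every build (sample strings are made up).
generated = {
    "Window::set_title": "Sets the title of the window.",
    "Window::get_title": "Retrieves the title of the window.",
}
# Maintained by hand; only the few functions that need custom wording.
overrides = {
    "Window::get_title": "Returns the window title as a Glib::ustring.",
}

docs = apply_overrides(generated, overrides)
print(docs["Window::get_title"])
```

Keeping the hand-written text in its own file means regeneration never clobbers it, and failing on unknown names flags overrides that have gone stale.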

java-gnome

A bit tipsily last night I prompted Andrew Cowie to opine about the same topic. Java-gnome people are apparently awash with contributors, as they decided at some point that they would hand-write all of their language bindings. They do the same with their documentation -- all written by hand, and processed with javadoc. The HTML documentation looks OK, better than Doxygen, but it seems that the wrapper itself is incomplete.

pygtk

As far as I know, the most excellent pygtk documentation is written completely by hand.

guile-gnome

Guile-GNOME itself is still undocumented, as whichever way I might go, it will be a lot of work. It pays to invest a few weeks figuring out the right way to go.

As a test case, I looked at how difficult it would be to automatically generate documentation for Guile-Cairo. I cannot write all documentation by hand; it is too much. Instead I looked at reusing the technique from gtkmm/gtk2hs, munging the docbook generated by gtk-doc.
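The munging step might look something like the sketch below, written in Python for brevity rather than Scheme. The XML fragment is a hand-written, much simplified stand-in for what gtk-doc emits, and the underscore-to-hyphen rule is an assumption about how the Scheme names are derived; the real Guile-Cairo code is not shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for gtk-doc's docbook output (an assumption).
SAMPLE = """
<refsect1>
  <refsect2>
    <title>cairo_set_line_width ()</title>
    <para>Sets the current line width within the
      cairo context.</para>
  </refsect2>
  <refsect2>
    <title>cairo_stroke ()</title>
    <para>Strokes the current path.</para>
  </refsect2>
</refsect1>
"""

def scheme_name(c_name):
    """cairo_set_line_width -> cairo-set-line-width"""
    return c_name.replace("_", "-")

def extract_docs(xml_text):
    """Return {scheme-name: description} for every refsect2 in the fragment."""
    root = ET.fromstring(xml_text.strip())
    docs = {}
    for sect in root.iter("refsect2"):
        c_name = sect.findtext("title", default="").replace("()", "").strip()
        paragraphs = [
            " ".join("".join(p.itertext()).split())  # collapse whitespace
            for p in sect.findall("para")
        ]
        docs[scheme_name(c_name)] = "\n\n".join(paragraphs)
    return docs

for name, text in extract_docs(SAMPLE).items():
    print(name, "::", text)
```

Real gtk-doc output nests parameter tables, cross-references and code listings inside each section, so a production version needs far more cases than this.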

There are three documentation formats that are equally important to me, and one that is less important.

The first one is HTML, so that someone browsing the project's web page can see the status of the binding.

The second one is "online" documentation, so that when I am at the Guile listener I can type (help cairo-get-extents) and get good documentation. (This is also the case when I hack in Emacs with Guile-Debugging, and I type C-h g. More on that later.)

Thirdly we have local searchable documentation, either via Info or via devhelp. Lastly, and less importantly, we have hardcopy output as PDF.

For me and for Scheme users, these requirements point to texinfo as the intermediate format. I can generate good-looking PDF output with indexes, HTML output, and Info, which is actually quite OK. In addition I can write out a text representation of the texinfo into a docstring file, which allows the documentation to be available at runtime without incurring memory use penalties.
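A rough Python sketch of both halves follows: rendering an entry as texinfo for the manual, and writing plain-text docstrings to a file that is read on demand. The file layout (entries back to back, with a byte-offset index) is an assumption made for illustration and is not Guile's actual docstring file format; all function names are hypothetical.

```python
import os
import tempfile

def texinfo_escape(text):
    """Escape the three characters that are special in texinfo."""
    return text.replace("@", "@@").replace("{", "@{").replace("}", "@}")

def to_deffn(name, text):
    """Render one manual entry (argument lists omitted for brevity)."""
    return f"@deffn Function {name}\n{texinfo_escape(text)}\n@end deffn\n"

def write_docstrings(path, docs):
    """Write all docstrings to one file; return {name: (offset, length)} in bytes."""
    index = {}
    with open(path, "wb") as f:
        for name, text in docs.items():
            data = text.encode("utf-8")
            index[name] = (f.tell(), len(data))
            f.write(data)
            f.write(b"\n")
    return index

def lookup(path, index, name):
    """Read a single docstring from disk, leaving the rest unloaded."""
    offset, length = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length).decode("utf-8")

docs = {
    "cairo-set-line-width": "Sets the current line width within the cairo context.",
    "cairo-stroke": "Strokes the current path.",
}
path = os.path.join(tempfile.mkdtemp(), "docstrings.txt")
index = write_docstrings(path, docs)
print(to_deffn("cairo-stroke", docs["cairo-stroke"]))
print(lookup(path, index, "cairo-stroke"))
```

Only the small index stays in memory; the documentation text itself is read from disk when someone actually asks for help on a function.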

I don't have the documentation-generation code nicely packaged yet, but I'm pushing it into other projects. I think I need to let it sit for a while, to see if I actually want to undergo the pain of documenting the bindings for the GNOME stack, given that even for Guile-Cairo some work remains. Anyway, that's the hack of the last few weeks. If you are a bindings author, or less likely, are a would-be Scheme hacker, and are at GUADEC, pull me aside and we can chat about such things. You will know me by the extra-large chops.

3 responses

  1. Duncan Coutts says:

    Thanks for bringing this up again Andy.

    Seems to me that the ideal thing would be for gtk-doc to be able to produce some nice xml/lisp format output (not docbook! - it's almost a write-only format, it doesn't preserve nearly enough semantic information).

    So much like we have these standardised .def files which contain the gtk api info, we should have something similar for the gtk-doc output. Then each project could take that and translate it into whatever format they use, be it markup in source code comments, docbook, texinfo or whatever.

    The worst part of the current system is munging the gtk-doc docbook output files to reconstruct the information into a sane format. It's not quite as bad as html screen scraping, but it's pretty bad.

  2. Peter Russell says:

    Very interesting post. You're quite right that the PyGTK documentation is excellent. I don't think I've ever seen API docs so nicely and neatly done.

  3. brought to you by torres viña sol -- andy wingo says:

    [...] I used some techniques I wrote about previously to generate schemey texinfo from upstream’s docbook for C, with a twist: when generating function docs, we load up the wrapset metadata, and use that to determine which functions are actually in the wrapset, what their arguments are, and if they have a generic function associated with them. [...]