22 May 2004

the return of advogato

Iyaloo! (An expression of delight in Oshiwambo.) Speaking of which, I just spent the whole week trying to finish a book on Oshiwambo. The next step is to see if I can get it published. That would be pretty nifty -- my first book.

a python snuck up on me

I wanted to kill offlineimap, which wasn't responding to Ctrl-C, so I went to another console and typed "killall python". Ay, there went my blog entry, never to be seen again! From now on, I'll always think of python as part of my desktop.

profiling

I had to do some profiling recently, and figured I'd blog about it. I wrote this a couple of weeks ago, and in the meantime other people have started blogging about profiling. Funny how free software has its own zeitgeist.

I started off investigating gprof. I had this idea in my head that profiling requires recompilation, so I checked the info pages and recompiled a whole stack of libraries with -g -pg -fprofile-arcs. Then I ran the program, ran gprof on the output and... bloody worthless. All it told me was that main() takes 100% of the time.

Turns out gprof can't handle shared libraries. What is this, 1994?

I found a post about qprof, and it sounded like a good idea. It uses the LD_PRELOAD hack to set up statistical profiling, so no recompilation is needed. It worked out pretty well.
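For anyone who hasn't met statistical profiling: a timer interrupts the program periodically and you note where it was each time. Below is a minimal Python analogue of that idea, not qprof itself. qprof samples machine-level pcs in a C program from a preloaded library; this sketch instead uses Python's `signal.setitimer` with `ITIMER_PROF` and a `SIGPROF` handler to record the name of the Python function that was running. `sample_flat`, `spin` and `work` are made-up names. It assumes a Unix system and that it runs in the main thread, since Python only delivers signals there.

```python
import signal
from collections import Counter


def sample_flat(func, interval=0.001):
    """Run func() under a CPU-time timer and return a flat profile:
    how many ticks landed in each Python function."""
    counts = Counter()

    def on_tick(signum, frame):
        # `frame` is the Python frame that was executing when the signal
        # was handled, i.e. the top of the stack for this tick.
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    # ITIMER_PROF counts CPU time used by the process and raises SIGPROF.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)  # disarm the timer first
        signal.signal(signal.SIGPROF,
                      previous if previous is not None else signal.SIG_DFL)
    return counts


# Hypothetical workload: nearly all CPU time is spent inside spin().
def spin():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def work():
    return spin()


if __name__ == "__main__":
    for name, ticks in sample_flat(work).most_common():
        print(f"{ticks:5d}  {name}")
```

Because CPython only runs signal handlers between bytecodes, the recorded frame is approximately, not exactly, where the CPU was; that fuzziness comes with doing this in Python rather than at the pc level.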

By default, qprof only does flat profiles (counting the time spent in the function on top of the stack). However, if you build it with libunwind, also from HP, it will do a version of call profiling. You have to fiddle with the build, copying things into place by hand and tweaking the LD_PRELOAD scripts, but it does work.
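To make the flat vs. call distinction concrete, here is a small Python sketch that works on recorded stacks rather than raw pcs. It assumes the pcs have already been mapped to function names, and the sample stacks are invented: each inner list is one timer tick, outermost frame first. The flat profile credits only the top frame; the per-frame count credits every function on the stack, which approximates what a call-profiling mode gives you once pcs are resolved to functions.

```python
from collections import Counter

# Invented samples: one stack per tick, outermost frame first.
SAMPLES = [
    ["main", "render", "draw_text"],
    ["main", "render", "draw_text"],
    ["main", "render"],
    ["main", "parse"],
]


def flat_profile(samples):
    """Credit only the function on top of the stack at each tick."""
    return Counter(stack[-1] for stack in samples)


def per_frame_profile(samples):
    """Credit every function that appears on the stack at each tick."""
    return Counter(fn for stack in samples for fn in stack)


def as_percent(counts, ticks):
    """Express tick counts as a percentage of all ticks."""
    return {fn: 100.0 * n / ticks for fn, n in counts.items()}


if __name__ == "__main__":
    ticks = len(SAMPLES)
    print("flat:     ", as_percent(flat_profile(SAMPLES), ticks))
    print("per-frame:", as_percent(per_frame_profile(SAMPLES), ticks))
```

The flat percentages always sum to 100; the per-frame ones don't, because a single tick credits several functions.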

qprof keeps an internal buffer of program counter (pc) locations. In the flat case, it just records the pc in a slot and moves on. In the call case, it records the pc of every frame on the stack and moves on. At the end it counts up the occurrences of each pc, runs addr2line on them, and prints the results. Because the frames from one tick aren't kept together, though, you can't tell where a function was called from. A little hack to record which pc entries belong to a single tick could fix that; I'll hack it up if I don't lose interest. Then you could get nice tracebacks like valgrind's.
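Keeping each tick's pcs together is what makes call chains recoverable. Here is a minimal Python sketch of that idea, again assuming pcs have already been resolved to (invented) function names: storing each tick as a tuple lets you count whole call paths and ask who called a given function. `call_paths` and `callers_of` are hypothetical helpers, not part of qprof.

```python
from collections import Counter

# Invented samples: one stack per tick, outermost frame first.
SAMPLES = [
    ["main", "render", "draw_text"],
    ["main", "status_bar", "draw_text"],
    ["main", "render", "draw_text"],
    ["main", "parse"],
]


def call_paths(samples):
    """Count identical call chains (needs each tick's frames kept together)."""
    return Counter(tuple(stack) for stack in samples)


def callers_of(samples, target):
    """Count the caller of `target` at each place it appears on a stack."""
    callers = Counter()
    for stack in samples:
        for caller, callee in zip(stack, stack[1:]):
            if callee == target:
                callers[caller] += 1
    return callers


if __name__ == "__main__":
    # Print valgrind-ish call paths, most frequent first.
    for path, n in call_paths(SAMPLES).most_common():
        print(f"{n:3d}  " + " -> ".join(path))
    print(callers_of(SAMPLES, "draw_text"))
```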

A further annoyance is that recursive functions are counted once per frame. For instance, qprof reports that scm_ceval (from the guile evaluator) takes 3246% of the time. It took me a little while to figure that out.
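Percentages over 100% fall out of counting every frame: a tick with three nested calls to the same function credits it three times. The small Python sketch below uses invented stacks, with `ceval` standing in for a recursive evaluator like scm_ceval, and compares that naive count with a common fix: crediting each function at most once per tick.

```python
from collections import Counter

# Invented samples from a recursive evaluator; outermost frame first.
SAMPLES = [
    ["main", "ceval", "ceval", "ceval"],
    ["main", "ceval", "ceval", "apply"],
    ["main", "gc"],
]


def naive_inclusive(samples):
    """Credit every frame, so recursion is counted once per level."""
    return Counter(fn for stack in samples for fn in stack)


def inclusive(samples):
    """Credit each function at most once per tick, so nothing exceeds 100%."""
    return Counter(fn for stack in samples for fn in set(stack))


def as_percent(counts, ticks):
    """Express tick counts as a percentage of all ticks."""
    return {fn: round(100.0 * n / ticks, 1) for fn, n in counts.items()}


if __name__ == "__main__":
    ticks = len(SAMPLES)
    print("naive:    ", as_percent(naive_inclusive(SAMPLES), ticks))
    print("inclusive:", as_percent(inclusive(SAMPLES), ticks))
```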

In summary, gprof blows but has great documentation, except that they don't mention that the program blows. qprof is really useful and easy, but takes some fiddling and really needs some more loving. Also, libunwind seems to be a little buggy on x86-32. (Its main target is x86-64.)

People happily use valgrind, because it's easy. No one profiles, because traditionally you have to recompile. qprof (and tools like it, oprofile for example) will hopefully change that.

guile-gnome

(gnome gtk) loads 7 times faster than it used to. 2 seconds is still not good, but it's good enough that maybe I won't notice that gnome-blog loads up in 1.2 seconds. jamesh did a damn good job with that library.

family trees

To get that performance, I delayed the creation of scheme classes and methods until they are first used in the source, because any given program will only ever use a small part of the gtk api. Incidentally, gtk2-perl does the same thing. I think that says something about both languages: they can be elegant (although I think that's harder with perl ;) and they can get dirty. By dirty, I mean really low level. For instance, I can define an allocate-instance method on a class such that the instance it returns doesn't actually belong to that class. It's useful, but damn, it's ugly.
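The lazy-creation trick isn't specific to Scheme. Here is a rough Python analogue, not the guile-gnome code: a namespace object whose `__getattr__` builds a class the first time it is asked for and caches it, so classes a program never touches are never created. `LazyNamespace`, `make_button`, `make_window`, `Widget` and `Impostor` are hypothetical names. The last class shows Python's nearest cousin of the allocate-instance trick: a `__new__` that returns an object of a different class, in which case Python skips `__init__` and the result is not an instance of the class you called.

```python
class LazyNamespace:
    """Build classes on first access and cache them on the instance."""

    def __init__(self, factories):
        # name -> zero-argument function that creates the class
        self._factories = dict(factories)
        self.built = []  # names built so far, in order (for illustration)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. not built yet.
        try:
            factory = self._factories.pop(name)
        except KeyError:
            raise AttributeError(name) from None
        cls = factory()
        setattr(self, name, cls)  # later lookups find it directly
        self.built.append(name)
        return cls


# Hypothetical factories standing in for expensive wrapper generation.
def make_button():
    return type("Button", (), {"click": lambda self: "clicked"})


def make_window():
    return type("Window", (), {})


gtk = LazyNamespace({"Button": make_button, "Window": make_window})


# The ugly trick: __new__ hands back an object of some other class.
class Widget:
    pass


class Impostor:
    def __new__(cls):
        return Widget()  # not an Impostor, so Impostor.__init__ is skipped


if __name__ == "__main__":
    button = gtk.Button()
    print(button.click(), gtk.built)  # Window has not been built
    print(type(Impostor()).__name__)
```

For whole modules, Python 3.7+ also supports a module-level `__getattr__` (PEP 562), which allows the same lazy pattern without a wrapper object.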

postscript

I really enjoy coding while tipsy, but I hate spilling beer on my keyboard. Something's gotta give. I think it's the position of my glass.
