An Open Access Peon

08 June 2006

Reinventing the Wheel and Other Exercises

So, it looks like the re-write of Celestial is nearing its conclusion. As it was, Celestial was getting to be a pain to maintain, as well as being particularly user-unfriendly. Now that OAI set membership has been separated into per-repository tables, set-based selection is much faster (at least until a single repository ends up getting stupidly big). There's now a proper interface for managing subscriptions - the ability to register for reports on Celestial's harvesting.

Internally I've adopted a more modular approach to the web interface, with each section in its own (lightly wrapped) .pm file. This has made it surprisingly easy (and neat) to add additional outputs. All of the existing functionality is there (the OAI interface, ListFriends, repository listing and editing), but it now all goes through a common Apache interface.

So, stuff I've learnt in this exercise (NB this is under Red Hat Enterprise Linux 4):

  1. Apache::RequestIO has to be loaded to enable $r->read(), otherwise CGI fails on HTTP POST (I had similar problems with hashes, which required APR::Table). The way mod_perl scatters methods across modules that each have to be loaded explicitly is pretty infuriating.
  2. There are some interesting, subtle issues in XML::LibXML when trying to generate SAX events from a sub-tree. Basically, the thing that SAX events are generated from has to be the root node, which means that if you want to include a DOM fragment in another structure that generates SAX events, the subtree has to be extracted and set as the root node of another DOM. So far, so annoying, but another bug inside LibXML caused me a headache. It seems LibXML segfaults if you try to set a subtree of a document as that document's documentElement (presumably because it free()s the old root element, clobbering the subtree you just set as the documentElement). If you're wondering why you would ever want to do this, well, my OAI library is based on sticking DOM fragments into a Perl OO structure that outputs by generating SAX events.
  3. The HTTP connector classes across browsers behave the same, but are called different things and set self to different things in the callback. Why does this matter? Well, the different names can be handled using JavaScript voodoo, but if you want to open up multiple HTTP connectors, getting a handle to the particular connector that triggered a callback is impossible (with the exception of Opera, which apparently sets self to be the connector object). Instead, for my Celestial AJAX experimentation, I had to store each connector in an array, then interrogate each one in turn to spot the one that was in a ready state. More regrettable global fudges to get around stateless callbacks.
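
For the record, the array fudge from point 3 looks roughly like the sketch below. This is not the actual Celestial code: FakeConnector is a made-up stand-in for the browser's connector object so that the example runs outside a browser, and openConnector, onStateChange, connectors and results are names invented for illustration. The only thing borrowed from the real connectors is the convention that a readyState of 4 means the request has finished.

```javascript
// Hypothetical stand-in for a browser HTTP connector, so this runs anywhere.
// It fires its callback with no argument and no useful `this`, which is the
// stateless-callback problem described above.
function FakeConnector(body) {
  this.readyState = 0;
  this.responseText = null;
  this.onreadystatechange = null;
  this._body = body;
}
FakeConnector.prototype.complete = function () {
  this.readyState = 4;             // 4 = request finished
  this.responseText = this._body;
  var callback = this.onreadystatechange;
  if (callback) callback();        // deliberately detached from the connector
};

var connectors = [];   // every connector still in flight (the global fudge)
var results = [];      // responses collected so far

// Shared callback: it cannot tell which connector fired, so it checks them all.
function onStateChange() {
  for (var i = connectors.length - 1; i >= 0; i--) {
    if (connectors[i].readyState === 4) {
      results.push(connectors[i].responseText);
      connectors.splice(i, 1);     // remove it so it is only handled once
    }
  }
}

// Create a connector, hook up the shared callback and remember it.
function openConnector(body) {
  var c = new FakeConnector(body);
  c.onreadystatechange = onStateChange;
  connectors.push(c);
  return c;
}

// Two requests in flight, finishing out of order.
var a = openConnector("first");
var b = openConnector("second");
b.complete();
a.complete();
```

A likely tidier alternative is to give each connector its own callback that closes over the connector variable, which would avoid the global array altogether.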

This is the first post to this blog - if you're reading it, blimey. Essentially this will be my effort to document problems I've grappled with and the general grind of working on the tools I've developed and now support (Citebase, Celestial and ROAR). This blog is aimed at myself - as I'm useless at keeping a lab-book and can't find what I want in them anyway - but if it helps you, that's great.
