Here are a couple of remarks:

- PostgreSQL contributes a lot to the system load  :-(  .  When PostgreSQL decides to run a checkpoint, the load shoots up above the target load (the thing I'm doing has adaptive load control), which evidently causes the indexing process to slow down.  Naturally, this is exactly the expected behavior, except that when the target load is 1 or 2 or thereabouts, PostgreSQL will drag the load up to 4 or 5 anyway, which is not good for interactive performance.
  
- Disk space efficiency for PostgreSQL-based storage is not the best either  :-(  : an object takes anywhere from 2KB (even for 1-byte files) to the size of the object + 2KB.  The database seems to grow to "the size it chooses to be", even after VACUUMing deleted objects.  I cannot find a strong correlation between count(stored objects) and `du -1 /var/lib/pgsql/data`.

- Plus, I've found no way to index TEXT columns with more than 8Ki characters.  :-(
 
I only mention this because I've heard you're using PostgreSQL.  I'd strongly suggest you look into a database with MySQL-like features such as full-text indexing (I've read about MySQL's licensing problems); PostgreSQL appears not to have them.  PG is fast, though: over here, storing an object and its attributes (in heavily indexed columns) takes no more than 0.02 sec, including transaction commits.
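
For reference, the adaptive load control mentioned in the first remark boils down to something like the sketch below.  It is a simplification and not the code from the service: TARGET_LOAD, MAX_DELAY, throttle_delay() and index_files() are names made up for illustration, and it is written for current Python.

```python
import os
import time

TARGET_LOAD = 1.0   # hypothetical target for the 1-minute load average
MAX_DELAY = 5.0     # hypothetical upper bound on the pause, in seconds


def throttle_delay(load, target=TARGET_LOAD, max_delay=MAX_DELAY):
    """Return how many seconds to pause before indexing the next file."""
    if load <= target:
        return 0.0
    # Pause in proportion to how far the load overshoots the target.
    return min(max_delay, load - target)


def index_files(paths, index_one):
    """Index each path, backing off whenever the system is too busy."""
    for path in paths:
        load = os.getloadavg()[0]   # 1-minute load average (UNIX only)
        time.sleep(throttle_delay(load))
        index_one(path)
```

With a scheme like this the indexer can only get out of the way; if another process (a checkpoint, say) holds the load at 4 or 5, the indexer ends up pausing the maximum time on every file and the load still stays above the target.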

- The Qt bindings leak   :-(  .  This code can very quickly fill up all available virtual memory:
-------------------
	import qt

	while True:
		a = qt.QString("zeta")
------------------
put that in a file called "test", open a Konsole window in the file's folder, and run
...]$ ulimit -v 100000 # just to be on the safe side, if you have less than half a gig of ram, make the number a bit smaller
...]$ python test # to start 

Over here, that code segfaults.  Other variations of the same (e.g. with "print") just fill up all assigned RAM until MemoryError is raised, or (without ulimit) the system locks up hard until the OOM killer kicks in.  And, for some reason I still cannot figure out, after running that code several times without the ulimit, the kernel leaked GBs of memory as well and I had to reboot.  This is abnormal behavior - you might not know it if you haven't used Python before, but when "a" is assigned a new value, its former referent should be deleted automatically by reference counting.  If you know any PyQt developers, please alert them to this issue.
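
To show what the expected behavior looks like, here is a small sketch using a plain Python class in place of qt.QString (Thing and the "freed" list are just illustration names).  In CPython, rebinding "a" drops the last reference to the first object, and it is destroyed right away:

```python
import weakref


class Thing:
    """Stand-in for any ordinary object."""


freed = []

a = Thing()
# Ask to be notified when the first object is actually destroyed.
weakref.finalize(a, freed.append, "first")

a = Thing()   # rebinding "a" drops the last reference to the first object

print(freed)
```

If the bindings behaved like this, the loop above would run forever in constant memory.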

Well, back on topic  :-)  .  I've just redesigned the metadata service's query system to accept multiple "infospaces": the Filesystem infospace is the one I'm currently working on.  The metadata service's index interface receives queries, determines which infospaces they refer to, and relays the query to each infospace querier; each querier returns its own results, and finally the index interface collates them.   I'm planning to develop "Mail", "Web cache" and "Bookmarks" infospaces as well, which of course would require a separate infospace for each application.  In this area, the idea is to make it flexible and with room to grow for future needs, while keeping it simple and ending up with at least the feature set of Beagle and company.  There's no NL querying yet, but that's not required for a KFind- or Best-type application.
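
Roughly, the relay-and-collate flow looks like the sketch below.  The class and method names (Infospace, IndexInterface, query) are placeholders for illustration and not the ones in the attached release, and the "querier" here is just a substring match over an in-memory list.

```python
class Infospace:
    """A toy infospace querier backed by a list of strings."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def query(self, term):
        # A real querier would consult its own index here.
        return [r for r in self.records if term in r]


class IndexInterface:
    """Receives queries, relays them to infospaces, collates the results."""

    def __init__(self, infospaces):
        self.infospaces = {space.name: space for space in infospaces}

    def query(self, term, spaces=None):
        # With no explicit selection, every registered infospace is asked.
        names = spaces if spaces is not None else list(self.infospaces)
        results = []
        for name in names:
            for hit in self.infospaces[name].query(term):
                results.append((name, hit))
        return results


index = IndexInterface([
    Infospace("filesystem", ["/home/me/notes.txt", "/etc/passwd"]),
    Infospace("bookmarks", ["http://example.org/notes"]),
])
print(index.query("notes"))
print(index.query("notes", spaces=["filesystem"]))
```

Adding a new infospace then only means registering another object with a query() method; the index interface itself does not change.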

Oh, I forgot: I've designed the metadata service as a system-level daemon, to avoid duplication of information and save resources.  This should also allow it to surpass Beagle's utility, and permit plugging the metadata service into custom document management systems, CMSs or workflow-type applications, e.g. MyDMS or Mambo.
  
(Frankly, never liked the fact that Best uses D-BUS to query the store.  Couldn't find any docs on how to plug independent components or consume Beagle's API/protocols.  Over here, we use XML-RPC over a UNIX socket, with a simple, defined API, which allows the system to determine which user ID is knocking on the XML-RPC port, and provide different levels of experience according to the user.  There's also a TCP XML-RPC server - but that's a mouthful for an acronym so I don't publicize it ;-D  )
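
In case you wonder how a server can tell which user is on the other end of a UNIX socket: on Linux, one way is the SO_PEERCRED socket option.  The sketch below is only an illustration of that mechanism (it is Linux-specific, peer_credentials() is a made-up helper, and I'm using a socketpair so it runs standalone instead of a real XML-RPC server):

```python
import os
import socket
import struct


def peer_credentials(conn):
    """Return (pid, uid, gid) of the process at the other end of a UNIX socket."""
    fmt = "3i"   # struct ucred on Linux: pid, uid, gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# Both ends live in this process, so the peer is ourselves.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(server_end)
print(pid, uid, gid)
server_end.close()
client_end.close()
```

The kernel fills in these values, so the client cannot lie about its user ID; that is what makes per-user behavior possible over a local socket but not over plain TCP.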

Right now, information access security is of little concern: later on, I'll refactor the code to filter out results users aren't allowed to see, in an extensible fashion (I can see the Filesystem infospace restricting access to files by using the standard access() UNIX call, e.g.).  I'm not really concerned with traditional exploits, seeing that they can hardly affect Python - nevertheless, I'm aware that, even with memory management and all the advanced facilities in Python, there might be a chance for an exploit.
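
The access()-based filtering could look something like this.  It is only a sketch of the idea, and visible_results() is a made-up name:

```python
import os


def visible_results(paths):
    """Keep only the paths the calling process is allowed to read."""
    # os.access() wraps the access() call; it returns False for paths
    # that do not exist or that cannot be read.
    return [p for p in paths if os.access(p, os.R_OK)]
```

One caveat: access() checks permissions against the real UID/GID of the calling process, so a daemon running as root would have to perform the check on behalf of the querying user (for instance from a helper process that has switched to that user's ID), otherwise everything passes.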

Now I'm going to plug the old Zope code back in (having found out the memory issues aren't Zope's fault) and start using the ZODB again.  The ZODB has amazing features: multiple catalogs with different types of indexes, multilevel transactions/undo, but the most practical feature is *persistence*: developers do not need to write any code to actually store things in the database, they just assign to a variable and presto, the data is stored on disk.

- The metadata service has been stable for the past 8 hours, in which it has indexed over 150.000 files (sadly, only extracting basic filesystem attributes, because I cannot afford to use KDE's KFile until I have pinned down the leak issues).

I'm attaching a preview release.  You might get some useful ideas from it.  Check the UML artifacts too. They might not be complete or error-free (I slept through the introduction to OOP class, which was not really OOP but rather UML). 
 
If you feel like running the service and seeing how it behaves:
- the script name is "metadata-service", located in the src/ directory.
- create a database named "rudd-o" on your PostgreSQL server, and grant everyone complete access to it.
- cd into the src directory, and run the script as "./metadata-service".  It will start and recreate the DB structure automatically, after which it will start indexing the / volume (and only that).
- a file called "metadata-service.log" will be created in the same directory as the script.
- Ctrl+C should terminate the service cleanly. SIGTERM as well.
- note that there is no search support in this release, because the PostgreSQL-based index is only a proof of concept, and it does not parse search queries at all.
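
About the clean shutdown on Ctrl+C and SIGTERM: one common way to get that in Python is to turn SIGTERM into the same exception Ctrl+C raises, so both go through one cleanup path.  This is a sketch of that pattern, not necessarily how the script does it; run(), service_loop and cleanup are placeholder names.

```python
import signal


def _on_sigterm(signum, frame):
    # Treat SIGTERM like Ctrl+C so both share one cleanup routine.
    raise KeyboardInterrupt


def run(service_loop, cleanup):
    """Run service_loop() until it returns or the process is told to stop."""
    # Signal handlers can only be installed from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        service_loop()
    except KeyboardInterrupt:
        pass          # normal shutdown request, nothing to report
    finally:
        cleanup()     # close the log, commit or roll back, etc.
```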

I'm planning on dropping RDBMS support.  Future releases will use ZODB (which is needed anyway, to support all planned features).