Linux and free software: front page

Get to know the wonderful world that Linux and free software enthusiasts have created!

How to solve network data corruption with Realtek cards

Posted by Rudd-O at Jan 03, 2009 03:05 AM |
Filed under: Fedora Realtek bugs Linux tips

Do you have a Realtek card driven by the r8169 driver, and are you experiencing SSH, SSL or rsync errors associated with the message "Disconnected: Corrupted MAC on input"? Do you download large files only to find them corrupted? Here's the solution.

Have a newish NVIDIA GeForce and experiencing freezes with desktop effects enabled?

After fixing your video card issues, you can enjoy the full range of desktop effects available with Compiz and KWin in KDE 4.1 and above.

How to make your video or music player play on another sound card in Fedora (and Ubuntu)

Posted by Rudd-O at Dec 15, 2008 09:30 PM |

The latest iterations of popular Linux distros Fedora and Ubuntu include PulseAudio. Make sure you learn about it!

How to customize your full-text RSS feeds in Plone, and discourage spam content harvesters at the same time

Posted by Rudd-O at Dec 12, 2008 08:56 AM |

Earlier on, we discussed how to enable full-text RSS feeds. Now we'll discuss how to improve on that by preventing your full-text feeds from being harvested and posted on blogspammers' sites. The nice thing about this trick is that you can use it to include any sort of text on your RSS feeds, while leaving your site content completely unaffected.

A hack to enable full-text RSS feeds in Plone

Posted by Rudd-O at Dec 12, 2008 06:10 AM |

You can let people syndicate your Plone site, searches or collections in full. Here's how.

How to really speed up Web serving (and Plone) for your iPhone readers

Web servers (especially Plone, by default) work wonders in combination with an HTTP accelerator such as Varnish or Squid. But your iPhone readers are out of luck because of a grave bug in MobileSafari -- Plone sites are especially slow, like molasses, on the iPhone. Don't worry, here's a trick that will solve it.

Using Varnish to capitalize on image hotlinkers and retain referred visitors

Hotlinkers suck your bandwidth dry, but referrals are valuable to you. Learn how to use Varnish to stop image hotlinkers while at the same time welcoming referred visitors and gaining a big performance edge.

How to move a Plone or Zope site (instance) to another folder or computer safely

It's not just a matter of moving the files... but it's not hard either.

NoMachine NX fails on Fedora 9 -- here's the solution

Posted by Rudd-O at Jul 20, 2008 02:27 PM |

I am a heavy user of NoMachine NX. After my move to Fedora 9, it fails with the error "The connection with the remote server was shut down. Please check the state of your network connection." The solution is simple: install the package xorg-x11-fonts-misc. Here's why that works:

The NX server requires a font named fixed from the X core font subsystem when starting up. Fedora 9 no longer has the X core font subsystem by default -- they solved it by compiling fixed and cursor right into the X server. Therefore, the X fonts package is no longer installed by default.

Without said fonts, the NX server fails. Once you've installed them, NX works again.
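
The fix above can be sketched as a tiny helper. This is only a sketch: the function name fonts_fix is made up for illustration, and it assumes a Fedora box where rpm and yum are the package tools. It prints the install command when the package is missing, and prints nothing when it is already there.

```shell
# Sketch only: decide whether the core X fonts package still needs installing.
# "$@" is any command that succeeds when the package is present,
# for example `rpm -q xorg-x11-fonts-misc` on Fedora.
# fonts_fix is a hypothetical helper name, not part of NX or Fedora.
fonts_fix() {
    if "$@" >/dev/null 2>&1; then
        return 0                              # already installed: print nothing
    fi
    echo "yum install xorg-x11-fonts-misc"    # command to run as root
}

# Usage on the affected machine:
#   fonts_fix rpm -q xorg-x11-fonts-misc
```

Passing the query command as arguments keeps the helper free of side effects; it only tells you what to run.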

You're welcome.

Big win for ZFS on Linux today

Posted by Rudd-O at Jul 16, 2008 05:05 PM |

ZFS through FUSE is now much faster on Linux thanks to a patch Ricardo Correia developed and I improved, against the latest code in the ZFS-FUSE repository. Further big wins are expected soon.

Heads up: 64 bits are needed for high performance -- I'm still running 32 bits, so there's lots of low-hanging fruit to be reaped. Why 64 instead of 32 bits? Integer arithmetic. atomic_add_64, the function in the ZFS sources that accounts for the most CPU time, is over twenty instructions of 32-bit assembler guarded by mutexes. The same function in 64 bits is one instruction. One.

The updated patch is here:

diff -r 008c531499cd src/zfs-fuse/zfs_operations.c
--- a/src/zfs-fuse/zfs_operations.c     Thu Oct 30 16:45:21 2008 +0100
+++ b/src/zfs-fuse/zfs_operations.c     Sat Dec 06 22:59:49 2008 -0500
@@ -213,15 +213,22 @@

        cred_t cred;
        zfsfuse_getcred(req, &cred);
+       struct fuse_entry_param e = { 0 };

        error = VOP_LOOKUP(dvp, (char *) name, &vp, NULL, 0, NULL, &cred, NULL, NULL, NULL);
        if(error)
+       {
+               if (error == ENOENT) {
+                       /* Cache negative entries */
+                       error = 0;
+                       e.ino = 0;
+                       e.entry_timeout = 3600;
+               }
                goto out;
+       }

-       struct fuse_entry_param e = { 0 };
-
-       e.attr_timeout = 0.0;
-       e.entry_timeout = 0.0;
+       e.attr_timeout = 3600.0;
+       e.entry_timeout = 3600.0;

        if(vp == NULL)
                goto out;

This patch gives an estimated fourfold to eightfold performance improvement in syscalls involving metadata lookups like stat and lstat.
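
If you want a rough feel for what negative-entry caching buys you, one option is to time repeated lookups of a path that does not exist, before and after applying the patch. The helper below is a minimal sketch, not a benchmark suite: count_misses is a hypothetical name, and it relies on the shell's `[ -e ]` test issuing a stat-family call for each lookup.

```shell
# Sketch only: look up the same path N times and report how many lookups missed.
# count_misses is a hypothetical helper; [ -e ] performs one stat-family call
# per iteration, which is the kind of metadata lookup the patch targets.
count_misses() {
    path=$1
    n=$2
    i=0
    misses=0
    while [ "$i" -lt "$n" ]; do
        [ -e "$path" ] || misses=$((misses + 1))
        i=$((i + 1))
    done
    echo "$misses"
}

# Usage (the path is illustrative; pick one on a ZFS-FUSE mount):
#   time count_misses /usr/share/icons/does-not-exist.xpm 10000
```

Run it under `time` on the same mount with and without the patch and compare the wall-clock numbers.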

The document-centric and application-centric paradigms vs. streams

Posted by Rudd-O at Jun 24, 2008 10:17 PM |
Filed under: user interfaces KDE GNOME

I'd like to see the following ideas studied and implemented. And you're welcome to contribute to them (it's on a wiki, after all).

ZFS on Linux: my story and HOWTO you can have it too

Posted by Rudd-O at Jun 24, 2008 09:45 AM |

Have you heard about ZFS? It's a generation-defining, stable, high-performance, high-end filesystem, created by Jeff Bonwick at Sun, and ported over to Mac OS X and the BSD family. Oh, and to Linux, using the FUSE (Filesystem in Userspace) kernel abstraction. Here's my ZFS story.

I'm using Kubuntu Hardy, and my computer has two 400 GB SATA hard disks. Yes, that's all the storage I have at hand; as of three days ago, it was RAIDed using the multiple devices (md) kernel module, split in two LVM volumes: / and /home. Oh, and two same-size byte-aligned swap partitions, one on each disk, swapon'ed pri=0.

I had been salivating over the thought of using ZFS in my workstation because of several killer features:

  • The first one that comes to mind is end-to-end data integrity thanks to checksumming -- I've already had many disks go bad on me, while others corrupted my data silently (which is, believe it or not, the most insidious thing ever, because after you've noticed it, backups won't help you with that -- you've probably already papered over your backups with new, bad data).
  • The second one is compression. Together with tightly packed data, compression promises to increase performance and reduce disk utilization.
  • The third one is the advanced transactional algorithm that yields an always-consistent disk structure. Unlike log-based filesystems, ZFS does copy-on-write and ripples the changes up through the filesystem tree; before the topmost node is updated, the changes don't affect consistency; when the topmost node is updated, the disk is consistent as well. Never fsck again!

"Damn, gotta get me some of that," I thought.

Getting ZFS was actually a piece of cake: I went to the Mercurial repository for the project, selected the tip view, and downloaded a nice tarball. I then installed a couple of dependencies according to the README, and hit scons in a terminal window. Five commands were built:

  • zfs-fuse, the daemon that serves FUSE requests. The FUSE module is an odd beast: applications futzing with a FUSE-mounted filesystem talk to the kernel VFS, which talks to FUSE, which talks to the daemon backing that particular mount. This userspace-kernelspace-userspace-kernelspace-userspace overhead, you will see, is a big deal.
  • zfs and zpool, the main management commands that use IPC to talk to zfs-fuse.
  • two others that neither you nor I will care about.

A cursory inspection of such important system binaries was in order, so I ran ldd on the daemon and the commands.
zfs-fuse links to /usr/lib/libz*.so*. Not good: a chicken-and-egg problem, linking to a library in a filesystem that will not be available before zfs-fuse is running. I rebuilt it using a modified SConstruct file so it statically links that library in.
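
The ldd check is easy to automate. Here's a minimal sketch that filters ldd-style output for libraries living under /usr; the function name libs_under_usr is made up for illustration, and /sbin/zfs-fuse in the usage line is just where I would expect the daemon to end up.

```shell
# Sketch only: read ldd-style output on stdin and print every library that
# resolves to a path under /usr, i.e. one that is unavailable until /usr
# is mounted. libs_under_usr is a hypothetical helper name.
libs_under_usr() {
    # ldd lines look like: "<tab>libz.so.1 => /usr/lib/libz.so.1 (0x...)"
    awk '$2 == "=>" && $3 ~ /^\/usr\// { print $3 }'
}

# Usage:
#   ldd /sbin/zfs-fuse | libs_under_usr
```

Any output at all means the binary cannot start before /usr is available.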

I had decided that my filesystem layout would be:

  • 1 GB swap partition on each disk
  • 1 GB / filesystem, composed of two RAID1 partitions (one on each disk), formatted with ext3 (in case of catastrophe, it's nice to have something the kernel can boot without initial RAM disks)
  • 398 GB ZFS volume, where I planned to drop /usr, /home and /var

But I didn't have extra hard disks to make the switch. No problem, croupier, everything I have on red please -- and spin that wheel! I installed ZFS directly on my running system. How did I do it? Well, if you must know:

  • I offlined the second disk with mdadm.
  • I swapoff'ed its swap partition. At this point the disk is no longer busy.
  • I repartitioned the disk (if the disk is non-busy, the kernel rereads the partition table just fine).
  • From then on, I relied on the first disk alone.
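
The steps above boil down to something like the following sketch. It is a dry run on purpose: it prints the commands instead of executing them, because they are destructive. The function name offline_plan and every device name are placeholders I made up for illustration, not the ones on my machine.

```shell
# Sketch only: print (do not run) the commands that free one disk of a mirror.
# offline_plan and all device names are hypothetical placeholders.
offline_plan() {
    array=$1     # md array, e.g. /dev/md0
    member=$2    # RAID member on the second disk, e.g. /dev/sdb1
    swap=$3      # swap partition on the second disk, e.g. /dev/sdb2
    disk=$4      # the second disk itself, e.g. /dev/sdb
    echo "mdadm $array --fail $member"      # mark the member as faulty
    echo "mdadm $array --remove $member"    # detach it from the array
    echo "swapoff $swap"                    # the disk is no longer busy after this
    echo "fdisk $disk"                      # interactive repartitioning
}

# Usage:
#   offline_plan /dev/md0 /dev/sdb1 /dev/sdb2 /dev/sdb
```

Review the printed plan, then run the commands by hand as root if they match your layout.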

Yes, realtime no-boot filesystem switchover -- or at least I thought it would be that easy (I was very wrong).

Then I mkfs.ext3ed the new 1 GB root filesystem, and mkswap'ed the swap one. A couple of rsyncs later (which I scripted for consistency and repeatability), I had a new, working /. I mounted it and went in it, to remove mdadm.conf and lvm.conf lines that could prove problematic on next boot. At this point I was panicking because it was superstitiously conceivable that, after a reboot, md would want to rebuild the arrays and destroy the second disk.

I then copied the ZFS binaries into /sbin and ran the daemon. A cursory lsof inspection told me that the ZFS socket was on /etc/zfs/zfs_socket.
zpool create quickly gave me the 392 GB of disk space that were previously empty in the second disk, in which I created subvolumes, with adjusted mount points to end up under a temporary tree structure under /newfs. Curiously, after creating a subvolume, it's not mounted, but a zfs mount -a works as you probably would expect.

I enabled compression in the root volume (subvolumes inherit attributes) and started rsyncing /var, /usr and /home into each subvolume. Fast-forward the movie 32 hours to get an idea of how slow it was. It was unbelievably slow -- un-frigging-believable, with both CPUs nearly pegged and regularly hovering at 150% combined user+system. The worst part is, I was seeing disk throughput in the 2-3 MB/s range, using iostat 1 and zpool iostat 1. Keep in mind that performance (high write throughput, high responsiveness and low latency during massive reads) is marketed as a ZFS selling point -- and I don't doubt the Sun guys... on Solaris, not Linux!
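
For the record, the pool setup described above looks roughly like this. Again a dry-run sketch that only prints commands: zfs_plan is a hypothetical helper, the device name is a placeholder, and the order of the steps is my reconstruction rather than a transcript of my shell history.

```shell
# Sketch only: print (do not run) the commands that build the pool, enable
# compression on the root volume and stage the subvolumes under a temp tree.
# zfs_plan and the device argument are hypothetical placeholders.
zfs_plan() {
    pool=$1     # pool name, e.g. vault
    dev=$2      # partition handed to ZFS, e.g. /dev/sdb3
    stage=$3    # temporary mount tree, e.g. /newfs
    echo "zpool create $pool $dev"
    echo "zfs set compression=on $pool"     # subvolumes inherit this attribute
    for fs in usr home var; do
        echo "zfs create $pool/$fs"
        echo "zfs set mountpoint=$stage/$fs $pool/$fs"
    done
    echo "zfs mount -a"                     # mount whatever is not mounted yet
}

# Usage:
#   zfs_plan vault /dev/sdb3 /newfs
```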

During that lengthy process I started finding out several things that would prove crucial later on:

  • FUSE does not support mmap in the Linux kernel that my distribution uses. Many, many applications rely on that feature to work.
  • There was no initscript for ZFS. I would have to write an initscript from scratch. On Kubuntu, where initscripts are (1) being phased out and (2) completely different from those of my beloved RPM distros.

At this point I was a bit nervous, if you'll allow me to understate. But I wrote the initscript anyway:

#! /bin/sh
### BEGIN INIT INFO
# Provides:          zfs
# Required-Start:    mountall
# Required-Stop:     sendsigs
# Should-Start:
# Should-Stop:
# Default-Start:
# Default-Stop:
# Short-Description: Enable/disable the ZFS-FUSE subsystem
# Description: Control ZFS-FUSE subsystem
### END INIT INFO

PIDFILE=/var/run/zfs-fuse.pid
LOCKFILE=/var/lock/zfs/zfs_lock

. /lib/init/vars.sh

. /lib/lsb/init-functions
. /lib/init/mount-functions.sh

export PATH=/sbin:/bin
unset LANG
ulimit -v unlimited

do_start() {
	test -x /sbin/zfs-fuse || exit 0
	PID=`cat "$PIDFILE" 2> /dev/null`
	if [ "$PID" != "" ]
	then
		if kill -0 $PID 2> /dev/null
		then
			echo "ZFS-FUSE is already running"
			exit 3
		else
			# pid file is stale, we clean up shit
			log_action_begin_msg "Cleaning up stale ZFS-FUSE PID files"
			rm -f /var/run/sendsigs.omit.d/zfs-fuse "$PIDFILE"
			log_action_end_msg 0
		fi
	fi

	pre_mountall

	log_action_begin_msg "Starting ZFS-FUSE process"
	zfs-fuse -p "$PIDFILE"
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		true
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		post_mountall
		exit 3
	fi

	for a in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
	do
		# the PID file may not exist yet; stay quiet while waiting for it
		PID=`cat "$PIDFILE" 2> /dev/null`
		[ "$PID" != "" ] && break
		sleep 1
	done

	if [ "$PID" = "" ]
	then
		log_action_end_msg 1 "ZFS-FUSE did not start or create $PIDFILE"
		post_mountall
		exit 3
	else
		log_action_end_msg 0
	fi

	log_action_begin_msg "Immunizing ZFS-FUSE against OOM kills and sendsigs signals"
	mkdir -p /var/run/sendsigs.omit.d
	cp "$PIDFILE" /var/run/sendsigs.omit.d/zfs-fuse
	echo -17 > "/proc/$PID/oom_adj"
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		post_mountall
		exit 3
	fi
	
	log_action_begin_msg "Mounting ZFS filesystems"
	
	zfs mount -a
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		post_mountall
		exit 3
	fi

	if [ -x /usr/bin/renice ] ; then
		log_action_begin_msg "Increasing ZFS-FUSE priority"
		/usr/bin/renice -15 -g $PID > /dev/null
		ES_TO_REPORT=$?
		if [ 0 = "$ES_TO_REPORT" ]
		then
			log_action_end_msg 0
		else
			log_action_end_msg 1 "code $ES_TO_REPORT"
			post_mountall
			exit 3
		fi
		true
	fi
	
	post_mountall
}

do_stop () {
	test -x /sbin/zfs-fuse || exit 0
	PID=`cat "$PIDFILE" 2> /dev/null`
	if [ "$PID" = "" ] ; then
		# no pid file, we exit
		exit 0
	elif kill -0 $PID 2> /dev/null; then
		# pid file and killable, we continue
		true
	else
		# pid file is stale, we clean up shit
		log_action_begin_msg "Cleaning up stale ZFS-FUSE PID files"
		rm -f /var/run/sendsigs.omit.d/zfs-fuse "$PIDFILE"
		log_action_end_msg 0
		exit 0
	fi

	pre_mountall

	log_action_begin_msg "Syncing disks"
	sync
	log_action_end_msg 0

	log_action_begin_msg "Unmounting ZFS filesystems"
	zfs unmount -a
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		post_mountall
		exit 3
	fi
	
	post_mountall # restore /var/lock and /var/run to their right places

	log_action_begin_msg "Terminating ZFS-FUSE process gracefully"
	kill -TERM $PID

	for a in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
	do
		kill -0 $PID 2> /dev/null
		[ "$?" != "0" ] && break
		sleep 1
	done

	if kill -0 $PID 2> /dev/null
	then
		log_action_end_msg 1 "ZFS-FUSE refused to die after 15 seconds"
		post_mountall
		exit 3
	else
		rm -f /var/run/sendsigs.omit.d/zfs-fuse "$PIDFILE"
		log_action_end_msg 0
	fi

	log_action_begin_msg "Syncing disks again"
	sync
	log_action_end_msg 0
}

case "$1" in
  start)
	do_start
	;;
  stop)
	do_stop
	;;
  status)
	PID=`cat "$PIDFILE" 2> /dev/null`
	if [ "$PID" = "" ] ; then
		echo "ZFS-FUSE is not running"
		exit 3
	else
		if kill -0 $PID
		then
			echo "ZFS-FUSE is running, pid $PID"
			zpool status
			exit 0
		else
			echo "ZFS-FUSE died, PID files stale"
			exit 3
		fi
	fi
	;;
  restart|reload|force-reload)
	echo "Error: argument '$1' not supported" >&2
	exit 3
	;;
  *)
	echo "Usage: $0 start|stop|status" >&2
	exit 3
	;;
esac

:

The script should explain itself.

There were two problems, though. I derived my script from the NFS one and, in the process, I discovered that NFS was symlinked to be started at slot 31 in runlevels 6 and 0. This means that the initscripts subsystem would call that script with a start argument when in reality, the action was in the stop block. Since I couldn't figure out what kind of magic the Upstart initscripts compatibility subsystem does to get a stop block to run when a start block is requested by its configuration, I just created two glue scripts: one to start ZFS no matter what, and one to stop ZFS no matter what:

-rwxr-xr-x 1 root root 481 2008-06-18 04:09 /etc/init.d/mountzfs
-rwxr-xr-x 1 root root 488 2008-06-18 04:09 /etc/init.d/umountzfs

Then I studied the Kubuntu boot sequence very carefully, and used some elbow grease (update-rc.d) to symlink them to get the results I wanted:

lrwxrwxrwx 1 root root 19 2008-06-18 03:52 /etc/rc0.d/S35umountzfs -> ../init.d/umountzfs
lrwxrwxrwx 1 root root 19 2008-06-18 03:52 /etc/rc6.d/S35umountzfs -> ../init.d/umountzfs
lrwxrwxrwx 1 root root 18 2008-06-18 03:52 /etc/rcS.d/S36mountzfs -> ../init.d/mountzfs
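
I haven't pasted the glue scripts themselves, so here is a minimal sketch of the idea, not the real files (mine are a bit longer). The generator name make_glue is made up, and /etc/init.d/zfs is an assumed location for the main initscript shown earlier.

```shell
# Sketch only: emit a wrapper that ignores whatever argument init passes and
# always performs one fixed action on the main ZFS initscript.
# make_glue is a hypothetical helper; /etc/init.d/zfs is an assumed path.
make_glue() {
    action=$1    # "start" or "stop"
    printf '#!/bin/sh\n'
    printf '# Always %s ZFS, regardless of the argument init passes.\n' "$action"
    printf 'exec /etc/init.d/zfs %s\n' "$action"
}

# Usage (as root):
#   make_glue start > /etc/init.d/mountzfs
#   make_glue stop  > /etc/init.d/umountzfs
#   chmod +x /etc/init.d/mountzfs /etc/init.d/umountzfs
```

The point of the indirection is that each wrapper can be symlinked with an S prefix in any runlevel and still do the right thing.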

Trust me, writing the script was the easy part -- figuring out how it interacts with the rest of the system was much harder.

Finally, I rebooted to my new root filesystem on the second disk. If you thought that my system booted correctly, you would be very, very wrong indeed. Eighty percent of the boot sequence was red [ fail ]s and sh: command not found errors. At the end, the system dropped me into a recovery console, where I could finally switch the ZFS mount points to their final destinations. Then, just to try it out: zfs mount -a.

/home mounted.
/var couldn't be mounted, because the boot process graciously created incredibly important missing directories in it. And then, deadlock.

Crap, what was wrong?

Alt+SysRq+R. Boot again. What's wrong? No idea. Try strace. The friggin' command is in /usr. Hypotheses ran through my head for two hours. With me in front of a very, very broken system. I tried everything under the Sun that I could get my hands on -- which is not much when you don't have a CD-ROM drive, mind you.

And (summarizing two hours) then, I tried this: zfs set mountpoint=/tmp/usr vault/usr ; mkdir -p /tmp/usr ; zfs mount vault/usr.

Miracle of miracles, it worked. I copied the entire cast of characters of Linux Debugging: The Movie into the very tightly packed /. I strace -ff'ed the hell out of zfs-fuse and I found the problem. The moronic mount.fuse subcommand, that actually connects the kernel and user endpoints, tries to read /usr/lib/locale/locale-archive right in the middle of mounting the filesystem! Instant deadlock that you can only get out of by using the SysRq OOM key (yes, zfs-fuse is actually a great OOM candidate -- 1.5 GB VM size on this 1.0 GB RAM computer; yes, I discovered that on my own before I wrote the OOM immunization code in the initscript).

I then discovered two things: zfs-fuse didn't deadlock when started from the recovery console, but it did lock up when started from the initscript. What you can't see is that the version of the initscript that I initially wrote was sourcing the LANG variable from a configuration script in /etc. OK, so how do you solve locale problems? Instant fixup: unset LANG before running the command.
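
The same trick works outside the initscript. Here's a minimal sketch of a wrapper that runs any command with the locale variables scrubbed. The name without_locale is made up, and clearing LC_ALL and LANGUAGE as well as LANG is my extra caution; the initscript itself only unsets LANG.

```shell
# Sketch only: run a command with the locale variables removed, so nothing it
# spawns has a reason to open locale data under /usr.
# without_locale is a hypothetical helper; the initscript only unsets LANG,
# clearing LC_ALL and LANGUAGE too is an added assumption.
without_locale() {
    (
        # the subshell keeps the caller's environment untouched
        unset LANG LC_ALL LANGUAGE
        "$@"
    )
}

# Usage:
#   without_locale zfs mount -a
```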

OK, so do I have a booting system now, or what? Wrong again. Some processes get started before the actual mounting of filesystems, and the ZFS subsystem can't actually be started earlier in the boot process without creating an initramfs dependency or another, different, chicken-and-egg problem. So I moved what I could move from the ZFS volume's /var into the /var directory of the / filesystem. I ended up with this structure backed up by ZFS (and the rest, you can safely assume, in a very tightly crammed ext3 filesystem):

zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
vault             294G  69,8G    18K  none
vault/home        290G  69,8G   290G  /home
vault/usr        3,36G  69,8G  3,36G  /usr
vault/var         842M  69,8G    18K  none
vault/var/cache   515M  69,8G   515M  /var/cache
vault/var/lib     282M  69,8G   282M  /var/lib
vault/var/tmp    44,5M  69,8G  44,5M  /var/tmp

Boot again. Oh, yeah, I'm enjoying the 3-minute boot time on this formerly-a-screamer machine. D-Bus fails to start. D-Bus is actually required for many things in Kubuntu, but I manage to start a GUI session up, if only to Google what was wrong with it. That was probably not the best moment to find out that just starting the KDE 3.5 session took over ten minutes. All of this with less than 1 MB/s from the disk, according to iostat, and 160% CPU usage, according to top.

Then I discovered the zfs-fuse Google group. It's a fantastic place where everyone (including Ricardo Correia) received me very well and had lots of tips. Only there did I find out what was wrong with D-Bus -- a bug that manifests itself only with FUSE filesystems, for which a patch exists and works.

At this point I'm extremely exhausted from this marathon session, so I basically just try to backport the patch into the dbus source package for my distribution. You've probably heard that Debian (and, by extension, Ubuntu) has a fantastic build system -- it failed on me. Not only was apt not working (remember the mmap issue?), but dpkg-source also failed while trying to apply the patches for the source package. Oh, yes, I manage to solve this problem by learning, on-the-spot, how the apt build "system" actually works, and manually replicating the entire process that should be automated. Many thanks to the gents at #debian in Freenode for their kind responses to my questions.

Bam, built dbus (it's yours if you want it). Installed it. Started it. And the chain of daemons that were depending on it just start up and take life. Neat trick, Upstart!

Back to performance questions and ZFS. Do you know what the real performance killer is? You'll never guess it...

...icons! While GTK+ applications take marginally more time to start under a ZFS regime, KDE applications take an order of magnitude more. Before, on a warm working set, a KDE application took about 2 seconds to start. Today, KMail takes in excess of five minutes to start. Why? Here's why -- multiply that by fifty thousand and you'll get the idea. Each icon that the application requests results in thousands and thousands of access() and stat() calls. FUSE doesn't use a kernel cache by default (there are several reasons for that), so the only cache that backs those requests up is the ARC cache, which is an impressive caching regime and technical achievement but, in this case, it's very much like caching your car keys somewhere in Europe, because of the transatlantic userspace-kernelspace-userspace-kernelspace-userspace barrier. Per call. When this is taking place, the CPUs remain pegged at 190%, eaten alive by ZFS, and the 12 case fans jump to 11,000 RPM.

The zfs-fuse Google groups guys came up with a couple of suggestions (all documented in the list, which I'm too lazy to link to again). These all are compile-time options, so a ZFS rebuild is in order for every one of them:

  • scons debug=0. A very slight CPU usage decrease.
  • Increasing the ARC cache. I doubled it from 128 to 256 MB. Turns out it's not a caching problem and it doesn't help at all.
  • Mount option big_writes for FUSE filesystems. Here's what I did about that:

Recompiled ZFS, this time enabling a FUSE mount option named big_writes that I've read about in the Google group. Yes, the daemon needs to be recompiled, and it's not fast. No, I'm not actually jumping to the part where I actually compiled ZFS with big_writes first, then booted, only to find out that I needed a new kernel. Oh, wait, I just did. Fortunately, I did back zfs-fuse up.

Next up? Latest 2.6.26-rc6 kernel, because of:

  • Hey, writable mmap is there for FUSE filesystems! Yeah! Now I can have apt-get back!
  • big_writes.

When was the last time a kernel compile took four hours for you? Mine was yesterday. But it's actually fun -- the process hasn't changed that much from 1998, and the distro already comes with a nice .config that you can reuse with make oldconfig. And, this time, you get to do out-of-tree kernel builds! Yay!
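
In case you haven't done an out-of-tree build before, the sequence goes roughly like this. It's a dry-run sketch that prints the commands: kbuild_plan is a hypothetical helper and all three paths are placeholders, but O= is the kernel build system's own switch for a separate output directory.

```shell
# Sketch only: print (do not run) the commands for an out-of-tree kernel build
# that reuses the distro's configuration.
# kbuild_plan and the example paths are hypothetical placeholders.
kbuild_plan() {
    src=$1       # kernel source tree, e.g. /usr/src/linux-2.6.26-rc6
    out=$2       # separate build directory, e.g. /var/tmp/kbuild
    config=$3    # distro config to reuse, e.g. a file under /boot
    echo "mkdir -p $out"
    echo "cp $config $out/.config"
    echo "make -C $src O=$out oldconfig"    # answer prompts for new options only
    echo "make -C $src O=$out"
    echo "make -C $src O=$out modules_install install"
}

# Usage:
#   kbuild_plan /usr/src/linux /var/tmp/kbuild /boot/config-old
```

Keeping the build directory off the tiny root filesystem is the whole reason to bother with O= here.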

Well, I ticked the wrong option in make menuconfig anyway, because my kernel modules don't fit my puny /, now at 400 MB free. Jeez, four hours. Google some more. Turns out I turned a debugging option on.

After this, FUSE userspace itself was due for a recompile. Another odyssey, whose fruits you can reap here (warning: CVS checkout).

OK, redo the initial RAM disk, adjust GRUB configuration, reboot with the latest kernel. It's all good. More surprisingly, I'm actually getting some of my performance back. Some of it. As in "KMail no longer takes five minutes to start -- only three".

And, most importantly, applications that depend on mmap now work correctly. My boot process isn't an epic [ fail ] anymore -- and that's incredibly reassuring.

This is the point where my journey turns into smooth sailing. I zpool scrubbed my new baby. After five hours, with the solid guarantee that my data was OK and nothing'd been lost or corrupted during the rsync, I nuked my first disk and replicated the new partition structure on it. A nice RAID1 array for the final /. A short rsync for the / filesystem. A quick mkswap for the new swap partition. A fast adjustment in /etc/fstab and another one in mdadm.conf for the new array. Reinstall and reconfigure GRUB on the first disk. And, finally, I saved the best for last:

zpool attach vault /dev/by-id/second-disk-huge-partition /dev/by-id/first-disk-huge-partition

Man, that rocked. It was unbelievably fast -- like, disk-platter fast, around 40 to 50 MB per second, and the system didn't get that much slower while it was resilvering the first disk. Which kind of makes lots of sense, because zfs-fuse is now crossing the userspace-kernelspace barrier just once per operation. How do I know this? Well, strace: I know that what zfs-fuse does is, it opens the disk partition in direct I/O mode and then manages it for itself, responding to FUSE requests -- but the resilvering process doesn't involve FUSE at all, it's just the two disks practically chatting with each other through zfs-fuse. Now I know for sure that ZFS will give me platter speeds. It's just a matter of time (and maybe me pestering Ricardo Correia to collaborate with me on this same issue).

Questions that I haven't solved yet? Sure, there are a lot. Four that haunt me:

  • No root filesystem on ZFS. Others on the Google group have managed it. Me? I didn't want to mess with /etc/zfs inside the initramfs, thank you very much.
  • I know this for sure: the only active cache now is the userspace ARC cache from ZFS; I read the FUSE kernel code, and it clearly flushes files from the cache when programs open() them. Honestly, if I could wish for something to just become true overnight, I'd wish for the ARC to be moved into the kernel and to have it replace the page cache, but that won't happen anytime soon. There's a FUSE kernel_cache option, but I'm wary of enabling it. When I have been sufficiently reassured that the option won't corrupt my precious data, I will enable it. That will be a couple of hours of reading someone else's code, so I'm inclined to defer it for a few days. But, in theory, this should give me platter speeds instead of giving my 12 case fans 'speed'. At the hefty cost of RAM for two redundant caches.
  • Do filesystem readahead and the Linux disk scheduler algorithms interfere in some way with ZFS' control of the platter? The data integrity question is closed, because the writes are submitted with barriers, but I'm worried that the Linux I/O scheduler is second-guessing the decisions of ZFS' own.
  • The /etc/init.d/sendsigs omit.d protocol I'm using on the initscript plain fails. I had to shunt the script with an exit 0 right before the killall5 in sendsigs because killall5 plain hung instead of ignoring ZFS as it should have done -- and it needs to ignore ZFS because ZFS is unmounted later. This won't be a problem once we get our own kernelspace ZFS implementation.
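
On the scheduler question, at least the current state is easy to inspect: Linux exposes the available I/O schedulers for each disk in sysfs, with the active one in square brackets. A minimal sketch for pulling it out follows; active_scheduler is a made-up name and sda is a placeholder for whichever disk backs the pool.

```shell
# Sketch only: read the contents of a queue/scheduler sysfs file on stdin and
# print the active scheduler, which the kernel marks with square brackets.
# active_scheduler is a hypothetical helper name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage (sda is a placeholder disk name):
#   active_scheduler < /sys/block/sda/queue/scheduler
```

It doesn't answer whether the scheduler hurts ZFS, but it tells you what you are measuring against.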

OK, that was my journey. I'm on ZFS now, my machine's rock-solid (if a bit CPU-tired) and my data's never been so safe. I also got compression, which saved me about 6 GB. Furthermore, I've given you the initscript, the steps and the software (except ZFS, but you can compile that yourself).

Go wild.

Filesystems vs. brain damage in Linux applications -- why, oh why, can't it be better?

Posted by Rudd-O at Jun 19, 2008 08:24 AM |

Look, I'm a fan of Linux. But nothing can excuse the stupid behavior of KDE applications under Kubuntu:

What you're about to see is just a twenty-line snippet representing standard pathological behavior of a widely used KDE application that comes with Kubuntu. I'm at a loss for words, because what you're about to see repeats itself (I kid you not) at least twenty thousand times:

access("/usr/share/icons/crystalproject/24x24/devices/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/24x24/filesystems/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/24x24/filesystems/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/24x24/mimetypes/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/24x24/mimetypes/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/actions/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/actions/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/apps/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/apps/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/devices/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/devices/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/filesystems/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/filesystems/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/usr/share/icons/crystalproject/32x32/mimetypes/mail_todo.xpm", R_OK) = -1 ENOENT (No such file or directory)

Now, let me ask you: what kind of engineering is this? Oh, I'll just look for the icon named mail_todo of types XPM, SVG, SVGZ, PNG, JPG across thousands of directories and if I can't find it, I just won't draw it.
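
If you want to put a number on this for your own applications, you can count the failed lookups in an strace log. The filter below is a minimal sketch: failed_lookups is a made-up name, and it assumes the log uses strace's default line format like the snippet above.

```shell
# Sketch only: read strace output on stdin and count access()/stat()-family
# calls that failed with ENOENT. failed_lookups is a hypothetical helper name.
# The pattern is unanchored so lines prefixed with a PID also match, and
# "stat(" / "stat64(" also cover lstat( and lstat64(.
failed_lookups() {
    grep -cE '(access|stat|stat64)\(.*= -1 ENOENT'
}

# Usage (trace file name is illustrative):
#   strace -f -o /tmp/kmail.trace kmail
#   failed_lookups < /tmp/kmail.trace
```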

Modern desktop environments package all that shit up in a single icon cache and use mmap to quickly find it. Well, guess again, loser who invented that band-aid. I'm using a filesystem that doesn't support mmap. Now my filesystem has to consume inordinate amounts of CPU to find each one of those two hundred icons your application uses, because you couldn't find a better algorithm to find the motherfucking icon.

Oh, by the way, every time I delete an e-mail from KMail's listing, the entire process is repeated again. I guess I can safely omit the part where Kicker (the panel) and Kopete all do the same things for each icon that they display. Woops, I didn't omit it.

Just another tale of living on the bleeding edge of Linux computing.

In the interest of full disclosure: I'm using the CPU-heavy ZFS filesystem from Sun. I'm testing it because it offers unsurpassed reliability. Yes, it's CPU-heavy. Yes, it runs through FUSE. But that's not the point -- the point is that this wouldn't matter if KDE apps (and, I'm sure, many others as well) didn't suck so much at finding a motherfucking icon. After the latest torvalds kernel recompile and another ZFS recompile for performance (which failed), I have gained quite a lot of performance: now KMail doesn't take 10 minutes to launch -- it only takes four. On a dual-core Xeon, with 1 GB RAM and 7200 RPM dual platters. Four motherfucking minutes. I now fully agree with Cox when he said userspace does stupid things.

Are you using Firefox 3?

Posted by Rudd-O at Jun 13, 2008 06:04 PM |
Filed under: Firefox easter eggs haha!

Then click here.

Processing in Python: the future of multiprocessing

Posted by Rudd-O at May 29, 2008 10:05 PM |

The PEP for the inclusion of pyProcessing in Python 2.6 and 3.0 has been published. This is incredibly significant for multicore programming.

KDE obliterates the competition

Posted by Rudd-O at Apr 25, 2008 08:57 AM |

Imagine 52 million children being simultaneously introduced to KDE and Linux. Well, you no longer need to just imagine it, because the Brazilian Ministry of Education, over the course of this year and the next, will do exactly that. This is unabashed success.

The history meme

Posted by Rudd-O at Apr 15, 2008 02:14 PM |
Filed under: bash hacks tips

Since it's been around my corners of the Internets, I guess it's time for me to post this meme as well:

<