How to divide and conquer a problem the UNIX way
The main loop
Now our little program is taking shape. Let’s code the main function up:
print "Starting download of finished torrents"
try:
    for torrent,filename in get_files_to_download():
        # no point in doing anything if the file was removed, right?
        # so we continue if the file doesn't exist
        if not exists_on_server(filename): continue
        print "Downloading %s from torrent %s"%(filename,torrent)
        retvalue = dorsync(filename)
        if retvalue == 0:
            report_file_done(filename)
            print "Download of %s complete"%filename
            # TODO: make use of fluxcli to kill the torrent out of TF
        else:
            if retvalue == 20:
                print "Download of %s stopped -- rsync process interrupted"%(filename)
                print "Finishing by user request"
                sys.exit(2)
            elif retvalue < 0:
                report_file_failed(filename)
                print "Download of %s failed -- rsync process killed with signal %s"%(filename,-retvalue)
                print "Aborting"
                sys.exit(1)
            else:
                report_file_failed(filename)
                print "Download of %s failed -- rsync process exited with return status %s"%(filename,retvalue)
                print "Aborting"
                sys.exit(1)
    print "Download of finished torrents complete"
except Exception,e:
    report_file_failed("00 - General error")
    raise
Do you see what it does? It simply asks which torrents are available and done downloading, retrieves the file names, and rsyncs — interspersed with a bit of error handling.
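The three branches on retvalue follow two conventions: rsync documents exit status 20 as "received SIGUSR1 or SIGINT", and Python's subprocess module reports a child killed by signal N as a return code of -N. The dorsync() helper is not shown in this excerpt, so the following is only a minimal sketch of that classification, written in Python 3 syntax (unlike the Python 2 listing above). The names classify_status and run_and_classify are made up for illustration.

```python
import subprocess

RSYNC_INTERRUPTED = 20  # rsync exit status for "received SIGUSR1 or SIGINT"


def classify_status(retvalue):
    """Turn a subprocess return code into a short description."""
    if retvalue == 0:
        return "done"
    if retvalue == RSYNC_INTERRUPTED:
        return "interrupted"
    if retvalue < 0:
        # subprocess reports death by signal N as -N
        return "killed by signal %d" % -retvalue
    return "failed with status %d" % retvalue


def run_and_classify(command):
    """Run a command (hypothetical stand-in for dorsync) and classify it."""
    return classify_status(subprocess.call(command))
```

Keeping the classification separate from the code that launches rsync makes each branch easy to check without transferring anything.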
But I want it to run in the background…
I don’t want to type the command to run my script every time. I want it to run automatically, every ten minutes or so. So I set up a cron job: cron is the UNIX service that executes commands periodically, and it’s tremendously useful for automating tasks.
Two things were missing:
- I needed a log to inspect, in case something went awry.
- I also needed the application to run in the background, because otherwise cron gets stuck until the download (usually taking many hours) finishes.
Those two requirements were handily fulfilled by a single Google search on daemons (UNIX-speak for long-running background processes). When run, the application daemonizes itself: it disconnects from the calling process and saves all output to a file. We accomplish that by:
- fork()ing twice and, between the two forks, making the program a session leader.
- Closing the standard input, standard output, and standard error files, then reopening them to point to files on disk.
Once we’ve done this, cron thinks that the process is done, and continues on its merry way:
def daemonize():
    """Detach a process from the controlling terminal and run it in the
    background as a daemon.
    """
    try: pid = os.fork()
    except OSError, e: raise Exception, "%s [%d]" % (e.strerror, e.errno)
    if (pid == 0): # The first child.
        os.setsid()
        try: pid = os.fork() # Fork a second child.
        except OSError, e: raise Exception, "%s [%d]" % (e.strerror, e.errno)
        if (pid == 0): # The second child.
            os.chdir("/")
        else:
            # exit() or _exit()? See below.
            os._exit(0) # Exit parent (the first child) of the second child.
    else: os._exit(0) # Exit parent of the first child.
    import resource # Resource usage information.
    maxfd = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
    if (maxfd == resource.RLIM_INFINITY):
        maxfd = 1024
    # Iterate through and close all file descriptors.
    for f in [ sys.stderr, sys.stdout, sys.stdin ]:
        try: f.flush()
        except: pass
    for fd in range(0, 2):
        try: os.close(fd)
        except OSError: pass
    for f in [ sys.stderr, sys.stdout, sys.stdin ]:
        try: f.close()
        except: pass
    sys.stdin = file("/dev/null", "r")
    sys.stdout = file(os.path.join(torrentleecher_destdir,".torrentleecher.log"), "a",0)
    sys.stderr = file(os.path.join(torrentleecher_destdir,".torrentleecher.log"), "a",0)
    os.dup2(1, 2)
    return(0)
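For readers on Python 3, the same double-fork idea can be sketched more compactly. This is an illustration under stated assumptions, not a drop-in replacement for the function above: the log_path parameter is hypothetical (the listing above derives the path from torrentleecher_destdir), and the redirection is done with os.dup2 on the raw descriptors rather than by reassigning sys.stdout.

```python
import os
import sys


def daemonize(log_path):
    """Double-fork into the background and send stdout/stderr to log_path.

    log_path is a hypothetical parameter used for illustration.
    """
    log_path = os.path.abspath(log_path)  # resolve before chdir("/") below
    sys.stdout.flush()                    # avoid duplicating buffered output
    sys.stderr.flush()

    if os.fork() > 0:
        os._exit(0)   # original process exits, so the caller stops waiting
    os.setsid()       # first child becomes leader of a new session
    if os.fork() > 0:
        os._exit(0)   # first child exits; the grandchild carries on
    os.chdir("/")

    # Point descriptors 0, 1 and 2 at /dev/null and the log file.
    devnull = os.open(os.devnull, os.O_RDONLY)
    log = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(devnull, 0)
    os.dup2(log, 1)
    os.dup2(log, 2)
    for fd in (devnull, log):
        if fd > 2:
            os.close(fd)
```

Only the grandchild returns from this function; whatever it writes to descriptor 1 or 2 afterwards lands in the log file, while the process that cron started has already exited.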
We also need a way to prevent multiple copies of the program from running simultaneously, lest the several rsyncs step on each other’s toes.
The way to do this is to create and lock an empty file, which we UNIXers call a lockfile. If a second copy of the program tries to lock the file, it fails, and we use that failure to make the second copy bail out shamelessly. Code to create the lock file was also available on the Web and in the Python documentation:
def lock():
    global f
    try:
        fcntl.lockf(f.fileno(),fcntl.LOCK_UN)
        f.close()
    except: pass
    try:
        f=open(os.path.join(torrentleecher_destdir,".torrentleecher.lock"), 'w')
        fcntl.lockf(f.fileno(),fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError,e:
        if e.errno == 11: return False
        else: raise
    return True
A call to daemonize() and a call to lock() in the main function, and we’re good to go.
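The magic number 11 in lock() is EAGAIN on Linux, the error a non-blocking lock attempt raises when another process already holds the lock. A minimal Python 3 sketch of the same technique, using the named errno constants, might look like the following. The function name acquire_lock and its path parameter are assumptions made for illustration.

```python
import errno
import fcntl
import os


def acquire_lock(path):
    """Return an open, locked file object, or None if the lock is taken.

    The caller must keep the returned object alive: closing it (or exiting)
    releases the lock.
    """
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        handle.close()
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return None  # another process holds the lock
        raise
    return handle
```

Note that fcntl.lockf locks belong to the process rather than to the file object, so this only guards against a second process, which is exactly the situation cron creates.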
…and I would also like to be informed whenever an event happens!
Now, cron doesn’t know anything about GUIs, so my little application needs to report back. How to know when a file is downloaded, or when an error took place?
With this code, which is rather ingenious:
def mail(subject,text): return getstdoutstderr(["mail","-s",subject,email_address],text)

def report_file_failed(filename):
    try:
        os.symlink(os.path.join(torrentleecher_destdir,".torrentleecher.log"),"%s.log"%filename)
    except OSError,e:
        if e.errno != 17: raise #file exists should be ignored of course
    sys.stdout.flush()
    sys.stderr.flush()
    errortext = """Please take a look at the log files in %s"""%torrentleecher_destdir
    mail("Leecher: error -- %s"%filename,errortext)

def report_file_done(filename):
    try:
        file("%s is done"%filename,"w").write("Done")
    except OSError,e:
        if e.errno != 17: raise #file exists should be ignored of course
    mail("Leecher: done -- %s"%filename,"The file is at %s"%torrentleecher_destdir)
These three functions leave a visible marker whenever an event takes place (on failure, a link to the normally hidden log; on success, a file named after the download) and send mail (because, you know, mail is an integral part of the UNIX infrastructure).
I trapped potential error paths with try: except: blocks (like try/catch in Java), and interspersed calls to these functions in all the right places.
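The getstdoutstderr() helper that mail() relies on is not defined in this excerpt. Judging only by its name and the way it is called, it runs a command, feeds it some text on standard input, and returns what the command printed. A hypothetical Python 3 stand-in, with run_with_input as an invented name and the address passed in explicitly, could be sketched like this:

```python
import subprocess


def run_with_input(command, text):
    """Run command, feed text to its stdin, return (status, combined output)."""
    result = subprocess.run(
        command,
        input=text,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # work with str instead of bytes
    )
    return result.returncode, result.stdout


def mail(subject, text, address):
    """Send text through the system's mail command (assumed to be installed)."""
    return run_with_input(["mail", "-s", subject, address], text)
```

Passing the command as a list avoids the shell entirely, so file names with spaces or quotes in the subject line need no escaping.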
Miscellanea like signal handlers
One more thing I wanted, and in the end built, was for the program to respond gracefully to signals. A signal is usually sent to a process when you want to kill it, and since I expected that I might want to kill the downloader once in a while, I wanted it to kill all of its rsync processes as well. It wasn’t hard to do at all.
I will forgo the opportunity to copy and paste the appropriate code snippet (I’m sleepy), and instead will let you go and read the finished script linked below.
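Since the handler itself is not reproduced here, the following is only a generic illustration of the idea in Python 3, not the code from the finished script. It assumes the program keeps its running rsync processes in a list of subprocess.Popen objects; the names children, terminate_children and install_handlers are invented for the sketch.

```python
import signal
import subprocess
import sys

children = []  # hypothetical registry of running rsync Popen objects


def terminate_children(signum, frame):
    """Signal handler: stop every live child process, then exit."""
    for child in children:
        if child.poll() is None:   # still running
            child.terminate()      # sends SIGTERM to the child
            child.wait()           # reap it so no zombie is left behind
    sys.exit(2)


def install_handlers():
    """Route SIGTERM and SIGINT to terminate_children."""
    for signum in (signal.SIGTERM, signal.SIGINT):
        signal.signal(signum, terminate_children)
```

A caller would append each Popen object to children right after starting it and call install_handlers() once at startup.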
All that begins well, ends well
And this article is no exception. After years of doing what I do on Linux, I can confidently and categorically say that this is precisely why the much-hated UNIX command line is orders of magnitude more effective than the competition. Had I used Windows for the purpose, I would have had to develop an entire program, and it would have taken me much, much longer. Or I could just have installed the Cygwin UNIX compatibility layer and run this very script in it.
But more valuable is the thought process that UNIX instills in you. With just about enough knowledge to get this small project started, I taught myself absolutely everything that was missing to solve the puzzle; after solving it, I was a better programmer.
Here’s the final masterpiece (note that I’ve already added a few new features to my local script). If you have any ideas, pitch in using the comment form.
And keep hacking!
November 5th, 2007 at 0:10
Hi there,
thank you so much for sharing that information on how you created your script and solved that problem. Believe it or not, there are people out in the world (like me) who like to read such things, as the process you are following does not necessarily come naturally. It’s great to learn how to break up a problem into pieces like that.
thanks
Michaelg