Friday, June 11, 2010

Triumphs and Travails in Geekdom

My goodness, it seems that finishing theses has a strongly negative effect on blog-motivation! Never fear though, soon I'll have a 9,000 word paper on Simplicity (I know, right?) to share with you! I was afraid for a while that it was straying dangerously far from the theme of the class (Occam's Razor, and the tortured modern interpretations of it), but the professor reassured me that he was equally nonplussed by the several hundred pages of reading he had us do for the class, and that he'd be much more interested in reading something about the cognitive science of simplicity, which is what I'm writing about. Probably wise to at least mention the assigned readings, though; so far I've done lots of exposition on stuff I've read over the last year outside the class and plenty of integration thereof with nary a mention of 'grue' or 'minimal reversals.'

I had a moment of programming-induced rage last night (involving lots of swearing at hung terminals and throwing empty cranberry-juice bottles), triggered by realizing just how awesomely specialized R apparently is for some tasks, as compared to Python. I'm working through this "Think Python" book, and one of the tasks it had me do really stretched the capabilities of the language. It asked me to read in the official crossword list and do some list operations on it; such an apparently easy task (from my R-soaked perspective) that I almost skipped it. As it turns out, making a list by traversing the word file and appending each of the 100k or so words to the previous ones takes up a LOT of memory, and it was a "great learning experience". Apparently Python creates a new list at each iteration, so instead of one list with 100k-some elements you get 100k-some lists, each with as many elements as precede it in the list. I know I could do the math to say how much excess memory that uses, but meh. I tried it several different ways, looking up efficient idioms in Python to do the task, and every one of them caused the shell to hang for at least a half hour. None of them actually "finished" in a way that gave me a usable list; if I had the patience it would eventually give me back my command prompt, but would remain hung.
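A minimal sketch of where the slowdown may come from, assuming the loop rebuilt the list with `+` on each pass rather than calling `append` (the original loop isn't shown here, so that is a guess). Concatenation copies every existing element into a fresh list each time, while `append` modifies the one list in place. The function names and the four sample words are made up for illustration.

```python
import io


def build_by_concatenation(lines):
    """Rebuild the list on every pass: each `+` copies all earlier words."""
    words = []
    for line in lines:
        words = words + [line.strip()]  # creates a new list object each time
    return words


def build_by_append(lines):
    """Grow a single list in place: earlier words are not copied."""
    words = []
    for line in lines:
        words.append(line.strip())
    return words


# Stand-in for the word file (hypothetical sample, not the real crossword list)
sample = "aa\naah\naahed\naahing\n"
slow = build_by_concatenation(io.StringIO(sample))
fast = build_by_append(io.StringIO(sample))
print(slow == fast)
```

Both versions return the same list; the difference is the work done along the way. With n words, the concatenating version copies on the order of n²/2 elements in total (roughly 5 billion for n = 100,000), while the appending version stays roughly proportional to n.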

Maybe if my first real programming experience had been with a general purpose language like Python, I would have been expecting this. But I've been working with super-specialized-for-statistics-"R" for the last year, which does tasks like this efficiently in the bleary-eyed milliseconds of post-waking-up pre-first-cup-of-coffee time frame that it takes for the enter key to spring back up after being depressed. Want to read a file and do list operations on it? Great! Type "someList<-read.csv('filename')" and it's done. I gather that this built-in function is actually rather well-designed for the task, though I've never had a chance to really appreciate it until now. After some plastic-bottle-throwing and chair-overturning, it came to me that if I just initialized a vector of the length I needed and then used indexing to assign each word to its proper place as I traversed the list, I should avoid the memory issues. I was ready to throw Python (in the computer's trash bin) if it couldn't handle that. Happily I found that it's quite comfortable with that sort of operation, but my travails weren't quite over yet. The "range(x)" function will create a vector of length x with pleasing rapidity and is the closest equivalent to R's "vector(mode, length)" function I could find. Unhappily, if you try to print this vector with x=100k-some, it hangs the shell. What the EverlovingEff? This is another thing that R does without blinking. I suppressed my boiling anger long enough to find that if I just assign it to a variable without printing it, like this: "someList=range(x)", everything is fine. It makes a list of the requested length, but fills it with consecutive integers. Presumably this isn't the most memory-efficient way to go, since all I really wanted was zeros, but it does the trick.
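A small sketch of the preallocation step, assuming Python 3, where "range(x)" is a lazy sequence rather than a list and so needs wrapping in `list()` to behave as described above. The names `n`, `zeros`, and `counted` are illustrative only.

```python
n = 100000  # stand-in for the "100k-some" word count

# A list of n zeros, roughly what R's vector("numeric", n) would give
zeros = [0] * n

# Python 3 equivalent of Python 2's range(n): n consecutive integers
counted = list(range(n))

# Slots can then be overwritten by index, whatever they held before
zeros[3] = "aahing"

# Peek at a slice rather than echoing all n elements to the shell
print(len(zeros), zeros[:5])
```

Multiplying a one-element list is probably the nearest thing to "give me n zeros", and printing a short slice sidesteps asking the shell to render the entire list at once.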

Now, at long last, I could read the file and assign words to slots in this damn vector. Here's the code that I finally came up with, at 1am:

for i, line in enumerate(fin):
    # store each word in its preallocated slot
    result[i] = line.strip()

It works, just don't type "result" into the shell, for the love of god! Why was I doing all this, you might ask? Merely so that I could eventually implement the "bisect" algorithm (which incidentally Python already has a version of) to reduce the search space in sorted lists. It's pretty neat: If you've got an alphabetical list of words and you want to know whether some other word is in it, you could just read through the list, testing each word to see if it matches the new word, and after 100k-some operations you'd be done. OR, the non-naive way to do it: test whether the word is in the first or second half of the file based on alphabetization, then test the half that it's in, then test that half... and continue till you've got a list of just one or two words you can test. The virtue is that the bisect algorithm completes the search in twenty-or-so operations, rather than 100k-some. Neat eh? Now that I've finally got a list with indexes, I can actually write the function!
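The halving idea can be sketched roughly as follows. This is one possible version, not the book's solution; the function names and the tiny word list are invented for illustration. The second function leans on `bisect_left` from Python's standard `bisect` module, which is the built-in version mentioned above.

```python
from bisect import bisect_left


def in_bisect(words, target):
    """Binary search: True if target is in the sorted list `words`."""
    lo, hi = 0, len(words)  # the search window is words[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if words[mid] == target:
            return True
        if words[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid  # target can only be in the lower half
    return False


def in_bisect_stdlib(words, target):
    """Same membership check using the standard library's bisect_left."""
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


# Hypothetical sample; the list must already be sorted
words = sorted(["aah", "aa", "zebra", "python", "grue", "razor"])
print(in_bisect(words, "grue"), in_bisect(words, "simplicity"))
```

Each pass through the loop halves the window, so a list of 100,000 words needs at most about 17 passes (log2 of 100,000 is roughly 16.6), which squares with the twenty-or-so estimate.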

I'd be pleased if some expert Python programmer would offer a comment explaining that I'm doing it all wrong and give me a nice simple solution to the problem. Pleased in an ironic way.

A few more complaints, though: IDLE doesn't seem exceptionally stable on OS X. I can't use the keyboard interrupt to stop processes that are taking forever, and I end up force-quitting it much more often than seems ideal. Also, I can't seem to run more than one shell at a time. Neither of these things is a problem on my sparkly-new Ubuntu-running salvaged laptop from 2004, though that's got its share of headaches.

Speaking of which: Cory Doctorow has a glowing review of the latest Ubuntu release, co-opting the "It just works!" slogan formerly applied to Macs. As much as I'm enjoying Ubuntu, I can't say that I've had the same experience. Version 9.10 worked pretty well, aside from issues with the wireless driver (5 hrs of (clueless) shell scripting), not being able to hibernate when the lid is closed (no solution, just have to do it manually), and the wireless being disabled when it comes back from manual hibernation (no solution but reboot). I enthusiastically upgraded to version 10.04 when it prompted me to do so... only to find that I had to do some shell scripting to get the video driver to work, plus the wi-fi issues were back and harder to solve. I ended up going back to 9.10 in frustration. Complaints issued, I must say that I rather like it. It's sleek and fast, and there's free software available for everything I want to do. My advice to would-be Ubuntu explorers: be prepared for deep-Googling problem-solving sessions, at least if you're using a Dell X300.
