Neo4j and graph databases

Here’s a nice introduction by Todd Hoff to the topic of graph databases: what they are, and why they’re relevant. The author starts by trashing all of the candidates:

So relational database can’t handle complex relationships. Graph systems are opaque, unmaintainable, and inflexible. OO databases loose [sic] flexibility by combining logic and data. Key-value stores require the programmer to maintain all relationships. There, everybody sucks 🙂

And then Todd gets into a nice discussion of one graph database, Neo4j. He cites a piece comparing Neo4j with Hadoop. Hadoop’s great for shallow data reductions, like log processing, but really bad for deep relationships.
And don’t just read this piece; bookmark it! Because at the end, Todd includes an excellent bibliography of related articles.
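To make the “deep relationships” point concrete, here’s a toy sketch – plain Python with made-up names and edges, not Neo4j’s actual API – of a multi-hop “who knows whom” traversal. A graph store treats this as a natural walk over nodes and edges; in a relational schema each extra hop is another self-join, and in Hadoop it’s another full pass over the data.

```python
from collections import deque

# Toy adjacency-list "graph": who knows whom. Names and edges are invented.
knows = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave", "erin"],
    "dave":  ["frank"],
    "erin":  [],
    "frank": [],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: everyone reachable from `start`
    in at most `max_hops` relationship steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for friend in knows.get(person, []):
            if friend not in seen:
                seen.add(friend)
                reachable.add(friend)
                frontier.append((friend, depth + 1))
    return reachable

print(within_hops("alice", 3))  # bob, carol, dave, erin, frank (in some order)
```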

A month with a netbook

Just over a month ago, I bought myself an Asus EeePC 901 netbook, and wrote a blog piece describing my first impressions, including the process of installing Ubuntu Netbook Remix as the default OS. And then I started using the device, and didn’t think much more about it.
A couple of days ago, a friend emailed me, and asked, “I haven’t read any comments about [the EeePC 901] from you. Do you like it? Was it all you thought it would be? Would you buy it again, now that you have experience with it?” Good questions.
First: yes, I like it. I’ve made two trips to California recently, for job interviews and apartment hunting, and each time I took the netbook with me. Previously I’d have toted my MacBook Air, and while I miss Mac OS X, Ubuntu is fine for the basics: email, web access, word processing, blogging, twittering, and so forth. And the netbook is half the size, with three times the battery life, at a fraction of the price.
The latest Ubuntu WiFi works just fine – it’s almost as easy as OS X. Audio is a bit of a pain: the function keys work sometimes, but not always, so I occasionally have to use the volume widget. More annoying is the fact that even when the volume is zero, audio output can still cause the speakers to buzz and click. Odd.
Sleep mode works – mostly. I normally close the lid to sleep, then open the lid and press the power button to wake it. However on several occasions the machine has failed to go into sleep mode; on one occasion I retrieved it from my backpack after a few hours to find that the battery was drained and the machine was really warm! After that incident, I have taken to watching the blinkin’ lights on the front edge of the machine when I put it to sleep; if it fails to go to sleep correctly (about 20% of the time) I open it up and reset it.
I’ve recently been thinking about what gear to take with me when I’m travelling to Shenzhen for Huawei. Both the MacBook Air and the EeePC 901 are plausible: both can support Skype, so that I can phone home. (However the Mac has better support for L2TP tunnelling with services like PublicVPN.com.) Neither machine has a DVD drive, however, so I bought a bus-powered USB external DVD drive from LG which I can use to watch movies on either system.
The size of the EeePC 901 has not proved to be a usability problem. The keyboard, trackpad and screen are all just fine. The only nit is that the space bar seems 1-2 mm too high, and it’s quite sensitive, so that I tend to catch it after typing bottom-row letters. However those who know me will confirm that I’m a lousy typist anyway, so it may just be me.
Would I buy it again? I think so – if not this unit, then an equally light netbook, like the Asus “Seashell”. But the combination of size, weight, and battery life is pretty damn compelling; the 8.9 inch netbook is my sweet spot. It’s a shame that manufacturers seem to be giving up on this configuration.
Several people have asked if I plan to install Mac OS X on the EeePC. Right now, the costs – complexity, problematic networking, screen size assumptions in some apps, GUI real estate usage – seem to outweigh the benefits, so the answer is no. Now if someone came up with a foolproof way of reading a Leopard installation DVD and writing a bootable SD card, I’d be interested in playing with it. Until then, Ubuntu will be just fine.
(And yes, I am composing this on the netbook. Not to do so would be silly, wouldn’t it?)

CloudSlamming

One of the unexpected benefits of being between gigs is that I’m going to be able to attend all of Cloud Slam 09.
A number of friends – Werner, Rob, and Hal, for example – are going to be speaking, and it looks like an interesting agenda. And in these cost-conscious times, a virtual conference is the way to go. Nevertheless, five days in front of my computer from 8am to 7pm (and that’s EDT, or GMT-4, so it’ll be 5am onwards here in Seattle) feels a lot like work! I must be sure to stock up on my personal fuel.

Programming your PDP-11

Here’s a wonderful intro to a collection of videos and instructional materials from the 1970s on how to program and operate a DEC PDP-11. First, toggle in your loader, then boot from paper tape…. Very cool, in a retro way. I did a lot of PDP-11 stuff back in the mid-70s.
[The videos are accessible on YouTube; the website with the other materials appears to have been brought to its knees as a result of being mentioned in Boing Boing…]

40 years ago, more or less: my first application

As the calendar clicks around, I’m reminded of an odd anniversary. Roughly 40 years ago – maybe late 1968, perhaps early 1969 – I wrote my first serious piece of software: a real application, used by real people, and constructed as part of my paid employment. I thought it might be worth revisiting that event.
The first thing you have to understand is that I’d had no computer-related education at all. The closest I came at the Royal Grammar School, High Wycombe, was an after-school seminar in the School Library, when somebody delivered a talk on computers. I’ve forgotten the content of the presentation completely; I only remember that the speaker passed around a core memory module for us to look at. (Hands up those who don’t know what “core memory” is, or how it works.) In the spring of 1968 I applied to Essex University to read Economics, and that summer I took GCE A Levels in Economics, Maths (A+S), and Physics. However I had already decided that it would be useful to spend what is now termed a “gap year” before going to university, in order to get some experience of the real world. Fortune (or nepotism) was in my favor, and I was accepted at the UKAEA Harwell to spend a year as a “Mathematics Assistant”.
I started in September 1968, and lived in a hostel (a barracks, really) in Abingdon. I was working for the Programmes Analysis Unit (PAU), a group that was trying to understand the economic impact of government-sponsored research and development initiatives. We were interested in how quickly innovation spread through a marketplace, and what the return on investment looked like. I was the only assistant in a team of a couple of dozen eminent scientists and economists. They understood the policy issues, and most understood the mathematics. The challenge was gathering the data and interpreting it.
I started out on issues related to ROI. The models typically involved calculating the year-by-year impact of an investment, with each annual contribution discounted due to monetary deflation and substitution. I worked up a family of models of increasing complexity; for each one, I planned to accumulate the discounted annual contributions until the marginal return was less than some epsilon. But how to run them?
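The post doesn’t give the actual formulas, but the shape of that accumulate-until-the-marginal-contribution-is-below-epsilon loop is easy to sketch in modern terms (invented numbers and names, and Python rather than anything period-appropriate):

```python
def discounted_roi(annual_returns, discount_rate, eps=1e-6):
    """Accumulate year-by-year contributions, discounting each year's
    return, and stop once the marginal discounted contribution drops
    below eps. `annual_returns` may be any iterable, even an infinite one."""
    total = 0.0
    for year, r in enumerate(annual_returns, start=1):
        contribution = r / (1.0 + discount_rate) ** year
        if contribution < eps:
            break
        total += contribution
    return total

# Invented example: a constant annual benefit of 10,000, discounted at 8%.
def constant_benefit(value):
    while True:
        yield value

print(discounted_roi(constant_benefit(10_000), 0.08, eps=1.0))
```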
I was put in charge of the department’s Wang Programmable Calculator. The programming model was similar to that of more recent programmable calculators from TI and HP. The program memory essentially stored keystrokes, which were executed just as if you’d pressed them. Keystroke steps were numbered, and there were conditional and unconditional branch operations. For the Wang, the “program memory” was a pre-scored card, from which “chads” were punched out with a stylus; the card was then “read” in a device that looked like a small toaster. The output display used Nixie tubes.
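For anyone who hasn’t met this style of machine, here’s a toy illustration of the stored-keystroke model – not an emulation of the actual Wang instruction set, just the shape of the idea: a numbered list of keystrokes executed in sequence, with conditional and unconditional branches by step number.

```python
def run_keystrokes(program, accumulator=0.0):
    """Execute a numbered list of (operation, argument) keystrokes.
    Branch operations work by resetting the step counter."""
    step = 0
    while step < len(program):
        op, arg = program[step]
        if op == "add":
            accumulator += arg
        elif op == "mul":
            accumulator *= arg
        elif op == "jump":              # unconditional branch
            step = arg
            continue
        elif op == "jump_if_positive":  # conditional branch
            if accumulator > 0:
                step = arg
                continue
        elif op == "halt":
            break
        step += 1
    return accumulator

# Count down from 5 by repeatedly adding -1 until the accumulator is no longer positive.
print(run_keystrokes([("add", 5), ("add", -1), ("jump_if_positive", 1), ("halt", None)]))
```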
I programmed up my first model. It ran to completion in 5 minutes. My “second order” model took 30 minutes to finish. The “third order” model ran for four hours. When the “fourth order” model had not converged after an overnight run, I knew that I needed some better technology. My team leader, a physicist who had never recovered from the fleshpots of Cairo during the 8th Army campaign of 1942, directed me to the computing centre. There a rather startled young man with a huge red beard thrust a copy of “McCracken on Fortran” into my hand, created an account for me on the IBM 360/65, and showed me where the card punches were. Two days later, I’d completed all of the ROI calculations, and I was hooked.
In those first programs I used the 360 as a glorified version of the Wang calculator. I didn’t have to manage data sets, or design complex algorithms, or do anything for output beyond printing a single number. But the next job was different. Several PAU teams were interested in how technologies were taken up by a marketplace, and then (as now) it was assumed that adoption tended to follow an S-curve. Today, curve-fitting is a standard feature of every maths library, but in 1968 we were making it up as we went along. Furthermore we weren’t simply throwing a best-fit curve through a bunch of points: we had a number of exogenous constraints that we had to respect.
One of my colleagues came up with a nice set of linear transformations for the primary equations (Sigmoid and Gompertz), which meant that I could vary one parameter (usually the asymptote, which was constrained anyway) and use a linear fit to generate the other values. I demonstrated experimentally that graphing the residual errors against the asymptotes had a single minimum, so I was able to use a simple bisection approach to find the best fit. Some of the data sets were too big to fit in memory, so I added a buffered input reader to stream the data from the disk (or was it a drum?).
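That trick is worth spelling out: fix the asymptote, linearize the curve, solve the resulting straight-line fit in closed form, then search over the asymptote for the smallest residual. Here’s a minimal sketch for the logistic (sigmoid) case only – the function names, the invented data, and the interval-narrowing search are my own illustration of the approach, not the original Fortran.

```python
import math

def logistic_fit_fixed_L(ts, ys, L):
    """With the asymptote L fixed, z = ln(y / (L - y)) is linear in t,
    so ordinary least squares gives the other two parameters directly."""
    zs = [math.log(y / (L - y)) for y in ys]   # requires L > max(ys)
    n = len(ts)
    mt, mz = sum(ts) / n, sum(zs) / n
    b = (sum((t - mt) * (z - mz) for t, z in zip(ts, zs))
         / sum((t - mt) ** 2 for t in ts))
    a = mz - b * mt
    return a, b

def residual(ts, ys, L):
    """Sum of squared errors of the fitted curve against the data."""
    a, b = logistic_fit_fixed_L(ts, ys, L)
    return sum((y - L / (1 + math.exp(-(a + b * t)))) ** 2
               for t, y in zip(ts, ys))

def best_asymptote(ts, ys, lo, hi, tol=1e-6):
    """Residual-vs-asymptote is assumed to have a single minimum, so an
    interval-narrowing search (in the spirit of the bisection mentioned
    above) homes in on the best L. Requires lo > max(ys)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if residual(ts, ys, m1) < residual(ts, ys, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Invented data roughly following a logistic curve with asymptote near 100.
ts = list(range(10))
ys = [100 / (1 + math.exp(-(t - 4))) + 0.5 for t in ts]
L = best_asymptote(ts, ys, lo=max(ys) + 1e-3, hi=500.0)
print(L, logistic_fit_fixed_L(ts, ys, L))
```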
My first version of the program simply output the parameters of the S-curve and the residual errors. This was OK for the mathematicians, but unsatisfactory for the policy wonks. I made friends with the red-bearded guy in the computer centre (who would later be my lecturer at Essex University!), and discovered that the IBM 360/65 was equipped for COM, or Computer Output on Microfilm. I cut-and-pasted some code from the COM system documentation, and augmented my application with full graphical output, showing the original data points (or bucketed samples thereof) and the various S-curves that corresponded to the different constraints.
By this point, I was more or less lost to the PAU. While I kept doing minor tasks for them, I spent 80% of my time in the computer centre, and by the time I left in June, 1969, I was helping teams from all over Harwell with their applications. I’d also moved on from punched cards to a teletype-based RJE system, which was only one step away from being a real interactive system. (For that, I had to wait until I encountered the PDP-10 in 1970.)
Meanwhile my application was used for a number of years. When I returned to a different branch of Harwell in the summer of 1971, I was asked by my old team to make several small enhancements. Naturally, I looked at the code I had written, and was mortified at how primitive it was. But it was my first, and self-taught to boot, so I cut myself some slack and fixed it.

AWS and Ruby

The book of the moment is James Murty’s “Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB”. It’s a really nicely-written introduction and tutorial for our utility computing services, with plenty of sample code that just works. Highly recommended.
Murty chose to write his examples in Ruby, which pushed one of my buttons. I have a love-hate relationship with Ruby, and it’s getting to the point where I’d love to find an alternative (and that doesn’t include PHP or Python, or even Groovy). On the one hand, Ruby offers Smalltalk with instant gratification. On the other, we have a syntax replete with ad hoc short-cuts, looping constructs with inconsistent scope rules, and ASCII rather than UTF-8.
My friend Jon Irving agreed:

Hahah, yes – I love it, although the things which I love are the things that make it horrific for any large app. Re-opening class defs, awesome, except when you’re trying to find where a method is defined. Monkey-patching, awesome, except when you suddenly find that for *no perceptible reason* a core API has been changed by some library you’re using.

And rails, oh rails. What is this “thread safety” of which you speak? Srsly, it’s like it’s 1995 all over again. But much prettier, and this time smalltalk won!

Writing Ruby is great fun; reading someone else’s Ruby application (particularly anything substantial) is deeply frustrating. In other words, Ruby is a candidate write-only language. And that’s a shame.

The truth about Linux

Jaron Lanier nails it:

Some of the youngest, brightest minds have been trapped in a 1970s intellectual framework because they are hypnotized into accepting old software designs as if they were facts of nature. Linux is a superbly polished copy of an antique, shinier than the original, perhaps, but still defined by it.

However the prevailing cult of OSS is so dominant that even the most obviously proprietary projects have to pretend to be open source. (The fact that all of the individuals with “commit” privileges happen to work for a single company is purely coincidental.) And try telling any OSS enthusiast that they ought to be “open” to a world with multiple open source operating systems…
Anyway, by picking out the most provocative paragraph, I’m doing an injustice to Jaron. It really is an interesting piece, especially what it has to say about the importance of speciation. Check it out.