Penguin

March 15, 2010

Spammed - bacula-dir.conf

I’ve been getting spammed from a bacula server in Romania. I finally got hold of the SysAdmin, who had copied a conf file from my Wiki that had my own e-mail address in it… Nice to know my conf files work… not nice getting spammed from a backup server in Romania. YES, I’ve removed my email address from the bacula-dir.conf file on my Wiki.

/Roger Sinel

March 10, 2010

orj

Lately I’ve been playing with Microsoft’s Azure service.  Tonight in particular I was attempting to use the Table Storage service.  Table Storage is a simple REST-based object persistence system.  Microsoft have wrapped this in the ADO.NET Data Services API.  So it looks fairly full featured.  However it is not.  At almost every turn I have ended up bashing my head against a Table Storage limitation.  Debugging these problems has been a bit of a nightmare.

The things I have learned are as follows.

Development Table Storage is Arse

The local Development Table Storage service (built on top of SQL Server Express) has limitations and idiosyncrasies that the full cloud hosted service does not, as outlined by Microsoft.

In particular the fact that “in development storage, querying on a property that does not exist in the table returns an error” caused me a bit of a problem.  When my table was empty I could not execute simple queries such as:

var q = from v in context.CreateQuery<Vehicle>(VehicleTableName) where v.Id == id select v;

Doing so with an empty table would generate cryptic and unhelpful exceptions with messages along the lines of “one of the request inputs is not valid”.  With some furious googling I discovered that, with the development table storage service, one has to incant the following on service start up to ensure that the table storage knows about the structure of your objects.

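// Run a query so development storage touches the table at start-up.
var query = from x in context.CreateQuery<Vehicle>(VehicleTableName) select x;
var l = query.ToList();
// Insert a dummy entity, then delete it again, to populate the schema.
var v = new Vehicle();
context.AddVehicle(v);
context.SaveChanges();
context.DeleteObject(v);
context.SaveChanges();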

This insert/delete voodoo ensures that the SchemaXml column of the TableContainer table for my “Vehicles” table is populated with the appropriate XML definition of my Vehicle class. You have to do this for each of your tables/classes every time you start up your service.  This is idiotic to say the least.

You Can’t Store Classes with Decimal Members.

It took me a while, after many more “one of the request inputs is not valid” style exceptions to figure out that my Vehicle class was being rejected because it had a property, Price, of type Decimal.  That type is not supported by Table Storage.  I don’t think this is documented anywhere.

DateTimes Must Be UTC.

After yet more “one of the request inputs is not valid” exceptions, I guessed why the following was failing.

var q = context.CreateQuery<Vehicle>(VehicleTableName)
.Where(v => v.PartitionKey == Vehicle.Partition && v.ExpiryDate >= DateTime.Now.Date);

I needed to add the magic UTC characters so it read as follows.

var q = context.CreateQuery<Vehicle>(VehicleTableName)
.Where(v => v.PartitionKey == Vehicle.Partition && v.ExpiryDate >= DateTime.UtcNow.Date);
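For readers outside .NET, the same local-versus-UTC distinction can be sketched in Python. This is only an illustration of the idea behind DateTime.UtcNow.Date, not anything from the Azure API; the helper name utc_midnight and the sample date are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def utc_midnight(moment: datetime) -> datetime:
    """Convert an aware datetime to UTC, then truncate it to midnight.

    Hypothetical helper, roughly analogous to DateTime.UtcNow.Date.
    """
    as_utc = moment.astimezone(timezone.utc)
    return as_utc.replace(hour=0, minute=0, second=0, microsecond=0)


# 23:30 on 10 March in a UTC-8 zone is already 11 March in UTC,
# so the local date and the UTC date disagree.
local = datetime(2010, 3, 10, 23, 30, tzinfo=timezone(timedelta(hours=-8)))
print(local.date(), utc_midnight(local).date())
```

The point is simply that a date computed from local time can be a day off from the UTC date the service expects.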

So my journey so far into Azure’s data storage APIs has been somewhat less enjoyable than I had otherwise hoped. I just hope my luck improves as I delve deeper into its mysteries.


March 07, 2010

Driftwood in the sea of concepts

Rick Jelliffe:

If they are good ideas they will surface sooner or later. It is not that good ideas always come out on top (ie., that the status quo necessarily reflects the best) but I do think that good ideas are buoyant: they will keep trying to surface.

This is a well-put observation, and it sends my mind in so many directions. If we think of the parallel between memes and genes – is this one shared, too, does an analogous phenomenon exist in biological systems? Does this mean that these ideas are inherent or implicated somehow in the structure of the memetic genome? Does that mean they are reflective of the structure of a universal language of some form? And is that in turn merely reflective of our biophysical structure (the way our brains, our metabolism, our genes work) – or might it even imply a form of Platonic idealism?

My head is spinning.

March 06, 2010

March 02, 2010

Glasshouses and control?

I was just reading this article here about why Sun died in the opinion of Jeremy Allison and was quite amazed to be honest.

I have to agree with him that OpenOffice, Solaris etc weren't handled well and this contributed to their downfall. I'm not sure whether it was failure on the hardware side, services attach or the software that did them more damage but I agree with his points.

Most of all I'm amazed because Jeremy is working for Google. Google seem to be doing exactly the same thing with the Android platform. Parts of it are open, but only the parts they choose, and they release them when they feel like it. The same thing occurs with Chrome, ChromeOS etc., not to mention that so many of their services run on unreleased code.

You can see some independent discussion about this here in case you think I'm biased because I work for the Symbian Foundation. (Of course all this is my own opinion, not my employer's... blah blah blah). Plenty more opinions like this abound on the web as well but I'm too lazy to post them here.

What astounds me though is that Jeremy posts this, while working for Google. Is he getting annoyed with Google and telling them to be more open, or does Google just allow free thinkers? He hasn't made the connection to what is going on inside Google but you don't have to be a genius to link the two together...

We live in interesting times!


February 26, 2010

C code in Iron Man

I was re-watching Iron Man recently and noticed something interesting.  During Iron Man’s first “boot up sequence”, in the “terrorist” caves of Nowhereistan, some butchered C code is displayed on a faked up laptop screen.

C source code from Iron Man Movie

The code displayed on screen, although missing some syntactically important characters such as semi-colons, is actual valid C source code.  So valid in fact that I wondered where it came from.

After a quick Google I found it. This code is in fact as follows:

    send[0] = 0x65;
    send[1] = 1;
    send[2] = 3;
    send[3] = 5;
    send[4] = 7;
    send[5] = 11;

    if (rcx_sendrecv(fd, send, 6, recv, 1, 50, RETRIES, use_comp) != 1) {
	fprintf(stderr, "%s: delete firmware failed\n", progname);
	exit(1);
    }

    /* Start firmware download */
    send[0] = 0x75;
    send[1] = (start >> 0) & 0xff;
    send[2] = (start >> 8) & 0xff;
    send[3] = (cksum >> 0) & 0xff;
    send[4] = (cksum >> 8) & 0xff;
    send[5] = 0;

    if (rcx_sendrecv(fd, send, 6, recv, 2, 50, RETRIES, use_comp) != 2) {
	fprintf(stderr, "%s: start firmware download failed\n", progname);
	exit(1);
    }

    /* Transfer data */
    addr = 0;
    index = 1;
    for (addr = 0, index = 1; addr < len; addr += size, index++) {

The code above comes from a firmware downloader for the RCX (a programmable, microcontroller-based Lego brick), written in 1998 at Stanford University by Kekoa Proudfoot. You can get the full source file here and it is distributed under the Mozilla Public License.  This is the same license used by Firefox and many other Open Source software products.
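The byte shuffling in the second packet is easier to see in isolation. Here is a minimal Python sketch of how that six-byte "start firmware download" message is laid out, going only by the C fragment above. The function name and the sample values are mine, chosen for illustration, and nothing here talks to a real RCX.

```python
def start_download_packet(start: int, cksum: int) -> bytes:
    """Build the 6-byte message from the C snippet above.

    Opcode 0x75, then start and cksum as 16-bit values, low byte first,
    then a trailing zero byte.
    """
    return bytes([
        0x75,
        (start >> 0) & 0xFF,   # low byte of start
        (start >> 8) & 0xFF,   # high byte of start
        (cksum >> 0) & 0xFF,   # low byte of cksum
        (cksum >> 8) & 0xFF,   # high byte of cksum
        0,
    ])


# Sample values chosen purely for illustration.
packet = start_download_packet(start=0x8000, cksum=0x1234)
print(packet.hex())
```

Shifting by 0 and 8 and masking with 0xff is just a way of splitting each 16-bit number into two bytes, least significant first.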

The sequence in the film in which this code appears suggests that the code is either being downloaded as firmware to the Iron Man suit or being used to upload firmware to an RCX Lego brick that is somehow involved in the operation of Iron Man.

So it appears that Iron Man is either powered by Open Source software or made of Lego.  I’m not sure which is cooler.


February 17, 2010

Fedora 13 - Automatic Print Driver Installation

What a great feature! This will really improve the Linux desktop user experience, and it equals or surpasses what proprietary operating systems offer in this area. Go Fedora, you rock!! Keep out there on the bleeding edge, keep pushing the envelope and making things happen fast and furious. We love it.

“Automatic Print Driver Installation”

/Roger

February 15, 2010

LUG

Whilst I was home in New Zealand on holiday I took some time to go to a local LUG fixit meeting. We were about 10 Linux geeks in a church hall talking about FOSS and helping each other out with diverse problems. It was great to meet some like-minded people whilst on holiday and get a little insight into the FOSS community in my old home town.
So thanks to WLUG for making me feel welcome.
Now it’s back to work as usual.

Next week there is a meeting within the Stockholm FOSS community that has accumulated quite a lot of interest. http://foss-sthlm.haxx.se/

/Roger

February 08, 2010

orj

Today, while doing some .Net development I noticed something about MSBuild that is really annoying.  I’ve seen this before but didn’t know the reason behind it.  Now I do and it is a little frustrating.

Suppose you have assembly A that depends on assembly B, which depends on assembly C, and assembly C exists in the GAC (Global Assembly Cache).  When assembly B is copied into assembly A’s “bin” directory during build, assembly C is not copied, even if you specify that assembly C is a “private” (copy local) dependency of assembly B.

The problem is that by default MSBuild’s ResolveAssemblyReference task will not copy referenced assemblies that are in the GAC to the build output directory unless the referenced assembly is specifically tagged as “private” by the build project being built.

So even though assembly A doesn’t directly reference assembly C in its code (it only uses C through the intermediate dependency B), you still need to add C to A’s referenced assemblies set for C to be copied into the build output of A.

There is a discussion of this issue here, on Microsoft’s developer forums.


February 04, 2010

Repairing broken documents that mix UTF-8 and ISO-8859-1

A perpetual (if thankfully not too frequent) problem on the web is documents claiming to be encoded in either UTF-8 or ISO-8859-1, but containing characters encoded according to the respective other charset. Such documents will display incorrectly, regardless of which way you look at them. Worse, if the document in question is XML (such as, say, a newsfeed) and claims to be encoded in UTF-8, upset ensues that leads the XML parser to halt and catch fire as soon as it encounters the first invalid byte.

How does it know? It does because UTF-8 has a very specific way of encoding non-ASCII characters. Encoding non-ASCII characters according to ISO-8859-1 violates this scheme, so their presence is detectable with a very high degree of confidence.
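To make the detection concrete, here is a small Python sketch (not part of the script discussed in this post) that reports where a byte string first stops being valid UTF-8. The function name is hypothetical; the only library behaviour relied on is that bytes.decode raises UnicodeDecodeError with a start offset.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


good = "café".encode("utf-8")       # é becomes the two bytes C3 A9
bad = "café".encode("iso-8859-1")   # é becomes the single byte E9

print(first_invalid_utf8_offset(good))
print(first_invalid_utf8_offset(bad))
```

The Latin-1 byte E9 looks like the start of a multi-byte UTF-8 sequence, but the continuation bytes it promises never arrive, which is what gives it away.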

Of course, this can just as soon be used to good advantage. If you start with the working assumption that the primary encoding of a confusedly encoded document is UTF-8, and merely decode and re-encode the byte stream, you can salvage misencoded data by catching any character decoding errors and decoding the offending invalid bytes as ISO-8859-1.

Here’s a Perl script, cleverly called repair-utf8, which implements this approach:

#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( decode FB_QUIET );

binmode STDIN, ':bytes';
binmode STDOUT, ':encoding(UTF-8)';

my $out;

while ( <> ) {
    $out = '';
    while ( length ) {
        # consume input string up to the first UTF-8 decode error
        $out .= decode( "utf-8", $_, FB_QUIET );
        # consume one character; all octets are valid Latin-1
        $out .= decode( "iso-8859-1", substr( $_, 0, 1 ), FB_QUIET ) if length;
    }
    print $out;
}

The only non-obvious bit to be aware of here is that when using the FB_QUIET fallback mode, Encode will remove any successfully processed data from the input buffer. The entire script revolves around this behaviour. After the first decode, $_ will be empty if it was successfully decoded. If not, the successfully decoded part at the start of $_ will be returned, and $_ will be truncated from the front up to the offending byte. The second decode is then free to process that. The inner loop will keep running as long as any undecoded input is left, decoding it, if need be, one byte at a time as ISO-8859-1.
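For comparison, here is a rough Python rendering of the same decode-then-fall-back loop. It is a sketch of the approach rather than a port of the script: Python has no FB_QUIET, so this version leans on the start offset of UnicodeDecodeError instead, and the function name repair_utf8 is simply borrowed from the script's name.

```python
def repair_utf8(data: bytes) -> str:
    """Decode as UTF-8, treating each undecodable byte as ISO-8859-1."""
    out = []
    pos = 0
    while pos < len(data):
        try:
            # Everything that is left decodes cleanly: done.
            out.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice passed to decode().
            bad = pos + err.start
            # Keep the part that decoded successfully...
            out.append(data[pos:bad].decode("utf-8"))
            # ...then consume one byte; every octet is valid Latin-1.
            out.append(data[bad:bad + 1].decode("iso-8859-1"))
            pos = bad + 1
    return "".join(out)


# One correctly encoded é followed by a Latin-1 é and some ASCII.
mixed = "é".encode("utf-8") + "é".encode("iso-8859-1") + b"abc"
print(repair_utf8(mixed))
```

As in the Perl version, the offending input is consumed one byte at a time, so a run of several Latin-1 characters simply triggers the fallback several times.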

See also Sam Ruby’s just posted clean_utf8_for_xml.c.

January 30, 2010

iSingularity? (take 2)

Yesterday I wondered:

Where are the people who worry about [the iPad] being the future of computing?

Turns out I was just too impatient by a few hours. Alex Payne voiced the same thought and Steven Frank wrote along very similar lines, though in some fundamental ways I disagree with him. Adam Pash wrote a decent piece at Lifehacker, though I think the issue is better covered by David Megginson’s rather wider concern. But the piece that I was looking for, basically word for word, was Mark Pilgrim’s take.

The reason I disagree with Steven is that I think the Old World/New World dichotomy is a red herring. The only real difference between these is what UI metaphor is predominant in each and what supporting concepts are exposed to the user. Steven talks about “Old Worlders” expecting windows, menus and toolbars and other complexity that presumably corresponds to power. But as an Old Worlder, that’s the least of my worries. In my opinion, compared to home computers, personal computers already present huge barriers to tinkering – but merely de facto, due to the sheer complexity of modern systems.

Let me walk down memory lane. I grew up on PCs, not home computers, myself. I boggle in retrospect at how many stumbling blocks the Microsoft ecosystem and culture forced me to overcome. People who grew up on either home computers or Unices had an order of magnitude easier a time to get into computing. If I’d been someone not so doggedly curious, that differential could easily have been enough to keep me away. Things haven’t gotten better since, and meanwhile the complexity of modern computers has only increased. But the defining situation for children and teenagers is that they have no money but an infinite supply of time. In the Microsoft ecosystem, those were largely fungible – and so I overcame.

On the iPad? Not a chance. The iPad’s answer to the problems of personal computers is to simplify the UI – which is good. But the complexity under the hood isn’t even a concern. And that’s because it legislates a barrier to entry for tinkerers. No one can do anything with it that Apple does not approve of – in Adam Pash’s words, Apple’s gotten into the habit of acting like you’re renting hardware. Now, you can tinker – but you need a Mac and an iPhone dev licence: a large wad of cold hard cash, exactly what children and teenagers don’t have. (Some of them will have parents who understand why this is a good idea and can provide the spare cash. I was out of luck on both counts.) On the iPad, the barrier to entry is so ridiculously high that I would not have been able to surmount it.

In contrast to Steven’s thesis, I posit that the iPad represents no trend reversal, but rather is poised to be the bend in the hockey stick shape of a curve we have been riding for a long time – as Robert Young points out:

When IBM created the Personal Computer in 1981, it predicted 2,500/year in sales. They based this estimate on a specified use case: users (assumed to be engineers, scientists, etc.) would write programs for their own use, and run same on their Personal Computer. To that end, IBM made available 3 operating systems from which the user could choose the one to his liking: CPM/86, UCSD P-System, PC-DOS. It was envisioned to be a mainframe on a desk.

And so it was until… Lotus released 1-2-3, and only for PC-DOS. At that point the light bulb went off around 128 and the Valley: what IBM had created was an Office Stove, a device for which the User DIDN’T write the programs to be run, but which could bake all sorts of delight food stuffs. That IBM didn’t restrict the BIOS and didn’t secure the OS’s made Billy Boy rich. And quite a few programmers.

The iPad is just the most extreme extension of this paradigm: it’s an appliance, but significantly less open to the gaggle of Cooks in the wild. Users, in Steverino’s mind, couldn’t care less whether the Cooks are indentured servants to Apple. They don’t even care that they are locked-in to Apple. They just know that the tarts taste good.

The iPad is not a revolution. It is right in line with where we have been going for decades. If it represents anything fundamental, that is the courage to throw out an ill-fitted UI metaphor to better serve this direction.

But how would the fundamental experience of the device suffer if Apple shipped a dev environment with the iPad, just like one used to be part of every home computer (incl. the Apple II)? Is that really an inconceivable proposition? Or heck, it could be a $20 download on the App Store for all I care. That’s no hurdle for a teenager, not even a big one for a preteen. Why must the iPad require a dev licence and a Mac to write code for? (Obviously: because that makes Apple a lot of money.)

The current personal computer is a bad paradigm. What I was hoping for was a move toward things like Alan Kay’s visions – a simplification of programming to the point where everyone (especially kids) can do it so easily, at least for simple tasks, that it becomes routine. The iPad is the direct opposite of that.

The irony in all this is that for all of how much Adobe Flash gets lambasted in the Apple sphere (and make no mistake, I am not enamoured with Flash on any level), it let Joe Gregorio’s 13-year-old create his first game, one of subsequently many others. And a successful iPad would close even this unsatisfying avenue.

Is the future we’re getting the one we really want?

January 29, 2010

iSingularity?

The reactions to the iPad I have seen so far almost exclusively fall in two camps: people who think it’s lame (even though we know how that worked out for the iPod and iPhone nay-sayers) and people who rave about it as the future of computing.

Where are the people who worry about it being the future of computing? I have seen writing along these lines from Rafe Colburn so far, but precious little else. And I feel my worry is more fundamental than he expressed.

No home computer generation is going to grow from the iPad. And if it is the future, possibly ever after. The revolution devours its own children.

January 25, 2010

In praise of Git’s index

I still run into people lambasting Git for the concept of the index from time to time. It seems strange and superfluous to users of other VCSs – like a speed bump that serves no purpose. Why not just commit the changes in the working copy? This perception is understandable; when I first heard of Git, back as a Subversion user, I was one of these people.

How times and minds change. Today, I use it and rely on it so much that I can’t imagine moving to any other VCS that doesn’t have this concept. (And none of the contemporary contenders do.) Because of this, I keep responding to such criticism, repeating myself. I figured I should put my explanation down somewhere where I can point people to.

So what is the index good for?

The key to understanding it is how it interacts with git diff. Once you add something to the index (also referred to as staging it), it disappears off the diff. You can pass --cached to see what changes you have staged, but by default, it doesn’t show you the changes that you have asserted are ready for commit. When I first read about this, it sounded outright stupid to me. Why would anyone want that?

Turns out: because it is hugely helpful. Consider: when a merge fails, the successfully merged diff hunks are staged, but conflicted hunks are not – which means that git diff will show only conflicts, and the successfully performed part of the merge doesn’t cloud the diff. Furthermore, the way to mark files with conflicts as merged is to stage them after manual resolution, which makes them too disappear from the diff. Maybe this is why Linus introduced the concept in the first place, being that the main part of his job is to perform merges all day long. But that’s far from the only circumstance in which the index has been useful to me.

The essence, already apparent in the above description but applicable much more widely than just during merges (which I don’t do a whole lot of, all things considered), is that the index introduces the idea of a known good part of a commit under construction.

Often, when I set out to make some self-contained change to the code, I don’t know up front the detailed approach of how I’ll go about it. I may also end up making incidental other changes – a small improvement to a utility library, a fix for a tiny bug I noticed while tooling about in its vicinity, stuff like that. As well, I sometimes end up changing directions a few times for some aspect or other of the change that I was originally planning to make.

Having the index available to me, I just keep working on things for however long I need to arrive at a clear picture, without worrying about commits. Afterwards, I start by reviewing the diff to see how to break down the work into chunks that will best make sense to whoever might read the patches later. Then I use git add --patch to gradually untangle changes from each other into separate logical steps. This command will even let you edit diff hunks for extra control, which I occasionally make use of to pull apart changes from multiple logical steps that ended up affecting the same line(s).

I’d say I end up making about 3-and-change commits on average out of non-trivial work units, along with a varying number of assorted one-liner commits that may get shuffled onto other branches. Yet I am free to get there any way I shall, rather than being forced to painstakingly plan out the minutiæ of the work ahead of time. I keep harping on this, but it really matters to me. I love how much Git goes out of its way to get out of mine in this regard.

NB: if you work this way, it means that when time comes to commit, you are making up commits that reflect states of the source code which never existed on disk before. So you don’t actually know whether the commit you are about to make is any good – a syntax error might have slipped in, say. Again, git has just the ticket: it’s called git stash --keep-index. This will stash the changes you see using git diff, but not the ones you have staged, so it will leave the code on disk exactly the same as the index. Use this just before you commit, to run your tests. After committing, you apply the stashed changes back into the working copy using git stash pop, as always, and continue where you left off.

In this kind of workflow, the meaning of git diff becomes “work I haven’t reviewed yet” or “work I don’t want to commit yet” and git diff --cached becomes “work I have vetted for inclusion in the next commit”. The index is what makes this possible.
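If it helps to see the three states side by side, here is a deliberately tiny Python toy model. It is not how Git is implemented, and all the names in it are invented for the illustration: each state is just a dict from path to content, and the two diff commands are comparisons between neighbouring states.

```python
def changed_paths(old: dict, new: dict) -> list:
    """Paths whose content differs between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


head = {"app.py": "v1", "util.py": "v1"}      # last commit
index = dict(head)                            # staging area starts equal to HEAD
worktree = {"app.py": "v2", "util.py": "v2"}  # files on disk, both edited

# Roughly "git add util.py": copy the on-disk content into the index.
index["util.py"] = worktree["util.py"]

unstaged = changed_paths(index, worktree)  # what "git diff" would list
staged = changed_paths(head, index)        # what "git diff --cached" would list
print(unstaged, staged)

# Roughly "git stash --keep-index": disk now matches the index exactly.
stash = worktree
worktree = dict(index)
print(changed_paths(index, worktree))
```

Once a path is staged it drops out of the first comparison and shows up in the second, which is the whole trick.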

I don’t know how I ever worked another way.

January 07, 2010

WS-REST 2010: Call for Papers

The First International Workshop on RESTful Design (WS-REST 2010) aims to provide a forum for discussion and dissemination of research on the emerging resource-oriented style of Web service design.

Background

Over the past few years, several discussions between advocates of the two major architectural styles for designing and implementing Web services (the RPC/ESB-oriented approach and the resource-oriented approach) have been mainly held outside of the research and academic community, within dedicated mailing lists, forums and practitioner communities. The RESTful approach to Web services has also received a significant amount of attention from industry as indicated by the numerous technical books being published on the topic.

This first edition of WS-REST, co-located with the WWW2010 conference, aims at providing an academic forum for discussing current emerging research topics centered around the application of REST, as well as advanced application scenarios for building large scale distributed systems.

In addition to presentations on novel applications of RESTful Web services technologies, the workshop program will also include discussions on the limits of the applicability of the REST architectural style, as well as recent advances in research that aim at tackling new problems that may require extending the basic REST architectural style. The organizers are seeking novel and original, high quality paper submissions on research contributions focusing on the following topics:

  • Applications of the REST architectural style to novel domains
  • Design Patterns and Anti-Patterns for RESTful services
  • RESTful service composition
  • Inverted REST (REST for push events)
  • Integration of Pub/Sub with REST
  • Performance and QoS Evaluations of RESTful services
  • REST compliant transaction models
  • Mashups
  • Frameworks and toolkits for RESTful service implementations
  • Frameworks and toolkits for RESTful service consumption
  • Modeling RESTful services
  • Resource Design and Granularity
  • Evolution of RESTful services
  • Versioning and Extension of REST APIs
  • HTTP extensions and replacements
  • REST compliant protocols beyond HTTP
  • Multi-Protocol REST (REST architectures across protocols)

All workshop papers are peer-reviewed and accepted papers will be published as part of the ACM Digital Library. Two kinds of contributions are sought: short position papers (not to exceed 4 pages in ACM style format) describing particular challenges or experiences relevant to the scope of the workshop, and full research papers (not to exceed 8 pages in the ACM style format) describing novel solutions to relevant problems. Technology demonstrations are particularly welcome, and we encourage authors to focus on "lessons learned" rather than describing an implementation.

Papers must be submitted electronically in PDF format. Submit at the WS-REST 2010 EasyChair installation.

Easychair page: http://www.easychair.org/conferences/?conf=wsrest2010

Important Dates

  • Submission deadline: February 8, 2010, 23.59 Hawaii time
  • Notification of acceptance: March 1, 2010
  • Camera-ready versions of accepted papers: March 14, 2010
  • WS-REST 2010 Workshop: April 26, 2010

Contact

WS-REST Web site: http://ws-rest.org/
WS-REST Email: chairs@ws-rest.org

January 02, 2010

On narratives in introspection

I have noticed that when people try to figure out the reason for some behaviour, particularly one of their own, they will often ask themselves what the cause is, come up with a list of candidates, and then try to find the principal factor in this list.

Most of the time when I see such a list, it seems to me that the answer is really “all of the above”. The answer is not to be found in any one of the items – it is to be found in the relationships and the forces between all of them. So it occurred to me that instead of coming up with a list of candidate reasons by asking which one it could be, a more productive approach would be to simply try to think of all plausible reasons, and then try to analyse them in relation to each other.

People don’t do this naturally. A major factor is probably that doing so requires higher ambiguity tolerance than working out an intuitively attractive narrative does.

December 02, 2009

Idle design musing from last night

Where less is more,
too little can be too much.

December 01, 2009

What is the “controller” in a web app?

Rafe Colburn:

People know what a “view” is (the “v” in MVC), but trying to get people to understand the difference between a “model” and a “controller” can be difficult. […] You can always yell at people and tell them not to put business logic in the view layer, but creating a mental model that explains why that’s a bad idea might help.

I generally think of it as business logic, HTTP translation layer, and templates, rather than as model, controller and view, respectively.

That there is the controller’s job: to munge the HTTP request into calls and parameters in terms of the business logic (the model), then take the results and turn them back into an HTTP response, for which it generally enlists the aid of a template subsystem, which is useful because the production of different responses typically involves shared functionality.
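A minimal Python sketch of that split, using only the standard library and no web framework. Everything in it (the tax calculation, the parameter names, the template text) is invented for illustration; the point is only which layer knows about HTTP and which does not.

```python
from string import Template
from typing import Tuple
from urllib.parse import parse_qs


# Model: business logic. Knows nothing about HTTP or HTML.
def price_with_tax(net_cents: int, rate_percent: int) -> int:
    return net_cents + net_cents * rate_percent // 100


# View: a template. Knows nothing about where the numbers come from.
PAGE = Template("Total: $total cents")


# Controller: translates the request into model calls and back again.
def handle(query_string: str) -> Tuple[int, str]:
    params = parse_qs(query_string)
    try:
        net = int(params["net"][0])
        rate = int(params.get("rate", ["15"])[0])
    except (KeyError, ValueError):
        return 400, "Bad request"
    total = price_with_tax(net, rate)
    return 200, PAGE.substitute(total=total)


print(handle("net=1000&rate=15"))
print(handle("rate=abc"))
```

Note that price_with_tax could be called from a command-line tool or a test without change, which is a decent smell test for whether logic has leaked into the controller.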

I find this way of thinking about the layers makes it easy for me personally to make the right distinctions between the parts of the job that belong to the controller vs the model. Maybe it will also work for others.

November 24, 2009

My new Nokia E72 phone / comparison to E71

Well I’ve had my Nokia E72 for about a week now, after lending it to a colleague for a few days who helped fix a few issues (thanks Shaun).

I’m very impressed with the phone overall. It is a faster, improved version of the E71 (or will be once all software bugs are gone).

Things I like about it are:

  • speed. Much faster e.g. takes about half the time to load Opera than before
  • optical navikey – very nice when it works
  • camera. It is actually usable and better than my N78 or E71. E71 on original firmware was a joke and newer firmware was a little better. It has a decent flash and can cope with low light fairly well
  • the new improved email client. (In theory, that is; in practice it has let me down)
  • Being able to group your connections together and WLAN taking precedence over 3G. Let’s hope more software supports it soon.
  • USB charge

Things that are still a bit problematic about it:

  • The video doesn’t always cope on recording at highest CPU setting. Stutters every 2 seconds or so. Fixed by going down a notch
  • Not linking in comprehensively to Google Apps with Mail for Exchange. Can do calendar and contacts but not email. Other people in forums can do email but not calendar and contacts. I have been talking to Nokia Messaging team and they are working on the issue
  • Push email ignores your connection settings and will keep on popping up WLANs available. If you select cancel it won’t sync your email.
  • So I’ve changed to IMAP and that works, until it doesn’t! Sometimes you need to exit email, other times you need to reboot the phone before it starts working again..
  • Can’t default to HTML email. Have to click and then it opens in browser. The Nokia Messaging client that you can download for E71 can and also fixes some of the other email so I suspect the version in firmware is a bit old. Luckily I live in a country where the leading telcos upgrade firmware quickly. Not :-(
  • The Google Mail app doesn’t work on the E72. Downloaded it, went to install and it just disappears in a virtual puff of smoke. Google Maps works fine but no compass support. Tried Nokia Maps and it is meant to have compass support but didn’t appear for me and now says I need a license.
  • Optical navikey is not supported properly in all apps and you need to click on the edges which are impossibly hard – need about 20kg of pressure on the edge! Also some applications (Opera Mini) scroll very slowly with it. Thankfully the 2 and 8 keys page up/down in it.

I know many of the syncing things can be fixed by products like Dataviz and Goosync but I’d like what I have paid for to work! I’m hopeful it will get fixed in newer firmware which I can hack onto the device.

Overall a very good device and the remaining little irritations can be fixed in software I think.

NB This post does not represent the opinion of my employer or any of its members.


November 20, 2009

Hat-in-rubber-gloved-hand time

Me and my 'mo

Point, chuckle, and then donate.  (Hey, I've got 11 days to go.)

This is my every-three-yearly charity drive.  Last time I tried it, my boss made a sizeable donation on the condition I never do it again.  (Sorry, Andrew.)  Two countries and three years later, I'm at it again - and you should make a donation to prostate cancer research in my honour, because otherwise we just go on not talking about it until Rubber Glove time.

November 17, 2009

Fedora 12 released today

The day has come around again. The day when my favourite operating system releases its new version. Check it out.

Release Notes, Feature List

Download Fedora

/Roger Sinel
Fedora Ambassador

November 10, 2009

How do I change the DHCP subnet for NAT on VMware Fusion 3.0?

There are a couple of helpful blog posts (Nilesh Kapadia and Max Newell deserve a shout-out here) which help you with changing the DHCP settings given to your NAT or host networks on VMware Fusion. However, it all changes in 3.0.

The file you now need to edit is /Library/Application Support/VMware Fusion/networking. In there, you will find these lines:

answer VNET_8_HOSTONLY_SUBNET 192.168.93.0
answer VNET_8_VIRTUAL_ADAPTER_ADDR 192.168.93.1

I believe the third octet (the 93 part) is selected randomly when you install; in any case, I wanted to give out addresses on 192.168.227.0/24, so I changed the configuration like so:

answer VNET_8_HOSTONLY_SUBNET 192.168.227.0
answer VNET_8_VIRTUAL_ADAPTER_ADDR 192.168.227.1

and restarted the network interfaces:

sudo "/Library/Application Support/VMware Fusion/boot.sh" --restart

Now, make a note of the MAC address of your virtual network adapter in your guest OS, and you can assign an entry in the dhcpd.conf file (/Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf).  Make sure you do it outside of the area that is marked "this will be overwritten"!

host developer-vm {
    hardware ethernet 00:0c:29:cb:dd:72;
    fixed-address 192.168.227.128;
}

and another service restart.

November 08, 2009

ePubs and quality

You may have heard news about the release of "bookserver" by the good folks at the Internet Archive. This is a DRM-free ePub ecosystem, initially stocked with the prodigious output of Google's book scanning project and the Internet Archive's own book scanning project.

To see how the NZETC stacked up against the much larger (and better funded) collection I picked one of our Maori Language dictionaries. Our Maori and Pacifica dictionaries month-after-month make up the bulk of our top five most used resources, so they're in-demand resources. They're also an appropriate choice because when they were encoded by the NZETC into TEI, the decision was made not to use full dictionary encoding, but a cheaper/easier tradeoff which didn't capture the linguistic semantics of the underlying entries, but treated them as typeset text. I was interested in how well this tradeoff was wearing.

I did my comparison using the new Firefox ePub plugin; things will be slightly different if you're reading these ePubs on an iPhone or Kindle.

The ePub I looked at was A Dictionary of the Maori Language by Herbert W. Williams. The NZETC has the 1957 sixth edition. There are two versions of the work on bookserver: an 1852 second edition scanned by Google Books (original at the New York Public Library) and an 1871 third edition scanned by the Internet Archive in association with Microsoft (original in the University of California library system). All the processing of both works appears to have been done in the U.S. The original print used macrons (NZETC), acutes (Google) and breves (Internet Archive) to mark long vowels. Find them here.


Let's take a look at some entries from each, starting at 'kapukapu':


NZETC:

kapukapu. 1. n. Sole of the foot.

2. Apparently a synonym for kaunoti, the firestick which was kept steady with the foot. Tena ka riro, i runga i nga hanga a Taikomako, i te kapukapu, i te kaunoti (M. 351).

3. v.i. Curl (as a wave). Ka kapukapu mai te ngaru.

4. Gush.

5. Gleam, glisten. Katahi ki te huka o Huiarau, kapukapu ana tera.

Kapua, n. 1. Cloud, bank of clouds. E tutakitaki ana nga kapua o te rangi, kei runga te Mangoroa e kopae pu ana (P.).

2. A flinty stone. = kapuarangi.

3. Polyprion oxygeneios, a fish. = hapuku.

4. An edible species of fungus.

5. Part of the titi pattern of tattooing.

Kapuarangi, n. A variety of matā, or cutting stone, of inferior quality. = kapua, 2.

Kāpuhi, kāpuhipuhi, n. Cluster of branches at the top of a tree.

Kāpui, v.t. 1. Gather up in a bunch. Ka kapuitia nga rau o te kiekie, ka herea.

2. Lace up or draw in the mouth of a bag.

3. Earth up crops, or cover up embers with ashes to keep them alight.

kāpuipui, v.t. Gather up litter, etc.

Kāpuka, n. Griselinia littoralis, a tree. = papauma.

Kapukiore, n. Coprosma australis, a shrub. = kanono.

Kāpuku = kōpuku, n. Gunwale.



Google Books:

Kapukapu, s. Sole of the foot,

Eldpukdpu, v. To curl* as a

wave.

Ka kapukapu mai te ngaru; The wave curls over.

Kapunga, v. To take up with both hands held together,

Kapungatia he kai i te omu; Take up food from the oven.

(B. C,

Kapura, s. Fire, -' Tahuna he kapura ; Kindle a fire.

Kapurangi, s. Rubbish; weeds,

Kara, s. An old man,

Tena korua ko kara ? How are you and the old man ?

Kara, s> Basaltic stone.

He kara te kamaka nei; This stone is kara.

Karaha, s. A calabash. ♦Kardhi, *. Glass,



Internet Archive:

kapukapu, n. sole of the foot.

kapukapu, v. i. 1. curl (as a wave). Ka kapukapu mai te ngaru. 2. gush.

kakapii, small basket for cooked food.

Kapua, n. cloud; hank of clouds,

Kapunga, n. palm of the hand.

kapunga, \. t. take up in both hands together.

Kapiira, n. fire.

Kapiiranga, n. handful.

kapuranga, v. t. take up by hand-fuls. Kapurangatia nga otaota na e ia. v. i. dawn. Ka kapuranga te ata.

Kapur&ngi, n. rubbish; uveds.

I. K&r&, n. old man. Tena korua ko kara.

II. K&r&, n. secret plan; conspiracy. Kei te whakatakoto kara mo Te Horo kia patua.

k&k&r&, D. scent; smell.

k&k&r&, a. savoury; odoriferous.

k^ar&, n. a shell-iish.


Unlike the other two, the NZETC version has accents, bold and italics in the right place. It's the only one with a workable and useful table of contents. It is also the edition which has been extensively revised and expanded. Google's second edition has many character errors, while the Internet Archive's third edition has many 'á' mis-recognised as '&.' The Google and Internet Archive versions are also available as PDFs, but of course, without fancy tables of contents these PDFs are pretty challenging to navigate and because they're built from page images, they're huge.

It's tempting to say that the NZETC version is better than either of the others, and from a naïve point of view it is, but it's more accurate to say that it's different. It's a digitised version of a book revised more than a hundred years after the 1852 second edition scanned by Google Books. People who're interested in the history of the language are likely to pick the 1852 edition over the 1957 edition nine times out of ten.

Technical work is currently underway to enable third parties like the Internet Archive's bookserver to more easily redistribute our ePubs. For some semi-arcane reasons it's linked to upcoming new search functionality.

What LibraryThing metadata can the NZETC reasonably stuff inside its CC'd epubs?

This is the second blog post following on from an excellent talk about LibraryThing by LibraryThing's Tim, given at VUW in Wellington after his trip to LIANZA.

The NZETC publishes all of its works as epubs (a file format primarily aimed at mobile devices), which are literally processed crawls of its website bundled with some metadata. For some of the NZETC works (such as Erewhon and The Life of Captain James Cook), LibraryThing has a lot more metadata than the NZETC, because many LibraryThing users have the works and have entered metadata for them. Bundling as much metadata as possible into the epubs makes sense, because these are commonly designed for offline use---call-back hooks are unlikely to be available.

So what kinds of data am I interested in?
1) Traditional bibliographic metadata. Both LT and NZETC have this down really well.
2) Images. LT has many many cover images, NZETC has images of plates from inside many works too.
3) Unique identification (ISBNs, ISSNs, work ids, etc). LT does very well at this, NZETC very poorly.
4) Genre and style information. LT has tags to do fancy statistical analysis on, and does. NZETC has full text to do fancy statistical analysis on, but doesn't.
5) Intra-document links. LT has work as the smallest unit. NZETC reproduces original document tables of contents and indexes, cross references and annotations.
6) Inter-document links. LT has none. NZETC captures both 'mentions' and 'cites' relationships between documents.

Most current-generation ebook readers, of course, can do nothing with most of this metadata, but I'm looking forward to the day when we have full-fledged OpenURL resolvers which can do interesting things, primarily picking the best copy (most local / highest quality / most appropriate format / cheapest) of a work to display to a user; and browsing works by genre (LibraryThing does genre very well, via tags).

October 30, 2009

Nokia N97 firmware upgrade (and update for E71)

I don’t have a Nokia N97 but people are asking me about flashing their phones and I’m just collecting a few links etc here for them. This is getting asked more often now version 2/20 is out.

Basically the procedure is the same as in my earlier article here, except that doing the upgrade over the air through the menus seems to be the recommended procedure.

As usual the UK is slow at getting updates. For the N97 you can’t set to generic UK product code yet as no carrier has stepped up to the plate (yet). To see valid codes look at this All About Symbian article.

The other thing you need to do after the upgrade is to send the settings for Internet and MMS from your carrier. For example Vodafone UK N97 is here and E71 is here.

And remember, backup, backup, backup. I would recommend getting data out in a format other than Nokia backup if you can as an extra precaution.


October 16, 2009

Interlinking of collections: the quest continues

After an excellent talk today about LibraryThing by LibraryThing's Tim, I got enthused to see how LibraryThing stacks up against other libraries for having matches in its authority control system for entities we (the NZETC) care about.
The answer is averagely.
For copies of printed books less than a hundred years old (or reprinted in the last hundred years), and their authors, LibraryThing seems to do very well. These are the books likely to be in active circulation in personal libraries, so it stands to reason that these would be well covered.
I tried half a dozen books from our Nineteenth-Century Novels Collection, and most were missing; Erewhon, of course, was well represented. LibraryThing doesn't have the "Treaty of Waitangi" (a set of manuscripts) but it does have "Facsimiles of the Treaty of Waitangi." It's not clear to me whether these would be merged under their cataloguing rules.
Coverage of non-core bibliographic entities was lacking. Places get a little odd. Sydney is "http://www.librarything.com/place/Sydney,%20New%20South%20Wales,%20Australia" but Wellington is "http://www.librarything.com/place/Wellington" and Anzac Cove appears to be missing altogether. This doesn't seem like a sane authority control system for places, as far as I can see. People who are the subjects rather than the authors of books didn't come out so well. I couldn't find Abel Janszoon Tasman, Pōtatau Te Wherowhero or Charles Frederick Goldie, all of whom are near and dear to our hearts.

Here is the spreadsheet of how different web-enabled systems map entities we care about.

Correction: It seems that the correct URL for Wellington is http://www.librarything.com/place/Wellington,%20New%20Zealand which brings sanity back.
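
The place URLs above look like a place name percent-encoded onto a fixed prefix. Here is a small sketch of that pattern in Python. It is inferred only from the three URLs quoted in this post, not from any LibraryThing documentation, and `place_url` is a hypothetical helper name.

```python
from urllib.parse import quote

BASE = "http://www.librarything.com/place/"


def place_url(place_name: str) -> str:
    """Build a place URL in the shape seen in the examples above."""
    # Keep commas literal and encode spaces as %20, matching those URLs.
    return BASE + quote(place_name, safe=",")


print(place_url("Wellington, New Zealand"))
print(place_url("Sydney, New South Wales, Australia"))
```

If the pattern holds, a bare "Wellington" and the qualified "Wellington, New Zealand" are simply two different keys, which would explain the confusion.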

October 11, 2009

Open Source Inside

Following on from my introductory post I thought I’d talk a little bit more about how our IT is run.

As a young organisation we have had the chance to build our IT infrastructure from scratch and my predecessor Suran Naidoo (interim Head of IT) did a sterling job with his team to pull it together.

The tenets that were set down for the infrastructure were that we didn’t need to invest a large amount of capital, that it was software as a service (SaaS) or platform as a service and that we favoured open source wherever possible.

So the infrastructure that we have built is largely outsourced to a range of partners. The majority of servers are RedHat Enterprise Linux or Ubuntu Server with a few Windows servers thrown in for internal purposes. These servers are virtualised around VMWare and we have dipped our toe in the water a little with AWS (Amazon Web Services).

The groupware/collaboration is largely handled by Google Apps – we use this for email and some document collaboration.

We use a range of SaaS services – SAP Business By Design, Salesforce, Spigit, Wordpress.

Our websites are largely built around the LAMP stack (Linux, Apache, MySQL, PHP) along with a range of other products such as Bugzilla, Mailman, MediaWiki and soon Drupal.

On the desktop side we use Microsoft Windows XP/Vista and Mac OS X with a smattering of Ubuntu. At present we largely use Microsoft Office, and the majority of people browse with Firefox or Safari but Chrome is coming up fast also.

It was a bit of a challenge pulling this all together – especially in a short time frame. We also found some open source software and SaaS platforms weren’t quite as ready as they claimed to be and had to change course. We didn’t quite achieve our goal of no servers on site, but we do have a lot fewer than any other site that I have worked on.

Stay tuned for future episodes where I’ll talk about my thoughts on the cloud, choosing open source vs closed source, how do you live open source when you are doing SaaS.


October 07, 2009

T minus 5 days

Five days from now we'll be in the air, on the way to Chicago (which is actually in exactly the wrong direction), and then onto London.

(Did I tell you we were moving to London?  Oh well, now you know!)

Before then, we have to:

  • Sell our remaining stuff
  • Have people who have bought stuff and not collected it, collect it
  • Put boxed stuff on a boat
  • Visit our "Canadian family" for Thanksgiving
  • Fill in the holes in the walls, even though we didn't put them there
  • Lots of cleaning.

Last Saturday we saw Russell Peters - perhaps not the "world's greatest comedian", as the intro announcer suggested, but definitely a very funny guy.  Half his act is race jokes (Peters is Indian, which pretty much lets him riff on whatever racial group or stereotype he wants), and the other half is embarrassing the front row, especially couples of mixed ethnicity.

Sunday we had a lovely meal with Fern's old workmates from Manulife Financial, including no less than three different types of dessert.  I baked brownies.  By which, I mean "Cindy gave us brownie mix a year and a half ago; we bought a brownie pan six months later, and since we're leaving in a week, we should use both".  There is only so much you can get wrong in "mix water, oil, an egg, and this packet of powder"; I think it would have been nicer with standard vegetable oil instead of extra-virgin olive oil.

Talking of stand-up comedy, "extra-virgin" is something George Carlin would have had at:

That's another complaint of mine - too much use of this prefix "pre". It's all over the language now — "pre"-this, "pre"-that, place the turkey in a "pre-heated" oven. It's ridiculous! There are only two states an oven can possibly exist in: Heated or unheated! "Pre-heated" is a meaningless fucking term! It's like "pre-recorded" — "This program was pre-recorded." Well, of course it was pre-recorded! When else are you gonna record it, afterwards?

Perl is Unix

Ryan Tomayko asks, so I deliver. Much like Jacob Kaplan-Moss did, I copied the comments as closely as they made sense, with adaptations. I skipped the child SIGINT handler from Ryan’s code since its behaviour is default in Perl. I took some further licence in using a tiny module that I maintain, which doesn’t ship with Perl: Proc::Fork, a forking DSL that I find makes intent much clearer than the standard C-ish fork(2) idiom (which Perl provides verbatim).

#!/usr/bin/perl
use 5.010;
use strict;

# simple preforking echo server in Perl
use Proc::Fork;
use IO::Socket::INET;

sub strip { s/\A\s+//, s/\s+\z// for my @r = @_; @r }

# Create a socket, bind it to localhost:4242, and start listening.
# Runs once in the parent; all forked children inherit the socket's
# file descriptor.
my $acceptor = IO::Socket::INET->new(
    LocalPort => 4242,
    Reuse     => 1,
    Listen    => 10,
) or die "Couldn't start server: $!\n";

# Close the socket when we exit the parent or any child process. This
# only closes the file descriptor in the calling process, it does not
# take the socket out of the listening state (until the last fd is
# closed).
END { $acceptor->close }

# Fork you some child processes. The code after the run_fork block runs
# in all processes, but because the child block ends in an exit call, only
# the parent executes the rest of the program. If a parent block were
# specified here, it would be invoked in the parent only, and passed the
# PID of the child process.
for ( 1 .. 3 ) {
    run_fork { child {
        while (1) {
            my $socket = $acceptor->accept;
            $socket->printflush( "child $$ echo> " );
            my $message = $socket->getline;
            $socket->print( $message );
            $socket->close;
            say "child $$ echo'd: '${\strip $message}'";
        }
        exit;
    } }
}

# Trap (Ctrl-C) interrupts, write a note, and exit immediately
# in parent. This trap is not inherited by the forks because it
# runs after forking has commenced.
$SIG{ 'INT' } = sub { print "bailing\n"; exit };

# Sit back and wait for all child processes to exit.
1 while 0 < waitpid -1, 0;
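
For comparison, the "standard C-ish fork(2) idiom" that Proc::Fork wraps looks roughly like this in Python, which exposes the same system call as `os.fork`. This is a sketch of the idiom only, not a port of the echo server above. The helper name `run_in_child` is made up for illustration, and it assumes a Unix-like system.

```python
import os


def run_in_child(work):
    """Fork once, run work() in the child, and hand its output to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Do the work, then exit so the child
        # never falls through into the parent's code below.
        os.close(read_fd)
        try:
            os.write(write_fd, work().encode())
        finally:
            os._exit(0)
    # Parent: fork returned the child's PID.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    os.waitpid(pid, 0)  # reap the child, like the waitpid loop above
    return pid, output


if __name__ == "__main__":
    child_pid, text = run_in_child(lambda: f"hello from child {os.getpid()}")
    print(child_pid, text)
```

The branch on the return value of `fork` is the part that a `child { ... }` block makes explicit.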

September 23, 2009

Software Freedom Day 2009 - success

Well it turned out to be a very successful day. We handed out lots of CDs and flyers and books. And we were 20 FOSS professionals and students that turned out on a sunny September day to promote our favourite software model.
Photos

The After SFD meeting was also a well attended event. There were 17 registered but 28 turned up. So there were interesting FOSS Lightning Talks and a discussion between 6 different FOSS societies about how to work closer together in the future.
After all this we went to a local pub for a mingle and dinner.

All in all it was a great day and lots of FOSS chat.

/Roger

September 19, 2009

eBook readers need OpenURL resolvers

Everyone's talking about the next generation of eBook readers having a larger reading area, more battery life and a more readable screen. I'd give up all of those, however, for an eBook reader that had an internal OpenURL resolver.

OpenURL is the nifty protocol that libraries use to find the closest copy of an electronic resource and direct patrons to copies that the library might have already licensed from commercial parties. It's all about finding the version of a resource that is most accessible to the user, dynamically.

Say I've loaded 500 eBooks into my eBook reader: a couple of encyclopedias and dictionaries; a stack of books I was meant to read in school but only skimmed and have been meaning to get back to; current block-busters; guidebooks to the half-dozen countries I'm planning on visiting over the next couple of years; classics I've always meant to read (Tolstoy, Chaucer, Cervantes, Plato, Descartes, Nietzsche); and local writers (Baxter, Duff, Ihimaera, Hulme, ...). My eBooks by Nietzsche are going to refer to books by Descartes and Plato; my eBooks by Descartes are going to refer to books by Plato; my encyclopaedias are going to refer to pretty much everything; most of the works in translation are going to contain terms which I'm going to need help with (help which the encyclopedias and dictionaries can provide).

Ask yourself, though, whether you'd want to flick between works on the current generation of readers---very painful, since these devices are designed not for efficient navigation between eBooks but for linear reading of them. You can't follow links between them, of course, because on current systems links must point either within the same eBook or out on to the internet---pointing to other eBooks on the same device is verboten. OpenURL can solve this by catching those URLs and making them point to local copies of works (and thus available for free even when the internet is unavailable) where possible while still retaining their

Until eBook readers have a mechanism like this eBooks will be at most a replacement only for paperback novels---not personal libraries.
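
To make the "catching" idea concrete, here is a toy sketch in Python of a local-first lookup. Real OpenURL carries far richer descriptions of a work than a single key; the `LocalResolver` class, the identifiers and the file paths are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalResolver:
    """Maps work identifiers to eBook files already on the device."""

    holdings: dict = field(default_factory=dict)  # identifier -> local path

    def add(self, identifier: str, path: str) -> None:
        self.holdings[identifier] = path

    def resolve(self, identifier: str, remote_url: str) -> str:
        # Prefer the copy on the device; otherwise keep the link's
        # original target so it still works when online.
        return self.holdings.get(identifier, remote_url)


resolver = LocalResolver()
resolver.add("example:plato-republic", "/books/plato-republic.epub")

# A link inside another eBook that points at a work we already hold:
print(resolver.resolve("example:plato-republic", "http://example.org/republic"))
# A link to a work we do not hold falls back to the internet:
print(resolver.resolve("example:unknown-work", "http://example.org/unknown"))
```

The hard part on a real device is not this lookup but agreeing on identifiers that different publishers' eBooks would share.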

September 17, 2009

IT at the Symbian Foundation

I’m Ian McDonald and I’m Head of IT at the Symbian Foundation. I come from a strong open source background and have used open source software since the 1980s.


I started using open source software doing my undergraduate degree at the University of Waikato. I used to download software from around the world and soon got told off for using too much of New Zealand’s bandwidth (It was 2.4Kbits/sec for the whole country!!).

I’ve deployed open source software at a number of large corporates including building work management software using largely open source tools at NZ’s largest telco in the 1990s.

I also served on the committee, and was then president, of WLUG, which was one of New Zealand’s strongest open source societies (despite the name it was far more than Linux).

Personally I’ve also got code into projects such as ttcp, iperf and my largest contribution is into the Linux kernel for a new networking protocol DCCP. Hopefully I’ll also start working on the Symbian platform as well in the not too distant future!

At the Symbian Foundation we use a lot of open source software internally and I’m looking to increase this further. In a number of posts coming up I’ll outline how we use open source and what are the challenges for running IT on open source.

(Also published here on the Symbian blog)


September 16, 2009

Thoughts on koha


The Koha community is currently undergoing a spasm, with a company apparently forking the code.
As a result a bunch of people are looking at where the community should go from here and how it should be led. In particular the idea of a not-for-profit foundation has been floated and is to be discussed at a meeting early tomorrow morning.
My thoughts on this issue are pretty simple:
  • A not-for-profit is a fabulous idea
  • Reusing one of the existing software not-for-profits (Apache, Software in the Public Interest, etc) introduces a layer of non-library complexity. Libraries have a long history with consortia, but tend to very much flock together with their own kind; I can see them being leery of a non-library entity.
  • A clear description of a forward-looking plan written in plain language that everyone can understand is vital to communicate the vision of the community, particularly to those currently on the fringes

September 07, 2009

Software Freedom Day 2009 - podcast (Swedish)

A colleague of mine, Mathias Firman from the Swedish Linux Society, has just released an interesting podcast for Software Freedom Day 2009. It is a talk about the secret and exciting world of free software.
Worth a listen. http://torrent.arrakis.se/SoftwareFreedomDay/SoftwareFreedomDaySverige_Podcast1.ogg.torrent It is in the free format Ogg, so you need a compatible player to hear this important message. More info about how to play Ogg files here

/Roger Sinel

September 04, 2009

Gitalist

Dan Brook:

The idea behind this project is to move gitweb.cgi away from a single monolithic CGI script and into a modern Catalyst app. Fortunately this is not as daunting as it might seem at first as gitweb.cgi follows an MVC type structure. Once gitweb.cgi has been suitably Catalysed then it can move from being a “this was once gitweb.cgi” to a project of its own (hence the “transitional” in the description).

September 01, 2009

Data and data modelling and underlying assumptions

I feel that there was a huge disconnect between some groups of participants at #opengovt (http://groups.google.co.nz/group/nzopengovtbarcamp) in Wellington last weekend. This is my attempt to illuminate the gaps.

The gaps were about data and data modelling and underlying assumptions that the way one person / group / institution viewed a kind of data was the same as the way others viewed it.

This gap is probably most pronounced in geo-location.

There's a whole bunch of very bright people doing wonderful mashups in geo-location using a put-points-on-a-map model. Typically using google maps (or one of a small number of competitors) they give insights into all manner of things by throwing points onto maps, street views, etc, etc. It's a relatively new field and every time I look they seem to have a whizzy new toy. Whizzy thing of the day for me was http://groups.google.com/group/digitalnz/browse_thread/thread/b5b0c96ce08ca441 . Unfortunately the very success of the 'data as points' model encourages the view that location is a lat / long pair and the important metric is the number of significant digits in the lat / long.

In the GLAM (Galleries, Libraries, Archives and Museums) sector, we have a tradition of using thesauri such as the Getty Thesaurus of Geographic Names. Take a look at the entry for the Wellington region: http://www.getty.edu/vow/TGNFullDisplay?find=wellington&place=&nation=New+Zealand&prev_page=1&english=Y&subjectid=7000512

Yes, it has a lat and a long (with laughable precision), but the lat and long are arguably the least important information on the page. There's a faceted hierarchy, synonyms, linked references and type data. Te Papa have just moved to Getty for place names in their new site (http://collections.tepapa.govt.nz/) and frankly, I'm jealous. They paid a few thousand dollars for a licence to the thesaurus and it's a joy to use.

The idea of #opengovt is predicated on institutions and individuals speaking the same languages, being able to communicate effectively, and this is clearly a case where we're not. Learning to speak each other's languages seems like it's going to be key to this whole venture.

As something of a worked example, here's something that I'm working on at the moment. It's a page from The Manual of the New Zealand Flora by Thomas Frederick Cheeseman, a core text in New Zealand botany, see http://www.nzetc.org/tm/scholarly/tei-CheManu-t1-body1-d22-d5.html The text is live on our website, but it's not yet fully marked up. I've chosen it because it illustrates two separate kinds of languages and their disparities.

What are the geographic locations on that page?

* Nelson-Mountains flanking the Clarence Valley
* Marlborough—Kaikoura Mountains
* Canterbury—Kowai River
* Canterbury—Coleridge Pass
* Otago—Mount St. Bathan's

The qualifier "2000–5000 ft" (which I believe is an elevation range at which these flourish) applies across these. Clearly we're going to struggle to represent these with a finite number of lat/long points, no matter how accurate. In all likelihood, I'll not actually mark up these locations: because no one's working with complex locations, the cost-benefit isn't within sight of being worth it.
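
To make the mismatch concrete, here is one way those locations might be held as something richer than a point. This is a sketch only: the `Locality` class and its field names are invented for illustration and are not a TEI or Getty structure. The place names are the ones listed above, and the elevation range rests on the same guess about what "2000–5000 ft" means.

```python
from dataclasses import dataclass

FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


@dataclass(frozen=True)
class Locality:
    region: str            # e.g. a provincial district
    feature: str           # a range, river, pass or peak, not a point
    min_elevation_ft: int
    max_elevation_ft: int

    def elevation_range_m(self):
        """Return the elevation range in whole metres."""
        return (round(self.min_elevation_ft * FEET_TO_METRES),
                round(self.max_elevation_ft * FEET_TO_METRES))


localities = [
    Locality("Nelson", "Mountains flanking the Clarence Valley", 2000, 5000),
    Locality("Marlborough", "Kaikoura Mountains", 2000, 5000),
    Locality("Canterbury", "Kowai River", 2000, 5000),
    Locality("Canterbury", "Coleridge Pass", 2000, 5000),
    Locality("Otago", "Mount St. Bathan's", 2000, 5000),
]

for loc in localities:
    print(loc.region, "/", loc.feature, loc.elevation_range_m())
```

None of these records has a single lat/long that would do it justice, which is the whole problem.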

Te Papa and the NZETC have a small-scale binomial name exercise underway, and for that I'll be scripting the extraction of the following names from that page:

* Notospartium carmichœliœ (synonym Notospartium carmichaeliae)
* Notospartium torulosum

There were a bunch of folks at the #opengovt barcamp who're involved in the "New Zealand Organisms Register" (http://www.nzor.org.nz/) project. As I understand it, they want me to expose the following names from that page:

* Notospartium carmichœliœ, Hook. f.
* Notospartium torulosum, Hook. f.

Of course the name the public want is:

* New Zealand Pink Broom
* ? (Notospartium torulosum appears not to have a common name)

Note that none of these taxonomic names actually appear in full on the page...


This is, clearly, an area where the best can be the enemy of the good and vice versa, but the good needs to at least be aware of the best.

August 20, 2009

.xz used in Fedora 12 Alpha

I was just reading the Fedora 12 Alpha release notes and was interested to see that they have switched all the software packages in Fedora from Gzip to the more efficient XZ (LZMA) compression method.
This compression method is usually used in the kernel, so I just had to try it on my Fedora 11 workstation…. guess I’m a convert now…. there is no point in wasting space when there is space to save.


[root@greyf11 log]# ll -h messages_copy
-rw------- 1 root root 495K 2009-08-20 13:40 messages_copy

[root@greyf11 log]# xz -zv messages_copy
messages_copy (1/1)
100.0 % 27.8 KiB / 494.2 KiB = 0.056

[root@greyf11 log]# ll -h messages_copy.xz
-rw------- 1 root root 28K 2009-08-20 13:40 messages_copy.xz

[root@greyf11 log]# xz -dv messages_copy.xz
messages_copy.xz (1/1)
100.0 % 27.8 KiB / 494.2 KiB = 0.056
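
The same experiment can be repeated from Python's standard `lzma` module, which reads and writes the .xz container. This is a minimal sketch: the helper name `xz_ratio` and the sample log line are made up, and a repetitive sample like this will compress much better than a real log.

```python
import lzma


def xz_ratio(data: bytes) -> float:
    """Compress data as .xz and return compressed size / original size."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ)
    assert lzma.decompress(compressed) == data  # round-trip check
    return len(compressed) / len(data)


# Made-up, highly repetitive log-like input.
sample = b"Aug 20 13:40:01 host kernel: example message\n" * 10000
print(f"ratio: {xz_ratio(sample):.3f}")

# The ratio xz printed above is compressed size over original size:
print(round(27.8 / 494.2, 3))
```

So the 0.056 in the transcript means the compressed copy is under 6% of the original size.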

/Roger Sinel

August 17, 2009

This is very old

I don’t maintain this blog anymore, but I do maintain a new one at:

 http://plsadventures.blogspot.com

Go there instead.

August 15, 2009

Onion typography

Phillip Smith:

I wondered what would happen if these same patterns were applied to other Perl “brands” out there in the wild. I’m personally interested in those brands that newcomers to Perl would probably perceive as the (de-facto) “official” Perl community […] In the graphic design community, just like the Perl community, there are established conventions and patterns – think Perl Best Practices – to guide people’s work. […] Over the next few weeks, I plan to continue this journey to document some existing design patterns in the Perl community with the intention of sketching out the goal posts of a Perl graphic standards guide.

August 12, 2009

Wizards and Runes

Jonathan Hoefler:

John visited my studio, where I was working on a set of roman capitals that would ultimately become the Requiem typeface. He had some suggestions about the design, which like most critiques were especially hard to articulate; typography suffers from a poverty of terminology. Eyeing two bottles of Rich Art poster paint in my taboret, John reached for these along with a sheet of typing paper, and the cheap plastic paintbrush that I kept for dusting my keyboard. In a few effortless strokes of black, he perfectly reproduced Requiem’s capital S, waited a moment for the paint to dry, and then reloaded the brush with white to render his corrections. The whole shebang couldn’t have taken fifteen seconds, most of it spent waiting for paint to dry. I just stared: it was like watching someone fold a paper napkin into a remote control helicopter, and then pilot it around the room.

July 28, 2009

Software Freedom Day 2009

So this time has rolled around again, and it’s time to start organising this important event again.

http://forum.softwarefreedomday.se/viewtopic.php?f=4&t=104

http://softwarefreedomday.org/

/Roger Sinel

July 27, 2009

Learning XSLT 2.0 Part 1; Finding Names

We mark up a lot of names, so one of the first things I decided to do was to build an XSLT stylesheet that takes a list of names and tags those names when they occur in a separate XML file. To make things easier and clearer, I've ignored little things like namespaces, conformant TEI, etc, etc.



First up, the list of names; these are multi-word names. Notice the simple structure; this could easily be built from a comma-separated list or similar:



<?xml version="1.0" encoding="UTF-8"?>
<names>
<name>Papaver argemone</name>
<name>Papaver dubium</name>
<name>Papaver Rhceas</name>
<name>Zanthoxylum novæ-zealandiæ</name>
</names>

Next, some sample text:



<?xml version="1.0" encoding="UTF-8"?>
<doc>
<p> There are several names Papaver argemone in this document Papaver argemone</p>
<p> Some of them are the same as others (Papaver Rhceas Papaver rhceas P. rhceas)</p>
<p> Non ASCII characters shouldn't cause a problem in names like Zanthoxylum novæ-zealandiæ AKA Zanthoxylum novae-zealandiae</p>
</doc>

Finally the stylesheet. It consists of three parts: the regexp variable that builds a regexp from the names in the file; a default template for everything but text(); and a template for text() nodes that applies the regexp.



<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<!-- build a regexp of the names -->
<xsl:variable name="regexp">
<xsl:value-of select="concat('(',string-join(document('name-list.xml')//name/text(), '|'), ')')"/>
</xsl:variable>

<!-- generic copy-everything-except-texts template -->
<xsl:template match="@*|*|processing-instruction()|comment()">
<xsl:copy>
<xsl:apply-templates select="@*|*|processing-instruction()|comment()|text()"/>
</xsl:copy>
</xsl:template>

<!-- tag every occurrence of a listed name found in a text node -->
<xsl:template match="text()">
<xsl:analyze-string select="." regex="{$regexp}">
<xsl:matching-substring>
<name type="taxonomic" subtype="matched">
<xsl:value-of select="regex-group(1)"/>
</name>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>


The output looks like:



<?xml version="1.0" encoding="UTF-8"?><doc>
<p> There are several names <name type="taxonomic" subtype="matched">Papaver argemone</name> in this document <name type="taxonomic" subtype="matched">Papaver argemone</name></p>
<p> Some of them are the same as others (<name type="taxonomic" subtype="matched">Papaver Rhceas</name> Papaver rhceas P. rhceas)</p>
<p> Non ASCII characters shouldn't cause a problem in names like <name type="taxonomic" subtype="matched">Zanthoxylum novæ-zealandiæ</name> AKA Zanthoxylum novae-zealandiae</p>
</doc>

As you may notice, I've not yet worked out the best way to handle the 'æ': the ligature spelling is matched, but the 'ae' transliteration is not.
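One possible way to handle the 'æ' is sketched below, in Python rather than XSLT so that it can be run on its own. It builds the same kind of alternation regexp from the name list, but lets every 'æ' in a name also match the two letters 'ae'. The function names, the hard-coded name list and the longest-name-first ordering are assumptions made for this illustration and are not part of the stylesheet above.

```python
import re
from xml.sax.saxutils import escape

# Same names as in the sample name list above (hard-coded for the sketch).
NAMES = [
    "Papaver argemone",
    "Papaver dubium",
    "Papaver Rhceas",
    "Zanthoxylum novæ-zealandiæ",
]


def name_pattern(name):
    """Escape one name, letting each 'æ' also match the letters 'ae'."""
    parts = [re.escape(part) for part in name.split("æ")]
    return "(?:æ|ae)".join(parts)


def build_regexp(names):
    """Join all names into one alternation, like the regexp variable."""
    # Assumption: try longer names first so they win over shorter prefixes.
    ordered = sorted(names, key=len, reverse=True)
    return re.compile("(" + "|".join(name_pattern(n) for n in ordered) + ")")


def tag_names(text, regexp):
    """Wrap each matched name in a <name> element; escape everything else."""
    out = []
    last = 0
    for match in regexp.finditer(text):
        out.append(escape(text[last:match.start()]))
        out.append('<name type="taxonomic" subtype="matched">%s</name>'
                   % escape(match.group(1)))
        last = match.end()
    out.append(escape(text[last:]))
    return "".join(out)


if __name__ == "__main__":
    rx = build_regexp(NAMES)
    print(tag_names("AKA Zanthoxylum novae-zealandiae", rx))
```

Escaping each name before joining also guards against names that contain regexp metacharacters, which the plain string-join in the stylesheet does not do. Case variants such as "Papaver rhceas" and abbreviations such as "P. rhceas" are still left untagged.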

June 28, 2009

Political Compass

It’s been a while since I’ve taken any sort of quiz like this, so when David Farrar from Kiwiblog posted his results today it prompted me to give it another go.

My Political Views
I am a center-right moderate social libertarian
Right: 1.33, Libertarian: 1.97

Political Spectrum Quiz

I completed the quiz pretty quickly and felt the need to answer ‘it depends on the specifics’ to many of the questions, so take the results with a grain of salt. I think it is a reasonably accurate description of me though.

June 27, 2009

Three Evolutionary Stages of Version Control

Recently, an edit to the VersionControlSystem article on the WLUG wiki drew me in to expand on it. My own work on that edit ended up taking up 4 days (in spurts), in the course of which I would add relevant points, only to notice that the additions revealed some structure trying to emerge, which would then cause me to go back to rearrange the text, which in turn would remind me of more relevant points to add, several times over. It was much more work than I expected, and I was quite tired and wanting to be done by the end.

However, the effort ended up crystallising my conceptualisation of version control systems quite thoroughly. I have long been aware of all the individual points I wrote about, but it was only during this work that I came to understand their relations systematically. In order to make the effort expended more worthwhile, and because I do not know of any other article summarising the evolution in this manner (although of course I may well be ignorant), I thought I should give the article additional exposure by also posting it on my weblog.

Remember that getting from each step to the next in this sequence took a long time, both because it took time to realise that there was a problem in the first place, and because the respective right solution was not clear in foresight – trivially obvious as it may all seem when you see it laid out like this.

1+1: One Repository, One Working Copy

The design of the earliest systems revolved around versioning a single working copy, directly edited by all users. To prevent attempts at simultaneous modification of a single file, editing was not allowed without checking files out, which only one user at a time could do for any given file.

Having to give each user access to the same machine and file system in order to work on code was natural at the time these systems were designed, in the mainframe era, but today would obviously be a problem. Also, the requirement to check files out was a cause of friction even at the time, since everyone has to wait on one another – not to mention that someone might forget to check a file back in before leaving on vacation.

1+n: One Repository, Many Working Copies

The next evolutionary step was to decouple the repository from the working copy, so that there may then be many working copies. The exemplar in this class of systems, known as centralised VCSs, is CVS. It lifts the obvious restrictions of earlier systems with a design in which the repository is mediated by a server. Multiple users can collaborate by each checking out a private working copy of the project.

Note that in CVS, “checking out” no longer implies locking. (In other centralised VCSs, it may; eg. Visual SourceSafe. In some, such as Perforce, it is optional.) Checking in changes is simply blocked if someone else has already checked in other changes in the meantime. Before the latecomer is allowed to check in their own changes, they have to update their working copy with the upstream changes, resolving any conflicts manually.

This works reasonably well. CVS ended up as the de facto standard for a decade.
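The check-in rule described above can be sketched in a few lines of Python. This is a deliberately simplified model, not how CVS is implemented: it uses a single tree-wide revision number, whereas CVS tracks revisions per file, and every class and method name here is invented for the illustration.

```python
class OutOfDate(Exception):
    """Raised when a check-in is attempted from a stale working copy."""


class Repository:
    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self):
        # Checking out takes no lock; it only copies the current tree.
        return WorkingCopy(self, self.head, self.revisions[self.head])

    def commit(self, base_revision, files):
        if base_revision != self.head:
            raise OutOfDate("update your working copy first")
        self.revisions.append(dict(files))
        return self.head


class WorkingCopy:
    def __init__(self, repo, base, files):
        self.repo = repo
        self.base = base
        self.pristine = dict(files)  # files as of the base revision
        self.files = dict(files)     # locally edited files

    def update(self):
        """Pull upstream changes; return names needing manual resolution."""
        upstream = self.repo.revisions[self.repo.head]
        conflicts = []
        for name in set(upstream) | set(self.pristine):
            theirs = upstream.get(name)
            if theirs == self.pristine.get(name):
                continue  # unchanged upstream
            if self.files.get(name) == self.pristine.get(name):
                # Only upstream changed this file: take their version.
                if theirs is None:
                    self.files.pop(name, None)
                else:
                    self.files[name] = theirs
            elif self.files.get(name) != theirs:
                # Changed on both sides: the local version is kept and
                # the user is expected to resolve it by hand.
                conflicts.append(name)
        self.pristine = dict(upstream)
        self.base = self.repo.head
        return sorted(conflicts)

    def commit(self):
        self.base = self.repo.commit(self.base, self.files)
        self.pristine = dict(self.files)
        return self.base
```

In this model, two users can check out at the same time and both edit freely; whoever commits second gets `OutOfDate`, calls `update()`, resolves any reported conflicts, and then commits successfully.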

However, its single-repository nature, subsequently adopted by most following major systems, perpetuates problems harking back to the earlier model – and adds new ones:

  • Checking in changes under such a system requires a network connection, as do most operations related to the project history. Besides the fact that this makes offline work nearly impossible, it also imposes a major performance penalty, since networked operations are inescapably slow. Some systems, like Subversion, try to selectively speed up some of these operations by keeping more data in the working copy, but the benefit of this is uneven across operations. Further, high traffic repositories may require rather beefy servers and connections to sustain.

  • Anything checked in is always public; this means one has to be very careful about the state of commits. It also makes it impossible to touch up history (eg. to fix common mistakes like forgetting to include a new file in a commit). Branches become a big deal: all commits are publicly visible, no matter how experimental. Also, branch names are forced into a global namespace so a lot of thought has to be given to choosing them.

  • Branching is problematic for more reasons too. Most of these systems do not support branch merging very well: after you do it once, the changes from the merged-in branch are mixed in without any tracking, so later attempts to merge the same branch will result in lots of artificial conflicts. This makes it very difficult to keep branches in synch. But the longer branches go without merging, the more effort it takes to merge them. All this adds up to a large barrier, psychological and otherwise, against branching.

  • The single-repository nature means that anyone who wants the safety of revision control needs to have write access to the same repository. And since branching is badly supported, everyone with access to the repository is generally going to be working on the same trunk. This means write access has to be given out selectively, to competent people only, resulting in political headaches within projects, while outsiders are forced to create their patches in an unversioned ghetto.

n+n: Many Working Copies, Paired With Equally Many Repositories

The solution to all this was to not only give each collaborator a separate working copy, but a separate repository also. This class of system, whose pioneering solid implementation was BitKeeper, is known as distributed version control systems. The technical basis that allows this is algorithmic merging: 3-way merging (in the simplest case) allows combining non-overlapping changes automatically, and merge point tracking allows repeatedly merging branches without unnecessary conflicts.
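As a rough illustration of those two ingredients, here is a minimal Python sketch of a 3-way merge at whole-file granularity. Real systems merge hunks within files and find the common ancestor from the commit graph; here the ancestor is simply passed in, and the function name and the dictionary representation of a tree are assumptions made for the example.

```python
def three_way_merge(base, ours, theirs):
    """Merge two trees (dicts of file name -> content) against their
    common ancestor. Returns (merged_tree, conflicting_names)."""
    merged = {}
    conflicts = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o  # both sides agree (or both deleted the file)
        elif o == b:
            result = t  # only their side changed it
        elif t == b:
            result = o  # only our side changed it
        else:
            conflicts.append(name)
            result = o  # keep ours until someone resolves it by hand
        if result is not None:
            merged[name] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"a": "1", "b": "1"}
    ours = {"a": "2", "b": "1"}
    theirs = {"a": "1", "b": "3", "c": "new"}
    merged, conflicts = three_way_merge(base, ours, theirs)
    print(merged, conflicts)

    # Their branch moves on. With the merge point tracked, the previous
    # tip of their branch is the new common ancestor.
    theirs_later = dict(theirs, b="4")
    print(three_way_merge(theirs, merged, theirs_later))
    # Without tracking, the stale ancestor produces an artificial conflict.
    print(three_way_merge(base, merged, theirs_later))
```

The last two calls show why merge point tracking matters: the repeated merge is clean when the ancestor is the recorded merge point, and conflicts on file "b" when the original ancestor is reused.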

Since each collaborator has their own repository and can make commits, the effect is that everyone has their own private branch, with full versioning for local changes, and these branches can be published at the discretion of their author and can be merged by others easily. Actually, each collaborator often has several local branches – since merging is easy and branches never “need” be published, it is painless to create short-lived branches for experiments or tests, to use them as a general workflow aspect (eg. start a new branch for every separate bug fix), or for any other purpose, whether intended for public consumption or not.

Everyone has full offline access to the project history, and all repository operations (except pushing or pulling changes, obviously) take place at full local disk speed.

All this immensely accelerates collaborative development and removes the political headaches surrounding commit access.

June 26, 2009

GPG Keysigning Update

From the better late than never category… I finally got around to signing keys from the LCA2006 key signing party, the verification sheet from which has travelled with me from NZ to Dublin and then sat on my desk for a few years. I inevitably lost a few of my notes and verifications along the way, so if you were still expecting a signature from me and didn’t get one let me know!

The main hold up for me has been that my previous key signing system, a home grown script, was overly complex and involved me sending an encrypted token to each UID that I waited to receive back before issuing the signature. Lots of work for me, and much hassle for those whose keys I am signing. I’ve reverted to the more standard method of signing and encrypting the signature to each UID and then throwing my copy of the signature away. Unless the recipient controls the UID and can decrypt the message, the signature will never be released to the world.

I’ve adopted pius as my new signing tool of choice, with a few extra patches to help me maintain my database of signature details and the corresponding verification pages at http://www.mattb.net.nz/pgp/signatures which are linked from the Policy URL packet of each signature I make. I guess I’ll tidy up the patches over the next few days and see if there is any interest in getting them merged.

June 22, 2009

Scalping pt. 2

Old men make amusing post-punk rockers.

I see Green Day are touring NZ. The last time they were there was on the American Idiot tour, and it was a fantastic show. Go if you can; I am going to the Hamilton, ON show in three weeks.

The band and their Aus/NZ promoter (Frontier Touring Company) have announced anti-scalping measures. This is a topic of personal interest, so I'll mention them here:

  • you only get a receipt when you buy a ticket, and you don't get the ticket until 30 days before the date (so you can't sell on Trade Me, which requires you to have the ticket in hand to sell)
  • 300 GA tickets, a maximum of 2 per person, are available from the box office of the venue, in this case Vector Arena in Auckland, the day tickets go on sale.

Let's look at the second measure first. With a total capacity in the 12,000 range, Vector probably has a GA limit of at least 3,000. Therefore, this would imply that 10% of the GA audience - those lucky 150 people who work in downtown Auckland and can justify queuing for the morning - will get tickets this way.

Or, professional scalpers will pay a homeless guy $10 to queue and then take their place at 7.59am.

90% of the GA tickets, and presumably the total 11,700 other tickets, will go on the 'tubenet like always before.  Which leads back into the first measure.  Even if you're only sent a receipt, without Trade Me taking an active part, there will be auctions that read "$300 pen with free Green Day ticket receipt".  New Zealand is both blessed and cursed to only really have one public marketplace, and it's one that has expressed no interest in not taking its cut of the auction proceeds in the past.

Compare and contrast with what we have here: all GA tickets and the best seated tickets are pick-up on the night only, with the purchasing credit card.  If this were matched with a facility where you can return unwanted tickets to the retailer for a fair refund (minus handling perhaps), and have them invalidated, and made available to the pool again - yes, this means checking back later could actually have good reason! - I consider this the perfect solution.  Sure, there will be people who try and sell the invalidated tickets, but a number-checking web site could clear that up quickly.

Now, on-street scalping will never stop, but it's probably not a bad thing that it exists.  Fern and I went to see The Police on the strength of people standing outside who wanted to sell their tickets at face value (albeit a young couple, not the regular toothless hobo scalpers at the Air Canada Centre in Toronto). I even sold a ticket for R.E.M. in London to a scalper when we didn't have a fourth person who wanted to go.

A couple of weeks ago, tickets for Game 5 of the Stanley Cup, the NHL ice-hockey finals, were going on the street in host city Detroit for 1/3 face value, owing perhaps to the fantastic economic climate there.  Some friends of mine made the 3 hour drive down for Game 7, the final in the best-of-7, but by that point the scalpers had figured out which way up to hold their calculators and were charging between $500 and $2100.  The game was instead watched from the Windsor Casino.  You have to be prepared to walk away.

June 10, 2009

Fedora 11 released yesterday

Yesterday my favorite GNU/Linux distro released its latest version. For the first time in about 3 years I was not able to download and install on release day. This was due to us releasing a new major version of our software at work, and activities around this release took up most of my day. As soon as I’ve had time to get my copy of Fedora 11 installed I’ll blog about the experience with tips and advice as I usually do.
But anyway, thank you and congratulations to the Fedora Team on another bleeding-edge GNU/Linux version.

/Roger Sinel

Creating a DOS USB bootdisk under linux

Every now and then I need a DOS bootdisk to flash a BIOS or similar, and I only have linux with which to create it. I can never remember the quickest way to do this, so I’m documenting it here:

Lifted entirely from this webpage. I’m only archiving it here because content disappears over time.

I needed to upgrade the bios of my Computer (Intel).

But how to do it without windows?

In my case, Intel has many options for bios upgrading and one is the plain old DOS method. This is the best and fastest way to upgrade your bios with linux.

Create a FreeDOS based bootable usb-stick

* Download a FreeDOS image, I’ll use Balder for now.
* Prepare the usb-stick
    o check partition (e.g. cfdisk /dev/sda)
    o mkfs.msdos /dev/sda1

Commands

qemu -boot a -fda balder10.img -hda /dev/sda
A:\> sys c:
A:\> xcopy /E /N a: c:

Check with

qemu -hda /dev/sda

There are, of course, many ways to do this. With recent VirtualBox versions supporting USB passthrough, I could do it entirely from a Windows VM. Several other websites suggest installing grub onto the USB disk and having it boot a floppy disk image directly, which also seems like it would work. Your FAT-formatted USB drive would appear as C:, and you can just copy whatever content you like straight onto that.

June 06, 2009

Legal Māori Archive


Now that the Legal Māori Archive is live, I thought I'd highlight a couple of my favourite texts from the corpus.

The first is a great example of reinforcing cultural confusion. "The Laws of England, Compiled and translated into the Māori language" by Judge Francis Dart Fenton is a bilingual compendium of the laws of England, but extraordinarily uses bible quotes as examples.

The second example is actually a collection of texts, the works of Rev. Henry Hanson Turton, who compiled thousands of pages of land deeds and associated documents into six volumes. I can see these seeing a lot of use by Treaty researchers.

June 04, 2009

Why card-based records aren't good enough

Card catalogs have a long tradition in librarianship, dating back, I'm told, to the book stock-take in the French revolution. Librarians understand card catalogs in a deep way that comes from generations of librarians having used them as a core professional tool all their professional lives. Librarians understand card catalogs in ways that I, as a computer scientist, never will. I still recall that on one of my first visits to a university library, I asked a librarian where I might find books by a particular author, and they found the work for me arguably as fast as I can now find works with the new wizzy electronic catalog.

It is natural, when faced with something new, to understand it in terms of what we already know and already understand. Unfortunately, understanding the new by analogy to the old can lead to the form of the old being assumed in the new. It was true that when libraries digitized their card catalogs in the 1970s and 1980s, they were more or less exactly digital versions of the card catalog predecessors, because their content was limited to old data from the cards and new data from cataloging processes (which were unchanged from the card catalog era) and because librarians and users had come to equate a library catalog with a card catalog---it was what they expected.

MARC is a perfect example of this kind of thing. As a data format to directly replace a card catalog of printed books, it can hardly be faulted.

Unfortunately, digital metadata has capabilities undreamt of at the time of the French revolution, and card catalogs and MARC do a poor job of handling these capabilities.

A whole range of people have come up with criticisms of MARC that involve materials and methodologies not routinely held in libraries at the time of the French revolution (digital journal subscriptions and music, for example), but I view these as postdating card catalogs and thus the criticism as unfair.

So what was held in libraries in 1789 that MARC struggles with? Here's a list:
  • Systematically linking discussion of particular works with instances of those works
  • Systematically linking discussion of particular instances with those instances ("Was person X the transcriber of manuscript Y?")
  • Handling ambiguity ("This play may have been written by Shakespeare. It might also have been a later forgery by Francis Bacon, Christopher Marlowe or Edward de Vere")

All of these relate to core questions which have been studied in libraries for centuries. They're well understood issues, which changed little in the hundred years until the invention of the computer (which is when all the usually-cited issues with MARC began).

The real question is why we're still expecting an approach that didn't solve the problems two hundred years ago to solve our problems now. Computers are not magic in this area; they just seem to be helping us do the wrong things faster, more reliably and for larger collections.

We need a new approach to bibliographic metadata, one which is not ontologically bound to little slips of paper. There are a whole range of different alternatives out there (including a bevy of RDF vocabularies), but I've yet to run into one which both allows clear representation of existing data (because let's face it, I'm not going to re-enter WorldCat, and neither are you, not in our lifetimes) and admits non-card-based metadata as first class elements.

</rant>

May 22, 2009

Mirror mirror

A colleague of mine just sent me this. Why, I may never know, but it's quite cool.

MMMMMMMWMWMMMMM8BMMM@W@@WMMMW@@@MMMMMMMMMMM@MMM@MMW@MMMMMMMM@WMM@@@WW@W@MMMMMMMMMMMMMMMMMMMMMMMMM@M@
MMMMMMWMMWM0BW8BBMMMM@MMMMMM@MMMMMMWWBWMMMMMMM@MMMMMM@MM0@MMMMM@M@MM@MMMMMM@@MMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMBMMWMMM8BMMMMMMMMM@0@MM@B0@MMWM@@MMWWMMMM@MB8BB@MB0W@WMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMWMMWM0WWM0BM@8S@MMMMMMMMMMa0W@MMMMWMWMMBMWWBBBMM@@MMMM@M@Z8WMMMMMMM@W@@@MMMMMMMMMMMMMMMMMMMM@MM
MMMMM@aWM0WMMMMMMMMMMMMMMMB@8X@@8WMMMMM@WBBWX8W@@WBBBM@BM@M@8MMBW8BMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
@MMMMMM0B0ZMM@MB@MB800M@W88WMMMZMMMMMMWZa@MM@MM@WWZaa00MMMM@@@MMMMM@BMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MM@WMMMMMMBZ00WMaZMMMM@2M@MW2MM@MZ7880W8@B0WBWW@@MMMWW8aZW@88ZWMM@MW8W@BMMMMM@MM@MM@@@@@@@M@@@MM@MMM
MM@WMMZMMMMMMMWMMWMM@MMMMM@W@@W@MMMWZXra@MMM0aSX;;SMM8Za2@0ZaZBZZBBM@@@@WW@MMMMMMMM@@MMMMM@MMM@M@@MM
M@M@0M@@MMMM@MMMM0M@MM@MMBZMMMM0MMWMMBS2WZ8Z8BZZZaZ7,0WB8r7SSZWZZ80BWMMM@@MMM@MM@@MMM@@@MM@MMMMMMMMM
WMM@WMMMMMMMMMMWMMMWM@W8Za08a2@WMBSWMMZSZWZ7777SZ8BWBBSaMWaa80Z2a88Z880MMMMMMMMMMMMMMM@MMMMM@@M@M@MM
MM8MMMBMM@BW@@WBWBWW0BWM@M@MMWBZ8S7aaB2S;8ZrS2SS2XXrr7XX78MMBZ2aZZZ88a2a0BMMMM@MM@MMMMMM@M@MMMMM@@MM
@MM@MMMMM@WBB@WW@BB@MMMMMWBZ8MMMB822XXXaa2M22S2SSSS2SSSSXXXXX22aa2aZaZZaZ2Z0@MMMMMMMMMMMMMMMMMMMMMMM
BZ0BM@@@MMMMMM@MM2MM@M8WB00aaWBaSaXXXS2SX7SX222SSS222aaZa22S2a2Za2aaaa2ZZZ808WMMM@M@MMMMMWMMMMMMM@MM
MMMMMMMMMMM@MMMMWaM@MBWMMWB02SBW777S7XS22SSXXSS22S22aa22aaaaaZZ8ZZZZaZZaSZZ880@MMMM@MMBrXSrBMMMMM@MM
M@@W@W@WMMMMMMMMM@W0WW8W2Za7X7ra@B8WSXXS22S2S22222Z222aaaaZZaZZaZ888ZZ8ZZ22ZZa0WMMMMMMBrWa;,ZMMM@@MM
MWMMMWMM@BWW@MB@8BWMWBZ2X2XXS2XX777X7X2XS2S2Sa2S2S2aZaaaZZZZaaZZaZZ8WWWWWWW0aZ28W@MMMMWXM0S;.SMMM@MM
MWB0BMZ@@MBMWMMMMMMMB87SSSS22S222XXXSSSXXS22222aZa2aZZ22a2222aaa2Sa800WMMMMMMM8Z8WMMMM@SB8B2ii:MM@MM
@M@@0MBB0Z8@MMWBBWBMSXXXS02SSSS2S2a22SS2SSSS22S2a2Z222a2SXSXrXXXWM@@@MMWW08WMMMM08WBMMMSBZ@M8r,BM@MM
W@@MMBMMMBWW00W@Z0MM8SSa@BSaSSSSSSS22SSS2S22Sa2a22a2SSXrXSZa7BMMW@MM@8aSX2Z220080@WBWMMMa7XZWMX:MMMM
MMWBW0W0B808WMM@WM@WWMMMaXS2S22a22a222S22222aa2ZS2S2aS2Z@WM0MMMWMM2;iii7S2SSaaZZZ2@WB0MM@07rraMi@MMM
BMMMM@BMWMMZ8XMZXXX7SX;rS22222a22aaS2222a2a222aSaSXSXX0W000Z8WW0X:7aWMMMMM@088ZZ2S8WW8WMMSi22i7BSMMM
@MM@aBW@@WZBMZM222aa2Z8ZaS2aa2SaaZa2aZ8aa2222SSS222a2Z8ZZ00BM8220@MMB80WWMM0ZZ00Z2SZWB0BMWr:;SXX8WMM
MMMMMMMM@8aWB@MBrXX7r2a2a2aaaaaaaa22Sa2222SS2S2SSSX2ZZZ0ZZSXaBB0M0000X,:S8WBB08a2aSS8WW8WMBaXXaXSaMM
MMM@M@@MWMMM0ZWMMM@MMM22ZaZ22aaa2a2Z2Z22SSSS22S7rXSXXXZaS7S0M8@W0M2MMMZ8WW@B8ZaZZ0ZaZ0@B0WMB0aBB0;MM
MWWMBMMWWBW@M@ZZ80B0X7SZa2aaZaaaS2XSXXSX28X7rXXr;22XXZ2XSZM@ZZ2.WMMMMBW@B00Z2SXSSSZaZ8BWWBMa2i:7@XWM
@MMMMWBBM@WB@W808SS222aa2aZZZaa22SX7XXZXXXiSX2S7XXSSXX7S0MWaaB,,;8Z;XZaZ88Z22a2aa222a888@WWWS8SX7BWM
MMMMMMMMMWM@@@0B0ZaZZaaa2aZZa2SS7SS80X8ZX8aXZ7XSXrXSXSS8@Wa20MM@ZaZX8ZZZaaZZa22aaaaa2a2ZaWMMWX7SZZZM
MM@@M@MWMW@@@MBWBaZaaa2ZaZa2aSXSZZ8Z00aZ20Z0Z2aSS77XSZZW@8XaZXX2aZ8ZaaaSXXS22X2a2ZZ888Za2ZW@M8;XZa7M
MMMMMMMMM@W@M@WWBaZZ2aa8Zaaa2XaB@00Z00B08SXS7XXXX22S22a8MWW0aXSaZa22SSXSaaZZZZZZZZaZZZ0BZaZ@MMri7XSM
MMMMMM@@@MMMM@MW0ZaZaaZZa88aZa@MM0@M@0SSaa80BB0ZaZa2SSXSaBW@WW08aa2aaZ8ZZa222222SS22aZa800Z0@MW  XMM
MMMMMMMMM@MMMMM@0ZZ8ZZZ0BWW88WMMW0Z2a80M@WW0880WWZaS2SSS2ZZ8B000Za222a2S2a22222a228222ZZZ8080MMMMMMM
MMMMMMMMMMMMMMM@0ZZ888@MM@MMMM8ir70BWWBaa8ZaZ0S8B0ZS2S2S2aZ88ZZ8888Zaa2SSSS22S2Z@a2X2aaaZZ8880MMMMMM
MMMMMM@M@MMMMMMM08aZ88MMM@@ZrirZMMMMWMB2 X0BB878BB8a2SSXS2aZ88Zaa2SXXXXXXX;;rXSXSXS2aa22aaa8Z8BMMMMM
MMMMMMMMMMMMMMMM@Z8aZMMWaSXSaBMM2ZMM0MMMBBZ22X2W808082XXSa8Z8WB8aaS;ir7XXXXX7r77SSSS22aaZZZZ2aZ0MMMM
MMMMMMMMMMMMMMMMMWZZSMMM0Z2a@MMa :ZW08r7S2ZaSS0Z08Z88Z22a2aZZ8BWBB@MWaXSXS222SXXXXXSSS222ZZZa2ZZBMMM
MMMMMMMMMMMMMMMMMM0a2WWZZZ8ZWMMWBWBBaZaSaZX7a8Z28ZZ80B0aaaaaaZ8WB8880@8SXXXSS22aaa222SSa2aa8ZSaZ8MMM
MMMMMMMMMMM277XMMMM0a8@8a2Z80BBBW0ZaSa8aSSSaZaaa8ZZ8808aZZaa2Za888aSS0WSa2aaXXSS22aaZaSa2SXZZ2aZaBMM
MMMMMMMMMMWrX2aXaMMM8ZB82a2Z000Z8ZZZZ2XS2Z8Za222ZZaZ80ZaSSSr7SS2Z288Z882a22Z2222SSXXSSZ272S2ZZS2a8MM
@MMMM@@MMMM8 S00rXMMMZ0BaSZ088ZaaS22SS2Z8Z2S22SaaaSZ8BBZ2SX:.7Xa8B@XZ0ZaZZ2aZaaaaSS2Z2SZ77XSa8SS2ZMM
@@M@@@@MM@MM; S@0SS@M@880aSaaZ2SSS22a8ZZ22S2SX2XXS2WB008Z22SXXS2aM2a8aaZ28aZZa2XSSZBB2aXX;7X28Z2ZZMM
@@M@@@@MMMMMM. aMZ;,BMMW08aa22a228Z2aS2SS2Z2Sa8X7XZWZSS2Z808a2aaBWaaaZ8BaZSXSXa0W@@M07Sr7rXXXZZ2aa@M
MMMMMMM@@@@MMM ,ZWr;7iaMW88Z2ZaZaa22a222ZaXXX77SSSa0ZXZZSZWW@8BB8SXXSaSXXX2Z8WMMBMM0SX27XSSXSaZ222MM
MMMMMMMMMM@M@MM..WX;2;:8MB88aZa8aZZaaS222S77XXXSS222Z08ZZaXX2Z2X77XX7XXSaZ8MMMW. B@Sr7SXXX7XSaZ22SMM
MMMMMMMMMMMM@MMMai2;S2;78MW82ZZ8Zaaa2aZ2SXX77X22SSXS2aSaZZZaSXSSXXXXXaZaBMMr r2SWMBSX2aSXXXX222aa2MM
MMMMMMMMMMMMM@MMMM8S;2Z2XZMBaZSZZZaa2S2SSX7XSS22SSX22aaZZaa2SX77X2S2a0MMMiir ,8MB0ZXX222XSSXXSaZ2ZMM
MMMMMMMMMMMMMM@@MMMMSXZZir8MBZ2aZ0aZa22SaXXSXS2a2SS2aaZZZaSXX22aa8ZZMMSrS: iM0aB8BaS222SS2SXS2aa20MM
MMMMMMMMMMMMMM@MMMMMMSX8i7rZMWZZaZZZa222S222Sa2a222aaZaS2SSZ8ZZWMMM@0   i72S8W0B0Z2SXS2SS22SS22S2MM@
MMMMMMMMMMMMMM@MMMMMMMZ7X22;aMB8aZZZa2S2a222S2aXXSa22SSX2Z8B@@WWS   77iii.X@MB0BZXaa22S2S222S2a2ZMM8
MMMMMMMMMMMMMM@@MMM@MMMM0;Z2i2MBZZZ8Za2S222SSaaZSSXSSXZ80WM2a877Sii:irX:SMMWZZ002S22S22X222a2a22WMZ8
MMMMMMMMMMMMMM@MMMMMMMMMM8S;, .B2ZZ80Z28222aa0S22SS2aZBWB87,  :  :,:,7MMMB27S88S2aaZSSSX22aZZaaZM028
MMMMMMMMMMMMMMMMMMMMM@MMMMMMMMMM8Xa280aaZ2XSXZ7SSZ0MMMMM77S2WarMWMMWM@8227SZ0ZSSS2SSSXXX222a88ZMM228
MMMMMMM@MMMMMMMMMMMMMMMMMMMMMMMMM822S8822XSXXXXXSS2222ZWMMMMMM@BWB88SXSSZ08a22SXSSS222SX2Za8B8BMSaa8
@MMMMM@@MMMMMM@MMMMMMMMMMMMMMMM@MMB22X8Za2aXSXX7XS2X77rrr72Z8BWBWBB08000Za2aZ2SX2SSS22SSX22ZWBMZ2a2Z
MMMMM@MMMMMM@@MMMMMMMMMMMMMM@@MMMMM@2S2aZa222SSXX7XXXXS2SX2XXS22aZZZZZaaaZaa2SSSSS2222SSSSSZ@@B2Zaaa
MMMM@MMMMMMM@MMMMMMMMMMMM@@MMMM@@@MMMaXX2aa2222SXSSSXXSXXS22SSSSSXS2aSSSXXXS2S2SS22Sa2SS22S0M@Zaaaaa
MMMMMMMMMMMMMMMMMMMMMMM@MMMM@@@@W@@@MM8XXS22aa2ZaSXXXS2S2SXSS22SS2aa2XXS22SSSSX2SXXX2SX7SaSWM8aa2aaa
MMM@MMMM@MMMMMMMMMMMMMMMMM@@M@@WW@MMMMMMaX7XS22aaSS22SSSS22XSSSXS2SX22Za22S2S22XX7r7XXSXS2a@MZ2aaaa2
MMMMMMMZZMMMMM@@MMMMMMMM@@@M@M@M@@MM@@MMMMZaXXXXSS22SS2a22a2SaSXS2XSSSSaa2SSrrr7X77X77X2Sa@@Z2aaa2a2
MMMMMMMBS@MM@@MMM@MM@M@@@MMM@M@@@@MMM@@MMMMMWZaa22SSSSSXS2aZZZaSSSSS22222SSS7rr7X7XX77Xa0@W8a2222aBa
MMMMMMMMaSMMM@@MMMMM@MM@M@@BBBW@M@@@MM@M@@MMM0WM0Zaaaa222SSS2a808a2SSSSSXX77XXXXXXXX20W0008a222a8@2
MMMMMMMMM7aM@MMM@@MMM@W00WWZ@@@BMM@@@MM@W@M@M@ MMB0ZZZaZa22222S2Z0BB0a2aSSXXSSS2SaBB8Za808ZaaaZB@; .
MMMMMMMMMWXWWBBBWBWBBBWWM@W@M@@WW@@@@WWWWWWWMM:.080008Z8ZZZZ2aZ2SSSaZ000B08000008Z2S2888ZZZaa2Z;  ,7

I sent him a link to this in return.

http://www.youtube.com/watch?v=0nRPoS2WDJA

May 16, 2009

Fedora 11 - coming soon

My favourite GNU/Linux distro will soon have a new version available.

Fedora 11 feature list

/Roger Sinel
Fedora Ambassador

Scope of the difficulty: an idle moment’s association

  1. There are only two hard things in Computer Science: cache invalidation and naming things.
    Phil Karlton

  2. Almost all programming can be viewed as an exercise in caching.
    Terje Mathisen

May 07, 2009

Screenflow Logo

Recently I had a need to create a screencast to help my father learn how to use his new Mac. I’d seen Telestream Screenflow used in the past and from that, and a little play with the trial version, I decided I’d purchase the app for my Mac.

Although Screenflow is a bit expensive at 99USD, it has a lot of cool features, so I was all prepped and waving my credit card around ready to buy. Unfortunately when I went to Telestream’s eSellerate powered store it didn’t even list the application for sale.

Epic sales FAIL.

The chaps at Telestream however had made the smart decision to list their Twitter name on their site (@screenflow). So I tweeted a little note about how their store was broken when I had wanted to buy and that they had lost a sale.

I then went on to use Snapz Pro X, which I already own, to make my little screencast.

The next day, @screenflow tweeted me a little direct message asking for my email address, which I provided. Lo-and-behold, in response they sent me a coupon code for a 100% discount for my troubles. Nice.

Thanks Telestream, that’s what I call customer service.


May 01, 2009

LoC gets semantic

This morning, the Library of Congress launched http://id.loc.gov/authorities/, their first serious entry into the semantic web.

The site makes the Library of Congress Subject Headings available as dereferenceable URLs. For example http://id.loc.gov/authorities/sh90005545.

April 29, 2009

orj

Call:

How do I update controls on my UI thread from the Asyncronous Delegate? I understand it’s not safe to just try to update it directly from the asynchronous thread, but I need to figure out how to update it somehow. Specifically I need to increment a ProgressBar and update text in a Label.

Thanks,
Michael C.

Response:

My name is Tyrone Hernandez. I grew up on the streets, so I’m familiar with this kind of shiznit. I am down with this, and I’m going to keep it real by top-posting.

In my hood, I have a homie named Shaniqua and some times she calls me on the phone in an asynchronous fashion. Now I’m hip to her jive, so I don’t wanna just hang up. If I feels like talkin to Shaniqua, I talk to her. If not, I use my power of “Control” to check “IsInvokeRequired” and if it is, I call Control.BeginInvoke() to upgrade my progress bar.

Well, I’d be likin’ to talk more, but my crack dealer is here so I’m going to call some of his methods. Peace out!

Made me laugh. And he’s correct, give or take the property name: the actual member is Control.InvokeRequired. Found here.
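For readers who want the pattern without the jive, here is a toolkit-free Python sketch of the same idea: check whether the caller is on the UI thread and, if not, queue the call for the UI thread to run later. It is only an analogy to the WinForms mechanism, not its implementation; the `UiThread` class and its `invoke_required`, `begin_invoke` and `pump` methods are made up for this example.

```python
import queue
import threading


class UiThread:
    """Stand-in for a UI toolkit: only the owner thread may touch widgets."""

    def __init__(self):
        self._owner = threading.get_ident()  # thread that created the UI
        self._pending = queue.Queue()        # calls waiting for the owner
        self.progress = 0                    # pretend ProgressBar value
        self.label = ""                      # pretend Label text

    def invoke_required(self):
        """True when the caller is not the thread that owns the UI."""
        return threading.get_ident() != self._owner

    def begin_invoke(self, func, *args):
        """Queue a call to be run later on the owner thread."""
        self._pending.put((func, args))

    def set_progress(self, value, text):
        if self.invoke_required():
            # Wrong thread: hand the call over instead of touching state.
            self.begin_invoke(self.set_progress, value, text)
            return
        self.progress = value
        self.label = text

    def pump(self):
        """Run all queued calls; must be called from the owner thread."""
        while True:
            try:
                func, args = self._pending.get_nowait()
            except queue.Empty:
                return
            func(*args)


if __name__ == "__main__":
    ui = UiThread()

    def work():
        for step in range(1, 4):
            ui.set_progress(step, "step %d" % step)

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    ui.pump()  # in a real toolkit the message loop does this for you
    print(ui.progress, ui.label)
```

The worker never modifies `progress` or `label` itself; it only enqueues requests, and the state changes when the owning thread drains the queue.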


April 26, 2009

One heartfelt indictment

Dag Ågren:

At this point, I’d like to take a moment to speak to you about the Adobe PSD format. PSD is not a good format. PSD is not even a bad format. Calling it such would be an insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having worked on this code for several weeks now, my hate for PSD has grown to a raging fire that burns with the fierce passion of a million suns.

If there are two different ways of doing something, PSD will do both, in different places. It will then make up three more ways no sane human would think of, and do those too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide that these particular chunks should be aligned to four bytes, and that this alignement should not be included in the size? Other chunks in other places are either unaligned, or aligned with the alignment included in the size. Here, though, it is not included. Either one of these three behaviours would be fine. A sane format would pick one. PSD, of course, uses all three, and more.

Trying to get data out of a PSD file is like trying to find something in the attic of your eccentric old uncle who died in a freak freshwater shark attack on his 58th birthday. That last detail may not be important for the purposes of the simile, but at this point I am spending a lot of time imagining amusing fates for the people responsible for this Rube Goldberg of a file format.

Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this, I had to apply to them for permission to apply to them to have them consider sending me this sacred tome. This would have involved faxing them a copy of some document or other, probably signed in blood. I can only imagine that they make this process so difficult because they are intensely ashamed of having created this abomination. I was naturally not gullible enough to go through with this procedure, but if I had done so, I would have printed out every single page of the spec, and set them all on fire. Were it within my power, I would gather every single copy of those specs, and launch them on a spaceship directly into the sun.

PSD is not my favourite file format.
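The alignment complaint in the quote can be made concrete with a small Python sketch. The chunk layout used here (a 4-byte big-endian length followed by the payload) is a hypothetical stand-in and is not taken from the PSD specification; it only illustrates the three conventions the quote describes: unaligned chunks, padding counted in the size, and padding not counted in the size.

```python
import struct

CONVENTIONS = ("unaligned", "padding_in_size", "padding_after_size")


def padding_for(length, alignment=4):
    """Bytes needed to round length up to a multiple of alignment."""
    return -length % alignment


def write_chunk(payload, convention):
    """Serialise one chunk under the given (hypothetical) convention."""
    pad = b"\x00" * padding_for(len(payload))
    if convention == "unaligned":
        return struct.pack(">I", len(payload)) + payload
    if convention == "padding_in_size":
        return struct.pack(">I", len(payload) + len(pad)) + payload + pad
    if convention == "padding_after_size":
        return struct.pack(">I", len(payload)) + payload + pad
    raise ValueError("unknown convention: %r" % (convention,))


def read_chunk(buf, offset, convention):
    """Return (data, next_offset) for the chunk starting at offset."""
    if convention not in CONVENTIONS:
        raise ValueError("unknown convention: %r" % (convention,))
    (size,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    data = buf[start:start + size]
    if len(data) != size:
        raise ValueError("truncated chunk")
    end = start + size
    if convention == "padding_after_size":
        # The stored size excludes the padding, so skip it explicitly.
        end += padding_for(size)
    return data, end


if __name__ == "__main__":
    for convention in CONVENTIONS:
        buf = write_chunk(b"abcde", convention) + write_chunk(b"xy", convention)
        first, offset = read_chunk(buf, 0, convention)
        second, offset = read_chunk(buf, offset, convention)
        print(convention, first, second, offset == len(buf))
```

A reader has to know which convention applies to each chunk type: reading "padding_after_size" data with the "unaligned" rule lands the next read in the middle of the padding, and under "padding_in_size" the returned data carries trailing padding bytes that the caller must strip by other means.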

April 23, 2009

Matt Trout has a plan

Matt S. Trout:

But here’s my point: Perl people hang out on mailing lists. We bottom post, carefully interleaved, with 76 character lines. We have signatures that meet the McQ standard for acceptable size. We hang out on IRC servers and bitch, moan, interact and collaborate with the aid of an 80×25 xterm with irssi, BitchX or IrcII in it. Sometimes we sit at home with a beer and do one or more of the above. Forums? Meh. Those are the things the PHPtards like because they can’t figure out how to work a mailing list, right? Blogs? That’s not even a ••••ing word! I mean, in my day, we posted to usenet using Larry Wall’s rn that didn’t even have decent ••••ing threading, and we schlepped the posts about from one bnews spool to another over 1200 baud dialup links, and we liked it!

The rest of the world still thinks that we write a thing called PERL, whose purpose is as a super-awk or to write CGI scripts, they think that this language is designed to be write-only line noise that’s only slightly more legible than brainfuck, and we’re sort of sat here going “meh, well, they’re wrong” and NOT ACTUALLY DOING ANYTHING ABOUT IT

And. There’s an extra incentive. […] So, if you don’t care about any of the rest of it and just think I’m an uppity annoying profane obnoxious b••••••, maybe getting the chance to help me make a fool out of myself will motivate you instead!

That’s Matt – scarily smart, gratingly profane, infectiously energetic, and gets things done.

April 19, 2009

Out for a ride.

I’m finally getting out on my bike again. The spring is here and my sore back is feeling a lot better after a few months of exercises.
So I’m finally out there on the open road again.

I found an interesting site for bike nerds like myself. A bicycle blog type of thing.

/Roger Sinel

April 17, 2009

In defense of mutable history

With Git, you have a multitude of ways to slice and dice commits: the index lets you commit things in bits and pieces as you go; the stash gives you a way to transplant changes between branches or shelve work for a while; the power of rebasing allows you to go back and split large commits sensibly into atomic ones. In other DVCSs, all of this is also possible but requires painstaking busywork using far less powerful tools; in centralised VCSs most of it is for all intents and purposes impossible to do at all.

But lately I’ve seen entries cautioning against the use of this power. To me this seems like a sign that Git has arrived – it is pulling in people who don’t subscribe to the philosophy behind its design. Personally, I consider immutable history a terrible idea, even if the argument for it is logically defensible within certain parameters. Why?

In contrast to systems with immutable history, Git doesn’t force you into any sort of careful premeditation where you try to anticipate all possible and impossible problems you may or may not realise yet. You go about business as usual, and if you belatedly notice a problem you hadn’t thought of or couldn’t even have been aware of at the time, you just go back and fix it. No sweat. Think of it as the Never Use a Warning When you Mean Undo principle, applied to version control.

Git is humane, in the Aza Raskin sense.

Update: Chris Siebenmann makes a good point that rewriting history in Git is actually no such thing. Of course, that is why pushing “rewritten” history causes the problems it does. And of course that in turn is actually desirable, and thus very much by design.

April 16, 2009

Good HTTP citizenship for DiggBar protesters

John Gruber started a bit of a wave by blocking the DiggBar on his site and explaining how others can do the same.

However, his implementation as it currently stands does not play well with search engines and caching proxies.

  • The problem with search engines is easily stated and easily solved: a search engine that comes in via a DiggBar link will index the get-lost page as the regular content for that page.

    To avoid that, simply send the page with a status of 403 Forbidden rather than 200 OK, thus telling the search engine that there’s nothing there for it to see. This is trivial to do – in PHP, just add a line like this to the code:

    header("Status: 403");

    Note that this needs to happen before any output is produced; i.e. in John’s example it would go above the echo line.

  • Fixing the caching proxy problem is not as nice.

    The issue is what happens when two users request the page through the same caching proxy: the version of the page that’s served to the first visitor will be cached and served to the second visitor as well. But if one of them hits the page via the DiggBar and the other comes in via legitimate venues, then the second visitor will get the wrong version of the page no matter the order in which they hit the site. Of course the worse case is when the DiggBar user was first: then the second, legitimate visitor will be served the get-lost page.

    The response status fix above should by itself help a little with that: proxies generally use different expiration rules for responses such as 403.

    To really fix the problem, however, you need to also send a “Vary: Referer” header. This asks proxies to request a separate copy of the page each time they see a client request it with a different Referer. This ensures that the correct page will be served to everyone; it even ensures that caching remains possible despite the page content possibly varying depending on the source link.

    However, while correct, it is probably hard to stomach: it will increase the volume of proxy requests by roughly a factor of the number of distinct links to your site. That’s easily an order of magnitude for popular weblogs like Daring Fireball. Then again, not implementing it will penalise some innocent visitors.

    Sending variable pages properly in an infrastructure with caches is not cheap.

    Update: in email, Mark Nottingham suggests simply making the 403 explicitly uncacheable by sending along a “Cache-Control: no-store” header.
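
Put together, the two fixes amount to something like the following sketch. It is written in Python rather than PHP purely for illustration; the function name `respond`, the placeholder page bodies and the Referer pattern are all assumptions of mine, not John’s actual code. It sends “Vary: Referer” on every response and additionally marks the 403 as uncacheable.

```python
import re

# Assumption: DiggBar links are short alphanumeric paths on digg.com,
# e.g. http://digg.com/d1abc. Adjust the pattern to whatever you block.
DIGGBAR_REFERER = re.compile(r"^https?://digg\.com/\w{1,8}/?$")


def respond(referer, page_body):
    """Return (status, headers, body) for a request with this Referer."""
    # Sent on every response, so caches store one copy per Referer.
    headers = {"Vary": "Referer"}
    if referer and DIGGBAR_REFERER.match(referer):
        # Blocked: tell search engines there is nothing to index here,
        # and tell caches not to keep the get-lost page at all.
        headers["Cache-Control"] = "no-store"
        return "403 Forbidden", headers, "get-lost page"
    return "200 OK", headers, page_body


blocked = respond("http://digg.com/d1abc", "the real article")
normal = respond("http://example.com/some-post", "the real article")
```

Whether to send both headers or only one of them is exactly the trade-off discussed above: Vary alone keeps everything cacheable at the cost of more proxy requests, while no-store on the 403 only protects against the get-lost page being cached.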

Update: Adrian Sutton:

So that’s roughly what has now been deployed to Symphonious.net. The key difference is that the “Vary: Referer” header that Aristotle suggests is only added when the page is blocked. This means it’s possible for someone using the DiggBar to get the real page from a caching proxy, but it shouldn’t be possible for an innocent user to get the blocked page.

That’s a clever trade-off. He compounds this with a JavaScript solution to bust the DiggBar frame, because a user might be coming in through a link from another site that in turn has been framed – which isn’t obvious from the referrer. Nice work.

Update: Mark Nottingham remarks that serving responses both with and without a Vary header for the same URI is likely to confuse caches.