Planet WLUG

June 28, 2013

Aristotle Pagaltzis

Cargo Cult Simplicity

Eevee:

In the quest to make the default exposed settings [of GNOME] simple, it has become remarkably complicated to actually change any settings I suspect exist but cannot see. So far today I have played with:

  • compizconfig-settings-manager
  • gnome-tweak-tool
  • unity-tweak-tool
  • ubuntu-tweak
  • dconf-editor
  • gnome-color-chooser
  • gtk-theme-config

June 28, 2013 08:41 PM

A minimax SSH key regime

Recently it occurred to me that I had been using the same main SSH key for almost 15 years. I had minted a second one for GitHub when I signed up there, but that was it. Worse, both of them were only 1024 bit strong! That may have been fine 15 years ago when I minted the first one, but it certainly isn’t now. They were also both DSA keys, which turns out to have a systematic weakness. (Plus, old versions of the FIPS standard only permitted 1024-bit DSA keys.)

This had to be fixed. And I wanted an actual regime for my keys, so I wouldn’t repeat this.

Naturally, my new keys are all RSA and 8192 bit strong. Yes, 8192 – why not? I worried about that slowing down my SSH connections, but I knew adding key length only increases the cost of the handshake phase, and if my SSH connections are taking any longer to set up now, I haven’t noticed. Even if I did notice, SSH now supports connection sharing (which I have enabled), so that only the initial connection to a host would even experience a meaningful delay. And since I combine that with autossh to set up backgrounded master connections to the hosts I shell into frequently, most of my connections are nearly instant, and always will be, irrespective of key strength.
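
For reference, the moving parts here are small. A minimal sketch of minting such a key and of the connection-sharing configuration (host name, file names and timeout are illustrative, not my actual setup):

# Mint an 8192-bit RSA key; the comment and file name are arbitrary.
ssh-keygen -t rsa -b 8192 -C "laptop key" -f ~/.ssh/id_rsa_laptop

# ~/.ssh/config: share one connection per host, so only the first
# handshake pays the cost of the larger key.
Host *
    ControlMaster auto
    ControlPath ~/.ssh/ctl-%r@%h:%p
    ControlPersist 10m

# Keep a backgrounded master connection to a frequently used host alive.
autossh -M 0 -f -N shellbox.example.org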

So how many keys does it make sense to have?

My first impulse was to mint one key pair for each server I would be shelling into. But as I’ll explain, that isn’t what I ended up doing.

I spent a while reading and thinking.

In terms of private key security, my situation is that on every machine on which I work at a physical console, I run an SSH agent with most or all of my private keys loaded. I also have a few passphrase-less keys on other machines, for use by various scripts. (Logins using such keys are restricted to specific commands.) In all cases, an attacker who gained access to any one of these keys would almost certainly have access to all the other keys on the same machine. So there is no security to be gained from using different SSH keys for different servers from the same client. But it does make sense for each client to have its own private keys.
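
As an aside, the command restriction on those passphrase-less keys lives in the server-side authorized_keys file. A minimal sketch, with the key, command and options purely illustrative:

# This key may only run the given command, and gets no TTY, port
# forwarding, agent forwarding or X11 forwarding.
command="/usr/local/bin/backup-receive",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAA... script@client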

A simple way to encapsulate this as a rule is: never copy a private key to another machine.

In the trivial case, this means one private key for each client machine, with the public key copied to every server to be accessed from that machine. There is, however, a potential privacy concern with this: someone who can compare public keys between several systems can correlate accounts on these systems to each other. Because of this, I share public keys only between servers where I wouldn’t mind if they could be thus correlated.

The upshot is that I have one main key pair for use on my home network and on several other machines under my direct control; plus a few more key pairs (e.g. my new GitHub key) used for maybe 3 shell accounts each. (It so happens that I only have a single machine on which I run an SSH agent – my laptop – but not long ago, there were 3.) Lastly, I have deleted all copies of any private keys I had distributed among my machines.

June 28, 2013 08:38 PM

June 19, 2013

Stuart Yeates

A wikipedia strategy for the Royal Society of New Zealand

Over the last 48 hours I’ve had a very unsatisfactory conversation with the individual(s) behind the @royalsocietynz twitter account regarding wikipedia. Rather than talk about what went wrong, I’d like to suggest a simple strategy that advances the Society’s causes in the long term.
First up, our resources: we have three wikipedia pages strongly related to the Society, Royal Society of New Zealand, Rutherford Medal (Royal Society of New Zealand) and Hector Memorial Medal; we have a twitter account that appears to be widely followed; and we have an employee of RSNZ with no apparent wikipedia skills wanting to use wikipedia to advance the public-facing causes of the Society, which are:
“to foster in the New Zealand community a culture that supports science, technology, and the humanities, including (without limitation)—the promotion of public awareness, knowledge, and understanding of science, technology, and the humanities; and the advancement of science and technology education: to encourage, promote, and recognise excellence in science, technology, and the humanities”
The first thing to notice is that promoting the Society is not a cause of the Society, so no effort should be expended on polishing the Royal Society of New Zealand article (which would also breach wikipedia’s conflict of interest guidelines). The second thing to notice is that the two medal pages contain long lists of recipients, people whose contributions to science and the humanities in New Zealand are widely recognised by the Society itself.
This, to me, suggests a strategy: leverage @royalsocietynz’s followers to improve the coverage of New Zealand science and humanities on wikipedia:
  1. Once a week for a month or two, @royalsocietynz tweets about a medal recipient with a link to their wikipedia biography. In the initial phase recipients are picked with reasonably comprehensive wikipedia pages (possibly taking steps to improve the gender and racial demographic of those covered to meet inclusion targets). By the end of this part followers of @royalsocietynz have been exposed to wikipedia biographies of New Zealand people.
  2. In the second part, @royalsocietynz still tweets links to the wikipedia pages of recipients, but picks ‘stubs’ (wikipedia pages with little or almost no actual content). Tweets could look like ‘Hector Medal recipient XXX’s biography is looking bare. Anyone have secondary sources on them?’ In this part followers of @royalsocietynz are exposed to wikipedia biographies and the fact that secondary sources are needed to improve them. Hopefully a proportion of @royalsocietynz’s followers have access to the secondary sources and enough crowdsourcing / generic computer confidence to jump in and improve the article.
  3. In the third part, @royalsocietynz picks recipients who don’t yet have a wikipedia biography at all. Rather than linking to wikipedia, @royalsocietynz links to an obituary or other biography (ideally two or three) to get us started.
  4. In the fourth part @royalsocietynz finds other New Zealand related lists and gets the by-now highly trained editors to work through them in the same fashion.
This strategy has a number of pitfalls for the unwary, including:
  • Wikipedia biographies of living people (BLPs) are strictly policed (primarily due to libel laws); the solution is to try new and experimental things out on the biographies of people who are safely dead.
  • Copyright laws prevent cut and pasting content into wikipedia; the solution is to encourage people to rewrite material from a source into an encyclopedic style instead.
  • Recentism is a serious flaw in wikipedia (if the Society is 150 years old, each of those decades should be approximately equally represented; coverage of recent political machinations or triumphs should not outweigh entire decades); the solution is to identify sources for pre-digital events and promote their use.
  • Systematic bias is an on-going problem in wikipedia, just as it is elsewhere; a solution in this case might be to set goals for coverage of women, Māori and/or non-science academics; another solution might be for the Society to trawl its records and archives for lists of minorities to publish digitally.

Conflict of interest statement: I’m a highly active editor on wikipedia and a significant contributor to many of the wikipedia articles linked to from this post.

by Stuart Yeates (noreply@blogger.com) at June 19, 2013 09:55 PM

June 10, 2013

Aristotle Pagaltzis

rename 1.600

I just cut a new release of rename: 1.600. The headline feature of this version was inspired by Dr. Drang: a built-in $N variable for easily numbering files while renaming them. It is accompanied by a --counter-format switch for passing a template, so you will be spared the fiddling with sprintf for padded counters.
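
By way of illustration, numbering a pile of photos might look something like this (a sketch only; check the documentation for the exact template syntax):

# Rename *.jpg to photo-001.jpg, photo-002.jpg, … with zero-padded numbers;
# the template argument shown here is an assumption.
rename --counter-format 001 '$_ = "photo-$N.jpg"' *.jpg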

I also finally gave the documentation the huge overhaul it has needed and deserved for a long time. There is now a proper synopsis, the description is brief, and the tutorial that was previously in the description section is now a separate, much larger section adapted to all the new stuff added since my original version of this utility. Lots of things are now documented properly for the first time.

In more minor notes, there is now a negatable --stdin switch you can use to explicitly tell rename to read from stdin, rather than it just guessing that it’s supposed to do that based on the absence of file names on the command line. The purpose of this is more predictable behaviour in situations where rename is passed computed arguments that may evaluate to nothing (e.g. with the nullglob shell option).
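
The explicit form then looks something like this (the substitution itself is just an illustration):

# Read the file list from stdin instead of letting rename guess.
find . -name '*.orig' | rename --stdin 's/\.orig$//'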

And lastly, I extracted a new --trim switch from --sanitize, mostly for consistency’s sake.

Share and enjoy.

June 10, 2013 09:53 PM

May 10, 2013

Aristotle Pagaltzis

Together we can end this destructive conflict

The irc.mozilla.org qdb:

<jesup> There is no CapsLock.  There is only Ctrl.  ;-)
<jesup> First thing I config on any new machine.  Can you tell I use emacs?
<mbrubeck> It's the first thing I configure too, and I'm a vi user.
<mbrubeck> Maybe we've found the common ground that can unite a war-torn planet!

(Related.)

May 10, 2013 01:59 AM

April 10, 2013

Aristotle Pagaltzis

A quote for the ages

jwz:

I tried to explain to rzr_grl what Debian was, and the best I could come up with was that they’re like the Radical Fundamentalist nutjob faction of Linux: people for whom Red Hat is insufficiently extremist. At this point she looks at me as if to say, “you mean the nutjobs have their own nutjobs??” I suspect she thought I was making the whole thing up.

April 10, 2013 08:29 PM

January 01, 2013

Ian McDonald

Mobile broadband for home usage in the UK

I've recently moved into our own home (yeah!) but found myself with no timeline yet on when I can get home broadband :-( The situation is that Openreach has an open order on our new phone/ADSL line, so nothing can be done for quite a few weeks until that clears - and that particular order is probably a cancellation!! It is a peculiar situation in that it would have been quicker if the previous owners hadn't done the right thing, as then we could have "taken over" the line. No ISP is interested in the line until then, even though I can show we legally own the house. So I've decided to document how I kept working and kept our sanity.

Firstly, my requirements were that we needed a fair bit of bandwidth, as we have three heavy Internet users in the house and many devices, and that we don't get hit with massive bills if we go over any cap. I looked at pure pay as you go devices but the cost worked out at about £5 per GB. Nasty! So for the main bandwidth I went with Three on the MiFi - Huawei E586. This gives 15 GB per month on a 24 month contract for £18.99 - £15.99 for the bandwidth and £3 for the device. I know Three has a bad reputation for some people due to poor initial coverage at launch but they have been brilliant for me in the past - I used them for a dongle when we first moved to the UK and they were the only ones who would give me a contract on the spot at the time. Again this time I went to a Three shop and they sorted it all out for me instantly.

This was working well at a decent speed (6 Mbit/s down, 3 Mbit/s up) but the allowance is getting used up fairly quickly. I had turned off WiFi for phones and iPads that had data on, but the gaming / YouTube etc takes its toll. One thing to note is that you can get the bill capped on these as well so that you don't get a nasty bill once you hit the 15 GB. You do have to tell them this though, and they don't guarantee it will work if their bandwidth site is down (which has happened a couple of times), so I will turn the device off when it gets close to the 15 GB each month.

I then went looking for a pay as you go option that was reasonably priced and I settled on T-Mobile. They are unique in that you pay per time period and that if you go over the amount of allocated bandwidth then they just slow down your connection and don't allow videos or downloads. The data allowances / days are £2 for 1 day and 250 MB, £7 for 1 week and 500 MB or £15 for 1 month and 1GB. I figure that this will work fine for the short time when the allowance on Three has run out during the month. One useful thing T-Mobile do is graphics compression which reduces your bandwidth use. If you don't like it you can always turn it off at http://www.accelerator.t-mobile.co.uk/. I did have a bit of a problem getting the T-Mobile connection going. I paid £25 for a Huawei E3131 including £10 topup. They didn't apply the topup to my SIM and it was not working. I rang them up and they said they would correct and activate. In the meantime I decided to test it on a Zoom 4501 3G router that I had lying around to see if this would tell me the status of the card. Unfortunately the data stick stopped working altogether at that time. This then made me think back to a few weeks ago when my work Vodafone data stick stopped working and funnily enough I had tried to test this as well... On the Zoom you plug the USB data stick into a USB port and it shares out the connection. This is great in theory and I have used it a fair bit in the past. But it now has developed a fault and has destroyed two USB data sticks so I would say stay away from it...

After breaking the new Huawei data stick I put the Micro SIM into the only phone I had that took a Micro SIM - a Nokia 800 - and it worked for receiving text messages, but not for Internet. The phone had previously been used for Vodafone. None of the "download automatic settings" seemed to work for the phone, so I had a hunt around on the Internet and just went and put in an APN of general.t-mobile.uk - note the lack of .co in the middle, as some websites had this wrong, and having it wrong doesn't work.

Having said all this I will be very happy to go back to Sky Broadband when I can, for £10 per month. It will be even better when they roll out fibre here. I have used Virgin before but found they "managed / shaped" the traffic a lot and this interfered with gaming. I also can't get it at this house anyway...

by Ian McDonald (noreply@blogger.com) at January 01, 2013 01:40 PM

November 03, 2012

Ian McDonald

A few tips for Amazon


Just got asked about getting started with Amazon Web Services (AWS) and thought I may as well put the answer up as a blog post as well.

So here they are:
  • Make sure you take support of some kind from Amazon. You need it as sometimes your machine might get a glitch or you just want to ask a few questions
  • Get to know the account team at Amazon. They will give you free technical training and help you out. Once you grow enough they'll also help you change from credit card billing to on account billing. I personally wouldn't worry about trying to alter contract/legal terms and conditions - you'll tie yourself in knots for ages and gain virtually nothing
  • Architect for failure. With Amazon you still need to have redundancy and backups. See my blog post at http://blog.next-genit.co.uk/2012/04/building-for-amazon.html
  • Use their products where possible to reduce work for you. e.g. Amazon Linux (their version of RedHat), RDS (MySQL, Oracle), DynamoDB (a NoSQL database)
  • Start with small instances and make them bigger as needed, rather than the other way around. It is very easy to resize (it just needs a reboot) and you will save money; a rough sketch of the resize step follows this list. The only exception to this is micro instances, which will never give you reliable performance as they just use timeslices that are spare.
  • Use 64 bit so you can scale all the way up if needed. No penalty on cost.
  • Amazon can now do just about anything as they have introduced SSD disks, committed IOPS etc
  • Utilise VPC (their Virtual Private Cloud) by default. This can now connect back onto your firewall by IPsec VPN, and they also connect to some data centres directly now. Of course you need to follow good system design and keep systems together that cause a lot of IO between each other.
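
On the resizing point above, the type change itself is a small operation. A rough sketch with the current AWS command line tools (which post-date this post; the instance ID and type are placeholders, and the instance has to be stopped rather than merely rebooted for the change to take):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type Value=m1.large
aws ec2 start-instances --instance-ids i-0123456789abcdef0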

by Ian McDonald (noreply@blogger.com) at November 03, 2012 12:41 PM

October 26, 2012

Aristotle Pagaltzis

EarPods

Curiosity got the better of me: I succumbed to the hype and bought a set of Apple’s new EarPods. These are my thoughts on them after a week.

Basically, they sound OK. They are most certainly a huge improvement on the buds Apple used to make, but the sound quality is not amazing. More noteworthy is that I find them comfortable to wear for any length of time. They also maintain a good fit to the ear by themselves, twisting only slightly out of the optimal position when not held manually. (The old buds were rubbish in both these regards.) Quantitatively speaking they are fine value for the money (they don’t cost much!) but not a brilliant shopping choice.

There is one thing about them however that I have not seen remarked on anywhere else, which makes me not regret this purchase at all.

Maybe it is only owing to a peculiarity of my ears, but somehow the EarPods manage to imbue the low bass range with that subterranean quality of a great bass listening experience on large high-fidelity speakers.

I have never experienced headphones manage to reproduce this before. Circumaural speakers tend to make that bass range sound purely ærial; intra-aural, sealing buds tend to jackhammer it directly against the ear drums; non-sealing buds (of which I have only used cheap ones, admittedly) lack almost all punch. The EarPods somehow manage to drive the bottom end of music with respectable oomph while at the same time being subtle and understated about it.

They aren’t closed, so noisy environments will drown out their bass delivery efforts. But that seeming weakness yields a great upside: it is very comfortable for me to turn the volume up loud and keep it there for quite a stretch without ever getting fatigued by a relentless onslaught of bass – even though it is anything but weak or tinny. Part of that is also the consistently open and transparent sound at any volume level.

Their mediocre crispness at the top end can be distracting when you pay attention, however.

All in all, I am enjoying these as a workhorse set I can pop in to keep myself happy while preoccupied.

In conclusion: buy these not for the exceptional quality they are advertised for, but for their great comfort.

October 26, 2012 09:28 PM

October 18, 2012

Aristotle Pagaltzis

Glasnost Lives!, or: All Nations Under The Source, or: Linux

Alan Cox:

If you look at Linux contributions they come from everywhere. The core of the network routing code was written by Russians […] who worked at a nuclear research institute […]. We have code from government projects, from educational projects (some of which are in effect state funded), from businesses, from volunteers, from a wide variety of non profit causes. Today you can boot a box running Russian-based network code with an NSA-written ethernet driver.

October 18, 2012 05:57 PM

October 08, 2012

Aristotle Pagaltzis

Black

David Hill, of ThinkPad design fame:

It’s the color of power. It’s the color of death. It’s the color of sex. It’s the color of so many different things.

October 08, 2012 10:59 PM

October 04, 2012

Daniel Lawson

Entropy and Monitoring Systems

I use munin for monitoring various aspects of my servers, and one of the things munin will monitor for me is the amount of entropy available. On both my current server and my previous one I’ve noticed something unusual here:

According to munin, I’m almost perpetually running out of entropy. Munin monitors the available entropy by checking the value of /proc/sys/kernel/random/entropy_avail, which is the standard way you’d check it. My machine has several VMs running, and hosts a few services that use entropy at various times (imaps, ssmtp or smtp+tls, ssh, https), so it’s not unreasonable that I may have been entropy starved. If my entropy levels are always around the 160 mark, it’s likely that at any given time I’m totally starved of entropy, so anything using encryption will stall a bit.

I had a brief look into various entropy sources, such as timer_entropyd or haveged, but none of them seemed to help. I’d seen several references to Simtec’s entropykey, which looked very promising, so I ordered one from the UK, which arrived a week or so ago.

I’ve yet to arrange a trip to the datacentre to install it, however, and after a bit of poking around today I’m not so sure it’s as desperately needed as I thought.

I randomly checked on the contents of /proc/sys/kernel/random/entropy_avail, just to see what it was like. There were over 3000 bits of entropy present. Very odd. I repeated this several times, and watched the available entropy decrease from over 3000 down to around 150 or so, the same as in my munin graph above. I repeated this about a quarter of an hour later, with the same results – over 3000 bits of entropy, rapidly decreasing to very little.
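
Spot checks like this are just a matter of re-reading the same proc file, e.g.:

# Re-read the kernel's entropy estimate once a second.
watch -n 1 cat /proc/sys/kernel/random/entropy_avail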

After a bit of further digging, I found this blog post, which mentioned that creating a process uses a small amount of entropy. The author of that post was seeing problems with his entropy pool not staying full, which sounds like what I was seeing. I’m still not clear on what requires entropy though, as some of my systems at work clearly don’t deplete the entropy pool during process creation.

So, I did some different monitoring: checking the value of entropy_avail every minute, through a different script. The graph below shows the results:

Clearly, entropy is normally very good, but is dropping down to very low levels every 5 minutes. It replenishes just fine in the intervening 5 minutes however, which suggests that I don’t really have a problem with entropy creation, just with using it too quickly.
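
The once-a-minute sampling described above can be as simple as a loop along these lines (a minimal sketch, not the actual script used; the log path is an arbitrary choice):

#!/bin/bash
# Append a timestamped entropy reading to a log file every minute.
while true; do
    printf '%s %s\n' "$(date +%s)" "$(cat /proc/sys/kernel/random/entropy_avail)" >> /var/log/entropy.log
    sleep 60
done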

As for the question, “why is my entropy running out so fast?”, the answer is quite simple: Munin. On my host machine, munin runs around 50 plugins, each of which generally calls other processes such as grep, awk, sed, tr, etc. I don’t have exact figures on how many processes were being kicked off every 5 minutes, but I wouldn’t be surprised to find it was hundreds, all of which used a little bit of entropy.

I’ll still install the EntropyKey, and maybe it’ll help my pool recover quicker.

by daniel at October 04, 2012 03:56 AM

August 19, 2012

Aristotle Pagaltzis

Tweet on, tweeter

Twitter effectively say quoting a tweet on one’s site as a plain quotation is henceforth outlawed. Idiotic. I doubt they have a legal leg to stand on anyway, but that they would even want to do this is galling just the same. Even more galling to me is that by all I can tell, it appears that even my own tweets would technically be subject to these limitations if I myself chose to quote them elsewhere.

It’s not like I was very active on Twitter in recent times, but this move has completely soured me on the service.

When Twitter killed the ability to see all @-replies from your followees in your stream, even those to people you didn’t yourself follow, my enthusiasm dropped off a cliff. Remember that? As far as I’m concerned, that is when communal Twitter died. A lot of people quit in a huff. I stuck around, though the place was never the same again. Next, the client I was using (which was effectively unmaintained by then but had kept working) fell over dead when Twitter made OAuth a requirement. I never found a replacement both lightweight and inoffensive enough. (On Linux you could find either or, but not both. I have not tried again in a while.) So I’ve stuck around by using just the site, only poking in every once in a while because the site is not a convenient persistent client. My vague intention was to one day make a serious effort to find a new client and get back into it.

So much for that.

And their presumption in wanting to dictate to the world what they are allowed to do with, err, 140 characters of plain text makes me want to neither read nor write anything on Twitter any more.

So I am washing my hands of it.

Update: Hah! Ha ha. Not the reason I am irritated per se, but illustrative nonetheless.

August 19, 2012 01:40 AM

July 08, 2012

Aristotle Pagaltzis

Code that counts

Tom DeMarco:

My early metrics book, Controlling Software Projects: Management, Measurement, and Estimation (Prentice Hall/Yourdon Press, 1982), played a role in the way many budding software engineers quantified work and planned their projects. […] The book’s most quoted line is its first sentence: “You can’t control what you can’t measure.” This line contains a real truth, but I’ve become increasingly uncomfortable with my use of it.

Implicit in the quote (and indeed in the book’s title) is that control is an important aspect, maybe the most important, of any software project. But it isn’t. Many projects have proceeded without much control but managed to produce wonderful products such as Google Earth or Wikipedia.

To understand control’s real role, you need to distinguish between two drastically different kinds of projects:

  • Project A will eventually cost about a million dollars and produce value of around $1.1 million.

  • Project B will eventually cost about a million dollars and produce value of more than $50 million.

What’s immediately apparent is that control is really important for Project A but almost not at all important for Project B. This leads us to the odd conclusion that strict control is something that matters a lot on relatively useless projects and much less on useful projects. It suggests that the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value.

July 08, 2012 05:27 PM

June 29, 2012

Craig Box

Title of Record

Londoners take their titles very seriously. Filling in my name on the TFL's web site's "Contact Us" form, my options for Title are:

  • Ms
  • Mr
  • Mrs
  • Miss
  • Dr
  • Cllr
  • Prof
  • Sir
  • Not given
  • Air Cdre
  • Ambassador
  • Baron
  • Baroness
  • Brig Gen
  • Brother
  • Canon
  • Captain
  • Cardinal
  • Cllr Dr
  • Colonel
  • Commander
  • Count
  • Countess
  • Dame
  • Dowager Lady
  • Duchess of
  • Duke
  • Earl
  • Empress
  • Father
  • Fleet Admin
  • Gen
  • Gp Capt
  • Hon
  • Hon Mrs
  • HRH
  • Imam
  • Judge
  • Lady
  • Laird
  • Lieut Colonel
  • Lieutenant
  • Lord
  • Madam
  • Major
  • Major General
  • Marchioness
  • Marquess
  • Mayor
  • Pastor
  • Pc
  • Prince
  • Princess
  • Rabbi
  • Rev
  • Rev Dr
  • Revd Canon
  • Rt Hon
  • Rt Hon Baroness
  • Rt Revd
  • Sergeant
  • Sheikh
  • Sister
  • Sqn Ldr
  • Viscount
  • Viscountess
  • Wg Cd
  • Other

They list HRH, but not HM?  Surely, it's not unreasonable to assume that the Queen has complaints about service on the Underground?

 

by Craig at June 29, 2012 04:06 PM

May 31, 2012

Aristotle Pagaltzis

Magic

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. […]

Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

[…]

Not all is delight, however […] One must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn’t work. Human beings are not accustomed to being perfect, and few areas of human activity demand it.

Fred Brooks, The Mythical Man-Month

There I am, tattered-robed old man standing alone, beard whipping in the wind, with a distant stare (slightly mad), intently murmuring an unintelligible ramble under his breath… until something happens. To read this passage the first time was an arresting moment of revelation of a truth I had known without knowing, all along.

May 31, 2012 12:31 AM

May 21, 2012

Ian McDonald

Fixing your twitter being hacked

Often people find junk tweets or direct messages being sent out from their account and they are not sure how to stop them. For example the latest ones are direct messages like "Hi someone is posting nasty rumors about you..." or "Hi some person is posting nasty things about you..." or "Hi somebody is saying really bad things about you..." or sending out status updates such as "I lost weight without having to make any major diet changes while boosting energy levels, heres how: http://media-channel-8.com"

The reason for this is that your Twitter account has been compromised. This was probably due to you clicking on a bad direct message or going to a malicious website.

To fix this go into https://twitter.com/settings/applications and revoke every single application. These are web applications that you have given permission to at some time to use your Twitter account - all those applications listed can go and post spam if they are malicious. Don't worry about deleting ones you use - any legitimate application will ask for permissions the next time you go to use them.

After you have revoked the permissions go and change your Twitter password. If you have used this password elsewhere and you could have entered your Twitter password into a fake website then you must now go and change all the accounts that used the same password. When making new passwords consider using a totally different password for any financial websites.

To stop this happening in future always check on URLs before clicking on them from any website. If you can't do this as they are short URLs that don't supply the whole link (e.g. bit.ly, t.co) then right click on the link and select "open link in incognito mode" or similar such as private mode. This will then open them in another more secure browser window that will not have any of your logged in websites available to any potential malicious sites. Also make sure to read the URL properly every time you login or supply personal information. e.g. the websites on these direct messages from the compromised account show as twititre.com rather than twitter.com

by Ian McDonald (noreply@blogger.com) at May 21, 2012 08:30 PM

May 05, 2012

Aristotle Pagaltzis

An orphan olive branch to Mercurial

Git repository browsers have universally awful graph drawing algorithms.

For the longest time, one of my repositories has had two main branches, master and release. For a release, I would git merge --no-ff master into release. (Using --no-ff forces a commit on release even if release could be fast-forwarded to the current state of master. That way the act of cutting a release is always recorded in the repository.) Development happens on master, sometimes on branches. Topic branches are rebased before merging them back to master, once again using the --no-ff switch to record that a certain stretch of commits belonged to one topic together.
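
Spelled out as commands, that routine amounts to something like this (the topic branch name is illustrative):

# Cutting a release: force a merge commit even when a fast-forward would do.
git checkout release
git merge --no-ff master

# Folding a finished topic branch back into master.
git rebase master some-topic   # replay the topic on top of the current master
git checkout master
git merge --no-ff some-topic   # keep the topic's commits visibly grouped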

Essentially, this is a two-track history, with occasional short parallel side tracks on one side:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master
      \   \           \                    \   \       \   \
-------o---o-----------o--------------------o---o-------o---o    release

You would think that this would be easy to draw in a sane way.

And most of the time it is. But sometimes repository browsers decide to draw release on the other side of master. And as it happens, sometimes a topic falls by the wayside for a while. When these conditions coincide, drawing the stray heads from these topic branches and at the same time drawing release in such a way that the merge direction (from master into release) is correct suddenly requires snaking each release commit around all the previous ones. The result is a marshalling yard of parallel tracks (which I will not try to give an ASCII diagram of…) for representing what in reality is a very simple history. That makes it very difficult to make heads or tails of what really happened in the repository: a whole Black Forest out of just two trees.

There are some ordinary options to suppress this. The most obvious one would be to do a fast-forward merge of release back into master before picking up again. Doing so yields a triangular structure like this:

                           o--o--o--o
                          /          \
-o---o   o   o---o---o   /------------o---o---o   o   o---o   o---o  master
      \ / \ /         \ /                      \ / \ /     \ / \
-------o---o-----------o------------------------o---o-------o---o    release

Here there are no parallel tracks: the only unbroken track is the release branch, so no matter when and how any algorithm tries to draw this graph, it will be forced to string the commits into short side tracks alongside the release track. There is no likely way to turn this into a funhouse of illusory complexity.

Any solution that merges release into master in any way will have a very annoying drawback, however: you can no longer read the history of master without getting all of the release merges interspersed into it. This is all the worse if you never gave those merge commit messages much thought, because that means the history of release by itself consists of nothing but an endless row of “Merge ‘master’ into release”. And if that was bad enough by itself, it gets really irritating during periods when most commits are released immediately: the noise takes up a major part of your commit log.

Then an epiphany disrupted my long-standing dissatisfaction with the situation.

This is what the history in my repository looks like now:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master

-------o---o-----------o--------------------o---o-------o---o    release

That’s right: no merges.

Yet again, release is a single unbroken track. But now so is master. And since the branches are unconnected, it is never necessary to arrange them relative to each other, so they will always be drawn properly. And the master commit log remains clean and readable.

What I have done is make release an orphan branch that shares no history with master (created with git checkout --orphan). To cut a release, I check out release, then I get the tree from the commit I want to release and put that in a new commit on release. Obviously with this scheme I need to manually record the commit ID somewhere to be able to know what state of master a particular release corresponded to – there is no longer merge metadata to keep track of that. The commit message seems a natural place to record that information. I need to construct one in any case since Git does not know how to provide a default message for these commits like it does when merging a branch. Of course, the extended commit message is also a good place to put a list of commits that are hitching a ride on this release. I decided to put a release version (in my case, a simple incrementing integer) in the commit message subject as well, to make it easy to refer to a particular release.
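
For completeness, the one-time creation of such an orphan branch might look roughly like this; the details beyond the --orphan checkout itself are an assumption:

# Create a history-less branch. Its index still holds master's files,
# so clear them out and start from an empty commit.
git checkout --orphan release
git rm -rf .
git commit --allow-empty -m "begin release branch"
git checkout master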

Needless to say, I have the process automated. This is my release script:

#!/bin/bash
set -e
# Resolve the commit to be released (defaults to the tip of master).
commit=`git rev-parse "${1-master}"`
# The subject of the previous release commit reads "<number> @ <commit ID>";
# pull out the running release number and the previously released commit.
read num junk oldcommit <<<`git log --no-walk --format=%s release --`
# Compose the new commit message, then create a commit that reuses the
# released commit's tree with the current release tip as its sole parent,
# and finally advance the release ref to it.
(
  printf '%d @ %s\n\n' $((++num)) $commit
  git log --reverse --oneline --abbrev=12 --no-decorate --no-color $oldcommit..$commit
) \
| git commit-tree $commit^{tree} -p release \
| ( read new ; git update-ref refs/heads/release $new )
git push -f origin master release

Aside from the hard linkage by commit ID you also get a soft correlation by commit date if you ask git log and friends to use --date-order. This is sufficient for routine development work. Note that since the commit IDs are recorded, it is possible to use grafts to retrospectively (possibly temporarily) make the orphan release branch seem as though it were a mergeful branch.

A nice aspect of doing things this way is how easy it is to get a full diff of the total change represented by a release. With a merge-based release branch it takes fiddling to ask for that diff and enough knowledge to know how to.

And so I seem to have arrived at a poor (technically awkward, functionally very limited) reinvention of Mercurial’s named branches, using the plumbing provided by Git. This may be the only true use case for named branches that I can think of.

Update: I’ve rewritten the script to use lower-level plumbing. It no longer even checks out the tree, it just directly creates a commit object based on the tree object of the released commit.

May 05, 2012 05:00 PM

May 02, 2012

Aristotle Pagaltzis

D’uh

I recently discovered the -h switch of GNU sort, added in the coreutils 7.5 release from Aug 20, 2009. With this switch, sort will do a numeric sort of human-readable size numbers, i.e. it will accept “42M” and “1.3G” as numbers and put them in the right order. This led to the following shell one-liner in my ~/bin:

#!/bin/bash
exec du "${@--xd1}" -h | sort -h

It invokes du to print the disk space consumption of a directory tree, then sorts its output by size. If you pass any switches they will be passed on to du, else it will default to -xd1 (-x = stay on one filesystem, do not cross mountpoints; -d1 = do not print directories deeper than 1 level).

I gave this script the only name it could have – obviously, duh.

Update: turns out that the -d switch of du is even newer than sort’s -h switch. It was added for compatibility with FreeBSD in the coreutils 8.6 release from Oct 15, 2010 – prior to that it had to be spelled --max-depth, which rather complicates matters. You would have to do this:

#!/bin/bash
DEFAULT=(-x --max-depth=1)
exec du "${@-${DEFAULT[@]}}" -h | sort -h

That’ll win neither beauty nor concision contests.

May 02, 2012 12:52 PM

April 16, 2012

Aristotle Pagaltzis

In which I write about PHP for the first and the last time

Tim Bray wrote a short piece on PHP and kicked up a huge hullabaloo in the land of weblogs. Here’s my contribution to the echolalia.

Tim writes that it’s his experience that systems written in PHP are all spaghetti. I don’t think that’s a coincidence, and there are two sides to that coin.

One side, which all the PHP apologists are citing with full justification, is that its nonchalant everything-but-the-kitchen-sink “standard” library approach and its wide deployment present such a low barrier to writing and re-/deploying code that a lot of people who only have small needs are empowered to meet them on their own, however messily. I have argued that this enabling function is a good thing and I stand steadfastly by that position.

But on the other side, well, PHP is… lousy. Just wretched. Why?

  • The language – and here I’m talking only about the core, that is, syntax, type system, object orientation support, scoping rules, and the like – is limited and haphazard:

    Haphazard, because it was never designed in any fashion: it grew out of a templating system with hodgepodge constructs for which orthogonality was only an afterthought.

    Limited, because while it’s all dynamically typed and garbage collected, it squanders most of that advantage by limiting itself to the expressive power of C, roughly. Anonymous functions are awkward to create, and I’m not sure closures are possible in any practical fashion at all. Lists are second-class citizens, always bound to arrays. Attempts to introspect end up looking comical. The standard code modularization mechanisms (include, require) are simple-minded textual inclusions.

  • This is my big complaint: all the APIs are execrable. It starts with the built-in stuff: try to do something with the image functions or the zip file functions – I never figured out a way to avoid making my code look ugly. With the built-in library setting a bad example, it’s no surprise that the same issue extends to the packages available from the PEAR: awkward is the rule.

    In my opinion, that is what makes me and a lot of other people feel that PHP code resists being made clean. I feel the same way when I’m forced to use Tk in Perl: the API is so misshapen that you just can’t make your own code sitting on top of it look pretty.

  • The easy, obvious way to do things is often the incorrect one.

    There are lots of tutorials which will either not tell you to quote user input before interpolating it into SQL statements at all, or tell you to use addslashes for the purpose. In either case you are open to injection attacks – either gaping wide or just wide. What you really should do is use a function that respects the particular SQL dialect, such as mysql_escape_string… no wait, that’s dead code, I mean mysql_real_escape_string. Bleurgh. And once you’ve found out all that and understood it, properly quoting user input is still a pain in the buttocks and requires much more code than not bothering. Guess what casual coders will do? Now contrast Perl’s DBI, where using bind parameters is just as easy as not; and in fact, makes the code easier to read.

    How about working with strings in an encoding-aware fashion? That means nothing short of rolling your own string munging, with “help” from some typically byzantine APIs – what fun! Which novice is going to know that they should? Who is going to bother? How many of them will get it right?

All of these flaws are interconnected; the morass is simply the result of the language being a templating system that grew too big for its breeches. I don’t believe the problems can be corrected in any sensible fashion; PHP will always be a templating system, however much it may be straining against its clothes.

And let me tell you, it’s still a great templating system! If all you need is to write a web app that consists of two pages, running four queries over a five-table database, there is nothing that will get you up and running faster.

But that doesn’t make it suitable for large-scale systems. It’s not that the premise does not scale, it’s just that this particular implementation of the premise does not. Apologists will sometimes argue that the flaws are a necessary evil in achieving the low barrier to entry; worse-is-better style. I don’t buy that argument for a second. There is no reason that a language could not be designed to address the precise problem space that PHP aims at, but be created from scratch to be big enough for its britches, without the slipshod, organic growth. There is no reason it would have to be any harder to get things done with a standard library that encourages good practices as the obvious and easy way to accomplish things.

PHP is ripe for having its lunch eaten, really.

Update: Eevee rants about it, comprehensively.

April 16, 2012 08:02 AM

April 08, 2012

Ian McDonald

Building for Amazon

Recently there has been a bit of comment from a couple of people around an article that appeared about using Amazon Web Services (AWS).

To be honest I'm more than a bit annoyed about how they were written up. The article made it sound like I'd given a talk about my use of AWS at work, or had helped write an article. I hadn't done either, but was part of a panel discussion around cloud. I do note that one of the other panellists was similarly written up too... Most of the article was from an answer to a question about what was good and what was bad about Amazon, with other bits picked out from other things. Some of it was, if not quite sensationalised, close to it, and I didn't know the article was happening, nor was I given a chance to review it (something that has always happened before, and I've spoken at around 20 events).

So first of all I want to go on the record and say I am a strong supporter of Amazon. They, Google and Salesforce are the companies who have done more than anything else to push cloud forward. Also anybody else who has heard me speak knows I am a strong proponent of AWS. Going forward though I think I'll just refuse to talk about any areas of improvement needed, but focus on the strengths. I have been in communication with AWS staff to tell them my view hasn't changed and I back them 100%.

So I'd now like to focus on what I was intending to convey, and did convey but wasn't reported, at the panel discussion. Building a system on Amazon does not mean that you stop making proper design decisions. Some people have assumed that because Amazon is such a great company, they can forget about system architecture and everything will be fine. You can't. You wouldn't build your physical machines or VMWare and not do backup, do no performance tuning, and have no redundancy. Most of what people use Amazon for is for running machines (they do do many other things too, like a CDN and NoSQL on demand etc). So you need to design your systems. I learnt this many years ago.

When it comes to performance you can't necessarily throw everything at it on AWS like you can with a traditional architecture, as you don't know the full underlying design and you can't fix your bottleneck in a traditional way. But running your on-prem architecture in this way can cost you absolutely millions. There are numerous cases of media organisations, life science companies etc doing large batch processing for hundreds or thousands of dollars that they just couldn't do cost effectively before, and doing it very quickly also. What you do need to do for AWS though is try and parallelise your workload wherever possible, as Amazon works well in this model. You can also vary your machine size as you need to - and now that Amazon allows 64 bit machines for all size types it makes this even easier, and saves more money. So you can go all the way from a free Micro instance up to new Sandy Bridge based machines without rebuilding your image.

Make sure with Amazon that you have machines running in another area as redundancy and you have a way of activating them. You wouldn't run your own data centres like this, so why do it differently on Amazon. The "horror" stories around when businesses struggled when an Amazon Availability Zone (AZ) went down are really a horror story about bad architects in my mind.

Backup. Yes, that is a command! Any IT exec who doesn't ensure that their data is backed up needs to be shot. Why should this be any different on Amazon? Firstly people had problems as they didn't realise that Amazon can lose changes when you do some kinds of restarts (as config runs in RAM in effect), then they didn't realise that only backing up in the data centre was a bad idea (EBS backing for the EC2 instance). Amazon do make it easy here as S3 allows you to do quick, cost efficient backups - how many other services are designed for 11 9s - that is 99.999999999%? Again, if you get data loss it is not Amazon at fault, but your architects.

Amazon do status reporting online here. Do you get that from your other vendors, or internally in your own IT function? AWS are to be applauded for their transparency. The one corruption event I referred to was documented on the status page (an EBS fault occurred). It should be noted that no data was lost at all from this as it was failed over to another system. I have had on-prem failures in the past where I have lost data or took a long time to recover. This incident was all sorted in less than an hour - AWS allows you to build solutions like this easier in many cases to avoid this problem having high impact.

So in short would I go with Amazon AWS again? Absolutely!! I have never had any significant downtime with them in my roles, and it has saved money and been extremely flexible.

NB This is all my own opinion, and not that of my employer - something I also said at the panel discussion but was also omitted.

by Ian McDonald (noreply@blogger.com) at April 08, 2012 08:50 PM

March 28, 2012

Aristotle Pagaltzis

Bug of the week

Lukas Mai:

The following code is somewhat silly, but gcc should either compile it correctly or print an error message, not generate invalid asm[:]

int $1 = -1;
int main(void) { $1++; return $1; }

Assembler injection attacks, here we come!

March 28, 2012 08:53 AM

March 25, 2012

Aristotle Pagaltzis

Six Stages of Debugging

  1. That can’t happen.

  2. That doesn’t happen on my machine.

  3. That shouldn’t happen.

  4. Why does that happen?

  5. Oh, I see.

  6. How did that ever work?

[This is not mine. Its oldest mention I could track down on the web appeared on a now-defunct weblog. I am posting it in the interest of personal archival.]

March 25, 2012 07:59 AM

March 19, 2012

Aristotle Pagaltzis

Shoestring & bubblegum sound server

In which I beat MacGyver.

I recently had need to play sound on a headless Linux machine. I started looking into sound servers, but everything I found seemed a significant amount of work to set up. I tried to reduce the problem to the fundamental parts involved, and by a trail of hints winding through a narrow mountain pass arrived at a rather… minimalist solution to fit my minimalist needs. I did not require anything else than to be able to hear sound at all and the solution did not require anything else of me than ALSA – and it’s hard to install a Linux machine without ALSA these days.

The entirety of the charade amounts to this:

  1. On the speakerless machine, load the loopback ALSA driver:

    modprobe snd-aloop index=0 pcm_substreams=1

    The driver provides a card with two sound devices, and when sound is output onto a stream on one device then the driver mirrors that as an input available on the same stream on the other device.

  2. Configure sound with an .asoundrc like this:

    pcm.!default {
      type dmix
      slave.pcm "hw:Loopback,0,0"
    }
    pcm.loop {
      type plug
      slave.pcm "hw:Loopback,1,0"
    }

    This has programs default to outputting sound to stream 0 of device 0 of the loopback (pseudo-)card, and has ALSA mixing their outputs together (type dmix). The loopback driver will make the resulting sound available for sampling via stream 0 of device 1, for which the configuration sets up another source called loop as a simple alias (type plug).

  3. On the machine with speakers you can then do this:

    ssh -C speakerless sox -q -t alsa loop -t wav -b 24 -r 48k - | play -q -

    The -t alsa portion configures sox’s input to use the ALSA type, and the loop argument that follows (which is where normally a filename would be given) gives the name of the source – the loop alias from the configuration above. The rest of the switches tell sox to output 24-bit, 48 kHz WAV to standard output, to be picked up by ssh.

  4. Now play something on the speakerless machine.

This will push a constant stream of sample data down the wire, even during silence; with SSH compression enabled as it is here, that will come to something like 4 KB/sec and will very slightly busy the CPU on both machines. Both resource drains stop if you break the SSH connection. You can do so at all times without sound-playing programs on the speakerless machine ever noticing.

The one and only real drawback is a playback latency of a few fractions of a second – enough to be noticeably not in real time.

But as I said, I had minimal needs of it.

[Update: added explanations.]

March 19, 2012 03:52 PM

March 16, 2012

Matt Brown

Kindle Reading Stats

I’ve written before about my initial investigations into the Kindle, and I’ve learnt much more about the software and how it communicates with the Amazon servers since then, but it all requires detailed technical explanation which I can never seem to find the motivation to write down. Extracting reading data out of the system log files is however comparatively simple.

I’m a big fan of measurement and data so my motivation and goal for the Kindle log files was to see if I could extract some useful information about my Kindle use and reading patterns. In particular, I’m interested in tracking my pace of reading, and how much time I spend reading over time.

You’ll recall from the previous post that the Kindle keeps a fairly detailed syslog containing many events, including power state changes, and changes in the “Booklet” software system including opening and closing books and position information. You can eyeball any one of those logfiles and understand what is going on fairly quickly, so the analysis scripts are at the core just a set of regexps to extract the relevant lines and a small bit of logic to link them together and calculate time spent in each state/book.

You can find the scripts on Github: https://github.com/mattbnz/kindle-utils

Of course, they’re not quite that simple. The Kindle doesn’t seem to have a proper hardware clock (or mine has a broken hardware clock). My Kindle comes back from every reboot thinking it’s either at the epoch or somewhere in the middle of 2010; the time doesn’t get corrected until it can find a network connection and ping an Amazon server for an update, so if you have the network disabled it might be many days or weeks of reading before the system time is updated to reality. Once it has a network connection it uses the MCC reported by the 3G modem to infer what timezone it should be in, and switches the system clock to local time. Unfortunately the log entries all look like this:


110703:193542 cvm[7908]: I TimezoneService:MCCChanged:mcc=310,old=GB,new=US:
110703:193542 cvm[7908]: I TimezoneService:TimeZoneChange:offset=-25200,zone=America/Los_Angeles,country=US:
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=localTimeOffsetChanged,arg0=-25200,arg1=1309689302:
110703:193542 cvm[7908]: I TimezoneService:LTOChanged:time=1309689302000,lto=-25200000:
110703:183542 system: I wancontrol:pc:processing "pppstart"
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=dataStateChanged,arg0=2,arg1=:
110703:183542 cvm[7908]: I ConnectionService:LipcEventArrived:source=com.lab126.cmd,name=intfPropertiesChanged,arg0=,arg1=wan:
110703:183542 cvm[7908]: W ConnectionService:UnhandledLipcEvent:event=intfPropertiesChanged:
110703:193542 wifid[2486]: I wmgr:event:handleWpasupNotify(<2>CTRL-EVENT-DISCONNECTED), state=Searching:
110703:113542 wifid[2486]: I spectator:conn-assoc-fail:t=374931.469106, bssid=00:00:00:00:00:00:
110703:113542 wifid[2486]: I sysev:dispatch:code=Conn failed:
110703:183542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wifid,name=cmConnectionFailed,arg0=Failed to connect to WiFi network,arg1=:

Notice how there is no timezone information associated with the date/time information on each line. Worse still the different daemons are logging in at least 3 different timezones/DST offsets all interspersed within the same logfile. Argh!!

So our simple script that just extracts a few regexps and links them together nearly doubles in size to handle the various time and date convolutions that the logs present. Really, the world should just use UTC everywhere. Life would be so much simpler.

The end result is a script that spits out information like:

B000FC1PJI: Quicksilver: Read 1 times. Last Finished: Fri Mar 16 18:30:57 2012
- Tue Feb 21 11:06:24 2012 => Fri Mar 16 18:30:57 2012. Reading time 19 hours, 29 mins (p9 => p914)

...

Read 51 books in total. 9 days, 2 hours, 29 mins of reading time

I haven’t got to the point of actually calculating reading pace yet, but the necessary data is all there and I find the overall reading time stats interesting enough for now.

If you have a jailbroken Kindle, I’d love for you to have a play and let me know what you think. You’ll probably find logs going back at least 2-3 weeks still on your Kindle to start with, and you can use the fetch-logs script to regularly pull them down to more permanent storage if you desire.

by matt at March 16, 2012 11:08 PM

January 23, 2012

Ian McDonald

Cloud computing and location of data

Disclaimer: I'm not a lawyer so this is not legal advice, and these views do not represent my employer's views either.

One of the big elephants in the room with cloud computing is the location of data. People are naturally worried about whether their data is accessible by others or not. Some providers will tell you the location of the data, some will not. There are also the issues of the Patriot Act and safe harbour when interacting with technology providers across the Atlantic.

The Patriot Act requires a US-based corporation to hand over data to the government, and if they are a service provider they do not have to disclose the request to the end customer either. As far as I can understand, you are not protected any further even if the data is in the EU or another region. The defining factor is whether the provider is a US-based company.

One thing that is mentioned often is safe harbour. Basically what safe harbour means is that the US-based provider will adhere to the same standards as the EU requires. This is because US data protection is basically non-existent. The safe harbour provisions do NOT mean your data will reside in the EU; they just mean that it will be protected to the same standard as in the EU.


Of course none of this matters if you work for a global corporation headquartered in the USA anyway, as then you are required to hand data over to the government if requested under the Patriot Act, as I read it. The difference is whether you know that the government is accessing your data. The government could request your data from you, but may not need to if it can go to your cloud supplier, who is also a US-based corporation.


It is common sense to encrypt sensitive data, whether you store it in the cloud or on premises. This is especially important for data such as customer or employee data whose exposure would cause damage - either real financial loss or damage to reputation.

I believe that there will be a rise in cloud encryption services, e.g. GPG-type plugins for Gmail. Amazon already has a service for S3 called Server-Side Encryption. With this service you give your keys to Amazon, which seamlessly encrypts/decrypts data on the fly. However, what this means is that Amazon could give your data to the US government without your knowledge under the Patriot Act. As such, in my mind the only reason anybody would use this would be for low-value data, and I would not consider an encryption service for email where the vendor controls the private keys.
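For anything more valuable, the alternative is to encrypt on your own machines before upload, so the provider only ever holds ciphertext. A minimal sketch of that idea, assuming the Python 'cryptography' and 'boto3' packages (an arbitrary choice of tools for illustration, not anything the provider supplies; the file and bucket names are placeholders), with key management then entirely your own problem:

import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # keep this somewhere the cloud provider can never reach
cipher = Fernet(key)

with open("payroll.csv", "rb") as fh:
    ciphertext = cipher.encrypt(fh.read())

# The provider stores only ciphertext; without the key it has nothing useful to hand over.
boto3.client("s3").put_object(Bucket="my-bucket", Key="payroll.csv.enc", Body=ciphertext)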

One aspect that people often overlook is not so much government regulations, but their own rules. What do your customer policies say, and what do your own staff policies say? For example, your HR policy may say that all personnel data will be stored in the UK, or your customer terms and conditions might say that all data will be stored in the EU. Many cloud services might not be based in the EU, and there are very, very few in the UK. There can also be obscure regulations specific to your industry, e.g. in a previous role the code master had to be in the UK, as cryptographic code was considered a weapon and needed an export license.

It should be noted that complying with privacy regulations by storing data in the EU does not mean that it cannot be taken under the Patriot Act. In these cases it is assumed that the US government is the evil one, but I have no reason to believe that the UK or any other government is any less nefarious.

My conclusion is that it is safe to store data in the cloud if a company adheres to safe harbour, and that this is probably better than most companies' own data protection. If, however, you are worried about your data falling into government hands, then you need to look into it very carefully. The only really safe way to protect your data from governments is to encrypt it yourself, with encryption keys that only you hold.

Useful reference articles:
ZDNet Article: How the USA Patriot Act can be used to access EU data
Wikipedia article on the Patriot Act
Wikipedia article on safe harbor
Ars Technica article on Patriot Act and cloud providers

by Ian McDonald (noreply@blogger.com) at January 23, 2012 10:17 AM

January 16, 2012

Aristotle Pagaltzis

Status quo awareness

Paul Graham:

The trick I recommend is to take yourself out of the picture. Instead of asking “what problem should I solve?” ask “what problem do I wish someone else would solve for me?”

January 16, 2012 11:38 AM

January 14, 2012

Aristotle Pagaltzis

Concise XPath

I get the impression that not many people know XPath, or know it very well, which is a shame. For one, it’s a beautifully concise notation (as you’ll see shortly). For another, it may be the difference between whether you hate XML or not. (I won’t claim it’ll make you like XML, though it may. It did for me.)

XPath is really very simple: you just string together conditions. Evaluation begins with an initial set of nodes. At each condition, a new set of nodes is selected based on the current ones, and the condition is checked against this new set. If it’s a condition you appended with /, that means to then select the matching nodes for the next step. If you appended it inside [], that means to continue on with the original set, but to discard those nodes for which there were no matching new nodes.

So /foo/bar means this:

  1. Start with the root node.
  2. Then /foo: for each node (which is just the root node, so far), fetch its child nodes (of which the root node always has exactly one), check which ones are foo elements, and take those as the new set.
  3. Then /bar: for each node, fetch its child nodes, check which ones are bar elements, and take those as the new set.

These conditions appended with / are known as steps.

And /foo[bar] means this:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then [bar]: for each node, fetch its child nodes, check if any are bar elements, and if you come up empty then discard that node.

This is known as a predicate. Each predicate can itself be just as complex as any expression: it can itself contain steps and predicates.

Finally, there are axes, written as prefixes separated with a ::. Axes specify which set of nodes to select before checking the condition – it doesn’t have to be the child nodes of the current set, that’s just the default axis (which you don’t need to write) called child::. So you can write e.g. /foo/following-sibling::bar:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then /following-sibling::bar: for each node, fetch all its siblings, check which are bar elements, and then take those as the new set.

(Thus /foo/bar and /foo[bar] really mean /child::foo/child::bar and /child::foo[child::bar] respectively. Therefore each condition also includes a selection rule, often implicitly.)

Compare expressions and explanations and you see what I said about concision and beauty.

Now, with those principles given to you, just string together conditions. There are a few syntactic shortcuts other than not needing to write child::, e.g. you can write attribute::foo as @foo, and /descendant-or-self::foo can be written //foo, but there is no magic to those: they are just sugar. For the details – lists of possible axes, syntactic shortcuts, etc. – just refer to the standard. Lousy though it may be as an introduction, it makes a good reference.

That’s XPath.
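To see those rules in action, here is a small sketch using Python’s lxml (an arbitrary choice of engine for illustration; any XPath implementation behaves the same way on these expressions):

from lxml import etree

doc = etree.fromstring("<foo><baz/><bar>first</bar><bar>second</bar></foo>")

# /foo/bar: from the root, the foo children, then their bar children.
print(doc.xpath("/foo/bar"))                         # both bar elements

# /foo[bar]: the foo children of the root, kept only if they have a bar child.
print(doc.xpath("/foo[bar]"))                        # the foo element itself

# /foo/baz/following-sibling::bar: siblings after baz that are bar elements.
print(doc.xpath("/foo/baz/following-sibling::bar"))  # both bar elements again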


Some practical notes:

With the various axes such as following-sibling::, you always get a whole set (e.g. all following siblings in this example). If you want a specific one from that set based on position – usually the first –, you have to discard the ones you aren’t interested in by using a predicate that checks the position – in that case [1], which is another shortcut notation, standing for [position() = 1]. The position() function evaluates to the index of a node within its subset, which is based on the node it was selected for.

So a common construction is following-sibling::*[1], which amounts to “the element whose start-tag is right after this one’s end-tag.” A somewhat likely case is to further combine this with a [self::foo] predicate to say “but only as long as that is a foo element.”

Observe that the order of predicates matters.

If you write *[self::foo][1], you get all child elements, then narrow that down to the foo elements, then to the first of them – so it amounts to “select the first foo child”, which is identical to the much simpler expression foo[1]. This is very different from *[1][self::foo], which first narrows down “everything” to “the first thing” and only then checks “but only if it’s a foo.”
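A quick sketch of that difference, again using lxml purely for illustration:

from lxml import etree

root = etree.fromstring("<root><baz/><foo>a</foo><foo>b</foo></root>")

# *[self::foo][1]: all child elements, narrowed to the foo elements, then the first of those.
print([e.text for e in root.xpath("*[self::foo][1]")])    # ['a']

# *[1][self::foo]: the first child element, kept only if it is a foo; here it is a baz.
print(root.xpath("*[1][self::foo]"))                       # []

# following-sibling::*[1][self::foo]: the element right after baz, but only if it is a foo.
print([e.text for e in root.xpath("baz/following-sibling::*[1][self::foo]")])  # ['a']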

January 14, 2012 03:58 AM

January 09, 2012

Aristotle Pagaltzis

The essentially mediocre

MG Siegler:

If you’re saying something that you think is great, why would you want to do it as a comment on another site anyway?

January 09, 2012 07:08 AM

December 07, 2011

Aristotle Pagaltzis

Spend money on… which is it, now

Maciej Ceglowski:

To avoid this problem, avoid mom-and-pop projects that don’t take your money! You might call this the anti-free-software movement.

But it’s not! It’s the anti-free-service movement. Which I whole-heartedly support.

(Maciej makes that point himself, eventually and obliquely, but not until after the catchy coinage…)

December 07, 2011 05:54 PM

December 04, 2011

Craig Box

Three months with the TouchPad

I first started writing this post on 2 September 2011. It was going to be called "three days with the TouchPad". I'd like to say that my opinion has changed substantially over the three months since then, but for that to have happened, I would have had to spend serious time with the device.

I haven't.

The last time anyone in our house tried to use the TouchPad, it got thrown on the couch in disgust.1 Our iPad, by contrast, is happily used every day. Is this just a case of "you get what you pay for"?

The story so far

I fought my way through the broken websites to purchase an £89 HP TouchPad when they cleared their stock at the end of August. I couldn't be sure that Carphone Warehouse had stock for all their orders, so I was overjoyed when mine was marked "dispatched" later in the week. Then, it never arrived. I wasted hours on the phone with CPW and Yodel (cheap courier of choice for "free delivery" everywhere), who claimed it had been delivered, when no knock had ever graced my door. The driver only spoke Bulgarian, and intimated (through a translator and wild hand gesturing) that he had given it to someone who had come up from the stairs below us - an empty flat.

I had all but given up on the delivery when, after the weekend, our neighbour came over and said their housekeeper had collected it on Friday and had it the whole time.

Argh.

Eventually, thanks to people like me, the TouchPad ended up getting 17% of the market!

Of everything that wasn't the iPad.

(So, more like 1.8% then.)

And remember, I very nearly wasn't a member of that club, as it seemed very unlikely that Carphone Warehouse would have been in a position to give me another one, had the first one not surfaced.

The TouchPad was an impulse buy, as we already owned an iPad. For the iPad I had opted for the middle of the range - the 32GB with 3G.2 My iPad cost 7 times what the TouchPad did at its clearance price, but remember that the original retail pricing for a comparable device was £399 for HP vs £429 for Apple.

With all that in mind, here's a collection of thoughts about the TouchPad today. It is not a review: if you are interested in a review, albeit one from before the fire-sale, go read what Shawn Blanc wrote. The experience has hardly changed.

The good

I came into TouchPad ownership with a very open mind, based in part on my ex-colleague Sergei owning a Palm Pré and not hating it. Also, everything I read about webOS online made it seem that it was designed, where Android was mostly congealed. (My apologies to Douglas Adams.) Further, I wanted webOS to be a success, because I like to use systems that feel like they are consistently designed throughout, and I didn't think it would be good for the world if iOS was to be the only relevant platform for which that was true. We are in the odd position today that Microsoft has replaced Palm as the loveable underdog: Windows Phone (and possibly Windows 8 for tablets) has taken the mantle of "mobile operating environment which actually has some modern design principles applied, rather than just copying iOS", which surely must provoke some cognitive dissonance for all the people still bitter about how Microsoft stole everything from the Mac.

I only made one note from three days after unboxing: "It is really handy to have the number keys on the keyboard all the time". It still is. I suppose there are other nice things, depending on your point of comparison. Notifications are good, in general, though I really don't care that each web site I visit exposes a search endpoint, so I don't appreciate that the TouchPad displays a notification for each one and tries to add them to the search.

Grasping at straws, I still like the card metaphor, though not as much for multiple tabs as for multiple applications. And the things that were good about webOS on the phone, such as the integrated contacts, are still good here, though not as useful. The only other thing I noticed in a quick look through the menus is that it has Beats Audio, which I like to think makes me one step closer to Dr Dre than most. I don't think I've ever actually tried to make the thing play audio in order that I might notice a difference.

The goblin

How long after the horse died is it acceptable to still be flogging it?

The TouchPad is slow, out of the box. Nerds like me can make it faster with - wait for it - syslogd and kernel patches, and even overclock it if they feel the need. (I didn't.)  The iPad 1 still runs rings around it in everything - even though the iPad has half the CPU cores at a much lower clock speed, and one quarter the RAM of the TouchPad.

It has a handful of apps, but not enough to retroactively justify the purchase to me, even at £89. If I go to my Applications list, I have a beta Kindle reader, which I had to side-load as it is US only; the best Twitter experience is something called "Spaz HD Beta Preview 2", which is both award-winning and open source, though apparently named by the people who came up with "The GIMP". In fairness, it's not bad; it's just not up to the experience available on any one of the great Twitter clients for other platforms. And with the on-again, off-again abandonment by HP, surely most of those who came to the TouchPad did so eyes-open, knowing the chances of it ever developing a good app ecosystem were not high.

Most of what I do on a tablet is web browsing, and so even if it had no apps but did web browsing brilliantly, it might be redeemed. It doesn't. It has Flash, which really just serves to make YouTube worse. Maps are horrible, scrolling is slow and sluggish, and clicking doesn't normally hit the link you want it to.

Physically, it feels cheap, due to the plastic back.  It is a good weight however.

The purchase

In my mind, there were three groups of people who wanted to buy a TouchPad at fire sale prices:

  • People who wanted a "tablet" (iPad), but couldn't afford or justify one at market (iPad) prices
  • People who wanted an "Android tablet" and figured that a port couldn't be far away
  • People who liked webOS and actually wanted a TouchPad to use webOS on it

I was in the third group, but I also suspect that was about 1.8% of the people who actually got the device.

If you were to compare the experience on an £89 TouchPad vs. whatever else you could legitimately purchase for £89 - how long were the queues for the Binatone HomeSurf 7? - it seems like a no-brainer. If there had been no chance that the tablet would ever be able to run Android, I don't think it would have sold nearly as quickly. At the time of writing there is an alpha-quality CyanogenMod release of Android for the TouchPad, aimed at developers rather than end users. With the recent release of Android 4.0, it's likely there will be a reasonably good upgrade path for the application story, and on this kind of hardware Android should be about as good as it is on any other kind of hardware.

I bemoaned this fact when I came to buy it:


I wish I could find everyone talking about running Android on the hp TouchPad, and STAB THEM IN THE FACE.
@craigbox
Craig Box

Three months later, has my attitude changed? Somewhat. I simply don't want to own an Android tablet. (Neither do many other people, as we established before.) Would it be better on this hardware than webOS? Probably. Ask me again when 4.0 is released for the TouchPad - I don't think the attempts to shoehorn Android 2.x onto tablets have served hackers any better than they served Samsung.

I don't think there can be any argument that the fire sale was a dumb idea, and HP's CEO eventually paid the price. Would I have paid £200 for this? No, but they would still have sold out at that price.

The summary

First world problems much? Our two-tablet household isn't as good as it would be if we had an iPad each. Sure. I knowingly bought an £89 gadget to have a play with, and I suspect I could easily get that back if I wanted to sell it. Alternatively, if either of my brothers reads my blog, I might be convinced to post it to them for Christmas. Over time, I think I might find a use for it - if I could pick up the Touchstone dock-slash-stand, I think it could make a great digital photo frame. Even if all it ever did was be an LCD Kindle, it was still a bargain.

But the crux is that neither of us ever want to use it. It almost got put in the cupboard today. Attempts to use it provoke disgust, throwing it back onto the couch, and getting up to find the iPad. There is really nothing redeeming about it.

  1. Fern later clarified: "It wasn't thrown on the couch, it was thrown at the couch."
  2. If I were to look back on that purchase, I would say the money spent on the 3G was mostly wasted - tablet usage is mostly at home. The iPad spent over a year without a 3G SIM card, though it has one now thanks to Arunabh, who pointed out that T-Mobile have a remarkable 12 months free on an iPhone 4 PAYG SIM, and the iPad takes the SIM quite happily.

by Craig at December 04, 2011 05:51 PM

December 02, 2011

Stuart Yeates

Prep notes for NDF2011 demonstration

I didn't really have a presentation for my demonstration at the NDF, but the event team have asked for presentations, so here are the notes for my practice demonstration that I did within the library. The notes served as an advert to attract punters to the demo; as a conversation starter in the actual demo and as a set of bookmarks of the URLs I wanted to open.




Depending on what people are interested in, I'll be doing three things

*) Demonstrating basic editing, perhaps by creating a page from the requested articles at http://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles

*) Discussing some of the quality control processes I've been involved with (http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion and http://en.wikipedia.org/wiki/New_pages_patrol)

*) Discussing how wikipedia handles authority control issues using redirects (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Redirect ) and disambiguation (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Disambiguation )

I'm also open to suggestions of other things to talk about.

by Stuart Yeates (noreply@blogger.com) at December 02, 2011 02:25 PM

December 01, 2011

Stuart Yeates

Metadata vocabularies LODLAM NZ cares about

At today's LODLAM NZ, in Wellington, I co-hosted a vocabulary schema / interoperability session. I kicked off the session with a list of the metadata schema we care about and counts of how many people in the room cared about it. Here are the results:

8 Library of Congress / NACO Name Authority List
7 Māori Subject Headings
6 Library of Congress Subject Headings
5 SONZ
5 Linnean
4 Getty Thesauri
3 Marsden Research Subject Codes / ANZRSC Codes
3 SCOT
3 Iwi Hapū List
2 Australian Pictorial Thesaurus
1 Powerhouse Object Names Thesaurus
0 MESH

This straw poll naturally only reflects the participants who attended this particular session, and counting was somewhat haphazard (people were still coming into the room), but it gives a sample of the scope.

I don't recall whether the heading was "Metadata we care about" or "Vocabularies we care about," but it was something very close to that.

by Stuart Yeates (noreply@blogger.com) at December 01, 2011 08:36 PM

November 30, 2011

Stuart Yeates

Unexpected advice

During the NDF2011 today I was in "Digital initiatives in Māori communities" put on by the talented Honiana Love and Claire Hall from the Te Reo o Taranaki Charitable Trust about their work on He Kete Kōrero. At the end I asked a question "Most of us [the audience] are in institutions with te Reo Māori holdings or cultural objects of some description. What small thing can we do to help enable our collections for the iwi and hapū source communities? Use Māori Subject Headings? The Iwi / Hapū list? Geotagging? ..." Quick-as-a-blink the response was "Geotagging." If I understood the answer (given mainly by Honiana) correctly, the point was that geotagging is much more useful because it's much more likely to be done right in contexts like this. Presumably because geotagging lends itself to checking, validation and visualisations that make errors easy to spot in ways that these other metadata forms don't; it's better understood by those processing the documents and processing the data.

I think it's fabulous that we're getting feedback from indigenous groups using information systems in indigenous contexts, particularly feedback about previous attempts to cater to their needs. If this is the experience of other indigenous groups, it's really important.

by Stuart Yeates (noreply@blogger.com) at November 30, 2011 09:27 PM

November 26, 2011

Stuart Yeates

Goodbye 'social-media' world

You may or may not have noticed, but recently a number of 'social media' services have begun looking and working very similarly. Facebook is the poster-child, followed by google+ and twitter. Their modus operandi is to entice you to interact with family members, friends and acquaintances, and then leverage your interactions to both sell your attention to advertisers and entice other members of your social circle to join the service.

There are, naturally, a number of shiny baubles you get for participating in the sale of your eyeballs to the highest bidder, but recently I have come to the conclusion that my eyeballs (and those of my friends, loved ones and colleagues) are worth more.

I'll be signing off google plus, twitter and facebook shortly. I may return for particular events, particularly those with a critical mass the size of Jupiter, but I shall not be using them regularly. I remain serenely confident that all babies born in my extended circle are cute; I do not need to see their pictures.

I will continue using other social media (email, wikipedia, irc, skype, etc) as usual. My deepest apologies to those who joined at least partly on my account.

by Stuart Yeates (noreply@blogger.com) at November 26, 2011 09:59 PM

November 24, 2011

Matt Brown

How I’m voting in 2011

It’s general election time again in New Zealand this year, with the added twist of an additional referendum on whether to keep MMP as our electoral system. If you’re not interested in New Zealand politics, then you should definitely skip the rest of this post.

I’ve never understood why some people consider their voting choices a matter of national security, so when, via Andrew McMillan, I saw a good rationale for why you should share your opinion, I found my excuse to write this post.

Party Vote
I’ll be voting for National. I’m philosophically much closer to National than Labour, particularly on economic and personal responsibility issues, but even if I wasn’t, the thought of having Phil Goff as Prime Minister would be enough to put me off voting Labour. His early career seems strong, but lately it’s been one misstep and half-truth after another, and the remainder of the Labour caucus and their likely support partners don’t offer much reassurance either. If I were left-leaning and the mess that Labour is in wasn’t enough to push me over to National this year, then I’d vote Greens and hope they saw the light and decided to partner with National.

Electorate Vote
I live in Dublin, but you stay registered in the last electorate where you resided, which for me is Tamaki. I have no idea who the candidates there are, so I’ll just be voting for the National candidate for the reasons above.

MMP Referendum
I have no real objections to MMP and I think it’s done a good job of increasing representation in our parliament. I like that parties can bring in some star players without them having to spend time in an electorate. I don’t like the tendency towards unstable coalitions that our past MMP results have sometimes provided.

Of the alternatives, STV is the only one that I think should be seriously considered; FPP and its close cousin SM don’t give the proportionality of MMP, and PV just seems like a simplified version of STV with limited other benefit. If you’re going to do preferential voting, you might as well do it properly and use STV.

So, I’ll vote for a change to STV, not because I’m convinced that MMP is wrong, but because I think it doesn’t hurt for the country to spend a bit more time and energy confirming that we have the right electoral system. If the referendum succeeds and we get another referendum between MMP and something other than STV in 2014, I’ll vote to keep MMP. If we have a vote between MMP and STV in 2014 I’m not yet sure how I’d vote. STV is arguably an excellent system, but I worry that it’s too complex for most voters to understand.

PS. Just found this handy list of 10 positive reasons to vote for National, if you’re still undecided and need a further nudge. Kiwiblog: 10 positive reasons to vote National

by matt at November 24, 2011 11:45 AM

November 06, 2011

Stuart Yeates

Recreational authority control

Over the last week or two I've been having a bit of a play with Ngā Ūpoko Tukutuku / the Māori Subject Headings (for the uninitiated, think of the widely used Library of Congress Subject Headings, done post-colonial and bilingually, but in the same technology). The main thing I've been doing is trying to munge the MSH into Wikipedia (Wikipedia being my addiction du jour).

My thinking has been to increase the use of MSH by taking it, as it were, to where the people are. I've been working with the English language Wikipedia, since the Māori language Wikipedia has fewer pages and sees much less use.

My first step was to download the MSH in MARC XML format (available from the website) and use XSL to transform it into a wikipedia table (warning: large page). When looking at that table, each row is a subject heading, with the first column being the te reo Māori term, the second being permutations of the related terms and the third being the scope notes. I started a discussion about my thoughts (warning: large page) and got a clear green light to create redirects (or 'related terms' in librarian speak) for MSH terms which are culturally-specific to Māori culture.
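For anyone wanting to repeat that transformation step, it amounts to roughly the following (a sketch in Python rather than the XSL actually used; the MARC field and subfield numbers are assumptions about where the headings, related terms and scope notes live, so check them against the real export):

from lxml import etree

MARC = {"marc": "http://www.loc.gov/MARC21/slim"}

def marc_to_wikitable(path):
    rows = ['{| class="wikitable"', '! Te reo Māori term !! Related terms !! Scope notes']
    for record in etree.parse(path).iterfind(".//marc:record", MARC):
        def subfields(tag, code):
            return record.xpath(
                'marc:datafield[@tag="%s"]/marc:subfield[@code="%s"]/text()' % (tag, code),
                namespaces=MARC)
        term = " ".join(subfields("150", "a"))       # preferred heading (assumed tag)
        related = "; ".join(subfields("550", "a"))   # related terms (assumed tag)
        notes = " ".join(subfields("680", "i"))      # scope notes (assumed tag)
        rows.append("|-\n| %s || %s || %s" % (term, related, notes))
    rows.append("|}")
    return "\n".join(rows)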

I'm about 50% of the way through the 1300 terms of the MSH and have 115 redirects in the newly created Category:Redirects from Māori language terms. That may sound pretty average, until you remember that institutions are increasingly rolling out tools such as Summon, which use wikipedia redirects for auto-completion, taking these mappings to the heart of most Māori speakers in higher and further education.

I don't have a time-frame for the redirects to appear, but they haven't appeared in Otago's Summon, whereas redirects I created ~ two years ago have; type 'jack yeates' and pause to see it at work.

by Stuart Yeates (noreply@blogger.com) at November 06, 2011 09:58 PM

October 07, 2011

Aristotle Pagaltzis

Elegy to my only love in the cloud

Maciej Ceglowski:

Avos did a similar thing last week when they relaunched Delicious while breaking every feature that made their core users so devoted to the site (networks, bundles, subscriptions and feeds). They seemed to have no idea who their most active users were, or how strongly those users cared about the product. In my mind this reinforced the idea that they had bought Delicious simply as a convenient installed base of “like” buttons scattered across the internet, with the intent of building a completely new social site unrelated to saving links.

May you eventually find rest, del.icio.us. You have been undead since the day Yahoo! bought you, and Avos has only desecrated your corpse further. (I think at this point it qualifies as brand necrophilia.)

I made my peace at the beginning of the year – Avos just put the last nail in its coffin as far as I am concerned.

But I am saddened nonetheless.

For posterity, I should note my personalised comical note in this: the Avos zombie version of del.icio.us requires that usernames be at least 3 characters long. So I can no longer log into my account: ap. My ex-account. I could not even download an export of my bookmarks now if I didn’t thankfully have one already.

The worst I feel about this is for Joshua Schachter, and for the people who joined up with him after the Yahoo! acquisition because they understood his aspirations. There is a lesson here: if you care about something, don’t give away control of it – or at the least, not to a corporation. (Joining forces with other people made of flesh and blood is – no, can be – another matter. Choose wisely.)

What a shame.

October 07, 2011 10:15 PM

September 12, 2011

Ian McDonald

Future of mobile apps

Interesting article about HTML5 being the future of apps on mobile. Having worked in the smartphone industry, I totally agree. Just as few native apps are used on the PC / Mac any more, with most things happening in the web browser, I believe the same will happen on tablets and smartphones.

The key obstacles to the PC moving to the web were horsepower and fast links. If you think about the smartphone and mobile networks, you could consider them to be like the PC and dialup internet 5 years ago. Given Moore's law and 4G networks, I believe mobile will go the same way.

The other key part is offline storage in HTML5, so that applications can retain data/state. The Financial Times has already switched to an HTML5-based app so that it can avoid the 'Apple Tax'.

by Ian McDonald (noreply@blogger.com) at September 12, 2011 01:14 PM

September 07, 2011

Aristotle Pagaltzis

OPML is spectacularly lousy

OPML blows chunks. This is my conclusion after a good two hours spent on exasperated googling: the specification is about as vague and informal as could be, the format misuses XML badly, and vital parts of it used widely by feed aggregators seem to be documented nowhere at all. Yuck. I guess I’ll end up deciphering the information I need from existing working code and hope it works in the general case.

This all started as what I thought would be a fun small scripting exercise: I was going to throw together a little script that would turn someone’s LiveJournal friends list into an OPML blogroll. Instead I spent more time beating on Google in mounting frustration, fruitlessly attempting to find something – anything – than it would have taken to write the code for a better-specified format. I came out empty-handed.
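For the record, the blogroll shape most aggregators seem to accept looks roughly like the output of the sketch below; which of these attributes are actually required is exactly the part the specification leaves vague, so the xmlUrl/htmlUrl/type="rss" attributes here are convention rather than anything the spec pins down (lxml is an arbitrary choice of tool):

from lxml import etree

opml = etree.Element("opml", version="1.1")
etree.SubElement(etree.SubElement(opml, "head"), "title").text = "Blogroll"
body = etree.SubElement(opml, "body")
etree.SubElement(body, "outline", text="Example Blog", type="rss",
                 xmlUrl="http://example.org/feed.xml",
                 htmlUrl="http://example.org/")
print(etree.tostring(opml, pretty_print=True, xml_declaration=True,
                     encoding="UTF-8").decode("UTF-8"))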

Update: Uche Ogbuji has ranted about the format, and Léon Brocard reports a quote from a Ben Hammersley/Timothy Appnel talk at OSCON ’05:

Working with OPML is like driving nails into the floor with your forehead.

Update: I can’t believe I never linked to Charles Miller’s most eloquent panning of the format.

September 07, 2011 04:10 PM

August 16, 2011

Stuart Yeates

Thoughts on "Letter about the TEI" from Martin Mueller

Note: I am a member of the TEI council, but this message should be read as a personal position at the time of writing, not a council position, nor the position of my employer.

Reading Martin's missive was painful. I should have responded earlier; I think perhaps I was hoping someone else would say what I wanted to say and I could just say "me too." They haven't, so I've become the someone else.

I don't think that Martin's "fairly radical model" is nearly radical enough. I'd like to propose a significantly more radical model as strawman:


1) The TEI shall maintain a document called 'The TEI Principles.' The purpose of The TEI is to advance The TEI Principles.

2) Institutional membership of The TEI is open to groups which publish, collect and/or curate documents in formats released by The TEI. Institutional membership requires members acknowledge The TEI Principles and permits the members to be listed at http://www.tei-c.org/Activities/Projects/ and use The TEI logos and branding.

3) Individual membership of The TEI is open to individuals; individual membership requires members acknowledge The TEI Principles and subscribe to The TEI mailing list at http://listserv.brown.edu/?A0=TEI-L.

4) All business of The TEI is conducted in public. Business which needs be conducted in private (for example employment matters, contract negotiation, etc) shall be considered out of scope for The TEI.

5) Changes to the structure of The TEI will be discussed on the TEI mailing list and put to a democratic vote with a voting period of at least one month, a two-thirds majority of votes cast is required to pass a motion, which shall be in English.

6) Groups of members may form for activities from time-to-time, such as members meetings, summer schools, promotions of The TEI or collective digitisation efforts, but these groups are not The TEI, even if the word 'TEI' appears as part of their name.




I'll admit that there are a couple of issues not covered here (such as who holds the IPR), but it's only a straw man for discussion. Feel free to fire it as necessary.



by Stuart Yeates (noreply@blogger.com) at August 16, 2011 07:53 PM

June 23, 2011

Stuart Yeates

unit testing framework for XSL transformations?

I'm part of the TEI community, which maintains an XML standard which is commonly transformed to HTML for presentation (more rarely PDF). The TEI standard is relatively large but relatively well documented; the transformation to HTML has thus far been largely piecemeal (from a software engineering point of view) and not error-free.

Recently we've come under pressure to introduce significantly more complexity into transformations, both to produce ePub (which is wrapped HTML bundled with media and metadata files) and HTML5 (which can represent more of the formal semantics in TEI). The software engineer in me sees unit testing as a way to reduce our errors while opening development up to a larger, more diverse group of people with a larger, more diverse set of features they want to see implemented.

The problem is, that I can't seem to find a decent unit testing framework for XSLT. Does anyone know of one?

Our requirements are: XSLT 2.0; free to use; runnable on our ubuntu build server; testing the transformation with multiple arguments; etc;

We're already using: XSD, RNG, DTD and schematron schemas, epubcheck, xmllint, standard HTML validators, etc. Having the framework drive these too would be useful.

The kinds of things we want to test include:
  1. Footnotes appear once and only once
  2. Footnotes are referenced in the text and there's a back link from the footnote to the appropriate point in the text
  3. Internal references (tables of contents, indexes, etc) point somewhere
  4. Language encoding used xml:lang survives from the TEI to the HTML
  5. That all the paragraphs in the TEI appear at least once in the HTML
  6. That local links work
  7. Sanity check tables
  8. Internal links within parallel texts
  9. ....
Any of many languages could be used to represent these tests, but ideally it should have a DOM library and be able to run that library across entire directories of files. Most of our community speak XML fluently, so leveraging that would be good.
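To make the shape of this concrete, here is a sketch of what one such test might look like if driven from Python. It is purely illustrative: it assumes a Saxon command-line wrapper called 'saxon' on the build server, a stylesheet named tei-to-html.xsl, a corpus/ directory of TEI files, and namespaced XHTML output in which footnotes carry a "footnote" class, all of which are placeholders rather than anything we actually have.

import glob
import subprocess
from lxml import etree

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XHTML = {"h": "http://www.w3.org/1999/xhtml"}

def transform(tei_path, xslt="tei-to-html.xsl", **params):
    # Shell out to Saxon (XSLT 2.0); extra stylesheet parameters become name=value arguments.
    args = ["saxon", "-s:" + tei_path, "-xsl:" + xslt]
    args += ["%s=%s" % (k, v) for k, v in params.items()]
    return etree.fromstring(subprocess.run(args, check=True, capture_output=True).stdout)

def test_footnotes_appear_exactly_once():
    for tei_path in glob.glob("corpus/*.xml"):
        notes = etree.parse(tei_path).xpath("//tei:note", namespaces=TEI)
        html = transform(tei_path)
        rendered = html.xpath('//h:*[contains(@class, "footnote")]', namespaces=XHTML)
        assert len(rendered) == len(notes), tei_path

A test runner such as pytest could then sweep this across whole directories of files, and the same pattern extends to the other checks in the list above; but I'd still rather use something purpose-built if it exists.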

by Stuart Yeates (noreply@blogger.com) at June 23, 2011 10:10 PM

June 21, 2011

Ian McDonald

Slides from Cloud Computing World Forum

My slides from Cloud Computing World Forum today are up at http://www.next-genit.co.uk/events-1

It was a good event, with some pretty good speakers in general and lots of interesting and smart people around. If you're in London, it's still on tomorrow.

by Ian McDonald (noreply@blogger.com) at June 21, 2011 08:14 PM

June 07, 2011

Ian McDonald

Slides from today's presentation

My slides "Evolution of cloud computing" from today are up at http://www.next-genit.co.uk/events-1

It was a great little summit over in Newport, Wales - thanks to the organisers.

by Ian McDonald (noreply@blogger.com) at June 07, 2011 09:58 PM

May 14, 2011

Ian McDonald

Why the Google Chromebook will succeed

I'm excited about the Google Chromebook and I believe it will do very well. Not 80% market share well, but perhaps around 20% and Apple seems to do very well with a similar market share.

Why do I think this is a game changer? Isn't a notebook that only works on the Internet doomed?

I don't think so for a few reasons. Firstly it will work offline. Google Docs will be working offline soon. The operating system (ChromeOS) will also store files locally. This includes things like Google Music and I'm sure others such as Spotify/DropBox will add support once it becomes popular.

Secondly it will work seamlessly with Google Mail/Google Apps. More and more organisations and individuals are going down this path and remember all those people with Android phones (now #1 smartphone) also have a Google account. So now you'll have a notebook with optional 3G that you can use for email, docs etc even offline. I often use my Android phone instead of my iPad or Mac or PC already when travelling as it just works so much nicer. I'm also at a point where with the combination of DropBox and Google Apps/Mail I can easily switch machines when I want and I'm sure more people will head down this route in the future.

For both companies and individuals these combinations of things will drive down the cost. For businesses these are machines that don't need to be managed very much at all, and for individuals the notebook will keep working without being touched.

Citrix and VMWare are also supporting virtual desktops on Google Chromebook so this will work very well for businesses that haven't yet converted everything to the browser (which is the best strategy).

And of course there is the 8 second boot so you can play Angry Birds quickly.

As a postscript, if you want to come and work for a cool media company that uses this sort of technology, then have a look here.

by Ian McDonald (noreply@blogger.com) at May 14, 2011 04:24 PM

March 23, 2011

Stuart Yeates

Is there a place for readers' collectives in the bright new world of eBooks?

The transition costs of migrating from the world of books-as-physical-artefacts-of-pulped-tree to the world of books-as-bitstreams are going to be non-trivial.

Current attempts to drive the change (and by implication apportion those costs to other parties) have largely been driven by publishers, distributors and resellers of physical books in combination with the e-commerce and electronics industries which make and market the physical eBook readers on which eBooks are largely read. The e-commerce and electronics industries appear to see traditional publishing as an industry full of lumbering giants unable to compete with the rapid pace of change in the electronics industry and the associated turbulence in business models, and have moved to poach market-share. By-and-large they've been very successful. Amazon and Apple have shipped millions of devices billed as 'eBook readers' and pretty much all best-selling books are available on one platform or another.

This top tier, however, is the easy stuff. It's not surprising that money can be made from the latest bodice-ripping page-turner, but most of the interesting reading and the majority of the units sold are outside the best-seller list, on the so-called 'long tail.'

There's a whole range of books that I'm interested in that don't appear to be on the business plan of any of the current eBook publishers, and I'll miss them if they're not converted:

  1. The back catalogue of local poetry. Almost nothing ever gets reprinted, even if the original has a tiny print run and the author goes on to have a wonderfully successful career. Some gets anthologised and a few authors are big enough to have a posthumous collected works, when their work is no longer cutting edge.
  2. Some fabulous theses. I'm thinking of things like: http://ir.canterbury.ac.nz/handle/10092/1978, http://victoria.lconz.ac.nz/vwebv/holdingsInfo?bibId=69659 and http://otago.lconz.ac.nz/vwebv/holdingsInfo?bibId=241527
  3. Lots of te reo Māori material (pick your local indigenous language if you're reading this outside New Zealand)
  4. Local writing by local authors.

Note that all of these are local content---no foreign mega-corporation is going to regard this as their home-turf. Getting these documents from the old world to the new is going to require a local program run by (read funded by) locals.

Would you pay for these things? I would, if it gave me what I wanted.


What is it that readers want?

We're all readers, of one kind or another, and we all want a different range of things, but I believe that what readers want / expect out of the digital transition is:

  1. To genuinely own books. Not to own them until they drop their eReader in the bath and lose everything. Not to own them until a company they've never heard of goes bust and turns off a DRM server they've never heard of. Not to own them until technology moves on and some new format is in use. To own them in a manner which enables them to use them for at least their entire lifetime. To own them in a manner that poses at least a question for their heirs.
  2. A choice of quality books. Quality in the broadest sense of the word. Choice in the broadest sense of the word. Universality is a pipe-dream, of course, but I'd be happy with good books being released faster than I can read them.
  3. A quality recommendation service. We all have trusted sources of information about books: friends, acquaintances, librarians or reviewers that history has suggested have similar ideas to ours about what a good read is.
  4. To get some credit for already having bought the book in pulp-of-murdered-tree form. Lots of us have collections of wood-pulp and like to maintain the illusion that in some way that makes us well read.
  5. Books bought to their attention based on whether they're worth reading, rather than what publishers have excess stock of. Since the concept of 'stock' largely vanishes with the transition from print to digital this shouldn't be too much of a problem.
  6. Confidentiality for their reading habits. If you've never come across it, go and read the ALA's The Freedom to Read Statement.

A not-for-profit readers' collective

It seems to me that the way to manage the transition from the old world to the new is as a not-for-profit readers' collective. By that I mean a subscription-funded system in which readers sign up for a range of works every year. The works are digitised by the collective (the expensive step, paid for up-front), distributed to the subscribers in open file formats such as ePub (very cheap via the internet) and kept in escrow for them (a tiny but perpetual cost, more on this later).

Authors, of course, need to pay their mortgage, and part of the digitisation would be obtaining the rights to the work. Authors of new work would be paid a 'reasonable' sum, based on their stature as authors (I have no idea what the current remuneration of authors is like, so I won't be specific). The collective would acquire (non-exclusive) rights to digitise the work if it is not born digital, to edit it, to distribute it to collective members and to sell it to non-members internationally (i.e. distribute it through 'conventional' digital book channels). In the case of sale to non-members through conventional digital book channels the author would get a cut. Sane and mutually beneficial deals could be worked out with libraries of various sizes.

Generally speaking, I'd anticipate the rights to digitise and distribute in-copyright but out-of-print poetry would be fairly cheap; the rights to fabulous old university theses cheaper; and rights to out-of-copyright materials are, of course, free. The cost of rights to new novels / poetry would hugely depend on the stature of the author and the quality of the work, which is where the collective would need to either employ a professional editor to make these calls, or vote based on sample chapters / poems, or some combination of the two. The cost of quality digitisation is non-trivial, but costs are much lower in bulk and dropping all the time. Depending on the platform in use, members of the collective might be recruited as proof-readers for OCR errors.

That leaves the question of how to fund the escrow. The escrow system stores copies of all the books the collective has digitised for the future use of the collective's members, and is required to give efficacy to the promise that readers really own the books. By being held in escrow, the copies survive the collective going bankrupt, being wound up, or evolving into something completely different, but this requires funding. The simplest method of obtaining funding would be to align the collective with another established consumer of local literature (a university, major library, or similar) and have them underwrite the escrow.

The difference between a not-for-profit readers' collective and an academic press?

For hundreds of years, major universities have had academic presses which publish quality content under the universities' auspices. The key difference between the not-for-profit readers' collective I am proposing and an academic press is that the collective would attempt to publish the unpublished and out-of-print books that the members wanted, rather than aiming to meet some quality criterion. I acknowledge a populist bias here, but it's the members who are paying the subscriptions.

Which links in the book chain do we want to cut out?

There are some links in the current book production chain which we need to keep; there are others that wouldn't have a serious future in a not-for-profit. Certainly there is a role for judgement in which works to purchase with the collective's money. There is a role for editing, both large-scale and copy-editing. There is a role for illustrating works, be it cover images or icons. I don't believe there is a future for roles directly relating to the production, distribution, accounting for, sale, warehousing or pulping of physical books. There may be a role for marketing books, depending on the business model (I'd like to think that most of the current marketing expense could be replaced by a combination of author-driven promotion and word-of-mouth promotion, but I've been known to dream). Clearly there is an evolving techie role too.

The role not mentioned above that I'd most like to see cut, of course, is that of the multinational corporation as gatekeeper, holding all the copyrights and clipping tickets (and wings).

by Stuart Yeates (noreply@blogger.com) at March 23, 2011 08:26 PM

Ian McDonald

Motorola Atrix and Motorola Xoom thoughts

I had a quick play today with the Motorola Xoom (iPad competitor) and Motorola Atrix (smartphone that doubles up as a netbook).

I must say that I was impressed with both of them, although the Xoom was the more impressive of the two. The Xoom seems a credible competitor to the iPad and the Honeycomb version of Android looks very polished and works on tablets well. The speed of it seemed faster than the original iPad and around the same as the iPad 2. All the software on it that I used seemed quite nice. I did like the ability to customise the home screens more than on the Apple.

Of course the key to success for many people will be what apps come out for it. I personally believe that in 5 years apps for cellphones will be irrelevant, just like they are on a PC today, as mobile platforms and web browsers become more powerful. That still leaves a few years in between, though. The other key to success will be whether the battery lasts like the iPad's, and I couldn't test this.

The Motorola Atrix is a device that also docks to your TV and to a netbook-type thing. The TV side seems to just work, but I was quite surprised to find that it was only 480p. The king in the multimedia/good video area still seems to be the Nokia N8. (I could write a whole post on Android vs Symbian and may yet do so.) As an Android phone it just seems like any other Android phone, to be honest.

The netbook side of things only makes sense to me if you don't have a separate device already, so it probably appeals to the developing world, except that it will be far too expensive.

To see comprehensive reviews look at the Ars Technica review of the Xoom here and the Atrix here.

by Ian McDonald (noreply@blogger.com) at March 23, 2011 01:37 PM

January 19, 2011

Ian McDonald

Amazon move deeply into PaaS

Amazon have announced today Elastic Beanstalk.

I have been wondering how long it would be until Amazon did PaaS in depth, and now they have. As a techie, I like their approach too. You can either leave it as a service that just runs your applications (e.g. Java) or you can tune each individual component.

Amazon are continuing to go up the stack and this is probably enough to convince more people to use them now.

by Ian McDonald (noreply@blogger.com) at January 19, 2011 10:33 AM

January 12, 2011

Ian McDonald

Rackspace showing real thought leadership

I continue to be impressed by the direction that Rackspace is taking at the moment in regards to the cloud.

Today they announced that they are working with Akamai on distributing content through the cloud, as reported by The Register here. On paper this seems to jump past Amazon's AWS and their CloudFront service: CloudFront doesn't push anywhere near as deep into the network as Akamai does. Of course I'm not going to write off Amazon, though, as they are innovating very quickly also.

The other big (huge??) thing that they are behind is OpenStack. I believe that the project has the ability to really tidy up in cloud management, and as a bonus it is open source. If it's good enough for NASA then perhaps it is good enough for others too. NASA worked with Rackspace to get OpenStack started because of dissatisfaction with the open core model of another product (I could start a whole rant on open core here and how some people are using some real deception at the moment, but I don't like writing negative blog posts...).

See this article here from The Register which talks about Ubuntu going with OpenStack very recently. I have been nudging a few people at Canonical to take this direction for a number of months, and am glad to see it coming to fruition, although I probably can't claim much credit for this!

Of course the two big parties missing from OpenStack are VMware and Amazon, who probably have the majority of private and public cloud deployments today. They probably have the most to lose from this as well... Interestingly enough, Microsoft IS backing OpenStack.

So RackSpace allows you to shift from traditional colo, to VM hosting, to true cloud services. Can you see why they have been so successful recently? You don't need to keep changing vendors as you start advancing your strategy.

One could also comment on the Scoble effect as a side note as Dennis Howlett did recently. And then there are other products that they've picked up along the way like CloudKick and JungleDisk which fill some nice niches.

The only area I'm not sure about though is their cloud email strategy and their SaaS strategy as a whole. I think Microsoft with Office 365 (formerly BPOS) or Google Apps has a big head start here. Having said that the market is still young, and at US$2 per user per month for their email service that is great value for money.

You could also argue that they are covering off all of the cloud bases; probably the only other company showing such foresight at present, in my mind, is actually Microsoft, as I talk about a little here.

As a historical note, I first flagged up Rackspace and OpenStack back in July last year, here and here. The only thing I am kicking myself about is that I didn't buy shares at the time I wrote those posts. Look at this graph and see what the shares have done since then...

by Ian McDonald (noreply@blogger.com) at January 12, 2011 08:55 PM

December 31, 2010

Ian McDonald

Microsoft server application virtualisation

I see Microsoft has released a test version of Server App-V which enables individual server roles to be run on top of a virtual machine - particularly Microsoft Azure.

This confirms in my mind that Microsoft has one of the strongest cloud visions of all the Tier 1 vendors. They provide all layers of the stack - IaaS, PaaS and SaaS. This particular move makes a lot of sense to me, as you're taking away the need to manage the operating system so much and instead manage the application. The more abstraction you get, the less it costs to maintain.

(As a primer for those who don't know much about App-V: it started as a technology to publish virtual desktop applications. To me this makes more sense than a full virtual desktop, as you already have the grunt on the desktop and can push out individual apps; that way you can run new technology, e.g. Windows 7, but still run your XP-dependent apps if needed.)

Thanks to Mary-Jo Foley for the heads-up. I also have some overview of Microsoft (and other vendors) on my cloud page.

by Ian McDonald (noreply@blogger.com) at December 31, 2010 04:44 PM

November 20, 2010

Stuart Yeates

HOWTO: Deep linking into the NZETC site

As the heaving mass of activity that is the mixandmash competition heats up, I have come to realise that I should have better documented a feature of the NZETC site, the ability to extract the TEI xml annotated with the IDs for deep linking.

Our content's archival form is TEI xml, which we massage for various output formats. There is a link from the top level of every document to the TEI for the document, which people are welcome to use in their mashups and remixes. Unfortunately, between that TEI and our HTML output is a deep magic that involves moving footnotes, moving page breaks, breaking pages into nicely browsable chunks, floating marginal notes, etc., and this makes it hard to deep link back to the website from anything derived from that TEI.

There is another form of the TEI available which is annotated with whether or not each structural element maps 1:1 to an HTML page (nzetc:has-text) and, if so, what the ID of that page is (nzetc:id). This annotated XML is found by replacing the 'tei-source' in the URL with 'etexts'.

Thus for The Laws of England, Compiled and translated into the Māori language at http://www.nzetc.org/tm/scholarly/tei-GorLaws.html there is the raw TEI at http://www.nzetc.org/tei-source/GorLaws.xml and the annotated TEI at http://www.nzetc.org/etexts/GorLaws.xml

Looking in the annotated TEI at http://www.nzetc.org/etexts/GorLaws.xml we see for example:

<div xml:id="t1-g1-t1-front1-tp1" xml:lang="en" rend="center" type="titlePage" nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1" nzetc:depth="5" nzetc:string-length="200" nzetc:has-text="true">


This means that this div has its own page (because it has nzetc:has-text="true") and that the ID of that page is tei-GorLaws-t1-g1-t1-front1-tp1 (because of the nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1"). The ID can be plugged into http://www.nzetc.org/tm/scholarly/<ID>.html to get a URL for the HTML. Thus the URL for this div is http://www.nzetc.org/tm/scholarly/tei-GorLaws-t1-g1-t1-front1-tp1.html. This process should work for both text and figures.
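If you want to script this, the following sketch (using Python with lxml and requests, an arbitrary choice of tools) walks the annotated TEI and prints a deep link for every element that has its own page; the nzetc namespace URI is read from the document itself rather than hard-coded, on the assumption that the prefix is declared on the root element:

import requests
from lxml import etree

url = "http://www.nzetc.org/etexts/GorLaws.xml"
root = etree.fromstring(requests.get(url).content)
NZETC = root.nsmap["nzetc"]   # assumes the nzetc prefix is declared on the root element

for el in root.iter():
    if el.get("{%s}has-text" % NZETC) == "true":
        page_id = el.get("{%s}id" % NZETC)
        print("http://www.nzetc.org/tm/scholarly/%s.html" % page_id)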

Happy remixing everyone!

by Stuart Yeates (noreply@blogger.com) at November 20, 2010 10:45 AM

November 11, 2010

Ian McDonald

Privacy and social network aggregation

In the last couple of days I tried out a new service from a subsidiary of a multi-billion-dollar company that made me want to stomp on it and crush it. Harsh words, I know, but the way this service abused privacy was just astounding.

This device enabled you to touch another person's device, which would then share your contact details, including relevant social networks. All sounds fine - just a modern version of a business card, surely?

The problem was twofold. The first problem was that you could not decide on a person-by-person basis which contact details you want to share. I can't think of many people with whom I would want to share all my phone numbers, email (work and personal), LinkedIn, Twitter and Facebook details - I would always want to give each person a subset of my data. What's also scary is that this company is in the location-based space.... So is their future roadmap that you share all your details and where you are?

And it gets worse. The way it links to the social networks is not by you supplying a profile link. For each service it wants to install an application that can extract all your data and post to it as well. Are you kidding me? I'm not letting an aggregating service do whatever it likes with my online presence.

So when you see this sort of website, don't just click yes, yes, yes all the way through and supply your login details to everything you own online. And also think about what would happen if that aggregating service got hacked...

by Ian McDonald (noreply@blogger.com) at November 11, 2010 10:30 AM

October 18, 2010

Ian McDonald

Amazon does SSL termination

Amazon has had a viable cloud strategy with AWS for quite a while now as I have discussed on my website.

They have now introduced SSL termination which is a big help when it comes to e-commerce/secure websites.

What is SSL termination and why is it important? Well, with Amazon you've been able to distribute traffic across a number of your servers automatically using their ELB (Elastic Load Balancer). You can even use auto scaling to add extra servers automatically if you want!

The problem is that until now this has not worked for decrypting HTTPS traffic (SSL termination). This has generally made things quite hard, and you've had to work around it by directing secure traffic to a single node, using multiple certificates, or adding complex configuration.

Now the ELB does the HTTPS encryption/decryption and can then direct the traffic to any node as it does with unencrypted traffic. If you want to know all the gory details have a look at the AWS blog here.
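
For the curious, here is roughly what configuring SSL termination on an ELB looks like in code - a minimal sketch using the boto3 library (which post-dates this post); the load balancer name and certificate ARN are placeholders.

# A rough sketch with boto3; "my-load-balancer" and the certificate ARN are
# placeholders, not real resources.
import boto3

elb = boto3.client("elb")  # the classic ELB API

# Add an HTTPS listener that terminates SSL at the load balancer and forwards
# plain HTTP to the back-end instances on port 80.
elb.create_load_balancer_listeners(
    LoadBalancerName="my-load-balancer",
    Listeners=[{
        "Protocol": "HTTPS",
        "LoadBalancerPort": 443,
        "InstanceProtocol": "HTTP",
        "InstancePort": 80,
        "SSLCertificateId": "arn:aws:iam::123456789012:server-certificate/my-cert",
    }],
)

The back-end instances then only ever see plain HTTP, which is the whole point: the certificate and the decryption work live on the load balancer, not on every node.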

As a side note - if you want to learn more about starting the Amazon journey then go have a look at Symbian's architect's blog here.

by Ian McDonald (noreply@blogger.com) at October 18, 2010 09:41 PM

August 29, 2010

Ian McDonald

Is 3Par worth $2 billion?

After seeing all the news around 3Par I started to wonder whether they are really worth $2 billion, considering I had never heard of them before.


The primary benefits I can see of 3Par are:
  • simplification of storage management
  • thin provisioning


From a pure financial point of view the deal doesn't make sense - their sales were just under $200 million in the last year and I am quite confident that you could build a similar set of products for less than $2 billion.

Conspicuous by their absence in the bidding are IBM, HDS, Oracle and EMC. They have either decided it is too expensive or already have something similar under development.

The other thing to consider is that we have two heavyweights, HP and Dell, doing the bidding. It is interesting in itself that no smaller player is bidding for 3Par. I believe the reason for this is that a smaller company would realise you can build a competitor for much less. Many large companies have forgotten the art of innovation, and it is much simpler to buy companies with the product you need.

The unknown value of 3Par might be in its patent portfolio - since no-one else seems to be doing thin provisioning in the way that they do, it might be reasonable to guess they have patented it. In my mind software/pure-idea patents shouldn't exist, and thin provisioning is an obvious idea. This hasn't stopped VMware taking out patents on thin provisioning of memory, though, and pursuing them.

So in my mind I wouldn't pay $2 billion for 3Par, but a big company would quite willingly do so, since they can't innovate quickly themselves and are scared of IP issues. What I would have done, though, if I led HP or Dell, is pay $100 million for a non-exclusive perpetual license of their IP, and put the rest of the money into integrating it into my products and then selling them.

by Ian McDonald (noreply@blogger.com) at August 29, 2010 04:29 PM

August 12, 2010

Ian McDonald

Nice update on Azure

Mary-Jo Foley from ZDNet has posted a nice update on where Microsoft Azure is at. Wander over here to have a look and see page 2 in particular for a nice system architecture drawing.

by Ian McDonald (noreply@blogger.com) at August 12, 2010 02:55 PM

August 06, 2010

Ian McDonald

Microsoft begins adding single-sign on support to its Azure cloud

Interesting article here about Microsoft adding SSO support to its Azure cloud, including identity providers such as Google and OpenID.

They are showing pretty serious intent with all of this. Great to see!

by Ian McDonald (noreply@blogger.com) at August 06, 2010 05:24 PM

August 02, 2010

Ian McDonald

Featured in Outsource magazine

I just got interviewed by Outsource magazine today and the interview is up here.

At the moment I'm even up on the front page of their website!

by Ian McDonald (noreply@blogger.com) at August 02, 2010 05:12 PM

July 19, 2010

Ian McDonald

OpenStack

Great to see the launch of OpenStack being announced. It's a great idea to open source everything in the cloud stack, including management tools, provisioning, etc. This really does fulfil what the Open Cloud Manifesto is asking for.

It is being backed by many leading companies, which is great to see. To find out more, go have a look at their website or see Glyn Moody's article, which has many good links.

by Ian McDonald (noreply@blogger.com) at July 19, 2010 09:08 PM