Penguin

May 05, 2012

An orphan olive branch to Mercurial

Git repository browsers have universally awful graph drawing algorithms.

For the longest time, one of my repositories has had two main branches, master and release. For a release, I would merge master into release with git merge --no-ff master. (Using --no-ff forces a commit on release even if release could be fast-forwarded to the current state of master. That way the act of cutting a release is always recorded in the repository.) Development happens on master, sometimes on branches. Topic branches are rebased before merging them back to master, once again using the --no-ff switch to record that a certain stretch of commits belonged together as one topic.

Essentially, this is a two-track history, with occasional short parallel side tracks on one side:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master
      \   \           \                    \   \       \   \
-------o---o-----------o--------------------o---o-------o---o    release

You would think that this would be easy to draw in a sane way.

And most of the time it is. But sometimes repository browsers decide to draw release on the other side of master. And as it happens, sometimes a topic falls by the wayside for a while. When these conditions coincide, drawing the stray heads from these topic branches and at the same time drawing release in such a way that the merge direction (from master into release) is correct suddenly requires snaking each release commit around all the previous ones. The result is a marshalling yard of parallel tracks (which I will not try to give an ASCII diagram of…) for representing what in reality is a very simple history. That makes it very difficult to make heads or tails of what really happened in the repository: a whole Black Forest out of just two trees.

There are some ordinary options to suppress this. The most obvious one would be to do a fast-forward merge of release back into master before picking up again. Doing so yields a triangular structure like this:

                           o--o--o--o
                          /          \
-o---o   o   o---o---o   /------------o---o---o   o   o---o   o---o  master
      \ / \ /         \ /                      \ / \ /     \ / \
-------o---o-----------o------------------------o---o-------o---o    release

Here there are no parallel tracks: the only unbroken track is the release branch, so no matter when and how any algorithm tries to draw this graph, it will be forced to string the commits into short side tracks alongside the release track. There is no likely way to turn this into a funhouse of illusory complexity.

Any solution that merges release into master in any way will have a very annoying drawback, however: you can no longer read the history of master without getting all of the release merges interspersed into it. This is all the worse if you never gave those merge commit messages much thought, because that means the history of release by itself consists of nothing but an endless row of “Merge 'master' into release”. And if that were not bad enough by itself, it gets really irritating during periods when most commits are released immediately: the noise takes up a major part of your commit log.

Then an epiphany disrupted my long-standing dissatisfaction with the situation.

This is what the history in my repository looks like now:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master

-------o---o-----------o--------------------o---o-------o---o    release

That’s right: no merges.

Yet again, release is a single unbroken track. But now so is master. And since the branches are unconnected, it is never necessary to arrange them relative to each other, so they will always be drawn properly. And the master commit log remains clean and readable.

What I have done is make release an orphan branch that shares no history with master (created with git checkout --orphan). To cut a release, I check out release, then I get the tree from the commit I want to release and put that in a new commit on release. Obviously with this scheme I need to manually record the commit ID somewhere to be able to know what state of master a particular release corresponded to – there is no longer merge metadata to keep track of that. The commit message seems a natural place to record that information. I need to construct one in any case since Git does not know how to provide a default message for these commits like it does when merging a branch. Of course, the extended commit message is also a good place to put a list of commits that are hitching a ride on this release. I decided to put a release version (in my case, a simple incrementing integer) in the commit message subject as well, to make it easy to refer to a particular release.

Needless to say, I have the process automated. This is my release script:

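#!/bin/bash
set -e
# Commit to release: the first argument, defaulting to master.
commit=`git rev-parse "${1-master}"`
# The latest release subject has the form "<num> @ <commit>"; split it up.
read num junk oldcommit <<<`git log --no-walk --format=%s release --`
(
  # Subject line: next release number and the ID of the released commit.
  printf '%d @ %s\n\n' $((++num)) $commit
  # Body: the commits hitching a ride on this release, oldest first.
  git log --reverse --oneline --abbrev=12 --no-decorate --no-color $oldcommit..$commit
) \
| git commit-tree $commit^{tree} -p release \
| ( read new ; git update-ref refs/heads/release $new )
git push -f origin master release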
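
The bookkeeping that the script does with the commit subject can also be sketched on its own. The following Python function is only an illustration of the "number @ commit" convention described above, not part of the actual release process; the function name and the commit IDs in the usage note are made up.

```python
import re


def next_release_message(previous_subject, commit, oneline_log):
    """Build the message for the next release commit.

    previous_subject: subject of the latest release commit, e.g. "41 @ 3f2a9c1"
    commit: ID of the commit on master that is being released
    oneline_log: one line per commit included in this release
    """
    # The subject is expected to look like "<number> @ <commit>".
    match = re.fullmatch(r"(\d+) @ (\S+)", previous_subject.strip())
    if match is None:
        raise ValueError(f"unexpected release subject: {previous_subject!r}")
    number = int(match.group(1)) + 1
    subject = f"{number} @ {commit}"
    body = "\n".join(oneline_log)
    # Git separates subject and body with an empty line.
    return f"{subject}\n\n{body}\n"
```

Calling next_release_message("41 @ 3f2a9c1", "9b8e7d6", ["aaa111 Fix typo"]) yields a message whose subject is "42 @ 9b8e7d6", followed by an empty line and the list of commits.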

Aside from the hard linkage by commit ID you also get a soft correlation by commit date if you ask git log and friends to use --date-order. This is sufficient for routine development work. Note that since the commit IDs are recorded, it is possible to use grafts to retroactively (possibly temporarily) make the orphan release branch look like a mergeful branch.

A nice aspect of doing things this way is how easy it is to get a full diff of the total change represented by a release. With a merge-based release branch it takes fiddling to ask for that diff and enough knowledge to know how to.

And so I seem to have arrived at a poor (technically awkward, functionally very limited) reinvention of Mercurial’s named branches, using the plumbing provided by Git. This may be the only true use case for named branches that I can think of.

Update: I’ve rewritten the script to use lower-level plumbing. It no longer even checks out the tree, it just directly creates a commit object based on the tree object of the released commit.

May 02, 2012

D’uh

I recently discovered the -h switch of GNU sort, added in the coreutils 7.5 release from Aug 20, 2009. With this switch, sort will do a numeric sort of human-readable size numbers, i.e. it will accept “42M” and “1.3G” as numbers and put them in the right order. This led to the following shell one-liner in my ~/bin:

#!/bin/bash
exec du "${@--xd1}" -h | sort -h

It invokes du to print the disk space consumption of a directory tree, then sorts its output by size. If you pass any switches they will be passed on to du, else it will default to -xd1 (-x = stay on one filesystem, do not cross mountpoints; -d1 = do not print directories deeper than 1 level).

I gave this script the only name it could have – obviously, duh.
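
For systems where sort lacks -h, the ordering can be approximated in a few lines of Python. This is a rough sketch rather than a description of how GNU sort implements the switch: it assumes du -h style sizes with the suffixes K, M, G and T as powers of 1024, and the function names are made up for illustration.

```python
import re

# Multipliers for the unit suffixes assumed here (powers of 1024).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def human_size(text):
    """Convert a size such as '42M' or '1.3G' to an approximate byte count."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"not a human-readable size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def sort_du_lines(lines):
    """Sort 'SIZE<whitespace>PATH' lines by size, smallest first."""
    return sorted(lines, key=lambda line: human_size(line.split(None, 1)[0]))
```

Feeding it the lines of du -h output puts "42M" before "1.3G", which a plain numeric sort would get wrong.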

Update: turns out that the -d switch of du is even newer than sort’s -h switch. It was added for compatibility with FreeBSD in the coreutils 8.6 release from Oct 15, 2010 – prior to that it had to be spelled --max-depth, which rather complicates matters. You would have to do this:

#!/bin/bash
DEFAULT=(-x --max-depth=1)
exec du "${@-${DEFAULT[@]}}" -h | sort -h

That’ll win neither beauty nor concision contests.

April 29, 2012

e-ID - BankID and Linux - Mobile BankID

I have been a big fan of fribid.se for what they have done for BankID on Linux.
But when I recently went to create a new e-ID at fsb, I ran into problems again.

Now, however, there is a Mobile BankID option that can be ordered through the fsb website from a Linux computer, in my case Fedora 15.
Mobile BankID

After that, install the BankID app on your Android phone. From then on, when you log in with BankID at, for example, Skatteverket (the Swedish Tax Agency), you use your phone with its e-ID and BankID instead of your Linux box. I know this is only a workaround and that of course everything ought to work on Linux. But it may help those of you who, just like me, are sitting up late one evening, simply have to file a tax return, and need a quick fix to get it done.

/Roger

April 16, 2012

In which I write about PHP for the first and the last time

Tim Bray wrote a short piece on PHP and kicked up a huge hullabaloo in the land of weblogs. Here’s my contribution to the echolalia.

Tim writes that it’s his experience that systems written in PHP are all spaghetti. I don’t think that’s a coincidence, and there are two sides to that coin.

One side, which all the PHP apologists are citing with full justification, is that its nonchalant everything-but-the-kitchen-sink “standard” library approach and its wide deployment present such a low barrier to writing and re-/deploying code that a lot of people who only have small needs are empowered to meet them on their own, however messily. I have argued that this enabling function is a good thing and I stand steadfastly by that position.

But on the other side, well, PHP is… lousy. Just wretched. Why?

  • The language – and here I’m talking only about the core, that is, syntax, type system, object orientation support, scoping rules, and the like – is limited and haphazard:

    Haphazard, because it was never designed in any fashion: it grew out of a templating system with hodgepodge constructs for which orthogonality was only an afterthought.

    Limited, because while it’s all dynamically typed and garbage collected, it squanders most of that advantage by limiting itself to the expressive power of C, roughly. Anonymous functions are awkward to create, and I’m not sure closures are possible in any practical fashion at all. Lists are second-class citizens, always bound to arrays. Attempts to introspect end up looking comical. The standard code modularization mechanisms (include, require) are simple-minded textual inclusions.

  • This is my big complaint: all the APIs are execrable. It starts with the built-in stuff: try to do something with the image functions or the zip file functions – I never figured out a way to avoid making my code look ugly. With the built-in library setting a bad example, it’s no surprise that the same issue extends to the packages available from PEAR: awkward is the rule.

    In my opinion, that is what makes me and a lot of other people feel that PHP code resists being made clean. I feel the same way when I’m forced to use Tk in Perl: the API is so misshapen that you just can’t make your own code sitting on top of it look pretty.

  • The easy, obvious way to do things is often the incorrect one.

    There are lots of tutorials which will either not tell you to quote user input before interpolating it into SQL statements at all, or tell you to use addslashes for the purpose. In either case you are open to injection attacks – either gaping wide or just wide. What you really should do is use a function that respects the particular SQL dialect, such as mysql_escape_string… no wait, that’s dead code, I mean mysql_real_escape_string. Bleurgh. And once you’ve found out all that and understood it, properly quoting user input is still a pain in the bottocks and requires much more code than not bothering. Guess what casual coders will do? Now contrast Perl’s DBI, where using bind parameters is just as easy as not; and in fact, makes the code easier to read.

    How about working with strings in an encoding-aware fashion? That means nothing short of rolling your own string munging, with “help” from some typically byzantine APIs – what fun! Which novice is going to know that they should? Who is going to bother? How many of them will get it right?
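
To make the point about bind parameters concrete, here is a small sketch. It uses Python’s sqlite3 module rather than Perl’s DBI or anything from PHP, purely because it is self-contained; the users table and the find_user function are invented for the example.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name, passing the value as a bind parameter."""
    # The ? placeholder keeps the value out of the SQL text entirely,
    # so no quoting or escaping function is needed.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical in-memory table, just to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("alice",), ("o'brien",)]
)
```

A name containing a quote, or a classic injection attempt such as x' OR '1'='1, is treated as plain data: the first finds its row and the second finds nothing.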

All of these flaws are interconnected; the morass is simply the result of the language being a templating system that grew too big for its breeches. I don’t believe the problems can be corrected in any sensible fashion; PHP will always be a templating system, however much it may be straining against its clothes.

And let me tell you, it’s still a great templating system! If all you need is to write a web app that consists of two pages, running four queries over a five-table database, there is nothing that will get you up and running faster.

But that doesn’t make it suitable for large-scale systems. It’s not that the premise does not scale, it’s just that this particular implementation of the premise does not. Apologists will sometimes argue that the flaws are a necessary evil in achieving the low barrier to entry; worse-is-better style. I don’t buy that argument for a second. There is no reason that a language could not be designed to address the precise problem space that PHP aims at, but be created from scratch to be big enough for its britches, without the slipshod, organic growth. There is no reason it would have to be any harder to get things done with a standard library that encourages good practices as the obvious and easy way to accomplish things.

PHP is ripe for having its lunch eaten, really.

Update: Eevee rants about it, comprehensively.

April 08, 2012

Building for Amazon

Recently there has been a bit of comment from a couple of people around an article that appeared about using Amazon Web Services (AWS).

To be honest I'm more than a bit annoyed about how it was written up. The article made it sound like I'd given a talk about my use of AWS at work, or had helped write an article. I hadn't done either, but was part of a panel discussion around cloud. I do note that one of the other panellists was similarly written up too... Most of the article was from an answer to a question about what was good, and what was bad about Amazon, with other bits picked out from other things. Some of it was close to sensationalised, and I didn't know the article was happening or get a chance to review it (something that has always happened before, and I've spoken at around 20 events).

So first of all I want to go on the record and say I am a strong supporter of Amazon. They, Google and Salesforce are the companies who have done more than anything else to push cloud forward. Also anybody else who has heard me speak knows I am a strong proponent of AWS. Going forward though I think I'll just refuse to talk about any areas of improvement needed, but focus on the strengths. I have been in communication with AWS staff to tell them my view hasn't changed and I back them 100%.

So I'd now like to focus on what I was intending to convey, and did convey but wasn't reported, at the panel discussion. Building a system on Amazon does not mean that you stop making proper design decisions. Some people have assumed that because Amazon is such a great company they can forget about system architecture and everything will be fine. You can't. You wouldn't build your physical machines or VMware environment and then do no backup, no performance tuning, and have no redundancy. Most of what people use Amazon for is running machines (they do many other things too, like a CDN and NoSQL on demand etc). So you need to design your systems. I learnt this many years ago.

When it comes to performance you can't necessarily throw everything at it on AWS like you can with a traditional architecture, as you don't know the full underlying design and you can't fix your bottleneck in a traditional way. But running your on-prem architecture in this way can cost you absolutely millions. There are numerous cases of media organisations, life science companies etc doing large batch processing for hundreds or thousands of dollars, and doing it very quickly, that they just couldn't do cost effectively before. What you do need to do for AWS though is try and parallelise your workload wherever possible as Amazon works well in this model. You can also vary your machine size as you need to - and now that Amazon allows 64-bit machines for all instance sizes, this is even easier and saves more money. So you can go all the way from a free Micro instance up to new Sandy Bridge based machines without rebuilding your image.

Make sure with Amazon that you have machines running in another area as redundancy and that you have a way of activating them. You wouldn't run your own data centres without redundancy, so why do it differently on Amazon? The "horror" stories around when businesses struggled when an Amazon Availability Zone (AZ) went down are really a horror story about bad architects in my mind.

Backup. Yes, that is a command! Any IT exec who doesn't ensure that their data is backed up needs to be shot. Why should this be any different on Amazon? Firstly people had problems as they didn't realise that Amazon can lose changes when you do some kinds of restarts (as config runs in RAM in effect), then they didn't realise that only backing up in the data centre was a bad idea (EBS backing for the EC2 instance). Amazon do make it easy here as S3 allows you to do quick, cost efficient backups - how many other services are designed for 11 9s - that is 99.999999999%? Again, if you get data loss it is not Amazon at fault, but your architects.
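
To get a feel for what 11 9s means, the arithmetic can be spelled out. The sketch below reads the figure as the chance that a single stored object survives one year, which is my assumption about how to interpret it, and the function names are made up.

```python
from fractions import Fraction

# Eleven nines, read here as the per-object chance of surviving one year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year for a given object count."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between losses of a single object."""
    return 1 / expected_annual_losses(object_count)
```

Under that reading, someone storing 10,000 objects would on average expect to lose one of them every 10,000,000 years.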

Amazon do status reporting online here. Do you get that from your other vendors, or internally in your own IT function? AWS are to be applauded for their transparency. The one corruption event I referred to was documented on the status page (an EBS fault occurred). It should be noted that no data was lost at all from this as it was failed over to another system. I have had on-prem failures in the past where I have lost data or taken a long time to recover. This incident was all sorted in less than an hour - AWS makes it easier in many cases to build solutions like this, so that such a problem does not have a high impact.

So in short would I go with Amazon AWS again? Absolutely!! I have never had any significant downtime with them in my roles, and it has saved money and been extremely flexible.

NB This is all my own opinion, and not that of my employer - something I also said at the panel discussion but was also omitted.

March 28, 2012

Bug of the week

Lukas Mai:

The following code is somewhat silly, but gcc should either compile it correctly or print an error message, not generate invalid asm[:]

int $1 = -1;
int main(void) { $1++; return $1; }

Assembler injection attacks, here we come!

March 25, 2012

Six Stages of Debugging

  1. That can’t happen.

  2. That doesn’t happen on my machine.

  3. That shouldn’t happen.

  4. Why does that happen?

  5. Oh, I see.

  6. How did that ever work?

[This is not mine. The oldest mention of it I could track down on the web appeared on a now-defunct weblog. I am posting it in the interest of personal archival.]

March 19, 2012

Shoestring & bubblegum sound server

In which I beat MacGyver.

I recently had need to play sound on a headless Linux machine. I started looking into sound servers, but everything I found seemed a significant amount of work to set up. I tried to reduce the problem to the fundamental parts involved, and by a trail of hints winding through a narrow mountain pass arrived at a rather… minimalist solution to fit my minimalist needs. I did not require anything else than to be able to hear sound at all and the solution did not require anything else of me than ALSA – and it’s hard to install a Linux machine without ALSA these days.

The entirety of the charade amounts to this:

  1. On the speakerless machine, load the loopback ALSA driver:

    modprobe snd-aloop index=0 pcm_substreams=1

    The driver provides a card with two sound devices, and when sound is output onto a stream on one device then the driver mirrors that as an input available on the same stream on the other device.

  2. Configure sound with an .asoundrc like this:

    pcm.!default {
      type dmix
      slave.pcm "hw:Loopback,0,0"
    }
    pcm.loop {
      type plug
      slave.pcm "hw:Loopback,1,0"
    }

    This has programs default to outputting sound to stream 0 of device 0 of the loopback (pseudo-)card, and has ALSA mixing their outputs together (type dmix). The loopback driver will make the resulting sound available for sampling via stream 0 of device 1, for which the configuration sets up another source called loop as a simple alias (type plug).

  3. On the machine with speakers you can then do this:

    ssh -C speakerless sox -q -t alsa loop -t wav -b 24 -r 48k - | play -q -

    Here -t alsa configures sox’s input to use the ALSA type, and loop (in the position where a filename is normally given) gives the name of the source – the loop alias from the configuration above. The rest of the switches tell sox to output 24-bit, 48 kHz WAV to standard output, to be picked up by ssh.

  4. Now play something on the speakerless machine.

This will push a constant stream of sample data down the wire, even during silence; with SSH compression enabled as it is here, that will come to something like 4 KB/sec and will very slightly busy the CPU on both machines. Both resource drains stop if you break the SSH connection. You can do so at any time without sound-playing programs on the speakerless machine ever noticing.

The one and only real drawback is a playback latency of a fraction of a second – enough to be noticeably not in real time.

But as I said, I had minimal needs of it.

[Update: added explanations.]

March 16, 2012

Kindle Reading Stats

I’ve written before about my initial investigations into the Kindle, and I’ve learnt much more about the software and how it communicates with the Amazon servers since then, but it all requires detailed technical explanation which I can never seem to find the motivation to write down. Extracting reading data out of the system log files is however comparatively simple.

I’m a big fan of measurement and data so my motivation and goal for the Kindle log files was to see if I could extract some useful information about my Kindle use and reading patterns. In particular, I’m interested in tracking my pace of reading, and how much time I spend reading over time.

You’ll recall from the previous post that the Kindle keeps a fairly detailed syslog containing many events, including power state changes, and changes in the “Booklet” software system including opening and closing books and position information. You can eyeball any one of those logfiles and understand what is going on fairly quickly, so the analysis scripts are at the core just a set of regexps to extract the relevant lines and a small bit of logic to link them together and calculate time spent in each state/book.

You can find the scripts on GitHub: https://github.com/mattbnz/kindle-utils

Of course, they’re not quite that simple. The Kindle doesn’t seem to have a proper hardware clock (or mine has a broken hardware clock). My Kindle comes back from every reboot thinking it’s either at the epoch or somewhere in the middle of 2010. The time doesn’t get corrected until it can find a network connection and ping an Amazon server for an update, so if you have the network disabled it might be many days or weeks of reading before the system time is updated to reality. Once it has a network connection it uses the MCC reported by the 3G modem to infer what timezone it should be in, and switches the system clock to local time. Unfortunately the log entries all look like this:


110703:193542 cvm[7908]: I TimezoneService:MCCChanged:mcc=310,old=GB,new=US:
110703:193542 cvm[7908]: I TimezoneService:TimeZoneChange:offset=-25200,zone=America/Los_Angeles,country=US:
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=localTimeOffsetChanged,arg0=-25200,arg1=1309689302:
110703:193542 cvm[7908]: I TimezoneService:LTOChanged:time=1309689302000,lto=-25200000:
110703:183542 system: I wancontrol:pc:processing "pppstart"
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=dataStateChanged,arg0=2,arg1=:
110703:183542 cvm[7908]: I ConnectionService:LipcEventArrived:source=com.lab126.cmd,name=intfPropertiesChanged,arg0=,arg1=wan:
110703:183542 cvm[7908]: W ConnectionService:UnhandledLipcEvent:event=intfPropertiesChanged:
110703:193542 wifid[2486]: I wmgr:event:handleWpasupNotify(<2>CTRL-EVENT-DISCONNECTED), state=Searching:
110703:113542 wifid[2486]: I spectator:conn-assoc-fail:t=374931.469106, bssid=00:00:00:00:00:00:
110703:113542 wifid[2486]: I sysev:dispatch:code=Conn failed:
110703:183542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wifid,name=cmConnectionFailed,arg0=Failed to connect to WiFi network,arg1=:

Notice how there is no timezone information associated with the date/time information on each line. Worse still the different daemons are logging in at least 3 different timezones/DST offsets all interspersed within the same logfile. Argh!!

So our simple script that just extracts a few regexps and links them together nearly doubles in size to handle the various time and date convolutions that the logs present. Really, the world should just use UTC everywhere. Life would be so much simpler.
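
As a rough illustration of the kind of regexp involved, here is a minimal sketch that splits one log line into its parts. It is not taken from the scripts in the repository; the pattern is inferred only from the sample lines above, and parse_line is a made-up name. Note that the resulting time is deliberately naive: the line itself does not say which timezone it was logged in, which is exactly the problem.

```python
import re
from datetime import datetime

# Matches lines such as
#   110703:193542 cvm[7908]: I TimezoneService:MCCChanged:mcc=310,old=GB,new=US:
LINE_RE = re.compile(
    r"^(?P<stamp>\d{6}:\d{6}) "
    r"(?P<daemon>[\w.-]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<level>[A-Z]) (?P<message>.*)$"
)


def parse_line(line):
    """Split a syslog line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # YYMMDD:HHMMSS with no timezone, so this stays a naive datetime.
    fields["time"] = datetime.strptime(fields.pop("stamp"), "%y%m%d:%H%M%S")
    return fields
```

Grouping the parsed lines by their daemon field is one way to see the differing offsets side by side before deciding how to correct them.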

The end result is a script that spits out information like:

B000FC1PJI: Quicksilver: Read 1 times. Last Finished: Fri Mar 16 18:30:57 2012
- Tue Feb 21 11:06:24 2012 => Fri Mar 16 18:30:57 2012. Reading time 19 hours, 29 mins (p9 => p914)

...

Read 51 books in total. 9 days, 2 hours, 29 mins of reading time

I haven’t got to the point of actually calculating reading pace yet, but the necessary data is all there and I find the overall reading time stats interesting enough for now.

If you have a jailbroken Kindle, I’d love for you to have a play and let me know what you think. You’ll probably find logs going back at least 2-3 weeks still on your Kindle to start with, and you can use the fetch-logs script to regularly pull them down to more permanent storage if you desire.

February 09, 2012

Pingdom - nice world wide monitoring tool.

Worth looking into.

These are two checks on my website, which runs on my own server. And yes, I know the response times aren’t amazing. And yes, we had a power outage the other day and my server wasn’t able to reconnect to the network for some reason.

Uptime for www: Last 30 days

Response time for www: Last 30 days

/Roger

January 23, 2012

Cloud computing and location of data

Disclaimer: I'm not a lawyer so this is not legal advice, and these views do not represent my employer's views either.

One of the big elephants in the room with cloud computing is the location of data. People are naturally worried about whether their data is accessible by others or not. Some providers will tell you the location of the data, some will not. There are also the issues of the Patriot Act and safe harbour when interacting with technology providers across the Atlantic.

The Patriot Act requires a US based corporation to hand over data to the government, and they do not have to disclose it to the end customer either if they are a service provider. As far as I can understand you are not protected any further even if the data is in the EU or another region. The defining requirement is whether they are a US based company.

One thing that is mentioned often is safe harbour. Basically what safe harbour means is that the US based provider will adhere to the same standards as the EU requires. This is because US data protection is basically non-existent. The safe harbour provisions do NOT mean your data will reside in the EU, it just means that it will be protected to the same standard as the EU.


Of course none of this matters if you work for a global corporation headquartered in the USA anyway as then you are required to hand data over to the government if requested under the Patriot Act as I read it. The difference is, whether you know whether the government is accessing your data. The government could request your data from you, but may not need to if they go to your cloud supplier who is also a US based corporation.


It is common sense that if you have sensitive data that you encrypt it, whether you store it in the cloud or on premises. This is especially important for data such as customer or employee data that would cause damage - either real financial loss or damage to reputation.

I believe that there will be a rise in cloud encryption services e.g. GPG type plugins for Gmail. Already Amazon has a service for S3 called Server-Side Encryption. With this service you give your private keys to Amazon to seamlessly encrypt/decrypt data on the fly. However what this means is that Amazon could give your encrypted data to the US government without your knowledge under the Patriot Act. As such in my mind the only reason that anybody would use this would be for low value data and I would not consider an encryption service for email where the vendor controls the private keys.

One aspect that people do often overlook is not so much government regulations, but their own rules. What do your customer policies say, and what do your own staff policies say? For example your HR policy may say that all personnel data will be stored in the UK, or your customer terms and conditions might say that all data will be stored in the EU. Many cloud services might not be based in the EU and there are very, very few in the UK. There can also be obscure regulations specific to your industry e.g. in a previous role the code master had to be in the UK as cryptographic code was considered a weapon and needed an export license.

It should be noted that complying with privacy regulations by storing data in the EU does not mean that it cannot be taken under the Patriot Act. In these cases it is assumed that the US Government is the evil one, but I have no reason to suspect that the UK or any other government is any less nefarious.

My conclusion is that it is safe to store data in the cloud if a company adheres to safe harbour, and is probably better than most companies' own data protection. If however you are worried about your data falling into government hands then you need to look into it very carefully. The only really safe way to protect your data from governments is to encrypt your data in the cloud with your own encryption keys.

Useful reference articles:
ZDNet Article: How the USA Patriot Act can be used to access EU data
Wikipedia article on the Patriot Act
Wikipedia article on safe harbor
Ars Technica article on Patriot Act and cloud providers

January 16, 2012

Status quo awareness

Paul Graham:

The trick I recommend is to take yourself out of the picture. Instead of asking “what problem should I solve?” ask “what problem do I wish someone else would solve for me?”

January 14, 2012

Concise XPath

I get the impression that not many people know XPath, or know it very well, which is a shame. For one, it’s a beautifully concise notation (as you’ll see shortly). For another, it may be the difference between whether you hate XML or not. (I won’t claim it’ll make you like XML, though it may. It did for me.)

XPath is really very simple: you just string together conditions. Evaluation begins with a set of nodes so far. Then a new set of nodes is selected based on the given ones, and the condition is checked on this new set. If it’s a condition you appended with /, that means to then select the matching nodes for the next step. If you appended it inside [], that means to continue on with the original set, but to discard those nodes for which there were no matching new nodes.

So /foo/bar means this:

  1. Start with the root node.
  2. Then /foo: for each node (which is just the root node, so far), fetch its child nodes (of which the root node always has exactly one), check which ones are foo elements, and take those as the new set.
  3. Then /bar: for each node, fetch its child nodes, check which ones are bar elements, and take those as the new set.

These conditions appended with / are known as steps.

And /foo[bar] means this:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then [bar]: for each node, fetch its child nodes, check if any are bar elements, and if you come up empty then discard that node.

This is known as a predicate. Each predicate can itself be just as complex as any expression: it can itself contain steps and predicates.
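
As a rough illustration of the difference between a step and a predicate, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. That module supports only a limited subset of XPath and evaluates paths relative to an element rather than from the root node, so the paths below start at the doc element. The sample document and its id attributes are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: three foo elements, two of which have bar children.
doc = ET.fromstring(
    "<doc>"
    "<foo id='a'><bar/></foo>"
    "<foo id='b'/>"
    "<foo id='c'><bar/><bar/></foo>"
    "</doc>"
)

# Step: foo/bar moves on to the bar children of every foo child.
bars = doc.findall("foo/bar")

# Predicate: foo[bar] stays with the foo elements, but discards
# those for which no bar child was found.
foos_with_bar = doc.findall("foo[bar]")
foo_ids = [foo.get("id") for foo in foos_with_bar]

print(len(bars), foo_ids)
```

The step returns the three bar elements; the predicate returns the two foo elements that contain them.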

Finally, there are axes, written as prefixes separated with a ::. Axes specify which set of nodes to select before checking the condition – it doesn’t have to be the child nodes of the current set, that’s just the default axis (which you don’t need to write) called child::. So you can write e.g. /foo/following-sibling::bar:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then /following-sibling::bar: for each node, fetch all its following siblings, check which are bar elements, and then take those as the new set.

(Thus /foo/bar and /foo[bar] really mean /child::foo/child::bar and /child::foo[child::bar] respectively. Therefore each condition also includes a selection rule, often implicitly.)
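
To make the idea of an axis concrete, the following Python sketch emulates following-sibling:: by hand. This is not how an XPath engine does it: the standard library's xml.etree.ElementTree does not implement this axis, so the helper following_siblings is a hypothetical stand-in written for this example, and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, node, tag=None):
    """Emulate following-sibling::tag (or following-sibling::* if tag is None).

    ElementTree elements do not know their parent, so it is passed in.
    """
    children = list(parent)
    index = children.index(node)
    # The axis selects everything after the node; the tag is the condition.
    return [c for c in children[index + 1:] if tag is None or c.tag == tag]


doc = ET.fromstring(
    "<doc><bar id='0'/><foo/><baz/><bar id='1'/><bar id='2'/></doc>"
)
foo = doc.find("foo")

# Roughly foo/following-sibling::bar: the bar before foo is not selected.
ids = [bar.get("id") for bar in following_siblings(doc, foo, "bar")]
print(ids)
```

The axis decides which nodes are candidates (here, everything after foo under the same parent), and only then is the condition (is it a bar?) checked.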

Compare expressions and explanations and you see what I said about concision and beauty.

Now, with those principles given to you, just string together conditions. There are a few syntactic shortcuts other than not needing to write child::, e.g. you can write attribute::foo as @foo, and /descendant-or-self::node()/foo can be written //foo, but there is no magic to those: they are just sugar. For the details – lists of possible axes, syntactic shortcuts, etc. – just refer to the standard. Lousy though it may be as an introduction, it makes a good reference.

That’s XPath.


Some practical notes:

With the various axes such as following-sibling::, you always get a whole set (e.g. all following siblings in this example). If you want a specific one from that set based on position – usually the first – you have to discard the ones you aren’t interested in by using a predicate that checks the position – in that case [1], which is another shortcut notation, standing for [position() = 1]. The position() function evaluates to the index of a node within its subset, which is based on the node it was selected for.

So a common construction is following-sibling::*[1], which amounts to “the element whose start-tag is right after this one’s end-tag.” A somewhat likely case is to further combine this with a [self::foo] predicate to say “but only as long as that is a foo element.”

Observe that the order of predicates matters.

If you write *[self::foo][1], you get all elements, then narrow it down to the foo elements, then to the first of them – so it amounts to “select the first foo element anywhere” which is identical to the much simpler expression foo[1]. This is very different from *[1][self::foo], which first narrows down “everything” to “the first thing” and only then checks “but only if it’s a foo.”
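
The effect of predicate order can be imitated with plain list filtering. The sketch below is an emulation in Python rather than real XPath evaluation (the standard library's xml.etree.ElementTree does not support the self:: axis), and the sample document is invented: the first child is a bar, followed by two foo elements.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<doc><bar/><foo id='x'/><foo id='y'/></doc>")
children = list(doc)  # corresponds to *

# *[self::foo][1]: narrow down to the foo elements, then take the first.
foos = [c for c in children if c.tag == "foo"]
first_foo = foos[:1]

# *[1][self::foo]: take the first element, then keep it only if it is a foo.
first_child = children[:1]
first_if_foo = [c for c in first_child if c.tag == "foo"]

# foo[1] is within the subset ElementTree does support.
same_as_first_foo = doc.findall("foo[1]")

print([c.get("id") for c in first_foo], first_if_foo)
```

With this document the first form selects the foo with id x, while the second selects nothing, because the first child is a bar.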

January 09, 2012

The essentially mediocre

MG Siegler:

If you’re saying something that you think is great, why would you want to do it as a comment on another site anyway?

December 28, 2011

Creative Commons - nordickiwi.com

I’ve decided to use the Creative Commons licence for this website. You can read more about it at the bottom of this page. I still need to figure out how to add it to the MediaWiki part of this site.

/Roger

December 07, 2011

Spend money on… which is it, now

Maciej Ceglowski:

To avoid this problem, avoid mom-and-pop projects that don’t take your money! You might call this the anti-free-software movement.

But it’s not! It’s the anti-free-service movement. Which I whole-heartedly support.

(Maciej makes that point himself, eventually and obliquely, but not until after the catchy coinage…)

December 04, 2011

Three months with the TouchPad

I first started writing this post on 2 September 2011. It was going to be called "three days with the TouchPad". I'd like to say that my opinion has changed substantially over the three months since then, but for that to have happened, I would have had to spend serious time with the device.

I haven't.

Last time anyone in our house tried to use the TouchPad it got thrown on the couch in disgust.[1] By contrast, our iPad is happily used every day. Is this just a case of "you get what you pay for"?

The story so far

I fought my way through the broken websites to purchase an £89 HP TouchPad when they cleared their stock at the end of August. I couldn't be sure that Carphone Warehouse had stock for all their orders, so I was overjoyed when mine turned "dispatched" later in the week. Then, it never arrived.  I wasted hours on the phone with CPW and Yodel (cheap courier of choice for "free delivery" everywhere), who claimed it had been delivered, when no knock had ever graced my door. The driver only spoke Bulgarian, and intimated (through a translator and wild hand gesturing) that he had given it to someone who had come up from the stairs below us - an empty flat.

I had all but given up on the delivery when, after the weekend, our neighbour came over and said their housekeeper had collected it on Friday and had it the whole time.

Argh.

Eventually, thanks to people like me, the TouchPad ended up getting 17% of the market!

Of everything that wasn't the iPad.

(So, more like 1.8% then.)

And remember, I very nearly wasn't a member of that club, as it seemed very unlikely that Carphone Warehouse would have been in a position to give me another one, had the first one not surfaced.

The TouchPad was an impulse buy, as we already owned an iPad. I had opted for middle of the range - the 32GB with 3G.[2] At clearance price, my iPad cost 7 times more than the TouchPad, but remember that the original retail pricing for a comparable device was £399 for HP vs £429 for Apple.

With all that in mind, here's a collection of thoughts about the TouchPad today. It is not a review: if you are interested in a review, albeit one from before the fire-sale, go read what Shawn Blanc wrote. The experience has hardly changed.

The good

I came into TouchPad ownership with a very open mind, based in part on my ex-colleague Sergei owning a Palm Pré and not hating it. Also, everything I read about webOS online made it seem that it was designed, where Android was mostly congealed. (My apologies to Douglas Adams.) Further, I wanted webOS to be a success, because I like to use systems that feel like they are consistently designed throughout, and I didn't think it would be good for the world if iOS was to be the only relevant platform for which that was true. We are in the odd position today that Microsoft has replaced Palm as the loveable underdog: Windows Phone (and possibly Windows 8 for tablets) has taken the mantle of "mobile operating environment which actually has some modern design principles applied, rather than just copying iOS", which surely must provoke some cognitive dissonance for all the people still bitter about how Microsoft stole everything from the Mac.

I only made one note from three days after unboxing: "It is really handy to have the number keys on the keyboard all the time". It still is. I suppose there are other nice things, depending on your point of comparison. Notifications are good, in general, though I really don't care that each web site I visit exposes a search endpoint, so I don't appreciate that the TouchPad shows me a notification for each and tries to add them to the search.

Grasping at straws, I still like the card metaphor, though not as much for multiple tabs as for multiple applications. And the things that were good about webOS on the phone, such as the integrated contacts, are still good here, though not as useful. The only other thing I noticed in a quick look through the menus is that it has Beats Audio, which I like to think makes me one step closer to Dr Dre than most. I don't think I've ever actually tried to make the thing play audio in order that I might notice a difference.

The goblin

How long after the horse died is it acceptable to still be flogging it?

The TouchPad is slow, out of the box. Nerds like me can make it faster with - wait for it - syslogd and kernel patches, and even overclock it if they feel the need. (I didn't.)  The iPad 1 still runs rings around it in everything - even though the iPad has half the CPU cores at a much lower clock speed, and one quarter the RAM of the TouchPad.

It has a handful of apps, but not enough to retroactively justify the purchase to me, even at £89. If I go to my Applications list, I have a beta Kindle reader, which I had to side-load as it is US only. The best Twitter experience is something called "Spaz HD Beta Preview 2", which is both award-winning and open source, though apparently named by the people who came up with "The GIMP". In fairness, it's not bad, it's just not up to the experience which is available on any one of the great Twitter clients for other platforms. And with the on-again off-again abandonment by HP, surely most of those who came into the TouchPad did it eyes-open, knowing the chances of it ever developing a good app ecosystem were not high.

Most of what I do on a tablet is web browsing, and so even if it had no apps but did web browsing brilliantly, it might be redeemed. It doesn't. It has Flash, which really just serves to make YouTube worse. Maps are horrible, scrolling is slow and sluggish, and clicking doesn't normally hit the link you want it to.

Physically, it feels cheap, due to the plastic back.  It is a good weight however.

The purchase

In my mind, there were three groups of people who wanted to buy a TouchPad at fire sale prices:

  • People who wanted a "tablet" (iPad), but couldn't afford or justify one at market (iPad) prices
  • People who wanted an "Android tablet" and figured that a port couldn't be far away
  • People who liked webOS and actually wanted a TouchPad to use webOS on it

I was in the third group, but I also suspect that was about 1.8% of the people who actually got the device.

If you were to compare the experience on a £89 TouchPad vs. whatever else you could legitimately purchase for £89 - how long were the queues for the Binatone HomeSurf 7? - it seems like a no-brainer. If there was no chance that the tablet were ever able to run Android, I don't think it would have sold nearly as quickly. At the time of writing there is an alpha-quality CyanogenMod release of Android for the TouchPad, for developers, rather than end users. With the recent release of Android 4.0, it's likely there will be a reasonably good upgrade path for the application story, and on this kind of hardware Android should be about as good as it is on any other kind of hardware.

I bemoaned this fact when I came to buy it:



I wish I could find everyone talking about running Android on the hp TouchPad, and STAB THEM IN THE FACE.
@craigbox
Craig Box

Three months later, has my attitude changed? Somewhat. I simply don't want to own an Android tablet. (Neither do many other people, as we established before.) Would it be better on this hardware than webOS? Probably. Ask me again when 4.0 is released for the TouchPad - I don't think the attempts to shoehorn Android 2.x onto tablets have done hackers any better than Samsung.

I don't think there can be any argument that the fire sale was a dumb idea, and HP's CEO eventually paid the price. Would I have paid £200 for this? No, but they would still have sold out at that price.

The summary

First world problems much? Our two tablet household isn't as good as it would be if we had an iPad each. Sure. I knowingly bought an £89 gadget to have a play with, and I suspect I could easily get that back if I wanted to sell it. Alternatively, if either of my brothers read my blog, I might be convinced to post it to them for Christmas. Over time, I think I might find a use for it - if I could pick up the Touchstone dock-slash-stand, I think it could make a great digital photo frame.  Even if all it ever did was be an LCD Kindle, it was still a bargain.

But the crux is that neither of us ever wants to use it. It almost got put in the cupboard today. Attempts to use it provoke disgust, throwing it back onto the couch, and getting up to find the iPad. There is really nothing redeeming about it.

  1. Fern later clarified: "It wasn't thrown on the couch, it was thrown at the couch."
  2. If I were to look back on that purchase, I would say the money spent on the 3G was mostly wasted - tablet usage is mostly at home. The iPad spent over a year without a 3G SIM card, though it has one now thanks to Arunabh, who pointed out that T-Mobile have a remarkable 12 months free on an iPhone 4 PAYG SIM, and the iPad takes the SIM quite happily.

December 02, 2011

Prep notes for NDF2011 demonstration

I didn't really have a presentation for my demonstration at the NDF, but the event team have asked for presentations, so here are the notes for my practice demonstration that I did within the library. The notes served as an advert to attract punters to the demo; as a conversation starter in the actual demo and as a set of bookmarks of the URLs I wanted to open.




Depending on what people are interested in, I'll be doing three things

*) Demonstrating basic editing, perhaps by creating a page from the requested articles at http://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles

*) Discussing some of the quality control processes I've been involved with (http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion and http://en.wikipedia.org/wiki/New_pages_patrol)

*) Discussing how Wikipedia handles authority control issues using redirects (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Redirect ) and disambiguation (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Disambiguation )

I'm also open to suggestions of other things to talk about.

December 01, 2011

Metadata vocabularies LODLAM NZ cares about

At today's LODLAM NZ, in Wellington, I co-hosted a vocabulary schema / interoperability session. I kicked off the session with a list of the metadata schema we care about and counts of how many people in the room cared about it. Here are the results:

8 Library of Congress / NACO Name Authority List
7 Māori Subject Headings
6 Library of Congress Subject Headings
5 SONZ
5 Linnean
4 Getty Thesauri
3 Marsden Research Subject Codes / ANZRSC Codes
3 SCOT
3 Iwi Hapū List
2 Australian Pictorial Thesaurus
1 Powerhouse Object Names Thesaurus
0 MESH

This straw poll naturally only reflects on the participants who attended this particular session and counting was somewhat haphazard (people were still coming into the room), but it gives a sample of the scope.

I don't recall whether the heading was "Metadata we care about" or "Vocabularies we care about," but it was something very close to that.

November 30, 2011

Unexpected advice

During the NDF2011 today I was in "Digital initiatives in Māori communities" put on by the talented Honiana Love and Claire Hall from the Te Reo o Taranaki Charitable Trust about their work on He Kete Kōrero. At the end I asked a question "Most of us [the audience] are in institutions with te Reo Māori holdings or cultural objects of some description. What small thing can we do to help enable our collections for the iwi and hapū source communities? Use Māori Subject Headings? The Iwi / Hapū list? Geotagging? ..." Quick-as-a-blink the response was "Geotagging." If I understood the answer (given mainly by Honiana) correctly, the point was that geotagging is much more useful because it's much more likely to be done right in contexts like this. Presumably because geotagging lends itself to checking, validation and visualisations that make errors easy to spot in ways that these other metadata forms don't; it's better understood by those processing the documents and processing the data.

I think it's fabulous that we're getting feedback from indigenous groups using information systems in indigenous contexts, particularly feedback about previous attempts to cater to their needs. If this is the experience of other indigenous groups, it's really important.

November 26, 2011

Goodbye 'social-media' world

You may or may not have noticed, but recently a number of 'social media' services have begun looking and working very similarly. Facebook is the poster-child, followed by google+ and twitter. Their modus operandi is to entice you to interact with family-members, friends and acquaintances and then leverage your interactions to both sell your attention to advertisers and entice other members of your social circle to join the service.

There are, naturally, a number of shiny baubles you get for participating in the sale of your eyeballs to the highest bidder, but recently I have come to the conclusion that my eyeballs (and those of my friends, loved ones and colleagues) are worth more.

I'll be signing off google plus, twitter and facebook shortly. I may return for particular events, particularly those with a critical mass the size of Jupiter, but I shall not be using them regularly. I remain serenely confident that all babies born in my extended circle are cute; I do not need to see their pictures.

I will continue using other social media (email, wikipedia, irc, skype, etc) as usual. My deepest apologies to those who joined at least partly on my account.

November 24, 2011

How I’m voting in 2011

It’s general election time again in New Zealand this year, with the added twist of an additional referendum on whether to keep MMP as our electoral system. If you’re not interested in New Zealand politics, then you should definitely skip the rest of this post.

I’ve never understood why some people consider their voting choices a matter of national security, so when via Andrew McMillan, I saw a good rationale for why you should share your opinion I found my excuse to write this post.

Party Vote
I’ll be voting for National. I’m philosophically much closer to National than Labour, particularly on economic and personal responsibility issues, but even if I wasn’t, the thought of having Phil Goff as Prime Minister would be enough to put me off voting Labour. His early career seems strong, but lately it’s been one misstep and half-truth after another, and the remainder of the Labour caucus and their likely support partners don’t offer much reassurance either. If I was left-leaning and the mess that Labour is in wasn’t enough to push me over to National this year, then I’d vote Greens and hope they saw the light and decided to partner with National.

Electorate Vote
I live in Dublin, but you stay registered in the last electorate where you resided, which for me is Tamaki. I have no idea who the candidates there are, so I’ll just be voting for the National candidate for the reasons above.

MMP Referendum
I have no real objections to MMP and I think it’s done a good job of increasing representation in our parliament. I like that parties can bring in some star players without them having to spend time in an electorate. I don’t like the tendency towards unstable coalitions that our past MMP results have sometimes provided.

Of the alternatives, STV is the only one that I think should be seriously considered. FPP and its close cousin SM don’t give the proportionality of MMP, and PV just seems like a simplified version of STV with limited other benefit. If you’re going to do preferential voting, you might as well do it properly and use STV.

So, I’ll vote for a change to STV, not because I’m convinced that MMP is wrong, but because I think it doesn’t hurt for the country to spend a bit more time and energy confirming that we have the right electoral system. If the referendum succeeds and we get another referendum between MMP and something other than STV in 2014, I’ll vote to keep MMP. If we have a vote between MMP and STV in 2014 I’m not yet sure how I’d vote. STV is arguably an excellent system, but I worry that it’s too complex for most voters to understand.

PS. Just found this handy list of 10 positive reasons to vote for National, if you’re still undecided and need a further nudge. Kiwiblog: 10 positive reasons to vote National

November 16, 2011

Nordickiwi soon 12 years old

I just read my first blog post, 28/11/99 Dobedo, that I wrote on my nordickiwi web site. Or, as everyone on the net called them then, a “Journal post”. On 28 Nov 1999 I published this account of a night out in Stockholm that was actually an IRL event for an online community called dobedo.
The Nordickiwi web site has evolved and mangled itself from being hosted here and there, and then on four different Fedora laptops over the years in various cupboards and spare rooms in my apartment and now my house.
It’s been built using various tools, from my own HTML, CSS and JavaScript to WordPress and MediaWiki with a MySQL database.

The current Fedora version is a few years old and it is time to get it revamped and updated again. So maybe the site could even be migrated and squashed into a new tool I need to learn. I’ve been thinking about hosting a Joomla server and chucking it into that. Maybe even changing to CentOS and moving away from Fedora…… shock horror.
We will see what happens.
But anyway, 12 years isn’t a bad effort and I have no intention of stopping. At times things go quiet on the site, but that all depends on my life situation and other stuff I have to prioritise. Anyway, happy anniversary to my website and thanks to all the visitors that ever stumbled across it.

/Roger Sinel - aka nordickiwi

Gnome3 or XFCE ?

Well I just ditched Gnome3 for XFCE, tried to adapt but just couldn’t do it. I missed the speed.

Sorry Gnome 3 dev team. You’re doing some good work, but it just ain’t my thing.

/roger

e-legitimation linux

This is a great little project that solves a problem for many of us GNU/Linux users in Sweden.

FriBID is an open source project for e-legitimation with BankID

http://www.fribid.se/

Personally I use the Fedora 15 fribid package maintained by Henrik Nordström.

/Roger Sinel

November 11, 2011

A week of freedom

A week of Freedom: this week has been a week of free thinking and free software.
This week the Stockholm FOSS community was treated to a visit by Richard Stallman from the Free Software Foundation, the man behind GNU/Linux and the GNU General Public License. He held a talk at Stockholm University that over 1100 people attended.

I’ve seen Richard speak here in Stockholm before and it was again a refreshing reminder to hear his message that many of us FOSS users are all too quick to forget.

Since his talk I’ve read many positive comments and discussions about his talk and message regarding Free Software and the four freedoms. And also some debate which is healthy and to be expected.

The day after the talk I was privileged to be able to drive Richard around Stockholm and show him some of the views of the city. It was very special spending time with a man that I and many others have so much to thank for. I will never forget that day.

Right now I am sitting on a train between Stockholm and Gothenburg on my way to the FSCONS conference. FSCONS is the Nordic countries’ largest gathering for free culture, free software and a free society. The conference is organized yearly with 250-300 participants.

Richard will be speaking again at this conference as will many other interesting guests.

So I will be back in Stockholm on Sunday after a full week of Freedom.

Roger Sinel

November 07, 2011

Fedora 16 - Features this time around

Soon the time is upon us again for a Fedora release. One day left when writing this.

And here’s a quick run down of some of the features. Some nice stuff to keep us on our toes and on the bleeding edge.

Cloud stuff, or cloud stuffing
Aeolus Conductor - is a web UI and tools to create and manage cloud instances over varied cloud types.
Condor Cloud - an IaaS cloud implementation using Condor and the Deltacloud API.
HekaFS (formerly CloudFS) - a “cloud ready” version of GlusterFS
pacemaker-cloud - application service high availability in a cloud environment.
OpenStack - is a collection of services that can be used to setup and run a cloud compute and storage infrastructure.

Window Managers – area to click in
Update GNOME 3 - to the latest upstream release.
GNOME unified input indicator - allowing users to switch seamlessly between keyboard layouts and input methods.
KDE Plasma Workspace 4.7 - including Plasma Desktop and Netbook workspaces.
Sugar – latest with enhanced activity set to provide a stable demo environment for Sugar as well as an environment for developers.

Multimedia gear
Blender 2.5 Updating Blender to the most recent version, cross platform suite of tools for 3D creation.

System changes
1000 System Accounts - Standardize on login.defs as authority for UID/GID space allocation, and move boundary between system and user accounts from 500 to 1000.
GRUB2 - Switch to using grub2 instead of grub legacy for boot loading an installed x86 system.
HAL Gone - Obsoleting HAL daemon (replaced by udisks, upower, libudev).
USB Network Redirection - Allow redirection of USB devices to other machines over the network.
Use Ext4 driver for Ext3 and Ext2 filesystems - Enable the ext4 driver to register ext3 and ext2 filesystems as well, and to mount those filesystems unchanged.

VIRTUAL World
Spice - aims to provide a complete open source solution for virtualized desktops. Spice 0.10 adds features such as USB sharing between guests, and audio volume messages between guest and client.
Sheepdog - is a distributed object-based storage system for QEMU/KVM.
Virtual machine lock manager daemon.
Virt-manager Guest Inspection
Xen Pvops Dom0 - pvops-based kernel to serve as dom0 for a Xen-based system.

DEV dept
GCC Python Plugin, GCC plugin that embeds Python within GCC.
Update Perl to 5.14.

Roger Sinel



November 06, 2011

Recreational authority control

Over the last week or two I've been having a bit of a play with Ngā Ūpoko Tukutuku / The Māori Subject Headings (for the uninitiated, think of the widely used Library of Congress Subject Headings, done post-colonially and bilingually but in the same technology). The main thing I've been doing is trying to munge the MSH into Wikipedia (Wikipedia being my addiction du jour).

My thinking has been to increase the use of MSH by taking it, as it were, to where the people are. I've been working with the English language Wikipedia, since the Māori language Wikipedia has fewer pages and sees much less use.

My first step was to download the MSH in MARC XML format (available from the website) and use XSL to transform it into a Wikipedia table (warning: large page). When looking at that table, each row is a subject heading, with the first column being the te reo Māori term, the second being permutations of the related terms and the third being the scope notes. I started a discussion about my thoughts (warning: large page) and got a clear green light to create redirects (or 'related terms' in librarian speak) for MSH terms which are culturally-specific to Māori culture.
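
The transformation was done in XSL, which is not reproduced here. As a rough sketch of the same idea in Python, the code below turns MARC XML records into MediaWiki table markup with the three columns described above. The function names are hypothetical, and the field mapping is an assumption based on the MARC 21 authority format (150 for the heading, 450 and 550 for tracings, 680 for notes); the MSH file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace. The field tags used below are an assumption
# about the MSH export, following the MARC 21 authority format.
MARC_URI = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_URI}


def field_texts(record, tag):
    """Return one string per datafield with this tag, joining its subfields."""
    texts = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        parts = [sf.text or "" for sf in field.findall("marc:subfield", NS)]
        texts.append(" ".join(parts))
    return texts


def to_wikitable(marcxml):
    """Render every MARC record as one row of a MediaWiki table."""
    root = ET.fromstring(marcxml)
    lines = ['{| class="wikitable"', "! Term !! Related terms !! Scope notes"]
    for record in root.iter(f"{{{MARC_URI}}}record"):
        term = "; ".join(field_texts(record, "150"))
        related = "; ".join(field_texts(record, "450") + field_texts(record, "550"))
        notes = " ".join(field_texts(record, "680"))
        lines += ["|-", f"| {term} || {related} || {notes}"]
    lines.append("|}")
    return "\n".join(lines)
```

The sketch does not escape pipe characters inside field values, which real data would need before being pasted into a wiki page.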

I'm about 50% of the way through the 1300 terms of the MSH and have 115 redirects in the newly created Category:Redirects from Māori language terms. That may sound pretty average, until you remember that institutions are increasingly rolling out tools such as Summon, which use wikipedia redirects for auto-completion, taking these mappings to the heart of most Māori speakers in higher and further education.

I don't have a time-frame for the redirects to appear, but they haven't appeared in Otago's Summon, whereas redirects I created ~ two years ago have; type 'jack yeates' and pause to see it at work.

October 07, 2011

Elegy to my only love in the cloud

Maciej Ceglowski:

Avos did a similar thing last week when they relaunched Delicious while breaking every feature that made their core users so devoted to the site (networks, bundles, subscriptions and feeds). They seemed to have no idea who their most active users were, or how strongly those users cared about the product. In my mind this reinforced the idea that they had bought Delicious simply as a convenient installed base of “like” buttons scattered across the internet, with the intent of building a completely new social site unrelated to saving links.

May you eventually find rest, del.icio.us. You have been undead since the day Yahoo! bought you, and Avos has only desecrated your corpse further. (I think at this point it qualifies as brand necrophilia.)

I made my peace at the beginning of the year – Avos just put the last nail in its coffin as far as I am concerned.

But I am saddened nonetheless.

For posterity, I should note my personalised comical note in this: the Avos zombie version of del.icio.us requires that usernames be at least 3 characters long. So I can no longer log into my account: ap. My ex-account. I could not even download an export of my bookmarks now if I didn’t thankfully have one already.

The worst I feel about this is for Joshua Schachter, and for the people who joined up with him after the Yahoo! acquisition because they understood his aspirations. There is a lesson here: if you care about something, don’t give away control of it – or at the least, not to a corporation. (Joining forces with other people made of flesh and blood is – no, can be – another matter. Choose wisely.)

What a shame.

September 23, 2011

Review of "Amazon Web Services: Migrating your .NET Enterprise Application"

Amazon Web Services: Migrating your .NET Enterprise Application
Rob Linton, Packt Publishing
2/5

(Review copy supplied by Packt Publishing.)

Amazon Web Services (AWS) is not a small topic. Just listed on their 'product summary' page are 28 different topics, most with an entire set of both product and API documentation behind it.

Condensing that into a book is not a trivial task, and it requires establishing a suitable narrative. This book has taken the angle of a ".NET Enterprise Application", and starts off well: a sample application, if a little trivial, is provided, and a goal stated to move the application from traditional server hosting to the Amazon cloud.

Good, but short, consideration is given to why you would put such an application in AWS rather than a platform solution. It then dives in to creating instances for deploying the application.

A book that takes you on a journey, as opposed to a general reference book, should not be afraid to make choices. Five pages are dedicated to the Import/Export service, which lets you post Amazon a hard drive. Shipping terabytes of data is a problem that users are unlikely to have up front - the book should acknowledge its existence, but it wastes time and confuses users by going in-depth on a subject which should be an appendix at best.

Similarly, Chapter 6 covers SQL Server, required for the example application, but then also covers Oracle, MySQL (RDS) and Amazon's key-value store SimpleDB, none of which are used or required. It is great to see that the notification (SNS) and queuing (SQS) are discussed in the context of how the application could be enhanced to use them, although using these services means you are "locked in" in much the same way you are on a platform service - somewhat undermining the point the book made in the beginning.

Many statements in this book are just plain wrong (such as Amazon.com not being hosted on AWS, or network (EBS) volumes being faster than instance disks - whole books could be written on this topic alone). Other sections of the book have been made outdated as Amazon has rolled out improvements - the most major of which being the new license mobility options allowing the use of SQL Server Enterprise. While there is nothing the author or publisher can do about progress, there are occasions where the book is internally inconsistent - for example, referring to 4 regions in one section and 5 in another. In general, poor editing detracts from the reading experience.

One of the reasons Amazon is so much cheaper than regular datacenter providers is they allow you to build reliable solutions out of commodity hardware. However, this means you need to make allowances that are not at all discussed in this book. Deploying applications across availability zones is absolutely essential - Amazon is up-front in saying that they expect failures, which are widely reported by people who do not understand that AWS is not a traditional, expensive battery-backed-SAN-reliable datacenter. This book mentions availability zones, but doesn't show how to properly use them.

Redundancy is only briefly touched on - SQL mirroring and failover, possibly the most important topic this book could cover, is given two paragraphs and then offloaded to Microsoft. Even though there appear to be enough servers for a redundant architecture, the eventual service is riddled with single points of failure and there is no way that an application built to this model should be allowed into production on AWS.

Further, many best practices, especially those around firewalls, security groups and Active Directory, are described incorrectly, and are likely to lead to insecure or unnecessarily expensive deployments.

The author clearly understands both Windows/SQL Server and the basics of AWS, but taking 28 topics and picking out the important ones is a difficult task, and overall this book does a poor job of it.

Updating a manuscript to include new functionality means it would effectively never be published. The alternative is a 'living document', published online: hard to make money from, but guaranteed to be up-to-date. I am unlikely to bother reading another book on AWS.

 

September 17, 2011

Software Freedom Day 2011 - Stockholm

The Software Freedom Day event was held in Stockholm today and we organised it within the Swedish Linux Society.

This was the 4th year in a row that SFD has been celebrated in Stockholm in conjunction with the Swedish Linux Society.

Pictures from today's event: SFD 2011 Pictures

Thanks to the guys who came and helped spread the word about Free Software today.

//Roger Sinel



September 12, 2011

Future of mobile apps

Interesting article about HTML5 being the future of apps on mobile. Having worked in the smartphone industry, I totally agree. Just as few native apps are used on the PC / Mac anymore, with most work happening in the web browser, I believe the same thing will happen on tablets and smartphones.

The key obstacles to the PC moving to the web were horsepower and fast links. If you think about the smartphone and mobile networks, you could consider them to be like the PC and dialup internet 5 years ago. Given Moore's law and 4G networks, I believe mobile will go the same way.

The other key part is offline storage in HTML5, so that applications can retain data/state. Already the Financial Times has switched to an HTML5-based app so that it can avoid the 'Apple Tax'.

September 07, 2011

OPML is spectacularly lousy

OPML blows chunks. This is my conclusion after a good two hours spent on exasperated googling: the specification is about as vague and informal as could be, the format misuses XML badly, and vital parts of it used widely by feed aggregators seem to be documented nowhere at all. Yuck. I guess I’ll end up deciphering the information I need from existing working code and hope it works in the general case.

This all started as what I thought would be a fun small scripting exercise: I was going to throw together a little script that would turn someone’s LiveJournal friends list into an OPML blogroll. Instead I spent more time beating on Google in mounting frustration, fruitlessly attempting to find something – anything –, as it would have taken to write the code for a better specified format. I came out empty-handed.

Update: Uche Ogbuji has ranted about the format, and Léon Brocard reports of a quote from a Ben Hammersley/Timothy Appnel talk at OSCON ’05:

Working with OPML is like driving nails into the floor with your forehead.

Update: I can’t believe I never linked to Charles Miller’s most eloquent panning of the format.

September 06, 2011

Cloud pricing is hard

One of the many benefits to cloud computing is the pricing model. Following Amazon's lead, any provider worth their salt lists their per-hour pricing on their website, and that is the price you pay, regardless of what you use.1 Gone are the days where you have to call for a custom price list, tailored for you by a man in a suit who is incentivised to charge exactly the maximum he thinks you will pay, no more, no less. This means startups can get hold of scalable infrastructure at economies previously only available to the canny corporate negotiator.

However, even in the automated, API-driven present, there are still different models for pricing which you can choose from. For example, Amazon has an on-demand price, reserved instances (pay up front to buy the right to run a machine for a cheaper rate) and spot instances (an instance market, where you bid a price and if the spot price is below that price, your instance runs). While spot instances sound like a curiosity for people doing queue-based distributed computing that can be started and stopped at will, James Saull points out they turn out to be an oddly cost-effective way to run your always-on infrastructure. You may not like the risk, and you are not getting the guarantee of instance availability that comes with reserved instances.

For the general case, once you understand what your infrastructure requirements look like on Amazon, you buy suitable reserved instances: you then save 34% or 49% on the cost of running the equivalent on-demand instance over 1 or 3 years.

Mull that over for a second. This morning, I came across a comparison of pricing between IBM SmartCloud Enterprise and Amazon EC2 (via Adrian Cockcroft). I don't know a lot about the IBM cloud, but I do know bad math when I see it.

Lies, damned lies and estimated usage quotes

Amazon offer an online cost calculator. It's accurate, and always kept up-to-date, but admittedly it can be hard to use. For example, you have a small drop-down box at the top of the page which dictates which region you're in; if you are adding infrastructure in multiple regions, it's easy to get lost.

The author of the IBM article, Manav Gupta, has obviously lost his way around the AWS calculator. His first estimate comes in at over $10,000 a month, as "Amazon has included costs for redundant storage and compute in Europe". Amazon do no such thing. No data crosses a region unless you specifically request it - an important thing to note for compliance with data protection law. What is more likely is that Gupta started pricing his infrastructure in Europe, noticed his error, and continued in the US, without realising that AWS offers five global regions (six if you include the new US GovCloud) and you can easily provision infrastructure in all of them. In fairness, the IBM calculator seems to be much simpler; I can't find information on where IBM host their SmartCloud.

Quote 1 is replaced by quote 2, which comes in at $6370.62. Ignoring the obvious-but-insignificant errors (how does an application which does 20GB of inbound data per week do 120GB/week through its load balancer?), a quick look at the bill tab shows storage allocated in US-WEST, where everything else is allocated in US-EAST. Gupta's quote includes 7GB of S3 storage which is not mentioned in the post (or accounted for in the IBM quote). Not only that, it's charged twice: once in US-EAST and once in US-WEST! Assuming that's an error, I removed both allocations, and in order to be fair to what has been requested, added 300GB of snapshot storage for the EBS volumes to the correct page of the calculator.

Our new estimate - only correcting for errors, and without touching the compute cost - is $4211.90.

I've already beaten the published IBM price, but why stop there? As I mentioned above, sensible cloud purchasing almost always involves instance reservations. Because the pricing appears to have changed since the IBM article was published (I can't find a way to make IBM instances cost the same as shown in the calculator screenshot), I can't tell what reservation was used (if any) in the initial calculation. However, IBM offer 6- and 12-month reservations on a 64-CPU pool, with the note that "reserved capacity may not be economically attractive with the low monthly usage you have selected above".

Let's go for a 12-month reservation on AWS, in case our habits change. (And if they do, remember that reserved instance pricing can apply to any instance in the same availability zone on the same account.)

Our monthly cost has dropped to $2738.04. We do have an up-front reservation cost to pay, but if we amortize that over 12 months (as IBM does in their calculator) we are down to $3420.54 per month. Why not throw in Gold Premium Support? It's only another $341/month.
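To make the amortization step explicit, here is a minimal C sketch. The post quotes only the two monthly figures, so the up-front fee is derived from them rather than taken from the calculator; the helper names are hypothetical, and the sketch assumes the amortized figure covers nothing beyond the reservation fee.

```c
#include <stdio.h>

/* Effective monthly cost when an up-front reservation fee is spread
   evenly over the reservation term (hypothetical helper). */
double amortized_monthly(double monthly_fee, double upfront_fee, int term_months)
{
    return monthly_fee + upfront_fee / term_months;
}

/* Up-front fee implied by a quoted monthly fee and a quoted amortized
   monthly figure over the same term (hypothetical helper). */
double implied_upfront(double monthly_fee, double amortized, int term_months)
{
    return (amortized - monthly_fee) * term_months;
}

/* Prints the implied fee and re-derives the amortized figure from the
   two numbers quoted in the post. */
void print_reservation_summary(void)
{
    double upfront = implied_upfront(2738.04, 3420.54, 12);
    printf("implied up-front fee: $%.2f\n", upfront);
    printf("amortized monthly:    $%.2f\n",
           amortized_monthly(2738.04, upfront, 12));
}
```

Calling print_reservation_summary() from a main function shows that the $682.50 monthly gap between the two quoted figures corresponds to an up-front payment of $8,190 over the 12-month term.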

With regard to Gupta's criticisms about not having a PDF export on the Calculator, I find it easy enough to hit "Print to PDF" on a web page myself, and the fact I can export these quotes and publish them on this blog far outweighs that hassle.

On the topic of software licensing

Pricing is even harder when you have to factor in the price of licensing. In fairness to IBM, the quoted Amazon costs do not include Red Hat Linux licenses. However, I suspect the only reason they were included, aside from IBM being a Big Support kind of company, is that commercially licensed software (RHEL, SUSE, Windows) is the only option you have on SmartCloud Enterprise.

If you want to run Oracle applications on EC2, why not run them on the freely-licensed Oracle Enterprise Linux? Or the most popular operating system for the cloud, Ubuntu Server?

Alternatively, if the requirement for Red Hat Linux is hard-and-fast, then there is an option to run Red Hat on-demand with Amazon EC2. Reserved instance pricing is not currently available for RHEL, therefore you would be better advised to bring your own RHEL licenses to the cloud with Red Hat Cloud Access.

In the interests of full disclosure, the on-demand RHEL price is $4519.34/mo, vs the $4211.90 above.

Did I mention the "everything else?"

Amazon have defined the cloud computing marketplace - at least for infrastructure - with EC2. As Adrian Cockcroft points out in his excellent write-up on using clouds vs. building them, no-one can even come close to the price and performance, let alone the global scope, of EC2. If I were building Manav Gupta's web application, I would have the benefit of resiliency by balancing the application between multiple Availability Zones, and the benefit of reduced maintenance by using RDS for the database tier. And the price would probably be even lower, too.

The cloud provides great benefits to those who can make their application fit its ways. This is not a trivial task - sometimes even working the calculators can be too hard. If you want help with this, I am the Head of Cloud Services at Stoneburn in London, and I'd love you to get in touch. (And follow me on Twitter.)

Update: Manav Gupta has commented and provided a much neater explanation for why his first quote was vastly over-provisioned: there is a sample 'web application' option in the AWS calculator, which assigns a bunch of sample infrastructure over and above what was included in the IBM sample web application. The moral of the story is to ensure you are comparing like for like (as much as possible with differing size options between cloud providers) when making provider comparisons.

 

  1. Or, tiered options are clearly laid out, as with AWS data transfer.

August 31, 2011

August 29, 2011

Denting the universe

Rafe Colburn:

Who changed the world most, Google or Apple? […] I’ll boil it down to the most world-changing contribution by each company over the past ten years.

Google is the company that improved search engine results enough to really open the Web to the masses. […]

Apple is the company that brought a real Web browser to the pockets of millions of people. […]

Of course both companies have done many other things, but I don’t think any are as significant as those two. Which one made a greater impact? You tell me.

I believe that goes to Google, hands down.

The web is possibly as big a change in the world as the printing press was – and here I am making this comparison with great respect for its gravity. The mere fact that what I am writing this very moment will be read by a number of people I daren’t think about because it would boggle my mind is something none of my ancestors could dream to achieve. (Yet already this concept is beyond banal to the everyday web user.)

Google was the company that made the web colonisable by the masses. What Apple has done since, no matter how incredibly great, is – with due apologies to Steve – fungible.

In that sense, the first Steve revolution, the PC, was of much greater import than his second one looks to be. (But I will hasten to add that the second one is young as of this writing, and who knows where it will yet lead.) The PC was a necessary step, though not sufficient, to bring the web to everyone.

But in spite of providing a prerequisite for the web, not only did Apple not create the web, but I will go so far as to argue here that they would and could not have. It is not in the DNA of the company to build humble utilities.

(And finally it should not be forgotten that Apple must share credit for the PC revolution with Microsoft.)

August 27, 2011

The Car extends Vehicle kindergarten, or, Replace Jargon With Pedagogy

Henning Koch, pushing back at my design pattern terminology rant:

I feel the need to defend Martin Fowler’s article because it had such a profound effect on me when it was published. Although I had been playing with “objects” and “classes” before, this article finally made me understand what OOP was all about. This is not true for many other articles and yes, I’m looking at you, shitty Car extends Vehicle OOP tutorial.

(Which made me laugh.)

Aristotle is right when he says that the concept of dependency injection should be so ubiquitously understood that it shouldn’t get its own brand name. But at the same time it represents the very essence of object-orientation and most people are not using it.

I can see what he’s talking about. It took me a lot of time to take to OOP at all, and I’m still not particularly passionate. In my hands it is purely a tool, a means to an end – I reach for it either when I have a lot of functions that operate on a particular data structure, or when I need to be able to pass around interchangeable implementations of a set of behaviours. The latter is precisely what Dependency Injection is about. That is indeed the essence of object orientation – in the abstract.

I suppose that my intuitive understanding of this principle puts me in a minority. For me, instead, the defining moment with regard to OOP was when I read Replace Conditional With Polymorphism. That is the essence of object orientation – in the concrete. That article made a lightbulb go off; for the first time I could verbalise what I had so long been doing intuitively, and why. For the first time, I could look at code and figure out a better design objectively, without handwaving.

Maybe it is time for someone to put together a non-shitty introduction to object orientation, based on explaining the idea of Replace Conditional With Polymorphism first, and how to reap the benefit by consistently using the idea of Dependency Injection second. Perhaps novices would spend less time thrashing about then, and we could finally stop inventing pompous names for perfectly trivial concepts.

Update: more pushback from Adrian Howard. See my comment.

Update: Kragen Sitaker thinks about how to write a better tutorial, with some discussion on Hacker News and silly jokes on Reddit ensuing.

August 21, 2011

“Disruptive”

Patrick Rhone:

Did you catch that? The iPad is causing such disruption in the PC business that HP, a company fundamental to the creation of the personal computer itself, is getting out of the PC business.

Someone has to step up to Apple in competition or “closed and subject to Jobs’ whims” is the future of IT. That’s not a world that I look forward to. Unfortunately, no one else seems to get what Apple is doing right.

August 19, 2011

RAGE

Interesting week for mobile. Google buys Motorola Mobility, HP abandons webOS devices, and only six months after it was promised, Symbian Anna (aka PR2.0) is released. Its best feature? New icons. No, really.

What's in it?  Lots. Let’s break down the main changes section-by-section. There are the new icons, of course, but there’s a whole lot more under the hood.

The reason Nokia gave for not drastically overhauling the look and feel of Symbian in S^3 was maintaining a "familiar look and feel". Yet their most important change in their first major update... new icons. So long, recognition. They didn't even put the new Nokia Pure font on there.

Anyway, let's update my N8. Turns out, the update is not available over the air in the UK. One of the two distinctly different ways to invoke a software update on the N8 says there is nothing available, the other says I have to use Nokia Ovi Suite on the PC to update. Well, I don't primarily run a PC, so I dig out a Windows laptop, install Ovi Suite, back up my phone, and... am offered three games, and no system update.

"Server is probably overloaded", the Internet says.

So next morning, I try again. Lo and behold, the system update awaits! Four small steps to follow. But I can't pass step 2 without a SIM card installed. I dredge out my NZ PAYG SIM (which appears to have expired, possibly taking my very old and very awesome phone number with it). However, apparently even a useless SIM is SIM enough for Nokia. Authenticate, back up (again), only to be told at step 3 that there is no update available.

Then, if I unplug the phone and plug it back in again, Ovi Suite tells me there are 9 updates, including applications, but the update screen says "Select the applications you want to install" and then offers nothing to choose from, and no active buttons except "Later".

Later, indeed. I had high enough hopes for this platform that I went to work for the Symbian Foundation, but with the lack of control we had over the platform, I'm not going to be pouring out any 40s in its memory any time soon.

 

I should point out that I'm not even using this phone any more: thanks to the generous Eiren O'Keeffe I'm currently borrowing an HTC Mozart, running Windows Phone 7. In general, and in comparison to the N8, it is fantastic.1 Data works reliably, mail works (without requiring a regular reboot), I can have both my calendars, etc, etc. I can't imagine how long it would take to get to this point with Symbian.

I still think it's too early to call if Nokia made the right choice with WP7. Symbian was obviously not going to get to Mango-good by November. The N9 looks nice, but it's running the abandoned half of Maemo, not the "Intel collaboration" half of MeeGo. Android was out. My friend Nez called it months ago: WP7 is OK. OK enough to sell Nokia phones when other manufacturers have the same software? We shall see. I have not been a fan of Nokia's industrial design of late, but I really do miss offline maps.

So far, the downsides for me of WP7 are the aforementioned maps, a weird bug where it wouldn't let me hang up once, and a lack of "official" apps. There is a free BBC News app in the Marketplace, with "This is a 3rd party application in no way associated with the BBC" on the front screen, but no actual BBC app. The Twitter app is passable, and apparently much improved in Mango.2 It's still hard to find third-party apps that are anywhere near as good as what you get on iOS, and especially hard to judge that from the Marketplace app on the phone.

There is a stolen-firmware update to WP7 Mango but I'm laying off installing it, expecting the official update will probably be out within a month. I somewhat expected it to beat Symbian Anna out.

 

Further RAGE: creating a cisco.com ID requires a "9-50 character username" (one character longer than my first and last name concatenated) and a password with a maximum of 15 characters (counting out my regular password, "correct horse battery staple".)3

They also can't get their story half straight:

On the form: Must be 8 or more characters and contain a combination of uppercase and lowercase letters (A-Z or a-z) and at least 1 number (0-9).
After submitting: Invalid Password. Password is case sensitive with length between 5 and 15 characters. Password cannot be the same as the user name.

And then it didn't like my phone number.

And then the captcha, which had not changed all the way through, changed.

And then it mentioned, for the first time, a field I hadn't updated.

And then, after EVERYTHING WAS FINALLY ACCEPTABLE, "Your session is no longer active".

It disturbs me a little that I have very little positive comment to make about all this technology at the moment. Perhaps I should have used 'curmudgeon' as my Cisco username. That's over 9 characters.

 

This last one actually started out as a positive story, but quickly soured: I found that with Miray HDClone, you can take a Windows machine running on Amazon EC2 with an instance-store (S3) root, clone the disk to an EBS volume, and then attach that EBS volume to a new machine. Voilà, one persistent machine!

After applying 2 years' worth of Windows updates, I actually found and cleaned malicious software with the Microsoft Malicious Software Removal Tool, for the first time ever. Unfortunately for recovery purposes, getting "the Windows CD" on EC2 seemed a little harder than it should be - even a couple of ISOs I threw up there were not recognised. Unable to guarantee the system was in a good state, I advised that the machine needed to be rebuilt from scratch, which of course requires the owner to audit all the software that was installed on it over the last 2 years. They will have fun with that!

 

  1. Except for the squareness of everything. Round those rectangles and I would be a happy camper.
  2. I sometimes felt the Gravity fanboy-ism on Symbian. However, Gravity has had to reinvent the entire platform to become half decent, and suffers from decisions made in less connected times - I still have to wait to "Go online" before I can use the app. Sorry Ole, but I'd rather use an OK app on a good platform than a great app on a burning platform.
  3. I had to look that up - it obviously wasn't as memorable as the list of 10 objects I was asked to remember in first-year psych, which are still burned into my head.

August 16, 2011

Thoughts on "Letter about the TEI" from Martin Mueller

Note: I am a member of the TEI council, but this message should be read as a personal position at the time of writing, not a council position, nor the position of my employer.

Reading Martin's missive was painful. I should have responded earlier; I think perhaps I was hoping someone else could say what I wanted to say and I could just say "me too." They haven't, so I've become the someone else.

I don't think that Martin's "fairly radical model" is nearly radical enough. I'd like to propose a significantly more radical model as a straw man:


1) The TEI shall maintain a document called 'The TEI Principles.' The purpose of The TEI is to advance The TEI Principles.

2) Institutional membership of The TEI is open to groups which publish, collect and/or curate documents in formats released by The TEI. Institutional membership requires members acknowledge The TEI Principles and permits the members to be listed at http://www.tei-c.org/Activities/Projects/ and use The TEI logos and branding.

3) Individual membership of The TEI is open to individuals; individual membership requires members acknowledge The TEI Principles and subscribe to The TEI mailing list at http://listserv.brown.edu/?A0=TEI-L.

4) All business of The TEI is conducted in public. Business which needs be conducted in private (for example employment matters, contract negotiation, etc) shall be considered out of scope for The TEI.

5) Changes to the structure of The TEI will be discussed on the TEI mailing list and put to a democratic vote with a voting period of at least one month. A two-thirds majority of votes cast is required to pass a motion, which shall be in English.

6) Groups of members may form for activities from time-to-time, such as members meetings, summer schools, promotions of The TEI or collective digitisation efforts, but these groups are not The TEI, even if the word 'TEI' appears as part of their name.




I'll admit that there are a couple of issues not covered here (such as who holds the IPR), but it's only a straw man for discussion. Feel free to fire it as necessary.



August 12, 2011

A quote on the state of patents

Matthew Phillips:

We have two software patent articles on the front page of HN. One from a developer saying they are unfixable; another from a lawyer saying they aren’t broken. I think this succinctly describes the situation we are in.

(Links mine.)

August 08, 2011

A strange game

Robert O’Callahan:

It no longer makes sense to take great ideas and create, sell and support software products. Instead, at modest expense and low risk you can obtain software patents covering your ideas in all their variations. Then just sit back and wait.

He wrote this 6½ years ago, and it’s more timely now than ever.

August 02, 2011

Lines spent

Edsger W. Dijkstra:

My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Faces of a culture

Nick Carr:

To succeed on a global scale, Facebook had to transform its early student culture to a more mainstream culture – it had to go from Shitfacebook to Straightfacebook.

Wet streets cause rain

Michael Crichton:

Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray’s case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward – reversing cause and effect. I call these the “wet streets cause rain” stories. Paper’s full of them.

In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.

0.999…

A way to illustrate 0.999…=1 occurred to me the other day that I haven’t seen before, although I’m sure it’s not original.

Namely, the purported difference between them, if there was one, would be 0.000…1. But that “…” represents infinitely many zeroes! No matter how long you wait for it, the 1 “at the end” never comes: there is simply no end to the zeroes.

Contrast with other numbers that are intractable in seemingly the same way, e.g. π. If you start writing down 4−π, you get non-zero digits right away: 0.8584073…. There is a material difference between 4 and π.

But with 1−0.999… you can write down digits from now till all eternity and you will never get anything more than a string of zeroes. No matter how far down the rabbit hole you chase that seeming difference, you’ll never find it.

Because: it isn’t there. There is no difference. They are the same.
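The same argument can be written with finite truncations. The following is a standard sketch rather than anything from the post itself, with n (a symbol introduced here) counting how many nines have been written down so far:

```latex
1 - 0.\underbrace{99\ldots9}_{n\ \text{nines}} = 10^{-n},
\qquad
\lim_{n \to \infty} 10^{-n} = 0,
\qquad\text{so}\qquad
0.999\ldots = \sum_{k=1}^{\infty} \frac{9}{10^{k}}
            = \frac{9/10}{1 - 1/10} = 1 .
```

Every finite truncation leaves a gap of exactly 10^{-n}, and that gap shrinks below any positive number you care to name, which is the formal version of the 1 "at the end" never arriving.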

July 29, 2011

Software Freedom Day 2011 Stockholm

Saturday 17th September 2011
Stockholm SFD 2011 event

For the fourth year in a row, the Swedish Linux Society (“Svenskalinuxföreningen”) will be organising a Software Freedom Day event here in Stockholm. The details and plans are still to be worked out, but once again this important day will be celebrated.

/Roger Sinel



July 23, 2011

Software that doesn’t work with Mac OS X Lion

Since upgrading to Mac OS X Lion I’ve found the following software on my system that no longer works.

Games & apps that require Rosetta (PowerPC compatibility)

  • Age of Empires 2 Gold
  • Baldur’s Gate 2
  • Diablo 2
  • DooM Legacy
  • Railroad Tycoon 3
  • Starcraft

Games and apps that crash on start

  • The Dig
  • Indiana Jones and the Last Crusade
  • Guitar Rig 4 64bit (patch expected soon)

July 19, 2011

An era has come to an end

The last vestiges of the SSI code that originally drove this site are finally gone.

(At some point I may have more to write about the code that now runs the site. But Earth has turned many times since I wrote of it last and the world in which weblog engines were interesting seems to have expired in the meantime.)

July 12, 2011

Even more Cocoaheads Presentations

I’ve continued my journey with Final Cut Pro X and created more videos from talks done at the June Melbourne Cocoaheads meetup. These videos are now up on the Melbourne Cocoaheads Vimeo group (and embedded below).

David Kennedy & Scott Manley of Dangerous Pixels speak about building their iOS app development consultancy and their application Task Caddy.

You can find a wrap-up from the Dangerous Pixels guys, with links, slides and other resources over on their blog.

Luke Cunningham & Jesse Collis on “Epic Refactorings and Patterns to Make Your Code Awesome”.


July 07, 2011

Permissions For Browser-Based Applications

Robert O’Callahan:

Good old <input type="file"> accidentally invented an excellent permissions model for file I/O. The app asks the user to choose a file to load, and in the process the user implicitly grants the app permission to read that file! The same sort of approach works in other situations.

July 02, 2011

More Melbourne Cocoaheads Presentation Videos

I’ve been getting busy with Final Cut Pro X and created more videos from talks done at our local Cocoaheads meetups. These videos are now up on the Melbourne Cocoaheads Vimeo group (and embedded below).



June 28, 2011

When you want something done right

Phil Wilson just suffered a crippling blow from Google: they disabled his account for no apparent reason, so he no longer has access to his own email and his own weblog.

I know I have harped on this topic before, but in this age of “Web 2.0” I don’t think it can be repeated often enough. If you rely on others, you better have a plan B.

Danah Boyd recounts a similar story.

Martin De Wulf didn’t lose an account (his wife’s); it was stolen, but the bottom line remains.

What’s wrong with this macro?

What is wrong with this Objective-C (and C/C++) macro?


#define RETAIN_PROPERTY(propertyName, newValue) \
    do { \
        if (propertyName##_ != newValue) \
        { \
            [newValue retain]; \
            [propertyName##_ release]; \
            propertyName##_ = newValue; \
        } \
    } while(0)

Looks pretty benign, right? Well, it is when you call it like this:


id n = [NSNumber numberWithInt:10];
RETAIN_PROPERTY(propName, n);

But what happens when you do this?


RETAIN_PROPERTY(propName, [NSNumber numberWithInt:10]);

Well, macros are just that: macros. They generate code. They substitute the text of the macro parameters and emit the body into your source. Unlike methods or functions, they do not evaluate their parameters before passing the result to the macro. So the macro above, when called the second way, will produce the following code:


    do {
        if (propName_ != [NSNumber numberWithInt:10])   // first evaluation
        {
            [[NSNumber numberWithInt:10] retain];        // second: retained, never released
            [propName_ release];
            propName_ = [NSNumber numberWithInt:10];     // third: stored without a retain
        }
    } while(0);

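Not quite what we wanted, right? The expression is evaluated three times, so we create three separate auto-released NSNumber instances. The one we retain is never stored anywhere, so it leaks. The one we store in propName_ is never retained, so we’ll probably have a zombie object in propName_ after the auto-release pool is drained. A nice confusing bug for us to stumble upon.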
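The same repeated evaluation can be reproduced in plain C, without the Objective-C runtime. This is only a minimal sketch: `ASSIGN_IF_DIFFERENT`, `next_value` and `evaluations_for_one_call` are hypothetical names made up for illustration, and the macro only mimics the compare-then-assign shape of RETAIN_PROPERTY, leaving out the retain and release calls.

```c
#include <stdio.h>

/* Hypothetical macro with the same shape as RETAIN_PROPERTY:
   the argument text appears twice in the body (compare, then assign). */
#define ASSIGN_IF_DIFFERENT(target, newValue) \
    do { \
        if ((target) != (newValue)) \
        { \
            (target) = (newValue); \
        } \
    } while (0)

static int call_count = 0;

/* Stand-in for an expression with a side effect, such as a method
   call that creates a new object every time it runs. */
static int next_value(void)
{
    call_count++;
    return call_count * 10;
}

/* Returns how many times the argument expression was evaluated
   for a single use of the macro. */
int evaluations_for_one_call(void)
{
    int stored = 0;
    call_count = 0;
    ASSIGN_IF_DIFFERENT(stored, next_value());
    printf("stored = %d after %d evaluation(s)\n", stored, call_count);
    return call_count;
}
```

Calling `evaluations_for_one_call()` shows that the value that was compared is not the value that ends up stored, because each appearance of the parameter runs `next_value()` again.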

So how do we fix this?

Like so:


#define RETAIN_PROPERTY(propertyName, newValue) \
    do { \
        __typeof__(newValue) __A = (newValue); \
        if (propertyName##_ != __A) \
        { \
            [__A retain]; \
            [propertyName##_ release]; \
            propertyName##_ = __A; \
        } \
    } while(0)

The line __typeof__(newValue) __A = (newValue); forces the passed parameter to be evaluated exactly once and the result stored in the temporary __A. We can then use the temporary variable multiple times within the macro body without fear.
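The effect of the temporary can be checked in plain C as well. Again, this is a sketch under assumptions: `ASSIGN_IF_DIFFERENT_ONCE`, `next_value_once` and `evaluations_with_temporary` are hypothetical names, the retain and release calls are left out, and `__typeof__` is a GCC/Clang extension rather than standard C.

```c
#include <stdio.h>

/* Hypothetical macro: the argument is evaluated once into a temporary,
   and only the temporary is used in the rest of the body.
   __typeof__ is a GCC/Clang extension and does not evaluate its operand. */
#define ASSIGN_IF_DIFFERENT_ONCE(target, newValue) \
    do { \
        __typeof__(newValue) tmp_value = (newValue); \
        if ((target) != tmp_value) \
        { \
            (target) = tmp_value; \
        } \
    } while (0)

static int once_call_count = 0;

/* Stand-in for an expression with a side effect. */
static int next_value_once(void)
{
    once_call_count++;
    return once_call_count * 10;
}

/* Returns how many times the argument expression was evaluated
   for a single use of the macro. */
int evaluations_with_temporary(void)
{
    int stored = 0;
    once_call_count = 0;
    ASSIGN_IF_DIFFERENT_ONCE(stored, next_value_once());
    printf("stored = %d after %d evaluation(s)\n", stored, once_call_count);
    return once_call_count;
}
```

Here the value that is compared and the value that is stored are the same, since both come from the temporary.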

Bug fixed and we can get back to being productive coders.


June 23, 2011

unit testing framework for XSL transformations?

I'm part of the TEI community, which maintains an XML standard that is commonly transformed to HTML for presentation (more rarely PDF). The TEI standard is relatively large but relatively well documented. The transformation to HTML, however, has thus far been largely piecemeal (from a software engineering point of view) and not error free.

Recently we've come under pressure to introduce significantly more complexity into transformations, both to produce ePub (which is wrapped HTML bundled with media and metadata files) and HTML5 (which can represent more of the formal semantics in TEI). The software engineer in me sees unit testing as a way to reduce our errors while opening development up to a larger, more diverse group of people with a larger, more diverse set of features they want to see implemented.

The problem is that I can't seem to find a decent unit testing framework for XSLT. Does anyone know of one?

Our requirements are: XSLT 2.0; free to use; runnable on our Ubuntu build server; testing the transformation with multiple arguments; etc.

We're already using: XSD, RNG, DTD and schematron schemas, epubcheck, xmllint, standard HTML validators, etc. Having the framework drive these too would be useful.

The kinds of things we want to test include:
  1. Footnotes appear once and only once
  2. Footnotes are referenced in the text and there's a back link from the footnote to the appropriate point in the text
  3. Internal references (tables of contents, indexes, etc) point somewhere
  4. Language encoding using xml:lang survives from the TEI to the HTML
  5. That all the paragraphs in the TEI appear at least once in the HTML
  6. That local links work
  7. Sanity check tables
  8. Internal links within parallel texts
  9. ....
Any of many languages could be used to represent these tests, but ideally it should have a DOM library and be able to run that library across entire directories of files. Most of our community speak XML fluently, so leveraging that would be good.

June 21, 2011

Slides from Cloud Computing World Forum

My slides from Cloud Computing World Forum today are up at http://www.next-genit.co.uk/events-1

It was a good event, with some pretty good speakers in general and lots of interesting and smart people around. If you're in London, it's still on tomorrow.

June 19, 2011

A different angle on Twitter’s recent moves

Tom Davis:

I’m actually quite happy that Twitter is starting to close up their platform. People can stop pretending it’s some open platform for global communication and finally realize it’s a novel service thanks to its popularity, but little more.

June 14, 2011

winmail.dat

I still get these files creeping into my inbox from time to time and Evolution doesn’t know what to do with them. I just found this solution to unpack them in Linux and it worked.

http://www.kopf.com.br/winmail/winmail.php

Linux version
http://sourceforge.net/projects/tnef/

# tar xzvf tnef-x.y.tar.gz
# cd tnef-x.y
# ./configure
# make check
# make install
# tnef -v winmail.dat

done

/roger