Penguin

June 19, 2013

A wikipedia strategy for the Royal Society of New Zealand

Over the last 48 hours I’ve had a very unsatisfactory conversation with the individual(s) behind the @royalsocietynz twitter account regarding wikipedia. Rather than talk about what went wrong, I’d like to suggest a simple strategy that builds the Society’s causes in the long term.
First up, our resources: we have three wikipedia pages strongly related to the Society, Royal Society of New Zealand, Rutherford Medal (Royal Society of New Zealand) and Hector Memorial Medal; we have a twitter account that appears to be widely followed; we have some employee of RSNZ with no apparent wikipedia skills wanting to use wikipedia to advance the public-facing causes of the Society, which are:
“to foster in the New Zealand community a culture that supports science, technology, and the humanities, including (without limitation)—the promotion of public awareness, knowledge, and understanding of science, technology, and the humanities; and the advancement of science and technology education: to encourage, promote, and recognise excellence in science, technology, and the humanities”
The first thing to notice is that promoting the Society is not a cause of the Society, so no effort should be expended on polishing the Royal Society of New Zealand article (which would also breach wikipedia’s conflict of interest guidelines). The second thing to notice is that the two medal pages contain long lists of recipients, people whose contributions to science and the humanities in New Zealand are widely recognised by the Society itself.
This, to me, suggests a strategy: leverage @royalsocietynz’s followers to improve the coverage of New Zealand science and humanities on wikipedia:
  1. Once a week for a month or two, @royalsocietynz tweets about a medal recipient with a link to their wikipedia biography. In the initial phase recipients are picked with reasonably comprehensive wikipedia pages (possibly taking steps to improve the gender and racial demographic of those covered to meet inclusion targets). By the end of this part followers of @royalsocietynz have been exposed to wikipedia biographies of New Zealand people.
  2. In the second part, @royalsocietynz still tweets links to the wikipedia pages of recipients, but picks ‘stubs’ (wikipedia pages with little or almost no actual content). Tweets could look like ‘Hector Medal recipient XXX’s biography is looking bare. Anyone have secondary sources on them?’ In this part followers of @royalsocietynz are exposed to wikipedia biographies and the fact that secondary sources are needed to improve them. Hopefully a proportion of @royalsocietynz’s followers have access to the secondary sources and enough crowdsourcing / generic computer confidence to jump in and improve the article.
  3. In the third part, @royalsocietynz picks recipients who don’t yet have a wikipedia biography at all. Rather than linking to wikipedia, @royalsocietynz links to an obituary or other biography (ideally two or three) to get us started.
  4. In the fourth part @royalsocietynz finds other New Zealand related lists and gets the by-now highly trained editors to work through them in the same fashion.
This strategy has a number of pitfalls for the unwary, including:
  • Wikipedia biographies of living people (BLPs) are strictly policed (primarily due to libel laws); the solution is to try new and experimental things out on the biographies of people who are safely dead.
  • Copyright laws prevent cut and pasting content into wikipedia; the solution is to encourage people to rewrite material from a source into an encyclopedic style instead.
  • Recentism is a serious flaw in wikipedia (if the Society is 150 years old, each of those decades should be approximately equally represented; coverage of recent political machinations or triumphs should not outweigh entire decades); the solution is to identify sources for pre-digital events and promote their use.
  • Systematic bias is an on-going problem in wikipedia, just as it is elsewhere; a solution in this case might be to set goals for coverage of women, Māori and/or non-science academics; another solution might be for the Society to trawl its records and archives for lists of minorities to publish digitally.

Conflict of interest statement: I’m a highly active editor on wikipedia and am a significant contributor to many of the wikipedia articles linked to from this post.

June 10, 2013

rename 1.600

I just cut a new release of rename: 1.600. The headline feature of this version was inspired by Dr. Drang: a built-in $N variable for easily numbering files while renaming them. It is accompanied by a --counter-format switch for passing a template, so you will be spared the fiddling with sprintf for padded counters.
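For comparison, this is roughly the kind of hand-rolled padded counter that a built-in counter makes unnecessary. It is only a sketch in plain shell and does not call rename at all; the function names and the `photo-` prefix are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of rename): build a numbered file name
# with a zero-padded counter, formatted by hand with printf.
numbered_name() {
  local counter=$1 original=$2
  printf 'photo-%03d-%s\n' "$counter" "$original"
}

# Number every name given as an argument, starting from 1.
number_all() {
  local n=0 name
  for name in "$@"; do
    n=$((n + 1))
    numbered_name "$n" "$name"
  done
}

number_all beach.jpg sunset.jpg
```

The padding width lives in the printf template, which is exactly the sort of detail a counter format template takes over.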

I also finally gave the documentation the huge overhaul it has needed and deserved for a long time. There is now a proper synopsis, the description is brief, and the tutorial that was previously in the description section is a separate much larger section adapted to all the new stuff added since my original version of this utility. Lots of things are now documented properly for the first time.

In more minor notes, there is now a negatable --stdin switch you can use to explicitly tell rename to read from stdin, rather than it just guessing that it’s supposed to do that based on the absence of file names on the command line. The purpose of this is more predictable behaviour in situations where rename is passed computed arguments that may evaluate to nothing (e.g. with the nullglob shell option).
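To see why computed arguments can evaluate to nothing, here is a small bash sketch of what nullglob does to a glob that matches no files. `count_args` is a hypothetical stand-in for any command, and the directory name is assumed not to exist.

```shell
#!/bin/bash
# Hypothetical stand-in for any command that receives computed arguments.
count_args() {
  echo "$#"
}

# Default behaviour: a glob that matches nothing is passed through literally.
shopt -u nullglob
count_args /nonexistent-dir-for-demo/*.txt   # prints 1

# With nullglob, the same glob expands to nothing at all.
shopt -s nullglob
count_args /nonexistent-dir-for-demo/*.txt   # prints 0
shopt -u nullglob
```

In the second case the command sees an empty argument list, which is the situation where guessing "no file names, so read stdin" becomes surprising.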

And lastly, I extracted a new --trim switch from --sanitize, mostly for consistency’s sake.

Share and enjoy.

May 10, 2013

Together we can end this destructive conflict

The irc.mozilla.org qdb:

<jesup> There is no CapsLock.  There is only Ctrl.  ;-)
<jesup> First thing I config on any new machine.  Can you tell I use emacs?
<mbrubeck> It's the first thing I configure too, and I'm a vi user.
<mbrubeck> Maybe we've found the common ground that can unite a war-torn planet!

(Related.)

April 10, 2013

A quote for the ages

jwz:

I tried to explain to rzr_grl what Debian was, and the best I could come up with was that they’re like the Radical Fundamentalist nutjob faction of Linux: people for whom Red Hat is insufficiently extremist. At this point she looks at me as if to say, “you mean the nutjobs have their own nutjobs??” I suspect she thought I was making the whole thing up.

January 01, 2013

Mobile broadband for home usage in the UK

I've recently moved into our own home (yeah!) but found myself with no timeline yet on when I can get home broadband :-( The situation is that Openreach has an open order on our new phone/ADSL line so can't do anything for quite a few weeks, until then - and that particular order is probably a cancellation!! It is a peculiar situation in that it would have been quicker if the previous owners hadn't done the right thing and then we could have "taken over" the line. No ISP is interested in the line until then even though I can show we legally own the house. So I've decided to document how I kept working and kept our sanity.

Firstly my requirements were that we needed a fair bit of bandwidth as we have three heavy Internet users in the house and many devices, and that we don't get hit with massive bills if we go over any cap. I looked at pure pay as you go devices but the cost per GB worked out at about £5 per GB. Nasty! So for the main bandwidth I went with Three on the MiFi - Huawei E586. This gives 15 GB per month on a 24 month contract for £18.99 - £15.99 for the bandwidth and £3 for the device. I know Three has a bad reputation for some people due to poor initial coverage at launch but they have been brilliant for me in the past - used them for a dongle when we first moved to the UK and were the only ones who would give me a contract on the spot at the time. Again this time I went to a Three shop and they sorted it all out for me instantly.

This was working well and at a decent speed (6 Mbits down, 3 Mbits up) but is getting used up fairly quickly. I had turned off WiFi for phones and iPads that had data on, but the gaming / YouTube etc takes its toll. One thing to note is that you can get the bill capped on these as well so that you don't get a nasty bill once you hit the 15 GB. You do have to tell them this though, and they don't guarantee it will work if their bandwidth site is down (which it has been a couple of times) so I will turn the device off when it gets close to the 15 GB each month.

I then went looking for a pay as you go option that was reasonably priced and I settled on T-Mobile. They are unique in that you pay per time period and that if you go over the amount of allocated bandwidth then they just slow down your connection and do not allow videos or downloads. The data allowances / days are £2 for 1 day and 250 MB, £7 for 1 week and 500 MB or £15 for 1 month and 1GB. I figure that this will work fine for the short time when the allowance on Three has run out during the month. One useful thing T-Mobile do is graphics compression which reduces your bandwidth use. If you don't like it you can always turn it off at http://www.accelerator.t-mobile.co.uk/. I did have a bit of a problem getting the T-Mobile connection going. I paid £25 for a Huawei E3131 including £10 topup. They didn't apply the topup to my SIM and it was not working. I rang them up and they said they would correct and activate it. In the meantime I decided to test it on a Zoom 4501 3G router that I had lying around to see if this would tell me the status of the card. Unfortunately the data stick stopped working altogether at that time. This then made me think back a few weeks ago when my work Vodafone data stick stopped working and funnily enough I had tried to test this as well... On the Zoom you plug the USB data stick into a USB port and it shares out the connection. This is great in theory and I have used it a fair bit in the past. But it now has developed a fault and has destroyed two USB data sticks so I would say stay away from it...
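Going back to the prices for a moment, the per-GB arithmetic is easy to check with a few lines of shell. This is just a sketch: `cost_per_gb` is a made-up helper, I am assuming 1 GB = 1000 MB, and the Three figure uses only the £15.99 bandwidth part of the contract.

```shell
#!/bin/bash
# Price per GB = price in pounds / allowance in GB.
# Assumption: 1 GB = 1000 MB, so 250 MB counts as 0.25 GB.
cost_per_gb() {
  awk -v price="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", price / gb }'
}

echo "Three, 15 GB per month:   £$(cost_per_gb 15.99 15) per GB"
echo "T-Mobile, 1 day, 250 MB:  £$(cost_per_gb 2 0.25) per GB"
echo "T-Mobile, 1 week, 500 MB: £$(cost_per_gb 7 0.5) per GB"
echo "T-Mobile, 1 month, 1 GB:  £$(cost_per_gb 15 1) per GB"
```

The comparison is rough, since T-Mobile slows the connection down at the limit rather than charging more, but it shows why the contract carries the bulk of the traffic and the pay as you go SIM is only a stopgap.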

After breaking the new Huawei data stick I put the Micro SIM into the only phone I had that took a Micro SIM - a Nokia 800 and it worked for receiving text messages, but not for Internet. The phone had previously been used for Vodafone. None of the "download automatic settings" seemed to work for the phone, so I had a hunt around on the Internet and just went and put in an APN of general.t-mobile.uk - note the lack of .co in the middle as some websites had this wrong and having it wrong doesn't work.

Having said all this I will be very happy to go back to Sky Broadband when I can for £10 per month. Will be even better when they roll out fibre here. I have used Virgin before but found they "managed / shaped" the traffic a lot and this interfered with gaming. I also can't get it at this house anyway...

November 03, 2012

A few tips for Amazon


Just got asked about getting started with Amazon Web Services (AWS) and thought I may as well put it up as a blog post too.

So here they are:
  • Make sure you take support of some kind from Amazon. You need it as sometimes your machine might get a glitch or you just want to ask a few questions
  • Get to know the account team at Amazon. They will give you free technical training and help you out. Once you grow enough they'll also help you change from credit card billing to on account billing. I personally wouldn't worry about trying to alter contract/legal terms and conditions - you'll tie yourself in knots for ages and gain virtually nothing
  • Architect for failure. With Amazon you still need to have redundancy and backups. See my blog post at http://blog.next-genit.co.uk/2012/04/building-for-amazon.html
  • Use their products where possible to reduce work for you. e.g. Amazon Linux (their version of RedHat), RDS (MySQL, Oracle), DynamoDB (a NoSQL database)
  • Start with small instances and make them bigger as needed, rather than the other way around. Very easy to resize (just needs a reboot) and you will save money. Only exception to this is micro instances which will never give you reliable performance as they just use timeslices that are spare.
  • Use 64 bit so you can scale all the way up if needed. No penalty on cost.
  • Amazon can now do just about anything as they have introduced SSD disks, committed IOPS etc
  • Utilise VPC (Virtual Private Cloud) by default, which is their private network offering. This can now connect back to your firewall over an IPSec VPN, and they also connect to some data centres directly now. Of course you need to follow good system design and keep systems together that cause a lot of IO between each other.

October 26, 2012

EarPods

Curiosity got the better of me: I succumbed to the hype and bought a set of Apple’s new EarPods. These are my thoughts on them after a week.

Basically, they sound OK. They are most certainly a huge improvement on the buds Apple used to make, but the sound quality is not amazing. More noteworthy is that I find them comfortable to wear for any length of time. They also maintain a good fit to the ear by themselves, twisting only slightly out of the optimal position when not held manually. (The old buds were rubbish in both these regards.) Quantitatively speaking they are fine value for the money (they don’t cost much!) but not a brilliant shopping choice.

There is one thing about them however that I have not seen remarked on anywhere else, which makes me not regret this purchase at all.

Maybe it is only owing to a peculiarity of my ears, but somehow the EarPods manage to imbue the low bass range with that subterranean quality of a great bass listening experience on large high-fidelity speakers.

I have never experienced headphones manage to reproduce this before. Circumaural speakers tend to make that bass range sound purely ærial; intra-aural, sealing buds tend to jackhammer it directly against the ear drums; non-sealing buds (of which I have only used cheap ones, admittedly) lack almost all punch. The EarPods somehow manage to drive the bottom end of music with respectable oomph while at the same time being subtle and understated about it.

They aren’t closed, so noisy environments will drown out their bass delivery efforts. But that seeming weakness yields a great upside: it is very comfortable for me to turn the volume up loud and keep it there for quite a stretch without ever getting fatigued by a relentless onslaught of bass – even though it is anything but weak or tinny. Part of that is also the consistently open and transparent sound at any volume level.

Their mediocre crispness at the top end can be distracting when you pay attention, however.

All in all, I am enjoying these as a workhorse set I can pop in to keep myself happy while preoccupied.

In conclusion: buy these not for the exceptional quality they are advertised for, but for their great comfort.

October 18, 2012

Glasnost Lives!, or: All Nations Under The Source, or: Linux

Alan Cox:

If you look at Linux contributions they come from everywhere. The core of the network routing code was written by Russians […] who worked at a nuclear research institute […]. We have code from government projects, from educational projects (some of which are in effect state funded), from businesses, from volunteers, from a wide variety of non profit causes. Today you can boot a box running Russian-based network code with an NSA-written ethernet driver.

October 08, 2012

Black

David Hill, of ThinkPad design fame:

It’s the color of power. It’s the color of death. It’s the color of sex. It’s the color of so many different things.

October 04, 2012

Entropy and Monitoring Systems

I use munin for monitoring various aspects of my servers, and one of the things munin will monitor for me is the amount of entropy available. On both my current server and my previous one I’ve noticed something unusual here:

According to munin, I’m almost perpetually running out of entropy. Munin monitors the available entropy by checking the value of /proc/sys/kernel/random/entropy_avail, which is the standard way you’d check it. My machine has several VMs running, and hosts a few services that use entropy at various times (imaps, ssmtp or smtp+tls, ssh, https), so it’s not unreasonable that I may have been entropy starved. If my entropy levels are always around the 160 mark, it’s likely that at any given time I’m totally starved of entropy, so anything using encryption will stall a bit.

I had a brief look into various entropy sources, such as timer_entropyd or haveged, but none of them seemed to help. I’d seen several references to Simtec’s entropykey, which looked very promising, so I ordered one from the UK, which arrived a week or so ago.

I’ve yet to arrange a trip to the datacentre to install it however, and after a bit of poking round today I’m not so sure it’s as desperately needed as I thought.

I randomly checked on the contents of /proc/sys/kernel/random/entropy_avail, just to see what it was like. There were over 3000 bits of entropy present. Very odd. I repeated this several times, and watched the available entropy decrease from over 3000 down to around 150 or so, the same as in my munin graph above. I repeated this about a quarter of an hour later, with the same results – over 3000 entropy, rapidly decreasing to very little.

After a bit of further digging, I found this blog post, which mentioned that creating a process uses a small amount of entropy. The author of that post was seeing problems with his entropy pool not staying full, which sounds like what I was seeing. I’m still not clear on what requires entropy though, as some of my systems at work clearly don’t deplete the entropy pool during process creation.

So, I did some different monitoring: Check the value of entropy_avail every minute, through a different script. The graph below shows the results:

Clearly, entropy is normally very good, but is dropping down to very low levels every 5 minutes. It replenishes just fine in the intervening 5 minutes however, which suggests that I don’t really have a problem with entropy creation, just with using it too quickly.
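The once-a-minute check described above could be as small as the following. This is a sketch rather than the script I actually ran: the function names and the log path in the usage note are made up, and the counter is read with the shell's builtin `read` so that the read itself does not start another process (the timestamp still comes from `date`, and the pause from `sleep`).

```shell
#!/bin/bash
# Print one "<unix-timestamp> <value>" sample from an entropy counter file.
# The path is a parameter only so the function can be tried on an ordinary
# file; by default it reads the kernel's counter.
sample_entropy() {
  local source=${1:-/proc/sys/kernel/random/entropy_avail}
  local value
  read -r value < "$source"   # shell builtin, no extra process for the read
  printf '%s %s\n' "$(date +%s)" "$value"
}

# Take COUNT samples, sleeping INTERVAL seconds between them.
# Usage: sample_loop COUNT INTERVAL [SOURCE_FILE]
sample_loop() {
  local count=$1 interval=$2 source=${3:-} i=0
  while [ "$i" -lt "$count" ]; do
    sample_entropy "$source"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
```

Something like `sample_loop 60 60 >> /tmp/entropy.log` would then collect an hour of samples from one long-lived process instead of a fresh batch of processes per measurement.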

As for the question, “why is my entropy running out so fast?”, the answer is quite simple: Munin. On my host machine, munin runs around 50 plugins, each of which generally calls other processes such as grep, awk, sed, tr, etc. I don’t have exact figures on how many processes were being kicked off every 5 minutes, but I wouldn’t be surprised to find it was hundreds, all of which used a little bit of entropy.

I’ll still install the EntropyKey, and maybe it’ll help my pool recover quicker.

September 15, 2012

Software Freedom Day 2012 - ULUG - Great event

I was feeling a bit disappointed that I couldn’t organize a Software Freedom Day here in Stockholm for the 5th time running because of a very hectic schedule this year. Then I was invited to attend the Uppsala event today. The local Linux User’s Group ULUG had for the second year running organized [...]

September 14, 2012

Software Freedom Day 2012 - ULUG

Spending the day with the ULUG in Uppsala this year. /Roger

August 19, 2012

Tweet on, tweeter

Twitter effectively say quoting a tweet on one’s site as a plain quotation is henceforth outlawed. Idiotic. I doubt they have a legal leg to stand on anyway, but that they would even want to do this is galling just the same. Even more galling to me is that by all I can tell, it appears that even my own tweets would technically be subject to these limitations if I myself chose to quote them elsewhere.

It’s not like I was very active on Twitter in recent times, but this move has completely soured me on the service.

When Twitter killed the ability to see all @-replies from your followees in your stream, even those to people you didn’t yourself follow, my enthusiasm dropped off a cliff. Remember that? As far as I’m concerned, that is when communal Twitter died. A lot of people quit in a huff. I stuck around, though the place was never the same again. Next, the client I was using (which was effectively unmaintained by then but had kept working) fell over dead when Twitter made OAuth a requirement. I never found a replacement both lightweight and inoffensive enough. (On Linux you could find either or, but not both. I have not tried again in a while.) So I’ve stuck around by using just the site, only poking in every once in a while because the site is not a convenient persistent client. My vague intention was to one day make a serious effort to find a new client and get back into it.

So much for that.

And their presumption in wanting to dictate to the world what they are allowed to do with, err, 140 characters of plain text makes me want to neither read nor write anything on Twitter any more.

So I am washing my hands of it.

Update: Hah! Ha ha. Not the reason I am irritated per se, but illustrative nonetheless.

July 08, 2012

Code that counts

Tom DeMarco:

My early metrics book, Controlling Software Projects: Management, Measurement, and Estimation (Prentice Hall/Yourdon Press, 1982), played a role in the way many budding software engineers quantified work and planned their projects. […] The book’s most quoted line is its first sentence: “You can’t control what you can’t measure.” This line contains a real truth, but I’ve become increasingly uncomfortable with my use of it.

Implicit in the quote (and indeed in the book’s title) is that control is an important aspect, maybe the most important, of any software project. But it isn’t. Many projects have proceeded without much control but managed to produce wonderful products such as Google Earth or Wikipedia.

To understand control’s real role, you need to distinguish between two drastically different kinds of projects:

  • Project A will eventually cost about a million dollars and produce value of around $1.1 million.

  • Project B will eventually cost about a million dollars and produce value of more than $50 million.

What’s immediately apparent is that control is really important for Project A but almost not at all important for Project B. This leads us to the odd conclusion that strict control is something that matters a lot on relatively useless projects and much less on useful projects. It suggests that the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value.

June 29, 2012

Title of Record

Londoners take their titles very seriously. Filling in my name on the TFL's web site's "Contact Us" form, my options for Title are:

  • Ms
  • Mr
  • Mrs
  • Miss
  • Dr
  • Cllr
  • Prof
  • Sir
  • Not given
  • Air Cdre
  • Ambassador
  • Baron
  • Baroness
  • Brig Gen
  • Brother
  • Canon
  • Captain
  • Cardinal
  • Cllr Dr
  • Colonel
  • Commander
  • Count
  • Countess
  • Dame
  • Dowager Lady
  • Duchess of
  • Duke
  • Earl
  • Empress
  • Father
  • Fleet Admin
  • Gen
  • Gp Capt
  • Hon
  • Hon Mrs
  • HRH
  • Imam
  • Judge
  • Lady
  • Laird
  • Lieut Colonel
  • Lieutenant
  • Lord
  • Madam
  • Major
  • Major General
  • Marchioness
  • Marquess
  • Mayor
  • Pastor
  • Pc
  • Prince
  • Princess
  • Rabbi
  • Rev
  • Rev Dr
  • Revd Canon
  • Rt Hon
  • Rt Hon Baroness
  • Rt Revd
  • Sergeant
  • Sheikh
  • Sister
  • Sqn Ldr
  • Viscount
  • Viscountess
  • Wg Cd
  • Other

They list HRH, but not HM?  Surely, it's not unreasonable to assume that the Queen has complaints about service on the Underground?

 

May 31, 2012

Fedora 17 released

Another great day has come, Fedora 17 has been released. Get your copy now and enjoy. http://fedoraproject.org/ /Roger

Magic

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. […]

Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

[…]

Not all is delight, however […] One must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn’t work. Human beings are not accustomed to being perfect, and few areas of human activity demand it.

Fred Brooks, The Mythical Man-Month

There I am, tattered-robed old man standing alone, beard whipping in the wind, with a distant stare (slightly mad), intently murmuring an unintelligible ramble under his breath… until something happens. To read this passage the first time was an arresting moment of revelation of a truth I had known without knowing, all along.

May 21, 2012

Fixing your twitter being hacked

Often people are sending out junk tweets or direct messages from their account and they are not sure how to stop them. For example the latest ones are direct messages like "Hi someone is posting nasty rumors about you..."  or "Hi some person is posting nasty things about you..." or "Hi somebody is saying really bad things about you..." or sending out status updates such as "I lost weight without having to make any major diet changes while boosting energy levels, heres how: http://media-channel-8.com"

The reason for this is that your Twitter account has been compromised. This was probably due to you clicking on a bad direct message or going to a malicious website.

To fix this go into https://twitter.com/settings/applications and revoke every single application. These are web applications that you have given permission to at some time to use your Twitter account - all those applications listed can go and post spam if they are malicious. Don't worry about deleting ones you use - any legitimate application will ask for permissions the next time you go to use them.

After you have revoked the permissions go and change your Twitter password. If you have used this password elsewhere and you could have entered your Twitter password into a fake website then you must now go and change all the accounts that used the same password. When making new passwords consider using a totally different password for any financial websites.

To stop this happening in future always check on URLs before clicking on them from any website. If you can't do this as they are short URLs that don't supply the whole link (e.g. bit.ly, t.co) then right click on the link and select "open link in incognito mode" or similar such as private mode. This will then open them in another more secure browser window that will not have any of your logged in websites available to any potential malicious sites. Also make sure to read the URL properly every time you login or supply personal information. e.g. the websites on these direct messages from the compromised account show as twititre.com rather than twitter.com
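If you want to check a link mechanically rather than by eye, the host name comparison can be sketched in a few lines of shell. Both function names are made up for illustration, and the check is deliberately simple: it compares text exactly and does not normalise upper and lower case.

```shell
#!/bin/bash
# Extract the host name from a URL using only shell parameter expansion.
url_host() {
  local rest=${1#*://}   # drop the scheme, e.g. "https://"
  rest=${rest%%/*}       # drop the path
  rest=${rest%%\?*}      # drop a query string that follows the host directly
  rest=${rest##*@}       # drop any "user@" prefix
  rest=${rest%%:*}       # drop a port number
  printf '%s\n' "$rest"
}

# Succeed only if the URL's host is the expected domain or a subdomain of it.
host_matches() {
  local host
  host=$(url_host "$1")
  case $host in
    "$2" | *."$2") return 0 ;;
    *) return 1 ;;
  esac
}

if host_matches "http://twititre.com/login" twitter.com; then
  echo "looks like twitter.com"
else
  echo "NOT twitter.com: $(url_host "http://twititre.com/login")"
fi
```

The look-alike domain from the direct messages fails the check, while the real settings page at https://twitter.com/settings/applications passes.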

May 05, 2012

An orphan olive branch to Mercurial

Git repository browsers have universally awful graph drawing algorithms.

For the longest time, one of my repositories has had two main branches, master and release. For a release, I would git merge --no-ff master into release. (Using --no-ff forces a commit on release even if release could be fast-forwarded to the current state of master. That way the act of cutting a release is always recorded in the repository.) Development happens on master, sometimes on branches. Topic branches are rebased before merging them back to master, once again using the --no-ff switch to record that a certain stretch of commits belonged to one topic together.

Essentially, this is a two-track history, with occasional short parallel side tracks on one side:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master
      \   \           \                    \   \       \   \
-------o---o-----------o--------------------o---o-------o---o    release

You would think that this would be easy to draw in a sane way.

And most of the time it is. But sometimes repository browsers decide to draw release on the other side of master. And as it happens, sometimes a topic falls by the wayside for a while. When these conditions coincide, drawing the stray heads from these topic branches and at the same time drawing release in such a way that the merge direction (from master into release) is correct suddenly requires snaking each release commit around all the previous ones. The result is a marshalling yard of parallel tracks (which I will not try to give an ASCII diagram of…) for representing what in reality is a very simple history. That makes it very difficult to make heads or tails of what really happened in the repository: a whole Black Forest out of just two trees.

There are some ordinary options to suppress this. The most obvious one would be to do a fast-forward merge of release back into master before picking up again. Doing so yields a triangular structure like this:

                           o--o--o--o
                          /          \
-o---o   o   o---o---o   /------------o---o---o   o   o---o   o---o  master
      \ / \ /         \ /                      \ / \ /     \ / \
-------o---o-----------o------------------------o---o-------o---o    release

Here there are no parallel tracks: the only unbroken track is the release branch, so no matter when and how any algorithm tries to draw this graph, it will be forced to string the commits into short side tracks alongside the release track. There is no likely way to turn this into a funhouse of illusory complexity.

Any solution that merges release into master in any way will have a very annoying drawback, however: you can no longer read the history of master without getting all of the release merges interspersed into it. This is all the worse if you never gave those merge commit messages much thought, because that means the history of release by itself consists of nothing but an endless row of “Merge 'master' into release”. And if that was not bad enough by itself, it gets really irritating during periods when most commits are released immediately: the noise takes up a major part of your commit log.

Then an epiphany disrupted my long-standing dissatisfaction with the situation.

This is what the history in my repository looks like now:

                       o--o--o--o
                      /          \
-o---o---o---o---o---o------------o---o---o---o---o---o---o---o  master

-------o---o-----------o--------------------o---o-------o---o    release

That’s right: no merges.

Yet again, release is a single unbroken track. But now so is master. And since the branches are unconnected, it is never necessary to arrange them relative to each other, so they will always be drawn properly. And the master commit log remains clean and readable.

What I have done is make release an orphan branch that shares no history with master (created with git checkout --orphan). To cut a release, I check out release, then I get the tree from the commit I want to release and put that in a new commit on release. Obviously with this scheme I need to manually record the commit ID somewhere to be able to know what state of master a particular release corresponded to – there is no longer merge metadata to keep track of that. The commit message seems a natural place to record that information. I need to construct one in any case since Git does not know how to provide a default message for these commits like it does when merging a branch. Of course, the extended commit message is also a good place to put a list of commits that are hitching a ride on this release. I decided to put a release version (in my case, a simple incrementing integer) in the commit message subject as well, to make it easy to refer to a particular release.

Needless to say, I have the process automated. This is my release script:

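#!/bin/bash
set -e
# Commit to release: the first argument, or master by default.
commit=$(git rev-parse "${1-master}")
# The subject of the latest release commit reads "<num> @ <commit>".
read num junk oldcommit <<<"$(git log --no-walk --format=%s release --)"
(
  # New subject: the next release number and the released commit ID.
  printf '%d @ %s\n\n' $((++num)) "$commit"
  # Body: the commits that are new since the previous release.
  git log --reverse --oneline --abbrev=12 --no-decorate --no-color "$oldcommit..$commit"
) \
| git commit-tree "$commit^{tree}" -p release \
| ( read new ; git update-ref refs/heads/release "$new" )
git push -f origin master release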

Aside from the hard linkage by commit ID you also get a soft correlation by commit date if you ask git log and friends to use --date-order. This is sufficient for routine development work. Note that since the commit IDs are recorded, it is possible to use grafts to retrospectively (possibly temporarily) make the orphan release branch look like a merge-based branch.

A nice aspect of doing things this way is how easy it is to get a full diff of the total change represented by a release. With a merge-based release branch it takes fiddling to ask for that diff and enough knowledge to know how to.

And so I seem to have arrived at a poor (technically awkward, functionally very limited) reinvention of Mercurial’s named branches, using the plumbing provided by Git. This may be the only true use case for named branches that I can think of.

Update: I’ve rewritten the script to use lower-level plumbing. It no longer even checks out the tree, it just directly creates a commit object based on the tree object of the released commit.

May 02, 2012

D’uh

I recently discovered the -h switch of GNU sort, added in the coreutils 7.5 release from Aug 20, 2009. With this switch, sort will do a numeric sort of human-readable size numbers, i.e. it will accept “42M” and “1.3G” as numbers and put them in the right order. This led to the following shell one-liner in my ~/bin:

#!/bin/bash
exec du "${@--xd1}" -h | sort -h

It invokes du to print the disk space consumption of a directory tree, then sorts its output by size. If you pass any switches they will be passed on to du, else it will default to -xd1 (-x = stay on one filesystem, do not cross mountpoints; -d1 = do not print directories deeper than 1 level).

I gave this script the only name it could have – obviously, duh.
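
To illustrate the ordering that sort -h produces, here is a minimal Python sketch. It is not how GNU sort is implemented: the human_size_key helper is a hypothetical name, and converting every value to bytes with 1024-based suffixes is an assumption made purely for illustration.

```python
import re

# Suffixes in increasing order of magnitude (assumed to be powers of 1024).
UNITS = "KMGTPEZY"

def human_size_key(text):
    """Turn a size such as '42M' or '1.3G' into a number of bytes for sorting."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)([KMGTPEZY]?)\s*", text)
    if match is None:
        raise ValueError(f"not a human-readable size: {text!r}")
    number, suffix = match.groups()
    exponent = UNITS.index(suffix) + 1 if suffix else 0
    return float(number) * 1024 ** exponent

sizes = ["1.3G", "42M", "512", "8.0K"]
print(sorted(sizes, key=human_size_key))  # ['512', '8.0K', '42M', '1.3G']
```

A plain numeric sort would put 1.3G first, because it only looks at the leading number.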

Update: turns out that the -d switch of du is even newer than sort’s -h switch. It was added for compatibility with FreeBSD in the coreutils 8.6 release from Oct 15, 2010 – prior to that it had to be spelled --max-depth, which rather complicates matters. You would have to do this:

#!/bin/bash
DEFAULT=(-x --max-depth=1)
exec du "${@-${DEFAULT[@]}}" -h | sort -h

That’ll win neither beauty nor concision contests.

April 29, 2012

e-legitimation - BankID and Linux - Mobile BankID

I have been a big fan of fribid.se for what they have done for BankID on Linux.
But when I went to create a new e-ID with fsb, I ran into problems again.

Now, however, there is a Mobile BankID option that can be ordered through the fsb website from a Linux computer, in my case Fedora 15.
Mobile BankID

After that, install the BankID app on your Android phone. Then, when you log in with BankID at, for example, Skatteverket, you use your phone with your e-ID and BankID instead of your Linux box. I know this is only a workaround and that of course everything should work on Linux. But it may help those of you who, like me, are sitting up late one evening, simply have to file your tax return, and need a quick fix to get it done.

/Roger

April 16, 2012

In which I write about PHP for the first and the last time

Tim Bray wrote a short piece on PHP and kicked up a huge hullabaloo in the land of weblogs. Here’s my contribution to the echolalia.

Tim writes that it’s his experience that systems written in PHP are all spaghetti. I don’t think that’s a coincidence, and there are two sides to that coin.

One side, which all the PHP apologists are citing with full justification, is that its nonchalant everything-but-the-kitchen-sink “standard” library approach and its wide deployment present such a low barrier to writing and re-/deploying code that a lot of people who only have small needs are empowered to meet them on their own, however messily. I have argued that this enabling function is a good thing and I stand steadfastly by that position.

But on the other side, well, PHP is… lousy. Just wretched. Why?

  • The language – and here I’m talking only about the core, that is, syntax, type system, object orientation support, scoping rules, and the like – is limited and haphazard:

    Haphazard, because it was never designed in any fashion: it grew out of a templating system with hodgepodge constructs for which orthogonality was only an afterthought.

    Limited, because while it’s all dynamically typed and garbage collected, it squanders most of that advantage by limiting itself to the expressive power of C, roughly. Anonymous functions are awkward to create, and I’m not sure closures are possible in any practical fashion at all. Lists are second-class citizens, always bound to arrays. Attempts to introspect end up looking comical. The standard code modularization mechanisms (include, require) are simple-minded textual inclusions.

  • This is my big complaint: all the APIs are execrable. It starts with the built-in stuff: try to do something with the image functions or the zip file functions – I never figured out a way to avoid making my code look ugly. With the built-in library setting a bad example, it’s no surprise that the same issue extends to the packages available from the PEAR: awkward is the rule.

    In my opinion, that is what makes me and a lot of other people feel that PHP code resists being made clean. I feel the same way when I’m forced to use Tk in Perl: the API is so misshapen that you just can’t make your own code sitting on top of it look pretty.

  • The easy, obvious way to do things is often the incorrect one.

    There are lots of tutorials which will either not tell you to quote user input before interpolating it into SQL statements at all, or tell you to use addslashes for the purpose. In either case you are open to injection attacks – either gaping wide or just wide. What you really should do is use a function that respects the particular SQL dialect, such as mysql_escape_string… no wait, that’s dead code, I mean mysql_real_escape_string. Bleurgh. And once you’ve found out all that and understood it, properly quoting user input is still a pain in the bottocks and requires much more code than not bothering. Guess what casual coders will do? Now contrast Perl’s DBI, where using bind parameters is just as easy as not; and in fact, makes the code easier to read.

    How about working with strings in an encoding-aware fashion? That means nothing short of rolling your own string munging, with “help” from some typically byzantine APIs – what fun! Which novice is going to know that they should? Who is going to bother? How many of them will get it right?
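
The point about bind parameters can be shown in a few lines. This is a sketch in Python with the standard sqlite3 module, swapped in for illustration in place of PHP or Perl's DBI; the users table and the sample input are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Hostile input of the kind a web form might deliver.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is interpolated into the SQL text and changes its meaning.
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input travels as a bind parameter and is only ever treated as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2: the injected condition matched every row
print(safe_rows)         # []: no user has that literal name
```

The safe version is no longer than the unsafe one, which is the property the easy path in a language ought to have.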

All of these flaws are interconnected; the morass is simply the result of the language being a templating system that grew too big for its breeches. I don’t believe the problems can be corrected in any sensible fashion; PHP will always be a templating system, however much it may be straining against its clothes.

And let me tell you, it’s still a great templating system! If all you need is to write a web app that consists of two pages, running four queries over a five-table database, there is nothing that will get you up and running faster.

But that doesn’t make it suitable for large-scale systems. It’s not that the premise does not scale, it’s just that this particular implementation of the premise does not. Apologists will sometimes argue that the flaws are a necessary evil in achieving the low barrier to entry; worse-is-better style. I don’t buy that argument for a second. There is no reason that a language could not be designed to address the precise problem space that PHP aims at, but be created from scratch to be big enough for its britches, without the slipshod, organic growth. There is no reason it would have to be any harder to get things done with a standard library that encourages good practices as the obvious and easy way to accomplish things.

PHP is ripe for having its lunch eaten, really.

Update: Eevee rants about it, comprehensively.

April 08, 2012

Building for Amazon

Recently there has been a bit of comment from a couple of people around an article that appeared about using Amazon Web Services (AWS).

To be honest I'm more than a bit annoyed about how they were written up. The article made it sound like I'd given a talk about my use of AWS at work, or had helped write an article. I hadn't done either, but was part of a panel discussion around cloud. I do note that one of the other panellists was similarly written up too... Most of the article came from an answer to a question about what was good and what was bad about Amazon, with other bits picked out from other things. Some of it was not quite sensationalised, and I didn't know the article was happening, nor was I given a chance to review it (something that has always happened before, and I've spoken at around 20 events).

So first of all I want to go on the record and say I am a strong supporter of Amazon. They, Google and Salesforce are the companies who have done more than anything else to push cloud forward. Also anybody else who has heard me speak knows I am a strong proponent of AWS. Going forward though I think I'll just refuse to talk about any areas of improvement needed, but focus on the strengths. I have been in communication with AWS staff to tell them my view hasn't changed and I back them 100%.

So I'd now like to focus on what I was intending to convey, and did convey but wasn't reported, at the panel discussion. Building a system on Amazon does not mean that you stop making proper design decisions. Some people have assumed that because Amazon is such a great company they can forget about system architecture and everything will be fine. You can't. You wouldn't build your physical machines or VMware and not do backup, do no performance tuning, and have no redundancy. Most of what people use Amazon for is running machines (they do many other things too, like a CDN and NoSQL on demand etc). So you need to design your systems. I learnt this many years ago.

When it comes to performance you can't necessarily throw everything at it on AWS like you can with a traditional architecture, as you don't know the full underlying design and you can't fix your bottleneck in a traditional way. But running your on-prem architecture in this way can cost you absolutely millions. There are numerous cases of media organisations, life science companies etc doing large batch processing for hundreds or thousands of dollars that they just couldn't do cost effectively before, and doing it very quickly too. What you do need to do for AWS though is try to parallelise your workload wherever possible, as Amazon works well in this model. You can also vary your machine size as you need to - and now that Amazon allows 64-bit machines for all size types this is even easier, and saves more money. So you can go all the way from a free Micro instance up to new Sandy Bridge based machines without rebuilding your image.

Make sure with Amazon that you have machines running in another area as redundancy and that you have a way of activating them. You wouldn't run your own data centres without redundancy, so why do it differently on Amazon? The "horror" stories about businesses that struggled when an Amazon Availability Zone (AZ) went down are really horror stories about bad architects in my mind.

Backup. Yes, that is a command! Any IT exec who doesn't ensure that their data is backed up needs to be shot. Why should this be any different on Amazon? Firstly people had problems as they didn't realise that Amazon can lose changes when you do some kinds of restarts (as config runs in RAM in effect), then they didn't realise that only backing up in the data centre was a bad idea (EBS backing for the EC2 instance). Amazon do make it easy here as S3 allows you to do quick, cost efficient backups - how many other services are designed for 11 9s - that is 99.999999999%? Again, if you get data loss it is not Amazon at fault, but your architects.

Amazon do status reporting online here. Do you get that from your other vendors, or internally in your own IT function? AWS are to be applauded for their transparency. The one corruption event I referred to was documented on the status page (an EBS fault occurred). It should be noted that no data was lost at all from this as it was failed over to another system. I have had on-prem failures in the past where I have lost data or taken a long time to recover. This incident was all sorted in less than an hour - in many cases AWS makes it easier to build solutions like this, so that such a problem does not have a high impact.

So in short would I go with Amazon AWS again? Absolutely!! I have never had any significant downtime with them in my roles, and it has saved money and been extremely flexible.

NB This is all my own opinion, and not that of my employer - something I also said at the panel discussion but was also omitted.

March 28, 2012

Bug of the week

Lukas Mai:

The following code is somewhat silly, but gcc should either compile it correctly or print an error message, not generate invalid asm[:]

int $1 = -1;
int main(void) { $1++; return $1; }

Assembler injection attacks, here we come!

March 25, 2012

Six Stages of Debugging

  1. That can’t happen.

  2. That doesn’t happen on my machine.

  3. That shouldn’t happen.

  4. Why does that happen?

  5. Oh, I see.

  6. How did that ever work?

[This is not mine. Its oldest mention I could track down on the web appeared on a now-defunct weblog. I am posting it in the interest of personal archival.]

March 19, 2012

Shoestring & bubblegum sound server

In which I beat MacGyver.

I recently had need to play sound on a headless Linux machine. I started looking into sound servers, but everything I found seemed a significant amount of work to set up. I tried to reduce the problem to the fundamental parts involved, and by a trail of hints winding through a narrow mountain pass arrived at a rather… minimalist solution to fit my minimalist needs. I did not require anything else than to be able to hear sound at all and the solution did not require anything else of me than ALSA – and it’s hard to install a Linux machine without ALSA these days.

The entirety of the charade amounts to this:

  1. On the speakerless machine, load the loopback ALSA driver:

    modprobe snd-aloop index=0 pcm_substreams=1

    The driver provides a card with two sound devices, and when sound is output onto a stream on one device then the driver mirrors that as an input available on the same stream on the other device.

  2. Configure sound with an .asoundrc like this:

    pcm.!default {
      type dmix
      slave.pcm "hw:Loopback,0,0"
    }
    pcm.loop {
      type plug
      slave.pcm "hw:Loopback,1,0"
    }

    This has programs default to outputting sound to stream 0 of device 0 of the loopback (pseudo-)card, and has ALSA mixing their outputs together (type dmix). The loopback driver will make the resulting sound available for sampling via stream 0 of device 1, for which the configuration sets up another source called loop as a simple alias (type plug).

  3. On the machine with speakers you can then do this:

    ssh -C speakerless sox -q -t alsa loop -t wav -b 24 -r 48k - | play -q -

    The -t alsa switch configures sox’s input to use the ALSA type, and the loop argument (in the position where a filename is normally given) gives the name of the source – the loop alias from the configuration above. The rest of the switches tell sox to output 24-bit, 48 kHz WAV to standard output, to be picked up by ssh.

  4. Now play something on the speakerless machine.

This will push a constant stream of sample data down the wire, even during silence; with SSH compression enabled as it is here, that will come to something like 4 KB/sec and will very slightly busy the CPU on both machines. Both resource drains stop if you break the SSH connection. You can do so at any time without sound-playing programs on the speakerless machine ever noticing.
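
For a sense of scale, the uncompressed rate of that stream can be worked out from the sox switches. This is a back-of-the-envelope sketch; the channel count is an assumption (stereo), since the command does not state it.

```python
SAMPLE_RATE_HZ = 48_000   # from -r 48k
BITS_PER_SAMPLE = 24      # from -b 24
CHANNELS = 2              # assumption: stereo; not stated in the command

# Bytes per second of raw sample data before SSH compression.
raw_bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
print(raw_bytes_per_second)  # 288000
```

Under that assumption the raw stream is roughly 288 KB/sec, so the -C switch is what brings the traffic down to the few KB/sec mentioned above.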

The one and only real drawback is a playback latency of a few fractions of a second – enough to be noticeably not in real time.

But as I said, I had minimal needs of it.

[Update: added explanations.]

March 16, 2012

Kindle Reading Stats

I’ve written before about my initial investigations into the Kindle, and I’ve learnt much more about the software and how it communicates with the Amazon servers since then, but it all requires detailed technical explanation which I can never seem to find the motivation to write down. Extracting reading data out of the system log files is however comparatively simple.

I’m a big fan of measurement and data so my motivation and goal for the Kindle log files was to see if I could extract some useful information about my Kindle use and reading patterns. In particular, I’m interested in tracking my pace of reading, and how much time I spend reading over time.

You’ll recall from the previous post that the Kindle keeps a fairly detailed syslog containing many events, including power state changes, and changes in the “Booklet” software system including opening and closing books and position information. You can eyeball any one of those logfiles and understand what is going on fairly quickly, so the analysis scripts are at the core just a set of regexps to extract the relevant lines and a small bit of logic to link them together and calculate time spent in each state/book.

You can find the scripts on Github: https://github.com/mattbnz/kindle-utils

Of course, they’re not quite that simple. The Kindle doesn’t seem to have a proper hardware clock (or mine has a broken hardware clock). My Kindle comes back from every reboot thinking it’s either at the epoch or somewhere in the middle of 2010. The time doesn’t get corrected until it can find a network connection and ping an Amazon server for an update, so if you have the network disabled it might be many days or weeks of reading before the system time is updated to reality. Once it has a network connection it uses the MCC reported by the 3G modem to infer what timezone it should be in, and switches the system clock to local time. Unfortunately the log entries all look like this:


110703:193542 cvm[7908]: I TimezoneService:MCCChanged:mcc=310,old=GB,new=US:
110703:193542 cvm[7908]: I TimezoneService:TimeZoneChange:offset=-25200,zone=America/Los_Angeles,country=US:
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=localTimeOffsetChanged,arg0=-25200,arg1=1309689302:
110703:193542 cvm[7908]: I TimezoneService:LTOChanged:time=1309689302000,lto=-25200000:
110703:183542 system: I wancontrol:pc:processing "pppstart"
110703:193542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wan,name=dataStateChanged,arg0=2,arg1=:
110703:183542 cvm[7908]: I ConnectionService:LipcEventArrived:source=com.lab126.cmd,name=intfPropertiesChanged,arg0=,arg1=wan:
110703:183542 cvm[7908]: W ConnectionService:UnhandledLipcEvent:event=intfPropertiesChanged:
110703:193542 wifid[2486]: I wmgr:event:handleWpasupNotify(<2>CTRL-EVENT-DISCONNECTED), state=Searching:
110703:113542 wifid[2486]: I spectator:conn-assoc-fail:t=374931.469106, bssid=00:00:00:00:00:00:
110703:113542 wifid[2486]: I sysev:dispatch:code=Conn failed:
110703:183542 cvm[7908]: I LipcService:EventArrived:source=com.lab126.wifid,name=cmConnectionFailed,arg0=Failed to connect to WiFi network,arg1=:

Notice how there is no timezone information associated with the date/time information on each line. Worse still the different daemons are logging in at least 3 different timezones/DST offsets all interspersed within the same logfile. Argh!!
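
To make the problem concrete, here is a minimal Python sketch that parses lines in the format shown above and lists the distinct wall-clock hours they carry. It is not taken from the kindle-utils scripts; the LINE_RE pattern and the parse_line helper are hypothetical names, and the field layout is inferred only from the sample lines.

```python
import re
from datetime import datetime

# YYMMDD:HHMMSS process[pid]: LEVEL message   (the pid is optional)
LINE_RE = re.compile(
    r"^(?P<stamp>\d{6}:\d{6}) "
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<level>[A-Z]) (?P<message>.*)$"
)

def parse_line(line):
    """Split one log line into its fields; the timestamp stays timezone-naive."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "when": datetime.strptime(match["stamp"], "%y%m%d:%H%M%S"),
        "process": match["process"],
        "pid": int(match["pid"]) if match["pid"] else None,
        "level": match["level"],
        "message": match["message"],
    }

SAMPLE = [
    '110703:193542 cvm[7908]: I TimezoneService:MCCChanged:mcc=310,old=GB,new=US:',
    '110703:183542 system: I wancontrol:pc:processing "pppstart"',
    '110703:113542 wifid[2486]: I sysev:dispatch:code=Conn failed:',
]

entries = [parse_line(line) for line in SAMPLE]
hours_seen = sorted({entry["when"].hour for entry in entries})
print(hours_seen)  # [11, 18, 19]
```

The three lines share the same minute and second but land on three different hours, and nothing in the line itself says which offset applies.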

So our simple script that just extracts a few regexps and links them together nearly doubles in size to handle the various time and date convolutions that the logs present. Really, the world should just use UTC everywhere. Life would be so much simpler.

The end result is a script that spits out information like:

B000FC1PJI: Quicksilver: Read 1 times. Last Finished: Fri Mar 16 18:30:57 2012
- Tue Feb 21 11:06:24 2012 => Fri Mar 16 18:30:57 2012. Reading time 19 hours, 29 mins (p9 => p914)

...

Read 51 books in total. 9 days, 2 hours, 29 mins of reading time

I haven’t got to the point of actually calculating reading pace yet, but the necessary data is all there and I find the overall reading time stats interesting enough for now.

If you have a jailbroken Kindle, I’d love for you to have a play and let me know what you think. You’ll probably find logs going back at least 2-3 weeks still on your Kindle to start with, and you can use the fetch-logs script to regularly pull them down to more permanent storage if you desire.

February 09, 2012

Pingdom - nice world wide monitoring tool.

Worth looking into.

These are two checks on my website... my own server. And yes, I know the response times aren’t amazing. And yes, we had a power outage the other day and my server wasn’t able to reconnect to the network for some reason.

Uptime for www: Last 30 days

Response time for www: Last 30 days

/Roger

January 23, 2012

Cloud computing and location of data

Disclaimer: I'm not a lawyer so this is not legal advice, and these views do not represent my employer's views either.

One of the big elephants in the room with cloud computing is the location of data. People are naturally worried about whether their data is accessible by others or not. Some providers will tell you the location of the data, some will not. There are also the issues of the Patriot Act and safe harbour when interacting with technology providers across the Atlantic.

The Patriot Act requires a US based corporation to hand over data to the government, and they do not have to disclose it to the end customer either if they are a service provider. As far as I can understand you are not protected any further even if the data is in the EU or another region. The defining requirement is whether they are a US based company.

One thing that is mentioned often is safe harbour. Basically what safe harbour means is that the US based provider will adhere to the same standards as the EU requires. This is because US data protection is basically non-existent. The safe harbour provisions do NOT mean your data will reside in the EU; they just mean that it will be protected to the same standard as the EU.


Of course none of this matters if you work for a global corporation headquartered in the USA anyway, as then you are required to hand data over to the government if requested under the Patriot Act as I read it. The difference is whether you know that the government is accessing your data. The government could request your data from you, but may not need to if they go to your cloud supplier who is also a US based corporation.


It is common sense that if you have sensitive data that you encrypt it, whether you store it in the cloud or on premises. This is especially important for data such as customer or employee data that would cause damage - either real financial loss or damage to reputation.

I believe that there will be a rise in cloud encryption services e.g. GPG type plugins for Gmail. Already Amazon has a service for S3 called Server-Side Encryption. With this service you give your private keys to Amazon to seamlessly encrypt/decrypt data on the fly. However, what this means is that Amazon could give your data to the US government without your knowledge under the Patriot Act, since it holds the keys as well as the encrypted data. As such, in my mind the only reason that anybody would use this would be for low value data, and I would not consider an encryption service for email where the vendor controls the private keys.

One aspect that people often overlook is not so much government regulations, but their own rules. What do your customer policies say, and what do your own staff policies say? For example your HR policy may say that all personnel data will be stored in the UK, or your customer terms and conditions might say that all data will be stored in the EU. Many cloud services might not be based in the EU and there are very, very few in the UK. There can also be obscure regulations specific to your industry e.g. in a previous role the code master had to be in the UK as cryptographic code was considered a weapon and needed an export license.

It should be noted that complying with privacy regulations by storing data in the EU does not mean that it cannot be taken under the Patriot Act. In these cases it is assumed that the US Government is the evil one, but I have no reason to suspect that the UK or any other government is any less nefarious.

My conclusion is that it is safe to store data in the cloud if a company adheres to safe harbour, and it is probably better than most companies' own data protection. If however you are worried about your data falling into government hands then you need to look into it very carefully. The only really safe way to protect your data from governments is to encrypt your data in the cloud with your own encryption keys.

Useful reference articles:
ZDNet Article: How the USA Patriot Act can be used to access EU data
Wikipedia article on the Patriot Act
Wikipedia article on safe harbor
Ars Technica article on Patriot Act and cloud providers

January 16, 2012

Status quo awareness

Paul Graham:

The trick I recommend is to take yourself out of the picture. Instead of asking “what problem should I solve?” ask “what problem do I wish someone else would solve for me?”

January 14, 2012

Concise XPath

I get the impression that not many people know XPath, or know it very well, which is a shame. For one, it’s a beautifully concise notation (as you’ll see shortly). For another, it may be the difference between whether you hate XML or not. (I won’t claim it’ll make you like XML, though it may. It did for me.)

XPath is really very simple: you just string together conditions. Evaluation begins with a set of nodes so far. Then a new set of nodes is selected based on the given ones, and the condition is checked on this new set. If it’s a condition you appended with /, that means to then select the matching nodes for the next step. If you appended it inside [], that means to continue on with the original set, but to discard those nodes for which there were no matching new nodes.

So /foo/bar means this:

  1. Start with the root node.
  2. Then /foo: for each node (which is just the root node, so far), fetch its child nodes (of which the root node always has exactly one), check which ones are foo elements, and take those as the new set.
  3. Then /bar: for each node, fetch its child nodes, check which ones are bar elements, and take those as the new set.

These conditions appended with / are known as steps.

And /foo[bar] means this:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then [bar]: for each node, fetch its child nodes, check if any are bar elements, and if you come up empty then discard that node.

This is known as a predicate. Each predicate can itself be just as complex as any expression: it can itself contain steps and predicates.
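
The difference between a step and a predicate can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree implements a limited subset of XPath and evaluates paths relative to an element, so the expressions below are foo/bar and foo[bar] applied to a made-up wrapper element, not the absolute paths discussed above.

```python
import xml.etree.ElementTree as ET

# A made-up document: three foo elements, two of which contain bar children.
doc = ET.fromstring(
    "<root>"
    "<foo id='a'><bar/><bar/></foo>"
    "<foo id='b'/>"
    "<foo id='c'><bar/></foo>"
    "</root>"
)

# Step: foo/bar moves on to the bar children of every foo child.
bars = doc.findall("foo/bar")

# Predicate: foo[bar] stays with the foo children, but discards those
# that have no bar child.
foos_with_bar = doc.findall("foo[bar]")

print(len(bars))                               # 3
print([foo.get("id") for foo in foos_with_bar])  # ['a', 'c']
```

The step returns bar elements, while the predicate returns foo elements filtered by whether they have a bar.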

Finally, there are axes, written as prefixes separated with a ::. Axes specify which set of nodes to select before checking the condition – it doesn’t have to be the child nodes of the current set, that’s just the default axis (which you don’t need to write) called child::. So you can write e.g. /foo/following-sibling::bar:

  1. Start with the root node.
  2. Then /foo: for each node, fetch its child nodes, check which ones are foo elements, and take those as the new set.
  3. Then /following-sibling::bar: for each node, fetch all its following siblings, check which are bar elements, and then take those as the new set.

(Thus /foo/bar and /foo[bar] really mean /child::foo/child::bar and /child::foo[child::bar] respectively. Therefore each condition also includes a selection rule, often implicitly.)

Compare expressions and explanations and you see what I said about concision and beauty.

Now, with those principles given to you, just string together conditions. There are a few syntactic shortcuts other than not needing to write child::, e.g. you can write attribute::foo as @foo, and //foo is short for /descendant-or-self::node()/foo, but there is no magic to those: they are just sugar. For the details – lists of possible axes, syntactic shortcuts, etc. – just refer to the standard. Lousy though it may be as an introduction, it makes a good reference.

That’s XPath.


Some practical notes:

With the various axes such as following-sibling::, you always get a whole set (e.g. all following siblings in this example). If you want a specific one from that set based on position – usually the first –, you have to discard the ones you aren’t interested in by using a predicate that checks the position – in that case [1], which is another shortcut notation, standing for [position() = 1]. The position() function evaluates to the index of a node within its subset, which is based on the node it was selected for.

So a common construction is following-sibling::*[1], which amounts to “the element whose start-tag is right after this one’s end-tag.” A somewhat likely case is to further combine this with a [self::foo] predicate to say “but only as long as that is a foo element.”

Observe that the order of predicates matters.

If you write *[self::foo][1], you get all child elements, then narrow it down to the foo elements, then to the first of them – so it amounts to “select the first foo child element” which is identical to the much simpler expression foo[1]. This is very different from *[1][self::foo], which first narrows down “everything” to “the first thing” and only then checks “but only if it’s a foo.”
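
The effect of predicate order can be modelled with plain Python lists. This is a sketch of the filtering logic only, not an XPath engine: the children list and the two helper functions are hypothetical stand-ins for the child elements of one context node and for the [self::foo] and [1] predicates.

```python
# Hypothetical child elements of one context node, in document order,
# as (position in document, element name) pairs.
children = [(1, "bar"), (2, "foo"), (3, "baz"), (4, "foo")]

def self_is(name, nodes):
    """Model of [self::name]: keep only the nodes with that element name."""
    return [node for node in nodes if node[1] == name]

def position_is(index, nodes):
    """Model of [index]: keep only the node at that 1-based position in the current set."""
    return nodes[index - 1:index]

# *[self::foo][1]: filter to foo elements first, then take the first of them.
first_foo = position_is(1, self_is("foo", children))

# *[1][self::foo]: take the first element, then keep it only if it is a foo.
first_if_foo = self_is("foo", position_is(1, children))

print(first_foo)     # [(2, 'foo')]
print(first_if_foo)  # []
```

Each predicate renumbers the positions for the one after it, which is why the two orders give different results.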

January 09, 2012

The essentially mediocre

MG Siegler:

If you’re saying something that you think is great, why would you want to do it as a comment on another site anyway?

December 28, 2011

Creative Commons - nordickiwi.com

I’ve decided to use the Creative Commons licence for this website. You can read more about it at the bottom of this page. I still need to figure out how to add it to the MediaWiki part of this site.

/Roger

December 07, 2011

Spend money on… which is it, now

Maciej Ceglowski:

To avoid this problem, avoid mom-and-pop projects that don’t take your money! You might call this the anti-free-software movement.

But it’s not! It’s the anti-free-service movement. Which I whole-heartedly support.

(Maciej makes that point himself, eventually and obliquely, but not until after the catchy coinage…)

December 04, 2011

Three months with the TouchPad

I first started writing this post on 2 September 2011. It was going to be called "three days with the TouchPad". I'd like to say that my opinion has changed substantially over the three months since then, but for that to have happened, I would have had to spend serious time with the device.

I haven't.

Last time anyone in our house tried to use the TouchPad it got thrown on the couch in disgust.[1] By contrast, our iPad is happily used every day. Is this just a case of "you get what you pay for"?

The story so far

I fought my way through the broken websites to purchase an £89 HP TouchPad when they cleared their stock at the end of August. I couldn't be sure that Carphone Warehouse had stock for all their orders, so I was overjoyed when mine turned "dispatched" later in the week. Then, it never arrived.  I wasted hours on the phone with CPW and Yodel (cheap courier of choice for "free delivery" everywhere), who claimed it had been delivered, when no knock had ever graced my door. The driver only spoke Bulgarian, and intimated (through a translator and wild hand gesturing) that he had given it to someone who had come up from the stairs below us - an empty flat.

I had all but given up on the delivery when, after the weekend, our neighbour came over and said their housekeeper had collected it on Friday and had it the whole time.

Argh.

Eventually, thanks to people like me, the TouchPad ended up getting 17% of the market!

Of everything that wasn't the iPad.

(So, more like 1.8% then.)

And remember, I very nearly wasn't a member of that club, as it seemed very unlikely that Carphone Warehouse would have been in a position to give me another one, had the first one not surfaced.

The TouchPad was an impulse buy, as we already owned an iPad. I had opted for middle of the range - the 32GB with 3G.2 At clearance price, my iPad cost 7 times more than the TouchPad, but remember that the original retail pricing for a comparable device was £399 for HP vs £429 for Apple.

With all that in mind, here's a collection of thoughts about the TouchPad today. It is not a review: if you are interested in a review, albeit one from before the fire-sale, go read what Shawn Blanc wrote. The experience has hardly changed.

The good

I came into TouchPad ownership with a very open mind, based in part on my ex-colleague Sergei owning a Palm Pré and not hating it. Also, everything I read about webOS online made it seem that it was designed, where Android was mostly congealed. (My apologies to Douglas Adams.) Further, I wanted webOS to be a success, because I like to use systems that feel like they are consistently designed throughout, and I didn't think it would be good for the world if iOS was to be the only relevant platform for which that was true. We are in the odd position today that Microsoft has replaced Palm as the loveable underdog: Windows Phone (and possibly Windows 8 for tablets) has taken the mantle of "mobile operating environment which actually has some modern design principles applied, rather than just copying iOS", which surely must provoke some cognitive dissonance for all the people still bitter about how Microsoft stole everything from the Mac.

I only made one note from three days after unboxing: "It is really handy to have the number keys on the keyboard all the time". It still is. I suppose there are other nice things, depending on your point of comparison. Notifications are good, in general, though I really don't care that each web site I visit exposes a search endpoint, so I don't appreciate that the TouchPad shows me a notification for each and tries to add them to the search.

Grasping at straws, I still like the card metaphor, though not as much for multiple tabs as for multiple applications. And the things that were good about webOS on the phone, such as the integrated contacts, are still good here, though not as useful. The only other thing I noticed in a quick look through the menus is that it has Beats Audio, which I like to think makes me one step closer to Dr Dre than most. I don't think I've ever actually tried to make the thing play audio in order that I might notice a difference.

The goblin

How long after the horse died is it acceptable to still be flogging it?

The TouchPad is slow, out of the box. Nerds like me can make it faster with - wait for it - syslogd and kernel patches, and even overclock it if they feel the need. (I didn't.)  The iPad 1 still runs rings around it in everything - even though the iPad has half the CPU cores at a much lower clock speed, and one quarter the RAM of the TouchPad.

It has a handful of apps, but not enough to retroactively justify the purchase to me, even at £89. If I go to my Applications list, I have a beta Kindle reader, which I had to side-load as it is US only: the best Twitter experience is something called "Spaz HD Beta Preview 2", which is both award-winning and open source, though apparently named by the people who came up with "The GIMP". In fairness, it's not bad, it's just not up to the experience which is available on any one of the great Twitter clients for other platforms. And with the on-again off-again abandonment by HP, surely most of those who came into the TouchPad did it eyes-open, knowing the chances of it ever developing a good app ecosystem were not high.

Most of what I do on a tablet is web browsing, and so even if it had no apps but did web browsing brilliantly, it might be redeemed. It doesn't. It has Flash, which really just serves to make YouTube worse. Maps are horrible, scrolling is slow and sluggish, and clicking doesn't normally hit the link you want it to.

Physically, it feels cheap, due to the plastic back.  It is a good weight however.

The purchase

In my mind, there were three groups of people who wanted to buy a TouchPad at fire sale prices:

  • People who wanted a "tablet" (iPad), but couldn't afford or justify one at market (iPad) prices
  • People who wanted an "Android tablet" and figured that a port couldn't be far away
  • People who liked webOS and actually wanted a TouchPad to use webOS on it

I was in the third group, but I also suspect that was about 1.8% of the people who actually got the device.

If you were to compare the experience on a £89 TouchPad vs. whatever else you could legitimately purchase for £89 - how long were the queues for the Binatone HomeSurf 7? - it seems like a no-brainer. If there was no chance that the tablet were ever able to run Android, I don't think it would have sold nearly as quickly. At the time of writing there is an alpha-quality CyanogenMod release of Android for the TouchPad, for developers, rather than end users. With the recent release of Android 4.0, it's likely there will be a reasonably good upgrade path for the application story, and on this kind of hardware Android should be about as good as it is on any other kind of hardware.

I bemoaned this fact when I came to buy it:



I wish I could find everyone talking about running Android on the hp TouchPad, and STAB THEM IN THE FACE.
@craigbox
Craig Box

Three months later, has my attitude changed? Somewhat. I simply don't want to own an Android tablet. (Neither do many other people, as we established before.) Would it be better on this hardware than webOS? Probably. Ask me again when 4.0 is released for the TouchPad - I don't think the attempts to shoehorn Android 2.x onto tablets have done hackers any better than Samsung.

I don't think there can be any argument that the fire sale was a dumb idea, and HP's CEO eventually paid the price. Would I have paid £200 for this? No, but they would still have sold out at that price.

The summary

First world problems much? Our two tablet household isn't as good as it would be if we had an iPad each. Sure. I knowingly bought an £89 gadget to have a play with, and I suspect I could easily get that back if I wanted to sell it. Alternatively, if either of my brothers read my blog, I might be convinced to post it to them for Christmas. Over time, I think I might find a use for it - if I could pick up the Touchstone dock-slash-stand, I think it could make a great digital photo frame.  Even if all it ever did was be an LCD Kindle, it was still a bargain.

But the crux is that neither of us ever want to use it. It almost got put in the cupboard today. Attempts to use it provoke disgust, throwing it back onto the couch, and getting up to find the iPad. There is really nothing redeeming about it.

  1. Fern later clarified: "It wasn't thrown on the couch, it was thrown at the couch."
  2. If I were to look back on that purchase, I would say the money spent on the 3G was mostly wasted - tablet usage is mostly at home. The iPad spent over a year without a 3G SIM card, though it has one now thanks to Arunabh, who pointed out that T-Mobile have a remarkable 12 months free on an iPhone 4 PAYG SIM, and the iPad takes the SIM quite happily.

December 02, 2011

Prep notes for NDF2011 demonstration

I didn't really have a presentation for my demonstration at the NDF, but the event team have asked for presentations, so here are the notes for my practice demonstration that I did within the library. The notes served as an advert to attract punters to the demo; as a conversation starter in the actual demo and as a set of bookmarks of the URLs I wanted to open.




Depending on what people are interested in, I'll be doing three things

*) Demonstrating basic editing, perhaps by creating a page from the requested articles at http://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles

*) Discussing some of the quality control processes I've been involved with (http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion and http://en.wikipedia.org/wiki/New_pages_patrol)

*) Discussing how wikipedia handles authority control issues using redirects (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Redirect ) and disambiguation (https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Disambiguation )

I'm also open to suggestions of other things to talk about.

December 01, 2011

Metadata vocabularies LODLAM NZ cares about

At today's LODLAM NZ, in Wellington, I co-hosted a vocabulary schema / interoperability session. I kicked off the session with a list of the metadata schema we care about and counts of how many people in the room cared about it. Here are the results:

8 Library of Congress / NACO Name Authority List
7 Māori Subject Headings
6 Library of Congress Subject Headings
5 SONZ
5 Linnean
4 Getty Thesauri
3 Marsden Research Subject Codes / ANZRSC Codes
3 SCOT
3 Iwi Hapū List
2 Australian Pictorial Thesaurus
1 Powerhouse Object Names Thesaurus
0 MESH

This straw poll naturally only reflects the participants who attended this particular session, and counting was somewhat haphazard (people were still coming into the room), but it gives a sample of the scope.

I don't recall whether the heading was "Metadata we care about" or "Vocabularies we care about," but it was something very close to that.

November 30, 2011

Unexpected advice

During the NDF2011 today I was in "Digital initiatives in Māori communities", put on by the talented Honiana Love and Claire Hall from the Te Reo o Taranaki Charitable Trust, about their work on He Kete Kōrero. At the end I asked a question: "Most of us [the audience] are in institutions with te Reo Māori holdings or cultural objects of some description. What small thing can we do to help enable our collections for the iwi and hapū source communities? Use Māori Subject Headings? The Iwi / Hapū list? Geotagging? ..." Quick-as-a-blink the response was "Geotagging." If I understood the answer (given mainly by Honiana) correctly, the point was that geotagging is much more useful because it's much more likely to be done right in contexts like this. Presumably this is because geotagging lends itself to checking, validation and visualisations that make errors easy to spot in ways that these other metadata forms don't; it's better understood by those processing the documents and processing the data.

I think it's fabulous that we're getting feedback from indigenous groups using information systems in indigenous contexts, particularly feedback about previous attempts to cater to their needs. If this is the experience of other indigenous groups, it's really important.

November 26, 2011

Goodbye 'social-media' world

You may or may not have noticed, but recently a number of 'social media' services have begun looking and working very similarly. Facebook is the poster-child, followed by google+ and twitter. Their modus operandi is to entice you to interact with family-members, friends and acquaintances and then leverage your interactions to both sell your attention to advertisers and entice other members of your social circle to join the service.

There are, naturally, a number of shiny baubles you get for participating in the sale of your eyeballs to the highest bidder, but recently I have come to the conclusion that my eyeballs (and those of my friends, loved ones and colleagues) are worth more.

I'll be signing off google plus, twitter and facebook shortly. I may return for particular events, particularly those with a critical mass the size of Jupiter, but I shall not be using them regularly. I remain serenely confident that all babies born in my extended circle are cute; I do not need to see their pictures.

I will continue using other social media (email, wikipedia, irc, skype, etc) as usual. My deepest apologies to those who joined at least partly on my account.

November 24, 2011

How I’m voting in 2011

It’s general election time again in New Zealand this year, with the added twist of an additional referendum on whether to keep MMP as our electoral system. If you’re not interested in New Zealand politics, then you should definitely skip the rest of this post.

I’ve never understood why some people consider their voting choices a matter of national security, so when, via Andrew McMillan, I saw a good rationale for why you should share your opinion, I found my excuse to write this post.

Party Vote
I’ll be voting for National. I’m philosophically much closer to National than Labour, particularly on economic and personal responsibility issues, but even if I wasn’t, the thought of having Phil Goff as Prime Minister would be enough to put me off voting Labour. His early career seems strong, but lately it’s been one misstep and half-truth after another, and the remainder of the Labour caucus and their likely support partners don’t offer much reassurance either. If I was left-leaning and the mess that Labour is in wasn’t enough to push me over to National this year then I’d vote Greens and hope they saw the light and decided to partner with National.

Electorate Vote
I live in Dublin, but you stay registered in the last electorate where you resided, which for me is Tamaki. I have no idea who the candidates there are, so I’ll just be voting for the National candidate for the reasons above.

MMP Referendum
I have no real objections to MMP and I think it’s done a good job of increasing representation in our parliament. I like that parties can bring in some star players without them having to spend time in an electorate. I don’t like the tendency towards unstable coalitions that our past MMP results have sometimes provided.

Of the alternatives, STV is the only one that I think should be seriously considered. FPP and its close cousin SM don’t give the proportionality of MMP, and PV just seems like a simplified version of STV with limited other benefit. If you’re going to do preferential voting, you might as well do it properly and use STV.

So, I’ll vote for a change to STV, not because I’m convinced that MMP is wrong, but because I think it doesn’t hurt for the country to spend a bit more time and energy confirming that we have the right electoral system. If the referendum succeeds and we get another referendum between MMP and something other than STV in 2014, I’ll vote to keep MMP. If we have a vote between MMP and STV in 2014 I’m not yet sure how I’d vote. STV is arguably an excellent system, but I worry that it’s too complex for most voters to understand.

PS. Just found this handy list of 10 positive reasons to vote for National, if you’re still undecided and need a further nudge. Kiwiblog: 10 positive reasons to vote National

November 16, 2011

Nordickiwi soon 12 years old

I just read my first blog post, 28/11/99 Dobedo, which I wrote on my nordickiwi web site. Or, as everyone on the net called them then, a “Journal post”. On 28 Nov 1999 I published this account of a night out in Stockholm that was actually an IRL event for an online community called dobedo.
Nordickiwi web site has evolved and mangled itself from being hosted here and there, and then on four different fedora laptops over the years in various cupboards and spare rooms in my apartment and now my house.
It’s been built using various tools, own html and css, java script to wordpress and mediawiki and a mysql database.

The current fedora version is a few years old and it is time it got revamped and updated again. So maybe even the site could be migrated and squashed into a new tool I need to learn. I’ve been thinking about hosting a Joomla server and chucking it into that. Maybe even changing to CentOS and moving away from Fedora…… shock horror.
We will see what happens.
But anyway, 12 years isn’t a bad effort and I have no intention of stopping. At times things go quiet on the site, but that all depends on my life situation and other stuff I have to prioritise. Anyway, happy anniversary to my website and thanks to all the visitors that ever stumbled across it.

/Roger Sinel - aka nordickiwi

Gnome3 or XFCE ?

Well I just ditched Gnome3 for XFCE. I tried to adapt but just couldn’t do it. I missed the speed.

Sorry Gnome 3 dev team. You’re doing some good work, but it just ain’t my thing.

/roger

e-legitimation linux

This is a great little project that solves a problem for many of us GNU/Linux users in Sweden.

FriBID is an open source project for e-legitimation with BankID

http://www.fribid.se/

Personally I used the fedora 15 fribid package maintained by Henrik Nordström.

/Roger Sinel

November 11, 2011

A week of freedom

A week of Freedom: this week has been a week of free thinking and free software.
This week the Stockholm FOSS community were treated to a visit by Richard Stallman from the Free Software Foundation, the man behind GNU/Linux and the GNU General Public License. He held a talk at Stockholm University that over 1100 people attended.

I’ve seen Richard speak here in Stockholm before and it was again a refreshing reminder to hear his message, which many of us FOSS users are all too quick to forget.

Since his talk I’ve read many positive comments and discussions about his talk and message regarding Free Software and the four freedoms. And also some debate which is healthy and to be expected.

The day after the talk I was privileged to be able to drive Richard around Stockholm and show him some of the views of the city. It was very special spending time with a man that I and many others have so much to thank for. I will never forget that day.

Right now I am sitting on a train between Stockholm and Gothenburg on my way to the FSCONS conference. FSCONS is the Nordic countries’ largest gathering for free culture, free software and a free society. The conference is organized yearly with 250-300 participants.

Richard will be speaking again at this conference as will many other interesting guests.

So I will be back in Stockholm on Sunday after a full week of Freedom.

Roger Sinel

November 07, 2011

Fedora 16 - Features this time around

Soon the time is upon us again for a Fedora release. One day left when writing this.

And here’s a quick run down of some of the features. Some nice stuff to keep us on our toes and on the bleeding edge.

Cloud stuff, or cloud stuffing
Aeolus Conductor - is a web UI and tools to create and manage cloud instances over varied cloud types.
Condor Cloud - an IaaS cloud implementation using Condor and the Deltacloud API.
HekaFS (formerly CloudFS) - a “cloud ready” version of GlusterFS
pacemaker-cloud - application service high availability in a cloud environment.
OpenStack - is a collection of services that can be used to setup and run a cloud compute and storage infrastructure.

Windows Managers – area to click in
Update GNOME 3 - to the latest upstream release.
GNOME unified input indicator - allowing users to switch seamlessly between keyboard layouts and input methods.
KDE Plasma Workspace 4.7 - including Plasma Desktop and Netbook workspaces.
Sugar – latest with enhanced activity set to provide a stable demo environment for Sugar as well as an environment for developers.

Multimedia gear
Blender 2.5 Updating Blender to the most recent version, cross platform suite of tools for 3D creation.

System changes
1000 System Accounts - Standardize on login.defs as authority for UID/GID space allocation, and move boundary between system and user accounts from 500 to 1000.
GRUB2 - Switch to using grub2 instead of grub legacy for boot loading an installed x86 system.
HAL Gone - Obsoleting HAL daemon (replaced by udisks, upower, libudev).
USB Network Redirection - Allow redirection of USB devices to other machines over the network.
Use Ext4 driver for Ext3 and Ext2 filesystems - Enable the ext4 driver to register ext3 and ext2 filesystems as well, and to mount those filesystems unchanged.

VIRTUAL World
Spice - aims to provide a complete open source solution for virtualized desktops. Spice 0.10 adds features such as USB sharing between guests, and audio volume messages between guest and client.
Sheepdog - is a distributed object-based storage system for QEMU/KVM.
Virtual machine lock manager daemon.
Virt-manager Guest Inspection
Xen Pvops Dom0 - pvops-based kernel to serve as dom0 for a Xen-based system.

DEV dept
GCC Python Plugin, GCC plugin that embeds Python within GCC.
Update Perl to 5.14.

Roger Sinel



November 06, 2011

Recreational authority control

Over the last week or two I've been having a bit of a play with Ngā Ūpoko Tukutuku / The Māori Subject Headings (for the uninitiated, think of the widely used Library of Congress Subject Headings, done Post-Colonial and bi-lingually but in the same technology). The main thing I've been doing is trying to munge the MSH into Wikipedia (Wikipedia being my addiction du jour).

My thinking has been to increase the use of MSH by taking it, as it were, to where the people are. I've been working with the English language Wikipedia, since the Māori language Wikipedia has fewer pages and sees much less use.

My first step was to download the MSH in MARC XML format (available from the website) and use XSL to transform it into a wikipedia table (warning: large page). When looking at that table, each row is a subject heading, with the first column being the te reo Māori term, the second being permutations of the related terms and the third being the scope notes. I started a discussion about my thoughts (warning: large page) and got a clear green light to create redirects (or 'related terms' in librarian speak) for MSH terms which are culturally-specific to Māori culture.
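For readers who would rather not write XSL, the same kind of transform can be sketched in Python. This is only an illustration, not the stylesheet I used: it assumes the download follows the usual MARC 21 authority layout (heading in field 150, related terms in 450 and 550, scope note in 680), and the embedded sample record is a made-up placeholder rather than a real MSH entry.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Placeholder record standing in for the real download (not real MSH data).
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="150" ind1=" " ind2=" "><subfield code="a">Example heading</subfield></datafield>
    <datafield tag="450" ind1=" " ind2=" "><subfield code="a">Example related term</subfield></datafield>
    <datafield tag="680" ind1=" " ind2=" "><subfield code="i">Example scope note.</subfield></datafield>
  </record>
</collection>"""


def field_values(record, tag):
    """Return the joined subfield text of every datafield with the given tag."""
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
        parts = [(sub.text or "").strip() for sub in field.findall("marc:subfield", MARC_NS)]
        text = " ".join(p for p in parts if p)
        if text:
            values.append(text)
    return values


def marcxml_to_wikitable(xml_text):
    """Build a three-column wikitable: heading, related terms, scope notes."""
    root = ET.fromstring(xml_text)
    lines = ['{| class="wikitable"', "! Heading !! Related terms !! Scope notes"]
    for record in root.iter(f"{{{MARC_URI}}}record"):
        heading = "; ".join(field_values(record, "150"))
        # Assumption: 450 (see from) and 550 (see also) both count as related terms.
        related = "; ".join(field_values(record, "450") + field_values(record, "550"))
        notes = " ".join(field_values(record, "680"))
        lines.append("|-")
        lines.append(f"| {heading} || {related} || {notes}")
    lines.append("|}")
    return "\n".join(lines)


print(marcxml_to_wikitable(SAMPLE))
```

The output can be pasted into a wiki page as-is; real records would need any pipe characters in the notes escaped first.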

I'm about 50% of the way through the 1300 terms of the MSH and have 115 redirects in the newly created Category:Redirects from Māori language terms. That may sound pretty average, until you remember that institutions are increasingly rolling out tools such as Summon, which use wikipedia redirects for auto-completion, taking these mappings to the heart of most Māori speakers in higher and further education.
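A redirect page itself is tiny. The sketch below shows roughly what each one contains, assuming the category is added as a plain category link (Wikipedia may prefer a redirect template for this) and using a placeholder term and target rather than a real mapping from the MSH.

```python
# Category name as used for the redirects described above.
REDIRECT_CATEGORY = "Redirects from Māori language terms"


def redirect_wikitext(target_title, category=REDIRECT_CATEGORY):
    """Return the wikitext for a redirect page pointing at target_title."""
    return f"#REDIRECT [[{target_title}]]\n\n[[Category:{category}]]"


# Hypothetical mapping from a redirect title to an existing article title.
mappings = {"Example Māori term": "Example English article"}

for term, target in mappings.items():
    print(f"Page title: {term}")
    print(redirect_wikitext(target))
```

Each redirect still gets checked by hand before it is created; the script only saves typing.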

I don't have a time-frame for the redirects to appear, but they haven't appeared in Otago's Summon, whereas redirects I created ~ two years ago have; type 'jack yeates' and pause to see it at work.

October 07, 2011

Elegy to my only love in the cloud

Maciej Ceglowski:

Avos did a similar thing last week when they relaunched Delicious while breaking every feature that made their core users so devoted to the site (networks, bundles, subscriptions and feeds). They seemed to have no idea who their most active users were, or how strongly those users cared about the product. In my mind this reinforced the idea that they had bought Delicious simply as a convenient installed base of “like” buttons scattered across the internet, with the intent of building a completely new social site unrelated to saving links.

May you eventually find rest, del.icio.us. You have been undead since the day Yahoo! bought you, and Avos has only desecrated your corpse further. (I think at this point it qualifies as brand necrophilia.)

I made my peace at the beginning of the year – Avos just put the last nail in its coffin as far as I am concerned.

But I am saddened nonetheless.

For posterity, I should note my personalised comical note in this: the Avos zombie version of del.icio.us requires that usernames be at least 3 characters long. So I can no longer log into my account: ap. My ex-account. I could not even download an export of my bookmarks now if I didn’t thankfully have one already.

The worst I feel about this is for Joshua Schachter, and for the people who joined up with him after the Yahoo! acquisition because they understood his aspirations. There is a lesson here: if you care about something, don’t give away control of it – or at the least, not to a corporation. (Joining forces with other people made of flesh and blood is – no, can be – another matter. Choose wisely.)

What a shame.

September 23, 2011

Review of "Amazon Web Services: Migrating your .NET Enterprise Application"

Amazon Web Services: Migrating your .NET Enterprise Application
Rob Linton, Packt Publishing
2/5

(Review copy supplied by Packt Publishing.)

Amazon Web Services (AWS) is not a small topic. Just listed on their 'product summary' page are 28 different topics, most with an entire set of both product and API documentation behind it.

Condensing that into a book is not a trivial task, and it requires establishing a suitable narrative. This book has taken the angle of a ".NET Enterprise Application", and starts off well: a sample application, if a little trivial, is provided, and a goal stated to move the application from traditional server hosting to the Amazon cloud.

Good, but short, consideration is given to why you would put such an application in AWS rather than a platform solution. It then dives in to creating instances for deploying the application.

A book that takes you on a journey, as opposed to a general reference book, should not be afraid to make choices. Five pages are dedicated to the Import/Export service, which lets you post Amazon a hard drive. Shipping terabytes of data is a problem that users are unlikely to have up front - the book should acknowledge its existence, but it wastes time and confuses users by going in-depth on a subject which should be an appendix at best.

Similarly, Chapter 6 covers SQL Server, required for the example application, but then also covers Oracle, MySQL (RDS) and Amazon's key-value store SimpleDB, none of which are used or required. It is great to see that the notification (SNS) and queuing (SQS) are discussed in the context of how the application could be enhanced to use them, although using these services means you are "locked in" in much the same way you are on a platform service - somewhat undermining the point the book made in the beginning.

Many statements in this book are just plain wrong (such as Amazon.com not being hosted on AWS, or network (EBS) volumes being faster than instance disks - whole books could be written on this topic alone). Other sections of the book have been made outdated as Amazon has rolled out improvements - the most significant being the new license mobility options allowing the use of SQL Server Enterprise. While there is nothing the author or publisher can do about progress, there are occasions where the book is internally inconsistent - for example, referring to 4 regions in one section and 5 in another. In general, poor editing detracts from the reading experience.

One of the reasons Amazon is so much cheaper than regular datacenter providers is they allow you to build reliable solutions out of commodity hardware. However, this means you need to make allowances that are not at all discussed in this book. Deploying applications across availability zones is absolutely essential - Amazon is up-front in saying that they expect failures, which are widely reported by people who do not understand that AWS is not a traditional, expensive battery-backed-SAN-reliable datacenter. This book mentions availability zones, but doesn't show how to properly use them.

Redundancy is only briefly touched on - SQL mirroring and failover, possibly the most important topic this book could cover, is given two paragraphs and then offloaded to Microsoft. Even though there appears to be enough servers for a redundant architecture, the eventual service is riddled with single points of failure and there is no way that an application built to this model should be allowed into production on AWS.

Further, many best practices, especially those around firewalls, security groups and Active Directory, are described incorrectly, and are likely to lead to insecure or unnecessarily expensive deployments.

The author clearly understands both Windows/SQL Server and the basics of AWS, but taking 28 topics and picking out the important ones is a difficult task, and overall this book does a poor job of it.

Updating a manuscript to include new functionality means it would effectively never be published. The alternative is a 'living document', published online: hard to make money from, but guaranteed to be up-to-date. I am unlikely to bother reading another book on AWS.

 

September 17, 2011

Software Freedom Day 2011 - Stockholm

Software Freedom Day 2011 Stockholm

The Software Freedom Day event was held in Stockholm today and we organised it within the Swedish Linux Society.

This was the 4th year in a row that SFD has been celebrated in Stockholm in conjunction with Swedish Linux Society.

Pictures from todays event SFD 2011 Pictures

Thanks to the guys who came and helped spread the word about Free Software today.

//Roger Sinel



September 12, 2011

Future of mobile apps

Interesting article about HTML5 being the future of apps on mobile. Having worked in the smartphone industry I totally agree. Just as few apps are used on the PC / Mac any more, with most things done in the web browser, I believe the same thing will happen on tablets and smartphones.

The key obstacles to the PC moving to the web were horsepower and fast links. If you think about the smartphone and mobile networks you could consider them to be like the PC and dialup internet 5 years ago. Given Moore's law and 4G networks, I believe mobile will go the same way.

The other key part is offline storage in HTML5 so that applications can retain data/state. Already the Financial Times has switched to an HTML5-based app so that it can avoid the 'Apple Tax'.

September 07, 2011

OPML is spectacularly lousy

OPML blows chunks. This is my conclusion after a good two hours spent on exasperated googling: the specification is about as vague and informal as could be, the format misuses XML badly, and vital parts of it used widely by feed aggregators seem to be documented nowhere at all. Yuck. I guess I’ll end up deciphering the information I need from existing working code and hope it works in the general case.

This all started as what I thought would be a fun small scripting exercise: I was going to throw together a little script that would turn someone’s LiveJournal friends list into an OPML blogroll. Instead I spent as much time beating on Google in mounting frustration, fruitlessly attempting to find something – anything – as it would have taken to write the code for a better specified format. I came out empty-handed.
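For what it is worth, the shape of the script I had in mind is roughly the following. It is a sketch only: the `type="rss"` and `xmlUrl` attributes are the convention that aggregator exports commonly seem to use rather than anything I found properly specified, and the feed list is a hypothetical placeholder instead of a real LiveJournal friends list.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML blogroll as a string.

    feeds is a list of (display name, feed URL) pairs.
    """
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url in feeds:
        # Assumed aggregator convention: one <outline> per feed, URL in xmlUrl.
        ET.SubElement(body, "outline", text=name, type="rss", xmlUrl=feed_url)
    return ET.tostring(opml, encoding="unicode")


# Hypothetical friends list; the names and URLs are placeholders.
feeds = [("example_friend", "http://example.com/friend/rss")]
print(build_opml("Friends list", feeds))
```

Whether any given aggregator accepts the result is exactly the part the specification leaves you to discover by trial and error.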

Update: Uche Ogbuji has ranted about the format, and Léon Brocard reports of a quote from a Ben Hammersley/Timothy Appnel talk at OSCON ’05:

Working with OPML is like driving nails into the floor with your forehead.

Update: I can’t believe I never linked to Charles Miller’s most eloquent panning of the format.

September 06, 2011

Cloud pricing is hard

One of the many benefits to cloud computing is the pricing model. Following Amazon's lead, any provider worth their salt lists their per-hour pricing on their website, and that is the price you pay, regardless of what you use.1 Gone are the days where you have to call for a custom price list, tailored for you by a man in a suit who is incentivised to charge exactly the maximum he thinks you will pay, no more no less. This means startups can get hold of scalable infrastructure at economies previously only available to the canny corporate negotiator.

However, even in the automated, API-driven present, there are still different models for pricing which you can choose from. For example, Amazon has an on-demand price, reserved instances (pay up front to buy the right to run a machine for a cheaper rate) and spot instances (an instance market, where you bid a price and if the spot price is below that price, your instance runs). While spot instances sound like a curiosity for people doing queue-based distributed computing that can be started and stopped at will, James Saull points out they turn out to be an oddly cost-effective way to run your always-on infrastructure. You may not like the risk, and you are not getting the guarantee of instance availability that comes with reserved instances.

For the general case, once you understand what your infrastructure requirements look like on Amazon, you buy suitable reserved instances: you then save 34% or 49% on the cost of running the equivalent on-demand instance over 1 or 3 years.

Mull that over for a second. This morning, I came across a comparison of pricing between IBM SmartCloud Enterprise and Amazon EC2 (via Adrian Cockcroft). I don't know a lot about the IBM cloud, but I do know bad math when I see it.

Lies, damned lies and estimated usage quotes

Amazon offer an online cost calculator. It's accurate, and always kept up-to-date, but admittedly it can be hard to use. For example, you have a small drop-down box at the top of the page which dictates which region you're in; if you are adding infrastructure in multiple regions, it's easy to get lost.

The author of the IBM article, Manav Gupta, has obviously lost his way around the AWS calculator. His first estimate comes in at over $10,000 a month, as "Amazon has included costs for redundant storage and compute in Europe". Amazon do no such thing. No data crosses a region unless you specifically request it - an important thing to note for compliance with data protection law. What is more likely is Gupta has started pricing his infrastructure in Europe, noticed his error, and continued in the US, without realising that AWS offers five global regions (six if you include the new US GovCloud) and you can easily provision infrastructure in all of them. In fairness, the IBM calculator seems to be much simpler; I can't find information on where IBM host their SmartCloud.

Quote 1 is replaced by quote 2, which comes in at $6370.62. Ignoring the obvious-but-insignificant errors (how does an application which does 20GB of inbound data per week do 120GB/week through its load balancer?), a quick look at the bill tab shows storage allocated in US-WEST, where everything else is allocated in US-EAST. Gupta's quote includes 7GB of S3 storage which is not mentioned in the post (or accounted for in the IBM quote). Not only that, it's charged twice: once in US-EAST and once in US-WEST! Assuming that's an error, I removed both allocations, and in order to be fair to what has been requested, added 300GB of snapshot storage for the EBS volumes to the correct page of the calculator.

Our new estimate - only correcting for errors, and without touching the compute cost - is $4211.90.

I've already beaten the published IBM price, but why stop there? As I mentioned above, sensible cloud purchasing almost always involves instance reservations. Because the pricing appears to have changed since the IBM article was published (I can't find a way to make IBM instances cost the same as shown in the calculator screenshot), I can't tell what reservation was used (if any) in the initial calculation. However, IBM offer 6- and 12-month reservations on a 64-CPU pool, with the note that "reserved capacity may not be economically attractive with the low monthly usage you have selected above".

Let's go for a 12 month reservation on AWS, in case our habits change. (And if they do, remember that reserved instance pricing can apply to any instance in the same availability zone on the same account.)

Our monthly cost has dropped to $2738.04. We do have an up-front reservation cost to pay, but if we amortize that over 12 months (as IBM does in their calculator) we are down to $3420.54 per month. Why not throw in Gold Premium Support? It's only another $341/month.
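To make the amortization explicit, here is a small Python sketch of the arithmetic. The two monthly figures are the ones quoted above; the up-front fee itself is not stated in the post, so the value computed below is only what those two figures imply over a 12-month term. The `amortized_cost` helper is a hypothetical name for illustration.

```python
from decimal import Decimal

# Figures quoted in the post.
monthly_with_reservation = Decimal("2738.04")  # monthly bill once instances are reserved
amortized_monthly = Decimal("3420.54")         # same bill with the up-front fee spread out
term_months = 12

# Implied by the two figures above; an inference, not a number from the post.
upfront_per_month = amortized_monthly - monthly_with_reservation
implied_upfront = upfront_per_month * term_months


def amortized_cost(monthly, upfront, months):
    """Spread a one-off up-front fee evenly across the reservation term."""
    return monthly + upfront / months


print(upfront_per_month)
print(implied_upfront)
print(amortized_cost(monthly_with_reservation, implied_upfront, term_months))
```

The same helper works for a 3-year reservation by passing `months=36`, provided you plug in the matching up-front and monthly figures from the calculator.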

With regard to Gupta's criticisms about not having a PDF export on the Calculator, I find it easy enough to hit "Print to PDF" on a web page myself, and the fact that I can export these quotes and publish them on this blog far outweighs that hassle.

On the topic of software licensing

Pricing is even harder when you have to factor in the price of licensing. In fairness to IBM, the quoted Amazon costs do not include Red Hat Linux licenses. However, I suspect the only reason they were included, aside from IBM being a Big Support kind of company, is that commercially licensed software (RHEL, SUSE, Windows) is the only option you have on SmartCloud Enterprise.

If you want to run Oracle applications on EC2, why not run them on the freely-licensed Oracle Enterprise Linux? Or the most popular operating system for the cloud, Ubuntu Server?

Alternatively, if the requirement for Red Hat Linux is hard-and-fast, then there is an option to run Red Hat on-demand with Amazon EC2. Reserved instance pricing is not currently available for RHEL, therefore you would be better advised to bring your own RHEL licenses to the cloud with Red Hat Cloud Access.

In the interests of full disclosure, the on-demand RHEL price is $4519.34/mo, vs the $4211.90 above.

Did I mention the "everything else?"

Amazon have defined the cloud computing marketplace - at least for infrastructure - with EC2. As Adrian Cockcroft points out in his excellent write-up on using clouds vs. building them, no-one can even come close to the price and performance, let alone the global scope, of EC2. If I were building Manav Gupta's web application, I would have the benefit of resiliency by balancing the application between multiple Availability Zones, and the benefit of reduced maintenance by using RDS for the database tier. And the price would probably be even lower, too.

The cloud provides great benefits to those who can make their application fit its ways. This is not a trivial task - sometimes even working the calculators can be too hard. If you want help with this, I am the Head of Cloud Services at Stoneburn in London, and I'd love you to get in touch. (And follow me on Twitter.)

Update: Manav Gupta has commented and provided a much neater explanation for why his first quote was vastly over-provisioned: there is a sample 'web application' option in the AWS calculator, which assigns a bunch of sample infrastructure over and above what was included in the IBM sample web application. The moral of the story is to ensure you are comparing like for like (as much as possible with differing size options between cloud providers) when making provider comparisons.

 

  1. Or, tiered options are clearly laid out, as with AWS data transfer.

August 31, 2011

August 29, 2011

Denting the universe

Rafe Colburn:

Who changed the world most, Google or Apple? […] I’ll boil it down to the most world-changing contribution by each company over the past ten years.

Google is the company that improved search engine results enough to really open the Web to the masses. […]

Apple is the company that brought a real Web browser to the pockets of millions of people. […]

Of course both companies have done many other things, but I don’t think any are as significant as those two. Which one made a greater impact? You tell me.

I believe that goes to Google, hands down.

The web is possibly as big a change in the world as the printing press was – and here I am making this comparison with great respect for its gravity. The mere fact that what I am writing this very moment will be read by a number of people I daren’t think about because it would boggle my mind is something none of my ancestors could dream to achieve. (Yet already this concept is beyond banal to the everyday web user.)

Google was the company that made the web colonisable by the masses. What Apple has done since, no matter how incredibly great, is – with due apologies to Steve – fungible.

In that sense, the first Steve revolution, the PC, was of much greater import than his second one looks to be. (But I will hasten to add that the second one is young as of this writing, and who knows where it will yet lead.) The PC was a necessary step, though not sufficient, to bring the web to everyone.

But in spite of providing a prerequisite for the web, not only did Apple not create the web, but I will go so far as to argue here that they would and could not have. It is not in the DNA of the company to build humble utilities.

(And finally it should not be forgotten that Apple must share credit for the PC revolution with Microsoft.)

August 27, 2011

The Car extends Vehicle kindergarten, or, Replace Jargon With Pedagogy

Henning Koch, pushing back at my design pattern terminology rant:

I feel the need to defend Martin Fowler’s article because it had such a profound effect on me when it was published. Although I had been playing with “objects” and “classes” before, this article finally made me understand what OOP was all about. This is not true for many other articles and yes, I’m looking at you, shitty Car extends Vehicle OOP tutorial.

(Which made me laugh.)

Aristotle is right when he says that the concept of dependency injection should be so ubiquitously understood that it shouldn’t get its own brand name. But at the same time it represents the very essence of object-orientation and most people are not using it.

I can see what he’s talking about. It took me a lot of time to take to OOP at all, and I’m still not particularly passionate. In my hands it is purely a tool, a means to an end – I reach for it either when I have a lot of functions that operate on a particular data structure, or when I need to be able to pass around interchangeable implementations of a set of behaviours. The latter is precisely what Dependency Injection is about. That is indeed the essence of object orientation – in the abstract.

I suppose that my intuitive understanding of this principle puts me in a minority. For me, instead, the defining moment with regard to OOP was when I read Replace Conditional With Polymorphism. That is the essence of object orientation – in the concrete. That article made a lightbulb go off; for the first time I could verbalise what I had so long been doing intuitively, and why. For the first time, I could look at code and figure out a better design objectively, without handwaving.

Maybe it is time for someone to put together a non-shitty introduction to object orientation, based on explaining the idea of Replace Conditional With Polymorphism first, and how to reap the benefit by consistently using the idea of Dependency Injection second. Perhaps novices would spend less time thrashing about then, and we could finally stop inventing pompous names for perfectly trivial concepts.
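As a rough illustration of that ordering, here is a minimal Python sketch: first the conditional, then the same behaviour expressed through polymorphism, then the caller being handed its implementation. All names (`render_conditional`, `JsonRenderer`, `CsvRenderer`, `Report`) are hypothetical and not taken from Fowler's articles.

```python
import json


# Before: every function that cares about the format repeats this conditional.
def render_conditional(fmt, items):
    if fmt == "json":
        return json.dumps(items)
    elif fmt == "csv":
        return ",".join(items)
    raise ValueError(f"unknown format: {fmt}")


# After (Replace Conditional With Polymorphism): one class per branch,
# each answering the same method.
class JsonRenderer:
    def render(self, items):
        return json.dumps(items)


class CsvRenderer:
    def render(self, items):
        return ",".join(items)


# Dependency Injection: the caller passes in whichever implementation it
# wants; Report never names a concrete renderer.
class Report:
    def __init__(self, renderer):
        self.renderer = renderer

    def publish(self, items):
        return self.renderer.render(items)


print(Report(CsvRenderer()).publish(["a", "b"]))   # a,b
print(Report(JsonRenderer()).publish(["a", "b"]))  # ["a", "b"]
```

Adding a third format means writing one more class with a `render` method; neither `Report` nor the existing renderers need to change.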

Update: more pushback from Adrian Howard. See my comment.

Update: Kragen Sitaker thinks about how to write a better tutorial, with some discussion on Hacker News and silly jokes on Reddit ensuing.

August 21, 2011

“Disruptive”

Patrick Rhone:

Did you catch that? The iPad is causing such disruption in the PC business that HP, a company fundamental to the creation of the personal computer itself, is getting out of the PC business.

Someone has to step up to Apple in competition or “closed and subject to Jobs’ whims” is the future of IT. That’s not a world that I look forward to. Unfortunately, no one else seems to get what Apple is doing right.

August 19, 2011

RAGE

Interesting week for mobile. Google buys Motorola Mobility, hp abandons webOS devices, and only six months after promised, Symbian Anna (aka PR2.0) is released. Its best feature? New icons. No, really.

What's in it? Lots. Let’s break down the main changes section-by-section. There are the new icons, of course, but there’s a whole lot more under the hood.

The reason Nokia gave for not drastically overhauling the look and feel of Symbian in S^3 was maintaining a "familiar look and feel". Yet their most important change in their first major update... new icons. So long, recognition. They didn't even put the new Nokia Pure font on there.

Anyway, let's update my N8. Turns out, the update is not available over the air in the UK. One of the two distinctly different ways to invoke a software update on the N8 says there is nothing available, the other says I have to use Nokia Ovi Suite on the PC to update. Well, I don't primarily run a PC, so I dig out a Windows laptop, install Ovi Suite, back up my phone, and... am offered three games, and no system update.

"Server is probably overloaded", the Internet says.

So next morning, I try again. Lo and behold, the system update awaits! Four small steps to follow. But I can't pass step 2 without a SIM card installed. I dredge out my NZ PAYG SIM (which appears to have expired, possibly taking my very old and very awesome phone number with it). However, apparently even a useless SIM is SIM enough for Nokia. Authenticate, back up (again), only to be told at step 3 that there is no update available.

Then, if I unplug the phone and plug it back in again, Ovi Suite tells me there are 9 updates, including applications, but the update screen says "Select the applications you want to install" and then offers nothing to choose from, and no active buttons except "Later".

Later, indeed. I had high enough hopes for this platform that I went to work for the Symbian Foundation, but with the lack of control we had over the platform, I'm not going to be pouring out any 40s in its memory any time soon.

 

I should point out that I'm not even using this phone any more: thanks to the generous Eiren O'Keeffe I'm currently borrowing an HTC Mozart, running Windows Phone 7. In general, and in comparison to the N8, it is fantastic.1 Data works reliably, mail works (without requiring a regular reboot), I can have both my calendars, etc, etc. I can't imagine how long it would take to get to this point with Symbian.

I still think it's too early to call if Nokia made the right choice with WP7. Symbian was obviously not going to get to Mango-good by November. The N9 looks nice, but it's running the abandoned half of Maemo, not the "Intel collaboration" half of MeeGo. Android was out. My friend Nez called it months ago: WP7 is OK. OK enough to sell Nokia phones when other manufacturers have the same software? We shall see. I have not been a fan of Nokia's industrial design of late, but I really do miss offline maps.

So far, the downsides for me of WP7 are the aforementioned maps, a weird bug where it wouldn't let me hang up once, and a lack of "official" apps. There is a free BBC News app in the Marketplace, with "This is a 3rd party application in no way associated with the BBC" on the front screen, but no actual BBC app. The Twitter app is passable, and apparently much improved in Mango.2 It's still hard to find third-party apps that are anywhere near as good as what you get on iOS, and especially hard to judge that from the Marketplace app on the phone.

There is a stolen-firmware update to WP7 Mango but I'm laying off installing it, expecting the official update will probably be out within a month. I somewhat expected it to beat Symbian Anna out.

 

Further RAGE: creating a cisco.com ID requires a "9-50 character username" (one character longer than my first and last name concatenated) and a password with a maximum of 15 characters (counting out my regular password, "correct horse battery staple".)3

They also can't get their story half straight:

On the form: Must be 8 or more characters and contain a combination of uppercase and lowercase letters (A-Z or a-z) and at least 1 number (0-9).
After submitting: Invalid Password. Password is case sensitive with length between 5 and 15 characters. Password cannot be the same as the user name.

And then it didn't like my phone number.

And then the captcha, which had not changed all the way through, changed.

And then it mentioned, for the first time, a field I hadn't updated.

And then, after EVERYTHING WAS FINALLY ACCEPTABLE, "Your session is no longer active".

It disturbs me a little that I have very little positive comment to make about all this technology at the moment. Perhaps I should have used 'curmudgeon' as my Cisco username. That's over 9 characters.

 

This last one actually started out as a positive story, but quickly soured: I found that with Miray HDClone, you can take a Windows machine running on Amazon EC2 with an instance-store (S3) root, clone the disk to an EBS volume, and then attach that EBS volume to a new machine. Voilà, one persistent machine!

After applying 2 years' worth of Windows updates, for the first time ever I actually found and cleaned malicious software with the Microsoft Malicious Software Removal Tool. Unfortunately for recovery purposes, getting "the Windows CD" on EC2 seemed a little harder than it should be - even a couple of ISOs I threw up there were not recognised. Unable to guarantee the system was in a good state, I advised the machine needed to be rebuilt from scratch, which of course requires the owner to audit all the software that was installed on it over the last 2 years. They will have fun with that!

 

  1. Except for the squareness of everything. Round those rectangles and I would be a happy camper.
  2. I sometimes felt the Gravity fanboy-ism on Symbian. However, Gravity has had to reinvent the entire platform to become half decent, and suffers from decisions made in less connected times - I still have to wait to "Go online" before I can use the app. Sorry Ole, but I'd rather use an OK app on a good platform than a great app on a burning platform.
  3. I had to look that up - it obviously wasn't as memorable as the list of 10 objects I was asked to remember in first-year psych, which are still burned into my head.

August 16, 2011

Thoughts on "Letter about the TEI" from Martin Mueller

Note: I am a member of the TEI council, but this message should be read as a personal position at the time of writing, not a council position, nor the position of my employer.

Reading Martin's missive was painful. I should have responded earlier; I think perhaps I was hoping someone else would say what I wanted to say and I could just say "me too." They haven't, so I've become the someone else.

I don't think that Martin's "fairly radical model" is nearly radical enough. I'd like to propose a significantly more radical model as a strawman:


1) The TEI shall maintain a document called 'The TEI Principles.' The purpose of The TEI is to advance The TEI Principles.

2) Institutional membership of The TEI is open to groups which publish, collect and/or curate documents in formats released by The TEI. Institutional membership requires members acknowledge The TEI Principles and permits the members to be listed at http://www.tei-c.org/Activities/Projects/ and use The TEI logos and branding.

3) Individual membership of The TEI is open to individuals; individual membership requires members acknowledge The TEI Principles and subscribe to The TEI mailing list at http://listserv.brown.edu/?A0=TEI-L.

4) All business of The TEI is conducted in public. Business which needs to be conducted in private (for example employment matters, contract negotiation, etc) shall be considered out of scope for The TEI.

5) Changes to the structure of The TEI will be discussed on the TEI mailing list and put to a democratic vote with a voting period of at least one month. A two-thirds majority of votes cast is required to pass a motion, which shall be in English.

6) Groups of members may form for activities from time-to-time, such as members meetings, summer schools, promotions of The TEI or collective digitisation efforts, but these groups are not The TEI, even if the word 'TEI' appears as part of their name.




I'll admit that there are a couple of issues not covered here (such as who holds the IPR), but it's only a straw man for discussion. Feel free to fire it as necessary.



August 12, 2011