Sites, Applications, Solutions since 1995

Psyphi Blog v5

14 Mac OSX Apps I can't do without
Posted by rmp on 2008-05-16 11:49:07

I recently did a clean installation of my PPC PowerBook, which is about 3 or 4 years old. I was able to surreptitiously acquire a copy of Leopard, not yet officially supported by the systems group at Sanger. Moving from Panther this was a bit of a jump, but everything went pretty smoothly. I chose a clean installation rather than an upgrade because I had so much cruft on the laptop and only wanted to be left with the things I actually used.

Reinstalling my non-O/S applications afterwards (especially those I'd been version-marooned on, not being on the more common Tiger release) made me really appreciate the ones I actually use. In the order they appear on my taskbar (that is, in no particular order), here are the apps I can't do without:

Microsoft Remote Desktop

Great application - unimaginably useful - I now remotely administer most of my extended family's computers with this (over SSH). The ability to mount local drives remotely is a blessing, and it generally performs better than VNC, which I also use for administering older PCs without the RDP service (primarily WinXP Home).

Adium

In my opinion the best unified instant messenger client out there. It even comes with Twitter support.

Skype

Of course - everyone should be using something like this. I previously had X-Lite too but didn't tend to use it very much, most of my social network being on Skype.

Colloquy

A fantastic IRC client. I have fond memories of mIRC on Windows, and BitchX and X-Chat are the other clients I use occasionally, but these days I usually find myself in Colloquy for my IRC needs.

Firefox

Primarily for development. I still find it doesn't work brilliantly for regular surfing, but the development tools are unparalleled.

Camino

My day-to-day browser - the same engine as Firefox under the hood but better integrated with the OSX control panel and preferences.

Audacity

The best cross-platform audio editor out there. WAV and MP3 support, amongst other things. Simple and easy to use.

GIMP

Probably the closest free thing to Photoshop. Enough said.

VLC

The VideoLAN client is my preferred video player. It supports all the formats I've ever thrown at it and has SHOUTcast support, amongst other things. I also use MPlayer occasionally, as I find its subtitle support better.

humaxGui

At home I have a Humax PVR and this app provides file transfer to and from it. It's *really* slow, but it works just like an FTP client.

Aquamacs Emacs

The time I don't spend in a web browser or terminal I spend in Aquamacs. In my opinion it's the port of Emacs to OSX with the best spread of features.

Neooffice

I used to prefer OpenOffice, but it had to run under X. I switched to NeoOffice when I noticed after the reinstall that it had been ported to OpenOffice 2, which has much better foreign file support. NeoOffice runs natively (though it is Java).

SSH Keychain

It's open virtually all the time: I manage all of my remote work and administration via SSH Keychain, particularly the tunnel management. It could all be done using command-line ssh and ~/.ssh/config if pushed, but I like the auto-restart and the convenience of having it in a desktop application.

Quicksilver

Shortcuts for everything and everything via its shortcut. Quicksilver is impressive and I know I've hardly scratched the surface with the things it can do.

Looking through that list, there aren't many other applications I have which I couldn't do without. MacPorts is worth a mention, as are the MySQL GUI Tools and Processing, plus various other drivers and applications.


Massively Parallel Sequence Archive
Posted by rmp on 2008-04-30 00:02:59

For some time now at Sanger we've been looking at the problems and solutions involved in building services to support what are likely to become some of the biggest databases on the planet. The biggest problem is that there aren't many people doing this kind of thing who are willing to talk about it.

The data we're storing falls into two categories: Sequence Read Format (SRF) files containing sequence, quality and trace data (~10GB per lane), and FASTQ files containing sequence and quality only (~1GB per lane).

Our requirements for these data come down to two fundamentally different systems. One is a long-term archival system for SRF, responsibility for which will eventually shift to the EBI. The second is, for me at least, the more interesting: a short-term store for reads and qualities.

Storing reads and qualities short-term (and possibly selected alignments too) isn't the biggest problem; that honour goes to fast, parallel retrieval of the same data. The underlying data store needs to grow at a respectable 12TB per year and serve maybe a hundred simultaneous users requesting up to 1000 sequences per second.

Because individual reads are small, their transfer times are short, and as a result they are disproportionately affected by overheads like TCP connection setup, HTTP headers and certainly index seek times.
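A rough sketch of why small reads suffer, using round, hypothetical figures: about 100 bytes of sequence and quality per read and about 400 bytes of fixed overhead (headers and the like) per request. Neither number is a measurement from our systems. Batching many reads into one request amortises the fixed cost:

```perl
use strict;
use warnings;

# Fraction of the bytes transferred that are read data when $batch reads
# share a single request carrying $overhead_bytes of fixed protocol cost.
# The 100-byte read and 400-byte overhead used below are assumptions.
sub payload_fraction {
    my ($payload_bytes, $overhead_bytes, $batch) = @_;
    my $data = $payload_bytes * $batch;
    return $data / ($data + $overhead_bytes);
}

for my $batch (1, 10, 100, 1000) {
    printf "%5d reads/request: %5.1f%% payload\n",
        $batch, 100 * payload_fraction(100, 400, $batch);
}
```

With one read per request only a fifth of the bytes are data; a similar argument applies to per-request latencies such as TCP setup and index seeks.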

We're looking at a few horizontally-scaling solutions for these kinds of jobs - the most obvious being tools like MapReduce and equivalents such as Hadoop running with Nutch. My personal favourite, and the one I'm holding out for, is MogileFS from the same people who brought you Memcached. Time to get benchmarking!

Updated: Loved this via Brad

ClearPress-146
Posted by rmp on 2008-04-29 23:15:41

The latest release of ClearPress (v146) went out to CPAN yesterday. The ClearPress data model now implements belongs_to_through, belongs_to, has_many and has_many_through entity relationships for all you ActiveRecord lovers.

Two ClearPress-derived projects are using a half-decent test fixture system. It's really making a big difference to the development of both DECIPHER and NPG so I'm planning to bundle what can be bundled with an upcoming release.

History Meme
Posted by rmp on 2008-04-17 14:55:52

On the laptop:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head
135 prove
 80 svn
 71 make
 27 cover
 27 HARNESS_PERL_SWITCHES=-MDevel::Cover
 23 scripts/yaml_dumper
 22 open
 21 perl
 18 ls
 11 pwd
On the workstation:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head
  149 ls
   51 perl
   45 svn
   42 cd
   24 tail
   15 df
   14 rm
   11 less
   11 cat
    9 grep
Very satisfying to see that about half of the top 10 commands on my laptop are related to testing. Sadly the same isn't true on my workstation. yaml_dumper dumps data from a MySQL database in YAML format for use in ClearPress fixtures.
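For anyone curious how the one-liner works, here's a rough Perl equivalent of the same pipeline. It's a sketch that takes history lines as an array (the sample lines are made up) rather than reading the shell's history itself:

```perl
use strict;
use warnings;

# Perl equivalent of: history | awk '{print $2}' | sort | uniq -c | sort -rn | head
# Each history line looks like "  123  prove -lv t/": an event number, then
# the command and its arguments.
sub top_commands {
    my ($lines, $limit) = @_;
    $limit //= 10;                              # head defaults to 10 lines
    my %count;
    for my $line (@{$lines}) {
        my (undef, $cmd) = split ' ', $line;    # awk-style split, 2nd field
        next if !defined $cmd;
        $count{$cmd}++;                         # sort | uniq -c
    }
    # sort -rn, with ties broken alphabetically so output is deterministic
    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice @ranked, $limit if @ranked > $limit; # head
    return map { [ $count{$_}, $_ ] } @ranked;
}

my @sample = ('  1  prove -lv t/', '  2  svn up', '  3  prove t/00-load.t', '  4  make test');
printf "%7d %s\n", @{$_} for top_commands(\@sample);
```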

Matt prompted me. I tag Andy and Jody to run with it.

Infrared Pen MkI
Posted by rmp on 2008-04-02 00:06:58

So, this evening, not wanting to spend more time on the computer (having been on it all day for day 2 of DB's Rails course), I spent my time honing my long-unused soldering skills and constructing the first revision of my infrared marker pen for the JCL-special Wiimote Whiteboard.

The raw materials:

http://psyphi.net/gfx/ir_pen/IMG_0155.JPG

Close-up of the LEDs I'm removing:

http://psyphi.net/gfx/ir_pen/IMG_0157.JPG

The finished article:

http://psyphi.net/gfx/ir_pen/IMG_0159.JPG

Close-up of the switch detail:

http://psyphi.net/gfx/ir_pen/IMG_0160.JPG

Activated under the IR-sensitive digital camera:

http://psyphi.net/gfx/ir_pen/IMG_0161.JPG

I must say it's turned out OK. I didn't have any spare small switches, so I went for a bit of wire with enough springiness in it. On the opposite side of the makeshift switch is a retaining screw for holding the batteries in. I'm using two old AAA batteries (actually running at about 2.4V according to the meter) and no resistor in series. The LED hasn't burnt out yet!
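For what it's worth, the usual series-resistor sum is R = (V_supply - V_forward) / I. A small sketch, assuming a typical IR LED forward voltage of about 1.3V and a 50mA target current (both assumptions; the LED's own datasheet is the real authority):

```perl
use strict;
use warnings;

# Series resistor for an LED: R = (V_supply - V_forward) / I
sub series_resistor {
    my ($v_supply, $v_forward, $current_amps) = @_;
    die "supply voltage must exceed the LED forward voltage\n"
        if $v_supply <= $v_forward;
    return ($v_supply - $v_forward) / $current_amps;
}

# 2.4V is the measured figure for the two tired AAA cells; the ~1.3V forward
# drop and 50mA target current are assumed values. Check the datasheet.
my $current = 0.050;
my $ohms    = series_resistor(2.4, 1.3, $current);
printf "%.0f ohms, dissipating %.0f mW\n", $ohms, 1000 * $current**2 * $ohms;
```

That comes out at roughly 22 ohms. With tired cells, their internal resistance may well be doing some of the current-limiting instead, which could be part of why the LED has survived; fresh cells would push noticeably more current through it.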

To stop the pen switching on when not in use, I slip a bit of electrical tape between the contacts. Obviously you can't tell when it's on unless you add a second, perhaps miniature, visible-light indicator LED.

It all fits together quite nicely, though the retaining screw sits too close to the batteries and has forced the back end out a bit - that's easy to fix.

As I'm of course after multitouch I'll be building the MkII pen soon with the other recovered LED!


Web frameworking
Posted by rmp on 2008-03-31 23:47:19

It seems to be the wrong time to be reading such things, but over on InfoQ there's a nice article introducing web development of RESTful services using Erlang and the Yaws high-performance web server.

I say "the wrong time" as this week has kicked off the "Advancing with Rails" course by David A. Black of Ruby Power and Light fame. The course is fairly advanced in terms of required Rails knowledge, so it's a bit of a baptism by fire for me and a few others who have never written any Ruby before.

Rails is proving moderately easy to pick up, but as I've remarked to a couple of people, coding with Rails doesn't seem any easier than coding with Perl. Perhaps that's because I've never done it before, but I reckon it's a lot harder spending my time figuring out how the heck DHH meant something to be done than it is doing it myself.

Even though it's nowhere near as mature, I do reckon my ClearPress framework has a lot going for it - it's pretty feature-complete in terms of ORM, views and templating (TT2). It has similar convention-over-configuration features, meaning it's not designed for plugging in alternative layers, but doing so is absolutely possible (and, I suspect, takes less effort than it does in Rails). I still need to iron out some wrinkles in the autogenerated code from the application builder and provide some default authorisation and authentication mechanisms, some of which may come in the next release. In the meantime it's easy to add these features, which is exactly what we've done for the new sequencing run tracking app, NPG, to tie it to the WTSI website single sign-on (MySQL and LDAP under the hood).


All Leoparded Up
Posted by rmp on 2008-03-28 23:54:19

Hurray! I managed to snag an OSX 10.5 installation disk today and took the opportunity to upgrade my 10.3.9 PPC PowerBook, skipping 10.4 Tiger completely. Apparently this was a bit of a feat to perform at Sanger, where 10.5 has hitherto been unsupported.

In the "Rails Club" meeting in preparation for Dave Black's Rails course next week it was pretty obvious that Leopard will spread quickly now those other unspeakable Mac users know it's out in the wild.

So the installation took about an hour and a half of wallclock time, or about 30 minutes in Microsoft time - too many bars showing "1 minute remaining" for ten minutes. It all went pretty smoothly, though I did opt for a full install rather than an upgrade. Unfortunately I've had to spend the best part of the last 6 hours installing DarwinPorts, gem updating to Rails 2 and reinstalling the long-awaited 10.5 versions of all my favourite apps - AquaMacs, Adium, CotVNC, VLC, Camino, Skype, Colloquy, Firefox and a few others. Plus, of course, setting it all up just the way I like it.

Initial impressions are that it's rather shiny and pleasant to use - I like Spaces & Dashboard (don't forget that wasn't in my old 10.3) and overall the setup definitely seems faster, surprisingly noticeably so when compiling and installing things from CPAN. Can't wait to try out Time Machine over the weekend!


interactivity experiments
Posted by rmp on 2008-03-26 23:08:47

For a few months now I've been watching some utterly compelling and inspirational HCI demonstrations. I know most of them are a bit dated now, in fact from as far back as 2006, but they're still jaw-droppingly awesome.

So, in a fit of inspiration, weekend-project madness and frustration at the clumsiness of a regular touch-screen LCD, I've been picking up things from eBay and fishing around in my boxes of knackered electronics to find components suitable for assembling one or two of these sorts of devices.

There are two types of these interactive interfaces. The first is the JCL-style Wiimote-based kind, which uses bright sources of infrared (either transmitted or reflected) tracked by the Bluetooth Nintendo controller. The second is the Jeff Han / Perceptive Pixel style of frustrated total internal reflection (FTIR), where infrared light trapped inside a planar surface escapes wherever it is touched and is picked up by a camera similar to the one in the Wiimote.

Anyway, costs so far:

Wiimote: ~£28; old infrared remote control for filters & LEDs: free;

Philips bSure XG2 projector: ~£180; Philips SPC900NC: ~£30; 4.3mm CCTV lens (no IR filter): ~$12

I've been having trouble making the Bluetooth pairing for the Wiimote work correctly under OSX 10.3.9 - I think it's about time I had the laptop upgraded; it's work's, after all. That should fix it for OSX, but in the meantime I have had some success: this evening, under Ubuntu with the BlueZ stack and libwiimote, I've been able to capture events from the Wiimote, including spots seen by its IR camera. I've also successfully used camstream with the SPC900NC and CCTV lens to capture spots from working TV remotes, both directly and reflected from a wall - it's surprisingly effective!

More to come - next, for the Wiimote interface, I need to build my whiteboard-marker, battery-driven IR LED pen. For the FTIR display I need to experiment with a few different types of perspex and rear-reflection material. I *really* want to be able to perform pattern recognition similar to the reactable, and I don't think tracing paper will work for rear-projection. Knowing next to nothing about plastics technology, I think I'd like to try frosted acrylic first, or maybe just finely-sanded regular acrylic. eBay, here I come again!




Development Communications
Posted by rmp on 2008-03-03 23:46:43

For a while now, more or less since I switched teams (from Core Web to Sequencing Informatics), I've wanted to write more about the work we do at Sanger. So much of it is absolutely cutting-edge research, and a very large proportion of that is poorly communicated both inside and outside the institute. Most of it's biology of course, which I know little about and couldn't discuss in detail, GCSE being the furthest I took things in that direction.

However, some of the great advances have been in big IT: in terms of data volumes we're in the same ballpark as CERN's high-energy physics and NASA's astronomy. Technology is something I understand and /can/ talk about here.

So... I run the new sequencing technology pipeline development team. This means my team and I are responsible for ensuring efficient use of the Sanger's heavy investment in massively parallel sequencing instruments, primarily 28 Illumina Genome Analyzers. To do this we have a farm of 608 cores, a mix of 4- and 8-core Opteron blades with 8GB RAM, and a 320TB shared Lustre filesystem. It seems to be becoming easy for users and administrators at Sanger to toss these figures around, but the truth of the matter is that whilst this kit fits in only a handful of racks, it's still a pretty big deal.

The blades run Linux, Debian Etch to be precise. The Illumina-distributed analysis pipeline (itself a mix of Perl, Python and C++) is held together with Perl applications (web and batch), which also cooperate RESTfully with a series of Rails LIMS applications developed by the Production Software team.

Roughly a terabyte of image data is spun off each of the 28 instruments every 2-3 days. The images are stacked and aligned, and sequences are basecalled from spot intensities. These short reads are then packaged up with quality values for each base and dropped into compressed result files of approximately 100MB, ready for further secondary analysis (e.g. SNP-calling).
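To put those figures together, a back-of-the-envelope sketch using only the numbers above (28 instruments, about 1TB each every 2-3 days, a 320TB filesystem). The fill-time line is purely illustrative and rests on assumptions that almost certainly don't hold in practice:

```perl
use strict;
use warnings;

# Aggregate raw image volume per day across all instruments.
sub tb_per_day {
    my ($instruments, $tb_per_run, $days_per_run) = @_;
    return $instruments * $tb_per_run / $days_per_run;
}

my $low  = tb_per_day(28, 1, 3);   # one run every 3 days
my $high = tb_per_day(28, 1, 2);   # one run every 2 days
printf "%.1f-%.1f TB of images per day\n", $low, $high;

# Illustrative only: assumes every image is kept and nothing else lives on
# the 320TB filesystem, neither of which is likely to be true.
printf "%.0f-%.0f days to fill 320TB\n", 320 / $high, 320 / $low;
```

That works out at roughly 9-14TB of images a day, which is why the pipeline has to reduce them to compact result files promptly.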

More to come later but for now the take-home message is that the setup we're using is in my opinion a fair triumph, and definitely one to be proud of. It's been a (fairly) harmonious marriage of tremendous hardware savvy from the systems group and the rapid turnaround of agile software development from Sequencing Informatics, of which I'm pleased to be a part.

ClearPress-99
Posted by rmp on 2008-03-03 22:10:15

Last week saw the latest release of ClearPress (http://search.cpan.org/~rpettett/ClearPress/). ClearPress is a basic, RESTful, MVC Perl application framework I've developed in tandem with my work at the Sanger Institute (http://www.sanger.ac.uk/).

The original aim of ClearPress was to provide a RESTful MVC framework which integrated with the Sanger's website single sign-on. Having proved its usefulness with the first release of the tracking system I developed, ClearPress was spun off into a project of its own, with its dependencies on the Sanger-specific environment abstracted out.

ClearPress sports a MySQL-backed ORM, automatic, extensible content-negotiation and easily-templated HTML, XML, Atom, RSS, JSON, iCal, YAML, PNG and other format views. It can run standalone, as CGI or under ModPerl::Registry.
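To give a flavour of what content negotiation means here, a minimal sketch in plain Perl. This is not ClearPress's API: negotiate() and the format table are hypothetical, and picking the view from a URL extension first and the Accept header second is just one reasonable policy.

```perl
use strict;
use warnings;

# Hypothetical illustration only; this is not ClearPress's API.
my %FORMAT_FOR_TYPE = (
    'text/html'            => 'html',
    'text/xml'             => 'xml',
    'application/json'     => 'json',
    'application/atom+xml' => 'atom',
    'application/rss+xml'  => 'rss',
    'image/png'            => 'png',
);
my %IS_FORMAT = map { $_ => 1 } values %FORMAT_FOR_TYPE;

# Return (resource path, view format) for a request.
sub negotiate {
    my ($path, $accept) = @_;

    # 1. An explicit extension wins: /run/231.json -> ('/run/231', 'json')
    if ($path =~ m{\A(.*)[.](\w+)\z}xms && $IS_FORMAT{$2}) {
        return ($1, $2);
    }

    # 2. Otherwise use the first recognised type in the Accept header.
    #    For brevity this ignores q-values rather than ranking by them.
    for my $type (split /\s*,\s*/xms, $accept // q{}) {
        $type =~ s/;.*\z//xms;    # strip parameters such as q=0.9
        return ($path, $FORMAT_FOR_TYPE{$type}) if $FORMAT_FOR_TYPE{$type};
    }

    # 3. Fall back to HTML.
    return ($path, 'html');
}

my ($resource, $format) = negotiate('/run/231', 'application/json, text/html;q=0.8');
print "$resource as $format\n";
```

Keeping the type-to-format table in one place is what would make such a scheme extensible: adding a view is a matter of adding a row.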

I'm using ClearPress in most of my projects these days, both work and non-work: blogs, document management, laboratory tracking and various other standalone apps. Hopefully soon there'll even be a dedicated site with examples. For now you can check out the application builder and example distributed with the package.

The Importance of Profiling
Posted by rmp on 2008-02-10 21:23:21

I've worked as a software developer, and with teams of software developers, for around 10 years now. Many of those I've worked with have earned my trust and respect in relation to development and testing techniques. Frustratingly, however, I still regularly hear throw-away comments born of uncertainty and ignorance.

A couple of times now I've specifically been told that "GD makes my code go slow". For those of you not in the know, GD (specifically Lincoln Stein's GD.pm in Perl) is a wrapper around Tom Boutell's most marvellous libgd graphics library. The combination of the two has always performed excellently for me and has never been the bottleneck in any of my applications. The applications in question are usually database-backed web applications with graphics components for plotting genomic features or charts of one sort or another.

As any database-application developer will tell you, the database, or network connection to the database is almost always the bottleneck in an application or service. Great efforts are made to ensure database services scale well and perform as efficiently as possible, but even after these improvements are made they usually simply delay the inevitable.

Hence my frustration when I hear that "GD is making my (database) application go slow". How? Where? Why? Where's the proof? It's no use blaming something, a library in this case, that's out of your control. It's hard to believe a claim like that without some sort of measurement.

So... before pointing the finger, profile the code and make an effort to understand what the profiler is telling you. In database applications, profile your queries: use EXPLAIN, add indices, record SQL transcripts and time the results. Then profile the code which is manipulating those results.
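Even without a full profiler, a few lines of core Perl (Time::HiRes) are enough to see where the wall-clock time goes. A minimal sketch; the timed() and report() helpers are hypothetical, and the sleeps merely stand in for a real database query and GD rendering:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval sleep);

my %elapsed;   # label => total wall-clock seconds spent in that section

# Run $code, add its wall-clock time to the running total for $label and
# hand back whatever it returned.
sub timed {
    my ($label, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    $elapsed{$label} += tv_interval($t0);
    return wantarray ? @result : $result[0];
}

# Print the sections, most expensive first.
sub report {
    for my $label (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
        printf "%-8s %.3fs\n", $label, $elapsed{$label};
    }
    return;
}

# Stand-ins for real work: in practice these would wrap the DBI query and
# the GD drawing code respectively.
timed(query  => sub { sleep 0.05 });
timed(render => sub { sleep 0.01 });
report();
```

It only measures what you wrap, so it complements rather than replaces a proper profiler, but it's usually enough to settle a "GD is slow" argument.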

Once the results are in, concentrate in the first instance on the parts with the most impact, the low-hanging fruit: shaving 0.1 seconds off each iteration of a 1000-iteration loop saves 100 seconds, whereas shaving 1 second off /int main/ saves just 1. Good programmers should be relatively lazy, and speeding up code with the least amount of effort should be common sense.

Great pieces of code
Posted by rmp on 2008-02-03 15:25:02

A lot of what I do day-to-day is related to optimisation. Be it Perl code, SQL queries, JavaScript or HTML, I usually find at least a couple of cracking examples every week. On Friday I came across this:

SELECT cycle FROM goldcrest WHERE id_run = ?


This query is being used to find the number of the latest cycle (between 1 and 37 for each id_run) in a near-real-time tracking system, and is run several times whenever a run report is viewed.

EXPLAIN SELECT cycle FROM goldcrest WHERE id_run = 231;
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref   | rows   | Extra       |
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+
|  1 | SIMPLE      | goldcrest | ref  | g_idrun       | g_idrun |       8 | const | 262792 | Using where | 
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+


In itself this would be fine, but the goldcrest table in this instance contains hundreds of thousands of rows for each id_run. So for id_run 231, say, this query happens to return approximately 588,000 rows just to determine that the latest cycle for run 231 is number 34.

To clean this up we first try something like this:

SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = ?


which still scans the 588,000 rows (keyed on id_run, incidentally) but doesn't actually return them to the user, only one row containing both values we're interested in. Fair enough, the CPU and disk access penalties are similar, but the data transfer penalty is significantly reduced.

Next I try adding an index against the id_run and cycle columns:

ALTER TABLE goldcrest ADD INDEX(id_run,cycle);
Query OK, 37589514 rows affected (23 min 6.17 sec)
Records: 37589514  Duplicates: 0  Warnings: 0


Now this of course takes a long time and, because the tuples are fairly redundant, creates a relatively inefficient index, also penalising future INSERTs. However, casually ignoring those facts, our query performance is now radically different:

EXPLAIN SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL |    NULL | NULL | NULL | Select tables optimized away | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;
+------------+------------+
| MIN(cycle) | MAX(cycle) |
+------------+------------+
|          1 |         37 | 
+------------+------------+
1 row in set (0.01 sec)


That looks a lot better to me now!
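A toy model of why the composite index helps, assuming entries are kept sorted by (id_run, cycle). MySQL's B-tree internals are far more involved; this only illustrates that MIN and MAX for one id_run become two binary searches instead of a scan over hundreds of thousands of rows:

```perl
use strict;
use warnings;

# Toy model of a composite (id_run, cycle) index: a list of [id_run, cycle]
# pairs kept sorted by id_run, then cycle. Not how MySQL stores it.

# Binary search: first position whose id_run is >= $id_run.
sub lower_bound {
    my ($index, $id_run) = @_;
    my ($lo, $hi) = (0, scalar @{$index});
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($index->[$mid][0] < $id_run) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# MIN(cycle), MAX(cycle) for one run: the first and last entries in its range.
sub min_max_cycle {
    my ($index, $id_run) = @_;
    my $first = lower_bound($index, $id_run);
    my $past  = lower_bound($index, $id_run + 1);   # assumes integer id_runs
    return if $first == $past;                      # no rows for this run
    return ($index->[$first][1], $index->[$past - 1][1]);
}

# Made-up data: runs 230-232, each with cycles 1..37.
my @index = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
            map { my $run = $_; map { [ $run, $_ ] } 1 .. 37 } 230 .. 232;
my ($min, $max) = min_max_cycle(\@index, 231);
print "MIN(cycle)=$min MAX(cycle)=$max\n";
```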

Generally I try to steer clear of the mysterious internal workings of database engines, but I come across code like this far more frequently:

sub clone_type {
  my ($self, $clone_type, $clone) = @_;
  my %clone_type;
  if($clone and $clone_type) {
    $clone_type{$clone} = $clone_type;
    return $clone_type{$clone};
  }
  return;
}


Thankfully this one's pretty quick to figure out - they're usually *much* more convoluted - but still... Huh??

Pass in a clone_type scalar, create a local hash with the same name (Argh!), store the clone_type scalar in the hash under the key $clone, then return the same value we just stored.

I don't get it... maybe a global hash or something else would make sense, but this works out the same:

sub clone_type {
  my ($self, $clone_type, $clone) = @_;
  if($clone and $clone_type) {
    return $clone_type;
  }
  return;
}
and I'm still not sure why you'd want to do that if you have the values on the way in already.
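If the intent was to remember clone types between calls, the hash needs to outlive the subroutine. A hypothetical sketch that stores it in the object; the CloneRegistry class is made up, and the arguments are reordered to ($clone, $clone_type) so the same method can also act as a getter:

```perl
package CloneRegistry;   # hypothetical class, for illustration only

use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { clone_type => {} }, $class;
}

# With a type, record it for $clone; either way, return the stored type.
# The hash lives in the object, so values persist between calls.
sub clone_type {
    my ($self, $clone, $clone_type) = @_;
    return if !$clone;
    if (defined $clone_type) {
        $self->{clone_type}{$clone} = $clone_type;
    }
    return $self->{clone_type}{$clone};
}

package main;

my $registry = CloneRegistry->new;
$registry->clone_type('clone_1', 'BAC');                 # store
printf "%s\n", $registry->clone_type('clone_1');         # look up later: BAC
```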

Programmers really need to think around the problem, not just through it. Thinking through may result in functionality, but thinking around results in both function and performance, which means a whole lot more in my book and is, incidentally, why it seems so hard to hire good programmers.

OpenWRT WDS bridging
Posted by rmp on 2007-12-18 22:33:25

I've had a pile of kit to configure recently for an office I've been setting up. Amongst the units I specified was a Linksys WRT54GL, the second I've had the opportunity to play with.

My own one runs White Russian, but I took the plunge and went with the latest Kamikaze 7.09 release. It's a little different to what I'd fiddled with before, but configuring with files rather than nvram variables is probably more intuitive. I'm briefly going to describe how to configure the WRT's wired switch, bridged to the wireless network using WDS to the main site router (which serves DHCP and DNS).

From a freshly unpacked WRT54GL, connect the ethernet WAN uplink to your internet connection and one of the LAN downlinks to a usable computer. By default the WRT obtains its WAN address by DHCP and serves DHCP on the 192.168.1.x subnet to its LAN.

Download http://downloads.openwrt.org/kamikaze/7.09/brcm-2.4/openwrt-wrt54g-2.4-squashfs.bin to the computer, then log in to the WRT on 192.168.1.1 (default account admin/admin). Upload the image using the firmware upgrade form, then wait for the upload to finish and the router to reboot.

Once it's rebooted you may need to refresh the DHCP lease on the computer, though the default subnet range is the same iirc. Telnet to the router on the same address and log in as root with no password. Once you set a root password (with passwd), the SSH service is enabled and the telnet service disabled.

I personally prefer using the X-Wrt interface with the Zephyr theme, so I install X-Wrt by editing /etc/ipkg.conf and appending "src X-Wrt http://downloads.x-wrt.org/xwrt/kamikaze/7.09/brcm-2.4/packages". Back in the shell, run "ipkg update ; ipkg install webif". Once that completes you should be able to browse to the router's address (hopefully still 192.168.1.1) and continue the configuration. You may wish to install matrixtunnel for SSL support in the web administration interface.

I want to use this WRT both to extend the coverage of my client's office wireless network and to connect a handful of wired devices (1 PC, 1 Edgestore NAS and an NSLU2).

So step one is to assign the router a LAN address on my existing network. The WAN port is going to be ignored (although bridging that in as well is probably possible too). In X-Wrt under 'Networks' I set a static IP of 192.168.1.253, a netmask of 255.255.255.0 and a router of 192.168.1.254 - the existing 'main router', a BT Home Hub serving the LAN, whose wireless we'll be bridging to. The LAN connection type is 'bridged'. DNS in this case is the same as the main router. I've left the WAN as DHCP for convenience, though the plan is not to use it. Save the settings and apply.

Under 'Wireless' turn the radio on and set the channel to the same as the main router. Choose 'lan' to bridge the wireless network to, set mode to 'Access Point', WDS on, 'broadcast ESSID' to your personal preference (I set 'on') and AP isolation off. The ESSID itself needs to be the existing name for your network and encryption set appropriately to match. Save and apply.

Now the magic bit. I'm told this should go in the BSSID box, which only seems to be present when the mode is set to WDS. The WRT needs to know which existing AP to bridge to. Under the hood this is done with the command 'wlc wds main-ap-mac-address-here', and without an appropriate text box to put it in, it's almost always possible to add it to a startup file instead. It's a hack for sure, but it seems to work OK for me!

Lo! A WDS bridge.

Update 2008-01-07: After installing the bridge on-site I had to reconfigure it in "Client" mode using the regular WDS settings, as that seemed to be the only way to make it communicate with the Home Hub. Pity - that way it doesn't extend the wireless range, it just hooks up anything wired to it. It had worked fine when I set it up talking to my own WRT.

What Can Bioinformaticians Learn from YouTube?
Posted by rmp on 2007-11-06 22:46:13

Caught Matt's talk this morning at the weekly informatics group meeting.

There were general murmurings of agreement amongst the audience, but nobody asked the probing questions I'd hope for as a measure of interest.

Matt touched upon microformats in all but name - I was really expecting a pitch for http://bioformats.org/, websites as APIs and RESTful web services in particular.

Whilst I'm inclined to agree that standardised, discoverable, reusable web services are largely the way forward (especially as it keeps me in work) I'm not wholly convinced they remove the problems associated with, for example, database connections, database-engine specific SQL, hostnames, ports, accounts etc.

My feeling is that all the problems associated with keeping track of your database credentials are replaced by a different set of problems, albeit more standardised in terms of network protocols in HTTP and REST/CRUD. We now run the risk that what's fixed in terms of network protocols is pushed higher up the stack and manifests as myriad web services, all different. All these new websites and services use different XML structures and different URL schemes. The XML structures are analogous to database table schema and the URL schemes akin to table or object names.

At least these entities are now discoverable by the end user/developer simply by using the web application - and there's the big win - transparency and discoverability. There's also the whole microformat affair - once these really start to take off there'll be all sorts of arguments about what goes into them, especially in domains like Bio and Chem, not covered by core formats like hCard. But that's something for another day.

More over at Green Is Good.

7 utilities for improving application quality in Perl
Posted by rmp on 2007-10-08 23:10:16

I'd like to share with you a list of what are probably my top utilities for improving code quality (style, documentation, testing), with a largely Perl flavour. In loosely important-but-dull to exciting-and-weird order...

Test::More. Billed as "yet another framework for writing test scripts", Test::More extends Test::Simple and provides a bunch of more useful functions beyond Simple's ok(). The ones I use most are use_ok() for testing compilation, is() for testing equality and like() for matching against regexes.

ExtUtils::MakeMaker. Another one of Mike Schwern's babies, MakeMaker is used to set up a folder structure and associated 'make' paraphernalia when first embarking on writing a module or application. Although developers these days tend to favour Module::Build over MakeMaker, I prefer MakeMaker for some reason (probably fear of change) and still get regular mileage out of it.

Test::Pod::Coverage - what a great module! Check how good your documentation coverage is with respect to the code. No, just a subroutine header won't do! I tend to use Test::Pod::Coverage as part of...

Test::Distribution. Automatically runs a battery of standard tests, including POD coverage, manifest integrity, straight compilation and a load of other important things.

perlcritic, Test::Perl::Critic. The Perl::Critic set of tools is amazing. It's built on PPI and implements the Perl Best Practices book by Damian Conway. Now I realise that not everyone agrees with a lot of what Damian says, but the point is that it represents a standard to work to (and it's not that bad once you're used to it). Since I discovered perlcritic I've been developing all my code as close to perlcritic -1 (the strictest, 'brutal', setting) as I can. It's almost instantly made my applications more readable through systematic appearance and made faults easier to spot, even before Test::Perl::Critic comes in.

Devel::Cover. I'm almost ashamed to say I only discovered this last week after dipping into Ian Langworth and chromatic's book 'Perl Testing'. Devel::Cover gives code coverage metrics, i.e. how much of your module or application was actually executed by your tests. It collates stats from all modules matching a user-specified pattern and dumps them out in a natty coloured table, very suitable for tying into your CI system.

Selenium. OK, not strictly speaking a tool I'm using right this minute, but it's next on my list of integration tools. Selenium is a non-interactive, automated browser-testing framework written in JavaScript. This tool definitely has legs and it seems to have come a long way since I first found it in the middle of 2006. I'm hoping to have automated interface testing up and running before the end of the year as part of the Perl CI system I'm planning to put together for the new sequencing pipeline.
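Devel::Cover is Perl-specific. However, the idea it measures is easy to show in a few lines: which statements did a test run actually execute? The sketch below is a toy statement-coverage tracer in Python, built on the standard `sys.settrace` hook. It is not how Devel::Cover works. It only watches a single function rather than every module matching a pattern. The names `measure_line_coverage`, `executable_lines` and `classify` are made up for illustration.

```python
import dis
import sys


def executable_lines(func):
    """Line numbers in func's body that have bytecode attached."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)  # ignore the 'def' line itself
    return lines


def measure_line_coverage(func, *args, **kwargs):
    """Call func(*args, **kwargs) and record which of its lines ran.

    Only frames running func's own code object are traced, so lines in
    functions it calls are not counted.
    """
    executed = set()
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # don't trace lines in any other function
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, executed


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


result, ran = measure_line_coverage(classify, 5)
missed = sorted(executable_lines(classify) - ran)
print(f"result={result!r}, missed lines={missed}")
```

The missed line is the `return "negative"` branch, exactly the kind of gap a coverage report is there to highlight. Devel::Cover goes further, reporting branch, condition and subroutine coverage as well as statements.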

Leisa Reichelt (disambiguity.com) - Ambient Intimacy Posted by rmp on 2007-10-05 13:18:39 Leisa presents an enjoyable voyage through cognitive psychology and the social network scene. Makes me wish I'd taken more of the psych options as part of my computer science degree.

ref: http://graphpaper.com/

Personal information bandwidth & learning speed have increased. New, lightweight yet extremely powerful means of communication represent ambient intimacy, a personal social platform. This isn't one-to-one messaging or one-to-the-masses broadcasting; it's pushing messages into a defined area (multicast, if you will). It represents the creation of a techno-social system beyond personal interaction: a more continuous interpersonal awareness.

In his book "Grooming, Gossip and the Evolution of Language", Robin Dunbar describes how better social understanding leads to the evolutionary growth of brains, improvement of language and better flexibility when competing for shared resources (food, sex etc.).

This intercommunication is largely a phatic expressiveness for virtual spaces. In linguistics a phatic expression is one whose only function is to perform a social task.

The phrase "continual partial friendship" coined by David Weinberger describes the almost permanent interconnectedness and friendship users feel when part of a collective virtual community built on these sorts of communication media.

"It's not about being poked and prodded, it's about exposing more surface area for others to connect with" - Johnnie Moore

New media (mobile 'phones, the internet) overcome geographical dislocation.

But it's often a love/hate thing (ref: http://twitter.com/), and it can also cause cognitive dissonance through false human interaction. When interacting virtually, the subconscious is deprived of its usual cues (facial expressions, tone of voice, body language), resulting in unnatural stress.

The other associated problem is information overload: "infomania dents IQ more than marijuana"

- anticipated reciprocity

- reputation

- sense of efficacy

- identification with a group

ref: Tom Coates' presentation on social software

It has been noted that a social network's pooled knowledge makes the whole network grow smarter. I'd personally take this further and suggest that any open data, social or otherwise but particularly in scientific contexts, makes the network grow smarter. ref: PLoS

As developers we need to support ambient intimacy. Applications need to be sympathetic to the fact that we as people are easily distracted. They need to be undemanding but intrusive enough to increase awareness of events.

- keep it lightweight

- stay out of the way

- open your API

- portable social networks

- use the periphery - antithesis of classical interface development/design

- allow for time-shifting

ref: Twitterrific


Erika Hall, (Mule Design) - Copy is Interface Posted by rmp on 2007-10-05 00:24:53 Erika outlines some dos and don'ts for those of us writing copy and building interfaces.

Gesture driven interfaces are coming but not for rich/dense data. People will want to access your application in new ways, so what does this mean for applications? Are you beginning to take device independence into account?

Pretty much everything is/has a text-based interface. We as users need to draw meaning from a stream of data.

How do users benefit? Clarity & understanding often develops from immediately interacting with the data.

How do developers benefit? User adoption & success

5 ways to get words right

- be authentic - a strong sense of service focus. Add the human touch

- be engaging - ref: http://schoolofeverything.com/ immediate clarity of offering. Involves elements of empathy with users.

- be specific - disambiguation of meaning ref: http://etsy.com/

- be appropriate - understand what the role of your application is in your users' lives. Use copy, tone & concepts to build rapport.

- be polite - as long as you're considerate and respectful of what users are coming to do, users can be very forgiving ref: http://feedburner.com/ ref: http://subtraction.com/ - social engineering and implicit standards through copy

8 kinds of bad

- don't be vague

- don't use unnatural language (e.g. banks wanting to 'expand your relationship')

- don't be passive (e.g. third-person)

- don't be too clever/cute

- don't be rude

- don't be oblivious to your surroundings - you don't know how people are going to be accessing your app

- don't be inconsistent, e.g. my vs. your

- don't be presumptuous

Take home:

You will still need designers.

"You are sociable and entertaining"

Eric Rodenbeck, (Stamen Design) - Next Generation Visualisations Posted by rmp on 2007-10-05 00:18:55 Eric takes us through a series of heavily visual applications developed by Stamen in the last few years. All aim to map detailed data spaces, that's to say structures which are too complicated for lists.

Eric and by extension Stamen see data visualisation as a medium. The data is mostly live, but when it's not it's either vast or deep.

Example: http://cabspotting.org/ showing something that's live contrasted with something that's historical. Cabspotting's animation of circulatory systems really set me thinking about how this sort of visualisation could be applied to biotech, PP interactions, gene ontologies, citations and real biophysical systems.

Example: Oakland crime: notice and explore interpretation of patterns. Built with the Modest Maps framework ref: http://modestmaps.com/

Example: Digg labs: Swarm, Stack, Digg Spy, Ark. "Ambient engagement".

Example: Twitter blocks: 3-dimensional message space

Example: Real Estate flow: Trelia (sp?)

Be open to the process of exploration - start with the data, not preconceived views of what it should look like.

Stamen Design

I loved the question from AbilityNet about whether Eric had thought about accessibility for these apps. It had him completely stumped, and I have to admit that adding serious accessibility to these apps, whilst being extremely cool, would also be extremely difficult.

I'd have liked to see something more immersive and with a biotech or medical twist. I wish I'd had the chance to hook up with Eric to discuss efforts in this sort of sphere.

John Aizen & Eran Shir (Dapper) - Practical Semantic Web (web plumbing 101) Posted by rmp on 2007-10-05 00:08:22 Tim Berners-Lee (TBL) once said "in the future everyone will write semantically correct websites", but the vision of a world of personalised agents has not come about. Largely, things failed to take off because making things semantically correct is expensive: it requires effort. Luckily, current web apps are changing this with APIs, content distribution & aggregation and meaningful search.

How has this come about?

- The Feed

- Light, easily adopted technologies: e.g. REST vs. SOAP; AJAX vs. Server-side; Microformats vs. RDF+OWL

- Increasing openness, encouraging mashups via APIs and low-effort semantics

Introducing Dapper: Creating APIs for other websites, mostly community-generated. The users have the time and incentive. Dapper extensions can then be reused as services on top of other platforms, e.g. Pipes, Google gadgets & Facebook.

Example: Semantically linked advertising, e.g. a loaded shopping cart built on a recipe page.

Example: Meaningful search w/ results dissected enabling search, drill-down and filtering by automatically indexed categories

Dapper attempts to address the serious issues of fragility commonly associated with classical screen-scraping using elements of graph theory and community power.

Dapper also incorporates the gamut of CC licensing to better enable site-owners to control their content whilst boosting consumer confidence in reliability of data.

Simon Wardley - Short on Cycles, Long on Storage Posted by rmp on 2007-10-04 23:48:02 I had the pleasure of coming across Simon's visually engaging and entertaining presentation about commoditisation of IT recently on the web and was looking forward to his physical re-enactment of 300 slides in 30 minutes. Now I'm no economist and have no solid grounding in business management (however much I'd like to) so I'll accidentally gloss over points of variable importance which I didn't capture in time (Simon was *fast*).

Historically in software there have been no real utility services, but this is changing.

Commoditisation can be defined as the conversion of 'new' to 'common', or yesterday's hot stuff becoming today's boredom. The current trends in the commoditisation of IT services can easily be compared historically with for example the electricity or steel industries.

'New' has equivalence to competitive advantage whereas 'common' is equivalent to the cost of doing business. There is a constant pressure largely from the consumer towards commoditisation to drive down costs. There are weaker commercial pressures in the opposite direction in terms of protecting investment through patents and similar restrictions but the overall movement is towards utility.

This process has historically been termed 'creative destruction' (attribution).

These days there is no competitive advantage in having your own web infrastructure, leading to X as a Service, or XaaS. A stack of choice is presented: Software aaS, Framework aaS, Hardware aaS. Each presents its own economies of scale but what happens when things stop working for whatever reason (e.g. hardware host goes bust)?

Ultimately to avoid risk and exit costs, applications need to be portable between providers, ideally a federated service grid with transparent migration following price/performance patterns. This portability and choice can only be achieved via the adoption of open standards at each and every level of the stack.

Ref: Open Virtual Machine Format, Xen

In order to better understand commoditisation the distinction should be drawn with commodification, or the process of taking invention or discovery to idea to innovation (or the 'new').

The overall commodification+commoditisation process is not a straight line. It doesn't just apply to things and not everything spreads in the same way. Concepts following this process are able to accelerate following the S-curve for diffusion of innovation via various means: for example positive network effects (the more useful, the more adopters) and standards for communication (xml, internet, open source) removing barriers to adoption. Customers can be both friend and foe. At the start of the curve the costs of doing business act as a barrier to market entry. As these costs reduce, customers become competitors.

Simon's take home messages:

1. XaaS will rapidly grow through adoption of open standards removing barriers to competition

2. There will be an increase in the rate of innovation and services released to the web

3. More disruption in the information markets

Dave Morin (Facebook) - The story behind the Facebook platform Posted by rmp on 2007-10-04 11:57:50 Dave's engaging and talking fast, so more commentary and less transcription this time. Facebook is an impressive example of social networking success. Apparently incredible growth, doubling every 6 months, with nearly 50% of users returning every day.

Facebook is about enabling mapping of the social graph. The power of the social graph enables near viral transfer of functionality and features. Facebook photos has nearly twice the traffic of all the other photo sites combined.

The Facebook platform aims to allow developers to create applications which leverage and add context to the social graph via deep integration with the platform. Like so many other apps presenting here, it all hinges on open APIs and transparent, open data access.

The Facebook opportunity chain looks something like this: Innovation, growth, engagement, monetisation.

Leah Culver (Pownce) - Practical lessons we learned Posted by rmp on 2007-10-04 11:47:08 Leah gives an engaging talk about topics with which we as developers should all be familiar. What I've found mildly surprising about this and many of the related talks I've seen here at FoWA so far is how little is actually required to get started in terms of both infrastructure and investment.

Pownce is a social messaging application, developed in 4 months. Invite-only launch in June.

Think about technology choices - we could pick anything. Social as well as technological reasons factored into our decisions, but we took risks.

Why Django? Python web framework. Documentation and readability; auto-generated admin

Why S3? Less maintenance for Pownce; inexpensive.

Why AIR? Works on PC & Mac; easy to develop; encourages good UI.

Do a lot with a little: One website developer; self-funded; short deadlines

Small teams: multiple roles; learn quickly; dedicated, have to enjoy what you're working on

Open source tools: Someone has solved this problem before (and they're probably smarter than me). Lots of tools available.

Use your resources: documentation websites; IRC; network and learn from friends; exchange ideas with other sites

Be kind to your database: main bottleneck; one MySQL database; responding quickly to slow queries has helped keep Pownce running

- Caching: Memcached; caching at page and object level; cached static pages since launch.

- Queuing: take a shorter note of a longer process to do later; send notes via a job queue; need to improve our queueing system.

- Limits & pagination: lists of notes, friends, recipients; good UI.

- Indexing:

- Avoiding complexity: consider if queries are actually needed; usually good to avoid abstract or conceptual data display.
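The caching point is the classic cache-aside pattern. You read from the cache, fall back to the database on a miss, then populate the cache. This is a minimal sketch, not Pownce's code. It assumes a plain in-process dict standing in for memcached; a real deployment would talk to memcached through a client library. The names `TTLCache`, `get_notes` and the `notes:<user id>` key scheme are all hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for memcached: key -> (expiry time, value)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the stale entry
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (self._clock() + ttl, value)

    def delete(self, key):
        self._store.pop(key, None)


def get_notes(cache, user_id, load_from_db, ttl=60):
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    key = f"notes:{user_id}"
    notes = cache.get(key)
    if notes is None:
        notes = load_from_db(user_id)
        cache.set(key, notes, ttl)
    return notes


# Usage: the second call is served from the cache, so the loader runs once.
def load_notes(user_id):
    print(f"querying database for user {user_id}")
    return [f"note for {user_id}"]


cache = TTLCache()
get_notes(cache, 42, load_notes)
get_notes(cache, 42, load_notes)
```

When a user posts a new note, delete their key so the next read repopulates it. That keeps invalidation simple, at the cost of one extra database hit.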

Expect anything: young sites run into many problems; need to respond quickly; can't prepare for everything and every app is unique.

Keep backups: use revision control; if developing locally, back up personal data

Log everything: stats to monitor; quantitative measures; pretty graphs

Community: let users know what you're working on; respond to individual bug reporters; inform users of bug fixes and new features

Prepare to scale up: don't prematurely optimise (unless you work with Kevin Rose); design for success; accept that your code will change

Paul Graham (Y Combinator) - The Future of Web Startups Posted by rmp on 2007-10-04 11:16:45 Day 2 of FoWA kicks off with an introduction on openness - open apps, open data, open APIs/access and open social networks. Microformats, Creative Commons. OAuth open authentication - OpenID for APIs. Bootstrapping applications from existing social network data. Ref: XFN, Pownce, Dopplr.

Bravely presenting using Google Presenter Paul begins by describing the history of commoditisation touching on the history of steel, agriculture and of course computers ending with startups - nowadays cheap and common.

- Lots of them. Startups are hard but can be very profitable. Two to three hundred years ago more people worked in farming, effectively running their own businesses, so today's more common preference for a regular job is a reversal of a long trend.

- Standardisation - looking for common ways to scale investing. Begin by releasing a crappy version 1 and then let the market design the product. Historically, raising money was a customised and expensive process. Now things have been genericised and use standardised terms and contracts, with the exception of weird angel funders.

- New attitude towards acquisition. Google is the leader buying lots of startups, more than you realise. Often residual feelings amongst developers that acquisitions shouldn't be necessary. Google leads the field and nobody else should have a problem with them.

- Risk - always proportional to reward e.g. starting a new search engine in 1997. Founders & investors have different attitudes to risk - founders tend to be more conservative. But if startups become more common then these two attitudes tend to converge and the overall amount of wealth it's possible to create becomes larger.

- Younger, nerdier founders of more common startups due to the lower barriers to entry. Instead of approaching VCs with a plan you can start the company with seed funding beforehand.

- Still need Silicon Valley? It's true you can start up anywhere, but success is more likely in a startup hub like SV. The real value of course lies in more than technology - face-to-face meetings & communication. The question is not whether you need it but whether it offers you an advantage over your competitors. Relating this to the new ease of starting up means that your startup is more mobile and easier to migrate to a hub. Is seed funding truly international? In that case startups wouldn't be as likely to move, but evidence suggests those moving are more likely to succeed.

- Also needed: better judges. Acquirers have it easier as they come in later than investors and have more performance to measure. Their lawyers will make you, the 22-year-old about to be rich for life, pay, oh yes. Maybe in future companies will have a dedicated VP or chief acquisition officer (CAO) responsible for bringing in technology from outside.

- College will change. Most obviously degrees won't be necessary for those starting up. The importance of degrees is driven by the administrative needs of large organisations, e.g. visas for getting into the US. Startups are judged by their users who don't care where the founders went to college. The most important thing about college is the network of students and colleagues. Instead of trying to get good grades for employers, students will be trying to learn stuff.

- Lots of competitors - with more startups this is natural but Paul perceives no limit to the number of startups who can succeed.

- Faster advances - the good side to having more people with the same ideas is that things evolve faster. People won't wait to act on new ideas. Y Combinator enables the release of latent hacker energy.

- The internet is a series of tubes. Process of starting startups is like old plumbing which is slowly being streamlined and replaced by one big pipe. Performance measures will propagate back up the pipe.

In questions Paul draws the distinction between a startup, a short-term company with an exit strategy, and a straightforward small business with an ongoing plan.

A lot of what drives startups is hard - what can large companies do to improve and be more agile like startups - releasing stuff can be a good start. Don't be quite so paranoid about brand - be willing to take a little risk and release stuff which isn't necessarily complete.

Kevin Rose (Digg, Rev3, Pownce) - Lessons learned from launching startups Posted by rmp on 2007-10-03 22:33:29 Kevin outlines key points drawn from his experiences of launching three startups in the last three years.

Budgeting & Investment: Create a budget w/personal investment goals. Bootstrap it on your own without VC funding. Define measurable metrics tied to investment goals and don't be tempted to invest further unless the goals are achieved.

Saving money: Use freelance coders, cheap to get started but prepare for the future (elance.com), plan for success and rapid scalability. Use rented servers w/remote management (vnc/kvm/power) e.g. Calpop, Ev1 servers, Media Temple. Utilise low-cost, open source - LAMP, django, mysql, apache, S3 (for pownce).

Making money: Adsense tied to readers / page views; pro accounts tied to subscribers

Design: Go professional when it's appropriate.

Features (of the social variety): Import address book; Make it easy to add friends; 'Shout Story' simple usergroup-driven features; Email story (involving icon familiarity for common apps, e.g. outlook, thunderbird); Friend-of-a-friend (FOAF)-style social connectivity.

Borrow from the best, e.g. fans/friends feature

Scaling: Memcached. Hire a good DBA to review schema. Hire an admin to review Apache config. Enhance service visibility e.g. analytics, custom stats, nagios

Working with large communities: Use a product blog, a dedicated place for comments about the service(s). Increase transparency, e.g. rather than disappearing posts, deletions redirect to cease & desist. Read support email

This wasn't quite what I expected from a keynote. I think I was hoping for something a little more chatty, anecdotal and entertaining after a day of several similar developer talks. Kevin's bullet-pointed lists had a fair amount of content in common with many of them, but the handful of ties to Digg and Pownce were moderately interesting. Maybe he was saving himself for tonight's Diggnation.

Mark Quirk (Microsoft) - 7 Things You Probably Don't Know About That You Can Use in Your Future Web Apps (for FREE) Posted by rmp on 2007-10-03 22:13:57 Microsoft Virtual Earth w/ interactive sdk http://www.microsoft.com/virtualearth/

Popfly: a snappy graphical mashup code generator, rather like Pipes

Visual Web Developer, SQL Server 2005 http://msdn.microsoft.com/express

Python and Ruby on .net http://www.codeplex.com/IronPython/ http://rubyforge.org/projects/ironruby

Live Alerts: http://signup.alerts.live.com/brochure/ email, IM, mobile alerts

Silverlight streamed apps & videos

Astoria online database web services

Seadragon: http://labs.live.com/Seadragon.aspx large virtual screen. Impressive.

Heidi Pollock (BluePulse) - Taking Your Application Mobile Posted by rmp on 2007-10-03 17:40:49 An interesting résumé of experience with mobile devices and development, covering Yahoo and Twitter, as a background to the "in my experience"-style points.

The biggest growth in mobile use is in places like Africa and the Far East, on a range of low-end (ugly) phones. A big misconception about mobile web use is that most users are on high-end devices, and this doesn't hold up in real experience.

There are about 3000 phones, all with different browsers - every one's unique. So the baseline target is a 176px screen width and a 10k page weight limit.
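The 10k budget is easy to check mechanically before a page ships. This is a minimal sketch with hypothetical function names. It assumes that "10k" means 10 × 1024 bytes. It also assumes the limit covers the markup plus anything else the handset has to fetch, such as stylesheets and images.

```python
PAGE_BUDGET_BYTES = 10 * 1024  # assumption: "10k" read as 10 KiB


def page_weight(markup, assets=()):
    """Total bytes a handset downloads: UTF-8 markup plus raw asset bytes."""
    return len(markup.encode("utf-8")) + sum(len(asset) for asset in assets)


def within_budget(markup, assets=(), budget=PAGE_BUDGET_BYTES):
    """True if the page and its assets fit inside the weight budget."""
    return page_weight(markup, assets) <= budget


page = "<html><body><p>Hello, phone</p></body></html>"
logo = bytes(2048)  # stand-in for a 2 KB logo image
print(page_weight(page, [logo]), within_budget(page, [logo]))
```

In UTF-8, non-ASCII text costs more bytes than characters. That's why the check measures encoded bytes rather than string length.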

Ref: WURFL

For example, arguably the most popular is the Motorola V3, with 8 lines × 30 characters. For multilingual sites, e.g. German, words can become unworkably long.

Code should be limited to basic XHTML Mobile 1.0 and basic CSS. WML is unworkable (memory limits etc.) unless you know your targeted community is largely WML. Notably missing from XHTML Mobile are things like headings and lists. Semantic Web ideals tend not to stand up. As CSS doesn't work all the time, it can't be relied upon to do things like unindenting lists.

Recommended, appropriate tools: Firefox extensions: Modify headers, user agent switcher, WML Browser, XHTML Mobile Profile.

Users are bored or in need. Preserve your brand with logo, colours & copy. Navigation links are overrated - replace dropdowns with search/autocomplete etc.

Development tips: Target a device list; think like a phone; learn to live with it; mobile acid test http://jwtmp.com/a

Matthew Haughey (MetaFilter) - Creating and Running Communities Posted by rmp on 2007-10-03 17:00:53 Matthew offers a list of ideas and insights based on his personal experience of growing MetaFilter (12m pageviews/week on 2 servers btw) which I'm largely going to stuff in here verbatim.

Any site/app should include a social component - we're social creatures after all. Risks can be big but a successful community is great for everyone involved. Lifecycle - spike at initial launch then drop but slow growth afterwards with three outcomes - continued growth, steady userbase or decline e.g. when community builder leaves. High-level goal: "Be a third place" - home, work, pub/sports/virtual...

Have a compelling idea

Build the best app you can - people are familiar with a baseline set of tools & features

Eat your own dogfood - build for yourself first

Highlight the best - award contributors / power users; recognition helps the readers too. Moderators: your best contributors.

Get out of the way; Build in flexibility, allow unintended uses. Build out based on the edges.

Run it well - guidelines over rules, tailored to community norms; keep emotions out of decisions.

Ownership issues.

Ephemeral happiness. Every community has a revolt eventually.

Customer service - more time spent than coding

Hire others, early if you can

Metrics ease the workload

Be transparent. Have a place to talk about the site/app. Collaborative efforts for new features. Explain changes, over-explain. Acknowledge mistakes

Legal troubles. What's illegal and where? Limited liability, terms of service, privacy policy, Digital Millennium Copyright Act, etc. On the positive side, lawsuit threats are many, lawsuits are few.

What's stopping your site/app from building out a community? A successful community can please readers & creators alike.

Matt Mullenweg (WordPress) - The Architecture Behind WordPress.com Posted by rmp on 2007-10-03 15:18:05 Scaling platform:

88m global uniques, 1.5m blogs, 215m pageviews/day

- 7 boxes, $1500/mo.

- 2 balancers, 2GB memory, any disk, pound+wackamole+spread.

- 2 databases, 4GB+ memory, fast disk (RAID), master+slave MySQL, split read/write.

- 3 webs, fast CPU, 2GB memory, LiteSpeed or well-configured Apache.

- Everything in subversion.

- Be stateless (shared nothing)

- Memcached

Geo-targeting of DNS; might as well use a CDN.

Single box: 300 req/sec, 29.5m/day.
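The talk only lists "master+slave mysql, split read/write"; it doesn't say how the split was done. One common approach is a small router in the data-access layer. This is a minimal sketch, not WordPress.com's code. It assumes plain SQL strings and uses placeholder connection objects. `ReadWriteRouter` is a hypothetical name.

```python
import re

# Statements that only read data; anything else is treated as a write.
READ_ONLY = re.compile(r"\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)


class ReadWriteRouter:
    """Send writes to the master and reads to a slave.

    Once a request has written anything, its later reads also go to the
    master, so it never reads stale rows while the slave is still
    replaying the write (replication lag).
    """

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self.has_written = False

    def connection_for(self, sql):
        is_read = READ_ONLY.match(sql) is not None
        if not is_read:
            self.has_written = True
        if is_read and not self.has_written:
            return self.slave
        return self.master


# One router per web request; strings stand in for real DB connections.
router = ReadWriteRouter(master="master-db", slave="slave-db")
for statement in (
    "SELECT * FROM posts",
    "INSERT INTO posts (title) VALUES ('hi')",
    "SELECT * FROM posts",
):
    print(router.connection_for(statement), "<-", statement)
```

Locking reads such as `SELECT ... FOR UPDATE` would also need to go to the master. This sketch does not detect them.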

Scaling community:

Ref: sxsw presentation

Scaling business:

Tie revenue streams to cost scaling, e.g. pageviews rather than bloggers. Ads scale revenues with pageviews, effective CPM, low outlay. VIP class of users with advanced features, customisation etc.
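To make "tie revenue to cost scaling" concrete for the ads case: daily ad revenue is simply impressions times the effective CPM (eCPM, earnings per thousand impressions). This sketch assumes a fixed number a of ad impressions per pageview, which is an illustrative assumption and not a figure from the talk. P_day is pageviews per day:

```latex
R_{\text{day}} = \mathrm{eCPM} \times \frac{a \, P_{\text{day}}}{1000}
```

With a and eCPM held constant, revenue grows linearly with pageviews, the same quantity that drives hosting cost. Revenue tied to the number of bloggers would not track cost in the same way.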

Scaling people:

Hire people as good as, or better than you. Great people = Rich environment + worthwhile problems. 5 things to look for:

- Personality fit (when the shit hits the fan)

- Ability to learn (curiosity)

- Taste (can't be taught)

- Passion for the space

- Familiarity with current technologies

Don't put out too easily - don't hire if doubts exist

It's fun to know that the folks at WordPress manage deployment with ssh scripts running svn up.

I disagree with Matt's Q&A response suggesting that you should never do your own hosting. In principle that's OK for the large faceless mass of current web sites & traffic, but under special circumstances it's the only way. I build web apps on top of bioinformatics pipelines supporting (today) 320Tb of data and there's no way on Earth this would be possible to outsource.

Daniel Burka (digg/Pownce) - How user feedback can influence design Posted by rmp on 2007-10-03 15:15:18 Dan begins with comments on the Mozilla site redesign and an observation that user feedback is more than just 'good' or 'bad'. He contrasts young (pownce) vs. old (digg) communities providing different types of feedback. Young user communities feel the moving progress of an app, and are part of creative development process. Old, existing communities come with strong expectations that have been created and set patterns of use and tend to be much more resistant to change.

Before considering change, decide whether it's going to be worth it.

Rely on previous feedback; know your community; anticipate areas of friction; focus groups and usability studies; determine measures of success

Outlining types of feedback, they're split into five principal types:

- positive feedback

- bug reports

- negative feedback

- expert feedback (bug reports, expert knowledge + solutions)

- implicit feedback (observing user behaviour, objective metrics, speaks for silent users)

Consider carefully before reacting to feedback as kneejerk reactions can produce worse results.

Robin Christopherson (AbilityNet) - The art of attractive yet usable websites Posted by rmp on 2007-10-03 12:43:29 Now I think I've seen Robin talk before at either OSCON or ApacheCon, but it's always an incredible eye-opener to see simple use of a screenreader. Amazon... how crap is that? That's a stunningly unusable site and it's far from being the only one!

How many developers do you know who actually have a screenreader installed on their development system for testing basic functionality?

Audio descriptions for multimedia & core content/functionality. Must recommend this back at basecamp for at least YourGenome.

Magnifier usage is perhaps something more of us have tinkered with, but usually only as a gimmick/toy, not as something critical to usability. Check your site's navigation again, but this time through a magnifier.

Robin demonstrates a series of high-profile example websites almost all of which fail to meet accessibility requirements. I'm sure I'm as guilty as the next developer of not catering for these users but these are worldwide corporations with buckets of cash to throw at web design and who are immediately alienating a significant proportion of their users. Google comes out of the wash impressively well.

Push keyboard accessibility hard. All those HTML attributes we don't have time to put in are legally required. I'm sure there'd be a huge stink if/when high-profile legal cases start being brought against non-compliant services.

Dion Almaer - How to take your app offline Posted by rmp on 2007-10-03 12:16:45 So... caffeinated and chocolate-brownied up, and with a quick skim around the exhibition floor, I'm back on the developer track and interested in what Dion has to say. I think he's going to be talking about AIR, the Adobe Integrated Runtime Flash platform. Maybe with some discussion of Microsoft's Silverlight and the open-source Moonlight, which I came across last week.

So.. close. Google Gears is where it's at. I worry about the seemingly mass conversion from evil Microsoft to whiter-than-white Google. Were I choosing an offline platform for my web apps I'm really not sure I'd use Google. Yes, perhaps it's the usual conspiracy/FUD but I fear their power.

Anyway - however ubiquitous today's pervasive connectivity may be, I think these frameworks will be critical going forward. Hurray for SQLite!

Ref: GearsDB

Ref: GearsORM

Ref: GearShift - DB migrations for Gears

Local databases are great for performance but when the schema is upgraded your users need their local data migrated and their local databases modified. Sounds somewhat sticky to do in terms of web apps.
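The usual answer is versioned migrations: store a schema version in the local database and run each upgrade step exactly once, in order. Gears' local store was SQLite, so this minimal sketch uses Python's built-in `sqlite3` module, with SQLite's `user_version` pragma as the version counter. It is not GearShift's API. The `notes` table and the `MIGRATIONS` list are hypothetical.

```python
import sqlite3

# Step i upgrades the schema from version i to version i + 1.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)",
    "ALTER TABLE notes ADD COLUMN created_at TEXT",
]


def migrate(conn):
    """Apply any pending migrations; existing rows are kept.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control the transactions.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version])
            # PRAGMA doesn't take bound parameters; version is an int we control.
            conn.execute(f"PRAGMA user_version = {version + 1}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


# A user whose local database is still on version 1, with data in it.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(MIGRATIONS[0])
conn.execute("PRAGMA user_version = 1")
conn.execute("INSERT INTO notes (body) VALUES ('written offline')")
print(migrate(conn))
print(conn.execute("SELECT body, created_at FROM notes").fetchall())
```

Each step runs in its own transaction, so a failed upgrade leaves the user on the previous, still-consistent version rather than half-migrated.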

Ooh. Workerpools - hot - asynchronous, thread-like, non-blocking semantics for heavy-duty client-side work. This looks great for offloading intensive calculation work from the server onto the client without interrupting the user experience. I wonder how the cross-origin API stuff really works underneath. Sounds like cross-domain JSON to me, avoiding the single-domain security restrictions which 'real' AJAX enforces.

Syncing - Why is keeping data in sync *always* such a big problem? It seems to me that there has to be a simpler way to do this kind of stuff. On the surface these toolkits seem to work in a very similar way to svn + svk and they have the same conflict resolution problems. Most of the time syncing works ok but in the event of a conflict they'll largely throw all their toys out of the pram and sit there wailing until the user comes along to sort it out.
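Underneath, svn-style syncing is a three-way comparison against the last version both sides agreed on. This minimal sketch uses hypothetical function names and plain dicts as records. It shows why most syncs resolve themselves, and why only one case has to wait for the user.

```python
def merge_field(base, local, remote):
    """Three-way merge of a single value.

    base   -- value at the last successful sync
    local  -- value now on the client
    remote -- value now on the server
    Returns (value, conflict). When conflict is True, value is None and a
    person (or an application-specific rule) has to decide.
    """
    if local == remote:   # both sides agree (or neither changed)
        return local, False
    if local == base:     # only the server changed
        return remote, False
    if remote == base:    # only the client changed
        return local, False
    return None, True     # both changed, differently: a real conflict


def merge_record(base, local, remote):
    """Merge dict records field by field; missing fields count as None."""
    merged, conflicts = {}, []
    for key in base.keys() | local.keys() | remote.keys():
        value, conflict = merge_field(base.get(key), local.get(key), remote.get(key))
        if conflict:
            conflicts.append(key)
        else:
            merged[key] = value
    return merged, sorted(conflicts)


base = {"title": "Shopping", "body": "milk"}
local = {"title": "Shopping list", "body": "milk"}   # edited offline
remote = {"title": "Shopping", "body": "milk, eggs"}  # edited on the server
print(merge_record(base, local, remote))
```

All the toy-throwing happens in the final case. Making sync transparent to the developer means supplying a rule there (last writer wins, keep both copies, and so on) instead of stopping.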

Could future technology based on this kind of stuff replace classical desktop applications? I think it definitely could for anything interactive. It absolutely makes sense to develop new applications using these toolkits as long as the conflict resolution is fixed and the online/offline data-store handling is transparent to the developer.

Steve Souders - Yahoo Performance - High Performance websites Posted by rmp on 2007-10-03 12:13:23 The gist of this presentation was that developers should benchmark front-end vs. back-end overheads accurately and scientifically, up front. This should be common sense, but I don't know *anyone* amongst my colleagues who does this proactively. It's almost always reactive, prompted by performance problems after release.

Steve presented a series of guidelines on streamlining content to improve performance. Some of these conflict a little with common design practices (e.g. the use of JavaScript/CSS frameworks). These guidelines are more or less what the YSlow Firebug plugin checks for, and in a nutshell they look something like this:

fewer HTTP requests

use a CDN

add an Expires header

gzip components

put stylesheets at the top

move scripts to the bottom

avoid CSS expressions

make JS and CSS external

reduce DNS lookups

minify JS

avoid redirects

remove duplicate scripts

configure ETags

make Ajax cacheable

split static content across multiple domains

reduce the size of cookies

host static content on a different domain

minify CSS

avoid iframes

Ref: YSlow Firebug plugin. Book: High Performance Web Sites. Blogs: YUIBlog, YDNBlog
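Two of those rules ("add an Expires header" and "gzip components") are easy to see at the HTTP level. Here is a minimal Python sketch of the headers a server might send with a static asset. It is not YSlow's or Yahoo's code, the one-year lifetime is an arbitrary assumption, and the `Accept-Encoding` check is deliberately naive (it ignores q-values).

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_response(body: bytes, accept_encoding: str, max_age_days: int = 365):
    """Return (headers, payload) for a static asset such as a JS or CSS file."""
    expires = datetime.now(timezone.utc) + timedelta(days=max_age_days)
    headers = {
        # Far-future expiry so repeat visitors reuse their cached copy
        # instead of making another HTTP request.
        "Expires": format_datetime(expires, usegmt=True),
        # The HTTP/1.1 equivalent of Expires, in seconds.
        "Cache-Control": f"public, max-age={max_age_days * 86400}",
        # Caches must keep gzipped and plain variants apart.
        "Vary": "Accept-Encoding",
    }
    # Naive check: compress only when the client mentions gzip at all.
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

Text assets like scripts and stylesheets usually compress well, which is why the gzip rule pays off; far-future expiry implies renaming files (e.g. adding a version number) whenever their content changes.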

Adobe - 10 apps in 10 minutes Posted by rmp on 2007-10-03 11:51:11 A whistle-stop tour of 10 new tech web apps. I didn't capture all of them, but here are some I hadn't seen before and others worthy of a mention: SlideRocket, Scrapblog, Picnik, Buzzword, AIR and Pownce.

Ready, Set, FoWA! Posted by rmp on 2007-10-02 22:05:13 Really can't wait to get to the Future of Web Apps tomorrow (nice and early of course, though I'll be crossing Londinium on the Tube during rush hour - joy).

I've just been circling sessions on the schedule and find myself mostly in the developer track on day 1 but with an even split between developer and business tracks on day 2. I'm almost wishing Carson hadn't packed so much goodness in so tightly.

I have to say I'm a little torn between the developer sessions, which are appropriate for work, and the business sessions, which are slightly more useful for the special things I do out of hours but are almost justifiable under the 'strategy' heading...

Anyway, email screwups aside it's shaping up to be a great show. More to come...

Hiring Perl Developers - how hard can it be? Posted by rmp on 2007-09-28 21:27:51 All the roles I've had during my time at Sanger have more or less required the development of production-quality Perl code, usually OO and increasingly using MVC patterns. Why is it, then, that very nearly every Perl developer I've interviewed in the past 8 years is woefully lacking, specifically in OO Perl but more generally in half-decent programming skills?

It's been astonishing, not in a good way, how many have been unable to demonstrate use of hashes. Some have been too scared of them (their words, not mine) and some have never felt the need. For those of you who aren't Perl programmers, hashes (aka associative arrays) are a pretty crucial feature of the language and fundamental to its OO implementation.

Now, I sometimes program in Perl for more than 7-8 hours a day. For many years this also involved reworking other people's code. I can very easily say that if you claim to be a Perl programmer and have never used hashes, you're not going to get a Perl-related job on the strength of your technical skills. With a good, interactive and engaging personality and a desire for self-improvement you might get away with it, but certainly not on technical merit.

It's also quite worrying how many of these interviewees are unable to describe the basics of object-oriented programming yet have, for example, developed and sold a commercial ERP system, presumably for big bucks. Man, these people must have awesome marketing!

Frankly a number of the bioinformaticians already working there have similar skills to the interviewees and often worse communication skills, so maybe I'm simply setting my standards too high.

I really hope this situation improves when Perl 6 goes public though I'm sure it'll take longer to become common parlance. As long as it happens before those smug RoR types take over the world I'll be happy ;)

DECIPHERing Large-Scale Copy Number Variations Posted by rmp on 2007-09-24 22:09:40 It's strange. Since moving from the core Web Team at Sanger to Sequencing Informatics, I've been able to reduce my working hours from ~70-80/week all the way down to the 48.5 hours which are actually in my contract.

In theory this means I have more spare time, but in reality I've used it to secure sensible contract work away from Rentacoder, which I've relied on in the past.

The work in question is optimising and refactoring for the DECIPHER project (http://decipher.sanger.ac.uk/), whose technical side I used to manage whilst in the web team.

DECIPHER is a database of large-scale copy number variations (CNVs) from patient arrayCGH data, curated by clinicians and cytogeneticists around the world. DECIPHER represents one of the first clinical applications to come out of the Human Genome Project (HGP) data from Sanger.

What's exciting, apart from the medical implications of DECIPHER's joined-up thinking, is that it also represents a valuable model for social, clinical applications in the Web 2.0 world. The application draws in data from various external sources as well as its own curated database. It primarily uses DAS (http://biodas.org/) via Bio::Das::Lite and Bio::Das::ProServer, and I'm now working on improving interfaces, interactivity and speed by leveraging MVC and SOA techniques with ClearPress and Prototype.
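For readers who haven't met DAS: a client asks a source for the features in a region and gets back an XML document. Here is a minimal Python sketch of reading such a response with the standard library. The sample document, its URL and its feature IDs are made up, it covers only a few of the DAS/1 feature elements, and it is not the Bio::Das::Lite API (which is Perl).

```python
import xml.etree.ElementTree as ET

# A hand-written, hypothetical DAS/1 "features" response for one region.
# Real responses can carry more elements (METHOD, SCORE, ORIENTATION, LINK, ...).
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/cnv/features">
    <SEGMENT id="1" start="1000000" stop="2000000" version="1.0">
      <FEATURE id="cnv1" label="patient A deletion">
        <TYPE id="deletion">deletion</TYPE>
        <START>1100000</START>
        <END>1350000</END>
      </FEATURE>
      <FEATURE id="cnv2" label="patient B duplication">
        <TYPE id="duplication">duplication</TYPE>
        <START>1600000</START>
        <END>1900000</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text: str):
    """Flatten a DAS features document into (segment, id, type, start, end) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            rows.append((
                segment.get("id"),
                feature.get("id"),
                feature.find("TYPE").get("id"),
                int(feature.findtext("START")),
                int(feature.findtext("END")),
            ))
    return rows
```

Because every DAS source speaks the same format, rows from several external sources can be merged with a project's own curated data before display, which is the kind of drawing-in described above.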

It's a great opportunity for me to keep contributing to one of my favourite projects and hopefully implement a load of really neat features I've wanted to add for a long time. Stay tuned...

VoIP peering & profits Posted by rmp on 2007-09-06 22:35:44 So... shortly (from February next year, I believe, though I'm probably mistaken) the price of calling "Lo-Call" 0845 numbers in the UK goes up. As I understand it, the rates will be similar or identical to 0870 rates, at 20p/min or so.

Now I wonder if the regulator has missed a trick here. It so happens that the nation is converting to broadband, be it ADSL or cable-based, and that very many of those broadband packages now come with VoIP offerings as standard.

My point is that these bundled broadband VoIP packages invariably come with 0845 dial-in numbers and no other choice. Dialing out via your broadband ISP may well be cheap for you but spare a thought for those calling in at much higher rates.

Having been tinkering with VoIP for a good few years, I realise that actually this should be OK, because calling VoIP-to-VoIP should be free, right? Wrong. Most of these ISPs don't peer with each other's networks, for two main reasons as far as I can see:

1. They're competitors and have little business reason to peer, apart from keeping the small proportion of aware customers happy.

2. These ISPs make profits from users dialling in: 0845 is a profit-sharing prefix in which both BT and the ISP in question have a stake. This old story is of course also true of many telephone help desks and similar. Keeping the customer on the line longer means more profits for the company and its shareholders.

It seems to me that the world could be a better, more communicative place through more thorough VoIP network peering but I simply can't see it becoming widespread whilst profits stand in the way.

The simplest of organisation Posted by rmp on 2007-08-18 16:47:12 Ever since I started using Scrum for application development at work, friends of mine have expressed an interest in the way it works.

Recently even people passing through my office (there to talk to my colleagues, and whom I don't know very well) have been remarking on the backlogs displayed prominently above my desk. I think they're impressed by the simplicity of the system and how effective it seems to be for me.

I must admit my backlogs are simpler than the full-blown setup. As I'm still in the process of hiring, I currently only really develop alone, so I'm not bothering with the intermediate item-in-progress stickies.

I also have tasks organised in a 2-dimensional area with axes for complexity and importance. Although sprint backlog tasks are prioritised by my customers, it's been proving useful to have my take on these attributes displayed spatially rather than just writing '3 days' on the ticket.

In fact I keep my product backlog organised this way as well, as soon as tickets come in. It allows me to relay my take on the tasks to the customers straight away, whether or not we're building a sprint backlog at the time. When a sprint has finished the product backlog is reorganised to take account of any changes, e.g. to infrastructure, affecting the tasks.
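The two-axis board is essentially a grouping of tickets by two estimates. Here is a minimal Python sketch of the same idea. The 1-5 scales, the midpoint and the quadrant names are hypothetical choices for illustration; the post only says the axes are complexity and importance.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    title: str
    complexity: int  # hypothetical scale: 1 (trivial) .. 5 (hard)
    importance: int  # hypothetical scale: 1 (nice to have) .. 5 (critical)


def quadrant(t: Ticket, midpoint: int = 3) -> str:
    """Place a ticket in one quadrant of the complexity/importance board."""
    important = t.importance >= midpoint
    simple = t.complexity < midpoint
    if important and simple:
        return "quick win"
    if important:
        return "major project"
    if simple:
        return "filler"
    return "reconsider"


def board(tickets):
    """Group ticket titles by quadrant, most important (then simplest) first."""
    groups = {}
    for t in sorted(tickets, key=lambda t: (-t.importance, t.complexity)):
        groups.setdefault(quadrant(t), []).append(t.title)
    return groups
```

As on the wall, the point is that a ticket's position already tells the customer how I rate it, before any sprint planning happens.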

Picking up momentum Posted by rmp on 2007-07-04 22:20:16 It seems people are fairly taken with the BarCamb idea. It's been lightly advertised internally at Sanger and has been picking up interest both there and on its Upcoming page (http://upcoming.yahoo.com/event/208327/).

I wonder how many of the people already signed up actually have something to present. Having been at the WTSI for nearly eight years now, I've a number of things I could talk about; it's just a case of deciding which of them would be most interesting, and that really depends on where attendees are coming from.

So... one or more of the following things I've been working on recently: Bio::Das::Lite & Bio::Das::ProServer, ClearPress or the new sequencing technology (NST). Now, I'm not a biologist or a chemist, either by trade or by hobby, and I'm pretty certain that talking about NST would be asking for a whole bunch of biology- and chemistry-question trouble. I guess the DAS-related things are the most useful to present as they have the widest scientific application.

Though there's nothing like a good bit of self-promotion so maybe something short on ClearPress would be a good thing too. Might need to improve the application builder and test-suite a bit more for that.

In related news, not wanting to be outdone by Matt's BarCamb, I coauthored and submitted a venue proposal for YAPC::Europe 2008 last week. Woohoo! Nail-biting stuff. The genome campus would be a great place to host it for all sorts of reasons: an integrated and well-supported conference centre, secured financial commitment, great science to talk about and a tremendous Perl resource to tap into, to name just a few.

All I need to do now is submit my travel application for YAPC::Europe Vienna later this year and see how it's done (again). It's been a while since I've been to a YAPC::Europe!

Barcamp Cambridge Posted by rmp on 2007-06-26 23:15:56 So... BarCamp Cambridge, or BarCamb (http://upcoming.yahoo.com/event/208327/) as we're affectionately calling it, is definitely green for go.

To be hosted at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/) near Cambridge, it's hopefully going to be a day of grass-roots science and technology talks on the 24th of August. That was two months away last Sunday, so there's plenty of time to unorganise it.

Should be interesting and I think I'm looking forward to it though I'm not sure what to expect. It could, of course, be an utter disaster, but what better area to have it than Cambridge, and what better site than the Genome Campus, however biased I might be?

I always dread saying this, but "more coming soon" I hope!

Sporting developments Posted by rmp on 2007-06-15 20:51:39 I recently started reading 'Agile Software Development with Scrum' (http://www.compman.co.uk/scripts/browse.asp?ref=558044) by Schwaber and Beedle. It's a great introduction to this branch of the Agile movement. It's easy to read and contains practical advice and straightforward explanations of the terms and processes involved in Scrum.

Even more satisfying than the read itself was the realisation that I've been using a good number of the Scrum techniques in managing projects within my team for the last three years or so. I love the idea of a development team reaching a nirvana-like hyper-productive state though one of the examples of a four-person team at Quattro producing 1000 lines of C++ a week took me aback.

In the middle of last month I moved to a new position at WTSI: Team Leader for the New Sequencing Pipeline development team (currently consisting of me). Since then I've been working on what I'll now call a code sprint, and last week I had my first product increment. The product is a smallish system for tracking runs on the new-technology sequencing machines, but it's around 10,000 lines of Perl (excluding templates, CSS & tests) built on a light MVC framework I produced in the same time. A one-man team producing 3,333 lines of code a week seems ultra-productive, and I can't believe it's *purely* down to the fact that Perl is easier to write than C++.
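Put per developer, the comparison with the Quattro team works out roughly as below. This assumes, as the 3,333 figure implies, that the 10,000 lines took about three weeks; lines of code is a crude measure, and the 10,000 excludes templates, CSS and tests.

```latex
\begin{align*}
\text{Quattro team:}\quad & \frac{1000\ \text{lines/week}}{4\ \text{developers}} = 250\ \text{lines per developer-week}\\
\text{This sprint:}\quad & \frac{10\,000\ \text{lines}}{3\ \text{weeks} \times 1\ \text{developer}} \approx 3333\ \text{lines per developer-week}\\
\text{Ratio:}\quad & \frac{3333}{250} \approx 13
\end{align*}
```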

Anyway, I'm on a C++ course all next week, so I'll soon be able to tell. Shame it's not about Rails instead ;)


Another new style Posted by rmp on 2007-03-19 23:50:04 Another year, another new look... So the bamboo style didn't last very long. I was never really happy with it and it just didn't look professional enough to me. Ok, so the new one's a bit of a rip-off of Shilo Design http://www.shilodesign.com/ but that one's in Flash and this is CSS and different enough in my opinion.

I think this one's cleaner and smarter and I'd normally invite comments but I haven't added that feature to this blogging app yet. What can I say? I like the technical aspects (challenges?) of doing things myself.

Hosting, advertising, content Posted by rmp on 2006-12-12 22:53:05 Well, more than a year has passed and it's just plain embarrassing not to have written anything here. So much has happened that the choice is either to update or delete. I'll try updating...

So... I've switched ISP again. This site is now free from shackles. I've delegated the DNS to afraid.org (http://afraid.org/), which is updated by a web client running on one of the psyphi.net servers. Everything you now see is running over my home ADSL line. It's not too bad but could seriously do with better upload bandwidth. This new setup gives me full control over everything server-wise and as much storage as I can eat. Hopefully it saves a small bucket of cash too.

I've switched advertising from AdBrite to AdSense, which seems to be a little more effective. If and when it starts generating a few cents of revenue, it may even pay for the domain and the meagre subscription to afraid.org.

It mostly boils down to content in the end and yes, I know there is very little around here. Of course I'd like to change that little by little but I won't promise anything because then it definitely won't happen. Steve Pavlina http://www.stevepavlina.com/ reckons he generates $9000/month with his productivity site and whilst I know I won't reach that for some time it's still fairly inspiring. There are still a couple of months left to make my million before the deadline.

Going Green / The Iceman Cometh Posted by rmp on 2005-10-07 10:48:11 Well it's turning cold again. Doesn't say too much for this country but we've definitely seen the last of summer. The fog's setting in and it won't be lon