Sporting developments
Posted by rmp at 20:51 15th Jun 2007
I recently started reading 'Agile Software Development with Scrum' http://www.compman.co.uk/scripts/browse.asp?ref=558044 by Schwaber and Beedle. It's a great introduction to this branch of the Agile movement: easy to read, with practical advice and straightforward explanations of the terms and processes involved in Scrum.
Even more satisfying than the read itself was realising that I've been using a good number of Scrum techniques to manage projects within my team for the last three years or so. I love the idea of a development team reaching a nirvana-like, hyper-productive state, though one example, a four-person team at Quattro producing 1,000 lines of C++ a week, took me aback.
In the middle of last month I moved to a new position at WTSI: Team Leader for the New Sequencing Pipeline development team (currently consisting of me). Since then I've been working on what I'll now call a code sprint, and last week I delivered my first product increment. The product is a smallish system for tracking runs on the new-technology sequencing machines, but it comes to around 10,000 lines of Perl (excluding templates, CSS and tests), built on a light MVC framework I wrote in the same period. A one-man team producing 3,333 lines of code a week seems ultra-productive, and I can't believe it's *purely* down to Perl being easier to write than C++.
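For what it's worth, here's the back-of-the-envelope comparison as a few lines of Python. Two assumptions, neither stated outright above: Quattro's 1,000 lines is a whole-team weekly figure, and the 10,000 lines took about three weeks (which is what the 3,333 figure implies).

```python
def loc_per_dev_week(total_loc: int, developers: int, weeks: float) -> float:
    """Lines of code per developer per week."""
    return total_loc / (developers * weeks)

# Assumed inputs: 4 developers for 1 week vs. 1 developer for ~3 weeks
quattro = loc_per_dev_week(1_000, developers=4, weeks=1)    # 250.0
solo = loc_per_dev_week(10_000, developers=1, weeks=3)      # ~3333.3

print(f"Quattro: {quattro:.0f} lines/dev/week")
print(f"Solo sprint: {solo:.0f} lines/dev/week")
print(f"Ratio: {solo / quattro:.1f}x")
```

Line counts are a crude metric, so treat the ratio as a curiosity rather than a benchmark.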
Anyway, I'm on a C++ course all next week, so I'll soon be able to tell. Shame it's not about Rails instead ;)
Psyphi Blog v5
The simplest of organisation
Posted by rmp at 16:47 18th Aug 2007
Ever since I started implementing Scrum for application development at work, friends of mine have expressed an interest in how it works.
Recently even people passing through my office (there to talk to my colleagues, and whom I don't know very well) have been remarking on the backlogs displayed prominently above my desk. I think they're impressed by the simplicity of the system and how effective it seems to be for me.
I must admit my backlogs are simpler than the full-blown setup. As I'm still hiring, I currently develop alone, so I'm not bothering with the intermediate item-in-progress stickies.
I also arrange tasks in a two-dimensional area with axes for complexity and importance. Although sprint backlog tasks are prioritised by my customers, it has proved useful to display my own take on these attributes spatially rather than just writing '3 days' on the ticket.
In fact I organise my product backlog this way as well, as soon as tickets come in. It lets me relay my take on the tasks to the customers straight away, whether or not we're building a sprint backlog at the time. When a sprint finishes, the product backlog is reorganised to take account of any changes affecting the tasks, e.g. to infrastructure.
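The two-axis board is easy to mirror in code if you want the same view away from the office wall. A minimal Python sketch, assuming a hypothetical 1-5 scale on each axis and a midpoint split into four regions; the region names and ticket titles are illustrative, not part of the physical board described above.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    title: str
    complexity: int   # hypothetical scale: 1 (trivial) .. 5 (hard)
    importance: int   # hypothetical scale: 1 (nice to have) .. 5 (critical)

def quadrant(t: Ticket, midpoint: int = 3) -> str:
    """Place a ticket in one of four regions of the board."""
    hard = t.complexity > midpoint
    vital = t.importance > midpoint
    if vital and not hard:
        return "quick win"
    if vital and hard:
        return "major project"
    if not vital and not hard:
        return "filler"
    return "reconsider"

def board(tickets):
    """Group titles by region, most important (then simplest) first."""
    regions = {}
    for t in sorted(tickets, key=lambda t: (-t.importance, t.complexity)):
        regions.setdefault(quadrant(t), []).append(t.title)
    return regions

backlog = [
    Ticket("Run status page", complexity=2, importance=5),
    Ticket("Instrument sync", complexity=5, importance=4),
    Ticket("Colour scheme", complexity=1, importance=1),
]
print(board(backlog))
```

Re-running `board()` after a sprint, with updated scores, is the code equivalent of shuffling the stickies around.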
DECIPHERing Large-Scale Copy Number Variations
Posted by rmp at 22:09 24th Sep 2007
It's strange. Since moving from the core Web Team at Sanger to Sequencing Informatics I've been able to reduce my working hours from ~70-80 a week all the way down to the 48.5 hours actually in my contract.
In theory this means I have more spare time, but in reality I've been able to secure sensible contract work outside Rentacoder, which I've relied on in the past.
The work in question is optimising and refactoring for the DECIPHER project (http://decipher.sanger.ac.uk/), whose technical side I used to manage whilst in the web team.
DECIPHER is a database of large-scale copy number variations (CNVs) from patient arrayCGH data, curated by clinicians and cytogeneticists around the world. It represents one of the first clinical applications to come out of the HGP data from Sanger.
Apart from the medical implications of DECIPHER's joined-up thinking, what's exciting is that it also represents a valuable model for social, clinical applications in the Web 2.0 world. The application draws in data from various external sources as well as its own curated database. It primarily uses DAS (http://biodas.org/) via Bio::Das::Lite and Bio::Das::ProServer, and I'm now working on improving interfaces, interactivity and speed by leveraging MVC and SOA techniques with ClearPress and Prototype.
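DAS itself is a simple HTTP/XML protocol, so a client boils down to fetching a features document for a segment and walking the XML. The tooling here is Perl (Bio::Das::Lite); what follows is a Python sketch of the same idea, not that module's API. It assumes a DAS/1.5-style features response (DASGFF, SEGMENT and FEATURE elements) and parses an invented sample rather than hitting a live server; the example.org URL and feature data are made up.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like a DAS/1.5 "features" response. A real client
# would fetch something like <server>/das/<source>/features?segment=1:1,500000
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/source/features?segment=1:1,500000">
    <SEGMENT id="1" start="1" stop="500000">
      <FEATURE id="cnv1" label="cnv1">
        <TYPE id="deletion">deletion</TYPE>
        <START>120000</START>
        <END>180000</END>
        <ORIENTATION>0</ORIENTATION>
      </FEATURE>
      <FEATURE id="cnv2" label="cnv2">
        <TYPE id="duplication">duplication</TYPE>
        <START>300000</START>
        <END>450000</END>
        <ORIENTATION>0</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

def parse_features(xml_text: str):
    """Return (segment, feature id, type, start, end) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            rows.append((
                seg.get("id"),
                feat.get("id"),
                feat.findtext("TYPE"),
                int(feat.findtext("START")),
                int(feat.findtext("END")),
            ))
    return rows

for row in parse_features(SAMPLE):
    print(row)
```

Because every source speaks the same format, merging several external sources with a local database is mostly a matter of running the same parser over each response.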
It's a great opportunity for me to keep contributing to one of my favourite projects and hopefully implement a load of really neat features I've wanted to add for a long time. Stay tuned...
Hiring Perl Developers - how hard can it be?
Posted by rmp at 21:27 28th Sep 2007
All the roles I've had during my time at Sanger have more or less required the development of production-quality Perl code, usually OO and increasingly using MVC patterns. Why is it, then, that very nearly every Perl developer I've interviewed in the past 8 years has been woefully lacking, specifically in OO Perl but more generally in half-decent programming skills?
It's been astonishing, and not in a good way, how many have been unable to demonstrate the use of hashes. Some have been too scared of them (their words, not mine) and some have never felt the need. For those of you who aren't Perl programmers, hashes (aka associative arrays) are a crucial feature of the language and fundamental to its OO implementation.
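For non-Perl readers, a rough Python analogue. Python's dict fills the role of a Perl hash and, much as a Perl object is usually a hash reference blessed into a package, a plain Python object keeps its attributes in a dict (its `__dict__`). The run ID and instrument name below are hypothetical.

```python
def base_counts(seq: str) -> dict:
    """Count each base using a dict (an associative array, i.e. a hash)."""
    counts = {}
    for base in seq:
        counts[base] = counts.get(base, 0) + 1
    return counts

class Run:
    """A plain object; its attributes live in a per-instance dict."""
    def __init__(self, run_id, instrument):
        self.run_id = run_id
        self.instrument = instrument

run = Run(1234, "IL5")          # hypothetical values
print(base_counts("GATTACA"))   # {'G': 1, 'A': 3, 'T': 2, 'C': 1}
print(vars(run))                # {'run_id': 1234, 'instrument': 'IL5'}
```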
I now program in Perl for 7-8 hours a day, sometimes more, and for many years this also involved reworking other people's code. I can say with confidence that if you claim to be a Perl programmer and have never used hashes, you're not going to get a Perl-related job on the strength of your technical skills. With a good, interactive and engaging personality and a desire for self-improvement you might get away with it, but certainly not on technical merit.
It's also quite worrying how many of these interviewees are unable to describe the basics of object-oriented programming yet have, for example, developed and sold a commercial ERP system, presumably for big bucks. Man, these people must have awesome marketing!
Frankly, a number of the bioinformaticians already working there have similar skills to the interviewees, and often worse communication skills, so maybe I'm simply setting my standards too high.
I really hope this situation improves when Perl 6 goes public, though I'm sure it'll take a while to become common parlance. As long as it happens before those smug RoR types take over the world I'll be happy ;)
Steve Souders - Yahoo Performance - High Performance websites
Posted by rmp at 12:13 3rd Oct 2007
The gist of this presentation was that developers should perform accurate, scientific benchmarking of front-end vs. back-end overheads up front. This should be common sense, but I don't know *anyone* amongst my colleagues who does this proactively. It's almost always reactive, a response to performance problems after release.
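A minimal sketch of the kind of repeatable measurement being argued for: take many samples, compare medians, and see what fraction of the user's wait is spent outside page generation. It assumes you can time the back-end separately from a full page load; the sample numbers are made up.

```python
import statistics
import time

def measure(fn, runs=20):
    """Call fn() repeatedly; return each wall-clock duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def front_end_share(backend_ms, total_ms):
    """Fraction of the median page load not spent generating the page."""
    backend = statistics.median(backend_ms)
    total = statistics.median(total_ms)
    return (total - backend) / total

# Hypothetical samples: page generation vs. full page load in a browser
print(f"{front_end_share([110, 120, 130], [850, 900, 1000]):.0%}")  # 87%
```

In practice `measure()` would wrap the back-end call, while full page-load times would come from the browser side (e.g. Firebug's Net panel). Medians keep one slow outlier from skewing the comparison.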
Steve presented a series of guidelines on streamlining content to improve performance. Some of these conflict a little with common design practices (e.g. use of JavaScript/CSS frameworks). The guidelines are more or less what the YSlow Firebug plugin incorporates, and in a nutshell look something like this:
fewer HTTP requests
use a CDN
add an Expires header
gzip components
put stylesheets at the top
move scripts to the bottom
avoid CSS expressions
make JS and CSS external
reduce DNS lookups
minify JS
avoid redirects
remove duplicate scripts
configure ETags
make Ajax cacheable
split static content across multiple domains
reduce the size of cookies
host static content on a different domain
minify CSS
avoid iframes
Ref: YSlow Firebug plugin. Book: High Performance Web Sites. Blogs: YUIBlog, YDNBlog.
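Two of the cheaper wins on that list, gzip components and add an Expires header, are mostly web-server configuration, but the mechanics are easy to see in a few lines of Python. A minimal standard-library sketch; the one-year lifetime and the sample CSS are assumptions.

```python
import gzip
import time
from email.utils import formatdate

def static_headers(body: bytes, max_age_days: int = 365):
    """Compress a static asset and build far-future caching headers."""
    compressed = gzip.compress(body)
    max_age = max_age_days * 86400  # seconds
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
    return compressed, headers

css = b"body { margin: 0; padding: 0; }\n" * 200
data, hdrs = static_headers(css)
print(len(css), "->", len(data), "bytes")
print(hdrs["Expires"])
```

Repetitive text such as CSS compresses very well, which is why gzip is usually worth enabling for everything except already-compressed images.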
Heidi Pollock (BluePulse) - Taking Your Application Mobile
Posted by rmp at 17:40 3rd Oct 2007
An interesting summary of her experience with mobile devices and development, covering Yahoo and Twitter, as background to the "in my experience"-style points.
The biggest growth in mobile use is in places like Africa and the Far East, on a range of low-end (ugly) phones. A big misconception about mobile web use is that most users have high-end handsets; this doesn't hold up in practice.
There are about 3,000 phones, all with different browsers; every one is unique. So the baseline target is a 176px screen width and a 10k page-weight limit.
Ref: WURFL
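The 176px and 10k baselines lend themselves to an automated check. A sketch, assuming "10k" means 10 KiB of markup plus assets and that 176px is the maximum image width; the file names and sizes are made up.

```python
PAGE_BUDGET = 10 * 1024   # assumed reading of "10k": 10 KiB per page
BASELINE_WIDTH = 176      # baseline screen width in pixels

def page_weight(html: str, assets: dict) -> int:
    """Bytes for the markup plus every asset (images, CSS) on the page."""
    return len(html.encode("utf-8")) + sum(len(data) for data in assets.values())

def within_budget(html: str, assets: dict, budget: int = PAGE_BUDGET):
    weight = page_weight(html, assets)
    return weight <= budget, weight

def fit_width(width: int, height: int, max_width: int = BASELINE_WIDTH):
    """Scale image dimensions down to max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height
    return max_width, round(height * max_width / width)

print(within_budget("<p>hello</p>" * 100, {"logo.gif": bytes(3000)}))  # (True, 4200)
print(fit_width(640, 480))  # (176, 132)
```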
For example, arguably the most popular handset, the Motorola V3, shows 8 lines of 30 characters. On multilingual sites, in German for example, words can become unworkably long.
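To get a feel for an 8-line, 30-character screen, here's a sketch that wraps text to 30 columns and chunks it into 8-line screenfuls using Python's textwrap. Breaking over-long words mid-word, as the German example would need, is textwrap's default and just one possible policy; the screen size comes from the talk, everything else is assumption.

```python
import textwrap

COLS, ROWS = 30, 8   # Motorola V3-style screen, as quoted in the talk

def screens(text: str, cols: int = COLS, rows: int = ROWS):
    """Wrap text to the screen width, then split it into screenfuls."""
    lines = textwrap.wrap(text, width=cols, break_long_words=True)
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]

for page in screens("Rechtsschutzversicherungsgesellschaften " * 3):
    print("\n".join(page))
    print("-" * COLS)
```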
Code should be limited to basic XHTML Mobile 1.0 and basic CSS. WML is unworkable (memory limits etc.) unless you know your target community is largely WML. Notably missing from XHTML Mobile are things like headings and lists. Semantic Web ideals tend not to stand up. As CSS doesn't always work, it can't be relied upon for things like unindenting lists.
Recommended tools (Firefox extensions): Modify Headers, User Agent Switcher, WML Browser, XHTML Mobile Profile.
Users are bored or in need. Preserve your brand with logo, colours & copy. Navigation links are overrated - replace dropdowns with search/autocomplete etc.
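Swapping a long dropdown for search-as-you-type mostly means prefix matching on a sorted list, which is cheap enough to do server-side and keeps the page small. A sketch with Python's bisect module; the place names are hypothetical and matching is case-sensitive.

```python
import bisect

def autocomplete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms starting with prefix (terms must be sorted)."""
    start = bisect.bisect_left(sorted_terms, prefix)  # first candidate position
    out = []
    for term in sorted_terms[start:]:
        if len(out) == limit or not term.startswith(prefix):
            break
        out.append(term)
    return out

terms = ["Berlin", "Bern", "Birmingham", "Bristol", "Cambridge"]
print(autocomplete(terms, "Be"))
```

A handful of matches fits comfortably on a small screen, where a 50-entry dropdown does not.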
Development tips: target a device list; think like a phone; learn to live with it; mobile acid test: http://jwtmp.com/a
Dave Morin (Facebook) - The story behind the Facebook platform
Posted by rmp at 11:57 4th Oct 2007
Dave is engaging and talks fast, so there's more commentary and less transcription this time.
Facebook is an impressive example of social networking success. Growth is apparently incredible, doubling every 6 months, with nearly 50% of users returning every day.
Facebook is about enabling the mapping of the social graph. The power of the social graph enables near-viral transfer of functionality and features: Facebook Photos has nearly twice the traffic of all the other photo sites combined.
The Facebook platform aims to allow developers to create applications which leverage and add context to the social graph via deep integration with the platform. Like so many other apps presenting here, it all hinges on open APIs and transparent, open data access.
The Facebook opportunity chain looks something like this: innovation, growth, engagement, monetisation.
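The social graph is simply a graph of users and friendships, and the near-viral spread described above is essentially breadth-first reach through it. A toy Python sketch, assuming an adjacency-set representation and made-up users; it says nothing about how Facebook actually stores or queries its graph.

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:
            continue  # don't expand beyond the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen[friend] = seen[user] + 1
                queue.append(friend)
    return {u: d for u, d in seen.items() if u != start}

friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}
print(within_hops(friends, "ann", 2))
```

Each extra hop multiplies the audience an application can reach through friends' activity, which is the mechanism behind the growth figures quoted.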
7 utilities for improving application quality in Perl
Posted by rmp at 23:10 8th Oct 2007
I'd like to share with you a list of what are probably my top utilities for improving code quality (style, documentation, testing) with a largely Perl flavour. In loosely important-but-dull to exciting-and-weird order...
Test::More. Billed as yet another framework for writing test scripts, Test::More extends Test::Simple and provides a bunch of more useful methods beyond Simple's ok(). The ones I use most are use_ok() for testing compilation, is() for testing equality and like() for testing similarity with regexes.
ExtUtils::MakeMaker. Another of Mike Schwern's babies, MakeMaker is used to set up a folder structure and the associated 'make' paraphernalia when first embarking on a module or application. Although developers these days tend to favour Module::Build over MakeMaker, I prefer it for some reason (probably fear of change) and still get regular mileage out of it.
Test::Pod::Coverage - what a great module! It checks how good your documentation coverage is with respect to the code. No, just a subroutine header won't do! I tend to use Test::Pod::Coverage as part of...
Test::Distribution. Automatically runs a battery of standard tests including POD coverage, MANIFEST integrity, straight compilation and a load of other important things.
perlcritic, Test::Perl::Critic. The Perl::Critic set of tools is amazing. It's built on PPI and implements the guidelines from Damian Conway's book Perl Best Practices. Now I realise that not everyone agrees with a lot of what Damian says, but the point is that it represents a standard to work to (and it's not that bad once you're used to it). Since I discovered perlcritic I've been developing all my code as close to perlcritic -1 (the strictest, 'brutal' severity level) as I can. It has almost instantly made my applications more readable through a consistent appearance, and made faults easier to spot even before Test::Perl::Critic comes in.
Devel::Cover. I'm almost ashamed to say I only discovered this last week after dipping into Ian Langworth and chromatic's book 'Perl Testing'. Devel::Cover gives code-exercise metrics, i.e. how much of your module or application was actually executed by your tests. It collates stats from all modules matching a user-specified pattern and dumps them out in a natty coloured table, very suitable for tying into your CI system.
Selenium. OK, not strictly speaking a tool I'm using right this minute, but it's next on my list of integration tools. Selenium is a non-interactive, automated browser-testing framework written in JavaScript. This tool definitely has legs, and it seems to have come a long way since I first found it in the middle of 2006. I'm hoping to have automated interface testing up and running before the end of the year as part of the Perl CI system I'm planning to put together for the new sequencing pipeline.
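For readers coming from Python, the three Test::More functions I lean on most have rough equivalents in the standard unittest module. This is an analogy sketch, not Test::More itself: `import_module` stands in for use_ok(), `assertEqual` for is() and `assertRegex` for like(); the file-name pattern is hypothetical. Run it with `python -m unittest <file>`.

```python
import importlib
import unittest

class ModuleTests(unittest.TestCase):
    def test_compiles(self):
        # roughly use_ok('Module'): the module loads without error
        self.assertIsNotNone(importlib.import_module("json"))

    def test_equality(self):
        # roughly is($got, $expected, $name)
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    def test_pattern(self):
        # roughly like($got, qr/^run_\d+\.txt$/, $name)
        self.assertRegex("run_1234.txt", r"^run_\d+\.txt$")
```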
Development Communications
Posted by rmp at 23:46 3rd Mar 2008
For a while now, more or less since I switched teams (from Core Web to Sequencing Informatics), I've wanted to write more about the work we do at Sanger. So much of it is absolutely cutting-edge research, and a very large proportion of that is poorly communicated both inside and outside the institute. Most of it is biology, of course, which I know little about and couldn't discuss in detail; GCSE was as far as I took things in that direction.
However some of the great advances have been in big IT. We're in the same ballpark as CERN's high-energy physics and NASA's astronomical data. Technology is something I understand and /can/ talk about here.
So... I run the new sequencing technology pipeline development team. This means my team and I are responsible for ensuring efficient use of the Sanger's heavy investment in massively parallel sequencing instruments, primarily 28 Illumina Genome Analyzers. To do this we have a farm of 608 cores, a mix of 4- and 8-core Opteron blades with 8GB RAM, and a 320TB shared Lustre filesystem. It's becoming easy for users and administrators at Sanger to toss these figures around, but the truth of the matter is that whilst this kit fits in only a handful of racks, it's still a pretty big deal.
The blades run Linux, Debian Etch to be precise. The Illumina-distributed analysis pipeline (itself a mix of Perl, Python and C++) is held together with Perl applications (web and batch), which also cooperate RESTfully with a series of Rails LIMS applications developed by the Production Software team.
Roughly a terabyte of image data is spun off each of the 28 instruments every 2-3 days. The images are stacked and aligned, and sequences are basecalled from spot intensities. These short reads are then packaged up with quality values for each base and dropped into compressed result files of approximately 100MB, ready for further secondary analysis (e.g. SNP-calling).
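Putting those figures together gives a feel for the raw image throughput. A back-of-the-envelope sketch using only numbers from this post (28 instruments, roughly 1TB per run, a run every 2-3 days, 320TB of Lustre), and assuming, unrealistically, that nothing is ever deleted or shares the filesystem.

```python
INSTRUMENTS = 28
TB_PER_RUN = 1.0        # "roughly a terabyte" of images per instrument per run
DAYS_PER_RUN = (2, 3)   # a run every 2-3 days
LUSTRE_TB = 320         # shared Lustre filesystem

def daily_ingest_tb(days_per_run: float) -> float:
    """Terabytes of raw images per day across all instruments."""
    return INSTRUMENTS * TB_PER_RUN / days_per_run

for days in DAYS_PER_RUN:
    rate = daily_ingest_tb(days)
    print(f"run every {days} days: {rate:.1f} TB/day, "
          f"~{LUSTRE_TB / rate:.0f} days to fill {LUSTRE_TB} TB")
```

That works out at roughly 9-14TB a day, enough to fill the filesystem in about three to five weeks, which gives a sense of the pressure on storage.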
More to come later, but for now the take-home message is that the setup we're using is, in my opinion, a fair triumph and definitely one to be proud of. It's been a (fairly) harmonious marriage of tremendous hardware savvy from the systems group and the rapid turnaround of agile software development from Sequencing Informatics, of which I'm pleased to be a part.
ClearPress-146
Posted by rmp at 23:15 29th Apr 2008
The latest release of ClearPress (v146) went out to the CPAN yesterday. The ClearPress data model now implements belongs_to_through, belongs_to, has_many and has_many_through entity relationships for all you ActiveRecord lovers.
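For anyone unfamiliar with the ActiveRecord-style terms, here's what they mean in plain Python over in-memory lists. This illustrates the relationship semantics only; it is not ClearPress's API, and the instrument/run/lane entities and their foreign-key names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instrument:
    id: int
    name: str

@dataclass
class Run:
    id: int
    id_instrument: int   # foreign key: run belongs_to instrument

@dataclass
class Lane:
    id: int
    id_run: int          # foreign key: lane belongs_to run

INSTRUMENTS = [Instrument(1, "IL1"), Instrument(2, "IL2")]
RUNS = [Run(10, 1), Run(11, 1), Run(12, 2)]
LANES = [Lane(100, 10), Lane(101, 10), Lane(102, 11), Lane(103, 12)]

def instrument_of_run(run):        # belongs_to
    return next(i for i in INSTRUMENTS if i.id == run.id_instrument)

def runs_of(instrument):           # has_many
    return [r for r in RUNS if r.id_instrument == instrument.id]

def lanes_of(instrument):          # has_many_through runs
    run_ids = {r.id for r in runs_of(instrument)}
    return [lane for lane in LANES if lane.id_run in run_ids]

def instrument_of_lane(lane):      # belongs_to_through run
    run = next(r for r in RUNS if r.id == lane.id_run)
    return instrument_of_run(run)

print([lane.id for lane in lanes_of(INSTRUMENTS[0])])
```

In a real data model these lookups would be SQL joins rather than list scans, but the shape of each relationship is the same.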
Two ClearPress-derived projects are using a half-decent test fixture system. It's really making a big difference to the development of both DECIPHER and NPG, so I'm planning to bundle what can be bundled with an upcoming release.
