Sites, Applications, Solutions since 1995

Psyphi Blog v5

Apache Forward-Proxy REMOTE_ADDR propagation
Posted by rmp at 10:57 30th Sep 2008

I had an interesting problem this morning with the Apache forward-proxy supporting the WTSI sequencing farm.

It would be useful for the intranet service for tracking runs to know which (GA2) sequencer is requesting pages but because they're on a dedicated subnet they have to use a forward-proxy for fetching pages (and then only from intranet services).

Now I'm very familiar with using the X-Forwarded-For header and HTTP_X_FORWARDED_FOR environment variable (and their friends) which do something very similar for reverse-proxies but forward-proxies usually want to disguise the fact there's an arbitrary number of clients behind them, usually with irrelevant RFC1918 private IP addresses too.

So what I want to do is slightly unusual - take the remote_addr of the client and stuff it into a different header. I could use X-Forwarded-For but it doesn't feel right. Proxy-Via is also not right here as that's really for the proxy servers themselves. So, I figured mod_headers on the proxy would allow me to add additional headers to the request, even though it's forwarded on. Also following a tip I saw here using my favourite mod_rewrite and after a bit of fiddling I came up with this:

#########
# copy remote addr to an internal variable
#
RewriteEngine	On
RewriteCond	%{REMOTE_ADDR}		(.*)
RewriteRule	.*	-		[E=SEQ_ADDR:%1]
#########
# set X-Sequencer header from the internal variable
#
RequestHeader	set	X-Sequencer	%{SEQ_ADDR}e
These rules sit in the container managing my proxy, after ProxyRequests and ProxyVia and before a small set of ProxyMatch restrictions.

The RewriteCond traps the contents of the REMOTE_ADDR environment variable (it's not an HTTP header - it comes from the end of the network socket as determined by the server). The RewriteRule unconditionally copies the last RewriteCond match %1 into a new environment variable SEQ_ADDR. After this mod_headers sets the X-Sequencer request header (for the proxied request) to the value of the SEQ_ADDR environment variable.

This works very nicely though I'd have hoped a more elegant solution would be this:

RequestHeader set X-Sequencer %{REMOTE_ADDR}e
but this doesn't seem to work and I'm not sure why. Anyway, by comparing $ENV{HTTP_X_SEQUENCER} to a shared lookup table, the sequencing apps running on the intranet can now track which sequencer is making requests. Yay!
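For what it's worth, the lookup on the receiving side can be sketched in a few lines. This is Python rather than the Perl the intranet apps use, and the table contents, the addresses and the names SEQUENCERS and sequencer_for are all made up for illustration. The only thing taken from the setup above is that the header turns up as HTTP_X_SEQUENCER in the CGI-style environment.

```python
# Hypothetical lookup table mapping sequencer IP addresses to instrument names.
SEQUENCERS = {
    "10.0.0.11": "GA2-01",
    "10.0.0.12": "GA2-02",
}


def sequencer_for(environ):
    """Return the instrument name for a request, or None if it is unknown."""
    # The proxy sets X-Sequencer; CGI-style environments expose it
    # as HTTP_X_SEQUENCER.
    addr = environ.get("HTTP_X_SEQUENCER", "").strip()
    return SEQUENCERS.get(addr)


print(sequencer_for({"HTTP_X_SEQUENCER": "10.0.0.12"}))
print(sequencer_for({}))
```

Requests which didn't come through the proxy simply have no header and fall through to None, so the app can treat them as ordinary intranet clients.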

14 Mac OSX Apps I can't do without
Posted by rmp at 11:49 16th May 2008

I recently did a clean installation of my ppc powerbook, about 3 or 4 years old. I was able to surreptitiously acquire a copy of Leopard, not yet officially supported by the systems group at Sanger. Moving from Panther this was a bit of a jump but everything went pretty smoothly. I chose to do a clean installation rather than an upgrade because I had so much cruft on the laptop I only wanted to be left with the things I actually used.

Reinstalling my non-O/S applications afterwards (especially those which I'd been version-marooned on, not being on the more common Tiger release) made me really appreciate the ones I actually use. In the order they're on my taskbar (no particular order) here are the apps I can't do without:

Microsoft Remote Desktop

Great application - unimaginably useful - I remote administer most of my extended family's computers with this (over SSH) now. The ability to mount local drives remotely is a blessing and it generally performs better than VNC which I also use for administering older PCs without the RDP service (primarily WinXP Home).

Adium

In my opinion the best unified instant messenger client out there. It even comes with Twitter support.

Skype

Of course - everyone should be using something like this. I previously had X-Lite too but didn't tend to use it very much, most of my social network being on Skype.

Colloquy

A fantastic IRC client. I've fond memories of mIRC on Windows, BitchX and X-Chat being the other clients I use occasionally. I usually find myself in Colloquy for my IRC needs these days.

Firefox

Primarily for development. I still find it doesn't work brilliantly for regular surfing but the development tools are unparalleled.

Camino

My day to day browser - the same engine as Firefox under the hood but better-integrated with the OSX control panel and preferences.

Audacity

The best, cross-platform, audio editor out there. WAV and MP3 support amongst other things. Simple and easy to use.

GIMP

Probably the closest free thing to Photoshop. Enough said.

VLC

The VideoLAN client is my preferred video player. It supports all the formats I've ever thrown at it and has shoutcast support amongst other things, too. MPlayer is another one I use occasionally too as I find the subtitle support better.

humaxGui

At home I have a Humax PVR and this app provides file transfer on/off it. It's *really* slow but it works just like an FTP client.

Aquamacs Emacs

The time I don't spend in a web browser or terminal I spend in Aquamacs. In my opinion the port of Emacs to OSX with the best spread of features.

NeoOffice

I used to prefer OpenOffice but it had to run under X. I started using NeoOffice as after the reinstall I noticed it had been ported to OpenOffice 2 which has much better foreign file support. NeoOffice runs natively (though it is Java).

SSH Keychain

Open virtually all the time, I manage all of my remote work and administration via SSH Keychain, particularly the tunnel management. It could all be done using command-line ssh and the .ssh/config if pushed but I like the auto-restart and convenience of having it in a desktop application.

Quicksilver

Shortcuts for everything and everything via its shortcut. Quicksilver is impressive and I know I've hardly scratched the surface with the things it can do.

Looking through that list there aren't many other applications I have which I couldn't do without - MacPorts is worth a mention, as are the MySQL GUI Tools and Processing plus various other drivers and applications.


Massively Parallel Sequence Archive
Posted by rmp at 00:02 30th Apr 2008

For some time now at Sanger we've been looking at the problems and solutions involved with building services supporting what are likely to become some of the biggest databases on the planet. The biggest problem is there aren't too many people doing this kind of thing and who are willing to talk about it.

The data we're storing falls into two categories: Short Read Format (SRF) files containing sequence, quality and trace data (~10GB per lane), and FastQ files containing sequence and quality (~1GB per lane).

Our requirements for these data are fundamentally for two different systems. One is a long-term archival system for SRF, the responsibility for which will eventually be shifted to the EBI. The second is, for me at least, the more interesting system -

The short-term storage of reads and qualities (and possibly also for selected alignments) isn't the biggest problem - that honour is left to the fast, parallel retrieval of the same. The underlying data store needs to grow at a respectable 12TB per year and serve maybe a hundred simultaneous users requesting up to 1000 sequences per second.

Transfer times for reads are small but as a result are disproportionately affected by artefacts like TCP setup times, HTTP header payloads and certainly index seek times.
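To put a rough number on that, here's a back-of-envelope sketch in Python. The figures are assumptions for illustration rather than measurements: a 36-base read with one quality byte per base, and about 300 bytes of HTTP request and response headers per fetch. TCP setup and index seeks come on top of this.

```python
def payload_fraction(payload_bytes, header_bytes):
    """Fraction of the bytes on the wire that are actual read data."""
    return payload_bytes / (payload_bytes + header_bytes)


# Assumed figures, not measurements: 36 bases plus 36 quality bytes,
# and roughly 300 bytes of HTTP headers for the request and response.
read_bytes = 36 * 2
header_bytes = 300

fraction = payload_fraction(read_bytes, header_bytes)
print(f"{fraction:.0%} of the transfer is payload")
```

With those assumptions only about a fifth of each single-read transfer is useful data, which is the argument for batching reads per request wherever the access pattern allows it.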

We're looking at a few horizontally-scaling solutions for performing these kinds of jobs - the most obvious are tools like MapReduce and equivalents like Hadoop running with Nutch. My personal favourite and the one I'm holding out for is MogileFS from the same people who brought you Memcached. Time to get benchmarking!

Updated: Loved this via Brad

ClearPress-146
Posted by rmp at 23:15 29th Apr 2008

Latest release of ClearPress (v146) out to the CPAN yesterday. The ClearPress data model now implements belongs_to_through, belongs_to, has_many and has_many_through entity relationships for all you ActiveRecord lovers.

Two ClearPress-derived projects are using a half-decent test fixture system. It's really making a big difference to the development of both DECIPHER and NPG so I'm planning to bundle what can be bundled with an upcoming release.

History Meme
Posted by rmp at 14:55 17th Apr 2008

On the laptop:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head
135 prove
 80 svn
 71 make
 27 cover
 27 HARNESS_PERL_SWITCHES=-MDevel::Cover
 23 scripts/yaml_dumper
 22 open
 21 perl
 18 ls
 11 pwd
On the workstation:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head
  149 ls
   51 perl
   45 svn
   42 cd
   24 tail
   15 df
   14 rm
   11 less
   11 cat
    9 grep
Very satisfying to see that about half the top 10 things on my laptop are related to testing. Sadly the same isn't true on my workstation. yaml_dumper dumps data from a mysql database in YAML format for use in ClearPress fixtures.
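For anyone without awk to hand, the same tally can be sketched in Python. The function name top_commands and the sample lines are made up for illustration. It counts the second whitespace-separated field of each history line, which is what awk '{print $2}' picks out, though unlike awk it skips lines with fewer than two fields.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Tally the second field of each history line and return the n most common."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:  # field 1 is the history number, field 2 the command
            counts[fields[1]] += 1
    return counts.most_common(n)


# Made-up sample in the shape that `history` prints.
sample = ["  1  prove -v t", "  2  svn up", "  3  prove t/00.t"]
for command, count in top_commands(sample):
    print(f"{count:4d} {command}")
```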

Matt prompted me. I tag Andy and Jody to run with it.

Infrared Pen MkI
Posted by rmp at 00:06 2nd Apr 2008

So, this evening, not wanting to spend more time on the computer (having been on it all day for day 2 of DB's Rails course) I spent my time honing my long-unused soldering skills and constructing the first revision of my infrared marker pen for the JCL-special Wiimote Whiteboard.

The raw materials:

http://psyphi.net/gfx/ir_pen/IMG_0155.JPG

Close-up of the LEDs I'm removing:

http://psyphi.net/gfx/ir_pen/IMG_0157.JPG

The finished article:

http://psyphi.net/gfx/ir_pen/IMG_0159.JPG

Close-up of switch detail:

http://psyphi.net/gfx/ir_pen/IMG_0160.JPG

Activated under the IR-sensitive digital camera:

http://psyphi.net/gfx/ir_pen/IMG_0161.JPG

I must say it's turned out ok. I didn't have any spare small switches so went for a bit of wire with enough springiness in it. On the opposite side of the makeshift switch is a retaining screw for holding the batteries in. I'm using two old AAA batteries (actually running about 2.4V according to the meter) and no resistor in series. The LED hasn't burnt out yet!

To stop the pen switching on when not in use I slip a bit of electrical tape between the contacts. Obviously you can't tell when it's on unless you put in another, perhaps miniature, indicator visible LED.

It all fits together quite nicely though the retaining screw is too close for the batteries and has forced the back end out a bit - that's easy to fix.

As I'm of course after multitouch I'll be building the MkII pen soon with the other recovered LED!


Web frameworking
Posted by rmp at 23:47 31st Mar 2008

It seems to be the wrong time to be reading such things, but over on InfoQ there's a nice article introducing web development of RESTful services using Erlang and the Yaws high performance web server.

I say "the wrong time" as this week has kicked off the "Advancing with Rails" course by David A. Black of Ruby Power and Light fame. The course is fairly advanced in terms of required Rails knowledge so it's a bit of a baptism by fire for me and a few others having never written any Ruby before.

Rails is proving moderately easy to pick up but as I've remarked to a couple of people, it doesn't seem any easier coding with Rails than with Perl. Perhaps it's because I've never done it before but I reckon it's a lot harder spending my time figuring out what the heck DHH intended something to do than it is doing it myself.

Even though it's nowhere near as mature, I do reckon my ClearPress framework has a lot going for it - it's pretty feature-complete in terms of ORM, views and templating (TT2). It has similar convention over configuration features meaning it's not designed for plugging in other alternative layers but it is absolutely possible to do (and I suspect without as much effort as is required in Rails). I still need to iron out some wrinkles in the autogenerated code from the application builder and provide some default authorisation and authentication mechanisms, some of which may come in the next release. But in the meantime it's easy to add these features, which is exactly what we've done for the new sequencing run tracking app, NPG, to tie it to the WTSI website single sign on (MySQL and LDAP under the hood).


All Leoparded Up
Posted by rmp at 23:54 28th Mar 2008

Hurray! I managed to snag an OSX 10.5 installation disk today and took the opportunity to upgrade my 10.3.9 PPC Powerbook and skip 10.4 Tiger completely. Apparently this was a bit of a feat to perform at Sanger where 10.5 has hitherto been unsupported.

In the "Rails Club" meeting in preparation for Dave Black's Rails course next week it was pretty obvious that Leopard will spread quickly now those other unspeakable Mac users know it's out in the wild.

So the installation took about an hour and a half wallclock time, or about 30 minutes Microsoft time - too many bars with "1 minute remaining" for ten minutes. It all went pretty smoothly though I did opt for a full install rather than an upgrade. Unfortunately I've had to spend the best part of the last 6 hours installing DarwinPorts, gem updating to Rails 2 and reinstalling the long-awaited 10.5 versions of all my favourite apps - AquaMacs, Adium, CotVNC, VLC, Camino, Skype, Colloquy, Firefox and a few others. Plus of course setting it all up just the way I like it.

Initial impressions are that it's rather shiny and pleasant to use - I like Spaces & Dashboard (don't forget that wasn't in my old 10.3) and overall the setup definitely seems faster - surprisingly noticeable when compiling and installing things from CPAN. Can't wait to try out Time Machine over the weekend!


interactivity experiments
Posted by rmp at 23:08 26th Mar 2008

For a few months now I've been watching utterly compelling and inspirational HCI things like these. I know most of them are a bit dated now, in fact from as far back as 2006, but they're still jaw-droppingly awesome.

So in a fit of inspiration and weekend project madness and frustration at the clumsiness of a regular touch-screen LCD I've been picking up things from Ebay and fishing around in my boxes of knackered electronics to find components suitable for assembling one or two of these sorts of devices.

There are two types of these interactive interfaces - the JCL-style wiimote-based ones which use bright sources of infrared, either transmitted or reflected and the bluetooth Nintendo controller; and the second is the Jeff Han / Perceptive Pixel-style of frustrated total internal reflection or FTIR where infrared is reflected out of a planar surface and is picked up by a camera similar to the one in the wiimote.

Anyway, costs so far:

Wiimote: ~£28; old infrared remote control for filters & LEDs: free;

Philips bSure XG2 projector: ~£180; Philips SPC900NC: ~£30; 4.3mm CCTV lens (no IR filter): ~$12

I've been having trouble making the bluetooth pairing for the wiimote work correctly under OSX 10.3.9 - I think it's about time I had the laptop upgraded - it's work's after all. I think that should fix it for OSX, but I have had some success - this evening under Ubuntu with the BlueZ stack and libwiimote I've been able to capture events from the wiimote including spots using the IR camera. I've also been successful using camstream with the SPC900NC and CCTV lens to capture spots from working TV remotes, both directly and reflected from a wall - it's surprisingly effective!

More to come - next with the wiimote interface I need to build my whiteboard-marker battery-driven IR LED pen. Next with the FTIR display I need to experiment with a few different types of perspex and rear-reflection material. I *really* want to be able to perform pattern recognition similar to the reactable and I don't think tracing paper will work for rear-projection. Knowing next to nothing about plastics technology I think I'd like to try frosted acrylic first, or maybe just finely-sanded regular acrylic. Ebay here I come again!




Development Communications
Posted by rmp at 23:46 3rd Mar 2008

For a while now, more or less since I switched teams (from Core Web to Sequencing Informatics) I've wanted to write more about the work we do at Sanger. There's so much of it which is absolute cutting edge research and a very large proportion of that is poorly communicated both inside and outside the institute. Most of it's biology of course, which I know little about, and couldn't discuss in detail, GCSE being the furthest I took things in that direction.

However some of the great advances have been in big IT. We're in the same ballpark as CERN's high-energy physics and NASA's astronomical data. Technology is something I understand and /can/ talk about here.

So... I run the new sequencing technology pipeline development team. This means I and my team are responsible for ensuring efficient use of the Sanger's heavy investment in massively parallel sequencing instruments, primarily 28 Illumina Genome Analyzers. To do this we have a farm of 608 cores, a mix of 4- and 8-core Opteron blades with 8GB RAM and a 320TB shared Lustre filesystem. It seems to be becoming easy for users and administrators at Sanger to toss these figures around but the truth of the matter is that whilst this kit fits in only a handful of racks, it's still a pretty big deal.

The blades run Linux, Debian Etch to be precise. The Illumina-distributed analysis pipeline (itself a mix of Perl, Python and C++) is held together with Perl applications (web and batch) which also cooperate RESTfully with a series of Rails LIMS applications developed by the Production Software team.

Roughly a terabyte of image data is spun off each of the 28 instruments every 2-3 days. The images are stacked and aligned and sequences are basecalled from spot intensities. These short reads are then packaged up with quality values for each base and dropped into approximately 100MB compressed result files ready for further secondary analysis (e.g. SNP-calling).
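To give a feel for what that adds up to, here's the arithmetic as a small Python sketch. It uses only the figures quoted above and assumes, optimistically, that every instrument runs back to back with no idle time.

```python
INSTRUMENTS = 28   # Illumina Genome Analyzers, from the text
TB_PER_RUN = 1.0   # "roughly a terabyte" of images per instrument per run


def farm_tb_per_day(days_per_run):
    """Aggregate image data per day, assuming no idle time between runs."""
    return INSTRUMENTS * TB_PER_RUN / days_per_run


for days in (2, 3):
    print(f"{days}-day runs: {farm_tb_per_day(days):.1f} TB/day")
```

So somewhere between roughly 9 and 14 terabytes of raw images a day have to be moved onto the Lustre filesystem and processed before the next batch lands.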

More to come later but for now the take-home message is that the setup we're using is in my opinion a fair triumph, and definitely one to be proud of. It's been a (fairly) harmonious marriage of tremendous hardware savvy from the systems group and the rapid turnaround of agile software development from Sequencing Informatics, of which I'm pleased to be a part.

ClearPress-99
Posted by rmp at 22:10 3rd Mar 2008

Last week saw the latest release of ClearPress, http://search.cpan.org/~rpettett/ClearPress/ . ClearPress is a basic, RESTful, MVC Perl application framework I've developed in tandem with my work at the Sanger Institute, http://www.sanger.ac.uk/ .

The original aim of ClearPress was to provide a RESTful MVC framework which integrated with the Sanger's website single sign on. Having proved its usefulness with the first release of the tracking system I developed, ClearPress was spun off into a project of its own together with dependencies abstracted out of the Sanger-specific environment.

ClearPress sports a MySQL-backed ORM, automatic, extensible content-negotiation and easily-templated HTML, XML, Atom, RSS, JSON, iCal, YAML, PNG and other format views. It can run standalone, as CGI or under ModPerl::Registry.

I'm using ClearPress in most of my projects these days, both work and non-work. Blogs, document management, laboratory tracking and various other standalone apps. Hopefully soon there'll even be a dedicated site together with examples. For now you can check out the application-builder and example distributed with the package.

The Importance of Profiling
Posted by rmp at 21:23 10th Feb 2008

I've worked as a software developer and worked with teams of software developers for around 10 years now. Many of those whom I've worked with have earned my trust and respect in relation to development and testing techniques. Frustratingly however it's still with irritating regularity that I hear throw-away comments born of uncertainty and ignorance.

A couple of times now I've specifically been told that "GD makes my code go slow". Now for those of you not in the know GD (actually specifically Lincoln Stein's GD.pm in Perl) is a wrapper around Tom Boutell's most marvellous libgd graphics library. The combination of these two has always performed excellently for me and never been the bottleneck in any of my applications. The applications in question are usually database-backed web applications with graphics components for plotting genomic features or charts of one sort or another.

As any database-application developer will tell you, the database, or network connection to the database is almost always the bottleneck in an application or service. Great efforts are made to ensure database services scale well and perform as efficiently as possible, but even after these improvements are made they usually simply delay the inevitable.

Hence my frustration when I hear that "GD is making my (database) application go slow". How? Where? Why? Where's the proof? It's no use blaming something, a library in this case, that's out of your control. It's hard to believe a claim like that without some sort of measurement.

So.. before pointing the finger, profile the code and make an effort to understand what the profiler is doing. In database applications profile your queries - use EXPLAIN, add indices, record SQL transcripts and time the results. Then profile the code which is manipulating those results.

Once the results are in of course, concentrate in the first instance on the parts with the most impact (e.g. 0.1 second off each iteration of a 1000x loop rather than 1 second from /int main/ ) - the low hanging fruit. Good programmers should be relatively lazy and speeding up code with the least amount of effort should be commonsense.
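As a minimal sketch of what "profile first" looks like in practice, here's a toy example using Python's standard cProfile and pstats modules (for Perl you'd reach for a Perl profiler instead). The functions fetch_rows and draw_chart are made-up stand-ins for a slow database query and a graphics step, with the query's slowness simulated by a sleep.

```python
import cProfile
import io
import pstats
import time


def fetch_rows():
    """Hypothetical stand-in for a slow database query."""
    time.sleep(0.05)
    return list(range(1000))


def draw_chart(rows):
    """Hypothetical stand-in for the graphics step."""
    return sum(rows)


def report():
    return draw_chart(fetch_rows())


profiler = cProfile.Profile()
profiler.enable()
report()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

In a listing like this the time sits under fetch_rows rather than draw_chart, which is exactly the kind of evidence to have in hand before blaming the graphics library.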

Great pieces of code
Posted by rmp at 15:25 3rd Feb 2008

A lot of what I do day-to-day is related to optimisation. Be it Perl code, SQL queries, Javascript or HTML there are usually at least a couple of cracking examples I find every week. On Friday I came across this:

SELECT cycle FROM goldcrest WHERE id_run = ?


This query is being used to find the number of the latest cycle (between 1 and 37 for each id_run) in a near-real-time tracking system and is used several times whenever a run report is viewed.

EXPLAIN SELECT cycle FROM goldcrest WHERE id_run = 231;
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref   | rows   | Extra       |
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+
|  1 | SIMPLE      | goldcrest | ref  | g_idrun       | g_idrun |       8 | const | 262792 | Using where | 
+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+


In itself this would be fine but the goldcrest table in this instance contains several thousand rows for each id_run. So, for id_run, let's say, 231 this query happens to return approximately 588,000 rows to determine that the latest cycle for run 231 is the number 34.

To clean this up we first try something like this:

SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = ?


which still scans the 588000 rows (keyed on id_run incidentally) but doesn't actually return them to the user, only one row containing both values we're interested in. Fair enough, the CPU and disk access penalties are similar but the data transfer penalty is significantly improved.

Next I try adding an index against the id_run and cycle columns:

ALTER TABLE goldcrest ADD INDEX(id_run,cycle);
Query OK, 37589514 rows affected (23 min 6.17 sec)
Records: 37589514  Duplicates: 0  Warnings: 0


Now this of course takes a long time and, because the tuples are fairly redundant, creates a relatively inefficient index, also penalising future INSERTs. However, casually ignoring those facts, our query performance is now radically different:

EXPLAIN SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL |    NULL | NULL | NULL | Select tables optimized away | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;
+------------+------------+
| MIN(cycle) | MAX(cycle) |
+------------+------------+
|          1 |         37 | 
+------------+------------+
1 row in set (0.01 sec)


That looks a lot better to me now!
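The effect is easy to reproduce on a small scale. The sketch below uses Python's built-in sqlite3 module rather than MySQL, so the planner and its output differ from the EXPLAIN above, and the table contents, the payload column and the index name g_idrun_cycle are made up. The point is only that a composite (id_run, cycle) index lets the engine answer the MIN/MAX query from the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goldcrest (id_run INTEGER, cycle INTEGER, payload TEXT)")

# Made-up data: 3 runs, 37 cycles each, 20 rows per cycle.
rows = [
    (run, cycle, "x")
    for run in range(230, 233)
    for cycle in range(1, 38)
    for _ in range(20)
]
conn.executemany("INSERT INTO goldcrest VALUES (?, ?, ?)", rows)
conn.execute("CREATE INDEX g_idrun_cycle ON goldcrest (id_run, cycle)")

query = "SELECT MIN(cycle), MAX(cycle) FROM goldcrest WHERE id_run = ?"

# The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (231,)))
result = conn.execute(query, (231,)).fetchone()

print(plan)
print(result)
```

The plan should name the composite index rather than a full table scan, and the query comes back with the minimum and maximum cycle for the run.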

Generally I try to steer clear of the mysterious internal workings of database engines, but with much greater frequency I come across examples like this:

sub clone_type {
  my ($self, $clone_type, $clone) = @_;
  my %clone_type;
  if($clone and $clone_type) {
    $clone_type{$clone} = $clone_type;
    return $clone_type{$clone};
  }
  return;
}


Thankfully this one's pretty quick to figure out - they're usually *much* more convoluted, but still.. Huh??

Pass in a clone_type scalar, create a local hash with the same name (Argh!), store the clone_type scalar in the hash keyed at position $clone, then return the same value we just stored.

I don't get it... maybe a global hash or something else would make sense, but this works out the same:

sub clone_type {
  my ($self, $clone_type, $clone) = @_;
  if($clone and $clone_type) {
    return $clone_type;
  }
  return;
}
and I'm still not sure why you'd want to do that if you have the values on the way in already.
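If the intent was to remember clone types between calls, the mapping has to outlive the call. Purely as a guess at that intent, and in Python rather than Perl, it might look something like the sketch below. The class name CloneRegistry and the sample values are hypothetical, and the argument order differs from the original sub so that a lookup needs only the clone.

```python
class CloneRegistry:
    """Hypothetical holder for a clone -> clone_type mapping that persists between calls."""

    def __init__(self):
        self._clone_types = {}

    def clone_type(self, clone, clone_type=None):
        """Store clone_type when one is given, then return whatever is known for clone."""
        if clone and clone_type:
            self._clone_types[clone] = clone_type
        return self._clone_types.get(clone)


registry = CloneRegistry()
registry.clone_type("RP11-1", "BAC")   # setter-style call
print(registry.clone_type("RP11-1"))   # getter-style call, no type passed in
```

Here the stored value is still there on the second call, which is the one thing the local hash in the original could never do.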

Programmers really need to think around the problem, not just through it. Thinking through may result in functionality but thinking around results in both function and performance which means a whole lot more in my book, and incidentally, why it seems so hard to hire good programmers.

OpenWRT WDS bridging
Posted by rmp at 22:33 18th Dec 2007

I've had a pile of kit to configure recently for an office I've been setting up. Amongst the units I specified is the second Linksys WRT54GL I've had the opportunity to play with.

My one runs White Russian but I took the plunge and went with the latest Kamikaze 7.09 release. It's a little different to what I'd fiddled with before but probably more intuitive to configure with files rather than nvram variables. I'm briefly going to describe how to configure a wired switch bridged to the wireless network running WDS to the main site router (serving DHCP and DNS).

From a freshly unpacked WRT54GL, connect the ethernet WAN uplink to your internet connection and one of the LAN downlinks to a usable computer. By default the WRT DHCPs the WAN connection and serves DHCP on the 192.168.1 subnet to its LAN.

Download http://downloads.openwrt.org/kamikaze/7.09/brcm-2.4/openwrt-wrt54g-2.4-squashfs.bin to the computer then login to the WRT on 192.168.1.1, default account admin/admin. Upload the image to the firmware upgrade form. Wait for the upload to finish and the router to reboot.

Once it's rebooted you may need to refresh the DHCP lease on the computer but the default subnet range is the same iirc. telnet to the router on the same address and login as root, no password. Change the password and the SSH service is enabled and telnet service disabled.

I personally prefer using the X-Wrt interface with the Zephyr theme so I install X-Wrt by editing /etc/ipkg.conf and appending "src X-Wrt http://downloads.x-wrt.org/xwrt/kamikaze/7.09/brcm-2.4/packages". Back in the shell run "ipkg update ; ipkg install webif". Once completed you should be able to browse to the router's address (hopefully still 192.168.1.1) and continue the configuration. You may wish to install matrixtunnel for SSL support in the web administration interface.

I want to use this WRT both to extend the coverage of my client's office wireless network and to connect a handful of wired devices (1 PC, 1 Edgestore NAS and a NSLU2).

So step one is to assign the router a LAN address on my existing network. The WAN port is going to be ignored (although bridging that in as well is probably possible too). In X-wrt under 'Networks' I set a static IP of 192.168.1.253 , netmask of 255.255.255.0 and a router of 192.168.1.254 - the existing 'main router' BT homehub serving the LAN and whose wireless we'll be bridging to. The LAN connection type is 'bridged'. DNS in this case is the same as the main router. I've left the WAN as DHCP for convenience though the plan is not to use it. Save the settings and apply.

Under 'Wireless' turn the radio on and set the channel to the same as the main router. Choose 'lan' to bridge the wireless network to, set mode to 'Access Point', WDS on, 'broadcast ESSID' to your personal preference (I set 'on') and AP isolation off. The ESSID itself needs to be the existing name for your network and encryption set appropriately to match. Save and apply.

Now the magic bit - I'm told this should go in the BSSID box which only seems to be present when mode is set to WDS. What needs to happen is that the WRT needs to know which existing AP to bridge to. Under the hood it's done using the command 'wlc wds main-ap-mac-address-here' and not having an appropriate text box to put it in it's almost always possible to fiddle with the startup file. It's a hack for sure but it seems to work ok for me!

Lo! A WDS bridge.

Update 2008-01-07: After installing the bridge on-site I had to reconfigure it in "Client" mode using the regular WDS settings as that seemed to be the only way to make it communicate with the Homehub. Pity - that way it doesn't extend the wireless range, just hooks up anything wired to it. It worked fine when I set it up talking to my wrt.

What Can Bioinformaticians Learn from YouTube?
Posted by rmp at 22:46 6th Nov 2007

Caught Matt's talk this morning at the weekly informatics group meetings -

There were general murmurings of agreement amongst the audience but nobody asking the probing questions I'd hope for as a measure of interestedness.

Matt touched upon microformats in all but name - I was really expecting a sell of http://bioformats.org/ , websites as APIs and RESTful web services in particular.

Whilst I'm inclined to agree that standardised, discoverable, reusable web services are largely the way forward (especially as it keeps me in work) I'm not wholly convinced they remove the problems associated with, for example, database connections, database-engine specific SQL, hostnames, ports, accounts etc.

My feeling is that all the problems associated with keeping track of your database credentials are replaced by a different set of problems, albeit more standardised in terms of network protocols in HTTP and REST/CRUD. We now run the risk that what's fixed in terms of network protocols is pushed higher up the stack and manifests as myriad web services, all different. All these new websites and services use different XML structures and different URL schemes. The XML structures are analogous to database table schema and the URL schemes akin to table or object names.

At least these entities are now discoverable by the end user/developer simply by using the web application - and there's the big win - transparency and discoverability. There's also the whole microformat affair - once these really start to take off there'll be all sorts of arguments about what goes into them, especially in domains like Bio and Chem, not covered by core formats like hCard. But that's something for another day.

More over at Green_Is_Good
(0 comments)

7 utilities for improving application quality in Perl Posted by rmp at 23:10 8th Oct 2007 I'd like to share with you a list of what are probably my top utilities for improving code quality (style, documentation, testing) with a largely Perl flavour. In loosely important-but-dull to exciting-and-weird order...

Test::More . Billed as yet another framework for writing test scripts Test::More extends Test::Simple and provides a bunch of more useful methods beyond Simple's ok(). The ones I use most being use_ok() for testing compilation, is() for testing equality and like() for testing similarity with regexes.
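As a rough illustration of those three functions, here's a minimal test script. It's only a sketch: List::Util is used purely because it ships with Perl, and the values being tested are made up for the example.

```perl
use strict;
use warnings;
use Test::More tests => 3;    # declare how many tests we expect to run

# use_ok(): does the module compile and load?
use_ok('List::Util');

# is(): does the result exactly equal the expected value?
is(List::Util::sum(1, 2, 3), 6, 'sum() adds its arguments');

# like(): does the result match a regex?
like(join(',', sort qw(b a c)), qr/^a,b/, 'sorted list starts with a,b');
```

Saved as something like t/00-basics.t (a hypothetical filename), it can be run directly with perl or through prove.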

ExtUtils::MakeMaker . Another one of Mike Schwern's babies, MakeMaker is used to set up a folder structure and associated 'make' paraphernalia when first embarking on writing a module or application. Although developers these days tend to favour Module::Build over MakeMaker I prefer it for some reason (probably fear of change) and still make regular mileage using it.
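For anyone who hasn't met it, the heart of a MakeMaker-based distribution is a Makefile.PL. A minimal sketch might look like the following; the module name and version are hypothetical placeholders, not anything from a real project.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Generates a Makefile in the current directory.
WriteMakefile(
    NAME      => 'Hypothetical::Module',    # hypothetical distribution name
    VERSION   => '0.01',                    # hard-coded here; VERSION_FROM can read it from a .pm file instead
    PREREQ_PM => { 'Test::More' => 0 },     # dependencies; 0 means any version will do
);
```

The usual routine is then perl Makefile.PL, make, make test.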

Test::Pod::Coverage - what a great module! Check how good your documentation coverage is with respect to the code. No, just a subroutine header won't do! I tend to use Test::Pod::Coverage as part of...
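Used on its own, a pod-coverage test is only a few lines. This sketch follows the usual idiom of skipping the test when the module isn't installed; the minimum version number is an assumption for the example.

```perl
use strict;
use warnings;
use Test::More;

# Skip the whole test gracefully if Test::Pod::Coverage is not available.
eval "use Test::Pod::Coverage 1.04";
plan skip_all => 'Test::Pod::Coverage 1.04 required for testing POD coverage' if $@;

# Checks every module found under blib/ (or lib/) for documented subroutines.
all_pod_coverage_ok();
```

Dropped into the t/ directory of a distribution, it runs along with the rest of the test suite.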

Test::Distribution . Automatically run a battery of standard tests including pod coverage, manifest integrity, straight compilation and a load of other important things.

perlcritic, Test::Perl::Critic . The Perl::Critic set of tools is amazing. It's built on PPI and implements the Perl Best Practices book by Damian Conway. Now I realise that not everyone agrees with a lot of what Damian says but the point is that it represents a standard to work to (and it's not that bad once you're used to it). Since I discovered perlcritic I've been developing all my code as close to perlcritic -1 (the strictest setting, which reports violations at every severity level) as I can. It's almost instantly made my applications more readable through systematic appearance and made faults easier to spot even before Test::Perl::Critic comes in.

Devel::Cover . I'm almost ashamed to say I only discovered this last week after dipping into Ian Langworth and chromatic's book 'Perl Testing'. Devel::Cover gives code exercise metrics, i.e. how much of your module or application was actually executed by that test. It collates stats from all modules matching a user-specified pattern and dumps them out in a natty coloured table, very suitable for tying into your CI system.

Selenium . Ok, not strictly speaking a tool I'm using right this minute but it's next on my list of integration tools. Selenium is a non-interactive, automated, browser-testing framework written in Javascript. This tool definitely has legs and it seems to have come a long way since I first found it in the middle of 2006. I'm hoping to have automated interface testing up and running before the end of the year as part of the Perl CI system I'm planning on putting together for the new sequencing pipeline.
(0 comments)

Leisa Reichelt (disambiguity.com) - Ambient Intimacy Posted by rmp at 13:18 5th Oct 2007 Leisa presents an enjoyable voyage through cognitive psychology and the social network scene. Makes me wish I'd taken more of the psych options as part of my computer science degree.

ref: http://graphpaper.com/

Personal information bandwidth & learning speed have increased. New, lightweight yet extremely powerful means of communication represent ambient intimacy - a personal social platform. This isn't one to one messaging or one to the masses broadcasting, it's pushing messages into a defined area (multicast if you will). It represents the creation of a techno-social system beyond personal interaction - a more continuous interpersonal awareness.

In his book, "Grooming, gossip and the evolution of language", Dunbar describes how better social understanding leads to evolutionary growth of brains, improvement of language and better flexibility when competing for shared resources (food, sex etc.).

This intercommunication is largely a phatic expressiveness for virtual spaces. In linguistics a phatic expression is one whose only function is to perform a social task.

The phrase "continual partial friendship" coined by David Weinberger describes the almost permanent interconnectedness and friendship users feel when part of a collective virtual community built on these sorts of communication media.

"It's not about being poked and prodded, it's about exposing more surface area for others to connect with" - Johnnie Moore

New media (mobile 'phones, the internet) overcome geographical dislocation.

But it's often a love/hate thing (ref: http://twitter.com/ ) and can also cause problems of cognitive dissonance through false human interaction. When interacting virtually, the subconscious is devoid of its usual cues (facial expressions, tone of voice, body language), resulting in unnatural stress.

The other problem associated is information overload - "infomania dents IQ more than marijuana"

- anticipated reciprocity

- reputation

- sense of efficacy

- identification with a group

ref: Tom Coates' presentation on social software

It has been noted that a social network's pooled knowledge makes the whole network grow smarter. I'd personally take this further and suggest that any open data, social or otherwise but particularly in scientific contexts, makes the network grow smarter. ref: PLoS

As developers we need to support ambient intimacy. Applications need to be sympathetic to the fact that we as people are easily distracted. They need to be undemanding but intrusive enough to increase awareness of events.

- keep it lightweight

- stay out of the way

- open your API

- portable social networks

- use the periphery - antithesis of classical interface development/design

- allow for time-shifting

ref: Twitterrific

(0 comments)

Erika Hall, (Mule Design) - Copy is Interface Posted by rmp at 00:24 5th Oct 2007 Erika outlines some dos and don'ts for those of us writing copy and building interfaces.

Gesture driven interfaces are coming but not for rich/dense data. People will want to access your application in new ways, so what does this mean for applications? Are you beginning to take device independence into account?

Pretty much everything is/has a text-based interface. We as users need to draw meaning from a stream of data.

How do users benefit? Clarity & understanding often develop from immediately interacting with the data.

How do developers benefit? User adoption & success

5 ways to get words right

- be authentic - a strong sense of service focus. Add the human touch

- be engaging - ref: http://schoolofeverything.com/ immediate clarity of offering. Involves elements of empathy with users.

- be specific - disambiguation of meaning ref: http://etsy.com/

- be appropriate - understand what the role of your application is in your users' lives. Use copy, tone & concepts to build rapport.

- be polite - as long as you're considerate and respectful of what users are coming to do, users can be very forgiving ref: http://feedburner.com/ ref: http://subtraction.com/ - social engineering and implicit standards through copy

8 kinds of bad

- don't be vague

- don't use unnatural language (e.g. banks wanting to 'expand your relationship')

- don't be passive (e.g. third-person)

- don't be too clever/cute

- don't be rude

- don't be oblivious to your surroundings - you don't know how people are going to be accessing your app

- don't be inconsistent, e.g. my vs. your

- don't be presumptuous

Take home:

You will still need designers.

"You are sociable and entertaining"
(0 comments)

Eric Rodenbeck, (Stamen Design) - Next Generation Visualisations Posted by rmp at 00:18 5th Oct 2007 Eric takes us through some heavily visual applications developed by Stamen in the last few years. All aim to map detailed data spaces, that's to say structures which are too complicated for lists.

Eric and by extension Stamen see data visualisation as a medium. The data is mostly live, but when it's not it's either vast or deep.

Example: http://cabspotting.org/ showing something that's live contrasted with something that's historical. Cabspotting's animation of circulatory systems really set me thinking about how this sort of visualisation could be applied to biotech, PP interactions, gene ontologies, citations and real biophysical systems.

Example: Oakland crime: notice and explore interpretation of patterns. Built with {modest maps} framework ref: http://modestmaps.com/

Example: Digg labs: Swarm, Stack, Digg Spy, Ark. "Ambient engagement".

Example: Twitter blocks: 3-dimensional message space

Example: Real Estate flow: Trulia

Be open to the process of exploration - start with the data, not preconceived views of what it should look like.

Stamen_Design

I loved the question from AbilityNet about whether Eric had thought about accessibility for these apps. It had him completely stumped and I have to admit adding serious accessibility to these apps, whilst being extremely cool, would also be extremely difficult.

I'd have liked to see something more immersive and with a biotech or medical twist. I wish I'd had the chance to hook up with Eric to discuss efforts in this sort of sphere.
(0 comments)

John Aizen & Eran Shir (Dapper) - Practical Semantic Web (web plumbing 101) Posted by rmp at 00:08 5th Oct 2007 TBL once said "in the future everyone will write semantically correct websites" but the vision of a world of personalised agents has not come about. Largely things failed to take off because making things semantically correct is expensive - it requires effort. Luckily current web apps are changing this with APIs, content distribution & aggregation and meaningful search.

How has this come about?

- The Feed

- Light, easily adopted technologies: e.g. REST vs. SOAP; AJAX vs. Server-side; Microformats vs. RDF+OWL

- Increasing openness, encouraging mashups via APIs and low-effort semantics

Introducing Dapper: Creating APIs for other websites, mostly community-generated. The users have the time and incentive. Dapper extensions can then be reused as services on top of other platforms, e.g. Pipes, Google gadgets & Facebook.

Example: Semantically linked advertising, e.g. a loaded shopping cart built on a recipe page.

Example: Meaningful search w/ results dissected enabling search, drill-down and filtering by automatically indexed categories

Dapper attempts to address the serious issues of fragility commonly associated with classical screen-scraping using elements of graph theory and community power.

Dapper also incorporates the gamut of CC licensing to better enable site-owners to control their content whilst boosting consumer confidence in reliability of data.
(0 comments)
