Sites, Applications, Solutions since 1995

Psyphi Blog v5

Matt Mullenweg (WordPress) - The Architecture Behind WordPress.com
Posted by rmp at 15:18 3rd Oct 2007

Scaling platform:

88m global uniques, 1.5m blogs, 215m pageviews/day

- 7 boxes, $1500/mo.

- 2 balancers, 2GB memory, any disk, Pound + Wackamole + Spread.

- 2 databases, 4GB+ memory, fast disk (RAID), master + slave MySQL, split read/write.

- 3 webs, fast CPU, 2GB memory, LiteSpeed or well-configured Apache.

- Everything in Subversion.

- Be stateless (shared nothing)

- Memcached
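The "split read/write" idea from the database bullet can be sketched roughly as follows. This is a minimal illustration, not how WordPress.com does it: the `ReadWriteRouter` class and the connection names `db-master` and `db-slave` are made up for the example, and a real setup would also have to cope with replication lag (a read sent to the slave straight after a write may be stale).

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads if no slave is configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def pick(self, sql):
        """Return the name of the connection that should run this statement."""
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in ("SELECT", "SHOW"):
            return next(self._read_cycle)
        # Anything that might modify data goes to the master.
        return self.master


# Hypothetical connection names, for illustration only.
router = ReadWriteRouter("db-master", ["db-slave"])
print(router.pick("SELECT * FROM posts WHERE id = 1"))
print(router.pick("UPDATE posts SET title = 'x' WHERE id = 1"))
```

Keeping the routing decision in one small place like this also fits the "be stateless" point: the web boxes hold no data of their own, they just know where to send each query.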

Geo-targeting of DNS; might as well use a CDN.

Single box: 300 req/sec, 29.5m/day.
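As a rough sanity check on the single-box figure, here is the arithmetic as a minimal Python sketch (the function names are mine, for illustration). A sustained 300 req/sec comes to about 25.9m requests per day, so 29.5m/day would correspond to an average nearer 341 req/sec; the two numbers as noted do not quite line up.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(req_per_sec):
    """Requests served in a day at a constant rate."""
    return req_per_sec * SECONDS_PER_DAY


def required_rate(requests_in_day):
    """Average req/sec needed to serve a given daily total."""
    return requests_in_day / SECONDS_PER_DAY


print(requests_per_day(300))          # 25920000
print(round(required_rate(29.5e6)))   # 341
```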

Scaling community:

Ref: sxsw presentation

Scaling business:

Tie revenue streams to cost scaling, e.g. pageviews rather than bloggers. Ads scale revenues with pageviews, effective CPM, low outlay. VIP class of users with advanced features, customisation etc.
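The point about ads scaling with pageviews is just the effective CPM formula: revenue = pageviews / 1000 x eCPM. A small sketch, with entirely hypothetical numbers (the $0.50 eCPM and the pageview counts are assumptions for illustration, not figures from the talk):

```python
def ad_revenue(pageviews, ecpm):
    """Ad revenue for a number of pageviews at an effective CPM.

    ecpm is the revenue earned per 1000 pageviews.
    """
    return pageviews / 1000 * ecpm


# Hypothetical values, purely for illustration.
ECPM = 0.50  # assumed effective CPM in dollars

for daily_pageviews in (1_000_000, 2_000_000, 4_000_000):
    print(daily_pageviews, ad_revenue(daily_pageviews, ECPM))
```

Doubling the pageviews doubles the revenue, which is the property you want if serving cost also grows with pageviews.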

Scaling people:

Hire people as good as, or better than, you. Great people = rich environment + worthwhile problems. Five things to look for:

- Personality fit (when the shit hits the fan)

- Ability to learn (curiosity)

- Taste (can't be taught)

- Passion for the space

- Familiarity with current technologies

Don't put out too easily - don't hire if any doubts exist.

It's fun to know that the folks at WordPress manage deployment with ssh scripts running svn up.
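Something in the spirit of "ssh scripts running svn up" might look like the sketch below. This is my guess at the shape of it, not the actual WordPress scripts: the host names and the checkout path are hypothetical, and the script only prints the commands unless `dry_run` is switched off.

```python
import shlex
import subprocess


def svn_up_command(host, checkout_dir):
    """Build the ssh command that runs `svn up` in checkout_dir on host."""
    remote = "cd {} && svn up".format(shlex.quote(checkout_dir))
    return ["ssh", host, remote]


def deploy(hosts, checkout_dir, dry_run=True):
    """Update the checkout on each host in turn.

    With dry_run=True the commands are only printed, never executed.
    """
    for host in hosts:
        cmd = svn_up_command(host, checkout_dir)
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # check=True stops the rollout at the first host that fails.
            subprocess.run(cmd, check=True)


# Hypothetical hosts and path, for illustration only.
WEB_HOSTS = ["web1", "web2", "web3"]
deploy(WEB_HOSTS, "/var/www/site")
```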

I disagree with Matt's Q&A response suggesting that you should never do your own hosting. In principle that's OK for the large faceless mass of current web sites and traffic, but under special circumstances doing it yourself is the only way. I build web apps on top of bioinformatics pipelines supporting (today) 320TB of data and there's no way on Earth this would be possible to outsource.

Dave Morin (Facebook) - The story behind the Facebook platform
Posted by rmp at 11:57 4th Oct 2007

Dave's engaging and talking fast, so more commentary and less transcription this time. Facebook is an impressive example of social networking success. Growth is apparently incredible, doubling every 6 months, with nearly 50% of users returning every day.

Facebook is about enabling mapping of the social graph. The power of the social graph enables near-viral transfer of functionality and features. Facebook Photos has nearly twice the traffic of all the other photo sites combined.

The Facebook platform aims to allow developers to create applications which leverage and add context to the social graph via deep integration with the platform. Like so many other apps presenting here, it all hinges on open APIs and transparent, open data access.

The Facebook opportunity chain looks something like this: Innovation, growth, engagement, monetisation.

Massively Parallel Sequence Archive
Posted by rmp at 00:02 30th Apr 2008

For some time now at Sanger we've been looking at the problems and solutions involved with building services supporting what are likely to become some of the biggest databases on the planet. The biggest problem is that there aren't many people doing this kind of thing who are also willing to talk about it.

The data we're storing fall into two categories: Short Read Format (SRF) files containing sequence, quality and trace data (~10GB per lane), and FastQ files containing sequence and quality (~1GB per lane).

Our requirements for these data are fundamentally for two different systems. One is a long-term archival system for SRF, the responsibility for which will eventually be shifted to the EBI. The second is, for me at least, the more interesting system:

The short-term storage of reads and qualities (and possibly also for selected alignments) isn't the biggest problem - that honour is left to the fast, parallel retrieval of the same. The underlying data store needs to grow at a respectable 12TB per year and serve maybe a hundred simultaneous users requesting up to 1000 sequences per second.

Transfer times for reads are small but as a result are disproportionately affected by artefacts like TCP setup times, HTTP header payloads and certainly index seek times.
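To get a feel for how much those fixed costs matter, here is a back-of-envelope sketch. The timings are assumed, illustrative values rather than measurements, and the idea of packing several reads into one request is only used here as a way of showing how the fixed cost dominates when each request carries a single short read.

```python
def time_per_read(fixed_overhead_ms, transfer_ms_per_read, reads_per_request):
    """Average wall time per read when one request carries several reads."""
    total = fixed_overhead_ms + transfer_ms_per_read * reads_per_request
    return total / reads_per_request


# Assumed, illustrative timings (not measurements).
OVERHEAD_MS = 5.0   # TCP setup + HTTP headers + index seek, per request
TRANSFER_MS = 0.1   # payload transfer for one short read

for batch in (1, 10, 100, 1000):
    print(batch, round(time_per_read(OVERHEAD_MS, TRANSFER_MS, batch), 3))
```

With these made-up numbers a single read per request spends about 98% of its time on overhead, which is why the choice of protocol and index matters more than raw bandwidth here.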

We're looking at a few horizontally-scaling solutions for performing these kinds of jobs - the most obvious are tools like MapReduce and equivalents like Hadoop running with Nutch. My personal favourite and the one I'm holding out for is MogileFS from the same people who brought you Memcached. Time to get benchmarking!

Updated: Loved this via Brad