<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">Feed for tag platform</title>
  <link href="http://www.psyphi.net/cgi-bin/blog/tag/platform"/>
  <link rel="self" href="http://www.psyphi.net/cgi-bin/blog/tag/platform.atom"/>
  <updated></updated>
  <author>
    <name>PsyPhi Group</name>
  </author>
  <id>urn:uuid:</id>
<entry>
 <title type="html">Matt Mullenweg (WordPress) - The Architecture Behind WordPress.com</title>
 <summary type="html">Scaling platform:&#x3C;br /&#x3E;&#x3C;br /&#x3E;88m global uniques&#x3C;br /&#x3E;
1.5m blogs&#x3C;br /&#x3E;
215m pageviews/day&#x3C;br /&#x3E;&#x3C;br /&#x3E; - 7 boxes, $1500/mo.&#x3C;br /&#x3E;&#x3C;br /&#x3E; - 2 balancers, 2GB memory, any disk, Pound + Wackamole + Spread.&#x3C;br /&#x3E;&#x3C;br /&#x3E; - 2 databases, 4GB+ memory, fast disk (RAID), MySQL master+slave, split reads/writes.&#x3C;br /&#x3E;&#x3C;br /&#x3E; - 3 web servers, fast CPU, 2GB memory, LiteSpeed or a well-configured Apache.&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Everything in Subversion.&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Be stateless (shared-nothing).&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Memcached.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Geo-targeting of DNS: you might as well use a CDN.&#x3C;br /&#x3E;&#x3C;br /&#x3E;A single box: 300 req/sec, 29.5m/day.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Scaling community:&#x3C;br /&#x3E;&#x3C;br /&#x3E;See the SXSW presentation.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Scaling business:&#x3C;br /&#x3E;&#x3C;br /&#x3E;Tie revenue streams to cost scaling, e.g. to pageviews rather than to bloggers.&#x3C;br /&#x3E;
Ads scale revenue with pageviews: effective CPM, low outlay.&#x3C;br /&#x3E;
A VIP class of users with advanced features, customisation etc.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Scaling people:&#x3C;br /&#x3E;&#x3C;br /&#x3E;Hire people as good as, or better than, you.&#x3C;br /&#x3E;
Great people = rich environment + worthwhile problems.&#x3C;br /&#x3E;
5 things to look for:&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Personality fit (when the shit hits the fan)&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Ability to learn (curiosity)&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Taste (can&#x27;t be taught)&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Passion for the space&#x3C;br /&#x3E;&#x3C;br /&#x3E; - Familiarity with current technologies&#x3C;br /&#x3E;&#x3C;br /&#x3E;Don&#x27;t put out too easily: don&#x27;t hire if doubts exist.&#x3C;br /&#x3E;&#x3C;br /&#x3E;It&#x27;s fun to know that the folks at WordPress manage deployment with SSH scripts running svn up.&#x3C;br /&#x3E;&#x3C;br /&#x3E;I disagree with Matt&#x27;s Q&#x26;A response suggesting that you should never do your own hosting. In principle that&#x27;s fine for the large, faceless mass of current web sites &#x26; traffic, but in special circumstances hosting it yourself is the only way. I build web apps on top of bioinformatics pipelines supporting (today) 320TB of data, and there&#x27;s no way on Earth that could be outsourced.
</summary>
 <link href="http://www.psyphi.net/cgi-bin/blog/entry/22"/>
 <updated>2007-10-03T15:18:05Z</updated>
 <id>urn:uuid:12c6fc06c99a462375eeb3f43dfd832b08ca9e17</id>
</entry>
<entry>
 <title type="html">Dave Morin (Facebook) - The story behind the Facebook platform</title>
 <summary type="html">Dave&#x27;s engaging and talking fast so more commentary and less transcription this time.
Facebook is an impressive example of social networking success. Apparently incredible growth, doubling every 6 months with near 50% users returning every day.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Facebook is about enabling mapping of the social graph. The power of the social graph enables near viral transfer of functionality and features. Facebook photos has nearly twice the traffic of all the other photo sites combined.&#x3C;br /&#x3E;&#x3C;br /&#x3E;The Facebook platform aims to allow developers to create applications which leverage and add context to the social graph via deep integration with the platform. Like so many other apps presenting here, it all hinges on open APIs and transparent, open data access.&#x3C;br /&#x3E;&#x3C;br /&#x3E;The Facebook opportunity chain looks something like this: Innovation, growth, engagement, monetisation.
</summary>
 <link href="http://www.psyphi.net/cgi-bin/blog/entry/31"/>
 <updated>2007-10-04T11:57:50Z</updated>
 <id>urn:uuid:632667547e7cd3e0466547863e1207a8c0c0c549</id>
</entry>
<entry>
 <title type="html">Massively Parallel Sequence Archive</title>
 <summary type="html">For some time now at &#x3C;a href=&#x22;http://www.sanger.ac.uk/&#x22;&#x3E;Sanger&#x3C;/a&#x3E; we&#x27;ve been looking at the problems and solutions involved in building services to support what are likely to become some of the biggest databases on the planet. The biggest problem is that there aren&#x27;t many people doing this kind of thing who are also willing to talk about it.&#x3C;br /&#x3E;&#x3C;br /&#x3E;The data we&#x27;re storing fall into two categories: Short Read Format (SRF) files containing sequence, quality and trace data (~10GB per lane), and FastQ files containing sequence and quality data (~1GB per lane).&#x3C;br /&#x3E;&#x3C;br /&#x3E;These data fundamentally call for two different systems. One is a long-term archival system for SRF, responsibility for which will eventually shift to the &#x3C;a href=&#x22;http://www.ebi.ac.uk/&#x22;&#x3E;EBI&#x3C;/a&#x3E;. The second is, for me at least, the more interesting system: short-term storage of reads and qualities (and possibly also selected alignments).&#x3C;br /&#x3E;&#x3C;br /&#x3E;Storing reads and qualities isn&#x27;t the biggest problem; that honour is left to fast, parallel retrieval of the same. The underlying data store needs to grow at a respectable 12TB per year and serve maybe a hundred simultaneous users requesting up to 1000 sequences per second.&#x3C;br /&#x3E;&#x3C;br /&#x3E;Each read is small, so its transfer time is short and, as a result, disproportionately affected by per-request overheads such as TCP connection setup, HTTP headers and, certainly, index seek times.&#x3C;br /&#x3E;&#x3C;br /&#x3E;We&#x27;re looking at a few horizontally scaling solutions for this kind of job. The most obvious are tools like &#x3C;a href=&#x22;http://labs.google.com/papers/mapreduce.html&#x22;&#x3E;MapReduce&#x3C;/a&#x3E; and equivalents like &#x3C;a href=&#x22;http://hadoop.apache.org/core/&#x22;&#x3E;Hadoop&#x3C;/a&#x3E; running with &#x3C;a href=&#x22;http://lucene.apache.org/nutch/&#x22;&#x3E;Nutch&#x3C;/a&#x3E;. My personal favourite, and the one I&#x27;m holding out for, is &#x3C;a href=&#x22;http://www.danga.com/mogilefs/&#x22;&#x3E;MogileFS&#x3C;/a&#x3E;, from the same people who brought you &#x3C;a href=&#x22;http://www.danga.com/memcached/&#x22;&#x3E;Memcached&#x3C;/a&#x3E;. Time to get benchmarking!&#x3C;br /&#x3E;&#x3C;br /&#x3E;Update: Loved &#x3C;a href=&#x22;http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html&#x22;&#x3E;this&#x3C;/a&#x3E;, via &#x3C;a href=&#x22;http://brad.livejournal.com/&#x22;&#x3E;Brad&#x3C;/a&#x3E;.</summary>
 <link href="http://www.psyphi.net/cgi-bin/blog/entry/53"/>
 <updated>2008-04-30T00:02:59Z</updated>
 <id>urn:uuid:c5b76da3e608d34edb07244cd9b875ee86906328</id>
</entry>

</feed>
