Mark Fletcher: Lessons Learned Birthing and Building Web Startups

This is part of my set of notes from the Startup School 2006 sessions at Stanford.

Mark Fletcher is a two-time entrepreneur with two successful companies under his belt:

  • OneList: At the end of 1999 merged with Egroups, acquired by Yahoo in June 2000. Product is now Yahoo Groups, and has 140K users
  • BlogLines: Launched in June 2002, this company was totally self-funded. He put in a total of $200K from start to finish and built it using only one salaried employee, four employees compensated only in stock, and outsourced labor hired through eLance. BlogLines was bought in 2005 by Ask Jeeves.

Garage Philosophy

Mark has his own “garage philosophy” – secrets to success for startups:

  • Passion for the idea: He’s been driven by solving problems he has himself. OneList was driven by his need for an easy way to set up a mail list. BlogLines was driven by the problem of having an enormous bookmark list that he visited every day. If you solve a problem that you have, other people most likely have the same problem. It’s easy to become enamoured of the technology and do things just because you can. Instead, focus on the stuff that interests you and solves a problem for you. This will be your life 24×7, so you’d better enjoy what you’re doing.
  • Cheap technologies: Design around the idea of cheap hardware and open source.
  • It doesn’t have to be perfect: Many people try to get it perfect before they launch. This should be avoided – launch early, launch often. OneList was really ugly at first, and missing huge chunks of functionality. Launching early kick-starts the “virtuous cycle” where active feedback from customers drives development. Half of the customer support email sent to BlogLines consists of feature ideas. It also engages customers and ties them more closely to your service as you address their needs.
  • Moonlighting limits risk: He worked a full-time job when starting BlogLines – he had a mortgage to support.
  • Friends/Family funds: When you do raise money, look to family/friends. The longer you can go without raising VC funding, the better position you’ll be in. BlogLines didn’t take VC funding until they had a million users. This put them in excellent negotiating position.
  • Free services = less pressure: You don’t have to worry about high availability when you’re offering a free service. People will cut you some slack.
  • Hire a lawyer: When he did OneList, he used an online service to incorporate in Delaware. He ended up having to redo things entirely, and in the end it didn’t really save much money or time.
  • Outsource to eLance/Rent A Coder: Although Mark wouldn’t recommend outsourcing core pieces of technology development, outsourcing can still be useful. For example, BlogLines has a notifier application – BlogLines put together a proposal, posted it on eLance and had some guys from Kazakhstan/Russia write the application to their exact specifications in two days. Ditto for translation services – they extracted all of the text, posted the proposal to eLance, and for $3500 had the service translated into 6 languages. Graphics is also a good place to try using outsourced labor. In general, Mark recommends making the proposal as specific as possible to guarantee success.
  • PR is the cheapest marketing you can do: The only way to value your service is in the growth of your user base, and the buzz surrounding your service. Anybody can come along and be smarter than you, but they can’t copy your users. Focus on viral growth – motivate users to be your best cheerleaders. Make it valuable for them to have their friends on the service. OneList pulled people in because people created mail lists for their friends – you couldn’t create a mail list that didn’t encourage people to sign up. Also, PR is cheap. Find someone who can develop relationships with reporters. BlogLines had great press coverage, including four write-ups in the Wall Street Journal. The PR person was working for stock!

Design Philosophies

One great resource that summarizes it better than Mark ever could: Amy Jo Kim’s presentation at Etech – Putting the Fun in Functional.

Technical Aspects

Software choices

  • Linux/Apache
  • C/C++/bash/python
  • DJB/qmail/DJBDNS/Daemontools (http://cr.yp.to)
  • ClearSilver (http://www.clearsilver.net)
  • Berkeley DB (http://www.sleepycat.com)
  • Memcached
  • Avoid NFS
  • Avoid table-level locking in MySQL
  • The faster your web site is, the more pageviews you get. Whenever BlogLines slowed down, they threw more hardware at it and saw 30% more pageviews, not from more users, but more use of the product from the existing users

Hardware choices

  • Dedicated servers v. Buying/Hosting: For BlogLines, they bought their own hardware. They had to find machines, find the co-lo, and deal with the network issues. They thought the advantage was that it would be cheaper in the long run, but he would not do this again. Renting the machines is the way to go – it just doesn’t cost that much.
  • eBay is a great place to buy hardware – you can find several wholesalers who will put together any machine you want, even new hardware.
  • APC PDUs for remote power cycling
  • HP ProCurve
  • Avoid Seagate Ultra-SCSI drives – they had 50% failure rate
  • A good phone for SSH which allows remote problem solving. He’s been in a casino, in front of a slot machine, fixing a server from his phone!

Architecture Choices

  • Copying files v. Client/Server: The BlogLines News RSS Feed is basically a text file that they copy between servers; copying files scales infinitely, and is a good solution for a lot of things. Not ideal for everything, but something to consider.
  • Calculate on the fly v. Cache: BlogLines generated cached pages of user accounts for spiders to crawl.
  • Memory v. Disk: Notifications in OneList – they were keeping track of how many emails were sent to a mail list in memory.
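
To make the "calculate on the fly v. cache" trade-off concrete, here is a minimal sketch in Python of a file-based page cache. It is not BlogLines' actual code: the names cached_page and write_atomically, the .html suffix, and the five-minute default are all assumptions made for illustration. The page is rendered only when no fresh copy exists on disk, and the file is written through a temporary file and a rename so that anything copying it to another server never sees a half-written page.

```python
import os
import tempfile
import time


def write_atomically(path, text):
    """Write text to path via a temp file plus rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def cached_page(cache_dir, key, render, max_age_seconds=300):
    """Return the cached page for key, calling render() only if the copy is missing or stale.

    key is assumed to be a safe file name (hypothetical convention for this sketch).
    """
    path = os.path.join(cache_dir, key + ".html")
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age_seconds:
            with open(path, encoding="utf-8") as handle:
                return handle.read()
    except FileNotFoundError:
        pass  # nothing cached yet, fall through and render
    page = render()
    write_atomically(path, page)
    return page
```

The cost of this design is staleness: a crawler may see a page up to max_age_seconds old, which is usually acceptable for pages meant for spiders.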

Storage Choices

  • Relational DB v. flat files: Don’t underestimate the power of flat files. BlogLines keeps subscriber information in a Sleepycat (Berkeley DB) database, and blog articles are all in flat files.
  • RAID v. Redundant hardware: OneList was storing mail list articles on RAID, which doesn’t lose data in case of failure, but can leave the data unavailable. With BlogLines, they replicate blog posts across machines, not just at the disk level, ensuring that even if storage fails, the service is still available.
  • Linux Software RAID: Prefer software RAID over hardware
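
The flat-file and replication ideas above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the BlogLines implementation: each "replica root" is a local directory standing in for a separate machine, and the function names and the two-character sharding scheme are hypothetical. The point is that the same article is written in several places and a read falls through to the next copy when one is unavailable.

```python
import os


def article_path(root, article_id):
    """Locate an article's flat file; shard by the first two characters of the id."""
    return os.path.join(root, article_id[:2], article_id + ".txt")


def store_article(replica_roots, article_id, body):
    """Write the same article to every replica root."""
    for root in replica_roots:
        path = article_path(root, article_id)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(body)


def load_article(replica_roots, article_id):
    """Read from the first replica that still has the article, skipping failed ones."""
    for root in replica_roots:
        try:
            with open(article_path(root, article_id), encoding="utf-8") as handle:
                return handle.read()
        except OSError:
            continue  # this replica is missing or unreadable, try the next
    raise KeyError(article_id)
```

In a real deployment the copies would live on different hosts and be pushed over the network; the fall-through read is what keeps the service available when one store fails.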

SysAdmin Choices

  • DNS round robin for web servers – no need for hardware load balancers
  • Hot backups for offline-processing
  • Worry about cooling at the co-lo. If you find yourself with a lot of hard drive failures, you should suspect a cooling issue. (Unless you determine, as they did, that Seagate drives suck).
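
As a rough illustration of why DNS round robin can stand in for a hardware load balancer, the Python sketch below simulates a name server that rotates the order of its address records on every query. RoundRobinZone and simulate_clients are toy, hypothetical names, the addresses come from the reserved documentation range, and the assumption that each client connects to the first address in its answer is a simplification.

```python
from collections import Counter, deque


class RoundRobinZone:
    """Toy stand-in for a DNS server that rotates the A records for one name."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def query(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return answer


def simulate_clients(zone, requests):
    """Count which server each request lands on, assuming clients use the first record."""
    return Counter(zone.query()[0] for _ in range(requests))


if __name__ == "__main__":
    zone = RoundRobinZone(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    print(simulate_clients(zone, 9))
```

Round robin spreads requests but does not notice a dead server, so a failed machine has to be removed from the record set by hand or by a monitoring script.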

Avoid making stupid bets

While working on OneList, he made a bet that if they were successful, they’d have a big party and he would shave his head. They were successful – baldness ho!

Audience Questions

  • What is your strategy for choosing what to outsource? Choose isolated pieces of functionality that can easily be developed separately from the rest of the code. He has experienced disaster outsourcing core functionality. Put a boundary around what you will or won’t outsource.

A Question Of Copyright

Poor Martin Schwimmer has stirred up quite a hornet’s nest (Scoble’s got the running summary) with his recent post on his decision to ask Bloglines to stop aggregating his blog’s RSS feed. While many are quick to criticize, I think it’s important to stop to examine the issue a little deeper to see if there’s any validity to his concerns.

First, let’s examine Martin’s opening volley:

This website is published under a Creative Commons license that allows for non-commercial use, provided there is attribution. Commercial use and derivative works are prohibited.

It was brought to my attention that a website named Bloglines was reproducing the Trademark Blog, surrounding it with its own frame, stripping the page of my contact info.

At first, you might think this is a bit ridiculous, but let’s break down the issue by examining the site’s licensing terms.

Non-Commercial Use

This is probably a pretty valid argument – although I’m not clear whether or not Bloglines is currently making money, they certainly are an ongoing commercial entity. But it is a fine line – I seem to remember there being similar grumblings in the Open Source community, back when people started building commercial services on the back of GPL software. Actually, now that I think about it, that argument is ongoing.

Maybe it would help if we took a step back. Can we agree that if someone took Martin’s page and sold it on t-shirts, then that would be an infringement? Absolutely. And if someone offered copies of the content printed and bound? Again, a blatant violation. But what about an individual user viewing the page through a commercial aggregator located on their desktop? No way in hell is that a violation.

But once the jump is made to a server-based aggregator that provides the same functionality, the line between commercial and non-commercial purpose becomes a little less certain.

(Another question: is a web-cafe that charges for Internet access in violation if one of its users views Martin’s site?)

Although Martin, in a response on his blog, laments:

At least with Google’s contextual ad program, the blog creator gets some money.

True. Although Martin certainly doesn’t make any money from Google when it creates a derivative work from his blog and displays the result in its search listings on Google.com. I wonder: has Martin contacted Google to have his site removed from its search engine? Apparently not.

No Derivative Works

You have to concede this one to Martin – “no derivative works” is a pretty clear statement.

Attribution

What constitutes “attribution”? Well, if you go according to the definition provided on the Creative Commons’ Licenses Explained page:

Attribution. You let others copy, distribute, display, and perform your copyrighted work – and derivative works based upon it – but only if they give you credit.

Hmm…that doesn’t really shed much light on it, does it? What constitutes “credit”? What “contact info” would satisfy Martin’s requirements under the Creative Commons licensing scheme? A telephone number? An address? A link directly to his contact page?

How about a link to the original web site?

A quick examination of any blog in Bloglines reveals that it displays a prominent banner featuring the name of the blog as part of its user interface. And a link to the original top-level blog URL. And links to each item on the original source blog. And the following description of the blog:

The Trademark Blog from the law offices of Schwimmer and Associates

If this doesn’t satisfy the criteria of “attribution”, what will?

An Interesting Twist

Up to now, the discussion has been focused on the terms of Martin’s Creative Commons license. But there’s an interesting twist: Martin’s RSS feed doesn’t actually contain his Creative Commons license! That’s right, if you examine the raw XML, you’ll find a “copyright” element with the contents:

Copyright 2005 Martin Schwimmer

Hmm. That’s interesting – given that every other page on Martin’s site contains an embedded link to his CC license, would I be right in thinking that the RSS is not subject at all to its licensing terms? Could it be that his feed is, gasp, protected using plain ol’ regular copyright? In that case, it would appear that all bets are off.

While I certainly don’t wish this to be the case, you have to concede that Martin is following the letter of the license he stipulated, both for the original page as well as the RSS feed. While we may not like the outcome, or the fact that such an attitude will balkanize useful applications and innovation on the web, I don’t think you can argue with the facts – after all, trademarks and copyright are his beat.

Implications

While it may not have been his intent, I think Martin’s actions have highlighted a legitimate concern for both content creators and aggregators. The proliferation of aggregation services is driven by an age-old secret to business: steal from the commons. The web is being viewed by web-based businesses as a wonderful resource for building value-added services, but it’ll only take one really well-funded lawsuit to bring down this house of cards. Web-based services need to think about embedding CC recognition into server-based applications to protect themselves from this possibility.

For us, the blog community, we need to remember that the purpose of the Creative Commons license is to allow the creator to exert control over the fruits of their labor. While we might want everyone to choose the least restrictive CC licensing terms, if we choose to blatantly disregard those licensing terms when we don’t agree with them (or dog-pile on the creator), we’re undermining the viability of the licensing scheme as a whole. To that end, perhaps we should be browbeating web-based services, such as Bloglines, Rojo, Feedster, Feedburner, and PubSub to incorporate CC license recognition intelligence into their services and use it to filter out content that hasn’t been properly licensed for their purposes. Doing so would serve two purposes: protect these services from future infringement litigation; and further cement the Creative Commons licensing scheme’s reputation as a legitimate mechanism for creators to exert control over their works. Indirectly, such action may also illustrate to copyright owners like Martin the value of participating in these services and choosing a less-restrictive CC license, enabling the creation of technologies that benefit not only readers but also the content creators themselves.
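
To give a sense of what "CC license recognition" in an aggregator might look like, here is a minimal Python sketch. It assumes an RSS 2.0 feed that declares its license with the creativeCommons RSS module element (license, in the namespace shown below) and falls back to the plain copyright element. The function names and the filtering policy are hypothetical illustrations, not how Bloglines or any other service actually works, and the policy is a simplification rather than a reading of the license text.

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module (assumed to be what the feed uses).
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"


def feed_license_info(rss_text):
    """Extract the CC license URL (if any) and the copyright notice from an RSS 2.0 feed."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")
    license_el = channel.find("{%s}license" % CC_NS)
    has_license = license_el is not None and license_el.text
    return {
        "license_url": license_el.text.strip() if has_license else None,
        "copyright": channel.findtext("copyright"),
    }


def may_aggregate_commercially(info):
    """Hypothetical policy: allow only feeds with a CC license that lacks the 'nc' term."""
    url = info["license_url"]
    if url is None or "/licenses/" not in url:
        return False  # plain copyright or unknown license: nothing is granted by the feed
    terms = url.split("/licenses/")[-1].split("/")[0].split("-")
    return "nc" not in terms
```

A feed like Martin’s, which carries only a copyright element, would come back with no license URL and be filtered out under this policy.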

I, for one, would like to thank Martin for the attention he’s brought to this issue. While it may not have been his intention to bring about this level of discussion, I think it’s been valuable nonetheless.