Kosmix raises $20M, launches in beta.

Kosmix (my employer) today announced the beta of Kosmix.com, our topic page/exploration engine. We also announced $20M in new funding, which should carry us through our hockey-stick growth over the next year. Ed Zander (ex-Motorola CEO) participated in the round, which was led by Time Warner, and has joined as a strategic advisor.

Why use Kosmix?

While Google works great if you know exactly what you want (search), Kosmix helps you learn about a topic by presenting various dimensions of information around your query and showing context for related topics. You can use it to explore a topic and see a summary of the best of the web on it.

Although Kosmix looks like a traditional search engine, it is not. Kosmix blurs the line between search and content: it aggregates information the way search engines (think Google) do, but presents it the way a rich content site (think About.com) would. The closest competitor is probably Mahalo, but Mahalo is human-curated and covers only a small set of topics, while Kosmix is algorithmic and can work on any query.

How does this work?

We have a cool categorization technology that can take a query, classify it against several million categories, and find topics related to it. We then use this context to identify relevant content sources, pull content from them, and apply various relevance algorithms to build a rich topic page that tries to summarize the information on that topic.
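To make the flow above concrete, here is a toy sketch of the three steps (categorize the query, pick content sources from the categories, assemble a page). It is not how Kosmix is actually implemented: the tiny `TAXONOMY`, the `SOURCES` mapping, the keyword-overlap scoring and all function names are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical toy taxonomy: category -> trigger keywords. The real
# taxonomy (several million categories) and classifier are not described
# in this post; this is only a stand-in.
TAXONOMY = {
    "food/recipes": {"cake", "chocolate", "pizza"},
    "travel/cities": {"san", "francisco"},
    "tv/characters": {"kramer", "seinfeld"},
}

# Hypothetical mapping from a category to the content modules worth showing.
SOURCES = {
    "food/recipes": ["recipes", "nutrition", "how-to videos"],
    "travel/cities": ["travel guide", "maps", "hotels", "local events"],
    "tv/characters": ["character profile", "video highlights", "quotes"],
}


@dataclass
class TopicPage:
    query: str
    categories: list
    modules: list = field(default_factory=list)


def categorize(query):
    """Rank categories by how many query words hit their keyword set."""
    words = set(query.lower().split())
    scored = [(len(words & kws), cat) for cat, kws in TAXONOMY.items()]
    # Keep only categories with at least one matching word, best first.
    return [cat for score, cat in sorted(scored, reverse=True) if score > 0]


def build_topic_page(query):
    """Use the query's categories to decide which modules go on the page."""
    cats = categorize(query)
    page = TopicPage(query=query, categories=cats)
    for cat in cats:
        for module in SOURCES[cat]:
            if module not in page.modules:  # avoid duplicate modules
                page.modules.append(module)
    return page
```

Calling `build_topic_page("Chocolate cake")` on this toy data yields a page with the recipe, nutrition and how-to video modules. A real system would replace the keyword lookup with a proper classifier and add relevance ranking inside each module.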

Here are some cool queries to try -

Chocolate cake – We show a bunch of recipes (including a vegan one), nutritional information, how-to videos, etc.

San Francisco – We show content from Wikitravel, images, maps of the town, a list of nearby towns, local events, hotels, trip reports, news, etc.

Apple – We recognize this query as ambiguous and offer various interpretations – apple fruit, Apple Inc., Apple Macintosh, etc.

Cosmo Kramer – Profile of the Kramer character from Seinfeld, video highlights, a list of other characters from the Seinfeld TV series, and related content such as a widget of quotes from Jerry Seinfeld.

Pizza in 94040 – Yelp reviews, maps with local results

Do give Kosmix a shot and let me know what you think…

Attracting mainstream news consumers on the web

While newspapers' share of readership and circulation is dwindling, the decline is a death by a thousand cuts rather than a sudden collapse. Online news sites such as Digg have not seen a jump in readership from mainstream readers. By mainstream readers, I mean mom-and-pop store owners, doctors, businessmen, etc.: essentially people in walks of life other than technology. Tech-savvy users are surely already using Digg, Google Blog Search, Reddit, etc.; the real jump will come when mainstream consumers cross the chasm.

News consumers on the web seem to discover news through these channels -

  • Editorial – What editors and people in the media business suggest I should be reading. This may be a newspaper like wsj.com, a blog network like techcrunch.com or a vertical portal like webmd.com. Most mainstream users who have migrated from offline to online are in this bucket/channel. They are saving trees, since they consume the same news online from the same offline media outfits, but they have not really altered their consumption habits.
  • Collaborative filtering – What peers in the know suggest I should be reading. This is where Digg, Reddit and similar sites fall. Technology-savvy audiences already consume news in this manner, but mainstream users have not migrated to collaborative filtering yet. Part of the reason may be that they have not been introduced to the channel; another may be that they do not stick around even after being introduced. Users would stick around if they saw network effects, that is, if other people in similar walks of life were on the same channel and consuming news in the same fashion.
  • Personal interests – Interests of the consumer that the other two channels don’t seem to address. Mainstream media (editors, content generators) and collaborative filtering (content synthesizers) seem to write about or synthesize only the topics they themselves are interested in. Many topics are newsworthy to a particular consumer but are not mainstream. Examples include a previous employer, a very specific industry niche (budget hotels in China) or a long-tail hobby (mini street car racing). To discover news in this bucket, users read tail blogs or use products like Google Blog Search and news search to get to the news item.

Consumers have two types of intent with respect to news discovery -

  • Transient intent – Interest in a current event, such as an interest rate hike or Obama versus Clinton. This intent is generally short-lived; most users follow these topics because the events affect them indirectly or because they want to stay in the know.
  • Persistent intent – Interest in topics that the consumer follows closely, usually because they affect the consumer directly. These may be head topics such as health care or very tailish topics such as casino investments in Macau (I may have invested in LVS) or venture capital investments in Web 2.0 in India. Users typically have a direct stake, whether professional, monetary or through a hobby.

If newspapers are losing market share, where are mainstream users getting their news dose?

Mainstream users have started migrating to web-based news. However, they seem to be consuming news from the online versions of their offline channels: they read wsj.com, nytimes.com and webmd.com, which are news/information portals. These users have not migrated to “user-generated”/“user-synthesized” news yet.

A compelling solution for mainstream consumers would be one that discovers news through editorial content and collaborative filtering, caters to the reader's personal interests, and addresses both persistent and transient intent. The network effects of other mainstream users consuming news in this manner would create a stronger impetus. I think we should see some evolution in net news in this direction soon.
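As a rough sketch of what such a blend could look like, the snippet below scores a story using all three discovery channels and both kinds of intent. Everything here is an assumption for illustration: the `Story` fields, the weights, the vote saturation constant of 100 and the one-day decay for transient interest are invented, not taken from any existing product.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    topics: frozenset      # topic labels attached to the story
    editorial_pick: bool   # True if an editor selected it
    peer_votes: int        # Digg/Reddit-style votes
    age_hours: float


def score(story, persistent_topics, transient_topics,
          weights=(1.0, 1.0, 2.0)):
    """Blend editorial, collaborative and personal signals into one score."""
    w_editorial, w_peers, w_personal = weights

    editorial = 1.0 if story.editorial_pick else 0.0
    # Votes saturate toward 1.0 so a huge vote count cannot dominate.
    peers = story.peer_votes / (story.peer_votes + 100.0)

    personal = 0.0
    if story.topics & persistent_topics:
        # Persistent interests do not fade with the age of the story.
        personal += 1.0
    if story.topics & transient_topics:
        # Transient interests decay: half strength after one day.
        personal += 1.0 / (1.0 + story.age_hours / 24.0)

    return w_editorial * editorial + w_peers * peers + w_personal * personal
```

With these made-up weights, a two-day-old story with no votes on a persistent interest such as "macau casinos" scores 2.0, ahead of a fresh editor's pick with 100 votes on an unrelated topic (1.5). That is the point of the third channel: tail topics that neither editors nor the crowd surface can still reach the reader.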

So long (newspapers), and thanks for all the fish…

Structure 08

I attended the Structure 2008 conference in San Francisco this week; thanks to Anand for arranging the passes. The talks at Structure08 were not very technical and focused primarily on the business aspects, ethical considerations and adoption of cloud computing.

Nick Carr:

  • drew a symbolic link between Bill Gates retiring and Structure08 (first cloud computing conference) falling in the same week, marking the shift of computing from the desktop to the cloud.
  • argued that while building infrastructure, we should think of the ethical dimension. With electricity there was no such requirement; with cloud computing, there is information involved.

Jonathan Yarmis:

  • claimed that a single converged device would never exist and cloud computing enables independence of location/device
  • The enterprise itself hasn’t figured out how to embrace cloud computing; users are figuring it out very quickly.

Werner Vogels, Amazon:

  • claimed that a typical company spends 70% of its effort on scaling/undifferentiated heavy lifting. AWS enables companies to focus on core value.
  • Cloud computing turns CAPEX into OPEX (operating expense, a variable cost model)
  • Amazon calls 100s of services to construct a single page on amazon.com

Mendel Rosenblum, VMWare:

  • Run VM in my house or run it outside on the cloud, and have an easy way to move it around.
  • Desktop on cloud

Greg Papadopoulos, Sun:

  • Drew an analogy between storing your money and storing data. People are more comfortable storing money in a bank; similarly, they would be more comfortable storing data securely in the cloud.

Various panels:

  • Meebo – They use cloud computing (AWS) for things like file upload that are non-core to business, prefer to keep control for core applications.
  • Facebook – Leverage our community to translate our site into various languages
  • Q: How to handle PR around outages? A: Be transparent, communicate and set realistic timelines for when service will be restored.
  • Concerns raised about vendor lock-in with cloud computing platform providers. Need for open APIs for cloud computing.

Other notables/observations -

  • VMWare and Sun seem to be positioned for cloud computing in the enterprise space. They are being bypassed, though, by AWS and GAE (Google App Engine), which use commodity servers/software.
  • Talk around everything as a service (software, storage, content, applications, data, platform etc.)
  • AWS is leading the cloud computing space. Google (GAE) and Microsoft are a distant second.
  • Yahoo did not have much presence at Structure08. That was a bit of a surprise considering their contribution to Hadoop.
  • Hallway conversations – ‘cloud computing = grid computing + billing’
  • Early-stage startups seemed to use cloud computing more than established players. Established players were more concerned about control: they were willing to experiment with cloud computing when they needed extra capacity, but would like to maintain their own infrastructure for bread-and-butter type stuff.
  • Hallway conversations about how Oracle viewed BigTable/Hypertable – they seem to be following the trend but don’t have any products in the space. Talk about how Oracle’s enterprise customers did not care about the BigTable model (they need a relational database etc.).

Yahoo: Open up your search index to gain market share

Yahoo has been going through some turbulent weather. The Microsoft/Yahoo merger talks have fizzled out, with Yahoo fiercely trying to stay independent. Yahoo has signed a multi-year search ad monetization deal with Google. TechCrunch is reporting that a lot of executives are leaving. It seems as if Yahoo has given up on search monetization. If so, how long will it stay invested in search relevance, quality and related applications? On the other hand, unless Yahoo improves adoption of its search products, it will not gain traction on search monetization (a catch-22).

Yahoo’s vision was to be the consumer's starting point on the web, through its various portals and its directory. Consumers, though, have moved to search as their starting point, and the default search for most consumers is Google. Yahoo’s strategy of being the starting point on the web will not work unless it stays in the search game.

How can Yahoo stay competitive and win the search and search monetization game?

I think Yahoo should open up completely. Open source is a great way to win the game without playing second fiddle. This has worked in the past for Linux when competing with Microsoft Windows. Google tried the same thing with OpenSocial while competing with Facebook. Google is fiercely secretive about search, where it is the market leader, but has embraced open source (with Android) when competing with the iPhone, RIM and other mobile phone platforms. Playing the ‘open’ card helps the smaller rival win significant market share against the dominant players.

While Yahoo is flirting with openness through programs such as Hadoop, Search API, YUI and SearchMonkey, it needs to embrace open source completely at its core if it wants to win at search against Google.

Yahoo should open up its crawl and index and let developers run computations and build applications on them. Yahoo has a large crawl corpus that is representative of the web; opening it up would enable the development of many other search and web mining applications on top of it. Developers could build new signals or vertical search engines on top of Yahoo’s index. Yahoo would benefit from these new signals, which could be incorporated into its own search applications, and new search applications could integrate better with Yahoo properties.

Harnessing the developer community would get technology early adopters to start using Yahoo Search, and they in turn could influence mainstream users. Yahoo could build a platform and an ecosystem where developers and startups who want to build interesting applications/technology could build on top of Yahoo. Yahoo could provide the platform and distribution (Yahoo gets a huge number of page views) and help them with monetization. The next-gen web applications, like vertical search engines and semantic web applications, would be built on top of Yahoo’s search platform. Think of Facebook opening up its platform to allow building of social applications; Yahoo could do the same for search and search-related applications. Let the good folks who are building applications like Indeed, Spock, Simply Hired, Trulia and similar web applications build them on top of Yahoo’s search platform. I for one would have definitely built Sangeetix on Yahoo’s platform.

I & Hobbes

I have named this blog after my favorite comic strip – Calvin and Hobbes, hence the name ihobbes.

Who is Hobbes? (according to Wikipedia)

Hobbes is Calvin’s stuffed tiger. From everyone else’s point of view, they see Hobbes as Calvin’s stuffed tiger. From Calvin’s point of view, however, Hobbes is an anthropomorphic tiger, much larger than Calvin and full of independent attitudes and ideas.

This blog to me is Hobbes.