Monthly Archives: June 2016

Interesting But Not A Business: The Story of the WbSrch Search Engine

I write this after having just shut down my almost-startup, the WbSrch search engine.

I started working on WbSrch for “fun” in the fall of 2013. AltaVista, my favorite search engine from “back in the day,” had shut down that summer. Nostalgia, combined with annoyance at how bad/annoying/intrusive/evil Google had become, convinced me to try building my own version of AltaVista.

Well, a month of hacking later I had the core of something rudimentary but sort-of-functional. It was pretty terrible, but proved that I could get something built. I crawled a total of about 200,000 pages and had a bare skeleton of a search engine. I started by calling it the “anti-social search engine” because at the time, searching Google for almost anything would return so much social media drivel, clickbait garbage, and otherwise low-value spam-like content.

Getting to the first prototype was easier than I expected, so I continued to work on it, improving the crawler and search algorithms and growing the index. At around 2 million pages it outgrew the Linode VPS it was on and I set up hosting at a local colocation center using a $400 server I picked up on eBay (great deal – dual quad-core Xeons and 72GB of RAM – plenty to grow with).

Things progressed and I ended up announcing it to the public around the end of May 2015. It only had 5 million pages and the indexing algorithms were still pretty terrible, but it started getting some human traffic.

And the bots discovered it. Every link analyzer SEO app in the world decided that WbSrch was a juicy crawl target. I considered blocking them since the SEO industry is complete garbage, but they were a decent source of human traffic, and most of the traffic came from webmasters who would check to see whether their sites were indexed and run a few searches.

As it grew, maintenance became more time-consuming. I wanted to keep it from being too porn-heavy or too full of Chinese and Russian sites, and I wanted pages categorized by language so the German-language front-end would only return pages in German.

After a year and a half of running the site as a hobby project, I decided to put it away because the mission had been accomplished: an index of 10 million pages that worked about as well as any other mid-1990s search engine. For a few months the front page just pointed to Yandex.com (a Russian search engine – the third-best search engine after Google and Bing).

Well, one unanswered question kept nagging at me: “What if I could turn this into a real business?”

So I turned it back on, and started working on it pretty hard. I read a bunch of textbooks on information retrieval, text processing, statistical language processing, and a bunch of other search-related topics that I knew nothing about when I started the project.

Almost immediately the drives in the server failed.

So I replaced them and rebuilt the server. Because I had a backup and source control, I didn’t lose much other than a week or two of crawl data. A few months later the drive controller failed, but there was no data loss – just a day of downtime and a lot of swearing.

As it grew and improved, I also started running some advertising, trying to build the audience and increase traffic. Visits were pretty cheap, but not very sticky. The bar is extremely high for getting someone to switch to a new search engine.

Still, there was some traffic, on the order of a consistent 5-digit number of pageviews per month. So I tried monetizing using a few different ad networks (around ten). The best one was able to earn about $3 per month. When you can’t use the ad networks that forbid linking to porn, gambling, or torrent sites, and you also don’t want to run ads for porn, gambling, or torrents, your income is low. Abysmal. $0.05 CPM on the high end.

With the math for getting new visitors figured out (3-7 cents per click depending on the channel), and the math for monetizing those visitors figured out (about 5 cents for every 500 visitors), it was clear that it would cost $5 to earn $0.01. If those users were really sticky and would return over and over again, then maybe it would be worth the price. But they weren’t.
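To make that arithmetic concrete, here is a small Python sketch. The figures are the ones quoted above; the function name and the 5-cent midpoint for cost per click are assumptions added for illustration.

```python
def cost_to_earn(target_dollars, cost_per_click, revenue_per_visitor):
    """Ad spend needed to buy enough visitors to earn target_dollars."""
    visitors_needed = target_dollars / revenue_per_visitor
    return visitors_needed * cost_per_click

# About 5 cents of ad revenue for every 500 visitors.
REVENUE_PER_VISITOR = 0.05 / 500

# Clicks cost 3-7 cents depending on the channel; 5 cents is an assumed midpoint.
for cpc in (0.03, 0.05, 0.07):
    spend = cost_to_earn(0.01, cpc, REVENUE_PER_VISITOR)
    print(f"at ${cpc:.2f} per click: spend ${spend:.2f} to earn $0.01")
```

Earning one cent takes roughly 100 visitors, so the spend lands between about $3 and $7 depending on the channel, which is where the $5 figure comes from.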

I also ran a crowdfunding campaign to gauge interest/demand. I raised a little money for hardware upgrades, but more importantly I learned a little more about how much people just don’t care about having another search option. I did manage to get one donation from someone I didn’t already know, but only one.

At this point I had about 47 million pages indexed and the search engine had grown to 3 servers. It had crawled only a tiny fraction of the internet, but it was still possible to find what you were looking for much of the time. It’s surprising how well you can do with a small index if you focus mainly on the most popular sites.

But to take things to the next level of quality I would need to build a system able to handle at least a billion pages.

That’s where things get expensive. I had only spent about $8,000 on WbSrch so far. Did I want to spend another $50,000 to get to that next level, where users might be a bit stickier and ad revenue might be better? (Ad revenue tends to be lower when you’re low-volume; once you have enough traffic for the algorithms to optimize, it gets better.) Maybe it would only cost $2 to earn $0.01 and those users would return often enough that I could earn another $0.01.
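A rough Python sketch of that optimistic scenario. The $2 and $0.01 figures are the guesses from the paragraph above; the function name and the idea of counting "rounds" of return visits are assumptions added for illustration, and amounts are in whole cents to keep the arithmetic exact.

```python
def rounds_to_break_even(acquisition_cents, revenue_cents_per_round):
    """How many times the same group of visitors must come back
    (each round earning revenue_cents_per_round) to repay the ad spend."""
    # Integer ceiling division: round up to a whole number of rounds.
    return -(-acquisition_cents // revenue_cents_per_round)

# Hypothetical improved numbers: $2.00 of ads buys visitors worth $0.01 per round.
print(rounds_to_break_even(200, 1))
```

Even in this better case the same visitors would have to come back 200 times before the ad spend paid for itself, and one return visit only gets you to $0.02.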

And that’s where I decided to shut everything down. Math doesn’t lie.

Call it failure to validate. There is no search engine business to be had for me. Maybe someone else could do it. Like DuckDuckGo. Interestingly, they didn’t start with their own crawl. And they’ve partnered with Microsoft for advertising. So they’re essentially a privacy-focused variant of Bing with a different UI. That’s good for them, but the interesting part is developing your own proprietary technology, your own crawler and algorithms. Otherwise, one day Microsoft could decide that having an API is inconvenient and shut down your entire business.

At some point Apple will decide that having a search engine is important and build or buy one. Maybe they’re already building one if the rumors are to be believed.

Anyhow, it’s a little bit sad that it didn’t work, and a little bit sad that I spent all that time on it, but I did get smarter. And not just about code. I learned a lot more about marketing and advertising in the process.

So now it’s on to the next thing.