Author Archives: xangis

Roland: Wrong But Persistent

This is an update to the unsubstantiated legal threat I received from Roland a few weeks ago.

My ISP, Linode, was reviewing the situation and still hadn’t come to a decision (I suspect they wrote to the email address of the sender and failed to receive a reply, as I did). However, today they received a DMCA takedown notice.

— BEGIN NOTICE —

Dear Sir/ Madam,

I, the undersigned, state the following:

1) I am the legal representative authorized to act on behalf of Roland Corporation, of certain exclusive intellectual property rights (“Roland Corporation”);

2) I attest, under penalty of perjury, that I have a good faith belief that

freewavesamples.com
http://freewavesamples.com/roland-d-20-kick
http://freewavesamples.com/roland-jd-990-pizzicato-strings-c4
http://freewavesamples.com/roland-d-20-snare
http://freewavesamples.com/roland-gr1-orchestra-hit-c5
http://freewavesamples.com/roland-gr1-trumpet-c5

use Roland Corporation’s intellectual property in the content without authorization. This use falsely suggests Roland Corporation’s sponsorship or endorsement of the website and violates Roland Corporation’s exclusive rights;

3) Roland Corporation represents that use of the material is not authorized by the copyright owner, its agent, or the law;

4) Based upon information at its disposal on freewavesamples.com, we believe that the statements in this notice are accurate and correctly describe the infringing nature and status of the Infringing Material;

5) I understand that, pursuant to 17 U.S.C. § 512(f), any person who knowingly materially misrepresents that material or activity is infringing may be liable for damages, including costs and attorneys’ fees.

The reasons that the domains named above must be suspended are as follows:

a) Offer(s) a counterfeit or otherwise unauthorized item for sale that violates the IP Owner’s trademarks and/or copyrights.
b) Misuses the IP Owner’s brand name, trademarks and/or copyright.
c) Uses a copyrighted image without authorization from the IP Owner.

The reported website(s), and by consequence the infringing content, is accessible globally, and is protected under the Berne Convention, the protection of which extends to 168 countries (full list here: http://www.wipo.int/treaties/en/ShowResults.jsp?lang=en&treaty_id=15).

We are providing you this letter of notification pursuant to the Digital Millennium Copyright Act 17 USC§512(c) to make you aware of material on its network or system that infringes the exclusive copyrights of Roland Corporation.

Attempts to resolve this issue with the Registrant have been unsuccessful. We seek your help in removing the infringing content. Please take reasonable and prompt steps to investigate and respond appropriately to this report of abuse commensurate to your commitment to addressing abuse as outlined in your terms of service and commensurate to your obligation under relevant law.

We may be contacted at the email address below.

Sincerely,

brandprotection@rolandipr.com

— END NOTICE —

I do not agree with Roland’s claims and do not believe they would win a court challenge.

This content on freewavesamples.com does *not* violate Roland’s copyright, nor Roland’s trademark. The trademark issue is explained in this earlier post. Here I address the copyright (Linode thought that was their concern, but to me that letter still reads as a trademark threat).

The samples posted on freewavesamples.com are not samples that were created by Roland. They are sounds that I played on and recorded from the Roland synthesizers that I own, recorded at a specific pitch (usually C in the best octave for the patch), then edited, denoised, normalized, and so on. Some of them sound better than the original instrument (particularly in the case of some of the noisy 1980s synths).

This is a very important difference.

In buying an instrument, you also buy the right to do whatever you like with recordings of sounds *from* that instrument. If this were not the case, every song recorded using a sound from a Roland synthesizer would be in violation and the music industry would fall apart.

This is why there is a thriving sample pack industry, with retailers selling CDs, DVDs, and downloads of samples from various synthesizers and drum machines. If you want all of the quality, flexibility, programmability, and versatility of the original instrument you buy the instrument. If you just want access to some of the preset sounds, you buy the sample pack. You don’t get much value from a sample pack, though – the presets are the least interesting part of an instrument, and using them in a song makes you sound unoriginal and uncreative. In addition, the pitch shifting involved in transposing a single-note sample to other pitches loses any articulation associated with that sound and introduces artifacts and aliasing that make it sound less realistic the farther you go from the recorded pitch.
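To make the pitch-shifting point concrete, here is a minimal sketch of the crudest way to transpose a single-note sample: reading through it faster or slower with linear interpolation. The function names and the sine-wave stand-in for a recorded note are my own illustrations, not code from any particular sampler, and real samplers generally use better interpolation and filtering than this.

```python
import math

def pitch_shift_by_resampling(samples, semitones):
    """Transpose a mono sample by reading through it faster or slower.

    Naive approach: no anti-aliasing filter and no time correction, so the
    note also gets shorter (shifting up) or longer (shifting down).
    """
    ratio = 2.0 ** (semitones / 12.0)      # playback-speed ratio for the interval
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio                    # fractional read position in the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Linear interpolation between the two nearest source samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

def sine(freq_hz, seconds, rate=44100):
    """Generate a plain sine tone as a stand-in for a recorded note."""
    n = int(seconds * rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / rate) for i in range(n)]

c4 = sine(261.63, 0.5)                     # roughly middle C, half a second
c5 = pitch_shift_by_resampling(c4, 12)     # one octave up
print(len(c4), len(c5))
```

Shifting up an octave halves the length of the note, so the attack and decay play back twice as fast. That is one reason a transposed sample stops sounding like the instrument as you move away from the recorded pitch.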

This is also true for other instruments. You can buy a guitar and get all of the flexibility that comes with owning an expressive instrument with hundreds of years of design history behind it. Or you can buy a sample CD with a few notes and sounds that you’ll be hard-pressed to make sound like a real, live version of the instrument on a recording.

If these were binary dumps of the actual samples in the synthesizer’s ROM, this *would* be in violation of Roland’s copyright, assuming the samples in that particular synthesizer are legitimately copyrighted. That has been a problem for emulator developers, such as the MT-32 emulation project. (references HERE, HERE, HERE, and HERE). I was aware of that case before ever starting the site, which is why I have not taken binary dumps of any samples. Interestingly enough, the MT-32’s samples were not copyrighted, and things turned out favorably for the emulator.

However, worrying about this stuff makes me tired. And at this point thinking about Roland fills me with disgust. I’m not sure I even want to mention them on my site, freely advertising and promoting their brand and increasing their name recognition when I get nothing for it (these are FREE downloads, for pete’s sake!).

For now I’m taking the mentioned Roland samples down and filing a counter-notice to their DMCA request. If they don’t respond in 14 days, which I doubt they will, then I may put them back up. If I can stomach it. I’ve also emailed the EFF to ask for their opinion since they were involved in the original MT-32 emulator case.

I have not taken ALL of the Roland samples down. Only the ones mentioned in their complaint. Based on the outcome of this dispute, I’ll either remove everything from Roland (there’s lots more), or keep them all up.

There is an update to this situation.

Free Wave Samples Is in Danger

Ten years ago I created freewavesamples.com because there was a shortage of free high-quality samples online. Specifically, the type of samples that would be useful for making music with samplers and trackers. I recorded sounds from my large collection of synthesizers (and some other instruments), a mix of preset patches and custom patches that I’ve created.

Now, after a decade of giving away samples without any issues, the site is under threat from Roland. A trademark threat of all things.

This is an unsolicited email I received:

— BEGIN MESSAGE —

To Whom It May Concern:

I am writing on behalf of the IP Department for Roland Corporation and its division regarding your infringement of Roland intellectual property rights. As I am sure you know, Roland Corporation is a leading manufacturer and distributor of electronic musical instruments, including keyboards and synthesizers, guitar products, electronic percussion, digital recording equipment, amplifiers, audio processors, and multimedia products. With more than 40 years of musical instrument development, Roland sets the standard in music technology for the world to follow. For more information, visit http://www.Roland.com/global/.

In connection to Roland Corporation’s proprietary rights over its famous trademark we are notifying you of the following:

Roland Corporation has recently learned that the trademark ROLAND appears as a metatag, keyword, visible or hidden text on the web site(s) located at the below listed URL(s) without having obtained prior written authorization from Roland Corporation. This practice infringes upon the exclusive intellectual property rights of Roland Corporation.

http://freewavesamples.com/roland-d-20-kick
http://freewavesamples.com/roland-jd-990-pizzicato-strings-c4
http://freewavesamples.com/roland-d-20-snare
http://freewavesamples.com/roland-gr1-orchestra-hit-c5
http://freewavesamples.com/roland-gr1-trumpet-c5

As a trademark owner, Roland Corporation is obligated to enforce its rights by taking action to ensure that others do not use its trademarks without permission. Unauthorized use of the trademark(s) could create a likelihood of confusion with Roland Corporation’s trademark as to the source, sponsorship, affiliation, or endorsement of your web site(s), online location(s), products or services.

In light of the above, we request that you respond to this e-mail within ten (10) days, informing us whether you have obtained rights from Roland Corporation to use the trademark(s). If so, please provide us with details as to who granted you such rights and when. If not, please remove all metatags, keywords, visible or hidden texts including trademark(s) presently appearing on the above-cited web site(s) and any other web site(s), or draw this issue to the attention of the appropriate person(s).

Thank you in advance for your anticipated cooperation in this matter.

Sincerely,

Roland Corporation
brandprotection@rolandipr.com

— END MESSAGE —

This looks like the type of message that is generated by an automated IP enforcement bot — the kind of thing that crawls the web and automatically bothers people when it sees certain keywords. I sincerely doubt that a Human was involved in sending it at all.

Here’s why their claims are not valid:

1. There is no possibility of brand confusion because the mentions of Roland on the site are NOT a competing product. It’s THEM that I’m talking about when I use the word Roland. These are samples recorded from my Roland synthesizers and I have every right to say where they came from. That is Fair Use.
2. They are using an overly broad interpretation of trademark.
3. There are no products for sale on the site.
4. You don’t get to decide what keywords someone uses in their meta tags. That’s not covered by trademark, and it’s pointless to even care since search engines don’t even use keyword meta tags anymore.
5. If this were a problem, Roland should have objected long ago. As this capture from The Wayback Machine on May 5, 2007 shows, I’ve been offering Roland samples for more than ten years (including the JD-990 sample that is in their list of URLs). It has been a popular site for a long time, and I believe it was in 2008 that it first cracked the Alexa Top Million (it’s been in the 200-250k range in recent months).

In any case, I replied to their email explaining that their claims were not valid in this case. Heard nothing back. Received the same email again a week later. Replied with the same answer. Heard nothing back.

A week or so later, my ISP notified me that it received the same email and said that I needed to remove the content or my server would be “limited”. I contested it and I’m still waiting to hear back from their “trust and safety” department. One thing I will definitely give Linode — their customer service is quite good, and they look into things rather than making unilateral decisions. I experienced this firsthand a few years back when a malware scanner came up with some false positives for some apps I had posted on the Zeta Centauri website.

One of five things will probably happen here:

1. A real Human at Roland will look at this and realize it’s silly and drop it.
2. I’ll end up removing the Roland content from the site under duress, even though they have no right to force me to not mention them. And, of course, I’ll replace it with a very unflattering note explaining why it was removed.
3. I’ll change ISPs. This is unlikely because the same thing will probably just happen again.
4. I’ll shut down the site. This isn’t too likely, but it is possible. It does require time and energy to maintain, and even though it gets a lot of traffic, the Google ads on the site earn only slightly more than the hosting costs. Enough to buy an inexpensive synth once a year (and then record more samples from it).
5. I’ll fight it in court and win. This isn’t too likely because it’s not worth the time, money, and energy. If I were earning a living from the site, that might be a different story (this is just a hobby, not a company). But, I also hate bullying, so it might be worth the fight, especially since trademark bullies are some of the worst kind of bullies.

I’ve spent a LOT of money on Roland audio equipment over the last 20 years (probably $20k total). Result #1 is the only thing that will keep me buying and using Roland gear.

If you’d like to keep freewavesamples.com online, please contact Roland and let them know they should leave me alone.

And for the love of all that is good, please don’t use (or let your company use) shoddy shotgun-blast-approach automated IP protection bots. A number of companies sell them, and none of them are very good. Down with Barratry bots.

There is an Update to this situation.

Having a Career Again

For the last five years, since moving to Portland, I’ve been obsessed with building a viable startup.

During that time, I was either working on my own startup as a founder (or cofounder), or working a regular job just long enough to get the money to try again with another startup. None of those four startups made it (one came close). In the process, I established a pattern of not sticking around long at any place I didn’t create.

Now that I’m over the startup founder bug, it’s time to go back to having a career.

Right now I have two problems to solve:

1) What’s the best entry point?

My last pre-startup-obsession jobs were “Senior Software Engineer” and “Engineering Manager”. I also have the skills to be a Software Architect, DevOps Engineer, System Administrator, CTO, Director of Engineering, Project Manager, or Product Manager. But with the exception of CTO, I haven’t done any of them exclusively over the last few years.

So far I’ve been applying to anything that looks interesting – anything I could see myself doing well for a while and enjoying. That list is pretty broad.

2) What place(s) won’t be too concerned with the lack of consistency over the past few years?

I imagine this is probably one of two types of place – one where the person doing the hiring has also been a startup founder, or something that is contract-to-hire.

I don’t really know how HR people think, but I need to find a good way to convey that yes, I did have a disease that causes flakiness, but I’m cured and would like to find something to stick with for a while.

And for the sake of all that is good, please no pre-revenue startups. I’ve eaten enough of that sandwich.

As an INTJ “Mastermind” type, the times when I’ve enjoyed working the most are when I’ve become a broad subject matter expert with an answer to just about any question and the power to create solutions that make user/coworker lives better. That only happens after being somewhere a while and getting to know all of the systems thoroughly. Promotions and raises are pretty enjoyable too.

I’m not sure where I’ll end up next, but finding the next thing is probably just a numbers game.

Old Basternae Blog Posts Imported

I ran a blog for about seven years at basternae.org. It was almost entirely about Basternae MUD and the evolution of the ModernMUD codebase, but also included a lot of general programming-related entries. I’ve imported all of the previous posts from that blog for the sake of preserving history, though many of them will no longer be relevant. Even so, there may be information that is helpful for people who want to make use of the ModernMUD code to build their own multi-user dungeon.

Quora Answer: How would you find the websites to build a search index from scratch?

I originally wrote this as an answer to a question on Quora.

It depends on the scale.

If you just want to experiment with web crawling and build a basic search index, it’s common to start with the Alexa top million websites, which can be downloaded in a CSV file via S3 at: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

The top million changes daily, and includes a lot of spam and porn. It’s easy to game the system to get into the bottom 500k, so you’ll need to decide what to include and how to weight it.
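As a rough sketch of that filtering and weighting step, the snippet below reads a rank list and turns it into weighted seed domains. It assumes the CSV has two columns per row, rank then domain, with no header. The `max_rank` cutoff, the `blocked_words` filter, and the 1/rank weighting are hypothetical choices for illustration, not a recommendation.

```python
import csv
import io

def load_seed_domains(csv_file, max_rank=500_000, blocked_words=()):
    """Return (domain, weight) pairs from a 'rank,domain' CSV file object."""
    seeds = []
    for row in csv.reader(csv_file):
        # Skip anything that does not look like "rank,domain".
        if len(row) != 2 or not row[0].isdigit():
            continue
        rank, domain = int(row[0]), row[1].strip().lower()
        if rank > max_rank:
            continue                  # drop the easily gamed tail of the list
        if any(word in domain for word in blocked_words):
            continue                  # crude keyword filter for unwanted sites
        seeds.append((domain, 1.0 / rank))   # higher rank, higher crawl priority
    return seeds

# Tiny in-memory stand-in for the real file.
sample = io.StringIO("1,example.com\n2,casino-example.net\n3,example.org\n")
seeds = load_seed_domains(sample, max_rank=2, blocked_words=("casino",))
print(seeds)
```

For a real run you would pass an opened copy of the downloaded CSV instead of the `StringIO` sample.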

The DMOZ website dump used to be a good starting point. They shut down on March 17th, 2017, so that’s no longer really an option. It left a lot out, but contained about 4 million URLs with most of the low-quality sites filtered out. There may be a mirror of that data somewhere (heck, I’d like to download their last URL dump if it’s available anywhere).

Real search engines have agreements in place with the top-level domain registries that let them get a zone file dump listing all registered domains. This involves jumping through some hoops and filling out some forms, and each registry needs to be dealt with separately. Getting access to .social is completely separate from getting access to .net.
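Once you have a zone file, pulling a domain list out of it is mostly text processing. The sketch below assumes the common master-file layout, with one record per line in the form owner name, TTL, class, type, data, and treats any name with an NS record as a registered domain. The function name and sample records are made up for illustration, and real zone files may need more careful handling.

```python
def domains_from_zone_lines(lines, tld):
    """Collect registered domains (names delegated with NS records) for one TLD."""
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue                          # skip blanks and comments
        fields = line.split()
        if len(fields) < 3:
            continue
        owner = fields[0].lower()
        # Everything between the owner name and the record data: TTL, class, type.
        middle = [f.upper() for f in fields[1:-1]]
        if "NS" not in middle or not owner.endswith(suffix):
            continue
        # Keep only the label directly under the TLD.
        label = owner[: -len(suffix)].split(".")[-1]
        if label:
            domains.add(label + suffix.rstrip("."))
    return sorted(domains)

zone = [
    "; sample records, not real data",
    "example.net. 172800 in ns ns1.example.com.",
    "example.net. 172800 in ns ns2.example.com.",
    "other.net. 172800 in ns ns1.other.net.",
    "ns1.other.net. 172800 in a 192.0.2.1",
]
print(domains_from_zone_lines(zone, "net"))
```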

Since it takes a LOT of work to get access to all TLD domain files, a commercial service like Domains Index is probably your best bet if you want to do anything on a large scale. I’ve bought from them before and it’s a good service. They don’t have absolutely everything, but 200 million domains is far better than the 1 million you get from Alexa.

Quora Answer: Why do some people like instrumental music way over music with vocals in them?

I originally wrote this as an answer to a question on Quora.

Full question for this answer:

“I have noticed that my disinterest is turning to antipathy towards music with vocals in them, especially pop music. Most people say that vocals is extremely important to them and a song without it is just .. ‘blah’.

Is there a word for this sort of preference? How did people develop such an attraction towards instrumental music, while clearly the most heard music is with vocals in them (radio)?”

I strongly prefer instrumental music.

For me the main reason is because I can just enjoy the music and imagine/think whatever I like without being drawn into the world of the singer.

Most of the time when I’m listening to music, I don’t really want to think about someone else’s messed up relationship or dead dog. I’d rather imagine my own things and/or have my own thoughts. I like the way that sound enhances my thinking and I don’t want to give someone else control. Lyrics automatically change the subject.

I listen to a lot of genres, but the absolute worst thing is the tendency of a lot of ambient/atmospheric/psytrance electronic music to include fragments of lectures by Timothy Leary or Deepak Chopra. Stop ruining your music by trying to make idiots put ill-conceived ideas into my head. I instantly turn off anything that does that and never listen again.

Words paint a picture of a world. There are a lot of worlds I don’t want to live in. That’s why I never listen to country or gospel. But, when I’m in the mood for lyrics, the worlds are far stronger in non-instrumental music.

Words can be overwhelming sometimes. Notes, not so much.

Quora Answer: What are the reasons for Google’s search engine low market share in Russia, South Korea, the US?

I originally wrote this as an answer to a question on Quora.

Full question:

“While I know that Google is banned in China and Yahoo Japan uses Google algorithm, I do not know why Google’s market share is so poor in Russia, the US, South Korea. I’m especially interested in bad performance due to linguistic reasons, if they apply. What about the US?”

In Russia, Yandex is dominant. It’s a very good search engine, and is primarily Russian-language focused. It’s been around nearly as long as Google, and people have a certain loyalty to it, especially since it was the first good Russian-language search engine. Russians also don’t tend to be very trusting of U.S. companies.

In Korea, the story is similar. They have good regional homegrown search engines, Naver and Daum, that have been around about as long as Google and that cater specifically to the Korean-language market. There’s no reason to use Google because the local options work quite well.

In these cases it’s a combination of having been better in the local language for a long time and the tendency to “buy local”. They don’t use Google because they don’t need or want Google.

In the U.S. there are a lot of factors. There have been a lot of good English-language options for a long time, even though many have come and gone. Brand loyalty is a very American thing, even when other brands might do slightly better, and getting a “second opinion” is also pretty ingrained. There are also a lot of people who are uncomfortable with Google’s level of information gathering and profile building (“spying”). This combined with the tens of billions of dollars in the search space leaves plenty of room for competitors, even though it’s very expensive to build a decent search engine and very difficult to monetize one (DuckDuckGo being a good modern example of the difficulty of building one in modern times).

One of the big influences on U.S. market share is the existence of marketing deals. Money changes hands to be featured as the default search engine in browsers like Firefox, Safari, and others. This can move market share by a few percentage points overnight. If you can pay $1 billion for enough traffic to generate $3 billion in ad revenue from it, it’s a great deal. The Apple deals have been very public, but others have been privately arranged.

Interesting But Not A Business: The Story of the WbSrch Search Engine

I write this after having just shut down my almost-startup, the WbSrch search engine.

I started working on WbSrch for “fun” in the fall of 2013. AltaVista, my favorite search engine from “back in the day” had shut down that summer. Nostalgia combined with annoyance at how bad/annoying/intrusive/evil Google had become convinced me to try building my own version of AltaVista.

Well, a month of hacking later I had the core of something rudimentary but sort-of-functional. It was pretty terrible, but proved that I could get something built. I crawled a total of about 200,000 pages and had a bare skeleton of a search engine. I started by calling it the “anti-social search engine” because at the time, searching Google for almost anything would return so much social media drivel, clickbait garbage, and otherwise low-value spam-like content.

Getting to the first prototype was easier than I expected, so I continued to work on it, improving the crawler and search algorithms and growing the index. At around 2 million pages it outgrew the Linode VPS it was on and I set up hosting at a local colocation center using a $400 server I picked up on eBay (great deal – dual quad-core Xeons and 72GB of RAM – plenty to grow with).

Things progressed and I ended up announcing it to the public around the end of May 2015. It only had 5 million pages and the indexing algorithms were still pretty terrible, but it started getting some Human traffic.

And the bots discovered it. Every link analyzer SEO app in the world decided that WbSrch was a juicy crawl target. I considered blocking them since the SEO industry is complete garbage, but they were a decent source of Human traffic, and most of the traffic came from webmasters who would check to see whether their sites were indexed and run a few searches.

As it grew, maintenance became more time-consuming. I wanted to keep it from being too porn-heavy, to keep it from being full of Chinese and Russian sites, and to keep pages categorized by language so the German-language front-end would only return pages in German.

After a year and a half of running the site as a hobby project, I decided to put it away because the mission had been accomplished – an index of 10 million pages, and it worked about as well as any other mid-1990s search engine. For a few months the front page just pointed to Yandex.com (a Russian search engine – the third-best search engine after Google and Bing).

Well, one unanswered question kept nagging at me: “What if I could turn this into a real business?”

So I turned it back on, and started working on it pretty hard. I read a bunch of textbooks on information retrieval, text processing, statistical language processing, and a bunch of other search-related topics that I knew nothing about when I started the project.

Almost immediately the drives in the server failed.

So I replaced them and rebuilt the server. Didn’t lose much other than a week or two of crawl data because I had a backup, and source control. A few months later the drive controller failed, but there was no data loss – just a day of downtime and a lot of swearing.

As it grew and improved, I also started running some advertising, trying to build the audience and increase traffic. Visits were pretty cheap, but not very sticky. The bar is extremely high for getting someone to switch to a new search engine.

Still, there was some traffic, on the order of a consistent 5-digit number of pageviews per month. So I tried monetizing using a few different ad networks (around ten). The best one was able to earn about $3 per month. When you can’t use the ad networks that forbid linking to porn, gambling, or torrent sites (a search engine inevitably links to some), but you also don’t want to advertise porn, gambling, or torrents, your income is low. Abysmal. $0.05 CPM on the high end.

With the math for getting new visitors figured out (3-7 cents per click depending on the channel), and the math for monetizing those visitors figured out (about 5 cents for every 500 visitors), it was clear that it would cost $5 to earn $0.01. If those users were really sticky and would return over and over again, then maybe it would be worth the price. But they weren’t.
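The arithmetic behind that conclusion can be checked in a few lines. This is a simplified sketch using only the figures quoted above, and it assumes every paid click is one visitor who visits once and never returns. The function name is just for illustration.

```python
def cost_to_earn(target_revenue, cost_per_click, revenue_per_visitor):
    """Ad spend needed to earn target_revenue if each click is a one-time visitor."""
    visitors_needed = target_revenue / revenue_per_visitor
    return visitors_needed * cost_per_click

# About 5 cents of ad revenue for every 500 visitors.
revenue_per_visitor = 0.05 / 500

# Cost per click ranged from 3 to 7 cents depending on the channel.
for cost_per_click in (0.03, 0.05, 0.07):
    spend = cost_to_earn(0.01, cost_per_click, revenue_per_visitor)
    print(f"${cost_per_click:.2f} per click: spend ${spend:.2f} to earn $0.01")
```

At the midpoint of 5 cents per click this works out to $5 of spend per cent earned, with a range of $3 to $7 across the cheaper and more expensive channels.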

I also ran a crowdfunding campaign to gauge interest/demand. I raised a little money for hardware upgrades, but more importantly I learned a little more about how much people just don’t care about having another search option. I did manage to get one donation from someone I didn’t already know, but only one.

At this point I had about 47 million pages indexed and the search engine had grown to 3 servers. It had crawled only a tiny fraction of the internet, but it was still possible to find what you were looking for much of the time. It’s surprising how well you can do with a small index if you focus mainly on the most popular sites.

But to take things to the next level of quality I would need to build a system able to handle at least a billion pages.

That’s where things get expensive. I had only spent about $8,000 on WbSrch so far. Did I want to spend another $50,000 to get to that next level, where users might be a bit stickier and ad revenue might be better? (Ad revenue tends to be lower when you’re low-volume; once you have enough traffic for the algorithms to optimize, it gets better.) Maybe it would only cost $2 to earn $0.01 and those users would return often enough that I could earn another $0.01.

And that’s where I decided to shut everything down. Math doesn’t lie.

Call it failure to validate. There is no search engine business to be had for me. Maybe someone else could do it. Like DuckDuckGo. Interestingly, they didn’t start with their own crawl. And they’ve partnered with Microsoft for advertising. So they’re essentially a privacy-focused variant of Bing with a different UI. That’s good for them, but the interesting part is developing your own proprietary technology, your own crawler and algorithms. Otherwise, one day Microsoft could decide that having an API is inconvenient and shut down your entire business.

At some point Apple will decide that having a search engine is important and build or buy one. Maybe they’re already building one if the rumors are to be believed.

Anyhow, it’s a little bit sad that it didn’t work, and a little bit sad that I spent all that time on it, but I did get smarter. And not just code. I learned a lot more about marketing and advertising in the process.

So now it’s on to the next thing.

(Player Profile) Michael Manring

Michael Manring is a pretty well-known soloist among bass players, but less so among mainstream music listeners.

He’s known for his custom fretless Zon Hyperbass guitar shown in this video performance/interview from Bass Player LIVE! 2013:

He was the youngest of four in a musical family. He took classes at the Berklee College of Music and studied with Jaco Pastorius. A very technical player, his style includes use of the e-bow, changing tunings mid-song, slapping, popping, muting, and two-handed tapping. To understand his style, it helps to know that he considers the bass guitar a very expressive instrument. He develops techniques that expand on that expressiveness, including quite heavy use of alternative tunings.

Much of Michael’s music could be considered instrumental “calm jazz” that is often filed as New Age or Adult Alternative, but he has a variety of styles and sometimes plays loud, upbeat, bouncy, funky music. He considers his work to be genre agnostic and doesn’t worry about fitting into any particular category.

He is a prolific recording artist, with hundreds of collaborations and guest appearances with artists such as Alex Skolnick, Montreux, Jeff Loomis, and Paolo Giordano, thanks in part to his role as house bassist with Windham Hill Records. He has also released a number of solo studio albums.

Original solo work (links to Amazon):

1986 Unusual Weather
1989 Toward the Center of the Night
1991 Drastic Measures
1994 Thonk
1995 Up Close 21
1998 The Book of Flame
2005 Soliloquy

(Video) Davie504 Plays 100 Amazing Bass Lines

“Davie504” is an Italian bass player with a great selection of YouTube videos.

In this video he plays 100 famous bass riffs in 13 minutes:

There’s also a sequel “100 Amazing Bass Lines 2” that will show up as a related video.

It includes riffs from a nice variety of bands, though it does have a heavier selection of Red Hot Chili Peppers and Jamiroquai.

If you want to learn a particular riff, you can click the gear on the player and change the video player to half speed. That’s a great feature of YouTube that not everyone knows about.

WbSrch Acquires Music Search Engine

Reprint of a press release originally published on PRWeb at https://www.prweb.com/releases/2016/03/prweb13295857.htm.

WbSrch has purchased the music search engine Snagr.io for an undisclosed sum and has rebranded it as MusicSrch.com.

The search engine, originally developed by Oto Brglez from Slovenia, searches a selection of popular social media and music sites such as Spotify, Tidal, Musicbrainz, and Twitter for the online presences of a band, DJ, or musician.

WbSrch founder Jason Champion had this to say about the purchase:

“Oto created a great, fast, responsive site but didn’t have the time, people, or resources to grow its audience. With this purchase, we hope to continue to expand on the great foundation that he created.

The addition of music search to our existing search capabilities will help us increase our reach online and we look forward to sharing the joy of discovering new music with a larger audience.”

The music search site is currently live at http://musicsrch.com.

WbSrch is a general-purpose search engine based in Portland, OR, that was created in 2013 and launched in 2014. It is online at https://wbsrch.com.

Quora Answer: My son got an offer from a 1-year-old startup by some very senior folks from Google. The pay is good and product idea is good, but it’s a startup. He asked for our advice on this. What are some suggestions from people from relevant fields?

I originally wrote this as an answer to a question on Quora.

I joined a startup founded by ex-Google people in 2010 as the third engineer. I worked there for 2 years, had a great time, learned a ton, got my first patent out of the process, and the experience is one of the best I’ve ever had. I wouldn’t trade it for anything.

The only reason I didn’t stay is that California wasn’t for me, but the startup is still going, and it feels like my shares are a pocket full of scratch-off lottery tickets. They could end up failing, but they won’t.

A year in is probably the best time to join a startup with solid backing, and even though being ex-Google doesn’t necessarily make you smarter or more likely to succeed than anyone else, it sure makes it easier to get funded.

In the worst scenario, they’ll run out of money and he’ll have to spend three or four weeks looking for a job. If he has any skill and/or motivation, he’ll be so much smarter and better for the experience that it’ll be really hard to NOT get hired for something new.

That’s the big fear fallacy with startups — that they’ll fail and you won’t be able to get a job. But all of the things you go through in making a serious go of building a startup make you so much better at what you do that there’s nothing to fear. I like to think of the experience factor as being 2 to 1. If you spend 2 years in a startup it’s a similar learning experience to spending 4 years at a stable company.

The fatigue factor is also similar. I feel like software can be a very mentally taxing and draining endeavor, so a 6-month sabbatical every 5 years is almost a must unless you’re in a perfect environment. Being in a startup shortens that “wear out” time, so you might need to take a break every 2-3 years in order to remain healthy and sane.

The most important thing to do if you’re considering joining a startup is to make sure the people running it are somewhat-well-adjusted grownups, or at least likely to become so in the near future.

WbSrch Launches New WbBrowse Web Browser for Windows, Mac, and Linux

Reprint of a press release originally published on PRWeb at https://www.prweb.com/releases/2016/01/prweb13179551.htm.

The independent search engine WbSrch just launched a new desktop web browser called WbBrowse.

WbBrowse supports tabbed browsing, is free, runs on Windows, Mac, and Linux, and is available at http://wbbrowse.com

“The major search engines all have their own browsers. It’s a good way for new users to discover your service, and an easy way for them to return to it later. Even though the first release of WbBrowse doesn’t have all of the features of the top browsers yet, we think it’s pretty good for a 1.0 release.”

WbSrch also has OpenSearch plugins that can be added to any OpenSearch-capable browser, such as Firefox or Internet Explorer.

WbSrch is a general-purpose search engine based in Portland, OR, that was created in 2013 and launched in 2014.

Quora Answer: What is the minimum number of pages a modern general search engine would have to index to be useful?

I originally wrote this as an answer to a question on Quora.

“Useful” is a very subjective term. People who frequently ask deep and complex language-and-algorithm-specific software engineering questions will require different levels of depth than those who travel a lot and just want to find good prices on airfare and the top 5 restaurants and hotels in each city.

If you have /really/ good algorithms, you can build a search engine that is “good” for people who don’t require much depth with about 100 million pages. For people who require depth, you could probably be pretty useful at about 1 billion pages.

This depends HEAVILY on what you choose to include and exclude. Are these pages all in a single language? Or is this just 100 pages each from the top 1 million sites regardless of language and content?

Even though the web is phenomenally huge, much of it is duplication and/or computer-generated spam. There are millions of sites that are just scrapes/dumps of other sites (especially Wikipedia) and indexing 1000 copies of Wikipedia with different CSS isn’t going to get you very far.

Think about the sites you visit regularly, and about those that regularly turn up in searches. How many of those useful sites are below the top 100,000? Does it matter if there are 100 million+ domains when 99.9% of your needs are covered by the top 0.1%? With a smaller index, choosing what you leave out is pretty important.
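The arithmetic behind those figures is easy to check. Here is a rough Python sketch using the round numbers from this answer; the variable names are my own, and none of the values are measurements:

```python
# Round figures taken from the text above; these are not measured counts.
pages_per_site = 100
top_sites = 1_000_000
total_domains = 100_000_000

# 100 pages from each of the top 1 million sites.
index_size = pages_per_site * top_sites

# The top 0.1% of domains, using integer math to avoid float rounding.
top_domains = total_domains // 1000

print(f"Index size: {index_size:,} pages")
print(f"Top 0.1% of domains: {top_domains:,}")
```

That is where the 100 million pages and the top 100,000 sites come from.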

There’s a site I like to play with when trying to find obscure results. It’s fun to experiment with, and it helps you understand how much the size and quality of your index affect your results: Million Short.

Setting Up a Redash Dashboard

This was originally posted on wbsrch.com. It is reproduced here to preserve history.

The more WbSrch evolves, the more it becomes necessary to keep track of a bunch of metrics.

Until now we’ve been using a mix of simple report pages and raw SQL queries. It has worked well enough, but not having a clean way to track things in a single place is a nuisance.

That’s why I was happy to discover the redash.io open source project. It’s a query tool meant to be used for setting up business intelligence dashboards and it works with a wide range of databases.

No stranger to code, I tried to check out the GitHub source and get it running on my local machine. It didn’t quite work out. They have a bootstrap script, and it had some trouble with my particular system setup (it fell over when it came to configuring local database users).

But they also have EC2 AMIs you can launch to get running in AWS. I fired up an Amazon micro instance on the free tier and had the app running in seconds. It only took some minor configuration to get set up with my SSL certificate, and I was ready to go.

Adding a Database Connection to Redash

Connecting my three PostgreSQL databases was simple, and the clean interface made it easy to find the query editor. After running a few queries, I had enough of a feel for how things worked to start saving them. You can also set a refresh interval on your queries so the data refreshes daily, hourly, or on whatever schedule you like. Results are cached, so you’re not taxing your database by gathering totals on every page load.
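To illustrate why a refresh interval plus caching keeps the load off the database, here is a minimal Python sketch of the general idea. It is not Redash's code or API: the `CachedQuery` class, its parameters, and the `count_searches` stand-in are all hypothetical, and this version refreshes lazily when a result is requested rather than on a schedule.

```python
import time

class CachedQuery:
    """Illustrative only: keep a query result and re-run it once it is stale."""

    def __init__(self, run_query, refresh_seconds, clock=time.monotonic):
        self._run_query = run_query            # callable that hits the database
        self._refresh_seconds = refresh_seconds
        self._clock = clock                    # injectable so it can be tested
        self._result = None
        self._fetched_at = None

    def result(self):
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self._refresh_seconds
        )
        if stale:
            # Only here does the database actually get queried.
            self._result = self._run_query()
            self._fetched_at = now
        return self._result


def count_searches():
    # Hypothetical stand-in for a real SQL query run against PostgreSQL.
    return 42

hourly = CachedQuery(count_searches, refresh_seconds=3600)
print(hourly.result())  # runs the query
print(hourly.result())  # served from the cache, no database hit
```

With an hourly interval, a dashboard panel backed by this object would query the database at most once per hour no matter how many times the page is loaded.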

Redash Query Editor

After you have a few queries, you can start adding them to a dashboard as panels. You just select the query name, the visualization type (you get table by default, but can add graphs and charts in the query builder), and the widget size.

This is a dashboard that I built to keep track of the search traffic and index state for the Somali-language version of WbSrch:

Redash Dashboard Example

I created dashboards for each supported language plus an overall meta-dashboard. It was fairly quick, taking about a day to set up 35 dashboards and about 200 queries.

Luckily the interface is pretty good, because once you have the software set up, that’s where the documentation ends. You can figure out most things by trial and error, but it would be very helpful to have a few getting-started tutorials, or at least an explanation of how the various visualizations work.

A micro EC2 instance may stumble if you have some large queries (selecting an entire table is a bad idea, don’t do it), or a lot of things refreshing, but it kept up pretty well.

WbSrch, the Independent Search Engine, Expands

Reprint of a press release originally published on PRWeb at https://www.prweb.com/releases/2015/11/prweb13073007.htm.

WbSrch, an independent search engine based in Oregon, has expanded its data center, growing from a single dedicated server to three.

Founder Jason Champion had this to say about the expansion:

“We’ve grown enough that a single server no longer meets our needs. Tripling our footprint will allow us to continue growing and improving our algorithms throughout 2016.

We chose colocating servers over cloud hosting like EC2 because we’re running a very RAM-intensive operation. Most cloud hosting companies make their profits on memory, so once you go beyond a certain point, it’s much less expensive to host your own servers.

Our hosting provider, Opus Interactive, has been great. Their price, quality of service, and reliability have enabled us to continue improving our algorithms without worrying too much about infrastructure costs, and they have plenty of rack space available if we need to grow quickly.”

Started in 2013 and launched in 2014, WbSrch is still quite young, with an index of 10 million pages grouped into more than 30 languages. Traffic has been steadily increasing as the index grows and algorithms improve.

Rather than trying to crawl and index every page on the web, WbSrch aims to build quality results by weighing what is excluded just as heavily as what is included. More than 500,000 domains have been excluded, based primarily on content language and a few other criteria. The number of crawled pages, indexed keywords, and excluded sites is published on the site.

An Experiment with Project Wonderful

This was originally posted on wbsrch.com. It is reproduced here to preserve history.

I’m always looking for new and efficient ways to let people know about WbSrch. That’s why I decided to try advertising with Project Wonderful.

Project Wonderful was built as a banner ad network for web comics.

That doesn’t mean you can only advertise web comics or that advertising can only be placed on web comic sites, but that’s its core demographic.

As a trial, I ran an ad for WbSrch on a few sites that seemed like they’d have people who would be interested in trying out a new search engine. That means other search engines, SEO sites, and literature sites. I also wanted to find out whether webcomic readers were a good target audience.

I deposited $100, and after spending about $70, I think I have a pretty good idea of what works and what doesn’t.

If you want really fine-grained control over your campaigns and ad spending, this is the perfect network for you. You know exactly what you’re going to spend per day on a site, and you can bid on traffic on a per-region basis. Their regions are US, Canada, Europe, and Everywhere Else.

The search functionality is amazing. For example, you can search for gaming sites that get at least 50% of their traffic from Germany and have between 100 and 10,000 page views per day.

As a publisher, you can set per-region bid minimums and can auto-approve bids, or require manual approval. This means that you don’t have to worry about running ads for things that you’d be opposed to, so no bacon ads on a veganism site.

Results have been mixed, and I’ve learned more about the types of people who are interested in trying WbSrch.

Some takeaways:

  • Webcomic sites have a high number of page views, but the number of unique users tends to be a fraction of that. The same goes for SEO tools.
  • Blogs tend to have more unique users and fewer page views.
  • Literature sites are somewhere in-between.

Here are my slightly-obfuscated results:

Site               Pageviews  Unique Views  Clicks  Spend ($)  CPM ($)  CPC ($)
A Major Webcomic      581953         11126      35      13.34     0.02     0.38
An SEO Site           233489         17881     141      50.54     0.22     0.36
A Poetry Site          60780          4876      34       5.64     0.09     0.17
A Dutch Site           15584           305       4       0.73     0.05     0.18
A Hungarian Site        3485           730       1       0.35     0.10     0.35
A Search Engine         2711           963      40       0.62     0.23     0.02
A Swedish Site          2315           621       0       0.39     0.17      INF
A Movie Blog            1424           477       0       0.25     0.17      INF
A Knowledge Blog        1285           967       1       0.42     0.33     0.42
A Web Directory          296            83       0       0.05     0.17      INF
A Science Blog           231           124       1       0.29     1.24     0.29
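The last two columns can be recomputed from the others. This is a minimal Python sketch, assuming CPM means spend per 1,000 page views and CPC means spend per click; the function names are my own for illustration, and small differences from the table are possible because the published spend figures are rounded.

```python
import math

def cpm(spend: float, pageviews: int) -> float:
    """Cost per 1,000 page views (assumed definition)."""
    return spend / pageviews * 1000

def cpc(spend: float, clicks: int) -> float:
    """Cost per click; infinite when there were no clicks (INF in the table)."""
    if clicks == 0:
        return math.inf
    return spend / clicks

# Three rows from the table: (site, pageviews, clicks, spend in dollars)
rows = [
    ("A Major Webcomic", 581953, 35, 13.34),
    ("A Search Engine", 2711, 40, 0.62),
    ("A Swedish Site", 2315, 0, 0.39),
]

for site, pageviews, clicks, spend in rows:
    print(f"{site}: CPM {cpm(spend, pageviews):.2f}, CPC {cpc(spend, clicks):.2f}")
```

Sorting sites by CPC rather than CPM is probably the more useful comparison here, since the goal was clicks from people willing to try a new search engine, not raw impressions.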

The efficiency varies by site, but some are unbeatable deals for targeted traffic. Others are pricey, but they reach just the type of people who will spend some time searching for themselves and the things they control. Hopefully we’ll be good enough for them to come back again.

There are some sites that I’ll run ads on as long as they exist even though the traffic is low. It’s easy to convince people trying a new search engine to try another new search engine.

I also suspect that my Hungarian and Swedish translations aren’t very good. I know basic Swedish, but the Hungarian is robot-translated.

One of the limitations of Project Wonderful is that if you have a large budget, you may run out of places to advertise efficiently, and the placements that are efficient may not get enough traffic to satisfy your hunger (2-cent clicks from your site? I’ll buy at least 1000 per day!). I could easily see struggling to spend a $1000/day budget effectively. If you’re prepared to work on a smaller scale, there is probably no better place to test-run ads, because their data and reporting are good and you can learn a lot from your experiments. They also have enough fine-grained control that you can iterate and learn quickly.

$70 is hardly enough to get the full measure of an ad network, but I think I was able to get some useful data out of this experiment. Try Project Wonderful. You may just find it wonderful for your project, especially if your project plays well to webcomic audiences.

Arcade Tokens

These arcade (and other) tokens were originally posted on stampscoinsnotes.com.

Quora Answer: Are analog synthesizers overrated?

I originally wrote this as an answer to a question on Quora.

To put it simply, yes. Analog synthesizers are absolutely overrated. I’m referring specifically to subtractive synthesizers.

The differences in sound quality really ARE quite minor, so don’t feel bad if you can’t tell them apart.

That’s not to say they’re not useful. They certainly are, and I say this as someone who owns more than one. They each have their own distinctive sound, but analog sounds are quite limited. I don’t mean you can’t create a wide variety of sounds with them, but rather that the types of sounds you can create don’t vary much from one analog synthesizer to another.

Most of the love (and idolatry) for vintage analog gear comes from how groundbreaking and iconic certain synthesizers are, how important they were to the music of their time, and how much joy the music created with them has brought to listeners. Just like most vintage cars aren’t actually very good cars, the historical and nostalgia aspects of vintage analog synthesizers affect their value greatly.

From a music maker’s perspective, the analog sound is worth it to a certain degree, but vintage analog not as much. 30- to 40-year-old synthesizers break all the time, and you’ll be hard-pressed to find a single player or collector who hasn’t spent inordinate amounts of time and/or money on repairs. Good luck touring with an old Jupiter or CS. Just use or create the sounds you like and don’t worry about how they came into existence. The nice thing about most modern synthesizer workstations is that they all have a massive library of sample-based sounds (including the sounds of vintage analog), one or more kinds of virtual analog synthesis, and the ability to load samples. Find a friend who likes vintage gear and sample some of it, because a working sampler or virtual analog sounds far better than a broken analog.

In a way it’s good that there’s an analog fad going on right now. New analog gear is being created and can substitute very well for decaying vintage gear. A Prophet-6 is a great buy when you can get it new, and it has much better capabilities and stability than a Prophet-5, not the least of which is a large amount of patch memory.

Quora Answer: I know almost nothing about the stock market but have strong programming skills. Will I be able to make profitable trading software?

I originally wrote this as an answer to a question on Quora.

The world is littered with the empty wallets of engineers who approached the stock market as a math, algorithmic, or engineering problem.

The largest investment banks all have algorithmic trading programs, and they often lose large chunks of money. And sometimes they make large chunks of money. You might be able to compete in the gaps, but they have what amounts to infinitely deep pockets to hire the absolute best people, give them the highest-quality tools, and set up servers with sub-millisecond access to market systems whose connection fees you couldn’t begin to afford.

Something that makes money by trading algorithmically is one of the worst forms of parasite. Like a tapeworm, it removes nutrients from the market and offers nothing in return. But creating something like that is an interesting intellectual problem.

You might be able to build profitable trading software, but if you make it your life’s work, it’s entirely possible that you’ll end up with nothing to show at the end of the journey.