Monthly Archives: April 2017

Quora Answer: How would you find the websites to build a search index from scratch?

I originally wrote this as an answer to a question on Quora.

It depends on the scale.

If you just want to experiment with web crawling and build a basic search index, it’s common to start with the Alexa top million websites, which can be downloaded as a zipped CSV file from S3: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

The top million changes daily, and includes a lot of spam and porn. It’s easy to game the system to get into the bottom 500k, so you’ll need to decide what to include and how to weight it.
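As a rough illustration of deciding what to include and how to weight it, here is a minimal Python sketch. It assumes the unzipped file holds one "rank,domain" pair per line. The function name load_seeds, the blocklist parameter, and the 1/rank weighting are all hypothetical choices for this example, not anything Alexa prescribes.

```python
import csv
import io


def load_seeds(csv_text, max_rank=500_000, blocklist=()):
    """Return (url, weight) pairs for domains ranked 1..max_rank.

    Assumes each row looks like "rank,domain". Rows that do not fit
    that shape are skipped rather than treated as errors.
    """
    seeds = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2 or not row[0].strip().isdigit():
            continue  # malformed or header line
        rank = int(row[0])
        domain = row[1].strip().lower()
        if rank < 1 or rank > max_rank or domain in blocklist:
            continue
        # Hypothetical weighting: rank 1 gets 1.0, rank 2 gets 0.5, and so on.
        seeds.append(("http://" + domain + "/", 1.0 / rank))
    return seeds


# Tiny inline sample standing in for the real file contents.
sample = "1,google.com\n2,youtube.com\n3,spam.example\n"
seeds = load_seeds(sample, max_rank=3, blocklist={"spam.example"})
```

Lowering max_rank is the bluntest way to drop the easily gamed tail of the list. A real crawler would probably want a hand-maintained blocklist and a gentler weighting curve than 1/rank.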

The DMOZ website dump used to be a good starting point. It left a lot out, but contained about 4 million URLs with most of the low-quality sites filtered out. DMOZ shut down on March 17th, 2017, so that’s no longer really an option. There may be a mirror of that data somewhere (heck, I’d like to download their last URL dump if it’s available anywhere).

Real search engines have agreements in place with the top-level domain registries that let them get a zone file dump listing all registered domains. This involves jumping through some hoops and filling out some forms, and each registry needs to be dealt with separately. Getting access to .social is completely separate from getting access to .net.
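Once you have a zone file dump, turning it into a domain list is mostly a matter of collecting the names that have NS records. The sketch below is a simplified Python example, not a full zone file parser. It assumes one record per line with an explicit owner name, a single-label TLD, and the usual "owner [ttl] [class] type data" layout. The function name domains_from_zone is made up for this example.

```python
def domains_from_zone(lines, tld):
    """Collect the unique delegated domains (NS record owners) in a TLD zone."""
    tld = tld.lower().strip(".")
    suffix = "." + tld
    domains = set()
    for raw in lines:
        line = raw.split(";", 1)[0]  # drop comments
        # Skip blanks, continuation lines, and $ORIGIN/$TTL directives.
        if not line.strip() or line[0].isspace() or line.startswith("$"):
            continue
        fields = line.split()
        rest = [f.upper() for f in fields[1:]]
        # Step over the optional TTL and class to reach the record type.
        while rest and (rest[0].isdigit() or rest[0] == "IN"):
            rest.pop(0)
        if not rest or rest[0] != "NS":
            continue
        owner = fields[0].lower()
        if owner == "@":
            continue  # the zone apex, i.e. the TLD itself
        if owner.endswith("."):
            owner = owner[:-1]      # absolute name
        else:
            owner = owner + suffix  # relative name, assumed relative to the TLD
        if owner == tld or not owner.endswith(suffix):
            continue
        # Keep only the registered domain (the last two labels).
        domains.add(".".join(owner.split(".")[-2:]))
    return domains


# Small inline sample standing in for a real zone file.
zone_lines = [
    "$ORIGIN social.",
    "; a comment line",
    "social. 86400 IN NS a.nic.social.",
    "example.social. 3600 in ns ns1.example.com.",
    "EXAMPLE.social. 3600 in ns ns2.example.com.",
    "other 3600 IN NS ns1.other.net.",
    "other 3600 IN A 192.0.2.1",
]
found = sorted(domains_from_zone(zone_lines, "social"))
```

Since the function only iterates over lines, an open file object can be passed in directly, which matters because real zone files are large.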

Since it takes a LOT of work to get access to the zone files for every TLD, a commercial service like Domains Index is probably your best bet if you want to do anything on a large scale. I’ve bought from them before and it’s a good service. They don’t have absolutely everything, but 200 million domains is far better than the 1 million you get from Alexa.

Quora Answer: Why do some people like instrumental music way over music with vocals in them?

I originally wrote this as an answer to a question on Quora.

Full question for this answer:

“I have noticed that my disinterest is turning to antipathy towards music with vocals in them, especially pop music. Most people say that vocals is extremely important to them and a song without it is just .. ‘blah’.

Is there a word for this sort of preference? How did people develop such an attraction towards instrumental music, while clearly the most heard music is with vocals in them (radio)?”

I strongly prefer instrumental music.

For me the main reason is that I can just enjoy the music and imagine/think whatever I like without being drawn into the world of the singer.

Most of the time when I’m listening to music, I don’t really want to think about someone else’s messed up relationship or dead dog. I’d rather imagine my own things and/or have my own thoughts. I like the way that sound enhances my thinking and I don’t want to give someone else control. Lyrics automatically change the subject.

I listen to a lot of genres, but the absolute worst thing is the tendency of a lot of ambient/atmospheric/psytrance electronic music to include fragments of lectures by Timothy Leary or Deepak Chopra. Stop ruining your music by trying to make idiots put ill-conceived ideas into my head. I instantly turn off anything that does that and never listen again.

Words paint a picture of a world. There are a lot of worlds I don’t want to live in. That’s why I never listen to country or gospel. But, when I’m in the mood for lyrics, the worlds are far stronger in non-instrumental music.

Words can be overwhelming sometimes. Notes, not so much.