Monthly Archives: April 2017

Quora Answer: How would you find the websites to build a search index from scratch?

I originally wrote this as an answer to a question on Quora. It depends on the scale. If you just want to experiment with web crawling and build a basic search index, a common starting point is the Alexa top one million websites list, which can be downloaded as a zipped CSV file from S3 at: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip The top […]
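
As a rough illustration of turning that download into a crawl seed list, here is a minimal Python sketch. It assumes the archive holds a single file named top-1m.csv with headerless rank,domain rows (for example "1,google.com"); the function names read_seed_domains and to_seed_urls are hypothetical helpers made up for this example, not part of any library.

```python
import csv
import io
import zipfile


def read_seed_domains(zip_bytes, limit=None, member="top-1m.csv"):
    """Return up to `limit` domains from a zipped CSV of rank,domain rows.

    Assumption: the archive contains a file called `member` with no header
    row, where column 0 is the rank and column 1 is the domain.
    """
    domains = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                domains.append(row[1].strip().lower())
                if limit is not None and len(domains) >= limit:
                    break
    return domains


def to_seed_urls(domains, scheme="http"):
    """Turn bare domains into root URLs a crawler could start from."""
    return [f"{scheme}://{domain}/" for domain in domains]
```

The bytes could come from something like urllib.request.urlopen(url).read() on the address above, assuming the file is still served there. A small limit (say, the first thousand domains) is probably enough for a first experiment.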