Hacker News


Scythe Works Without Borders

Sun, 10/27/2024 - 1:13am
Categories: Hacker News

Robots.txt pitfalls: what I learned the hard way

Sun, 10/27/2024 - 1:09am

This applies to sites indexed by Google that hope to gain organic traffic. As an indie blogger and SEO enthusiast, I foolishly updated my robots.txt file to block crawling of certain unwanted parts of my site, which led to subtle repercussions that I couldn't have foreseen.

A few days ago, while reading about SEO, I came across the concept of a "crawl budget." Apparently, Google allocates a specific crawl budget to each indexed site, and the more low-value URLs it has to crawl and store on its servers, the less budget remains for the pages that matter. The result can be delays in indexing new content, picking up favicon updates, and re-crawling robots.txt.

Being a minimalist and utilitarian, I decided to block crawling of the `/uploads/` directory on my site, since it mostly contained images used in my articles. I thought blocking this "useless content" would free up more crawl budget for my primary content, i.e., articles. So, I added this directory to my site's robots.txt:

```
# Group 1
User-agent: *
Disallow: /public/
Disallow: /drafts/
Disallow: /theme/
Disallow: /page*
Disallow: /uploads/

Sitemap: https://prahladyeri.github.io/sitemap.xml
```

Because of the way search engines work, there's typically a lag of about 5-7 days between updating the robots.txt file and seeing its effects in search results. After about a week, I noticed that my site's favicon had disappeared from SERPs on mobile browsers! A bland (empty) icon appeared in its place. That's when I realized that my favicons also resided in the `/uploads/` directory. I had recently optimized the favicon by switching its format from WEBP to PNG, so Google needed to fetch the new file, but the new rule blocked it, and Google was unable to crawl and index the new favicon at all!
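This kind of mistake, a plain prefix rule that silently covers a file you still need crawled, can be caught offline before deploying. The sketch below uses Python's standard-library `urllib.robotparser` to list which important URLs become blocked when moving from an old robots.txt to a new one. It assumes the "before" file was identical to the one above minus the `/uploads/` rule, and the favicon and article URLs are hypothetical, since the post does not give the exact paths. Note that `RobotFileParser` does simple prefix matching and does not implement Google's `*` wildcard, so a rule like `Disallow: /page*` is not evaluated the way Googlebot would evaluate it; the `/uploads/` rule is a plain prefix and is unaffected.

```python
from urllib.robotparser import RobotFileParser

# robots.txt after the change, as quoted in the post.
AFTER = """\
# Group 1
User-agent: *
Disallow: /public/
Disallow: /drafts/
Disallow: /theme/
Disallow: /page*
Disallow: /uploads/

Sitemap: https://prahladyeri.github.io/sitemap.xml
"""

# Assumption: the "before" version was the same file without the /uploads/ rule.
BEFORE = AFTER.replace("Disallow: /uploads/\n", "")

# Hypothetical URLs for illustration; the real favicon and article paths are not given.
IMPORTANT_URLS = [
    "https://prahladyeri.github.io/uploads/favicon.png",
    "https://prahladyeri.github.io/blog/some-article.html",
]


def crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


def newly_blocked(old: str, new: str, urls: list[str]) -> list[str]:
    """List URLs that were crawlable under `old` but are blocked under `new`."""
    return [u for u in urls if crawlable(old, u) and not crawlable(new, u)]


if __name__ == "__main__":
    for url in newly_blocked(BEFORE, AFTER, IMPORTANT_URLS):
        print("newly blocked:", url)
```

Running it prints the favicon URL as newly blocked, while the article URL stays crawlable. Because every rule sits under `User-agent: *`, the agent name passed to `can_fetch` does not change the result here.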

Once I realized this mistake, I removed the `/uploads/` rule from robots.txt and requested a recrawl. But who knows how long it will take for Google's systems to pick up this change and show the site's favicon in SERPs again! Two lessons learned:

1. The robots.txt file is highly sensitive; avoid modifying it if possible.
2. SEO is like steering an extremely large ship: you pull a lever now, and the ship only moves several days later!

Comments URL: https://news.ycombinator.com/item?id=41960003

Points: 2

Number of comments: 0

Categories: Hacker News

All (386) GNU Packages

Sun, 10/27/2024 - 1:04am

Article URL: https://directory.fsf.org/wiki/GNU

Comments URL: https://news.ycombinator.com/item?id=41959984

Points: 3

Number of comments: 0

Categories: Hacker News
