All times are UTC [ DST ]




 Page 1 of 1 [ 15 posts ] 
Author Message
 Post subject: Googlebot says I'm blocking them...
PostPosted: 12 Jun 2024, 08:52 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
... and so the number of results on Google for LDDb or forum-related topics is slowly shrinking.

Problem is: I do not block Google

robots.txt is here => (file removed)

Do any of you know which filter breaks Google?

Julien
_________________
HARDWARE DATABASE
HLD-X0/9 LD-S9 OPPO 105/205 SL-1200G
LDD-1 MSC-4000 R2144 PONTUS II C45 MC257
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 12 Jun 2024, 09:00 
Absolute fan

Joined: 11 Jun 2008, 06:10
Posts: 1636
Location: Milky Way-Sol System-Terra-USA-North Carolina.
Has thanked: 611 times
Been thanked: 258 times
What does that mean exactly ?
_________________
Acta Non Verba .....
Si Vis Pacem Para Bellum ....
Si Gorgiamus Allos Subjectatos Nunc ......
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 12 Jun 2024, 09:17 
Serious fan

Joined: 22 Oct 2023, 20:20
Posts: 226
Location: Germany
Has thanked: 87 times
Been thanked: 123 times
I am not an expert on the topic.
That said:
I think the problem is a simple typo in the robots.txt.

"User-agent: Googlebot
Craw-delay: 10"

should be:
"User-agent: Googlebot
Crawl-delay: 10"

I guess due to the missing "l" it is interpreted as "disallow" or ignored altogether.
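
A quick way to see what a parser does with that typo is to feed both versions to one. The sketch below uses Python's standard-library `urllib.robotparser`, which is NOT Google's parser, so it only shows how one common implementation reacts: the misspelled line is skipped and nothing turns into a disallow. The two snippets are the ones quoted above; the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt text with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The two variants quoted in this post.
TYPO = "User-agent: Googlebot\nCraw-delay: 10\n"
FIXED = "User-agent: Googlebot\nCrawl-delay: 10\n"

typo_rules = parse_rules(TYPO)
fixed_rules = parse_rules(FIXED)

# The misspelled directive is unknown to this parser, so no delay is recorded.
print(typo_rules.crawl_delay("Googlebot"))   # None
print(fixed_rules.crawl_delay("Googlebot"))  # 10

# Neither variant contains a Disallow rule, so fetching stays allowed.
# The URL is a placeholder, not the real site.
print(typo_rules.can_fetch("Googlebot", "https://example.com/some/page"))  # True
```

If Google's parser behaves the same way for unknown lines, the typo alone would not explain a "blocked" report, but that is an assumption until tested against Google's own tools.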

Cheers,
Markus
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 12 Jun 2024, 19:40 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
mth1986 wrote:
Craw-delay: 10

I guess due to the missing "l" it is interpreted as "disallow" or ignored altogether.


Well spotted!

Trying to re-validate with Google to see if they like it better.

Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 18 Jun 2024, 09:41 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
Didn't work, Googlebot ignores the crawl-delay anyway.

But I see that the sitemaps failed to be retrieved by Google and they never mentioned it before...

Re-activating the Sitemaps.

Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 18 Jun 2024, 11:33 
Serious fan

Joined: 22 Oct 2023, 20:20
Posts: 226
Location: Germany
Has thanked: 87 times
Been thanked: 123 times
Let's see if that did the trick.
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 21 Jun 2024, 11:04 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
It was still "blocked by robots.txt" even though nothing in the file disallowed Google user-agents.

Removed the robots.txt and it worked again. No idea why.

Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 21 Jun 2024, 16:36 
True fan

Joined: 26 Mar 2024, 10:01
Posts: 275
Location: Australia
Has thanked: 147 times
Been thanked: 84 times
KILL Google
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 21 Jun 2024, 21:59 
Serious fan

Joined: 22 Oct 2023, 20:20
Posts: 226
Location: Germany
Has thanked: 87 times
Been thanked: 123 times
Have you tried tools like:

https://technicalseo.com/tools/robots-txt/

https://sanofeld.de/robots-tester/

I never had to use them myself, but maybe it is worth a try.
For sanofeld you need to put the robots.txt back in place so it can be checked.
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 22 Jun 2024, 11:47 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
Google's robots-test page seems to treat ANY directive as aimed at them, regardless of the user-agent declared above it.
I'll rewrite the robots.txt from scratch, but I think Google changed something ~1 month ago.

Anyway, back on track:

For LDDb.com
Quote:
Google has started validating your fix of Page indexing issues on your site. Specifically, we are checking for ‘Blocked by robots.txt’, which currently affects 602292 pages.

Quote:
Google has started validating your fix of Page indexing issues on your site. Specifically, we are checking for ‘Crawled - currently not indexed’, which currently affects 2290577 pages.


For forum.LDDb.com
(not yet started)

I found that, since 2021, Google has been publishing the list of their crawlers' IP addresses at https://developers.google.com/search/apis/ipranges/googlebot.json.

I'll pre-allow them for safety.
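
For anyone wanting to do the same, here is a minimal sketch of checking an address against that list. It assumes the file keeps its current layout (a `prefixes` array whose entries carry either `ipv4Prefix` or `ipv6Prefix`). The embedded sample is illustrative only, with a made-up `creationTime` and just two prefixes; a real setup would download the live file instead. The function names are mine, not part of any Google tooling.

```python
import ipaddress
import json

# Illustrative sample in the shape of googlebot.json; NOT the live list.
SAMPLE_JSON = """
{
  "creationTime": "2024-06-22T00:00:00.000000",
  "prefixes": [
    {"ipv4Prefix": "66.249.79.64/27"},
    {"ipv6Prefix": "2001:4860:4801:10::/64"}
  ]
}
"""


def load_networks(text):
    """Turn the JSON document into a list of ip_network objects."""
    data = json.loads(text)
    networks = []
    for entry in data["prefixes"]:
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_listed(ip, networks):
    """Return True if the address falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    return any(net.version == addr.version and addr in net for net in networks)


networks = load_networks(SAMPLE_JSON)
print(is_listed("66.249.79.75", networks))  # True (inside the sample /27)
print(is_listed("203.0.113.5", networks))   # False (documentation range)
```

The resulting list can then feed whatever firewall or web-server allowlist is in use; how that is wired up depends on the server.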

Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 04 Jul 2024, 19:23 
Serious fan

Joined: 22 Oct 2023, 20:20
Posts: 226
Location: Germany
Has thanked: 87 times
Been thanked: 123 times
Just curious:
Is the problem solved?
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 04 Jul 2024, 23:45 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
mth1986 wrote:
Just curious:
Is the problem solved?


Only by completely removing the robots.txt file, which is really weird.
It used to work before for many, many years!

Something happened at Google that we haven't heard of yet.

Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 17 Jul 2024, 03:14 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
I was not paranoid...!

https://tech.slashdot.org/story/24/07/16/1843240/google-now-defaults-to-not-indexing-your-content

Quote:
Google is no longer trying to index the entire web. In fact, it's become extremely selective, refusing to index most content. This isn't about content creators failing to meet some arbitrary standard of quality. Rather, it's a fundamental change in how Google approaches its role as a search engine.

From my experience, Google now seems to operate on a "default to not index" basis. It only includes content in its index when it perceives a genuine need.


I can tell you it started around May 13th and increased until I removed the robots.txt altogether and asked for a re-indexing around June 22nd.

Attachment: block_by_robots.txt.png [ 13.1 KiB ]


Attachment: LDDb-indexed.png [ 19.14 KiB ]


For some reason, it didn't work on this Forum: we are slowly disappearing from Google searches...

Attachment: block_by_robots.txt-Forum.png [ 10.53 KiB ]


Attachment: Forum-indexed.png [ 18.17 KiB ]


Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 28 Jul 2024, 09:09 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
The solution was to get rid of robots.txt, block AI crawlers, and pre-allow Google and a few others to crawl.
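
Roughly, the logic is "allowlist first, then block by User-Agent". The sketch below is only an illustration of that ordering, not the actual server configuration: the network, the crawler tokens (GPTBot, CCBot, ClaudeBot) and the `decide` function are all hypothetical examples chosen for this post.

```python
import ipaddress

# Hypothetical values for illustration; the real lists are not given here.
ALLOWED_NETWORKS = [ipaddress.ip_network("66.249.79.64/27")]
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def decide(ip, user_agent):
    """Return 'allow', 'block' or 'default' for one request."""
    addr = ipaddress.ip_address(ip)
    # 1. Pre-allowed crawlers win, whatever the User-Agent header says.
    for net in ALLOWED_NETWORKS:
        if net.version == addr.version and addr in net:
            return "allow"
    # 2. Known AI crawler tokens are blocked (case-insensitive match).
    lowered = user_agent.lower()
    if any(token.lower() in lowered for token in AI_CRAWLER_TOKENS):
        return "block"
    # 3. Everything else goes through the normal rules.
    return "default"


print(decide("66.249.79.75", "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # allow
print(decide("203.0.113.5", "Mozilla/5.0 (compatible; GPTBot/1.0)"))      # block
print(decide("203.0.113.5", "Mozilla/5.0 Firefox/126.0"))                 # default
```

Checking the IP before the User-Agent matters because the header is trivial to fake, while the source address is much harder to spoof.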

It seemed to improve Google indexing on the main LDDb.com:

Attachment: LDDb.png [ 17.65 KiB ]


But not for the Forum:

Attachment: Forum.png [ 17.54 KiB ]


Here is the bandwidth (== Out) from when the AI crawlers started acting crazy, which led to 2 rounds of 30-day blocks per IP/User-Agent:

Attachment: Clipboard_07-28-2024_01.png [ 43.34 KiB ]


Julien
 
 Post subject: Re: Googlebot says I'm blocking them...
PostPosted: 31 Jul 2024, 02:20 
Site Admin

Joined: 07 Aug 2002, 23:37
Posts: 4753
Location: Tokyo
Has thanked: 323 times
Been thanked: 1284 times
Oh... Google does not honor crawl-rate anymore :-/

https://developers.google.com/search/blog/2023/11/sc-crawl-limiter-byebye
https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget#emergencies

Quote:
The non-standard "crawl-delay" robots.txt rule is not processed by Googlebot.


That sure doesn't help.

Here are the top 10 crawlers this month:

Hosts: 30,192 Known, 23,483 Unknown (unresolved ip), 37,617 Unique visitors

Host                                  Pages     Hits      Bandwidth
crawl-66-249-79-75.googlebot.com      267,211   267,211   2.53 GB
crawl-66-249-76-69.googlebot.com      217,311   217,311   2.19 GB
crawl-66-249-79-76.googlebot.com      214,158   214,158   2.04 GB
crawl-66-249-76-70.googlebot.com      166,465   166,465   1.69 GB
crawl-66-249-79-64.googlebot.com      164,493   164,493   1.58 GB
crawl-66-249-76-71.googlebot.com      120,471   120,471   1.23 GB
crawl-66-249-79-65.googlebot.com      111,521   111,521   1.10 GB
crawl-66-249-76-72.googlebot.com      78,509    78,509    828.70 MB
crawl-66-249-79-66.googlebot.com      71,149    71,149    728.98 MB
crawl-66-249-66-165.googlebot.com     56,581    56,581    537.56 MB
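
To put a single number on it, the ten rows above can be added up. This is just arithmetic on the figures as posted, assuming the report counts 1 GB as 1024 MB; the `summarize` helper is a throwaway name for this sketch.

```python
# The ten rows from the report above: host, pages, hits, bandwidth, unit.
TOP_CRAWLERS = """\
crawl-66-249-79-75.googlebot.com 267,211 267,211 2.53 GB
crawl-66-249-76-69.googlebot.com 217,311 217,311 2.19 GB
crawl-66-249-79-76.googlebot.com 214,158 214,158 2.04 GB
crawl-66-249-76-70.googlebot.com 166,465 166,465 1.69 GB
crawl-66-249-79-64.googlebot.com 164,493 164,493 1.58 GB
crawl-66-249-76-71.googlebot.com 120,471 120,471 1.23 GB
crawl-66-249-79-65.googlebot.com 111,521 111,521 1.10 GB
crawl-66-249-76-72.googlebot.com 78,509 78,509 828.70 MB
crawl-66-249-79-66.googlebot.com 71,149 71,149 728.98 MB
crawl-66-249-66-165.googlebot.com 56,581 56,581 537.56 MB
"""

# Assumption: the report uses binary units (1 GB = 1024 MB).
UNIT_TO_GB = {"GB": 1.0, "MB": 1 / 1024}


def summarize(report):
    """Sum the page count and bandwidth (in GB) over all rows."""
    total_pages = 0
    total_gb = 0.0
    for line in report.splitlines():
        host, pages, hits, amount, unit = line.split()
        total_pages += int(pages.replace(",", ""))
        total_gb += float(amount) * UNIT_TO_GB[unit]
    return total_pages, total_gb


pages, gigabytes = summarize(TOP_CRAWLERS)
print(f"{pages:,} pages, about {gigabytes:.1f} GB")
# 1,467,869 pages, about 14.4 GB
```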


If you feel the website is slower than usual, thank Google for that.

Attachment: CPU.png [ 53.56 KiB ]


Julien
 