Google’s John Mueller answered a question about whether llms.txt could be treated as duplicate content. He said it doesn’t make sense for it to be considered duplicate content, but added that it could make sense to take steps to prevent it from being indexed.
LLMs.txt
Llms.txt is a proposal for a new content format standard that large language models can use to retrieve the main content of a web page without having to deal with non-content data such as advertising, navigation, and anything else that’s not the main content. It gives web publishers the ability to provide a curated, Markdown-formatted version of their most important content. The llms.txt file sits at the root level of a website (example.com/llms.txt).
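As a rough illustration of what a curated, Markdown-formatted file might look like, the sketch below assembles one in Python. It assumes the layout described in the llms.txt proposal (a title heading, a short blockquote summary, then sections of annotated links). The function name `build_llms_txt`, the site name, and the URLs are hypothetical and not taken from the article.

```python
def build_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Assemble a minimal Markdown document in an llms.txt-style layout.

    `sections` maps a section heading to (link text, URL, short note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content for illustration only.
example = build_llms_txt(
    "Example Site",
    "Docs for the Example product.",
    {"Docs": [("Getting started", "https://example.com/docs/start.md", "Setup guide")]},
)
print(example)
```

The resulting text would be saved as `llms.txt` at the site root, so that it is reachable at a URL such as example.com/llms.txt.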
Contrary to some claims made about llms.txt, it is not in any way similar in purpose to robots.txt. The purpose of robots.txt is to control robot behavior, while the purpose of llms.txt is to provide content to large language models.
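To make the contrast concrete, here is a minimal sketch of what "controlling robot behavior" means in practice, using Python's standard `urllib.robotparser`. The robots.txt rules and URLs are made-up examples. Note that robots.txt only answers "may this crawler fetch this URL?" and carries no page content of its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/llms.txt"))
```

With these example rules the first URL is disallowed and the second is allowed. Python's parser follows the basic exclusion rules and may not match every detail of how a particular crawler interprets robots.txt.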
Will Google View Llms.txt As Duplicate Content?
Someone on Bluesky asked whether Google could see llms.txt as duplicate content, which is a good question. Someone outside the website might link to the llms.txt file, and Google might then begin surfacing that content instead of, or in addition to, the HTML content.
This is the question that was asked:
“Will Google view LLMs.txt files as duplicate content? It seems stiff necked to do so, given that they know that it isn’t, and what it’s really for.
Should I add a “noindex” header for llms.txt for Googlebot?”
Google’s John Mueller answered:
“It would only be duplicate content if the content were the same as a HTML page, which wouldn’t make sense (assuming the file itself were useful).
That said, using noindex for it could make sense, as sites might link to it and it could otherwise become indexed, which would be weird for users.”
Noindex For Llms.txt
Using a noindex header for llms.txt is a good idea because it will prevent the content from entering Google’s index. Blocking Google with robots.txt isn’t necessary and would be counterproductive: that would only stop Google from crawling the file, which would prevent it from seeing the noindex.
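One way to send a noindex header for a non-HTML file is the `X-Robots-Tag` HTTP response header. The sketch below uses Python's built-in `http.server` purely for illustration. The file location, the handler and helper names, and the port are assumptions, and a production site would more likely set this header in its web server or CDN configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical location of the file on disk; adjust to your own layout.
LLMS_TXT_PATH = Path("llms.txt")


def extra_headers_for(path: str) -> dict[str, str]:
    """Return extra response headers for a request path.

    Only /llms.txt gets "X-Robots-Tag: noindex"; other paths get nothing extra.
    """
    clean_path = path.split("?", 1)[0]  # ignore any query string
    if clean_path == "/llms.txt":
        return {"X-Robots-Tag": "noindex"}
    return {}


class LlmsTxtHandler(BaseHTTPRequestHandler):
    """Serves only /llms.txt, with the noindex header attached."""

    def do_GET(self) -> None:
        clean_path = self.path.split("?", 1)[0]
        if clean_path != "/llms.txt" or not LLMS_TXT_PATH.is_file():
            self.send_error(404)
            return
        body = LLMS_TXT_PATH.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra_headers_for(self.path).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the demo server until interrupted."""
    HTTPServer((host, port), LlmsTxtHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/llms.txt` should return the file with `X-Robots-Tag: noindex` in the response headers. For the header to have any effect, the file must remain crawlable, so robots.txt should not disallow `/llms.txt`.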
Featured Image by Shutterstock/Krakenimages.com