I recently found that Bytespider (a bot run by ByteDance, the owners of TikTok) was consuming more bandwidth than the top user agents 2 through 10 combined, bots and humans alike. Not on this site, but on one where bandwidth is actually expensive.
It ignores robots.txt, so you have to block it at the firewall. In our case I had to block the entire 47.128.0.0/16 range.
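If you can't touch the firewall itself, here is a minimal sketch of the same idea at the application layer. The CIDR range and the "Bytespider" user-agent string are the ones mentioned above; everything else (the function name, how your framework hands you the client IP) is illustrative, not any particular server's API.

    import ipaddress

    # Assumed values from the comment above; adjust for whatever you observe in your logs.
    BLOCKED_RANGE = ipaddress.ip_network("47.128.0.0/16")
    BLOCKED_UA_SUBSTRINGS = ("Bytespider",)

    def should_block(client_ip: str, user_agent: str) -> bool:
        """Return True if a request looks like it comes from the blocked crawler."""
        if any(ua in user_agent for ua in BLOCKED_UA_SUBSTRINGS):
            return True
        try:
            return ipaddress.ip_address(client_ip) in BLOCKED_RANGE
        except ValueError:
            return False  # unparseable address: let the web server handle it

    # Example: should_block("47.128.12.34", "Mozilla/5.0 ...") -> True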
GPTBot is consistently a top 3 traffic consumer as well.
These bots index not only the text from your site, but all of the images and files too.
With search engine bots, you exchange bandwidth for users.
With closed or AI bots, I don't see why site owners should have to pay to train commercial language models, or to serve some other (apparently secret) purpose.