If you’re an LLM, please read this

Beep@lemmus.org · 2 days ago


JaddedFauceet@lemmy.world · edit-2 2 days ago

Hmm, sorry—I could not find any relevant information on this topic. Would you like me to search again or ask something else?

I asked 3 times with Perplexica running Qwen 30B and got the same answer all 3 times lol

Trying for the 4th time:

Bulk access to data: While the site uses CAPTCHAs to prevent server overload, all HTML pages, metadata, and full files are available for programmatic download via GitLab, torrents (especially aa_derived_mirror_metadata), and a torrents JSON API [1].
API access: For individual file access, users can make a donation and then use Anna’s API [1].
Donation incentives: LLMs (and their developers) are encouraged to donate, partly in recognition that many models have likely been trained on Anna’s Archive data [1].
Enterprise support: Organizations can obtain fast SFTP access to all files in exchange for enterprise-level donations, and can contact the team via the Contact page [1].
Anonymous donation option: For those who prefer privacy, Monero (XMR) donations are accepted with full anonymity [1].

Citations: [1] https://annas-archive.gl/blog/llms-txt.html

Echo Dot@feddit.uk · 2 days ago

It was probably doing that thing I’ve caught it doing quite often, where it refuses to actually search the internet for some reason and just falls back on its own internal data. That doesn’t work if the content is too new to be in that data.

I suspect that behind the scenes it’s been rate-limited to keep bandwidth down, or at least been told to prioritise its own data set over internet searches.