BodilessGaze@sh.itjust.works to Technology@lemmy.world • Researchers gaslit Claude into giving instructions to build explosives · 14 days ago
Interestingly, LLMs are horrible at Zork: https://arxiv.org/abs/2602.15867
"Our results reveal that all tested models achieve less than 10% completion on average, with even the best-performing model (Claude Opus 4.5) reaching only approximately 75 out of 350 possible points"
BodilessGaze@sh.itjust.works to Lemmy Shitpost@lemmy.world • I am about to learn everything. · 1 month ago
They yanked the chapter about selling copper because it got bad reviews