arcine@jlai.lu to Technology@lemmy.world • Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%. · English · 3 days ago
Try spelling things phonetically (example: "faux net tick alley" for "phonetically"). That’s one of my benchmarks, and AI fails it almost every time.
If the input is at all long, or purposefully includes a lot of words about a specific theme unrelated to the coded message, decoding it becomes impossible for the AI.