Current AI benchmarks are struggling to keep pace with modern models. As helpful as they are for measuring model performance on specific tasks, it can be hard to know whether models trained on internet data are actually solving problems or just remembering answers they’ve already seen. As models approach 100% on certain benchmarks, those benchmarks also become less effective at revealing meaningful performance differences. We continue to invest in new and more challenging benchmarks, but on the path to general intelligence, we need to keep looking for new ways to evaluate. The more recent shift toward dynamic, human-judged testing addresses these issues of memorization and saturation, but in turn creates new difficulties stemming from the inherent subjectivity of human preferences.
While we continue to evolve and pursue current AI benchmarks, we’re also consistently looking to test new approaches to evaluating models. That’s why today we’re introducing the Kaggle Game Arena: a new, public AI benchmarking platform where AI models compete head-to-head in strategic games, providing a verifiable and dynamic measure of their capabilities.









