Current AI benchmarks are struggling to keep pace with modern models. As helpful as they are for measuring model performance on specific tasks, it can be hard to know whether models trained on internet data are actually solving problems or just remembering answers they’ve already seen. As models get closer to 100% on certain benchmarks, those benchmarks also become less effective at revealing meaningful performance differences. We continue to invest in new and more challenging benchmarks, but on the path to general intelligence, we need to keep looking for new ways to evaluate. The more recent shift towards dynamic, human-judged testing addresses these issues of memorization and saturation, but in turn creates new difficulties stemming from the inherent subjectivity of human preferences.
While we continue to evolve and pursue current AI benchmarks, we’re also consistently looking to test new approaches to evaluating models. That’s why today we’re introducing the Kaggle Game Arena: a new, public AI benchmarking platform where AI models compete head-to-head in strategic games, providing a verifiable and dynamic measure of their capabilities.