
Current AI benchmarks are struggling to keep pace with modern models. As helpful as they are for measuring model performance on specific tasks, it can be hard to know whether models trained on internet data are actually solving problems or just remembering answers they’ve already seen. As models approach 100% on certain benchmarks, those benchmarks also become less effective at revealing meaningful performance differences. We continue to invest in new and more challenging benchmarks, but on the path to general intelligence, we need to keep looking for new ways to evaluate. The more recent shift towards dynamic, human-judged testing solves these issues of memorization and saturation, but in turn creates new difficulties stemming from the inherent subjectivity of human preferences.
While we continue to evolve and pursue current AI benchmarks, we’re also consistently looking to test new approaches to evaluating models. That’s why today, we’re introducing the Kaggle Game Arena: a new, public AI benchmarking platform where AI models compete head-to-head in strategic games, providing a verifiable and dynamic measure of their capabilities.