New secret math benchmark stumps AI models and PhDs alike
FrontierMath's difficult questions remain unpublished so that AI companies can't train against it.
https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social
Ars Technica (arstechnica@mastodon.social)'s status on Wednesday, 13-Nov-2024 07:51:39 JST