
MIT beats ARC-AGI-PUB benchmark record by 20 points with new LLM



A team from MIT has made significant progress in abstract task solving using an 8-billion-parameter large language model (LLM) and a technique called test-time training (TTT). With a score of 61.9% on the ARC-AGI-PUB benchmark, they clearly outperformed the previous record of 42%. The result is remarkable because ARC-AGI-PUB is a challenging collection of tasks that demand visual pattern recognition and complex reasoning.

What is test-time training (TTT)? TTT is a technique that lets a model make small, temporary weight updates at test time so it can adapt to a new task. Using the demonstration examples that accompany the task as a training signal, the model is briefly optimized for that specific task before it predicts, and the updates are discarded afterwards. This differs from the traditional approach, in which the model is trained entirely in advance and its weights stay frozen during inference. TTT yields particularly strong gains on complex, novel tasks.
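The following is a minimal sketch of that TTT loop, not the MIT team's actual implementation: for each task, the model is copied, briefly fine-tuned on the task's own demonstration pairs, and then queried. The model, loss, learning rate, and step count are illustrative placeholders.

```python
# Minimal test-time training sketch (assumptions: a PyTorch model, a task
# given as (input, output) demonstration pairs, and one query input).
import copy
import torch
import torch.nn as nn

def test_time_train(model: nn.Module,
                    demos: list[tuple[torch.Tensor, torch.Tensor]],
                    query: torch.Tensor,
                    steps: int = 8,
                    lr: float = 1e-4) -> torch.Tensor:
    # Adapt a temporary copy so the base weights stay untouched and the
    # task-specific updates are discarded after this one prediction.
    adapted = copy.deepcopy(model)
    adapted.train()
    optimizer = torch.optim.AdamW(adapted.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    # A few gradient steps on the task's own demonstration pairs.
    for _ in range(steps):
        optimizer.zero_grad()
        loss = sum(loss_fn(adapted(x), y) for x, y in demos)
        loss.backward()
        optimizer.step()

    # Predict on the query with the temporarily adapted weights.
    adapted.eval()
    with torch.no_grad():
        return adapted(query)

# Toy usage: a tiny regression model standing in for the LLM.
model = nn.Linear(4, 4)
demos = [(torch.randn(4), torch.randn(4)) for _ in range(3)]
prediction = test_time_train(model, demos, query=torch.randn(4))
```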

How did the breakthrough at MIT work? The team found that a combination of initial fine-tuning and special transformations applied to the test data was crucial to success. These transformations (such as rotations and flips of the task grids) improved the model's ability to recognize patterns. In addition, the model was ensembled with existing program-synthesis approaches, lifting the score to 61.9%, a performance comparable to that of an average human.
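As an illustration of the transformation idea, the sketch below applies a few invertible grid transforms to a test input, runs a predictor on each variant, maps the predictions back, and takes a majority vote. The transform set, the `predict` stand-in, and the voting scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative augmentation-and-vote sketch for ARC-style grids.
# `predict` is a hypothetical placeholder for the adapted model.
from collections import Counter
import numpy as np

TRANSFORMS = [
    (lambda g: g, lambda g: g),                              # identity
    (np.fliplr, np.fliplr),                                  # horizontal flip
    (np.flipud, np.flipud),                                  # vertical flip
    (lambda g: np.rot90(g, 1), lambda g: np.rot90(g, -1)),   # 90-degree rotation
]

def predict(grid: np.ndarray) -> np.ndarray:
    # Placeholder: the real system would run the test-time-trained LLM here.
    return grid

def ensembled_predict(grid: np.ndarray) -> np.ndarray:
    candidates = []
    for forward, inverse in TRANSFORMS:
        # Predict on the transformed grid, then map back to the original frame.
        candidates.append(inverse(predict(forward(grid))))
    # Majority vote over the candidate output grids.
    votes = Counter(tuple(map(tuple, c)) for c in candidates)
    best, _ = votes.most_common(1)[0]
    return np.array(best)

grid = np.array([[1, 0], [0, 2]])
print(ensembled_predict(grid))
```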

What is the significance of this development? The results show that language models can solve abstract tasks even without explicit symbolic methods.

Read more in the official paper.
