The Struggle is Real: AI Fails at Telling Time
These days, artificial intelligence has become incredibly advanced, capable of generating photorealistic images, writing novels, and even predicting protein structures. However, recent research has revealed that it often struggles with a very basic task: telling time.
The Study
Researchers at the University of Edinburgh conducted an experiment testing the ability of seven well-known multimodal large language models to answer time-related questions based on different images of clocks and calendars. The study, which is forthcoming in April and currently hosted on the preprint server arXiv, demonstrates that these models have difficulty with these basic tasks.
The Importance of Time
The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems. Despite advances in multimodal large language models, most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored.
The Models Tested
The team tested several models: OpenAI’s GPT-4o and GPT-o1, Google DeepMind’s Gemini 2.0, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-11B-Vision-Instruct, Alibaba’s Qwen2-VL-7B-Instruct, and ModelBest’s MiniCPM-V-2.6. They fed these models different images of analog clocks, including faces with Roman numerals, varied dial colors, and some missing the seconds hand, as well as 10 years of calendar images.
The Tasks
For the clock images, the researchers asked the models to identify the time shown on the clock in the given image. For the calendar images, they asked simple questions such as "what day of the week is New Year’s Day?" and more challenging queries like "what is the 153rd day of the year?"
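For a conventional program, queries of this kind are trivial date arithmetic. A minimal Python sketch of both question types (the year 2025 is chosen here purely as an illustration, not taken from the study's data):

```python
from datetime import date, timedelta

def new_years_weekday(year: int) -> str:
    """Name of the weekday that New Year's Day falls on in the given year."""
    return date(year, 1, 1).strftime("%A")

def nth_day_of_year(year: int, n: int) -> date:
    """Calendar date of the n-th day of the year (1-indexed)."""
    return date(year, 1, 1) + timedelta(days=n - 1)

print(new_years_weekday(2025))     # Wednesday
print(nth_day_of_year(2025, 153))  # 2025-06-02
```

The contrast is the point of the study: a few lines of deterministic code answer these questions exactly, while the language models must instead reason over a visual rendering of a calendar.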
The Challenges
Analog clock reading and calendar comprehension involve intricate cognitive steps, requiring fine-grained visual recognition and non-trivial numerical reasoning. The researchers found that the AI systems performed poorly, reading the time on analog clocks correctly less than 25% of the time.
The Results
The models struggled with clocks bearing Roman numerals and stylized hands, as well as those lacking a seconds hand altogether. Google DeepMind’s Gemini 2.0 scored highest on the clock task, while GPT-o1 led on the calendar task with 80% accuracy. Even that best result, however, means an error on roughly one query in five.
The Conclusion
The study’s findings highlight a significant gap in the ability of AI to carry out basic skills that are easy for humans. "Most people can tell the time and use calendars from an early age," said Rohit Saxena, a co-author of the study and PhD student at the University of Edinburgh’s School of Informatics. "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation, and assistive technologies." While AI may be able to complete your homework, don’t count on it sticking to any deadlines.