
Sakana, a Japanese startup, claims a notable milestone: its AI has generated what the company describes as the first peer-reviewed scientific publication written entirely by an AI. However, there are significant caveats to consider when evaluating this accomplishment.

The debate over AI’s role in the scientific process continues to intensify. While some experts believe AI is not yet ready to serve as a “co-scientist,” others see promise, albeit at an early stage. Sakana falls into the latter camp.

The company used its AI system, The AI Scientist-v2, to generate a paper that was then submitted to a workshop at ICLR, a well-established and reputable AI conference. According to Sakana, the workshop organizers and ICLR leadership agreed to collaborate on an experiment to conduct a double-blind review of AI-generated manuscripts.

Sakana collaborated with researchers from the University of British Columbia and the University of Oxford to submit three AI-generated papers to the workshop for peer review. The AI Scientist-v2 generated these papers “end-to-end,” including the scientific hypotheses, experiments, experimental code, data analyses, visualizations, text, and titles.

Robert Lange, a research scientist and founding member at Sakana, explained that the AI generated research ideas based on the workshop abstract and description, ensuring that the submissions were on-topic and suitable.

One of the three papers, which took a critical look at training techniques for AI models, was accepted to the ICLR workshop. However, Sakana withdrew the paper before publication in the interest of transparency and respect for ICLR conventions.

A snippet of Sakana’s AI-generated paper. (Image credits: Sakana)

Lange noted that the accepted paper introduced a new method for training neural networks and highlighted remaining empirical challenges, providing an interesting data point for further scientific investigation.

However, the achievement is not as impressive as it initially seems. Sakana’s AI occasionally made “embarrassing” citation errors, and the paper did not undergo the same level of scrutiny as other peer-reviewed publications.

Because the paper was withdrawn after the initial peer review, it never received the additional “meta-review” stage, during which it could still have been rejected. Furthermore, acceptance rates for conference workshops tend to be higher than for the main conference track, a fact Sakana acknowledges in its blog post.

Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, described Sakana’s results as “a bit misleading.” He pointed out that the company selected the papers from a set of generated ones, using human judgment to pick the ones they thought might be accepted.

Mike Cook, a research fellow at King’s College London specializing in AI, questioned the rigor of the peer reviewers and the workshop. He noted that new workshops are often reviewed by more junior researchers and that the workshop’s focus on negative results and difficulties might make it easier for an AI to write convincingly.

Cook added that it is not surprising an AI can pass peer review, given its ability to write human-sounding prose. Papers written partly with AI have passed journal review before, he noted, and the ethical dilemmas this poses for the sciences are already under discussion.

The technical shortcomings of AI, such as its tendency to hallucinate, make many scientists wary of endorsing it for serious work. Experts fear that AI could generate noise in the scientific literature rather than contributing meaningful progress.

Cook emphasized the need to distinguish between passing peer review and contributing knowledge to a field. “We need to ask ourselves whether [Sakana’s] result is about how good AI is at designing and conducting experiments, or whether it’s about how good it is at selling ideas to humans — which we know AI is great at already.”

Sakana acknowledges that its AI is not capable of producing groundbreaking or novel scientific work. Instead, the goal of the experiment was to study the quality of AI-generated research and highlight the need for norms regarding AI-generated science.

The company says it is important to judge AI-generated science on its own merits to avoid bias, and that it will continue to engage with the research community to ensure AI does not evolve into a tool whose sole purpose is passing peer review, which would undermine the scientific review process.

