OpenScholar, an AI Tool for Scientific Literature Search, Outperforms ChatGPT and Other LLMs


Scientists recently designed a large language model called OpenScholar to help researchers sift through scientific publications.

Thousands of scientific papers are published every day, and it can be hard for researchers to keep up with the latest advances.

To help, Hannaneh Hajishirzi and Akari Asai, computer scientists at the University of Washington, recently developed OpenScholar, a large language model (LLM) designed specifically to help researchers sift through scientific publications.1 They found that the tool outperformed other LLMs in citation accuracy. Experts also rated OpenScholar-generated answers as more useful than those written by humans, demonstrating the tool’s functionality. Their work is now published in Nature.

Prior to their publication, the researchers had released a demo version of OpenScholar online, and “quickly, we got a lot of queries, far more than we’d expected,” said Hajishirzi, a coauthor on the paper, in a statement. “It really speaks to the need for this sort of open-source, transparent system that can synthesize research.”

Existing AI tools such as ChatGPT are getting better at synthesizing answers, “but the big question ultimately is whether we can trust that its answers are correct,” Hajishirzi said. Besides hallucinating, LLMs often give outdated answers, because they cannot incorporate information that emerged after the model was trained. They may also lack judgment about which sources to cite. “It might cite some research papers that weren’t the most relevant or cite just one paper or pull from a blog post randomly,” said Asai, a coauthor of the paper.

To overcome these limitations, the researchers trained OpenScholar on 45 million open-access scientific papers. The team also designed the AI tool so that it could incorporate new information beyond what was used during training. To do this, they employed a technique called retrieval-augmented generation.
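The general idea of retrieval-augmented generation can be illustrated with a minimal Python sketch. This is not OpenScholar's implementation: the toy corpus, the word-overlap scoring, and the names `CORPUS`, `retrieve`, and `build_prompt` are all assumptions made for illustration. The point is only that passages are looked up at question time and handed to the model as context, so documents added after training can still be used.

```python
import math
import re
from collections import Counter

# Hypothetical toy datastore standing in for a collection of paper passages.
CORPUS = {
    "paper_a": "Retrieval-augmented generation grounds language model answers in retrieved documents.",
    "paper_b": "Protein folding can be predicted from amino acid sequences with deep learning.",
    "paper_c": "Language models can hallucinate citations when answering scientific questions.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the ids of up to k passages most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(query, corpus, k=2):
    """Assemble the text a language model would receive: retrieved passages plus the question."""
    ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{context}\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    print(build_prompt("Why do language models hallucinate citations?", CORPUS))
```

Because retrieval happens when the question is asked, adding a new entry to the datastore makes it available immediately, without retraining the model. A real system would replace the word-overlap scoring with a trained retriever and pass the prompt to an actual language model.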

The researchers evaluated the tool’s performance both automatically, using metrics such as citation accuracy, and manually, by recruiting a panel of experts that consisted of graduate students and postdoctoral researchers across disciplines including computer science, physics, and biomedicine.

For the manual evaluation, 12 individuals from the panel wrote a total of 108 questions along with their answers. The researchers posed these questions to OpenScholar, alone or in combination with GPT-4o (a version of ChatGPT). Then, 16 experts (12 of whom had written the questions and answers, but did not rate their own responses) evaluated the AI-generated responses against their human-written counterparts.

The experts rated OpenScholar’s responses (both when used alone and in tandem with GPT-4o) as more useful than human answers over 50 percent of the time. They mostly attributed this rating to OpenScholar’s answers being more comprehensive. This was partially due to length, as OpenScholar-generated responses were typically about twice as long as those written by human experts. Furthermore, the automatic assessment revealed that OpenScholar outperformed other models, such as GPT-4o and Perplexity, in citation accuracy.

The researchers are currently developing a new model, called Deep Research Tulu, which they designed to produce even more comprehensive responses than OpenScholar.2

“Existing AI systems weren’t designed for scientists’ specific needs,” Asai said. “We’ve already seen a lot of scientists using OpenScholar because it’s open source. Others are building on this research and already improving on our results.”


