It is said that Thomas Young, an English scientist and doctor who lived at the turn of the 19th century, was the last person who knew everything there was to know in the world. With a sharp mind and access to all the scientific literature of the time, Young made lasting contributions to optics, materials science, medicine, and Egyptology, to name just a few fields.

Since Young’s time, however, science has advanced with staggering speed. It is estimated that our scientific knowledge doubles every nine years. Nowadays, a scientist can spend her entire career siloed in a subsection of a single field, exploring, say, the pathway of a single biochemical reaction in the membrane of stomach cells. The deluge of data makes it a full-time job for scientists just to keep up with their own domain, let alone reach out and learn something potentially useful from another field.
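To get a feel for what a nine-year doubling time implies, here is a back-of-the-envelope sketch. It simply assumes steady exponential growth; the function names and the 40-year career length are illustrative assumptions, not figures from the estimate itself.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Yearly growth rate implied by a given doubling time (steady growth assumed)."""
    return 2 ** (1 / doubling_years) - 1


def growth_factor(elapsed_years: float, doubling_years: float) -> float:
    """How many times larger the body of knowledge is after elapsed_years."""
    return 2 ** (elapsed_years / doubling_years)


# Doubling every nine years works out to roughly 8% growth per year.
rate = annual_growth_rate(9)

# Over a hypothetical 40-year career, the literature would grow about 22-fold.
career = growth_factor(40, 9)

print(f"Annual growth: {rate:.1%}")
print(f"Growth over a 40-year career: about {career:.0f}x")
```

Under these assumptions, a scientist retiring today would face more than twenty times as much literature as existed when she started.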

“The term ‘second opinion’ in the medical field speaks exactly to the fact that even domain experts cannot possibly keep up with the amount of research published in any one domain,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence (AI2). “Two doctors in the same field end up having different views on symptoms or treatment of diseases.”
Etzioni heads a leading artificial intelligence (AI) research hub, and he and his team aim to help advance science and medicine using the power of machine learning. There might be too much information out there for anyone in the modern world to be as well-read as Thomas Young, but Etzioni thinks AI can distill and compile information so that scientists and doctors can find new insights across disciplines.
To that end, one of the Institute’s major projects is an AI-powered scientific literature search engine called Semantic Scholar. The search engine was previously restricted to computer science and brain science articles; last week the team added biomedical literature, taking the total number of papers from about 12 million to over 40 million.
Wading through the swamp of scientific articles is grueling work. Scientists themselves have been known to miss previous discoveries, delaying progress on things like anticancer treatments for years at a time.
Etzioni hopes to use AI to solve this problem for the common good. “So much effort has gone into product search, video search, and other popular arenas—it’s time to invest in scientific search,” he said. “Particularly in the field of biomedicine, where better research can directly translate into improved medicine and saved lives!”

Searching through scientific literature is a chore that Marie Hagaman knows all too well. She is a software engineer who joined the Semantic Scholar team and led the charge to add biomedical literature to the database, in part because of a frustrating experience trying to find a treatment for a painful ulcer.
“In 2002, I started having severe stomach pain and heartburn with everything I ate or drank–even a glass of water was enough to have me doubled over in pain,” said Hagaman.
“The doctor prescribed a drug that reduces stomach acid and said I would need to be on it for the rest of my life. While the medication treated my symptoms, the cause could not be explained. I saw a second doctor and heard the same story. At my wits’ end, I started searching online for medical research papers about stomach ulcers. Not being an expert in the field, it was hard to know where to start, what search terms to use, and to tell what was important.”
Eventually, she learned about Helicobacter pylori, a bacterium that has been identified as the cause of most stomach ulcers. Few believed Barry Marshall, the doctor who discovered this with his colleague Robin Warren in the 1980s, so he drank a beaker of H. pylori culture to show that the infection would make him sick.
Armed with this knowledge, Hagaman went to a third doctor who gave her a course of antibiotics to flush out the invading bacteria. She hasn’t had any troubles since.
Hagaman hopes that the newly upgraded Semantic Scholar can make this kind of digging easier for scientists, doctors, and the general public.

“With this update to Semantic Scholar, we’ve made it possible for anyone to learn about the latest biomedical research on a topic they’re unfamiliar with,” Hagaman said. “For experts in a particular field who are stepping outside their comfort zone or those in the general public who want to learn about their own conditions, we’ve added topic pages as a starting point and summary. We also include suggested terms to help augment or refine your search if you’re unfamiliar with the domain terminology. For example, if you search for ‘stomach ulcer’ we show a topic page for ‘gastric ulcer’, the synonymous medical term, featuring several review papers explaining the link with H. pylori. I wish I had this years ago when I was exploring this topic!”
Machine learning starts with machine teaching: the Semantic Scholar team feeds the software scientific papers in which people have labeled attributes of interest, such as how influential each citation is within a given paper. From these examples, the software can extract surprisingly insightful information.
For example, “Semantic Scholar has learned that when a citation appears in a list of citations, then it is less influential than when it appears alone,” said Etzioni. The AI also determined that a citation is even more important if it appears in a paper’s abstract.
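To make the idea concrete, here is a toy sketch of how the two cues Etzioni describes might feed into a score. It is not Semantic Scholar’s actual model: the `CitationMention` structure and the weights are made up for illustration, whereas a real system would learn such weights from labeled papers.

```python
from dataclasses import dataclass


@dataclass
class CitationMention:
    """One place where a paper cites another work (hypothetical structure)."""
    group_size: int    # citations sharing the same bracket, e.g. [3, 4, 5] -> 3
    in_abstract: bool  # True if the mention occurs in the citing paper's abstract


def influence_score(mention: CitationMention) -> float:
    """Toy score: higher means the cited work looks more influential.

    The weights here are hand-picked assumptions, not learned values.
    """
    # A lone citation scores 1.0; one buried in a list of four scores 0.25.
    score = 1.0 / mention.group_size
    # Appearing in the abstract is treated as an additional strong signal.
    if mention.in_abstract:
        score += 1.0
    return score


alone = CitationMention(group_size=1, in_abstract=False)
in_list = CitationMention(group_size=4, in_abstract=False)
in_abstract = CitationMention(group_size=1, in_abstract=True)

# Rank the mentions from most to least influential under the toy score.
ranked = sorted([in_list, in_abstract, alone], key=influence_score, reverse=True)
```

With these made-up weights, the abstract citation ranks first and the citation buried in a list ranks last, mirroring the pattern the software is said to have learned.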
This is just one of the many features that make Semantic Scholar a more powerful search engine than, say, Google Scholar or Web of Science. Instead of just ordering papers based on citations, Semantic Scholar actually provides context.
But isn’t relinquishing control over context a dangerous thing? What you get out of a machine learning program is only as good as what you put in. Put garbage in, get garbage out. One is reminded of Tay, Microsoft’s ill-fated AI chatbot. It learned from talking to people on Twitter and, perhaps unsurprisingly, became vehemently racist in less than a day.
I asked Etzioni if Semantic Scholar could, wittingly or unwittingly, develop its own biases that skew search results.
“Our AI tool can definitely make mistakes,” agreed Etzioni. He stressed that transparency is key to identifying and fixing biases: “Whenever possible, we offer the users justification so they can make their own assessment.”

For a long time, I have been a big fan of Douglas Hofstadter and the idea of emergent consciousness: that consciousness arises organically from sufficiently complex systems, much as an ant colony is smarter than any individual ant. I was curious whether Semantic Scholar could reach the point where it discovers new science on its own.
“We’ve already heard AI compose music. I think it’s a realistic possibility to have Semantic Scholar author papers some day,” said Hagaman.
Etzioni, though he mentioned that Hofstadter was the reason he became an AI researcher, was careful not to overhype Semantic Scholar’s capabilities. “Our long-term vision is that it becomes a helpful discovery tool that suggests hypotheses to scientists and doctors, but even then it won’t have common sense, consciousness, and other key human attributes,” he said. “The AIs we have now are narrow and not smart at all in the general sense.”
I couldn’t help myself: I had to ask Semantic Scholar itself what it thought. The top result for the search term “emergent consciousness”? A science fiction piece set in the year 2116, in which a conscious AI named SAMI 9000 lets physicists monitor his system as he drifts in and out of consciousness.
It is as yet unclear whether Semantic Scholar would be as willing to cooperate with its creators. Perhaps the next person to know everything won’t be a person at all.
Banner image credit: Pixabay.