[00:00] Announcer: From Neural Newscast, this is Operational Drift, a study in how and why intelligent systems lose alignment.
[00:12] Victoria Quinn: On September 5, 2017, a STAT investigation described hospitals telling patients
[00:19] Victoria Quinn: Watson for Oncology was like a worldwide consultation,
[00:23] Victoria Quinn: even though the system's treatment recommendations were trained by a couple dozen physicians at a single United States hospital.
[00:29] Victoria Quinn: That one sentence changes what the product is.
[00:34] Victoria Quinn: This show investigates how AI systems quietly drift away from intent, oversight, and control.
[00:40] Victoria Quinn: And what happens when no one is clearly responsible for stopping it?
[00:45] Victoria Quinn: I'm Victoria Quinn.
[00:46] Thomas Crane: I'm Thomas Crane.
[00:48] Victoria Quinn: This is Operational Drift.
[00:51] Victoria Quinn: I have been trying to figure out when decision support becomes something else.
[00:56] Victoria Quinn: Not officially, not in a press release, but in the day-to-day way a hospital understands
[01:01] Victoria Quinn: a machine's authority.
[01:02] Victoria Quinn: Because in this reporting, you see doctors describing Watson for oncology as if it is
[01:07] Victoria Quinn: pulling from the world's oncology expertise.
[01:10] Victoria Quinn: And you start wondering, what did they think they bought?
[01:13] Victoria Quinn: STAT says it examined Watson for oncology's use, marketing, and performance in hospitals
[01:20] Victoria Quinn: across the world, from South Korea to Slovakia to South Florida, interviewing dozens of doctors,
[01:26] Victoria Quinn: IBM executives, artificial intelligence experts, and others familiar with the system's
[01:31] Victoria Quinn: underlying technology and rollout.
[01:34] Thomas Crane: That is the signal.
[01:35] Thomas Crane: Hospitals describe it as global.
[01:37] Thomas Crane: But the reporting says the recommendation logic is trained by a small, specific group.
[01:42] Thomas Crane: If that is true, then the product is not world knowledge.
[01:45] Thomas Crane: It is a portable institutional preference.
[01:48] Thomas Crane: So the first question is simple.
[01:50] Thomas Crane: What exactly is Watson for Oncology doing when someone clicks the button?
[01:54] Victoria Quinn: Here is what I cannot get past.
[01:57] Victoria Quinn: The mechanics are described pretty plainly.
[02:00] Victoria Quinn: Watson for Oncology uses a cloud-based supercomputer to digest massive amounts of data, from doctors'
[02:06] Victoria Quinn: notes to medical studies to clinical guidelines.
[02:10] Victoria Quinn: But its treatment recommendations are not based on its own insights from those data.
[02:15] Victoria Quinn: They are based exclusively on training by human overseers who feed Watson information about
[02:21] Victoria Quinn: how patients with specific characteristics should be treated.
[02:25] Victoria Quinn: And in this case, those overseers were described as a couple dozen physicians at Memorial Sloan Kettering Cancer Center in New York.
[02:33] Victoria Quinn: They are empowered to input their own recommendations into Watson, even when the evidence supporting those recommendations is thin.
[02:43] Victoria Quinn: So when IBM marketed Watson as a system that could sift through reams of data and generate new insights,
[02:49] Victoria Quinn: the reporting says that was an overreach.
[02:52] Victoria Quinn: It does not create new knowledge.
[02:54] Victoria Quinn: It is artificially intelligent,
[02:56] Victoria Quinn: only in a rudimentary sense.
[02:59] Victoria Quinn: And then there is the perception problem.
[03:02] Victoria Quinn: STAT describes a visit to Jupiter Medical Center in Florida,
[03:06] Victoria Quinn: where nurse Jean Thompson spent about 90 minutes a week feeding data into the machine.
[03:11] Victoria Quinn: She clicks Ask Watson.
[03:13] Victoria Quinn: And in one example, it recommended a chemotherapy regimen the oncologist had already flagged.
[03:19] Victoria Quinn: The doctor there, Sujal Shah, said the background information Watson provided, medical journal articles, was helpful.
[03:27] Victoria Quinn: It gave him more confidence, but it did not directly help him make the decision and did not tell him anything he did not already know.
[03:35] Victoria Quinn: So the value proposition quietly shifts from breakthrough discovery to a very expensive research assistant.
[03:42] Thomas Crane: But IBM is selling it into clinical care.
[03:45] Thomas Crane: Hospitals pay per patient, between $200 and $1,000, according to Deborah DiSanzo.
[03:51] Thomas Crane: If the recommendation is basically the training data from Memorial Sloan Kettering and the
[03:56] Thomas Crane: interface also searches published literature to provide studies, then the system's authority
[04:01] Thomas Crane: comes from the branding of intelligence, not from a demonstrated ability to improve outcomes.
[04:07] Thomas Crane: Someone made the decision to ship it broadly anyway.
[04:10] Thomas Crane: The question is, who is accountable for proving it helps before it becomes normal clinical workflow?
[04:17] Victoria Quinn: This is where the story starts to feel like operational drift instead of product disappointment.
[04:22] Victoria Quinn: The reporting says there is not one independent third-party study examining
[04:28] Victoria Quinn: whether Watson for Oncology can deliver, and that IBM has not exposed the product to critical
[04:33] Victoria Quinn: review by outside scientists or conducted clinical trials to assess its effectiveness.
[04:39] Victoria Quinn: And then there is the regulatory condition underneath it.
[04:43] Victoria Quinn: Yoon Sup Choi, described as a South Korean venture capitalist and researcher who wrote
[04:48] Victoria Quinn: a book about artificial intelligence in healthcare,
[04:51] Victoria Quinn: said IBM is not required by regulatory agencies to do a clinical trial in South Korea or America before selling the system to hospitals.
[05:00] Victoria Quinn: So the product can be deployed, and the question of safety and efficacy becomes, in effect,
[05:06] Victoria Quinn: optional.
[05:07] Victoria Quinn: Pilar Ossorio, a professor of law and bioethics at the University of Wisconsin Law School, is quoted saying
[05:14] Victoria Quinn: Watson should be subject to tighter regulation because of its role in treating patients,
[05:19] Victoria Quinn: and that ethically and scientifically, you should have to prove safety and efficacy before you can just go do this.
[05:26] Victoria Quinn: Dr. Andrew Norden, described as a former IBM deputy health chief who left the company in early August,
[05:33] Victoria Quinn: dismissed the suggestion that IBM should have been required to conduct a clinical trial before commercializing Watson,
[05:39] Victoria Quinn: comparing it to the lack of a randomized trial of parachutes for paratroopers.
[05:44] Victoria Quinn: When I read that, I stopped. Because that is not a fight about statistics. It is a fight about
[05:51] Victoria Quinn: what kind of product this is supposed to be. Is it a medical intervention, or is it just bringing
[05:56] Victoria Quinn: the best information to bear? Which sounds like something that does not need proof.
[06:02] Thomas Crane: And it also shifts liability.
[06:04] Thomas Crane: If IBM says it is not a medical device in the sense that demands trials, then responsibility
[06:09] Thomas Crane: moves to the hospital and the physician.
[06:12] Thomas Crane: But if the hospital believes it is a worldwide consultation, or global,
[06:17] Thomas Crane: then the physician is being asked to rely on a machine whose actual epistemic source
[06:22] Thomas Crane: is a narrow group of trainers.
[06:24] Thomas Crane: That explanation does not account for the confusion described in the reporting.
[06:28] Thomas Crane: Confusion is not a side effect.
[06:30] Thomas Crane: It is a governance failure.
[06:32] Victoria Quinn: The reporting gives us very specific examples of that confusion.
[06:36] Victoria Quinn: At Jupiter Medical Center, the medical director of thoracic oncology, K. Adam Lee,
[06:41] Victoria Quinn: described it as like another consultation, but it's a worldwide consultation.
[06:47] Victoria Quinn: And an oncology nurse, Carrie Ward,
[06:49] Victoria Quinn: talked about it pulling from 300 journals and then noted Sloan Kettering is feeding the clinical information.
[06:57] Victoria Quinn: Robert Garrett, the chief executive officer of Hackensack Meridian Health, which uses a version of Watson for Oncology,
[07:04] Victoria Quinn: described the information in Watson as global, saying that, as he understood it, it included how colon cancer is treated around the world by different clinicians and what has been most effective.
[07:16] Victoria Quinn: And then the reporting says, none of that accurately depicts how Watson for oncology works.
[07:21] Victoria Quinn: It says the recommendation itself is derived from the training provided by Memorial Sloan Kettering doctors, not from the outside literature.
[07:30] Victoria Quinn: The literature appears as support, but it is not what generates the recommendation.
[07:36] Victoria Quinn: Memorial Sloan Kettering doctors acknowledge their influence.
[07:40] Victoria Quinn: Dr. Andrew Seidman, described as one of the hospital's lead trainers, is quoted,
[07:45] Victoria Quinn: We are not at all hesitant about inserting our bias, calling it
[07:49] Victoria Quinn: a very unapologetic bias.
[07:52] Victoria Quinn: He says they keep training grounded in clinical evidence when it exists, but they give recommendations when it does not.
[07:59] Victoria Quinn: STAT also describes a training session where Memorial Sloan Kettering doctors and IBM engineers debated how to categorize options as green, orange, or red
[08:11] Victoria Quinn: in scenarios where evidence is weak.
[08:14] Victoria Quinn: Dr. Marisa Kohlmeyer is quoted saying,
[08:17] Victoria Quinn: this is the hard part of this whole game.
[08:19] Victoria Quinn: There's a lack of evidence
[08:20] Victoria Quinn: and you don't know if something should be in green without evidence.
[08:24] Victoria Quinn: And then they had to press ahead anyway
[08:27] Victoria Quinn: because the system requires them to press ahead.
[08:29] Victoria Quinn: It does not learn from patient outcomes
[08:32] Victoria Quinn: whether people lived or died or survived longer.
[08:35] Victoria Quinn: It learns the preferences of the physicians training it.
[08:38] Thomas Crane: So the drift is not that the system became biased.
[08:42] Thomas Crane: The drift is that bias became a product feature.
[08:45] Thomas Crane: But marketing and adoption language treated it like neutral global intelligence.
[08:50] Thomas Crane: And then you get
[08:51] Thomas Crane: international friction.
[08:53] Thomas Crane: Researchers in Denmark and the Netherlands
[08:55] Thomas Crane: told STAT hospitals
[08:57] Thomas Crane: had not signed on
[08:59] Thomas Crane: because it was too focused
[09:00] Thomas Crane: on the preferences of a few American doctors.
[09:03] Thomas Crane: In Denmark, an unpublished study
[09:05] Thomas Crane: found about a 33% rate of agreement,
[09:07] Thomas Crane: and the hospital decided not to buy the system.
[09:10] Thomas Crane: Those are not abstract complaints.
[09:12] Thomas Crane: That is a measurable mismatch between what the system says and what local doctors do.
[09:17] Victoria Quinn: The research base around Watson for Oncology as described here is also narrow in a particular way.
[09:23] Victoria Quinn: The reporting says the only studies about Watson for Oncology are conference abstracts.
[09:28] Victoria Quinn: Full results had not been published in peer-reviewed journals.
[09:31] Victoria Quinn: And every study, save one, was either conducted by a paying customer or included IBM staff on the author list.
[09:40] Victoria Quinn: or both. Most of those studies are concordance studies. Watson gives recommendations and researchers
[09:47] Victoria Quinn: compare them to what oncologists recommend. But the reporting notes the limit. Showing Watson
[09:53] Victoria Quinn: agrees with doctors proves only that it is competent in applying existing methods of care,
[09:59] Victoria Quinn: not that it improves them. And then there
[10:02] Victoria Quinn: There is the deployment condition again.
[10:04] Victoria Quinn: Once hospitals are using it, Choi suggests, a clinical trial becomes too risky
[10:09] Victoria Quinn: because marginal benefit would be bad news for IBM.
[10:12] Victoria Quinn: So you end up in this loop where adoption makes rigorous evaluation less likely
[10:18] Victoria Quinn: and lack of rigorous evaluation does not stop adoption.
[10:21] Victoria Quinn: There is another kind of drift here,
[10:24] Victoria Quinn: the interpretability failure that shows up as clinical confusion.
[10:28] Victoria Quinn: In South Korea, Dr. Taewoo Kang said Watson sometimes recommends a chemotherapy drug called a taxane for a patient whose cancer has not spread to the lymph nodes, even though he said that therapy is normally used only if it has spread.
[10:43] Victoria Quinn: And then Watson will show a study supporting the taxane, but the study is about patients whose cancer did spread to their lymph nodes.
[10:50] Victoria Quinn: Kang is left confused about why Watson recommended it.
[10:54] Victoria Quinn: And Watson cannot tell him why.
[10:56] Victoria Quinn: That is not just a user experience problem.
[10:59] Victoria Quinn: In medicine, why is part of safety.
[11:03] Thomas Crane: And the accountability gap widens when you add localization.
[11:07] Thomas Crane: IBM said the system can be customized for variations in treatment practices,
[11:11] Thomas Crane: drug availability, and financial considerations.
[11:14] Thomas Crane: But one manager in Thailand,
[11:16] Thomas Crane: Nan Chen, described doctors saying, essentially, they already know their own treatments and
[11:23] Thomas Crane: do not need to teach Watson so it can tell them what they just taught it.
[11:27] Thomas Crane: Yet Chen also described a hospital in the capital of Mongolia, UB Songdo Hospital,
[11:33] Thomas Crane: with zero oncology specialists following Watson's suggestions nearly 100% of the time.
[11:39] Thomas Crane: So the same product can be redundant in one place and functionally decisive in another.
[11:45] Thomas Crane: If that is the case, then who owns the risk where the tool becomes the specialist?
[11:50] Victoria Quinn: I keep coming back to that because IBM's stated goal here is to democratize medical knowledge so every patient can access the best care.
[11:58] Victoria Quinn: And you can see why hospitals would want that.
[12:01] Victoria Quinn: Some doctors described concrete benefits, saving time searching for studies, educating patients,
[12:07] Victoria Quinn: undercutting hierarchies.
[12:10] Victoria Quinn: But the system's recommendations are described as trained preferences
[12:14] Victoria Quinn: from Memorial Sloan Kettering,
[12:16] Victoria Quinn: not outcome-learned patterns from global patient data.
[12:20] Victoria Quinn: And the reporting says IBM has not published scientific papers
[12:23] Victoria Quinn: demonstrating how the technology affects physicians and patients.
[11:26] Victoria Quinn: So when Watson is placed in a room with five doctors and a screen, as described at Gil Medical Center, it is not just presenting information.
[12:36] Victoria Quinn: It is reshaping the human process of decision-making.
[12:39] Victoria Quinn: And yet, the evidentiary burden for whether it improves care is not clearly assigned to anyone.
[12:46] Thomas Crane: The unresolved question is not whether Watson for oncology is useful sometimes.
[12:51] Thomas Crane: The unresolved question is whether a system can be deployed into treatment decisions,
[12:55] Thomas Crane: while independent evaluation remains optional,
[12:58] Thomas Crane: and whether anyone can be made responsible for that choice.
[13:02] Thomas Crane: Operational drift is not the moment a system makes a bad recommendation.
[13:07] Thomas Crane: It is the moment the system's authority expands
[13:10] Thomas Crane: while the obligation to prove safety and efficacy dissolves into marketing language, conference
[13:15] Thomas Crane: abstracts, and local customization.
[13:24] Thomas Crane: Sources and our transparency policy are at operationaldrift.neuralnewscast.com.
[13:27] Thomas Crane: This episode is based on referenced source material and is not medical advice.
[13:29] Thomas Crane: Do not make medical decisions based on this program.
[13:32] Thomas Crane: Neural Newscast is AI-assisted, human-reviewed.
[13:36] Thomas Crane: View our AI transparency policy at neuralnewscast.com.
[13:40] Announcer: This has been Operational Drift on Neural Newscast.
[13:43] Announcer: examining how and why intelligent systems lose alignment.
[13:47] Announcer: Neural Newscast uses artificial intelligence in content creation,
[13:51] Announcer: with human editorial review prior to publication.
[13:54] Announcer: While we strive for factual, unbiased reporting,
[13:57] Announcer: AI-assisted content may occasionally contain errors.
[14:01] Announcer: Verify critical information with trusted sources.
[14:04] Announcer: Learn more at neuralnewscast.com.