In 2016, a team at DeepMind was testing an app called Streams, designed to alert clinicians to acute kidney injury in hospital patients earlier than standard monitoring would. In trials, it performed better than existing early warning systems. It was subsequently deployed across several NHS trusts.
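The project was later investigated by the Information Commissioner’s Office, with advice from the National Data Guardian, after it emerged that data on about 1.6 million patients had been transferred to DeepMind without patients’ knowledge or explicit consent. In 2017 the ICO ruled that the NHS trust that supplied the data, the Royal Free London NHS Foundation Trust, had failed to comply with the Data Protection Act.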
The Streams story is a reasonable encapsulation of where AI in healthcare currently sits: genuine potential, real implementation problems, and ethical tensions that don’t resolve neatly.
What AI can actually do in medicine
The areas where AI performs well are those involving pattern recognition in large datasets. Radiology is the most developed: algorithms trained on hundreds of thousands of images can detect certain cancers, diabetic retinopathy, and bone fractures from scans with accuracy that matches, or in some cases exceeds, that of specialists. In dermatology, a study in Nature (2017) found that a convolutional neural network classified skin lesions as accurately as board-certified dermatologists did.
What AI does poorly: reasoning from incomplete or ambiguous information, integrating the patient’s social context and values, applying ethical judgment, and explaining its decisions. The last point matters clinically: if a model flags a chest X-ray as abnormal, a clinician needs to know why before acting on it.
The bias problem
AI systems learn from data, and medical data carries the biases of the systems that generated it. Pulse oximeters read less accurately in patients with darker skin tones, a known hardware issue, so any model trained on those readings inherits the error. Likewise, if AI diagnostic systems are trained on populations that skew white, male, or affluent, their performance on other groups is likely to be worse, in some cases significantly so.
A 2019 study published in Science found that an algorithm widely used in US healthcare to allocate additional care was systematically less likely to recommend it for Black patients than for white patients with the same clinical need. The cause was its use of healthcare costs as a proxy for need: Black patients historically had less access to care and therefore incurred lower costs.
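The proxy mechanism is easier to see with a toy example. The sketch below is not the algorithm from the study; it is a minimal illustration built on invented numbers. The `Patient` fields, the group labels "A" and "B", the `COST_PER_CONDITION` figure, and the `access` values are all assumptions made up for this purpose. Both groups have exactly the same distribution of need, but group B receives only half of the care it needs, so its recorded costs are lower.

```python
from dataclasses import dataclass

# All figures and group labels below are invented for illustration only.
COST_PER_CONDITION = 1000.0  # hypothetical cost of fully treating one condition


@dataclass(frozen=True)
class Patient:
    group: str        # hypothetical group label: "A" or "B"
    conditions: int   # number of chronic conditions, standing in for clinical need
    access: float     # share of needed care the patient actually receives (0 to 1)


def annual_cost(p: Patient) -> float:
    """Observed spending: need scaled down by how much care was actually accessed."""
    return p.conditions * COST_PER_CONDITION * p.access


def select_top(patients, score, k):
    """Return the k patients with the highest score."""
    return sorted(patients, key=score, reverse=True)[:k]


# Two groups with identical distributions of need; group B receives half the care.
patients = [Patient("A", c, 1.0) for c in range(1, 6)] + \
           [Patient("B", c, 0.5) for c in range(1, 6)]

# Rank once by the proxy (cost) and once by the thing we actually care about (need).
by_cost = select_top(patients, annual_cost, k=4)
by_need = select_top(patients, lambda p: p.conditions, k=4)

b_by_cost = sum(p.group == "B" for p in by_cost)
b_by_need = sum(p.group == "B" for p in by_need)

print("Group B patients selected when ranking by cost:", b_by_cost)  # 1
print("Group B patients selected when ranking by need:", b_by_need)  # 2
```

With these made-up numbers, ranking by cost selects one group B patient out of four, while ranking by need selects two. A group B patient with four conditions is passed over in favor of a group A patient with three, even though nothing in the ranking refers to group membership at all.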
Autonomy and decision-making
If an AI system recommends a treatment and a patient refuses, or if a clinician overrides the system and the patient comes to harm, the chain of responsibility becomes murky in ways that existing medical ethics frameworks weren’t built to handle. Informed consent requires that a patient understands what they’re consenting to — does this mean explaining how the model was trained?
There is also the question of what happens to clinical judgment. A clinician who consistently defers to an AI recommendation may become less capable of independent assessment over time. Whether this deskilling effect is a genuine concern or overstated is actively debated.
The NHS’s particular position
The NHS holds long-running records from one of the most genetically and demographically diverse patient populations in the world. This is genuinely valuable for training AI systems. But the governance of how that data is used (who profits, what patients are told, how results are validated before clinical deployment) remains inconsistent.
