Published on developer.* Blogs (http://www.developerdotstar.com/community)

Does Software Need an Apgar Score?

By Daniel Read
Created 2006-11-01 14:31

This morning I was reading the Oct 9, 2006 issue of The New Yorker magazine, which contains an interesting article called "The Score: How childbirth went industrial [1]." The article is about the process of childbirth, and more specifically, the medical techniques and industry, for lack of a better word, that have developed around childbirth. When I started reading the article, I was not expecting to encounter an intriguing idea related to software development--but ideas often spring from unexpected sources.

The article narrates the dramatic improvements in childbirth mortality rates that were achieved in the 20th century. Author Atul Gawande describes a stark situation for childbirth in the U.S. in the 1930s:

But in 1933 the New York Academy of Medicine published a shocking study of 2,041 maternal deaths in childbirth. At least two-thirds, the investigators found, were preventable. There had been no improvement in death rates for mothers in the preceding two decades; newborn deaths from birth injuries had actually increased. Hospital care brought no advantages; mothers were better off delivering at home. The investigators were appalled to find that many physicians simply didn’t know what they were doing: they missed clear signs of hemorrhagic shock and other treatable conditions, violated basic antiseptic standards, tore and infected women with misapplied forceps. The White House followed with a similar national report. Doctors may have had the right tools, but midwives without them did better.

The author then describes how childbirth was improved through standardization of techniques, training, and regulation of who exactly was allowed to perform certain procedures (based on whether they had the training and experience):

These standards reduced the number of maternal deaths substantially. In the mid-thirties, delivering a child had been the single most dangerous event in a woman’s life: one in a hundred and fifty pregnancies ended in the death of the mother. By the fifties, owing in part to the tighter standards, and in part to the discovery of penicillin and other antibiotics, the risk of death for a mother had fallen more than ninety per cent, to just one in two thousand.

But the situation wasn’t so encouraging for newborns: one in thirty still died at birth—odds that were scarcely better than those of the century before—and it wasn’t clear how that could be changed.

Later in the article, the author describes huge improvements that came in the second half of the century:

In the United States today, a full-term baby dies in just one out of five hundred childbirths, and a mother dies in one in ten thousand. If the statistics of 1940 had persisted, fifteen thousand mothers would have died last year (instead of fewer than five hundred)—and a hundred and twenty thousand newborns (instead of one-sixth that number).

How did these huge improvements happen?

...a doctor named Virginia Apgar, who was working in New York, had an idea. It was a ridiculously simple idea, but it transformed obstetrics and the nature of childbirth. ... she took a less direct, but ultimately more powerful, approach: she devised a score.

The Apgar score, as it became known universally, allowed nurses to rate the condition of babies at birth on a scale from zero to ten. An infant got two points if it was pink all over, two for crying, two for taking good, vigorous breaths, two for moving all four limbs, and two if its heart rate was over a hundred. Ten points meant a child born in perfect condition. Four points or less meant a blue, limp baby.

The score was published in 1953, and it transformed child delivery. It turned an intangible and impressionistic clinical concept—the condition of a newly born baby—into a number that people could collect and compare. Using it required observation and documentation of the true condition of every baby. Moreover, even if only because doctors are competitive, it drove them to want to produce better scores—and therefore better outcomes—for the newborns they delivered.

Is anyone else seeing the parallels to software development here? Am I off base thinking that the idea of a simple "score" to assess the "health" or "quality" of a software system could be a powerful tool, as it was, apparently, for obstetrics? Such a score, I think, would have to be as simple as described above for the Apgar score, based on easily observable phenomena, and only marginally dependent on subjective factors.

Typical "ility" descriptors used often in software circles would, I think, be ineffective in this context. Is the software maintainable? Is the software secure? These are too fuzzy, too broad, too open to interpretation. We need things more like, is the baby pink or blue? Is the baby crying?

Does the application have an exception handling scheme, or doesn't it? Is there clean separation of presentation logic and business logic, or isn't there? These are things that can be assessed relatively objectively by someone with the proper knowledge and experience.
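To make the idea concrete, here is a minimal sketch of how such a score might be tallied, modeled on the "two points each" arithmetic in the Apgar description quoted above. Only the first two criteria come from this post; the other three, along with every name in the code (CRITERIA, POINTS_PER_CRITERION, software_score), are hypothetical placeholders chosen for illustration, not a proposed standard.

```python
# Five yes/no criteria, two points each, for a score from 0 to 10.
# The first two criteria are taken from the post; the last three are
# hypothetical placeholders that a real scoring system would have to debate.
CRITERIA = (
    "has_exception_handling_scheme",
    "separates_presentation_from_business_logic",
    "builds_from_clean_checkout",   # hypothetical
    "has_automated_tests",          # hypothetical
    "is_under_version_control",     # hypothetical
)

POINTS_PER_CRITERION = 2  # mirrors the two-points-per-sign tally in the quote


def software_score(observations: dict[str, bool]) -> int:
    """Return a 0-10 score: two points for each criterion observed as present."""
    missing = [name for name in CRITERIA if name not in observations]
    if missing:
        # Every criterion must be assessed; an unobserved one is not a "no".
        raise ValueError(f"unassessed criteria: {missing}")
    return sum(POINTS_PER_CRITERION for name in CRITERIA if observations[name])


# Example assessment of an imaginary legacy system.
legacy_app = {
    "has_exception_handling_scheme": True,
    "separates_presentation_from_business_logic": False,
    "builds_from_clean_checkout": False,
    "has_automated_tests": False,
    "is_under_version_control": True,
}
print(software_score(legacy_app))  # 4
```

Each criterion is deliberately a plain yes or no, in the spirit of "is the baby pink or blue?", so that two assessors looking at the same system should arrive at the same number.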

I've spent the last five months on my day job spelunking through our client's entire enterprise of back office and web software systems, reading tons of old code, combing through databases, and trying to make sense of arcane user interfaces. All this has resulted in about 500 pages of maintenance documentation for my client, who recently purchased a company and wanted to know whether the software it purchased would scale up for further growth and acquisition. I've done similar system assessments many times in the past (though not usually this large), and my gut tells me that assigning a score to each of the systems I've examined would be very useful to my client.

On the front side, with a known scoring system in mind, is the novice (or even "senior") developer more likely to produce software that at least attempts to implement the necessary elements to achieve a respectable score? Would developers working in team environments encounter the same peer-related effects as seen in obstetrics?

The Apgar score changed everything. It was practical and easy to calculate, and it gave clinicians at the bedside immediate information on how they were doing. In the rest of medicine, we measure dozens of specific things: blood counts, electrolyte levels, heart rates, viral titers. But we have no measure that puts them together to grade how the patient as a whole is faring. It’s like knowing, during a basketball game, how many blocked shots and assists and free throws you have had, but not whether you are actually winning. We have only an impression of how we’re performing—and sometimes not even that.

More parallels with software development abound in this article about birthing babies:

There’s a paradox here. Ask most research physicians how a profession can advance, and they will talk about the model of “evidence-based medicine”—the idea that nothing ought to be introduced into practice unless it has been properly tested and proved effective by research centers, preferably through a double-blind, randomized controlled trial. But, in a 1978 ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results.

...

The question facing obstetrics was this: Is medicine a craft or an industry? If medicine is a craft, then you focus on teaching obstetricians to acquire a set of artisanal skills—the Woods corkscrew maneuver for the baby with a shoulder stuck, the Lovset maneuver for the breech baby, the feel of a forceps for a baby whose head is too big. You do research to find new techniques. You accept that things will not always work out in everyone’s hands.

But if medicine is an industry, responsible for the safest possible delivery of millions of babies each year, then the focus shifts. You seek reliability. You begin to wonder whether forty-two thousand obstetricians in the U.S. could really master all these techniques.

For the record, I've always been skeptical of software engineering metrics and their usefulness in the software shops where I've worked. (See my "Balance in Scoring" comment, below.) But something about the directness, simplicity, and apparent effectiveness of the Apgar score struck me. The author of this article, himself a physician at the Harvard School of Public Health [2], was, it seems, struck by it as well:

In a sense, there is a tyranny to the score. Against the score for a newborn child, the mother’s pain and blood loss and length of recovery seem to count for little. We have no score for how the mother does, beyond asking whether she lived or not—no measure to prod us to improve results for her, too. Yet this imbalance, at least, can surely be righted. If the child’s well-being can be measured, why not the mother’s, too? Indeed, we need an Apgar score for everyone who encounters medicine: the psychiatry patient, the patient on the hospital ward, the person going through an operation, and the mother in childbirth. My research group recently came up with a surgical Apgar score—a ten-point surgical rating based on the amount of blood loss, the lowest heart rate, and the lowest blood pressure that a patient experiences during an operation. We still don’t know if it’s perfect. But all patients deserve a simple measure that indicates how well or badly they have come through—and that pushes the rest of us to innovate.

Even if this is a good idea for software, obviously many details would need to be discussed and worked out. For example, would we really need multiple scoring systems for different kinds of software? However, I'll stop writing at this point and assess whether there is any further interest in this idea. What do you think?

Thanks for reading,
Dan

P.S.
I've quoted heavily from it here, but I recommend the full article [3] as a worthwhile read.


Source URL:
http://www.developerdotstar.com/community/community/node/635