Saturday, September 27, 2014

sensitivity/specificity

There was a meeting of the IEEE (Institute of Electrical and Electronics Engineers) Thursday night at AT&T Labs at which Choudur Lakshminarayan gave a talk advertised as "Big Data, Cyber-Biological Systems and Pattern Recognition" and dubbed "Automation Classification Heartbeats" in Mr. Lakshminarayan's slideshow. By 2018 the United States will be spending $13,100 per capita on healthcare. Wouldn't it be nice to get this cost down? Some of the opportunities being eyed for potential savings come in the shape of technical solutions for perpetually monitoring and reporting on a person's physiology and sending preventative alerts when metrics start straying from their norms. "It's time to go to the doctor now, Chuck, sooner rather than later. Something feels off." ...could be told to you in a friendly computer-generated voice, saving your life, yes, but also saving you, me, and everyone some money in the form of one less tax increase, perhaps.

Just as we proactively monitor the tectonic plates of the earth to predict earthquakes, we may see cardiac quirks a comin' by attaching a device to a subject that perpetually monitors the subject, separating heartbeats from both lung vibrations and speech, and then looking for outliers in the patterns. In Choudur's vision the data is perpetually wired beyond the device to a central data store, opening the door to some challenges regarding security and privacy, but centralized "big" data is needed to get a clear aggregate picture of what really is normal and thus reduce false positives about what is not normal.

In the image here the topmost line of data shows a healthy heartbeat in the timeline of an electrocardiogram (often called an EKG or ECG) recording. The middle row shows "premature ventricular fibrillation," or agitation in the event of... drama! The last row shows the heartbeat of someone who died of heart failure, recorded in advance of his demise. Clearly the last row looks more like the row before it than the topmost row, and it stands to reason that at some point the heartbeat in the last row strayed out of a shape closer to the first row and into a shape closer to the second row before it progressed to its state of mad high kicks. If we have the data telling us what the first two shapes look like, how may we algorithmically tell when our current real-time data sampling is more like a raisin than a grape?

Using data deemed healthy by doctors reading electrocardiograms, alongside data deemed outside those bounds, an aggregate of what is healthy is condensed into an average. Heartbeats follow a PQRST pattern on an electrocardiogram, with separate steps labeled P, Q, R, S, and T, the R representing the upper spike. When this spike strays too far above or below what is expected, we have a match for a would-be issue. When the audience heard this idea they started to revolt against the notion that an average of the data could represent, for example, men and women equally, or persons of different weights perhaps, but Choudur counterargued that when a doctor reads an electrocardiogram no consideration is given to gender, or lifestyle, or whether the individual is a smoker. Electrocardiograms are all read the same, and thus there may be, in this circumstance, an average that applies to just about everyone.
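As a minimal sketch of that R-spike test, assuming we have already isolated each beat's R-peak amplitude and have a pool of doctor-approved recordings to average against; the function name, the millivolt numbers, and the three-standard-deviations threshold here are my own illustration, not anything from the slides:

    # Flag a beat when its R-peak amplitude strays too far from the
    # aggregate norm built from recordings doctors deemed healthy.
    from statistics import mean, stdev

    def find_suspect_beats(r_amplitudes, baseline, num_sigmas=3.0):
        """Return indices of beats whose R-peak amplitude is an outlier.

        r_amplitudes -- R-peak heights (millivolts) from the live recording
        baseline     -- R-peak heights from recordings deemed healthy
        """
        mu = mean(baseline)
        sigma = stdev(baseline)
        return [i for i, r in enumerate(r_amplitudes)
                if abs(r - mu) > num_sigmas * sigma]

    # Hypothetical numbers: a baseline hovering near 1.0 mV and a live
    # stream in which the fourth beat spikes well outside the norm.
    healthy = [0.98, 1.02, 1.01, 0.99, 1.00, 1.03, 0.97]
    live = [1.00, 1.01, 0.99, 1.60, 1.02]
    print(find_suspect_beats(live, healthy))  # -> [3]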
There was a woman in the crowd, now eighty-five, who had been expected to live only to age seven, as she had an extra-large heart which gave funny readings forever implying that she was at death's door, so she would be an outlier; but, again, for most of us, an average of the aggregate of "big data" should represent how our hearts should behave. Most of us have seventy-two heartbeats a minute. There is room for false positives and false negatives, and we need a way to polish our algorithm in the name of reducing these. To that end:

  1. Sensitivity is true positives divided by the sum of true positives and false negatives: TP / (TP + FN).
  2. Specificity is true negatives divided by the sum of true negatives and false positives: TN / (TN + FP).
  3. Positive predictive value (PPV) is true positives divided by all positives, both true and false: TP / (TP + FP).
  4. F score is twice the product of sensitivity and PPV, divided by the sum of sensitivity and PPV: 2 × (sensitivity × PPV) / (sensitivity + PPV). An F score of 1 suggests a perfect algorithm, and an F score of 0.9 means you're doing pretty well in Choudur's opinion. The 1 represents 100% matching of good to good and bad to bad, the 0.9 a grade of 90%, etc. (The sketch after this list works these out in code.)
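
As a small sketch of how those four quantities fit together, computed from raw confusion counts; the counts themselves are invented for illustration, not from the talk:

    # Compute the four metrics above from a confusion matrix's raw counts.
    def score(tp, fp, tn, fn):
        sensitivity = tp / (tp + fn)   # true positive rate
        specificity = tn / (tn + fp)   # true negative rate
        ppv = tp / (tp + fp)           # positive predictive value
        f_score = 2 * (sensitivity * ppv) / (sensitivity + ppv)
        return sensitivity, specificity, ppv, f_score

    # Say our classifier reviewed 1,000 beats: 90 truly bad beats caught,
    # 10 bad beats missed, 880 good beats passed, 20 good beats falsely flagged.
    sens, spec, ppv, f = score(tp=90, fp=20, tn=880, fn=10)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f} f={f:.2f}")
    # sensitivity=0.90 specificity=0.98 ppv=0.82 f=0.86

Note how the F score blends sensitivity and PPV into one grade: here the algorithm catches 90% of the bad beats, but the false alarms drag PPV down, so the overall grade lands at 0.86 rather than 0.90.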
