Basic Medical Statistics

49% of all statistics are made up. (Anonymous)

Background

Once data is pared down to its essentials, statistics is still needed to solidify and test conclusions. Untrustworthy data is no more than a compendium of interesting stories told by someone in a white coat.

Accuracy describes how far a data point is from the true value that is being measured. In archery terms accuracy describes how far away from the bull's-eye an arrow hits; each arrow is considered on its own. Accuracy correlates well with validity, which measures how far data deviates from the true value.

Precision describes how far a data point lies from the rest of the data points. It gives an idea of how reproducible a result is, and indicates the degree of random error. To use the archery metaphor again, precision describes how far an arrow lands from the other arrows, regardless of where the arrows are in relation to the bull's-eye. Just as validity corresponds to accuracy, reliability corresponds to precision.

Reliability is a measure of the reproducibility of a result: how many arrows can be shot into the same area on the target, regardless of their relation to the bull's-eye.

Mean, aka "average", is calculated by summing the value of all the data points and dividing by the number of data points. This gives a static picture of how a series of data has performed over a stretch of time. (Since it is algebraic in nature, it is by definition static; a dynamic look at how data changes over a set time period would require calculus.) For example, Mr. Smith's serum glucose levels over three days are 156, 240, 68, 160, 156, 240, 110, 378, 143, 240, 122, and 156. The mean of his serum glucose over these three days is calculated by adding all the values and dividing by 12 (four values each day for three days): 2169/12 = 180.75.

Median represents the middle data point in a series arranged sequentially (ascending or descending). If the data set has an even number of observations, the median is the mean of the two middle values. The advantage of the median is that it is not affected by the extremes (i.e., the 68 and 378 are effectively thrown out). In the example of Mr. Smith's glucose, arranging the values from least to greatest (68, 110, 122, 143, 156, 156, 156, 160, 240, 240, 240, 378) shows that the two middle values happen to be the same, 156, so the median is 156.
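The two calculations above can be checked with a short Python sketch using the standard library's statistics module. The readings are Mr. Smith's values from the text; the replacement value of 600 is a hypothetical extreme, added only to show that an outlier moves the mean but not the median.

```python
from statistics import mean, median

# Mr. Smith's twelve serum glucose readings, as listed in the text.
glucose = [156, 240, 68, 160, 156, 240, 110, 378, 143, 240, 122, 156]

glucose_mean = mean(glucose)      # sum of the values divided by their count
glucose_median = median(glucose)  # mean of the two middle values when the count is even

# Hypothetical variation: swap the highest reading (378) for a more extreme 600.
with_outlier = [600 if value == 378 else value for value in glucose]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)

print("original:    ", glucose_mean, glucose_median)
print("with outlier:", outlier_mean, outlier_median)
```

The mean rises with the more extreme reading while the median stays at 156, which is the sense in which the median ignores the extremes.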

Mode, simply put, is the most frequently occurring value in a data set. In the example used thus far, 156 and 240 occur most often (three times each), so they are the modes. This seems rather inconsequential given the current example; however, its significance is demonstrated by considering a disease such as adrenocortical carcinoma. This tumor occurs mainly in two age groups: children under five years old and adults between the ages of forty and fifty years old. The two groups of numbers occurring most often in a listing of ages of people with adrenocortical carcinoma are under five and forty to fifty years old. This distribution is called bimodal.

Figure 5.1. Normal bell shaped curve. (Labels: Mean, Intensity.)
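As a quick check on the modes of Mr. Smith's glucose values, here is a minimal Python sketch. It relies on multimode from the standard statistics module (available in Python 3.8 and later), which returns every value tied for the highest count.

```python
from statistics import multimode

# Mr. Smith's twelve serum glucose readings, as listed in the text.
glucose = [156, 240, 68, 160, 156, 240, 110, 378, 143, 240, 122, 156]

# multimode returns all values tied for the highest count,
# in the order each value first appears in the data.
modes = multimode(glucose)

print(modes)  # [156, 240]
```

Two modes in the result is the same bimodal pattern described for the ages of patients with adrenocortical carcinoma.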

Standard deviation describes how far data points typically lie from the center (the mean) of a data set; in other words, how widely the values are spread. If a value is too far from the center (more than two standard deviations), it may be questioned as inaccurate or incorrect. This is best conceived of by remembering the familiar picture of the bell-shaped (Gaussian) curve (Fig. 5.1).

This curve graphically describes a set of data points. The mean is the peak of the bell, but the median and mode are also represented; in fact, the mean, median, and mode are all equal in a normal curve. The standard deviation describes where a data point is in relation to the mean. The significance of this is that about 68% of a group of data points lie within one standard deviation of the mean. Furthermore, about 95% of the group will fall within two standard deviations of the mean, and about 99% lie within roughly 2.5 standard deviations of the mean. When reading a study, if the data look too good to be true, and most of the supporting data lie within four standard deviations, it probably is too good to be true. This can also be used for an individual patient such as Mr. Smith: the further a patient's data point (a serum glucose of 378) is from the center, the more likely it is to be an abnormal value and not a normal variant.
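The percentages above, and the position of the 378 reading, can be checked with a short Python sketch. Two assumptions are made here that the text does not state: the sample standard deviation (dividing by n - 1) is used, and the twelve readings are treated as if they were roughly bell shaped, which so small a sample cannot confirm.

```python
from statistics import NormalDist, mean, stdev

# Mr. Smith's twelve serum glucose readings, as listed in the text.
glucose = [156, 240, 68, 160, 156, 240, 110, 378, 143, 240, 122, 156]

center = mean(glucose)
spread = stdev(glucose)  # sample standard deviation (assumption: n - 1 divisor)

# Distance of the highest reading from the mean, in standard deviations.
z_378 = (378 - center) / spread


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print("standard deviation:", round(spread, 1))
print("378 lies", round(z_378, 2), "standard deviations above the mean")
for k in (1, 2, 2.5):
    print("within", k, "standard deviations:", round(coverage(k), 3))
```

Under these assumptions the 378 reading falls a little more than two standard deviations above the mean, which is why it stands out as a likely abnormal value.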

Incidence is the number of new cases reported in the total population over a given period of time. Recently the incidence of HIV infection has decreased (considered over all populations) as testing, screening of blood products, and education of at-risk populations have improved.

Prevalence is the number of existing data points (patients with a disease) in a population at a given time. Although the incidence of HIV infection has declined over the past several years, the prevalence has increased. How is this possible? People are still being diagnosed with the disease every year, just at a slower rate, and those with the disease now live longer with the advent of protease inhibitors. The end result of these two forces is an increase in the total number of people with HIV: an increase in the prevalence.
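The bookkeeping behind a falling incidence and a rising prevalence can be sketched in a few lines of Python. Every number below is hypothetical and chosen only for illustration; none of it is real HIV data.

```python
# All figures are hypothetical, chosen only to illustrate the arithmetic.
new_cases_per_year = [500, 450, 400, 350]  # incidence falls every year
deaths_per_year = [100, 100, 100, 100]     # longer survival keeps deaths low

living_cases = 2000  # people living with the disease at the start (prevalence)
prevalence_by_year = []
for new_cases, deaths in zip(new_cases_per_year, deaths_per_year):
    # Existing cases grow by new diagnoses and shrink by deaths.
    living_cases = living_cases + new_cases - deaths
    prevalence_by_year.append(living_cases)

print(prevalence_by_year)
```

As long as new diagnoses outnumber deaths in a given year, the count of living cases keeps climbing even though fewer people are diagnosed each year.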

Table 5.1. Disease state

Test Result    Afflicted    Not Afflicted
Positive       A            B
Negative       C            D

Frequency is the number of afflicted people out of the at-risk population only. This effectively limits the population under consideration in incidence and prevalence to those at risk.

Relative risk compares the rate of disease in people who have been exposed to a certain risk (e.g., pathogen, carcinogen) with the rate in people who have not been exposed. It is the incidence of disease in the exposed group divided by the incidence in the unexposed group, and it tries to answer the question, "is exposure to X responsible for Y disease?" A relative risk greater than one means there is more disease in the exposed population; therefore a positive association exists between the exposure and the disease (children exposed in utero to thalidomide are born with a higher rate of phocomelia than children not exposed in utero to thalidomide). A relative risk equal to one shows that there is no relation between exposure and disease (children exposed in utero to caffeine are not born with a higher rate of phocomelia than children not exposed in utero to caffeine). A relative risk less than one means there is less disease in the exposed population than in the unexposed population. It does NOT mean that exposure conveys a protection or immunity to the disease; that is a separate issue. Relative risk can be calculated only for prospective studies.

Attributable risk mathematically is the incidence rate in people NOT exposed subtracted from the incidence rate in the exposed population. If two groups of 1000 babies each (one exposed to thalidomide in utero and the other not) are studied, and 100 babies out of the former group are found to have phocomelia, while 3 out of the latter have the deformity, the attributable risk is 97 per 1000. That is, the 3 diseased in the unexposed group are subtracted from the 100 in the exposed group, which can be stated as: the risk of phocomelia attributed to thalidomide exposure in utero is 97 out of 1000. This can be useful in predicting what would happen in a population if the exposure risk were removed.
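Both measures can be worked through with the thalidomide figures from the example above. This is a minimal Python sketch; the variable names are illustrative, and the relative risk is computed as the ratio of the two incidence rates.

```python
# Counts from the illustrative thalidomide example in the text.
exposed_total, exposed_cases = 1000, 100     # exposed in utero
unexposed_total, unexposed_cases = 1000, 3   # not exposed

incidence_exposed = exposed_cases / exposed_total
incidence_unexposed = unexposed_cases / unexposed_total

# Relative risk: ratio of the two incidence rates.
relative_risk = incidence_exposed / incidence_unexposed

# Attributable risk: difference between the two incidence rates.
attributable_risk = incidence_exposed - incidence_unexposed

print("relative risk:", round(relative_risk, 1))
print("attributable risk per 1000:", round(attributable_risk * 1000))
```

With these figures the relative risk is far above one, and the attributable risk matches the 97 per 1000 worked out in the text.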

Odds ratio is an estimate of relative risk calculated from retrospective studies (discussed below). The odds ratio is like looking at relative risk through the looking glass. Now that a group of people with a particular disease has been identified, can the odds that those affected people were exposed to something be compared with the odds that a group of control subjects (healthy population) was exposed? The answer is yes, and it can be expressed as the odds ratio. This is useful when the incidence of a disease is unknown, but the relationship between exposure to an agent and a disease is still being examined.
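A minimal Python sketch of the calculation follows. The counts are hypothetical, standing in for a retrospective study of 100 people with a disease and 100 healthy controls; the variable names are illustrative only.

```python
# Hypothetical retrospective study: 100 cases and 100 controls.
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 10, 90

# Odds of past exposure within each group.
odds_cases = cases_exposed / cases_unexposed
odds_controls = controls_exposed / controls_unexposed

# Odds ratio: how much greater the odds of exposure are among cases.
odds_ratio = odds_cases / odds_controls

print(round(odds_ratio, 2))
```

Note that no incidence rate appears anywhere in the calculation, which is why the odds ratio remains usable when the incidence of the disease is unknown.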

Knowing this vocabulary lays the basis for using the tools of statistics. Some of the studies have already been referenced, but discussion about these tests and studies begins with that famous "2 x 2 box" (Table 5.1).

Sensitivity measures how well a test identifies people with a disease. Graphically speaking, sensitivity is calculated as A/(A+C). In other words, sensitivity begins with the population of all those afflicted and measures the probability that someone who has the disease tests positive. Of course, a higher sensitivity is preferable to lessen the chances of false negatives, which represent afflicted people who test negative. A second issue relating to sensitivity is positive predictive value, which measures the probability that someone with a positive test actually has the disease. Graphically this can be represented by A/(A+B). The positive predictive value is highest when the disease prevalence is high: when more of the people tested have the disease, true positives (A) grow relative to false positives (B), so a larger share of positive results is correct.

Specificity measures how well a test identifies people without a disease. Graphically speaking, specificity is calculated as D/(B+D). As opposed to sensitivity, specificity concerns itself with the healthy population and measures the probability that someone who does not have the disease tests negative. A higher specificity is desirable to decrease the number of false positives, which represent those who test positive but are actually not afflicted. Unfortunately, as often occurs, as the specificity increases, the sensitivity decreases. The ideal test has both a high specificity and a high sensitivity. Another term relating to specificity is negative predictive value, which measures the probability that a person with a negative test actually is healthy. Graphically speaking this can be written as D/(C+D). Because the negative predictive value represents the ability of a test to identify healthy people, it is highest when the prevalence of a disease is low.
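The four measures read off the 2 x 2 box can be sketched in Python as follows. The helper names, the population of 10,000, and the test performance (90% sensitive, 90% specific) are all hypothetical. The same test is applied at two prevalences to show how the predictive values shift while sensitivity and specificity stay fixed.

```python
def two_by_two_measures(a, b, c, d):
    """Measures from Table 5.1: a = afflicted and positive, b = healthy and
    positive, c = afflicted and negative, d = healthy and negative."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected A, B, C, D when a test is applied to a whole population."""
    afflicted = population * prevalence
    healthy = population - afflicted
    a = afflicted * sensitivity  # afflicted, test positive
    c = afflicted - a            # afflicted, test negative
    d = healthy * specificity    # healthy, test negative
    b = healthy - d              # healthy, test positive
    return a, b, c, d


# Hypothetical test (90% sensitive, 90% specific) at two prevalences.
low = two_by_two_measures(*expected_counts(10_000, 0.01, 0.90, 0.90))
high = two_by_two_measures(*expected_counts(10_000, 0.30, 0.90, 0.90))

for label, result in (("prevalence 1%", low), ("prevalence 30%", high)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

In this sketch the positive predictive value is much higher at 30% prevalence than at 1%, while the negative predictive value moves in the opposite direction.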
