## Biostatistics

Review this section of Step I material for some easy points.

Sensitivity: ability to detect disease. Mathematically, sensitivity is calculated by dividing the number of true positives by the number of people with the disease. Tests with high sensitivity are used for screening. They may have false positives but do not miss many people with the disease (low false-negative rate).

Specificity: ability to detect health (or nondisease). Mathematically, specificity is calculated by dividing the number of true negatives by the number of people without the disease. Tests with high specificity are used for disease confirmation. They may have false negatives but rarely call anyone sick who is actually healthy (low false-positive rate). The ideal confirmatory test must have high sensitivity and high specificity; otherwise, people with the disease may be called healthy.
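To see the two formulas side by side, here is a minimal Python sketch. The counts are hypothetical, chosen only to keep the arithmetic easy.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives divided by everyone who has the disease (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by everyone without the disease (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical screening results: 100 people with the disease, 900 without.
tp, fn = 90, 10    # 90 of the 100 diseased people test positive
tn, fp = 810, 90   # 810 of the 900 healthy people test negative

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # sensitivity = 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # specificity = 0.90
```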

The trade-off between sensitivity and specificity is a classic statistics question. Understand how changing the cut-off glucose value in screening for diabetes (or changing the value of any of several screening tests) will change the number of true and false negatives and true and false positives. If the cut-off value is raised, fewer people will be called diabetic (more false negatives, fewer false positives), whereas if the cut-off value is lowered, more people will be called diabetic (fewer false negatives, more false positives).
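A minimal sketch of the cut-off trade-off, using a handful of invented glucose values (not real clinical data) and calling anyone at or above the cut-off diabetic:

```python
# Hypothetical fasting glucose values (mg/dL), invented for illustration only.
diabetic = [118, 125, 131, 140, 152, 160, 175, 190]
healthy = [82, 88, 91, 95, 99, 104, 110, 121, 128]


def tally(cutoff: float) -> dict:
    """Call everyone at or above the cutoff 'diabetic' and count the four 2 x 2 cells."""
    tp = sum(g >= cutoff for g in diabetic)  # diabetics correctly called diabetic
    fp = sum(g >= cutoff for g in healthy)   # healthy people wrongly called diabetic
    return {"TP": tp, "FN": len(diabetic) - tp, "FP": fp, "TN": len(healthy) - fp}


for cutoff in (100, 126, 150):
    print(cutoff, tally(cutoff))
# 100 {'TP': 8, 'FN': 0, 'FP': 4, 'TN': 5}
# 126 {'TP': 6, 'FN': 2, 'FP': 1, 'TN': 8}
# 150 {'TP': 4, 'FN': 4, 'FP': 0, 'TN': 9}
```

Raising the cut-off moves healthy people from the false-positive cell into the true-negative cell, but it also moves diabetics from true positives into false negatives.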

Positive predictive value (PPV): when a test comes back positive for disease, the PPV measures how likely it is that the patient has the disease (probability of having a condition, given a positive test). Mathematically, PPV is calculated by dividing the number of true positives by the number of people with a positive test. PPV depends on the prevalence of a disease (the higher the prevalence, the greater the PPV) and the sensitivity/specificity of the test (e.g., an overly sensitive test that gives more false positives has a lower PPV).

Negative predictive value (NPV): when a test comes back negative for disease, the NPV measures how likely it is that the patient is healthy and does not have the disease (probability of not having a condition, given a negative test). Mathematically, NPV is calculated by dividing the number of true negatives by the number of people with a negative test. NPV depends on prevalence and sensitivity/specificity just like PPV. The higher the prevalence, the lower the NPV. In addition, an overly sensitive test, which produces lots of false positives but few false negatives, will make the NPV higher.
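The dependence on prevalence is easiest to see by computing PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem). A sketch, assuming a hypothetical test with 90% sensitivity and 90% specificity:

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive test) = TP / (TP + FP), using population fractions."""
    tp = sens * prev                 # diseased and test positive
    fp = (1 - spec) * (1 - prev)     # healthy but test positive
    return tp / (tp + fp)


def npv(sens: float, spec: float, prev: float) -> float:
    """P(no disease | negative test) = TN / (TN + FN), using population fractions."""
    tn = spec * (1 - prev)           # healthy and test negative
    fn = (1 - sens) * prev           # diseased but test negative
    return tn / (tn + fn)


for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.2f}: PPV {ppv(0.9, 0.9, prev):.2f}, NPV {npv(0.9, 0.9, prev):.3f}")
# prevalence 0.01: PPV 0.08, NPV 0.999
# prevalence 0.10: PPV 0.50, NPV 0.988
# prevalence 0.50: PPV 0.90, NPV 0.900
```

The same test is nearly useless as a positive indicator when the disease is rare, even though its sensitivity and specificity never changed.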

Attributable risk: number of cases attributable to one risk factor; in other words, the amount by which you can expect the incidence to decrease if a risk factor is removed. It equals the incidence in the exposed group minus the incidence in the unexposed group. For example, if the incidence rate of lung cancer in nonsmokers is 1/100 and in smokers it is 10/100, the attributable risk of smoking in causing lung cancer is 9/100 (assuming a properly matched control group).
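The subtraction can be checked with exact fractions. A minimal sketch using the lung cancer figures above, treating 1/100 as the incidence in the unexposed comparison group:

```python
from fractions import Fraction

incidence_smokers = Fraction(10, 100)    # incidence in the exposed group
incidence_unexposed = Fraction(1, 100)   # incidence in the unexposed comparison group

# Attributable risk: the excess incidence that removing the exposure should eliminate.
attributable_risk = incidence_smokers - incidence_unexposed
print(attributable_risk)  # 9/100
```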

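Relative risk (RR): compares the disease risk in the exposed population to the disease risk in the unexposed population (RR = incidence in exposed ÷ incidence in unexposed). RR can be calculated only after prospective or experimental studies; it cannot be calculated from retrospective data. An RR of 1 means no difference in risk; an RR greater than 1 means the exposure is associated with increased risk, and whether that increase is statistically significant depends on its confidence interval (which should not include 1).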
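A minimal sketch of the RR calculation from a cohort 2 × 2 table. The cell labels a to d and the cohort counts are hypothetical choices made for this example.

```python
from fractions import Fraction


def relative_risk(a: int, b: int, c: int, d: int) -> Fraction:
    """Cohort 2 x 2 table (hypothetical labels):
    a = exposed, develop disease     b = exposed, no disease
    c = unexposed, develop disease   d = unexposed, no disease
    RR = [a / (a + b)] / [c / (c + d)]"""
    return Fraction(a, a + b) / Fraction(c, c + d)


# Hypothetical cohort: 1,000 smokers (100 develop lung cancer) and
# 1,000 nonsmokers (10 develop lung cancer).
rr = relative_risk(100, 900, 10, 990)
print(rr)  # 10
```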

Odds ratio (OR): used only for retrospective studies (e.g., case-control). OR compares disease in exposed and nondisease in unexposed populations with disease in unexposed and nondisease in exposed populations to determine whether there is a difference between the two. If the exposure is truly a risk factor, there should be more disease in exposed than unexposed populations and more nondisease in unexposed than exposed populations. OR is a less than perfect way to estimate relative risk, although it approximates RR closely when the disease is rare.
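The OR is the cross-product of the 2 × 2 table. A sketch with hypothetical labels and counts; the second call illustrates that, for a rare disease, the OR lands close to the RR.

```python
from fractions import Fraction


def odds_ratio(a: int, b: int, c: int, d: int) -> Fraction:
    """2 x 2 table (hypothetical labels):
    a = exposed with disease     b = exposed without disease
    c = unexposed with disease   d = unexposed without disease
    OR = (a * d) / (b * c)"""
    return Fraction(a * d, b * c)


# Hypothetical case-control study: 200 cases (150 exposed) and 200 controls (60 exposed).
print(odds_ratio(150, 60, 50, 140))  # 7

# Rare disease: 10 of 1,000 exposed and 1 of 1,000 unexposed get sick (RR = 10).
print(f"{float(odds_ratio(10, 990, 1, 999)):.2f}")  # 10.09
```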

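Get in the habit of drawing a 2 × 2 table to make calculations easier:

| Test | Disease present | Disease absent |
|---|---|---|
| Positive | a (true positive) | b (false positive) |
| Negative | c (false negative) | d (true negative) |

Sensitivity = a/(a + c); specificity = d/(b + d); PPV = a/(a + b); NPV = d/(c + d).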

Standard deviation (SD): with a normal or bell-shaped distribution, the range within 1 SD of the mean holds about 68% of values, within 2 SD about 95%, and within 3 SD about 99.7%. The classic question gives you the mean and standard deviation and asks you what percentage of values will be above a given value; variations on this question are also common. In a normal distribution, the mean = median = mode. The mean is the average, the median is the middle value, and the mode is the most common value. Questions may give you several numbers and ask for their mean, median, and mode.
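Python's standard `statistics` module can check both kinds of question. A sketch; the mean of 100 and SD of 15 are hypothetical exam-style numbers.

```python
from statistics import NormalDist, mean, median, mode

standard = NormalDist()  # mean 0, SD 1
for k in (1, 2, 3):
    within = standard.cdf(k) - standard.cdf(-k)  # fraction within k SD of the mean
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

# Classic question (hypothetical numbers): mean 100, SD 15; what fraction is above 130?
scores = NormalDist(mu=100, sigma=15)
print(f"above 130: {1 - scores.cdf(130):.1%}")  # above 130: 2.3% (the rule of thumb gives 2.5%)

values = [2, 3, 3, 5, 7, 10]
print(mean(values), median(values), mode(values))  # 5 4.0 3
```

The rule-of-thumb answer (half of the 5% outside 2 SD, i.e., 2.5%) is what the exam expects; the exact normal value is slightly smaller.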

Skewed distribution: a positive skew is asymmetry with an excess of high values (tail on right, mean > median > mode); a negative skew is asymmetry with an excess of low values (tail on left, mean < median < mode). These are not normal distributions; thus, standard deviation and mean are less meaningful values.
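A quick sketch with two small made-up data sets shows the mean being dragged toward the tail while the mode stays put:

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 3, 4, 20]          # one very high value drags the mean up
left_skewed = [1, 18, 19, 19, 20, 20, 20]   # one very low value drags the mean down

for name, data in (("positive skew", right_skewed), ("negative skew", left_skewed)):
    print(f"{name}: mean {mean(data):.2f}, median {median(data)}, mode {mode(data)}")
# positive skew: mean 5.33, median 2.5, mode 2
# negative skew: mean 16.71, median 19, mode 20
```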

Reliability of a test (synonymous with precision): measures the reproducibility and consistency of a test (e.g., the concept of interrater reliability: if two different people administer the same test, they will get the same score if the test is reliable). Random error reduces reliability/precision (e.g., limitation in significant figures).

Validity of a test (synonymous with accuracy): measures the trueness of measurement, that is, whether the test measures what it claims to measure. For example, if you give a valid IQ test to a genius, the test should not indicate below-average intelligence. Systematic error reduces validity/accuracy (e.g., miscalibrated equipment).
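Reliability and validity can be separated numerically: the spread (SD) of repeated measurements reflects random error, while the gap between their average and the true value (bias) reflects systematic error. A sketch with invented readings from two hypothetical scales weighing a 10.0-kg reference mass:

```python
from statistics import mean, stdev

TRUE_WEIGHT = 10.0  # kg, the reference value (hypothetical)
scale_a = [10.5, 10.5, 10.5, 10.25, 10.75]  # precise but miscalibrated
scale_b = [9.0, 11.0, 10.5, 9.5, 10.0]      # accurate on average but imprecise

for name, values in (("scale A", scale_a), ("scale B", scale_b)):
    bias = mean(values) - TRUE_WEIGHT   # systematic error -> poor validity/accuracy
    spread = stdev(values)              # random error -> poor reliability/precision
    print(f"{name}: bias {bias:+.2f} kg, SD {spread:.2f} kg")
# scale A: bias +0.50 kg, SD 0.18 kg
# scale B: bias +0.00 kg, SD 0.79 kg
```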

Correlation coefficient: measures the degree of (linear) relationship between two variables. The coefficient ranges from -1 to +1. The important point in determining the strength of the relationship between the two variables is how far the number is from zero. Zero equals no linear association whatsoever; positive one (+1) equals a perfect positive correlation (when one variable increases, so does the other); and negative one (-1) equals a perfect negative correlation (when one variable increases, the other decreases). Use the absolute value to gauge the strength of the correlation (e.g., -0.3 is a stronger correlation than +0.2).
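A minimal sketch of the Pearson correlation coefficient, written out by hand so each step is visible; the data sets are made up to hit the extremes of the range.

```python
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation: covariance scaled by both spreads, so -1 <= r <= +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)

# Strength is judged by absolute value: -0.3 is stronger than +0.2.
print(abs(-0.3) > abs(0.2))  # True
```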

Confidence interval (CI): when you take a set of data and calculate a mean, you want to say that it is equivalent to the mean of the whole population, but usually they are not exactly equal. The CI (usually set at 95%) says that you are 95% confident that the population mean is within a certain range (about 2 standard errors of the mean on either side of the sample mean, where the standard error of the mean, SEM, equals SD/√n). For example, if you sample the heart rate of 100 people and calculate a mean of 80 bpm with an SEM of 2 bpm, your confidence interval (confidence limits) is written as 76 < X < 84 (95% CI). This means that you are 95% certain that the mean heart rate of the whole population (X) is between 76 and 84 bpm.
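A sketch of the usual large-sample calculation, mean ± 1.96 × SEM with SEM = SD/√n. The individual-level SD of 20 bpm is a hypothetical value chosen so that, with n = 100, the SEM comes out to 2 bpm; for small samples a t-based multiplier would replace 1.96.

```python
from math import sqrt
from statistics import NormalDist


def ci_for_mean(sample_mean: float, sd: float, n: int, level: float = 0.95) -> tuple:
    """Normal-approximation confidence interval for a population mean."""
    sem = sd / sqrt(n)                           # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return sample_mean - z * sem, sample_mean + z * sem


low, high = ci_for_mean(80, sd=20, n=100)
print(f"95% CI: {low:.1f} < X < {high:.1f} bpm")  # 95% CI: 76.1 < X < 83.9 bpm
```

Using exactly 2 SEM instead of 1.96 gives the rounder 76 to 84 quoted above.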

Different types of studies (listed in decreasing order of quality and desirability):

1. Experimental: the gold standard, which compares two equal groups in which one variable is manipulated and its effect is measured. Remember to use double-blinding (or at least single-blinding) and well-matched controls.

2. Prospective, longitudinal, cohort, incidence, follow-up: choose a sample and divide it into two groups based on presence or absence of a risk factor and follow the groups over time to see what diseases they develop (e.g., follow people with and without asymptomatic hypercholesterolemia to see whether people with hypercholesterolemia have a higher incidence of myocardial infarction later in life). This approach sometimes is called an observational study because all you do is observe. Relative risk and incidence can be calculated. Prospective studies are time-consuming, expensive, and good for common diseases, whereas retrospective studies are less expensive, less time-consuming, and good for rare diseases.

3. Retrospective/case-control: samples are chosen after the fact based on presence (cases) or absence (controls) of disease. Information can then be collected about risk factors; for example, look at people with lung cancer vs. people without lung cancer and see if the people with lung cancer smoke more. An odds ratio can be calculated, but you cannot calculate a true relative risk or measure incidence from a retrospective study.

4. Case series: good for extremely rare diseases (as are retrospective studies). Case series simply describe the clinical presentation of people with a certain disease and may suggest the need for a retrospective study.

5. Prevalence survey/cross-sectional survey: looks at prevalence of a disease and prevalence of risk factors. When comparing two different cultures, you may get an idea about the cause of a disease, which can be tested with a prospective study (e.g., more colon cancer and higher-fat diet in the U.S. vs. less colon cancer and low-fat diet in Japan).

Incidence: the number of new cases of disease in a unit of time (generally 1 year, but any time frame can be used), usually expressed relative to the population at risk. Incidence rate also equals the absolute risk (to be differentiated from relative or attributable risk).
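A minimal sketch of the incidence arithmetic. The function name and the town's numbers are hypothetical, and dividing by the number of years is a simplification (formal incidence rates use person-time at risk).

```python
def incidence_rate(new_cases: int, population_at_risk: int, years: float = 1.0,
                   per: int = 1000) -> float:
    """New cases per `per` people at risk per year (simplified: no person-time)."""
    return new_cases / population_at_risk / years * per


# Hypothetical town: 10,000 people free of the disease at the start of the year,
# 50 new cases diagnosed during that year.
print(incidence_rate(50, 10_000))  # 5.0 new cases per 1,000 per year
```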

Prevalence: the total number of cases of disease that exist (new or old). Important points:

1. The classic question about incidence and prevalence: when a disease can be treated and people can be kept alive longer but the disease cannot be cured, what happens to the incidence and prevalence? Answer: nothing happens to incidence, but prevalence will increase as people live longer. In short-term diseases, such as the flu, incidence may be higher than prevalence, whereas in chronic diseases, such as diabetes mellitus, prevalence is greater than incidence.

2. An epidemic occurs when the observed incidence greatly exceeds the expected incidence.

Comparison of data:

1. Chi-squared test: used to compare percentages or proportions (nonnumeric data, also called nominal data)

2. t-test: used to compare two means

3. Analysis of variance (ANOVA): used to compare three or more means
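A rough sketch of the two most-tested statistics, written out by hand so the arithmetic is visible. The chi-squared version is the plain Pearson statistic for a 2 × 2 table (no continuity correction), whose 1-degree-of-freedom p-value can be obtained from `math.erfc`; the t statistic is Welch's version, which does not assume equal variances. Turning t into a p-value needs the t distribution, which the standard library lacks (SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` cover the t-test and one-way ANOVA). All data below are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, variance


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared for the table [[a, b], [c, d]] (no continuity correction).
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


def welch_t(x: list, y: list) -> float:
    """Two-sample t statistic without assuming equal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical trial: 30/50 improved on drug vs. 15/50 on placebo (proportions -> chi-squared).
chi2, p = chi_squared_2x2(30, 20, 15, 35)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")

# Hypothetical systolic pressures in two groups (means -> t-test).
print(f"t = {welch_t([120, 125, 130, 118, 122], [130, 135, 140, 128, 132]):.2f}")
```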

P-value: the board exam always contains one or more questions about the significance of the p-value. If someone tells you that p < 0.05 for a given set of data, then, if there were truly no difference, there would be less than a 5% chance (because 0.05 = 5%) of obtaining results at least this extreme by random error or chance alone. If p < 0.01, that chance is less than 1%. For example, if I tell you that the blood pressure in my controls is 180/100 mmHg but decreases to 120/70 mmHg after administration of drug X and that p < 0.10, there is less than a 10% chance that a difference this large was due to random error or chance. However, that still leaves a chance of up to (just under) 10% that the result is due to random error or chance, which is generally considered too high. For this reason, p < 0.05 is commonly used as the cut-off for statistical significance. Three points to remember: (1) the study may still have serious flaws, (2) a low p-value does not imply causation, and (3) a study that has statistical significance does not necessarily have clinical significance. For example, if I tell you that drug X can lower the blood pressure from 130/80 to 128/80 mmHg, p < 0.000000000000000001, you still would not use drug X.
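One way to make "chance alone" concrete is a permutation test: shuffle the group labels many times and count how often the shuffled groups differ at least as much as the real ones. This is a swapped-in illustration rather than a board-style calculation, and the blood pressure values are hypothetical.

```python
import random


def _mean(values: list) -> float:
    return sum(values) / len(values)


def permutation_p_value(group_a: list, group_b: list, n_shuffles: int = 10_000,
                        seed: int = 0) -> float:
    """Two-sided p-value: the fraction of random relabelings whose difference in means
    is at least as large as the observed difference (what chance alone produces)."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            as_extreme += 1
    return as_extreme / n_shuffles


controls = [178, 182, 180, 176, 184]  # hypothetical systolic pressures, mmHg
treated = [122, 118, 120, 124, 116]
print(permutation_p_value(controls, treated))  # roughly 0.008 (2 of 252 possible splits)
```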

The p-value also ties into the null hypothesis (the hypothesis of no difference). For example, in a drug study about hypertension, the null hypothesis is that the drug does not work; any difference in blood pressure is due to random error or chance. When the drug works beautifully and lowers the blood pressure by 60 points, I have to reject the null hypothesis, because clearly the drug works. When p < 0.05, I can confidently reject the null hypothesis, because a difference this large would occur less than 5% of the time if the null hypothesis were true; and if the null hypothesis is wrong, the difference in blood pressure is not due to chance and must be due to the new drug. In other words, the p-value represents the chance of making a type I error (claiming an effect or difference when none exists, i.e., rejecting the null hypothesis when it is true). If p < 0.07, there is less than a 7% chance that you are making a type I error if you claim a difference in blood pressure between control and experimental groups. A type II error is accepting the null hypothesis when it is false (the hypertension drug works, but you say that it does not).
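A small simulation makes the type I error idea concrete: if the drug truly does nothing and we test at p < 0.05 over and over, about 5% of experiments will still "find" a difference. This sketch assumes normally distributed blood pressures with a known SD and uses a z-test; all numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(x: list, y: list, sigma: float) -> float:
    """Two-sided p-value for a difference in means, assuming a known population SD."""
    se = sigma * sqrt(1 / len(x) + 1 / len(y))
    z = (sum(x) / len(x) - sum(y) / len(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)
alpha, trials, n, sigma = 0.05, 4000, 20, 15.0
false_positives = 0
for _ in range(trials):
    # The null hypothesis is TRUE here: both groups come from the same population.
    control = [rng.gauss(140, sigma) for _ in range(n)]
    drug = [rng.gauss(140, sigma) for _ in range(n)]
    if z_test_p_value(control, drug, sigma) < alpha:
        false_positives += 1  # type I error: a true null hypothesis was rejected

type_i_rate = false_positives / trials
print(f"type I error rate: {type_i_rate:.3f}")  # close to 0.05
```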

Power: probability of rejecting the null hypothesis when it is false (a good thing); power equals 1 minus the probability of a type II error. The best way to increase power is to increase sample size.
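To see why sample size matters, here is an approximate power calculation for a two-sample z-test with a known SD (a textbook normal approximation, not the only way to compute power). The 10-mmHg effect and 15-mmHg SD are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (known sigma),
    ignoring the negligible chance of rejecting in the wrong direction."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    se = sigma * sqrt(2 / n_per_group)             # SE of the difference in means
    return NormalDist().cdf(abs(delta) / se - z_crit)


for n in (10, 20, 50, 100):
    print(n, f"{approx_power(delta=10, sigma=15, n_per_group=n):.2f}")
```

With the same true effect, power climbs toward 1 as the groups get larger, which is the sense in which sample size is the main lever.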

Experimental conclusions and errors: the exam may give you data and the experimenter's conclusion and ask you to explain why the conclusion should not be drawn or to point out flaws in the experimental design:

1. Confounding variables: an unmeasured variable affects both the independent (manipulated, experimental variable) and dependent (outcome) variables. For example, an experimenter measures number of ashtrays owned and incidence of lung cancer and finds that people with lung cancer have more ashtrays. He concludes that ashtrays cause lung cancer. Smoking tobacco is the confounding variable, because it causes the increase in ashtrays and lung cancer.

2. Nonrandom or nonstratified sampling: city A and city B can be compared but may not be equivalent. For example, if city A is a retirement community and city B is a college town, of course city A will have higher rates of mortality and heart disease if the groups are not stratified into appropriate age-specific comparisons.

3. Nonresponse bias: people fail to return surveys or answer the phone for a phone survey. If nonresponse is a significant percentage of the results, the experiment will suffer. The first strategy is to visit or call the nonresponders repeatedly in an attempt to reach them and get their response. If this strategy is unsuccessful, list the nonresponders as unknown in the data analysis and see if any results can be salvaged. Never make up or assume responses!

4. Lead time bias: due to time differentials. The classic example is a cancer screening test that claims to have prolonged survival compared with old survival data, when in fact the difference in survival is due only to earlier detection, not to improved treatment or prolonged survival.

5. Admission rate bias: in comparing hospital A with hospital B for mortality due to myocardial infarction, you find that hospital A has a higher mortality rate. But this finding may be due to tougher hospital admission criteria at hospital A, which admits only the sickest patients with myocardial infarction and thus has higher mortality rates, although their care may be superior. The same bias can be found in a surgeon's mortality/morbidity rates if the surgeon takes only tough cases.

6. Recall bias: a risk in retrospective studies. When patients cannot remember, they may inadvertently over- or underestimate risk factors. For example, John died of lung cancer, and his angry wife remembers him as smoking "like a chimney," whereas Mike died of a non-smoking-related cause and his loving wife denies that he smoked "much." In fact, both men smoked 1 pack per day.

7. Interviewer bias: due to lack of blinding. A scientist gets big money to do a study and wants to find a difference between cases and controls. Thus, he or she inadvertently labels the same patient comment or outcome as "no significance" in controls and "serious difference" in treated cases.

8. Unacceptability bias: patients do not admit to embarrassing behavior or claim to exercise more than they do to please the interviewer, or they may claim to take experimental medications when they spit them out.
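Lead time bias (item 4) is pure arithmetic, so a tiny worked example helps. The ages below are hypothetical: screening finds the cancer 3 years earlier, but the patient dies at the same age either way.

```python
# Hypothetical ages (years) for one patient whose disease course is NOT changed by screening.
age_detectable_by_screening = 60   # screening would find the cancer here
age_symptoms = 63                  # without screening, diagnosis happens at symptom onset
age_at_death = 66                  # death occurs at this age either way

survival_without_screening = age_at_death - age_symptoms             # 3 years
survival_with_screening = age_at_death - age_detectable_by_screening  # 6 years
lead_time = age_symptoms - age_detectable_by_screening                # 3 years

print(f"survival from diagnosis: {survival_without_screening} vs {survival_with_screening} years")
print(f"apparent gain {survival_with_screening - survival_without_screening} years "
      f"= lead time {lead_time} years; age at death unchanged ({age_at_death})")
```

"Survival" doubled on paper, yet the patient did not live a single day longer.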
