Designing Research
Survey Design, Part One
Maureen Giuffre, PhD, RN
Surveys are essential for discovering the incidence, distribution, and interrelationship of variables within a population. As such, it is important that we are able to critically read published surveys. This manuscript is the first of two dealing with survey research. Sampling and external validity are covered in this manuscript.
© 1997 by American Society of PeriAnesthesia Nurses.
Surveys are essential for discovering the incidence, distribution, and interrelationship of variables within a population. There are two large categories of surveys (and then many subcategories). These two categories, or types of research, are status surveys and sample surveys.
STATUS SURVEY
Status surveys are undertaken to discover the state of being of a particular variable in the population. The whole population is surveyed. The US census is a status survey. On a particular day, the census bureau does its best to establish how many men, women, and children there are in the United States. Reportable diseases also act as an ongoing status survey. If we cared to ask, the Centers for Disease Control could tell us exactly how many cases of leprosy were diagnosed in the United States in any given year (at least since its inception), where they were, and who had it. One major problem associated with status surveys that distinguishes them from sample surveys is that they can be wildly expensive (Table 1). If I want to know the exact status of exercise participation in the United States and attempt to find this out by asking each and every person, I had better be mighty wealthy and have a lot of time. Because few people have the resources required to conduct a status survey of a major population, most researchers settle for a sample survey.
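To make the contrast concrete, here is a minimal Python sketch. It assumes a hypothetical population coded True/False for a single trait (say, exercising regularly); the population size, the 30% figure, and the function names are illustrative assumptions, not data from this article. A status survey counts every member, while a sample survey examines a simple random sample drawn with Python's standard random module and treats the sample proportion as an estimate.

```python
import random

def status_survey(population, has_trait):
    """Status survey: look at every member, so the proportion is exact."""
    return sum(1 for member in population if has_trait(member)) / len(population)

def sample_survey(population, has_trait, n, rng):
    """Sample survey: look at a simple random sample of n members (each member
    equally likely to be drawn) and return the sample proportion as an
    estimate of the population proportion."""
    sample = rng.sample(population, n)
    return sum(1 for member in sample if has_trait(member)) / n

if __name__ == "__main__":
    # Hypothetical population of 10,000 people, 3,000 of whom exercise.
    population = [i < 3_000 for i in range(10_000)]
    rng = random.Random(1997)
    print("status survey:", status_survey(population, bool))  # 0.3, exact
    print("sample of 200:", sample_survey(population, bool, 200, rng))
```

The sample estimate will usually differ somewhat from 0.3; that gap is what the researcher must justify when generalizing. Setting n equal to the population size turns the sample survey back into a status survey.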
SAMPLE SURVEY
In a sample survey, a researcher selects a portion of the population (the sample) to study and then tries to make inferences about the whole population from the findings in the sample. The researcher is said to generalize from the sample to the population, or from the part to the whole. Generalizing is like making an educated guess, and it is the major issue that differentiates sample research from status research. In a status survey, the researcher does not need to generalize. She knows what the status of the particular variable is in the population. With a sample survey, the researcher knows what the situation is in the sample and must justify why the situation is probably the same in the population from which the sample was drawn. Generalizability is also called external validity.
EXTERNAL VALIDITY
A long time ago, when we discussed internal validity, we discovered that the term refers to how well an instrument measures what it purports to measure. External validity refers to how well the findings from the sample represent the probable situation in the population. If you want to make a statement about the incidence of a variable within a population, the best way to do that would be to study the whole population (a status survey). That not being possible, the next best thing would be to study a sample that represents the population. How that sample is derived will determine the study's external validity, or generalizability. This is the tricky part.

Maureen Giuffre, PhD, RN, is a Clinical Research Consultant in private practice in Salisbury, MD. Address correspondence to Maureen Giuffre, PhD, RN, 26361 High Banks Dr, Salisbury, MD 21801.
1089-9472/97/1204-0007$3.00/0
Journal of PeriAnesthesia Nursing, Vol 12, No 4 (August), 1997: pp 275-280

Table 1. Differences in Risks or Problems Associated With Status and Sample Survey Research
                           Status Survey    Sample Survey
Internal validity risks    X                X
Reliability risks          X                X
External validity risks                     X
Costs                      X

Let us say Mimi is interested in needle sticks. The best way she can think of to measure their incidence is by studying incident reports. What Mimi wants to know is how many needle sticks there were in her hospital (Nowheresville Regional Medical Center [NRMC]) in 1996, who received them, and what factors they were related to. Well, Mimi is a busy woman and a typical graduate student: job, children, husband, and so on. She really does not want to spend a lot of time on this project. Rather than look at all the incident reports for 1996, she would like to select a sample. What she has learned in her graduate research class is that in order to generalize from the sample to the population she must randomly draw that sample from the population. So Mimi says to herself, "I will randomly select 1 month from the whole year and study all the incident reports in that month. Because the month is randomly selected, I will be able to generalize to the whole year." She puts the names of the 12 months in a hat and draws one: September.

Will Mimi be able to generalize her findings to the whole population? Not very well. How often do we see samples taken from a convenient time of the year and generalized to the whole year? Although this way of selecting a sample is common, it is not random. Whether or not a particular incident report is included in the sample is not randomly determined; it depends entirely on the month in which the incident took place. This may not seem particularly important, but it is. The only way to generalize from a sample to a population is for the sample to be truly randomly drawn from the population. In this case Mimi would have to select a sample of needle stick incident reports from all of the needle stick incident reports for 1996. Unless she knows all of the important related variables (and presumably she does not, or she would not be studying the issue), even the smallest departure from true randomness may affect the findings in some imperceptible way. What Mimi does not know, because she has not adequately reviewed the literature, is that nursing students and interns are more likely to stick themselves than more experienced practitioners. So if the month chosen was one in which we might find an unusual number of nervous neophytes (July or September) handling sharp pointy objects, the incidence would be high, and the characteristics of those who stick themselves might differ from those that would appear if another month had been chosen.

Keep in mind that although I have shown you where this selection procedure fails to mimic randomization (ie, September is not necessarily just like any other month), this is not often evident. If the researcher selects a sampling process that is not truly random, this may affect the findings in some way that neither you nor the researcher knows. If the researcher wishes to say something about a particular phenomenon in a population, the sample must be drawn from that population. If it is a random sample, every member of that population must stand an equal chance of being selected. A researcher who wishes to study outcomes after gallbladder surgery in the United States must randomly select from all cholecystectomies in the US.
Obviously this is not feasible. Usually a particular hospital is selected. In this case the researcher can randomly sample from the cholecystectomies at that hospital, in which case he will be able to generalize to his population: the cholecystectomies at that hospital. Alternatively, the researcher can study all the cases at his hospital and then
will have no need to generalize to that population, because he will know the status of the phenomenon in his population. It is not acceptable to study the population of one hospital and then generalize to a different population. For example, Mary Jo studies music therapy in abdominal surgery patients in her hospital. As is typical, the sample is not randomly selected. She draws her conclusions. Then, in the limitations section, she says that one of the limitations of the study is that we now know how music therapy works in abdominal surgery patients, but we do not know how it works in orthopedic patients. Wrong. We only know how it works (or does not work) in the few abdominal surgery cases she studied. Because the sample was not randomly drawn from the population (at her hospital), she cannot even generalize to the population of patients at her hospital, let alone to all abdominal surgery patients. If the researcher has done a good job of randomly selecting his sample from his population, he can generalize to his population. He may then present data that compare his population with a larger population (say, that of the US). Typically these will be demographic data. It will then be up to you to decide whether his population is like your population and whether his results might be applicable to your situation. External validity does not usually have statistical tests associated with it; it is a judgment call. I will present one common mistake and then the same mistake with a twist. Let us use Mimi again. Let us say that there were 100 needle sticks at NRMC in 1996. Mimi could easily look at all of them. But for some reason graduate students have gotten it into their heads that there is something to be gained by randomly selecting from those 100. Presumably this comes from misunderstanding generalizability. If you have the potential of studying the entire population, there is nothing to be gained by sampling from that population.
There is nothing to be gained by looking at only part of the picture and speculating about what the whole picture looks like if you have the potential to look at the whole thing to begin with. Now for the common mistake with a twist. A number of years ago I reviewed a chart review study in which the researcher had the potential to look at
all of the charts of a certain diagnosis in two hospitals in the city of Boston. Rather than look at all of them, she randomly selected charts from these two hospitals. The common mistake would have been to believe that the findings were generalizable to (1) the world, (2) the United States, or (3) Boston. For some peculiar reason, the author stated that one of the limitations of the study was that the findings were only generalizable to New England. Now, I grew up in New England, and I do not remember a fence around the place; the people of Boston are certainly neither randomly selected from nor representative of the whole region. It may seem that I have dwelt unnecessarily long on this issue of generalizability, but it often appears to be an afterthought in the studies I receive to review. Inexperienced researchers seem quietly confused about it. To rephrase and reiterate: if you have a randomly drawn sample, you can generalize to the population from which the sample was drawn. If you have a sample that is anything other than randomly drawn, you in fact do not have a sample; you have a population. Your data come from a group of people who do not represent any larger group of people, so they are the end point. They are the population. Because sampling is critical to the value of the study, let us go over sampling techniques before we discuss other issues.
SAMPLING METHODS
Convenience Sample
Probably the most commonly encountered method of sampling in nursing research is convenience sampling, sometimes known as accidental sampling. This method of sampling is very subject to bias, because some patients are just not convenient to study. For example, the nurses on the same-day surgery unit want to survey patient satisfaction with nursing care. Polly is assigned to give the questionnaire to 100 available patients. Polly is just about to enter Mr. Smith's room when she glimpses a large, unshaven man with what seems to be a mouth full of incisors growling at the nurse. Oops, time for lunch. Too bad: when Polly gets back, Mr. Smith has signed himself out against medical advice, and she cannot get his assessment of the nursing care. When doing a survey with a convenience sample (or actually
a convenience population), sampling bias is not usually so obvious, but it is almost always there. If the interviewer is more comfortable talking with men, or young people, or people of a certain race, those people are more likely to be overrepresented in the results. Convenience sampling may in some cases be all that is possible. In these cases the researchers may produce interesting findings, but these serve as jumping-off points for further questioning. Survey findings from a convenience sample have no external generalizability. The authors should be clear about this, and the reader should clearly understand it.
Random
The goal of sampling is to get an accurate picture of the situation of interest in the population without having to spend the time or money necessary to study the entire population. To get an accurate picture of the population, all segments of the population must be represented, or at least must stand an equal chance of being represented, in the sample. All sampling techniques have the potential for bias, but some have more potential than others. True random selection has the least. For a sampling method to be truly random, all elements in the population must stand an equal chance of being selected, and no one individual's or item's chance of being included can be influenced by another individual or item. Let us say that you are interested in orthopedic surgery patients. Your study has some expensive follow-up, so you cannot study everyone. You decide to study every other patient admitted to the PACU, Monday through Friday. Is that random? In the purest sense, no. Whether or not a patient is selected depends on whether or not the patient before him was selected. There may be some scheduling issues you do not know about that could influence the order of patients. More obviously, patients who have surgery on the weekend may be different from patients who have surgery during the week, and under this rule they have no chance of being selected at all.
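The every-other rule can be contrasted with an independent random draw for each arrival in a short Python sketch. The arrival list and the function names are hypothetical. The sketch shows the dependence the text describes: even with a randomly chosen starting point, the every-other rule gives each patient a 50% chance of inclusion, but once one patient's status is known, every later patient's status is fixed. Independent draws have no such link.

```python
import random

def every_other(arrivals, take_first=True):
    """Systematic rule: include every second arrival, starting with the
    first (take_first=True) or the second (take_first=False)."""
    offset = 0 if take_first else 1
    return [patient for i, patient in enumerate(arrivals) if i % 2 == offset]

def every_other_random_start(arrivals, rng):
    """Same rule with a coin flip for the starting point: each patient now has
    a 50% chance, but consecutive decisions are still locked together."""
    return every_other(arrivals, take_first=rng.random() < 0.5)

def independent_draws(arrivals, rng, p=0.5):
    """Each arrival gets its own draw with inclusion probability p,
    unaffected by whether the previous arrival was included."""
    return [patient for patient in arrivals if rng.random() < p]

if __name__ == "__main__":
    rng = random.Random(7)
    arrivals = ["A", "B", "C", "D", "E", "F"]
    print(every_other(arrivals))  # ['A', 'C', 'E']
    print(every_other_random_start(arrivals, rng))
    print(independent_draws(arrivals, rng))  # any subset is possible
```

Under every_other_random_start, two adjacent arrivals are never both included; under independent_draws with p=0.5, they are both included about a quarter of the time. Neither function addresses the weekend problem: a rule applied only to weekday arrivals gives weekend patients no chance of selection.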
The most random method would be for the researcher to place two colored marbles in a hat, one for subject and one for non-subject. As each case arrived in the PACU, the researcher could reach into the hat and pull out a marble, deciding whether or not that patient would be included. The marbles would always be returned to the hat, so that when the next patient arrived both potential choices would be available. This would be selection with replacement. Each and every patient stands an equal chance of being included.

Selection without replacement is not a purist's approach to random sampling, but for most people and situations it is acceptable. In this case the first patient who entered the PACU would be selected or not selected depending on the marble pulled first. The next patient would be assigned based on the remaining marble. Once both marbles are used, both would be returned to the hat for the next two patients. In a purist sense, the first of every two patients is randomly assigned. The second one is not, but the chances for bias are diminished compared with a fixed every-other rule. Suppose that, for some reason or other, Dr Joe does not want his patients included in the study. If a simple system of assigning every other patient to the study is being used, it will not be too difficult for him to time his patients' arrival in the PACU so that he can keep them out of the study. But if selection without replacement is used, the opportunity to consciously or subconsciously influence the assignment is reduced.

Table 2. Make-Believe Table of Random Numbers
7842 5582
1272 5721
1953 5765
5349 3465
8249 7628
3491 3777
7463 5165 3249
5892 7692 8516
0849 4265 5498
8446 1217 2657
3419 7468 6879
9846 4751 8435
1654
9843
5126
7984
7516
8984
5167
6418
4987
4268
5854
9548

If a researcher intends to randomly assign, an alternative to colored marbles or envelopes is to use a table of random numbers. Table 2 is a table I made up that is probably not truly random, since my fingers seem to have a bias toward 2 and 7, but pretend it is a computer-generated table of random numbers. Just about every research textbook has one of these tables in the back. Let us say that the researcher wants to randomly assign patients to either a control group (group 1) or one of two treatment groups (2 or 3). (Whether the issue is random selection of a sample from a population or random assignment to groups once the sample is determined, the methods are the same.) If he used a table of random numbers, he would first decide on a system; any system will do as long as he applies it consistently. My system will be to go down each column of the table before moving to the next and to use the last digit of each number. The first number in the first column ends in 2, so my first patient is assigned to group 2. The second number also ends in 2, so the second patient is also assigned to group 2. The third patient is assigned to group 3. The next four numbers do not end in 1, 2, or 3 and therefore are passed over. This process is continued until all the patients are assigned. The only problem with a table of random numbers, or with any true randomization, is that there is no guarantee that the groups will have equal numbers in them.
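The two marble procedures and the random-number-table system described above can be sketched in Python. This is an illustration under stated assumptions: all function names are hypothetical, random_number_table is a computer-generated stand-in for a printed table, and assign_from_table follows the author's stated system (read down each column, use the last digit, skip any digit other than 1, 2, or 3).

```python
import random
from collections import Counter

SUBJECT, NON_SUBJECT = "subject", "non-subject"

def marbles_with_replacement(n_patients, rng):
    """One marble drawn per patient and always put back, so each patient
    independently has a 1-in-2 chance of inclusion."""
    return [rng.choice([SUBJECT, NON_SUBJECT]) for _ in range(n_patients)]

def marbles_without_replacement(n_patients, rng):
    """Both marbles are used, one per patient, before returning to the hat.
    Within each pair the first patient is random; the second gets the other."""
    decisions = []
    while len(decisions) < n_patients:
        hat = [SUBJECT, NON_SUBJECT]
        rng.shuffle(hat)  # the order in which the two marbles come out
        decisions.extend(hat)
    return decisions[:n_patients]

def assign_from_table(entries, n_patients, groups=(1, 2, 3)):
    """Use the last digit of each entry, in reading order, skipping digits
    that are not group labels; stop once n_patients are assigned."""
    assignments = []
    for entry in entries:
        digit = int(entry[-1])
        if digit in groups:
            assignments.append(digit)
            if len(assignments) == n_patients:
                break
    return assignments

def random_number_table(rows, cols, rng):
    """Computer-generated stand-in for a printed table of 4-digit numbers."""
    return [[f"{rng.randrange(10_000):04d}" for _ in range(cols)] for _ in range(rows)]

def column_order(table):
    """Flatten a table so entries are read down each column before the next."""
    return [row[c] for c in range(len(table[0])) for row in table]

if __name__ == "__main__":
    print(assign_from_table(["7842", "1272", "1953"], 3))  # [2, 2, 3], as in the text
    rng = random.Random(1997)
    print(marbles_with_replacement(6, rng))
    print(marbles_without_replacement(6, rng))
    table = random_number_table(60, 5, rng)
    print(Counter(assign_from_table(column_order(table), 30)))  # sizes usually unequal
```

The final line illustrates the limitation the text notes: with true randomization the three groups usually do not come out the same size.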
Stratified Random Sampling
One of the major risks of random sampling is that certain sections of the population will not be selected. The patient population of your unit may be 50% male and 50% female. You would like to do a survey that is representative of that distribution and collect data from 100 patients. A random sample might lead to the desired distribution, but it might not. An acceptable way of ensuring the distribution is to treat the different genders, or strata, as separate surveys, at least for data collection. You will have 50 surveys to randomly distribute to the women and 50 to distribute to the men. If you finish one gender before the other, that is acceptable. This method is more commonly used when a survey is attempted in a community. Most communities have neighborhoods with differing socioeconomic characteristics. It might be easier to find someone at home to answer the survey in the upper-middle-class or the poorest neighborhoods. Consequently, certain segments of the population would be underrepresented, even in a random sample. A stratified random sample would eliminate this inequity.
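A sketch of stratified random sampling for the 50/50 example, in Python: the population is split by the stratifying variable, and a separate simple random sample of the required size is drawn from each stratum. The patient records, the 300-patient unit, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, quotas, rng):
    """Draw a simple random sample of quotas[s] members from each stratum s.
    rng.sample raises ValueError if a quota exceeds the stratum's size."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for stratum, quota in quotas.items():
        sample.extend(rng.sample(strata[stratum], quota))
    return sample

def sex_of(patient):
    """Hypothetical record layout: (identifier, sex)."""
    return patient[1]

if __name__ == "__main__":
    # Hypothetical unit: 300 patients, half female and half male.
    patients = [(f"patient-{i}", "F" if i % 2 == 0 else "M") for i in range(300)]
    chosen = stratified_sample(patients, sex_of, {"F": 50, "M": 50}, random.Random(5))
    print(len(chosen), Counter(sex_of(p) for p in chosen))  # 100 patients, 50 of each sex
```

The same function covers the community example if stratum_of returns a neighborhood instead of sex.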
Cluster Sampling
Let us say you would like to survey PACU nurses across the country. Quickly you realize that you have neither the time nor the money. But a representative sample is really important. The first thing you might do is randomly select five states; from there you would have to obtain the names of all the hospitals that had PACUs in
those five states. You could then randomly select three hospitals from each of the five states. After obtaining a list of the names of the nurses in each PACU in each hospital, you could randomly select from each of those lists. The resulting sample would be the outcome of successive, or cluster, samplings. You may wonder how this is different from Mimi first selecting a month and then sampling from that month. Good point, because it is not. In Mimi's case it was obvious why all the months might not be the same. When one chooses to do cluster sampling, the risk of selecting a cluster that is not like the population is always there. When using cluster sampling, the author must make an effort to determine whether the clustering technique might in any way result in a sample that differs from a random sample selected from the whole. It is then up to the reader to decide whether or not she accepts the argument and therefore accepts the findings. There are many variations on these themes when it comes to selecting a sample. If you intend to conduct a survey, your goal will be to get a sample that is as free from bias as possible and as representative of the population as you can make it. Specialty textbooks should be consulted before starting. The last point I wish to discuss, one that is important to the interpretation of the findings, is sample size.
SIZE OF SAMPLE
It turns out Mimi has a friend who is a researcher, and she gets clued in to the randomization issue. So Mimi decides to randomly select from the whole population of incident reports. The next question related to generalizability is whether the sample is large enough to represent the population adequately. The closer the sample size gets to the population size (in other words, the more like a status survey it becomes), the more accurate the generalizations from the sample will be. Let us look at an example. Table 3 contains the times it took 35 runners to complete the 1995 Wachapreague Half Marathon. These data will understate the point, because on this particular variable the participants are, by necessity, remarkably alike: these people trained for this distance. If we took the first 35 people we met on the street and informed them that tomorrow they would be running 13 miles, the resulting times would vary a great deal more. But even with this limitation, I think you will get the point.

Table 3. Run Times (minutes) of 35 Participants in the 1995 Wachapreague Half Marathon
Sample   Individual times            Sample average   Cumulative average
1        98   116  110  119  123     113.2
2        106  99   109  130  134     115.6            114.4
3        108  112  83   98   111     102.4            110.4
4        95   104  92   88   102     96.2             106.9
5        130  114  115  113  94      113.2            108.1
6        108  125  105  116  100     110.8            108.6
7        111  103  117  104  79      102.8            107.7

I have broken the population of 35 runners into seven samples of five runners. The mean running time for this population of 35 runners is 107.7 minutes. As you can see, none of the sample averages is the same as the population average. But as the sample size increases, approximated by the cumulative averages on the far right, the sample average better represents the population average. How big is big enough? That is hard to say. In our example, by the time we had included 25 subjects we were within half a minute of the population average, as close as we were going to get without including everyone. That is over 70%! And there is relatively little variance in this population. The relationship between the variability of the data and the number of subjects needed would be clearer if the variable were weight. In the above example, if we had used weight, the variability would again be fairly small, because runners are not usually obese. But if we were looking at weight in the nonexercising community, the variability would be huge. One person weighing 300 lb (unfortunately not unheard of) would distort any small sample. There is no clear answer to how many is enough. But the larger the portion of the population that is included in the sample, the closer the sample finding will probably be to the true result in the population.
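The averages in Table 3 can be recomputed with a short Python sketch that also shows how far each cumulative average sits from the population mean (the average of all 35 times). One assumption: the first time in sample 4 is printed as "9B" in our copy and is taken here as 95, the only value consistent with that sample's printed average of 96.2. The names SAMPLES, sample_averages, and cumulative_averages are illustrative.

```python
from statistics import mean

# Table 3 run times (minutes), grouped as seven samples of five runners.
# Sample 4's first value is assumed to be 95 (printed as "9B").
SAMPLES = [
    [98, 116, 110, 119, 123],
    [106, 99, 109, 130, 134],
    [108, 112, 83, 98, 111],
    [95, 104, 92, 88, 102],
    [130, 114, 115, 113, 94],
    [108, 125, 105, 116, 100],
    [111, 103, 117, 104, 79],
]

def sample_averages(samples):
    """Mean of each five-runner sample."""
    return [mean(s) for s in samples]

def cumulative_averages(samples):
    """Mean of every runner included so far, after adding each sample in turn."""
    pooled, result = [], []
    for s in samples:
        pooled.extend(s)
        result.append(mean(pooled))
    return result

if __name__ == "__main__":
    population_mean = mean(t for s in SAMPLES for t in s)
    pairs = zip(sample_averages(SAMPLES), cumulative_averages(SAMPLES))
    for k, (avg, cum) in enumerate(pairs, start=1):
        gap = cum - population_mean
        print(f"n={5 * k:2d}  sample avg={avg:6.1f}  cumulative={cum:6.1f}  gap={gap:+.1f}")
```

The gap column is not guaranteed to shrink at every step; it only tends toward zero as the sample approaches the whole population, reaching exactly zero at n=35.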