The way the sample is selected and its size calculated is crucial for the generalisability of the results of our research. A poor choice of sampling technique and/or a miscalculated sample size can restrict the results to the participants actually included in our study. Since the entire target population is practically inaccessible and cannot be studied, we must select a sample that allows us to infer, extrapolate and generalise our results to the reference population (the more accessible population delimited by the inclusion and exclusion criteria defined by the researcher). This sample must be representative of that population for the results of our study to have external validity, and it must also be of an adequate size: large enough to represent the reference population, yet small enough to keep its analysis manageable.1–5 Therefore, the representativeness of our sample will depend on both the sampling technique used to select it (probability or non-probability) and the size of the sample.
Probabilistic sampling techniques
In these techniques, every member of the reference population has a known, non-zero probability of being included in the sample. This avoids possible researcher bias in selecting the sample, so the selected sample tends to be more representative of the reference population. Another advantage is that these techniques allow statistical methods to be applied that quantify the random error introduced by chance when selecting the sample.3–5 However, chance itself may cause the distribution of a variable in our sample to differ from its distribution in the reference population.6
The main probability sampling techniques are:3–5
- Simple random sampling: participants are selected at random using random number tables or software (freely available on the internet), so that everyone has the same probability of being selected. It is the quickest and easiest method and, because only chance is involved, it tends to yield more representative samples. However, it requires a list of the entire reference population, so it is rarely used unless the reference population is small.
- Stratified random sampling: this variant of the previous technique is used when the variable we wish to study is not distributed homogeneously within the reference population but differs across mutually exclusive groups or strata. Its aim is to ensure that the sample reproduces the distribution of that variable in the reference population. It is recommended that the strata be defined according to a confounding variable that may influence the results. A random sample is then selected from each stratum.
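- Systematic random sampling: the first participant is chosen at random from among the first k on the list, and the following participants are selected by repeatedly adding a previously defined sampling constant (k) until the sample size is reached. The constant is usually obtained by dividing the size of the reference population by the required sample size.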
- Multi-stage sampling: when the reference population is very large or dispersed and no complete list of it is available, sampling units (primary units) are first selected from the reference population and, in subsequent stages, samples are selected from within each previously selected unit (secondary units). As many stages as necessary can be used, and a different probability sampling method (simple, stratified or systematic) can be applied at each stage. If all the secondary units of the selected primary units are included, the technique is known as cluster sampling. Thus, although we do not have a list of the entire reference population, we only need a list of its groups or clusters.
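The four probability techniques above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not the procedure of any particular software: the sampling frame (1,000 patient IDs), the 20 wards used as clusters and the age split used as a stratum are all hypothetical, and stratified sampling uses proportional allocation, which is only one of several possible allocation rules.

```python
import random


def simple_random_sample(frame, n, rng):
    """Simple random sampling: every unit in the list has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic sampling: random start among the first k units, then every k-th unit."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n                # sampling constant (interval)
    start = rng.randrange(k)           # random start in 0 .. k-1
    return frame[start::k][:n]


def stratified_sample(frame, stratum_of, n, rng):
    """Stratified sampling, proportional allocation: a random sample from each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(frame))   # this stratum's share of n
        sample.extend(rng.sample(members, n_h))
    return sample


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Multi-stage sampling: random primary units, then random units within each.

    With n_per_cluster=None every unit of the chosen clusters is kept (cluster sampling).
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for c in chosen:
        units = clusters[c]
        sample.extend(units if n_per_cluster is None else rng.sample(units, n_per_cluster))
    return sample


def is_older(pid):
    # Hypothetical confounder: the first 600 IDs are patients aged 65 or over.
    return pid < 600


# Hypothetical frame: 1,000 patient IDs spread over 20 wards (ward = ID mod 20).
rng = random.Random(2022)
patients = list(range(1000))
wards = {w: [p for p in patients if p % 20 == w] for w in range(20)}

srs = simple_random_sample(patients, 100, rng)
sys_sample = systematic_sample(patients, 100, rng)
strat = stratified_sample(patients, is_older, 100, rng)
clus = two_stage_sample(wards, 4, None, rng)      # all 50 patients of 4 wards
multi = two_stage_sample(wards, 4, 10, rng)       # 10 patients from each of 4 wards
```

With proportional allocation, rounding can make the stratum sizes add up to slightly more or less than n; in this example the strata (600 and 400 patients) give exactly 60 + 40.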
In general, we will choose a probability sampling technique when the reference population is sufficiently accessible and well differentiated before starting our study.3–5 But once we have opted for probability sampling, which technique should we choose? If the reference population is very large, dispersed and grouped by some characteristic, we will choose a multi-stage sampling technique; within it, if we decide to include all the units of the selected groups or clusters, we will choose cluster sampling. If this is not the case and we are interested in controlling the distribution of some confounding variable, stratified sampling is more convenient. Finally, if we are not interested in controlling for any confounding variable, and the reference population is small and adequately enumerated in a list, a simple random or systematic sampling technique is the best choice.3
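The decision sequence in the previous paragraph can be written as a small function. This is only a sketch: the four boolean flags are hypothetical inputs summarising the researcher's judgement, and cluster sampling is treated as the multi-stage variant in which every unit of the selected clusters is included, following the definition given for multi-stage sampling.

```python
def choose_probability_technique(large_dispersed_grouped: bool,
                                 take_all_units_in_clusters: bool,
                                 control_confounder: bool,
                                 small_and_listed: bool) -> str:
    """Suggest a probability sampling technique from the decision rules (sketch)."""
    if large_dispersed_grouped:
        # Multi-stage sampling; keeping every unit of each chosen cluster = cluster sampling.
        return "cluster" if take_all_units_in_clusters else "multi-stage"
    if control_confounder:
        return "stratified"
    if small_and_listed:
        return "simple random or systematic"
    # None of the rules applies: the sampling frame needs to be reconsidered.
    return "no clear choice: reconsider the sampling frame"


# Hypothetical case: a small, fully listed population with no confounder to control.
print(choose_probability_technique(False, False, False, True))
```

The order of the checks matters: a large, dispersed population is handled first because, without a complete list, neither stratified nor simple random sampling can be applied directly.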
Non-probabilistic sampling techniques
If, on the other hand, the reference population is not easily accessible and not sufficiently differentiated, non-probability sampling techniques are the most convenient.3–5 In these techniques, the probability of each participant being included in the sample is unknown: participants are selected by methods that do not involve chance, so random error cannot be calculated.3–5 Participants are therefore selected largely on the basis of the researcher's judgement, on the assumption that the selected samples are free of bias and representative of the reference population.3–5
The most common non-probabilistic techniques are:3–5
- Consecutive sampling: this is the most commonly used technique, especially in clinical trials. It consists of selecting every participant who meets our selection criteria throughout the study's recruitment period. It is typically used to recruit patients who attend the clinic, or are diagnosed or admitted, within a given time period.
- Convenience, accidental or chance sampling: participants are selected because they are easily accessible to the researcher or because they volunteer to participate. The researcher thus chooses participants based on their availability (proximity, friendship, etc.). Because the risk of obtaining a biased sample is high, this technique is advisable only when the distribution of the variable under study is sufficiently homogeneous within the reference population.
- Purposive or intentional sampling: the researcher selects the participants he/she believes can contribute most to the study. This avoids missing important participants who might be left out by a random or convenience technique. It is mainly used in qualitative studies or when a sample of experts is required.
- Quota sampling: first, the composition of the reference population is determined according to a characteristic or variable (frequently sex or age); then the quota, or number of participants with each value of that characteristic, is set. Participants are recruited until each of these quotas is complete.
- Avalanche, snowball or chain sampling: this technique, used mainly in qualitative studies, is particularly useful and efficient when participants are difficult to reach, and in that situation it is more practical than convenience sampling. A first participant who meets the selection criteria is selected and asked to refer the researcher to other potential participants, who are in turn asked to do the same, until a sufficient sample is obtained.
- Theoretical sampling: this technique is mainly used in qualitative studies whose theoretical framework is grounded theory. Participants are selected gradually, with the aim of capturing all possible meanings in order to develop a theory.7
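Quota sampling combines a fixed composition with non-random recruitment, which a short sketch makes concrete. The example below assumes, hypothetically, a reference population that is 60% men and 40% women and a stream of consecutive admissions; participants are taken in order of arrival, without any randomisation, until each quota is full. The function names and data are illustrative only.

```python
def quotas_from_composition(n, composition):
    """Quota per group from the population's composition (shares summing to 1)."""
    return {group: round(n * share) for group, share in composition.items()}


def quota_sample(arrivals, group_of, quotas):
    """Quota sampling: take participants as they arrive until every quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:   # group still has places left
            sample.append(person)
            remaining[group] -= 1
            if not any(remaining.values()):
                break                     # all quotas complete
    return sample


# Hypothetical composition of the reference population and required sample of 50.
quotas = quotas_from_composition(50, {"M": 0.6, "F": 0.4})
# Hypothetical stream of consecutive admissions as (id, sex) pairs.
arrivals = [(i, "F" if i % 3 == 0 else "M") for i in range(200)]
sample = quota_sample(arrivals, lambda person: person[1], quotas)
print(quotas, len(sample))
```

Because recruitment follows arrival order, the sample matches the population's sex distribution but may still be biased in any characteristic not controlled by the quotas, which is why the technique remains non-probabilistic.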
By calculating the sample size, we aim to define an approximate number of participants that need to be included in the sample in order for it to be representative of the reference population.3,4 If, on the one hand, we include an insufficient number of subjects, we run the risk of not finding significant differences when in fact they do exist (type II error or β). On the other hand, if we include too many participants, we will be wasting time and resources in our research.1,2,8,9
It should be noted that it is generally not necessary to calculate the sample size in qualitative studies, since the main aim is to achieve information saturation, which occurs when the information collected becomes redundant, and no new information is collected from the study participants.10
However, in quantitative studies this calculation must be performed very carefully, as the design of the study depends on it (e.g., whether the recruitment period needs to be extended to reach the calculated size).3,4 Several formulas are used for this purpose; they can be cumbersome and depend on the statistical test to be used in the study. Fortunately, freely available tables and software facilitate the calculation from the estimated parameters. Some of these epidemiological calculators are available online (such as GRANMO or Powerandsamplesize.com), others are free software that can be downloaded to a personal computer (such as Epidat or G*Power), and others are even mobile applications (such as n4Studies).
Depending on the objective of our research, the sample size can be determined for one of two purposes:3,4
- Estimating population parameters: researchers aim to estimate the value of a parameter in the reference population from the values collected in the sample. These parameters are statistically inferred and may be proportions (e.g., the percentage of critically ill patients who develop a given complication) or means (e.g., the mean of a physiological variable measured in critically ill patients). To estimate them, investigators must determine the following values:
  - The variability of the estimated parameter: this is usually unknown, so the researcher must approximate it by carrying out a pilot study or by taking data from previous research.
  - The precision of the estimate (i): this is determined by the width of the confidence interval (CI); the narrower the interval, the greater the precision and the larger the sample size required.
  - The confidence level (1 − α) of the estimate: as a rule, it is set at a minimum of 95% (α = 0.05). The higher the desired confidence level, the lower the value of α and the higher the corresponding value of Z, so a larger sample will be needed.
  - To calculate the sample size in these cases, only the variability of the parameter under investigation needs to be known, as both the precision and the confidence level are set by the researcher according to the aims of the study.
- Hypothesis testing: researchers aim to evaluate the results obtained against previously established hypotheses (e.g., to assess which of two nursing interventions is more effective in critically ill patients). This type of sample size calculation is therefore applied mainly in clinical trials. For this purpose, researchers can test whether the proportions or means obtained differ according to the intervention applied. In this case, researchers need to determine the following values:
  - Direction of the alternative hypothesis (one-sided or two-sided): in general, a two-sided hypothesis is recommended, as it is more conservative.
  - Accepted risk of committing a type I error (α): i.e., of rejecting the null hypothesis when it should not have been rejected because it is true in the population. A risk of 5% (α = 0.05) is generally accepted.
  - Accepted risk of committing a type II error (β): i.e., of not rejecting the null hypothesis when it should have been rejected because it is false in the population. It is generally set between 5% and 20%. However, this decision is easier to make in terms of statistical power (1 − β): accepting a β of 20% means that our study has an 80% chance of detecting the difference if it really exists.
  - Magnitude of the expected difference, effect or association: the estimate of what we expect to obtain in our research should be realistic and based on previously conducted studies.
  - Variability of the response variable in the reference population: this should be approximated from the existing literature and previous research.
  - Of these five values, only the last one needs to be known in order to calculate the sample size, as all the others are set by the researcher according to the aims of the study.
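The values listed above feed into closed-form formulas of the kind implemented in the calculators mentioned earlier. The sketch below uses the usual normal-approximation formulas; it is an assumption that any given calculator applies exactly these (GRANMO, Epidat and G*Power may use corrections or t-based methods and give slightly different figures). It takes the precision i as the half-width of the CI (the ± margin), assumes two equal-sized groups with a common standard deviation for the comparison of means, and uses hypothetical function names and inputs.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def _round_up(x):
    # Round up to whole participants; round() first so floating-point noise
    # such as 70.00000000000001 does not add a spurious participant.
    return math.ceil(round(x, 9))


def n_estimate_proportion(p, i, alpha=0.05):
    """Participants needed to estimate a proportion p to within +/- i."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _round_up(z**2 * p * (1 - p) / i**2)


def n_estimate_mean(sd, i, alpha=0.05):
    """Participants needed to estimate a mean with standard deviation sd to within +/- i."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _round_up(z**2 * sd**2 / i**2)


def n_compare_proportions(p1, p2, alpha=0.05, power=0.80, sides=2):
    """Participants per group to detect proportions p1 vs p2 (two equal groups)."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / sides)   # type I error, one- or two-sided
    z_b = _STD_NORMAL.inv_cdf(power)               # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return _round_up(numerator / (p1 - p2) ** 2)


def n_compare_means(sd, d, alpha=0.05, power=0.80, sides=2):
    """Participants per group to detect a difference d between two means (common sd)."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / sides)
    z_b = _STD_NORMAL.inv_cdf(power)
    return _round_up(2 * sd**2 * (z_a + z_b) ** 2 / d**2)


# Hypothetical inputs, for illustration only.
print(n_estimate_proportion(p=0.5, i=0.05))      # p = 0.5 gives maximum variability
print(n_estimate_mean(sd=10, i=2))
print(n_compare_proportions(p1=0.40, p2=0.25))   # per group
print(n_compare_means(sd=10, d=5))               # per group
```

With these inputs the four calls give 385, 97, 152 and 63 participants. When the proportion to be estimated is unknown, p = 0.5 is a common conservative choice because p(1 − p) is largest there. Raising the confidence level, the power, or the required precision always increases n, as described in the list above.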
Loss-adjusted sample size
Finally, all of the above calculations should be adjusted for the losses that may occur during the conduct of our research, so that the study still ends with the calculated sample. For this purpose, the expected proportion of losses (R) is defined and the formula Na = N [1/(1 − R)] is applied, where N is the theoretical number of participants without losses and Na is the adjusted number of participants.3
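The loss adjustment is a one-line calculation; the sketch below simply applies Na = N [1/(1 − R)] and rounds up to whole participants. The function name and the 15% loss rate are hypothetical.

```python
import math


def loss_adjusted(n, r):
    """Na = N * [1 / (1 - R)], rounded up to whole participants.

    n: theoretical number of participants without losses (N)
    r: expected proportion of losses (R), e.g. 0.15 for 15%
    """
    if not 0 <= r < 1:
        raise ValueError("R must be in the range [0, 1)")
    # round() first so floating-point noise does not add a spurious participant
    return math.ceil(round(n / (1 - r), 9))


print(loss_adjusted(100, 0.15))  # 100 / 0.85 = 117.6..., rounded up to 118
```

Dividing by (1 − R) matters: multiplying N by (1 + R) instead would give only 115 participants, and after 15% losses the study would end with about 98 (115 × 0.85 = 97.75), short of the required 100.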
Financing
The author has no source of funding to declare.
Conflict of interests
The author has no conflict of interests to declare.
Please cite this article as: Arrogante O. Técnicas de muestreo y cálculo del tamaño muestral: Cómo y cuántos participantes debo seleccionar para mi investigación. Enferm Intensiva. 2022;33:44–47.