We welcome the comments on our article, as we value the high-level academic discussion that gives us the opportunity to restate the underlying motivations of our work, which are the following:
- A) Contribute to the continuous improvement of Mexican research.
- B) Promote the use of checklists to improve the design, methodology, and reporting quality of national publications.
- C) Highlight the impact of manuscript quality on the clinician's judgment about the usefulness of diagnostic or therapeutic methods.
There is worldwide concern about the reproducibility, reliability, and validity of published research and the biases to which it is exposed, which has prompted the creation of different strategies to reduce them.1
One of these strategies, supported by the World Health Organization (WHO) and the Pan American Health Organization (PAHO) for its efficiency and low cost of implementation, is the use of guidelines for the development of research of higher methodological quality. Among them are CONSORT for randomized trials, STROBE for observational studies, and STARD for diagnostic accuracy studies, to mention some examples. These guidelines are grouped for easy reference on the web page of the EQUATOR (Enhancing the Quality and Transparency of Health Research) Network initiative.2
Our paper is guided by this spirit, and therefore its goal is “to compare the quality of the validation reports published and their risk of bias among the screening tests developed and validated in Mexico”.3 For this reason, our opinions are issued not on the comparative utility of the tests, which would invariably require an experimental design, but on the content of the validation reports, together with a judgment of the risk of bias of the presented data based on globally recognized, validated checklists.
We briefly respond to the comments:
- 1. It is mentioned: “In Table 1. General description of the screening tests compared in the study, in the column Aspects evaluated…”. In response, we thank the author for pointing out the error, since the column should say “Diverse aspects without grouping by development areas”.
- 2. Regarding the age range shown in the same table, it was obtained from page 138 of the validation article:4 “The INDIPCD-R were initially integrated in two formats: format “A” with 44 items (used with children from 0 to 2 years of age) and “B” with 41 items (for children 2 to 4 years of age)”. This is confusing, because six age ranges are described later, along with a standardization performed with a completely different sample from the one mentioned in the article. Therefore, in the comparative analysis, the age range column should say “From 6 to 48 months (assessed in six groups)”. We consider it important that the author correct the misunderstanding that gives rise to contradictions in her validation article.
- 3. Regarding the errors marked in Table 2, these data were taken verbatim from the analyzed paper,4 whose abstract states: “Materials and methods: transversal comparative study with 145 infants from a clinic and two childcare facilities (CENDI…”. Although the standardization of the scale with 347 children is mentioned in subsequent lines, that standardization is not the published validation and, therefore, is not the subject of review in our paper. Subsequently, we observed that the total sample to which the two tests were applied, and that was therefore analyzed for validation, was 83 children.4
- 4. Regarding the statement “The assertion established in Table 4: Survey for assessing risk of bias in QUADAS diagnostic accuracy studies, which mentions that the INDIPCD-R has a high risk of bias, is incorrect”,3 we disagree because the analyzed paper4 is confusing and contradictory: a validation sample of 145 children is mentioned, but it is then stated that the sensitivity and specificity data were obtained from only 83 children, and a standardization with 347 children, for which no data are published, is also suggested. This led us to affirm that there was a high risk of bias in the sample selection, based solely on the quality of the validation report. With this in mind, the only one of our statements that we agree may be doubtful is the one referring to the validity of the reference standard, since the PCD-R was validated against an entirely independent test, created by another group and widely recognized as a gold standard in the field.5
- 5. Regarding the question “INDIPCD-R cannot be compared against itself; the experts did not consider that the developmental scale Profile of Behaviors in Development (PCD-R, for its Spanish acronym), which was used as a gold standard, is independent from the INDIPCD-R”, we believe that the statement we made, “the INDIPCD-R has a high risk of bias on its index and gold standard test, as it is validated against itself”,3 is exaggerated and should be phrased as: “the INDIPCD-R has a high risk of bias on its index and gold standard test, as it is validated using a gold standard test with the same origin and proposed by the same authors”.
- 6. In response to the statement “We decided to use the PCD-R as a gold standard because it is a test that has validity and reliability studies in the Mexican population”, and regarding the comment on the example mentioning the Bayley as the gold standard for the validation of the BINS,6 we believe that the reputation of a research group does not exempt it from questioning, and that in both cases there is a clear conflict of interest, which is not bad per se but should have been reported.
- 7. Finally, it is affirmed that our paper uses the words “diagnosis” and “screening” interchangeably. This statement is unfounded, since the title specifies that the paper compares only screening tests, not diagnostic tests. The difference between these two concepts is very clear to the authors.
The use of the phrase “diagnostic accuracy validation studies report quality checklists” is entirely justified because, although screening instruments and diagnostic tools differ in their purpose, construct, development, and clinical use, there is no difference in their validation methodology. Thus, they report the same measures of agreement between the evaluated test and the gold standard used by the authors (sensitivity, specificity, negative predictive value, and positive predictive value) and are exposed to the same methodological errors and limitations that may induce bias.
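As a point of reference (these are the standard definitions, not specific to either family of tests), all four agreement measures derive identically from the 2×2 table of test results against the gold standard, where $TP$, $FP$, $FN$, and $TN$ denote true positives, false positives, false negatives, and true negatives:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\text{PPV} = \frac{TP}{TP + FP}, \qquad \text{NPV} = \frac{TN}{TN + FN}.$$

Because the validation of a screening test and of a diagnostic test both reduce to this same table, they are vulnerable to the same sources of bias, such as a poorly described sampling frame or a non-independent reference standard.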
We appreciate the careful reading of our article and, in a spirit of collaboration in which we all strive for quality research in Mexico, we hope that this letter prompts the author to correct in her paper the errors that led to the misunderstandings reflected in our comparison, so that the INDIPCD-R can achieve the recognition, impact, and use that it deserves.
Please cite this article as: Orcajo-Castelán R, Sidonio-Aguayo B. Respuesta a carta al editor: Análisis comparativo de pruebas de tamiz para la detección de problemas en el desarrollo diseñadas y validadas en México. Bol Med Hosp Infant Mex. 2016;73:292–294.