NEJM Guidelines on Statistics
Welcome and thank you for considering the New England Journal of Medicine (NEJM) as a venue for your work. As the oldest continuously published medical journal, our mission since 1812 has been to bring to physicians the best research at the intersection of biomedical science and clinical practice. We are interested in publishing original research that is destined to change clinical practice and teaches
something new about the biology of disease. In addition to original research, NEJM publishes reviews, cases, commentary, and other content that is of interest to medical professionals.
The dedication of generations of researchers, authors, reviewers, and physician editors has made NEJM the most widely read and respected medical journal and website in the world.
Submitting to NEJM
NEJM uses highly rigorous editorial, peer, and statistical review processes to evaluate manuscripts for scientific accuracy, novelty, and importance.
Step 1: Acquaint yourself with NEJM Editorial Policies, Article Types, Presubmission Options for Presubmission Inquiry and Rapid Review, Statistical Reporting Guidelines (if applicable), and key NEJM Style Elements
Step 2: Prepare Materials for Submission including cover letter (optional), main text, tables, figures, supplementary appendix, clinical trial protocol and statistical analysis plan (if applicable)
Step 3: Submit your manuscript by clicking on the red button at the top left-hand side of this page.
Presubmission Options
Most NEJM article types are both solicited (invited by NEJM editors) and unsolicited (submitted at author discretion via the NEJM online manuscript submission system).
PRESUBMISSION INQUIRY
Authors unsure of the suitability of their manuscript for publication may save considerable time and effort by sending a Presubmission Inquiry, to which NEJM editors will endeavor to respond by email within one (1) week.
INVITED/COMMISSIONED ARTICLES
Certain article types, including reviews and editorials, are usually solicited by NEJM editors in advance of submission. However, authors interested in proposing ideas for these article types may also send Presubmission Inquiries.
RAPID REVIEW
NEJM will consider requests for accelerated manuscript Rapid Review, especially when research results:
• Deal with urgent public health concerns
• Have potential to dramatically change clinical practice or to affect mortality
• Are timed to imminent meeting presentations
Approval for Rapid Review does not guarantee acceptance of the manuscript, nor does it guarantee expedited publication if the manuscript is accepted. Each of these decisions is made separately. NEJM strives to reply to Rapid Review requests within three (3) business days.
Statistical Reporting Guidelines
Our Statistical Consultants recommend the following best statistical practices in manuscripts submitted to the Journal. We recommend that you follow them in the design and reporting of research studies.
For all studies:
• The Methods section of all manuscripts should contain a brief description of sample size and power considerations for the study, as well as a brief description of the methods for primary and secondary analyses.
• The Methods section of all manuscripts should include a description of how missing data have been handled. Unless missingness is rare, a complete case analysis is generally not acceptable as the primary analysis and should be replaced by methods that are appropriate, given the missingness mechanism. Multiple imputation or inverse probability case weights can be used when data are missing at random; model-based methods may be more appropriate when missingness may be informative. For the Journal's general approach to the handling of missing data in clinical trials please see Ware et al (N Engl J Med 2012;367:1353-1354). (A minimal imputation sketch follows this list.)
• Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.
• Unless one-sided tests are required by study design, such as in noninferiority clinical trials, all reported P values should be two-sided. In general, P values larger than 0.01 should be reported to two decimal places, and those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P<0.001. Notable exceptions to this policy include P values arising from tests associated with stopping rules in clinical trials or from genome-wide association studies. (A formatting sketch covering these rounding rules follows this list.)
• Results should be presented with no more precision than is of scientific value and is meaningful given the available sample size. For example, measures of association, such as odds ratios, should ordinarily be reported to two significant digits. Results derived from models should be limited to the appropriate number of significant digits.
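To make the missing-data recommendation concrete, here is a minimal multiple-imputation sketch under a missing-at-random assumption, using scikit-learn's IterativeImputer. The data, the 10% missingness rate, and the choice of five imputations are invented for illustration; a real analysis would fit the prespecified model on each completed dataset and pool estimates (e.g., by Rubin's rules).

```python
# Hypothetical multiple-imputation sketch (missing-at-random assumed).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # stand-in for the analysis variables
X[rng.random(X.shape) < 0.10] = np.nan   # ~10% of values set missing

# Draw several completed datasets; sample_posterior=True adds the
# between-imputation variability that multiple imputation relies on.
completed = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    for m in range(5)
]
# Next step (not shown): fit the prespecified model on each completed
# dataset and pool the estimates and standard errors across imputations.
```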
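The P value rounding rules and the two-significant-digit guidance above translate directly into a small helper; the function names are ours, not NEJM's, and the sketch assumes two-sided P values as inputs.

```python
from math import floor, log10

def format_p(p: float) -> str:
    """Render a P value per the rounding guidance above."""
    if p < 0.001:
        return "P<0.001"
    if p <= 0.01:                  # between 0.001 and 0.01: three decimals
        return f"P={p:.3f}"
    return f"P={p:.2f}"            # larger than 0.01: two decimals

def two_significant_digits(x: float) -> float:
    """Round an effect measure (e.g., an odds ratio) to two significant digits."""
    return round(x, 1 - int(floor(log10(abs(x)))))

print(format_p(0.2537))                # P=0.25
print(format_p(0.0042))                # P=0.004
print(format_p(0.00008))               # P<0.001
print(two_significant_digits(1.2345))  # 1.2
```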
For clinical trials:
• Original and final protocols and statistical analysis plans (SAPs) should be submitted along with the manuscript, as well as a table of amendments made to the protocol and SAP indicating the date of the change and its content.
• The analyses of the primary outcome in manuscripts reporting results of clinical trials should match the analyses prespecified in the original protocol, except in unusual circumstances. Analyses that do not conform to the protocol should be justified in the Methods section of the manuscript. The editors may ask for additional analyses that are not specified in the protocol.
• When comparing outcomes in two or more groups in confirmatory analyses, investigators should use the testing procedures specified in the protocol and SAP to control overall type I error - for example, Bonferroni adjustments or prespecified hierarchical procedures. P values adjusted for multiplicity should be reported when appropriate and labeled as such in the manuscript. In hierarchical testing procedures, P values should be reported only until the last comparison for which the P value was statistically significant. P values for the first nonsignificant comparison and for all comparisons thereafter should not be reported. For prespecified exploratory analyses, investigators should use methods for controlling false discovery rate described in the SAP - for example, Benjamini-Hochberg procedures. (Sketches of these adjustments and of the hierarchical reporting rule follow this list.)
• When no method to adjust for multiplicity of inferences or controlling false discovery rate was specified in the protocol or SAP of a clinical trial, the report of all secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
• Please see Wang et al (N Engl J Med 2007;357:2189-2194) on recommended methods for analyzing subgroups. When the SAP prespecifies an analysis of certain subgroups, that analysis should conform to the method described in the SAP. If the study team believes a post hoc analysis of subgroups is important, the rationale for conducting that analysis should be stated. Post hoc analyses should be clearly labeled as post hoc in the manuscript.
• Forest plots are often used to present results from an analysis of the consistency of a treatment effect across subgroups of factors of interest. Such plots can be a useful display of estimated treatment effects across subgroups, and the editors recommend that they be included for important subgroups. If subgroups are small, however, formal inferences about the homogeneity of treatment effects may not be feasible. A list of P values for treatment-by-subgroup interactions is subject to the problems of multiplicity and has limited value for inference. Therefore, in most cases, no P values for interaction should be provided in the forest plots. (A plotting sketch follows this list.)
• If significance tests of safety outcomes (when not primary outcomes) are reported along with the treatment-specific estimates, no adjustment for multiplicity is necessary. Because information contained in the safety endpoints may signal problems within specific organ classes, the editors believe that the type I error rates larger than 0.05 are acceptable. Editors may request that P values be reported for comparisons of the frequency of adverse events among treatment groups, regardless of whether such comparisons were prespecified in the SAP.
• When possible, the editors prefer that absolute event counts or rates be reported before relative risks or hazard ratios. The goal is to provide the reader with both the actual event frequency and the relative frequency. Odds ratios should be avoided, as they may overestimate the relative risks in many settings and be misinterpreted. (A numeric illustration follows this list.)
• Authors should provide a flow diagram in CONSORT format. The editors also encourage authors to submit all the relevant information included in the CONSORT checklist. Although all of this information may not be published with the manuscript, it should be provided in either the manuscript or a supplementary appendix at the time of submission. The CONSORT statement, checklist, and flow diagram are available on the CONSORT website.
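A minimal sketch of the multiplicity adjustments named above (Bonferroni for confirmatory families, Benjamini-Hochberg for prespecified exploratory analyses), using statsmodels; the P values are hypothetical. The last lines illustrate the matching confidence-level adjustment called for under "For all studies."

```python
# Hypothetical multiplicity adjustments for a family of comparisons.
from statsmodels.stats.multitest import multipletests

p_values = [0.004, 0.020, 0.031, 0.380]

# Family-wise error control for confirmatory analyses (Bonferroni):
reject_fwer, p_bonferroni, _, _ = multipletests(p_values, alpha=0.05,
                                                method="bonferroni")

# False-discovery-rate control for prespecified exploratory analyses
# (Benjamini-Hochberg):
reject_fdr, p_bh, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

# Matching CI adjustment: testing each of k comparisons at alpha/k pairs
# with 100*(1 - alpha/k)% confidence intervals (98.75% here, with k = 4).
k = len(p_values)
matched_ci_level = 1 - 0.05 / k
```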
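The hierarchical reporting rule (report P values only down to the last statistically significant comparison) can be expressed as a short helper. The endpoint names, ordering, and values below are invented.

```python
# Hypothetical prespecified testing hierarchy: test in order, stop at the
# first nonsignificant comparison, and report only the P values before it.
def hierarchical_report(ordered_tests, alpha=0.05):
    reportable = []
    for name, p in ordered_tests:
        if p >= alpha:
            break        # this P value and all later ones are not reported
        reportable.append((name, p))
    return reportable

hierarchy = [("primary endpoint", 0.003),
             ("key secondary A", 0.021),
             ("key secondary B", 0.240),   # first nonsignificant: stop here
             ("key secondary C", 0.008)]   # not reported, even though < 0.05
print(hierarchical_report(hierarchy))
# [('primary endpoint', 0.003), ('key secondary A', 0.021)]
```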
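For the forest-plot guidance, a matplotlib sketch along the recommended lines (subgroup estimates with 95% CIs, a line of no effect, and no interaction P values) might look as follows; every label and number is invented.

```python
# Hypothetical forest plot: subgroup hazard ratios with 95% CIs and no
# interaction P values, per the guidance above.
import matplotlib.pyplot as plt

subgroups = [  # (label, hazard ratio, lower 95% limit, upper 95% limit)
    ("Overall",      0.82, 0.70, 0.96),
    ("Age < 65 yr",  0.78, 0.61, 0.99),
    ("Age >= 65 yr", 0.88, 0.70, 1.10),
    ("Male",         0.80, 0.65, 0.98),
    ("Female",       0.86, 0.66, 1.12),
]
labels = [s[0] for s in subgroups]
hr = [s[1] for s in subgroups]
err_lo = [s[1] - s[2] for s in subgroups]  # distance down to the lower limit
err_hi = [s[3] - s[1] for s in subgroups]  # distance up to the upper limit
y = range(len(subgroups))

fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(hr, list(y), xerr=[err_lo, err_hi], fmt="s",
            color="black", capsize=3)
ax.axvline(1.0, linestyle="--", color="grey")  # line of no effect
ax.set_xscale("log")                           # ratios belong on a log scale
ax.set_yticks(list(y))
ax.set_yticklabels(labels)
ax.invert_yaxis()
ax.set_xlabel("Hazard ratio (95% CI)")
fig.tight_layout()
plt.show()
```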
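The caution about odds ratios is easy to verify numerically: when the outcome is common, the odds ratio sits farther from 1 than the relative risk. The proportions below are invented.

```python
# With a common outcome, the odds ratio overstates the relative risk.
p_treated, p_control = 0.60, 0.40    # invented event proportions

def odds(p: float) -> float:
    return p / (1 - p)

relative_risk = p_treated / p_control              # 1.50
odds_ratio = odds(p_treated) / odds(p_control)     # 2.25

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
# RR = 1.50, OR = 2.25: reading the OR as a risk ratio would exaggerate the
# effect; with rare outcomes the two measures nearly coincide.
```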
For observational studies:
The validity of findings from observational studies depends on several important assumptions, including those relating to sample selection, measured and unmeasured confounding, and the adequacy of methods used to control for confounding. The Methods section of observational studies should describe how these and other relevant issues were managed in the design and analysis.
• If an observational study included a prespecified SAP with a description of hypotheses to be tested, a signed and dated version of that plan should be included with the manuscript submission. The Journal encourages authors to deposit SAPs for observational studies in one of the online repositories designed for this purpose.
• When appropriate, observational studies should use prespecified accepted methods for controlling family-wise error rate or false discovery rate when multiple tests are conducted. In manuscripts reporting observational studies without a prespecified method for error control, summary statistics should be limited to point estimates and 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
• If no prespecified analysis plan exists, the Methods section should provide an outline for the planned method of analysis, including:
o Eligibility criteria for the selection of cases and method of sampling from the data, with a diagram as appropriate.
o A description of the association or causal effect to be estimated and the rationale for this choice.
o The prespecified method of analysis to draw inference about treatment or exposure effect or association.
• Studies reporting the effect of a treatment or exposure should show the distribution of potential confounders and other variables, stratified by exposure or intervention group. When the analysis depends on the confounders being balanced by exposure group, differences between groups should be summarized with point estimates and 95% confidence intervals when appropriate. (A balance-summary sketch follows this list.)
• Complex models and their diagnostics can often be best described in a supplementary appendix. Authors are encouraged to conduct an analysis that quantifies potential sensitivity to bias from unmeasured confounding; absent that, authors must provide a discussion of potential biases induced by unmeasured confounders. (An E-value sketch follows this list.)
• Authors are encouraged to retest findings in a similar but independent study or studies to assess the robustness of their findings.
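For the balance display recommended above, a common summary is the standardized mean difference per covariate together with a 95% confidence interval for the raw difference in means. A minimal sketch on invented data:

```python
# Hypothetical covariate-balance summary by exposure group.
import numpy as np

def balance_summary(x_exposed, x_unexposed):
    """Standardized mean difference plus a 95% CI for the mean difference."""
    m1, m0 = np.mean(x_exposed), np.mean(x_unexposed)
    v1, v0 = np.var(x_exposed, ddof=1), np.var(x_unexposed, ddof=1)
    n1, n0 = len(x_exposed), len(x_unexposed)
    smd = (m1 - m0) / np.sqrt((v1 + v0) / 2)     # pooled-SD standardization
    se = np.sqrt(v1 / n1 + v0 / n0)
    diff = m1 - m0
    return smd, (diff - 1.96 * se, diff + 1.96 * se)

rng = np.random.default_rng(1)
age_exposed = rng.normal(62, 10, size=300)       # invented covariate values
age_unexposed = rng.normal(59, 10, size=300)
print(balance_summary(age_exposed, age_unexposed))
```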
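One widely used quantification of sensitivity to unmeasured confounding, in the spirit of the encouragement above, is the E-value of VanderWeele and Ding (2017). The guidelines do not mandate any particular measure, so treat this as one illustrative option.

```python
# E-value for a risk ratio: the minimum strength of association (risk-ratio
# scale) an unmeasured confounder would need with both exposure and outcome
# to explain away the observed association (VanderWeele & Ding, 2017).
import math

def e_value(rr: float) -> float:
    rr = rr if rr >= 1 else 1.0 / rr     # work with the ratio >= 1
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(1.8))   # an observed RR of 1.8 gives an E-value of 3.0
```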
Key Journal Style Elements
UNITS OF MEASUREMENT
Authors should express all measurements in conventional units, with Système International (SI) units
given in parentheses throughout the text. Figures and tables should use conventional units, with conversion factors given in legends or footnotes. In accordance with the Uniform Requirements, however, manuscripts containing only SI units will not be returned for that reason.
ABBREVIATIONS
Except for units of measurement, abbreviations are strongly discouraged; the first time an abbreviation appears, it should be preceded by the words for which it stands.
DRUG NAMES
Generic names should be used. When proprietary brands are used in research, include the brand name and the name of the manufacturer in parentheses after the first mention of the generic name in the Methods section.
Prepare Materials for Submission
COVER LETTER
Though cover letters are not required, the NEJM online submission system contains a text field through which important information that is not in the metadata, such as a meeting presentation date or a major conflict of interest not in the manuscript, should be communicated with initial manuscript submissions.
MANUSCRIPT TEXT FILE
Compile all text, references, figure legends, and tables into a single double-spaced digital file (preferably an MS Word document). NEJM will also accept text (.txt) or Rich Text Format (.rtf) files.
TITLE PAGE
Create a title page that includes:
• Manuscript title
• Each author's name, highest degree, and affiliation/institution
• Contact information for one (1) corresponding author
ABSTRACT
Provide an abstract of not more than 250 words with four labeled paragraphs containing the following:
• Background: Problem being addressed in the study
• Methods: How the study was performed
• Results: Salient results
• Conclusions: What the authors conclude from study results
• Trial registration number
IDENTIFYING DATA
At appropriate places in the manuscript, please provide the following items:
• If applicable, a statement that the research protocol was approved by relevant institutional review boards or ethics committees and that all human participants gave written informed consent
• Identities of those who analyzed the data
• For clinical trials, registration number and registry name (see: N Engl J Med 2004;351:1250-1)
• For studies containing microarrays, accession numbers and repository name
REFERENCES
References must be double-spaced and numbered consecutively as they are cited. References first cited in a table or figure legend should be numbered so they will be in sequence with references cited in the text at the point where the table or figure is first mentioned. List all citation authors when there are six or fewer; when there are seven or more, list the first three, followed by et al. The following are sample references:
1. Shapiro AMJ, Lakey JRT, Ryan EA, et al. Islet transplantation in seven patients with type 1 diabetes mellitus using a glucocorticoid-free immunosuppressive regimen. N Engl J Med 2000;343:230-8.
2. Goadsby PJ. Pathophysiology of headache. In: Silberstein SD, Lipton RB, Dalessio DJ, eds. Wolff's headache and other head pain. 7th ed. Oxford, England: Oxford University Press, 2001:57-72.
3. Kuczmarski RJ, Ogden CL, Grummer-Strawn LM, et al. CDC growth charts: United States. Advance data from vital and health statistics. No. 314. Hyattsville, Md.: National Center for Health Statistics, 2000. (DHHS publication no. (PHS) 2000-1250 0-0431.)
4. Medicare: trends in fees, utilization, and expenditures for imaging services before and after implementation of the Deficit Reduction Act of 2005. Washington, DC: Government Accountability Office, September 2008. (http://www.gao.gov/new.items/d081102r.pdf.)
Numbered references to personal communications, unpublished data, or manuscripts either "in preparation" or "submitted for publication" are unacceptable. If essential, such materials can be incorporated at appropriate places in the text.
TABLES
All tables should be included at the end of the manuscript text file. Double-space tables (including footnotes) and provide a title for each table. For Original Articles, there is normally a limit of five (5) figures and tables (total) per manuscript. Extensive tables or supplementary materials will be published as supplemental materials with the digital version of the article.
Figures and Illustrations
Authors can either insert figures into text files (preferred) or upload figure files separately. Low-resolution images may be submitted for peer review, but be aware that NEJM may, at a later stage, request high-resolution versions that comply fully with detailed Technical Guidelines for Figures.
Supplementary Appendix
A manuscript's Supplementary Appendix should be paginated, with a table of contents, followed by a list of investigators (if there is one), text (such as methods), figures, tables, and then references. Reference citations in the Appendix and the corresponding list of references should be self-contained with respect to the Appendix. The Appendix must be submitted in two formats: PDF and MS Word (or another editable text format). The Appendix will not be edited for style and will be presented online as additional information provided by the authors.
Supplementary Figures and Tables
For outcome scales, provide in the figure legend or table footnotes the range, sign, and minimally important difference (if known). There must be an informative reference citation for the scale. Each figure should include a title and a legend, which should appear on the same page as the figure itself. Tables in the Supplementary Appendix should be labeled Table S1, Table S2, etc. Each table should be accompanied by a title and, if necessary, footnotes.
Trial Protocol and Statistical Analysis Plan (SAP)
Please include a clinical trial's protocol and statistical analysis plan with the submission. The protocol may be redacted of proprietary information, but must include information on the patient flow and outcomes. Journal editors may ask for more information on redacted protocols.
SUBMIT TO NEJM
Click the button below to log in to the NEJM online submission system (ScholarOne Manuscripts) to submit a new manuscript. Once logged in, select Start New Submission and follow the on-screen instructions.
5-year review: The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring? (i)
Posted on April 19, 2024 by Mayo
In a July 19, 2019 post I discussed The New England Journal of Medicine’s response to Wasserstein’s (2019) call for journals to change their guidelines in reaction to the “abandon significance” drive. The NEJM said “no thanks” [A]. However, confidence intervals (CIs) got hurt in the mix. In this reblog, I kept the reference to “ASA II” with a note, because that best conveys the context of the discussion at the time. Switching it to WSL (2019) just didn’t read right. I invite your comments.
The New England Journal of Medicine (NEJM) announced new guidelines for authors for statistical reporting yesterday*. The ASA describes the change as “in response to the ASA Statement on P-values and Statistical Significance and subsequent The American Statistician special issue on statistical inference” (ASA I and II,(note) in my abbreviation). If so, it seems to have backfired. I don’t know all the differences in the new guidelines, but those explicitly noted appear to me to move in the reverse direction from where the ASA I and II(note) guidelines were heading.
The most notable point is that the NEJM highlights the need for error control, especially for constraining the Type I error probability, and pays a lot of attention to adjusting P-values for multiple testing and post hoc subgroups. ASA I included an important principle (#4) that P-values are altered and may be invalidated by multiple testing, but they do not call for adjustments for multiplicity, nor do I find a discussion of Type I or II error probabilities in the ASA documents. NEJM gives strict requirements for controlling family-wise error rate or false discovery rates (understood as the Benjamini and Hochberg frequentist adjustments). They do not go along with the ASA II(note) call for ousting thresholds, ending the use of the words “significance/significant”, or banning “p ≤ 0.05”. In the associated article, we read:
“Clinicians and regulatory agencies must make decisions about which treatment to use or to allow to be marketed, and P values interpreted by reliably calculated thresholds subjected to appropriate adjustments have a role in those decisions”.
When it comes to confidence intervals, the recommendations of ASA II(note), to the extent they were influential on the NEJM, seem to have had the opposite effect to what was intended–or is this really what they wanted?
When no method to adjust for multiplicity of inferences or controlling false discovery rate was specified in the protocol or SAP of a clinical trial, the report of all secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
Significance levels and P-values, in other words, are terms to be reserved for contexts in which their error statistical meaning is legitimate. This is a key strong point of the NEJM guidelines. Confidence levels, for the NEJM, lose their error statistical or “coverage probability” meaning, unless they follow the adjustments that legitimate P-values call for. But they must be accompanied by a sign that warns the reader the intervals were not adjusted for multiple testing and thus “the inferences drawn may not be reproducible.” The P-value, but not the confidence interval, remains an inferential tool with control of error probabilities. Now CIs are inversions of tests, and strictly speaking should also have error control. Authors may be allowed to forfeit this, but then CIs can’t replace significance tests and their use may even (inadvertently, perhaps) signal lack of error control. (In my view, that is not a good thing.) Here are some excerpts:
For all studies:
Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.
For clinical trials:
Original and final protocols and statistical analysis plans (SAPs) should be submitted along with the manuscript, as well as a table of amendments made to the protocol and SAP indicating the date of the change and its content.
The analyses of the primary outcome in manuscripts reporting results of clinical trials should match the analyses prespecified in the original protocol, except in unusual circumstances. Analyses that do not conform to the protocol should be justified in the Methods section of the manuscript. …
When comparing outcomes in two or more groups in confirmatory analyses, investigators should use the testing procedures specified in the protocol and SAP to control overall type I error — for example, Bonferroni adjustments or prespecified hierarchical procedures. P values adjusted for multiplicity should be reported when appropriate and labeled as such in the manuscript. In hierarchical testing procedures, P values should be reported only until the last comparison for which the P value was statistically significant. P values for the first nonsignificant comparison and for all comparisons thereafter should not be reported. For prespecified exploratory analyses, investigators should use methods for controlling false discovery rate described in the SAP — for example, Benjamini–Hochberg procedures.
When no method to adjust for multiplicity of inferences or controlling false discovery rate was specified in the protocol or SAP of a clinical trial, the report of all secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
As noted earlier, since P-values would be invalidated in such cases, it’s entirely right not to give them. CIs are permitted, yes, but are required to sport an alert warning that, even though multiple testing was done, the intervals were not adjusted for this and therefore “the inferences drawn may not be reproducible.” In short their coverage probability justification goes by the board.
I wonder if practitioners can opt out of this weakening of CIs, and declare in advance that they are members of a subset of CI users who will only report confidence levels with a valid error statistical meaning, dual to statistical hypothesis tests.
The NEJM guidelines continue:
…When the SAP prespecifies an analysis of certain subgroups, that analysis should conform to the method described in the SAP. If the study team believes a post hoc analysis of subgroups is important, the rationale for conducting that analysis should be stated. Post hoc analyses should be clearly labeled as post hoc in the manuscript.
Forest plots are often used to present results from an analysis of the consistency of a treatment effect across subgroups of factors of interest. …A list of P values for treatment by subgroup interactions is subject to the problems of multiplicity and has limited value for inference. Therefore, in most cases, no P values for interaction should be provided in the forest plots.
If significance tests of safety outcomes (when not primary outcomes) are reported along with the treatment-specific estimates, no adjustment for multiplicity is necessary. Because information contained in the safety endpoints may signal problems within specific organ classes, the editors believe that the type I error rates larger than 0.05 are acceptable. Editors may request that P values be reported for comparisons of the frequency of adverse events among treatment groups, regardless of whether such comparisons were prespecified in the SAP.
When possible, the editors prefer that absolute event counts or rates be reported before relative risks or hazard ratios. The goal is to provide the reader with both the actual event frequency and the relative frequency. Odds ratios should be avoided, as they may overestimate the relative risks in many settings and be misinterpreted.
Authors should provide a flow diagram in CONSORT format. The editors also encourage authors to submit all the relevant information included in the CONSORT checklist. …The CONSORT statement, checklist, and flow diagram are available on the CONSORT
Detailed instructions to ensure that observational studies retain control of error rates are given.
In the associated article:
P values indicate how incompatible the observed data may be with a null hypothesis; “P<0.05” implies that a treatment effect or exposure association larger than that observed would occur less than 5% of the time under a null hypothesis of no effect or association and assuming no confounding. Concluding that the null hypothesis is false when in fact it is true (a type I error in statistical terms) has a likelihood of less than 5%. [i]…
The use of P values to summarize evidence in a study requires, on the one hand, thresholds that have a strong theoretical and empirical justification and, on the other hand, proper attention to the error that can result from uncritical interpretation of multiple inferences.5 This inflation due to multiple comparisons can also occur when comparisons have been conducted by investigators but are not reported in a manuscript. A large array of methods to adjust for multiple comparisons is available and can be used to control the type I error probability in an analysis when specified in the design of a study.6,7 Finally, the notion that a treatment is effective for a particular outcome if P<0.05 and ineffective if that threshold is not reached is a reductionist view of medicine that does not always reflect reality. [ii]
… A well-designed randomized or observational study will have a primary hypothesis and a prespecified method of analysis, and the significance level from that analysis is a reliable indicator of the extent to which the observed data contradict a null hypothesis of no association between an intervention or an exposure and a response. Clinicians and regulatory agencies must make decisions about which treatment to use or to allow to be marketed, and P values interpreted by reliably calculated thresholds subjected to appropriate adjustments have a role in those decisions.
Finally, the current guidelines are limited to studies with a traditional frequentist design and analysis, since that matches the large majority of manuscripts submitted to the Journal. We do not mean to imply that these are the only acceptable designs and analyses. The Journal has published many studies with Bayesian designs and analyses8-10 and expects to see more such trials in the future. When appropriate, our guidelines will be expanded to include best practices for reporting trials with Bayesian and other designs.
What do you think?
The author guidelines:
https://www.nejm.org/author-center/new-manuscripts
The associated article:
https://www.nejm.org/doi/full/10.1056/NEJMe1906559
*I meant to thank Nathan Schachtman for notifying me and sending links; also Stuart Hurlbert.
[i] It would be better, it seems to me, if the term “likelihood” was used only for its technical meaning in a document like this.
[ii] I don’t see it as a matter of “reductionism” but simply a matter of the properties of the test and the discrepancies of interest in the context at hand.
[A] A self-published book on this episode, by Donald Macnaughton, came out in 2021: The War on Statistical Significance: The American Statistician vs. the New England Journal of Medicine.
6 thoughts on “5-year review: The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring? (i)”
April 21, 2024
Nathan A Schachtman
Mayo,
I have not done a careful review of the NEJM since its new statistical guidelines were issued, but I have occasion to read its articles. I’ve not seen any instance in which authors qualified the meaning of their confidence intervals, or adjusted their calculations of standard error to reflect multiple comparisons. On the other hand, I’ve not seen any instance of arguably inappropriate declarations of statistical significance for a non-prespecified primary outcome. In the recent clinical trial (TRAVERSE) of testosterone therapy in hypogonadal men with cardiovascular risk factors, the authors of the article, from June 2023, reported hazard ratios with 95% CIs for primary, secondary, and tertiary end points. Lincoff et al., Cardiovascular Safety of Testosterone-Replacement Therapy, NEJM (2023). Safety end points were presented as % in each arm, with a p-value. Even though an RCT such as this one will have hundreds of unadjudicated adverse events reported in both arms, the p-values are not adjusted for such end points. The article does not contain the words “statistically significant,” but others will, and have, used the phrase to describe the non-pre-specified safety endpoint findings.
The Journal of the American Medical Association (JAMA) and its many subjournals were also targeted by Dr Wasserstein’s email campaign, and they continue to use the phrase “statistically significant,” probably in a less disciplined way than the NEJM.
Nathan Schachtman
April 21, 2024
Mayo
Nathan:
Thank you for your comment. It is cleverly put in lawyerly terms. You say you’ve not seen any declarations of statistical significance on non-prespecified outcomes, but also observe that p-values are reported without adjustment, in cases where multiplicity would warrant adjustment. Is that right? (I recall NEJM calling for a warning in such cases.) Moreover, you say “others will and have” used the banned terms in relation to the results of the same article? Are these “others” in published writings, court cases, or?
Your findings are very interesting and I hope you will write a guest post for this blog for my “5-year report” or whatever we might call it.
April 21, 2024
Nathan A Schachtman
Haha; your question is a pretty good cross-examination for a non-lawyer! Yes; the p-values for safety outcomes (adverse events) were not adjusted. It was also obvious to readers that they were “nominal” levels of significance probabilities. The safety events are categorized by MedDRA codes, for which there are, I believe, more entries than ICD-10 codes. So well over a thousand. The events are reported by the blinded clinical trial centers without respect to the arm the patient is in, and some patients may give rise to more than one reported event. I believe the TRAVERSE trial involved close to 5,000 patients randomized. (I am going on memory here.) All I can say is that safety events have always been reported this way; many clinical readers distinguish between efficacy and safety outcomes, and indulge a certain amount of precautionary thinking on the latter; no one was fooled; and the authors themselves did not refer to the events reported as “statistically significant disparities.” Other commentators may have called the safety outcomes statistically significant, and certainly in legal settings, I would anticipate hearing the results described as such, because p < 5%.
In my view, the bigger point is that the NEJM and the JAMA resisted the lobbying of the ASA. The NEJM did refine its guidance for authors, but my sense is that not much changed at JAMA or its family of journals. Of the other three major clinical journals (BMJ, Annals of Internal Medicine, and Lancet), the Annals strikes me as the best for its statistical editing, but I would be interested to hear others’ views.
Nathan
April 21, 2024
Mayo
Nathan:
I hadn’t heard of MedDRA codes, so I looked it up. On a quick glance, it looks like a great big list of standardized terms to use, especially in describing patient adverse events and reactions. I read your last comment too quickly and I thought you were driving at some kind of misleading statistical report or use of p-values. Yes, obviously efficacy is very different from safety, and it’s appropriate to report all standard observed adverse events. There are a variety of known effects, and they are recorded, perhaps for further explanation. There’s no multiplicity and selection of the sort requiring adjustment that I can see, as there would be if, for example, only those in support of a given theory were reported. This sounds more like Fisher’s recommendation to ask many questions of a set of data. David Cox and I gave a very abbreviated list (of when adjustment mattered) in our 2006 paper—I, of course, was following his lead, as the expert. However, I’m just guessing here, as I shouldn’t. I haven’t seen the paper and don’t even know what the trials were all about. Can you link it?
Yes, it’s good that respectable journals, to my knowledge, resisted the lobbying of certain members of the ASA. The funny thing is, I get along with Wasserstein, whenever we interact, and he’s never really intimidated me on this dispute of ours. I was, and to some extent, still am, convinced the wordsmithers he was relying on went too far. That’s why I wrote the “don’t say what you don’t mean” post on June 19, 2019, reblogged here:
https://errorstatistics.com/2024/04/05/my-2019-friendly-amendments-to-that-abandon-significance-editorial/
Not that the proposed revisions were made. He wanted to make a splash, and he did.
April 22, 2024
Nathan A Schachtman
The randomized controlled trial to which I referred can be found here:
https://www.nejm.org/doi/full/10.1056/NEJMoa2215025
It is one that can be freely downloaded from the NEJM. It is, in a way, yet another example of the large RCT putting to rest prior observational studies (often poorly done), and meta-analyses of small RCTs. In medicine, this has happened before. The debacle over Avandia (rosiglitazone) is another example, but then it was Dr Nissen who published the meta-analysis, later undone by the “mega-trial.” Here Nissen was the P.I. of the TRAVERSE trial.
In addition to what might be a linguistic or word-smithing issue, there was the email campaign.
I roughly recall that I found an exemplar of the email that was sent out to journals as part of what seemed to be a campaign to end “statistical significance” testing. (I believe I shared the email from the website with you and others at the time. I am in Canada now and I cannot check my “archives.”) As I recall, the email had the logo of the ASA on it, and I took that as further support that the 2019 editorial was at best coy in not having had a disclaimer. Perhaps it is the lawyer in me. Although I don’t do it in informal communications, when I publish articles, I usually note that my views are not necessarily shared by my firm or my clients. Wasserstein put his official position in the 2019 opinion piece; to me, that gave rise to a need for a disclaimer.
I have no corroboration that the NEJM’s change in statistical guidelines, and its published explanation, came about because its editors received an email from the ASA written by Wasserstein. Or that the journal Clinical Trials published its piece around the same time because of the email campaign. Still, I think the cookie crumbs point in that direction.
I am not offended by the email campaign or by the editorial other than its implied misrepresentation of “provenance.” It seems clear and entirely proper that Wasserstein, as Wasserstein, has a view of statistical inference and practice that he would like to see followed in the scientific world. If the NEJM moved as a result of Wasserstein’s email or his editorial, the move was generally in a good direction, from where I am observing. He may be disappointed or disapproving, but I think the NEJM guidelines are an improvement over past practice. Perhaps he can take credit for that improvement.
April 22, 2024
Mayo
Nathan:
Thank you for the link. I will read it. Of course I recall the details you mention. This blog has a good repository of all of it, including the letterhead, the President’s Task Force, and, ultimately, the disclaimer. I am grateful to you for some of the highlights. I am reblogging some items over the next month, and hope to get a blogpost from you at some point, when you can.