Conference Room 6, Building 31, NIH Campus, Bethesda, MD
In charging the Peer Review Oversight Group (PROG), Dr. Varmus
emphasized the importance of peer review, and of having on this
committee people representing program, review, and the scientific
community. He indicated that the purpose of the committee is
to address issues of review policy common to the entire NIH, rather
than to focus on specific applications or study sections. Making
decisions about extramural grants is one of the most important
things done at NIH; therefore high quality scientific peer review
is crucial. Dr. Varmus added that he did not anticipate that
the committee would recommend changes abruptly or without experimentation,
but that he looked forward to hearing from the committee frequently. He
thanked both Dr. Baldwin for agreeing to chair the committee and
the Division of Research Grants (DRG) for their willingness to
be a major participant.
Dr. Baldwin presented the agenda, explaining that it contained few
topics in order to allow for in-depth discussion and for topic selection
by the group, and that the PROG will advise on but not manage
review at NIH. She indicated that there are some issues that
need not be discussed by PROG, as they have already been put in
place: streamlined review, limited number of amended applications,
and empowering special reviewers to assign scores -- these were
relatively clear decisions which did not require or lend themselves
to pilot studies or to major deliberation. PROG is not a rule-
or policy-making body, but an advisory group. It can suggest
pilot studies that the DRG or specific ICs may volunteer to conduct,
and PROG may then make recommendations based on the data obtained.
She emphasized that while PROG will discuss areas where peer
review might be improved, scientific peer review at NIH is not
a system in chaos; rather it is a model for peer review systems
worldwide. In a time of reinvention and self-examination, we
are seeking to make a good system better.
Possible Topics for PROG Consideration:
The first day of the meeting was a discussion of possible topics
and procedures for future work by the PROG. The need for peer
review to be dynamic and adaptable given changes in science was
noted as a general area in which PROG could contribute. Topics
mentioned included scientific progress and how that might be
helped or hindered by the separation of review and program at
NIH; the quality of review, and how that can be measured; similarities
or differences between DRG review and Institute and Center (IC) review;
how science maps to specific review groups; expertise within
study sections (breadth vs. depth); how to continue to review
those applications currently submitted but simultaneously create
a nurturing environment for the next wave of science; how to
manage review in low-volume areas of science; identification of
high-risk research (including how to define it); how to combat
the innate conservatism in science and in study sections (or how
to collect data to determine whether it really exists); and how
reviewers deal with non-scientific issues.
Some issues will inevitably be concerns of both PROG and the DRG
Advisory Committee: Scientific Review Administrator (SRA) training;
selection and supervision of SRAs; the roles of review and program
staff; the reviewer selection and approval process; reviewer
training/retraining; travel to scientific meetings for SRAs;
balance between new and senior reviewers on study sections; communication
among study sections; and role and training of study section chairs.
Rating of Grant Applications:
There was discussion of the
timing of any implementation of recommendations from the Rating
of Grant Applications (RGA) report. Not everything can or needs
to be piloted, but some pilots should be done, and some of these
will need to begin before the closing date for public comment.
Dr. Baldwin pointed out that the starting point for peer review
is always science, and that we need to be sure that peer review
is always fair and perceived as fair. She went on to state that
no decision has yet been made on any of the RGA recommendations,
but as NIH moves into the decision mode it seems clear that, however
it is obtained, there is a need for a single score for each application.
If this were produced by an algorithm, that algorithm would be
made public; whatever the conclusion of the RGA deliberations,
it will be communicated to the scientific community.
Dr. Baldwin summarized the responses that have been obtained from
within the NIH and from the extramural scientific community to
date. Some recommendations appear to be non-controversial; these
include the idea of having a higher score as the better score.
Also, there is an emerging consensus against standardization
by reviewer. There seems to be general acceptance of using criteria,
though there may be a need to add one, and enthusiasm has been
expressed for a reviewer-assigned global score.
The group discussed the criteria recommended by the RGA committee
(significance, approach and feasibility) and those recommended
by Dr. Yamamoto (impact, feasibility, creativity/uniqueness, and
investigator/environment), and whether and how those overlapped.
It was noted that these are not conceptually very different.
The RGA criterion of feasibility roughly equates to Dr. Yamamoto's
investigator/environment. There was some enthusiasm for the
term impact instead of significance. The RGA term significance
includes originality, while Dr. Yamamoto's
impact criterion does not, but he would add creativity/uniqueness.
The argument for inclusion of creativity was that without it
as a separate criterion, smaller laboratories doing especially
creative work would tend to lose out to large, well-established
laboratories which would always score well in the areas of approach
and/or environment; thus this criterion might be viewed as helping
young investigators to be treated fairly. It was also noted that
this criterion might provide a way to deal with emerging areas
of science. The group felt that some word-crafting is needed
to clearly define whatever criteria might be used, and that the
criteria set would likely be a hybrid of those discussed. It
was generally agreed that the criteria need to be broad enough
to cover all mechanisms.
Review by criteria was thought to be a reasonable idea; group
members felt having criteria would help to guide and focus reviewers'
discussion, and having criteria specifically addressed in critiques/summary
statements might increase the useful information provided
to those making funding decisions. This might be enhanced by
asking reviewers to include a final sentence indicating how the
criteria contributed to the global score, if a global score were
assigned by reviewers.
The group also discussed whether these criteria should be individually
scored. It was noted that contracts are scored by criterion without
difficulty, but there was some concern that such a scoring approach
could limit flexibility. The idea of using a non-linear letter
grade scale was met with mixed enthusiasm, and there were concerns
that such a system would be received negatively by the community
of scientific investigators and that it would not really send a
clear message. Several of the group's
members felt that the clearest message would be the written critiques
of the reviewers, but there was concern expressed that reviewers
often are reluctant to directly and clearly state negatives, e.g.
that the proposed research is boring or lacks creativity. It
was unanimously agreed that clarity in critiques is crucially
important, and that if some sort of score or rating is given to
each criterion, the written text must be consistent with that rating.
The opinion was expressed that an overall score taps the reviewer's
scientific expertise and judgement, and it is important not to
tie the overall score to an algorithm as this limits the reviewers'
flexibility in responding to the unique aspects of each application.
With a global score, criterion weights can change project by
project, and additional factors can be included. There was some
discussion of the rating scale, and some brief information was
provided on an experiment currently taking place in NIGMS using
rounded scores, in an attempt to eliminate what is considered
to be overly fine precision. Additional information on this experiment
will be provided to PROG members when it becomes available, and
can be considered at the next meeting.
Dr. Baldwin summarized the sense of the group as follows. There
is enthusiasm for using four explicit criteria, although there
are differences of opinion about the exact wording for labeling
and defining these, and whether to use adjectival descriptors
or letter grades to rate them. There is enthusiasm for having
reviewers assign a global score; a decision is still needed on
the number of points on the rating scale, and it appears that
there is little if any opposition to a reversal of the scale,
with the higher number representing the better score. The issue
of standardizing by reviewer did not generate enthusiasm and was
tabled. The committee will communicate regarding pilots to be
performed, and the next (Nov. 20-21) meeting will involve making
decisions on these issues and recommending what would be an optimal
"change package" so that changes could be implemented for fiscal year 1998.
Other review issues were mentioned that were not specifically
related to the recommendations in the RGA report: changing the
instructions for grant application preparation to include a paragraph
on each of the three or four criteria, rather than or in addition
to the abstract, and changing reviewer instructions so that
all reviewers read that portion of all applications.
Integration of Peer Review of NIAAA, NIDA, and NIMH with DRG:
Dr. Baldwin pointed out that the integration of the National Institute
on Alcohol Abuse and Alcoholism (NIAAA) grant application review
with that of the DRG offers a concrete example, and may serve
as a model for the other institutes, although it is not the only
way that integration can be accomplished. Drs. Faye Calhoun
and Ken Warren of the NIAAA and Donna Dean of DRG presented the
steps which were followed in the integration process. These included
a great deal of planning and open communication not only among
NIH staff but also between NIH staff and the members of the scientific
community, through professional association meetings and less
formal mechanisms. Essentially, four study sections (two from
NIAAA and two from DRG) were combined and restructured into four
new study sections in DRG based on the science to be reviewed,
and one study section in NIAAA merged with a Special Emphasis
Panel in DRG to form a new study section. Dr. Calhoun pointed
out that the process of merging and restructuring study sections
within DRG is not a new process, but having it happen across DRG
and an institute is new. This required cooperation and communication,
which will continue now that the integration has been accomplished.
This effort involved individual research project grant (R01 and
R29) and fellowship applications. Reviewers from the original
review groups suggested the recasting of scientific groupings
for the new study sections; reviewers from all of the "original"
review groups served on the newly formed study sections, and
experienced, highly respected former reviewers were invited as
special reviewers. While the new study sections have been functioning
for only one or two rounds, the integration seems to be well-received
by the scientific community.
Dr. Leshner commented on the importance of the issue, and said
the National Institute on Drug Abuse (NIDA) sees this as the integration
of their review into the entire NIH review process, adding that
their review process does not differ from that of the DRG. He
stated that NIDA reviews 1200-1400 applications per year, so
that the impact of even half of these on DRG will be substantial,
in terms of volume and in terms of scientific overlap with existing
DRG study sections. He estimated that there also would be scientific
overlap with 8-10 other Institutes and Centers (ICs).
It was suggested that perhaps the next step in review integration
needs to be within specific scientific areas, and Dr. Dean pointed
out that biopsychology might be a reasonable area to consider
since it is a fairly circumscribed scientific area and one in
which DRG currently has a staff vacancy. Another suggested area
is basic neuroscience, which is estimated to involve approximately
150 applications in each of NIDA and the National Institute of
Mental Health (NIMH) and would involve other ICs. It was pointed
out that this might be a good starting point for addressing the
issue of how decisions are made as to what portions of review
should be performed in ICs and what in DRG. Dr. Baldwin pointed
out that this is an NIH-wide issue in which all can benefit:
in the NIAAA/DRG integration, they now have four new study sections
better able to review applications. She added that DRG should
not be viewed as static by the drug and mental health communities.
Through these integration efforts, we should be able to develop
a process through which emerging and smaller areas of science
can be more easily managed within the review process, and areas
of scientific overlap that could appropriately be reviewed within
the DRG but are currently being reviewed within ICs may be identified.