Tabak began his presentation by reiterating how "important peer review is
to the NIH mission." He emphasized that the agency's two-tiered peer
review system is the "foundation upon which the funding of extramural
research is based." While this system is highly regarded throughout the
world, Tabak stressed that NIH feels "that it is vital for [it] to
continue to innovate and optimize the process [by which] grant
applications are reviewed." He explained that there is "activity already
ongoing in this space," highlighting the Enhancing Peer Review initiative
launched several years ago with the commitment "to continuously survey
results of changes made at that time." He pointed to a recently released
report showing that people are either satisfied with or have
"acculturated to the changes." According to the deputy director,
researchers' perceptions differ depending on whether or not they are
funded. That, he stressed, is the challenge in assessing peer review. The
previous day, the ACD had heard from Roderic Pettigrew, director of the
National Institute of Biomedical Imaging and Bioengineering and acting
Chief Officer for Scientific Workforce Diversity, and from Richard
Nakamura, director of NIH's Center for Scientific Review (CSR), about
peer review as it relates to diversity efforts and about the release of
the aforementioned report.

Specific concerns have been raised over the years, according to Tabak,
that the structure of CSR's integrated review groups (IRGs) along with
NIH's "dependence on normalized percentiling across IRGs might lead to
funding of applications that are not of the highest priority." He
explained that priority is a "compilation of many things," including the
"scientific quality of novelty" and alignment with the core mission of
the institute, center, or agency. He further explained that, in theory,
mechanisms such as select pay or high/low program relevance could be
used to address the issue.

The question, Tabak suggested, is, "Should a portion of NIH resources be
redirected in a more systematic way to ensure [NIH] support of the 'best
opportunities'?" Tabak emphasized that "best" means many things to
different people. That is something the agency has to acknowledge, but
"if the NIH wants to approach that, should the agency try to
systematically evaluate the characteristics of the study section's
'performance'?" He pointed out that proponents of the current system
would say no, because they believe the system works well and that
highly specialized experts are needed at all levels, since "they
appreciate the nuances of a highly specialized focused field." Others
will ask who "is to say what field is more important than another." He
acknowledged that this view has some validity. Nevertheless, he
explained, NIH's IRG organization is driven by the nature and the number
of applications submitted. That raises the question of whether NIH
should be more proactive in attempting to identify emerging fields of
science in order to "get a little ahead of the curve to ensure an
optimal review of the freshest ideas."

Tabak reported that a group drawn from NIH's Division of Program
Coordination, Planning, and Strategic Initiatives and its Office of
Extramural Research was convened in January and tasked with overseeing
the development of methods that could identify emerging, highly active
areas of science as well as areas that may have become stagnant. The
group was also asked to recommend approaches for comparing the state of
scientific fields with the way NIH organizes its study sections, in
order to produce a "more optimized dynamic system that is responsive to
changes in scientific trends." He emphasized that the task becomes
increasingly difficult as budget constraints tighten. The reflexive
answer, of course, is peer review, he maintained.

Tabak indicated that the purpose of his presentation was to share some
of the ideas the group has been testing and to get feedback on them.
Quantitative approaches include analysis of so-called study section
inputs, i.e., the number of new applications, the number of new awards,
and the relationship between the two across study sections of different
sizes. He showed plotted data, de-identified and collected from 2008 to
2012. In describing the data, he noted that areas falling in the
quadrant with high rates of both new applications and awards could
represent more vibrant IRGs in which new science is being proposed.
Conversely, areas in the quadrant with low rates of new applications
could be stagnating.
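
The quadrant reading described above can be illustrated with a short
sketch. It is a hypothetical illustration, not the method NIH used: it
assumes each study section is summarized by a new-application rate (new
applications divided by all applications reviewed, to adjust for size)
and an award rate (new awards divided by new applications), and that the
quadrants are split at the median of each rate. The names
`StudySection`, `classify`, and `READINGS`, the median cut points, and
the sample figures are all invented for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class StudySection:
    """Hypothetical per-section summary; field names are illustrative only."""

    name: str
    total_applications: int  # all applications reviewed, used to adjust for size
    new_applications: int  # applications counted as "new" (assumed definition)
    new_awards: int  # awards made to those new applications

    @property
    def new_app_rate(self) -> float:
        # Share of the section's workload that is new, comparable across sizes.
        return self.new_applications / self.total_applications

    @property
    def award_rate(self) -> float:
        # Fraction of new applications that were funded (0.0 if there were none).
        if self.new_applications == 0:
            return 0.0
        return self.new_awards / self.new_applications


# Tentative readings of each (new-application rate, award rate) quadrant.
# They echo possibilities raised in the text; none is a conclusion.
READINGS = {
    ("high", "high"): "possibly vibrant: new science proposed and funded",
    ("high", "low"): (
        "ambiguous: may favor established investigators, "
        "or new ideas may be less meritorious"
    ),
    ("low", "high"): "possibly stagnating, though reviewers may be enthusiastic",
    ("low", "low"): "possibly stagnating",
}


def classify(sections: list[StudySection]) -> dict[str, tuple[str, str]]:
    """Assign each section to a quadrant split at the median of each rate."""
    app_cut = median(s.new_app_rate for s in sections)
    award_cut = median(s.award_rate for s in sections)
    return {
        s.name: (
            "high" if s.new_app_rate > app_cut else "low",
            "high" if s.award_rate > award_cut else "low",
        )
        for s in sections
    }


# Invented sample figures for four de-identified sections.
sections = [
    StudySection("A", total_applications=200, new_applications=120, new_awards=30),
    StudySection("B", total_applications=150, new_applications=45, new_awards=5),
    StudySection("C", total_applications=300, new_applications=210, new_awards=21),
    StudySection("D", total_applications=100, new_applications=25, new_awards=8),
]

for name, quadrant in classify(sections).items():
    print(name, quadrant, READINGS[quadrant])
```

A median split is just one way to draw the quadrant boundaries, and the
labels in `READINGS` are hypotheses to examine, not conclusions.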

Meaning of Differences in Application Rates
What the agency does not know, and what it would be reasonable to ask,
is whether there are inherent differences in application rates among
different types of science. For example, he said, a low award rate may
mean that a study section, for whatever reason, gives low scores to
initial (A0) applications, which may indicate that it favors more
established investigators. Conversely, a high award rate could mean that
the study section is more open to new ideas or prefers new
investigators. If there is not a "caring bias" behind a low award rate,
it could mean that the areas of new science being proposed are not as
meritorious. Likewise, he noted, high award rates may point to areas
about which reviewers are particularly enthusiastic. Other possibilities
exist: if award rates are not accounted for by percentile scores, the
area may be scientifically saturated; or award rates may be driven by
variations among individual institutes and centers, since a study
section may be sending its output to an institute or center that has a
particularly poor pay rate for that fiscal year. He noted that any or
all of these explanations are possible, but the data are nonetheless a
source of information that, with additional examination, may begin to
give NIH insight. An ACD member interrupted to say that "so many
variables" were making him "very uncomfortable."

Tabak further noted that the agency is always asked how it identifies
an emerging field. Accordingly, he said, NIH is testing a whole series
of approaches, including analyses of work, literature, or applications
that can precede widespread adoption and that "could indicate the
emergence of a new area where you see people who have never been
supported by NIH before." Then there is the universe of social media and
the potential to mine its data.

The agency can also look at study section outputs: "the bibliometric
history of publications or patents normalized by the field of the
science attributed to funded applications that were reviewed by an IRG."
Acknowledging that there are numerous reasons why "citation analysis
has limitation[s]," Tabak stated that "if done with control it might be
possible that [NIH] might be able to derive some interesting
information." He then shared a "potential approach" to gauge the ACD's
reaction. By plotting citations per year against journal impact factors
as a function of time, he suggested, NIH "might be able to reveal the
'performance of a study section.'" He immediately noted that what he is
not proposing, "because it introduces an anaphylactic response in
people ... is absolute citations as a number." Tabak stressed that he is
"not talking about journal impact factor per se," but about an approach
"that allows one to self-control for these types of measures that may
provide [NIH] with some information about the performance of the study
section as a function of time." He then shared preliminary data with the
committee. Tabak's full presentation is available online, beginning at
around the 1:17:00 mark.
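
Field normalization, which the quoted description of study section
outputs calls for, can be illustrated with a simple ratio. This sketch
is an assumption, not the approach Tabak presented: it normalizes by
field and publication year rather than by journal impact factor,
divides each paper's citations per year by the mean for its (field,
year) group, and averages those ratios over the papers attributed to
each study section, so a score above 1.0 means the section's funded work
is cited more than its field average. The `Paper` record, the function
names, and the sample figures are hypothetical, and the baseline is
computed from the supplied papers as a stand-in for a field-wide
reference set.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Paper:
    """Hypothetical record of one publication tied to a funded application."""

    section: str  # study section that reviewed the funded application
    field: str  # field label used for normalization (assumed to be available)
    year: int  # publication year
    citations: int  # citations accumulated as of `as_of`
    as_of: int  # year in which the citation count was taken

    @property
    def citations_per_year(self) -> float:
        # Divide by elapsed years (at least 1) so older papers do not
        # look better simply because they have had longer to be cited.
        return self.citations / max(1, self.as_of - self.year)


def field_baselines(papers: list[Paper]) -> dict[tuple[str, int], float]:
    """Mean citations per year for each (field, publication year) group."""
    groups: dict[tuple[str, int], list[float]] = defaultdict(list)
    for p in papers:
        groups[(p.field, p.year)].append(p.citations_per_year)
    return {key: mean(values) for key, values in groups.items()}


def section_scores(papers: list[Paper]) -> dict[str, float]:
    """Mean field-normalized citation ratio per section (1.0 = field average)."""
    baselines = field_baselines(papers)
    ratios: dict[str, list[float]] = defaultdict(list)
    for p in papers:
        base = baselines[(p.field, p.year)]
        if base > 0:  # skip groups in which nothing has been cited yet
            ratios[p.section].append(p.citations_per_year / base)
    return {section: mean(values) for section, values in ratios.items()}


# Invented sample: S1 has more raw citations, but S2 does better within each field.
papers = [
    Paper("S1", "neuroscience", 2010, citations=40, as_of=2014),
    Paper("S2", "neuroscience", 2010, citations=20, as_of=2014),
    Paper("S1", "imaging", 2011, citations=6, as_of=2014),
    Paper("S2", "imaging", 2011, citations=18, as_of=2014),
]

for section, score in sorted(section_scores(papers).items()):
    print(f"{section}: {score:.2f}")
```

In the sample, section S1 has more raw citations (46 versus 38), yet S2
scores higher (about 1.08 versus 0.92) once each paper is compared with
its own field, which is the kind of distortion normalization is meant to
remove.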

Other approaches include qualitative analyses, such as an NIH-wide
portfolio review that compares quantitative measures with qualitative
assessments by experts. He acknowledged that it is much easier to
compare performance within a single field, and that no one has figured
out how to compare different fields with one another, because any such
comparison is immediately confounded by value judgments about the
relative importance and alignment of one field versus another. This is a
problem, said Tabak, no matter which field is selected, because it raises
the question of why that field was chosen. People become very upset,
nervous, hysterical, etc., he concluded.

Collins thanked Tabak for walking the ACD through the process, "which
will definitely expand in other analyses." He emphasized that this is a
"really important issue especially in a time of constrained resources.
We cannot afford to just look the other way if we are not getting the
right balance in our portfolio," Collins stated. "Whatever metrics we
can come up with that are not inherently biased in their own way are
worth looking at," he said, so that NIH can make corrections to achieve
a balanced portfolio. The committee will resume its discussion of the
topic at its December meeting.