Thursday, June 27, 2013

Consortium of Social Science Associations - NIH: Exploring New Approaches to Optimizing Peer Review

Please view the repost below from the Consortium of Social Science Associations. Click the link for additional information.

 
At the 106th meeting of the Advisory Committee to the Director (ACD) of the National Institutes of Health (NIH) on June 13-14, members heard a presentation from Principal Deputy Director Lawrence Tabak entitled "Exploring New Approaches to Optimizing Peer Review." According to NIH Director Francis Collins, the goal of the effort "is to be sure that the way in which [NIH has] its IRGs (Integrated Review Groups) and study sections organized to do peer review accurately reflects the way in which science is moving." Science, said Collins, "moves very fast."

Tabak began his presentation by reiterating how "important peer review is to the NIH mission." He emphasized that the agency's two-tiered peer review system is the "foundation upon which the funding of extramural research is based." While this system is highly regarded throughout the world, Tabak stressed that the NIH feels "that it is vital for [it] to continue to innovate and optimize the process [by which] grant applications are reviewed." He explained that there is "activity already ongoing in this space," highlighting the Enhancing Peer Review project initiated a number of years ago with the commitment "to continuously survey results of changes made at that time." He pointed to the recently released report showing that people are satisfied or have "acculturated to the changes." According to the deputy director, researchers have different perceptions depending on whether or not they are funded; that, he stressed, is the challenge in assessing peer review.

The previous day, the ACD heard from Roderic Pettigrew, director of the National Institute of Biomedical Imaging and Bioengineering and Acting Chief Officer for Scientific Workforce Diversity, along with Richard Nakamura, director of the NIH's Center for Scientific Review (CSR), regarding peer review as it relates to diversity efforts, as well as the release of the aforementioned report.

Specific concerns, according to Tabak, have been raised over the years that the structure of the CSR's Integrated Review Groups (IRGs) along with NIH's "dependence on normalized percentiling across IRGs might lead to funding of applications that are not of the highest priority." He explained that priority is defined as a "compilation of many things," including "scientific quality or novelty" and alignment with the core mission of the institute, center, or agency. He further explained that, in theory, mechanisms like select pay or high/low program relevance could be used to address the issue.

The question, Tabak suggested, is: "Should a portion of NIH resources be redirected in a more systematic way to ensure [NIH] support of the 'best opportunities?'" Tabak emphasized that "best" means many things to different people. It is something the agency has to acknowledge, but "if the NIH wants to approach that, should the agency try to systematically evaluate the characteristics of the study section's 'performance?'" He pointed out that proponents of the current system would say no, based on their belief that the current system is great and that there is a need for highly specialized experts at all levels because "they appreciate the nuances of a highly specialized focused field." Conversely, others will argue, who "is to say what field is more important than another." He acknowledged that this view has some validity. Nevertheless, he explained, the NIH has an IRG organization driven by the nature and the number of applications submitted. That raises the question of whether the NIH should be more proactive in attempting to identify emerging fields of science in order to "get a little ahead of the curve to ensure an optimal review of the freshest ideas."

Tabak reported that the NIH's Division of Coordination, Planning and Strategic Initiatives and the Office of Extramural Research were convened in January and tasked with overseeing development of methods that could potentially identify emergent, highly active areas of science as well as others that may have stagnated. This group was also assigned to recommend approaches for comparing the state of a scientific field to how NIH organizes its study sections, in order to produce a "more optimized dynamic system that is responsive to changes in scientific trends." He emphasized that the task becomes increasingly difficult as budget constraints grow greater. The reflexive answer, of course, is peer review, he maintained.

Tabak indicated that the purpose of his presentation was to share some of the ideas the group has been testing in order to get feedback. Quantitative approaches include analysis of so-called study section inputs, i.e., the number of new applications, the number of new awards, and the relationship between the two, normalized across study sections for their different sizes. He showed plotted data, de-identified and collected from 2008-2012. In describing the data, he noted that in one quadrant, high rates of new applications combined with high rates of awards could suggest IRGs covering more vibrant areas where new science is being proposed. Conversely, in another quadrant, lower rates of new applications could mean that some of the areas represented in the IRGs are potentially stagnating.
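To make the quadrant idea concrete, here is a minimal sketch of what such an input analysis might look like. The records, field names, and median-split thresholds below are all assumptions for illustration; NIH has not published the method behind the plots Tabak showed.

```python
# Hypothetical sketch: sort study sections into the quadrants Tabak described,
# using rates of new applications and new awards (normalized for section size).
# All values and cutoffs are illustrative assumptions, not NIH's actual method.

from statistics import median

# De-identified example records:
# (study_section_id, new_apps_per_100_reviewed, new_awards_per_100_reviewed)
sections = [
    ("SS-01", 62.0, 18.0),
    ("SS-02", 35.0, 6.0),
    ("SS-03", 58.0, 9.0),
    ("SS-04", 30.0, 15.0),
]

# Use median splits as the quadrant boundaries (an assumed choice).
app_cut = median(rate for _, rate, _ in sections)
award_cut = median(rate for _, _, rate in sections)

def quadrant(app_rate, award_rate):
    """Label a section by the quadrant logic from the presentation."""
    if app_rate >= app_cut and award_rate >= award_cut:
        return "high apps / high awards: possibly vibrant, new science proposed"
    if app_rate < app_cut and award_rate < award_cut:
        return "low apps / low awards: possibly stagnating"
    if app_rate >= app_cut:
        return "high apps / low awards: many new ideas, few funded"
    return "low apps / high awards: fewer new ideas, but well received"

for sid, apps, awards in sections:
    print(sid, "->", quadrant(apps, awards))
```

As the next section makes clear, the labels in such a plot are only hypotheses: the same quadrant can be produced by very different underlying causes.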

Meaning of Differences in Application Rates

What the agency does not know, and what would be reasonable to ask, is whether there are inherent differences in application rates among different types of science. For example, he said, a low award rate may mean that a study section, for whatever reason, gives low scores to initial (A0) grant applications, which may indicate that the study section favors more established investigators. On the other hand, a high award rate could mean the study section is more open to new ideas or has a preference for new investigators. If there is no such bias and the award rate is still low, it could mean that the areas of new science being proposed are not as meritorious. Conversely, he noted, high award rates may mean there are areas that reviewers are particularly enthusiastic about. Other possibilities: if award rates are not accounted for by the percentile scores, the area may be scientifically saturated; or award rates may be driven by the variation one observes across individual institutes and centers, since a study section may be providing its output to an institute or center that has a particularly poor pay rate for the fiscal year. He noted that any and all of these are possible, yet this is a source of information that, with additional examination, may begin to give NIH insight. An ACD member interrupted to say that "so many variables" were making him "very uncomfortable."

Tabak further noted that the agency is always asked how it finds an emerging field. Accordingly, he stated that NIH is testing a whole series of approaches, including analyses of work, literature, or applications that can precede widespread adoption and "could indicate the emergence of a new area where you see people who have never been supported by NIH before." Then there is the universe of social media and the data mining of it.
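One simple flavor of the text analysis alluded to here would be to flag terms whose frequency in application or publication abstracts grows sharply year over year. The corpus, terms, and doubling threshold in this sketch are entirely made up for illustration; Tabak did not describe a specific algorithm.

```python
# Hypothetical sketch: flag candidate "emerging" terms by year-over-year growth
# in how often they appear in abstracts. Corpus and threshold are assumptions.

from collections import Counter

abstracts_by_year = {
    2010: ["gene therapy vectors", "optogenetics in neurons", "mouse models"],
    2011: ["optogenetics circuits", "optogenetics behavior", "gene therapy"],
    2012: ["optogenetics mapping", "optogenetics probes", "crispr editing"],
}

def term_counts(docs):
    """Count word occurrences across a year's abstracts."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

years = sorted(abstracts_by_year)
prev = term_counts(abstracts_by_year[years[0]])
for year in years[1:]:
    cur = term_counts(abstracts_by_year[year])
    for term, n in cur.items():
        # Flag terms that at least doubled versus the prior year (assumed cutoff).
        if n >= 2 * max(prev.get(term, 0), 1):
            print(f"{year}: '{term}' rising ({prev.get(term, 0)} -> {n})")
    prev = cur
```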

The agency can also look at study section outputs, "the bibliometric history of publications or patents normalized by the field of the science attributed to funded applications that were reviewed by an IRG." Acknowledging that there are numerous reasons why "citation analysis has limitation[s]," Tabak stated that "if done with control it might be possible that [NIH] might be able to derive some interesting information." He then shared a "potential approach" to get the ACD's reaction. Using citations per year versus journal impact factors as a function of time, he suggested that NIH "might be able to reveal the 'performance of a study section.'" He immediately noted that what he is not saying, "because it introduces an anaphylactic response in people...is absolute citations as a number." Tabak stressed that he is "not talking about journal impact factor per se," but an approach "that allows one to self-control for these types of measures that may provide [NIH] with some information about the performance of the study section as a function of time." He then shared preliminary data with the committee. Tabak's full presentation is available here, beginning around 1:17:00.
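To illustrate the kind of self-controlled measure described here (citations per year relative to journal impact factor, tracked over time for one study section's funded applications), a minimal sketch might look like the following. The publication records, impact factors, and the simple ratio are all illustrative assumptions, not Tabak's actual formula or NIH data.

```python
# Hypothetical sketch: for publications attributed to one study section's funded
# applications, compute citations-per-year divided by journal impact factor,
# then average by award year. All numbers are illustrative.

from collections import defaultdict

# (award_year, total_citations, years_since_publication, journal_impact_factor)
pubs = [
    (2008, 40, 5, 8.0),
    (2008, 12, 5, 3.0),
    (2010, 18, 3, 6.0),
    (2010, 30, 3, 10.0),
]

by_year = defaultdict(list)
for year, cites, age, jif in pubs:
    cites_per_year = cites / age
    # Dividing by the journal impact factor "self-controls" for venue, so the
    # measure is neither raw citation counts nor impact factor taken alone.
    by_year[year].append(cites_per_year / jif)

for year in sorted(by_year):
    scores = by_year[year]
    print(year, "mean normalized citation rate:", round(sum(scores) / len(scores), 2))
```

Tracking such a normalized score across award years is one way a study section's output trend could be summarized without reporting absolute citations or impact factors directly.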

Other types of analyses are qualitative, including an NIH-wide portfolio review to compare the quantitative measures with qualitative assessments by experts. He acknowledged that it is much easier to compare performance within a single field; no one has been able to figure out how to compare different fields to one another, because the comparison is immediately confounded by value judgments about the relative importance and alignment of one field versus another. This is a problem, said Tabak, no matter which field is selected, and it raises the question of why that field was chosen. People become very upset, nervous, hysterical, etc., he concluded.

Collins thanked Tabak for walking the ACD through the process, "which will definitely expand [into] other analyses." He emphasized that this is a "really important issue, especially in a time of constrained resources. We cannot afford to just look the other way if we are not getting the right balance in our portfolio," Collins stated. "Whatever metrics we can come up with that are not inherently biased in their own way are worth looking at," so that the agency can make corrections to achieve a balanced portfolio. The Committee will resume its conversation on the topic at its December meeting.