Scientific progress depends on a continual influx of new ideas that challenge established thinking, as well as the development of new approaches that can overcome experimental barriers. The field of biophysics is centrally positioned to provide such ideas and approaches, so nurturing programs that promote these developments is important to our Society. Many funding agencies have the stated goal of innovation, but in times of limited resources, grant reviewers tend to become risk-averse and reward safer projects that are necessarily less innovative.
In particular, the National Institutes of Health (NIH) has recognized the need for increased innovation, and has tried many approaches to encourage innovation in its grant portfolio. I have experienced this as an applicant, a reviewer, and a review panel chair. The most obvious efforts are the specific grant programs designed to promote high-risk, high-impact research. These include Transformative R01 grants, Pioneer Awards, and New Innovator Awards, but even together, the funding for these programs totals less than one percent of the NIH budget. It would seem prudent for a larger percentage of funding to be directed to truly innovative research.
These high-risk, high-reward grant programs have their own review process, and the reviewers are given specific criteria for evaluation, but the results of the reviews still favor ideas with previous publications. Even though an innovative and novel concept might be viewed as potentially high-impact, it is extremely rare to see such a proposal funded without published preliminary data. The requirement for extensive preliminary data means that only highly funded investigators or those at deep-pocketed institutions can explore entirely new lines of research. While the researchers at these places may be talented and productive, limiting the source of new ideas to such a small number of labs is unlikely to maximize scientific progress. From a review perspective, I understand the difficulty in evaluating new ideas without preliminary data (they all sound too good to be true), and this is especially true for proposals that are not squarely in my field. More recently, the NIH added a specific innovation scoring criterion to all of its research grant reviews. In theory, this is a great idea, but the data show what was always obvious to those of us who do a lot of grant reviews: the scores given for this criterion correlate poorly with the final scientific merit scores.
Efforts by the NIH have been based on changing reviewer behavior, which is difficult if not impossible. A better approach would be to take advantage of existing reviewer behavior to rank innovative proposals. This approach would leverage the full set of data already provided by the review panel's impact scores. Currently, the NIH evaluates the priority of proposals using a scientific impact score on a scale of 1 to 9, where a score of 1 represents the highest impact. Every reviewer on the panel scores each proposal, based on written critiques from a subset of assigned reviewers and discussion by the whole panel. The scores are averaged and multiplied by 10, yielding a final impact score ranging from 10 to 90 (generally, proposals with impact scores between 10 and 30 are competitive for funding). This single number is used by the NIH to prioritize funding within the scientific priorities established by each institute. In other words, the mean of a potentially complex distribution is used as the only descriptor of the entire process. As scientists, none of us would consider reporting only the average of a distribution without any acknowledgment of its variance, deviations, or shape. Yet this is exactly how we allow our funding to be decided.
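As a concrete illustration of that arithmetic, the short sketch below computes a final impact score from a set of individual panel scores. The panel scores are invented for this example, and the rounding is included only for clarity.

```python
# Minimal sketch of the scoring arithmetic described above: individual
# scores (1 = highest impact, 9 = lowest) are averaged and multiplied
# by 10. The panel scores below are hypothetical.
def final_impact_score(panel_scores):
    """Return a final impact score (10-90) from individual 1-9 scores."""
    return round(10 * sum(panel_scores) / len(panel_scores))

print(final_impact_score([2, 3, 2, 4, 3, 3]))  # 28 -- likely competitive
```

Note that only the mean survives this calculation; every other feature of the panel's score distribution is discarded.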
From my experiences on NIH review panels, I know that the grant applications that are most exciting in terms of innovation and potential high impact usually engender the most spirited discussion. When these proposals are viewed as high-risk, the discussion often ends without a clear consensus. This leaves each reviewer free to “vote their conscience,” which in practice means that many vote in agreement with one side of the argument. The resulting scientific impact score is reported as the average, yet there might not be a single reviewer who actually gave that average score. For example, a grant where half of the reviewers score 1 and the other half score 5 would receive a scientific impact score of 30, the same impact score as a proposal where every reviewer scores 3, even though not a single reviewer scored those two proposals equally. My experience suggests that the inability to reach consensus almost always arises from different perceptions of feasibility and risk. The fact that different reviewers have different risk tolerances should be a strength of the review process, but by reporting only average results, this potential strength is lost.
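The information lost by averaging is easy to see in a small numerical sketch. The two hypothetical panels below reproduce the example just given: both yield a final impact score of 30, but one panel is split and the other is unanimous, and only a measure of spread distinguishes them.

```python
import statistics

# Two hypothetical panels with identical means but very different
# distributions; the scores are illustrative, not real review data.
split_panel     = [1, 1, 1, 5, 5, 5]   # no consensus: two camps
unanimous_panel = [3, 3, 3, 3, 3, 3]   # full consensus

for name, scores in (("split", split_panel), ("unanimous", unanimous_panel)):
    impact = 10 * statistics.mean(scores)
    spread = statistics.stdev(scores)
    print(f"{name:9s} panel: impact score {impact:.0f}, std dev {spread:.2f}")
# split     panel: impact score 30, std dev 2.19
# unanimous panel: impact score 30, std dev 0.00
```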
To increase the proportion of grant funding that supports innovative, high-risk research, we should use all of the information the reviewers are already providing. This would mean awarding some grants based on impact score distributions rather than solely on the average. I wish that I could present a statistical argument about how best to accomplish this, but the data needed to do so are not freely available from the NIH. These data would include the widths of the score distributions, the frequency of non-normal distributions, and feedback from review panel chairs and NIH Scientific Review Officers regarding the perceived risk and innovation of grants with broad and/or non-normal distributions. I hope that we can begin an evidence-based dialogue with the NIH and other funding agencies around the world about encouraging innovation and risk in our research.
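To make the proposal concrete, the sketch below shows one hypothetical way such distributional information could be used: proposals whose score spread exceeds some threshold are set aside for separate consideration rather than being ranked purely by their mean. The panels, the threshold, and the decision rule are all invented for illustration; a proper rule would have to be grounded in the kind of NIH data described above.

```python
import statistics

# Hypothetical panels (1-9 scores); invented numbers, not NIH data.
panels = {
    "proposal A": [3, 3, 3, 3, 3, 3],  # consensus
    "proposal B": [1, 1, 2, 5, 5, 6],  # split panel: possible high-risk, high-reward
    "proposal C": [4, 4, 5, 4, 5, 4],  # consensus, modest enthusiasm
}

SPREAD_THRESHOLD = 1.5  # arbitrary cutoff chosen only for this sketch

for name, scores in panels.items():
    impact = 10 * statistics.mean(scores)
    spread = statistics.stdev(scores)
    action = "consider separately" if spread > SPREAD_THRESHOLD else "rank by mean"
    print(f"{name}: impact {impact:.0f}, spread {spread:.2f} -> {action}")
```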
—David W. Piston