Tuesday, June 29, 2010

Too Many Papers

In trying to keep up with the literature in my field, I have often made that lament, especially after tracking down the third in a series of Least Publishable Units that should have been one more complete paper. The LPU problem is directly related to the strongly held belief that more is better, especially when it comes to paper counts.

A week or so ago, there was this article in the Chronicle of Higher Education that suggested that the problem is not one of LPUs, but instead of useless and boring research being published, wasting everyone's time and resources. Female Science Prof has an awesome refutation of this article, with Drugmonkey adding some additional arguments against. For the other side, Derek Lowe agrees with the CHE.

This topic touches in some ways on my previous post on old, abandoned data. Personally, I think the problem of old abandoned data is more significant than excessive publication of results. I've seen this play out in my own career. When I switched to a new research focus area, I aggressively searched out techniques that would speed our progress, only to find nothing much published that was directly relevant. Some of the stuff I did find was from the 60's and 70's, with low citation counts. This stuff was incredibly useful to us, even though it had lain fallow for 30+ years before someone found it worthwhile.

I've also seen the opposite occur. We had some incidental findings that we didn't think worthy of a separate publication. A few years later, another group replicated and published our (unpublished) "incidental" results. Their paper has been cited 12 times in the year and a half since publication in a field-specific journal with an impact factor of 6. It is incredibly difficult to predict in advance what other scientists will find useful. Since data is so expensive in time and money to generate, I would much, much rather there be too many publications than too few (especially given modern search engines and electronic databases).

6 comments:

  1. Since data is so expensive in time and money to generate, I would much, much rather there be too many publications than too few

    I'm totally with you on this one.

  2. I know that some programs demand x number of publications before a student can graduate. I do not necessarily agree with this, but for these students quantity understandably matters more than quality.

    For a PI, one high-impact paper is often preferable to two smaller ones. For students and postdocs, first authorship on a low-impact paper trumps auxiliary status on a high-impact paper. When a group head decides between one big or several small publications, he is morally obligated to consider the careers and labors of his group members.

  3. It seems like papers can be of questionable quality for more than one reason: they can be on unpopular topics, or they can be incomplete.

    This may be somewhat field-specific. In Cognitive Science, clean experiments are exceptionally difficult to run, and being confident of a result usually requires a number of experiments, often using different methods. Nonetheless, you see a lot of half-finished studies -- papers that contain one or two flawed yet suggestive experiments.

    This doesn't necessarily prevent them from being cited. If you publish a bad paper on a popular topic, you're likely to be cited, if only in the context of explaining how incomplete/wrong you are. Whether that means the paper was found useful is open to interpretation, but the point is that the citation index can't tell the difference.

    I'd personally rather see fewer, higher-quality papers. This may be partly because our papers are long and often difficult to write. They average 20-30 pages, 2/3 of which is discussion, theory and lit-review. They often take longer to write than it took to run the experiments, and even reading takes time (it usually takes me about an hour to read a paper, and there are dozens of relevant -- but not necessarily very useful -- papers published every month).

  4. @ Dr. Girlfriend

    For a PI, one high-impact paper is often preferable to two smaller ones. For students and postdocs, first authorship on a low-impact paper trumps auxiliary status on a high-impact paper. When a group head decides between one big or several small publications, he is morally obligated to consider the careers and labors of his group members.

    This is very true. This is where you see how sensitive the PI is to the needs of the group members. A good PI will weigh the chances of a big splashy paper realistically (they also take very long to review) and act accordingly (decide to fight it out or go for a more specialized venue).

    If something should be one comprehensive and coherent paper in a very good journal, I think it is unethical to cut it into several little ones just so the student would have more papers. I think the PI's primary duty is to help delineate a project for each student or postdoc where the student/postdoc has ownership of a significant enough chunk of the project so that a sufficient number of first-author pubs should not be in question. Sometimes, unfortunately, PIs overhire (hire 3 people where 1 would do, to get the work done faster) and then there is understandably the issue of ownership of a significant enough chunk of work...

    But I am going on a tangent. Prodigal's post was about "citedness" of a paper vs usefulness, and I think a good criterion is that, when people have something important to say, they should say it (publish it).

  5. GWW, it sounds like cognitive science is really different from my field. In my field, it is difficult to get anything half-done published (which is, I suspect, a large cause of data left to rot in notebooks) unless it is truly, truly groundbreaking. I am often asked for additional data by referees, and I know that is even more common in the life sciences. There are a lot of shorter communications published in my area, but these are usually a complete story that is not long enough for a full paper. Perhaps the situation you are describing could be cleaned up with tighter refereeing? Do all cog sci papers really need such extensive lit review to be understood (displaying my ignorance here) or is it just the culture?

    I agree with you that citation count is not a marker of paper quality, although arguably papers with 0 citations are less useful than papers with the average number of citations for the field.

    Dr. G, I strongly disagree with programs that require a certain number of papers, since that has a large impact on the type of projects students work on. I agree with GMP that a professor has to take into account the career goals of her/his students, but that can't take precedence over properly communicating the results. If a project that 2 people are working on only produces enough data for one publication, that is unfortunate (and either really bad luck or poor planning), but that doesn't make it right to divide one publication into 2 really weak papers. In any case, I don't think this is a driver for the "avalanche of papers" decried in the CHE article.

    I still stand by my statement that it is better for data to get out of people's notebooks so others can benefit, even if it means more "useless" papers get published.

  6. By "half-finished" I mean papers that simply leave too many alternative explanations open. Sometimes there are simply obvious follow-up studies that need to be run and wouldn't take that long and even the authors acknowledge this, but they weren't run.

    How much discussion is required depends a bit on exactly the subject matter. Long discussion sections are the norm in language research (what I study); they're less common in some other areas (like vision).

    Some of it is no doubt cultural, in that one could always decide to have a journal without discussion sections. The issue is that our results are usually not easy to interpret, since we largely study constructs that can't be directly detected, using instruments that aren't fully understood. It's probably telling that many people start reading a paper with the discussion section and come back to the results later if at all. So people clearly find them useful.
