Rapid Responses to:

RESEARCH:
Philip M Davis, Bruce V Lewenstein, Daniel H Simon, James G Booth, and Mathew J L Connolly
Open access publishing, article downloads, and citations: randomised controlled trial
BMJ 2008; 337: a568

Rapid Responses published:

Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion
Stevan Harnad (1 August 2008)
Word is still out: Publication was premature
Gunther Eysenbach (1 August 2008)
Reputation for open access
Shafqat Inam (1 August 2008)
Who cares for citation advantage?
Nicola Latronico (1 August 2008)
Re: Word is still out: Publication was premature
Trish Groves (2 August 2008)
Not a Response: A correction on declared interests
Stevan Harnad (2 August 2008)
Advantage of randomized experimental design
William H. Walters (3 August 2008)
Early-Release Study Needs Both Randomized and Self-Selection Controls
Stevan Harnad (3 August 2008)
Internal control variables were present / trial registration
Gunther Eysenbach (5 August 2008)
Authors' Response
Philip M. Davis, Bruce V Lewenstein, Daniel H Simon, James G Booth, and Mathew J L Connolly (6 August 2008)
Children in the Sandbox
Stevan Harnad (6 August 2008)
Re: Authors' Response
James E Till (8 August 2008)
Open questions for the author / comment on Harnad
Gunther Eysenbach (9 August 2008)
Author's Second Response to Eysenbach
Philip M. Davis, Bruce V Lewenstein, Daniel H Simon, James G Booth, and Mathew J L Connolly (12 August 2008)
Access is more important than citation
Giulio Bognolo (12 August 2008)
Open access journals: publication costs deter junior academics
Gabriele Pollara (14 August 2008)
On Eggs and Citations
Stevan Harnad (29 August 2008)

Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion 1 August 2008
Stevan Harnad,
Professor
University of Southampton SO17 1BJ

Davis, PM, Lewenstein, BV, Simon, DH, Booth, JG, & Connolly, MJL (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337: a568
Overview (by SH):

Davis et al's study was designed to test whether the "Open Access (OA) Advantage" (i.e., more citations to OA articles than to non-OA articles in the same journal and year) is an artifact of a "self-selection bias" (i.e., better authors are more likely to self-archive or better articles are more likely to be self-archived by their authors).

The control for self-selection bias was to select randomly which articles were made OA, rather than having the author choose. The result was that a year after publication the OA articles were not cited significantly more than the non-OA articles (although they were downloaded more).

The authors write:
"To control for self selection we carried out a randomised controlled experiment in which articles from a journal publisher’s websites were assigned to open access status or subscription access only"
The authors conclude:
"No evidence was found of a citation advantage for open access articles in the first year after publication. The citation advantage from open access reported widely in the literature may be an artefact of other causes."
Commentary:

To show that the OA advantage is an artefact of self-selection bias (or any other factor), you first have to produce the OA advantage and then show that it is eliminated by eliminating self-selection bias (or any other artefact).

This is not what Davis et al did. They simply showed that they could detect no OA advantage one year after publication in their sample. This is not surprising, since most other studies don't detect an OA advantage one year after publication either. It is too early.

To draw any conclusions at all from such a 1-year study, the authors would have had to do a control condition, in which they managed to find a sufficient number of self-selected self-archived OA articles (from the same journals, for the same year) that do show the OA advantage, whereas their randomized OA articles do not. In the absence of that control condition, the finding that no OA advantage is detected in the first year for this particular sample of journals and articles is completely uninformative.

The authors did find a download advantage within the first year, as other studies have found. This early download advantage for OA articles has also been found to be correlated with a citation advantage 18 months or more later. The authors try to argue that this correlation would not hold in their case, but they give no evidence (because they hurried to publish their study, originally intended to run four years, three years too early).

(1) The Davis study was originally proposed (in December 2006) as a study intended to cover 4 years:
Davis, PM (2006) Randomized controlled study of OA publishing (see comment)
It has instead been released after a year.

(2) The Open Access (OA) Advantage (i.e., significantly more citations for OA articles, always comparing OA and non-OA articles in the same journal and year) has been reported in all fields tested so far, for example:
Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
(3) There is always the logical possibility that the OA advantage is not a causal one, but merely an effect of self-selection: The better authors may be more likely to self-archive their articles and/or the better articles may be more likely to be self-archived; those better articles would be the ones that get more cited anyway.

(4) So it is a very good idea to try to control methodologically for this self-selection bias: The way to control it is exactly as Davis et al have done, which is to select articles at random for being made OA, rather than having the authors self-select.

(5) Then, if it turns out that the citation advantage for randomized OA articles is significantly smaller than the citation advantage for self-selected OA articles, the hypothesis that the OA advantage is all or mostly just a self-selection bias is supported.

(6) But that is not at all what Davis et al. did.

(7) All Davis et al did was to find that their randomized OA articles had significantly higher downloads than non-OA articles, but no significant difference in citations.

(8) This was based on the first year after publication, when most of the prior studies on the OA advantage likewise find no significant OA advantage, because it is simply too early: the early results are too noisy! The OA advantage shows up in later years (1-4).

(9) If Davis et al had been more self-critical, seeking to test and perhaps falsify their own hypothesis, rather than just to confirm it, they would have done the obvious control study, which is to test whether articles that were made OA through self-selected self-archiving by their authors (in the very same year, in the very same journals) show an OA advantage in that same interval. For if they do not, then of course the interval was too short, the results were released prematurely, and the study so far shows nothing at all: It is not until you have actually demonstrated an OA advantage that you can estimate how much of that might be due to a self-selection artefact!

(10) The study shows almost nothing at all, but not quite nothing, because one would expect (based on our own previous study, which showed that early downloads, at 6 months, predict enhanced citations a year and a half or later) that Davis's increased downloads too would translate into increased citations, once given enough time.
Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.
(11) The findings of Michael Kurtz and collaborators are also relevant in this regard. They looked only at astrophysics, which is special, in that (a) it is a field with only about a dozen journals, and every research astronomer has subscription access to them -- and these days also free online access via ADS -- and (b) it is a field in which most authors self-archive their preprints very early in arXiv -- much earlier than the date of publication.
Kurtz, M. J. and Henneken, E. A. (2007) Open Access does not increase citations for research articles from The Astrophysical Journal. Preprint deposited in arXiv September 6, 2007.
(12) Kurtz & Henneken found the usual self-archiving advantage in astrophysics (i.e., about twice as many citations for OA papers as for non-OA papers), but when they analyzed its cause, they found that most of it was the Early Advantage of access to the preprint, as much as a year before publication of the (OA) postprint. In addition, they found a self-selection bias (for preprints -- which is all that were involved here, because, as noted, as of publication, everything is OA): The better articles by the better authors were more likely to have been self-archived as preprints.

(13) Kurtz's results do not generalize to all fields, because in other fields it is true neither that (a) the published postprints are already 100% OA nor that (b) many authors tend to self-archive preprints before publication.

(14) However, the fact that early preprint self-archiving (in a field that is 100% OA as of postprint publication) is sufficient to double citations is very likely to translate into a similar effect, in a non-OA field, if one reckons on the basis of the one-year access embargo that many publishers are imposing on the postprint. (The yearlong "No-Embargo" advantage in other fields might not turn out to be so big as to double citations, as with the preprint Early Advantage in astrophysics, because at least there is some subscription access to the postprint, but the counterpart of the Early Advantage for the postprint is likely to be there too.)

(15) Moreover, the preprint OA advantage is primarily Early Advantage, and only secondarily Self-Selection.

(16) The size of the postprint self-selection bias would have been what Davis et al tested -- if they had done the proper control, and waited long enough to get an actual OA effect to compare against.

(17) We had reported in a pilot study that there was no statistically significant difference between the size of the OA advantage for mandated and unmandated self-archiving:
Hajjem, C. & Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Preprint deposited in arXiv January 22, 2007.
(18) We will soon be reporting the results of a 4-year study on the OA advantage in mandated and unmandated self-archiving that confirms these earlier findings: Mandated self-archiving is like Davis et al's randomized OA, and it does not reduce the OA advantage at all -- once enough time has elapsed for there to be an OA Advantage at all.

Stevan Harnad American Scientist Open Access Forum

Competing interests: None declared

Word is still out: Publication was premature 1 August 2008
Gunther Eysenbach,
Senior Scientist, Centre for Global eHealth Innovation
Toronto M5G2C4

Davis et al. [1] have published a paper containing preliminary results from their Open Access RCT. Most Open Access advocates will welcome the access and usage data, which show (unsurprisingly) a significant increase in access to and use of Open Access articles compared with non-OA articles. Other parts of the paper are more controversial (to be diplomatic). Davis et al. failed to show a citation advantage after 9-12 months, from which they conclude that “the citation advantage from open access reported widely in the literature may be an artifact of other causes.” Jumping to these conclusions after only 9-12 months is quite irritating, and the fact that the BMJ publishes “negative” results of an ongoing trial before it is even “completed” is deeply disturbing. (By the way: where is the trial registration ID and/or the link to the study protocol? What period of time was stipulated in the protocol as the primary outcome comparison point? Didn’t the BMJ commit to publishing only RCTs which have been registered, if so, where is the trial registration number? [2])

While it is legitimate to publish results of an ongoing RCT prematurely if surprisingly large, statistically significant differences between the intervention and control group emerge, it is generally not considered ethical to frame results from an ongoing RCT as negative prematurely, before it even makes sense to compare the groups at a predetermined endpoint which makes "clinical" sense. The same rules which have been developed for clinical trials should apply to health services/ health policy / science policy trials - such as this one.

In general, and on a positive note, I fully agree that RCTs are needed in this area to get to the bottom of the issue of citations and knowledge uptake (Disclaimer: I am the Principal Investigator of a similar, CIHR-funded project), although the practical difficulties of doing such RCTs, where self-archiving makes it almost impossible to generate an “uncontaminated” control group, should not be underestimated. I also congratulate Davis on having been able to convince publishers to participate in such a trial (I know from personal experience how hard this is).

However, to conclude or even imply that any citation advantage is an “artifact” after looking only at citations that occur within the same year as the cited article (9-12 months after publication) is as interesting and valid as doing an RCT on the effectiveness of a painkiller and comparing the pain between control and intervention after only one minute, concluding that the painkiller doesn’t work if there is no statistically significant difference between the groups after 60 seconds. It is unfortunate that Davis et al. did not wait for a reasonable period before reporting their citation data and implying that there are no differences; a reasonable period would be to let citations accumulate for at least 2-3 years. As it stands, the paper has the potential to mislead readers who are not familiar with citation dynamics and who do not know that usually, in the first 9-12 months, papers receive very few citations, if any at all (Davis et al. actually "forgot" to report the actual mean number of citations in both groups!). True, according to their data there are no statistically significant differences in citation counts after 9-12 months between Open Access and non-Open Access articles. But apparently there are also no citation differences between other groups of articles for which one would clearly expect such a difference, for example articles featured on the cover page or articles selected for press releases versus normal articles. Ironically, Davis himself argued elsewhere that “Coverage in the popular press is well known to amplify the transmission of scientific information to the research community. As a result, articles that receive coverage in newspapers are more likely to be cited (Phillips et al. 1991; Kiernan 2003)” [http://biology.plosjournals.org/perlserv/?request=read-response&doi=10.1371/journal.pbio.0040157#r1438].

Now his own analysis fails to show any effect of cover page or press release coverage on citations (see Table 3), as one would expect from the previous studies. What does this say about the validity of his other conclusions? Doesn’t this hint at the fact that the observation period might be much too short?

Davis et al. allowed only 9-12 months for the Open Access advantage to develop. For a citation event to occur, the cited paper has to be indexed; a citing author has to discover the cited paper and write the citing paper; and the citing paper has to be submitted, peer-reviewed, published, and indexed by Thomson/ISI. This process normally takes much longer than 9-12 months, which is why the citation rate is typically highest 1-2 years after publication. Published articles usually get very few, if any, citations in the early months immediately after publication, in particular in lower impact factor / immediacy index journals (many of these citations are “insider” citations from the authors themselves or others in their “inner circle” who have seen the manuscript before it was published). It is a notable omission from the manuscript that the authors never say what the crude mean citation count in each group actually is; reviewers and readers are deliberately left in the dark about what I think are mean citation counts of less than 1 per article on average. How “significant” is a lack of difference at this point?

What surprised (and bothered) me especially is that Davis et al. cite my PNAS data published in PLoS Biology [3] to justify their short observation period. While I did indeed find a statistically significant difference after only 4-10 months (1.2 citations in the non-OA group versus 1.5 citations in the OA group), most citations appeared after 10-16 months (4.5 versus 6.4) and 16-22 months (8.9 versus 13.1). Finding a significant difference after a short period of time certainly justifies reporting such differences. But does finding no differences in an even shorter period of time (9-12 months) justify a publication blaring out negative results implying that previous research reported artifacts? What Davis et al. fail to acknowledge is that PNAS has a much higher impact factor than any of the APS journals (by a factor of 3), hence a higher citation rate, and also a high immediacy index. This alone (leaving aside any other possible differences due to the different nature of the disciplines) is reason enough to question their comparison with PNAS and their argument that “our time-frame is more than sufficient to detect a citation advantage, if one exists”.
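
As a rough illustration of the figures quoted above, the short Python sketch below computes the OA-to-non-OA ratio and the absolute gap in mean citations for each of the three PNAS follow-up windows. The numbers are the ones reported in this response; the variable and function names are invented for the illustration, and nothing here reproduces the statistical analysis in either paper.

```python
# Mean citations per article for the PNAS cohort, as quoted in this response:
# follow-up window (months after publication) -> (non-OA mean, OA mean).
# All names below are illustrative only.
PNAS_MEANS = {
    "4-10": (1.2, 1.5),
    "10-16": (4.5, 6.4),
    "16-22": (8.9, 13.1),
}


def summarize(means):
    """Return {window: (OA/non-OA ratio, OA minus non-OA gap)}, rounded."""
    summary = {}
    for window, (non_oa, oa) in means.items():
        summary[window] = (round(oa / non_oa, 2), round(oa - non_oa, 1))
    return summary


if __name__ == "__main__":
    for window, (ratio, gap) in summarize(PNAS_MEANS).items():
        print(f"{window} months: ratio {ratio:.2f}, gap {gap:.1f} citations")
```

On these figures the absolute gap grows from 0.3 to 4.2 citations per article between the first and third windows, which is the sense in which an early cut-off leaves little room for a difference to show.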

My final concern is the issue of contamination. The more articles in the control group are (self-archived) Open Access (i.e. “contaminate” the control group), the more difficult it will be to show a difference between the groups. My PNAS article cohort is from 2004; Davis's article cohort is from 2007. Considerable headway has been made on the self-archiving front in the past couple of years, with several funding agencies mandating or strongly encouraging self-archiving, so one might assume that there are different “contamination” (i.e. self-archiving) rates in the control “non-OA” group, with higher contamination rates in the Davis study. However, Davis's data in that respect are surprising. Davis says there were only 20 self-archived articles in his total sample, which is a suspiciously low self-archiving rate of only 1.2% (with an unreported contamination rate in his control group), while my PNAS sample had 10.6% of all articles in the control group self-archived. What Davis et al. unfortunately fail to report is when the searches for self-archiving were done. The low self-archiving rate suggests to me that this was perhaps tested only once, right after publication, rather than continuously after publication. Presumably the self-archiving rate increases over time, and it is not clear to me how this was handled or controlled for.
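
The contamination argument can be made concrete with a deliberately simple sketch. It assumes a hypothetical model, not anything stated by Davis et al. or in this response: every article in the intervention arm is OA and earns a fixed multiple of the citations of a non-OA article, a given share of the control arm is self-archived and earns the same multiple, and self-selection is ignored. The function name, the assumed effect size of 1.5, and the use of the 1.2% overall rate as a stand-in for the unreported control-group rate are all assumptions made for illustration.

```python
def observed_ratio(true_ratio, contamination):
    """Expected OA/control citation ratio when part of the control arm is OA.

    Hypothetical model: an OA article earns `true_ratio` times the citations
    of a non-OA article; a fraction `contamination` of the control arm is
    self-archived and gets the same boost. Self-selection is ignored.
    """
    if true_ratio <= 0:
        raise ValueError("true_ratio must be positive")
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    # Control-arm mean, in units of the non-OA mean.
    control_mean = (1.0 - contamination) + contamination * true_ratio
    return true_ratio / control_mean


if __name__ == "__main__":
    assumed_true_ratio = 1.5  # hypothetical effect size, not from any study
    for share in (0.0, 0.012, 0.106, 0.5):
        ratio = observed_ratio(assumed_true_ratio, share)
        print(f"contamination {share:.3f}: observed ratio {ratio:.3f}")
```

Under these assumptions the observed ratio shrinks steadily towards 1 as the contaminated share of the control arm grows, so the same underlying effect becomes harder to detect.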

So, what is the bottom line of the Davis study? The access data of the RCT are certainly consistent with other postulated components of the “open access advantage” [4]. As I (and others) have argued previously, the OA advantage goes beyond citations as a crude measure of uptake within the “inner circle” of the scientific community, and includes an “end user uptake advantage” and a “cross-discipline fertilization advantage”, i.e. knowledge uptake by others beyond the “inner circle” of scientists working in a given discipline. Looking at Esposito’s nautilus model [5], one could argue that the open access advantage in fact happens primarily in the more “peripheral” layers of the nautilus, rather than the “innermost” circle of researchers. But it is those individuals in the outer layers who are more likely to cite an article later than those “insiders”. And for the public or other knowledge end users (policy makers, physicians), “citations” are an unsuitable metric anyway.

In summary, this would have been an important and much more credible paper if it had been published in 2-3 years rather than through a salami approach.

Also worrisome is the fact that this premature publication makes it possible to sabotage the RCT, making interpretation of future results from this RCT more difficult. Imagine Davis et al. were to find, in their follow-up report, a significant difference between citation counts of OA articles and non-OA articles (or again no difference). Who can then rule out that these additional citations one way or the other aren’t intentionally produced by Open Access advocates (or critics), who now deliberately start citing preferentially those green-lock-marked articles from Davis’ dataset (or vice versa)?

References

1. Davis P et al. Open access publishing, article downloads, and citations: randomised controlled trial. BMJ 2008;337:a568. DOI: 10.1136/bmj.a568

2. Laine et al. Clinical Trial Registration. BMJ 2007;334:1177-1178. DOI: 10.1136/bmj.39233.510810.80

3. Eysenbach G. Citation Advantage of Open Access Articles. PLoS Biology 2006;4(5):e157. DOI: 10.1371/journal.pbio.0040157

4. Eysenbach G. The Open Access Advantage. J Med Internet Res 2006;8(2):e8. URL: http://www.jmir.org/2006/2/e8/ DOI: 10.2196/jmir.8.2.e8

5. Esposito JJ. Open Access 2.0: Access to Scholarly Publications Moves to a New Phase. Ann Arbor, MI: Scholarly Publishing Office, University of Michigan, University Library, vol. 11, no. 2, Spring 2008. URL: http://hdl.handle.net/2027/spo.3336451.0011.203

Competing interests: I am editor of an open access journal and principal investigator of the project "Impact of open access on knowledge translation", funded by the Canadian Institutes of Health Research (CIHR)

Reputation for open access 1 August 2008
Shafqat Inam,
Medical Student
University of New South Wales

While randomized controlled trials remain the gold standard for testing certain hypotheses, in this case the random manipulation of availability without subscription may have distorted the true effect of the open access model.

Links from web searches and other websites are created over time; hence, once journals have established their open access status, websites indexing free journals can create links and direct readers towards them. Randomized open access status negates this effect.

Competing interests: None declared

Who cares for citation advantage? 1 August 2008
Nicola Latronico,
Associate Professor
University of Brescia, 25123 Brescia, Italy

Davis et al. conclude that free access to published papers does not confer any particular benefits compared to subscription access. As proof, there was no citation advantage for open access articles in the first year after publication. The question is: “No benefits for whom?” It may well be that journals’ editors and researchers have no advantage, but doctors certainly do, provided the research is clinically relevant. Having access to the full text to check “Table 1” and see if patients are similar to those you are caring for, analyzing the methods to see if you can easily adopt the strategy or treatment, evaluating graphics and tables to get a clear-cut idea of the results, and looking at the discussion for an explanation of the results and the limitations of the study are all different from reading just the abstract. As usual, deciding whether something is important or not critically depends on who is the judge.

Competing interests: None declared

Re: Word is still out: Publication was premature 2 August 2008
Trish Groves,
Deputy editor
BMJ

Stevan Harnad and Gunther Eysenbach both point out, helpfully, that Davis et al have reported here an interim analysis of their RCT of open access. We should have made this clearer in the paper, and no doubt the study's authors will elaborate on this in a further response.

Eysenbach also asks "Didn’t the BMJ commit to publishing only RCTs which have been registered, if so, where is the trial registration number?"

Yes, the BMJ has committed to this, and is an active supporter of the trial registration movement. Indeed, we will not send a clinical trial report for external peer review until its abstract states the trial registration number and the name of the register (http://resources.bmj.com/bmj/authors/types-of-article/research).

The RCT by Davis et al did not have to be registered, however, because it did not include human participants. See, for instance, the International Committee of Medical Journal Editors uniform requirements which say at http://www.icmje.org/#clin_trials

"The ICMJE believes that it is important to foster a comprehensive, publicly available database of clinical trials. The ICMJE defines a clinical trial as any research project that prospectively assigns human subjects to intervention or concurrent comparison or control groups to study the cause-and-effect relationship between a medical intervention and a health outcome. Medical interventions include drugs, surgical procedures, devices, behavioral treatments, process-of-care changes, and the like."

Competing interests: I'm the BMJ's senior research editor and was involved in the peer review of this study by Davis et al

Not a Response: A correction on declared interests 2 August 2008
Stevan Harnad,
Professor
University of Southampton, SO17 1BJ

This is not another response, just a request that you correct my declaration of interests to: "OA Advocate and co-author of several articles reporting OA citation advantage" on my already posted Response.

Thank you, Stevan Harnad

Competing interests: OA Advocate and co-author of several articles reporting OA citation advantage

Advantage of randomized experimental design 3 August 2008
William H. Walters,
Dean of Library Services and Associate Professor of Social Sciences
Menlo College (Atherton, CA 94027)

Stevan Harnad writes, "(9) If Davis et al had been more self-critical, seeking to test and perhaps falsify their own hypothesis, rather than just to confirm it, they would have done the obvious control study, which is to test whether articles that were made OA through self-selected self-archiving by their authors (in the very same year, in the very same journals) show an OA advantage in that same interval."

Davis et al. used a random process to determine which papers would be included in the OA group. In my opinion, their use of a randomized experimental design is a clear methodological advancement over previous work on this topic. I therefore find it amazing that Dr. Harnad argues in favor of an experimental design in which authors decide whether their papers will be OA or not. What's next -- pharmaceutical studies in which patients decide whether they'll get the experimental drug?

Competing interests: None declared

Early-Release Study Needs Both Randomized and Self-Selection Controls 3 August 2008
Stevan Harnad,
Canada Research Chair in Cognitive Sciences
Université du Québec à Montréal, Montréal, Québec, Canada H3C 3P8

William H. Walters writes:
"randomized experimental design is a clear methodological advancement over previous work on this topic. I therefore find it amazing that Dr. Harnad argues in favor of an experimental design in which authors decide whether their papers will be OA or not."
Dr. Walters has misunderstood the methodological point. If one tests over a short interval that may be too early to display the OA advantage, a self-selection control is needed in addition to the randomized control, not instead -- in order to demonstrate that there is a detectable self-selection advantage at all in such a short interval. Otherwise the study has to wait till an interval has been reached for which there are prior studies or norms demonstrating the OA effect.

By way of analogy, if you were trying to show, with randomized trials, that lung cancer was a self-selection (life-style spectrum) effect in smokers, rather than a causal effect of smoking, and you cut off the study with a null effect a year (or even a decade) after smoking had been imposed randomly, you would also have to have a self-selected smoker control group (or prior time-based norms) to test whether a year (or even a decade) was long enough to show any significant increase in lung cancer at all. Otherwise the premature study shows nothing.


Competing interests: OA Advocate, co-author of several articles on OA Advantage, and principal investigator on study funded by the Canadian Social Science and Humanities Research Council (SSHRC) to compare the size of the OA Advantage for mandated versus self-selected self-archiving.

Internal control variables were present / trial registration 5 August 2008
Gunther Eysenbach,
Senior Scientist, Associate Professor
Toronto M5G2C4

Stevan Harnad's critique about the lack of controls is confusing and also wrong. There is actually a self-selection control variable in the Davis study: "self-archived" papers, which (in line with Davis' own argument, with which I concur) tend to be selectively archived because they are the "better" studies. This is why previous studies which simply compared crude citations of self-archived (openly accessible) articles vs non-self-archived articles without adjusting for any confounders (i.e. Lawrence's, Harnad's and others' studies) did see a citation advantage of self-archived studies (while my PLoS study, a prospective cohort study adjusting for multiple confounders, did not).

In fact, there are several "control" variables in Davis' study which SHOULD have been independent predictors for citations (articles with press release, articles on the cover page, self-archived articles). The fact that these are NOT significant predictors for citations (one is even a negative predictor) leaves lingering questions about the internal validity of this study; more specifically, they are evidence of an insufficient follow-up time. At the very least, Davis should have delayed publication until other variables which are expected predictors for citations had become significant predictors.

For a full critique and questions for the author see also

http://gunther-eysenbach.blogspot.com/2008/07/phil-davis-open-access-publishing.html

----

While I welcome this RCT and have no doubt about the integrity and meticulousness of these authors, I feel that the publication sets a bad precedent and is also unfair to those investigators who go through trial registration, adhere to their protocols, and avoid reporting "negative" "interim" results after a follow-up time that is clearly insufficient.

The explanation of the BMJ regarding trial registration ("not necessary because no 'humans' were randomized") is unsatisfactory (and I believe it is semantics to say I randomize "publications" versus "authors". With the same argument we could randomize "cars" rather than "drivers", or "hospital beds" rather than "patients"...). Semantics aside, the "spirit" of trial registration is to provide a mechanism to prevent a-posteriori modifications in analysis strategies as - as we all know - any data can be "fitted" post-hoc to support almost any hypothesis. The issue is particularly relevant if the primary endpoint is changed. The same quality standards that exist for clinical trials should also apply to health services / health policy / science policy type of trials.

Imagine this would have been a trial conducted by a pharmaceutical company to prove equivalence of their drug against a competitor drug which is widely regarded as superior. Would the BMJ have allowed publication of a "negative" trial which concludes that the competitor drug is NOT superior (as previous evidence and common sense suggest) without going through a rigorous process that ensures that a) the follow-up time is sufficient and b) the follow-up time was defined a-priori?

Competing interests: Editor of the Journal of Medical Internet Research, an open access journal, and PI of a project to evaluate the "Impact of Open Access on Knowledge Translation"

Authors' Response 6 August 2008
Philip M. Davis,
Graduate student
Dept. of Communication, Cornell University,
Bruce V Lewenstein, Daniel H Simon, James G Booth, and Mathew J L Connolly

Lack of self-selection control

Professor Harnad comments that we should have implemented a self-selection control in our study. Although this is an excellent idea, it was not possible for us to do so because, at the time of our randomization, the publisher did not permit author-sponsored open access publishing in our experimental journals. Nonetheless, self-archiving, the type of open access Prof. Harnad often refers to, is accounted for in our regression model (see Tables 2 and 3).

Insufficient timeframe to detect citation effects

We recognize that there was a tradeoff in choosing to submit our paper for publication with only one year of post publication citation data. We decided it was worthwhile to report preliminary results, rather than wait for more citation data, because of the importance of the issue and because of the stark contrast between our results and those in prior studies. For example, Eysenbach detected more than a two-fold difference in the incidence of open access articles being cited in the first 4 to 10 months after publication.[1] Based on the effect size reported in his and other previous studies, we should have seen a significant open access effect by the end of the first year.

To further assess concern of insufficient timeframe, we have gone back and reexamined the issue with additional months of citation data. Since our manuscript was submitted to BMJ (with citation data from 2 January, 2008), we have run several update analyses. As of 3 August, 2008 (15 to 18 months after article publication) the effect of randomized open access on citations remains insignificant (Incident Rate Ratio = 1.07, 95% confidence interval 0.95 to 1.20, P=0.23). Open access and subscription-access articles both have an average of 3.8 citations. In sum, we still find no open access effect on citations. Nonetheless, we plan to gather more citation data for these two sets of articles, and reexamine this issue again, after allowing even more time to pass.
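As a rough consistency check on the figures quoted above (a sketch, not part of the authors' analysis), the reported rate ratio and its 95% confidence interval can be converted back into an approximate standard error and two-sided p-value. It assumes a normal approximation on the log scale, and the function name is hypothetical. Because the published numbers are rounded, the result should only land in the same range as the reported P=0.23, not match it exactly.

```python
import math

def p_value_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Approximate SE, z and two-sided p implied by a ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    # Width of the CI on the log scale is 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Test statistic for the null hypothesis ratio = 1 (log ratio = 0).
    z = math.log(ratio) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Figures as quoted in the response: IRR 1.07, 95% CI 0.95 to 1.20.
se, z, p = p_value_from_ratio_ci(1.07, 0.95, 1.20)
print(f"SE(log IRR) ~ {se:.3f}, z ~ {z:.2f}, two-sided p ~ {p:.2f}")
```

The interval includes 1, so the estimate is compatible with no effect, but it is also compatible with a citation increase of up to about 20%.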

To summarize, we believe that our research provides strong evidence that open access increases the dissemination of scientific articles, as indicated by our download results. However, we find no evidence of an open access citation effect, even after incorporating six additional months of citation data. There are many societal benefits to making the scientific literature freely available beyond the research community; a citation advantage may not be one of them.

Notes:

[1] Eysenbach G. Citation Advantage of Open Access Articles. PLoS Biology 2006;4(5).

Competing interests: Corresponding author

Children in the Sandbox 6 August 2008
Stevan Harnad,
Professor
University of Southampton, SO17 1BJ

A short, simple guide for readers who may be perplexed about whether Eysenbach is agreeing or disagreeing with Harnad:
(1) If a study reports that X is an artifact of Y, it has to show that X occurs, and that if Y is controlled, X disappears (or diminishes).

(2) Here, X is the higher number of citations for OA articles compared to non-OA articles, Y is self-selected OA self-archiving, and the control of Y is to randomize OA archiving.

(3) The Davis study found that there was no significant difference between the number of citations for OA compared to non-OA articles for their 1-year sample (i.e., that no X occurs within a year when Y is controlled).

(4) But what the Davis study failed to show is that there is a significantly higher number of citations for OA compared to non-OA articles for their 1-year sample (i.e., that X does occur within a year when Y is not controlled).

(5) As a consequence, the Davis et al. study has done nothing but confirm the many other studies of the OA/citation correlation, most of which likewise find that there is no significant difference between the number of citations for OA compared to non-OA articles in the first year (i.e., no detectable X at all within the first year).

(6) It is for this reason, and this reason alone, that the release of the Davis study's null findings was premature and uninformative.

(7) (My own guess as to why Eysenbach is likewise declaring that the Davis study was premature, for precisely the same reasons, yet adding that "Stevan Harnard's [sic] critique about the lack of controls is confusing and also wrong" would be that Eysenbach is continuing to promote his own published report on the significantly higher number of citations for OA compared to non-OA articles as being the first and only methodologically sound demonstration of the OA citation advantage among the many studies listed in the Bibliography of studies on the effect of open access and downloads on citation impact.)
Eysenbach G (2006) Citation Advantage of Open Access Articles. PLoS Biology 4(5) p. e157

Harnad S (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology Responses (16 May 2006). Full version.
(8) (The Davis et al. study has also confirmed that although there is no significant difference between the number of citations for OA compared to non-OA articles in the first year (i.e., no X at all within the first year), there is a significantly higher number of downloads for OA compared to non-OA articles in the first year. Early downloads in the first six months have been found to be correlated with a higher number of citations after 18 months or more.)


Competing interests: OA Advocate, co-author of several articles on OA Advantage

Re: Authors' Response 8 August 2008
James E Till,
Senior Scientist Emeritus
Ontario Cancer Institute, University Health Network, Toronto ON M5G 2M9 Canada

This randomized trial has already provided additional strong evidence that open access (OA) increases the dissemination of scientific articles. The enhanced transfer of knowledge to those whose access currently is restricted because of price barriers is an important example of the advantages of OA.

The absence of evidence for a citation advantage after a second interim analysis, done after incorporating six additional months of citation data, still does not provide strong evidence against an OA citation advantage. If, after 2 to 3 years, the trial continues to yield no significant evidence for a citation advantage, then alternative explanations for the OA citation advantage found by others would need to be considered more seriously.

If the self selection postulate does gain support, what's the problem? If authors selectively choose to make their best articles OA, or if highly cited authors prefer OA options, this means that higher quality articles are being self selected for wider dissemination. Why not regard this as yet another example of the advantages of OA?

Competing interests: The author has selectively chosen to make his most recently published articles openly accessible.

Open questions for the author / comment on Harnad 9 August 2008
Gunther Eysenbach,
Senior Scientist, Associate Professor
Toronto M4L3Y7

Dr Davis continues to justify his short observation period by citing the PLoS paper. But I hope Davis still answers the 8 questions I posed for him here and on my blog, in particular question #8: "PNAS has a very high impact factor of >10 (i.e. a very high citation rate). What are the impact factors of the journals you included and how would the different citation rates affect the type II error?". Surely, one would expect to see a statistically significant citation advantage earlier in a journal that has higher citation rates?

As to Dr Harnad's comment that "Eysenbach is continuing to promote his own published report on the significantly higher number of citations for OA compared to non-OA articles as being the first and only methodologically sound demonstration of the OA citation advantage": This is obvious nonsense. I won't repeat here my arguments why the PLoS study represented an advance at the time (just to mention the keywords "prospective" and "adjusting for multiple confounders" - for an educational piece on this see http://www.webcitation.org/5ZttrqKAz ).

Science often progresses in small steps, each study adding something to prior studies. Early studies (Lawrence, Harnad, Antelman etc) were completely uncontrolled, retrospective comparisons between crude citation rates of openly accessible articles (perhaps stratifying for a single confounder, but not for multiple confounders). The PLoS study was the first prospective (not retrospective, as Davis misleadingly writes) cohort study using multivariate regression methods to adjust for multiple confounders associated with study quality and author impact, which previous studies had not adjusted for (the PLoS study also showed that by adjusting for these multiple confounders, any citation advantage for self-archived (GREEN) open access articles disappeared -- which is why Dr Harnad dislikes the study). Dr Davis' study is the first published RCT (with others ongoing - including my own RCT), which gets rid of self-selection bias and residual confounding.

In epidemiology, evidence of risk factors or preventative factors from cohort studies - which remain predictors for the outcome even if one adjusts for multiple confounders - often motivates and justifies the conduct of randomized trials. As such I still stand behind the results of the PLoS study as one important step to generate objective evidence beyond any advocacy rhetoric.

Competing interests: Editor of the Journal of Medical Internet Research (http://www.jmir.org), an open access journal

Author's Second Response to Eysenbach 12 August 2008
Philip M. Davis,
Graduate student
Dept. of Communication, Cornell University,
Bruce V Lewenstein, Daniel H Simon, James G Booth, and Mathew J L Connolly

Dr. Eysenbach has expressed repeated concerns over the negative results presented in our paper [1] and rightly questions, first, whether we have waited long enough to observe differences, and secondly whether we have the statistical power to detect citation differences.

In his own study of author-sponsored open access publishing in the journal PNAS [2], Dr. Eysenbach found large and significant differences in the odds of being cited very shortly after publication (Odds Ratio: 1.7 after 0-6 months; 2.1 after 4-10 months; 2.9 after 10-16 months).

Considering the magnitude of these results (and those of other studies) we should have seen differences in the data -- even if not significant -- early in our study. Our regression analysis performed at 9-12 months (see Table 3) indicates that the direction of the Open Access effect goes in the opposite direction, reducing the incidence rate of being cited by about 5% (although this is not statistically significant). That the effect is biased against randomized open access articles provides additional confirmation that there is no positive effect of open access on article citations. As we mentioned in an earlier post, we are still unable to see an open access effect with an additional 6 months of data. While there is a risk of a Type II error, that is, failing to detect an Open Access effect on citations when one really exists, there is little evidence of committing such an error in our study.

Impact Factors and Statistical Power
Based on our sample size and the standard variation in citations of APS journals, we calculated that we would be able to detect significant differences of about 20% (see the section Sample Size). While some of the 11 APS journals under investigation are cited more heavily than others, the impact factors range from as low as 3.632 for the Journal of Applied Physiology to 29.600 for Physiological Reviews. In comparison, PNAS = 9.598 for Dr. Eysenbach’s observational study.

Merits of Randomized Controlled Trial over Observational Study
While Dr. Eysenbach’s study of PNAS was exceptional in that it attempted to control for competing explanations for a citation advantage, there were variables left out of his analysis that are well known to be significant predictors of future citations, such as length of article, type of article (e.g. review, method, empirical), and number of references [3-5]. In addition, his study fails to account for indicators of novelty and newsworthiness -- factors which may have led some authors to pay the additional page charges ($1,000) for immediate open access status -- such as whether the article was featured on the cover of the journal, received a press release, or was picked up by external scientific news sources and the lay press [6-8]. For example, open access articles published in Eysenbach’s sample of PNAS articles were more than twice as likely to be featured on the front cover of the journal, and nearly twice as likely to be picked up by the media as subscription-based articles [9]. A randomized controlled trial (the method we implement in our study) eliminates the chance that confounding variables -- whether they can be measured or not -- can bias the results.

Notes:
1. Davis PM, Lewenstein BV, Simon DH, Booth JG, Connolly MJL. Open access publishing, article downloads, and citations: randomised controlled trial. BMJ 2008;337:a568. http://dx.doi.org/10.1136/bmj.a568

2. Eysenbach G. Citation Advantage of Open Access Articles. PLoS Biology 2006;4(5). http://dx.doi.org/10.1371/journal.pbio.0040157

3. Stewart JA. Achievement and Ascriptive Processes in the Recognition of Scientific Articles. Social Forces 1983;62(1):166-189.

4. Baldi S. Normative versus Social Constructivist Processes in the Allocation of Citations: A Network-Analytic Model. American Sociological Review 1998;63(6):829-846.

5. van Dalen HP, Henkens K. What makes a scientific article influential? The case of demographers. Scientometrics 2001;50(3):455-482.

6. Phillips D, Kanter E, Bednarczyk B, Tastad P. Importance of the lay press in the transmission of medical knowledge to the scientific community. New England Journal of Medicine 1991;325(16):1180-1183. http://content.nejm.org/cgi/content/abstract/325/16/1180

7. Kiernan V. Diffusion of news about research. Science Communication 2003;25(1):3-13.

8. Chapman S, Nguyen TN, White C. Press-released papers are more downloaded and cited. Tobacco Control 2007;16:71-. http://dx.doi.org/10.1136/tc.2006.019034

9. Davis PM. Citation advantage of Open Access articles likely explained by quality differential and media effects (letter). PLoS Biology 2007. http://arxiv.org/abs/cs/0701101

Competing interests: None declared

Access is more important than citation 12 August 2008
Giulio Bognolo,
consultant cardiothoracic surgeon
The Heart Hospital, London, W1G 8PH

The article by Davis and colleagues prompted a good number of rapid responses. The majority focused on methods and the study’s power to detect a difference in citation rates. Several commentators highlighted limitations and flaws of the study, and explained why, in their view, the results may have limited importance. I agree that the methodology could have been more robust. Possibly the results would have been different if the authors had used a different strategy, or simply pursued a longer follow-up.

As a clinician, editor, and reader of medical journals, I think I still see an important message here, one that publishers, editors, and authors alike should listen to very carefully, once more. OA articles are more likely to be read than pay-per-access articles. From an author and reader perspective, this is great. The trial confirmed that OA increases readership and dissemination of information. Readers are more likely to read an article if it is open access, and this is what any editor and author should aspire to. If citation rate does not go up, hey, that may not be the end of the world after all. Is it a good way to measure the impact and importance of an article anyway? I don’t think so, and I certainly wouldn’t lose my sleep about it.

Competing interests: I am an advocate of open access publishing, and former associate editor of an open access journal

Open access journals: publication costs deter junior academics 14 August 2008
Gabriele Pollara,
Senior house officer
Royal Free Hampstead NHS Trust, London, UK

One aspect of the debate on open access journals that has been relatively neglected is the cost to publish in these. Retaining copyright of the manuscript often carries a significant upfront fee that is unaffordable to clinicians interested in doing small scale research but without significant funding in the form of a grant. Thus, whilst the discussion regarding the effects on readership and citation rates is valid and interesting, the choice to publish in either traditional subscription-based or open access journals may not always be available to everyone.

Competing interests: None declared

On Eggs and Citations 29 August 2008
Stevan Harnad,
Canada Research Chair in Cognitive Sciences
Université du Québec à Montréal, Montréal, Québec, Canada H3C 3P8

Failing to observe a platypus laying eggs is not a demonstration that the platypus does not lay eggs. You have to actually observe the provenance, ab ovo, of the little newborn platypuses, if you want to demonstrate that they are not engendered by egg-laying.

Failing to observe a significant OA citation Advantage after a year (or a year and a half -- or longer, as the case may be) with randomized OA does not demonstrate that the many studies that do observe a significant OA citation Advantage with nonrandomized OA are simply reporting self-selection artifacts (i.e., selective provision of OA for the more highly citable articles).

You first have to replicate the OA citation Advantage with nonrandomized OA (on the same or comparable sample) and then demonstrate that randomized OA (on the same or comparable sample) eliminates the OA citation Advantage (on the same or comparable sample).

Otherwise, you are simply comparing apples and oranges (or eggs and expectations, as the case may be) in reporting a failure to observe a significant OA citation Advantage in a one-year (or 1.5 year) sample with randomized OA -- along with a failure to observe a significant OA citation Advantage for nonrandomized OA for the same sample either (because the nonrandomized OA subsample was too small):

The many reports of the nonrandomized OA Citation Advantage are based on samples that were sufficiently large, and on a sufficiently long time-scale (almost never as short as a year) to detect a significant OA Citation Advantage.

A failure to observe a significant effect with small samples on short time-scales -- whether randomized or nonrandomized -- is simply that: a failure to observe a significant effect. Keep testing till the size and duration of your sample of randomized and nonrandomized OA is big enough to test your self-selection hypothesis (i.e., comparable with the other studies that have detected the effect).

Meanwhile, note that (as other studies have likewise reported), although a year is too short to observe a significant OA citation Advantage, it was long enough to observe a significant OA download Advantage -- and other studies have also reported that early download advantages correlate significantly with later significant citation advantages.

Just as mating more is likely to lead to more progeny for platypuses (by whatever route) than mating less, so accessing and downloading more is likely to lead to more citations than accessing and downloading less.

Stevan Harnad
American Scientist Open Access Forum

Competing interests: OA Advocate and co-author of several articles reporting OA citation advantage