There is a strongly held view that open access to articles and data will lead to more effective research. To this end, it would be useful if the public availability of research data could be linked to benefits for authors, such as increased citation of their work, as this would drive a virtuous cycle.

Piwowar et al. (2007) examined the citation history of 85 cancer microarray clinical trial publications [*] and its correlation with the availability of their data. They found that the 48% of trials with publicly available microarray data received 85% of the aggregate citations. They claim that publicly available data was associated with a 69% increase in citations, independent of confounding factors such as journal impact factor, date of publication, and author country of origin.
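To see how far the raw shares sit from the adjusted 69% figure, the arithmetic can be sketched in a few lines. This is only an illustration: the function name is hypothetical, the inputs are the rounded percentages quoted above, and the result is a ratio of mean citations per article, which a handful of highly cited papers can dominate.

```python
def mean_citation_ratio(share_of_articles, share_of_citations):
    """Ratio of mean citations per article: open-data group vs. the rest.

    Both arguments are fractions between 0 and 1 describing the
    open-data group (hypothetical helper, not from the paper).
    """
    mean_open = share_of_citations / share_of_articles
    mean_closed = (1 - share_of_citations) / (1 - share_of_articles)
    return mean_open / mean_closed


# Rounded shares quoted in the text: 48% of trials, 85% of citations.
ratio = mean_citation_ratio(0.48, 0.85)
print(f"Unadjusted ratio of mean citations: {ratio:.1f}x")
```

On these rounded inputs the unadjusted ratio comes out at roughly six to one, far larger than the 69% increase reported after adjustment, which suggests that much of the raw gap is carried by the other factors.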

The graph shows the number of citations received by each of the trials 24 months after publication, ranked by the impact factor of the journal they were published in. It also shows which articles had publicly available data and which did not.
Citation rates for 85 articles referencing array data
There is evidently a strong correlation between a journal’s impact factor and its data deposition policy, and both correlate strongly with citation. But what is the causal relationship here?

[*] The 85 cancer microarray trials were published before early 2003 and were identified in a systematic review by Ntzani and Ioannidis: Ntzani EE, Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet 362: 1439-1444.