
Open access – have we reached the tipping point?

Written on January 15, 2013 at 10:38 am

“Open access” only really gets interesting when users and product developers have access to a critical mass of content. I’ll leave the definition of “critical mass” vague at this point, but it probably means something like 80-90% of the most cited content in a field such as biomedical research. But for this to happen, the funding agencies across the discipline need to take hold of the reins and shape the conversion process.

[Figure: PMCOA – free full text as a proportion of all full-text content on PubMed, by search term and year]

Last year a number of key European funding sources gave their full backing to open access publishing. The UK government announced that it would require much of the country’s taxpayer-funded research to be publicly available from April 2013 onwards, and the European Commission said that it would require all work funded by its Horizon 2020 research program to be freely available. Can we now begin to imagine an STM publishing environment in which open access is the dominant business model?

As is often the case, the future is a little easier to see in the US, where since 2008 the NIH’s Public Access Policy has required scientists to submit final peer-reviewed manuscripts derived from work supported by NIH funds to the digital archive PubMed Central. Initially, compliance was an issue, but from spring 2013 the NIH will delay processing of non-competing continuation awards if publications arising from grant awards are not in compliance with the Public Access Policy.

So how much free full text access is there? The graphic above compares the number of articles available as free full text as a proportion of the total volume of full text content available over time on PubMed. Simple keywords such as “epilepsy” and “muscular dystrophy” were used to select a particular corpus, and the total numbers of articles were computed using the “Free full text[sb]” and “Full text[sb]” limiters.
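
For anyone wanting to reproduce these counts, PubMed’s E-utilities can be queried directly. The minimal sketch below (Python) uses the public esearch endpoint with the same subset filters described above; the keyword list and the year are just examples, not the exact queries behind the graphic.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term):
    """Return the number of PubMed records matching a query string."""
    params = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmode": "json"})
    with urllib.request.urlopen(f"{ESEARCH}?{params}") as resp:
        return int(json.load(resp)["esearchresult"]["count"])

def free_full_text_fraction(keyword, year):
    """Fraction of full-text articles for a keyword and year that are free full text."""
    free = pubmed_count(f"{keyword} AND free full text[sb] AND {year}[pdat]")
    total = pubmed_count(f"{keyword} AND full text[sb] AND {year}[pdat]")
    return free / total if total else 0.0

# Example keywords taken from the post; add others to build the full comparison.
for kw in ["epilepsy", "muscular dystrophy", "genomics", "leishmaniasis"]:
    print(kw, round(free_full_text_fraction(kw, 2011), 2))
```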

The fraction of available content that is free varies quite significantly across the five fields, being highest for “genomics” and the orphan disease “leishmaniasis”, and lowest for “epilepsy”. All of the trajectories increase gradually over time, reaching levels of 30-50% by 2011. The rapid fall-off after that is caused by the embargo periods of 6-12 months imposed by many publishers contributing OA content.

Why the differences between the fields? This question needs more research, but from visual inspection of the PubMed result lists it would seem that new areas such as genomics are driven more by younger journals that have fully embraced the open access model, whereas fields such as epilepsy are still dominated by society and larger commercial publishers that haven’t.

If that turns out to be the case, then the future dynamics of conversion to open access (at least in biomedicine) will be determined by the interaction between the leading funding agencies and top publishers.

This will have two interesting consequences. Firstly, as the funding agencies’ OA mandates become actively enforced, authors will turn increasingly to OA publishers in order to meet these requirements with a minimum of hassle, and journals such as PLoS One and repositories like Europe PubMed Central will flourish. Secondly, open access “contagion” will spill over more rapidly into other disciplines such as engineering and chemistry. After all, the Horizon 2020 program is not just about biomedicine!

So, have we reached the tipping point? This analysis suggests that the 80-90% target cannot be reached by incremental growth. Something more has to give.

Global competition and the demise of the Ivory Tower

Written on January 10, 2013 at 10:43 am

According to Wikipedia the term Ivory Tower, which originates in the Biblical Song of Solomon, has been widely used to describe an academic environment where intellectuals engage in pursuits that are disconnected from the practical concerns of everyday life. Those days are over.

Battelle’s annual R&D report and the National Science Foundation’s Science and Engineering Indicators report for 2012 both highlight the strong link between the growth in national economies and their corresponding R&D budgets. Once the US, UK, Japan and Germany were dominant, but their positions are now being challenged by Brazil, China, India, Singapore and South Korea as these countries spend increasing slices of their GDP on science and technology in the belief that this investment will be crucial for their future competitiveness.

[Figure: Global R&D spending]

At the same time, the R&D investments made by industry (more often than not multinational) have outstripped those being made by national governments and charities. In the US, for example, R&D spending overall is dominated by development activities, largely performed by the business sector. The business sector also performs the majority of applied research, but most basic research is conducted at universities and colleges and funded by the federal government. These centers of excellence often receive support from the multinationals, partly to allay nationalist fears (and partly to gain access to the brightest brains). For example, Schlumberger’s “Brazil Research and Geoengineering Center” in Rio de Janeiro, inaugurated in 2010, is designed to promote the integration of geosciences and engineering to improve oil production and recovery from the country’s recently discovered huge deep-water offshore reservoirs.

Indeed, recent surveys conducted by Nature and the UK’s Royal Society have shown how important international academic collaboration has become. According to the Nature report, “national boundaries are being transcended by networks of collaborators and researchers who are much more mobile than in the past”, and governments are responding to this trend by trying harder to actively manage this brain drain. Brazil’s Programa Ciência sem Fronteiras is a good example of how “hands on” these attempts can be.

The increasing reliance on academic centers to provide a major source of innovation requires research funders to be able to identify and manage the intellectual property these centers create and to focus on the fields and disciplines where they perform best. For example, use Scimago’s tools to compare Brazil’s biomedical/agricultural focus with China’s investment in engineering (shown in the co-citation graphic below).

[Figure: China – co-citation graphic showing its engineering focus]

Overall, then, economic competition will lead to increasing R&D specialisation, driving a global need for better management tools in the shape of metrics and statistics. This will in turn drive up the rate of conversion to the open access business model and shift emphasis away from the publication process as the major arbiter of quality, upstream to the allocation of funding and further downstream to assessments based on post-publication metrics such as paper-rank algorithms not much different from the ones currently used by Google. In a future post I will look more closely at the development of these metrics.
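
The post does not commit to any particular algorithm, but as a rough illustration of what such post-publication metrics might look like, here is a minimal PageRank-style power iteration over a small, invented citation graph: a paper’s score grows when it is cited by papers that are themselves highly scored. The paper identifiers and edges are made up for the example.

```python
# Toy citation network: edges point from the citing paper to the cited paper.
citations = {
    "paper_A": ["paper_C"],
    "paper_B": ["paper_A", "paper_C"],
    "paper_C": [],
    "paper_D": ["paper_A", "paper_B", "paper_C"],
}

def paper_rank(graph, damping=0.85, iterations=50):
    """Basic PageRank-style scoring of a citation graph by power iteration."""
    papers = list(graph)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for citing, cited_list in graph.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
            else:
                # A paper with no outgoing citations spreads its score evenly.
                for p in papers:
                    new_rank[p] += damping * rank[citing] / n
        rank = new_rank
    return rank

for paper, score in sorted(paper_rank(citations).items(), key=lambda kv: -kv[1]):
    print(f"{paper}: {score:.3f}")
```

Real implementations would of course weight citations by context, age and field, but the basic recursion is the same.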

Scientific publishing goes “Boink”

Written on January 9, 2013 at 11:49 am

A simplified picture of the scientific research process would include a cycle of events beginning with the identification of a new idea or hypothesis (a “known unknown”, after Donald Rumsfeld’s definition) derived from a gap analysis of the literature, followed by competitive peer review of the corresponding research proposal and the release of funding and other resources. The conclusions drawn from the research are then divulged to the scientific community after a further stage of peer review directed at assessing the validity of the methodology and establishing the anticipated impact of the work. Thus, eventually, the new data updates the consensus view (the “known knowns”) and the cycle is repeated.

[Figure: The research cycle]

Described in this way, several weaknesses and inefficiencies can be identified by anyone familiar with the way the process currently works. For example, if a funding agency has commissioned a piece of research, why should a publisher be allowed to impede the communication of its findings if the authors’ work is methodologically sound? Isn’t it important for the community to be made aware of hypotheses that are wrong as well as those that are correct?

It is true that the hierarchical filtration of papers by likely impact leads to a relatively stable ecosystem of journals based upon citation statistics, but this second round of peer review does little to help define the consensus view in a particular field. The citation of individual articles in Nature, for example, ranges over many orders of magnitude, and a significant proportion of results published in high-impact titles turn out to be biased or just methodologically flawed. [See a recent blog post.] To be able to identify the really important “known unknowns” requires access to the relevant results generated by all research, so that the conclusions can be double-checked and their importance assessed over time as a consensus view develops.

The purpose of publishing (and its underlying business model), the process of peer review and the assessment of the value of individual pieces of research (and the effective return on investment made by the funders) are all under pressure to evolve. Future articles on this blog will take a look at some of the drivers of these changes and try to answer the question “Has scientific publishing really gone ‘Boink’, or has nothing fundamentally changed…?”.

Mistakes happen: but they need to be corrected more rapidly

Written on September 20, 2012 at 12:13 pm

“There is increasing unrest in global science. The number of retractions is rising, new examples of poor oversight or practice are being uncovered and anxiety is building among researchers.” Thus spoke Jim Woodgett in a recent article published in Nature’s Worldview section, concluding that, although most mistakes are unintentional, there needs to be much greater focus on the early detection of fundamental errors in the experimental design and on correcting defective data.

Sometimes the distinction between scientific misconduct and bad experimental design is hard to make – see, for example, the recent article regarding the work of Marc Hauser, a Harvard professor accused of “fabricating” data published in the journal Cognition. But what this shows is that the gold standard of peer review cannot cope with the volume and diversity of modern scientific outputs.

These problems are not new, but they are growing in importance. Back in 2005 John Ioannidis argued, in a provocative article published in PLoS Medicine entitled “Why most published research findings are false”, that, in many current scientific fields, claimed research findings were often simply accurate measures of the prevailing preconceptions about how a particular question should be answered. That is, they were components of paradigms long overdue for a shift in the Kuhnian sense.

Ioannidis’ arguments in PLoS Medicine were largely statistical: the probability that a research claim is true depends on study power and bias, on the number of other studies tackling the same question, and on the prior likelihood that the hypothesis being tested is correct. He has since backed up this speculation by documenting the demise of a cohort of published medical “breakthroughs”, but the importance of his conceptual work is still not widely appreciated.

Similar conclusions have recently been drawn in many other fields of research by Ed Yong. He points out that positive results dominate most journals, each of which strives to present new, exciting research, while attempts to replicate these studies, especially when the findings are negative, go unpublished. There are techniques for detecting the presence of such bias in a cohort of publications, say those detailing the effects of a drug or a surgical procedure, but life would be much simpler if these distortions in the scientific literature did not occur in the first place.

Work published in the BMJ by Steven Greenberg shows graphically how the publication of inaccurate, false-positive results can generate citation networks that amplify the unfounded authority of claims, misleading researchers for years to come.

Are publishers largely to blame for this? Jeffrey Beall, Scholarly Initiatives Librarian at the University of Colorado Denver, thinks they are, claiming that, with the rise of the open access publishing model, there is now a journal willing to accept almost any article, as long as the author is willing to pay the fee. In his view, “Conventional scholarly publishers have had an important role in validating research, yet too often advocates of open access seem to overlook the importance of validation in online publishing. They promote access at the expense of quality: a shortcoming that tacitly condones the publication of unworthy scientific research.”

Perhaps, but the problem has been around since before the advent of the open access model, and is not by any means restricted to “bottom feeding” journals.

As Greenberg’s work shows so beautifully, we need to find ways of editing and annotating the citation networks upon which researchers depend. As Cardinal Thomas Wolsey once said of Henry VIII, “Be very, very careful what you put in that head because you will never, ever get it out.” In the case of the scientific literature, however, we need to find a means of doing just that!

Social networking sites reviewed…

Written on February 12, 2010 at 5:51 pm

Here is a link to a blog item from Comprendia, the Biosciences Consulting Group. It is a good review of some of the current social networking sites available to life scientists and the usage they get. As the authors say, these figures may not be totally accurate, but they do give an indication that at least some of these sites are gaining traction in the community.

Click here to see an earlier list of sites posted here last year.

On the reading habits of the digital generation…

Written on January 10, 2010 at 5:51 pm

Read a nice summary by Martin Fenner on digital reading habits across formats. It will kindle your imagination…

Welcome to Jane…

Written on January 9, 2010 at 5:52 pm

Jane stands for Journal/Author Name Estimator. None the wiser…?!

Have you recently written a paper, but you are not sure which journal to submit it to? Or maybe you want to find related articles so that you can identify a suitable referee.

Then just cut and paste the abstract of the paper into Jane and it (she?) will search Medline to find matching articles, journals and authors. It’s like eTBlast – see earlier post.

Nice tools, but you should be able to do this with PubMed or Scopus.

Text mining with GeneIndexer

Written on January 2, 2010 at 5:52 pm

GeneIndexer has been around now for a couple of years but seems to be getting some marketing dollars spent on it, with a full-page advertisement in the latest issue of Nature and a sale to the NIH Library. The tool enables researchers to reveal biological significance in a set of co-regulated or associated genes. Applications include:

  • Using keywords to identify and prioritize genes most relevant to any given research question. Keywords can be any string of words, e.g. disease names, molecular pathways, or Gene Ontology classifications.
  • Uncovering implicit, as well as explicit, functional relationships among genes: discovering new genes and proposing hypotheses above and beyond what is explicitly described in the literature.
  • Building hierarchical trees of genes in which gene subsets are clustered into functionally related groups. This allows researchers to navigate large gene collections easily and adds a new dimension to the analysis and discovery process.

Because GeneIndexer includes all of the genes contained in the Entrez Gene and OMIM databases, and uses artificial intelligence and computational linguistic techniques, rather than human curators, to identify conceptual gene relationships, it is possibly the most up-to-date and accurate system of its kind.
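
GeneIndexer’s internal algorithm is not described here, but the general flavour of latent-semantic scoring of genes against a keyword can be sketched as follows. Everything in the snippet – the gene names, the toy “literature profiles”, and the choice of scikit-learn with TF-IDF plus truncated SVD – is an illustrative assumption, not the product’s actual implementation.

```python
# Illustrative latent-semantic scoring of genes against a keyword.
# Gene names and literature snippets below are invented toy data.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

gene_literature = {
    "GENE1": "sodium channel mutation associated with seizures and epilepsy",
    "GENE2": "muscle fibre degeneration and dystrophin deficiency",
    "GENE3": "ion channel expression in cortical neurons and seizure threshold",
    "GENE4": "lipid metabolism in hepatocytes",
}

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(gene_literature.values())

# Project the term space into a small number of latent "concepts",
# so genes can match a keyword even without sharing its exact terms.
svd = TruncatedSVD(n_components=2, random_state=0)
latent = svd.fit_transform(tfidf)

query = "epilepsy"
query_vec = svd.transform(vectorizer.transform([query]))

scores = cosine_similarity(query_vec, latent)[0]
for gene, score in sorted(zip(gene_literature, scores), key=lambda kv: -kv[1]):
    print(f"{gene}: {score:.2f}")
```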

eTBLAST 3.0: a similarity-based search engine

Written on January 2, 2010 at 5:00 pm

I have only just come across eTBLAST. Conceptually, it is a bit like NCBI’s BLAST utility, which assesses sequence similarities between genes or proteins. Here, you simply input a paragraph of text and eTBLAST will return the abstracts of articles which use the same spectrum of terms. It works a little like PubMed’s Related Articles feature, except that here you can supply your own text, which could be from an unpublished article or a research proposal. New features in version 3.0 include “Find an Expert”, which identifies the authors most published on the topic of your query, and “Find a Journal”, which identifies the journals that have published most on your topic.
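
eTBLAST’s actual ranking method is not detailed here, but the underlying idea of matching a query paragraph against abstracts by their spectrum of terms can be sketched in a few lines of Python. The abstracts and the query below are invented toy examples, not real records.

```python
# Toy "same spectrum of terms" matcher: rank abstracts by cosine similarity
# of their word-frequency vectors against a query paragraph.
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercased word-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

abstracts = {
    "abstract_1": "Seizure frequency in patients treated with a novel sodium channel blocker.",
    "abstract_2": "Deep-water reservoir modelling for offshore oil recovery.",
    "abstract_3": "Channel blockers and their effect on seizure threshold in epilepsy models.",
}

query = "We studied how sodium channel blockers change seizure threshold."
query_vec = term_vector(query)

for name in sorted(abstracts, key=lambda k: cosine(query_vec, term_vector(abstracts[k])), reverse=True):
    print(name, round(cosine(query_vec, term_vector(abstracts[name])), 2))
```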

In this day and age of increasingly specialized science, it can often be a non-trivial task to identify an appropriate journal to submit to. And editors are forever trying to identify appropriate referees – so eTBLAST seems to fill a worthwhile niche.

Open access to federally funded research – a public debate

Written on January 2, 2010 at 2:53 pm

Currently, the National Institutes of Health is the only federal body that requires that research funded by its grants be made available to the public online at no charge within 12 months of publication. The US Administration is seeking views as to whether this policy should be extended to other science agencies and, if so, how it should be implemented. You can follow the discussion at the OSTP Blog.

The discussion will occur in three major phases:

  • Implementation: Which Federal agencies are good candidates to adopt Public Access policies? What variables (field of science, proportion of research funded by public or private entities, etc.) should affect how public access is implemented at various agencies, including the maximum length of time between publication and public release?
  • Features and Technology: In what format should the data be submitted in order to make it easy to search and retrieve information, and to make it easy for others to link to it? Are there existing digital standards for archiving and interoperability to maximize public benefit? How are these anticipated to change?
  • Management: What are the best mechanisms to ensure compliance? What would be the best metrics of success? What are the best examples of usability in the private sector (both domestic and international)? Should those who access papers be given the opportunity to comment or provide feedback?

A wrap up of phase one is posted here. It provides an interesting overview of how the opportunities presented by Open Access are seen by different research communities.