1. Open Access as a Paradigmatic Change in Scientific Document Dissemination
The promise of free and efficient information transfer is a cornerstone of the ideology underpinning the information society. The mathematician Andrew Odlyzko (1994) predicted a brave new world of scientific e-publishing that would be ‘dramatically cheaper’ than the traditional model based on paper journals. Almost twenty years after the publication of Odlyzko’s paper, the present seems rather different from the future he predicted. For example, there has been no dramatic decrease in the purchasing costs of e-journals. In fact, scientific publishing is considered to be one of the most profitable and successful business practices. The profit margins of scientific publishers have been estimated at between 20 and 30 cents in the dollar (Monbiot, 2011; Van Noorden, 2013, p. 427).
One could even argue that libraries are an integral part of the present publishing model, which is based on outsourcing the dissemination of scientific results, and the organization of peer review, to publishing companies. This model has been shown to be extremely challenging in economically turbulent times because it is based on the presumption that the subscription costs of scientific journals will grow continuously (Odlyzko, 2013).
What we have actually witnessed since the 1990s is an ever-increasing prevalence of paywalls, i.e., technical mechanisms separating the digital content for which one has to pay from the open Internet content. The majority of new scientific information lies behind these paywalls. While it is true that the traditional paper-based publishing model has undergone a major transformation, many aspects of the old model still exist, only in a new form. There has been an evolution, but not the kind of complete digital revolution that many commentators on scientific publishing had prophesied. This development has been described as a digital divide: although digital technology enables the free dissemination of scientific information, many economic, social and political barriers stand in the way of this development (Ragnedda and Muschert, 2013).
The Open Access (OA) movement has definitely helped the scientific community to disseminate journals to a broader audience, and at the same time it has prompted a discussion about the profits and costs involved in scientific publishing. Van Noorden (2013, p. 427; see also Laakso & Björk, 2012; Schimmer, Geschuhn, & Vogler, 2015) has argued that OA publishing seems to be more cost-effective than traditional publishing. Based on his analysis, the average cost of publishing an article in a traditional non-OA journal is $3500–$4000, to which one needs to add the 20 to 30% profit expected by the publishers, so the total cost is about $5000. The cost of publishing one article in BioMed Central or PLOS ONE is between $1350 and $2250. In addition, there is some evidence that the average impact of OA articles is greater than the impact of those published behind a paywall (Antelman, 2004, p. 379).
The turbulence within scientific publishing is also evident in the different forms of OA dissemination, which have been characterized as Green and Gold OA. In the Green OA model, research organizations have built institutional repositories into which researchers can self-archive their publications, even if they have been published in paywalled journals. There are also a number of subject-based repositories which have the same role. Green OA is free of charge to the authors, whereas Gold OA usually, but not always, incurs costs for the authors. These publishing fees are called Article Processing Charges (APCs). In addition, the traditional publishers have started to provide Open Access to articles published in their paywalled journals (Hybrid OA): by paying a set fee, the author(s) can provide free access to their papers.
In addition to many legitimate OA publication channels, the academic community has also seen the rise of so-called “predatory OA publishers”. These companies are more interested in obtaining money from academics than in making sure that the quality of their publications meets the ethical and scientific standards of the scientific community.
The provision of OA to the results of publicly funded research has been placed on the political agenda by the research funders. For example, the European Union has decided to encourage OA publishing (see European Commission, 2013). The Finnish government has lately stressed the importance of the openness of science in its strategies and policies (see more at http://openscience.fi/), although the actual policies and infrastructure are to some extent still under debate.
Emerging open science requires services and infrastructures, which need to be funded. For the research libraries, one of the key questions in the discussion is whether it would be possible to fund APCs by diverting the money currently spent on site licences for paywalled content. Thus the aim of this paper is to investigate the different business models of scientific publishing and their effects on libraries, and to analyse the present library statistics: how do libraries manage their collections, and how do they collect statistical data about open science?
2. Open Access as a Challenge for the Library Statistics Collection
The role of the research library is likely to change substantially as Open Access becomes more ubiquitous. Instead of acquiring materials produced elsewhere for the consumption of local patrons, the library will have a larger role in making sure that locally produced scholarly products are disseminated effectively world-wide via Open Access publication channels. This will also represent a challenge for the collection of relevant library statistics.
Many research organizations utilize a Current Research Information System (CRIS) for the collection of publication data (see De Castro, Shearer, & Summann, 2014; Ilva, 2014). It is reasonable to use this system also for the collection of data on the Open Access status of publications. There are two general approaches to the collection of this data: either it is reported by the researchers themselves, or it is gathered and generated from other sources, including the local Open Access repository and lists of OA publication channels such as the Directory of Open Access Journals (DOAJ, https://doaj.org). Both of these approaches have their limitations, and the collection and verification of the data is not always straightforward. There may also be other complicating factors, e.g. publisher-defined publication embargoes.
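The second approach, deriving the OA status from other sources, can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the record fields, the category labels and the order in which the sources are consulted are all hypothetical, and the set of DOAJ ISSNs is assumed to have been obtained separately. It is not a description of how any actual CRIS works.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Publication:
    """Hypothetical CRIS record; the field names are illustrative only."""
    title: str
    issn: Optional[str]               # ISSN of the publication channel, if any
    self_reported_oa: Optional[bool]  # researcher's own report; None = not reported
    in_repository: bool               # full text found in the local OA repository
    embargo_active: bool = False      # publisher-defined embargo still in force


def classify_oa(pub: Publication, doaj_issns: Set[str]) -> str:
    """Derive an OA status, preferring verifiable sources over self-reporting."""
    if pub.issn is not None and pub.issn in doaj_issns:
        return "gold"                    # channel is listed in DOAJ
    if pub.in_repository and not pub.embargo_active:
        return "green"                   # self-archived and openly available
    if pub.in_repository and pub.embargo_active:
        return "embargoed"               # archived, but not yet open
    if pub.self_reported_oa:
        return "reported-oa-unverified"  # researcher says OA, no confirmation
    if pub.self_reported_oa is None:
        return "unknown"                 # no information from any source
    return "not-oa"


if __name__ == "__main__":
    doaj = {"1111-1111"}  # made-up ISSN standing in for a DOAJ export
    record = Publication("Example article", "2222-2222", None, True, True)
    print(classify_oa(record, doaj))
```

The separate "embargoed", "reported-oa-unverified" and "unknown" categories reflect the limitations mentioned above: a deposited copy may not yet be open, and a self-reported status cannot always be verified.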
In many European countries, the progress on the Open Access availability of research publications is monitored at a national level and there are also a number of international services (including the EU-funded OpenAIRE portal) harvesting information from the local and national systems. For example, the Nordic countries, Norway, Denmark, Sweden and Finland, have their own national data collection systems, although there are significant differences between the approaches these countries have adopted (for an overview of the Nordic situation see Ilva, 2014). In Norway, both the collection of publication data and the upload of self-archived publications are being integrated into the national CRIStin system. In addition to local university-level repositories, the metadata of Open Access publications is available for searching, browsing and analysis in a separate NORA interface (http://nora.openaccess.no/), which contains a subset of the CRIStin data. Denmark has its own national research publications portal (http://forskningsdatabasen.dk/), the content of which is harvested from the local CRIS of each Danish university. A national project, Open Access Barometer (see Price, 2014), has been working on the quality of the data. In Sweden, the publication data is harvested from local repositories into a national portal, SwePub (http://swepub.kb.se), maintained by the National Library of Sweden.
In Finland, the Ministry of Education and Culture collects the publication data from universities as part of the annual data collection. The collection of accurate data on research publications is of paramount importance, as the number and quality of these publications is one of the main criteria used in the current funding model for the Finnish universities. At the moment, more than 200 million euros a year (13% of total state funding to the universities) is distributed on the basis of this data. As there are plans to make the Open Access availability of research publications an additional criterion in the funding model starting from 2019, the collection of data on the prevalence of Open Access will become even more important.
The collection of publication data is handled by CSC—IT Center for Science, and the data is used both in the Vipunen statistics portal (http://vipunen.fi) and in the Juuli Research Publications portal (http://www.juuli.fi, maintained by the National Library of Finland). Unfortunately, the quality of data on Open Access is currently relatively poor, partly due to motivational reasons at the local level, partly because of problematic categories and instructions in the data collection itself. There are plans to clarify the categories and improve the methods of data collection in the near future.
As far as the Article Processing Charges of Gold OA and Hybrid OA journals are concerned, much of the money used for this purpose currently comes directly out of research funding, and in many cases the library is not even aware of the flow of money associated with OA publications. This is by no means an optimal situation, as the libraries are paying a significant amount of money for site licenses of digital content, and there is a real danger that the publishers are charging both licensing fees and Open Access APCs for the same content (this is called “double dipping”; see Björk & Solomon, 2014a, b).
From this point of view, it would make sense to create university-level OA funds, which would collect and administer all of the money used for OA costs, including APCs, membership fees and the voluntary subsidies collected by some of the OA publishers using alternative business models. A centralized fund would make it much easier to monitor both the prevalence of Gold OA and the associated flow of money. There would be potential savings in transaction costs and, more importantly, this would make it possible to combine the data on both licensing costs and OA charges at an organizational and (with some extra effort) also at a national level. In some European countries, including the United Kingdom (Pinfield, Salter, & Bath, 2015), Norway and Sweden (Eriksson, 2013), university-level OA funds are already fairly common; in Finland they are currently still in the early planning stages.
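How a centralized fund could combine licensing costs and OA charges can be illustrated with a small sketch. The function names, the data layout and the publisher names are hypothetical, and the amounts are invented for illustration only. Note also that a publisher receiving both kinds of payment is merely a candidate for closer inspection, not evidence of double dipping as such.

```python
from collections import defaultdict


def combine_payments(licence_fees, oa_payments):
    """Sum licence fees and OA payments per publisher.

    Both arguments are iterables of (publisher, amount_eur) pairs.
    """
    totals = defaultdict(lambda: {"licence_eur": 0.0, "oa_eur": 0.0})
    for publisher, amount in licence_fees:
        totals[publisher]["licence_eur"] += amount
    for publisher, amount in oa_payments:
        totals[publisher]["oa_eur"] += amount
    return dict(totals)


def publishers_paid_twice(totals):
    """List publishers that received both licence fees and OA payments."""
    return sorted(
        name for name, t in totals.items()
        if t["licence_eur"] > 0 and t["oa_eur"] > 0
    )


if __name__ == "__main__":
    # Invented example data, not actual payments.
    licences = [("Publisher A", 1000.0), ("Publisher B", 500.0)]
    oa = [("Publisher A", 200.0), ("Publisher C", 300.0)]
    totals = combine_payments(licences, oa)
    print(publishers_paid_twice(totals))
```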
OA business and funding models are currently developing rapidly. Although this transition is expected to have a significant impact on research library budgets and workflows, the present collection of library statistics does not provide fully adequate means of taking the change into account.
The libraries are advised to compile their annual statistics following the International Standard ISO 2789 – International Library Statistics (ISO 2789:2013(E)).
According to this standard, electronic journals in free Internet resources (ISO 2789:2013(E):2.3.22–23) which have been catalogued by the library in its online catalogue or a database should be counted and reported separately (ISO 2789:2013(E):126.96.36.199). The count is the number of links to individual free Internet resources (electronic journals, etc.) catalogued in this way (ISO 2789:2013(E):6.3.15).
In addition, with respect to the data collection part of the standard, there is a reference to counting the costs of institutional or single author fees for open access publishing paid by the library (ISO 2789:2013(E):188.8.131.52).
The annual statistics of Finnish libraries of higher education (HE) are compiled in the Finnish Research Library Statistics Database. The statistics are mainly collected according to the standard ISO 2789, but there are no statistics covering the use of OA publications, nor information about their costs. Instead, the usage and cost statistics of electronic journals cover OA and paywalled (PW) journals together, with no possibility of separating them from each other.
3. Comparing Costs Associated with Open and Paywalled Access in Finnish Context
The aim of our study was to identify the costs associated with the OA and paywalled publications produced in Finland. The data for this study was compiled from the Finnish National Research Publications database Juuli (http://www.juuli.fi), the National Higher Education Statistics Portal Vipunen (http://vipunen.fi) and the Finnish National Research Library Database Kitt (https://yhteistilasto.lib.helsinki.fi/).
There is a wide variety of estimates concerning the publication fees of OA articles on the international level. Schimmer et al. (2015) report average article processing charges (APCs) in the range of €1100 to €1686, depending on the source of information. They predict that the APCs will be “well below €2000 in a purely open access scenario”, while according to their data, the expenses of the current subscription-based publishing model are between €3800 and €5000 per article. Pinfield et al. (2015) have come up with even higher numbers, based on the actual APC payment data collected from 23 British institutions. According to their data, the mean cost of APCs, including both Gold OA and hybrid journals, was £1682 (ca. €2200).
On the other hand, Solomon and Björk (2012) have come up with smaller numbers using a data set collected from the journals listed in the Directory of Open Access Journals (DOAJ). According to them, the average OA publication fee for Gold OA journals was €816 ($904). The difference is probably mostly due to the different source for data—the journals listed in DOAJ are a more heterogeneous group both in geographical terms and in the fields of research they represent than the journals in either of the previously mentioned studies. We have used the figure provided by Solomon and Björk as a comparison price for our subsequent analysis.
However, all of these authors agree that the average APC for OA articles published in hybrid journals is significantly higher than the average APC for articles published in Gold OA journals (see also Björk and Solomon, 2014b). The general consensus seems to be that there is currently no properly functioning market for the cost of APCs in hybrid journals. Many of the European research funders do not provide monetary support for publishing in these journals.
Our study concentrated on peer-reviewed publications in journals, conference proceedings and books, using the 2011–2013 publication data collected from the Finnish universities and universities of applied sciences. According to the Vipunen statistics portal, researchers affiliated with Finnish universities published 53,556 peer-reviewed publications in paywalled publication channels during the years 2011–2013. During this three-year period, the total cost of e-journals in the budgets of the Finnish research libraries was €52 million. If we assume that this sum would have been used to buy open access to the whole Finnish article output, the weighted mean price per peer-reviewed article would have been €970.
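The arithmetic behind this figure is a simple division, which the following sketch reproduces. The function and variable names are our own; the two input figures are those quoted above, and since the €52 million total is itself rounded, the result should be read as approximate.

```python
def mean_price_per_publication(total_cost_eur: float, n_publications: int) -> float:
    """Mean cost per publication if the whole sum were spent on these items."""
    if n_publications <= 0:
        raise ValueError("n_publications must be positive")
    return total_cost_eur / n_publications


# Figures quoted in the text for 2011-2013.
LICENSING_COST_EUR = 52_000_000
PAYWALLED_PUBLICATIONS = 53_556

if __name__ == "__main__":
    price = mean_price_per_publication(LICENSING_COST_EUR, PAYWALLED_PUBLICATIONS)
    print(f"Mean price per publication: about EUR {price:.0f}")
```

The result is within one euro of the figure quoted above.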
It should be noted that the total number of Finnish publications used for our analysis includes journal articles, monographs, book chapters and articles in conference proceedings. In addition, about 17% of the publications came out in Finnish publication channels. This means that the total number of publications is considerably higher than e.g. the numbers used by Schimmer et al. (2015), which were based on the more restrictive Web of Science. The average OA publication fee for all Finnish publications would probably be lower than the average fee for the articles published in the (often high-prestige) international journals included in the Web of Science (see also Geschuhn, 2015).
In our study the Finnish publications were categorized by their publishing channel, with OA indicating open access and PW paywalled, i.e. publications in non-OA publication channels. The quality of the data had some issues, as some of the organizations had reported the OA status of a large number of their publications as “unknown”. We assumed that all of the publications with this status were non-OA publications, although this may not be correct in some cases.
Conducting a comparison between the OA costs of locally produced research outputs and the licensing costs of globally produced scientific content is obviously fraught with problems. For the purposes of this analysis, we have assumed that at some point in the future Open Access publishing will become the global norm adopted in all countries, so that it will no longer be necessary to pay for new licensed content.
In addition, to simplify our calculations we have assumed that the volume of Finnish research output and the amount of money spent on current licensing deals are both at the average international level, which is actually not quite the case. In reality, those institutions and countries producing a higher-than-average number of publications would also pay a larger amount of money for the APCs. On the other hand, as many of the scientific publications are produced in co-operation with researchers from other organizations, it is worth noting that in these cases, the APCs are likely to be paid only by the organization of the corresponding author. There is some discussion on the effects of these issues later in this paper.
Figure 1 illustrates both the numbers of peer-reviewed publications produced at Finnish institutions of higher education and the costs associated with the purchase of paywalled scholarly materials for these organizations. The first column in each time series indicates the number of OA publications and the second column the number of paywalled (PW) publications. Based on this data, the share of OA publications can be estimated to be between 15 and 20% of the total number of peer-reviewed research publications.
In addition, we calculated the hypothetical total APC cost for the current OA and PW publications in the ideal case that they had all been published in OA publication channels. Figure 2 shows a comparison in which the first column in each time series represents the costs currently paid by the HE libraries for access to the paywalled journals. The column also includes the estimated amount of the OA costs (in lighter grey), based on the average cost of OA APCs in Gold OA journals.
The second column in each of the time series is an estimate that calculates the costs of all Finnish publications if they were to be published totally in OA. Again we have used the average price and disregarded some of the complicating factors which will be discussed below.
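The two columns can be approximated with the following sketch. It is an illustration under stated assumptions rather than the exact calculation behind Figure 2: the function names are our own, the figures are the three-year totals quoted earlier, and the number of OA publications is derived here from an assumed OA share applied to the count of paywalled university publications, which is a simplification.

```python
AVG_GOLD_APC_EUR = 816        # Solomon and Björk (2012), as quoted in the text
LICENSING_COST_EUR = 52_000_000
PW_PUBLICATIONS = 53_556


def oa_count_from_share(n_pw: int, oa_share: float) -> int:
    """Number of OA publications implied by an OA share of the total output.

    From n_oa / (n_oa + n_pw) = oa_share it follows that
    n_oa = n_pw * oa_share / (1 - oa_share).
    """
    if not 0 <= oa_share < 1:
        raise ValueError("oa_share must be in [0, 1)")
    return round(n_pw * oa_share / (1 - oa_share))


def current_model_cost(licensing_eur: int, n_oa: int, apc_eur: int) -> int:
    """First column: licensing costs plus estimated APCs of current OA output."""
    return licensing_eur + n_oa * apc_eur


def all_oa_cost(n_oa: int, n_pw: int, apc_eur: int) -> int:
    """Second column: APCs if every publication were published as OA."""
    return (n_oa + n_pw) * apc_eur


if __name__ == "__main__":
    for share in (0.15, 0.20):  # OA share range estimated from Figure 1
        n_oa = oa_count_from_share(PW_PUBLICATIONS, share)
        current = current_model_cost(LICENSING_COST_EUR, n_oa, AVG_GOLD_APC_EUR)
        hypothetical = all_oa_cost(n_oa, PW_PUBLICATIONS, AVG_GOLD_APC_EUR)
        print(share, current, hypothetical, current - hypothetical)
```

Because the estimated APCs of the existing OA publications appear in both columns, the difference between the columns does not depend on the assumed OA share: under these assumptions it is simply the licensing cost minus the APCs of the paywalled publications, i.e. €52,000,000 − 53,556 × €816 = €8,298,304 over the three-year period.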
As can be seen from Figure 2, the hypothetical cost of the all-OA model appears to be somewhat lower than the cost of our current model with its mostly licensed/paywalled content. However, there are two other complicating factors which mean that the actual cost savings associated with OA publishing would be somewhat higher, especially on a global level.
First, only the home organization of the corresponding author of each publication will pay the APCs for publications which have co-authors from many organizations. As there are articles with tens, hundreds, or in some cases even thousands of authors, this will mean that each organization will pay APCs for only some of the articles produced by its scholars or researchers. According to Schimmer et al. (2015), the quantity of APC-relevant publications for an institution generally lies between 40 and 60% of its total research output, depending on a number of factors including the research intensity of the organization. Although the difference is not quite as large at the national level, it is still significant, especially for a small country like Finland, as international co-publications are very common in many fields.
Secondly, both the publication volume and the amount of money spent on licensing deals differ from country to country. Finland is currently producing a fairly high number of research publications, while compared to the examples cited by Schimmer et al. (2015), our per capita licensing costs seem likely to be rather close to the European average. This means that the potential savings for Finland may be somewhat lower than for countries with either a smaller volume of publications or higher than average licensing costs.
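The effect of the first factor can be illustrated with a short sketch. The 40 to 60% range is the institution-level estimate of Schimmer et al. (2015); applying it to the national count of 53,556 paywalled publications is purely an assumption made for illustration, since the reduction is smaller at the national level. The function name is our own.

```python
def apc_relevant_cost(n_publications: int, apc_relevant_share: float,
                      avg_apc_eur: float) -> float:
    """APC bill when only a share of the output has a local corresponding author."""
    if not 0 <= apc_relevant_share <= 1:
        raise ValueError("apc_relevant_share must be between 0 and 1")
    return n_publications * apc_relevant_share * avg_apc_eur


if __name__ == "__main__":
    n_pw = 53_556   # paywalled publications 2011-2013, as quoted in the text
    avg_apc = 816   # average Gold OA fee in euros (Solomon and Björk, 2012)
    for share in (0.4, 0.6, 1.0):
        cost = apc_relevant_cost(n_pw, share, avg_apc)
        print(f"share {share:.0%}: about EUR {cost:,.0f}")
```

A share of 1.0 corresponds to the simplified calculation in which every publication incurs a locally paid APC; the realistic national figure would lie somewhere between that and the institution-level range.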
Of course, it should be noted that in an all-OA model, all countries and all institutions would have access to all scientific publications, which would expand the benefits of the transition far beyond the level of whatever cost savings were achieved.
After the publication of Schimmer et al. (2015) there has been quite an extensive discussion on the prospects of migrating the money used by the libraries for site licenses to the payment of large-scale Open Access deals with the publishers (see, e.g., Shearer, 2015). The pioneering work done by the SCOAP3 consortium has to a large extent proved that this is possible. The aim of the SCOAP3 consortium was “to convert high energy physics articles in the leading journals to “gold” open access” (SCOAP3, 2015), and although the negotiations were not easy, the consortium managed to make deals with the publishers to open up most of the major journals in this field.
One of the critical questions for the funding of Open Access in the future is whether the Article Processing Charges will stay at their current level. The leading journals with established brand names and high Impact Factors may charge higher than average sums, and there are clear incentives for researchers to try to publish in these journals. In addition, the hybrid journals tend to charge higher APCs than Gold OA journals (Björk & Solomon, 2014b). On the other hand, there are also new emerging OA business models which may lead to diminished costs. These include both membership-based models like that used by PeerJ (https://peerj.com/) and models which are based on library subsidies, including the one adopted by the recently founded Open Library of Humanities (https://www.openlibhums.org/). To make sure that the publication funds are used optimally, it is very important that the libraries are able to collect information on the flow of money from the research organizations to publishers.
Another key issue for libraries, as far as licensing deals are concerned, is legacy content—the non-OA back issues of scholarly journals, which in some cases go back more than a hundred years. While they may no longer be critical for cutting-edge research in some research areas, in fields like the humanities and social sciences, they may well be very relevant. If the current licensing deals for new publications are cancelled at some point when Open Access becomes ubiquitous, will there be some other way to provide access to this content?
4. Conclusions: Open Access and its Challenges for the Library Statistics and Funding Models
The on-going movement from paywalled publishing to Open Access is a major transition, which will have an impact on the role that research libraries will play within their organizations. There is also a huge demand for reliable statistical information both on the prevalence of OA and on the flow of money associated with different OA business models.
From the point of view of the research libraries, the rise of OA publishing holds a promise of significant cost savings, although it is still unclear how soon (if ever) we will reach a point at which it becomes feasible to start cancelling the major licensing deals we currently need to sign in order to obtain access to paywalled scholarly content. If the libraries wish to speed up this development and to make sure that they have a key role in the future, it is essential that they take an active role in monitoring the OA costs, making sure that the costs are administered and negotiated in an optimal way.
Although much of the statistical information on OA will be collected at an international, national or organizational level, it is also important that the collection of library statistics is able to deal adequately with these new demands. The statistics must provide the library leadership with reliable information and functioning tools that help it to navigate the stormy waters that libraries are experiencing today.
At present, the reported number of electronic journal titles which a library has licensed and provides in its list of electronic periodicals does not differentiate between OA and PW publication channels. The same applies to the cost and usage statistics: it is not possible to identify either of the categories from the statistics, since both are often included under the same heading.
Nevertheless, as shown in Figure 2 (and elaborated in the discussion that follows it), it can be argued that OA publishing would be more affordable in the Finnish context than the current license-based model. To make this argument, we need to assume that the figure on the average publication fee of Gold OA publications provided by Solomon and Björk (2012) (€816) can be applied to the Finnish publication data from 2011 to 2013. Unfortunately, the statistics available at the moment are not reliable or exact enough to prove this hypothesis, and there are obviously many complicating factors which make it hard to predict at what cost the migration of all Finnish publications to OA would be achieved. Nonetheless, in the present economic climate, where libraries and their funding organizations need to be able to show that they are operating cost-effectively, it is clearly important to organize data collection to support this target.
The International Standard ISO 2789 gives a clear framework for compiling the statistics of OA vs. PW. From this point of view, it would be feasible to include this data in the library statistics, especially if the library starts to administer the payments of APCs, membership fees and subsidies to the scholarly OA publishers.
The final success of Open Access depends on finding solutions for both the long-term preservation and the knowledge organization of documents and data. These are two major challenges that need to be solved by the digital academic libraries and archives.