In July 2016, the libraries of the Universitat de Barcelona (UB) and the Universitat Politècnica de Catalunya (UPC) organised an internal workshop to explore new ways of collaborating. At the end of the meeting, the participants decided to focus on two projects, one of which aimed to share data, methodologies and strategies on open access. Two years later, the main output of this project was the establishment of an observatory that provides evidence on, and monitors, the situation of open access in both institutions: the Open Access Observatory (UPC, 2019a).
It is widely accepted that open access is growing year after year (see, for instance, the report from Universities UK, 2017), but few institutions monitor this growth. Some institutions with open access mandates or policies measure compliance with them, but generally focus only on the green road: they monitor how their repositories are being filled and how researchers self-archive their research papers. For instance, over the last two years several Catalan universities have set up “thermometers” to measure the percentage of their scientific production available in their institutional repositories: the Universitat de Barcelona (UB, 2019), the Universitat Autònoma de Barcelona (UAB, 2019), the Universitat Politècnica de Catalunya (UPC, 2019b) and the Universitat Pompeu Fabra (UPF, 2018).
However, we decided to go a step further and analyse the uptake of all open access options among our researchers by examining their scientific production. We wanted the full picture: how many articles are published in open access journals, how many researchers choose hybrid options and, finally, how many papers are available in repositories other than our institutional ones. With this goal, we started the first project of the observatory: measuring the situation of open access in our institutions. We took the scientific publications of the previous seven years, starting in 2011, because by then both institutions already had an open access policy in place and we wanted to see how the situation had evolved.
After measuring the situation of open access, we began a second task aimed at estimating how much our institutions pay to publish in open access. Neither university has a centralised system for paying publication fees, so this information is not easy to obtain. Payments are made by departmental administrative staff from research project or departmental budgets, without being flagged as open access publication costs in the institutional accounting system, which makes tracking and identifying them almost impossible. With the methodology we propose, institutions can make an initial estimate of how much they invest in scientific publishing beyond their current budget for access to digital resources.
We focused our work on these two activities as a way to monitor institutional open access uptake and to provide data to international organisations that carry out surveys, such as the European University Association (Morais & Borrell-Damián, 2019).
The aim of this paper is to share our monitoring methodologies and to encourage further discussion with other research institutions in order to find the best tools and evidence for measuring open access. We acknowledge that our results are an approximation, but we think they provide enough information to make comparisons and estimates, and to identify trends. At the end of the description of each methodology, we discuss its limitations and possible ways to improve and extend the results.
2. The Situation of Open Access in Our Institutions
The first task we undertook for the Observatory was to determine the situation of open access in our institutions by finding out where research articles are publicly available. We could already identify which papers are available in our repositories, but we wanted to know how many papers are published in open access journals, how many researchers chose a hybrid option and in which other venues our researchers’ articles are freely available.
As mentioned in the introduction, to measure the situation of open access in our institutions we decided not to analyse only the data from the last year but to take a historical data series starting in 2011, the year by which both institutions had adopted their institutional open access policies (Universitat de Barcelona, 2011; Universitat Politècnica de Catalunya, 2009). Our initial measurement used seven datasets, one per year, covering the period 2011–2017. In 2019 we added an eighth dataset corresponding to 2018.
We first obtained data from the Web of Science Core Collection, where it is possible to filter by author affiliation. Being aware of the problems caused by variant institution names, we searched the Organization-Enhanced field for the unified names of both institutions. We focused only on research articles, leaving other types of publications (e.g., book chapters, reports, conference papers) for further analysis. Once we had the datasets for each university, we removed the items without a DOI (always below 5%), because they could not be processed in the next steps. To broaden the set of publications of each institution, we repeated a similar process in Scopus, which assigns an identifier to each institution that we used to select the data. This gave us a second group of datasets per institution. At that point we had two datasets per institution and per year, with some items appearing in both. We merged them and removed duplicates by identifying identical DOIs.
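The merging step described above can be sketched as follows. This is a minimal illustration, not the Observatory's actual tooling: it assumes both exports have been loaded as lists of dictionaries with a hypothetical `doi` field, and it normalises DOIs (lower case, no resolver prefix) before comparing them, since the same DOI is often written differently in different databases.

```python
# Hypothetical sketch: merge two bibliographic exports (e.g. Web of Science
# and Scopus) and keep one record per DOI. Field names are assumptions.

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Return a lower-cased DOI without resolver prefix, or None if empty."""
    if not raw:
        return None
    doi = raw.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi or None


def merge_by_doi(*datasets):
    """Merge record lists, dropping items without DOI and duplicate DOIs.

    The first occurrence of each DOI is kept, so the preferred source
    should be passed first. Returns (records, number_dropped_without_doi).
    """
    merged = {}
    dropped_no_doi = 0
    for records in datasets:
        for record in records:
            doi = normalize_doi(record.get("doi"))
            if doi is None:
                dropped_no_doi += 1
                continue
            merged.setdefault(doi, {**record, "doi": doi})
    return list(merged.values()), dropped_no_doi


wos = [{"doi": "10.1000/ABC", "title": "Paper A"},
       {"doi": "", "title": "Paper without DOI"}]
scopus = [{"doi": "https://doi.org/10.1000/abc", "title": "Paper A"},
          {"doi": "10.1000/xyz", "title": "Paper B"}]
records, no_doi = merge_by_doi(wos, scopus)
print(len(records), no_doi)  # 2 1
```

Keeping the count of dropped records makes it easy to check the share of items without a DOI against the 5% threshold mentioned above.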
To analyse the level of openness of the merged datasets, we used Unpaywall[1] to check which publications were publicly available through any of the possible routes: open access journals, individual open access articles through the hybrid model, free-to-read articles without any license (bronze) and articles available only in a repository.
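We used Unpaywall's CSV output; as an alternative for larger or repeated runs, the same information can be retrieved per DOI from the Unpaywall REST API (version 2). The sketch below assumes the v2 record fields `is_oa`, `journal_is_oa` and `best_oa_location` (with `host_type` and `license`) as we understand the Unpaywall data format; these should be checked against the current documentation. The email address and the sample record are placeholders.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_BASE = "https://api.unpaywall.org/v2/"  # Unpaywall REST API, version 2


def unpaywall_url(doi, email):
    """Build the lookup URL for one DOI (Unpaywall asks for a contact email)."""
    return API_BASE + quote(doi, safe="/") + "?" + urlencode({"email": email})


def fetch(doi, email):
    """Download the JSON record for one DOI (requires network access)."""
    with urlopen(unpaywall_url(doi, email)) as response:
        return json.load(response)


def openness_fields(record):
    """Extract the fields used to classify openness from an API record."""
    best = record.get("best_oa_location") or {}  # null when not open access
    return {
        "doi": record.get("doi"),
        "is_oa": bool(record.get("is_oa")),
        "is_journal_oa": bool(record.get("journal_is_oa")),
        "best_oa_source": best.get("host_type"),  # "publisher" or "repository"
        "best_oa_license": best.get("license"),   # e.g. "cc-by", or None
    }


# Invented record for illustration; a real one would come from fetch().
sample = {"doi": "10.1000/xyz", "is_oa": True, "journal_is_oa": False,
          "best_oa_location": {"host_type": "repository", "license": None}}
print(openness_fields(sample))
```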
Figures 1 and 2 show the results for each institution with the initial seven samples (2011–2017). We followed the classification proposed by Piwowar et al. (2018) and also used by Bosman and Kramer (2018). In our samples we detected five different statuses: non-open access (grey), fully open access (gold), hybrid open access (pale yellow), public access (bronze) and available only in a repository (green).
Unpaywall returns a CSV file with several fields. One of them states with a yes/no value whether the article is open access, which determines its public accessibility. To place each article on the spectrum of openness we use other fields: “best oa license,” “best oa source” and “is journal OA,” as shown in Table 1.
The results obtained (the first set of data in Figures 1 and 2) show some patterns. First, the number of publications in open access journals has grown over the years, together with the number of publications made open through the hybrid model. This increase is more marked in the UB sample, probably because of the weight of the health sciences in its output, and less marked at the UPC, a technical university. Secondly, the number of publications available only in a repository increases in the UPC sample, especially in 2015. This is due to the changes introduced in the UPC open access policy in 2014: since then, all UPC researchers must deposit a post-print version of their publications if they want them to be included in the university’s internal assessment, which is linked to the distribution of the university budget and other resources (Universitat Politècnica de Catalunya, 2014). In the following years the share of this type of publication decreased because some of the deposited materials were still under embargo, at least in the UPC repository. We expected that, when the data were analysed again in later years, the green level would reach that of 2015 or higher.
Finally, the pattern of the items classified as bronze is worth noting: their percentage decreases over the years. We tag as bronze all publications that are publicly available without any attached license allowing reuse. This group includes the papers that publishers choose to make publicly available some time after their publication date; these papers remain under all rights reserved.
In March 2019, we repeated the measurements and added an eighth sample, for 2018. The results are shown in Figures 3 and 4. As expected, we found some changes from the previous results, especially in the percentages of green and bronze. The level of open access at the UPC for 2016 rises to 80%, mainly because of the increase in green resulting from its institutional policy: by March 2019 many articles previously identified as non-open access were classified as green because their embargoes had ended. A similar pattern can be seen in the samples from 2015 to 2017. The percentages of bronze change from year to year, as expected given publishers’ policies of opening some articles after a period of time and closing others for marketing reasons. Tables 2 and 3 summarise the changes in percentages.
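Table 1 is not reproduced in this section, so the rules below are an assumption that follows the category definitions of Piwowar et al. (2018): an open access journal gives gold; a free copy on the publisher's site gives hybrid if it carries a license and bronze if it does not; a free copy only in a repository gives green. The snake_case keys are hypothetical renderings of the CSV fields named above, and the "publisher"/"repository" values are assumed spellings.

```python
# Sketch of the five-way classification (closed, gold, hybrid, bronze, green).
# Keys and value spellings are assumptions modelled on the CSV fields.


def as_bool(value):
    """Interpret CSV-style yes/no or true/false strings (and real booleans)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"yes", "true", "1"}


def classify(row):
    if not as_bool(row.get("is_oa")):
        return "closed"
    if as_bool(row.get("is_journal_oa")):
        return "gold"
    source = (row.get("best_oa_source") or "").lower()
    if source == "publisher":
        # Free on the publisher's site: hybrid with an open license,
        # bronze if it is only free to read.
        return "hybrid" if row.get("best_oa_license") else "bronze"
    if source == "repository":
        return "green"
    return "unknown"


rows = [
    {"is_oa": "no"},
    {"is_oa": "yes", "is_journal_oa": "yes", "best_oa_source": "publisher", "best_oa_license": "cc-by"},
    {"is_oa": "yes", "is_journal_oa": "no", "best_oa_source": "publisher", "best_oa_license": "cc-by"},
    {"is_oa": "yes", "is_journal_oa": "no", "best_oa_source": "publisher", "best_oa_license": ""},
    {"is_oa": "yes", "is_journal_oa": "no", "best_oa_source": "repository", "best_oa_license": ""},
]
print([classify(r) for r in rows])
# ['closed', 'gold', 'hybrid', 'bronze', 'green']
```

Parsing the yes/no strings explicitly matters: in Python the string "no" is truthy, so testing the raw CSV value would mark every row as open.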
[Table 2. Changes in percentages between the two measurements, UB samples UB2011–UB2017]
[Table 3. Changes in percentages between the two measurements, UPC samples UPC2011–UPC2017]
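Changes such as those summarised in Tables 2 and 3 can be computed by taking the percentage of each category in both measurements of a sample and subtracting them. The sketch below uses an invented toy sample of ten articles, not UB or UPC data, to show how an ended embargo and a publisher closing a free article appear as percentage-point shifts.

```python
from collections import Counter

CATEGORIES = ("gold", "hybrid", "bronze", "green", "closed")


def shares(statuses):
    """Percentage of articles in each category for one measurement."""
    if not statuses:
        raise ValueError("empty sample")
    counts = Counter(statuses)
    return {c: 100 * counts[c] / len(statuses) for c in CATEGORIES}


def change(before, after):
    """Percentage-point change per category between two measurements."""
    return {c: round(after[c] - before[c], 1) for c in CATEGORIES}


# Toy sample of 10 articles measured twice (invented data).
first_measurement = ["gold"] * 2 + ["hybrid"] + ["bronze"] * 2 + ["green"] * 2 + ["closed"] * 3
# Later, one embargo has ended (closed -> green) and the publisher has
# closed one free-to-read article (bronze -> closed).
second_measurement = ["gold"] * 2 + ["hybrid"] + ["bronze"] + ["green"] * 3 + ["closed"] * 3
print(change(shares(first_measurement), shares(second_measurement)))
# {'gold': 0.0, 'hybrid': 0.0, 'bronze': -10.0, 'green': 10.0, 'closed': 0.0}
```

Note that the closed share is unchanged even though two articles moved, which is why per-category changes are more informative than the overall open access percentage alone.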
2.3. Comparison of the Results Obtained with Other Existing Analyses
We compared these results with similar analyses published previously. Piwowar et al. (2018), which gives a global picture of the open access situation, report that 44.7% of articles published in 2015 were open access. In our case, according to the measurements made in 2018, this figure rises to 56.5% at the UB and to 74.7% at the UPC. Table 4 splits these figures into the four levels of openness. The level of gold is higher at the UB, probably because a large part of its output is in the health sciences, where the percentage of gold publications is higher than in other disciplines (Piwowar et al., 2018). The percentages of green are higher in both institutions than in Piwowar et al., probably because of the existing institutional mandates and funders’ policies. At the UPC, the green percentage reaches almost 60% owing to its new policy requiring papers to be deposited in the institutional repository in order to be considered in the internal assessment (Universitat Politècnica de Catalunya, 2014). Note that our data come from two sources, Web of Science and Scopus, whereas Piwowar’s analysis uses only Web of Science.
[Table 4. Levels of openness of 2015 articles: Piwowar et al. (2018), UB2015 and UPC2015 (measured in 2018)]
We can also compare our results with the data on Dutch universities in Bosman and Kramer (2018): none of them reaches 50% openness for publications from 2016, whereas both Catalan universities exceed 50%. However, a dataset later published in Zenodo by Kramer and Bosman (2018) gives more detailed data for Dutch universities, with higher values. Again, there are some methodological differences: the Dutch study uses only Web of Science output but includes both articles and reviews. Despite these differences, we think some patterns can be identified and compared. Tables 5 and 6 compare four Dutch universities with our two universities for publications from 2016 and 2017.
We chose Amsterdam and Leiden because they are similar to the UB, and Delft and Eindhoven because, like the UPC, they are mainly technical universities. Tables 5 and 6 show that the percentages of hybrid are higher for the Dutch institutions, possibly because of the agreements reached in the Netherlands with some publishers that allow Dutch researchers to publish open access articles in subscription journals. Green levels are similar, except at the UPC, probably because of its mandate. Regarding gold, the technical universities reach lower percentages, probably because there are fewer open access journals in their disciplines than in others such as the health sciences. Otherwise, there are no significant differences between comparable Dutch and Catalan universities.
2.4. Existing Limitations of the Current Method
As mentioned above, this methodology has some limitations. The sample we work with is not the whole output of each institution: some publications are not included in our sources, and we are considering how to measure these excluded items together with the items without a DOI. One option is to use a third source, the institutional CRIS (current research information system), and extract information in two ways. First, we can obtain new items with a DOI to enlarge the working sample. Second, even for publications without a DOI, we might identify local or university journals that are open access, which would increase the percentage of openness. It is also possible to identify documents without a DOI that are available in our institutional repositories and that Unpaywall cannot identify.
Besides the constraints arising from the choice of samples, we found some limitations in the tools provided by Unpaywall. Not every DOI analysed returned a result, and some items identified as bronze should have been identified as gold, but the lack of a proper license indication makes this impossible. Finally, a bad practice in some repositories leads to a false green classification: some repositories provide a PDF file stating that access to the full text of an item is closed for copyright reasons, and the public availability of this notice file is identified as green availability of the item. Being aware of this problem, we improved the results with some manual corrections, and we reported the misclassification to the Unpaywall team so that they could improve their tools.
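One way to handle the manual amendments is to keep them in a separate list of verified corrections and apply it on top of the automatic output, so that the corrections are reapplied whenever the Unpaywall check is rerun. This is a hypothetical sketch of that design choice; the DOIs and statuses are invented.

```python
def apply_corrections(classified, corrections):
    """Return (fixed, changed): statuses with manual overrides applied.

    classified:  {doi: status} produced by the automatic classification
    corrections: {doi: status} verified by hand, e.g. "closed" for a
                 repository record that only holds a copyright notice
    """
    fixed = dict(classified)  # leave the automatic result untouched
    changed = 0
    for doi, status in corrections.items():
        if doi in fixed and fixed[doi] != status:
            fixed[doi] = status
            changed += 1
    return fixed, changed


automatic = {"10.1000/a": "green", "10.1000/b": "green", "10.1000/c": "bronze"}
manual = {"10.1000/b": "closed",  # repository file was only a notice
          "10.1000/c": "gold"}    # open access journal with no license metadata
fixed, changed = apply_corrections(automatic, manual)
print(fixed, changed)
```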
3. The Cost of Publishing Openly
We are witnessing the transition to a fully open access system. Publishers offer institutions deals to flip journals to full open access, to offset subscriptions into vouchers (which cover open access publication fees), or even to sign a read-and-publish agreement (ESAC, n.d.) ensuring that all output from the institution is available in open access. Institutions usually know how much they pay to access digital resources but, in general, their information on how much they pay to publish is not accurate, because some publication fees are not paid centrally but by researchers from project or departmental budgets. Moreover, publishing expenditure is not limited to open access journals and hybrid options: some subscription-based journals still charge fees for publishing or for including colour images.
As a second task for the Observatory, we set out to estimate the total amount our institutions spend on the publishing system. Knowing how much we currently pay for publishing gives us the full picture when discussing transitional agreements with publishers. The main reason for developing this methodology is to allow institutions to make an initial estimate of how much they pay to publish in open access. Again, the figures in this analysis are only an estimate with some limitations, but they help us gauge the scale of the cost of switching to open access publishing by default relative to the current expenditure on subscriptions.
3.1. Methodology of Calculating Publishing Costs
To estimate institutional expenditure on open access publishing, we made some assumptions and used external tools such as Unpaywall. As when measuring the state of open access, we used the scientific production of each institution from Web of Science and Scopus and limited our calculations to scientific articles, using the same samples: filtered by institution name and merged, with duplicates removed.
Our first assumption is that any existing article processing charge (APC) is paid by the corresponding author, so we need to filter our sample to the articles whose corresponding authors belong to our institutions. This filtering is not always straightforward. First, not all items have a corresponding author. Moreover, for Scopus we must use the correspondence address, because the corresponding author is not indicated. Bearing these drawbacks in mind, we did our best to identify the articles whose corresponding authors, correspondence addresses or correspondence email addresses belong to our institutions. There are also cases with two or, rarely, more corresponding authors, in which the cost might be split in halves or thirds. As a first approach, however, we included in our estimate any item for which we found at least one corresponding author from our institutions. We then filtered this sample with Unpaywall to find the publications in open access journals or published with a hybrid option.
There are several ways to obtain the estimate from this final filtered list. One suggestion was to use an average APC, for instance the one provided by the Open APC initiative[2] of the INTACT project (INTACT, 2015). However, we wanted a more accurate approach, so we started to gather information from the Directory of Open Access Journals (DOAJ), publishers’ websites and other sources to build an APC database. While building it, we discovered that Ryan Regier, a librarian from Canada, had already built such a database, and we decided to use his programme APCDOI.[3] One of the advantages of Regier’s programme is that it covers more than 21,000 journal fees and converts them into euros.[4]
This methodology must take into account some factors that may differ between institutions. First, some universities have agreements with open access publishers that give discounts. Secondly, some publishers provide offsetting schemes that give vouchers for publishing open access in subscription journals at no extra cost. Finally, some universities have dedicated funds for open access publishing, and the institution can generally track this expenditure.
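The corresponding-author filter can be sketched as a string match against institution name variants and email domains. The markers below (including the `@ub.edu` domain) and the record fields (`corresponding`, `status`) are illustrative assumptions, not the lists we actually used. Each selected article gets a weight: 1.0 under the first approach described above, or the institution's share of the corresponding authors if the cost is split.

```python
# Illustrative markers only: name variants and email domain are assumptions.
UB_MARKERS = ("univ barcelona", "universitat de barcelona", "@ub.edu")


def local_share(corresponding, markers=UB_MARKERS):
    """Fraction of corresponding entries that match the institution.

    corresponding: list of strings (corresponding author address, Scopus
    correspondence address or email). Returns 0.0 if nothing matches.
    """
    if not corresponding:
        return 0.0
    local = sum(1 for entry in corresponding
                if any(m in entry.lower() for m in markers))
    return local / len(corresponding)


def apc_candidates(articles, split_cost=False):
    """Gold/hybrid articles with at least one local corresponding author.

    'weight' is 1.0 (full cost) or, with split_cost=True, the local share.
    """
    selected = []
    for art in articles:
        share = local_share(art.get("corresponding", []))
        if art.get("status") in {"gold", "hybrid"} and share > 0:
            selected.append({**art, "weight": share if split_cost else 1.0})
    return selected


articles = [
    {"id": "a1", "status": "gold", "corresponding": ["Univ Barcelona, Fac Biol"]},
    {"id": "a2", "status": "hybrid",
     "corresponding": ["Univ Barcelona, Dept Phys", "Univ Oxford, Dept Chem"]},
    {"id": "a3", "status": "gold", "corresponding": ["Univ Oxford, Dept Chem"]},
    {"id": "a4", "status": "closed", "corresponding": ["someone@ub.edu"]},
]
print([a["id"] for a in apc_candidates(articles)])  # ['a1', 'a2']
```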
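We do not reproduce APCDOI's interface here. As a hypothetical stand-in, the sketch looks up each article's journal in a small APC table (keyed by ISSN as an assumption), converts the fee into euros and sums only the positive fees, so that open access journals without an APC do not count towards the number of articles with an APC. All fees and exchange rates are placeholders.

```python
# Hypothetical APC table (ISSN -> (amount, currency)) and exchange rates
# (EUR per unit). All values are placeholders, not real prices or rates.
APC_TABLE = {
    "1234-5678": (2000.0, "USD"),
    "8765-4321": (1500.0, "EUR"),
    "1111-2222": (0.0, "EUR"),  # open access journal without APC
}
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.9, "GBP": 1.15}


def apc_in_eur(issn, table=APC_TABLE, rates=EUR_PER_UNIT):
    """Return the APC in euros, or None if the journal is not in the table."""
    entry = table.get(issn)
    if entry is None:
        return None
    amount, currency = entry
    return round(amount * rates[currency], 2)


def estimate(issns):
    """(number of articles with a positive APC, total APC in euros).

    Journals missing from the table are skipped here; in practice they
    should be listed and checked by hand.
    """
    fees = [apc_in_eur(i) for i in issns]
    paid = [f for f in fees if f]  # drops None (unknown) and 0.0 (no APC)
    return len(paid), round(sum(paid), 2)


print(estimate(["1234-5678", "8765-4321", "1111-2222", "9999-0000"]))
# (2, 3300.0)
```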
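Table 7: UB, articles with an APC and estimated APC expenditure

| Year | Number of articles with APC | Estimated expenditure | APC average |
|---|---|---|---|
| 2017 | 353 | 596,931.36 € | 1,691.02 € |
| 2018 | 427 | 743,879.00 € | 1,742.10 € |

Table 8: UPC, articles with an APC and estimated APC expenditure

| Year | Number of articles with APC | Estimated expenditure | APC average |
|---|---|---|---|
| 2017 | 145 | 185,835.15 € | 1,281.62 € |
| 2018 | 170 | 243,473.90 € | 1,432.20 € |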
The total number of open access articles with a UB corresponding author was 603 in 2017 (467 gold and 136 hybrid) and 624 in 2018 (512 gold and 112 hybrid). For the UPC, the number was 213 in 2017 (170 gold and 43 hybrid) and 214 in 2018 (164 gold and 50 hybrid). Comparing these totals with the numbers of articles with an APC in Tables 7 and 8 (for example, only 353 of the 603 UB articles in 2017), it is clear that some of the gold articles were published without any APC. Moreover, the average APCs are below the mean reported by the Open APC initiative, currently around 1,900 €. The UB values are higher than the UPC ones, probably because journals in some of its disciplines, especially the health sciences, charge higher APCs.
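The average APC in Tables 7 and 8 is the estimated expenditure divided by the number of articles with an APC; dividing that number by the total of open access articles with a local corresponding author shows what share of them carried a fee. A short check using the 2017 figures reported above (the share is a derived figure, not one reported in the tables):

```python
def apc_summary(n_with_apc, expenditure, n_oa):
    """Return (average APC in euros, % of OA articles that had an APC)."""
    average = expenditure / n_with_apc
    share_with_apc = 100 * n_with_apc / n_oa
    return round(average, 2), round(share_with_apc, 1)


print(apc_summary(353, 596_931.36, 603))  # UB 2017:  (1691.02, 58.5)
print(apc_summary(145, 185_835.15, 213))  # UPC 2017: (1281.62, 68.1)
```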
To make these estimations we have applied some specific conditions:
- Both institutions have a membership with the publisher MDPI that gives a discount on the official APC. For the UPC the discount is 10% and the membership has no cost; for the UB the discount is 25% and the membership fee is included as a known expenditure.
- Since 2015, Catalan universities had an agreement with the Royal Society of Chemistry under its Gold for Gold programme, which offered a limited number of vouchers for publishing open access in subscription journals. These vouchers have been taken into account in all the estimates, because the agreement did not end until 2018.
- For the UB, we have also excluded from the estimate the publications that received a grant from the institutional fund (Universitat de Barcelona, 2019). This fund for publishing in open access venues was established in 2010; 92 publications received funding from it in 2017 and 101 in 2018. The amount paid from the fund is included as a known expenditure in the final estimate, as shown in Table 9.
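The three adjustments listed above can be applied per article before summing. In this hypothetical sketch the flags `rsc_voucher` and `institutional_fund` and the `publisher` field would have to be set by matching articles against voucher and fund records; the MDPI discount rates are the ones stated in the text. Articles paid from the fund are excluded because their cost is counted as known expenditure instead.

```python
# Hypothetical record fields; discount rates are those stated in the text.
MDPI_DISCOUNT = {"UPC": 0.10, "UB": 0.25}


def adjusted_apc(record, institution):
    """APC in euros that should count towards the estimated expenditure."""
    if record.get("rsc_voucher"):
        return 0.0  # covered by a Gold for Gold voucher: no extra payment
    if record.get("institutional_fund"):
        return 0.0  # paid from the fund: counted as known expenditure instead
    apc = record["apc_eur"]
    if record.get("publisher") == "MDPI":
        apc *= 1 - MDPI_DISCOUNT.get(institution, 0.0)
    return round(apc, 2)


def estimated_expenditure(records, institution):
    """Sum of adjusted APCs for one institution and year."""
    return round(sum(adjusted_apc(r, institution) for r in records), 2)


articles = [
    {"apc_eur": 1000.0, "publisher": "MDPI"},
    {"apc_eur": 2000.0, "rsc_voucher": True},
    {"apc_eur": 1500.0, "institutional_fund": True},
    {"apc_eur": 1200.0},
]
print(estimated_expenditure(articles, "UB"))  # 1950.0
```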
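Table 9: Known, estimated and total expenditure on publishing

| Institution | Year | Known expenditure | Estimated expenditure | Total estimation |
|---|---|---|---|---|
| UB | 2017 | 102,456.63 € | 596,931.36 € | 699,387.99 € |
| UB | 2018 | 100,055.91 € | 743,879.00 € | 843,934.91 € |
| UPC | 2017 | 0 € | 185,835.15 € | 185,835.15 € |
| UPC | 2018 | 2,165 € | 243,473.90 € | 245,638.90 € |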
Finally, we added to the estimated expenditure the amounts we know we have paid: for the UPC, the contribution to the SCOAP3 project; for the UB, the same contribution plus the MDPI membership and the annual open access publishing fund.
The final figures are shown in Table 9.
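The total in Table 9 is simply the known expenditure plus the estimated APC expenditure. The check below uses the table's own figures and Python's `decimal` module, a choice that avoids binary floating-point rounding when adding currency amounts.

```python
from decimal import Decimal

# Known and estimated expenditure in euros, copied from Table 9.
TABLE_9 = {
    ("UB", 2017): (Decimal("102456.63"), Decimal("596931.36")),
    ("UB", 2018): (Decimal("100055.91"), Decimal("743879.00")),
    ("UPC", 2017): (Decimal("0.00"), Decimal("185835.15")),
    ("UPC", 2018): (Decimal("2165.00"), Decimal("243473.90")),
}


def total_estimation(known, estimated):
    """Total = known expenditure + estimated APC expenditure."""
    return known + estimated


for (institution, year), (known, estimated) in TABLE_9.items():
    print(institution, year, total_estimation(known, estimated))
# UB 2017 699387.99
# UB 2018 843934.91
# UPC 2017 185835.15
# UPC 2018 245638.90
```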
This method has several limitations, some of which have already been mentioned. The first is the uncertainty about who actually paid. We assume that the corresponding author or the correspondence address identifies the researcher who paid, and therefore the institution that bears the expenditure.
As with the previous methodology, the outputs we used were extracted from two external sources, Web of Science and Scopus, and other outputs not indexed in those sources may also have incurred publication costs.
A third factor is the currency of the APC. Some journals list their publication fees in several currencies, but others only in their local currency, which leads to some variance when the programme calculates the estimate. Exchange rates also fluctuate during the year, introducing a further inaccuracy into our estimates.
Finally, authors can sometimes obtain discounts or waivers for their publications, and these cannot be detected in the estimates. We have included in our calculations the known memberships and voucher programmes that publishers provide to our institutions, but we have not been able to identify any individual agreements that may exist.
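One way to make the exchange-rate uncertainty visible is to recompute the total with rate snapshots from different points in the year and report the resulting range instead of a single figure. A sketch with hypothetical fees and rates:

```python
def estimate_range(fees, rate_snapshots):
    """Lowest and highest total in euros over several exchange-rate snapshots.

    fees: list of (amount, currency); rate_snapshots: list of dicts giving
    euros per unit of each currency. All values here are hypothetical.
    """
    totals = [sum(amount * rates[currency] for amount, currency in fees)
              for rates in rate_snapshots]
    return round(min(totals), 2), round(max(totals), 2)


fees = [(2000, "USD"), (1500, "EUR")]
snapshots = [{"EUR": 1.0, "USD": 0.85},   # e.g. a rate early in the year
             {"EUR": 1.0, "USD": 0.92}]   # e.g. a rate later in the year
print(estimate_range(fees, snapshots))  # (3200.0, 3340.0)
```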
The Open Access Observatory project aims to become an international reference in the open access landscape. We want to provide as much data as possible on this topic from Catalan universities, since we think it is essential to have data and evidence before discussing any process of transformation towards a fully open access scholarly communication system. Moreover, the data can be used to compare our institutions with any other institution in the world and to identify trends and changes.
We think we have achieved our first objective, which was to introduce two methodologies: one for measuring the situation of open access and one for estimating the costs of publishing in open access. Our second objective was to invite the rest of the Catalan universities, as well as research institutes, to join the project. Once all these institutions have joined, we should have a full picture of the Catalan research system. When we presented the project at the 2018 Annual LIBER Conference (Rovira and Labastida, 2018), the Open University of Catalonia had already joined. At the time of writing, ten Catalan universities have joined the initiative, and the network of Spanish university libraries is using the first methodology to describe the current situation of open access in Spain.
Finally, as a third objective, we want to keep working in this field, providing more data and sharing them with everyone. If the data we provide can be used in the transition to open access, we will have achieved our ultimate goal: to become a reference for the open access movement.
Data: The full data for the publications of both UB and UPC for the years 2011–2018 (version 2019) are available in the Harvard Dataverse at https://doi.org/10.7910/DVN/QKAMHR.