Since the mid-1990s, the Web has been a center for information discovery, retrieval, and creation. Seemingly overnight, the Web changed how library users found and used information. Where once there was a physical door to the library, there is now an electronic door in the form of the library’s web site, the gateway to library subscription databases and other discovery tools. Yet as the Web (and information technology in general) has matured, finding and retrieving information has become more fragmented. Not so long ago, users had two choices: get it from the shelf or find it electronically via a database. There are now far more options for finding information, including globally available search tools like Google Scholar and academic communities such as Academia.edu and ResearchGate.
There are also more options for storing and sharing information. Users may adopt citation management software, such as Mendeley, Zotero, or Endnote, to store, annotate, and cite works. Broadly speaking, citation management software has the capacity to change how users manage their scholarly information collections. It provides a central place for several activities in the scholarly workflow: storing, annotating, citing, and sharing. At present, discovery of new materials is available within these tools, but user adoption of discovery options is not high, and comes with a learning curve for users accustomed to searching the web for information. Citation management software is beginning to be seen as part of a portfolio of research management tools (see Elsevier’s acquisition of Mendeley, and Digital Science’s development of ReadCube and acquisition of Papers, both information management tools). Citation management software may become more connected with other tools central to the research workflow, including lab notebook, data management, and research collaboration software (Schonfeld, 2018). This could raise the visibility and adoption rates of these tools in the future.
This article explores the concept of embedding discovery within citation management software with an eye towards other software environments in the research workflow that might also incorporate discovery. First, we review efforts to place information retrieval within citation management software, and explore related literature on situating discovery more intuitively for researchers. We look at the most frequently adopted citation management software in current use. Building upon this prior research, we describe the results of interviews with Penn State faculty members, focused on their research workflows. This research follows the findings from a qualitative, ethnographic study funded by the Andrew W. Mellon Foundation, analyzing how users at Penn State University, University Park, find, store, annotate, cite, and share information resources (Antonijević & Cahoy, 2014). The findings from this study indicated that two pervasive areas of disconnect exist within the scholarly workflow: discovering and saving new materials, and archiving self-authored work. We share our recent results, which confirm that faculty users welcome access to library resources within software significant to their individual research workflow. Our interviews also found a desire for more automated services (anticipatory completion of citations, analysis of existing bibliographies), as well as an enthusiasm for commercial services such as Google Scholar and Academia.edu.
As discovery shifts away from web-based platforms and fully situates within software and other platforms, libraries must begin locating discovery services within these tools as well. This choice will help users more intuitively find relevant information sources while connecting those resources to other critical phases in the scholarly workflow, thereby maximizing productivity and minimizing information loss.
2. Literature on Discovery—on the Library Web Site and Elsewhere
Ithaka S&R has been a major source for research on faculty members’ research behavior and discovery practices. The most recent report, the Ithaka S&R Faculty Survey 2015, looks again at where researchers begin their research, a question that has been explored in this series since 2003 (Wolff-Eisenberg, Rod, & Schonfeld, 2016). In previous years, faculty were more likely to use a discipline-specific, electronic subscription database (such as Web of Science) than they were to use a broader search tool (such as Google or Google Scholar). The 2015 report notes a shift towards the broader search tools, with faculty equally likely to turn to a search engine or a subscription database to serve their research needs. However, the report also discusses the rise in use of the library web site as a discovery portal, a trend which has been on the rise since 2012, perhaps coinciding with the library web development trend towards creating a front page that functions primarily as a search interface (Schonfeld, 2015, p. 12).
In “Meeting researchers where they start: Streamlining access to scholarly resources”, Roger Schonfeld outlines six areas of failure relative to discovery and libraries, including the difficulty of off-campus access and the shrinking profile of the library web site as a singular search destination. He notes that “Mechanisms for content access succeed only when they conform to Lorcan Dempsey’s observation that ‘discovery happens elsewhere.’ Authentication and authorization to licensed e-resources must work effectively without regard to the researcher’s starting point” (Schonfeld, 2015, p. 3). Further, Schonfeld argues that “To understand researcher practices, user experience specialists both in a library and a content provider setting should examine the researchers’ actual practices. Rather than trying to focus on specific tasks related to the system that their current project covers, as is all too often the approach taken, a more holistic, ethnographic perspective is vital” (Schonfeld, 2015, p. 13).
In “Thirteen ways of looking at libraries, discovery, and the catalog: Scale, workflow, attention”, Dempsey (2012) discusses the movement of user search behaviors from the local library catalog to the “network scale”—for example, searching Worldcat.org, a consolidated catalog, for a locally owned book. In this environment, he notes that “syndication and leveraging strategies” are needed, including connecting to networked resources, such as link resolver recognition within Mendeley or Google Scholar (Dempsey, 2012, p. 2). In this respect, the local catalog remains the data source, but the user accesses the data via a more globally available resource. Dempsey states, “The use and mobilization of bibliographic data and services outside the library catalog is an increasingly important part of library activity. This is especially important as ‘discovery increasingly happens elsewhere’—in other environments than in the library” (Dempsey, 2016, p. 12).
“Thinking the unthinkable: a library without a catalogue—Reconsidering the future of discovery tools for Utrecht University Library” describes the challenges facing libraries with regard to search and the significance of a destination web presence (Kortekaas, 2012). Citing user study trends, the Utrecht University Library adopted three approaches to discovery: a decision not to implement a large discovery search tool; a commitment to embed library collection-related metadata in global initiatives; and a re-envisioning of the local online public catalog as a tool primarily for known-item searches.
Grant identified the concept of discovery within context specific software as a ‘knowledge creation platform’ (Grant, 2013, pp. 67–73). He highlighted the components of this platform as discovery, social networking, ready access to library expertise and services, and, perhaps most significant, “integrated tools for creating new knowledge.” Grant further specified that the knowledge creation tools “should cleanly integrate within the interface of the KCP so that again, the end-user does not need to step out of the interface in order to actively work on their research or assignment” (Grant, 2013, p. 71).
3. Discovery and Reference Management Software
Recent library-focused research on citation management software has concentrated primarily on the library’s provision of support for citation management software users (Childress, 2011; Francese, 2013; Rempel & Mellinger, 2015). Less common are articles looking at the software functionality itself, and at opportunities for embedding library services, including discovery options. Lubke, Paulus, Britt, and Atkins (2017) look at options for unifying the workflow steps connected with the literature review process, but do not address discovery, instead beginning the workflow with the act of PDF storage and annotation.
“As the process of citation management changed, the social dimension became more important, as citations are ‘social objects’ around which connections can be made. Today, citation management programs not only provide a repository in which you can store your work, but also allow you to share your work and to search the work of others” (Dempsey & Walter, 2014). Dempsey (2016) notes the significance of citation management software for academic libraries beyond simply supporting use of these products: “As some of these researcher-facing ‘productivity’ services are repackaged as licensed institutional offers, libraries will face important decisions about sourcing and procurement of workflow support services.”
4. Discovery Within Citation Management—An Overview of Current Software
Citation management software became readily available in the 1990s, and has experienced increased user adoption since that time. Originally envisioned primarily as a tool for citing sources and building bibliographies, citation management software has recently begun to increase its scope. Scholarly publishers, such as Elsevier and Nature/Digital Science, see the utility of embedded discovery within citation management software, and have developed discovery options within the native interface. In order to clearly illustrate the current state of discovery within citation management software, we present an overview of the major providers in this area, and the discovery and other workflow innovations included within individual software.
The criteria for inclusion in this overview included the following factors:
- The citation management software is an established tool with adoption in academia.
- The citation management software allows for storage of citation data and pdfs.
- The citation management software works collaboratively with Microsoft Word and other word processing software to produce formatted citations in a style of choice.
Optional criteria included:
- The citation management software allows for pdf annotation and note taking within the software.
- The citation management software includes a search or other discovery function within the software to search for and identify other relevant scholarly works.
- The citation management software includes ‘smart’ analysis of the user’s library and suggestion of other possible relevant works of interest.
- The citation management software allows for authentication to the user’s academic institution, including library resource access.
Web-based citation generators, such as EasyBib or Citation Machine, did not meet the required criteria and were therefore not included in this review.
One of the oldest reference managers, Endnote, has been in existence since the 1990s, and remains the most traditional software program of its kind. Endnote is produced by Clarivate Analytics, and as such, is embedded within another company product, Web of Science (Clarivate Analytics, 2016). In Web of Science, Endnote users can save citations to Endnote Web or export citations to the desktop version of Endnote. Endnote’s discovery options within its web and desktop interfaces are more limited than those of similar software, such as Mendeley or ReadCube. A ‘Find Full Text’ option allows users to match citations with affiliated PDFs through an automated search of large repositories. ‘Capture’, a recently released Endnote bookmarklet tool, allows capture of citations from within a web browser. Users can also search for and retrieve citations through a Z39.50 search tool that includes many library catalogs. This search tool retrieves only citations, and while it is one of the first examples of discovery within citation tools, it remains rudimentary in its approach and in its search results. Unlike Mendeley, Endnote does not offer analysis of a user’s reference library (or the resulting suggestions of relevant articles).
Mendeley, citation and reference management software for the Web, desktop, and mobile devices, was founded in 2007 (Shema, 2012). Initially, Mendeley’s most significant focus was bibliographic citation management, including integration with Microsoft Word and other word processing software. In recent years, Mendeley has begun to expand its reach as a platform for collaboration and information discovery. Mendeley’s catalog of user-uploaded citations and papers is an expansive data collection unto itself. Although the catalog lacks the standardization of a traditional catalog, Mendeley users can search it from within the application, saving citations and PDFs to their reference library. On the Mendeley web site, users can also personalize their account to add full-text finding options linking to their home institution resources. A newer feature, Mendeley Suggest, analyzes an individual user’s library and provides relevance-ranked articles based on works in their existing library. In 2013, the publishing conglomerate Elsevier bought Mendeley. The acquisition has (to date) not changed how users access articles from within Mendeley. It has, however, broadened how users can access Mendeley, including the embedding of Mendeley services within Elsevier-provided databases (such as ScienceDirect).
Like Mendeley, Zotero is a more recent tool (released in 2006) and has a web and standalone version available to users (Zotero, 2008). The original version of Zotero existed entirely within the Web browser (as a plugin for Firefox/Chrome/Safari/IE). A later release added a desktop version, Zotero Standalone. Until 2016, options for discovery were not available within the Zotero interface, perhaps because Zotero was situated directly within the discovery portal itself (the Web browser). A new optimization created in 2016 (and funded by a Mellon Foundation grant awarded to the author and Zotero) allows for user-created feeds within the Zotero interface. These feeds can be for a search, a journal title (and affiliated new articles), or other relevant syndicated content. The user views the feeds within the Zotero interface, and selects specific items from the feed to add to their Zotero library. The feature carries some complexity, in that a user must know how to create and use an RSS feed. As this is a very new service (still in beta testing) within Zotero, its utility and level of usage remain to be seen.
ReadCube, software developed in 2011 by Digital Science & Research Solutions, is a reference management program with search at the center of the interface. New users are encouraged to deposit their PDF collections within ReadCube, which then processes the articles by DOI and other metadata, and (in the paid version) allows users to automatically connect with and retrieve articles citing and cited by articles in the user’s collections. In other words, ReadCube places discovery of new materials entirely within the user interface. ReadCube also has agreements with several large publishers, including Nature Publishing Group, Frontiers, and Wiley Publishing to feature their journal articles (including the option to purchase access to articles) within their interface. The user has the option to search several different catalogs from within the ReadCube interface, including Google Scholar, PubMed, and the ReadCube catalog. The option to authenticate with an institution specific proxy to aid in full text discovery is also available. ReadCube also recently acquired Papers3, another reference management tool featuring embedded search options (Digital Science & Research Solutions, 2016). As Mac only software, Papers3 features embedded search similar to ReadCube, but also offers enhanced citation management capabilities. Digital Science & Research Solutions’ acquisition of Papers3 may be with an eye towards bringing the strengths of Papers3 and ReadCube together within one tool in the future.
4.5. Other Citation Managers
A wide range of reference management systems exist for users with specific needs. ProQuest owned Refworks is an older tool, created in 2001 and entirely Web-based. Marketed heavily to libraries, access to Refworks can be embedded within library databases. Refworks also takes institutional affiliation and employs it within the interface as an institution specific link to full text articles (Refworks, 2016). Sente reference management software is Mac only, and features an embedded browser directly within the interface. The user does not need to leave the Sente environment to search for and retrieve resources. Sente’s ‘targeted browsing’ features allow users to search for information on supported websites, seeing automatically which articles are already in their library. Within Sente, options also include the ability to automate regular searches of selected databases (PubMed, Web of Science, Z39.50 tools) (Third Street Software, 2016). While discovery is more separate in Sente, it is integrated within the software in a manner that allows users to find, store, annotate, and cite from one tool. Perhaps because they are Mac only, Sente and Papers3 have not experienced the adoption levels of Mendeley, Zotero, and Endnote. Other tools with smaller user bases (not studied in this article) include BibDesk, JabRef, Citavi, CiteULike, and Connotea.
Integration of library discovery services within citation management software remains limited. Of the tools detailed in this article, few offer options to link directly from within a reference management interface to library databases or other services. ReadCube provides perhaps the best access, asking users explicitly for their institutional affiliation and authentication information. ReadCube also prompts the user to log in for authentication as the article retrieval process automatically begins within the interface. Endnote provides the opportunity to enter an authentication URL and an SFX resolver in preferences to aid in connecting with article PDFs via ‘Find Full Text’. Endnote’s Z39.50 search of library catalogs and other resources is rudimentary, though it is perhaps one of the first examples of a search tool within citation software. Endnote also supports searching selected subscription databases (although this feature typically only works via a private subscription rather than an institutional one). Zotero features RSS feeds for discovery, which could be generated from library subscription databases. Mendeley previously offered the option (on their web interface) to enter an authentication URL and connect through the library to subscription articles. In 2016, this option was changed to only provide a DOI search to a journal provider (Gunn, 2016). Mendeley also features a newer service, Mendeley Suggest, which analyzes the user’s library and offers recommended citations based on the user’s library data. ReadCube offers a similar service as well, and their enhanced PDF optimization makes references within an individual article clickable, simplifying retrieval of related works. If anything, the current development trajectory for reference management software indicates that emphasis on the journal provider, rather than the user’s academic institution, will remain at the forefront in the near future.
Work and advocacy are needed from academic libraries to ensure greater recognition of the important role of library services and institutional authentication in reference management software use.
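To make the authentication and link-resolver settings mentioned above more concrete, the following minimal Python sketch shows how a citation manager might hand a stored citation to an institutional link resolver as an OpenURL (Z39.88-2004) key/value query. The resolver address, the `build_openurl` helper, the citation field names, and the placeholder DOI are all illustrative assumptions; the sketch does not describe how Endnote, ReadCube, or Mendeley actually implements this step.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical mapping from a citation manager's field names to OpenURL journal keys.
FIELD_MAP = {
    "title": "rft.atitle",
    "journal": "rft.jtitle",
    "year": "rft.date",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "issn": "rft.issn",
}


def build_openurl(resolver_base, citation):
    """Build an OpenURL 1.0 key/value query for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy over only the fields the stored citation actually has.
    for field, key in FIELD_MAP.items():
        if citation.get(field):
            params[key] = str(citation[field])
    if citation.get("doi"):
        params["rft_id"] = "info:doi/" + citation["doi"]
    return resolver_base + "?" + urlencode(params)


# Assumed resolver address and placeholder DOI, for illustration only.
example_url = build_openurl(
    "https://resolver.example.edu/openurl",
    {"title": "Personal Library Curation", "year": 2014, "doi": "10.1000/example"},
)
example_query = parse_qs(urlsplit(example_url).query)
```

Because the institution's resolver, rather than the publisher, decides where the link leads, a design along these lines would keep library-licensed access in the retrieval path.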
5. Methodology and Prior Study Results
In 2012, the author received a grant from the Andrew W. Mellon Foundation to conduct research on faculty management of information within the scholarly workflow, including discovery and self-archiving of significant works (Penn State University, 2016). This grant was followed in 2014 by another Mellon Foundation grant, enabling further research on the scholarly workflow, including software development by the team behind Zotero, George Mason University’s citation management software, to embed new options for workflow management within the Zotero environment (Penn State University, 2014).
5.1. Study Results: 2012 Study on Faculty Scholarly Workflow
The results of the 2012 study on faculty scholarly workflow were shared in the article “Personal Library Curation: An Ethnographic Study of Scholars’ Information Practices” (Antonijević & Cahoy, 2014). The article presents the results of a web-based survey of scholars (n=196), as well as the analysis of ethnographic interviews with 23 Penn State faculty members during the same time period. The survey and interviews indicated, across faculty, a preference for electronic searches for information sources, more often using commercial sources (such as Google or Google Scholar) than more local, library-based resources (although Humanities researchers were more likely to start with library databases). Faculty also relied heavily on their own personal collections of article PDFs and data. With regard to citation management software, the 2012 study found limited use: slightly more than 50% of surveyed Penn State faculty in the Sciences, and 30% in the Humanities, used such software. Faculty noted dissatisfaction with citation management software as a reason for non-adoption. When queried, a majority of survey respondents indicated that the responsibility for education on workflow management resided with the scholar, and not the library or campus librarians. With the results of the survey and interviews combined, the study found overall that faculty experience a pervasive disconnect between the activities of finding information (typically in a web-based, commercial service) and annotating and citing the information (within Microsoft Word or citation management software). Similarly, the act of archiving was also disconnected from the research process, with a majority of respondents indicating that they had lost important files or data. Given this portrait of a disconnected scholarly workflow, particularly in the areas of discovery and self-archiving, the 2014 study was created to begin to explore and address this need.
5.2. 2014 Study on Software Optimizations and Impact on the Scholarly Workflow
In the second stage of our study, we conducted usability testing and interviews, focused on new enhancements in the areas of discovery and archiving added to the citation management software Zotero. The author worked with Zotero software developers to embed new functionality within Zotero, based on the first-phase findings regarding the disconnectedness of the faculty scholarly workflow. Two specific enhancements were added to Zotero (and as of 2016 are publicly available to all users) as a result of the initial study (Takats, 2014). The first, the addition of RSS feeds, addressed issues with the disconnected nature of discovery in relation to citation management software. The capability to add RSS feeds of any kind (including those pointing to journal-level tables of contents or targeted article or database keyword searches) was embedded within the Zotero interface. The user identified the RSS feed (from outside Zotero) and then entered the feed URL into the Zotero interface. Once the feed was accepted, it would begin retrieving results, which the user could browse within the Zotero interface, selectively deciding whether to add individual results to the user’s Zotero library. The second optimization focused on self-archiving of authored works. Zotero created a new ‘My Publications’ folder (the feature is also seen in other citation managers, such as Mendeley and Endnote). The ‘My Publications’ folder is intended to be populated with the user’s authored works. Once a publication is in the folder, the user may decide on its access level, and may feature the publication(s) on their public Zotero profile. An additional enhancement created as a result of this study connected Penn State’s institutional Hydra-based repository, ScholarSphere, with the ‘My Publications’ folder, so that Penn State users could easily self-archive both on the Zotero server and within the Penn State IR.
The code to connect a Hydra repository and Zotero is now publicly available for other institutions’ use (projecthydra, 2016).
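The feed workflow described above (the user supplies a feed, browses the retrieved items, and selectively saves some of them) can be sketched in a few lines of Python. This is an illustration only, not Zotero’s implementation: the sample feed, the `parse_feed` and `add_selected` helpers, and the dictionary-based library are hypothetical, and a real client would fetch the feed from the URL the user enters instead of reading an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a journal's table-of-contents feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal: Latest Articles</title>
    <item>
      <title>Article A</title>
      <link>https://example.org/a</link>
    </item>
    <item>
      <title>Article B</title>
      <link>https://example.org/b</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return the feed's items as a list of {'title', 'link'} dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


def add_selected(library, feed_items, selected_links):
    """Add only the feed items the user picked, skipping ones already saved."""
    saved = {entry["link"] for entry in library}
    for item in feed_items:
        if item["link"] in selected_links and item["link"] not in saved:
            library.append(item)
            saved.add(item["link"])
    return library


# The user browses both items but chooses to keep only the second one.
feed_items = parse_feed(SAMPLE_FEED)
library = add_selected([], feed_items, {"https://example.org/b"})
```

Keeping the browse step separate from the save step mirrors the selective behavior described in the study: nothing enters the library until the user chooses it.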
5.3. Post-usability Interview Findings
A total of eight (8) subjects, four graduate students and four faculty members, participated in usability testing of the Zotero enhancements and a follow-up discussion of the utility of the new features. The small sample size for the usability testing follows Nielsen and Landauer’s (1993) finding that the highest ratio of benefit to cost occurs in a usability testing pool of between six and eight evaluators. The participants were divided between the social sciences and the sciences, with one Humanities graduate student. Audio recordings and transcripts were created for the sessions. The post-usability testing interviews lasted approximately 30-45 minutes. After usability testing concluded, each interviewee was asked a series of questions related to Zotero software and their own scholarly workflow needs (Appendix A). A majority of the participants had also participated in the 2014 study, and used citation management software regularly in their work. Although broad conclusions perhaps cannot be drawn from the interview findings, the feedback and ideas from this pool of expert and enthusiastic citation management software users provide food for thought on the future of these tools. While the initial Zotero optimizations proved challenging for usability testing, the post-usability testing interviews brought forward unique findings on what users expect and hope for with regard to workflow tools.
In the interviews, a majority of subjects referenced the disconnectedness of their scholarly workflow, and indicated the need for increased discovery options within the citation management software interface. General trends that emerged across the interviews addressed ‘smarter’ functionality within software (citation management software and word processing software), the value of commercial, broadly available scholarly services (such as Google Scholar, Academia.edu, and ResearchGate), and a perceived lack of value for local storage and social networking services, including the institutional repository. While there are detailed findings in all of these areas, we focus in this article on the findings related primarily to discovery within the workflow as a whole.
5.4. Automating and Connecting the Scholarly Workflow
In general, multiple interview subjects desired greater automation of the scholarly research process. Several subjects wanted citation management software to automatically ‘complete’ incomplete citations (according to individual citation style needs) without intervention or direction from the user. Another subject mentioned ‘anticipatory’ automation, where (within Microsoft Word) the citation management software would automatically complete a citation based on the references discussed in a specific paragraph. A tenured faculty member shared a ‘wish list’ of optimizations that he termed “customizable automation”: these included natural language searching, validation of localized services from within the tool, and notifications when new citations are found so that users can validate entries.
5.5. Discovery Feeds
Four of the subjects interviewed in the study indicated that they already receive new content alerts (for new journal articles, etc.) in their email accounts. All of the interview subjects were positive about the new discovery RSS feeds within Zotero. The graduate students preferred to have email alerts continue, in addition to receiving new citation feeds within Zotero. One graduate student noted that this dual notification would be a good reminder to go back into Zotero and engage with new sources. A tenured faculty member said of the utility of the feeds, “I think the main role is obviously discovering a new work. Since I do some variations on that, I would probably use it. It’s interesting now that I really think about it. I used to use RSS feeds all the time.” This faculty member also noted that “It’s much more random now than it used to be. It’s much more pull rather push.” New research (for whatever reason) had stopped flowing naturally into his workflow, and he welcomed an option that might change this. Another faculty member stated that he saw the feeds as a positive enhancement, yet would not use them within Zotero. He preferred to have his alerts continue to arrive in his email, where he could search his entire email collection to find and retrieve specific items.
5.6. Embedded Discovery Services
All of the usability subjects were enthusiastic about multiple services (including library authentication and content) within the citation management software interface. In one participant’s words, it would give her the ability to “multitask on one screen.” Another was positive about this, and stipulated the institution-specific authentication must also be integrated into the interface in order for this enhancement to be useful. A faculty member expressed a desire for natural language searching within the citation software, as well as the ability to automatically receive new relevant citations within the interface, with an alert for the user to validate and accept citations. Another faculty member had a more intricate idea for embedded services—analysis of existing bibliographies, combined with embedded discovery and authentication to bring relevant new works automatically to the user. In essence, this idea is that the citation management software looks at publications the user has written. It extracts the data from the publication bibliography, and retrieves cited works that are not currently in the user’s library, for the user to accept. It also does analysis on relevant works (based on the cited works in the bibliography) and asks the user to accept those citations as well.
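The bibliography-analysis idea in the preceding paragraph can be illustrated with a small Python sketch. It assumes, purely for illustration, that works are identified by DOI and that reference lists of other papers are available as plain lists; the function names, the sample DOIs, and the simple co-citation count used for ranking are hypothetical choices, not a description of any existing citation manager.

```python
from collections import Counter


def normalize_doi(doi):
    """Lower-case a DOI and strip common prefixes so variant forms compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def missing_cited_works(bibliography_dois, library_dois):
    """Works cited in the user's own publications that are not yet in the library."""
    in_library = {normalize_doi(d) for d in library_dois}
    missing, seen = [], set()
    for doi in bibliography_dois:
        key = normalize_doi(doi)
        if key not in in_library and key not in seen:
            missing.append(key)
            seen.add(key)
    return missing


def suggest_related(cited_dois, reference_lists, library_dois, top_n=3):
    """Rank works that other papers cite alongside the user's cited works."""
    cited = {normalize_doi(d) for d in cited_dois}
    known = cited | {normalize_doi(d) for d in library_dois}
    counts = Counter()
    for refs in reference_lists:
        ref_set = {normalize_doi(d) for d in refs}
        overlap = len(ref_set & cited)
        if overlap:
            # Credit each unfamiliar work once per shared reference.
            for doi in ref_set - known:
                counts[doi] += overlap
    return [doi for doi, _ in counts.most_common(top_n)]


# Placeholder DOIs for illustration only.
bibliography = ["10.1000/a", "https://doi.org/10.1000/B", "10.1000/c"]
library = ["doi:10.1000/A"]
other_reference_lists = [
    ["10.1000/a", "10.1000/b", "10.1000/x"],
    ["10.1000/c", "10.1000/x", "10.1000/y"],
    ["10.1000/z", "10.1000/q"],
]

missing = missing_cited_works(bibliography, library)
suggestions = suggest_related(bibliography, other_reference_lists, library)
```

In both steps the output is a list of candidates for the user to accept or reject, matching the faculty member’s expectation that the user, not the software, makes the final decision.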
6. Discussion and Conclusion
Our post-usability testing interview participants were uniformly focused on using commercial search and software tools (Mendeley, Zotero, ResearchGate, Google Scholar), and saw the benefits of accessing and utilizing resources on platforms that are not primarily locally developed. They were open to embedded discovery services, and utilizing these services within citation management software. It seemed that perhaps the biggest user barrier was in learning how to use and integrate citation management software into one’s workflow. Once that was achieved (as was the case with a majority of our participants, with most of them admittedly expert users), the idea of adding on additional services seemed natural and realistic.
What do these findings mean for citation management software designers? For the Zotero designers, there are several clear outcomes to share. Our interview subjects were enthusiastic about discovery within the citation software interface. They expressed a desire for better automation of tasks, across the board. Software providers should consider the primacy of email as an information collection (and as an alert/reminder service) for users, offering the option of emailing users when new citations are found by the feed. Users valued their email collections, and looked to their email as a reminder to return to workflows in more disconnected applications, such as Zotero.
Software providers should also consider ways to mine users’ existing citation collections for additional recommended citations. As we previously mentioned, other citation managers have begun this service, including Mendeley Suggest and ReadCube Recommendations. It makes sense to mine the data that the researcher has already deposited to increase the utility of the tool within the scholarly workflow. The idea suggested by one of our subjects, to pay special attention to importing works already cited in the researcher’s publications, would again likely be a substantial value-added service for users. Customization, automation, and predictive (i.e., smart) services are what the users in our study clearly wanted.
There are also recommendations for integration within word processing software. Our participants were, as a group, clear that they wanted better integration with word processing software, including infusing discovery into the word processing environment. How could this work? It might mean embedding semantic web capabilities within word processing software, for uses such as predicting or completing citations for the user, finding works attached to quotes used in text, and so on. This is a new area of development for citation management software and research workflow software in general, and one that should be taken seriously.
Like the conclusions leading away from local tool provision in Kortekaas’s ‘Thinking the Unthinkable’ (2012), these findings are a clarion call for academic libraries’ discovery, storage, and instructional strategies. The significant ‘critical mass’ of other researchers that one of our subjects referenced is not present on local tools, such as the online catalog or the institutional repository. The general focus of the post-usability testing interviews was to determine the utility of discovery and archiving within the Zotero interface. The findings were unanimous among our subjects that localized content and services are welcomed within citation management software. The challenge now is for software providers, publishers, and academic libraries to begin embedding content where our users are rather than where we want them to go (library web sites, publisher web sites, subscription databases). This requires cross-institutional work on the part of academic libraries; large developments like this cannot occur on a campus-by-campus basis. It also means that academic libraries must begin to give up local development of services that are not heavily or intuitively used by their core user groups. From an ego or vanity perspective, this will be difficult for academic libraries. Yet, if it means that the resources a user needs are directly (to borrow a phrase from Lorcan Dempsey) “in the flow” when and where they need them, haven’t the library’s goals as a content provider been met (Dempsey, 2005)? With a continued focus on how users find information outside the library web site, and beyond that, outside the web browser, software providers and libraries can begin to close the gap and bring resources to users more easily from directly within their research workflow.