Understanding and Making Use of Academic Authors’ Open Access Rights

INTRODUCTION Authors of academic works do not take full advantage of the self-archiving rights that they retain in their publications, even though research shows that many academic authors are well aligned (at least in principle) with open access (OA) principles. This article explains how institutionally assisted self-archiving in open access repositories can effectively take advantage of retained rights and highlights at least one method of facilitating this process through automated means.
METHODS To understand the scope of author-retained rights (including the right to purchase hybrid or other open access options) at a sample of universities, author-rights data obtained through the SHERPA/RoMEO API were combined with individual article citations (from Thomson Reuters’ Web of Science) for works published over a one-year period (2011) and authored by individuals affiliated with five major U.S. research universities.
RESULTS Authors retain significant rights in the articles that they create. Of the 29,322 unique articles authored over the one-year period at the five universities, 28.83 percent could be archived in final PDF form and 87.95 percent could be archived as the post-print version. In addition, 43.47 percent appeared in journals that offered authors the choice of purchasing a hybrid paid open access option.
DISCUSSION A significant percentage of current published output could be archived with little or no author intervention. With prior approval through an open access policy or otherwise, article manuscripts or final PDFs can be obtained and archived by library staff, and hybrid paid-OA options could be negotiated and exploited by library administrators.
CONCLUSION Although mandates, legislation, and other policy tools may be useful to promote open access, many institutions already have the ability to increase the percentage of accessible works by taking advantage of retained author rights and hybrid OA options.
© 2012 Hansen.
This open access article is distributed under a Creative Commons Attribution 3.0 Unported License, which allows unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
RESEARCH | jlsc-pub.org | Journal of Librarianship and Scholarly Communication
Implications for Practice:

• By combining rights data from SHERPA/RoMEO and aggregated citation data using relatively simple automated means, librarians can create an accurate and detailed understanding of the copyright situation for almost all institutionally-affiliated journal articles.
• Universities with open access policies can leverage the underexploited rights that faculty already retain by identifying, downloading, and then posting to institutional repositories the articles for which faculty retain final PDF archival rights.
• Identifying and posting final PDF versions of scholarly articles avoids potential faculty author concerns about disseminating pre- or post-print manuscripts while simultaneously building repository collections.
• Libraries and universities can identify and explore funding options for exploiting the large percentage of works for which paid hybrid OA options are already available.

INTRODUCTION
This paper examines the scope of author-retained rights in journal articles that were written by academic authors at universities with institutional repositories and with open access policies. The purpose of this examination is to highlight the rights that academic authors currently retain and to illustrate the ways that those rights might be more effectively leveraged to increase accessibility to their research.
Academic authors form part of the large group of authors for whom copyright's economic incentive structure has little or no effect on the decision to create (Carroll, 2010). Instead, the desire to see their work distributed and made widely available has spurred efforts to make research available online on an open access basis. Although definitions of exactly what "open access" means vary across formal declarations about the level of access and use permitted (Budapest, 2001; Berlin, 2003; Bethesda, 2003), the basic value is "unrestricted access and unrestricted use" (PLoS, n.d.). A large body of literature examines the relationship between these statements about open access and the variety of publishing, economic, and policy models used to achieve it (Suber, 2012; Bailey, 2012). Leaving aside the relative merits of open access and the practical, economic, and policy reasons for its widespread adoption, this paper focuses on the contractual arrangements through which open access is currently achieved and on the areas where existing technological and legal options could be exploited to further increase access to works in which authors already retain open access-compliant rights.
Although academic authors do not typically rely upon the potential economic benefits of copyright, the law vests those rights, at least initially, in all authors regardless of their motivations for creation (U.S. Code, Title 17, Sec. 201, 2006). Traditionally, authors of scholarly articles assign all of their rights under copyright, or grant an exclusive license to them, to the publisher of the journal in which the article will appear (Smith & Hansen, 2010). […] (Shieber, 2009a; Morrison, 2008). Even publishers that rely on traditional subscription journals now offer a hybrid "paid OA" option for authors who wish to make their individual articles freely available to the public (Elsevier, 2012). The majority of the approximately 1.5 million peer-reviewed scholarly articles published each year, however, are not made available through gold or hybrid-gold open access journals (Björk, Roos & Lauri, 2008).
'Green open access' is the second major way that published articles are made available on an open access basis: the author herself retains and then exercises her rights to post her own articles freely online. Self-archiving was once described as a "subversive" tactic that could both mollify the publishers operating the existing system of print subscription publications and force their restructuring (Harnad, 1995). Today, most major publishers allow at least some form of self-archiving in their publication agreements, yet research shows that authors do not regularly take advantage of those rights. Average author self-deposit rates in institutional repositories hover around a meager 15 percent of total article output (Harnad, Carr, Swan, Sale & Bosc, 2009). […] (2012), aims to prohibit such funding mandates from taking effect. Apart from government and funder efforts, deposit mandates have also been adopted by individual employers. For example, the University of Liege requires that its researchers deposit articles in its open access institutional repository; as an enforcement mechanism, the university considers only deposited articles in its internal promotion and review processes (University of Liege, 2008).
A second way to increase deposit rates is through university open access policies. Sometimes also referred to as "mandates," these policies share the goals of the Liege mandate but have weaker enforcement. They are more accurately described as "weak" faculty-adopted policies with two important characteristics: (1) forgiving and almost automatic opt-out mechanisms for authors (hence low compliance rates), and (2) strong licenses that allow the university to exercise rights over faculty-authored works. Institutions following the "Harvard Model" open access policy (Shieber, 2010) effectively have the authority to archive on the author's behalf. For versions of articles that can be archived under existing author contracts and that can be obtained without faculty intervention (i.e., final published PDFs), institutions can easily deposit those articles into their repositories without further author involvement.
Providing a means to identify and deposit articles with minimal author involvement is at the heart of the third strategy for increasing deposit rates: faculty assistance. Assisted deposit (e.g., deposit by librarians or university administrators on behalf of faculty authors) significantly improves deposit rates, but often at the high cost of adequately staffing the service (Xia, 2007). Moreover, without the proper infrastructure and support, staff often lack the authority to deposit, or access to, the versions of articles that faculty are permitted to deposit under their publishing contracts.
Beyond the issue of infrastructure, one of the key problems facing both mandated and assisted deposit is the nature of the rights that authors retain in their articles: those rights are often unclear, fragmented, and difficult to manage. Although this is a common problem with copyrighted works in the digital realm, where reuse is so often desired (Van Houweling, 2010), efforts to catalog the rights of authors in scholarly journal articles have progressed to the point where it is possible to make reasonably certain statements about the open access rights that authors retain in the particular articles they have authored. SHERPA/RoMEO, a database of standardized summaries of publishers' author agreements and self-archiving policies, allows those rights decisions to be automated (SHERPA/RoMEO, n.d.). This paper outlines a preliminary review of what that automated rights analysis might look like and explains how it could be leveraged to increase access to scholarly research articles. By combining this review with a discussion of the opportunity presented by hybrid open access options, the author hopes to highlight a way forward for institutions that wish to improve the accessibility of their faculty's scholarly articles, whether by exercising existing rights to self-archive, by paying hybrid author fees, or by a combination of both strategies.

LITERATURE REVIEW
Several studies have previously explored the aggregate level of rights retained by authors. These studies, using publishing data and similar high-level inputs, describe the general state of either the rights available to authors (Harnad, Carr, Swan, Sale & Bosc, 2009) or the rights exploited by authors (Björk, Roos & Lauri, 2008). Others have explored in detail the particular types of rights retained and the meaning of, and methods of bargaining for, particular contractual language (Fitzgerald & Long, 2008; Duranceau & Anderson, 2009). Such studies are useful in evaluating the willingness of publishers to adopt open access-compliant policies, though they do so at a less granular level than this study. Harnad, Carr, Swan, Sale & Bosc (2009), for example, report that 69 percent of articles in a given set could be archived in their post-print version under existing policies, and that 29 percent could be archived in a pre-print version. They give no data on final PDF versions or on paid (hybrid) open access options. One study, by Mercer & Emmett (2005), does report data using the SHERPA/RoMEO database, the same author-rights database that this paper employs, and does give information on these factors; however, it relies on manual evaluation and therefore examines only a small sample of articles.
• The version of the article: In general, publishers provide for differing access rights to pre-print versions (the article before submission to the journal), post-print versions (the version of the article after peer review but before final formatting), and the final PDF of the article (the version that appears in the published journal).
• The time period of availability: While some publishers allow archiving rights to take effect immediately, many others impose embargo periods during which the author may not exercise her archiving rights. Embargo periods generally range from six to 24 months.
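To make these two dimensions concrete, the sketch below models one version's archiving terms as a small record and computes the earliest date on which that version may be deposited. It is a minimal illustration under stated assumptions: the names `VersionRights`, `embargo_months`, `add_months`, and `earliest_deposit` are hypothetical and are not SHERPA/RoMEO's own schema, and embargoes are assumed to run in whole calendar months from the publication date.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionRights:
    """Hypothetical record of one version's archiving terms (not RoMEO's schema)."""

    version: str              # e.g. "pre-print", "post-print", or "final PDF"
    allowed: bool             # may the author archive this version at all?
    embargo_months: int = 0   # 0 means the right takes effect on publication


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_deposit(rights: VersionRights, published: date) -> Optional[date]:
    """Return the first date on which this version may be archived, or None if never."""
    if not rights.allowed:
        return None
    return add_months(published, rights.embargo_months)


if __name__ == "__main__":
    post_print = VersionRights("post-print", allowed=True, embargo_months=12)
    final_pdf = VersionRights("final PDF", allowed=False)
    print(earliest_deposit(post_print, date(2011, 6, 15)))  # 2012-06-15
    print(earliest_deposit(final_pdf, date(2011, 6, 15)))   # None
```

Treating "not allowed" as `None` rather than as an infinitely long embargo keeps prohibited versions clearly separate from embargoed ones when results are tallied.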
The five universities that are the subject of this study were drawn from member institutions of the Coalition of Open Access Policy Institutions (COAPI), which is made up of 46 member schools from North America (COAPI, 2012). From those 46 member universities, five were selected as a sample to evaluate the scope of rights retained by authors. Because the citation database draws university affiliation from the same field as the author's address, schools whose names overlap with geographic place names (Kansas University, for example) were excluded to avoid over-including articles written by non-affiliated authors in the same state. The five selected universities were Duke University, Emory University, Massachusetts Institute of Technology, Princeton University, and Stanford University. Each university has either a university-wide or a school-specific open access policy.
Citations were collected from Thomson Reuters' Web of Science, which includes the Science Citation Index Expanded, Social Sciences Citation Index, and the Arts & Humanities Citation Index, which combined cover more than 10,000 leading journals (Thomson Reuters, n.d.). It should be noted that Web of Science does not cover the entire universe of published scholarly articles and sampling only from Web of Science does represent a limitation for this study because a significant number of published articles from the selected universities may not be accounted for. However, Web of Science was selected both because of its scope and its ability to provide sufficiently detailed citation data needed for this study.
Recognizing that the total number of articles and the overall picture of author rights may differ from the numbers shown here, results drawn from Web of Science alone still reveal a significant number of articles for which authors' OA rights could be further exploited. In total, 30,454 citations found in Web of Science were associated with authors from the five universities (Table 1). Accounting for duplicates (e.g., where authors from two or more of the universities collaborated on a single article) leaves a total of 29,322 unique article citations.
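Removing duplicates amounts to keeping one row per article identifier across all exports. The sketch below assumes, as an assumption about the export layout, that each exported row carries Web of Science's accession number in a `UT` column; the function name `dedupe_citations` and the sample identifiers are invented for illustration and are not study data.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def dedupe_citations(exports: Iterable[List[Row]], key: str = "UT") -> List[Row]:
    """Merge citation exports, keeping the first row seen for each article.

    `exports` holds one list of rows per export file (per university or per
    500-record batch); `key` names the column assumed to hold a unique
    article identifier.
    """
    seen = set()
    unique: List[Row] = []
    for export in exports:
        for row in export:
            article_id = row[key]
            if article_id not in seen:
                seen.add(article_id)
                unique.append(row)
    return unique


if __name__ == "__main__":
    # Invented identifiers: article "A2" lists authors from both universities.
    university_1 = [{"UT": "A1"}, {"UT": "A2"}]
    university_2 = [{"UT": "A2"}, {"UT": "A3"}]
    merged = dedupe_citations([university_1, university_2])
    raw_total = len(university_1) + len(university_2)
    print(raw_total, len(merged), raw_total - len(merged))  # 4 3 1
```

In the study's terms, this difference between raw and unique counts is what separates the 30,454 citations from the 29,322 unique articles, a gap of 1,132 duplicates.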
Citations were collected by searching Web of Science for the relevant organization in its enhanced "organization" field (Web of Science field tag "OG") and downloading the resulting citations in tab-delimited format. Because Web of Science limits exports to batches of 500 citations, matches were downloaded in batches of 500 and then combined manually. The combined citations were then parsed for unique ISSNs, yielding a list of 4,874 unique ISSNs. Using a simple script, the SHERPA/RoMEO database was queried for each unique ISSN. For the few journals with multiple associated publisher policies (which happened when a journal was sold or otherwise moved from one publisher to another), a manual inquiry was made to determine the correct policy for the given time period. For each unique journal, information was collected for the following SHERPA/RoMEO fields:
• Title (title of journal)
• Publisher (publisher of journal; some journals were affiliated with more than one publisher because ownership changed over time)
• preArchiving (the publisher's allowance for archiving the pre-print version of the article; values include "can," "cannot," "restricted," "unclear," or "unknown")
• preRestrictions (any relevant restrictions on the author's right to archive pre-prints, typically an embargo period or funder requirements)
• postArchiving (the publisher's allowance for archiving the post-print version of the article; values include "can," "cannot," "restricted," "unclear," or "unknown")
• postRestrictions (any relevant restrictions on the author's right to archive post-prints, typically an embargo period or funder requirements)
• pdfArchiving (the publisher's allowance for archiving the final PDF version of the article; values include "can," "cannot," "restricted," "unclear," or "unknown")
• pdfRestrictions (any relevant restrictions on the author's right to archive the final PDF, typically an embargo period or funder requirements)
• paidAccessURL (link to any publisher open access option for the final PDF of the article)
• paidAccessName (name of the publisher's paid open access option)
• paidAccessNotes (notes about limitations on the paid open access option)
• romeoColour (SHERPA/RoMEO color code)
For journals for which SHERPA/RoMEO did not have a publisher policy on file, the appropriate fields were populated with the value "unknown." These data ("Author Rights Data") were then associated with each unique article citation by matching the citation's ISSN with the ISSN connected with the Author Rights Data. Although there are many ways to do so, a simple method (and the one used for this inquiry) is to use Microsoft Excel's VLOOKUP function to match ISSNs in the tab-delimited citation spreadsheet against a master list of ISSNs and their associated SHERPA/RoMEO rights data.
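The ISSN extraction and matching steps can also be scripted instead of done in a spreadsheet. The sketch below is a minimal stand-in for the VLOOKUP approach, assuming the tab-delimited export has an `SN` (ISSN) column and that rights data have already been retrieved into a dictionary keyed by ISSN. It does not call the SHERPA/RoMEO service itself; `unique_issns`, `attach_rights`, and the `rights_by_issn` contents are hypothetical, and the ISSNs shown are invented placeholders. Citations whose ISSN has no policy on file receive "unknown" in every field, mirroring the procedure described above.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]
RIGHTS_FIELDS = ["preArchiving", "postArchiving", "pdfArchiving", "romeoColour"]


def unique_issns(rows: List[Row], column: str = "SN") -> List[str]:
    """Collect the distinct, non-empty ISSNs in first-seen order."""
    ordered: Dict[str, None] = {}
    for row in rows:
        issn = (row.get(column) or "").strip()
        if issn:
            ordered.setdefault(issn, None)
    return list(ordered)


def attach_rights(rows: List[Row], rights_by_issn: Dict[str, Row],
                  column: str = "SN") -> List[Row]:
    """Left-join rights data onto each citation by ISSN (the VLOOKUP step).

    Citations whose ISSN has no policy on file get "unknown" in every field.
    """
    unknown = {field: "unknown" for field in RIGHTS_FIELDS}
    joined: List[Row] = []
    for row in rows:
        issn = (row.get(column) or "").strip()
        rights = rights_by_issn.get(issn, unknown)
        joined.append({**row, **rights})
    return joined


if __name__ == "__main__":
    # Invented two-row export; real exports have many more columns.
    export = "UT\tSN\nA1\t1111-1111\nA2\t2222-2222\n"
    rows = list(csv.DictReader(io.StringIO(export), delimiter="\t"))
    rights_by_issn = {  # invented values standing in for stored RoMEO lookups
        "1111-1111": {"preArchiving": "can", "postArchiving": "can",
                      "pdfArchiving": "cannot", "romeoColour": "green"},
    }
    print(unique_issns(rows))  # ['1111-1111', '2222-2222']
    for citation in attach_rights(rows, rights_by_issn):
        print(citation["UT"], citation["postArchiving"], citation["pdfArchiving"])
```

Looking up each unique ISSN once, rather than once per citation, is what keeps the query count at the size of the journal list (4,874 in this study) instead of the article list.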
It should be noted that the Author Rights Data, while useful, are necessarily incomplete. For one, these data may be oversimplified; some publishers have nuanced terms in their agreements that are not fully captured by the data. Also, although quality checks revealed no inconsistencies between the SHERPA/RoMEO Author Rights Data and the available publisher contracts, opportunities for mistakes certainly exist. Nevertheless, the SHERPA/RoMEO database is unique in providing information about private copyright agreements that is both accurate and aggregated, two qualities needed for effective institutional assistance to authors exercising their open access rights.

RESULTS
Final PDF archival rights. For purposes of assisted deposit, the final PDF version of the article is of the most interest because libraries or others can easily obtain these versions of the article without faculty-author intervention. (In addition, faculty authors may prefer the deposit of a final PDF version over a pre-print or post-print: the former lacks the authority of peer review, and the latter is more difficult for subsequent researchers to cite.) In total, 8,456 articles (28.83 percent of total unique articles) could be archived in final PDF form.
Table 2. Final PDF archival rights
[…]
Allowed (with embargo): 1,576 (5.37% of total unique articles)
Allowed (restriction not stated): 1,332 (4.54% of total unique articles)
Allowed (other restrictions)*: 1,079 (3.6% of total unique articles)
Total: 8,456 (28.83% of total unique articles)
*Other restrictions included funder mandate requirements or the requirement that the author first obtain written permission from the publisher or editor of the journal.
Post-print open access rights. The Author Rights Data reveal that post-print versions (the version after peer review but before final formatting) of the vast majority of articles can be made available on an open access basis. Most (71.71 percent of total unique articles) could be made available either immediately or after an embargo period of six to 24 months. Post-print rights were also subject to several other restrictions, such as funder requirements, and many required prior permission from the publisher or journal. Table 3 outlines the number of articles falling within these restrictions. Only 1,364 articles (4.65 percent of total unique articles) fell under publisher contracts that expressly prohibited archiving of post-print versions. More articles (2,170, or 7.40 percent of total unique articles) had unclear or unknown post-print rights.
Pre-print open access rights. Finally, the Author Rights Data confirm that most journals grant authors open access archival rights for pre-print versions of articles. Table 4 shows the breakdown of the number of pre-print versions of articles that could be archived under existing policies. In total, the Author Rights Data […]
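The results above amount to sorting each article into a rights category and dividing each category's count by the 29,322 unique articles. A minimal sketch follows. The mapping from RoMEO-style archiving values ("can," "cannot," "restricted," "unclear," "unknown") to report categories is a simplification assumed here, not the exact rules used in the study: in particular, it treats any restriction text mentioning an embargo or a number of months as an embargo. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def categorize(archiving: str, restrictions: str) -> str:
    """Map an archiving value plus restriction text to a report category.

    Assumed, simplified rules: restriction text mentioning "embargo" or
    "month" counts as an embargo; any other text counts as other restrictions.
    """
    value = archiving.strip().lower()
    text = restrictions.strip().lower()
    if value == "cannot":
        return "prohibited"
    if value not in ("can", "restricted"):
        return "unclear or unknown"
    if "embargo" in text or "month" in text:
        return "allowed (with embargo)"
    if text:
        return "allowed (other restrictions)"
    if value == "restricted":
        return "allowed (restriction not stated)"
    return "allowed (immediately)"


def percent(count: int, total: int) -> float:
    """Share of total, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


def tally(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, float]]:
    """Count (archiving, restrictions) records per category, with percentages."""
    counts = Counter(categorize(a, r) for a, r in records)
    total = sum(counts.values())
    return {category: (n, percent(n, total)) for category, n in counts.items()}


if __name__ == "__main__":
    sample = [("can", ""), ("restricted", "12 month embargo"),
              ("cannot", ""), ("unknown", "")]
    print(tally(sample))
    # Counts reported in the text, over 29,322 unique articles:
    print(percent(1364, 29322), percent(2170, 29322))  # 4.65 7.4
```

Keeping the classification rules in one small function makes them easy to audit and adjust, which matters given the data limitations noted in the methods.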

DISCUSSION
This brief review of the Author Rights Data reveals that authors retain significant rights in many of the scholarly journal articles that they publish. At least some version (pre-print, post-print, or final published PDF) of nearly 90 percent of articles could, under existing publisher contracts, be posted to an author's personal website or an institutional repository with few or no limitations. Hybrid open access article processing fees, which average $3,000 per article among the major publishers (Springer, n.d.; Elsevier, n.d.; Wiley-Blackwell, n.d.), are typically higher than article processing fees for true gold open access journals (Shieber, 2009b; Cox & Cox, 2008), making the purchase of hybrid OA from non-OA journals a costly proposition. At that average fee, enabling access to the 12,746 articles in this study that were eligible for a hybrid option would cost roughly $38 million. Averaged among the five universities (though in reality the cost would not be split evenly, because some universities account for a larger or smaller share of these articles), the cost would come to approximately $7.6 million per institution.
Covering an additional $7.6 million expenditure, an amount that would, at least initially, be borne on top of the high and rising journal subscription fees that research libraries already pay, would for most libraries fall outside the realm of any realistic budgeting exercise. But consider current journal subscription costs. Association of Research Libraries statistics (2011) reveal that the average serials expenditure among four of the five universities (Stanford is not a member of ARL and does not report statistics) is $8.77 million.
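The cost estimate is simple arithmetic, sketched below so that other institutions could substitute their own article counts and fee assumptions. The $3,000 average fee, the 12,746 eligible articles, and the $8.77 million average serials expenditure come from the text; the even split across five institutions is the same simplification the text makes, and the function names are illustrative only.

```python
def hybrid_oa_cost(eligible_articles: int, fee_per_article: float) -> float:
    """Total cost of buying hybrid OA for every eligible article at a flat fee."""
    return eligible_articles * fee_per_article


def per_institution(total_cost: float, institutions: int) -> float:
    """Naive even split of the total cost across institutions."""
    return total_cost / institutions


if __name__ == "__main__":
    total = hybrid_oa_cost(12_746, 3_000)
    share = per_institution(total, 5)
    print(f"total ${total:,.0f}; per institution ${share:,.0f}")
    # total $38,238,000; per institution $7,647,600
    print(f"{share / 8_770_000:.0%} of average serials spend")  # 87% of average serials spend
```

Because the split is even, an institution that authored a larger share of the eligible articles would face a proportionally larger bill; replacing the even split with per-institution article counts would be a natural refinement.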
The relative size of an aggregated hybrid-OA expenditure is not out of line with current spending, and to the extent that subscription fees can be redirected toward open access fees, large-scale purchase of hybrid open access at the institutional level seems achievable. Indeed, members of the Compact for Open-Access Publishing Equity (COPE, n.d.) already provide institutional funding to cover article processing fees, drawing on a variety of library and university sources. How the transition from subscription fees to article processing fees would occur is a difficult administrative and budgeting question, and one that would undoubtedly play out over a number of years. It is also an issue that would require cooperation among many institutions (similar to the SCOAP3 project, http://scoap3.org/) in order to provide similar levels of access to articles from authors at diverse institutions. Ambitious initiatives like SCOAP3, or an alternative federal funding system (King, 2010), could hasten the transition. In any event, as King (2010) […]

CONCLUSION
Academic authors have long been interested in increasing access to their research (Swan & Brown, 2004), but action on that desire has not materialized in a significant way. Even though authors already retain significant archiving rights in the works that they create, motivating or assisting them to exercise those rights has been a challenge and has led to the development of funder and institutional mandates. Those tools may be necessary if open access to published research is to gain a significantly greater foothold, but they remain controversial and therefore difficult to bring to reality. In the meantime, several options remain for institutions to facilitate faculty-author deposit in a constructive and significant way. Using existing licenses granted by faculty through faculty-adopted open access policies (or by working directly with faculty at institutions where such policies do not yet exist), universities can assist in the deposit process by identifying, obtaining copies of, and ultimately posting articles in which authors have retained self-archiving rights. For institutions with the economic means, pursuing hybrid open access options with publishers that do not already grant archiving rights to authors may be a way to further increase access. Aggregated rights data such as those provided by SHERPA/RoMEO can be combined effectively with citation data from other sources to identify eligible articles based on journals' author archiving rights and fee options. Challenges of ensuring the accuracy and currency of those data will always be present, but they should not prevent institutions from moving forward in fulfilling their goals in this area.