An Assessment of Research Data Services Through Client Interaction Records

Research data services have become a key feature of academic libraries. In this paper, we provide an internal assessment of the consulting reach and effectiveness of the Data Services provided by the University Libraries at Virginia Tech, using client records from 2016 to 2020. Through this assessment, we explore how service growth and reach across Virginia Tech have evolved over time. We also look more closely at these aspects for one college and discuss how we will use these data to assess the impact of our services. Finally, through the lens of client outcomes, we examine trends in client interactions over the study period. Initially, we envisioned a successful service as one useful to the largest number of entities (primarily colleges and institutes) across Virginia Tech. However, analysis of the data we gathered from 2016 to 2020 leads us to consider targeting our service growth where it might be most useful. Rather than prioritizing services that are useful to the largest number of researchers, we could (and perhaps should) prioritize engagement with researchers and research communities for whom our assistance can make the largest positive impact on their research projects. This assessment of our client data demonstrates the utility of detailed client management records for periodic formative and summative assessment of research data services.


INTRODUCTION
Research data services have become a staple of academic libraries in the United States and in other countries. Within the US, the evolution of these services as a whole has been benchmarked at multiple points in the last decade (e.g., Fearon et al., 2013; Tenopir et al., 2015; Tenopir et al., 2019). Surveys have also been conducted with narrower scopes than research data services as a whole; for example, Hudson-Vitale et al. (2017) investigated curation services within Association of Research Libraries institutions, benchmarking those services and identifying their challenges and resource needs. Tenopir et al. (2019) found that larger institutions are more likely to have staff dedicated to research data management services and that research institutions are hiring new staff to establish and expand these services.
As libraries establish new research data services, they often survey research faculty, staff, and students about their research data challenges and needs; see Ogier et al. (2015) and Miller et al. (2017) for early examples of these assessments at Virginia Tech, and Goben and Griffin (2019) for an overarching analysis of data-needs assessments. However, published assessments of how effectively established research data services meet the research data needs of academic institutions are far less common. Coates et al. (2018) provide a useful categorization of assessments that can be used to evaluate established services: A) developmental: assessment to inform choices in an uncertain service environment (e.g., an initial needs assessment); B) formative: assessment to provide feedback on and improve an existing service model; and C) summative: assessment to understand how well a service model has performed against established goals. Coates et al. (2018) provide examples for each of these three categories of service assessment; however, each example focuses on approaches to gathering data for future assessment. We build on the work of Coates et al. (2018) to ask the following question: Given 5 years of data collected on consultations and partnerships, what can we learn about our services?
Our Data Services unit, housed within the University Libraries, provides services to Virginia Tech researchers at multiple points across the research lifecycle: from data management planning before a project begins, through workflow design, data analysis, and visualization, all the way to the "end" of a project with data publication, curation, preservation, and archiving. We have kept records of every researcher interaction since 2016 and are therefore well positioned to conduct a formative assessment of our services. During the assessment period, our unit consisted of a director, three personnel in data management and curation services, and four to six data and informatics consultants in a variety of research areas (i.e., engineering, health sciences, social sciences, and visualization and arts). Together with the environmental science expertise within data management and curation services, these areas are meant to broadly cover the wide variety of research conducted at Virginia Tech. Our two newest data and informatics consultants started in late 2017.
In this paper, we provide an initial internal assessment of the consulting reach and effectiveness of our services from the beginning of 2016 through the end of 2020. We highlight challenges in interpreting our client records and comment on the disposition of such records. Through this assessment, we demonstrate the usefulness of detailed client management records in a formative assessment of research data services and discuss the role that this formative assessment plays in our evolving goals for these services. Initially, we saw a successful service as one useful to the largest number of entities (primarily Colleges and Institutes) across the University. However, analysis of the data gathered from 2016 to 2020 leads us to consider targeting our service growth where it might be most useful. Rather than prioritizing services that are useful to the largest number of researchers, we could (and perhaps should) prioritize engagement with researchers and research communities for whom our assistance can make the largest positive impact on their projects.

CLIENT INTERACTION RECORDS
Our Data Services' client base is primarily Virginia Tech researchers (e.g., students, faculty, and staff at Virginia Tech who produce research data) but may also include members of the broader community in support of the University's global land grant mission. Our services are briefly described earlier in this paper and are described in more detail in Ogier et al. (2018). For the purposes of this assessment, we define a client interaction as "the cumulative set of actions and responses to a particular inquiry or request from a client." An interaction could be tightly scoped, such as a one-reply email answer to a simple inquiry, or it could be a series of actions, meetings, or conversations over several months in support of a single research project or course. For each interaction, we record the client's name, affiliation, and email address (unless the client is a student, in which case we omit the email address because it is considered protected information), the interaction title and a brief description, the affiliation relevant to the interaction, the consultant or consultants working with the client, the dates of service, and how the client initially found Data Services. Figure 1 shows a snapshot of the online input form for initially recording our client interactions. Acknowledging the variety of specialties among our consultants, we impose only these minimal required fields and provide space for further information, such as email exchanges or additional details on the client's requests and the consultant's actions. We collect and store these data to ensure that we provide the best services to our clients (e.g., following up with clients at appropriate times, being aware of previous interactions with other consultants on our team). Although we collect the aforementioned data at the client level, we present only aggregate, nonidentifiable data in this initial service assessment.
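The minimal required fields described above can be pictured as a typed record. The sketch below is a hypothetical Python schema, not the actual form shown in Figure 1: every field name is an illustrative assumption, and the student-email rule is enforced when a record is created.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ClientInteraction:
    """Hypothetical record mirroring the minimal required fields described above."""

    client_name: str
    client_affiliation: str
    is_student: bool
    title: str
    description: str
    interaction_affiliation: str  # affiliation relevant to this interaction
    consultants: List[str]  # one or more consultants working with the client
    start_date: date
    end_date: Optional[date]  # None while the interaction is still open
    how_found: str  # how the client initially found Data Services
    email: Optional[str] = None
    notes: List[str] = field(default_factory=list)  # optional emails and details

    def __post_init__(self) -> None:
        # Student email addresses are treated as protected, so they are never stored.
        if self.is_student:
            self.email = None
```

Keeping the optional free-text material in a separate `notes` field mirrors the split between required fields and the open space for consultant-specific detail.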
The deidentified interaction data that we used to produce the remaining figures are available at Hilal and Petters (2022).
Inconsistency in recorded data is a continuing challenge for effective formative and summative assessment of services. To address these inconsistencies, we cleaned our client interaction data and investigated and corrected erroneous entries prior to performing our assessment. Specifically, we (1) collapsed the previous 10 categories of client reach into the 6 categories listed in Table 1; (2) collapsed the previous 22 categories of outcomes into the 9 categories listed in Table 2; (3) merged two previous categories of interaction depth (collaborations and partnerships) into a single category of long interactions; and (4) converted all data previously visualized by fiscal year (for internal reporting) to be visualized by calendar year.
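The recoding and regrouping steps above can be sketched as a lookup table plus a change of grouping key. This is a minimal illustration under stated assumptions: the legacy and consolidated labels are hypothetical (the real six categories are in Table 1), and the fiscal year is assumed to start on July 1 and be named for the year in which it ends.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical legacy labels mapped onto hypothetical consolidated labels; the real
# six client-reach categories are defined in Table 1 and are not reproduced here.
LEGACY_TO_REACH: Dict[str, str] = {
    "Returning client": "Recurring",
    "Repeat collaborator": "Recurring",
    "Workshop attendee": "Event",
    "Lecture attendee": "Event",
    "Colleague referral": "Referral",
}


def recode(label: str, mapping: Dict[str, str]) -> str:
    # Unmapped labels are flagged instead of dropped so they can be reviewed by hand.
    return mapping.get(label, "UNMAPPED")


def fiscal_year(d: date, start_month: int = 7) -> int:
    # Assumes a fiscal year beginning July 1 and named for the year in which it ends.
    return d.year + 1 if d.month >= start_month else d.year


def counts_by_calendar_year(records: Iterable[Tuple[date, str]]) -> Counter:
    # Regroup by calendar year (d.year) rather than fiscal year, recoding labels on the way.
    return Counter((d.year, recode(label, LEGACY_TO_REACH)) for d, label in records)
```

For example, an interaction that started on 1 September 2019 falls in fiscal year 2020 under this assumption but is counted in calendar year 2019. Regrouping from raw dates, rather than from fiscal-year totals, is what makes the conversion possible at all.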

INTERACTIONS AND CLIENT REACH
We first consider client interaction trends over time (Figure 2), then by College and Institute (Figure 3), and, finally, over time by the method through which clients reached out to us, i.e., "client reach" (Figure 4). Each of these visualizations provides insight into a different aspect of how our services have grown. Note that we capitalize College and Institute throughout this manuscript when describing entities at Virginia Tech. Figure 2 shows a steady increase in client interactions from 2016 to 2018, likely owing to an increase in the number of consultants and an accompanying increase in subject expertise during this period. Our number of client interactions has remained relatively steady since then, including throughout the start of the COVID-19 pandemic in 2020.

Table 2. Categories and short definitions for client interaction outcomes
Web Design/Special Tool: Providing a tool customized to solve a problem, website support
Interactions in Figure 2 are categorized as completed, in progress, on hold, and abandoned. Because on-hold and abandoned projects will not lead to any completed outcomes and are few, we exclude them from further discussion. All subsequent figures show data only from completed and in-progress interactions.
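The status filter can be sketched as two tallies over the same records: one keeping every status (as in Figure 2) and one keeping only completed and in-progress interactions (as in the later figures). The dictionary keys `"year"` and `"status"` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Statuses retained for all figures after Figure 2.
KEEP_STATUSES = {"completed", "in progress"}


def status_counts(records: Iterable[Dict]) -> Counter:
    # Figure 2 style: every status, tallied per calendar year.
    return Counter((r["year"], r["status"].lower()) for r in records)


def retained_per_year(records: Iterable[Dict]) -> Counter:
    # Later figures: only completed and in-progress interactions.
    return Counter(r["year"] for r in records if r["status"].lower() in KEEP_STATUSES)
```

Lowercasing the status before comparison guards against inconsistent capitalization in hand-entered records.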
Virginia Tech has 10 Colleges and 8 research Institutes, as shown in Figure 3. As we look at the total number of our client interactions by College or Institute, we can see that client interactions are not distributed evenly across Virginia Tech. However, we do have a non-trivial number of client interactions in each of the Colleges in the University. Colleges tend to be the largest units in the University in that they have the largest number of member faculty and students. Institutes have a higher proportion of research faculty, but those faculty often have primary appointments through a College. Thus, our clients are more likely to have primary membership in a College than in an Institute.
When we began collecting data in 2016, we assumed that understanding the capacity of our research Data Services to provide assistance across Virginia Tech's research portfolio would be fundamental to gauging the success of our service provision. However, a nuanced consideration of Figure 3 shows that reach, spread, or coverage may be incomplete as an indicator of effective or successful services. How our research Data Services have evolved and what steps we may take to continue to improve them based on these data will be discussed in further detail later in this paper.
Considering how our clients have reached us to obtain assistance, and visualizing how client reach has evolved over time, offers insight into the most valuable entry points into our services. In Figure 4, we see how our client interactions break down across the six client reach categories, as defined in Table 1.
From this view, we can see that the number of recurring clients has increased every year. This is a positive data point because unsatisfied clients would be unlikely to return to us for further assistance. We can also see that the number of clients reaching Data Services through events we deliver was highest in 2019 and 2020. Our Education Coordinator initiated several data training sessions for researchers in 2019, including qualitative data analysis and data science-related workshop modules. Additionally, the COVID-19 pandemic, which started in March 2020, necessitated a quick transition to online-only events. Our client interactions did not significantly decrease from 2019 to 2020, nor did our post-event interactions increase. Under normal conditions, events such as lectures, workshops, and instructional sessions tend to generate new clients. The increase in our recurring interactions from 2019 to 2020, without substantial growth in new clients reaching us through events, suggests that the quality of our services persisted even during the pandemic.
Considered together, Figures 2, 3, and 4 tell an interesting story: although our number of client interactions grew from 2016 to 2018, we are not necessarily seeing even growth across all of the University's Colleges and Institutes. Moreover, despite the stability of the total number of interactions from 2018 to 2020, the increase in interactions from recurring clients from 2019 to 2020 (87 to 124) suggests that our services continue to be useful to an increasing number of our current clients. How and why this may be occurring (and what it means for the future of our services) will be considered later in this paper.
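To put the recurring-client figures in relative terms, the change from 87 to 124 interactions can be expressed as a percentage. The helper below is a generic sketch, not part of the authors' reporting workflow.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


# Recurring-client interactions reported above for 2019 and 2020.
change = percent_change(87, 124)
print(f"{change:.1f}%")  # → 42.5%
```

A growth of roughly 42.5% in recurring interactions, set against a roughly flat total, is what makes the shift toward returning clients notable.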

SERVICE COVERAGE
We may expect to see a correlation between our service's growth in expertise in research areas and our interactions with researchers in those areas within Colleges and Institutes (i.e., our "service coverage"). We investigate this in Figure 5, which shows the number of client interactions by year for each College and Institute.
From this figure, we can see that we have provided relatively stable assistance over time to each of the 10 Colleges since becoming more fully staffed in 2018. However, we cannot take these interaction numbers at face value when thinking about the impact of our services. For the purposes of this discussion, we use the number of faculty in a College as a stand-in for prospective clients; more faculty generally means more students and more research (and research expenditures). Even though our number of client interactions with the College of Engineering (COE) is considerably larger than with the College of Natural Resources and Environment (CNRE), we have, proportionally, assisted a larger share of CNRE's faculty and students.
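The comparison above amounts to normalizing interaction counts by faculty headcount. The sketch below shows that normalization; the inputs are hypothetical numbers chosen only to illustrate how a unit with fewer interactions can still have higher relative reach, and they are not Virginia Tech figures.

```python
from typing import Dict


def interactions_per_100_faculty(
    interactions: Dict[str, int], faculty: Dict[str, int]
) -> Dict[str, float]:
    """Normalize interaction counts by faculty headcount, a proxy for prospective clients.

    Units with no (or zero) faculty count are skipped to avoid dividing by zero.
    """
    return {
        unit: 100 * n / faculty[unit]
        for unit, n in interactions.items()
        if faculty.get(unit)
    }


# Hypothetical numbers for illustration only; these are not Virginia Tech figures.
interactions = {"College A": 120, "College B": 60}
faculty = {"College A": 600, "College B": 150}
print(interactions_per_100_faculty(interactions, faculty))
# → {'College A': 20.0, 'College B': 40.0}
```

Faculty headcount is only one possible denominator; student enrollment or research expenditures could be substituted in the same function if they better reflect prospective demand.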
It may also seem odd that the College of Agriculture and Life Sciences (CALS) and the College of Liberal Arts and Human Sciences (CLAHS) are the Colleges with which we have had the most client interactions, because neither Agriculture nor Humanities is traditionally seen as being as data intensive or data management focused as Engineering or the Sciences. However, our data suggest otherwise. CALS and its associated Cooperative Extension Programs are increasingly pursuing data-intensive research questions. CLAHS has been investing heavily in transdisciplinary digital humanities projects, many of which involve collection, aggregation, or wrangling of textual or geographic data. Because these two areas, Agriculture and Humanities, have not traditionally been associated with collecting and managing complex research data, we think that their faculty and students recognize the need for our services and, more importantly, are willing to ask for our assistance, because their data support needs emerged at roughly the same time as our services were created. This leads to our having more client interactions affiliated with these disciplines than with the more traditionally data-centric disciplines within the Colleges of Science and Engineering.
To look at how our service coverage has changed over time for the Institutes, which generally have fewer researchers than the Colleges, we need to look more closely at smaller numbers of interactions. In doing so (Figure 6), we also see relatively steady consultations by year for each Institute. Institutes at Virginia Tech tend to have fewer primarily affiliated faculty, postdocs, and students, which may explain the low numbers of client interactions. In addition, most of the Institutes are primarily funded by grants and industry partnerships, which may enable them to hire postdocs with specific skills and expertise (rather than asking for our assistance). These industry partnerships also impose stricter restrictions on who can see and handle the data. It is important to note that the numbers of our interactions with each Institute each year are small, and we should be careful not to overinterpret the data shown in this figure.
We can also use these data at a finer level of granularity to see which academic departments and programs we have interacted with within each College. Figure 7 shows the number of interactions we have had with programs and departments within CALS. In this figure, we see that we have had over 10 interactions with 10 of the 13 academic departments and programs within CALS. Because one of our Data Services team members has an affiliation with the Biochemistry (BCHM) department, our 101 interactions with that department far exceed those with any other department. Of the remaining three departments and programs, Agricultural Technology (AGTECH) primarily provides a 2-year undergraduate program that would not be expected to generate much research. Another, Horticulture (HORT), was subsumed under the newer school of Plant and Environmental Sciences, along with Crop, Soil and Environmental Sciences (CSES) and Plant Pathology and Weed Science (PPWS). For Dairy Science (DASC), further investigation into the four affiliated interactions reveals that three were in 2017 and the fourth was a request for a service that we did not provide. This suggests that we may need to do further research and analysis on the DASC department to see whether it has needs with which we could assist. Given that the other departments (other than HORT and PPWS) all have 10 or more client interactions, either we may not be serving DASC to the best of our ability or its needs may fall outside the scope of our services. Either conclusion would help us better understand our services. Note that we can do similar analyses for each of the 10 Colleges at Virginia Tech.
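A department-level screen like the one applied to CALS can be sketched as a threshold filter. The cutoff of 10 interactions comes from the discussion above; the department names and counts in the example are hypothetical.

```python
from typing import Dict, List


def flag_low_coverage(counts: Dict[str, int], threshold: int = 10) -> List[str]:
    """Return units with fewer than `threshold` interactions, fewest first.

    A flagged unit is a prompt for follow-up, not a verdict: it may be underserved,
    or its needs may fall outside the scope of Data Services.
    """
    low = [unit for unit, n in counts.items() if n < threshold]
    return sorted(low, key=lambda unit: (counts[unit], unit))


# Hypothetical department counts, for illustration only.
example = {"Dept A": 101, "Dept B": 12, "Dept C": 4, "Dept D": 0}
print(flag_low_coverage(example))  # → ['Dept D', 'Dept C']
```

As the AGTECH and HORT cases show, each flagged unit still needs a manual look at context (program type, reorganizations, request history) before any conclusion is drawn.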
Service coverage across the University is an important factor in assessing our services. However, as we have discussed, optimizing service coverage depends on both environment and context. Although we track and value this type of data about our services, coverage is not the only factor in assessing their success. We will consider the value of service coverage to the larger assessment of services in the conclusion section later in this paper.

OUTCOMES
Outcomes of a client interaction are a high-level metric for how our services positively affect a particular research project. The outcomes of a client interaction may take a variety of forms (see Table 2 for definitions), and one client interaction can have multiple outcomes. Figure 8 shows, by calendar year, the distribution of our outcome types, i.e., what the client received during their interaction with Data Services. Guidance, Dataset Operation, and Visualization are our largest outcome categories every year, and the relative proportions of each kind of outcome are reasonably consistent from year to year. Guidance is a catch-all category for short- to medium-length interactions in which we provide basic assistance or advice on data-focused research. Dataset Operation covers interactions in which our expert staff or our student consultants perform an operation on a dataset, such as discovery, publication, aggregation, extraction, or cleaning. Visualization is also a mature service and can include assistance with visualizing research data for analysis, editing publication graphics for clarity, or helping prepare research data for presentation visualizations. We see a marked increase in our grant development outcomes from 2018 to 2019 and 2020, and this increase is primarily associated with our increased support for, and collaboration on, funded research proposals, both as co-investigators and as project staff. If we focus on the outcomes that returning researchers were looking for (i.e., filter by client reach "Recurring"), we find relative proportions of each kind of outcome by calendar year similar to those in Figure 8, but with necessarily smaller values. As our services continue to mature, these data can be used to track the development of our service portfolio.

Figure 8. Data Services client interactions by calendar year for each of the nine client outcomes, as defined in Table 2.
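Because one interaction can have several outcomes, outcome totals can exceed interaction totals, and tallies must iterate over each interaction's outcome list. The sketch below does this and supports the "Recurring" filter mentioned above; the dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def outcome_counts(interactions: Iterable[Dict], reach: Optional[str] = None) -> Counter:
    """Tally (year, outcome) pairs; one interaction may contribute several outcomes.

    interactions: dicts with hypothetical keys "year", "reach", and "outcomes".
    reach: optional client-reach filter such as "Recurring".
    """
    counts: Counter = Counter()
    for item in interactions:
        if reach is not None and item["reach"] != reach:
            continue
        # set() keeps a label listed twice on the same record from being counted twice.
        for outcome in set(item["outcomes"]):
            counts[(item["year"], outcome)] += 1
    return counts
```

Comparing the proportions from `outcome_counts(data)` with those from `outcome_counts(data, reach="Recurring")` is one way to run the check described in the text.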

SHORT INTERACTIONS VERSUS LONG INTERACTIONS
Another angle from which we can assess our services is interaction length and the ratio of short to long interactions. Short (and relatively shallow) interactions can be an email conversation or one initial meeting with an optional follow-up. A long (and relatively deep) interaction allows us to assist researchers in significant and impactful ways and often leads to project or grant collaborations in which we offer original thought or unique expertise. However, each of these long interactions substantially reduces the number of research projects and researchers that we can work with (i.e., it increases depth of service while reducing breadth). Conversely, seeking only short interactions allows us to increase the breadth of our services while reducing their depth.
Because research data services are relatively new units within research libraries, it is difficult to define what the ratio of short to long interactions should be. However, we can look empirically at this ratio for our services and how it has evolved with time. For this purpose, client interactions shorter than 180 days are defined as "Short," and client interactions lasting 180 days or longer are defined as "Long." This demarcation at 180 days is artificial; we have deep client interactions that are shorter than 180 days and relatively shallow interactions that are longer than 180 days (e.g., when a client is unresponsive for a while). Although adjusting this threshold changes the counts in each category somewhat, the overall picture and the ratio of short to long interactions remain relatively unchanged. Projects that did not have an end date recorded were excluded. Table 3 shows the number of short and long interactions and their computed ratio by calendar year. We see that this ratio, since 2018 (when Data Services became fully staffed), averages 4.3 short interactions to 1 long interaction. This ratio can serve as a benchmark for research data services as they seek to determine an appropriate ratio of short to long interactions. We also recognize that, as a team, we are constrained by our time and resources; as noted earlier, every time we devote more time and attention to longer interactions, we reduce the number of other short and long interactions we can take on. The key is to find an appropriate balance between short and long interactions.
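The short/long classification described above can be sketched as follows. The 180-day cutoff and the exclusion of records without an end date come from the text; assigning each interaction to the calendar year in which it started is an assumption of this sketch, since the year attribution used for Table 3 is not stated.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def short_long_by_year(
    records: Iterable[Tuple[date, Optional[date]]], threshold_days: int = 180
) -> Dict[int, Tuple[int, int, float]]:
    """Return {year: (short, long, short/long)} for interactions with an end date.

    Interactions shorter than `threshold_days` are short; all others are long.
    Records without an end date are skipped. Each interaction is assigned to the
    calendar year in which it started (an assumption of this sketch).
    """
    tallies: Dict[int, list] = defaultdict(lambda: [0, 0])
    for start, end in records:
        if end is None:
            continue
        is_long = (end - start).days >= threshold_days
        tallies[start.year][1 if is_long else 0] += 1
    result = {}
    for year in sorted(tallies):
        short, long_ = tallies[year]
        ratio = short / long_ if long_ else float("inf")  # no long interactions that year
        result[year] = (short, long_, ratio)
    return result
```

Re-running with, say, `threshold_days=150` or `threshold_days=210` is a quick way to check how sensitive the ratio is to the cutoff. Averaging yearly ratios and dividing pooled multi-year totals can give different answers, so the method should be stated when such a ratio is offered as a benchmark.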

REFLECTIONS ON CLIENT INTERACTION DATA
Considering these data from 2016 to 2020 helps us identify a few important trends across our services and challenges a few of our assumptions about service success. For the first time since our inception, our number of interactions was relatively stable from 2018 to 2020, which suggests that, in spite of the pandemic and lockdown of early 2020, an average of around 425 interactions per year may be a good benchmark for year-to-year assessment at our current staffing levels. The increasing number of longer, in-depth interactions, paired with the increase in recurring interactions, may indicate a pandemic-fueled trend of researchers concentrating on continuing longer-term research with known partners, as discussed in the Interactions and Client Reach section. Tracking this trend will be important as we decide whether to focus our service resources on established, longer-term, recurring projects or on shorter-term projects that expose us to new areas and opportunities within the University.
The earlier analysis of coverage across the Colleges and Institutes forces us to evaluate how we measure our success. Although equal service coverage across Colleges and Institutes may be a good measure of success for developing services, the relative stability of our interaction counts and of the Colleges and Institutes served in 2019 and 2020 suggests that other success measures may be more important. Given infinite time and resources, we have no doubt that we could find ways to even out our client interactions across all Colleges and Institutes. However, equal coverage across a university as large as Virginia Tech is neither possible nor practical given our resource constraints. Rather than pushing resources to engage with large and well-funded departments and Institutes, we can instead prioritize projects and partnerships in which our knowledge, experience, and services can provide unique outcomes and make the most positive impact on a project's success. We communicate this impact to Library and University leadership in our annual reports through both the client interaction data presented here and longer-form narratives about our involvement in successful projects, grants, and partnerships across the University. We work closely with our Library's communications team to ensure that successful projects are featured in the library magazine and disseminated through the Virginia Tech daily news email. Together, our data and these narratives demonstrate our importance and contributions to the Virginia Tech research enterprise.

DE-IDENTIFICATION/RETENTION OF CLIENT INTERACTION DATA
In the course of this assessment, we have been mindful that libraries and librarians have long held strong views on patron privacy and that new concerns are being raised about the erosion of patron privacy by newly available digital tools (e.g., Jones et al., 2020). Through this lens, the amount of data we collect about our clients may be seen as excessive. We argue that the research data services we provide differ substantially from providing access to library resources and thus necessitate more client data collection. Tracking client interactions is integral to ensuring that our clients' inquiries and requests are satisfied.
However, one can rightfully argue that, once a client's inquiry or request has been satisfied, we should at least deidentify the corresponding client interaction records. Such deidentification would still allow aggregate data to be used in formative and summative assessments while reducing the potential harm from a breach of confidentiality. Additionally, we should set a retention schedule that determines how long client interaction data are retained. These are current topics of discussion for the University Libraries at Virginia Tech.
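The two measures discussed above, deidentifying closed records and applying a retention schedule, can be sketched in a few lines. Everything here is hypothetical: the field names, the choice of which fields count as identifying, and the retention period are assumptions for illustration, not Virginia Tech policy.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical field names; free-text fields are dropped too because they can
# contain names, project details, or other identifying information.
DIRECT_IDENTIFIERS = {"client_name", "email", "title", "description", "notes"}


def deidentify(record: Dict) -> Dict:
    """Keep only fields needed for aggregate assessment."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def past_retention(record: Dict, today: date, retention_days: int) -> bool:
    """True if an interaction closed more than `retention_days` before `today`."""
    end = record.get("end_date")
    return end is not None and today - end > timedelta(days=retention_days)
```

Removing direct identifiers does not by itself rule out re-identification: when a small department has only one or two interactions in a year, affiliation and date alone may point to a person, so aggregate releases may also need small counts suppressed or coarsened.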

FUTURE RESEARCH DATA SERVICES ASSESSMENT
During this assessment, our Data Services has migrated tracking of client interactions to another platform that is used University Libraries-wide. The use of a Libraries-wide system University. This change in our value proposition will necessitate a change in our outreach strategy wherein we concentrate outreach efforts on the Colleges in which we have seen the most successful partnerships rather than the Colleges and Institutes with the fewest. Future analyses of client interaction data will help us further refine and target these services.