The Utility of Generative Artificial Intelligence in Rating Interpreters’ Accuracy: A Case Study of ChatGPT-4

This chapter is part of: Chapelle, C. A., Beckett, G. H., & Ranalli, J. (Eds.). (2024). Exploring artificial intelligence in applied linguistics. Iowa State University Digital Press. https://doi.org/10.31274/isudp.2024.154.

Description
The assessment of interpreting has been criticized for its subjectivity and variability. Although employing multiple raters can enhance rating accuracy, it is impractical in everyday assessment situations. Analytical/rubric rating has been suggested to improve rating reliability and accuracy, but this can be time-consuming and labor-intensive. The generative AI (GenAI) ChatGPT-4 offers the potential to understand rubrics and facilitate the analytical rating process. This exploratory study aimed to investigate the affordances of ChatGPT-4 in interpreting accuracy assessment. A total of 36 interpreting transcripts were rated on their accuracy with a rating rubric. The mean scores generated by the GenAI were then compared with those assigned by three seasoned interpreting instructors to evaluate the rating quality of GenAI, using human ratings as a benchmark. The study also sought to investigate the distinguishing features of AI and human raters’ assessment decisions, focusing on their ability to rate interpreting trainees at varying proficiency levels and their performance on three topics, employing one-way ANOVA for analysis. Findings revealed that ChatGPT-4 understood interpreting tasks and rubrics well, demonstrating a rating pattern moderately similar to human raters. It generally assigned high scores to more proficient trainees and maintained consistency in the three topics. However, the reliability of individual ratings of ChatGPT-4 was rather low, and it exhibited difficulty in differentiating between intermediate and novice trainees.
  • Details
    Published: July 31, 2024
    Published By: Iowa State University Digital Press
    Pages: 14
    DOI: 10.31274/isudp.2024.154.05
    License Information
    ©2024 the authors. Published under a CC BY license.
    Citation
    Jia, Y., & Aryadoust, V. (2024). The utility of generative Artificial Intelligence in rating interpreters’ accuracy: A case study of ChatGPT-4. In C. A. Chapelle, G. H. Beckett, & J. Ranalli (Eds.), Exploring artificial intelligence in applied linguistics (pp. 59-72). Iowa State University Digital Press. https://doi.org/10.31274/isudp.2024.154.05