A Comparative Analysis of Multiple-Choice Questions: ChatGPT-Generated Items vs. Human-Developed Items
This chapter is part of: Chapelle, C. A., Beckett, G. H., & Ranalli, J. (Eds.). (2024). Exploring artificial intelligence in applied linguistics. Iowa State University Digital Press. https://doi.org/10.31274/isudp.2024.154.
Description
This study explores the potential of incorporating ChatGPT (GPT-3.5) to improve test development efficiency. It evaluates the quality of ChatGPT-generated multiple-choice questions (MCQs) compared to those written by human test developers. Additionally, the study seeks to identify the general characteristics of ChatGPT-generated items. A total of 80 items, 40 from ChatGPT and 40 from human writers, were developed from 20 authentic Korean passages. The quality of the items was evaluated by three raters on a five-point Likert scale against a rubric of eight criteria. Both quantitative and qualitative methods were employed, incorporating an analysis of rating scores and the raters' written comments. The results indicate an overall comparability between ChatGPT-generated items and those created by human writers. However, ChatGPT's ability to create plausible distractors was significantly limited. These findings underscore the importance of human judgment, particularly in the creation of effective distractors, to fully leverage ChatGPT in the test development process.
Details
Published: July 31, 2024
Published By: Iowa State University Digital Press
Pages: 19
DOI: 10.31274/isudp.2024.154.08
License Information: ©2024 The authors. Published under a CC BY license.
Citation: Chun, J. Y., & Barley, N. (2024). A comparative analysis of multiple-choice questions: ChatGPT-generated items vs. human-developed items. In C. A. Chapelle, G. H. Beckett, & J. Ranalli (Eds.), Exploring artificial intelligence in applied linguistics (pp. 118–136). Iowa State University Digital Press. https://doi.org/10.31274/isudp.2024.154.08