Teaching with ChatGPT
Lessons from my first semester experiments
I’ve been teaching the first semester of Economics 281 since 2011. Although the course content has evolved, as it should, and the presentation and materials have improved – a textbook, thanks to a pandemic – this year introduced a new challenge: natural language processing models like ChatGPT. Or, simply, AI.
As soon as I realised the power of ChatGPT in January, I decided to treat it as an aid rather than an obstacle, and I redesigned the course in two ways. First, I asked students to use ChatGPT to write an essay and then to write a critique of that essay. The critique would be graded. Second, I used ChatGPT to set test questions. The first aimed to encourage students to work with ChatGPT; the second aimed to make me, the lecturer, more productive.
Both were qualified successes, though there were some teething problems. ChatGPT would only spit out 500 words, while the essay required 1000 words. That was easily fixed with a second prompt asking it to continue, but many students did not know this. Also, in the month leading up to the final hand-in, ChatGPT launched a new version, and some eager students had by then already produced their essays. The older version was more rudimentary and therefore contained more ‘errors’ to critique: incorrect citations, for example, or text that was simply too generic. The updated version made fewer errors, which made it far more difficult to offer a sound critique (for a second-year student, anyway). I also realised that students could use one version of ChatGPT to produce the essay and another to write the report, but after reading through the reports, I doubt any student did this.
In general, I think the students enjoyed using ChatGPT in this way. Only two students mentioned ChatGPT in the formal course evaluation, though. One noted that ‘the lecturer was incredibly passionate about what he taught and integrated technology like ChatGPT in a very interesting way’. Another mentioned that ‘engaging with the content by grading an AI essay’ was one of the best aspects of the course.
The second experiment was to use ChatGPT to set a test. I did so for the second test of the year; the first test I set using the usual manual method. For the second, I prompted ChatGPT to ‘write five difficult multiple-choice questions based on the text provided, focusing on the economic intuition’. I then provided ChatGPT with the text for each chapter. (I am the prescribed textbook’s author, so it was easy to do so.) It would then provide me with five questions, and I would choose the one that I thought was most appropriate. I would repeat this exercise for the ten multiple-choice and ten true-or-false questions across all 33 chapters.
I hoped that this would save me time. It did not. I still spent several hours setting the test paper, judging which question was best and tweaking the alternative (wrong) answers. What it did do was offer more creative questions than those I had used before, questions that I probably would not have asked myself.
Did students find it harder or easier to answer the ChatGPT questions? The averages for the two tests are remarkably similar: 58.6% vs 58.8%. The standard deviation, though, is much larger in the second, meaning that some students did really well (including the first student, in all my years of teaching Economics 281, to score 50/50), but others did poorly. The R-squared between the two tests, 0.65, also suggests that the students who did well in the first test did well in the second. The ChatGPT questions, it seems, benefited stronger students.
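The post does not say exactly how the R-squared was computed. A common reading, assumed here, is the R-squared of a simple linear regression of one test’s marks on the other’s, which equals the squared Pearson correlation between them. The sketch below uses made-up marks for five hypothetical students (not the class’s data) to show how two tests can share the same mean while the second has a wider spread and the two remain strongly related.

```python
from statistics import mean, pstdev


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (with intercept).

    For this case it equals the squared Pearson correlation of x and y.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


# Hypothetical marks out of 50 for five students: NOT the actual class data.
test1 = [30, 25, 35, 20, 40]
test2 = [36, 18, 33, 19, 44]

print("means:", mean(test1), mean(test2))
print("std devs:", round(pstdev(test1), 2), round(pstdev(test2), 2))
print("R-squared:", round(r_squared(test1, test2), 3))
```

With these toy numbers both means are 30, the second test’s standard deviation is larger, and the R-squared is about 0.83: the same pattern as in the post, where an unchanged average hides a wider spread among students who nonetheless keep their relative ranking.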
The essay marks, by contrast, were far less correlated with the tests (R-squared of 0.03). It may be that it is simply too difficult to distinguish excellent ‘referee reports’ from mediocre ones. It may also be that such referee reports test different skills, but my sense is that this experiment penalised very strong students and aided weaker ones. Essays are, in general, only weakly correlated with test marks, but in past years the R-squared was closer to 0.3, a far cry from 0.03.
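Assuming each of these R-squared values comes from a simple linear regression with a positive slope, it can be translated back into a correlation coefficient by taking the square root, which some readers may find more intuitive. A minimal worked conversion:

```latex
r = \sqrt{R^2}: \qquad \sqrt{0.65} \approx 0.81, \qquad \sqrt{0.30} \approx 0.55, \qquad \sqrt{0.03} \approx 0.17
```

Put differently, test marks would explain roughly 3% of the variation in this year’s essay marks, against roughly 30% in past years and 65% for one test against the other.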
AI is likely to affect every industry; education is no different. Many others have written about this revolution. I particularly enjoyed Ethan Mollick’s latest piece on the homework apocalypse (this should be compulsory reading for every teacher and lecturer), and an earlier one on the future of education in a world of AI. What is clear is that lecturers, whether they like it or not, simply won’t be able to teach the way they did before. Time to innovate or become irrelevant.
Image created with Midjourney v5.2.