What About Essays?

I’ve been complaining for some time that education, particularly higher education, is long overdue for automation. It’s obvious (at least to me) that large lecture classes could easily be replaced with online lecture classes and become available to tens of thousands of students rather than hundreds. But what about grading essays?

Sure, an automaton can figure out if a student has done a math or science problem by reading symbols and ticking off a checklist, writing instructors say. But can a machine that cannot draw out meaning, and cares nothing for creativity or truth, really match the work of a human reader?

In the quantitative sense: yes, according to a study released Wednesday by researchers at the University of Akron. The study, funded by the William and Flora Hewlett Foundation, compared the software-generated ratings given to more than 22,000 short essays, written by students in junior high schools and high school sophomores, to the ratings given to the same essays by trained human readers.

The differences, across a number of different brands of automated essay scoring software (AES) and essay types, were minute. “The results demonstrated that over all, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items,” the Akron researchers write, “with equal performance for both source-based and traditional writing genre.”

“In terms of being able to replicate the mean [ratings] and standard deviation of human readers, the automated scoring engines did remarkably well,” Mark D. Shermis, the dean of the college of education at Akron and the study’s lead author, said in an interview.

Hat tip: Tyler Cowen

Most of the classes I took in college were lectures with 30 to 500 students that today could easily be replaced by online lectures. What couldn’t have been replaced? Labs and small discussion classes of 8 to 15 students. That’s about it. I found the TA sections of large lecture classes completely useless. However, I’m sure the students in the sections that I taught found them invaluable. Maybe other people had different experiences.

  • Icepick

    I was a TA for some of the largest classes we had at UF, which would be some of the largest classes at any US university. My experience was this: The best students didn’t really need to be there most of the time – they picked it up either in the primary lecture or from their own efforts. (These were college algebra and calculus classes.) Office hours could help them just as well if/when they required help. The worst students weren’t helped by the sections either, because they were simply terrible for whatever reason.* It was the 60-80% in the middle that I stood any chance of being able to help.

    And that often depended on the primary lecturer. I knew students who stopped going to one lecturer’s sessions and would slip into the other primary lecturer’s section. I had students who stopped going to the primary and only came to my TA sessions because they found the lecturer boring and my sessions helpful. (It wasn’t always the same lecturer, and in classes of this size it is always easy to find someone who can’t stand you and someone else who thinks you’re the bee’s knees.) There was no point in sucking up to a TA, lowliest of the low, so that wasn’t it. Some occasionally brought friends assigned to other sections. Woo-hoo, validation!

    OTOH, I also once got two student reviews that said exactly the same thing: “WORST TEACHER EVER.” That was it, nothing else. I was actually happy with those assessments, as I knew exactly who had written them and why: two sorority pledges who had been upset that _I_ made them take a test on an evening for which a major pledge event had been scheduled. These tests required using rooms all over the campus to accommodate the 1500 to 2000 students taking the class, so they were actually scheduled up to three years in advance. And the instructor in charge was adamant that NO ONE would be allowed to take a make-up test or otherwise get special treatment. One test score per semester would be dropped to accommodate the unavoidable. In all the years she had been in charge of that course, probably over 100,000 students, she allowed only one student to take a make-up test. His excuse for missing the test? He had been DEAD. No joke! He got hit by lightning while riding his bike to take the test. He was able to substantiate his claim that he had been clinically dead, so he got to take a make-up test when he recovered! All the TAs told that story to their students so they’d know NOT TO ASK. So the two little sorority chickees had no chance with me, and they spent the rest of the semester glowering at me as best they could.

    We’d always get together after the evaluations were released to compare who had the most entertaining reviews, and the bad ones were almost always the best. I did get a couple of fun ones saying I was the most fun math TA because of the stories I had told about going to Ozzy concerts before class.

    * One of my favorite students was also one of my worst. He simply didn’t want to be there, but his parents were making him go to UF. He either needed to be in a dedicated art school or needed a few years of seasoning. He also wrote one of the all-time great equations in mathematics. In words: “MATH = the sq. rt of ALL EVIL”! He came to office hours a lot to hang out with me and one of my office mates. Cool guy, terrible student, and I hope he got his life figured out.

    Also, at the end of each semester I’d discover I had at least one student with an almost perfect score in the class who had never asked a question, come to office hours or made their presence known in any other way. In classes that large the best students don’t even leave behind an impression.

  • Junior high school and high school essays are largely exercises in vomiting up accepted themes with an emphasis on correct grammar and structure.

  • Icepick

    Junior high school and high school essays are largely exercises in vomiting up accepted themes with an emphasis on correct grammar and structure.

    You make it sound like that’s a bad thing!

  • Sam

    I welcome something that would make the grading of essays a little more objective. I had a very tough English teacher in senior year who gave me “C”s on everything up until the midterm. The midterm exam was board-wide and graded by a random teacher. I got an A on that, and then “A”s from my teacher for the rest of the year. I can tell you for certain that the quality of my work didn’t change radically at the midterm.

  • I do think that the content of essays can change when a student is exposed to more studies and disciplines. That’s where the creativity comes in, and I can’t see machines doing that.

    As for lecturers, I did have an engaging one in the humanities who enticed me to take every course he taught that I could cram into my schedule at that institution.
