Bring Back the Blue-Book Exam
Posted 5 months ago · Active 4 months ago
chronicle.com
Key topics: Education, Assessment, AI, Cheating
The article 'Bring Back the Blue-Book Exam' argues for a return to traditional handwritten exams, sparking debate among commenters about the effectiveness and relevance of this approach in the age of AI.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1h after posting
Peak period: 34 comments in 2-4h
Avg / period: 10.1 comments
Comment distribution: 121 data points (based on 121 loaded comments)
Key moments
- Story posted: Aug 24, 2025 at 1:44 PM EDT (5 months ago)
- First comment: Aug 24, 2025 at 2:47 PM EDT (1h after posting)
- Peak activity: 34 comments in 2-4h, the hottest window of the conversation
- Latest activity: Aug 25, 2025 at 3:22 PM EDT (4 months ago)
ID: 45006147 · Type: story · Last synced: 11/20/2025, 4:44:33 PM
When I took physics we had weekly 3.5 hour lab sections. That should be enough for most CS assignments.
Why not open-book + AI exams, since that's what students will have in their careers?
We know, because we taught computers how to do both. The first long multiplication algorithm was written for the Colossus about 10 minutes after they got it working.
The first computer algebra system that could manage variable substitution had to wait for Lisp to be invented 10 years later.
https://www.sigcis.org/files/Haigh%20-%20Colossus%20and%20th...
The limitation seems to have been physical rather than logical.
Tools allow traversal of poorly understood, but recognized, subskills in a way that will make one effective in their job. An understanding of the entire stack of knowledge for every skill needed is an academic requirement born out of a lack of real world employment experience. For example, I don't need to know how LLMs work to use them effectively in my job or hobby.
We should stop spending so much time teaching kids crap that will ONLY satisfy tests and teachers but has a much reduced usefulness once they leave school.
I never need to "fall back" to the principles of multiplication. Multiplying by the 1s column, then the 10s, then the 100s feels more like a mental math trick (like the digits of multiples of 9 adding to 9) than a real foundational concept.
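(For concreteness, a minimal Python sketch of the place-value multiplication being described; the function name is made up for illustration:)

```python
def long_multiply(a: int, b: int) -> int:
    """Long multiplication: multiply a by each digit of b (the 1s column,
    then the 10s, then the 100s, ...) and sum the shifted partial products."""
    total = 0
    place = 1  # 1, 10, 100, ... the place value of the current digit of b
    while b > 0:
        digit = b % 10
        total += a * digit * place  # partial product for this column
        b //= 10
        place *= 10
    return total

assert long_multiply(123, 456) == 123 * 456
```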
Thank god we still teach quadratic equations, complex numbers, hyperbolic trig functions, and geometric constructions though. I don't know what would become of the world if most people didn't understand those things when we set them loose in the world.
Oxford and Cambridge have a "tutorial" system that is a lot closer to what I would choose in an ideal world. You write an essay at home, over the course of a week, but then you have to read it to your professor, one on one, and they interrupt you as you go, asking clarifying questions, giving suggestions, etc. (This at least is how it worked for history tutorials when I was a visiting student at an Oxford college back in 2004-5 - not sure if it's still like that). It was by far the best education I ever had because you could get realtime expert feedback on your writing in an iterative process. And it is basically AI proof, because the moment they start getting quizzed on their thinking behind a sentence or claim in an essay, anyone who used ChatGPT to write it for them will be outed.
If they are trade schools, yes teach React and Node using LLMs (or whatever the enabling tools of the day are) and get on with it.
And the library, and inter-library loan (in my case), and talking to a professor with a draft...
And it did teach and evaluate skills I’ve used my entire career.
Faking intelligence with AI only works in an online-exclusive modality, and there’s a lot of real world circumstances where being able to speak, reason, and interpret on the fly without resorting to a handheld teleprompter is necessary if you want to be viewed positively. I think a lot of people are going to be enraged when they discover that dependency on AI is unattractive once AI is universally accessible. “But I benefited from that advantage! How dare they hold that against me!”
I get the same "you won't always have a calculator with you" vibes from 90s teachers chiding you to show your work when I hear people say stuff like this.
Plus it all comes down to the capability to actually retain whatever you ask from the model...
It's more likely I will not have paper and writing implements than not having a calculator.
Besides, most people have room for fast arithmetic or integrals; fast arithmetic would be more useful, but I'm not putting the time in to get it back.
Challenge accepted. One possible solution: https://github.com/RonSijm/ButtFish
Because I was demonstrating that I understood the material intrinsically, not just knew how to use tools to answer it.
Making them open book + AI would just mean you need “larger” questions to be as effective a test, so you’re adding work for the graders for basically no reason.
For that, the student must have internalized certain concepts, ideas, connections. This is what has to be tested in a connectivity-free environment.
And, as someone who got paid minimum wage to proctor tests in college, I couldn't keep a straight face at this:
> The most cutting-edge educational technology of the future might very well be a stripped-down computer lab, located in a welcoming campus library, where students can complete assignments, by hand or on machines free of access to AI and the internet, in the presence of caring human proctors.
I think the author's leaning heavily on vibes to do the convincing here.
I have no idea what you're trying to express in your comment, so who's using vibes?*
Were you triggered by the word "caring?" A waiter usually cares that the people they're serving have an enjoyable meal. It doesn't mean that they love them, it means that they think the work of feeding people is purposeful and honest (and theirs.)
-----
[*] It's certainly not in the words; I don't know what made you angry about "joy," I don't know why you think the author does not teach writing skills in "communications," I don't know why the fact that you went through multiple drafts in writing school papers is relevant or different than anyone else's experience. Maybe that's over now. Maybe I actually don't care if you use AI for your second and further drafts, if I know you can write a first draft.
Drafting and redrafting a cumulative course paper, as well as iteratively honing a thesis, is a writing skill.
I would argue it is as important as demonstrating recall and interconnection of the material. It is being lost if long-term work on papers is being replaced with 3-hour blue-book essays.
That is why I thought it was relevant. That's it.
I guess what I'm asking is, how did AI shift the status quo for in class exams over, say, Safari?
A common mode I have seen is phone in lap, front-facing camera ingests an exam page hung over the edge of the desk. Student then flips the page and looks down for the answer.
Large lectures have hundreds of students, and to properly proctor an exam for one of these classes, one needs dozens of proctors.
It can be done, but observe: for most faculty, maintaining the integrity of their course does not aid them on the path to tenure/advancement.
Administrations won't allow it because they just don't care enough. It's a pain dealing with complaining parents and students.
In any case, cheating has existed since forever. There is nothing much new about cheating on in-class exams with AI compared to before it existed.
AI is transformative here, in toto, in the total effect on cheating, because it's the first time you can feasibly "transfer" the question in with a couple muscle-memory taps. I'm no expert, but I assume there's a substantive difference between IDing someone doing 200 thumb taps for a ~40-word question versus 2.
(part I was missing personally was that they can easily have a second phone. same principle as my little bathroom-break-cheatsheet in 2005 - can't find what's undeclared, they're going to be averse to patting kids down)
Feels like when /r/nba got too "joke-y" for me to participate. Asking a question without performative vulnerability that sounds fake to most will sound "aggressive" or "putting down" to most in a generic social environment. (i.e. one where loudness / randomness / 'being polite', i.e. not asking questions or inducing cognitive load rule)
Thanks for noticing and saying something. Ngl it made me feel bad like I did something wrong.
Currently, it's been a place for acquiring skills but also a sorting mechanism for us to know who the "best" are... I think we've put too much focus on the sorting mechanism aspect, enticing many to cheat without thinking about the fact that in doing so they shortchange themselves of actual skills.
I feel like some of the language here ("securing assessments in response to AI") really feels like they're worried more about sorting than the fact that the kids won't be developing critical thinking skills if they skip that step.
Maybe we can have
When I started out (and the original Van Halen was still together), blue book exams were the norm in humanities classes. I've had narrow experience with American undergrad classes the past 25 years, so I don't have a feeling for how things have evolved.
Why replace a system that generally works well with one that introduces additional potential problems?
Online instruction / learning can work for some people, and that's good.
I don't understand how anyone ever thought that an online exam could be made secure. There's just no way to ensure that the person who registered for the course is the one taking the exam when you don't control anything about the hardware or location, when students have a wide variety of hardware that you must support, and any attempt at remote video monitoring of the exam immediately runs into scalability and privacy issues. Like, even if you're watching a video of the person taking the online exam, how do they prove that they didn't just hook up an extra keyboard, mouse and (mirrored) monitor for person #2 to take the exam for them while they do their best to type and/or mouse in a convincing way?
It also doesn't help that you periodically get students who will try to wheedle, whinge, and weasel their way into an online exam, but then bomb the in-person exam (it's so strange and totally unrelated that they really, really wanted to take an online exam instead of in-person!).
Ok, I'll stop ranting now :)
There's a whole system for this, it already works very well if people actually wanted to make online exams work. Of course it's not "social distancing" so it didn't help with covid.
So here in WA, USA there's definitely a system for this. If you don't mind sharing - where do you live? Have you used this proctoring-for-other-colleges system? How do you know about it?
Both the University I attended and the ones near my home town proctored tests for the public. I actually used that service very heavily in high school to test out of ~50 hours of gened classes.
The gist of it is that I think someone willing to put a lot of work in could probably cheat using the strategies you suggest, but it would be a pain. During your checkin you have to send a selfie, photos of the front and back of your photo ID, and four photos of the space you've prepared. You can't have any writing tools, written materials, or anything else that looks like a computer or screen in the area, and the machine you're on has to be single-screen. If the pre-test greeter or the proctor aren't satisfied with what they see they can ask you (via text and voice chat) to show them the room in real time via your webcam and may ask you to make changes or move things around to provide evidence that something in the room is not hiding mechanisms used to cheat. From that point on, your webcam and mic are on and live streaming to the proctor for the duration of the test; they don't say anything about assistive technologies on their end but I assume they are using eye tracking to look for instances of eyes wandering offscreen for a protracted period of time. The test environment software effectively "takes over your PC" during the test and I would imagine is pretty effective at detecting alternate/multiple display outputs etc.
There are probably scalability issues, but privacy is not an issue from the perspective of the proctor - you are effectively surrendering it by agreeing to take the test online, you could have gone to a test center instead.
FWIW the 'privacy' concern is often voiced by students who find the test proctoring intrusive. On the one hand I agree it's intrusive but on the other hand it seems reasonable for the short-ish amount of time (several hours) that they'll be taking the test.
I'm guessing that some fraction of the students complaining about privacy genuinely object to the privacy issues, some folks may object to having to pay a fee for the oversight, and some fraction are objecting in an attempt to get the oversight removed so they can cheat. I'm sure there's overlap, and other reasons that I haven't thought of.
The scalability problem(s) comes from the need to have a human watching each student take the test. The more people each watcher needs to watch, the less effective they'll be, but the fewer students each watcher covers, the more expensive it gets. Especially since a good fraction of the students won't cheat (so you'll be paying people to watch students not cheat for several hours).
The requirement is that you remain in place and keep your face on camera for the duration of the test, but depending on the test there's an "unscheduled break" functionality that will let you take an unsupervised break, at the cost of locking you out of all of the questions you have already seen on the test so far.
Pearson OnVUE: https://www.pearsonvue.com/us/en/test-takers/onvue-online-pr...
Microsoft-specific documentation of the online testing process: https://learn.microsoft.com/en-us/credentials/certifications...
That being said, the whole experience had an impact on my generally optimistic view of human nature.
Our interview usually starts with them breathlessly reading from a script out of the corner of their eye. I'm ok with notes to make sure you hit some high points about yourself even in person. Nervousness shouldn't disqualify a talented person. But with the coding part I've gotten exasperated and started asking these senior candidates to share their screen and do a fizz buzz exercise live in a text editor in the first few minutes. If they struggle I politely end the interview at the 15-minute mark.
One candidate cheated and it was interesting to watch. In the time between my sending the message in Zoom and their sharing their screen, just a few seconds, they had either queried it or LLM-ed it on their phone or another computer, or had someone off screen or in the same room listening and sharing the answer on another monitor, or something else. Whatever it was, they turned their head slightly to the side, squinted a bit, and typed the answer in Java. A few syncopated characters at a time. When asked what modulo was they didn't know and couldn't make any changes to it. It was wacky. In retrospect I think it was them reading the question out loud to an LLM.
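(For anyone who hasn't met it: fizz buzz is a trivial screening exercise, and the modulo operator is its whole trick. A minimal version in Python rather than the candidate's Java:)

```python
# Classic FizzBuzz: print 1..100, substituting Fizz/Buzz/FizzBuzz
# for multiples of 3, 5, and both. The % (modulo) operator returns
# the remainder of a division, so n % 3 == 0 means "n is divisible by 3".
for n in range(1, 101):
    if n % 15 == 0:
        print("FizzBuzz")
    elif n % 3 == 0:
        print("Fizz")
    elif n % 5 == 0:
        print("Buzz")
    else:
        print(n)
```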
I'm waiting for the candidate who has someone behind them with the same shirt on pretending to be their arms.
These are the absolute worst.
You're taking people out of their comfort zone (highly customized IDE like JetBrains / VSCode / Vim), which causes them to lose shortcuts and decently working intellisense. Yes, my TypeScript in my projects is configured in such a way that I get way more information from the compiler than the standard config. After all, you're testing my ability as a software engineer, not a code monkey, right?
In this very uncomfortable place there is no way of asking questions. Yes, sometimes stuff is ambiguous. I'd rather have someone who asks questions than someone who guesses and gets it right.
The testing setup is horrible too. No feedback as to what part of the tests fail, just... fail.
No debugger. No way of adding log messages. When was the last time you've been in that situation at your workplace?
All under the pressure of time, and additional stress from the person that they really NEED a new job.
Oh, and when you use compiled languages, they're way slower than say TypeScript due to the compilation phase.
And then even when your score (comprised of x passed tests and y failed tests) is of passing grade there is a manager out there looking at how many times someone tabbed outside of the window/tab?
Where am I supposed to look up stuff? Do you know all of this information by heart: https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...
Which reminded me that one time I used a function recently stabilized, but the Rust version used was about 8 versions behind. With that slow compilation cycle.
/sigh.
Everyone says this over the years, even before AI, and I've never felt it made the slightest difference in how they rate me.
Grading a stack of blue books is a "just kill me now" brutal experience. A majority of cognitive effort is just finding the page for the next problem to grade, with finding the answer another delay; any programming language with this kind of access latency would stay a "one bus" language. So of course professors rely on disinterested grad students to help grade. They'll make one pass through the stack, getting the hang of the problem and refining the point system about twenty blue books in, but never going back.
With stapled exams one problem per page one can instead sort into piles for scores 0-6 (if you think your workplace is petty, try to imagine a sincere conversation about whether an answer is worth 14 or 15 points out of 20), and it's easy to review piles.
When I had 200 linear algebra exams to grade at once, I'd scan everything, and use my own software to mark and bin one question at a time, making review passes a pleasure. I could grade a 10 question final in one intense sitting, with far more confidence in the results than team grading ever gave me.
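(That software isn't public; this is only a sketch of the mark-and-bin idea under assumed data structures, with a hypothetical score_for function standing in for the human judgment call:)

```python
from collections import defaultdict

def grade_question(scans, question, score_for):
    """scans: {(student, question): scanned answer image} - assumed layout.
    Grade ONE question across all exams, binning students by score so
    each pile can be reviewed for consistency in a single pass."""
    bins = defaultdict(list)  # score -> students who earned it
    for (student, q), image in scans.items():
        if q == question:
            bins[score_for(image)].append(student)
    return bins
```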
Also, seconding stapled exams with prescribed spaces for answers. So much time is wasted just looking for where the answer is.
It’s still a hard problem though. If the students have the laptops outside of the testing site, they can load cheating materials on them, or use them to smuggle questions and answers out if the test is used in multiple class sections. You realistically will not lock down a laptop students can take home sufficiently that some people won’t tamper with it.
Otherwise you have to have enough laptops to get each student a wiped and working machine for every test, even with lots of tests going on. And students need to be able to plug them in unless the batteries are rigorously tested and charged, but not every classroom has enough outlets. And you need to shuttle the laptops around campus.
Then you need a way to get the student work off the laptops. You probably want the student to bring the laptop up when done and plug in a USB printer. Anything else, like removable media, and you have to worry about data loss or corruption, including deliberate manipulation by students not doing well on the exam, and students claiming what got graded wasn’t what they meant to hand in. And you still have to worry about students finding a way to deliberately brick their laptops and the inevitable paper jams and other hardware failures, especially an issue when students need to leave on time to get to another class.
So you need systems that are cheap but reliable, tamper-resistant but easy to diagnose and maintain, robust against accidental data loss but easy to reliably erase, able to export data and install security updates without letting students surreptitiously input data, and heavily locked down while easy to use for students with a wide variety of backgrounds, training, and physical abilities.
The rack charges the laptops, streamlines distributing/collecting tests, prevents tampering, and reports defects. A professor uploads the test to their LMS and specifies the exam time-frame and location. Before the exam, the LMS transfers the test to the correct rack, which saves it to the laptops. After the exam, the rack loads the student responses and transfers them to the LMS for the correct class. Each laptop has a light under it whose color indicates whether it's in standby, waiting to be distributed (shortly before exam start), has a submitted test, or is malfunctioning (the rack periodically pings laptops in storage to verify they still work). Multiple professors can queue exams to the same location, in different time-frames, and the LMS and rack know when to prepare each test on the laptops.
The hard part is to implement this well. The rack's hardware and software must be reliable; nonetheless it will fail (hardware breaks), and exams get moved and rescheduled, so there must be a way to transfer tests to another classroom and time-slot. The laptops can have slightly less reliability, since there are backups; but failures can't be clustered, and if a laptop is transferred from another rack during an exam, it should download the destination rack's test and become active (so if one rack's backup laptops run out, laptops can be borrowed from other classrooms). The laptops must periodically backup in-progress tests using wifi or bluetooth, and if a laptop breaks during the exam, the student can resume their progress on another. Tests must be downloaded well before exam time, in case there are problems (laptops should be able to store and queue multiple tests). Laptops must handle exams that start late (up to the official exam end time) and end late (including overlapping the next exam's start time, in which case the next exam is loaded when the laptop is put back into the rack). The rack must absolutely not indicate that a laptop is ready with an exam if it's not; and (since tests are downloaded before exam time) nothing unique should happen at exam start time (to reduce last-minute surprises, and let a professor who isn't convinced take out a laptop and check that it's functional 15 minutes before). Last but certainly not least, the UI to submit tests and grade responses should be intuitive, simple (not overwhelming) yet powerful (can handle edge-cases like rescheduled tests, randomized tests, accommodations; or be extensible enough to support these features).
Despite all the above requirements I actually think such a product is feasible. Unfortunately, with firsthand experience of typical EdTech and academic bureaucracy (and I'm not a professor so I don't know the worst of it), I'm skeptical it would be adequate quality unless the company designing it is uniquely motivated and capable.
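(A minimal sketch of the laptop status flow the proposal describes; every name here is hypothetical, not a real product:)

```python
from enum import Enum, auto

class LaptopState(Enum):
    """States the under-laptop status light would indicate."""
    STANDBY = auto()         # in the rack, charging, periodically pinged
    READY = auto()           # exam staged, waiting to be handed out
    SUBMITTED = auto()       # responses queued for upload to the LMS
    MALFUNCTIONING = auto()  # failed a health check; swap in a backup

class Laptop:
    def __init__(self):
        self.state = LaptopState.STANDBY
        self.exam_queue = []  # multiple upcoming exams can be staged

class Rack:
    def __init__(self, location):
        self.location = location
        self.laptops = []

    def stage(self, exam):
        """Push an exam to every healthy laptop well before start time,
        so nothing unique has to happen at the exam's start."""
        for laptop in self.laptops:
            if laptop.state is not LaptopState.MALFUNCTIONING:
                laptop.exam_queue.append(exam)
                laptop.state = LaptopState.READY
```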
Also, having classrooms full of laptops full of tests assumes classrooms are secure, and at many colleges they're unlocked by default. There could be a 9am class and exam, then the room is empty until 11, then the same thing until a class at 2 and a night class at 6, and at 8 the PenTesting and Lockpicking Club meets in the room.
Keeping dozens or hundreds of laptops in every classroom also dramatically shifts the reward for burglars which colleges will not like.
I've never seen anyone attempt blue books for anything else.
It's long been known that a longer essay answer is more likely to get a higher grade. And yeah, having been a student and a stressed-out grad student, I can confirm that after about the 20th exam, length is the only real signal of grade.
Other comments point out that with the prices of tuition these days, students should be expecting a lot higher quality of feedback (grading) than what they are getting at any random R1.
It really does seem that the University system (as opposed to the college-esque system) is broken and that the additional AI fears are just another log on the already collapsed bridge. We're getting over the wrong thing.
I did it in three passes - which might be all in one session or might be broken into smaller sessions. First, skim them all for spelling & grammar mistakes, with a red pen. Second, read them all (but not too deeply) to identify where they make key points, with a green pen. At this point you have a really good handle on the general level of accomplishment. Third, take a blue pen and comment them and grade them.
First, it's interesting to see a situation where a new technology INCREASED the cost of something.
Second, this is alluding to one of the dimensions along which that cost is going to be borne: staffing. If it's too many papers to grade, then more staff are needed to grade them. We're probably not going to get that, though, and instead the cost increases, the funding stays the same, so the quality will decline to compensate.
So it goes...
Writing is probably the most difficult subject, because even now it's difficult to prompt an LLM to write in a human intonation - but they will do a perfectly good job at explaining how to write (here's a basic essay structure, here are words we avoid in formal writing, etc). You can even augment a writing tutor chatbot by making it use specific, human-written excerpts from English textbooks, instead of allowing it to generate example paragraphs and essays.
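(A minimal sketch of that augmentation idea, with made-up excerpt text and no particular LLM API assumed:)

```python
# Hypothetical: in practice these would be licensed, human-written textbook passages.
EXCERPTS = [
    "A thesis statement makes one arguable claim and previews the essay's structure.",
    "Formal writing avoids contractions, slang, and second-person address.",
]

def tutor_prompt(question: str) -> str:
    """Ground the tutor in human-written examples instead of letting the
    model generate its own sample paragraphs."""
    examples = "\n\n".join(f"Excerpt {i}:\n{text}" for i, text in enumerate(EXCERPTS, 1))
    return (
        "You are a writing tutor. Explain techniques in your own words, but "
        "when the student needs an example, quote ONLY from these excerpts:\n\n"
        f"{examples}\n\nStudent question: {question}"
    )
```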
Given the absolutely wild increases in tuition, administrations should have massive resources to bring to bear to solving this and other problems.
The reasons they don't feel more like excuses than legitimate explanations.
https://research.collegeboard.org/media/pdf/Trends-College-P...
Published tuition has gone up, but grant aid has matched it, making the net cost at 4-year public institutions flat, or even down slightly, over the 10-year period. The same applies at private 4-year institutions: a large increase in nameplate price, matched by a large increase in grant aid, with actual net tuition flat.
Expenditure data also show that they are not spending significantly more. See the chart at the end of this page, which gives expenditures in real, per-student dollars. They are up, a little less than 20% over 10 years, but half of that is increases in hospital costs of their associated health care systems, which have nothing to do with tuition. The rest is a mix of various activities of universities, much of which are not tuition-funded.
https://nces.ed.gov/programs/digest/d23/tables/dt23_334.10.a...
Now we'll see the reverse, with students arguing that they can't handwrite effectively and need more time or whatever in order for exams to be fair. Hopefully handwritten exams will become the norm from grade school onward, so that this complaint will be rendered moot.
On the few occasions (3) I took one, I ended up "punching through" the paper every single time. I tend to write with high pressure and the paper quality is atrocious. Twice, the tears were so bad the book was partially de-bound.
On both occasions, when presented with a "torn/destroyed book" I had to show the proctor the "issues" and then, very carefully, hand-copy over everything into a new book in their presence--absolute PITA.
The question is: what does an exam measure better: the aptitude or hard work of the student, or the creative effectiveness of the teacher?
I experienced this question firsthand one year. I taught a branch of math in the way I (remembered being) taught to a class of students who were not receptive to that approach. When I tested them, I was very disappointed.
After a few weeks of mulling, I went back to that branch with some new ideas about how to approach the topic. This time with a graphic rather than an abstract approach. More grounded in their likely life experiences. Almost immediately I started hearing "oh, now I get it!" and "well that's easy". Same test, but -much- better results. It wasn't their fault. What they taught me was invaluable.
Yes, exams measure the effectiveness of teacher presentations as much as they measure what students have learned. Good teaching is not a part-time job ... many students are ill-served by this approach. A person who resents teaching as a part-time burden is unlikely to shine at it. And students sense it.
Nor is good teaching a gift from the divine - any more than great lab technique, or crisp programming. Many teachers don't recite the same notes year-after-year, because they're 'good enough'. Their exam results help them to learn from their mistakes.
If the exams don't measure teacher effectiveness as well, then what does? What their paying students walk away with. Is it a treasure, or a wheelbarrow of dirt?