
Since ChatGPT (powered by GPT-3.5) was released in late 2022, educators and higher education institutions have wrestled with the potential risks that generative artificial intelligence (GenAI) poses to assessment validity. It’s encouraging that, as a sector, we’ve invested significant energy and resources into understanding these challenges and identifying strategies to uphold academic integrity – an essential element of our universities’ reputation. But despite these efforts, have we got a clear path forward?

One thing we know for sure is that GenAI is here to stay. Its capabilities are advancing rapidly, and new AI-powered applications and tools emerge almost every day to handle a wide range of tasks traditionally completed by humans. Some of these applications can already closely mimic human thinking, communication, and interaction. These developments have significant implications for learning, teaching, and research.

Over the past 18 months, we’ve learned that institutional policies and GenAI guidelines are not enough. Policies do not guarantee that students understand the rules or follow them. Detection technologies such as AI checkers and plagiarism scanners remain unreliable. Some of us have even experimented with “human AI detectors”, which have also proven ineffective. Both approaches are prone to false positives and false negatives, which can damage trust in our university systems and – more alarmingly – lead to false accusations against students, with potentially lifelong consequences. For these reasons, many experts now urge us to shift our focus from AI detection and bans to assessment redesign.

Redesigning assessment in the age of GenAI requires a long-term strategy. Approaches such as asking students to critique AI outputs, limiting or banning GenAI use, or requiring students to declare its use at submission may seem helpful, but they ultimately depend on student compliance and honesty. And since we have no reliable way to verify compliance, such approaches leave assessments vulnerable and compromise their validity. So, what should assessment redesign look like?

One strategy is to put greater emphasis on the assessment process rather than evaluating only the final product. For example, if we set a written essay or report, we should also assess how students develop their ideas and thinking over time – not just the finished piece. This requires us to work closely with students, whether in person or online, and to engage with them regularly to observe how their thinking evolves throughout the semester. Lecturers or tutors can dedicate time during each class or tutorial to check in with students individually, ask them to articulate their ideas, and document their progress. In other words, we incorporate a human element into the verification of student work so that we can be reasonably confident that students completed the task independently.

Yes, this approach may be time-consuming. It also demands a redesign not just of assessment tasks but also of class structures and tutorial plans. However, I think the effort is worthwhile. It offers a more reliable picture of student learning than relying on AI detection or asking students to follow a set of rules – methods that depend on student honesty and compliance, something we cannot realistically enforce.

A second strategy is to adopt a programmatic approach to assessment design. Instead of designing assessments solely at the individual subject level, we can design them across an entire course or program. This means identifying the key knowledge, skills, and capabilities that students should develop by the time they graduate, and then designing assessments that capture those outcomes at key stages throughout the course or program. Once we identify those “touchpoints”, we can apply the process-focused method described above to each one, effectively securing those key moments in the student learning journey. This approach gives us greater confidence that students have genuinely acquired the capabilities we expect of graduates, and we can then be less concerned about the remaining assessments in the course or program.

Let me reiterate the key message here. Given the enhanced capabilities of GenAI, we cannot realistically determine whether a student has genuinely completed an assessment without a human element in verifying their work – at least for now. And if we can’t determine what students have actually learned, we cannot guarantee assessment validity. This threatens the credibility of our degrees and diminishes the reputations of our universities.

Until we develop more robust and reliable approaches or frameworks to ensure assessment validity, we must accept this bitter truth and act on it. For now, thoughtful assessment redesign remains our most effective strategy for preserving assessment validity in a GenAI world. Securing assessment integrity is not a choice – it’s a must.