A Box Unlocked, Not A Box Ticked: Tom Chatfield on AI and Pedagogy
In a new white paper, Tom Chatfield, the philosopher of technology and critical thinking, outlines a practical roadmap for integrating artificial intelligence into teaching, learning, and assessment. His message is neither Pollyannaish nor Luddite; instead, he takes a pragmatic survey of the current state of education and examines how to turn this moment to the advantage of human learners.
In AI and the Future of Pedagogy, Chatfield critiques the defensive, surveillance-based responses often seen toward AI and instead endorses more transparent, experimental, and mastery-based approaches to assessment. “AI is not a shortcut to learning — it’s a context and catalyst for deeper engagement,” says Chatfield. “The future of pedagogy depends on our ability to teach both for humans and with machines, in partnership with students whose lives are already entwined with AI.”
And there is no turning back toward some imagined better pre-AI past. “The genie is out of the bottle and circling the earth at high speed,” he explains below. “The task is to do better than indulging pure doom or hype.”
The white paper is published by Sage, the parent of Social Science Space. We took the opportunity to ask Chatfield, a longtime friend of Social Science Space, 10 questions about the paper, including about a prototype ‘cognitive co-pilot’ AI tutor he is helping to develop.
1. When preparing anything on AI that is expected to have a shelf life of longer than a mayfly’s existence, how do you avoid instantaneous obsolescence?
The quality of your answers depends on the quality of your questions, so you first need to make sure you’re asking worthwhile questions! If you focus purely on the technical specifications of the latest model or the features of a specific platform, your ideas are obsolete the moment you stop typing. So I try to focus on human fundamentals: the nature of cognition, our historical relationships with tools, the wider systems and values of which technologies like AI are one part.
It’s not the case that there is no new thing under the sun. What’s interesting is the tension between novelty and deeper continuities: history doesn’t repeat, but it often rhymes, as Mark Twain didn’t actually say. I love Alison Gopnik and Henry Farrell’s point that AI is the latest in a long line of informational technologies, from writing and print to the internet and search engines, all of which have transformed how we access and understand information.
A textbook from 50 years ago can still be a successful piece of educational technology both because of the thought that went into it and because it creates a clear “geography of knowledge” that respects how our minds work. Technologies succeed when they enhance our scope while automating that which holds us back. This hasn’t changed, nor have the biological and psychological basics of learning.
2. Sticking with that sense of rapid advancement and obsolescence, you write, “More broadly, frameworks are emerging that prioritize principles over platforms, offering the chance to reflect on the fundamentals of education in an age of increasingly ‘intelligent’ machines.” When do you anticipate we’ll see essentially ‘platformless’ interfaces for AI – or have Siri and Alexa essentially beaten us to the goal?
We are certainly moving toward a world where technologies like AI are working silently beneath the surface of ordinary tasks. But, particularly in education, I would argue that abolishing friction isn’t always the goal. In fact, it can be the enemy. In education, the right kinds of friction and difficulty are synonymous with building knowledge and understanding. We need to design constraints so that students don’t get lost in an endless, fluid feed.
More generally, “platformless” is really a synonym for an unseen algorithmic interface: mediation is still taking place, but you no longer see it. Once this happens, you risk letting information wash over you without engaging your critical faculties, or even noticing that value-laden choices are being made on your behalf.
As I note in the paper, we’re entering a world that is both extremely noisy and massively mediated, and this means we need forms of what the philosopher Andy Clark calls “extended cognitive hygiene”: deciding what you do and don’t want to outsource, cognitively speaking. Already, services like search and online shopping are rife with adverts, sponsored links, misleading or bogus products, and so on. Think what happens when the same kind of thing occurs within AIs coded to give you one “right” answer, without any insight into how it was arrived at.
3. Speaking of platforms, you cite the large language model ‘cognitive co-pilot’ you developed in partnership with Timo Hannay. Before I ask detailed questions, perhaps you could describe the tool and highlight what makes it different from previous algorithmic learning aids.
What we’re doing with the first module of the co-pilot is essentially an act of translation: taking a static, rigorous textbook—my own critical thinking text for Sage—and turning it into a dynamic “knowledge base” optimized for a large language model, or LLM. Then we’re getting the LLM to tutor you actively, engagingly and adaptively, in a way that’s anchored both to this knowledge base and to a pedagogic system that tracks progress, lets you make notes and generate diagrams, and provides a user-facing “book” accompanying every element of the syllabus.
So unlike a standard chatbot that just gives you an answer (and perhaps hallucinates along the way), this tool is designed to act as a Socratic tutor. It refuses to dump information or let you outsource your thinking; instead, it asks you to explain concepts back to it, with reference to your particular discipline and interests.
It’s non-linear, in the sense that you can start or dip in wherever you like. But if you try to show progress without reflecting, or give a shallow answer, the AI stops you: not to punish you, but to guide you through a remediation loop until you demonstrate mastery. It aims to combine the infinite patience of a machine with the structured rigor of a university course. And we’re developing it in very close consultation with students and faculty at City St George’s, so one eye is always on the prize of helping actual learners and tutors get the most out of the technology, integrate it effectively into what they’re already doing, and help us improve it together.
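To make the mechanics of those “mastery gates” concrete, here is a minimal sketch of how such a gate and remediation loop might be wired up. It is purely illustrative: the names (MasteryGate, PASS_THRESHOLD, ask, grade) and the scoring threshold are assumptions, not details of the actual co-pilot.

```python
# A minimal, hypothetical sketch of a mastery gate with a remediation loop.
# Names and threshold are illustrative assumptions, not the co-pilot's
# real implementation.
from dataclasses import dataclass, field

PASS_THRESHOLD = 0.8  # assumed score required to "demonstrate mastery"

@dataclass
class MasteryGate:
    concept: str
    scores: list[float] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return bool(self.scores) and self.scores[-1] >= PASS_THRESHOLD

def tutor(gate: MasteryGate, ask, grade) -> None:
    """Keep prompting the learner to explain the concept until the gate is passed."""
    while not gate.passed:
        answer = ask(f"Explain '{gate.concept}' in your own words, "
                     "with an example from your own discipline.")
        gate.scores.append(grade(answer))
        if not gate.passed:
            # Remediation: narrow the question rather than reveal the answer.
            ask(f"Let's revisit one part of '{gate.concept}' before moving on.")
```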

4. What did you learn from seeing, ‘behind the curtain,’ how your own work was digested and translated?
I’m fascinated by how writing “for” an LLM is utterly different to writing for people. Even though an LLM has ingested vastly more books than any human, it is enormously lacking when it comes to tacit knowledge of the world: common sense, empathy, social context. It has information, but it doesn’t have knowledge.
This meant that getting good, consistent behaviors out of the tool required me to be incredibly specific about intentions and boundaries. I ended up creating two versions of the underlying “knowledge base”: a lean, readable one for humans, and a heavily augmented one packed with notes on behaviors, warnings, and pedagogical intent for the machine. It reminded me that an LLM is brilliant but also placeless and amnesiac; it constantly needs to be anchored in a context you have defined, otherwise it reverts to the probabilistic mean of the internet.
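One way to picture that dual knowledge base is as a single source from which a lean human-facing text and a machine-facing text carrying behavioral notes are generated. The sketch below is an illustrative assumption about how that might look, not Chatfield’s actual format.

```python
# Illustrative sketch: one hypothetical way to keep a lean human-readable
# knowledge base alongside a machine-facing version augmented with notes on
# behaviors and pedagogical intent. Field names and note syntax are invented.
from dataclasses import dataclass

@dataclass
class KnowledgeUnit:
    title: str
    body: str                 # lean, readable prose for human readers
    tutor_notes: list[str]    # behaviors, warnings and intent for the LLM only

    def human_view(self) -> str:
        return f"{self.title}\n\n{self.body}"

    def machine_view(self) -> str:
        notes = "\n".join(f"[TUTOR NOTE: {n}]" for n in self.tutor_notes)
        return f"{self.human_view()}\n\n{notes}"

unit = KnowledgeUnit(
    title="Deductive reasoning",
    body="A deductive argument is one whose conclusion must be true if its premises are true.",
    tutor_notes=[
        "Never state the definition outright; elicit it from the learner first.",
        "If the learner conflates validity with truth, route them to remediation.",
    ],
)
```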
5. Does the co-pilot itself evolve from its interactions with students – both in that one-on-one setting and as a platform?
Yes, and that persistence is a key differentiator. We are making extensive use of histories and completion tracking to ensure that a student’s engagement is an accumulation of understanding rather than a series of one-off queries. Standard LLM interactions are often both fleeting and prone to going off track: once the window closes, the context is gone, while the longer an interaction lasts, the more likely the system is to “forget” crucial elements of what came before.
Our system, by contrast, “knows” what you have done. It tracks those “mastery gates” I mentioned, so it remembers if you struggled with deductive reasoning last week, and it can adapt its future questions based on that history. It turns a chat into a curriculum.
We are also treating the entire project as a piece of research, and plan to use aggregate data about how students interact with the tool to refine the AI. As I put it in the paper, the ultimate goal is to make the education system itself a site of learning and improvement. The last thing we need is another abstract debate about what AI can or might do. We need to be looking at the particularities of what happens when we actually deploy it in an educational context, and how we can assess and iterate it in the light of this.
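A minimal sketch of what the history and completion tracking described above could look like in practice follows; the file name, record structure and selection rule are assumptions made for illustration, not the system’s actual design.

```python
# Illustrative sketch: a hypothetical persistent learner record that survives
# between sessions, so the tutor can revisit past struggles. The file name,
# record fields and selection rule are invented for illustration.
import json
from pathlib import Path

RECORD_PATH = Path("learner_record.json")  # assumed per-student store

def load_record() -> dict:
    if RECORD_PATH.exists():
        return json.loads(RECORD_PATH.read_text())
    return {"completed": [], "struggled": []}

def save_record(record: dict) -> None:
    RECORD_PATH.write_text(json.dumps(record, indent=2))

def next_concept(record: dict, syllabus: list[str]) -> str:
    """Prefer revisiting concepts the learner struggled with before moving on."""
    for concept in record["struggled"]:
        if concept not in record["completed"]:
            return concept
    remaining = [c for c in syllabus if c not in record["completed"]]
    return remaining[0] if remaining else "review"
```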
6. You write that the hope is that the tool will handle the foundational teaching, leaving the instructor to do higher-order teaching. Three quick questions centering on that:
– What additional training, AI or not, is in place to prepare the instructor?
– Do the organic teachers show any fear that they’re being edged out?
– How will you ensure that future instructors don’t use the tool to ‘set it and forget it’ with their students?
This is one of the reasons we’re approaching the entire enterprise as a research project: the point isn’t simply to create a useful, engaging tool (though this is certainly our hope) but also to work with students and faculty to see how such a tool can best be integrated into existing teaching and learning. And the place we start is something that both faculty and learners mention in every workshop and engagement: they are under intense pressure in terms of time and volume of information, and the last thing they want is an additional task dumped on top of everything else!
Similarly, we don’t have all the answers; we’re not arriving with a magical solution. But we do want to help educators think about (and educate us on) how systems like this can best fit into their particular disciplines and syllabuses. Training is a part of this, but so are listening and co-design. All of this requires time and attention, so a first step is trying to understand where you can both have maximum impact and relieve rather than generate pressure.
Our initial modules focus on critical thinking, AI literacy and computational thinking precisely because these are “meta” skills foundational to learning in the 21st century: if we’re doing it right, helping students master these should unlock further forms of learning for them, rather than just being a box they need to tick. If the AI can handle some of the groundwork of clarifying core ideas, and getting learners thinking about how these apply to their course and needs, this frees the instructor to do the complex, messy, discursive work that humans are best at. The goal is to give teachers the information and tools they need to intervene more effectively, not to walk away.
7. In that same section, you note that the tool, rather than isolating students, has created more peer-to-peer opportunities. Do you fear that the AI and the student may become pair-bonded? (See your statement: “But this risks both a distorting degree of anthropomorphism and the displacement of discussions that might otherwise have been had with colleagues and peers.”)
I’m very wary indeed of anthropomorphism: of people treating Large Language Models as friends, confidants, companions. The power of machines that can use natural language with human-like fluency is also a huge risk, because it’s incredibly hard not to conflate this fluency with the presence of human-like empathy, interest, compassion.
In this sense, a large part of our task is framing interactions with the system in a way that is both useful and honest. You need to make clear that it’s a tool, that using this tool appropriately means respecting certain institutional and personal boundaries, that guardrails and moderation are in place; and that there’s a clear, facilitative code of practice.
One advantage, here, is that we are working within the university’s learning management system rather than outside it. This creates an opportunity for students to interact thoughtfully with an AI-powered system that is explicitly not trying to be their friend, please them or simply do whatever they tell it; one whose persona is that of a Socratic tutor, but that also wears its artificiality on its sleeve, foregrounding the fact that LLMs are extremely powerful but imperfect tools.
This is where the AI Literacy module we are developing in parallel with the critical thinking one is especially interesting, because by using an AI to teach people about AI you can play with these things. We’ve accentuated this in the form of a selection of standalone “AI Labs” that encourage students to experiment playfully with simulations of social media feeds, sorting algorithms, system prompts with different personae, and so on.
You also need to remember that over 90 percent of British undergraduates are already using Generative AI out “in the wild” without much in the way of guidance or oversight. So anything you do to support critical, reflective, constructive engagements with AI connected to clear learning objectives and skills should in principle be safer, more accountable and more transparent than this.
8. The paper describes the arms race of AI vs. traditional teaching, and suggests that traditional assignments may be partly at fault for fostering ‘cheating’ (by which we mean using an AI buddy). Could you offer some examples of assignments or outputs that could better foster a collaboration between student, instructor and AI buddy (perhaps drawing from ones used in the white paper)?
I particularly like assignments that reframe the outputs of AIs as evidence of algorithmic processes that it is a learner’s task to explore and interrogate, rather than as products that you take or leave. For example, I cite Christopher D. Jimenez’s work where students feed autobiographical details into an AI and ask it to guess information about things like their class and race, providing explanations and confidence intervals for its guesses. The AI’s output—which at the time Jimenez was writing was often riddled with stereotypes—itself becomes the subject of the essay. The student isn’t asking the AI to do the work; they are analyzing the AI’s biases, limitations and strengths as a case study.
Another approach, long championed by Ethan Mollick, is “AI logging.” Students use the AI as a teammate or mentor, but they must submit a detailed log of their prompts and the AI’s responses alongside their final work. They have to reflect on what they accepted, what they rejected, where the AI hallucinated, and so on. It turns the assessment into a conversation about how they think, and how they navigated the tool, rather than just grading the final product.
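To give a sense of how lightweight such a log can be, here is a sketch of one possible structure for AI-log entries that students could submit alongside their work. The field names and CSV format are assumptions for illustration, not a template prescribed by Mollick or the white paper.

```python
# Illustrative sketch: a hypothetical structure for "AI logging" entries, in
# which students record each prompt, the AI's response, and their own
# reflection on what they accepted, rejected or found hallucinated.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class AILogEntry:
    prompt: str
    response: str
    decision: str      # e.g. "accepted", "rejected", "revised"
    reflection: str    # why, and any hallucinations or errors spotted

def export_log(entries: list[AILogEntry], path: str = "ai_log.csv") -> None:
    """Write the log to a CSV file the student submits with their final work."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(AILogEntry)])
        writer.writeheader()
        writer.writerows(asdict(e) for e in entries)
```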
9. A lot of the paper describes using AI in service of things we already do, but could do better. What are some really blue-sky ideas you have for what AI could help accomplish that bear no relation to what schools presently focus on?
One of the biggest, for me, is accessibility. The computer scientist Chris Mairs, who is blind, writes a brilliant newsletter where he discusses (among other things) his experiences of different technologies and the mountain of “inclusivity debt” that exists around tech today. Conversational voice AI is a staggering opportunity to address this. The fact that you can use natural language as an interface with machines, and that these same machines can increasingly “see” and interact with the world on your behalf, is transformational.
Similarly, the ease and universality of translation potentially opens up resources and opportunities to hundreds of millions of people in new ways. I recently spoke at an AI and education event in Delhi. In the West, we often (rightly) worry that tech can accentuate divides, favoring those who already have resources. But in India, I heard educators talking passionately about the power of voice-interface technology to cross social divides, bringing information to people on mobile devices in local languages even if they are not literate.
This flips the script: it’s a positive vision of tech lifting people up rather than concentrating power. Both possibilities can and do exist simultaneously. But I think we need as clear and realistic a view of some of the prizes on offer as we do of the risks and costs. Universal translation and voice interfaces could open up resources to hundreds of millions of people who have been excluded by text-heavy, monoglot systems. And doing justice to such potentials means looking at the larger systems of which AI is one part: in terms of regulation, incentives, political and ethical ambition.
10. It may be too late, but could you reflect on the quote from the fictional Dr Ian Malcolm, “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should,” in relation to pedagogic AI?
I’m grateful for the opportunity to recall Jeff Goldblum’s immortal delivery of this line! He’s right, of course, in that it’s painfully easy to get carried away by novel possibilities without considering impacts, or while telling yourself that you’re simply working to the best of your abilities.
Two things I feel are that, as I try to say in the paper, students and faculty are already making use of AI all the time; and that developing some critical literacy and confidence as a user of AI (and audience for its outputs) is incredibly important. The genie is out of the bottle and circling the earth at high speed; the task is to do better than indulging pure doom or hype.
Indeed, as my colleague Timo recently put it to me, to avoid AI in university teaching is to send out into the world new doctors, lawyers, computer scientists, businesspeople and others who are entirely self-taught in a transformational technology that they will be using every day of their professional (and personal) lives. That would not be a service to either the students themselves or to wider society.
Finally, I think it also helps a lot to refuse to take AI at its own estimation: to insist that we know a great deal it simply does not and cannot; and that for all its speed and power it remains an informational technology whose successes and failures can ultimately only be measured in human terms.


