Note: If you haven’t already read my previous post, I’d recommend you give it a quick scan as I cover some material which I make reference to below.

So you’ve got some access to AI tools and sort of know how they work. But what are they for? I know sometimes big tech can meet education with a solution looking for a problem and I’m keen to be clear-eyed about how we review “innovation”. I think there are some genuine use cases which I’ll outline a bit below. It’s worth noting that engagement with AI tech is deceptively simple. You can just write a question and get an (uncannily good sounding) answer. However, if you put in some time to craft your interaction, you’ll find that the quality rises sharply. Most people don’t bother, but I think that in academia we have enough bespoke situations that this might be warranted. In this article I’ll also detail a bit of the learning and investment of time that might be rewarded for each scenario. Here are, as I see them, some of those use caess:

1. Transcribe audio/video

AI tools like whisper-AI, which can be easily self-hosted with a fairly standard laptop, enable you to take a video or audio file and convert it very quickly to very accurate text. It’s accurate enough that I think the days of qualitative researchers paying for transcription are probably over. There are additional tools being crafted which can separate text into appropriate paragraphs and indicate specific speakers on the transcript (person 1, person 2, etc.). I think that it’s faster for most of us to read / skim a transcript, but also, for an academic with some kind of hearing or visual impairment, this is an amazingly useful tool. See: MacWhisper for a local install you can run on your Mac, or a full-stack app you can run as a WebUI via docker in Whishper (formerly FrogBase / whisper-ui).

Quick note: the way that whisper has been developed makes it very bad at distinguishing separate speakers, so development work is quite actively underway to add on additional layers of analysis which can do this for us. You can get a sense of the state of play here: https://github.com/openai/whisper/discussions/264. There are a number of implementations which supplement whisper-ai with pyannote-audio, including WhisperX and whisperer. I haven’t seen a WebUI version yet, but will add a note here when I see one emerge (I think this is underway with V4 of whishper (https://github.com/pluja/whishper/tree/v4). Good install guide here: https://dmnfarrell.github.io/general/whisper-diarization.

2. Summarise text

Large language models are very good at taking a long chunk of text and reducing it to something more manageable. And  it is reasonably straight-forward to self-host this kind of service using one of the 7B models I mentioned in the previous host. You can simply paste in the text of a transcript produced by whisper and ask a Mistral-7B model to summarise it for you using LMStudio without too much hassle. You can ask things like, “Please provide a summary of the following text: <paste>”. But you might also benefit from different kinds of presentation, and can add on additional instructions like: “Please provide your output in a manner that a 13 year old would understand” or “return your response in bullet points that summarise the key points of the text”. You can also encourage more analytical assessment of a given chunk of text, as, if properly coaxed, LLMs can also do things like sentiment analysis. You might ask: “output the 10 most important points of the provided text as a list with no more than 20 words per point.” You can also encourage the model to strive for literal or accurate results: “Using exact quote text from the input, please provide five key points from the selected text”. Because the underlying data that LLMs are trained on is full of colloquialisms, you should experiment with different terms: “provide me with three key hot takes from this essay” and even emojis. In terms of digital accessibility, you should consider whether you find it easier to get information in prose or in bulleted lists. You can ask for certain kinds of terms to be highlighted or boldface.

All of this work writing out questions in careful ways to draw out more accurate or readable information is referred to by experts as prompt engineering, and there is a lot of really interesting work being done which demonstrates how a carefully worded prompt can really mobilise an AI chatbot in some impressive ways. To learn more about prompt engineering, I highly recommend this guide: https://www.promptingguide.ai.

It’s also worth noting that the questions we bring to AI chatbots can also be quite lengthy. Bear in mind that there are limits on the number of tokens an AI can take in at once (e.g. the context length), often limited to around 2k or 4k words, but then you can encourage your AI chatbot to take on personality or role and set some specific guidelines for the kind of information you’d like to receive. You can see a master at work on this if you want to check out the fabric project. One example is their “extract wisdom” prompt: https://github.com/danielmiessler/fabric/blob/main/patterns/extract_wisdom/system.md.

You can also encourage a chatbot to take on a character, e.g. be the book, something like this:

System Prompt:


You are a book about botany, here are your contents:
<context>

User Query: "What are you about?"

There are an infinite number of combinations of long-form prose, rule writing, role-playing, custom pre-prompt and pre-fix/suffix writing which you can combine and I’d encourage people to play with all of these things to get a sense of how they work and develop your own style. It’s likely that the kid of flow and interaction you benefit from is quite bespoke, and the concept of neurodiversity encourages us to anticipate that this will be the case.

There are some emerging tools which do transcription, diarisation of speakers and summarisation in real time, like Otter.AI. I’m discouraged by how proprietary and expensive (e.g. extractive) these tools are so far, and I think there’s a quite clear use case for Universities to invest time and energy, perhaps in a cross-sector way, to develop some open source tools we can use with videoconferencing, and even live meetings, to make them more accessible to participation from staff with sensory sensitivies and central auditory processing challenges.

3. Getting creative

One of the hard things for me is often the “getting started” part of a project. Once I’m going with an idea (provided I’m not interrupted, gasp!) I can really move things along. But where do I start? Scoping can stretch out endlessly, and some days there just isn’t extra energy for big ideas and catalysts for thinking. It’s also the case that in academia we increasingly have less opportunities for interacting with other scholars. On one hand this is because there might not be others with our specialisation at a given university and we’re limited to conferences to have those big catalytic converastions. But on the other hand, it’s possible that the neoliberalisation of the University and marketisation of education has stripped out the time you used to have for casual non-directed converastions. On my campus, even the common areas where we might once have sat around and thought about those things are also gone. So it’s hard to find spaces, time and companions for creativity. Sometimes all you’ve got is the late hours of the night and you realise there’s a bit of spare capacity to try something out.

The previous two tasks are pretty mechanical, so I think you’ll need to stick with me for a moment, but I want to suggest that you can benefit from an AI chatbot to clear the logjam and help get things flowing. LLMs are designed to be responsive to user input, they absorb everything you throw at them and take on a persona that will be increasingly companionable. There are fascinating ethical implications for how we afford agency to these digital personas and the valence of our relationships with them. But I think for those who are patient and creative, you can have a quite free-flowing and sympathetic conversation with a chatbot. Fire up a 7B model, maybe Mistral, and start sharing ideas and open up an unstructured converastion and see where it takes you. Or perhaps see if you can just get a quick list to start: “give me ten ideas for X”.

Do beware the underlying censorship in some models, especially if your research area might be sensitive, and consider drawing on models which have been fine-tuned to be uncensored. Consider doing some of the previous section work on your own writing: “can you summarise the key points in this essay?” “what are the unsubstantiated claims that might need further development?”

There’s a lot more to cover, but this should be enough to highlight some of the places to get started, and the modes of working with the tools which will really open up the possibilities. In my next post, I’ll talk a bit about LLM long-term memory and vector databases. If you’re interested in working with a large corpus of text, or having a long-winded conversation preserved across time, you might be interested in reading more!

I’ve spent the last several months playing with AI tools, more specifically large language (and other adjacent data) models and the underlying corpus of data that formed them, trying to see if there are some ways that AI can help an academic like me. More particularly, I’m curious to know if AI can help neurodivergent scholars in large bureaucratic Universities make their path a bit easier. The answer is a qualified “yes”. In this article, I’ll cover some of the possible use cases, comment on the maturity, accessiblity and availability of the tech involved and explain some of the technological landscape you’ll need to know if you want to make the most of this tech and not embarrass yourself. I’ll begin with the caveats…

First I really need to emphasise that AI will not fix the problems that our organisations have with accessiblity – digital or otherwise – for disabled staff. We must confront the ways that our cultures and processes are founded on ableist and homogenous patterns of working, dismantle unnecessary hierarchies, and reduce gratuitous beurocracy. Implementing AI tools on top of these scenarios unchanged will very likely intensify vulnerability and oppression of particular staff and students and we have a LOT of work to do in the modern neoliberal University before we’re there. My worst case scenario would be for HR departments to get a site license to otter.AI and fire their disability support teams. This is actually a pretty likely outcome in practice given past patterns (such as the many University executives which used the pandemic as cover to implement redundancies and strip back resource devoted to staff mental health support). So let’s do the work please? In the meantime, individual staff will need to make their way as best as they can, and I’m hoping that this article will be of some use to those folx.

The second point I need to emphasise at the outset is that AI need not be provided through SAAS or other-subscription led outsourcing.Part of my experimentation has been about tinkering with open source and locally hosted models, to see about whether these are a viable alternative to overpriced subscription models. I’m happy to say that “yes”! these tools are relatively easy to host on your own PC, provided it has a bit of horsepower. Even more, there’s no reason that Universities can’t host LLM services on a local basis at very low cost per end user, vastly below what many services are charging like otter.AI’s $6/mo fee per user. All you need is basically just a bank of GPUs, a server, and electricity required to run them.

What Are the Major Open Source Models?

There are a number of foundational AI models. These are the “Big Ones” created at significant cost running over billions of data points, by large tech firms like OpenAI, Microsoft, Google, Meta etc. It’s worth emphasising that cost and effort are not exclusively bourne by these tech firms. All of these models are generated on the back of freely available intellectual deposit of decades of scholarly research into AI and NLP. I know of none which do not make copious use of open source software “under the hood.” They’re all trained on data which the general public has deposited and curated through free labour into platforms like wikipedia, stackexchange, youtube, etc., and models are developed in public-private partnerships with a range of University academics whose salaries are often publicly funded. So I think there is a strong basis for ethically oriented AI firms to “share alike” and make their models freely available, and end users should demand this. Happily, there have been some firms which recognise this. OpenAI has made their GPT1 and GPT2 models available for download, though GPT3 and 4 remain locked behind a subscription fee. Many Universities are purchasing GPT subscriptions implicitly as this provides the backbone for a vast number of services including Microsoft’s CoPilot chatbot, which have under deployment to University staff this last year as a part of Microsoft’s ongoing project to extract wealth from the education sector in the context of subscription fees for software (Microsoft Teams anyone?). But it doesn’t have to be this way – there are equally performant foundational models which have been made freely available to users who are willing to hack a bit and get them working. These include:

  • LLaMA (Language Learning through Multimodal Adaptation), a foundation model developed by Meta
  • Mistral (a foundation model designed for mathematical reasoning and problem-solving), which has been the basis for many other models such as NeuralChat by Intel.
  • Google’s Gemini and BERT models
  • BLOOM, developed by a consortium called BigScience (led by huggingface primarily)
  • Falcon, which has been funded by the Abu Dhabi sovereign wealth fund under the auspices of Technology Innovation Institute (TII)
  • Pythia by EleutherAI
  • Grok 1 developed by X.ai

These are the “biggies” but there are many more smaller models. You can train your own models on a £2k consumer PC, so long as it has a bit of horsepower and a strong GPU. But the above models would take, in some cases, years of CPU time for you to train on a consumer PC and have billions or even trillions (in the case of GPT4) parameters.

What Do I Need to Know About Models? What can I run on my own PC?

To get a much broader sense of how these models are made and what they are I’d recommend a very helpful and accessible write-up by Andreas Stöffelbauer. For now it’s worth focussing on the concept of “parameters” which reflects the complexity of the AI model.You’ll usually see this listed next to the model’s name, like Llama7B. And some models have been released with different parameter levels, 7B, 14B, 30B and so on. Given our interest in self-hosting, it’s worth noting that parameter levels are also often taken as a proxy for what kind of hardware is required to run the model. While it’s unlikely that any individual person is going to train a 30B model from scratch on their PC, it’s far more likely that you may be able to run the model after it has been produced by one of these large consortia that open source their models.

Consumer laptops with a strong GPU and 16GB of RAM can generally run most 7B parameter models and some 10G models. You’ll need 32GB of memory and a GPU with 16GB of VRAM to get access to 14B models, and running 30B or 70B models will require a LOT of horsepower, probably 24/40+ GB RAM which in some cases can only be achieved using a dual-GPU setup. If you want to run a 70B model on consumer hardware, you’ll need to dive the hardware discussion a bit as there are some issues that make things more complex in practice (like a dual-GPU setup), but to provide a ballpark, you can get second hand NVidia RTX 3090 GPU for £600-1000 and two of these will enable you to run 70B models relatively efficiently. Four will support 100B+ models which is veering close to GPT4 level work. Research is actively underway to find new ways to optimise models at 1B or 2B so that they can run with less memory and processing power, even on mobile phones. However, higher parameter levels can help with complex or long-winded tasks like analysing and summarising books, preventing LLM “hallucination” an effect where the model will invent fictional information as part of its response. I’ve found that 7B models used well can do an amazing range of tasks accurately and efficiently.

While we’re on the subject of self-hosting, it’s worth noting that when you attempt to access them models are also often compressed to make them more feasible to run on consumer hardware, using a form of compression called “quantization“. Quantization levels are represented with “Q” values, that is a Llama2 7B model might come in Q4, Q5 and Q8 flavours. As you’ll notice lower Q levels require less memory to run. But they’re also more likely to fail and hallucinate. As a general rule of thumb, I’d advise you stick with Q5 or Q6 as a minimum for models you run locally if you’re going to work with quantized models.

The units that large language models work with are called tokens. In the world of natural language processing, a token is the smallest unit that can be analyzed, often separated by punctuation or white space. In most cases tokens correspond to individual words. This helps to breaks down complex text into manageable units and enables things like part-of-speech tagging and named entity recognition. A general rule of thumb is that 130 tokens correspond to roughly 100 words. Models are trained to handle a maximum number of array elements, e.g. tokens in what is called the “context length“. Humans do this too – we work with sentences, paragraphs, pages of text, etc. We work with smaller units and build up from there. Context length limits have implications for memory use on the computers you use for an LLM, so it’s good not to go too high or the model will stop working. Llama 1 had a maximum context length of 2,024 tokens and Llama 2 stops at 4,096 tokens. Mistral 7B stops at 8k tokens. If we assume a page has 250 words, this means that Llama2 can only work with a chunk of data that is around 16 pages long. Some model makers have been pushing the boundaries of context length, as with GPT4-32K which aims to support a context length of 32K or about 128 pages of text. So if you want to have an LLM summarise a whole book, this might be pretty relevant.

There are only a few dozen foundational models available and probably only a few I’d bother with right now. Add in quantization and there’s a bit more to sift through. But the current end-user actually has thousands of models to sift through (and do follow that link to the huggingface database which is pretty stellar) for one important reason: fine-tuning.

As any academic will already anticipate, model training is not a neutral exercise. They have the biases and anxieties of their creators baked into them. In some cases this is harmless, but in other cases, it’s pretty problematic. It’s well known that many models are racist, given a lack of diversity in training data and carelessness on behalf of developers. They are often biased against vernacular versions of languages (like humans are! see my other post on the ways that the British government has sharpened the hazards of bias against vernacular English in marking). And in some other instances, models can produce outputs which veer towards some of the toxicity embedded in the (cough, cough, reddit, cough) training data used. But then attempts to address this by developers have presented some pretty bizarre results, like the instance of Google’s gemini model producing a bit too much diversity in an overcorrection that resulted in racially diverse image depictions of nazis. For someone like me who is a scholar in religion, it’s also worth noting that some models have been trained on data with problematic biases around religion, or conversely aversion to discussing it at all! These are wonderful tools, but they come with a big warning label.

One can’t just have a “redo” of the millions of CPU hours used to train these massive models, so one of the ways that developers attempt to surmount these issues is with fine-tuning. Essentially, you take the pre-trained model and train it a bit more using a smaller dataset related to a specific task. This process helps the model get better at solving particular problems and inflecting the responses you get. Fine-tuning takes a LOT less power than training models, and there are a lot of edge cases, where users have taken models after they’ve been developed and attempted to steer them in a new direction or a more focussed one. So when you have a browse on the huggingface database, this is why there aren’t just a couple dozen models to download but thousands as models like Mistral have been fine-tuned to do a zillion different tasks, including some that LLM creators have deliberately bracketed to avoid liability like offering medical advice, cooking LSD, or discussing religion. Uncensoring models is a massive discussion, which I won’t dive into here, but IMHO it’s better for academics (we’re all adults here, right?) to work with an uncensored version of a model which won’t avoid discussing your research topic in practice and might even hone in on some special interests you have. Some great examples of how censoring can be strange and problematic here and here.

Deciding which models to run is quite an adventure. I find it’s best to start with the basics, like llama2, mistral and codellama, and then extend outwards as you find omissions and niche cases. The tools I’ll highlight below are great at this.

There’s one more feature of LLMs I want to emphasise, as I know many people are going to want to work with their PDF library using a model. You may be thinking that you’d like to do your own fine-tuning, and this is certainly possible. You can use tools like LLaMA-Factory or axolotl to do your own fine-tuning of an LLM.

How Can I Run LLMs on My Pc?

There is a mess of software out there you can use to run LLMs locally.

In general you’ll find that you can do nearly anything in Python. LLM work is not as complex as you might expect if you know how to code a bit. There are amazing libraries and tutorials (like this set I’d highly recommend on langchain) you can access to learn and get up to speed fairly quickly working with LLMs in a variety of use-cases.

But let’s assume you don’t want to write code for every single instance where you use an LLM. Fair enough. I’ve worked with quite a wide range of open source software, starting with GPT4All and open-webui. But there are some better options available. I’ve also tried out a few open source software stacks, which basically create a locally hosted website you can use to interface with LLM models which can be easily run through docker. Some examples include Fooocus, InvokeAI and Whishper. The top tools “out there” right now seem to be:

Note: the most comprehensive list of open source tools you can use to run an AI chatbot I’ve seen to date can be found here.

I have a few tools on my MacBook now and these are the ones I’d recommend after a bit of trial and error. They are reasonably straight-forward GUI-driven applications with some extensability. As a starting point, I’d recommend lmstudio. This tool works directly with the huggingface database I mentioned above and allows you to download and keep models organised. Fair warning, these take a lot of space and you’ll want to keep an eye on your hard disks. LMStudio will let you fine tune the models you’re using in a lot of really interesting ways, lowering temperature for example (which will press the model for more literal answers) or raising the context length (see above). You can also start up an ad hoc server which other applications can connect to, just like if you were using the OpenAI API. Alongside LMStudio, I run a copy of Faraday which is a totally different use case. Faraday aims to offer you characters for your chatbots, such as Sigmund Freud or Thomas Aquinas (running on a fine-tuned version of Mistral of course). I find that these character AIs offer a different kind of experience which I’ll comment on a bit more in the follow-up post along with mention of other tools that can enhance this kind of AI agent interactivity like memgpt.

There are real limits to fine-tuning and context-length hacking and another option I haven’t mentioned yet, which may be better for those of you who want to dump in a large library of PDFs is to ingest all your PDF files into a separate vector database which the LLM can access in parallel. This is referred to as RAG (Retrieval-Augmented Generation). My experimenting and reading has indicated that working with RAG is a better way to bring PDF files to your LLM journey. As above, there are python ways to do this, and also a few UI-based software solutions. My current favourite is AnythingLLM, a platform agnostic open source tool which will enable you to have your own vector database fired up in just a few minutes. You can easily point AnythingLLM to LMStudio to use the models you’ve loaded there and the interoperability is pretty seamless.

That’s a pretty thorough introduction to how to get up and running with AI, and also some of the key parameters you’ll want to know about to get started. Now that you know how to get access up and running, in my second post, I’ll explain a bit about how I think these tools might be useful and what sort of use cases we might be able to bring them to.

One of the key reasons I’m reluctant to share with others about being autistic relates to the way that communication by autistic people has been relentlessly pathologised. Even now, the key way that autism is defined in diagnostic manuals and social research primarily foregrounds, as one research article puts it is that: “autism manifests in communication difficulties, challenges with social interactions, and a restricted range of interests”. I don’t know a single autistic person who would foreground those things are the primary driver of their personal alterity and lived experience. They are challenges, but those aren’t the defining features of being autistic. But that’s the stereotype out there which is continually repeated by non-autistic researchers. This is foregrounded for those autists in Higher Education who declare a disability at work as we’re categorised in the following way by the Higher Education Statistics Agency: “A social/communication impairment such as Asperger’s syndrome / other autistic spectrum disorder.”

The upshot of this is that I have an abiding fear that when sharing about my neurodivergence with others, that person will subconsciously begin to find signs of disorder in every social interaction we have after that discovery. This has happened in the past and it’ll continue to happen in the future. And there are corrolaries which also make me wince, like when people speak really loudly or slowly to immigrants in spite of their clear English language proficiency. It’s very hard to surmount the challenges inherent in a relationship where someone is condescending because they have an implicit sense of personal superiority. And we all experience insecurity in ways that drives us to inhabit these spaces of superiority more often than we’d like to acknowledge.

So I’d like you to know about this worry I have.

But, there’s another piece in here that’s worth us considering. In the face of these odd diagnostic framings, I always want to ask: don’t we all have problems with social communication? Isn’t this a key part of being a living creature? Doesn’t every creature experience conflict as it occurs in any healthy relationship? There are whole fields of study, like philosophical hermeneutics, post-humanism and critical animal studies, which seek to confront the fascinating aspects of building understanding and the causes of misunderstanding in communication.

So rather than try to pretend you don’t notice when I’ve clearly missed your point, or I’ve read your reaction to something as more severe than you intended it to be, why not lean in to the awareness that you have trouble communicating sometimes too, that when you’re feeling tired and badgered by the world you might not have extra bandwidth for interpreting cues, mediating confusion or faciliting the process of bridging misunderstanding?

I’m fascinated by the ways that we hold culturally encoded double-standards around communication. In many cases, facilitating understanding by a listener or reader is taken to be a hallmark of skilled communication. This is undoubtedly the case, as I’ve learned from a lifetime of cross-cultural communication and teaching, which is often about troubleshooting how effectively you’ve been understood and learning to anticipate and surmount barriers. But if we’re being honest here, I think it’s worth acknowledging that being well-understood can also be a feature of having a homogenous social life and inhabiting hierarchies. It’s much more likely that, for most of us, we think of ourselves as easily understood and understanding people simply because we don’t spend that much time outside our comfort zone, staying within close-knit circles of people who share our experience, cultural background, social class, and particular competencies. There are forms of deference which are built into relationships where we are expected to mask misunderstanding and protect fragile egos.

What I really want to see is how a person performs when they’re thrown into a situation where they’re expected to communicate with people you don’t share much with. You can see this at work when people travel outside their home country for the first time, take an unexpected career transition, or move to a new place. Suddenly a person realises that their communication competencies do not arise from skills, experience and training, that those capabilities are more fragile than they’d expected and that there’s some hard work ahead. Moreover, when we are thrown into that kind of situation, we’re confronted with the sides of ourselves that emerge when we’re under stress: you may be impatient, sharp, slow to react, etc. and this compounds the embarrasment and difficulties of surmounting misunderstanding.

Some of the best teachers I’ve worked with are people who have placed themselves in situations of language and cultural diversity and developed forms of grace and patience for themselves and others which are the gateway to understanding. Some of the most skilled and empathetic communicators I know are neurodivergent people. Imagine how it might transform our organisations and families if we were more honest about how we’ve experienced breakdowns in communication, and more forensic about the aspects of our culture which drive us to conceal or hurry past misunderstanding in favour of quick and decisive action.

Over the past two years, there have been some significant challenges that educators in Universities have had to confront, in many cases driven by top-down policy initiatives. There are a few different places where impacts have been seen – but one area I’d like to highlight in this post lies in assessment. The stakes are already quite high for practitioners striving to engage in the craft of pedagogy, as I’ve recently posted, because quantifying student achievement in a homogenous way (e.g. grading) is bound to cause problems. When we try to signal achievement in meaningful ways whilst working with forms of feedback that reduce something complex, layered, and multi-modal like human communication, we are bound to experience tensions and shortcomings. This is HARD work. But let’s assume we’re stuck with grading systems and don’t have access to more creative and collaborative options. If we accept this as our lot, then we need to work with these systems to mitigate their failures, especially in inflexibility (feedback to students around assessed work will inevitably hit up against limits and edge conditions) and opacity (our communication back to students in feedback is just as complex as their communication to us in their essays!). So at the very least, I think, we need to include flexibility in our approach, because humans just aren’t all the same, and our experiences and styles of learning are often radically different. Difference, in my view is best engaged in a relational way, and by extension is best managed at a local level. But can we actually do this in practice? There are some external factors intervening which educators in Universities need to be aware of as we try to develop careful and responsible policy around marking.

The policy landscape has shifted in the background over the past three years in the UK in ways that many academics (including myself) won’t have noticed. I’d like to break this down at some length as it helps to explain why we’ve gotten where we are just now (many readers may experience an “ah, so that’s why….!” moment) and unpack some of the parameters around how I think the educational landscape is being changed around assessment.

Part 1: A brief history of OfS interventions around Assessment in Universities

Over the past decade in the UK, there have been massive shifts in the way that government relates to Universities: shifting from a fairly hands-off mode under the Higher Education Funding Council for England (HEFCE) which was finally disbanded in 2018 towards a much more interventionist model driven through two new units: the Office for Students (OfS) and United Kingdom Research and Innovation (UKRI). Over the past ten years, and leading up to the formation of the OfS, government bodies involved in oversight and regulation of Higher Educatio have taken a much more interventionist stance on a variety of fronts within education, and assessment has been a particular target. This is in many ways a resurgence of the “Quality Wars” which have been on and off in Britain for several decades. Returning to recent history, from 2018 until 2021, Universities were regulated under The UK Quality Code for Higher Education (often called simply “the Quality Code”). This was generated in partnership with academic bodies and well regarded on an international level as a set of principles for setting standards in academic practice at Universities.

The Higher Education and Research Act which was passed by Parliament in 2017 initiated a process of review which eventually led to the launch on 17 November 2020 of a “Consultation on regulating quality and standards in higher education“. This process underwrote the drafting by OfS of a set of new policies, intended to replace the Quality Code, which were disseminated to the public alongside a lot of unqualified (and I daresay irresponsible) commentary around confronting “low quality courses” (WONKHE summary). The new policies were shared and feedback was drawn in from 20 July 2021 to 27 September 2021 (PDF here). There were serious questions about the intention of this process from the start (cf. this piece for WONKHE by David Kernohan and Jim Dickinson). Why for example, describe a process as consultative without meaningful involvement from students (much less educators) in the process of policy design? After that consultation, OfS produced a (largely dismissive) response to feedback from those two groups (PDF here). The conditions in that code came into force on 1 May 2022, just over a year ago. If you want to get into the nitty-gritty of how that public dialogue has unfolded, I recommend you read the various waves of analysis by interpreters on WONKHE which I’ve highlighted above.

These policies revolve around five long-standing “key conditions”: titled “B1”-“B5”, which are the following:

Condition B1: The provider must deliver well-designed courses that provide a high quality academic experience for all students and enable a student’s achievement to be reliably assessed.
Condition B2: The provider must provide all students, from admission through to completion, with the support that they need to succeed in and benefit from higher education.
Condition B3: The provider must deliver successful outcomes for all of its students, which are recognised and valued by employers and/or enable further study.
Condition B4: The provider must ensure that qualifications awarded to students hold their value
at the point of qualification and over time, in line with sector recognised standards.
Condition B5: The provider must deliver courses that meet the academic standards as they are described in the Framework for Higher Education Qualification (FHEQ) at Level 4 or higher.

These are seem to me like fairly reasonable things to aspire to in providing higher education. And while one might object to the back and forth of the policy process and the underlying instability this produces for students and academic staff in the sector alike, the actual headline suggestions seem quite unobjectionable.

However OfS process has pressed far beyond headline guidance and unpacked these in what seem to be forensic yet problematic ways in subsequent communication. The consultation documents acknowledge this agenda: “The main change from the current conditions is that the proposals include more detail about the matters that would fall within the scope of each condition and how these would be interpreted”. So the process didn’t just reshape the principles, but also indicated a shift from broad principles and quality monitoring towards much more explicit and specific policy delineation. This dialectic between specific rules and broad principles is a perennial point of debate between ethicists (like myself), and as you’ll find in my publications, I tend to prefer a principled approach, not least because it allows for a more relational and flexible approach to policy which can accommodate diversity and pluralistic communities (as I’ve highlighted above). Rules are better at controlling people, but they also run the risk of causing oppression and undermining the relationships in which education functions.

Back to the discussion at hand. It’s worth emphasising that the OfS is not playing around here: the potential consequences for a University which is found to be in breach of any of these conditions might be a refusal by OfS to validate their degrees and basically cancel graduation for that institution. Also fines. Expensive scary fines.

So what’s in the details? There are little landmines throughout the details here, as one might expect. I’d like to focus on the development of criteria for how “a student’s achievement to be reliably assessed” (condition B4). In the 2021.24 document, OfS provides further detail of the requirements entailed by B4:

B4.2 Without prejudice to the scope of B4.1, the provider must ensure that:

a. students are assessed effectively;
b. Each assessment is valid and reliable;
c. academic regulations are designed to ensure that relevant awards are credible; and
d. relevant awards granted to students are credible at the point of being granted and when compared to those granted previously.

They explain B.4.2.a in more detail a bit further on:

c. “assessed effectively” means assessed in a challenging and appropriately comprehensive way, by reference to the subject matter of the higher education course, and includes but is not limited to:

i. providing stretch and rigour consistent with the level of the course;
ii. testing relevant skills;
iii. requiring technical proficiency in the use of the English language; and
iv. assessments being designed in a way that minimises the opportunities for academic misconduct and facilitates the detection of such misconduct where it does occur.

In case this wasn’t clear enough, OfS provides even further guidance down the page, and colleagues may begin to notice here some of the drivers of policy which has been depoyed over the past two years by cautious University VCs and management eager not to fall afoul in the ways that their universities manage education and risk scary consequences:

50. In relation to “assessed effectively”, the following is an illustrative non-exhaustive list of examples to demonstrate the approach the OfS may take to the interpretation of this condition… “Marking criteria for assessments that do not penalise a lack of proficiency in the use of written English in an assessment for which the OfS, employers and taxpayers, would reasonably expect such proficiency, would be likely to be of concern. Students are not penalised for poor technical proficiency in written English. For example, for assessments that would reasonably be expected to take the form of written work in English and for which the OfS, employers and taxpayers, would reasonably expect such proficiency, the provider’s assessment policy and practices do not penalise poor spelling, punctuation or grammar, such that students are awarded marks that do not reflect a reasonable view of their performance of these skills.”

As you can see, we’ve gone from broadly unoffensive headline principles to some quite odd particulars very quickly in the OfS process. There are a number of strange things here: why single out “technical proficiency in the use of the English language” at all given how such things are already intrinsic to most degree programmes, especially within the humanities? If there are problematic actors in HE, it seems much more sensible to confront that on a specific level rather than deploy a redundant policy. But also, the emphasis here is not on measuring student achievement, but on penalising a lack of proficiency. There’s no interest here of celebrating a surplus of profiency. It’s also uncomfortable to find language here, reaching beyond student experience towards “the OfS, employers and taxpayers”. Many readers will be able to imagine a thousand different ways this could have been written (or not at all), less agressively, more constructively, etc. But let’s set aside these uncomfortable quirks for a moment and ponder together what exactly this all might look like in practice.

Part 2: Why does it matter?

One might come to the end of this and say something like, “yes, I wish this was worded more carefully and sensitively, but there’s nothing wrong with this policy in practice. And why not have more specific versions of our policy, much less these ones? Don’t we want our students to be effective communicators?” There are two key problems here which are complex and deserve our attention, not least because the victims are already on the margins of HE:

Problem 1: ambiguity and overcompliance

From a policy perspective, this is all a bit confusing. On one hand, it has the aspect of really highly specified policy. But on the other hand, even after reading all the guidance, it all seems quite murky. This issue with highly-specified policy which is nonetheless unclear on implementation details, is a recurring problem in the sector. The same issues apply to immigration policy in higher education, with providers often stretching far beyond the “letter of the law” in reaction to ambiguity which is intrinsic to policy demands.

How does one assess English language proficiency (for the “taxpayers”, of course)? Well, there are basically two ways to interpret this guidance. The first option is to take it at face value and do the specific things they mention, e.g. include in our processes of assessment specific checks penalising people when they show they are not sufficiently proficient in spelling and grammar. As I’ve said above, these things are already widely practiced in the sector, so one couldn’t help but begin to wonder, is there somethign else I’m missing here? There’s an important phrase in the text above where the guidance explains what “assessed effectively” means. Did you see it? It’s the part which says “includes but is not limited to”. So is one right to assume that these very (even weirdly) specific examples are the policy? Or is this just a sort of initiation into a pattern of behaviour that educators are being pressed into? So here we have seemingly highly specified policy with loads more detail than previous policy guidance had in it, associated with really scary and dire consequences for people who breach these new policies, and some intimations that we need to develop regimes for surveillance to punish wrongdoers but even at the end it’s clear that “there’s more…” without specifying what exactly that means. In organisations where there are administrative staff whose job is to mitigate risk, when situations of ambiguity arise, the resulting policy response will veer to the side of caution, especially if there are fears of serious consequences looming if one is cast into the limelight for falling afoul of a policy. So in some cases, organisations do just what they’re told. But in many cases organisations reply to policy in extreme ways – this has been the case in response to prevent policy on campuses, immigration and right-to-work checks, even response to GDPR policy is implemented in ways that on closer look are unnecessarily extreme. So at the very least, when you’re faced with a situation like this, there is an increase in the hazard that the resulting implmentations or responses to a policy demand may reach much further than they need to.

Problem 2: linguistic competency

Things get even murkier when you start to really try and work out the details. Marking spelling accuracy is dead easy, and students ought to know better as they can easily access dictionaries and spell checks. We do penalise sloppy work in terms of misspelling, I know this is the case from years of practice as an educator, and also an an external examiner for a variety of Universities. The same is true of punctuation. The rules are highly standardised and easy to access. But what about grammar? On one hand, verb tense is pretty important to get right. And there are guides which can check this. From years of reading, I can spot bad grammar almost automatically on the page in a variety of ways. But what exactly does it look like to be maximally proficient in English language usage? What exactly are the rules we follow that make communication effective? This is where things get murky.

There are a number of examples where some people might point to the formality of prose as an important standard, my students every year ask me if they should use personal pronouns, “I think that…” versus “One might think that…”. And my answer is always, “it depends”. Formal language can be helpful, but sometimes it can be impersonal and inaccessible, even a bit stiff. Another example is the use of complex terms. Does proficiency in English language rest on the use of five syllable words? Or is the most compelling prose driven by neat sparse vocabulary? And even more pointedly, what about the use of slang, vernacular, pidgin, or creole terms and idioms? Are those kinds of language casual? Formal? Technical? I can think of many cases where scholars and poets I admire forced me to think through a particular vernacular or regional lens with their writing in English and this elevated their arguments. And as someone who has lived in a number of different English-speaking nations and regional cultures (USA, Canada, Scotland, England and Wales) I can attest to the ways that idiomatic English can vary wildly, both within elite and non-elite cultures.

Once we get past the basics of spelling and grammar, it gets really tricky to actually say what the rules are, and explain how and when they should be broken. Ultimately, good prose just feels right. We can judge its effectiveness through affect and intuition. But also, our judgement relies upon sympathy with the writer: I work hard to understand a poet’s use of vernacular, metaphor, etc. because I trust that they have something to teach me. Another reader could just as easily pick up the poem and conclude that it is obscure and poorly written. I don’t mean to suggest that there are no standards, but that they are not easily accessible and that teaching them is complex and requires a lot of recursive effort. And successful judgement like this only becomes accessible at the mature end of a learning journey and not checkpoints in the early stages. And can we really say that there is such a thing as “Standard English” (SE)? Even within my own department, and in specific taught modules, I can point to a variety of contextually meaningful differences around how to use language to communicate towards certain ends. And if we were to go for the lowest common denominator, is there any way we could say this is more than simply writing mechanics like proper spelling?

There are also ways that working with this kind of intuitive sensibility leads us to rely on personal familiarity – things which “feel right” as an implicit proxy alongside the more rigorous and obvious formal rubrics which establish conventional grammar and spelling. For all these reasons, the measurement of language competency is a hotly conteted topic among specialists in linguistics. At the very least, it is generally taken to be the case that measuring this effectively is extremely difficult, even for highly trained specialists and best not attempted by amateurs (like me, as I have a PhD but am not a scholar in linguistics).

So it’s hard. And potentially it’s pretty likely that any intelligent person, even University faculty, will make mistakes in assessing language competency – at the very least mistaking written communication that “feels familiar” as proficient. But why is this such a big deal?

The hazard here sharpens even further when we apprecaite how, in spite of our best efforts, the aesthetics of student feedback processes can collude with forms of racialised, class-encoded, ablist, and enculturated privilege. Linguistic racism and misogny is widely recognised as an issue in workplaces, impacting customer relationships, undermining team communication, and widening inequality in hiring processes. This has been investigated in some specific case studies into the ways that vernacular language can be stigmatised and used as the basis for discrimination and negatively impact academic outcomes. In particular, studies at Universities have found lower levels of achievement for Black students when compared to their counterparts who are racialised as white. This in turn has been linked, albeit only in part, to the stigmatisation of African American Vernacular English (AAVE) over so-called Standard English (SE). Seen in this way, forms of ethnic identity embedded in language patterns defeat the anti-bias intentions of anonymous marking and create a context for inequality in marking. A wide range of solutions have been proposed in pedagogical literature, including (particularly among proponents of student-centred learning) forms of teaching which encourage the reading and use of vernacular speech as a prelude to critical engagement with vernacular culture more broadly. Given the amazing levels of diversity that we enjoy in our Universities, it would be well worth engaging in a corporate discussion around how we engage and celebrate vernacular (if indeed we do), and conversely, how we can mitigate discomfort by other students and staff who are unaccustomed to deviations from Standard English (SE). However, setting aside this more proactive aspiration, it seems to me to mark a step in the opposite direction to introduce punitive measures for markers who find lacking English language proficiency, particularly without any specific guidance as to how vernacular language might be handled, and how evidence of research and understanding might be affirmed in ways which are separate from language flow. All human cognition mobilises bias, so it’s not enough to aspire to an unattainable “unbiased” state of mind, but rather essential to acknowledge and understand in an ongoing way how our biases, especially implicit ones, mobilise forms of privilege in our teaching and research.

In my teaching, I introduce students to the importance of local and culturally inflected forms of knowledge in responding to public policy challenges like climate change. We also discuss how cultural fluency and dynamism can serve as a transferrable skill, enabling our students to support workplaces which want to reach new audiences and develop forms of end-user engagement which are culturally relevant. In particular, within Theology and Religious Studies we discuss the importance of vernacular culture as tools for community development, we introduce students the ways that feminist scholars and poets deploy informal language, like contractions, and use creole and vernacular vocabulary as a way of challenging unjust hierarchies and emphasising the social good of diversity in practice. I even hope that my students may come to deploy those tools themselves in their written communication, thinking about the ways that forms of communication transmit not just information but personal and social values. I fear that education policy in Britain may prefer to pay lip service to the goods of diversity, whilst failing to provide support and infrastructure to underpin these kinds of learning.

Part 3: What should we do?

The OfS has created a really unfortunate challenge here, ultimately risking the recolonisation of education (contradicting our glossy brochures which advertise our de-colonising work). I gather from reading the OfS response to the policy consultation that there was quite a lot of unhappiness about the policy. This included, as the report indicates,

  • Suggestions that it was not appropriate for the OfS, as a principles-based regulator, to prescribe how a provider should assess a student’s language proficiency beyond the course learning outcomes, and whether it was possible ‘to infer what employers’ and taxpayers’ specific expectations might be in any particular circumstance.
  • Views that English language proficiency receives disproportionate attention in the proposed guidance, with respondents questioning whether it is a priority for non-UK partners or appropriate for all TNE courses.
  • Disabled students or students for whom English is a second language may be disproportionately affected by an approach that expect proficiency in the English language as a relevant skill. For example, one respondent commented that the ‘focus on proficiency in written English has potential implications for institutional approaches to inclusive assessment, which are designed to ensure that students with specific disabilities are not disadvantaged during assessments, and to thereby comply with the Equality Act’.
  • The level of English proficiency required would be better considered on a subject or course basis, based on academic judgement about whether mistakes in written English are material or not.

I’ll let you read their resposes (starting on page 25)

It has become clear to me that protest is not an option. There is simply too little will within the sector leaders to respond to OfS, at least overtly, with refusal to comply and a demand for more sensitive education policy (I would be delighted to be corrected if I’m wrong about this). So there are a two different models of compliance that seem viable, within some specific conditions, to me:

(1) Affirm ways that assessment criteria and feedback already achieve these demands. This is true of all the the teaching contexts where I’ve ever worked or assessed as an external examiner in the UK, so ultimately a viable option, though I accept that there may be some contexts where this isn’t the case. I don’t see any reason why these can’t be handled on a case-by-case basis.

(2) Make very specific additions to marking criteria. It may be that there are some cases where it becomes clear that spelling and grammar haven’t been part of assessing student work. If that’s the case, then the least harmful and hazardous way to achieve compliance here is to be highly specified: “spelling and grammar”. Reaching beyond this in any particular way will enhance the risk and dynamics of bias in assessment.

It’s worth noting that the latter option might be chosen not because there are clear lapses in educational design, but out of a desire to be seen as “doing something”. There may be a fear lurking here (which I’ve alluded to above) driven by the dynamics of over-complicance, that we need to do something visibe in response to the “new” demand to avoid surveillance and punishment. I’ll note here my reservations as an ethicist with any engagements with policy that are arbitrary. There will always be problematic and unfoeseen implications lurking downstream as our use of arbitary policy begins to accumulate and iterate. And these consequences often tend to impact persons (both educators and students) who are already marginalised in other ways.

My anxieties here around bias in marking align with some other work I’m running in parallel with some stellar colleagues around stereotype threat and implicit bias. I’ll be writing more in months ahead about these projects and what I think we can do to proactively address the harmful impacts these phenomena have on students and teachers more broadly in our teaching practice. I’m also working with colleagues to find ways to more proactively celebrate linguistic diversity in our teaching in higher education and will share more about this work as it unfolds. But it’s worth stressing at this point that it is irresponsible to implement harmful and risky policy (even in the current atmosphere) with the expectation that we (or others) will mitigate those problems after the fact with bias training. This is highlighted well in a recent piece by Jeffrey To in Aeon.

I’d be very glad to hear what colleagues have attempted and how educators are responding to these challenges across the sector and hope you’ll share from your experience in the comments.

So I think this might be a good summary of an ND preferences wrt/ digital systems. The author doesn’t claim to be autistic, but I certainly shout “amen” like every other line. Would love to know if others relate: https://catgirl.ai/log/comfy-software/

I also think that customisability is important because it’s often the only way that many of us can getting software accessibility. Eg by making it that way ourselves. So I’m a hacker because I like to play with digital tools, but am also starting to realize that I HAD to become a bit geeky or I would have been left behind in a zillion ways. Get to the front of the pack so you don’t get left behind…

The HackerNews comment thread for that article is also a hot and interesting mess – highlighting the ways that different ND flavours and generational cultures frame how we are allowed to speak. I found myself wondering if it was a suitable proxy for what we might find if at UOB we could pull back the curtain… c/w: discussions of depression, suicide, ablism, and generally insensitivity: https://news.ycombinator.com/item?id=33053144

What tools and techniques do you use to preserve the flow (e.g. work around interruptions), lean in on your monotropic self, and mitigate challenges brought about by co-occurring bits (mood, anxiety, etc.)?

Here’s a first one to get us started: I find it takes a lot of emotional and mental energy to set up meetings. And it’s one of those things which comes at you interrupts flow and requires setup each time. And the consequence of this was that it was taking ages for me to jump in and get stuff organised even for simple one-to-ones. So I started using an online scheduling tool (first calendly.com then cal.com when calendly jacked up prices and went in a non-open direction). I created profiles for my time (e.g. when I don’t want to be interrupted, and also consolidating certain types of conversation so I can get in the flow with a series of meetings).

Using the tool had a learning curve – there were a few months where I had twice as many meetings as I could really handle in a particular interval and I had to tweak. I also found that I needed to create “public” style meeting options and “private” ones, so that friends could book in really any time they wanted with a much less restricted set of options, and certain less exciting work tasks could be put in a very specific bin. It has made a huge difference for my workflow and energy levels each week. I’ve also been pleasantly surprised to see how many (some neurodivergent) friends, colleagues and students were also really grateful to have a low-friction way of setting something up.

Cal.com opens up a website for people to click on date and time to set things up, and then automatically puts it on my digital calendar along with any special information I’ve asked for (which is very useful for one-to-one meetings with students so I don’t forget what they want to talk about). It can also do automatic reminders to people so they don’t forget the meeting and then it’s a robot nagging them and not me which is also nice.

At our University IT services migrated to a new off-premise exchange server setup last year and in the space of a week turned off all access to our calendars for external tools without warning or consultation (at least not any that I’d seen – and I would have provided a response if I’d been asked!). My attempts to explain what a severe impact this had on me fell on deaf ears. So I had three months where my diary and life were in total disarray last Spring. So I don’t use exchange for my calendar anymore. I’ve got another one offsite I use (on my own server if you’re curious). I put a message in my diary for every day of the year which warns people that I don’t use outlook for my diary. The downside is that it’s a pain for colleagues doing meeting requests in exchange. Upside is that I don’t have to worry about IT changing policies on me without consultation in the future, which I’m sure they will do.

I sat through a teaching session with some children this week, in what was an eye-wateringly bad, cringeworthy, accessibility-worst-case scenario. And a few basic principles popped into my head while I watched the speaker shame and marginalise children who were desperately trying to find ways to engage with the material and show their enthusiasm for the topic.

  1. Cut your one hour talk into 5 minute pieces
  2. Is there seating?
  3. Will ambient noise in the room be amplified or dampened by acoustics?
  4. How can you share information through discussion and mutual exploration?
  5. Assume you will be interrupted, how can you plan so that interruptions will not feel to you as facilitator and powerful-person-in-the-room like a challenge but a form of eager interested participation
  6. Members of your audience may assume that you don’t want to hear from them. How can you show that this will be different from other lectures they’ve attended?
  7. Your participants may need to fidget. If you expect them to be still it will be oppressive for them. How can you create opportunities for motion?
  8. How are you accommodating a person who can’t process auditory information?
  9. Some people may participate differently – shaming people for not holding up their hands or participating in conventional ways is cruel.

I’ve given two talks this summer to colleagues in my school. This arises from work I’ve been doing learning from and supporting neurodivergent students in Philosophy, Theology & Religon departments in an ongoing support/tutorial group since 2020. They’re brave and amazing students, and I’ve learned so much from them! I realised it was high time that I shared some of that information with colleagues, and it was a lot of work synthesising what I’d been thinking about and trying to open it up to others, particularly thinking towards others (neurodivergent or not) who hadn’t been on the self-learning and unmasking journey I’ve been on.

Do please feel free to have a look at the slides and let me know if you have thoughts. I’ll be continuing to revise and present this work.

https://jeremykidwell.info/slides/presentation-20230614-teaching_neurodiversity/presentation-20230614-teaching_neurodiversity.html#1

I’m an amateur anthropologist at best, having taken the plunge as a post-doc in 2015. Having not taken Anthropology 101 (or 701 for that matter) I was left to consult with colleagues in various departments on the best mode of induction. Aside from “do the work” (e.g. fieldwork), a common piece of advice that I received was to read ethnographies. A lot of them. One common adage, particularly in US anthropology departments, is that a PhD student should try to read 100 ethnographies. I’ve not quite gotten there myself, yet, but found the principle to be a good one. Don’t start with the technical manuals and methods handbooks, though do consult these as well. Start with practice and studying the practice of others. In passing this advice along to students and other researchers, I’ve often been asked how to find ethnographies and get started on this journey. I’ve gradually accumulated a list of works based on my own interests which includes folks like James Frazer, Emile Durkheim, Bronislaw Malinowski, Marcel Mauss, Margaret Mead, E.E. Evans-Pritchard, Claude Lévi-Strauss, Mary Douglas, Victor Turner, Gregory Bateson, Roy Rappaport, Clifford Geerz, Talal Asad, Roy Wagner, Maurice Bloch, Paul Rabinow, Bruno Latour, Arjun Appaudurai, James Clifford, Lila Abu-Lughod, Keith Basso, George Marcus and Donna Haraway. I’ve also been delighted to discover the work of more recent “greats” like Saba Mahmood, Anna Tsing, Veena Das, Stephan Helmreich, Paolo Gerbaudo, Gabriella Coleman, and Michael Jackson. You can see my interests here in more-than-human anthropology, science & technology studies, netnography etc.

There are specialist areas not represented in the list above where you can do a deep dive – into visual ethnography or auto-ethnography (two other interests of mine), and it’s not hard to find a few recent journal articles on a given methodological niche and the trace citations backwards to the key reference points in monograph form.

If you’re looking to get into the field, it’s also worth keeping an eye, or reading backlists arising from the various anthropology prizes. This includes prizes awarded by the Society for Cultural Anthropology (including the Gregory Bateson prize).