Note: If you haven’t already read my previous post, I’d recommend you give it a quick scan as I cover some material which I make reference to below.

So you’ve got some access to AI tools and sort of know how they work. But what are they for? I know sometimes big tech can meet education with a solution looking for a problem and I’m keen to be clear-eyed about how we review “innovation”. I think there are some genuine use cases which I’ll outline a bit below. It’s worth noting that engagement with AI tech is deceptively simple. You can just write a question and get an (uncannily good sounding) answer. However, if you put in some time to craft your interaction, you’ll find that the quality rises sharply. Most people don’t bother, but I think that in academia we have enough bespoke situations that this might be warranted. In this article I’ll also detail a bit of the learning and investment of time that might be rewarded for each scenario. Here are, as I see them, some of those use caess:

1. Transcribe audio/video

AI tools like whisper-AI, which can be easily self-hosted with a fairly standard laptop, enable you to take a video or audio file and convert it very quickly to very accurate text. It’s accurate enough that I think the days of qualitative researchers paying for transcription are probably over. There are additional tools being crafted which can separate text into appropriate paragraphs and indicate specific speakers on the transcript (person 1, person 2, etc.). I think that it’s faster for most of us to read / skim a transcript, but also, for an academic with some kind of hearing or visual impairment, this is an amazingly useful tool. See: MacWhisper for a local install you can run on your Mac, or a full-stack app you can run as a WebUI via docker in Whishper (formerly FrogBase / whisper-ui).

Quick note: the way that whisper has been developed makes it very bad at distinguishing separate speakers, so development work is quite actively underway to add on additional layers of analysis which can do this for us. You can get a sense of the state of play here: https://github.com/openai/whisper/discussions/264. There are a number of implementations which supplement whisper-ai with pyannote-audio, including WhisperX and whisperer. I haven’t seen a WebUI version yet, but will add a note here when I see one emerge (I think this is underway with V4 of whishper (https://github.com/pluja/whishper/tree/v4). Good install guide here: https://dmnfarrell.github.io/general/whisper-diarization.

2. Summarise text

Large language models are very good at taking a long chunk of text and reducing it to something more manageable. And  it is reasonably straight-forward to self-host this kind of service using one of the 7B models I mentioned in the previous host. You can simply paste in the text of a transcript produced by whisper and ask a Mistral-7B model to summarise it for you using LMStudio without too much hassle. You can ask things like, “Please provide a summary of the following text: <paste>”. But you might also benefit from different kinds of presentation, and can add on additional instructions like: “Please provide your output in a manner that a 13 year old would understand” or “return your response in bullet points that summarise the key points of the text”. You can also encourage more analytical assessment of a given chunk of text, as, if properly coaxed, LLMs can also do things like sentiment analysis. You might ask: “output the 10 most important points of the provided text as a list with no more than 20 words per point.” You can also encourage the model to strive for literal or accurate results: “Using exact quote text from the input, please provide five key points from the selected text”. Because the underlying data that LLMs are trained on is full of colloquialisms, you should experiment with different terms: “provide me with three key hot takes from this essay” and even emojis. In terms of digital accessibility, you should consider whether you find it easier to get information in prose or in bulleted lists. You can ask for certain kinds of terms to be highlighted or boldface.

All of this work writing out questions in careful ways to draw out more accurate or readable information is referred to by experts as prompt engineering, and there is a lot of really interesting work being done which demonstrates how a carefully worded prompt can really mobilise an AI chatbot in some impressive ways. To learn more about prompt engineering, I highly recommend this guide: https://www.promptingguide.ai.

It’s also worth noting that the questions we bring to AI chatbots can also be quite lengthy. Bear in mind that there are limits on the number of tokens an AI can take in at once (e.g. the context length), often limited to around 2k or 4k words, but then you can encourage your AI chatbot to take on personality or role and set some specific guidelines for the kind of information you’d like to receive. You can see a master at work on this if you want to check out the fabric project. One example is their “extract wisdom” prompt: https://github.com/danielmiessler/fabric/blob/main/patterns/extract_wisdom/system.md.

You can also encourage a chatbot to take on a character, e.g. be the book, something like this:

System Prompt:


You are a book about botany, here are your contents:
<context>

User Query: "What are you about?"

There are an infinite number of combinations of long-form prose, rule writing, role-playing, custom pre-prompt and pre-fix/suffix writing which you can combine and I’d encourage people to play with all of these things to get a sense of how they work and develop your own style. It’s likely that the kid of flow and interaction you benefit from is quite bespoke, and the concept of neurodiversity encourages us to anticipate that this will be the case.

There are some emerging tools which do transcription, diarisation of speakers and summarisation in real time, like Otter.AI. I’m discouraged by how proprietary and expensive (e.g. extractive) these tools are so far, and I think there’s a quite clear use case for Universities to invest time and energy, perhaps in a cross-sector way, to develop some open source tools we can use with videoconferencing, and even live meetings, to make them more accessible to participation from staff with sensory sensitivies and central auditory processing challenges.

3. Getting creative

One of the hard things for me is often the “getting started” part of a project. Once I’m going with an idea (provided I’m not interrupted, gasp!) I can really move things along. But where do I start? Scoping can stretch out endlessly, and some days there just isn’t extra energy for big ideas and catalysts for thinking. It’s also the case that in academia we increasingly have less opportunities for interacting with other scholars. On one hand this is because there might not be others with our specialisation at a given university and we’re limited to conferences to have those big catalytic converastions. But on the other hand, it’s possible that the neoliberalisation of the University and marketisation of education has stripped out the time you used to have for casual non-directed converastions. On my campus, even the common areas where we might once have sat around and thought about those things are also gone. So it’s hard to find spaces, time and companions for creativity. Sometimes all you’ve got is the late hours of the night and you realise there’s a bit of spare capacity to try something out.

The previous two tasks are pretty mechanical, so I think you’ll need to stick with me for a moment, but I want to suggest that you can benefit from an AI chatbot to clear the logjam and help get things flowing. LLMs are designed to be responsive to user input, they absorb everything you throw at them and take on a persona that will be increasingly companionable. There are fascinating ethical implications for how we afford agency to these digital personas and the valence of our relationships with them. But I think for those who are patient and creative, you can have a quite free-flowing and sympathetic conversation with a chatbot. Fire up a 7B model, maybe Mistral, and start sharing ideas and open up an unstructured converastion and see where it takes you. Or perhaps see if you can just get a quick list to start: “give me ten ideas for X”.

Do beware the underlying censorship in some models, especially if your research area might be sensitive, and consider drawing on models which have been fine-tuned to be uncensored. Consider doing some of the previous section work on your own writing: “can you summarise the key points in this essay?” “what are the unsubstantiated claims that might need further development?”

There’s a lot more to cover, but this should be enough to highlight some of the places to get started, and the modes of working with the tools which will really open up the possibilities. In my next post, I’ll talk a bit about LLM long-term memory and vector databases. If you’re interested in working with a large corpus of text, or having a long-winded conversation preserved across time, you might be interested in reading more!

Leave a Reply

Your email address will not be published. Required fields are marked *