If, like me, you’ve been following the development of generative AI and large language models as tools for scholarship and productive work, you’ll likely be interested in NotebookLM. Google’s version of this kind of tool ingests a PDF file and outputs a short podcast-style audio discussion which provides an enjoyable summary. It’s a pretty efficient way to blast through a pile of journal article PDFs and sift for those that might bear closer reading. However, many readers will be sceptical of Google’s ability to provide a public good, so will be all the more glad to hear that there’s an open source alternative, Open NotebookLM. Itsfoss.com has a pretty good writeup on the tool, including instructions on how you can run it entirely locally on your own PC. Worth noting that Open NotebookLM has a 100,000-character limit, and the audio quality isn’t quite up to Google’s NotebookLM yet. But it’s a great move in the right direction.

Also, after you’ve done a few listens to podcasts from NotebookLM, you might benefit from some comic relief:

If you really want to have fun, you can even add faces for your podcast “hosts”

Note: If you haven’t already read my previous post, I’d recommend you give it a quick scan as I cover some material which I make reference to below.

So you’ve got some access to AI tools and sort of know how they work. But what are they for? I know big tech can sometimes meet education with a solution looking for a problem, and I’m keen to be clear-eyed about how we review “innovation”. But I think there are some genuine use cases, which I’ll outline below. It’s worth noting that engagement with AI tech is deceptively simple. You can just write a question and get an (uncannily good-sounding) answer. However, if you put in some time to craft your interaction, you’ll find that the quality rises sharply. Most people don’t bother, but I think that in academia we have enough bespoke situations that the effort is warranted. In this article I’ll also detail a bit of the learning and investment of time that might be rewarded in each scenario. Here are, as I see them, some of those use cases:

1. Transcribe audio/video

AI tools like OpenAI’s Whisper, which can be easily self-hosted on a fairly standard laptop, enable you to take a video or audio file and convert it very quickly into very accurate text. It’s accurate enough that I think the days of qualitative researchers paying for transcription are probably over. Additional tools are being crafted which can separate text into appropriate paragraphs and indicate specific speakers on the transcript (person 1, person 2, etc.). I think it’s faster for most of us to read or skim a transcript, but for an academic with a hearing or visual impairment, this is also an amazingly useful tool. See: MacWhisper for a local install you can run on your Mac, or Whishper (formerly FrogBase / whisper-ui), a full-stack app you can run as a WebUI via docker.

Quick note: Whisper on its own is very bad at distinguishing separate speakers, so development work is quite actively underway to add additional layers of analysis (speaker diarisation) which can do this for us. You can get a sense of the state of play here: https://github.com/openai/whisper/discussions/264. There are a number of implementations which supplement Whisper with pyannote-audio, including WhisperX and whisperer. I haven’t seen a WebUI version yet, but will add a note here when one emerges (I think this is underway with v4 of Whishper: https://github.com/pluja/whishper/tree/v4). There’s a good install guide here: https://dmnfarrell.github.io/general/whisper-diarization.
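If you do want to script transcription yourself, here is a minimal Python sketch. The commented-out lines show the calls you’d make with the open-source `openai-whisper` package (the file name is a placeholder); the helper below works on the timestamped segments whisper returns, and the sample data is invented for illustration:

```python
# import whisper
# model = whisper.load_model("base")           # "small"/"medium" trade speed for accuracy
# result = model.transcribe("interview.mp3")   # returns full text plus timestamped segments
# segments = result["segments"]

def format_segments(segments):
    """Turn whisper's timestamped segments into a readable transcript."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)

# Invented sample data, matching the segment structure whisper returns:
sample = [
    {"start": 0.0, "end": 4.2, "text": " Thanks for agreeing to talk today."},
    {"start": 4.2, "end": 9.8, "text": " Happy to be here."},
]
print(format_segments(sample))
```

This gives you a skimmable, timestamped transcript; the diarisation tools mentioned above would add speaker labels on top.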

2. Summarise text

Large language models are very good at taking a long chunk of text and reducing it to something more manageable. And it is reasonably straightforward to self-host this kind of service using one of the 7B models I mentioned in the previous post. You can simply paste the text of a transcript produced by Whisper into LMStudio and ask a Mistral-7B model to summarise it without too much hassle. You can ask things like: “Please provide a summary of the following text: <paste>”. But you might also benefit from different kinds of presentation, and can add additional instructions like “Please provide your output in a manner that a 13 year old would understand” or “Return your response in bullet points that summarise the key points of the text”. You can also encourage more analytical assessment of a given chunk of text, as, if properly coaxed, LLMs can do things like sentiment analysis. You might ask: “Output the 10 most important points of the provided text as a list with no more than 20 words per point.” You can also encourage the model to strive for literal or accurate results: “Using exact quote text from the input, please provide five key points from the selected text”. Because the underlying data that LLMs are trained on is full of colloquialisms, you should experiment with different terms (“Provide me with three key hot takes from this essay”) and even emojis. In terms of digital accessibility, you should consider whether you find it easier to take in information as prose or as bulleted lists. You can also ask for certain kinds of terms to be highlighted or set in boldface.
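If you find yourself reusing these prompt patterns, it can be worth templating them rather than retyping. A small Python sketch (the template wording is just my own paraphrase of the examples above; adjust to taste):

```python
# Hypothetical helper for reusing the summarisation prompts discussed above.
TEMPLATES = {
    "plain": "Please provide a summary of the following text:\n\n{text}",
    "bullets": ("Return your response in bullet points that summarise "
                "the key points of the following text:\n\n{text}"),
    "quotes": ("Using exact quote text from the input, please provide "
               "five key points from the following text:\n\n{text}"),
}

def build_prompt(text, style="plain"):
    """Wrap a transcript or essay in one of the prompt templates."""
    return TEMPLATES[style].format(text=text)

prompt = build_prompt("Transcript goes here...", style="bullets")
print(prompt)
```

The resulting string is what you’d paste (or send programmatically) to your local model.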

All of this work writing out questions in careful ways to draw out more accurate or readable information is referred to by experts as prompt engineering, and there is a lot of really interesting work being done which demonstrates how a carefully worded prompt can really mobilise an AI chatbot in some impressive ways. To learn more about prompt engineering, I highly recommend this guide: https://www.promptingguide.ai.

It’s also worth noting that the questions we bring to AI chatbots can be quite lengthy. Bear in mind that there are limits on the number of tokens an AI can take in at once (i.e. the context length), often around 2k or 4k tokens, but within that budget you can encourage your AI chatbot to take on a personality or role and set some specific guidelines for the kind of information you’d like to receive. You can see a master at work on this if you check out the fabric project. One example is their “extract wisdom” prompt: https://github.com/danielmiessler/fabric/blob/main/patterns/extract_wisdom/system.md.

You can also encourage a chatbot to take on a character, e.g. be the book, something like this:

System Prompt:

You are a book about botany, here are your contents:
<context>

User Query: "What are you about?"
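In the OpenAI-style chat format that most local tools imitate, this “be the book” pattern maps onto the system and user message roles. A sketch, with invented placeholder book contents standing in for <context>:

```python
# Placeholder text standing in for the <context> block above.
book_contents = "Chapter 1: The structure of leaves. Chapter 2: Photosynthesis..."

messages = [
    {"role": "system",  # the persona and its "contents" go in the system prompt
     "content": f"You are a book about botany, here are your contents:\n{book_contents}"},
    {"role": "user",    # the reader's question goes in the user message
     "content": "What are you about?"},
]

# This list is what you would pass as the `messages` field of a
# chat-completions request to a local or hosted model.
print(messages[0]["content"][:40])
```

The same structure works whether you’re talking to a hosted API or a model running on your own machine.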

There is an infinite number of combinations of long-form prose, rule writing, role-playing, and custom pre-prompt, prefix and suffix writing which you can combine, and I’d encourage people to play with all of these things to get a sense of how they work and to develop your own style. It’s likely that the kind of flow and interaction you benefit from is quite bespoke, and the concept of neurodiversity encourages us to anticipate that this will be the case.

There are some emerging tools which do transcription, diarisation of speakers and summarisation in real time, like Otter.ai. I’m discouraged by how proprietary and expensive (read: extractive) these tools are so far, and I think there’s a quite clear use case for Universities to invest time and energy, perhaps in a cross-sector way, in developing some open source tools we can use with videoconferencing, and even live meetings, to make them more accessible to participation by staff with sensory sensitivities and central auditory processing challenges.

3. Getting creative

One of the hard things for me is often the “getting started” part of a project. Once I’m going with an idea (provided I’m not interrupted, gasp!) I can really move things along. But where do I start? Scoping can stretch out endlessly, and some days there just isn’t extra energy for big ideas and catalysts for thinking. It’s also the case that in academia we have increasingly few opportunities for interacting with other scholars. On one hand this is because there might not be others with our specialisation at a given university, so we’re limited to conferences for those big catalytic conversations. But on the other hand, it’s possible that the neoliberalisation of the University and the marketisation of education have stripped out the time you used to have for casual, non-directed conversations. On my campus, even the common areas where we might once have sat around and thought about these things are gone. So it’s hard to find spaces, time and companions for creativity. Sometimes all you’ve got is the late hours of the night and you realise there’s a bit of spare capacity to try something out.

The previous two tasks are pretty mechanical, so you’ll need to stick with me for a moment, but I want to suggest that you can benefit from an AI chatbot to clear the logjam and help get things flowing. LLMs are designed to be responsive to user input: they absorb everything you throw at them and take on a persona that becomes increasingly companionable. There are fascinating ethical implications in how we afford agency to these digital personas and in the valence of our relationships with them. But I think for those who are patient and creative, you can have a quite free-flowing and sympathetic conversation with a chatbot. Fire up a 7B model, maybe Mistral, start sharing ideas, open up an unstructured conversation and see where it takes you. Or see if you can just get a quick list to start: “give me ten ideas for X”.

Do beware of the underlying censorship in some models, especially if your research area might be sensitive, and consider drawing on models which have been fine-tuned to be uncensored. Also consider applying some of the previous section’s techniques to your own writing: “Can you summarise the key points in this essay?” or “What are the unsubstantiated claims that might need further development?”

There’s a lot more to cover, but this should be enough to highlight some of the places to get started, and the modes of working with the tools which will really open up the possibilities. In my next post, I’ll talk a bit about LLM long-term memory and vector databases. If you’re interested in working with a large corpus of text, or having a long-winded conversation preserved across time, you might be interested in reading more!

I’ve spent the last several months playing with AI tools, more specifically large language (and other adjacent data) models and the underlying corpus of data that formed them, trying to see if there are ways that AI can help an academic like me. More particularly, I’m curious to know whether AI can help neurodivergent scholars in large bureaucratic Universities make their path a bit easier. The answer is a qualified “yes”. In this article, I’ll cover some of the possible use cases, comment on the maturity, accessibility and availability of the tech involved, and explain some of the technological landscape you’ll need to know if you want to make the most of this tech and not embarrass yourself. I’ll begin with the caveats…

First, I really need to emphasise that AI will not fix the problems that our organisations have with accessibility – digital or otherwise – for disabled staff. We must confront the ways that our cultures and processes are founded on ableist and homogenous patterns of working, dismantle unnecessary hierarchies, and reduce gratuitous bureaucracy. Implementing AI tools on top of these scenarios unchanged will very likely intensify the vulnerability and oppression of particular staff and students, and we have a LOT of work to do in the modern neoliberal University before we’re there. My worst case scenario would be for HR departments to get a site licence to Otter.ai and fire their disability support teams. This is actually a pretty likely outcome in practice given past patterns (such as the many University executives who used the pandemic as cover to implement redundancies and strip back resource devoted to staff mental health support). So let’s do the work, please? In the meantime, individual staff will need to make their way as best they can, and I’m hoping that this article will be of some use to those folx.

The second point I need to emphasise at the outset is that AI need not be provided through SaaS or other subscription-led outsourcing. Part of my experimentation has been about tinkering with open source and locally hosted models, to see whether these are a viable alternative to overpriced subscription models. I’m happy to say that the answer is “yes”! These tools are relatively easy to host on your own PC, provided it has a bit of horsepower. Even more, there’s no reason that Universities can’t host LLM services locally at a very low cost per end user, vastly below what many services charge (like Otter.ai’s $6/mo fee per user). All you need is basically a bank of GPUs, a server, and the electricity required to run them.

What Are the Major Open Source Models?

There are a number of foundational AI models. These are the “Big Ones”, created at significant cost by running over billions of data points, by large tech firms like OpenAI, Microsoft, Google, Meta etc. It’s worth emphasising that cost and effort are not exclusively borne by these tech firms. All of these models are built on the back of the freely available intellectual deposit of decades of scholarly research into AI and NLP. I know of none which do not make copious use of open source software “under the hood”. They’re all trained on data which the general public has deposited and curated through free labour on platforms like Wikipedia, StackExchange, YouTube, etc., and the models are developed in public-private partnerships with a range of University academics whose salaries are often publicly funded. So I think there is a strong basis for ethically oriented AI firms to “share alike” and make their models freely available, and end users should demand this. Happily, there have been some firms which recognise this. OpenAI has made its GPT-1 and GPT-2 models available for download, though GPT-3 and GPT-4 remain locked behind a subscription fee. Many Universities are purchasing GPT subscriptions implicitly, as these models provide the backbone for a vast number of services, including Microsoft’s Copilot chatbot, which has been deployed to University staff this last year as a part of Microsoft’s ongoing project to extract wealth from the education sector through software subscription fees (Microsoft Teams, anyone?). But it doesn’t have to be this way – there are comparably performant foundational models which have been made freely available to users who are willing to hack a bit to get them working. These include:

  • LLaMA (from “Large Language Model Meta AI”), a foundation model developed by Meta
  • Mistral, a general-purpose foundation model from the French firm Mistral AI, which has been the basis for many other models such as NeuralChat by Intel.
  • Google’s Gemma and BERT models
  • BLOOM, developed by a consortium called BigScience (led by huggingface primarily)
  • Falcon, which has been funded by the Abu Dhabi sovereign wealth fund under the auspices of the Technology Innovation Institute (TII)
  • Pythia by EleutherAI
  • Grok-1, developed by xAI

These are the “biggies”, but there are many more smaller models. You can train your own small models on a £2k consumer PC, so long as it has a bit of horsepower and a strong GPU. But the models above have billions or even trillions (reportedly, in the case of GPT-4) of parameters, and would take, in some cases, years of compute time to train on a consumer PC.

What Do I Need to Know About Models? What can I run on my own PC?

To get a much broader sense of how these models are made and what they are, I’d recommend a very helpful and accessible write-up by Andreas Stöffelbauer. For now it’s worth focussing on the concept of “parameters”, which reflects the complexity of the AI model. You’ll usually see the parameter count listed next to the model’s name, like Llama-7B, and some models have been released at different parameter levels: 7B, 13B, 30B and so on. Given our interest in self-hosting, it’s worth noting that parameter counts are also often taken as a proxy for the kind of hardware required to run the model. While it’s unlikely that any individual is going to train a 30B model from scratch on their PC, it’s far more likely that you may be able to run the model after it has been produced by one of the large consortia that open source their models.

Consumer laptops with a strong GPU and 16GB of RAM can generally run most 7B parameter models and some 13B models. You’ll need 32GB of memory and a GPU with 16GB of VRAM to get comfortable access to 13B models, and running 30B or 70B models will require a LOT of horsepower, probably 24-40+GB of VRAM, which in some cases can only be achieved using a dual-GPU setup. If you want to run a 70B model on consumer hardware, you’ll need to dive into the hardware discussion a bit, as there are some issues that make things more complex in practice (like configuring a dual-GPU setup), but to provide a ballpark: you can get a second-hand NVIDIA RTX 3090 GPU for £600-1000, and two of these will enable you to run 70B models relatively efficiently. Four will support 100B+ models, which is veering close to GPT-4 level work. Research is actively underway to find new ways to optimise models at 1B or 2B parameters so that they can run with less memory and processing power, even on mobile phones. Higher parameter counts can, however, help with complex or long-winded tasks like analysing and summarising books, and with preventing LLM “hallucination”, an effect where the model invents fictional information as part of its response. That said, I’ve found that 7B models used well can do an amazing range of tasks accurately and efficiently.

While we’re on the subject of self-hosting, it’s worth noting that models are also often compressed to make them more feasible to run on consumer hardware, using a form of compression called “quantization”. Quantization levels are represented with “Q” values: a Llama 2 7B model might come in Q4, Q5 and Q8 flavours. Lower Q levels (fewer bits per weight) require less memory to run, but they’re also more likely to fail and hallucinate. As a general rule of thumb, if you’re going to work with quantized models locally, I’d advise you to stick with Q5 or Q6 as a minimum.
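You can ballpark the footprint of a quantized model from its parameter count and bits per weight (weights only; runtime overhead and the context window add more on top). A rough sketch:

```python
def approx_model_gb(params_billion, bits):
    """Rough size of the model weights: parameters x bits-per-weight.
    Ignores runtime overhead, so treat it as a lower bound."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9  # decimal GB

# A 7B model at full precision and at common quantization levels:
for label, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    print(f"{label}: ~{approx_model_gb(7, bits):.1f} GB")
```

So a 7B model drops from roughly 14GB at full 16-bit precision to around 3.5GB at Q4, which is why quantized models fit on ordinary laptops.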

The units that large language models work with are called tokens. In the world of natural language processing, a token is the smallest unit that can be analyzed, often separated by punctuation or white space. In most cases tokens correspond to individual words or parts of words. This breaks complex text down into manageable units and enables things like part-of-speech tagging and named entity recognition. A general rule of thumb is that 130 tokens correspond to roughly 100 words. Models are trained to handle a maximum number of tokens at once, called the “context length”. Humans do this too – we work with sentences, paragraphs, pages of text, etc.: smaller units that we build up from. Context length has implications for memory use on the computer you run an LLM on, so it’s good not to push it too high or the model will stop working. Llama 1 had a maximum context length of 2,048 tokens and Llama 2 stops at 4,096 tokens; Mistral 7B stops at 8k tokens. If we assume a page has 250 words, this means that Llama 2 can only work with a chunk of text around 12-13 pages long. Some model makers have been pushing the boundaries of context length, as with GPT-4-32K, which supports a context length of 32k tokens, or about 100 pages of text. So if you want to have an LLM summarise a whole book, this is pretty relevant.
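The token arithmetic above can be sketched in a few lines (using the 130-tokens-per-100-words rule of thumb, so the page counts are approximate):

```python
def tokens_to_words(tokens):
    """Rule of thumb from above: ~130 tokens per 100 words."""
    return tokens * 100 / 130

def tokens_to_pages(tokens, words_per_page=250):
    """How many 250-word pages fit in a given context length."""
    return tokens_to_words(tokens) / words_per_page

# Context lengths mentioned above:
for name, ctx in [("Llama 1", 2048), ("Llama 2", 4096), ("Mistral 7B", 8192)]:
    print(f"{name}: {ctx} tokens is roughly {tokens_to_pages(ctx):.0f} pages")
```

Handy for a quick check on whether a transcript or chapter will fit in one pass, or needs to be summarised in chunks.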

There are only a few dozen foundational models available and probably only a few I’d bother with right now. Add in quantization and there’s a bit more to sift through. But the current end-user actually has thousands of models to sift through (and do follow that link to the huggingface database which is pretty stellar) for one important reason: fine-tuning.

As any academic will already anticipate, model training is not a neutral exercise. Models have the biases and anxieties of their creators baked into them. In some cases this is harmless, but in others it’s pretty problematic. It’s well known that many models are racist, given a lack of diversity in training data and carelessness on the part of developers. They are often biased against vernacular versions of languages (like humans are! see my other post on the ways that the British government has sharpened the hazards of bias against vernacular English in marking). And in other instances, models can produce outputs which veer towards some of the toxicity embedded in their (cough, cough, Reddit, cough) training data. Attempts by developers to address this have produced some pretty bizarre results, like the instance of Google’s Gemini model producing a bit too much diversity in an overcorrection that resulted in racially diverse image depictions of Nazis. For someone like me who is a scholar of religion, it’s also worth noting that some models have been trained on data with problematic biases around religion, or conversely show an aversion to discussing it at all! These are wonderful tools, but they come with a big warning label.

One can’t just have a “redo” of the millions of GPU hours used to train these massive models, so one of the ways developers attempt to surmount these issues is with fine-tuning. Essentially, you take the pre-trained model and train it a bit more using a smaller dataset related to a specific task. This process helps the model get better at solving particular problems and inflects the responses you get. Fine-tuning takes a LOT less power than training a model from scratch, and there are a lot of edge cases where users have taken models after they’ve been developed and attempted to steer them in a new or more focussed direction. So when you browse the huggingface database, this is why there aren’t just a couple dozen models to download but thousands, as models like Mistral have been fine-tuned to do a zillion different tasks, including some that LLM creators have deliberately bracketed to avoid liability, like offering medical advice, cooking LSD, or discussing religion. Uncensoring models is a massive discussion which I won’t dive into here, but IMHO it’s better for academics (we’re all adults here, right?) to work with an uncensored version of a model which won’t avoid discussing your research topic and might even home in on some special interests you have. There are some great examples of how censoring can be strange and problematic here and here.

Deciding which models to run is quite an adventure. I find it’s best to start with the basics, like llama2, mistral and codellama, and then extend outwards as you find omissions and niche cases. The tools I’ll highlight below are great at this.

There’s one more option I want to flag, as I know many people are going to want to work with their PDF library using a model. You may be thinking that you’d like to do your own fine-tuning, and this is certainly possible: you can use tools like LLaMA-Factory or axolotl to fine-tune an LLM yourself.

How Can I Run LLMs on My PC?

There is a mess of software out there you can use to run LLMs locally.

In general you’ll find that you can do nearly anything in Python. LLM work is not as complex as you might expect if you know how to code a bit. There are amazing libraries and tutorials (like this set I’d highly recommend on langchain) you can access to learn and get up to speed fairly quickly working with LLMs in a variety of use-cases.

But let’s assume you don’t want to write code for every single instance where you use an LLM. Fair enough. I’ve worked with quite a wide range of open source software, starting with GPT4All and open-webui, but there are better options available. I’ve also tried out a few open source software stacks which basically create a locally hosted website you can use to interface with LLM models and which can be easily run through docker. Some examples include Fooocus, InvokeAI and Whishper. The top tools “out there” right now seem to be the ones I describe below.

Note: the most comprehensive list of open source tools you can use to run an AI chatbot I’ve seen to date can be found here.

I have a few tools on my MacBook now, and these are the ones I’d recommend after a bit of trial and error. They are reasonably straightforward GUI-driven applications with some extensibility. As a starting point, I’d recommend LMStudio. This tool works directly with the huggingface database I mentioned above and allows you to download and keep models organised. Fair warning: models take a lot of space, and you’ll want to keep an eye on your hard disk. LMStudio will let you tune the models you’re using in a lot of really interesting ways, lowering the temperature for example (which will press the model for more literal answers) or raising the context length (see above). You can also start up an ad hoc server which other applications can connect to, just as if you were using the OpenAI API. Alongside LMStudio, I run a copy of Faraday, which serves a totally different use case. Faraday aims to offer you characters for your chatbots, such as Sigmund Freud or Thomas Aquinas (running on a fine-tuned version of Mistral, of course). I find that these character AIs offer a different kind of experience, which I’ll comment on a bit more in the follow-up post, along with other tools that can enhance this kind of AI agent interactivity, like MemGPT.
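To give a sense of how that ad hoc server works: it speaks the same chat-completions protocol as the OpenAI API, so you can query it with nothing but the Python standard library. A sketch, assuming LMStudio’s default local address of http://localhost:1234 (check the server panel in the app for the actual address and port):

```python
import json
import urllib.request

def build_payload(prompt, temperature=0.3):
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower = more literal answers, as noted above
    }

def ask_local_model(prompt, url="http://localhost:1234/v1/chat/completions"):
    """POST a prompt to a locally hosted OpenAI-compatible server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# e.g. ask_local_model("Summarise the following text: ...")
```

Because the protocol matches the OpenAI API, most tools that accept a custom “base URL” can be pointed at your local server instead of a paid service.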

There are real limits to fine-tuning and context-length hacking. Another option I haven’t mentioned yet, which may be better for those of you who want to dump in a large library of PDFs, is to ingest all your PDF files into a separate vector database which the LLM can query in parallel. This is referred to as RAG (Retrieval-Augmented Generation). My experimenting and reading have indicated that RAG is a better way to bring PDF files into your LLM journey. As above, there are Python ways to do this, and also a few UI-based software solutions. My current favourite is AnythingLLM, a platform-agnostic open source tool which will enable you to have your own vector database fired up in just a few minutes. You can easily point AnythingLLM to LMStudio to use the models you’ve loaded there, and the interoperability is pretty seamless.
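To make the RAG idea concrete, here’s a toy sketch of the retrieval step: score your stored text chunks against a query and hand the best match to the LLM as context. Real systems like AnythingLLM use learned embeddings and a proper vector database; simple word-overlap scoring stands in for them here just to show the flow, and the sample chunks are invented:

```python
# Toy retrieval: bag-of-words vectors + cosine similarity stand in for
# real embeddings, purely to illustrate the RAG retrieval step.
import math
import re
from collections import Counter

def embed(text):
    """Stand-in 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Photosynthesis converts light into chemical energy in leaves.",
    "The Treaty of Westphalia ended the Thirty Years' War in 1648.",
]
context = retrieve("how do leaves capture light?", chunks)[0]
# `context` would then be pasted into the prompt ahead of the question.
print(context)
```

The retrieved chunk, not the whole library, is what gets placed in the model’s limited context window, which is why RAG scales to libraries far larger than any context length.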

That’s a pretty thorough introduction to getting up and running with AI, and to some of the key parameters you’ll want to know about. In my second post, I’ll explain a bit about how I think these tools might be useful and what sort of use cases we might be able to bring them to.