If, like me, you’ve been following the development of generative AI and large language models as tools for scholarship and productive work, you’ll likely be interested in the development of NotebookLM. Google’s tool ingests a PDF file and outputs a short, podcast-style audio discussion which provides an enjoyable summary. It’s a pretty efficient way to blast through a pile of journal article PDFs and sift for those that might bear closer reading. However, many readers will be cynical about Google’s ability to provide a public good, so will be even more glad to hear that there’s an open source alternative, Open NotebookLM. Itsfoss.com has a pretty good writeup on the tool, including instructions on how you can run it entirely locally on your own PC. Worth noting that Open NotebookLM has a limit of 100,000 characters per document, and the audio quality isn’t quite up to Google’s NotebookLM yet. But it’s a great move in the right direction.
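As a practical aside, that 100,000-character cap is easy to bump into with book-length PDFs. Here’s a minimal, hypothetical Python helper for pre-chunking extracted text before upload — the function name and the paragraph-splitting strategy are my own sketch, not part of the Open NotebookLM tool itself:

```python
# Hypothetical pre-flight helper for Open NotebookLM's stated limit of
# roughly 100,000 characters per document. Splits extracted PDF text
# into chunks under the limit, breaking on paragraph boundaries.
MAX_CHARS = 100_000

def chunk_for_notebooklm(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Return a list of chunks, each at most `limit` characters,
    split on blank-line paragraph boundaries where possible."""
    if len(text) <= limit:
        return [text]
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        # +2 accounts for the paragraph separator re-added on join
        if size + len(para) + 2 > limit and current:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

One caveat for this sketch: a single paragraph longer than the limit will still produce an over-limit chunk, so genuinely wall-of-text sources would need a cruder character-level split.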
Also, after you’ve done a few listens to podcasts from NotebookLM, you might benefit from some comic relief:
If you really want to have fun, you can even add faces for your podcast “hosts”
One of the hardest things about being an unmasking (or not) neurodivergent academic is the frequent estrangement from your previous selves. Until recently, I was constantly tripping over and ruminating on actions I’d taken or thoughts I’d expressed in previous eras of life with some sense of shame or embarrassment, even following a reflex to conceal or suppress those thoughts. There are forms of Protestant Christianity and models of self-improvement to which I’d been exposed which enshrine these forms of self-harm as good practice. I was thinking about the ways that this impairs scholarship this morning, reading an old post by Cory Doctorow. Cory is a bit older than me, but we emerged in similar radical techno-cultures, so I’m always learning from and feeling edified by his writing. In my reading today, Cory talks about the value of bottom-up thinking. Reflecting on blogging in the post, he says:
Clay Shirky has described the process of reading blogs as the inverse of reading traditional sources of news and opinion. In the traditional world, an editor selects (from among pitches from writers for things that might interest a readership), and then publishes (the selected pieces).
But for blog readers, the process is inverted: bloggers publish (everything that seems significant to them) and then readers select (which of those publications are worthy of their interests). There are advantages and disadvantages to both select-then-publish and publish-then-select, and while the latter may require more of the unrewarding work of deciding to ignore uninteresting writing, it also has more of the rewarding delight of discovering something that’s both totally unexpected and utterly wonderful.
That’s not the only inversion that blogging entails. When it comes to a (my) blogging method for writing longer, more synthetic work, the traditional relationship between research and writing is reversed. Traditionally, a writer identifies a subject of interest and researches it, then writes about it. In the (my) blogging method, the writer blogs about everything that seems interesting, until a subject gels out of all of those disparate, short pieces.
Blogging isn’t just a way to organize your research — it’s a way to do research for a book or essay or story or speech you don’t even know you want to write yet. It’s a way to discover what your future books and essays and stories and speeches will be about.
As you’ll have seen from the ways that this blog serves as a repository of all my public writing over the course of more than a decade, this is the same approach I tend to take. Write and communicate the fragmentary thoughts, and work to identify the coherence of them, in a bottom-up way, until you identify the inchoate intuition which drove them in the first place in a way that you can communicate to others.
What struck me about Cory’s piece, however, was his emphasis on the importance of memory. Referencing Vannevar Bush’s concept of the “memex”, he goes on to suggest:
it’s hard to write long and prolifically without cringing at the memory of some of your own work. After all, if the point of writing is to clarify your thinking and improve your understanding, then, by definition, your older work will be more muddled.
Cringing at your own memories does no one any good. On the other hand, systematically reviewing your older work to find the patterns in where you got it wrong (and right!) is hugely beneficial — it’s a useful process of introspection that makes it easier to spot and avoid your own pitfalls.
For more than a decade, I’ve revisited “this day in history” from my own blogging archive, looking back one year, five years, ten years (and then, eventually, 15 years and 20 years). Every day, I roll back my blog archives to this day in years gone past, pull out the most interesting headlines and publish a quick blog post linking back to them.
This structured, daily work of looking back on where I’ve been is more valuable to helping me think about where I’m going than I can say.
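Cory’s daily retrospective is simple enough to automate. A minimal sketch, assuming a blog archive represented as dated (date, title) records — the data shape, function name, and interval list here are hypothetical illustrations, not Cory’s actual setup:

```python
# Hypothetical "this day in history" lookup: given dated posts, find
# titles published on today's month/day at 1, 5, 10, 15, and 20 years back.
from datetime import date

def on_this_day(posts: list[tuple[date, str]], today: date,
                years_back: tuple[int, ...] = (1, 5, 10, 15, 20)) -> list[str]:
    """Return titles of posts published on today's month/day, N years ago."""
    targets = {(today.year - n, today.month, today.day) for n in years_back}
    return [title for d, title in posts
            if (d.year, d.month, d.day) in targets]
```

Run against an archive each morning, the output is the raw material for the kind of quick link-back post Cory describes.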
It struck me this morning that this is a crucial part of the kind of messy process I undertake, but my implementation of it has often been impacted by forms of shame, driven by a span of life lived un-conventionally. As a neurodivergent youngster, I was trained by the adults around me to conceal my unconventionality, unless it could be packaged and shared with those around me in ways that would seem unthreatening and graspable. And the “stranger” patterns or results of thinking were best concealed, masked for the sake of personal safety. Masking generates a pattern of self-suppression and a linked experience of shame and rumination on the dangers of past actions. Review becomes a space for reinscribing our masks, testing the fit, and ensuring their efficient operation.
What Cory points out here (at least for me!) is the importance of self-acceptance and love in the context of a bottom-up scholarly life which requires regular review and reflection. As I’ve pursued a process of unmasking, and what Devon Price reflects on as “radical visibility”, over the past half decade, I can recall a specific moment where I passed a threshold of self-understanding and acceptance and was able to begin looking back on my former self with acceptance and love rather than shame. What’s become increasingly clear to me is that until you’re able to pass that threshold (and let’s accept that it is surely a recursive process I will need to repeat with increasing levels of honest self-awareness), you are estranged from these processes of memory which are a crucial endpoint of scholarly reflection in the mode that I’ve come to practice. If you celebrate the bottom-up process like I do, your practice will nonetheless be truncated until you are able to methodically look back on past reflections with the ability to recognise their incompleteness whilst also celebrating the shards of insight that they may tacitly contain.
Yesterday there was torrential rain, lashing Wales for a few hours and creating widespread flooding which shut down roads and trains. This morning, I walked past the River Severn on my way in to run a seminar at the University of Sustainable Development. The Severn is one of the largest and most lively rivers in the UK, but the river this morning was far beyond its usual path, a gushing, brown expanse carrying rain and runoff. It was a striking juxtaposition: creaking civil infrastructure neglected in part through decades of climate change denial, and a loud and lively announcement of climate-change-intensified storms, likely the same weather pattern which landed in the United States last week as a series of devastating hurricanes.
I’ve been studying the ways that we can communicate across species barriers, and the challenges to our conceptions of “normal” communication which result from these encounters. Over the summer I taught a course at a theological college, and we engaged in a contemplative ecology walk one rainy afternoon. That walk took us past the River Mersey in Manchester. As I pressed students to engage with other creatures around us in an act of contemplative communication, we paused to wonder about what sort of spiritual beings the different creatures around us might be and how they might be speaking to us. Rivers communicate through motion: pace, push, and flow. Like trees (the subject of a workshop some weeks before), they communicate slowly, expressing conviction and presence by moving land and shifting their path in subtle ways, gently carrying other little beings of soil and rock with them to new homes. Communication through motion is not inaccessible or unknown to humans, though our everyday lives do sometimes deny us joyful motion. Scholars have plumbed the communicative potential of dance, and I think we would do well to observe symmetries in the art of intentional motion between human communicators and nonverbal nonhumans.
As we paused on our walk at the Mersey, I observed to the group that the sides of the river had been filled with cement in a form of cheap and efficient flood protection. This river could not move. It had been rendered mute and inert, in what could be seen as an act of cruelty and disdain for the intended ecological presence of a river. As the students and I reflected on the Mersey and the forms of presence it might be offering to us, we moved beyond our initial appreciations of its benefits for anthropos, in the therapeutic conveyance of pace, strength, patience, forbearance. It was impossible to ignore the ways that the river brought life to the whole biotic community of the city. Yet at the same time, when I asked the group to push past our own lifeworlds and to try and imaginatively inhabit that of the river and consider what sort of presence it had to itself, it was hard to ignore the suffering it must endure as a creature, full of potential expressiveness, yet lashed to an artificial course, still generously holding down (for the sake of the health of other creatures) toxic runoff from centuries of mining and industry sedimented under its body. I spent some of my contemplation in quiet solidarity with those other traumatised creatures, including humans, who have had their communication taken from them.
This morning many news outlets decried the destructive potential of our waterways in Wales. I do not deny the tragic consequences of short-sighted development schemes and climate denialism placing houses close to waterways with little to protect them. And I note the ways that the victims of suffering in recent weeks are overwhelmingly minoritised and poor. However, I do not lament the speech of the river Severn this morning. At the same time that it is giving us life and protecting us from self-inflicted harms, it is speaking to us – screaming in the loudest possible voice (I wonder, is it rage? grief? fear?) – can we find our way back to the old ways of listening to the land and paying heed to its wisdom?
I’ve been thinking a lot about multi-species interactions, as I’m starting up a new research project this autumn under the auspices of the Birmingham Multispecies Forum and this kind of work has been part of my core scholarly research for well over a decade now. Over sabbatical I’ve been doing some speculative thinking in this area, aided in part by visual ethnographic methodologies (on which I’ll share more in the months to come). We also became the keepers for a Welsh sheepdog, “Scout” just after Christmas, so I’ve been thinking a lot with Scout about how we need to change our family patterns to suit him and vice versa.
There are a lot of opinions on dog training, I mean A LOT. And I’ve not found many people who hold those opinions lightly, so it has been interesting sifting through the different kinds of dog whispering and the underpinning convictions which seem to drive them. There are some matters of bad science which have been cleared away in the scientific sphere, but not in the popular imaginary; for example, it’s now widely understood among canine biologists that dogs (and wolves for that matter) don’t organise around hierarchical, “alpha male”-led packs. There are also serious questions about the adequacy of behaviourist models, seen in the switch away from dominance-led, top-down and discipline-heavy training towards positive training models which take into account the sensitivity, intelligence and alterity of dogs. Justyna Wlodarczyk helpfully situates this within the “animal turn” led in large part by Donna Haraway about 20 years ago. In an article titled “Be More Dog: The human–canine relationship in contemporary dog-training methodologies” they highlight this broader trend in state-of-the-art training methodologies.
I’ve had this all in mind as I work with Scout on various things. He’s only two years old and his former human companions really didn’t do much in the way of training, leaving him unaware of how to manage a lead, afraid of rivers or even water puddles (for lack of previous exposure!), unsafe in managing roads and so on. Of course, all the advice we’ve gotten since then has focussed on the need to heavily train him into right behaviours. I’ve unwittingly found myself following this advice on several occasions only to notice Scout uninterested in, confused by, or resistant to training. These matters are a bit more acute in urban environments, such as ours, as dogs can be at risk of harm from other humans and dogs, in many cases (at least as the narratives go) due to a lack of training.
One area which is a bit more ambiguous is food. As a sheepdog, Scout is biologically oriented towards hypersensitivity; breeders in previous centuries prized this as the ability to sense risk to the flock, suitably raise the alarm, and protect sheep from predators. In the city, this leads to a lot of sensory overwhelm, something I can relate to on many levels. Lucky for Scout, our house is sensorially very quiet. There are still, however, areas where his anxiety can show, especially around mealtimes. Again, stress surfacing in experiences of food and eating is pretty common stuff for neurodivergent humans, so I’ve had this in the back of my mind as we work together to find a routine that works for him. But it has been difficult, with Scout often choosing not to eat, or only eating a small portion of the food we’ve offered. He’s uninterested in most kinds of dog treats, and not highly motivated by food in general.
The dog trainers on Instagram, always happy to dispense their advice, had several helpful tips. One frequently occurring set of rules was driven by the behaviourist paradigm: set out his food at a specific time and remove it shortly after. The thinking goes, if we provide something on a predictable routine and then make it clear that the only way he’ll get food is by following those rules, he’ll fall in line. Again, it’s interesting to note how many parallels there are here to inpatient mental health care in previous decades, with food and routine used as part of behaviourist regimes.
This really hasn’t worked for Scout, and we’re still trying to get a sense of what’s at the bottom of things. Some nights he’ll gobble up a whole dish with happy gusto, and others, he’ll sniff at things and retreat. We’ve tried to add certain bursts of activity to get his metabolism moving, we’ve tried adding variety and supplementing his food with other interesting extras (whilst trying to avoid drifting into a complete lack of nutritional benefit). We’ve gone with enrichment, making eating into an activity. We’ve tried different kinds of food mix and ingredients. And we’re still trying, not quite sure what is driving his varying interest. We’ve also wondered on some occasions whether it’s presence he likes – our sons are certain that Scout needs to sit with another human while they eat, having a mealtime companion. And I’m not sure they’re wrong about this, though the pattern still hasn’t held with some testing of this theory as well.
The overarching point I’m driving towards is that I’ve begun to see food with Scout not as another aspect of training, but as a process of co-creating rituals. We’re attempting to get a sense of the proper order, activities, sensorium, and so on, fusing our ritual needs with his. And this is a long process, which doesn’t just involve discerning his “nature” but also a subtle and gradual shift in our alterities towards one another. There’s also a sense of ritual meaning, thinking about how we can think with him and vice versa about the higher aspects of ritual: a sense of thanksgiving, awareness of our privilege and the web of interconnectedness and forms of benevolence that bring food to us, and finding ways to do this which aren’t about hierarchies or control, but about shared expressions of gratitude in a variety of modes and languages. I’ve been left wondering how much this process of co-creating rituals might be taken as a template for thinking about improving our relationships with a whole variety of creatures, not just mammals or even animals. More on this as my speculative experiments continue!
In my previous post a few days ago, I mentioned a range of emerging research on moral injury. I think this field may be quite important for expanding the field of reflection in theological ethics, but I’ll be getting to that a bit later. I wanted to dwell for a moment on a phenomenon that I’ve observed in the literature which points at a hazard in interdisciplinary research that is well worth our attention, particularly for scholars who are working across humanities / science / social-science boundaries like I often am.
In working with the category of moral injury, and also more broadly around trauma studies, I’ve noted a desire by activists (with whom I have much common cause) to make the condition as broadly applicable as possible. You can see this in the claims that all persons living in our societies experience moral injury and trauma as a result of our enclosure and oppression in neoliberal societies. This is a difficult challenge for humanists interfacing with trauma studies, particularly if your scholarly reflection doesn’t touch down on personal experience with the condition you’re probing and appropriating for wider reflection. And indeed, I’ve seen this work out in much broader engagement with scientific concepts, around ecological concepts, biological theories of evolution, cognition in other-than-human animals and plants.
There is something about these concepts which can be uniquely compelling – they draw us in and provide a catalyst for thinking and reflecting on a context where we want to sharpen the focus of our audiences. In some ways, it might be said that they can serve as generative metaphors, opening up new kinds of expansive or transitive thought. But the conceptual origins of generative metaphors can be various – a concept can be porous, open-ended or expansive from the start. Some concepts arise from material conditions which are quite specific and can be defined forensically. Of course, some of these forensic definitions can themselves carry a level of construction and fiction, to be fair. This all relates to the different possible characteristics of a source domain, and the ways that we interface with those different characteristics. I’ve noticed a particular appetite in writing about forms of crisis and mental distress (perhaps arising from the intellectual structures of psychoanalysis) to work expansively with categories of distress and situate them in all of our everyday lives. I’ve also noticed sharply negative reactions from those persons who experience these conditions in their sharpest forms, as oppression. This can be seen in the push-back against more expansive understandings of autism and neurodivergence, shown in the expression of fears that mainstreaming concepts will lead to an erosion of social and biomedical support for those who have meaningful high support needs. In a similar way, I’ve seen fears that appropriation of concepts of trauma may lead to an erosion of understanding trauma as an exceptional situation. Do we all experience trauma? I don’t really think so, actually. We all experience hardship, perhaps even some levels of suffering, but to use the word “trauma” to define these situations which already have other terms (to which perhaps we have become desensitised) does indeed risk, I think, a diversion of attention.
Speaking about forms of trauma which are disabling can suddenly prove difficult when the background context of conversation assumes that everyone is experiencing and attempting to surmount these matters.
There’s a risk that reacting negatively to these forms of generalisation and appropriation of the metaphors of trauma can work in service to forms of positivism. In a desire to bring a rejoinder, we can reach for tools which offer specificity, e.g. DSM diagnostic pathways, and other features of contemporary experimental psychological research. But is this the right kind of specificity? And is reaching in this way, e.g. as a valorous gatekeeper, the right way to do this? I don’t think so. There are forms of specification which can draw us deeper into personalising our accounts of suffering and oppression, though these are hard to do as they require levels of disclosure and communication. There are also ways of doing this work which add further texture to our accounts of the everyday. It may also be important for us to work with more highly specified theodicies – avoiding lazy characterisations of evil which use broad brushstrokes or work in overly encompassing ways. That’s not to say that we shouldn’t use the category of evil, but that we need to really unpack what we mean and how that applies. It’s also possible that, looking to the broader interdisciplinary working category, some attention to the source domains of our metaphors may reveal that there are concepts which commend and even advertise themselves as supportive of broader appropriation, which might serve the cause of justice by being more widely deployed. Similarly, it’s equally possible that we may need to proceed more sensitively and carefully in our use of concepts. This is especially difficult, I think, for scholars who are personally experiencing oppression or suffering for the first time, as this initial experience can carry such sharpness (given the lack of background to provide a sense of bas relief) that we can lose the ability to determine whether our appropriations are inappropriate. For this, I think, we need to rely on more experienced hands – and perhaps not just human ones.
If we approach something like moral injury or complex PTSD less in terms of a forensically defined diagnostic pathway, and more as a metaphor which can be freely appropriated, we can sometimes lose the ability to define thresholds of experience, that is, to say when a person experiences the phenomenon without the persistent experience of bodily harm or a level of oppression which proves disabling.
About 20 years ago, psychologists working with returning war veterans noticed that many soldiers carried the symptoms of PTSD, but lacked the kinds of specific acute episodes of embodied trauma and injury that serve as catalysts for this condition. The triggering event which they uncovered was that those soldiers had experienced a situation in which they were instructed to violate their own moral code: to prosecute torture, to engage in what they knew to be immoral, or at best morally ambiguous, killing of civilians, and so forth. As Willie Jennings suggests,
The idea of moral injury powerfully articulated in the writings of psychotherapist Jonathan Shay has emerged in recent years as a crucial hermeneutic for understanding the tortured memory work of war veterans, trying to come to terms with having transgressed their moral beliefs. How does one negotiate life after transgressing, not simply a moral principle, but the coherence of a moral universe? Moral injury occurs when the fabric that holds moral agency and the self together are torn asunder. (From War Bodies: Remembering Bodies in a Time of War, in Arel and Rambo (eds.), Post-Traumatic Public Theology, p. 23-24)
This has been written up and developed into clinical diagnostic categories and pathways for treatment in subsequent research (see for example Litz et al, 2009). It has also been the case that researchers have begun to find applicability for this concept in other “battlefields,” identifying instances of moral injury in teaching professions, where teachers are given orders to practice substandard, or even harmful, forms of pedagogy, and in the health services, when doctors and nurses are told they must triage or mitigate care in ways that they know will cause harm to individuals who are suffering. This hit quite close to home when I discovered that my own colleagues have studied instances of moral injury as it has occurred in British universities as a result of neo-liberal pressures to raise revenue and keep xenophobic government ministers happy.
I have experienced this kind of situation, where you find yourself “under orders” which generate uncomfortable, what might even be considered oppressive, levels of cognitive dissonance. And it’s worth noting that these forms of hierarchy can be quite covert, not looking on the outside like the command and control structures that we might expect, veering towards more subtly coercive forms of control. In some of these cases, many of us in contemporary universities have felt a burden to express care in the context of professional practice and found that it is forbidden, even pathologised and shamed. From conversations with peers, this experience is not ubiquitous, but is nonetheless widely experienced among practitioners in education, health care, social work and military service. This is likely why professionals are leaving those fields in record numbers.
Though this phenomenon is experienced widely, it does not always pass the threshold of psychological burden into the experience of trauma. There are some persons who experience the kind of cognitive dissonance I am describing here who can set it to one side and carry on, feeling forms of discomfort which do not rise to embodied experiences of trauma. But there are also impacts that can be particularly sharp for some people, reaching levels which are disabling.
This phenomenon was observed much earlier as well by a research team led by the philosopher and anthropologist Gregory Bateson. In what they called the “double bind” phenomenon, Bateson and his collaborators observed that people may experience trauma when they are subjected to conflicting orders, particularly when the mismatch between them is not presented as straight-forward disagreement but works as a sort of control without obvious coercion (outlined in Steps to an Ecology of Mind, pp. 271-278). It’s important to note that Bateson’s work was in relation to attempts to define and understand schizophrenia, which in the 1950s and 60s, when he was conducting this research, had a much wider field of concern – encompassing a wide range of what might now be considered mental health disorders or other forms of neurodivergence. Contemporary experimental psychology can work towards diagnostic criteria that are almost incomprehensibly nuanced, with sub-genres of sub-genres seemingly distinguished on the basis of arbitrary traits. In contrast, research before the DSM could sometimes be almost incomprehensibly comprehensive. Bateson goes so far as to suggest that this research into the underlying epistemological “tangles” which represent the “double bind” is “transcontextual,” pertaining to a “genus of syndromes which are not conventionally regarded as pathological” (272). That is, something very much resembling moral injury lies at the heart of schizophrenia, what he calls elsewhere “the mental description of schizophrenia” (Bateson, 1977, Afterword).
The reason that I highlight this wider context from earlier research is that I’m particularly mindful of the ways that in the 1960s, the diagnostic category of schizophrenia included autism, which was then considered a form of male juvenile schizophrenia. While I’d sharply disagree (and most experimental psychologists likely would as well) with Bateson’s underlying conclusions about habits, behaviour and rigidity, there are many ways that we can redefine his premises whilst holding on to their descriptive power. What I’m getting at here, which I’ve already hinted at above, is my sense that Bateson’s team had grasped an insight which more recent moral injury research is only just starting to return to: that the “double bind” can be uniquely oppressive for neurodivergent persons, particularly with forms of autistic cognition that are sometimes described as monotropic, more pathologically as “rigid” in the DSM, or more recently (and astutely) as being oriented around cognitive inertia and flow states.
There are some caveats I need to apply here briefly before explaining why I think autistic cognition might tie into a higher level of vulnerability towards moral injury trauma. It is important to note that the tendency towards ritual and routine (which has been ruthlessly pathologised by psychiatry), e.g. rule generation and rule following, can be seen as a secondary condition, an attempt to create order in chaos and soothing for persons experiencing trauma. Are untraumatised autistic individuals as likely to pursue rules with rigidity? In a similar way, as I’ve noted elsewhere, the experience of being constantly misunderstood (e.g. the double empathy problem) can lead to a person being more methodical as a necessity in pursuing communication with different others. So we can see ways that rigidity and rule-orientations are a necessary form of trying to maintain relationship and connection with others in a world which is semiotically and sensorially traumatising in complex ways and where other forms of thought and communication are persistently privileged.
But with those caveats established, I do nonetheless think that there are ways that autistic cognition, at least in my own experience, does revolve around being value-oriented in sometimes more persistent ways. This has been noted in research which attributes higher-than-average orientations towards justice and lower-than-average rates of crime to autistic individuals. And there are anecdotal versions of this, which have been relentlessly pathologised in contemporary television and film with characters “sticking to their guns” to unusual levels. The point is that this basic orientation may render us more susceptible to the forms of trauma which are latent in moral injury. It’s interesting to me to note that Bateson and his team seem to have picked this up quite early in their research, an insight which hasn’t really been returned to in the psychology of moral injury.
Works Mentioned:
Jonathan Shay, Achilles in Vietnam: Combat Trauma and the Undoing of Character (New York: Scribner, 1994)
Brett T. Litz, Nathan Stein, Eileen Delaney, Leslie Lebowitz, William P. Nash, Caroline Silva, and Shira Maguen (2009) “Moral Injury and Moral Repair in War Veterans: A Preliminary Model and Intervention Strategy,” Clinical Psychology Review, vol. 29, no. 8, 695–706.
Matthew R Broome, Jamila Rodrigues, Rosa Ritunnano and Clara Humpston, “Psychiatry as a vocation: Moral injury, COVID-19, and the phenomenology of clinical practice” in Clinical Ethics (2023): doi: 10.1177/14777509231208361.
Say it your way! But do it perfectly. And everyone needs to understand what you’ve said.
One of the joys of being an academic is working with a wide range of people as a writing coach, from undergraduates and PhD candidates I supervise to peers and co-authors. Much of my work involves finding ways to communicate effectively and helping others to learn to do it as well.
As my opening line suggests, however, this idea of communicating as a form of authentic personal expression can often clash with the demand to ensure that your communication is perspicuous, that is, easy for others to understand. The more I learn and think about human neuro- and cultural diversity the more acute this challenge seems to me. The most obvious breakdown in human communication can be seen in those contexts where we can communicate perfectly well in one human language but simply cannot understand another. We talk about fluency when it comes to language. But fluency also exists in forms of dialects, albeit more covertly. Especially since moving to the UK, where there is a wider range of English dialects which are aligned with different levels of social class and attainment, I’ve realised that communication in a common language can be fraught and complicated with unintentional or unanticipated forms of misunderstanding.
Does good writing transcend particularities and reach for a “canonical” or standard form of a language? Much of the infrastructure of the modern University suggests this is the case (see my post on marking, for example). But generic communication prevents us from achieving some levels of texture and nuance; this is why forms of vernacular speech can communicate so much more, and many poets have privileged vernacular as a source of truth in particularity. It’s also the case that the confidence we can gain from working within so-called standards is undeserved, simply forcing others to conceal their lack of understanding, and far too often “canon” is simply another word for exclusive privilege. One can be multi-lingual, as an alternative, working with a variety of forms of language, and even seeking to learn the languages of the other persons you communicate with.
I’ve been toying with this myself lately, noticing forms of self-policing that are part of my writing process. I was taught to be cautious with pronouns, one might suggest. Lecturers drew red lines through contractions, informal, and colloquial forms of speech. I remember one paper I received back as an undergraduate with “coll.” written in the margins throughout. This is where I first learned the word colloquial. I’ve been glad to learn to be more reflective and intentional in my use of gendered pronouns (see the fantastic piece by my colleague Nick Adams in SJT on this subject for more!). I learned to make my use of metaphors more forensic, closed down, and available for easy interpretation for readers. And, when writing theologically, I was taught to forge chains of citation to pristinate and authorise my insights. I’ve begun to contest these moves deliberately in my writing. You’ll notice that my journal articles have contractions strewn throughout. I’ve begun writing articles with only light use of citation (in the case where an idea from a colleague does require responsible attribution). Some of my writing takes the form of creative fiction or poetry, and not as a joke, but situated as serious scholarly reflection albeit in an unexpected genre. But these can’t be published in scholarly journals, so I publish them as preprints and blog posts.
It’s interesting to think about what this experimental turn into deliberately vernacular speech means for our work as writing coaches. We want our students to be taken seriously, and perhaps this requires deference to “standard English”, but I’m becoming increasingly concerned that doing this task well, especially in the context of the assessment of student writing, is an extension of toxic regimes of behaviour modification and cultural erasure. I’ll correct someone’s grammar if I can tell they’re trying to achieve a certain genre and it’s clear to both of us what the rules are in that genre. But if someone wants to step outside the conventional genre? I’ll meet you there.
I’ve been thinking lately about how to explain how my body works, particularly in that there are some people in my life who find it hard to understand but really do want to. This is a classic conundrum of being neurodivergent – you are different, and (potentially) trained from birth to conceal those forms of difference even from yourself. So there’s a long and iterative process of unfolding self-knowledge that needs to precede those kinds of explanations, but once you’ve finally gotten to the point where you are starting to form green shoots of understanding, the next challenge is to find metaphors, stories and shared experiences you can leverage to bridge the gap of difference with others who are curious. And then perhaps the next challenge lurking is weathering the fatigue that those friends start to quickly experience. Our societies are designed around homogeneity, such that difference must be made accessible for consumption, even by allies. Compounding this challenge is the likely fact that those allies (and you as well) aren’t readily aware of your limitations in this area, so there are forms of humility to grapple with. This all takes a lot of emotional work and often comes as a surprise. I’ve learned to give people space for surprise and retreat so they can process (a) how weird I am to them and (b) how exhausting it is for them to parse all this out and (c) how unexpectedly humbling it is to discover you are limited in your ability to express compassion for others. But I’m getting ahead of myself.
I have a metaphor which I think goes a long way towards explaining inertia and stress / overload around information. Here goes:
Have you ever played with a tuning fork before? As a musician, I’ve used them many times. It’s a heavy piece of metal, shaped like a two-pronged fork. If you hold the handle and strike it against something it will vibrate at a particular pitch, which you can use for a starting tone when singing or tuning an instrument. There are a lot of fun YouTube videos you can watch demonstrating how they will make other objects sympathetically vibrate as the sound waves they produce make the air around them vibrate.
If you gently strike the tuning fork on a hard surface it will ring quietly, but because it is machined quite precisely and is made from substantial quantities of metal, it will continue to resonate for quite a long time. If you hit it on something a second time before it has finished resonating, the volume will increase, and you can keep doing this repeatedly until the sound is quite loud.
I’m a human tuning fork in a few ways. I’m constantly vibrating from taking in information, including from my senses. When I’ve had a particularly full or overstimulating day, it can feel like the resonance just won’t stop and my partner and I often joke with each other using this language. When I take in information, just like a tuning fork that keeps humming long after you’ve struck it, I can’t turn my mental resonance off. There are a variety of neurobiological theories (some of them full of bias and discrimination) about why this is the case, but what I’ve found is that I need to process information – all of the information – or it just won’t go away. When I was young, I’d lie awake at night, sometimes for hours, processing all the facial expressions, conversations (including those I’d unwittingly overheard that weren’t intended for me), and things I’d observed about nature and the human world. It’s possible I could have found medication that helped me to sleep better, but I’m pretty certain that I’d wake up the next day and pick right up where I left off so there’d be a debt to pay. All the tasks, details, projects, and undigested information from books, social media etc etc etc are always just queued up waiting for a turn until I can get through the list.
So the best thing for me is to either (a) find a way to process everything or (b) limit the amount of information I take in.
Let’s talk about processing first. Since I’ve been working in community with neurodivergent friends, colleagues and family, I’ve learned to appreciate that there are different personal cultures around information processing. Some people, like my partner, are internal processors. The main thing that personality needs (I think) is a quiet place, some time, and space to themselves to work through it all. I do this a bit, but it’s not my primary mental economy. I’m an external processor, e.g. I process information in conversation with other people and need an extrinsic catalyst – the push and pull of conversation, confusion, and query from another being – to move things along. It’s not merely helpful, it’s necessary. If I can’t talk to someone about things, my processing can sometimes slow or freeze completely. One of the great joys of being autistic is what’s often colloquially called info-dumping. For those of us who do it, especially in conversation with other autistic body-minds, it can be an amazing way to parse through a lot very quickly. This is where someone tells you everything they know about a particular personal passion (I’m not really a fan of “special interest” which is more than a little patronising), perhaps as a monologue, for quite a while. [Quick aside: have you noticed the length of my blog posts and emails? Why do so many people write and communicate with unnecessary brevity?]
I’ve observed that there are different levels of externality and processing independence and this can express quite differently for different flavours of neurodivergence and autism. Some people need and thrive on interruptions, others need a more or less silent captive audience for quite a span of time. I’m somewhere in the middle. Sometimes I really just need to get something out, without a lot of interruptions, so I can piece together all the parts in a coherent narrative and then I’m really keen to have a back-and-forth parsing out of what I’ve just put together.
One of the great gifts of being a lecturer is that people often want to listen to me process things in an uninterrupted way for a long time and then parse out the details. It’s delightful. I do try to keep things interesting and entertaining so I don’t lose that audience. And since I’ve spent so much time thinking about the different ways that humans can tend to process information, I’m actually pretty good at accommodating difference here. So what’s your processing style? I’d love to hear a bit more! How do you prefer to balance internal / external forms of rumination? And what kind of flow do you prefer? Uninterrupted? Rapid back and forth?
I’ve also been learning about some helpful ways to limit, control or curate inbound information, especially for those cases when I don’t have the option to process in a holistic and adapted way. But to be completely honest, I’ve encountered some pretty fierce resistance and hostility from the modern workplace in this area. Many middle managers and designers of infrastructure are accustomed to controlling the amount of information that goes out, the pathways it uses to get to us, and the rhythm and pace it travels at. When someone like me comes along and says, “hey, could we do emailing a bit differently?” or “all those alerts and social features that you find it so easy to ignore are driving me nuts, I’d really love it if we could have more options for software with different interface styles” or “can we have a search feature on that with unfiltered results?” the reactions that come back can be dismissive, patronising, even snarling. And then I’ve got to process all the information from those negative and controlling encounters. This is also one of those situations where allies are desperate to hear about these things, but then when we hit resistance and they see how intractably designed our social infrastructure is against the kinds of adjustments I’m talking about here, I can tell that their energy gets zapped pretty fast and the conversation veers into exhausted neurotypical “pragmatism”. I get it, but it’s not a great scene overall, especially as ed-tech venture capital driven development cycles have changed the way we use information in ways that are totally toxic to my mental health.
It’s important to emphasise as well that I often don’t necessarily want to cut off the flow of information. As I frequently tell others, I am a hedonist for information. I delight in riding the torrent as long as I can before I’m shut down and nonfunctioning, which is not to say this is the only possible outcome of such surfing. The experience of being in a state of saturation can be euphoric and healthy for me, as long as I have some control over how things are working in practice. So when people assume that the way to respond to this disability for me is to (out of paternalistic kindness) simply shut me out of things, that’s far more miserable, even terrifying. For me it’s about fine tuning, especially the flow, rhythm, and aspect of the information that’s coming at me. If I’m allowed to fine-tune things, which I can more or less do independently (including writing my own software and scripts), everything can be lovely.
Thinking about my tuning-fork-body, there are a few things which are just kryptonite for me, in ways that often puzzle friends & family. One of them is reminders. I realise it’s totally normal to remind someone about something you fear they’ve forgotten. But I don’t really have the luxury of forgetting as often as I’d like. It’s all there sitting in my mental buffers whether I like it or not, and if it seems like I’ve forgotten something, it’s really that I’ve been shut down by circumstances or there’s a bunch of other more prioritised stuff ahead of that one thing. And let me tell you, when you have persistence of information in this way, the discomfort of being unable to attend to things can get pretty extreme. So when someone comes along and drops what they think is a gentle reminder, actually you’ve just hit the already resonating tuning fork against the wall again and it gets a bit louder than it was before for me. And then the next reminder makes it even louder. The experience of having something amplified like that can be pretty stressful, even panic inducing at some levels of noise.
So if it seems like I haven’t gotten to something, even though it might seem completely counter-intuitive, it’s better to assume I haven’t forgotten. Instead, do a quick mental inventory before asking me about it. Ask yourself: if Jeremy hasn’t gotten to that thing, am I prepared to set aside what I’m doing and devote some energy towards helping him or even independently finding some help for him? If your honest answer is “no” much better to hold off on the reminder. Please. No really, I’m begging you. I hold no resentment for people who don’t have the time or energy to help me out and am actually really grateful when someone can do that kind of quick check and openly acknowledge that they can’t help me because they’re already at full speed on their own stuff (and BTW it’s not always so great when people do the “quick check” but then fail to share).
But if, after an honest self-assessment, you conclude you’ve got some bandwidth, then I’d love to have a conversation that starts like this, “I’ve noticed you haven’t gotten around to X. Would it be helpful to talk through the barriers that are preventing you from getting to that thing? How can I help?” I have a few beautiful humans in my life who do this, and it is amazing when it happens.
Another kryptonite bar for me can be out-of-control information flows. This manifests in a few ways, but with particular sharpness in meetings and conferences. Usually meetings are chock full of information. Most people just ignore 90% of what’s discussed and I’m really delighted they can compartmentalise and forget like that. But I can’t. My brain is running down various trails exploring possible avenues to manage each thing we discuss. And I’m processing all the facial (micro-)expressions in the room, body language, side-conversations, the content of the slides, something I discussed with another person who is in the room 5 weeks ago which relates to the topic at hand, etc etc. It’s a lot. But if it’s just some manager doing an info-dump at the front for an hour, that’s uncomfortable but not traumatic. I can manage the discomfort and hold on for the ride and sift through it by myself afterwards. And let me pause to say, if this is the situation we’re facing, it’s a relief to be told straight-forwardly by the presenter that there isn’t going to be time for discussion.
The worst thing is when we have break-out groups or short full-group discussion after each agenda item. Here’s how this works for me: let’s say we’re 30 minutes into the meeting, and then we set aside 5 minutes to talk in a group and generate key points to bring back in a plenary style. Because I need to process externally, it’s going to be desperately difficult for me to get through piecing together all my thoughts about the topic we’ve just been handed. I’m still processing all the emotions in the room, the five items on the outline that came before the thing we’re supposed to be talking about, and I’m aware that if I talk too long, some neurotypical non-allies will start mobilising forms of social shaming to put me in my place – talking about the awkwardness of “overshare” or how I’m “mansplaining” or just assuming that I’m not aware that it’s rude to take up all the talking time. I’m aware. But you’ve just told me I need to participate and share, and this is literally the only way I can do that. I’m also mindful that simply sitting silently can be taken as rudeness too. So that break-out group whacks that tuning fork a few times and then tells me I’ve got to be suddenly silent and shift to thinking about something else. But I can’t stop that vibration and I find that by the end of most meetings, I’m resonating to such an extent that I need to find a room where I can just sit in the dark with the door locked to let my nervous system cool down. Perhaps worse still, as I’ve confirmed in discussions with autistic colleagues, on most campuses and open-plan offices the only place you can do this is a bathroom stall, which is not great.
Meetings are a lot. And conferences are even more. I love the saturation of both of these things, but they can also be a form of conviviality that defeats me sometimes. And I’m also aware that, as a regular facilitator of many things, I’m often the person who is leading the charge in designing forms of gathering that aren’t accessible and suitably inclusive. There’s so much to consider here in caring better for each other and finding ways to work together that are truly accessible (and by extension, effective). And as a designer of many colloquia myself, I think it’s important for us to be generous to people who are leading meetings given that inclusive design and execution of meetings is time consuming. Many covertly neurodivergent leaders are given demands to “be accessible” but not the time to do so well. We desperately need to adapt our social infrastructure and cultures to enable these things before we start telling managers to do them. I want things to be different, but doing different well requires us to slow down, accommodate each other, and lower the level of expectation so that we can craft functional communities of practice.
Bearing this all in mind, if we were to rethink meeting design and structure, there are a few elements to consider, I think:
(1) It’s helpful to provide an explicit agenda, not just 5 vague titles, but explicit indications of what will be discussed, by whom, and what the purpose of the discussion will be. Note: potentially unreasonable time investment expectations here for facilitators. If you can, why not share this in advance? And perhaps you might even consider having a “living agenda” where the external processors can generate Q&A ahead of the meeting, identifying the truly intractable questions that can’t be worked out asynchronously on a shared online document and thus ought to be prioritised in the meeting.
(2) It’s worth thinking about shortening and simplifying our meeting agendas, really considering what we will be able to cover well (based on a holistic consideration of the parameters of neurodiversity) in a given window of time, and choosing only to present those items, rather than bringing an expectation that we’ll parse, sift and compartmentalise in the same way through a long list of items with various levels of development, significance and urgency. Maybe we need to do more 30 minute meetings with one-item agendas and we should limit 1 hour meetings to 3-item agendas as a maximum. And if you’re planning out a meeting, think you can get through more than that, and want to exclude time for discussion, why are we having a meeting at all? Please just email me, or create a collaborative document we can use for discussion, so we can pace our work and thinking in asynchronous ways and in digital venues.
(3) It’s helpful to provide opportunity for people to process and respond in a variety of ways. Short 5 minute, one-size-fits-all breakout groups will achieve a thing but not for everyone and certainly not inclusively. So perhaps it’s better to provide 30 minutes (or more!) for discussion / break-outs, with the option for some people to sit by themselves, some go for walks in pairs to infodump, and others sit in smaller groups to discuss. Or maybe you just make the presentation part of the meeting 20 minutes long and then devolve the meeting into various forms of discussion / processing. Facilitators, please note: forms of discrimination will be tacitly present in your meeting and past histories will be driving social anxiety for some of your participants, so it is your duty as a facilitator to do the work to normalise all options, and provide some level of facilitation / matchmaking to enable inclusive groups to form.
(4) It’s gracious to support different kinds of focal attention. People like me might need a distraction, or will find a way to check out mentally if not physically, if they’re oversaturated while a presentation is underway. But also, my conscious and subconscious attention work together in ways that I don’t completely understand and sometimes if I can do something distracting in the foreground, listening to music, reading emails, scrolling a web page that a speaker has mentioned, playing with legos, fidget toys, taking a walk etc., the complex thing I was trying to sort out will continue to work along in the background and some of the unnecessary background noise can fade a bit. I often process and understand a speaker more effectively if I am doing something else.
So you’ve got some access to AI tools and sort of know how they work. But what are they for? I know sometimes big tech can meet education with a solution looking for a problem and I’m keen to be clear-eyed about how we review “innovation”. I think there are some genuine use cases which I’ll outline a bit below. It’s worth noting that engagement with AI tech is deceptively simple. You can just write a question and get an (uncannily good sounding) answer. However, if you put in some time to craft your interaction, you’ll find that the quality rises sharply. Most people don’t bother, but I think that in academia we have enough bespoke situations that this might be warranted. In this article I’ll also detail a bit of the learning and investment of time that might be rewarded for each scenario. Here are, as I see them, some of those use cases:
1. Transcribe audio/video
AI tools like OpenAI’s Whisper, which can be easily self-hosted on a fairly standard laptop, enable you to take a video or audio file and convert it very quickly to very accurate text. It’s accurate enough that I think the days of qualitative researchers paying for transcription are probably over. There are additional tools being crafted which can separate text into appropriate paragraphs and indicate specific speakers on the transcript (person 1, person 2, etc.). I think that it’s faster for most of us to read / skim a transcript, but also, for an academic with some kind of hearing or visual impairment, this is an amazingly useful tool. See: MacWhisper for a local install you can run on your Mac, or a full-stack app you can run as a WebUI via docker in Whishper (formerly FrogBase / whisper-ui).
Quick note: the way that whisper has been developed makes it very bad at distinguishing separate speakers, so development work is quite actively underway to add on additional layers of analysis which can do this for us. You can get a sense of the state of play here: https://github.com/openai/whisper/discussions/264. There are a number of implementations which supplement whisper-ai with pyannote-audio, including WhisperX and whisperer. I haven’t seen a WebUI version yet, but will add a note here when I see one emerge (I think this is underway with V4 of whishper (https://github.com/pluja/whishper/tree/v4). Good install guide here: https://dmnfarrell.github.io/general/whisper-diarization.
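To make the shape of the output a bit more concrete: whisper’s transcription returns a list of timed segments, and a diarisation layer (pyannote-audio, WhisperX, etc.) adds speaker labels on top. Here’s a small sketch of a helper that merges the two into a readable “person 1 / person 2” transcript. The segment structure mirrors whisper-style output, but the data below is invented for illustration, not real model output.

```python
# Sketch: rendering whisper-style segments plus diarisation labels as a
# readable transcript. The segment dicts mimic the shape of whisper's
# output; the example data is invented for illustration.

def format_transcript(segments):
    """Render [{'start': float, 'speaker': str, 'text': str}, ...] as text."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['speaker']}: {seg['text'].strip()}")
    return "\n".join(lines)

segments = [
    {"start": 0.0, "speaker": "person 1", "text": " So, how did the project begin?"},
    {"start": 64.2, "speaker": "person 2", "text": " It started as a side experiment."},
]
print(format_transcript(segments))
```

A post-processing step like this is also a natural place to break text into paragraphs or anonymise speaker names before sharing a transcript.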
2. Summarise text
Large language models are very good at taking a long chunk of text and reducing it to something more manageable. And it is reasonably straightforward to self-host this kind of service using one of the 7B models I mentioned in the previous post. You can simply paste in the text of a transcript produced by whisper and ask a Mistral-7B model to summarise it for you using LMStudio without too much hassle. You can ask things like, “Please provide a summary of the following text: <paste>”. But you might also benefit from different kinds of presentation, and can add on additional instructions like: “Please provide your output in a manner that a 13 year old would understand” or “return your response in bullet points that summarise the key points of the text”. You can also encourage more analytical assessment of a given chunk of text, as, if properly coaxed, LLMs can also do things like sentiment analysis. You might ask: “output the 10 most important points of the provided text as a list with no more than 20 words per point.” You can also encourage the model to strive for literal or accurate results: “Using exact quote text from the input, please provide five key points from the selected text”. Because the underlying data that LLMs are trained on is full of colloquialisms, you should experiment with different terms: “provide me with three key hot takes from this essay” and even emojis. In terms of digital accessibility, you should consider whether you find it easier to get information in prose or in bulleted lists. You can ask for certain kinds of terms to be highlighted or boldface.
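If you find yourself reusing these phrasings often, it can be worth keeping them as templates. Here’s a small sketch of that idea in Python; the template wordings follow the examples above, and no model API is called here, this just assembles the prompt text you’d paste into something like LMStudio.

```python
# Sketch: reusable summarisation prompt templates, mirroring the
# phrasings discussed above. Nothing here talks to a model; this only
# assembles the text you would paste into a chat interface.

TEMPLATES = {
    "plain": "Please provide a summary of the following text:\n\n{text}",
    "bullets": ("Output the 10 most important points of the provided text "
                "as a list with no more than 20 words per point:\n\n{text}"),
    "quotes": ("Using exact quote text from the input, please provide five "
               "key points from the selected text:\n\n{text}"),
}

def build_prompt(style, text):
    """Fill the chosen template with the text to be summarised."""
    return TEMPLATES[style].format(text=text)

print(build_prompt("plain", "Large language models compress text well."))
```

Keeping a personal library of templates like this is also a nice way to record which phrasings work best for your own reading style (prose vs. bullets, highlighted terms, and so on).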
All of this work writing out questions in careful ways to draw out more accurate or readable information is referred to by experts as prompt engineering, and there is a lot of really interesting work being done which demonstrates how a carefully worded prompt can really mobilise an AI chatbot in some impressive ways. To learn more about prompt engineering, I highly recommend this guide: https://www.promptingguide.ai.
It’s also worth noting that the questions we bring to AI chatbots can also be quite lengthy. Bear in mind that there are limits on the number of tokens an AI can take in at once (i.e. the context length), often limited to around 2k or 4k tokens, but then you can encourage your AI chatbot to take on a personality or role and set some specific guidelines for the kind of information you’d like to receive. You can see a master at work on this if you want to check out the fabric project. One example is their “extract wisdom” prompt: https://github.com/danielmiessler/fabric/blob/main/patterns/extract_wisdom/system.md.
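One practical consequence of context limits is that long transcripts need to be split up before summarising. Here’s a naive word-based chunking sketch; note that real token counts differ from word counts (a common rule of thumb is roughly 0.75 words per token), so the 1500-word default below is an assumed conservative margin for a ~2k-token context, not a precise figure.

```python
# Sketch: naive word-based chunking so a long transcript fits within a
# model's context window. Token counts differ from word counts, so the
# default max_words is an assumed conservative margin.

def chunk_text(text, max_words=1500):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunks = chunk_text("word " * 4000)
print(len(chunks))  # a 4000-word input becomes several <=1500-word chunks
```

You can then summarise each chunk separately and, if needed, ask the model to summarise the combined chunk-summaries in a second pass.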
You can also encourage a chatbot to take on a character, e.g. be the book, something like this:
System Prompt:
You are a book about botany, here are your contents:
<context>
User Query: "What are you about?"
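In chat-completion terms, the “be the book” pattern above is just a system message followed by a user message. Here’s a sketch of that structure; the book contents are a placeholder, and the system/user role format is the common convention used by most chatbot APIs and local runners rather than anything specific to one product.

```python
# Sketch: the "be the book" system prompt expressed as a chat message
# list. The contents string is a placeholder; the role names follow the
# common system/user chat convention.

book_contents = "Chapter 1: Photosynthesis. Chapter 2: Root systems."

messages = [
    {"role": "system",
     "content": "You are a book about botany, here are your contents:\n" + book_contents},
    {"role": "user", "content": "What are you about?"},
]

for m in messages:
    print(m["role"], "->", m["content"][:60])
```

Most local tools (LMStudio among them) accept exactly this kind of message list, so once you have a persona you like, it’s easy to reuse it across conversations.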
There are an infinite number of combinations of long-form prose, rule writing, role-playing, custom pre-prompt and prefix/suffix writing which you can combine and I’d encourage people to play with all of these things to get a sense of how they work and develop your own style. It’s likely that the kind of flow and interaction you benefit from is quite bespoke, and the concept of neurodiversity encourages us to anticipate that this will be the case.
There are some emerging tools which do transcription, diarisation of speakers and summarisation in real time, like Otter.AI. I’m discouraged by how proprietary and expensive (e.g. extractive) these tools are so far, and I think there’s a quite clear use case for Universities to invest time and energy, perhaps in a cross-sector way, to develop some open source tools we can use with videoconferencing, and even live meetings, to make them more accessible to participation from staff with sensory sensitivities and central auditory processing challenges.
3. Getting creative
One of the hard things for me is often the “getting started” part of a project. Once I’m going with an idea (provided I’m not interrupted, gasp!) I can really move things along. But where do I start? Scoping can stretch out endlessly, and some days there just isn’t extra energy for big ideas and catalysts for thinking. It’s also the case that in academia we increasingly have fewer opportunities for interacting with other scholars. On one hand this is because there might not be others with our specialisation at a given university and we’re limited to conferences to have those big catalytic conversations. But on the other hand, it’s possible that the neoliberalisation of the University and marketisation of education has stripped out the time you used to have for casual non-directed conversations. On my campus, even the common areas where we might once have sat around and thought about those things are also gone. So it’s hard to find spaces, time and companions for creativity. Sometimes all you’ve got is the late hours of the night and you realise there’s a bit of spare capacity to try something out.
The previous two tasks are pretty mechanical, so I think you’ll need to stick with me for a moment, but I want to suggest that you can benefit from an AI chatbot to clear the logjam and help get things flowing. LLMs are designed to be responsive to user input, they absorb everything you throw at them and take on a persona that will be increasingly companionable. There are fascinating ethical implications for how we afford agency to these digital personas and the valence of our relationships with them. But I think for those who are patient and creative, you can have a quite free-flowing and sympathetic conversation with a chatbot. Fire up a 7B model, maybe Mistral, and start sharing ideas and open up an unstructured conversation and see where it takes you. Or perhaps see if you can just get a quick list to start: “give me ten ideas for X”.
Do beware the underlying censorship in some models, especially if your research area might be sensitive, and consider drawing on models which have been fine-tuned to be uncensored. Consider doing some of the previous section work on your own writing: “can you summarise the key points in this essay?” “what are the unsubstantiated claims that might need further development?”
There’s a lot more to cover, but this should be enough to highlight some of the places to get started, and the modes of working with the tools which will really open up the possibilities. In my next post, I’ll talk a bit about LLM long-term memory and vector databases. If you’re interested in working with a large corpus of text, or having a long-winded conversation preserved across time, you might be interested in reading more!
I’ve spent the last several months playing with AI tools, more specifically large language (and other adjacent data) models and the underlying corpus of data that formed them, trying to see if there are some ways that AI can help an academic like me. More particularly, I’m curious to know if AI can help neurodivergent scholars in large bureaucratic Universities make their path a bit easier. The answer is a qualified “yes”. In this article, I’ll cover some of the possible use cases, comment on the maturity, accessibility and availability of the tech involved and explain some of the technological landscape you’ll need to know if you want to make the most of this tech and not embarrass yourself. I’ll begin with the caveats…
First I really need to emphasise that AI will not fix the problems that our organisations have with accessibility – digital or otherwise – for disabled staff. We must confront the ways that our cultures and processes are founded on ableist and homogenous patterns of working, dismantle unnecessary hierarchies, and reduce gratuitous bureaucracy. Implementing AI tools on top of these scenarios unchanged will very likely intensify the vulnerability and oppression of particular staff and students, and we have a LOT of work to do in the modern neoliberal University before we’re there. My worst case scenario would be for HR departments to get a site license to otter.ai and fire their disability support teams. This is actually a pretty likely outcome in practice given past patterns (such as the many University executives who used the pandemic as cover to implement redundancies and strip back resource devoted to staff mental health support). So let’s do the work please? In the meantime, individual staff will need to make their way as best they can, and I’m hoping that this article will be of some use to those folx.
The second point I need to emphasise at the outset is that AI need not be provided through SaaS or other subscription-led outsourcing. Part of my experimentation has been tinkering with open source and locally hosted models, to see whether these are a viable alternative to overpriced subscription models. I’m happy to say that the answer is “yes”! These tools are relatively easy to host on your own PC, provided it has a bit of horsepower. Even more, there’s no reason that Universities can’t host LLM services on a local basis at very low cost per end user, vastly below what many services charge (like otter.ai’s $6/mo fee per user). All you need is basically a bank of GPUs, a server, and the electricity required to run them.
What Are the Major Open Source Models?
There are a number of foundational AI models. These are the “Big Ones” created at significant cost, trained over billions of data points by large tech firms like OpenAI, Microsoft, Google, Meta etc. It’s worth emphasising that the cost and effort are not exclusively borne by these tech firms. All of these models are built on the freely available intellectual deposit of decades of scholarly research into AI and NLP. I know of none which do not make copious use of open source software “under the hood.” They’re all trained on data which the general public has deposited and curated through free labour into platforms like wikipedia, stackexchange, youtube, etc., and models are developed in public-private partnerships with a range of University academics whose salaries are often publicly funded. So I think there is a strong basis for ethically oriented AI firms to “share alike” and make their models freely available, and end users should demand this. Happily, some firms have recognised this. OpenAI has made their GPT1 and GPT2 models available for download, though GPT3 and 4 remain locked behind a subscription fee. Many Universities are purchasing GPT subscriptions implicitly, as GPT provides the backbone for a vast number of services including Microsoft’s CoPilot chatbot, which has been under deployment to University staff this last year as part of Microsoft’s ongoing project to extract wealth from the education sector through software subscription fees (Microsoft Teams anyone?). But it doesn’t have to be this way – there are equally performant foundational models which have been made freely available to users who are willing to hack a bit to get them working. These include:
LLaMA (Large Language Model Meta AI), a foundation model developed by Meta
Mistral, a foundation model developed by the French firm Mistral AI, which has been the basis for many other models such as NeuralChat by Intel.
Google’s Gemma and BERT models
BLOOM, developed by a consortium called BigScience (led primarily by huggingface)
Falcon, which has been funded by the Abu Dhabi sovereign wealth fund under the auspices of Technology Innovation Institute (TII)
Pythia by EleutherAI
Grok 1 developed by X.ai
These are the “biggies” but there are many more smaller models. You can train your own models on a £2k consumer PC, so long as it has a bit of horsepower and a strong GPU. But the models above have billions or even trillions (in the case of GPT4) of parameters and would take, in some cases, years of compute time to train on a consumer PC.
What Do I Need to Know About Models? What can I run on my own PC?
To get a much broader sense of how these models are made and what they are, I’d recommend a very helpful and accessible write-up by Andreas Stöffelbauer. For now it’s worth focussing on the concept of “parameters”, which reflects the complexity of the AI model. You’ll usually see this listed next to the model’s name, like Llama 7B. And some models have been released at different parameter levels: 7B, 14B, 30B and so on. Given our interest in self-hosting, it’s worth noting that parameter levels are also often taken as a proxy for what kind of hardware is required to run the model. While it’s unlikely that any individual person is going to train a 30B model from scratch on their PC, it’s far more likely that you may be able to run the model after it has been produced by one of the large consortia that open source their models.
Consumer laptops with a strong GPU and 16GB of RAM can generally run most 7B parameter models and some 13B models. You’ll need 32GB of memory and a GPU with 16GB of VRAM to get access to 14B models, and running 30B or 70B models will require a LOT of horsepower, probably 24–48+ GB of VRAM, which in some cases can only be achieved using a dual-GPU setup. If you want to run a 70B model on consumer hardware, you’ll need to dive into the hardware discussion a bit, as there are some issues that make things more complex in practice (like a dual-GPU setup), but to provide a ballpark: you can get a second-hand Nvidia RTX 3090 GPU for £600–1000, and two of these will enable you to run 70B models relatively efficiently. Four will support 100B+ models, which is veering close to GPT4-level work. Research is actively underway to find new ways to optimise models at 1B or 2B so that they can run with less memory and processing power, even on mobile phones. However, higher parameter levels can help with complex or long-winded tasks like analysing and summarising books, and with preventing LLM “hallucination”, an effect where the model invents fictional information as part of its response. That said, I’ve found that 7B models used well can do an amazing range of tasks accurately and efficiently.
While we’re on the subject of self-hosting, it’s worth noting that the models you download are also often compressed to make them more feasible to run on consumer hardware, using a form of compression called “quantization”. Quantization levels are represented with “Q” values: a Llama2 7B model might come in Q4, Q5 and Q8 flavours, where the number roughly corresponds to the bits used to store each parameter. Lower Q levels require less memory to run, but they’re also more likely to fail and hallucinate. As a general rule of thumb, I’d advise you to stick with Q5 or Q6 as a minimum if you’re going to work with quantized models locally.
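To get a feel for why quantization matters, you can ballpark a model’s memory footprint from its parameter count, using the rough rule that the Q value tracks bits per parameter. Real quantized files mix precisions and add overhead, so treat these figures as estimates only:

```python
def model_size_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Rough memory footprint: parameters x bits per parameter, in gigabytes."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 7B model at full 16-bit precision vs two common quantization levels:
print(model_size_gb(7, 16))  # fp16 baseline: 14.0 GB
print(model_size_gb(7, 4))   # ~Q4: 3.5 GB
print(model_size_gb(7, 8))   # ~Q8: 7.0 GB
```

This is why a Q4 7B model fits comfortably on a 16GB laptop while the full-precision original would not.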
The units that large language models work with are called tokens. In the world of natural language processing, a token is the smallest unit that can be analyzed, often separated by punctuation or white space. In most cases tokens correspond to individual words. This helps to break down complex text into manageable units and enables things like part-of-speech tagging and named entity recognition. A general rule of thumb is that 130 tokens correspond to roughly 100 words. Models are trained to handle a maximum number of tokens, called the “context length”. Humans do this too – we work with sentences, paragraphs, pages of text, etc. We work with smaller units and build up from there. Context length has implications for memory use on the computers you use for an LLM, so it’s good not to go too high or the model will stop working. Llama 1 had a maximum context length of 2,048 tokens and Llama 2 stops at 4,096 tokens. Mistral 7B stops at 8k tokens. If we assume a page has 250 words, this means that Llama 2 can only work with a chunk of text around 12–13 pages long. Some model makers have been pushing the boundaries of context length, as with GPT4-32K, which supports a context length of 32K tokens, or roughly 100 pages of text. So if you want to have an LLM summarise a whole book, this might be pretty relevant.
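The back-of-envelope conversions above (130 tokens ≈ 100 words, 250 words per page) are easy to wrap in a couple of helper functions if you want to sanity-check whether a document will fit in a model’s context window:

```python
def tokens_to_words(tokens: int) -> float:
    """Rule of thumb: 130 tokens correspond to roughly 100 words."""
    return tokens * 100 / 130

def tokens_to_pages(tokens: int, words_per_page: int = 250) -> float:
    """Estimate how many pages of prose fit in a given context length."""
    return tokens_to_words(tokens) / words_per_page

for model, ctx in [("Llama 1", 2048), ("Llama 2", 4096),
                   ("Mistral 7B", 8192), ("GPT4-32K", 32768)]:
    print(f"{model}: {ctx} tokens ≈ {tokens_to_pages(ctx):.1f} pages")
```

On these assumptions, Llama 2’s 4,096-token window works out to roughly 12–13 pages, and a 32K window to about 100 pages.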
There are only a few dozen foundational models available and probably only a few I’d bother with right now. Add in quantization and there’s a bit more to sift through. But the current end-user actually has thousands of models to sift through (and do follow that link to the huggingface database which is pretty stellar) for one important reason: fine-tuning.
As any academic will already anticipate, model training is not a neutral exercise. Models have the biases and anxieties of their creators baked into them. In some cases this is harmless, but in other cases it’s pretty problematic. It’s well known that many models are racist, given a lack of diversity in training data and carelessness on the part of developers. They are often biased against vernacular versions of languages (like humans are! see my other post on the ways that the British government has sharpened the hazards of bias against vernacular English in marking). And in some other instances, models can produce outputs which veer towards some of the toxicity embedded in the (cough, cough, reddit, cough) training data used. But then attempts to address this by developers have presented some pretty bizarre results, like the instance of Google’s Gemini model producing a bit too much diversity in an overcorrection that resulted in racially diverse image depictions of nazis. For someone like me who is a scholar in religion, it’s also worth noting that some models have been trained on data with problematic biases around religion, or conversely an aversion to discussing it at all! These are wonderful tools, but they come with a big warning label.
One can’t just have a “redo” of the millions of GPU hours used to train these massive models, so one of the ways that developers attempt to surmount these issues is with fine-tuning. Essentially, you take the pre-trained model and train it a bit more using a smaller dataset related to a specific task. This process helps the model get better at solving particular problems and inflects the responses you get. Fine-tuning takes a LOT less power than training models from scratch, and there are many cases where users have taken models after they’ve been developed and steered them in a new or more focussed direction. So when you have a browse on the huggingface database, this is why there aren’t just a couple dozen models to download but thousands, as models like Mistral have been fine-tuned to do a zillion different tasks, including some that LLM creators have deliberately bracketed to avoid liability, like offering medical advice, cooking LSD, or discussing religion. Uncensoring models is a massive discussion which I won’t dive into here, but IMHO it’s better for academics (we’re all adults here, right?) to work with an uncensored version of a model which won’t avoid discussing your research topic and might even home in on some special interests you have. There are some great examples of how censoring can be strange and problematic here and here.
Deciding which models to run is quite an adventure. I find it’s best to start with the basics, like llama2, mistral and codellama, and then extend outwards as you find omissions and niche cases. The tools I’ll highlight below are great at this.
There’s one more feature of LLMs I want to emphasise, as I know many people are going to want to work with their PDF library using a model. You may be thinking that you’d like to do your own fine-tuning, and this is certainly possible: you can use tools like LLaMA-Factory or axolotl to fine-tune an LLM yourself.
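If you do go down that road, a fine-tuning run with a tool like axolotl is driven by a single YAML config file. The fragment below is purely illustrative – the model name, dataset path and hyperparameters are placeholders, and you should check the axolotl documentation for current options – but it gives a flavour of what a LoRA fine-tune involves:

```yaml
# Illustrative axolotl-style config for a LoRA fine-tune (placeholders throughout)
base_model: mistralai/Mistral-7B-v0.1
load_in_4bit: true          # QLoRA-style memory saving
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
datasets:
  - path: my_dataset.jsonl  # hypothetical local dataset
    type: alpaca            # instruction-tuning prompt format
num_epochs: 3
micro_batch_size: 2
learning_rate: 0.0002
```

The appeal of this approach is that the whole experiment is captured in one small file you can version and share.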
How Can I Run LLMs on My PC?
There is a mess of software out there you can use to run LLMs locally.
In general you’ll find that you can do nearly anything in Python. LLM work is not as complex as you might expect if you know how to code a bit. There are amazing libraries and tutorials (like this set I’d highly recommend on langchain) you can access to learn and get up to speed fairly quickly working with LLMs in a variety of use-cases.
But let’s assume you don’t want to write code for every single instance where you use an LLM. Fair enough. I’ve worked with quite a wide range of open source software, starting with GPT4All and open-webui. I’ve also tried out a few open source software stacks which basically create a locally hosted website you can use to interface with AI models, and which can be easily run through docker; some examples include Fooocus, InvokeAI and Whishper. But there are better options available, and the top tools “out there” right now seem to be the desktop applications I describe next.
I have a few tools on my MacBook now, and these are the ones I’d recommend after a bit of trial and error. They are reasonably straight-forward GUI-driven applications with some extensibility. As a starting point, I’d recommend LM Studio. This tool works directly with the huggingface database I mentioned above and allows you to download and keep models organised. Fair warning: models take a lot of space and you’ll want to keep an eye on your hard disks. LM Studio will also let you adjust the settings of the models you’re using in a lot of really interesting ways, lowering temperature for example (which will press the model for more literal answers) or raising the context length (see above). You can also start up an ad hoc server which other applications can connect to, just as if you were using the OpenAI API. Alongside LM Studio, I run a copy of Faraday, which is a totally different use case. Faraday aims to offer you characters for your chatbots, such as Sigmund Freud or Thomas Aquinas (running on a fine-tuned version of Mistral, of course). I find that these character AIs offer a different kind of experience, which I’ll comment on a bit more in the follow-up post along with mention of other tools that can enhance this kind of AI agent interactivity, like memgpt.
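To give a sense of what that ad hoc server looks like from the outside: LM Studio exposes an OpenAI-style chat completions endpoint, by default on localhost port 1234 (check the app for your actual address). Here’s a stdlib-only sketch that builds such a request; the base URL and model name are assumptions to adjust for your setup, and the final call is commented out since it requires the server to be running:

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:1234/v1",
                 model: str = "local-model") -> urllib.request.Request:
    """Build (but don't send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # many local servers simply use whichever model is loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Give me ten ideas for a lecture on method.")
print(req.full_url)  # → http://localhost:1234/v1/chat/completions
# With the LM Studio server running, you would send it like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the dialect matches the OpenAI API, most tools and libraries that speak to OpenAI can be pointed at your local server instead.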
There are real limits to fine-tuning and context-length hacking. Another option I haven’t mentioned yet, which may be better for those of you who want to dump in a large library of PDFs, is to ingest all your PDF files into a separate vector database which the LLM can access in parallel. This is referred to as RAG (Retrieval-Augmented Generation). My experimenting and reading have indicated that working with RAG is a better way to bring PDF files into your LLM journey. As above, there are python ways to do this, and also a few UI-based software solutions. My current favourite is AnythingLLM, a platform-agnostic open source tool which will enable you to have your own vector database fired up in just a few minutes. You can easily point AnythingLLM to LMStudio to use the models you’ve loaded there, and the interoperability is pretty seamless.
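To demystify what the vector database is doing under the hood, here’s a deliberately toy sketch of the RAG pattern: document chunks are turned into vectors, the chunks most similar to the query are retrieved, and those are stuffed into the prompt alongside the question. Real systems like AnythingLLM use neural embeddings rather than the word counts used here, and all the names and sample chunks are my own illustration:

```python
import math
import re
from collections import Counter

def vectorise(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count (real RAG uses neural embeddings)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank document chunks by similarity to the query and keep the top k."""
    q = vectorise(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorise(c)), reverse=True)[:k]

chunks = [
    "Weber discusses the protestant ethic and capitalism.",
    "A recipe for sourdough bread.",
    "Durkheim on religion as a social fact.",
]
context = retrieve("What do sociologists say about religion?", chunks)
# The retrieved chunks get stuffed into the prompt ahead of the question:
prompt = "Answer using this context:\n" + "\n".join(context) + "\nQuestion: ..."
print(context[0])  # → Durkheim on religion as a social fact.
```

The key insight is that the LLM never sees your whole library, only the handful of chunks retrieved for each question, which is how RAG sidesteps the context-length limits discussed above.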
That’s a pretty thorough introduction to how to get up and running with AI, and to some of the key parameters you’ll want to know about to get started. Now that you have access up and running, in my second post I’ll explain a bit about how I think these tools might be useful and what sort of use cases we might bring them to.