Greetings from Read Max HQ! In today’s issue, responding to some good recent human-generated writing on A.I.-generated writing.
A reminder: Read Max is a subscription newsletter whose continued existence depends on the support and generosity of paying readers. If you find the commentary at all enlightening, entertaining, or otherwise important to your weekly life, please consider upgrading to a paid subscription, an act which will allow you to enjoy Read Max’s weekly recommendations for overlooked books, movies, and music, and give you a sense of pride and accomplishment in supporting independent journalism, such as it is.
Will A.I. writing ever be good?
Sam Kriss has an excellent new piece in The New York Times Magazine examining the actual style of “A.I. voice.” It’s a great rundown of the many formal quirks of A.I.-generated text, and I’m glad that Kriss was able to appreciate the strangeness of even the ever-more-coherent writing produced by frontier large language models, which is “marked by a whole complex of frankly bizarre rhetorical features”:
Read any amount of A.I.-generated fiction, you’ll instantly notice an entirely different vocabulary. You’ll notice, for instance, that A.I.s are absolutely obsessed with ghosts. In machine-written fiction, everything is spectral. Everything is a shadow, or a memory, or a whisper. They also love quietness. For no obvious reason, and often against the logic of a narrative, they will describe things as being quiet, or softly humming.
This year, OpenAI unveiled a new model of ChatGPT that was, it said, “good at creative writing.” As evidence, the company’s chief executive, Sam Altman, presented a short story it wrote. In his prompt, he asked for a “metafictional literary short story about A.I. and grief.” The story it produced was about 1,100 words long; seven of those words were “quiet,” “hum,” “humming,” “echo” (twice!), “liminal” and “ghosts.” That new model was an early version of ChatGPT-5. When I asked it to write a story about a party, which is a traditionally loud environment, it started describing “the soft hum of distant conversation,” the “trees outside whispering secrets” and a “quiet gap within the noise.” When I asked it to write an evocative and moving essay about pebbles, it said that pebbles “carry the ghosts of the boulders they were” and exist “in a quiet space between the earth and the sea.” Over 759 words, the word “quiet” appeared 10 times. When I asked it to write a science-fiction story, it featured a data-thief protagonist called, inevitably, Kael, who “wasn’t just good—he was a phantom,” alongside a love interest called Echo and a rogue A.I. called the Ghost Code.
Even as L.L.M.s get better at producing fluid and plausibly human text, these persistent stylistic tics remain interestingly abrasive--in a single short answer, presented to you in a vacuum, A.I. text is as smooth as can be, but when you’re confronted with an overwhelming amount of it, the strangeness that’s been fine-tuned out really begins to re-assert itself. Kriss argues (in part) that one reason A.I. writing remains so (in aggregate) weird and waffly is that L.L.M.s “can’t ever actually experience the world”:
This puts a lot of the best writing techniques out of reach. Early in “To the Lighthouse,” Virginia Woolf describes one of her characters looking out over the coast of a Scottish island: “The great plateful of blue water was before her.” I love this image. A.I. could never have written it. No A.I. has ever stood over a huge windswept view all laid out for its pleasure, or sat down hungrily to a great heap of food. They will never be able to understand the small, strange way in which these two experiences are the same. Everything they know about the world comes to them through statistical correlations within large quantities of words.
A.I. does still try to work sensory language into its writing, presumably because it correlates with good prose. But without any anchor in the real world, all of its sensory language ends up getting attached to the immaterial. In Sam Altman’s metafiction about grief, Thursday is a “liminal day that tastes of almost-Friday.” Grief also has a taste. Sorrow tastes of metal. Emotions are “draped over sentences.” Mourning is colored blue. […] This is a cheap literary effect when humans do it, but A.I.s can’t really write any other way. All they can do is pile concepts on top of one another until they collapse.
But I wonder if it’s true that the lack of a “world model” is what pushes L.L.M. text toward metaphorical drivel: It seems just as likely that chatbots over-rely on this kind of sensory-immaterial conjunction because, as Kriss says, it’s a “cheap literary effect” that impresses people passing superficially over a text--exactly the kind of fake-deep crowd-pleaser for which L.L.M. output is being fine-tuned.
These satisfyingly plausible folk-technical explanations come up often when people are trying to describe the limitations of A.I.-generated writing. One well-rehearsed account blames A.I.’s stylistically uninteresting output on next-token prediction: Large language models, this argument goes, intrinsically cannot generate truly great writing, or truly creative writing, because they’re always following paths of least resistance, regurgitating the most familiar and most probable formulations. This is a satisfying argument, not least because it’s easily comprehensible, and for all we know it’s even a true one.
But we don’t actually know that it’s right, because we’ve never really tried to make an L.L.M. that’s great at writing. I appreciated Nathan Lambert’s recent piece at Interconnects, “Why AI writing is mid,” which argues that the main roadblocks to higher-quality writing are as much economic as technical: There simply isn’t enough demand for formally ambitious (or even particularly memorable) writing to justify the expense and resources necessary to train a model to produce it.
Some model makers care a bit about this. When a new model drops and people rave about its creative writing ability, such as Moonshot AI’s Kimi K2 line of models, I do think the team put careful work into the data or training pipelines. The problem is that no model provider is remotely ready to sacrifice core abilities of the model such as math and coding in pursuit of meaningfully better writing models.
There are no market incentives to create this model — all the money in AI is elsewhere, and writing isn’t a particularly lucrative market to disrupt. An example is GPT-4.5, which was by all reports a rather light fine-tune, but one that produced slightly better prose. It was shut down almost immediately after its launch because its large size made it too slow and economically unviable.
As Lambert points out, much of what we dislike about A.I.-generated text from a formal perspective--it’s generally cautious, inoffensive, anodyne, predictable, neutral, unmemorable and goes down smooth--is a product not of some inherent L.L.M. “voice” but of the training and fine-tuning processes imposed by A.I. companies, which are incentivized to make their chatbots sound as un-annoying and bland as possible. No one is out there actually trying to create Joycebot (or whatever), and for good reason: The saga of Microsoft’s Bing and its “alter-ego,” Sydney, is in a broad sense the best fictional story yet produced by an L.L.M. chatbot, but it was also an unmitigated disaster for the company.
To the extent that their output is pushed into “mid-ness” by economic circumstance, L.L.M.s are not unprecedented. In a real sense, “why A.I. writing is mid” and “why most professional writing is mid” have the same explanation: “Good writing,” whether authored wholly by humans or generated by an L.L.M., requires capacious resources (whether in time and education and editing or in compute and training and fine-tuning) to create an idiosyncratic (and likely polarizing) voice for which there usually isn’t sufficient economic demand.1
I sometimes think it’s more helpful to think about large language models as equivalent not to individual writers specifically but to whole systems or institutions of which writing is an end-product. A given L.L.M. is less akin to, say, a replacement-level magazine writer than it is to “the entire magazine industry at its peak,” if you imagine the magazine industry as a giant, complex, unpredictable machine for producing a wide variety of texts. Just as that industry, as a whole, once was able to generate text to a certain degree of predictability and at a relatively high floor of quality, to varying client specifications and structured by its own internal systems and incentives, so too can Claude or ChatGPT.2
I bring up magazines in particular as a point of comparison because I’ve been struck for a while by the similarity between the voice deployed in the latest generation of chatbots and what a friend calls “F.O.B. voice,” or the smooth, light, savvy, vaguely humorous tone that once reigned in magazine front-of-book sections:
In better times for the magazine industry, there was higher demand for a particular kind of glib (but not actually humorous), knowing (but not actually smart), fluid (but not actually stylish) text--what my friend Mahoney calls “F.O.B. voice,” for front of book, the pre-features section of a magazine for which, depending on the magazine, editors might end up cranking out 150-to-500-word nuggets of smooth blurb prose about new books, movies, news stories, gadgets, restaurants, or whatever.
F.O.B. voice is as a rule smooth and clichéd and often only semi-coherent, because it needs to be reproduced quickly and without much effort by overworked writers and editors on deadline. It’s also superficially impressive to most readers, thanks both to the packaging that surrounds it and to their standards for impressiveness, which are quite a bit lower than professionals’. For all these reasons, and because ChatGPT has obviously been trained on archives of magazines written in F.O.B. voice, it’s unsurprising that it takes naturally to writing in F.O.B. voice.
“Mid,” as the magazine industry knew, and as L.L.M.s “know,” is a rewarding zone to be in: It’s what people find easiest to consume, and what advertisers feel most comfortable appearing adjacent to. Of course, the magazine industry generated more than just reams and reams of smooth placeholder text; it also produced New Journalism, the modern short story, “Eichmann in Jerusalem,” the Hillary planet, etc. But these were positive externalities, not inevitabilities, driven more by cultural prerogatives than by financial necessity. To get something similar from an L.L.M. would likely require a lot of not-necessarily-profitable groundwork.
Another way of thinking about it might be: A parallel timeline where an A.I. was pumping out great novels would be an improvement on our own, because it’d suggest that there was enough demand for genuinely great novels to make it worth training an L.L.M. to produce them.
Not to get too whatever about it, but it’s worth noting that magazine articles (or, even more so, Hollywood movies) are the products of many humans operating within larger systems and frameworks. Do we think of those articles as “magazine industry-generated,” or of major-studio movies as “Hollywood-generated”? I’m not saying we should, necessarily, but I suspect that if and when A.I. is able to create great (or even non-slop) writing, we will come to think of it less as “A.I.-generated” and more as authored by the prompter, or by the prompter in concert with the model creators at various levels.


