🍷💬 language models should come with tasting notes: “chatGPT (2022): fresh northern california terroir, mostly reserved but with an informational overtone, subtle notes of peach, vanilla, bias; overall a good value for the price”
The premier publication for discerning model connoisseurs. Coming Fall 2024 to newsstands near you.
AI Spectator started with a joke that wasn’t really a joke: “Language models should come with tasting notes.” The deeper point was that we lacked a good vocabulary to discuss these models. They look like technical goods, but their conversational and generative qualities make them act more like cultural ones. Ultimately models are built on a foundation of choices made by their creators—decisions of architecture, data, and fine-tuning that help determine the model’s feel. And it is a “feel.” Ask anyone who spends a lot of time playing with these AI tools and they’ll tell you that the only real way to evaluate them is to get your hands on them and start prompting.
However, the current paradigm for model evaluation is reflexively tied to quantitative benchmarks, a legacy of artificial intelligence’s computer science roots. If you can't produce the chart showing that you're better on MMLU, BIG-bench, etc., you're sunk. This leaves us poorly equipped to grapple with most of what makes LLMs truly interesting: conversation, coding, creative production. These are precisely the kinds of expressions that are poor fits for a purely quantitative paradigm. They are matters of taste. LLMs are aesthetic artifacts, very specific expressions of the taste of their creators.
What we need is the Cahiers du Cinéma, the London Review of Books, the Artforum of Artificial Intelligence. We need deeply personal, subjective, narrative reviews of new LLMs written like they were addressing a new film, book, fashion collection, restaurant, or vintage. This is the guiding vision behind AI Spectator. We aim to be a home for a new kind of AI criticism, one that embraces the subjective and the aesthetic, that treats LLMs as cultural creations to be interpreted and debated rather than just technical objects to be benchmarked.
Our contributors will be individuals who have put in the time to develop a connoisseur's palate for language models. They'll bring an informed but personal perspective, evaluating each new model with attention to the subtle flavors, the surprising combinations, and the overall gestalt that emerges from countless small decisions.
If this resonates with you, we want to hear from you. AI Spectator is looking for contributors for our first edition, slated for release this fall. We're seeking pitches for essays that offer a distinctly personal and aesthetic take on specific LLMs or AI systems. Tell us what models have captivated you, frustrated you, expanded your sense of what's possible. Write with conviction, with style, with an eye towards the ineffable qualities that can't be captured in a benchmark.
Artificial intelligence deserves a better class of critic.
Respectfully,
Tim Hwang & Noah Brier,
Founders/Publishers