Word Vectors Make All the Différance

A big thing in machine learning-driven artificial intelligence for language processing right now is word embedding vectors. The word cat is represented to the computer not as a string of letters c-a-t, nor as an item in a vocabulary (which would boil down to an index, i.e. an integer) but rather as a point in a high-dimensional continuous vector space. Not just any point though. One trains a language model on a large corpus of English text and in the course of doing so produces these “word embeddings” as a side effect. They are an intermediate data structure employed on the way towards the standard language modeling task of predicting which word is likely to appear near which other words. It turns out that treating these vectors as the proxy for the meaning of the words they correspond to is helpful in many other natural language tasks, so instead of throwing them out you set them aside for future use, the way a chef would set aside chicken bones to make a broth.
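As a concrete (if cartoonishly small) sketch of the idea, here is a count-based cousin of the real thing: build a word/context co-occurrence matrix from a toy corpus, then factor it with SVD so each word gets a dense vector. This is not the actual GloVe or word2vec training procedure, and the corpus and dimensionality are invented for illustration, but it shows how the vectors fall out as a byproduct of modeling which words appear near which other words.

```python
# Toy count-based word embeddings (in the spirit of, but not identical
# to, GloVe): co-occurrence counts + truncated SVD. Corpus is made up.
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Truncated SVD turns each word's count row into a dense vector.
u, s, _ = np.linalg.svd(counts)
embeddings = u[:, :4] * s[:4]      # a 4-dimensional "embedding space"
print(embeddings[index["cat"]])    # cat as a point in that space
```

Each row of `embeddings` is one word's point in the space; the numbers themselves are uninterpretable, exactly as described above.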


In the 300-dimensional GloVe embedding of English, for example, the word cat is represented by the numbers -0.15067, -0.024468, -0.23368, -0.23378, -0.18382, and so on for 295 more dimensions. This is in some crude way what cat “means”. The numbers corresponding to any given word are completely uninterpretable, but taken as a whole the system makes a certain degree of sense. For example, the words with points in the vector space closest to cat (kitten, dog, kitty, pet, feline) refer to things obviously similar to cats.
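“Closest” here means cosine similarity between vectors. A minimal sketch, with made-up three-dimensional vectors standing in for the real 300-dimensional GloVe ones:

```python
# Finding a word's nearest neighbors by cosine similarity.
# These tiny vectors are invented for illustration only.
import numpy as np

vectors = {
    "cat":    np.array([0.9, 0.8, 0.1]),
    "kitten": np.array([0.85, 0.75, 0.2]),
    "dog":    np.array([0.7, 0.9, 0.1]),
    "car":    np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

neighbors = sorted(
    (w for w in vectors if w != "cat"),
    key=lambda w: cosine(vectors["cat"], vectors[w]),
    reverse=True,
)
print(neighbors)  # ['kitten', 'dog', 'car']
```

With real GloVe vectors, the same loop over the whole vocabulary is what surfaces kitten, kitty, pet, and feline.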


We can even do basic semantic “arithmetic”. If we subtract the vector for man from the vector for king we get a point in the space relatively close to the result of subtracting woman from queen. Meaning is captured not by the individual pieces, but in the structure of the whole.
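In code, the “arithmetic” is literal arithmetic on arrays, followed by a nearest-word lookup. These two-dimensional vectors are contrived so that the relation holds exactly; with real embeddings, king − man + woman only lands near queen, and you take the closest word.

```python
# The king/queen analogy as vector arithmetic. The 2-d vectors are
# contrived for illustration (one "royalty" axis, one "gender" axis).
import numpy as np

v = {
    "king":  np.array([0.9, 0.9]),   # royal, male
    "queen": np.array([0.9, 0.1]),   # royal, female
    "man":   np.array([0.1, 0.9]),   # common, male
    "woman": np.array([0.1, 0.1]),   # common, female
}

# king - man ≈ queen - woman, i.e. king - man + woman ≈ queen.
target = v["king"] - v["man"] + v["woman"]
nearest = min(v, key=lambda w: np.linalg.norm(v[w] - target))
print(nearest)  # queen
```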


What’s more, this structure is essentially differential. The only thing distinctive about the vector embedding of cat is that it is some sequence of numbers and not another. Crucially it is different from the sequences for dog, queen, man, and so forth. Different, yet intermingled with them. By virtue of being mapped into the same space, every word bears a relationship to every other word, and this relationship is itself no accident. It is, as I said above, the result of having a computer churn through an enormous body of English text, the product of millions upon millions of calculations. Imagine shaking a vast multi-dimensional numeric matrix like a snow globe until its individual elements arrange themselves in a shape that allows the statistical patterns peculiar to the English language to pass through with relative ease. You could never work backwards from the vector embeddings to the training text, but nevertheless you know that each word’s vector is where it is because of its interplay with the other words in actual living language. Each point in the space bears the invisible trace of all the others.

Have you ever opened a dictionary to look up an unfamiliar word and thought, why, this definition is just composed of other words, all of which also appear in this dictionary? And if you looked up their definitions, they would just be other words, and so on. This can produce a vertiginous feeling, the realization that language has no beginning and no outside vantage point. Word vector space is like that, except even more vertiginous because it is a continuous space. Your mind naturally turns to imagining a topology of this space. There could be surfaces and manifolds. About each word’s point we cannot help but imagine a little hypersphere, its semantic penumbra. There will be an infinite number of points within that hypersphere that do not correspond to any English word, but nevertheless could correspond to a word, if only a word were to happen to have appeared in such-and-such a set of contexts. If we were to go back and insert our novel term into our training texts, would it make sense? Would it express a novel concept, but one nevertheless similar to the concepts near it in the embedding space? Perhaps the move from the discrete space of words to the continuous space of embeddings reverses language’s discretizing nature, its ability to chop the smooth flow of experience into discrete atoms of meaning. Continuity, after all, is just infinity standing on its head. So this is not just an interplay of signifiers, but an endless interplay.

What is the result of all this work? Well, we can create computer programs that solve a number of practical problems. They can summarize documents, group similar news articles, help people find the information they need, and transcribe speech. All useful and impressive, and all seemingly impossible just a few decades ago before we had invented these particular mathematical tools. But still there’s something unsatisfying about the whole business, because at the end of the day it’s all just bits on a machine. Your computer programs process reams of text in order to produce…more text. And that can’t be all there is. We know from our experience that language isn’t just some complicated interplay of tokens. At some point it has to touch on the outside world. It has to be about something. And yet all the computer can model for us is a closed system. Disappointingly, there’s nothing outside the text.

It’s all very strange, very counterintuitive. Honestly it makes your head hurt just to think about it. Such an odd conceptualization must be an artifact of the software engineering process, the awkward attempt to force the fundamentally human phenomenon of language onto a computer. No one would be so perverse as to conceive of language in this manner unless practical engineering necessity forced them to.

Posted in Fabulous ones, Mermaids, Those that have just broken the flower vase | Leave a comment


Leonid Brezhnev sat in the small theater at the back of the State Cultural Intelligence Agency, receiving his periodic briefing on western media. Up on the screen an Englishman in oversized spectacles (Elton John–though that wasn’t his real name) sat at a piano barking out a vulgar song, accompanied by a gaggle of crudely made felt puppets. This, apparently, was what children in the United States of America watched on a weekly basis. Brezhnev sank deeper into the padded seat and felt a familiar despair wrap its hands around his heart. What did it–what did any of it mean?

Posted in Those that at a distance resemble flies | Leave a comment

Intension and Extension in Python

{n | 0 ≤ n < 20, n is even}

>>> intension = (n for n in range(20) if n % 2 == 0)
>>> intension
<generator object>
>>> extension = list(intension)
>>> extension
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Posted in Innumerable ones, Those that have just broken the flower vase | Leave a comment


I just saw this t-shirt.


Though a snappy example of circa-2018 relationship slang, the phrase does little more than repeat the age-old wisdom that a woman should not give her heart to a cad. Still, I find one bit of that slang particularly intriguing, the word “wifey”.

In this context, “wifey” functions as an adjective. Morphologically the -y suffix also makes it sound like an adjective. (“She’s acting all prickly.” “His shirt is sparkly.”) However, being a noun is what distinguishes this new slang sense. A woman who you hold in high esteem, who you have feelings for and aren’t merely using for sex, is a wifey.1 Clearly the trick here lies in the defamiliarization of “wife”. The extra morpheme at the end adds an extra kick to a word that is otherwise so commonplace we barely notice it. But how exactly does it accomplish this? What is -y’s semantic payload? Is it merely an affectionate diminutive like in “puppy” or “kitty”, or does the ending actually create an adjectival form of “wife” which is then insouciantly employed as a noun? Etymologically I bet it’s the former (though I couldn’t tell you why), but the latter reading also feels plausible, so I entertain it as well whenever I hear “wifey”. Which is what makes the above usage so delightful: a noun transformed into an ostensible adjective, thereby flaunting its nouniness, dropped into a situation in which it must function without hesitation as an adjective. It’s like a grammatical version of Victor/Victoria. I love it.

1 Not so much a literal wife though. Which makes you think that the current rules of heterosexual engagement (in which a woman is supposed to aspire to be a wifey and ultimately a wife, while a man is supposed to keep negotiating for sex for as long as he can) are more concerned with enhancing matrimony’s exchange value by way of endless deferral than they are with actually getting people laid or hitched.2

2 I also misunderstood the term “fuck boy” when I first heard it. Obviously, it is intended to be dismissive, but I initially believed that a fuck boy was a boy a woman merely used for sex. Like a boy toy, except more disposable. But, no, I came to learn that “fuck boy” basically just means cad too. This is a letdown, because the notion of caddishness requires an implicit sense of female victimization. I want there to be language to describe women who pursue sex in a manner that is maybe slightly selfish, but still powerful and exuberant. I’d use those words. I’d sleep with those women.

Posted in Mermaids, Those that tremble as if they were mad | Leave a comment

Every Time I Fire a Linguist Someone Cooks Me a Delicious Osso Buco

My career as an artificial intelligence engineer began in a master’s program in linguistics. There I memorized the International Phonetic Alphabet, played hunt-the-allophone, read the literature on control verbs and code switching, and generally made a good faith effort to locate myself in the proud tradition of Noam Chomsky, Ray Jackendoff, and Henry Higgins. The only role computers played in this endeavor was coaxing me to spend way too much time figuring out how to draw syntax trees in Microsoft Word. Exposure to the Minimalist Program (“It makes French literary theory look downright reasonable!”) quickly disabused me of any academic aspirations, and I made a pivot into NLP and later industry, but linguistics remained close to my heart. For years I was fascinated by parsing. My master’s thesis asked whether information about the grammatical structure of sentences could be used to improve speech recognition. (The answer: not really.) Using a computer to draw a tree structure above a string of words seemed on some fundamental level to just be what one did. Syntax was the queen of linguistics, the field I hoped would be the key that unlocked the AI kingdom. I wanted linguistics to matter, but it never did.

No, that’s unfair. Linguistics provides an invaluable intellectual framework. The Saussurean notion of the sign, the syntax/semantics/pragmatics trichotomy, an appreciation of the endless structural variety of human communication: centuries of work have gone into compiling that knowledge. A programmer who fails to grapple with it and instead plunges ahead, hacking on natural language like it’s just another data structure, will quickly disappear into the weeds, never to be seen again. But once you get beneath the level of worldview, the specific theoretical constructs of linguistics are largely irrelevant to natural language engineering. I will never sit in a meeting arguing the merits of LFG versus HPSG. No money will ever ride on my team’s ability to apply predicate calculus to the Zen koan that is “Every man loves a woman”. PRO-drop, ergativity, and the middle voice–all fascinating, but as far as the software industry is concerned, just so much irrelevant donkey abuse.

For a while natural language processing was a subfield of machine learning in which linguistic knowledge was required for the feature engineering, but deep learning has started to erode even that. Deep learning is, after all, an attempt to reduce the art of feature engineering itself to just another numerical optimization problem. A new steam engine to wear down the latest generation of John Henrys. Though the deep learning technique du jour of word vector embedding is clearly an implementation of Firth’s distributional hypothesis, in its particulars it bears less of a resemblance to anything I studied in grad school than it does to Jacques Derrida’s concept of différance, God save us all. Soon your performance won’t go up every time you fire a linguist, because you won’t have hired any to begin with.

For a while this upset me. I didn’t want my work to be merely a language-shaped widget in the software machine. I wanted to do language. And how could I be doing that if I wasn’t using linguistics? NLP was an enjoyable enough challenge to build a career around, but it wasn’t truly deep. Soon the only thing I’d have in common with my former colleagues would be our shared envy of physicists. But then over the past few years it began to dawn on me that I hadn’t left the kingdom after all. Sure, I wasn’t doing language, but the machines I programmed were.

Cooking is chemistry. It’s all about how different substances interact when you combine them and subject them to heat. It clearly falls within a particular scientific purview, but being a brilliant research chemist does not make you a great chef. It doesn’t hurt, but it’s irrelevant. Likewise, being a great chef doesn’t give you even a crude insight into molecular chemistry. Though concerned with the same stuff, cooking and chemistry are entirely separate disciplines. And this isn’t just the difference between theory and practice. Cooking has a theory: you can read cookbooks, learn techniques, and memorize what flavors go with what, but knowing all that won’t make you a great chef either. To be a great chef you have to cook day-in day-out for years until making good food is a part of who you are.

In artificial intelligence we say that we are making computers that “understand” language, but we mean this in a qualified and metaphorical way. The thing we are trying to instill into machines is what linguists call linguistic competence, and as any linguist will tell you, linguistic competence is understanding of a very particular sort. It is not an accumulation of facts, or a set of conscious techniques. You don’t learn French by buying a French dictionary and memorizing it. Linguistic competence is knowing-how, not knowing-that. Linguistics is the science that takes linguistic competence as its object of study. Because both are abstract cognitive phenomena it can be easy to get them confused, but they are entirely different things. That linguistics is largely irrelevant to computer language engineering is no mark against linguistics, but merely a reflection of how vast the phenomenon of language is. It rarely impacts my daily work because I’m not trying to teach computers how to be linguists. I’m trying to teach them how to speak.

Posted in Mermaids, Those that have just broken the flower vase | Leave a comment

It’s Frank’s World, the Rest of Us Just Live in It

Ferdinand de Saussure: Meaning is difference.

Claude Shannon: Difference can be quantified.

Alan Turing: Quantification can be automated.


Posted in Fabulous ones, Mermaids | Leave a comment

At the Institute for Primate Communication

“NEW SIGN. WE MAKE. YOU SEE.” At first I thought I might have been misinterpreting BoBo, but he kept repeating the signs until it was clear what he meant.

“YES YOU SEE” Dian added. “WE MAKE SIGN. YOU HAPPY.” Noam crowded in behind them, eager to get in on the action. Where was this enthusiasm coming from? For months the chimpanzees had all been so uninterested in learning sign language they had seemed downright surly, but now they could barely contain themselves. “GOOD. YOU SHOW ME” I signed back.

BoBo waved Kong over and the four of them arranged themselves in a line. They were about to start, but then Noam stepped forward flailing his arms. “BANANAS FIRST!” So I gave them each a banana, and they took a long time peeling them, eating them, exchanging looks that appeared to be commentary on how the bananas tasted.

“YOU SHOW ME NOW?” I signed. “YES” replied BoBo. “NEW SIGN. WE SHOW YOU.” The four of them sat still for a moment, then in unison began making a one-handed jack off motion. This continued for about thirty seconds until the chimps collapsed on the ground shrieking uncontrollably. “NEW SIGN. WE HAPPY!” BoBo managed to tell me between shrieks.

I hate this job.

Posted in Mermaids, Those that at a distance resemble flies | Leave a comment