You shall know a word by the company it keeps.
Popular press reporting on scientific findings tends to be sensationalistic and oversimplified, so I approached the recent Guardian article “AI programs exhibit racial and gender biases, research reveals” with a trepidation that proved to be mostly unfounded. The headline is inaccurate, but otherwise the article is a well-written précis of Caliskan et al. 2017, “Semantics derived automatically from language corpora contain human-like biases”.
In that paper, computer scientists found a significant correlation between the separation of words in embedding vector space and human reaction times to the same lexical stimuli in implicit-association experiments. For example, if reaction times indicated that people were more likely to associate “flowers” with “pleasant” and “insects” with “unpleasant”, the distances between these pairs would be correspondingly smaller in the embedding vector space. The fact that such radically different experimental paradigms point to the same results is an indication that a real phenomenon is being observed.
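The embedding side of this comparison can be sketched in a few lines. Cosine similarity stands in for strength of association, and the differential score below follows the shape of the paper’s association tests; the three-dimensional vectors are invented stand-ins for real trained embeddings.

```python
import math

def cosine(u, v):
    # Cosine similarity: higher means the words occur in more similar contexts.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (illustrative values, not trained vectors).
vec = {
    "flowers":    [0.9, 0.8, 0.1],
    "insects":    [0.8, 0.1, 0.9],
    "pleasant":   [0.7, 0.9, 0.2],
    "unpleasant": [0.6, 0.2, 0.8],
}

# Differential association: how much more strongly "flowers" leans toward
# "pleasant" than "insects" does. A positive score mirrors the reaction
# time finding.
assoc = (cosine(vec["flowers"], vec["pleasant"]) - cosine(vec["flowers"], vec["unpleasant"])) \
      - (cosine(vec["insects"], vec["pleasant"]) - cosine(vec["insects"], vec["unpleasant"]))
print(assoc > 0)  # True for these toy vectors
```

With real embeddings the same arithmetic is run over sets of target and attribute words rather than single pairs, and the score is tested for statistical significance.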
These findings take on an ethical significance because the same techniques reveal biases that are not just of a benign flowers-are-nicer-than-insects type. Reaction time and word embedding data also jointly find evidence that recognizably black names are perceived as less pleasant than recognizably white ones, and that “woman” is more tightly associated with “homemaker” than “scientist”. The Guardian headline is inaccurate because word embeddings are not AI programs themselves but rather statistical summaries of language phenomena that make AI programs possible. To my knowledge no one has yet built a racist HAL 9000 (at least not one that did anything worse than make Microsoft look stupid), but we know that unconscious bias can cause harm, so it seems reasonable to worry about how it might do so in software. The Guardian article captures some of the conversations taking place in the machine learning community around this issue.
Word vectors are just the latest instance of the distributional hypothesis, which holds co-occurrences to be an indicator of semantics. It’s an old and eminently compelling idea, but it presupposes the existence of a semantics. That is to say, each word (or morpheme, or syntactic structure, or whatever you suppose the meaning-bearing unit to be) has an essential property called its meaning, which individual utterances only imperfectly reveal. It is semantics as Platonism. (Saussure’s langue/parole contrast embodies a similar idealization.) Semantics guides natural language engineering in that we want computers to not merely babble but say something meaningful. Lexical statistics help because they are a proxy for distributional facts, which in turn are a proxy for meaning.
But each link in this chain holds only if it is true in general, in the large. We may find it offensive that a mathematical representation of the word “woman” contains implicit sexist biases, but in a sense that is correct. Sexist ideas are part of the culture-wide concept represented by the term “woman”. If we didn’t observe this in our semantic representations, we’d suspect that we’d done something wrong. But word vectors aren’t just observations, they’re also an implementation tool, and there’s a big difference between observing a pernicious bias and replicating it. One might be tempted to invoke the old computer science adage Garbage-In-Garbage-Out at this point, but that misses the mark. We may object to the content of racist, sexist, or otherwise offensive language, but it is definitely not linguistic garbage. By causing offense it shows itself to be perfectly coherent, doing one of the things that language can do.
If you don’t want your word vectors to contain implicit sexism, you have to remove all the sexist documents from your training data. This is easier said than done, since human beings disagree with each other about what constitutes sexism, and even where there is consensus, automatically detecting that bias at the scale necessary for training language models is itself the kind of task that requires word vectors to work. Which doesn’t mean people aren’t trying. For instance, there is research into automatically debiasing language representations without degrading their statistical utility. Would debiased word vectors be less “true” in some Platonic sense than the unsanitized ones? Perhaps, but in an engineering context this is beside the point. There we are not concerned with having the computer capture some ideal form, but just in making it do what we want it to do.
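As a sketch of what such debiasing can look like: one published line of work (Bolukbasi et al. 2016) identifies a “gender direction” in the embedding space from pairs like he/she, then projects that direction out of words that should be gender-neutral. The two-dimensional vectors below are invented stand-ins for real embeddings, chosen so the first axis plays the role of the learned gender direction.

```python
def subtract_projection(v, direction):
    # Remove the component of v that lies along `direction`
    # (the "hard debiasing" neutralization step).
    norm_sq = sum(d * d for d in direction)
    scale = sum(a * d for a, d in zip(v, direction)) / norm_sq
    return [a - scale * d for a, d in zip(v, direction)]

# Invented 2-d vectors: pretend the first axis is the gender direction
# recovered from pairs like (he, she) and (man, woman).
gender_direction = [1.0, 0.0]
homemaker = [0.8, 0.5]   # leans toward one end of the gender axis
scientist = [-0.6, 0.7]  # leans toward the other end

homemaker_debiased = subtract_projection(homemaker, gender_direction)
scientist_debiased = subtract_projection(scientist, gender_direction)
# Both words now have zero component along the gender axis, while
# their remaining dimensions are left untouched.
```

The linear algebra is the easy part; the hard part, as above, is deciding which words ought to be neutral and whether the bias really lives in a single linear direction.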
My friend and colleague Jeremy Kahn refers to current deep learning techniques as “postmodern computing”. This is a tongue-in-cheek characterization that turns on the fact that “postmodern” is an ill-defined term that can mean pretty much whatever you want it to mean. In keeping with this spirit, let me propose a definition of “postmodern computing” that I find useful. “Modern” computing is Good Old Fashioned AI that abstracts the messiness of human behavior into logical, comprehensible rules. It is Platonic to its core. A word representation in modern computing might look like a dictionary entry: short, clear, and controllable. By contrast, current machine learning methods comprise “postmodern” computing. They make no attempt to abstract away from human messiness, but rather jump into the full statistical muck of it and proceed to wallow about. They are built out of opaque structures like word embedding vectors, which are impossible for a person to interpret, much less curate for ethical bias. Postmodern computing shrugs at underlying Platonic forms and focuses entirely on what you want to do in a particular, contingent moment. Pace J.R. Firth, you cannot know a word. You can only use it.