Word Vectors, Gender Bias, and Postmodern Computing

You shall know a word by the company it keeps.

—J.R. Firth

Popular press reporting on scientific findings tends to be sensationalistic and oversimplified, so I approached the recent Guardian article “AI programs exhibit racial and gender biases, research reveals” with a trepidation that proved to be mostly unfounded. The headline is inaccurate, but otherwise the article is a well-written précis of Caliskan et al. 2017, “Semantics derived automatically from language corpora contain human-like biases”.

In that paper, computer scientists took the lexical stimuli used in reaction time experiments and found a significant correlation between their separation in word vector space and the reaction times themselves. For example, if reaction time indicated that people were more likely to associate “flowers” and “pleasant” and “insects” and “unpleasant”, the distance between these pairs would be correspondingly smaller in the embedding vector space. The fact that such radically different experimental paradigms point to the same results is an indication that a real phenomenon is being observed.
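To make the idea of distance in an embedding space concrete, here is a minimal sketch of an association score in the spirit of that comparison. It is not the paper's exact statistic. The function names and the three-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions and are learned from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word, pleasant, unpleasant, vectors):
    """Mean similarity to the pleasant words minus mean similarity to the unpleasant ones."""
    w = vectors[word]
    pos = sum(cosine(w, vectors[a]) for a in pleasant) / len(pleasant)
    neg = sum(cosine(w, vectors[a]) for a in unpleasant) / len(unpleasant)
    return pos - neg

# Hypothetical toy vectors, made up for this example only.
TOY_VECTORS = {
    "flower": [0.9, 0.1, 0.2],
    "insect": [0.1, 0.9, 0.2],
    "pleasant": [1.0, 0.0, 0.1],
    "unpleasant": [0.0, 1.0, 0.1],
}

flower_score = association("flower", ["pleasant"], ["unpleasant"], TOY_VECTORS)
insect_score = association("insect", ["pleasant"], ["unpleasant"], TOY_VECTORS)
print(flower_score, insect_score)
```

With these made-up numbers "flower" scores positive and "insect" negative. The interesting finding is that scores computed from real embeddings line up with human reaction times.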

These findings take on an ethical significance because the same techniques reveal biases that are not just of a benign flowers-are-nicer-than-insects type. Reaction time and word embedding data also jointly find evidence that recognizably black names are perceived as less pleasant than recognizably white ones, or that “woman” is more tightly associated with “homemaker” than “scientist”. The Guardian headline is inaccurate because word embeddings are not AI programs themselves but rather statistical summaries of language phenomena that make AI programs possible. To my knowledge no one has yet built a racist HAL 9000 (at least not one that did anything worse than make Microsoft look stupid) but we know that unconscious bias can cause harm, so it seems reasonable to worry about how it might do so in software. This article captures some of the conversations taking place in the machine learning community around this issue.

Word vectors are just the latest instance of the distributional hypothesis that holds cooccurrences to be an indicator of semantics. It’s an old and eminently compelling idea, but it presupposes the existence of a semantics. That is to say, each word (or morpheme, or syntactic structure, or whatever you suppose the meaning-bearing unit to be) has an essential property called its meaning, which individual utterances only imperfectly reveal. It is semantics as Platonism. (Saussure’s langue/parole contrast embodies a similar idealization.) Semantics guides natural language engineering in that we want computers to not merely babble but say something meaningful. Lexical statistics help because they are a proxy for distributional facts, which in turn are a proxy for meaning.

But each link in this chain holds only if it is true in general, in the large. We may find it offensive that a mathematical representation of the word “woman” contains implicit sexist biases, but in a sense that is correct. Sexist ideas are part of the culture-wide concept represented by the term “woman”. If we didn’t observe this in our semantic representations, we’d suspect that we’d done something wrong. But word vectors aren’t just observations, they’re also an implementation tool, and there’s a big difference between observing a pernicious bias and replicating it. One might be tempted to invoke the old computer science adage Garbage-In-Garbage-Out at this point, but that misses the mark. We may object to the content of racist, sexist, or otherwise offensive language, but it is definitely not linguistic garbage. By causing offense it shows itself to be perfectly coherent, doing one of the things that language can do.

If you don’t want your word vectors to contain implicit sexism, you have to remove all the sexist documents from your training data. This is easier said than done, since human beings disagree with each other about what constitutes sexism, and even where there is consensus, automatically detecting that bias at the scale necessary for training language models is itself the kind of task that requires word vectors to work. Which doesn’t mean people aren’t trying. For instance, there is research into automatically debiasing language representations without degrading their statistical utility. Would debiased word vectors be less “true” in some Platonic sense than the unsanitized ones? Perhaps, but in an engineering context this is beside the point. There we are not concerned with having the computer capture some ideal form, but just with making it do what we want it to do.
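As a rough illustration of what such debiasing can look like, one family of approaches estimates a "bias direction" in the vector space and subtracts each word's component along it. The sketch below shows only that projection step. The vectors are invented, the helper name is hypothetical, and it should not be read as any particular paper's method.

```python
def subtract_projection(vec, direction):
    """Return vec with its component along direction removed."""
    dot = sum(a * d for a, d in zip(vec, direction))
    norm_sq = sum(d * d for d in direction)
    return [a - (dot / norm_sq) * d for a, d in zip(vec, direction)]

# Hypothetical toy vectors, made up for this example only.
woman = [0.2, 0.8, 0.1]
man = [0.2, -0.8, 0.1]
homemaker = [0.5, 0.6, 0.3]

# Assumption: the difference of a gendered word pair approximates the bias direction.
gender_direction = [w - m for w, m in zip(woman, man)]

debiased_homemaker = subtract_projection(homemaker, gender_direction)
print(debiased_homemaker)
```

After the projection the vector has no component left along the estimated direction, while the other coordinates are untouched. Whether that removes the bias or merely hides it is exactly the kind of question the engineering framing leaves open.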

My friend and colleague Jeremy Kahn refers to current deep learning techniques as “postmodern computing”. This is a tongue-in-cheek characterization that turns on the fact that “postmodern” is an ill-defined term that can mean pretty much whatever you want it to mean. In keeping with this spirit, let me propose a definition of “postmodern computing” that I find useful. “Modern” computing is Good Old Fashioned AI that abstracts the messiness of human behavior into logical, comprehensible rules. It is Platonic to its core. A word representation in modern computing might look like a dictionary entry: short, clear, and controllable. By contrast, current machine learning methods comprise “postmodern” computing. They make no attempt to abstract away from human messiness, but rather jump into the full statistical muck of it and proceed to wallow about. They are built out of opaque structures like word embedding vectors, which are impossible for a person to interpret, much less curate for ethical bias. Postmodern computing shrugs at underlying Platonic forms and focuses entirely on what you want to do in a particular, contingent moment. Pace J.R. Firth, you cannot know a word. You can only use it.

Power. Truth. Speaking.

In the end the Party would announce that two and two made five, and you would have to believe it. It was inevitable that they should make that claim sooner or later: the logic of their position demanded it. Not merely the validity of experience, but the very existence of external reality was tacitly denied by their philosophy.

–George Orwell, 1984

English Has No Word For

The kind of detritus—rubber bands, thumbtacks, orphaned fasteners, possibly dead batteries—that collects in drawers.

A rock or brick left next to a locked door of a common area (the rear entrance of an apartment building, say, or a laundry room) so that people can prop the door open when their hands are full.

The unspoken agreement to leave a rock or a brick next to a locked door to a common area by all the people who use it.

The colors yellow or orange perceived as a single thing. (Perhaps “yorange”. If necessary we could call yellow “light yorange” and orange “dark yorange”.)

A concept that cannot be expressed because there is no language for it. Oops, sorry. “Ineffable”. Never mind.

Musical genres that haven’t been invented yet.

Relating to the bank of a creek. Specifically a creek and not a river.

Absent-mindedly scraping off the label of a beer bottle with your fingernail.

An insufficient amount of sand.

The quality of being small and requiring delicate manipulation—characteristic of earrings, watch knobs, pretty much all surgery.

Having a useless skill.

The opposite of photogenic.

The time between when your fuel gauge reads empty and when you actually run out of gas. This one in particular lends itself to metaphor.

The property of lending oneself to metaphor.

What sawdust feels like when rubbed between your fingers.

A song with only two chords.

Being just beyond the cusp of something.

Strikingly angular or strikingly rounded but definitely not in between.

Something that is not optional that really should be optional.

Something you have momentarily forgotten.

Tossing a ball in the air and catching it 99 times, then missing it on the 100th toss. Again, lending itself to metaphor.

Abandoning a train of thought.

Nostalgia for things you did not actually experience.

The fleeting realization that you too will someday die.

For the Sake of the King

Write a computer program to play chess. It doesn’t have to play particularly well, just display an ability comparable to the average human being who has some aptitude for the game.

What is the point of that? Computers have been playing passable games of chess for almost as long as there have been computers. In 1997 IBM’s Deep Blue defeated the human grandmaster Garry Kasparov, surely moving chess-playing into the “solved” column of computer science. In hindsight, making chess a benchmark of artificial intelligence seems like a mistake. To play well one must be able to quickly enumerate a large but nevertheless finite and well-defined search space, something that computers are better at than human beings. Suggest something harder.

But wait, I was serious about that “average human being” part. For example, a human being can play on all sorts of sets. The board can be a piece of black-and-white cardboard with a fold down the middle, an unrolled green-and-tan square of felt, or rigid expensive carpentry with mother-of-pearl squares. There can be wood pieces, plastic pieces. Novelty sets with chessmen in the shape of Civil War soldiers or J.R.R. Tolkien characters. Stylized two-dimensional shapes from a newspaper column, or those same shapes projected on the screen of a different chess-playing computer program. Make your program handle that.

That’s probably doable. Distilling the common essence of these various situations (thirty-two identifiable things, grouped into various equivalence classes, arranged on an 8×8 grid) is about state of the art for computer vision. It is straightforward to formulate a machine learning approach to a chessboard/not-chessboard classification. To identify individual pieces and their relevant spatial relationships to each other is harder, but let’s say it too is doable. “Merely” an engineering task. You may elect to put an image processing layer of sufficient accuracy in front of your traditional chess-playing program, in whose source code I could no doubt find strings like “bishop”, “board_position”, and “legal_to_castle”.

But the average person isn’t born knowing anything about bishops or board positions or castling. We don’t have the capacity for identifying knights (paradigmatically horses, but failing that the most contextually horse-like things in a set of fourteen other things) hard-wired into our brains. So it is cheating to hard-code a concept of knight into your program, or even to have amassed pictures of knights as part of the training process for a computer vision system. (And what does “contextually” mean anyway?) No, you must write a program that when presented with a series of chess-like situations is somehow able to discern their significance.

(You are allowed to write a program that learns to play chess by having the game explained to it. That is how most people learn. In this case, though, your program would have to understand human language.)

What is a chess-like situation? How do you steer your program’s attention towards those things in particular? Why does a person play chess? For enjoyment, intellectual challenge? To be sociable, or to satisfy a competitive urge? To earn masters’ points, to win, to prove yourself, to hustle money in Washington Square Park? How do you incline a machine to arrange these disparate concepts around a hub of black-and-white and thirty-two pieces? (And it must be thirty-two. If someone presents your program with a setup missing a queen, or with the pieces lying in a jumble in the middle of the board, it must be able to identify this as “not-chess” and fail to play in an appropriate manner.) I haven’t even asked you to build a robot to pick up chessmen and physically move them, even though that is what people do too, and also aren’t born knowing how to do it. How do you pick chess out from the general flow of life?

Whorf and Whoop

As a member of Generation X, I can’t stand Millennials. Their sense of entitlement, their social-media driven narcissism, their inability to put their smart phones down for one moment and have a conversation like normal human beings. But mostly what I mind is their having stolen the media attention that was once lavished on my cohort. Lifestyle puff pieces used to read deep cultural significance into our musical and sartorial tastes. The details are fuzzy now—something about latchkey children and flannel shirts—but there was a clear consensus that something was afoot. I’d say the peak of this attention came when I was in my early twenties, which honestly doesn’t seem that long ago. Then sometime in the interim attention shifted to these Millennials. I wonder what could have changed?

Pity the poor lifestyle section writer. You are charged with chronicling the minute tics of whatever demographic is currently comfortable with the latest technology and in reasonable physical shape. The photographer sent to accompany you has it easy (“Make sure to get some pictures of pretty girls.” “No problem, boss!”), while you are charged with conveying the meaning of it all. What to do? You can latch onto some other current element in the public consciousness—divorce, Vietnam, the internet—and assert a connection. If you are in America, you can take the name of the last President, append “-era”, and pretend your subjects are representative of whatever that was. Sometimes, exhausted, you draw attention to an arbitrary detail and hope it just kinda reifies itself.

The greatest example of this is a 2009 New York Times lifestyle piece in which a writer clearly desperate for an angle took a stroll around Williamsburg, noticed that a number of the men had pot bellies, and tried to spin it into a tale of the evolving Millennial physique. The headline he came up with was “It’s Hip to be Round”, but a more honest one would have been “Young Men in Brooklyn Have Normal Human Bodies”.

The latest bit of arbitrary generational signification to make the rounds is the “Millennial Whoop”. This is the “Wa-oh-wa-oh” backup singer riff that has shown up in a lot of pop songs in the past decade or so. The clearest example may be Katy Perry’s “California Gurls”, but it’s not hard to hear once you’re cued in to it.

Given that this is just movement between a perfect 5th and a major 3rd, it’s surprising that it’s distinctive enough to register, but it does. After listening to a few of the clips linked in the article above, I am convinced that the Millennial Whoop is indeed a thing, and what’s more, it actually deserves a demographic moniker. Musical forms really are generational characteristics. Musicians naturally copy other similar musicians, who tend to be about the same age. Style is consensus and stylistic adjacency lines up with adjacency in time. Slang works the same way. In this manner, kids today really are different.

As a structural observation about music this is interesting if not terribly surprising. (Imagine the alternative: nothing ever changes.) The trick is that once you’re aware of it, it is difficult not to read some significance into it, just as it’s difficult not to read significance into phenomena less obviously characteristic of a generation. To its credit, the linked article mostly avoids delving into these particular tea leaves, but in its final paragraph even it feels compelled to name check climate change, economic injustice, and racial violence as things from which the diatonic scale can provide refuge. And of course in the comments section, anyone who tries to judge this innovation inevitably sees it as a bad thing, a fall from that Edenic past when girl groups sang “Sha-la-la” instead of “Wa-oh-wa-oh”. This is a ratchet effect familiar to anyone foolish enough to wade into the comments section on YouTube.

Reading unwarranted cultural meaning into arbitrary schemes of signification in language is the apophenic sin linguists call Whorfianism, and we never tire of beating up on people who fall for it. Reading significance into a purely structural move such as this is a kind of musical Whorfianism, but the concluding paragraph of the Millennial Whoop article gives a hint as to why this temptation is so hard to resist. You’ve just written a short essay identifying a certain musical pattern, and now you want to end on a punchy note.

So it is that the Millennial Whoop evokes a kind of primordial sense that everything will be alright. You know these notes. You’ve heard this before. There’s nothing out of the ordinary or scary here. You don’t need to learn the words or know a particular language or think deeply about meaning. You’re safe. In the age of climate change and economic injustice and racial violence, you can take a few moments to forget everything and shout with exuberance at the top of your lungs. Just dance and feel how awesome it is to be alive right now. Wa-oh-wa-oh.

This is well-written. But it is well-written because it doesn’t come to the more accurate if anti-climactic conclusion of “so that exists”. It pulls back to put things in a broader human context. As humans, we may construct our story-telling tools out of systems of arbitrary signification, but to be compelling the stories we tell cannot themselves be arbitrary. They have to be about us.

Everyday Heroes’ Lives Matter

If an adult man with diminished mental capacity sits down in the middle of the street, acting confused and belligerent, it might be necessary to call the police. Ideally, the police would calm the person and get him out of harm’s way. That they manage to perform this kind of task day in and out without anyone getting hurt is one of the reasons we are grateful for their service.

Law enforcement’s job is done once they’ve gotten the guy out of the street, but that isn’t where the story ends. Ideally they would then turn the man over to a facility where he could be looked after by level-headed compassionate people who made his well-being their concern. People like Charles Kinsey.

Once again we have video of an unarmed black man being shot by police. As with Philando Castile, the motorist recently killed in Minnesota, Kinsey complied with police orders. It is difficult to imagine what he could have done differently in that situation in order not to get shot. As we’ve come to say, he “did everything right.” But in this case that doesn’t just mean he refrained from any sudden movements.

Watching this video, I don’t just feel sorry for Kinsey, I admire him. He didn’t have to go out into that street. He could have hung back and waited for the cops. When things spiraled out of control, he kept his head, communicated clearly, and did everything he could to deescalate the situation, all while still trying to protect a confused and vulnerable man. We should all be able to handle ourselves in a crisis as well. Charles Kinsey showed exactly the kind of courage, professionalism, and everyday heroism we expect from a police officer.

I have also been in situations where I had to care for someone who was confused, belligerent, and a possible danger to himself and others. A couple of times I had to call the police, who helped me to the best of their ability. And while I was grateful for their assistance, in the end there was little they could do. We’re law enforcement, they told me flat out. We’re not nurses or social workers. If we tried to be we couldn’t do our job. In the end the people I needed the most weren’t cops, but nursing home attendants. On a day-to-day basis these are the people who make a difference for me and the people I’m responsible for, and they appear to do basically the same work as Charles Kinsey.

Cops have a job that most of us don’t want to do. In part because they subject themselves to possible violence, even risk getting killed. But they do a lot more than that. They handle drunks, domestic arguments, and schizophrenics who shouldn’t be living on the street, but there they are so somebody has to step in when they start yelling at passersby. Police handle situations that never rise anywhere near the level of violence, but still lie at the unpleasant edges of society where the social norms that get the rest of us through the day have broken down. There are a few other professions in this space. EMTs and social workers come to mind—and also behavioral therapists at halfway houses for mentally impaired adults. It’s not a stretch to say that Charles Kinsey and the North Miami police department are doing different aspects of the same job. (Considering how full our jails are of the mentally ill, not much of a stretch at all.) This week Kinsey got shot while doing his job better.

IDEs are Code Smell

Pretty much every computer programmer uses an Integrated Development Environment. Pretty much everyone has their favorite: Eclipse, IntelliJ, Sublime Text, Emacs, Vim. There will never be consensus on which is best. Fortunately there doesn’t have to be. The wide variety of work environments in use in the open source world has forced language designers to invent standard, robust, IDE-agnostic build environments. Java has Maven and Maven archetypes. JavaScript has npm and Yeoman. Scala has sbt. These tools allow you to create, build, and manage the dependencies of your project without ever touching an IDE.

IDE-independence has a lot of advantages. Command-line only environments play nice with continuous build systems. They make it easier to on-board new developers by removing IDE-specific tweaks that often take root as undocumented lore. They tend to be more stable, more future-proof, and more popular than IDE-specific build techniques. (Which makes a big difference when you have to go ask a question on StackOverflow.) Also, developers have their favorite tools for a reason, and when you force them to use something else it hurts their productivity.

When starting a new software project you should adopt a strict IDE-agnostic policy. The rule is “I shouldn’t be able to tell from anything you check in to source control which IDE (if any) you are using”. Putting this policy into place isn’t hard–it’s just a matter of using the right tools to create a fresh project. After that, things pretty much take care of themselves. A little up-front standardization wards off a lot of build environment technical debt down the line.
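One way to hold yourself to that rule is a small check that fails when IDE droppings show up in the project tree. What follows is a minimal sketch, not a complete tool. The function name is hypothetical, the list of file names is illustrative and far from exhaustive, and it scans the working directory rather than asking source control what is actually tracked.

```python
import tempfile
from pathlib import Path

# Illustrative, incomplete list of names that common IDEs leave behind (an assumption).
IDE_ARTIFACT_NAMES = {".idea", ".project", ".classpath", ".settings", ".vscode"}
IDE_ARTIFACT_SUFFIXES = {".iml", ".sublime-project", ".sublime-workspace"}

def find_ide_artifacts(root):
    """Return sorted relative paths under root that look IDE-specific."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip the repository's own metadata
        if path.name in IDE_ARTIFACT_NAMES or path.suffix in IDE_ARTIFACT_SUFFIXES:
            hits.append(relative.as_posix())
    return sorted(hits)

# Demonstration on a throwaway directory standing in for a project checkout.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "pom.xml").write_text("<project/>")
    (project / ".idea").mkdir()
    (project / "app.iml").write_text("")
    demo_hits = find_ide_artifacts(project)

print(demo_hits)
```

In practice most teams get the same effect by listing these names in an ignore file, so that nothing IDE-specific can be checked in at all. A script like this is only useful as a backstop in a continuous build.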
