Some people believe that after you die your soul will move on to an afterlife which, if you were good, will be an eternity of joy but, if you were bad, will be a period of endless torment. They’re almost right. You do spend an eternity experiencing either pleasure or pain depending on the moral quality of your life, but this happens before you are born. The kindly grandmother who volunteers at the local soup kitchen has already been rewarded by an infinite period of blissful communion with her creator. As for the pillar-of-his-community child molester, the uncaptured serial killer, and the concentration camp guard who dies peacefully in his bed at age eighty, take heart–their skin was indeed flayed by demons in a lake of fire. True, this agony ended at the moment of their births, but it’s not like these monsters are getting away with anything, because it had no beginning. Only a mere mortal would quibble over ordering since the books end up balanced just the same.

–So what happens after you die?
–Gee, I don’t know. I never really thought about it.

Posted in Innumerable ones | Leave a comment

Wittgenstein Went to New Orleans and all he got was this Lousy Croissant

When you can speak multiple languages or dialects and switch back and forth between them in order to make a point, that’s called code-switching. When you speak a single language or dialect and employ the forms of a different one for no clear reason, that’s called affectation. As always there are boundary cases.


The French pronounce croissant /ˈkwɔsɔn/. Most Americans say /kɹ̩ˈsɑnt/. Croissant is a loan word, but we’ve been eating croissants over here in North America for long enough that I think we can call this a fully-fledged member of the English lexicon, recognizably French spelling notwithstanding. However, it’s not uncommon for me to hear (presumably monolingual) English speakers say /ˈkwɔsɔn/ to other (presumably monolingual) English speakers in an otherwise English-only context. This sounds affected to my ears, but whenever I’ve heard it the usage has been so unselfconscious–so devoid of wince-inducing social jockeying–that I wonder if there is an actual lexical change afoot.

New Orleans, Long Island, Shanghai

Here’s something that I do so I don’t think it’s affected, but you be the judge. When saying the names of places I’ll lean towards a well-known native pronunciation instead of what would be standard for my dialect. So given a choice between /nju ɔɹˈlinz/, /nju ˈɔɹlɪnz/, or /ˈnɔlɪnz/ (N’awlins for you non-linguists) I’ll opt for the last one, even though the total amount of time I’ve spent in New Orleans is about two days when I was five. Likewise I say /lɔnˈgaɪln̩d/ instead of /lɔŋ aɪlæn̩d/, hitting that medial /g/ hard like a loanshark does a deadbeat. More subtly for an American, I pronounce the name of China’s largest city /ʃɒŋhaɪ/ instead of /ʃæŋhaɪ/ because that’s how I’ve heard Chinese people say it. (Though I say /ʃæŋhaɪ/ for the verb that means to kidnap someone and force them to serve in your navy, because that’s a different lexical item. I also refer to the 1986 Madonna/Sean Penn vehicle as /ʃæŋhaɪ supraɪz/, but perhaps it is best if we don’t speak of this at all.)

There’s a limit of course. I’m not going to say “Last summer I took a lovely vacation in /mɛxiko/” because I don’t want to sound like a schmuck. I suggest the native pronunciation without trying to sound convincingly native myself. What this translates to in phonetic terms is borrowing segments and broad stress patterns, but drawing the subtler aspects of pronunciation (e.g. vowel quality) from my own dialect.


All these issues come together in a crucial aspect of Wittgenstein scholarship: how should an English speaker say his name? As I see it, there are four options.

  1. /wɪtgɪnstaɪn/…This is reasonable because it is consistently anglicized in a here-in-America-we-speak-American-buddy kind of way.
  2. /vɪtgɪnstaɪn/…This is how I say it. It’s pretty standard and sounds reasonable, but when you think about it, just changing the initial consonant while leaving everything else the same is kind of sloppy. This pronunciation belongs to a dialect that I call War Movie German, which is identical to English except for the addition of the lexical items “achtung” and “jawohl” and the single phonetic rule #w → v.
  3. /vɪtgɪnʃtaɪn/…Logically this seems better than (2). If you’re going to try and sound German you might as well go all the way, but to my ears something is off. I think it’s because the segment change w→v is so much more famous as a German stereotype than s→ʃ that the latter perversely comes off affected by comparison.
  4. /wɪtgɪnʃtaɪn/…See sounding like a schmuck above.
Posted in Mermaids | 1 Comment

The Prospector and the Snowstorm

There was a gold prospector who got caught in a blizzard in the Alaskan wilderness. He hiked for three days in blinding whiteness, lost, starving, frozen, and alone. The prospector prayed to God to save him, but all that came was more snow. On the fourth day the prospector’s strength failed him, and he lay down to die. Just at that moment an Eskimo hunting party wandered past. They strapped the prospector to their sled and dragged him back into town.

That night as he lay safe and warm in his bed, the prospector prayed to God again. “Lord,” he prayed, “when I asked for your help and it didn’t come, I thought you had forsaken me. But then as I was about to die, you sent those Eskimos. It is a great comfort for me to know that the moment things seem the most hopeless is the moment when you will intervene.”

All at once the prospector felt a strange and powerful presence, and God’s voice sounded in his head. “No,” God said, “you are mistaken. I was the one who sent the blizzard. I had nothing to do with those Eskimos. It was dumb luck that they came by when they did. If you want to take comfort in something, take comfort in knowing that if you should freeze to death in the wilderness, that would be part of My plan, and My plan is the right one because it’s Mine.”

Posted in Those that at a distance resemble flies | Leave a comment

Gay Animal Hoarders

As mental illness reality TV goes, Animal Hoarders lies somewhere in the middle of the voyeurism spectrum–not on a par with the legitimate public service that is Intervention but also nowhere near the rank exploitiveness of Celebrity Rehab with Dr. Drew. In keeping with its genre, each installment is as formulaic and interchangeable as a sea chanty or episode of CSI. There is a brief prelude in which a happy couple expresses love for their pets followed by a reveal of just how many pets we’re talking about here–thirty dogs, a hundred cats, two hundred screeching budgies colonizing every square inch of ceiling space. The hoarder half of the couple insists to the camera that everything is fine, the depth of their denial underscored by shots of domestic chaos and animal filth. The non-hoarder half of the couple tries to talk tough about the need for change, but it quickly becomes apparent that they are enablers of this behavior. At the midpoint the producers intervene, negotiating the removal of at least some of the animals. The episode concludes with a follow-up some time later in which the animal level is generally still high but below where it was, and the couple expresses hope for the future.

About ninety percent of these couples are married people of the opposite sex. About ten percent are same-sex couples. This has absolutely no relevance for either the hoarding behavior or the clockwork manner in which it is depicted. The soft-butch spaniel enthusiast is precisely as crazy as the married cat lady, and in precisely the same way. Even if the producers believed the hoarders’ sexual orientation to be relevant, they would probably end up ignoring it anyway because these episodes have to hit a very specific sequence of marks in only twenty minutes. The result is that a TV show with no political agenda whatsoever ends up normalizing homosexuality in a particularly insidious way, by permitting us to gawk at unattractive queer couples who are messed up for reasons that have nothing to do with sex.

There are only so many hours in the day and so many words that can be spoken in an hour, and the narrowness of a communication channel naturally tilts it towards conservatism. It is easier to buttress a consensus than challenge it because the pat phrases are already out there, the groundwork has been laid. In the thirty-second news slot the spokesman for received wisdom always has the home field advantage. But contrary to the whingeing of pols and conspiracy theorists alike who feel like they can’t get a fair shake from the media, this is a purely structural phenomenon. It is ideologically neutral, and sometimes concision helps to undermine a consensus, or at least hustle a fading one out the door a little faster.

Twenty years ago a show like Animal Hoarders wouldn’t have been able to let the sexual orientation of its profilees pass without comment. It would have had to be acknowledged and somehow minimized, either for fear of offending homophobic viewers, or of providing more ammunition for their prejudices, or both. The easiest thing would have been to quietly adopt a heterosexuals-only policy for the show, but from the producers’ standpoint that would have been a hassle because there probably aren’t that many non-camera-shy animal hoarders out there, and the less picky you can be the better. So the moment this delicacy is no longer absolutely required, the invisible hand of the reality TV marketplace pushes it to the curb. An outmoded sexual taboo is abandoned literally because no one has time for it.

Posted in Those drawn with a very fine camel’s-hair brush | Leave a comment

Importing scikit-learn Models into Java

Currently scikit-learn is the best general purpose machine learning package. It is part of the Scientific Python family of tools, built on top of the NumPy matrix processing engine. The code is readable, the documentation extensive, and the package popular, so there’s plenty of help available on Stack Overflow when you need it. But perhaps scikit-learn’s best selling point is that it’s written in Python, a language well suited for the ad hoc exploratory working style typical of machine learning. Java machine learning toolkits like Weka and Mallet are mathematically solid, but running mathematical algorithms is only part of the job of data science. There’s also inevitably lots of format munging, directory groveling, glue code, and trying things that don’t work. You want the basics to be as easy as possible. The Python command line achieves a level of transparency that Java–with its boilerplate, IDEs, compilers, complex build systems, and lack of a REPL–cannot match.

Illustrations of machine learning classification

Still, the JVM is a popular platform, and it would be nice to be able to train a model in scikit-learn and then deploy it in Java. There is currently no support for this. The right thing would be to have scikit-learn export its model files to some common format like PMML, but that feature does not currently exist.1 scikit-learn’s only serialization is Python’s native pickle format, which works great, but only for other Python programs. In theory, writing your own serialization should be easy: a model is just a set of numbers. But it only works if the test-time code exactly reproduces the training code’s processing of its input. Any deviation and your finely tuned vector of coefficients becomes nothing more than a numeric jumble.

Let’s take a look at a fairly simple but still non-trivial machine learning model and see what is involved in exporting its semantics in a cross-language way. Say I want to do text classification. I have a corpus of short documents drawn from two genres: cookbooks and descriptions of farm life. I have tab-delimited text files that look like this.

0   The horse and the cow lived on the farm
1   Boil two eggs for five minutes
0   The hayloft of the barn was full
1   Drain the pasta

The first column is an integer class label and the second is a document. I want the computer to learn how to hypothesize a 0 or 1 for any string input it is given. A standard approach would be to treat the documents as bags of words and build a Naive Bayes model over them. To make things more sophisticated, let’s train on bi-grams in addition to individual words, and work with Tf-Idf values instead of raw counts. scikit-learn makes this easy. Here is the bulk of the code needed to train such a model.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline

def train_model(corpus):
    labels = []
    data = []
    for line in corpus:
        label, vector = line.decode('utf-8').split("\t")
        labels.append(label)
        data.append(vector)
    model = Pipeline([('vect', CountVectorizer(ngram_range=(1, 2))),
                      ('tfidf', TfidfTransformer()),
                      ('clf', MultinomialNB())]).fit(data, labels)
    return model

The model returned is a scikit-learn object. If we want to export it to another language, we have to extract its meaningful parts and serialize them in a general-purpose way. These meaningful parts are sets of numbers. Specifically, for each dimension in vector space there is an Idf score and an array of coefficients for each class. Additionally there is a scalar bias for each class. So what we have is a vector of numbers plus a mapping from strings to vectors of numbers.

Bias 0    Bias 1
-0.693    -0.587

Term          Idf      Coefficient 0   Coefficient 1
garlic         4.673    -8.327          -6.825
peel garlic    3.522   -12.805         -10.505

You have to do some detective work to figure out where inside the scikit-learn objects these numbers actually reside, but once you have them you can serialize them in a language-agnostic way by writing them out as JSON. Sure the file will be huge, and the representation of floating point numbers as strings is wildly inefficient, but we can always gzip the thing.

Now the Java decoder needs to 1) load this file, 2) turn the input into n-gram terms, 3) build a vector of term Tf-Idf scores, and 4) linearly transform that vector using the model’s coefficients and biases. None of this is particularly difficult, but you have to make sure that the Java decoder performs each of these steps in exactly the same way as the Python encoder, so that the numbers passed between them retain their meaning.
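For concreteness, here is the decoding arithmetic itself, sketched in Python against a model dict of the shape just described (per-class biases plus per-term Idf and coefficients). The tokenizer below is a deliberate simplification (lowercase, whitespace split); a real decoder must replicate CountVectorizer’s tokenization exactly, which is the whole point of the warning above:

```python
import math

def predict(model, text):
    # Step 2: turn the input into unigram and bigram terms.
    tokens = text.lower().split()
    terms = tokens + [' '.join(pair) for pair in zip(tokens, tokens[1:])]
    # Raw counts over terms the model knows about.
    counts = {}
    for term in terms:
        if term in model['terms']:
            counts[term] = counts.get(term, 0) + 1
    # Step 3: Tf-Idf with scikit-learn's defaults in mind:
    # count times Idf, then L2-normalized.
    tfidf = {t: n * model['terms'][t]['idf'] for t, n in counts.items()}
    norm = math.sqrt(sum(v * v for v in tfidf.values())) or 1.0
    # Step 4: per-class log likelihood is the bias plus the
    # coefficient-weighted Tf-Idf values.
    scores = list(model['biases'])
    for t, v in tfidf.items():
        for c, coef in enumerate(model['terms'][t]['coefficients']):
            scores[c] += (v / norm) * coef
    return scores.index(max(scores)), scores
```

Transliterating these twenty-odd lines into Java is easy; keeping them bit-for-bit faithful to the training pipeline is the part that takes care.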

The Linear N-gram Model project contains a Python training script and a Java decoder that does this. Train a model in Python on a corpus like the one pictured above, run it in Java on unlabeled text and it will produce class predictions and log likelihoods like so.

0   -47.8674 -47.1280   The harvest was finished early this year
0   -47.0950 -42.8352   We fed the horses and the pigs
1   -45.3605 -46.8341   Place the garlic in a pan

This project can serve as starter example code for machine learning researchers faced with a similar cross-language serialization task.

1But check out Py2PMML, which looks like it gets you part of the way there. (Hat tip darknightelf.)

Posted in Innumerable ones, Those that have just broken the flower vase | Leave a comment

The Rules

There is one sexually explicit photo of me in existence. San Francisco. Luxury suite in the Mandarin Oriental Hotel. A settee. Face clearly visible, and as X-rated as you could want. Actually, I’m not sure if it is still in existence, because it was taken on a digital camera, and a lot of that data got lost during the great smartphone migration. Still, it is possible that this picture might surface on the internet where malice or a reverse image search could tie it to my name, so that henceforth “W.P. McNeill” would no longer connote insightful short essays on the intersection of linguistics and artificial intelligence leavened with the occasional McSweeneyish joke list, but just common porn.

I would not be humiliated if this photo were to become public. Maybe I’m fooling myself, but the hypothetical scenario evokes feelings ranging from mild embarrassment to sneaky pride. I imagine acquaintances who stumbled across the picture by accident making exaggeratedly comic shows of shielding their eyes. I’d like to imagine those to whom I’m attracted then stealing a second glance. I cannot imagine anyone who has ever been in a position of power over me either failing to immediately grasp the circumstances or actually caring. I can imagine a bit of razzing, easily laughed off. Of course someone who had it in for me could use a sexually explicit photo to create an embarrassing situation, but they’d have to do more than just show the thing around. They’d have to work at it, and likely would come off looking worse than me.

Hotel room interior

I am immune to this particular form of humiliation because I am male. The celebrity victims of the latest round of hacked private photo accounts have all been female, as have been the non-celebrity victims of the ongoing phenomenon of revenge porn. This latest round has seen second-order outrage directed at people–mostly male–who recommend that women who don’t want to risk nude pictures of themselves being distributed simply not take nude pictures of themselves. I agree that this is a simplistic blame-the-victim mentality. Life is inherently risky, and you are entitled to feel upset when some extra risk you take on comes back to bite you. But the outrage isn’t over the risk/reward ratios inherent in easily reproduced boudoir snaps. Instead it’s the cavalier way some men shrug off a thing many women feel is a form of sexual assault.

The men who do so are showing a lack of empathy. Literally. They are failing to imagine what it feels like to be in a woman’s place, but I find this lack more comprehensible than other forms of gender blindness. We all know that the rules can be arbitrarily different for women and men, but in the case of the power of a nude photo, the arbitrary difference is all there is. By contrast, at the same time the celebrity photo scandal was breaking, pop culture commentator Anita Sarkeesian was being subjected to vicious anonymous attacks for having written about misogyny in computer games. No one says, oh just shrug it off, when someone is making public threats to kill you and your family. Likewise, the recent #YesAllWomen Twitter campaign drew attention to the phenomenon of street harassment, which is scary to be on the receiving end of regardless of your gender. The feminist issue in these cases is not harassment per se but rather the fact that there are certain kinds of harassment that women face disproportionately.

In the case of nude picture distribution, however, the act itself, a particularly strident form of sexual objectification, has a different meaning depending on the gender of the person in the picture. The rules in our society state that when men objectify women it is a hostile act, but when women objectify men it is either odd or flattering. By convention, a man is supposed to be bemused by a leaked nude selfie, and a woman is supposed to be devastated. These are the rules regardless of the relative shyness of individual men and women, and regardless of the fact that all of us at one time or another want to be treated like sexual objects by people of our choosing. By their ubiquity these rules exhibit a powerful force, even on those of us who believe that they are senseless and leave women at an unfair disadvantage.

That is why the men who say of revenge porn “just shrug it off”–even though they are failing to comprehend the full hostility of the act–are still onto something. The damage comes not from the images themselves, but from the conventions surrounding them. Because even if you’re a cheerfully un-slut-shamable woman (or an actress who does nude scenes in films) you can still be rankled by your awareness of the army of mouth-breathing cowards out there who benefit from the consensus that they have taken something from you by viewing your particular areolae out of the sea of areolae available online. But it is possible to imagine a new consensus that declares they haven’t. Viewed from a different angle, revenge porn is just a way of proclaiming to the world, “Hey look, another woman who won’t be having sex with me!” A generation from now I suspect it won’t exist.

Posted in Included in this classification, Those that tremble as if they were mad | Leave a comment

Two Feet of Air

I can swim. We also say that I know how to swim, but that is misleading because it makes the whole business sound more intellectualized than it is. How it actually works is this: put me in water and I will begin to do the crawl. I will propel myself purposefully from one side of the pool to the other. My body knows what to do and I do not drown.

If you handcuffed me before throwing me in the water, I would not be able to swim. But I would not have lost the capacity to swim. My body would still know how. It would just have been prevented by circumstance. You may be a sadist for applying handcuffs, but I am still a swimmer.

Monkey in purple shirt seated in front of a typewriter

Linguists rely on the handcuffed swimmer analogy to illustrate the sometimes elusive difference between competence and performance. I have a capacity for speaking English, yet there are any number of ways I may fail to speak English: I may slur, stumble over a word, or try unsuccessfully to shout over a jackhammer. None of them detracts from my status as an English speaker. They are accidental impediments. Handcuffs.

To linguists, the capacity is the thing. It is what we call competence and is what we study. Performance effects are the accidental noise that must be stripped away, like a physicist strips away the effect of friction to get at the pure laws of motion. In language and other human activities there appears to be a strong correspondence between competence/performance and inside/outside. My capacity to swim or to speak is something internal to me. Impediments are external. They may matter in practice, but they are not me.

This can be a difficult concept to convey, despite the fact that it is something we all want to believe. I am my potential. What I actually do is just a pale refraction of the real thing, what I could do. The true me, my authentic self, is the homunculus inside my skull, directing my body through an imperfect world.

However, very few of us are ever thrown into a pool handcuffed. If you encounter a non-swimmer, it is probably because that person never learned to swim. Their body does not know what to do. No amount of academic study can grant you the internal capacity to swim; only time in the pool can do that. To be a swimmer you must have actually swum in water at some point in your life. Multiple times. Successfully. To count as an English speaker, you must communicate with other English speakers in English on a regular basis. Likewise, the mensch must exhibit actual, felt kindness, and the casanova must unmistakably seduce.

Properties that we would like to attribute to the internal homunculus–swimmer, Anglophone, mensch–have meaning only in our interaction with others. Our true selves don’t live inside our heads. They live in the two feet of air around our heads, where the world impinges.

Posted in Fabulous ones, Mermaids | Leave a comment