Example syntax sentences are a genre unto themselves, and that genre is shaped by forces both superficial and deep. In conversation, syntacticians are always having to come up with illustrations of this or that grammatical phenomenon on the spot, so their examples overrepresent situations typical of offices and classrooms. Books are always sitting on tables. People are eternally giving one another pens. When syntacticians get around to writing things down, they try to spice things up a bit. Cheeky graduate students illustrate transitive verbs by having the characters from South Park do horrible things to one another. Liliane Haegeman fills an entire textbook with examples from the Hercule Poirot stories. And you can often tell when a paper was written by merely observing which United States President is being gently razzed. All this falls under the superficial heading because it’s on the periphery of syntax: the pens-and-tables business is clearly a performance effect, while humor takes place way up in the lofty heights of pragmatics.
A deeper force is the pressure to choose a certain variety of structurally simple examples that I call “clean room sentences” because they are particularly amenable to linguistic prodding. A lot of these examples may not seem simple: multiclause monsters constructed to tease out subjacency effects can get pretty hairy. But it’s a kind of hairiness that comes from operating at grammaticality’s bleeding edge. There’s a different kind of perfectly grammatical complexity we overlook even when it’s right under our noses. I remember that when I took my first syntax class, I decided to practice by sitting down in a coffeeshop one morning and drawing transformational grammar trees for sentences in the newspaper. I started with the first sentence of the lead article, but, no, it had a tricky parenthetical clause, so I moved on to the next sentence. That was a no-go as well, because it contained the word “and”, and conjunction is a dark art as far as much of theoretical syntax is concerned, best passed over in silence. Eventually I found a few short treeable sentences, but they were clearly the outliers. This syntax stuff just didn’t work for everyday written language. And as for the spoken conversations going on around me–forget about it.
There are two ways you can go from here. Some frameworks, like HPSG, see this as a problem to be solved and make coverage an explicit theoretical goal. As a rule these frameworks have computer implementations and are less insistent about their grounding in psychology. More E- than I-language. They still have a hard time with your daily paper, but at least they try. Traveling further still along the computers-and-description axis you come to the markup formalism employed by the Penn Treebank, which does a helluva job on Wall Street Journal stories and can even handle speech disfluency, but at the cost of drastically less theoretical sophistication. No feature structures, no logical form, no movement rules aside from early-1980s Principles-and-Parameters-style noun-phrase traces, which dot the trees like legwarmers. Really it’s just a bunch of n-way-branching structures and non-terminal node names that sometimes have a whiff of oh-hell-let’s-just-call-it-something about them.
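To make that last point concrete, here is a minimal sketch in Python of how little machinery a Treebank-style bracketing asks for. The example tree is hand-written in the general style of the Penn Treebank, not copied from the corpus, and the helper names `parse_bracketed` and `leaves` are my own inventions for illustration. It reads one bracketed tree into nested (label, children) tuples and separates the real words from the noun-phrase trace.

```python
import re

def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    # Tokens are "(", ")", or any run of non-space, non-paren characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must open with '('"
        pos += 1
        label = tokens[pos]          # e.g. S, NP-SBJ-1, -NONE-
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # any number of daughters
            else:
                children.append(tokens[pos])   # a word or a trace marker
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    return read_node()

def leaves(tree):
    """Return (tag, token) pairs for every terminal, left to right."""
    label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append((label, child))
        else:
            out.extend(leaves(child))
    return out

# Hand-written illustration (an assumption, not real corpus data):
# a passive whose object position holds a trace co-indexed with the subject.
EXAMPLE = ("(S (NP-SBJ-1 (DT The) (NN pen)) "
           "(VP (VBD was) (VP (VBN given) (NP (-NONE- *-1)))))")

tree = parse_bracketed(EXAMPLE)
words = [tok for tag, tok in leaves(tree) if tag != "-NONE-"]
traces = [tok for tag, tok in leaves(tree) if tag == "-NONE-"]
print(words)   # the pronounced words
print(traces)  # the empty elements
```

The whole data model is a label plus a list of daughters. Everything a richer theory would encode in feature structures or movement rules has to be squeezed into the label string or the lone trace token.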
The other way you can go is to say that if clean rooms are good enough for physicists, they’re good enough for linguists as well. Newton’s First Law was an advance over Aristotelian models because it abstracted away the phenomenon of friction, which is so much a part of our daily experience of the world. Chomsky puts it quite well when he says that you can’t discover the laws of physics by “looking out the window and…watching the leaves float by.” He doesn’t say so, but it’s safe to take this as a classic Kuhnian sentiment. There isn’t some universally agreed upon body of facts out there that theories compete to explain. Instead the delineation of facts is part and parcel of a theoretical framework.
For Chomsky, theoretical sophistication is the point of syntax, and if that comes at the price of being unable to give a formal account of any random newspaper paragraph or conversation on the street, so be it. Let’s just call that part performance, or rhetoric, or whatever, so that we can move past it and on to the heart of the matter, which is an account of the innate human language capacity. I smell a tradeoff here.
| Tradeoff | Coverage vs. Theoretical Sophistication |
| Deeper Principle | A linguist can only remember so many rules |
Generating a large set of utterances from a smaller set of rules is hard. There are just too many things a person can say, and only so many rules a linguist can fit comfortably in their head. So either you simplify your rule set to make it hew more closely to the range of utterances, or you rule out classes of utterances as beside the point.
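A toy illustration of the arithmetic, as a Python sketch: the grammar below is a made-up clean-room fragment (every rule and word in it is an assumption chosen for the example, not anyone's actual analysis), and `expand` simply enumerates every string the rules license.

```python
from itertools import product

# A hypothetical clean-room grammar: books, pens, tables, and not much else.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V", "NP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["book"], ["pen"], ["table"]],
    "V":   [["sees"], ["moves"]],
    "P":   [["on"]],
}

def expand(symbol):
    """Return every word sequence derivable from `symbol` (grammar is non-recursive)."""
    if symbol not in GRAMMAR:
        return [(symbol,)]                     # a terminal: just the word itself
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s) for s in rhs]       # expansions of each daughter
        for combo in product(*parts):          # every way of combining them
            results.append(tuple(w for part in combo for w in part))
    return results

sentences = set(expand("S"))
rule_count = sum(len(alternatives) for alternatives in GRAMMAR.values())

print(rule_count, "rules generate", len(sentences), "sentences")
# Anything containing "and" is simply outside the fragment.
print(tuple("the book and the pen sees a table".split()) in sentences)
```

Thirteen rules yield 504 distinct sentences here, which sounds like a bargain until you notice that none of them contains “and”. Covering that one word means adding rules, and each added rule is one more thing to keep in your head.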
Where you choose to locate yourself on this continuum depends on what you believe the object of linguistic study should be. If you’re on board with the reigning hypothesis of the past fifty years or so–that there is an innate and distinct language module in the human brain–then it’s worth ignoring some New York Times dross to get at it. If you think Universal Grammar is a reification–or if you believe it’s real, but doubt that it’s possible to discover neurological facts by drawing trees on a blackboard–then you’ll veer towards coverage, relying on data culled from the wild to keep you honest. It may not be a paradigm shift, but it’s at least a paradigm rift, across which practitioners eye each other nervously.