University of Chicago, firstname.lastname@example.org
Time: Friday, March 10, 2017, 9:00 – 10:00
Location: Building B 4.1, room 0.01
All linguists are familiar with the experience of explaining to non-linguists that the goal of linguistics is to make explicit what native speakers know implicitly. We understand that this task (the task of turning the implicit into something that is both formal and explicit) is anything but trivial. Tacit or implicit knowledge, such as the knowledge of language that native speakers have, is of great value, but it is not easy to subject it to analysis, though we linguists believe that a great reward comes from the analysis.
The same thing can be said, I believe, about the linguist’s knowledge of how to construct grammars, and the task of making this process explicit and open to public scrutiny. My goal in this talk is to illustrate what can be learned from the effort to develop algorithms for producing grammars from data.
I will begin with a brief introduction to word discovery based on Minimum Description Length (MDL) analysis (Rissanen 1989, de Marcken 1996), and show how the errors we observe there lead to the study of the automatic learning of morphological structure. The first steps of morphological analysis are segmentation and classification, and these steps are followed by building a morphological grammar (a phrase-structure grammar, for example).
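To make the MDL idea concrete, here is a minimal sketch of a two-part description length: the bits needed to spell out a lexicon plus the bits needed to encode the corpus as a sequence of lexicon entries. It is a toy illustration, not the procedure of Rissanen or de Marcken. The function name `description_length`, the 27-symbol alphabet, the spelling cost, and the tiny corpus are all assumptions made for this example.

```python
import math
from collections import Counter

def description_length(tokens, alphabet_size=27):
    """Two-part description length, in bits, of a corpus given as a token list.

    Assumption: each lexicon entry is spelled letter by letter (plus one
    end marker) at log2(alphabet_size) bits per symbol, and each token in
    the corpus costs -log2 of its relative frequency.
    """
    counts = Counter(tokens)
    total = len(tokens)
    bits_per_symbol = math.log2(alphabet_size)
    # Cost of writing down the lexicon itself.
    lexicon_bits = sum((len(entry) + 1) * bits_per_symbol for entry in counts)
    # Cost of the corpus once the lexicon is known.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + data_bits

# Hypothetical unsegmented text and two candidate segmentations of it.
text = "thedogsawthecatthecatsawthedog" * 20
as_letters = list(text)
as_words = ["the", "dog", "saw", "the", "cat",
            "the", "cat", "saw", "the", "dog"] * 20

letters_bits = description_length(as_letters)
words_bits = description_length(as_words)
print(f"letters only: {letters_bits:.1f} bits")
print(f"word lexicon: {words_bits:.1f} bits")
```

On this toy corpus the word lexicon gives the shorter total description, which is the sense in which MDL prefers it. With data this repetitive, a lexicon holding the whole sentence as a single entry would score lower still, a small example of the kind of error that real word-discovery systems have to contend with.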
Each step we take to build an algorithm to embody our analytic knowledge as linguists teaches us in two ways: on the one hand, it reassures us that we do succeed in important ways in analyzing even languages we have never seen, but on the other, it makes us very aware of how difficult it is to analyze one part of a language without assuming other parts of the language to have already been successfully analyzed.
There are three very difficult questions that emerge from this effort:
- How do we evaluate how well a grammar models a particular set of data?
- How do we evaluate and compare two different grammars that handle the same data?
- How is it possible for an algorithm to seek a better analysis than its current analysis? That question has three subparts: What does it even mean for an algorithm to propose something new? How can an algorithm “look at” part of its analysis and recognize that it should be dissatisfied with it? And how can it tell if a change in its analysis is an improvement or not?
These questions relate very directly to the question of what kind of innate knowledge linguists’ work supports. Is the knowledge that the learning algorithm (or the child) must be endowed with of the same sort as the knowledge that she will be learning as she learns her grammar? If the answer is yes, then this kind of innate knowledge fits Leibniz’s picture of innate knowledge as enthymeme, whereby the discovery of innate knowledge is just like discovering implicit major premises in people’s arguments. If the answer is no (and the answer is no), then this kind of knowledge is akin to Kant’s notion of a category (though it is not static in the way Kant’s is): the knowledge needed to answer the three questions above involves serious questions regarding information and complexity, notions that are preconditions for any kind of concrete knowledge about the world.
Much of the relevant work in this area has been done under the rubric of machine learning, most notably the subdomain referred to as unsupervised learning. Among the most important elements found there is the ability to quantify the amount of information that is left unexplained in an analysis. This is important not because we wish to leave information unexplained, but because it allows us to compare two different analyses of the same data, and determine which one leaves less information unexplained.
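As a rough illustration of comparing two analyses by the information they leave unexplained, the sketch below scores the same small word list under two models: one that spells every word letter by letter, and one that treats each word as a stem plus a suffix chosen independently. The word list, the function names, and both probability models are assumptions made for this example, and the cost of stating each model is ignored here, although a full MDL comparison would add it.

```python
import math
from collections import Counter

def bits_letter_model(words):
    """Bits needed to encode the words one letter at a time.

    Assumption: a unigram letter model with "#" as an end-of-word marker.
    """
    symbols = [ch for w in words for ch in w + "#"]
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())

def bits_stem_suffix_model(analyses):
    """Bits needed to encode the words as (stem, suffix) pairs.

    Assumption: stem and suffix are chosen independently, each with its
    relative frequency in the analysed word list.
    """
    n = len(analyses)
    stem_counts = Counter(stem for stem, _ in analyses)
    suffix_counts = Counter(suffix for _, suffix in analyses)
    return -sum(
        math.log2(stem_counts[stem] / n) + math.log2(suffix_counts[suffix] / n)
        for stem, suffix in analyses
    )

# Hypothetical data: three English verb stems with four suffixes each.
stems = ["jump", "walk", "talk"]
suffixes = ["", "s", "ed", "ing"]
analyses = [(stem, suffix) for stem in stems for suffix in suffixes]
words = [stem + suffix for stem, suffix in analyses]

letter_bits = bits_letter_model(words)
morph_bits = bits_stem_suffix_model(analyses)
print(f"letter model:      {letter_bits:.1f} bits unexplained")
print(f"stem+suffix model: {morph_bits:.1f} bits unexplained")
```

The stem-plus-suffix analysis leaves far fewer bits unexplained on this word list, which is the kind of quantitative comparison between analyses that the paragraph above describes.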
These lofty ideas will be illustrated with concrete problems of learning the morphology of English, French, and Swahili. Review articles can be found in Goldsmith 2010 and Goldsmith, Lee, and Xanthos 2017. Chater et al. (2015) present a different aspect of this concern.
References:
- Rissanen, Jorma (1989): Stochastic Complexity in Statistical Inquiry. World Scientific Publishing.
- de Marcken, Carl (1996): Unsupervised Language Acquisition. PhD dissertation, MIT. arXiv:cmp-lg/9611002.
- Goldsmith, John (2010): Segmentation and morphology. The Handbook of Computational Linguistics and Natural Language Processing, 364-393. Wiley-Blackwell.
- Goldsmith, John, Jackson Lee, and Aris Xanthos (2017): Computational approaches to morphology. Annual Review of Linguistics.
- Chater, Nick, Alex Clark, John Goldsmith, and Amy Perfors (2015): Empiricism and Language Learnability. OUP.