For my current project, which automates an approach to understanding the decision-making processes of elites first developed by Robert Axelrod in the 1970s, I make use of two newly developed computational-linguistic tools (see my GitHub profile).
Rather than discussing the specifics of my current project, which you can read about here, I will use this space to try to stimulate a discussion about the study of texts and language as a means of understanding culture, norms, and collective behavior.
There is a long-standing tradition across the social sciences of analyzing the language that people use to communicate ideas and express themselves, as a means of understanding their motives, desires, and actions. In economics, proponents of the Historical School analyzed language to study the unfolding of economic events in a particular time and place. While that school of thought died out in the early 20th century, ironically for historical reasons, new theories of language and economics are starting to sprout (here and here), although they all seem to be conceived by the same person: Ariel Rubinstein (here is a review). In Political Science, Gary King and Justin Grimmer have made noteworthy contributions to advancing our social and political understanding by systematically studying texts. In Sociology, spoken language and texts, analyzed mostly through a practice known as "discourse analysis", are probably still the main sources of data, and this has been so since the early days of Max Weber and Emile Durkheim; the same seems to be true of Anthropology, but please correct me if I'm wrong (although the criticism below most often applies). I would greatly appreciate it if someone could point me to more scientifically rigorous approaches to the study of texts and/or utterances in these fields (Anthropology and Sociology)!
The main critiques I am aware of, regarding the use of spoken and written language to gain an understanding of social forces, relate to one of two things: 1) the selection of evidence seems arbitrary and anecdotal, or worse, evidence most often seems to be purposely selected to make a particular point (it is hard to check how the evidence was selected, and the data collection process is hardly transparent); 2) even if the texts or utterances were selected in a transparent and unbiased way, the amount of evidence is usually small, since it has to be processed, analyzed, or interpreted by humans, which allows only very small bits of text or utterance to be subjected to analysis. This hardly amounts to a serious method; it seems to be a collage, an artistic or poetic expression, rather than a science. For true understanding, this modus operandi is no doubt unacceptable as a means of transparently gaining a shared understanding of the social world. But this is not to say that texts and utterances produced by humans cannot serve as rich data sources for testing interesting hypotheses. On the contrary, it seems to me that texts and transcribed utterances are among the richest sources of data perpetually produced by the social world (and in very large amounts)!
Hence, it seems natural to me that text analysis, as practiced in the 21st century by computational linguists and artificial intelligence researchers, should be an integral part of the social sciences. It comes as a continuous surprise to me that I still find astonishingly little use of NLP (Natural Language Processing) in the social sciences, and practically no use of the more cutting-edge tools that have been developed in Computational Linguistics over the last few years, for example at Stanford, MIT, or Carnegie Mellon University.
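To make the contrast with manual interpretation concrete, here is a minimal sketch of what even the most basic programmatic text analysis buys you: processing an entire corpus transparently and reproducibly, rather than hand-picking passages. It uses only Python's standard library; the two-sentence "corpus" is invented for illustration and stands in for whatever collection of documents a researcher might assemble.

```python
from collections import Counter
import re

def term_frequencies(documents):
    """Count word occurrences across a corpus (lowercased, punctuation stripped).

    The same code runs identically on two documents or two million,
    and the selection rule (every document, every word) is explicit.
    """
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts

# Invented mini-corpus for illustration only.
corpus = [
    "If we raise tariffs, then trade will decline.",
    "Trade will decline if sanctions are imposed.",
]

freqs = term_frequencies(corpus)
print(freqs.most_common(3))
```

Real NLP pipelines of course go far beyond counting words (parsing, entity recognition, topic models), but the methodological point is already visible here: the procedure, not the researcher's eye, decides which evidence enters the analysis.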