Possible measures of asymmetry and redundancy in collocations
Issue: Vol 1 No. 2 (2014)
Journal: Journal of Research Design and Statistics in Linguistics and Communication Science
Subject Areas: Linguistics
Abstract:
It has long been recognized that developing measures of the internal structure of collocations is an important goal (Sinclair, 1991). Recently, Gries (2013) presented a measure that captures the asymmetric nature of conditional probabilities in collocations. This paper contributes to the discussion by introducing measures of asymmetry and redundancy that may meet the needs of some researchers. Two asymmetry measures are described. The first captures only frequency asymmetry, while the second is an asymmetric version of the mutual information measure. A measure of semantic redundancy is also described. This measure takes a higher value when the fact that two words co-occur contains more information than the uncertainty introduced by the occurrence of the individual words.
Author: Robert Nelson
References:
Bird, S., Klein, E. and Loper, E. (2009). Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media.
Bybee, J. L. (2010). Language, Usage and Cognition (Vol. 98). Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511750526
Dirven, R. and Verspoor, M. (Eds) (2004). Cognitive Exploration of Language and Linguistics (Vol. 1). New York: John Benjamins Publishing. http://dx.doi.org/10.1075/clip.1
Ellis, N. C. (2006). Language acquisition as rational contingency learning. Applied Linguistics, 27 (1): 1–24. http://dx.doi.org/10.1093/applin/ami038
Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.) Corpora and Language Teaching, 13–32. New York: John Benjamins. http://dx.doi.org/10.1075/scl.33.04gra
Gries, S. T. (2010). Useful statistics for corpus linguistics. In Aquilino Sánchez and Moisés Almela (Eds) A Mosaic of Corpus Linguistics: Selected Approaches, 269–291. Frankfurt am Main: Peter Lang.
Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics, 18 (1): 137–166. http://dx.doi.org/10.1075/ijcl.18.1.09gri
Justeson, J. S. and Katz, S. M. (1995). Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (1): 9–27. http://dx.doi.org/10.1017/S1351324900000048
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22 (1): 79–86. http://dx.doi.org/10.1214/aoms/1177729694
Liu, D. (2013). Salience and construal in the use of synonymy: A study of two sets of near-synonymous nouns. Cognitive Linguistics, 24 (1): 67–113. http://dx.doi.org/10.1515/cog-2013-0003
Michelbacher, L., Evert, S. and Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7 (2): 245–276. http://dx.doi.org/10.1515/cllt.2011.012
Ramscar, M., Dye, M. and McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of mouses in adult speech. Language, 89 (4): 760–793. http://dx.doi.org/10.1353/lan.2013.0068
Renouf, A. and Banerjee, J. (2007). Lexical repulsion between sense-related pairs. International Journal of Corpus Linguistics, 12 (3): 415–444. http://dx.doi.org/10.1075/ijcl.12.3.05ren
Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds) Classical Conditioning II: Current Research and Theory, 64–99. New York: Appleton-Century-Crofts.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27 (3): 379–423. http://dx.doi.org/10.1002/j.1538-7305.1948.tb01338.x
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Spivey, M. J. and Richardson, D. C. (2008). Language embedded in the environment. In P. Robbins and M. Aydede (Eds) The Cambridge Handbook of Situated Cognition, 382–400. Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511816826.020
Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76 (1): 341–357. http://dx.doi.org/10.1086/224909
Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4 (1): 66–82. http://dx.doi.org/10.1147/rd.41.0066
Wolfram, S. (2014). Launching Mathematica 10 – with 700+ New Functions and a Crazy Amount of R&D. http://blog.wolfram.com/2014/07/09/launching-mathematica-10-with-700-new-functions-and-a-crazy-amount-of-rd