Possible measures of asymmetry and redundancy in collocations

Issue: Vol. 1 No. 2 (2014)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.v1i2.20304

Abstract:

It has long been recognized that developing measures of the internal structure of collocations is an important goal (Sinclair, 1991). Recently, Gries (2013) presented a measure that captures the asymmetric nature of conditional probabilities in collocations. This paper intends to contribute to the discussion by introducing measures of asymmetry and redundancy that may meet the needs of some researchers. Two asymmetry measures are described. The first captures only frequency asymmetry, while the second is an asymmetric version of the mutual information measure. A measure of semantic redundancy is also described. This measure takes a higher value when the fact that two words co-occur contains more information than the uncertainty introduced by the occurrence of the individual words.

Author: Robert Nelson

References:

Bird, S., Klein, E. and Loper, E. (2009). Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media.


Bybee, J. L. (2010). Language, Usage and Cognition (Vol. 98). Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511750526


Dirven, R. and Verspoor, M. (Eds) (2004). Cognitive Exploration of Language and Linguistics (Vol. 1). New York: John Benjamins Publishing. http://dx.doi.org/10.1075/clip.1


Ellis, N. C. (2006). Language acquisition as rational contingency learning. Applied Linguistics, 27 (1): 1–24. http://dx.doi.org/10.1093/applin/ami038


Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In Karin Aijmer (Ed.) Corpora and Language Teaching, 13–32. New York: John Benjamins. http://dx.doi.org/10.1075/scl.33.04gra


Gries, S. T. (2010). Useful statistics for corpus linguistics. In Aquilino Sánchez and Moisés Almela (Eds) A Mosaic of Corpus Linguistics: Selected Approaches, 269–291. Frankfurt am Main: Peter Lang.


Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics, 18 (1): 137–166. http://dx.doi.org/10.1075/ijcl.18.1.09gri


Justeson, J. S. and Katz, S. M. (1995). Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (1): 9–27. http://dx.doi.org/10.1017/S1351324900000048


Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22 (1): 79–86. http://dx.doi.org/10.1214/aoms/1177729694


Liu, D. (2013). Salience and construal in the use of synonymy: A study of two sets of near-synonymous nouns. Cognitive Linguistics, 24 (1): 67–113. http://dx.doi.org/10.1515/cog-2013-0003


Michelbacher, L., Evert, S. and Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7 (2): 245–276. http://dx.doi.org/10.1515/cllt.2011.012


Ramscar, M., Dye, M. and McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of mouses in adult speech. Language, 89 (4): 760–793. http://dx.doi.org/10.1353/lan.2013.0068


Renouf, A. and Banerjee, J. (2007). Lexical repulsion between sense-related pairs. International Journal of Corpus Linguistics, 12 (3): 415–444. http://dx.doi.org/10.1075/ijcl.12.3.05ren


Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds) Classical Conditioning II: Current Research and Theory, 64–99. New York: Appleton-Century-Crofts.


Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27 (3): 379–423. http://dx.doi.org/10.1002/j.1538-7305.1948.tb01338.x


Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.


Spivey, M. J. and Richardson, D. C. (2008). Language embedded in the environment. In P. Robbins and M. Aydede (Eds) The Cambridge Handbook of Situated Cognition, 382–400. Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511816826.020


Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76 (1): 341–357. http://dx.doi.org/10.1086/224909


Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4 (1): 66–82. http://dx.doi.org/10.1147/rd.41.0066


Wolfram, S. (2014). Launching Mathematica 10 – with 700+ New Functions and a Crazy Amount of R&D. http://blog.wolfram.com/2014/07/09/launching-mathematica-10-with-700-new-functions-and-a-crazy-amount-of-rd