International Encyclopedia of Linguistics

Colin P. Masica

South Asian Languages. 

The South Asian languages form a major linguistic area centered in the Indian subcontinent; hence it is often called the “Indian linguistic area.” It covers India, Pakistan, Bangladesh, Nepal, and Sri Lanka, with extensions into Afghanistan, Tibet, and Burma (see Map 1). It includes languages belonging to eight genetic groups:

Map 1. Distribution of Language Families of South Asia

  • (a) Dravidian: all languages

  • (b) The I[ndo-]A[ryan] sub-branch of I[ndo-]E[uropean]: all languages except Romany, which, having left the region, has lost many areal features

  • (c) The Munda branch of Austro-Asiatic: most languages (see below)

  • (d) The Iranian sub-branch of IE: those languages on the immediate margins of the subcontinent—whether originally of the Eastern Iranian group, like Pashto, or intrusive, like Western Iranian Baluchi

  • (e) The Nuristani sub-branch of IE, mainly in northeast Afghanistan

  • (f) T[ibeto-]B[urman], particularly languages within the mountain rim that defines the subcontinent—but also Tibetan and, in part, Burmese

  • (g) Tai: only the Khamti language, intrusive in Assam

  • (h) The language isolate Burushaski, in the far north of Pakistan

Mere physical presence within the geographic confines of the area, without much interaction with speakers of other languages, does not entail typological participation in it. Thus the Khasi language (belonging to the Mon-Khmer branch of the Austro-Asiatic family)—self-sufficient on a mountaintop in the middle of north-eastern India (now the State of Meghalaya)—does not really participate; and Sora of the South Munda group, hidden in rugged hills on the Orissa/Andhra border, does so only to a limited extent. (Although the Nicobar and Andaman Islands are politically part of India, their languages stand outside the area, both geographically and typologically.)

The morpheme-by-morpheme intertranslatability of local varieties of Marathi, Kannada, and Urdu—reported by Gumperz and Wilson 1971 for the village of Kupwar in the Maharashtra/Karnataka border area—suggests some of the processes that went into the formation of the South Asian area; however, this is not the general norm. More typically, South Asian languages show agreement at some points, while retaining language-specific or group-specific characteristics at others. Like all large linguistic areas with a complex history, South Asia shows many sub-areas of more intense or special convergence, or of partial convergence with other areas. It also has features that place it within larger areal configurations.

For reference, see Weinreich 1957, Andronov 1964, Kuiper 1967, Edel'man 1968, Vermeer 1969, Southworth 1971, Pandit 1972, Southworth and Apte 1974, Masica 1976, 2001, Shapiro and Schiffman 1981 (especially chaps. 4–8), Klaiman 1986, and Krishnamurti 1986 (especially Part II, pp. 123–285).

1. Features of larger areas

Pan-areal features defining South Asia as part of significantly larger areas include the following.

1.1. O[bject] V[erb] word order

This is dominant in a wide contiguous area of Asia, including Iran, Burma, Central Asia, native Siberia, Korea, and Japan—not, however, in further Southeast Asia or the Arab Middle East, where VO order prevails. A VO exception within South Asia, besides Khasi, is Kashmiri, where this is a recent innovation.

1.2. So-called absolutive or conjunctive participle

This is a non-finite verbal form in which one or more successive predications are subordinated to a final one containing a finite verb. It is largely co-extensive with OV syntax in Asia, occurring in Altaic and Iranian as well as in South Asia (including Kashmiri); but it extends farther, in a diminished role, in the west.

1.3. Explicator compound verb construction

Here a finite verb, one of a limited set of special auxiliaries, “completes or specifies the sense” of an immediately preceding main verb in the absolutive form. The two verbs refer to a single event (cf. Hook 1977). The distribution is narrower than that of the absolutive itself: it is characteristic in varying degrees of Central Asian Altaic and of Japanese and Korean. The resultative verbs of Chinese and mainland SE Asian languages are somewhat analogous. As a device, this contrasts with the use of prefixes in Slavic, German, Hungarian, and Greek to express similar meanings (with important differences).

1.4. Reduplication

As a syntactic (rather than word-forming) device, reduplication is found in all languages to some degree, but is highly developed, with special grammaticalized functions, in both South and SE Asia. Whether any of these functions is distinctively South Asian, and hence definitive of that area, is not clear (see Gonda 1949, Abbi 1987); they are generally least characteristic of Malayalam, according to Abbi.

2. Features of South Asia

Pan-areal features defining or characteristic of South Asia include the following.

2.1. Retroflexion

A contrast of retroflexion (some prefer the term “retraction”) in the apical stops ṭ/t, ḍ/d (and in some languages also in nasals, laterals, and flaps, thus ṇ/n, ḷ/l, ṛ/r) is found in all the areal stocks: Drav., IA, Munda, TB (Tibetan itself and a number of Sub-Himalayan languages), Nuristani, Burushaski, Eastern Iranian (Pashto, Ormuri, some Pamir languages), and intrusive Western Iranian Baluchi; it is found even in Indian English. However, this clearest “South Asian” feature is not present in all languages of the area: it is absent from Assamese (easternmost IA) and from nearby eastern TB dialects (Bodo, Garo, Meithei, and the Naga and Kuki groups), as well as from the Munda languages Sora and Korku (the latter is located far to the west of other Munda languages, in the Satpura Range north of Maharashtra). These two Munda languages preserve a peculiar asymmetric system, dental voiceless t vs. retroflex voiced ḍ, which seems to have been characteristic of Proto-Munda. (On this and other phonological features, see Ramanujan and Masica 1969.) The retroflex opposition is most strongly developed in a band of languages stretching from eastern Iranian, Nuristani, and Burushaski in the northwest, southward through western IA to Drav., and around to Oriya on the east coast. These languages have ṇ or ḷ, or both, and, at the northwestern end, the additional oppositions c̣/c, ṣ/s, ẓ/z; they also typically have greater lexical and textual frequency of retroflexes.

2.2. Postpositions

rather than prepositions occur in all areal stocks, except Eastern Iranian and Nuristani on the western border of the area, which have both constructions. (Several optional prepositions have wandered farther east into Hindi.) This feature is shared with Altaic—with which the South Asian distribution is not contiguous, because of intervening Iranian and Nuristani. The pattern has been linked typologically with OV word order; but the two features are independent, as shown by their disassociation in Persian, which has OV with prepositions, transitional to “normal” VO with prepositions further west.

2.3. The “echo-word,”

repetition of a stem with change of the first consonant or syllable (typically to a labial in IA, a velar in Drav.), yields the meaning ‘and/or things like that’: Hindi caay-vaay ‘tea, etc.’, Telugu paalu-giilu ‘milk, etc.’ The function of the feature may be called ‘generalization’; it is also found in Munda and Eastern Iranian (with m-). Heston 1980 has pointed out that it extends to colloquial Iranian (using labials, e.g. cai-mai); its status in TB, Nuristani, and Burushaski is not clear.

2.4. Phonesthetic or expressive words

form a large lexical category. These are no doubt found in many languages (Eng. zigzag, flipflop), but are highly developed in South Asia (a statistical criterion might be invoked), where they have specific formal characteristics. Reduplication occurs on the pattern CVC-CVC or CVCV-CVCV, along with partial reduplication under certain rules, a characteristic suffix -k, and verb-forming propensities. The pattern also yields many “areal etymologies” (Emeneau 1980). Possible connections with SE Asia, via Munda, need investigation.

2.5. Adj[ective] + N[oun] order

(a variable independent of OV/VO), occurs in Drav., IA (including Kashmiri), Munda, Eastern Iranian, Nuristani, and Burushaski, but generally not in TB, with some exceptions (Balti in Kashmir; Newari, Magari, Rai, and Sunwar in Nepal; Kanauri). This pattern is also characteristic of Uralic and Altaic—with which South Asia is narrowly connected via the Pamirs—and of (northern) Chinese; it contrasts with N + Adj order in SE Asia, Persian, and Arabic.

2.6. Dative subject construction

The experiencer of an act or condition is expressed as a “Subject” (or topic) in an oblique case, most often the dative. Although found in other languages (e.g. German Mir ist kalt), this construction seems to be developed to an extraordinary degree in most South Asian languages (statistical confirmation may be possible); it may be correlated with an areal semantic feature of volitionality (Klaiman 1986). It is also related to the absence of a verb ‘have’.

2.7. Quotative construction

is a quote or onomatopoeic expression with a postposed marker which is equivalent either to ‘having said’ or to ‘thus’, the latter paralleled by the Sanskrit element iti. This is pan-Dravidian; in IA it is not now universal, but is widely distributed (Eastern group, Dakhani Hindi-Urdu, Nepali, Marathi, Gujarati). It is also found in TB (Newari, Ladakhi), but its status in Munda is not clear.

3. Sub-areal features

Cross-genetic features defining major sub-areas within South Asia include the following.

3.1. An opposition of nasalized versus oral vowels

is found in IA, Nuristani, Munda, TB, and apparently in Baluchi, but not in Drav. languages except the northern Kurukh (and allophonically in Telugu and Tamil); it is also lacking in Burushaski, and in Pashto and other Iranian languages. It has been lost in IA Sinhalese and Marathi. The nearest languages with similar oppositions are very distant.

3.2. Exclusively suffixing agglutinative morphology

(except for some lexical prefixes recently introduced via Persian and Sanskrit vocabulary) is typically Aryo-Dravidian, though other areal stocks, as well as Dravidian Brahui, have both prefixes and suffixes. (Munda even makes extensive use of infixes.) This feature sets modern IA apart from the rest of IE, including neighboring Iranian and Nuristani, and from ancestral Sanskrit. It is shared by Altaic and Uralic languages to the north; but these are cut off from the Aryo-Dravidian area by Iranian, Tibetan, and Burushaski, in which prefixes play important roles.

3.3. Elaboration of suffixal causative morphology

(many verbs have second causatives) is found in Drav., West/Central IA, and Baluchi. It is surrounded by languages (including Eastern IA and Sinhalese) which are limited to morphological first causatives. In Munda (originally), TB, and Burushaski, causative morphology is prefixal; but Northern Munda has developed suffixal devices. Second-degree morphological (suffixal) causatives are also characteristic of Altaic and Uralic, which are not strictly contiguous with the Aryo-Dravidian area.

4. Partial area features

Features attaching some South Asian languages to languages outside the area include those listed below.

4.1. Contrastive aspiration in consonants

is absent from Drav. (except in careful pronunciation of borrowed words in Telugu and Kannada), Iranian, and Nuristani (also from IA Sinhalese as a sub-areal feature), and is present only through borrowing in Munda; but it is a major feature of IA, Burushaski, TB, and of the Chinese and Tai languages beyond. While voiceless aspirates kh, ch, ph, etc. are widespread, voiced aspirates gh, jh, bh etc. are a peculiarity of IA. Their absence from a few IA languages (Punjabi, East Bengali dialects) correlates with the presence of tone and related phenomena (glottalization, etc.), which are also present in TB, Burushaski, Tai, and Chinese. Neither aspiration nor tone is characteristic of Altaic or Uralic.

4.2. Numeral classifiers

(as in ‘two-piece thing’) have developed mainly in Eastern IA (especially Assamese) and adjoining NE Drav., TB, and Munda languages, but they are peripheral to a center of the phenomenon in SE Asia. Classifiers are also found (independently?) in Iranian, according to Heston 1980.

4.3. Pronominal suffixes

link some northwestern IA languages and Drav. Brahui with Iranian generally, with Burushaski (prefixes), and with Nuristani.

4.4. The ergative construction

has a distribution now interrupted by the spread of Persian and Turkish, perhaps linking much of IA plus Tibetan, Burushaski, and various Iranian languages to the ergative languages of the Caucasus.

5. Area and history

The history of these and similar developments (the list above is not exhaustive) is varied and complex. Attention has long focused on the basic retroflexion feature which sets off the South Asian area from neighboring areas. Though clearly non-IE, it is found in the earliest Sanskrit. The opposition appears to be recent in Munda and Tibetan, but must be reconstructed for Proto-Dravidian. The debate has accordingly been over whether the developments in early Sanskrit were motivated purely internally (Hock 1984), or owe something to Drav. loanwords, or to the carryover of habits involved in language shift from a Dravidian or other substrate (Thomason and Kaufman 1988). Generally overlooked in this controversy is the high development of the opposition in the northwestern corner of the area, where it affects sibilants and affricates. Such sounds may have triggered the Sanskrit developments long ago in the immediate geographic vicinity.

It is not disputed that the formation of the area has involved a basic morphological, syntactic, and semantic remodeling of IA on a Drav. model, occasionally tempered by other influences. The next question is that of the similarity of Drav. to the Uralic and Altaic languages. ‘Typological’ explanations are not completely satisfying. Genetic relationship has long been proposed—earlier to Altaic, and more recently to Uralic. The similarity may reflect ancient areal association.

The South Asian area is important for linguistic area studies generally, since the genetic boundaries are clear within the subcontinent. There is no question that convergence has taken place across them, both locally and on a larger scale.

                                                  Colin P. Masica

