Multilingual Virtual World: Languages on the Internet

By Elena Maceviciute
Issue 42

The present stage of Internet development has already changed earlier conceptions of what the Internet is and how it can be used. This maturing and well-exploited technological tool has also opened entirely new spaces for researchers in the fields of culture, ethnography, the arts, psychology, etc. The very first book on language pragmatics on the Internet appeared in 2001 (Crystal 2001). The issue of languages on the Internet has interested me since 1998, and I have been monitoring the changing situation more or less regularly (see: Maceviciute 2002).

When the Internet came into being and started developing as a key feature of present and future information societies, most of its users and researchers supported the idea that it would be the main tool of globalisation, totally devoid of any specific cultural features. Authors used to point out that even language, the main feature of national identity, was unified on the Internet. Multiple fears of weakening social links, estrangement, a ruthless dictatorship of unification, etc. blossomed in response. At present, we know that many of them were unfounded and that we should worry about other things. One of the unfounded fears was the disappearance of national languages from the Web. However, the issue of one dominating language is not yet resolved.

The aims of this paper are:

  • to assess the multilingual situation on the net as it is now and in the future;
  • to introduce factors influencing the creation of a multilingual net;
  • to introduce the main attitudes related to the language situation on the Internet;
  • to estimate the trends in the development of “small” languages and their language resources on the Web, using the case of the Lithuanian language.

The present paper draws together results of research and surveys published by different organisations and individuals to provide an overview of Internet-related language diversity issues and a short original investigation.

Dr. M. Saulauskas (Vilnius University) stipulates that the “globalising effect of information society (and computer networks) does not mean common uniformity of all involved structures of social, political and cultural being; on the contrary, it multiplies the variety of the diversity of the world's social fabric. It stimulates sporadic proliferation of social morphology making total unification more and more inconceivable...” (Saulauskas 2000).

The present language situation on the Internet proves his point.

Language situation on the Internet: sites and users
The Internet is essentially non-geographic, but it is possible to look at the geography of its users as well as of the information placed or exchanged on the Web. For most of its history, U.S. users and English-language content (which is also U.S.-centred) have dominated the Internet.

What is the present situation? “The art of estimating how many are online throughout the world is an inexact one at best. Surveys abound, using all sorts of measurement parameters. However, from observing many of the published surveys over the last two years, here is an ‘educated guess’ as to how many are online world-wide as of September 2004: 800,040,498 people” (Internet World Stats 2004). Table 1 breaks this number down into Internet users by language.

Table 1: Top Ten Languages on the Web (Number of Users of the Internet by Language)

Source: Internet World Stats 2004 – <http://www.InternetWorldStats.com>

The table also shows the average penetration of the Internet in countries where the majority speak the given language, the estimated number of speakers of the language, and the percentage of those speakers who are on the Internet. English speakers are still the biggest group of Internet users; however, the penetration of the Internet is much higher in the German, Dutch, Japanese, and Italian language zones. Chinese speakers are the second largest language group of Internet users, though the rate of Internet penetration in China is only 8%. In the near future, one can expect Chinese to become the dominant language group on the net. Users of the remaining languages are, at the moment, in the minority among Internet users. However, among them there are big groups of Arabic, Malay, Russian, and Indian language users.

For example, in Russia, penetration of the Internet has increased slowly over the past year, according to the latest findings from the “Russian Internet Monitoring III-2000” report. The maximum Internet audience stood at 9.2 million, or 8.3 percent of the adult population, but it was growing fast and was expected to reach 11 million people by January 2001. However, according to other sources the number of Internet users in Russia is much lower, at 6 million. The growth rate is 93.5%, though the penetration rate is only 4.1% (Internet World Stats 2004). The Russian survey found that 84% of Russian users prefer Russian-language sites for shopping, entertainment or other activities. More and more people use the Internet in India: there were over 7 million Internet users in this country in 2001 (Nua 2002). The most popular services there are e-mail and online chat. There were over 3 million Internet users in the Arab countries of the Middle East and almost 2 million in Israel. The number of users is currently doubling every year. The African world is multilingual in many respects, though the percentage of the African population on the net is low. Almost all Internet users in Morocco speak Arabic and French, and the majority also speak English. About a quarter also speak Spanish or Berber. Internet usage in Mexico, Brazil and Argentina continues to expand, with e-commerce services becoming more and more popular (Nua 2002). Most of the users from these regions will be looking for non-English sites.

The usage of certain languages within regions and countries (not only world-wide) should also be taken into account. For example, in a country like Sweden, with a population of 9,010,700, there are 6,722,600 Internet users (a penetration rate of 74.6%) (Nielsen/NR 2004). The main language used by this group will be Swedish, and the Internet is already used by a majority of all speakers of this language in the world.
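The penetration rate quoted for Sweden is simply the number of Internet users divided by the population. A minimal Python sketch of that arithmetic follows; the function name `penetration_rate` is an illustrative assumption and does not come from any of the cited sources.

```python
def penetration_rate(users: int, population: int) -> float:
    """Return Internet users as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * users / population


# Figures for Sweden as quoted in the text (Nielsen/NR 2004).
sweden_rate = penetration_rate(users=6_722_600, population=9_010_700)
print(f"Sweden: {sweden_rate:.1f}%")  # prints "Sweden: 74.6%"
```

The same function can be applied to any country or language group for which both a user count and a population estimate are available.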

The very existence of language statistics shows the importance of the issue for various interest groups. We could not find anything like that in 1998 and had to rely on indirect data in 2002.

Among these indirect data was an international survey of Internet availability at schools. According to this multi-country survey, “The Face of the Web”, Sweden leads the list in offering students (aged 12-24) access to the Internet from their schools. Three of the top ten countries in this respect have English as a native language.

Table 2: Availability of Internet access from schools

Country        % of students accessing the Internet from school
Sweden         78%
Canada         74%
Taiwan         63%
UK             59%
US             59%
Japan          28%
Italy          28%
Germany        25%
France         25%
Urban China    13%
(Angus Reid Group 2000)

Among other things, the study found that more than nine in ten students who have Internet access in Australia, Canada, the U.S. and Sweden report using the World Wide Web to complete their school assignments. The English proficiency of most children in non-English-speaking countries will not be good enough for this type of task, so this group of users is mainly interested in native-language sites (with the exception, in most cases, of entertainment purposes).

According to Cyveillance, the Internet is growing at an astounding rate: 7.3 million unique pages are added every day (Cyveillance 2004). It is very difficult to estimate the number of Web-sites and Web-pages in different languages. There are 239 domain names related to various territories with at least one registered host (ISC 2004). However, a domain name may not be related to the particular language used to create WWW content.

However, if we look at the representation of languages on the Internet, the picture reflects the connectivity of regions and countries. One can find pages in practically any European language, and even in dialects, regardless of the number of people speaking them (or, in some cases, despite the fact that nobody speaks that particular language any more, like Sudovian or Old Prussian). It is impossible to find sites in most languages used in Africa, though there are plenty of pages in Arabic. The languages of Asia are much better represented: pages in Chinese, Japanese, Hindi, Farsi, Gujarati, Punjabi, Thai and many other languages are abundant, but one may guess that the inequality in the representation of many others will be great. The same situation is observed with the languages of the Pacific area. There are very many pages about Native American languages, though few of them are written in these languages themselves. I have deliberately chosen a regional, not linguistic (e.g., by language families), division, as it corresponds to the “global digital divide”, primarily between North and South. It is quite clear that the number of languages represented on the Internet is growing together with the number of new pages.

The US Internet Council has produced a report “State of the Internet 2000”. One of the main themes of the report is the continuing globalisation of the Internet. As the authors comment, “…the Internet is becoming multicultural, multilingual, and multipolar.” The report's authors emphasise that only 50% of present users of the Internet have English as their first language. The majority of future Internet users will be non-English speakers expecting Internet content in languages other than English. In 2001 USIC stated “Significantly, native English speakers lost their dominance in 2001 and now represent approximately 45% of the online population” (USIC 2001).

However, it was suggested earlier that the language of a country, together with wealth, education, and the pricing of connectivity, is an important correlate of Internet diffusion: countries with English as a native language or with high rates of proficiency in English would have more hosts connected to the Internet. Eszter Hargittai, in her investigation into the factors of connectivity among OECD countries, tested the proposition that “English language exposure will influence the connectivity by favouring native speakers [of English] most, followed by countries with population exhibiting high levels of English training, and discriminating most against populations with low English exposure and proficiency” (Hargittai 1999:706). She expected that a native English-speaking population would encourage the spread of the Internet compared to countries with other native languages. However, her data show “that having a population of native speakers versus good English speakers does not make a difference... even lower levels of English exposure also does not have a large impact on connectivity” (Hargittai 1999:710).

A similar conclusion was reached in a study assessing the factors of Internet development in Asia. Its findings show that Internet penetration is related to a country's wealth, telecommunication infrastructure, urbanisation and stability of government, but not to literacy level, political freedom or proficiency in English (Xiaoming and Seet Kay 2004).

Research thus indicates that the language of the users does not affect the spread of the Internet across countries. What, then, are the factors that ensure the usage of various languages on the Internet?

The attitudes and factors ensuring language diversity on the net
The attitudes towards the role of languages on the Internet are as diverse as the attitudes and opinions that people hold about languages in general. The actual activity on the Web reflects these attitudes to some extent. In Europe, there is an emerging division of language policies, which Treanor (2004) sums up as follows:

  • neo-Atlanticists support English as European language of contact,
  • defensive national language activists seek a limited multilingualism, of national languages,
  • regionalists and separatists want all languages to get equal status, with hundreds of official languages in Europe,
  • technological optimists believe full automatic translation will be available "soon", so the political issues will disappear.

The user groups of different languages (whether or not these enjoy the status of national languages) ensure their presence and usage through network communities. Different movements of enthusiasts defending language diversity, in general or on the net, may also be counted as a force driving multilingualism on the net. On the one hand, there are those who, like Yukio Tsuda, consider that the dominance of English signifies the continuation of neo-colonialism through the colonisation of consciousness and ensures social and communication inequality as well as language discrimination. For them, the only answer is the promotion of the “Ecology of language paradigm”, which advocates the right to language, equality in communication, multilingualism and multiculturalism (Tsuda 2000). On the other hand, there is the research community and those who consider language diversity to be the source of cultural diversity and vitality for humanity. Paul Treanor maintains a page, “Language futures Europe”, on which he collects links on language policy, multilingualism, global language structures, and the dominance of English. Others maintain various sites on endangered languages (see: Foundation for endangered languages 2004).

National governments extend their language policies to the Internet and introduce this new aspect into governmental discourse. France serves as a prominent example of national language policy for the Web. At the opening ceremony of a Francophone summit in Hanoi in 1997, French President Jacques Chirac said that the world could end up speaking and thinking the same way unless nations fight to preserve their linguistic and cultural diversity, especially on the Internet. He was speaking to leaders and representatives of the 47 member and two observer countries of La Francophonie, a loose association of states that have the French language in common. Chirac stressed that linguistic diversity on the Internet should be both defended and imposed and promised that France would put $3.45 million into a special fund created by Francophone nation ministers to ensure that Francophone text, sound, and images would be massively present on the net (Reuters 1997).

Programmes of international governmental organisations, like the European Union and UNESCO, mirror the policies of governments. UNESCO has established the Linguapax Institute, a non-governmental organisation located in Barcelona that is to continue the activities started by a series of meetings organised by UNESCO within the framework of the Linguapax project. Its main orientation is the promotion of policies that protect language diversity and foster the learning of several languages. Among other aims, the Institute seeks to contribute to the presence of multilingualism in cyberspace (Linguapax 2004).

The European Commission pays special attention to the promotion of online content in languages other than English in its action plan “eContent: European digital content on global networks”. As a follow-on to INFO2000, Multilingualism in Information Society, and eEurope, it has launched a new programme that supports European digital content on global networks and promotes linguistic diversity in the information society. The programme is expected to cover a multiplicity of languages, including those of the new members of the EU (eContent 2003).

However, the main factor of language diversity on the Internet is the users' need for sites in their native languages.

Most non-English-speaking Internet users prefer Websites in their own language, according to IDC's eWorld 2001 Survey. Almost 34% of French respondents prefer to visit Websites in English, while 62% prefer sites in French. In Germany, only 18% prefer English-language sites, compared to 79% who prefer their native language. China ranks highest, with almost 85% favouring Websites in Chinese over those in English (almost 15%). Japan has the lowest preference for English (nearly 8%) of the 27 surveyed countries, and is second only to China in its preference for Websites in its native language (almost 84%) (Maroto 2003).

This factor was recognised by the business community quite early, and businesses became more and more interested in providing services in different languages in order to reach their customers. “Being successful in an [electronic] market requires not only understanding the needs of the clients but also being able to communicate with them in their language. The multilingualism is good business, is good for business and those developing [electronic commerce via the Internet] products and services should think multilingual from the start” (Knoppers 1998:101).

For a long time, the dominance of the English language on the Web was ensured not only by the place of the Web's origin and the international character of English, but also by technology and standards that did not support different characters and other multilingual features. At present these technical problems are either solved or under investigation.

Barriers to localisation and multilingualism are falling away. The possibilities and diversity of language resources, as well as the means of teaching, learning, promoting, and practising languages, are constantly growing. The major move was the creation of means supporting different character sets. Multilingual search engines, translation software, etc. came later. Additional technical support for languages comes from the possibility of using multimedia, especially sound and video files or transmission.
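The role of character-set support can be illustrated with a short Python sketch. It checks whether a Lithuanian phrase can be represented in several encodings; the helper `can_encode` and the sample phrase are illustrative assumptions, while the encoding names are standard Python codec names. The sketch only demonstrates the general point that older, English-oriented character sets cannot carry every language's letters; it does not reproduce any specific system discussed in this paper.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "Lithuanian language"; the phrase contains the Lithuanian letter ų.
sample = "Lietuvių kalba"

# ASCII and Latin-1 lack the letter ų; the Baltic set ISO 8859-13
# and the universal UTF-8 both include it.
for name in ("ascii", "latin-1", "iso8859_13", "utf-8"):
    print(f"{name:10} -> {can_encode(sample, name)}")
```

A page author restricted to ASCII or Latin-1 would have to drop or replace such letters, whereas a regional character set or Unicode lets the text appear as written.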

The diversity of linguistic sites on the net is overwhelming. There are language lists registering languages in different forms and for different purposes. Research and educational organisations, NGOs, or individuals maintain them. Some of them list only some languages (e.g., endangered languages of one region), the others up to 4900 (like the Global Recording Network for promotion of evangelism).

All kinds of dictionaries are also available: monolingual, bilingual (like Swedish-English, Swedish-Finnish, Swedish-Greek on <http://www-lexikon.nada.kth.se/skolverket/lexin-en.html>), multilingual, subject multilingual, etc. Special dictionary retrieval sites like <http://www.yourdictionary.com> (which registers dictionaries for over 200 languages) are maintained and continuously updated. Grammars and teaching material, textbooks, free and paid language courses, research findings and projects for the revival of dying languages: it is impossible even to name all of them.

It is important to mention software for the translation of original texts and of retrieved pages. It seems that many professionals are trying to create solutions to this major obstacle to global communication. Until quite recently there were only a few programmes for translation between the major languages, but more and more new ones appear that allow switching not only between languages using the same script, but also between different systems of writing.

Browsing and search tools on the net are becoming multilingual too. AltaVista allows searching in 25 languages, including Chinese, Japanese, Hebrew, and Korean. New software is being developed that allows users to conduct searches across Web sites in their native language.

One of the explanations why connectivity to the Internet does not depend on proficiency in English is that communication services (like e-mail, chat groups, etc.) are language independent. These services allow the use not only of any natural language, but of any dialect, slang, or cipher. Several ethnographic studies of youth chat groups and other on-line communities in different countries (see the work on young people's chat groups by S. Simsova and by the Birmingham Young University research team in Britain) have noted this particular usage of languages. In fact, a natural language may be used as a rhetorical code (or a secret language) which becomes a distinguishing feature of an on-line community, serving not only to unify its members but also to exclude 'outsiders'. The representatives of "small" native languages take advantage of this possibility, especially if they are scattered throughout the world. The numbers of non-English-language discussion groups or lists are also constantly growing. I have found many lists in which people interested in other languages communicate in English. They are of two different types: those created by people interested in the study of or research into non-English languages, and those created by members of small nations like the Hopi, Cheyenne, Navaho, etc., who are scattered over the world. It seems that most of the latter represent languages without strong written traditions. Some of their language sites provide audio materials for language demonstration and learning. It is possible that, with other audio and video communication possibilities emerging on the Web, some of these languages will find their users as well.

Lithuanian language on the Web
With the help of students, I have tried to follow the development of the Internet in Lithuania from the linguistic point of view over a longer period.

First of all, there are some figures about the Internet usage in Lithuania.

Country      Hosts (total)   Hosts per 10,000 inhabitants   Users (thousands)   Users per 10,000 inhabitants
Lithuania    66,373          203.79                         695.7               2,136.01
(Source: ITU 2003)
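The absolute and per-10,000 figures above can be cross-checked against each other, since each pair implies a population size. A minimal Python sketch follows; it assumes that "Users (k)" means thousands of users, and the function name `implied_population` is a hypothetical helper, not part of the ITU data.

```python
def implied_population(count: float, per_10k: float) -> float:
    """Population implied by an absolute count and its rate per 10,000 inhabitants."""
    return count / per_10k * 10_000


# Figures for Lithuania as given in the table (ITU 2003).
from_hosts = implied_population(66_373, 203.79)
from_users = implied_population(695_700, 2136.01)  # 695.7 thousand users

print(f"Implied by hosts: {from_hosts:,.0f}")
print(f"Implied by users: {from_users:,.0f}")
```

Both calculations point to a population of roughly 3.26 million, which suggests that the host and user columns were computed against the same population base.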

Internet usage depends greatly on the available infrastructure, and according to the eEurope data on ICT infrastructure, Internet access and use in different countries, Lithuania belongs to the group with a great need for development. However, a recent survey of households in Lithuania indicates that 47% of people between 15 and 74 years of age, and 84% of those between 15 and 24, use the Internet at least once a week. 73% of all users were seeking local magazines and newspapers, and almost half were looking for information about products and services (Statistikos Departamentas 2003). This means that the number of inhabitants seeking information on the Internet in the Lithuanian language must be quite large.

In 1997, L. Rudokaite carried out a survey of existing Lithuanian homepages (for a Master's thesis). It led to the conclusion that most of the Web sites and homepages were bilingual (Lithuanian and English). Moreover, there was a tendency to start the creation of homepages from the English version. There were no Web-sites of any kind related to the Lithuanian language.

The repeated small-scale survey of Lithuanian Web sites and homepages in 2000 found that most of those belonging to central institutions and organisations had an English version. However, there were many new categories that had only Lithuanian versions. Most of the pages created in the provinces were only in Lithuanian; over 100 e-magazines and newspaper versions were only in Lithuanian, and only in some instances were they in another language (Russian, Polish, English, Jewish, etc.). Two thirds of e-commerce pages were only in Lithuanian, a few supported several languages on the same page, some had versions in several other languages (mainly English, Russian, German), and a minority had only Lithuanian and English versions. I did not come across one entirely in English. On the other hand, I found pages composed only in specific dialects, like literature items in Samogitian (<http://samogitia.mch.mii.lt/LITERATURA/proza.lt.htm>) (Maceviciute 2002).

In September 2004 a survey of the Web-pages of the Lithuanian Internet (domain .lt) was carried out, aiming to find out which languages are most popular.

The register of Lithuanian Web-sites of Lithuania On Line <http://www.online.lt/index.html.en> was used for this purpose because it registers the largest amount of Lithuanian Web material in various categories. It links to almost 13,000 various Web-sites (governmental, public organisations, businesses, education, leisure, music, periodicals, databases, etc.). Out of these, 298 Web-sites were selected randomly. The Web-sites were visited over two days, and data about the languages used on them was collected. The amount of text in different language versions of the same site was compared, and some characteristics of language use were noted. 9.7% of the sites from the directory were not found or were under construction at the time of the visit.

The survey has shown that 47.3% of the visited Web-sites use only one language; 42.2% are created only in Lithuanian. 30.5% of sites use two languages, usually Lithuanian and English (26.8%). 7.3% of sites are created in three languages, and the most popular combination is Lithuanian, English, and Russian (6.4%). 4% of sites use more than three languages (from 4 to 22).

Lithuanian language is used in 95% of all sites, English in 45.9%, Russian in 9.7%, Latvian and German in 1.7%, Polish and Estonian in 1.3% of visited sites. French, Swedish, Danish, Finnish, and Japanese are also used in the multilingual sites. Two sites used 20 and 22 languages. One of them was an e-commerce site (nomatica.lt), the other belonged to an organisation related to the Council of EU and used all official EU languages.

The following table shows the use of different languages in the sites of various organisations:

According to the survey, the creators of Web-sites in Lithuania use 29 languages. It is evident that the Lithuanian language dominates in the surveyed sites. In most cases when more than one language is used to create a site, the second or one of the additional languages is English. This pattern is common to the sites of all types of organisations. There are still 4.3% of sites created only in English, mainly to attract the attention of a foreign audience (e.g., to the work of an artist on a personal site, or to rent out a summer house). In general, the sites using one language (Lithuanian, Russian, or Polish) can be divided into several main categories: periodicals, e-commerce, sites created for the internal use of organisations, sites directed only at a regional population (e.g., the Kaunas territorial health security office, sports facilities, public libraries), and sites created by small organisations (e.g., schools, clubs, SMEs). Most of the sites with more than four languages are created by companies for the same reason, that is, to reach a foreign audience. Within the chosen sample, the Web-sites of hotels used the largest number of languages (6, 8, and 9). We also found a site with graphics but no text at all, and a site transmitting music without any comments or text.

The usage of parallel languages on the Web-sites differs. Less than half of the surveyed sites provide equivalent versions of the pages in all languages. More often the Lithuanian version is more extensive than the version in English or subsequent languages. Sometimes the English and other versions are only short introductions to the organisation, or a summary of the full version. In some cases parallel text in Lithuanian and English is used on the same page. We did not evaluate the quality of the language, but even superficial browsing reveals that on at least half of the sites mistakes and incorrect usage occur in every language (including Lithuanian).

The other part of the survey aimed to retrieve the Web-sites related to the Lithuanian language. We found pages describing the origins and history of the Lithuanian language, occasional pages of language-related governmental programmes (e.g., The Lithuanian Word at <http://www.spaudos.lt/index.htm>), and pages created by teachers.

The Web-sites of language-related institutions (e.g., the National Language Commission, the State Language Inspection, the Institute of Lithuanian Language, etc.) provide information about the Lithuanian language and access to various language databases (e.g., to the Archive of Lithuanian dialects at <http://www.mch.mii.lt/dba/index.htm>). One can also access some linguistic periodicals on the Web, but mainly the contents pages or abstracts, not full texts (e.g., Baltistica at <http://www.lzua.lt/eperiodika/sience.htm>).

In the year 2000, we could not find a decent dictionary of the Lithuanian language, whether monolingual or bilingual. At present, the selection of high-quality online dictionaries is quite impressive: the Dictionary of Modern Lithuanian Language, Lithuanian-English, Lithuanian-Polish, and Lithuanian-German-English dictionaries, and short dictionaries for other languages, including a Lithuanian-Japanese online dictionary. There are also specialised dictionaries (e.g., of computer terminology, finance, etc.).

To some extent one can also treat the sites of Lithuanian radio stations as language resources – they provide a possibility to listen to Lithuanian speech. However, the material supporting learning of Lithuanian as a second language is poor.

“Multilingualism on the Web is the logical and natural consequence of the diversity of human populations”, as the persons interviewed by Marie Lebert state (Lebert 1999). However, we should not depend on natural consequences, as there are many factors influencing developments on the net. Ensuring multilingualism demands great resources and investments, far greater than businesses may be inclined to put in and enthusiasts can afford. The development and sustaining of special language policies at international and national levels is necessary to achieve equality of languages and citizens in an electronic environment.

The development of the Lithuanian online language resources was ensured by the recent governmental programmes, like The Lithuanian Word or the Commemoration of the 450th Anniversary of the First Lithuanian Book. The resources of UNESCO and EU were also invested into development of some major databases for this field.

The language variety has to be supported, but it is also driven by the needs of organisations and businesses to address the target audiences in their own language. The strategies of language use for creating Web-sites reveal this trend unambiguously.


Dr. Elena Maceviciute
Associate Professor, Faculty of Communication, Vilnius University,
Vilnius, Lithuania