INDICATORS OF LANGUAGES IN THE INTERNET
SYNTHESIS OF RESULTS FROM VERSION 3, March 2022
Version 1: 2017, covering 130 languages with more than 5 million L1 speakers
Version 2: 2021, covering 329 languages with more than 1 million L1 speakers, with significant bias reduction
Version 3: March 2022, with comprehensive bias reduction and redefinition of some outputs
More than a new version, this release marks the maturity of the method:
all the biases are now controlled to within an acceptable threshold,
and the indicators produced are reliable within a
confidence interval of -20% to +20%.
What do the results tell us?
SYNTHESIS
Access a short PDF document describing the linguistic resource obtained
(English version presented at LREC2022 in the SIGUL2022 workshop)
• STUDIES CONDUCTED BY
• This is an indirect approximation of the space of languages on the Internet, using different data sources and statistical techniques.
• All computations and results are based on L1+L2, where L1 is the mother tongue and L2 the second language(s).
• Following our main demo-linguistic source (Ethnologue #24), the world population (L1) and the population of L1+L2 speakers are:
L1 = 7 231 699 136    L1+L2 = 10 361 716 756    (L1+L2)/L1 = 1.4328
• The confidence interval of all the figures produced is estimated to be within the window
-20% to +20% around each value V.
Read the results below accordingly: the % of Web contents in English is higher than 16% and lower than 24%,
and the % of contents for the rest of languages is between 18% and 26%.
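As a worked illustration of the two computations in the notes above, here is a minimal Python sketch. It assumes the window is relative (20% of the published value on each side), which matches the English example, and it assumes the second population figure is the combined L1+L2 total, which is what yields the stated ratio of 1.4328. The function name `confidence_window` is hypothetical and not part of the original study.

```python
def confidence_window(value_pct, rel_margin=0.20):
    """Return (low, high) bounds for a percentage.

    Assumption: the margin is relative to the value itself,
    i.e. the window is value * (1 - 0.20) to value * (1 + 0.20).
    """
    return value_pct * (1 - rel_margin), value_pct * (1 + rel_margin)


# Share of Web contents in English, taken from the results table.
english_contents = 19.60
low, high = confidence_window(english_contents)
print(f"English contents: {low:.2f}% to {high:.2f}%")

# Population figures quoted from Ethnologue #24 in the notes.
# Assumption: the second figure is the L1+L2 total, not L2 alone.
l1_speakers = 7_231_699_136
l1_l2_speakers = 10_361_716_756
ratio = l1_l2_speakers / l1_speakers
print(f"(L1+L2)/L1 = {ratio:.4f}")
```

The bounds for English come out just under 16% and 24%, in line with the rounded figures quoted in the note.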
ALL INDICATORS FOR THE 30 LANGUAGES WITH THE HIGHEST CONTENT PERCENTAGE
| RANK (CONTENTS L1+L2) | ISO | LANGUAGES | % INTERNAUTS L1+L2 | % WORLD POP. L1+L2 | % CONN. Speakers | % CONTENTS L1+L2 | VIRT.PRES. L1+L2 | C.PROD. L1+L2 |
| 1 | zho | Chinese Macro | 18.46% | 14.72% | 71.38% | 21.60% | 1.47 | 1.17 |
| 2 | eng | English | 14.83% | 13.01% | 64.86% | 19.60% | 1.51 | 1.32 |
| 3 | spa | Spanish | 6.79% | 5.24% | 73.72% | 7.85% | 1.50 | 1.16 |
| 4 | hin | Hindi | 4.19% | 5.80% | 41.16% | 3.76% | 0.65 | 0.90 |
| 5 | rus | Russian | 3.51% | 2.49% | 80.32% | 3.76% | 1.51 | 1.07 |
| 6 | fra | French | 2.98% | 2.58% | 65.80% | 3.33% | 1.29 | 1.12 |
| 7 | por | Portuguese | 2.99% | 2.49% | 68.43% | 3.13% | 1.26 | 1.05 |
| 8 | ara | Arabic Macro | 3.97% | 3.53% | 63.99% | 3.09% | 0.87 | 0.78 |
| 9 | jpn | Japanese | 1.99% | 1.22% | 92.63% | 2.66% | 2.18 | 1.34 |
| 10 | deu | German, Standard | 2.04% | 1.30% | 89.17% | 2.37% | 1.82 | 1.16 |
| 11 | msa | Malay Macro | 2.36% | 2.36% | 56.93% | 1.96% | 0.83 | 0.83 |
| 12 | tur | Turkish | 1.17% | 0.85% | 78.05% | 1.14% | 1.35 | 0.98 |
| 13 | ita | Italian | 0.87% | 0.66% | 75.83% | 1.00% | 1.53 | 1.14 |
| 14 | kor | Korean | 0.90% | 0.79% | 65.16% | 0.98% | 1.24 | 1.09 |
| 15 | fas | Persian Macro | 1.08% | 0.81% | 75.91% | 0.88% | 1.09 | 0.82 |
| 16 | ben | Bengali | 1.11% | 2.58% | 24.55% | 0.88% | 0.34 | 0.79 |
| 17 | vie | Vietnamese | 0.92% | 0.74% | 70.96% | 0.85% | 1.15 | 0.92 |
| 18 | urd | Urdu | 0.95% | 2.22% | 24.38% | 0.66% | 0.30 | 0.70 |
| 19 | tha | Thai | 0.80% | 0.59% | 77.95% | 0.65% | 1.12 | 0.82 |
| 20 | pol | Polish | 0.60% | 0.39% | 87.09% | 0.63% | 1.59 | 1.04 |
| 21 | mar | Marathi | 0.69% | 0.96% | 41.06% | 0.58% | 0.60 | 0.83 |
| 22 | tel | Telugu | 0.68% | 0.92% | 41.69% | 0.56% | 0.60 | 0.82 |
| 23 | tam | Tamil | 0.61% | 0.82% | 42.15% | 0.51% | 0.62 | 0.83 |
| 24 | jav | Javanese | 0.62% | 0.66% | 53.76% | 0.44% | 0.66 | 0.70 |
| 25 | nld | Dutch | 0.38% | 0.24% | 91.14% | 0.41% | 1.73 | 1.08 |
| 26 | guj | Gujarati | 0.44% | 0.60% | 41.47% | 0.36% | 0.61 | 0.83 |
| 27 | ukr | Ukrainian | 0.40% | 0.32% | 71.02% | 0.35% | 1.09 | 0.88 |
| 28 | kan | Kannada | 0.41% | 0.57% | 41.11% | 0.33% | 0.59 | 0.82 |
| 29 | ron | Romanian | 0.32% | 0.23% | 79.57% | 0.30% | 1.29 | 0.93 |
| 30 | aze | Azerbaijani Macro | 0.33% | 0.23% | 81.54% | 0.28% | 1.21 | 0.85 |
| | | REMAIN | 22.60% | 30.10% | | 15.13% | | |
| | | TOTAL | 100.00% | 100.00% | | 100.00% | | |
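The VIRT.PRES. and C.PROD. columns hold ratios rather than percentages. Their definitions are not spelled out in this synthesis, so the sketch below rests on an assumption: that VIRT.PRES. is the contents share divided by the world population share, and that C.PROD. is the contents share divided by the internauts share. Under that reading, the published values for the three rows checked here are reproduced exactly. The function name `derived_ratios` is hypothetical.

```python
# (contents %, world population %, internauts %), copied from the table.
rows = {
    "zho": (21.60, 14.72, 18.46),
    "eng": (19.60, 13.01, 14.83),
    "hin": (3.76, 5.80, 4.19),
}


def derived_ratios(contents, population, internauts):
    """Return (VIRT.PRES., C.PROD.) under the assumed definitions.

    Assumption: both indicators are simple ratios of the percentage
    columns; the methodological note remains the authoritative source.
    """
    virtual_presence = contents / population
    content_productivity = contents / internauts
    return round(virtual_presence, 2), round(content_productivity, 2)


for iso, values in rows.items():
    print(iso, derived_ratios(*values))
```

Read this way, a value above 1 means a language holds a larger share of Web contents than its share of speakers (VIRT.PRES.) or of connected speakers (C.PROD.), and a value below 1 means the opposite.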
See the results for the top languages by category of indicator
Read the basic methodological note
Check the comparison with other similar data (W3Techs and InternetWorldStats)
Access the full results for all 329 languages by downloading corresponding Excel files