Annex 3
Analysis of the quotations per language
The exercise consists of checking how the quotations of a set of individuals are distributed across languages, using the AltaVista search engine (within the limits set by the errors identified in its language-recognition algorithm; see the L3 study). We divided the individuals according to their "indication of totality", that is, the fame they have acquired beyond their linguistic borders. Three groups were formed: individuals with a high indication (such as Victor Hugo), an average indication (such as George Sand), and a low indication (such as Georges Brassens). An additional category was added for individuals assimilated into United States culture (such as Jacques Cousteau). A first table gives a detailed distribution across the group of Latin languages, English and German. The other tables give a distribution across English, the individual's native Latin language (where applicable) and other languages.
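As an illustration of how the second kind of table can be derived, here is a minimal Python sketch. It assumes that per-language quotation counts are already available for each individual; the function name `three_way_distribution` and the sample counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def three_way_distribution(hits: Dict[str, int], native: Optional[str]) -> Dict[str, float]:
    """Split quotation counts into English, native language and other languages (in %).

    Hypothetical helper: `hits` maps a language code (e.g. "EN", "FR") to a number
    of quotations; `native` is the individual's native language code, or None.
    """
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no quotations counted")
    english = hits.get("EN", 0)
    # A separate native-language bucket only makes sense for non-English speakers.
    native_count = hits.get(native, 0) if native is not None and native != "EN" else 0
    other = total - english - native_count
    return {
        "EN": 100.0 * english / total,
        "native": 100.0 * native_count / total,
        "other": 100.0 * other / total,
    }


# Made-up counts for a French-speaking individual.
print(three_way_distribution({"EN": 60, "FR": 30, "GE": 10}, "FR"))
# {'EN': 60.0, 'native': 30.0, 'other': 10.0}
```

For English-speaking individuals the native bucket is left empty, so everything that is not English falls under "other".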
First, it should be stressed that the errors reported in the L3 study still appear: in some cases (marked in small red italics), the total count AltaVista returns for all languages is significantly lower than the sum of the per-language counts, so these results are of little relevance.
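The consistency problem described above can be expressed as a simple check. The following Python sketch is only an illustration: the study does not state a numeric threshold for "significantly lower", so the `tolerance` parameter (10% here) and the function name `is_inconsistent` are hypothetical.

```python
from typing import Dict


def is_inconsistent(total_hits: int, hits_per_language: Dict[str, int], tolerance: float = 0.1) -> bool:
    """Return True when the all-languages total is well below the per-language sum.

    `tolerance` is a hypothetical threshold: 0.1 flags totals more than 10%
    below the sum of the per-language counts.
    """
    per_language_sum = sum(hits_per_language.values())
    return total_hits < (1.0 - tolerance) * per_language_sum


# Made-up counts: a total of 1000 against a per-language sum of 1200.
print(is_inconsistent(1000, {"EN": 800, "FR": 400}))  # True
print(is_inconsistent(1190, {"EN": 800, "FR": 400}))  # False
```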
The results for Sigmund Freud and Napoleon Bonaparte are the closest to a distribution without cultural preference (with a logical "plus" for French and Italian in Napoleon's case, and for German in Freud's case).
What does this table tell us?
The percentage of English varies between 40% and 75% for non-English-speaking individuals.
The percentage of English varies between 85% and 87% for English-speaking individuals.
To interpret the results in relative terms, recall the share of web pages in each language that we calculated for the linguistic part of the study:
| Language | EN | FR | SP | PO | IT | RU | GE | OTHERS |
|---|---|---|---|---|---|---|---|---|
| % of web pages | 75% | 2.8% | 2.5% | 0.8% | 1.5% | 0.15% | 4.2% | 13% |
Using these figures, a weighting function ranging from -100 to 100 can be built to indicate the absolute "gain" or "loss" of an individual in a given language. The function equals 0 when the score corresponds exactly to the mean.
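The study does not give the exact formula of this weighting function. One possible construction, offered here only as a hypothetical sketch, compares an individual's observed percentage in a language with that language's share of web pages from the table above: the excess is scaled against the largest possible excess (reaching +100 when every quotation is in that language), and the shortfall against the largest possible shortfall (reaching -100 when there are no quotations in that language). The names `WEB_SHARE` and `weight` are illustrative.

```python
# Share of web pages per language (%), copied from the table above.
WEB_SHARE = {
    "EN": 75.0, "FR": 2.8, "SP": 2.5, "PO": 0.8,
    "IT": 1.5, "RU": 0.15, "GE": 4.2, "OTHERS": 13.0,
}


def weight(observed_pct: float, expected_pct: float) -> float:
    """Hypothetical gain/loss score in [-100, 100]; 0 when observed equals expected."""
    if not 0.0 <= observed_pct <= 100.0:
        raise ValueError("observed_pct must be between 0 and 100")
    if observed_pct == expected_pct:
        return 0.0
    if observed_pct > expected_pct:
        # Gain: excess over the web share, relative to the largest possible excess.
        return 100.0 * (observed_pct - expected_pct) / (100.0 - expected_pct)
    # Loss: shortfall below the web share, relative to the largest possible shortfall.
    return 100.0 * (observed_pct - expected_pct) / expected_pct


# English percentages at the edges of the ranges quoted earlier.
for share in (40.0, 75.0, 87.0):
    print(share, round(weight(share, WEB_SHARE["EN"]), 1))
# 40.0 -46.7
# 75.0 0.0
# 87.0 48.0
```

A piecewise scale is used so that the same observed gap weighs more in a language with a small web presence than in English; a symmetric ratio such as `(o - e) / (o + e)` would be an equally plausible alternative.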
As expected, for individuals with a high index of Anglo-Saxon locality, the percentage of English exceeds 90% and sometimes reaches 97%, as for the Nobel laureate Ernest Orlando Lawrence.
It can fall below 15% for locally known French-speaking or Portuguese-speaking individuals. The figure is higher for Hispanic or Italian individuals, probably because of the presence of Spanish and Italian speakers within the United States population.
These results prompted a further question: where do the English-language quotations of Latin individuals come from? Answering it requires reading and tallying every result, so it can only be done for individuals with few quotations.
What does this last analysis reveal?
- a noteworthy percentage of sites from Latin countries, as well as from countries with other languages, written in English;
- a relatively low number of genuinely English-language sites;
- a considerable number of international e-commerce sites that use English as their business language;
- and AltaVista language-recognition errors on the order of 10%.
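Once each result has been read and labelled by hand, the percentages behind these observations come down to a simple tally. A minimal Python sketch follows; the category labels and the sample list are made up for illustration and are not the study's data.

```python
from collections import Counter
from typing import Dict, List


def category_shares(labels: List[str]) -> Dict[str, float]:
    """Percentage of manually labelled results falling in each category."""
    if not labels:
        raise ValueError("no results labelled")
    counts = Counter(labels)
    total = len(labels)
    # most_common() lists categories from most to least frequent.
    return {category: 100.0 * n / total for category, n in counts.most_common()}


# Made-up labels for ten English-language results about one individual.
labels = (
    ["latin_site_in_english"] * 4
    + ["other_country_site_in_english"] * 2
    + ["ecommerce_site"] * 2
    + ["english_language_site"]
    + ["misrecognized"]
)
print(category_shares(labels))
```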