10 commits

Author SHA1 Message Date
Jean Chalard
7ec72b80ed Update dictionaries
Full diff too long: truncated

Summary diff
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366277083 <=> 1366957492
  version : 31 <=> 32
Contents :
  - Reinstate 2- and 3- letter words that were demoted to avoid
    bad space insertion (343 entries)
  - Add missing words as per b/6341908 and b/5674314
    (98 entries)

This has zero effect on the regression tests

Bug: 6341908
Bug: 5674314
Change-Id: Ifce268a7eab5edd264d963489187e975017f8b72
2013-04-26 15:56:54 +09:00
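The `date` values in the headers look like Unix epoch seconds; that is an assumption, but the values line up closely with the commit timestamps. A minimal sketch for converting them to readable UTC times (the helper name `header_date_to_utc` is hypothetical, not part of any dictionary tool):

```python
from datetime import datetime, timezone


def header_date_to_utc(epoch_seconds: int) -> str:
    """Render a header `date` value as an ISO 8601 UTC timestamp.

    Assumes the value counts seconds since 1970-01-01T00:00:00Z.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# The "new" date of ru_wordlist.combined.gz in commit 7ec72b80ed.
print(header_date_to_utc(1366957492))
```

Under that assumption, the new header date falls about half an hour before the commit time of 15:56:54 +09:00.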
Jean Chalard
9cf468646f Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1366021966 <=> 1366272052
Body :
Added: yt 0

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1366021978 <=> 1366272093
Body :
Added: yt 0

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1366021987 <=> 1366272977
Body :
Added: yt 0

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1366003217 <=> 1366272255
Body :
Freq changed: cash 80 -> 20

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366003693 <=> 1366277083
Body :
Deleted: толщ 76

>>> java/res/raw/main_en.dict
Header :
  date : 1366021987 <=> 1366272977
Body :
Added: yt 0

>>> java/res/raw/main_fr.dict
Header :
  date : 1366003217 <=> 1366272255
Body :
Freq changed: cash 80 -> 20

>>> java/res/raw/main_ru.dict
Header :
  date : 1366003693 <=> 1366277083
Body :
Deleted: толщ 76

Bug: 8635822
Change-Id: I44dc73bd010b125c994387894847a008276d69f7
2013-04-18 18:41:19 +09:00
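The `Body :` lines in these summary diffs follow a regular shape (`Added: word freq`, `Deleted: word freq`, `Freq changed: word old -> new`), so they can be parsed mechanically. The sketch below is only an illustration inferred from the lines shown in this log; the `Change` type and `parse_body_line` function are hypothetical names, and other line kinds such as `Not a word:` or `Shortcut added:` are deliberately left unhandled.

```python
import re
from typing import NamedTuple, Optional


class Change(NamedTuple):
    kind: str                 # "added", "deleted", or "freq_changed"
    word: str
    old_freq: Optional[int]   # None for added words
    new_freq: Optional[int]   # None for deleted words


_ADDED = re.compile(r"^Added: (\S+) (\d+)$")
_DELETED = re.compile(r"^Deleted: (\S+) (\d+)$")
_FREQ = re.compile(r"^Freq changed: (\S+) (\d+) -> (\d+)$")


def parse_body_line(line: str) -> Optional[Change]:
    """Parse one summary-diff body line; return None for unrecognized lines."""
    line = line.strip()
    m = _ADDED.match(line)
    if m:
        return Change("added", m.group(1), None, int(m.group(2)))
    m = _DELETED.match(line)
    if m:
        return Change("deleted", m.group(1), int(m.group(2)), None)
    m = _FREQ.match(line)
    if m:
        return Change("freq_changed", m.group(1), int(m.group(2)), int(m.group(3)))
    return None


for text in ["Added: yt 0", "Freq changed: cash 80 -> 20", "Deleted: толщ 76"]:
    print(parse_body_line(text))
```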
Jean Chalard
da175bdcb1 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1355802832 <=> 1366003032
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 72
Added: mm 135

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1355112451 <=> 1366003070
  version : 28 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1355802851 <=> 1366003861
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1357617878 <=> 1366003217
  version : 29 <=> 31
Body :
Not a word: re false -> true
Shortcut added: re le 15

>>> dictionaries/nb_wordlist.combined.gz
Header :
  date : 1355802836 <=> 1366003450
  version : 29 <=> 31
Body :
Freq changed: iPhone 91 -> 30
Added: app 30

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1358763720 <=> 1366003693
  version : 30 <=> 31
Body :
Freq changed: за 140 -> 181
Freq changed: не 140 -> 191
Freq changed: про 131 -> 151
Freq changed: эры 125 -> 140

>>> dictionaries/sv_wordlist.combined.gz
Header :
  date : 1355802856 <=> 1366003804
  version : 29 <=> 31
Body :
Added: vi 180

>>> java/res/raw/main_en.dict
Header :
  date : 1355802851 <=> 1366003861
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> java/res/raw/main_fr.dict
Header :
  date : 1357617878 <=> 1366003217
  version : 29 <=> 31
Body :
Not a word: re false -> true
Shortcut added: re le 15

>>> java/res/raw/main_ru.dict
Header :
  date : 1358763720 <=> 1366003693
  version : 30 <=> 31
Body :
Freq changed: за 140 -> 181
Freq changed: не 140 -> 191
Freq changed: про 131 -> 151
Freq changed: эры 125 -> 140

Bug: 8560415
Bug: 7556679
Change-Id: If1c628edcb1cc5efd67e1715acf94f19c0eb4643
2013-04-15 14:51:02 +09:00
Jean Chalard
be94d212e8 Update the Russian dictionary
The point is to get as close as possible to having the
golden Russian tests pass.

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355818916 <=> 1358763720
  version : 29 <=> 30
Body :
Deleted: НКТ 14
Freq changed: без 0 -> 140
Freq changed: бонус 94 -> 130
Freq changed: за 0 -> 140
Freq changed: на 0 -> 180
Freq changed: не 0 -> 140
Freq changed: парка 133 -> 110
Freq changed: про 0 -> 131
Freq changed: ручьи 93 -> 80
Freq changed: ура 86 -> 100
Freq changed: юрты 86 -> 60
Added: вечерком 100
Added: задачки 100
Added: сорри 100
Added: узнай 100
Added: учти 100

>>> java/res/raw/main_ru.dict
Same changes as above

Change-Id: I8685c34d9ab1dcbf8ae8e23d2e26380059684c95
2013-01-21 19:30:17 +09:00
Jean Chalard
cd89c5d6ed Update dictionaries
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

>>> java/res/raw/main_ru.dict
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

Change-Id: I03f0f4e8d03e0f77f5879e6dd5c424673466afca
2012-12-18 17:25:37 +09:00
Jean Chalard
21dbe3701c Update dictionaries
cs, da, de, el, es, fi, fr, hr, it, lt, lv, nb, nl, pl,
pt_BR, pt_PT, sl, sr, sv, tr : rescale frequencies to match
the spec. This has no large effect in practice, except that the
dictionary becomes stronger relative to the spatial model
(especially for lower-count corpora such as lt, lv, sr)
en* : Small changes (essentially rounding going the other way)
ru : the above rescaling, and remove the following words:
Дре, ОСТа, Планше, легкими, легком, легкому, легкости,
легкую, нелегкие, нелегкий, нелегким, нелегкое, нелегкой,
нелегкую, полулегком and add нелёгкие, нелёгкое, нелёгкую;
other accented forms were already in the dictionary.

Change-Id: I40386c2ebd4d2be38874e822bde89db7cb512ae6
2012-12-18 13:06:48 +09:00
Jean Chalard
bd793ed50d Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1353500789 <=> 1354870724
Body :
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1351675958 <=> 1354870736
  version : 26 <=> 27
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

>>> java/res/raw/main_en.dict
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> java/res/raw/main_fr.dict
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> java/res/raw/main_ru.dict
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

Change-Id: I6f2d1c359d716535923b22c33d7fa4c3b0a330e4
2012-12-07 18:52:21 +09:00
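The "Demote all CAPS words by 80" step in the Russian dictionary can be pictured as a simple pass over a word-to-frequency map. This is a sketch only: the function name `demote_all_caps` is hypothetical, the in-memory dict stands in for the real word list format, and clamping at 0 is an assumption rather than something the commit states.

```python
from typing import Dict


def demote_all_caps(freqs: Dict[str, int], amount: int = 80) -> Dict[str, int]:
    """Lower the frequency of every all-uppercase word by `amount`.

    Frequencies are clamped at 0 (an assumption). Mixed-case and
    lowercase words are returned unchanged.
    """
    return {
        word: max(0, freq - amount) if word.isupper() else freq
        for word, freq in freqs.items()
    }


# Illustrative values only, not taken from the actual dictionary.
sample = {"РФ": 145, "ЕС": 44, "модно": 51}
print(demote_all_caps(sample))
```

`str.isupper()` handles Cyrillic, so a mixed-case entry such as "дБ" would not be treated as all caps here.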
Jean Chalard
b40a1ce50b Update RU dictionary header.
>>> dictionaries/ru_wordlist.combined.gz
>>> java/res/raw/main_ru.dict
Header :
  date : 1353500945 <=> 1353567943
  MULTIPLE_WORDS_DEMOTION_RATE : null <=> 0
Body :
  No differences

Bug: 7540132
Change-Id: I837831b1e214da64962cf1bb68c840a3d4e6bf76
2012-11-22 16:21:10 +09:00
Jean Chalard
d5f53710c5 Update dictionaries and fix mistakes
- Combined de dict :
  Remove digraph shortcuts that were included by mistake.
- Combined en dict :
  Set freq of "baton" "batons" "mace" "puff"
  "puffs" and "tasers" to zero. They are offensive
  in en_GB.
- Combined en_GB dict :
  Change freq of "il" to 0 and flag it "not a word". Still
  in the dict as a whitelist entry for "I'll"; for some
  reason it had freq 99.
  Add "milk:122" and "practice:143"
- Combined fr dict :
  Add missing words : "Nostradamus:40" "défendais:30"
  "gmail:50" "générale:140" "hm:0" "hmm:0" "y'en:130"
  "l'apocalypse:31" "m'épuise:30" "recontacter:80"
  "t'annonce:30"
  Set freq of non-word shortcuts for digraphs to 1 instead
  of 0, so that they can be entered by gesture.
- Combined ru dict :
  Remove a lot of two-character non-words.

- Binary de dict :
  Remove the obsolete "options" header, and add the "dictionary"
  header.
- Binary en dict :
  Flag "hoe" "hoes" "il" "shel" as non-words.
  Also drop freq of "il" and "shel" to 0
  Add the "locale" header that was missing.
- Binary es dict :
  Add the "dictionary" header.
- Binary fr dict :
  Add the same words as above. Non-word shortcuts were already
  set to 1.
- Binary it dict :
  Add a "dictionary" header. Also change freq of
  "Šarapova" from 50 to 37; not sure why it was 50.
- Binary pt_BR dict :
  Add a "dictionary" header.
- Binary ru dict :
  Add a "dictionary" header and remove the same words as above.

For all dictionaries : bump the version to 27.

Change-Id: I94fe7f8f42b31fdad223085c00a94115e14d2276
2012-11-21 22:03:24 +09:00
Jean Chalard
d0cf96493c Use all Lexiteria sources and update existing dictionaries.
New dictionaries :
- Danish
- Greek
- Finnish
- Lithuanian
- Latvian
- Dutch
- Polish
- Russian
- Slovene
- Serbian
- Swedish
- Turkish

Also, compress those files to reduce the footprint in the
repository.
Also, update and improve English and French dictionaries, and
add the ligatures shortcut into the French dictionary.
Finally, move the Russian binary dictionary here now that it
can at last be open sourced.

Bug: 5587752
Bug: 6775251
Bug: 6995793
Bug: 7149666
Change-Id: Iec9831d4dce425a2b5b0657571e4448436610525
2012-09-21 22:07:23 +09:00