28 Commits (2ed1ec411d8b539890359f8ced8e1fe8d90344cd)

Author SHA1 Message Date
Jean Chalard be94d212e8 Update the Russian dictionary
The point is to get as close as possible to having the
golden Russian tests pass.

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355818916 <=> 1358763720
  version : 29 <=> 30
Body :
Deleted: НКТ 14
Freq changed: без 0 -> 140
Freq changed: бонус 94 -> 130
Freq changed: за 0 -> 140
Freq changed: на 0 -> 180
Freq changed: не 0 -> 140
Freq changed: парка 133 -> 110
Freq changed: про 0 -> 131
Freq changed: ручьи 93 -> 80
Freq changed: ура 86 -> 100
Freq changed: юрты 86 -> 60
Added: вечерком 100
Added: задачки 100
Added: сорри 100
Added: узнай 100
Added: учти 100

>>> java/res/raw/main_ru.dict
Same changes as above

Change-Id: I8685c34d9ab1dcbf8ae8e23d2e26380059684c95
2013-01-21 19:30:17 +09:00
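The body lines in the entry above follow a regular shape (`Deleted: word freq`, `Added: word freq`, `Freq changed: word old -> new`). A minimal sketch of parsing them, assuming that shape holds throughout; `Change` and `parse_diff_line` are hypothetical names used for illustration and are not part of any dictionary tool:

```python
import re
from typing import NamedTuple, Optional


class Change(NamedTuple):
    kind: str            # "added", "deleted" or "changed"
    word: str
    old: Optional[int]   # frequency before the commit, if any
    new: Optional[int]   # frequency after the commit, if any


# Assumption: words contain no whitespace and frequencies are plain integers.
_FREQ = re.compile(r"^Freq changed: (\S+) (\d+) -> (\d+)$")
_ADD_DEL = re.compile(r"^(Added|Deleted): (\S+) (\d+)$")


def parse_diff_line(line: str) -> Optional[Change]:
    """Parse one body line; return None for headers and free-form text."""
    line = line.strip()
    m = _FREQ.match(line)
    if m:
        return Change("changed", m.group(1), int(m.group(2)), int(m.group(3)))
    m = _ADD_DEL.match(line)
    if m:
        word, freq = m.group(2), int(m.group(3))
        if m.group(1) == "Added":
            return Change("added", word, None, freq)
        return Change("deleted", word, freq, None)
    return None
```

Lines such as `Header :` or `Demote all CAPS words by 80` do not match either pattern and come back as `None`, so free-form notes are skipped rather than misread.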
Jean Chalard 84f932be73 Add words to Portuguese
>>> dictionaries/pt_BR_wordlist.combined.gz
Header :
  date : 1355802839 <=> 1357790917
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1355802856 <=> 1357790930
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

>>> java/res/raw/main_pt_br.dict
Header :
  date : 1355802839 <=> 1357790917
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

Bug: 7966948
Change-Id: I71c0986cf616d67926d0a6a0e53099b04b0427d5
2013-01-10 14:14:17 +09:00
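The `date` values in the Header blocks above look like Unix epoch seconds. The log does not say so explicitly; this is an assumption, supported by the fact that the converted values fall shortly before the commit times. A small sketch of the conversion, with `header_date_to_iso` as a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# The commit timestamps in this log use a +09:00 offset.
COMMIT_TZ = timezone(timedelta(hours=9))


def header_date_to_iso(epoch_seconds: int) -> str:
    """Render a header `date` value as ISO 8601 in the +09:00 offset.

    Assumption: the value is seconds since 1970-01-01T00:00:00Z.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=COMMIT_TZ).isoformat()


# New header date of the Portuguese word lists in the commit above.
print(header_date_to_iso(1357790917))  # 2013-01-10T13:08:37+09:00
```

Under that assumption the dictionary was built about an hour before the commit at 14:14:17 +09:00.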
Jean Chalard 420528ed97 Update dictionaries
>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

>>> dictionaries/pl_wordlist.combined.gz
Header :
  date : 1355802847 <=> 1357618222
Body :
Added: żebyście 69
Added: żebyśmy 69

>>> java/res/raw/main_fr.dict
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

Change-Id: I8651a4689bea06d5fe2caead471ef52969c77089
2013-01-08 14:24:22 +09:00
Jean Chalard cd89c5d6ed Update dictionaries
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

>>> java/res/raw/main_ru.dict
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

Change-Id: I03f0f4e8d03e0f77f5879e6dd5c424673466afca
2012-12-18 17:25:37 +09:00
Jean Chalard 21dbe3701c Update dictionaries
cs, da, de, el, es, fi, fr, hr, it, lt, lv, nb, nl, pl,
pt_BR, pt_PT, sl, sr, sv, tr : rescale frequencies to match
spec. This has no large effect in practice, except that the
dictionary becomes stronger vs. the spatial model (especially in
lower-count corpora such as lt, lv, sr).
en* : Small changes (rounding going the other way essentially)
ru : the above rescaling, and remove the following words:
Дре, ОСТа, Планше, легкими, легком, легкому, легкости,
легкую, нелегкие, нелегкий, нелегким, нелегкое, нелегкой,
нелегкую, полулегком and add нелёгкие, нелёгкое, нелёгкую;
other accented forms were already in the dictionary.

Change-Id: I40386c2ebd4d2be38874e822bde89db7cb512ae6
2012-12-18 13:06:48 +09:00
Jean Chalard d080986f93 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1354870724 <=> 1355112440
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1354870736 <=> 1355112451
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1354870744 <=> 1355112460
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1351676002 <=> 1355117676
  version : 26 <=> 28
Body :
Deleted: DoCoMo 40
Added: Docomo 40
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/fi_wordlist.combined.gz
Header :
  date : 1351676054 <=> 1355117691
  version : 26 <=> 28
Body :
Deleted: DoCoMo 28
Added: Docomo 28
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1354872988 <=> 1355117708
  version : 27 <=> 28
Body :
Deleted: DoCoMo 52
Added: Docomo 52
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1351676510 <=> 1355117723
  version : 26 <=> 28
Body :
Deleted: DoCoMo 48
Added: Docomo 48
Added: Softbank 25

>>> java/res/raw/main_en.dict
Header :
  date : 1354870744 <=> 1355112460
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> java/res/raw/main_es.dict
Header :
  date : 1353500806 <=> 1355117676
  version : 27 <=> 28
Body :
Deleted: DoCoMo 40
Added: Docomo 40
Added: KDDI 25
Added: Softbank 25

>>> java/res/raw/main_fr.dict
Header :
  date : 1354872988 <=> 1355117708
  version : 27 <=> 28
Body :
Deleted: DoCoMo 52
Added: Docomo 52
Added: KDDI 25
Added: Softbank 25

Change-Id: I3801cbe4535407f55ede8db327674d493a92d1ae
2012-12-10 14:52:43 +09:00
Jean Chalard bd793ed50d Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1353500789 <=> 1354870724
Body :
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1351675958 <=> 1354870736
  version : 26 <=> 27
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

>>> java/res/raw/main_en.dict
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> java/res/raw/main_fr.dict
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> java/res/raw/main_ru.dict
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

Change-Id: I6f2d1c359d716535923b22c33d7fa4c3b0a330e4
2012-12-07 18:52:21 +09:00
Jean Chalard b40a1ce50b Update RU dictionary header.
>>> dictionaries/ru_wordlist.combined.gz
>>> java/res/raw/main_ru.dict
Header :
  date : 1353500945 <=> 1353567943
  MULTIPLE_WORDS_DEMOTION_RATE : null <=> 0
Body :
  No differences

Bug: 7540132
Change-Id: I837831b1e214da64962cf1bb68c840a3d4e6bf76
2012-11-22 16:21:10 +09:00
Jean Chalard d5f53710c5 Update dictionaries and fix mistakes
- Combined de dict :
  Remove digraph shortcuts that were in by mistake.
- Combined en dict :
  Set freq of "baton" "batons" "mace" "puff"
  "puffs" and "tasers" to zero. They are offensive
  in en_GB.
- Combined en_GB dict :
  Change freq of "il" to 0 and flag it "not a word". Still
  in the dict as a whitelist entry for "I'll"; for some
  reason it had freq 99.
  Add "milk:122" and "practice:143"
- Combined fr dict :
  Add missing words : "Nostradamus:40" "défendais:30"
  "gmail:50" "générale:140" "hm:0" "hmm:0" "y'en:130"
  "l'apocalypse:31" "m'épuise:30" "recontacter:80"
  "t'annonce:30"
  Set freq of non-word shortcuts for digraphs to 1 instead
  of 0, so that they can be entered by gesture.
- Combined ru dict :
  Remove a lot of two-character non-words.

- Binary de dict :
  Remove the obsolete "options" header, and add the "dictionary"
  header.
- Binary en dict :
  Flag "hoe" "hoes" "il" "shel" as non-words.
  Also drop freq of "il" and "shel" to 0
  Add the "locale" header that was missing.
- Binary es dict :
  Add the "dictionary" header.
- Binary fr dict :
  Add the same words as above. Non-word shortcuts were already
  set to 1.
- Binary it dict :
  Add a "dictionary" header. Also change freq of
  "Šarapova" from 50 to 37; not sure why it was 50.
- Binary pt_BR dict :
  Add a "dictionary" header.
- Binary ru dict :
  Add a "dictionary" header and remove the same words as above.

For all dictionaries : bump the version to 27.

Change-Id: I94fe7f8f42b31fdad223085c00a94115e14d2276
2012-11-21 22:03:24 +09:00
Jean Chalard 306e0a800f Update AOSP dictionaries.
Changes :
- Add "emoji"
- Change the whitelist target of "foo" from "for" to "too"
- Fix non-word frequencies to 0
- Fix the freq of common en_US vs en_GB words
- Add "connection" to the en_GB dictionary

Bug: 7368441
Bug: 7370033
Bug: 7371955
Change-Id: Ib22a97e97b486b05012d5496619557f406c441b9
2012-10-24 16:12:28 +09:00
Jean Chalard 3d83a1648b Update AOSP dictionaries.
Differences :
oh 90 -> 105
ooh 54 -> 54
hoy,kinkier,kinkiest,kinkiness,kinkily,kinky -> 0
trst -> remove

New whitelist entries (actually old ones that had not been applied)
"berm" -> "been"
"foe" -> "for"
"hid" -> "his"
"thong" -> "thing"

French :
Add "six" and remove some non-words

Bug: 7329149
Bug: 7356297
Change-Id: I55092f0538db8627148b0a314e50eff926c47275
2012-10-18 00:39:16 +09:00
Jean Chalard a44942810d Update the AOSP dictionaries for the 0-freq review
Bug: 7227265
Change-Id: I384f7d76cef67b96b106ddac96e4baf1fa32afd4
2012-10-03 21:15:27 +09:00
Jean Chalard d0cf96493c Use all Lexiteria sources and update existing directories.
New dictionaries :
- Danish
- Greek
- Finnish
- Lithuanian
- Latvian
- Dutch
- Polish
- Russian
- Slovene
- Serbian
- Swedish
- Turkish

Also, compress those files to reduce the footprint in the
repository.
Also, update and improve English and French dictionaries, and
add the ligatures shortcut into the French dictionary.
Finally, move the Russian binary dictionary here now that it
can at last be open sourced.

Bug: 5587752
Bug: 6775251
Bug: 6995793
Bug: 7149666
Change-Id: Iec9831d4dce425a2b5b0657571e4448436610525
2012-09-21 22:07:23 +09:00
Jean Chalard 6f7b1ff468 Update dictionaries.
- English : some words caught through regression tests
- English : some words externally reported
- French : some words externally reported
- French : finished review of all accented words

Bug: 6726969
Bug: 6730031
Change-Id: I37d0dc310db2c79e03ac7ad452391e92d9b13357
2012-06-29 19:30:01 +09:00
Jean Chalard 401e70535e Make sure whitelist targets are in the main dictionary
Bug: 6680976
Change-Id: Ieddb5eecb813da3a8a515930568e356bc3526386
2012-06-19 02:08:57 +09:00
Jean Chalard 79451e0a70 Update dictionaries.
- English dict scrubbed for distractors
- EN, FR, IT, DE include improvements from user feedback

Bug: 6394369
Change-Id: I9af5415d0b6a5edfea2956657b0fee7906ebb344
2012-06-16 04:25:43 +09:00
Jean Chalard 51fb65569a Improvements to the English dicts
Bug: 6394369
Change-Id: I7a4747386adef44e6d1a0c9fec52d09611f1ce10
2012-05-31 18:46:22 +09:00
Jean Chalard 3a6efa06e2 Small update to the English dictionaries
Demote 'HDTV'

Bug: 6563090
Change-Id: I39a1632397569cf79a8d67d93cdff5cf29f82f3a
2012-05-28 13:01:59 +09:00
Jean Chalard 383f4d6a69 Fix the name of the resource to lower case
Change-Id: Icbacf10702de20ef1a60d2648ee6440812d13f1d
2012-05-25 15:27:58 +09:00
Jean Chalard 1b3db401bc Add the dictionary for Portuguese to the apk
This adds about 1MB to the system image, but Ibae3cd55
has been committed to make up for it. Taking both into account,
we are still adding 23kB to the build.

Bug: 6558327
Change-Id: Iae066d39a193a0a380d2872a35661920dd5cea54
2012-05-25 14:59:04 +09:00
Jean Chalard b2acdba809 Remove non-words from the French dictionary.
Change-Id: I98c546818aa456a534e833495deb670e79df4104
2012-05-24 17:16:41 +09:00
Jean Chalard 80058c73cb Update AOSP dictionaries
Change-Id: Ia6bb1f9d6df4a9f859f132affc9cb030f14effd9
2012-05-22 16:12:50 +09:00
Jean Chalard 624150b11b Update dictionaries.
Bug: 6517432
Bug: 6525702
Change-Id: I47a8c4612bffb16971575b59e9e20fd0276a2f92
2012-05-22 11:29:33 +09:00
Jean Chalard 1fc0c71fad Update French/English dictionaries to the latest version
Change-Id: I9c98280f900914d1af22b47019ebc0ad5ab175de
2012-05-18 18:54:37 +09:00
Jean Chalard 8fec807800 Add open-source-able word lists to AOSP.
Bug: 6458744
Change-Id: If28aeb7360ee7ec7408f55934ca2a684f032e338
2012-05-17 19:20:04 +09:00
Tadashi G. Takaoka a645d88228 Remove unused resources
Bug: 4436327
Change-Id: I2573786aac5fd8d543cf12d24c951b67c7353fd7
2011-05-16 16:22:39 +09:00
Tadashi G. Takaoka fa086c9076 Cleanup unused Java import
This change also fixes wrong file mode.

Change-Id: Ifcf4c9444ddcdc62d2e4b394891d6eee135c1e8f
2010-11-29 17:57:48 +09:00
Amith Yamasani 07b1603a3f Don't let the native code target be included twice when unbundling.
Move java code to a different directory so that the unbundled
version doesn't try to compile the native code again.

Change-Id: I05cf9e643824ddc448821f69805ccb0240c5b986
2010-03-09 15:01:09 -08:00