Commit Graph

6 Commits (6d4054689ca9649455106841630c72397d939cae)

Author SHA1 Message Date
Adrian Velicu 8dd31a28ae Update dictionaries (possibly_offensive flag)
Encode possibly offensive words with their correct
frequency and with the possibly_offensive flag set.

Continue to encode with zero frequency only distracters or
words that should never come up.

https://paste.googleplex.com/5167060875214848

Bug: 11031090
Change-Id: Ia394b1827f292ff8d4791cc2f3e6e50b5aff4cbe
2014-10-31 14:49:24 +09:00
Jean Chalard 004cec01a9 Update all dicts to version 44.
Bug: 13164302
Change-Id: I8dc1a839c7dcfaa08a53e26cb6600e9f871447ce
2014-02-24 21:27:25 +09:00
Jean Chalard a267ebed5a Update dictionaries
Add KitKat to all dictionaries.
Version changes:
da, fi, pl : 29 → 40
cs, de, hr, it, lt, lv, nb, nl, sl, sr, sv, tr : 35 → 40
es : 36 → 40
en_gb, en_us, en, fr, pt_br, pt_pt : 39 → 40

Bug: 10958192
Change-Id: I14436616285ced5eb3b70b8c44b9243da94eed4f
2013-09-30 07:12:03 +00:00
Jean Chalard 420528ed97 Update dictionaries
>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

>>> dictionaries/pl_wordlist.combined.gz
Header :
  date : 1355802847 <=> 1357618222
Body :
Added: żebyście 69
Added: żebyśmy 69

>>> java/res/raw/main_fr.dict
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

Change-Id: I8651a4689bea06d5fe2caead471ef52969c77089
2013-01-08 14:24:22 +09:00
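The header `date` values in the diffs of commit 420528ed97 look like Unix timestamps (seconds since 1970-01-01 UTC). Reading them that way is an assumption, not something the log states. The sketch below converts them into the +09:00 offset the commit dates use, so they can be compared with the commit times. The names `HEADER_DATES` and `to_local` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Header "date" values copied from the fr and pl diffs above.
# Assumption: each value is a Unix timestamp in seconds.
HEADER_DATES = {
    "fr_wordlist (old)": 1355802835,
    "fr_wordlist (new)": 1357617878,
    "pl_wordlist (old)": 1355802847,
    "pl_wordlist (new)": 1357618222,
}

# The commit dates in this log are shown with a +09:00 offset.
JST = timezone(timedelta(hours=9))


def to_local(ts: int) -> str:
    """Format a Unix timestamp in the log's +09:00 offset."""
    return datetime.fromtimestamp(ts, tz=JST).strftime("%Y-%m-%d %H:%M:%S %z")


for name, ts in HEADER_DATES.items():
    print(f"{name}: {to_local(ts)}")
```

With this reading, the old fr date is 2012-12-18 12:53:55 +0900, shortly before commit 21dbe3701c (2012-12-18 13:06:48 +09:00). The new one is 2013-01-08 13:04:38 +0900, shortly before this commit (14:24:22 +09:00). That timing fits the word lists being regenerated just before each dictionary update.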
Jean Chalard 21dbe3701c Update dictionaries
cs, da, de, el, es, fi, fr, hr, it, lt, lv, nb, nl, pl,
pt_BR, pt_PT, sl, sr, sv, tr : rescale frequencies to match
spec. In practice this has little effect, except that the
dictionary becomes stronger relative to the spatial model
(especially in lower-count corpora such as lt, lv, sr).
en* : small changes (essentially rounding going the other way)
ru : the above rescaling, and remove the following words:
Дре, ОСТа, Планше, легкими, легком, легкому, легкости,
легкую, нелегкие, нелегкий, нелегким, нелегкое, нелегкой,
нелегкую, полулегком and add нелёгкие, нелёгкое, нелёгкую;
other accented forms were already in the dictionary.

Change-Id: I40386c2ebd4d2be38874e822bde89db7cb512ae6
2012-12-18 13:06:48 +09:00
Jean Chalard a424ff06ec Switch the AOSP word lists to the combined format.
This will help with managing the word lists.

Bug: 7388859
Change-Id: I89f049569b177d3027fe56d6c67eaca27d44dc7d
2012-10-31 18:52:00 +09:00