62 Commits (ef83ec51bb928fcfcc3f81e85c796eda044f1359)

Author SHA1 Message Date
Adrian Velicu 8dd31a28ae Update dictionaries (possibly_offensive flag)
Encode possibly offensive words with their correct frequency
and with the possibly_offensive flag set.

Continue to encode with zero frequency only distractors and
words that should never come up.

https://paste.googleplex.com/5167060875214848

Bug: 11031090
Change-Id: Ia394b1827f292ff8d4791cc2f3e6e50b5aff4cbe
2014-10-31 14:49:24 +09:00
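The distinction drawn in this commit message, a real frequency plus a flag versus a zero frequency, can be sketched roughly as follows. This is a hypothetical model and not the keyboard's actual code: the `WordEntry` type, the `is_suggestible` function, and the `block_offensive` parameter are illustrative names, and the behavior attached to each field is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    """Hypothetical in-memory form of one dictionary entry."""
    word: str
    frequency: int
    possibly_offensive: bool = False


def is_suggestible(entry: WordEntry, block_offensive: bool) -> bool:
    # Assumption: a zero frequency marks a word that should never come up.
    if entry.frequency == 0:
        return False
    # Assumption: a flagged word keeps its real frequency and is only
    # hidden when the caller asks for offensive words to be blocked.
    if entry.possibly_offensive and block_offensive:
        return False
    return True
```

Under these assumptions, a flagged word stays available to users who do not block offensive words, while a zero-frequency word is never suggested regardless of the setting.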
Adrian Velicu 5fd77cfcca Update dictionaries
>>> dictionaries/de_wordlist.combined.gz
Header :
  date : 1412325412 <=> 1412572955
Body :
Added: überzeugen 50

>>> java/res/raw/main_de.dict
Header :
  date : 1412325412 <=> 1412572955
Body :
Added: überzeugen 50

Change-Id: Ief0a0bbe1a280cdba59a74158cd6a4f5bd1b5287
2014-10-06 15:20:50 +09:00
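The diff summaries in these commit messages follow a regular line format, so they can be read mechanically. The sketch below is an illustration only, assuming the line shapes seen in this log (`Added:`, `Deleted:`, `Probability changed:` / `Freq changed:`, and `key : old <=> new` header lines) and assuming the header `date` values are Unix timestamps in seconds. The names `parse_line` and `epoch_to_utc` are made up for this example, and other line kinds such as `Shortcut added:` are not handled.

```python
import re
from datetime import datetime, timezone

# Patterns inferred from the summaries in this log, not from a format spec.
HEADER_RE = re.compile(r"^\s*(\w+) : (.+) <=> (.+)$")
CHANGE_RE = re.compile(r"^(Added|Deleted): (.+) (\d+)$")
PROB_RE = re.compile(r"^(?:Probability|Freq) changed: (.+) (\d+) -> (\d+)$")


def parse_line(line):
    """Return a tuple describing one summary line, or None if unrecognized."""
    m = HEADER_RE.match(line)
    if m:
        return ("header", m.group(1), m.group(2), m.group(3))
    m = PROB_RE.match(line)
    if m:
        return ("changed", m.group(1), int(m.group(2)), int(m.group(3)))
    m = CHANGE_RE.match(line)
    if m:
        return (m.group(1).lower(), m.group(2), int(m.group(3)))
    return None


def epoch_to_utc(seconds):
    """Convert a header date (assumed Unix seconds) to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()
```

For example, `parse_line("Added: überzeugen 50")` yields an `("added", word, frequency)` tuple, and `epoch_to_utc` can be applied to either side of a `date` header line to see when each dictionary version was built.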
Adrian Velicu 487a6a6949 Update dictionaries
>>> dictionaries/de_wordlist.combined.gz
Header :
  date : 1393228134 <=> 1412325412
  version : 44 <=> 52
Body :
Probability changed: kommen 0 -> 149
Added: Käsebrötchen 50
Added: Lädst 50
Added: Müllbeutel 50
Added: Theresienwiese 50
Added: Verdammtes 50
Added: Wurstbrötchen 50
Added: abgebe 50
Added: angucke 50
Added: async 20
Added: backends 20
Added: brate 50
Added: erschreckendes 50
Added: erwische 50
Added: fahrt 80
Added: fragst 100
Added: gepostet 50
Added: gewundert 80
Added: gucke 50
Added: hattet 50
Added: hinkriege 50
Added: hustet 50
Added: hättet 60
Added: irgendwer 60
Added: koche 50
Added: kriege 70
Added: lehrst 50
Added: motivierenden 50
Added: müsstest 50
Added: müsstet 50
Added: organisiere 50
Added: peilen 50
Added: probiere 50
Added: rede 50
Added: reserviere 50
Added: sag 120
Added: schickes 80
Added: schickst 90
Added: sitze 50
Added: standet 50
Added: stolpere 50
Added: stressig 50
Added: telefoniere 80
Added: wolltest 100
Added: wolltet 100
Added: würdet 100
Added: ziele 50
Added: ähnlich 50
Added: älteren 50
Added: übelriechend 80
Added: überholen 50
Added: überlege 50
Added: überlegen 50
Added: überlegt 50
Added: übermorgen 50
Added: übernachte 50
Added: überquert 50
Added: überstanden 50
Added: übrig 50
Added: übrigens 50

>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1402373154 <=> 1412325408
  version : 47 <=> 52
Body :
Deleted: Pinterest  25
Added: Edamame 25
Added: Pinterest 25
Added: amd 0

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1402373154 <=> 1412325184
  version : 47 <=> 52
Body :
Deleted: Pinterest  25
Added: Edamame 25
Added: Pinterest 25
Added: amd 0

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1402373178 <=> 1412325419
  version : 47 <=> 52
Body :
Deleted: Pinterest  25
Added: Edamame 25
Added: Pinterest 25
Added: amd 0

>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1404131686 <=> 1412325412
  version : 49 <=> 52
Body :
Added: cállese 30
Added: mándame 30
Added: recupérate 35

>>> dictionaries/ro_wordlist.combined.gz
Header :
  description : Româna <=> Română
  date : 1408019089 <=> 1412325511
  version : 50 <=> 52
Body :
!!!!!! Truncated. !!!!!!!

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1406597821 <=> 1412325424
  version : 50 <=> 52
Body :
Deleted: Агг 52
Deleted: ЗАГС 77
Deleted: КОНКАКАФ 19
Deleted: Монк 69
Probability changed: НКАО 13 -> 0
Probability changed: НКВД 46 -> 0
Probability changed: НКО 14 -> 0
Probability changed: НКР 22 -> 0
Deleted: НОМОС-БАНК 58
Deleted: ПДД 77
Probability changed: РНК 33 -> 0
Deleted: СМС 78
Probability changed: СНК 35 -> 0
Deleted: ТОО 14
Probability changed: ТЦ 85 -> 5
Probability changed: УНКВД 11 -> 0
Deleted: ФИО 65
Deleted: Эбля 49
Probability changed: асексуальность 59 -> 0
Probability changed: бисексуал 72 -> 0
Probability changed: бисексуалов 85 -> 0
Probability changed: бисексуальной 67 -> 0
Probability changed: бисексуальности 75 -> 0
Deleted: бумажке 94
Deleted: бумажку 104
Deleted: важней 86
Deleted: вероника 58
Deleted: вероники 54
Deleted: вероникой 29
Deleted: веронику 29
Deleted: влезет 94
Deleted: влезть 87
Deleted: врожденная 75
Deleted: врожденного 78
Deleted: врожденное 71
Deleted: врожденной 85
Deleted: врожденную 66
Deleted: врожденные 82
Deleted: врожденный 82
Deleted: врожденным 79
Deleted: врожденными 76
Deleted: врожденных 86
Probability changed: врождённая 68 -> 75
Probability changed: врождённое 69 -> 71
Probability changed: врождённой 80 -> 85
Probability changed: врождённые 78 -> 82
Probability changed: врождённый 77 -> 82
Probability changed: врождённым 74 -> 79
Probability changed: врождённых 80 -> 86
Probability changed: все-таки 113 -> 30
Deleted: вылезли 88
Deleted: г-же 65
Deleted: г-н 88
Deleted: г-на 88
Probability changed: га 135 -> 0
Probability changed: гг 160 -> 0
Probability changed: гетеросексуалов 73 -> 0
Probability changed: гетеросексуального 67 -> 0
Probability changed: гетеросексуальной 71 -> 0
Probability changed: гетеросексуальности 65 -> 0
Probability changed: гетеросексуальность 67 -> 0
Probability changed: гетеросексуальную 65 -> 0
Probability changed: гетеросексуальные 76 -> 0
Probability changed: гетеросексуальных 77 -> 0
Probability changed: гомосексуал 74 -> 0
Probability changed: гомосексуала 67 -> 0
Probability changed: гомосексуалам 75 -> 0
Probability changed: гомосексуалами 70 -> 0
Probability changed: гомосексуализм 91 -> 0
Probability changed: гомосексуализма 91 -> 0
Probability changed: гомосексуализме 74 -> 0
Probability changed: гомосексуализму 68 -> 0
Probability changed: гомосексуалист 80 -> 0
Probability changed: гомосексуалиста 72 -> 0
Probability changed: гомосексуалистам 69 -> 0
Probability changed: гомосексуалистами 69 -> 0
Probability changed: гомосексуалистов 94 -> 0
Probability changed: гомосексуалистом 78 -> 0
Probability changed: гомосексуалисты 77 -> 0
Probability changed: гомосексуалов 93 -> 0
Probability changed: гомосексуалом 65 -> 0
Probability changed: гомосексуалы 82 -> 0
Probability changed: гомосексуальная 70 -> 0
Probability changed: гомосексуального 78 -> 0
Probability changed: гомосексуальное 71 -> 0
Probability changed: гомосексуальной 93 -> 0
Probability changed: гомосексуальности 103 -> 0
Probability changed: гомосексуальность 100 -> 0
Probability changed: гомосексуальностью 73 -> 0
Probability changed: гомосексуальную 75 -> 0
Probability changed: гомосексуальные 92 -> 0
Probability changed: гомосексуальный 75 -> 0
Probability changed: гомосексуальным 74 -> 0
Probability changed: гомосексуальными 70 -> 0
Probability changed: гомосексуальных 91 -> 0
Probability changed: д-р 93 -> 0
Deleted: дада 72
Deleted: даша 55
Deleted: даши 47
Deleted: дашу 29
Probability changed: де 154 -> 30
Probability changed: др 156 -> 0
Deleted: зажги 92
Deleted: зажгу 89
Deleted: зажигай 95
Deleted: зажигаю 88
Probability changed: зоосексуальность 65 -> 0
Probability changed: иРНК 68 -> 0
Probability changed: кДНК 62 -> 0
Probability changed: кв 133 -> 0
Deleted: кио 49
Deleted: лег 91
Deleted: лезу 88
Deleted: лезь 91
Probability changed: ля 103 -> 30
Probability changed: мРНК 102 -> 0
Deleted: машка 29
Probability changed: микроРНК 65 -> 0
Deleted: мону 29
Probability changed: мтДНК 79 -> 0
Probability changed: мяРНК 65 -> 0
Deleted: нажрался 97
Deleted: налил 97
Deleted: налили 86
Probability changed: негетеросексуальной 73 -> 0
Probability changed: негетеросексуальный 73 -> 0
Deleted: орут 98
Deleted: отт 64
Deleted: паша 83
Deleted: паше 66
Deleted: пашей 69
Deleted: пашой 73
Deleted: подоконник 88
Deleted: подскажет 87
Deleted: подскажете 89
Deleted: подскажите 112
Deleted: покажите 95
Deleted: полезли 91
Probability changed: пр 129 -> 0
Probability changed: пре-мРНК 78 -> 0
Deleted: пресекся 73
Probability changed: рРНК 91 -> 0
Deleted: раздражённо 91
Deleted: сажусь 99
Deleted: саше 54
Probability changed: секс 106 -> 0
Probability changed: секс-символ 74 -> 0
Probability changed: секс-символов 65 -> 0
Probability changed: секс-символом 74 -> 0
Probability changed: секс-туризм 62 -> 0
Probability changed: секса 105 -> 0
Probability changed: сексе 93 -> 0
Deleted: секси 88
Probability changed: сексизм 63 -> 0
Probability changed: сексизма 72 -> 0
Probability changed: сексолог 75 -> 0
Probability changed: сексологии 80 -> 0
Probability changed: сексом 102 -> 0
Probability changed: сексу 80 -> 0
Probability changed: сексуальная 95 -> 0
Probability changed: сексуально 88 -> 0
Probability changed: сексуального 107 -> 0
Probability changed: сексуальное 98 -> 0
Probability changed: сексуальной 111 -> 0
Probability changed: сексуальном 84 -> 0
Probability changed: сексуальному 79 -> 0
Probability changed: сексуальности 99 -> 0
Probability changed: сексуальность 90 -> 0
Probability changed: сексуальностью 70 -> 0
Probability changed: сексуальную 95 -> 0
Probability changed: сексуальные 105 -> 0
Probability changed: сексуальный 91 -> 0
Probability changed: сексуальным 95 -> 0
Probability changed: сексуальными 84 -> 0
Probability changed: сексуальных 113 -> 0
Deleted: сете 78
Deleted: слезой 87
Deleted: соображаю 90
Probability changed: тРНК 86 -> 0
Deleted: тав 69
Probability changed: транссексуал 67 -> 0
Probability changed: транссексуалки 64 -> 0
Probability changed: транссексуалов 82 -> 0
Probability changed: транссексуалы 71 -> 0
Probability changed: транссексуальности 77 -> 0
Probability changed: транссексуальность 65 -> 0
Deleted: укажите 83
Probability changed: ул 137 -> 0
Deleted: устар 93
Deleted: эдак 99
Added: Вероника 58
Added: Вероники 54
Added: Вероникой 29
Added: Веронику 29
Added: Даша 55
Added: Даши 47
Added: Дашу 29
Added: Маш 57
Added: Машка 29
Added: Паша 83
Added: Паше 66
Added: Пашей 69
Added: Пашой 73
Added: Саше 54
Added: впросак 0
Added: врождённую 66
Added: втечение 0
Added: втечении 0
Added: лёг 97
Added: машу 80
Added: чтоли 0
Added: чтоль 0
Added: ща 0
Added: щас 0

>>> java/res/raw/main_de.dict
Header :
  date : 1393228134 <=> 1412325412
  version : 44 <=> 52
Body :
Probability changed: kommen 0 -> 149
Added: Käsebrötchen 50
Added: Lädst 50
Added: Müllbeutel 50
Added: Theresienwiese 50
Added: Verdammtes 50
Added: Wurstbrötchen 50
Added: abgebe 50
Added: angucke 50
Added: async 20
Added: backends 20
Added: brate 50
Added: erschreckendes 50
Added: erwische 50
Added: fahrt 80
Added: fragst 100
Added: gepostet 50
Added: gewundert 80
Added: gucke 50
Added: hattet 50
Added: hinkriege 50
Added: hustet 50
Added: hättet 60
Added: irgendwer 60
Added: koche 50
Added: kriege 70
Added: lehrst 50
Added: motivierenden 50
Added: müsstest 50
Added: müsstet 50
Added: organisiere 50
Added: peilen 50
Added: probiere 50
Added: rede 50
Added: reserviere 50
Added: sag 120
Added: schickes 80
Added: schickst 90
Added: sitze 50
Added: standet 50
Added: stolpere 50
Added: stressig 50
Added: telefoniere 80
Added: wolltest 100
Added: wolltet 100
Added: würdet 100
Added: ziele 50
Added: ähnlich 50
Added: älteren 50
Added: übelriechend 80
Added: überholen 50
Added: überlege 50
Added: überlegen 50
Added: überlegt 50
Added: übermorgen 50
Added: übernachte 50
Added: überquert 50
Added: überstanden 50
Added: übrig 50
Added: übrigens 50

>>> java/res/raw/main_en.dict
Header :
  date : 1402373178 <=> 1412325419
  version : 47 <=> 52
Body :
Deleted: Pinterest  25
Added: Edamame 25
Added: Pinterest 25
Added: amd 0

>>> java/res/raw/main_es.dict
Header :
  date : 1404131686 <=> 1412325412
  version : 49 <=> 52
Body :
Added: cállese 30
Added: mándame 30
Added: recupérate 35

>>> java/res/raw/main_ru.dict
Header :
  date : 1406597821 <=> 1412325424
  version : 50 <=> 52
Body :
Deleted: Агг 52
Deleted: ЗАГС 77
Deleted: КОНКАКАФ 19
Deleted: Монк 69
Probability changed: НКАО 13 -> 0
Probability changed: НКВД 46 -> 0
Probability changed: НКО 14 -> 0
Probability changed: НКР 22 -> 0
Deleted: НОМОС-БАНК 58
Deleted: ПДД 77
Probability changed: РНК 33 -> 0
Deleted: СМС 78
Probability changed: СНК 35 -> 0
Deleted: ТОО 14
Probability changed: ТЦ 85 -> 5
Probability changed: УНКВД 11 -> 0
Deleted: ФИО 65
Deleted: Эбля 49
Probability changed: асексуальность 59 -> 0
Probability changed: бисексуал 72 -> 0
Probability changed: бисексуалов 85 -> 0
Probability changed: бисексуальной 67 -> 0
Probability changed: бисексуальности 75 -> 0
Deleted: бумажке 94
Deleted: бумажку 104
Deleted: важней 86
Deleted: вероника 58
Deleted: вероники 54
Deleted: вероникой 29
Deleted: веронику 29
Deleted: влезет 94
Deleted: влезть 87
Deleted: врожденная 75
Deleted: врожденного 78
Deleted: врожденное 71
Deleted: врожденной 85
Deleted: врожденную 66
Deleted: врожденные 82
Deleted: врожденный 82
Deleted: врожденным 79
Deleted: врожденными 76
Deleted: врожденных 86
Probability changed: врождённая 68 -> 75
Probability changed: врождённое 69 -> 71
Probability changed: врождённой 80 -> 85
Probability changed: врождённые 78 -> 82
Probability changed: врождённый 77 -> 82
Probability changed: врождённым 74 -> 79
Probability changed: врождённых 80 -> 86
Probability changed: все-таки 113 -> 30
Deleted: вылезли 88
Deleted: г-же 65
Deleted: г-н 88
Deleted: г-на 88
Probability changed: га 135 -> 0
Probability changed: гг 160 -> 0
Probability changed: гетеросексуалов 73 -> 0
Probability changed: гетеросексуального 67 -> 0
Probability changed: гетеросексуальной 71 -> 0
Probability changed: гетеросексуальности 65 -> 0
Probability changed: гетеросексуальность 67 -> 0
Probability changed: гетеросексуальную 65 -> 0
Probability changed: гетеросексуальные 76 -> 0
Probability changed: гетеросексуальных 77 -> 0
Probability changed: гомосексуал 74 -> 0
Probability changed: гомосексуала 67 -> 0
Probability changed: гомосексуалам 75 -> 0
Probability changed: гомосексуалами 70 -> 0
Probability changed: гомосексуализм 91 -> 0
Probability changed: гомосексуализма 91 -> 0
Probability changed: гомосексуализме 74 -> 0
Probability changed: гомосексуализму 68 -> 0
Probability changed: гомосексуалист 80 -> 0
Probability changed: гомосексуалиста 72 -> 0
Probability changed: гомосексуалистам 69 -> 0
Probability changed: гомосексуалистами 69 -> 0
Probability changed: гомосексуалистов 94 -> 0
Probability changed: гомосексуалистом 78 -> 0
Probability changed: гомосексуалисты 77 -> 0
Probability changed: гомосексуалов 93 -> 0
Probability changed: гомосексуалом 65 -> 0
Probability changed: гомосексуалы 82 -> 0
Probability changed: гомосексуальная 70 -> 0
Probability changed: гомосексуального 78 -> 0
Probability changed: гомосексуальное 71 -> 0
Probability changed: гомосексуальной 93 -> 0
Probability changed: гомосексуальности 103 -> 0
Probability changed: гомосексуальность 100 -> 0
Probability changed: гомосексуальностью 73 -> 0
Probability changed: гомосексуальную 75 -> 0
Probability changed: гомосексуальные 92 -> 0
Probability changed: гомосексуальный 75 -> 0
Probability changed: гомосексуальным 74 -> 0
Probability changed: гомосексуальными 70 -> 0
Probability changed: гомосексуальных 91 -> 0
Probability changed: д-р 93 -> 0
Deleted: дада 72
Deleted: даша 55
Deleted: даши 47
Deleted: дашу 29
Probability changed: де 154 -> 30
Probability changed: др 156 -> 0
Deleted: зажги 92
Deleted: зажгу 89
Deleted: зажигай 95
Deleted: зажигаю 88
Probability changed: зоосексуальность 65 -> 0
Probability changed: иРНК 68 -> 0
Probability changed: кДНК 62 -> 0
Probability changed: кв 133 -> 0
Deleted: кио 49
Deleted: лег 91
Deleted: лезу 88
Deleted: лезь 91
Probability changed: ля 103 -> 30
Probability changed: мРНК 102 -> 0
Deleted: машка 29
Probability changed: микроРНК 65 -> 0
Deleted: мону 29
Probability changed: мтДНК 79 -> 0
Probability changed: мяРНК 65 -> 0
Deleted: нажрался 97
Deleted: налил 97
Deleted: налили 86
Probability changed: негетеросексуальной 73 -> 0
Probability changed: негетеросексуальный 73 -> 0
Deleted: орут 98
Deleted: отт 64
Deleted: паша 83
Deleted: паше 66
Deleted: пашей 69
Deleted: пашой 73
Deleted: подоконник 88
Deleted: подскажет 87
Deleted: подскажете 89
Deleted: подскажите 112
Deleted: покажите 95
Deleted: полезли 91
Probability changed: пр 129 -> 0
Probability changed: пре-мРНК 78 -> 0
Deleted: пресекся 73
Probability changed: рРНК 91 -> 0
Deleted: раздражённо 91
Deleted: сажусь 99
Deleted: саше 54
Probability changed: секс 106 -> 0
Probability changed: секс-символ 74 -> 0
Probability changed: секс-символов 65 -> 0
Probability changed: секс-символом 74 -> 0
Probability changed: секс-туризм 62 -> 0
Probability changed: секса 105 -> 0
Probability changed: сексе 93 -> 0
Deleted: секси 88
Probability changed: сексизм 63 -> 0
Probability changed: сексизма 72 -> 0
Probability changed: сексолог 75 -> 0
Probability changed: сексологии 80 -> 0
Probability changed: сексом 102 -> 0
Probability changed: сексу 80 -> 0
Probability changed: сексуальная 95 -> 0
Probability changed: сексуально 88 -> 0
Probability changed: сексуального 107 -> 0
Probability changed: сексуальное 98 -> 0
Probability changed: сексуальной 111 -> 0
Probability changed: сексуальном 84 -> 0
Probability changed: сексуальному 79 -> 0
Probability changed: сексуальности 99 -> 0
Probability changed: сексуальность 90 -> 0
Probability changed: сексуальностью 70 -> 0
Probability changed: сексуальную 95 -> 0
Probability changed: сексуальные 105 -> 0
Probability changed: сексуальный 91 -> 0
Probability changed: сексуальным 95 -> 0
Probability changed: сексуальными 84 -> 0
Probability changed: сексуальных 113 -> 0
Deleted: сете 78
Deleted: слезой 87
Deleted: соображаю 90
Probability changed: тРНК 86 -> 0
Deleted: тав 69
Probability changed: транссексуал 67 -> 0
Probability changed: транссексуалки 64 -> 0
Probability changed: транссексуалов 82 -> 0
Probability changed: транссексуалы 71 -> 0
Probability changed: транссексуальности 77 -> 0
Probability changed: транссексуальность 65 -> 0
Deleted: укажите 83
Probability changed: ул 137 -> 0
Deleted: устар 93
Deleted: эдак 99
Added: Вероника 58
Added: Вероники 54
Added: Вероникой 29
Added: Веронику 29
Added: Даша 55
Added: Даши 47
Added: Дашу 29
Added: Маш 57
Added: Машка 29
Added: Паша 83
Added: Паше 66
Added: Пашей 69
Added: Пашой 73
Added: Саше 54
Added: впросак 0
Added: врождённую 66
Added: втечение 0
Added: втечении 0
Added: лёг 97
Added: машу 80
Added: чтоли 0
Added: чтоль 0
Added: ща 0
Added: щас 0

Change-Id: I0c6bf1a1ecc9edf03523bfb080774738aa40d163
2014-10-06 10:13:37 +09:00
Jean Chalard 23f41049d6 Add the source for the Romanian dictionary
This is only informational data - it has no functional
impact at all.

Bug: 7645206
Change-Id: I01f0c2b4fba17a37079531c9a5246c796c836d18
2014-08-18 12:49:56 +09:00
Jean Chalard ae41058659 Improve the Russian dictionary.
Deleted: 38 words
Probability adjusted: 11 words
Added: 1299 words

[Category diff]
+1      15
-1       0
+2       0
-2       0
+3       0
-3       0
+4       0
-4       0
+5       0
-5       3
+6       1
-6       0
+7       0
-7      13

[Weighted category diff]
+1      15
-1       0
+2       0
-2       0
+3       0
-3       0
+4       0
-4       0
+5       0
-5       3
+6       1
-6       0
+7       0
-7      13

Change-Id: I1a6513954d60b30738cb849578ce535c5e05eb1a
2014-07-29 13:31:23 +09:00
Jean Chalard bb0d93c4b0 Update dictionaries
>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1403847862 <=> 1404131686
  version : 48 <=> 49
Body :
Added: apurate 50
Added: bondi 50
Added: chamuyar 50
Added: conocela 50
Added: conocelo 50
Added: conoceme 50
Added: conocenos 50
Added: conocete 50
Added: copate 50
Added: creele 50
Added: creeme 50
Added: creenos 50
Added: creete 50
Added: creiste 50
Added: creés 50
Added: dale 50
Added: dame 50
Added: danos 50
Added: decile 50
Added: decime 50
Added: decinos 50
Added: estate 50
Added: hablale 50
Added: hablales 50
Added: hablame 50
Added: hablanos 50
Added: hablate 50
Added: hablá 50
Added: hacele 50
Added: haceme 50
Added: hacenos 50
Added: hacete 50
Added: hacés 50
Added: llegás 50
Added: llevale 50
Added: llevame 50
Added: llevanos 50
Added: llevate 50
Added: llevá 50
Added: llevás 50
Added: parecé 50
Added: parecés 50
Added: pasala 50
Added: pasale 50
Added: pasales 50
Added: pasalo 50
Added: pasame 50
Added: pasanos 50
Added: pasate 50
Added: pasás 50
Added: podés 50
Added: ponele 50
Added: poneme 50
Added: ponenos 50
Added: ponete 50
Added: quedá 50
Added: querela 50
Added: querelo 50
Added: quereme 50
Added: querenos 50
Added: querete 50
Added: querés 50
Added: rascate 50
Added: sabelo 50
Added: sabés 50
Added: tenele 50
Added: teneme 50
Added: tenenos 50
Added: tenete 50
Added: tenés 50

>>> java/res/raw/main_es.dict
Header :
  date : 1403847862 <=> 1404131686
  version : 48 <=> 49
Body :
Same changes

Bug: 8010862
Change-Id: I98fc8542e21e35a7c80b332148c461144425e61a
2014-07-01 18:19:30 +09:00
Jean Chalard a70b710c9d Update the Spanish dictionary
>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1403153360 <=> 1403847862
  version : 47 <=> 48
Body :
Added: bañate 30
Added: correte 30
Added: duchate 30
Added: mostrame 40
Added: muestrame 40
Added: prestame 40
Added: sos 100

>>> java/res/raw/main_es.dict
Header :
  date : 1403153360 <=> 1403847862
  version : 47 <=> 48
Body :
Added: bañate 30
Added: correte 30
Added: duchate 30
Added: mostrame 40
Added: muestrame 40
Added: prestame 40
Added: sos 100

Bug: 8010862
Change-Id: I0a478b5fd5edfadea420f306dc9b2d98876c246e
2014-06-27 14:56:29 +09:00
Jean Chalard 75bc45cb12 Update dictionaries
>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1401802362 <=> 1403153360
  version : 45 <=> 47
Body :
Added: grandísimo 30

>>> java/res/raw/main_es.dict
Header :
  date : 1401802362 <=> 1403153360
  version : 45 <=> 47
Body :
Added: grandísimo 30

Bug: 15719556
Change-Id: Ifaa97d40d52a278e41f4dd1292781494d4eb939b
2014-06-23 16:56:00 +09:00
Jean Chalard 267a8614a0 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1400639634 <=> 1402373154
  version : 45 <=> 47
Body :
Shortcut added: lust list 15

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1400750526 <=> 1402373154
  version : 45 <=> 47
Body :
Shortcut added: lust list 15

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1400639659 <=> 1402373178
  version : 45 <=> 47
Body :
Shortcut added: lust list 15

>>> java/res/raw/main_en.dict
Header :
  date : 1400639659 <=> 1402373178
  version : 45 <=> 47
Body :
Shortcut added: lust list 15

Bug: 15347469
Change-Id: I35cb410bdb7b641f2f0d4d9bb19a17e3f4eb9c0b
2014-06-10 14:08:32 +09:00
Jean Chalard ff3e488e1e Enrich the Spanish dictionary.
Enrich the dictionary with many words generated from stems
extracted from the dictionary and rules written by hand.
This adds 45,619 words to the dictionary. Hopefully, almost none
of them is incorrect, though a lot are not very common.

Bug: 8010862
Change-Id: I51c7ebd16ff859ec1e765b0604dd1cfca159ab08
2014-06-03 22:48:19 +09:00
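Generating words from stems and hand-written rules, as this commit message describes, might look something like the sketch below. The actual stems and rules behind the commit are not shown in this log, so everything here is hypothetical: the `generate_forms` function, the rule classes `"ar"` and `"er"`, and the sample stems are illustrative only.

```python
def generate_forms(stems, suffix_rules):
    """Combine each stem with every suffix of its rule class.

    stems: iterable of (stem, rule_class) pairs.
    suffix_rules: dict mapping a rule class to a list of suffixes.
    Returns the generated forms, de-duplicated and sorted.
    """
    forms = set()
    for stem, rule_class in stems:
        # Stems whose class has no rules contribute nothing.
        for suffix in suffix_rules.get(rule_class, ()):
            forms.add(stem + suffix)
    return sorted(forms)


# Hypothetical sample data: two verb stems and present-tense endings.
stems = [("habl", "ar"), ("com", "er")]
suffix_rules = {
    "ar": ["o", "as", "a", "amos", "an"],
    "er": ["o", "es", "e", "emos", "en"],
}
generated = generate_forms(stems, suffix_rules)
```

A scheme like this scales quickly, since every stem is multiplied by every suffix in its class. That is consistent with the commit's note that many of the added words are valid but uncommon.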
Jean Chalard 2e795144a6 Update dictionaries
No TRT differences

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1400639634 <=> 1400750526
Body :
Added: google 72

Bug: 11822756
Change-Id: I399fc4e97f4d9e0092ee153f3f6dc5b29ca4d3bd
2014-05-22 20:16:26 +09:00
Jean Chalard 0c80c5e200 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1393228134 <=> 1400639634
  version : 44 <=> 45
Body :
Deleted: SVD 73
Added: google 72

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1393228135 <=> 1400639634
  version : 44 <=> 45
Body :
Deleted: SVD 73

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1393228155 <=> 1400639659
  version : 44 <=> 45
Body :
Deleted: SVD 73
Added: google 72

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1393228135 <=> 1400639634
  version : 44 <=> 45
Body :
Deleted: Déshabillez-moi 27
Deleted: Déshabillons-les 22
Deleted: Laisse-les 14
Deleted: Pendez-les 34
Deleted: Pendez-moi 14
Deleted: Regardez-les 22
Deleted: Saint-Louis-les 38
Deleted: Saint-Vincent-et-les 62
Deleted: Sortez-les 14
Deleted: brula 56
Deleted: brulaient 38
Deleted: brulait 27
Deleted: brulant 68
Deleted: brulante 45
Deleted: brulantes 38
Deleted: brulants 38
Deleted: brule 64
Deleted: brulent 57
Deleted: bruler 67
Deleted: brulera 31
Deleted: bruleur 38
Deleted: bruleurs 38
Deleted: brulez 22
Deleted: brulis 46
Deleted: brulot 36
Deleted: brulots 45
Deleted: brulure 49
Deleted: brulures 56
Deleted: brulèrent 60
Deleted: brulé 82
Deleted: brulée 71
Deleted: brulées 67
Deleted: brulés 74
Deleted: coutaient 43
Deleted: coutait 61
Deleted: coutant 58
Deleted: coutent 60
Deleted: couter 57
Deleted: coutera 60
Deleted: couterait 52
Deleted: couteuse 72
Deleted: couteusement 45
Deleted: couteuses 65
Deleted: couteux 81
Deleted: coutât 31
Deleted: coutèrent 52
Deleted: couté 81
Deleted: rent 51
Deleted: street 96
Added: déshabillez-moi 27
Added: déshabillons-les 22
Added: laisse-les 14
Added: pendez-les 34
Added: pendez-moi 14
Added: regardez-les 22
Added: sortez-les 14

>>> java/res/raw/main_en.dict
Header :
  date : 1393228155 <=> 1400639659
  version : 44 <=> 45
Body :
Deleted: SVD 73
Added: google 72

>>> java/res/raw/main_fr.dict
Header :
  date : 1393228135 <=> 1400639634
  version : 44 <=> 45
Body :
Deleted: Déshabillez-moi 27
Deleted: Déshabillons-les 22
Deleted: Laisse-les 14
Deleted: Pendez-les 34
Deleted: Pendez-moi 14
Deleted: Regardez-les 22
Deleted: Saint-Louis-les 38
Deleted: Saint-Vincent-et-les 62
Deleted: Sortez-les 14
Deleted: brula 56
Deleted: brulaient 38
Deleted: brulait 27
Deleted: brulant 68
Deleted: brulante 45
Deleted: brulantes 38
Deleted: brulants 38
Deleted: brule 64
Deleted: brulent 57
Deleted: bruler 67
Deleted: brulera 31
Deleted: bruleur 38
Deleted: bruleurs 38
Deleted: brulez 22
Deleted: brulis 46
Deleted: brulot 36
Deleted: brulots 45
Deleted: brulure 49
Deleted: brulures 56
Deleted: brulèrent 60
Deleted: brulé 82
Deleted: brulée 71
Deleted: brulées 67
Deleted: brulés 74
Deleted: coutaient 43
Deleted: coutait 61
Deleted: coutant 58
Deleted: coutent 60
Deleted: couter 57
Deleted: coutera 60
Deleted: couterait 52
Deleted: couteuse 72
Deleted: couteusement 45
Deleted: couteuses 65
Deleted: couteux 81
Deleted: coutât 31
Deleted: coutèrent 52
Deleted: couté 81
Deleted: rent 51
Deleted: street 96
Added: déshabillez-moi 27
Added: déshabillons-les 22
Added: laisse-les 14
Added: pendez-les 34
Added: pendez-moi 14
Added: regardez-les 22
Added: sortez-les 14

Bug: 15065819
Bug: 13618068
Change-Id: I3dbe2f1d8868e0880ac76058d99346242bace8cc
2014-05-21 17:33:06 +09:00
Jean Chalard 004cec01a9 Update all dicts to version 44.
Bug: 13164302
Change-Id: I8dc1a839c7dcfaa08a53e26cb6600e9f871447ce
2014-02-24 21:27:25 +09:00
Jean Chalard 66c96e8813 Update dictionaries
en* : add common app and Google product names
en_GB : also add "filters"
ru : add some missing words

Bug: 11043181
Bug: 12276653
Bug: 12953122
Change-Id: I6b62e681a07b7f0149a10ba4e05954e60d6212d4
2014-02-24 15:30:47 +09:00
Jean Chalard 155cb77231 Update dictionaries
This change has no effect on TRT results.

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1381226409 <=> 1389654051
  version : 42 <=> 43
Body :
Added: dialogue 120
Added: dialogues 94

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1381226409 <=> 1389654052
  version : 42 <=> 43
Body :
Deleted: d'Orange 114
Added: d'orange 114

>>> dictionaries/it_wordlist.combined.gz
Header :
  date : 1380519383 <=> 1389654052
  version : 40 <=> 43
Body :
Freq changed: ciao 85 -> 180

>>> java/res/raw/main_fr.dict
Header :
  date : 1381226409 <=> 1389654052
  version : 42 <=> 43
Body :
Deleted: d'Orange 114
Added: d'orange 114

>>> java/res/raw/main_it.dict
Header :
  date : 1380519383 <=> 1389654052
  version : 40 <=> 43
Body :
Freq changed: ciao 85 -> 180

Bug: 12487270
Bug: 12344108
Change-Id: I94768e223d05ad2551a5508e9e01222a028665c4
2014-01-14 10:37:15 +09:00
Jean Chalard b1eedc6ba0 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1381130519 <=> 1381226409
  version : 41 <=> 42
Body :
Added: haha 45

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1380293342 <=> 1381226409
  version : 40 <=> 42
Body :
Added: haha 45

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1380293363 <=> 1381226429
  version : 40 <=> 42
Body :
Added: haha 45

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1380519383 <=> 1381226409
  version : 40 <=> 42
Body :
Freq changed: haha 0 -> 30

>>> java/res/raw/main_en.dict
Header :
  date : 1380293363 <=> 1381226429
  version : 40 <=> 42
Body :
Added: haha 45

>>> java/res/raw/main_fr.dict
Header :
  date : 1380519383 <=> 1381226409
  version : 40 <=> 42
Body :
Freq changed: haha 0 -> 30

Bug: 11114205
Change-Id: I39d429d24d93ee07a70d8613ce0752432b26acc4
2013-10-08 10:34:56 +00:00
Jean Chalard 0ce97695dc Update en_GB dictionary
Header :
  date : 1380293342 <=> 1381130519
  version : 40 <=> 41
Body :
Added: filter 115

Bug: 11076171
Change-Id: I4e88b38b61b794c58b645f7b39e28524d979caba
2013-10-07 17:58:38 +09:00
Jean Chalard a267ebed5a Update dictionaries
Add KitKat to all dictionaries.
Version
da, fi, pl : 29 → 40
cs, de, hr, it, lt, lv, nb, nl, sl, sr, sv, tr : 35 → 40
es : 36 → 40
en_gb, en_us, en, fr, pt_br, pt_pt : 39 → 40

Bug: 10958192
Change-Id: I14436616285ced5eb3b70b8c44b9243da94eed4f
2013-09-30 07:12:03 +00:00
Jean Chalard 50b36e2a4b Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1374721653 <=> 1380099152
  version : 36 <=> 39
Body :
Freq changed: gay 127 -> 10
Added: draft 138

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1374721654 <=> 1380099152
  version : 36 <=> 39
Body :
Freq changed: gay 127 -> 10

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1374721663 <=> 1380099172
  version : 36 <=> 39
Body :
Freq changed: gay 127 -> 10

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1376888819 <=> 1380099153
  version : 37 <=> 39
Body :
Added: septembre 150

>>> dictionaries/pt_BR_wordlist.combined.gz
Header :
  date : 1376884524 <=> 1380099168
  version : 37 <=> 39
Body :
Freq changed: atras 87 -> 0
Not a word: atras false -> true
Shortcut added: atras atrás 15
Shortcut added: cade cadê 15
Shortcut added: cafe café 15
Shortcut added: ferias férias 15
Shortcut added: musica música 15
Shortcut added: musicas músicas 15

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1376884536 <=> 1380099168
  version : 37 <=> 39
Body :
Shortcut added: atras atrás 15
Shortcut added: cade cadê 15
Shortcut added: ferias férias 15
Shortcut added: musica música 15
Shortcut added: musicas músicas 15
Added: cafe 0

>>> java/res/raw/main_en.dict
Header :
  date : 1374721663 <=> 1380099172
  version : 36 <=> 39
Body :
Freq changed: gay 127 -> 10

>>> java/res/raw/main_fr.dict
Header :
  date : 1376888819 <=> 1380099153
  version : 37 <=> 39
Body :
Added: septembre 150

>>> java/res/raw/main_pt_br.dict
Header :
  date : 1376884524 <=> 1380099168
  version : 37 <=> 39
Body :
Freq changed: atras 87 -> 0
Not a word: atras false -> true
Shortcut added: atras atrás 15
Shortcut added: cade cadê 15
Shortcut added: cafe café 15
Shortcut added: ferias férias 15
Shortcut added: musica música 15
Shortcut added: musicas músicas 15

Bug: 10504313
Bug: 10507536
Bug: 10561100
Change-Id: I4267c76cf0de221a703523d5f2dd2befbaf020a0
2013-09-26 08:34:53 +00:00
Jean Chalard 5937c03f15 Update dictionaries
Bug: 10354668
Bug: 10188528

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1374634549 <=> 1376888819
  version : 36 <=> 37
Body :
Deleted: color 78
Deleted: men 85
Deleted: o 115
Added: nationaux 120

>>> dictionaries/iw_wordlist.combined.gz
Added. New dictionary.

>>> dictionaries/pt_BR_wordlist.combined.gz
Header :
  date : 1374634563 <=> 1376884524
  version : 36 <=> 37
Body :
Deleted: la 152

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1357790930 <=> 1376884536
  version : 30 <=> 37
Body :
Deleted: la 152

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1372393835 <=> 1376897704
  version : 35 <=> 37
Body :
Freq changed: говно 68 -> 0

>>> java/res/raw/main_fr.dict
Header :
  date : 1374634549 <=> 1376888819
  version : 36 <=> 37
Body :
Deleted: color 78
Deleted: men 85
Deleted: o 115
Added: nationaux 120

>>> java/res/raw/main_pt_br.dict
Header :
  date : 1374634563 <=> 1376884524
  version : 36 <=> 37
Body :
Deleted: la 152

>>> java/res/raw/main_ru.dict
Header :
  date : 1372393835 <=> 1376897704
  version : 35 <=> 37
Body :
Freq changed: говно 68 -> 0

Change-Id: I87a85571c61068ff46a32d291aa43becbb75598a
2013-08-19 16:41:09 +09:00
Jean Chalard 665e4ecc62 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1374634548 <=> 1374721653
Body :
Added: Caltrain 30

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1374634548 <=> 1374721654
Body :
Added: Caltrain 30

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1374634568 <=> 1374721663
Body :
Added: Caltrain 30

>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1372393817 <=> 1374721654
  version : 35 <=> 36
Body :
Added: Caltrain 10

>>> java/res/raw/main_en.dict
Header :
  date : 1374634568 <=> 1374721663
Body :
Added: Caltrain 30

>>> java/res/raw/main_es.dict
Header :
  date : 1372393817 <=> 1374721654
  version : 35 <=> 36
Body :
Added: Caltrain 10

Bug: 9995706
Change-Id: Icf96bf01e45ef94d3ffd6d6a9d6431c52f0f5a86
2013-07-25 12:48:55 +09:00
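
The "date" header values appear to be Unix timestamps in seconds; that is an assumption based on how closely they track the commit dates, not something the commit messages state. A small sketch for reading them under that assumption (`parse_header_date` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def parse_header_date(line: str):
    """Split a 'date : OLD <=> NEW' header line into two UTC ISO strings.

    Assumes both values are Unix timestamps in seconds.
    """
    _, _, values = line.partition(":")
    old, new = (int(value) for value in values.split("<=>"))
    return tuple(
        datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        for value in (old, new)
    )
```

Under this reading, the new value 1374721663 in the commit above falls on 2013-07-25 in UTC, shortly before the commit itself was made.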
Jean Chalard f0046aea26 Update dictionaries
en, en_GB, en_US:
Add "id" -> "I'd" whitelist entry
Reinstate "id" and "ID" in the respective dicts

fr:
Remove many words that are not French
Change "google" to "Google"

pt_BR:
Delete "idéia"

Change-Id: I942266ac7995345580926f60de45d202aa257ae7
2013-07-24 12:10:06 +09:00
Jean Chalard ffe7dbbe7a Update dictionaries
>>> dictionaries/cs_wordlist.combined.gz
Header :
  date : 1355802831 <=> 1372393817
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/de_wordlist.combined.gz
Header :
  date : 1355802835 <=> 1372393817
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1366272052 <=> 1372393817
  version : 31 <=> 35
Body :
Deleted: Sea 126
Added: LTE 25

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1366272093 <=> 1372393817
  version : 31 <=> 35
Body :
Added: LTE 25

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1366272977 <=> 1372393837
  version : 31 <=> 35
Body :
Deleted: Sea 126
Added: LTE 25

>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1355802832 <=> 1372393817
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1366272255 <=> 1372393818
  version : 31 <=> 35
Body :
Deleted: R'n'B 95
Deleted: count 60
Deleted: d'Inti 34
Added: beurk 25

>>> dictionaries/hr_wordlist.combined.gz
Header :
  date : 1355802836 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/it_wordlist.combined.gz
Header :
  date : 1355802836 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/lt_wordlist.combined.gz
Header :
  date : 1355802843 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/lv_wordlist.combined.gz
Header :
  date : 1355802843 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/nb_wordlist.combined.gz
Header :
  date : 1366003450 <=> 1372393818
  version : 31 <=> 35
Body :
Added: LTE 25

>>> dictionaries/nl_wordlist.combined.gz
Header :
  date : 1355802844 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1370244430 <=> 1372393835
  version : 34 <=> 35
Body :
Freq changed: связывание 93 -> 0

>>> dictionaries/sl_wordlist.combined.gz
Header :
  date : 1355802835 <=> 1372393835
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/sr_wordlist.combined.gz
Header :
  date : 1355802853 <=> 1372393835
  version : 29 <=> 35
Body :
Added: LTE 25

>>> dictionaries/sv_wordlist.combined.gz
Header :
  date : 1366003804 <=> 1372393836
  version : 31 <=> 35
Body :
Added: LTE 25

>>> dictionaries/tr_wordlist.combined.gz
Header :
  date : 1355802858 <=> 1372393837
  version : 29 <=> 35
Body :
Added: LTE 25

>>> java/res/raw/main_de.dict
Header :
  date : 1355802835 <=> 1372393817
  version : 29 <=> 35
Body :
Added: LTE 25

>>> java/res/raw/main_en.dict
Header :
  date : 1366272977 <=> 1372393837
  version : 31 <=> 35
Body :
Deleted: Sea 126
Added: LTE 25

>>> java/res/raw/main_es.dict
Header :
  date : 1355802832 <=> 1372393817
  version : 29 <=> 35
Body :
Added: LTE 25

>>> java/res/raw/main_fr.dict
Header :
  date : 1366272255 <=> 1372393818
  version : 31 <=> 35
Body :
Deleted: R'n'B 95
Deleted: count 60
Deleted: d'Inti 34
Added: beurk 25

>>> java/res/raw/main_it.dict
Header :
  date : 1355802836 <=> 1372393818
  version : 29 <=> 35
Body :
Added: LTE 25

>>> java/res/raw/main_ru.dict
Header :
  date : 1370244430 <=> 1372393835
  version : 34 <=> 35
Body :
Freq changed: связывание 93 -> 0

Bug: 9301610
Bug: 9607966
Change-Id: I1117ed85d97fbb0ee50f11bc31776f1970b56f12
2013-06-28 14:54:51 +09:00
Jean Chalard e73802f335 Update dictionaries
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366974711 <=> 1370244430
  MULTIPLE_WORDS_DEMOTION_RATE : 0 <=> 50
  version : 32 <=> 34
Body :
Deleted: МДА 2
Freq changed: а 0 -> 60
Freq changed: в 0 -> 60
Deleted: возбужденные 0
Freq changed: гей 92 -> 0
Freq changed: жид 80 -> 0
Freq changed: зареган 0 -> 50
Freq changed: и 0 -> 60
Freq changed: к 0 -> 60
Deleted: клевом 0
Freq changed: куи 29 -> 0
Freq changed: лох 69 -> 0
Freq changed: о 0 -> 60
Freq changed: ребут 0 -> 50
Freq changed: с 0 -> 60
Freq changed: у 0 -> 60
Freq changed: хуй 77 -> 0
Freq changed: хукера 38 -> 0
Freq changed: широко 0 -> 144
Deleted: щеткой 70
Freq changed: щёткой 69 -> 70
Freq changed: я 0 -> 60
Added: жены 134
Added: звони 100
Added: клёвом 50
Added: мда 0

>>> java/res/raw/main_ru.dict
Header :
  date : 1366974711 <=> 1370244430
  version : 32 <=> 34
  MULTIPLE_WORDS_DEMOTION_RATE : 0 <=> 50
Body :
(same changes)

Change-Id: Ie10bdd1f33cac43c5be35e99faef7cfdfe877d2b
2013-06-03 16:41:12 +09:00
Jean Chalard d57a7748c1 Update dictionaries
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366957492 <=> 1366974711
Body :
Added: ложись 100
Added: под 100
Added: посмотрю 100
Added: угу 100
Added: ух 100

>>> java/res/raw/main_ru.dict
Header :
  date : 1366957492 <=> 1366974711
Body :
Added: ложись 100
Added: под 100
Added: посмотрю 100
Added: угу 100
Added: ух 100

Change-Id: Ida39ea2cf25cd291554f3b2f3ce31f57dca24113
2013-04-26 20:15:14 +09:00
Jean Chalard 7ec72b80ed Update dictionaries
Full diff too long: truncated

Summary diff
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366277083 <=> 1366957492
  version : 31 <=> 32
Contents :
  - Reinstate 2- and 3- letter words that were demoted to avoid
    bad space insertion (343 entries)
  - Add missing words as per b/6341908 and b/5674314
    (98 entries)

This has zero effect on the regression tests

Bug: 6341908
Bug: 5674314
Change-Id: Ifce268a7eab5edd264d963489187e975017f8b72
2013-04-26 15:56:54 +09:00
Jean Chalard 9cf468646f Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1366021966 <=> 1366272052
Body :
Added: yt 0

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1366021978 <=> 1366272093
Body :
Added: yt 0

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1366021987 <=> 1366272977
Body :
Added: yt 0

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1366003217 <=> 1366272255
Body :
Freq changed: cash 80 -> 20

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1366003693 <=> 1366277083
Body :
Deleted: толщ 76

>>> java/res/raw/main_en.dict
Header :
  date : 1366021987 <=> 1366272977
Body :
Added: yt 0

>>> java/res/raw/main_fr.dict
Header :
  date : 1366003217 <=> 1366272255
Body :
Freq changed: cash 80 -> 20

>>> java/res/raw/main_ru.dict
Header :
  date : 1366003693 <=> 1366277083
Body :
Deleted: толщ 76

Bug: 8635822
Change-Id: I44dc73bd010b125c994387894847a008276d69f7
2013-04-18 18:41:19 +09:00
Jean Chalard e99daea083 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1366003032 <=> 1366021966
Body :
Deleted: FTP 88
Deleted: HTTPS 66
Added: www 72

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1366003070 <=> 1366021978
Body :
Deleted: FTP 88
Deleted: HTTPS 66
Added: http 95
Added: www 71

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1366003861 <=> 1366021987
Body :
Deleted: FTP 88
Deleted: HTTPS 66
Freq changed: http 120 -> 95
Added: www 71

>>> java/res/raw/main_en.dict
Header :
  date : 1366003861 <=> 1366021987
Body :
Deleted: FTP 88
Deleted: HTTPS 66
Freq changed: http 120 -> 95
Added: www 71

Bug: 8233807
Change-Id: Id55f6e0dcc9ddff26902c0857edcbb9b10d42328
2013-04-15 20:25:48 +09:00
Jean Chalard da175bdcb1 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1355802832 <=> 1366003032
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 72
Added: mm 135

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1355112451 <=> 1366003070
  version : 28 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1355802851 <=> 1366003861
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1357617878 <=> 1366003217
  version : 29 <=> 31
Body :
Not a word: re false -> true
Shortcut added: re le 15

>>> dictionaries/nb_wordlist.combined.gz
Header :
  date : 1355802836 <=> 1366003450
  version : 29 <=> 31
Body :
Freq changed: iPhone 91 -> 30
Added: app 30

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1358763720 <=> 1366003693
  version : 30 <=> 31
Body :
Freq changed: за 140 -> 181
Freq changed: не 140 -> 191
Freq changed: про 131 -> 151
Freq changed: эры 125 -> 140

>>> dictionaries/sv_wordlist.combined.gz
Header :
  date : 1355802856 <=> 1366003804
  version : 29 <=> 31
Body :
Added: vi 180

>>> java/res/raw/main_en.dict
Header :
  date : 1355802851 <=> 1366003861
  version : 29 <=> 31
Body :
Deleted: HTTP 95
Deleted: WWW 71
Added: mm 135

>>> java/res/raw/main_fr.dict
Header :
  date : 1357617878 <=> 1366003217
  version : 29 <=> 31
Body :
Not a word: re false -> true
Shortcut added: re le 15

>>> java/res/raw/main_ru.dict
Header :
  date : 1358763720 <=> 1366003693
  version : 30 <=> 31
Body :
Freq changed: за 140 -> 181
Freq changed: не 140 -> 191
Freq changed: про 131 -> 151
Freq changed: эры 125 -> 140

Bug: 8560415
Bug: 7556679
Change-Id: If1c628edcb1cc5efd67e1715acf94f19c0eb4643
2013-04-15 14:51:02 +09:00
Jean Chalard be94d212e8 Update the Russian dictionary
The point is to get as close as possible to having the
golden Russian tests pass.

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355818916 <=> 1358763720
  version : 29 <=> 30
Body :
Deleted: НКТ 14
Freq changed: без 0 -> 140
Freq changed: бонус 94 -> 130
Freq changed: за 0 -> 140
Freq changed: на 0 -> 180
Freq changed: не 0 -> 140
Freq changed: парка 133 -> 110
Freq changed: про 0 -> 131
Freq changed: ручьи 93 -> 80
Freq changed: ура 86 -> 100
Freq changed: юрты 86 -> 60
Added: вечерком 100
Added: задачки 100
Added: сорри 100
Added: узнай 100
Added: учти 100

>>> java/res/raw/main_ru.dict
Same changes as above

Change-Id: I8685c34d9ab1dcbf8ae8e23d2e26380059684c95
2013-01-21 19:30:17 +09:00
Jean Chalard 84f932be73 Add words to Portuguese
>>> dictionaries/pt_BR_wordlist.combined.gz
Header :
  date : 1355802839 <=> 1357790917
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1355802856 <=> 1357790930
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

>>> java/res/raw/main_pt_br.dict
Header :
  date : 1355802839 <=> 1357790917
  version : 29 <=> 30
Body :
Added: à 30
Added: é 30
Added: ò 30
Added: ô 30

Bug: 7966948
Change-Id: I71c0986cf616d67926d0a6a0e53099b04b0427d5
2013-01-10 14:14:17 +09:00
Jean Chalard 420528ed97 Update dictionaries
>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

>>> dictionaries/pl_wordlist.combined.gz
Header :
  date : 1355802847 <=> 1357618222
Body :
Added: żebyście 69
Added: żebyśmy 69

>>> java/res/raw/main_fr.dict
Header :
  date : 1355802835 <=> 1357617878
Body :
Deleted: jai 50

Change-Id: I8651a4689bea06d5fe2caead471ef52969c77089
2013-01-08 14:24:22 +09:00
Jean Chalard cd89c5d6ed Update dictionaries
>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

>>> java/res/raw/main_ru.dict
Header :
  date : 1355802857 <=> 1355818916
Body :
Freq changed: БД 18 -> 0
Freq changed: ГБ 14 -> 0
Freq changed: ЕС 44 -> 0
Freq changed: ЖД 3 -> 0
Freq changed: ЖЖ 8 -> 0
Freq changed: ЖК 3 -> 0
Freq changed: ИИ 21 -> 0
Freq changed: КБ 37 -> 0
Freq changed: МБ 19 -> 0
Freq changed: МО 26 -> 0
Freq changed: ОС 40 -> 0
Freq changed: РФ 65 -> 0
Freq changed: СБ 21 -> 0
Freq changed: СК 23 -> 0
Freq changed: ТВ 37 -> 0
Freq changed: УК 36 -> 0
Freq changed: ЦБ 11 -> 0
Freq changed: ЦК 59 -> 0
Deleted: бэ 0
Freq changed: дБ 92 -> 0
Deleted: йо 0
Freq changed: мм 149 -> 0
Freq changed: рН 104 -> 0
Deleted: ша 0

Change-Id: I03f0f4e8d03e0f77f5879e6dd5c424673466afca
2012-12-18 17:25:37 +09:00
Jean Chalard 21dbe3701c Update dictionaries
cs, da, de, el, es, fi, fr, hr, it, lt, lv, nb, nl, pl,
pt_BR, pt_PT, sl, sr, sv, tr : rescale frequencies to match
spec. This has no large effect in practice, except that the
dictionary will become stronger vs. the spatial model (especially
for lower-count corpora, like lt, lv, sr)
en* : Small changes (rounding going the other way essentially)
ru : the above rescaling, and remove the following words:
Дре, ОСТа, Планше, легкими, легком, легкому, легкости,
легкую, нелегкие, нелегкий, нелегким, нелегкое, нелегкой,
нелегкую, полулегком and add нелёгкие, нелёгкое, нелёгкую;
other accented forms were already in the dictionary.

Change-Id: I40386c2ebd4d2be38874e822bde89db7cb512ae6
2012-12-18 13:06:48 +09:00
Jean Chalard d080986f93 Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1354870724 <=> 1355112440
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1354870736 <=> 1355112451
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1354870744 <=> 1355112460
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/es_wordlist.combined.gz
Header :
  date : 1351676002 <=> 1355117676
  version : 26 <=> 28
Body :
Deleted: DoCoMo 40
Added: Docomo 40
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/fi_wordlist.combined.gz
Header :
  date : 1351676054 <=> 1355117691
  version : 26 <=> 28
Body :
Deleted: DoCoMo 28
Added: Docomo 28
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1354872988 <=> 1355117708
  version : 27 <=> 28
Body :
Deleted: DoCoMo 52
Added: Docomo 52
Added: KDDI 25
Added: Softbank 25

>>> dictionaries/pt_PT_wordlist.combined.gz
Header :
  date : 1351676510 <=> 1355117723
  version : 26 <=> 28
Body :
Deleted: DoCoMo 48
Added: Docomo 48
Added: Softbank 25

>>> java/res/raw/main_en.dict
Header :
  date : 1354870744 <=> 1355112460
  version : 27 <=> 28
Body :
Deleted: DoCoMo 65
Added: Docomo 65
Added: KDDI 25
Added: Softbank 25

>>> java/res/raw/main_es.dict
Header :
  date : 1353500806 <=> 1355117676
  version : 27 <=> 28
Body :
Deleted: DoCoMo 40
Added: Docomo 40
Added: KDDI 25
Added: Softbank 25

>>> java/res/raw/main_fr.dict
Header :
  date : 1354872988 <=> 1355117708
  version : 27 <=> 28
Body :
Deleted: DoCoMo 52
Added: Docomo 52
Added: KDDI 25
Added: Softbank 25

Change-Id: I3801cbe4535407f55ede8db327674d493a92d1ae
2012-12-10 14:52:43 +09:00
Jean Chalard bd793ed50d Update dictionaries
>>> dictionaries/en_GB_wordlist.combined.gz
Header :
  date : 1353500789 <=> 1354870724
Body :
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_US_wordlist.combined.gz
Header :
  date : 1351675958 <=> 1354870736
  version : 26 <=> 27
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/en_wordlist.combined.gz
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> dictionaries/fr_wordlist.combined.gz
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> dictionaries/ru_wordlist.combined.gz
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

>>> java/res/raw/main_en.dict
Header :
  date : 1353500998 <=> 1354870744
Body :
Deleted: Rod's 46
Added: Dad 75
Added: Daddy 60
Added: Grandma 60
Added: Grandpa 55
Added: Mama 59
Added: Mom 77
Added: Papa 55

>>> java/res/raw/main_fr.dict
Header :
  date : 1353500832 <=> 1354872988
Body :
Deleted: noël 71
Deleted: po 73
Deleted: ti 73
Added: Noël 71
Added: lose 1
Added: y'a 130

>>> java/res/raw/main_ru.dict
Header :
  date : 1353567943 <=> 1354870130
Body :
Demote all CAPS words by 80
Freq changed: модно 51 -> 20

Change-Id: I6f2d1c359d716535923b22c33d7fa4c3b0a330e4
2012-12-07 18:52:21 +09:00
Jean Chalard b40a1ce50b Update RU dictionary header.
>>> dictionaries/ru_wordlist.combined.gz
>>> java/res/raw/main_ru.dict
Header :
  date : 1353500945 <=> 1353567943
  MULTIPLE_WORDS_DEMOTION_RATE : null <=> 0
Body :
  No differences

Bug: 7540132
Change-Id: I837831b1e214da64962cf1bb68c840a3d4e6bf76
2012-11-22 16:21:10 +09:00
Jean Chalard d5f53710c5 Update dictionaries and fix mistakes
- Combined de dict :
  Remove digraph shortcuts that were in by mistake.
- Combined en dict :
  Set freq of "baton" "batons" "mace" "puff"
  "puffs" and "tasers" to zero. They are offensive
  in en_GB.
- Combined en_GB dict :
  Change freq of "il" to 0 and flag it "not a word". Still
  in the dict as a whitelist entry for "I'll"; for some
  reason it had freq 99.
  Add "milk:122" and "practice:143"
- Combined fr dict :
  Add missing words : "Nostradamus:40" "défendais:30"
  "gmail:50" "générale:140" "hm:0" "hmm:0" "y'en:130"
  "l'apocalypse:31" "m'épuise:30" "recontacter:80"
  "t'annonce:30"
  Set freq of non-word shortcuts for digraphs to 1 instead
  of 0, allowing them to be gesture-typed.
- Combined ru dict :
  Remove a lot of two-character non-words.

- Binary de dict :
  Remove the obsolete "options" header, and add the "dictionary"
  header.
- Binary en dict :
  Flag "hoe" "hoes" "il" "shel" as non-words.
  Also drop freq of "il" and "shel" to 0
  Add the "locale" header that was missing.
- Binary es dict :
  Add the "dictionary" header.
- Binary fr dict :
  Add the same words as above. Non-word shortcuts were already
  set to 1.
- Binary it dict :
  Add a "dictionary" header. Also change freq of
  "Šarapova" from 50 to 37; not sure why it was 50.
- Binary pt_BR dict :
  Add a "dictionary" header.
- Binary ru dict :
  Add a "dictionary" header and remove the same words as above.

For all dictionaries : bump the version to 27.

Change-Id: I94fe7f8f42b31fdad223085c00a94115e14d2276
2012-11-21 22:03:24 +09:00
Jean Chalard f5adbb1e1b Move the emoji dictionaries source under AOSP.
Change-Id: Ie870a90d483d9f27aed96fb4b44126315c43922f
2012-10-31 19:22:42 +09:00
Jean Chalard a424ff06ec Switch the AOSP word lists to the combined format.
This will help with managing the word lists.

Bug: 7388859
Change-Id: I89f049569b177d3027fe56d6c67eaca27d44dc7d
2012-10-31 18:52:00 +09:00
Jean Chalard 306e0a800f Update AOSP dictionaries.
Changes :
- Add "emoji"
- Change the whitelist target of "foo" from "for" to "too"
- Fix non-word frequencies to 0
- Fix the freq of common en_US vs en_GB words
- Add "connection" to the en_GB dictionary

Bug: 7368441
Bug: 7370033
Bug: 7371955
Change-Id: Ib22a97e97b486b05012d5496619557f406c441b9
2012-10-24 16:12:28 +09:00
Jean Chalard 3d83a1648b Update AOSP dictionaries.
Differences :
oh 90 -> 105
ooh 54 -> 54
hoy,kinkier,kinkiest,kinkiness,kinkily,kinky -> 0
trst -> remove

New whitelist entries (actually old ones that had not been applied)
"berm" -> "been"
"foe" -> "for"
"hid" -> "his"
"thong" -> "thing"

French :
Add "six" and remove some non-words

Bug: 7329149
Bug: 7356297
Change-Id: I55092f0538db8627148b0a314e50eff926c47275
2012-10-18 00:39:16 +09:00
Jean Chalard b24cda3c0c Fix the Danish dictionary
Human error: this contained "Nederlands" ("Dutch" in Dutch) as the
human-readable description in the header.

Bug: 7272686
Change-Id: I7a67e7bf1afca6928de7825fb63c5b213e8d7978
2012-10-04 15:52:50 +09:00
Jean Chalard a44942810d Update the AOSP dictionaries for the 0-freq review
Bug: 7227265
Change-Id: I384f7d76cef67b96b106ddac96e4baf1fa32afd4
2012-10-03 21:15:27 +09:00
Jean Chalard d0cf96493c Use all Lexiteria sources and update existing dictionaries.
New dictionaries :
- Danish
- Greek
- Finnish
- Lithuanian
- Latvian
- Dutch
- Polish
- Russian
- Slovene
- Serbian
- Swedish
- Turkish

Also, compress those files to reduce the footprint in the
repository.
Also, update and improve English and French dictionaries, and
add the ligatures shortcut into the French dictionary.
Finally, move the Russian binary dictionary here now that it
can at last be open sourced.

Bug: 5587752
Bug: 6775251
Bug: 6995793
Bug: 7149666
Change-Id: Iec9831d4dce425a2b5b0657571e4448436610525
2012-09-21 22:07:23 +09:00
Jean Chalard c278142745 Remove useless backslashes from the whitelist dictionary
For some reason, these are necessary for resources, but the XML
standard does not require them.

Change-Id: I7cdaecb6815aa4020e0d453e33be38ff2968df50
2012-08-13 15:53:07 +09:00
Jean Chalard 1d8103ea57 Add a shortcut-format version of the whitelist.
This will ultimately replace the whitelist resource, but
this change doesn't delete it to avoid removing the functionality
temporarily.

Bug: 6906525
Change-Id: I576edc42cd2a964b86b7597f1ede1cf6ec8e26c3
2012-08-10 15:51:18 +09:00
Jean Chalard 6f7b1ff468 Update dictionaries.
- English : some words caught through regression tests
- English : some words externally reported
- French : some words externally reported
- French : finished review of all accented words

Bug: 6726969
Bug: 6730031
Change-Id: I37d0dc310db2c79e03ac7ad452391e92d9b13357
2012-06-29 19:30:01 +09:00
Jean Chalard 401e70535e Make sure whitelist targets are in the main dictionary
Bug: 6680976
Change-Id: Ieddb5eecb813da3a8a515930568e356bc3526386
2012-06-19 02:08:57 +09:00
Jean Chalard 79451e0a70 Update dictionaries.
- English dict scrubbed for distractors
- EN, FR, IT, DE include improvements from user feedback

Bug: 6394369
Change-Id: I9af5415d0b6a5edfea2956657b0fee7906ebb344
2012-06-16 04:25:43 +09:00