Keisuke Kuroyanagi
a88c9682fc
Merge "Change v403 historical info format."
2014-10-31 13:38:38 +00:00
Keisuke Kuroyanagi
3cde19ded1
Merge "Initial commit for native dicttoolkit."
2014-10-31 11:29:20 +00:00
Keisuke Kuroyanagi
e101a53ffc
Initial commit for native dicttoolkit.
...
Bug: 10059681
Change-Id: Ib730af8ebc944e08aaada869c0626724a499747c
2014-10-31 20:27:06 +09:00
Keisuke Kuroyanagi
2383575d2d
Change v403 historical info format.
...
count -> 2B, level -> 0B.
Change-Id: I3b241126f56eb33cdf09cb1ebfed04f534e4ec48
2014-10-31 17:22:13 +09:00
Adrian Velicu
009e02ce4a
Further fixes to treat 0-frequency words
...
Previously, when both legitimate 0-frequency words (such as
distracters) and offensive words were encoded in the same
way, distracters would never show up when the user blocked
offensive words (the default setting, as well as the setting
for regression tests).
When b/11031090 was fixed and a separate encoding was used
for offensive words, 0-frequency words would no longer be
blocked when they were an "exact match" (where case
mismatches and accent mismatches would be considered an
"exact match"). The exact match boosting functionality meant
that, for example, when the user typed "mt" they would be
suggested the word "Mt", although they most probably meant
to type "my".
For this reason, we introduced this change, which does the
following:
* Defines the "perfect match" as a really exact match, with
no room for case or accent mismatches
* When the target word has probability zero (as "Mt" does,
because it is a distracter), ONLY boost its score if it is a
perfect match.
By doing this, when the user types "mt", the word "Mt" will
NOT be boosted, and they will get "my". However, if the user
makes an explicit effort to type "Mt", we do boost the word
"Mt" so that the user's input is not autocorrected to "My".
Bug: 11031090
Change-Id: I92ee1b4e742645d52e2f7f8c4390920481e8fff0
2014-10-31 15:58:50 +09:00
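The boosting rule described in the commit above can be illustrated with a minimal sketch. This is not the actual native implementation: the function names (`is_exact_match`, `is_perfect_match`, `should_boost`) and the accent-stripping approach are assumptions made for illustration only.

```python
import unicodedata


def strip_accents(text):
    # Hypothetical helper: decompose characters and drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_exact_match(typed, candidate):
    # Loose "exact match": case and accent mismatches are tolerated.
    return strip_accents(typed).lower() == strip_accents(candidate).lower()


def is_perfect_match(typed, candidate):
    # "Perfect match": identical strings, no room for case or accent mismatches.
    return typed == candidate


def should_boost(typed, candidate, probability):
    # Zero-probability words (e.g. distracters such as "Mt") are boosted
    # ONLY on a perfect match; other words keep the loose exact-match rule.
    if probability == 0:
        return is_perfect_match(typed, candidate)
    return is_exact_match(typed, candidate)


# Typing "mt" does not boost the distracter "Mt", so "my" can win.
print(should_boost("mt", "Mt", 0))
# Typing "Mt" explicitly does boost "Mt", so it is not autocorrected to "My".
print(should_boost("Mt", "Mt", 0))
```

Under these assumptions, the stricter check applies only when the candidate has probability zero, so ordinary words still benefit from case-insensitive and accent-insensitive boosting.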
Adrian Velicu
10416241f7
Block offensive words in multi-word suggestions
...
If the user has chosen to block offensive words and types
"aaaxbb", where "aaa" is an offensive word and "bb" is not,
we should not suggest "aaa bb".
Bug: 11031090
Change-Id: Ie23b8dd5d347bc26b1c046c3f5e8dfbc259bf528
2014-10-31 15:58:50 +09:00
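As a rough sketch of the behaviour described in the commit above, a multi-word suggestion can be dropped when any of its component words is flagged as offensive and the user has chosen to block such words. The function name, the parameters, and the set-based lookup are hypothetical and do not reflect the actual dictionary code.

```python
def filter_multi_word_suggestions(suggestions, offensive_words, block_offensive):
    # Hypothetical filter: each suggestion is a space-separated string.
    if not block_offensive:
        return list(suggestions)
    # Drop a suggestion if ANY of its words is in the offensive set.
    return [
        s for s in suggestions
        if not any(word in offensive_words for word in s.split())
    ]


# "aaa" stands in for an offensive word, as in the commit message.
candidates = ["aaa bb", "aab bb"]
print(filter_multi_word_suggestions(candidates, {"aaa"}, block_offensive=True))
```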
Adrian Velicu
aa20342d7e
Merge "Using "blacklist" flag as "possibly offensive""
2014-10-31 06:49:29 +00:00
Adrian Velicu
7c87859d4c
Using "blacklist" flag as "possibly offensive"
...
Bug: 11031090
Change-Id: I5cc0d006ab003656498eb82b0875eb9c051d331e
2014-10-31 14:33:05 +09:00
Keisuke Kuroyanagi
0cd1f222fd
Fix: native unit test build.
...
Change-Id: Id2bd4b60d6a4023815a630ebb3059a435b72c193
2014-10-31 12:50:45 +09:00
Keisuke Kuroyanagi
bcb52d73e2
Enable count based dynamic ngram language model for v403.
...
Bug: 14425059
Change-Id: Icc15e14cfd77d37cd75f75318fd0fa36f9ca7a5b
2014-10-30 23:38:19 +09:00
Keisuke Kuroyanagi
660b00477c
Add DynamicLanguageModelProbabilityUtils.
...
Bug: 14425059
Change-Id: Ia58ab3f0ead02798046d182a9464dcbd95f086bc
2014-10-30 21:33:57 +09:00
Keisuke Kuroyanagi
0a9c3f30b6
Add method to encode probability.
...
Bug: 14425059
Change-Id: I3e5d359ba5fa38f1669f0e98dfae792ff53efbf8
2014-10-30 12:42:35 +09:00
Keisuke Kuroyanagi
c2ba0ce411
Fix: TRT and ime-simulator build.
...
Change-Id: I1697a907562d1ed6aff2b001763d1594263ba0d3
2014-10-30 01:01:40 +09:00
Keisuke Kuroyanagi
afe67611c3
Merge "Add a class to have global counters for LanguageModelDictContent."
2014-10-29 12:18:12 +00:00
Keisuke Kuroyanagi
6b0561f9d2
Add a class to have global counters for LanguageModelDictContent.
...
Bug: 14425059
Change-Id: I08ec19903432356b6028853fd73b4eefce20218e
2014-10-29 21:05:41 +09:00
Keisuke Kuroyanagi
dabc12974c
Merge "Improve space substitution error correction."
2014-10-28 09:26:40 +00:00
Keisuke Kuroyanagi
8a809f3433
Improve space substitution error correction.
...
Bug: 17432052
[Category diff]
+1 262
-1 93
+2 2
-2 18
+3 18
-3 2
+4 111
-4 148
+5 295
-5 217
+6 51
-6 276
+7 139
-7 124
[Weighted category diff]
+1 276
-1 100
+2 4
-2 20
+3 20
-3 4
+4 118
-4 160
+5 309
-5 225
+6 52
-6 298
+7 163
-7 135
show diff for ./en_user_log_phones_2011_08.csv
+1 173
-1 28
+2 2
-2 17
+3 17
-3 2
+4 63
-4 82
+5 120
-5 51
+6 24
-6 220
+7 88
-7 87
Change-Id: I9d673acb0ff632828ae2e0ead56e76e3a20411c6
2014-10-28 17:11:14 +09:00
Keisuke Kuroyanagi
3844f74aff
Fix: deleted PtNode handling in v403.
...
Because of this bug, a word that had been deleted once could never
get into the personalized dictionaries again.
Change-Id: Ife4e3fe1ba0615b4135e6291d2151b0db7d3f940
2014-10-27 15:32:05 +09:00
Yohei Yukawa
69402dc992
Merge "Enable Address Sanitizer for native host test 2nd try"
2014-10-23 16:07:39 +00:00
Keisuke Kuroyanagi
e65973882d
Merge "Fix: Personalized dicts suggest invalid words with v403."
2014-10-23 10:33:09 +00:00
Keisuke Kuroyanagi
090c3819d7
Fix: Personalized dicts suggest invalid words with v403.
...
Bug: 14425059
Change-Id: I45ae00069dd3b7c461dd9a1f3558b96af0a1c975
2014-10-23 19:26:01 +09:00
Yohei Yukawa
5c4bec31d1
Enable Address Sanitizer for native host test 2nd try
...
This CL enables Address Sanitizer for native host test. Note that
the production build is not affected by this change. ASan is enabled
only in the static lib for test executables.
Change-Id: I2c8e99b8c55e611e86f74579f24a63ac949bb02d
2014-10-23 10:16:55 +00:00
Yohei Yukawa
2db1e56ff4
Merge "Stop building host native test in unbundled build"
2014-10-23 09:39:56 +00:00
Yohei Yukawa
ba35bb83a8
Stop building host native test in unbundled build
...
It turned out that building native code for host environment
is not supported in NDK build. Hence this CL makes the host
native test available only as a part of platform build to
avoid accidental build breakage in unbundled build.
BUG: 18095678
Change-Id: If608da166d5a478358e6890b8db526b4c2c0ab41
2014-10-23 18:31:06 +09:00
Keisuke Kuroyanagi
16cc3992d7
Use trigrams for personalization dict.
...
Bug: 14425059
Change-Id: I73cf6904e569d60996a3b079f16ea6df0cb90f02
2014-10-23 14:32:45 +09:00
Yohei Yukawa
9c0b3419da
Merge "Revert "Enable ASan (Address Sanitizer) for native host test""
2014-10-22 10:54:18 +00:00
Yohei Yukawa
b9dc32ffd5
Revert "Enable ASan (Address Sanitizer) for native host test"
...
This reverts commit af2673f17d
because of build failure in tapas build.
Change-Id: Ib02931116181c98b35ce938e42d2376225e9b255
2014-10-22 10:51:33 +00:00
Yohei Yukawa
0672e8554f
Merge "Enable ASan (Address Sanitizer) for native host test"
2014-10-22 10:13:08 +00:00
Yohei Yukawa
af2673f17d
Enable ASan (Address Sanitizer) for native host test
...
This CL enables Address Sanitizer for native host test. Note that
the production build is not affected by this change. ASan is enabled
only in the static lib for test executables.
Change-Id: Idbe1f2e4502dfce9b6fb0253d7ebda8d37fbf84e
2014-10-22 19:08:58 +09:00
Keisuke Kuroyanagi
b5ef884fbb
Support dumping ngram entries.
...
Bug: 14425059
Change-Id: Ib03a0c3d166ed6f1e60c67127b28006d55143b6b
2014-10-22 18:15:53 +09:00
Keisuke Kuroyanagi
c9865785f4
Support ngram entry migration.
...
Bug: 14425059
Change-Id: I98cb9fa303af2d93a0a3512e8732231c564e3c5d
2014-10-22 11:31:16 +09:00
Keisuke Kuroyanagi
0b8bb0c21b
Fix debug build.
...
Change-Id: Id94636714d04a8828718b87741c0ee62a14cb3b4
2014-10-21 20:20:11 +09:00
Keisuke Kuroyanagi
dfc82fa366
Merge changes I210acb81,Ie9508788
...
* changes:
Make NgramProperty have NgramContext.
Create .cpp file for NgramContext.
2014-10-21 10:28:25 +00:00
Keisuke Kuroyanagi
88bb28c132
Make NgramProperty have NgramContext.
...
Bug: 14425059
Change-Id: I210acb816b122857dbbe1ee4dd6a35c5335bf2bf
2014-10-21 17:12:32 +09:00
Keisuke Kuroyanagi
f87bb77a91
Create .cpp file for NgramContext.
...
Bug: 14425059
Change-Id: Ie950878817b9c80cc9c970e1a84880c9b9ab228a
2014-10-21 17:04:56 +09:00
Keisuke Kuroyanagi
fa1e65cb3a
Merge "Use EntryCounters during GC."
2014-10-21 07:55:04 +00:00
Adrian Velicu
c51b9b5b3f
Merge "Renaming "blacklist" flag to "possibly offensive""
2014-10-21 07:39:18 +00:00
Keisuke Kuroyanagi
47fc656cd7
Use EntryCounters during GC.
...
Bug: 14425059
Change-Id: I61eb798686dc753fb6c0fe99a0719c1732198f30
2014-10-21 16:36:03 +09:00
Keisuke Kuroyanagi
e8750d970e
Introduce EntryCounters to count entries in a dictionary.
...
Bug: 14425059
Change-Id: Ic13ba827d96fa4a147485ba92fdb37e23e04e8e8
2014-10-21 15:46:14 +09:00
Adrian Velicu
05172bf1a5
Renaming "blacklist" flag to "possibly offensive"
...
No behaviour changes.
Unified the overloaded FusionDictionary::add method to always take an
isPossiblyOffensive argument.
Bug: 11031090
Change-Id: I5741a023ca1ce842d2cf10d4f6c926b0efabaa78
2014-10-21 11:51:47 +09:00
Keisuke Kuroyanagi
1085fef8d0
Change entry count limit.
...
Unigram 10K, Bigram 30K, Trigram 30K.
Change-Id: Ibd19c6a2b618499df1c70000bad7b47498187f0a
2014-10-20 15:01:49 +09:00
Keisuke Kuroyanagi
f4928ad4dd
Merge "Update useless n-gram entry detection logic during GC."
2014-10-15 21:44:45 +00:00
Keisuke Kuroyanagi
3601c214f8
Update useless n-gram entry detection logic during GC.
...
Bug: 14425059
Change-Id: Ib939deae5b60167751dee07965bb1ef1a43c4625
2014-10-15 20:43:27 +09:00
Keisuke Kuroyanagi
183e21c36c
Merge "Use better conditional probability for ngram entries."
2014-10-15 09:27:21 +00:00
Keisuke Kuroyanagi
72d17d9209
Use better conditional probability for ngram entries.
...
Old:
P(W | W_prev) = f(W, W_prev) + C
New:
P(W | W_prev) = f(W, W_prev) / f(W_prev)
Bug: 14425059
Bug: 16547409
Change-Id: I4d13be6de2c6bad6bad7fb22320a23ba4ecd361c
2014-10-15 18:23:00 +09:00
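The new formula in the commit above is a count-based relative frequency: the number of times W followed W_prev, divided by the number of times W_prev occurred. A minimal sketch, assuming counts are held in plain dictionaries (the function name, the data layout, and the zero-count fallback are illustrative assumptions, not the dictionary's actual storage format or behaviour):

```python
def conditional_probability(bigram_counts, unigram_counts, prev_word, word):
    # P(W | W_prev) = f(W, W_prev) / f(W_prev)
    prev_count = unigram_counts.get(prev_word, 0)
    if prev_count == 0:
        # Assumption: with no observations of W_prev, report probability 0.
        return 0.0
    return bigram_counts.get((prev_word, word), 0) / prev_count


# Made-up counts for illustration only.
unigram_counts = {"new": 4}
bigram_counts = {("new", "york"): 3, ("new", "car"): 1}

print(conditional_probability(bigram_counts, unigram_counts, "new", "york"))  # 0.75
```

Unlike the old additive form, this value always lies between 0 and 1 as long as the bigram count never exceeds the count of its preceding word.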
Keisuke Kuroyanagi
c2429c54ac
Merge "Move entry updating method to language model dict content."
2014-10-15 04:51:04 +00:00
Keisuke Kuroyanagi
5400701908
Move entry updating method to language model dict content.
...
Bug: 14425059
Change-Id: I710055490d141539458cbf968adf5a7ccffd9552
2014-10-15 12:29:31 +09:00
Keisuke Kuroyanagi
d8ccb9093b
Quit using weightChildNode for ADDITIONAL_PROXIMITY and SUBSTITUTION.
...
[Category diff]
+1 0
-1 1
+2 0
-2 0
+3 0
-3 0
+4 1
-4 1
+5 8
-5 7
+6 0
-6 1
+7 1
-7 0
[Weighted category diff]
+1 0
-1 1
+2 0
-2 0
+3 0
-3 0
+4 1
-4 1
+5 8
-5 7
+6 0
-6 1
+7 1
-7 0
Bug: 13756409
Change-Id: I6ac3567545676bbefbee3e87dda54bc083c15fb6
2014-10-14 20:20:55 +09:00
Keisuke Kuroyanagi
d1471ee053
Merge "Remove shouldBlockAutoCorrectionBySafetyNet"
2014-10-14 10:52:32 +00:00
Keisuke Kuroyanagi
29b4f7aa67
Remove shouldBlockAutoCorrectionBySafetyNet
...
Bug: 13756409
[Category diff]
+1 27
-1 0
+2 0
-2 0
+3 0
-3 1
+4 11
-4 0
+5 51
-5 0
+6 0
-6 38
+7 0
-7 50
[Weighted category diff]
+1 28
-1 0
+2 0
-2 0
+3 0
-3 1
+4 11
-4 0
+5 51
-5 0
+6 0
-6 39
+7 0
-7 50
show diff for ./en_user_log_phones_2011_08.csv
+1 4
+4 5
+5 7
-6 9
-7 7
The increase in false positives comes from the spaceless typing
test cases, which are synthetic data.
Change-Id: I4ea77aa56ebfaa5518c71107169e1d2332de6327
2014-10-14 11:20:33 +09:00