
1384 Commits (79273b04772824a8c547e1a8d33900040a03264b)

Author SHA1 Message Date
Keisuke Kuroyanagi 79273b0477 Define arguments for commands in dicttoolkit.
Bug: 10059681
Change-Id: I1ceaeeaa9e2055c357fe969818498de9d6288862
2014-11-15 09:58:19 +09:00
Keisuke Kuroyanagi 52582a22d1 Merge "Add OffdeviceIntermediateDictHeader." 2014-11-13 01:59:10 +00:00
Keisuke Kuroyanagi 99754e2d3e Add OffdeviceIntermediateDictHeader.
Used to have header information in OffdeviceIntermediateDict.

Bug: 10059681

Change-Id: I966c26e514ddd229cf5597d3b96941234c530863
2014-11-13 01:57:42 +00:00
Keisuke Kuroyanagi bae0fff04a Merge "Utf8Utils for dicttoolkit." 2014-11-13 01:56:56 +00:00
Keisuke Kuroyanagi f0c303dd02 Utf8Utils for dicttoolkit.
Bug: 10059681
Change-Id: Ie484ba8096823792f0ac663524d1c02d1be070e9
2014-11-13 10:47:37 +09:00
Keisuke Kuroyanagi da99cfc29d Merge "Introduce OffdeviceIntermediateDict for dicttoolkit." 2014-11-11 21:21:04 +00:00
Keisuke Kuroyanagi cd10540973 Introduce OffdeviceIntermediateDict for dicttoolkit.
Bug: 10059681
Change-Id: Ib6e9019502b59dd959c04c8f4996ca932c2b1ba8
2014-11-12 04:08:25 +09:00
Keisuke Kuroyanagi 580420d21b Implement IntArrayView::split for dicttoolkit.
Bug: 10059681
Change-Id: Ic29e79d049bb532727cf5cb1e529fec5d35156ed
2014-11-11 15:06:48 +09:00
Keisuke Kuroyanagi 0c1822df5b Merge "Implement help command for dicttoolkit." 2014-11-10 18:53:19 +00:00
Keisuke Kuroyanagi b23f03488f Merge "Use reference instead of pointer for WordProperty()." 2014-11-10 18:32:24 +00:00
Keisuke Kuroyanagi 7d5420aa5e Make profiler use getTimeInMicroSec().
Bug: 17797064
Change-Id: Ie992c9454edfc3bf93d5ea367c3a4427b513a205
2014-11-11 01:38:49 +09:00
Keisuke Kuroyanagi 395f6e7020 Implement help command for dicttoolkit.
Bug: 10059681
Change-Id: I0cadf1f80103136cdac5c00b6fca4d81b4bf7384
2014-11-11 00:18:25 +09:00
Keisuke Kuroyanagi bbf0d4141b Use reference instead of pointer for WordProperty().
Change-Id: Idf03e97661d64186c752e35964d641a5528be5b1
2014-11-10 09:15:11 +09:00
Keisuke Kuroyanagi bd48963bdf Add CommandExecutor for dicttoolkit.
Bug: 10059681
Change-Id: I90334caaf37c84ce7d1b93d12efbfb5f244a9420
2014-11-09 06:22:28 +09:00
Keisuke Kuroyanagi 4bfa3b273e Introduce CommandUtils for dicttoolkit
Bug: 10059681
Change-Id: Ic6947e76d77dc87bf88dc3a2b749e41fae7553b7
2014-11-08 09:58:26 +09:00
Keisuke Kuroyanagi 2cf5550749 Fix: BoS prediction after inputting just once.
Change-Id: Ib69569ab6b6edfcc8c1d2c621b95de4127789ab6
2014-11-01 17:58:22 +09:00
Keisuke Kuroyanagi b3bae2e89b Merge "Update v4 format version from 402 to 403." 2014-10-31 14:19:44 +00:00
Keisuke Kuroyanagi ef931546a0 Merge "Add hacks for better handling count value during migration." 2014-10-31 13:53:57 +00:00
Keisuke Kuroyanagi a88c9682fc Merge "Change v403 historical info format." 2014-10-31 13:38:38 +00:00
Keisuke Kuroyanagi 3cde19ded1 Merge "Initial commit for native dicttoolkit." 2014-10-31 11:29:20 +00:00
Keisuke Kuroyanagi e101a53ffc Initial commit for native dicttoolkit.
Bug: 10059681

Change-Id: Ib730af8ebc944e08aaada869c0626724a499747c
2014-10-31 20:27:06 +09:00
Keisuke Kuroyanagi ea468cc9de Update v4 format version from 402 to 403.
Without personalization:
Total words: 1134774, Success Num: 899230, Success Percentage: 79.243%
Bad Failures, with auto-correction (typed word == expected word, output word != expected word): 1871, Bad Failure Percentage: 0.165%
Failures, with auto-correction (F-C): 29084, F-C Percentage: 2.563%
Max Keystrokes: 6072959, Min Keystrokes: 4436090, Keystroke Saving Percentage: 26.953%

Before:
Total words: 1134646, Success Num: 925194, Success Percentage: 81.540%
Bad Failures, with auto-correction (typed word == expected word, output word != expected word): 1316, Bad Failure Percentage: 0.116%
Failures, with auto-correction (F-C): 28288, F-C Percentage: 2.493%
Max Keystrokes: 6072831, Min Keystrokes: 3946188, Keystroke Saving Percentage: 35.019%

After:
Total words: 1134659, Success Num: 944746, Success Percentage: 83.263%
Bad Failures, with auto-correction (typed word == expected word, output word != expected word): 1258, Bad Failure Percentage: 0.111%
Failures, with auto-correction (F-C): 28016, F-C Percentage: 2.469%
Max Keystrokes: 6072844, Min Keystrokes: 3387333, Keystroke Saving Percentage: 44.222%

Change-Id: I3af42ec37a11847c0429c28616e726f6a339247f
2014-10-31 17:23:39 +09:00
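The percentages in the evaluation above can be reproduced from the raw counts, which suggests that Success Percentage is Success Num / Total words and Keystroke Saving Percentage is (Max Keystrokes - Min Keystrokes) / Max Keystrokes. The commit message does not define these metrics; the sketch below is an inference checked against the reported numbers, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (inferred, not from the source tree):
// share of words whose output matched the expected word, in percent.
double successPercentage(std::int64_t successNum, std::int64_t totalWords) {
    assert(totalWords > 0);
    return 100.0 * static_cast<double>(successNum) / static_cast<double>(totalWords);
}

// Hypothetical helper (inferred, not from the source tree):
// share of keystrokes saved relative to the maximum keystroke count, in percent.
double keystrokeSavingPercentage(std::int64_t maxKeystrokes, std::int64_t minKeystrokes) {
    assert(maxKeystrokes > 0);
    return 100.0 * static_cast<double>(maxKeystrokes - minKeystrokes)
           / static_cast<double>(maxKeystrokes);
}
```

For the "After" run, keystrokeSavingPercentage(6072844, 3387333) evaluates to about 44.2216, which rounds to the reported 44.222%.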
Keisuke Kuroyanagi c611989929 Add hacks for better handling count value during migration.
Bug: 14425059
Change-Id: Ib050574aa7c4babd4285322a11c3af9be9fbab1e
2014-10-31 17:22:13 +09:00
Keisuke Kuroyanagi 2383575d2d Change v403 historical info format.
count -> 2B, level -> 0B.

Change-Id: I3b241126f56eb33cdf09cb1ebfed04f534e4ec48
2014-10-31 17:22:13 +09:00
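The message above only states that the count field of the v403 historical info takes 2 bytes and the level field takes none. As an illustration of what a 2-byte count field implies, here is a minimal sketch assuming big-endian byte order and saturation at 0xFFFF; both assumptions, the function names, and the omission of every other field in the record are hypothetical and not described in this log.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical: store a count in 2 bytes (big-endian), clamping values that do not fit.
std::array<std::uint8_t, 2> encodeCount(std::uint32_t count) {
    const std::uint16_t clamped =
            static_cast<std::uint16_t>(count > 0xFFFFu ? 0xFFFFu : count);
    return {static_cast<std::uint8_t>(clamped >> 8),
            static_cast<std::uint8_t>(clamped & 0xFFu)};
}

// Hypothetical: read the 2-byte count back.
std::uint16_t decodeCount(const std::array<std::uint8_t, 2>& bytes) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

With this layout a count of 300 is stored as 0x01 0x2C, while any count above 65535 is stored as 0xFF 0xFF.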
Adrian Velicu 009e02ce4a Further fixes to treat 0-frequency words
Previously, when both legitimate 0-frequency words (such as
distracters) and offensive words were encoded in the same
way, distracters would never show up when the user blocked
offensive words (the default setting, as well as the setting
for regression tests).

When b/11031090 was fixed and a separate encoding was used
for offensive words, 0-frequency words would no longer be
blocked when they were an "exact match" (where case
mismatches and accent mismatches would be considered an
"exact match"). The exact match boosting functionality meant
that, for example, when the user typed "mt" they would be
suggested the word "Mt", although they most probably meant
to type "my".

For this reason, we introduced this change, which does the
following:
* Defines the "perfect match" as a really exact match, with
no room for case or accent mismatches
* When the target word has probability zero (as "Mt" does,
because it is a distracter), ONLY boost its score if it is a
perfect match.

By doing this, when the user types "mt", the word "Mt" will
NOT be boosted, and they will get "my". However, if the user
makes an explicit effort to type "Mt", we do boost the word
"Mt" so that the user's input is not autocorrected to "My".

Bug: 11031090
Change-Id: I92ee1b4e742645d52e2f7f8c4390920481e8fff0
2014-10-31 15:58:50 +09:00
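The boosting rule described in the message above can be sketched as follows: a zero-probability target is boosted only on a perfect (identical) match, while other targets keep the looser exact-match boost. This is a minimal illustration, not the actual implementation; the function names are hypothetical, and the exact match here only ignores ASCII case, whereas per the commit message the real exact match also tolerates accent mismatches.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical "exact match": equal ignoring ASCII case (accent folding omitted).
bool isExactMatch(const std::string& typed, const std::string& target) {
    if (typed.size() != target.size()) return false;
    for (std::size_t i = 0; i < typed.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(typed[i])) !=
            std::tolower(static_cast<unsigned char>(target[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical "perfect match": no case or accent differences at all.
bool isPerfectMatch(const std::string& typed, const std::string& target) {
    return typed == target;
}

// Zero-probability targets (e.g. distracters such as "Mt") need a perfect match.
bool shouldBoost(const std::string& typed, const std::string& target, int probability) {
    if (probability == 0) {
        return isPerfectMatch(typed, target);
    }
    return isExactMatch(typed, target);
}
```

With the distracter "Mt" at probability zero, typing "mt" is not boosted toward "Mt", but typing "Mt" is.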
Adrian Velicu 10416241f7 Block offensive words in multi-word suggestions
If the user has chosen to block offensive words and types
"aaaxbb", where "aaa" is an offensive word and "bb" is not,
we should not suggest "aaa bb".

Bug: 11031090
Change-Id: Ie23b8dd5d347bc26b1c046c3f5e8dfbc259bf528
2014-10-31 15:58:50 +09:00
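The multi-word rule above amounts to rejecting a candidate if any of its words is offensive while blocking is enabled. A minimal sketch, assuming suggestions are space-separated and offensive words are looked up in a plain set; the function name and the set-based lookup are hypothetical, not the dictionary's actual "possibly offensive" flag handling.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical check: reject a multi-word suggestion if blocking is on
// and any of its space-separated words is in the offensive set.
bool isSuggestionAllowed(const std::string& candidate,
                         const std::set<std::string>& offensiveWords,
                         bool blockOffensiveWords) {
    if (!blockOffensiveWords) return true;
    std::istringstream words(candidate);
    std::string word;
    while (words >> word) {
        if (offensiveWords.count(word) > 0) return false;
    }
    return true;
}
```

Following the example in the commit message, "aaa bb" is rejected when "aaa" is offensive and blocking is on, and allowed otherwise.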
Adrian Velicu aa20342d7e Merge "Using "blacklist" flag as "possibly offensive"" 2014-10-31 06:49:29 +00:00
Adrian Velicu 7c87859d4c Using "blacklist" flag as "possibly offensive"
Bug: 11031090
Change-Id: I5cc0d006ab003656498eb82b0875eb9c051d331e
2014-10-31 14:33:05 +09:00
Keisuke Kuroyanagi 0cd1f222fd Fix: native unit test build.
Change-Id: Id2bd4b60d6a4023815a630ebb3059a435b72c193
2014-10-31 12:50:45 +09:00
Keisuke Kuroyanagi bcb52d73e2 Enable count based dynamic ngram language model for v403.
Bug: 14425059

Change-Id: Icc15e14cfd77d37cd75f75318fd0fa36f9ca7a5b
2014-10-30 23:38:19 +09:00
Keisuke Kuroyanagi 660b00477c Add DynamicLanguageModelProbabilityUtils.
Bug: 14425059
Change-Id: Ia58ab3f0ead02798046d182a9464dcbd95f086bc
2014-10-30 21:33:57 +09:00
Keisuke Kuroyanagi 0a9c3f30b6 Add method to encode probability.
Bug: 14425059
Change-Id: I3e5d359ba5fa38f1669f0e98dfae792ff53efbf8
2014-10-30 12:42:35 +09:00
Keisuke Kuroyanagi c2ba0ce411 Fix: TRT and ime-simulator build.
Change-Id: I1697a907562d1ed6aff2b001763d1594263ba0d3
2014-10-30 01:01:40 +09:00
Keisuke Kuroyanagi afe67611c3 Merge "Add a class to have global counters for LanguageModelDictContent." 2014-10-29 12:18:12 +00:00
Keisuke Kuroyanagi 6b0561f9d2 Add a class to have global counters for LanguageModelDictContent.
Bug: 14425059
Change-Id: I08ec19903432356b6028853fd73b4eefce20218e
2014-10-29 21:05:41 +09:00
Keisuke Kuroyanagi dabc12974c Merge "Improve space substitution error correction." 2014-10-28 09:26:40 +00:00
Keisuke Kuroyanagi 8a809f3433 Improve space substitution error correction.
Bug: 17432052

[Category diff]
+1     262
-1      93
+2       2
-2      18
+3      18
-3       2
+4     111
-4     148
+5     295
-5     217
+6      51
-6     276
+7     139
-7     124

[Weighted category diff]
+1     276
-1     100
+2       4
-2      20
+3      20
-3       4
+4     118
-4     160
+5     309
-5     225
+6      52
-6     298
+7     163
-7     135

show diff for ./en_user_log_phones_2011_08.csv
+1     173
-1      28
+2       2
-2      17
+3      17
-3       2
+4      63
-4      82
+5     120
-5      51
+6      24
-6     220
+7      88
-7      87

Change-Id: I9d673acb0ff632828ae2e0ead56e76e3a20411c6
2014-10-28 17:11:14 +09:00
Keisuke Kuroyanagi 3844f74aff Fix: deleted PtNode handling in v403.
Due to this bug, once a word was deleted, it could never get
into the personalized dictionaries again.

Change-Id: Ife4e3fe1ba0615b4135e6291d2151b0db7d3f940
2014-10-27 15:32:05 +09:00
Yohei Yukawa 69402dc992 Merge "Enable Address Sanitizer for native host test 2nd try" 2014-10-23 16:07:39 +00:00
Keisuke Kuroyanagi e65973882d Merge "Fix: Personalized dicts suggest invalid words with v403." 2014-10-23 10:33:09 +00:00
Keisuke Kuroyanagi 090c3819d7 Fix: Personalized dicts suggest invalid words with v403.
Bug: 14425059
Change-Id: I45ae00069dd3b7c461dd9a1f3558b96af0a1c975
2014-10-23 19:26:01 +09:00
Yohei Yukawa 5c4bec31d1 Enable Address Sanitizer for native host test 2nd try
This CL enables Address Sanitizer for native host test. Note that
production build is not affected with this change.  ASan is enabled
only in static lib for test executables.

Change-Id: I2c8e99b8c55e611e86f74579f24a63ac949bb02d
2014-10-23 10:16:55 +00:00
Yohei Yukawa 2db1e56ff4 Merge "Stop building host native test in unbundled build" 2014-10-23 09:39:56 +00:00
Yohei Yukawa ba35bb83a8 Stop building host native test in unbundled build
It turned out that building native code for host environment
is not supported in NDK build.  Hence this CL makes the host
native test available only as a part of platform build to
avoid accidental build breakage in unbundled build.

BUG: 18095678
Change-Id: If608da166d5a478358e6890b8db526b4c2c0ab41
2014-10-23 18:31:06 +09:00
Keisuke Kuroyanagi 16cc3992d7 Use trigrams for personalization dict.
Bug: 14425059
Change-Id: I73cf6904e569d60996a3b079f16ea6df0cb90f02
2014-10-23 14:32:45 +09:00
Yohei Yukawa 9c0b3419da Merge "Revert "Enable ASan (Address Sanitizer) for native host test"" 2014-10-22 10:54:18 +00:00
Yohei Yukawa b9dc32ffd5 Revert "Enable ASan (Address Sanitizer) for native host test"
This reverts commit af2673f17d
because of build failure in tapas build.

Change-Id: Ib02931116181c98b35ce938e42d2376225e9b255
2014-10-22 10:51:33 +00:00
Yohei Yukawa 0672e8554f Merge "Enable ASan (Address Sanitizer) for native host test" 2014-10-22 10:13:08 +00:00
Yohei Yukawa af2673f17d Enable ASan (Address Sanitizer) for native host test
This CL enables Address Sanitizer for native host test. Note that
production build is not affected with this change. ASan is enabled
only in static lib for test executables.

Change-Id: Idbe1f2e4502dfce9b6fb0253d7ebda8d37fbf84e
2014-10-22 19:08:58 +09:00
Keisuke Kuroyanagi b5ef884fbb Support dumping ngram entries.
Bug: 14425059
Change-Id: Ib03a0c3d166ed6f1e60c67127b28006d55143b6b
2014-10-22 18:15:53 +09:00