satok
985312e88f
Refactor the correction algorithm related to missing character correction
...
Change-Id: If68f2aaea7df48d013aea5401cee4ec0df32111a
2011-08-09 12:53:12 +09:00
satok
8876b75ca1
Move scoring part to the correction state
...
Change-Id: I2dc4a0869636fce5526f48b3a6267b6bdf61dbfb
2011-08-05 17:24:56 +09:00
satok
f071e75b78
Change the prune condition
...
Change-Id: I92aef12e0e1d89cfe1b346ddc6ef4df158ffe0b3
2011-08-04 18:32:37 +09:00
satok
4e4e74e6b6
Move the input index and output index to correction state
...
Change-Id: Idebdb59143f3367929df6a0475cefe941eb16d01
2011-08-04 14:16:14 +09:00
satok
0f6c8e8aeb
Move code related to ranking algorithm to correction_state.cpp
...
Change-Id: I52b34de45969fef82e46d9c10079c2d45e0b94eb
2011-08-03 20:34:19 +09:00
Jean Chalard
588e2f2964
Add bigram lookup implementation.
...
Bug: 5046459
Change-Id: Id2c7686c5da078751ed587e559417e808779aa7a
2011-08-02 18:05:59 +09:00
satok
612c6e49c0
Move code related to ranking algorithm to the correction state
...
Change-Id: I2d9e2db81cf6597ca4e88d7bc6737ab3b52b34b2
2011-08-02 15:44:59 +09:00
satok
db2c0919cf
Remove old dictionary format code
...
Change-Id: Ic4b9e069c9bd5c088769519f44d0a9ea45acb833
2011-08-01 16:01:54 +09:00
satok
2df3060883
Add correction state
...
Change-Id: I0d281cede1590893bd1def005cf83c9431d12750
2011-08-01 15:42:09 +09:00
Jean Chalard
6a0e9642a8
Small native refactoring.
...
Move a purely dictionary-format-related function that is needed
both by unigrams and bigrams to the binary format handling
file.
Also remove the empty UnigramDictionary::getBigrams placeholder
function, on grounds that it should be in the BigramDictionary
class.
Bug: 5046459
Change-Id: I8a67a25f72122e2fa0b19ae1d936db25eb0b20ba
2011-07-26 16:13:53 +09:00
Jean Chalard
848b69a5f9
Some refactoring
...
Getting the frequency of a terminal is not very useful; however,
getting its position will be very useful for retrieving bigrams
later.
Moreover, from the position it's easy to find out the frequency.
Bug: 5046459
Change-Id: Ica53472c2038c7e407dbd1399d336511c731087f
2011-07-26 15:44:51 +09:00
Jean Chalard
999ba61b34
Some native cleanup
...
Take a function that does not need to be a member and make it
static inline.
Also replace the return value of -1 by a #define'd constant.
Change-Id: I92e0deaa1df65998b76aba6329a4c8eb4d287485
2011-07-22 18:09:48 +09:00
Jean Chalard
f0a9809662
Check the binary dictionary magic number
...
...and return NULL if it does not match an expected value.
Bug: 5052486
Change-Id: I1dc7955d2785ee080bc5c22398be9befe332f096
2011-07-20 19:43:14 +09:00
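The magic-number check described in f0a9809662 could look roughly like the sketch below. It is illustrative only: the function name, the 16-bit width of the magic number and its big-endian byte order are assumptions, and the expected value is passed in as a parameter because the commit message does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; not the LatinIME code. The width of the magic number
// (16 bits), its byte order (big-endian) and the function name are assumptions.
// Returns dict when the leading bytes match expectedMagic, otherwise nullptr.
inline const std::uint8_t *checkDictMagic(const std::uint8_t *dict, std::size_t size,
                                          std::uint16_t expectedMagic) {
    assert(dict != nullptr);       // precondition: the caller passes a loaded buffer
    if (size < 2) return nullptr;  // too short to hold a magic number
    const std::uint16_t magic = static_cast<std::uint16_t>((dict[0] << 8) | dict[1]);
    return magic == expectedMagic ? dict : nullptr;
}
```

Returning a null pointer on mismatch lets the caller treat an unrecognized file the same way as a failed load.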
satok
d24df43eaf
(Step 2) Move functions related to proximity to proximity_info.cpp
...
Change-Id: Iae0eb2a5cd758bda820fa42b4bc3eb3d2665bf96
2011-07-14 15:47:32 +09:00
satok
46f2d44a29
Merge "(Step 1) Move proximity related parameters from unigram_dictionary to proximity_info"
2011-07-13 21:30:30 -07:00
satok
1d7eaf8462
(Step 1) Move proximity related parameters from unigram_dictionary to proximity_info
...
Change-Id: Ic630b35f4abffeb84c38bcf5935795b7ff07556a
2011-07-14 13:21:34 +09:00
satok
827ced8486
Separate logging definitions in C
...
Change-Id: I1d79814d1fd74e92a280f355c535517618c51752
2011-07-14 09:01:09 +09:00
satok
787945bf1e
Fix build for profiling
...
Change-Id: I39cd0fa37fb738dcbbcf82839b6bb030e3af606b
2011-07-14 08:32:57 +09:00
satok
3e41c071e6
Merge "Add a flag for profiling"
2011-07-12 23:27:51 -07:00
satok
20d9fdae3a
Add a flag for profiling
...
Change-Id: Iae509a24fd0f0f416376c3f8051aa2eb92d48659
2011-07-13 15:21:10 +09:00
Jean Chalard
0adf7ae299
Merge "New dict format, step 7"
2011-07-12 22:48:45 -07:00
Jean Chalard
1059f27364
New dict format, step 7
...
This actually implements the new dictionary format, but does not
activate the implementation through #defines.
Bug: 4392433
Change-Id: I9b26b9bcb4b823a36e0984799b69730acfc6f7f3
2011-07-13 14:33:48 +09:00
Doug Kwan
ce9efbff53
Compile code used in logging conditionally so that gcc does not complain
...
about unused-but-set variables.
Change-Id: I141f438694a1854d54d08cb5a74c23222dd9d85e
2011-07-08 00:29:11 -07:00
Jean Chalard
bb15e77511
Move a function to make next commit more readable
...
Change-Id: Ieaa935ff4d68ce88137dcc5c672a4149a4c9c64f
2011-06-30 20:14:38 +09:00
Jean Chalard
e6715e32d5
Move a function out of a #endif to reduce a future commit
...
Change-Id: Ic8f3160a96b6d79ba19ff9c8eda1692e94a38e98
2011-06-30 19:47:25 +09:00
Jean Chalard
0584f02ee1
Rename parameters for future change
...
Change-Id: Id15a17340fb26f91c72687f30bef24b2d8b94940
2011-06-30 19:23:16 +09:00
Jean Chalard
432789ac93
Internal cleanup
...
Moving functions around, renaming parameters
Change-Id: I3ab480f483d7d9700b9328cb07b16b51005098e5
2011-06-30 17:50:48 +09:00
Jean Chalard
ffefdb6c1a
Cleanup.
...
Function renaming, moving around for future patch readability
Change-Id: Id33b961cf2e899b5a3c9189951d2199aba801666
2011-06-30 17:22:19 +09:00
Jean Chalard
980d6b6fef
Internal cleanup.
...
Function renaming, useless function removal, comment fixes
Change-Id: I148acbaf367cd556a85b89016676b46cc971af81
2011-06-30 17:02:23 +09:00
Jean Chalard
594a9a1963
Internal cleanup.
...
Removed unused function prototypes.
Change-Id: Ia56ea8e285deed17ce8377df855b045b7850d58d
2011-06-30 16:51:17 +09:00
Jean Chalard
85a1d1ea74
New dict format, step 6
...
Copy the modified functions to be able to see the diff
Bug: 4392433
Change-Id: Ic9b83b4b4b7b89cc922eed1825507d7d516aff24
2011-06-21 22:24:54 +09:00
Jean Chalard
bc90c72faf
New dict format, step 5
...
Move functions that will be modified and enclose those that will
be replaced into #ifdefs.
This change does not modify any code; it only moves some code around.
Bug: 4392433
Change-Id: Ibefbda1eb8bdc8a0c72de47ad9c67a08d0aca960
2011-06-21 12:15:00 +09:00
Ken Wakasa
ce9e52a12a
Clean up in LatinIME native code
...
Change-Id: I0062200a0181a491690115ac0fab8d11358e2f14
2011-06-18 23:52:09 +09:00
Jean Chalard
23eb0fa0b5
Merge "New dict format, step 4"
2011-06-17 05:30:26 -07:00
Jean Chalard
ca5ef2890e
New dict format, step 4
...
Consolidate terminal cases, streamline the word adding process
and create the entry points for adding alternate spellings, with an
empty implementation for now.
Bug: 4392433
Change-Id: I781c93ec49945d71c7c20624c86596aa49add4c8
2011-06-17 20:59:21 +09:00
Jean Chalard
4fd9650f0b
New dict format, step 3 - followup
...
Make the passing of an argument clearer
Bug: 4392433
Change-Id: Id82662ff4dc25282f70a08bee77378fee2b4b590
2011-06-17 17:08:09 +09:00
Jean Chalard
581335c3fb
Fix a bug where bigram search would never return
...
Bug: 4690487
Change-Id: Ie8f3f651508cc48bbb043a0b308f7e0d1524371c
2011-06-17 12:45:17 +09:00
Jean Chalard
17e44a72e8
New dict format, step 3
...
Some refactoring and the addition of a parameter that will be necessary.
Bug: 4392433
Change-Id: I17f001a7efd4f69f4c35f94ee1ca8e97391b81d5
2011-06-16 23:28:09 +09:00
Jean Chalard
8124e64dcc
New dict format, step 2
...
Move some methods around and make some of them static
Bug: 4392433
Change-Id: I2bbe98aec118a416d21d1e293638e1d324505b9b
2011-06-16 22:33:41 +09:00
Jean Chalard
293ece0f34
New dict format, step 1
...
This renames some variables and removes dependencies on values that
will disappear
Bug: 4392433
Change-Id: I79a44462d6bf25248cc2de0d63d7918fc6925d68
2011-06-16 22:18:10 +09:00
Jean Chalard
e93b1f2209
Allow reading a binary dictionary even without proximity info.
...
This prepares the way for spell checking, which is to be done
without context so without proximity info.
Bug: 4176026
Change-Id: I1b4bfaefe2611e1b484acdf3c33598cb80f81ff4
2011-06-02 12:10:13 +09:00
satok
99c908a595
Tweak the demotion rate for the mistyped space correction
...
Bug: 4402942
Change-Id: I7f5412b9fd2f1506f529cff0c3399d748c6ece92
2011-05-24 14:31:06 +09:00
satok
bb68d80119
Tweak the demotion rate of mistyped space correction
...
Bug: 4402942
Change-Id: I6e0421dfa99e261c72a901c9699fec864ab4b3c5
2011-05-23 18:35:29 +09:00
satok
d8db9f86d0
Fix a bug in the frequency calculation for mistyped space correction
...
Bug: 4402942
Change-Id: I0b611e3d0e8c25ca528ef7408c3949200e5cad85
2011-05-18 18:36:54 +09:00
satok
0b6b0a5a98
Enable fast power
...
Change-Id: I00a91381f63cde62d9e7cf7e17f75869294cf2df
2011-04-27 16:29:27 +09:00
satok
b2e5e5937c
Handle overflow properly in multiplyRate
...
Bug: 3401513
Change-Id: I8dd2523caa58bb51c378a01e160a58f9106ce9b8
2011-04-26 22:03:26 +09:00
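One way to scale a frequency by a percentage without overflowing, as b2e5e5937c sets out to do in multiplyRate, is sketched below. This is not the actual implementation: the name multiplyRateSaturating, the value-returning signature and the choice to saturate at INT_MAX are all assumptions made for illustration.

```cpp
#include <cassert>
#include <climits>

// Hypothetical sketch, not the actual multiplyRate. Scales freq by rate percent
// and saturates at INT_MAX rather than overflowing.
inline int multiplyRateSaturating(int rate, int freq) {
    assert(rate >= 0 && freq >= 0);  // precondition assumed by this sketch
    if (rate == 0 || freq == 0) return 0;
    if (freq <= INT_MAX / rate) return freq * rate / 100;  // product fits in an int
    const int scaled = freq / 100;  // divide first to stay in range
    return scaled > INT_MAX / rate ? INT_MAX : scaled * rate;
}
```

Dividing before multiplying loses up to 99 units of precision, which only matters on the large-value path where the exact product would not fit anyway.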
satok
9674f654a7
Fix a bug where two-letter words were demoted.
...
Change-Id: I4a3558d0f1f1b0a9d6a36c3f75db3089b0566d7f
2011-04-20 17:15:27 +09:00
satok
63546344b3
Merge "Promote a word with a missing space because the formula was changed by Ifa4338c5f4"
2011-04-19 07:54:02 -07:00
satok
cbc66e0711
Promote a word with a missing space because the formula was changed by Ifa4338c5f4
...
Change-Id: Id4bc965aef387800facb64164d8c36a3bdd2fa07
2011-04-19 23:48:36 +09:00
satok
4c981d3a40
Demote a word with mistyped space and missing space according to the length of each word
...
Change-Id: Ifa4338c5f43b37e6bcd0700767ef2178189de3af
2011-04-19 23:14:27 +09:00
satok
a4374d2eb7
Promote the correction of words with a missing space character
...
Change-Id: I37ba618b54f7115163a3e9c6c555485e7024dc92
2011-04-18 12:36:11 +09:00
satok
9d2a3020ba
Promote a word with a proximity character
...
Bug: 4293295
Change-Id: Ib0ec8aff087c71c4fbe983f3f5bc78e9c7868fd8
2011-04-14 20:30:25 +09:00
satok
72bc17ec9f
Promote a word with only one proximity character.
...
Bug: 4271049
Change-Id: I755986f582f43417fda6b117207530c519233baf
2011-04-13 19:11:13 +09:00
satok
dc5301e590
Change the formula of the missing character.
...
- Bug: 4271049
- Based on the results of a recent user study, a word with a missing character needs to be promoted a bit,
so I changed the formula from:
- freq * 70 * (n - 2) / (n - 1)
to:
- freq * 90 * (10n - 12) / (10n - 2)
Change-Id: Ibff72cbdb0f2d7b91460a06a0fd39a9f5749aa46
2011-04-13 10:44:18 +09:00
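To see how much the change in dc5301e590 promotes short words, the two formulas can be compared side by side. The sketch below assumes that the constants 70 and 90 are percentages (hence the final division by 100) and that n is the length of the typed word; the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helpers; the names are illustrative and not taken from the sources.
// Assumption: 70 and 90 are percentages, so each result is divided by 100.

// Old formula: freq * 70 * (n - 2) / (n - 1)
inline long long oldMissingCharScore(long long freq, int n) {
    assert(n >= 2);  // n - 1 must be positive
    return freq * 70 * (n - 2) / (n - 1) / 100;
}

// New formula: freq * 90 * (10n - 12) / (10n - 2)
inline long long newMissingCharScore(long long freq, int n) {
    assert(n >= 2);  // keeps 10n - 12 positive
    return freq * 90 * (10 * n - 12) / (10 * n - 2) / 100;
}
```

Under these assumptions a frequency of 100 with n = 2 scores 0 with the old formula and 40 with the new one; with n = 5 the scores are 52 and 71.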
Ken Wakasa
de3070a71b
Add -Werror flag to catch more warnings and errors
...
Change-Id: I9c39ba24578931944aae8182918ed48a2e82eb39
2011-03-19 10:12:15 +09:00
satok
e07f93d3ab
Merge "Tweak the demotion rate for a word with missing letter" into honeycomb-mr1
2011-03-07 22:44:16 -08:00
satok
0bddb2f4d6
Tweak the demotion rate for a word with missing letter
...
Bug: 4027223
Change-Id: Ie9a5552d2f41d60f433573fde52efc097f5143bf
2011-03-07 19:44:52 -08:00
satok
1df8c82d71
Fix a bug where a word with only one missing character is not promoted
...
Bug: 4027223
Change-Id: Icf7c5b917c18b565dca95b98b96c1c8e2963f540
2011-03-07 18:01:09 -08:00
satok
3c4bb7747d
A bug fix for the mistyped space algorithm
...
Bug: 3311719
-- also fixed compiler warnings
Change-Id: I6941c0d02f10d67af88bc943748dde8d8783fabb
2011-03-04 23:25:48 -08:00
Jean Chalard
eaecb56f94
Merge "Demote skipped characters matched words with respect to length." into honeycomb-mr1
2011-03-04 22:43:16 -08:00
satok
817e517e46
Add the suggestion algorithm of words with space proximity
...
Bug: 3311719
Change-Id: Ide12a4a6280103c092fa0f563dd5b9e3f7f5c89b
2011-03-04 20:37:18 -08:00
Jean Chalard
07a8406bc1
Demote skipped characters matched words with respect to length.
...
Words that matched user input with skipped characters used to be demoted
in BinaryDictionary by a constant factor and not at all in those dictionaries
implemented in java code. To represent the fact that the impact of a skipped
character gets larger as the word is shorter, this change will implement a
demotion that gets larger as the typed word is shorter. The demotion rate
is (n - 2) / (n - 1) where n is the length of the typed word for n >= 2.
It implements it for both BinaryDictionary and java dictionaries.
Bug: 3340731
Change-Id: I3a18be80a9708981d56a950dc25fe08f018b5b89
2011-03-05 13:20:19 +09:00
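The length-dependent demotion from 07a8406bc1 can be written as a one-line integer computation. The helper below is a minimal sketch with a hypothetical name; it applies the (n - 2) / (n - 1) rate from the commit message and treats n >= 2 as a precondition, since the message defines no rate for shorter input.

```cpp
#include <cassert>

// Hypothetical helper, not the LatinIME code. Applies the (n - 2) / (n - 1)
// demotion, where typedLength is the length n of the typed word.
inline int demoteForSkippedChar(int freq, int typedLength) {
    assert(typedLength >= 2);  // the rate is only defined for n >= 2
    return freq * (typedLength - 2) / (typedLength - 1);
}
```

With a frequency of 100 this yields 0 for n = 2, 50 for n = 3 and 90 for n = 11, so the penalty shrinks as the typed word gets longer.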
Jean Chalard
a787dba83b
Fix a bug with umlaut processing.
...
Issue: 3275926
Change-Id: Ibcb00aaea3ff05ad59ad4e8e54dd3caab5ab9bca
2011-03-04 13:07:07 +09:00
Jean Chalard
c2bbc6a449
Use translation of fallback umlauts digraphs for German.
...
For German: handle "ae", "oe" and "ue" as alternate forms of the
umlaut-bearing versions of "a", "o" and "u".
Issue: 3275926
Change-Id: I056c707cdacc464ceab63be56c016c7f8439196c
2011-03-03 11:52:23 +09:00
satok
8fbd552292
Add proximity info to native
...
Bug: 3311719
Change-Id: Ie596304070e321ad23fb67a13bf05e2b6af1b54b
2011-02-23 23:04:00 +09:00
Jean Chalard
f5f834afcd
Rename variables with obscure names.
...
The `snr' variable has a very obscure name. Rename it to `matchWeight'.
Also, the `toLowerCase' function is error-prone, since it actually returns
a lower case version of the BASE char, that is without diacritics. Hence,
rename it to `toBaseLowerCase' and update variables with similar names.
Change-Id: Ibdbe73018a33ee864db59a51d664c3b104d5fb3f
2011-02-22 16:43:19 +09:00
Jean Chalard
a5d5849701
Force autocorrection of matching words with different accents.
...
When entering a word without accents the user expects the system to
add accents automatically if there is no other matching word. This
patch ensures the accented version is promoted accordingly and
autocorrection really takes place.
Issue: 3400015
Change-Id: I8cd3db5bf131ec6844b26abecc1ecbd1d6269df4
2011-02-22 15:27:06 +09:00
Tadashi G. Takaoka
887f11ee43
Remove next letters frequency handling
...
Bug: 3428942
Change-Id: Id62f467ce4e50c60a56d59bf96770e799a4659e2
2011-02-17 13:59:41 +09:00
Jean Chalard
8dc754a411
Promote full matches with differing accents.
...
Stop considering accented characters as different from their base
character for proximity scoring.
Also give a huge boost (basically overriding frequency) to a word
fully matched with only differing accents.
Bug: 2550587
Change-Id: I2da7a71229fb3868d9e4a53703ccf8caeb6fcf10
2011-01-27 17:29:24 +09:00
satok
fd16f1d2a3
Handle the last char correctly in excessive char correction algorithm.
...
bug: 3278422
Change-Id: I651d3cb0130ab9834ed9d7a97f41360c6eaa9de1
2011-01-27 16:44:54 +09:00
satok
58c49b9132
Fix auto-correction threshold and promote full matched words
...
Bug: 3374359
Bug: 3278422
"zbe" will be auto corrected to "be" by fixing s-line
"teh" will be auto corrected to "the" by promotion of full matched words
Change-Id: I314c632820e4e0b1501edeca60ada205d291451f
2011-01-27 12:53:13 +09:00
Ken Wakasa
e90b333017
Load main dic in native
...
Follow up to Id57dce51
bug: 3219819
Change-Id: I00e11ef21d0252ffa88c12dffb9c55b0f2e19a66
2011-01-07 19:51:45 +09:00
satok
f7425bb15b
Suppress overflow when multiplying demotion rate
...
Change-Id: I2003c5f88a5062b11e2f21522095bb94b1eb4efd
2011-01-05 16:43:17 +09:00
satok
61e2f85e3f
Add profiler for native dictionary code
...
Change-Id: I2569756c9ef4fa677ae52f2ccfcb90d2115d129f
2011-01-05 15:47:29 +09:00
satok
54fe9e0e20
Suggest words with excessive chars out of proximity chars
...
Bug: 3273807
Change-Id: Ib8f48e562bcf4c2aac0ad5cb46809fd5f539a322
2010-12-13 17:44:14 +09:00
satok
a3d78f606e
Suggest words with transposed chars
...
Bug: 3193883
Change-Id: I884b669258bfc522bc04e14f22a7646164a4cac5
2010-12-10 18:34:23 +09:00
satok
e07baa6fab
Limit the suggestions with an excessive character by filtering proximity characters
...
Change-Id: Iad26dad545f1a431aa0fa53f99198b27defd03a3
Bug: 3269482
2010-12-10 00:47:37 +09:00
satok
aee09dc5fa
Fix a bug where words with a missing space cannot be suggested if one of the words starts with a capital letter.
...
Bug: 3268825
Change-Id: I0634a243ad1e45dd096b30824b463c366a2e7f0f
2010-12-09 21:41:26 +09:00
satok
662fe69ba2
Suggest words with missing space
...
Bug: 3193883
Change-Id: I8d25f3e1d4db10be733d85edfa4f55a094feef80
2010-12-09 14:26:27 +09:00
satok
cdbbea735f
Suggest excessive characters
...
bug: 3193883
Change-Id: Iea7a0fce7ce62d8779a7c7e4613d50db30d82b07
2010-12-08 16:56:06 +09:00
satok
d299792368
Make getWordRec non-recursive
...
Change-Id: Id90f3ca86ef490834cefa92f0d6958b1289fc633
2010-12-07 16:45:32 +09:00
satok
f5cded1c6c
Fix a crash when MAX_WORD_LENGTH is too short.
...
Change-Id: Idcb5aa2685321b8d0ac7d846caecbd1c79e4dd77
2010-12-06 22:58:56 +09:00
satok
48e432ceb8
Breakdown getWordRec
...
Change-Id: I4fef02c227fb858334dbe2eabf2762d5b6e1d919
2010-12-06 18:45:48 +09:00
satok
683192684c
Trim the flow of getWordRec
...
Change-Id: Ic0cfa64ee1e55682ca73681c585db6a5cb510900
2010-12-06 14:56:11 +09:00
satok
28bd03b9f5
Breakdown getWordRec
...
Change-Id: I8556efb1dd053eff9a9681971cbe1014abf0333f
2010-12-03 19:25:42 +09:00
satok
715514d7dd
Breakdown getWordRec and add comments
...
Change-Id: I88bad8a4a8177e3540b995b664c47b86d6904027
2010-12-03 10:01:09 +09:00
satok
18c28f431e
Detach bigram functionalities from unigram_dictionary
...
Change-Id: Ie35164a5f293e5370885a1ba13d6ed7caf6000ec
2010-12-02 18:24:53 +09:00
satok
e808e436cb
Refactor: Move utility functions and no suggestion functions from unigram_dictionary.cpp to dictionary.cpp
...
Change-Id: I6f695e4f5852547d2c00de5ee54a650fef9accbe
2010-12-02 16:11:35 +09:00
satok
3008825948
Fix parameters of native functions and refactor Dictionary
...
- created bigram/unigram dictionary classes
Change-Id: I233a28ed8d611870db3f4cf8f25fc45b5d41529b
2010-12-02 01:16:44 +09:00
satok
d4952c8fe9
Move a logic for finding words with a missing character to the native code.
...
Change-Id: I58338643830ff4f9708f78a9c26f75c8bf2ebf45
2010-12-01 19:26:36 +09:00
satok
15dc33d9f6
Add an easy way to output native debug logs
...
Change-Id: Ieff2b8e60c5e7dedb7f86e17f7c37b349a912ab4
2010-12-01 15:56:17 +09:00
Jae Yong Sung
80aa14fd43
- separate dict (uses xml)
...
- retrieve only bigrams that start with the character typed or its neighbor keys
- contacts bigram
- performance measure
bug: 2873133
Change-Id: If97c005b18c82f3fafef50009dd2dfd972b0ab8f
2010-07-28 11:08:08 -07:00
Jae Yong Sung
937d5ad013
added bigram prediction
...
- after first character, only suggests bigram data (but doesn't autocomplete)
- after second character, words from dictionary get rearranged by using bigram
- compatible with old dictionary
- added preference option to disable bigram
Change-Id: Ia8f4e8fa55e797e86d858fd499887cd396388411
2010-07-13 11:33:39 -07:00
Ken Wakasa
826269c8ae
Get rid of dependency on native AssetManager API. Confirmed the native code builds with the NDK r3.
...
Change-Id: I0d2d3a0e262847d6948a0336a35440e21e312ad2
2010-04-27 22:23:03 +09:00
Ken Wakasa
f1abb8ce3c
Get rid of code taken from bionic to avoid license issue.
...
Change-Id: If96f4247edbc7b1e9f7418d2ddef191618a54ae3
2010-04-23 01:24:09 +09:00
Ken Wakasa
707505ec18
A part of efforts of unbundling LatinIME: Get rid of ICU dependency in the native code.
...
This is actually a back merge from the LatinIME sandbox. Please refer to
http://arvarest.i.corp.google.com:8080/#change,77
Change-Id: I3ff3781903d5c642c662c2d744f808be7e4d8997
2010-04-21 22:43:17 +09:00
Amith Yamasani
07b1603a3f
Don't let the native code target be included twice when unbundling.
...
Move java code to a different directory so that the unbundled
version doesn't try to compile the native code again.
Change-Id: I05cf9e643824ddc448821f69805ccb0240c5b986
2010-03-09 15:01:09 -08:00