Jean Chalard
b3c98901c5
Add auto detection and decoding of dictionary files. (A2)
...
Bug: 7388852
Change-Id: I25e755fc15f5b383acc046f668e9681efa4f0c2f
2012-10-25 16:40:15 +09:00
Jean Chalard
ddb0bcc051
Fix a bug where a bigram would be ignored
...
Bug: 7403386
Change-Id: I89f495d07f7059a9f1ccd97d487c2f2657a8ebd2
2012-10-24 13:24:59 +09:00
Jean Chalard
c59c741987
Return the correct bigram frequency
...
The "correct" bigram frequency is now returned by the reading
code. However, as the binary format represents the frequency
in a lossy manner, the frequency is not guaranteed to be the
exact same as the one in the source text format - only a close
enough value. It is however the exact same value seen by the
native code.
Bug: 7395653
Change-Id: I49199ef18901c671189912b3550623e9643baedd
2012-10-23 17:17:37 +09:00
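Note on c59c741987: the message above says the binary format stores the bigram frequency lossily, so the value read back is close to, but not necessarily equal to, the one in the source text. The Java sketch below illustrates that general idea only. The 4-bit field width is borrowed from the subject of 418b343797; the 0..255 range, the step formula, and every name in the class are assumptions made for illustration, not the formula makedict or the native code uses.

```java
// Hypothetical illustration of lossy frequency storage; not the makedict formula.
final class LossyBigramFrequencySketch {
    // Assumed value range; the commit log does not state it.
    static final int MAX_FREQUENCY = 255;
    // A 4-bit field can hold 16 distinct levels.
    static final int LEVELS = 16;

    // Width of one quantization step between the unigram frequency and the maximum.
    static float stepSize(int unigramFreq) {
        return (MAX_FREQUENCY - unigramFreq) / (float) LEVELS;
    }

    // Map a bigram frequency to a 4-bit level relative to the unigram frequency.
    static int encode(int unigramFreq, int bigramFreq) {
        if (bigramFreq <= unigramFreq) {
            return 0;
        }
        int level = (int) ((bigramFreq - unigramFreq) / stepSize(unigramFreq));
        return Math.min(level, LEVELS - 1);
    }

    // Rebuild an approximate frequency from the stored level (midpoint of the step).
    static int decode(int unigramFreq, int level) {
        return unigramFreq + (int) (stepSize(unigramFreq) * (level + 0.5f));
    }

    public static void main(String[] args) {
        int unigram = 100;
        int source = 200;
        int level = encode(unigram, source);
        int readBack = decode(unigram, level);
        System.out.println("source=" + source + " level=" + level + " readBack=" + readBack);
    }
}
```

Under these assumptions, every reader that decodes the same stored level gets the same value, which is the sense in which the Java reader and the native code can agree with each other while still differing from the source text.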
Tadashi G. Takaoka
15f6d4ae34
Add @UsedForTesting and @ExternallyReferenced annotations
...
Bug: 7268357
Change-Id: I0b7e0c19f04af9ae30874d0a4c26ad81bc80be8c
2012-10-22 11:18:43 -07:00
Yuichiro Hanada
d2579c4832
fix writeCharGroup.
...
Change-Id: Ib841afaba0a20c3b300eb7d3e9133243f9f3ae58
2012-10-05 14:54:17 +09:00
Yuichiro Hanada
3c6d9fe148
Add insertWord.
...
bug: 6669677
Change-Id: Ide55a4931071de9cd42c1cddae63ddd531d2feba
2012-10-04 17:19:47 +09:00
Yuichiro Hanada
c3a98ca306
Add writeNode.
...
Change-Id: I088bb6ea43ce0841d725e48b677d429e1155569d
2012-10-04 14:28:42 +09:00
Yuichiro Hanada
38712ff27d
Add updateParentAddresses.
...
Change-Id: Iac210131b7c003ef363e1138bf22f777a37c6a89
2012-10-03 19:37:17 +09:00
Yuichiro Hanada
a853356b82
Add isDeletedGroup.
...
Change-Id: I83f09c068868e5e6e1b46f494a6ef957f0b466d8
2012-10-03 02:19:41 -07:00
Yuichiro Hanada
7223cc2ef1
Add MAX_BIGRAMS_IN_A_GROUP.
...
Change-Id: I128d5deb8e523045d7ad77d7a8fd3db944f71238
2012-10-03 18:10:06 +09:00
Yuichiro Hanada
4ad4ff618f
Add makeCharGroupFlags.
...
Change-Id: Id2c580f21b77f66a97c5fbdf4542fdafe6c43614
2012-10-03 14:33:59 +09:00
Yuichiro Hanada
7f438aa12f
Make writeCharGroup return a size of a new group.
...
bug: 6669677
Change-Id: I56f6a07b04b08443f2c052927404318c2018fc9d
2012-10-01 22:02:04 +09:00
Yuichiro Hanada
fb7e08ea8f
Add writeCharGroup.
...
bug: 6669677
Change-Id: I36792ba9c511a5148c963096cc93ca8c2e0ee04e
2012-10-01 21:50:38 +09:00
Yuichiro Hanada
f3aed3ea26
Add updateChildrenAddress.
...
Change-Id: Ic06a755d85612476e719e580469dc1cd9447286c
2012-09-28 18:45:56 +09:00
Tadashi G. Takaoka
a28a05e971
Cleanup: Make some classes as final
...
Change-Id: I6009b3c1950ba32b7f1e205a3db2307fe0cd688e
2012-09-27 19:03:30 +09:00
Yuichiro Hanada
84d858ed5e
Use BinaryDictInputOutput to save UserHistoryDictionary.
...
bug: 6669677
Change-Id: I08193c26f76dbd48168f8ac02c1b737525bfc7b2
2012-09-27 12:02:17 +09:00
Yuichiro Hanada
2aea34fb31
Add updateParentAddress.
...
bug: 6669677
Change-Id: I353f8ae53720cdf7a809271a28cb703709609f53
2012-09-26 17:18:01 +09:00
Yuichiro Hanada
2ee70804e9
Add moved char groups.
...
bug: 6669677
Change-Id: I372f841044fe8e076a50a80ac10b715e5f8fd4eb
2012-09-26 17:01:48 +09:00
Yuichiro Hanada
a161bdac88
add capacity to FusionDictionaryBufferInterface.
...
bug: 6669677
Change-Id: I4627093811a19c46ce13fe351d1db63cbd78cf4a
2012-09-25 21:47:11 +09:00
Yuichiro Hanada
93d7c6233f
Make getTerminalPosition read linked-list nodes.
...
bug: 6669677
Change-Id: I599d276f430efe23d402695c325e23906b7705b3
2012-09-25 21:11:15 +09:00
Yuichiro Hanada
8ec0064c49
Make children addresses and parent addresses use signed addresses.
...
Signed addresses are used only in version 3 with dynamic update.
bug: 6669677
Change-Id: Iadaeab199b5019d2330b4573c24da74d64f0945e
2012-09-25 12:55:14 +09:00
Yuichiro Hanada
82d9deaaf2
Combine mHasParentAddress with mHasLinkedListNode into mSupportsDynamicUpdate.
...
bug: 6669677
Change-Id: I82799af199358420f09ac34fc005091e202c5d3b
2012-09-24 13:17:44 +09:00
Yuichiro Hanada
66597f5e5f
Add deleteWord.
...
bug: 6669677
Change-Id: I1a5b90ee05e5cffd74a5c140384a3e37c79e7e70
2012-09-21 12:40:07 +09:00
Yuichiro Hanada
73779f7631
Make readUnigramsAndBigramsBinary read linked-list nodes.
...
Change-Id: I07ae036b0b06e71d7a18f2bf11e4692cd4213568
2012-09-20 20:37:02 +09:00
Yuichiro Hanada
d36245fad2
Add getTerminalPosition.
...
Change-Id: If04d779db23b1aea2cc12e5e9b8cecfcb35a5737
2012-09-20 18:02:16 +09:00
Yuichiro Hanada
65feee12e5
Make BinaryDictIOUtils.
...
Change-Id: I45830235ee738233e8eb2bd91d659705b698f58c
2012-09-19 15:37:37 +09:00
Yuichiro Hanada
c2fdf0dfbf
Make readNode read linked list nodes.
...
Change-Id: Ia5eaae0653179b2eb74c53b0823beaf80377a389
2012-09-19 14:49:23 +09:00
Yuichiro Hanada
a149c53c8e
add limit to FusionDictionaryBufferInterface.
...
Change-Id: Ic9ff717a9751023d47b02ff3b9d1fbf3115c2501
2012-09-19 12:28:19 +09:00
Yuichiro Hanada
b686df15fc
Add a new flag for linked list nodes.
...
Change-Id: Ib2f194775cfe5ab05481ac95cd709d6e8e8dd3c6
2012-09-18 22:01:49 +09:00
Yuichiro Hanada
bf45dc4860
Make writePlacedNode write the linked-list node.
...
Change-Id: I60feda815ea08cf73300fccca1ae12b97550f116
2012-09-18 21:20:07 +09:00
Yuichiro Hanada
061d225fb1
Add a new option to FormatOptions.
...
Change-Id: I8bf089bea5de46570a5e81fb1ea3ab22c07eeee1
2012-09-18 21:03:13 +09:00
Jean Chalard
ed47131612
Merge "Fix a bug with surrogate characters" into jb-mr1-dev
2012-09-18 02:06:55 -07:00
Jean Chalard
6c721b5f68
Fix a bug with surrogate characters
...
This is a pretty bad bug :/
Bug: 7013840
Change-Id: I12c7cfa4fa9d56b2c1fee6e6222c64fe20b88fa3
2012-09-18 18:01:15 +09:00
Yuichiro Hanada
8adc0154e6
Remove populateOptions(final ByteBuffer buffer).
...
Change-Id: Ifc4c64c9cffe4f343c5a604c192db010a1792acc
2012-09-18 14:42:52 +09:00
Yuichiro Hanada
cc958dd96e
Refactor BinaryDictInputOutput.
...
Change-Id: Idb4b635fcac70cc988e0dd3ce3bf121fba12099c
2012-09-14 11:08:01 +09:00
Yuichiro Hanada
1a347723c5
Move FormatOptions and FileHeader to FormatSpec.
...
Change-Id: I232e35598635113bf2c81825669c744aadc79efe
2012-09-13 16:35:41 +09:00
Yuichiro Hanada
81d97eec0e
Move constants and comments.
...
Change-Id: Ifd66bda7d528827ba61c60531121ea206a2325be
2012-09-13 14:28:39 +09:00
Yuichiro Hanada
8d031a63b4
Add put method to FusionDictionaryBufferInterface.
...
Change-Id: Iac0b35d2da05e81237d105e8fe13c56d16038de1
2012-09-12 15:41:21 +09:00
Yuichiro Hanada
e55b644aef
Add new binary dictionary format.
...
Change-Id: Ia99411d4009857d5e420ca87ef8acf1f1826d3ed
2012-09-10 13:05:46 +09:00
Yuichiro Hanada
eae7b293e4
Check the length of the word when adding it to FusionDictionary.
...
Change-Id: Id98d18e90a8b83b597507728b467f56888c8fd12
2012-09-10 12:35:53 +09:00
Yuichiro Hanada
83dfe0fd8c
Add FormatOptions.
...
Change-Id: Ibad05a5f9143de1156b2c897593ec89b0a0b07e7
2012-09-05 18:05:43 +09:00
Ken Wakasa
f2789819bd
Cosmetic fixes and a bug fix in UnigramDictionary::testCharGroupForContinuedLikeness().
...
This change has actually been extracted from a change work in progress I4fe423834b8131fb122251892c98228a6e08ba25
Change-Id: I52568fa09da2ea22be7f8bfe9676b7cd73c31fa4
2012-09-04 14:23:37 +09:00
Jean Chalard
2035b946a3
Merge "Reinstate the shortcut-only attribute" into jb-mr1-dev
2012-09-02 19:28:01 -07:00
Jean Chalard
72b1c93941
Reinstate the shortcut-only attribute
...
Also add the blacklist attribute
Bug: 7005742
Bug: 2704000
Change-Id: Icbe60bdf25bfb098d9e3f20870be30d6aef07c9d
2012-08-31 22:11:52 +09:00
Yuichiro Hanada
666a433802
add UserHistoryDictIOUtils.
...
Change-Id: I8a70e43b23f65b5fd5f0ee0b30a94ad8f5ef8a8a
2012-08-31 15:08:57 +09:00
Yuichiro Hanada
b2a43a2ed4
add readUnigramsAndBigramsBinary.
...
Change-Id: I7967f11211221d4877bf0a0c30183af885f45390
2012-08-31 14:39:19 +09:00
Yuichiro Hanada
62ed901100
add readHeader.
...
Change-Id: I5be5d62a63ca897e36fe93200ffdca6befb363aa
2012-08-30 14:17:50 +09:00
Yuichiro Hanada
f5c4ff4817
Add FusionDictionaryBufferInterface.
...
Change-Id: I8640c994231d5f46bc6e074ce8a5bf5344fed0aa
2012-08-29 19:27:49 +09:00
Yuichiro Hanada
d4fe7fda30
Use ByteBuffer when reading FusionDictionary from file.
...
Change-Id: Ia71561648e17f846d277c22309ac37c21c67a537
2012-08-24 13:31:08 +09:00
Jean Chalard
13822d2b05
Hack to skip reading an outdated binary file.
...
Bug: 7005813
Change-Id: Ie0d8d4b2d5eb147838ca23bdd5ec1cecd4f01151
2012-08-20 13:56:52 +09:00
Ken Wakasa
72c0f4de1d
Merge "add reconstructBigramFrequency" into jb-mr1-dev
2012-08-17 03:19:12 -07:00
Yuichiro Hanada
c0a75c8ecb
add reconstructBigramFrequency
...
Change-Id: Iff20dcb9ca0d6064bb118247887fe24b812c0c61
2012-08-17 19:05:16 +09:00
Jean Chalard
aa27635a8a
Reword a confusing comment
...
Bug: 7005645
Change-Id: Ifd942b3ce242aeeec512e132e1cee31329e994b1
2012-08-17 17:22:28 +09:00
Yuichiro Hanada
0d35c159fe
fix findWordInTree.
...
Change-Id: I8f42df28f76188677db9d4e55885e1fc6a40b53f
2012-08-17 10:23:01 +09:00
Yuichiro Hanada
66f338983b
fix findWordInTree.
...
Change-Id: I9d81c815494a0670afa81219ad7bad82274d997e
2012-08-16 20:21:47 +09:00
Jean Chalard
54e84a00fc
Make a makedict command for dicttool (A3)
...
This behaves exactly like the old makedict command. Further
changes will redirect the calls to makedict to this, so as
to consolidate similar code.
Groundwork for
Bug: 6429606
Change-Id: Ibeadbf48bec70f988a15ca36ebf5d1ce3b5b54ea
2012-08-04 01:11:46 +09:00
Jean Chalard
d10c473347
Small performance tweak
...
Change-Id: Icd540742073d49d12e70b2d8bd99aaf7ccb5802d
2012-06-08 17:09:40 +09:00
Jean Chalard
7214617622
Remove a slew of Eclipse warnings.
...
Change-Id: I03236386aea13fbd4fb8eaeee18e0008aa136502
2012-06-08 16:23:18 +09:00
Tadashi G. Takaoka
93ebf74bae
Clean up some compiler warnings
...
Change-Id: I604da15e65fc3cf807ec4033df4e4cd5ef0196fc
2012-05-25 19:04:54 +09:00
Jean Chalard
418b343797
Use a formula packing more information into a 4-bit field
...
Bug: 6313806
Change-Id: Id0779bd69afae0bb4a4a285340c1eb306544663a
2012-05-15 18:59:21 +09:00
Jean Chalard
76319c6931
Small optimization
...
Performance gain is < 2%
Bug: 6394357
Change-Id: I2b7da946788cf11d1a491efd20fb2bd2333c23d1
2012-05-14 15:52:01 +09:00
Jean Chalard
4df5b43df8
Small optimizations
...
Bug: 6394357
Change-Id: I00ba1b5ab3d527b3768e28090c758ddd1629f281
2012-05-14 15:51:58 +09:00
Jean Chalard
3b1b72ac4d
More optimizations
...
We don't merge tails anyway, and we can't do it any more
because that would break the bigram lookup algorithm.
The speedup is about 20%, and possibly double this if
there are no bigrams.
Bug: 6394357
Change-Id: I9eec11dda9000451706d280f120404a2acbea304
2012-05-14 12:41:18 +09:00
Jean Chalard
12efad3d15
Some more obvious optimizations
...
The speedup is about 15%
Bug: 6394357
Change-Id: Ibd57363d9d793206dd916d8927366db4192083b6
2012-05-14 12:35:31 +09:00
Jean Chalard
47db0be7cb
Some obvious optimizations to makedict
...
Bug: 6394357
Change-Id: Ibfd98aac2304ef50cf90b1de984736ddcfe7a4bc
2012-05-14 12:34:05 +09:00
Jean Chalard
f7346de94a
Write the bigram frequency following the new formula
...
This also tests for bigram frequency against unigram frequency
Bug: 6313806
Bug: 6028348
Change-Id: If7faa3559fee9f2496890f0bc0e081279e100854
2012-05-11 20:27:22 +09:00
Jean Chalard
4455fe2c89
Refactor a method
...
Rename it, rename parameters, and add a parameter that will
be necessary soon.
Also, rescale the bigram frequency as necessary.
Bug: 6313806
Change-Id: I192543cfb6ab6bccda4a1a53c8e67fbf50a257b0
2012-05-11 19:34:35 +09:00
Ken Wakasa
84478103ec
Tidy up the MakedictLog class.
...
Follow up to I436b2b7b
Change-Id: Id17b134dab2f876b874a505e92a379c8b5567fa4
2012-05-05 23:40:21 +09:00
Ken Wakasa
03b423f313
Suppress debug log from makedict in LatinIME
...
bug: 6447900
Change-Id: I436b2b7b261b422a7edca9cb99a4689b63877fe0
2012-05-05 09:28:27 +09:00
Jean Chalard
20a6dea1ca
Add a flag for bigram presence in the header
...
This is a cherry-pick of Icb602762 onto jb-dev.
Bug: 6355745
Change-Id: Icb602762bb0d81472f024fa491571062ec1fc4e9
2012-04-26 16:40:29 +09:00
Jean Chalard
44c64f46a1
Ignore bigrams that are not also listed as unigrams
...
This is a cherry pick of I14b67e51 on jb-dev
Bug: 6340915
Change-Id: Iaa512abe1b19ca640ea201f9761fd7f1416270ed
2012-04-26 15:20:30 +09:00
Jean Chalard
805fed49e1
Merge "Fix binary reading code performance."
2012-04-23 23:39:37 -07:00
Jean Chalard
1d80a7f395
Fix binary reading code performance.
...
This is not the Right fix; the Right fix would be to read
the file in a buffered way. However this delivers tolerable
performance for a minimal amount of code changes.
We may want to skip submitting this patch, but keep it around
in case we need to use the functionality until we have a good
patch.
Change-Id: I1ba938f82acfd9436c3701d1078ff981afdbea60
2012-04-24 15:16:17 +09:00
Jean Chalard
a64a1a46e4
Fix a bug where a node size would be seen as increasing.
...
The core reason for this is quite subtle. When a word is a bigram
of itself, the corresponding chargroup will have a bigram referring
to itself. When computing bigram offsets, we use cached addresses of
chargroups, but we compute the size of the node as we go. Hence, a
discrepancy may happen between the base offset as seen by the bigram
(which uses the recomputed value) and the target offset (which uses
the cached value).
When this happens, the cached node address is too large. The relative
offset is negative, which is expected, since it points to this very
charnode whose start is a few bytes earlier. But since the cached
address is too large, the offset is computed as smaller than it should
be.
On the next pass, the cache has been refreshed with the newly computed
size and the seen offset is now correct (or at least, much closer to
correct). The correct value is larger than the previously computed
offset, which was too small. If it happens that it crosses the -255 or
-65535 boundary, the address will be seen as needing 1 more byte than
previously computed. If this is the only change in size of this node,
the node will be seen as having a larger size than previously, which
is unexpected. Debug code was catching this and crashing the program.
So this case is very rare, but in an even rarer occurrence, it may
happen that in the same node, another chargroup happens to decrease
its size by the same amount. In this case, the node may be seen as
having not been modified. This is probably extremely rare. If on
top of this, it happens that no other node has been modified, then
the file may be seen as complete, and the discrepancy left as is
in the file, leading to a broken file. The probability that this
happens is abysmally low, but the bug exists, and the current debug
code would not have caught this.
To further catch similar bugs, this change also modifies the test
that decides if the node has changed. On grounds that all components
of a node may only decrease in size with each successive pass, it's
theoretically safe to assume that the same size means the node
contents have not changed, but in case of a bug like the bug above
where a component wrongly grows while another shrinks and both cancel
each other out, the new code will catch this. Also, this change adds
a check against the number of passes, to avoid infinite loops in
case of a bug in the computation code.
This change fixes this bug by updating the cached address of each
chargroup as we go. This eliminates the discrepancy and fixes the
bug.
Bug: 6383103
Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01
2012-04-24 14:04:02 +09:00
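Note on a64a1a46e4: the mechanism described above (variable-width offsets, cached chargroup addresses, repeated passes until nothing changes, and a cap on the number of passes) can be sketched roughly as follows. This is a simplified model written for illustration, not the makedict code: the class, field, and method names, the pass limit of 24, and the choice to measure offsets from the start of a group are all assumptions.

```java
// Hypothetical model of multi-pass address computation; not the makedict code.
final class AddressPassSketch {
    // Assumed cap on passes; the commit only says that a check was added.
    static final int MAX_PASSES = 24;

    static final class Group {
        final int fixedSize;  // bytes that do not depend on any offset
        final int target;     // index of the group a bigram points to, or -1
        int cachedAddress;
        int cachedSize;

        Group(int fixedSize, int target) {
            this.fixedSize = fixedSize;
            this.target = target;
        }
    }

    // Bytes needed to store a relative offset, chosen by its magnitude.
    static int offsetSize(int offset) {
        int magnitude = Math.abs(offset);
        if (magnitude <= 0xFF) return 1;
        if (magnitude <= 0xFFFF) return 2;
        return 3;
    }

    // Returns the total size once a full pass changes nothing.
    static int computeAddresses(Group[] groups) {
        // Start from the largest possible sizes, so later passes should only shrink them.
        int address = 0;
        for (Group g : groups) {
            g.cachedAddress = address;
            g.cachedSize = g.fixedSize + (g.target >= 0 ? 3 : 0);
            address += g.cachedSize;
        }
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            boolean changed = false;
            address = 0;
            for (Group g : groups) {
                // Refresh the cached address as we go, so that a group pointing
                // at itself sees the same base and target address.
                if (g.cachedAddress != address) {
                    g.cachedAddress = address;
                    changed = true;
                }
                int size = g.fixedSize;
                if (g.target >= 0) {
                    size += offsetSize(groups[g.target].cachedAddress - g.cachedAddress);
                }
                // Compare both address and size; equal total size alone could hide
                // one component growing while another shrinks.
                if (size != g.cachedSize) {
                    g.cachedSize = size;
                    changed = true;
                }
                address += size;
            }
            if (!changed) {
                return address;
            }
        }
        throw new IllegalStateException("Addresses did not converge in " + MAX_PASSES + " passes");
    }

    public static void main(String[] args) {
        Group[] groups = {
            new Group(10, -1), new Group(300, 0), new Group(5, 2), new Group(4, 1)
        };
        System.out.println("total size: " + computeAddresses(groups));
    }
}
```

In this model a group whose bigram targets itself (the third group in `main`) always computes an offset of zero, because the cached address is refreshed before the offset is read. That is the kind of discrepancy the commit says it removes.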
Tom Ouyang
df7ebbbd61
Change binary dictionary output buffer size to match dictionary size.
...
Bug: 6355943
Change-Id: Iaab7bc16ba0dbc7bfde70b06e7bd355519838831
2012-04-19 10:18:57 -07:00
Jean Chalard
f420df2823
Add support for German umlaut and French ligatures flags
...
Bug: 6202812
Change-Id: Ib4a7f96f6ef86c840069b15d04393f84d428c176
2012-04-06 17:07:29 +09:00
Jean Chalard
b8060399c7
Remove constructors
...
And small cleanup.
Change-Id: I1de903f42c1b8d57a488be2162e0b94055a6d1f2
2012-04-06 16:53:15 +09:00
Jean Chalard
8cf1a8d04f
Remove the shortcutOnly attribute which is now useless.
...
Change-Id: Ifccdfdaf7c0066bb7728981503baceff0fedb71f
2012-04-06 16:27:53 +09:00
Jean Chalard
c734c2aca1
Add a simple way to input dictionary header attributes
...
Just add them as an attribute to the root of the XML node.
Bug: 6202812
Change-Id: Idf040bfebf20a72f9e4370930a85d97df593f484
2012-04-03 15:18:51 +09:00
Jean Chalard
e705a122d1
Remove useless adding of shortcut as unigrams.
...
Change-Id: I1f50ebf00d6dd0dad4114fad86ace5b7b304613a
2012-03-28 20:40:38 +09:00
Jean Chalard
752996540f
Add read support for string shortcuts for makedict.
...
Change-Id: I48ee4fc9ac703ad2a680b3cd848de91c415ea3c8
2012-03-28 20:40:08 +09:00
Jean Chalard
3bbb31f3f0
Change the format of the shortcuts in the binary dict.
...
This only includes the write part of the change. The read part is
coming in a different commit.
Change-Id: Iabe7af6cd134462dc19245f5400719920ed31c8f
2012-03-28 20:24:07 +09:00
Tom Ouyang
b163f91621
Merge "Add support for updating and adding bigrams to existing nodes."
2012-03-23 05:57:55 -07:00
Tom Ouyang
7cfe20efbe
Add support for updating and adding bigrams to existing nodes.
...
Bug: 6188977
Change-Id: I48aca8ba199247d73395ab13b9d1976f4e739208
2012-03-23 21:52:39 +09:00
Ken Wakasa
066866954a
Add a missing comparison in Word.equals()
...
Follow up to I94e2e29c
bug: 6209651
Change-Id: Iff2daca8c2678e2d1796f98d6db738f109e3d03f
2012-03-23 14:41:16 +09:00
Ken Wakasa
9f0ea52a5d
Add missing Word.hashCode()
...
Some cleanups too.
bug: 6209651
Change-Id: I94e2e29c92e90e554e4952d277d590e093766c4f
2012-03-23 13:11:39 +09:00
Ken Wakasa
2aa02b84a4
Revive the Makefile for makedict
...
Follow up to I4d2ef504. Address a compiler warning and a small optimization as well.
bug: 6188977
bug: 6209651
Change-Id: Ibc9da51d48ebf0b8815ad0bb2f697242970ba8f7
2012-03-22 11:55:18 +09:00
Tom Ouyang
e276c2401e
Move makedict to LatinIME android keyboard.
...
Bug: 6188977
Change-Id: I4d2ef504bb983abbda3cb52ee450cb46f58d95cf
2012-03-21 19:30:26 +09:00
satok
905670bd87
Add a dummy file and package for make dict
...
Change-Id: I195fd42f2a773bcc6fab0a61336a1c15d97902bb
2012-03-19 15:26:13 +09:00