
36 commits

Author SHA1 Message Date
Yuichiro Hanada
8d031a63b4 Add put method to FusionDictionaryBufferInterface.
Change-Id: Iac0b35d2da05e81237d105e8fe13c56d16038de1
2012-09-12 15:41:21 +09:00
Yuichiro Hanada
e55b644aef Add new binary dictionary format.
Change-Id: Ia99411d4009857d5e420ca87ef8acf1f1826d3ed
2012-09-10 13:05:46 +09:00
Yuichiro Hanada
eae7b293e4 Check the length of the word when add to FusionDictionary.
Change-Id: Id98d18e90a8b83b597507728b467f56888c8fd12
2012-09-10 12:35:53 +09:00
Yuichiro Hanada
83dfe0fd8c Add FormatOptions.
Change-Id: Ibad05a5f9143de1156b2c897593ec89b0a0b07e7
2012-09-05 18:05:43 +09:00
Jean Chalard
2035b946a3 Merge "Reinstate the shortcut-only attribute" into jb-mr1-dev
2012-09-02 19:28:01 -07:00
Jean Chalard
72b1c93941 Reinstate the shortcut-only attribute
Also add the blacklist attribute

Bug: 7005742
Bug: 2704000
Change-Id: Icbe60bdf25bfb098d9e3f20870be30d6aef07c9d
2012-08-31 22:11:52 +09:00
Yuichiro Hanada
666a433802 add UserHistoryDictIOUtils.
Change-Id: I8a70e43b23f65b5fd5f0ee0b30a94ad8f5ef8a8a
2012-08-31 15:08:57 +09:00
Yuichiro Hanada
b2a43a2ed4 add readUnigramsAndBigramsBinary.
Change-Id: I7967f11211221d4877bf0a0c30183af885f45390
2012-08-31 14:39:19 +09:00
Yuichiro Hanada
62ed901100 add readHeader.
Change-Id: I5be5d62a63ca897e36fe93200ffdca6befb363aa
2012-08-30 14:17:50 +09:00
Yuichiro Hanada
f5c4ff4817 Add FusionDictionaryBufferInterface.
Change-Id: I8640c994231d5f46bc6e074ce8a5bf5344fed0aa
2012-08-29 19:27:49 +09:00
Yuichiro Hanada
d4fe7fda30 Use ByteBuffer when reading FusionDictionary from file.
Change-Id: Ia71561648e17f846d277c22309ac37c21c67a537
2012-08-24 13:31:08 +09:00
Jean Chalard
13822d2b05 Hack to skip reading an outdated binary file.
Bug: 7005813
Change-Id: Ie0d8d4b2d5eb147838ca23bdd5ec1cecd4f01151
2012-08-20 13:56:52 +09:00
Ken Wakasa
72c0f4de1d Merge "add reconstructBigramFrequency" into jb-mr1-dev
2012-08-17 03:19:12 -07:00
Yuichiro Hanada
c0a75c8ecb add reconstructBigramFrequency
Change-Id: Iff20dcb9ca0d6064bb118247887fe24b812c0c61
2012-08-17 19:05:16 +09:00
Jean Chalard
aa27635a8a Reword a confusing comment
Bug: 7005645
Change-Id: Ifd942b3ce242aeeec512e132e1cee31329e994b1
2012-08-17 17:22:28 +09:00
Jean Chalard
d10c473347 Small performance tweak
Change-Id: Icd540742073d49d12e70b2d8bd99aaf7ccb5802d
2012-06-08 17:09:40 +09:00
Jean Chalard
7214617622 Remove a slew of Eclipse warnings.
Change-Id: I03236386aea13fbd4fb8eaeee18e0008aa136502
2012-06-08 16:23:18 +09:00
Tadashi G. Takaoka
93ebf74bae Clean up some compiler warnings
Change-Id: I604da15e65fc3cf807ec4033df4e4cd5ef0196fc
2012-05-25 19:04:54 +09:00
Jean Chalard
418b343797 Use a formula packing more information into 4 bits field
Bug: 6313806
Change-Id: Id0779bd69afae0bb4a4a285340c1eb306544663a
2012-05-15 18:59:21 +09:00
Jean Chalard
76319c6931 Small optimization
Performance gain is < 2%

Bug: 6394357
Change-Id: I2b7da946788cf11d1a491efd20fb2bd2333c23d1
2012-05-14 15:52:01 +09:00
Jean Chalard
4df5b43df8 Small optimizations
Bug: 6394357
Change-Id: I00ba1b5ab3d527b3768e28090c758ddd1629f281
2012-05-14 15:51:58 +09:00
Jean Chalard
3b1b72ac4d More optimizations
We don't merge tails anyway, and we can't do it any more
because that would break the bigram lookup algorithm.
The speedup is about 20%, and possibly double this if
there are no bigrams.

Bug: 6394357

Change-Id: I9eec11dda9000451706d280f120404a2acbea304
2012-05-14 12:41:18 +09:00
Jean Chalard
f7346de94a Write the bigram frequency following the new formula
This also tests for bigram frequency against unigram frequency

Bug: 6313806
Bug: 6028348
Change-Id: If7faa3559fee9f2496890f0bc0e081279e100854
2012-05-11 20:27:22 +09:00
Jean Chalard
4455fe2c89 Refactor a method
Rename it, rename parameters, and add a parameter that will
be necessary soon.
Also, rescale the bigram frequency as necessary.

Bug: 6313806
Change-Id: I192543cfb6ab6bccda4a1a53c8e67fbf50a257b0
2012-05-11 19:34:35 +09:00
Jean Chalard
20a6dea1ca Add a flag for bigram presence in the header
This is a cherry-pick of Icb602762 onto jb-dev.

Bug: 6355745
Change-Id: Icb602762bb0d81472f024fa491571062ec1fc4e9
2012-04-26 16:40:29 +09:00
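Header flags such as the bigram-presence flag added in 20a6dea1ca (and the umlaut and ligature flags from f420df2823) are typically packed as bits in a single integer. The Java sketch below shows that bit-flag technique only; the class name and the bit values are hypothetical and are not taken from the actual format specification.

```java
/**
 * Sketch of dictionary header option flags packed into one int.
 * The bit assignments below are hypothetical placeholders.
 */
final class HeaderFlagsSketch {
    static final int GERMAN_UMLAUT_PROCESSING = 0x1;   // hypothetical bit
    static final int FRENCH_LIGATURE_PROCESSING = 0x2; // hypothetical bit
    static final int CONTAINS_BIGRAMS = 0x4;           // hypothetical bit

    /** Returns flags with the given bit set or cleared. */
    static int withFlag(int flags, int flag, boolean enabled) {
        return enabled ? (flags | flag) : (flags & ~flag);
    }

    /** True if the given bit is set in flags. */
    static boolean hasFlag(int flags, int flag) {
        return (flags & flag) != 0;
    }
}
```

For example, a reader could test `hasFlag(flags, CONTAINS_BIGRAMS)` once after parsing the header instead of inspecting individual word entries.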
Jean Chalard
44c64f46a1 Ignore bigrams that are not also listed as unigrams
This is a cherry pick of I14b67e51 on jb-dev

Bug: 6340915
Change-Id: Iaa512abe1b19ca640ea201f9761fd7f1416270ed
2012-04-26 15:20:30 +09:00
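The rule in 44c64f46a1 can be sketched as a filtering step run before bigrams are written. This Java sketch assumes the check applies to each bigram's target word and models the word list as plain collections; the class and method names are hypothetical, not the FusionDictionary API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BigramFilterSketch {
    /**
     * Returns a copy of bigramsByWord that keeps only the bigram targets
     * also present in unigrams. Words left with no bigrams are dropped.
     */
    static Map<String, List<String>> dropBigramsWithoutUnigram(
            Set<String> unigrams, Map<String, List<String>> bigramsByWord) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : bigramsByWord.entrySet()) {
            List<String> targets = new ArrayList<>();
            for (String target : entry.getValue()) {
                // A target with no unigram entry has no node to point to, so skip it.
                if (unigrams.contains(target)) {
                    targets.add(target);
                }
            }
            if (!targets.isEmpty()) {
                kept.put(entry.getKey(), targets);
            }
        }
        return kept;
    }
}
```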
Jean Chalard
805fed49e1 Merge "Fix binary reading code performance."
2012-04-23 23:39:37 -07:00
Jean Chalard
1d80a7f395 Fix binary reading code performance.
This is not the Right fix; the Right fix would be to read
the file in a buffered way. However this delivers tolerable
performance for a minimal amount of code changes.
We may want to skip submitting this patch, but keep it around
in case we need to use the functionality until we have a good
patch.

Change-Id: I1ba938f82acfd9436c3701d1078ff981afdbea60
2012-04-24 15:16:17 +09:00
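The "Right fix" mentioned in 1d80a7f395, reading the file in a buffered way, can be done in java.nio by loading the file into a ByteBuffer; d4fe7fda30 later switched to ByteBuffer reading. The Java sketch below memory-maps the file, which is one way to get such a buffer; whether the actual change maps the file or copies it into a heap buffer is not stated here. The class and method names and the big-endian byte order of multi-byte values are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class BufferedDictReaderSketch {
    /**
     * Maps the whole file read-only. The mapping stays valid after the
     * channel is closed, and later reads are served from memory instead
     * of issuing a file read for each value.
     */
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Reads an unsigned big-endian value of 1 to 3 bytes, advancing the position. */
    static int readUnsigned(ByteBuffer buffer, int byteCount) {
        if (byteCount < 1 || byteCount > 3) {
            throw new IllegalArgumentException("byteCount must be 1..3: " + byteCount);
        }
        int value = 0;
        for (int i = 0; i < byteCount; i++) {
            value = (value << 8) | (buffer.get() & 0xFF);
        }
        return value;
    }
}
```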
Jean Chalard
a64a1a46e4 Fix a bug where a node size would be seen as increasing.
The core reason for this is quite shrewd. When a word is a bigram
of itself, the corresponding chargroup will have a bigram referring
to itself. When computing bigram offsets, we use cached addresses of
chargroups, but we compute the size of the node as we go. Hence, a
discrepancy may happen between the base offset as seen by the bigram
(which uses the recomputed value) and the target offset (which uses
the cached value).
When this happens, the cached node address is too large. The relative
offset is negative, which is expected, since it points to this very
charnode whose start is a few bytes earlier. But since the cached
address is too large, the offset is computed as smaller than it should
be.
On the next pass, the cache has been refreshed with the newly computed
size and the seen offset is now correct (or at least, much closer to
correct). The correct value is larger than the previously computed
offset, which was too small. If it happens that it crosses the -255 or
-65535 boundary, the address will be seen as needing 1 more byte than
previously computed. If this is the only change in size of this node,
the node will be seen as having a larger size than previously, which
is unexpected. Debug code was catching this and crashing the program.

So this case is very rare, but in an even rarer occurrence, it may
happen that in the same node, another chargroup happens to decrease
its size by the same amount. In this case, the node may be seen as
having not been modified. This is probably extremely rare. If on
top of this, it happens that no other node has been modified, then
the file may be seen as complete, and the discrepancy left as is
in the file, leading to a broken file. The probability that this
happens is abyssally low, but the bug exists, and the current debug
code would not have caught this.
To further catch similar bugs, this change also modifies the test
that decides if the node has changed. On grounds that all components
of a node may only decrease in size with each successive pass, it's
theoretically safe to assume that the same size means the node
contents have not changed, but in case of a bug like the bug above
where a component wrongly grows while another shrinks and both cancel
each other out, the new code will catch this. Also, this change adds
a check against the number of passes, to avoid infinite loops in
case of a bug in the computation code.

This change fixes this bug by updating the cached address of each
chargroup as we go. This eliminates the discrepancy and fixes the
bug.

Bug: 6383103
Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01
2012-04-24 14:04:02 +09:00
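The three ideas in a64a1a46e4 (refreshing each chargroup's cached address as the pass walks forward, comparing every component rather than only the total size, and capping the number of passes) can be illustrated with a simplified, flat model. The Java sketch below is not the BinaryDictInputOutput code: groups are not nested into nodes, each group has a fixed payload plus at most one bigram offset measured from the offset field's own address, and an offset is assumed to take 1 byte up to magnitude 255, 2 bytes up to 65535, and 3 bytes beyond. All names, including MAX_PASSES, are hypothetical.

```java
import java.util.Arrays;

final class NodeLayoutSketch {
    // Hypothetical cap on layout passes, guarding against a non-converging computation.
    static final int MAX_PASSES = 16;

    /** Bytes needed for a signed relative offset, assuming a 1/2/3-byte encoding. */
    static int offsetByteSize(int relativeOffset) {
        int magnitude = Math.abs(relativeOffset);
        if (magnitude <= 0xFF) return 1;
        if (magnitude <= 0xFFFF) return 2;
        return 3;
    }

    /**
     * payloadSizes[i]: bytes of group i excluding its bigram offset.
     * bigramTargets[i]: index of the group its bigram points to, or -1.
     * Returns the converged size of every group.
     */
    static int[] computeGroupSizes(int[] payloadSizes, int[] bigramTargets) {
        int n = payloadSizes.length;
        int[] sizes = new int[n];
        int[] cachedAddresses = new int[n];
        // Initial layout assumes the widest (3-byte) offset for every bigram,
        // so later passes should only shrink groups.
        int address = 0;
        for (int i = 0; i < n; i++) {
            sizes[i] = payloadSizes[i] + (bigramTargets[i] >= 0 ? 3 : 0);
            cachedAddresses[i] = address;
            address += sizes[i];
        }
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            int[] previousSizes = sizes.clone();
            int runningAddress = 0;
            for (int i = 0; i < n; i++) {
                // Refresh the cached address before any offset uses it, so a
                // self-referencing bigram sees consistent base and target addresses.
                cachedAddresses[i] = runningAddress;
                int size = payloadSizes[i];
                int target = bigramTargets[i];
                if (target >= 0) {
                    int fieldAddress = runningAddress + payloadSizes[i];
                    size += offsetByteSize(cachedAddresses[target] - fieldAddress);
                }
                sizes[i] = size;
                runningAddress += size;
            }
            // Compare every group, not just the total: one group growing while
            // another shrinks by the same amount still counts as a change.
            if (Arrays.equals(previousSizes, sizes)) {
                return sizes;
            }
        }
        throw new IllegalStateException("Layout did not converge in " + MAX_PASSES + " passes");
    }
}
```

With payloads {4, 300, 5} and bigram targets {-1, 1, 0}, the second group's self-referencing bigram points 300 bytes back and needs 2 bytes, so the sketch settles on sizes {4, 302, 7} on its second pass.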
Tom Ouyang
df7ebbbd61 Change binary dictionary output buffer size to match dictionary size.
Bug: 6355943
Change-Id: Iaab7bc16ba0dbc7bfde70b06e7bd355519838831
2012-04-19 10:18:57 -07:00
Jean Chalard
f420df2823 Add support for German umlaut and French ligatures flags
Bug: 6202812
Change-Id: Ib4a7f96f6ef86c840069b15d04393f84d428c176
2012-04-06 17:07:29 +09:00
Jean Chalard
8cf1a8d04f Remove the shortcutOnly attribute which is now useless.
Change-Id: Ifccdfdaf7c0066bb7728981503baceff0fedb71f
2012-04-06 16:27:53 +09:00
Jean Chalard
c734c2aca1 Add a simple way to input dictionary header attributes
Just add them as an attribute to the root of the XML node.

Bug: 6202812
Change-Id: Idf040bfebf20a72f9e4370930a85d97df593f484
2012-04-03 15:18:51 +09:00
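The approach in c734c2aca1, where header attributes are supplied as attributes on the root element of the XML word list, can be sketched with the JDK's DOM parser. This Java sketch uses DOM for brevity; the actual makedict XML reader may use a different parser, and the class name, method name, and the `wordlist` root element in the example are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlHeaderAttributesSketch {
    /** Collects every attribute of the XML root element as a header attribute. */
    static Map<String, String> readHeaderAttributes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NamedNodeMap attributes = root.getAttributes();
            Map<String, String> header = new LinkedHashMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                header.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            return header;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unreadable XML word list", e);
        }
    }
}
```

Only the root element's attributes are collected; attributes on word elements below it are left for the word-list parser.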
Jean Chalard
752996540f Add read support for string shortcuts for makedict.
Change-Id: I48ee4fc9ac703ad2a680b3cd848de91c415ea3c8
2012-03-28 20:40:08 +09:00
Jean Chalard
3bbb31f3f0 Change the format of the shortcuts in the binary dict.
This only includes the write part of the change. The read part is
coming in a different commit.

Change-Id: Iabe7af6cd134462dc19245f5400719920ed31c8f
2012-03-28 20:24:07 +09:00
Tom Ouyang
e276c2401e Move makedict to LatinIME android keyboard.
Bug: 6188977
Change-Id: I4d2ef504bb983abbda3cb52ee450cb46f58d95cf
2012-03-21 19:30:26 +09:00
Renamed from tools/makedict/src/com/android/inputmethod/latin/makedict/BinaryDictInputOutput.java