Commit Graph

267 Commits (9ebba46c775f37abeb0451602cb323fd45adf33b)

Author SHA1 Message Date
Jean Chalard 1d80a7f395 Fix binary reading code performance.
This is not the Right fix; the Right fix would be to read
the file in a buffered way. However, this delivers tolerable
performance with minimal code changes.
We may want to skip submitting this patch, but keep it around
in case we need to use the functionality until we have a good
patch.

Change-Id: I1ba938f82acfd9436c3701d1078ff981afdbea60
2012-04-24 15:16:17 +09:00
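The message describes buffered reading as the Right fix. Below is a minimal sketch of one way to do it, assuming the whole dictionary fits in memory. The file is read once with a single bulk call into a `ByteBuffer`, and values are then decoded from memory instead of issuing one read per byte. Seeking is still possible, which a node-based reader needs. The class and method names (`BufferedDictReader`, `readUnsigned24`, ...) are hypothetical and are not makedict's API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reader: names are illustrative, not makedict's actual API.
final class BufferedDictReader {
    private final ByteBuffer buffer;

    private BufferedDictReader(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Reads the whole file with one bulk call instead of one read() per byte.
    static BufferedDictReader open(Path path) throws IOException {
        return fromBytes(Files.readAllBytes(path));
    }

    // Wraps bytes already in memory; a ByteBuffer is big-endian by default.
    static BufferedDictReader fromBytes(byte[] data) {
        return new BufferedDictReader(ByteBuffer.wrap(data));
    }

    // Random access still works: jump to any node address before decoding.
    void seek(int address) {
        buffer.position(address);
    }

    int readUnsignedByte() {
        return buffer.get() & 0xFF;
    }

    int readUnsignedShort() {
        return buffer.getShort() & 0xFFFF;
    }

    // 24-bit big-endian value, e.g. a (hypothetical) 3-byte address.
    int readUnsigned24() {
        int high = readUnsignedByte();
        return (high << 16) | readUnsignedShort();
    }
}
```

The trade-off is memory: the whole file is held in RAM. That is usually acceptable for a build-time tool, and it removes the per-byte system call cost that made the unbuffered reader slow.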
Jean Chalard a64a1a46e4 Fix a bug where a node size would be seen as increasing.
The root cause is quite subtle. When a word is a bigram
of itself, the corresponding chargroup will have a bigram referring
to itself. When computing bigram offsets, we use cached addresses of
chargroups, but we compute the size of the node as we go. Hence, a
discrepancy may arise between the base offset as seen by the bigram
(which uses the recomputed value) and the target offset (which uses
the cached value).
When this happens, the cached node address is too large. The relative
offset is negative, which is expected, since it points to this very
chargroup, whose start is a few bytes earlier. But since the cached
address is too large, the offset is computed as smaller in magnitude
than it should be.
On the next pass, the cache has been refreshed with the newly computed
size and the seen offset is now correct (or at least, much closer to
correct). The correct value is larger than the previously computed
offset, which was too small. If it happens that it crosses the -255 or
-65535 boundary, the address will be seen as needing 1 more byte than
previously computed. If this is the only change in size of this node,
the node will be seen as having a larger size than previously, which
is unexpected. Debug code was catching this and crashing the program.

So this case is very rare, but in an even rarer occurrence, it may
happen that in the same node, another chargroup happens to decrease
its size by the same amount. In this case, the node may be seen as
having not been modified. This is probably extremely rare. If on
top of this, it happens that no other node has been modified, then
the file may be seen as complete, and the discrepancy left as is
in the file, leading to a broken file. The probability that this
happens is abyssally low, but the bug exists, and the current debug
code would not have caught this.
To further catch similar bugs, this change also modifies the test
that decides if the node has changed. On the grounds that all components
of a node may only decrease in size with each successive pass, it's
theoretically safe to assume that the same size means the node
contents have not changed. However, in case of a bug like the one above,
where a component wrongly grows while another shrinks and the two cancel
each other out, the new code will catch this. This change also adds
a check on the number of passes, to avoid infinite loops in
case of a bug in the computation code.

This change fixes this bug by updating the cached address of each
chargroup as we go. This eliminates the discrepancy and fixes the
bug.

Bug: 6383103
Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01
2012-04-24 14:04:02 +09:00
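The layout computation this message describes is an iterative fixed point. Node sizes depend on offset widths, and offset widths depend on node positions. Below is a toy sketch under simplifying assumptions that do not match the actual makedict format. Each node is a fixed payload followed by at most one pointer to another node (possibly itself). The pointer is encoded relative to its own position, and its magnitude takes 1, 2 or 3 bytes depending on whether it fits in 0xFF, 0xFFFF or 0xFFFFFF; the sign is assumed to be stored elsewhere. The sketch includes the two safeguards the message adds. A size that grows is treated as a bug, sizes are compared node by node, and passes are capped (`MAX_PASSES` is a hypothetical value). The commit's actual fix updates cached addresses as it goes. Instead, this sketch computes each pass from one consistent snapshot of addresses. That is a deliberately simpler substitute, and it avoids mixing cached and recomputed values entirely.

```java
// Toy model of the layout passes; names and encoding are hypothetical.
final class LayoutSketch {
    // Hypothetical cap on layout passes, guarding against a non-converging computation.
    static final int MAX_PASSES = 32;

    // Bytes needed for a relative offset (sign assumed stored elsewhere):
    // crossing 255 or 65535 in magnitude costs one more byte.
    static int offsetSize(int offset) {
        int magnitude = Math.abs(offset);
        if (magnitude <= 0xFF) return 1;
        if (magnitude <= 0xFFFF) return 2;
        if (magnitude <= 0xFFFFFF) return 3;
        throw new IllegalArgumentException("offset too large: " + offset);
    }

    // payload[i]: fixed bytes of node i, written before its pointer.
    // target[i]: node that node i points to (may be i itself), or -1 for none.
    // Returns the final address of each node.
    static int[] computeAddresses(int[] payload, int[] target) {
        int n = payload.length;
        int[] size = new int[n];
        // Start pessimistic: every pointer takes the maximum 3 bytes, so sizes can only shrink.
        for (int i = 0; i < n; i++) size[i] = payload[i] + (target[i] >= 0 ? 3 : 0);

        for (int pass = 0; pass < MAX_PASSES; pass++) {
            // One consistent snapshot of addresses, derived from the sizes at the start of this pass.
            int[] address = new int[n];
            for (int i = 1; i < n; i++) address[i] = address[i - 1] + size[i - 1];

            boolean changed = false;
            int[] newSize = new int[n];
            for (int i = 0; i < n; i++) {
                newSize[i] = payload[i];
                if (target[i] >= 0) {
                    int pointerPos = address[i] + payload[i];
                    newSize[i] += offsetSize(address[target[i]] - pointerPos);
                }
                // Sizes may only shrink; growth means the computation is inconsistent.
                if (newSize[i] > size[i]) throw new IllegalStateException("node " + i + " grew");
                // Compare node by node rather than only the total size.
                if (newSize[i] != size[i]) changed = true;
            }
            if (!changed) return address;
            size = newSize;
        }
        throw new IllegalStateException("no convergence after " + MAX_PASSES + " passes");
    }
}
```

For example, `computeAddresses(new int[] {1, 253, 1}, new int[] {2, -1, -1})` takes three passes. The first node's pointer starts at 3 bytes, drops to 2, then drops to 1 once the distance is no more than 255, giving addresses {0, 2, 255}.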
Tom Ouyang df7ebbbd61 Change binary dictionary output buffer size to match dictionary size.
Bug: 6355943
Change-Id: Iaab7bc16ba0dbc7bfde70b06e7bd355519838831
2012-04-19 10:18:57 -07:00
Jean Chalard f420df2823 Add support for German umlaut and French ligatures flags
Bug: 6202812
Change-Id: Ib4a7f96f6ef86c840069b15d04393f84d428c176
2012-04-06 17:07:29 +09:00
Jean Chalard b8060399c7 Remove constructors
And small cleanup.

Change-Id: I1de903f42c1b8d57a488be2162e0b94055a6d1f2
2012-04-06 16:53:15 +09:00
Jean Chalard 8cf1a8d04f Remove the shortcutOnly attribute which is now useless.
Change-Id: Ifccdfdaf7c0066bb7728981503baceff0fedb71f
2012-04-06 16:27:53 +09:00
Jean Chalard c734c2aca1 Add a simple way to input dictionary header attributes
Just add them as an attribute to the root of the XML node.

Bug: 6202812
Change-Id: Idf040bfebf20a72f9e4370930a85d97df593f484
2012-04-03 15:18:51 +09:00
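The message says header attributes are supplied as attributes of the XML root element. Below is a minimal sketch of collecting them into a map with the standard DOM parser. The root element name `wordlist` and the attribute names in the example (`locale`, `version`) are assumptions for illustration. Using DOM rather than a streaming parser is also an illustrative choice, not necessarily what makedict does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper: reads dictionary header attributes from the XML root element.
final class HeaderAttributes {
    // Returns every attribute of the root element, keyed by attribute name.
    static Map<String, String> read(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NamedNodeMap attrs = root.getAttributes();
            Map<String, String> header = new LinkedHashMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                header.put(attr.getNodeName(), attr.getNodeValue());
            }
            return header;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse word list XML", e);
        }
    }
}
```

With input such as `<wordlist locale="de" version="12">...</wordlist>`, the map holds `locale=de` and `version=12`. Any new attribute added to the root becomes a header entry without code changes.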
Jean Chalard e705a122d1 Remove useless adding of shortcut as unigrams.
Change-Id: I1f50ebf00d6dd0dad4114fad86ace5b7b304613a
2012-03-28 20:40:38 +09:00
Jean Chalard 752996540f Add read support for string shortcuts for makedict.
Change-Id: I48ee4fc9ac703ad2a680b3cd848de91c415ea3c8
2012-03-28 20:40:08 +09:00
Jean Chalard 3bbb31f3f0 Change the format of the shortcuts in the binary dict.
This only includes the write part of the change. The read part is
coming in a different commit.

Change-Id: Iabe7af6cd134462dc19245f5400719920ed31c8f
2012-03-28 20:24:07 +09:00
Tom Ouyang b163f91621 Merge "Add support for updating and adding bigrams to existing nodes."
2012-03-23 05:57:55 -07:00
Tom Ouyang 7cfe20efbe Add support for updating and adding bigrams to existing nodes.
Bug: 6188977
Change-Id: I48aca8ba199247d73395ab13b9d1976f4e739208
2012-03-23 21:52:39 +09:00
Ken Wakasa 066866954a Add a missing comparison in Word.equals()
Follow up to I94e2e29c

bug: 6209651
Change-Id: Iff2daca8c2678e2d1796f98d6db738f109e3d03f
2012-03-23 14:41:16 +09:00
Ken Wakasa 9f0ea52a5d Add missing Word.hashCode()
Some cleanups too.

bug: 6209651
Change-Id: I94e2e29c92e90e554e4952d277d590e093766c4f
2012-03-23 13:11:39 +09:00
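These two commits add `Word.hashCode()` and then a missing comparison in `Word.equals()`. The Java contract behind both fixes is that equal objects must have equal hash codes, so every field that feeds `hashCode()` must also be compared in `equals()`. Below is a minimal sketch of a consistent pair. The class `WordSketch` and its fields (`word`, `frequency`, `shortcutTargets`, `bigrams`) are hypothetical stand-ins, not the actual makedict `Word` class.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for makedict's Word; the field set is an assumption.
final class WordSketch {
    final String word;
    final int frequency;
    final List<String> shortcutTargets;
    final List<String> bigrams;

    WordSketch(String word, int frequency, List<String> shortcutTargets, List<String> bigrams) {
        this.word = Objects.requireNonNull(word);
        this.frequency = frequency;
        this.shortcutTargets = shortcutTargets;
        this.bigrams = bigrams;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordSketch)) return false;
        WordSketch other = (WordSketch) o;
        // Fields used in hashCode() must all be compared here, or equal objects could hash differently.
        return frequency == other.frequency
                && word.equals(other.word)
                && Objects.equals(shortcutTargets, other.shortcutTargets)
                && Objects.equals(bigrams, other.bigrams);
    }

    @Override
    public int hashCode() {
        return Objects.hash(word, frequency, shortcutTargets, bigrams);
    }
}
```

Leaving a field out of `equals()` while hashing it (or the reverse) breaks lookups in `HashMap` and `HashSet`. That is why the two fixes belong together.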
Ken Wakasa 2aa02b84a4 Revive the Makefile for makedict
Follow up to I4d2ef504. Also address a compiler warning and make a small optimization.

bug: 6188977
bug: 6209651
Change-Id: Ibc9da51d48ebf0b8815ad0bb2f697242970ba8f7
2012-03-22 11:55:18 +09:00
Tom Ouyang e276c2401e Move makedict to LatinIME android keyboard.
Bug: 6188977
Change-Id: I4d2ef504bb983abbda3cb52ee450cb46f58d95cf
2012-03-21 19:30:26 +09:00
satok 905670bd87 Add a dummy file and package for make dict
Change-Id: I195fd42f2a773bcc6fab0a61336a1c15d97902bb
2012-03-19 15:26:13 +09:00