There are two problems here. The first one is the tests would send
an invalid unicode character. Although we could want dicttool to
handle this more gracefully, it's fine for now.
The second problem is much more serious. If a node has more than
128 children, then the java code will crash trying to read the
dictionary back because of a bug that this change fixes. In
theory, it's possible that happens when we try to load the user
history dictionary back from the disk - native code is not affected
so there is no other point that may cause a problem.
In the practice, that means you'd need to have 129 words with a
common prefix (including empty string) but all different after
this. It's almost impossible with Google Keyboard since there are
only so many keys on the keyboard that you can make a word out
of, and then again you'd have to do it repeatedly until it
actually enters the user history dictionary, wait for it to get
saved on the disk.
The bad news is, if you manage to get this far, the keyboard will
crash every time and won't be able to get up until you clear
data for the package.
The good news is, the dictionary itself is not corrupted and only
the reading code is wrong. So updating to a newer version would
actually even recover from this situation.
All in all, considering how almost-impossible this is to trigger,
I don't think even a single user actually did hit this bug.
Bug: 8583091
Change-Id: Iabb2a7f47cbd9ed3193d2a3487318d280753e071
Both bugs only affect debug mode. One has the wrong object tested
with equals, the other has the iteration failing in some cases.
Change-Id: Ie9100d257a3f9e3be340cf3e38116f63417bdc1a
The important bug is in findWordInTree. The problem, which is
not obvious, is that we were calling codePointAt() with the
code point index in the string, instead of the char index.
The other bug this change fixes was harmless in the practice,
because it's in the iteration which is only used for debug and
pretty printing purposes. It's very similar in that it would
substract a length in code point to a length in chars and
truncate a StringBuilder at that length, so it would fail in a
quite similar manner. This changes the meaning of the "length"
attribute in Position, but it's clearer this way anyway.
Bug: 8450145
Change-Id: If396f883a9e6449de39351553ba83f5be5bd30f0
We used to make the dictionary that we passed to the
dictionary pack as an initial value based on the locale.
This is wrong - it should be read from the dictionary.
This change fixes that.
Bug: 7005813
Change-Id: Ib08ed31dd9c216f6f7b9c6c3174ca514bf96e06f
The "correct" bigram frequency is now returned by the reading
code. However, as the binary format represents the frequency
in a lossy manner, the frequency is not guaranteed to be the
exact same as the one in the source text format - only a close
enough value. It is however the exact same value seen by the
native code.
Bug: 7395653
Change-Id: I49199ef18901c671189912b3550623e9643baedd