LatinIME

Author	SHA1	Message	Date
Jean Chalard	54e84a00fc	Make a makedict command for dicttool (A3) This behaves exactly as the old makedict command. Further changes will redirect the calls to makedict to this, so as to consolidate similar code. Groundwork for Bug: 6429606 Change-Id: Ibeadbf48bec70f988a15ca36ebf5d1ce3b5b54ea	2012-08-04 01:11:46 +09:00
Jean Chalard	d10c473347	Small performance tweak Change-Id: Icd540742073d49d12e70b2d8bd99aaf7ccb5802d	2012-06-08 17:09:40 +09:00
Jean Chalard	7214617622	Remove a slew of Eclipse warnings. Change-Id: I03236386aea13fbd4fb8eaeee18e0008aa136502	2012-06-08 16:23:18 +09:00
Tadashi G. Takaoka	93ebf74bae	Clean up some compiler warnings Change-Id: I604da15e65fc3cf807ec4033df4e4cd5ef0196fc	2012-05-25 19:04:54 +09:00
Jean Chalard	418b343797	Use a formula packing more information into 4 bits field Bug: 6313806 Change-Id: Id0779bd69afae0bb4a4a285340c1eb306544663a	2012-05-15 18:59:21 +09:00
Jean Chalard	76319c6931	Small optimization Performance gain is < 2% Bug: 6394357 Change-Id: I2b7da946788cf11d1a491efd20fb2bd2333c23d1	2012-05-14 15:52:01 +09:00
Jean Chalard	4df5b43df8	Small optimizations Bug: 6394357 Change-Id: I00ba1b5ab3d527b3768e28090c758ddd1629f281	2012-05-14 15:51:58 +09:00
Jean Chalard	3b1b72ac4d	More optimizations We don't merge tails anyway, and we can't do it any more because that would break the bigram lookup algorithm. The speedup is about 20%, and possibly double this if there are no bigrams. Bug: 6394357 Change-Id: I9eec11dda9000451706d280f120404a2acbea304	2012-05-14 12:41:18 +09:00
Jean Chalard	12efad3d15	Some more obvious optimizations The speedup is about 15% Bug: 6394357 Change-Id: Ibd57363d9d793206dd916d8927366db4192083b6	2012-05-14 12:35:31 +09:00
Jean Chalard	47db0be7cb	Some obvious optimizations to makedict Bug: 6394357 Change-Id: Ibfd98aac2304ef50cf90b1de984736ddcfe7a4bc	2012-05-14 12:34:05 +09:00
Jean Chalard	f7346de94a	Write the bigram frequency following the new formula This also tests for bigram frequency against unigram frequency Bug: 6313806 Bug: 6028348 Change-Id: If7faa3559fee9f2496890f0bc0e081279e100854	2012-05-11 20:27:22 +09:00
Jean Chalard	4455fe2c89	Refactor a method Rename it, rename parameters, and add a parameter that will be necessary soon. Also, rescale the bigram frequency as necessary. Bug: 6313806 Change-Id: I192543cfb6ab6bccda4a1a53c8e67fbf50a257b0	2012-05-11 19:34:35 +09:00
Ken Wakasa	84478103ec	Tidy up the MakedictLog class. Follow up to I436b2b7b Change-Id: Id17b134dab2f876b874a505e92a379c8b5567fa4	2012-05-05 23:40:21 +09:00
Ken Wakasa	03b423f313	Suppress debug log from makedict in LatinIME bug: 6447900 Change-Id: I436b2b7b261b422a7edca9cb99a4689b63877fe0	2012-05-05 09:28:27 +09:00
Jean Chalard	20a6dea1ca	Add a flag for bigram presence in the header This is a cherry-pick of Icb602762 onto jb-dev. Bug: 6355745 Change-Id: Icb602762bb0d81472f024fa491571062ec1fc4e9	2012-04-26 16:40:29 +09:00
Jean Chalard	44c64f46a1	Ignore bigrams that are not also listed as unigrams This is a cherry pick of I14b67e51 on jb-dev Bug: 6340915 Change-Id: Iaa512abe1b19ca640ea201f9761fd7f1416270ed	2012-04-26 15:20:30 +09:00
Jean Chalard	805fed49e1	Merge "Fix binary reading code performance."	2012-04-23 23:39:37 -07:00
Jean Chalard	1d80a7f395	Fix binary reading code performance. This is not the Right fix; the Right fix would be to read the file in a buffered way. However, this delivers tolerable performance for a minimal amount of code changes. We may want to skip submitting this patch, but keep it around in case we need to use the functionality until we have a good patch. Change-Id: I1ba938f82acfd9436c3701d1078ff981afdbea60	2012-04-24 15:16:17 +09:00
Jean Chalard	a64a1a46e4	Fix a bug where a node size would be seen as increasing. The core reason for this is quite subtle. When a word is a bigram of itself, the corresponding chargroup will have a bigram referring to itself. When computing bigram offsets, we use cached addresses of chargroups, but we compute the size of the node as we go. Hence, a discrepancy may happen between the base offset as seen by the bigram (which uses the recomputed value) and the target offset (which uses the cached value). When this happens, the cached node address is too large. The relative offset is negative, which is expected, since it points to this very charnode whose start is a few bytes earlier. But since the cached address is too large, the offset is computed as smaller than it should be. On the next pass, the cache has been refreshed with the newly computed size and the seen offset is now correct (or at least, much closer to correct). The correct value is larger than the previously computed offset, which was too small. If it happens that it crosses the -255 or -65535 boundary, the address will be seen as needing 1 more byte than previously computed. If this is the only change in size of this node, the node will be seen as having a larger size than previously, which is unexpected. Debug code was catching this and crashing the program. So this case is very rare, but in an even rarer occurrence, it may happen that in the same node, another chargroup happens to decrease its size by the same amount. In this case, the node may be seen as having not been modified. This is probably extremely rare. If on top of this, it happens that no other node has been modified, then the file may be seen as complete, and the discrepancy left as is in the file, leading to a broken file. The probability that this happens is abysmally low, but the bug exists, and the current debug code would not have caught this.
To further catch similar bugs, this change also modifies the test that decides if the node has changed. On grounds that all components of a node may only decrease in size with each successive pass, it's theoretically safe to assume that the same size means the node contents have not changed, but in case of a bug like the bug above where a component wrongly grows while another shrinks and both cancel each other out, the new code will catch this. Also, this change adds a check against the number of passes, to avoid infinite loops in case of a bug in the computation code. This change fixes this bug by updating the cached address of each chargroup as we go. This eliminates the discrepancy and fixes the bug. Bug: 6383103 Change-Id: Ia3f450e22c87c4c193cea8ddb157aebd5f224f01	2012-04-24 14:04:02 +09:00
Tom Ouyang	df7ebbbd61	Change binary dictionary output buffer size to match dictionary size. Bug: 6355943 Change-Id: Iaab7bc16ba0dbc7bfde70b06e7bd355519838831	2012-04-19 10:18:57 -07:00
Jean Chalard	f420df2823	Add support for German umlaut and French ligatures flags Bug: 6202812 Change-Id: Ib4a7f96f6ef86c840069b15d04393f84d428c176	2012-04-06 17:07:29 +09:00
Jean Chalard	b8060399c7	Remove constructors And small cleanup. Change-Id: I1de903f42c1b8d57a488be2162e0b94055a6d1f2	2012-04-06 16:53:15 +09:00
Jean Chalard	8cf1a8d04f	Remove the shortcutOnly attribute which is now useless. Change-Id: Ifccdfdaf7c0066bb7728981503baceff0fedb71f	2012-04-06 16:27:53 +09:00
Jean Chalard	c734c2aca1	Add a simple way to input dictionary header attributes Just add them as an attribute to the root of the XML node. Bug: 6202812 Change-Id: Idf040bfebf20a72f9e4370930a85d97df593f484	2012-04-03 15:18:51 +09:00
Jean Chalard	e705a122d1	Remove useless adding of shortcut as unigrams. Change-Id: I1f50ebf00d6dd0dad4114fad86ace5b7b304613a	2012-03-28 20:40:38 +09:00
Jean Chalard	752996540f	Add read support for string shortcuts for makedict. Change-Id: I48ee4fc9ac703ad2a680b3cd848de91c415ea3c8	2012-03-28 20:40:08 +09:00
Jean Chalard	3bbb31f3f0	Change the format of the shortcuts in the binary dict. This only includes the write part of the change. The read part is coming in a different commit. Change-Id: Iabe7af6cd134462dc19245f5400719920ed31c8f	2012-03-28 20:24:07 +09:00
Tom Ouyang	b163f91621	Merge "Add support for updating and adding bigrams to existing nodes."	2012-03-23 05:57:55 -07:00
Tom Ouyang	7cfe20efbe	Add support for updating and adding bigrams to existing nodes. Bug: 6188977 Change-Id: I48aca8ba199247d73395ab13b9d1976f4e739208	2012-03-23 21:52:39 +09:00
Ken Wakasa	066866954a	Add a missing comparison in Word.equals() Follow up to I94e2e29c bug: 6209651 Change-Id: Iff2daca8c2678e2d1796f98d6db738f109e3d03f	2012-03-23 14:41:16 +09:00
Ken Wakasa	9f0ea52a5d	Add missing Word.hashCode() Some cleanups too. bug: 6209651 Change-Id: I94e2e29c92e90e554e4952d277d590e093766c4f	2012-03-23 13:11:39 +09:00
Ken Wakasa	2aa02b84a4	Revive the Makefile for makedict Follow up to I4d2ef504. Address a compiler warning and a small optimization as well. bug: 6188977 bug: 6209651 Change-Id: Ibc9da51d48ebf0b8815ad0bb2f697242970ba8f7	2012-03-22 11:55:18 +09:00
Tom Ouyang	e276c2401e	Move makedict to LatinIME android keyboard. Bug: 6188977 Change-Id: I4d2ef504bb983abbda3cb52ee450cb46f58d95cf	2012-03-21 19:30:26 +09:00
satok	905670bd87	Add a dummy file and package for make dict Change-Id: I195fd42f2a773bcc6fab0a61336a1c15d97902bb	2012-03-19 15:26:13 +09:00

34 commits