Commit Graph

53 Commits (78212a6d3de2c1fdaa394c58a16cbdee3ad5d046)

Author SHA1 Message Date
Keisuke Kuroyanagi 78212a6d3d Use enum to specify ngram type.
Change-Id: Ie28768ceadcd7a2d940c57eb30be7d4c364e509f
2014-11-25 19:07:10 +09:00
Keisuke Kuroyanagi 2383575d2d Change v403 historical info format.
count -> 2B, level -> 0B.

Change-Id: I3b241126f56eb33cdf09cb1ebfed04f534e4ec48
2014-10-31 17:22:13 +09:00
Keisuke Kuroyanagi bcb52d73e2 Enable count based dynamic ngram language model for v403.
Bug: 14425059

Change-Id: Icc15e14cfd77d37cd75f75318fd0fa36f9ca7a5b
2014-10-30 23:38:19 +09:00
Keisuke Kuroyanagi 660b00477c Add DynamicLanguageModelProbabilityUtils.
Bug: 14425059
Change-Id: Ia58ab3f0ead02798046d182a9464dcbd95f086bc
2014-10-30 21:33:57 +09:00
Keisuke Kuroyanagi c2ba0ce411 Fix: TRT and ime-simulator bulid.
Change-Id: I1697a907562d1ed6aff2b001763d1594263ba0d3
2014-10-30 01:01:40 +09:00
Keisuke Kuroyanagi 6b0561f9d2 Add a class to have global counters for LanguageModelDictContent.
Bug: 14425059
Change-Id: I08ec19903432356b6028853fd73b4eefce20218e
2014-10-29 21:05:41 +09:00
Keisuke Kuroyanagi b5ef884fbb Support dumping ngram entries.
Bug: 14425059
Change-Id: Ib03a0c3d166ed6f1e60c67127b28006d55143b6b
2014-10-22 18:15:53 +09:00
Keisuke Kuroyanagi c9865785f4 Support ngram entry migration.
Bug: 14425059
Change-Id: I98cb9fa303af2d93a0a3512e8732231c564e3c5d
2014-10-22 11:31:16 +09:00
Keisuke Kuroyanagi 47fc656cd7 Use EntryCounters during GC.
Bug: 14425059
Change-Id: I61eb798686dc753fb6c0fe99a0719c1732198f30
2014-10-21 16:36:03 +09:00
Keisuke Kuroyanagi e8750d970e Introduce EntryCounters to count entries in a dictionary.
Bug: 14425059

Change-Id: Ic13ba827d96fa4a147485ba92fdb37e23e04e8e8
2014-10-21 15:46:14 +09:00
Keisuke Kuroyanagi 3601c214f8 Update useless n-gram entry detection logic during GC.
Bug: 14425059
Change-Id: Ib939deae5b60167751dee07965bb1ef1a43c4625
2014-10-15 20:43:27 +09:00
Keisuke Kuroyanagi 72d17d9209 Use better conditional probability for ngram entries.
Old:
P(W | W_prev) = f(W, W_prev) + C
New:
P(W | W_prev) = f(W, W_prev) / f(W_prev)

Bug: 14425059
Bug: 16547409

Change-Id: I4d13be6de2c6bad6bad7fb22320a23ba4ecd361c
2014-10-15 18:23:00 +09:00
Keisuke Kuroyanagi 5400701908 Move entry updating method to language model dict content.
Bug: 14425059
Change-Id: I710055490d141539458cbf968adf5a7ccffd9552
2014-10-15 12:29:31 +09:00
Keisuke Kuroyanagi 287e155e44 Move HistoricalInfo to property and use it in *Property.
Bug: 14425059
Change-Id: Icccccabad98fb543c6a6be2844cfc0086d80b739
2014-10-01 11:39:33 +09:00
Keisuke Kuroyanagi 79bb37d499 Rename BigramProperty to NgramProperty.
Remaining work is changing bigram to ngram for supporting
ngram entry counting, dumping, and migration.

Bug: 14425059
Change-Id: Ifba288a1166996d62a5e57698f63537ea0a2a8ee
2014-09-29 19:10:39 +09:00
Keisuke Kuroyanagi cb4f544198 Quit reading unigram probability in Ver4PatriciaTrieNodeReader.
Bug: 14425059
Change-Id: I4fc7b0e236151a2c64e7131772264024c6597633
2014-09-25 11:41:50 +09:00
Keisuke Kuroyanagi 7d911d6f91 Move word flags to language model dict content.
Bug: 14425059
Change-Id: I64712e5c83d0bc241e6f0f16117ab47b5d75bd4b
2014-09-24 14:15:34 +09:00
Keisuke Kuroyanagi 09c154925f Add firstOrDefault and lastOrDefault to IntArrayView.
Change-Id: I854c02eff3fa0b53c72a5f1cabce001f4854ada0
2014-09-17 21:16:31 +09:00
Keisuke Kuroyanagi 4926b90ec5 Support n-gram for look-up.
Bug: 14425059
Change-Id: I19523c29fb802cd65158c7540d1608e7f55c4ca7
2014-09-17 16:20:00 +09:00
Keisuke Kuroyanagi 36ba139ca6 Support decaying dict in getWordProbability().
Bug: 14425059
Change-Id: I24db3f9131c2999fc388035dc365c7faaef3bdb1
2014-09-14 17:29:50 +09:00
Keisuke Kuroyanagi 395fe8e98d Implement LanguageModelDictContent.getWordProbability().
Bug: 14425059
Change-Id: I290a05cee6f341caa25fb222892505529cef1eb7
2014-09-10 19:51:12 +09:00
Keisuke Kuroyanagi 93e3b5a16f Add TerminalPositionLookupTableTest.
Change-Id: I4a3ab4c94a7759d7f24c7edc9c167fe6bbdd3eb7
2014-08-29 14:16:15 +09:00
Keisuke Kuroyanagi fe395232d6 Remove bigram dict content.
Bug: 14425059
Change-Id: I75918c6761a50832da511088eb83becd56b23662
2014-08-27 20:05:59 +09:00
Keisuke Kuroyanagi 758d093644 Get entry count after truncation using LanguageModelDictContent.
Bug: 14425059
Change-Id: I41b237c1c22c21740946d52e3be9d6f963c9cd54
2014-08-27 20:04:39 +09:00
Keisuke Kuroyanagi 07b3b41c25 Add a method to iterate entries in LanguageModelDictContent.
Bug: 14425059
Change-Id: I4e9c3a97891c020f762fa709f806d333c067f496
2014-08-26 12:01:08 +09:00
Keisuke Kuroyanagi 063f86d40f Truncate entries in language model dict content.
Bug: 14425059

Change-Id: I023c1d5109a2c43fcea3bb11a0fd7198c82891ba
2014-08-22 20:13:04 +09:00
Keisuke Kuroyanagi 9aa6699107 Update probabilities in language model dict content for GC.
Bug: 14425059
Change-Id: I354408afd8e5c1955ff0acea3d0243d628fe3843
2014-08-22 20:07:54 +09:00
Keisuke Kuroyanagi ace03d7919 Merge "Add BoS flag in probability entry." 2014-08-16 04:15:21 +00:00
Keisuke Kuroyanagi 623067a183 Add BoS flag in probability entry.
Bug: 14425059

Change-Id: I50439630034ada0280c44cbbb308aa0b95b72048
2014-08-19 11:49:05 +09:00
Keisuke Kuroyanagi bfcd5efd50 Merge "Use byte array view in ver4 dict contents." 2014-08-16 04:15:21 +00:00
Keisuke Kuroyanagi 1f6e52ef02 Use byte array view in ver4 dict contents.
Change-Id: Icf79a51a200f7ccd775264d1a83dd61e7dcfbab2
2014-08-18 22:46:10 +09:00
Keisuke Kuroyanagi d3097c67ca Remove entry from language model dict content.
Bug: 14425059
Change-Id: Iea51c0ae908d499da19839de06222a1c4d19088e
2014-08-18 12:34:50 +09:00
Keisuke Kuroyanagi b4531d861e Add method to remove entry from language model dict content.
Bug: 14425059
Change-Id: Id21af0110e770caa3e95cb5d7ba8b3d1af8e0b12
2014-08-18 12:34:48 +09:00
Keisuke Kuroyanagi 9a23f0fba2 Add bigrams to language model content.
Bug: 14425059

Change-Id: Id81e3775ea0104750a23e3dca62c00681ed8dc2e
2014-08-12 20:32:42 +09:00
Keisuke Kuroyanagi 03dc44f543 Add/Get n-gram probability entry in languageModelDictContent
Bug: 14425059
Change-Id: I7926c3812f89b9a71fe1873a5bc32f793f91b640
2014-08-06 00:42:56 +00:00
Keisuke Kuroyanagi 851e0458fe Remove ProbabilityDictContent and use LanguageModelDictContent
Bug: 14425059
Change-Id: I1bb9e78ecb24139b87c99be6722e37eec0a2285d
2014-08-05 14:13:07 +09:00
Keisuke Kuroyanagi 0889484266 Add methods for unigrams to LanguageModelDictContent.
Bug: 14425059
Change-Id: I0a6b480a3d4735787ffac68c47b4ffefc3f1b8a5
2014-08-05 12:38:55 +09:00
Keisuke Kuroyanagi c4696b2eb6 Save language model in the body buffer.
Bug: 14425059
Change-Id: Iaec277f7bed03d6c6780c6ce90fbe5fe799e175e
2014-08-01 20:19:16 +09:00
Keisuke Kuroyanagi c0c674cdc0 Make MmappedBuffer use byte array view.
Bug: 16691311
Change-Id: I2122c01ee27c33e11dec52643925c069927bea2b
2014-08-01 19:26:01 +09:00
Keisuke Kuroyanagi dc3856d758 Add LanguageModelDictContent.
This class will replace BigramDictContent and
ProbabilityDictContent.

Bug: 14425059
Change-Id: I3d15c833957e27b2f5999386db042188272bbb4b
2014-08-01 12:45:00 +09:00
Keisuke Kuroyanagi 90b7c1729f Remove DictContent.
Bug: 14425059
Change-Id: I74fa4b6ba4605447c1c87427371e4be5eb8e7ae6
2014-08-01 12:06:21 +09:00
Keisuke Kuroyanagi b22f95ec8a Remove isUpdatable from constructors of dict contents.
Change-Id: I2d54f477d9b341e944e265786a734f23d152bb81
2014-07-11 15:23:55 +09:00
Keisuke Kuroyanagi 2ac934296c Concatenate dict buffers other than header to a single file.
Bug: 13664080
Change-Id: I34c9d8046b339c9b855be378a5fad907382d1359
2014-07-11 15:15:47 +09:00
Keisuke Kuroyanagi 804f7450fc Use linked list for bigram list.
BinaryDictionaryTests for VERSION4_DEV:
Before
Time: 36.461
After
Time: 33.031

Bug: 14425059

Change-Id: I9ca2714f450f61f713df6ebd34c953dece991cdb
2014-07-07 21:09:25 +09:00
Keisuke Kuroyanagi f9ce867d80 Add boundary check for v4 bigram reading.
Bug: 14496386
Change-Id: Iedd3445c3222a777a2476beed7d9eb53773f406c
2014-05-27 19:29:35 +09:00
Keisuke Kuroyanagi 64341927d2 Quit use bigram probability diff for ver4 dict.
Change-Id: I2cfcfbcf351877d1dff466a24974dbb05908f14e
2014-05-15 16:02:58 +09:00
Keisuke Kuroyanagi ad518d9a5b Avoid copying bigram list if possible.
Constructing en_US main dict using dicttool:
Before:
real    1m8.699s
user    1m10.600s
sys     0m2.390s
After:
real    0m17.204s
user    0m20.560s
sys     0m0.720s


Bug: 13406708
Change-Id: I3b0476be57e5cb93c6497025b3ffa7064ac326c6
2014-05-08 14:19:33 +09:00
Keisuke Kuroyanagi 8d8fb396a0 Add new bigram entry at the tail of existing list.
Bug: 13406708
Change-Id: If3162e65fc9aa2c47f046aee528276cb51fad9f4
2014-05-01 19:29:43 +09:00
Ken Wakasa 8ca9be17db s/hash_map_compat/unordered_map/
Change-Id: Icce5f9a12b04bdd7540c52750d303a585d71f28a
2014-04-11 18:07:59 +09:00
Keisuke Kuroyanagi 4ce480d5ce Use unique_ptr.
Change-Id: Id92a5b07da4f7f95e2cd293ce8dc1a5f979b7853
2014-03-07 14:31:54 +09:00