Commit graph

110 commits

Author SHA1 Message Date
Adrian Velicu
9f46834839 dicttool header to read stream exhaustively
Change-Id: I50a286c115f5bd6e93763bd2f79031676d6fffd8
2014-11-11 18:10:26 -08:00
Adrian Velicu
1e72f9da12 Dicttool to handle unpackaging non-latest version dicts
Change-Id: I738735186213b3a40eff997ae2fd83069c6445f1
2014-11-11 16:35:04 -08:00
Adrian Velicu
0691f29d36 Merge "Making 'dicttool header' output format version" 2014-11-11 01:34:36 +00:00
Adrian Velicu
8e394ffcf4 Making 'dicttool header' output format version
Change-Id: I4198f6b463711feb4ab78020934cca4d23870fbb
2014-11-08 10:35:37 +09:00
Jean Chalard
5b91b551e5 Move util classes under common
Also, why did we have two copies of LocaleUtils? >.>

Bug: 18108776
Change-Id: I03b4403dfd51934e66b567f2f8b87da419cfb3ab
2014-11-07 18:00:03 +09:00
Jean Chalard
5b33d197ba Add a header command to dicttool.
This will greatly improve the performance of the
metadata-generating files, as they won't have to wait for
the info command to read the entire dictionary when the
header is all they need.

Also add tests, and while we're at it, use the seed as
intended to enable reproducible tests.

Change-Id: I0ba79ef62f0292b23e63aed57ff565bb102281a2
2014-11-06 18:50:59 +09:00
Jean Chalard
f6b0e32df3 Add a *FAST* dictionary header reader.
It's still unused as of this change, but the next change will use it.

As a reference point, generating the metadata for Bayo takes
3'02" on my machine with the info command; it's down to 16" if
made to use this instead. The gain increases with the number
of dictionaries, obviously.

Change-Id: I0eeea2d8f81bb74b0d1570af658e91b56f7c2b79
2014-11-06 13:17:08 +09:00
Jean Chalard
5564317f83 Genericize getting a raw dictionary
This will allow for not copying the whole dictionary when only
the header is needed.

Change-Id: Ie4a649b507ccd4a430201824ed87b8b8bbf55e9f
2014-11-06 13:12:39 +09:00
Jean Chalard
ae55db95a7 Large simplification in obtaining a raw dictionary
That is where the last refactorings were leading. This code is
simpler, and it's also far more flexible. Importantly, it only makes
a single copy instead of making a full disk copy for every
intermediate step.
Next we're going to make the "copy" part modular for processes
that don't need to copy the whole file.

Change-Id: Ief32ac665d804b9b20c44f443a9c87452ceb367a
2014-11-05 12:27:35 +09:00
Keisuke Kuroyanagi
3cde19ded1 Merge "Initial commit for native dicttoolkit." 2014-10-31 11:29:20 +00:00
Keisuke Kuroyanagi
e101a53ffc Initial commit for native dicttoolkit.
Bug: 10059681

Change-Id: Ib730af8ebc944e08aaada869c0626724a499747c
2014-10-31 20:27:06 +09:00
Adrian Velicu
7c87859d4c Using "blacklist" flag as "possibly offensive"
Bug: 11031090
Change-Id: I5cc0d006ab003656498eb82b0875eb9c051d331e
2014-10-31 14:33:05 +09:00
Tadashi G. Takaoka
067d8cdf56 Fix unit test breakage
Change-Id: I538288054a58eb2c81ce3cbe5c9bef900fb653a5
2014-10-24 16:48:46 +09:00
Jean Chalard
9e58ae4698 Merge "Some more simplification of DecoderSpec works" 2014-10-24 03:56:22 +00:00
Jean Chalard
40c11fdbff Merge "Simplify handling of steps in DecoderChainSpec" 2014-10-24 03:50:47 +00:00
Jean Chalard
afdde63374 Some more simplification of DecoderSpec works
Change-Id: I23fa4e4ed96228406e70aa94d84fd7b8d3f69347
2014-10-23 16:57:14 +09:00
Jean Chalard
52e92b8a3f Simplify handling of steps in DecoderChainSpec
This is a preliminary refactoring change to improve performance
in dicttool diagnostic tools.

Change-Id: I9a59328af62e336809246be5bebbbf2e154366b3
2014-10-23 16:57:11 +09:00
Tadashi G. Takaoka
92d073c2fd Remove unused import and method
Bug: 18003991
Change-Id: Id6b67bf66b397301e5186826dba2b60df9cb4c65
2014-10-23 16:37:07 +09:00
Tadashi G. Takaoka
d3a4c51324 Fix Javadoc and null analysis related warnings
This CL also adds @SuppressWarnings("unused") to the java-overridable package.

Bug: 18003991
Change-Id: If70527e30654384705d7a814f5efd181d9f539e1
2014-10-23 09:58:42 +09:00
Jean Chalard
90aa229f01 Remove XML input/output from dicttool.
This hasn't been used for a while. It's deprecated. Let's kill it.

Change-Id: Ib1c491fa14b6406f6f77f2b0869f4db1810eb078
2014-10-22 17:28:33 +09:00
Tadashi G. Takaoka
5f00fe09e9 Fix some compiler warnings
This CL fixes the following compiler warnings.

- Indirect access to static member
- Access to a non-accessible member of an enclosing type
- Parameter assignment
- Method can be static
- Local variable declaration hides another field or variable
- Value of local variable is not used
- Unused import
- Unused private member
- Unnecessary 'else' statement
- Unnecessary declaration of throw exception
- Redundant type arguments
- Missing '@Override' annotation
- Unused '@SuppressWarning' annotations

Bug: 18003991
Change-Id: Icfebe753e53a2cc621848f769d6a3d7ce501ebc7
2014-10-21 19:28:37 +09:00
Adrian Velicu
05172bf1a5 Renaming "blacklist" flag to "possibly offensive"
No behaviour changes.
Unified the overloaded FusionDictionary::add method to always take an
isPossiblyOffensive argument.

Bug: 11031090
Change-Id: I5741a023ca1ce842d2cf10d4f6c926b0efabaa78
2014-10-21 11:51:47 +09:00
Akifumi Yoshimoto
7e5614520a Merge "Include a code point table in the binary dictionary." 2014-10-02 08:55:18 +00:00
Akifumi Yoshimoto
9168ab60cf Include a code point table in the binary dictionary.
Bug: 17097992
Change-Id: I677a5eb3a704e4386f6573360e44ca335d81d2df
2014-10-02 12:27:49 +09:00
Keisuke Kuroyanagi
c6a6f6a990 Introduce NgramProperty in Java side.
Bug: 14425059
Change-Id: I8b3458ad22730b3dccbe0caea2c5930f5276dc82
2014-10-01 11:21:08 +09:00
Akifumi Yoshimoto
f4329f7fff Read dicttool option for switching code point table
Bug: 17097992
Change-Id: I0b3f12c4450f784b9a33470d1dc4c306062de91e
2014-09-26 15:15:10 +09:00
Tadashi G. Takaoka
fec4769e0b Refactor dicttool with try-with-resource
This CL must be checked in together with Idd7c744d0f.

Change-Id: Ia0ff09a054c1852b39cdce22a4377108afb254e2
2014-06-22 23:20:37 -07:00
Tadashi G. Takaoka
a91561aa58 Use Java 7 diamond operator
Change-Id: If16ef50ae73147594615d0f49d6a22621eaf1aef
2014-05-24 01:05:42 +09:00
Jean Chalard
7086d88d3e Have dicttool test tidy up after itself.
Bug: 13776363
Change-Id: Icb1d3fc0efe71e0339b434928e8aed507f2fb590
2014-05-23 19:56:57 +09:00
Keisuke Kuroyanagi
93cda5bb39 Move code only used for dicttool and tests under tests.
Bug: 13035567
Change-Id: I13c6df013ef2b67c9bf67455d9c32d283bf9ea2e
2014-03-27 15:30:32 +09:00
Keisuke Kuroyanagi
f14cf3e64c Fix: dicttool build.
Change-Id: I5c3bcbe9f3054bdd1a760398fe11344e0e05ac6a
2014-03-07 13:01:48 +00:00
Keisuke Kuroyanagi
3ad4af2354 Move DictionaryOptions from FusionDictionary to FormatSpec.
Bug: 8187060
Bug: 13035567

Change-Id: Id4f45e589521ae98c926a4c0607be10ce1a983f2
2014-03-06 18:53:09 +09:00
Keisuke Kuroyanagi
516f86815d Separate WeightedString from FusionDictionary.
Bug: 8187060

Change-Id: I40c1dafca3eb52244c64fdb4c1db30a56385d678
2014-03-06 18:53:06 +09:00
Keisuke Kuroyanagi
36305d4207 Fix: dicttool build.
Change-Id: I592b14eba895786d0981586a01ef545e003396c8
2014-02-28 19:04:49 +09:00
Jean Chalard
890b44e537 Correctly read the header of APK-embedded dicts
Bug: 13164518
Change-Id: I8768ad887af8b89ad9f29637f606c3c68629c7ca
2014-02-24 22:54:01 +09:00
Keisuke Kuroyanagi
8e3a1d0f89 Remove unused argument from readDictionaryBinary.
Bug: 12810574
Change-Id: Ice415ebd8d11162facca3fe8927ef8a616b11424
2014-02-14 19:02:15 +09:00
Keisuke Kuroyanagi
69ccac6e51 Remove unused code.
Bug: 12810574
Change-Id: If0ef02a984469a3b6e0c00b1c3c8d98d0d2b5466
2014-02-10 15:05:11 +09:00
Keisuke Kuroyanagi
8ffc631826 Make PtNode have ProbabilityInfo instead of raw value.
Bug: 11281877
Bug: 12810574
Change-Id: Id1cda0afc74c4e30633c735729143491b2274a7b
2014-02-10 15:05:08 +09:00
Keisuke Kuroyanagi
b24de426fc Use CombinedFormatUtils to convert dict elements to strings.
Bug: 11281877
Bug: 12810574
Change-Id: Ib631f75eab73abc9877a7698171c45e8f2fc7600
2014-02-06 16:09:25 +09:00
Keisuke Kuroyanagi
5f5feeba13 Consolidate WordProperty and Word.
Bug: 11281877
Bug: 12810574
Change-Id: I9dc99188f80f25a8780c1860dab46e4aa80a23e5
2014-02-06 15:13:33 +09:00
Keisuke Kuroyanagi
df1d3e733e Make WeightedString have ProbabilityInfo.
Bug: 11281877
Bug: 12810574
Change-Id: I265e3d8654c75766cd0e0d09d67ef62b4566298a
2014-02-05 21:44:55 +09:00
Keisuke Kuroyanagi
c2fd53ee0e Remove ver4 dict updater.
Change-Id: I468994c98d091be621b9fb3fbe6405c67fc6a465
2013-12-17 18:17:51 +09:00
Jean Chalard
b868375763 Fix failing tests
- Version 3 is not supported
- Now passing the right string to open v4 dicts. Fix the tests for this.

Change-Id: I7829330c3568a715b96396ba4e4e69c6e17775ab
2013-12-16 14:32:19 +09:00
Jean Chalard
a245d15da5 Have dicttool use the native library to generate v4 dicts.
Yay !

Change-Id: Iea8ced9e81031b9ab7eff05ad9ef7215be248de9
2013-12-13 18:18:20 +09:00
Jean Chalard
7b55cd3e2b Remove flags from Java side.
This simplifies the code quite a bit.
- GERMAN_UMLAUTS are now handled through a key-value attribute.
  The dictionary generator does not need to know about it any more.
- FRENCH_LIGATURES are deprecated as we handle them with shortcuts now.
- CONTAINS_BIGRAMS is deprecated. Bigram processing is always applied
  regardless of this flag.

Bug: 11281748
Change-Id: If567e52e245a9342adc7f3104a0f7d8d782df8c1
2013-12-13 18:15:05 +09:00
Ken Wakasa
2fa3693c26 Reset to 9bd6dac470
The bulk merge from -bayo to klp-dev should not have been merged to master.

Change-Id: I527a03a76f5247e4939a672f27c314dc11cbb854
2013-12-13 17:13:32 +09:00
Yuichiro Hanada
73665510ca Show more messages when reading a compressed combined format file.
Change-Id: I51a1b9454fcfe656e0fcf762dcfd9ecbadde86c3
2013-10-08 17:05:39 +09:00
Yuichiro Hanada
48e01ec111 Make dicttool read the compressed combined format.
Change-Id: Ib39fa110402895a655f4e705caae53397ace9259
2013-09-30 14:59:19 +09:00
Yuichiro Hanada
51a590b2fe Fix getDictionary.
Change-Id: I6bc3ec8dd4397a9aaf9dca2f16ce8a1929a47e9e
2013-09-26 15:26:31 +09:00
Yuichiro Hanada
fa68e2cdf5 Add a new option for version 4 to dicttool.
Change-Id: I18fd48c1f6921758d30330fbc77f4a917c33f1c8
2013-09-19 11:59:42 +09:00