/*
 * Copyright (C) 2012 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.android.inputmethod.research;

import android.util.Log;

import com.android.inputmethod.latin.Dictionary;
import com.android.inputmethod.latin.Suggest;
import com.android.inputmethod.latin.define.ProductionFlag;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Random;

/**
 * MainLogBuffer is a FixedLogBuffer that tracks the state of LogUnits to make privacy guarantees.
 *
 * There are three forms of privacy protection: 1) only words in the main dictionary are allowed to
 * be logged in enough detail to determine their contents, 2) only a subset of words are logged
 * in detail, such as 10%, and 3) no numbers are logged.
 *
 * This class maintains a list of LogUnits, each corresponding to a word.  As the user completes
 * words, they are added here.  But if the user backs up over their current word to edit a word
 * entered earlier, then it is pulled out of this LogBuffer, changes are then added to the end of
 * the LogUnit, and it is pushed back in here when the user is done.  Because words may be pulled
 * back out even after they are pushed in, we must not publish the contents of this LogBuffer too
 * quickly.  However, we cannot let the contents pile up either, or it will limit the editing that
 * a user can perform.
 *
 * To balance these requirements (keep history so user can edit, flush history so it does not pile
 * up), the LogBuffer is considered "complete" when the user has entered enough words to form an
 * n-gram, followed by enough additional non-detailed words (that are in the 90%, as per above).
 * Once complete, the n-gram may be published to flash storage (via the ResearchLog class).
 * However, the additional non-detailed words are retained, in case the user backspaces to edit
 * them.  The MainLogBuffer then continues to add words, publishing individual non-detailed words
 * as new words arrive.  After enough non-detailed words have been pushed out to account for the
 * 90% between words, the words at the front of the LogBuffer can be published as an n-gram again.
 *
 * If the words that would form the valid n-gram are not in the dictionary, then words are pushed
 * through the LogBuffer one at a time until an n-gram is found that is entirely composed of
 * dictionary words.
 *
 * If the user closes a session, then the entire LogBuffer is flushed, publishing any embedded
 * n-gram containing dictionary words.
 */
public abstract class MainLogBuffer extends FixedLogBuffer {
    private static final String TAG = MainLogBuffer.class.getSimpleName();
    private static final boolean DEBUG = false && ProductionFlag.IS_EXPERIMENTAL_DEBUG;

    // The size of the n-grams logged.  E.g. N_GRAM_SIZE = 2 means to sample bigrams.
    public static final int N_GRAM_SIZE = 2;

    // Whether all words should be recorded, leaving no unsampled words between bigrams.  Useful
    // for testing.
    /* package for test */ static final boolean IS_LOGGING_EVERYTHING = false
            && ProductionFlag.IS_EXPERIMENTAL_DEBUG;

    // The number of words between n-grams to omit from the log.
    private static final int DEFAULT_NUMBER_OF_WORDS_BETWEEN_SAMPLES =
            IS_LOGGING_EVERYTHING ? 0 : (DEBUG ? 2 : 18);

    private Suggest mSuggest;
    private boolean mIsStopping = false;

    /* package for test */ int mNumWordsBetweenNGrams;

    // Counter for words left to suppress before an n-gram can be sampled.  Reset to
    // mNumWordsBetweenNGrams after a sample is taken.
    /* package for test */ int mNumWordsUntilSafeToSample;

    public MainLogBuffer() {
        super(N_GRAM_SIZE + DEFAULT_NUMBER_OF_WORDS_BETWEEN_SAMPLES);
        mNumWordsBetweenNGrams = DEFAULT_NUMBER_OF_WORDS_BETWEEN_SAMPLES;
        final Random random = new Random();
        mNumWordsUntilSafeToSample = DEBUG ? 0 : random.nextInt(mNumWordsBetweenNGrams + 1);
    }

    public void setSuggest(final Suggest suggest) {
        mSuggest = suggest;
    }

    public void resetWordCounter() {
        mNumWordsUntilSafeToSample = mNumWordsBetweenNGrams;
    }

    public void setIsStopping() {
        mIsStopping = true;
    }

    /**
     * Determines whether uploading the n words at the front of the MainLogBuffer will not violate
     * user privacy.
     *
     * The size of the MainLogBuffer is just enough to hold one n-gram, its corrections, and any
     * non-character data that is typed between words.  The decision about privacy is made based on
     * the buffer's entire content.  If it is decided that the privacy risks are too great to upload
     * the contents of this buffer, a censored version of the LogItems may still be uploaded.  E.g.,
     * the screen orientation and other characteristics about the device can be uploaded without
     * revealing much about the user.
     */
    private boolean isSafeNGram(final ArrayList<LogUnit> logUnits, final int minNGramSize) {
        // Bypass privacy checks when debugging.
        if (IS_LOGGING_EVERYTHING) {
            if (mIsStopping) {
                return true;
            }
            // Only check that it is the right length.  If not, wait for later words to make
            // complete n-grams.
            int numWordsInLogUnitList = 0;
            final int length = logUnits.size();
            for (int i = 0; i < length; i++) {
                final LogUnit logUnit = logUnits.get(i);
                final String word = logUnit.getWord();
                if (word != null) {
                    numWordsInLogUnitList++;
                }
            }
            return numWordsInLogUnitList >= minNGramSize;
        }

        // Check that we are not sampling too frequently.  Having sampled recently might disclose
        // too much of the user's intended meaning.
        if (mNumWordsUntilSafeToSample > 0) {
            return false;
        }
        if (mSuggest == null || !mSuggest.hasMainDictionary()) {
            // Main dictionary is unavailable.  Since we cannot check it, we cannot tell if a
            // word is out-of-vocabulary or not.  Therefore, we must judge the entire buffer
            // contents to potentially pose a privacy risk.
            return false;
        }
        // Reload the dictionary in case it has changed (e.g., because the user has changed
        // languages).
        final Dictionary dictionary = mSuggest.getMainDictionary();
        if (dictionary == null) {
            return false;
        }

        // Check each word in the buffer.  If any word poses a privacy threat, we cannot upload
        // the complete buffer contents in detail.
        int numWordsInLogUnitList = 0;
        final int length = logUnits.size();
        for (int i = 0; i < length; i++) {
            final LogUnit logUnit = logUnits.get(i);
            if (!logUnit.hasWord()) {
                // Digits outside words are a privacy threat.
                if (logUnit.mayContainDigit()) {
                    return false;
                }
            } else {
                numWordsInLogUnitList++;
                final String word = logUnit.getWord();
                // Words not in the dictionary are a privacy threat.
                if (ResearchLogger.hasLetters(word) && !(dictionary.isValidWord(word))) {
                    if (DEBUG) {
                        Log.d(TAG, "NOT SAFE!: hasLetters: " + ResearchLogger.hasLetters(word)
                                + ", isValid: " + (dictionary.isValidWord(word)));
                    }
                    return false;
                }
            }
        }

        // Finally, only return true if the minNGramSize is met.
        return numWordsInLogUnitList >= minNGramSize;
    }

    public void shiftAndPublishAll() {
        final LinkedList<LogUnit> logUnits = getLogUnits();
        while (!logUnits.isEmpty()) {
            publishLogUnitsAtFrontOfBuffer();
        }
    }

    @Override
    protected final void onBufferFull() {
        publishLogUnitsAtFrontOfBuffer();
    }

    protected final void publishLogUnitsAtFrontOfBuffer() {
        ArrayList<LogUnit> logUnits = peekAtFirstNWords(N_GRAM_SIZE);
        if (isSafeNGram(logUnits, N_GRAM_SIZE)) {
            // Good n-gram at the front of the buffer.  Publish it, disclosing details.
            publish(logUnits, true /* canIncludePrivateData */);
            shiftOutWords(N_GRAM_SIZE);
            resetWordCounter();
        } else {
            // No good n-gram at front, and buffer is full.  Shift out the first word (or if there
            // is none, the existing logUnits).
            logUnits = peekAtFirstNWords(1);
            publish(logUnits, false /* canIncludePrivateData */);
            shiftOutWords(1);
        }
    }

    /**
     * Called when a list of logUnits should be published.
     *
     * It is the subclass's responsibility to implement the publication.
     *
     * @param logUnits The list of logUnits to be published.
     * @param canIncludePrivateData Whether the private data in the logUnits can be included in
     * publication.
     */
    protected abstract void publish(final ArrayList<LogUnit> logUnits,
            final boolean canIncludePrivateData);

    @Override
    protected void shiftOutWords(int numWords) {
        int oldNumActualWords = getNumActualWords();
        super.shiftOutWords(numWords);
        int numWordsShifted = oldNumActualWords - getNumActualWords();
        mNumWordsUntilSafeToSample -= numWordsShifted;
        if (DEBUG) {
            Log.d(TAG, "wordsUntilSafeToSample now at " + mNumWordsUntilSafeToSample);
        }
    }
}