fitsemiself
Label data using semi-supervised self-training method
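Syntax
Mdl = fitsemiself(Tbl,ResponseVarName,UnlabeledTbl)
Mdl = fitsemiself(Tbl,formula,UnlabeledTbl)
Mdl = fitsemiself(Tbl,Y,UnlabeledTbl)
Mdl = fitsemiself(X,Y,UnlabeledX)
Mdl = fitsemiself(___,Name,Value)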
Description
fitsemiself creates a semi-supervised self-training model
given labeled data, labels, and unlabeled data. The returned model contains the fitted labels
for the unlabeled data and the corresponding scores. This model can also predict labels for
unseen data using the predict object function. For more information on
the labeling algorithm, see Algorithms.
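The following script is a minimal sketch of that workflow using the X,Y,UnlabeledX syntax. It assumes the fisheriris sample data set (predictors meas, labels species) that ships with Statistics and Machine Learning Toolbox. The split into 30 labeled, 110 unlabeled, and 10 unseen observations is an arbitrary illustrative choice.

```matlab
% Load Fisher's iris data: meas is 150-by-4, species is a 150-by-1 cell array
load fisheriris
rng('default')                         % make the random split reproducible
idx = randperm(150);

X          = meas(idx(1:30),:);        % labeled predictors (illustrative split)
Y          = species(idx(1:30));       % labels for X
UnlabeledX = meas(idx(31:140),:);      % observations to be labeled
NewX       = meas(idx(141:150),:);     % unseen observations for predict

% Fit the self-training model and label UnlabeledX
Mdl = fitsemiself(X,Y,UnlabeledX);

% Fitted labels and their scores for the unlabeled observations
fitted = Mdl.FittedLabels;
scores = Mdl.LabelScores;

% Label unseen data with the trained model
[newLabels,newScores] = predict(Mdl,NewX);
```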
Mdl = fitsemiself(Tbl,ResponseVarName,UnlabeledTbl) uses the labeled data in Tbl, where
Tbl.ResponseVarName contains the labels for the labeled data, and
returns fitted labels for the unlabeled data in UnlabeledTbl. The
function stores the fitted labels and the corresponding scores in the
FittedLabels and LabelScores properties of the
object Mdl, respectively.
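A minimal sketch of the table syntax, again assuming the fisheriris sample data. The predictor variable names and the response name Species are illustrative choices; the only requirement shown here is that UnlabeledTbl contains the same predictor variables as Tbl.

```matlab
load fisheriris
rng('default')
idx = randperm(150);
names = {'SepalLength','SepalWidth','PetalLength','PetalWidth'};

% Labeled table: predictors plus a response variable named Species
Tbl = array2table(meas(idx(1:30),:),'VariableNames',names);
Tbl.Species = species(idx(1:30));

% Unlabeled table: same predictor variables, no response
UnlabeledTbl = array2table(meas(idx(31:150),:),'VariableNames',names);

% Name the response variable in Tbl
Mdl = fitsemiself(Tbl,'Species',UnlabeledTbl);
```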
Mdl = fitsemiself(Tbl,formula,UnlabeledTbl) uses formula to specify the response variable (vector of labels) and
the predictor variables to use among the variables in Tbl. The
function uses these variables to label the data in
UnlabeledTbl.
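As a sketch of the formula syntax, the script below uses only the two petal measurements as predictors. The data set, variable names, and choice of predictors are illustrative assumptions.

```matlab
load fisheriris
rng('default')
idx = randperm(150);
names = {'SepalLength','SepalWidth','PetalLength','PetalWidth'};

Tbl = array2table(meas(idx(1:30),:),'VariableNames',names);
Tbl.Species = species(idx(1:30));
UnlabeledTbl = array2table(meas(idx(31:150),:),'VariableNames',names);

% Response on the left of ~, predictors to use on the right
Mdl = fitsemiself(Tbl,'Species ~ PetalLength + PetalWidth',UnlabeledTbl);
```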
Mdl = fitsemiself(Tbl,Y,UnlabeledTbl) uses the predictor data in Tbl and the labels in
Y to label the data in UnlabeledTbl.
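In this variant the labels are passed as a separate vector rather than as a table variable. A minimal sketch, assuming the fisheriris sample data and illustrative variable names:

```matlab
load fisheriris
rng('default')
idx = randperm(150);
names = {'SepalLength','SepalWidth','PetalLength','PetalWidth'};

% Tbl holds predictors only; labels live in the separate vector Y
Tbl = array2table(meas(idx(1:30),:),'VariableNames',names);
Y = species(idx(1:30));
UnlabeledTbl = array2table(meas(idx(31:150),:),'VariableNames',names);

Mdl = fitsemiself(Tbl,Y,UnlabeledTbl);
```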
Mdl = fitsemiself(X,Y,UnlabeledX) uses the predictor data in X and the labels in Y
to label the data in UnlabeledX.
Mdl = fitsemiself(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the
input argument combinations in the previous syntaxes. For example, you can specify the type of
learner, the number of iterations, and the score threshold to use in the labeling
algorithm.
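A minimal sketch of passing options, assuming the fisheriris sample data. The values 50 and 0.95 are illustrative, not recommended settings; the learner type is left at its default here.

```matlab
load fisheriris
rng('default')
idx = randperm(150);
X = meas(idx(1:30),:);
Y = species(idx(1:30));
UnlabeledX = meas(idx(31:150),:);

% Cap the number of self-training cycles and require high scores
% before a predicted label is reused as a training label
Mdl = fitsemiself(X,Y,UnlabeledX, ...
    'IterationLimit',50, ...
    'ScoreThreshold',0.95);
```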
Algorithms
The algorithm first trains a user-specified classifier
(Learner) on the labeled data alone, and then uses that
classifier to predict labels for the unlabeled data. Next, the algorithm computes
scores for the predictions. Predictions whose scores are above a threshold
(ScoreThreshold) are treated as true labels in the next
training cycle of the classifier. This process repeats until the label predictions
converge or the iteration limit (IterationLimit) is reached.
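The loop below is a hypothetical, simplified illustration of the procedure just described, not the fitsemiself implementation. It assumes a linear discriminant classifier (fitcdiscr) as the base learner, posterior probabilities from predict as the scores, the fisheriris sample data, and illustrative values for the threshold and iteration limit. Convergence here means the predicted labels do not change between two consecutive cycles.

```matlab
load fisheriris
rng('default')
idx = randperm(150);
X = meas(idx(1:30),:);                  % labeled predictors (illustrative split)
Y = categorical(species(idx(1:30)));    % labels for X
UnlabeledX = meas(idx(31:150),:);

scoreThreshold = 0.9;                   % illustrative stand-in for ScoreThreshold
iterationLimit = 20;                    % illustrative stand-in for IterationLimit

Xtrain = X;                             % first cycle: labeled data only
Ytrain = Y;
labels = [];                            % no predictions yet
for iter = 1:iterationLimit
    % Retrain the base classifier on the current training set
    learner = fitcdiscr(Xtrain,Ytrain);
    % Predict labels and posterior-probability scores for the unlabeled data
    [newLabels,scores] = predict(learner,UnlabeledX);
    converged = isequal(newLabels,labels);
    labels = newLabels;
    if converged
        break                           % predictions unchanged: stop
    end
    % Treat predictions whose top score is above the threshold as true labels
    keep = max(scores,[],2) > scoreThreshold;
    Xtrain = [X; UnlabeledX(keep,:)];
    Ytrain = [Y; labels(keep)];
end
```

Retraining on the original labeled data plus only the currently confident predictions (rather than accumulating them) lets a pseudo-label drop out in a later cycle if its score falls below the threshold.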
Version History
Introduced in R2020b


