removeDocument

Remove documents from bag-of-words or bag-of-n-grams model

Description

newBag = removeDocument(bag,idx) returns a copy of the bag-of-words or bag-of-n-grams model bag without the documents whose indices are specified by idx. The input bag is not modified. If the removed documents contain words or n-grams that do not appear in any of the remaining documents, then the function also removes those words or n-grams from the vocabulary of newBag.

Examples

Remove selected documents from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence" 
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents)
bag = 
  bagOfWords with properties:

          Counts: [4x9 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    "sentence"    "second"    "third"    "final"]
        NumWords: 9
    NumDocuments: 4

Remove the first and third documents from bag.

idx = [1 3];
newBag = removeDocument(bag,idx)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2

Remove the same documents using logical indices.

idx = logical([1 0 1 0]);
newBag = removeDocument(bag,idx)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2
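
To see which words were dropped from the vocabulary along with the documents, you can compare the Vocabulary properties of the two models. The following is a minimal sketch that rebuilds the model from the example above; the variable name removedWords is illustrative only.

```matlab
% Rebuild the example model.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents);

% Remove the first and third documents.
newBag = removeDocument(bag,[1 3]);

% Words that appear in the original vocabulary but not in the new one.
removedWords = setdiff(bag.Vocabulary,newBag.Vocabulary)
```

Words such as "a" and "sentence" stay in the vocabulary because they still occur in the remaining documents.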

Input Arguments

bag: Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object.

idx: Indices of documents to remove, specified as a vector of numeric indices or a vector of logical indices.

Example: [2 4 6]

Example: logical([0 1 0 1 0 1])
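
Logical indices are convenient when the documents to remove are selected by a condition rather than by position. The following sketch removes documents that contain fewer than four words; the threshold and the variable name wordsPerDocument are assumptions made for illustration, not part of the function interface.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents);

% Total number of words in each document (one row of Counts per document).
% full converts the result to a dense vector in case Counts is sparse.
wordsPerDocument = full(sum(bag.Counts,2));

% Logical row vector that is true for documents with fewer than four words.
idx = (wordsPerDocument < 4).';

% Remove the documents selected by the logical mask.
newBag = removeDocument(bag,idx);
```

The mask must have one element per document in bag, in the same order as the rows of Counts.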

Output Arguments

newBag: Output model, returned as a bagOfWords object or a bagOfNgrams object. The type of newBag is the same as the type of bag.
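
The same call works on a bag-of-n-grams model. The following sketch assumes a bagOfNgrams model created with its default settings and checks that the output keeps the input type.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);

% Create a bag-of-n-grams model instead of a bag-of-words model.
bag = bagOfNgrams(documents);

% Remove the first and third documents.
newBag = removeDocument(bag,[1 3]);

% The output has the same class as the input.
class(newBag)
```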

Version History

Introduced in R2017b