Removing commas between columns in text data

Question

0 votes

I have a txt file which is the output of a lemmatizer, in the form

Sometimes, ,, I, use, commas, .
I, like, writing, ,, I, like, reading

How can I read it into a tokenizedDocument while deleting the unnecessary commas between tokens? A simple approach would be

test=readlines('/path/to/file.txt')
test=strrep(test,',','')
test=tokenizedDocument(test)

but it would also remove the commas already present in the original text, whereas I'd like to preserve punctuation.

Answer 1

Walter Roberson 2021-10-16

2 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, {'(?<=[^,]),\s', '\s*,,', '\s+\.'}, {' ', ',', '.'})
test = 2×1 cell array
    {'Sometimes, I use commas.'      }
    {'I like writing, I like reading'}

Notice that periods need a rule of their own. 'use, commas' should almost certainly become 'use commas' (comma-space becomes a space), but applying only that rule to 'commas, .' gives 'commas .', with a stray space before the period, so a third pattern removes it.

To put it another way, we cannot simply delete every comma-space pair. That would work between 'commas' and the period, but not between 'use' and 'commas': there, 'use, commas' would merge into 'usecommas'.
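To see what each rule contributes, the three patterns can be applied one at a time. This is only a sketch of the same regexprep call split into stages; the variable names step1, step2, step3, and naive are made up for illustration.

```matlab
% Same input as in the answer above.
test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};

% Rule 1: a comma that follows a non-comma character and precedes
% whitespace is a separator, so comma-space becomes a single space.
step1 = regexprep(test, '(?<=[^,]),\s', ' ');

% Rule 2: a doubled comma (with any whitespace before it) stands for a
% real comma token, so it collapses to one comma attached to the word.
step2 = regexprep(step1, '\s*,,', ',');

% Rule 3: drop the whitespace left in front of a period.
step3 = regexprep(step2, '\s+\.', '.');

% For contrast: deleting every comma-space pair glues words together.
naive = regexprep(test, ',\s', '');

disp(step1); disp(step2); disp(step3); disp(naive);
```

After the first stage the separators are gone but the ',,' token and the space before the period remain; the second and third stages clean those up. The naive variable shows the word-merging problem described above.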

1 comment

Kim Maria Damiani 2021-10-16

Thank you!

Answer 2

Chunru 2021-10-16

0 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, ',\s', ' ')
test = 2×1 cell array
    {'Sometimes , I use commas .'     }
    {'I like writing , I like reading'}
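This result keeps every token, punctuation included, separated by a single space. A possible alternative, not taken from either answer, is to treat comma-space as the field delimiter and split on it, which recovers the lemmatizer's tokens directly. The sketch below assumes every token in the file is separated by exactly ', '; the names tokens and joined are made up for illustration.

```matlab
test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};

% Split each line on the two-character delimiter ', '.
% CollapseDelimiters is switched off so each delimiter is honored as-is.
tokens = cellfun(@(s) strsplit(s, ', ', 'CollapseDelimiters', false), ...
    test, 'UniformOutput', false);

% Rejoin the tokens of each line with single spaces.
joined = cellfun(@(t) strjoin(t, ' '), tokens, 'UniformOutput', false);

disp(tokens{1});
disp(joined);
```

Because the lone comma in ',,' is itself a token sitting between two delimiters, it survives the split as ','. If Text Analytics Toolbox is available, the tokenizedDocument documentation describes a 'TokenizeMethod' option with the value 'none' for text that is already split into words; whether this exact cell layout is accepted is worth checking there.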
