Replace a missing string in a table

I want to replace all missing strings in a table with a string of my choice, say 'unknown'. I use R2016a (with no option to upgrade), so functions like fillmissing, which might otherwise help, are not available to me. For example:
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
A = table(dblVar, cellstrVar, categoryVar)
A =
    dblVar    cellstrVar    categoryVar
    ______    __________    ___________
    NaN       'one'         <undefined>
    3         'three'       red
    7         ''            yellow
    9         'nine'        blue
I would like to end up with this:
A =
    dblVar    cellstrVar    categoryVar
    ______    __________    ___________
    NaN       'one'         unknown
    3         'three'       red
    7         'unknown'     yellow
    9         'nine'        blue
Note that I also replaced the categorical '<undefined>' value; please include that in your answer if you can.
Is there a way to do this without changing A's structure (e.g., converting from table to cell) in the process? I want to avoid the conversion because my table is large and it may cause memory issues.
Edit to add: the locations of the missing string values have to be identified as well, and there may be several such columns in the table.
Many thanks.

Accepted Answer

George 2016-9-28
This should work:
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
cellstrVar2 = {'four'; 'none'; '7'; ''};
A = table(dblVar, cellstrVar, categoryVar, cellstrVar2);
varNames = A.Properties.VariableNames;
for ii = 1:numel(varNames)
    if iscellstr(A{1,varNames{ii}})
        undefloc = strcmp(A.(ii), '');
        A{undefloc, ii} = cellstr('unknown');
    end
    if iscategorical(A{1, varNames{ii}})
        undefloc = isundefined(A{:,ii});
        A{undefloc, ii} = categorical(cellstr('unknown'));
    end
end
A =
    dblVar    cellstrVar    categoryVar    cellstrVar2
    ______    __________    ___________    ___________
    NaN       'one'         unknown        'four'
    3         'three'       red            'none'
    7         'unknown'     yellow         '7'
    9         'nine'        blue           'unknown'
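You can use the undefloc variable to find where things were undefined or empty strings. It is a logical row index for the variable being processed and is overwritten on each pass through the loop, so save it per variable if you need the locations afterwards.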
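If the locations are needed after the loop finishes, one option is to collect them in a logical mask with one column per table variable. The following is only a sketch on the example table from the question; missingMask, rowIdx, and colIdx are hypothetical names that do not appear in the thread.

```matlab
% Rebuild the example table from the question.
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
A = table(dblVar, cellstrVar, categoryVar);

% missingMask (hypothetical name): true where an entry was empty or undefined.
missingMask = false(height(A), width(A));
for ii = 1:width(A)
    col = A.(ii);                               % work on one variable at a time
    if iscellstr(col)
        missingMask(:, ii) = strcmp(col, '');   % empty strings
        col(missingMask(:, ii)) = {'unknown'};
    elseif iscategorical(col)
        missingMask(:, ii) = isundefined(col);  % <undefined> elements
        col(missingMask(:, ii)) = categorical({'unknown'});
    end
    A.(ii) = col;                               % write the variable back
end

% Row and column subscripts of every replaced entry.
[rowIdx, colIdx] = find(missingMask);
```

Only one variable is copied out of the table at a time, so the table itself is never converted to a cell array.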
4 Comments
Ory 2016-9-29
Edited: Ory 2016-9-29
Awesome! If the variable classes are not known in advance, you can get a logical index of the variables of a given class, say cell arrays, with:
catInd = strcmp(varfun(@class, A, 'OutputFormat', 'cell'), 'cell');
George 2016-9-29
There you go. If you do that and slam the cells and categoricals into cell arrays you can use the curly brace syntax on the cell array, rather than the variable name.


More Answers (1)

Peter Perkins 2016-10-3
George's loop seems fine to me, although you could tweak it a bit:
for name = varNames
    var = A.(name{1});   % name is a 1x1 cell, so unwrap it
    if iscellstr(var)
        var(strcmp(var,'')) = {'unknown'};
    elseif iscategorical(var)
        var(isundefined(var)) = 'unknown';
    end
    A.(name{1}) = var;
end
If you're willing to write a couple of small functions, you can do this:
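theCellStrs = varfun(@iscellstr,A,'OutputFormat','uniform');
A(:,theCellStrs) = varfun(@replaceEmptyString,A(:,theCellStrs));

% In its own file, replaceEmptyString.m (R2016a does not allow functions inside scripts):
function c = replaceEmptyString(c)
c(strcmp(c,'')) = {'unknown'};
end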
(and similarly for categorical) but varfun uses a loop underneath.
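A possible categorical counterpart is sketched below. The helper name replaceUndefined and the variable isCat are hypothetical, and the sketch writes each variable back in an explicit loop rather than assigning the output of varfun. Local functions at the end of a script need R2016b or later; on R2016a the function would have to be saved in its own file, replaceUndefined.m.

```matlab
% Rebuild the example table from the question.
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
A = table(dblVar, cellstrVar, categoryVar);

% Logical row vector marking the categorical variables.
isCat = varfun(@iscategorical, A, 'OutputFormat', 'uniform');

% Replace undefined elements one categorical variable at a time.
for k = find(isCat)
    A.(k) = replaceUndefined(A.(k));
end

% Hypothetical helper: fill <undefined> elements with the category 'unknown'.
function c = replaceUndefined(c)
    c(isundefined(c)) = categorical({'unknown'});
end
```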
