Delete only consecutive repeated string entries from a dataset in matlab

avantika 2013-8-29

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/85896-delete-only-consecutive-repeated-string-entries-from-a-dataset-in-matlab

Accepted answer: Andrei Bobrov

hi!

I am relatively new to MATLAB. I have a dataset with three columns: time, pitch and notation. For example:

 time   pitch notation
5725 329.63 G
5800 329.63 G
5900 311.13 M
5900 311.13 M
6000 570.40 P

I want to remove duplicates that occur consecutively in the file while keeping the row order. The output should then be:

 time   pitch notation
5725 329.63 G
5900 311.13 M
6000 570.40 P

I am currently using MATLAB 7.9.0, so the 'first' option of the unique command is not supported. Can anyone tell me how to proceed?

[EDITED, table formatted, Jan]

5 comments
avantika 2013-8-29

Edited: avantika 2013-8-29

hi!

I am relatively new to MATLAB. I have a dataset with three columns: time, pitch and notation. For example:

 time   pitch notation
5725 329.63 GM
5800 329.63 GM
5900 311.13 MM
5900 311.13 MM
6000 570.40 PM
6725 329.63 GM
6800 329.63 GM
7900 311.13 MM
8900 311.13 MM
9000 570.40 PM
9500 570.40 PM
1000 570.40 PM

I want to remove repeated entries that occur consecutively in the pitch and notation columns, while the order of the dataset stays the same. The output should then be:

 time   pitch notation
5725 329.63 GM
5900 311.13 MM
6000 570.40 PM
6725 329.63 GM
7900 311.13 MM
9000 570.40 PM

Jan 2013-8-30

Edited: Jan 2013-8-30

I still do not understand the empty lines, and the type of the input is not explained here. Is this a text file, a cell containing strings, or is pitch a field of a struct which contains a double vector?

What should happen for:

5725 329.63 GM
5800 329.63 GM
5725 329.63 GM
5800 329.63 GM

So does "consecutively" mean the position in the array or should a line vanish, if there is any equal set of values before, even if other lines appear in between?

I assume that the solution is very easy, if you define the wanted procedure and the class of the input exactly.
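
The two readings Jan distinguishes can be made concrete with a minimal sketch. It assumes the three columns are available as plain column variables (time, pitch and notation are hypothetical names, not taken from the original file) and that a duplicate means all three values match:

```matlab
% Jan's four example rows as plain column variables (hypothetical names).
time     = [5725; 5800; 5725; 5800];
pitch    = [329.63; 329.63; 329.63; 329.63];
notation = {'GM'; 'GM'; 'GM'; 'GM'};

% Reading 1: drop a row only if it equals the row directly above it.
sameAsPrev = time(2:end) == time(1:end-1) & ...
             pitch(2:end) == pitch(1:end-1) & ...
             strcmp(notation(2:end), notation(1:end-1));
keepConsecutive = [true; ~sameAsPrev];

% Reading 2: drop a row if an equal row occurs anywhere earlier.
n = numel(time);
keepAnywhere = true(n, 1);
for k = 2:n
    earlier = time(1:k-1) == time(k) & ...
              pitch(1:k-1) == pitch(k) & ...
              strcmp(notation(1:k-1), notation{k});
    keepAnywhere(k) = ~any(earlier);
end
```

For these four rows the first reading keeps every row, because no row equals the one directly above it, while the second reading keeps only the first two rows.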

Accepted answer

Andrei Bobrov 2013-8-29

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/85896-delete-only-consecutive-repeated-string-entries-from-a-dataset-in-matlab#answer_95475

Edited: Andrei Bobrov 2013-8-30

    C = {'time' 'pitch' 'notation'
        5725 329.63 'GM'
        5800 329.63 'GM'
        5900 311.13 'MM'
        6000 570.40 'PM'
        6725 329.63 'GM'
        6800 329.63 'GM'
        7900 311.13 'MM'
        8900 311.13 'MM'
        9000 570.40 'PM'
        9500 570.40 'PM'
        1000 570.40 'PM'};
    d = cell2dataset(C);            % your dataset array
    [~,~,ii] = unique(d.notation);  % ii: one integer code per notation string
    out = d([true;diff(ii)~=0],:);  % keep row 1 and every row whose code differs from the row above

ADD without dataset array

    C = {'time' 'pitch' 'notation'
        5725 329.63 'GM'
        5800 329.63 'GM'
        5900 311.13 'MM'
        6000 570.40 'PM'
        6725 329.63 'GM'
        6800 329.63 'GM'
        7900 311.13 'MM'
        8900 311.13 'MM'
        9000 570.40 'PM'
        9500 570.40 'PM'
        1000 570.40 'PM'};
    [ii,ii,ii] = unique(C(2:end,3));     % only the third output is needed (works without ~)
    out = C([true(2,1);diff(ii)~=0],:);  % true(2,1) keeps the header and the first data row

ADD 2

    scale = {'GM';'PM'};
    [~,ii] = ismember(C(2:end,3),scale);  % ii: position in scale, 0 if not in scale
    i1 = ii > 0;
    C1 = C([true;i1],:);                  % header row plus the data rows found in scale
    out = C1([true(2,1);diff(ii(i1))~=0],:);
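
The masks above all follow the same pattern: [true; diff(ii)~=0] keeps the first row and every row whose code differs from the code of the row above it. A possible variant, shown here only as a sketch, skips unique and compares each row directly with its predecessor on both pitch and notation, which is what the question asks for. It assumes the columns are held in plain column variables (hypothetical names) and uses the 12 rows from the asker's comment:

```matlab
% The 12 rows from the asker's comment, as plain column variables
% (the variable names are assumptions for this sketch).
time     = [5725; 5800; 5900; 5900; 6000; 6725; 6800; 7900; 8900; 9000; 9500; 1000];
pitch    = [329.63; 329.63; 311.13; 311.13; 570.40; 329.63; 329.63; 311.13; 311.13; 570.40; 570.40; 570.40];
notation = {'GM'; 'GM'; 'MM'; 'MM'; 'PM'; 'GM'; 'GM'; 'MM'; 'MM'; 'PM'; 'PM'; 'PM'};

% A row "changes" if its pitch or its notation differs from the row above.
changed = diff(pitch) ~= 0 | ~strcmp(notation(2:end), notation(1:end-1));

% Always keep the first row, then every row that changed.
keep = [true; changed];

outTime     = time(keep);
outPitch    = pitch(keep);
outNotation = notation(keep);
```

Here outTime is [5725; 5900; 6000; 6725; 7900; 9000], which matches the output requested in the question. Comparing pitch exactly is fine for values read from a file; computed values might need a tolerance instead.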

6 comments
Andrei Bobrov 2013-8-30

see block ADD2

avantika 2013-9-2

I get an error when I use the commands you gave for ismember:

    scale = {'NL';'NM';'NU';'RL';'RM';'RU';'GL';'GM';'GU';'mL';'mM';'mU';'DL';'DM';'DU';'SL';'SM';'SU';'PL';'PM';'PU';};
    >> [~,ii] = ismember(C(2:end,3),scale);
    i1 = ii > 0;
    C1 = C(i1,:);
    out2 = C1([true(2,1);diff(ii(i1))~=0],:);

    ??? Error using ==> dataset.subsref at 82
    Dataset array subscripts must be two-dimensional.

    Error in ==> ismember at 78
    found = find(a(i)==s(:)); % FIND returns indices for LOC.
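
The message comes from dataset.subsref, which suggests (a guess from the error text, not something confirmed in the thread) that C in this session is a dataset array rather than the cell array used in the answer. In that case C(2:end,3) is itself a dataset array and cannot be passed to ismember. A sketch of the same filtering applied to the notation column alone, assuming it is a cell array of strings such as ds3.notation:

```matlab
% Plain column variables standing in for the dataset columns (assumed names).
% With a dataset array ds3, one would presumably start from
%   notation = ds3.notation;
% and finish with out = ds3(keepIdx,:);
time     = [5725; 5800; 5900; 6000; 6725; 6800; 7900; 8900; 9000; 9500; 1000];
notation = {'GM'; 'GM'; 'MM'; 'PM'; 'GM'; 'GM'; 'MM'; 'MM'; 'PM'; 'PM'; 'PM'};
scale    = {'GM'; 'PM'};

% loc is the position of each notation in scale, or 0 if it is not listed.
[tf, loc] = ismember(notation, scale);

inScale = loc > 0;
idx     = find(inScale);   % row numbers whose notation is in scale
codes   = loc(inScale);    % their codes, in the original order

% Among those rows, keep the first one and every row whose code changes.
% (Assumes at least one row matches scale.)
keepIdx = idx([true; diff(codes) ~= 0]);
outTime = time(keepIdx);
```

With this data the rows at times 5725, 6000, 6725 and 9000 remain: the MM rows are dropped because MM is not in scale, and repeats of GM or PM that become adjacent after that are collapsed.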

More answers (1)

Simon 2013-8-29

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/85896-delete-only-consecutive-repeated-string-entries-from-a-dataset-in-matlab#answer_95411

Hi!

I don't understand why the second entry is removed. Is a "duplicate" defined if all three columns match or only the second and third?

Does the unique command in 7.9 support rows? Like

b = unique(A, 'rows')

Does your data set consist of single lines for each entry? You could try to put each data set as a string in a cell array and use unique of the cell array.

3 comments
Simon 2013-8-29

Hi!

Try it with a cell array of strings, as proposed.

The unique command has additional return values that contain the relation between the sorted output and the input. Check the documentation for more information.
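
As a small illustration of those additional return values (a sketch on made-up notation strings, not the asker's data): the third output assigns each input element the position of its value in the sorted unique list, so comparing neighbouring codes shows where the value changes.

```matlab
notation = {'GM'; 'GM'; 'MM'; 'PM'; 'GM'};

% u:   sorted unique strings
% idx: one representative position per unique string
% j:   for each input element, its position in u
[u, idx, j] = unique(notation);
j = j(:);                 % force a column, whatever orientation unique returns

restored = u(j);          % rebuilds the input from u and j

% true where an element differs from the one before it (first element kept)
changed = [true; diff(j) ~= 0];
```

For this input u is {'GM'; 'MM'; 'PM'}, j is [1; 1; 2; 3; 1], and changed marks every element except the second.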

avantika 2013-8-29

Edited: avantika 2013-8-29

hi!

I checked the class of each dataset variable before the conversion to a cell array of strings:

    class(ds3.pitch)
    ans =
    double

    class(ds3.time)
    ans =
    double

    class(ds3.notation)
    ans =
    cell

I am not able to convert it to a cell array of strings.
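
Since class(ds3.notation) already reports cell, the notation column may already be a cell array of strings (iscellstr can confirm this). One way to follow Simon's suggestion, sketched under the assumption that pitch and notation are available as column variables and that two decimals are enough to distinguish pitch values, is to build one string key per row with sprintf and compare neighbouring keys:

```matlab
% Stand-ins for ds3.pitch and ds3.notation (assumed shapes: column vectors).
pitch    = [329.63; 329.63; 311.13; 311.13; 570.40];
notation = {'GM'; 'GM'; 'MM'; 'MM'; 'PM'};

% One string per row, combining pitch (rounded to 2 decimals) and notation.
n = numel(pitch);
keys = cell(n, 1);
for k = 1:n
    keys{k} = sprintf('%.2f|%s', pitch(k), notation{k});
end

% Keep the first row and every row whose key differs from the row above.
keep = [true; ~strcmp(keys(2:end), keys(1:end-1))];
```

The logical vector keep can then be used to index the rows, for example pitch(keep) or, presumably, ds3(keep,:) for a dataset array.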
