How can I find and remove the rows starting with different word?

Question

Ergün AKGÜN 2018-3-29

https://ww2.mathworks.cn/matlabcentral/answers/391433-how-can-i-find-and-remove-the-rows-starting-with-different-word

Hi All;

I have a 40000-row document. Rows 1 to 485 should contain words starting with the letter "a", rows 486 to 1158 words starting with the letter "b", rows 1158 to 4000 words starting with the letter "c", and so on. There are some erroneous entries within these word groups. For example, among rows 1 to 485, which should all start with "a", there are words starting with other letters such as "b", "c", "d", "v". How can I find and remove these rows?

Thank you for your help.

Guillaume 2018-3-29

So, the problem is

extracting the first letter of each line
finding out which ones are out of order

That's fairly straightforward but for one thing: you're obviously not using the English alphabet (Turkish?), and the order of your alphabet may not match MATLAB's idea of order. For example

>> sort('aba güreşi değnek göstermek')
ans =
  '   aabdeeeeeggikkmnrrstöüğş'

is probably the wrong order.

If the alphabet for 1st letter is just US-ASCII [a-z] then it's easy.
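
To make the ordering issue concrete, here is a minimal sketch that sorts the same letters by their position in the Turkish alphabet rather than by character code. The `turkishAlphabet` string and the variable names are my own assumptions for illustration; nothing like this is built into the question's data.

```matlab
% Turkish lowercase alphabet in dictionary order (29 letters).
% This string is an assumption supplied for the example.
turkishAlphabet = 'abcçdefgğhıijklmnoöprsştuüvyz';

txt = 'aba güreşi değnek göstermek';
letters = txt(txt ~= ' ');                 % drop the spaces

% rank(k) is the position of letters(k) in the Turkish alphabet
[~, rank] = ismember(letters, turkishAlphabet);

% sort by alphabet position instead of by character code
[~, order] = sort(rank);
sortedLetters = letters(order);
disp(sortedLetters)
```

With this ordering ğ lands right after g and ö right after n, instead of at the end of the string as in the plain `sort` call above.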

Ergün AKGÜN 2018-3-29

I'm sorry about my poor English. As you can see in the table below, within the "c" group there is a sentence starting with "a", for example 'adamlarına ve ondan yana....'.

Here is the problem:

Rows 110:1200 contain words starting with the letter "c".
I want to scan rows 110:1200 and find the entries that do not start with the letter "c",
and delete those rows.

In this table I want to detect the 6th and 14th rows (because they do not start with "c") and delete them.

    {'ceket'                                                                             }
    {'ceketatay'                                                                         }
    {'celâdet'                                                                           }
    {'celâl'                                                                             }
    {'Celâlî'                                                                            }
    {'adamlarına ve ondan yana olanlara, sonraları da türeyen bütün eşkıyaya verilen ad.'}
    {'Celâlîlik'                                                                         }
    {'celâllenme'                                                                        }
    {'celâllenmek'                                                                       }
    {'celâlli'                                                                           }
    {'celâllice'                                                                         }
    {'celbe'                                                                             }
    {'celep'                                                                             }
    {'yetiştirilen genç.'                                                                }
    {'celeplik'                                                                          }
    {'celî'                                                                              }
    {'celî yazı celil'                                                                   }
    {'cellât'                                                                            }
    {'cellât gibi'                                                                       }
    {'cellâtlık'                                                                         }
    {'celp'                                                                              }

Answer 1

Elias Gule 2018-3-29

https://ww2.mathworks.cn/matlabcentral/answers/391433-how-can-i-find-and-remove-the-rows-starting-with-different-word#answer_312632

I hope this helps.

Cnew = cellfun(@char, Cnew, 'UniformOutput', false); % convert each cell's contents to a character vector
index = cellfun('isempty', regexpi(Cnew, '^c')); % true for lines that do not start with c or C
Cnew(index) = []; % remove the lines that do not start with c or C
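
As a quick check, the same idea can be tried on a few entries copied from the table in the question. This is only a sketch: the shortened `Cnew` below and the names `intruderRows` and `Cclean` are assumptions made for the example.

```matlab
% A few entries copied from the table in the question (column cell array).
Cnew = {'ceket'
        'celâl'
        'Celâlî'
        'adamlarına ve ondan yana olanlara, sonraları da türeyen bütün eşkıyaya verilen ad.'
        'celbe'
        'yetiştirilen genç.'
        'celp'};

% true where the entry does NOT start with "c" or "C"
index = cellfun('isempty', regexpi(Cnew, '^c'));

intruderRows = find(index);   % row numbers of the offending entries
Cclean = Cnew(~index);        % copy of the list without those rows

disp(intruderRows.')
```

Keeping the cleaned list in a separate variable makes it possible to inspect the flagged rows before anything is actually thrown away.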

Elias Gule 2018-3-29

Yep. The index variable is a logical array of 1s and 0s, so to remove rows only within your specified range, you can clear the entries outside that range, like so:

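Cnew = cellfun(@char, Cnew, 'UniformOutput', false); % convert each cell's contents to a character vector
index = cellfun('isempty', regexpi(Cnew, '^c')); % true for lines that do not start with c or C
index(1:299) = false; % never delete rows before the range
index(1001:end) = false; % never delete rows after the range
Cnew(index) = []; % remove the flagged lines inside rows 300 to 1000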
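
The same range restriction can also be written with a mask and `strncmpi` instead of a regular expression, which may be easier to reuse for the other letter groups. This is a rough sketch: the tiny word list and the names `firstRow`, `lastRow`, `letter`, `wrongStart`, `inBlock` and `toDelete` are hypothetical.

```matlab
% Hypothetical data: rows 3 to 6 form the block that should start with "c".
Cnew = {'aba'; 'bal'; 'ceket'; 'adam'; 'celp'; 'yetim'; 'dal'};
firstRow = 3;
lastRow  = 6;
letter   = 'c';

% true where an entry does not start with the expected letter (case-insensitive)
wrongStart = ~strncmpi(Cnew, letter, 1);

% only rows inside the block are candidates for deletion
inBlock = false(size(Cnew));
inBlock(firstRow:lastRow) = true;

toDelete = wrongStart & inBlock;
Cnew(toDelete) = [];          % rows outside the block are left untouched
```

If several groups are cleaned one after another, it is probably safest to work from the last group to the first, because every deletion shifts the row numbers of everything below it.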

Ergün AKGÜN 2018-3-29

Thank you very much. Both answers are correct, but Elias's answer is exactly what I need.

Thank you Elias and Guillaume

Answer 2

Guillaume 2018-3-29

https://ww2.mathworks.cn/matlabcentral/answers/391433-how-can-i-find-and-remove-the-rows-starting-with-different-word#answer_312631

Your English is fine and I understood what you want to do.

If the first letter of every line belongs to the character set [a-z] (and it looks like you want to ignore case), then it's very easy to solve.

However, if we have to take into account accented letters such as ğ, then it's a lot more complicated because MATLAB has no concept of internationalisation. I have no idea where ğ is located in your alphabet, but it's not going to be where MATLAB thinks it is.

If we assume a US-ASCII alphabet only, the intruders can be detected easily:

firstletter = lower(cellfun(@(s) s(1), Cnew)).';  %get first letter and convert to lower case
ldiff = sign(diff(firstletter));
outoforderrows = union(strfind(ldiff, [-1 1]), strfind(ldiff, [1 -1])) + 1
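
Here is a small worked run of the lines above on made-up data, in case it helps to see what each step produces. The word list is hypothetical, and the explicit `double` conversion before `diff` is an addition of mine to keep the arithmetic on plain numbers.

```matlab
% Hypothetical block of "c" words with two intruders (rows 3 and 6).
Cnew = {'ceket'; 'celep'; 'adam'; 'celp'; 'cellat'; 'yetim'; 'cevap'};

% first letter of every entry as a lower-case row vector: 'ccaccyc'
firstletter = lower(cellfun(@(s) s(1), Cnew)).';

% -1 where the letter goes down, +1 where it goes up, 0 where it stays the same
ldiff = sign(diff(double(firstletter)));

% a down-then-up or up-then-down pair marks a single row that is out of order
outoforderrows = union(strfind(ldiff, [-1 1]), strfind(ldiff, [1 -1])) + 1;

Cnew(outoforderrows) = [];    % remove the intruders
```

As far as I can tell, this only flags an intruder that sits alone between correctly ordered neighbours. Two wrong rows in a row, or a wrong row at the very start or end of the list, do not produce the down/up pair and would need separate handling.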

But with the Turkish alphabet, lower may not work correctly for a start. In addition, since MATLAB may have the wrong idea about the order of the letters, it may tell you that some lines are out of order when they are not.
