Strange output with function sortrows

Question

Digvijay Rawat 2017-3-9


https://ww2.mathworks.cn/matlabcentral/answers/328962-strange-output-with-function-sortrows

Edited: dpb 2017-3-10

problemo.mat

Hello. I have a 4-column matrix with many (thousands of) rows. I want the data sorted by the first column (which has multiple identical entries for each unique value) while keeping each row intact. So naturally, I use sortrows. The problem is that the output variable has one row out of place every time the value in the first column changes.

For clarity, this is the .mat file of the output variable. For example, in the second column, at rows 74 and 75, after -0.010, 0.010 is shown instead of -0.09. All the other entries are fine. This happens for every unique entry of the first column except the first one. Can anyone explain this to me or suggest a possible solution?

EDIT - .mat file attached now. It has the parent cell array A and the resulting matrix after using sortrows on the matrix contained in the first cell of A.

1 Comment

Stephen23 2017-3-9

@Digvijay Rawat: please upload files here and do not put links to third-party websites. To upload click the paperclip button above the textbox.


Answer 1

dpb 2017-3-9


https://ww2.mathworks.cn/matlabcentral/answers/328962-strange-output-with-function-sortrows#answer_258023

Edited: dpb 2017-3-10

No .mat file attached, try again.

I'd venture that what's going on is floating-point roundoff: all those "identical" values aren't actually precisely identical, and the ordering is what it is because

x-2*eps < x-eps < x < x+eps < x+2*eps < ... < x+N*eps

where eps is the rounding difference for the values x in the column. Around row 74, where the anomaly occurs, try subtracting the base value from the first-column values you are sorting on and report the result. I suspect you'll then understand the reason. You may need to round the first column, or determine its unique values within a tolerance and classify by those, and then sort by the subsequent columns to get the ordering you expect. Or perhaps this really is the expected ordering, if the precise values are needed! :)
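A minimal sketch of that check, using made-up data rather than the poster's file (the matrix `M` and the offset `7.9e-10` are illustrative assumptions): two keys that both display as 0.0200 in the short format still sort as distinct values.

```matlab
% Hypothetical data: column 1 holds two keys that both display as 0.0200
M = [0.02 + 7.9e-10, -0.0109;
     0.02,            0.0110;
     0.02 + 7.9e-10, -0.0108;
     0.02,           -0.0110];

s = sortrows(M);          % sorts on column 1 first, ties broken by column 2

% Subtract the base value to expose differences hidden by the short format
delta = s(:,1) - s(1,1);  % zero for exact matches, tiny but nonzero otherwise

disp(s)
fprintf('%.3e\n', delta)
```

In this toy case the positive value in column 2 lands in the middle of the negative ones, because its key is the very slightly smaller of the two.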

ADDENDUM

Using your data file

>> fprintf(['%15.12f' repmat('%8.4f',1,3) '\n'],sorted(74:80,:).')
 0.020000000000 -0.0110  0.0000  0.0000
 0.020000000000  0.0110  0.0000  0.0000
 0.020000000790 -0.0109  0.0000 -0.0539
 0.020000000790 -0.0108  0.0000 -0.0929
 0.020000000790 -0.0106  0.0000 -0.1183
 0.020000000790 -0.0105  0.0000 -0.1321
 0.020000000790 -0.0104  0.0000 -0.1357
>>

makes the "why" easier to see. As the other checks in the Comment show, there are two distinct values that are nominally 0.02 in the dataset, and the data are sorted on those values correctly.

CONCLUSION

To fix the problem,

>> length(unique(A1(:,1)))    % review...how many in first column unique?
ans =
  31
>> A1(:,1)=round(A1(:,1),3);  % clean up the first column to 3 decimal places
>> length(unique(A1(:,1)))    % and after that, only half as many
ans =
  16
>> s=sortrows(A1);            % now sort and see what "bad" range looks like
>> s(74:80,:)
ans =
  0.0200   -0.0110         0         0
  0.0200   -0.0109         0   -0.0539
  0.0200   -0.0108         0   -0.0929
  0.0200   -0.0106         0   -0.1183
  0.0200   -0.0105         0   -0.1321
  0.0200   -0.0104         0   -0.1357
  0.0200   -0.0102         0   -0.1310
>>

Voila! What you were expecting to see at first...
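If the unrounded first-column values need to be kept, one possible variation (a sketch on made-up data, not something tested on the attached file) is to sort on a rounded copy of the key and use the returned index to reorder the original rows. The small matrix below is a hypothetical stand-in for `A1`, and `key` and `idx` are illustrative names.

```matlab
% Hypothetical stand-in for A1: near-duplicate keys in column 1
A1  = [0.02 + 7.9e-10, -0.0109;
       0.02,            0.0110;
       0.02 + 7.9e-10, -0.0108;
       0.02,           -0.0110];

key = round(A1(:,1)*1e3)/1e3;             % rounded copy of the key column, 3 decimals
[~, idx] = sortrows([key, A1(:,2:end)]);  % order by rounded key, then remaining columns
s = A1(idx,:);                            % reorder the original, unrounded rows
```

The rounding is written as `round(x*1e3)/1e3` so the sketch does not depend on the two-argument form of `round`; the number of decimals is an assumption that has to match the real spacing of the keys.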

2 Comments

Digvijay Rawat 2017-3-9

Hey, file attached now. eps should not be an issue here since the values that are being mixed up are of different sign altogether.

dpb 2017-3-9

Edited: dpb 2017-3-9

The order of those rows is controlled by the order of the first column, though.

>> load problemo.mat
>> A1=A{1};
>> u=unique(A1(:,1))
u =
       0
  0.0200
  0.0200
  0.0400
  0.0400
  0.0600
  ...
  0.2400
  0.2400
  0.2600
  0.2600
  0.2800
  0.2800
  0.3000
  0.3000
>> min(diff(u))
ans =
 7.9000e-10
>>

Note that each nominal value (apart from 0) appears twice, with the minimum difference shown above; the other gaps are probably about the same.

Alternatively,

>> sorted(74:80,:)   % your "problem" area...
ans =
  0.0200   -0.0110         0         0
  0.0200    0.0110         0         0
  0.0200   -0.0109         0   -0.0539
  0.0200   -0.0108         0   -0.0929
  0.0200   -0.0106         0   -0.1183
  0.0200   -0.0105         0   -0.1321
  0.0200   -0.0104         0   -0.1357
>> sorted(75,1)<sorted(76,1)  % what's the relationship between these 2?
ans =
   1
>> diff(sorted(74:80,1))>0
ans =
   0
   1
   0
   0
   0
   0
>>

What the above means is that the second row's key is the same as the first's; both are less than the third (albeit only slightly), and the rest are identical to the third over this subset. The same effect will occur at every one of the matched pairs returned by unique above. Whether the second column ends up sorted across each whole nominal group then depends solely on the luck of the draw, i.e. on whether the two subgroups happen to fall in the right order already; for this set they did not.

BUT the row that appears out of order by its column 2 value is in the correct order given the higher-priority sort on column 1. As noted in the Answer, if you want this to go away, you'll have to fix up the first-column values so they no longer contain the small discrepancies that determine their natural ordering.
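A quick way to check a key column for this kind of hidden near-duplicate before sorting might look like the sketch below. The vector `keys` is made up for illustration, and the tolerance `tol` is an assumption that should sit well below the genuine spacing of the keys (0.02 here).

```matlab
% Hypothetical key column with pairs that differ only by roundoff
keys = [0; 0.02; 0.02 + 7.9e-10; 0.04; 0.04 + 7.9e-10];

tol  = 1e-6;            % assumed tolerance, far below the real key spacing
u    = unique(keys);    % sorted distinct values
gaps = diff(u);         % spacing between neighbouring distinct values
nearDup = gaps < tol;   % true where two "distinct" keys are closer than tol

fprintf('%d suspicious pair(s), smallest gap %.3e\n', nnz(nearDup), min(gaps));
```

Any true entry in `nearDup` flags a pair of keys that would display the same but sort as different values.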
