Finding NaN and missing values in a cell array from a MAT-file
Hello, I have obtained a global matrix from an analysis (attached; it is a reduced version because the full file exceeded the 5 MB limit), and I would like to find the NaN and missing values in each case so that I can sort out the issues in the code that involve those values before running a more complex simulation analysis. As you will see in the MAT-file, there are 70 columns of separate information, and each row is identified by its 1st column, which corresponds to a unique event in my database. I would like to generate two tables with the following information:
1st table: a summary of the NaN values in the whole matrix (the attached file may not contain any NaN values, because I had to reduce the number of rows to stay under 5 MB), giving the location of each one: the date_event from the first column of its row, the station name from column 46, and the name of the variable (column) that holds the NaN value. For example:
matrix_NaN = {'1985-03-03 22:47:08', 'CFLAN', 'Rrup1'; '1997-02-19 18:25:14', 'CPLAT', 'Rx'; ..........}
2nd table: the same information for the missing values:
matrix_missing = {'2003-08-26 21:11:35', 'CFLAN', 'Rx'; '2003-08-26 21:11:35', 'CTRUJ', 'sigma_Rx'; ..........}
I would appreciate any help.
1 Comment
Stephen23
2024-8-15
Why is this data inefficiently stored as lots and lots of scalar arrays inside a cell array?
Using one table would be much more efficient, and offer much easier ways to process the data.
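As an illustration of that suggestion, here is a minimal sketch of the table-based approach. It assumes (hypothetically) that row 1 of the cell array holds valid table variable names and every other cell holds one scalar, which matches how the accepted answer indexes C; the small cell array and the variable name 'station' are made-up stand-ins, not the real example_global data.

```matlab
% Hypothetical stand-in for example_global: row 1 holds the variable names,
% every other row holds one scalar per cell (datetime, string or double)
C = {'date_event', 'station', 'Rx', 'sigma_Rx'; ...
     datetime(1985,3,3,22,47,8), "CFLAN", NaN, 0.3; ...
     datetime(1997,2,19,18,25,14), "CPLAT", 12.5, 0.4; ...
     datetime(2003,8,26,21,11,35), string(missing), 8.1, NaN};

% One table variable per column, names taken from row 1
T = cell2table(C(2:end,:), 'VariableNames', C(1,:));

% ismissing works on the whole table: NaN for doubles,
% <missing> for strings, NaT for datetimes
M = ismissing(T);
[r, c] = find(M);

% Event date, station and variable name of each missing entry
report = table(T.date_event(r), T.station(r), ...
    string(T.Properties.VariableNames(c)).', ...
    'VariableNames', {'date_event', 'station', 'variable'});
disp(report)
```

Because each table variable has a single class, ismissing picks the right missing indicator per column without a cellfun loop over thousands of scalar cells.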
Accepted Answer
Voss
2024-8-15
load('example_global.mat')
C = example_global
As you said, the cell array in the attached MAT-file doesn't contain any NaNs, so I'm going to introduce some for demonstration purposes, to show that NaN and <missing> values can be told apart:
C{5,5} = NaN; % introducing NaNs for testing/demonstration
C{100,60} = NaN;
Now, construct a cell array containing info about the NaN and missing values, with 3 columns (corresponding datetime value, "variable" name, and value - either NaN or <missing>):
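% logical mask of cells holding NaN or <missing>, then the row/column of each
[ridx,cidx] = find(cellfun(@(x)any(ismissing(x)),C));
lidx = sub2ind(size(C),ridx,cidx); % linear index of each flagged cell
% columns: date from column 1, variable name from row 1, offending value
C_missing = [C(ridx,1) C(1,cidx).' C(lidx)];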
disp(C_missing)
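One caveat worth checking on the real data: ismissing treats a blank character ' ' as the missing indicator for char arrays, so any(ismissing(x)) also flags any char cell that contains a space, such as a date or station name stored as text. The sketch below uses hypothetical cell contents to show this, together with a stricter alternative test (not part of the answer) that only counts NaN in numeric cells, <missing> in string cells and NaT in datetime cells. If column 1 holds datetime values, as the answer's description of C_missing implies, the original test is unaffected there.

```matlab
% Hypothetical cell contents of different classes
C = {NaN, 1.5, "CFLAN", string(missing), '1985-03-03 22:47:08'};

% Test used in the answer: any(ismissing(x)) on each cell.
% The char date is flagged too, because ' ' is the missing indicator for char.
naive = cellfun(@(x) any(ismissing(x)), C);
disp(naive)

% Stricter test: only NaN in numeric cells, <missing> in string cells and
% NaT in datetime cells count as missing; char text is never flagged
isNaNorMissing = @(x) (isnumeric(x) && any(isnan(x(:)))) || ...
                      (isstring(x) && any(ismissing(x(:)))) || ...
                      (isdatetime(x) && any(isnat(x(:))));
strict = cellfun(isNaNorMissing, C);
disp(strict)
```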
If you need to split it into two cell arrays, one for the NaNs and one for the <missing>s, you can do so like this:
% NaN is a double (numeric); <missing> here is a string, so isnumeric separates them
nan_rows = cellfun(@isnumeric,C_missing(:,3));
matrix_NaN = C_missing(nan_rows,[1 2]);
matrix_missing = C_missing(~nan_rows,[1 2]);
disp(matrix_NaN)
disp(matrix_missing)
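The question also asks for the station name from column 46, which the split above leaves out. A minimal sketch of one way to add it, assuming column 46 of C holds the station name as stated in the question; the cell array built here (with placeholder names var2, var3, ...) is a hypothetical stand-in for example_global, not the real data.

```matlab
% Hypothetical stand-in for example_global: a header row plus three events,
% 46 columns of scalar cells (column 1 = date_event, column 46 = station)
nCols = 46;
C = num2cell(ones(4, nCols));
C(1,:) = arrayfun(@(k) sprintf('var%d', k), 1:nCols, 'UniformOutput', false);
C(1,[1 5 10 46]) = {'date_event', 'Rrup1', 'Rx', 'station'};
C(2:4,1) = {datetime(1985,3,3,22,47,8); datetime(1997,2,19,18,25,14); ...
            datetime(2003,8,26,21,11,35)};
C(2:4,46) = {"CFLAN"; "CPLAT"; "CTRUJ"};

% Introduce one NaN and one <missing> for the demonstration
C{2,5} = NaN;
C{3,10} = string(missing);

% Same search as in the answer
[ridx, cidx] = find(cellfun(@(x) any(ismissing(x)), C));
lidx = sub2ind(size(C), ridx, cidx);

% Columns: date_event, station (column 46), variable name, value
C_missing = [C(ridx,1) C(ridx,46) C(1,cidx).' C(lidx)];

% Split on the class of the value: NaN is numeric, <missing> is a string
nan_rows = cellfun(@isnumeric, C_missing(:,4));
matrix_NaN = C_missing(nan_rows, 1:3);
matrix_missing = C_missing(~nan_rows, 1:3);
disp(matrix_NaN)
disp(matrix_missing)
```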