How do I import Velocity 3.2.0 CSV DVH data into MATLAB 9.1 (R2016b)?

Question

Daniel Bridges 2017-1-5


Commented: Walter Roberson 2017-1-7

sample.csv

How do I import dose-volume histogram (DVH) data from the radiation oncology software Velocity 3.2.0, exported as a comma-separated values (CSV) file (sample file attached), into MATLAB 9.1 (R2016b)? Using Velocity, one can create DVH data for multiple tissues displayed in a single graph and export this data as a sequential two-column CSV.

csvread requires that "the file must contain only numeric values", whereas the CSV is two columns of data sets that begin with header text and end with an empty row.

It appears that for this reason a simple call to importdata is insufficient, because the command terminates after importing only the first data set:

   test = importdata('filename.csv');
   test = 
     struct with fields:
           data: [1024×2 double]
       textdata: {2×1 cell}

whereas the file actually contains additional data sets (e.g. copying from row 1026):

   58.1704  0.00692086
  
   Prostate  
   GY   (CC)
   55.2304  0.0046139
   55.2333  0.00230695

What do we use to import data in CSV that is formatted as follows? (The following describes what is seen using Excel 2016.)

header text in Column 1
header text in Columns 1 and 2
numerical data in columns 1 and 2 in multiple rows
empty row
(repeat for next data set for multiple data sets of various length)

Walter Roberson requested a sample data file and provided a solution below using fopen, fgetl, feof, and textscan.

4 Comments

Daniel Bridges 2017-1-5

Edited: Daniel Bridges 2017-1-5

I am now seeking to answer this question.

It seems a counterproductive workaround to import the entire file into a string and then write a script to parse its contents. To put it another way, I expect MathWorks to have a more elegant solution already prepared that I merely need to find.

One workaround is for Velocity: Instead of creating the "full" multiple-tissue DVH one wishes to export, one must save to multiple files a separate DVH for each organ of interest, so that there is only one data set per CSV. This is not ideal, but it seems faster than continuing to search for additional ideas.

Edit: Walter, I thought it was not uncommon for data to be written sequentially (i.e. appended end-to-end); old magnetic tape comes to mind. Because I thought MathWorks had prepared for common data files, I thought there was a command or option I was simply unaware of. I am sorry if this expectation was incorrect, but I don't see why it was unrealistic. I have attached a sample data file to the original post.
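The one-organ-per-file workaround described above could be sketched as follows. This is only an illustration under stated assumptions: the folder, the file names, and the numbers are made up (not real DVH data), and it assumes each per-organ export keeps the same two header lines followed by two numeric columns. The delimiter and header-line count are passed to importdata explicitly rather than relying on auto-detection.

```matlab
% Hypothetical per-organ exports written to a temporary folder.
% File names and values are placeholders, not real DVH data.
tmp = tempname;
mkdir(tmp);
names = {'Bladder', 'Prostate'};
for k = 1:numel(names)
    fid = fopen(fullfile(tmp, [names{k} '.csv']), 'wt');
    fprintf(fid, '%s,\nGY,(CC)\n', names{k});      % two header lines
    fprintf(fid, '%g,%g\n', [0 100; 10 50].');     % two numeric rows
    fclose(fid);
end

% Import each file separately: ',' is the delimiter, 2 is the header-line count.
files = dir(fullfile(tmp, '*.csv'));
dvh = cell(numel(files), 1);
for k = 1:numel(files)
    S = importdata(fullfile(tmp, files(k).name), ',', 2);
    dvh{k} = S.data;                               % N-by-2 numeric matrix
end

rmdir(tmp, 's');                                   % remove the temporary folder
```

Each element of dvh then holds one organ's two-column data, at the cost of exporting every organ by hand.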

Walter Roberson 2017-1-5

Velocity appears to be from Varian. Varian advertises,

https://www.varian.com/oncology/products/software/image-management-informatics/velocity?cat=store

"Velocity provides a vendor-neutral platform that integrates image, structure, plan and dose data to create a unified patient dataset." Unfortunately their documentation is a bit sparse as to what that format is. Except they mention DICOM, and they mention RT Plan software. Someone has written software to read DICOM RT Plan data in MATLAB; see https://github.com/ulrikls/dicomrt2matlab

It sounds like your data is not DICOM based.

As I poke around, the information I am finding about DVH suggests that the most common formats are not what you are describing your file as having. But it is difficult to tell, as you have not given an example file.

Walter Roberson 2017-1-5

There are millions of file formats. People invent their own more often than they use standard formats, and they modify the file format over time, often without considering backwards compatibility. There is no practical way for MathWorks to already support them all.

Mag tape was always written in records, often fixed-length binary records. Variable-length records did exist, but when it came time to start a new data structure, typically a new record was written. Not inevitably, though: packing multiple structures into one tape record did happen. Remember, though, that memory was typically not large and a complete record at a time had to be read in for mag tape (no positioning by bytes), so variable-length records did not pack in long continuous streams the way that became common with disk files.


Answer 1

Walter Roberson 2017-1-5


There is no pre-written MathWorks routine to read that file format. It is, however, not difficult to write code for it.

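   num = 0;
   fid = fopen('sample.csv','rt');
   while true
     H1 = fgetl(fid);   % first header line (organ name)
     if feof(fid); break; end
     H2 = fgetl(fid);   % second header line
     if feof(fid); break; end
     % read numeric rows until textscan reaches a line it cannot parse as numbers
     datacell = textscan(fid, '%f%f', 'delimiter', ',', 'collectoutput', true);
     if isempty(datacell) || isempty(datacell{1}); break; end
     num = num + 1;
     headers(num) = {H1, H2};   % as posted; the comments below report an error here and use headers(num,:)
     data(num) = datacell;
     fgetl(fid);  % the empty line between organs (as posted; the comments below remove this line)
   end
   fclose(fid);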

This will create two cell arrays, one of headers and the other of corresponding numeric values. You might want to do some processing on H1 (organ name) and H2 (not sure what that line is for) before storing that information.

7 Comments

Daniel Bridges 2017-1-6

Edited: Daniel Bridges 2017-1-6

The headers line causes the error:

headers(num) = {H1,H2};

It is fixed by allowing for columns, enabling the creation of a 3x2 cell array in this case:

headers(num,:) = {H1,H2};

To get the headers to read correctly, I've had to omit the last line:

fgetl(fid); %the empty line between organs

This command was actually skipping the first header of the next data section, causing the first row of data to be stored as the second header. With it removed, the headers are stored correctly, but the empty row is being stored at the end of the numerical data as NaN in each column.

I'd like to accept this answer once I can remove the NaN from the end of the imported data. I've been writing a script to plot the data, and while the NaN may not negatively affect it since it's at the end of the vectors, for the sake of propriety it seems better to remove it.

I plan to return to this problem in about 10 hours, and try to post a solution myself unless someone does so first.
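One way the NaN rows could be stripped after the import, rather than inside the read loop, is sketched below. The data cell array here is a made-up stand-in for the imported result (the values are placeholders), and note that this variant drops every row that is NaN in all columns, not only the last row.

```matlab
% Stand-in for the imported cell array; values are placeholders.
% The first data set ends with the NaN row produced by the empty CSV row.
data = {[0 100; 10 50; NaN NaN], [0 40; 5 20; 20 1]};

% Keep only the rows that are not NaN in every column.
trimmed = cellfun(@(M) M(~all(isnan(M), 2), :), data, 'UniformOutput', false);
```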

Daniel Bridges 2017-1-7

Edited: Daniel Bridges 2017-1-7

Is it not more legible and memory-efficient to put it immediately after textscan's cell array creation?

     datacell = textscan(fid,'%f%f','delimiter',',','collectoutput',true); 
     if isempty(datacell) || isempty(datacell{1}); break; end 
     if any(isnan(datacell{1}(end,:))); datacell{1}(end,:) = []; end

Walter Roberson 2017-1-7

No, it is the same efficiency. But it certainly does not hurt to have it closer to where datacell is created.
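Putting the adjustments from this thread together (the 'collectoutput' option, headers(num,:), no extra fgetl after textscan, and trimming the trailing NaN row), the reader might look like the sketch below. It is a consolidation under assumptions, not a tested solution for real Velocity exports: the synthetic file, its values, and the assumption that the "empty row" is stored as a lone comma are all made up for illustration. End of file is detected with ischar on the fgetl result, a small substitution for the feof checks in the answer.

```matlab
% Build a tiny synthetic file in the layout described in the question.
% Values are placeholders; the ',' form of the empty row is an assumption.
fname = [tempname '.csv'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Bladder,\nGY,(CC)\n0,100\n10,50\n,\n');
fprintf(fid, 'Prostate,\nGY,(CC)\n0,40\n5,20\n20,1\n,\n');
fclose(fid);

num = 0;
headers = cell(0, 2);    % one row per data set: {organ line, units line}
data = {};               % one numeric N-by-2 matrix per data set
fid = fopen(fname, 'rt');
while true
    H1 = fgetl(fid);                     % first header line
    if ~ischar(H1); break; end           % fgetl returns -1 at end of file
    H2 = fgetl(fid);                     % second header line
    if ~ischar(H2); break; end
    % Read numeric rows; textscan stops at the next non-numeric header line.
    datacell = textscan(fid, '%f%f', 'Delimiter', ',', 'CollectOutput', true);
    if isempty(datacell) || isempty(datacell{1}); break; end
    % Drop the trailing row if the empty CSV row was read in as NaN.
    if any(isnan(datacell{1}(end,:))); datacell{1}(end,:) = []; end
    num = num + 1;
    headers(num,:) = {H1, H2};
    data(num) = datacell;
end
fclose(fid);
delete(fname);           % remove the synthetic file
```

After the loop, headers{k,1} holds the raw first header line of data set k (including any trailing comma) and data{k} holds its numeric rows, so some cleanup of the header text may still be wanted.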
