Data Organization in ECG Analysis: Separate Leads or Individual Signals
Dear Matlab Community,
I am currently in the process of planning data organization for an ECG analysis using the PTB-XL dataset, and I would like to seek your advice and expertise on a specific question.
When it comes to data organization, is it recommended to use each lead separately (12 leads --> 12 columns)? Or would it be preferable to adopt an approach where each row represents a distinct ECG signal?
I would greatly appreciate insights and recommendations from those who have experience in this area.
Thank you very much for your valuable contribution.
Best regards,
Accepted Answer
Star Strider
2024-5-25
I am not certain what you want to do, or what question you are asking. In general, EKG data are analysed column-wise, with a time vector (generally beginning at zero, with regular sampling intervals and a sampling frequency of at least 256 Hz, and 1 kHz if possible) in the first column and each lead in successive columns, ordered characteristically as I, II, III, aVR, aVL, aVF, V1-V6. This is also the way MATLAB data matrices in general are organised.
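As a rough illustration of that column-wise layout, here is a minimal sketch using synthetic data. The 100 Hz sampling frequency and the random samples are assumptions for the example only. The timetable variant is an optional alternative to the plain matrix and is not something the answer above prescribes.

```matlab
% Minimal sketch: one shared time base, one column per lead (synthetic data)
Fs = 100;                                   % sampling frequency in Hz (assumed)
leadNames = ["I","II","III","aVR","aVL","aVF","V1","V2","V3","V4","V5","V6"];
EKG = randn(1000, numel(leadNames));        % placeholder samples: rows = time, columns = leads

% Plain matrix layout: time in column 1, leads in columns 2-13
Time = (0:size(EKG,1)-1).' / Fs;
EKG_Matrix = [Time EKG];

% Equivalent timetable layout: row times are generated from the sample rate
EKG_TT = array2timetable(EKG, 'SampleRate', Fs, 'VariableNames', leadNames);
leadII = EKG_TT.II;                         % access one lead by name
```

Either layout keeps all twelve leads aligned to the same sample instants, which is the point of the recommendation.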
22 Comments
rawaa mejri
2024-5-25
To clarify: do we use a single temporal vector (one column only) for all 12 leads (12 columns), or do we create a temporal vector for each lead? For instance, for lead I, would we have a column called time_I (12 leads -> 12 temporal vectors)?
Star Strider
2024-5-25
My pleasure!
I’m not sure what you mean by ‘temporal vector’. There is characteristically one time vector in the first column, and each corresponding EKG lead in columns 2-13 (2-14 if an additional lead is included). All the leads are collected and recorded at the same times, those times given in the first-column time vector.
Collecting them serially is not appropriate, because natural variations in heart rate in a normal, healthy heart make the serial records uninterpretable when aggregated. They must all be collected and recorded at the same times.
rawaa mejri
2024-5-25
Thanks a lot for your help!
I understand! Just one last question: in the PTB-XL dataset, I have a .dat file and a .hea file. I suppose I should use the .hea file to extract the leads. Regarding the time vector, should I generate it based on the sampling frequency provided?
Star Strider
2024-5-25
My pleasure!
I am not familiar with that dataset. If you do not already have a time vector provided with it, you can create one using the linspace function —
EKG = rand(10,12); % EKG Data
Fs = 256; % Sampling Frequency
Time = linspace(0, size(EKG,1)-1, size(EKG,1)).'/Fs; % Synthetic Time Vector
EKG_Matrix = [Time EKG]
EKG_Matrix = 10x13
0 0.4856 0.4760 0.7745 0.4151 0.9988 0.3865 0.1495 0.0096 0.3425 0.5348 0.8598 0.7829
0.0039 0.2463 0.4456 0.8281 0.5107 0.2525 0.9103 0.6471 0.9814 0.8694 0.4492 0.7721 0.6395
0.0078 0.0309 0.3216 0.8742 0.5227 0.1697 0.0887 0.3554 0.8075 0.9444 0.6633 0.4132 0.8197
0.0117 0.8595 0.4542 0.1945 0.4590 0.8822 0.3897 0.8010 0.4530 0.7423 0.4072 0.3338 0.5004
0.0156 0.6430 0.4124 0.0590 0.6767 0.2136 0.2213 0.6985 0.4550 0.9520 0.1370 0.3643 0.0842
0.0195 0.2821 0.6850 0.2601 0.8432 0.0871 0.4847 0.4177 0.0077 0.2032 0.9331 0.3941 0.1442
0.0234 0.8091 0.6667 0.7804 0.9966 0.2299 0.6895 0.8777 0.4294 0.6993 0.5283 0.4722 0.7945
0.0273 0.3704 0.8445 0.5042 0.5735 0.3571 0.5817 0.5753 0.2384 0.2834 0.7878 0.5440 0.5434
0.0312 0.7034 0.0919 0.8106 0.9310 0.7157 0.6494 0.4306 0.1360 0.6325 0.9629 0.2411 0.9952
0.0352 0.4809 0.0332 0.9774 0.0175 0.4227 0.0604 0.3911 0.7998 0.9508 0.4208 0.3304 0.4006
figure
plot(EKG_Matrix(:,1), EKG_Matrix(:,2:end)+[1:size(EKG_Matrix,2)-1]*2)
grid
ylim('padded')
xlabel('Time')
legend(compose('Lead %s',["I","II","III","aV_R","aV_L","aV_F","V_1","V_2","V_3","V_4","V_5","V_6"]), 'Location','eastoutside')
.
rawaa mejri
2024-5-25
Thank you again for your invaluable assistance. Following your recommendations, I have selected the first example and tried it out. Here are the signals; based on your experience, are they correct? Here are the data in the file .hea:
00001_lr 12 100 1000
00001_lr.dat 16 1000.0(0)/mV 16 0 -119 1508 0 I
00001_lr.dat 16 1000.0(0)/mV 16 0 -55 723 0 II
00001_lr.dat 16 1000.0(0)/mV 16 0 64 64758 0 III
00001_lr.dat 16 1000.0(0)/mV 16 0 86 64423 0 AVR
00001_lr.dat 16 1000.0(0)/mV 16 0 -91 1211 0 AVL
00001_lr.dat 16 1000.0(0)/mV 16 0 4 7 0 AVF
00001_lr.dat 16 1000.0(0)/mV 16 0 -69 63827 0 V1
00001_lr.dat 16 1000.0(0)/mV 16 0 -31 6999 0 V2
00001_lr.dat 16 1000.0(0)/mV 16 0 0 63759 0 V3
00001_lr.dat 16 1000.0(0)/mV 16 0 -26 61447 0 V4
00001_lr.dat 16 1000.0(0)/mV 16 0 -39 64979 0 V5
00001_lr.dat 16 1000.0(0)/mV 16 0 -79 832 0 V6
Here are the first rows of the data:
Star Strider
2024-5-25
My pleasure!
That appears to be correct, and the timing appears to be appropriate (about 66 bpm).
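For reference, a heart-rate figure like that follows from the R-R interval. Here is a minimal sketch of the arithmetic. The R-peak times are made-up values for illustration, not measurements from the posted record.

```matlab
% Minimal sketch: heart rate from R-peak times (hypothetical values)
rPeakTimes  = [0.42 1.33 2.24 3.15 4.06];   % assumed R-peak locations in seconds
rrIntervals = diff(rPeakTimes);             % R-R intervals in seconds
heartRate   = 60 / mean(rrIntervals);       % beats per minute
fprintf('Mean heart rate: %.0f bpm\n', heartRate);
```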
The only change I would make is the order of the plots. The most common arrangement is something like this —
EKG = rand(100,12); % EKG Data
Fs = 256; % Sampling Frequency
Time = linspace(0, size(EKG,1)-1, size(EKG,1)).'/Fs; % Synthetic Time Vector
EKG_Matrix = [Time EKG];
Leads = ["I","II","III","aV_R","aV_L","aV_F","V_1","V_2","V_3","V_4","V_5","V_6"];
figure
tiledlayout(6,2)
for k = 1:6
nexttile(2*k-1)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+1))
grid
title(Leads(k))
end
for k = 1:6
nexttile(2*k)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+7)) % column 1 is time, so lead k+6 is in column k+7
grid
title(Leads(k+6))
end
Another common format —
figure
tiledlayout(3,4)
for k = 1:3
nexttile(4*k-3)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+1))
grid
title(Leads(k))
end
for k = 1:3
nexttile(4*k-2)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+4))
grid
title(Leads(k+3))
end
for k = 1:3
nexttile(4*k-1)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+7))
grid
title(Leads(k+6))
end
for k = 1:3
nexttile(4*k)
plot(EKG_Matrix(:,1), EKG_Matrix(:,k+10))
grid
title(Leads(k+9))
end
That would actually look correct if I had your data to plot.
.
rawaa mejri
2024-5-28
Regarding the header (.hea) file which contains values like this:
00001_lr 12 100 1000
00001_lr.dat 16 1000.0(0)/mV 16 0 -119 1508 0 I
00001_lr.dat 16 1000.0(0)/mV 16 0 -55 723 0 II
00001_lr.dat 16 1000.0(0)/mV 16 0 64 64758 0 III
00001_lr.dat 16 1000.0(0)/mV 16 0 86 64423 0 AVR
00001_lr.dat 16 1000.0(0)/mV 16 0 -91 1211 0 AVL
00001_lr.dat 16 1000.0(0)/mV 16 0 4 7 0 AVF
00001_lr.dat 16 1000.0(0)/mV 16 0 -69 63827 0 V1
00001_lr.dat 16 1000.0(0)/mV 16 0 -31 6999 0 V2
00001_lr.dat 16 1000.0(0)/mV 16 0 0 63759 0 V3
00001_lr.dat 16 1000.0(0)/mV 16 0 -26 61447 0 V4
00001_lr.dat 16 1000.0(0)/mV 16 0 -39 64979 0 V5
00001_lr.dat 16 1000.0(0)/mV 16 0 -79 832 0 V6
How do you determine the values for gain and baseline? After extracting the signals, is it logical for the first line (excluding the header) to contain all zeros (like the first row of the picture)?
Thank you very much for your clarification.
Star Strider
2024-5-28
I am not familiar with that format. If you can upload those files and tell me from where you downloaded them (URL), I might be able to figure that out by referring to the appropriate documentation for the file. Just now, I have no idea what the numbers in the .hea file refer to.
rawaa mejri
2024-5-28
Thanks a lot,
Here is the link to the whole dataset: https://physionet.org/content/ptb-xl/1.0.3/
Here is the link to the files panel: https://physionet.org/content/ptb-xl/1.0.3/records100/00000/#files-panel
Star Strider
2024-5-28
As always, my pleasure!
I have not used PhysioNet in a few years, and am not familiar with this database (added after I last used PhysioNet).
I cannot find any information on the header file format, so I still do not have any idea how to interpret it. However, looking at the Summary tab for the first record, it looks suspiciously like the .hea information, so perhaps it would be interpreted as:
Record length 00:00:10
Clock frequency 100 ticks per second
Signal: I, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: II, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: III, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: AVR, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: AVL, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: AVF, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V1, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V2, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V3, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V4, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V5, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal: V6, 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
and perhaps that is how to decode it. There does not appear to be any other information available with respect to its format, at least that I can find. I have no idea what the numbers mean otherwise, and I cannot find a source for that format (otherwise it would likely be straightforward to write code to translate that information into something intelligible).
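For what it is worth, the first line of the .hea file does line up with that summary if it is read as a WFDB record line: record name, number of signals, sampling frequency, and number of samples per signal. Here is a minimal sketch under that assumption, using the line posted earlier in this thread. The variable names are only illustrative.

```matlab
% Minimal sketch: decode the record line of the .hea file (WFDB layout assumed)
recordLine = '00001_lr 12 100 1000';
vals = sscanf(recordLine, '%*s %d %d %d');   % skip the record name, read three integers
numSignals = vals(1);                        % 12 signals (leads)
Fs         = vals(2);                        % 100 samples per second per signal
numSamples = vals(3);                        % 1000 samples per signal
duration   = numSamples / Fs;                % record length in seconds
fprintf('%d leads, %d Hz, %g s\n', numSignals, Fs, duration);
```

The computed duration of 10 s matches the "Record length 00:00:10" entry in the summary.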
It is necessary to have an account and log into it to be able to contact the authors for questions or comments. (I do not have one, and have no specific need to create one.) This is not generally necessary for PhysioNet in my experience elsewhere on the site, since I have contacted the administrators a few times with specific questions.
Apparently, the sampling frequency is 100 Hz, with a Nyquist frequency of 50 Hz (this is pushing it for EKG traces, since the spectral content of a normal EKG is generally 0-45 Hz, and abnormal EKGs can have frequency components up to about 100 Hz).
That is the best I can do with respect to the .hea files. If you have more information about them in the files you have downloaded, please share it.
.
rawaa mejri
2024-5-28
Thank you, honestly I don't have any other information but I found this on the site (https://archive.physionet.org/physiotools/wag/header-5.htm):
- "ADC gain (ADC units per physical unit) [optional] This field is a floating-point number that specifies the difference in sample values that would be observed if a step of one physical unit occurred in the original analog signal. For ECGs, the gain is usually roughly equal to the R-wave amplitude in a lead that is roughly parallel to the mean cardiac electrical axis. If the gain is zero or missing, this indicates that the signal amplitude is uncalibrated; in such cases, a value of 200 (DEFGAIN, defined in <wfdb/wfdb.h>) ADC units per physical unit may be assumed.
- baseline (ADC units) [optional] This field can be present only if the ADC gain is also present. It is not separated by whitespace from the ADC gain field; rather, it is surrounded by parentheses, which delimit it. The baseline is an integer that specifies the sample value corresponding to 0 physical units. If absent, the baseline is taken to be equal to the ADC zero. Note that the baseline need not be a value within the ADC range; for example, if the ADC input range corresponds to 200-300 degrees Kelvin, the baseline is the (extended precision) value that would map to 0 degrees Kelvin. WFDB library versions 5.0 and earlier ignore baseline fields."
Based on this definition and your explanation, will we find all values at 0 and at 1000? Does it make sense?
Star Strider
2024-5-28
As always, my pleasure!
Looking at my previous Comment, there are 1000 adu (that I assume means analog-digital units, or bits) per mV, and since the typical value for an R-deflection is 1 mV, that looks correct to me. I assume that is the ADC gain. It would appear that the 16-bit ADC values would be converted to floating-point representation using that value. (That information is hidden from us, however we do not need it anyway.) The data you plotted appear to reflect that. The table image would support that assumption.
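The conversion itself can be sketched as follows, assuming the WFDB signal-line layout (file name, format, gain(baseline)/units, ADC resolution, ADC zero, initial value, checksum, block size, description). The header line is the one posted above. The extra ADC values are invented for illustration.

```matlab
% Minimal sketch: convert raw ADC samples to millivolts
signalLine = '00001_lr.dat 16 1000.0(0)/mV 16 0 -119 1508 0 I';
parts = strsplit(signalLine);
gb = sscanf(parts{3}, '%f(%d)');        % [gain; baseline] from the 'gain(baseline)/units' field
gain     = gb(1);                       % 1000 ADC units per mV
baseline = gb(2);                       % ADC value that corresponds to 0 mV
initialValue = str2double(parts{6});    % first sample of this lead in ADC units

adc = [initialValue; 0; 500; 1000];     % example ADC values (only the first comes from the header)
mV  = (adc - baseline) / gain;          % physical units
```

Under this reading the sixth field is the initial sample value rather than the baseline, so the first sample of lead I would be -0.119 mV, not 0.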
rawaa mejri
2024-5-28
Can you validate this please, I've done everything?
heaFile = '00001_lr.hea';
datFile = '00001_lr.dat';
% Open and read the .hea file
fid = fopen(heaFile, 'r');
if fid == -1
error('Error opening .hea file');
end
headerInfo = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
% Extract information from the .hea file
headerLines = headerInfo{1};
numSignals = sscanf(headerLines{1}, '%*s %d %*d %*d');
samplingRate = sscanf(headerLines{1}, '%*s %*d %d %*d');
numSamples = sscanf(headerLines{1}, '%*s %*d %*d %d');
% Initialize gain and baseline values
defaultGain = 200; % Default gain value if the gain field is zero
gain = zeros(1, numSignals);
baseline = zeros(1, numSignals);
% Signal line fields: file, format, gain(baseline)/units, resolution, ADC zero, initial value, checksum, block size, description
% The baseline is the value in parentheses in field 3; field 6 is the initial sample value, not the baseline
for i = 1:numSignals
lineParts = strsplit(headerLines{i+1});
vals = sscanf(lineParts{3}, '%f(%d)'); % [gain; baseline]
gain(i) = vals(1);
if gain(i) == 0
gain(i) = defaultGain;
end
if numel(vals) > 1
baseline(i) = vals(2);
else
baseline(i) = str2double(lineParts{5}); % baseline defaults to the ADC zero when absent
end
end
fid = fopen(datFile, 'r');
if fid == -1
error('Error opening .dat file');
end
data = fread(fid, [numSignals, numSamples], 'int16')';
fclose(fid);
% Convert data to physical units
for i = 1:numSignals
data(:, i) = (data(:, i) - baseline(i)) / gain(i);
end
% Generate time in seconds
time = (0:numSamples-1)' / samplingRate;
outputData = [time, data];
header = {'Time', 'I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6'};
outputTable = array2table(outputData, 'VariableNames', header);
csvFileName = 'ecg_data.csv';
writetable(outputTable, csvFileName);
disp(['ECG data extracted and saved to ', csvFileName]);
Star Strider
2024-5-28
As always, my pleasure!
Note that the fopen function has a second output that contains an error message.
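A minimal sketch of that pattern is below. The file name is hypothetical and deliberately does not exist, so that the failure branch runs.

```matlab
% Minimal sketch: use the second output of fopen to report why the open failed
heaFile = 'no_such_file.hea';           % hypothetical name, chosen so the open fails
[fid, errmsg] = fopen(heaFile, 'r');
if fid == -1
    % Real code would typically call error(...) here; a warning keeps the sketch running
    warning('Could not open %s: %s', heaFile, errmsg);
else
    fclose(fid);
end
```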
There is nothing wrong with using textscan, however readtable might be more appropriate, and probably easier to work with. You have array2table so I assume you have readtable.
Beyond those considerations, the code appears to be correct. (I do not have your data so I cannot independently verify it.)
If you want to remove any baseline drift or high-frequency noise (or both, and if you have the Signal Processing Toolbox), the highpass or bandpass functions could be appropriate. For best results, use the 'ImpulseResponse','iir' name-value pair with those. There is a minimal amount of high-frequency noise, however if you want to eliminate it, first take the Fourier transform of your signal to determine the spectral characteristics, and then use that information to design your filter passband limits. (I have my own function that I can post, that does that efficiently, however I would suggest using the pspectrum function otherwise, unless you want to write your own function to implement the fft and return the appropriate results.)
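A minimal sketch of that workflow is below, assuming the Signal Processing Toolbox is installed. The signal is synthetic, and the 0.5 Hz and 40 Hz band edges are placeholders that would be chosen from the actual spectrum.

```matlab
% Minimal sketch (Signal Processing Toolbox): inspect the spectrum, then bandpass
Fs = 100;                                % sampling frequency in Hz (100 Hz records)
t  = (0:999).' / Fs;                     % 10 s time vector
ecg = sin(2*pi*1.1*t) + 0.5*sin(2*pi*0.2*t) + 0.05*randn(size(t));  % stand-in signal with slow drift

[p, f] = pspectrum(ecg, Fs);             % power spectrum estimate and frequency vector
figure
plot(f, 10*log10(p))
grid
xlabel('Frequency (Hz)')
ylabel('Power (dB)')

% Band edges are assumptions for illustration; pick them from the plot above
ecgFilt = bandpass(ecg, [0.5 40], Fs, 'ImpulseResponse', 'iir');
```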
.
rawaa mejri
2024-5-28
Thank you very much for your help and your many pieces of advice. Is it correct and logical that every lead signal starts at 0?
As you mentioned noise, in the dataset, there is a file that contains 4 types of noise (baseline drift, static noise, burst noise, electrodes problems). Each type of noise is linked to a lead, for example: baseline drift in II, III, AVF, or all (i.e., all leads). Are there any recommended methods? Thresholds?
Star Strider
2024-5-28
As always, my pleasure!
Every signal should start at time=0, and all leads should share the same time vector. The isoelectric point (zero reference) in an EKG recording is the zero voltage reference of the P-R interval in every P-T segment (beginning of the P-deflection to the end of the T-deflection), because the heart is considered to be ‘at rest’ at that time. Every other voltage is referenced to that. Ideally, all P-R isoelectric points are the same voltage throughout the EKG recording, and in every lead.
With respect to noise, the sort of signal processing used depends on the type of noise (broadband or band-limited). Baseline variations can be eliminated with a highpass or bandpass filter. What you use depends on what you want to do.
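As a rough sketch of the baseline-drift case, assuming the Signal Processing Toolbox: the data are synthetic, and the 0.5 Hz cutoff is an assumption for the example rather than a value recommended in this thread.

```matlab
% Minimal sketch (Signal Processing Toolbox): remove slow baseline drift from several leads
Fs = 100;                                      % sampling frequency in Hz
t  = (0:999).' / Fs;                           % 10 s time vector
clean = sin(2*pi*1.2*t);                       % stand-in for the cardiac signal
drift = 0.8*sin(2*pi*0.1*t);                   % slow baseline wander at 0.1 Hz
leads = [clean + drift, 0.5*clean + drift];    % two columns = two leads sharing the drift

% highpass filters each column independently; the cutoff is an assumed value
leadsFilt = highpass(leads, 0.5, Fs, 'ImpulseResponse', 'iir');
```

Because the filter is applied column-wise, the same call works on the full 12-column lead matrix.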
Once the noise and baseline drift are accounted for (and eliminated if possible), there are no specific thresholds, at least with respect to signal processing. Significant features after that are the various intervals and the voltages in those intervals. Books have been written on EKG interpretation (absolute and relative voltages and specific intervals) so I will not go into that here. Braunwald's Heart Disease likely has the best discussion on all of that.
.