Data Organization in ECG Analysis: Separate Leads or Individual Signals

Question

rawaa mejri 2024-5-25

https://ww2.mathworks.cn/matlabcentral/answers/2122541-data-organization-in-ecg-analysis-separate-leads-or-individual-signals

Commented: Star Strider 2024-5-28

Dear Matlab Community,

I am currently in the process of planning data organization for an ECG analysis using the PTB-XL dataset, and I would like to seek your advice and expertise on a specific question.

When it comes to data organization, is it recommended to use each lead separately (12 leads --> 12 columns)? Or would it be preferable to adopt an approach where each row represents a distinct ECG signal?

I would greatly appreciate insights and recommendations from those who have experience in this area.

Thank you very much for your valuable contribution.

Best regards,


Answer 1

Star Strider 2024-5-25

https://ww2.mathworks.cn/matlabcentral/answers/2122541-data-organization-in-ecg-analysis-separate-leads-or-individual-signals#answer_1463331

I am not certain what you want to do, or what question you are asking. In general, EKG data are analysed column-wise: a time vector in the first column (generally beginning at zero, with regular sampling intervals and a sampling frequency of at least 256 Hz, preferably 1 kHz), followed by one column per lead, ordered characteristically as I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, V6. This is also the way MATLAB data matrices are generally organised.
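A minimal sketch of that column-wise layout, assuming 10 s of placeholder data at an assumed 500 Hz. Storing it as a table lets each lead be selected by name, while column-oriented functions such as std still operate on every lead at once. If a row-per-signal layout is ever required, it is simply the transpose (EKG.').

```matlab
% Hypothetical example: 10 s of 12-lead data at an assumed 500 Hz (random placeholder values)
Fs = 500;                               % sampling frequency (Hz), assumed
N = 10*Fs;                              % number of samples
EKG = randn(N, 12);                     % one column per lead
Time = (0:N-1).'/Fs;                    % time vector starting at 0 s
LeadNames = {'I','II','III','aVR','aVL','aVF','V1','V2','V3','V4','V5','V6'};

% Time in the first column, then one column per lead in the conventional order
T = array2table([Time EKG], 'VariableNames', [{'Time'} LeadNames]);

% Column-oriented functions operate on every lead at once
leadStd = std(EKG);                     % 1x12, one value per lead
leadII = T.II;                          % a single lead, selected by name
```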


Star Strider 2024-5-25


My pleasure!

I am not familiar with that dataset. If you do not already have a time vector provided with it, you can create one using the linspace function —

EKG = rand(10,12); % EKG Data
Fs = 256; % Sampling Frequency
Time = linspace(0, size(EKG,1)-1, size(EKG,1)).'/Fs; % Synthetic Time Vector
EKG_Matrix = [Time EKG]

EKG_Matrix = 10x13

     0  0.4856  0.4760  0.7745  0.4151  0.9988  0.3865  0.1495  0.0096  0.3425  0.5348  0.8598  0.7829
0.0039  0.2463  0.4456  0.8281  0.5107  0.2525  0.9103  0.6471  0.9814  0.8694  0.4492  0.7721  0.6395
0.0078  0.0309  0.3216  0.8742  0.5227  0.1697  0.0887  0.3554  0.8075  0.9444  0.6633  0.4132  0.8197
0.0117  0.8595  0.4542  0.1945  0.4590  0.8822  0.3897  0.8010  0.4530  0.7423  0.4072  0.3338  0.5004
0.0156  0.6430  0.4124  0.0590  0.6767  0.2136  0.2213  0.6985  0.4550  0.9520  0.1370  0.3643  0.0842
0.0195  0.2821  0.6850  0.2601  0.8432  0.0871  0.4847  0.4177  0.0077  0.2032  0.9331  0.3941  0.1442
0.0234  0.8091  0.6667  0.7804  0.9966  0.2299  0.6895  0.8777  0.4294  0.6993  0.5283  0.4722  0.7945
0.0273  0.3704  0.8445  0.5042  0.5735  0.3571  0.5817  0.5753  0.2384  0.2834  0.7878  0.5440  0.5434
0.0312  0.7034  0.0919  0.8106  0.9310  0.7157  0.6494  0.4306  0.1360  0.6325  0.9629  0.2411  0.9952
0.0352  0.4809  0.0332  0.9774  0.0175  0.4227  0.0604  0.3911  0.7998  0.9508  0.4208  0.3304  0.4006


figure
plot(EKG_Matrix(:,1), EKG_Matrix(:,2:end)+[1:size(EKG_Matrix,2)-1]*2)
grid
ylim('padded')
xlabel('Time')
legend(compose('Lead %s',["I","II","III","aV_R","aV_L","aV_F","V_1","V_2","V_3","V_4","V_5","V_6"]), 'Location','eastoutside')
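For reference, the linspace call above produces the same time vector as the colon form (0:N-1).'/Fs: both start at 0 s and step by 1/Fs. A quick check, reusing the 256 Hz and 10-sample sizes from the example:

```matlab
Fs = 256;                                  % sampling frequency (Hz), as in the example above
N = 10;                                    % number of samples
t1 = linspace(0, N-1, N).'/Fs;             % linspace version
t2 = (0:N-1).'/Fs;                         % colon-operator version
dt = diff(t2);                             % sampling interval, should be 1/Fs everywhere
```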

.

Star Strider 2024-5-25


My pleasure!

That appears to be correct, and the timing appears to be appropriate (about 66 bpm).
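The heart rate follows from the R-R interval as 60/RR (RR in seconds), so about 66 bpm corresponds to R peaks roughly 0.91 s apart. A minimal sketch of that arithmetic, assuming the R-peak sample indices are already known (in practice they would come from a peak detector such as findpeaks); the indices and the 100 Hz rate here are hypothetical:

```matlab
% Hypothetical R-peak sample indices
Fs = 100;                        % sampling frequency (Hz), assumed
Rloc = 50:91:1000;               % R peaks every 91 samples
RR = diff(Rloc)/Fs;              % R-R intervals (s), 0.91 s each
HR = 60 ./ RR;                   % instantaneous heart rate (beats/min)
meanHR = mean(HR)                % about 65.9 bpm
```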

The only change I would make is the order of the plots. The most common arrangement is something like this —

EKG = rand(100,12); % EKG Data
Fs = 256; % Sampling Frequency
Time = linspace(0, size(EKG,1)-1, size(EKG,1)).'/Fs; % Synthetic Time Vector
EKG_Matrix = [Time EKG];
Leads = ["I","II","III","aV_R","aV_L","aV_F","V_1","V_2","V_3","V_4","V_5","V_6"];
figure
tiledlayout(6,2)
for k = 1:6 % Limb leads in the left column (lead k is in column k+1)
    nexttile(2*k-1)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+1))
    grid
    title(Leads(k))
end
for k = 1:6 % Precordial leads in the right column (lead k+6 is in column k+7)
    nexttile(2*k)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+7))
    grid
    title(Leads(k+6))
end

Another common format —

figure
tiledlayout(3,4)
for k = 1:3
    nexttile(4*k-3)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+1))
    grid
    title(Leads(k))
end
for k = 1:3
    nexttile(4*k-2)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+4))
    grid
    title(Leads(k+3))
end
for k = 1:3
    nexttile(4*k-1)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+7))
    grid
    title(Leads(k+6))
end
for k = 1:3
    nexttile(4*k)
    plot(EKG_Matrix(:,1), EKG_Matrix(:,k+10))
    grid
    title(Leads(k+9))
end

That would actually look correct if I had your data to plot.
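Both layouts fill the grid column by column, so the separate loops can also be condensed into one by computing each lead's tile number. A sketch under that assumption, with random placeholder data; nRows and nCols are illustrative names, and setting them to 6 and 2 gives the 6x2 arrangement instead:

```matlab
% Placeholder data: one column per lead, time vector starting at 0 s
Fs = 256;                                   % sampling frequency (Hz)
EKG = rand(100,12);                         % random placeholder "leads"
Time = (0:size(EKG,1)-1).'/Fs;
Leads = ["I","II","III","aV_R","aV_L","aV_F","V_1","V_2","V_3","V_4","V_5","V_6"];

nRows = 3; nCols = 4;                       % set to 6 and 2 for the 6x2 layout
tiles = zeros(1, numel(Leads));             % tile used for each lead (kept for checking)
figure
tiledlayout(nRows, nCols)
for j = 1:numel(Leads)
    c = ceil(j/nRows);                      % grid column that holds lead j
    r = j - nRows*(c-1);                    % row within that column
    tiles(j) = (r-1)*nCols + c;             % tiledlayout numbers tiles row by row
    nexttile(tiles(j))
    plot(Time, EKG(:,j))
    grid
    title(Leads(j))
end
```

This reproduces the tile order of the four-loop version (tiles 1, 5, 9 for I, II, III, then 2, 6, 10, and so on).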

.

Star Strider 2024-5-28

As always, my pleasure!

I have not used PhysioNet in a few years, and am not familiar with this database (added after I last used PhysioNet).

I cannot find any information on the header file format, so I still do not know how to interpret it. However, the Summary tab for the first record looks suspiciously like the .hea information, so perhaps it would be interpreted as:

Record length: 00:00:10
Clock frequency: 100 ticks per second
Signal I: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal II: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal III: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal AVR: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal AVL: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal AVF: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V1: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V2: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V3: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V4: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V5: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0
Signal V6: 1 tick per sample; 1000 adu/mV; 16-bit ADC, zero at 0; baseline is 0

and perhaps that is how to decode it. There does not appear to be any other information available with respect to its format, at least that I can find. I have no idea what the numbers mean otherwise, and I cannot find a source for that format (otherwise it would likely be straightforward to write code to translate that information into something intelligible).

An account (and logging in to it) is required to contact the authors with questions or comments. (I do not have one, and have no specific need to create one.) In my experience this is not generally necessary elsewhere on PhysioNet, since I have contacted the administrators a few times with specific questions.

Apparently, the sampling frequency is 100 Hz, with a Nyquist frequency of 50 Hz (this is pushing it for EKG traces, since the spectral content of a normal EKG is generally 0-45 Hz, and abnormal EKGs can have frequency components up to about 100 Hz).
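Those numbers can be cross-checked against the first (record) line of the .hea file, which in the WFDB format lists the record name, number of signals, sampling frequency, and number of samples per signal. The record line below is an assumed example consistent with the Summary tab (12 signals, 100 Hz, 10 s), not a copy of the actual file:

```matlab
% Assumed record line of a low-rate header, consistent with the Summary tab
recordLine = '00001_lr 12 100 1000';
vals = sscanf(recordLine, '%*s %f %f %f');   % skip the record name, read three numbers
numSignals = vals(1);                         % number of leads
Fs = vals(2);                                 % sampling frequency (Hz)
numSamples = vals(3);                         % samples per signal
duration = numSamples/Fs;                     % record length (s)
nyquist = Fs/2;                               % highest representable frequency (Hz)
```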

That is the best I can do with respect to the .hea files. If you have more information about them in the files you have downloaded, please share it.

.

rawaa mejri 2024-5-28

Thanks @Star Strider,

Thank you. Honestly, I don't have any other information, but I found this on the site (https://archive.physionet.org/physiotools/wag/header-5.htm):

"ADC gain (ADC units per physical unit) [optional] This field is a floating-point number that specifies the difference in sample values that would be observed if a step of one physical unit occurred in the original analog signal. For ECGs, the gain is usually roughly equal to the R-wave amplitude in a lead that is roughly parallel to the mean cardiac electrical axis. If the gain is zero or missing, this indicates that the signal amplitude is uncalibrated; in such cases, a value of 200 (DEFGAIN, defined in <wfdb/wfdb.h>) ADC units per physical unit may be assumed.
baseline (ADC units) [optional] This field can be present only if the ADC gain is also present. It is not separated by whitespace from the ADC gain field; rather, it is surrounded by parentheses, which delimit it. The baseline is an integer that specifies the sample value corresponding to 0 physical units. If absent, the baseline is taken to be equal to the ADC zero. Note that the baseline need not be a value within the ADC range; for example, if the ADC input range corresponds to 200-300 degrees Kelvin, the baseline is the (extended precision) value that would map to 0 degrees Kelvin. WFDB library versions 5.0 and earlier ignore baseline fields."

Based on this definition and your explanation, will we find all values at 0 and at 1000? Does it make sense?
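Reading the quoted definition, 1000 is the gain (ADC units per mV) and 0 is the baseline (the ADC value that corresponds to 0 mV), so they are calibration constants rather than the sample values themselves. Each stored sample is converted as (sample - baseline)/gain. A minimal sketch, using hypothetical raw sample values:

```matlab
% Calibration constants as listed in the Summary (assumed to match the .hea file)
gain = 1000;                        % ADC units per mV
baseline = 0;                       % ADC value corresponding to 0 mV
adc = int16([-119 0 523 1000]);     % hypothetical raw samples from the .dat file
mV = (double(adc) - baseline) / gain   % -0.119  0  0.523  1
```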

rawaa mejri 2024-5-28

Thanks a lot @Star Strider,

Could you please validate this? I think I've done everything.

heaFile = '00001_lr.hea';
datFile = '00001_lr.dat';
% Open and read the .hea file
fid = fopen(heaFile, 'r');
if fid == -1
    error('Error opening .hea file');
end
headerInfo = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
% Extract information from the .hea file
% Record line: record name, number of signals, sampling frequency, number of samples
headerLines = headerInfo{1};
numSignals = sscanf(headerLines{1}, '%*s %d %*d %*d');
samplingRate = sscanf(headerLines{1}, '%*s %*d %d %*d');
numSamples = sscanf(headerLines{1}, '%*s %*d %*d %d');
% Initialize gain and baseline values
defaultGain = 200; % WFDB default (DEFGAIN) if the gain is zero or missing
gain = zeros(1, numSignals);
baseline = zeros(1, numSignals);
% Signal lines: file name, format, gain(baseline)/units, ADC resolution,
% ADC zero, initial value, checksum, block size, description
for i = 1:numSignals
    lineParts = strsplit(strtrim(headerLines{i+1}));
    gainField = lineParts{3}; % e.g. '1000.0(0)/mV'
    gain(i) = str2double(regexp(gainField, '^[^(/]+', 'match', 'once'));
    baseStr = regexp(gainField, '(?<=\()-?\d+(?=\))', 'match', 'once');
    if isempty(baseStr)
        baseline(i) = str2double(lineParts{5}); % no baseline given: use the ADC zero
    else
        baseline(i) = str2double(baseStr); % baseline is the value in parentheses
    end
    if isnan(gain(i)) || gain(i) == 0
        gain(i) = defaultGain;
    end
end
fid = fopen(datFile, 'r');
if fid == -1
    error('Error opening .dat file');
end
% Format 16: 16-bit little-endian samples, interleaved across signals
data = fread(fid, [numSignals, numSamples], 'int16', 0, 'ieee-le')';
fclose(fid);
% Convert data to physical units: (sample - baseline) / gain
for i = 1:numSignals
    data(:, i) = (data(:, i) - baseline(i)) / gain(i);
end
% Generate time in seconds
time = (0:numSamples-1)' / samplingRate;
outputData = [time, data];
header = {'Time', 'I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6'};
outputTable = array2table(outputData, 'VariableNames', header);
csvFileName = 'ecg_data.csv';
writetable(outputTable, csvFileName);
disp(['ECG data extracted and saved to ', csvFileName]);

Star Strider 2024-5-28

As always, my pleasure!

Note that the fopen function has a second output that contains an error message.
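A small sketch of using that second output, with a hypothetical file name chosen so that the open fails. In a script, error('Error opening .hea file: %s', errmsg) would stop execution with the same message instead of just printing it:

```matlab
% Hypothetical file name that does not exist, so fopen returns -1
[fid, errmsg] = fopen('no_such_record.hea', 'r');
if fid == -1
    fprintf('Could not open file: %s\n', errmsg);   % e.g. reports that the file was not found
else
    fclose(fid);
end
```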

There is nothing wrong with using textscan, however readtable might be more appropriate, and probably easier to work with. You have array2table so I assume you have readtable.

Beyond those considerations, the code appears to be correct. (I do not have your data so I cannot independently verify it.)

If you want to remove any baseline drift or high-frequency noise (or both), and if you have the Signal Processing Toolbox, the highpass or bandpass functions could be appropriate. For best results, use the 'ImpulseResponse','iir' name-value pair with them. There is a minimal amount of high-frequency noise; however, if you want to eliminate it, first take the Fourier transform of your signal to determine its spectral characteristics, and then use that information to choose your filter passband limits. (I have my own function that does that efficiently and that I can post; otherwise I would suggest using the pspectrum function, unless you want to write your own function to implement the fft and return the appropriate results.)
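A minimal sketch of that two-step approach, assuming the Signal Processing Toolbox is available. The signal is a synthetic stand-in for one lead, and the 0.5 to 30 Hz passband is an assumed choice for illustration; in practice the limits would be read off the pspectrum result for the real data:

```matlab
% Requires the Signal Processing Toolbox
Fs = 100;                                   % sampling frequency (Hz), as for the low-rate records
t = (0:10*Fs-1).'/Fs;                       % 10 s time vector
% Synthetic stand-in for one lead: 1.1 Hz component + slow drift + 40 Hz noise
x = sin(2*pi*1.1*t) + 0.5*sin(2*pi*0.1*t) + 0.1*sin(2*pi*40*t);

% 1) Inspect the power spectrum to decide where the passband should be
[p, f] = pspectrum(x, Fs);

% 2) Filter with the chosen limits (assumed here: 0.5 to 30 Hz)
y = bandpass(x, [0.5 30], Fs, 'ImpulseResponse', 'iir');
```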

.

Star Strider 2024-5-28

As always, my pleasure!

Every signal should start at time=0, and all leads should share the same time vector. The isoelectric point (zero reference) in an EKG recording is the zero voltage reference of the P-R interval in every P-T segment (beginning of the P-deflection to the end of the T-deflection), because the heart is considered to be ‘at rest’ at that time. Every other voltage is referenced to that. Ideally, all P-R isoelectric points are the same voltage throughout the EKG recording, and in every lead.
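Referencing voltages to the isoelectric level amounts to subtracting the level measured in the P-R segment. A minimal sketch for one beat, assuming the P-R segment samples have already been located; the values and indices below are hypothetical placeholders, not real EKG data:

```matlab
% Hypothetical single-lead segment (mV) and assumed P-R segment sample indices
x = [0.12 0.12 0.13 0.12 0.35 0.90 0.20 0.12 0.25 0.12];  % placeholder values
prIdx = 1:4;                             % assumed isoelectric (P-R) samples for this beat
isoLevel = median(x(prIdx));             % reference level, robust to a single outlier
xRef = x - isoLevel;                     % voltages relative to the isoelectric line
```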

With respect to noise, the sort of signal processing used depends on the type of noise (broadband or band-limited). Baseline variations can be eliminated with a highpass or bandpass filter. What you use depends on what you want to do.
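For baseline wander specifically, a highpass filter with a low cutoff is the usual choice. A sketch assuming the Signal Processing Toolbox, a synthetic signal, and a 0.5 Hz cutoff picked only for illustration:

```matlab
% Requires the Signal Processing Toolbox
Fs = 100;                                   % sampling frequency (Hz), assumed
t = (0:10*Fs-1).'/Fs;                       % 10 s time vector
drift = 0.8*sin(2*pi*0.15*t);               % slow baseline wander (synthetic)
ekgLike = sin(2*pi*1.1*t);                  % stand-in for the cardiac signal
x = ekgLike + drift;
y = highpass(x, 0.5, Fs, 'ImpulseResponse', 'iir');   % 0.5 Hz cutoff, an assumed choice
```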

Once the noise and baseline drift are accounted for (and eliminated if possible), there are no specific thresholds, at least with respect to signal processing. The significant features after that are the various intervals and the voltages within them. Books have been written on EKG interpretation (absolute and relative voltages and specific intervals), so I will not go into that here. Braunwald's Heart Disease likely has the best discussion of all of that.

.

rawaa mejri 2024-5-28

Thanks a lot

Star Strider 2024-5-28

As always, my pleasure!
