Remove specific data sequence when reading .bin file

Question

0 个投票

How do I remove/skip the specific data sequence [1024 0 2240 -24500] when reading the .bin file in batch? thanks!

filename='rf.bin';                                                          % filename='./rf.bin';
fid=fopen(filename,'r');
dataHeader=fread(fid,123,'uint8');                                          % skipping the header for .bin
NsperBatch = 1e3;                                                           % number of sample per batch 
K=100;                                                                      % Average every K set of values, K=100 in this case
magSpectrumMat=[];                                                          
while ~feof(fid) 
    magSpectrum=0;
    for k=1:K                                                             
        data=fread(fid,NsperBatch*2,'int16','b');
        dataIQ=data(1:2:end)+1i*data(2:2:end);
        dataSpectrum=fftshift(fft(dataIQ));
        magSpectrum=magSpectrum+abs(dataSpectrum).^2;
    end
    magSpectrum = magSpectrum/K;
    magSpectrumMat = [magSpectrumMat magSpectrum];
end
magSpectrumMat_dB=pow2db(magSpectrumMat);

3 个评论
显示 1更早的评论隐藏 1更早的评论

Ivy Chen 2017-10-12

编辑：Ivy Chen 2017-10-12

Yes, it can occur anywhere. What you described above is the preferred way to just drop the [1024 0 2240 -24500] data sequence and read four more values in the block. This way, it will make the calculation and matrix clean. Thanks for the help.

Walter Roberson 2017-10-12

Is there any possibility that it could occur inside the 123 byte header? Is there any possibility it could start inside the 123 byte header but end outside the header?

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Walter Roberson 2017-10-12

在 MATLAB Online 中打开

3 个投票

Okay, here it is, with skips accounted for, and with automatic padding in case the data is the wrong size.

As you indicated the words to skip could occur "anywhere" after I asked about that, I assumed that the words to skip might even occur during that 123 byte header.

I assumed that if there were not a full K batch that you wanted to take the mean of what was available in the last partial batch rather than dividing by K specifically.

I vectorized a lot of the computation.

I was not completely sure of the order of data you wanted to output. I think your existing code is putting the results for averaging into column vectors in a matrix; that is the output format I create here.

The below is not tested as I do not happen to have your data file.

NsperBatch = 1e3;
header_size = 123;
K=100;              % Average every K set of values, K=100 in this case
pattern_to_skip = int16([1024 0 2240 -24500]);   %magic sequence of words to ignore
filename = 'rf.bin';         % filename='./rf.bin';
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8');  %big endian
PL = length(pattern_to_skip);
fid = fopen(filename,'r');
bytes = reshape( fread(fid, inf, '*uint8'), 1, []);   %row vector
fclose(fid);
orig_num_bytes = length(bytes);
skiplocs = strfind(bytes, pattern_to_skip);
for idx = fliplr(skiplocs)
   bytes(idx:idx+PL-1) = [];    %delete bytes
end
postskip_num_bytes = length(bytes);
fprintf('%d groups were skipped\n', (orig_num_bytes - postskip_num_bytes) / PL );
dataHeader = bytes(1:header_size);
bytes = bytes(header_size+1:end);
data_length = length(bytes);
if mod(data_length, 2) ~= 0
  fprintf('warning: data is odd number of bytes long, padding\n');
  bytes(end+1) = 0;
end
if mod(data_length, 4) ~= 0
  fprintf('warning: data is odd number of words long, padding\n');
  bytes(end+1:end+2) = 0;
end
words = typecast(bytes, 'int16');
all_dataIQ = double( complex( words(1:2:end), words(2:2:end) ) );
num_dataIQ = length(all_dataIQ);
target_num_dataIQ = NsperBatch * ceil( num_dataIQ / NsperBatch);
if num_dataIQ ~= target_num_dataIQ
  fprintf('warning: complex data is not a multiple of %d samples long, padding\n', numNsperBatch);
  all_dataIQ(target_num_dataIQ) = 0;    %zero fill automatically
end
magSpectra = abs(fftshift( fft( reshape(all_dataIQ, NsperBatch, []) ) )).^2;  %do it all at once!
num_spectra = size(magSpectra, 2);
num_full_batches = floor(num_spectra / K);
num_leftover = num_spectra - K * num_full_batches;
num_batches = num_full_batches + (num_leftover ~= 0);
magSpectrumMat = zeros(NsperBatch, num_batches);
for batch_idx = 1 : num_full_batches
  bstart = (batch_idx - 1) * K + 1;
  bend = bstart + K - 1; 
  magSpectrum = mean( magSpectra(:, bstart : bend ), 2 );
  magSpectrumMat(:, batch_idx) = magSpectrum;
end
if num_leftover ~= 0
  magSpectrum = mean( magSpectra(:, end-num_leftover+1 : end), 2 );
  magSpectrumMat(:, end) = magSpectrum;
end

8 个评论
显示 6更早的评论隐藏 6更早的评论

Walter Roberson 2017-10-13

MATLAB has only been supported on x86 and x64 architecture since R2010a. x86 and x64 architectures are little endian. On little-endian systems, the a sequence of bytes, int16([1024 0 2240 -24500]) is not hex 0400 0000 08C0 A04C, it is 0004 0000 C008 4CA0, so to search for int16([1024 0 2240 -24500]) in a sequence of bytes, you need to byte-swap it.

Normalizing by max(abs) guarantees that the result peaks at either +1 or -1. In some contexts, that is a good property, if you are trying to compare relative proportions from magSpectra to magSpectra. It is, however, not good if you need to be able to compare absolute proportions. For example, if the signal were to suddenly cut in half for everything, normalizing by max(abs) would not notice.

You also need to be aware of the fact that your values being normalized are fft results. The first entry in any fft result is the total power of the signal, sum() of the time series values, and if you are not working with a signal that has an expected mean of 0 and you are not subtracting off the mean before doing the fft, then it is much more common than not for that entry to dominate the output -- and so max(abs()) is going to be picking out that entry most of the time. But the rest of your entries are effectively relative frequency components and after you normalize by the total power they can be pretty small... If you want to show relative proportions of signal, then you should either subtract off the mean before doing the fft or else set the corresponding array element to 0 before doing the max(abs()). It is usually easier to set it before doing the fftshift().

Normalizing by norm() will give signal levels typically much below normalizing by max(abs)

There are many cases where it is best to normalize to mean 0 std 1, by subtracting off the mean and dividing the result by the standard deviation.

There are also many cases where it is best to calculate the normalization factors over all of the data rather than spectrum by spectrum.

You need to think more about what you would like to see.

For example suppose that you had two spectra, nearly the same except that one had a narrow band frequency injected, so giving another peak compared to the first. If you normalize by norm() then because the total of the squares of the samples increased, the norm would increase, so dividing by the norm would reduce the relative heights of the second signal. If you normalize by max(abs) then the relative heights would not change unless the injected signal happened to be the highest peak (but then it would...).

On the other hand, if you had two signals that were the same except that one had twice the magnitude around the mean that the other did, and the mean was 0, then dividing by max(abs) would show exactly the same results. But if the mean was not 0, then the total power is probably going to dominate the fft() and you end up normalizing by that, effectively scaling the graphs except for a spike up to 1 where the power is being plotted...

Walter Roberson 2017-10-16

在 MATLAB Online 中打开

Do not modify the bytes = line. Your header is defined by an odd number of bytes, and if you swap at the time you read them in, you would move byte 123 to the position of byte 124 and would then be ignoring the wrong byte. So you have to scan as bytes and delete the garbage as bytes (unless you are sure the garbage never occurs in the headers), and once have scrubbed the garbage you need to trim off the first 123 bytes of what is left.

Once you have trimmed off the header, there is a possibility that you need to byte swap: it depends on how the data was stored.

When you described the data values to remove, I assumed you had read through the data stream and had found those particular numeric values after reading as int16, with the implication that the bytes were in the other order (because the native order on whatever host you are using is little-endian.) But it is possible that you were told the sequence of bytes by someone else who assumed you were using big endian, in which case the byteswap would not be needed... Do you have a sample file known to have the sequence of bytes in it that you could process with byteswap or not on the match for deletion, to check to see which is happening in practice?

If the data is written as big-endian then you would need to byteswap the int16, which you would do by changing

words = typecast(bytes, 'int16');

to

words = byteswap( typecast(bytes, 'int16') );

Ivy Chen 2017-10-16

It is the last case that the IQ data is big endian. I will add the byteswap to complete the process. thanks again!

请先登录，再进行评论。

Remove specific data sequence when reading .bin file

3 个评论
显示 1更早的评论隐藏 1更早的评论

采纳的回答

8 个评论
显示 6更早的评论隐藏 6更早的评论

更多回答（0 个）

类别

标签

Community Treasure Hunt

Remove specific data sequence when reading .bin file

3 个评论 显示 1更早的评论 隐藏 1更早的评论

采纳的回答

8 个评论 显示 6更早的评论 隐藏 6更早的评论

更多回答（0 个）

类别

标签

另请参阅

Community Treasure Hunt

3 个评论
显示 1更早的评论隐藏 1更早的评论

8 个评论
显示 6更早的评论隐藏 6更早的评论