Read Validity Data from MDF Files

This example shows how to read and use validity data from MDF files for preprocessing.

Introduction to Validity Information

According to the ASAM MDF standard, “invalidation bits” are an optional feature that requires at least one additional byte for each record and serves to mark signal values in a record as “invalid.” This means that each individual sample of each channel has its own invalidation bit.

The mdfRead function by default returns validity data, which is the inverse of invalidation bits. That is, if an invalidation bit is marked as false in the file, indicating a valid sample, then mdfRead returns a true value for the corresponding validity sample.
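
The relationship between the two representations is a plain logical negation. The following minimal sketch illustrates it with a made-up vector of invalidation bits; invalidationBits is a hypothetical name used only for illustration, not something that mdfRead exposes.

```matlab
% Hypothetical invalidation bits for five samples of one channel,
% as they might be stored in a file (true = sample is invalid).
invalidationBits = [false false true false true];

% Validity data is the logical inverse (true = sample is valid).
validity = ~invalidationBits;

disp(validity)
```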

The ValidityRule name-value argument for mdfRead controls how validity data is handled:

  • "ignore": Does not read validity data.

  • "include" (default): Reads validity data and stores it in the timetable's Validity custom property.

  • "replace": Reads validity data and replaces all invalid samples with the MATLAB® missing value. (Note: Integer channels are converted to doubles.)
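
The note about integer channels follows from the fact that MATLAB integer types cannot represent NaN, so a missing value can only be stored after conversion to a floating-point type. The sketch below illustrates that idea on made-up data; it is not how mdfRead is implemented, and the variable names are assumptions chosen for the example.

```matlab
% Made-up integer channel with one invalid sample (the last one).
rpm   = int16([0 1 2 3 4 4])';
valid = logical([1 1 1 1 1 0])';

% Integer types have no NaN, so convert to double first.
rpmReplaced = double(rpm);

% Assigning missing into a double array stores NaN.
rpmReplaced(~valid) = missing;

disp(rpmReplaced')
```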

Read Validity Data Without Name-Value Arguments

To read all data from an MDF file, use the mdfRead function with only the filename as an input argument. By default, this reads and includes validity data, storing it in the custom properties of the returned timetables. Each timetable corresponds to data from a specific channel group and includes a related timetable that represents the validity data for that channel group. The following mdfRead call is equivalent to specifying ValidityRule="include".

dataWithValidity = mdfRead("VehicleDataWithInvalidSamples.mf4")
Warning: mdfRead returned data that contains invalid samples in channel groups: 1, 2. Get validity data for a channel group from the timetable’s property Properties.CustomProperties.Validity.
dataWithValidity=2×1 cell array
    {100x2 timetable}
    {100x3 timetable}

chanGrp1Data = dataWithValidity{1}
chanGrp1Data=100×2 timetable
         time     Throttle    EngineRPM
        ______    ________    _________

        0 sec          50         0    
        1 sec      53.333         1    
        2 sec      56.667         2    
        3 sec          60         3    
        4 sec      63.333         4    
        5 sec      66.667         4    
        6 sec          70         6    
        7 sec      73.333         7    
        8 sec      76.667         8    
        9 sec          80         9    
        10 sec         80         9    
        11 sec         80         9    
        12 sec         80         9    
        13 sec         80         9    
        14 sec         80         9    
        15 sec         80         9    
      ⋮

The warning issued by the mdfRead call indicates that channel groups 1 and 2 both contain invalid samples. To retrieve the validity data for channel group 1, access the Validity custom property of the timetable. Notice that the data timetable and the validity timetable are the same size: each logical value in the validity timetable gives the validity of the corresponding sample in the data timetable.

chanGrp1Validity = chanGrp1Data.Properties.CustomProperties.Validity
chanGrp1Validity=100×2 timetable
     time     Throttle    EngineRPM
    ______    ________    _________

    0 sec      true         true   
    1 sec      true         true   
    2 sec      true         true   
    3 sec      true         true   
    4 sec      true         true   
    5 sec      true         false  
    6 sec      true         true   
    7 sec      true         true   
    8 sec      true         true   
    9 sec      true         true   
    10 sec     false        false  
    11 sec     false        false  
    12 sec     false        false  
    13 sec     false        false  
    14 sec     false        false  
    15 sec     false        false  
      ⋮
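
Because the validity timetable holds logical values, a per-channel count of invalid samples is a one-line reduction. The sketch below builds a small synthetic timetable and attaches a Validity custom property with addprop so that it runs without an MDF file. The values are made up and only mimic the layout that mdfRead returns.

```matlab
% Synthetic stand-in for one channel group (values are made up).
time = seconds(0:5)';
data = timetable([50 53 57 60 63 67]', [0 1 2 3 4 4]', ...
    'RowTimes', time, 'VariableNames', {'Throttle', 'EngineRPM'});

% Matching validity timetable: true = valid, false = invalid.
validity = timetable(true(6, 1), logical([1 1 1 1 1 0])', ...
    'RowTimes', time, 'VariableNames', {'Throttle', 'EngineRPM'});

% Attach it as a table-level custom property, mimicking the mdfRead layout.
data = addprop(data, {'Validity'}, {'table'});
data.Properties.CustomProperties.Validity = validity;

% Count invalid samples per channel (one count per column).
v = data.Properties.CustomProperties.Validity;
numInvalid = sum(~v{:,:}, 1);
disp(array2table(numInvalid, 'VariableNames', v.Properties.VariableNames))
```

With real data, the same reduction applies directly to the Validity custom property of each timetable in the cell array.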

Validity data can be used to manually preprocess the data timetable. For example, validity data can be used to remove all rows of a channel group that contain invalid samples.

% Keep only the rows in which every channel has a valid sample.
validSampleRowsMask = all(chanGrp1Validity{:,:}, 2);
dataWithInvalidRowsRemoved = chanGrp1Data(validSampleRowsMask, :)
dataWithInvalidRowsRemoved=37×2 timetable
         time     Throttle    EngineRPM
        ______    ________    _________

        0 sec          50         0    
        1 sec      53.333         1    
        2 sec      56.667         2    
        3 sec          60         3    
        4 sec      63.333         4    
        6 sec          70         6    
        7 sec      73.333         7    
        8 sec      76.667         8    
        9 sec          80         9    
        31 sec     63.684        31    
        32 sec     59.474        32    
        33 sec     55.263        33    
        34 sec     51.053        34    
        35 sec     46.842        35    
        36 sec     42.632        36    
        37 sec     38.421        37    
      ⋮
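
Removing whole rows also discards valid samples that share a row with an invalid sample in another channel. When only one channel matters, you can index with that channel's validity column alone. The following is a minimal sketch on synthetic data; the values are made up, and with real data the mask would come from the Validity custom property.

```matlab
% Synthetic data and validity timetables (values are made up).
time = seconds(0:5)';
data = timetable([50 53 57 60 63 67]', [0 1 2 3 4 4]', ...
    'RowTimes', time, 'VariableNames', {'Throttle', 'EngineRPM'});
validity = timetable(logical([1 1 1 1 0 1])', logical([1 1 1 1 1 0])', ...
    'RowTimes', time, 'VariableNames', {'Throttle', 'EngineRPM'});

% Row-wise removal drops any row with at least one invalid sample.
allValidRows = data(all(validity{:,:}, 2), :);

% Channel-wise selection keeps every valid EngineRPM sample, even
% where Throttle is invalid in the same row.
rpmOnly = data(validity.EngineRPM, 'EngineRPM');

fprintf('Rows kept (all channels valid): %d\n', height(allValidRows));
fprintf('Rows kept (EngineRPM valid):    %d\n', height(rpmOnly));
```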

Ignore Validity Data

To save memory and processing time when validity data is not required for your analysis, use the argument ValidityRule="ignore".

dataValidityIgnored = mdfRead("VehicleDataWithInvalidSamples.mf4", ValidityRule="ignore", GroupNumber=1)
dataValidityIgnored = 1x1 cell array
    {100x2 timetable}

Notice that the timetable does not contain a Validity custom property.

dataValidityIgnored{1}.Properties
ans = 
  TimetableProperties with properties:

             Description: ''
                UserData: []
          DimensionNames: {'time'  'Variables'}
           VariableNames: {'Throttle'  'EngineRPM'}
           VariableTypes: ["double"    "double"]
    VariableDescriptions: {'Throttle'  'EngineRPM'}
           VariableUnits: {'*10^2'  '*10^2'}
      VariableContinuity: []
                RowTimes: [100x1 duration]
               StartTime: 0 sec
              SampleRate: 1
                TimeStep: 1 sec
                  Events: [0x8 eventtable]
        CustomProperties: No custom properties are set.
      Use addprop and rmprop to modify CustomProperties.

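Read Validity Data with "replace"

To have mdfRead replace invalid samples with missing values as it reads the data, use the argument ValidityRule="replace".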

dataInvalidReplaced = mdfRead("VehicleDataWithInvalidSamples.mf4", ValidityRule="replace", GroupNumber=1)
dataInvalidReplaced = 1x1 cell array
    {100x2 timetable}

All invalid samples have now been replaced with the missing value. For example, the "EngineRPM" channel contains an invalid sample at t=5 seconds, so its value there is NaN.

dataInvalidReplaced{1}
ans=100×2 timetable
         time     Throttle    EngineRPM
        ______    ________    _________

        0 sec          50          0   
        1 sec      53.333          1   
        2 sec      56.667          2   
        3 sec          60          3   
        4 sec      63.333          4   
        5 sec      66.667        NaN   
        6 sec          70          6   
        7 sec      73.333          7   
        8 sec      76.667          8   
        9 sec          80          9   
        10 sec        NaN        NaN   
        11 sec        NaN        NaN   
        12 sec        NaN        NaN   
        13 sec        NaN        NaN   
        14 sec        NaN        NaN   
        15 sec        NaN        NaN   
      ⋮

Preprocess Data

MATLAB offers a variety of functions for handling missing values in a timetable, such as fillmissing. With the fill method set to 'linear', fillmissing replaces each missing (invalid) sample with a value linearly interpolated from the neighboring nonmissing (valid) samples.

dataPreprocessed = fillmissing(dataInvalidReplaced{1}, 'linear')
dataPreprocessed=100×2 timetable
         time     Throttle    EngineRPM
        ______    ________    _________

        0 sec          50         0    
        1 sec      53.333         1    
        2 sec      56.667         2    
        3 sec          60         3    
        4 sec      63.333         4    
        5 sec      66.667         5    
        6 sec          70         6    
        7 sec      73.333         7    
        8 sec      76.667         8    
        9 sec          80         9    
        10 sec     82.149        10    
        11 sec     84.298        11    
        12 sec     86.447        12    
        13 sec     88.596        13    
        14 sec     90.746        14    
        15 sec     92.895        15    
      ⋮
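
Interpolating is one option; discarding the affected rows with rmmissing is another. The sketch below compares the two on a small synthetic timetable so that the effect of each is easy to check by hand. The values are made up and stand in for data read with ValidityRule="replace".

```matlab
% Synthetic timetable with NaN marking invalid samples (values are made up).
time = seconds(0:4)';
tt = timetable([50 NaN 70 80 90]', [0 1 NaN 3 4]', ...
    'RowTimes', time, 'VariableNames', {'Throttle', 'EngineRPM'});

% Linear interpolation between neighboring nonmissing samples,
% using the row times as sample points.
filled = fillmissing(tt, 'linear');

% Alternatively, remove every row that contains a missing value.
trimmed = rmmissing(tt);

disp(filled)
disp(trimmed)
```

In this sketch, fillmissing keeps all five rows and fills the gaps, while rmmissing keeps only the three rows that have no missing values. Which one is appropriate depends on whether later analysis needs a uniform time base.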

For more information on preprocessing data, see Data Preprocessing.

Visual Comparison of Throttle Channel

Plot the "Throttle" channel against the time channel for both the original data, as it exists in the file, and for the data that has been linearly interpolated with new values for the invalid samples.

subplot(2, 1, 1)
plot(chanGrp1Data.time, chanGrp1Data.Throttle, "r")
title("Throttle Signal as Represented in the File", "FontWeight", "bold")
xlabel("Timestamp")
ylabel("Throttle")
subplot(2, 1, 2)
plot(dataPreprocessed.time, dataPreprocessed.Throttle, "b")
title("Throttle Signal with Invalid Samples Replaced with Linearly Interpolated Values", "FontWeight", "bold")
xlabel("Timestamp")
ylabel("Throttle")

Figure: two stacked axes plotting Throttle against Timestamp. Top: "Throttle Signal as Represented in the File". Bottom: "Throttle Signal with Invalid Samples Replaced with Linearly Interpolated Values".

Conclusion

Use ValidityRule options according to your needs:

  • Use "ignore" when validity data is not relevant to your analysis, to save memory and processing time.

  • Use "include" for manually handling invalid samples.

  • Use "replace" for convenience when handling invalid samples, using functions such as fillmissing or rmmissing.