Read Validity Data from MDF Files
This example shows how to read and use validity data from MDF files for preprocessing.
Introduction to Validity Information
According to the ASAM MDF standard, “invalidation bits” are an optional feature that requires at least one additional byte for each record and serves to mark signal values in a record as “invalid.” This means that each individual sample of each channel has its own invalidation bit.
The mdfRead function by default returns validity data, which is the inverse of invalidation bits. That is, if an invalidation bit is marked as false in the file, indicating a valid sample, then mdfRead returns a true value for the corresponding validity sample.
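The inversion described above can be illustrated with a minimal sketch that needs no MDF file. The invalidationBits vector is a hypothetical stand-in for the bits stored in a file, not output from mdfRead.

```matlab
% Hypothetical invalidation bits for five samples of one channel
% (true means the sample is marked invalid in the file).
invalidationBits = logical([0 0 1 0 1]);

% Validity is the logical inverse: true means the sample is valid.
validity = ~invalidationBits;

disp(validity)
```

Here the third and fifth samples are invalid, so their validity values are false.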
The ValidityRule name-value argument for mdfRead controls how validity data is handled:
"ignore": Does not read validity data.
"include" (default): Reads validity data and stores it in the timetable's Validity custom property.
"replace": Reads validity data and replaces all invalid samples with the MATLAB® missing value. (Note: Integer channels are converted to doubles.)
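The note about integer channels can be motivated with a short sketch. It assumes the conversion is related to the fact that MATLAB integer types cannot represent NaN, which is an inference and is not stated in the source. The rpm values are hypothetical.

```matlab
% Hypothetical integer channel samples.
rpm = int16([0 1 2 3]);

% Assigning NaN into an integer array stores 0, so the "missing"
% marker would be indistinguishable from a real zero reading.
asInt = rpm;
asInt(2) = NaN;

% After conversion to double, NaN is preserved as a missing value.
asDouble = double(rpm);
asDouble(2) = NaN;

disp(asInt)
disp(asDouble)
```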
Read Validity Data Without Name-Value Arguments
To read all data from an MDF file, use the mdfRead function with only the filename as an input argument. By default, this reads and includes validity data, storing it in the custom properties of the returned timetables. Each timetable corresponds to data from a specific channel group and includes a related timetable that represents the validity data for that channel group. The following mdfRead call is equivalent to specifying ValidityRule="include".
dataWithValidity = mdfRead("VehicleDataWithInvalidSamples.mf4")
Warning: mdfRead returned data that contains invalid samples in channel groups: 1, 2. Get validity data for a channel group from the timetable’s property Properties.CustomProperties.Validity.
dataWithValidity=2×1 cell array
{100x2 timetable}
{100x3 timetable}
chanGrp1Data = dataWithValidity{1}
chanGrp1Data=100×2 timetable
time Throttle EngineRPM
______ ________ _________
0 sec 50 0
1 sec 53.333 1
2 sec 56.667 2
3 sec 60 3
4 sec 63.333 4
5 sec 66.667 4
6 sec 70 6
7 sec 73.333 7
8 sec 76.667 8
9 sec 80 9
10 sec 80 9
11 sec 80 9
12 sec 80 9
13 sec 80 9
14 sec 80 9
15 sec 80 9
⋮
The warning issued by the mdfRead call indicates that the returned data in both channel groups 1 and 2 contains invalid samples. To retrieve the validity data for channel group 1, access the Validity custom property of the timetable. Notice that the data timetable and the validity timetable are the same size. Each logical value in the validity timetable indicates the validity of the corresponding sample in the data timetable.
chanGrp1Validity = chanGrp1Data.Properties.CustomProperties.Validity
chanGrp1Validity=100×2 timetable
time Throttle EngineRPM
______ ________ _________
0 sec true true
1 sec true true
2 sec true true
3 sec true true
4 sec true true
5 sec true false
6 sec true true
7 sec true true
8 sec true true
9 sec true true
10 sec false false
11 sec false false
12 sec false false
13 sec false false
14 sec false false
15 sec false false
⋮
Validity data can be used to manually preprocess the data timetable. For example, validity data can be used to remove all rows of a channel group that contain invalid samples.
validSampleRowsMask = all(chanGrp1Validity{:,:} == 1,2);
dataWithInvalidRowsRemoved = chanGrp1Data(validSampleRowsMask, :)
dataWithInvalidRowsRemoved=37×2 timetable
time Throttle EngineRPM
______ ________ _________
0 sec 50 0
1 sec 53.333 1
2 sec 56.667 2
3 sec 60 3
4 sec 63.333 4
6 sec 70 6
7 sec 73.333 7
8 sec 76.667 8
9 sec 80 9
31 sec 63.684 31
32 sec 59.474 32
33 sec 55.263 33
34 sec 51.053 34
35 sec 46.842 35
36 sec 42.632 36
37 sec 38.421 37
⋮
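As a variation on the row-removal step above, the validity mask can also be used per channel. This is a minimal sketch on synthetic data: the data and validity timetables and all their values are hypothetical stand-ins for what mdfRead returns, built here so the example runs without an MDF file.

```matlab
% Synthetic stand-ins for a data timetable and its validity timetable.
t = seconds(0:4)';
data = timetable(t, [50; 53; 56; 60; 63], [0; 1; 2; 3; 4], ...
    'VariableNames', {'Throttle', 'EngineRPM'});
validity = timetable(t, [true; true; false; true; true], ...
    [true; false; true; true; true], ...
    'VariableNames', {'Throttle', 'EngineRPM'});

% Count invalid samples in each channel (one column per channel).
numInvalid = sum(~validity{:,:}, 1);

% Mask only the invalid EngineRPM samples instead of dropping whole rows.
data.EngineRPM(~validity.EngineRPM) = NaN;

% Rows in which every channel is valid, as in the row-removal step.
validRows = all(validity{:,:}, 2);

disp(numInvalid)
disp(data)
```

Masking a single channel keeps the valid Throttle samples in rows where only EngineRPM is invalid, at the cost of leaving NaN values to handle later.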
Ignore Validity Data
To save memory and processing time when validity data is not required for your analysis, use the argument ValidityRule="ignore".
dataValidityIgnored = mdfRead("VehicleDataWithInvalidSamples.mf4", ValidityRule="ignore", GroupNumber=1)
dataValidityIgnored = 1x1 cell array
{100x2 timetable}
Notice that the timetable does not contain a Validity custom property.
dataValidityIgnored{1}.Properties
ans = TimetableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'time' 'Variables'}
VariableNames: {'Throttle' 'EngineRPM'}
VariableTypes: ["double" "double"]
VariableDescriptions: {'Throttle' 'EngineRPM'}
VariableUnits: {'*10^2' '*10^2'}
VariableContinuity: []
RowTimes: [100x1 duration]
StartTime: 0 sec
SampleRate: 1
TimeStep: 1 sec
Events: [0x8 eventtable]
CustomProperties: No custom properties are set.
Use addprop and rmprop to modify CustomProperties.
Read Validity Data with "replace"
dataInvalidReplaced = mdfRead("VehicleDataWithInvalidSamples.mf4", ValidityRule="replace", GroupNumber=1)
dataInvalidReplaced = 1x1 cell array
{100x2 timetable}
All invalid samples have now been replaced with the missing value. For example, the "EngineRPM" channel contains an invalid sample at t = 5 seconds, so its value is now NaN.
dataInvalidReplaced{1}
ans=100×2 timetable
time Throttle EngineRPM
______ ________ _________
0 sec 50 0
1 sec 53.333 1
2 sec 56.667 2
3 sec 60 3
4 sec 63.333 4
5 sec 66.667 NaN
6 sec 70 6
7 sec 73.333 7
8 sec 76.667 8
9 sec 80 9
10 sec NaN NaN
11 sec NaN NaN
12 sec NaN NaN
13 sec NaN NaN
14 sec NaN NaN
15 sec NaN NaN
⋮
Preprocess Data
MATLAB offers a variety of functions for handling missing values in a timetable, such as fillmissing. With the fill method set to 'linear', this function replaces the invalid samples with values linearly interpolated from neighboring, nonmissing values (valid samples).
dataPreprocessed = fillmissing(dataInvalidReplaced{1}, 'linear')
dataPreprocessed=100×2 timetable
time Throttle EngineRPM
______ ________ _________
0 sec 50 0
1 sec 53.333 1
2 sec 56.667 2
3 sec 60 3
4 sec 63.333 4
5 sec 66.667 5
6 sec 70 6
7 sec 73.333 7
8 sec 76.667 8
9 sec 80 9
10 sec 82.149 10
11 sec 84.298 11
12 sec 86.447 12
13 sec 88.596 13
14 sec 90.746 14
15 sec 92.895 15
⋮
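To make the fill behavior concrete, the following sketch applies fillmissing and rmmissing to a small hypothetical vector rather than to the MDF data. The 'previous' method is shown only for comparison and is not used elsewhere in this example.

```matlab
% Hypothetical samples with NaN marking invalid values.
x = [1 2 NaN 4 NaN NaN 7];

% Linear interpolation between neighboring valid samples.
filledLinear = fillmissing(x, 'linear');      % 1 2 3 4 5 6 7

% Hold the last valid sample instead of interpolating.
filledPrevious = fillmissing(x, 'previous');  % 1 2 2 4 4 4 7

% Drop the missing samples entirely.
removed = rmmissing(x);                       % 1 2 4 7

disp(filledLinear)
disp(filledPrevious)
disp(removed)
```

Which approach fits depends on the signal: interpolation suits smoothly varying channels, while removing samples changes the length of the data and its time base.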
For more information on preprocessing data, see Data Preprocessing.
Visual Comparison of Throttle Channel
Plot the "Throttle" channel against the time channel for both the original data, as it exists in the file, and for the data in which invalid samples have been replaced with linearly interpolated values.
subplot(2, 1, 1)
plot(chanGrp1Data.time, chanGrp1Data.Throttle, "r")
title("Throttle Signal as Represented in the File", "FontWeight", "bold")
xlabel("Timestamp")
ylabel("Throttle")
subplot(2, 1, 2)
plot(dataPreprocessed.time, dataPreprocessed.Throttle, "b")
title("Throttle Signal with Invalid Samples Replaced with Linearly Interpolated Values", "FontWeight", "bold")
xlabel("Timestamp")
ylabel("Throttle")
Conclusion
Use ValidityRule options according to your needs:
Use "ignore" when validity data is not relevant to your analysis, to save memory and processing time.
Use "include" for manually handling invalid samples.
Use "replace" for convenience when handling invalid samples, using functions such as fillmissing or rmmissing.