File Ensemble Datastore Using Data in Text Files
In predictive maintenance algorithm design, you frequently have system data in a plain text format such as comma-separated values (CSV). This example shows how to create and use a fileEnsembleDatastore object to manage an ensemble of data stored in such a format.
Ensemble Data
Extract the compressed data for the example.
unzip fleetdata.zip % extract compressed files
The ensemble consists of ten files, fleetdata_01.txt, ..., fleetdata_10.txt, each containing data for one car in a fleet of cars. Each file contains five unlabeled columns of data, corresponding to daily readings of the following values:
Odometer reading at the end of the day, in miles
Fuel consumed that day, in gallons
Maximum rpm for the day
Maximum engine temperature for the day, in degrees Celsius
Engine light status at the end of the day (0 = off, 1 = on)
Each file covers roughly 80 to 120 days of operation. The data sets were generated synthetically for this example and do not correspond to real fleet data.
Configure the Ensemble Datastore
Create a fileEnsembleDatastore object to manage the data.
location = pwd;
extension = '.txt';
fensemble = fileEnsembleDatastore(location,extension);
Configure the ensemble datastore to use the provided function readFleetData.m to read data from the files.
fensemble.ReadFcn = @readFleetData;
Because the columns in the data files are unlabeled, the function readFleetData attaches a predefined label to each column of data. Configure the ensemble data variables to match the labels defined in readFleetData.
fensemble.DataVariables = ["Odometer";"FuelConsump";"MaxRPM";"MaxTemp";"EngineLight"];
The function readFleetData also parses the file name to return an ID of the car from which the data was collected, a number from 1 through 10. This ID is the ensemble independent variable.
fensemble.IndependentVariables = "ID";
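The shipped readFleetData.m is not reproduced in this example. As a rough illustration of the behavior described above (label the five columns, parse the car ID from the file name, return one table row), a read function might look like the sketch below. The name readFleetDataSketch, the use of readmatrix, and the assumption that each row of a file is one consecutive day starting at 0 days are all hypothetical; only the two-input signature (file name, requested variables) follows the ReadFcn convention.

```matlab
function data = readFleetDataSketch(filename, variables)
% Hypothetical reader sketch, NOT the readFleetData.m shipped with the example.
% filename  - path to one member file, for example fleetdata_01.txt
% variables - names of the variables the datastore asks for

variables = string(variables);
labels = ["Odometer","FuelConsump","MaxRPM","MaxTemp","EngineLight"];

raw  = readmatrix(filename);          % numeric matrix, one row per day (assumed)
time = days(0:size(raw,1)-1)';        % assumed daily time base starting at 0 days

data = table();
for k = 1:numel(variables)
    name = variables(k);
    if name == "ID"
        % Parse the car number from a name such as fleetdata_07.txt
        [~, base] = fileparts(char(filename));
        tok = regexp(base, '_(\d+)$', 'tokens', 'once');
        data.ID = str2double(tok{1});
    else
        % Wrap the matching column in a timetable stored in a 1x1 cell
        col = find(labels == name);
        data.(char(name)) = {timetable(time, raw(:,col))};
    end
end
end
```

A reader of this shape returns a single-row table whose columns are exactly the requested variables, which is why the labels it defines must agree with the DataVariables and IndependentVariables properties of the datastore.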
Specify all data variables and the independent variable as selected variables for reading from the ensemble datastore.
fensemble.SelectedVariables = [fensemble.IndependentVariables;fensemble.DataVariables];
fensemble
fensemble = 
  fileEnsembleDatastore with properties:

                 ReadFcn: @readFleetData
        WriteToMemberFcn: []
           DataVariables: [5x1 string]
    IndependentVariables: "ID"
      ConditionVariables: [0x0 string]
       SelectedVariables: [6x1 string]
                ReadSize: 1
              NumMembers: 10
          LastMemberRead: [0x0 string]
                   Files: [10x1 string]
Read Ensemble Data
When you call read on the ensemble datastore, it uses readFleetData to read the selected variables from the first ensemble member.
data1 = read(fensemble)
data1=1×6 table
ID Odometer FuelConsump MaxRPM MaxTemp EngineLight
__ _________________ _________________ _________________ _________________ _________________
1 {120x1 timetable} {120x1 timetable} {120x1 timetable} {120x1 timetable} {120x1 timetable}
Examine and plot the odometer data.
odo1 = data1.Odometer{1}
odo1=120×1 timetable
Time Var1
_______ ______
0 days 180.04
1 day 266.76
2 days 396.01
3 days 535.19
4 days 574.31
5 days 714.82
6 days 714.82
7 days 821.44
8 days 1030.5
9 days 1213.4
10 days 1303.4
11 days 1416.9
12 days 1513.5
13 days 1513.5
14 days 1697.1
15 days 1804.6
⋮
plot(odo1.Time,odo1.Var1)
Compute the average gas mileage for this member of the fleet. This value is the odometer reading on the last day, divided by the total fuel consumed.
fuelConsump1 = data1.FuelConsump{1}.Var1;
totalConsump1 = sum(fuelConsump1);
totalMiles1 = odo1.Var1(end);
mpg1 = totalMiles1/totalConsump1
mpg1 = 22.3086
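To make the arithmetic concrete, here is the same calculation on a tiny made-up data set. The numbers are invented for illustration and are not taken from the fleet files; the third day has zero miles and zero fuel, mirroring the repeated odometer readings visible in the listing above.

```matlab
% Invented four-day record (illustration only, not fleet data)
dailyMiles = [100; 50; 0; 150];     % miles driven each day
odo  = cumsum(dailyMiles);          % end-of-day odometer: 100, 150, 150, 300
fuel = [4; 2; 0; 6];                % gallons consumed each day

% Average mileage = final odometer reading / total fuel consumed
mpg = odo(end)/sum(fuel);           % 300 miles / 12 gallons = 25 mpg
```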
Batch-Process Data from All Ensemble Members
If you call read again, it reads data from the next ensemble member and advances the LastMemberRead property of fensemble to reflect the file name of that ensemble member. You can repeat the processing steps to compute the average gas mileage for that member.
In practice, it is more useful to automate the process of reading and processing the data. To do so, reset the ensemble datastore to a state in which no data has been read. Then loop through the ensemble and perform the read and process steps for each member, returning a table that contains each car's ID and average gas mileage. (If you have Parallel Computing Toolbox™, you can use it to speed up the processing of larger data ensembles.)
reset(fensemble)
mpgData = zeros(10,2); % preallocate array for 10 ensemble members
ct = 1;
while hasdata(fensemble)
    data = read(fensemble);
    odo = data.Odometer{1}.Var1;
    fuelConsump = data.FuelConsump{1}.Var1;
    totalConsump = sum(fuelConsump);   % total fuel for the current member
    mpg = odo(end)/totalConsump;       % average mileage for the current member
    ID = data.ID;
    mpgData(ct,:) = [ID,mpg];
    ct = ct + 1;
end
mpgTable = array2table(mpgData,'VariableNames',{'ID','mpg'})
mpgTable=10×2 table
ID mpg
__ ______
1 22.309
2 19.327
3 20.816
4 27.464
5 18.848
6 22.517
7 27.018
8 27.284
9 17.149
10 26.37
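The parallel processing mentioned earlier is not shown in this example. One possible sketch, assuming Parallel Computing Toolbox is installed and that fensemble is configured as above, splits the ensemble with partition and processes each partition in a parfor loop. The variable names partial, rows, and mpgPar are introduced here for illustration, and the members are not guaranteed to come back in ID order, so the result is sorted at the end.

```matlab
% Sketch only: requires Parallel Computing Toolbox and the fensemble object above
reset(fensemble)
n = numpartitions(fensemble, gcp);      % partition count suited to the current pool
partial = cell(n,1);                    % one block of results per partition

parfor k = 1:n
    subens = partition(fensemble, n, k);    % k-th subset of ensemble members
    rows = zeros(0,2);
    while hasdata(subens)
        data = read(subens);
        odo  = data.Odometer{1}.Var1;
        fuel = data.FuelConsump{1}.Var1;
        rows(end+1,:) = [data.ID, odo(end)/sum(fuel)]; %#ok<AGROW>
    end
    partial{k} = rows;
end

% Combine the per-partition results and restore ID order
mpgPar = sortrows(array2table(vertcat(partial{:}), ...
    'VariableNames', {'ID','mpg'}), 'ID');
```

For an ensemble of only ten small files, the overhead of starting a parallel pool likely outweighs any gain; the pattern is more relevant for larger ensembles.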