Calculations When Tables Have Both Numeric and Nonnumeric Data
This example shows how to perform calculations on data in tables when they have both numeric and nonnumeric data. After you identify the table variables that contain numeric data, you can access the data in those variables by using either curly braces or dot notation. Then you can perform arithmetic operations or call functions on the numeric data and assign the result back into the table, all in one line of code. You also can use the rowfun
function for calculations across the rows of a table and the varfun
function for calculations along the variables. If your table has groups of data within it, you can use the groupsummary
, rowfun
, and varfun
functions to perform calculations for each group in the table.
Read Sample Data into Table
Read data from a CSV (comma-separated values) file, testScores.csv
, into a table by using the readtable
function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, School
, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert School
to a categorical
variable.
scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)
scores=10×5 table
LastName School Test1 Test2 Test3
__________ __________ _____ _____ _____
"Jeong" XYZ School 90 87 93
"Collins" XYZ School 87 85 83
"Torres" XYZ School 86 85 88
"Phillips" ABC School 75 80 72
"Ling" ABC School 89 86 87
"Ramirez" ABC School 96 92 98
"Lee" XYZ School 78 75 77
"Walker" ABC School 91 94 92
"Garcia" ABC School 86 83 85
"Chang" XYZ School 79 76 82
Create Subtable with Numeric Data
One straightforward way to work with the numeric data is to create a subtable that has only the numeric variables. You can create a subtable by indexing into a table using parentheses and specifying rows and variables. The subtable is a new, smaller table that contains only the specified rows and variables from the old table.
For example, create a subtable from scores
that has only the test scores. Because the first two variables have nonnumeric data, you can index into this table specifying the other variables.
numericScores = scores(:,3:end)
numericScores=10×3 table
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82
Another way to specify variables is to use the vartype
function to specify them by data type. This function is useful when you have a large table with many variables that have different data types. It returns a subscript that you can use to specify table variables.
numericVars = vartype("numeric")
numericVars = table vartype subscript: Select table variables matching the type 'numeric'
numericScores = scores(:,numericVars)
numericScores=10×3 table
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82
Calculate When Subtable Has Only Numeric Data
(Since R2023a) You can perform operations on a table directly, as long as all its variables have data types that support the operations. For more information, see Direct Calculations on Tables and Timetables.
For example, scale the numeric data so the test scores are on a 25-point scale.
numericScores = numericScores .* 0.25
numericScores=10×3 table
Test1 Test2 Test3
_____ _____ _____
22.5 21.75 23.25
21.75 21.25 20.75
21.5 21.25 22
18.75 20 18
22.25 21.5 21.75
24 23 24.5
19.5 18.75 19.25
22.75 23.5 23
21.5 20.75 21.25
19.75 19 20.5
Before R2023a, you cannot use this syntax. Instead, index into the table using curly braces, or use the Variables
affordance to specify all table rows and variables. These syntaxes return the same result as the previous operation and work in all releases.
numericScores{:,:} = numericScores{:,:} .* 0.25 numericScores.Variables = numericScores.Variables .* 0.25
When you use these syntaxes, they extract the table contents and concatenate them into an array, perform the calculation, and assign the results back into the table. The only requirement is that the variables must all have data types that allow them to be concatenated.
With curly braces, you can also specify a subset of rows and variables, as in
numericScores{1:5,["Test1","Test3"]}
.With
Variables
, you always get all rows and all variables concatenated into an array.
(Since R2023a) You can also call many mathematical and statistical functions on a table directly. For example, subtract the minimum value within each table variable from that variable.
numericScores = numericScores - min(numericScores)
numericScores=10×3 table
Test1 Test2 Test3
_____ _____ _____
3.75 3 5.25
3 2.5 2.75
2.75 2.5 4
0 1.25 0
3.5 2.75 3.75
5.25 4.25 6.5
0.75 0 1.25
4 4.75 5
2.75 2 3.25
1 0.25 2.5
Again, before R2023a you cannot use this syntax. Instead, use either of the following syntaxes. They return the same result and work in all releases.
numericScores{:,:} = numericScores{:,:} - min(numericScores{:,:}) numericScores.Variables = numericScores.Variables - min(numericScores.Variables)
Calculate on One Variable in Any Table
In all releases, you can also perform calculations on one variable at a time by using dot notation and variable names. For example, add a correction worth five points to the last set of scores in Test3
.
Because the other table variables are unaffected by operations on an individual variable, you can perform this kind of calculation in any table. It does not matter whether the other variables have numeric or nonnumeric data.
numericScores.Test3 = numericScores.Test3 + 5
numericScores=10×3 table
Test1 Test2 Test3
_____ _____ _____
3.75 3 10.25
3 2.5 7.75
2.75 2.5 9
0 1.25 5
3.5 2.75 8.75
5.25 4.25 11.5
0.75 0 6.25
4 4.75 10
2.75 2 8.25
1 0.25 7.5
Calculate Across Rows in Full Table
The full table, scores
, has numeric and nonnumeric variables. In all releases, use curly-brace indexing or dot notation to perform calculations on specified rows and variables within tables.
For example, find the mean, minimum, and maximum values of the test scores for each student. Calculate these values across each row. Assign them to scores
as new table variables.
One simple, useful way is to extract the data into a matrix, call a function on it, and then assign the output to a new table variable. For example, calculate the mean test scores across each row. Then add them to scores
in a new table variable, TestMean
. Use curly braces to extract the numeric data from Test1
, Test2
, and Test3
into a matrix. To calculate the mean across rows, specify the dimension as 2
when you call mean
.
vars = ["Test1","Test2","Test3"]; scores.TestMean = mean(scores{:,vars},2)
scores=10×6 table
LastName School Test1 Test2 Test3 TestMean
__________ __________ _____ _____ _____ ________
"Jeong" XYZ School 90 87 93 90
"Collins" XYZ School 87 85 83 85
"Torres" XYZ School 86 85 88 86.333
"Phillips" ABC School 75 80 72 75.667
"Ling" ABC School 89 86 87 87.333
"Ramirez" ABC School 96 92 98 95.333
"Lee" XYZ School 78 75 77 76.667
"Walker" ABC School 91 94 92 92.333
"Garcia" ABC School 86 83 85 84.667
"Chang" XYZ School 79 76 82 79
Another way to perform calculations across rows is to use the rowfun
function. You do not need to extract data from the table when using rowfun
. Instead, pass the table and a function to apply to the data as input arguments to rowfun
. While the syntax is a little more complex, rowfun
can be useful when the function that you apply takes multiple input arguments or returns multiple output arguments.
For example, use the bounds
function to find the minimum and maximum test scores. The bounds
function returns two output arguments, so apply it to scores
by using rowfun
. The output of rowfun
is a new table that has TestMin
and TestMax
variables. In this case, also specify "SeparateInputs"
as false
so that values across each row are combined into a vector before being passed to bounds
.
minmaxTest = rowfun(@bounds, ... scores, ... "InputVariables",vars, ... "OutputVariableNames",["TestMin","TestMax"], ... "SeparateInputs",false)
minmaxTest=10×2 table
TestMin TestMax
_______ _______
87 93
83 87
85 88
72 80
86 89
92 98
75 78
91 94
83 86
76 82
Concatenate scores
and minmaxTest
so that these values are in one table.
scores = [scores minmaxTest]
scores=10×8 table
LastName School Test1 Test2 Test3 TestMean TestMin TestMax
__________ __________ _____ _____ _____ ________ _______ _______
"Jeong" XYZ School 90 87 93 90 87 93
"Collins" XYZ School 87 85 83 85 83 87
"Torres" XYZ School 86 85 88 86.333 85 88
"Phillips" ABC School 75 80 72 75.667 72 80
"Ling" ABC School 89 86 87 87.333 86 89
"Ramirez" ABC School 96 92 98 95.333 92 98
"Lee" XYZ School 78 75 77 76.667 75 78
"Walker" ABC School 91 94 92 92.333 91 94
"Garcia" ABC School 86 83 85 84.667 83 86
"Chang" XYZ School 79 76 82 79 76 82
Calculate Along Specified Variables in Full Table
Find the mean score for each test. Calculate these values along the table variables.
The simplest way is to use mean
. First use curly braces to extract the numeric data from Test1
, Test2
, and Test3
into a matrix. Then call mean
to calculate the mean of each column of the matrix. The output is a numeric vector where each element is the mean of a table variable.
vars = ["Test1","Test2","Test3"]; meanOfEachTest = mean(scores{:,vars})
meanOfEachTest = 1×3
85.7000 84.3000 85.7000
Another way to perform calculations along table variables is to use the varfun
function. You do not need to extract data from the table when using varfun
. Instead, pass the table and a function to apply to the data as input arguments to varfun
.
Calculate the mean scores using varfun
. The output is a new table with meaningful names for the table variables.
meanOfEachTest = varfun(@mean, ... scores, ... "InputVariables",vars)
meanOfEachTest=1×3 table
mean_Test1 mean_Test2 mean_Test3
__________ __________ __________
85.7 84.3 85.7
Calculate Using Groups of Data Within Full Table
If your table has one or more grouping variables, then you can perform calculations on groups of data within the table. You can use the values in a grouping variable to specify the groups that the rows belong to.
For example, the School
variable in scores
has two values, ABC School
and XYZ School
. You can think of these two values as categories that denote groups of data in scores
. In this case, you can perform calculations by school.
To apply a function and use grouping variables, you can use the varfun
function. You can specify a function, such as mean
, and then use varfun
to apply it to each table variable that you specify. When you also specify grouping variables, varfun
applies the function to each group within each table variable.
Calculate the mean score for each test by school.
vars = ["Test1","Test2","Test3"]; meanScoresBySchool = varfun(@mean, ... scores, ... "InputVariables",vars, ... "GroupingVariables","School")
meanScoresBySchool=2×5 table
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6
Starting in R2018a, you also can use the groupsummary
function to perform calculations on groups of data in each table variable.
meanScoresBySchool = groupsummary(scores,"School","mean",vars)
meanScoresBySchool=2×5 table
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6
The syntax for groupsummary
is a bit simpler. Also, you can use groupsummary
to specify multiple methods at once. For example, find both the minimum and maximum scores of each test by school.
minmaxBySchool = groupsummary(scores,"School",["min","max"],vars)
minmaxBySchool=2×8 table
School GroupCount min_Test1 max_Test1 min_Test2 max_Test2 min_Test3 max_Test3
__________ __________ _________ _________ _________ _________ _________ _________
ABC School 5 75 96 80 94 72 98
XYZ School 5 78 90 75 87 77 93
To use all the predefined methods of groupsummary
, specify "all"
as the method. Calculate all statistics on the mean test score by school.
allStatsBySchool = groupsummary(scores,"School","all","TestMean")
allStatsBySchool=2×14 table
School GroupCount mean_TestMean sum_TestMean min_TestMean max_TestMean range_TestMean median_TestMean mode_TestMean var_TestMean std_TestMean nummissing_TestMean nnz_TestMean numunique_TestMean
__________ __________ _____________ ____________ ____________ ____________ ______________ _______________ _____________ ____________ ____________ ___________________ ____________ __________________
ABC School 5 87.067 435.33 75.667 95.333 19.667 87.333 75.667 57.967 7.6136 0 5 5
XYZ School 5 83.4 417 76.667 90 13.333 85 76.667 29.856 5.464 0 5 5
Sometimes you might want to find a particular value in one table variable and then find the corresponding value in another table variable. In such cases use rowfun
.
For example, find the student in each school who had the highest mean test score. The attached supporting function, findNameAtMax
, returns both the highest score and the name of the student who had that score. Use rowfun
to apply findNameAtMax
to each group of students. The rowfun
function is suitable because findNameAtMax
has multiple input arguments (last names and test scores) and also returns multiple output arguments.
maxScoresBySchool = rowfun(@findNameAtMax, ... scores, ... "InputVariables",["LastName","TestMean"], ... "GroupingVariables","School", ... "OutputVariableNames",["max_TestMean","LastName"])
maxScoresBySchool=2×4 table
School GroupCount max_TestMean LastName
__________ __________ ____________ _________
ABC School 5 95.333 "Ramirez"
XYZ School 5 90 "Jeong"
Supporting Function
function [maxValue,lastName] = findNameAtMax(names,values) % Return maximum value and the last name % from the row at which the maximum value occurred [maxValue,maxIndex] = max(values); lastName = names(maxIndex); end
See Also
table
| rowfun
| varfun
| groupsummary
| readtable
| vartype