groupfilter

Filter by group

collapse all in page

Syntax

G = groupfilter(T,groupvars,method)

G = groupfilter(T,groupvars,groupbins,method)

G = groupfilter(___,datavars)

G = groupfilter(___,"IncludedEdge",LR)

B = groupfilter(A,groupvars,method)

B = groupfilter(A,groupvars,groupbins,method)

B = groupfilter(A,groupvars,groupbins,method,"IncludedEdge",LR)

[B,BG] = groupfilter(A,___)

Description

Table Data

G = groupfilter(T,groupvars,method) returns the rows of table or timetable T that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = groupfilter(T,"Trial",@(x) numel(x) > 5) groups the data in T by Trial, and keeps the rows that belong to groups with more than five trials.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

example

G = groupfilter(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping. For example, G = groupfilter(T,"SaleDate","year",@(x) numel(x) > 5) bins the data in SaleDate by year, and keeps the rows whose bin has more than five elements.

example

G = groupfilter(___,datavars) specifies the table variables to apply the filtering method to for any of the previous syntaxes. For example, G = groupfilter(T,"Trial",@(x) x == max(x),"Height") keeps the rows of T that correspond to the maximum height for each trial.

example

G = groupfilter(___,"IncludedEdge",LR) specifies the included bin edge as "left" or "right" to indicate which end of the bin interval is inclusive. You can use IncludeEdge with any previous syntax that specifies groupbins.

Array Data

B = groupfilter(A,groupvars,method) returns the rows of vector or matrix A that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to all column vectors. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

B = groupfilter(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

example

B = groupfilter(A,groupvars,groupbins,method,"IncludedEdge",LR) specifies whether to include the left or right edge in each bin when groupbins is specified.

[B,BG] = groupfilter(A,___) also returns the unique grouping vector combinations corresponding to the rows in B.

example

Examples

collapse all

Filter Two-Variable Table

Open Live Script

Create a table containing two variables.

groupID = [1 1 1 2 2 3]';
sample = [3 1 2 9 8 5]';
T = table(groupID,sample)

T=6×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       2         9   
       2         8   
       3         5

Group by ID number, and return rows corresponding to groups with more than two samples.

Gnumel = groupfilter(T,"groupID",@(x) numel(x) > 2)

Gnumel=3×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2

Return rows whose group samples are between 0 and 6.

Gvals = groupfilter(T,"groupID",@(x) min(x) > 0 && max(x) < 6)

Gvals=4×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       3         5

Group by Largest Values

Open Live Script

Create a table containing two variables that represent a day number and temperature.

daynum = [1 1 1 1 2 2 2 2]';
temp = [67 65 71 55 61 79 58 78]';
T = table(daynum,temp)

T=8×2 table
    daynum    temp
    ______    ____

      1        67 
      1        65 
      1        71 
      1        55 
      2        61 
      2        79 
      2        58 
      2        78

Group by day number, and return the largest two temperatures for each day.

G = groupfilter(T,"daynum",@(x) ismember(x,maxk(x,2)))

G=4×2 table
    daynum    temp
    ______    ____

      1        67 
      1        71 
      2        79 
      2        78

Group by Month

Open Live Script

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10; ...
                       2017 4 14; 2017 4 30; 2017 5 25; ...
                       2017 5 29; 2017 5 21]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]';
T = table(timeStamps,profit)

T=9×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Apr-2017     2587 
    14-Apr-2017     1998 
    30-Apr-2017     2899 
    25-May-2017     3112 
    29-May-2017      909 
    21-May-2017     2619

Group the dates by month, and return rows that correspond to the maximum profit for that month.

Gmax = groupfilter(T,"timeStamps","month",@(x) x == max(x))

Gmax=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    02-Mar-2017     3071         Mar-2017    
    30-Apr-2017     2899         Apr-2017    
    25-May-2017     3112         May-2017

Return rows whose month had an average profit greater than $2300.

Gavg = groupfilter(T,"timeStamps","month",@(x) mean(x) > 2300)

Gavg=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    10-Apr-2017     2587         Apr-2017    
    14-Apr-2017     1998         Apr-2017    
    30-Apr-2017     2899         Apr-2017

Filter Three-Variable Table

Open Live Script

Create a table T that contains information about nine individuals.

groupID = [1 2 3 1 2 3 1 2 3]';
Height = [62 61 59 66 70 72 57 67 71]';
HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Poor"; "Fair"; "Excellent"; "Poor"; "Excellent"; "Fair"]);
T = table(groupID,Height,HealthStatus)

T=9×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62       Poor       
       2         61       Good       
       3         59       Fair       
       1         66       Poor       
       2         70       Fair       
       3         72       Excellent  
       1         57       Poor       
       2         67       Excellent  
       3         71       Fair

Group by ID number, and return rows for groups that contain only members with a minimum height of 60.

G1 = groupfilter(T,"groupID",@(x) min(x) >= 60,"Height")

G1=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       2         61       Good       
       2         70       Fair       
       2         67       Excellent

Group by ID number, and return rows for groups that contain only members whose health status is Poor.

G2 = groupfilter(T,"groupID",@(x) all(x == "Poor"),"HealthStatus")

G2=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62          Poor    
       1         66          Poor    
       1         57          Poor

Filter with Vector Data

Open Live Script

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.

[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps, ...
    "dayname",@(x) x == max(x))

maxDailyProfit = 5×1

        3071
        1185
        2899
        3112
        2619

dayOfWeek = 5x1 categorical
     Thursday 
     Wednesday 
     Friday 
     Saturday 
     Tuesday

Input Arguments

collapse all

`T` — Input table
table | timetable

Input table, specified as a table or timetable.

`A` — Input array
column vector | matrix

Input array, specified as a column vector or a group of column vectors stored as a matrix.

`groupvars` — Grouping variables or vectors
scalar | vector | matrix | cell array | pattern | function handle | table `vartype` subscript

Grouping variables or vectors, specified as one of these options:

For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or a cell array.

For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

Indexing Scheme	Values to Specify	Examples
Variable names	A string scalar or character vector A string array or cell array of character vectors A `pattern` object	`"A"` or `'A'` — A variable named `A` `["A" "B"]` or `{'A','B'}` — Two variables named `A` and `B` `"Var"+digitsPattern(1)` — Variables named `"Var"` followed by a single digit
Variable index	An index number that refers to the location of a variable in the table A vector of numbers A `logical` vector. Typically, this vector is the same length as the number of variables, but you can omit trailing `0` (`false`) values.	`3` — The third variable from the table `[2 3]` — The second and third variables from the table `[false false true]` — The third variable
Function handle	A function handle that takes a table variable as input and returns a `logical` scalar	`@isnumeric` — All the variables containing numeric values
Variable type	A `vartype` subscript that selects variables of a specified type	`vartype("numeric")` — All the variables containing numeric values

Example: groupfilter(T,"Var3",method)

`method` — Filtering method
function handle

Filtering method, specified as a function handle.

method defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select.

If the function returns a logical scalar, then either all members of the group are filtered out (when the value is false) or none are filtered out (when the value is true).
If the function returns a logical vector, then members of groups are filtered out when the corresponding element is false, and members are kept when the corresponding element is true.

To define the function handle, use a syntax of the form @(inputargs) mymethod, where mymethod depends on inputargs.

A function can filter for rows corresponding to groups that meet a condition. For example, @(x) mean(x) > 10 passes to the output only rows corresponding to groups with a group mean greater than 10.
A function can filter for rows that meet a condition within their corresponding group. For example, @(x) x == max(x) passes to the output only rows corresponding to the maximum value of rows within their group.

For more information, see Create Function Handle and Anonymous Functions.

When groupfilter applies the method to more than one nongrouping variable at a time, the method returns a logical scalar or vector for each variable. For each row, the corresponding values in all returned scalars or vectors must be true to pass the row to the output.

`groupbins` — Binning scheme for grouping variables or vectors
`"none"` (default) | vector of bin edges | number of bins | length of time (bin width) | name of time unit (bin width) | cell array of binning methods

Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.

"none" — No binning.
Vector of bin edges — The bin edges define the bins. You can specify the edges as numeric values or as datetime values for datetime grouping variables or vectors.
Number of bins — The number determines how many equally spaced bins to create. You can specify the number of bins as a positive integer scalar.
Length of time (bin width) — The length of time determines the width of each bin. You can specify the bin width as a duration or calendarDuration scalar for datetime or duration grouping variables or vectors.

Name of time unit (bin width) — The name of the time unit determines the width of each bin. You can specify the bin width as one of the options in this table for datetime or duration grouping variables or vectors.

Value	Description	Data Type
`"second"`	Each bin is 1 second.	`datetime` and `duration`
`"minute"`	Each bin is 1 minute.	`datetime` and `duration`
`"hour"`	Each bin is 1 hour.	`datetime` and `duration`
`"day"`	Each bin is 1 calendar day. This value accounts for daylight saving time shifts.	`datetime` and `duration`
`"week"`	Each bin is 1 calendar week.	`datetime` only
`"month"`	Each bin is 1 calendar month.	`datetime` only
`"quarter"`	Each bin is 1 calendar quarter.	`datetime` only
`"year"`	Each bin is 1 calendar year. This value accounts for leap days.	`datetime` and `duration`
`"decade"`	Each bin is 1 decade (10 calendar years).	`datetime` only
`"century"`	Each bin is 1 century (100 calendar years).	`datetime` only
`"secondofminute"`	Bins are seconds from 0 to 59.	`datetime` only
`"minuteofhour"`	Bins are minutes from 0 to 59.	`datetime` only
`"hourofday"`	Bins are hours from 0 to 23.	`datetime` only
`"dayofweek"`	Bins are days from 1 to 7. The first day of the week is Sunday.	`datetime` only
`"dayname"`	Bins are full day names, such as `"Sunday"`.	`datetime` only
`"dayofmonth"`	Bins are days from 1 to 31.	`datetime` only
`"dayofyear"`	Bins are days from 1 to 366.	`datetime` only
`"weekofmonth"`	Bins are weeks from 1 to 6.	`datetime` only
`"weekofyear"`	Bins are weeks from 1 to 54.	`datetime` only
`"monthname"`	Bins are full month names, such as `"January"`.	`datetime` only
`"monthofyear"`	Bins are months from 1 to 12.	`datetime` only
`"quarterofyear"`	Bins are quarters from 1 to 4.	`datetime` only

`datavars` — Table variables to operate on
scalar | vector | cell array | pattern | function handle | table `vartype` subscript

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the filtering methods to. Other variables not specified by datavars pass through to the output without being operated on. groupfilter applies the filtering methods to the specified variables and uses the results to remove rows from all variables. When datavars is not specified, groupfilter operates on each nongrouping variable.

Indexing Scheme	Values to Specify	Examples
Variable names	A string scalar or character vector A string array or cell array of character vectors A `pattern` object	`"A"` or `'A'` — A variable named `A` `["A" "B"]` or `{'A','B'}` — Two variables named `A` and `B` `"Var"+digitsPattern(1)` — Variables named `"Var"` followed by a single digit
Variable index	An index number that refers to the location of a variable in the table A vector of numbers A `logical` vector. Typically, this vector is the same length as the number of variables, but you can omit trailing `0` (`false`) values.	`3` — The third variable from the table `[2 3]` — The second and third variables from the table `[false false true]` — The third variable
Function handle	A function handle that takes a table variable as input and returns a `logical` scalar	`@isnumeric` — All the variables containing numeric values
Variable type	A `vartype` subscript that selects variables of a specified type	`vartype("numeric")` — All the variables containing numeric values

Example: groupfilter(T,groupvars,method,["Var1" "Var2" "Var4"])

`LR` — Included bin edge for binning scheme
`"left"` (default) | `"right"`

Included bin edge for binning scheme, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

You can specify LR only if you also specify groupbins, and the value applies to all binning methods for all grouping variables or vectors.

Output Arguments

collapse all

`G` — Output table
table | timetable

Output table for table or timetable input data, returned as a table or timetable. G contains the rows in T that satisfy the group-wise filtering method.

`B` — Output array
vector | matrix

Output array for array input data, returned as a vector or matrix. B contains the rows in A that satisfy the group-wise filtering method.

`BG` — Grouping vectors
column vector | cell array of column vectors

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Alternative Functionality

Live Editor Task

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

Compute by Group task in the Live Editor

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The groupfilter function supports tall arrays with the following usage notes and limitations:

If A and groupvars are both tall matrices, then they must have the same number of rows.
If the first input is a tall matrix, then groupvars can be a cell array containing tall grouping vectors.
The groupvars and datavars arguments do not support function handles.
The method argument must be a valid input for splitapply operating on a tall array.
When grouping by discretized datetime arrays, the categorical group names are different compared to in-memory groupfilter calculations.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Sparse inputs are not supported.
Binning scheme is not supported for datetime or duration data.
Input tables that contain multidimensional arrays are not supported.
Computation methods must be constant.
Grouping variables must be constant when the first input argument is a table.
Data variables must be constant.
Binning scheme specified as character vectors or strings must be constant.
Name-value arguments must be constant.
Computation methods cannot return sparse or multidimensional results.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

Version History

Introduced in R2019b

expand all

R2024a: Apply multiple binning methods to grouping variable

Apply multiple binning methods to one grouping variable or vector by specifying a cell array of binning methods.

R2022b: Code generation support

Generate C or C++ code for the groupfilter function. For usage notes and limitations, see C/C++ Code Generation.

R2022a: Improved performance with small group size

The groupfilter function shows improved performance, especially when the data count in each group is small.

For example, this code filters by group a matrix with 500 groups with a count of 10 each. The code is about 2.32x faster than in the previous release.

function timingGroupfilter
data = (1:5000)';
groups = repelem(1:length(data)/10,10)';
p = randperm(length(data));
data = data(p);
groups = groups(p);

tic
for k = 1:600
   G = groupfilter(data,groups,@(x) x == max(x));
end
toc
end

The approximate execution times are:

R2021b: 2.32 s

R2022a: 1.00 s

The code was timed on a Windows^® 10, Intel^® Xeon^® CPU E5-1650 v4 @ 3.60 GHz test system by calling the timingGroupfilter function.

groupfilter

Syntax

Description

Table Data

Array Data

Examples

Filter Two-Variable Table

Group by Largest Values

Group by Month

Filter Three-Variable Table

Filter with Vector Data

Input Arguments

`T` — Input table
table | timetable

`A` — Input array
column vector | matrix

`groupvars` — Grouping variables or vectors
scalar | vector | matrix | cell array | pattern | function handle | table `vartype` subscript

`method` — Filtering method
function handle

`groupbins` — Binning scheme for grouping variables or vectors
`"none"` (default) | vector of bin edges | number of bins | length of time (bin width) | name of time unit (bin width) | cell array of binning methods

`datavars` — Table variables to operate on
scalar | vector | cell array | pattern | function handle | table `vartype` subscript

`LR` — Included bin edge for binning scheme
`"left"` (default) | `"right"`

Output Arguments

`G` — Output table
table | timetable

`B` — Output array
vector | matrix

`BG` — Grouping vectors
column vector | cell array of column vectors

Alternative Functionality

Live Editor Task

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

Version History

R2024a: Apply multiple binning methods to grouping variable

R2022b: Code generation support

R2022a: Improved performance with small group size

See Also

Functions

Live Editor Tasks

Topics

groupfilter

Syntax

Description

Table Data

Array Data

Examples

Filter Two-Variable Table

Group by Largest Values

Group by Month

Filter Three-Variable Table

Filter with Vector Data

Input Arguments

T — Input table table | timetable

A — Input array column vector | matrix

groupvars — Grouping variables or vectors scalar | vector | matrix | cell array | pattern | function handle | table vartype subscript

method — Filtering method function handle

groupbins — Binning scheme for grouping variables or vectors "none" (default) | vector of bin edges | number of bins | length of time (bin width) | name of time unit (bin width) | cell array of binning methods

datavars — Table variables to operate on scalar | vector | cell array | pattern | function handle | table vartype subscript

LR — Included bin edge for binning scheme "left" (default) | "right"

Output Arguments

G — Output table table | timetable

B — Output array vector | matrix

BG — Grouping vectors column vector | cell array of column vectors

Alternative Functionality

Live Editor Task

Extended Capabilities

Tall Arrays Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

Thread-Based Environment Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

R2024a: Apply multiple binning methods to grouping variable

R2022b: Code generation support

R2022a: Improved performance with small group size

See Also

Functions

Live Editor Tasks

Topics

`T` — Input table
table | timetable

`A` — Input array
column vector | matrix

`groupvars` — Grouping variables or vectors
scalar | vector | matrix | cell array | pattern | function handle | table `vartype` subscript

`method` — Filtering method
function handle

`groupbins` — Binning scheme for grouping variables or vectors
`"none"` (default) | vector of bin edges | number of bins | length of time (bin width) | name of time unit (bin width) | cell array of binning methods

`datavars` — Table variables to operate on
scalar | vector | cell array | pattern | function handle | table `vartype` subscript

`LR` — Included bin edge for binning scheme
`"left"` (default) | `"right"`

`G` — Output table
table | timetable

`B` — Output array
vector | matrix

`BG` — Grouping vectors
column vector | cell array of column vectors

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.