istftLayer
Description
An ISTFT layer computes the inverse short-time Fourier transform of the input. Use of this layer requires Deep Learning Toolbox™.
Creation
Description
layer = istftLayer creates an Inverse Short-Time Fourier Transform (ISTFT) layer. The input to the layer must be a real-valued dlarray (Deep Learning Toolbox) object in "CBT" or "SCBT" format.
- For "CBT" inputs, the size of the channel ("C") dimension must be even and divisible by floor(layer.FFTLength/2)+1.
- For "SCBT" inputs, the size of the spatial ("S") dimension must equal floor(layer.FFTLength/2)+1.
The output of istftLayer is a real-valued array in "CBT" format.
For more information, see Layer Input and Output Formats.
layer = istftLayer(Name=Value) creates an ISTFT layer with properties specified by one or more name-value arguments. You can specify the analysis window and the number of overlapped samples, among others.
Properties
ISTFT
Window
— Windowing function
hann(128,"periodic") (default) | vector
This property is read-only.
Windowing function used to compute the ISTFT, specified as a vector with two or more elements. For perfect reconstruction, use the same window as in stftLayer.
For a list of available windows, see Windows.
You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
Note
istftLayer initializes the weights internally from Window and stores them in single precision. Initializing the weights directly is not recommended.
Example: hann(N+1) and (1-cos(2*pi*(0:N)'/N))/2 both specify a Hann window of length N + 1.
Data Types: double | single
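The Hann formula in the example can be checked in base MATLAB without Signal Processing Toolbox. This sketch builds the (N+1)-point window from the formula and keeps its first N points as an N-point periodic window, which is how the "periodic" option of hann is documented to behave. N = 8 is an arbitrary illustrative length.

```matlab
N = 8;                                    % illustrative length parameter
n = (0:N)';
w = (1 - cos(2*pi*n/N))/2;                % formula from the example: N + 1 points

% The symmetric window starts and ends at 0 and peaks at 1 in the middle
edges = [w(1) w(end)]
peak = w(N/2 + 1)

% Keeping the first N points gives an N-point periodic Hann window
wPeriodic = w(1:N);

% With Signal Processing Toolbox, max(abs(hann(N+1) - w)) and
% max(abs(hann(N,"periodic") - wPeriodic)) should both be at rounding level.
```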
OverlapLength
— Number of overlapped samples
75% of window length (default) | nonnegative integer
This property is read-only.
Number of overlapped samples, specified as a nonnegative integer smaller than the window length. If you omit OverlapLength or specify it as empty, the object sets it to the largest integer less than or equal to 75% of the window length, which is 96 samples for the default 128-sample Hann window. Equivalently, the stride between adjoining segments is 32 samples.
You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
Data Types: double | single
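The default overlap and the resulting stride can be reproduced with a line of arithmetic. This sketch assumes the default is computed as floor(0.75*M), which matches the 96-sample value stated above for M = 128; it also shows the stride implied by the Create ISTFT Layer example.

```matlab
M = 128;                                  % length of the default Hann window
L = floor(0.75*M)                         % assumed default OverlapLength: 96
hop = M - L                               % stride between adjoining segments: 32

% In "Create ISTFT Layer", OverlapLength=63 with a 64-sample window
hopExample = 64 - 63                      % stride of 1 sample
```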
FFTLength
— Number of DFT points
128 (default) | positive integer
This property is read-only.
Number of discrete Fourier transform (DFT) points, specified as a positive integer greater than or equal to the window length. To achieve perfect time-domain reconstruction, set the number of DFT points to match that used in stftLayer.
You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
Data Types: double | single
Method
— Method of overlap-add
"wola" (default) | "ola"
This property is read-only.
Method of overlap-add, specified as one of these:
- "wola": Weighted overlap-add
- "ola": Overlap-add
You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
ExpectedOutputSize
— Expected number of channels and samples
"none" (default) | two-element vector
This property is read-only.
Expected number of channels and samples output by istftLayer, specified as a two-element vector of positive integers. The first element is the expected number of channels, and the second element is the expected number of time samples.
By default, istftLayer does not check the output size of the ISTFT. If you specify ExpectedOutputSize, istftLayer errors if the inverse short-time Fourier transform of the given input does not match ExpectedOutputSize in the number of channels and samples.
You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
Data Types: single | double
Layer
WeightLearnRateFactor
— Multiplier for weight learning rate
0 (default) | nonnegative scalar
Multiplier for weight learning rate, specified as a nonnegative scalar. If you do not specify this property, it defaults to zero, so the weights do not update during training. You can also set this property using the setLearnRateFactor (Deep Learning Toolbox) function.
Data Types: double | single
Name
— Layer name
"" (default) | character vector | string scalar
Layer name, specified as a character vector or string scalar. For Layer array input, the trainnet (Deep Learning Toolbox) and dlnetwork (Deep Learning Toolbox) functions automatically assign names to layers with the name "".
The istftLayer object stores this property as a character vector.
Data Types: char | string
NumInputs
— Number of inputs
1 (default)
This property is read-only.
Number of inputs to the layer, returned as 1. This layer accepts a single input only.
Data Types: double
InputNames
— Input names
{'in'} (default)
This property is read-only.
Input names, returned as {'in'}. This layer accepts a single input only.
Data Types: cell
NumOutputs
— Number of outputs
1 (default)
This property is read-only.
Number of outputs from the layer, returned as 1. This layer has a single output only.
Data Types: double
OutputNames
— Output names
{'out'} (default)
This property is read-only.
Output names, returned as {'out'}. This layer has a single output only.
Data Types: cell
Examples
Create ISTFT Layer
Create an inverse short-time Fourier transform layer. Specify a 64-sample Hamming window. Specify 63 overlapped samples between adjoining segments.
layer = istftLayer(Window=hamming(64),OverlapLength=63)
layer = 
  istftLayer with properties:

                     Name: ''
    WeightLearnRateFactor: 0
                   Window: [64x1 double]
            OverlapLength: 63
                FFTLength: 64
                   Method: 'wola'
       ExpectedOutputSize: 'none'

   Learnable Parameters
                  Weights: [64x1 single]

   State Parameters
    No properties.

  Use properties method to see a list of all properties.
Using istftLayer in Deep Learning Network
Create an array of five layers containing a sequence input layer, an STFT layer, an LSTM layer, a fully connected layer, and an ISTFT layer. The sequence input has one feature. Set the minimum signal length in the sequence input layer to 2048 samples. Use the default window of length 128 for both the STFT and ISTFT layers.
layers = [
sequenceInputLayer(1,MinLength=2048)
stftLayer(TransformMode="realimag")
lstmLayer(130)
fullyConnectedLayer(130)
istftLayer];
Create a random array containing a batch of 10 signals, each with 2048 samples. Save the signals as a dlarray in "CBT" format. Analyze the layers as a dlnetwork using the example network input.
networkInput = dlarray(randn(1,10,2048,"single"),"CBT");
analyzeNetwork(layers,networkInput,TargetUsage="dlnetwork")
ISTFT of Zero-Padded Data in Deep Learning Network
Create a deep learning network that demonstrates perfect reconstruction of the short-time Fourier transform (STFT) of a dlarray. To minimize edge effects, the network zero-pads the data before computing the STFT.
Generate a 3-by-2000-by-5 array containing five batches of a three-channel sinusoidal signal sampled at 1 kHz for two seconds. Save the array as a dlarray, labeling the dimensions in the order they appear in the array ("CTB"). dlarray permutes the array dimensions to the "CBT" order expected by a deep learning network. Display the array dimension sizes.
Fs = 1e3;
nchan = 3;
nbtch = 5;
nsamp = 2000;
t = (0:nsamp-1)/Fs;
x = zeros(nchan,nsamp,nbtch);
for k = 1:nbtch
    x(:,:,k) = sin(k*pi.*(1:nchan)'*t) + cos(k*pi.*(1:nchan)'*t);
end
xd = dlarray(x,"CTB");
size(xd)
Design a periodic Hann window of length 100 and set the number of overlap samples to 75. Check the window and overlap length for COLA compliance.
nwin = 100;
win = hann(nwin,"periodic");
noverlap = 75;
tf = iscola(win,noverlap)
tf = logical
1
Create an STFT layer that uses the Hann window. Set the number of overlap samples to 75 and the FFT length to 128. Set the layer transform mode to "realimag" to concatenate the real and imaginary parts of the layer output along the channel dimension. Create an ISTFT layer using the same FFT length, window, and overlap.
fftlen = 128;
ftl = stftLayer(Window=win,FFTLength=fftlen, ...
    OverlapLength=noverlap,TransformMode="realimag");
iftl = istftLayer(Window=win,FFTLength=fftlen, ...
    OverlapLength=noverlap);
Create a deep learning network appropriate for the data that demonstrates perfect reconstruction of the STFT. Use a function layer to zero-pad the data on both sides along the time dimension before computing the STFT. The length of the zero-pad is the window length. Use a function layer after the ISTFT layer to trim both sides of the ISTFT layer output by the same amount.
layers = [
    sequenceInputLayer(nchan,MinLength=nsamp)
    functionLayer(@(X) paddata(X,nsamp+2*nwin,dimension=3,side="both"))
    ftl
    iftl
    functionLayer(@(X) trimdata(X,nsamp,dimension=3,side="both"))];
dlnet = dlnetwork(layers);
Analyze the network using the data. The STFT layer output has twice as many channels as the layer input.
analyzeNetwork(dlnet,xd)
Run the data through the forward method of the network.
dataout = forward(dlnet,xd);
The output is a dlarray in "CBT" format. Convert the network output to a numeric array. Permute the dimensions so that each page is a batch.
xrec = extractdata(dataout);
xrec = permute(xrec,[1 3 2]);
Choose a batch. Plot the original and reconstructed multichannel signal of that batch as a stacked plot.
wb = 4;
tiledlayout(2,1)
nexttile
stackedplot(x(:,:,wb)',DisplayLabels="Channel "+string(1:nchan))
title("Batch "+num2str(wb)+": Original")
nexttile
stackedplot(xrec(:,:,wb)',DisplayLabels="Channel "+string(1:nchan))
title("Batch "+num2str(wb)+": Reconstruction")
Confirm perfect reconstruction of the data.
max(abs(x(:)-xrec(:)))
ans = single
5.6394e-07
More About
Inverse Short-Time Fourier Transform
The inverse short-time Fourier transform is computed by taking the IFFT of each DFT vector of the STFT and overlap-adding the inverted signals.
Recall that the STFT of a signal is computed by sliding an analysis window $g(n)$ of length $M$ over the signal and calculating the discrete Fourier transform (DFT) of each segment of windowed data. The window hops over the original signal at intervals of $R$ samples, equivalent to $L = M - R$ samples of overlap between adjoining segments. The ISTFT is calculated as follows.
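$$x(n) = \int_{-1/2}^{1/2} \sum_{m=-\infty}^{\infty} X_m(f)\, e^{j2\pi f n}\, df = \sum_{m=-\infty}^{\infty} \int_{-1/2}^{1/2} X_m(f)\, e^{j2\pi f n}\, df = \sum_{m=-\infty}^{\infty} x_m(n),$$
where $X_m(f)$ is the DFT of the windowed data centered about time $mR$ and $x_m(n) = x(n)\,g(n - mR)$. The inverse STFT is a perfect reconstruction of the original signal as long as
$$\sum_{m=-\infty}^{\infty} g^{a+1}(n - mR) = c, \quad \forall n \in \mathbb{Z},$$
where $c$ is a nonzero constant and $a$ equals 0 or 1. For more information, see Constant Overlap-Add (COLA) Constraint. This figure depicts the steps in reconstructing the original signal.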
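The IFFT and overlap-add steps can be sketched in base MATLAB as a round trip. This is an illustration under stated assumptions, not the istftLayer implementation: it uses a 16-sample periodic Hann window, 12 overlapped samples, 16 DFT points, and a WOLA variant that applies the same window again at synthesis and divides pointwise by the accumulated $\sum_m g^2(n - mR)$ (the $a = 1$ case above).

```matlab
M = 16;                                   % window length
L = 12;                                   % overlap length
R = M - L;                                % hop size (4)
nfft = 16;                                % DFT points (>= M)
g = (1 - cos(2*pi*(0:M-1)'/M))/2;         % periodic Hann analysis window
nseg = 20;                                % number of segments
Nx = M + (nseg-1)*R;                      % signal length (92) covered exactly by the segments
x = randn(Nx,1);

% Analysis: DFT of each windowed segment, one column per segment
X = zeros(nfft,nseg);
for m = 1:nseg
    idx = (m-1)*R + (1:M);                % samples covered by segment m
    X(:,m) = fft(g.*x(idx), nfft);
end

% Synthesis (WOLA): IFFT, apply synthesis window, overlap-add
y = zeros(Nx,1);
wsum = zeros(Nx,1);                       % running sum of g^2(n - mR)
for m = 1:nseg
    idx = (m-1)*R + (1:M);
    xm = real(ifft(X(:,m)));              % time-domain segment
    y(idx) = y(idx) + g.*xm(1:M);         % synthesis window, then overlap-add
    wsum(idx) = wsum(idx) + g.^2;
end

% Normalize where the window sum is nonzero. The first sample is only seen
% by g(1) = 0, so it cannot be recovered without zero padding.
nz = wsum > 1e-10;
y(nz) = y(nz)./wsum(nz);
err = max(abs(x(nz) - y(nz)))
```

The single unrecoverable first sample is the kind of edge effect that the zero-padded example on this page avoids by padding before the STFT.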
Constant Overlap-Add (COLA) Constraint
To ensure successful reconstruction of nonmodified spectra, the analysis window must satisfy the COLA constraint. In general, if the analysis window satisfies the condition
$$\sum_{m=-\infty}^{\infty} g^{a+1}(n - mR) = c, \quad \forall n \in \mathbb{Z},$$
where $c$ is a nonzero constant and $a$ equals 0 or 1, the window is considered to be COLA-compliant. Additionally, COLA compliance can be described as either weak or strong.
Weak COLA compliance implies that the Fourier transform of the analysis window has zeros at frame-rate harmonics such that
$$G(\omega_k) = 0, \quad k = 1, 2, \ldots, R-1, \quad \omega_k \triangleq \frac{2\pi k}{R}.$$
Weak COLA relies on alias cancellation in the frequency domain, and spectral modifications disturb that cancellation. Therefore, perfect reconstruction is possible using weakly COLA-compliant windows as long as the signal has not undergone any spectral modifications.
For strong COLA compliance, the Fourier transform of the window must be bandlimited consistently with downsampling by the frame rate such that
$$G(\omega) = 0, \quad |\omega| \geq \frac{\pi}{R}.$$
This equation shows that the strong COLA constraint allows no aliasing. Additionally, for strong COLA compliance, the constant $c$ must equal 1. In general, if the short-time spectrum is modified in any way, a strongly COLA-compliant window is preferred.
You can use the iscola function to check for weak COLA compliance. The number of summations used to check COLA compliance is dictated by the window length and hop size. In general, it is common to use $a = 1$ in $g^{a+1}(n - mR)$ for weighted overlap-add (WOLA), and $a = 0$ for overlap-add (OLA). By default, istft uses the WOLA method, applying a synthesis window before performing the overlap-add.
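For the 100-sample periodic Hann window with 75 overlapped samples used in the zero-padded example, the sum $\sum_m g^{a+1}(n - mR)$ can be evaluated by hand. When the window length is a multiple of the hop size, folding the window into blocks of $R$ samples and adding the blocks gives the sum over one hop period. This is a base-MATLAB check of the condition, not a description of how iscola works internally.

```matlab
M = 100; L = 75; R = M - L;               % window length, overlap, hop (R = 25)
g = (1 - cos(2*pi*(0:M-1)'/M))/2;         % periodic Hann window, M samples

% Each column of the reshaped window is one R-sample block; summing across
% columns gives sum_m g^(a+1)(n - mR) for n = 0, ..., R-1
olaSum  = sum(reshape(g,    R, M/R), 2);  % a = 0 (OLA)
wolaSum = sum(reshape(g.^2, R, M/R), 2);  % a = 1 (WOLA)

olaRange  = [min(olaSum)  max(olaSum)]    % constant: c = 2
wolaRange = [min(wolaSum) max(wolaSum)]   % constant: c = 1.5
```

Both sums are constant, so this window and overlap satisfy the condition for either value of $a$, consistent with iscola returning true in the example.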
In general, the synthesis window is the same as the analysis window. You can construct useful WOLA windows by taking the square root of a strong OLA window. You can use this method for all nonnegative OLA windows. The root-Hann window is one such WOLA window.
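To illustrate the root-Hann construction, this sketch takes the square root of a periodic Hann window, which overlap-adds to a constant at 50% overlap, and checks that the squared shifts sum to 1 (WOLA with $c = 1$) while the plain shifts do not sum to a constant. The 64-sample length is an arbitrary choice.

```matlab
M = 64; R = M/2;                          % 50% overlap
h = (1 - cos(2*pi*(0:M-1)'/M))/2;         % periodic Hann window
g = sqrt(h);                              % root-Hann window

wolaSum = sum(reshape(g.^2, R, 2), 2);    % sum_m g^2(n - mR): constant 1
olaSum  = sum(reshape(g,    R, 2), 2);    % sum_m g(n - mR): varies with n

wolaRange = [min(wolaSum) max(wolaSum)]
olaRange  = [min(olaSum)  max(olaSum)]
```

The OLA sum of the root-Hann window ranges from about 1 to about 1.41, so it suits WOLA (analysis and synthesis windowing) rather than plain OLA.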
Perfect Reconstruction
In general, computing the STFT of an input signal and inverting it does not result in perfect reconstruction. If you want the output of the ISTFT to match the original input signal as closely as possible, the signal and the window must satisfy the following conditions:
- Input size: If you invert the output of stft using istft and want the result to be the same length as the input signal x, the value of $k = \frac{N_x - L}{M - L}$ must be an integer. In the equation, $N_x$ is the length of the signal, $M$ is the length of the window, and $L$ is the overlap length.
- COLA compliance: Use COLA-compliant windows, assuming that you have not modified the short-time Fourier transform of the signal.
- Padding: If the length of the input signal is such that the value of $k$ is not an integer, zero-pad the signal before computing the short-time Fourier transform. Remove the extra zeros after inverting the signal.
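The padding rule can be worked through numerically. This sketch assumes a hypothetical 1000-sample signal with the default 128-sample window and 96 overlapped samples, rounds $k$ up to the next integer, and computes how many zeros to append.

```matlab
Nx = 1000;                                % hypothetical signal length
M  = 128;                                 % window length (default Hann window)
L  = 96;                                  % overlap length (default for M = 128)

k  = (Nx - L)/(M - L)                     % 28.25, not an integer
kPad  = ceil(k);                          % next integer number of hops
NxPad = kPad*(M - L) + L                  % padded length: 1024
nPad  = NxPad - Nx                        % zeros to append: 24

% With R2023b or later, the padding and trimming could be done with
% paddata(x,NxPad) and, after the inverse transform, trimdata(y,Nx).
```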
You can use the stftmag2sig function to obtain an estimate of a signal reconstructed from the magnitude of its STFT.
Layer Input and Output Formats
The input to istftLayer must be a real-valued dlarray object in "CBT" or "SCBT" format. The output is a real-valued "CBT" array.
- If the input is in "SCBT" format, the "S" dimension corresponds to frequency.
- If the input is in "CBT" format, istftLayer assumes the frequency and channel dimensions have been flattened into the channel dimension.
Algorithms
The size of the ISTFT depends on the dimensions and data format of the input STFT, the length of the windowing function, the number of overlapped samples, and the number of DFT points.
Define the hop size as hopSize = length(layer.Window)-layer.OverlapLength. The number of samples in the ISTFT is length(layer.Window)+(nseg-1)*hopSize, where nseg is the size of the input in the time ("T") dimension.
- If the input to istftLayer is an "SCBT" formatted dlarray, the number of channels is szC/2, where szC is the size of the input in the channel ("C") dimension.
- If the input to istftLayer is a "CBT" formatted dlarray, the number of channels is szC/(2*nfreq), where nfreq = floor(layer.FFTLength/2)+1.
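Applying these formulas to the zero-padded example gives the expected ISTFT output size. This sketch assumes that the STFT of the 2200-sample padded signal has nseg = (2200 - 75)/25 = 85 segments and that the "SCBT" input has six channels (real and imaginary parts of three signal channels); the 130-channel "CBT" case corresponds to the LSTM example. The variable names are illustrative.

```matlab
M = 100; L = 75; FFTLength = 128;         % window length, overlap, DFT points
hopSize = M - L;                          % 25
nfreq = floor(FFTLength/2) + 1;           % 65

nseg = (2200 - L)/hopSize;                % assumed size of the "T" dimension: 85
numSamples = M + (nseg-1)*hopSize         % 100 + 84*25 = 2200

numChanSCBT = 6/2                         % "SCBT" input with szC = 6: 3 channels
numChanCBT  = 130/(2*nfreq)               % "CBT" input with szC = 130: 1 channel
```

A pair such as [numChanSCBT numSamples] has the channels-then-samples layout that the ExpectedOutputSize property expects.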
Version History
Introduced in R2024a