# Improve Performance Using a GPU and Vectorized Calculations

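MATLAB® is optimized for operations involving matrices and vectors. The process of revising loop-based, scalar-oriented code to use MATLAB matrix and vector operations is called vectorization. Vectorized code often runs faster than the corresponding loop-based code, and it is usually shorter and easier to understand. For an introduction to vectorization, see Using Vectorization.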
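As a minimal illustration of what vectorization means, the sketch below computes the same result twice: once with a scalar loop and once with a single vector operation. The variable names (`t`, `yLoop`, `yVec`) and the choice of `sin` are assumptions made for this illustration only and are not part of the example that follows.

```matlab
% Illustrative sketch: loop-based versus vectorized evaluation of sin.
t = 0:0.01:10;

% Loop-based, scalar-oriented version: one element per iteration
yLoop = zeros(size(t));
for k = 1:numel(t)
    yLoop(k) = sin(t(k));
end

% Vectorized version: one call operates on the whole vector
yVec = sin(t);

% Largest difference between the two results
maxDiff = max(abs(yLoop - yVec));
```

Both versions produce the same values to within floating-point rounding; the vectorized form simply hands the whole array to one optimized call.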

### Loop-Based Function Execution Time on the GPU and CPU

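The loop-based supporting function `fastConvolution`, listed at the end of this example, filters a data matrix by performing these steps:

1. Transform each column of data from the time domain to the frequency domain.

2. Multiply the frequency-domain data by the transform of the filter vector.

3. Transform the filtered data back to the time domain and store the result in a matrix.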
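The three steps above amount to a circular convolution of each column with the zero-padded filter. The following sketch checks this for a single short column by comparing the frequency-domain route against a direct time-domain sum. The values of `x` and `h` and all variable names are assumptions chosen for illustration.

```matlab
% Illustrative sketch: FFT-based filtering equals circular convolution.
x = [1; 2; 3; 4];      % one column of time-domain data (illustrative values)
h = [1; -1];           % short filter (illustrative values)
N = numel(x);

% Frequency-domain route: transform, multiply, inverse transform
yFreq = ifft(fft(h,N).*fft(x));

% Time-domain reference: direct circular convolution with the padded filter
hPad = [h; zeros(N-numel(h),1)];
yTime = zeros(N,1);
for n = 1:N
    for k = 1:N
        yTime(n) = yTime(n) + hPad(k)*x(mod(n-k,N)+1);
    end
end

% Largest difference between the two routes
maxDiff = max(abs(yFreq - yTime));
```

For these inputs the time-domain result is `[-3; 1; 1; 1]`, and the frequency-domain result matches it to within rounding error.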

Create random complex input data and a random filter vector.

```
data = complex(randn(4096,100),randn(4096,100));
filter = randn(16,1);
```

Time the execution of `fastConvolution` on the CPU by using `timeit`.

```
CPUtime = timeit(@()fastConvolution(data,filter))
```

```
CPUtime = 0.0140
```

Before running the function on the GPU, check which GPU device is selected.

```
gpu = gpuDevice;
disp(gpu.Name + " GPU selected.")
```

```
NVIDIA RTX A5000 GPU selected.
```

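Transfer the data and the filter to the GPU by using `gpuArray`, then time the same function on the GPU by using `gputimeit`.

```
gData = gpuArray(data);
gFilter = gpuArray(filter);
GPUtime = gputimeit(@()fastConvolution(gData,gFilter))
```

```
GPUtime = 0.0144
```

In this run, the loop-based function takes about as long on the GPU (0.0144 s) as on the CPU (0.0140 s).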

### Vectorized Function Execution Time on the CPU and GPU

Time the vectorized supporting function `fastConvolutionVectorized` on the CPU and on the GPU.

```
CPUtimeVectorized = timeit(@()fastConvolutionVectorized(data,filter))
```

```
CPUtimeVectorized = 0.0062
```

```
GPUtimeVectorized = gputimeit(@()fastConvolutionVectorized(gData,gFilter))
```

```
GPUtimeVectorized = 4.5717e-04
```
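Compute the speedup that vectorization gives on each device.

```
CPUspeedup = CPUtime/CPUtimeVectorized
```

```
CPUspeedup = 2.2513
```

```
GPUspeedup = GPUtime/GPUtimeVectorized
```

```
GPUspeedup = 31.3964
```

In this run, vectorization makes the function about 2.3 times faster on the CPU and about 31 times faster on the GPU.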
Plot the execution times as a grouped bar chart.

```
bar(categorical(["CPU" "GPU"]), ...
    [CPUtime CPUtimeVectorized; GPUtime GPUtimeVectorized], ...
    "grouped")
ylabel("Execution Time (s)")
legend("Unvectorized","Vectorized")
```

### Supporting Functions

```
function y = fastConvolution(data,filter)
% Zero-pad filter to the column length of data, and transform
[rows,cols] = size(data);
filter_f = fft(filter,rows);

% Create an array of zeros of the same size and class as data
y = zeros(rows,cols,'like',data);

for idx = 1:cols
    % Transform each column of data
    data_f = fft(data(:,idx));
    % Multiply each column by filter and compute inverse transform
    y(:,idx) = ifft(filter_f.*data_f);
end
end
```

```
function y = fastConvolutionVectorized(data,filter)
% Zero-pad filter to the length of data, and transform
[rows,~] = size(data);
filter_f = fft(filter,rows);

% Transform each column of the input
data_f = fft(data);

% Multiply each column by filter and compute inverse transform
y = ifft(filter_f.*data_f);
end
```
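When comparing execution times, it can also be useful to confirm that the loop-based and vectorized approaches return the same result. The sketch below inlines both computations on a small random input so that it runs on its own on the CPU. The matrix sizes and the names `filt`, `yLoop`, `yVec`, and `maxDiff` are assumptions for illustration; `filt` is used instead of `filter` only to avoid shadowing the built-in function of that name.

```matlab
% Illustrative sketch: loop-based and vectorized filtering agree.
data = complex(randn(8,3),randn(8,3));   % small input for a quick check
filt = randn(4,1);                       % hypothetical filter vector

rows = size(data,1);
filt_f = fft(filt,rows);                 % zero-pad filter and transform

% Loop-based version: process one column at a time
yLoop = zeros(size(data),'like',data);
for idx = 1:size(data,2)
    yLoop(:,idx) = ifft(filt_f.*fft(data(:,idx)));
end

% Vectorized version: fft and ifft operate on every column at once
yVec = ifft(filt_f.*fft(data));

% Largest element-wise difference between the two results
maxDiff = max(abs(yLoop(:) - yVec(:)));
```

A value of `maxDiff` near floating-point precision indicates that the two versions are numerically equivalent, so any timing difference comes from how the work is expressed rather than from what is computed.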