For loop operation in ODE45 performs faster than equivalent indexing

Question

Nick 2025-3-20

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/2175435-for-loop-operation-in-ode45-performs-faster-than-equivalent-indexing

回答： Walter Roberson 2025-3-21

I have two version of a code that solves the advection equation using a 5th order WENO discretization scheme. In one version a for loop is used to generate the equations for each discretization node. For the other, I replaced the for loop with indexing, hoping to improve the speed. The for loop version always performs faster than the indexed version by 2-4 times (regardless of the number of discretizations). I was always under the assumption that indexing is preferred due to speed. Any help in understanding why this could happen would be greatly appreciated as I apply this logic to other more complicated problems.

%% Code with For Loop:
clear; hold on
tic
%% Parameters
Nx = 300;            % Number of spatial points (column length)
L = 2 * pi;         % Column length
dx = L / Nx;        % Grid spacing
x = linspace(0, L, Nx); % Spatial grid
a = 1.0;             % Advection speed
T_final = 10;       % Final time
T_injection = 0.01;   % Injection time at x = 0
%% Initial Condition (Zero Concentration)
u0 = zeros(Nx, 1);  
%% Time Integration using ODE45 (better for non-stiff advection)
options = odeset('RelTol',1e-6,'AbsTol',1e-6);
[t_sol, u_sol] = ode45(@(t, u) WENO5_ODE(t, u, a, dx, Nx, T_injection), [0 T_final], u0, options);
%% Extract Outlet Concentration (Last Grid Point)
outlet_concentration = u_sol(:, end); % Last column (outlet) over time
%% Calculate HETP
% Find retention time (t_R) as time at peak concentration
[peak_conc, peak_idx] = max(outlet_concentration);
t_R = t_sol(peak_idx); % Retention time
% Compute standard deviation of outlet concentration profile (σ_t)
sigma_t = sqrt(sum((t_sol - t_R).^2 .* outlet_concentration) / sum(outlet_concentration));
% Compute number of theoretical plates (N)
N = (t_R / sigma_t)^2;
% Compute HETP
HETP = L / N;
%% Display HETP
disp(['N = ', num2str(N)]);
%% Plot Outlet Concentration vs. Time
hold on
plot(t_sol, outlet_concentration, 'b', 'LineWidth', 2);
xlabel('Time'); ylabel('Outlet Concentration');
title('Outlet Concentration vs. Time (Pulse Response)');
%grid on;
%% Function: ODE System using WENO5
function dudt = WENO5_ODE(t, u, a, dx, Nx, T_injection)
% Apply step injection at inlet for t <= T_injection
u_bc = u;  % Copy u for boundary modification
if t <= T_injection
    u_bc(1) = 1;  % Inject concentration at x = 0
else
    u_bc(1) = 0;  % Stop injection
end
% Compute the spatial derivative using WENO5
dudt = WENO5_Advection(u_bc, a, dx, Nx);
end
%% Function: WENO5 Advection Term (With a For Loop)
function dudx = WENO5_Advection(u, a, dx, Nx)
% Extend solution for boundary conditions (Zero-Gradient at outlet)
u_ext = [u(1); u(1); u; u(end); u(end)];  % Left BC (Dirichlet), Right BC (Zero-Gradient)
f_ext = a * u_ext;
% Flux reconstruction at interfaces
f_half = zeros(Nx+1, 1);
for j = 3:Nx+2
    % Compute stencil-based fluxes
    f0 = (2/6) * f_ext(j-2) + (-7/6) * f_ext(j-1) + (11/6) * f_ext(j);  % Left-most stencil
    f1 = (-1/6) * f_ext(j-1) + (5/6) * f_ext(j) + (2/6) * f_ext(j+1);  % Center-left stencil
    f2 = (2/6) * f_ext(j) + (5/6) * f_ext(j+1) + (-1/6) * f_ext(j+2);  % Center-right stencil
    % Compute smoothness indicators with correct 13/12 and 1/4 factors
    beta0 = (13/12) * (f_ext(j-2) - 2*f_ext(j-1) + f_ext(j))^2 + (1/4) * (f_ext(j-2) - 4*f_ext(j-1) + 3*f_ext(j))^2;
    beta1 = (13/12) * (f_ext(j-1) - 2*f_ext(j) + f_ext(j+1))^2 + (1/4) * (f_ext(j-1) - f_ext(j+1))^2;
    beta2 = (13/12) * (f_ext(j) - 2*f_ext(j+1) + f_ext(j+2))^2 + (1/4) * (3*f_ext(j) - 4*f_ext(j+1) + f_ext(j+2))^2;
    % Linear weights
    d0 = 0.1; d1 = 0.6; d2 = 0.3;
    % Compute nonlinear weights
    epsilon = 1e-15;
    alpha0 = d0 / (beta0 + epsilon)^2;
    alpha1 = d1 / (beta1 + epsilon)^2;
    alpha2 = d2 / (beta2 + epsilon)^2;
    % Normalize weights
    omega0 = alpha0 / (alpha0 + alpha1 + alpha2);
    omega1 = alpha1 / (alpha0 + alpha1 + alpha2);
    omega2 = alpha2 / (alpha0 + alpha1 + alpha2);
    % Compute final WENO5 flux
    f_half(j-1) = omega0 * f0 + omega1 * f1 + omega2 * f2;
end
% Compute flux difference with Zero-Gradient BC at outlet (First-order Approximation)
dudx = -(1/dx) * (f_half(2:Nx+1) - f_half(1:Nx));
end
toc

%% Code with Indexing
clear; hold on
tic
%% Parameters
Nx = 300;            % Number of spatial points (column length)
L = 2 * pi;         % Column length
dx = L / Nx;        % Grid spacing
x = linspace(0, L, Nx); % Spatial grid
a = 1.0;             % Advection speed
T_final = 10;       % Final time
T_injection = 0.01;   % Injection time at x = 0
%% Initial Condition (Zero Concentration)
u0 = zeros(Nx, 1);  
%% Time Integration using ODE45 (better for non-stiff advection)
options = odeset('RelTol',1e-6,'AbsTol',1e-6);
[t_sol, u_sol] = ode45(@(t, u) WENO5_ODE(t, u, a, dx, Nx, T_injection), [0 T_final], u0, options);
%% Extract Outlet Concentration (Last Grid Point)
outlet_concentration = u_sol(:, end); % Last column (outlet) over time
%% Calculate HETP
% Find retention time (t_R) as time at peak concentration
[peak_conc, peak_idx] = max(outlet_concentration);
t_R = t_sol(peak_idx); % Retention time
% Compute standard deviation of outlet concentration profile (σ_t)
sigma_t = sqrt(sum((t_sol - t_R).^2 .* outlet_concentration) / sum(outlet_concentration));
% Compute number of theoretical plates (N)
N = (t_R / sigma_t)^2;
% Compute HETP
HETP = L / N;
%% Display HETP
disp(['N = ', num2str(N)]);
%% Plot Outlet Concentration vs. Time
hold on
plot(t_sol, outlet_concentration, 'b', 'LineWidth', 2);
xlabel('Time'); ylabel('Outlet Concentration');
title('Outlet Concentration vs. Time (Pulse Response)');
%grid on;
%% Function: ODE System using WENO5
function dudt = WENO5_ODE(t, u, a, dx, Nx, T_injection)
% Apply step injection at inlet for t <= T_injection
u_bc = u;  % Copy u for boundary modification
if t <= T_injection
    u_bc(1) = 1;  % Inject concentration at x = 0
else
    u_bc(1) = 0;  % Stop injection
end
% Compute the spatial derivative using WENO5
dudt = WENO5_Advection(u_bc, a, dx, Nx);
end
%% Function: WENO5 Advection Term (With a For Loop)
function dudx = WENO5_Advection(u, a, dx, Nx)
% Extend solution for boundary conditions (Zero-Gradient at outlet)
u_ext = [u(1); u(1); u; u(end); u(end)];  % Left BC (Dirichlet), Right BC (Zero-Gradient)
f_ext = a * u_ext;
% Flux reconstruction at interfaces
f_half = zeros(Nx+1, 1);
index = 1:Nx;
% Compute stencil-based fluxes
f0(index) = (2/6) * f_ext(index) + (-7/6) .* f_ext(index+1) + (11/6) .* f_ext(index+2);  % Left-most stencil
f1(index) = (-1/6) * f_ext(index+1) + (5/6) .* f_ext(index+2) + (2/6) .* f_ext(index+3);  % Center-left stencil
f2(index) = (2/6) * f_ext(index+2) + (5/6).* f_ext(index+3) + (-1/6) .* f_ext(index+4);  % Center-right stencil
% Compute smoothness indicators with correct 13/12 and 1/4 factors
beta0(index) = (13/12) * (f_ext(index) - 2*f_ext(index+1) + f_ext(index+2)).^2 + (1/4) * (f_ext(index) - 4*f_ext(index+1) + 3*f_ext(index+2)).^2;
beta1(index) = (13/12) * (f_ext(index+1) - 2*f_ext(index+2) + f_ext(index+3)).^2 + (1/4) * (f_ext(index+1) - f_ext(index+3)).^2;
beta2(index) = (13/12) * (f_ext(index+2) - 2*f_ext(index+3) + f_ext(index+4)).^2 + (1/4) * (3*f_ext(index+2) - 4*f_ext(index+3) + f_ext(index+4)).^2;
% Linear weights
d0 = 0.1; d1 = 0.6; d2 = 0.3;
% Compute nonlinear weights
epsilon = 1e-15;
alpha0 = d0 ./ (beta0 + epsilon).^2;
alpha1 = d1 ./ (beta1 + epsilon).^2;
alpha2 = d2 ./ (beta2 + epsilon).^2;
% Normalize weights
omega0 = alpha0 ./ (alpha0 + alpha1 + alpha2);
omega1 = alpha1 ./ (alpha0 + alpha1 + alpha2);
omega2 = alpha2 ./ (alpha0 + alpha1 + alpha2);
% Compute final WENO5 flux
f_half(2:Nx+1) = omega0 .* f0 + omega1 .* f1 + omega2 .* f2;
% Compute flux difference with Zero-Gradient BC at outlet (First-order Approximation)
dudx = -(1/dx) .* (f_half(2:Nx+1) - f_half(1:Nx));
end
toc

4 个评论
显示 2更早的评论隐藏 2更早的评论

Sam Chak 2025-3-20

在 MATLAB Online 中打开

@Nick, Looks like you're trying to inject an Unit Impulse Signal?

t = linspace(0, 10, 1001);

T_injection = 0.01;

u_bc = zeros(1, numel(t));

for i = 1:numel(t)

if t(i) <= T_injection

u_bc(i) = 1; % Inject concentration at x = 0

else

u_bc(i) = 0; % Stop injection

end

plot(t, u_bc), grid on, axis([-2, 12, -0.2, 1.2]),

title('Impulse signal injection')

xlabel('Time / s'), ylabel('Inlet ON/OFF state')

●

Nick 2025-3-20

Hello, yes that is correct, although I want it to be an impulse of finite thickness dictated by t_inj

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Walter Roberson 2025-3-21

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/2175435-for-loop-operation-in-ode45-performs-faster-than-equivalent-indexing#answer_1562163

在 MATLAB Online 中打开

index = 1:Nx;
% Compute stencil-based fluxes
f0(index) = (2/6) * f_ext(index) + (-7/6) .* f_ext(index+1) + (11/6) .* f_ext(index+2);  % Left-most stencil

The 1:Nx portion requires creating an explicit vector of values 1 through Nx, and storing those values in a variable.

Then the f_ext(index) requires recalling that vector of indexes, and building a temporary variable the same length as the index vector, and then iterating over each element of the index vector moving the corresponding source value to the output location.

This process does not benefit from the fact that the index vector is consecutive elements (well, it is not out of the question that there is an optimization step that takes that into account.) In general, the index vector could be scrambled like

index = randperm(Nx);

or in general there could be gaps such as

index = [1:50, 101:Nx];

so the general process has to work iteratively in extracting elements given by the index vector.

The f_ext(index+1) part requires recalling the index vector, building a temporary output variable the same size, and adding 1 to each element of the index vector storing the result into the temporary variable, and then afterwards iterating over each element of the temporary variable pulling the appropriate value into a temporary output location.

By way of contrast, the looping version is working with only one element at a time, and the indexing is fairly likely optimized.

Please note that if you were to write

f0 = (2/6) * f_ext(1:Nx) + (-7/6) .* f_ext(2:Nx+1) + (11/6) .* f_ext(3:Nx+2);  % Left-most stencil

then the optimizer can know that consecutive elements of f_ext are being used, which can be noteably more efficient in moving data around.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

For loop operation in ODE45 performs faster than equivalent indexing

4 个评论
显示 2更早的评论隐藏 2更早的评论

回答（1 个）

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

For loop operation in ODE45 performs faster than equivalent indexing

4 个评论 显示 2更早的评论隐藏 2更早的评论

回答（1 个）

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

4 个评论
显示 2更早的评论隐藏 2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论