For loop operation in ODE45 performs faster than equivalent indexing
6 次查看(过去 30 天)
显示 更早的评论
I have two version of a code that solves the advection equation using a 5th order WENO discretization scheme. In one version a for loop is used to generate the equations for each discretization node. For the other, I replaced the for loop with indexing, hoping to improve the speed. The for loop version always performs faster than the indexed version by 2-4 times (regardless of the number of discretizations). I was always under the assumption that indexing is preferred due to speed. Any help in understanding why this could happen would be greatly appreciated as I apply this logic to other more complicated problems.
%% Code with For Loop:
clear; hold on
tic
%% Parameters
Nx = 300; % Number of spatial points (column length)
L = 2 * pi; % Column length
dx = L / Nx; % Grid spacing
x = linspace(0, L, Nx); % Spatial grid
a = 1.0; % Advection speed
T_final = 10; % Final time
T_injection = 0.01; % Injection time at x = 0
%% Initial Condition (Zero Concentration)
u0 = zeros(Nx, 1);
%% Time Integration using ODE45 (better for non-stiff advection)
options = odeset('RelTol',1e-6,'AbsTol',1e-6);
[t_sol, u_sol] = ode45(@(t, u) WENO5_ODE(t, u, a, dx, Nx, T_injection), [0 T_final], u0, options);
%% Extract Outlet Concentration (Last Grid Point)
outlet_concentration = u_sol(:, end); % Last column (outlet) over time
%% Calculate HETP
% Find retention time (t_R) as time at peak concentration
[peak_conc, peak_idx] = max(outlet_concentration);
t_R = t_sol(peak_idx); % Retention time
% Compute standard deviation of outlet concentration profile (σ_t)
sigma_t = sqrt(sum((t_sol - t_R).^2 .* outlet_concentration) / sum(outlet_concentration));
% Compute number of theoretical plates (N)
N = (t_R / sigma_t)^2;
% Compute HETP
HETP = L / N;
%% Display HETP
disp(['N = ', num2str(N)]);
%% Plot Outlet Concentration vs. Time
hold on
plot(t_sol, outlet_concentration, 'b', 'LineWidth', 2);
xlabel('Time'); ylabel('Outlet Concentration');
title('Outlet Concentration vs. Time (Pulse Response)');
%grid on;
%% Function: ODE System using WENO5
function dudt = WENO5_ODE(t, u, a, dx, Nx, T_injection)
% Apply step injection at inlet for t <= T_injection
u_bc = u; % Copy u for boundary modification
if t <= T_injection
u_bc(1) = 1; % Inject concentration at x = 0
else
u_bc(1) = 0; % Stop injection
end
% Compute the spatial derivative using WENO5
dudt = WENO5_Advection(u_bc, a, dx, Nx);
end
%% Function: WENO5 Advection Term (With a For Loop)
function dudx = WENO5_Advection(u, a, dx, Nx)
% Extend solution for boundary conditions (Zero-Gradient at outlet)
u_ext = [u(1); u(1); u; u(end); u(end)]; % Left BC (Dirichlet), Right BC (Zero-Gradient)
f_ext = a * u_ext;
% Flux reconstruction at interfaces
f_half = zeros(Nx+1, 1);
for j = 3:Nx+2
% Compute stencil-based fluxes
f0 = (2/6) * f_ext(j-2) + (-7/6) * f_ext(j-1) + (11/6) * f_ext(j); % Left-most stencil
f1 = (-1/6) * f_ext(j-1) + (5/6) * f_ext(j) + (2/6) * f_ext(j+1); % Center-left stencil
f2 = (2/6) * f_ext(j) + (5/6) * f_ext(j+1) + (-1/6) * f_ext(j+2); % Center-right stencil
% Compute smoothness indicators with correct 13/12 and 1/4 factors
beta0 = (13/12) * (f_ext(j-2) - 2*f_ext(j-1) + f_ext(j))^2 + (1/4) * (f_ext(j-2) - 4*f_ext(j-1) + 3*f_ext(j))^2;
beta1 = (13/12) * (f_ext(j-1) - 2*f_ext(j) + f_ext(j+1))^2 + (1/4) * (f_ext(j-1) - f_ext(j+1))^2;
beta2 = (13/12) * (f_ext(j) - 2*f_ext(j+1) + f_ext(j+2))^2 + (1/4) * (3*f_ext(j) - 4*f_ext(j+1) + f_ext(j+2))^2;
% Linear weights
d0 = 0.1; d1 = 0.6; d2 = 0.3;
% Compute nonlinear weights
epsilon = 1e-15;
alpha0 = d0 / (beta0 + epsilon)^2;
alpha1 = d1 / (beta1 + epsilon)^2;
alpha2 = d2 / (beta2 + epsilon)^2;
% Normalize weights
omega0 = alpha0 / (alpha0 + alpha1 + alpha2);
omega1 = alpha1 / (alpha0 + alpha1 + alpha2);
omega2 = alpha2 / (alpha0 + alpha1 + alpha2);
% Compute final WENO5 flux
f_half(j-1) = omega0 * f0 + omega1 * f1 + omega2 * f2;
end
% Compute flux difference with Zero-Gradient BC at outlet (First-order Approximation)
dudx = -(1/dx) * (f_half(2:Nx+1) - f_half(1:Nx));
end
toc
%% Code with Indexing
clear; hold on
tic
%% Parameters
Nx = 300; % Number of spatial points (column length)
L = 2 * pi; % Column length
dx = L / Nx; % Grid spacing
x = linspace(0, L, Nx); % Spatial grid
a = 1.0; % Advection speed
T_final = 10; % Final time
T_injection = 0.01; % Injection time at x = 0
%% Initial Condition (Zero Concentration)
u0 = zeros(Nx, 1);
%% Time Integration using ODE45 (better for non-stiff advection)
options = odeset('RelTol',1e-6,'AbsTol',1e-6);
[t_sol, u_sol] = ode45(@(t, u) WENO5_ODE(t, u, a, dx, Nx, T_injection), [0 T_final], u0, options);
%% Extract Outlet Concentration (Last Grid Point)
outlet_concentration = u_sol(:, end); % Last column (outlet) over time
%% Calculate HETP
% Find retention time (t_R) as time at peak concentration
[peak_conc, peak_idx] = max(outlet_concentration);
t_R = t_sol(peak_idx); % Retention time
% Compute standard deviation of outlet concentration profile (σ_t)
sigma_t = sqrt(sum((t_sol - t_R).^2 .* outlet_concentration) / sum(outlet_concentration));
% Compute number of theoretical plates (N)
N = (t_R / sigma_t)^2;
% Compute HETP
HETP = L / N;
%% Display HETP
disp(['N = ', num2str(N)]);
%% Plot Outlet Concentration vs. Time
hold on
plot(t_sol, outlet_concentration, 'b', 'LineWidth', 2);
xlabel('Time'); ylabel('Outlet Concentration');
title('Outlet Concentration vs. Time (Pulse Response)');
%grid on;
%% Function: ODE System using WENO5
function dudt = WENO5_ODE(t, u, a, dx, Nx, T_injection)
% Apply step injection at inlet for t <= T_injection
u_bc = u; % Copy u for boundary modification
if t <= T_injection
u_bc(1) = 1; % Inject concentration at x = 0
else
u_bc(1) = 0; % Stop injection
end
% Compute the spatial derivative using WENO5
dudt = WENO5_Advection(u_bc, a, dx, Nx);
end
%% Function: WENO5 Advection Term (With a For Loop)
function dudx = WENO5_Advection(u, a, dx, Nx)
% Extend solution for boundary conditions (Zero-Gradient at outlet)
u_ext = [u(1); u(1); u; u(end); u(end)]; % Left BC (Dirichlet), Right BC (Zero-Gradient)
f_ext = a * u_ext;
% Flux reconstruction at interfaces
f_half = zeros(Nx+1, 1);
index = 1:Nx;
% Compute stencil-based fluxes
f0(index) = (2/6) * f_ext(index) + (-7/6) .* f_ext(index+1) + (11/6) .* f_ext(index+2); % Left-most stencil
f1(index) = (-1/6) * f_ext(index+1) + (5/6) .* f_ext(index+2) + (2/6) .* f_ext(index+3); % Center-left stencil
f2(index) = (2/6) * f_ext(index+2) + (5/6).* f_ext(index+3) + (-1/6) .* f_ext(index+4); % Center-right stencil
% Compute smoothness indicators with correct 13/12 and 1/4 factors
beta0(index) = (13/12) * (f_ext(index) - 2*f_ext(index+1) + f_ext(index+2)).^2 + (1/4) * (f_ext(index) - 4*f_ext(index+1) + 3*f_ext(index+2)).^2;
beta1(index) = (13/12) * (f_ext(index+1) - 2*f_ext(index+2) + f_ext(index+3)).^2 + (1/4) * (f_ext(index+1) - f_ext(index+3)).^2;
beta2(index) = (13/12) * (f_ext(index+2) - 2*f_ext(index+3) + f_ext(index+4)).^2 + (1/4) * (3*f_ext(index+2) - 4*f_ext(index+3) + f_ext(index+4)).^2;
% Linear weights
d0 = 0.1; d1 = 0.6; d2 = 0.3;
% Compute nonlinear weights
epsilon = 1e-15;
alpha0 = d0 ./ (beta0 + epsilon).^2;
alpha1 = d1 ./ (beta1 + epsilon).^2;
alpha2 = d2 ./ (beta2 + epsilon).^2;
% Normalize weights
omega0 = alpha0 ./ (alpha0 + alpha1 + alpha2);
omega1 = alpha1 ./ (alpha0 + alpha1 + alpha2);
omega2 = alpha2 ./ (alpha0 + alpha1 + alpha2);
% Compute final WENO5 flux
f_half(2:Nx+1) = omega0 .* f0 + omega1 .* f1 + omega2 .* f2;
% Compute flux difference with Zero-Gradient BC at outlet (First-order Approximation)
dudx = -(1/dx) .* (f_half(2:Nx+1) - f_half(1:Nx));
end
toc
4 个评论
Sam Chak
2025-3-20
@Nick, Looks like you're trying to inject an Unit Impulse Signal?
t = linspace(0, 10, 1001);
T_injection = 0.01;
u_bc = zeros(1, numel(t));
for i = 1:numel(t)
if t(i) <= T_injection
u_bc(i) = 1; % Inject concentration at x = 0
else
u_bc(i) = 0; % Stop injection
end
end
plot(t, u_bc), grid on, axis([-2, 12, -0.2, 1.2]),
title('Impulse signal injection')
xlabel('Time / s'), ylabel('Inlet ON/OFF state')
回答(1 个)
Walter Roberson
2025-3-21
index = 1:Nx;
% Compute stencil-based fluxes
f0(index) = (2/6) * f_ext(index) + (-7/6) .* f_ext(index+1) + (11/6) .* f_ext(index+2); % Left-most stencil
The 1:Nx portion requires creating an explicit vector of values 1 through Nx, and storing those values in a variable.
Then the f_ext(index) requires recalling that vector of indexes, and building a temporary variable the same length as the index vector, and then iterating over each element of the index vector moving the corresponding source value to the output location.
This process does not benefit from the fact that the index vector is consecutive elements (well, it is not out of the question that there is an optimization step that takes that into account.) In general, the index vector could be scrambled like
index = randperm(Nx);
or in general there could be gaps such as
index = [1:50, 101:Nx];
so the general process has to work iteratively in extracting elements given by the index vector.
The f_ext(index+1) part requires recalling the index vector, building a temporary output variable the same size, and adding 1 to each element of the index vector storing the result into the temporary variable, and then afterwards iterating over each element of the temporary variable pulling the appropriate value into a temporary output location.
By way of contrast, the looping version is working with only one element at a time, and the indexing is fairly likely optimized.
Please note that if you were to write
f0 = (2/6) * f_ext(1:Nx) + (-7/6) .* f_ext(2:Nx+1) + (11/6) .* f_ext(3:Nx+2); % Left-most stencil
then the optimizer can know that consecutive elements of f_ext are being used, which can be noteably more efficient in moving data around.
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Ordinary Differential Equations 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
