optimizing my code using vectorization methods or avoiding for loops

Question

Hi everyone, I'm wondering whether there is any way to make this code run faster by vectorizing it, avoiding loops, or using any other method. I have a very large text file (up to 2.5 GB) that must be read and compared line by line with another file in .xlsx format. It takes forever to run, and I'm also worried that there will not be enough memory, because the result of the calculation will be much bigger than the .txt file.

The text file looks something like this:

1567683075.081675 800002C1 1100000000000000

1567683075.082312 80000189 7437060000843B00

The rest of the file has the same structure: 3 columns (a timestamp and two hex values) and about 10 million rows, which comes to about 2 GB.

The Excel file has 800 rows and 17 columns, like this:

'1' '800002CA' 'EBC1' 'nodata' 'ASRECA_1' 'nodata ' '1' '0' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' '4' '0,5'

As I said, the second column of the text file is compared with the second column of the Excel file, and some calculations happen for each match. This has to be done for all rows of the text file, and the results are stored in a structure.

This is what I have so far. I want to know how I can replace these 2 for loops, because the second one will be iterated 10 million times, as c has the same length as text.

Thank you

fileID = fopen('day_29_08.txt');
text = textscan(fileID,'%s %s %s');   % 1x3 cell array, one cell per column
fclose(fileID);
length_text = length(text{1,3});      % about 10 million rows
excel_data = readtable('List.xlsx');
excel_id = table2cell(excel_data(:,2));
excel_signal_name = table2cell(excel_data(:,5));
length_excel = height(excel_data);    % 800 rows; height returns the number of table rows
for i = 1:length_excel
    c = strcmp(excel_id{i}, text{1,2});   % compare one Excel ID with all 10 million text rows
    for j = 1:length(c)
        if c(j)
            % here I need the index j for the calculation and assignments
        end
    end
end

Answer 1

Spencer Chen 2020-2-3

I would first use the "profile" function to check which part of the code is actually taking a long time. Then tackle those lines.
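A minimal sketch of a profiling session follows. The workload is a made-up stand-in (small hypothetical ID lists, not the real files from the question); only the `profile on`, `profile off`, `profile('info')` and `profile viewer` commands are the point here.

```matlab
% Hypothetical stand-in workload: made-up ID lists, not the real files.
ids_text  = repmat({'800002C1'; '80000189'}, 5000, 1);
ids_excel = {'800002C1'; '800002CA'};

profile on                        % start collecting timing data
n_matches = zeros(numel(ids_excel), 1);
for i = 1:numel(ids_excel)
    % count how many text rows carry the i-th Excel ID
    n_matches(i) = nnz(strcmp(ids_excel{i}, ids_text));
end
profile off                       % stop collecting

stats = profile('info');          % struct; stats.FunctionTable lists per-function timings
% profile viewer                  % opens the interactive report in a desktop session
```

Running the real script between `profile on` and `profile off` should show whether the time goes into `textscan`, `strcmp`, or the body of the inner loop.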

Secondly, some of your for-loops may be unavoidable, but you can still run them more efficiently. One way to do that is to reduce the number of iterations.

For example, take your big i-for-loop. You are looping over your text file, which according to you needs 10 million iterations. Consider instead looping over your Excel file. That has only 800 rows, far fewer than 10 million. How can you modify your code to loop that way?
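One way this could look, as a sketch under assumptions: the loop runs once per Excel row and `find` collects all matching text rows in one call, so no inner loop over the text rows is needed. The data and the names `text_id` and `matches` are made up for illustration.

```matlab
% Made-up sample data in the same shape as the question's ID columns.
text_id  = {'800002C1'; '80000189'; '800002C1'; '800002CA'};
excel_id = {'800002CA'; '800002C1'};

matches = cell(numel(excel_id), 1);    % one list of text row numbers per Excel row
for i = 1:numel(excel_id)
    % strcmp returns a logical vector over all text rows; find converts it
    % to row numbers, replacing an inner j-loop and its if-test.
    matches{i} = find(strcmp(excel_id{i}, text_id));
end
```

With 800 Excel rows this is 800 iterations, each doing one vectorized comparison against the full ID column.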

Thirdly, you have many conditional assignments within your loop. Instead of doing them item by item inside the loop, think about how you can do them outside the loop in vectorized form. For example, use your for-loop to identify the matched_data_indices, then vectorize your assignments:

matched_data_indices;  % from loop
% element-wise & is needed here, because && only accepts scalar operands
flag = isnan(excel_bit(matched_data_indices)) & isnan(excel_signalbyte_2(matched_data_indices));
dec_struct.hex = ...
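To make the idea concrete, here is a small self-contained sketch. The array contents, the `result` variable and the values 1 and 2 are hypothetical; only the pattern (build a logical flag once, then assign through it) is what matters.

```matlab
% Made-up stand-ins for two numeric Excel columns and a list of matched rows.
excel_bit            = [NaN; 3; NaN; NaN];
excel_signalbyte_2   = [NaN; NaN; 2; NaN];
matched_data_indices = [1; 2; 4];

% Element-wise & evaluates the condition for every matched row at once.
flag = isnan(excel_bit(matched_data_indices)) & ...
       isnan(excel_signalbyte_2(matched_data_indices));

% Logical indexing performs the conditional assignment without an if.
result = zeros(numel(matched_data_indices), 1);
result(flag)  = 1;    % hypothetical value where both entries are NaN
result(~flag) = 2;    % hypothetical value for the remaining rows
```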

Lastly, you may not need to do a for-loop to find matching data.
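A possible loop-free approach, sketched with made-up data, is `ismember`, which accepts cell arrays of character vectors and returns both a membership flag and the location of each match. This assumes every ID appears at most once in the Excel list, because `ismember` reports only the lowest matching index.

```matlab
% Made-up sample data in the same shape as the question's ID columns.
text_id  = {'800002C1'; '80000189'; '800002C1'; '800002CA'};
excel_id = {'800002CA'; '800002C1'};

% is_listed(k) is true if text row k has an ID that occurs in the Excel list;
% excel_row(k) is the matching Excel row number, or 0 if there is none.
[is_listed, excel_row] = ismember(text_id, excel_id);

matched_text_rows  = find(is_listed);       % text rows that have a match
matched_excel_rows = excel_row(is_listed);  % the Excel row for each of them
```

The two index vectors line up one to one, so they can be used directly for vectorized lookups into the other Excel columns.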

Blessings,

Spencer
