Help using the arrayfun() function to apply strsplit() to all entries in a string array

I'm trying to wrap my head around how the arrayfun() function works and would greatly appreciate some help with a specific example:
I have a string array of weather data.
weather_strings =
10×1 string array
"UTC,2140991,49.0"
"UTC,2140992,49.1"
"UTC,2140993,49.1"
...
I need to extract the values after the second comma (temperatures) as a 1x10 matrix of doubles, [49.0, 49.1, 49.1, ...].
I've figured out a clunky way to do this for a single entry (please let me know if there's a better way).
weather_string = weather_strings(1) % extract only the first entry
weather_string_split = strsplit(weather_string, ',') % apply strsplit() to split on commas
weather_string_split_trim = weather_string_split(:,3) % extract only 3rd column
weather_num_trim = str2num(weather_string_split_trim) % convert from string to double
But I can't seem to figure out how to use arrayfun() to apply that to every entry. I've tried:
weather_strings_split = arrayfun(strsplit(weather_strings,','), weather_strings) % apply stringsplit to split on commas, for all elements?
which gives the error message:
Error using strsplit (line 80)
First input must be either a character vector or a string scalar.
Error in test_window (line 17)
weather_strings_split = arrayfun(strsplit(weather_strings,','), weather_strings)
I'm probably missing something painfully obvious. What is it? I'm still somewhat of a beginner at coding, so I welcome you to explain it to me like I'm 5 years old.
Alternatively, if there's a clever way to extract these numbers directly from this data table (which came directly from a webread() function), I'd love to hear it. Var3 is a cell array.
weather_data_table =
10×3 table
Var1 Var2 Var3
__________ ________ __________________
2018-11-26 17:41:25 'UTC,2140991,49.0'
2018-11-26 17:42:27 'UTC,2140992,49.1'
2018-11-26 17:43:28 'UTC,2140993,49.1'
...
Again, the goal is to get just the last numbers after the second comma of Var3 into a 1D matrix.
Thanks in advance!

Accepted Answer

Try this:
for k1 = 1:size(weather_strings,1)
Col3(k1,:) = str2double(regexp(weather_strings{k1}, '\d*\.\d*', 'match'));
end
Col3 =
49.0000
49.1000
49.1000
The loop is necessary because regexp is not vectorised. It can only handle one string at a time.

6 Comments

"The loop is necessary because regexp is not vectorised. It can only handle one string at a time."
According to the regexp documentation the first input may be "specified as a character vector, a cell array of character vectors, or a string array. Each character vector in a cell array, or each string in a string array, can be of any length and contain any characters." I often use regexp with cell arrays containing multiple different char vectors, and I don't see any reason why it should not work with string arrays, just as its documentation states it does.
>> C = {'UTC,2140991,49.0','UTC,2140992,49.1','UTC,2140993,49.1'}
>> regexp(C,'\d*\.\d*', 'match','once')
ans =
'49.0'
'49.1'
'49.1'
The regexp call threw an error with the column vector when I tried it. It works with a row vector, and OP may not want to re-format the column vector to a row vector. Also, while arrayfun definitely has its uses, when I’ve used it for problems like this, it’s been significantly slower than a simple loop, which surprised me. Thus, the loop.
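For reference, the arrayfun form the original question was reaching for can be sketched as below. This is a minimal illustration, not a recommendation over the loop: the three sample strings stand in for the asker's data, and no timing comparison has been made here. The key difference from the failing attempt is that arrayfun takes a function handle (`@(s) ...`) as its first argument rather than the result of calling strsplit.

```matlab
% Sample data standing in for the asker's 10x1 string array (assumption)
weather_strings = ["UTC,2140991,49.0"; "UTC,2140992,49.1"; "UTC,2140993,49.1"];

% arrayfun expects a function handle as its first argument. The original
% attempt passed strsplit(weather_strings, ','), which MATLAB evaluates
% first, on the whole array, before arrayfun is ever called.
% 'UniformOutput', false is required because each call returns a 1x3
% string array rather than a scalar.
weather_strings_split = arrayfun(@(s) strsplit(s, ','), weather_strings, ...
    'UniformOutput', false);

% Take the third piece from each cell and convert it to a double
temps = cellfun(@(parts) str2double(parts(3)), weather_strings_split);

% Transpose the Nx1 column into the 1xN row vector the question asks for
temps = temps.';
```

Here `weather_strings_split` is a cell array with one 1x3 string array per input line, and `temps` ends up as a row vector of doubles.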
Awesome, thank you so much Star Strider and Stephen Cobeldick! This works brilliantly for my temperature data.
Is there a way to write a similar regexp function that would isolate the number from the end of the line, regardless of whether or not it contains a decimal point? (Which is why I originally tried to use commas as delimiters.)
I also need the same function to clean up my Humidity data, which has whole integer values.
For example:
weather_strings =
10×1 string array
"UTC,2140991,59"
"UTC,2140992,61"
"UTC,2140993,60"
...
If the user selects Humidity data instead of Temperature data right now, I get the following error message:
Unable to perform assignment because the indices on the left side are not compatible with the size of the right side.
Error in clean_data (line 14)
clean_weather_strings(:,k) = regexp(weather_strings{k}, '\d*\.\d*', 'match');
Error in Lab7 (line 23)
clean_weather_doubles = clean_data(weather_data_table) % give input to clean_data function, save output
I assume this is because our '\d*\.\d*' expression looks for digits separated by a period. I'm just not familiar enough with the syntax of the regexp() function to know how to set it up differently.
Thanks again!
My pleasure!
"I assume this is because our '\d*\.\d*' expression looks for digits separated by a period."
Correct.
"I'm just not familiar enough with the syntax of the regexp() function to know how to set it up differently."
The regexp function supports alternation, with ‘|’ designating a logical ‘or’. Given a choice between '\d*\.\d*' and '\d*', it will use whichever pattern matches.
To accommodate both, the regexp call changes to:
for k1 = 1:size(weather_strings,1)
Col3(k1,:) = str2double(regexp(weather_strings{k1}, '\d*\.\d*|\d*', 'match'));
end
Out = Col3(:,2)
This works for both when I tested it, amazingly enough!
(I still have much to learn about regexp myself.)
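Since the asker mentions originally wanting to use commas as delimiters, a comma-based alternative can be sketched with split, which accepts a whole string array at once. This assumes every line contains the same number of commas (split raises an error otherwise); the sample humidity strings are the ones quoted above.

```matlab
% Sample humidity data from the comment above
weather_strings = ["UTC,2140991,59"; "UTC,2140992,61"; "UTC,2140993,60"];

% split works on the whole Nx1 string array and returns an Nx3 string
% array, one column per comma-separated field
parts = split(weather_strings, ',');

% Third column holds the readings; convert to double and transpose to 1xN
humidity = str2double(parts(:, 3)).';
```

Because the value is taken by field position rather than by pattern, the same code handles readings with or without a decimal point.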


More Answers (1)

In R2016b:
>> weather_strings = string({'UTC,2140991,49.0'
'UTC,2140992,49.1'
'UTC,2140993,49.1'})
weather_strings =
3x1 string array
"UTC,2140991,49.0"
"UTC,2140992,49.1"
"UTC,2140993,49.1"
>> str2double(regexp(weather_strings,'(\d+\.)?\d+$','match','once'))
ans =
49
49.1
49.1
>>
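The same pattern can also be applied directly to the Var3 column of the table from the question, since regexp accepts a cell array of character vectors. The sketch below builds a small stand-in table (the real one came from webread(), so its contents here are an assumption) and mixes decimal and integer readings to show both cases.

```matlab
% Stand-in for the Var3 column of the asker's table (assumed contents)
Var3 = {'UTC,2140991,49.0'; 'UTC,2140992,49.1'; 'UTC,2140993,59'};
weather_data_table = table(Var3);

% '(\d+\.)?\d+$' matches the trailing number, with an optional integer
% part and decimal point in front of the final digits; 'once' returns one
% char vector per cell instead of a nested cell array
matches = regexp(weather_data_table.Var3, '(\d+\.)?\d+$', 'match', 'once');

% str2double converts the cell array of char vectors element-wise;
% transpose the Nx1 result into a 1xN row vector
values = str2double(matches).';
```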


Release

R2018b
