if-else statement to check the claim identity of URL

How can I check whether a single URL string actually contains more than one URL (2, 3, or more)? The purpose of this feature is to detect 2 or 3 URLs hidden within 1 URL: if there are, return 1, otherwise return 0. Examples: www.abc.com/www.koko.my, http://www.abc.com=https://www.koko.my, www.abc.com.www.koko.my, etc. Here is my code. I have a problem with the condition that checks the URL. I have about 100+ URLs saved in a file called 'URL', and I want to run the 'is_double_url' function on each of them to get the results.
| *is_double_url.m* |
function out = is_double_url(url_path1)
f1 = strfind(url_path1,'www.');
if isempty(f1)
    out = 0;
    return;
end
f2 = strfind(url_path1,'/');
f3 = bsxfun(@minus,f2,f1');
count_dots = zeros(size(f3,1),1);
for k = 1:size(f3,1)
    [x,y] = find(f3(k,:)>0,1);
    str2 = url_path1(f1(k):f2(y));
    if ~isempty(strfind(str2,'..'))
        continue
    end
    count_dots(k) = nnz(strfind(str2,'.'));
end
out = ~any(count_dots(2:end)<2);
if any(strfind(url_path1,'://')>f2(1))
    out = true;
end
return;
| *f10.m* |
data = importdata('url');
[sizeData b] = size(data);
for i = 1:sizeData
    feature10(i) = is_double_url(data{i});
end

Answers (1)

Walter Roberson 2014-3-21
This turns out to be quite tough to get right.
You need to consider percent-encoding, UTF-8 encoding, and Unicode strings. Then you have to worry about Internationalized Domain Name encoding.
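To illustrate the percent-encoding point only, here is a minimal sketch. The input string is a made-up example, and the decoding works byte by byte, so it does not reassemble multi-byte UTF-8 sequences or handle IDN at all. It shows how a second 'www.' can be invisible to strfind until the escapes are decoded.

```matlab
% Hypothetical input: '%2F' is an encoded '/', '%2E' is an encoded '.'
raw = 'www.abc.com%2Fwww%2Ekoko.my';

% Replace each %XX escape with the character whose code is hex XX.
% Assumption: single-byte escapes only (no multi-byte UTF-8 handling).
decoded = regexprep(raw, '%([0-9A-Fa-f]{2})', '${char(hex2dec($1))}');

% Count occurrences of 'www.' before and after decoding
nBefore = numel(strfind(raw, 'www.'));
nAfter  = numel(strfind(decoded, 'www.'));
```

Counting on the raw string would miss the second host here, so any check along the lines of is_double_url would probably need to run on the decoded text.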
Note: your example,
http://www.abc.com=https://www.koko.my
is not a valid URL. The "com=https:" would be considered to be all one component, but neither "=" nor ":" is permitted as a character in host name components.
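As a rough sketch of that check, the snippet below pulls out the part between '://' and the next '/', '?' or '#', then tests it against a simplified host name rule (letters, digits and hyphens, separated by dots). The variable names are illustrative, and the rule is an assumption that deliberately ignores ports, user info, IPv6 literals and IDN.

```matlab
url = 'http://www.abc.com=https://www.koko.my';

% Candidate host: everything after the scheme's '://' up to the next '/', '?' or '#'
tok = regexp(url, '^[A-Za-z][A-Za-z0-9+.-]*://([^/?#]*)', 'tokens', 'once');
authority = tok{1};

% Simplified host name rule: dot-separated labels of letters, digits and hyphens
hostPattern = '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*$';
isValidHost = ~isempty(regexp(authority, hostPattern, 'once'));
```

For this example the extracted part is 'www.abc.com=https:', which fails the rule because of the '=' and ':' characters. A failed host check like this could be one signal that something else is hidden in the string, though it is not a complete test on its own.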
