Perform Google Search in MATLAB
43 views (last 30 days)
Hi!
I am trying to figure out how to perform a Google search automatically in MATLAB and save the results in an array.
Say I wanted to save the paths to the pdf files: "site:www.cnn.com filetype:pdf"
Some answers in the list should then be:
...
I have seen some scripts (links below), but unfortunately they are outdated or simply do not work. I am guessing it should be possible to do this, but I cannot seem to figure it out. Any assistance would be very welcome!
Links:
3 comments
Joel Handy
2019-6-10
After doing some more research, it looks like scraping (that's what we are doing: scraping Google's search results) is against their terms of service, and they actively attempt to thwart it. That would explain why some older tools are no longer maintained. I'm not a web expert; there appear to be ways of doing what you want, but I don't think any of them are simple.
Sorry I couldn't be more help.
Answers (3)
Monika Phadnis
2019-6-27
I followed the example given on this link to extract data from the URL.
As for the URL, I used http://www.google.com/search?q=cnn.com+filetype%3Apdf as the url parameter for webread in the example you gave. This returns a string array of the href links; you can try parsing that array for the links you need.
In my output, strings starting with "/url" contained the search links.
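A minimal sketch of that approach might look like the following, assuming the results page comes back as plain HTML and that hrefs beginning with "/url" still mark the hits. Google's markup changes often and may differ for non-browser clients, and scraping the results page may violate their TOS, as noted above, so the regular expression is only a guess:
% minimal sketch of the scraping approach described above (assumptions:
% plain-HTML results page, hits stored in hrefs that begin with "/url")
searchurl = 'http://www.google.com/search?q=cnn.com+filetype%3Apdf';
wo = weboptions('UserAgent','Mozilla/5.0');          % Google may block the default MATLAB agent
html = webread(searchurl,wo);                        % raw HTML as one char vector
hrefs = regexp(html,'href="(/url[^"]*)"','tokens');  % grab href attributes starting with /url
hrefs = [hrefs{:}].';                                % flatten to a column cell array of char vectors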
0 comments
KARTIK GURNANI
2020-5-21
This does seem to be true.
PS: Microsoft introduced this restriction on Bing, to prevent other search engines from copying their data (search results), well before Google did. It seems like we would be violating the TOS of both Google and Bing.
I tried it and got partial results.
The best possible way would be to use MATLAB to build a neural network that runs search queries from a system with a dynamic IP. @AndrewNg might shed some better light on this.
There is a possible solution to this, but the biggest issue of all is that Google and Bing (Microsoft) might label your IP address as spam or a bot. That means no Netflix, no Hulu, no other streaming services; you might even get locked out of reading news on certain websites. Even simple web searches might start asking you to solve reCAPTCHA or the newer image CAPTCHA. A dynamic IP will help in this case, but check with your ISP before attempting it: you might lose the security, or your plan may get suspended. It would take the ISP a lot of man-hours to get that single IP cleaned up (removed from blacklists across most filters), so you would mostly be adding to their headache.
Note: I have created a MATLAB script that can run your search query, but I am not sure about posting it here. The limitation is that it can only run a single search query at a time. It works, but crawling takes a while, and then PostScript is used to convert the results to PDF; saving to an HTML file with images works better (a rough sketch of that save step is shown below). If anyone would like the script, please let me know.
The script is for educational purposes only. Do not use it to violate the TOS of any organization.
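A rough sketch of the save-to-HTML step, with a placeholder URL and filename (this is only an illustration, not the actual script, and websave stores only the HTML itself, not the images):
% rough illustration only, not the actual crawler script:
% fetch one result page and save it to a local HTML file
pageurl = 'https://www.cnn.com';   % placeholder URL
outfile = 'saved_page.html';       % placeholder filename
websave(outfile,pageurl);          % writes the raw page HTML to disk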
Good Luck & Stay Safe,
Kartik
2 comments
David Chen
2020-5-27
Edited: David Chen
2020-5-27
"If anyone would like the script , please let me know."
I want.
DGM
2024-9-18
Here's a basic example. I'm pretty sure there are other ways of doing this, but the docs are a confusing maze. Last I checked, DDG's API wasn't even complete enough to be useful for anything.
% your query string
query = '+site:www.cnn.com banana';

% your google custom search key, etc
% https://developers.google.com/custom-search/v1/overview
% https://developers.google.com/custom-search/v1/introduction
% https://developers.google.com/custom-search/docs/tutorial/creatingcse
% free accounts are limited to 10 results per query, 100 queries per day
% there are also rate limits
apikey = 'your_key_goes_here'; % API key
cx = 'your_cx_goes_here';      % CSE identifier

% search setup
wopt = weboptions('contenttype','json');
url = ['https://customsearch.googleapis.com/customsearch/v1?cx=' cx '&key=' apikey '&q=' query '&num=10'];

% try to perform the search
try
    S = webread(url,wopt);
catch
    % this might also happen if the API call is broken somehow
    fprintf('Connection error. Web search failed.\n')
    return;
end

% extract the urls
if isfield(S,'items')
    items = S.items;
    % depending on the results, items is either a struct array
    % or a cell array of dissimilar structs
    if isstruct(items)
        urllist = {items.link}.';
    else
        urllist = cellfun(@(x) x.link,items,'uniform',false);
    end
else
    fprintf('No results.\n')
    return;
end

urllist
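If the goal is to actually save the PDF hits locally (as in the original question), a rough follow-up could loop over urllist with websave. The folder name below is just a placeholder, the filename is a crude guess taken from the URL, and the try/catch is there because some links may not be downloadable:
% hypothetical follow-up: download each result into a local folder
% (assumes urllist was populated by the search above)
outdir = 'downloaded_pdfs'; % placeholder folder name
if ~exist(outdir,'dir')
    mkdir(outdir);
end
for k = 1:numel(urllist)
    thisurl = urllist{k};
    [~,fname,fext] = fileparts(thisurl); % crude filename guess from the URL
    try
        websave(fullfile(outdir,[fname fext]),thisurl);
    catch
        fprintf('Could not download %s\n',thisurl);
    end
end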
0 comments