Error using matlab.internal.webservices.HTTPConnector/copyContentToByteArray (line 396) The server returned the status 429 with message "Too Many Requests" in response to the request to URL.

I am writing a script that takes my protein sequence of interest and finds matches to it using NCBI BLAST. I identify the hits and then try to get the sequence for each one (in the code this is the 'j=1:.....' loop). This works fine until I get to number 1226, where I get the error below. I have tried a couple of things:
  • Pausing longer between iterations of my loop did not help
  • Manually doing this particular one in the command window returned the same error, though if I go to the NCBI website and find the protein, there's no problem.
Any advice would be greatly appreciated!!
THE ERROR:
Error using matlab.internal.webservices.HTTPConnector/copyContentToByteArray
(line 396)
The server returned the status 429 with message "Too Many Requests" in response
to the request to URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucest&term=NXO81641%5BAccession%5D.
Error in readContentFromWebService (line 46)
byteArray = copyContentToByteArray(connection);
Error in webread (line 125)
[varargout{1:nargout}] = readContentFromWebService(connection, options);
Error in getncbidata>accession2gi (line 316)
searchXML = webread(searchurl);
Error in getncbidata (line 182)
[giID,db] = accession2gi(accessnum,db,'quick');
Error in getgenpept (line 64)
[varargout{1:nargout}] =
getncbidata(accessnum,'database','protein','fileformat','GenPept',varargin{:});
75 rethrow(e)
MY CODE:
%% Get a list of everything that aligns with TPC2 (and must be listed as TPC2)
POI = 'MAEPQAESEPLLGGARGGGGDWPAGLTTYRSIQVGPGAAARWDLCIDQAVVFIEDAIQYRSINHRVDASSMWLYRRYYSNVCQRTLSFTIFLILFLAFIETPSSLTSTADVRYRAAPWEPPCGLTESVEVLCLLVFAADLSVKGYLFGWAHFQKNLWLLGYLVVLVVSLVDWTVSLSLVCHEPLRIRRLLRPFFLLQNSSMMKKTLKCIRWSLPEMASVGLLLAIHLCLFTMFGMLLFAGGKQDDGQDRERLTYFQNLPESLTSLLVLLTTANNPDVMIPAYSKNRAYAIFFIVFTVIGSLFLMNLLTAIIYSQFRGYLMKSLQTSLFRRRLGTRAAFEVLSSMVGEGGAFPQAVGVKPQNLLQVLQKVQLDSSHKQAMMEKVRSYGSVLLSAEEFQKLFNELDRSVVKEHPPRPEYQSPFLQSAQFLFGHYYFDYLGNLIALANLVSICVFLVLDADVLPAERDDFILGILNCVFIVYYLLEMLLKVFALGLRGYLSYPSNVFDGLLTVVLLVLEISTLAVYRLPHPGWRPEMVGLLSLWDMTRMLNMLIVFRFLRIIPSMKLMAVVASTVLGLVQNMRAFGGILVVVYYVFAIIGINLFRGVIVALPGNSSLAPANGSAPCGSFEQLEYWANNFDDFAAALVTLWNLMVVNNWQVFLDAYRRYSGPWSKIYFVLWWLVSSVIWVNLFLALILENFLHKWDPRSHLQPLAGTPEATYQMTVELLFRDILEEPGEDELTERLSQHPHLWLCR'
%Blast the inputted sequence against NCBI protein. Return 5000 results.
[blastsend, waittime]=blastncbi(POI, 'blastp', 'MaxNumberSequences', 5000);
%Get a copy of the blast report.
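%Use a timestamp without spaces or colons (colons are not valid in Windows file names).
tm = string(datetime('now', 'Format', 'yyyyMMdd_HHmmss'));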
blastresultsfile=string('blastresults'+tm+'.xml');
getblast(blastsend, 'Wait', waittime, 'ToFile', blastresultsfile);
%%
%Read in the blast results.
%Read the file that getblast wrote above (its name includes the timestamp).
BlastFile=blastread(char(blastresultsfile));
BlastFileTb=struct2table(BlastFile.Hits);
IdentityFile=strings(1);
%Get the Identity value!
for b=1:height(BlastFileTb)
IdentityFile(1,b)=((BlastFile.Hits(b).Hsps(1).Identities)/752)*100;
end
IdentityFile=IdentityFile';
%Identify the different elements of the blast report.
Definition=table(BlastFileTb.Definition);
Definition.Properties.VariableNames={'Definition'};
ID=table(BlastFileTb.ID);
ID.Properties.VariableNames={'ID'};
Accession=table(BlastFileTb.Accession);
Accession.Properties.VariableNames={'Accession'};
Identity=table(IdentityFile);
Identity.Properties.VariableNames={'Identity'};
ExtractedBlast=[ID, Definition, Accession, Identity];
%%
TPC2_list1=ExtractedBlast(contains(ExtractedBlast.Definition, 'protein 2'),:);
TPC2_list2=ExtractedBlast(contains(ExtractedBlast.Definition, 'TPC2'),:);
TPC2_totallist=[TPC2_list1;TPC2_list2];
specieslist=strings(1);
seqlist=strings(1);
%%
% Now we have a list of all the TPC2 on NCBI. I want to a) identify all the
% species that have it and b) be able to check-box the species I want
% to focus on.
for i=1:height(TPC2_totallist)
%Take the description of the target
thisentry=string(table2array(TPC2_totallist(i,2)));
%Find the name of the species by looking between square brackets
findingspecies=split(thisentry, '[');
findingspecies2=findingspecies(contains(findingspecies, ']'));
findingspecies3= split(findingspecies2(1), ']') ;
speciesname=findingspecies3(1);
%Make a list of all the species
specieslist(i,1)=speciesname;
end
%Remove any repeats from the species list (done once, after the loop).
unique_specieslist=unique(specieslist);
%
%%
%Add the species list to the table we've been working from
TPC2_amendedlist=[TPC2_totallist, array2table(specieslist)];
%%
%% **THIS IS WHERE I HAVE THE PROBLEM, WITH NUMBER 1226
for j=1:height(TPC2_amendedlist)
%For each entry, get the accession number
accno=string(table2array(TPC2_amendedlist(j,3)));
%Use the accession number to get the FASTA online. Put it in a list for
%now ('seqlist')
seq=getgenpept(accno, 'SequenceOnly', true) ;
seqlist(1,j)=seq;
%Pause between requests, otherwise the server receives them too quickly and
%rejects them. Printing 'j' shows which iteration we're on.
j
pause(.5);
%Optional extra pause on every tenth iteration (currently disabled):
% if mod(j, 10)==0
% pause(10);
% end
end

Answers (1)

Tarunbir Gambhir 2020-10-27
I ran your code and got a similar error stack trace after retrieving 331 sequences. As the error message says, the problem is a restriction on the NCBI server; there is nothing wrong with your script.
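Since a 429 response means the server is throttling requests, one common way to cope is to retry the failed call after a growing delay. Below is a minimal sketch of that idea, not a tested fix for this particular case. The function name retryWithBackoff and its arguments are hypothetical (it is not part of MATLAB or the Bioinformatics Toolbox), and it retries on any error rather than only on status 429. Note also that the stack trace shows getgenpept issuing an esearch lookup before it fetches the record, so one loop iteration may count as more than one request against NCBI's limit (their E-utilities guidelines mention roughly three requests per second without an API key).

```matlab
function out = retryWithBackoff(fetchFcn, maxAttempts, baseDelay)
% Hypothetical helper: call fetchFcn() and, if it throws, wait
% baseDelay, 2*baseDelay, 4*baseDelay, ... seconds before trying again.
% Gives up after maxAttempts attempts and rethrows the last error.
for attempt = 1:maxAttempts
    try
        out = fetchFcn();
        return
    catch err
        if attempt == maxAttempts
            rethrow(err)   % no attempts left: surface the last error
        end
        pause(baseDelay * 2^(attempt - 1));   % exponential backoff
    end
end
end
```

Saved as retryWithBackoff.m on the path, it could wrap the call inside the loop, for example: seq = retryWithBackoff(@() getgenpept(accno, 'SequenceOnly', true), 5, 2); which would wait 2, 4, 8 and 16 seconds between its five attempts.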
In your case, I would suggest breaking the work into segments of 1225 sequences each and running the script again for each segment. I understand this is not a real solution, but it may serve as a workaround.
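The segment idea can be sketched as a loop that saves its progress after each segment, so that a run interrupted by the server can pick up where it stopped instead of starting over. This is only an illustration under assumptions: the function name fetchInSegments, the checkpoint file, and the segment size and delay arguments are all hypothetical, and the checkpoint is only meaningful if the accession list does not change between runs.

```matlab
function seqlist = fetchInSegments(accessions, fetchFcn, segmentSize, checkpointFile, delay)
% Hypothetical helper: fetch one sequence per accession, saving progress
% to checkpointFile after every segment so a failed run can be resumed.
accessions = string(accessions);
n = numel(accessions);
if isfile(checkpointFile)
    % Resume from the last completed segment.
    saved = load(checkpointFile, 'seqlist', 'nextIndex');
    seqlist = saved.seqlist;
    nextIndex = saved.nextIndex;
else
    seqlist = strings(1, n);
    nextIndex = 1;
end
while nextIndex <= n
    lastIndex = min(nextIndex + segmentSize - 1, n);
    for j = nextIndex:lastIndex
        seqlist(j) = fetchFcn(accessions(j));
        pause(delay);   % spacing between requests, in seconds
    end
    nextIndex = lastIndex + 1;
    save(checkpointFile, 'seqlist', 'nextIndex');   % checkpoint this segment
end
end
```

With the table from the question, a call might look like: seqlist = fetchInSegments(table2array(TPC2_amendedlist(:,3)), @(a) getgenpept(a, 'SequenceOnly', true), 200, 'seq_checkpoint.mat', 1); If the server rejects a request partway through, running the same call again resumes from the last saved segment.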
