code with the same function
This question is closed.
The code below correctly counts how many times a given letter occurs in a large text file (text.txt).
I would like another simple piece of code that does the same thing, but reports the count for every letter in the text (A = number of occurrences of a, B = number of occurrences of b, and so on).
If nothing simpler than this exists, I will accept a solution that is more complicated or of the same level of difficulty.
fileread('mytextfile.txt')
data = fileread('mytextfile.txt');
nnz(data=='A')
nnz(ismember(data,'A'))
Answers (2)
Walter Roberson
2019-4-3
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')
8 comments
Walter Roberson
2019-4-3
Note that I had already answered you on this matter at https://www.mathworks.com/matlabcentral/answers/453555-help-me-please-please?s_tid=prof_contriblnk#answer_368356
Gabriel Cunha
2019-4-3
Edited: per isakson
2019-4-4
Rik
2019-4-3
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
Walter Roberson
2019-4-3
Edited: Walter Roberson
2019-4-4
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
Rik
2019-4-4
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for lower case common letters. The accumarray seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Walter Roberson
2019-4-4
double(char_list).'
Otherwise the char data type has priority over numeric in determining the data type of the concatenation.
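To illustrate the point about concatenation, here is a minimal sketch with made-up counts (the variable names and numbers are hypothetical and are not taken from the attached file). It relies on the behaviour reported earlier in this thread, namely that values beyond the 16-bit char range do not survive conversion to char.

```matlab
% Hypothetical values, chosen only to show the effect.
codes  = 'ae'.';          % characters as a char column
counts = [70000; 123];    % the first count exceeds the 16-bit char range

mixed = [codes, counts];          % char has priority: the result is a char array
class(mixed)                      % 'char'
double(mixed(:, 2)).'             % the large count no longer reads 70000

fixed = [double(codes), counts];  % convert first: the result stays double
class(fixed)                      % 'double'
fprintf('%c = %d\n', fixed.');    % prints the true counts
```

Converting the character column with double(...) or 0+... before concatenating keeps the counts intact, and %c in fprintf still prints the numeric codes as characters.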
Rik
2019-4-4
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), which is why that method was capped (chars are limited to 16 bits).
Walter Roberson
2019-4-4
I added the 0+ after you (correctly) pointed out the cap at 65535.
There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
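If upper-case and lower-case letters should be counted together (the question's "A = number of letters a" could be read that way, but this is an assumption), one possible sketch folds the text to upper case first. It uses histcounts instead of histc, with bin edges placed halfway between character codes so that each letter gets its own bin. The sample string is a stand-in for the contents of mytextfile.txt.

```matlab
% Stand-in for: data = fileread('mytextfile.txt');
data = 'Hello World, hello MATLAB';

letters = 'A':'Z';                                   % 26 upper-case letters
codes   = double(upper(data));                       % fold case, use numeric codes
edges   = (double('A') - 0.5):(double('Z') + 0.5);   % 27 edges -> 26 bins
counts  = histcounts(codes, edges);                  % one count per letter

% Print only the letters that actually occur.
present = counts > 0;
fprintf('%c = %d\n', [double(letters(present)); counts(present)]);
```

Characters outside the A to Z range (spaces, punctuation, digits) fall outside the bin edges and are not counted.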
4 comments
Gabriel Cunha
2019-4-4
Rik
2019-4-4
Those are the ASCII value of A and the number of letters in the alphabet (minus 1). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
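As a quick check of the numbers behind 65:(65+25) in the histogram method (65 is the code of 'A' and 65+25 is the code of 'Z'), the following lines can be run at the command line. Nothing here depends on the text file.

```matlab
double('A')                            % 65, the character code of 'A'
double('Z')                            % 90, i.e. 65 + 25
numel(65:(65+25))                      % 26 bins, one per upper-case letter
char(65:(65+25))                       % ABCDEFGHIJKLMNOPQRSTUVWXYZ
isequal(65:(65+25), double('A':'Z'))   % true: both describe the same range
```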
Gabriel Cunha
2019-4-4
Rik
2019-4-4
The edited for-loop method should be a bit easier to understand.