code with the same function

The code below correctly counts how many times a single letter occurs in a large text file.
I would like another simple piece of code that does the same thing but gives me the count for every letter in the text (A = number of letters a, B = number of letters b, and so on).
If nothing simpler exists, I will accept something more complicated or of the same level of difficulty.
data = fileread('mytextfile.txt'); % read the whole file into one char vector
nnz(data=='A')                     % number of 'A' characters in the text
nnz(ismember(data,'A'))            % same count, written with ismember

Answers (2)

[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')

8 Comments

An error appeared:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for common lower-case letters. The accumarray version seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
double(char_list).'
Otherwise the char data type takes priority over numeric when MATLAB determines the data type of the concatenation.
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), so that is why that method is capped (chars are limited to 16 bits).
I added the 0+ after you (correctly) pointed out the 65535 cap.
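To illustrate the capping discussed in these comments, here is a minimal sketch. The variable names and the two example counts are made up for the demonstration and do not come from the test file. It only shows that putting a char column next to a double column yields a char array (16 bits per element), while converting the characters to double first keeps the counts intact.

```matlab
% Hypothetical example values, not taken from the thread's test file.
codes  = double('ae').';       % character codes as a double column
counts = [70000; 120000];      % counts larger than 65535

mixed = ['ae'.', counts];      % char next to double: the result is char
fixed = [codes, counts];       % double next to double: the result stays double

disp(class(mixed))             % char
disp(class(fixed))             % double
fprintf('%c = %d\n', fixed.'); % prints the counts without capping
```

The %c format accepts the numeric character code, so nothing is lost by keeping everything as double until printing.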
Rik 2019-4-3
Edited: Rik 2019-4-4
There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
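As a possible variation on the histogram method, the sketch below uses histcounts, which the MATLAB documentation recommends over histc in newer releases. This is an assumption about what suits your release, not part of the original answer. The sample string stands in for the file contents so the snippet runs on its own; in practice you would use data = fileread('mytextfile.txt') instead.

```matlab
% Sample text standing in for fileread('mytextfile.txt') (illustrative only).
data = 'ABRACADABRA banana';

codes  = 65:90;                   % character codes of 'A' to 'Z'
edges  = 64.5:1:90.5;             % one bin centred on each code
counts = histcounts(double(data), edges);

% Print one line per letter; %c turns the numeric code back into a character.
fprintf('%c = %d\n', [codes; counts]);
```

Only upper-case letters are counted here, as in the original histogram method. Applying upper(data) first would fold lower-case letters into the same bins.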

4 Comments

Sorry for the question, which must seem silly, but I am a beginner in MATLAB: what are the 65 and the 25 in the histogram method?
Those are the ASCII value of A and the number of letters in the alphabet (minus 1). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
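For the question about the 65 and the 25, a short sketch may help. The variable names here are only for illustration. It shows that double('A') is the character code 65 and that adding 0 to 25 to it covers the 26 letters up to 'Z'.

```matlab
codeA   = double('A');           % character code of 'A'
letters = char(codeA:codeA+25);  % codes 65 to 90 converted back to characters

disp(codeA)                      % 65
disp(letters)                    % ABCDEFGHIJKLMNOPQRSTUVWXYZ
disp(numel(letters))             % 26
```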
Your code is really impressive, but I also wanted something as simple as the code in my question, which counts one letter at a time. I will certainly study your code, as well as the code from the others who answered, in order to learn more about MATLAB.
The edited for-loop method should be a bit easier to understand.

This question is closed.

Closed: 2021-8-20
