How do I count and save Twitter hashtags?
I am writing a script that analyzes the hashtags in tweets that I saved in a text file. So far I have managed to count the number of hashtags in the file:
fid = fopen('Tweets.txt');
twitterStuff = {};
tline = fgetl(fid);
% fgetl returns the numeric value -1 at end of file, so test with ischar;
% this also keeps the loop going on blank lines.
while ischar(tline)
    twitterStuff{end+1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
numberOfTweets = numel(twitterStuff);
numberOfHash = 0;
for k = 1:numberOfTweets
    % strfind returns the positions of every '#' in the tweet
    numberOfHash = numberOfHash + numel(strfind(twitterStuff{k}, '#'));
end
Now, I want to find out what the specific hashtags are and save them into a cell array, but I don't really know how to do that.
2 Comments
Walter Roberson
2012-12-14
Is # by itself a hashtag? Is #this#that with no spaces two hashtags? Is #35 a valid hashtag? Is #? a valid hashtag?
Accepted Answer
Jonathan Epperl
2012-12-14
Edited: Jonathan Epperl
2012-12-14
You should use regular expressions for that; you can do pretty much anything with them. This should do what you want, and if not, it should point you in the right direction:
s = '#Matlab#2012b rocks my #sox # off!'
% Match a '#' with zero or more characters that aren't whitespace or '#' after it
T = regexp(s,'(#[^\s#]*)','tokens')
T{:}
% Match a '#' with 1 or more characters that aren't whitespace or '#' after it
T = regexp(s,'(#[^\s#]+)','tokens')
T{:}
% Match a '#' with 1 or more characters that aren't whitespace or '#' after
% it, but don't capture the '#'
T = regexp(s,'#([^\s#]+)','tokens')
T{:}
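To apply this to a whole file of tweets, one possible sketch follows. It assumes the tweets are already in a cell array of strings (the made-up `tweets` variable stands in for `twitterStuff` from the question) and that a hashtag is a '#' followed by one or more characters that are neither whitespace nor '#'. It uses the 'match' option instead of 'tokens' so that the result is a flat cell array of strings rather than nested cells.

```matlab
% Hypothetical sample data; in the question this would be twitterStuff.
tweets = {'#Matlab#2012b rocks my #sox # off!', ...
          'No tags here', ...
          'More #Matlab please'};

% Assumed definition of a hashtag: '#' followed by one or more
% characters that are neither whitespace nor '#'.
pattern = '#[^\s#]+';

% With a cell array input, regexp returns one cell of matches per tweet.
perTweet = regexp(tweets, pattern, 'match');

% Flatten into a single 1-by-N cell array of hashtag strings.
allHashtags = [perTweet{:}];
numberOfHash = numel(allHashtags);

% Count how often each distinct hashtag occurs.
[uniqueTags, ~, idx] = unique(allHashtags);
tagCounts = accumarray(idx(:), 1);
```

Note that this pattern keeps trailing punctuation (for example, '#Matlab,' would count as a different tag from '#Matlab'), so you may want a stricter character class depending on your data.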
More Answers (2)
Sean de Wolski
2012-12-14
Edited: Sean de Wolski
2012-12-14
Using regular expressions:
str = '#MATLAB is an awesome product by #MathWorks';
[matchstart,matchend,~,hashtag] = regexp(str,'(\#(\w*))')
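Here the fourth output of `regexp` is the matched text, so `hashtag` is a cell array of the hashtags and `matchstart`/`matchend` are their positions in `str`. Which character class you allow after the '#' decides the edge cases raised in the comments on the question. The sketch below compares two candidate patterns on made-up test strings; neither is "the" definition of a hashtag, and the variable names are only illustrative.

```matlab
% Made-up test strings covering the edge cases from the comments.
samples = {'#', '#this#that', '#35', '#?'};

% Candidate A: '#' followed by one or more word characters [a-zA-Z_0-9].
wordTags = regexp(samples, '#\w+', 'match');

% Candidate B: '#' followed by one or more characters that are
% neither whitespace nor '#'.
looseTags = regexp(samples, '#[^\s#]+', 'match');

% Print how many hashtags each candidate finds in each sample.
for k = 1:numel(samples)
    fprintf('%-12s A: %d match(es)   B: %d match(es)\n', ...
            samples{k}, numel(wordTags{k}), numel(looseTags{k}));
end
```

Under these assumptions the two candidates differ only on '#?': candidate A rejects it because '?' is not a word character, while candidate B accepts it. Both use `+` rather than `*`, so a lone '#' is not counted.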