Extract email addresses from text
Hello everyone!
Does anyone have a script (or know how to create one) that extracts email addresses from a text string?
Thanks in advance! Gustav
1 Comment
Steven Lord
2017-6-9
It's likely more complicated than you think. See for example this Microsoft blog and this Stack Overflow question. One of the answers on the Stack Overflow page links to a (five-year-old) page giving a regular expression that I suspect you could use with the regular expression functionality in MATLAB.
Answers (2)
Stephen23
2017-6-8
Edited: Stephen23
2021-6-2
email = '[a-z_]+@[a-z]+\.(com|net)';
Adapt it to allow any domain, or to meet whatever other requirements you have:
rgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';
C = regexpi(txt,rgx,'match');
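Here is a minimal, self-contained run of the extraction step above. The sample text `txt` is made up for illustration. Because `regexpi` ignores case, the lower-case character classes also match capital letters. Each dot in the domain must be followed by at least one letter or digit, so a sentence-ending period is left out of the match.

```matlab
% Hypothetical sample text containing two addresses
txt = 'Contact gustav_b@example.com or Support@Mail.Example.org.';

% Letters, digits or underscores before the @, then two or more
% dot-separated domain labels
rgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';

% regexpi is case-insensitive, so [a-z] also matches A-Z;
% 'match' returns every match as a cell array of char vectors
C = regexpi(txt, rgx, 'match');
fprintf('%s\n', C{:});
```

With this input, `C` is `{'gustav_b@example.com', 'Support@Mail.Example.org'}`. The trailing period after `.org` is not included.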
For a slightly more complete version, you can find many regular expressions on the internet, e.g.:
rgx = '[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+';
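The next example compares the two patterns on a made-up address. This address has a dot and a plus sign before the @, plus a hyphen in the domain. The variable names `simpleRgx` and `widerRgx` are illustrative only.

```matlab
% Hypothetical sample text
txt = 'Mail peter.otoole+news@my-site.co.uk today';

simpleRgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';
widerRgx  = '[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+';

% The hyphen in "my-site" is outside [a-z0-9], so the simple pattern finds nothing
C1 = regexpi(txt, simpleRgx, 'match');

% The wider classes accept '.', '%', '+' and '-' before the @ and '-' in the domain
C2 = regexpi(txt, widerRgx, 'match');
```

Here `C1` is empty and `C2` holds the whole address.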
While this simple regular expression works for simple email addresses, it is worth noting that the complete rules for valid email addresses are not trivial to implement with a regular expression.
A common mistake (including in this answer) is to exclude non-Latin characters.
oliver
2019-2-15
Edited: oliver
2019-2-15
I think the above examples will miss quite a lot of emails, such as all those containing a capital letter or addresses like Peter.O'Toole@xyz.com. So my suggestion would be something like:
reg='[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
That said, with the wide variety of new TLDs nowadays, limiting the last character group to 2-4 letters may be obsolete (depending on your needs).
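The 2-4 letter limit is easy to miss because the pattern does not fail outright. On a longer TLD it matches a truncated address instead. The check below uses made-up addresses and case-sensitive `regexp`, since this pattern already lists both `a-z` and `A-Z`. `reg2` is a hypothetical variant that only relaxes the TLD length to two or more letters.

```matlab
% Hypothetical sample text; '' inside a char literal is one apostrophe
txt = 'Write to Peter.O''Toole@xyz.com or someone@example.museum';

% The suggested pattern: TLD limited to 2-4 letters
reg = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
C1 = regexp(txt, reg, 'match');    % second match is cut to '...example.muse'

% Same pattern with an open-ended TLD length
reg2 = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,})';
C2 = regexp(txt, reg2, 'match');   % second match keeps the full '.museum'
```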
1 Comment
Daniele Lupo
2021-6-2
This regexp accepts invalid addresses with consecutive dots, such as "my.email@email...com".
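One way to handle this is sketched below. It assumes you first extract candidates and then validate each one separately. The stricter pattern is anchored with `^` and `$`, and both the local part and the domain are built from dot-separated runs, so a dot can never directly follow another dot. `validRgx` is a hypothetical pattern written for this example, not one from the thread. It still falls well short of the full address syntax; for example, it rejects non-Latin characters.

```matlab
% The pattern from the answer above accepts the consecutive dots
reg = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
oldAccepts = ~isempty(regexp('my.email@email...com', reg, 'match', 'once'));

% Hypothetical stricter validator: dot-separated runs in both parts,
% anchored so the whole string must match (not just a substring)
validRgx = '^[a-z0-9_%+''-]+(\.[a-z0-9_%+''-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$';

addrs = {'my.email@email...com', 'my..email@example.com', ...
         'Peter.O''Toole@xyz.com', 'someone@example.museum'};

% regexpi ignores case; with 'match','once' it returns '' when nothing matches
isValid = cellfun(@(s) ~isempty(regexpi(s, validRgx, 'match', 'once')), addrs);
```

Here `oldAccepts` is true, while `isValid` rejects the first two addresses and accepts the last two.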