How to extract info from a chemical formula
Hi all, I want to break down a chemical formula into its constituents. For example, silicon dioxide is SiO2: I want to take the string 'SiO2' and parse it so that I know I have 1 silicon and 2 oxygens. I want to do this for more complex compounds as well, such as polyimide, which is 'C22H10N2O5'. So I need to handle uppercase and lowercase letters, and 1- or 2-digit numbers. Any help would be much appreciated.
Pat
Answers (4)
Fangjun Jiang
2011-8-12
A combination of isstrprop() and regexp() might help. You need to provide more examples and explain what you want.
str='C22H10N2O5'
num=regexp(str,'\d+','match')
isstrprop(str,'alpha')
isstrprop(str,'digit')
isstrprop(str,'upper')
One solution:
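str='C22H10PuCrN2O5';
% Element symbols: one uppercase letter, optionally followed by one lowercase letter
[EleList,EleEnd]=regexp(str,'[A-Z][a-z]?','match','end');
% Digit runs and the index at which each run starts
[Num,NumStart]=regexp(str,'\d+','match','start');
NumList=ones(size(EleList));   % a symbol with no number counts as 1
% A symbol has an explicit count when a digit run starts right after it
Index=ismember(EleEnd+1,NumStart);
NumList(Index)=str2double(Num);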
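A flat formula can list the same element more than once (acetic acid written as CH3COOH, for instance). The following is only a sketch of one way to merge such repeats after parsing, using the same regexp idea as above; the variable names are illustrative and the approach assumes a formula without parentheses.

```matlab
% Sketch: parse a flat formula and merge repeated symbols.
str = 'CH3COOH';
[sym, symEnd]   = regexp(str, '[A-Z][a-z]?', 'match', 'end');
[num, numStart] = regexp(str, '\d+', 'match', 'start');

count = ones(size(sym));                   % default count is 1
hasNum = ismember(symEnd + 1, numStart);   % digit run directly after the symbol
count(hasNum) = str2double(num);

[elements, ~, idx] = unique(sym);          % sorted unique symbols
totals = accumarray(idx(:), count(:))';    % sum the counts per symbol

for k = 1:numel(elements)
    fprintf('%s: %d\n', elements{k}, totals(k));
end
```

For 'CH3COOH' this should report 2 carbons, 4 hydrogens and 2 oxygens.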
3 comments
Fangjun Jiang
2011-8-12
Please refresh my memory. Is it true that an element symbol can only have one or two letters? If it has two letters, is it true that the first letter is always uppercase and the second is always lowercase?
Paulo Silva
2011-8-12
Look at any periodic table you can find online; all the symbols should be there.
Paulo Silva
2011-8-12
That's not easy to do. For example, not all formulas have their constituents separated by a number. You also need a list of all possible constituents so you can identify them in any formula, and then check whether a number follows each constituent.
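The idea of checking each parsed symbol against a list of known elements could look roughly like this. It is a minimal sketch: the list `known` is a deliberately partial, illustrative subset that would have to be extended to the full periodic table, and 'Xx' is a made-up symbol used only to trigger the check.

```matlab
% Sketch: flag symbols that are not in a list of known elements.
% ASSUMPTION: 'known' is a partial, illustrative list, not the full periodic table.
known = {'H','C','N','O','Si','Pu','Cr'};

str = 'C22H10XxN2O5';                      % 'Xx' is deliberately not a real element
sym = regexp(str, '[A-Z][a-z]?', 'match'); % candidate element symbols

isKnown = ismember(sym, known);
unknown = sym(~isKnown);

if ~isempty(unknown)
    fprintf('Unrecognized symbol(s): %s\n', strjoin(unknown, ', '));
end
```

Because symbols are matched as an uppercase letter plus an optional lowercase letter, adjacent elements without a number between them (such as 'PuCr') are still split correctly.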
1 comment
Kelly Kearney
2011-8-12
Well, assuming he's not dealing with any of those U** elements at the upper end of the periodic table, all elements consist of either one capital letter or a capital letter followed by a lowercase letter. So it should be pretty easy to pick those out. Will you always have the base formula (e.g. SiO4H4), or will it be arranged structurally (e.g. Si(OH)4)?
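If structural formulas do need to be handled, one possible approach is to flatten them first and then parse the flat string. The sketch below repeatedly expands the innermost parenthesized group, multiplying the counts inside by the number that follows the closing parenthesis. It assumes only round parentheses are used (no square brackets, hydrate dots, or charges), and the variable names are illustrative.

```matlab
% Sketch: flatten a structural formula such as Si(OH)4 into SiO4H4.
% ASSUMPTION: only round parentheses, each optionally followed by an integer.
str = 'Si(OH)4';
pat = '\(([^()]*)\)(\d*)';   % innermost group plus optional multiplier

while true
    [tok, s, e] = regexp(str, pat, 'tokens', 'start', 'end', 'once');
    if isempty(s), break; end              % no parentheses left

    inner = tok{1};
    mult = str2double(tok{2});
    if isnan(mult), mult = 1; end          % no number after ')' means 1

    % Parse the group contents and scale every count by the multiplier
    [sym, symEnd]   = regexp(inner, '[A-Z][a-z]?', 'match', 'end');
    [num, numStart] = regexp(inner, '\d+', 'match', 'start');
    cnt = ones(size(sym));
    cnt(ismember(symEnd + 1, numStart)) = str2double(num);
    cnt = cnt * mult;

    % Rebuild the group as a flat string and splice it back in
    parts = arrayfun(@(k) sprintf('%s%d', sym{k}, cnt(k)), ...
                     1:numel(sym), 'UniformOutput', false);
    str = [str(1:s-1), parts{:}, str(e+1:end)];
end

disp(str)
```

Because the pattern only matches groups that contain no further parentheses, nested groups are expanded from the inside out.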
Patrick Knapp
2011-8-12
1 comment
Fangjun Jiang
2011-8-12
Nice! I couldn't resist coming up with a no-loop solution. See my updated answer.
phenan08
2023-1-26
In case it helps, I wrote a formula string parser that determines the composition of a molecule, element by element.
It accepts semi-developed formulas, and the script returns 4 outputs: the raw molecular formula, the composition table (the different elements with their counts), the average MW and the monoisotopic mass.
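The script itself is not reproduced in this thread. As a rough, independent sketch of how an average molecular weight could be computed from a parsed composition, the following uses a small lookup table of rounded standard atomic weights; the table `aw` covers only a handful of elements and is an assumption to be extended as needed.

```matlab
% Sketch: average molecular weight from a flat formula.
% ASSUMPTION: 'aw' holds rounded standard atomic weights (g/mol) for a few elements only.
aw = containers.Map({'H','C','N','O','Si'}, ...
                    [1.008, 12.011, 14.007, 15.999, 28.085]);

str = 'C22H10N2O5';
[sym, symEnd]   = regexp(str, '[A-Z][a-z]?', 'match', 'end');
[num, numStart] = regexp(str, '\d+', 'match', 'start');
cnt = ones(size(sym));
cnt(ismember(symEnd + 1, numStart)) = str2double(num);

mw = 0;
for k = 1:numel(sym)
    mw = mw + cnt(k) * aw(sym{k});   % errors if the symbol is not in the table
end
fprintf('Average MW of %s: %.3f g/mol\n', str, mw);
```

For the polyimide formula from the question this gives roughly 382.33 g/mol. A monoisotopic mass would need a separate table of the most abundant isotope masses, which is not shown here.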