MATLAB Answers


How do I get the MATLAB editor to read UTF-8 characters? UTF-8 characters show up as blank squares in the editor, but display fine in the Command Window and Workspace.

Asked by Yohahn Jo
on 26 Apr 2016
Latest activity Commented on by rubinius on 13 May 2018
I have a project whose comments are written in UTF-8 characters.
I tried changing the system locale on Windows 10, but the MATLAB editor still does not recognize the UTF-8 characters (they appear as blank squares). I'm not sure what to do here.
If I open the same .m file in a text editor, it displays fine.
How do I get the MATLAB editor to read UTF-8? Thank you.
feature('DefaultCharacterSet')
feature('locale')
ans =
UTF-8
ans =
ctype: 'en_US.windows-1252'
collate: 'en_US.windows-1252'
time: 'en_US.windows-1252'
numeric: 'en_US_POSIX.windows-1252'
monetary: 'en_US.windows-1252'
messages: 'en_US.windows-1252'
encoding: 'windows-1252'
terminalEncoding: 'windows-949'
jvmEncoding: 'Cp1252'
status: 'MathWorks locale management system initialized.'
warning: 'System locale setting, ko_KR, is different from user locale setti…'

  3 Comments

If you find out the answer, let me know! But plenty of other comments around make it seem that UTF-8 just doesn't work in the MATLAB built-in editor in Windows 10.
Personally, I've been using Notepad++. As of 2014b, UTF-8 works fine everywhere except in the IDE/editor. I can do almost everything with the .m file, e.g. run and debug it in MATLAB and make graphs. You can even view the file in the editor, as long as you don't mind seeing some garbled characters. But don't ever save the file with the built-in editor! It screws up the encoding. From time to time one of my UTF-8 files gets corrupted by the editor, so always keep a backup copy!
As far as I can tell, the problem is that the MATLAB editor saves files with ANSI encoding, not UTF-8. Maybe one day they will allow us to control this in the preferences for the Editor/Debugger.
I've had the same issue for a long time. It seems that the editor can sometimes handle Chinese characters when I switch my system's default language to Chinese. But on an English system they just become a lot of question marks.
I'm having the same problem. It seems that editing the lcdata.xml file to change the encoding of en_US has no effect at all. It stays windows-1252 no matter what I change it to in the file.
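
Since the comments above warn that the built-in editor can silently damage a UTF-8 file on save, a quick sanity check may help before and after editing. The following is only a rough sketch, not an official method: it assumes that decoding the raw bytes as UTF-8 and encoding them again reproduces the original bytes exactly when the file is well-formed UTF-8. The demo file name and its contents are made up for illustration.

```matlab
% Sketch: test whether a file's bytes are well-formed UTF-8 by decoding
% and re-encoding them. A small demo file is written first so that the
% script is self-contained (file name and contents are hypothetical).
demoFile = [tempname '.m'];
utf8Bytes = unicode2native(['% ' char([54620 44544])], 'UTF-8');  % two Hangul syllables
fid = fopen(demoFile, 'w');
fwrite(fid, utf8Bytes, 'uint8');
fclose(fid);

% Read the raw bytes back without any character conversion.
fid = fopen(demoFile, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);

% Assumption: valid UTF-8 survives a decode/encode round trip unchanged,
% while invalid byte sequences are replaced during decoding.
decoded = native2unicode(bytes, 'UTF-8');
isUtf8 = isequal(unicode2native(decoded, 'UTF-8'), bytes);
fprintf('%s is valid UTF-8: %d\n', demoFile, isUtf8);
delete(demoFile);
```

To check a real file, point demoFile at it and drop the lines that write and delete the demo file. A file that the editor re-saved in windows-1252 or another ANSI code page would normally fail this check as soon as it contains non-ASCII characters.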


2 Answers

Answer by Jinghao Lei on 20 Oct 2016
 Accepted Answer

I have a rather tricky way to solve this problem, and it seems to work. In my case (Windows, MATLAB 2016b x64),
feature('locale')
always returned the following, even after I had modified lcdata.xml:
ctype: 'zh_CN.GBK'
...
So I deleted this entry from lcdata.xml (in the codeset section):
<encoding name="GBK">
<encoding_alias name="936"/>
</encoding>
Then I changed the following
<encoding name="UTF-8">
<encoding_alias name="utf8"/>
</encoding>
to
<encoding name="UTF-8">
<encoding_alias name="utf8"/>
<encoding_alias name="GBK"/>
</encoding>
The point is to trick MATLAB into treating GBK as just an alias of UTF-8.
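
Because this workaround means hand-editing a file inside the MATLAB installation, it may be worth keeping a copy of the original first. The lines below are a minimal sketch under two assumptions that are not stated in the answer: that lcdata.xml sits in the bin folder under matlabroot, and that a backup in the current folder is acceptable (the backup file name is made up). Note that feature('locale') is an undocumented command, so its output may differ between releases.

```matlab
% Sketch: back up lcdata.xml before editing it by hand.
% Assumption: the file is located in fullfile(matlabroot, 'bin').
lcdataFile = fullfile(matlabroot, 'bin', 'lcdata.xml');
backupFile = fullfile(pwd, 'lcdata_backup.xml');   % hypothetical backup location
if exist(lcdataFile, 'file')
    copyfile(lcdataFile, backupFile);
    fprintf('Backed up %s to %s\n', lcdataFile, backupFile);
else
    warning('lcdata.xml not found at %s', lcdataFile);
end

% After editing the file and restarting MATLAB, inspect the active
% locale to see whether the change took effect.
loc = feature('locale');
fprintf('ctype:    %s\n', loc.ctype);
fprintf('encoding: %s\n', loc.encoding);
```

If the edit worked, the encoding field should presumably no longer report the old code page. If MATLAB misbehaves after the change, copying the backup over the edited file restores the original state.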

  5 Comments

For MATLAB 2017a users: lcdata.xml is empty apart from some comments. So first rename lcdata_utf8.xml to lcdata.xml, then do what Jinghao Lei described.
I tried to do the opposite on Linux (I work on a collaborative project on both platforms) without success, but batch-converting everything to UTF-8 and tricking MATLAB on Windows into working with UTF-8 works like a charm...
I still have to add:
feature('DefaultCharacterSet','UTF-8');
to my main script for things to display properly on GUI elements when running under Windows.
The editor now shows UTF-8 characters correctly. However, when I type
help myfunction
or view myfunction in the help browser, the function comments encoded in UTF-8 do not display properly.
Is there a way to resolve this problem?
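
For anyone wondering what the batch conversion to UTF-8 mentioned in the comments above might look like, here is one possible sketch. It is not the commenter's actual script. It assumes the existing files are in windows-1252 (replace srcEncoding with whatever your files really use) and it runs on a temporary demo folder with made-up contents so that nothing real gets overwritten.

```matlab
% Demo setup (hypothetical data): a temporary folder holding one file
% saved in the assumed source encoding.
srcFolder = tempname;
mkdir(srcFolder);
srcEncoding = 'windows-1252';   % assumption: encoding the files are currently in
fid = fopen(fullfile(srcFolder, 'demo.m'), 'w');
fwrite(fid, unicode2native(['% caf' char(233)], srcEncoding), 'uint8');
fclose(fid);

% Re-encode every .m file in the folder as UTF-8, keeping a .bak copy.
files = dir(fullfile(srcFolder, '*.m'));
for k = 1:numel(files)
    filePath = fullfile(srcFolder, files(k).name);

    fid = fopen(filePath, 'r');
    bytes = fread(fid, Inf, '*uint8')';          % raw bytes, no conversion
    fclose(fid);

    txt = native2unicode(bytes, srcEncoding);    % decode from the old encoding
    copyfile(filePath, [filePath '.bak']);       % backup before overwriting

    fid = fopen(filePath, 'w');
    fwrite(fid, unicode2native(txt, 'UTF-8'), 'uint8');
    fclose(fid);
end
```

To use it on a real project, set srcFolder to the project folder and remove the demo setup. Run it only once per folder: a file that is already UTF-8 would be decoded with the wrong encoding and come out double-encoded.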



Answer by Michael Cappello on 31 Oct 2017

% Read in the raw bytes of the file
fID = fopen(filename, 'r', 'n', 'UTF-8');
bytes = fread(fID);
fclose(fID);
% The data read from the file can then be converted into Unicode characters, like so:
unic = native2unicode(bytes, 'UTF-8');
% If you want, remove the carriage returns (13) and replace the line feeds (10) with spaces
unic(unic == 13) = [];
unic(unic == 10) = ' ';
disp(unic'); % display the Unicode text

  0 Comments
