Arabic document

Hello everyone. I would like to know whether MATLAB can read an Arabic document. Arabic is installed on my computer, and when I try to read the file it gives me: {'المملكة' 'المغربية'}. Does anyone have an idea, please?

Accepted Answer

Walter Roberson 2011-4-28

0 votes

How are you reading the file, and how are you displaying it? What is your locale set to? What is your font set to?

3 comments

I was thinking that perhaps Arabic was outside the range of characters representable in MATLAB, but it is not. The chart is at http://en.wikipedia.org/wiki/Arabic_%28Unicode_block%29
Please try
dec2hex(0 + doc{1})
and look to see if the values that were stored look appropriate for the text. If they do, then it is a problem with the displaying of the text rather than the reading of it. If the values look wrong, then you may need to explore the possibility that the file is not UTF-8 encoded.
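As a reference point for that check, here is a minimal sketch of what correctly decoded values look like. The sample word is an assumption chosen for illustration (alif, lam, meem), not text from the poster's file. If the file was read correctly, dec2hex should report values in the Arabic block, U+0600 to U+06FF.

```matlab
% Hypothetical sample: three Arabic letters built from their code points.
sample = char([1575 1604 1605]);      % alif, lam, meem (illustrative only)

% Same check as dec2hex(0 + doc{1}), padded to four hex digits per character.
codes = dec2hex(double(sample), 4);   % 3-by-4 char matrix: 0627; 0644; 0645

% Correctly decoded Arabic falls inside U+0600..U+06FF.
inArabicBlock = all(double(sample) >= hex2dec('0600') & ...
                    double(sample) <= hex2dec('06FF'));
```

Values such as 00EF, 00BB or 00D8 instead would suggest that the raw bytes were stored one per character, which points at the reading step rather than the display.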
It doesn't help to say "Its not working". Please show the first line of the output of dec2hex(0 + doc{1}) and indicate the Unicode code points for the first 16 or so characters you are expecting in the file. Also, please change your 'r' option to 'rt' so that you are working with text instead of binary.
Please also execute this and indicate the output:
fid = fopen('arabe.txt','r');
dec2hex(0 + fread(fid, 32, '*uint8'))
fclose(fid);
If that is the entire output, then your file is only 5 bytes long. I need a longer sample than that to debug this problem.
I also still need the first line of the output of dec2hex(0 + doc{1}), and the first few Unicode code points of what you are expecting. Unfortunately this forum is not able to support posting Arabic directly, so you will have to look up the characters in the Wikipedia article I referenced and write them down manually.


More Answers (6)

If I am correct about the file having been double-encoded, then:
fid = fopen('arabe.txt','r');
inputtext = char(native2unicode(fread(fid)));
fclose(fid)

21 comments

This code gives me this result:
{
fid = fopen('arabe.txt','r');
inputtext = char(native2unicode(fread(fid)));
fclose(fid)
ans =
0
>> inputtext
inputtext =
ï
»
¿
Ø
§
Ø
§
Ø
.....
}
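The ï » ¿ Ø § output above is what UTF-8 bytes look like when each byte is decoded as its own Latin-1 character. A small sketch of that effect, using the five bytes reported elsewhere in this thread (EF BB BF D8 A7) and assuming 'ISO-8859-1' as the mismatched charset:

```matlab
% The five bytes from the file: UTF-8 byte order mark (EF BB BF) + D8 A7.
bytes = uint8([239 187 191 216 167]);

% Mismatched charset: every byte becomes its own character (ï » ¿ Ø §).
wrong = native2unicode(bytes, 'ISO-8859-1');

% UTF-8, skipping the 3-byte mark: a single character, U+0627 (alif).
right = native2unicode(bytes(4:end), 'UTF-8');
```

Here wrong has five characters while right has one, so passing the encoding name explicitly to native2unicode is worth trying before assuming the file itself is damaged.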
Then I need more of the file to go on. You can find my email address on my user profile by clicking on my name.
I sent an Arabic document to your email address.
Please send me a confirmation of receipt.
Thank you.
Received. I'm looking at it now.
fid = fopen('arabe.txt','r');
inputtext = native2unicode(fread(fid,'*uint8'),'UTF-16') .';
fclose(fid);
The text can then be seen by looking at inputtext
Note: you must be using a font that supports Arabic, such as Arial Regular.
Note: if applicable, your terminal must be set to decode UTF-8. For example, my terminal was set to interpret ISO-LATIN-1 by default and the characters did not come out right.
With the system I am using at the moment, the terminal automatically detected that the characters were Arabic and wrote them right to left.
I do not have a Windows system with MATLAB to test this out on; I am using a Linux-64 MATLAB displaying to Mac OS X.
I emailed you an image showing the output of the code you suggested.
I looked at the image you sent. I cannot tell from that image which font you have used.
I changed the encoding to ISO-8859-1, and I used
{fid = fopen('arabe.txt','r');
inputtext = native2unicode(fread(fid,'*uint8'),'UTF-8') .';
fclose(fid);
}
with UTF-8 rather than UTF-16, and I managed to read the file.
The problem I have now is going through this file:
when I type inputtext or inputtext(1), it shows nothing (empty).
I emailed you an image of the code and its output.
Please send a copy of the file with the changed encoding.
I do not have MATLAB for Windows, so I am not able to check using the same setup you are using.
I sent the file to your email address.
Thank you very much for your help.
The command you used, slCharacterEncoding, is for Simulink; without Simulink, the technique is to exit MATLAB, change the encoding, and restart MATLAB.
http://www.mathworks.com/support/solutions/en/data/1-4TKQUB/index.html?solution=1-4TKQUB
Which locale are you normally in?
I don't understand your question
http://www.mathworks.com/help/techdoc/matlab_env/brj_w4w-2.html
thank you very much. I solved the problem.
I have another question: I have a Java class that I need to call from MATLAB.
Do you have any idea how to do that?
Please start a new Question for that topic.
Also, I think people would appreciate if you could post the solution you came up with for this one.
Yes, that's true, I agree.
I solved the problem by changing the system settings:
http://www.mathworks.com/help/techdoc/matlab_env/brj_w4w-2.html
Thank you very much for your help.
Which variable did you end up having to change, and what did you change it from and what did you change it to?
I use the following code:
{
fid=fopen('arabe.txt','rt');
inputtext = native2unicode(fread(fid,'*uint8'),'UTF-8') .';
fclose(fid);
i=textscan(inputtext,'%s');
}
I sent an image to your email address showing the system setting I changed.
It appears that najmaf changed the Windows Regional Language settings.
Exactly.
When I change the format setting to Arabic, the text is displayed.
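Putting the pieces of this thread together, the reading side can be sketched as follows. This is only an illustration under stated assumptions: the temporary file and its two sample words stand in for arabe.txt, the byte order mark is stripped explicitly before decoding, and strsplit is swapped in for the textscan(inputtext,'%s') call used above. It checks code points only; whether the text is displayed correctly still depends on the font and regional settings discussed here.

```matlab
% Stand-in for arabe.txt: BOM + two space-separated words, written as UTF-8.
words = {char([1575 1604 1605]), char([1576 1575])};   % illustrative only
fname = [tempname '.txt'];                             % hypothetical file
payload = unicode2native([words{1} ' ' words{2}], 'UTF-8');
fid = fopen(fname, 'w');
fwrite(fid, [uint8([239 187 191]), payload]);
fclose(fid);

% Read the raw bytes, as in the thread.
fid = fopen(fname, 'r');
raw = fread(fid, Inf, '*uint8').';
fclose(fid);
delete(fname);

% Drop a leading UTF-8 byte order mark (EF BB BF) if one is present.
if numel(raw) >= 3 && isequal(raw(1:3), uint8([239 187 191]))
    raw = raw(4:end);
end

% Decode the remaining bytes and split on whitespace.
inputtext = native2unicode(raw, 'UTF-8');
tokens = strsplit(inputtext);
```

After this, tokens{1} and tokens{2} hold the individual words, and dec2hex(double(tokens{1})) can be used to confirm the code points even when the Command Window cannot render them.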


I read the file with:
fid=fopen('arabe.txt','r','n','UTF-8');
doc=textscan(fid,'%s');
fclose(fid);
doc{1}
The result is:
'ااض'
'اي'
'يمضحث'
najmaf najma 2011-4-28

0 votes

No, it's not working. I tried encodings other than 'UTF-8', but it still doesn't work. I'm really stuck at this point.
najmaf najma 2011-4-28

0 votes

I used this:
fid = fopen('arabe.txt','rt');
dec2hex(0 + fread(fid, 32, '*uint8'))
fclose(fid);
The result is:
{
ans =
EF BB BF D8 A7
}
This is the content of the file arabe.txt: ??? ?? ????? ??? ???? ???? ???? ??? ???? ?????
Thank you.

1 comment

I needed you to use
fid = fopen('arabe.txt','r');
dec2hex(0 + fread(fid, 32, '*uint8'))
fclose(fid);
You used 'rt' instead. I don't know if that makes a difference.
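The leading EF BB BF in that output is the UTF-8 byte order mark, so the first bytes of a file can hint at its encoding. A minimal sketch of that check follows; the temporary file and the variable names are assumptions for illustration, UTF-32 marks are not checked, and a file without any mark would have to be identified some other way.

```matlab
% Write a small stand-in file so the sketch is self-contained.
fname = [tempname '.txt'];                    % hypothetical test file
fid = fopen(fname, 'w');
fwrite(fid, uint8([239 187 191 216 167]));    % EF BB BF D8 A7
fclose(fid);

% Read the first bytes in binary mode ('r').
fid = fopen(fname, 'r');
head = fread(fid, 4, '*uint8').';
fclose(fid);
delete(fname);

% Compare against the common byte order marks.
if numel(head) >= 3 && isequal(head(1:3), uint8([239 187 191]))
    guess = 'UTF-8';
elseif numel(head) >= 2 && isequal(head(1:2), uint8([255 254]))
    guess = 'UTF-16LE';
elseif numel(head) >= 2 && isequal(head(1:2), uint8([254 255]))
    guess = 'UTF-16BE';
else
    guess = 'unknown (no byte order mark)';
end
```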


najmaf najma 2011-4-28

0 votes

Sorry, I wanted to send you the contents of my Arabic file, but it's not working.
najmaf najma 2011-4-28

0 votes

You can use any document to test. As for the result, I sent you what it gives me: it gives me hexadecimal characters.

2 comments

Yes, and I need to see _what_ those hexadecimal values are.
Wait -- is the first character of the file 0x0627, 'alif'? If so, then the file appears to be a UTF-8 encoding of a UTF-16 byte stream. The file appears to have been encoded twice!
Exactly, the first character is 'alif'.
