Decoding UTF-8 emoji codes and special characters from Facebook data

Hi, I recently downloaded my Messenger data from Facebook in ".json" format.
The format was new to me, and it was quite interesting to load the file, play around with it, and rebuild it as a conversation.
The problem is decoding the emojis. I have no idea about the format. It looks something like this:
"\u00f0\u009f\u0098\u0082 \u00f0\u009f\u0098\u0082", whereas the actual emoji I used is 😂 😂.
In MATLAB, as shown in the figure, it shows some rubbish: "ð ð".
After a long search on the internet, I learned that it is UTF-8 format. So I tried to read the file as UTF-8, following some answers from MATLAB Central:
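clear; clc
fname = 'message_keller.json';
fid = fopen(fname, 'rb');             % open the file for binary reading
raw = fread(fid, '*uint8')';          % read every byte into a uint8 row vector
str = native2unicode(raw,'UTF-8');    % interpret the bytes as UTF-8 text
fclose(fid);
val = jsondecode(str);                % parse the JSON text into MATLAB data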
But it still showed "ð ð".
The above link describes the decoding method I found, but it was for PowerShell.
Can anyone help me decode the Unicode so that it can be viewed in MATLAB and other software? (Currently I am planning to export the conversation to Excel.)
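One possible reading of the symptom, offered as a sketch rather than a confirmed answer: each "\u00XX" escape in the file appears to hold a single UTF-8 byte, so after jsondecode every byte arrives as its own character (240, 159, 152, 130 for one emoji). Under that assumption, converting the characters back to bytes and decoding them as UTF-8 should recover the original text. The variable names `garbled` and `fixed` are illustrative only; the field names inside `val` depend on the export and are not shown here.

```matlab
% Sketch only: assumes each character of the parsed text is really one UTF-8 byte.
% "garbled" reproduces what jsondecode returns for the two emojis in the question.
garbled = char([240 159 152 130 32 240 159 152 130]);

codes = double(garbled);                 % numeric code of every character
if all(codes < 256)
    % Every code fits in one byte, so treat the codes as bytes and decode as UTF-8.
    fixed = native2unicode(uint8(codes), 'UTF-8');
else
    % The text already holds characters above 255; leave it untouched.
    fixed = garbled;
end

% Show the UTF-16 code units of the repaired text.
fprintf('%04X ', double(fixed));
fprintf('\n');
```

The same steps could be applied to each message text in the parsed data before exporting it. Whether the emoji then renders correctly in the Command Window or in Excel depends on the font in use, which this sketch does not address.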
4 Comments
Guillaume 2018-10-12
I wanted the raw json, not the stuff you've parsed when it is too late to get the right characters. You can just replace the confidential bits with xs or dots.
Or just provide the portion of the raw json that corresponds to an actual message, e.g., one of the
{"message":{"sender_name":"Don't care","timestamp_ms":whatever,"content":"this is what I need","type":"Generic"}}
sections.
Addy 2018-10-12
Oops, sorry about that. I have now attached the raw json file. You can view it just by double-clicking it.
Also, as I mentioned before, "\u00f0\u009f\u0098\u0082" is the emoji code for 😂 (the laughing emoji). I did not parse it; it is in unparsed form. I used it twice in the conversation, which is why it repeats and looks like this: "\u00f0\u009f\u0098\u0082 \u00f0\u009f\u0098\u0082"
I have even checked it in Notepad++, and the code is the same.
In the new json file you can find these codes:
"\u00f0\u009f\u0098\u009b" - 😛
"\u00f0\u009f\u0091\u008d" - 👍
"\u00e3\u0080\u0082" - 。
and again
"\u00f0\u009f\u0098\u0082" - 😂
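The codes listed above can be checked directly against the raw text. The following is a minimal sketch, assuming that every "\u00XX" escape stands for one UTF-8 byte: it pulls the two hex digits out of each escape, gathers them into a byte sequence, and decodes that sequence as UTF-8. The variable names are illustrative and not taken from the attached file.

```matlab
% Sketch only: treats each \u00XX escape as one UTF-8 byte (an assumption).
rawCodes = {'\u00f0\u009f\u0098\u009b', '\u00f0\u009f\u0091\u008d', ...
            '\u00e3\u0080\u0082',       '\u00f0\u009f\u0098\u0082'};

decoded = cell(size(rawCodes));
for k = 1:numel(rawCodes)
    % Capture the two hex digits that follow each "\u00".
    hexPairs = regexp(rawCodes{k}, '\\u00([0-9a-fA-F]{2})', 'tokens');
    % Flatten the tokens to a cellstr and convert each pair to a byte value.
    bytes = uint8(hex2dec([hexPairs{:}]));
    % Decode the byte sequence as UTF-8 (row vector in, char row vector out).
    decoded{k} = native2unicode(bytes', 'UTF-8');
    % Print the escape text next to the UTF-16 code units of the result.
    fprintf('%s -> %s\n', rawCodes{k}, sprintf('%04X ', double(decoded{k})));
end
```

If the assumption holds, the four-byte sequences each decode to a single emoji (stored in MATLAB as two UTF-16 code units) and the three-byte sequence decodes to the single character 。.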
This is a screenshot of the conversation.
Screenshot from Notepad++, which you can also find in the raw json file I have attached.


Answers (0)
