AI Speech vs Human Speech

Question

Brantley 2019-4-17

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/456823-ai-speech-vs-human-speech

Answered: Gagan Agarwal 2024-5-30

Is it possible to use MATLAB to detect whether a human or an AI voice is talking? If so, can someone give me links to assist?

3 Comments

Walter Roberson 2019-4-17

Not if it is a sufficiently good AI program.

But until then:

Synthesized speech is usually cleaner (less noise) than human speech.
Synthesized speech usually says the same word the same way each time. Human speech seldom does.
Human speech does much more blending: the initial sounds of a word are modified depending on the sounds at the end of the previous word. Some of this is just that smooth movement between sounds is easier than sudden movement, but humans also tend to modify the sounds themselves, in ways that you can notice if you really listen but might have trouble describing.
If you can get the voice to say "Merry Mary, marry", and you can clearly understand which word is which, then probably it is AI. If two of the words come out exactly the same, then probably it is AI. If some of the words come out almost but not quite exactly the same and you have trouble saying what the difference is, then the voice might be human. (There are large regional differences in how the words get said, but it takes speech synthesis to make them exactly the same.)
Try it on homographs (words spelled the same but pronounced differently). For example, recently I told Alexa to play one of Elton John's albums, and it said that it was going to play "Live in Australia" with a short i (the verb form, as in "I live in Canada") instead of the long i form (the adjective or adverb, as in "Filmed in front of a live audience").

Brantley 2019-4-17

How would you use MATLAB to determine whether an AI or a human is talking?

Walter Roberson 2019-4-17

The first two items I posted are obviously actionable:

Measure noise in the signal. More noise would tend to imply human.
Find copies of the same word and compare them to see how similar they are. You might use mfcc to recognize words and then, once they are recognized, isolate the words from the stream and compare them with xcorr. High cross-correlation makes it more likely that the voice is AI. You might also have a look at dynamic time warping: the less warping that is needed to align two copies of a word, the more likely it is that the speech is AI generated, since AI is less likely to have micro-changes in timing.
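
The two checks above can be sketched in a few lines. This is only an illustration, written in plain Python so that it has no toolbox dependencies; in MATLAB the comparison steps would be done with xcorr and dtw as mentioned above. The function names, the frame length, and the use of plain 1-D sample lists are assumptions made for the sketch. Word isolation is not shown, and no decision threshold is implied.

```python
import math


def noise_floor_db(signal, frame_len=256):
    """Rough noise-floor estimate: mean energy of the quietest frame, in dB.

    Assumption of this sketch: the recording contains at least one frame
    with no speech, so the quietest frame approximates the background noise.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    # The small constant avoids log10(0) for digitally silent frames.
    return 10.0 * math.log10(min(energies) + 1e-12)


def peak_normalized_xcorr(a, b):
    """Largest cross-correlation value over all lags, scaled to [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    best = -1.0
    for lag in range(-(len(b) - 1), len(a)):
        total = 0.0
        for j, y in enumerate(b):
            i = j + lag
            if 0 <= i < len(a):
                total += a[i] * y
        best = max(best, total / norm)
    return best


def dtw_distance(a, b):
    """Classic dynamic time warping cost with absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three allowed predecessor paths.
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]
```

Here a and b stand for two isolated copies of the same word. A peak correlation close to 1 and a small warping cost both point toward synthesized speech, while a higher noise floor points toward a human speaker. In practice the comparison would more likely run on feature sequences such as MFCC frames than on raw samples.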


Answer 1

Gagan Agarwal 2024-5-30

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/456823-ai-speech-vs-human-speech#answer_1465366

Hi Brantley

Yes, it's possible to use MATLAB to detect whether a sound is produced by a human or an AI-generated voice. This task falls under the broader category of audio analysis and machine learning.

Here's a high-level overview of how you might approach this problem:

Collect a dataset that includes both human and AI-generated voices. The dataset should be large and diverse enough to train a robust model.
Audio data generally requires preprocessing before it can be used to train a model. This might involve converting the audio files into a uniform format, resampling them to a common sampling rate, and so on.
Choose a deep learning model and train it.
After training, evaluate the performance of the model.
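
As a rough illustration of the preprocessing and evaluation steps, here is a minimal sketch in plain Python. The function names and the "ai"/"human" label strings are hypothetical, and the linear-interpolation resampler has no anti-aliasing filter, so it is a stand-in for a proper resampling routine (in MATLAB, the resample function). The model itself is left out because no particular architecture is specified here.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample to dst_rate by linear interpolation (sketch, no anti-aliasing)."""
    if len(samples) < 2 or src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        # Position of output sample k on the input time axis, clamped to the end.
        pos = min(k * src_rate / dst_rate, last)
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


def peak_normalize(samples):
    """Scale so the largest absolute sample is 1 (silence is returned unchanged)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def evaluate(true_labels, predicted_labels, positive="ai"):
    """Accuracy, false-alarm rate and miss rate for a two-class detector."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "miss_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

Reporting the false-alarm and miss rates separately, on recordings that were not used for training, shows whether the detector errs toward flagging humans as AI or the other way round, which a single accuracy figure hides.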

I hope it helps!
