- Synthesized speech is usually cleaner (less noise) than human speech.
- Synthesized speech usually says the same word the same way each time. Human speech seldom does.
- Human speech does much more blending -- modification of the initial sounds of a word depending on the sounds at the end of the previous word. Some of this is just smooth movement between sounds being easier than sudden movement, but humans also tend to modify the sounds themselves, in ways that you can notice if you really listen but might have trouble describing.
- If you can get the voice to say "Merry Mary, marry", and you can clearly understand which word is which, then probably it is AI. If two of the words come out exactly the same, then probably it is AI. If some of the words come out almost but not quite exactly the same and you have trouble saying what the difference is, then the voice might be human. (There are large regional differences in how the words get said, but it takes speech synthesis to make them exactly the same.)
- Try it on homonyms. For example, I recently told Alexa to play one of Elton John's albums, and it said that it was going to play "Live in Australia" with a short i (the verb form, as in "I live in Canada") instead of the long i adverb form (as in "Filmed in front of a live audience").
AI Speech vs Human Speech
Is it possible to use MATLAB to detect whether a human or an AI voice is talking? If so, can someone give me links to assist?
3 comments
Walter Roberson
2019-4-17
The first two items I posted are obviously actionable:
- Measure noise in the signal. More noise would tend to imply human.
- Find copies of the same word and compare them to see how similar they are. You might use mfcc to recognize words, and then once recognized, isolate the words from the stream, and xcorr. High cross correlation makes it more likely that it is AI. You might have a look at dynamic time warping: the less warping that is needed, the more likely that it is AI generated, since AI is less likely to have micro-changes in timing.
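Here is a rough sketch of both items, assuming the two occurrences of a word have already been isolated from the stream (the mfcc-based recognition step is not shown). Synthetic signals stand in for real recordings, all variable names are made up for illustration, and `xcorr` and `dtw` need Signal Processing Toolbox. No decision thresholds are given because those would have to be tuned on real data.

```matlab
% Sketch only: synthetic signals stand in for real recordings, and all
% variable names are illustrative. xcorr and dtw need Signal Processing Toolbox.
rng(0);                                   % repeatable noise
fs  = 8000;                               % assumed sample rate (Hz)
t   = (0:fs/4-1) / fs;                    % 0.25 s time axis (row vector)
env = exp(-((t - 0.125) / 0.05).^2);      % smooth amplitude envelope

wordA     = env .* sin(2*pi*220*t);       % first occurrence of the "word"
wordSynth = wordA;                        % exact repeat, as a synthesizer might give
tw        = t + 0.004*sin(2*pi*3*t);      % slightly warped timing
wordHuman = exp(-((tw - 0.125) / 0.05).^2) .* sin(2*pi*223*tw) ...
            + 0.01*randn(size(t));        % small pitch, timing, and noise changes

% Item 1: crude noise estimate, taken as the energy of the quietest 20 ms frame.
frameLen     = round(0.02*fs);
frameEnergy  = @(x) mean(reshape(x(1:floor(numel(x)/frameLen)*frameLen), ...
                                 frameLen, []).^2, 1);
noiseFloorDb = @(x) 10*log10(min(frameEnergy(x)) + eps);

% Item 2a: peak of the normalized cross-correlation (1 means identical shape).
peakCorr  = @(a, b) max(abs(xcorr(a, b, 'coeff')));
corrSynth = peakCorr(wordA, wordSynth);
corrHuman = peakCorr(wordA, wordHuman);

% Item 2b: dynamic time warping. dist is the residual mismatch after warping;
% the mean offset between the index paths shows how much warping was needed.
[dSynth, ixS, iyS] = dtw(wordA, wordSynth);
[dHuman, ixH, iyH] = dtw(wordA, wordHuman);
warpSynth = mean(abs(ixS - iyS));
warpHuman = mean(abs(ixH - iyH));

fprintf('noise floor (dB):  repeat %.1f, varied %.1f\n', ...
        noiseFloorDb(wordSynth), noiseFloorDb(wordHuman));
fprintf('peak correlation:  repeat %.3f, varied %.3f\n', corrSynth, corrHuman);
fprintf('DTW distance:      repeat %.3f, varied %.3f\n', dSynth, dHuman);
fprintf('mean warp offset:  repeat %.2f, varied %.2f\n', warpSynth, warpHuman);
```

On real speech it is probably better to run `dtw` on frame-level features (for example the mfcc output) than on raw samples, since raw waveforms are long and sensitive to phase.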
Answers (1)
Gagan Agarwal
2024-5-30
Hi Brantley,
Yes, it's possible to use MATLAB to detect whether a sound is produced by a human or an AI-generated voice. This task falls under the broader category of audio analysis and machine learning.
Here's a high-level overview of how you might approach this problem:
- Collect a dataset that includes both human and AI-generated voices. The dataset should be large and diverse enough to train a robust model.
- Audio data generally requires preprocessing before it can be used to train a model. This might involve converting the audio files to a uniform format, normalizing the sampling rate, and so on.
- Choose a deep learning model to train.
- After training, evaluate the performance of the model.
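As an illustration of the preprocessing step only, here is a minimal sketch that brings one clip to a common format. A synthetic stereo signal stands in for a file you would load with `audioread`. The 16 kHz target rate, the peak normalization, and the variable names are assumptions chosen for the example, not requirements. `resample` needs Signal Processing Toolbox.

```matlab
% Sketch only: a synthetic stereo clip stands in for a file loaded with audioread.
% targetFs and the variable names are illustrative choices, not requirements.
fsIn     = 44100;                          % sample rate of the incoming clip (Hz)
targetFs = 16000;                          % assumed common rate for the dataset
tIn      = (0:fsIn-1).' / fsIn;            % 1 s time axis (column vector)
clip     = 0.3 * [sin(2*pi*200*tIn), sin(2*pi*200*tIn + 0.5)];  % samples x channels

mono   = mean(clip, 2);                    % 1) mix down to a single channel
monoRs = resample(mono, targetFs, fsIn);   % 2) convert to the common sample rate
monoRs = monoRs / (max(abs(monoRs)) + eps);% 3) peak-normalize the amplitude

fprintf('in: %d x %d at %d Hz, out: %d x %d at %d Hz\n', ...
        size(clip, 1), size(clip, 2), fsIn, ...
        size(monoRs, 1), size(monoRs, 2), targetFs);
```

The same steps would be applied to every file in both the human and the AI-generated sets, so that the model does not learn differences in format instead of differences in voice.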
I hope it helps!