According to news.mit.edu, wired.com, and Newsweek in related stories, six researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) developed an algorithm that can watch silent video clips and predict realistic sounds to go with them. For example, when the researchers showed it a silent clip of an object being hit, the AI produced a sound realistic enough to fool the human ear. To train the algorithm, according to Newsweek, the researchers used 1,000 video clips containing 46,000 sounds, all produced with just a drumstick. To predict the sound for a new video, the algorithm looks at the sound properties of each frame and matches them with similar sounds in its database, said Andrew Owens, a CSAIL PhD student and lead author of an upcoming study detailing the work. According to Futurism.com in related stories, IEEE Spectrum reports a study with a small group of participants in which 41 out of 53 believed that the A.I.-generated sound effects were real.
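To make that matching step more concrete, here is a minimal sketch of a nearest-neighbor lookup in Python. It assumes per-frame feature vectors have already been extracted; the array shapes, the 128-dimensional features, and the file names are illustrative assumptions on my part, not MIT's actual code or data format.

```python
# Minimal sketch of the matching step described above: for each video frame,
# compare its (hypothetical) feature vector against a database of training
# sounds with known features, and pick the closest match. This is an
# illustrative nearest-neighbor lookup, not MIT's actual model or code.
import numpy as np

def predict_sounds(frame_features: np.ndarray, db_features: np.ndarray,
                   db_sounds: list) -> list:
    """Return the database sound whose features best match each frame.

    frame_features: (num_frames, feature_dim) features from the silent video
    db_features:    (num_db_clips, feature_dim) features of the training sounds
    db_sounds:      list of num_db_clips waveforms (or file names) to choose from
    """
    predicted = []
    for feat in frame_features:
        # Euclidean distance between this frame and every database entry
        distances = np.linalg.norm(db_features - feat, axis=1)
        predicted.append(db_sounds[int(np.argmin(distances))])
    return predicted

# Toy usage with random "features" standing in for real ones
rng = np.random.default_rng(0)
db_feats = rng.normal(size=(46_000, 128))      # one entry per training sound
db_clips = [f"drumstick_hit_{i}.wav" for i in range(46_000)]
video_feats = rng.normal(size=(30, 128))       # 30 frames of a new silent clip
print(predict_sounds(video_feats, db_feats, db_clips)[:3])
```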
According to thenewstack.io in related stories, new work from a University of Texas at San Antonio research team has shown that the AutoFoley process, which I'll explain later in the article, uses artificial intelligence to analyze the motion in a given video and then generate its own matching artificial sound effects.
The AutoFoley process (a deep synthesis network), according to thenewstack.io in related stories, can analyze, categorize, and recognize what kind of action is happening in a video frame, and then produce the appropriate sound effect for a clip that may or may not already have some sound. To achieve precise recognition, the AutoFoley system first identifies sounds in its customized database that could match the actions, then attempts to line those sounds up with the timing and movement in the clip. The first part of the system analyzes the association between movement and timing in the video frames by extracting features such as color, using a multi-scale recurrent neural network (RNN) combined with a convolutional neural network (CNN); a simplified sketch of this kind of pipeline follows below. For faster-moving actions, where visual information may be missing between consecutive frames, an interpolation technique using CNNs and a temporal relational network (TRN) is used to preemptively "fill in" the missing gaps and link the frames smoothly, so that the system can accurately time the actions with the predicted sound. According to futurism.com in related stories, real Foley artists, who create sound effects by hand, may be "damned" by the rise of A.I. sound effects.
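The CNN-plus-RNN pipeline described above (a CNN extracting per-frame features that feed an RNN modeling their timing) can be illustrated with a short PyTorch sketch. This is my own simplified stand-in, not the actual AutoFoley network; every layer size, the class name, and the spectrogram-style output are assumptions made for illustration.

```python
# Illustrative sketch (not the actual AutoFoley code) of the kind of pipeline
# the article describes: a CNN turns each frame into features, an RNN models
# how those features evolve over time, and a decoder predicts a sound
# representation (here, spectrogram slices). All sizes are assumptions.
import torch
import torch.nn as nn

class FrameToSoundSketch(nn.Module):
    def __init__(self, feature_dim=256, hidden_dim=512, spec_bins=128):
        super().__init__()
        # CNN: turns each RGB frame into a compact feature vector
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim),
        )
        # RNN: models the timing of the frame features across the clip
        self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # Decoder: maps each time step to one slice of a predicted spectrogram
        self.decoder = nn.Linear(hidden_dim, spec_bins)

    def forward(self, frames):
        # frames: (batch, time, 3, height, width)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        hidden, _ = self.rnn(feats)
        return self.decoder(hidden)  # (batch, time, spec_bins)

# Toy usage: 8 frames of 64x64 video -> 8 predicted spectrogram slices
model = FrameToSoundSketch()
clip = torch.randn(1, 8, 3, 64, 64)
print(model(clip).shape)  # torch.Size([1, 8, 128])
```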
RELATED STORIES:
https://news.mit.edu/2016/artificial-intelligence-produces-realistic-sounds-0613
https://www.newsweek.com/artificial-intelligence-algorithm-turing-test-sound-mit-470150
https://futurism.com/the-byte/ai-generated-sound-effects-fooling-human
https://thenewstack.io/these-ai-synthesized-sound-effects-are-realistic-enough-to-fool-humans/
https://www.wired.com/2016/06/mit-artificial-sound-effects/
TAKE ACTION:
Listen to some of the samples and see if you get tricked by realistic A.I. sound effects.