MIT’s Institute Computer Science and Artificial Intelligence is developing deep learning algorithm for creating videos of the future. If you given a still image from a scene and it can create a brief video that reproduces the future of that scene.
We are living in the dynamic physical world, and it is easy to forget how effortlessly we understand our surroundings. With minimal thought, we can figure out how scenes and objects interact.
They have trained on 2 million unlabeled videos that include a year worthy of footage. The algorithm generated videos that human subjects deemed to be realistic often than a baseline model.
The team says future version improves a lot in security tactics and self-driving cars. CSAIL Ph.D. student Carl Vondrick said algorithm helps the machine to recognize people’s activities without human annotations.
In videos, we can see what will happen in the future. Vodrick said “if you anyone predicting the future means they understand something about the present. “
Carl Vondrick does the research, MIT Professor Antonio Torralba, and former CSAIL postdoc Hamed Pirsiavash. The total research work will present at Neural Information Processing Systems (NIPS) conference in Barcelona in next week.
How it works
In past multiple researchers has handled the same topic in computer vision. MIT professor Bill Freeman has worked on visual dynamics and also creates future frames in the scene. But where his model focuses on extrapolating videos into the future, Torralba’s model can also generate completely new videos that haven’t been seen before.
In past process, it build up scenes frame by frame which creates large margin for error. Now this is working to process the entire scene at once, and this algorithm generates the frames as many as, usually it generates 32 frames from scratch per second.
They have arranged trade off to produce all frames simultaneously. It became more accurate, and computer model becomes more complex for longer videos. For creating multiple frames they are working in a model to generate foreground separately and background separately and place the objects in a manner to know which objects are movable and which are not.
The team used a deep learning method called “adversarial learning” that involves training two competing neural networks. One of the networks generates video, and another tells the discrimination between real and created video
The model can create videos resembling scenes from beaches, train stations, hospitals, and golf courses. If you take the example of beach model, the beach model produces beaches with crashing waves, and if you take the golf model it will show you people walking on grass.