3. Everything Everywhere All at Once
‘Everything Everywhere All at Once’ got the most Oscar nods this year. Excited for the quirky film. Even more excited that Generative AI played a small part: the film’s VFX editor Evan Halleck used Runway, a leading Generative AI-based video editing product, in some scenes.
I’ve come across 70 (and counting) so-called AI-based video tools so far through interviews, podcasts, videos, and lists of AI tools (links to a few compilations at the end of this post). Most of them don’t have what I’m looking for.
What I’m looking for is the ability to describe a scene in natural language and have a lifelike video generated from it. The scene description could be a simple setting, character, action, and speech. Or it could go all the way into details of the lighting, mood, and multiple characters with varying emotions in dynamic settings with dialog and soundtrack. And of course, camera angles and movements. I haven’t seen any product do that today.
Only one offering shows a sliver of this capability: the combination of Google Imagen Video and Google Phenaki. Meta Make-A-Video has only shown examples of simple phrases converted to simple videos. Stability AI’s DreamStudio Pro and Runway’s text-to-video product haven’t been released yet but hold promise.
There’s been good progress on specific tasks such as generating a single talking character in one pose looking into the camera, removing objects and backgrounds from videos, changing lip movements to match new words or languages, motion capture, and the notorious deepfakes. Creators have experimented with using Generative AI text, image, audio, and video tools to create different aspects of their stories and then stitching those together with standard tools such as Final Cut Pro.
We can also easily create a story idea, script, shot list, shot descriptions, and costume and prop lists using ChatGPT. That accelerates movie production, although you still need to get actors, shoot, and edit the video!
I’ve curated a set of products that simplify and enhance the creation and transformation of live-action or animated video. Three categories emerge: 1) Generate Videos, 2) Generate Talking Characters, and 3) Edit Videos. Exciting times are ahead!
1) Generate Videos
a) Google Imagen Video + Phenaki: The combination is powerful. Text-to-video. Describe a scene in text (including camera movements) and it creates a video. No audio (music or speech) generation, though. I also haven’t seen it generate realistic human characters so far.
b) Meta Make-A-Video: Text-to-video. Seems to work only with a simple, focused text prompt such as “horse drinking water.” Seems less advanced than Google’s offering. No audio generation.
c) Stability AI DreamStudio Pro: Expected to be released this month. Promises generation of entire movies (likely animation only), storyboarding, 3D cameras, and audio integration with Stability’s audio models.
d) Runway: Text-to-video not released yet. See description of current capabilities in the ‘Edit Videos’ section below.
e) Opus: Turns descriptive natural-language text into animation, including 3D scenes and characters. A work in progress, but their intent and demo are cool!
2) Generate Talking Characters
a) Creative Reality Studio by D-ID: Create a talking video from a portrait photo and a text script. Not text-to-video, and it can’t be used for moviemaking except for scenes that have a single character in portrait mode, looking directly into the camera!
b) CharacterGPT by Alethea AI: Realistic, interactive character generation from a text prompt. Try it at https://mycharacter.ai/. Good progress toward synthetic characters for movies, but seems limited to a single character in portrait mode, looking into the camera.
c) Soul Machines: Create digital people, including digital twins of celebrities. Could be used for characters in movies, but again limited to portrait mode, looking into the camera. Check out will.i.am’s digital twin.
d) Colossyan: Choose from stock actors and enter the text you want them to speak. You can change emotions and age, and also create custom actors. Limited to portrait mode, looking into the camera.
3) Edit Videos
a) Runway: AI-powered video editing suite. Remove backgrounds and objects from videos, track motion, and remove audio noise. One of the most comprehensive products for creators and editors. It understands well-established creative workflows and helps speed them up. Runway co-authored the Stable Diffusion paper ‘High-Resolution Image Synthesis with Latent Diffusion Models’. Some generative capabilities, but not a pure text-to-video solution yet; they have a waitlist for text-to-video. Used by the VFX team of ‘Everything Everywhere All at Once’, which leads Oscar nominations this year! Check out this interesting creation, ‘Home Alotus’: Home Alone in the style of White Lotus’s opening credits.
b) Deepfakesweb: Swap faces in a video. Could be useful if a desired actor is unavailable for live production but agrees to the use of their face.
c) TrueSync by Flawless: Replace spoken words (either specific words in the original language or a translation into another language) and match the facial expressions and lip movements to the new words. This is called ‘vubbing’ (visual dubbing). Watch this example.
There are also motion-capture tools, such as Move and Rokoko, that transfer captured real-life motion onto different characters and settings.
Other tools, such as Fliki and Lumen5, string together stock assets (images, videos, sounds) to generate a pseudo-video: not live action or animation, but more like a presentation with voiceovers, transitions, and embedded videos.
Also check out vidyo AI and Descript.
Audio is a key aspect of video, and there are several standalone products for it. They’re not the focus of this post, but here’s a sprinkling:
Microsoft VALL-E: Synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.
Sonantic: Synthesize realistic, simulated human voices from text. Used for Val Kilmer’s audio in Top Gun: Maverick. Acquired by Spotify.
Others (audio, music, speech): Aiva, AWS DeepComposer, Beatoven, Harmonai, Mubert, Play, Resemble, Riffusion, Soundraw, Tortoise TTS, Wellsaidlabs.
I’ll leave you with some compilations of video tools to explore further: GitHub, FutureTools, Futurepedia, Mad Genius. Worth repeating: none of these tools get to the utopia of true text-to-video.