Runway's generative AI transforms text into video
Runway has shouldered aside Midjourney and Stable Diffusion, introducing the first clips of text-to-video AI art that the company says are generated entirely from a text prompt.
The company said it’s opening a waitlist to join what it calls “Gen 2” of its text-to-video AI, after offering a similar waitlist for its first, simpler text-to-video tools, which use a real-world scene as a model.
When AI art emerged last year, it used a text-to-image model. A user would input a text prompt describing the scene, and the tool would attempt to create an image using what it knew of real-world “seeds,” artistic styles and so forth. Services like Midjourney perform these tasks on a cloud server, while Stable Diffusion and Stable Horde take advantage of similar AI models running on home PCs.
Text-to-video, however, is the next step. There are various ways of accomplishing this: Pollinations.ai has accumulated a few models you can try out, one of which takes a few related scenes and constructs an animation stringing them together. Another simply creates a 3D model from an image and allows you to zoom around it.
Runway takes a different approach. The company already offers AI-powered video tools: inpainting to remove objects from a video (as opposed to an image), AI-powered bokeh, transcripts and subtitles, and more. The first generation of its text-to-video tools allowed you to construct a real-world scene, then use it as a model to overlay a text-generated video on top of it. This is more commonly done with still images, where you could take a photo of a Golden Retriever, for example, and use AI to transform it into a photo of a Doberman.
That was Gen 1. Runway’s Gen 2, as the company tweeted, can use existing images or videos as a base. But the technology can also completely auto-generate a short video clip from a text prompt and nothing more.
As Runway’s tweet indicates, the clips are short (just a few seconds at most), awfully grainy, and suffer from a low frame rate. It’s not clear when Runway will release the model for early access or general access, either. But the examples on the Runway Gen 2 page do show a wide variety of video prompts: pure text-to-video AI, text-plus-image to video, and so on. It appears that the more input you give the model, the better your luck. Applying a video “overlay” to an existing object or scene seemed to offer the smoothest video and the highest resolution.
Runway already offers a $12/mo “Standard” plan that allows for unlimited video projects. But certain tools, such as training your own portrait or animal generator, require an additional $10 fee. It’s unclear what Runway will charge for its new model.
What Runway does demonstrate, however, is that in a few short months, we’ve moved from text-to-image AI art into text-to-video AI art… and all we can do is shake our heads in amazement.
Author: Mark Hachman, Senior Editor
As PCWorld’s senior editor, Mark focuses on Microsoft news and chip technology, among other beats. He has formerly written for PCMag, BYTE, Slashdot, eWEEK, and ReadWrite.