Here’s what I learnt about AI video generation while researching it late last year.
Top Platforms
- Runway https://runwayml.com/
- Luma Labs https://lumalabs.ai/
- Hailuo (by MiniMax) https://hailuoai.video/
- Pika https://pika.art
Each platform has its own strengths: some handle people better, others landscapes.
Workflow for AI video gen
Don’t do text → video. The results are far worse than going text → image first, then high-res image → video.
So the workflow looks like this:
- (Optional) Use ChatGPT / Claude to prepare a good image prompt
- Use that image prompt with models like Midjourney, Flux 1.1 Pro https://replicate.com/black-forest-labs (no subscription, pay per use), or the GPT-4o image model on a paid ChatGPT plan (see the API sketch after this list)
- Upscale that image with a dedicated upscaler like Magnific https://magnific.ai/
- Take the upscaled image to the video gen platform and use a mix of image + text prompt to create the 5-10s clip
- Better still (and recommended): create two image keyframes, a start frame and an end frame, so the video model has a clear idea of how the scene should animate. The middle is then filled in via interpolation, motion paths, or motion brushes.
- In short, combine an image reference, a text prompt, and the video tool’s settings (several platforms expose advanced settings worth exploring)
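Here’s a minimal sketch of the text → image → video steps using Replicate’s Python client. The model slugs are real Replicate listings, but the input field names (`aspect_ratio`, `first_frame_image`) are assumptions to verify against each model’s schema page:

```python
# Sketch: text -> image -> video via Replicate (pip install replicate,
# set REPLICATE_API_TOKEN). Input names are assumptions; check each
# model's schema on replicate.com before running.
import replicate

# Step 1: text -> high-res still with Flux 1.1 Pro
image_url = str(replicate.run(
    "black-forest-labs/flux-1.1-pro",
    input={
        "prompt": "golden-hour wide-angle shot of a lighthouse on a cliff, "
                  "dramatic clouds, soft focus foreground",
        "aspect_ratio": "16:9",
    },
))

# Step 2: image -> short clip (MiniMax's video-01 shown as one example of
# an image-to-video model that accepts a first-frame image)
video_url = str(replicate.run(
    "minimax/video-01",
    input={
        "prompt": "slow push-in toward the lighthouse, clouds drifting",
        "first_frame_image": image_url,
    },
))
print(video_url)
```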
Insights
- Prompts
- Keep image-to-video text prompts concise—neither too long nor too short
- Include cinematic descriptors (e.g. “dramatic,” “wide-angle,” “soft focus”)
- Learn the camera-framing vocabulary a director of an ad / movie / show would know, e.g. long shot, close-up, etc.
- Consistency
- Lock the seed or style parameters across steps (Midjourney, Fal.ai); see the seed-locking sketch after this list
- Export ~10 reference images to train a Flux LoRA for a uniform look
- ChatGPT can also help here, e.g. by keeping prompt wording consistent across frames
- Motion Strategies
- Use simple motion paths for objects/characters
- For landscapes, add parallax or subtle camera pans
- Keyframe Definitions
- Start frame: establish setting and mood
- End frame: illustrate transformation or reveal
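As a concrete example of locking parameters, here’s a sketch that reuses one seed and one style string to render matching start and end keyframes. It assumes the Replicate Python client and that the model accepts a `seed` input (Flux models generally do, but verify on the schema page):

```python
# Sketch: seed + style locking for consistent keyframes. Assumes the
# Replicate client and a model that accepts a "seed" input.
import replicate

STYLE = "cinematic, 35mm film grain, muted teal-and-orange palette"
SEED = 42  # reuse the same seed for every frame in the sequence

def render(prompt: str) -> str:
    return str(replicate.run(
        "black-forest-labs/flux-1.1-pro",
        input={"prompt": f"{prompt}, {STYLE}", "seed": SEED,
               "aspect_ratio": "16:9"},
    ))

start_frame = render("empty city street at dawn, long shot")
end_frame = render("the same city street crowded with people, long shot")
```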
BG Music
You can use the following platforms for background music:
- Suno
- Udio
Since the video gen tools all produce 5-10s clips, you can generate music with a beat shift every 5, 7, or 10 seconds so the cuts land on the beat (see the sketch below).
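The timing is simple arithmetic: pick a BPM whose bar boundaries fall on your clip lengths. A quick sketch (the BPM value is just illustrative):

```python
# Where do 4/4 bar boundaries fall for a given BPM? Use this to pick a
# tempo whose bars line up with 5/7/10-second clip cuts.
def bar_times(bpm: float, beats_per_bar: int = 4, bars: int = 8) -> list[float]:
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    return [round(i * seconds_per_bar, 2) for i in range(1, bars + 1)]

# At 96 BPM a bar lasts 2.5 s, so boundaries hit 5.0 and 10.0 s exactly,
# matching 5 s and 10 s clips.
print(bar_times(96)[:4])  # [2.5, 5.0, 7.5, 10.0]
```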
Talking videos
If you want to do talking heads with lipsync, you can use:
- HeyGen for lipsync, mouth movement, and head / hand motion https://www.heygen.com/
- ElevenLabs for voiceovers: professional AI voices and voice cloning https://elevenlabs.io/ (see the voiceover sketch below)
- Alternatives:
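For the voiceover step, here’s a hedged sketch against ElevenLabs’ text-to-speech REST endpoint. `VOICE_ID` is a hypothetical placeholder you’d replace with a voice from your own library, and the resulting audio can then be uploaded to HeyGen for lipsync:

```python
# Sketch: generate a voiceover MP3 with ElevenLabs' TTS REST API.
# VOICE_ID is a hypothetical placeholder; list your voices in the
# ElevenLabs dashboard and substitute a real ID.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Welcome to the demo.", "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)  # feed this into HeyGen for the talking head
```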