Here’s what I learnt about AI video generation while researching it late last year.
Top Platforms
- Runway https://runwayml.com/
- Luma Labs https://lumalabs.ai/
- Hailuo (by MiniMax) https://hailuoai.video/
- Pika https://pika.art
Each platform has its own strengths: some handle people better, others landscapes.
Workflow for AI video gen
Don’t do text → video. The results are far worse than going text → image first, then high-res image → video.
So the workflow looks like this:
- (Optional) Use ChatGPT / Claude to prepare a good image prompt
- Use that image prompt with models like Midjourney, Flux 1.1 Pro https://replicate.com/black-forest-labs (no subscription, pay per use), or the GPT-4o image model on a paid ChatGPT plan (see the API sketch after this list)
- Upscale that image with a dedicated upscaler like Magnific https://magnific.ai/
- Take the upscaled image to the video gen platform and use a mix of image + text prompt to create the 5-10s clip
- Better still (and recommended): create two image keyframes, a start frame and an end frame, so the video model has a clear idea of how the scene should animate. The middle is then filled in via interpolation, motion paths, or motion brushes.
- In short, combine an image reference, a text prompt, and the video tool’s settings (several platforms expose advanced settings worth exploring)
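Here’s a minimal sketch of the text → image → video steps using Replicate’s Python client. The model slugs are real Replicate listings, but the input field names (`aspect_ratio`, `first_frame_image`) are assumptions to verify against each model’s schema page:

```python
# Sketch: text -> image -> video via Replicate (pip install replicate,
# set REPLICATE_API_TOKEN). Input names are assumptions; check each
# model's schema on replicate.com before running.
import replicate

# Step 1: text -> high-res still with Flux 1.1 Pro
image_url = str(replicate.run(
    "black-forest-labs/flux-1.1-pro",
    input={
        "prompt": "golden-hour wide-angle shot of a lighthouse on a cliff, "
                  "dramatic clouds, soft focus foreground",
        "aspect_ratio": "16:9",
    },
))

# Step 2: image -> short clip (MiniMax's video-01 shown as one example of
# an image-to-video model that accepts a first-frame image)
video_url = str(replicate.run(
    "minimax/video-01",
    input={
        "prompt": "slow push-in toward the lighthouse, clouds drifting",
        "first_frame_image": image_url,
    },
))
print(video_url)
```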
Insights
- Prompts
- Keep image-to-video text prompts concise—neither too long nor too short
- Include cinematic descriptors (e.g. “dramatic,” “wide-angle,” “soft focus”)
- Learn the camera-framing vocabulary a director of an ad / movie / show would know, e.g. long shot, close-up, etc.
- Consistency
- Lock the seed or style parameters across steps (Midjourney, Fal.ai); see the seed-locking sketch after this list
- Export ~10 reference images to train a Flux LoRA for a uniform look
- ChatGPT can also help here, e.g. by keeping prompt wording consistent across frames
- Motion Strategies
- Use simple motion paths for objects/characters
- For landscapes, add parallax or subtle camera pans
- Keyframe Definitions
- Start frame: establish setting and mood
- End frame: illustrate transformation or reveal
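As a concrete example of locking parameters, here’s a sketch that reuses one seed and one style string to render matching start and end keyframes. It assumes the Replicate Python client and that the model accepts a `seed` input (Flux models generally do, but verify on the schema page):

```python
# Sketch: seed + style locking for consistent keyframes. Assumes the
# Replicate client and a model that accepts a "seed" input.
import replicate

STYLE = "cinematic, 35mm film grain, muted teal-and-orange palette"
SEED = 42  # reuse the same seed for every frame in the sequence

def render(prompt: str) -> str:
    return str(replicate.run(
        "black-forest-labs/flux-1.1-pro",
        input={"prompt": f"{prompt}, {STYLE}", "seed": SEED,
               "aspect_ratio": "16:9"},
    ))

start_frame = render("empty city street at dawn, long shot")
end_frame = render("the same city street crowded with people, long shot")
```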
BG Music
You can use the following platforms for background music:
- Suno
- Udio
Since the video gen tools all produce 5-10s clips, you can generate music with a beat shift every 5, 7, or 10 seconds so the cuts land on the beat (see the sketch below).
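The timing is simple arithmetic: pick a BPM whose bar boundaries fall on your clip lengths. A quick sketch (the BPM value is just illustrative):

```python
# Where do 4/4 bar boundaries fall for a given BPM? Use this to pick a
# tempo whose bars line up with 5/7/10-second clip cuts.
def bar_times(bpm: float, beats_per_bar: int = 4, bars: int = 8) -> list[float]:
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    return [round(i * seconds_per_bar, 2) for i in range(1, bars + 1)]

# At 96 BPM a bar lasts 2.5 s, so boundaries hit 5.0 and 10.0 s exactly,
# matching 5 s and 10 s clips.
print(bar_times(96)[:4])  # [2.5, 5.0, 7.5, 10.0]
```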
Talking videos
If you want to do talking heads with lipsync, you can use:
- HeyGen for lipsync, mouth movement, and head / hand motion https://www.heygen.com/
- ElevenLabs for voiceovers: professional AI voices and voice cloning https://elevenlabs.io/ (see the voiceover sketch below)
- Alternatives:
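For the voiceover step, here’s a hedged sketch against ElevenLabs’ text-to-speech REST endpoint. `VOICE_ID` is a hypothetical placeholder you’d replace with a voice from your own library, and the resulting audio can then be uploaded to HeyGen for lipsync:

```python
# Sketch: generate a voiceover MP3 with ElevenLabs' TTS REST API.
# VOICE_ID is a hypothetical placeholder; list your voices in the
# ElevenLabs dashboard and substitute a real ID.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Welcome to the demo.", "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)  # feed this into HeyGen for the talking head
```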