🎬 AI Video Intel

🎬 AI Video Intel — Thursday, May 28, 2026 at 6:45 AM


Top stories, ranked by relevance.


#1 YouTube Will Now Auto-Detect and Label AI-Generated Videos

Announced yesterday: YouTube's systems will automatically identify "significant photorealistic AI use" and apply labels even when creators don't self-disclose. Labels are moving out of buried descriptions: on long-form videos they now appear directly below the player, and on Shorts they appear as overlays. Videos made with YouTube's built-in Veo-powered tools or carrying C2PA metadata get permanent labels that can't be removed. Creators can contest incorrect flags through YouTube Studio, but the message is clear: label your AI content proactively, or the platform will do it for you.

#2 Runway Ships Aleph 2.0: Edit One Frame, Propagate Everywhere

Runway's new in-context video editor lets you modify a single frame, then automatically propagates that change across the entire clip (up to 30 seconds at 1080p) while preserving everything you didn't touch. Think color grading, wardrobe swaps, product recolors, or VFX fixes without re-rendering from scratch. The new Edit Studio product wraps Aleph 2.0 and is available now on all paid plans. Separately, Runway ported Gen-4.5 to NVIDIA's Rubin platform in a single day, a sign that substantial inference speed gains may be ahead.

#3 Microsoft MAI-Image-2.5 Cracks Arena Top 3

Dropped this week: Microsoft's MAI-Image-2.5 hit the #3 spot on the Artificial Analysis text-to-image Arena leaderboard, a big jump from its predecessor MAI-Image-2, which launched in March and ranked mid-tier. Highlights include better natural lighting and more accurate skin tones. It's available for free testing on the Arena now and is coming to MAI Playground and Microsoft Foundry within two weeks.

#4 ComfyUI v0.22.0: Stable Audio 3.0, LTXV VRAM Cuts, Async LoRA Loading

The May 20 release adds native Stable Audio 3.0 support, MoGe geometry processing, and HiDream-O1 with area conditioning. For video creators, the LTXV enhancements are the headline: downscaled IC-LoRA support, temporal downscaling, and reduced peak VRAM for LTX 2.3. Block prefetch and async LoRA loading make workflows noticeably snappier. Six new custom nodes also shipped the same week, including TurboQuant, which claims a ~4.5x VRAM reduction via 3-bit quantization.
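
Where a figure like "~4.5x from 3-bit" can come from is easy to sketch with arithmetic. This is a minimal illustration, not TurboQuant's documented format: it assumes a 16-bit baseline and a hypothetical group-wise scheme that stores one 16-bit scale and one 16-bit zero point per group of weights, so per-group metadata eats into the ideal 16/3 ratio.

```python
def effective_bits(weight_bits: int, group_size: int, overhead_bits_per_group: int) -> float:
    """Average storage bits per weight once per-group metadata is amortized."""
    return weight_bits + overhead_bits_per_group / group_size


def vram_reduction(baseline_bits: int, weight_bits: int, group_size: int,
                   overhead_bits_per_group: int = 32) -> float:
    """Ratio of baseline weight memory to quantized weight memory.

    Assumption (hypothetical scheme, not TurboQuant's documented format):
    each group of `group_size` weights stores one 16-bit scale and one
    16-bit zero point, i.e. 32 bits of overhead per group.
    """
    return baseline_bits / effective_bits(weight_bits, group_size, overhead_bits_per_group)


if __name__ == "__main__":
    # Ideal 3-bit storage with no metadata at all.
    print(f"no metadata: {16 / 3:.2f}x smaller")
    for group in (32, 64, 128):
        bits = effective_bits(3, group, 32)
        ratio = vram_reduction(baseline_bits=16, weight_bits=3, group_size=group)
        print(f"group={group:>3}: {bits:.2f} bits/weight, {ratio:.2f}x smaller")
```

Under these assumptions, 64-weight groups land at about 4.57x, close to the quoted figure, while bare 3-bit storage would top out near 5.33x. This covers weight memory only; activations and other buffers also use VRAM, so real-world savings depend on the workflow.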

#5 HappyHorse 1.0 Claims #1 on Video Arena Leaderboard

This 15-billion-parameter unified Transformer, reportedly built by a team inside Alibaba's Taotian Group, reached the top of the Artificial Analysis Video Arena with Elo ratings of 1374 for text-to-video (T2V) and 1410 for image-to-video (I2V), beating Seedance 2.0, Kling 3.0, and every other commercial system. It outputs cinematic-quality 1080p video with lip-sync in seven languages (English, Mandarin, Cantonese, Japanese, Korean, German, French) and generates a 5-second 1080p clip in ~38 seconds on an H100. Open-source weights have been confirmed, but independent access is still being verified.
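
To make an Elo rating concrete: under the standard Elo model, the expected rate at which A is preferred over B is 1 / (1 + 10^((R_B - R_A) / 400)). The sketch below assumes the arena's ratings follow this conventional logistic scale (the source doesn't describe Artificial Analysis's exact method), and the rival rating is hypothetical because the source gives no scores for Seedance 2.0 or Kling 3.0. The T2V and I2V numbers come from separate leaderboards, so the formula applies within one leaderboard, not across them.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    happyhorse_t2v = 1374       # T2V rating from the item above
    hypothetical_rival = 1324   # illustration only; not a reported score
    p = expected_score(happyhorse_t2v, hypothetical_rival)
    print(f"Expected head-to-head preference rate: {p:.1%}")
```

A 50-point lead works out to roughly a 57% expected preference rate, so even a clear #1 spot means the runner-up still wins a large share of individual matchups.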

#6 DeepBrain AI Studios Launches 1,000+ Context-Aware Expressive TTS Voices

Announced May 26: AI Studios' new Expressive TTS engine automatically adapts tone, pacing, and delivery to the content type, so a news read sounds different from audiobook narration, short-form video, live commerce, or educational content. The 1,000+ voices are organized by category rather than just by language, which makes it significantly faster to find the right voice for a specific video format.

#7 Kuaishou Plans $20B Kling AI Spinoff as ARR Hits $500M

Kuaishou is spinning off its Kling AI video unit at a reported $20 billion valuation, with a potential IPO in 2027. ARR jumped from $300M in January to $500M now, an increase of about 67% in five months. Kling 3.5 shipped mid-May with a browser-based UI, mobile integration, 1080p at 60fps, and physics simulation (fluid dynamics, cloth) that reviewers are calling unprecedented. Tencent is reportedly in talks to invest $2B. JPMorgan sees a path to a $52B valuation at 40x ARR, which would imply roughly $1.3B in ARR.
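
The growth and valuation figures in this item can be cross-checked with simple arithmetic. This sketch uses only the numbers reported above, taking the January-to-now span as five months per the text; the helper names are illustrative.

```python
def pct_growth(start: float, end: float) -> float:
    """Simple growth from start to end, as a fraction."""
    return (end - start) / start


def monthly_compound_rate(start: float, end: float, months: int) -> float:
    """Constant monthly growth rate that turns `start` into `end` over `months`."""
    return (end / start) ** (1 / months) - 1


def implied_arr(valuation: float, multiple: float) -> float:
    """ARR implied by a valuation expressed as a multiple of ARR."""
    return valuation / multiple


if __name__ == "__main__":
    arr_jan, arr_now, months = 300e6, 500e6, 5  # figures from the item above
    print(f"growth: {pct_growth(arr_jan, arr_now):.1%}")
    print(f"monthly compound rate: {monthly_compound_rate(arr_jan, arr_now, months):.1%}")
    print(f"ARR implied by $20B at 40x: ${implied_arr(20e9, 40) / 1e6:,.0f}M")
    print(f"ARR implied by $52B at 40x: ${implied_arr(52e9, 40) / 1e9:.1f}B")
```

Run as-is, it shows growth of about 67% (roughly 10.8% compounded monthly), that the reported $20B valuation already equals 40x the current $500M ARR, and that a $52B valuation at the same multiple would require about $1.3B in ARR.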

#8 Wan 2.7 Full Open-Source Video Suite Ships Under Apache 2.0

Alibaba's Tongyi Lab released the complete Wan 2.7 suite: four models covering text-to-video, image-to-video, reference-to-video with voice cloning, and instruction-based video editing. The 27B-parameter MoE architecture activates only 14B parameters per inference pass. The standout feature is "thinking mode," in which the model reasons about composition before generating, meaningfully improving character consistency. It runs locally via FramePack on cards with 6GB of VRAM, and ComfyUI integration is available via Partner Nodes.
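
One way a 27B MoE can activate only 14B per pass: Alibaba's earlier Wan 2.2 A14B models used two experts of about 14B each, a high-noise expert for early denoising steps and a low-noise expert for later ones, with exactly one running per step. The sketch below assumes Wan 2.7 keeps that timestep-switched design, which the source does not confirm, and the switch boundary is a hypothetical placeholder. Note this is routing by denoising step, not a per-token router.

```python
# Figures from the source; the two-expert split and switch boundary below are
# assumptions borrowed from Wan 2.2's published design, not confirmed for Wan 2.7.
TOTAL_PARAMS_B = 27.0
ACTIVE_PARAMS_B = 14.0
HYPOTHETICAL_BOUNDARY = 0.875  # hypothetical switch point on a 0..1 noise scale


def select_expert(t: float, boundary: float = HYPOTHETICAL_BOUNDARY) -> str:
    """Pick one expert per denoising step; t=1.0 is pure noise, t=0.0 is clean."""
    return "high_noise" if t >= boundary else "low_noise"


def denoise_schedule(num_steps: int,
                     boundary: float = HYPOTHETICAL_BOUNDARY) -> list[tuple[float, str]]:
    """Evenly spaced timesteps from 1.0 toward 0.0, each routed to exactly one expert."""
    timesteps = [1.0 - i / num_steps for i in range(num_steps)]
    return [(t, select_expert(t, boundary)) for t in timesteps]


if __name__ == "__main__":
    for t, expert in denoise_schedule(8):
        print(f"t={t:.3f} -> {expert}")
    print(f"active per step: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%} of parameters")
```

With eight steps and this boundary, the first two steps run on the high-noise expert and the remaining six on the low-noise one. Because only one expert runs per step, about 52% (14B of 27B) of the weights do work on any given pass, though both experts still need to be stored.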