muapi.ai (AI tool)

Site: https://muapi.ai/

AI video generation, AI video clips

Pricing plans

No detailed pricing plans are available for this tool yet.

Detailed overview

Seedance 2.0 is now available.

hidream-i1-dev (Text to Image): Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.

veo3-image-to-video (Image to Video): VEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.

wan2.1-text-to-image (Text to Image): WAN 2.1 is a powerful AI model that transforms text prompts into high-resolution, photorealistic images. It excels at detailed object rendering, realistic lighting, and fine textures, making it ideal for visual content, concept art, advertising, and digital storytelling.

ai-video-effects (Image to Video): AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.

motion-controls (Image to Video): Motion Controls adds dynamic camera movements, speed ramps, and zoom effects to bring your images to life as smooth, engaging videos.

vfx (Image to Video): VFX delivers high-impact visual effects like explosions, particles, and cinematic overlays to transform static images into action-packed videos.

veo3-text-to-video (Text to Video): VEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.

flux-kontext-max-t2i (Text to Image): Flux Kontext Max T2I delivers photorealistic or cinematic-quality images with exceptional detail. It's optimized for high-end visuals, from realistic humans to polished product renders.

runway-text-to-video (Text to Video): Generate short, high-quality videos from plain text prompts.
RunwayML’s text-to-video model interprets your written description and animates it into a moving visual scene with realistic or stylized motion.

suno-extend-music (Text to Audio): This API extends audio tracks while preserving the original style of the audio track. It includes Suno's upload functionality, allowing users to upload audio files for processing. The expected result is a longer track that seamlessly continues the input style.

hunyuan-text-to-video (Text to Video): Hunyuan T2V generates detailed and dynamic videos from text prompts with a focus on realism and coherent motion. It handles multi-object scenes, human actions, and cinematic compositions effectively, making it ideal for storytelling and visual concepts.

veo3-fast-text-to-video (Text to Video): VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.

ai-product-shot (Image to Image): Instantly generate studio-quality product images with AI. Upload your item photo and get clean, stylized shots perfect for e-commerce, ads, and catalogs.

gpt4o-image-to-image (Image to Image): Transform an input image based on a new prompt, such as changing style, lighting, or composition. Useful for reinterpreting visuals while keeping structure.

hunyuan-image-to-video (Image to Video): Hunyuan I2V takes a static image and generates realistic video animations by interpreting motion and context. It works well for human portraits, objects, or scenes, adding lifelike movement while maintaining the image's integrity.

ai-video-face-swap (Video to Video): Replace faces in videos with stunning realism. Our AI ensures accurate expression transfer, lighting consistency, and smooth frame-by-frame blending.

hunyuan-fast-text-to-video (Text to Video): Hunyuan Fast T2V provides accelerated video generation from text prompts with slightly reduced detail but excellent speed.
Ideal for rapid prototyping, concept testing, and short-form ideas where time is critical.

runway-aleph-v2v (Video to Video): Transform any input video into a new visual style or scene while preserving motion and structure. Aleph V2V lets you apply artistic looks, cinematic lighting, or thematic changes to existing footage.

minimax-image-01-subject-reference (Image to Image): Minimax’s I2I “Subject Reference” model enables you to transform images while preserving the appearance of a subject using a single reference image. Ideal for maintaining character likeness (features, clothing, or expression) across different styles or settings.

ai-product-photography (Image to Image): Create professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in seconds.

bytedance-seededit-v3 (Image to Image): Seededit allows precise edits to images using masks and prompt guidance. Whether you're replacing backgrounds, changing clothing, or inpainting missing areas, Seededit ensures realistic, high-quality results with semantic control.

ai-background-remover (Image to Image): Instantly remove image backgrounds with pixel-perfect precision. Ideal for product photos, profile pictures, and creative projects.

ai-image-upscaler (Image to Image): Transform blurry or pixelated images into high-definition visuals. Our AI Image Upscaler uses deep learning to reconstruct details and bring your visuals to life.

wan2.2-image-to-video (Image to Video): Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.

runway-act-two-i2v (Image to Video): Upload a single character image and a driving video; the model transfers facial expressions and head movements from the video onto your image, bringing it to life.
It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally. Ideal for avatars, AI presenters, digital actors, and story scenes.

nano-banana-effects (Image to Image): Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.

pixverse-v4.5-i2v (Image to Video): Upload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements. Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos.

ai-image-face-swap (Image to Image): Advanced facial recognition and blending algorithms enable precise face swaps while preserving skin tone, lighting, and facial geometry.

midjourney-v7-omni-reference (Image to Image): Midjourney's Omni Reference lets you reuse characters, creatures, or styles from an existing image and place them into entirely new scenes. Simply provide a reference image (oref) and Midjourney will maintain identity, details, and visual consistency, ideal for storytelling, character design, or branding across multiple generations.

ideogram-v3-t2i (Text to Image): Ideogram v3 is an advanced text-to-image model designed for creating highly detailed and visually striking images directly from text prompts. It’s especially good for artistic compositions, design mockups, concept art, and photorealistic scenes. With strong support for text rendering inside images, it’s widely used for posters, typography-based art, and creative branding.

ai-dress-change (Image to Image): Instantly change outfits in images using AI.
Visualize different clothing styles without the need for physical trials, perfect for fashion, e-commerce, and virtual try-ons.

grok-imagine-image-to-video (Image to Video): Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into short (≈6 second) cinematic videos with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.

image-effects (Image to Image): AI Image Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning images from an image.

mmaudio-v2-text-to-audio (Text to Audio): Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.

mmaudio-v2-video-to-video (Video to Video): MMAudio-v2 generates high-quality, synchronized audio from video or text inputs. Seamlessly integrate it with AI video models to create fully-voiced, expressive video content.

wan2.1-image-to-video (Image to Video): Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene, great for bringing art, characters, or photos to life with smooth motion and consistent style.

sync-lipsync (Audio to Video): Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

google-imagen4-fast (Text to Image): Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.

openai-sora-2-pro-image-to-video (Image to Video): Sora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio.
Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.

runway-act-two-v2v (Video to Video): Take an existing character video and sync it with the motion from a reference video. This lets you update facial expressions, head turns, and speech gestures while keeping the original look and style. It’s perfect for reshooting performances, dubbing, or animating characters without re-rendering visuals.

luma-modify-video (Video to Video): Luma Modify Video lets you transform an existing video into a new creative scene while keeping the original motion and timing intact. The result is a new video with the same movements but a completely fresh look, atmosphere, or theme.

pixverse-v4.5-t2v (Text to Video): PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles, great for creative storytelling, trailers, and animated concepts.

veo3.1-image-to-video (Image to Video): Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.

seedance-pro-t2v (Text to Video): Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes. Best for storytelling, content creation, and visual production.

nano-banana-edit (Image to Image): Nano Banana is a mysterious, high-performance image model.
It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.

infinitetalk-video-to-video (Video to Video): InfiniteTalk Video-to-Video enhances or transforms existing videos by syncing the subject’s lip movements and facial expressions with new dialogue or speech. Instead of starting from a still image, you provide a video clip, and the model seamlessly reanimates the speaker’s mouth and expressions to match the script.

openai-sora-2-image-to-video (Image to Video): Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.

veo3.1-text-to-video (Text to Video): Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.

chroma-image (Text to Image): Chroma Image is an advanced text-to-image generation model designed for high-quality, creative, and versatile visuals. It can produce anything from photorealistic portraits and products to imaginative concept art, fantasy illustrations, and cinematic scenes.

wan2.2-animate (Video to Video): Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload both an image (for the character) and a video containing motion/expression, and the model generates a video where the character in your image moves like the reference.
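
The wan2.2-animate entry gives two concrete limits: 480p or 720p resolution and a maximum of 120 seconds. A client could check a job against them before submitting it. The sketch below is only an illustration: the function name and its parameters are hypothetical, and the listing does not say whether the 120-second limit applies to the driving video or to the output, so the code simply treats it as a cap on the requested duration.

```python
# Limits quoted in the wan2.2-animate listing.
ALLOWED_RESOLUTIONS = ("480p", "720p")
MAX_DURATION_SECONDS = 120


def animate_job_problems(resolution: str, duration_seconds: float) -> list[str]:
    """Return readable problems; an empty list means the job fits the listed limits.

    Hypothetical helper, not part of any MuAPI client.
    """
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution {resolution!r} is not one of {ALLOWED_RESOLUTIONS}")
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds}s is outside the 0-{MAX_DURATION_SECONDS}s range"
        )
    return problems


print(animate_job_problems("720p", 45))     # fits both limits
print(animate_job_problems("1080p", 150))   # violates both limits
```

Returning a list instead of raising on the first failure lets a caller report every violated limit at once.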
Supports 480p or 720p, up to 120 seconds.

openai-sora-2-text-to-video (Text to Video): Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.

minimax-hailuo-2.3-pro-i2v (Image to Video): Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame, delivering smooth, film-grade clips.

ai-skin-enhancer (Image to Image): Smooth skin, reduce blemishes, and enhance complexion with natural-looking results. Perfect for portraits, selfies, and professional photo retouching.

wan2.5-image-to-video-fast (Image to Video): Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.

ai-clipping (Video to Video): Convert long-form videos into engaging short clips using AI clipping.

ltx-2-19b-lipsync (Audio to Video): LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration.

ai-captions (Video to Video): Add AI-generated animated captions to any video using Vadoo's caption engine. Supports multiple languages and viral caption themes like Hormozi style.
Perfect for social media creators, marketers, and content producers.

midjourney-v7-text-to-image (Text to Image): Midjourney V7 produces high-quality, stylized images from text prompts. Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations.

ai-color-photo (Image to Image): Automatically add lifelike colors to black-and-white images. Our AI brings history to life with natural tones, accurate shading, and context-aware colorization.

flux-dev (Text to Image): Generate stunning visuals from simple text prompts. Flux Dev transforms your ideas into high-quality, creative images using powerful AI vision models. Perfect for design, storytelling, concept art, and marketing.

hidream-i1-fast (Text to Image): Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.

openai-sora-2-pro-text-to-video (Text to Video): Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.

veo3-fast-image-to-video (Image to Video): Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality, powered by Google's VEO3 on MuAPI.

flux-kontext-dev-t2i (Text to Image): Generates an image from a text prompt, with optional reference image for pose or style guidance. Ideal for controlled, consistent image creation using just a description.

seedance-lite-t2v (Text to Video): Seedance Lite T2V offers quick video generation from text with decent visual quality and motion.
Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail.

kling-o1-text-to-video (Text to Video): Kling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips. It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering.

seedance-v2.0-extend (Text to Video): Seedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension; the model preserves visual style, motion, characters, and audio consistency across the new segment.

hidream-i1-full (Text to Image): The most advanced version of HiDream I1, delivering high-resolution, detailed images with superior prompt understanding. Best suited for production, content creation, and high-fidelity applications.

flux-kontext-pro-i2i (Image to Image): Flux Kontext Pro I2I variant enables transforming base images into refined artwork while keeping structure intact. It’s useful for sketch refinement, visual style changes, and creative edits such as re-dressing, relighting, or re-theming with prompt guidance.

ai-ghibli-style (Image to Image): Bring your imagination to life with art inspired by the enchanting world of Studio Ghibli. This AI model generates dreamy, hand-drawn visuals with soft colors, whimsical characters, and painterly backgrounds.

wan2.1-lora-t2v (Training): WAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules.
Tailor the generation to specific characters, outfits, or animation styles, ideal for brand storytelling, fan content, and stylized animations.

portrait-stylist (Image to Image): Professional AI portrait styles including hair, makeup, style, and fashion transformations.

kling-v2.1-pro-i2v (Image to Video): Kling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects.

veo3.1-fast-text-to-video (Text to Video): Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.

qwen-image-2.0-pro-edit (Image to Image): Qwen 2.0 Pro Image Edit model with maximum precision and modifications.

ai-anime-generator (Text to Image): Create stunning anime-style artwork instantly with our AI Anime Generator. Customize characters, scenes, and styles effortlessly in seconds!

z-image-turbo (Text to Image): Z-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.

kling-v2-avatar-pro (Audio to Video): AI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video.
It preserves identity, lip syncs accurately to the audio, adds natural head movement, eye motion, expressions, and cinematic lighting.

openai-sora-2-standard-text-to-video (Text to Video): OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts.

ai-image-extension (Image to Image): Expand the edges of any image with AI. This model continues your original photo or artwork beyond its borders while matching style, lighting, and content.

wan2.1-lora-i2v (Training): Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency. Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA.

pixverse-v5.5-i2v (Image to Video): PixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip. It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.

heygen-video-translate (Video to Video): Convert any video into 175+ languages with synchronized voice translation, AI-voice cloning, and accurate lip sync. Just upload your video (or provide a link), select a target language, and HeyGen recreates the speech in that language. $0.05 per second.

wan2.2-spicy-video-extend (Video to Video): Wan-2.2-spicy Video Extend continues an existing video by generating new frames that match the original style but add stronger motion, bolder effects, and spicier dramatics.

kling-o1-standard-reference-to-video (Image to Video): Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation.
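
The heygen-video-translate listing quotes a flat rate of $0.05 per second, which makes a rough budget easy to work out. The sketch below assumes the rate applies to the exact length of the source video, with no rounding, minimum charge, or per-language multiplier; none of that is stated in the listing, and the function name is made up for illustration.

```python
from decimal import Decimal

# Rate quoted in the heygen-video-translate listing: $0.05 per second of video.
RATE_PER_SECOND = Decimal("0.05")


def estimate_translation_cost(duration_seconds: int) -> Decimal:
    """Estimate the cost in USD of translating one video of the given length.

    Assumes straight per-second billing; real billing rules may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return RATE_PER_SECOND * duration_seconds


# A 90-second clip: 90 * 0.05 = 4.50 USD under the assumption above.
print(estimate_translation_cost(90))
```

Decimal is used instead of float so that currency amounts do not pick up binary rounding error.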
This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes.

qwen-image-edit-2511 (Image to Image): Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts.

add-image-watermark (Image to Image): Add custom watermark to images with adjustable position, opacity, and size. Free local processing using PIL.

kling-v2.6-pro-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation. Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.

ai-object-eraser (Image to Image): Easily remove unwanted objects, people, or text from any image using AI. Just select the area you want to erase, and the model will intelligently fill the space with realistic background matching the surrounding environment. No Photoshop skills needed.

runway-image-to-video (Image to Video): Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics.

suno-create-music (Text to Audio): Suno generates music, turning text prompts into full songs complete with vocals, lyrics, and instrumentation. You can describe a mood, genre, or even a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.

z-image-base (Text to Image): Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts.
It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts.

flux-kontext-pro-t2i (Text to Image): Flux Kontext Pro T2I offers fast and reliable generation with creative flexibility. It supports stylized prompts, character design, and fantasy themes while maintaining clear subject coherence.

flux-krea-dev (Text to Image): Flux Krea Dev is a text-to-image model built by Black Forest Labs in collaboration with Krea AI, designed to generate highly photorealistic images that avoid the common 'AI look' artifacts (plastic skin, overexposed lighting, synthetic textures). It emphasizes real texture, natural lighting, and aesthetic control.

flux-kontext-max-i2i (Image to Image): Flux Kontext Max I2I in Max mode allows precise image enhancement and visual transformations while retaining the source layout. It’s powerful for retouching, photo-to-art workflows, and concept refinement.

tiktok-carousel (Text to Image): AI TikTok Carousel Generator: create viral TikTok carousel posts from a single text prompt. Choose a proven storytelling format (Problem-Solution, Listicle, Tutorial, Before & After), set your slide count (3-10), and get stunning AI-generated images at 1080x1920 portrait resolution, ready to upload to TikTok.

bytedance-seedream-v5.0-edit (Image to Image): Seedream 5.0 Lite Edit is an advanced image transformation model by ByteDance, enabling precise, controllable edits using natural language. It specializes in high-fidelity style transfer (Anime, Cyberpunk, Fantasy), background swaps, and object modification while preserving original lighting, color tones, and character consistency for professional-grade creative reworks.

gpt4o-text-to-image (Text to Image): Generate images from text prompts using GPT-4o's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.

gpt4o-edit (Image to Image): Edit a specific part of an image using natural language.
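
The tiktok-carousel listing names its inputs fairly precisely: a text prompt, one of four storytelling formats, and a slide count from 3 to 10, with output at 1080x1920. A minimal pre-flight check might look like the sketch below. The `CarouselRequest` shape and its field names are hypothetical (the listing does not show the real request schema), and treating the four named formats as the complete set is an assumption.

```python
from dataclasses import dataclass

# Values taken from the tiktok-carousel listing.
STORY_FORMATS = ("Problem-Solution", "Listicle", "Tutorial", "Before & After")
MIN_SLIDES, MAX_SLIDES = 3, 10
SLIDE_SIZE = (1080, 1920)  # width x height in pixels, portrait


@dataclass(frozen=True)
class CarouselRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    story_format: str
    slide_count: int


def carousel_problems(req: CarouselRequest) -> list[str]:
    """Return readable problems; an empty list means the request looks valid."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if req.story_format not in STORY_FORMATS:
        problems.append(f"unknown story format: {req.story_format}")
    if not MIN_SLIDES <= req.slide_count <= MAX_SLIDES:
        problems.append(f"slide_count must be between {MIN_SLIDES} and {MAX_SLIDES}")
    return problems


req = CarouselRequest("Five desk habits that reduce back pain", "Listicle", 5)
print(carousel_problems(req))
print(f"{req.slide_count} slides at {SLIDE_SIZE[0]}x{SLIDE_SIZE[1]}")
```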
Ideal for object removal, replacement, or content-aware filling.

kling-o1-standard-image-to-video (Image to Video): Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation. This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.

wan2.1-text-to-video (Text to Video): WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements, all from just text.

perfect-pony-xl (Text to Image): Pony XL is a high-quality image generation model based on Stable Diffusion XL architecture. It specializes in character art, hybrid styles, and producing detailed, polished visuals even with simpler prompts.

wan2.2-speech-to-video (Audio to Video): WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines.

flux-2-klein-4b-turbo (Text to Image): Flux-2-Klein-4B Turbo is an ultra-fast, high-efficiency text-to-image model. It is a distilled version of the Klein 4B model, designed for near-instant rendering while maintaining impressive adherence to prompts. Perfect for rapid prototyping, real-time creative tools, and applications where speed is paramount.

seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference: generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips.
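
A minimal sketch of how a client might check a seedance-2.0-omni-reference request against the limits in this entry (9 images, 3 video clips, 3 audio clips) and against the `@image1`-style tags it places in the prompt. The function name is hypothetical, and it assumes that tag numbers are 1-based and follow the order in which the reference inputs are supplied, which the listing does not spell out.

```python
import re

# Per-request reference limits quoted in the seedance-2.0-omni-reference listing.
LIMITS = {"image": 9, "video": 3, "audio": 3}

# Matches prompt tags such as @image1, @video2, @audio3.
TAG = re.compile(r"@(image|video|audio)(\d+)")


def check_references(prompt: str, counts: dict[str, int]) -> list[str]:
    """Return readable problems for a prompt and the number of inputs per kind.

    Hypothetical helper; `counts` maps "image"/"video"/"audio" to how many
    reference files of that kind accompany the prompt.
    """
    problems = []
    for kind, limit in LIMITS.items():
        supplied = counts.get(kind, 0)
        if supplied > limit:
            problems.append(f"{supplied} {kind} references supplied, limit is {limit}")
    for kind, number in TAG.findall(prompt):
        index = int(number)
        if index < 1 or index > counts.get(kind, 0):
            problems.append(f"@{kind}{index} has no matching {kind} input")
    return problems


prompt = "@image1 walks through the street from @video1 while @audio1 plays"
print(check_references(prompt, {"image": 1, "video": 1, "audio": 1}))
print(check_references("@image2 dances", {"image": 1}))
```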
Use @image1, @video1, @audio1 syntax in your prompt.

midjourney-v7-image-to-image (Image to Image): Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, great for alternate versions, art variations, or polishing concepts.

flux-2-klein-9b-turbo (Text to Image): Flux-2-Klein-9B Turbo is a high-performance, mid-size text-to-image model. This distilled variant of Klein 9B provides a superior balance of speed and detail, delivering richer textures and complex scenes with significantly reduced generation times. Ideal for polished illustrations and character-rich visuals where performance is key.

grok-imagine-text-to-video (Text to Video): Grok Imagine is xAI’s fast, creative text-to-video model that generates short (~6-second) cinematic clips with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.

ltx-2.3-video-extend (Video to Video): LTX-2.3 Video Extend seamlessly continues an existing video clip by generating additional frames that match the original motion, style, and scene composition. Powered by the LTX-2.3 architecture, it maintains temporal coherence and visual fidelity across the extension boundary.

seedance-lite-reference-video (Image to Video): Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video.

bytedance-seedream-v3 (Text to Image): Seedream is designed for generating visually rich and artistic images from text prompts. It excels at fantasy, anime, surrealism, and vibrant color compositions, ideal for creative visuals, storyboards, and concept art.

kling-v2.1-master-i2v (Image to Video): Kling 2.1 Master’s I2V animates a still image into a coherent video sequence.
It interprets motion, environment, and context to create realistic, visually stunning video outputs, ideal for animating portraits, scenes, or concept art.

flux-kontext-effects (Image to Image): Flux Kontext Effects is a creative image and video model that applies stylized transformations, cinematic filters, and artistic reinterpretations to your inputs. Instead of generating new content from scratch, it enhances or reimagines existing images and videos with unique looks, ranging from surreal effects to realistic cinematic moods.

kling-v2.1-standard-i2v (Image to Video): Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame. It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations, ideal for portraits, digital art, and cinematic illustrations.

qwen-image (Text to Image): Generate high-quality, detailed images from text prompts in various styles, from realistic to artistic, perfect for creative visuals, product shots, and concept art.

midjourney-v7-style-reference (Image to Image): Generate images in the distinctive aesthetic of Midjourney v7, blending cinematic depth, photorealism or painterly rendering, rich textures, and dynamic lighting. This style reference model helps you infuse any subject with the visual storytelling, composition, and high detail fidelity that Midjourney is known for. Ideal for concept art, stylized portraits, and stunning environment scenes.

openai-sora (Text to Video): Sora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.

veo3.1-4k-video (Text to Video): Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications.
It enhances visual fidelity while maintaining temporal coherence and realistic motion.

flux-pulid (Image to Image): Flux PuLID is an innovative image-to-image model that enables consistent face rendering across different styles or scenes, without needing any model fine-tuning. By providing a reference image (e.g., a portrait), the model generates new visuals while maintaining your subject’s identity with high fidelity.

wan2.5-text-to-video-fast (Text to Video): Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.

hunyuan-image-3.0 (Text to Image): Hunyuan Image 3.0 brings together powerful architecture (Mixture-of-Experts + autoregressive style) to produce richly detailed and coherent images from complex prompts. It can read narrative descriptions, render text and signage cleanly, and support multiple visual styles, from photorealism to illustrations.

kling-o1-text-to-image (Text to Image): Kling O1 Text-to-Image is a high-fidelity creative image model that converts rich natural-language prompts into ultra-detailed stills.
It excels at cinematic composition, realistic lighting, and coherent scene detail, great for concept art, environment renders, character portraits, and stylized imagery with photoreal or illustrative looks.
latent-sync (Audio to Video): LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
video-effects (Image to Video): AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.
creatify-lipsync (Audio to Video): Realistic lipsync video, optimized for speed, quality, and consistency.
nano-banana-2 (Text to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
qwen-image-edit (Image to Image): The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements).
ltx-2-pro-image-to-video (Image to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
photo-pack (Image to Image): Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.
vidu-q1-reference (Image to Video): Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references (up to seven images) and text prompts.
Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements.
wan2.2-5b-fast-t2v (Text to Video): Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.
minimax-hailuo-02-standard-i2v (Image to Video): Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews.
wan2.2-text-to-video (Text to Video): Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.
flux-2-flex-edit (Image to Image): Flux-2-Flex Edit allows flexible transformation of an existing image: object replacement, material changes, lighting adjustments, style shifts, or localized edits. It preserves the original scene’s geometry, perspective, and lighting while modifying only what the edit prompt specifies.
ideogram-v3-reframe (Image to Image): Ideogram V3 Reframe is a specialized image-to-image model built on Ideogram 3.0, designed to intelligently extend and adapt images across diverse aspect ratios and resolutions. Leveraging advanced AI outpainting, it preserves visual consistency while enabling creative reframing for digital, print, and video content.
veo3.1-fast-image-to-video (Image to Video): Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.
kling-o1-standard-video-edit (Video to Video): Kling O1 Standard Video-to-Video Edit modifies an existing video while preserving its original structure, motion, and realism.
It is designed for subtle, stable edits such as object replacement, background changes, lighting adjustments, or small visual tweaks. This mode prioritizes temporal consistency and natural motion.
flux-2-pro (Text to Image): Flux-2-Pro Text-to-Image is a premium, high-fidelity generative model capable of producing ultra-realistic, cinematic, and deeply detailed images from text prompts. It excels at complex lighting, layered compositions, surreal visual concepts, and professional art-grade rendering suitable for concept art, advertising visuals, and world-building.
minimax-hailuo-02-standard-t2v (Text to Video): Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality.
seedance-2.0-watermark-remover (Video to Video): Free for a limited time. Removes Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits are deducted, but a positive balance is required for access.
flux-2-flex (Text to Image): Flux-2-Flex Text-to-Image is a flexible, high-fidelity generative model capable of producing detailed, imaginative, and stylistically rich scenes from text alone. It excels at surreal concepts, fantasy environments, sci-fi structures, cinematic atmospheres, and high-resolution artistic compositions with strong prompt adherence.
midjourney-v7-image-to-video (Image to Video): Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion.
flux-schnell (Text to Image): Flux Schnell is a lightning-fast image generation model designed for rapid iterations.
It delivers good visual quality from text prompts almost instantly, making it perfect for real-time concept testing, brainstorming, and UI-integrated experiences.
vidu-v2.0-t2v (Text to Video): Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation.
ai-dance-effects (Video to Video): Bring your characters and worlds to life with AI Dance Effects, a creative video effect that adds playful, dynamic, and cinematic motion to your generations. AI Dance Effects lets you guide how characters move, react, and express themselves.
bytedance-seedream-v4.5 (Text to Image): Seedream-v4.5 is ByteDance’s advanced text-to-image diffusion model designed for generating high-detail, high-contrast, cinematic and stylized images. It excels at surreal fantasy concepts, sci-fi worlds, product visuals, photoreal scenes, and artistic compositions with strong prompt adherence and crisp detail.
kling-v2.6-pro-t2v (Text to Video): Kling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.
flux-2-klein-4b-turbo-edit (Image to Image): Flux-2-Klein-4B Turbo Edit provides ultra-fast, instruction-based image editing. This high-efficiency variant of Klein 4B Edit is optimized for near-instant swaps and tweaks while preserving layout and lighting. Ideal for real-time design tools and quick creative adjustments.
openai-sora-2-pro-characters (Text to Text): Create consistent AI characters for your Sora 2 videos. Provide a previous video's task ID and a prompt to define or refine your character.
nano-banana (Text to Image): Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing.
It produces hyper-realistic, physics-aware visuals with seamless style transformations.
pixverse-v5-t2v (Text to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.
wan2.6-image-to-video (Image to Video): WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.
google-imagen4 (Text to Image): Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.
google-imagen4-ultra (Text to Image): Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.
wan2.6-text-to-image (Text to Image): WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts. It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery.
wan2.1-reference-video (Image to Video): WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video.
By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.
qwen-image-2.0 (Text to Image): Qwen 2.0 Text to Image model with enhanced realism.
veed-lipsync (Audio to Video): Generate realistic lipsync from any audio using VEED's latest model.
sdxl-image (Text to Image): SDXL is a high-quality, large Stable Diffusion model for creating photorealistic and stylized images from text. It excels at fine detail, realistic lighting, and complex scenes.
infinitetalk-image-to-video (Audio to Video): InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos. You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech.
luma-flash-reframe (Video to Video): Transform and resize your videos effortlessly with Ray 2 Flash Reframe. This tool intelligently expands or adjusts your video’s aspect ratio, adding visually consistent content to the sides, top, or bottom without altering the original subject.
ai-video-upscaler (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
flux-redux (Image to Image): Flux Redux is a transformation model that reimagines or enhances your input images while preserving their main structure and subject.
It’s built for creative refinement, whether you want style transfer, artistic reinterpretation, cinematic polish, or mood transformation.
qwen-image-2.0-pro (Text to Image): Qwen 2.0 Pro Text to Image model with maximum realism and fidelity.
seedance-v1.5-pro-i2v-fast (Image to Video): Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation.
seedance-v1.5-pro-video-extend (Video to Video): Seedance v1.5 Pro Video Extend continues an existing video by generating additional frames that match the original scene’s style, lighting, motion, and mood. It is designed for smooth temporal consistency, making it ideal for extending cinematic shots, atmospheric scenes, or slow camera moves without introducing visual jumps or style changes.
vidu-v2.0-i2v (Image to Video): Vidu's 2.0 model delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content.
wan2.2-edit-video (Video to Video): Easily modify existing videos using simple text commands. With Wan 2.2 Video-Edit, you can change attire, character appearance, or other visual elements directly within your video, with no need to start from scratch. Works on uploads of 480p or 720p, for up to two minutes.
nano-banana-2-edit (Image to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
kling-v1-avatar-pro (Audio to Video): Kling AI Avatar Pro is the premium tier for making high-quality talking avatars.
You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync.
ovi-image-to-video (Image to Video): Ovi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio and sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.
ltx-2.3-lipsync (Audio to Video): LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency, powered by the upgraded LTX-2.3 architecture.
kling-v3.0-std-motion-control (Video to Video): Kling V3.0 Standard Motion Control allows for precise control over the camera and subject movement in generated videos. Powered by the latest Kling V3.0 architecture for improved temporal consistency and quality.
ovi-text-to-video (Text to Video): Ovi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally. Videos are generated in 540p resolution.
seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.
kling-o1-video-edit (Video to Video): Kling O1 Video Edit lets you send an existing video clip plus an instruction/prompt to edit or transform the clip while preserving temporal coherence and subject identity.
Typical edits include color grading, background replacement, object removal, slow motion, speed ramps, style transfer, subtle camera stabilization, and short extension/outro generation. Inputs can include the source video, an optional frame mask (for localized edits), a time range, and style/reference images.
vidu-q2-reference-to-image (Image to Image): VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.
minimax-hailuo-02-pro-i2v (Image to Video): Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling.
bytedance-seedream-v5.0 (Text to Image): Seedream 5.0 Lite is ByteDance’s next-generation text-to-image model, delivering high-fidelity AI art with advanced visual reasoning and precise typography. Supporting up to 4K resolution and cinematic detail, it excels at complex scene construction, consistent character generation, and real-time knowledge integration for accurate, contextually relevant visuals.
ltx-2-fast-text-to-video (Text to Video): LTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals. The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.
wan2.6-text-to-video (Text to Video): WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts.
It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.
qwen-text-to-image-2512 (Image to Image): Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts. It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, and fantasy and sci-fi worlds.
kling-v3.0-pro-image-to-video (Image to Video): Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting.
any-llm (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen-image-2.0-edit (Image to Image): Qwen 2.0 Image Edit model with precise background modification and enhancements.
kling-v3.0-standard-text-to-video (Text to Video): Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments.
kling-v2.6-std-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.
ltx-2.3-image-to-video (Image to Video): LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency.
minimax-hailuo-02-pro-t2v (Text to Video): High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail.
ltx-2.3-text-to-video (Text to Video): LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation.
topaz-image-upscale (Image to Image): Topaz Image Upscale is a high-quality image-to-image enhancement model that increases resolution, sharpness, and detail using AI super-resolution. It improves clarity, restores texture, reduces noise, and produces crisp, high-res output while preserving natural look and fine edges.
seedance-pro-i2v (Image to Video): The Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits.
flux-2-dev-edit (Image to Image): Flux 2 Dev Edit takes an existing image and applies transformations, replacements, or style changes based on a text instruction. It preserves composition, lighting, and the overall scene while modifying only what the edit prompt specifies.
Ideal for creative replacements, stylistic adjustments, object swaps, and environment changes while keeping the original artistic integrity.
video-combiner (Video to Video): Combine multiple short video clips (5s, 10s, etc.) into a single seamless full-length video. Upload your clips in order and choose the final output aspect ratio. 'Auto' preserves the aspect ratio of your first clip.
suno-generate-sounds (Text to Audio): Generate sound effects using the Suno chirp-crow model.
suno-generate-lyrics (Text to Text): Generate lyrics using Suno.
vidu-q2-text-to-image (Text to Image): VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.
suno-boost-music-style (Text to Text): Boost style prompts for Suno music generation.
pixverse-v5.5-t2v (Text to Video): PixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.
seedance-lite-i2v (Image to Video): The Seedance Lite I2V version animates static images into short videos quickly, focusing on basic motion effects and efficient processing. It is best suited for fast demos or mobile-friendly use.
openrouter-vision (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
suno-add-vocals (Text to Audio): Add vocals to an instrumental track.
seedance-v1.5-pro-t2v-fast (Text to Video): Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability.
It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting.
ltx-2-19b-image-to-video (Image to Video): LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics, making it well suited for grounded scenes, near-future concepts, and story beats.
suno-generate-mashup (Text to Audio): Create a mashup using 1-5 audio tracks.
pixverse-v5-i2v (Image to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.
bytedance-seedream-v4 (Text to Image): Seedream v4 generates stunning, high-fidelity images from text prompts. It’s designed for creativity with strong support for realism, fantasy, and artistic styles.
bytedance-seedream-v4-edit (Image to Image): Seedream v4 Edit refines or transforms existing images based on a new prompt and a reference. Instead of masking, you provide a source image and describe how it should be altered: adjusting style, details, or replacing elements while keeping the subject consistent.
nano-banana-pro (Text to Image): Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution.
minimax-voice-clone (Text to Audio): Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.
suno-add-instrumental (Text to Audio): Add instrumental backing to a cappella audio.
wan2.6-image-edit (Image to Image): WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting.
It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts, keeping the original scene coherent and cinematic.
seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
seedance-v1.5-pro-i2v (Image to Video): Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots.
seedance-v2.0-t2v (Text to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
flux-2-klein-9b-edit (Image to Image): Flux-2-Klein-9B Edit performs higher-quality image edits with better detail retention, lighting consistency, and texture handling compared to smaller variants. It’s well-suited for cute character edits, object additions, and visual refinements that need to look natural and polished while keeping the original scene intact.
kling-v3.0-pro-text-to-video (Text to Video): Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency. It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions.
flux-dev-lora (Training): Enables text-to-image generation using custom LoRA models. Generate consistent characters, styles, or branded visuals with high quality and fast results.
flux-kontext-dev-i2i (Image to Image): Takes an input image and transforms it based on a new prompt.
Keeps structure or pose while changing style, appearance, or details.
neta-lumina (Text to Image): Neta Lumina is a powerful anime-style text-to-image model developed by Neta.art Lab. It’s built on Lumina-Image-2.0, fine-tuned with over 13 million high-quality anime images. It offers strong understanding of multilingual prompts, excellent detail fidelity, support for Danbooru tags, and coverage of niche styles such as furry, Guofeng, pets, and scenic backgrounds.
suno-remix-music (Text to Audio): This API covers an audio track by transforming it into a new style while retaining its core melody. It incorporates Suno's upload capability, enabling users to upload an audio file for processing. The expected result is a refreshed audio track with a new style, keeping the original melody intact.
gpt-image-1.5 (Text to Image): GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.
kling-v3.0-pro-motion-control (Video to Video): Kling V3.0 Pro Motion Control provides the highest level of detail and control for video generation. Suitable for professional workflows requiring complex cinematic camera work and subject consistency.
kling-v2.1-master-t2v (Text to Video): Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality, perfect for storytelling, ads, or content creation from imagination alone.
flux-2-klein-4b-edit (Image to Image): Flux-2-Klein-4B Edit applies lightweight, instruction-based edits to an existing image. It’s best for clear object swaps, small visual changes, and cute enhancements while preserving the original scene’s layout and lighting.
Ideal for fast edits, UI demos, and simple creative tweaks.
ideogram-character (Image to Image): Ideogram’s Character Reference model enables consistent character generation using just one reference image. Upload a clear character portrait, and you can place that character in unlimited scenes, styles, poses, or narratives with visual fidelity maintained across all outputs.
kling-v3.0-standard-image-to-video (Image to Video): Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots.
kling-v1-avatar-standard (Audio to Video): Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily.
kling-v2.5-turbo-pro-i2v (Image to Video): Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
sdxl-lora (Training): The SDXL LoRA image model enhances Stable Diffusion XL with specialized fine-tuning, letting you generate images in unique styles, characters, or themes. By applying LoRA weights, you can create visuals that match a specific aesthetic, celebrity look, anime style, or custom-trained subject.
seedance-v1.5-pro-t2v (Text to Video): Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth.
hunyuan-image-2.1 (Text to Image): Hunyuan Image is a powerful text-to-image generation model that produces photorealistic and highly detailed visuals.
It excels at creating portraits, environments, and concept art with strong consistency and realism. Designed for versatility, it supports both natural photography styles and imaginative artistic outputs.
qwen-image-edit-plus (Image to Image): Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images.
kling-v2.5-turbo-pro-t2v (Text to Video): Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
leonardoai-lucid-origin (Text to Image): Lucid Origin is LeonardoAI’s advanced image generation model, designed for ultra-realistic, vibrant, and highly detailed visuals. It excels at creating photorealistic portraits, landscapes, product shots, and stylized art while faithfully following complex prompts.
wan2.5-image-to-video (Image to Video): WAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion (camera pans, environmental movement, and realistic physics) across the result.
wan2.5-text-to-video (Text to Video): WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.
wan2.5-text-to-image (Text to Image): WAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions.
It supports detailed visual storytelling, cinematic compositions, and versatile styles, from portraits and product shots to landscapes and fantasy scenes.
topaz-video-upscale (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
wan2.5-image-edit (Image to Image): The Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments, whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.
ai-video-upscaler-pro (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
add-video-watermark (Video to Video): Add custom watermark to videos with adjustable position, opacity, and size. Free local processing using FFmpeg.
video-watermark-remover (Video to Video): The AI Video Watermark Remover is our flagship model designed to remove Sora 2 watermarks, logos, captions, and unwanted text from videos without compromising quality. Supporting a wide range of formats, it's fast, efficient, and processes with the highest quality.
leonardoai-phoenix-1.0 (Text to Image): LeonardoAI Phoenix 1.0 is a professional-grade AI image model designed for realistic, cinematic, and highly detailed visuals.
It excels at interpreting complex prompts, rendering text within images, and creating high-resolution outputs suitable for editorial, commercial, or creative projects.
gpt-5-nano (Text to Text): GPT-5 Nano is a lightweight, high-speed language model from the GPT-5 family designed for instant text generation. It delivers intelligent, context-aware responses for creative writing, summarization, dialogue, code generation, and automation, all at low latency and cost. Perfect for chatbots, assistants, content tools, and real-time applications that need fast, reliable text output.
leonardoai-motion-2.0 (Image to Video): Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics.
higgsfield-soul-image-to-image (Image to Image): SOUL is an AI image model focused on hyper-realistic, magazine or editorial-style visuals, especially for fashion, portraits, lifestyle, and commercial content. It offers over 50 curated style presets to get a specific aesthetic without needing complicated prompt engineering. It generates photography-quality images with lighting, textures, and context that feel real, including natural imperfections like film grain, dust, or lens effects for authenticity.
veo3.1-reference-to-video (Image to Video): Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips.
It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.
higgsfield-dop-image-to-video (Image to Video): Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects (such as explosions, fire, distortion, disintegration, and transitions) directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline.
remix-video (Video to Video): Transform and resize your videos effortlessly with the Remix Video tool.
openai-sora-2-pro-storyboard (Text to Video): Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video (setting, characters, actions, timing), and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow.
veo3.1-extend-video (Text to Video): Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene, keeping visual style, characters, motion, and audio consistent. This model requires the task_id of the original video.
gpt-5-mini (Text to Text): GPT-5 Mini is a compact yet powerful AI that converts plain text ideas into detailed, structured prompts suitable for use in text-to-image, text-to-video, and other generative AI models. It’s perfect for creators who want to quickly craft high-quality prompts without manually thinking about style, composition, and descriptive details. The model helps accelerate workflows for artists, video producers, and designers.
seedance-pro-i2v-fast (Image to Video): Seedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series.
With this model you upload a reference image and, using a text prompt, generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround, making it ideal for social content, product motion, storytelling from a still, and fast prototyping.
seedance-pro-t2v-fast (Text to Video): Seedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.
ltx-2-pro-text-to-video (Text to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
ltx-2-fast-image-to-video (Image to Video): LTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.
vidu-q2-reference (Image to Video): Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images.
Each image refines the model’s understanding of subject, environment, and visual tone, ensuring perfect consistency in appearance and motion across every frame.
vidu-q2-turbo-start-end-video (Image to Video): Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.
vidu-q2-pro-start-end-video (Image to Video): Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.
minimax-hailuo-2.3-pro-t2v (Text to Video): Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.
minimax-hailuo-2.3-standard-i2v (Image to Video): Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence. It generates video at 768p.
minimax-hailuo-2.3-standard-t2v (Text to Video): Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. It generates video at 768p.
minimax-hailuo-2.3-fast (Image to Video): Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency.
It generates video at 768p.
kling-v2.5-turbo-std-i2v (Image to Video): Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
reve-text-to-image (Text to Image): Generate images from text prompts using Reve's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
reve-image-edit (Image to Image): ReVE Edit is a next-generation image editing model that allows users to apply detailed visual transformations through natural language. Whether you want to restyle portraits, modify backgrounds, or create artistic reinterpretations, ReVE Edit delivers realistic and coherent results while preserving structure and identity.
grok-imagine-text-to-image (Text to Image): Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity. Each request returns 6 images.
seedvr2-image-upscale (Image to Image): SeedVR2 is a one-step diffusion-transformer model designed for image restoration, super-resolution, deblurring, and artifact removal. It enhances low-quality or compressed images into clean, sharp, high-resolution results while preserving natural colors and fine details.
qwen-image-edit-plus-lora (Image to Image): Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor supporting multi-image edits, single-image consistency, and native ControlNet. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
nano-banana-pro-edit (Image to Image): Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image).
It offers advanced image-editing capabilities with improved resolution.
kling-o1-edit-image (Image to Image): Kling O1 Image Edit applies targeted transformations to an existing image while preserving composition, lighting, and visual consistency. Use it to replace objects, retouch elements, change materials, or apply stylistic shifts with high fidelity and minimal artifacts.
kling-o1-image-to-video (Image to Video): Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames.
kling-o1-reference-to-video (Image to Video): Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation.
kling-o1-video-edit-fast (Video to Video): Video Edit Fast is the lightweight, high-speed editing mode of Kling O1. It performs quick edits on an existing video without heavy processing, making it ideal for fast object replacements, light enhancements, color tweaks, or simple visual adjustments. This mode focuses on speed over complex reconstruction, making it suitable for rapid iterations, previews, and small edits while preserving the original video’s motion and structure.
flux-2-dev (Text to Image): Flux 2 Dev is a powerful text-to-image diffusion model designed for high-quality, fast, and highly detailed visual generation. It excels at creating cinematic lighting, vibrant compositions, surreal concepts, characters, products, and worlds with strong prompt following and artistic control.
Ideal for rapid image ideation, visual storytelling, and concept art.
flux-2-pro-edit (Image to Image): Flux-2-Pro Edit enables precise, high-fidelity modifications to an existing image while preserving its lighting, style, mood, and composition. It’s ideal for replacing objects, altering materials, adjusting environmental elements, or performing stylistic transformations without damaging the original scene’s quality. Flux-2-Pro maintains ultra-detailed textures and cinematic realism during edits.
bytedance-seedream-v4.5-edit (Image to Image): Seedream-v4.5 Edit allows you to transform an existing image using natural-language instructions. It preserves the core composition, lighting, and style of the original while modifying only the requested elements, perfect for object replacement, environment changes, stylistic adjustments, and high-detail creative reworks.
kling-v2.6-pro-i2v (Image to Video): Kling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.
kling-v2-avatar-standard (Audio to Video): AI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.
wan2.2-spicy-image-to-video (Image to Video): Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.
minimax-speech-2.6-hd (Text to Audio): Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio.
It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.
minimax-speech-2.6-turbo (Text to Audio): Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.
seedance-v1.5-pro-video-extend-fast (Video to Video): Seedance v1.5 Pro Video Extend Fast quickly extends an existing video by generating a short continuation that matches the original style, motion, and lighting. This mode prioritizes fast output and smooth continuity with minimal new motion, making it ideal for previews, quick edits, and lightweight shot extensions without complex effects.
gpt-image-1.5-edit (Image to Image): GPT-Image-1.5 Edit applies precise, instruction-based modifications to an existing image while preserving composition, lighting, perspective, and visual coherence. It’s well-suited for object replacement, concept evolution, symbolic edits, and creative transformations that feel natural and intentional rather than destructive.
grok-imagine-image-to-image (Image to Image): Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized.
ltx-2-19b-text-to-video (Text to Video): LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity. It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it.
flux-2-klein-4b (Text to Image): Flux-2-Klein-4B is a lightweight, fast text-to-image model optimized for clear subject rendering, good prompt adherence, and efficient generation.
It works best with simple compositions, everyday scenes, and cute or friendly visuals, making it ideal for UI graphics, demos, thumbnails, mascots, and quick creative iterations.
flux-2-klein-9b (Text to Image): Flux-2-Klein-9B is a mid-size text-to-image model that balances detail quality and generation speed. It handles richer lighting, better textures, and more nuanced scenes than smaller variants, while still working well with clear, grounded prompts. Ideal for polished illustrations, product visuals, mascots, and everyday scenes with character.
z-image-p (Text to Image): Z-Image P is based on PiAPI's Qubico/z-image text-to-image model.
openai-sora-2-standard-image-to-video (Image to Video): OpenAI Sora 2 Standard Image to Video model (High Priority). Generate stunning 10s videos from an image and text prompt.
flux-2-klein-9b-turbo-edit (Image to Image): Flux-2-Klein-9B Turbo Edit offers high-quality, ultra-fast image editing with superior detail retention. This high-efficiency version of Klein 9B Edit handles lighting and textures with precision while delivering edits much faster than the standard variant. Best for polished character edits and professional refinements where speed is critical.

292 Models Found

One API for All AI Models
Lowest cost API for image, video and audio generation.

Build & Scale Generative AI Workflows
The fastest way to experiment, build and deploy Generative AI apps. Access 20+ optimized models with a single API.

Real World Use Cases
Explore how top developers are leveraging our workflows to build next-generation applications.

Social Media
Generate production-ready assets for social media automatically. Scale your content creation with custom workflows.
Instagram Viral: Boost engagement with AI-generated visuals.
TikTok Scripts: Write scripts that hook viewers in seconds.

E-Commerce
Generate production-ready assets for e-commerce automatically.
Scale your content creation with custom workflows.
Product Shots: Studio-quality product photography with AI.
Model Swap: Change models in photos instantly.

Marketing
Generate production-ready assets for marketing automatically. Scale your content creation with custom workflows.
Email Copy: High-converting email sequences.
Blog Writer: SEO-optimized blog posts.

Advertising
Generate production-ready assets for advertising automatically. Scale your content creation with custom workflows.
Facebook Ads: Winning ad creatives made easy.
Google Ads: Search and display ad generator.

Fashion
Generate production-ready assets for fashion automatically. Scale your content creation with custom workflows.
Design Generator: Create new clothing patterns.
Virtual Try-On: Visualize clothes on any avatar.

Featured Models
openai-sora-2-pro-image-to-video (Image to Video, $3): Sora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.
veo3.1-image-to-video (Image to Video, $2.5): Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.
nano-banana-edit (Image to Image, $0.03): Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.
openai-sora-2-image-to-video (Image to Video, $1.5): Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects.
While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.
veo3.1-text-to-video (Text to Video, $2.5): Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.
midjourney-v7-text-to-image (Text to Image, $0.03): Midjourney V7 produces high-quality, stylized images from text prompts. Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations.
veo3.1-fast-text-to-video (Text to Video, $0.6): Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.
midjourney-v7-image-to-image (Image to Image, $0.03): Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, which is great for alternate versions, art variations, or polishing concepts.
midjourney-v7-image-to-video (Image to Video, $0.15): Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion.
nano-banana (Text to Image, $0.03): Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing.
It produces hyper-realistic, physics-aware visuals with seamless style transformations.

Frequently Asked Questions
Everything you need to know about building with MuAPI.

What is MuAPI?
MuAPI is a comprehensive platform for building and scaling Generative AI applications. We provide access to 20+ optimized AI models through a single, unified API, making it easy to experiment, build, and deploy AI-powered features.

How does the pricing work?
We offer a flexible pay-as-you-go pricing model, so you only pay for what you use. There are no monthly subscription fees for access, and models are priced competitively per generation. You can view detailed costs in our documentation.

Can I use generated assets for commercial projects?
Yes! All assets generated through MuAPI (images, videos, text) are yours to use for both personal and commercial projects. You retain full ownership of the content you create.

How fast is the integration?
With our unified API, you can integrate multiple AI models in minutes using a single standardized interface. We provide SDKs and comprehensive documentation to get you started immediately.

Which AI models are supported?
We support a wide range of state-of-the-art models including FLUX, Midjourney (via proxy), Runway Gen-2, Stable Diffusion, OpenAI DALL-E 3 & Sora (preview), and many others for image, video, and text generation.

What's New

🌊 Seedance 2.0 is Now Live on MuAPI!
Seedance 2.0 has officially arrived on MuAPI, bringing smoother motion, enhanced realism, and next-level cinematic video generation. And the best part? It’s available exclusively on MuAPI. If you want access to Seedance 2.0’s latest capabilities, MuAPI is the only place to get it.

🍌 Nano-Banana 2 is Now Live on MuAPI – 33% Cheaper and 4× Faster!
Google’s Nano-Banana 2 is now available on MuAPI, offering powerful image generation at an even more affordable price.
It’s 33% cheaper than Nano-Banana Pro and 4× faster, making it the perfect choice for creators who want high-quality visuals with maximum speed and efficiency. Create more, faster, and at lower cost with Nano-Banana 2 on MuAPI.

🍌 Nano Banana Pro is Now Live on MuAPI – 4K Images at 30% OFF!
Experience Google’s upgraded Nano Banana Pro model on MuAPI, now supporting stunning 4K image generation. Enjoy ultra-sharp visuals, richer detail, and professional-grade output. And for a limited time, get it at an exclusive 30% discount, making high-resolution AI creativity more affordable than ever.

🎥 Veo 3.1 is Now Live on MuAPI – Just $0.6 per Video!
Generate cinematic-quality videos faster than ever with Google’s Veo 3.1, now available on MuAPI at only $0.6 per video. Enjoy first and last frame control, seamless motion, and stunning realism at the most affordable price. Perfect for creators, marketers, and storytellers looking for high-quality AI video at scale.

🌌 OpenAI Sora 2 is Now Live on MuAPI – Just $0.25 per Video!
Create cinematic, high-quality videos with OpenAI’s latest Sora 2 model, now available on MuAPI at only $0.25 per video. Experience cutting-edge motion, realism, and storytelling power at the most affordable price. Perfect for creators, filmmakers, and innovators who want top-tier video generation at scale.

🍌 Nano Banana Now on MuAPI – Just $0.03 per Image!
Google’s Nano Banana image model is now live on MuAPI at the unbeatable price of only $0.03 per image. Generate stunning visuals or seamlessly edit existing ones with speed, precision, and affordability. The perfect balance of creativity and cost-efficiency is here!

⚡ Veo 3 Fast Now on MuAPI – Just $0.6 per Video!
Experience cinematic AI video generation with Google’s Veo 3 Fast at the lowest cost ever: only $0.6 per video on MuAPI. Create stunning visuals in seconds with blazing speed and unmatched affordability.
Perfect for creators who want quality videos without breaking the bank.Learn MoreGot it --- hidream-i1-devText to ImageOptimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.veo3-image-to-videoImage to VideoVEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.wan2.1-text-to-imageText to ImageWAN 2.1 is a powerful AI model that transforms text prompts into high-resolution, photorealistic images. It excels at detailed object rendering, realistic lighting, and fine textures, making it ideal for visual content, concept art, advertising, and digital storytelling.ai-video-effectsImage to VideoAI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.motion-controlsImage to VideoMotion Controls adds dynamic camera movements, speed ramps, and zoom effects to bring your images to life as smooth, engaging videos.vfxImage to VideoVFX delivers high-impact visual effects like explosions, particles, and cinematic overlays to transform static images into action-packed videos.veo3-text-to-videoText to VideoVEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.flux-kontext-max-t2iText to ImageFlux Kontext Max T2I delivers photorealistic or cinematic-quality images with exceptional detail. It's optimized for high-end visuals — from realistic humans to polished product renders.runway-text-to-videoText to VideoGenerate short, high-quality videos from plain text prompts. 
RunwayML’s text-to-video model interprets your written description and animates it into a moving visual scene with realistic or stylized motion.suno-extend-musicText to AudioThis API extends audio tracks while preserving the original style of the audio track. It includes Suno's upload functionality, allowing users to upload audio files for processing. The expected result is a longer track that seamlessly continues the input style.hunyuan-text-to-videoText to VideoHunyuan T2V generates detailed and dynamic videos from text prompts with a focus on realism and coherent motion. It handles multi-object scenes, human actions, and cinematic compositions effectively, making it ideal for storytelling and visual concepts.veo3-fast-text-to-videoText to VideoVEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.ai-product-shotImage to ImageInstantly generate studio-quality product images with AI. Upload your item photo and get clean, stylized shots perfect for e-commerce, ads, and catalogs.gpt4o-image-to-imageImage to ImageTransform an input image based on a new prompt — like changing style, lighting, or composition. Useful for reinterpreting visuals while keeping structure.hunyuan-image-to-videoImage to VideoHunyuan I2V takes a static image and generates realistic video animations by interpreting motion and context. It works well for human portraits, objects, or scenes, adding lifelike movement while maintaining the image's integrity.ai-video-face-swapVideo to VideoReplace faces in videos with stunning realism. Our AI ensures accurate expression transfer, lighting consistency, and smooth frame-by-frame blending.hunyuan-fast-text-to-videoText to VideoHunyuan Fast T2V provides accelerated video generation from text prompts with slightly reduced detail but excellent speed. 
Ideal for rapid prototyping, concept testing, and short-form ideas where time is critical.runway-aleph-v2vVideo to VideoTransform any input video into a new visual style or scene while preserving motion and structure. Aleph V2V lets you apply artistic looks, cinematic lighting, or thematic changes to existing footage.minimax-image-01-subject-referenceImage to ImageMinimax’s I2I “Subject Reference” model enables you to transform images while preserving the appearance of a subject using a single reference image. Ideal for maintaining character likeness—features, clothing, or expression—across different styles or settings.ai-product-photographyImage to ImageCreate professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in secondsbytedance-seededit-v3Image to ImageSeededit allows precise edits to images using masks and prompt guidance. Whether you're replacing backgrounds, changing clothing, or inpainting missing areas, Seededit ensures realistic, high-quality results with semantic control.ai-background-removerImage to ImageInstantly remove image backgrounds with pixel-perfect precision. Ideal for product photos, profile pictures, and creative projects.ai-image-upscalerImage to ImageTransform blurry or pixelated images into high-definition visuals. Our AI Image Upscaler uses deep learning to reconstruct details and bring your visuals to life.wan2.2-image-to-videoImage to VideoWan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.runway-act-two-i2vImage to VideoUpload a single character image and a driving video — the model transfers facial expressions and head movements from the video onto your image, bringing it to life. 
It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally. Ideal for avatars, AI presenters, digital actors, and story scenes.nano-banana-effectsImage to ImageNano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.pixverse-v4.5-i2vImage to VideoUpload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements. Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos.ai-image-face-swapImage to ImageAdvanced facial recognition and blending algorithms enable precise face swaps while preserving skin tone, lighting, and facial geometry.midjourney-v7-omni-referenceImage to ImageMidjourney's Omni Reference lets you reuse characters, creatures, or styles from an existing image and place them into entirely new scenes. Simply provide a reference image (oref) and Midjourney will maintain identity, details, and visual consistency — ideal for storytelling, character design, or branding across multiple generations.ideogram-v3-t2iText to ImageIdeogram v3 is an advanced text-to-image model designed for creating highly detailed and visually striking images directly from text prompts. It’s especially good for artistic compositions, design mockups, concept art, and photorealistic scenes. With strong support for text rendering inside images, it’s widely used for posters, typography-based art, and creative branding.ai-dress-changeImage to ImageInstantly change outfits in images using AI. 
Visualize different clothing styles without the need for physical trials—perfect for fashion, e-commerce, and virtual try-ons.grok-imagine-image-to-videoImage to VideoGrok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into short (≈6 second) cinematic videos with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.image-effectsImage to ImageAI Image Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning images from a image.mmaudio-v2-text-to-audioText to AudioConvert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.mmaudio-v2-video-to-videoVideo to VideoMMAudio-v2 generates high-quality, synchronized audio from video or text inputs. Seamlessly integrate it with AI video models to create fully-voiced, expressive video content.wan2.1-image-to-videoImage to VideoAnimate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene — great for bringing art, characters, or photos to life with smooth motion and consistent style.sync-lipsyncAudio to VideoGenerate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.google-imagen4-fastText to ImageImagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.openai-sora-2-pro-image-to-videoImage to VideoSora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. 
Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.

runway-act-two-v2v (Video to Video): Take an existing character video and sync it with the motion from a reference video. This lets you update facial expressions, head turns, and speech gestures while keeping the original look and style. It’s perfect for reshooting performances, dubbing, or animating characters without re-rendering visuals.

luma-modify-video (Video to Video): Luma Modify Video lets you transform an existing video into a new creative scene while keeping the original motion and timing intact. The result is a new video with the same movements but a completely fresh look, atmosphere, or theme.

pixverse-v4.5-t2v (Text to Video): PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles, great for creative storytelling, trailers, and animated concepts.

veo3.1-image-to-video (Image to Video): Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.

seedance-pro-t2v (Text to Video): Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes. Best for storytelling, content creation, and visual production.

nano-banana-edit (Image to Image): Nano Banana is a mysterious, high-performance image model.
It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.

infinitetalk-video-to-video (Video to Video): InfiniteTalk Video-to-Video enhances or transforms existing videos by syncing the subject’s lip movements and facial expressions with new dialogue or speech. Instead of starting from a still image, you provide a video clip, and the model seamlessly reanimates the speaker’s mouth and expressions to match the script.

openai-sora-2-image-to-video (Image to Video): Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.

veo3.1-text-to-video (Text to Video): Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.

chroma-image (Text to Image): Chroma Image is an advanced text-to-image generation model designed for high-quality, creative, and versatile visuals. It can produce anything from photorealistic portraits and products to imaginative concept art, fantasy illustrations, and cinematic scenes.

wan2.2-animate (Video to Video): Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload both an image (for the character) and a video containing motion/expression, and the model generates a video where the character in your image moves like the reference.
Supports 480p or 720p, up to 120 seconds.

openai-sora-2-text-to-video (Text to Video): Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.

minimax-hailuo-2.3-pro-i2v (Image to Video): Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame, delivering smooth, film-grade clips.

ai-skin-enhancer (Image to Image): Smooth skin, reduce blemishes, and enhance complexion with natural-looking results. Perfect for portraits, selfies, and professional photo retouching.

wan2.5-image-to-video-fast (Image to Video): Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.

ai-clipping (Video to Video): Convert long-form videos into engaging short clips using AI clipping.

ltx-2-19b-lipsync (Audio to Video): LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration.

ai-captions (Video to Video): Add AI-generated animated captions to any video using Vadoo's caption engine. Supports multiple languages and viral caption themes like Hormozi style.
Perfect for social media creators, marketers, and content producers.

midjourney-v7-text-to-image (Text to Image): Midjourney V7 produces high-quality, stylized images from text prompts. Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations.

ai-color-photo (Image to Image): Automatically add lifelike colors to black-and-white images. Our AI brings history to life with natural tones, accurate shading, and context-aware colorization.

flux-dev (Text to Image): Generate stunning visuals from simple text prompts. Flux Dev transforms your ideas into high-quality, creative images using powerful AI vision models. Perfect for design, storytelling, concept art, and marketing.

hidream-i1-fast (Text to Image): Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.

openai-sora-2-pro-text-to-video (Text to Video): Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.

veo3-fast-image-to-video (Image to Video): Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality, powered by Google's VEO3 on MuAPI.

flux-kontext-dev-t2i (Text to Image): Generates an image from a text prompt, with optional reference image for pose or style guidance. Ideal for controlled, consistent image creation using just a description.

seedance-lite-t2v (Text to Video): Seedance Lite T2V offers quick video generation from text with decent visual quality and motion.
Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail.

kling-o1-text-to-video (Text to Video): Kling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips. It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering.

seedance-v2.0-extend (Text to Video): Seedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension; the model preserves visual style, motion, characters, and audio consistency across the new segment.

hidream-i1-full (Text to Image): The most advanced version of HiDream I1, delivering high-resolution, detailed images with superior prompt understanding. Best suited for production, content creation, and high-fidelity applications.

flux-kontext-pro-i2i (Image to Image): Flux Kontext Pro I2I variant enables transforming base images into refined artwork while keeping structure intact. It’s useful for sketch refinement, visual style changes, and creative edits such as re-dressing, relighting, or re-theming with prompt guidance.

ai-ghibli-style (Image to Image): Bring your imagination to life with art inspired by the enchanting world of Studio Ghibli. This AI model generates dreamy, hand-drawn visuals with soft colors, whimsical characters, and painterly backgrounds.

wan2.1-lora-t2v (Training): WAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules.
Tailor the generation to specific characters, outfits, or animation styles, ideal for brand storytelling, fan content, and stylized animations.

portrait-stylist (Image to Image): Professional AI portrait styles including hair, makeup, style, and fashion transformations.

kling-v2.1-pro-i2v (Image to Video): Kling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects.

veo3.1-fast-text-to-video (Text to Video): Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.

qwen-image-2.0-pro-edit (Image to Image): Qwen 2.0 Pro Image Edit model with maximum precision and modifications.

ai-anime-generator (Text to Image): Create stunning anime-style artwork instantly with our AI Anime Generator. Customize characters, scenes, and styles effortlessly in seconds!

z-image-turbo (Text to Image): Z-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.

kling-v2-avatar-pro (Audio to Video): AI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video.
It preserves identity, lip syncs accurately to the audio, adds natural head movement, eye motion, expressions, and cinematic lighting.

openai-sora-2-standard-text-to-video (Text to Video): OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts.

ai-image-extension (Image to Image): Expand the edges of any image with AI. This model continues your original photo or artwork beyond its borders while matching style, lighting, and content.

wan2.1-lora-i2v (Training): Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency. Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA.

pixverse-v5.5-i2v (Image to Video): PixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip. It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.

heygen-video-translate (Video to Video): Convert any video into 175+ languages with synchronized voice translation, AI-voice cloning, and accurate lip sync. Just upload your video (or provide a link), select a target language, and HeyGen recreates the speech in that language. $0.05 per second.

wan2.2-spicy-video-extend (Video to Video): Wan-2.2-spicy Video Extend continues an existing video by generating new frames that match the original style but add stronger motion, bolder effects, and spicier dramatics.

kling-o1-standard-reference-to-video (Image to Video): Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation.
This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes.

qwen-image-edit-2511 (Image to Image): Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts.

add-image-watermark (Image to Image): Add a custom watermark to images with adjustable position, opacity, and size. Free local processing using PIL.

kling-v2.6-pro-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation. Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.

ai-object-eraser (Image to Image): Easily remove unwanted objects, people, or text from any image using AI. Just select the area you want to erase, and the model will intelligently fill the space with realistic background matching the surrounding environment. No Photoshop skills needed.

runway-image-to-video (Image to Video): Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics.

suno-create-music (Text to Audio): Suno generates music, turning text prompts into full songs complete with vocals, lyrics, and instrumentation. You can describe a mood, genre, or even a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.

z-image-base (Text to Image): Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts.
It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts.

flux-kontext-pro-t2i (Text to Image): Flux Kontext Pro T2I offers fast and reliable generation with creative flexibility. It supports stylized prompts, character design, and fantasy themes while maintaining clear subject coherence.

flux-krea-dev (Text to Image): Flux Krea Dev is a text-to-image model built by Black Forest Labs in collaboration with Krea AI, designed to generate highly photorealistic images that avoid the common 'AI look' artifacts (plastic skin, overexposed lighting, synthetic textures). It emphasizes real texture, natural lighting, and aesthetic control.

flux-kontext-max-i2i (Image to Image): Flux Kontext Max I2I in Max mode allows precise image enhancement and visual transformations while retaining the source layout. It’s powerful for retouching, photo-to-art workflows, and concept refinement.

tiktok-carousel (Text to Image): AI TikTok Carousel Generator: create viral TikTok carousel posts from a single text prompt. Choose a proven storytelling format (Problem-Solution, Listicle, Tutorial, Before & After), set your slide count (3-10), and get stunning AI-generated images at 1080x1920 portrait resolution, ready to upload to TikTok.

bytedance-seedream-v5.0-edit (Image to Image): Seedream 5.0 Lite Edit is an advanced image transformation model by ByteDance, enabling precise, controllable edits using natural language. It specializes in high-fidelity style transfer (Anime, Cyberpunk, Fantasy), background swaps, and object modification while preserving original lighting, color tones, and character consistency for professional-grade creative reworks.

gpt4o-text-to-image (Text to Image): Generate images from text prompts using GPT-4o's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.

gpt4o-edit (Image to Image): Edit a specific part of an image using natural language.
Ideal for object removal, replacement, or content-aware filling.

kling-o1-standard-image-to-video (Image to Video): Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation. This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.

wan2.1-text-to-video (Text to Video): WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements, all from just text.

perfect-pony-xl (Text to Image): Pony XL is a high-quality image generation model based on Stable Diffusion XL architecture. It specializes in character art, hybrid styles, and producing detailed, polished visuals even with simpler prompts.

wan2.2-speech-to-video (Audio to Video): WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines.

flux-2-klein-4b-turbo (Text to Image): Flux-2-Klein-4B Turbo is an ultra-fast, high-efficiency text-to-image model. It is a distilled version of the Klein 4B model, designed for near-instant rendering while maintaining impressive adherence to prompts. Perfect for rapid prototyping, real-time creative tools, and applications where speed is paramount.

seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference: generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips.
Use @image1, @video1, @audio1 syntax in your prompt.

midjourney-v7-image-to-image (Image to Image): Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, great for alternate versions, art variations, or polishing concepts.

flux-2-klein-9b-turbo (Text to Image): Flux-2-Klein-9B Turbo is a high-performance, mid-size text-to-image model. This distilled variant of Klein 9B provides a superior balance of speed and detail, delivering richer textures and complex scenes with significantly reduced generation times. Ideal for polished illustrations and character-rich visuals where performance is key.

grok-imagine-text-to-video (Text to Video): Grok Imagine is xAI’s fast, creative text-to-video model that generates short (~6-second) cinematic clips with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.

ltx-2.3-video-extend (Video to Video): LTX-2.3 Video Extend seamlessly continues an existing video clip by generating additional frames that match the original motion, style, and scene composition. Powered by the LTX-2.3 architecture, it maintains temporal coherence and visual fidelity across the extension boundary.

seedance-lite-reference-video (Image to Video): Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video.

bytedance-seedream-v3 (Text to Image): Seedream is designed for generating visually rich and artistic images from text prompts. It excels at fantasy, anime, surrealism, and vibrant color compositions, ideal for creative visuals, storyboards, and concept art.

kling-v2.1-master-i2v (Image to Video): Kling 2.1 Master’s I2V animates a still image into a coherent video sequence.
It interprets motion, environment, and context to create realistic, visually stunning video outputs, ideal for animating portraits, scenes, or concept art.

flux-kontext-effects (Image to Image): Flux Kontext Effects is a creative image and video model that applies stylized transformations, cinematic filters, and artistic reinterpretations to your inputs. Instead of generating new content from scratch, it enhances or reimagines existing images and videos with unique looks, ranging from surreal effects to realistic cinematic moods.

kling-v2.1-standard-i2v (Image to Video): Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame. It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations, ideal for portraits, digital art, and cinematic illustrations.

qwen-image (Text to Image): Generate high-quality, detailed images from text prompts in various styles, from realistic to artistic, perfect for creative visuals, product shots, and concept art.

midjourney-v7-style-reference (Image to Image): Generate images in the distinctive aesthetic of Midjourney v7, blending cinematic depth, photorealism or painterly rendering, rich textures, and dynamic lighting. This style reference model helps you infuse any subject with the visual storytelling, composition, and high detail fidelity that Midjourney is known for. Ideal for concept art, stylized portraits, and stunning environment scenes.

openai-sora (Text to Video): Sora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.

veo3.1-4k-video (Text to Video): Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications.
It enhances visual fidelity while maintaining temporal coherence and realistic motion.

flux-pulid (Image to Image): Flux PuLID is an innovative image-to-image model that enables consistent face rendering across different styles or scenes, without needing any model fine-tuning. By providing a reference image (e.g., a portrait), the model generates new visuals while maintaining your subject’s identity with high fidelity.

wan2.5-text-to-video-fast (Text to Video): Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.

hunyuan-image-3.0 (Text to Image): Hunyuan Image 3.0 brings together powerful architecture (Mixture-of-Experts + autoregressive style) to produce richly detailed and coherent images from complex prompts. It can read narrative descriptions, render text and signage cleanly, and support multiple visual styles, from photorealism to illustrations.

kling-o1-text-to-image (Text to Image): Kling O1 Text-to-Image is a high-fidelity creative image model that converts rich natural-language prompts into ultra-detailed stills.
It excels at cinematic composition, realistic lighting, and coherent scene detail, great for concept art, environment renders, character portraits, and stylized imagery with photoreal or illustrative looks.

latent-sync (Audio to Video): LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.

video-effects (Image to Video): AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.

creatify-lipsync (Audio to Video): Realistic lipsync video, optimized for speed, quality, and consistency.

nano-banana-2 (Text to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

qwen-image-edit (Image to Image): The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements).

ltx-2-pro-image-to-video (Image to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.

photo-pack (Image to Image): Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.

vidu-q1-reference (Image to Video): Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references (up to seven images) and text prompts.
Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements.

wan2.2-5b-fast-t2v (Text to Video): Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.

minimax-hailuo-02-standard-i2v (Image to Video): Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews.

wan2.2-text-to-video (Text to Video): Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.

flux-2-flex-edit (Image to Image): Flux-2-Flex Edit allows flexible transformation of an existing image: object replacement, material changes, lighting adjustments, style shifts, or localized edits. It preserves the original scene’s geometry, perspective, and lighting while modifying only what the edit prompt specifies.

ideogram-v3-reframe (Image to Image): Ideogram V3 Reframe is a specialized image-to-image model built on Ideogram 3.0, designed to intelligently extend and adapt images across diverse aspect ratios and resolutions. Leveraging advanced AI outpainting, it preserves visual consistency while enabling creative reframing for digital, print, and video content.

veo3.1-fast-image-to-video (Image to Video): Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.

kling-o1-standard-video-edit (Video to Video): Kling O1 Standard Video-to-Video Edit modifies an existing video while preserving its original structure, motion, and realism.
It is designed for subtle, stable edits such as object replacement, background changes, lighting adjustments, or small visual tweaks. This mode prioritizes temporal consistency and natural motion.

flux-2-pro (Text to Image): Flux-2-Pro Text-to-Image is a premium, high-fidelity generative model capable of producing ultra-realistic, cinematic, and deeply detailed images from text prompts. It excels at complex lighting, layered compositions, surreal visual concepts, and professional art-grade rendering suitable for concept art, advertising visuals, and world-building.

minimax-hailuo-02-standard-t2v (Text to Video): Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality.

seedance-2.0-watermark-remover (Video to Video): 🎉 FREE for a limited time: remove Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits are deducted, but a positive balance is required for access.

flux-2-flex (Text to Image): Flux-2-Flex Text-to-Image is a flexible, high-fidelity generative model capable of producing detailed, imaginative, and stylistically rich scenes from text alone. It excels at surreal concepts, fantasy environments, sci-fi structures, cinematic atmospheres, and high-resolution artistic compositions with strong prompt adherence.

midjourney-v7-image-to-video (Image to Video): Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion.

flux-schnell (Text to Image): Flux Schnell is a lightning-fast image generation model designed for rapid iterations.
It delivers good visual quality from text prompts almost instantly, making it perfect for real-time concept testing, brainstorming, and UI-integrated experiences.

vidu-v2.0-t2v (Text to Video): Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation.

ai-dance-effects (Video to Video): Bring your characters and worlds to life with AI Dance Effects, a creative video effect that adds playful, dynamic, and cinematic motion to your generations. AI Dance Effects lets you guide how characters move, react, and express themselves.

bytedance-seedream-v4.5 (Text to Image): Seedream-v4.5 is ByteDance’s advanced text-to-image diffusion model designed for generating high-detail, high-contrast, cinematic and stylized images. It excels at surreal fantasy concepts, sci-fi worlds, product visuals, photoreal scenes, and artistic compositions with strong prompt adherence and crisp detail.

kling-v2.6-pro-t2v (Text to Video): Kling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.

flux-2-klein-4b-turbo-edit (Image to Image): Flux-2-Klein-4B Turbo Edit provides ultra-fast, instruction-based image editing. This high-efficiency variant of Klein 4B Edit is optimized for near-instant swaps and tweaks while preserving layout and lighting. Ideal for real-time design tools and quick creative adjustments.

openai-sora-2-pro-characters (Text to Text): Create consistent AI characters for your Sora 2 videos. Provide a previous video's task ID and a prompt to define or refine your character.

nano-banana (Text to Image): Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing.
It produces hyper-realistic, physics-aware visuals with seamless style transformations.

pixverse-v5-t2v (Text to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.

wan2.6-image-to-video (Image to Video): WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.

google-imagen4 (Text to Image): Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.

google-imagen4-ultra (Text to Image): Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.

wan2.6-text-to-image (Text to Image): WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts. It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery.

wan2.1-reference-video (Image to Video): WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video.
By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.
qwen-image-2.0 (Text to Image): Qwen 2.0 Text to Image model with enhanced realism.
veed-lipsync (Audio to Video): Generate realistic lipsync from any audio using VEED's latest model.
sdxl-image (Text to Image): SDXL is a high-quality, large Stable Diffusion model for creating photorealistic and stylized images from text. It excels at fine detail, realistic lighting, and complex scenes.
infinitetalk-image-to-video (Audio to Video): InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos. You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech.
luma-flash-reframe (Video to Video): Transform and resize your videos effortlessly with Ray 2 Flash Reframe. This tool intelligently expands or adjusts your video’s aspect ratio, adding visually consistent content to the sides, top, or bottom, without altering the original subject.
ai-video-upscaler (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
flux-redux (Image to Image): Flux Redux is a transformation model that reimagines or enhances your input images while preserving their main structure and subject.
It’s built for creative refinement, whether you want style transfer, artistic reinterpretation, cinematic polish, or mood transformation.
qwen-image-2.0-pro (Text to Image): Qwen 2.0 Pro Text to Image model with maximum realism and fidelity.
seedance-v1.5-pro-i2v-fast (Image to Video): Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation.
seedance-v1.5-pro-video-extend (Video to Video): Seedance v1.5 Pro Video Extend continues an existing video by generating additional frames that match the original scene’s style, lighting, motion, and mood. It is designed for smooth temporal consistency, making it ideal for extending cinematic shots, atmospheric scenes, or slow camera moves without introducing visual jumps or style changes.
vidu-v2.0-i2v (Image to Video): Vidu's 2.0 model delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content.
wan2.2-edit-video (Video to Video): Easily modify existing videos using simple text commands. With Wan 2.2 Video-Edit, you can change attire, character appearance, or other visual elements directly within your video, with no need to start from scratch. Works on uploads of 480p or 720p, for up to two minutes.
nano-banana-2-edit (Image to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
kling-v1-avatar-pro (Audio to Video): Kling AI Avatar Pro is the premium tier for making high-quality talking avatars.
You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync.
ovi-image-to-video (Image to Video): Ovi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio / sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.
ltx-2.3-lipsync (Audio to Video): LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency, powered by the upgraded LTX-2.3 architecture.
kling-v3.0-std-motion-control (Video to Video): Kling V3.0 Standard Motion Control allows for precise control over the camera and subject movement in generated videos. Powered by the latest Kling V3.0 architecture for improved temporal consistency and quality.
ovi-text-to-video (Text to Video): Ovi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally. Videos are generated in 540p resolution.
seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.
kling-o1-video-edit (Video to Video): Kling O1 Video Edit lets you send an existing video clip plus an instruction/prompt to edit or transform the clip while preserving temporal coherence and subject identity.
Typical edits include color grading, background replacement, object removal, slow motion, speed ramps, style transfer, subtle camera stabilization, and short extension/outro generation. Inputs can include: the source video, an optional frame mask (for localized edits), time range, and style/reference images.
vidu-q2-reference-to-image (Image to Image): VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.
minimax-hailuo-02-pro-i2v (Image to Video): Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling.
bytedance-seedream-v5.0 (Text to Image): Seedream 5.0 Lite is ByteDance’s next-generation text-to-image model, delivering high-fidelity AI art with advanced visual reasoning and precise typography. Supporting up to 4K resolution and cinematic detail, it excels at complex scene construction, consistent character generation, and real-time knowledge integration for accurate, contextually relevant visuals.
ltx-2-fast-text-to-video (Text to Video): LTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals. The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.
wan2.6-text-to-video (Text to Video): WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts.
It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.
qwen-text-to-image-2512 (Image to Image): Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts. It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, fantasy and sci-fi worlds.
kling-v3.0-pro-image-to-video (Image to Video): Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting.
any-llm (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
qwen-image-2.0-edit (Image to Image): Qwen 2.0 Image Edit model with precise background modification and enhancements.
kling-v3.0-standard-text-to-video (Text to Video): Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments.
kling-v2.6-std-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.
ltx-2.3-image-to-video (Image to Video): LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency.
minimax-hailuo-02-pro-t2v (Text to Video): High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail.
ltx-2.3-text-to-video (Text to Video): LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation.
topaz-image-upscale (Image to Image): Topaz Image Upscale is a high-quality image-to-image enhancement model that increases resolution, sharpness, and detail using AI super-resolution. It improves clarity, restores texture, reduces noise, and produces crisp, high-res output while preserving natural look and fine edges.
seedance-pro-i2v (Image to Video): Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits.
flux-2-dev-edit (Image to Image): Flux 2 Dev Edit takes an existing image and applies transformations, replacements, or style changes based on a text instruction. It preserves composition, lighting, and the overall scene while modifying only what the edit prompt specifies.
Ideal for creative replacements, stylistic adjustments, object swaps, and environment changes while keeping the original artistic integrity.
video-combiner (Video to Video): Combine multiple short video clips (5s, 10s, etc.) into a single seamless full-length video. Upload your clips in order and choose the final output aspect ratio. 'Auto' preserves the aspect ratio of your first clip.
suno-generate-sounds (Text to Audio): Generate sound effects using Suno chirp-crow model.
suno-generate-lyrics (Text to Text): Generate lyrics using Suno.
vidu-q2-text-to-image (Text to Image): VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.
suno-boost-music-style (Text to Text): Boost style prompts for Suno music generation.
pixverse-v5.5-t2v (Text to Video): PixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.
seedance-lite-i2v (Image to Video): Seedance Lite I2V version animates static images into short videos quickly, focusing on basic motion effects and efficient processing. It is best suited for fast demos or mobile-friendly use.
openrouter-vision (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
suno-add-vocals (Text to Audio): Add vocals to an instrumental track.
seedance-v1.5-pro-t2v-fast (Text to Video): Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability.
It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting.
ltx-2-19b-image-to-video (Image to Video): LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics. It is well suited for grounded scenes, near-future concepts, and story beats.
suno-generate-mashup (Text to Audio): Create a mashup using 1-5 audio tracks.
pixverse-v5-i2v (Image to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.
bytedance-seedream-v4 (Text to Image): Seedream v4 generates stunning, high-fidelity images from text prompts. It’s designed for creativity with strong support for realism, fantasy, and artistic styles.
bytedance-seedream-v4-edit (Image to Image): Seedream v4 Edit refines or transforms existing images based on a new prompt and a reference. Instead of masking, you provide a source image and describe how it should be altered, adjusting style, details, or replacing elements while keeping the subject consistent.
nano-banana-pro (Text to Image): Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution.
minimax-voice-clone (Text to Audio): Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.
suno-add-instrumental (Text to Audio): Add instrumental backing to acapella audio.
wan2.6-image-edit (Image to Image): WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting.
It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts, keeping the original scene coherent and cinematic.
seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
seedance-v1.5-pro-i2v (Image to Video): Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots.
seedance-v2.0-t2v (Text to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
flux-2-klein-9b-edit (Image to Image): Flux-2-Klein-9B Edit performs higher-quality image edits with better detail retention, lighting consistency, and texture handling compared to smaller variants. It’s well-suited for cute character edits, object additions, and visual refinements that need to look natural and polished while keeping the original scene intact.
kling-v3.0-pro-text-to-video (Text to Video): Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency. It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions.
flux-dev-lora (Training): Enables text-to-image generation using custom LoRA models. Generate consistent characters, styles, or branded visuals with high quality and fast results.
flux-kontext-dev-i2i (Image to Image): Takes an input image and transforms it based on a new prompt.
Keeps structure or pose while changing style, appearance, or details.
neta-lumina (Text to Image): Neta Lumina is a powerful anime-style text-to-image model developed by Neta.art Lab. It’s built on Lumina-Image-2.0, fine-tuned with over 13 million high-quality anime images. It offers strong understanding of multilingual prompts, excellent detail fidelity, support for Danbooru tags, and a leaning toward niche styles such as furry, Guofeng, pets, and scenic backgrounds.
suno-remix-music (Text to Audio): This API covers an audio track by transforming it into a new style while retaining its core melody. It incorporates Suno's upload capability, enabling users to upload an audio file for processing. The expected result is a refreshed audio track with a new style, keeping the original melody intact.
gpt-image-1.5 (Text to Image): GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.
kling-v3.0-pro-motion-control (Video to Video): Kling V3.0 Pro Motion Control provides the highest level of detail and control for video generation. Suitable for professional workflows requiring complex cinematic camera work and subject consistency.
kling-v2.1-master-t2v (Text to Video): Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality, perfect for storytelling, ads, or content creation from imagination alone.
flux-2-klein-4b-edit (Image to Image): Flux-2-Klein-4B Edit applies lightweight, instruction-based edits to an existing image. It’s best for clear object swaps, small visual changes, and cute enhancements while preserving the original scene’s layout and lighting.
Ideal for fast edits, UI demos, and simple creative tweaks.
ideogram-character (Image to Image): Ideogram’s Character Reference model enables consistent character generation using just one reference image. Upload a clear character portrait, and you can place that character in unlimited scenes, styles, poses, or narratives with visual fidelity maintained across all outputs.
kling-v3.0-standard-image-to-video (Image to Video): Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots.
kling-v1-avatar-standard (Audio to Video): Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily.
kling-v2.5-turbo-pro-i2v (Image to Video): Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
sdxl-lora (Training): The SDXL LoRA image model enhances Stable Diffusion XL with specialized fine-tuning, letting you generate images in unique styles, characters, or themes. By applying LoRA weights, you can create visuals that match a specific aesthetic, celebrity look, anime style, or custom-trained subject.
seedance-v1.5-pro-t2v (Text to Video): Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth.
hunyuan-image-2.1 (Text to Image): Hunyuan Image is a powerful text-to-image generation model that produces photorealistic and highly detailed visuals.
It excels at creating portraits, environments, and concept art with strong consistency and realism. Designed for versatility, it supports both natural photography styles and imaginative artistic outputs.
qwen-image-edit-plus (Image to Image): Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images.
kling-v2.5-turbo-pro-t2v (Text to Video): Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
leonardoai-lucid-origin (Text to Image): Lucid Origin is LeonardoAI’s advanced image generation model, designed for ultra-realistic, vibrant, and highly detailed visuals. It excels at creating photorealistic portraits, landscapes, product shots, and stylized art while faithfully following complex prompts.
wan2.5-image-to-video (Image to Video): WAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion (camera pans, environmental movement, and realistic physics) across the result.
wan2.5-text-to-video (Text to Video): WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.
wan2.5-text-to-image (Text to Image): WAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions.
It supports detailed visual storytelling, cinematic compositions, and versatile styles, from portraits and product shots to landscapes and fantasy scenes.
topaz-video-upscale (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
wan2.5-image-edit (Image to Image): The Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments, whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.
ai-video-upscaler-pro (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
add-video-watermark (Video to Video): Add custom watermark to videos with adjustable position, opacity, and size. Free local processing using FFmpeg.
video-watermark-remover (Video to Video): The AI Video Watermark Remover is our flagship model designed to remove Sora 2 watermarks, logos, captions, and unwanted text from videos without compromising quality. Supporting a wide range of formats, it's fast, efficient, and processes with the highest quality.
leonardoai-phoenix-1.0 (Text to Image): LeonardoAI Phoenix 1.0 is a professional-grade AI image model designed for realistic, cinematic, and highly detailed visuals.
It excels at interpreting complex prompts, rendering text within images, and creating high-resolution outputs suitable for editorial, commercial, or creative projects.
gpt-5-nano (Text to Text): GPT-5 Nano is a lightweight, high-speed language model from the GPT-5 family designed for instant text generation. It delivers intelligent, context-aware responses for creative writing, summarization, dialogue, code generation, and automation, all at low latency and cost. Perfect for chatbots, assistants, content tools, and real-time applications that need fast, reliable text output.
leonardoai-motion-2.0 (Image to Video): Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics.
higgsfield-soul-image-to-image (Image to Image): SOUL is an AI image model focused on hyper-realistic, magazine or editorial-style visuals, especially for fashion, portraits, lifestyle, and commercial content. It offers over 50 curated style presets to get a specific aesthetic without needing complicated prompt engineering. It generates photography-quality images with lighting, textures, and context that feel real, including natural imperfections like film grain, dust, or lens effects for authenticity.
veo3.1-reference-to-video (Image to Video): Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips.
It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.
higgsfield-dop-image-to-video (Image to Video): Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects (like explosions, fire, distortion, disintegration, and transitions) directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline.
remix-video (Video to Video): Transform and resize your videos effortlessly with remix video tool.
openai-sora-2-pro-storyboard (Text to Video): Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video (setting, characters, actions, timing), and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow.
veo3.1-extend-video (Text to Video): Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene, keeping visual style, characters, motion, and audio consistent. This model needs the original task_id of the video.
gpt-5-mini (Text to Text): GPT-5 Mini is a compact yet powerful AI that converts plain text ideas into detailed, structured prompts suitable for use in text-to-image, text-to-video, and other generative AI models. It’s perfect for creators who want to quickly craft high-quality prompts without manually thinking about style, composition, and descriptive details. The model helps accelerate workflows for artists, video producers, and designers.
seedance-pro-i2v-fast (Image to Video): Seedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series.
With this model you upload a reference image and, using a text prompt, generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround. It is ideal for social content, product motion, storytelling from a still, and fast prototyping.
seedance-pro-t2v-fast (Text to Video): Seedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.
ltx-2-pro-text-to-video (Text to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
ltx-2-fast-image-to-video (Image to Video): LTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.
vidu-q2-reference (Image to Video): Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images.
Each image refines the model’s understanding of subject, environment, and visual tone, ensuring perfect consistency in appearance and motion across every frame.
vidu-q2-turbo-start-end-video (Image to Video): Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.
vidu-q2-pro-start-end-video (Image to Video): Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.
minimax-hailuo-2.3-pro-t2v (Text to Video): Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.
minimax-hailuo-2.3-standard-i2v (Image to Video): Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence. Videos are generated in 768p resolution.
minimax-hailuo-2.3-standard-t2v (Text to Video): Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. Videos are generated in 768p resolution.
minimax-hailuo-2.3-fast (Image to Video): Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency.
Videos are generated in 768p resolution.
kling-v2.5-turbo-std-i2v (Image to Video): Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
reve-text-to-image (Text to Image): Generate images from text prompts using reve's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
reve-image-edit (Image to Image): ReVE Edit is a next-generation image editing model that allows users to apply detailed visual transformations through natural language. Whether you want to restyle portraits, modify backgrounds, or create artistic reinterpretations, ReVE Edit delivers realistic and coherent results while preserving structure and identity.
grok-imagine-text-to-image (Text to Image): Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity. Get 6 images each time.
seedvr2-image-upscale (Image to Image): SeedVR2 is a one-step diffusion-transformer model designed for image restoration, super-resolution, deblurring, and artifact removal. It enhances low-quality or compressed images into clean, sharp, high-resolution results while preserving natural colors and fine details.
qwen-image-edit-plus-lora (Image to Image): Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor supporting multi-image edits, single-image consistency, and native ControlNet. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
nano-banana-pro-edit (Image to Image): Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image).
It offers advanced image-editing capabilities with improved resolution.
kling-o1-edit-image (Image to Image): Kling O1 Image Edit applies targeted transformations to an existing image while preserving composition, lighting, and visual consistency. Use it to replace objects, retouch elements, change materials, or apply stylistic shifts with high fidelity and minimal artifacts.
kling-o1-image-to-video (Image to Video): Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames.
kling-o1-reference-to-video (Image to Video): Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation.
kling-o1-video-edit-fast (Video to Video): Video Edit Fast is the lightweight, high-speed editing mode of Kling O1. It performs quick edits on an existing video without heavy processing, making it ideal for fast object replacements, light enhancements, color tweaks, or simple visual adjustments. This mode focuses on speed over complex reconstruction, making it suitable for rapid iterations, previews, and small edits while preserving the original video’s motion and structure.
flux-2-dev (Text to Image): Flux 2 Dev is a powerful text-to-image diffusion model designed for high-quality, fast, and highly detailed visual generation. It excels at creating cinematic lighting, vibrant compositions, surreal concepts, characters, products, and worlds with strong prompt following and artistic control. Ideal for rapid image ideation, visual storytelling, and concept art.
flux-2-pro-edit (Image to Image): Flux-2-Pro Edit enables precise, high-fidelity modifications to an existing image while preserving its lighting, style, mood, and composition. It’s ideal for replacing objects, altering materials, adjusting environmental elements, or performing stylistic transformations without damaging the original scene’s quality. Flux-2-Pro maintains ultra-detailed textures and cinematic realism during edits.
bytedance-seedream-v4.5-edit (Image to Image): Seedream-v4.5 Edit allows you to transform an existing image using natural-language instructions. It preserves the core composition, lighting, and style of the original while modifying only the requested elements, perfect for object replacement, environment changes, stylistic adjustments, and high-detail creative reworks.
kling-v2.6-pro-i2v (Image to Video): Kling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.
kling-v2-avatar-standard (Audio to Video): AI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.
wan2.2-spicy-image-to-video (Image to Video): Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.
minimax-speech-2.6-hd (Text to Audio): Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.
minimax-speech-2.6-turbo (Text to Audio): Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.
seedance-v1.5-pro-video-extend-fast (Video to Video): Seedance v1.5 Pro Video Extend Fast quickly extends an existing video by generating a short continuation that matches the original style, motion, and lighting. This mode prioritizes fast output and smooth continuity with minimal new motion, making it ideal for previews, quick edits, and lightweight shot extensions without complex effects.
gpt-image-1.5-edit (Image to Image): GPT-Image-1.5 Edit applies precise, instruction-based modifications to an existing image while preserving composition, lighting, perspective, and visual coherence. It’s well-suited for object replacement, concept evolution, symbolic edits, and creative transformations that feel natural and intentional rather than destructive.
grok-imagine-image-to-image (Image to Image): Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized.
ltx-2-19b-text-to-video (Text to Video): LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity. It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it.
flux-2-klein-4b (Text to Image): Flux-2-Klein-4B is a lightweight, fast text-to-image model optimized for clear subject rendering, good prompt adherence, and efficient generation.
It works best with simple compositions, everyday scenes, and cute or friendly visuals, making it ideal for UI graphics, demos, thumbnails, mascots, and quick creative iterations.
flux-2-klein-9b (Text to Image): Flux-2-Klein-9B is a mid-size text-to-image model that balances detail quality and generation speed. It handles richer lighting, better textures, and more nuanced scenes than smaller variants, while still working well with clear, grounded prompts. Ideal for polished illustrations, product visuals, mascots, and everyday scenes with character.
z-image-p (Text to Image): Z-Image P is based on PiAPI's Qubico/z-image text-to-image model.
openai-sora-2-standard-image-to-video (Image to Video): OpenAI Sora 2 Standard Image to Video model (High Priority). Generate stunning 10s videos from an image and text prompt.
flux-2-klein-9b-turbo-edit (Image to Image): Flux-2-Klein-9B Turbo Edit offers high-quality, ultra-fast image editing with superior detail retention. This high-efficiency version of Klein 9B Edit handles lighting and textures with precision while delivering edits much faster than the standard variant. Best for polished character edits and professional refinements where speed is critical.

292 Models Found

muapi/seedance-v2.0-t2v (Text to Video)

Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Input

Prompt (required): The prompt to generate the video
Aspect Ratio: 16:9, 9:16, 4:3, 3:4
Duration: 5, 10, 15
Quality: basic, high
Remove Watermark: Remove watermark from the generated video

Related Models

seedance-v2.0-extend (Text to Video): Seedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension; the model preserves visual style, motion, characters, and audio consistency across the new segment.
seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference generates videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.
seedance-2.0-new-omni (Image to Video): Seedance 2.0 New Omni Reference: supply up to 9 images, 3 video clips, and 3 audio clips as reference. Reference them in your prompt with @image_file_1, @video_file_1, @audio_file_1 syntax for precise multimodal control.
seedance-2.0-watermark-remover (Video to Video): FREE for a limited time. Remove Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted; requires a positive balance to access.
seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.
seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Overview

About this model

Seedance 2.0 Text-to-Video is ByteDance's most advanced text-driven video generation model. Describe any scene in natural language and the model produces a cinematic clip with director-level camera control, native audio-video sync, and up to 2K resolution output. It understands complex prompts (lighting, motion physics, mood, and multi-shot storytelling), turning words into high-fidelity video sequences up to 15 seconds long.

1. Social Media: Viral short-form content generated entirely from text prompts.
2. Advertising: Cinematic product promos and brand story videos from a single description.
3. Filmmaking: Pre-visualization and storyboard generation with realistic camera movements.
4. AI Films: Multi-shot storytelling with consistent environments and characters across scenes.

Pricing & Value

Cost analysis

Provider | Cost | Notes
muapiapp | $0.60 per video | muapiapp offers Seedance 2.0 Text-to-Video starting at $0.60 per video (5s, basic quality), scaling at $0.12/sec for basic and $0.25/sec for high quality across 5–15 second durations.
Fal.ai | Not available | Fal.ai does not currently support Seedance 2.0. muapiapp is among the first platforms offering access to this model.
Replicate | Not available | Replicate does not currently support Seedance 2.0. muapiapp provides access to this advanced ByteDance text-to-video model ahead of other platforms.

* Competitor pricing is estimated based on similar model architectures and usage tiers.

Technical Details

Configuration schema

Parameter | Type | Description | Default
Prompt | string | The prompt to generate the video | A determined penguin straps itself into a homemade rocket sled on an icy mountain. The rocket ignites with a massive burst and launches the penguin across the frozen landscape at insane speed, blasting through snowdrifts and leaving a fiery trail behind.
Aspect Ratio | Enum (4 options) | - | 16:9
Duration | Enum (3 options) | - | 5
Quality | Enum (2 options) | - | basic
Remove Watermark | boolean | Remove watermark from the generated video | false

Implementation Guide

Developer documentation

How to Use Seedance 2.0 Text-to-Video

Write a Detailed Prompt: Describe the scene, subjects, lighting, mood, and camera movement. Be specific: 'slow dolly zoom into a neon-lit street at night' will outperform 'city street'.
Choose Quality: Select basic ($0.12/sec) for fast drafts or high ($0.25/sec) for final cinematic output.
Set Duration: Choose 5, 10, or 15 seconds. Longer durations allow richer storytelling.
Pick Aspect Ratio: Use 16:9 for widescreen, 9:16 for mobile/social, 4:3 or 3:4 for other formats.
Submit and Poll: You'll receive a request_id immediately. Poll the result endpoint until status is completed.

Common Questions

Frequently asked

What is Seedance 2.0 Text-to-Video?
It's ByteDance's state-of-the-art text-to-video model that generates cinematic clips from natural language prompts, with support for complex camera movements, native audio, and up to 2K resolution.

What's the difference between basic and high quality?
Basic quality uses the fast-t2v model at $0.12/sec, ideal for drafts and iteration.
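
The parameter options, per-second rates, and the submit-then-poll flow described in the implementation guide can be sketched in Python. This is a minimal illustration under stated assumptions, not muapi's client: the payload key names (`prompt`, `aspect_ratio`, `duration`, `quality`, `remove_watermark`) and all three helper names are hypothetical. The page itself gives only the parameter labels, their options and defaults, the $0.12/sec and $0.25/sec rates, and the fact that a `request_id` is polled until the status is `completed`. No endpoint is called; the status lookup is passed in as a function.

```python
import time
from typing import Callable

# Options listed for each parameter on this page.
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4")
DURATIONS = (5, 10, 15)
# Per-second rates quoted on this page, kept in cents to avoid float drift.
RATE_CENTS_PER_SEC = {"basic": 12, "high": 25}


def build_payload(prompt: str, aspect_ratio: str = "16:9", duration: int = 5,
                  quality: str = "basic", remove_watermark: bool = False) -> dict:
    """Check inputs against the documented options and return a request body.

    The key names are hypothetical; the real request schema may differ.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if quality not in RATE_CENTS_PER_SEC:
        raise ValueError(f"quality must be one of {tuple(RATE_CENTS_PER_SEC)}")
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "quality": quality,
        "remove_watermark": remove_watermark,
    }


def estimate_cost_usd(duration: int, quality: str) -> float:
    """Estimated price: duration in seconds times the quoted per-second rate."""
    return RATE_CENTS_PER_SEC[quality] * duration / 100


def poll_until_completed(get_status: Callable[[str], str], request_id: str,
                         interval_s: float = 2.0, max_attempts: int = 150) -> str:
    """Call get_status(request_id) until it reports "completed".

    get_status is a caller-supplied (hypothetical) wrapper around the result
    endpoint; this sketch does not assume that endpoint's URL or response shape.
    """
    for _ in range(max_attempts):
        status = get_status(request_id)
        if status == "completed":
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"{request_id} not completed after {max_attempts} polls")
```

With these rates, `estimate_cost_usd(5, "basic")` gives 0.6, which matches the $0.60 starting price quoted in the pricing table, and a 15-second clip at high quality comes to 3.75. A real client would also need to handle failure statuses, which this page does not list.
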
High quality uses the standard-t2v model at $0.25/sec for final, cinema-grade output with richer detail and smoother motion.

Does it generate audio?
Yes, Seedance 2.0 generates audio natively alongside video, ensuring cinema-grade sound synchronized with the visual content.

What is the maximum resolution?
Seedance 2.0 supports up to 2K resolution output.

seedance-v2.0-t2v

---

Limited Time: Earn up to +15% bonus Seedance 2.0 credits; the rate unlocks as your cumulative spend grows. Max $500/payment. Offer ends Apr 2.

runway-aleph-v2v (Video to Video): Transform any input video into a new visual style or scene while preserving motion and structure. Aleph V2V lets you apply artistic looks, cinematic lighting, or thematic changes to existing footage.
minimax-image-01-subject-reference (Image to Image): Minimax’s I2I “Subject Reference” model enables you to transform images while preserving the appearance of a subject using a single reference image. Ideal for maintaining character likeness (features, clothing, or expression) across different styles or settings.
ai-product-photography (Image to Image): Create professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in seconds.
bytedance-seededit-v3 (Image to Image): Seededit allows precise edits to images using masks and prompt guidance. Whether you're replacing backgrounds, changing clothing, or inpainting missing areas, Seededit ensures realistic, high-quality results with semantic control.
ai-background-remover (Image to Image): Instantly remove image backgrounds with pixel-perfect precision. Ideal for product photos, profile pictures, and creative projects.
ai-image-upscaler (Image to Image): Transform blurry or pixelated images into high-definition visuals. Our AI Image Upscaler uses deep learning to reconstruct details and bring your visuals to life.
wan2.2-image-to-video (Image to Video): Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations.
It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.
runway-act-two-i2v (Image to Video): Upload a single character image and a driving video; the model transfers facial expressions and head movements from the video onto your image, bringing it to life. It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally. Ideal for avatars, AI presenters, digital actors, and story scenes.
nano-banana-effects (Image to Image): Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.
pixverse-v4.5-i2v (Image to Video): Upload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements. Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos.
ai-image-face-swap (Image to Image): Advanced facial recognition and blending algorithms enable precise face swaps while preserving skin tone, lighting, and facial geometry.
midjourney-v7-omni-reference (Image to Image): Midjourney's Omni Reference lets you reuse characters, creatures, or styles from an existing image and place them into entirely new scenes. Simply provide a reference image (oref) and Midjourney will maintain identity, details, and visual consistency. Ideal for storytelling, character design, or branding across multiple generations.
ideogram-v3-t2i (Text to Image): Ideogram v3 is an advanced text-to-image model designed for creating highly detailed and visually striking images directly from text prompts. It’s especially good for artistic compositions, design mockups, concept art, and photorealistic scenes. With strong support for text rendering inside images, it’s widely used for posters, typography-based art, and creative branding.
ai-dress-change (Image to Image): Instantly change outfits in images using AI. Visualize different clothing styles without the need for physical trials. Perfect for fashion, e-commerce, and virtual try-ons.
grok-imagine-image-to-video (Image to Video): Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into short (≈6 second) cinematic videos with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.
image-effects (Image to Image): AI Image Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning images from an image.
mmaudio-v2-text-to-audio (Text to Audio): Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.
mmaudio-v2-video-to-video (Video to Video): MMAudio-v2 generates high-quality, synchronized audio from video or text inputs. Seamlessly integrate it with AI video models to create fully-voiced, expressive video content.
wan2.1-image-to-video (Image to Video): Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene. Great for bringing art, characters, or photos to life with smooth motion and consistent style.
sync-lipsync (Audio to Video): Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
google-imagen4-fast (Text to Image): Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.
openai-sora-2-pro-image-to-video (Image to Video): Sora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.
runway-act-two-v2v (Video to Video): Take an existing character video and sync it with the motion from a reference video. This lets you update facial expressions, head turns, and speech gestures while keeping the original look and style. It’s perfect for reshooting performances, dubbing, or animating characters without re-rendering visuals.
luma-modify-video (Video to Video): Luma Modify Video lets you transform an existing video into a new creative scene while keeping the original motion and timing intact. The result is a new video with the same movements but a completely fresh look, atmosphere, or theme.
pixverse-v4.5-t2v (Text to Video): PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles. Great for creative storytelling, trailers, and animated concepts.
veo3.1-image-to-video (Image to Video): Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.
seedance-pro-t2v (Text to Video): Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes.
Best for storytelling, content creation, and visual production.
nano-banana-edit (Image to Image): Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.
infinitetalk-video-to-video (Video to Video): InfiniteTalk Video-to-Video enhances or transforms existing videos by syncing the subject’s lip movements and facial expressions with new dialogue or speech. Instead of starting from a still image, you provide a video clip, and the model seamlessly reanimates the speaker’s mouth and expressions to match the script.
openai-sora-2-image-to-video (Image to Video): Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.
veo3.1-text-to-video (Text to Video): Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.
chroma-image (Text to Image): Chroma Image is an advanced text-to-image generation model designed for high-quality, creative, and versatile visuals. It can produce anything from photorealistic portraits and products to imaginative concept art, fantasy illustrations, and cinematic scenes.
wan2.2-animate (Video to Video): Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload both an image (for the character) and a video containing motion/expression, and the model generates a video where the character in your image moves like the reference. Supports 480p or 720p, up to 120 seconds.
openai-sora-2-text-to-video (Text to Video): Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.
minimax-hailuo-2.3-pro-i2v (Image to Video): Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame, delivering smooth, film-grade clips.
ai-skin-enhancer (Image to Image): Smooth skin, reduce blemishes, and enhance complexion with natural-looking results. Perfect for portraits, selfies, and professional photo retouching.
wan2.5-image-to-video-fast (Image to Video): Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.
ai-clipping (Video to Video): Convert long-form videos into engaging short clips using AI clipping.
ltx-2-19b-lipsync (Audio to Video): LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration.
ai-captions (Video to Video): Add AI-generated animated captions to any video using Vadoo's caption engine. Supports multiple languages and viral caption themes like Hormozi style. Perfect for social media creators, marketers, and content producers.
midjourney-v7-text-to-image (Text to Image): Midjourney V7 produces high-quality, stylized images from text prompts. Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations.
ai-color-photo (Image to Image): Automatically add lifelike colors to black-and-white images. Our AI brings history to life with natural tones, accurate shading, and context-aware colorization.
flux-dev (Text to Image): Generate stunning visuals from simple text prompts. Flux Dev transforms your ideas into high-quality, creative images using powerful AI vision models. Perfect for design, storytelling, concept art, and marketing.
hidream-i1-fast (Text to Image): Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.
openai-sora-2-pro-text-to-video (Text to Video): Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.
veo3-fast-image-to-video (Image to Video): Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality, powered by Google's VEO3 on MuAPI.
flux-kontext-dev-t2i (Text to Image): Generates an image from a text prompt, with optional reference image for pose or style guidance.
Ideal for controlled, consistent image creation using just a description.seedance-lite-t2vText to VideoSeedance Lite T2V offers quick video generation from text with decent visual quality and motion. Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail.kling-o1-text-to-videoText to VideoKling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips. It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering.seedance-v2.0-extendText to VideoSeedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment.hidream-i1-fullText to ImageThe most advanced version of HiDream I1, delivering high-resolution, detailed images with superior prompt understanding. Best suited for production, content creation, and high-fidelity applications.flux-kontext-pro-i2iImage to ImageFlux Kontext Pro I2I variant enables transforming base images into refined artwork while keeping structure intact. It’s useful for sketch refinement, visual style changes, and creative edits such as re-dressing, relighting, or re-theming with prompt guidance.ai-ghibli-styleImage to ImageBring your imagination to life with art inspired by the enchanting world of Studio Ghibli. This AI model generates dreamy, hand-drawn visuals with soft colors, whimsical characters, and painterly backgroundswan2.1-lora-t2vTrainingWAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules. 
Tailor the generation to specific characters, outfits, or animation styles — ideal for brand storytelling, fan content, and stylized animations.portrait-stylistImage to ImageProfessional AI portrait styles including hair, makeup, style, and fashion transformations.kling-v2.1-pro-i2vImage to VideoKling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects.veo3.1-fast-text-to-videoText to VideoVeo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.qwen-image-2.0-pro-editImage to ImageQwen 2.0 Pro Image Edit model with maximum precision and modifications.ai-anime-generatorText to ImageCreate stunning anime-style artwork instantly with our AI Anime Generator. Customize characters, scenes, and styles effortlessly in seconds!z-image-turboText to ImageZ-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.kling-v2-avatar-proAudio to VideoAI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video. 
It preserves identity, lip-syncs accurately to the audio, and adds natural head movement, eye motion, expressions, and cinematic lighting.
openai-sora-2-standard-text-to-video (Text to Video): OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts.
ai-image-extension (Image to Image): Expand the edges of any image with AI. This model continues your original photo or artwork beyond its borders while matching style, lighting, and content.
wan2.1-lora-i2v (Training): Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency. Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA.
pixverse-v5.5-i2v (Image to Video): PixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip. It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.
heygen-video-translate (Video to Video): Convert any video into 175+ languages with synchronized voice translation, AI-voice cloning, and accurate lip sync. Just upload your video (or provide a link), select a target language, and HeyGen recreates the speech in that language. Price: $0.05 per second.
wan2.2-spicy-video-extend (Video to Video): Wan-2.2-spicy Video Extend continues an existing video by generating new frames that match the original style but add stronger motion, bolder effects, and spicier dramatics.
kling-o1-standard-reference-to-video (Image to Video): Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation.
This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes.
qwen-image-edit-2511 (Image to Image): Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts.
add-image-watermark (Image to Image): Add a custom watermark to images with adjustable position, opacity, and size. Free local processing using PIL.
kling-v2.6-pro-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation. Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.
ai-object-eraser (Image to Image): Easily remove unwanted objects, people, or text from any image using AI. Just select the area you want to erase, and the model will intelligently fill the space with realistic background matching the surrounding environment. No Photoshop skills needed.
runway-image-to-video (Image to Video): Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics.
suno-create-music (Text to Audio): Suno's music generation turns text prompts into full songs, complete with vocals, lyrics, and instrumentation. You can describe a mood, genre, or even a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.
z-image-base (Text to Image): Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts.
It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts.
flux-kontext-pro-t2i (Text to Image): Flux Kontext Pro T2I offers fast and reliable generation with creative flexibility. It supports stylized prompts, character design, and fantasy themes while maintaining clear subject coherence.
flux-krea-dev (Text to Image): Flux Krea Dev is a text-to-image model built by Black Forest Labs in collaboration with Krea AI, designed to generate highly photorealistic images that avoid the common 'AI look' artifacts (plastic skin, overexposed lighting, synthetic textures). It emphasizes real texture, natural lighting, and aesthetic control.
flux-kontext-max-i2i (Image to Image): Flux Kontext Max I2I in Max mode allows precise image enhancement and visual transformations while retaining the source layout. It’s powerful for retouching, photo-to-art workflows, and concept refinement.
tiktok-carousel (Text to Image): AI TikTok Carousel Generator: create viral TikTok carousel posts from a single text prompt. Choose a proven storytelling format (Problem-Solution, Listicle, Tutorial, Before & After), set your slide count (3-10), and get stunning AI-generated images at 1080x1920 portrait resolution, ready to upload to TikTok.
bytedance-seedream-v5.0-edit (Image to Image): Seedream 5.0 Lite Edit is an advanced image transformation model by ByteDance, enabling precise, controllable edits using natural language. It specializes in high-fidelity style transfer (Anime, Cyberpunk, Fantasy), background swaps, and object modification while preserving original lighting, color tones, and character consistency for professional-grade creative reworks.
gpt4o-text-to-image (Text to Image): Generate images from text prompts using GPT-4o's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
gpt4o-edit (Image to Image): Edit a specific part of an image using natural language.
Ideal for object removal, replacement, or content-aware filling.
kling-o1-standard-image-to-video (Image to Video): Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation. This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.
wan2.1-text-to-video (Text to Video): WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements, all from just text.
perfect-pony-xl (Text to Image): Pony XL is a high-quality image generation model based on Stable Diffusion XL architecture. It specializes in character art, hybrid styles, and producing detailed, polished visuals even with simpler prompts.
wan2.2-speech-to-video (Audio to Video): WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines.
flux-2-klein-4b-turbo (Text to Image): Flux-2-Klein-4B Turbo is an ultra-fast, high-efficiency text-to-image model. It is a distilled version of the Klein 4B model, designed for near-instant rendering while maintaining impressive adherence to prompts. Perfect for rapid prototyping, real-time creative tools, and applications where speed is paramount.
seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference generates videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips.
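As a rough illustration of the Seedance 2.0 Omni Reference limits (up to 9 images, 3 video clips, and 3 audio clips) and the @image1, @video1, @audio1 prompt references this entry describes, the Python sketch below checks a prompt before submission. It is not part of any muapi.ai API: the names check_references, MAX_REFS, and REF_PATTERN are hypothetical, and it assumes that reference numbers start at 1 and follow the order in which inputs are supplied.

```python
import re

# Limits quoted in the catalog entry. All names here are illustrative only.
MAX_REFS = {"image": 9, "video": 3, "audio": 3}
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def check_references(prompt, images=0, videos=0, audios=0):
    """Return a list of problems; an empty list means the prompt looks consistent."""
    supplied = {"image": images, "video": videos, "audio": audios}
    problems = []
    # Check the number of supplied inputs against the quoted limits.
    for kind, count in supplied.items():
        if count > MAX_REFS[kind]:
            problems.append(f"too many {kind} references: {count} > {MAX_REFS[kind]}")
    # Assumption: @image1 refers to the first supplied image, and so on.
    for kind, index in REF_PATTERN.findall(prompt):
        n = int(index)
        if n < 1 or n > supplied[kind]:
            problems.append(f"@{kind}{n} has no matching {kind} input")
    return problems


print(check_references(
    "@image1 walks through the scene in @video1, scored by @audio2",
    images=1, videos=1, audios=1,
))
```

Running the example reports that @audio2 has no matching audio input, because only one audio clip was supplied.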
Use @image1, @video1, @audio1 syntax in your prompt.
midjourney-v7-image-to-image (Image to Image): Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, great for alternate versions, art variations, or polishing concepts.
flux-2-klein-9b-turbo (Text to Image): Flux-2-Klein-9B Turbo is a high-performance, mid-size text-to-image model. This distilled variant of Klein 9B provides a superior balance of speed and detail, delivering richer textures and complex scenes with significantly reduced generation times. Ideal for polished illustrations and character-rich visuals where performance is key.
grok-imagine-text-to-video (Text to Video): Grok Imagine is xAI’s fast, creative text-to-video model that generates short (~6-second) cinematic clips with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.
ltx-2.3-video-extend (Video to Video): LTX-2.3 Video Extend seamlessly continues an existing video clip by generating additional frames that match the original motion, style, and scene composition. Powered by the LTX-2.3 architecture, it maintains temporal coherence and visual fidelity across the extension boundary.
seedance-lite-reference-video (Image to Video): Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video.
bytedance-seedream-v3 (Text to Image): Seedream is designed for generating visually rich and artistic images from text prompts. It excels at fantasy, anime, surrealism, and vibrant color compositions, ideal for creative visuals, storyboards, and concept art.
kling-v2.1-master-i2v (Image to Video): Kling 2.1 Master’s I2V animates a still image into a coherent video sequence.
It interprets motion, environment, and context to create realistic, visually stunning video outputs, ideal for animating portraits, scenes, or concept art.
flux-kontext-effects (Image to Image): Flux Kontext Effects is a creative image and video model that applies stylized transformations, cinematic filters, and artistic reinterpretations to your inputs. Instead of generating new content from scratch, it enhances or reimagines existing images and videos with unique looks, ranging from surreal effects to realistic cinematic moods.
kling-v2.1-standard-i2v (Image to Video): Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame. It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations, ideal for portraits, digital art, and cinematic illustrations.
qwen-image (Text to Image): Generate high-quality, detailed images from text prompts in various styles, from realistic to artistic. Perfect for creative visuals, product shots, and concept art.
midjourney-v7-style-reference (Image to Image): Generate images in the distinctive aesthetic of Midjourney v7, blending cinematic depth, photorealism or painterly rendering, rich textures, and dynamic lighting. This style reference model helps you infuse any subject with the visual storytelling, composition, and high detail fidelity that Midjourney is known for. Ideal for concept art, stylized portraits, and stunning environment scenes.
openai-sora (Text to Video): Sora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.
veo3.1-4k-video (Text to Video): Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications.
It enhances visual fidelity while maintaining temporal coherence and realistic motion.
flux-pulid (Image to Image): Flux PuLID is an innovative image-to-image model that enables consistent face rendering across different styles or scenes, without needing any model fine-tuning. By providing a reference image (e.g., a portrait), the model generates new visuals while maintaining your subject’s identity with high fidelity.
wan2.5-text-to-video-fast (Text to Video): Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.
hunyuan-image-3.0 (Text to Image): Hunyuan Image 3.0 brings together powerful architecture (Mixture-of-Experts + autoregressive style) to produce richly detailed and coherent images from complex prompts. It can read narrative descriptions, render text and signage cleanly, and support multiple visual styles, from photorealism to illustrations.
kling-o1-text-to-image (Text to Image): Kling O1 Text-to-Image is a high-fidelity creative image model that converts rich natural-language prompts into ultra-detailed stills.
It excels at cinematic composition, realistic lighting, and coherent scene detail, which makes it great for concept art, environment renders, character portraits, and stylized imagery with photoreal or illustrative looks.
latent-sync (Audio to Video): LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
video-effects (Image to Video): AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.
creatify-lipsync (Audio to Video): Realistic lipsync video, optimized for speed, quality, and consistency.
nano-banana-2 (Text to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
qwen-image-edit (Image to Image): The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements).
ltx-2-pro-image-to-video (Image to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
photo-pack (Image to Image): Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.
vidu-q1-reference (Image to Video): Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references (up to seven images) and text prompts.
Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements.
wan2.2-5b-fast-t2v (Text to Video): Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.
minimax-hailuo-02-standard-i2v (Image to Video): Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews.
wan2.2-text-to-video (Text to Video): Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.
flux-2-flex-edit (Image to Image): Flux-2-Flex Edit allows flexible transformation of an existing image: object replacement, material changes, lighting adjustments, style shifts, or localized edits. It preserves the original scene’s geometry, perspective, and lighting while modifying only what the edit prompt specifies.
ideogram-v3-reframe (Image to Image): Ideogram V3 Reframe is a specialized image-to-image model built on Ideogram 3.0, designed to intelligently extend and adapt images across diverse aspect ratios and resolutions. Leveraging advanced AI outpainting, it preserves visual consistency while enabling creative reframing for digital, print, and video content.
veo3.1-fast-image-to-video (Image to Video): Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.
kling-o1-standard-video-edit (Video to Video): Kling O1 Standard Video-to-Video Edit modifies an existing video while preserving its original structure, motion, and realism.
It is designed for subtle, stable edits such as object replacement, background changes, lighting adjustments, or small visual tweaks. This mode prioritizes temporal consistency and natural motion.
flux-2-pro (Text to Image): Flux-2-Pro Text-to-Image is a premium, high-fidelity generative model capable of producing ultra-realistic, cinematic, and deeply detailed images from text prompts. It excels at complex lighting, layered compositions, surreal visual concepts, and professional art-grade rendering suitable for concept art, advertising visuals, and world-building.
minimax-hailuo-02-standard-t2v (Text to Video): Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality.
seedance-2.0-watermark-remover (Video to Video): FREE for a limited time. Remove Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits are deducted, but a positive balance is required for access.
flux-2-flex (Text to Image): Flux-2-Flex Text-to-Image is a flexible, high-fidelity generative model capable of producing detailed, imaginative, and stylistically rich scenes from text alone. It excels at surreal concepts, fantasy environments, sci-fi structures, cinematic atmospheres, and high-resolution artistic compositions with strong prompt adherence.
midjourney-v7-image-to-video (Image to Video): Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion.
flux-schnell (Text to Image): Flux Schnell is a lightning-fast image generation model designed for rapid iterations.
It delivers good visual quality from text prompts almost instantly, making it perfect for real-time concept testing, brainstorming, and UI-integrated experiences.
vidu-v2.0-t2v (Text to Video): Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation.
ai-dance-effects (Video to Video): Bring your characters and worlds to life with AI Dance Effects, a creative video effect that adds playful, dynamic, and cinematic motion to your generations. AI Dance Effects lets you guide how characters move, react, and express themselves.
bytedance-seedream-v4.5 (Text to Image): Seedream-v4.5 is ByteDance’s advanced text-to-image diffusion model designed for generating high-detail, high-contrast, cinematic and stylized images. It excels at surreal fantasy concepts, sci-fi worlds, product visuals, photoreal scenes, and artistic compositions with strong prompt adherence and crisp detail.
kling-v2.6-pro-t2v (Text to Video): Kling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.
flux-2-klein-4b-turbo-edit (Image to Image): Flux-2-Klein-4B Turbo Edit provides ultra-fast, instruction-based image editing. This high-efficiency variant of Klein 4B Edit is optimized for near-instant swaps and tweaks while preserving layout and lighting. Ideal for real-time design tools and quick creative adjustments.
openai-sora-2-pro-characters (Text to Text): Create consistent AI characters for your Sora 2 videos. Provide a previous video's task ID and a prompt to define or refine your character.
nano-banana (Text to Image): Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing.
It produces hyper-realistic, physics-aware visuals with seamless style transformations.
pixverse-v5-t2v (Text to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.
wan2.6-image-to-video (Image to Video): WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.
google-imagen4 (Text to Image): Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.
google-imagen4-ultra (Text to Image): Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.
wan2.6-text-to-image (Text to Image): WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts. It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery.
wan2.1-reference-video (Image to Video): WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video.
By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.
qwen-image-2.0 (Text to Image): Qwen 2.0 Text to Image model with enhanced realism.
veed-lipsync (Audio to Video): Generate realistic lipsync from any audio using VEED's latest model.
sdxl-image (Text to Image): SDXL is a high-quality, large Stable Diffusion model for creating photorealistic and stylized images from text. It excels at fine detail, realistic lighting, and complex scenes.
infinitetalk-image-to-video (Audio to Video): InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos. You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech.
luma-flash-reframe (Video to Video): Transform and resize your videos effortlessly with Ray 2 Flash Reframe. This tool intelligently expands or adjusts your video’s aspect ratio, adding visually consistent content to the sides, top, or bottom, without altering the original subject.
ai-video-upscaler (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
flux-redux (Image to Image): Flux Redux is a transformation model that reimagines or enhances your input images while preserving their main structure and subject.
It’s built for creative refinement, whether you want style transfer, artistic reinterpretation, cinematic polish, or mood transformation.
qwen-image-2.0-pro (Text to Image): Qwen 2.0 Pro Text to Image model with maximum realism and fidelity.
seedance-v1.5-pro-i2v-fast (Image to Video): Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation.
seedance-v1.5-pro-video-extend (Video to Video): Seedance v1.5 Pro Video Extend continues an existing video by generating additional frames that match the original scene’s style, lighting, motion, and mood. It is designed for smooth temporal consistency, making it ideal for extending cinematic shots, atmospheric scenes, or slow camera moves without introducing visual jumps or style changes.
vidu-v2.0-i2v (Image to Video): Vidu's 2.0 model delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content.
wan2.2-edit-video (Video to Video): Easily modify existing videos using simple text commands. With Wan 2.2 Video-Edit, you can change attire, character appearance, or other visual elements directly within your video, with no need to start from scratch. Works on uploads of 480p or 720p, for up to two minutes.
nano-banana-2-edit (Image to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
kling-v1-avatar-pro (Audio to Video): Kling AI Avatar Pro is the premium tier for making high-quality talking avatars.
You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync.
ovi-image-to-video (Image to Video): Ovi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio / sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.
ltx-2.3-lipsync (Audio to Video): LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency, powered by the upgraded LTX-2.3 architecture.
kling-v3.0-std-motion-control (Video to Video): Kling V3.0 Standard Motion Control allows for precise control over the camera and subject movement in generated videos. Powered by the latest Kling V3.0 architecture for improved temporal consistency and quality.
ovi-text-to-video (Text to Video): Ovi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally. Videos are generated in 540p resolution.
seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.
kling-o1-video-edit (Video to Video): Kling O1 Video Edit lets you send an existing video clip plus an instruction/prompt to edit or transform the clip while preserving temporal coherence and subject identity.
Typical edits include color grading, background replacement, object removal, slow motion, speed ramps, style transfer, subtle camera stabilization, and short extension/outro generation. Inputs can include: the source video, an optional frame mask (for localized edits), time range, and style/reference images.
vidu-q2-reference-to-image (Image to Image): VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.
minimax-hailuo-02-pro-i2v (Image to Video): Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling.
bytedance-seedream-v5.0 (Text to Image): Seedream 5.0 Lite is ByteDance’s next-generation text-to-image model, delivering high-fidelity AI art with advanced visual reasoning and precise typography. Supporting up to 4K resolution and cinematic detail, it excels at complex scene construction, consistent character generation, and real-time knowledge integration for accurate, contextually relevant visuals.
ltx-2-fast-text-to-video (Text to Video): LTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals. The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.
wan2.6-text-to-video (Text to Video): WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts.
It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.
qwen-text-to-image-2512 (Image to Image): Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts. It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, fantasy and sci-fi worlds.
kling-v3.0-pro-image-to-video (Image to Video): Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting.
any-llm (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen-image-2.0-edit (Image to Image): Qwen 2.0 Image Edit model with precise background modification and enhancements.
kling-v3.0-standard-text-to-video (Text to Video): Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments.
kling-v2.6-std-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.
ltx-2.3-image-to-video (Image to Video): LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency.
minimax-hailuo-02-pro-t2v (Text to Video): High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail.
ltx-2.3-text-to-video (Text to Video): LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation.
topaz-image-upscale (Image to Image): Topaz Image Upscale is a high-quality image-to-image enhancement model that increases resolution, sharpness, and detail using AI super-resolution. It improves clarity, restores texture, reduces noise, and produces crisp, high-res output while preserving natural look and fine edges.
seedance-pro-i2v (Image to Video): The Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits.
flux-2-dev-edit (Image to Image): Flux 2 Dev Edit takes an existing image and applies transformations, replacements, or style changes based on a text instruction. It preserves composition, lighting, and the overall scene while modifying only what the edit prompt specifies.
Ideal for creative replacements, stylistic adjustments, object swaps, and environment changes while keeping the original artistic integrity.
video-combiner (Video to Video): Combine multiple short video clips (5s, 10s, etc.) into a single seamless full-length video. Upload your clips in order and choose the final output aspect ratio. 'Auto' preserves the aspect ratio of your first clip.
suno-generate-sounds (Text to Audio): Generate sound effects using the Suno chirp-crow model.
suno-generate-lyrics (Text to Text): Generate lyrics using Suno.
vidu-q2-text-to-image (Text to Image): VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.
suno-boost-music-style (Text to Text): Boost style prompts for Suno music generation.
pixverse-v5.5-t2v (Text to Video): PixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.
seedance-lite-i2v (Image to Video): Seedance Lite I2V animates static images into short videos quickly, focusing on basic motion effects and efficient processing. It is best suited for fast demos or mobile-friendly use.
openrouter-vision (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
suno-add-vocals (Text to Audio): Add vocals to an instrumental track.
seedance-v1.5-pro-t2v-fast (Text to Video): Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability.
It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting.
ltx-2-19b-image-to-video (Image to Video): LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics, making it well suited for grounded scenes, near-future concepts, and story beats.
suno-generate-mashup (Text to Audio): Create a mashup using 1-5 audio tracks.
pixverse-v5-i2v (Image to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.
bytedance-seedream-v4 (Text to Image): Seedream v4 generates stunning, high-fidelity images from text prompts. It’s designed for creativity with strong support for realism, fantasy, and artistic styles.
bytedance-seedream-v4-edit (Image to Image): Seedream v4 Edit refines or transforms existing images based on a new prompt and a reference. Instead of masking, you provide a source image and describe how it should be altered: adjusting style, details, or replacing elements while keeping the subject consistent.
nano-banana-pro (Text to Image): Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution.
minimax-voice-clone (Text to Audio): Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.
suno-add-instrumental (Text to Audio): Add instrumental backing to a cappella audio.
wan2.6-image-edit (Image to Image): WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting.
It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts, keeping the original scene coherent and cinematic.
seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
seedance-v1.5-pro-i2v (Image to Video): Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots.
seedance-v2.0-t2v (Text to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.
flux-2-klein-9b-edit (Image to Image): Flux-2-Klein-9B Edit performs higher-quality image edits with better detail retention, lighting consistency, and texture handling compared to smaller variants. It’s well-suited for cute character edits, object additions, and visual refinements that need to look natural and polished while keeping the original scene intact.
kling-v3.0-pro-text-to-video (Text to Video): Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency. It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions.
flux-dev-lora (Training): Enables text-to-image generation using custom LoRA models. Generate consistent characters, styles, or branded visuals with high quality and fast results.
flux-kontext-dev-i2i (Image to Image): Takes an input image and transforms it based on a new prompt.
Keeps structure or pose while changing style, appearance, or details.
neta-lumina (Text to Image): Neta Lumina is a powerful anime-style text-to-image model developed by Neta.art Lab. It’s built on Lumina-Image-2.0, fine-tuned with over 13 million high-quality anime images. It offers strong understanding of multilingual prompts, excellent detail fidelity, support for Danbooru tags, and a leaning toward niche styles such as furry, Guofeng, pets, and scenic backgrounds.
suno-remix-music (Text to Audio): This API covers an audio track by transforming it into a new style while retaining its core melody. It incorporates Suno's upload capability, enabling users to upload an audio file for processing. The expected result is a refreshed audio track with a new style, keeping the original melody intact.
gpt-image-1.5 (Text to Image): GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.
kling-v3.0-pro-motion-control (Video to Video): Kling V3.0 Pro Motion Control provides the highest level of detail and control for video generation. Suitable for professional workflows requiring complex cinematic camera work and subject consistency.
kling-v2.1-master-t2v (Text to Video): Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality, perfect for storytelling, ads, or content creation from imagination alone.
flux-2-klein-4b-edit (Image to Image): Flux-2-Klein-4B Edit applies lightweight, instruction-based edits to an existing image. It’s best for clear object swaps, small visual changes, and cute enhancements while preserving the original scene’s layout and lighting.
Ideal for fast edits, UI demos, and simple creative tweaks.
ideogram-character (Image to Image): Ideogram’s Character Reference model enables consistent character generation using just one reference image. Upload a clear character portrait, and you can place that character in unlimited scenes, styles, poses, or narratives with visual fidelity maintained across all outputs.
kling-v3.0-standard-image-to-video (Image to Video): Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots.
kling-v1-avatar-standard (Audio to Video): Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily.
kling-v2.5-turbo-pro-i2v (Image to Video): Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
sdxl-lora (Training): The SDXL LoRA image model enhances Stable Diffusion XL with specialized fine-tuning, letting you generate images in unique styles, characters, or themes. By applying LoRA weights, you can create visuals that match a specific aesthetic, celebrity look, anime style, or custom-trained subject.
seedance-v1.5-pro-t2v (Text to Video): Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth.
hunyuan-image-2.1 (Text to Image): Hunyuan Image is a powerful text-to-image generation model that produces photorealistic and highly detailed visuals.
It excels at creating portraits, environments, and concept art with strong consistency and realism. Designed for versatility, it supports both natural photography styles and imaginative artistic outputs.
qwen-image-edit-plus (Image to Image): Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images.
kling-v2.5-turbo-pro-t2v (Text to Video): Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
leonardoai-lucid-origin (Text to Image): Lucid Origin is LeonardoAI’s advanced image generation model, designed for ultra-realistic, vibrant, and highly detailed visuals. It excels at creating photorealistic portraits, landscapes, product shots, and stylized art while faithfully following complex prompts.
wan2.5-image-to-video (Image to Video): WAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion (camera pans, environmental movement, and realistic physics) across the result.
wan2.5-text-to-video (Text to Video): WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.
wan2.5-text-to-image (Text to Image): WAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions.
It supports detailed visual storytelling, cinematic compositions, and versatile styles, from portraits and product shots to landscapes and fantasy scenes.
topaz-video-upscale (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
wan2.5-image-edit (Image to Image): The Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments, whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.
ai-video-upscaler-pro (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.
add-video-watermark (Video to Video): Add custom watermark to videos with adjustable position, opacity, and size. Free local processing using FFmpeg.
video-watermark-remover (Video to Video): The AI Video Watermark Remover is our flagship model designed to remove Sora 2 watermarks, logos, captions, and unwanted text from videos without compromising quality. Supporting a wide range of formats, it's fast, efficient, and processes with the highest quality.
leonardoai-phoenix-1.0 (Text to Image): LeonardoAI Phoenix 1.0 is a professional-grade AI image model designed for realistic, cinematic, and highly detailed visuals.
It excels at interpreting complex prompts, rendering text within images, and creating high-resolution outputs suitable for editorial, commercial, or creative projects.
gpt-5-nano (Text to Text): GPT-5 Nano is a lightweight, high-speed language model from the GPT-5 family designed for instant text generation. It delivers intelligent, context-aware responses for creative writing, summarization, dialogue, code generation, and automation, all at low latency and cost. Perfect for chatbots, assistants, content tools, and real-time applications that need fast, reliable text output.
leonardoai-motion-2.0 (Image to Video): Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics.
higgsfield-soul-image-to-image (Image to Image): SOUL is an AI image model focused on hyper-realistic, magazine or editorial-style visuals, especially for fashion, portraits, lifestyle, and commercial content. It offers over 50 curated style presets to get a specific aesthetic without needing complicated prompt engineering. It generates photography-quality images with lighting, textures, and context that feel real, including natural imperfections like film grain, dust, or lens effects for authenticity.
veo3.1-reference-to-video (Image to Video): Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips.
It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.
higgsfield-dop-image-to-video (Image to Video): Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects (such as explosions, fire, distortion, disintegration, and transitions) directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline.
remix-video (Video to Video): Transform and resize your videos effortlessly with the remix video tool.
openai-sora-2-pro-storyboard (Text to Video): Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video (setting, characters, actions, timing), and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow.
veo3.1-extend-video (Text to Video): Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene, keeping visual style, characters, motion, and audio consistent. This model needs the original task_id of the video.
gpt-5-mini (Text to Text): GPT-5 Mini is a compact yet powerful AI that converts plain text ideas into detailed, structured prompts suitable for use in text-to-image, text-to-video, and other generative AI models. It’s perfect for creators who want to quickly craft high-quality prompts without manually thinking about style, composition, and descriptive details. The model helps accelerate workflows for artists, video producers, and designers.
seedance-pro-i2v-fast (Image to Video): Seedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series.
With this model you upload a reference image and, using a text prompt, generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround, making it ideal for social content, product motion, storytelling from a still, and fast prototyping.
seedance-pro-t2v-fast (Text to Video): Seedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.
ltx-2-pro-text-to-video (Text to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
ltx-2-fast-image-to-video (Image to Video): LTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.
vidu-q2-reference (Image to Video): Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images.
Each image refines the model’s understanding of subject, environment, and visual tone, ensuring perfect consistency in appearance and motion across every frame.
vidu-q2-turbo-start-end-video (Image to Video): Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.
vidu-q2-pro-start-end-video (Image to Video): Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.
minimax-hailuo-2.3-pro-t2v (Text to Video): Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.
minimax-hailuo-2.3-standard-i2v (Image to Video): Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence. It generates video at 768p.
minimax-hailuo-2.3-standard-t2v (Text to Video): Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. It generates video at 768p.
minimax-hailuo-2.3-fast (Image to Video): Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency.
It generates video at 768p.
kling-v2.5-turbo-std-i2v (Image to Video): Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
reve-text-to-image (Text to Image): Generate images from text prompts using reve's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
reve-image-edit (Image to Image): ReVE Edit is a next-generation image editing model that allows users to apply detailed visual transformations through natural language. Whether you want to restyle portraits, modify backgrounds, or create artistic reinterpretations, ReVE Edit delivers realistic and coherent results while preserving structure and identity.
grok-imagine-text-to-image (Text to Image): Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity. Each request returns 6 images.
seedvr2-image-upscale (Image to Image): SeedVR2 is a one-step diffusion-transformer model designed for image restoration, super-resolution, deblurring, and artifact removal. It enhances low-quality or compressed images into clean, sharp, high-resolution results while preserving natural colors and fine details.
qwen-image-edit-plus-lora (Image to Image): Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor supporting multi-image edits, single-image consistency, and native ControlNet. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
nano-banana-pro-edit (Image to Image): Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image).
It offers advanced image-edit capabilities with improved resolution.
kling-o1-edit-image (Image to Image): Kling O1 Image Edit applies targeted transformations to an existing image while preserving composition, lighting, and visual consistency. Use it to replace objects, retouch elements, change materials, or apply stylistic shifts with high fidelity and minimal artifacts.
kling-o1-image-to-video (Image to Video): Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames.
kling-o1-reference-to-video (Image to Video): Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation.
kling-o1-video-edit-fast (Video to Video): Video Edit Fast is the lightweight, high-speed editing mode of Kling O1. It performs quick edits on an existing video without heavy processing, making it ideal for fast object replacements, light enhancements, color tweaks, or simple visual adjustments. This mode focuses on speed over complex reconstruction, making it suitable for rapid iterations, previews, and small edits while preserving the original video’s motion and structure.
flux-2-dev (Text to Image): Flux 2 Dev is a powerful text-to-image diffusion model designed for high-quality, fast, and highly detailed visual generation. It excels at creating cinematic lighting, vibrant compositions, surreal concepts, characters, products, and worlds with strong prompt following and artistic control.
Ideal for rapid image ideation, visual storytelling, and concept art.
flux-2-pro-edit (Image to Image): Flux-2-Pro Edit enables precise, high-fidelity modifications to an existing image while preserving its lighting, style, mood, and composition. It’s ideal for replacing objects, altering materials, adjusting environmental elements, or performing stylistic transformations without damaging the original scene’s quality. Flux-2-Pro maintains ultra-detailed textures and cinematic realism during edits.
bytedance-seedream-v4.5-edit (Image to Image): Seedream-v4.5 Edit allows you to transform an existing image using natural-language instructions. It preserves the core composition, lighting, and style of the original while modifying only the requested elements, perfect for object replacement, environment changes, stylistic adjustments, and high-detail creative reworks.
kling-v2.6-pro-i2v (Image to Video): Kling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.
kling-v2-avatar-standard (Audio to Video): AI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.
wan2.2-spicy-image-to-video (Image to Video): Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.
minimax-speech-2.6-hd (Text to Audio): Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio.
It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.
minimax-speech-2.6-turbo (Text to Audio): Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.
seedance-v1.5-pro-video-extend-fast (Video to Video): Seedance v1.5 Pro Video Extend Fast quickly extends an existing video by generating a short continuation that matches the original style, motion, and lighting. This mode prioritizes fast output and smooth continuity with minimal new motion, making it ideal for previews, quick edits, and lightweight shot extensions without complex effects.
gpt-image-1.5-edit (Image to Image): GPT-Image-1.5 Edit applies precise, instruction-based modifications to an existing image while preserving composition, lighting, perspective, and visual coherence. It’s well-suited for object replacement, concept evolution, symbolic edits, and creative transformations that feel natural and intentional rather than destructive.
grok-imagine-image-to-image (Image to Image): Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized.
ltx-2-19b-text-to-video (Text to Video): LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity. It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it.
flux-2-klein-4b (Text to Image): Flux-2-Klein-4B is a lightweight, fast text-to-image model optimized for clear subject rendering, good prompt adherence, and efficient generation.
It works best with simple compositions, everyday scenes, and cute or friendly visuals, making it ideal for UI graphics, demos, thumbnails, mascots, and quick creative iterations.flux-2-klein-9bText to ImageFlux-2-Klein-9B is a mid-size text-to-image model that balances detail quality and generation speed. It handles richer lighting, better textures, and more nuanced scenes than smaller variants, while still working well with clear, grounded prompts. Ideal for polished illustrations, product visuals, mascots, and everyday scenes with character.z-image-pText to ImageZ-Image P is based on PiAPI's Qubico/z-image text-to-image model.openai-sora-2-standard-image-to-videoImage to VideoOpenAI Sora 2 Standard Image to Video model (High Priority). Generate stunning 10s videos from an image and text prompt.flux-2-klein-9b-turbo-editImage to ImageFlux-2-Klein-9B Turbo Edit offers high-quality, ultra-fast image editing with superior detail retention. This high-efficiency version of Klein 9B Edit handles lighting and textures with precision while delivering edits much faster than the standard variant. Best for polished character edits and professional refinements where speed is critical.292 Models FoundESC TO CLOSETop UpAdd credits to your account and unlock more AI powerSeedance 2.0 Launch OfferAdd $500 more to unlock +3%07d:02h:47m:21s🎁 More discounts ↓ Top up$10 minimum, $5 incrementsCurrencyUSDINRAmex Card User?Please switch currency to INR for successful Amex card payments.$10Buy$50BuyPopular$100BuyCustom Amount ($10 – $500)Buy $10 My AccountUsernameuser mail idCurrent Balance$Top UpBillingTransaction HistoryS.NoDescriptionDateAmountNo transactions yet. Make your first top-up to get started!No results found/1Go --- hidream-i1-devText to ImageOptimized for speed, this variant generates images in just a few steps. 
Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.veo3-image-to-videoImage to VideoVEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.wan2.1-text-to-imageText to ImageWAN 2.1 is a powerful AI model that transforms text prompts into high-resolution, photorealistic images. It excels at detailed object rendering, realistic lighting, and fine textures, making it ideal for visual content, concept art, advertising, and digital storytelling.ai-video-effectsImage to VideoAI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.motion-controlsImage to VideoMotion Controls adds dynamic camera movements, speed ramps, and zoom effects to bring your images to life as smooth, engaging videos.vfxImage to VideoVFX delivers high-impact visual effects like explosions, particles, and cinematic overlays to transform static images into action-packed videos.veo3-text-to-videoText to VideoVEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.flux-kontext-max-t2iText to ImageFlux Kontext Max T2I delivers photorealistic or cinematic-quality images with exceptional detail. It's optimized for high-end visuals — from realistic humans to polished product renders.runway-text-to-videoText to VideoGenerate short, high-quality videos from plain text prompts. RunwayML’s text-to-video model interprets your written description and animates it into a moving visual scene with realistic or stylized motion.suno-extend-musicText to AudioThis API extends audio tracks while preserving the original style of the audio track. It includes Suno's upload functionality, allowing users to upload audio files for processing. 
The expected result is a longer track that seamlessly continues the input style.hunyuan-text-to-videoText to VideoHunyuan T2V generates detailed and dynamic videos from text prompts with a focus on realism and coherent motion. It handles multi-object scenes, human actions, and cinematic compositions effectively, making it ideal for storytelling and visual concepts.veo3-fast-text-to-videoText to VideoVEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.ai-product-shotImage to ImageInstantly generate studio-quality product images with AI. Upload your item photo and get clean, stylized shots perfect for e-commerce, ads, and catalogs.gpt4o-image-to-imageImage to ImageTransform an input image based on a new prompt — like changing style, lighting, or composition. Useful for reinterpreting visuals while keeping structure.hunyuan-image-to-videoImage to VideoHunyuan I2V takes a static image and generates realistic video animations by interpreting motion and context. It works well for human portraits, objects, or scenes, adding lifelike movement while maintaining the image's integrity.ai-video-face-swapVideo to VideoReplace faces in videos with stunning realism. Our AI ensures accurate expression transfer, lighting consistency, and smooth frame-by-frame blending.hunyuan-fast-text-to-videoText to VideoHunyuan Fast T2V provides accelerated video generation from text prompts with slightly reduced detail but excellent speed. Ideal for rapid prototyping, concept testing, and short-form ideas where time is critical.runway-aleph-v2vVideo to VideoTransform any input video into a new visual style or scene while preserving motion and structure. 
Aleph V2V lets you apply artistic looks, cinematic lighting, or thematic changes to existing footage.
minimax-image-01-subject-reference (Image to Image): Minimax’s I2I “Subject Reference” model enables you to transform images while preserving the appearance of a subject using a single reference image. Ideal for maintaining character likeness (features, clothing, or expression) across different styles or settings.
ai-product-photography (Image to Image): Create professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in seconds.
bytedance-seededit-v3 (Image to Image): Seededit allows precise edits to images using masks and prompt guidance. Whether you're replacing backgrounds, changing clothing, or inpainting missing areas, Seededit ensures realistic, high-quality results with semantic control.
ai-background-remover (Image to Image): Instantly remove image backgrounds with pixel-perfect precision. Ideal for product photos, profile pictures, and creative projects.
ai-image-upscaler (Image to Image): Transform blurry or pixelated images into high-definition visuals. Our AI Image Upscaler uses deep learning to reconstruct details and bring your visuals to life.
wan2.2-image-to-video (Image to Video): Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.
runway-act-two-i2v (Image to Video): Upload a single character image and a driving video; the model transfers facial expressions and head movements from the video onto your image, bringing it to life. It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally.
Ideal for avatars, AI presenters, digital actors, and story scenes.
nano-banana-effects (Image to Image): Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.
pixverse-v4.5-i2v (Image to Video): Upload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements. Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos.
ai-image-face-swap (Image to Image): Advanced facial recognition and blending algorithms enable precise face swaps while preserving skin tone, lighting, and facial geometry.
midjourney-v7-omni-reference (Image to Image): Midjourney's Omni Reference lets you reuse characters, creatures, or styles from an existing image and place them into entirely new scenes. Simply provide a reference image (oref) and Midjourney will maintain identity, details, and visual consistency, which is ideal for storytelling, character design, or branding across multiple generations.
ideogram-v3-t2i (Text to Image): Ideogram v3 is an advanced text-to-image model designed for creating highly detailed and visually striking images directly from text prompts. It’s especially good for artistic compositions, design mockups, concept art, and photorealistic scenes. With strong support for text rendering inside images, it’s widely used for posters, typography-based art, and creative branding.
ai-dress-change (Image to Image): Instantly change outfits in images using AI. Visualize different clothing styles without the need for physical trials. Perfect for fashion, e-commerce, and virtual try-ons.
grok-imagine-image-to-video (Image to Video): Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into short (≈6 second) cinematic videos with synchronized ambient audio.
It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.
image-effects (Image to Image): AI Image Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning images from an image.
mmaudio-v2-text-to-audio (Text to Audio): Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.
mmaudio-v2-video-to-video (Video to Video): MMAudio-v2 generates high-quality, synchronized audio from video or text inputs. Seamlessly integrate it with AI video models to create fully-voiced, expressive video content.
wan2.1-image-to-video (Image to Video): Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene, great for bringing art, characters, or photos to life with smooth motion and consistent style.
sync-lipsync (Audio to Video): Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
google-imagen4-fast (Text to Image): Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.
openai-sora-2-pro-image-to-video (Image to Video): Sora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.
runway-act-two-v2v (Video to Video): Take an existing character video and sync it with the motion from a reference video.
This lets you update facial expressions, head turns, and speech gestures while keeping the original look and style. It’s perfect for reshooting performances, dubbing, or animating characters without re-rendering visuals.
luma-modify-video (Video to Video): Luma Modify Video lets you transform an existing video into a new creative scene while keeping the original motion and timing intact. The result is a new video with the same movements but a completely fresh look, atmosphere, or theme.
pixverse-v4.5-t2v (Text to Video): PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles, great for creative storytelling, trailers, and animated concepts.
veo3.1-image-to-video (Image to Video): Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.
seedance-pro-t2v (Text to Video): Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes. Best for storytelling, content creation, and visual production.
nano-banana-edit (Image to Image): Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.
infinitetalk-video-to-video (Video to Video): InfiniteTalk Video-to-Video enhances or transforms existing videos by syncing the subject’s lip movements and facial expressions with new dialogue or speech.
Instead of starting from a still image, you provide a video clip, and the model seamlessly reanimates the speaker’s mouth and expressions to match the script.
openai-sora-2-image-to-video (Image to Video): Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.
veo3.1-text-to-video (Text to Video): Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.
chroma-image (Text to Image): Chroma Image is an advanced text-to-image generation model designed for high-quality, creative, and versatile visuals. It can produce anything from photorealistic portraits and products to imaginative concept art, fantasy illustrations, and cinematic scenes.
wan2.2-animate (Video to Video): Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload both an image (for the character) and a video containing motion/expression, and the model generates a video where the character in your image moves like the reference. Supports 480p or 720p, up to 120 seconds.
openai-sora-2-text-to-video (Text to Video): Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals.
Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.
minimax-hailuo-2.3-pro-i2v (Image to Video): Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame, delivering smooth, film-grade clips.
ai-skin-enhancer (Image to Image): Smooth skin, reduce blemishes, and enhance complexion with natural-looking results. Perfect for portraits, selfies, and professional photo retouching.
wan2.5-image-to-video-fast (Image to Video): Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.
ai-clipping (Video to Video): Convert long-form videos into engaging short clips using AI clipping.
ltx-2-19b-lipsync (Audio to Video): LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration.
ai-captions (Video to Video): Add AI-generated animated captions to any video using Vadoo's caption engine. Supports multiple languages and viral caption themes like Hormozi style. Perfect for social media creators, marketers, and content producers.
midjourney-v7-text-to-image (Text to Image): Midjourney V7 produces high-quality, stylized images from text prompts.
Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations.
ai-color-photo (Image to Image): Automatically add lifelike colors to black-and-white images. Our AI brings history to life with natural tones, accurate shading, and context-aware colorization.
flux-dev (Text to Image): Generate stunning visuals from simple text prompts. Flux Dev transforms your ideas into high-quality, creative images using powerful AI vision models. Perfect for design, storytelling, concept art, and marketing.
hidream-i1-fast (Text to Image): Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail.
openai-sora-2-pro-text-to-video (Text to Video): Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.
veo3-fast-image-to-video (Image to Video): Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality, powered by Google's VEO3 on MuAPI.
flux-kontext-dev-t2i (Text to Image): Generates an image from a text prompt, with optional reference image for pose or style guidance. Ideal for controlled, consistent image creation using just a description.
seedance-lite-t2v (Text to Video): Seedance Lite T2V offers quick video generation from text with decent visual quality and motion. Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail.
kling-o1-text-to-video (Text to Video): Kling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips.
It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering.
seedance-v2.0-extend (Text to Video): Seedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension; the model preserves visual style, motion, characters, and audio consistency across the new segment.
hidream-i1-full (Text to Image): The most advanced version of HiDream I1, delivering high-resolution, detailed images with superior prompt understanding. Best suited for production, content creation, and high-fidelity applications.
flux-kontext-pro-i2i (Image to Image): Flux Kontext Pro I2I variant enables transforming base images into refined artwork while keeping structure intact. It’s useful for sketch refinement, visual style changes, and creative edits such as re-dressing, relighting, or re-theming with prompt guidance.
ai-ghibli-style (Image to Image): Bring your imagination to life with art inspired by the enchanting world of Studio Ghibli. This AI model generates dreamy, hand-drawn visuals with soft colors, whimsical characters, and painterly backgrounds.
wan2.1-lora-t2v (Training): WAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules. Tailor the generation to specific characters, outfits, or animation styles, ideal for brand storytelling, fan content, and stylized animations.
portrait-stylist (Image to Image): Professional AI portrait styles including hair, makeup, style, and fashion transformations.
kling-v2.1-pro-i2v (Image to Video): Kling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects.
veo3.1-fast-text-to-video (Text to Video): Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos.
It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.
qwen-image-2.0-pro-edit (Image to Image): Qwen 2.0 Pro Image Edit model with maximum precision and modifications.
ai-anime-generator (Text to Image): Create stunning anime-style artwork instantly with our AI Anime Generator. Customize characters, scenes, and styles effortlessly in seconds!
z-image-turbo (Text to Image): Z-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.
kling-v2-avatar-pro (Audio to Video): AI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video. It preserves identity, lip syncs accurately to the audio, adds natural head movement, eye motion, expressions, and cinematic lighting.
openai-sora-2-standard-text-to-video (Text to Video): OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts.
ai-image-extension (Image to Image): Expand the edges of any image with AI. This model continues your original photo or artwork beyond its borders while matching style, lighting, and content.
wan2.1-lora-i2v (Training): Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency. Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA.
pixverse-v5.5-i2v (Image to Video): PixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip.
It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.
heygen-video-translate (Video to Video): Convert any video into 175+ languages with synchronized voice translation, AI-voice cloning, and accurate lip sync. Just upload your video (or provide a link), select a target language, and HeyGen recreates the speech in that language. $0.05 per second.
wan2.2-spicy-video-extend (Video to Video): Wan-2.2-spicy Video Extend continues an existing video by generating new frames that match the original style but add stronger motion, bolder effects, and spicier dramatics.
kling-o1-standard-reference-to-video (Image to Video): Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation. This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes.
qwen-image-edit-2511 (Image to Image): Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts.
add-image-watermark (Image to Image): Add custom watermark to images with adjustable position, opacity, and size. Free local processing using PIL.
kling-v2.6-pro-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.
ai-object-eraser (Image to Image): Easily remove unwanted objects, people, or text from any image using AI. Just select the area you want to erase, and the model will intelligently fill the space with realistic background matching the surrounding environment. No Photoshop skills needed.
runway-image-to-video (Image to Video): Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics.
suno-create-music (Text to Audio): Suno turns text prompts into full songs, complete with vocals, lyrics, and instrumentation. You can describe a mood, genre, or even a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.
z-image-base (Text to Image): Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts. It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts.
flux-kontext-pro-t2i (Text to Image): Flux Kontext Pro T2I offers fast and reliable generation with creative flexibility. It supports stylized prompts, character design, and fantasy themes while maintaining clear subject coherence.
flux-krea-dev (Text to Image): Flux Krea Dev is a text-to-image model built by Black Forest Labs in collaboration with Krea AI, designed to generate highly photorealistic images that avoid the common 'AI look' artifacts (plastic skin, overexposed lighting, synthetic textures).
It emphasizes real texture, natural lighting, and aesthetic control.
flux-kontext-max-i2i (Image to Image): Flux Kontext Max I2I allows precise image enhancement and visual transformations while retaining the source layout. It’s powerful for retouching, photo-to-art workflows, and concept refinement.
tiktok-carousel (Text to Image): AI TikTok Carousel Generator: create viral TikTok carousel posts from a single text prompt. Choose a proven storytelling format (Problem-Solution, Listicle, Tutorial, Before & After), set your slide count (3-10), and get stunning AI-generated images at 1080x1920 portrait resolution, ready to upload to TikTok.
bytedance-seedream-v5.0-edit (Image to Image): Seedream 5.0 Lite Edit is an advanced image transformation model by ByteDance, enabling precise, controllable edits using natural language. It specializes in high-fidelity style transfer (Anime, Cyberpunk, Fantasy), background swaps, and object modification while preserving original lighting, color tones, and character consistency for professional-grade creative reworks.
gpt4o-text-to-image (Text to Image): Generate images from text prompts using GPT-4o's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
gpt4o-edit (Image to Image): Edit a specific part of an image using natural language. Ideal for object removal, replacement, or content-aware filling.
kling-o1-standard-image-to-video (Image to Video): Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation. This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.
wan2.1-text-to-video (Text to Video): WAN 2.1 turns your written prompts into vivid, cinematic video clips.
Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements, all from just text.
perfect-pony-xl (Text to Image): Pony XL is a high-quality image generation model based on Stable Diffusion XL architecture. It specializes in character art, hybrid styles, and producing detailed, polished visuals even with simpler prompts.
wan2.2-speech-to-video (Audio to Video): WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines.
flux-2-klein-4b-turbo (Text to Image): Flux-2-Klein-4B Turbo is an ultra-fast, high-efficiency text-to-image model. It is a distilled version of the Klein 4B model, designed for near-instant rendering while maintaining impressive adherence to prompts. Perfect for rapid prototyping, real-time creative tools, and applications where speed is paramount.
seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference: generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.
midjourney-v7-image-to-image (Image to Image): Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, great for alternate versions, art variations, or polishing concepts.
flux-2-klein-9b-turbo (Text to Image): Flux-2-Klein-9B Turbo is a high-performance, mid-size text-to-image model. This distilled variant of Klein 9B provides a superior balance of speed and detail, delivering richer textures and complex scenes with significantly reduced generation times.
Ideal for polished illustrations and character-rich visuals where performance is key.
grok-imagine-text-to-video (Text to Video): Grok Imagine is xAI’s fast, creative text-to-video model that generates short (~6-second) cinematic clips with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.
ltx-2.3-video-extend (Video to Video): LTX-2.3 Video Extend seamlessly continues an existing video clip by generating additional frames that match the original motion, style, and scene composition. Powered by the LTX-2.3 architecture, it maintains temporal coherence and visual fidelity across the extension boundary.
seedance-lite-reference-video (Image to Video): Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video.
bytedance-seedream-v3 (Text to Image): Seedream is designed for generating visually rich and artistic images from text prompts. It excels at fantasy, anime, surrealism, and vibrant color compositions, ideal for creative visuals, storyboards, and concept art.
kling-v2.1-master-i2v (Image to Video): Kling 2.1 Master’s I2V animates a still image into a coherent video sequence. It interprets motion, environment, and context to create realistic, visually stunning video outputs, ideal for animating portraits, scenes, or concept art.
flux-kontext-effects (Image to Image): Flux Kontext Effects is a creative image and video model that applies stylized transformations, cinematic filters, and artistic reinterpretations to your inputs. Instead of generating new content from scratch, it enhances or reimagines existing images and videos with unique looks, ranging from surreal effects to realistic cinematic moods.
kling-v2.1-standard-i2v (Image to Video): Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame.
It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations, ideal for portraits, digital art, and cinematic illustrations.
qwen-image (Text to Image): Generate high-quality, detailed images from text prompts in various styles, from realistic to artistic, perfect for creative visuals, product shots, and concept art.
midjourney-v7-style-reference (Image to Image): Generate images in the distinctive aesthetic of Midjourney v7, blending cinematic depth, photorealism or painterly rendering, rich textures, and dynamic lighting. This style reference model helps you infuse any subject with the visual storytelling, composition, and high detail fidelity that Midjourney is known for. Ideal for concept art, stylized portraits, and stunning environment scenes.
openai-sora (Text to Video): Sora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.
veo3.1-4k-video (Text to Video): Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications. It enhances visual fidelity while maintaining temporal coherence and realistic motion.
flux-pulid (Image to Image): Flux PuLID is an innovative image-to-image model that enables consistent face rendering across different styles or scenes, without needing any model fine-tuning. By providing a reference image (e.g., a portrait), the model generates new visuals while maintaining your subject’s identity with high fidelity.
wan2.5-text-to-video-fast (Text to Video): Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives.
Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.
hunyuan-image-3.0 (Text to Image): Hunyuan Image 3.0 brings together powerful architecture (Mixture-of-Experts + autoregressive style) to produce richly detailed and coherent images from complex prompts. It can read narrative descriptions, render text and signage cleanly, and support multiple visual styles, from photorealism to illustrations.
kling-o1-text-to-image (Text to Image): Kling O1 Text-to-Image is a high-fidelity creative image model that converts rich natural-language prompts into ultra-detailed stills. It excels at cinematic composition, realistic lighting, and coherent scene detail, great for concept art, environment renders, character portraits, and stylized imagery with photoreal or illustrative looks.
latent-sync (Audio to Video): LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
video-effects (Image to Video): AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.
creatify-lipsync (Audio to Video): Realistic lipsync video, optimized for speed, quality, and consistency.
nano-banana-2 (Text to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.
qwen-image-edit (Image to Image): The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements).
ltx-2-pro-image-to-video (Image to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs.
It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
photo-pack (Image to Image): Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.
vidu-q1-reference (Image to Video): Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references (up to seven images) and text prompts. Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements.
wan2.2-5b-fast-t2v (Text to Video): Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.
minimax-hailuo-02-standard-i2v (Image to Video): Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews.
wan2.2-text-to-video (Text to Video): Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.
flux-2-flex-edit (Image to Image): Flux-2-Flex Edit allows flexible transformation of an existing image: object replacement, material changes, lighting adjustments, style shifts, or localized edits. It preserves the original scene’s geometry, perspective, and lighting while modifying only what the edit prompt specifies.
ideogram-v3-reframe (Image to Image): Ideogram V3 Reframe is a specialized image-to-image model built on Ideogram 3.0, designed to intelligently extend and adapt images across diverse aspect ratios and resolutions.
Leveraging advanced AI outpainting, it preserves visual consistency while enabling creative reframing for digital, print, and video content.

veo3.1-fast-image-to-video (Image to Video): Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.

kling-o1-standard-video-edit (Video to Video): Kling O1 Standard Video-to-Video Edit modifies an existing video while preserving its original structure, motion, and realism. It is designed for subtle, stable edits such as object replacement, background changes, lighting adjustments, or small visual tweaks. This mode prioritizes temporal consistency and natural motion.

flux-2-pro (Text to Image): Flux-2-Pro Text-to-Image is a premium, high-fidelity generative model capable of producing ultra-realistic, cinematic, and deeply detailed images from text prompts. It excels at complex lighting, layered compositions, surreal visual concepts, and professional art-grade rendering suitable for concept art, advertising visuals, and world-building.

minimax-hailuo-02-standard-t2v (Text to Video): Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality.

seedance-2.0-watermark-remover (Video to Video): FREE for a limited time. Remove Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits are deducted, but a positive balance is required for access.

flux-2-flex (Text to Image): Flux-2-Flex Text-to-Image is a flexible, high-fidelity generative model capable of producing detailed, imaginative, and stylistically rich scenes from text alone.
It excels at surreal concepts, fantasy environments, sci-fi structures, cinematic atmospheres, and high-resolution artistic compositions with strong prompt adherence.

midjourney-v7-image-to-video (Image to Video): Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion.

flux-schnell (Text to Image): Flux Schnell is a lightning-fast image generation model designed for rapid iterations. It delivers good visual quality from text prompts almost instantly, making it perfect for real-time concept testing, brainstorming, and UI-integrated experiences.

vidu-v2.0-t2v (Text to Video): Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation.

ai-dance-effects (Video to Video): Bring your characters and worlds to life with AI Dance Effects, a creative video effect that adds playful, dynamic, and cinematic motion to your generations. AI Dance Effects lets you guide how characters move, react, and express themselves.

bytedance-seedream-v4.5 (Text to Image): Seedream-v4.5 is ByteDance’s advanced text-to-image diffusion model designed for generating high-detail, high-contrast, cinematic and stylized images. It excels at surreal fantasy concepts, sci-fi worlds, product visuals, photoreal scenes, and artistic compositions with strong prompt adherence and crisp detail.

kling-v2.6-pro-t2v (Text to Video): Kling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.

flux-2-klein-4b-turbo-edit (Image to Image): Flux-2-Klein-4B Turbo Edit provides ultra-fast, instruction-based image editing.
This high-efficiency variant of Klein 4B Edit is optimized for near-instant swaps and tweaks while preserving layout and lighting. Ideal for real-time design tools and quick creative adjustments.

openai-sora-2-pro-characters (Text to Text): Create consistent AI characters for your Sora 2 videos. Provide a previous video's task ID and a prompt to define or refine your character.

nano-banana (Text to Image): Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing. It produces hyper-realistic, physics-aware visuals with seamless style transformations.

pixverse-v5-t2v (Text to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.

wan2.6-image-to-video (Image to Video): WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.

google-imagen4 (Text to Image): Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.

google-imagen4-ultra (Text to Image): Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.

wan2.6-text-to-image (Text to Image): WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts.
It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery.

wan2.1-reference-video (Image to Video): WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video. By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.

qwen-image-2.0 (Text to Image): Qwen 2.0 Text to Image model with enhanced realism.

veed-lipsync (Audio to Video): Generate realistic lipsync from any audio using VEED's latest model.

sdxl-image (Text to Image): SDXL is a high-quality, large Stable Diffusion model for creating photorealistic and stylized images from text. It excels at fine detail, realistic lighting, and complex scenes.

infinitetalk-image-to-video (Audio to Video): InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos. You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech.

luma-flash-reframe (Video to Video): Transform and resize your videos effortlessly with Ray 2 Flash Reframe. This tool intelligently expands or adjusts your video’s aspect ratio, adding visually consistent content to the sides, top, or bottom, without altering the original subject.

ai-video-upscaler (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos.
Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.

flux-redux (Image to Image): Flux Redux is a transformation model that reimagines or enhances your input images while preserving their main structure and subject. It’s built for creative refinement, whether you want style transfer, artistic reinterpretation, cinematic polish, or mood transformation.

qwen-image-2.0-pro (Text to Image): Qwen 2.0 Pro Text to Image model with maximum realism and fidelity.

seedance-v1.5-pro-i2v-fast (Image to Video): Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation.

seedance-v1.5-pro-video-extend (Video to Video): Seedance v1.5 Pro Video Extend continues an existing video by generating additional frames that match the original scene’s style, lighting, motion, and mood. It is designed for smooth temporal consistency, making it ideal for extending cinematic shots, atmospheric scenes, or slow camera moves without introducing visual jumps or style changes.

vidu-v2.0-i2v (Image to Video): Vidu's 2.0 model delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content.

wan2.2-edit-video (Video to Video): Easily modify existing videos using simple text commands. With Wan 2.2 Video-Edit, you can change attire, character appearance, or other visual elements directly within your video, with no need to start from scratch.
Works on uploads of 480p or 720p, for up to two minutes.

nano-banana-2-edit (Image to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

kling-v1-avatar-pro (Audio to Video): Kling AI Avatar Pro is the premium tier for making high-quality talking avatars. You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync.

ovi-image-to-video (Image to Video): Ovi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio / sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.

ltx-2.3-lipsync (Audio to Video): LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency, powered by the upgraded LTX-2.3 architecture.

kling-v3.0-std-motion-control (Video to Video): Kling V3.0 Standard Motion Control allows for precise control over the camera and subject movement in generated videos. Powered by the latest Kling V3.0 architecture for improved temporal consistency and quality.

ovi-text-to-video (Text to Video): Ovi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally.
Videos are generated in 540p resolution.

seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

kling-o1-video-edit (Video to Video): Kling O1 Video Edit lets you send an existing video clip plus an instruction/prompt to edit or transform the clip while preserving temporal coherence and subject identity. Typical edits include color grading, background replacement, object removal, slow motion, speed ramps, style transfer, subtle camera stabilization, and short extension/outro generation. Inputs can include the source video, an optional frame mask (for localized edits), a time range, and style/reference images.

vidu-q2-reference-to-image (Image to Image): VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.

minimax-hailuo-02-pro-i2v (Image to Video): Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling.

bytedance-seedream-v5.0 (Text to Image): Seedream 5.0 Lite is ByteDance’s next-generation text-to-image model, delivering high-fidelity AI art with advanced visual reasoning and precise typography. Supporting up to 4K resolution and cinematic detail, it excels at complex scene construction, consistent character generation, and real-time knowledge integration for accurate, contextually relevant visuals.

ltx-2-fast-text-to-video (Text to Video): LTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals.
The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.

wan2.6-text-to-video (Text to Video): WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts. It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.

qwen-text-to-image-2512 (Image to Image): Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts. It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, fantasy and sci-fi worlds.

kling-v3.0-pro-image-to-video (Image to Video): Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting.

any-llm (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

qwen-image-2.0-edit (Image to Image): Qwen 2.0 Image Edit model with precise background modification and enhancements.

kling-v3.0-standard-text-to-video (Text to Video): Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments.

kling-v2.6-std-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time.

ltx-2.3-image-to-video (Image to Video): LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency.

minimax-hailuo-02-pro-t2v (Text to Video): High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail.

ltx-2.3-text-to-video (Text to Video): LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation.

topaz-image-upscale (Image to Image): Topaz Image Upscale is a high-quality image-to-image enhancement model that increases resolution, sharpness, and detail using AI super-resolution. It improves clarity, restores texture, reduces noise, and produces crisp, high-res output while preserving natural look and fine edges.

seedance-pro-i2v (Image to Video): Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits.

flux-2-dev-edit (Image to Image): Flux 2 Dev Edit takes an existing image and applies transformations, replacements, or style changes based on a text instruction. It preserves composition, lighting, and the overall scene while modifying only what the edit prompt specifies.
Ideal for creative replacements, stylistic adjustments, object swaps, and environment changes while keeping the original artistic integrity.

video-combiner (Video to Video): Combine multiple short video clips (5s, 10s, etc.) into a single seamless full-length video. Upload your clips in order and choose the final output aspect ratio. 'Auto' preserves the aspect ratio of your first clip.

suno-generate-sounds (Text to Audio): Generate sound effects using Suno chirp-crow model.

suno-generate-lyrics (Text to Text): Generate lyrics using Suno.

vidu-q2-text-to-image (Text to Image): VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.

suno-boost-music-style (Text to Text): Boost style prompts for Suno music generation.

pixverse-v5.5-t2v (Text to Video): PixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.

seedance-lite-i2v (Image to Video): Seedance Lite I2V version animates static images into short videos quickly, focusing on basic motion effects and efficient processing, best suited for fast demos or mobile-friendly use.

openrouter-vision (Text to Text): Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

suno-add-vocals (Text to Audio): Add vocals to an instrumental track.

seedance-v1.5-pro-t2v-fast (Text to Video): Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability.
It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting.

ltx-2-19b-image-to-video (Image to Video): LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics, well suited for grounded scenes, near-future concepts, and story beats.

suno-generate-mashup (Text to Audio): Create a mashup using 1-5 audio tracks.

pixverse-v5-i2v (Image to Video): PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects.

bytedance-seedream-v4 (Text to Image): Seedream v4 generates stunning, high-fidelity images from text prompts. It’s designed for creativity with strong support for realism, fantasy, and artistic styles.

bytedance-seedream-v4-edit (Image to Image): Seedream v4 Edit refines or transforms existing images based on a new prompt and a reference. Instead of masking, you provide a source image and describe how it should be altered, adjusting style, details, or replacing elements while keeping the subject consistent.

nano-banana-pro (Text to Image): Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution.

minimax-voice-clone (Text to Audio): Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.

suno-add-instrumental (Text to Audio): Add instrumental backing to acapella audio.

wan2.6-image-edit (Image to Image): WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting.
It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts, keeping the original scene coherent and cinematic.

seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

seedance-v1.5-pro-i2v (Image to Video): Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots.

seedance-v2.0-t2v (Text to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

flux-2-klein-9b-edit (Image to Image): Flux-2-Klein-9B Edit performs higher-quality image edits with better detail retention, lighting consistency, and texture handling compared to smaller variants. It’s well-suited for cute character edits, object additions, and visual refinements that need to look natural and polished while keeping the original scene intact.

kling-v3.0-pro-text-to-video (Text to Video): Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency. It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions.

flux-dev-lora (Training): Enables text-to-image generation using custom LoRA models. Generate consistent characters, styles, or branded visuals with high quality and fast results.

flux-kontext-dev-i2i (Image to Image): Takes an input image and transforms it based on a new prompt.
Keeps structure or pose while changing style, appearance, or details.

neta-lumina (Text to Image): Neta Lumina is a powerful anime-style text-to-image model developed by Neta.art Lab. It’s built on Lumina-Image-2.0, fine-tuned with over 13 million high-quality anime images. It offers strong understanding of multilingual prompts, excellent detail fidelity, and support for Danbooru tags, and it leans into niche styles such as furry, Guofeng, pets, and scenic backgrounds.

suno-remix-music (Text to Audio): This API covers an audio track by transforming it into a new style while retaining its core melody. It incorporates Suno's upload capability, enabling users to upload an audio file for processing. The expected result is a refreshed audio track with a new style, keeping the original melody intact.

gpt-image-1.5 (Text to Image): GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.

kling-v3.0-pro-motion-control (Video to Video): Kling V3.0 Pro Motion Control provides the highest level of detail and control for video generation. Suitable for professional workflows requiring complex cinematic camera work and subject consistency.

kling-v2.1-master-t2v (Text to Video): Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality, perfect for storytelling, ads, or content creation from imagination alone.

flux-2-klein-4b-edit (Image to Image): Flux-2-Klein-4B Edit applies lightweight, instruction-based edits to an existing image. It’s best for clear object swaps, small visual changes, and cute enhancements while preserving the original scene’s layout and lighting.
Ideal for fast edits, UI demos, and simple creative tweaks.

ideogram-character (Image to Image): Ideogram’s Character Reference model enables consistent character generation using just one reference image. Upload a clear character portrait, and you can place that character in unlimited scenes, styles, poses, or narratives with visual fidelity maintained across all outputs.

kling-v3.0-standard-image-to-video (Image to Video): Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots.

kling-v1-avatar-standard (Audio to Video): Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily.

kling-v2.5-turbo-pro-i2v (Image to Video): Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

sdxl-lora (Training): The SDXL LoRA image model enhances Stable Diffusion XL with specialized fine-tuning, letting you generate images in unique styles, characters, or themes. By applying LoRA weights, you can create visuals that match a specific aesthetic, celebrity look, anime style, or custom-trained subject.

seedance-v1.5-pro-t2v (Text to Video): Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth.

hunyuan-image-2.1 (Text to Image): Hunyuan Image is a powerful text-to-image generation model that produces photorealistic and highly detailed visuals.
It excels at creating portraits, environments, and concept art with strong consistency and realism. Designed for versatility, it supports both natural photography styles and imaginative artistic outputs.

qwen-image-edit-plus (Image to Image): Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images.

kling-v2.5-turbo-pro-t2v (Text to Video): Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

leonardoai-lucid-origin (Text to Image): Lucid Origin is LeonardoAI’s advanced image generation model, designed for ultra-realistic, vibrant, and highly detailed visuals. It excels at creating photorealistic portraits, landscapes, product shots, and stylized art while faithfully following complex prompts.

wan2.5-image-to-video (Image to Video): WAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion (camera pans, environmental movement, and realistic physics) across the result.

wan2.5-text-to-video (Text to Video): WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.

wan2.5-text-to-image (Text to Image): WAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions.
It supports detailed visual storytelling, cinematic compositions, and versatile styles, from portraits and product shots to landscapes and fantasy scenes.

topaz-video-upscale (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.

wan2.5-image-edit (Image to Image): The Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments, whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.

ai-video-upscaler-pro (Video to Video): The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.

add-video-watermark (Video to Video): Add custom watermark to videos with adjustable position, opacity, and size. Free local processing using FFmpeg.

video-watermark-remover (Video to Video): The AI Video Watermark Remover is our flagship model designed to remove Sora 2 watermarks, logos, captions, and unwanted text from videos without compromising quality. Supporting a wide range of formats, it's fast, efficient, and processes with the highest quality.

leonardoai-phoenix-1.0 (Text to Image): LeonardoAI Phoenix 1.0 is a professional-grade AI image model designed for realistic, cinematic, and highly detailed visuals.
It excels at interpreting complex prompts, rendering text within images, and creating high-resolution outputs suitable for editorial, commercial, or creative projects.

gpt-5-nano (Text to Text): GPT-5 Nano is a lightweight, high-speed language model from the GPT-5 family designed for instant text generation. It delivers intelligent, context-aware responses for creative writing, summarization, dialogue, code generation, and automation, all at low latency and cost. Perfect for chatbots, assistants, content tools, and real-time applications that need fast, reliable text output.

leonardoai-motion-2.0 (Image to Video): Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics.

higgsfield-soul-image-to-image (Image to Image): SOUL is an AI image model focused on hyper-realistic, magazine or editorial-style visuals, especially for fashion, portraits, lifestyle, and commercial content. It offers over 50 curated style presets to get a specific aesthetic without needing complicated prompt engineering. It generates photography-quality images with lighting, textures, and context that feel real, including natural imperfections like film grain, dust, or lens effects for authenticity.

veo3.1-reference-to-video (Image to Video): Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips.
It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.

higgsfield-dop-image-to-video (Image to Video): Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects (such as explosions, fire, distortion, disintegration, and transitions) directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline.

remix-video (Video to Video): Transform and resize your videos effortlessly with the Remix Video tool.

openai-sora-2-pro-storyboard (Text to Video): Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video (setting, characters, actions, timing), and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow.

veo3.1-extend-video (Text to Video): Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene, keeping visual style, characters, motion, and audio consistent. This model needs the original task_id of the video.

gpt-5-mini (Text to Text): GPT-5 Mini is a compact yet powerful AI that converts plain text ideas into detailed, structured prompts suitable for use in text-to-image, text-to-video, and other generative AI models. It’s perfect for creators who want to quickly craft high-quality prompts without manually thinking about style, composition, and descriptive details. The model helps accelerate workflows for artists, video producers, and designers.

seedance-pro-i2v-fast (Image to Video): Seedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series.
With this model you upload a reference image and, using a text prompt, generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround, making it ideal for social content, product motion, storytelling from a still, and fast prototyping.
seedance-pro-t2v-fast (Text to Video): Seedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.
ltx-2-pro-text-to-video (Text to Video): LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.
ltx-2-fast-image-to-video (Image to Video): LTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.
vidu-q2-reference (Image to Video): Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images.
Each image refines the model’s understanding of subject, environment, and visual tone, ensuring perfect consistency in appearance and motion across every frame.
vidu-q2-turbo-start-end-video (Image to Video): Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.
vidu-q2-pro-start-end-video (Image to Video): Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.
minimax-hailuo-2.3-pro-t2v (Text to Video): Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.
minimax-hailuo-2.3-standard-i2v (Image to Video): Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence. Generates video at 768p.
minimax-hailuo-2.3-standard-t2v (Text to Video): Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. Generates video at 768p.
minimax-hailuo-2.3-fast (Image to Video): Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency.
Generates video at 768p.
kling-v2.5-turbo-std-i2v (Image to Video): Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
reve-text-to-image (Text to Image): Generate images from text prompts using Reve's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions.
reve-image-edit (Image to Image): ReVE Edit is a next-generation image editing model that allows users to apply detailed visual transformations through natural language. Whether you want to restyle portraits, modify backgrounds, or create artistic reinterpretations, ReVE Edit delivers realistic and coherent results while preserving structure and identity.
grok-imagine-text-to-image (Text to Image): Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity. Each request returns 6 images.
seedvr2-image-upscale (Image to Image): SeedVR2 is a one-step diffusion-transformer model designed for image restoration, super-resolution, deblurring, and artifact removal. It enhances low-quality or compressed images into clean, sharp, high-resolution results while preserving natural colors and fine details.
qwen-image-edit-plus-lora (Image to Image): Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor supporting multi-image edits, single-image consistency, and native ControlNet. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
nano-banana-pro-edit (Image to Image): Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image).
It offers advanced image-editing capabilities with improved resolution.
kling-o1-edit-image (Image to Image): Kling O1 Image Edit applies targeted transformations to an existing image while preserving composition, lighting, and visual consistency. Use it to replace objects, retouch elements, change materials, or apply stylistic shifts with high fidelity and minimal artifacts.
kling-o1-image-to-video (Image to Video): Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames.
kling-o1-reference-to-video (Image to Video): Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation.
kling-o1-video-edit-fast (Video to Video): Video Edit Fast is the lightweight, high-speed editing mode of Kling O1. It performs quick edits on an existing video without heavy processing, making it ideal for fast object replacements, light enhancements, color tweaks, or simple visual adjustments. This mode focuses on speed over complex reconstruction, making it suitable for rapid iterations, previews, and small edits while preserving the original video’s motion and structure.
flux-2-dev (Text to Image): Flux 2 Dev is a powerful text-to-image diffusion model designed for high-quality, fast, and highly detailed visual generation. It excels at creating cinematic lighting, vibrant compositions, surreal concepts, characters, products, and worlds with strong prompt following and artistic control.
Ideal for rapid image ideation, visual storytelling, and concept art.
flux-2-pro-edit (Image to Image): Flux-2-Pro Edit enables precise, high-fidelity modifications to an existing image while preserving its lighting, style, mood, and composition. It’s ideal for replacing objects, altering materials, adjusting environmental elements, or performing stylistic transformations without damaging the original scene’s quality. Flux-2-Pro maintains ultra-detailed textures and cinematic realism during edits.
bytedance-seedream-v4.5-edit (Image to Image): Seedream-v4.5 Edit allows you to transform an existing image using natural-language instructions. It preserves the core composition, lighting, and style of the original while modifying only the requested elements, making it well suited to object replacement, environment changes, stylistic adjustments, and high-detail creative reworks.
kling-v2.6-pro-i2v (Image to Video): Kling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.
kling-v2-avatar-standard (Audio to Video): AI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.
wan2.2-spicy-image-to-video (Image to Video): Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.
minimax-speech-2.6-hd (Text to Audio): Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio.
It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.
minimax-speech-2.6-turbo (Text to Audio): Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.
seedance-v1.5-pro-video-extend-fast (Video to Video): Seedance v1.5 Pro Video Extend Fast quickly extends an existing video by generating a short continuation that matches the original style, motion, and lighting. This mode prioritizes fast output and smooth continuity with minimal new motion, making it ideal for previews, quick edits, and lightweight shot extensions without complex effects.
gpt-image-1.5-edit (Image to Image): GPT-Image-1.5 Edit applies precise, instruction-based modifications to an existing image while preserving composition, lighting, perspective, and visual coherence. It’s well-suited for object replacement, concept evolution, symbolic edits, and creative transformations that feel natural and intentional rather than destructive.
grok-imagine-image-to-image (Image to Image): Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized.
ltx-2-19b-text-to-video (Text to Video): LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity. It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it.
flux-2-klein-4b (Text to Image): Flux-2-Klein-4B is a lightweight, fast text-to-image model optimized for clear subject rendering, good prompt adherence, and efficient generation.
It works best with simple compositions, everyday scenes, and cute or friendly visuals, making it ideal for UI graphics, demos, thumbnails, mascots, and quick creative iterations.
flux-2-klein-9b (Text to Image): Flux-2-Klein-9B is a mid-size text-to-image model that balances detail quality and generation speed. It handles richer lighting, better textures, and more nuanced scenes than smaller variants, while still working well with clear, grounded prompts. Ideal for polished illustrations, product visuals, mascots, and everyday scenes with character.
z-image-p (Text to Image): Z-Image P is based on PiAPI's Qubico/z-image text-to-image model.
openai-sora-2-standard-image-to-video (Image to Video): OpenAI Sora 2 Standard Image to Video model (High Priority). Generate stunning 10s videos from an image and text prompt.
flux-2-klein-9b-turbo-edit (Image to Image): Flux-2-Klein-9B Turbo Edit offers high-quality, ultra-fast image editing with superior detail retention. This high-efficiency version of Klein 9B Edit handles lighting and textures with precision while delivering edits much faster than the standard variant.
Best for polished character edits and professional refinements where speed is critical.

Explore Models
Discover and run the world's most advanced AI models.
Categories: Text to Video (47), Text to Image (53), Image to Video (64), Image to Image (61), Video to Video (33), Text to Audio (11), Audio to Video (12), Text to Text (7), Training (4).
All Models: 292 available, sorted by newly added.

suno-add-instrumental (Text to Audio): Add instrumental backing to acapella audio. Price: $0.09
suno-generate-mashup (Text to Audio): Create a mashup using 1-5 audio tracks. Price: $0.09
suno-add-vocals (Text to Audio): Add vocals to an instrumental track. Price: $0.09
suno-boost-music-style (Text to Text): Boost style prompts for Suno music generation. Price: $0.003
suno-generate-lyrics (Text to Text): Generate lyrics using Suno. Price: $0.003
suno-generate-sounds (Text to Audio): Generate sound effects using the Suno chirp-crow model. Price: $0.02
video-combiner (Video to Video): Combine multiple short video clips (5s, 10s, etc.) into a single seamless full-length video. Upload your clips in order and choose the final output aspect ratio. 'Auto' preserves the aspect ratio of your first clip. Price: $0.05
seedance-2.0-omni-reference (Image to Video): Seedance 2.0 Omni Reference generates videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips.
Use @image1, @video1, @audio1 syntax in your prompt. Price: $1.5
seedance-v2.0-video-edit (Video to Video): Seedance 2.0 Video Edit modifies existing videos based on text prompts and optional reference images. Price: $0.6
openai-sora-2-pro-characters (Text to Text): Create consistent AI characters for your Sora 2 videos. Provide a previous video's task ID and a prompt to define or refine your character. Price: $0.1
portrait-stylist (Image to Image): Professional AI portrait styles including hair, makeup, style, and fashion transformations. Price: $0.01
flux-2-klein-9b-turbo-edit (Image to Image): Flux-2-Klein-9B Turbo Edit offers high-quality, ultra-fast image editing with superior detail retention. This high-efficiency version of Klein 9B Edit handles lighting and textures with precision while delivering edits much faster than the standard variant. Best for polished character edits and professional refinements where speed is critical. Price: $0.0104
flux-2-klein-9b-turbo (Text to Image): Flux-2-Klein-9B Turbo is a high-performance, mid-size text-to-image model. This distilled variant of Klein 9B provides a superior balance of speed and detail, delivering richer textures and complex scenes with significantly reduced generation times. Ideal for polished illustrations and character-rich visuals where performance is key. Price: $0.0065
flux-2-klein-4b-turbo-edit (Image to Image): Flux-2-Klein-4B Turbo Edit provides ultra-fast, instruction-based image editing. This high-efficiency variant of Klein 4B Edit is optimized for near-instant swaps and tweaks while preserving layout and lighting. Ideal for real-time design tools and quick creative adjustments. Price: $0.0078
flux-2-klein-4b-turbo (Text to Image): Flux-2-Klein-4B Turbo is an ultra-fast, high-efficiency text-to-image model. It is a distilled version of the Klein 4B model, designed for near-instant rendering while maintaining impressive adherence to prompts.
Perfect for rapid prototyping, real-time creative tools, and applications where speed is paramount. Price: $0.0052
photo-pack (Image to Image): Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features. Price: $0.3
tiktok-carousel (Text to Image): AI TikTok Carousel Generator creates viral TikTok carousel posts from a single text prompt. Choose a proven storytelling format (Problem-Solution, Listicle, Tutorial, Before & After), set your slide count (3-10), and get stunning AI-generated images at 1080x1920 portrait resolution, ready to upload to TikTok. Price: $0.028
openai-sora-2-standard-image-to-video (Image to Video): OpenAI Sora 2 Standard Image to Video model (High Priority). Generate stunning 10s videos from an image and text prompt. Price: $0.3
openai-sora-2-standard-text-to-video (Text to Video): OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts. Price: $0.3
kling-v3.0-pro-motion-control (Video to Video): Kling V3.0 Pro Motion Control provides the highest level of detail and control for video generation. Suitable for professional workflows requiring complex cinematic camera work and subject consistency. Price: $0.16
kling-v3.0-std-motion-control (Video to Video): Kling V3.0 Standard Motion Control allows for precise control over the camera and subject movement in generated videos. Powered by the latest Kling V3.0 architecture for improved temporal consistency and quality. Price: $0.1
ltx-2.3-video-extend (Video to Video): LTX-2.3 Video Extend seamlessly continues an existing video clip by generating additional frames that match the original motion, style, and scene composition.
Powered by the LTX-2.3 architecture, it maintains temporal coherence and visual fidelity across the extension boundary. Price: $0.104
ltx-2.3-lipsync (Audio to Video): LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency, powered by the upgraded LTX-2.3 architecture. Price: $0.26
ltx-2.3-image-to-video (Image to Video): LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency. Price: $0.104
ltx-2.3-text-to-video (Text to Video): LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation. Price: $0.104
qwen-image-2.0-pro-edit (Image to Image): Qwen 2.0 Pro Image Edit model with maximum precision and modifications. Price: $0.09
qwen-image-2.0-pro (Text to Image): Qwen 2.0 Pro Text to Image model with maximum realism and fidelity. Price: $0.09
qwen-image-2.0-edit (Image to Image): Qwen 2.0 Image Edit model with precise background modification and enhancements. Price: $0.04
qwen-image-2.0 (Text to Image): Qwen 2.0 Text to Image model with enhanced realism. Price: $0.04
ai-captions (Video to Video): Add AI-generated animated captions to any video using Vadoo's caption engine. Supports multiple languages and viral caption themes like Hormozi style.
Perfect for social media creators, marketers, and content producers. Price: $0
seedance-2.0-watermark-remover (Video to Video): Free for a limited time. Remove Seedance 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits are deducted, but a positive balance is required for access. Price: $0
seedance-v2.0-extend (Text to Video): Seedance 2.0 Extend Video continues an existing Seedance 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension; the model preserves visual style, motion, characters, and audio consistency across the new segment. Price: $0.6
seedance-v2.0-i2v (Image to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output. Price: $0.6
z-image-p (Text to Image): Z-Image P is based on PiAPI's Qubico/z-image text-to-image model. Price: $0.004
nano-banana-2-edit (Image to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency. Price: $0.06
nano-banana-2 (Text to Image): Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency. Price: $0.06
bytedance-seedream-v5.0-edit (Image to Image): Seedream 5.0 Lite Edit is an advanced image transformation model by ByteDance, enabling precise, controllable edits using natural language.
It specializes in high-fidelity style transfer (Anime, Cyberpunk, Fantasy), background swaps, and object modification while preserving original lighting, color tones, and character consistency for professional-grade creative reworks. Price: $0.0325
bytedance-seedream-v5.0 (Text to Image): Seedream 5.0 Lite is ByteDance’s next-generation text-to-image model, delivering high-fidelity AI art with advanced visual reasoning and precise typography. Supporting up to 4K resolution and cinematic detail, it excels at complex scene construction, consistent character generation, and real-time knowledge integration for accurate, contextually relevant visuals. Price: $0.0325
seedance-v2.0-t2v (Text to Video): Seedance 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output. Price: $0.6
kling-v3.0-standard-text-to-video (Text to Video): Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments. Price: $0.72
kling-v3.0-standard-image-to-video (Image to Video): Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots. Price: $0.72
kling-v3.0-pro-text-to-video (Text to Video): Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency.
It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions. Price: $0.72
kling-v3.0-pro-image-to-video (Image to Video): Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting. Price: $0.72
ai-clipping (Video to Video): Convert long-form videos into engaging short clips using AI clipping. Price: $0.5
z-image-base (Text to Image): Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts. It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts. Price: $0.013
ltx-2-19b-lipsync (Audio to Video): LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration. Price: $0.2
add-video-watermark (Video to Video): Add custom watermark to videos with adjustable position, opacity, and size. Free local processing using FFmpeg. Price: $0
add-image-watermark (Image to Image): Add custom watermark to images with adjustable position, opacity, and size. Free local processing using PIL. Price: $0
flux-2-klein-9b-edit (Image to Image): Flux-2-Klein-9B Edit performs higher-quality image edits with better detail retention, lighting consistency, and texture handling compared to smaller variants.
It’s well-suited for cute character edits, object additions, and visual refinements that need to look natural and polished while keeping the original scene intact. Price: $0.0208
flux-2-klein-9b (Text to Image): Flux-2-Klein-9B is a mid-size text-to-image model that balances detail quality and generation speed. It handles richer lighting, better textures, and more nuanced scenes than smaller variants, while still working well with clear, grounded prompts. Ideal for polished illustrations, product visuals, mascots, and everyday scenes with character. Price: $0.013
flux-2-klein-4b-edit (Image to Image): Flux-2-Klein-4B Edit applies lightweight, instruction-based edits to an existing image. It’s best for clear object swaps, small visual changes, and cute enhancements while preserving the original scene’s layout and lighting. Ideal for fast edits, UI demos, and simple creative tweaks. Price: $0.0156
flux-2-klein-4b (Text to Image): Flux-2-Klein-4B is a lightweight, fast text-to-image model optimized for clear subject rendering, good prompt adherence, and efficient generation. It works best with simple compositions, everyday scenes, and cute or friendly visuals, making it ideal for UI graphics, demos, thumbnails, mascots, and quick creative iterations. Price: $0.0104
veo3.1-4k-video (Text to Video): Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications. It enhances visual fidelity while maintaining temporal coherence and realistic motion. Price: $0.6
ltx-2-19b-text-to-video (Text to Video): LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity.
It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it. Price: $0.6
ltx-2-19b-image-to-video (Image to Video): LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics, making it well suited for grounded scenes, near-future concepts, and story beats. Price: $0.6
grok-imagine-image-to-image (Image to Image): Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized. Price: $0.05
kling-v2.6-std-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation. Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time. Price: $0.45
gpt-image-1.5-edit (Image to Image): GPT-Image-1.5 Edit applies precise, instruction-based modifications to an existing image while preserving composition, lighting, perspective, and visual coherence. It’s well-suited for object replacement, concept evolution, symbolic edits, and creative transformations that feel natural and intentional rather than destructive. Price: $0.054
qwen-text-to-image-2512 (Image to Image): Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts.
It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, fantasy and sci-fi worlds. Price: $0.04
wan2.6-image-edit (Image to Image): WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting. It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts, keeping the original scene coherent and cinematic. Price: $0.045
wan2.6-text-to-image (Text to Image): WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts. It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery. Price: $0.04
qwen-image-edit-2511 (Image to Image): Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts. Price: $0.04
seedance-v1.5-pro-video-extend-fast (Video to Video): Seedance v1.5 Pro Video Extend Fast quickly extends an existing video by generating a short continuation that matches the original style, motion, and lighting. This mode prioritizes fast output and smooth continuity with minimal new motion, making it ideal for previews, quick edits, and lightweight shot extensions without complex effects. Price: $0.26
seedance-v1.5-pro-video-extend (Video to Video): Seedance v1.5 Pro Video Extend continues an existing video by generating additional frames that match the original scene’s style, lighting, motion, and mood.
It is designed for smooth temporal consistency, making it ideal for extending cinematic shots, atmospheric scenes, or slow camera moves without introducing visual jumps or style changes. Price: $0.34
seedance-v1.5-pro-t2v-fast (Text to Video): Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability. It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting. Price: $0.26
seedance-v1.5-pro-i2v-fast (Image to Video): Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation. Price: $0.26
seedance-v1.5-pro-t2v (Text to Video): Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth. Price: $0.34
seedance-v1.5-pro-i2v (Image to Video): Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots. Price: $0.34
kling-v2.6-pro-motion-control (Video to Video): Kling v2.6 Pro Motion Control allows precise control over camera movement, subject motion, and scene dynamics during video generation.
Instead of leaving motion fully implicit, this mode lets you explicitly define how the camera moves (pan, tilt, orbit, dolly, zoom) and how objects or characters behave over time. | $0.145
Text to Text | openrouter-vision | Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. It is served through a ready-to-use REST inference API, with no cold starts and affordable pricing. | $0.025
Text to Text | any-llm | Any LLM is a versatile large language model for text generation, comprehension, and diverse NLP tasks such as chat and summarization. It is served through a ready-to-use REST inference API, with no cold starts and affordable pricing. | $0.01
Video to Video | kling-o1-standard-video-edit | Kling O1 Standard Video-to-Video Edit modifies an existing video while preserving its original structure, motion, and realism. It is designed for subtle, stable edits such as object replacement, background changes, lighting adjustments, or small visual tweaks. This mode prioritizes temporal consistency and natural motion. | $1.09
Image to Video | kling-o1-standard-reference-to-video | Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation. This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes. | $0.72
Image to Video | kling-o1-standard-image-to-video | Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation.
This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.$0.5Try Model →Text to Videowan2.6-text-to-videoWAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts. It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.$0.65Try Model →Image to Videowan2.6-image-to-videoWAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.$0.65Try Model →Text to Imagegpt-image-1.5GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.$0.054Try Model →Text to Audiominimax-speech-2.6-turboSpeech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.$0.65Try Model →Text to Audiominimax-speech-2.6-hdSpeech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.$0.65Try Model →Text to Audiominimax-voice-cloneMinimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. 
It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.$0.65Try Model →Video to Videowan2.2-spicy-video-extendWan-2.2-spicy Video Extend continues an existing video by generating new frames that match the original style but add stronger motion, bolder effects, and spicier dramatics.$0.2Try Model →Image to Videowan2.2-spicy-image-to-videoWan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.$0.2Try Model →Audio to Videokling-v2-avatar-proAI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video. It preserves identity, lip syncs accurately to the audio, adds natural head movement, eye motion, expressions, and cinematic lighting.$0.75Try Model →Audio to Videokling-v2-avatar-standardAI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.$0.35Try Model →Text to Videopixverse-v5.5-t2vPixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.$0.1Try Model →Image to Videopixverse-v5.5-i2vPixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip. 
It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.$0.1Try Model →Text to Videokling-v2.6-pro-t2vKling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.$0.9Try Model →Image to Videokling-v2.6-pro-i2vKling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.$0.9Try Model →Image to Imagebytedance-seedream-v4.5-editSeedream-v4.5 Edit allows you to transform an existing image using natural-language instructions. It preserves the core composition, lighting, and style of the original while modifying only the requested elements — perfect for object replacement, environment changes, stylistic adjustments, and high-detail creative reworks.$0.05Try Model →Text to Imagebytedance-seedream-v4.5Seedream-v4.5 is ByteDance’s advanced text-to-image diffusion model designed for generating high-detail, high-contrast, cinematic and stylized images. It excels at surreal fantasy concepts, sci-fi worlds, product visuals, photoreal scenes, and artistic compositions with strong prompt adherence and crisp detail.$0.05Try Model →Image to Imagevidu-q2-reference-to-imageVIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. 
Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.$0.032Try Model →Text to Imagevidu-q2-text-to-imageVIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.$0.04Try Model →Image to Imageflux-2-pro-editFlux-2-Pro Edit enables precise, high-fidelity modifications to an existing image while preserving its lighting, style, mood, and composition. It’s ideal for replacing objects, altering materials, adjusting environmental elements, or performing stylistic transformations without damaging the original scene’s quality. Flux-2-Pro maintains ultra-detailed textures and cinematic realism during edits.$0.032Try Model →Text to Imageflux-2-proFlux-2-Pro Text-to-Image is a premium, high-fidelity generative model capable of producing ultra-realistic, cinematic, and deeply detailed images from text prompts. It excels at complex lighting, layered compositions, surreal visual concepts, and professional art-grade rendering suitable for concept art, advertising visuals, and world-building.$0.032Try Model →Image to Imageflux-2-flex-editFlux-2-Flex Edit allows flexible transformation of an existing image: object replacement, material changes, lighting adjustments, style shifts, or localized edits. It preserves the original scene’s geometry, perspective, and lighting while modifying only what the edit prompt specifies.$0.09Try Model →Text to Imageflux-2-flexFlux-2-Flex Text-to-Image is a flexible, high-fidelity generative model capable of producing detailed, imaginative, and stylistically rich scenes from text alone. 
It excels at surreal concepts, fantasy environments, sci-fi structures, cinematic atmospheres, and high-resolution artistic compositions with strong prompt adherence.$0.09Try Model →Image to Imageflux-2-dev-editFlux 2 Dev Edit takes an existing image and applies transformations, replacements, or style changes based on a text instruction. It preserves composition, lighting, and the overall scene while modifying only what the edit prompt specifies. Ideal for creative replacements, stylistic adjustments, object swaps, and environment changes while keeping the original artistic integrity.$0.031Try Model →Text to Imageflux-2-devFlux 2 Dev is a powerful text-to-image diffusion model designed for high-quality, fast, and highly detailed visual generation. It excels at creating cinematic lighting, vibrant compositions, surreal concepts, characters, products, and worlds with strong prompt following and artistic control. Ideal for rapid image ideation, visual storytelling, and concept art.$0.015Try Model →Text to Imagez-image-turboZ-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.$0.007Try Model →Text to Imagekling-o1-text-to-imageKling O1 Text-to-Image is a high-fidelity creative image model that converts rich natural-language prompts into ultra-detailed stills. It excels at cinematic composition, realistic lighting, and coherent scene detail—great for concept art, environment renders, character portraits, and stylized imagery with photoreal or illustrative looks.$0.036Try Model →Image to Imagekling-o1-edit-imageKling O1 Image Edit applies targeted transformations to an existing image while preserving composition, lighting, and visual consistency. 
Use it to replace objects, retouch elements, change materials, or apply stylistic shifts with high fidelity and minimal artifacts. | $0.036
Video to Video | kling-o1-video-edit-fast | Video Edit Fast is the lightweight, high-speed editing mode of Kling O1. It performs quick edits on an existing video without heavy processing, which makes it ideal for fast object replacements, light enhancements, color tweaks, or simple visual adjustments. This mode focuses on speed over complex reconstruction, making it suitable for rapid iterations, previews, and small edits while preserving the original video’s motion and structure. | $0.585
Video to Video | kling-o1-video-edit | Kling O1 Video Edit lets you send an existing video clip plus an instruction/prompt to edit or transform the clip while preserving temporal coherence and subject identity. Typical edits include color grading, background replacement, object removal, slow motion, speed ramps, style transfer, subtle camera stabilization, and short extension/outro generation. Inputs can include the source video, an optional frame mask (for localized edits), a time range, and style/reference images. | $1.09
Image to Video | kling-o1-reference-to-video | Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation. | $0.72
Image to Video | kling-o1-image-to-video | Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames. | $0.72
Text to Video | kling-o1-text-to-video | Kling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips.
It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering. | $0.72
Text to Image | nano-banana-pro | Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution. | $0.12
Image to Image | nano-banana-pro-edit | Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced image-edit capabilities with improved resolution. | $0.12
Image to Image | qwen-image-edit-plus-lora | Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor supporting multi-image edits, single-image consistency, and native ControlNet. It is served through a ready-to-use REST inference API, with no cold starts and affordable pricing. | $0.04
Image to Image | seedvr2-image-upscale | SeedVR2 is a one-step diffusion-transformer model designed for image restoration, super-resolution, deblurring, and artifact removal. It enhances low-quality or compressed images into clean, sharp, high-resolution results while preserving natural colors and fine details. | $0.02
Image to Image | topaz-image-upscale | Topaz Image Upscale is a high-quality image-to-image enhancement model that increases resolution, sharpness, and detail using AI super-resolution. It improves clarity, restores texture, reduces noise, and produces crisp, high-res output while preserving a natural look and fine edges. | $0.075
Text to Image | grok-imagine-text-to-image | Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity.
Each request returns 6 images. | $0.05
Text to Video | grok-imagine-text-to-video | Grok Imagine is xAI’s fast, creative text-to-video model that generates short (~6-second) cinematic clips with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video. | $0.15
Image to Video | grok-imagine-image-to-video | Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into short (~6-second) cinematic videos with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed. | $0.15
Image to Image | reve-image-edit | ReVE Edit is a next-generation image editing model that allows users to apply detailed visual transformations through natural language. Whether you want to restyle portraits, modify backgrounds, or create artistic reinterpretations, ReVE Edit delivers realistic and coherent results while preserving structure and identity. | $0.05
Text to Image | reve-text-to-image | Generate images from text prompts using Reve's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions. | $0.032
Image to Video | kling-v2.5-turbo-std-i2v | Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. | $0.28
Image to Video | minimax-hailuo-2.3-fast | Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency. Output resolution is 768p. | $0.24
Text to Video | minimax-hailuo-2.3-standard-t2v | Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion.
Output resolution is 768p. | $0.36
Image to Video | minimax-hailuo-2.3-standard-i2v | Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence. Output resolution is 768p. | $0.36
Text to Video | minimax-hailuo-2.3-pro-t2v | Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion. | $0.63
Image to Video | minimax-hailuo-2.3-pro-i2v | Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame, delivering smooth, film-grade clips. | $0.63
Image to Video | vidu-q2-pro-start-end-video | Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion. | $0.13
Image to Video | vidu-q2-turbo-start-end-video | Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone. | $0.06
Image to Video | vidu-q2-reference | Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images.
Each image refines the model’s understanding of subject, environment, and visual tone — ensuring perfect consistency in appearance and motion across every frame.$0.065Try Model →Text to Videoltx-2-fast-text-to-videoLTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals. The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.$0.46Try Model →Image to Videoltx-2-fast-image-to-videoLTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.$0.46Try Model →Text to Videoltx-2-pro-text-to-videoLTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.$0.46Try Model →Image to Videoltx-2-pro-image-to-videoLTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. 
Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.$0.46Try Model →Text to Videoseedance-pro-t2v-fastSeedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.$0.06Try Model →Image to Videoseedance-pro-i2v-fastSeedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series. With this model you upload a reference image and—using a text prompt—generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround—ideal for social content, product motion, storytelling from a still, and fast prototyping.$0.06Try Model →Text to Textgpt-5-miniGPT‑5 Mini is a compact yet powerful AI that converts plain text ideas into detailed, structured prompts suitable for use in text-to-image, text-to-video, and other generative AI models. It’s perfect for creators who want to quickly craft high-quality prompts without manually thinking about style, composition, and descriptive details. The model helps accelerate workflows for artists, video producers, and designers.$0.01Try Model →Text to Videoveo3.1-extend-videoVeo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene—keeping visual style, characters, motion, and audio consistent. 
This model requires the original video's task_id. | $0.6
Text to Video | openai-sora-2-pro-storyboard | Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video (setting, characters, actions, timing), and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow. | $0.58
Image to Video | veo3.1-reference-to-video | Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips. It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics. | $0.6
Text to Video | veo3.1-fast-text-to-video | Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements. | $0.6
Image to Video | veo3.1-fast-image-to-video | Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews. | $0.6
Text to Video | veo3.1-text-to-video | Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos.
This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content. | $2.5
Image to Video | veo3.1-image-to-video | Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio. | $2.5
Video to Video | remix-video | Transform and resize your videos effortlessly with the remix video tool. | $0.025
Image to Video | higgsfield-dop-image-to-video | Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects (such as explosions, fire, distortion, disintegration, and transitions) directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline. | $0.3
Image to Video | leonardoai-motion-2.0 | Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics. | $0.4
Text to Image | leonardoai-lucid-origin | Lucid Origin is LeonardoAI’s advanced image generation model, designed for ultra-realistic, vibrant, and highly detailed visuals. It excels at creating photorealistic portraits, landscapes, product shots, and stylized art while faithfully following complex prompts. | $0.03
Text to Image | leonardoai-phoenix-1.0 | LeonardoAI Phoenix 1.0 is a professional-grade AI image model designed for realistic, cinematic, and highly detailed visuals.
It excels at interpreting complex prompts, rendering text within images, and creating high-resolution outputs suitable for editorial, commercial, or creative projects.$0.05Try Model →Image to Imagehiggsfield-soul-image-to-imageSOUL is an AI image model focused on hyper-realistic, magazine or editorial-style visuals, especially for fashion, portraits, lifestyle, and commercial content. It offers over 50 curated style presets to get a specific aesthetic without needing complicated prompt engineering. It generates photography-quality images with lighting, textures, and context that feel real — including natural imperfections like film grain, dust, or lens effects for authenticity.$0.033Try Model →Text to Videoopenai-sora-2-pro-text-to-videoSora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.$3Try Model →Image to Videoopenai-sora-2-pro-image-to-videoSora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.$3Try Model →Text to Textgpt-5-nanoGPT-5 Nano is a lightweight, high-speed language model from the GPT-5 family designed for instant text generation. It delivers intelligent, context-aware responses for creative writing, summarization, dialogue, code generation, and automation — all at low latency and cost. 
Perfect for chatbots, assistants, content tools, and real-time applications that need fast, reliable text output.$0.01Try Model →Text to Videoovi-text-to-videoOvi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally. Videos are generated in 540p resolution.$0.2Try Model →Image to Videoovi-image-to-videoOvi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio / sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.$0.2Try Model →Video to Videovideo-watermark-removerThe AI Video Watermark Remover is our flagship model designed to remove Sora 2 watermarks, logos, captions, and unwanted text from videos without compromising quality. Supporting a wide range of formats, it's fast, efficient, and processes with the highest quality.$0.065Try Model →Video to Videoai-video-upscaler-proThe AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.$0.24Try Model →Text to Videoopenai-sora-2-text-to-videoSora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. 
Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.$1.5Try Model →Image to Videoopenai-sora-2-image-to-videoSora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.$1.5Try Model →Text to Videoopenai-soraSora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.$0.5Try Model →Text to Imagehunyuan-image-3.0Hunyuan Image 3.0 brings together powerful architecture (Mixture-of-Experts + autoregressive style) to produce richly detailed and coherent images from complex prompts. It can read narrative descriptions, render text and signage cleanly, and support multiple visual styles — from photorealism to illustrations.$0.065Try Model →Image to Imagewan2.5-image-editThe Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments—whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.$0.04Try Model →Video to Videotopaz-video-upscaleThe AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. 
Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos.$0.08Try Model →Text to Imagewan2.5-text-to-imageWAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions. It supports detailed visual storytelling, cinematic compositions, and versatile styles — from portraits and product shots to landscapes and fantasy scenes.$0.04Try Model →Text to Videowan2.5-text-to-video-fastTransform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.$0.44Try Model →Image to Videowan2.5-image-to-video-fastConvert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.$0.44Try Model →Text to Videowan2.5-text-to-videoWAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.$0.65Try Model →Image to Videowan2.5-image-to-videoWAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. 
Upload a static image, add a descriptive text prompt, and the model generates cinematic motion (camera pans, environmental movement, and realistic physics) across the result. Price: $0.65
[Text to Video] kling-v2.5-turbo-pro-t2v: Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Price: $0.45
[Image to Video] kling-v2.5-turbo-pro-i2v: Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. Price: $0.45
[Image to Image] qwen-image-edit-plus: Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images. Price: $0.03
[Video to Video] wan2.2-animate: Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload both an image (for the character) and a video containing motion/expression, and the model generates a video where the character in your image moves like the reference. Supports 480p or 720p, up to 120 seconds. Price: $0.35
[Video to Video] heygen-video-translate: Convert any video into 175+ languages with synchronized voice translation, AI-voice cloning, and accurate lip sync. Just upload your video (or provide a link), select a target language, and HeyGen recreates the speech in that language. $0.05 per second. Price: $0.25
[Audio to Video] kling-v1-avatar-pro: Kling AI Avatar Pro is the premium tier for making high-quality talking avatars.
You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync. Price: $0.65
[Audio to Video] kling-v1-avatar-standard: Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily. Price: $0.35
[Video to Video] wan2.2-edit-video: Easily modify existing videos using simple text commands. With Wan 2.2 Video-Edit, you can change attire, character appearance, or other visual elements directly within your video, with no need to start from scratch. Works on uploads of 480p or 720p, for up to two minutes. Price: $0.3
[Text to Image] neta-lumina: Neta Lumina is a powerful anime-style text-to-image model developed by Neta.art Lab. It’s built on Lumina-Image-2.0, fine-tuned with over 13 million high-quality anime images. It offers strong understanding of multilingual prompts, excellent detail fidelity, support for Danbooru tags, and a leaning toward niche styles such as furry, Guofeng, pets, and scenic backgrounds. Price: $0.02
[Text to Image] perfect-pony-xl: Pony XL is a high-quality image generation model based on Stable Diffusion XL architecture. It specializes in character art, hybrid styles, and producing detailed, polished visuals even with simpler prompts. Price: $0.02
[Text to Image] flux-krea-dev: Flux Krea Dev is a text-to-image model built by Black Forest Labs in collaboration with Krea AI, designed to generate highly photorealistic images that avoid the common 'AI look' artifacts (plastic skin, overexposed lighting, synthetic textures). It emphasizes real texture, natural lighting, and aesthetic control. Price: $0.015
[Image to Image] flux-redux: Flux Redux is a transformation model that reimagines or enhances your input images while preserving their main structure and subject.
It’s built for creative refinement, whether you want style transfer, artistic reinterpretation, cinematic polish, or mood transformation. Price: $0.01
[Video to Video] ai-video-upscaler: The AI Video Upscaler is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this upscaler leverages advanced machine learning models to deliver high-quality, upscaled videos. Price: $0.03
[Image to Image] flux-kontext-effects: Flux Kontext Effects is a creative image and video model that applies stylized transformations, cinematic filters, and artistic reinterpretations to your inputs. Instead of generating new content from scratch, it enhances or reimagines existing images and videos with unique looks, ranging from surreal effects to realistic cinematic moods. Price: $0.04
[Image to Image] nano-banana-effects: Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click. Price: $0.03
[Text to Image] chroma-image: Chroma Image is an advanced text-to-image generation model designed for high-quality, creative, and versatile visuals. It can produce anything from photorealistic portraits and products to imaginative concept art, fantasy illustrations, and cinematic scenes. Price: $0.02
[Text to Image] hunyuan-image-2.1: Hunyuan Image is a powerful text-to-image generation model that produces photorealistic and highly detailed visuals. It excels at creating portraits, environments, and concept art with strong consistency and realism.
Designed for versatility, it supports both natural photography styles and imaginative artistic outputs. Price: $0.035
[Image to Image] bytedance-seedream-v4-edit: Seedream v4 Edit refines or transforms existing images based on a new prompt and a reference. Instead of masking, you provide a source image and describe how it should be altered: adjusting style, details, or replacing elements while keeping the subject consistent. Price: $0.04
[Text to Image] bytedance-seedream-v4: Seedream v4 generates stunning, high-fidelity images from text prompts. It’s designed for creativity with strong support for realism, fantasy, and artistic styles. Price: $0.04
[Text to Image] sdxl-image: SDXL is a high-quality, large Stable Diffusion model for creating photorealistic and stylized images from text. It excels at fine detail, realistic lighting, and complex scenes. Price: $0.004
[Image to Image] ideogram-v3-reframe: Ideogram V3 Reframe is a specialized image-to-image model built on Ideogram 3.0, designed to intelligently extend and adapt images across diverse aspect ratios and resolutions. Leveraging advanced AI outpainting, it preserves visual consistency while enabling creative reframing for digital, print, and video content. Price: $0.15
[Video to Video] infinitetalk-video-to-video: InfiniteTalk Video-to-Video enhances or transforms existing videos by syncing the subject’s lip movements and facial expressions with new dialogue or speech. Instead of starting from a still image, you provide a video clip, and the model seamlessly reanimates the speaker’s mouth and expressions to match the script. Price: $0.2
[Audio to Video] infinitetalk-image-to-video: InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos.
You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech. Price: $0.2
[Image to Video] wan2.1-reference-video: WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video. By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details. Price: $0.1
[Image to Video] seedance-lite-reference-video: Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video. Price: $0.1
[Text to Image] google-imagen4-ultra: Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions. Price: $0.06
[Text to Image] google-imagen4-fast: Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity. Price: $0.02
[Text to Image] google-imagen4: Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity. Price: $0.03
[Audio to Video] wan2.2-speech-to-video: WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input.
Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines. Price: $0.2
[Text to Video] pixverse-v5-t2v: PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects. Price: $0.3
[Image to Video] pixverse-v5-i2v: PixVerse V5 delivers a major leap forward in AI-powered video creation, now featuring smoother motion, ultra-high resolution, and expanded visual effects. Price: $0.3
[Text to Image] nano-banana: Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing. It produces hyper-realistic, physics-aware visuals with seamless style transformations. Price: $0.03
[Image to Image] nano-banana-edit: Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands. Price: $0.03
[Text to Image] ideogram-v3-t2i: Ideogram v3 is an advanced text-to-image model designed for creating highly detailed and visually striking images directly from text prompts. It’s especially good for artistic compositions, design mockups, concept art, and photorealistic scenes. With strong support for text rendering inside images, it’s widely used for posters, typography-based art, and creative branding. Price: $0.02
[Training] sdxl-lora: The SDXL LoRA image model enhances Stable Diffusion XL with specialized fine-tuning, letting you generate images in unique styles, characters, or themes.
By applying LoRA weights, you can create visuals that match a specific aesthetic, celebrity look, anime style, or custom-trained subject. Price: $0.002
[Image to Image] image-effects: AI Image Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning images from an image. Price: $0.03
[Image to Video] video-effects: AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images. Price: $0.3
[Video to Video] ai-dance-effects: Bring your characters and worlds to life with AI Dance Effects, a creative video effect that adds playful, dynamic, and cinematic motion to your generations. AI Dance Effects lets you guide how characters move, react, and express themselves. Price: $0.3
[Text to Video] minimax-hailuo-02-pro-t2v: High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail. Price: $0.6
[Image to Video] minimax-hailuo-02-pro-i2v: Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling. Price: $0.6
[Text to Video] minimax-hailuo-02-standard-t2v: Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality. Price: $0.3
[Image to Video] minimax-hailuo-02-standard-i2v: Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews. Price: $0.15
[Text to Video] wan2.2-5b-fast-t2v: Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation.
It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling. Price: $0.016
[Image to Video] vidu-q1-reference: Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references (up to seven images) and text prompts. Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements. Price: $0.4
[Image to Image] qwen-image-edit: The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements). Price: $0.03
[Video to Video] luma-flash-reframe: Transform and resize your videos effortlessly with Ray 2 Flash Reframe. This tool intelligently expands or adjusts your video’s aspect ratio, adding visually consistent content to the sides, top, or bottom, without altering the original subject. Price: $0.35
[Video to Video] luma-modify-video: Luma Modify Video lets you transform an existing video into a new creative scene while keeping the original motion and timing intact.
The result is a new video with the same movements but a completely fresh look, atmosphere, or theme. Price: $0.35
[Audio to Video] veed-lipsync: Generate realistic lipsync from any audio using VEED's latest model. Price: $0.04
[Audio to Video] creatify-lipsync: Realistic lipsync video, optimized for speed, quality, and consistency. Price: $0.04
[Audio to Video] latent-sync: LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization. Price: $0.04
[Audio to Video] sync-lipsync: Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization. Price: $0.04
[Image to Image] flux-pulid: Flux PuLID is an innovative image-to-image model that enables consistent face rendering across different styles or scenes, without needing any model fine-tuning. By providing a reference image (e.g., a portrait), the model generates new visuals while maintaining your subject’s identity with high fidelity. Price: $0.04
[Image to Image] ideogram-character: Ideogram’s Character Reference model enables consistent character generation using just one reference image. Upload a clear character portrait, and you can place that character in unlimited scenes, styles, poses, or narratives with visual fidelity maintained across all outputs. Price: $0.15
[Image to Image] minimax-image-01-subject-reference: Minimax’s I2I “Subject Reference” model enables you to transform images while preserving the appearance of a subject using a single reference image. Ideal for maintaining character likeness (features, clothing, or expression) across different styles or settings. Price: $0.01
[Image to Image] midjourney-v7-omni-reference: Midjourney's Omni Reference lets you reuse characters, creatures, or styles from an existing image and place them into entirely new scenes.
Simply provide a reference image (oref) and Midjourney will maintain identity, details, and visual consistency, ideal for storytelling, character design, or branding across multiple generations. Price: $0.03
[Image to Image] midjourney-v7-style-reference: Generate images in the distinctive aesthetic of Midjourney v7, blending cinematic depth, photorealism or painterly rendering, rich textures, and dynamic lighting. This style reference model helps you infuse any subject with the visual storytelling, composition, and high detail fidelity that Midjourney is known for. Ideal for concept art, stylized portraits, and stunning environment scenes. Price: $0.03
[Video to Video] runway-aleph-v2v: Transform any input video into a new visual style or scene while preserving motion and structure. Aleph V2V lets you apply artistic looks, cinematic lighting, or thematic changes to existing footage. Price: $0.2
[Text to Image] qwen-image: Generate high-quality, detailed images from text prompts in various styles, from realistic to artistic, perfect for creative visuals, product shots, and concept art. Price: $0.03
[Text to Video] vidu-v2.0-t2v: Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation. Price: $0.3
[Image to Video] vidu-v2.0-i2v: Vidu's 2.0 model delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content. Price: $0.3
[Text to Video] pixverse-v4.5-t2v: PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles, great for creative storytelling, trailers, and animated concepts. Price: $0.3
[Image to Video] pixverse-v4.5-i2v: Upload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements.
Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos. Price: $0.3
[Video to Video] runway-act-two-v2v: Take an existing character video and sync it with the motion from a reference video. This lets you update facial expressions, head turns, and speech gestures while keeping the original look and style. It’s perfect for reshooting performances, dubbing, or animating characters without re-rendering visuals. Price: $0.3
[Image to Video] runway-act-two-i2v: Upload a single character image and a driving video; the model transfers facial expressions and head movements from the video onto your image, bringing it to life. It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally. Ideal for avatars, AI presenters, digital actors, and story scenes. Price: $0.07
[Text to Video] wan2.2-text-to-video: Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency. Price: $0.3
[Image to Video] wan2.2-image-to-video: Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos. Price: $0.3
[Image to Video] kling-v2.1-pro-i2v: Kling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects. Price: $0.4
[Image to Video] kling-v2.1-standard-i2v: Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame.
It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations, ideal for portraits, digital art, and cinematic illustrations. Price: $0.225
[Image to Video] kling-v2.1-master-i2v: Kling 2.1 Master’s I2V animates a still image into a coherent video sequence. It interprets motion, environment, and context to create realistic, visually stunning video outputs, ideal for animating portraits, scenes, or concept art. Price: $0.3
[Text to Video] kling-v2.1-master-t2v: Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality, perfect for storytelling, ads, or content creation from imagination alone. Price: $1.2
[Image to Image] bytedance-seededit-v3: Seededit allows precise edits to images using masks and prompt guidance. Whether you're replacing backgrounds, changing clothing, or inpainting missing areas, Seededit ensures realistic, high-quality results with semantic control. Price: $0.03
[Text to Image] bytedance-seedream-v3: Seedream is designed for generating visually rich and artistic images from text prompts. It excels at fantasy, anime, surrealism, and vibrant color compositions, ideal for creative visuals, storyboards, and concept art. Price: $0.03
[Text to Video] seedance-pro-t2v: Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes. Best for storytelling, content creation, and visual production. Price: $0.18
[Image to Video] seedance-pro-i2v: The Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits. Price: $0.18
[Text to Video] seedance-lite-t2v: Seedance Lite T2V offers quick video generation from text with decent visual quality and motion.
Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail. Price: $0.1
[Image to Video] seedance-lite-i2v: The Seedance Lite I2V version animates static images into short videos quickly, focusing on basic motion effects and efficient processing, best suited for fast demos or mobile-friendly use. Price: $0.1
[Text to Image] flux-schnell: Flux Schnell is a lightning-fast image generation model designed for rapid iterations. It delivers good visual quality from text prompts almost instantly, making it perfect for real-time concept testing, brainstorming, and UI-integrated experiences. Price: $0.003
[Text to Video] hunyuan-fast-text-to-video: Hunyuan Fast T2V provides accelerated video generation from text prompts with slightly reduced detail but excellent speed. Ideal for rapid prototyping, concept testing, and short-form ideas where time is critical. Price: $0.05
[Text to Video] hunyuan-text-to-video: Hunyuan T2V generates detailed and dynamic videos from text prompts with a focus on realism and coherent motion. It handles multi-object scenes, human actions, and cinematic compositions effectively, making it ideal for storytelling and visual concepts. Price: $0.15
[Image to Video] hunyuan-image-to-video: Hunyuan I2V takes a static image and generates realistic video animations by interpreting motion and context. It works well for human portraits, objects, or scenes, adding lifelike movement while maintaining the image's integrity. Price: $0.15
[Training] wan2.1-lora-t2v: WAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules. Tailor the generation to specific characters, outfits, or animation styles, ideal for brand storytelling, fan content, and stylized animations. Price: $0.3
[Training] wan2.1-lora-i2v: Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency.
Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA. Price: $0.3
[Image to Video] midjourney-v7-image-to-video: Midjourney V7’s I2V breathes motion into still images, animating characters, environments, and objects with artistic transitions. Ideal for looping visual stories, concept animations, or enhancing still visuals with subtle motion. Price: $0.15
[Image to Image] midjourney-v7-image-to-image: Use Midjourney V7’s I2I to refine or reinterpret existing images. Modify style, mood, lighting, or content while preserving the overall composition, great for alternate versions, art variations, or polishing concepts. Price: $0.03
[Text to Image] midjourney-v7-text-to-image: Midjourney V7 produces high-quality, stylized images from text prompts. Known for its artistic flair, surreal composition, and vivid textures, it's perfect for character concepts, fantasy environments, and creative illustrations. Price: $0.03
[Text to Video] wan2.1-text-to-video: WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements, all from just text. Price: $0.3
[Image to Video] wan2.1-image-to-video: Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene, great for bringing art, characters, or photos to life with smooth motion and consistent style. Price: $0.3
[Image to Image] gpt4o-edit: Edit a specific part of an image using natural language. Ideal for object removal, replacement, or content-aware filling. Price: $0.04
[Image to Image] gpt4o-image-to-image: Transform an input image based on a new prompt, such as changing style, lighting, or composition.
Useful for reinterpreting visuals while keeping structure. Price: $0.04
[Text to Image] gpt4o-text-to-image: Generate images from text prompts using GPT-4o's vision capabilities. Ideal for basic concept visuals, diagrams, and abstract compositions. Price: $0.04
[Image to Image] flux-kontext-max-i2i: Flux Kontext Max I2I allows precise image enhancement and visual transformations while retaining the source layout. It’s powerful for retouching, photo-to-art workflows, and concept refinement. Price: $0.06
[Text to Image] flux-kontext-max-t2i: Flux Kontext Max T2I delivers photorealistic or cinematic-quality images with exceptional detail. It's optimized for high-end visuals, from realistic humans to polished product renders. Price: $0.06
[Image to Image] flux-kontext-pro-i2i: The Flux Kontext Pro I2I variant enables transforming base images into refined artwork while keeping structure intact. It’s useful for sketch refinement, visual style changes, and creative edits such as re-dressing, relighting, or re-theming with prompt guidance. Price: $0.03
[Text to Image] flux-kontext-pro-t2i: Flux Kontext Pro T2I offers fast and reliable generation with creative flexibility. It supports stylized prompts, character design, and fantasy themes while maintaining clear subject coherence. Price: $0.03
[Text to Image] wan2.1-text-to-image: WAN 2.1 is a powerful AI model that transforms text prompts into high-resolution, photorealistic images. It excels at detailed object rendering, realistic lighting, and fine textures, making it ideal for visual content, concept art, advertising, and digital storytelling. Price: $0.03
[Text to Audio] suno-extend-music: This API extends audio tracks while preserving the original style of the audio track. It includes Suno's upload functionality, allowing users to upload audio files for processing.
The expected result is a longer track that seamlessly continues the input style. Price: $0.09
[Text to Audio] suno-remix-music: This API covers an audio track by transforming it into a new style while retaining its core melody. It incorporates Suno's upload capability, enabling users to upload an audio file for processing. The expected result is a refreshed audio track with a new style, keeping the original melody intact. Price: $0.09
[Text to Audio] suno-create-music: Suno's music generation turns text prompts into full songs, complete with vocals, lyrics, and instrumentation. You can describe a mood, genre, or even a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds. Price: $0.09
[Text to Video] runway-text-to-video: Generate short, high-quality videos from plain text prompts. RunwayML’s text-to-video model interprets your written description and animates it into a moving visual scene with realistic or stylized motion. Price: $0.09
[Image to Video] runway-image-to-video: Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics. Price: $0.15
[Image to Image] ai-object-eraser: Easily remove unwanted objects, people, or text from any image using AI. Just select the area you want to erase, and the model will intelligently fill the space with realistic background matching the surrounding environment. No Photoshop skills needed. Price: $0.05
[Image to Image] ai-image-extension: Expand the edges of any image with AI. This model continues your original photo or artwork beyond its borders while matching style, lighting, and content. Price: $0.03
[Text to Image] ai-anime-generator: Create stunning anime-style artwork instantly with our AI Anime Generator.
Customize characters, scenes, and styles effortlessly in seconds! Price: $0.03
[Image to Image] ai-ghibli-style: Bring your imagination to life with art inspired by the enchanting world of Studio Ghibli. This AI model generates dreamy, hand-drawn visuals with soft colors, whimsical characters, and painterly backgrounds. Price: $0.05
[Image to Image] ai-product-photography: Create professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in seconds. Price: $0.05
[Text to Image] hidream-i1-full: The most advanced version of HiDream I1, delivering high-resolution, detailed images with superior prompt understanding. Best suited for production, content creation, and high-fidelity applications. Price: $0.04
[Text to Image] hidream-i1-dev: Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail. Price: $0.02
[Text to Image] hidream-i1-fast: Optimized for speed, this variant generates images in just a few steps. Ideal for previews, real-time applications, and use cases where fast results are more important than fine detail. Price: $0.008
[Text to Image] flux-kontext-dev-t2i: Generates an image from a text prompt, with optional reference image for pose or style guidance. Ideal for controlled, consistent image creation using just a description. Price: $0.02
[Image to Image] flux-kontext-dev-i2i: Takes an input image and transforms it based on a new prompt. Keeps structure or pose while changing style, appearance, or details. Price: $0.02
[Training] flux-dev-lora: Enables text-to-image generation using custom LoRA models.
Generate consistent characters, styles, or branded visuals with high quality and fast results. Price: $0.015
[Image to Video] veo3-fast-image-to-video: Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality, powered by Google's VEO3 on MuAPI. Price: $0.6
[Text to Image] flux-dev: Generate stunning visuals from simple text prompts. Flux Dev transforms your ideas into high-quality, creative images using powerful AI vision models. Perfect for design, storytelling, concept art, and marketing. Price: $0.015
[Image to Image] ai-color-photo: Automatically add lifelike colors to black-and-white images. Our AI brings history to life with natural tones, accurate shading, and context-aware colorization. Price: $0.01
[Image to Image] ai-skin-enhancer: Smooth skin, reduce blemishes, and enhance complexion with natural-looking results. Perfect for portraits, selfies, and professional photo retouching. Price: $0.01
[Image to Image] ai-product-shot: Instantly generate studio-quality product images with AI. Upload your item photo and get clean, stylized shots perfect for e-commerce, ads, and catalogs. Price: $0.06
[Image to Image] ai-background-remover: Instantly remove image backgrounds with pixel-perfect precision. Ideal for product photos, profile pictures, and creative projects. Price: $0.01
[Video to Video] mmaudio-v2-video-to-video: MMAudio-v2 generates high-quality, synchronized audio from video or text inputs. Seamlessly integrate it with AI video models to create fully-voiced, expressive video content. Price: $0.01
[Text to Audio] mmaudio-v2-text-to-audio: Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone. Price: $0.01
[Image to Image] ai-dress-change: Instantly change outfits in images using AI.
Visualize different clothing styles without the need for physical trials, perfect for fashion, e-commerce, and virtual try-ons. Price: $0.1
[Video to Video] ai-video-face-swap: Replace faces in videos with stunning realism. Our AI ensures accurate expression transfer, lighting consistency, and smooth frame-by-frame blending. Price: $0.1
[Image to Image] ai-image-face-swap: Advanced facial recognition and blending algorithms enable precise face swaps while preserving skin tone, lighting, and facial geometry. Price: $0.02
[Image to Image] ai-image-upscaler: Transform blurry or pixelated images into high-definition visuals. Our AI Image Upscaler uses deep learning to reconstruct details and bring your visuals to life. Price: $0.02
[Text to Video] veo3-fast-text-to-video: VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping. Price: $0.6
[Text to Video] veo3-text-to-video: VEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail. Price: $2.5
[Image to Video] veo3-image-to-video: VEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition. Price: $2.5
[Image to Video] vfx: VFX delivers high-impact visual effects like explosions, particles, and cinematic overlays to transform static images into action-packed videos. Price: $0.3
[Image to Video] motion-controls: Motion Controls adds dynamic camera movements, speed ramps, and zoom effects to bring your images to life as smooth, engaging videos. Price: $0.3
[Image to Video] ai-video-effects: AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images. Price: $0.3

Tools in the same category