AI Generation Glossary
Short, practical explanations of the terms you meet when generating images, video, voice and avatars with AI. Each entry defines the concept, explains how it works, and links to a studio where you can try it.
Generation basics
Text to Image
Text to image is AI generation where a model creates an original picture from a written description, translating the words in your prompt into composition, subject, lighting and style.

Prompt Engineering
Prompt engineering is the practice of writing, structuring and iterating on prompts so that an AI model reliably produces the output you intend, treating the prompt as a controllable input rather than a wish.

Negative Prompt
A negative prompt is a second prompt that tells an AI generation model what you do not want in the output, steering the result away from listed elements such as blur, watermarks, text or unwanted objects.

Seed
In AI generation, a seed is the number that initializes the random noise a model starts from. The same seed with the same prompt and settings reproduces the same output, while a different seed produces a different variation.

LoRA
A LoRA, short for Low-Rank Adaptation, is a small trainable add-on that customizes a large image model to a specific style, character, product or concept without retraining the whole model.

Style Transfer
Style transfer is an AI technique that re-renders an image in a different visual style, for example turning a photograph into a watercolor, anime frame or oil painting, while preserving the content and composition of the original.
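Two of the entries above, Seed and LoRA, can be made concrete with a few lines of Python. This is a minimal sketch using only the standard library, not the code of any particular image model: `initial_noise` and `lora_parameter_counts` are hypothetical helper names, and the layer sizes in the usage note are illustrative assumptions. The first function shows why a fixed seed reproduces the same starting noise. The second compares the number of trainable values in a full weight matrix with a rank-r low-rank update, which is why a LoRA is small.

```python
import random


def initial_noise(seed: int, size: int = 8) -> list[float]:
    """Return Gaussian noise values that depend only on the seed."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def lora_parameter_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Count trainable values: full matrix vs. a rank-r low-rank update.

    A low-rank update replaces a d_out x d_in change with the product of a
    d_out x rank matrix and a rank x d_in matrix.
    """
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank
```

Calling `initial_noise(42)` twice returns identical lists, while `initial_noise(43)` returns a different one. For an assumed 1024 by 1024 layer at rank 8, `lora_parameter_counts(1024, 1024, 8)` gives 1,048,576 values for the full matrix against 16,384 for the low-rank update.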
Video
Text to Video
Text to video is AI generation where you write a description of a scene and a model produces a complete video clip of it, inventing the visuals, motion and camera work from your words alone.

Image to Video
Image to video is an AI technique that takes a still picture and generates a short video clip from it, animating the scene while keeping the subject, style and composition of the original image.

Video Diffusion Model
A video diffusion model is a type of AI model that generates video by starting from random noise and progressively refining it into a sequence of coherent frames, learning to keep subjects, lighting and motion consistent across time.

Video Extend
Video extend is a feature of AI video generation that continues an existing clip beyond its original ending, generating additional seconds that keep the same subjects, style and motion so the result plays as one longer shot.
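The "start from noise and refine step by step" idea in the Video Diffusion Model entry can be illustrated with a toy loop. This is a deliberately simplified sketch, not a real diffusion sampler: `toy_denoise` is a hypothetical name, the "video" is just a short list of numbers, and the known `target` stands in for the clean signal that a trained model would have to predict at each step.

```python
import random


def toy_denoise(seed: int, target: list[float], steps: int = 20) -> list[float]:
    """Refine random noise toward a target over a fixed number of steps."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per target value.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        # Assumption: a real model would predict the clean signal here.
        # The toy version uses the known target instead.
        blend = 1.0 / (steps - step)  # small early moves, full move at the end
        x = [xi + blend * (ti - xi) for xi, ti in zip(x, target)]
    return x
```

Each pass removes part of the remaining gap between the current values and the clean signal, so the output moves from noise to the target gradually rather than in one jump. Real video models do the same kind of iteration over whole sequences of frames, which is where the consistency across time comes from.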
Audio and voice
Text to Speech
Text to speech, or TTS, is technology that converts written text into spoken audio. Modern neural TTS generates natural-sounding voices with realistic intonation, rhythm and emotion rather than the robotic output of older systems.

Voice Cloning
Voice cloning is an AI technique that learns the characteristics of a specific person's voice from a short audio sample and can then generate new speech in that voice, saying anything you type.

Lip Sync AI
Lip sync AI is technology that animates or re-animates a person's mouth in a video or image so it matches a given audio track, making it look like the person is naturally speaking those words.

AI Avatar
An AI avatar is a digital human presenter generated by AI, typically created from a photo or a template character, that speaks a script with synchronized lip movement, facial expressions and sometimes gestures.
Editing
Inpainting
Inpainting is an AI image editing technique that regenerates only a selected region of an image, letting you remove, replace or repair part of a picture while the rest stays untouched and the new content blends in seamlessly.

Outpainting
Outpainting is an AI technique that extends an image beyond its original borders, generating new content on any side that continues the scene's style, lighting and perspective as if the photo had always been larger.

AI Upscaling
AI upscaling is the process of enlarging an image using a model that generates plausible new detail, producing a sharp high-resolution result instead of the blur you get from simply stretching the pixels.

Face Swap
Face swap is an AI technique that replaces a face in an image or video with a different face, automatically matching the target's pose, lighting, skin tone and expression so the swapped face looks native to the scene.
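The Inpainting entry's point that only the selected region changes can be sketched as a mask composite. This is an illustrative assumption about the final compositing step, not any specific tool's implementation: `composite_inpaint` is a hypothetical name, images are plain nested lists of pixel values, and the mask is hard-edged, whereas a real editor may feather the edge so the new content blends in.

```python
def composite_inpaint(
    original: list[list[int]],
    generated: list[list[int]],
    mask: list[list[int]],
) -> list[list[int]]:
    """Take generated pixels where mask is 1 and original pixels elsewhere."""
    return [
        [gen if m else orig for orig, gen, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]
```

With a 2 by 2 image of all 1s, a generated image of all 9s, and a mask that selects only the top-right pixel, the result keeps three original pixels and replaces one, so everything outside the mask is left exactly as it was.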