Image Generation Models

Image generation models enable Xagent to create and modify images through text prompts.

Purpose

Enable Image Tools:

generate_image - Create images from text descriptions
edit_image - Modify existing images
Visual content creation and editing

How it works:

Accept a text prompt describing the desired image
Generate a new image or modify an existing one
Return the resulting image files, which are saved to the workspace

When to Use

Required for:

Image generation tasks
Image editing and modification
Visual content creation
Design and marketing materials

Not required for:

Image understanding or analysis (uses Vision LLMs)
Image OCR or text extraction (uses Vision LLMs)
Chart/graph analysis (uses Vision LLMs)

Supported Providers

OpenAI & OpenAI-compatible

Models: gpt-image-1 and compatible models

Setup:

Get API key from OpenAI Platform
For compatible services, provide base URL and API key
Select image model

Abilities: Both “generate” and “edit”

Best for:

High-quality image generation
Creative tasks
Marketing materials
Wide compatibility

DashScope (Alibaba Cloud)

Models: qwen-image

Setup:

Get API key from DashScope Console
Configure API key
Select model

Abilities: Primarily “generate” (can support “edit”)

Best for:

Chinese language optimization
Cost-effective for Asian markets
Alibaba Cloud integration

Supported formats:

JPG, JPEG, PNG, BMP, TIFF, WEBP
Max file size: 10MB for editing
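
The format and size limits above can be checked before an image is sent for editing. The following is a minimal sketch, not Xagent's actual validation: the function name check_edit_input is hypothetical, and it assumes that "10MB" means 10 * 1024 * 1024 bytes and that the format is judged by file extension only.

```python
from pathlib import Path

# Formats and size limit as listed above for DashScope editing.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}
MAX_EDIT_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_edit_input(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable.

    Hypothetical helper for illustration, not part of Xagent.
    """
    p = Path(path)
    if not p.is_file():
        return [f"file not found: {p}"]

    problems = []
    # The extension check is case-insensitive, so "photo.PNG" passes.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_EDIT_BYTES:
        problems.append(f"file is larger than 10MB: {size} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.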

Gemini (Google)

Models: gemini-3-pro-preview-image

Setup:

Get API key from Google AI Studio
Configure API key (supports GEMINI_API_KEY or GOOGLE_API_KEY)
Select image model
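
Since either GEMINI_API_KEY or GOOGLE_API_KEY is accepted, a lookup along the following lines is one way to resolve the key. This is a sketch under assumptions: the function name is hypothetical, and the precedence (GEMINI_API_KEY first) is a guess, since the order Xagent actually uses is not stated here.

```python
import os


def resolve_gemini_api_key(env=None):
    """Return the first non-empty key found, or None.

    Assumption: GEMINI_API_KEY is checked before GOOGLE_API_KEY.
    """
    env = os.environ if env is None else env
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a plain dictionary as env makes the lookup easy to try without touching the real environment.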

Abilities: “generate” only

Best for:

High-quality image generation
Google Cloud integration
Multiple resolution support
Various aspect ratios

Supported formats:

Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Image sizes: 1K (up to 1024px), 2K (up to 2048px), 4K (up to 4096px)
Output: PNG format (base64 encoded)
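
Because the output arrives as base64-encoded PNG data, it has to be decoded before it can be written to disk. The sketch below shows one way to do that with the Python standard library; save_base64_png is a hypothetical helper, and the signature check is an extra safeguard added for illustration.

```python
import base64
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_base64_png(b64_data: str, out_path: str) -> Path:
    """Decode base64 image data and write it to out_path.

    Hypothetical helper for illustration, not part of Xagent.
    """
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return path
```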

Features:

Text-to-image generation using Google’s Imagen
Automatic aspect ratio detection and selection
Resolution based on model capabilities (2K/4K)
Support for negative prompts
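
How the automatic aspect ratio selection works internally is not described here. As an illustration only, the sketch below picks the supported ratio closest to a requested width and height, comparing ratios on a logarithmic scale so that portrait and landscape shapes are treated symmetrically. The function name and the distance measure are assumptions.

```python
import math

# Aspect ratios listed above as supported by Gemini image models.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width:height.

    Hypothetical helper; not necessarily how Xagent selects a ratio.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(target - math.log(int(w) / int(h)))

    return min(SUPPORTED_RATIOS, key=distance)
```

For example, a 1920 by 1080 request maps to 16:9, and a 1080 by 1920 request maps to 9:16.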

Gemini image models don’t support image editing. Use generate_image only.

Xinference

Models: Stable Diffusion variants (stable-diffusion-2-1, etc.)

Setup:

Deploy Xinference server
Launch image generation model
Configure base URL in Xagent
Select model from Xinference

Abilities: Configurable (default: [“generate”], supports “edit”)

Best for:

Self-hosted deployment
Data privacy
Cost control
Using open-source models

Features:

List available models from server
Support for various Stable Diffusion models
Image-to-image and inpainting capabilities

Configuration

Step 1: Add Image Provider

Go to Models in the sidebar
Click Add Model or Add Provider
Select image generation provider
Enter API credentials

Step 2: Configure Model

Select the image model
Configure parameters:
- Image size (e.g., 1024x1024, 1792x1024)
- Response format (url or b64_json)
- Number of images (n)
Test generation
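
The three parameters can be sanity-checked before a test generation. The sketch below assumes sizes are written as WIDTHxHEIGHT (for example 1024x1024) and that url and b64_json are the only response formats; parse_image_params and its return shape are hypothetical, and individual providers may accept only specific sizes or limit n further.

```python
import re

SIZE_PATTERN = re.compile(r"^(\d+)x(\d+)$")
RESPONSE_FORMATS = {"url", "b64_json"}


def parse_image_params(size: str, response_format: str = "url", n: int = 1) -> dict:
    """Validate the Step 2 parameters and return them in normalized form.

    Hypothetical helper for illustration, not part of Xagent.
    """
    match = SIZE_PATTERN.match(size)
    if not match:
        raise ValueError(f"size must look like 1024x1024, got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        raise ValueError("width and height must be greater than zero")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    if n < 1:
        raise ValueError("n must be at least 1")
    return {"width": width, "height": height, "response_format": response_format, "n": n}
```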

Step 3: Set Abilities

Image models can have these abilities:

generate - Create new images from text
edit - Modify existing images

Not all image models support both generation and editing. Check provider documentation.
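
The relationship between abilities and tools can be pictured as a simple lookup. The sketch below assumes that generate_image requires the generate ability and edit_image requires the edit ability; the mapping and the function name are illustrative, not Xagent's internal representation.

```python
# Assumed mapping from each image tool to the ability it needs.
TOOL_ABILITY = {
    "generate_image": "generate",
    "edit_image": "edit",
}


def tools_for_abilities(abilities) -> list[str]:
    """Return the image tools that a model with these abilities can serve."""
    enabled = set(abilities)
    return [tool for tool, ability in TOOL_ABILITY.items() if ability in enabled]
```

Under this assumption, a model configured with only the generate ability (such as a Gemini image model) would expose generate_image but not edit_image.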

Usage Examples

Generating Images

User: "Create a promotional poster for a coffee shop"
Xagent: [Uses generate_image tool]
[Generates image based on description, saves to workspace]

Editing Images

User: "Change the color scheme to warm tones"
Xagent: [Uses edit_image tool]
[Modifies existing image, saves to workspace]

Multi-Image Editing

User: "Combine these two images with a sunset background"
Xagent: [Uses edit_image tool with multiple images]
[Edits multiple images into one result]

Troubleshooting

Generation Failed

Check:

API key is valid
Model supports generation ability
Prompt follows the provider's guidelines
Prompt does not violate the provider's content policy

Poor Quality

Try:

Improving prompt specificity
Adding style instructions
Using negative prompts
Trying a different model
Adjusting image size

Slow Generation

Optimize:

Reduce image size
Consider a faster model
Check network connectivity

Edits Not Working

Verify:

Model supports edit ability
Edit instructions are clear
Original image is accessible
Image format is supported
File size within limits (DashScope: 10MB)

File Not Found

Check:

Image path is correct
File exists in workspace
Use workspace file browser to verify path
URL is accessible
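
For local files, the path checks above can be sketched as follows. This assumes the workspace is an ordinary directory on disk; resolve_in_workspace is a hypothetical helper, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, image_path: str) -> Path:
    """Resolve image_path against the workspace and confirm the file exists.

    Hypothetical helper for illustration, not part of Xagent.
    """
    root = Path(workspace).resolve()
    candidate = (root / image_path).resolve()
    # Reject paths that escape the workspace, e.g. "../outside.png".
    if not candidate.is_relative_to(root):
        raise ValueError(f"path is outside the workspace: {image_path}")
    if not candidate.is_file():
        raise FileNotFoundError(f"no such file in workspace: {image_path}")
    return candidate
```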

Security & Privacy

Important considerations:

Generated images are saved to workspace
Check provider’s data retention policy
Be aware of content policies
Copyright and usage rights for generated images

Recommendations:

Review provider content policy
Ensure rights to generated content
Consider compliance requirements
Be mindful of copyright

Next Steps

LLM Models - Configure language models
Vision LLMs - Configure image understanding models
Embedding Models - Configure vector embeddings
Model Overview - Understanding all model types
