Image Generation Models
Image generation models enable Xagent to create and modify images through text prompts.
Purpose
Enable Image Tools:
- generate_image - Create images from text descriptions
- edit_image - Modify existing images
- Visual content creation and editing
- Accept text prompts describing the desired image
- Generate new images or modify existing ones
- Return image files saved to the workspace
When to Use
Required for:
- Image generation tasks
- Image editing and modification
- Visual content creation
- Design and marketing materials
Not required for:
- Image understanding or analysis (uses Vision LLMs)
- Image OCR or text extraction (uses Vision LLMs)
- Chart/graph analysis (uses Vision LLMs)
Supported Providers
OpenAI & OpenAI-compatible
Models: gpt-image-1 and compatible models
Setup:
- Get API key from OpenAI Platform
- For compatible services, provide base URL and API key
- Select image model
- High-quality image generation
- Creative tasks
- Marketing materials
- Wide compatibility
DashScope (Alibaba Cloud)
Models: qwen-image
Setup:
- Get API key from DashScope Console
- Configure API key
- Select model
- Chinese language optimization
- Cost-effective for Asian markets
- Alibaba Cloud integration
- JPG, JPEG, PNG, BMP, TIFF, WEBP
- Max file size: 10MB for editing
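The format and size limits listed above can be checked locally before an image is sent for editing. The following is a minimal sketch, not Xagent's own validation: the function name `check_edit_input` is hypothetical, and treating "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Formats and size limit taken from the DashScope notes above.
# Assumption: "10MB" means 10 * 1024 * 1024 bytes.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}
MAX_EDIT_BYTES = 10 * 1024 * 1024


def check_edit_input(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_EDIT_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.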
Gemini (Google)
Models: gemini-3-pro-preview-image
Setup:
- Get API key from Google AI Studio
- Configure API key (supports GEMINI_API_KEY or GOOGLE_API_KEY)
- Select image model
- High-quality image generation
- Google Cloud integration
- Multiple resolution support
- Various aspect ratios
- Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Image sizes: 1K (up to 1024px), 2K (up to 2048px), 4K (up to 4096px)
- Output: PNG format (base64 encoded)
- Text-to-image generation using Google’s Imagen
- Automatic aspect ratio detection and selection
- Resolution based on model capabilities (2K/4K)
- Support for negative prompts
Gemini image models don’t support image editing. Use generate_image only.
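To illustrate how the aspect ratios and size tiers above might be chosen for a target canvas, here is a minimal sketch. It is one possible selection rule, not necessarily the automatic detection Xagent performs; the function names are hypothetical, and comparing ratios on a log scale is a design choice made for this example.

```python
import math

# Values copied from the Gemini notes above.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
SIZE_LIMITS = {"1K": 1024, "2K": 2048, "4K": 4096}  # max pixels on the longest side


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio nearest to width:height (compared on a log scale)."""
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = map(int, label.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_RATIOS, key=distance)


def smallest_size_tier(longest_side_px: int) -> str:
    """Return the smallest tier whose limit covers the requested longest side."""
    for label, limit in SIZE_LIMITS.items():  # dicts keep insertion order: 1K, 2K, 4K
        if longest_side_px <= limit:
            return label
    raise ValueError(f"{longest_side_px}px exceeds the largest tier (4K)")
```

The log scale treats a ratio and its inverse symmetrically, so a portrait request is matched as closely as the equivalent landscape one.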
Xinference
Models: Stable Diffusion variants (stable-diffusion-2-1, etc.)
Setup:
- Deploy Xinference server
- Launch image generation model
- Configure base URL in Xagent
- Select model from Xinference
- Self-hosted deployment
- Data privacy
- Cost control
- Using open-source models
- List available models from server
- Support for various Stable Diffusion models
- Image-to-image and inpainting capabilities
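Listing the models a server has launched could look roughly like the sketch below. It assumes the Xinference server exposes an OpenAI-style `GET /v1/models` endpoint and that each entry carries an `id` and a `model_type` field with the value `"image"` for image models; verify both against your server version. The function names are hypothetical and this is not how Xagent itself queries the server.

```python
import json
from urllib.request import urlopen


def image_model_ids(payload: dict) -> list[str]:
    """Extract the ids of image models from a parsed model-list response.

    Assumption: entries look like {"id": "...", "model_type": "image", ...}.
    """
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if entry.get("model_type") == "image"
    ]


def fetch_image_model_ids(base_url: str) -> list[str]:
    """Query the server's model list (assumed path: /v1/models) and filter it."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urlopen(url, timeout=10) as response:
        return image_model_ids(json.load(response))
```

For example, `fetch_image_model_ids("http://localhost:9997")` would query a local server, assuming it listens on that address.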
Configuration
Step 1: Add Image Provider
- Go to Models in the sidebar
- Click Add Model or Add Provider
- Select image generation provider
- Enter API credentials
Step 2: Configure Model
- Select the image model
- Configure parameters:
- Image size (e.g., 1024x1024, 1792x1024)
- Response format (url or b64_json)
- Number of images (n)
- Test generation
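The three parameters from Step 2 can be validated before a test generation is attempted. This is a minimal sketch under stated assumptions: the `ImageParams` class is hypothetical, sizes are assumed to be written as `WIDTHxHEIGHT`, and the defaults are illustrative rather than Xagent's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageParams:
    """Hypothetical container for the Step 2 parameters."""

    size: str = "1024x1024"        # assumed format: WIDTHxHEIGHT
    response_format: str = "url"   # "url" or "b64_json"
    n: int = 1                     # number of images

    def __post_init__(self) -> None:
        width, height = self.dimensions()
        if width <= 0 or height <= 0:
            raise ValueError(f"size must be positive: {self.size!r}")
        if self.response_format not in ("url", "b64_json"):
            raise ValueError(f"unknown response format: {self.response_format!r}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}")

    def dimensions(self) -> tuple[int, int]:
        """Parse the size string into (width, height)."""
        width, sep, height = self.size.lower().partition("x")
        if not sep or not width.isdigit() or not height.isdigit():
            raise ValueError(f"size must look like 1024x1024, got {self.size!r}")
        return int(width), int(height)
```

Validating at construction time means a malformed size or format fails before any request is sent to the provider.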
Step 3: Set Abilities
Image models can have these abilities:
- generate - Create new images from text
- edit - Modify existing images
Not all image models support both generation and editing. Check provider documentation.
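Because abilities differ per model, it can help to check them before choosing between generate_image and edit_image. The sketch below uses a hypothetical in-memory registry; only the Gemini entry reflects a statement in this page, and `example-edit-model` is a placeholder name, not a real model.

```python
# Hypothetical registry of the abilities assigned in Step 3.
MODEL_ABILITIES = {
    "gemini-3-pro-preview-image": {"generate"},   # per the Gemini note: no editing
    "example-edit-model": {"generate", "edit"},   # placeholder, not a real model
}

# Each image tool needs one ability.
TOOL_ABILITY = {"generate_image": "generate", "edit_image": "edit"}


def can_use_tool(model: str, tool: str) -> bool:
    """Return True if the model has the ability the tool requires."""
    required = TOOL_ABILITY[tool]
    return required in MODEL_ABILITIES.get(model, set())
```

Unknown models are treated as having no abilities, so the check fails closed rather than sending a request the provider will reject.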
Usage Examples
Generating Images
Editing Images
Multi-Image Editing
Troubleshooting
Generation Failed
Check:
- API key is valid
- Model supports generation ability
- Prompt follows guidelines
- Prompt does not violate the provider's content policy
Poor Quality
Try:
- Improving prompt specificity
- Adding style instructions
- Using negative prompts
- Trying a different model
- Adjusting image size
Slow Generation
Optimize:
- Reduce image size
- Consider a faster model
- Check network connectivity
Edits Not Working
Verify:
- Model supports edit ability
- Edit instructions are clear
- Original image is accessible
- Image format is supported
- File size is within limits (DashScope: 10MB)
File Not Found
Check:
- Image path is correct
- File exists in workspace
- Use workspace file browser to verify path
- URL is accessible
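The path checks above can be scripted when debugging a missing file. This is a minimal sketch that assumes the workspace is a local directory you can read; `resolve_in_workspace` is a hypothetical helper, not part of Xagent.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, relative_path: str) -> Path:
    """Resolve a path inside the workspace and confirm the file exists."""
    root = workspace.resolve()
    candidate = (root / relative_path).resolve()
    # Reject paths such as "../secret.png" that point outside the workspace.
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes the workspace: {relative_path}")
    if not candidate.is_file():
        raise FileNotFoundError(f"no such file in workspace: {relative_path}")
    return candidate
```

Separating the "outside the workspace" error from the "file missing" error makes it clearer which of the checks above failed.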
Security & Privacy
Important considerations:
- Generated images are saved to workspace
- Check provider’s data retention policy
- Be aware of content policies
- Copyright and usage rights
- Review provider content policy
- Ensure rights to generated content
- Consider compliance requirements
- Be mindful of copyright
Next Steps
- LLM Models - Configure language models
- Vision LLMs - Configure image understanding models
- Embedding Models - Configure vector embeddings
- Model Overview - Understanding all model types