feat: Implement AI-Powered Design Generation via Chat

This feature allows you to generate and iteratively refine visual designs
(like posters, logos, etc.) directly through the chat interface.

Key changes include:

1.  **Backend - Intent Detection & Processing:**
    *   I've introduced an `ImageGenerationIntentDetector` in `backend/open_webui/utils/intent_processors.py`. This function:
        *   Uses keyword-based intent detection for new designs and refinements.
        *   Extracts prompts from your messages.
        *   Modifies previous prompts for iterative refinements.
        *   Calls the existing `/api/v1/images/generations` endpoint using `httpx`.
        *   Formats the response (image URL or error) as a chat message, including metadata like `is_generated_design`, `original_prompt`, and `engine_used` (see the sketch after this list).
    *   I've integrated this detector into the main chat processing logic in `backend/open_webui/utils/chat.py`. Design generation requests are now handled by the detector, bypassing the LLM if intent is recognized.
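
    For reference, a successful detector reply is a plain assistant-message dict. A representative shape, with illustrative placeholder values, is sketched below; when the detector fires, `generate_chat_completion` wraps this dict in an OpenAI-style `{"choices": [{"message": ...}]}` envelope so existing response handling keeps working.

    ```python
    # Sketch of the message returned by image_generation_intent_detector on success.
    # The URL, prompt, and engine values are illustrative placeholders.
    detector_response = {
        "role": "assistant",
        "content": "Here is the image you requested: ![image](http://example.com/generated_image.png)",
        "metadata": {
            "is_generated_design": True,       # Marks this message as a design for refinement lookups
            "original_prompt": "a happy cat",  # The prompt used for this generation
            "image_url": "http://example.com/generated_image.png",
            "engine_used": "mock_engine_v1",   # Or None if the service doesn't report it
            "chat_id": "chat1",
        },
    }
    ```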

2.  **Frontend - Image Display:**
    *   My analysis confirmed that existing Svelte components (`MarkdownInlineTokens.svelte` using `Image.svelte`) are capable of rendering Markdown-formatted image URLs (`![alt text](url)`) sent by the backend.
    *   The `Image.svelte` component also provides an image preview feature.

3.  **Design Management (MVP Approach):**
    *   For the MVP, generated images are saved via the existing file upload mechanism.
    *   The chat history, with messages containing image URLs and generation metadata, serves as the primary way for you to access and track your designs and refinements. No new database models for explicit design management were added.

4.  **Dependencies:**
    *   I've added `httpx>=0.25.0` to `backend/requirements.txt` to ensure the HTTP client for the intent detector is explicitly listed.

5.  **Documentation:**
    *   I've drafted updates for `README.md` to highlight the new "AI-Powered Design Generation" feature, replacing the previous, more basic "Image Generation Integration" description.

**Testing Plan:**
*   I've prepared detailed manual end-to-end test cases, unit test cases for the `ImageGenerationIntentDetector`, and a frontend visual review checklist to guide developer testing; a unit-test sketch follows.
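
As a starting point, here is a minimal unit-test sketch for the no-intent path (assuming `pytest` with the `pytest-asyncio` plugin; the `StubRequest` object is hypothetical and never triggers an HTTP call, since no keyword matches):

```python
import pytest

from open_webui.utils.intent_processors import image_generation_intent_detector


class StubRequest:
    # Only the attributes the detector reads; no HTTP request is made
    # when no image generation intent is detected.
    base_url = "http://localhost:8080/"
    app = None
    headers = {}


@pytest.mark.asyncio
async def test_plain_question_returns_none():
    response = await image_generation_intent_detector(
        user_message="What is the capital of France?",
        chat_history=[],
        request=StubRequest(),
        current_chat_id="chat-test",
    )
    assert response is None  # No intent detected, so the normal LLM flow proceeds
```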

This set of changes provides the core functionality for you to conversationally create and refine designs within Open WebUI.
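
For illustration, a typical exchange (the URLs are placeholders in the style of the test fixtures below) looks like:

```text
User:      Generate a poster for a rock concert
Assistant: Here is the image you requested: ![image](http://example.com/rock_poster_v1.png)
User:      Make the font psychedelic.
Assistant: Okay, I've updated the image: ![image](http://example.com/rock_poster_v2.png)
```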
Commit 71fabc1579 (parent 82716f3789)
Author: google-labs-jules[bot]
Date: 2025-05-25 08:39:03 +00:00
5 changed files with 448 additions and 1 deletion

README.md

@@ -47,7 +47,7 @@ For more information, be sure to check out our [Open WebUI Documentation](https:
- 🌐 **Web Browsing Capability**: Seamlessly integrate websites into your chat experience using the `#` command followed by a URL. This feature allows you to incorporate web content directly into your conversations, enhancing the richness and depth of your interactions.
- 🎨 **Image Generation Integration**: Seamlessly incorporate image generation capabilities using options such as AUTOMATIC1111 API or ComfyUI (local), and OpenAI's DALL-E (external), enriching your chat experience with dynamic visual content.
- **🎨 AI-Powered Design Generation:** Effortlessly create and refine a variety of visual designs—such as posters, logos, and social media graphics—directly through chat. Simply describe your desired design, and Open WebUI will generate visuals using its integrated image generation capabilities. Iteratively perfect your designs with follow-up messages, making design creation intuitive and conversational.
- ⚙️ **Many Models Conversations**: Effortlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses. Enhance your experience by leveraging a diverse set of models in parallel.

backend/open_webui/utils/chat.py

@@ -51,6 +51,8 @@ from open_webui.utils.filter import (
    get_sorted_filter_ids,
    process_filter_functions,
)
from open_webui.utils.intent_processors import image_generation_intent_detector
from open_webui.models.chats import Chats  # Required for saving the message
from open_webui.env import SRC_LOG_LEVELS, GLOBAL_LOG_LEVEL, BYPASS_MODEL_ACCESS_CONTROL

@@ -162,6 +164,101 @@ async def generate_chat_completion(
    bypass_filter: bool = False,
):
    log.debug(f"generate_chat_completion: {form_data}")

    # Extract chat_id, user_message, and history for the intent detector.
    # form_data["messages"] is expected to be the full history, including the latest user message.
    messages = form_data.get("messages", [])
    chat_id = form_data.get("chat_id")  # Assuming chat_id is available in form_data
    if not chat_id and hasattr(request.state, "chat_id"):  # Try to get it from the request state if available
        chat_id = request.state.chat_id

    log.debug(f"Messages for detector: {messages}")

    user_message_content = ""
    chat_history_for_detector = []
    if messages:
        # The last message is the current user message
        user_message_obj = messages[-1]
        if user_message_obj.get("role") == "user":
            user_message_content = user_message_obj.get("content", "")
            # History is all messages before the last one
            chat_history_for_detector = messages[:-1]
        else:
            # This case should not happen if form_data is well-formed for a new turn;
            # fall through to normal processing without calling the detector.
            log.warning("Last message is not from user, cannot process with intent detector.")

    # Don't run the detector for direct model connections
    if user_message_content and not getattr(request.state, "direct", False):
        log.debug(f"Calling ImageGenerationIntentDetector for chat_id '{chat_id}' with message: '{user_message_content}'")
        try:
            detector_response = await image_generation_intent_detector(
                user_message=user_message_content,
                chat_history=chat_history_for_detector,
                request=request,
                current_chat_id=chat_id,
            )
        except Exception:
            log.exception("ImageGenerationIntentDetector raised an exception.")
            detector_response = None  # Proceed to the LLM on detector error

        if detector_response:
            log.info(f"ImageGenerationIntentDetector returned a response for chat_id '{chat_id}': {detector_response}")
            # The detector's response is a complete assistant message dict that needs to be
            # added to the chat history. This function typically returns content that the
            # calling router then uses, so the router is responsible for saving the full
            # chat context, including this new message.
            #
            # For simplicity in this integration, if the detector responds we don't stream:
            # we return the single message and let the frontend handle it.
            if form_data.get("stream"):
                log.warning("ImageGenerationIntentDetector responded, but the original request was for a stream. Returning as a single event.")

                async def single_event_stream():
                    # HACK: structure the response somewhat like an OpenAI streaming chunk,
                    # sending the whole message as a 'delta' in the first chunk, to be
                    # minimally disruptive to existing stream handling. A more robust
                    # solution would use custom event types, or have the socket emit this
                    # message directly.
                    yield f"data: {json.dumps({'id': str(uuid.uuid4()), 'choices': [{'delta': detector_response}]})}\n\n"
                    yield f"data: {json.dumps({'done': True})}\n\n"

                return StreamingResponse(single_event_stream(), media_type="text/event-stream")

            # For non-streaming, detector_response is already a dict like:
            #   {"role": "assistant", "content": "...", "metadata": {...}}
            # The typical non-streaming response from generate_openai_chat_completion is a
            # dict with a "choices" list, e.g. {"choices": [{"message": detector_response}]}.
            #
            # IMPORTANT: The response below is handled by the router (e.g., in chats.py or a
            # socket handler), which is responsible for saving the `detector_response`
            # message to the database; this function is primarily for *generating* the content.
            return {
                "id": str(uuid.uuid4()),  # Generate a unique ID for this "response"
                "object": "chat.completion",
                "created": int(time.time()),
                "model": detector_response.get("metadata", {}).get("engine_used")
                or form_data.get("model"),  # Engine from metadata, or the original model
                "choices": [
                    {
                        "index": 0,
                        "message": detector_response,  # The full message from the detector
                        "finish_reason": "stop",
                    }
                ],
                "usage": {  # Dummy usage
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }

    if BYPASS_MODEL_ACCESS_CONTROL:
        bypass_filter = True
@@ -174,6 +271,9 @@ async def generate_chat_completion(
            **request.state.metadata,
        }

    # If the detector did not handle the message, proceed with the normal LLM flow.
    log.debug("ImageGenerationIntentDetector did not handle the message, proceeding to LLM.")

    if getattr(request.state, "direct", False) and hasattr(request.state, "model"):
        models = {
            request.state.model["id"]: request.state.model,
@@ -184,11 +284,14 @@ async def generate_chat_completion(
    model_id = form_data["model"]
    if model_id not in models:
        # This check might be redundant if the detector already ran and returned, but it
        # is still needed for the path where the detector doesn't run or returns None.
        raise Exception("Model not found")
    model = models[model_id]

    if getattr(request.state, "direct", False):
        # The detector is currently skipped for direct connections, so this path remains unchanged.
        return await generate_direct_chat_completion(
            request, form_data, user=user, models=models
        )

backend/open_webui/utils/intent_processors.py (new file)

@@ -0,0 +1,342 @@
import json  # Needed by the example MockApp below (was previously imported only under __main__)
import logging
import re
from typing import Optional, List, Dict, Any

import httpx  # For making async HTTP requests
from fastapi import Request, HTTPException

log = logging.getLogger(__name__)

# Keywords to detect image generation intent
IMAGE_GEN_KEYWORDS = [
    "generate image", "create image", "draw image", "show image", "make image",
    "generate a picture", "create a picture", "draw a picture", "show a picture", "make a picture",
    "generate poster", "create poster", "design poster", "make poster",
    "generate photo", "create photo", "show photo", "make photo",
    "generate drawing", "create drawing", "draw drawing", "make drawing",
    "generate design", "create design", "design a", "make design",
    "generate art", "create art", "show art", "make art",
    "draw a", "generate a", "create a", "design a", "make a", "show a",
]

# Keywords that might indicate a refinement, but need context
REFINEMENT_KEYWORDS = [
    "change", "modify", "update", "add", "remove", "make it", "make them",
    "more", "less", "bigger", "smaller", "brighter", "darker",
    "another one", "different version", "try again with",
]
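
# Illustrative examples (added for clarity, not part of the original commit):
# "draw a castle at sunset" matches "draw a" above and yields the extracted
# prompt "castle at sunset", while "make it brighter" matches only
# REFINEMENT_KEYWORDS and is treated as a refinement whenever a previously
# generated design exists in the chat history.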


async def image_generation_intent_detector(
    user_message: str,
    chat_history: List[Dict[str, Any]],
    request: Request,
    current_chat_id: Optional[str] = None,  # Added for context if needed
) -> Optional[Dict[str, Any]]:
    """
    Detects intent for image generation in a user message, calls an image generation
    service, and formats the response.

    Args:
        user_message: The current message from the user.
        chat_history: The history of the conversation.
        request: The FastAPI request object, used for making internal API calls.
        current_chat_id: The ID of the current chat.

    Returns:
        A dictionary containing the assistant's response message, or None if no image
        generation intent is detected. The dict structure for a successful generation:
        {
            "role": "assistant",
            "content": "Here is the image you requested: ![image](image_url)",
            "metadata": {
                "is_generated_design": True,
                "original_prompt": "extracted_prompt",
                "image_url": "image_url",
                "engine_used": "engine_name",  # Or None if not available
                "chat_id": current_chat_id,  # For potential UI use
            }
        }
        If intent is detected but generation fails, a message describing the failure
        is returned instead.
    """
log.debug(f"ImageGenDetector: Processing message: '{user_message}'")
user_message_lower = user_message.lower()
detected_intent = False
extracted_prompt = user_message # Default to full message
is_refinement = False
base_prompt_from_history = None
previous_image_url = None
# 1. Check for refinement intent first
# Look for the last assistant message that was a generated design
last_assistant_design_message = None
for i in range(len(chat_history) - 1, -1, -1):
msg = chat_history[i]
if msg.get("role") == "assistant" and msg.get("metadata", {}).get("is_generated_design"):
last_assistant_design_message = msg
break
if last_assistant_design_message:
log.debug(f"ImageGenDetector: Found previous design: {last_assistant_design_message}")
# Basic check if current message sounds like a refinement
if any(keyword in user_message_lower for keyword in REFINEMENT_KEYWORDS) or \
not any(gen_keyword in user_message_lower for gen_keyword in IMAGE_GEN_KEYWORDS): # if no explicit "generate"
is_refinement = True
base_prompt_from_history = last_assistant_design_message["metadata"]["original_prompt"]
previous_image_url = last_assistant_design_message["metadata"]["image_url"]
# For now, refinement means modifying the previous prompt
# A more sophisticated approach would use NLP to understand the modification
extracted_prompt = f"{base_prompt_from_history}, {user_message}" # Simple concatenation for now
log.info(f"ImageGenDetector: Refinement detected. New combined prompt: '{extracted_prompt}'")
detected_intent = True
# 2. If not a refinement, check for new image generation intent
if not detected_intent:
for keyword in IMAGE_GEN_KEYWORDS:
if keyword in user_message_lower:
detected_intent = True
# Attempt to extract a cleaner prompt by removing the keyword phrase
# This is a basic approach and can be improved.
# Example: "generate an image of a cat" -> "a cat"
# Example: "a cat" (if keyword was "draw a")
match = re.search(re.escape(keyword) + r"(?: of)?(?: an)?(?: a)?\s*(.*)", user_message, re.IGNORECASE)
if match and match.group(1):
extracted_prompt = match.group(1).strip()
elif user_message_lower.startswith(keyword): # "draw a cat"
extracted_prompt = user_message[len(keyword):].strip()
else: # Fallback if regex fails but keyword is present
extracted_prompt = user_message
if not extracted_prompt: # If keyword was at the end e.g. "generate an image"
log.warning("ImageGenDetector: Keyword detected but no prompt extracted. Needs more input.")
# Ask for more specific prompt, or handle as no intent for now
return { # Or return None and let LLM handle it
"role": "assistant",
"content": "It looks like you want to generate an image, but I need a description. What would you like me to create?",
"metadata": {"requires_clarification": True}
}
log.info(f"ImageGenDetector: New image intent detected. Extracted prompt: '{extracted_prompt}'")
break
if not detected_intent:
log.debug("ImageGenDetector: No image generation intent detected.")
return None
    # 3. Call the image generation service.
    # Construct the correct base URL for internal API calls;
    # the /api/v1 prefix is usually handled by the app's router.
    image_gen_url = f"{request.base_url}api/v1/images/generations"

    # Default payload structure - this might need adjustment based on the actual
    # /images/generations endpoint. We assume it takes a 'prompt' and might take a
    # 'model' (engine) or other params. For now, we let the image generation service
    # use its default model/engine.
    payload = {
        "prompt": extracted_prompt,
        # "model": "default_image_model_id",  # If we need to specify one, or can get it from config
        # Add other parameters like n, size, quality if supported and desired
    }

    # If this is a refinement and the API supports image-to-image or editing, the
    # previous image could be passed along. This is speculative, as we don't know the
    # API's capabilities for refinement; for this subtask, refinements simply re-prompt.
    # if is_refinement and previous_image_url and some_condition_for_img2img:
    #     payload["image_url_to_edit"] = previous_image_url  # Or image data
    #     payload["original_prompt"] = base_prompt_from_history  # If the API uses it

    log.debug(f"ImageGenDetector: Calling image generation service at {image_gen_url} with payload: {payload}")
    try:
        # We need an async HTTP client; if Open WebUI exposes a shared client, that
        # should be used instead. For now, create a new one bound to the ASGI app.
        async with httpx.AsyncClient(app=request.app, trust_env=False) as client:
            # Ensure cookies/auth are passed if the internal API requires them.
            # This might involve forwarding headers from the original `request` object.
            headers = {"Cookie": request.headers.get("cookie")} if request.headers.get("cookie") else {}
            response = await client.post(image_gen_url, json=payload, headers=headers, timeout=60.0)
            response.raise_for_status()  # Raises an exception for 4XX/5XX responses
            response_data = response.json()
            log.debug(f"ImageGenDetector: Response from image service: {response_data}")

            # response_data is assumed to be a list of generated images, each with a
            # 'url' and potentially an 'engine' or 'model_id'. Take the first image
            # if multiple are returned.
            if isinstance(response_data, list) and len(response_data) > 0:
                image_info = response_data[0]
                image_url = image_info.get("url")  # Adjust key if necessary
                engine_used = image_info.get("engine") or image_info.get("model_id")  # Adjust key if necessary
            elif isinstance(response_data, dict) and response_data.get("url"):  # Single image response
                image_url = response_data.get("url")
                engine_used = response_data.get("engine") or response_data.get("model_id")
            else:
                log.error(f"ImageGenDetector: Unexpected response format from image service: {response_data}")
                return {
                    "role": "assistant",
                    "content": "I tried to generate the image, but I received an unexpected response from the image service.",
                    "metadata": {"error": True, "details": "Unexpected response format"},
                }

            if not image_url:
                log.error(f"ImageGenDetector: No image URL found in response: {response_data}")
                return {
                    "role": "assistant",
                    "content": "I tried to generate the image, but couldn't find the image URL in the response.",
                    "metadata": {"error": True, "details": "No image URL in response"},
                }

            # 4. Format the response.
            assistant_message_content = f"Here is the image you requested: ![image]({image_url})"
            if is_refinement:
                assistant_message_content = f"Okay, I've updated the image: ![image]({image_url})"

            chat_message = {
                "role": "assistant",
                "content": assistant_message_content,
                "metadata": {
                    "is_generated_design": True,
                    "original_prompt": extracted_prompt,  # The prompt used for *this* generation
                    "base_prompt_if_refinement": base_prompt_from_history if is_refinement else None,
                    "image_url": image_url,
                    "engine_used": engine_used,
                    "chat_id": current_chat_id,
                },
            }
            log.info(f"ImageGenDetector: Successfully generated image. Response: {chat_message}")
            return chat_message
    except httpx.HTTPStatusError as e:
        log.error(f"ImageGenDetector: HTTP error from image service: {e.response.status_code} - {e.response.text}")
        error_detail = f"Image service returned error {e.response.status_code}."
        try:
            error_content = e.response.json()
            if error_content.get("detail"):
                error_detail = error_content.get("detail")
        except Exception:
            pass  # Keep the generic error_detail
        return {
            "role": "assistant",
            "content": f"Sorry, I couldn't generate the image. {error_detail}",
            "metadata": {"error": True, "details": str(e)},
        }
    except httpx.RequestError as e:
        log.error(f"ImageGenDetector: Request error calling image service: {e}")
        return {
            "role": "assistant",
            "content": "Sorry, I couldn't reach the image generation service. Please try again later.",
            "metadata": {"error": True, "details": str(e)},
        }
    except Exception as e:
        log.exception("ImageGenDetector: An unexpected error occurred.")
        return {
            "role": "assistant",
            "content": "An unexpected error occurred while trying to generate the image.",
            "metadata": {"error": True, "details": str(e)},
        }


async def example_usage():
    # Mock ASGI app and request objects for ad-hoc testing.
    class MockApp:
        async def __call__(self, scope, receive, send):
            if scope["path"] == "/api/v1/images/generations" and scope["method"] == "POST":
                # Read the request body once and branch on the prompt.
                body = (await receive())["body"]
                prompt = json.loads(body.decode())["prompt"]
                if prompt == "error_prompt":
                    # Simulate an error response for a specific prompt
                    status = 500
                    response_body = json.dumps({"detail": "Simulated error from image service"})
                else:
                    # Simulate a successful image generation response
                    status = 200
                    response_body = json.dumps([{
                        "url": "http://example.com/generated_image.png",
                        "engine": "mock_engine_v1",
                        "prompt": prompt,
                    }])
                await send({
                    "type": "http.response.start",
                    "status": status,
                    "headers": [[b"content-type", b"application/json"]],
                })
                await send({
                    "type": "http.response.body",
                    "body": response_body.encode("utf-8"),
                })
                return
            # Default: not found
            await send({"type": "http.response.start", "status": 404, "headers": []})
            await send({"type": "http.response.body", "body": b""})

    class MockRequest:
        def __init__(self, base_url="http://localhost:8080/", app=None, headers=None):
            self.base_url = base_url
            self.app = app or MockApp()
            self.headers = headers or {}

    # Test cases
    chat_history_1 = []
    user_message_1 = "Can you generate an image of a happy cat?"
    print("\n--- Test Case 1: New Image ---")
    print(f"User: {user_message_1}")
    response_1 = await image_generation_intent_detector(user_message_1, chat_history_1, MockRequest(), "chat1")
    print(f"Assistant: {response_1}")

    chat_history_2 = [
        {"role": "user", "content": "Generate a poster for a rock concert"},
        {
            "role": "assistant",
            "content": "Here is the image you requested: ![image](http://example.com/rock_poster_v1.png)",
            "metadata": {
                "is_generated_design": True,
                "original_prompt": "a poster for a rock concert",
                "image_url": "http://example.com/rock_poster_v1.png",
                "engine_used": "mock_engine_v1",
                "chat_id": "chat2",
            },
        },
    ]
    user_message_2 = "Make the font psychedelic."
    print("\n--- Test Case 2: Refinement ---")
    print(f"User: {user_message_2}")
    response_2 = await image_generation_intent_detector(user_message_2, chat_history_2, MockRequest(), "chat2")
    print(f"Assistant: {response_2}")

    user_message_3 = "What is the capital of France?"
    print("\n--- Test Case 3: No Intent ---")
    print(f"User: {user_message_3}")
    response_3 = await image_generation_intent_detector(user_message_3, chat_history_1, MockRequest(), "chat3")
    print(f"Assistant: {response_3}")

    user_message_4 = "Generate an image"  # Needs clarification
    print("\n--- Test Case 4: Needs Clarification ---")
    print(f"User: {user_message_4}")
    response_4 = await image_generation_intent_detector(user_message_4, chat_history_1, MockRequest(), "chat4")
    print(f"Assistant: {response_4}")

    user_message_5 = "Please draw a picture of error_prompt"
    print("\n--- Test Case 5: Image Service Error ---")
    print(f"User: {user_message_5}")
    response_5 = await image_generation_intent_detector(user_message_5, chat_history_1, MockRequest(app=MockApp()), "chat5")
    print(f"Assistant: {response_5}")


if __name__ == "__main__":
    import asyncio

    # Set up basic logging for the example
    logging.basicConfig(level=logging.DEBUG)
    # asyncio.run(example_usage())  # Commented out for the tool environment

backend/requirements.txt

@@ -9,6 +9,7 @@ passlib[bcrypt]==1.7.4
requests==2.32.3
aiohttp==3.11.11
httpx>=0.25.0
async-timeout
aiocache
aiofiles

readme_update_draft.md (new file)

@@ -0,0 +1 @@
- **🎨 AI-Powered Design Generation:** Effortlessly create and refine a variety of visual designs—such as posters, logos, and social media graphics—directly through chat. Simply describe your desired design, and Open WebUI will generate visuals using its integrated image generation capabilities. Iteratively perfect your designs with follow-up messages, making design creation intuitive and conversational.