feat: Implement AI-Powered Design Generation via Chat

This feature allows you to generate and iteratively refine visual designs
(like posters, logos, etc.) directly through the chat interface.

Key changes include:

1.  **Backend - Intent Detection & Processing:**
    *   I've introduced an `ImageGenerationIntentDetector` in `backend/open_webui/utils/intent_processors.py`. This function:
        *   Uses keyword-based intent detection for new designs and refinements.
        *   Extracts prompts from your messages.
        *   Modifies previous prompts for iterative refinements.
        *   Calls the existing `/api/v1/images/generations` endpoint using `httpx`.
        *   Formats the response (image URL or error) as a chat message, including metadata like `is_generated_design`, `original_prompt`, and `engine_used` (see the sketch after this list).
    *   I've integrated this detector into the main chat processing logic in `backend/open_webui/utils/chat.py`. Design generation requests are now handled by the detector, bypassing the LLM if intent is recognized.
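
    For reference, a successful detector reply is a plain assistant-message dict. A representative shape, with illustrative placeholder values, is sketched below; when the detector fires, `generate_chat_completion` wraps this dict in an OpenAI-style `{"choices": [{"message": ...}]}` envelope so existing response handling keeps working.

    ```python
    # Sketch of the message returned by image_generation_intent_detector on success.
    # The URL, prompt, and engine values are illustrative placeholders.
    detector_response = {
        "role": "assistant",
        "content": "Here is the image you requested: ![image](http://example.com/generated_image.png)",
        "metadata": {
            "is_generated_design": True,       # Marks this message as a design for refinement lookups
            "original_prompt": "a happy cat",  # The prompt used for this generation
            "image_url": "http://example.com/generated_image.png",
            "engine_used": "mock_engine_v1",   # Or None if the service doesn't report it
            "chat_id": "chat1",
        },
    }
    ```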

2.  **Frontend - Image Display:**
    *   My analysis confirmed that existing Svelte components (`MarkdownInlineTokens.svelte` using `Image.svelte`) are capable of rendering Markdown-formatted image URLs (`![alt text](url)`) sent by the backend.
    *   The `Image.svelte` component also provides an image preview feature.

3.  **Design Management (MVP Approach):**
    *   For the MVP, generated images are saved via the existing file upload mechanism.
    *   The chat history, with messages containing image URLs and generation metadata, serves as the primary way for you to access and track your designs and refinements. No new database models for explicit design management were added.

4.  **Dependencies:**
    *   I've added `httpx>=0.25.0` to `backend/requirements.txt` to ensure the HTTP client for the intent detector is explicitly listed.

5.  **Documentation:**
    *   I've drafted updates for `README.md` to highlight the new "AI-Powered Design Generation" feature, replacing the previous, more basic "Image Generation Integration" description.

**Testing Plan:**
*   I've prepared detailed manual end-to-end test cases, unit test cases for the `ImageGenerationIntentDetector`, and a frontend visual review checklist to guide developer testing; a unit-test sketch follows.
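
As a starting point, here is a minimal unit-test sketch for the no-intent path (assuming `pytest` with the `pytest-asyncio` plugin; the `StubRequest` object is hypothetical and never triggers an HTTP call, since no keyword matches):

```python
import pytest

from open_webui.utils.intent_processors import image_generation_intent_detector


class StubRequest:
    # Only the attributes the detector reads; no HTTP request is made
    # when no image generation intent is detected.
    base_url = "http://localhost:8080/"
    app = None
    headers = {}


@pytest.mark.asyncio
async def test_plain_question_returns_none():
    response = await image_generation_intent_detector(
        user_message="What is the capital of France?",
        chat_history=[],
        request=StubRequest(),
        current_chat_id="chat-test",
    )
    assert response is None  # No intent detected, so the normal LLM flow proceeds
```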

This set of changes provides the core functionality for you to conversationally create and refine designs within Open WebUI.
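
For illustration, a typical exchange (the URLs are placeholders in the style of the test fixtures below) looks like:

```text
User:      Generate a poster for a rock concert
Assistant: Here is the image you requested: ![image](http://example.com/rock_poster_v1.png)
User:      Make the font psychedelic.
Assistant: Okay, I've updated the image: ![image](http://example.com/rock_poster_v2.png)
```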
Commit 71fabc1579 (parent 82716f3789)
Author: google-labs-jules[bot]
Date: 2025-05-25 08:39:03 +00:00
5 changed files with 448 additions and 1 deletion

README.md

@@ -47,7 +47,7 @@ For more information, be sure to check out our [Open WebUI Documentation](https:
- 🌐 **Web Browsing Capability**: Seamlessly integrate websites into your chat experience using the `#` command followed by a URL. This feature allows you to incorporate web content directly into your conversations, enhancing the richness and depth of your interactions.
- 🎨 **Image Generation Integration**: Seamlessly incorporate image generation capabilities using options such as AUTOMATIC1111 API or ComfyUI (local), and OpenAI's DALL-E (external), enriching your chat experience with dynamic visual content.
- **🎨 AI-Powered Design Generation:** Effortlessly create and refine a variety of visual designs—such as posters, logos, and social media graphics—directly through chat. Simply describe your desired design, and Open WebUI will generate visuals using its integrated image generation capabilities. Iteratively perfect your designs with follow-up messages, making design creation intuitive and conversational.
- ⚙️ **Many Models Conversations**: Effortlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses. Enhance your experience by leveraging a diverse set of models in parallel.

backend/open_webui/utils/chat.py

@@ -51,6 +51,8 @@ from open_webui.utils.filter import (
    get_sorted_filter_ids,
    process_filter_functions,
)
from open_webui.utils.intent_processors import image_generation_intent_detector
from open_webui.models.chats import Chats  # Required for saving the message
from open_webui.env import SRC_LOG_LEVELS, GLOBAL_LOG_LEVEL, BYPASS_MODEL_ACCESS_CONTROL

@@ -162,6 +164,101 @@ async def generate_chat_completion(
    bypass_filter: bool = False,
):
    log.debug(f"generate_chat_completion: {form_data}")

    # Extract chat_id, user_message, and history for the intent detector.
    # form_data["messages"] is expected to be the full history, including the latest user message.
    messages = form_data.get("messages", [])
    chat_id = form_data.get("chat_id")  # Assuming chat_id is available in form_data
    if not chat_id and hasattr(request.state, "chat_id"):  # Try to get it from the request state if available
        chat_id = request.state.chat_id

    log.debug(f"Messages for detector: {messages}")

    user_message_content = ""
    chat_history_for_detector = []
    if messages:
        # The last message is the current user message
        user_message_obj = messages[-1]
        if user_message_obj.get("role") == "user":
            user_message_content = user_message_obj.get("content", "")
            # History is all messages before the last one
            chat_history_for_detector = messages[:-1]
        else:
            # This case should not happen if form_data is well-formed for a new turn;
            # fall through to normal processing without calling the detector.
            log.warning("Last message is not from user, cannot process with intent detector.")

    # Don't run the detector for direct model connections
    if user_message_content and not getattr(request.state, "direct", False):
        log.debug(f"Calling ImageGenerationIntentDetector for chat_id '{chat_id}' with message: '{user_message_content}'")
        try:
            detector_response = await image_generation_intent_detector(
                user_message=user_message_content,
                chat_history=chat_history_for_detector,
                request=request,
                current_chat_id=chat_id,
            )
        except Exception:
            log.exception("ImageGenerationIntentDetector raised an exception.")
            detector_response = None  # Proceed to the LLM on detector error

        if detector_response:
            log.info(f"ImageGenerationIntentDetector returned a response for chat_id '{chat_id}': {detector_response}")
            # The detector's response is a complete assistant message dict that needs to be
            # added to the chat history. This function typically returns content that the
            # calling router then uses, so the router is responsible for saving the full
            # chat context, including this new message.
            #
            # For simplicity in this integration, if the detector responds we don't stream:
            # we return the single message and let the frontend handle it.
            if form_data.get("stream"):
                log.warning("ImageGenerationIntentDetector responded, but the original request was for a stream. Returning as a single event.")

                async def single_event_stream():
                    # HACK: structure the response somewhat like an OpenAI streaming chunk,
                    # sending the whole message as a 'delta' in the first chunk, to be
                    # minimally disruptive to existing stream handling. A more robust
                    # solution would use custom event types, or have the socket emit this
                    # message directly.
                    yield f"data: {json.dumps({'id': str(uuid.uuid4()), 'choices': [{'delta': detector_response}]})}\n\n"
                    yield f"data: {json.dumps({'done': True})}\n\n"

                return StreamingResponse(single_event_stream(), media_type="text/event-stream")

            # For non-streaming, detector_response is already a dict like:
            #   {"role": "assistant", "content": "...", "metadata": {...}}
            # The typical non-streaming response from generate_openai_chat_completion is a
            # dict with a "choices" list, e.g. {"choices": [{"message": detector_response}]}.
            #
            # IMPORTANT: The response below is handled by the router (e.g., in chats.py or a
            # socket handler), which is responsible for saving the `detector_response`
            # message to the database; this function is primarily for *generating* the content.
            return {
                "id": str(uuid.uuid4()),  # Generate a unique ID for this "response"
                "object": "chat.completion",
                "created": int(time.time()),
                "model": detector_response.get("metadata", {}).get("engine_used")
                or form_data.get("model"),  # Engine from metadata, or the original model
                "choices": [
                    {
                        "index": 0,
                        "message": detector_response,  # The full message from the detector
                        "finish_reason": "stop",
                    }
                ],
                "usage": {  # Dummy usage
                    "prompt_tokens": 0,
                    "completion_tokens": 0,
                    "total_tokens": 0,
                },
            }

    if BYPASS_MODEL_ACCESS_CONTROL:
        bypass_filter = True
@@ -174,6 +271,9 @@ async def generate_chat_completion(
            **request.state.metadata,
        }

    # If the detector did not handle the message, proceed with the normal LLM flow.
    log.debug("ImageGenerationIntentDetector did not handle the message, proceeding to LLM.")

    if getattr(request.state, "direct", False) and hasattr(request.state, "model"):
        models = {
            request.state.model["id"]: request.state.model,
@@ -184,11 +284,14 @@ async def generate_chat_completion(
    model_id = form_data["model"]
    if model_id not in models:
        # This check might be redundant if the detector already ran and returned, but it
        # is still needed for the path where the detector doesn't run or returns None.
        raise Exception("Model not found")
    model = models[model_id]

    if getattr(request.state, "direct", False):
        # The detector is currently skipped for direct connections, so this path remains unchanged.
        return await generate_direct_chat_completion(
            request, form_data, user=user, models=models
        )

backend/open_webui/utils/intent_processors.py (new file)

@@ -0,0 +1,342 @@
import json  # Needed by the example MockApp below (was previously imported only under __main__)
import logging
import re
from typing import Optional, List, Dict, Any

import httpx  # For making async HTTP requests
from fastapi import Request, HTTPException

log = logging.getLogger(__name__)

# Keywords to detect image generation intent
IMAGE_GEN_KEYWORDS = [
    "generate image", "create image", "draw image", "show image", "make image",
    "generate a picture", "create a picture", "draw a picture", "show a picture", "make a picture",
    "generate poster", "create poster", "design poster", "make poster",
    "generate photo", "create photo", "show photo", "make photo",
    "generate drawing", "create drawing", "draw drawing", "make drawing",
    "generate design", "create design", "design a", "make design",
    "generate art", "create art", "show art", "make art",
    "draw a", "generate a", "create a", "design a", "make a", "show a",
]

# Keywords that might indicate a refinement, but need context
REFINEMENT_KEYWORDS = [
    "change", "modify", "update", "add", "remove", "make it", "make them",
    "more", "less", "bigger", "smaller", "brighter", "darker",
    "another one", "different version", "try again with",
]
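
# Illustrative examples (added for clarity, not part of the original commit):
# "draw a castle at sunset" matches "draw a" above and yields the extracted
# prompt "castle at sunset", while "make it brighter" matches only
# REFINEMENT_KEYWORDS and is treated as a refinement whenever a previously
# generated design exists in the chat history.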


async def image_generation_intent_detector(
    user_message: str,
    chat_history: List[Dict[str, Any]],
    request: Request,
    current_chat_id: Optional[str] = None,  # Added for context if needed
) -> Optional[Dict[str, Any]]:
    """
    Detects intent for image generation in a user message, calls an image generation
    service, and formats the response.

    Args:
        user_message: The current message from the user.
        chat_history: The history of the conversation.
        request: The FastAPI request object, used for making internal API calls.
        current_chat_id: The ID of the current chat.

    Returns:
        A dictionary containing the assistant's response message, or None if no image
        generation intent is detected. The dict structure for a successful generation:
        {
            "role": "assistant",
            "content": "Here is the image you requested: ![image](image_url)",
            "metadata": {
                "is_generated_design": True,
                "original_prompt": "extracted_prompt",
                "image_url": "image_url",
                "engine_used": "engine_name",  # Or None if not available
                "chat_id": current_chat_id,  # For potential UI use
            }
        }
        If intent is detected but generation fails, a message describing the failure
        is returned instead.
    """
log.debug(f"ImageGenDetector: Processing message: '{user_message}'")
user_message_lower = user_message.lower()
detected_intent = False
extracted_prompt = user_message # Default to full message
is_refinement = False
base_prompt_from_history = None
previous_image_url = None
# 1. Check for refinement intent first
# Look for the last assistant message that was a generated design
last_assistant_design_message = None
for i in range(len(chat_history) - 1, -1, -1):
msg = chat_history[i]
if msg.get("role") == "assistant" and msg.get("metadata", {}).get("is_generated_design"):
last_assistant_design_message = msg
break
if last_assistant_design_message:
log.debug(f"ImageGenDetector: Found previous design: {last_assistant_design_message}")
# Basic check if current message sounds like a refinement
if any(keyword in user_message_lower for keyword in REFINEMENT_KEYWORDS) or \
not any(gen_keyword in user_message_lower for gen_keyword in IMAGE_GEN_KEYWORDS): # if no explicit "generate"
is_refinement = True
base_prompt_from_history = last_assistant_design_message["metadata"]["original_prompt"]
previous_image_url = last_assistant_design_message["metadata"]["image_url"]
# For now, refinement means modifying the previous prompt
# A more sophisticated approach would use NLP to understand the modification
extracted_prompt = f"{base_prompt_from_history}, {user_message}" # Simple concatenation for now
log.info(f"ImageGenDetector: Refinement detected. New combined prompt: '{extracted_prompt}'")
detected_intent = True
# 2. If not a refinement, check for new image generation intent
if not detected_intent:
for keyword in IMAGE_GEN_KEYWORDS:
if keyword in user_message_lower:
detected_intent = True
# Attempt to extract a cleaner prompt by removing the keyword phrase
# This is a basic approach and can be improved.
# Example: "generate an image of a cat" -> "a cat"
# Example: "a cat" (if keyword was "draw a")
match = re.search(re.escape(keyword) + r"(?: of)?(?: an)?(?: a)?\s*(.*)", user_message, re.IGNORECASE)
if match and match.group(1):
extracted_prompt = match.group(1).strip()
elif user_message_lower.startswith(keyword): # "draw a cat"
extracted_prompt = user_message[len(keyword):].strip()
else: # Fallback if regex fails but keyword is present
extracted_prompt = user_message
if not extracted_prompt: # If keyword was at the end e.g. "generate an image"
log.warning("ImageGenDetector: Keyword detected but no prompt extracted. Needs more input.")
# Ask for more specific prompt, or handle as no intent for now
return { # Or return None and let LLM handle it
"role": "assistant",
"content": "It looks like you want to generate an image, but I need a description. What would you like me to create?",
"metadata": {"requires_clarification": True}
}
log.info(f"ImageGenDetector: New image intent detected. Extracted prompt: '{extracted_prompt}'")
break
if not detected_intent:
log.debug("ImageGenDetector: No image generation intent detected.")
return None
    # 3. Call the image generation service.
    # Construct the correct base URL for internal API calls;
    # the /api/v1 prefix is usually handled by the app's router.
    image_gen_url = f"{request.base_url}api/v1/images/generations"

    # Default payload structure - this might need adjustment based on the actual
    # /images/generations endpoint. We assume it takes a 'prompt' and might take a
    # 'model' (engine) or other params. For now, we let the image generation service
    # use its default model/engine.
    payload = {
        "prompt": extracted_prompt,
        # "model": "default_image_model_id",  # If we need to specify one, or can get it from config
        # Add other parameters like n, size, quality if supported and desired
    }

    # If this is a refinement and the API supports image-to-image or editing, the
    # previous image could be passed along. This is speculative, as we don't know the
    # API's capabilities for refinement; for this subtask, refinements simply re-prompt.
    # if is_refinement and previous_image_url and some_condition_for_img2img:
    #     payload["image_url_to_edit"] = previous_image_url  # Or image data
    #     payload["original_prompt"] = base_prompt_from_history  # If the API uses it

    log.debug(f"ImageGenDetector: Calling image generation service at {image_gen_url} with payload: {payload}")
    try:
        # We need an async HTTP client; if Open WebUI exposes a shared client, that
        # should be used instead. For now, create a new one bound to the ASGI app.
        async with httpx.AsyncClient(app=request.app, trust_env=False) as client:
            # Ensure cookies/auth are passed if the internal API requires them.
            # This might involve forwarding headers from the original `request` object.
            headers = {"Cookie": request.headers.get("cookie")} if request.headers.get("cookie") else {}
            response = await client.post(image_gen_url, json=payload, headers=headers, timeout=60.0)
            response.raise_for_status()  # Raises an exception for 4XX/5XX responses
            response_data = response.json()
            log.debug(f"ImageGenDetector: Response from image service: {response_data}")

            # response_data is assumed to be a list of generated images, each with a
            # 'url' and potentially an 'engine' or 'model_id'. Take the first image
            # if multiple are returned.
            if isinstance(response_data, list) and len(response_data) > 0:
                image_info = response_data[0]
                image_url = image_info.get("url")  # Adjust key if necessary
                engine_used = image_info.get("engine") or image_info.get("model_id")  # Adjust key if necessary
            elif isinstance(response_data, dict) and response_data.get("url"):  # Single image response
                image_url = response_data.get("url")
                engine_used = response_data.get("engine") or response_data.get("model_id")
            else:
                log.error(f"ImageGenDetector: Unexpected response format from image service: {response_data}")
                return {
                    "role": "assistant",
                    "content": "I tried to generate the image, but I received an unexpected response from the image service.",
                    "metadata": {"error": True, "details": "Unexpected response format"},
                }

            if not image_url:
                log.error(f"ImageGenDetector: No image URL found in response: {response_data}")
                return {
                    "role": "assistant",
                    "content": "I tried to generate the image, but couldn't find the image URL in the response.",
                    "metadata": {"error": True, "details": "No image URL in response"},
                }

            # 4. Format the response.
            assistant_message_content = f"Here is the image you requested: ![image]({image_url})"
            if is_refinement:
                assistant_message_content = f"Okay, I've updated the image: ![image]({image_url})"

            chat_message = {
                "role": "assistant",
                "content": assistant_message_content,
                "metadata": {
                    "is_generated_design": True,
                    "original_prompt": extracted_prompt,  # The prompt used for *this* generation
                    "base_prompt_if_refinement": base_prompt_from_history if is_refinement else None,
                    "image_url": image_url,
                    "engine_used": engine_used,
                    "chat_id": current_chat_id,
                },
            }
            log.info(f"ImageGenDetector: Successfully generated image. Response: {chat_message}")
            return chat_message
    except httpx.HTTPStatusError as e:
        log.error(f"ImageGenDetector: HTTP error from image service: {e.response.status_code} - {e.response.text}")
        error_detail = f"Image service returned error {e.response.status_code}."
        try:
            error_content = e.response.json()
            if error_content.get("detail"):
                error_detail = error_content.get("detail")
        except Exception:
            pass  # Keep the generic error_detail
        return {
            "role": "assistant",
            "content": f"Sorry, I couldn't generate the image. {error_detail}",
            "metadata": {"error": True, "details": str(e)},
        }
    except httpx.RequestError as e:
        log.error(f"ImageGenDetector: Request error calling image service: {e}")
        return {
            "role": "assistant",
            "content": "Sorry, I couldn't reach the image generation service. Please try again later.",
            "metadata": {"error": True, "details": str(e)},
        }
    except Exception as e:
        log.exception("ImageGenDetector: An unexpected error occurred.")
        return {
            "role": "assistant",
            "content": "An unexpected error occurred while trying to generate the image.",
            "metadata": {"error": True, "details": str(e)},
        }


async def example_usage():
    # Mock ASGI app and request objects for ad-hoc testing.
    class MockApp:
        async def __call__(self, scope, receive, send):
            if scope["path"] == "/api/v1/images/generations" and scope["method"] == "POST":
                # Read the request body once and branch on the prompt.
                body = (await receive())["body"]
                prompt = json.loads(body.decode())["prompt"]
                if prompt == "error_prompt":
                    # Simulate an error response for a specific prompt
                    status = 500
                    response_body = json.dumps({"detail": "Simulated error from image service"})
                else:
                    # Simulate a successful image generation response
                    status = 200
                    response_body = json.dumps([{
                        "url": "http://example.com/generated_image.png",
                        "engine": "mock_engine_v1",
                        "prompt": prompt,
                    }])
                await send({
                    "type": "http.response.start",
                    "status": status,
                    "headers": [[b"content-type", b"application/json"]],
                })
                await send({
                    "type": "http.response.body",
                    "body": response_body.encode("utf-8"),
                })
                return
            # Default: not found
            await send({"type": "http.response.start", "status": 404, "headers": []})
            await send({"type": "http.response.body", "body": b""})

    class MockRequest:
        def __init__(self, base_url="http://localhost:8080/", app=None, headers=None):
            self.base_url = base_url
            self.app = app or MockApp()
            self.headers = headers or {}

    # Test cases
    chat_history_1 = []
    user_message_1 = "Can you generate an image of a happy cat?"
    print("\n--- Test Case 1: New Image ---")
    print(f"User: {user_message_1}")
    response_1 = await image_generation_intent_detector(user_message_1, chat_history_1, MockRequest(), "chat1")
    print(f"Assistant: {response_1}")

    chat_history_2 = [
        {"role": "user", "content": "Generate a poster for a rock concert"},
        {
            "role": "assistant",
            "content": "Here is the image you requested: ![image](http://example.com/rock_poster_v1.png)",
            "metadata": {
                "is_generated_design": True,
                "original_prompt": "a poster for a rock concert",
                "image_url": "http://example.com/rock_poster_v1.png",
                "engine_used": "mock_engine_v1",
                "chat_id": "chat2",
            },
        },
    ]
    user_message_2 = "Make the font psychedelic."
    print("\n--- Test Case 2: Refinement ---")
    print(f"User: {user_message_2}")
    response_2 = await image_generation_intent_detector(user_message_2, chat_history_2, MockRequest(), "chat2")
    print(f"Assistant: {response_2}")

    user_message_3 = "What is the capital of France?"
    print("\n--- Test Case 3: No Intent ---")
    print(f"User: {user_message_3}")
    response_3 = await image_generation_intent_detector(user_message_3, chat_history_1, MockRequest(), "chat3")
    print(f"Assistant: {response_3}")

    user_message_4 = "Generate an image"  # Needs clarification
    print("\n--- Test Case 4: Needs Clarification ---")
    print(f"User: {user_message_4}")
    response_4 = await image_generation_intent_detector(user_message_4, chat_history_1, MockRequest(), "chat4")
    print(f"Assistant: {response_4}")

    user_message_5 = "Please draw a picture of error_prompt"
    print("\n--- Test Case 5: Image Service Error ---")
    print(f"User: {user_message_5}")
    response_5 = await image_generation_intent_detector(user_message_5, chat_history_1, MockRequest(app=MockApp()), "chat5")
    print(f"Assistant: {response_5}")


if __name__ == "__main__":
    import asyncio

    # Set up basic logging for the example
    logging.basicConfig(level=logging.DEBUG)
    # asyncio.run(example_usage())  # Commented out for the tool environment

backend/requirements.txt

@@ -9,6 +9,7 @@ passlib[bcrypt]==1.7.4
requests==2.32.3
aiohttp==3.11.11
httpx>=0.25.0
async-timeout
aiocache
aiofiles

readme_update_draft.md (new file)

@@ -0,0 +1 @@
- **🎨 AI-Powered Design Generation:** Effortlessly create and refine a variety of visual designs—such as posters, logos, and social media graphics—directly through chat. Simply describe your desired design, and Open WebUI will generate visuals using its integrated image generation capabilities. Iteratively perfect your designs with follow-up messages, making design creation intuitive and conversational.