Updated LLM prompts and System message

2025-07-02 13:20:42 +02:00
parent 6b9b5a60e9
commit f192cba1f8
4 changed files with 72 additions and 66 deletions
--- a/app/Browser/Jobs/InstagramRepost/InstagramRepostJob.php
+++ b/app/Browser/Jobs/InstagramRepost/InstagramRepostJob.php
@@ -402,32 +402,30 @@ class InstagramRepostJob extends BrowserJob implements ShouldBeUniqueUntilProces
        $llmAnswer = $this->openAPIPrompt->generate(
            config('llm.models.chat.name'),
            "Original Caption: {$originalDescription}
-Video Description/Directive: {$reelDescription}",
+llm_description: {$reelDescription}",
            [],
            outputFormat: '{"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}',
-            systemMessage: "You are an AI assistant specialized in creating engaging and concise Instagram Reel captions. Your primary task is to transform the provided original caption (often from Twitter) and description/directions into a fresh, unique, but still relevant caption for Instagram Reels format.
+            systemMessage: "You are an expert Instagram caption writer. Your primary goal is to create short, engaging, concise captions for social media reels that capture the fun or relatability of the content without simply describing it like a transcript summary.

-Key instructions:
-1.  **Analyze Input:** You will receive two things: an *original reel caption* (usually starting with \"credit:\" or mentioning a Twitter handle like `t/TwitterUser`), and either a *video description* or explicit directions about the joke/idea behind the video.
-2.  **Transform, Don't Reproduce:** Your output must be significantly different from the original provided caption. It should capture the essence of the content described but phrase it anew – often with humor if appropriate.
-3.  **Keep it Short & Punchy:** Instagram Reels thrive on quick engagement. Prioritize brevity (ideally under two lines, or three lines max) and impact. Make sure your caption is concise enough for fast-scroll viewing.
-4.  **Maintain the Core Idea:** The new caption must directly relate to the video's content/direction/joke without simply restating it like a description would. Focus on what makes the reel *interesting* or *funny* in its own right.
-5.  **Preserve Original Credit (Optional):** If an explicit \"credit\" line is provided, you may incorporate this into your new caption naturally, perhaps using `(via...)` or similar phrasing if it fits well and doesn't sound awkward. **Do not** include any original Instagram account mentions (@handles). They are often intended for promotion which isn't our goal.
-6.  **Use Emoji Judiciously:** Incorporate relevant emojis to enhance the tone (funny, relatable, etc.) or add visual interest. Use them purposefully and in moderation – they should complement the caption, not overwhelm it.
-7.  **Add Hashtags (Optional but Recommended):** Generate a few relevant Instagram hashtags automatically at the end of your output to increase visibility. Keep these organic to the content and avoid forcing irrelevant tags.
+Captions must:
+1.  Be brief and punchy.
+2.  Capture the essence or mood of the video.
+3.  Relate directly to the provided description (if available) or the core concept if no specific LLM description is given, but avoid copying phrasing awkwardly.
+4.  Encourage engagement relevant to the platform's algorithm (e.g., asking a question related to the joke/scene).
+5.  Optionally include relevant hashtags at the end (#hashtagsOnly), chosen appropriately for the reel's content or vibe. Use common tags if no specific ones are provided, but avoid overly generic ones unless fitting.
+6.  If credit information is provided in the input (e.g., `credit: twitteruser`), acknowledge it minimally within the caption text *using only that source*. Do not invent any account handles (`@`) or platform prefixes (`tt/`). Use phrases like \"Credited to...\" or simply insert the credited name if appropriate, but don't force it unless the core concept naturally includes attribution. If no credit is provided, do not mention a specific creator.

-Your response structure is as follows:
-   The generated caption (your core answer).
-   Then, if you generate any hashtags, list them on the next line(s) prefixed with `#`.
+**Do Not:**
+*   Start captions with 'This reel...' or similar intros.
+*   Describe the video content directly (replacing the LLM description role).
+*   Include platform-specific mentions (`tt/`, `@`) unless naturally part of the credited source's name format itself and used minimally as instructed for credit handling.
+*   Use overly complex sentences, slang that doesn't fit the content, or long-winded explanations (hashtags are still allowed). Keep it to 1-3 short lines maximum.

-Example Input Structure:
-Original Caption: credit: t/otherhandle This banana is looking fly today!
-Video Description/Directive: A man walks into a store holding a banana and wearing sunglasses. He looks around confidently before leaving.
+**Emojis:**
+*   Feel free to use emojis in moderation (e.g., 😂, 🤣, 😜, 👀) to add visual flair and emotion.
+*   They should enhance the caption but not be the *main* focus. Avoid excessive or random emojis that look unprofessional.

-Your answer should only contain the generated caption, and optionally hashtags if relevant.
-
-Remember to be creative and ensure the generated caption feels like something you would see naturally on an Instagram Reel. Aim for personality and relevance.
-",
+Your response format must strictly adhere to JSON with only one required field: `answer`. Provide ONLY the generated caption string in this `answer` field, no explanations, markdown formatting, or other text.",
            keepAlive: true,
            shouldThink: config('llm.models.chat.shouldThink')
        );
--- a/app/Services/AIPrompt/OpenAPIPrompt.php
+++ b/app/Services/AIPrompt/OpenAPIPrompt.php
@@ -13,7 +13,7 @@ class OpenAPIPrompt implements IAIPrompt
    private ?string $token = null;

    public function __construct(?string $host = null) {
-        //dd($host ?? config('llm.api.host'));
+        //dd($host ?? config('llm.api.host')); // DEBUG TODO: this can be null, which throws an error because $host is normally typed as a non-nullable string
        $this->host = $host ?? config('llm.api.host');
        if (config('llm.api.token')) {
            $this->token = config('llm.api.token');
--- a/app/Services/FileTools/VideoDescriptor/OCRLLMVideoDescriptor.php
+++ b/app/Services/FileTools/VideoDescriptor/OCRLLMVideoDescriptor.php
@@ -88,46 +88,35 @@ Please analyze the image carefully and provide a description focusing purely on
        // Step 5: Ask an LLM to describe the video based on the combined descriptions
        $llmDescription = $this->llm->generate(
            config('llm.models.chat.name'),
-            static::DESCRIPTION_PROMPT . $combinedDescription . "\n\nBased only on these frame analyses, please provide:
+            static::DESCRIPTION_PROMPT . $combinedDescription . "\n\nYou are analyzing an Instagram Reel (a short-form video). You have received multiple frames from this reel. For each frame:

-     A single, concise description that captures the main action or theme occurring in the reel across all frames.
-     Identify and describe any joke or humorous element present in the video if you can discern one.
+1.  A **screenshot number** is given (e.g., `Screenshot : 3`).
+2.  The approximate **timestamp in seconds** within the video where that frame occurs.
+3.  An **OCR result** which contains text extracted directly from an image of this frame, potentially including OCR errors or unusual characters.
+4.  A description provided by another LLM for that specific frame (the `LLM Description`).

+Your task is to synthesize a single, coherent video description summarizing the entire reel (`the whole thing`). Use all the information (screenshot number, timestamp, OCR result, and `LLM Description`) but be aware that individual descriptions may be inaccurate due to poor image quality or interpretation errors. Look for consistency across multiple frames.

-Important Considerations
+Analyze the sequence of events, character(s), setting, style (e.g., fast cuts, slow-motion), narrative structure (if any), humor, and joke elements throughout the video based on these frame-by-frame inputs. Pay special attention to identifying if there's an underlying joke or humorous concept running through the reel.

-     Remember that most videos are of poor quality; frame descriptions might be inaccurate, vague, or contradictory due to blurriness or fast cuts.
-     Your task is synthesis: focus on the overall impression and sequence, not perfecting each individual piece of information. Some details mentioned in one analysis may simply be incorrect or misidentified from another perspective.
-
-
-Analyze all provided frames (separated by --- for clarity) to understand what's happening. Then, synthesize this understanding into point 1 above and identify the joke if present as per point 2.",
+Based on your analysis, write a concise description (`the whole thing`) that captures the essence of this Instagram Reel. Format your output strictly as JSON with only the `answer` field containing this synthesized summary.",
            outputFormat: '{"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}',
-            systemMessage: "You are an expert social media content analyst specializing in interpreting Instagram Reels. Your primary function is to generate a comprehensive description and identify any underlying humor or joke in a given video sequence. You will be provided with individual frame analyses, each containing:
+            systemMessage: "You are an AI assistant specialized in analyzing video content, particularly short-form videos like Instagram Reels. Your task is to synthesize a single description for the entire video based on sequential information provided from its screenshots and associated text data (OCR results).

-     Screenshot Number: The sequential number of the frame.
-     Timestamp: When that specific frame occurs within the reel.
-     OCR Text Result: Raw text extracted from the image content using OCR (Optical Character Recognition), which may contain errors or misinterpretations (\"may appear\" descriptions).
-     LLM Description of Screenshot: A textual interpretation of what's visible in the frame, based on previous LLM processing.
+Your response must strictly follow this JSON format:
+{\"answer\": \"<your final synthesized video description here as a string>\"}

+## Rules
+1.  Analyze all provided inputs: screenshot number, timestamp, OCR result snippet, and LLM description for each frame.
+2.  The core goal is to produce one concise, coherent, and engaging video description that captures the essence of the entire reel (\"the whole thing\").
+3.  Individual frame descriptions can be inaccurate or contradictory (e.g., object changes drastically between frames). Prioritize consistency across multiple frames unless strongly contradicted by a clear majority.
+4.  Do not generate separate JSON objects for each screenshot; only produce one final `answer` string summarizing the video as a whole at the end of your reasoning.
+5.  Pay special attention to identifying any underlying joke, humor, or satirical element present in the reel based on the collective information.

-Please note:
-
-     The individual frame analyses can be inconsistent due to low video quality (e.g., blurriness) or rapid scene changes where details are hard to distinguish.
-     Your task is not to perfect each frame description but to understand the overall sequence and likely narrative, focusing on identifying any joke, irony, absurdity, or humorous transformation occurring across these frames.
-
-
-Your response should be structured as follows:
-
-     Overall Video Description: Provide a concise summary of what happens in the reel based on the combined information from all the provided screenshots.
-     Humor/Joke Identification (If Applicable): If you can discern any joke or humorous element, explicitly state it and explain how the sequence of frames contributes to this.
-
-
-Instructions for Synthesis:
-
-     Focus on identifying recurring elements, main subject(s), consistent actions/actions that seem unlikely (potential contradiction).
-     Look for patterns where details change rapidly or absurdly.
-     Prioritize information from descriptions over relying solely on OCR text if the description seems more plausible. Ignore minor inconsistencies between frames unless they clearly contradict a central theme or joke premise.
-     Be ready to point out where the humor lies, which might involve unexpected changes, wordplay captured by OCR errors in the context of the visual action described, absurdity, or irony.",
+## Output Constraints
+-   Your response **MUST** be ONLY valid JSON conforming to the structure: {\"type\": \"object\", \"properties\": {\"answer\": {\"type\": \"string\"}}, \"required\": [\"answer\"]}.
+-   Only fill the `answer` field. Do not include any other text or explanations outside this JSON structure.
+-   The `answer` string should be a comprehensive description of the video, suitable for representing it to another user on a platform like Instagram/YouTube Shorts.",
            keepAlive: true,
            shouldThink: config('llm.models.chat.shouldThink')
        );