What is this file?
This file lists the user prompts I give to an LLM so that it writes the prompts that are then used to generate the actual answers.
For example, for the Instagram reel caption generation, this file lists a prompt that asks the LLM to write the prompt, system message and output format that will be used to generate the captions.
This method comes from the idea that the best way to do prompt engineering is to ask the model concerned to write the prompt itself.
Prompts
The starting sentences are usually:
I’m using some LLM and I would need a prompt and a system message for every use case I will give you.
I’m using structured JSON output provided by the OpenAI API. The output structure is a simple {"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}, so only “answer” can be filled. For the input, everything will be given in the prompt. Give me the system message and prompt separately, preferably in text format.
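As a minimal sketch of what this output structure implies on the receiving side, the snippet below parses a raw reply and returns the "answer" string. The schema is copied from the prompt text; the helper name `extract_answer` is hypothetical, and the check is hand-written rather than a call to any particular validation library.

```python
import json

# Schema copied from the prompt text above.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


def extract_answer(raw: str) -> str:
    """Parse a raw JSON reply and return its "answer" string.

    Hypothetical helper: raises ValueError when the reply does not
    match ANSWER_SCHEMA (not valid JSON, not an object, missing key,
    or a non-string value).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key in ANSWER_SCHEMA["required"]:
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    if not isinstance(data["answer"], str):
        raise ValueError('"answer" must be a string')
    return data["answer"]
```

For instance, `extract_answer('{"answer": "hello"}')` returns the string `hello`, while a reply without the "answer" key raises `ValueError`.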
Instagram Reel caption generation
I’m using some LLM and I would need a prompt and a system message for every use case I will give you.
The first one is when I’m trying to generate a caption for an Instagram Reel. For the moment, I can give the LLM the original caption of the Instagram reel it was downloaded from, and a description of the video, or of the joke behind it, written by an LLM.
The caption must be short and fit the reel. For example, if the reel is funny, the caption must be short and funny, while still relating to the reel. The caption must not describe the video like the LLM description does (for example, this bad example describes the content of the video instead of being a caption based on the given description: “Three animated friends chilling in the woods at night until someone's phone inevitably starts ringing somewhere nearby... 😅🌲✨” or this one: “This reel shows me trying to make my sad texts shorter with ChatGPT, but it just frustrates me more! 😅😂”).
It also shouldn’t begin with something like ‘This reel…’. For example, this is a bad output: “This reel hilariously mocks every awkward fan reaction to those intense DCU movie scenes. 🎭 #DCFanDrama”
The LLM can add a few hashtags if they seem appropriate.
Sometimes, the original caption will credit the original author, most of the time on Twitter, like (“credit : t/twitteruser”). Those credits can appear in the generated caption too, but I don’t want any Instagram account mention (“@instagramUser”) because usually it’s there to encourage people to follow the account the reel was downloaded from (like “Seen me already ? follow me @instagramUser”). I don’t want long credits either, just a simple “credit tt/twitteraccount” is enough. Not like this bad example: “Credited via the brilliant mind at tt/batinterface!…”
The use of emoji is encouraged, but not too many, and they must not look stupid.
When using it, I encountered some problems like this one:
“Credit to: [Original Creator] for this hilarious video game scene where the characters look suspiciously like Kermit the Frog! 😂”. The [Original Creator] placeholder is not filled in, and I don’t even know if the original caption had a credit.
Some captions are just lame and feel like a Facebook post. The intended audience here is young.
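Some of the problems described in this section (a caption that opens with "This reel…", an Instagram "@" mention, an unfilled placeholder such as "[Original Creator]") could also be caught after generation. The following is only a rough sketch under that assumption: the function name `caption_problems` and the regular expressions are my own illustration, not part of the generation pipeline, and they will miss many cases.

```python
import re

# Illustrative patterns only; none of them is exhaustive.
BAD_OPENING_RE = re.compile(r"^\s*this\s+(reel|video)\b", re.IGNORECASE)
MENTION_RE = re.compile(r"(?<!\w)@\w[\w.]*")      # e.g. @instagramUser
PLACEHOLDER_RE = re.compile(r"\[[^\[\]]+\]")      # e.g. [Original Creator]


def caption_problems(caption: str) -> list[str]:
    """Return a list of rule violations found in a generated caption."""
    problems = []
    if BAD_OPENING_RE.search(caption):
        problems.append("starts like a description")
    if MENTION_RE.search(caption):
        problems.append("contains an Instagram mention")
    if PLACEHOLDER_RE.search(caption):
        problems.append("contains an unfilled placeholder")
    return problems
```

A caption that returns a non-empty list could simply be regenerated. Tone problems (a caption that "feels like a Facebook post") are not something a check like this can detect.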
Video Descriptor
I’m using some LLM and I would need a prompt and a system message for every use case I will give you.
I’m using structured JSON output provided by the OpenAI API. The output structure is a simple {"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}, so only “answer” can be filled. For the input, everything will be given in the prompt. Give me the system message and prompt separately, preferably in text format.
The LLM here will be used to describe an Instagram Reel (video). Each screenshot of that video will be described by an LLM with its own prompt, system message and output format. The descriptions of all the screenshots will be given to this LLM, which will try to reconstruct the video from them and describe it.
The required prompt here is for the LLM that will compile the descriptions into one, try to understand the video and describe it. I’m particularly interested in the joke behind the reel if there is one.
This is an example of a screenshot description by an LLM : “The image shows a close-up of a person's hands holding what appears to be a brown object with a plastic covering, possibly food wrapped in paper or foil. There is also a small portion visible at the top right corner, which seems to be a red and white label. The focus of the image is on the hands holding the object.”
The information I can give in the prompt is the list of screenshots and, for each one:
The screenshot number
The timestamp in the video of when the screenshot is taken
An OCR result (may contain some weird characters, the OCR output is not filtered or cleaned)
The LLM description of the screenshot
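To make the four fields listed above concrete, here is a small sketch of how they could be laid out as text in the prompt. The `Screenshot` class, the `format_screenshots` function, the field labels and the choice of seconds for the timestamp are all assumptions made for illustration; the real layout is whatever the generated prompt ends up expecting.

```python
from dataclasses import dataclass


@dataclass
class Screenshot:
    number: int
    timestamp_s: float  # seconds from the start of the video (assumed unit)
    ocr_text: str       # raw OCR output, not filtered or cleaned
    description: str    # LLM description of the screenshot


def format_screenshots(shots: list[Screenshot]) -> str:
    """Render the screenshots as text blocks, ordered by timestamp."""
    blocks = []
    for s in sorted(shots, key=lambda s: s.timestamp_s):
        blocks.append(
            f"Screenshot {s.number}\n"
            f"Timestamp: {s.timestamp_s:.1f}s\n"
            f"OCR: {s.ocr_text or '(none)'}\n"
            f"Description: {s.description}"
        )
    return "\n\n".join(blocks)
```

Sorting by timestamp keeps the blocks in video order even if the screenshots were processed out of order, and an empty OCR result is written as "(none)" so the model does not have to guess whether the field was skipped.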
Most of the descriptions won’t make sense, so some details should be omitted. For example, one screenshot description could say the main subject is a car, and another one 3 seconds later in the video could say the main subject is a cat. You could say the car transformed into a cat, but it would be safer to assume that one of the descriptions is wrong and the main character was a cat throughout the video, because another description in the video also says the main subject is a cat.
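The car/cat reasoning above is essentially a majority vote across screenshots. The sketch below shows that idea in isolation, assuming the main subject of each screenshot has already been extracted as a short string (which the pipeline described here does not do; in practice the LLM is asked to apply this reasoning itself). The function name `most_likely_subject` is hypothetical.

```python
from collections import Counter


def most_likely_subject(subjects: list[str]):
    """Return the subject named by the most screenshots, or None if empty.

    Subjects are compared case-insensitively; blank entries are ignored.
    """
    counts = Counter(s.strip().lower() for s in subjects if s.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

With the subjects "car", "cat", "cat", the function returns "cat", which matches the conclusion drawn in the example.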
It is safe to say that most analysed videos will be of bad quality, which means the screenshot descriptions can vary a lot.
Text found by OCR and details from the screenshot descriptions can be carried over into the final video description if they seem coherent.
Screenshot descriptor
I’m using some LLM and I would need a prompt, a system message and an output format for every use case I will give you.
The first one must describe a screenshot from a video. Each screenshot of that video will be described using the same LLM, prompt, system message and output format. The descriptions of all the screenshots will be given to another LLM that will try to reconstruct the video from them and describe it.
The required prompt here is the one that describes a screenshot. The LLM will only be given the screenshot as input information. I need the LLM to describe the given screenshot. No need to specify that it is a screenshot. The LLM description must specify the scene, the character or the main subject, and the text present on the screenshot; most of the time this will be a caption added during video editing, which may use emojis.
The LLM used here is llava:7b-v1.6-mistral-q4_1. It is not the best for text generation, but it is very powerful when using its vision capability.
The last part is a personal note: I included it because I gave this prompt to a different LLM than the one used, since llava wouldn’t give me a good prompt.