Discussion about this post

ToxSec

“First, separate the creative work from the structural work. Asking one LLM call to be both brilliant and obedient means optimizing for competing goals, and research suggests that makes it worse at both”

this is great advice even for more general use cases. aligning llms with different areas in vector space really has a performance effect. great read :)

Hozefa

At Loom, we also built AI-powered subtitles (and captions). Below was our thinking model:

- Along with preserving the content and tone, we asked in the prompt for every caption block to be translated and for the block count to be preserved: Ensure that every single input caption block is translated in the output. The output must contain the exact same number of caption blocks as the input, from the first to the very last (including the final block, regardless of length). Do not stop early.

- We also asked it to maintain the same timing and conciseness as the original captions, and passed in the original-language captions so that the LLM could translate each caption phrase.

Strive to match the original caption's length (word count/character count) closely for on-screen readability and synchronization with dialogue. Use concise phrasing, common synonyms, and efficient sentence structures.

- To prevent timing sync problems, we asked the LLM to preserve the timestamps of the original captions: All timestamps must match the input captions exactly, with none missing.

- We wanted translations to read easily and naturally in the new language, and specifically did not want word-for-word translation that would be hard to understand. There are grammatical and syntactic nuances between different languages. Use vocabulary and current slang widely understood by native speakers of the target language. Avoid jargon, archaic terms, or overly formal language unless explicitly required by the source material's register.

- If there were sections that did not make sense when translated, we asked to preserve the word/phrase from the original transcript: If a phrase or part of a phrase cannot be translated clearly and naturally into the target language while preserving context and intent, keep the original language text from the input.

- What helped were the examples we provided to show that we wanted timestamps and context preserved.
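
The block-count and timestamp rules above are easy to verify mechanically after the model responds. The following is a minimal sketch of such a check, not Loom's implementation. It assumes SRT-style captions (blocks separated by blank lines, each with a `start --> end` timestamp line); `CaptionBlock`, `parse_blocks`, and `check_translation` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: SRT-style timestamp lines such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[,.]\d{3}) --> (\d{2}:\d{2}:\d{2}[,.]\d{3})$"
)


@dataclass
class CaptionBlock:
    start: str
    end: str
    text: str


def parse_blocks(captions: str) -> List[CaptionBlock]:
    """Split caption text into blocks; chunks without a timestamp line are skipped."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", captions.strip()):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMESTAMP_LINE.match(line)
            if match:
                text = "\n".join(lines[i + 1:])
                blocks.append(CaptionBlock(match.group(1), match.group(2), text))
                break
    return blocks


def check_translation(source: str, translated: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    src = parse_blocks(source)
    out = parse_blocks(translated)
    problems = []
    # Rule: the output must contain exactly as many blocks as the input.
    if len(src) != len(out):
        problems.append(
            f"block count mismatch: expected {len(src)}, got {len(out)}"
        )
    for index, (s, t) in enumerate(zip(src, out), start=1):
        # Rule: all timestamps must match the input exactly.
        if (s.start, s.end) != (t.start, t.end):
            problems.append(
                f"block {index}: timestamp changed from "
                f"{s.start} --> {s.end} to {t.start} --> {t.end}"
            )
        # A block with no text suggests the model dropped the translation.
        if not t.text:
            problems.append(f"block {index}: empty translation")
    return problems
```

If `check_translation` returns any problems, one option is to retry the call or fall back to the original-language captions for the affected blocks.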
