Parallel Workflow
Execute multiple LLM calls in parallel and aggregate afterwards.
Parallelization takes advantage of tasks that can be broken up into discrete, independent parts. The user's prompt is passed to multiple LLMs simultaneously. Once all of the LLMs respond, their answers are sent to a final LLM call that aggregates them into the final answer.
Parallel Architecture
Run multiple LLMs in parallel and aggregate their solutions.

Notice that the same user prompt goes to each parallel LLM for execution. An alternate parallel workflow, in which the main prompt is broken into sub-tasks, is presented later.
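Before the full implementation below, it helps to see the essence of the pattern with the model-calling and aggregation steps abstracted as callables. This is only a schematic sketch; the names here are placeholders, and the concrete Together AI versions are defined in the next sections.
import asyncio
from typing import Awaitable, Callable, List, Sequence

async def fan_out_and_aggregate(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], Awaitable[str]],       # (prompt, model) -> candidate answer
    aggregate: Callable[[str, List[str]], Awaitable[str]],  # (prompt, answers) -> final answer
) -> str:
    # Fan out: the same prompt goes to every model concurrently
    answers = await asyncio.gather(*(call_model(prompt, m) for m in models))
    # Fan in: a single aggregation call combines the candidate answers
    return await aggregate(prompt, list(answers))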
Setup Client & Helper Functions
import asyncio
import together
from together import AsyncTogether, Together

client = Together()
async_client = AsyncTogether()

def run_llm(user_prompt: str, model: str, system_prompt: str = None):
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=4000,
    )

    return response.choices[0].message.content

# The function below will call the reference LLMs in parallel
async def run_llm_parallel(user_prompt: str, model: str, system_prompt: str = None):
    """Run a single LLM call with a reference model."""
    for sleep_time in [1, 2, 4]:
        try:
            messages = []
            if system_prompt:
                messages.append({"role": "system", "content": system_prompt})
            messages.append({"role": "user", "content": user_prompt})

            response = await async_client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0.7,
                max_tokens=2000,
            )
            break
        except together.error.RateLimitError as e:
            print(e)
            await asyncio.sleep(sleep_time)

    return response.choices[0].message.content
import assert from "node:assert";
import Together from "together-ai";

const client = new Together();

export async function runLLM(
  userPrompt: string,
  model: string,
  systemPrompt?: string,
) {
  const messages: { role: "system" | "user"; content: string }[] = [];

  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }

  messages.push({ role: "user", content: userPrompt });

  const response = await client.chat.completions.create({
    model,
    messages,
    temperature: 0.7,
    max_tokens: 4000,
  });

  const content = response.choices[0].message?.content;
  assert(typeof content === "string");

  return content;
}
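As a quick sanity check of the Python helpers above, they can be called directly before wiring up the full workflow. The model name here is simply one of those used later in this guide; any chat model available on Together AI would work.
async def smoke_test():
    # Synchronous helper
    print(run_llm("Name one prime number.", model="meta-llama/Llama-3.3-70B-Instruct-Turbo"))
    # Async helper with rate-limit retries
    print(await run_llm_parallel("Name one prime number.", model="meta-llama/Llama-3.3-70B-Instruct-Turbo"))

asyncio.run(smoke_test())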
Implement Workflow
import asyncio
from typing import List

async def parallel_workflow(prompt: str, proposer_models: List[str], aggregator_model: str, aggregator_prompt: str):
    """Run a parallel chain of LLM calls to address `prompt`
    using the models listed in `proposer_models`.

    Returns the aggregator model's final output along with the intermediate proposer responses.
    """

    # Gather intermediate responses from proposer models
    proposed_responses = await asyncio.gather(
        *[run_llm_parallel(prompt, model) for model in proposer_models]
    )

    # Aggregate responses using an aggregator model
    final_output = run_llm(
        user_prompt=prompt,
        model=aggregator_model,
        system_prompt=aggregator_prompt
        + "\n"
        + "\n".join(f"{i+1}. {str(element)}" for i, element in enumerate(proposed_responses)),
    )

    return final_output, proposed_responses
import dedent from "dedent";

/*
  Run a parallel chain of LLM calls to address the `inputQuery`
  using a list of models specified in `proposerModels`.

  Returns output from final aggregator model.
*/
async function parallelWorkflow(
  inputQuery: string,
  proposerModels: string[],
  aggregatorModel: string,
  aggregatorSystemPrompt: string,
) {
  // Gather intermediate responses from proposer models
  const proposedResponses = await Promise.all(
    proposerModels.map((model) => runLLM(inputQuery, model)),
  );

  // Aggregate responses using an aggregator model
  const aggregatorSystemPromptWithResponses = dedent`
    ${aggregatorSystemPrompt}
    ${proposedResponses.map((response, i) => `${i + 1}. ${response}`).join("\n")}
  `;

  const finalOutput = await runLLM(
    inputQuery,
    aggregatorModel,
    aggregatorSystemPromptWithResponses,
  );

  return [finalOutput, proposedResponses];
}
Example Usage
reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "google/gemma-2-27b-it",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

user_prompt = """Jenna and her mother picked some apples from their apple farm.
Jenna picked half as many apples as her mom. If her mom got 20 apples, how many apples did they both pick?"""

aggregator_model = "deepseek-ai/DeepSeek-V3"

aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.
Responses from models:"""

async def main():
    answer, intermediate_responses = await parallel_workflow(
        prompt=user_prompt,
        proposer_models=reference_models,
        aggregator_model=aggregator_model,
        aggregator_prompt=aggregator_system_prompt,
    )

    for i, response in enumerate(intermediate_responses):
        print(f"Intermediate Response {i+1}:\n\n{response}\n")

    print(f"Final Answer: {answer}\n")

asyncio.run(main())
const referenceModels = [
  "microsoft/WizardLM-2-8x22B",
  "Qwen/Qwen2.5-72B-Instruct-Turbo",
  "google/gemma-2-27b-it",
  "meta-llama/Llama-3.3-70B-Instruct-Turbo",
];

const userPrompt = dedent`
  Jenna and her mother picked some apples from their apple farm.
  Jenna picked half as many apples as her mom.
  If her mom got 20 apples, how many apples did they both pick?
`;

const aggregatorModel = "deepseek-ai/DeepSeek-V3";

const aggregatorSystemPrompt = dedent`
  You have been provided with a set of responses from various
  open-source models to the latest user query. Your task is to
  synthesize these responses into a single, high-quality response.
  It is crucial to critically evaluate the information provided in
  these responses, recognizing that some of it may be biased or incorrect.
  Your response should not simply replicate the given answers but
  should offer a refined, accurate, and comprehensive reply to the
  instruction. Ensure your response is well-structured, coherent, and
  adheres to the highest standards of accuracy and reliability.
  Responses from models:
`;

async function main() {
  const [answer, intermediateResponses] = await parallelWorkflow(
    userPrompt,
    referenceModels,
    aggregatorModel,
    aggregatorSystemPrompt,
  );

  for (const response of intermediateResponses) {
    console.log(`## Intermediate Response ${intermediateResponses.indexOf(response) + 1}:\n`);
    console.log(`${response}\n`);
  }

  console.log(`## Final Answer:`);
  console.log(`${answer}\n`);
}

main();
Use cases
- Using one LLM to answer a user's question, while at the same time using another to screen the question for inappropriate content or requests.
- Reviewing a piece of code for both security vulnerabilities and stylistic improvements at the same time (see the sketch after this list).
- Analyzing a lengthy document by dividing it into sections and assigning each section to a separate LLM for summarization, then combining the summaries into a comprehensive overview.
- Simultaneously analyzing a text for emotional tone, intent, and potential biases, with each aspect handled by a dedicated LLM.
- Translating a document into multiple languages at the same time by assigning each language to a separate LLM, then aggregating the results for multilingual output.
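As a concrete illustration of the code-review use case above, the same helpers can review one snippet from two angles at once and then merge the findings. This is a minimal sketch reusing run_llm_parallel and run_llm from the setup section; the review prompts are illustrative, and the model names are simply the ones used elsewhere on this page.
SECURITY_PROMPT = "Review the following code strictly for security vulnerabilities."
STYLE_PROMPT = "Review the following code strictly for style and readability improvements."

async def review_code(code: str) -> str:
    # Fan out: one reviewer per concern, run concurrently on the same code
    security_review, style_review = await asyncio.gather(
        run_llm_parallel(code, "meta-llama/Llama-3.3-70B-Instruct-Turbo", system_prompt=SECURITY_PROMPT),
        run_llm_parallel(code, "meta-llama/Llama-3.3-70B-Instruct-Turbo", system_prompt=STYLE_PROMPT),
    )

    # Aggregate the two reviews into a single prioritized report
    return run_llm(
        user_prompt=code,
        model="deepseek-ai/DeepSeek-V3",
        system_prompt="Merge these two code reviews into one prioritized report:\n\n"
        + security_review + "\n\n" + style_review,
    )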
Parallel Workflow Cookbook
For a more detailed walk-through, refer to the notebook here.
Subtask Agent Workflow
This is an alternate and useful parallel workflow. It begins with an LLM breaking the task down into subtasks that are determined dynamically from the input. These subtasks are then processed in parallel by multiple worker LLMs. Finally, the orchestrator LLM synthesizes the workers' outputs into the final result.
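Concretely, for the product-description task used later in this section, the orchestrator step is expected to return JSON containing an analysis plus a list of typed subtasks, roughly of this shape. The values here are illustrative; the exact structure is the TaskList schema defined in the implementation below.
{
  "analysis": "The task needs copy that appeals to eco-conscious millennials...",
  "tasks": [
    {"type": "formal", "description": "Emphasize precise specifications and the lifetime warranty."},
    {"type": "conversational", "description": "Speak directly to the reader about everyday sustainability."},
    {"type": "hybrid", "description": "Tell a short story that weaves in the key features."}
  ]
}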

Setup Client & Helper Functions
import asyncio
import json

import together
from pydantic import ValidationError
from together import AsyncTogether, Together

client = Together()
async_client = AsyncTogether()

# The function below will call the reference LLMs in parallel
async def run_llm_parallel(user_prompt: str, model: str, system_prompt: str = None):
    """Run a single LLM call with a reference model."""
    for sleep_time in [1, 2, 4]:
        try:
            messages = []
            if system_prompt:
                messages.append({"role": "system", "content": system_prompt})
            messages.append({"role": "user", "content": user_prompt})

            response = await async_client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0.7,
                max_tokens=2000,
            )
            break
        except together.error.RateLimitError as e:
            print(e)
            await asyncio.sleep(sleep_time)

    return response.choices[0].message.content

def JSON_llm(user_prompt: str, schema, system_prompt: str = None):
    try:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": user_prompt})

        extract = client.chat.completions.create(
            messages=messages,
            model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
            response_format={
                "type": "json_object",
                "schema": schema.model_json_schema(),
            },
        )

        return json.loads(extract.choices[0].message.content)
    except (ValidationError, json.JSONDecodeError) as e:
        error_message = f"Failed to parse JSON: {e}"
        print(error_message)
import assert from "node:assert";
import Together from "together-ai";
import { Schema } from "zod";
import zodToJsonSchema from "zod-to-json-schema";

const client = new Together();

export async function runLLM(userPrompt: string, model: string) {
  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: userPrompt }],
    temperature: 0.7,
    max_tokens: 4000,
  });

  const content = response.choices[0].message?.content;
  assert(typeof content === "string");

  return content;
}

export async function jsonLLM<T>(
  userPrompt: string,
  schema: Schema<T>,
  systemPrompt?: string,
) {
  const messages: { role: "system" | "user"; content: string }[] = [];

  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }

  messages.push({ role: "user", content: userPrompt });

  const response = await client.chat.completions.create({
    model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages,
    response_format: {
      type: "json_object",
      // @ts-expect-error Expected error
      schema: zodToJsonSchema(schema, {
        target: "openAi",
      }),
    },
  });

  const content = response.choices[0].message?.content;
  assert(typeof content === "string");

  return schema.parse(JSON.parse(content));
}
Implement Workflow
import asyncio
import json

from pydantic import BaseModel, Field
from typing import Literal, List

ORCHESTRATOR_PROMPT = """
Analyze this task and break it down into 2-3 distinct approaches:
Task: {task}
Provide an Analysis:
Explain your understanding of the task and which variations would be valuable.
Focus on how each approach serves different aspects of the task.
Along with the analysis, provide 2-3 approaches to tackle the task, each with a brief description:
Formal style: Write technically and precisely, focusing on detailed specifications
Conversational style: Write in a friendly and engaging way that connects with the reader
Hybrid style: Tell a story that includes technical details, combining emotional elements with specifications
Return only JSON output.
"""

WORKER_PROMPT = """
Generate content based on:
Task: {original_task}
Style: {task_type}
Guidelines: {task_description}
Return only your response:
[Your content here, maintaining the specified style and fully addressing requirements.]
"""

task = """Write a product description for a new eco-friendly water bottle.
The target_audience is environmentally conscious millennials and key product features are: plastic-free, insulated, lifetime warranty
"""

class Task(BaseModel):
    type: Literal["formal", "conversational", "hybrid"]
    description: str

class TaskList(BaseModel):
    analysis: str
    tasks: List[Task] = Field(default_factory=list)

async def orchestrator_workflow(task: str, orchestrator_prompt: str, worker_prompt: str):
    """Use an orchestrator model to break down a task into sub-tasks and then use worker models to generate and return responses."""

    # Use orchestrator model to break the task up into sub-tasks
    orchestrator_response = JSON_llm(orchestrator_prompt.format(task=task), schema=TaskList)

    # Parse orchestrator response
    analysis = orchestrator_response["analysis"]
    tasks = orchestrator_response["tasks"]

    print("\n=== ORCHESTRATOR OUTPUT ===")
    print(f"\nANALYSIS:\n{analysis}")
    print(f"\nTASKS:\n{json.dumps(tasks, indent=2)}")

    worker_model = ["meta-llama/Llama-3.3-70B-Instruct-Turbo"] * len(tasks)

    # Gather intermediate responses from worker models
    return tasks, await asyncio.gather(
        *[
            run_llm_parallel(
                user_prompt=worker_prompt.format(
                    original_task=task,
                    task_type=task_info["type"],
                    task_description=task_info["description"],
                ),
                model=model,
            )
            for task_info, model in zip(tasks, worker_model)
        ]
    )
import dedent from "dedent";
import { z } from "zod";

function ORCHESTRATOR_PROMPT(task: string) {
  return dedent`
    Analyze this task and break it down into 2-3 distinct approaches:
    Task: ${task}
    Provide an Analysis:
    Explain your understanding of the task and which variations would be valuable.
    Focus on how each approach serves different aspects of the task.
    Along with the analysis, provide 2-3 approaches to tackle the task, each with a brief description:
    Formal style: Write technically and precisely, focusing on detailed specifications
    Conversational style: Write in a friendly and engaging way that connects with the reader
    Hybrid style: Tell a story that includes technical details, combining emotional elements with specifications
    Return only JSON output.
  `;
}

function WORKER_PROMPT(
  originalTask: string,
  taskType: string,
  taskDescription: string,
) {
  return dedent`
    Generate content based on:
    Task: ${originalTask}
    Style: ${taskType}
    Guidelines: ${taskDescription}
    Return only your response:
    [Your content here, maintaining the specified style and fully addressing requirements.]
  `;
}

const taskListSchema = z.object({
  analysis: z.string(),
  tasks: z.array(
    z.object({
      type: z.enum(["formal", "conversational", "hybrid"]),
      description: z.string(),
    }),
  ),
});

/*
  Use an orchestrator model to break down a task into sub-tasks,
  then use worker models to generate and return responses.
*/
async function orchestratorWorkflow(
  originalTask: string,
  orchestratorPrompt: (task: string) => string,
  workerPrompt: (
    originalTask: string,
    taskType: string,
    taskDescription: string,
  ) => string,
) {
  // Use orchestrator model to break the task up into sub-tasks
  const { analysis, tasks } = await jsonLLM(
    orchestratorPrompt(originalTask),
    taskListSchema,
  );

  console.log(dedent`
    ## Analysis:
    ${analysis}
    ## Tasks:
  `);
  console.log("```json", JSON.stringify(tasks, null, 2), "\n```\n");

  const workerResponses = await Promise.all(
    tasks.map(async (task) => {
      const response = await runLLM(
        workerPrompt(originalTask, task.type, task.description),
        "meta-llama/Llama-3.3-70B-Instruct-Turbo",
      );

      return { task, response };
    }),
  );

  return workerResponses;
}
Example Usage
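In Python, the workflow can be driven as follows. This is a minimal usage sketch that reuses the task string, prompts, and orchestrator_workflow defined above; only the print formatting is illustrative.
async def main():
    # Reuses the `task` string, ORCHESTRATOR_PROMPT, and WORKER_PROMPT defined above
    tasks, worker_responses = await orchestrator_workflow(
        task,
        orchestrator_prompt=ORCHESTRATOR_PROMPT,
        worker_prompt=WORKER_PROMPT,
    )

    for task_info, response in zip(tasks, worker_responses):
        print(f"\n=== WORKER RESULT ({task_info['type']}) ===\n")
        print(response)

asyncio.run(main())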
async function main() {
  const task = `Write a product description for a new eco-friendly water bottle.
The target_audience is environmentally conscious millennials and key product
features are: plastic-free, insulated, lifetime warranty
`;

  const workerResponses = await orchestratorWorkflow(
    task,
    ORCHESTRATOR_PROMPT,
    WORKER_PROMPT,
  );

  console.log(
    workerResponses
      .map((w) => `## WORKER RESULT (${w.task.type})\n${w.response}`)
      .join("\n\n"),
  );
}

main();
Use cases
- Breaking down a coding problem into subtasks, using an LLM to generate code for each subtask, and making a final LLM call to combine the results into a complete solution (see the sketch after this list).
- Searching for data across multiple sources, using an LLM to identify relevant sources, and synthesizing the findings into a cohesive answer.
- Creating a tutorial by splitting each section into subtasks like writing an introduction, outlining steps, and generating examples. Worker LLMs handle each part, and the orchestrator combines them into a polished final document.
- Dividing a data analysis task into subtasks like cleaning the data, identifying trends, and generating visualizations. Each step is handled by separate worker LLMs, and the orchestrator integrates their findings into a complete analytical report.
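As one example, the coding use case in the first bullet maps onto the same pattern once the orchestrator prompt and schema are swapped out. The sketch below follows that idea under stated assumptions: the prompts, schema, and model choice are illustrative (not from the cookbook), and it reuses the JSON_llm and run_llm_parallel helpers defined above.
import asyncio
from typing import List
from pydantic import BaseModel, Field

# Illustrative schema: each subtask is a named, independent piece of the coding problem
class CodeSubtask(BaseModel):
    name: str
    description: str

class CodePlan(BaseModel):
    analysis: str
    subtasks: List[CodeSubtask] = Field(default_factory=list)

CODE_ORCHESTRATOR_PROMPT = """Break this coding problem into 2-4 independent subtasks.
Problem: {task}
Return only JSON output."""

CODE_WORKER_PROMPT = """Write Python code for one subtask of a larger problem.
Problem: {original_task}
Subtask: {subtask_name} - {subtask_description}
Return only code."""

async def code_workflow(problem: str) -> str:
    # Orchestrator: plan the subtasks as structured JSON
    plan = JSON_llm(CODE_ORCHESTRATOR_PROMPT.format(task=problem), schema=CodePlan)

    # Workers: generate code for each subtask in parallel
    parts = await asyncio.gather(*[
        run_llm_parallel(
            user_prompt=CODE_WORKER_PROMPT.format(
                original_task=problem,
                subtask_name=sub["name"],
                subtask_description=sub["description"],
            ),
            model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        )
        for sub in plan["subtasks"]
    ])

    # Final call to combine the generated pieces into one solution
    return await run_llm_parallel(
        user_prompt="Combine these code fragments into a single coherent solution:\n\n" + "\n\n".join(parts),
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    )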
Subtask Workflow Cookbook
For a more detailed walk-through, refer to the notebook here.