Inside the Hive: How 37 Agents Orchestrate a Marketing Campaign


Part 2 of a 4-part series on building Simbel AI.


In part 1, I described what Simbel is: 37 specialized AI agents organized across 7 functions, automating the full marketing pipeline. If you haven't read that yet, start there — this article assumes you know the "what."

This one is the "how."

I want to show you what actually happens when a user clicks "Launch Campaign." Not the marketing version where everything looks smooth and inevitable. The real version: the queues, the state transitions, the human gates, the places where the system has to be careful because things can go wrong.

This is the architecture I wish someone had shown me before I had to figure it out myself.

How an Agent Is Born

Before I can explain what a campaign run looks like, I need to explain what an agent is — at least in the way Simbel uses the term, which is more structured than the casual usage you see in most AI content.

Every agent in the system is defined by five things:

Role — the job title. "Competitor Research Analyst." "Saudi Dialect Copywriter." "Publishing Scheduler." The role isn't decorative — it's the primary frame the agent uses to interpret its task and produce output. Agents with clearly defined roles produce more consistent results than agents with vague ones.

Goal — a specific, measurable objective. Not "help with marketing" but "analyze the top 5 competitors in the specified industry, extract their content strategy and posting cadence, and produce a structured report with specific findings." The more precise the goal, the less the agent has to infer about what success looks like.

Backstory — this is the part most people skip, and it makes a bigger difference than you'd expect. The backstory is the persona: a few paragraphs describing who this agent is, what expertise they have, what they care about, how they think. It's not just flavor. It meaningfully changes the reasoning pattern. An agent with a backstory as a senior strategist who has worked with MENA market brands for 15 years produces materially different output than the same model with no persona context. Why this works is a question for researchers — that it works is something I've verified empirically.

Tools — what the agent can do beyond reasoning. Some agents can search the web. Some can query analytics APIs. Some can post to social platforms. Some can read and write to the campaign knowledge base. Tools are what give agents the ability to act on the world, not just think about it.

Intelligence tier — not every task requires the same level of cognitive horsepower. Writing a competitive strategy for a MENA market healthcare brand requires genuine depth of reasoning and nuance. Generating a list of relevant hashtags does not. We tier our agents based on task complexity, routing each to the appropriate model class. The result is a system that uses premium reasoning where it matters and lighter-weight processing where it doesn't. This is not just a cost optimization — it's also a speed optimization. The premium models are slower. Using them everywhere creates unnecessary latency.

There's also a sixth consideration that isn't part of the agent definition itself but shapes every run: guardrails. Every agent has an iteration limit — a maximum number of attempts before it stops trying and escalates. It has a timeout. It has defined failure behavior. Production systems that don't build these in learn to add them after their first memorable outage.
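The five fields plus guardrails can be captured in a single definition. The sketch below is a minimal, hypothetical shape — Simbel's actual agent schema isn't public, and every field name here is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """One agent definition: the five core fields plus guardrails.

    Hypothetical schema -- field names are illustrative, not Simbel's.
    """
    role: str                      # job title the agent reasons from
    goal: str                      # specific, measurable objective
    backstory: str                 # persona that shapes the reasoning pattern
    tools: list[str] = field(default_factory=list)  # e.g. "web_search"
    tier: int = 2                  # 1 = premium reasoning, 3 = lightweight
    max_iterations: int = 3        # attempts before stopping and escalating
    timeout_seconds: int = 300     # hard wall-clock limit per run

competitor_analyst = AgentSpec(
    role="Competitor Research Analyst",
    goal=("Analyze the top 5 competitors in the specified industry, "
          "extract their content strategy and posting cadence, and "
          "produce a structured report with specific findings."),
    backstory="A senior strategist with 15 years in MENA markets...",
    tools=["web_search"],
    tier=1,
)
```

Note that the guardrail fields have defaults: an agent definition without explicit limits still gets limits.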

What Happens When You Click Launch

Here's the full sequence, step by step.

Step 0: Capacity Check and Queueing

Before a single agent wakes up, the system checks whether the infrastructure can handle another active campaign. AI model calls are expensive and rate-limited. Social API connections have their own limits. Parallel campaigns compete for the same resources.

If capacity is available, the campaign enters the pipeline immediately. If not, it joins a queue with a guaranteed execution window. The user sees the queue status. This is a basic engineering consideration that becomes critical at scale, and building it in from the start is much easier than retrofitting it later.
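The admission logic is essentially a bounded pool with a FIFO wait list. A minimal sketch, assuming an arbitrary concurrency cap (the real limit would be derived from model rate limits and API quotas):

```python
from collections import deque

MAX_ACTIVE = 5                 # assumed cap on concurrent campaigns

active: set[str] = set()       # campaigns currently in the pipeline
waiting: deque[str] = deque()  # FIFO queue with guaranteed ordering

def launch(campaign_id: str) -> str:
    """Admit the campaign if capacity allows; otherwise queue it."""
    if len(active) < MAX_ACTIVE:
        active.add(campaign_id)
        return "RUNNING"
    waiting.append(campaign_id)
    return f"QUEUED (position {len(waiting)})"   # shown to the user

def on_campaign_done(campaign_id: str) -> None:
    """Free a slot and promote the longest-waiting campaign."""
    active.discard(campaign_id)
    if waiting:
        active.add(waiting.popleft())
```

The queue position is what gives the user a guaranteed execution window rather than an opaque "please wait."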

Step 1: F1 — Research and Strategy

Seven agents, running sequentially. Each one receives the outputs of the agents that ran before it.

The first agent scopes the campaign — understanding the industry, the brand, the objectives, the target geography. The second analyzes the competitive landscape: who are the main competitors, what are they posting, how often, what seems to be working. The third builds audience profiles: demographics, psychographics, content preferences, platform behaviors. The fourth identifies trends relevant to the campaign period. The fifth produces platform-specific best practices — what works on Instagram in Saudi Arabia is not the same as what works on LinkedIn for a B2B audience.

The sixth agent synthesizes everything into a draft marketing strategy. The seventh reviews and stress-tests it, checking for consistency, identifying gaps, flagging assumptions that aren't well-supported by the research.

The output of F1 is a structured strategy document — campaign positioning, target audience definition, content themes, platform priorities, timing recommendations, and a content brief. This document persists throughout the entire pipeline. Every downstream agent has access to it.
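The sequential hand-off can be sketched as a loop where each agent's output is appended to a shared context that every later agent can read. The agent functions here are toy stand-ins, not the real research agents:

```python
def run_function(agents, brief: str) -> dict:
    """Run agents in order; each sees everything produced before it."""
    context = {"brief": brief}
    for name, agent in agents:
        context[name] = agent(context)   # output feeds all downstream agents
    return context

# Toy stand-ins for the real F1 agents (illustrative only)
def scope(ctx):        return f"scope of {ctx['brief']}"
def competitors(ctx):  return f"competitors given {ctx['scope']}"
def synthesize(ctx):   return f"strategy from {ctx['competitors']}"

strategy_doc = run_function(
    [("scope", scope), ("competitors", competitors), ("strategy", synthesize)],
    "Saudi healthcare launch",
)
```

The returned `context` dict plays the role of the structured strategy document: a persistent artifact, not a chat transcript.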

Human Gate 1: Strategy Review

Here's something I think is often underappreciated in discussions of AI automation: the goal is not to remove humans from the loop. The goal is to remove humans from the tedious, low-value parts of the loop so they can focus on the high-value decisions.

Strategy is a high-value decision. Before any content gets created, the user reviews the strategy document. They can approve it or send it back for revision.

This gate matters for several reasons. First, the user knows things the AI doesn't — internal context, brand sensitivities, recent business developments, upcoming product launches. Second, it creates accountability: the user signs off on the strategic direction before the system invests time in content creation. Third, when something goes wrong downstream, having an approved strategy as the reference point is useful.

When a user rejects a strategy, the system is designed to be smart about what it re-runs. It doesn't restart all seven research agents — that would be wasteful and slow. It identifies which part of the strategy failed and re-runs only the agents responsible for that section. If the problem is with the strategy synthesis, only the synthesizer and reviewer re-run, building on the research that was already completed. This sounds obvious, but getting the dependency graph right so you can do this efficiently took real engineering effort.
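At its core, the selective re-run is a lookup from the rejected section to the agents downstream of it in the dependency graph. A minimal sketch, with an invented mapping (the section and agent names are placeholders, not Simbel's registry):

```python
# Which agents must re-run when a given strategy section is rejected.
# Mapping is illustrative -- the real dependency graph is richer.
DEPENDENTS = {
    "competitor_analysis": ["competitor_research", "synthesizer", "reviewer"],
    "audience_profile":    ["audience_profiler", "synthesizer", "reviewer"],
    "synthesis":           ["synthesizer", "reviewer"],
}

def rerun_plan(rejected_section: str) -> list[str]:
    """Return only the agents affected by the rejected section.

    Unknown sections fall back to re-running synthesis and review,
    since those always depend on everything upstream.
    """
    return DEPENDENTS.get(rejected_section, ["synthesizer", "reviewer"])
```

The key property: rejecting the synthesis never touches the five research agents, whose outputs are still valid.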

Step 2: F2 — Content Creation

Content creation agents are instantiated dynamically based on the platforms selected for this campaign. If the campaign targets Instagram, LinkedIn, and Email, you get content agents for those three platforms — not for TikTok or Facebook or X. This keeps the pipeline lean and avoids generating content that will never be used.

Each platform agent receives: the approved strategy document, the campaign brief, the brand context (tone, voice, positioning), the audience profile, and any relevant past campaign learnings from the knowledge base. The content it produces isn't generic — it's shaped by everything the research phase produced.

This is the layer where dialect selection matters. If the campaign is set to Saudi dialect, the Saudi content agents produce content in Saudi dialect — not Modern Standard Arabic, not a generic Arabic that sounds like it was translated from English. The same applies for Egyptian, Emirati, Gulf, and Levantine. I'll go into this in much more depth in part 3.

Content creation can run in parallel across platforms. There's no dependency between the Instagram agent and the LinkedIn agent — they both draw from the same strategy document independently. Running them in parallel rather than sequentially cuts content creation time significantly.
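Because the platform agents share inputs but not outputs, the fan-out is embarrassingly parallel. A sketch using a thread pool, with a placeholder content function standing in for the real model call:

```python
from concurrent.futures import ThreadPoolExecutor

def create_content(platform: str, strategy: dict) -> dict:
    """Placeholder for a platform content agent (would call a model)."""
    return {"platform": platform, "posts": [f"draft for {platform}"]}

def run_f2(platforms: list[str], strategy: dict) -> dict:
    """Instantiate one content agent per selected platform, in parallel."""
    with ThreadPoolExecutor(max_workers=len(platforms)) as pool:
        futures = {p: pool.submit(create_content, p, strategy)
                   for p in platforms}
        return {p: f.result() for p, f in futures.items()}

drafts = run_f2(["instagram", "linkedin", "email"], {"positioning": "..."})
```

Only the selected platforms get an agent, so a three-platform campaign never pays for TikTok content it won't use.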

Human Gate 2: Content Review

The second human gate. Users review the generated content for each platform before anything goes live. They can edit individual posts, approve selectively, or send specific content pieces back for revision without blocking the posts they've already approved.

This gate is where the brand voice lives. AI can get close — very close — but the user knows their brand better than any model does. The review step isn't about fixing broken AI output; it's about the human bringing the last layer of brand judgment that no system can fully replicate.

Step 3: F3 — Publishing

Once content is approved, the publishing agents take over. These agents determine the optimal posting time for each piece of content based on platform-specific audience data and the recommendations from the strategy phase. They manage the actual API connections to each platform, handling authentication, rate limiting, and retries.

Delivery confirmation isn't assumed — it's checked. A post that was "sent" is only marked as published when the platform API returns a success confirmation. Failures are logged and flagged. The user gets notified if something doesn't go out as scheduled.
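The publish-then-verify loop looks roughly like this. The `send` callable and its response shape are assumptions standing in for a real platform API client:

```python
import time

def publish_with_retry(post: dict, send, max_attempts: int = 3,
                       backoff: float = 2.0) -> str:
    """A post is only PUBLISHED once the platform API confirms success."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = send(post)          # platform API call (assumed shape)
            if response.get("ok"):
                return "PUBLISHED"         # explicit success confirmation
        except ConnectionError:
            pass                           # transient failure: retry
        if attempt < max_attempts:
            time.sleep(backoff * attempt)  # back off before the next try
    return "FAILED"                        # logged, flagged, user notified
```

Note that a non-exception response without `ok` still counts as a failure — "the request didn't error" is not the same as "the post is live."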

Step 4: F4 — Analytics and Optimization

Analytics agents begin running after posts are live and enough time has passed for meaningful performance data to accumulate. They track engagement, reach, and conversion metrics across all platforms. They identify which content is overperforming and which is underperforming. They run A/B test analysis when variants exist. They calculate ROI where conversion tracking is available.

The most important output of this phase isn't the report — it's the learnings that get written to the campaign knowledge base. What content themes resonated with this audience? What posting times drove the highest engagement? What tone worked better than expected? These learnings don't disappear when the campaign ends. They become part of the context that informs the next campaign.

This is the compounding return that makes the system more valuable over time. The first campaign is good. The tenth campaign — informed by nine campaigns worth of accumulated learnings about what works for this brand and this audience — is better.

The Campaign State Machine

The full lifecycle I just described maps to an explicit state machine:

QUEUED → F1_RUNNING → F1_REVIEW → F2_RUNNING → F2_REVIEW → F3_PENDING → F3_PUBLISHING → ANALYTICS_PENDING → COMPLETED

Each state has a defined set of allowed transitions and a defined set of failure states. A campaign can't move from F1_REVIEW to F2_RUNNING without explicit user approval. A campaign stuck in F1_RUNNING beyond its timeout threshold moves to F1_FAILED and triggers a recovery flow.

The state machine is the backbone of reliability. Without it, a distributed system with multiple AI agents, multiple external API calls, and multiple human decision points becomes impossible to reason about. With it, you always know where a campaign is, how it got there, and what needs to happen next.
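Making the transitions explicit is the whole trick: anything not in the table is refused. A simplified sketch of the table above (failure states abbreviated; the real machine has more of them):

```python
# Allowed transitions for the campaign lifecycle (simplified sketch).
TRANSITIONS = {
    "QUEUED":            {"F1_RUNNING"},
    "F1_RUNNING":        {"F1_REVIEW", "F1_FAILED"},
    "F1_REVIEW":         {"F2_RUNNING", "F1_RUNNING"},   # approve / revise
    "F2_RUNNING":        {"F2_REVIEW", "F2_FAILED"},
    "F2_REVIEW":         {"F3_PENDING", "F2_RUNNING"},   # approve / revise
    "F3_PENDING":        {"F3_PUBLISHING"},
    "F3_PUBLISHING":     {"ANALYTICS_PENDING", "F3_FAILED"},
    "ANALYTICS_PENDING": {"COMPLETED"},
}

def transition(state: str, new_state: str) -> str:
    """Refuse any move the state machine does not explicitly allow."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

The user-approval gates live in the code that *calls* `transition("F1_REVIEW", "F2_RUNNING")` — the table guarantees there is no other path into F2.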

The Context Problem in Detail

Context management is arguably the hardest architectural challenge in multi-agent systems. Each agent needs to know what happened before it, but you can't just dump the entire history into every agent's context window — that's slow, expensive, and quickly hits token limits.

We solve this at three levels:

Within a function: Each agent receives a structured summary of all previous agents' outputs in that function — not the raw outputs, but structured summaries. The Trend Analysis agent receives a summary of what the Competitor Research agent found, not the full competitor analysis. This keeps context windows manageable while preserving the key information.

Between functions: The strategy document produced by F1 is a persistent artifact that all downstream functions can reference. When a content creation agent needs to understand the campaign positioning, it doesn't reach back to re-run research — it reads the strategy document that F1 already produced. This document is structured, not freeform prose, specifically so it's easy for agents to extract specific sections.

Across campaigns: The knowledge base stores learnings in a vector format that enables semantic retrieval. When a new campaign runs, the relevant historical learnings are retrieved based on similarity to the current campaign context — same industry, same audience type, same platform mix. Agents don't receive everything ever stored; they receive the learnings that are actually relevant to their current task.
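The cross-campaign retrieval reduces to nearest-neighbor search over learning embeddings. A minimal cosine-similarity sketch — a production system would use a vector database, and the embedding step is omitted here:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec: list[float], knowledge_base: list[dict],
             top_k: int = 3) -> list[str]:
    """Return the top-k stored learnings most similar to this campaign."""
    ranked = sorted(knowledge_base,
                    key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["learning"] for item in ranked[:top_k]]
```

The `query_vec` would be an embedding of the current campaign context — industry, audience type, platform mix — so only genuinely similar past learnings surface.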

Getting this architecture right required thinking carefully about the information hierarchy. What does each agent actually need? What can be summarized without losing important signal? What must be preserved in full? These are questions worth spending time on before you write the first line of code.

The Three-Tier Intelligence Strategy

This deserves its own section because it's one of the decisions that most affected the system's practical performance.

Not every agent needs a premium language model. The tasks in this pipeline vary enormously in cognitive complexity:

Tier 1 — Complex reasoning: Competitive strategy synthesis, market analysis, campaign positioning. These tasks require genuine depth of reasoning, handling of ambiguity, and nuanced judgment. We use the most capable models available for these agents. The cost and latency are justified by the quality of output.

Tier 2 — Structured generation: Platform-specific content creation, email copywriting, personalization. These tasks require creativity and brand awareness but follow more defined patterns. Mid-tier models handle these well and run faster.

Tier 3 — Classification and extraction: Hashtag selection, metadata generation, scheduling calculations, structured data extraction. These are largely pattern-matching tasks where model capability beyond a certain threshold adds little value. Lightweight models handle these quickly and cheaply.

The result: the pipeline is faster and the cost per campaign is lower, without sacrificing quality on the tasks where quality matters most. A naive "use the best model for everything" approach is neither the fastest nor the highest-quality solution — it's the highest-cost one.
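The routing itself is deliberately boring: a static map from tier to model class, resolved once per agent. Model names below are placeholders, not the actual models in production:

```python
# Illustrative routing table -- model names are placeholders.
TIER_MODELS = {
    1: "premium-reasoning-model",    # strategy synthesis, market analysis
    2: "mid-tier-model",             # content creation, copywriting
    3: "lightweight-model",          # hashtags, metadata, extraction
}

def model_for(tier: int) -> str:
    """Route an agent to the cheapest model class that fits its task."""
    if tier not in TIER_MODELS:
        raise ValueError(f"unknown tier: {tier}")
    return TIER_MODELS[tier]
```

Keeping the table static means a model swap (say, upgrading tier 2) is a one-line change that retiers dozens of agents at once.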

What's Next

Part 3 goes into the Arabic dialect system — the part of this platform that I find most technically interesting and most personally meaningful.

Arabic isn't one language when it comes to content. The written formal register and the spoken colloquial registers are sufficiently different that a brand using the wrong one sounds alien to its audience. Building a system that generates native-sounding Saudi content, native-sounding Egyptian content, and native-sounding Levantine content — not translations, not approximations — required solving problems that I hadn't seen addressed well anywhere else.

Part 4 will cover the RAG feedback loop: how past campaign performance shapes future campaigns, what the knowledge base actually stores, and how the retrieval system decides what's relevant.


Bassem Zohdy is the founder of Simbel AI, an AI-powered marketing automation platform built for Arabic-speaking markets.
