Config Sync Pipeline

This page describes the complete lifecycle of a configuration change — from a user clicking a button in the UI to the edge device applying the change and confirming it back to the orchestrator.

The Seven Steps

Every configuration change follows the same pipeline, regardless of whether it affects a dedicated edge, MTGE, or connector:

Step 1: Edit

An administrator makes a change in the Web UI — for example, adding a WireGuard peer, modifying an ACL rule, or updating an interface IP address. The API saves the new desired state to MySQL.

Step 2: Dirty Flag

The API marks the edge's configuration as "dirty" by updating its tracking record. This dirty flag is what causes the yellow pending indicator to appear in the UI. The flag includes a reason (e.g., "wireguard peer added") for debugging purposes.

Step 3: Batch Build

When a sync is triggered (manually or automatically), the API aggregates configuration from across 20+ database tables into a single batch message. This includes interfaces, WireGuard tunnels, routing, NAT rules, ACLs, service chain settings, and monitoring configuration.

The batch message contains:

  • Sequence number — An incrementing counter that identifies this particular configuration version
  • Config hash — A SHA-256 hash of the entire configuration, used for end-to-end verification
  • Commands array — An ordered list of handler-specific configurations

Each command in the array specifies a topic (which handler should process it) and a body (the handler's configuration):

commands: [
  { index: 0, topic: "interface", body: { ... } },
  { index: 1, topic: "wireguard", body: { ... } },
  { index: 2, topic: "static", body: { ... } },
  { index: 3, topic: "nat44-...", body: { ... } },
  { index: 4, topic: "acl", body: { ... } },
  ...
]

Commands are ordered by dependency — interfaces before tunnels, tunnels before routing, routing before NAT, and so on.
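The batch assembly described above can be sketched as follows. The field names (`sequence`, `configHash`, `commands`) mirror the description in this section, but the exact message schema and JSON canonicalization are assumptions.

```python
# Minimal sketch of batch assembly. Canonical JSON serialization ensures the
# orchestrator and the agent hash identical bytes for the same configuration.
import hashlib
import json

def build_batch(sequence: int, commands: list[dict]) -> dict:
    payload = json.dumps(commands, sort_keys=True, separators=(",", ":"))
    return {
        "sequence": sequence,
        "configHash": hashlib.sha256(payload.encode()).hexdigest(),
        "commands": commands,
    }

# Commands are already dependency-ordered: interfaces, then tunnels, then routing.
commands = [
    {"index": 0, "topic": "interface", "body": {"name": "eth0"}},
    {"index": 1, "topic": "wireguard", "body": {"peers": []}},
    {"index": 2, "topic": "static", "body": {"routes": []}},
]
batch = build_batch(42, commands)
```

Because the serialization is canonical (sorted keys, fixed separators), rebuilding the same configuration always yields the same hash, which is what makes the Step 7 comparison meaningful.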

Step 4: MQTT Delivery

The batch message is published to the device's MQTT topic. The topic varies by device type:

Device Type          Topic
Dedicated Edge       VSR/{serial}/batch
MTGE (per tenant)    VSR/{serial}/batch/{tenantId}
Connector            VSR/{serial}/batch

The EMQX broker delivers the message to the connected agent. If the agent is offline, the message is not queued — the agent will request its configuration when it next connects.

Step 5: Apply

The edge agent receives the batch and applies it. How this works depends on the device type:

Dedicated Edge and MTGE — The agent uses the V3 Sync Coordinator, which applies configuration in eight dependency-ordered phases.

Each phase waits for its dependencies to complete before starting. If a newer configuration arrives while a sync is in progress, the coordinator cancels the in-flight sync and applies the newer one instead.

Within each phase, handlers write desired state to etcd (for Ligato-managed resources) or call the VPP Binary API directly (for WireGuard, NAT, and other advanced features).
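The phase-ordered application with preemption can be sketched as below. The real coordinator's phase list and handler wiring are not shown here; `phase_order`, `is_superseded`, and `apply_fn` are assumed callbacks used purely for illustration.

```python
# Illustrative sketch of dependency-ordered application with preemption.
def apply_batch(commands, phase_order, is_superseded, apply_fn):
    """Apply commands in dependency order; abort if a newer batch supersedes us."""
    by_topic = {c["topic"]: c for c in commands}
    for topic in phase_order:
        if is_superseded():      # a newer configuration arrived: cancel this sync
            return False
        if topic in by_topic:
            apply_fn(topic, by_topic[topic]["body"])
    return True
```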

Connector — The connector agent applies configuration using Linux networking tools directly:

Handler          Tool
WireGuard        wg CLI (WireGuard tools)
Static routes    ip route
NAT              iptables masquerade rules
ACL              iptables filter rules

Step 6: Confirm

After applying the configuration, the agent sends a confirmation message back to the orchestrator over MQTT. The confirmation includes:

  • Sequence number — Which configuration version was applied
  • Config hash — The hash computed by the agent over the configuration it actually applied
  • Status — Success or failure
  • Applied commands — Count of successfully applied commands
  • Failed commands — Count and details of any commands that failed
  • VPP mode — Whether the edge is running in DPDK or AF_PACKET mode
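A confirmation payload with the fields listed above might look like the sketch below. The field names and `build_confirmation` helper are assumptions; the important point is that the hash is computed over the configuration the agent actually applied, not the configuration it received.

```python
# Sketch of the agent's confirmation payload (illustrative field names).
import hashlib
import json

def build_confirmation(sequence, applied_count, failed, applied_config, vpp_mode):
    # Hash the configuration as actually applied, for end-to-end verification.
    payload = json.dumps(applied_config, sort_keys=True, separators=(",", ":"))
    return {
        "sequence": sequence,
        "configHash": hashlib.sha256(payload.encode()).hexdigest(),
        "status": "failure" if failed else "success",
        "appliedCommands": applied_count,
        "failedCommands": failed,   # details of failures, empty on success
        "vppMode": vpp_mode,        # "dpdk" or "af_packet"
    }
```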

Step 7: Hash Verification

The orchestrator compares the hash in the confirmation against the hash it calculated when building the batch. If they match, the configuration is marked as Synced and the dirty flag is cleared. If they do not match, the configuration is marked as Failed.

This end-to-end hash verification confirms that the configuration the agent applied is byte-for-byte identical to the configuration the orchestrator built from the state defined in the UI.
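The orchestrator's verification step amounts to a simple comparison; a sketch, with the resulting state names taken from the dirty-flag lifecycle:

```python
# Sketch of Step 7: compare the agent's reported hash to the expected one.
def verify_confirmation(expected_hash: str, confirmation: dict) -> str:
    if confirmation["status"] != "success":
        return "Failed"          # the agent reported an apply failure
    if confirmation["configHash"] != expected_hash:
        return "Failed"          # device is not running what was built
    return "Synced"              # hashes match; the dirty flag can be cleared
```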

Dirty Flag Lifecycle

The dirty flag tracks the state of each device's configuration relative to the orchestrator's desired state:

State       Meaning
Synced      The device is running the latest configuration. No pending changes.
Pending     Changes exist in the database that have not been pushed to the device.
Applying    A batch has been sent and the orchestrator is waiting for confirmation.
Failed      The device could not apply the configuration, or the hash did not match.
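These states form a small state machine. The sketch below captures the transitions implied by this page; the event names are illustrative and the table is not exhaustive.

```python
# Illustrative transition table for the dirty-flag states (not exhaustive).
TRANSITIONS = {
    ("Synced",   "edit"):          "Pending",   # Step 1-2: change saved, flag set
    ("Pending",  "push"):          "Applying",  # Step 3-4: batch built and sent
    ("Applying", "hash_match"):    "Synced",    # Step 7: verification passed
    ("Applying", "hash_mismatch"): "Failed",    # Step 7: verification failed
    ("Applying", "apply_error"):   "Failed",    # agent reported failure
    ("Failed",   "push"):          "Applying",  # manual "Sync Now" retry
}

def next_state(state: str, event: str) -> str:
    # Unknown events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```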

Stale Configuration Recovery

When an edge reports a sequence number that is older than the orchestrator expects, the edge is running stale configuration. This can happen after a device reboot or network outage, or if a previous sync was lost.

The recovery process:

  1. The orchestrator detects the old sequence number in the confirmation message
  2. It waits 15 seconds to allow VPP to stabilize after a restart
  3. It re-pushes the latest batch configuration automatically

No manual intervention is required — stale configuration is self-healing.
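The recovery logic above can be sketched as follows. The 15-second stabilization delay comes from the steps described; the function name and callback are assumptions.

```python
# Sketch of stale-sequence recovery on the orchestrator side.
import time

def handle_confirmation(reported_seq, expected_seq, push_latest, delay=15, sleep=time.sleep):
    if reported_seq < expected_seq:
        sleep(delay)        # let VPP stabilize after a restart
        push_latest()       # re-push the latest batch automatically
        return "repushed"
    return "ok"
```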

Startup Config Request

When an edge agent starts up (after a reboot or container restart), it does not wait passively for configuration. Instead, it actively requests its configuration:

  1. The agent publishes a config request message
  2. The orchestrator receives it and pushes the latest batch configuration
  3. If the first request is not answered (e.g., the orchestrator is temporarily unreachable), the agent retries with exponential backoff

This ensures edges converge to the correct configuration as quickly as possible after any disruption.
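The startup request loop can be sketched as below. The base delay, doubling factor, and attempt cap are illustrative assumptions; only the request-then-retry-with-backoff shape comes from this page.

```python
# Sketch of the startup config request with exponential backoff.
def request_config(publish_request, got_config, max_attempts=6, base=2.0, sleep=None):
    delay = base
    for _attempt in range(max_attempts):
        publish_request()           # publish the config request message
        if got_config():
            return True             # orchestrator answered with a batch
        (sleep or (lambda s: None))(delay)
        delay *= 2                  # exponential backoff between retries
    return False
```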

Device-Specific Differences

MTGE (Multi-Tenant Gateway)

MTGEs manage configuration per tenant:

  • Each tenant's config is published to a separate MQTT topic: VSR/{serial}/batch/{tenantId}
  • Dirty flags are tracked per tenant, not per device — changing one tenant's config does not trigger a sync for other tenants
  • The agent applies configuration within the correct VRF context for each tenant

Connector

Connectors have a simplified pipeline:

  • Configuration state is embedded directly in the connectors database table (no separate state tracking table)
  • Only four command topics: wireguard, static, nat_config, acl_config
  • The agent uses Linux kernel networking instead of VPP — no etcd or Ligato layer
  • Configuration is persisted to a local JSON file on the connector
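Persisting the applied configuration to a local JSON file might look like the sketch below. The path handling and atomic-write pattern are illustrative assumptions; the page only states that a local JSON file is used.

```python
# Sketch of persisting connector config to a local JSON file.
import json
import os
import tempfile

def persist_config(config: dict, path: str) -> None:
    # Write atomically: temp file in the same directory, then rename,
    # so a crash mid-write never leaves a truncated config file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, sort_keys=True)
    os.replace(tmp, path)
```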

Troubleshooting

Sync shows "Failed"

  1. Check device connectivity — Is the device online and reporting heartbeats? An offline device cannot receive configuration.
  2. Review edge logs — The agent logs will show which specific handler or command failed and why.
  3. Hash mismatch — If the hash does not match, it usually means the agent could not apply one or more commands. Look for partial application in the logs.
  4. Retry — After investigating, click Sync Now to reattempt.

Sync stuck on "Applying"

The orchestrator is waiting for a confirmation that never arrived. Possible causes:

  • The MQTT connection was interrupted between the edge and broker
  • The agent crashed during application
  • The agent applied the config but the confirmation message was lost

In most cases, the next heartbeat or inform from the edge will reveal whether it is running the correct configuration. A manual sync retry typically resolves the issue.

Edge running old configuration after reboot

This is normally handled automatically by the startup config request mechanism. If the edge still shows stale config:

  1. Verify the edge is connected to MQTT (check heartbeats in the monitoring dashboard)
  2. Trigger a manual sync from the edge detail page
  3. If the edge is not connecting to MQTT, check its certificates and network connectivity

Phase timeout during application

On dedicated edges, if a handler in the sync coordinator takes too long, the phase times out. Common causes:

  • VPP is still starting up and the Binary API is not yet available
  • etcd is temporarily unreachable
  • A handler is blocked waiting for an external resource

Check the agent logs for the specific phase and handler that timed out.