deployment · 6 min read

Scaling Your OpenClaw Instance: When and How

A practical guide to recognizing when your OpenClaw deployment needs scaling and the strategies available for handling increased load.

SimpleOpenClaw Team

Most OpenClaw deployments start as a single container serving a handful of users. That works well until it doesn't. This guide helps you recognize when your instance needs more resources and walks through the practical approaches to scaling.

Understanding the Bottlenecks

Before scaling anything, you need to know where the bottleneck actually is. OpenClaw leans on three kinds of resources, and each scales differently.

CPU and Memory

The gateway process handles request routing, session management, and communication with AI providers. Its CPU usage spikes during request processing but is generally low during idle periods. Memory usage grows with the number of concurrent sessions, as each session maintains context and conversation history.

Network I/O

Every AI interaction involves sending prompts to an external API and streaming responses back. Network latency and bandwidth affect perceived performance directly, but this bottleneck is usually on the AI provider's side rather than yours.

Disk I/O

The workspace directory sees write activity when the AI agent creates or modifies files. The state directory sees writes during configuration changes and session updates. For most deployments, disk I/O is not the limiting factor.

Signs You Need to Scale

Watch for these indicators in your monitoring and user feedback.

Response Latency Increases

If responses that used to arrive in two seconds now take five, and your AI provider's status page shows no issues, your gateway may be CPU-constrained. Check container CPU utilization during peak usage. Sustained utilization above 80% suggests you need more compute.
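
For Docker-based deployments, docker stats gives a quick spot check of CPU and memory against the container's limits. The container name here assumes the run command shown later in this guide.

# One-shot snapshot: CPU %, memory usage vs. limit, and network I/O for the gateway container
docker stats --no-stream openclaw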

WebSocket Disconnections

The Control UI communicates with the gateway over WebSockets. Frequent disconnections often indicate memory pressure -- the operating system may be killing processes that exceed their allocation, or the garbage collector is running too aggressively under memory constraints.
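
If you self-host with Docker, you can check whether the kernel's out-of-memory killer has terminated the container. A value of true here makes memory pressure the likely culprit.

# Reports true if the container was killed for exceeding its memory limit
docker inspect --format '{{.State.OOMKilled}}' openclaw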

Gateway Restart Loops

If the gateway process crashes and restarts repeatedly under load, check your container's memory limit. The gateway startup itself consumes memory, and if the limit is too tight, the process may never reach a stable state under concurrent usage.
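
docker inspect shows both the restart count and the memory limit the container is actually running with. The limit is reported in bytes, and 0 means no limit is set.

# Restart count and configured memory limit (bytes; 0 = unlimited)
docker inspect --format 'restarts={{.RestartCount}} memory={{.HostConfig.Memory}}' openclaw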

Queue Buildup

When more requests arrive than the gateway can process, users experience queuing. This manifests as a noticeable delay before the AI "starts typing." A small queue is normal during bursts, but persistent queuing means you need more processing capacity.

Vertical Scaling: More Resources

The simplest scaling strategy is to give your existing container more CPU and memory. This approach requires no architectural changes and works well up to a point.

Railway

In the Railway dashboard, go to your service's Settings and adjust the resource allocation. For a team of five to ten concurrent users, allocate at least 2 vCPUs and 4 GB of RAM. Railway's resource controls make this a one-click change with zero downtime.

Docker Self-Hosted

Update your container's resource limits:

# Recreate the container with the new limits; keep your other flags unchanged.
docker stop openclaw && docker rm openclaw

docker run -d \
  --name openclaw \
  --cpus="2.0" \
  --memory="4g" \
  -p 8080:8080 \
  openclaw-railway-template

When Vertical Scaling Hits Its Ceiling

Vertical scaling has practical limits. Beyond 4 vCPUs and 8 GB of RAM, you typically see diminishing returns because a single gateway process can't efficiently parallelize beyond a certain point. If you're at this level and still seeing performance issues, it's time to consider other strategies.

Horizontal Scaling Considerations

Running multiple OpenClaw instances is more complex than vertical scaling because of shared state. Here's what you need to think about.

Session Affinity

Each gateway maintains in-memory session state. If a user's requests hit different instances, their conversation context gets lost. You need sticky sessions (session affinity) at the load balancer level to ensure a user's requests always reach the same instance.
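
As a sketch of what this looks like in practice, here is a minimal open-source nginx configuration using ip_hash, which pins each client IP to one backend. Cookie-based stickiness needs nginx Plus or a different proxy, and the hostnames openclaw-1 and openclaw-2 are placeholders for your own instances.

# Write a minimal load-balancer config with client-IP affinity and WebSocket support
cat > openclaw-lb.conf <<'EOF'
upstream openclaw_gateways {
    ip_hash;                                  # same client IP always reaches the same instance
    server openclaw-1:8080;
    server openclaw-2:8080;
}
server {
    listen 80;
    location / {
        proxy_pass http://openclaw_gateways;
        proxy_http_version 1.1;               # required for WebSocket upgrades
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
EOF

ip_hash is coarse (users behind the same NAT all land on one instance), but it preserves in-memory session state without any application changes.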

Shared Storage

All instances need access to the same configuration and workspace data. Use a network-attached storage solution (NFS, EFS, or equivalent) mounted at the data path. Be aware that concurrent writes from multiple instances to the same workspace files can cause conflicts.
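
One way to wire this up with Docker's built-in local volume driver is an NFS-backed named volume mounted at the data path. The NFS server address, export path, and the container-side path /data below are placeholders; substitute the values for your environment.

# Create a named volume backed by an NFS export (server address and export path are examples)
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=10.0.0.5,rw \
  --opt device=:/exports/openclaw \
  openclaw-data

# Mount the shared volume at the data path when starting each instance (typically one per host)
docker run -d --name openclaw-1 -v openclaw-data:/data -p 8080:8080 openclaw-railway-template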

Configuration Synchronization

Changes made through the setup wizard on one instance need to propagate to others. Since configuration lives on the shared filesystem, this happens naturally -- but you need to restart gateway processes on other instances to pick up changes.
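
Because the shared filesystem already holds the updated configuration, propagation is just a restart of the remaining gateways. The container names here are placeholders.

# Restart the other instances so they re-read the shared configuration
for name in openclaw-2 openclaw-3; do
  docker restart "$name"
done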

Optimizing Before Scaling

Before adding resources, make sure you're using your current allocation efficiently.

Review AI Provider Settings

If your AI provider supports streaming responses, ensure it's enabled. Streaming reduces perceived latency dramatically because users see partial responses as they're generated rather than waiting for the complete response.

Optimize Workspace Size

A large workspace directory slows down file operations. Archive old projects and keep the active workspace lean. The AI agent performs better when it doesn't have to search through thousands of irrelevant files.
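
A quick way to see what is consuming space and to move a dormant project out of the live workspace. The paths and project name are placeholders for your own layout.

# Largest items in the workspace, biggest first
du -sh /srv/openclaw/workspace/* | sort -rh | head -20

# Archive a dormant project, then remove it from the active workspace
tar czf /srv/openclaw/archive/old-project.tar.gz -C /srv/openclaw/workspace old-project \
  && rm -rf /srv/openclaw/workspace/old-project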

Monitor and Profile

Effective scaling decisions require data. At minimum, track these metrics over time:

  • Container CPU and memory utilization (percentage of limit)
  • Gateway response times (p50, p95, p99)
  • WebSocket connection count and disconnect rate
  • AI provider API latency (to separate your bottlenecks from theirs)

Most container platforms provide built-in monitoring. Railway's metrics dashboard shows CPU and memory usage per service. For self-hosted deployments, Prometheus with Grafana is a well-tested combination.
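
For the self-hosted route, a common way to expose per-container CPU and memory metrics for Prometheus to scrape is to run cAdvisor alongside the gateway. This is the standard cAdvisor invocation rather than anything OpenClaw-specific, and the host port is arbitrary.

# Expose per-container metrics on host port 8081 for Prometheus to scrape
docker run -d \
  --name cadvisor \
  -p 8081:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest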

Scaling Strategy by Team Size

Here are practical starting points based on team size. Adjust based on actual usage patterns.

1-3 Users

The default allocation (1 vCPU, 2 GB RAM) is usually sufficient. Monitor for occasional spikes but don't over-provision.

4-10 Users

Increase to 2 vCPUs and 4 GB RAM. Enable monitoring and set alerts for sustained CPU usage above 75%.

10-25 Users

Consider 4 vCPUs and 8 GB RAM. At this scale, you should have structured monitoring in place and a tested backup strategy.

25+ Users

You're approaching the limits of a single instance. Evaluate horizontal scaling with session affinity, or consider running multiple independent instances for different teams or projects.

Conclusion

Scaling OpenClaw is a progressive exercise. Start with the smallest allocation that meets your needs, monitor real usage patterns, and increase resources when the data tells you to. Vertical scaling handles the majority of use cases, and the overhead of horizontal scaling is rarely justified until you're well past ten concurrent users. The key is to have monitoring in place so you make scaling decisions based on evidence rather than guesswork.

scaling · performance · infrastructure
