Scaling Your OpenClaw Instance: When and How
A practical guide to recognizing when your OpenClaw deployment needs scaling and the strategies available for handling increased load.
Most OpenClaw deployments start as a single container serving a handful of users. That works well until it doesn't. This guide helps you recognize when your instance needs more resources and walks through the practical approaches to scaling.
Understanding the Bottlenecks
Before scaling anything, you need to know where the pressure is coming from. OpenClaw has three primary resource consumers, and each scales differently.
CPU and Memory
The gateway process handles request routing, session management, and communication with AI providers. Its CPU usage spikes during request processing but is generally low during idle periods. Memory usage grows with the number of concurrent sessions, as each session maintains context and conversation history.
Network I/O
Every AI interaction involves sending prompts to an external API and streaming responses back. Network latency and bandwidth affect perceived performance directly, but this bottleneck is usually on the AI provider's side rather than yours.
Disk I/O
The workspace directory sees write activity when the AI agent creates or modifies files. The state directory sees writes during configuration changes and session updates. For most deployments, disk I/O is not the limiting factor.
Signs You Need to Scale
Watch for these indicators in your monitoring and user feedback.
Response Latency Increases
If responses that used to arrive in two seconds now take five, and your AI provider's status page shows no issues, your gateway may be CPU-constrained. Check container CPU utilization during peak usage. Sustained utilization above 80% suggests you need more compute.
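A quick way to spot-check this on a Docker host, assuming the container is named openclaw as in the Docker example later in this guide:

# one-shot snapshot of CPU and memory relative to the container's limits
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}" openclaw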
WebSocket Disconnections
The Control UI communicates with the gateway over WebSockets. Frequent disconnections often indicate memory pressure -- the operating system may be killing processes that exceed their allocation, or the garbage collector may be running too aggressively under memory constraints.
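If you suspect memory pressure on a self-hosted Docker host, two hedged checks (the container name and time window are assumptions):

# is the gateway close to its memory limit while users are active?
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" openclaw

# has the kernel's OOM killer fired recently?
journalctl -k --since "2 hours ago" | grep -iE "out of memory|oom-kill"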
Gateway Restart Loops
If the gateway process crashes and restarts repeatedly under load, check your container's memory limit. The gateway startup itself consumes memory, and if the limit is too tight, the process may never reach a stable state under concurrent usage.
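On Docker, a few inspect fields make this pattern easy to confirm; a sketch assuming the container name openclaw:

# restart count, configured memory limit (0 means unlimited), and whether the last exit was an OOM kill
docker inspect -f 'Restarts={{.RestartCount}}  MemLimit={{.HostConfig.Memory}}  OOMKilled={{.State.OOMKilled}}  ExitCode={{.State.ExitCode}}' openclaw

An exit code of 137 means the process was killed, which under a tight memory limit usually points to the OOM killer.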
Queue Buildup
When more requests arrive than the gateway can process, users experience queuing. This manifests as a noticeable delay before the AI "starts typing." A small queue is normal during bursts, but persistent queuing means you need more processing capacity.
Vertical Scaling: More Resources
The simplest scaling strategy is to give your existing container more CPU and memory. This approach requires no architectural changes and works well up to a point.
Railway
In the Railway dashboard, go to your service's Settings and adjust the resource allocation. For a team of five to ten concurrent users, allocate at least 2 vCPUs and 4 GB of RAM. Railway's resource controls make this a one-click change with zero downtime.
Docker Self-Hosted
Update your container's resource limits. Limits passed to docker run only take effect when the container is created, so stop and remove the existing container, then recreate it with the new values and the rest of your flags unchanged:
docker stop openclaw && docker rm openclaw
docker run -d \
  --name openclaw \
  --cpus="2.0" \
  --memory="4g" \
  -p 8080:8080 \
  openclaw-railway-template
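If you would rather not recreate the container, Docker can also apply new limits to a running container with docker update; a minimal sketch, assuming the container name from the example above:

# adjust limits in place; --memory-swap is raised together with --memory so the new limit isn't rejected
docker update --cpus="2.0" --memory="4g" --memory-swap="4g" openclaw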
When Vertical Scaling Hits Its Ceiling
Vertical scaling has practical limits. Beyond 4 vCPUs and 8 GB of RAM, you typically see diminishing returns because a single gateway process can only parallelize so far. If you're at this level and still seeing performance issues, it's time to consider other strategies.
Horizontal Scaling Considerations
Running multiple OpenClaw instances is more complex than vertical scaling because of shared state. Here's what you need to think about.
Session Affinity
Each gateway maintains in-memory session state. If a user's requests hit different instances, their conversation context gets lost. You need sticky sessions (session affinity) at the load balancer level to ensure a user's requests always reach the same instance.
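What this looks like in practice depends on your load balancer. As a hedged sketch, here is nginx's ip_hash used to pin each client to one instance, with the proxy headers WebSockets need; the addresses, port, and file path are assumptions:

# write an nginx site config that load-balances two gateway instances with session affinity
cat > /etc/nginx/conf.d/openclaw.conf <<'EOF'
upstream openclaw_gateways {
    ip_hash;                                   # pin each client IP to the same instance
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://openclaw_gateways;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;    # required for WebSocket upgrades
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 300s;                   # keep long-lived streams open
    }
}
EOF

ip_hash keys on the client's IP address, which is coarse but needs no cookies; if your users sit behind a shared NAT, cookie-based affinity is the better choice.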
Shared Storage
All instances need access to the same configuration and workspace data. Use a network-attached storage solution (NFS, EFS, or equivalent) mounted at the data path. Be aware that concurrent writes from multiple instances to the same workspace files can cause conflicts.
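A hedged example of the Docker side of this, assuming an NFS server at 10.0.0.5 exporting /exports/openclaw; both values are placeholders:

# create a named volume backed by NFS
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=10.0.0.5,rw,nfsvers=4 \
  --opt device=:/exports/openclaw \
  openclaw-shared

Each instance then mounts the volume with -v openclaw-shared:/data in its docker run command; the /data path is a placeholder for wherever your deployment keeps OpenClaw's configuration and workspace.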
Configuration Synchronization
Changes made through the setup wizard on one instance need to propagate to others. Since configuration lives on the shared filesystem, this happens naturally -- but you need to restart gateway processes on other instances to pick up changes.
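A sketch of that restart step for a two-instance deployment, assuming hosts named gw-1.internal and gw-2.internal reachable over SSH and containers named openclaw:

# after changing configuration on one instance, reload it everywhere else
for host in gw-1.internal gw-2.internal; do
  ssh "$host" "docker restart openclaw"
done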
Optimizing Before Scaling
Before adding resources, make sure you're using your current allocation efficiently.
Review AI Provider Settings
If your AI provider supports streaming responses, ensure it's enabled. Streaming reduces perceived latency dramatically because users see partial responses as they're generated rather than waiting for the complete response.
Optimize Workspace Size
A large workspace directory slows down file operations. Archive old projects and keep the active workspace lean. The AI agent performs better when it doesn't have to search through thousands of irrelevant files.
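One way to see what is consuming space and to move a finished project out of the active workspace; the /srv/openclaw paths are assumptions, so substitute your own mount points:

# largest items in the workspace first
du -sh /srv/openclaw/workspace/* | sort -rh | head -20

# compress a finished project into an archive directory, then remove it from the active workspace
tar czf /srv/openclaw/archive/old-project.tar.gz -C /srv/openclaw/workspace old-project \
  && rm -rf /srv/openclaw/workspace/old-project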
Monitor and Profile
Effective scaling decisions require data. At minimum, track these metrics over time:
- Container CPU and memory utilization (percentage of limit)
- Gateway response times (p50, p95, p99)
- WebSocket connection count and disconnect rate
- AI provider API latency (to separate your bottlenecks from theirs)
Most container platforms provide built-in monitoring. Railway's metrics dashboard shows CPU and memory usage per service. For self-hosted deployments, Prometheus with Grafana is a well-tested combination.
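For self-hosted Docker deployments, cAdvisor is one common way to expose per-container CPU and memory metrics for Prometheus to scrape; a minimal sketch, with the 8081 host port chosen arbitrarily to avoid the gateway's 8080:

docker run -d \
  --name cadvisor \
  -p 8081:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest

Point a Prometheus scrape job at port 8081 and build your Grafana dashboards on top of those metrics.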
Scaling Strategy by Team Size
Here are practical starting points based on team size. Adjust based on actual usage patterns.
1-3 Users
The default allocation (1 vCPU, 2 GB RAM) is usually sufficient. Monitor for occasional spikes but don't over-provision.
4-10 Users
Increase to 2 vCPUs and 4 GB RAM. Enable monitoring and set alerts for sustained CPU usage above 75%.
10-25 Users
Consider 4 vCPUs and 8 GB RAM. At this scale, you should have structured monitoring in place and a tested backup strategy.
25+ Users
You're approaching the limits of a single instance. Evaluate horizontal scaling with session affinity, or consider running multiple independent instances for different teams or projects.
Conclusion
Scaling OpenClaw is a progressive exercise. Start with the smallest allocation that meets your needs, monitor real usage patterns, and increase resources when the data tells you to. Vertical scaling handles the majority of use cases, and the overhead of horizontal scaling is rarely justified until you're well past ten concurrent users. The key is to have monitoring in place so you make scaling decisions based on evidence rather than guesswork.