01 // The Challenge
Growth Constrained by the System, Not the Market
Naturals Ice Cream — India's largest premium ice cream chain — had outgrown its POS. The legacy system couldn't keep pace with the aggregator volume coming in from Swiggy, Zomato, Vendekin, Delight, and Urban Piper simultaneously. Menu changes required manual updates on each platform separately. Inventory was tracked in spreadsheets checked at end of shift. The ERP had no live connection to what was happening across 200 outlets in real time. Leadership had no unified view — just fragmented data from disconnected systems.
The brand had built a national reputation. The technology hadn't scaled with it. They needed a complete replacement — purpose-built, not adapted from a generic product — and they needed it to go live without disrupting a single day of outlet operations.
In plain terms
Imagine running 200 shops where each one tracks orders in its own notebook, takes Swiggy and Zomato orders on separate phones, and sends daily sales figures to head office by WhatsApp. That was the scale of the problem. Every hour, the gap between what the system could handle and what the business actually needed was getting wider.
02 // What Was Built
A Platform Built for Naturals — Not Adapted From Someone Else's
A purpose-built, cloud-native microservices platform covering the full order lifecycle: discovery, ordering, payment, kitchen dispatch, inventory deduction, reporting, and ERP sync — unified under a single API used by every client: POS terminal, kiosk, web, and mobile.
In plain terms
Think of it as a single brain for the entire Naturals operation. Whether an order comes from Swiggy, a customer walking into an outlet, or the kiosk at the counter — it all flows through the same system, gets processed the same way, and shows up in the same reports. One version of the truth, everywhere.
// System Components
Services communicate over gRPC internally. All external clients see only GraphQL. Async jobs — report exports, stock updates, notification dispatch — run through Asynq on Redis. No Kafka. No RabbitMQ. Nothing additional to operate or maintain.
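Asynq is a Go library, so the Python below is not its API. It is a minimal in-memory sketch of the pattern this paragraph describes. A producer enqueues a typed task with a JSON payload and returns immediately. A worker then dispatches each task to the handler registered for its type and retries failures a bounded number of times. The `TaskQueue` class and the `report:export` task name are hypothetical. Redis persistence, scheduling and backoff are left out.

```python
import json
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Task:
    type_name: str  # e.g. "report:export" (hypothetical task name)
    payload: bytes  # JSON-encoded arguments
    retried: int = 0


class TaskQueue:
    """In-memory stand-in for a Redis-backed job queue (illustration only)."""

    def __init__(self, max_retry: int = 3) -> None:
        self.pending: Deque[Task] = deque()
        self.handlers: Dict[str, Callable[[dict], None]] = {}
        self.max_retry = max_retry
        self.dead: List[Task] = []  # tasks that used up all their retries

    def handle(self, type_name: str, fn: Callable[[dict], None]) -> None:
        """Register the handler for one task type."""
        self.handlers[type_name] = fn

    def enqueue(self, type_name: str, **args) -> None:
        """Producer side: returns at once; the work happens later."""
        self.pending.append(Task(type_name, json.dumps(args).encode()))

    def run_until_empty(self) -> None:
        """Worker side: dispatch by task type, re-queue failures up to max_retry."""
        while self.pending:
            task = self.pending.popleft()
            try:
                self.handlers[task.type_name](json.loads(task.payload))
            except Exception:
                task.retried += 1
                if task.retried > self.max_retry:
                    self.dead.append(task)
                else:
                    self.pending.append(task)


# Usage: a report export is handed off instead of blocking the request.
exported = []
queue = TaskQueue()
queue.handle("report:export", lambda args: exported.append(args["report"]))
queue.enqueue("report:export", report="mv_daily_sales", fmt="xlsx")
queue.run_until_empty()
```

Routing on a type name keeps producers unaware of which worker runs the job. That is why a Redis-backed queue can replace a separate broker for this workload.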
03 // Order Management
Every Channel. One Queue.
The Integration service normalises orders from every aggregator into a single internal schema before they reach the Order service. Each aggregator has its own quirks — Swiggy's flexible boolean types, Zomato's menu format, Vendekin's order acknowledgement protocol — handled in isolated adapters so that changes to one aggregator's API don't affect anything else in the system.
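The source does not show the aggregator payloads or the internal schema. The sketch below therefore assumes hypothetical field names (`order_id`, `created_at`, `dishes` and so on) to illustrate the adapter idea. Each channel gets its own small function that maps its quirks, such as loosely typed booleans, into one shared `Order` type. The outlet queue then sorts everything by arrival time, whatever the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class OrderLine:
    sku: str
    qty: int


@dataclass(frozen=True)
class Order:
    """The single internal schema every adapter must produce."""
    channel: str
    external_id: str
    received_at: datetime
    lines: Tuple[OrderLine, ...]
    contactless: bool


def parse_flexible_bool(value: Any) -> bool:
    """Accept true/false, 1/0 and common string spellings of either."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        v = value.strip().lower()
        if v in {"true", "1", "yes", "y"}:
            return True
        if v in {"false", "0", "no", "n", ""}:
            return False
    raise ValueError(f"unrecognised boolean: {value!r}")


def from_swiggy(p: Dict[str, Any]) -> Order:
    # Hypothetical payload shape: epoch timestamps, string quantities, loose booleans.
    return Order(
        channel="swiggy",
        external_id=str(p["order_id"]),
        received_at=datetime.fromtimestamp(p["created_at"], tz=timezone.utc),
        lines=tuple(OrderLine(i["item_id"], int(i["quantity"])) for i in p["items"]),
        contactless=parse_flexible_bool(p["contactless"]),
    )


def from_zomato(p: Dict[str, Any]) -> Order:
    # Hypothetical payload shape: ISO-8601 timestamps with an explicit UTC offset.
    return Order(
        channel="zomato",
        external_id=p["id"],
        received_at=datetime.fromisoformat(p["placed_at"]),
        lines=tuple(OrderLine(d["sku"], int(d["qty"])) for d in p["dishes"]),
        contactless=parse_flexible_bool(p.get("contactless", False)),
    )


# Each adapter is isolated: changing one never touches the others.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Order]] = {
    "swiggy": from_swiggy,
    "zomato": from_zomato,
}


def unified_queue(raw: Iterable[Tuple[str, Dict[str, Any]]]) -> List[Order]:
    """Normalise every (channel, payload) pair and sort by arrival time."""
    return sorted((ADAPTERS[ch](p) for ch, p in raw), key=lambda o: o.received_at)


queue = unified_queue([
    ("zomato", {"id": "Z-9", "placed_at": "2024-05-01T10:05:00+00:00",
                "dishes": [{"sku": "MANGO-SCOOP", "qty": 2}]}),
    ("swiggy", {"order_id": 123, "created_at": 1714557600, "contactless": "1",
                "items": [{"item_id": "TENDER-COCONUT", "quantity": "1"}]}),
])
```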
Payment rails: Paytm EDC for card and UPI at the counter, Razorpay for online orders, plus cash and custom payment modes. Delivery dispatch: Shadowfax. Customer notifications: MSG91 for SMS, Firebase for mobile push.
The outlet operator sees one unified order queue on one screen, regardless of which platform the customer ordered from. There's no switching between the Swiggy tablet, the Zomato tablet, and the POS. One screen. All orders.
In plain terms
Before, a Naturals outlet might have had three separate tablets — one for Swiggy, one for Zomato, one for their own system — plus their POS. Staff had to watch all of them simultaneously and manually update each one. Now there is one screen. One queue. The system handles everything else automatically in the background.
04 // The Migration
200 Outlets.
One Day.
Zero Downtime.
Migrating 200 live outlets in a single day required zero-downtime cutover tooling and complete confidence in rollback. Most vendors would schedule this across three to six months, outlet by outlet and city by city, with extended parallel-run periods in which both systems run side by side.
We compressed it to one day through a migration architecture that pre-loaded all outlet data, menus, inventory configurations, staff accounts, and pricing into the new system before the cutover window. Nothing was entered live on the day.
The deployment pipeline — deploy_via_watchdog.sh targeting Docker Swarm stack updates — enabled per-service rolling updates with automatic health-check gating. A failed health check stops the rollout before it reaches the next replica. The same mechanism supports single-service canary deploys in under two minutes.
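The internals of `deploy_via_watchdog.sh` are not shown here. In Docker Swarm, this kind of gating is usually configured through a service's `update_config` (for example `parallelism: 1` with `failure_action: pause` or `rollback`) together with a container health check. The Python below only simulates the gating logic, one replica at a time. The replica names and the `healthy` callable are hypothetical.

```python
from typing import Callable, List, Tuple


def rolling_update(
    replicas: List[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> Tuple[List[str], str]:
    """Update one replica at a time and stop at the first failed health check.

    Returns (replicas updated and healthy, rollout status).
    """
    updated: List[str] = []
    for replica in replicas:
        deploy(replica)
        if not healthy(replica):
            # The gate: the rollout halts here, so later replicas keep the old version.
            return updated, f"paused: {replica} failed health check"
        updated.append(replica)
    return updated, "complete"


versions = {r: "v1" for r in ["order.1", "order.2", "order.3"]}


def deploy_v2(replica: str) -> None:
    versions[replica] = "v2"


def fails_on_second(replica: str) -> bool:
    return replica != "order.2"  # simulate a bad build on the second replica


done, status = rolling_update(list(versions), deploy_v2, fails_on_second)
```

With the second health check failing, only `order.1` counts as done and `order.3` never receives the new version. This is the property that lets a single bad build be contained.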
Outlets came online in a controlled batch sequence within a single maintenance window. Staff arrived the next morning and worked on the new system. No outlet was left on the old system. No rollback was required.
In plain terms
Moving a restaurant's entire technology system is like changing the engine of a moving vehicle. You can't stop the business, but you need to replace every component underneath it. We pre-built everything, so it was ready before the switch. Then, on the day, we moved all 200 outlets across within a single window, batch by batch. If something had gone wrong with one piece, the system would have stopped automatically and held the rest until it was fixed.
Migration Day Timeline
05 // Architecture
Multi-Datacenter Redundancy
on a Startup Budget
Production-grade infrastructure doesn't have to mean hyperscaler bills. The entire Naturals platform runs on a 6-node ARM64 compute cluster across two European datacenters. That infrastructure costs a fraction of equivalent managed cloud, with no compromise on reliability.
// Traffic Flow Diagram
Diagram summary: 3× cax31 ARM64 nodes in each of the two datacenters; one entry point per node (zero-hop entry); services: Auth, Order, Inventory, Catalogue, Integration, Reporting, Audit, Events; PostgreSQL under Patroni + etcd leader election (Helsinki primary, Nuremberg standby); Redis with a Helsinki master and a Nuremberg replica, promoted automatically on master loss.
ARM64 Compute — 30–40% Cost Saving
All nodes run Hetzner cax31 ARM64 instances. ARM64 cuts infrastructure spend by roughly 30–40% versus equivalent x86 instances, with comparable performance on typical server workloads.
Total reserved capacity is ~14 vCPU / 16 GB. That leaves roughly 34 vCPU / 80 GB of additional burst headroom across the six compute nodes. The entire production cluster (6 compute nodes, 2 database nodes, 1 observability node) costs what a single managed Kubernetes node costs on AWS or GCP.
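The headroom figure is simple arithmetic. A Hetzner cax31 instance provides 8 vCPU and 16 GB of RAM. The sketch assumes the reserved figures above apply to the six compute nodes only.

```python
CAX31_VCPU, CAX31_RAM_GB = 8, 16          # Hetzner cax31 instance size
COMPUTE_NODES = 6
RESERVED_VCPU, RESERVED_RAM_GB = 14, 16   # approximate reservations quoted above

total_vcpu = COMPUTE_NODES * CAX31_VCPU
total_ram_gb = COMPUTE_NODES * CAX31_RAM_GB
headroom_vcpu = total_vcpu - RESERVED_VCPU
headroom_ram_gb = total_ram_gb - RESERVED_RAM_GB

print(f"total {total_vcpu} vCPU / {total_ram_gb} GB, "
      f"headroom {headroom_vcpu} vCPU / {headroom_ram_gb} GB")
# → total 48 vCPU / 96 GB, headroom 34 vCPU / 80 GB
```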
In plain terms
The servers running the Naturals platform use a more efficient chip architecture — the same kind Apple uses in the M-series Macs. It does the same work for significantly less money. That saving compounds every month.
Docker Swarm — No Kubernetes Complexity
Docker Swarm was a deliberate choice over Kubernetes. It eliminates CRDs, Helm charts, and operator sprawl while delivering rolling updates, health checks, and placement constraints — everything needed for a reliable multi-service deployment.
The result is a platform that any competent engineer can operate and debug without deep Kubernetes expertise — reducing operational risk and the cost of ongoing maintenance.
In plain terms
We chose simpler infrastructure tools deliberately. Simpler tools break less. When they do break, they're faster to fix. And they cost less to maintain. Complexity is a liability in production systems at this scale.
Automatic Database Failover — Patroni + etcd
PostgreSQL runs under Patroni with etcd for distributed consensus. The leader (Helsinki) is continuously replicating to a hot standby (Nuremberg). If the primary fails, Patroni automatically promotes the standby — no manual intervention, no data loss.
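Patroni coordinates failover through a leader key with a time-to-live, held in etcd. The primary must keep renewing the key, and a standby may only take over once the key has expired. The Python below is a deliberately simplified single-process model of that lease idea, not Patroni's code. The `LeaseStore` class and the 30-second TTL are illustrative assumptions. Patroni layers further checks on top, such as how far a standby may lag before it is allowed to promote.

```python
from typing import Optional


class LeaseStore:
    """Toy stand-in for a consensus store such as etcd: one key with a TTL."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node: str, now: float, ttl: float) -> bool:
        # Succeeds if the key is free, has expired, or is already held by this node.
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder, self.expires_at = node, now + ttl
            return True
        return False


TTL = 30.0
store = LeaseStore()
events = [
    store.acquire_or_renew("helsinki", now=0, ttl=TTL),    # primary takes the lease
    store.acquire_or_renew("nuremberg", now=10, ttl=TTL),  # refused: lease still held
    store.acquire_or_renew("helsinki", now=20, ttl=TTL),   # primary renews until t=50
    store.acquire_or_renew("nuremberg", now=45, ttl=TTL),  # refused: not yet expired
    store.acquire_or_renew("nuremberg", now=50, ttl=TTL),  # primary went silent: promote
]
print(events, store.holder)
```

Because only one node can hold the key at a time, the cluster cannot end up with two writable primaries.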
In plain terms
If the main database server fails, the backup in another city automatically takes over within seconds. Outlets keep taking orders. No one calls to say the system is down.
Private Network — Tailscale VPN
All inter-node traffic is encrypted via Tailscale VPN. The database and Redis layers sit on a private 10.0.0.0/16 network and are never exposed to the public internet. Even if a compute node were compromised, it could not reach the data infrastructure directly.
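In practice this filtering lives in firewall rules or Tailscale ACLs, not application code. The Python sketch only makes the address logic concrete. It assumes a hypothetical allowlist made of the private subnet named above plus Tailscale's default 100.64.0.0/10 address range. Any other source is refused.

```python
import ipaddress

# Hypothetical allowlist for data-layer connections.
DATA_LAYER_ALLOWED = (
    ipaddress.ip_network("10.0.0.0/16"),    # private network named above
    ipaddress.ip_network("100.64.0.0/10"),  # Tailscale's default address range
)


def may_reach_data_layer(source_ip: str) -> bool:
    """True only if the source address falls inside one of the private ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in DATA_LAYER_ALLOWED)
```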
In plain terms
The databases and sensitive systems are completely invisible to the internet. An attacker can't probe them because they literally can't see them. It's the equivalent of keeping your most valuable records in a vault with no public-facing door.
06 // Inventory
Double-Entry Stock Accounting
In plain terms
The same accounting principles your finance team uses for money — every rupee in, every rupee out, always balanced — applied to stock. You can trace every mango, every cup, every topping forward and backward. Nothing disappears without a record. When an outlet's stock doesn't match the system, you know exactly where the gap is and when it happened.
Inventory uses double-entry accounting — every stock movement is a debit/credit entry pair, making reconciliation auditable and reversible. There are no one-sided adjustments that can hide shrinkage, theft, or data entry errors.
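A minimal sketch of the double-entry idea, assuming hypothetical account names such as `outlet47:stock`. Every movement is written as a pair of entries that sum to zero, so each item nets to zero across all accounts. Mistakes are corrected by posting the opposite pair, never by editing history.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    account: str  # e.g. "outlet47:stock" (hypothetical account naming)
    item: str     # ingredient or SKU
    qty: int      # positive = debit (stock arrives), negative = credit (stock leaves)


class StockLedger:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def move(self, item: str, qty: int, from_acct: str, to_acct: str) -> None:
        """Record one movement as a debit/credit pair that sums to zero."""
        if qty <= 0:
            raise ValueError("quantity must be positive")
        self.entries.append(Entry(to_acct, item, qty))
        self.entries.append(Entry(from_acct, item, -qty))

    def reverse(self, item: str, qty: int, from_acct: str, to_acct: str) -> None:
        """Undo a movement by posting the opposite pair; nothing is edited or deleted."""
        self.move(item, qty, to_acct, from_acct)

    def balance(self, account: str, item: str) -> int:
        return sum(e.qty for e in self.entries if e.account == account and e.item == item)

    def is_balanced(self) -> bool:
        """Every item must net to zero across all accounts."""
        totals: Dict[str, int] = defaultdict(int)
        for e in self.entries:
            totals[e.item] += e.qty
        return all(t == 0 for t in totals.values())


ledger = StockLedger()
ledger.move("mango_pulp_g", 5000, "supplier", "outlet47:stock")        # delivery
ledger.move("mango_pulp_g", 120, "outlet47:stock", "outlet47:sold")    # used by an order
ledger.move("mango_pulp_g", 200, "outlet47:stock", "outlet47:wastage")
ledger.reverse("mango_pulp_g", 200, "outlet47:stock", "outlet47:wastage")  # logged in error
```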
The Recipe engine decomposes each menu item into raw ingredient quantities. When an order is confirmed, the Order service calls Inventory over gRPC to atomically reserve the exact stock required, at the ingredient level rather than the product level. If two orders arrive at the same moment for the last portion of a flavour, only one reservation succeeds, so the flavour is never double-sold.
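The sketch below shows ingredient-level, all-or-nothing reservation. The recipes and quantities are hypothetical. A `threading.Lock` stands in for the database transaction that would sit behind the real gRPC call. The check and the deduction happen inside one critical section, so two concurrent orders can never both claim the last portion.

```python
import threading
from collections import Counter
from typing import Dict, List

# Hypothetical recipes: grams (or units) of each ingredient per menu item.
RECIPES: Dict[str, Dict[str, int]] = {
    "mango_scoop": {"mango_base_g": 90, "cup": 1},
    "mango_sundae": {"mango_base_g": 180, "cup": 1, "nuts_g": 15},
}


class Inventory:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, order: Dict[str, int]) -> bool:
        """Reserve every ingredient the order needs, or nothing at all."""
        need: Counter = Counter()
        for item, qty in order.items():
            for ingredient, amount in RECIPES[item].items():
                need[ingredient] += amount * qty
        with self._lock:  # check and deduct as one atomic step
            if any(self.stock.get(i, 0) < n for i, n in need.items()):
                return False
            for i, n in need.items():
                self.stock[i] -= n
            return True


inventory = Inventory({"mango_base_g": 90, "cup": 10, "nuts_g": 100})
results: List[bool] = []
workers = [
    threading.Thread(target=lambda: results.append(inventory.reserve({"mango_scoop": 1})))
    for _ in range(2)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Exactly one of the two simultaneous orders succeeds. A sundae requested afterwards is rejected without touching any other ingredient.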
Aggregator stock sync pushes real-time availability back to Swiggy and Zomato automatically. When a flavour is sold out, it goes offline on the delivery platforms within seconds — before more orders come in for something you can't fulfil.
Low-stock alerts fire before an outlet runs out, not after. Operations staff get notified with enough time to act.
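Availability sync and low-stock alerts can both be derived from the same ingredient stock. In this sketch, the recipes, reorder levels and the `item:online/offline` action format are hypothetical. An item stays orderable only while every ingredient covers at least one portion. Only items whose status changed produce an aggregator update.

```python
from typing import Dict, List, Tuple

RECIPES: Dict[str, Dict[str, int]] = {
    "mango_scoop": {"mango_base_g": 90, "cup": 1},
    "choco_scoop": {"choco_base_g": 90, "cup": 1},
}
REORDER_LEVEL = {"mango_base_g": 900, "choco_base_g": 900, "cup": 50}  # hypothetical


def availability(stock: Dict[str, int]) -> Dict[str, bool]:
    """An item is orderable only if every ingredient covers one portion."""
    return {
        item: all(stock.get(i, 0) >= amt for i, amt in recipe.items())
        for item, recipe in RECIPES.items()
    }


def sync_actions(previous: Dict[str, bool], stock: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (status changes to push to aggregators, ingredients below reorder level)."""
    now = availability(stock)
    toggles = [
        f"{item}:{'online' if ok else 'offline'}"
        for item, ok in now.items()
        if previous.get(item) != ok
    ]
    low = [i for i, level in REORDER_LEVEL.items() if stock.get(i, 0) < level]
    return toggles, low


before = {"mango_scoop": True, "choco_scoop": True}
toggles, low = sync_actions(before, {"mango_base_g": 60, "choco_base_g": 2000, "cup": 40})
```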
What this replaces
Manual end-of-day stock counting. Spreadsheet updates. WhatsApp messages to head office saying "we've run out of mango." Discovering stockouts after they've already caused order rejections.
What this delivers
Real-time stock visibility across all 200 outlets. Automatic delivery platform updates. Auditable trail of every stock movement. Fewer stockouts. Less waste. Less manual work.
07 // Reporting & ERP
Reports That Open in Under a Second
In plain terms
Most reporting systems make you wait while they calculate. Ours pre-calculates everything in the background so that when you open a report, the answer is already there. A finance person can pull a 200-outlet sales breakdown by payment mode for any date range — and it loads instantly. They can also build their own reports without asking engineering for help.
The Reporting service maintains a dedicated read replica with TimescaleDB fact tables pre-aggregated by day, item, category, and payment mode. Materialized views such as mv_daily_sales, mv_item_performance and mv_payment_mode keep dashboard queries sub-second across date ranges and outlet counts. Dashboards read pre-aggregated rows instead of scanning individual orders.
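In the platform this lives in SQL materialized views. The Python below only illustrates why reads become cheap. Raw orders are rolled up once into one row per day, outlet and payment mode, and the dashboard query then sums those few rows. The field names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple

Key = Tuple[date, str, str]  # (day, outlet, payment_mode)


def build_daily_sales(orders: Iterable[dict]) -> Dict[Key, Decimal]:
    """Roll raw orders up once, the way a materialized view refresh would."""
    agg: Dict[Key, Decimal] = defaultdict(Decimal)
    for o in orders:
        agg[(o["day"], o["outlet"], o["mode"])] += o["amount"]
    return dict(agg)


def sales_by_mode(agg: Dict[Key, Decimal], start: date, end: date) -> Dict[str, Decimal]:
    """Dashboard query: reads one row per day/outlet/mode, not every order."""
    out: Dict[str, Decimal] = defaultdict(Decimal)
    for (day, _outlet, mode), total in agg.items():
        if start <= day <= end:
            out[mode] += total
    return dict(out)


orders = [
    {"day": date(2024, 5, 1), "outlet": "047", "mode": "UPI", "amount": Decimal("120.00")},
    {"day": date(2024, 5, 1), "outlet": "047", "mode": "UPI", "amount": Decimal("80.00")},
    {"day": date(2024, 5, 2), "outlet": "112", "mode": "cash", "amount": Decimal("60.00")},
]
agg = build_daily_sales(orders)
report = sales_by_mode(agg, date(2024, 5, 1), date(2024, 5, 2))
```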
Custom SQL query builder: A no-code report builder with schema metadata, safe join validation, and automatic tenant-scope injection lets the operations team construct ad-hoc reports without engineering. The builder prevents invalid queries at the UI level before they ever reach the database.
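A sketch of the two safeguards named here, safe join validation and tenant-scope injection, assuming hypothetical table and column names. Columns and joins are accepted only if the schema metadata lists them. The builder always appends the outlet filter itself, and the outlet IDs are passed as a bound parameter, never spliced into the SQL. With psycopg, a Python list bound to `ANY(%s)` is sent as an array.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema metadata: known columns, and the only joins allowed from orders.
COLUMNS: Dict[str, Set[str]] = {
    "orders": {"id", "outlet_id", "created_at", "total"},
    "order_items": {"order_id", "sku", "qty"},
    "payments": {"order_id", "mode", "amount"},
}
JOINS: Dict[Tuple[str, str], str] = {
    ("orders", "order_items"): "order_items.order_id = orders.id",
    ("orders", "payments"): "payments.order_id = orders.id",
}


def build_query(columns: List[str], joins: List[str], outlet_ids: List[int]) -> Tuple[str, list]:
    """Validate a report definition against metadata and inject the tenant scope."""
    for t in joins:
        if ("orders", t) not in JOINS:
            raise ValueError(f"join to {t!r} is not allowed")
    tables = ["orders"] + joins
    for col in columns:
        table, _, name = col.partition(".")
        if table not in tables or name not in COLUMNS[table]:
            raise ValueError(f"unknown column {col!r}")
    sql = f"SELECT {', '.join(columns)} FROM orders"
    for t in joins:
        sql += f" JOIN {t} ON {JOINS[('orders', t)]}"
    # The scope is added by the builder on every query; users cannot remove it.
    sql += " WHERE orders.outlet_id = ANY(%s)"
    return sql, [outlet_ids]


sql, params = build_query(["orders.created_at", "payments.mode", "payments.amount"], ["payments"], [47])
```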
Excel and CSV exports run asynchronously — the system generates the file, stores it in MinIO (an S3-compatible object store), and serves it back via a signed download URL. Large exports don't block the UI or slow the system.
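MinIO issues S3-style presigned URLs (AWS Signature Version 4). The standard-library sketch below is a simplified HMAC scheme, not MinIO's algorithm. It shows the idea behind a signed download link: the object key and an expiry time are signed with a server-side secret, so the link cannot be altered or used after it expires. The secret, the host name and the parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-real-secret"  # hypothetical signing key


def _signature(object_key: str, expires: int) -> str:
    return hmac.new(SECRET, f"{object_key}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(base: str, object_key: str, ttl_s: int, now: Optional[float] = None) -> str:
    """`base` is scheme plus host only, e.g. https://exports.example.com."""
    expires = int((time.time() if now is None else now) + ttl_s)
    query = urlencode({"expires": expires, "sig": _signature(object_key, expires)})
    return f"{base}/{object_key}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    parsed = urlparse(url)
    object_key = parsed.path[1:]
    q = parse_qs(parsed.query)
    try:
        expires, sig = int(q["expires"][0]), q["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False  # link has expired
    return hmac.compare_digest(sig, _signature(object_key, expires))


url = sign_url("https://exports.example.com", "exports/sales-2024-05.xlsx", ttl_s=900, now=1_000_000)
```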
ERP sync: Invoice generation, tax settlement exports, and payment reconciliation APIs are consumed directly by the finance team's existing tooling. The ERP gets clean, structured data — no manual export, no reformatting, no copy-paste between systems.
Reports available out of the box
08 // Compliance & Security
Built for Audit. Built for Enterprise.
In plain terms
If a government auditor walks in tomorrow asking for GST records, they're available instantly and complete. If an internal investigation needs to know who changed a price at outlet 47 on a specific date, the system can show exactly that, with the before and after values and the timestamp. Security is enforced at the data layer, not just on the surface.
RBAC at the ORM Layer
Role-based access control is enforced at the data layer, not just in middleware. An outlet manager's credentials cannot retrieve data from another outlet, even through a hand-crafted direct API request. Access scoping is structural: every query is filtered to the caller's permitted outlets, instead of relying on each endpoint to remember to add the filter.
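The source does not name the ORM. The plain-Python repository below is a hypothetical sketch of the "structural" idea. Every read passes through one scoping step that cannot be skipped. The roles and records are invented for illustration, and unknown roles fall through to an empty scope, so access is denied by default.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # "outlet_manager" or "head_office" (hypothetical roles)
    outlets: FrozenSet[int] = frozenset()


class OrderRepository:
    """Every read goes through _scope; callers never touch the raw rows."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def _scope(self, user: User) -> List[dict]:
        if user.role == "head_office":
            return self._rows
        # Any other role sees only its own outlets (empty set = sees nothing).
        return [r for r in self._rows if r["outlet_id"] in user.outlets]

    def get(self, user: User, order_id: int) -> Optional[dict]:
        # Filtering happens before the lookup, so another outlet's order
        # behaves exactly like one that does not exist.
        return next((r for r in self._scope(user) if r["id"] == order_id), None)


repo = OrderRepository([
    {"id": 1, "outlet_id": 47, "total": 240},
    {"id": 2, "outlet_id": 12, "total": 90},
])
manager = User("u-17", "outlet_manager", frozenset({47}))
```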
Append-Only Audit Log
Every state change — order, price, stock, user, configuration — is captured with before/after values, the user ID who made the change, and an exact timestamp. The log is append-only: records cannot be edited or deleted, only added. It is the authoritative record for any dispute or investigation.
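In PostgreSQL, one common way to make a table append-only is to grant the application role INSERT and SELECT on it, but not UPDATE or DELETE. The Python sketch below mirrors that contract in memory with hypothetical names. Records are frozen once written, and readers only ever receive immutable tuples.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class AuditRecord:
    entity: str  # e.g. "price:outlet47:mango_scoop" (hypothetical key format)
    before: Any
    after: Any
    user_id: str
    at: datetime


class AuditLog:
    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, entity: str, before: Any, after: Any, user_id: str,
               at: Optional[datetime] = None) -> AuditRecord:
        rec = AuditRecord(entity, before, after, user_id, at or datetime.now(timezone.utc))
        self._records.append(rec)
        return rec

    def records(self) -> Tuple[AuditRecord, ...]:
        # A tuple copy: callers can read the history but cannot change it.
        return tuple(self._records)

    def history(self, entity: str) -> Tuple[AuditRecord, ...]:
        return tuple(r for r in self._records if r.entity == entity)


log = AuditLog()
rec = log.append("price:outlet47:mango_scoop", before=120, after=130, user_id="u-17")
try:
    rec.after = 999  # records are frozen once written
except FrozenInstanceError:
    print("audit records cannot be edited")
```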
JWT with JTI Token Revocation
Authentication tokens include a unique identifier (JTI) that allows individual sessions to be revoked without invalidating every other session. If a staff member's device is lost, their access is terminated precisely — no forced logout of every other user in the system.
VAPT Cleared — Web Application & Network Layer
Vulnerability Assessment and Penetration Testing (VAPT) reports are on file for both the web application layer and the network infrastructure. The platform has been independently tested and cleared. bcrypt for password storage. Encrypted storage for all integration credentials (aggregator API keys, payment gateway secrets). Rate limiting at 500K requests/minute at the gateway. Cloudflare DDoS protection and WAF active in front of all traffic.
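The gateway's limiting algorithm is not stated. A token bucket is a common choice and is shown here as an assumption. The bucket refills at a steady rate (500K per minute is about 8,333 per second) and allows short bursts up to its capacity. Setting the capacity to one second's worth of requests is a hypothetical choice.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 500K requests/minute expressed per second; a one-second burst capacity is an assumption.
gateway = TokenBucket(rate=500_000 / 60, capacity=500_000 / 60, now=0.0)
```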
GST-Native — Not an Afterthought
GST compliance is built into the order and invoice lifecycle — not bolted on as an export. HSN-code mapping, GSTIN validation, tax calculation at the item level, and return-ready data exports are handled by the platform at the point of transaction. Audit readiness across all 200 outlets at any moment.
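Two of the checks named above can be sketched in a few lines. GSTIN validation checks the 15-character format and the mod-36 check character. Item-level tax splits into CGST + SGST for an intra-state supply and IGST for an inter-state one. The regex covers the common registration format only, and place of supply is reduced to a comparison of state codes. The 18% rate in the usage is just an example parameter, not a statement of the applicable rate.

```python
import re
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict

_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# State code, PAN (5 letters, 4 digits, 1 letter), entity code, 'Z', check character.
_FORMAT = re.compile(r"^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$")
PAISA = Decimal("0.01")


def gstin_check_char(first14: str) -> str:
    """Mod-36 check character: weights alternate 1, 2 from the left."""
    total = 0
    for i, ch in enumerate(first14):
        product = _CHARS.index(ch) * (2 if i % 2 else 1)
        total += product // 36 + product % 36
    return _CHARS[(36 - total % 36) % 36]


def is_valid_gstin(gstin: str) -> bool:
    gstin = gstin.strip().upper()
    return bool(_FORMAT.match(gstin)) and gstin[14] == gstin_check_char(gstin[:14])


def split_tax(taxable: Decimal, rate_pct: Decimal, seller_state: str, buyer_state: str) -> Dict[str, Decimal]:
    """Intra-state: CGST + SGST at half the rate each. Inter-state: IGST at the full rate."""
    if seller_state == buyer_state:
        half = (taxable * rate_pct / 2 / 100).quantize(PAISA, rounding=ROUND_HALF_UP)
        return {"CGST": half, "SGST": half}
    return {"IGST": (taxable * rate_pct / 100).quantize(PAISA, rounding=ROUND_HALF_UP)}
```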
09 // Observability
See Everything. Know Before It Breaks.
In plain terms
The system monitors itself. If an outlet is having issues receiving orders, an alert fires before the outlet manager notices and calls support. If the database is getting slow, the on-call engineer is notified before any customer is affected. This is how a platform maintains 99.9% uptime — not by hoping nothing goes wrong, but by detecting problems in the first few seconds and having a human respond.
Every service emits three signals simultaneously:
Structured JSON logs from every service, shipped via Promtail to Loki. Searchable by outlet, order ID, user, error type, or any field — across the entire fleet of services in one query.
Prometheus metrics scraped to persistent TSDB. Grafana dashboards cover order throughput by channel, service-level latency, database connection pool utilisation, Redis sentinel health, and queue depth. Historical data retained for trend analysis and capacity planning.
OpenTelemetry traces collected by Tempo. A single order request can be traced from the Swiggy webhook arrival through Integration → Order → Inventory → Reporting in one view. When something is slow, the trace shows exactly which service and which database call caused it.
Alerting rules fire on configured thresholds (error rate spikes, latency degradation, outlet connectivity failures, sentinel failover events), and Alertmanager routes the resulting pages to whoever is on call. The team knows before the business does.
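The actual thresholds are not given. One well-known way to turn a 99.9% target into paging rules is the multi-window burn-rate approach from Google's SRE Workbook, shown here as an assumption. A 99.9% target leaves an error budget of 0.1% of requests, equivalent to about 43 minutes of full downtime in 30 days. Paging when the budget burns at 14.4× over both an hour and five minutes means spending about 2% of a 30-day budget in one hour. In production this is an alerting rule, not Python.

```python
from typing import Tuple

SLO = 0.999  # 99.9% availability target


def burn_rate(errors: int, requests: int, slo: float = SLO) -> float:
    """How fast the error budget is being spent: 1.0 means exactly on budget."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1 - slo)


def should_page(long_window: Tuple[int, int], short_window: Tuple[int, int],
                threshold: float = 14.4) -> bool:
    """Page only if both windows (e.g. 1h and 5m) burn faster than the threshold.

    Each window is (errors, requests). The short window stops paging once a spike is over.
    """
    return burn_rate(*long_window) > threshold and burn_rate(*short_window) > threshold


budget_minutes_30d = (1 - SLO) * 30 * 24 * 60  # about 43.2 minutes
```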
10 // Outcomes
The Numbers
11 // Technology Stack
What's Next
This for Your Operation.
We start with a free audit of your current stack — POS, aggregator integrations, reporting, and infrastructure. We show you exactly what's leaking revenue, what's at risk, and what a proper platform looks like for your scale.