The customer develops advanced fire safety systems where a multi-zone Fire Control Panel integrates with a protocol gateway and a cloud-based monitoring platform.
The system operates as three integrated tiers:
For this system to work correctly, three synchronization requirements must be met:
Validating this three-tier system required orchestrating multiple devices, systems, and event flows simultaneously. The team attempted manual testing: trigger smoke detectors, watch the gateway display, log into the cloud console, verify the event arrived, acknowledge the alarm, verify the acknowledgment reached the panel.
Each test case took 10-15 minutes. For dozens of zones, multiple alarm states, and failure scenarios, comprehensive testing required weeks of manual labour - and critical failure paths (cloud offline, simultaneous multi-zone alarms, gateway reconnection edge cases) were either skipped or tested only once.
The team had no way to systematically test whether event acknowledgments propagated correctly back to the panel when the gateway reconnected to the cloud, or what happened when three alarms arrived simultaneously, or whether events were lost if the cloud rejected a transmission with a transient error.
TestBot's Custom Simulator Agent bridges the three tiers of the fire safety system, controlling events and verifying state synchronization across all three layers. The simulator operates as a test harness that can:
Every operation is controlled by TestBot via parameterized test data. A test case defines a sequence of operations, what the panel should do in response, and what should appear in the cloud. TestBot executes the sequence automatically, compares results against expected values, and reports pass/fail.
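A parameterized test case of this kind can be sketched as plain data plus a small runner. The following is a minimal illustration, not TestBot's actual format: `Step`, `run_case`, `FakeSystem`, and the field names are hypothetical, and the in-memory fake stands in for the real panel and cloud.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One operation plus the state expected afterwards (hypothetical schema)."""
    action: str                                       # simulator operation, e.g. "create_alarm"
    params: dict                                      # arguments for that operation
    expect_panel: dict = field(default_factory=dict)  # panel fields to check after the step
    expect_cloud: dict = field(default_factory=dict)  # cloud fields to check after the step


def run_case(steps, perform, read_panel, read_cloud):
    """Run steps in order and return a list of mismatches (empty list means pass)."""
    failures = []
    for index, step in enumerate(steps):
        perform(step.action, step.params)
        checks = (("panel", step.expect_panel, read_panel),
                  ("cloud", step.expect_cloud, read_cloud))
        for tier, expected, read in checks:
            actual = read()
            for key, want in expected.items():
                if actual.get(key) != want:
                    failures.append((index, tier, key, want, actual.get(key)))
    return failures


class FakeSystem:
    """In-memory stand-in for panel and cloud, used only to exercise run_case."""

    def __init__(self):
        self.panel = {"zone1": "normal", "sounder": False}
        self.cloud = {"zone1": "normal"}

    def perform(self, action, params):
        zone = params["zone"]
        if action == "create_alarm":
            self.panel.update({zone: "alarm", "sounder": True})
            self.cloud[zone] = "alarm"
        elif action == "acknowledge":
            self.panel.update({zone: "acknowledged", "sounder": False})
            self.cloud[zone] = "acknowledged"
        elif action == "clear":
            self.panel[zone] = "normal"
            self.cloud[zone] = "normal"


single_zone_alarm = [
    Step("create_alarm", {"zone": "zone1"},
         expect_panel={"zone1": "alarm", "sounder": True},
         expect_cloud={"zone1": "alarm"}),
    Step("acknowledge", {"zone": "zone1"},
         expect_panel={"zone1": "acknowledged", "sounder": False},
         expect_cloud={"zone1": "acknowledged"}),
    Step("clear", {"zone": "zone1"},
         expect_panel={"zone1": "normal"},
         expect_cloud={"zone1": "normal"}),
]

system = FakeSystem()
failures = run_case(single_zone_alarm, system.perform,
                    lambda: system.panel, lambda: system.cloud)
print("PASS" if not failures else f"FAIL: {failures}")
```

Keeping the expectations next to each operation means a new scenario is added by writing data, not code, which is what makes broad coverage of zones and alarm states practical.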
Test 1: Single Zone Alarm - Basic Event Flow
Create smoke detection event on Zone 1 → Panel displays alarm and activates sounder → Event recorded in cloud with correct timestamp → Operator acknowledges via cloud console → Acknowledgment propagates to panel, sounder silences → Clear event in simulator → Cloud log updated with clear event.
Result: Pass - Event flows through all three tiers with correct timestamps. Acknowledgment propagates correctly.
Test 2: Multi-Zone Simultaneous Alarms
Create three smoke alarms within 50ms of each other → Panel identifies all zones and displays highest priority → All three events recorded separately in cloud → Acknowledge Zone 1 only → Cloud and panel both show Zone 1 acknowledged, Zones 2 & 3 still in alarm → Acknowledge remaining zones → All zones show acknowledged → Clear all events.
Result: Pass - System correctly isolates and tracks multiple simultaneous events. Partial acknowledgment does not interfere with other alarms.
Test 3: Gateway Offline During Alarm
Simulate gateway losing cloud connectivity → Create alarm on Zone 1 while gateway offline → Verify alarm in gateway local buffer but NOT in cloud → Simulate gateway reconnection → Cloud receives buffered alarm with original timestamp → Verify no duplicate entries → Acknowledge alarm via cloud → Acknowledgment propagates back through gateway to panel.
Result: Pass - Gateway correctly buffers events while offline and replays them without loss or duplication. Timestamps preserved.
Test 4: Cloud Rejection and Retry Logic
Create alarm on Zone 1 → Simulate cloud API returning error 409 (Conflict/Duplicate) → Verify gateway detects rejection and initiates retry → Simulate cloud accepting the event on retry → Query cloud event log - event recorded after N retries → Verify gateway logs show initial rejection and successful retry.
Result: Pass - Gateway implements retry logic on cloud rejection. Event succeeds without manual intervention.
Test 5: Acknowledgment Propagation Failure
Create alarm on Zone 1 → Acknowledge via cloud console → Simulate gateway-to-panel communication failure → Verify cloud state shows acknowledged but panel still shows alarm active → Test detects state incoherence and logs failure → Simulate gateway recovery → Acknowledgment retransmitted to panel → State coherence restored.
Result: Fail (with caveat) - System detects state divergence. Acknowledgment eventually propagates and corrects the state. Issue escalated: persistent gateway-to-panel failures can leave the sounder active while operators believe it is silenced - compliance concern.
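The divergence check in this test can be illustrated with a short sketch. It assumes cloud and panel state can each be read as a mapping from zone to status; `diverging_zones` and `wait_for_coherence` are hypothetical helper names, not part of TestBot.

```python
import time


def diverging_zones(cloud_state, panel_state):
    """Return the sorted zones whose status differs between cloud and panel."""
    zones = cloud_state.keys() | panel_state.keys()
    return sorted(z for z in zones if cloud_state.get(z) != panel_state.get(z))


def wait_for_coherence(read_cloud, read_panel, attempts=5, interval=1.0, pause=time.sleep):
    """Poll both tiers up to `attempts` times; return [] once they agree,
    otherwise the zones still diverging after the last attempt."""
    diverged = []
    for attempt in range(attempts):
        diverged = diverging_zones(read_cloud(), read_panel())
        if not diverged:
            return []
        if attempt < attempts - 1:
            pause(interval)  # wait before the next poll
    return diverged


# Cloud shows the acknowledgment, panel is still in alarm (hypothetical states).
cloud = {"zone1": "acknowledged"}
panel = {"zone1": "alarm"}
print(diverging_zones(cloud, panel))
```

A bounded wait of this kind lets a test separate a brief propagation delay from a persistent divergence that should be reported as a failure.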
Test 6: Supervisory and Fault Events
Create low battery supervisory event on Zone 2 → Panel displays supervisory indication without triggering sounder → Create alarm on Zone 1 while supervisory active → Panel correctly prioritizes alarm over supervisory → Cloud log shows both events independently tracked → Clear supervisory event → Verify alarm on Zone 1 unaffected.
Result: Pass - Supervisory and alarm events correctly isolated and tracked independently. No interference between event types.
Bug 1: Event ID Collision Under High-Rate Alarm
When three alarms triggered within 50ms, the gateway assigned identical event IDs (based on millisecond-resolution timestamp). Cloud rejected the second alarm as a duplicate, silently dropping it from the event log. In a real multi-zone fire, later alarms could be lost.
Fix: Event ID now includes sub-millisecond counter.
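One way to build such an ID is to pair the millisecond timestamp with a counter that increments whenever two events share a timestamp. The sketch below is an assumption about the approach, not the gateway's actual code; `EventIdGenerator` and the ID format are hypothetical.

```python
import threading
import time


class EventIdGenerator:
    """Hypothetical generator: millisecond timestamp plus a per-millisecond counter."""

    def __init__(self, clock_ms=lambda: time.time_ns() // 1_000_000):
        self._clock_ms = clock_ms      # injectable clock so tests can freeze time
        self._lock = threading.Lock()  # IDs may be requested from several threads
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now <= self._last_ms:
                # Same millisecond (or the clock stepped back): keep the last
                # timestamp and bump the counter so the ID stays unique.
                self._seq += 1
            else:
                self._last_ms = now
                self._seq = 0
            return f"{self._last_ms}-{self._seq:04d}"


# Three alarms arriving within the same millisecond (frozen clock for illustration).
gen = EventIdGenerator(clock_ms=lambda: 1_700_000_000_000)
burst = [gen.next_id() for _ in range(3)]
print(burst)
```

With a frozen clock the three IDs differ only in the counter suffix, so the cloud's duplicate check no longer rejects the later alarms.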
Bug 2: Acknowledgment Lost During Gateway Reconnect
When the gateway reconnected to the cloud, the command queue (containing pending acknowledgments) was cleared, discarding the acknowledgment before it reached the panel. Operator sees "acknowledged" in cloud while the sounder is still active in the building.
Fix: Command queue preserved across reconnections with retry mechanism.
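The behaviour described by this fix can be sketched as a queue that removes a command only after it has been delivered, and whose reconnect handler leaves pending entries alone. `CommandQueue` and its methods are hypothetical names used for illustration only.

```python
from collections import deque


class CommandQueue:
    """Hypothetical gateway queue for cloud-to-panel commands."""

    def __init__(self):
        self._pending = deque()

    def __len__(self):
        return len(self._pending)

    def enqueue(self, command):
        self._pending.append(command)

    def on_reconnect(self):
        # Deliberately does NOT clear the queue: pending acknowledgments
        # must survive a cloud reconnection.
        pass

    def flush(self, send):
        """Send pending commands in order; stop at the first failure and keep
        the unsent ones for a later retry. Returns the number delivered."""
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered


queue = CommandQueue()
queue.enqueue({"type": "acknowledge", "zone": 1})
queue.flush(lambda cmd: False)   # panel link down: nothing delivered
queue.on_reconnect()             # reconnect keeps the acknowledgment queued
received = []
queue.flush(lambda cmd: received.append(cmd) or True)  # link restored
print(received)
```

Removing a command only after a successful send is what makes the retry safe: a failure at any point leaves the acknowledgment in the queue.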
Bug 3: State Incoherence After Partial Acknowledgment
Panel incorrectly transitioned to all-clear when one zone was acknowledged, even though other zones remained in alarm. System falsely signalled "all clear" while actual alarm condition was still active.
Fix: State machine now tracks acknowledged vs. normal separately; all-clear only when all zones normal.
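The corrected rule can be expressed as a small aggregation over per-zone states. This is a sketch under the assumption that each zone is in exactly one of three states; `ZoneState` and `panel_status` are hypothetical names.

```python
from enum import Enum


class ZoneState(Enum):
    NORMAL = "normal"
    ALARM = "alarm"
    ACKNOWLEDGED = "acknowledged"  # alarm condition seen by an operator, not yet cleared


def panel_status(zones):
    """Derive the overall panel status from a mapping of zone -> ZoneState."""
    states = set(zones.values())
    if ZoneState.ALARM in states:
        return "alarm"
    if ZoneState.ACKNOWLEDGED in states:
        return "acknowledged"
    return "all_clear"  # reached only when every zone is NORMAL


zones = {
    1: ZoneState.ACKNOWLEDGED,
    2: ZoneState.ALARM,
    3: ZoneState.ALARM,
}
print(panel_status(zones))
```

Because an acknowledged zone is never treated as normal, acknowledging Zone 1 alone cannot produce an all-clear while Zones 2 and 3 are still in alarm.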
Bug 4: Original Timestamps Lost During Gateway Offline Replay
Events buffered while gateway was offline were timestamped in the cloud with the replay time, not the original event time. Timeline became inaccurate for compliance reporting.
Fix: Gateway now preserves original event timestamps and transmits them during replay.
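A minimal sketch of timestamp-preserving replay follows. It assumes each event carries its own `occurred_at` field and that the cloud deduplicates on event ID; `Gateway`, `FakeCloud`, and the field names are hypothetical.

```python
class FakeCloud:
    """In-memory stand-in for the cloud event log, keyed by event ID."""

    def __init__(self):
        self.log = {}

    def ingest(self, event):
        # Keep the first copy of each event ID so a replayed event is not duplicated.
        self.log.setdefault(event["id"], event)


class Gateway:
    """Hypothetical gateway that buffers events while the cloud is unreachable."""

    def __init__(self, cloud):
        self.cloud = cloud
        self.online = True
        self.buffer = []

    def report(self, event_id, zone, occurred_at):
        # The timestamp is captured when the event happens, not when it is sent.
        event = {"id": event_id, "zone": zone, "occurred_at": occurred_at}
        if self.online:
            self.cloud.ingest(event)
        else:
            self.buffer.append(event)

    def reconnect(self):
        self.online = True
        while self.buffer:
            # Replay in original order; the stored occurred_at is sent unchanged.
            self.cloud.ingest(self.buffer.pop(0))


cloud = FakeCloud()
gateway = Gateway(cloud)
gateway.online = False
gateway.report("evt-1", zone=1, occurred_at="2024-01-01T10:00:00Z")
gateway.reconnect()
print(cloud.log["evt-1"]["occurred_at"])
```

The cloud stores whatever `occurred_at` the gateway sends, so the timeline reflects when the alarm happened, not when connectivity returned.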
Bug 5: Event Loss on Cloud 5xx Errors
When cloud returned a 5xx server error (transient), the gateway gave up after one retry and discarded the event. Transient cloud issues resulted in permanent event loss.
Fix: Retry logic now distinguishes between client errors (stop) and server errors (indefinite retry).
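The distinction between client and server errors can be sketched as a retry loop with capped exponential backoff. The backoff schedule, the function name `send_with_retry`, and the `PermanentRejection` exception are assumptions made for illustration; the source states only that client errors stop and server errors retry indefinitely.

```python
import time


class PermanentRejection(Exception):
    """Raised when the cloud answers with a 4xx status (hypothetical name)."""


def send_with_retry(send, event, sleep=time.sleep, base_delay=0.5, max_delay=30.0):
    """Call send(event) until it returns a 2xx status.

    4xx: stop immediately, retrying cannot help.
    Anything else (5xx in particular): treat as transient and retry with
    capped exponential backoff, indefinitely.
    """
    delay = base_delay
    while True:
        status = send(event)
        if 200 <= status < 300:
            return status
        if 400 <= status < 500:
            raise PermanentRejection(f"cloud rejected event with status {status}")
        sleep(delay)
        delay = min(max_delay, delay * 2)


# Two transient server errors followed by success (simulated responses).
responses = iter([503, 500, 201])
pauses = []
result = send_with_retry(lambda e: next(responses), {"id": "evt-1"}, sleep=pauses.append)
print(result, pauses)
```

Capping the delay keeps an indefinite retry from backing off so far that an alarm event sits undelivered long after the cloud has recovered.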
| Metric | Before TestBot | After TestBot | Improvement |
|---|---|---|---|
| Test execution time | 8-10 hours | 18 minutes | 97% reduction |
| Test coverage | ~25% | 100% | +300% |
| Time to detect bugs | 6-12 weeks (field) | Immediate (pre-release) | Before customer |
| Release cycle time | 5-6 weeks | 3 weeks | 40-50% shorter |
| Field failures (post-release) | 2-3 per year | 0 (ongoing) | 100% elimination |
| Estimated savings | - | $2.3M annually | - |
The fire safety system demonstrates a critical class of product where integration testing is regulatory-mandated (NFPA 72, EN 54). Manual testing of a three-tier system (panel, gateway, cloud) is not scalable. TestBot's Custom Simulator Agent made comprehensive integration testing achievable:
For any product integrating external systems, networks, or cloud platforms, automated three-tier validation is essential. The automotive instrument cluster appendix demonstrates how TestBot's multi-agent architecture scales to include mechanical actuation and visual validation - applicable across any embedded system requiring coordinated electrical, mechanical, and visual testing.