A panic in a goroutine without a recover takes the whole panel down. The
per-node heartbeat and traffic-sync goroutines run remote network I/O for
each node with no panic isolation, so one misbehaving node could crash the
master.
Add common.GoRecover(name, fn), which runs fn in a goroutine guarded by a
recover that logs the panic with a stack trace instead of crashing, and use
it for the per-node heartbeat, traffic-sync and global-push goroutines. The
deferred WaitGroup/semaphore releases still run during panic unwind, so the
group never stalls. Other background goroutines can adopt the same helper.