Federated learning (FL) trains a shared model across many distributed clients (such as phones, browsers, IoT devices, or data silos) without centralizing raw data. In each round of training, selected clients download the current global model, train locally on their private data, and then upload updated parameters or gradients back to a central server for aggregation.
Keeping raw data on the device is a win for privacy, but it creates a new bottleneck: communication. Model parameters can run to tens or hundreds of megabytes, and sending them repeatedly to and from hundreds or thousands of clients quickly adds up to substantial bandwidth consumption and time overhead. For many deployments, network cost and latency matter as much as model accuracy.
The Federated Learning Communication Cost Calculator on this page helps you quantify that overhead. Given your model size, number of participating clients, number of training rounds, and typical client uplink/downlink bandwidth, it estimates the data transferred per client, per round, and across the whole training run, along with the per-round and cumulative communication time per client.
Use these estimates to decide whether a given FL setup is feasible on your network, to compare alternative designs (for example, fewer clients vs. more rounds), or to motivate optimizations such as compression and sparse updates.
The form above asks you to enter five key parameters that describe your federated learning configuration.
What it is: The size of the model parameters that each client downloads and uploads in every round, measured in megabytes (MB). You can think of this as the size of the serialized weights file that would be transferred over the network.
Typical values: from a few megabytes for compact on-device models, through tens of megabytes for mid-sized networks (the worked example below uses 20 MB), up to hundreds of megabytes for large cross-silo models (the comparison table uses 200 MB).
If your framework reports model size in megabits or gigabytes, convert to megabytes before using the calculator.
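For instance, a minimal Python helper for this conversion might look like the sketch below (the function name and unit labels are illustrative, not part of the calculator):

```python
def to_megabytes(value, unit):
    """Convert a model size to megabytes (MB).

    Supported units: "MB" (no-op), "Mb" (megabits; 8 bits per byte),
    and "GB" (gigabytes, using the 1024 convention used on this page).
    """
    factors = {"MB": 1.0, "Mb": 1.0 / 8.0, "GB": 1024.0}
    return value * factors[unit]

print(to_megabytes(160, "Mb"))  # 160 megabits -> 20.0 MB
print(to_megabytes(0.5, "GB"))  # 0.5 GB -> 512.0 MB
```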
What it is: The number of devices that participate in each round. In cross-device FL, this might be a small fraction of a very large eligible pool; here, you should enter the actual number of clients per round, not the total fleet size.
Typical values: tens to a few hundred clients per round in cross-device settings (the worked example below uses 100), and often just 2–20 organizations in cross-silo settings (the comparison table uses 10).
What it is: The number of global aggregation steps you plan to run. In each round, the server sends the current model to the selected clients, they train locally, and then they upload updates for aggregation.
Typical values: From a few dozen rounds (simple models, large datasets, strong clients) to several hundred or more (non-IID data, constrained devices, aggressive privacy or robustness requirements).
What it is: The typical upload bandwidth available to each client, in megabits per second (Mbps). This controls how fast clients can send updates back to the server.
Typical values: on the order of 5–20 Mbps for mobile and residential connections (the worked example below uses 10 Mbps), and hundreds of Mbps for well-provisioned data-center links (the comparison table uses 200 Mbps).
What it is: The typical download bandwidth available to each client, in megabits per second (Mbps). This controls how fast clients can receive the global model from the server.
Typical values: on the order of 20–100 Mbps for mobile and residential connections (the worked example below uses 20 Mbps), and hundreds of Mbps for data-center links (the comparison table uses 500 Mbps).
In many real deployments, downlink is faster than uplink, so upload time often dominates the communication cost.
The calculator uses a simple but widely applicable model of synchronous federated learning where each participating client sends and receives the full model in every round.
Each round, each client downloads the current global model of size M and uploads an updated model of approximately the same size. Under that assumption, the total data transferred per client per round is:
D_c = 2 × M
where D_c is measured in megabytes (download + upload).
The total data transferred across all clients in one round is then:
D_r = D_c × N = 2 × M × N (in MB per round).
Over R training rounds, the total communication volume across all clients becomes:
D_total = D_r × R = 2 × M × N × R (in MB over the entire training run).
If you prefer gigabytes, you can divide the result by 1024.
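As a concrete illustration, these volume formulas translate directly into a few lines of Python. This is a minimal sketch; `data_volume` and its parameter names are illustrative, not the calculator's actual code:

```python
def data_volume(model_mb, clients, rounds):
    """Estimate communication volume for synchronous FL with full-model exchange.

    model_mb -- model size M in megabytes
    clients  -- clients per round N
    rounds   -- training rounds R
    Returns (D_c, D_r, D_total) in megabytes.
    """
    d_c = 2 * model_mb      # D_c = 2 * M (download + upload per client per round)
    d_r = d_c * clients     # D_r = 2 * M * N (all clients, one round)
    d_total = d_r * rounds  # D_total = 2 * M * N * R (entire run)
    return d_c, d_r, d_total
```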
Bandwidth is provided in megabits per second, but model size is in megabytes. Since 1 byte = 8 bits, a model of M MB contains 8 × M megabits of data.
The download time per client per round is:
t_down = (M × 8) / B_d (seconds)
and the upload time per client per round is:
t_up = (M × 8) / B_u (seconds)
Assuming clients cannot perfectly overlap upload and download for the same round, the per-round communication time per client is approximated as:
t_round = t_down + t_up (seconds per round per client).
Multiplying by R gives an estimate of the total communication time per client across all rounds:
T_total = t_round × R (seconds).
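The timing formulas can be sketched the same way (again with illustrative names; the factor of 8 is the megabyte-to-megabit conversion described above):

```python
def communication_time(model_mb, uplink_mbps, downlink_mbps, rounds):
    """Estimate per-client communication time for synchronous FL.

    Bandwidths are in megabits per second (Mbps).
    Returns (t_down, t_up, t_round, T_total) in seconds.
    """
    megabits = 8 * model_mb            # convert MB to megabits
    t_down = megabits / downlink_mbps  # t_down = 8M / B_d
    t_up = megabits / uplink_mbps      # t_up = 8M / B_u
    t_round = t_down + t_up            # no download/upload overlap assumed
    return t_down, t_up, t_round, t_round * rounds
```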
The calculator combines your inputs using the formulas above to output a few key quantities. Exact labels may vary depending on your implementation, but conceptually you will see the data transferred per client per round (D_c), the data per round across all clients (D_r), the total communication volume over the run (D_total), and the per-round and cumulative communication time per client (t_round and T_total).
These figures are idealized, first-order estimates, not strict guarantees. Network conditions fluctuate, and production systems often overlap communication, computation, and scheduling. Use the outputs to compare scenarios and identify potential bottlenecks rather than as precise SLAs.
Suppose you want to coordinate a federated learning experiment across mobile phones with the following configuration: a model size of M = 20 MB, N = 100 clients per round, R = 50 rounds, an uplink of B_u = 10 Mbps, and a downlink of B_d = 20 Mbps.
D_c = 2 × M = 2 × 20 MB = 40 MB
Each client transfers 40 MB per round (20 MB down, 20 MB up).
D_r = D_c × N = 40 MB × 100 = 4,000 MB
This is about 4,000 / 1024 ≈ 3.9 GB of data per round across all clients.
D_total = D_r × R = 4,000 MB × 50 = 200,000 MB
That is roughly 200,000 / 1024 ≈ 195 GB of total data transferred across the fleet for the entire job.
First convert model size to megabits: 8 × M = 8 × 20 = 160 megabits.
Download time per client per round:
t_down = (8 × M) / B_d = 160 / 20 = 8 seconds
Upload time per client per round:
t_up = (8 × M) / B_u = 160 / 10 = 16 seconds
Per-round communication time per client:
t_round = t_down + t_up = 8 + 16 = 24 seconds
Total communication time per client across all 50 rounds:
T_total = t_round × R = 24 × 50 = 1,200 seconds
That is 1,200 / 60 = 20 minutes of cumulative communication time per client over the full training run, assuming ideal conditions and no overlap with computation.
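If you want to reproduce these numbers programmatically, plugging the example configuration into the two sketches above (assuming `data_volume` and `communication_time` are defined as shown earlier) gives the same results:

```python
d_c, d_r, d_total = data_volume(model_mb=20, clients=100, rounds=50)
print(d_c, d_r, d_total)   # 40 4000 200000 (MB)
print(d_total / 1024)      # ~195.3 GB across the fleet

t_down, t_up, t_round, t_total = communication_time(
    model_mb=20, uplink_mbps=10, downlink_mbps=20, rounds=50)
print(t_down, t_up, t_round)  # 8.0 16.0 24.0 (seconds)
print(t_total / 60)           # 20.0 minutes per client
```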
With these numbers in mind, you might decide that 195 GB of aggregate traffic and roughly 20 minutes of communication per client are acceptable on Wi-Fi but too heavy for mobile data, motivating techniques such as smaller models or fewer rounds.
The same formulas can be used to compare different FL regimes. The table below illustrates how communication characteristics change between a cross-device mobile scenario and a cross-silo data-center scenario, using plausible (but simplified) numbers.
| Scenario | Model Size (MB) | Clients per Round | Rounds | Uplink / Downlink (Mbps) | Total Data Across Clients | Per-round Time per Client |
|---|---|---|---|---|---|---|
| Cross-device mobile | 20 | 100 | 50 | 10 / 20 | ≈195 GB | ≈24 s |
| Cross-silo data center | 200 | 10 | 100 | 200 / 500 | ≈390 GB | ≈11.2 s |
In the cross-device case, you move less total data than in the cross-silo case, but each client is slower due to weaker bandwidth. In the cross-silo case, you can afford much larger models and more rounds because each site has strong network links, even though the total traffic can be higher.
Use the calculator to plug in your own configurations and see where your scenario falls on this spectrum.
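If you would rather script this comparison than click through the form, you could loop the earlier sketches over a list of scenarios (again assuming the `data_volume` and `communication_time` helpers defined above):

```python
scenarios = [
    # (name, model MB, clients per round, rounds, uplink Mbps, downlink Mbps)
    ("Cross-device mobile", 20, 100, 50, 10, 20),
    ("Cross-silo data center", 200, 10, 100, 200, 500),
]
for name, m, n, r, up, down in scenarios:
    _, _, d_total = data_volume(m, n, r)
    _, _, t_round, _ = communication_time(m, up, down, r)
    print(f"{name}: {d_total / 1024:.1f} GB total, "
          f"{t_round:.1f} s per round per client")
```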
The calculator is intentionally simple and is designed for quick, back-of-the-envelope estimates. It does not model all the complexities of real federated learning systems. Keep the following assumptions and limitations in mind when interpreting results:

- Every selected client sends and receives the full model in every round; compression, quantization, and sparse updates are not modeled.
- All clients are assumed to have the same uplink and downlink bandwidth; stragglers, dropouts, and fluctuating network conditions are ignored.
- Communication is assumed not to overlap with local computation, and protocol overhead (headers, handshakes, retransmissions) is not counted.
Because of these simplifications, treat the outputs as approximate indicators and apply appropriate safety margins when planning budgets, SLAs, or production rollouts.
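One simple way to build in such a margin is to pad the raw estimate with an overhead factor; the 1.3 below is an assumed placeholder, not a measured value:

```python
OVERHEAD_FACTOR = 1.3  # assumed padding for protocol overhead, retries, stragglers

_, _, d_total = data_volume(model_mb=20, clients=100, rounds=50)
budget_gb = (d_total / 1024) * OVERHEAD_FACTOR
print(f"Raw estimate: {d_total / 1024:.0f} GB; planning budget: {budget_gb:.0f} GB")
```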
Here are some common ways practitioners use this type of communication cost estimate:

- Checking whether a given FL configuration is feasible on the available network.
- Comparing alternative designs, such as fewer clients per round versus more rounds.
- Budgeting bandwidth and wall-clock overhead before a deployment.
- Motivating optimizations such as compression, quantization, or sparse updates.
By adjusting the inputs interactively and observing how communication volume and time change, you can quickly build intuition about which levers (model size, clients, rounds, bandwidth) matter most for your federated learning deployment.