GAAvernor

GAR

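After receiving the gradients $V_{i}^{t}$ from the distributed workers, the server uses a gradient aggregation rule (GAR) $F$ to merge the different update directions into a single direction. In general, the parameters are updated as $θ_{t + 1} = θ_{t} - λ F (V_{1}^{t}, \dots, V_{n}^{t})$, where $λ$ is the learning rate.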

Classical GAR

$$F (V_{1}, \dots, V_{n}) = \frac{1}{n} \sum_{i = 1}^{n} V_{i}$$

Linear GAR

$$F (V_{1}, \dots, V_{n}) = \sum_{i = 1}^{n} α_{i} V_{i}$$
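The two rules and the update step can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names, the list-of-floats gradient representation, and the example values are all assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def classical_gar(grads: Sequence[Vector]) -> Vector:
    """Classical GAR: coordinate-wise mean of the n received gradients."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def linear_gar(grads: Sequence[Vector], alphas: Sequence[float]) -> Vector:
    """Linear GAR: weighted sum of the gradients, one weight per worker."""
    return [sum(a * v for a, v in zip(alphas, col)) for col in zip(*grads)]


def sgd_step(theta: Vector, aggregated: Vector, lr: float) -> Vector:
    """Parameter update: theta_{t+1} = theta_t - lr * F(V_1, ..., V_n)."""
    return [p - lr * g for p, g in zip(theta, aggregated)]


# Illustrative values only (three workers, two parameters).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean_grad = classical_gar(grads)                      # [3.0, 4.0]
weighted = linear_gar(grads, [0.5, 0.5, 0.0])         # [2.0, 3.0]
theta_next = sgd_step([0.0, 0.0], mean_grad, lr=0.5)  # [-1.5, -2.0]
```

The classical rule is the special case of the linear rule with every weight equal to $1/n$.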

Workers

Benign Worker

A worker is a Benign Worker if its gradient $V^{t}$ at time $t$ is unbiased, i.e. $E [V^{t}] = g_{t}$.

Byzantine Worker

A worker is a Byzantine Worker if its gradient $V^{t}$ at time $t$ is biased, i.e. $E [V^{t}] - g_{t} \neq 0$.

Role Function

Determines whether a given worker is currently Benign or Byzantine. A Byzantine Worker tampers with its gradient before sending it to the server.
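A role function can be pictured as a predicate over (worker, time step). The sketch below is purely illustrative: the fixed set of Byzantine ids and the sign-flip tampering are assumptions chosen for simplicity, not the role function or the attack studied in the source.

```python
from typing import Callable, List, Set

Vector = List[float]
RoleFunction = Callable[[int, int], bool]


def make_role_function(byzantine_ids: Set[int]) -> RoleFunction:
    """Build role(worker_id, t) -> True when the worker is Byzantine at step t.

    Assumption: a static assignment; a real role function may change over time.
    """
    def role(worker_id: int, t: int) -> bool:
        return worker_id in byzantine_ids
    return role


def submit_gradient(worker_id: int, t: int, grad: Vector, role: RoleFunction) -> Vector:
    """Return the gradient that the worker actually sends to the server."""
    if role(worker_id, t):
        # Hypothetical tampering: flip the sign of every coordinate.
        return [-g for g in grad]
    return list(grad)


role = make_role_function({2})
honest = submit_gradient(0, 0, [1.0, -2.0], role)    # sent unchanged
tampered = submit_gradient(2, 0, [1.0, -2.0], role)  # sign-flipped
```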

Previous Defenses

Brute-Force

Find an optimal set $C^{*}$, a subset of $Q$ of size $n - m$:

$$C^{*} = \arg\min_{C \in R} \max_{(V_{i}, V_{j}) \in C \times C} \| V_{i} - V_{j} \|$$

The GAR is:

$$F (V_{1}, \dots, V_{n}) = \frac{1}{n - m} \sum_{V \in C^{*}} V$$

Required Byzantine ratio: $n \geq 2 m + 1$
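A direct, exponential-time sketch of this rule follows. It assumes that $R$ is the family of all subsets of $Q$ of size $n - m$ and that $m$ is the number of Byzantine workers; the function names and the example gradients are illustrative.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two gradients."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_gar(grads: Sequence[Vector], m: int) -> List[float]:
    """Average the (n - m)-subset with the smallest maximum pairwise distance."""
    n = len(grads)
    if n < 2 * m + 1:
        raise ValueError("Brute-Force requires n >= 2m + 1")

    def diameter(subset: Sequence[Vector]) -> float:
        # Largest pairwise distance inside the candidate subset.
        return max((l2(a, b) for a, b in combinations(subset, 2)), default=0.0)

    # Enumerate every candidate subset (assumed to be the family R).
    best = min(combinations(grads, n - m), key=diameter)
    return [sum(col) / (n - m) for col in zip(*best)]


# Four clustered gradients and one outlier (illustrative values).
grads = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [100.0, 0.0]]
aggregated = brute_force_gar(grads, m=1)
```

The enumeration visits every subset of size $n - m$, so the cost grows combinatorially with $n$; the sketch is only practical for a handful of workers.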

GeoMed

Select, as the final gradient, the gradient of the worker whose sum of distances to all other workers is smallest.

The GAR is:

$$F (V_{1}, \dots, V_{n}) := \arg\min_{V_{i}} \sum_{j \neq i} \| V_{i} - V_{j} \|$$

Required Byzantine ratio: $n \geq 2 m + 1$
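The selection rule above can be sketched as follows. As in the formula, the result is restricted to the received gradients themselves; names and example values are assumptions.

```python
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two gradients."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geomed_gar(grads: Sequence[Vector]) -> List[float]:
    """Return the received gradient with the smallest total distance to the others."""
    def total_distance(i: int) -> float:
        return sum(l2(grads[i], v) for j, v in enumerate(grads) if j != i)

    best = min(range(len(grads)), key=total_distance)
    return list(grads[best])


# Four clustered gradients and one outlier (illustrative values).
grads = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [100.0, 0.0]]
selected = geomed_gar(grads)
```

In this example the total distances are 106, 103, 102, 103 and 394, so the third gradient is selected and the outlier has no influence on the output.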

Krum

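First, in the manner of the Brute-Force GAR, find an optimal set of size $n - m - 2$; then, for each worker in that set, compute the score $s (V_{i}) = \sum_{i \to j} \| V_{i} - V_{j} \|^{2}$. In the original Krum formulation, $i \to j$ means that $V_{j}$ is among the $n - m - 2$ gradients closest to $V_{i}$.

The GAR is:

$$F (V_{1}, \dots, V_{n}) = \arg\min_{V_{i} \in Q} s (V_{i})$$

Required Byzantine ratio: $n \geq 2 m + 3$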
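The score and the selection can be sketched as below. The sketch follows the standard Krum reading, in which $i \to j$ ranges over the $n - m - 2$ nearest neighbours of $V_{i}$; that reading, the function names, and the example values are assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two gradients."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_gar(grads: Sequence[Vector], m: int) -> List[float]:
    """Return the gradient whose n - m - 2 nearest neighbours are closest to it."""
    n = len(grads)
    if n < 2 * m + 3:
        raise ValueError("Krum requires n >= 2m + 3")
    k = n - m - 2  # number of neighbours that enter each score

    def score(i: int) -> float:
        dists = sorted(sq_l2(grads[i], v) for j, v in enumerate(grads) if j != i)
        return sum(dists[:k])

    best = min(range(n), key=score)
    return list(grads[best])


# Four clustered gradients and one outlier (illustrative values).
grads = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [4.0, 0.0], [100.0, 0.0]]
selected = krum_gar(grads, m=1)
```

With $n = 5$ and $m = 1$ each score sums the two smallest squared distances, giving 5, 2, 5 and 13 for the clustered gradients, so the second gradient wins.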

Bulyan

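First, it runs Krum to obtain a selection set of $n - 2 m$ gradients. Then it computes $F$ coordinate by coordinate: the $i$-th coordinate of $F$ is the average of the $n - 4 m$ values closest to the median of the $i$-th coordinates in the selection set.

Required Byzantine ratio: $n \geq 4 m + 3$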
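The two stages can be sketched as follows. One simplification is assumed and should be read as such: the selection set is taken to be the $n - 2 m$ gradients with the lowest Krum scores computed once, rather than rerunning Krum after each pick. Names and example values are illustrative.

```python
from statistics import median
from typing import List, Sequence

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two gradients."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(grads: Sequence[Vector], m: int) -> List[float]:
    """Krum score of every gradient: sum over its n - m - 2 nearest neighbours."""
    k = len(grads) - m - 2
    scores = []
    for i, v in enumerate(grads):
        dists = sorted(sq_l2(v, u) for j, u in enumerate(grads) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def bulyan_gar(grads: Sequence[Vector], m: int) -> List[float]:
    """Krum-based selection followed by a trimmed mean around the median."""
    n = len(grads)
    if n < 4 * m + 3:
        raise ValueError("Bulyan requires n >= 4m + 3")
    scores = krum_scores(grads, m)
    # Simplification (assumption): keep the n - 2m lowest-scoring gradients.
    order = sorted(range(n), key=lambda i: scores[i])
    selected = [grads[i] for i in order[: n - 2 * m]]
    beta = n - 4 * m
    aggregated = []
    for column in zip(*selected):
        med = median(column)
        # Average the beta values closest to the coordinate-wise median.
        nearest = sorted(column, key=lambda x: abs(x - med))[:beta]
        aggregated.append(sum(nearest) / beta)
    return aggregated


# Six roughly aligned gradients and one outlier (illustrative values).
grads = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0],
         [5.0, 50.0], [7.0, 70.0], [1000.0, -1000.0]]
aggregated = bulyan_gar(grads, m=1)
```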

Security Assumptions

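The server is secure and trusted.

At least one worker is not controlled by the adversary.

The workers' datasets are independent and identically distributed.

The GAA has access to a validation set.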

Goals

Ensure that distributed learning obtains correct gradients and reduces the loss to an acceptable level.

Distributed Learning as a Markov Decision Process

States $s_{t} := (Q_{t}, θ_{t}, \hat{f}_{B} (θ_{t}))$

$θ_{t}$ denotes the server's parameters, $Q_{t}$ the received gradients, and $\hat{f}_{B} (θ_{t})$ the server's loss on the validation set $B$.

Actions: the weights assigned to the individual workers.

Rewards $\hat{f}_{B} (θ_{t}) - \hat{f}_{B} (θ_{t + 1})$, the decrease in validation loss produced by the update.
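One transition of this decision process can be sketched as follows. The sketch assumes that the action is used as the weight vector of a Linear GAR and that the validation loss is a simple quadratic stand-in; both, along with all names and values, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mdp_step(
    theta: Vector,
    grads: Sequence[Vector],
    action: Sequence[float],
    lr: float,
    val_loss: Callable[[Vector], float],
) -> Tuple[Vector, float]:
    """Apply one weighted update and return (theta_{t+1}, reward)."""
    loss_t = val_loss(theta)
    # Assumption: the action supplies the per-worker weights of a linear GAR.
    aggregated = [sum(a * v for a, v in zip(action, col)) for col in zip(*grads)]
    theta_next = [p - lr * g for p, g in zip(theta, aggregated)]
    # Reward: validation loss before the update minus validation loss after it.
    reward = loss_t - val_loss(theta_next)
    return theta_next, reward


def val_loss(theta: Vector) -> float:
    """Hypothetical stand-in for the loss on the validation set B."""
    return sum(p * p for p in theta)


theta = [1.0, 1.0]
grads = [[2.0, 2.0], [-2.0, -2.0]]  # second gradient is sign-flipped
_, reward_trust_first = mdp_step(theta, grads, [1.0, 0.0], 0.25, val_loss)
_, reward_trust_second = mdp_step(theta, grads, [0.0, 1.0], 0.25, val_loss)
```

Putting all weight on the first gradient yields a reward of 1.5, while trusting the sign-flipped one yields -2.5, so in this toy setting the reward separates the helpful gradient from the harmful one.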
