[HDLBits] Cs450/gshare

Branch direction predictor
分支方向预测器

A branch direction predictor generates taken/not-taken predictions of the direction of conditional branch instructions. It sits near the front of the processor pipeline, and is responsible for directing instruction fetch down the (hopefully) correct program execution path. A branch direction predictor is usually used with a branch target buffer (BTB), where the BTB predicts the target addresses and the direction predictor chooses whether to branch to the target or keep fetching along the fall-through path.
分支方向预测器会预测条件分支指令方向的执行/不执行情况。它位于处理器流水线前端附近，负责引导指令提取到（希望）正确的程序执行路径。分支方向预测器通常与分支目标缓冲区 (BTB) 一起使用，其中 BTB 预测目标地址，方向预测器选择是分支到目标地址还是继续沿着 fall-through 路径提取指令。

Sometime later in the pipeline (typically at branch execution or retire), the results of executed branch instructions are sent back to the branch predictor to train it to predict more accurately in the future by observing past branch behaviour. There can also be pipeline flushes when there is a mispredicted branch.
在流水线的某个稍后阶段（通常是在分支执行或退出时），已执行分支指令的结果会被发送回分支预测器，以便通过观察过去的分支行为来训练它，使其能够更准确地预测未来的分支。当出现分支预测错误时，也可能会触发流水线刷新。

Branch direction predictor located in the Fetch stage. The branch predictor makes a prediction using the current pc and history register, with the result of the prediction affecting the next pc value. Training and misprediction requests come from later in the pipeline.
分支方向预测器位于提取阶段。分支预测器使用当前 pc 和历史寄存器进行预测，预测结果会影响下一个 pc 值。训练和错误预测请求来自流水线的后期。

For this exercise, the branch direction predictor is assumed to sit in the fetch stage of a hypothetical processor pipeline shown in the diagram on the right. This exercise builds only the branch direction predictor, indicated by the blue dashed rectangle in the diagram.
在本练习中，假设分支方向预测器位于右图所示的假设处理器流水线的提取阶段。本练习仅构建分支方向预测器，图中蓝色虚线矩形表示。

The branch direction prediction is a combinational path: The pc register is used to compute the taken/not-taken prediction, which affects the next-pc multiplexer to determine the value of pc in the next cycle.
分支方向预测是一条组合路径： pc 寄存器用于计算是否被采取的预测，这会影响下一个 pc 多路复用器以确定下一个周期中 pc 的值。

Conversely, updates to the pattern history table (PHT) and branch history register take effect at the next positive clock edge, as would be expected for state stored in flip-flops.
相反，模式历史表 (PHT) 和分支历史寄存器的更新将在下一个正时钟边沿生效，正如触发器中存储的状态所预期的那样。

Gshare predictor Gshare 预测器

Branch direction predictors are often structured as tables of counters indexed by the program counter and branch history. The table index is a hash of the branch address and history, and tries to give each branch and history combination its own table entry (or at least, reduce the number of collisions). Each table entry contains a two-bit saturating counter to remember the branch direction when the same branch and history pattern executed in the past.
分支方向预测器通常构造为由程序计数器和分支历史记录索引的计数器表。表索引是分支地址和历史记录的哈希值，并尝试为每个分支和历史记录组合赋予其自己的表条目（或至少减少冲突次数）。每个表条目包含一个两位饱和计数器，用于记住过去执行相同分支和历史记录模式时的分支方向。

One example of this style of predictor is the gshare predictor[1]. In the gshare algorithm, the branch address (pc) and history bits "share" the table index bits. The basic gshare algorithm computes an N-bit PHT table index by xoring N branch address bits and N global branch history bits together.
这种预测器的一个例子是 gshare 预测器 [1] 。在 gshare 算法中，分支地址 ( pc ) 和历史位“共享”表索引位。基本的 gshare 算法通过对 N 个分支地址位和 N 个全局分支历史位进行异或运算来计算 N 位 PHT 表索引。

The N-bit index is then used to access one entry of a 2N-entry table of two-bit saturating counters. The value of this counter provides the prediction (0 or 1 = not taken, 2 or 3 = taken).
然后使用 N 位索引访问 2 N 个两位饱和计数器表中的一项。该计数器的值用于预测（0 或 1 = 未执行，2 或 3 = 执行）。

Training indexes the table in a similar way. The training pc and history are used to compute the table index. Then, the two-bit counter at that index is incremented or decremented depending on the actual outcome of the branch.
训练以类似的方式对表进行索引。训练 pc 和历史记录用于计算表索引。然后，根据分支的实际结果，该索引处的两位计数器会递增或递减。

References 参考

S. McFarling, "Combining Branch Predictors", WRL Technical Note TN-36, Jun. 1993
S. McFarling，“ 合并分支预测器 ”， WRL 技术说明 TN-36，1993 年 6 月

Description 描述

Build a gshare branch predictor with 7-bit pc and 7-bit global history, hashed (using xor) into a 7-bit index. This index accesses a 128-entry table of two-bit saturating counters (similar to cs450/counter_2bc). The branch predictor should contain a 7-bit global branch history register (similar to cs450/history_shift).
构建一个 gshare 分支预测器，其 7 位 pc 和 7 位全局历史记录，并通过异或操作将其散列到一个 7 位索引中。该索引访问一个包含 128 个条目的两位饱和计数器表（类似于 cs450/counter_2bc）。 )。分支预测器应包含一个 7 位全局分支历史寄存器（类似于 cs450/history_shift ）。

The branch predictor has two sets of interfaces: One for doing predictions and one for doing training. The prediction interface is used in the processor's Fetch stage to ask the branch predictor for branch direction predictions for the instructions being fetched. Once these branches proceed down the pipeline and are executed, the true outcomes of the branches become known. The branch predictor is then trained using the actual branch direction outcomes.
分支预测器有两组接口：一组用于预测，一组用于训练。预测接口用于处理器的提取阶段，用于请求分支预测器对正在提取的指令进行分支方向预测。一旦这些分支指令沿着流水线向下执行，就会知道分支的真实结果。然后，使用实际的分支方向结果对分支预测器进行训练。

When a branch prediction is requested (predict_valid = 1) for a given pc, the branch predictor produces the predicted branch direction and state of the branch history register used to make the prediction. The branch history register is then updated (at the next positive clock edge) for the predicted branch.
当针对给定 pc 请求分支预测（ predict_valid = 1）时，分支预测器会生成预测的分支方向以及用于进行预测的分支历史寄存器的状态。然后，分支历史寄存器会在下一个正时钟沿根据预测的分支进行更新。

When training for a branch is requested (train_valid = 1), the branch predictor is told the pc and branch history register value for the branch that is being trained, as well as the actual branch outcome and whether the branch was a misprediction (needing a pipeline flush). Update the pattern history table (PHT) to train the branch predictor to predict this branch more accurately next time. In addition, if the branch being trained is mispredicted, also recover the branch history register to the state immediately after the mispredicting branch completes execution.
当请求训练分支时 ( train_valid = 1)，分支预测器会被告知正在训练的分支的 pc 和分支历史寄存器值，以及实际分支结果以及该分支是否为错误预测（需要流水线刷新）。更新模式历史表 (PHT) 以训练分支预测器，使其下次能够更准确地预测该分支。此外，如果正在训练的分支预测错误，还会将分支历史寄存器恢复到错误预测分支执行完成后的状态。

If training for a misprediction and a prediction (for a different, younger instruction) occurs in the same cycle, both operations will want to modify the branch history register. When this happens, training takes precedence, because the branch being predicted will be discarded anyway. If training and prediction of the same PHT entry happen at the same time, the prediction sees the PHT state before training because training only modifies the PHT at the next positive clock edge. The following timing diagram shows the timing when training and predicting PHT entry 0 at the same time. The training request at cycle 4 changes the PHT entry state in cycle 5, but the prediction request in cycle 4 outputs the PHT state at cycle 4, without considering the effect of the training request in cycle 4.
如果在同一周期内发生针对错误预测和预测（针对不同的、较新的指令）的训练，则这两个操作都需要修改分支历史寄存器。当这种情况发生时，训练优先，因为被预测的分支无论如何都会被丢弃。如果对同一 PHT 条目的训练和预测同时发生，则预测会在训练之前看到 PHT 状态，因为训练只会在下一个时钟上升沿修改 PHT。下图显示了同时训练和预测 PHT 条目 0 的时序。周期 4 的训练请求会改变周期 5 的 PHT 条目状态，但周期 4 的预测请求会输出周期 4 的 PHT 状态，而没有考虑周期 4 训练请求的影响。

Training and predicting using PHT entry 0 at the same time123456789clktrain_validtrain_pc ^ train_history0train_takenpht[0]123predict_validpredict_pc ^ predict_history0predict_takentrainpredictabc

areset is an asynchronous reset that clears the entire PHT to 2b'01 (weakly not-taken). It also clears the global history register to 0.
areset 是一个异步复位，它会将整个 PHT 清零为 2b'01（弱不执行）。它还会将全局历史寄存器清零。

module top_module#(parameter N = 128,parameter M = 7
)(input clk,input areset,input  predict_valid,input  [M-1:0] predict_pc,output predict_taken,output [M-1:0] predict_history,input train_valid,input train_taken,input train_mispredicted,input [M-1:0] train_history,input [M-1:0] train_pc
);reg [1:0] PHT [N-1:0];reg [M-1:0] GHR;assign predict_history = GHR;assign predict_taken = PHT[predict_history ^ predict_pc] > 1;integer i;always@(posedge clk or posedge areset) beginif(areset) beginfor (i=0; i<N; i=i+1) PHT[i] <= 1;GHR <= 0;endelse begin//处理GHR，回滚优先if(train_valid & train_mispredicted)beginGHR <= {train_history[M-2:0],train_taken};endelse if(predict_valid) beginGHR <= {GHR[M-2:0],predict_taken};   end//处理PHTif(train_valid) beginif(train_taken) beginPHT[train_history ^ train_pc] <= PHT[train_history ^ train_pc] == 3 ? 3 : PHT[train_history ^ train_pc] + 1;endelse beginPHT[train_history ^ train_pc] <= PHT[train_history ^ train_pc] == 0 ? 0 : PHT[train_history ^ train_pc] - 1;endend    endend
endmodule

还想用之前的模块，但发现越改越麻烦，还是重写吧。

那么HDLBits就堂堂完结了，希望秋招能找个好工作