Real Talk
Jim Foley, Director of R&D, Real Intent
Jim Foley, prior to his role as R&D Director at Real Intent, was Head for Power Analysis Products at Sequence Design, and has held product management and development roles at Cadence Design Systems. Jim received a BS in Electrical Engineering from Worcester Polytechnic Institute. Go Engineers!

## Ascent Lint Rule of the Month: COMBO_NBA

March 14th, 2013 by Jim Foley, Director of R&D, Real Intent

One of the first things you learn when modeling logic in Verilog is to avoid race conditions. You can do this by coding clocked registers with non-blocking assignments. So why not make life simple, and use non-blocking assignments for combinational logic too?

Let’s back up a bit and review the basics:
A problem occurs when the target of one register assignment feeds into the assignment for the next register stage. Without some kind of delay, a value could ‘race’ from one assignment right through the next register stage in the same instant of simulation time.

    always @(posedge clk)
        bb = f1(aa);  // When clk rises, bb is determined by aa

    always @(posedge clk)
        cc = f2(bb);  // In the same instant, cc could get the new result. This is not what we want!

This race is solved by non-blocking assignments. A non-blocking assignment evaluates its right-hand side immediately, but holds the result until the end of the current simulation cycle. All the ‘@(posedge clk)’ blocks triggered by the same clock are run and the right-hand-side values for each non-blocking assignment are determined. Then the left-hand side of each assignment is updated at the end of the simulation cycle. Next, a new simulation cycle starts and any blocks that are sensitive to the updated signals are triggered, still within the same instant of simulation time.

For example:

    always @(posedge clk)
        bb <= f1(aa);  // When clk rises, f1(aa) is calculated and an update to bb is scheduled.

    always @(posedge clk)
        cc <= f2(bb);  // The original value of bb is used, and an update to cc is scheduled
                       // before the new value of bb is assigned.
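To watch this behavior in a simulator, here is a minimal testbench sketch. The article does not define f1 and f2, so the two functions below (add one, shift left by one) are hypothetical stand-ins, as are the module name and the 8-bit widths.

```verilog
// Two-stage pipeline built from non-blocking assignments.
// f1 and f2 are illustrative stand-ins, not the article's functions.
module nba_pipeline_tb;
  reg       clk = 1'b0;
  reg [7:0] aa  = 8'd1;
  reg [7:0] bb  = 8'd0;
  reg [7:0] cc  = 8'd0;

  function [7:0] f1(input [7:0] x); f1 = x + 8'd1; endfunction
  function [7:0] f2(input [7:0] x); f2 = x << 1;   endfunction

  always @(posedge clk) bb <= f1(aa);
  always @(posedge clk) cc <= f2(bb);  // samples bb before it is updated

  initial begin
    #5 clk = 1'b1;                                 // first rising edge
    #1 $display("edge 1: bb=%0d cc=%0d", bb, cc);  // bb=2 cc=0
    #4 clk = 1'b0;
    #5 clk = 1'b1;                                 // second rising edge
    #1 $display("edge 2: bb=%0d cc=%0d", bb, cc);  // bb=2 cc=4
    $finish;
  end
endmodule
```

After the first edge cc still reflects the old bb, and the new bb only reaches cc on the second edge, which is the one-cycle-per-stage behavior a register pipeline should have.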

So far, so good.  It’s important to note that if you’re calculating some intermediate result and use more than one assignment in a block, a non-blocking assignment is probably not what you want.

Consider the following:

    always @(posedge clk)
    begin
        bb <= f1(aa);  // bb gets scheduled with the value of f1(aa)
        cc <= f2(bb);  // cc will get f2(bb), based on the original value of bb.
    end

Is this register pipeline what the designer intended?  Maybe, but it’s not clear from how the code is written.
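One way to make the intent explicit is to give each interpretation its own coding pattern. The sketch below is only an illustration: the module and signal names are hypothetical, and the arithmetic stands in for the unspecified f1 and f2.

```verilog
// Two readings of "bb then cc", each written so the intent is visible.
module intent_examples (
  input  wire       clk,
  input  wire [7:0] aa,
  output reg  [7:0] cc_pipelined,
  output reg  [7:0] cc_single
);
  // Intent A: a two-stage pipeline, one clocked block per register.
  reg [7:0] bb_reg;
  always @(posedge clk) bb_reg       <= aa + 8'd1;    // stand-in for f1
  always @(posedge clk) cc_pipelined <= bb_reg << 1;  // stand-in for f2

  // Intent B: bb is only an intermediate value, so there is one register.
  reg [7:0] bb_comb;
  always @* bb_comb = aa + 8'd1;                      // combinational, blocking
  always @(posedge clk) cc_single <= bb_comb << 1;
endmodule
```

With Intent A, cc_pipelined lags aa by two clock edges. With Intent B, cc_single lags by one.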

Putting intermediate assignments within a block aside for now, that still leaves us with the question: Why not use non-blocking assignments for combinational logic? Non-blocking assignments model a kind of delay, albeit a delay that executes in zero simulation time. Gates have delays. Why not use non-blocking assignments to model them? If you don’t need an intermediate result of an assignment within a block, what harm does it do?

It turns out that using non-blocking assignments for combinational logic will slow down simulation, and by quite a lot.

When you have a collection of process blocks that feed values from one to another, your optimizing compiler can be smart about scheduling execution to avoid running the same block more than once.  This optimization doesn’t work with events that occur across different simulation cycles, as non-blocking assignments require. Consider this small combinational cloud modeled with non-blocking assignments:

    always @(aa)
        bb <= f1(aa);

    always @(aa, bb)
        cc <= f2(aa, bb);

    always @(aa, bb, cc)
        dd <= f3(aa, bb, cc);

The value of aa changes in the first cycle of a simulation timestep, and the three blocks are executed. Updates are scheduled for bb, cc, and dd, and the results are updated at the end of this first cycle. In the next simulation cycle, the two blocks sensitive to bb and cc are executed again with these updated values, which schedule updates to cc and dd. These are updated at the end of the second cycle. In the third cycle, the change on cc causes the last block to be executed a third time, scheduling a third update to dd for the end of the third cycle.

The successive updates have a compounding effect on activity in the simulator, causing the same logic blocks to be reevaluated multiple times.  If another block is triggered on a change on dd, it will be executed three times, once for each update across three simulation cycles.
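The repeated evaluation can be observed with a counter in the last block. This is a rough sketch, not a benchmark: the arithmetic is a hypothetical stand-in for f1, f2, and f3, and the counter and module names are made up for illustration.

```verilog
// Counts how often the last block of the three-block cloud runs
// for a single change on aa.
module combo_nba_count_tb;
  reg [7:0] aa, bb, cc, dd;
  integer   dd_block_runs;

  always @(aa)         bb <= aa + 8'd1;   // stand-in for f1(aa)
  always @(aa, bb)     cc <= aa + bb;     // stand-in for f2(aa, bb)
  always @(aa, bb, cc) begin
    dd_block_runs = dd_block_runs + 1;    // bookkeeping only, not design logic
    dd <= aa + bb + cc;                   // stand-in for f3(aa, bb, cc)
  end

  initial begin
    #10 aa = 8'd0;          // let the cloud settle from its unknown state
    #10 dd_block_runs = 0;  // start counting from a settled state
        aa = 8'd5;          // the single input change under study
    #1  $display("dd block ran %0d times, dd=%0d", dd_block_runs, dd);
    $finish;
  end
endmodule
```

The walkthrough above predicts three runs of the last block for this one change on aa. The exact count can depend on how a given simulator orders events, so treat the number as indicative. If you change the three `<=` to `=`, the simulator is free to settle the cloud in fewer evaluations, though how many it takes is again simulator-dependent.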

This example illustrates one of the reasons why delay-accurate gate-level simulation is substantially slower than running RTL. However, except for the obvious case of intermediate assignments within the same block, and other less common situations like feedback loops, using non-blocking assignments for combinational logic will still give you the right answer in simulation. Your simulator will just take longer to calculate it. Non-blocking assignments are essential for modeling registers, but can be a silent killer of simulation performance if used where they’re not needed.

The COMBO_NBA lint rule will point out non-blocking assignments that appear in combinational logic blocks. It’s worth an investment of your time to replace these with blocking assignments, even in the short run, so valuable simulation cycles can be spent on verifying your design.
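As a sketch of what such a replacement might look like for the three-block cloud above, the version below uses blocking assignments in a single combinational block. The arithmetic is again a hypothetical stand-in for f1, f2, and f3, and merging the three blocks into one is a choice made for this example rather than something the rule requires.

```verilog
// The combinational cloud rewritten with blocking assignments.
module combo_cloud (
  input  wire [7:0] aa,
  output reg  [7:0] bb, cc, dd
);
  always @* begin
    bb = aa + 8'd1;      // stand-in for f1(aa)
    cc = aa + bb;        // uses the bb computed on the line above
    dd = aa + bb + cc;   // uses the new bb and cc
  end
endmodule
```

Because each blocking assignment takes effect immediately, a single pass through the block produces final values for bb, cc, and dd after a change on aa. Keeping three separate `always @*` blocks with blocking assignments would also work.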