May 24, 2004
Low Power SoC Design
by Jack Horgan - Contributing Editor
Posted anew every four weeks or so, the EDA WEEKLY delivers to its readers information concerning the latest happenings in the EDA industry, covering vendors, products, finances and new developments. Frequently, feature articles on selected public or private EDA companies are presented. Brought to you by EDACafe.com. If we miss a story or subject that you feel deserves to be included, or you just want to suggest a future topic, please contact us!
In highly mobile, frequently wireless applications like cell phones, personal digital assistants, palmtops and multimedia players, the trend is towards smaller and lighter packaging with increasing functionality and lower prices. End users demand considerably longer battery life: several hours of active use, and much longer standby operation, are the norm. For high-performance, non-battery-operated systems such as workstations, servers and networking computers, the concerns are packaging expense, the cost of cooling and fans for heat removal, and reliability under temperature effects. According to one industry source, chip failure rates double for every 10 to 20 °C increase in operating temperature under normal ambient conditions. Laptop and notebook computers have power-related issues common to both.
Power density, measured in watts per square centimeter, is rising dramatically. The Pentium 4 had a power density of 46 watts/cm², seven times that of the Intel486. The ratio of maximum peak power for the two machines is 20:1. The power density of a Pentium processor is already higher than that of a hot plate. In an often-published Intel chart of power density versus process node, the curve is headed toward the power densities of a nuclear reactor and a rocket nozzle over the next five years.
Total power is given by:
P = Pswitching + Pshort-circuit + Pleakage + Pstatic
where the first two terms make up the dynamic power, Pdynamic, and the last two are consumed even when the circuit is not switching.
Power consumption has both dynamic and static components. Dynamic power consists of switching power and short-circuit power. Switching power consumption results from the actively changing states of a circuit, due to the charging and discharging of the effective capacitive loads. Interconnect consumes the majority of dynamic power, with clock power being the largest contributor. Switching power is given by:
Pswitching = α · f · Ceff · Vdd², where
α is a measure of switching activity. Since a circuit typically does not switch every cycle, α can be thought of as the probability of switching
f is the clock frequency
Ceff is the effective capacitance, which includes any internal capacitance associated with the gate's transistors and any external capacitances, namely parasitic wire capacitance and the input capacitance of any downstream logic gates. It is a function of fan-out, wire length and transistor size.
Vdd is the supply voltage
Energy is defined as the integral of power over time; the switching energy consumed per clock cycle is not a function of frequency.
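As a rough illustration of the switching power expression above, the following Python sketch evaluates it for one set of operating conditions. The numerical values (activity factor, frequency, capacitance and voltages) are assumptions chosen only to make the arithmetic visible; they do not come from any particular chip.

```python
def switching_power(alpha, freq_hz, c_eff_farads, vdd_volts):
    """Switching power in watts: alpha * f * Ceff * Vdd^2."""
    return alpha * freq_hz * c_eff_farads * vdd_volts ** 2


def switching_energy_per_cycle(alpha, c_eff_farads, vdd_volts):
    """Average switching energy per clock cycle in joules (no frequency term)."""
    return alpha * c_eff_farads * vdd_volts ** 2


# Illustrative (assumed) values: 15% activity, 200 MHz clock, 1 nF effective capacitance.
p_nominal = switching_power(alpha=0.15, freq_hz=200e6, c_eff_farads=1e-9, vdd_volts=1.2)
p_scaled = switching_power(alpha=0.15, freq_hz=200e6, c_eff_farads=1e-9, vdd_volts=0.9)

print(f"Power at 1.2 V: {p_nominal * 1e3:.1f} mW")
print(f"Power at 0.9 V: {p_scaled * 1e3:.1f} mW")
print(f"Ratio: {p_scaled / p_nominal:.4f}")
```

Dropping the supply from 1.2 V to 0.9 V leaves a little over half of the switching power in this example, which reflects the quadratic dependence on Vdd. Halving the frequency would halve the power but leave the energy per cycle unchanged.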
Techniques to lower switching power combine reductions in activity, capacitance and supply voltage. One way to reduce activity is to lower the clock frequency, which impacts performance. The best-known technique for activity reduction is clock gating. Clock network power can account for as much as 75 percent of the total switching power of a chip, and sequential cells driven by clocks can account for as much as 70 percent of the total clock power. Turning off the clocks to these elements when they are not switching results in significant power savings. A simple AND or OR gate (depending on the edge on which the flip-flops are triggered) with the enable and clock signals as inputs produces a gated clock as output. One can also employ a level-sensitive latch to hold the enable signal from the active edge until the inactive edge of the clock.
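The difference between the plain AND gate and the latch-based version can be sketched with a small behavioural model. This is only a sampled-waveform illustration in Python, assuming rising-edge flip-flops and a latch that is transparent while the clock is low; the function names and the sample waveforms are made up for the example.

```python
def and_gated_clock(clk, enable):
    """Plain AND gating: any enable change while clk is high reaches the output."""
    return [c & en for c, en in zip(clk, enable)]


def latch_gated_clock(clk, enable):
    """AND gating with a level-sensitive latch on the enable signal.

    The latch is transparent while clk is low and holds its value while
    clk is high, so the gated clock only ever shows full-width pulses.
    """
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:  # latch transparent during the inactive (low) phase
            latched = en
        out.append(c & latched)
    return out


# Each clock phase is sampled twice; enable changes in the middle of high phases.
clk    = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
enable = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(and_gated_clock(clk, enable))    # [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(latch_gated_clock(clk, enable))  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
```

In the plain AND version the enable edges produce two half-width pulses, which could falsely clock a flip-flop. The latch version passes exactly one full-width pulse.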
Similarly, one could employ power gating, recognizing that the system or sizable blocks within the system have both active and standby modes of operation. Power could be shut off or gated to these blocks when operating in standby mode and restored as needed. The gated circuitry would not dissipate any power when turned off. Additional circuitry would be required to monitor the need for these functional blocks. A problem with power gating is the latency between when the signal to turn a unit on arrives and when the unit is ready to operate. Retention flip-flops on an isolated power supply could be used to save the logic state of all sequential elements when a chip is powered down, eliminating the need to reinitialize the device when it comes out of standby mode. Some products support multiple levels of standby (soft off, nap and sleep), which differ in the amount of power saved and in latency.
Load capacitance can be reduced by downsizing the gates and by minimizing total wire length. Power-aware placement algorithms could minimize the length of critical wires, thereby reducing their associated parasitic capacitances. Such algorithms should weigh the amount of switching activity that is associated with each wire.
Since the switching power depends quadratically on the voltage supply, voltage scaling should be a fruitful area for power reduction. However, this also reduces the switching speed of the gate. One approach would be to partition the design into functional blocks that are fed by different voltage supplies. These are referred to as voltage islands. Performance critical functions could be located in the higher voltage domains and less critical functions in the lower voltage domains. Voltage converters or level shifters would be required between modules operating at different supply voltage levels.
Several things can be done during the very early design phases to improve power consumption, including algorithm development, data representation, precomputation-based logic, logic restructuring, pipelining, guarded evaluation and delay balancing. Tradeoffs can be made between functional parallelism and frequency and/or voltage. A block of logic running at a given frequency with a given supply voltage could be replaced with two copies of that block. Each copy would perform half of the task while running at a lower frequency and/or using a lower voltage. Using this technique, the total power consumption of the function may be reduced. The performance is maintained but at the expense of using more silicon area.
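The parallelism tradeoff can be checked with a back-of-the-envelope calculation based on the switching power expression. The sketch below assumes that each copy has the same activity and effective capacitance as the original block, that the reduced voltage still meets timing at the reduced frequency, and that the extra routing and multiplexing logic adds a fixed fractional overhead. The voltages and the 15 percent overhead are illustrative assumptions.

```python
def parallel_power_ratio(n_copies, vdd_original, vdd_reduced, overhead=0.0):
    """Switching power of n parallel copies relative to the original block.

    Each copy runs at 1/n of the original frequency and at vdd_reduced.
    overhead is the assumed fractional cost of the added steering logic.
    """
    per_copy = (1.0 / n_copies) * (vdd_reduced / vdd_original) ** 2
    return n_copies * per_copy * (1.0 + overhead)


# Two copies at half frequency, 1.2 V lowered to 0.9 V, 15% overhead (all assumed).
ratio = parallel_power_ratio(n_copies=2, vdd_original=1.2, vdd_reduced=0.9, overhead=0.15)
print(f"Parallel implementation uses {ratio:.1%} of the original switching power")
```

If the voltage cannot be lowered, the ratio is 1.0 plus the overhead, so in this simple model duplication only pays off when the relaxed timing is converted into a lower supply voltage.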
Short-circuit power is given by:
Pshort-circuit = Isc · Vdd
During a transition there is a momentary short-circuit or “crowbar” current, Isc, flowing between Vdd and GND when transistor stacks switch state.
In the past, dynamic power was the dominant factor. However, leakage power is becoming a significant issue as the industry moves to new process nodes, as shown in a familiar chart from Intel. The x-axis could be expressed in terms of the year.
The three primary sources of leakage current are sub-threshold (I1) or source-to-drain leakage current, which grows exponentially with lowering Vt and increasing temperature; reverse-bias junction band-to-band tunneling current (I2); and gate oxide tunneling current (I3), which could be addressed by using high-k dielectric material. Sub-threshold leakage is dominant.
For mobile/portable devices with a high standby-to-active ratio, leakage current may be the dominant factor in determining overall battery life.
It should be noted that an off transistor stack has an order of magnitude lower sub-threshold leakage than an individual transistor.
Sub-threshold leakage current is proportional to exp[q(Vgs-Vt)/nkT]. According to this relationship, leakage current and therefore power dissipation increases exponentially with decreasing threshold voltage and with increasing temperature. Of course using high-Vt transistors will degrade performance. A solution is to have a mixture of high and low Vt transistors. Use low Vt transistors on timing-critical paths and high Vt transistors on non-critical paths. This approach is referred to as dual Vt design.
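To get a feel for the exponential in the proportionality above, the following Python sketch compares the relative sub-threshold leakage of an off transistor (Vgs = 0) for two threshold voltages and two temperatures. The slope factor n = 1.5 and the Vt values are assumptions for illustration. The model keeps only the exponential term, so it ignores the prefactor and the fact that Vt itself shifts with temperature.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def relative_subthreshold_leakage(vgs, vt, temp_kelvin, n=1.5):
    """Return exp[q(Vgs - Vt) / (n k T)], the exponential leakage term only."""
    thermal_voltage = BOLTZMANN * temp_kelvin / ELECTRON_CHARGE
    return math.exp((vgs - vt) / (n * thermal_voltage))


# Off transistor (Vgs = 0) at 300 K: low-Vt (0.2 V) versus high-Vt (0.4 V) device.
low_vt = relative_subthreshold_leakage(vgs=0.0, vt=0.2, temp_kelvin=300.0)
high_vt = relative_subthreshold_leakage(vgs=0.0, vt=0.4, temp_kelvin=300.0)
print(f"Low-Vt device leaks about {low_vt / high_vt:.0f}x more than the high-Vt device")

# Same device (Vt = 0.3 V) at two temperatures.
cool = relative_subthreshold_leakage(vgs=0.0, vt=0.3, temp_kelvin=300.0)
hot = relative_subthreshold_leakage(vgs=0.0, vt=0.3, temp_kelvin=360.0)
print(f"Raising the temperature by 60 K multiplies the term by about {hot / cool:.1f}")
```

Under these assumptions a 200 mV reduction in Vt raises leakage by roughly two orders of magnitude, which is why low-Vt cells are best reserved for timing-critical paths.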
Multi-Threshold CMOS (MTCMOS) cells can be used to control leakage power. Low Vt transistors are used to implement gates for high speed, while high Vt transistors are added to form virtual rails. These high Vt transistors suppress the leakage current when the Sleep signal is activated. Of course, there needs to be a sleep control mechanism.
Variable Threshold CMOS (VTCMOS) is a body biasing technique that controls effective threshold voltage by applying substrate bias to MOS transistors. In the active mode a zero body bias is applied. In standby mode, the effective threshold voltage is made to be larger by applying a reverse substrate bias to block the leakage current. Transistor performance in the active mode is kept the same as that in the conventional design by utilizing low VDD and low Vt.
Thus far we have focused on the need for and the methods of reducing power consumption and dissipation. There are other factors, such as temperature, timing, signal integrity and reliability, which are impacted by power design decisions and need to be addressed.
IR drop is a supply voltage reduction across a large IC as current flows through its power grid. The voltage drop may cause the voltage supplied to the affected cells to be lower than required, leading to larger gate and signal delays, which in turn can cause timing degradation in the signal paths as well as clock skew. Voltage drop on power and ground grids also reduces the noise margins and compromises the signal integrity of the design. The IR-drop effect in the power/ground network increases rapidly with technology scaling. Traditional countermeasures include wire sizing and decoupling capacitor insertion with resource allocation schemes.
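A very simplified picture of IR drop is a single power rail fed from one end, with cells tapping current at regular intervals. The Python sketch below computes the voltage at each tap under that assumption. The segment resistance and tap currents are made-up values, and the model treats each cell as a constant current sink, which real cells are not.

```python
def rail_voltages(vdd, segment_resistance_ohms, tap_currents_amps):
    """Voltage seen at each tap of a rail fed from one end.

    Every segment carries the current of all taps downstream of it, so the
    drop accumulates and the farthest tap sees the lowest voltage.
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents_amps)
    for i_tap in tap_currents_amps:
        v -= remaining * segment_resistance_ohms  # drop across the segment feeding this tap
        voltages.append(v)
        remaining -= i_tap
    return voltages


# Assumed values: 1.2 V supply, 50 milliohm segments, four taps drawing 100 mA each.
taps = rail_voltages(vdd=1.2, segment_resistance_ohms=0.05, tap_currents_amps=[0.1] * 4)
for index, volts in enumerate(taps, start=1):
    print(f"tap {index}: {volts:.3f} V")
print(f"worst-case IR drop: {(1.2 - min(taps)) * 1e3:.0f} mV")
```

Widening the wire lowers the segment resistance, which is why wire sizing is one of the traditional countermeasures mentioned above.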
The effects of changes in supply and bias voltages on timing and power must be characterized. In the past k-factor based derating functions in .lib format sufficed. Today, more advanced equation-based methods using Scalable Polynomial Models (SPM) are required.
Electromigration (EM) is the flow of metal ions under the influence of high electric current densities, resulting in the depletion and accumulation of metal ions along the interconnect. The migration of material caused by the electron “wind” creates voids upwind and causes metal ions to accumulate downstream into “hillocks” or “whiskers.” In power grid wires, the increased resistance due to EM can result in larger IR drops and degradation in gate delay. Power EM is harmful to design reliability, as the voids can cause open circuits while the hillocks tend to cause shorts to neighboring wires. EM is dependent upon temperature, current densities and the length and width of wires. In the case of signal lines, the two main forms of EM are referred to as the wire self-heat and hot electron effects. Wire self-heat occurs when the current density is too high. The resulting heating effect causes the affected tracks to expand and contract, which degrades the reliability of the design. The hot electron problem refers to the case where carriers become trapped in a transistor's channel. This distorts the field used to control the transistor, which results in performance degradation.
How are some representative vendors responding to the challenges of low power design in terms of estimation, synthesis, optimization, and analysis offerings? Thumbnail sketches follow.
Atrenta, a venture-funded ($17 million) spin-off of Interra, Inc., was started in 2001. The company's core product, SpyGlass, is a predictive analyzer that can do in-depth structural analysis at the RT-Level through the use of its unique “look-ahead” architecture, which is based on fast-synthesis and cycle-based simulation engines. SpyGlass is able to quickly identify critical problems such as combinational loops, synchronization across multiple clock domains, tri-state bus decoding errors, and wasted real estate.
Atrenta offers a number of application specific tools including SpyGlass LP - designing RTL for low power. LP analyzes designs for low power issues at the block level and chip level. It helps visualize voltage (power) domains, examines issues related to signals crossing domain boundaries and assists with clock gating. It also employs a switching analysis engine and a comprehensive rule set.
SpyGlass LP is part of a comprehensive analysis solution that includes checks for complex design problems such as clock domain crossings, synchronization, set-reset, area, electrical rule checking, design-for-test, and constraints validation.
Sequence Design, Inc. was formed in June 2000 by the merger of Frequency Technology and Sente, Inc. In January of 2001, Sequence merged with Sapphire Design Automation. Today the company has 90 employees. I spoke with Piyush Sancheti, Director of Marketing. Sequence offers four distinct families of software solutions: PowerTheater for efficient power design, CoolTime for electrical integrity analysis, PhysicalStudio for design closure and ExtractionStage as an extraction tool.
PowerTheater is focused on the architectural level, where 80% of the power consumption is determined. The methodology is to estimate power early, optimize power dissipation before synthesis, and verify power at the gate and physical levels. Sequence has developed a sophisticated RTL divide-and-conquer approach in which both static and dynamic power are accounted for. PowerTheater identifies the major structures in the design (memory, I/Os, clocks, data path and control logic) and then applies specific, patented power analysis methodologies to these structures to identify power-reduction opportunities. It proposes RTL design modifications along with estimated power savings.
CoolTime offers electrical integrity analysis for the concurrent analysis of power, voltage drop, timing, and signal integrity. It uses event-driven timing windows and dynamic coupling models to accurately model crosstalk delay and glitch. CoolTime's instantaneous analysis is based on a patent-pending vectorless algorithm, T2, for current and voltage calculation. This unique algorithm relies on static timing analysis methods to compute actual event waveforms on each circuit node. Based on the switching events and power data in the .LIB, CoolTime computes instantaneous current waveforms and resulting voltage waveforms. The voltage analysis takes into account the dynamic effects resulting from power grid capacitance, on-chip decoupling capacitors and package inductance. Simulation-based instantaneous and average methods are also supported, utilizing switching activity in VCD format.
Last May Sequence Design announced a joint development effort with Toshiba Corporation to optimize power and reduce wasted power consumption in semiconductors based on Toshiba's Selective MTCMOS (Multi-Threshold CMOS) technology. Last week Sequence Design announced it has entered into a partnership with the power driven physical design EDA vendor, Golden Gate Technology, to add power grid design to the company's NanoCool power integrity flow.
I recently attended a one-day seminar on Low Power SoC Design hosted by Magma Design Automation, the number four EDA vendor according to our quarterly reports. After the seminar I also spoke with Sameer Patel, Director of Product Marketing. He stressed that Magma's differentiation was its unified, memory-resident data model, which gives the optimization, implementation and analysis engines immediate access to continuously updated logical, physical, timing and other design information for making on-the-fly design decisions. He said that this was far more integrated than merely having a common data repository that required individual modules to extract information, massage it and
Magma offers two power modules, Blast Power and Blast Rail, that are fully integrated into its RTL-to-GDSII implementation flow.
Blast Power is Magma's solution for power optimization and management. Power-aware synthesis supports techniques such as clock gating, power gating, voltage islands, and multi-Vt libraries. An optimization engine makes leakage-power-versus-timing tradeoffs through the design and implementation flow. Power-aware placement and routing minimizes switching power and ensures uniform power and voltage drop distribution.
Blast Power generates the power mesh based upon user-defined constraints including utilization limits, current density, and voltage drop limits. It also performs decoupling capacitance insertion and optimization to minimize fluctuations on the power network due to transient effects. Blast Power supports reference methodologies from foundries, library vendors and custom IP vendors.
Blast Rail provides transient rail analysis to account for sudden changes in power consumption and for undershoot and overshoot effects due to power surges, taking into account capacitive, resistive and inductive effects. The module has capabilities for extracting parasitic networks, directly linking to timing analysis, and performing power consumption, voltage drop and electromigration analyses. Blast Rail generates maps to view power, power density, current and current density as well as the voltage drops on a cell, wire or global basis.
National Semiconductor Corporation and ARM are jointly promoting PowerWise Interface (PWI) as an open standard interface for system power management based upon adaptive voltage scaling.
One way to reduce energy consumption in a processor is to lower the clock frequency as far as possible and to reduce the core supply voltage to the minimum for that clock frequency, a technique known as frequency/voltage scaling. A simple approach would be to have a frequency/voltage table where the voltages are the minimum needed to maintain functionality over all parts and temperatures. This voltage must include headroom for power supply regulation error. Open loop adaptive voltage scaling (AVS) is an approach which regulates the supply voltage to a pre-characterized value that guarantees operation over process, temperature, and power supply variations. However, this does not guarantee minimum energy consumption. Minimum energy consumption is achieved when the maximum propagation delay (and thus minimum voltage) is present for any given situation (frequency, process, temperature). Closed loop AVS accomplishes this by regulating the propagation delay margin.
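The table-driven approach can be sketched in a few lines of Python. The operating points below are hypothetical; a real table would come from characterizing the silicon over process, temperature and supply variation, with headroom already included.

```python
import bisect

# Hypothetical pre-characterized operating points: (clock in MHz, supply in volts).
OPERATING_POINTS = [(100, 0.80), (200, 0.95), (300, 1.10), (400, 1.25)]


def select_operating_point(required_mhz, points=OPERATING_POINTS):
    """Pick the slowest table entry that still meets the required frequency."""
    freqs = [freq for freq, _ in points]
    index = bisect.bisect_left(freqs, required_mhz)
    if index == len(points):
        raise ValueError(f"{required_mhz} MHz exceeds the fastest operating point")
    return points[index]


def relative_energy_per_cycle(point, reference=OPERATING_POINTS[-1]):
    """Switching energy per cycle relative to the reference point (scales as Vdd^2)."""
    return (point[1] / reference[1]) ** 2


point = select_operating_point(150)
print(f"150 MHz workload runs at {point[0]} MHz, {point[1]:.2f} V")
print(f"Energy per cycle vs. full speed: {relative_energy_per_cycle(point):.1%}")
```

Because every entry must work on the worst-case part at the worst-case temperature, a typical part ends up with more voltage than it needs. That margin is what the closed loop scheme tries to recover.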
The closed loop AVS system developed by National Semiconductor and ARM has two hardware components: the Intelligent Energy Manager (IEM) and Adaptive Power Controller (APC), located in the processor, and the AVS compliant energy management unit (EMU). The ARM IEM determines the minimum performance (clock frequency) required by the processor for given tasks. The APC accepts a performance request from the IEM and determines the minimum voltage the processor can operate at for that performance level. It also commands the EMU to attain the lowest supply voltage for a given clock frequency. The APC is synthesizable code operating in the processor, and it manages the IEM requests and voltage control without any intervention from the processor. All the software hooks for controlling performance are contained in the IEM. The APC controls the supply voltage transparently to the IEM; however, it is coupled to the external EMU. The AVS EMU is equipped to interpret commands from the APC through a new open standard interface, PowerWise Interface (PWI).
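The idea of regulating the delay margin can be illustrated with a toy control loop. This is not the National Semiconductor and ARM implementation. It is a sketch in which a hypothetical alpha-power-law delay model stands in for the on-chip delay measurement, and the controller steps the supply down until one more step would violate the required margin. All parameter values are assumptions.

```python
def path_delay_ns(vdd_mv, vt_mv=350, alpha=1.5, k=1.0):
    """Hypothetical critical-path delay model: k * Vdd / (Vdd - Vt)^alpha."""
    vdd = vdd_mv / 1000.0
    vt = vt_mv / 1000.0
    return k * vdd / (vdd - vt) ** alpha


def closed_loop_avs(period_ns, vdd_start_mv, step_mv=10, margin=0.05, vdd_min_mv=500):
    """Lower the supply in steps while the delay, plus margin, fits in the clock period.

    Returns the lowest supply (in millivolts) that still meets timing. If even
    the starting voltage fails, it is returned unchanged.
    """
    vdd_mv = vdd_start_mv
    while True:
        candidate = vdd_mv - step_mv
        if candidate < vdd_min_mv:
            break
        if path_delay_ns(candidate) * (1.0 + margin) > period_ns:
            break
        vdd_mv = candidate
    return vdd_mv


settled_mv = closed_loop_avs(period_ns=2.5, vdd_start_mv=1200)
print(f"Supply settles at {settled_mv} mV for a 2.5 ns clock period")
```

In a real system the delay would be measured on the die, so the loop would track the actual process corner and temperature instead of a worst-case table value.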
Letter to the Editor
Dr. Horgan, while I appreciate that time constraints limit the amount of
research that can go into an article, today's piece on FPGA tools seemed
little more than the marketing bullet sheets from each of the vendors.
As much as FPGAs are getting bigger, they still don't, and won't look like
ASICs due to the non-trivial problem of programmable interconnect area, speed
and power. For what FPGAs have been, and from where I can see them going,
Synplify has the market hands down. I invite you to visit with the FPGA
vendors and ask their customers and consultants what tool flow they use. I
doubt you will find any other synthesis tools even close to Synplify / Pro. We
are riding up to the programmable logic peak in the current cycle of Dr.
Makimoto's curve. Synplify owns this wave. Nothing gets more out of an FPGA
in either area or timing. Nothing comes close to the compilation speed or ease
So, my bottom line on synthesis is that it is over for FPGAs, and for
traditional ASICs there are lots of fine tools out there like the venerable DC,
Magma and others. ASIC and FPGA synthesis tools don't have any compelling
reason to unify. ..... (cut off by editorial prerogative)
Steve Weir, Engineering Program Manager, Harris DTS
Thanks for the unsolicited testimonial for Synplicity. It was even stronger than your testimonial posted on Synplicity's website. My editorial began with input from Jeff Garrison, Synplicity's Director of Product Marketing for FPGA Synthesis Products. When the editorial first appeared, it had his title but not the company name.
I was prompted to write this column by Synopsys' announcement of its Design Compiler FPGA. This was the company's fourth attempt in the FPGA synthesis arena. I also saw that Mentor Graphics, which has a considerable presence in this market segment, had introduced its Physical Synthesis offering in late December. I was curious why these two leading EDA vendors were interested in this area. Evidently, neither company got the memo that it is already over. In the history of EDA, many companies have been on top only to fall by the wayside. In an industry dominated by three major players, many Davids have been able to develop innovative products, and occasionally so have the Goliaths.
The FPGA Synthesis market is a classic example of the integration versus best-of-breed argument. Synopsys has targeted its Design Compiler FPGA at designers who prototype ASICs using high-end FPGAs; 51% of their customers by their count. By building its offering on top of Design Compiler, Synopsys argues “By using a common, robust ASIC and FPGA flow, designers can design once for both their ASIC and FPGA and get the fastest path to prototype and integrity is ensured.” Synopsys reported 24 new logos in Q2 for DC FPGA. Mentor Graphics' position is that “FPGA Advantage is a complete Integrated Design Environment (IDE) targeting high-complexity FPGA device. FPGA Advantage accelerates total product design with integration of FPGA IO design as well as bi-directional integration of the PCB design flow.” Each vendor seeks to exploit its strength in complementary areas. Of course, neither integrated vendor would concede that its FPGA Synthesis products are less than competitive. The marketplace will ultimately decide. I regret I did not spend more time in the editorial drawing out this theme.
Your suggestion about speaking to users is a good one. However, as software vendors in many industries will tell you it is difficult to get customers to go on the record. Some end user companies will not even admit they are using a given product. User referrals from vendors are almost assuredly going to be advocates.
As for FPGA versus ASIC, there is already an abundant amount of material available on the strengths and weaknesses of both.
-- Jack Horgan, EDACafe.com Contributing Editor.