Going Green with Low Power Methodology: May 2008

Thursday, May 8, 2008

Level Shifter (UPF)

A level shifter translates a signal from one voltage swing to another. UPF provides the set_level_shifter construct to achieve this.

Syntax :
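set_level_shifter level_shifter_name
-domain domain_name
[-elements list]
[-applies_to <inputs | outputs | both>]
[-threshold value]
[-rule <low_to_high | high_to_low | both>]
[-location <automatic | self | parent | sibling | fanout>]
[-no_shift]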

Example :

UPF code
set_level_shifter my_ls
-domain PDgreen
-rule low_to_high
-location self
-applies_to outputs
-threshold 0.02

Here, a signal going from a lower voltage (p5 or p6) to a higher voltage (p7) gets a low_to_high level shifter when the voltage difference exceeds the specified threshold (0.02).
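
The threshold rule described above can be sketched in a few lines of Python. This is only a minimal behavioural sketch, not what a UPF tool actually does; the function name needs_level_shifter and the example voltages are assumptions made for illustration.

```python
def needs_level_shifter(v_source, v_sink, threshold=0.02, rule="low_to_high"):
    """Return True if a crossing from v_source to v_sink should get a shifter.

    Assumption: a shifter is inserted only when the voltage difference
    exceeds the threshold and the direction of the crossing matches the rule.
    """
    diff = v_sink - v_source
    if abs(diff) <= threshold:
        return False              # difference within threshold: no shifter
    if rule == "low_to_high":
        return diff > 0
    if rule == "high_to_low":
        return diff < 0
    if rule == "both":
        return True
    raise ValueError(f"unknown rule: {rule}")


# A 0.9 V driver into a 1.2 V receiver matches the low_to_high rule.
print(needs_level_shifter(0.9, 1.2))
```

With the default low_to_high rule, the reverse crossing (1.2 V into 0.9 V) is not shifted by this strategy and would need a separate high_to_low strategy.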

In our next blog we will discuss how UPF is used to define the power distribution network (defining power domains, supply ports and supply nets) and power state tables, with examples.

Retention (UPF)

UPF provides the set_retention and set_retention_control constructs to add a retention strategy to the design.

Syntax :

set_retention retention_name
-domain domain_name
[-retention_power_net net_name
-retention_ground_net net_name]
[-elements list]

set_retention_control retention_name
-domain domain_name
-save_signal {{net_name }}
-restore_signal {{net_name }}
[-assert_r_mutex {{net_name }}]*
[-assert_s_mutex {{net_name }}]*
[-assert_rs_mutex {{net_name }}]*

A simulation model for retention cell :

reg save_q; // shadow register
always @( posedge save_1 )
begin // save process
save_q <= q;
end

always @( negedge restore_1 )
begin // restore process
q <= save_q;
end

Example :

set_retention ret3
-domain PDgreen
-retention_power_net Vbu ----> Back up vdd
-elements { u37 } ----> Name

set_retention_control ret3
-domain PDgreen
-save_signal s ----> SAVE
-restore_signal r ----> RESTORE

Isolation (UPF)

Isolation is used to clamp the outputs of a power gated block to a fixed value during the power-down state. UPF provides the set_isolation and set_isolation_control constructs to add an isolation strategy to the design.

Syntax for isolation cell :

set_isolation isolation_name
-domain domain_name
<-isolation_power_net net_name [-isolation_ground_net net_name] |
[-isolation_power_net net_name] -isolation_ground_net net_name |
-no_isolation>
-clamp_value <0 | 1 | latch | Z>
-applies_to <inputs | outputs | both>

[latch : The value of the non-isolated port when the isolation signal becomes active.]

Syntax to specify the control signals for isolation strategy.

set_isolation_control isolation_name
-domain domain_name
-isolation_signal signal_name
[-isolation_sense <high | low>]
[-location <automatic | self | parent | sibling | fanout>]

[location : Specifies where the isolation cell is placed in the logic hierarchy]

A simulation model for isolation cell :

always @( iso_enable, non_isolated )
begin
  if (iso_enable == isolation_sense )
    isolated = clamp_value;
  else
    isolated = non_isolated; // Output of the power gated block.
end

Every time the control signal changes value, the new value is compared with the corresponding isolation sense value. If the control signal matches the sense value, the specified clamp value is driven as the isolated signal; otherwise, the non-isolated signal propagates as the isolated value. If the power supply to the corresponding isolation elements is turned off, or the enable signal is X or Z, the isolated signal is driven to X.
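
The same behaviour, including the X case described in the paragraph above, can be mirrored in a small Python model. This is a hedged sketch for experimenting with the rules, not a simulator's implementation; the function name, the iso_supply_on flag and the use of the string "x" for an unknown value are assumptions.

```python
X = "x"  # unknown value, standing in for Verilog X/Z


def isolated_output(iso_enable, non_isolated, clamp_value=0,
                    isolation_sense=1, iso_supply_on=True):
    """Value seen on the isolated side of the cell."""
    if not iso_supply_on or iso_enable not in (0, 1):
        return X                  # isolation supply off, or enable is X/Z
    if iso_enable == isolation_sense:
        return clamp_value        # isolation active: drive the clamp value
    return non_isolated           # isolation inactive: pass the signal through


# Active-low isolation: enable at 0 clamps the output to 0.
print(isolated_output(0, 1, clamp_value=0, isolation_sense=0))
```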

Example :

set_isolation iso3
-domain PDgreen
-isolation_power_net Vbu ----> Back up power net
-clamp_value 0 ----> Clamp isolation output to 0
-applies_to outputs ----> Isolates the outputs

set_isolation_control iso3
-domain PDgreen
-isolation_signal CPU_iso ----> Isolation enable
-isolation_sense low
-location self ----> Iso cell desired location

When the isolation signal CPU_iso goes low, the isolation cells become active and clamp the outputs covered by iso3 to 0; otherwise the isolation cells are inactive, i.e. they propagate the non-isolated value as the isolated value.

Power switch (UPF)

A power switch is an element that connects the permanent power supply to the power domain supply, depending on the control signal.

Syntax :

create_power_switch switch_name
-domain domain_name
-output_supply_port {port_name supply_net_name}
{-input_supply_port {port_name supply_net_name}}*
{-control_port {port_name net_name}}*
{-on_state {state_name input_supply_port {boolean_function}}}*

Example :

UPF Code :

create_power_switch sw_controller

-domain PD_CONTROLLER

-input_supply_port [list VDDT VDDL_o]-->Input supply

-output_supply_port [list VDDc VDDL_2]-->output supply

-control_port [list EN PSE]-->Switch enable

-on_state [list on_state VDDT EN]--> ON state

-off_state [list off_state {!EN}]-->OFF state

The switch is ON based on the ON state of the control signal (PSE), and drives the value on the input supply port (VDDL_o) onto the output supply port (VDDL_2).

The switch is OFF based on the OFF state of the control signal (PSE), and the OFF state is then driven on the output supply port (VDDL_2).
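
The ON/OFF behaviour just described can be sketched as a tiny Python model. This is only an illustration under stated assumptions: the on_state function is taken to be EN and the off_state function !EN, as in the example, a supply is represented as a (state, voltage) pair, and the handling of an unknown control value is an assumption rather than something the example specifies.

```python
OFF = "OFF"


def switch_output(input_supply, en):
    """State driven on the switch's output supply port.

    input_supply is a hypothetical (state, voltage) pair, e.g. ("ON", 1.0).
    """
    if en == 1:
        return input_supply            # on_state: pass the input supply through
    if en == 0:
        return (OFF, None)             # off_state: output supply is OFF
    return ("UNDETERMINED", None)      # assumption: control is X/Z, no state matches


print(switch_output(("ON", 1.0), 1))
print(switch_output(("ON", 1.0), 0))
```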

Introduction to UPF

Unified Power Format (UPF) is an industry-wide power format standard developed by Accellera that captures power-aware design information that cannot be specified in HDLs. In simple terms, it ties the HDL logic specification to the constrained power implementation with common semantics that can be used for verification as well as implementation. Thus what is designed and written in UPF is what is verified and what is implemented.

The above figure indicates the steps in the process flow where the UPF files become part of the design source for that stage. UPF together with RTL completely describes the intent of the designer. Both are passed to the synthesis tool, which reads the RTL/UPF input files and produces a netlist, and may also produce a new UPF file set which, combined with the netlist, represents a further refined version of the same design. The netlist, along with the refined or unaltered UPF, is the input to a UPF-aware logic equivalence checker that performs equivalence checks including the UPF commands, and later to the place and route tool, which reads both the netlist and the UPF files and produces its outputs along with an output UPF file.

Apart from this, the simulator provides simulation models that accurately model isolation, retention, power switch and level shifter behavior based on the power intent of the design specified through RTL/UPF. This helps in the functional verification of the power intent of the design along with its logical functionality.

UPF Power Intent Definition: The UPF definition includes the power distribution architecture (power domains, supply rails and switches) and the power strategies (power state tables and the use of special cells such as isolation, retention, level shifter and power switch cells).
For example,

In the above figure, UPF extends the existing RTL with power-related functionality and bridges the gap between the power controller and the RTL extensions, making it easier to do functional verification at the RTL without embedding the power-related features into the golden RTL.

Looking at some basic definitions:

Power domain : A group of design elements that share a primary supply.

Supply port : A port that originates a supply state and voltage value.

Supply net : Propagates a supply state from one supply net to another.

Scope: A particular design element in the logic hierarchy.

Power state : The state of a supply net or supply port i.e ON/OFF.

Regulator : A design element that takes a set of input supply nets and provides the source for a set of output supply nets. The output voltage is a function of combining the input voltage and the logical state of any control signals.

Switch: A design element that conditionally connects one or more input supply nets to a single output supply net according to the logical state of one or more control inputs.

In our subsequent blogs we will look into the modelling of different cells using UPF commands.

Other Methods of Power reduction

So far we have looked at the methods and techniques that can be applied at the architectural level of the design and the verification challenges involved. Before moving ahead with the introduction to UPF, which makes these easier to handle, there are other areas such as low power logic synthesis and software based power reduction. There could be more areas apart from these; we will update the blog as and when we understand them.

Low Power Logic Synthesis: This includes state assignment, retiming, logic minimization and technology remapping for low power. State encoding is one of the crucial areas, as it affects not only area but also power. According to a couple of research studies, reducing the switching activity on the input state lines of the next state logic during state assignment is one of the key ways to reduce power, and minimum weighted Hamming distance encoding, such as Gray coding, is one way to reduce that switching activity and thus the power. Apart from this, there have been proposals to use T flip-flops in the design, because they result in natural clock gating and can also reduce the complexity of the next state logic. Sometimes a combination of T flip-flops and D flip-flops is used: T flip-flops on high switching activity bit lines, which reduces the combinatorial logic needed and thus the power consumption, and D flip-flops on low switching activity bit lines, as they are easier to implement and also reduce the power.
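
The effect of state encoding on switching activity can be seen with a short Python calculation. This is a minimal sketch assuming an 8-state counter that visits its states in order and wraps around; the helper names gray and toggles are made up for this example.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def toggles(codes):
    """Total bit flips over one full cycle through the code sequence."""
    pairs = zip(codes, codes[1:] + codes[:1])
    return sum(bin(a ^ b).count("1") for a, b in pairs)


states = list(range(8))                          # 8-state counter, visited in order
binary_cost = toggles(states)                    # plain binary encoding
gray_cost = toggles([gray(s) for s in states])   # Gray encoding
print(binary_cost, gray_cost)                    # 14 8
```

For this sequence the Gray encoding flips exactly one state bit per transition, while the binary encoding flips up to three, which is the reduction in state line switching the paragraph above refers to.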

Apart from the above, low power retiming is another approach to reduce dynamic power consumption. It is performed by the synthesis tool, which estimates the power reduction from relocating registers while accounting for the glitches produced by the relocation and the switching activity of the sequential circuit.

Software based Power Reduction Method: Take the example of the CPU in a PC. While we have been concentrating on various techniques to reduce the power of the CPU, it is the software that executes on the CPU that determines its power consumption.

A simple example of inefficient software is one that includes "busy wait loops". An application such as a spreadsheet or a word processor waits for user input, and during those times it may simply keep recalculating values, keeping the CPU in a high power activity state, when it is expected to be in an "inactive" state until the user types something. This inefficient behaviour is caused by the "busy wait loops" executed by the software on the CPU. Power-aware software has therefore come into the industry; it understands the hardware activity and acts accordingly to reduce the power consumption of the system.

Verification challenges involved with low power design

With the introduction of complex on-chip power management techniques, there is an increased possibility of design errors that can be destructive if false current paths are not detected and properly controlled, which makes verification of the design a challenge. On-chip power management is fundamentally an analog/mixed signal problem, demanding analysis of both analog and digital circuit behavior.

Every power management technique needs both dynamic verification and static verification. Dynamic verification is test vector based and validates the functionality of the design; static verification is a vectorless approach that performs structural and architectural checks of the design.

Verify Multi Voltage Design:

Prior to the MV design technique, every design had two power states, an ON state and an OFF state, as the design worked on a single supply voltage. This made verification easy, as the whole design understood the boolean logic '0' and '1' at the same voltage level.

With the MV design technique, apart from the ON/OFF states there are many more combinations of power states, based on the different supply voltages used inside the design, which increases the number of power states and their transitions. It also defines different voltage levels for '0' and '1' in different blocks of the design. Thus, when we verify the signal interface between blocks in different voltage domains, it is not the boolean value but the voltage level of the signals that needs to be validated. Level shifters should be taken into account during the logic conversion of signals from one voltage domain to another.

Apart from this, there is a need for voltage-aware verification to verify the MV islands in the design.

1. For example, in the above figure we can see three voltage domains with different power modes/states, defined by the voltage at which each domain is working. Here we need to ensure that the memory content is not lost when the supply rail falls below the standby voltage for that particular voltage domain.

2. In the case of DVFS (Dynamic Voltage and Frequency Scaling), since voltages are changed at run time, we need to check that the voltage ramp times are carefully controlled to avoid voltage overshoot or undershoot, which can lead to malfunctioning of the system.

3. Apart from the scenarios mentioned, there is also a need to perform static verification checks to validate:
a. Level shifters, by providing a separate constraint for each supply voltage level.
b. That the power switch control signals are generated from appropriate domains.
c. The clock tree (to verify the clock swings from one voltage domain to another).
d. Reset signals for the design.
e. Finally, timing for a block operating at different voltages (different voltages can demand that the same block meet different performance objectives), by performing static timing analysis.

Power Gating:

Unlike multi-VDD, power gating demands power-aware verification, because in a power gating technique various blocks in the design are completely shut off or turned on based on the control signals coming from the power controller. For every block with power gated logic, we need to verify two major states, the Power Up and Power Down states, which add up to a set of power states in which the design operates.

Now, let's look at the major challenges that come along with power gating:

1. Power states: To verify the functionality of the design in every power state.

2. Power state transitions: To verify the functionality of each block in the design while the design moves from one power state to another, to ensure all legal state entries and exits.

3. Valid power sequences: The power sequence is mainly to ensure that the right voltage switches are ON for a particular block in the design. Here we need to validate that:
a. All power domains are completely powered up before issuing reset.
b. The main control unit of the design doesn't power up until the entire chip is powered up.

For example, in the below figure we can see three power domains with their respective power table intent showing the power states and their transitions.

4. To ensure that the powered down block does not evaluate events on its inputs. This can be checked by verifying that there are no transitions on the inputs of the always-on block coming from the power gated block.

5. The pass-through wires of powered down logic are not affected, i.e. there can be a set of wires routed through the powered down logic, and the logic values on these wires should remain unaffected.

6. On powering up the power gated blocks, we need to verify that the logic behavior in the block is enabled for evaluation, i.e. combinatorial logic and latches are re-evaluated, continuous assignments are re-evaluated, and edge triggered logic of the flops is evaluated on the next active edge.

7. To verify the power sequence that powers the various blocks in the design up and down, and the acknowledgments received.

8. To verify that the power gated block doesn't resume its operation until the switching fabric is completely powered up.

9. If the same power rail is used for external switching and internal switching, we need to verify the always-on logic, which might remain off during external switching.

10. To detect wastage of power in the design, for example toggling of a clock when its power domain is in the OFF state.

11. There is also a need to emulate IR drops and analyze the circuit behavior under them.
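
Two of the checks in the list above (3a, reset issued only after every domain is up, and 10, a clock running in an OFF domain) can be sketched as a simple trace checker in Python. This is a minimal illustration, not a verification tool's API; the trace format (one dict per cycle with the keys power, reset and clk_en) and the domain names are hypothetical.

```python
def check_trace(trace):
    """Return a list of (cycle, message) violations.

    Each sample is a dict with hypothetical keys:
      'power' : {domain_name: True/False}  power state per domain
      'reset' : 0/1                        active-high reset
      'clk_en': {domain_name: True/False}  clock running per domain
    """
    violations = []
    prev_reset = 0
    for cycle, s in enumerate(trace):
        # Check 3a: reset may only be issued once every domain is powered up.
        if prev_reset == 0 and s["reset"] == 1 and not all(s["power"].values()):
            violations.append((cycle, "reset issued before all domains are up"))
        # Check 10: a running clock in an OFF domain wastes power.
        for dom, on in s["power"].items():
            if not on and s["clk_en"].get(dom, False):
                violations.append((cycle, f"clock toggling while {dom} is OFF"))
        prev_reset = s["reset"]
    return violations


trace = [
    {"power": {"A": False, "B": True}, "reset": 0, "clk_en": {"A": True, "B": True}},
    {"power": {"A": False, "B": True}, "reset": 1, "clk_en": {"A": False, "B": True}},
    {"power": {"A": True, "B": True}, "reset": 0, "clk_en": {"A": True, "B": True}},
    {"power": {"A": True, "B": True}, "reset": 1, "clk_en": {"A": True, "B": True}},
]
for cycle, message in check_trace(trace):
    print(cycle, message)
```

In a real flow these conditions would more likely be written as assertions in the simulation environment; the sketch only shows what is being checked.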

Verifying Retention :

1. There is a need to verify that the control signal (SAVE) is correctly timed, so that the retention registers save the required register values before powering down. Improper timing can lead to corrupted data storage.

2. There is a need to verify that the control signal (RESTORE) is correctly timed, so that the retention registers restore the required register values on power up. Improper timing would result in an attempt to restore the values before the intended registers are powered up, and thus in malfunctioning of the circuit.

Apart from this, neither clock nor reset should be provided to the power gated block while the retention registers are saving/restoring the information.

3. If the power gated block employs partial retention, functional testing is required to ensure that there are no illegal state combinations that might cause deadlock.
For example: in simulation of a powered down block, the non-retention registers are fed with X values. So there is a need to verify that the non-retention registers are corrupted during power down and remain corrupted after power up until the controlling logic is re-evaluated. There is also a need to verify that, on power up of the block, the X values in the non-retention registers do not propagate and affect the function of the circuit.

4. Need to verify that X's are not propagated once the power gated block is restarted.

5. Retention registers in a block are properly connected, remain always-on, and are not affected by either power-up or power-down of the respective block. An improper power supply connection will lead to loss of stored data.

Verifying Isolation :

1. To verify that there is no unintentional propagation of high impedance states from the power gated block to the always-on block. This mainly comes under static verification, in structural checks that look for missing isolation cells in the design.

2. The isolation logic of the particular power domain is properly connected to the isolation power and ground and remains untouched by the power gated block. An improper power supply connection will lead to transmission of undesired signals.

3. Need to verify the multiple net condition on the outputs of isolation cells when pull-up and pull-down clamps are used as isolation cells.

4. To verify that the control signals provided to the isolation cell are buffered by always-on buffers.

5. Verify stuck-at-'1' and stuck-at-'0' at the output of the isolation cell.
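
The structural check for missing isolation cells mentioned in item 1 can be sketched as follows. This is a simplified Python illustration under stated assumptions: the crossing list, the set of switchable domains and the set of isolated nets are hypothetical inputs that a real static checker would extract from the netlist and the power intent, and every crossing out of a switchable domain is conservatively treated as needing isolation.

```python
def missing_isolation(crossings, switchable, isolated):
    """List crossings that need an isolation cell but have none.

    crossings : iterable of (driver_domain, receiver_domain, net) tuples
    switchable: set of domains that can be powered down
    isolated  : set of net names that already have an isolation cell
    """
    return [
        (drv, rcv, net)
        for drv, rcv, net in crossings
        if drv in switchable and drv != rcv and net not in isolated
    ]


crossings = [
    ("PD_CPU", "PD_AON", "irq"),     # gated -> always-on, no isolation cell
    ("PD_CPU", "PD_AON", "ack"),     # gated -> always-on, isolated
    ("PD_AON", "PD_CPU", "wake"),    # always-on driver: nothing to clamp
]
print(missing_isolation(crossings, {"PD_CPU"}, {"ack"}))
```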

Assertions should be provided at the power gating controller ports to ensure that the required switching technology, the above mentioned functionality and coverage are met. The power gating control signals should be controllable while testing.

Most of these form part of the functional check of the design, where the functionality has to be checked against the various control signals generated by the power controller unit, the power switches, and the various domain interfaces that include isolation cells, retention registers, ELS and level shifters. However, none of these exist in the RTL (and thus the control signals remain floating); they are added later by the tool during synthesis.

In order to emulate the behavior of those cells in simulation, one possible solution is PLIs. The major concern is that there must then be two RTLs, one with PLIs to simulate the behavior and one without PLIs for synthesis.

To relieve this burden, which grows with complex designs and more power saving strategies, the UPF standard was introduced. It captures the power intent of a design and carries it through the entire design flow, for both verification tools and implementation tools.

Wednesday, May 7, 2008

Power gated controller.

In our previous discussion, we saw that control signals must be provided to each power cell in a proper sequence to achieve power gating in the design.

During the architecture phase of the project, the designer decides which blocks to shut down, when and for how long, whether isolation is required, whether retention is required and, if so, how much state to retain during power down.

Godwin, in his blog, suggests having a separate power gated controller block for each power domain that takes inputs from the main power control logic and generates the power down signals in the desired sequence. A few control signals required to perform the above task are:
1. Control Signal for the Power Switch (PWR_EN)
2. Control Signal for the Isolation Cell Enable (ISO_EN)
3. Control Signal for the retention flops (SAVE and RESTORE)

All the above signals need to be generated in the right order to avoid malfunction of the circuit.
One possible sequence to follow:
To power-down the block :
1. Disable the clock.
2. Generate SAVE : This indicates that the contents of the main registers in the power gated block are to be moved into the retention latches.
3. Generate ISO_EN : This activates the isolation cells, which clamp the outputs of the power gated block to either '1' or '0'.
4. Since all the basic elements have been informed of the shut-down operation, now generate PWR_EN to turn off the power rails that supply the specific blocks.

To power-up the block :
1. Generate PWR_EN to turn on the power rails that supply the specific blocks.
2. Disable ISO_EN : This disables the isolation cells. Once disabled, the output of the power gated block is connected to the next powered-on block.
3. Generate RESTORE : This indicates that the main registers in the power gated block are to restore the data saved in the retention latches.
4. Enable the clock.
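
The two sequences above can be written down as data and used to check a recorded list of controller events. This is a minimal Python sketch, not a controller implementation; CLK_EN is a hypothetical name for the clock enable, PWR_EN is assumed to be 1 when the rails are on, and SAVE and RESTORE are modelled as single events rather than pulses.

```python
# (signal, value) pairs in the order listed above.
POWER_DOWN = [("CLK_EN", 0), ("SAVE", 1), ("ISO_EN", 1), ("PWR_EN", 0)]
POWER_UP = [("PWR_EN", 1), ("ISO_EN", 0), ("RESTORE", 1), ("CLK_EN", 1)]


def is_legal(events):
    """Check a recorded event list against the two sequences above.

    Returns True only if the events are whole power-down and power-up
    sequences, alternating and starting with a power-down.
    """
    expected = POWER_DOWN + POWER_UP
    n = len(expected)
    if len(events) % len(POWER_DOWN) != 0:
        return False              # a sequence was cut short
    return all(ev == expected[i % n] for i, ev in enumerate(events))


print(is_legal(POWER_DOWN + POWER_UP))
# Turning the power off before enabling isolation is rejected.
print(is_legal([("CLK_EN", 0), ("SAVE", 1), ("PWR_EN", 0), ("ISO_EN", 1)]))
```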

Sunday, May 4, 2008

Level Shifters

As discussed, when a design incorporates a mix of voltages, there is a need for voltage translation logic that takes care of the interfaces between different voltage domains.

In practice, a signal in a 5v domain driven by a signal from a 1v domain is a cause for concern, as a 1v swing would not even reach the threshold of the 5v swing. But presently, with shrinking technology, most chips have their voltages around 1v in order to reduce dynamic power.
One could then ask how a signal from a 0.9v domain driving a 1v domain could pose any problem. The problem is that such a 0.9v signal could turn both transistors ON, resulting in a crowbar current (both PMOS and NMOS are ON when the input voltage lies between Vtn and Vdd+Vtp, with Vtp negative). Apart from this, there could also be timing closure problems for each voltage domain when the required voltage swings are not met. This is where level shifters mitigate the problem.
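
The crowbar condition quoted above (input between Vtn and Vdd+Vtp) can be checked numerically. This is a first-order Python sketch only; the threshold values used below are assumed for illustration and are not taken from any particular process.

```python
def both_transistors_on(v_in, vdd, vtn, vtp):
    """True if an inverter input at v_in turns on both NMOS and PMOS.

    vtp is negative for a PMOS device, so the window is
    vtn < v_in < vdd + vtp.
    """
    return vtn < v_in < vdd + vtp


# Assumed thresholds, for illustration only.
print(both_transistors_on(1.0, vdd=5.0, vtn=0.3, vtp=-0.3))    # 1 V high level into a 5 V gate
print(both_transistors_on(5.0, vdd=5.0, vtn=0.3, vtp=-0.3))    # full-swing high level
print(both_transistors_on(0.9, vdd=1.2, vtn=0.25, vtp=-0.25))  # 0.9 V high level into a 1.2 V gate
```

Under this simple model, whether a small difference between the driving and receiving supplies falls inside the window depends on the device thresholds.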

The purpose of the level shifter is to convert the signal voltage to the correct voltage of the receiving domain. There are two cases :

Shifting the voltage down: This is easier than shifting the voltage up. A simple H2L level shifter could be an inverter or buffer powered by the lower voltage domain.

Shifting the voltage up: This is more complex because of the low strength of the driving signal. The circuit uses power supplies from both domains, so these cells also need careful placement to minimize the area.

In applications that mix power gating and multiple VDD strategies, there are special cells called Enable Level Shifters (ELS) that combine the isolation logic with the level shifter functionality.

Retention register

As discussed earlier (post name : power gating), when power gating is used, the system needs some form of state retention strategy to store its internal state information. The best approach is to replace standard registers in the power gated domain with retention registers. A retention register contains a shadow register that can save the register data during power down and restore it at power up.

Why only Retention register ?
Retention registers are special low leakage flip-flops used to hold the data of the main registers of the power gated block. These registers are always powered up. The power gating controller controls the retention mechanism, such as when to save the current contents of the power gated block and when to restore them.

One of the key architectural decisions in power gating is how much state to retain during power down. Based on the amount of state stored, there are two types of retention.

1. Full state retention : All registers in the power gated block are replaced with retention registers (i.e. the full state of the power gated block is retained during power down).
Advantage : Most robust, verification is easy
Disadvantage : Area penalty

2. Partial state retention : Only some of the internal state of the block [shallow state] is saved in retention registers. The biggest challenge here is to ensure that all non-retained registers power up in legal, safe and variable states.

FIFOs, memories and counters are the best examples of where partial state retention is employed.

[Shallow state : Registers that directly control the logic of the design]
[Deep state : Registers that are used by the state machine and contain a larger amount of data]

There are three types of retention register.
1. Single save/restore pin retention latch (slave latch being always on)
2. Single pin balloon latch.
3. Dual pin balloon latch.

The figure shows a flop with a retention cell (dual pin balloon latch) along with the necessary control signals, SAVE and RESTORE.

SAVE is a control signal used to store the state of the register into the retention latch.
RESTORE is a control signal used to restore the state of the register from the retention latch. The restore operation is done irrespective of the clock. The retained value is forced into the slave latch.

When the system wants to turn off the power domain block, the following steps are followed in sequence :
1. Disable the clock.
2. Assert the SAVE signal before sending the turn off signal.
3. Disable the power (Vdd) to the block.

When the system wants to turn on the power domain block,
1. Enable the power (Vdd) to the block.
2. Assert the RESTORE signal to restore the state of the main register from the retention latch.
3. Enable the clock.
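
The save/power-off/power-on/restore steps above can be tried out on a small behavioural model. This is a hedged Python sketch of a dual-pin style retention flop, not a library cell model; the class name, method names and the use of the string "x" for a corrupted value are assumptions.

```python
X = "x"  # corrupted / unknown value


class RetentionFlop:
    """Flop with an always-on shadow latch (dual-pin SAVE/RESTORE style)."""

    def __init__(self):
        self.q = X            # main register, lives in the gated domain
        self.shadow = X       # retention latch, on the always-on supply
        self.powered = True

    def clock(self, d):
        if self.powered:
            self.q = d        # normal capture on the active clock edge

    def save(self):
        if self.powered:
            self.shadow = self.q

    def power_off(self):
        self.powered = False
        self.q = X            # main register is corrupted while unpowered

    def power_on(self):
        self.powered = True   # q stays X until RESTORE or a new clock edge

    def restore(self):
        if self.powered:
            self.q = self.shadow


flop = RetentionFlop()
flop.clock(1)        # clock is then disabled (no further clock() calls)
flop.save()
flop.power_off()
flop.power_on()
flop.restore()
print(flop.q)        # the saved value is back in the main register
```

Skipping the save() call in the same sequence leaves q at "x" after restore, which is the kind of corruption the SAVE timing checks are meant to catch.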

Isolation Cell

As discussed earlier, with a header switch fabric the output of the power domain block discharges towards Vss, while with a footer switch fabric the output charges towards Vdd when the switch is turned off. There is no guarantee that the power gated output will fully discharge to ground or charge to the supply, which results in a floating output (the receiving transistors sit near threshold, which results in a crowbar current) that in turn affects the behavior of the powered-on block. To overcome this, an isolation strategy is required at the output of the power gated block.

One isolation strategy to combat the above effects is an isolation cell, which isolates the power gated block from the powered-on block by clamping the output of the power gated block to a fixed value, either logic '1' or logic '0', under control of the isolation control signal given by the power controller block.

Why only isolation cells ?
Isolation cells in the library are designed so that they do not experience crowbar current when the input signal floats, as long as the isolation control input is off. They are always powered on during the power down mode.

There are various combinations of gates and transistors for achieving isolation.
1. AND gate function clamps the output at '0'
2. OR gate function clamps the output at '1'
3. Pull-up (PMOS) transistor clamps the output at '1'
4. Pull-down (NMOS) transistor clamps the output at '0' when it receives the isolation control signal from the power gated controller.

Since the transistor approach introduces multiple drivers on the power gated net (the net becomes a shared channel), careful sequencing is required to avoid contention, and testability becomes very difficult. The advantage is that it occupies less silicon area and has a lower timing cost than a gate-style isolation cell.

Saturday, May 3, 2008

Power gated switches (sleep transistors)

A sleep transistor is a PMOS or NMOS high-Vth transistor that connects the permanent power supply to the supply of the power domain, which is commonly called the "virtual power supply".
The sleep transistors are controlled by a power management unit to switch the power supply to the circuit on and off. The PMOS sleep transistor is called a "header switch" and the NMOS sleep transistor is called a "footer switch".

Header switches turn off VDD and keep VSS on. As a result, the outputs of a power gated block collapse towards ground (the output capacitance discharges towards ground) when the switch is turned off. This allows a simple pull-down transistor (NMOS) to isolate the powered-off cells and clamp the output signals at “0”.

The footer switch is used to control the VSS supply. As a result, the outputs of the power gated block charge towards the supply voltage (VDD) when the switch is turned off. Designs become more sensitive to ground noise on the virtual ground (VIRTUAL_VSS) coupled through the footer switch. Isolation to the “0” state becomes complex due to the loss of the virtual ground in sleep mode and the need to bypass the footer switch to reach permanent VSS.

Footer and Header sleep transistor with isolation cell.

The key issues affecting the decision are area, cost and IR drop constraints. Below are a few advantages/disadvantages of header and footer switches :
1. A footer switch occupies less silicon area than a header switch.
2. A PMOS transistor is less leaky than an NMOS transistor of the same size.
3. A PMOS transistor has lower drive current than an NMOS transistor of the same size.

Friday, May 2, 2008

Implementation of power gating in a design

The key elements required while implementing the power gating technique in the design are
1. Power gated switches.
2. Isolation cells.
3. Retention Cells.
4. Level shifter.

The entire design is divided into a number of power gated functional blocks (collections of design elements that share a common supply), an always-on functional block, the power switching fabric and the power gating controller.

Power switching network : Unlike the always-on block, the power gated block receives its power through the power switching network. This network switches either Vdd or Vss to the power gated block. The switching fabric typically consists of a large number of CMOS switches distributed around or within the power gated block.

There are two approaches for controlling the power to the power gated domain.

1. Fine grain power gating: Fine-grain power gating encapsulates the switching transistor as part of the standard cell logic. The primary burden of adding switching transistors is left to the library IP provider or standard cell designer. This means that a traditional design flow can be used to deploy fine-grain power gating, but it significantly increases the silicon area.
An example of fine-grain power gating is shown below, where we can observe that the power switch is connected directly to the standard cell. To keep the area overhead to a minimum, fine-grained power gates are implemented as NMOS footer switches to ground.

The disadvantage of the fine-grain approach is that it adds a sleep transistor to every cell, which results in a significant area increase. It also cannot use the normal standard cells provided by library vendors and ASIC foundries. Another issue is that the cells become more sensitive to PVT variations, because the built-in sleep transistor is itself subject to PVT variation, which adds IR-drop variation in the cell and hence performance variation.

2. Coarse grain power gating: In coarse-grain power gating, the power gating transistor is part of the power distribution network rather than the standard cell. One sleep transistor cell is used to turn a set of standard cells (a power domain functional block) on and off.

The coarse-grain approach requires less area than fine-grain power gating because it uses fewer sleep transistors and needs less routing of the power gating enable signals. Fewer sleep transistors also result in better leakage control.
The disadvantage is that it might take several clock cycles to power up a larger block of logic cells.

An example of coarse-grain power gating is shown below, where we can see that a single power switch transistor is connected to the power gated logic.

The advantage of this approach is that it is less sensitive to PVT variation and introduces less IR-drop variation than the "fine-grain" implementation. Also, the area overhead is significantly smaller than with fine-grain.

There are two ways of implementing a coarse-grain structure:

2.a Ring-based Network: The power switches are placed as a ring around the perimeter of the power gated block that is being switched off.
In the ring style implementation, a virtual power ring is added to surround each power domain. The sleep transistors are placed between the permanent power ring and the virtual power rings to control the power supply to each power domain, as shown in the figure.

It has a small impact on placement and routing. However, it could result in more IR drop at the center of the design, because of the limited drive of the sleep transistors and their distance from the center.

2.b Grid/Column-based Network: The power gates are distributed throughout the power gated region.

In the grid style implementation, the sleep transistors are placed close to the power grid to connect the permanent power network and the virtual power networks, as shown in the above figure.

The advantage of the grid style implementation is better IR-drop management, because each sleep transistor drives local cells.

The drawback of the implementation is its impact on routing and physical synthesis, because the sleep transistors are distributed in the design area.

The key challenge with the switching fabric is to limit the in-rush current when power is reconnected to the power gated block. In-rush current causes voltage spikes on the supply and excessive IR drop in the power network, and this drop in turn increases the delay in the network. If the in-rush current is not controlled, it can corrupt the logic function as well as the retention registers in the power gated blocks when power is reconnected.
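As a rough illustration of why in-rush current matters, the average charging current of the gated domain can be estimated from I = C * dV/dt. The sketch below is only a back-of-the-envelope model; the capacitance, supply voltage and ramp times are assumed numbers, not values from any real design.

```python
def inrush_current(c_domain_farads, vdd_volts, ramp_seconds):
    """Average current while the virtual rail ramps from 0 V to Vdd (I = C * dV/dt)."""
    return c_domain_farads * vdd_volts / ramp_seconds


# Hypothetical numbers: 2 nF of domain capacitance on a 1.0 V supply.
fast = inrush_current(2e-9, 1.0, 10e-9)  # rail restored in 10 ns
slow = inrush_current(2e-9, 1.0, 1e-6)   # rail restored in 1 us
print(f"fast ramp: {fast:.3f} A, slow ramp: {slow:.3f} A")
```

With these assumed values, stretching the ramp by a factor of 100 cuts the average charging current by the same factor, at the cost of a longer wake-up time.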

Power Gating

In the previous posts we have seen a few techniques to reduce dynamic power (clock gating, multi Vdd) as well as static power (multi Vt). But that is not enough, since the design continues to consume leakage power during standby mode. Now let's move on to power gating, the most effective method to reduce leakage current.

Power gating, or the power switch-off technique, is a mechanism to temporarily turn off blocks when they are idle, or based on other requirements, in order to reduce static power dissipation. The time for which a block is turned off is called the "low power mode" or "inactive mode". When the block is turned on again, it is in "active mode". The strategy behind power gating is to switch between these two modes at the appropriate time to reduce power while minimizing the impact on performance.

There are two types of architectural decisions with power gating: static power gating and dynamic power gating.

In static power gating, a particular block in the design is power gated based on the application, and the block remains ON or OFF throughout the normal operation of the chip until the chip is re-initialized or re-configured, either through external input signals, by setting certain configuration registers in the chip, or during software re-initialization. This is the simpler approach and can be used when the application requirements allow it. From a design, implementation and verification perspective, the only need is to check that the power gated block does not affect the functionality of the rest of the design.

For example, it could mean switching off a peripheral interface of a CPU for a particular application.

In dynamic power gating, the decision is made on the fly while the chip is running, based on various parameters evaluated by the power control logic of the chip. The main goal of dynamic power gating is to reduce the leakage power of the chip by defining power domains and analyzing the data paths and modes of operation of the chip while it is running. The power control decision for each power domain in the design is made either by software or by hardware.

In a software based approach the driver software can schedule the power down/up operations.

For example, ACPI, introduced for PCs, gives control of the power management decisions to the operating system.

The operating system is aware of each new application and has the data to make power-management decisions. Although ACPI targets desktops and notebooks, it makes a good model to follow when developing a software-based power management system for embedded systems. With ACPI, software automatically controls the power to peripherals, and peripherals can also activate the processor. For example, an incoming call on a modem powers up the processor from standby mode in time to capture the data.

In a hardware-based approach, either hardware timers are used, or a dedicated power management controller holds the control logic for all the power domains.
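The hardware-timer option can be pictured as an idle counter that switches a domain off after a fixed number of idle cycles and wakes it on the next request. The following is a minimal behavioral sketch; the class name, the timeout value and the one-call-per-cycle interface are all assumptions made for illustration, and wake-up latency is ignored.

```python
class IdleTimerGate:
    """Toy model of a hardware idle timer controlling one power domain."""

    def __init__(self, timeout_cycles):
        self.timeout = timeout_cycles
        self.idle_count = 0
        self.powered = True

    def tick(self, busy):
        """Advance one clock cycle; return True if the domain is powered."""
        if busy:
            self.idle_count = 0
            self.powered = True       # any activity wakes the domain
        else:
            self.idle_count += 1
            if self.idle_count >= self.timeout:
                self.powered = False  # idle for too long: gate the power
        return self.powered


gate = IdleTimerGate(timeout_cycles=3)
trace = [gate.tick(b) for b in [True, False, False, False, False, True]]
print(trace)
```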

(Throughout this blog, when we refer to power gating, we are mainly referring to dynamic power gating.)

The architecture and implementation of power gating in a design involve certain trade-offs and challenges, respectively:

Architectural trade-offs involved with this technique :

1. Power gating affects design architecture.
2. It increases time delays, as the power gated modes have to be entered and exited safely.
3. The leakage power that can be saved in the low power mode has to be weighed against the energy dissipated in entering and exiting that mode, which introduces architectural trade-offs.
4. The time and energy cost of recovering the lost data when the block re-enters active mode.
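The trade-off between saved leakage and the energy spent entering and exiting the low power mode can be expressed as a break-even idle time. The sketch below assumes a deliberately simple model: constant leakage power in each mode and one lumped transition energy. The function names and all numbers are illustrative assumptions.

```python
def break_even_idle_time(p_leak_on_w, p_leak_off_w, e_transition_j):
    """Idle time at which saved leakage energy equals the enter/exit energy."""
    saved_per_second = p_leak_on_w - p_leak_off_w
    if saved_per_second <= 0:
        raise ValueError("gating must reduce leakage power")
    return e_transition_j / saved_per_second


def worth_gating(idle_s, p_leak_on_w, p_leak_off_w, e_transition_j):
    """True if the block stays idle longer than the break-even time."""
    return idle_s > break_even_idle_time(p_leak_on_w, p_leak_off_w, e_transition_j)


# Hypothetical block: 10 mW leakage when on, 0.5 mW when gated,
# 19 uJ spent on saving state, switching and restoring.
t_be = break_even_idle_time(10e-3, 0.5e-3, 19e-6)
print(f"break-even idle time: {t_be * 1e3:.1f} ms")
```

Under this model, gating an idle period shorter than the break-even time costs more energy than it saves.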

Implementation challenges involved with power gating are:

1. Sleep transistors (power switches) have to be added to switch off the power supplies to the idle circuit.

2. The outputs of a power gated design may ramp off slowly. As a result, these outputs spend a significant amount of time near the threshold voltage, causing large crowbar currents in the always-on block. An isolation strategy is required to overcome this.

3. A retention strategy is required for some power gated blocks, as the internal state of the block must be stored during power down and restored during power up.

4. A dedicated power management controller is required for shutting down the blocks in the proper sequence.

5. A system-level understanding is required to decide where to add power gates, isolation and retention in the design, and when to control them.

6. Performing power-aware verification of the design is one of the biggest challenges.
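To make the sequencing requirement concrete, here is a toy controller that applies isolation, retention and power switching in a fixed order. The order used (isolate, save, switch off, then switch on, restore, release isolation) is one commonly used sequence and is an assumption here; real designs may order or overlap the steps differently. All names are hypothetical.

```python
class PowerGatingController:
    """Toy sequencer for one power-gated domain (all names are hypothetical)."""

    def __init__(self):
        self.powered = True
        self.isolated = False
        self.log = []

    def power_down(self):
        self.isolated = True
        self.log.append("isolate outputs")    # clamp outputs before they float
        self.log.append("save state")         # copy registers to retention storage
        self.powered = False
        self.log.append("switch off supply")

    def power_up(self):
        self.powered = True
        self.log.append("switch on supply")
        self.log.append("restore state")      # only after the supply is back
        self.isolated = False
        self.log.append("release isolation")  # last, once outputs are valid again


ctrl = PowerGatingController()
ctrl.power_down()
ctrl.power_up()
print(ctrl.log)
```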

State retention and restoration are required for those power gated designs that need to resume their operation on a wake-up event based on saved state information. There are three methods for state retention and restoration:

1. Application software that writes specified register and memory values to disk storage before shutting down power supplies. At wakeup,the software writes the saved states back into the design.

2. A scan-based method that uses the design's scan chains to shift the register states out into an always-powered memory before power-down and shift the states back in at wake-up.

"Both of the above methods usually take a long time to read and write the state, which consumes a lot of power."

3. Retention registers and latches that contain a shadow register, which can efficiently store and restore state during power down/up.
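The third method can be pictured with a small behavioral model: a main register that loses its value when the domain is powered down, and a shadow register on an always-on supply. This is only a sketch, with the class and method names made up for illustration, and `None` used to stand in for an unknown value.

```python
class RetentionRegister:
    """Behavioral sketch of a retention register with a shadow register."""

    def __init__(self):
        self.q = 0        # main register, powered by the switched supply
        self.save_q = 0   # shadow register, powered by the always-on supply

    def save(self):
        self.save_q = self.q   # copy state into the shadow register

    def power_down(self):
        self.q = None          # main register contents are lost (unknown)

    def restore(self):
        self.q = self.save_q   # copy state back after power-up


reg = RetentionRegister()
reg.q = 0xA5
reg.save()
reg.power_down()
lost = reg.q
reg.restore()
print(lost, hex(reg.q))
```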

Thursday, May 1, 2008

Multi Threshold (Vt) Design

As we have seen earlier, subthreshold leakage depends exponentially on Vt, while delay has a much weaker dependence on Vt. Thus libraries with multiple Vt have become a common way of reducing leakage current while still meeting the timing constraints of the design.
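The exponential dependence can be made concrete with the subthreshold swing S, the change in gate voltage that alters subthreshold current by a factor of ten. Leakage then scales roughly as 10^(-Vt/S). The sketch below assumes S = 100 mV/decade, which is only a round illustrative number; the real value depends on the process and temperature.

```python
def leakage_reduction(delta_vt_volts, swing_volts_per_decade=0.1):
    """Factor by which subthreshold leakage drops when Vt is raised by delta_vt."""
    return 10 ** (delta_vt_volts / swing_volts_per_decade)


# Raising Vt by 100 mV with an assumed 100 mV/decade swing.
print(leakage_reduction(0.1))
```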

Silicon foundries provide multiple threshold libraries at the same process node:

1. Low Vt: This gives the lowest cell count and the lowest dynamic power, but the highest leakage power compared to the other Vt cells. The cells run faster, so they are good for a design with very tight timing constraints.

2. Standard (nominal) Vt: A standard Vt cell has leakage and delay characteristics between those of low Vt and high Vt cells.

3. High Vt: These cells give the lowest leakage power consumption but a higher cell count and higher dynamic power. The cells run slower, so this option is good for leakage-critical designs at the expense of device speed.

Thus it is the implementation tool that has to use the various cells in the library to create an implementation that meets the timing constraints while reducing the leakage current as much as possible: lower threshold gates on the critical path and higher threshold gates off the critical path.

Choosing between the different Vt cells becomes simpler and more efficient when the trade-offs are considered early in the synthesis process. Usually there is a minimum performance to be met before optimizing for power, so the design is first synthesized for high performance using a low threshold voltage library. In a later pass, the areas of the design that do not require low Vt cells are located, and those cells are swapped for high Vt cells.
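The low-Vt-first flow can be sketched as a simple pass over the cells: any cell with enough timing slack to absorb the extra delay of a high Vt cell is swapped. This is a heavy simplification, made up for illustration. It treats every cell's slack as independent, whereas cells on a shared path consume the same slack and a real tool re-runs timing analysis after each swap. The cell names and the delay penalty are hypothetical.

```python
def swap_to_high_vt(slack_by_cell_ns, delay_penalty_ns):
    """Assign 'HVT' to cells whose slack covers the high-Vt delay penalty."""
    return {
        name: ("HVT" if slack >= delay_penalty_ns else "LVT")
        for name, slack in slack_by_cell_ns.items()
    }


# Hypothetical slacks after a first low-Vt synthesis pass.
slacks = {"u1": 0.02, "u2": 0.30, "u3": 0.10}
assignment = swap_to_high_vt(slacks, delay_penalty_ns=0.10)
print(assignment)
```

Here only `u1`, which sits on a near-critical path, keeps its low Vt cell.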

However, there can also be applications where power is the main goal. In that case the low leakage library is targeted first, and its cells are later swapped for higher-performing, higher-leakage equivalents in speed-critical areas.

Multiple voltage thresholds can reduce power consumption with little impact on timing, area, and place and route.

Multi Voltage(Vdd) Design:

As we have seen, dynamic power is proportional to the square of VDD, so by lowering VDD we can reduce the dynamic power significantly, but at the cost of performance. With this in mind, we can look at the various parts of a design, analyze the critical and non-critical paths, and balance low and high voltages based on the performance requirement of each path.
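The quadratic dependence follows from the usual switching power expression P = a * C * V^2 * f, where a is the activity factor, C the switched capacitance and f the clock frequency. The sketch below uses assumed values only to show the ratio between two supply voltages at the same frequency.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power P = a * C * V^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: activity 0.2, 1 nF switched capacitance, 100 MHz clock.
p_high = dynamic_power(0.2, 1e-9, 1.2, 100e6)
p_low = dynamic_power(0.2, 1e-9, 0.9, 100e6)
print(f"{p_high * 1e3:.1f} mW at 1.2 V, {p_low * 1e3:.1f} mW at 0.9 V")
```

Dropping the supply from 1.2 V to 0.9 V reduces switching power to (0.9 / 1.2)^2, a little over half, before any frequency reduction is taken into account.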

For example, take a USB card that we use to store data. It includes a processor block and its interface to the USB core. The processor block may require a high speed clock, while the USB core requires a lower frequency clock in compliance with the protocol standard. So if we supply the USB core with only the lower voltage it needs to run, we can drastically reduce the overall power consumed by the design.

Again, within the processor block itself, we can provide different voltages to the same block depending on its workload.

For example, a RAM in the processor block can be given a low voltage when its contents are not being accessed and a high voltage while reads and writes are performed. So it is ultimately up to the architecture to partition the design into different VDDs based on the performance requirements and workload of the design. Based on the above examples, we can categorize the multi-voltage strategies into:

Static voltage scaling: different fixed voltages are given to different blocks.
Multi-level voltage scaling: a block is switched between two or more voltages, each fixed for a particular mode.
Dynamic voltage and frequency scaling: a larger number of voltage levels are switched dynamically based on the workload.
Adaptive voltage scaling: same as above, but a control loop is used to adjust the voltage.

When the blocks are partitioned for different VDD supplies, we need to insert level shifters on the signals running between blocks in different voltage domains. These level shifters are buffers that translate a signal from one voltage swing to another (either low to high or high to low).
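Deciding which kind of level shifter a crossing needs comes down to comparing the two supply voltages. The helper below is a hypothetical sketch of that check, including a threshold below which no shifter is inserted (in the spirit of the -threshold option of the UPF set_level_shifter command). The function name and return values are made up for illustration and are not part of any tool.

```python
def level_shifter_rule(v_source, v_sink, threshold=0.0):
    """Return 'low_to_high', 'high_to_low', or None if no shifter is needed."""
    diff = v_sink - v_source
    if abs(diff) <= threshold:
        return None  # voltages are close enough; no shifter required
    return "low_to_high" if diff > 0 else "high_to_low"


print(level_shifter_rule(0.9, 1.2, threshold=0.02))
print(level_shifter_rule(1.2, 0.9, threshold=0.02))
print(level_shifter_rule(1.0, 1.01, threshold=0.02))
```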

Though this approach gives a significant dynamic power reduction, it involves complex decisions on architectural strategy. Apart from this:
1. As different blocks in the design are at different voltages, timing analysis becomes more complex.
2. Multi-voltage designs require additional resources on the board (regulators to provide the additional supplies).
3. Proper power-up and power-down sequencing must be followed.

As the design uses multiple power domains, apart from the large power benefit it also has an impact on the architecture, an area penalty due to the power grid and level shifters, some performance degradation, and added effort in signal integrity, design and verification.

We will discuss the importance of level shifters in the next few posts (during the discussion of the power gating technique).

Clock Gating

The clock distribution network is a significant contributor to the power of a chip, so reducing the switched capacitance of the clock has a large impact on total power. Clock gating is one such approach: it partitions the clock network in the design and allows only those partitions that are needed in a given clock cycle to toggle. It is implemented by turning off the clock to blocks that are not required.

This approach can often save a significant amount of total clock power, but it requires trade-offs between timing and leakage power during implementation. Moreover, the power savings differ depending on the level of granularity at which clock gating is applied. There are three levels of granularity:

Module level clock gating or global clock gating: In this approach the clock is shut off for an entire block or module in the design, typically from a central clock generator module. This method functionally shuts down the block and saves a significant amount of dynamic power, as it shuts down the entire clock tree.

Register level clock gating or local clock gating: In this approach the clock to a single register or a set of registers is gated. In the original RTL implementation, a synchronous load-enabled register is typically implemented using a clocked D flip-flop and a recirculating multiplexer, with the D flip-flop being clocked every cycle.

So the key to clock gating these registers is to use the same enable signal to gate the clock. The register then receives no clock in the cycles when no new data is loaded, which saves power and also eliminates the multiplexer and its power consumption. However, gating a single-bit register does not yield a useful power saving, so it is better to use this type of clock gating for a large number of registers, saving the flip-flop clocking power and multiplexer power of all the registers with a single clock gating circuit.
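The equivalence between the recirculating-multiplexer register and the gated-clock register can be checked with a small cycle-level model. The sketch below is an illustration only: it counts how many clock edges reach the flip-flop in each style for the same data and enable stream. The function name and the example stream are assumptions.

```python
def run_register(data, enables, gated):
    """Simulate a load-enable register; return (q per cycle, clock edges seen)."""
    q, q_trace, clock_edges = 0, [], 0
    for d, en in zip(data, enables):
        if gated:
            if en:                # clock edge reaches the flop only when enabled
                clock_edges += 1
                q = d
        else:
            clock_edges += 1      # flop is clocked every cycle
            q = d if en else q    # recirculating mux holds the old value
        q_trace.append(q)
    return q_trace, clock_edges


data, enables = [5, 6, 7, 8], [1, 0, 0, 1]
mux_q, mux_edges = run_register(data, enables, gated=False)
gated_q, gated_edges = run_register(data, enables, gated=True)
print(mux_q, mux_edges, gated_q, gated_edges)
```

Both styles produce the same register contents, but the gated version clocks the flip-flop only in the cycles where the enable is high.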

Cell-level clock gating: In this approach a clock gating circuit is designed into each cell. For example, a memory is designed to receive the clock only during cycles of "active" access. This appears to be an easy method of saving power, but it has an area overhead and limits the power savings, because a large number of registers and memories need to be predesigned with clock gating, and the clock gating logic cannot be shared between registers.

When comparing the power savings per clock gate between these approaches, global clock gating saves more power than local clock gating, but local clock gating offers more opportunities, such as automated insertion, which can result in a large number of clock-gated cells in the design.

The clock gating circuit can be implemented in two ways:

1. By using an AND gate on the enable and the clock, and using its output as the clock of the flip-flop. The problem with this approach is that glitches on the enable signal while the clock is high propagate to the clock pin of the register. Although this can be avoided by applying appropriate setup and hold time constraints on the enable signal, any spurious change of the signal at run time can cause wrong values to be latched.

2. By using a level-sensitive, active-low latch on the enable path. This removes the glitch problem mentioned above. The output of the latch is frozen at the rising edge of the clock, which ensures that the enable signal at the AND gate is stable while the clock is high.
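The difference between the two circuits can be seen in a small sampled-waveform model. In the sketch below each list element is one time step, and the enable glitches high while the clock is high. This is a simplified illustration with made-up waveforms and function names; it ignores gate delays and setup and hold times.

```python
def and_gate_clock(clk, en):
    """Plain AND gating: gated clock = clk AND en."""
    return [c & e for c, e in zip(clk, en)]


def latch_gate_clock(clk, en):
    """Latch-based gating: the latch is transparent only while clk is low."""
    latched, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # latch follows the enable while the clock is low
        out.append(c & latched)  # enable is held steady while the clock is high
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1]
glitchy_en = [0, 0, 0, 1, 0, 0, 0, 0]  # enable glitches high while clk is high
print(and_gate_clock(clk, glitchy_en))
print(latch_gate_clock(clk, glitchy_en))
```

The plain AND gate passes the glitch through as a spurious clock pulse, while the latch-based version keeps the gated clock low for the whole high phase.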

Clock gating reduces a significant amount of dynamic power, at the cost of additional insertion delay in the clock tree (clock skew). But leakage power still remains in the system. Going ahead, we will discuss techniques through which we can reduce the leakage power.
