August 15, 2005
Please note that contributed articles, blog entries, and comments posted on EDACafe.com are the views and opinion of the author and do not necessarily represent the views and opinions of the management and staff of Internet Business Systems and its subsidiary web-sites.
| by Jack Horgan - Contributing Editor
Posted anew every four weeks or so, the EDA WEEKLY delivers to its readers information concerning the latest happenings in the EDA industry, covering vendors, products, finances and new developments. Frequently, feature articles on selected public or private EDA companies are presented. Brought to you by EDACafe.com. If we miss a story or subject that you feel deserves to be included, or you just want to suggest a future topic, please contact us!
Most of what has appeared in this column about Rambus has concerned litigation: patent suits with Hynix, Infineon, Micron and most recently Samsung, as well as the FTC antitrust case. I thought it would be interesting to see if anything was occurring on the product front.
On July 7 Rambus announced the latest version of its high-bandwidth XDR memory interface technology, named XDR2. The XDR2 memory interface uses a micro-threaded DRAM core and circuit enhancements that enable data rates starting at 8 GHz; Rambus claims this makes it five times faster than today's best-in-class GDDR graphics DRAM products. The XDR2 memory interface targets applications that require extreme memory bandwidth, such as 3D graphics, advanced video imaging, and network routing and switching.
I had an opportunity to discuss this with Victor Echevarria, Product Marketing Manager for the Platform Solution Group.
What is the significance of XDR2?
XDR2 means a lot of things to the memory industry; it has significance for many different markets. The first notable item about XDR2 is that it is a follow-on to our XDR memory interface, which is shipping today as the memory technology for the Cell processor. XDR2 integrates a number of technologies to raise the bandwidth from XDR1's 3.2 GHz to 8 GHz. The significance of XDR2 is that it is the first DRAM technology ever to include micro-threading, a technology we invented at Rambus to enable finer access granularity transactions.
Some background on this topic. As memory interface technology increases in speed, the interfaces tend to get faster much more quickly than the DRAM core can keep up with. By core I mean all of the storage elements within the DRAM. What ends up happening is that every time the interface doubles or triples in speed, each access request issued to the DRAM returns more data. With each generation, more and more data is sent per request.
You can imagine that you have a library and a librarian who fetches books for you. You request a book and the librarian goes, gets it, and gives it to you. Increasing the speed at which the librarian can fetch books doesn't necessarily mean the librarian can fetch multiple books at one time. Say we double the librarian's speed: instead of my issuing multiple requests, I tell the librarian to grab one book, but the librarian brings back two or four books. That's the problem facing the application markets that require high-speed memory: access granularity isn't getting any better as time goes on. Micro-threading essentially eliminates that problem. It allows you to transfer very fine chunks of data with this incredibly fast interface.
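To make the granularity problem concrete, here is a rough sketch with illustrative numbers (these are not Rambus specifications): because the DRAM core cycle time stays roughly fixed, a faster per-pin interface rate forces more bits to move per core access.

```python
# Illustrative sketch: minimum bytes moved per core access grows in
# proportion to the per-pin interface rate when the core cycle time
# and the number of data pins stay fixed.

def access_granularity_bytes(rate_mbps_per_pin, core_cycle_ns, data_pins):
    """Minimum bytes transferred per core access (prefetch * pins / 8)."""
    bits_per_pin = rate_mbps_per_pin * 1e6 * core_cycle_ns * 1e-9
    return bits_per_pin * data_pins / 8

# Doubling the per-pin rate with the same ~5 ns core cycle doubles
# the minimum transfer for a hypothetical 16-pin device:
for rate in (400, 800, 1600, 3200):
    print(rate, "Mbps/pin ->", access_granularity_bytes(rate, 5, 16), "bytes")
```

This is exactly the "one request, two or four books" effect in the librarian analogy: the fetch gets faster, but the minimum fetch size keeps growing.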
One of the key applications where this is going to help is the graphics space. Graphics processor makers often deal with the frame buffer in units called triangles: they break a three-dimensionally rendered frame into millions of polygons and triangles, and the trend is for those triangles to get smaller and smaller to yield more realistic images. You can think of an image made of a single triangle that basically looks like a triangle; as you add more and more triangles, you generate more and more complex shapes. Three-dimensional applications are now reaching 7 to 10 million triangles in a given frame. If a given triangle is extremely small, it helps graphics vendors to be able to pull individual triangles out of memory. That's where micro-threading comes into play, and it is why XDR2 primarily targets graphics vendors. We are also targeting consumer electronics and networking vendors.
What is it about consumer electronics and networking applications that makes them targets for XDR2?
Let's touch on networking first. Networking packet buffer processors essentially deal with arbitrary-length network traffic packets coming in and out of your router, switch, or wherever your packet buffer processor sits. About half of those packets end up being about 32 bytes in length. As memory interfaces increase in speed, you are no longer able to access just 32 bytes at a time; you are forced to access 64 bytes or more per request. With micro-threading and XDR2 we have brought that back down to 16 bytes, so you can access at that small granularity without wasting half your bandwidth on data you don't need. With consumer electronics the value proposition is similar to that of XDR1: you obtain the bandwidth you need using a single XDR2 device where you would ordinarily require multiple DDR2 or DDR3 devices. There is definitely going to be some benefit from the finer access granularity as well, given that consumer electronics increasingly deal with three-dimensional processing and finer cells for rendering images.
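The half-wasted-bandwidth arithmetic can be sketched as a toy calculation using the packet and granularity sizes quoted above (the function name is ours, not Rambus's):

```python
import math

# Toy model: what fraction of the bytes a memory actually transfers
# are useful packet data, given a fixed minimum access granularity?

def buffer_efficiency(packet_bytes, granularity_bytes):
    """Useful-data fraction when buffering one packet."""
    accesses = math.ceil(packet_bytes / granularity_bytes)
    return packet_bytes / (accesses * granularity_bytes)

# A 32-byte packet through a 64-byte-granularity memory wastes half
# the bandwidth; a 16-byte granularity wastes none of it.
print(buffer_efficiency(32, 64))  # 0.5
print(buffer_efficiency(32, 16))  # 1.0
```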
XDR2 also integrates a couple of other technologies that are not necessarily new to the semiconductor industry but are new to DRAM. We are integrating transmit equalization to enable these 8 GHz signaling rates across standard FR4 PCB material, and it is still a differential-signaling memory topology; XDR is currently the only differential memory technology shipping in the mainstream. There is also adaptive timing circuitry that complements our FlexPhase timing adjustment in XDR. FlexPhase for XDR eliminates any bit-to-bit skew caused by trace-length mismatch, driver mismatch, or capacitive effects that vary pin to pin. The adaptive circuitry we are including with XDR2 essentially ensures that any differences that arise during system operation, due for example to temperature fluctuations, are tracked out over the course of normal operation.
What is the cost ratio of XDR2 to XDR1?
We haven't publicly announced any DRAM vendors for XDR2. Since there is no silicon, there isn't really any cost information we can quote at this time.
What is the availability date?
We cannot comment specifically on what our customers' plans are. We do know, from conversations with a number of controller vendors in the industry and also with our DRAM vendors, that we expect the need for this kind of memory somewhere in the 2007 timeframe. However, we are not committing to that as a delivery date; it is merely an estimate of when a memory technology like this would be effective in the industry.
Presumably it would be of value today. Is the delay simply the length of time needed to incorporate new technology into an upcoming product lifecycle?
It would definitely have its uses today, but what I mean by saying its effectiveness will come to a head in 2007 is that current memory technologies are sufficient for the market requirements of graphics vendors such as NVIDIA and ATI. It is for us to work with them to define the requirements for the next generation and to see what they are capable of doing. Clearly it will be 2007 before we see any great XDR2 applications.
Do you anticipate any other memory vendor introducing similar or competing technology?
The only competing technologies in the graphics space are efforts in the GDDR market. The comparisons we have shown to date show that XDR2 provides over 5x the effective bandwidth of the best-in-class competing GDDR device.
Who are the vendors of those GDDR devices?
GDDR is primarily supplied by Samsung. They have about 85% of the GDDR market. There are efforts by Hynix and Infineon at this time to produce GDDR3. These are still very small players, about 5% of the market each. Micron is also trying to come up with some GDDR offering.
Other than NVIDIA and ATI who are some of the possible end users of XDR2?
Basically anybody who produces graphics chips. NVIDIA and ATI make up a huge percentage of the market; their combined total dwarfs any other vendor's share. But we clearly have partnerships with companies like S3 and SGI and want to continue those relationships, and we feel XDR2 could bring value to their products as well. Those are the ones off the top of my head, but really anyone who plays in the graphics space.
This product is initially targeted at the graphics industry, although there are some applications in the consumer electronics and networking space.
This product initially targets graphics, similar to the way XDR1 did. The reason XDR1 was primarily adopted for the Cell processor is that it offered the highest possible performance you could get. It resonated well with Sony because they were targeting the Cell processor for the PlayStation 3. Since then, more vendors in the consumer electronics space, specifically DTV and microdisplays, have seen the value of using XDR because they can actually reduce system cost and component count. We imagine XDR2 will follow suit and provide the kind of unprecedented performance that would allow a graphics vendor or game console maker to differentiate themselves from the competition.
Then as memory technology progresses, it would become something that is more mainstream, something that could penetrate into the consumer electronic and networking markets. At least for XDR1 we are looking to get into main memory as well for applications like servers and PCs.
What market share does XDR1 command?
This is a unique technology. There are currently no products shipping in volume with it. The Cell processor is the first announced product to use it, and that is coming to market now: Sony is releasing the first Cell-based product in the spring of 2006. IDC estimates that by 2009 over 800 million units of XDR1 will have shipped, driven primarily by game consoles and consumer electronics. We have a number of customers in the CE space as well, but at this point all of them are still confidential.
Some of the technologies in XDR2 such as differential signaling are contained in other Rambus' products.
Differential signaling was essentially invented for XDR. It is also used in FlexIO processor interconnect, a high-speed logic-to-logic interconnect that came from the same base core technology development as XDR. It is the incarnation of XDR that helps interconnect two chips, much as a front-side bus does if you look at an Intel Northbridge connected to an Intel CPU; something like that would be a good application for FlexIO. The Cell processor also integrates FlexIO to enable multiple Cell processors to talk to each other, as well as to peripheral chips like a graphics synthesizer or a Southbridge.
Rambus is the only vendor using differential signaling.
XDR is the only memory technology currently using differential signaling for its data. Some generations of DDR use differential signaling for their strobes, but DDR data is still single-ended.
What are the advantages of the Rambus approach?
At higher data rates like 3 GHz or 8 GHz, differential becomes your best viable option. The primary advantage of differential signaling is that it is cleaner from a signal integrity standpoint. We are able to reach far higher bandwidths using far lower swing: a 200-millivolt differential swing, as opposed to DDR's swing of more than a volt. Another key benefit of differential signaling is that it is very immune to crosstalk, which makes it well suited to EMI (electromagnetic interference) sensitive applications. That point resonates with consumer electronics vendors, because any technology that requires them to add more EMI shielding adds cost across their product line. By the mere fact that it swings only 200 millivolts, it is also much lower power from the interface standpoint. On the silicon side, power distribution becomes easier because of the way differential drivers and receivers work: you are not constantly switching massive amounts of current on and off from the power supply; it is basically a constant current draw from the drivers and receivers. So it is easier to design.
You are optimistic about XDR2's future based on the parallel with XDR1 in that applications other than those you are initially targeting will come on board.
Absolutely. It is not that we aren't targeting those applications at the outset; we just see the initial applications, the initial adopters, being in the graphics arena, similar to XDR1. It is not that the value doesn't exist today for consumer electronics and networking; it is just that, from a historical perspective, we would expect graphics vendors to pick up the technology first.
Does incorporating this technology present any challenges to EDA tools?
We do a lot of work up front to ensure that developing with the technology is as easy as possible. Historically, especially with XDR1 and DRAM, we have designed the interface for our customers. We can do that in any design flow, whether it is COT, ASIC, ASSP or you name it. A lot of that is essentially dealt with up front by our engineers. What this does for the end customer is reduce the amount of risk they take on: they don't need to worry about designing a difficult XDR2 interface if their core competency is graphics GPUs or networking packet buffers. As other memory technologies increase in speed, it is difficult for them to simply integrate a memory interface on their own.
Any challenges on the manufacturing side?
We are still evaluating it from a technology standpoint. Currently there is nothing we have to publicly announce about the different technologies or the manufacturing differences between XDR1 and XDR2. Over the next year or two it will be something we are looking at internally.
I thought it would be helpful to give a few details about micro-threading, a DRAM core innovation developed to increase memory system efficiency and to enable DRAMs to provide more usable data bandwidth to requesting memory controllers, while minimizing power consumption. See figure below comparing traditional memory core to micro-threaded core.
Most DRAM cores divide their memory storage into discrete banks that can be accessed concurrently. Banks are typically split across the two halves of the DRAM die. Since the DRAM pins are also split across the two halves, each half-bank delivers its data to the pins that correspond to its half. Each time a row within a bank is accessed, the DRAM core dedicates resources on both sides of the die.
A typical DRAM core component has eight independent banks. One bank consists of an “A” half connected to “A” data pins and a “B” half connected to “B” data pins. The two bank halves operate in parallel in response to row and column commands. A row command selects a single row within each bank half, and two column commands select two column locations within each row half. Each group of four bank halves (a “quadrant”) has its own set of column and row decoder circuits. However, these resources are operated in parallel, with each transaction utilizing two diagonal quadrants and not using the other two quadrants.
After a row command is received, the selected row is accessed (sensed and latched). Before another bank can perform a row access, a time tRR must elapse. This time interval represents the time the bank's row circuitry is occupied. After a column command is received, the selected column is accessed. In the case of a read command, the data at the memory location is driven onto the data pins and in the case of a write command the data on the data pins is stored in the location. Before the bank can perform a column access, a time tCC must elapse. This time interval represents the time the bank's column circuitry is occupied. 16 bits are transported on each link during a column access.
With 16 data links, the column granularity is 32 bytes. The row granularity is 64 bytes.
In the case of a micro-threaded DRAM core each component has 16 independent banks. Each of the 16 banks is equivalent to a half-bank of the typical DRAM core. Even-numbered banks connect to the “A” data pins and the odd-numbered banks connect to the “B” data pins. Each bank quadrant operates independently in response to row and column commands (a quadrant is a group of four banks with dedicated row and column circuitry). Furthermore, a column access of an upper quadrant is interleaved with the corresponding column access of the lower quadrant.
After a row command is received, the selected row is accessed. A time tRR must elapse before another bank in the same bank quadrant can perform a row access. However, banks in the other three quadrants may be accessed during the interval. After a column command is received, the selected column is accessed. A time tCC must elapse before this bank can receive another column access command. However, banks in the other three quadrants may be column-accessed during the interval. Each column access only transports data for half the tCC interval, and each column access only uses 8 of the 16 data links, resulting in a column granularity of 8 bytes, one-quarter of the previous value. The
row granularity is 16 bytes, again one-quarter of the previous value.
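The granularity figures above can be rechecked with a small calculation, using the link counts and bit widths stated in the text (16 data links; 16 bits per link per full-tCC column access; two column accesses per row access):

```python
# Recompute the column and row granularities for the conventional and
# micro-threaded cores described above.

def column_granularity_bytes(links_used, bits_per_link_per_access):
    return links_used * bits_per_link_per_access // 8

# Conventional core: all 16 links carry 16 bits each over a full tCC.
conventional_col = column_granularity_bytes(16, 16)   # 32 bytes
conventional_row = 2 * conventional_col               # 64 bytes (2 column accesses)

# Micro-threaded core: 8 links, data for only half the tCC interval,
# so 8 bits per link per access.
micro_col = column_granularity_bytes(8, 8)            # 8 bytes
micro_row = 2 * micro_col                             # 16 bytes

print(conventional_col, conventional_row, micro_col, micro_row)
```

Each micro-threaded figure is one quarter of the conventional one, matching the "one-quarter of the previous value" statements in the text.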
Finer granularity comes at the expense of increased command bandwidth. However, this translates into only a nominal increase in DRAM area, typically about 1%. Since some DRAMs already come with independent row and column circuitry for each quadrant of banks, finer granularity may or may not require added circuitry that would increase the area and cost of the DRAM component.
The top five articles over the last two weeks as determined by the number of readers were
Silicon Canvas Launches Laker PnR Editor
The Laker PnR Editor provides a built-in grid-based router, advanced ECO editing functions, auto DRC correction, tight integration with Mentor Graphics' Calibre and Synopsys' Hercules verification tools, and integration with Novas' Verdi/Debussy/nECO tool set. With these, layout designers can quickly complete time-consuming post-P&R editing tasks such as fixing DRC/LVS violations, timing closure, signal integrity, and functional ECOs.
EMA Design Automation and AEi Systems Announce New Power IC Model Library for PSpice
This PSpice library incorporates over 150 time-domain simulation models for power electronic designs and gives designers capabilities previously unavailable for these popular parts - the ability to plug in a model, representative of the actual IC, and simulate the switching performance under actual operating conditions.
Synopsys Hires Nelson Pratt as Vice President of Marketing Communications
Mr. Pratt's previous 20 years of experience includes being director of Worldwide Marketing at Open Source Development Labs, vice president of Corporate Marketing at InFocus Systems, director of Communications for IBM's Networking Division and senior director of Corporate Marketing for Novell.
Incentia DesignCraft and TimeCraft Adopted By ITE as Its Synthesis and Timing Tapeout Software
ITE Tech. Inc. is one of Taiwan's leading IC design houses. DesignCraft's integrated logic, DFT and low-power synthesis capability helped ITE achieve the smallest possible chip area and power consumption while meeting timing constraints. TimeCraft's fast analysis speed shortened timing analysis turnaround time.
Altera Ships Industry's Largest, Low-Cost FPGA
The Cyclone II EP2C70 device has 68,416 logic elements, 1.1 Mbits of embedded memory and 150 dedicated 18x18 multipliers, resulting in over 40 percent more dedicated DSP resources than the nearest competing FPGA. This makes the EP2C70 device an excellent solution for cost-sensitive DSP applications.
Other EDA News
Other IP & SoC News
-- Jack Horgan, EDACafe.com Contributing Editor.