Denali Memory Blog
 

Archives for: May 2010

A few days ago, this blog discussed Marc Greenberg’s “big resistor” model of semiconductor DRAM power consumption, and that entry ended with this statement:

“It might also be a very good idea to design your system so that it can accommodate more than one DDR variant. That way, time and circumstance can determine which DRAM technology gets used at any given time. That idea, coincidentally, is the topic of a Denali White Paper scheduled to be released in the near future.”

Well, designing advanced SOCs so that they can mate with multiple generations of SDRAMs is not just a nice recommendation based on a convenient theory—it happens in the real world. Case in point: ST Microelectronics’ announcement of the company’s new SPEAr (Structured Processor Enhanced Architecture) 1300 family of embedded microprocessors that target high-performance connectivity and embedded applications.

ST Micro’s new embedded microprocessor family typifies the current evolutionary step in high-end embedded processing platforms. The SPEAr1300 family is based on a 600-MHz, dual-core implementation of ARM’s 32-bit Cortex-A9 microprocessor. Each of the two Cortex-A9 cores has separate 32-kbyte L1 SRAM instruction and data caches. The chip adds a unified 512-kbyte L2 SRAM cache and a 32-kbyte boot RAM, so processors in this family can easily handle the processing requirements of many high-end embedded applications. The SPEAr1300 family also includes a host of peripheral interfaces commonly used in today’s high-end embedded applications (USB, PCIe, and SATA, for example), plus many of the internal peripherals needed to create complete systems, as shown in the block diagram below:





Like most embedded microprocessors being designed today, ST Microelectronics’ SPEAr1300 family includes little on-chip memory beyond the L1 and L2 caches for the dual-core ARM Cortex-A9. Yet that 600-MHz dual-core processor will have quite a thirst for instructions and data, so a large memory must be somewhere nearby. Like most SOCs, the SPEAr1300 family is designed to mate with the current leader in low-cost memory storage: DDR SDRAM. To do this, the SPEAr1300 devices include a DDR SDRAM controller capable of driving either DDR2 or DDR3 SDRAM. This conservative design approach, accommodating two generations of SDRAM, gives system designers maximum leeway in developing systems along a wide price/performance continuum. It will also help drive the chips’ sales volumes by making sure these embedded processors suit the widest possible range of applications.

As previously mentioned, Denali will soon release a White Paper discussing the topic of multigenerational DDR SDRAM support in more detail.

Some combinations, like chocolate with peanut butter, ice cream with peanuts and chocolate sauce, and HDDs with NAND Flash caching, make obvious sense. Other combinations aren’t such an obvious match. In these cases, people need extra convincing. LG and Hitachi are suggesting just such a non-obvious combo as a concept: an optical drive and NAND Flash. Optical drives? Aren’t they on the way out, like floppy disks? Not really. We still use them to play Blu-ray and DVD videos, right? And we’ve all experienced the problems associated with scratched or defective video discs, right? So the two backers of the HyDrive optical disc/NAND Flash combo put together this extremely short, extremely effective video to prove the utility of fortifying an optical disc with NAND Flash:



That’s enough to make a believer out of me.

Oh yes: once it’s in the system, the NAND Flash in the optical drive can also help accelerate the HDD, given the appropriate drivers. The HyDrive concept demonstrates once more that getting NAND Flash into a PC, in some way, can be a real performance booster for a wide variety of applications. The variety may be even wider than is intuitively obvious.

EETimes' Junko Yoshida just published an article titled “5 reasons why Samsung scares Japan” (http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=225200248) that discusses five major reasons why Samsung has become the manufacturing powerhouse that it is today. There’s no question that in semiconductor memory, Samsung is a true industry leader. According to DRAMeXchange, Samsung was the dominant player in semiconductor DRAM, with a 32.3% market share in 1Q2010, and the company held an even more dominant position in NAND Flash during the same period, with a 38.3% market share. (During this same period, Hynix was the #2 DRAM player with a 21.5% market share and Toshiba was the #2 NAND Flash player with a 32.4% market share, again according to DRAMeXchange.)

Without a doubt, Samsung currently holds an impressive lead in semiconductor memory manufacturing—at least from a market-share perspective. Further, as the last part of Yoshida’s article discusses, Samsung’s announced plans to more than double its capital expenditures this year versus 2009 are clearly aimed at keeping Samsung in the lead.

Not so coincidentally, Samsung is giving a keynote at MemCon 2010, being held on July 28. Register here: https://www.denali.com/en/memcon/2010/


To see the DRAMeXchange 1Q2010 DRAM vendor ranking, go here: http://www.dramexchange.com/WeeklyResearch/Post/2/2358.html

To see the DRAMeXchange 1Q2010 NAND Flash vendor ranking, go here: http://www.dramexchange.com/WeeklyResearch/Post/2/2365.html

Too many DRAM choices. If you want low power DRAM, do you choose LPDDR1, LPDDR2, DDR2L, DDR3L, or what? It’s not an easy choice because of all the variables involved: the peak and average transfer rates, the power supply voltages, and the IC manufacturing process technology used—just to name three. In addition there are the DRAMs’ various operating modes and the corresponding current consumption in each mode. The modes of interest include:

  • Read mode (Idd4R)

  • Idle mode (Idd0)

  • Power-down mode (Idd2P)

  • Self-refresh mode (Idd6)

If you try to zero in on the optimal DDR choice using the multivariate equation needed to compute the exact energy consumption of any DRAM you might pick for a system design, taking all of these modes into account, you will most likely never get to the point where you can clearly identify the right DRAM choice.
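For reference, here is roughly what that multivariate calculation looks like for a single device on a single supply rail. This is a minimal Python sketch, not Denali’s method. It assumes one supply voltage, ignores I/O and termination power, and uses made-up current and residency figures in place of real datasheet IDD values. The names `ModeProfile` and `average_power_mw` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeProfile:
    name: str          # e.g. "read (Idd4R)" or "self-refresh (Idd6)"
    current_ma: float  # supply current in this mode, mA (hypothetical value)
    residency: float   # fraction of total time spent in this mode, 0..1


def average_power_mw(vdd: float, modes: list[ModeProfile]) -> float:
    """Average power in mW: VDD times the residency-weighted mode current.

    Assumes a single supply rail; real parts may have several.
    """
    total = sum(m.residency for m in modes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"residencies must sum to 1, got {total}")
    return vdd * sum(m.current_ma * m.residency for m in modes)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    profile = [
        ModeProfile("read (Idd4R)", 200.0, 0.1),
        ModeProfile("idle (Idd0)", 60.0, 0.2),
        ModeProfile("power-down (Idd2P)", 12.0, 0.3),
        ModeProfile("self-refresh (Idd6)", 8.0, 0.4),
    ]
    print(f"{average_power_mw(1.5, profile):.1f} mW")
```

Even this simplified version needs a current figure and a time fraction for every mode of every candidate DRAM. Real devices, such as LPDDR2 with its separate VDD1 and VDD2 rails, need more terms still.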

Fortunately, your choice needn’t be that complicated, and Denali’s Marc Greenberg has an answer for you. He calls it the “big resistor” model of semiconductor DRAM power consumption. Now, we all know that DRAMs aren’t big resistors. There are billions of tiny transistors and capacitors inside a DRAM and not one “big resistor” to be found in that package. But from a pins-out perspective, and to a first-order approximation, a DRAM’s power consumption is roughly proportional to the square of its power-supply voltage, just as the power dissipated by a fixed resistance R is V²/R. To get the lowest power dissipation, you simply need to pick the DRAM with the lowest operating voltage that can meet your performance criteria. Now that’s a much easier choice.

Below is a table that shows the various choices for DDR DRAM, the associated operating voltages (including a few of Marc’s educated guesses), and a figure of merit based on the square of the operating voltage relative to DDR1 DRAM. Clearly from the table, LPDDR2 DRAM is the right choice if power and energy consumption are your only decision criteria.
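The figure of merit itself is easy to reproduce. The Python sketch below assumes the commonly quoted nominal supply voltage for each standard and ignores LPDDR2’s separate 1.8 V VDD1 rail. DDR2L is left out because the table’s value for it isn’t reproduced here. `NOMINAL_VDD` and `relative_power` are illustrative names.

```python
# "Big resistor" figure of merit: for a fixed effective resistance R,
# P = V**2 / R, so power relative to DDR1 scales as (V / V_DDR1)**2.

NOMINAL_VDD = {      # nominal supply voltages in volts (assumed values)
    "DDR1": 2.5,
    "DDR2": 1.8,
    "LPDDR1": 1.8,
    "DDR3": 1.5,
    "DDR3L": 1.35,
    "LPDDR2": 1.2,   # VDD2/VDDQ only; the 1.8 V VDD1 rail is ignored here
}


def relative_power(vdd: float, reference_vdd: float = NOMINAL_VDD["DDR1"]) -> float:
    """Power relative to the reference part, assuming identical effective R."""
    return (vdd / reference_vdd) ** 2


if __name__ == "__main__":
    # Print from lowest to highest voltage (and therefore lowest to highest power).
    for name, vdd in sorted(NOMINAL_VDD.items(), key=lambda kv: kv[1]):
        print(f"{name:7s} {vdd:4.2f} V  {relative_power(vdd):.3f}")
```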

Here are four bar graphs comparing the power consumption of the various DDR memory technologies in read, idle, power-down, and self-refresh modes.

From these bar graphs, LPDDR2 is the clear winner if power and energy consumption are the only considerations. However, power and energy consumption are never the only considerations in the real world of system design and manufacturing. Availability, device capacity, device organization, device speed, device cost, per-bit cost, second sourcing, and technology road map are other real considerations, just to name a few. So it’s helpful to see what other DRAM technologies might serve for a given system. Looking at the above bar graphs, it’s apparent that a low-voltage variation of DDR3, DDR3L, might give LPDDR2 DRAM a respectable run for the money except in power-down mode. If power-down mode is important because your system will put the DRAM in that mode a lot of the time, then LPDDR1 DRAM might also be a good choice.

It might also be a very good idea to design your system so that it can accommodate more than one DDR variant. That way, time and circumstance can determine which DRAM technology gets used at any given time. That idea, coincidentally, is the topic of a Denali White Paper scheduled to be released in the near future.

Earlier, we covered the announcement of OCZ’s Enyo USB 3.0 external SSD. Now PC Perspective has created a pretty interesting video review of the drive. OK, the review starts kind of slow and seems to have very little technical meat but you might want to let the video play until they disassemble the drive to reveal the product’s interesting silicon guts. That’s when you’ll learn all you need to know about the chips inside (including the SSD and USB 3.0 controller chip identities) and where you’ll hear about the speed tests PC Perspective applied to the drive. There are three speed tests discussed: one for a USB 2.0 connection (ho hum), one for USB 3.0 using standard drivers (now we’re talkin’), and one using OCZ’s special USB 3.0 drivers (yowzah!).

OCZ Enyo 128GB Portable SSD Review from Ryan Shrout on Vimeo.

Can you spare nine minutes to get a really good grounding in SSD concepts? No? How about eight minutes and forty-two seconds? Ah, now we’re talking. Information Week and Storage Switzerland’s Lead Analyst George Crump have produced an excellent 4-part video tutorial series on SSDs. Each part runs less than 10 minutes. Part 1 covers the technical side of storage evolution concisely and engagingly and is sure to interest even the most experienced computer-storage techno-geek, all in precisely eight minutes and forty-two seconds.

Part 2 covers the use cases for SSD in large, enterprise-class storage situations. Be sure to catch Crump’s expert presentation skills as he deftly manipulates three whiteboard markers at the same time to make his points.

Part 3 provides an in-depth, technical look at the various ways Solid State Disk is being delivered: as a separate SSD box connected to a SAN, as a drive module in a drive array, and as a PCIe card in a server. Crump explains the application of each approach, including the trade-offs and benefits. (Note: Parts 3 and 4 aren't available as embedded video, so there are links instead.)

http://link.brightcove.com/services/player/bcpid1753162278?bctid=27960235001

Part 4 of the video tutorial looks at SSD cost ramifications and advantages over mechanical storage.

http://link.brightcove.com/services/player/bcpid1753162278?bctid=28087406001

This blog entry starts with a simple, sad truth: DDR DRAMs are naturally inefficient. If this statement bothers you, just get over it. All human-made artifacts have inefficiencies, and DRAMs are no different. However, there are things you can do to squeeze every bit of bandwidth efficiency out of a DDR DRAM, and your efforts can be rewarded with significant performance gains. You can improve memory-subsystem bandwidth by 20% to 30% or perhaps more, depending on how efficient your subsystem was to begin with. With this added efficiency, you can pick one of two benefits, depending on your system design goals. The obvious benefit is more bandwidth at a given SDRAM clock rate. But perhaps your system doesn’t actually need more bandwidth, and lower-power operation and lower memory costs are more pressing. In that case, efficiency improvements can cut the required SDRAM clock rate. This reduces power consumption and allows you to use slower, less expensive, and less power-hungry SDRAMs in your design.

Here’s a figure that illustrates these two benefits.


SDRAM Efficiency Examples

The top bar in the figure shows the bandwidth available from a DDR3-1600 SDRAM subsystem, which uses an 800-MHz clock and transfers data on both clock edges. The theoretical peak bandwidth of such a system is 1600 Mtransfers/sec. However, in this example the memory controller achieves only 50% transfer efficiency with the SDRAM, so the memory subsystem has an effective bandwidth of 800 Mtransfers/sec. If the memory controller is made somewhat smarter about the SDRAM’s needs, it can extract more bandwidth from the SDRAM with no change in clock rate. In the case shown by the second bar in the figure, a controller that boosts SDRAM transfer efficiency to 75% achieves 1200 Mtransfers/sec with no increase in memory cost or clock rate. Alternatively, that same 75% efficiency could deliver the 800-Mtransfer/sec effective bandwidth that the 50%-efficient controller obtained from DDR3-1600 SDRAM, but from a peak transfer rate of only 1066 Mtransfers/sec (800 ÷ 0.75). DDR3-1066 SDRAMs cost considerably less than DDR3-1600 SDRAMs, so there are real cost and power savings to be had by improving memory-controller efficiency. It’s really important to keep in mind that clock speed isn’t the critical figure of merit here. Effective bandwidth (actual data throughput) is.
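The arithmetic behind the two bars can be checked with a few lines of Python. This sketch assumes, as the figure does, that effective rate is simply peak rate times controller efficiency. It uses exact fractions because the grades usually called DDR3-1066 and DDR3-1333 actually run at 3200/3 and 4000/3 MT/s. The grade table and function names are illustrative.

```python
from fractions import Fraction

# Standard DDR3 peak data rates in MT/s. "1066" and "1333" are really
# 3200/3 and 4000/3, so exact fractions avoid rounding surprises.
DDR3_GRADES = {
    "DDR3-800": Fraction(800),
    "DDR3-1066": Fraction(3200, 3),
    "DDR3-1333": Fraction(4000, 3),
    "DDR3-1600": Fraction(1600),
}


def effective_rate(peak_mts, efficiency):
    """Delivered MT/s = peak MT/s x controller efficiency (pass a Fraction)."""
    return Fraction(peak_mts) * Fraction(efficiency)


def slowest_grade_for(target_mts, efficiency):
    """Slowest listed DDR3 grade whose peak rate x efficiency meets the target."""
    required_peak = Fraction(target_mts) / Fraction(efficiency)
    for name, peak in sorted(DDR3_GRADES.items(), key=lambda kv: kv[1]):
        if peak >= required_peak:
            return name, required_peak
    raise ValueError("no listed grade is fast enough")


if __name__ == "__main__":
    print(effective_rate(1600, Fraction(1, 2)))     # 800 MT/s at 50%
    print(effective_rate(1600, Fraction(3, 4)))     # 1200 MT/s at 75%
    print(slowest_grade_for(800, Fraction(3, 4)))   # DDR3-1066 suffices at 75%
```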

How can a memory controller make an SDRAM more efficient? Throughput-improvement techniques are based on a firm understanding of the causes of SDRAM inefficiency. Setting aside refresh overhead, an SDRAM can be essentially 100% efficient if all memory accesses are of the same type (read or write) and are directed at the same memory page. As soon as the memory controller must switch to another page in the same bank, inefficiency appears. The controller must close (precharge) the open page and then open (activate) the new one before the actual access can take place. The other way to achieve high transfer efficiency is to direct consecutive accesses to a rotating list of memory banks, so that back-to-back accesses never hit different rows in the same bank. Unfortunately, few real memory-access patterns look like either of these cases, so we must look elsewhere for efficiency improvements.
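To see why page switches hurt, here is a deliberately crude Python model that serializes every access. A page hit costs one burst. An access to an idle bank adds an activate delay. A page conflict adds precharge plus activate. The timing values are assumptions, roughly those of a DDR3-1600 11-11-11 part in clock cycles. The model ignores bank-level overlap, which is exactly what the rotating-bank pattern and the command reordering discussed below exploit.

```python
# Serialized open-page model: one access at a time, no bank-level overlap.
# Timings are assumed values in memory-clock cycles (roughly DDR3-1600 11-11-11).
T_RCD = 11    # ACTIVATE -> READ/WRITE delay
T_RP = 11     # PRECHARGE -> ACTIVATE delay
T_BURST = 4   # data-bus cycles for one 8-transfer burst


def transfer_efficiency(accesses):
    """accesses: iterable of (bank, row) pairs. Returns data cycles / total cycles."""
    open_row = {}  # bank -> currently open row
    data = total = 0
    for bank, row in accesses:
        if open_row.get(bank) == row:   # page hit: burst only
            cost = T_BURST
        elif bank not in open_row:      # bank idle: activate, then burst
            cost = T_RCD + T_BURST
        else:                           # page conflict: precharge, activate, burst
            cost = T_RP + T_RCD + T_BURST
        open_row[bank] = row
        data += T_BURST
        total += cost
    return data / total if total else 0.0


if __name__ == "__main__":
    same_page = [(0, 5)] * 100
    ping_pong = [(0, i % 2) for i in range(100)]   # two rows, same bank
    print(f"{transfer_efficiency(same_page):.3f}")
    print(f"{transfer_efficiency(ping_pong):.3f}")
```

With these numbers, 100 reads to one open page come out above 97% efficient, while 100 reads alternating between two rows of the same bank fall to about 15%.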

Improvements to SDRAM operational efficiency must emphasize intelligent, traffic-based page management and intelligent command reordering. To achieve these improvements, a memory controller must gather the commands issued by one or more memory-using blocks on the chip. It must order those commands in real time for highest SDRAM efficiency, based on the immediate state of all the SDRAMs attached to the controller. It may also need to reorder them in real time, based on assigned priorities and efficiency considerations, as new commands enter the memory controller’s command queue.

Before looking at the nitty-gritty details of these efficiency-improving techniques, it’s helpful to step back and discuss why most SOC designs have standardized on one (or a very few) SDRAM ports. The most efficient memory, from access and bandwidth perspectives, is on-chip memory. Ideally, every memory-using function block on the SOC would have its own block of dedicated on-chip memory. Distributing memory on chip in this manner boosts raw available bandwidth and greatly diminishes memory resource conflicts. (There’s less conflict because there are more memory resources.) Each new block of memory with its own independent memory interface clearly increases available bandwidth, but at a price, of course: on-chip memory is relatively expensive. On-chip embedded SRAM can easily cost many dollars per Mbyte, whereas memory packaged in stand-alone bulk SDRAM chips costs dollars per Gbyte. That’s roughly a 1000x difference in the per-bit cost of memory. Consequently, system designers strive mightily to cram all of a system’s memory (or as much as possible) into standard SDRAM chips to get the most memory per BOM dollar. Part of that effort involves funneling the various memory-access streams on the SOC into one SDRAM port. A memory controller performs that funneling, and it’s the memory controller’s job to make sure all of the on-chip memory-using function blocks get the bandwidth they need from the attached SDRAM. The challenge is to give each memory client in the system the bandwidth, latency, and quality of service it needs from the single SDRAM resource.

That said, let’s take a look at some possible memory-access optimizations. The following figure shows two read requests, each directed at a different bank of memory. The sequence on the right is unoptimized. It activates Bank 0 and then, after the required number of wait cycles, issues five read commands to that bank. The command sequence then reads from Bank 1, which starts by activating Bank 1 and issuing the read commands after the required delay. No data comes from the SDRAM for eight cycles because the memory controller waited to activate Bank 1 until all reads from Bank 0 were complete. Those eight cycles represent lost bandwidth. Waiting for all accesses to Bank 0 to finish before activating Bank 1 needlessly delays the Bank 1 reads.


SDRAM Read Optimization

The left side of the figure shows what can happen if the memory controller reorders the low-level commands sent to the SDRAM using a look-ahead ordering algorithm and bank interleaving. By promoting the activation command for Bank 1 ahead of the read commands for Bank 0, the memory controller prepares Bank 1 for reads earlier in the sequence. As a result, the reads from Bank 1 occur sooner, recovering the eight cycles of lost bandwidth. To safely reorder these low-level commands, the memory controller must fully understand the SDRAM’s rules of operation. These include the rules for activating more than one bank at a time and the required timing for bank activation and bank reads. Note that in this example, the memory controller has not reordered the memory-access commands it received from the on-chip functional blocks. It has merely interleaved the Bank 1 activation command with the read commands for Bank 0. The read commands to the two banks remain in order.
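A small timeline model makes the look-ahead activation concrete. This Python sketch assumes one command per cycle, five Bank 0 reads, and illustrative DDR3-like timings (tRCD, tRRD, and tCCD, in clock cycles). It computes when the first Bank 1 read can issue, with and without promoting the Bank 1 activate. It is a simplified illustration, not the controller’s actual algorithm, and `first_bank1_read` is a hypothetical name.

```python
# Assumed timings in clock cycles (illustrative, DDR3-like).
T_RCD = 11   # ACTIVATE -> READ, same bank
T_RRD = 6    # ACTIVATE -> ACTIVATE, different banks (kept < T_RCD so the
             # promoted ACTIVATE never collides with a Bank 0 read slot)
T_CCD = 4    # READ -> READ spacing on the shared data bus


def first_bank1_read(n_bank0_reads, promote_activate):
    """Cycle at which the first Bank 1 READ can issue."""
    act0 = 0
    last_rd0 = act0 + T_RCD + (n_bank0_reads - 1) * T_CCD
    if promote_activate:
        act1 = act0 + T_RRD   # issue ACTIVATE for Bank 1 as early as tRRD allows
    else:
        act1 = last_rd0 + 1   # issue it only after the last Bank 0 READ command
    # The Bank 1 READ must satisfy tRCD on Bank 1 and tCCD after the last Bank 0 READ.
    return max(act1 + T_RCD, last_rd0 + T_CCD)


if __name__ == "__main__":
    slow = first_bank1_read(5, promote_activate=False)
    fast = first_bank1_read(5, promote_activate=True)
    print(slow, fast, slow - fast)
```

As long as the Bank 0 reads take longer than tRRD, the promoted activate saves 1 + tRCD - tCCD cycles in this model (8 with these assumed timings). Other timing values give other savings.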

The above example is a simple one that exploits the SDRAM’s bank-interleaving abilities. It’s also possible to reorder the memory-access commands (reads and writes) from the on-chip functional blocks as long as coherency considerations are observed. For example, because of coherency considerations you really don’t want to promote reads to an address before writes to the same address if that wasn’t the original order of issue for those memory-access commands. However, it’s also possible to safely violate coherency considerations in some cases if you know what you’re doing. But that’s a more advanced topic, for a different blog entry.
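Here is one way a controller could combine priority-based reordering with the ordering rule just described. This is a minimal Python sketch of a generic scheduler (priority first, then open-row hits, then age), not Denali’s algorithm. It compares exact addresses only, whereas a real controller would check overlapping address ranges. `Request`, `conflicts`, and `pick_next` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    seq: int           # arrival order (lower = older)
    is_write: bool
    addr: int          # access address (exact-match comparison only)
    bank: int
    row: int
    priority: int = 0  # higher value = more urgent client


def conflicts(older: Request, newer: Request) -> bool:
    """Newer may not pass older if both touch the same address and either writes."""
    return older.addr == newer.addr and (older.is_write or newer.is_write)


def pick_next(queue: list[Request], open_rows: dict[int, int]) -> Optional[Request]:
    """Choose the next request to issue.

    queue must be in arrival order (oldest first). A request is eligible only
    if no older request in the queue conflicts with it. Among eligible
    requests, prefer higher priority, then open-row hits, then the oldest.
    """
    eligible = [r for i, r in enumerate(queue)
                if not any(conflicts(o, r) for o in queue[:i])]
    if not eligible:  # only happens when the queue is empty
        return None
    return min(eligible,
               key=lambda r: (-r.priority, open_rows.get(r.bank) != r.row, r.seq))
```

The oldest request is always eligible, so the scheduler can never deadlock. A high-priority read to an address with an older pending write simply waits until that write has issued.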

(This blog entry is based on a presentation created by Denali's Marc Greenberg.)

One of the maxims in the world of system design is that it has always been relatively hard to interface with DRAMs and make them work properly in all possible operational situations. This isn’t a new situation. It’s been hard to interface with DRAMs since the day they were introduced, back in October 1970, when Intel rolled out the first commercial DRAM, the 1-kbit 1103. Intel’s 1103 was a PMOS chip that introduced the concept of refresh to system designers and used 16V logic levels that required level shifters. Those interfacing difficulties really weren’t so great compared to the interface requirements of the magnetic-core planes that the DRAM replaced. Despite them, the 1103 needed only two years or so to single-handedly end the two-decade dominance of magnetic-core memory. Mostek introduced the first DRAM with multiplexed addressing, the 4-kbit MK4096, in 1973, and suddenly system designers needed to understand the intricacies of DRAM row- and column-access timing. The consequence of incorrectly interfacing to DRAM has always been erroneous DRAM operation. Worse, DRAMs never tell the system when the control signals they’re receiving are out of spec. The DRAMs simply fail to return good data. It has always been up to the system design team to suss out the problems.

Why is this so? Why are DRAMs so darn dumb? Like most things in engineering, the answer lies in economies of scale. Today, PCs and servers drive more than 80% of DRAM chip sales volume so whatever PC and server designers need in a DRAM pretty much determines what most of the manufactured DRAM chips will look like. And what do PC and server designers need in a DRAM? They need cheap storage; the cheaper, the better because PCs and servers use a lot of DRAM so DRAM cost has a big influence on system cost. Today’s PCs, for example, team 16 or 32 DRAM chips with one processor and one memory controller. So it makes a lot more sense, economically, to concentrate as much of the design complexity of the memory-subsystem interface as possible in the one DRAM controller rather than in the 16 or 32 memory chips connected to that controller. That’s why DRAMs started out dumb and have stayed that way.

For a long time after the microprocessor’s introduction in 1971, DRAM control was implemented outside the processor chip, with a few memorable exceptions. One of the most memorable was the DRAM refresh logic built into the original Z80 microprocessor, introduced in 1976. The Z80’s refresh logic eliminated the need to perform DRAM refresh as a software routine or in logic external to the processor. Unfortunately, the Z80’s refresh scheme was designed specifically for the DRAMs of the day. The processor generated a 7-bit refresh address (enough for only 128 rows), which looks laughably small today. Canning a non-programmable refresh scheme into any memory controller, whether or not the controller is part of the processor, no longer makes much sense. DRAM technology evolves quickly, so DRAM controllers must be nimble. For example, today’s SDRAMs have an optional auto-refresh mode, which may or may not be an attractive feature depending on the bandwidth and power goals for a specific memory subsystem. A good SDRAM controller will be able both to insert refresh cycles optimally and to use auto-refresh, depending on the situation.
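As an illustration of programmable refresh insertion, the Python sketch below pays off refresh debt whenever the memory is idle and forces a refresh only when the debt would exceed a limit. The limit of eight postponed refreshes mirrors the DDR3 rule. The slot-based timing (one refresh falling due per tREFI) and the function name `schedule_refresh` are simplifying assumptions, not a description of any particular controller.

```python
# Opportunistic refresh with bounded postponement (sketch).
# Time advances in tREFI slots; one refresh (REF) falls due per slot.
MAX_POSTPONED = 8  # DDR3 allows up to eight REF commands to be postponed


def schedule_refresh(busy_slots, max_postponed=MAX_POSTPONED):
    """busy_slots: iterable of bools, True if traffic is pending in that slot.

    Returns (opportunistic, forced, owed): refreshes issued while idle,
    refreshes forced during traffic, and refreshes still owed at the end.
    """
    owed = opportunistic = forced = 0
    for busy in busy_slots:
        owed += 1                           # a new REF falls due every tREFI
        if not busy:
            opportunistic += owed           # idle: catch up on everything owed
            owed = 0
        elif owed > max_postponed:
            forced += owed - max_postponed  # limit reached: refresh despite traffic
            owed = max_postponed
    return opportunistic, forced, owed


if __name__ == "__main__":
    traffic = [True] * 6 + [False] + [True] * 12
    print(schedule_refresh(traffic))
```

Making the limit and the slot length parameters, rather than hard-wiring them as the Z80 did with its refresh counter, is what lets the same controller logic follow new DRAM generations.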

As DRAMs have grown larger and have migrated from the older asynchronous RAS/CAS interfaces to today’s synchronous interfaces, whole new sets of challenges have emerged. Most of these are associated with the synchronous interface, which evolved in support of the growing L1 and L2 SRAM caches found in modern PC and server processors. Because these caches fill and write back whole cache lines at a time, they shift the emphasis of DRAM access speed from individual access-cycle latency to burst bandwidth. DDR3 memory accentuates this trend with its 8n-prefetch architecture and native 8-transfer bursts. In addition, the memory controller must enforce restrictions on command timing, because the DRAM gives no indication of a command-timing violation. These timings are simply data-sheet specs that must be built into the design of the memory controller.
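Because the DRAM stays silent about timing violations, it helps to check command traces in simulation. This Python sketch checks only a handful of same-bank rules: tRCD, tRP, tRAS, and activating a bank whose row is already open. The default cycle counts are assumed DDR3-1600-like values, not figures from any particular datasheet, and `check_trace` is a hypothetical helper.

```python
def check_trace(trace, t_rcd=11, t_rp=11, t_ras=28):
    """Check a command trace against a few same-bank timing rules.

    trace: list of (cycle, command, bank), sorted by cycle, where command is
    one of "ACT", "RD", "WR", "PRE". Timings are in clock cycles (assumed).
    Returns a list of human-readable violation messages.
    """
    last_act = {}  # bank -> cycle of the ACT that opened the current row
    last_pre = {}  # bank -> cycle of the most recent PRE
    violations = []
    for cycle, cmd, bank in trace:
        if cmd == "ACT":
            if bank in last_act:
                violations.append(f"cycle {cycle}: ACT bank {bank} with a row already open")
            if bank in last_pre and cycle - last_pre[bank] < t_rp:
                violations.append(f"cycle {cycle}: ACT bank {bank} violates tRP")
            last_act[bank] = cycle
        elif cmd in ("RD", "WR"):
            if bank not in last_act:
                violations.append(f"cycle {cycle}: {cmd} bank {bank} with no open row")
            elif cycle - last_act[bank] < t_rcd:
                violations.append(f"cycle {cycle}: {cmd} bank {bank} violates tRCD")
        elif cmd == "PRE":
            if bank in last_act and cycle - last_act[bank] < t_ras:
                violations.append(f"cycle {cycle}: PRE bank {bank} violates tRAS")
            last_pre[bank] = cycle
            last_act.pop(bank, None)  # the row is now closed
        else:
            raise ValueError(f"unknown command {cmd!r}")
    return violations


if __name__ == "__main__":
    bad = [(0, "ACT", 0), (11, "RD", 0), (20, "PRE", 0), (25, "ACT", 0)]
    for message in check_trace(bad):
        print(message)
```

A production checker would also cover cross-bank rules such as tRRD and tFAW, read/write turnaround, and refresh timing, but the structure stays the same: remember when each relevant command last issued and compare against the data-sheet minimums.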

All of this history boils down to the need for fairly sophisticated SDRAM memory control. Improper command sequences or command timing result in erroneous data and data loss. The only way to ensure rock-solid SDRAM operation is to manage every aspect of the SDRAM with a sufficiently capable memory controller that issues properly ordered commands with the right timing. That controller must be paired with a proven SDRAM PHY that provides exactly the signal timing the SDRAM requires at the multi-hundred-MHz clock rates used by today’s DDR2 and DDR3 SDRAMs.

About Author

The Denali Memory Report addresses trends, analysis, and news for the semiconductor memory industry. The blog is designed to provide practical and unbiased analysis of the memory market, including vendor profiles, technology roadmaps, price/supply outlooks, and other news developments.
