Denali Memory Blog
 

Archives for: April 2010

Sometimes it takes decades but NAND Flash semiconductor memory is turning out to be quite the media killer. Over the last decade, NAND Flash memory has killed off 35mm photographic film for all but the most dedicated still-photography enthusiasts. With the advent of dSLR (digital single-lens reflex) cameras that also shoot video, such as the Canon 5D and 7D dSLRs, NAND Flash memory now seriously threatens to replace photographic film for movie and TV production because of the lower costs and faster workflows. The latest confirmation of NAND Flash’s lethal effects is this announcement in the Washington Post reporting that Sony plans to terminate production of floppy disks and will stop selling them in Japan next year after producing them for 30 years. For many who thought the floppy disk dead already, it may come as a surprise that they are still in production but they are apparently still in use in Japan where Sony claims to own more than 70% of the market. NAND Flash simply offers computer users many advantages including more capacity, better performance, and more ruggedness compared to the venerable floppy. Soon, the only remnant of the floppy disk’s former glory may well be the icon on a Microsoft Office application toolbar that serves as a shortcut for saving a document or file.

The proven lethal effects of NAND Flash on other portable media also point to the current controversy between SSD and HDD advocates. Hard disk drives retain their lead over SSDs in storage capacity and storage cost/bit and HDD vendors continue to move heaven and earth to maintain that lead. SSD vendors seem just as eager and determined to overtake HDD vendors and the historic record of NAND-Flash-based storage devices’ ability to kill older, competing media formats is pretty compelling. Already, enterprise-class storage systems are finding compelling applications for SSDs as storage accelerators. Leading-edge PC users including gamers and video producers find SSDs compelling for their speed, albeit at a higher cost for storage. Today, SSDs are mostly used as HDD helpers. Tomorrow may be another story. It obviously won’t happen next year or even the year after that, but I wouldn’t bet against the ultimate outcome.

Wondering whether an SSD really makes that much difference to laptop performance? Wonder no longer. Corsair has posted the following video that graphically illustrates how much faster boot and application loading times are when there’s an SSD present. This video should leave no doubt in your mind. In particular, watch as the video has to speed up while the HDD-equipped laptop is doing pretty much nothing.


Steve Leibson
04/26/10

What is a Flash cache?

A Flash cache acts like SRAM memory caches that are designed to speed up DRAM access times; Flash caches speed access to HDDs in an analogous manner. Data is drawn from HDDs as needed and the retrieved data is cached in NAND Flash. The next time this data is needed, it’s drawn directly from the cache instead of the slower HDD. Flash caches do not require as much NAND Flash memory as SSDs, and therefore cost less, but they can deliver significant performance improvements when paired with HDDs—in fact the effective performance of a Flash cache paired with an HDD can actually exceed that of an SSD.
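The read path just described can be sketched in a few lines of Python. This is only a toy model, not any vendor's implementation: the class name `FlashCache`, the dictionary standing in for the HDD, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class FlashCache:
    """Toy read-through cache: a small, fast store in front of a slow one.

    Hypothetical model for illustration only. The backing store is a plain
    dict standing in for the HDD, and LRU eviction is an assumed policy.
    """

    def __init__(self, backing_store, capacity_blocks):
        self.backing_store = backing_store      # stands in for the HDD
        self.capacity_blocks = capacity_blocks  # size of the Flash cache
        self.cache = OrderedDict()              # block number -> data
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            # Hit: serve from Flash and mark the block most recently used.
            self.cache.move_to_end(block)
            self.hits += 1
            return self.cache[block]
        # Miss: fetch from the slow HDD, then keep a copy in Flash.
        self.misses += 1
        data = self.backing_store[block]
        self.cache[block] = data
        if len(self.cache) > self.capacity_blocks:
            self.cache.popitem(last=False)      # evict least recently used
        return data


hdd = {n: f"block-{n}" for n in range(1000)}
cache = FlashCache(hdd, capacity_blocks=4)
for block in [1, 2, 3, 1, 2, 3, 1, 2, 3]:
    cache.read(block)
print("hits:", cache.hits, "misses:", cache.misses)
```

Only the first touch of each block goes to the slow store. Every repeat read is served from the cache, which is the whole point of the technique.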

(Note: It’s also possible to use DRAM to cache HDD data, but DRAM is more expensive than NAND Flash for equivalent capacity and DRAM provides only volatile storage unless you add a backup battery. For these reasons, NAND Flash is the better choice for an HDD memory cache.)

Using a faster memory technology as a cache for a slower-yet-cheaper memory technology is a relatively common technique used by computer designers. Designers have always faced memory access-time problems and caching is a very, very common solution to this problem. If the typical working set is a small fraction of the total HDD capacity, then a cache that holds that working set will make the HDD appear to be as fast (or almost as fast) as NAND Flash memory, resulting in a dramatic improvement in application performance.
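To see why the working set matters so much, consider the standard weighted-average model of access time for a cached device. The sketch below is a back-of-the-envelope calculation, and the latency figures in it (0.1 msec for Flash, 10 msec for an HDD) are assumed round numbers chosen for illustration, not measurements of any product.

```python
def effective_access_time(hit_rate, t_cache_ms, t_hdd_ms):
    """Average access time when a fraction `hit_rate` of reads hit the cache.

    Simplified model: a miss costs one HDD access, a hit costs one cache
    access. Real systems add lookup and fill overheads not modeled here.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache_ms + (1.0 - hit_rate) * t_hdd_ms


# Assumed, illustrative latencies (not from any data sheet).
FLASH_MS = 0.1
HDD_MS = 10.0

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    t = effective_access_time(hit_rate, FLASH_MS, HDD_MS)
    print(f"hit rate {hit_rate:.2f}: {t:.3f} msec average access time")
```

Under these assumptions the average access time only approaches Flash speed when nearly every access hits the cache, which is why the cache must be large enough to hold the working set.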

Adding a cache can deliver significant performance gains for I/O-intensive workloads, but it’s critical to make the cache invisible to the application so that the application code need not be rewritten. You make a Flash cache invisible by integrating it tightly into the operating system and the file system. This step is critical because the application then never has to decide what goes where. When a system employs a mix of HDDs and faster, Flash-based SSDs, application code must explicitly manage code and data placement in storage; when the Flash memory is configured as a cache, it need not. If you can write or rewrite an application so that it explicitly controls where data is stored, then a mix of SSDs and HDDs can be used effectively. A NAND Flash cache used to accelerate HDD performance solves a more common problem: the one ingrained in all existing application programs that were not written for an explicit SSD/HDD storage hierarchy.

The question is: Is there a practical working set that’s a small subset of a computer’s total disk capacity? Intel’s Amber Huffman presented some very interesting data in 2008. Intel tracked five employee power users and observed how they used data over successive time periods. Four out of five of these power users used no more than 6 Gbytes of data for a working set in a typical 10-hour work period. A 6-Gbyte NAND Flash cache is easily and economically achievable today. It’s not an incredibly expensive amount of NAND Flash memory. With the right parallelism designed into the cache, you can get the required access time, throughput, and capacity to make a huge improvement in application performance by masking the HDD’s access time with a relatively small Flash cache.
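One simple way to estimate a working set is to count the distinct disk blocks touched during a time window and multiply by the block size. The sketch below does exactly that. It is not Intel's methodology, which the presentation summary above does not describe. The trace format of `(timestamp, block)` pairs, the 4096-byte block size, and the function names are assumptions made for the example.

```python
GBYTE = 10**9


def working_set_bytes(trace, window_start, window_end, block_size=4096):
    """Bytes covered by the distinct blocks touched inside the time window.

    `trace` is an iterable of (timestamp, block_number) pairs; this format
    is hypothetical and chosen only to keep the example short.
    """
    touched = {block for timestamp, block in trace
               if window_start <= timestamp < window_end}
    return len(touched) * block_size


def fits_in_cache(trace, window_start, window_end, cache_bytes, block_size=4096):
    """True if the window's working set would fit in a cache of cache_bytes."""
    return working_set_bytes(trace, window_start, window_end, block_size) <= cache_bytes


# Tiny made-up trace: block 1 is touched twice but counted once.
trace = [(0, 1), (1, 2), (2, 1), (11, 3)]
print(working_set_bytes(trace, 0, 10), "bytes touched in the first window")
print("fits in a 6-Gbyte cache:", fits_in_cache(trace, 0, 10, 6 * GBYTE))
```

Repeated accesses to the same block add nothing to the working set, so a heavily reused set of files stays small no matter how often it is read.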

Here’s a different example from the Enterprise world that demonstrates the advantages of using Flash memory to cache HDD storage. Pliant Technology, a vendor of high-speed Flash Enterprise SSDs, studied a typical data warehouse. The company compared high-end disk arrays composed of fast, enterprise-class, short-stroked HDDs against a hybrid array of four SSDs and many low cost HDDs (not short-stroked). Pliant’s hybrid drive array dramatically increased available disk capacity and performance versus the conventional short-stroked HDD array. The disk capacity per rack shelf increased by almost an order of magnitude, while the IOPS performance increased 6.5x.

Note that the cost per rack shelf also increased significantly, but this increase was compensated by a corresponding decrease in the number of shelves required for storage. The key figures of merit for this example:

  • total storage-system cost decreased by 50%
  • cost per IOPS decreased 50%
  • the cost per gigabyte of storage improved, and
  • the hybrid disk array required one eighth the power to operate and cool compared to the amount of power needed to operate and cool the array of fast, short-stroked HDDs—nearly an order of magnitude improvement in power consumption.

So NAND Flash memory used as a disk cache whether for low-end applications or enterprise installations shows great promise.

Note: This blog entry is based on a Denali White Paper: The World is Flash.

Love DAC? Design chips? Looking for a job? Today’s your lucky day. Denali, Atrenta, and SpringSoft want you to be able to attend DAC in Anaheim (We’re going to Disneyland!) so much that they’re willing to spring for a free exhibit pass for you. All you need to do is get your carcass down to the Anaheim convention center for the event. The exhibits are open June 14-16. The rules are simple:

1. The recipient must be from a company not exhibiting at DAC.

2. To obtain a "Free-DAC" Exhibit-Only 2010 pass, the recipient must belong to a semiconductor, OEM, system or service provider in the electronics industry.

3. All recipients must include their current place of employment and use a corporate email address. In the case that the recipient is no longer employed, they may use a personal email address and note their previous employer. Out of work EDA professionals are invited to register for a free pass.

**** Bonus, Bonus, Bonus ****

If a free pass isn’t enough of a shameless bribe to get you to attend DAC, here’s a chance at yet another shameless bribe: the first 500 people to register for a pass (and who meet the above qualifications) will enter a drawing for a new Apple iPad. Better hurry because a lotta’ people are already ahead of you in that line.

Ready to register? Go here.

Numonyx has announced or reannounced two 128-Mbit non-volatile memory devices based on the company’s 90nm PCM (phase change memory) process technology. These two devices target existing NOR Flash memory sockets and the company’s press release claims that both products are available now in production quantities. The two announced PCM memory devices are the Omneo P8P PCM parallel-I/O memory and the Omneo P5Q PCM serial-I/O memory. The Omneo P5Q PCM device is configured as a 16Mx8-bit memory and employs an SPI interface that supports dual and quad SPI protocols. The Omneo P8P PCM device is configured as an 8Mx16-bit memory with a 16-bit parallel I/O interface and the device has a secondary SPI channel for issuing block commands to the device and for low-pin-count, in-system programming.

Numonyx PCM chips

The initial read access time for the parallel PCM device is given as 115 nsec with 25 nsec for subsequent reads during 8-word, asynchronous page reads (the parallel device is configured as a x16-bit memory). Write cycle time is a short 70 nsec, but the writes are buffered internally by a 64-byte FIFO that masks the actual cell-specific write time. Using the same memory cell, the serial P5Q PCM memory device’s read-access time is constrained by the SPI interface, not the fundamental access time of the memory cell. One of the big advantages of PCM over NOR Flash is the ability to directly write one memory location with no need for an erase cycle.
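The difference in write behavior is easy to model. In NOR Flash, programming can only change bits from 1 to 0, and the only way to get a 1 back is to erase an entire block. A bit-alterable memory such as PCM simply overwrites the location. The sketch below illustrates the two behaviors in general terms. The class names, the 16-byte block size, and the erase-and-rewrite strategy are assumptions for the example and do not describe the Numonyx parts or their command sets.

```python
class NorFlashBlock:
    """Toy NOR Flash erase block: programming can only clear bits (1 -> 0)."""

    def __init__(self, size):
        self.cells = bytearray([0xFF] * size)  # erased state is all ones
        self.erase_count = 0

    def erase(self):
        for i in range(len(self.cells)):
            self.cells[i] = 0xFF
        self.erase_count += 1

    def program(self, offset, value):
        # Bits already at 0 stay at 0, so the result is old AND new.
        self.cells[offset] &= value

    def write(self, offset, value):
        if (self.cells[offset] & value) == value:
            # Only 1 -> 0 transitions are needed: program in place.
            self.program(offset, value)
            return
        # Some bit must go 0 -> 1: save the block, erase it, reprogram it.
        saved = bytearray(self.cells)
        saved[offset] = value
        self.erase()
        for i, byte in enumerate(saved):
            self.program(i, byte)


class PcmArray:
    """Toy bit-alterable memory: any location can be overwritten directly."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, offset, value):
        self.cells[offset] = value


nor = NorFlashBlock(16)
nor.write(0, 0x0F)   # only clears bits, no erase needed
nor.write(0, 0xF0)   # needs bits set back to 1, forces a block erase
pcm = PcmArray(16)
pcm.write(0, 0x0F)
pcm.write(0, 0xF0)   # plain overwrite
print("NOR erases:", nor.erase_count, "NOR byte:", hex(nor.cells[0]))
print("PCM byte:", hex(pcm.cells[0]))
```

In the NOR model, changing a single byte can cost a whole-block erase plus a rewrite of everything else in the block. The PCM model has no such step.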

Two significant attributes of the devices, made significant because of the technical issues surrounding PCM, are the announced write endurance and the operating temperature range of the devices. Numonyx had previously announced a version of the P8P PCM with 100,000-cycle write endurance. The latest announcement jumps the rated write-cycle endurance for both the P8P and the P5Q memory devices to 1M write cycles.

Operating temperature is important because PCM employs a thermally activated storage mechanism. Ones and zeroes are stored in PCM cells based on a heating/cooling cycle. Fast cooling produces an amorphous state in the PCM’s chalcogenide material and slow cooling essentially anneals the cell into a crystalline state. If the ambient temperature is too high, the heat alone will start to anneal PCM cells. Numonyx’ data sheets give both memories’ operating temperature range as 0 to 70° C.

With data sheets on line and announced volume availability, these two PCM devices from Numonyx appear to be the first commercially available, commercially viable PCM devices on the market. Numonyx is not currently disclosing pricing except to say they’re currently charging a premium given that the devices are not very far down the learning curve. However, given the activity that the company has put into presentations and articles over the past two years, they likely are not the last.

One of the most ignored Intel announcements of recent memory must be Doug Davis’ early disclosure at IDF (China) on April 14 (see the hour-long keynote video here) of the company’s new Atom-based Tunnel Creek, an SOC specifically designed for embedded applications. Intel’s Atom processor, a relatively low-powered implementation of the “Intel Architecture,” has been taking the low-end notebook and netbook world by storm. Atom processors also work well and have been rapidly adopted in the embedded world when the embedded product’s block-diagram resembles a PC. However, smaller embedded systems can’t adopt the multichip, chipset-style design of PCs. Many smaller embedded systems require even fewer chips for cost-effective implementation.

Enter Intel’s Tunnel Creek, which sports four x1 lanes of PCIe in addition to the Atom processor core; memory, audio, and video controllers; and an LPC block. The simple addition of a flexible PCIe interface means that embedded designers can gluelessly add a variety of different chips to the Tunnel Creek SOC to create embedded designs with minimal BOMs.

Figure 1: Intel Tunnel Creek block diagram

What can you connect to a PCIe interface that would be useful in an embedded design? Here are just a few ideas that immediately come to mind:

  1. An ASSP with a PCIe interface. In the same talk where he disclosed Tunnel Creek, Davis also mentioned that Intel will be developing more than one application-specific I/O hub for specific use with Tunnel Creek. In addition, there are many other likely candidates already on the market such as advanced video/graphics controllers from companies such as nVidia and fast Ethernet controllers from companies such as Realtek.
  2. An FPGA. Both Xilinx and Altera offer FPGAs with integral PCIe interfaces. Imagine the ability to gluelessly graft an FPGA directly to an Intel Atom-based SOC. Tunnel Creek should be able to do that.
  3. An SSD. You can get PCIe-based SSDs that provide more performance than SATA- or SAS-interfaced SSDs because the PCIe interface is more efficient for high-speed I/O than disk-centric interface protocols. Why add an unneeded disk controller to the mix?
  4. Your own ASIC. Intel and TSMC announced earlier that the Atom core would be available to select customers as an ASIC/SOC core. Perhaps you don’t have the production volumes needed to qualify as a select customer for that program but you’d still like to avail yourself of Intel’s processor architecture because of the immense pool of existing software, the many available operating systems for the x86 architecture, and the broad development tool support. Tunnel Creek gives you a way of doing so using a standard processor-based SOC that will likely be produced in fairly high volumes. For lower production volumes, a 2-chip embedded design may well be the most economical.

If these possibilities excite your inner design muse, then start bothering Intel to see when you can get your hands on some Tunnel Creek samples.

Network World has just posted an SSD comparison test written by Logan G. Harbaugh. The test pitted some consumer-class SSDs against enterprise-class SSDs and against an Adaptec ASR5805/512 SSD controller and MaxIQ kit, which uses an attached SSD to accelerate attached drive arrays via flash caching. Overall, Harbaugh found that the SSDs improved system performance by a factor of 2 to 10 depending on the product.

The discussed test results illustrate many key points we’re learning about SSDs used both as direct storage devices and as caches for rotating storage. For example, the article discusses the “write cliff,” which is the sudden loss of write performance sustained by some SSDs after they are initially filled. The cause of the performance loss is the need to scavenge free space from deleted files and the need for “wear leveling,” which prevents premature NAND Flash failure in the SSD. If the SSD capacity is not overprovisioned (larger than the rated size), then writes cannot progress until the SSD’s controller finds, liberates, and organizes the free space. If the SSD capacity is overprovisioned, then free-space “garbage collection” and wear leveling can occur as a background task and will not increase the drive’s write latency. The write latency of an SSD that’s not overprovisioned can range unpredictably from milliseconds to seconds depending on the amount of free space to be liberated. Of course, there’s a cost for capacity overprovisioning and the higher cost of enterprise-class SSDs reflects the cost for added storage. The Network World article notes these characteristics in fine detail.
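A very rough model shows how spare capacity hides the cost of reclaiming space. In the sketch below, a full drive receives a burst of overwrites. Each write is fast if a pre-erased spare block is waiting and slow if the controller must first reclaim a block in the foreground. The function name and both latency figures are assumptions chosen for illustration. Real controllers are far more sophisticated, and this model ignores wear leveling and background reclamation between bursts.

```python
def simulate_burst(spare_blocks, burst_writes, fast_ms=0.1, reclaim_ms=50.0):
    """Per-write latencies for one burst of overwrites on a full drive.

    spare_blocks: pre-erased blocks available when the burst starts
                  (the overprovisioned capacity in this toy model).
    fast_ms, reclaim_ms: assumed, illustrative latencies only.
    """
    free = spare_blocks
    latencies = []
    for _ in range(burst_writes):
        if free > 0:
            # An erased block is ready: the write proceeds at full speed.
            free -= 1
            latencies.append(fast_ms)
        else:
            # No erased block left: reclaim one before the write can land.
            latencies.append(reclaim_ms + fast_ms)
    return latencies


for spare in (0, 4, 16):
    lat = simulate_burst(spare_blocks=spare, burst_writes=10)
    slow = sum(1 for x in lat if x > 1.0)
    print(f"spare blocks {spare:2d}: {slow} of {len(lat)} writes hit the cliff")
```

With enough spare blocks to absorb the burst, no write ever waits on reclamation. With none, every write does, which is the write cliff in miniature.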

Harbaugh tested the Adaptec SSD controller/Intel SSD combo and drives or drive arrays from Apricorn, Compellent, Dot Hill, Fusionio, HP, and Ritek. The testbed was an HP ML370G5 server running Windows Server 2003 with external storage connected via Fibre Channel through a 2Gbps HP FC switch and the tests were run with IOmeter. Harbaugh notes that the Fusionio 32-Gbyte ioDrive and the HP StorageWorks IO Accelerator for blade servers (made by Fusionio) achieved the highest throughput in the tests. The Fusionio SSD delivered read and write throughputs of 706 and 456 Mbytes/sec respectively while the HP StorageWorks IO Accelerator delivered read and write throughputs on a blade server of 806 and 618 Mbytes/sec respectively. Both products delivered “excellent IOps, with no write cliff.” Note that the Fusionio drive costs $6,829.99 and the list price for the HP StorageWorks IO Accelerator ranges from $4,400 to $13,200 on the HP Web site for storage capacities ranging from 80 to 320 Gbytes. The tested 160-Gbyte HP accelerator lists for $7,700. So performance comes at a price (nothing new there) that’s substantially higher than the SATA drives configured as HDD replacements.
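One way to put the two top performers on a common footing is to divide price by delivered throughput. The short calculation below uses only the prices and throughput figures quoted above. The dictionary layout and the function name are just conveniences for the example, and dollars per Mbyte/sec is only one of several reasonable figures of merit.

```python
# Prices and throughputs as quoted from the Network World test results.
drives = {
    "Fusionio ioDrive": {
        "price_usd": 6829.99, "read_mbps": 706, "write_mbps": 456,
    },
    "HP StorageWorks IO Accelerator (160 Gbytes)": {
        "price_usd": 7700.00, "read_mbps": 806, "write_mbps": 618,
    },
}


def dollars_per_mbyte_per_sec(price_usd, throughput_mbps):
    """Cost of each Mbyte/sec of sustained throughput."""
    return price_usd / throughput_mbps


for name, d in drives.items():
    read_cost = dollars_per_mbyte_per_sec(d["price_usd"], d["read_mbps"])
    write_cost = dollars_per_mbyte_per_sec(d["price_usd"], d["write_mbps"])
    print(f"{name}: ${read_cost:.2f} per Mbyte/sec read, "
          f"${write_cost:.2f} per Mbyte/sec write")
```

By this measure the two products land close together on reads, at a little under $10 per Mbyte/sec each, while the HP accelerator comes out somewhat cheaper per Mbyte/sec on writes.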

Only the application developer can determine if high throughput and the absence of a write cliff are worth several thousand dollars. In many enterprise-class situations, the additional hardware cost is irrelevant. Online businesses such as Amazon.com have found that even a 1% increase in response time causes lost sales to the tune of millions of dollars because online customers are impatient and they bore easily. They will not wait long before wandering off, perhaps to a faster competitor. Large investment firms measure millisecond increases in online trading latency in terms of millions of dollars as well. The first trading firm to get a deal leaves competitors choking in the dust with no deal at all. So a few thousand dollars to avoid a sudden latency increase measured in tens or hundreds of milliseconds or even seconds is trivial insurance against large potential business losses.

The Network World article notes that one advantage these faster drives have is their use of the PCIe interface rather than the drive industry’s preferred SATA or SAS interfaces. The PCIe interface is closely coupled to the computer’s or server’s processor and can therefore provide very low latency and very high throughput, which is easily increased simply by adding parallel PCIe lanes. Several of the storage products including those from Fusionio and HP that were tested in this article employ PCIe interfaces to improve the storage subsystem’s performance.
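The lane-scaling point can be made concrete with a little arithmetic. PCIe 1.x signals at 2.5 GT/sec per lane and PCIe 2.0 at 5 GT/sec, both using 8b/10b encoding, which works out to 250 and 500 Mbytes/sec per lane in each direction. The sketch below computes raw link bandwidth for several lane counts. The article does not say which PCIe generation or lane width the tested products use, so the combinations shown are purely illustrative, and delivered throughput will be lower than these raw figures because of protocol overhead.

```python
def link_bandwidth_mbytes(megatransfers_per_sec, lanes):
    """Raw one-direction PCIe bandwidth in Mbytes/sec for 8b/10b links.

    Applies to PCIe 1.x (2500 MT/sec) and 2.0 (5000 MT/sec) only; later
    generations use a different encoding and are not modeled here.
    """
    # 8b/10b encoding carries 8 data bits in every 10 bits on the wire.
    data_mbits_per_lane = megatransfers_per_sec * 8 // 10
    return data_mbits_per_lane // 8 * lanes


for generation, rate in (("PCIe 1.x", 2500), ("PCIe 2.0", 5000)):
    for lanes in (1, 4, 8):
        bandwidth = link_bandwidth_mbytes(rate, lanes)
        print(f"{generation} x{lanes}: {bandwidth} Mbytes/sec per direction")
```

Bandwidth grows linearly with lane count, which is why adding lanes is such a direct route to more storage throughput.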

Anandtech has just posted a meaty article about SandForce SSD controllers as used in SSDs from OCZ and Corsair. (Understanding SandForce's SF-1200 & SF-1500, Not All Drives are Equal) It’s worth a read from at least two perspectives. First, it gives you some pretty deep insight into the real importance and value of the firmware running on these SSD controllers. As the Anandtech article discusses, controller firmware can make a substantial performance difference using the same hardware. In the case of the SandForce SF-1500 enterprise SSD controller and the company’s SF-1200 “client” SSD controller chips, firmware makes all the difference in performance because the two devices are electrically the same IC. Both chips running their associated firmware are rated at 30K random-read IOPS (I/O operations per second for 4-Kbyte reads) but the SF-1500 is rated at 30K random-write IOPS (4-Kbyte writes) while the SF-1200 is rated at 10K random-write IOPS, which is a whopping two-thirds less performance than delivered by the electrically identical SF-1500 controller chip. There’s also an order-of-magnitude difference in rated data reliability between the two controllers, which you’d expect customers to want for enterprise-class SSDs. From the perspective of product positioning, this performance spread makes tremendous sense because the enterprise-class SF-1500 is reported to be substantially more expensive than the SF-1200 and the price premium is largely attributable to the differential speed and data-reliability performance delivered by the controller firmware (along with some extra reliability testing for the SF-1500 chip).
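It can help to translate those IOPS ratings into throughput. The arithmetic below uses the rated figures quoted above and assumes that a 4-Kbyte transfer means 4096 bytes and that a Mbyte is one million bytes. The function names are made up for the example.

```python
BLOCK_BYTES = 4096  # assumed size of a "4-Kbyte" transfer


def iops_to_mbytes_per_sec(iops, block_bytes=BLOCK_BYTES):
    """Throughput implied by an IOPS rating at a fixed transfer size."""
    return iops * block_bytes / 1_000_000


def fractional_reduction(baseline, reduced):
    """How much lower `reduced` is than `baseline`, as a fraction."""
    return (baseline - reduced) / baseline


sf1500_write_iops = 30_000  # rated random-write IOPS, SF-1500
sf1200_write_iops = 10_000  # rated random-write IOPS, SF-1200

print(f"SF-1500: {iops_to_mbytes_per_sec(sf1500_write_iops):.2f} Mbytes/sec")
print(f"SF-1200: {iops_to_mbytes_per_sec(sf1200_write_iops):.2f} Mbytes/sec")
drop = fractional_reduction(sf1500_write_iops, sf1200_write_iops)
print(f"Reduction in rated random-write IOPS: {drop:.1%}")
```

So the firmware difference alone accounts for roughly 80 Mbytes/sec of rated small-block random-write throughput on identical silicon.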

As Anandtech reports, because of a special relationship with SandForce, OCZ apparently got a special “fast” version of the control firmware for the SF-1200 controller that delivers the faster random-write IOPS performance of the SF-1500 and OCZ will reportedly be using that controller-chip/firmware combo in an upcoming Vertex 2 SSD. Anandtech further reports that this “fast” SF-1200 firmware reached at least one other SSD vendor, Corsair, through an early firmware release. Anandtech has tested an early review version of this drive and it delivers the higher performance. A later production version of this controller firmware apparently does throttle the SF-1200’s write IOPS performance to the rated “client” SSD levels.

Despite Anandtech’s stated concerns, all this versioning and throttling isn’t anything particularly new or insidious in the electronics industry. The end customer pays for performance, which is both a real and perceived value, whether or not the performance is due to hardware or firmware differences. From an SSD customer’s perspective, it’s the drive delivering this performance and most customers will not care or even understand the fine distinction between hardware-delivered and firmware-delivered performance. In fact the Anandtech article notes that Intel does a similar thing by enabling or disabling Hyperthreading on two differently priced Core i5 and Core i7 processors. Same die, different performance. I know of a case all the way back in the 1970s where changing one bit in a product’s firmware image doubled the amount of RAM available to the user. Changing that bit cost the customer a few thousand dollars. So the practice isn’t new. But SSD vendors do need to know the difference. They need to understand the cost/performance tradeoffs they are making, because their products’ performance will reflect the consequences of these choices.

Thus the point of this blog entry is to point out, perhaps even to underscore, that firmware is a big differentiator in SSD performance, just as it is in just about any product category. SSD vendors need to understand this. Companies developing SSD controller chips should be aware that excellent controller firmware can substantially differentiate products, just as it has for SandForce’s SF-1500 and SF-1200 controllers. SSD manufacturers should be ready to grill their controller vendors about the supplied firmware. Is it as good as it can be? Could it be faster? Very good questions to ask as the SSD competitive landscape continues to heat up. SSD controller-chip vendors: be prepared.

About Author

The Denali Memory Report addresses trends, analysis, and news for the semiconductor memory industry. The blog is designed to provide practical and unbiased analysis of the memory market, including vendor profiles, technology roadmaps, price/supply outlooks, and other news developments.