When Linux does well: the e1000e Ethernet bug fixed

October 02, 2008 6:31 PM EDT
One reason I love Linux is that when there's a problem, it gets fixed. Usually, it gets fixed in a hurry and that's exactly what happened with the e1000e Ethernet bug.

To bring you up to speed, a pre-release version of the 2.6.27 Linux kernel, which was being used in several beta Linux distributions, was sometimes frying the Ethernet firmware in systems equipped with Intel's ICH8 and ICH9 chipsets and their 82566 and 82567 Ethernet controllers. The major distributions to worry about were the Mandriva Linux 2009 pre-releases; Novell's SUSE Linux Enterprise 11 Beta 1 and openSUSE 11 Beta 1; Fedora 10 release candidates 1 and 2; Gentoo Linux; and Ubuntu Intrepid Ibex.

OK, so most people were unlikely to ever see this bug, but, on the other hand, a lot of people play with beta Linux distributions. In particular, Fedora was very close to shipping so it's reasonable to assume that quite a few Linux users were putting it through its paces.

Now, thanks to Intel, and a nudge from Linus Torvalds, there's code that will fix the problem. This fix will be in the next pre-release version of the 2.6.27 kernel -- Linux 2.6.27-rc9 -- on October 5th.

Torvalds, in the gentle way he guides the Linux development team, pointed out on the LKML (Linux Kernel Mailing List) that "Btw, the _real_ bug is clearly in the hardware design that allows you to brick those things without apparently even having a lock bit."

Torvalds continued, "I'm hoping Intel doesn't treat this as just a software bug. Some hw designer should be thinking hard about which orifice they put their head up in. It used to be that you could fry some monitors by feeding them out-of-range signals. The _monitors_ got fixed."

The next day, Bruce Allen, a Linux kernel developer and Intel engineer, announced a "patch [which] is meant to prevent all future corruptions of the e1000e NVM (non volatile memory) after the driver is loaded." Torvalds immediately applied it to the next test version of the kernel.

This is not the end of the story. This is a fix that prevents the problem, but it doesn't explain how the problem happened in the first place. But, Allen wrote on the LKML, "This should allow us to move forward with debugging without allowing any other bad element or the e1000e driver, to write to the NVM area unexpectedly."

"Currently we (Intel Ethernet) are reproducing the issue on multiple machines in house, we are working on the issue with the other core Linux teams here at Intel and within the community. No resolution yet but we are much closer now."

Once the problem is nailed down, "we will post patches to help users who have had this problem restore their eeprom from either a saved image from ethtool -e or from another identical system."
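
If you want a saved image on hand before experimenting with a pre-release kernel, the idea can be sketched roughly as follows. This is only a minimal sketch, not Intel's procedure: it assumes ethtool is installed, that you run it as root, and that your interface is called something like eth0. The function names are made up for illustration. It covers only saving and comparing an image, since the restore patches mentioned above had not been posted yet.

```python
import hashlib
import subprocess
from pathlib import Path


def eeprom_dump_command(interface: str) -> list[str]:
    """Build the ethtool command that dumps a NIC's EEPROM as raw bytes."""
    # "raw on" asks ethtool for binary output instead of a hex listing.
    return ["ethtool", "-e", interface, "raw", "on"]


def save_eeprom_image(interface: str, out_path: Path) -> str:
    """Write a raw EEPROM dump to out_path and return its SHA-256 digest."""
    # Usually needs root; raises CalledProcessError if ethtool fails.
    result = subprocess.run(
        eeprom_dump_command(interface), check=True, capture_output=True
    )
    out_path.write_bytes(result.stdout)
    return hashlib.sha256(result.stdout).hexdigest()


def images_match(path_a: Path, path_b: Path) -> bool:
    """Return True if two saved EEPROM images are byte-for-byte identical."""
    return path_a.read_bytes() == path_b.read_bytes()
```

Calling save_eeprom_image("eth0", Path("eth0-eeprom.bin")) before and after testing a kernel, then comparing the two files with images_match, would at least tell you whether the contents changed.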

By the time the next production version of the Linux kernel, 2.6.28, comes out later this year, this problem will be just obscure developer history rather than a current concern.