Scott McPherson

Tiptoeing Through Minefields

RIM CEO would make a good Amity mayor

As we are all painfully aware, users of the ubiquitous BlackBerry felt great pain (approaching heroin withdrawal, in some cases) during a three-plus-hour outage on Monday, February 11th. Research In Motion (hereinafter RIM), the maker of the BlackBerry, had a little "My bad" on a system upgrade. Quoting from an Associated Press (AP) story on the outage:

It was the second major outage for the service in less than a year. In April, a minor software upgrade crashed the system for all users. A smaller disruption in September also was caused by a software glitch.

So what was the cause of this most recent outage? From all appearances, it was bad Change and Release Management. An ITIL faux pas, as in "What is ITIL?" By RIM's own admission, an upgrade designed to increase capacity was to blame. The full story can be found at: http://www.msnbc.msn.com/id/23134432/

I will ask what every CIO was thinking when s/he read this story: Who in the wide, wide world of sports does capacity upgrades at 3:30 in the afternoon on a Monday? What fool made the decision to upgrade in the middle of a North American workday? Does RIM even know the words Change Management? Or Release Management? And why does RIM not have more redundancy? Again, from the AP story:

Experts said RIM's system is relatively reliable, but its centralized structure means that when there are problems, they can affect millions of users.

E-mail sent to and from BlackBerry phones in North America all goes through a Network Operations Center. It appears the problem occurred there, when one of two Internet addresses that relay e-mail from corporate servers stopped responding, according to Zenprise, a Fremont, Calif., company that helps companies troubleshoot BlackBerry problems.

"Any time you got a system that's got a NOC, a Network Operations Center, you have the potential for a single point of failure," said Jack Gold, with technology analyst firm J.Gold Associates in Northborough, Mass.

"What's a bit surprising to me is that with all the work they've been doing over time ... that they haven't been able to have enough redundancy in the NOC so that there isn't a single point of failure," said Gold, who has done business with RIM.

Whoever was responsible for the outage, apparently s/he will not be too severely punished. According to a quote from CNet News, the co-CEO of RIM is displaying a lack of concern about the outage so outrageous that he is one of my nominees for the Annual Mayor Larry Vaughn Award. Mayor Larry Vaughn, as you may recall, was the character played by the late actor Murray Hamilton in Jaws. As Amity mayor, he steadfastly denied that a shark was eating his constituents, until the weight of evidence was too overwhelming for even him to deny (a beach covered in bloodied swimmers will do that to you).

Anyway, under the headline "RIM's co-CEO downplays BlackBerry outage," RIM's co-leader Jim Balsillie was quoted as saying:

"It was an intermittent delay, a couple of hours," he said. "It's old news. It happened days ago."

That's it? No apology? No contrition? No acknowledgement that your infrastructure is problematic, not to mention your decision-making when it comes to change and release management?

Mr. Balsillie might want to ring up (or email) Canadian Member of Parliament Garth Turner, who described the impact of the outage with a little more candor and honesty in a CBC News article. MP Turner said:

"Everyone's in crisis because they're all picking away at their BlackBerrys and nothing's happening," Turner said. "It's almost like cutting the phone cables or a total collapse in telegraph lines a century ago. It just isolates people in a way that's quite phenomenal."

For more about the impact of the RIM failure on Canadians, read this story:

http://www.cbc.ca/world/story/2008/02/11/blackberry-outage.html

Three hours might not sound like a long time, and if Brittney or Hannah were not able to text their buddies in high school for an afternoon, we can live (rejoice?) with that. But the customer base for BlackBerrys has expanded to include law enforcement, homeland defense, first responders and major decision-makers at all levels of government. These people cannot afford to be out of touch, because much of their correspondence takes place via email. These people are also out of their offices much of the day, and cannot afford to be tethered to a desk or even a laptop. Duh, that's why they carry BlackBerrys in the first place!

So for a North American outage to take place in the middle of the workday, not because of a failure of equipment but because of a failure to make a good decision (and for the second major failure in a year), well, someone needs to wake up at RIM. Here's some free advice: Do your upgrades in the wee hours, like the rest of us have to do. And based on your rate of growth, you can afford to pay the overtime.

Finally, if you make a mistake, say you made a mistake. It plays better with your customers.

What People Are Saying

RIM - Doesn't Get It

This article pretty much reflects what I have heard about RIM management: disconnected from reality. Their CEO is an idiot for making such a statement. Their CIO, I hear from insiders, is clueless, and there is a high probability of future major outages within the next 4-6 months. Their main issue is that they do not have a clue how to run high-performance systems. They have had a number of smaller outages over the past 2 years, which were red flags waving, but they ignored them.
My company depends on uninterrupted service, and after this second major outage in less than a year, plus word floating around that more outages are imminent, our CIO is looking for an alternative and will drop our 57 BB devices by April 1. A shame: they were a great product ruined by poor management.