The forward slash that crippled Google
I've been thinking about the Google systemwide glitch that happened over the weekend. That event led to every site in Google's search results being flagged as potential malware, and after some email badminton between Google and StopBadware.org it was attributed to everyone's favorite fall guy: "human error." From my reading of the events behind what I believe is Google's first visible, systemwide failure, I tend to agree that it was human error. It was not, however, an error caused by some poor techie keying in a forward slash in the wrong place; it was an error of system design that allowed such a simple human goof to bring down the house. You can read the Computerworld news story on the meltdown here.
Shortly after the event, Google's Marissa Mayer (one of the company's senior honchos) posted a recap of the glitch which reads in part, "Unfortunately (and here's the human error), the URL of '/' was mistakenly checked in as a value to the file and '/' expands to all URLs. Fortunately, our on-call site reliability team found the problem quickly and reverted the file. Since we push these updates in a staggered and rolling fashion, the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m., so the duration of the problem for any particular user was approximately 40 minutes."
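To see why a lone '/' is so destructive, here is a minimal sketch in Python. Google has not published how its list is actually matched, so this assumes, purely for illustration, that each entry is treated as a prefix of the URL path. The function name `is_flagged` and the sample entries and URLs are hypothetical.

```python
from urllib.parse import urlparse

def is_flagged(url, patterns):
    """Return True if the URL's path starts with any list entry.

    Assumption: entries are matched as path prefixes. This is an
    illustrative rule, not Google's documented behavior.
    """
    path = urlparse(url).path or "/"  # a bare host counts as "/"
    return any(path.startswith(p) for p in patterns)

# Hypothetical list before and after the bad check-in.
good_list = ["/malware-kit/"]
bad_list = good_list + ["/"]

urls = [
    "http://example.com",
    "http://example.org/news/today.html",
    "http://example.net/malware-kit/install.exe",
]

for url in urls:
    print(url, is_flagged(url, good_list), is_flagged(url, bad_list))
```

Under that assumption, every path begins with '/', so the single extra entry flags every URL, which fits Mayer's description of '/' expanding to all URLs.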
Now, here's where I will start disabusing anyone and everyone of any notion that I have capabilities as a computer programmer. My follies with DOS included using the del *.* command at the C: prompt to wipe out one of my earliest personal computers. I suffered (and I mean suffered) through a mainframe SQL course where my only salvation was finding a programmer who would trade code for beer and get me through the course. But to think that there are still major systems out there that can be brought down by a single forward slash is staggering. I wonder what a backslash would have done.
Over at ZDNET, security honcho Ryan Naraine does a good job of decrying the monoculture of Google as a security weak point. Monocultures are dangerous in the computer industry (one little virus can infect a world of Windows computers) and in human culture, but the Google glitch was not a security failure. In fact, Google was trying, in working with StopBadware.org, to make the system safer, or at least less dangerous, for web surfers.
The Google glitch was a design failure. Design failures take place when system growth overpowers the original concept. No one can design a system that anticipates every failure. Instead, you create systems with multiple backups that allow the system to degrade gracefully while you try to figure out what went wrong. Multiple redundant systems, and humans who are trained and remain in charge, allow you to land an airliner safely in the Hudson River after a flock of birds conks out your engines. An errant slash in the wrong place should not be allowed to send the world's largest search engine into convulsions.
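One way a design can absorb this kind of goof is a sanity check that runs before an update is pushed anywhere. The sketch below, in Python, is not a description of Google's system: it assumes list entries are matched as URL path prefixes, and the names `validate_update`, `KNOWN_GOOD_SAMPLE` and `MAX_FLAGGED_FRACTION` are hypothetical. The idea is simply to refuse any update that would flag more than a tiny share of sites already known to be clean.

```python
from urllib.parse import urlparse

# Hypothetical sample of URLs believed to be clean.
KNOWN_GOOD_SAMPLE = [
    "http://example.com",
    "http://example.org/news/today.html",
    "http://example.net/about/",
    "http://example.edu/courses/sql101",
]

# Hypothetical threshold: reject an update that flags more than 1%
# of the known-good sample.
MAX_FLAGGED_FRACTION = 0.01

def matches(url, pattern):
    """Assumed matching rule: the pattern is a prefix of the URL path."""
    path = urlparse(url).path or "/"
    return path.startswith(pattern)

def validate_update(patterns, sample=KNOWN_GOOD_SAMPLE,
                    limit=MAX_FLAGGED_FRACTION):
    """Return True if the update looks safe enough to roll out."""
    flagged = sum(
        1 for url in sample if any(matches(url, p) for p in patterns)
    )
    return flagged / len(sample) <= limit

print(validate_update(["/malware-kit/"]))       # narrow entry passes
print(validate_update(["/malware-kit/", "/"]))  # catch-all entry is refused
```

A check like this does not anticipate every failure, but it turns one class of typo into a rejected update rather than a systemwide outage.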

