Feature
All fail down: ESD-induced failures lead to a logical trap
Tales From The Cube: When terminals in low-humidity cities fail, it's not hard to finger ESD as the cause. Figuring out the fix turns out to be more of a challenge, especially when engineers assume a single design change should do the trick.
By Larry Baxter, Capsense.com -- EDN, 11/27/2008
Computer terminals were failing all over town—not all towns, just those with low humidity, such as Las Vegas, which was particularly hard-hit. I was called in as a consultant for a company there to repair the failing units. The terminals included an 8-bit microcontroller, an LCD, a membrane keypad, and the usual other stuff. Most of the terminals were wall-mounted. Users would unwittingly shuffle across a rug, picking up as much as 30 kV across 300 pF of body capacitance, discharge it through the keypad, and reset the device. This resetting would cloud the unit’s memory. The problem was not affecting terminals in the company’s offices in high-humidity cities, such as Oahu and Miami.
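For a sense of scale, the figures quoted above can be turned into stored energy and charge with the standard capacitor relations E = ½CV² and Q = CV. This is only a back-of-the-envelope sketch: the 30 kV and 300 pF values come from the text, while the function and variable names are illustrative.

```python
# Back-of-the-envelope numbers for the worst-case zap described above.
# Uses the standard capacitor relations E = 1/2 * C * V^2 and Q = C * V.

def stored_energy_joules(capacitance_farads, voltage_volts):
    """Energy stored in a capacitor charged to the given voltage."""
    return 0.5 * capacitance_farads * voltage_volts ** 2

def stored_charge_coulombs(capacitance_farads, voltage_volts):
    """Charge held by a capacitor charged to the given voltage."""
    return capacitance_farads * voltage_volts

C_BODY = 300e-12   # 300 pF of body capacitance (figure from the article)
V_BODY = 30e3      # 30 kV worst-case charge (figure from the article)

energy = stored_energy_joules(C_BODY, V_BODY)
charge = stored_charge_coulombs(C_BODY, V_BODY)

print(f"Energy: {energy * 1e3:.0f} mJ")
print(f"Charge: {charge * 1e6:.0f} uC")
```

Roughly 135 mJ delivered in a very short pulse is small in everyday terms, but it is plenty to upset logic that sits anywhere near the discharge path.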
I looked at the schematic, found nothing suspicious, and asked the engineers what they had tried.
“Everything,” they responded. “Nothing worked.”
“Everything? Could you give me some details?” I asked.
“The reset line sort of snakes around the board,” they answered. “We added a few 0.1-µF capacitors on it. The path of the zap through the keypad didn’t get directly to chassis ground; we added many short braid connections. We scoped the power rails and added more bypass capacitors. We added decoupling capacitors to the ac input because, if the chassis gets a pulse, it could couple into the computer board through the ac-input connection. We added some 10-kΩ isolation resistors on the logic driving the keypad. We added more 0.1-µF capacitors here, too, in case the pulse was finding its way back through the drivers. Nothing worked. We tried everything.”
I mulled the situation over for a minute. They’d done all the right stuff. The terminal should be working. I could think of no other patch.
“Can I look at the terminal?” I asked.
“Well, here’s one,” they replied.
“Is this the one with the fixes in it?”
“We took the fixes out,” they said. “None of them worked.”
It was time to break for lunch anyway; my brain doesn’t work well if calorie-deprived. At lunch, after a glass or two of Cabernet, another question occurred to me.
“Listen, just to make sure, when you say you took the fixes out,” I said, “you mean you put all the changes in, didn’t fix it, and then took them all out, right?”
“No, not exactly. We tried them one at a time. None of them worked,” they answered.
I was filled with great happiness and amusement—and, no, not because of the Cabernet I had consumed. “After lunch,” I said, “we’ll put them all back in—all at the same time.”
Sure enough, when we installed all six fixes, the terminal was bulletproof. Pulling them out one at a time, we found the two critical fixes and wrote the ECO (engineering-change order).
The company had fallen into an insidious logic trap: the assumption that the failure has one cause instead of several. Installed one at a time, each fix would help somewhat, but a partial improvement is hard to recognize when the unit still fails. After installing all the fixes, I could easily see the effect of removing one. Now that I knew how to identify this sneaky trap, I began to see it in many other instances.
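The trap is easy to reproduce in a toy model. The sketch below assumes a unit that survives a zap only when every critical coupling path is blocked. The six fix names paraphrase the engineers' list, but the choice of which two are critical is purely hypothetical (the article does not say), and `survives_zap` is an illustrative stand-in for a bench test, not a model of the real terminal.

```python
# Toy model of the logic trap: a failure with more than one cause.
# Fix names paraphrase the engineers' list; everything else is assumed.

FIXES = [
    "reset_line_caps",
    "chassis_braids",
    "rail_bypass_caps",
    "ac_input_decoupling",
    "keypad_isolation_resistors",
    "keypad_driver_caps",
]

# HYPOTHETICAL: the article does not say which two fixes were critical.
CRITICAL = {"chassis_braids", "keypad_isolation_resistors"}

def survives_zap(installed):
    """Stand-in for a bench test: pass only if all critical fixes are in."""
    return CRITICAL <= set(installed)

# Strategy 1: try each fix alone, as the engineers first did.
one_at_a_time = {fix: survives_zap({fix}) for fix in FIXES}

# Strategy 2: install everything, then pull fixes out one at a time.
all_in = set(FIXES)
critical_found = [fix for fix in FIXES if not survives_zap(all_in - {fix})]

print("Any single fix works:", any(one_at_a_time.values()))
print("All fixes together work:", survives_zap(all_in))
print("Fixes whose removal breaks the unit:", critical_found)
```

In this model no single fix ever passes, so testing fixes one at a time reports "nothing worked," while the remove-one pass from the fully fixed unit picks out exactly the two critical changes.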
Larry Baxter, of Lexington, MA, is a consulting analog and embedded-systems engineer at Capsense.com. You can reach him at larry@capsense.com.