|Feb12-13, 02:52 PM||#1|
how to purposely INDUCE EMI, noise, et cetera?? drill? phone?
I'm troubleshooting a GPU server. It has a card in it that generates a 20-100 MHz clock signal that goes to the actual GPU. I have worked on several of these units; when they fail, the GPU doesn't get the clock signal, so there is no VGA output.
That's not really important, though. My question: I'm trying to narrow down what the problem is, and I want to induce lots of noise around the small coaxial cable going to the GPU clock to try to get it to fail.
Sometimes the unit will keep failing, then you move some stuff around and it works. There's not much rhyme or reason, and I cannot continue troubleshooting until I get it to fail again. Noise-wise, I have suspended an AC Black & Decker drill, locked on full speed, above the coaxial cable and internal cards. I have also loaded the power supply with all kinds of fans, heated the circuit up with a heat gun, chilled it with air duster, and shoved cell phones inside it that were making calls. Nothing seems to get it to fail consistently.
I am looking for ideas on how to get it to fail and how to add more noise. Any suggestions are greatly appreciated. At this time I am not looking for other means to troubleshoot the unit, because most have been exhausted by several people over a span of a few months. I do not have access to any test equipment to help, but even if I did, I would still have to recreate the problem.
|Feb12-13, 03:20 PM||#2|
The problem could be mechanical. While electromagnetic radiation can disturb electronics, this should not happen at usual noise levels. Something roughly in sync with your clock might have some chance.
Enough irradiation would certainly work (permanently), but that is not practical either.
Shake it? Hit it with hard materials?
|Feb12-13, 03:26 PM||#3|
Intermittent faults are definitely the worst kind - commiserations.
Have you eliminated the possibility of a software or on-board hardware fault, or even a power supply glitch? I guess you don't have equipment for a substitution test.
I'm not sure that your scattergun approach with the drill will help. If your drill were putting out high enough levels of interference for this test, you would already be getting complaints from the neighbours! Your tests need to be better targeted.
There are a million and one different possible approaches but most of them would not be guaranteed to work.
Rather than the drill noise source, a sweep signal generator could help to produce CW interference. I assume your GPU input is filtered to eliminate frequencies outside the range you quote, so a 20-100 MHz sweep would do the trick, and you should expect around 1 V RMS to be available from any generator you could get hold of.
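A stepped sweep across the clock band could be planned along these lines. This is only a sketch: the 1% step size, the 2-second dwell time, and the function name `sweep_steps` are assumptions rather than anything stated in the thread, and the code only produces the list of frequencies to dial into whatever generator is available.

```python
def sweep_steps(f_start_hz: float, f_stop_hz: float, step_fraction: float = 0.01) -> list[float]:
    """Return logarithmically spaced frequencies from f_start_hz to f_stop_hz.

    Each step is step_fraction (assumed 1%) above the previous one, and the
    stop frequency is always included as the final entry.
    """
    if f_start_hz <= 0 or f_stop_hz <= f_start_hz or step_fraction <= 0:
        raise ValueError("need 0 < f_start_hz < f_stop_hz and step_fraction > 0")
    freqs = []
    f = f_start_hz
    while f < f_stop_hz:
        freqs.append(f)
        f *= 1.0 + step_fraction
    freqs.append(f_stop_hz)
    return freqs


if __name__ == "__main__":
    dwell_s = 2.0  # assumed time spent at each frequency
    plan = sweep_steps(20e6, 100e6)
    print(f"{len(plan)} steps, about {len(plan) * dwell_s / 60:.1f} minutes per pass")
    for f in plan[:3]:
        print(f"{f / 1e6:.3f} MHz")
```

Logging which frequency was active when the display drops out would show whether the failure clusters near the clock frequency or somewhere else entirely.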
You could try a screened box for the equipment, better grounding, replacing cables with screened versions.
You seem to be assuming that the problem is failure of the clock signal to arrive at the input. What is the basis for that assumption? What level is the clock signal? Are the two units as close together as possible? If there seems to be a correlation then you could record the levels of the clock signal and failure events.
If you can't actually be sitting next to the gear, waiting, then some form of recording would definitely help and you would need some datalogging equipment and software for a laptop. Not too expensive as it's available as School Lab equipment but you'd also need other peripherals to measure noise, signal levels, temperature etc. as input to the datalogger. It could save man-days worth of money in the long run though.
It would be too much to expect you to have an RF spectrum analyser, I guess. Do you have a nearby airport with high-power radar, etc.?
|Feb12-13, 04:18 PM||#4|
Amen on intermittents.
One has to encircle the problem, binary search if possible.
Can you swap clock boards between two GPUs? That'd tell you board or slot...
I have found a LOT of troubles by physically agitating things.
Poke at it and see if you can make it say "ouch".
Once had a rack of computer equipment that shut down roughly every 12 hours. But if you left the door open it'd run for weeks.
Turned out one of the power supplies had its fan mounted backward so it recirculated hot air with the one adjacent. It'd get too warm, shut down, cool off for a few minutes then come back. Found that one by plucking a hair and holding it in front of all the fans in the rack one by one to see how the wind blew...
|Feb12-13, 04:47 PM||#5|
Really good input, thanks guys.
sophiecentaur, we are right under an airport. We have thought about solutions such as cable shielding, et cetera, but need to reproduce the failure before anything. The system in general is kind of hokey: the company that sells the GPU cards takes the crystal oscillator out and solders on some coax cable with an SMB connector that plugs into the clock reference board. The cable is 50 ohm and I'm not sure if that is proper matching for the crystal. But we have had the same setup in hundreds of other applications; it is just this one specific chassis and power supply combo that fails, so it's either power or noise related (assumed).
I didn't want to bore you guys with all the troubleshooting I have already done: swapping cards, swapping chassis components, grounding, carrying the chassis to different rooms and different workbenches, non-ESD and ESD, and still the failure could not be reproduced. I made a LabVIEW program to watch the voltage signals on the cards and make sure they didn't glitch. They all seemed fine, in conjunction with my Fluke meter set up on min and max looking at the same voltages.
I was going to make a simple program using LabVIEW and my cheap DAQ again to push the reset button, then put, say, a photodiode on the LCD screen to do a reset count and statistically analyze the reboot failure rate. But really it may be 1 in 100, and it's still not going to tell me how or when. I figure extreme temperatures, or noise, induced in a way that would present a failure, would allow me to start doing process of elimination to determine which one failed.
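To put rough numbers on the reset-count idea, here is a small sketch of the statistics involved. It assumes each reboot is an independent trial with a fixed failure probability, which may not hold here (the fault could depend on temperature or idle time). The function names are made up for illustration.

```python
import math


def trials_for_detection(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest number of reboots n such that P(at least one failure) >= confidence,
    assuming independent reboots that each fail with probability p_fail."""
    if not 0.0 < p_fail < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (about 95% for z = 1.96) for the true failure rate."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p = failures / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # With a 1-in-100 failure rate, how many automated reboots to expect a failure?
    print(trials_for_detection(0.01))
    # After 100 clean reboots, what failure rates are still plausible?
    print(wilson_interval(0, 100))
```

Under these assumptions, a 1-in-100 fault needs roughly 300 automated reboots before a failure is likely to show up, and 100 clean reboots in a row still leave a failure rate of a few percent plausible.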
The drill thing and cell phone thing were kinda humorous and for shts and giggles, but I am seriously out of ideas here, and it may just be that I am out of resources.
You would think it'd be easy getting something to fail :D
I have had similar problems with fans being reversed, something I realized when I dropped a piece of paper next to one, although it never caused overheating. But that's my whole mindset: I could rent the proper equipment to test this stuff or send it to a test facility, but I know it's got to be something simple!
|Feb12-13, 05:33 PM||#6|
There is a formal way to do this. Go to your compliance test lab (where you go to have CE mark and FCC testing done). Have them perform the following immunity tests:
Perform these tests on your suspect cable:
EN 61000-4-4 fast transient/burst immunity.
EN 61000-4-6 Conducted RF immunity.
Perform this test on your system (more expensive test):
EN 61000-4-3 radiated immunity.
This will cost some money, but at some point your time can get pretty expensive.
|Feb12-13, 10:24 PM||#7|
I've used drills and cellphones too.....poor man's EMI test. Skilsaw makes bigger sparks...
You say it's a clock board?
How long is that cable?
Is it just a single coaxial line? I'd guess so from the looks of SMB connector...
Yes, a logic analyzer would be mighty nice, or digital storage scope with a smart trigger menu.
For what little this is worth, and you are probably past this point, but the old troubleshooter in me can't resist... I apologize if this is just old man's talk; I don't mean to bore you.
Long ago in a galaxy far away I had signals propagating down a line that got one bit corrupted every couple of hours.
Encircling the problem was horrific because it was a synchronous sixteen-bit I/O bus on differential lines. Won't go into that...
Once we got it corralled, we found the problem by looking at each of the fifty or so lines with a 100 MHz analog (NOT digital) 'scope while hammering the same address at a 20 kHz repetition rate.
When one turned the lights in the room way down and the brightness on the 'scope way up, one could see the transitions were mostly perfect, but on a couple of lines one transition in every several thousand was delayed about 50 nanoseconds.
What it looked like was a bright, perfectly shaped trace that was good data, with a very faint ghost trace that was delayed (hence bad data), dim of course because it occurred so infrequently.
The delay caused an overlap that occasionally corrupted an address.
Moving the cable (which was about 20 feet long) would change the symptom, but the boards worked fine in the board tester (no capacitive load).
Further it was sensitive to cable length and to the data pattern transmitted on adjacent bits.
We'd never have found it without that old analog 'scope.
Point being, your low tech approach is probably what will get you to the answer.
Keep up your creativity with the test equipment you have.
Look at that clock signal with a fast analog 'scope and darkened room....
Ours turned out to be a batch of line drivers from TI that had been recalled for intermittent behavior, but a handful of them made it through supply chain and QA inspections into our boards. This was 1987-ish, the days of 14-pin DIP ICs. The ICs had low drive capability, so they were affected by reflections and crosstalk in the line. Replacing the ICs with 100% tested ones fixed it.
Good luck. I hope you find a loose connector center spring or a broken solder joint. But 'scope that clock's line driver IC output.
Again, I apologize if this is no longer relevant.
But I do share your pain.
|Feb12-13, 11:06 PM||#8|
I think my worst experience tracing intermittent crosstalk interference was on an industrial in-motion scale system. This system uses 4-20 mA circuits, including the load cells. Anyway, I spent weeks trying to isolate random fluctuations in weight. One day, while looking at the scale indicator, I noticed that when I keyed up my radio the weights jumped. This led us to find a poorly shielded ADC.
In regards to the OP: random intermittents can be a bugger. Generating interference may lead you to poor shielding. I would look at the required signal and find a means of generating that signal to find a poor shield. In the case of power, there are relatively cheap power recorders. The fault could also be the receiving circuit failing to poll the signal. I ran into a refrigeration controller failure that turned out to be the master controller CPU getting a bad command to check the RTU.
You can probably use a frequency sweep generator to induce noise. There are plenty of models and waveform generators available. I would also look at what other signals are being polled when the clock circuit failures occur.
|Feb12-13, 11:22 PM||#9|
Something just occurred to me: if I'm not mistaken, the frequency range you mentioned includes the CB radio band. A radio signal could have been the fault source.
Edit: just checked, CB is around 27 MHz. There are also ham radio bands in the same range as the clock circuit.
|Feb13-13, 10:03 AM||#10|
Seriously, can you not swap one or both of the units with some other equipment, elsewhere? That would firmly establish whether it's the environment or the unit(s).
|Feb13-13, 11:51 AM||#11|
Anyway, I figured out the culprit. I really appreciate your input, guys. This is the first time I have used this forum for troubleshooting help, and the stories and ideas really motivated me and made me think outside the box. Thanks, guys.
So the problem mostly happens during boot-up, which should have been my biggest clue. The GPU and the clock cards need to power on simultaneously. If the GPU powers up first and doesn't have a clock, it won't work, even if the clock is added later.
Thankfully, there are power LEDs on the clock boards. Watching them closely, you could see that sometimes they would glitch or not come on at all. When it failed, they did not come on for about 5 seconds. I got it to fail a few times today; it seems like if you let it sit for a night, it will fail the next morning. Thank god the LEDs were there. I should have been monitoring those voltages all the time on the scope, and I will keep that in mind next time I troubleshoot.
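The power-up ordering described here is the kind of thing a cheap DAQ log could catch automatically. Below is a minimal sketch, assuming each rail is logged as (time, voltage) samples. The 3.0 V threshold, the 50 ms allowed skew, and all names (`first_time_above`, `BootLog`, `classify_boot`) are assumptions for illustration, not values from the actual hardware.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, voltage in volts)


def first_time_above(samples: Sequence[Sample], threshold_v: float) -> Optional[float]:
    """Return the time of the first sample at or above threshold_v, or None if never."""
    for t, v in samples:
        if v >= threshold_v:
            return t
    return None


@dataclass
class BootLog:
    clock_rail_up_s: Optional[float]  # None means the clock rail never came up
    gpu_rail_up_s: float


def classify_boot(log: BootLog, max_skew_s: float = 0.05) -> str:
    """Classify one boot by how late the clock rail is relative to the GPU rail.

    max_skew_s is an assumed tolerance; the real requirement is unknown.
    """
    if log.clock_rail_up_s is None:
        return "clock_missing"
    if log.clock_rail_up_s - log.gpu_rail_up_s > max_skew_s:
        return "clock_late"
    return "ok"


if __name__ == "__main__":
    # Made-up traces: the GPU rail is up at 0.1 s, the clock rail only at 5.0 s.
    gpu = [(0.0, 0.0), (0.1, 3.3), (5.0, 3.3)]
    clock = [(0.0, 0.0), (0.1, 0.2), (5.0, 3.3)]
    log = BootLog(first_time_above(clock, 3.0), first_time_above(gpu, 3.0))
    print(classify_boot(log))
```

Run over every boot in an overnight log, a check like this would flag the roughly 5-second delay without anyone having to watch the LEDs.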
So it turns out it's something in the power distribution board. I really just wanted to figure out the culprit, so I'm not too interested in the PDB itself. I will let the manufacturer figure it out from here.
I'm also thankful it's not an RF problem, because that could have been more difficult to diagnose.
It seems like when I get stressed out over not figuring something out, I really lose motivation. But most of these weird problems get fixed serendipitously, so if you just go and play with the parts and try weird things, eventually you will get more clues.
Good story, Jim. Maybe we should start a post with troubleshooting stories; they could be entertaining and informal.
I read one in an RF book about a very weird GHz radiated emission problem. I don't remember the specifics, but basically they were flying a helicopter over the city with an antenna and spectrum analyzer, looking for where the emissions were coming from. They zoomed in on a fast food joint. Apparently the power of the signal was something huge that they had never seen before.
They asked the fast food restaurant manager if they could peek around. He let them in, and then they noticed a microwave in the kitchen with the door removed. Employees were sticking food in there and microwaving it with the door open!
The manager boasted about it, saying, "Yeah, I found out it saves us several seconds not having to open the door to put food in."
Apparently he jammed something in the door striker to close the door safety switch.
I'm sure you guys have some funny troubleshooting stories also.
|Feb13-13, 12:16 PM||#12|
That's how a good doctor troubleshoots.
Glad you found it and thanks for letting us know...
and as you say it's when you let your mind wander that serendipity switches on.
Same is true in poker.
Thanks for your kindness.
|Feb13-13, 01:47 PM||#13|
Glad you found it and it wasn't an RF problem. Those are extremely daunting at times. I agree that oft times it's best to walk away from a problem for a while, then revisit it later on. Many of the intermittents I had to solve I ended up figuring out while I was relaxing at home.