How to purposely INDUCE EMI, noise, et cetera? drill? phone?

In summary, the conversation discusses troubleshooting a GPU server that fails when the clock signal is not received by the actual GPU. Several attempts have been made to induce noise and recreate the failure, including using a drill, fans, and cell phones. The problem is intermittent and could possibly be related to a mechanical issue. Suggestions for further troubleshooting include using a sweep signal generator, checking for software or hardware faults, and using a datalogger to monitor noise and signal levels. One potential solution is to swap clock boards between two GPU's. The conversation also mentions the possibility of using a Labview program with a photodiode to
  • #1
hxtasy

I am troubleshooting a GPU server. It has a card in it that generates a 20-100 MHz clock signal that goes to the actual GPU. I have worked on several of these units; when they fail, the GPU doesn't get the clock signal, so there is no VGA output.

That's not really that important, but my question is this: I am trying to narrow down what the problem is, and I want to induce lots of noise around the small coaxial cable going to the GPU clock to try to get it to fail.

Sometimes the unit will keep failing, then you move some stuff around and it works. There's not much rhyme or reason, and I cannot continue troubleshooting until I get it to fail again. Noise-wise, I have suspended an AC Black & Decker drill, locked on full speed, above the coaxial cable and internal cards. I have also loaded the power supply with all kinds of fans, heated the circuit up with a heat gun, chilled it with air duster, and shoved cell phones inside it while they were making calls. Nothing seems to get it to fail consistently.

I am looking for ideas on how to get it to fail and how to add more noise. Any suggestions are greatly appreciated. At this time I am not looking for other means to troubleshoot the unit, because most have been exhausted by several people over a span of a few months. I do not have access to any test equipment to help, but even if I did, I would still have to recreate the problem.

thanks!
 
  • #2


The problem could be mechanical. While electromagnetic radiation can disturb electronics, this should not happen at usual noise levels. Something roughly in sync with your clock might have some chance.
Enough irradiation would certainly work (permanently), but that is not practical either.

Shake it? Hit it with hard materials?
 
  • #3


Intermittent faults are definitely the worst kind - commiserations.
Have you eliminated the possibility of a software or on board hardware fault - or even a power supply glitch? I guess you don't have equipment for a substitution test.
I'm not sure that your scattergun approach with the drill will help. If your drill were putting out high enough levels of interference for this test, you would already be getting complaints from the neighbours! Your tests need to be better targeted.
There are a million and one different possible approaches but most of them would not be guaranteed to work.
Rather than the drill noise source, a sweep signal generator could help to produce CW interference. I assume your GPU input is filtered to eliminate frequencies outside the range you quote, so a 20-100 MHz sweep would do the trick, and you should expect around 1 V RMS to be available from any generator you could get hold of.
You could try a screened box for the equipment, better grounding, replacing cables with screened versions.
You seem to be assuming that the problem is failure of the clock signal to arrive at the input. What is the basis for that assumption? What level is the clock signal? Are the two units as close together as possible? If there seems to be a correlation then you could record the levels of the clock signal and failure events.
If you can't actually be sitting next to the gear, waiting, then some form of recording would definitely help and you would need some datalogging equipment and software for a laptop. Not too expensive as it's available as School Lab equipment but you'd also need other peripherals to measure noise, signal levels, temperature etc. as input to the datalogger. It could save man-days worth of money in the long run though.
It would be too much to expect you to have an RF spectrum analyser, I guess. Do you have a nearby airport with high-power radar, etc.?
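As a rough illustration of the sweep-generator suggestion in this post, here is a minimal Python sketch that plans a stepped sweep across the 20-100 MHz clock range and converts the quoted 1 V RMS into power. The 50 ohm load, the nine-step count and the function names are assumptions for illustration only; nothing here drives a real generator.

```python
import math


def sweep_plan(f_start_hz, f_stop_hz, steps):
    """Return log-spaced frequencies from f_start_hz to f_stop_hz inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps")
    ratio = (f_stop_hz / f_start_hz) ** (1.0 / (steps - 1))
    return [f_start_hz * ratio ** i for i in range(steps)]


def vrms_to_dbm(v_rms, load_ohms=50.0):
    """Convert RMS volts across a resistive load to power in dBm."""
    power_w = v_rms ** 2 / load_ohms
    return 10.0 * math.log10(power_w / 1e-3)


# Hypothetical plan: nine points covering the clock range from the thread.
freqs = sweep_plan(20e6, 100e6, 9)

# 1 V RMS into an assumed 50 ohm load is 20 mW, i.e. about 13 dBm.
level_dbm = vrms_to_dbm(1.0)

for f in freqs:
    print(f"{f / 1e6:7.2f} MHz at {level_dbm:.1f} dBm")
```

Log spacing is used so that each step is the same fractional change in frequency; a linear sweep would work just as well for a range this narrow.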
 
  • #4


Amen on intermittents.

One has to encircle the problem, binary search if possible.

Can you swap clock boards between two GPUs? That'd tell you board or slot...

I have found a LOT of troubles by physically agitating things.
Poke at it and see if you can make it say "ouch".

Once had a rack of computer equipment that shut down roughly every 12 hours. But if you left the door open it'd run for weeks.
Turned out one of the power supplies had its fan mounted backward so it recirculated hot air with the one adjacent. It'd get too warm, shut down, cool off for a few minutes then come back. Found that one by plucking a hair and holding it in front of all the fans in the rack one by one to see how the wind blew...

old jim
 
  • #5


Really good input, thanks guys.


sophiecentaur, we are right under an airport. We have thought about solutions such as cable shielding, et cetera, but need to reproduce the failure before anything. The system in general is kind of hokey: the company that sells the GPU cards takes the crystal oscillator out and solders on some coax cable with an SMB connector that plugs into the clock reference board. The cable is 50 ohm and I am not sure if that is proper matching for the crystal. But we have had the same setup in hundreds of other applications; it's just this one specific chassis and power supply combo, so it's either power or noise related (assumed).

I didn't want to bore you guys with all the troubleshooting I have already done: swapping cards, swapping chassis components, grounding, carrying the chassis to different rooms and different work benches, non-ESD, ESD. The failure still could not be reproduced. I made a LabVIEW program to watch the voltage signals on the cards and make sure they didn't glitch. They all seemed fine, in conjunction with my Fluke meter set up on min and max looking at the same voltages.

I was going to make a simple program using LabVIEW and my cheap DAQ again to push the reset button, then put, say, a photodiode on the LCD screen to do a reset count and statistically analyze the reboot failure rate. But really it may be 1 in 100, and it's still not going to tell me how or when. I figure extreme temperatures, or noise, induced in a way that would present a failure, would allow me to start doing process of elimination to determine which one failed.

The drill thing and cell phone thing were kinda humorous and for shts and giggles, but I am seriously out of ideas here and it may just be that I am out of resources.

You would think it'd be easy getting something to fail :D

Jim,

I have had similar problems with fans being reversed, something I realized when I dropped a piece of paper next to one, although it never caused overheating. But that's my whole mindset: I could rent the proper equipment to test this stuff, or send it to a test facility, but I know it's got to be something simple!
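To put numbers on the reset-count idea in this post, here is a minimal Python sketch of the statistics. It assumes each reset is an independent trial with a fixed failure probability (the "1 in 100" figure is the poster's guess), and the function names are hypothetical. It estimates how many resets are needed to be fairly sure of seeing at least one failure, and how uncertain a measured failure rate still is.

```python
import math


def trials_for_one_failure(p_fail, confidence=0.95):
    """Resets needed to see at least one failure with the given confidence."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # P(no failure in n trials) = (1 - p)^n; solve (1 - p)^n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a failure rate (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p_hat = failures / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# With a true rate of 1 in 100, roughly 300 resets are needed for a 95%
# chance of catching even a single failure.
resets_needed = trials_for_one_failure(0.01)

# One failure seen in 100 resets still leaves a wide range of plausible rates.
low, high = wilson_interval(failures=1, trials=100)
print(resets_needed, f"{low:.4f}", f"{high:.4f}")
```

This supports the poster's point: a reset counter can bound the failure rate, but at rates this low it takes hundreds of cycles and still says nothing about the cause.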
 
  • #6


There is a formal way to do this. Go to your compliance test lab (where you go have CE mark and FCC testing done). Have them perform the following immunity tests:

Perform these tests on your suspect cable:
EN 61000-4-4 fast transient/burst immunity.
EN 61000-4-6 Conducted RF immunity.

Perform this test on your system (more expensive test):
EN 61000-4-3 radiated immunity.

This will cost some money, but at some point your time can get pretty expensive.
 
  • #7
hxtasy said:
Really good input, thanks guys.


sophiecentaur, we are right under an airport. We have thought about solutions such as cable shielding, et cetera, but need to reproduce the failure before anything. The system in general is kind of hokey: the company that sells the GPU cards takes the crystal oscillator out and solders on some coax cable with an SMB connector that plugs into the clock reference board. The cable is 50 ohm and I am not sure if that is proper matching for the crystal. But we have had the same setup in hundreds of other applications; it's just this one specific chassis and power supply combo, so it's either power or noise related (assumed).

The drill thing and cell phone thing were kinda humorous and for shts and giggles, but I am seriously out of ideas here and it may just be that I am out of resources.

You would think it'd be easy getting something to fail :D

But I know it's got to be something simple!

They're always simple in hindsight.
I've used drills and cellphones too...poor man's EMI test. Skilsaw makes bigger sparks...

You say it's a clock board?

How long is that cable?

Is it just a single coaxial line? I'd guess so from the looks of SMB connector...
http://www.amphenolrf.com/products/smb.asp?N=0&sid=511986006145E17F&

Sometimes the unit will keep failing, then you move some stuff around and it works. There's not much rhyme or reason, and I cannot continue troubleshooting until I get it to fail again.
Sounds like looseness or data crash.

Yes, a logic analyzer would be mighty nice, or digital storage scope with a smart trigger menu.

For what little this is worth, and you are probably past this point, but the old troubleshooter in me can't resist... i apologize if this is just old man's talk, don't mean to bore you.

Long ago in a galaxy far away i had signals propagating down a line that got one bit corrupted every couple hours.
Encircling the problem was horrific because it was a synchronous sixteen bit I/O bus on differential lines. Won't go into that..
Once we got it corralled, found the problem by looking at each of the fifty or so lines with a 100 MHz analog (NOT digital) 'scope while hammering the same address at a 20 kHz repetition rate.
When one turned the lights in the room way down and brightness on the 'scope way up, one could see the transitions were mostly perfect, but on a couple lines one transition in every several thousand was delayed about 50 nanoseconds.
What it looked like was a bright trace perfectly shaped that was good data, with a very faint ghost trace that was delayed (hence bad data), of course dim because it occurred so infrequently.
The delay caused an overlap that occasionally corrupted an address.
Moving the cable (which was about 20 feet long) would change the symptom, but the boards worked fine in the board tester (no capacitive load).
Further it was sensitive to cable length and to the data pattern transmitted on adjacent bits.
We'd never have found it without that old analog 'scope.

Point being, your low tech approach is probably what will get you to the answer.
Keep up your creativity with the test equipment you have.
Look at that clock signal with a fast analog 'scope and darkened room...

Ours turned out to be a batch of line drivers from TI that had been recalled for intermittent behavior, but a handful of them made it through supply chain and QA inspections into our boards. This was 1987-ish, days of 14 pin DIP ICs. The ICs had low drive capability so were affected by reflections and crosstalk in the line. Replacing the ICs with 100% tested ones fixed it.

Good luck - i hope you find a loose connector center spring or broken solder joint. But 'scope that clock's line driver IC output.

Again i apologize if this is no longer relevant.
But i do share your pain.

old jim
 
  • #8


I think my worst experience tracing intermittent crosstalk interference was on an industrial in-motion scale system. This system uses 4-20 mA circuits, including the load cells. Anyway, I spent weeks trying to isolate random fluctuations in weight. One day, while looking at the scale indicator, I noticed that when I keyed up my radio the weights jumped. This led us to find a poorly shielded ADC.

In regards to the OP: random intermittents can be a bugger. Generating interference may lead you to poor shielding. I would look at the required signal and find a means of generating that signal to find a poor shield. In the case of power, there are relatively cheap power recorders. The fault could also be the receiving circuit failing to poll the signal. I ran into a refrigeration controller failure that turned out to be the master controller CPU getting a bad command to check the RTU.
You can probably use a frequency sweep generator to induce noise. There are plenty of models and waveform generators available. I would also look at what other signals are being polled when the clock circuit failures occur.
 
  • #9
Something just occurred to me: if I'm not mistaken, the frequency range you mentioned includes the CB radio band. A radio signal could have been the fault source.

Edit: just checked, CB is at about 27 MHz. There are also ham radio bands within the same range as the clock circuit.
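As a quick check of which radio services fall inside the 20-100 MHz clock range, here is a small Python sketch. The band edges are approximate US allocations entered by hand as assumptions, not values taken from the thread; verify them against the local band plan before relying on them.

```python
# Clock range quoted in the thread, in MHz.
CLOCK_RANGE_MHZ = (20.0, 100.0)

# Approximate US allocations in MHz (assumed values for illustration).
BANDS_MHZ = {
    "15 m amateur": (21.0, 21.45),
    "CB": (26.965, 27.405),
    "10 m amateur": (28.0, 29.7),
    "6 m amateur": (50.0, 54.0),
    "FM broadcast": (88.0, 108.0),
    "2 m amateur": (144.0, 148.0),
}


def overlap(a, b):
    """Return the overlapping (low, high) span of two ranges, or None."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low < high else None


# Map each service to the part of its band that lies in the clock range.
in_range = {
    name: overlap(band, CLOCK_RANGE_MHZ) for name, band in BANDS_MHZ.items()
}

for name, span in in_range.items():
    print(f"{name:14s} {span if span else 'outside clock range'}")
```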
 
  • #10


Seriously, can you not swap one or both of the units with some other equipment, elsewhere? That would firmly establish whether it's the environment or the unit(s).
 
  • #11


sophiecentaur said:
Seriously, can you not swap one or both of the units with some other equipment, elsewhere? That would firmly establish whether it's the environment or the unit(s).

We have spent a lot of time swapping parts from different units, since we have about 20 of these with the problem. It's just hard to reproduce.



Anyway, I figured out the culprit. I really appreciate your input, guys. This is the first time I have used this forum for troubleshooting help, and the stories and ideas really motivated me and made me think outside the box. Thanks, guys.

So the problem mostly happens during boot-up, which should have been my biggest clue. The GPU and the clock cards need to power on simultaneously; if the GPU powers up first and doesn't have a clock, it won't work, even if the clock is added later.

Thankfully, there are power LEDs on the clock boards. Watching them closely, you could see that sometimes they would glitch or not come on at all. When it failed, they did not come on for about 5 seconds. I got it to fail a few times today; it seems like if you let it sit for a night it will fail the next morning. Thank god the LEDs were there. I should have been monitoring those voltages all the time on the scope, and I will keep that in mind next time I troubleshoot.

So it turns out it's something in the power distribution board. I really just wanted to figure out the culprit, so I'm not too interested in the PDB itself. I will let the manufacturer figure it out from here.

I'm also thankful it's not an RF problem, because that could have been more difficult to diagnose.

It seems like when I get stressed out over not figuring something out, I really lose motivation. But most of these weird problems get fixed serendipitously, so if you just go and play with the parts and try weird things, eventually you will get more clues.


Good story, Jim. Maybe we should start a thread of troubleshooting stories; they could be entertaining and informal.

I read one in an RF book about a very weird GHz radiated emission problem. I don't remember the specifics, but basically they were flying a helicopter over the city with an antenna and spectrum analyzer, looking for where the emissions were coming from. They zoomed in on a fast food joint. Apparently the power of the signal was something huge that they had never seen before.

They asked the fast food restaurant manager if they could peek around. He let them in, and then they noticed a microwave in the kitchen with the door removed. Employees were sticking food in there and microwaving it with the door open!

The manager boasted about it, saying "yeah, I found out it saves us several seconds not having to open the door to put food in".

Apparently he jammed something in the door striker to close the door safety switch.

I'm sure you guys have some funny troubleshooting stories also.


thanks again,

-hxtasy
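The power-up ordering fault described in this post is the kind of thing a logged trace can flag automatically. Below is a minimal Python sketch, assuming the two supply rails have been recorded as (time in seconds, volts) samples by some datalogger. The rail names, the 3.0 V threshold and the sample values are all made up for illustration and are not measurements from the thread.

```python
def first_crossing(samples, threshold_v):
    """Return the time of the first sample at or above threshold_v, else None."""
    for t, v in samples:
        if v >= threshold_v:
            return t
    return None


def power_up_skew(gpu_samples, clock_samples, threshold_v):
    """Seconds by which the clock rail comes up after the GPU rail.

    A positive result means the GPU powered up first, which is the failing
    order described above. Returns None if either rail never comes up.
    """
    t_gpu = first_crossing(gpu_samples, threshold_v)
    t_clk = first_crossing(clock_samples, threshold_v)
    if t_gpu is None or t_clk is None:
        return None
    return t_clk - t_gpu


# Synthetic example: the clock rail comes up about 5 s after the GPU rail.
gpu_rail = [(0.0, 0.0), (0.1, 3.3), (0.2, 3.3)]
clock_rail = [(0.0, 0.0), (0.1, 0.0), (5.1, 3.3)]

skew = power_up_skew(gpu_rail, clock_rail, threshold_v=3.0)
print(f"clock rail late by {skew:.1f} s")
```

Logging this skew on every boot would turn the "watch the LEDs closely" observation into a record that can be handed to the power distribution board manufacturer.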
 
  • #12


But most of these weird problems get fixed serendipitously, so if you just go and play with the parts and try weird things, eventually you will get more clues.

"Poke at it and see if you can make it say "ouch"".
That's how a good doctor troubleshoots.

Glad you found it and thanks for letting us know...
and as you say it's when you let your mind wander that serendipity switches on.
Same is true in poker.

The truth knocks on the door and you say, "Go away, I'm looking for the truth," and so it goes away. Puzzling.
Robert Pirsig

maybe we need a thread for metaphysics, too...

Thanks for your kindness.

old jim
 
  • #13


Glad you found it and it wasn't an RF problem. Those are extremely daunting at times. I agree that oftentimes it's best to walk away from a problem for a while, then revisit it later on. Many of the intermittents I had to solve I ended up figuring out while I was relaxing at home.
 
  • #14


To hxtasy:
Well done. You can now sleep at night, I hope!
 

What is EMI and how can it be induced purposefully?

EMI stands for electromagnetic interference and it is the disturbance caused by an external source on an electronic device or system. This disturbance can disrupt the proper functioning of the device. To purposely induce EMI, one can use an electromagnetic field generator or a strong magnet close to the device.

What is noise and how can it be induced purposefully?

Noise refers to any unwanted or random signals that can interfere with the proper functioning of an electronic device or system. To purposely induce noise, one can use a noise generator or introduce interference signals into the circuit.

How can I induce EMI and noise in a drill?

To induce EMI and noise in a drill, you can use an electromagnetic field generator or a strong magnet close to the drill motor. You can also introduce interference signals into the drill's power supply or wiring.

Can EMI and noise be purposely induced in a phone?

Yes, EMI and noise can be purposely induced in a phone by using an electromagnetic field generator or a strong magnet close to the phone's antenna or circuitry. You can also introduce interference signals into the phone's power supply or wiring.

What are the potential risks of purposely inducing EMI and noise?

Purposely inducing EMI and noise can potentially damage electronic devices and disrupt their proper functioning. It can also interfere with other nearby devices and systems, causing them to malfunction. Therefore, it should only be done in a controlled environment by trained professionals for specific testing purposes.
