Cryptographically secure code obfuscation

AI Thread Summary
The discussion centers on cryptographically secure code obfuscation: a function that transforms a program into a slower but functionally equivalent version that is computationally hard to optimize. The underlying intuition is that if a program is slow and you cannot see how to make it faster, you do not understand it. Challenges include the debuggers built into operating systems such as Windows and Linux, which can be used to reverse engineer obfuscated code. The conversation also touches on a reformulation in terms of Turing machines processing encrypted definitions without trivially decoding them. Ultimately, the feasibility of a truly secure obfuscator remains uncertain, with differing views on the requirements and limitations of such a system.
mXSCNT
I got to thinking about code obfuscation. Current code obfuscators use ad hoc techniques like symbol renaming. But is it possible to have a cryptographically secure code obfuscator: one that outputs programs that work the same way as the originals, but that provably no one can understand?

Here's the key idea behind what I think that would mean. If you understand a slow program, you should be able to improve the slow parts and make it faster. Conversely, if a program is slow but you have no idea how to make it faster, you don't understand the program.

So a cryptographically secure code obfuscator is a function F that takes as input a program P, and outputs another program Q (in other words F(P) = Q), where the following conditions apply:
  1. P and Q both compute the same function, but Q is slower than P.
  2. It should be computationally hard to improve the performance of Q, given only Q and not P. Mathematically: there should not exist an easy-to-compute optimization function W such that, whenever F(P) = Q, the program W(Q) computes the same function as Q but runs faster.
  3. Computing F should take only polynomial time.
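
To make conditions 1 and 3 concrete, and to show why all the difficulty lives in condition 2, here is a minimal Python sketch (the names slowdown and delay are purely illustrative, not part of any real scheme). It slows a program down in an obviously reversible way, so it fails condition 2 completely:

Code:
import time

def slowdown(p, delay=0.001):
    # Toy "obfuscator": wrap program p so every call wastes time.
    # Conditions 1 and 3 hold trivially: the wrapper computes the same
    # function as p and is constructed in constant time. Condition 2
    # fails badly: the optimizer W "delete the sleep" is easy to find.
    def q(*args, **kwargs):
        time.sleep(delay)  # useless work, trivially strippable
        return p(*args, **kwargs)
    return q

square = lambda x: x * x
obf_square = slowdown(square)
assert obf_square(7) == square(7)  # same function, just slower

Any real candidate for F would have to entangle the wasted work with the computation itself, so that it cannot simply be stripped out; that is exactly what condition 2 demands.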

Anyone heard of something like this?
 
The issue in the case of Windows is that it supports debuggers for just about any program. It also supports remote debuggers (hosted on a second computer) that end up getting enabled at system boot time (often used to debug device drivers), so preventing a debugger from attaching to the process running your program is probably not possible.

Once the program is actually running, some debuggers can disassemble the running parts of a program, even if the program is kept encrypted in sections until each is needed for execution.

In the end, at best all you can do is make it difficult, but not impossible, to reverse engineer a program running under Windows.

Versions of Linux also include debug kernels (again, mostly used for device driver development), so I'm not sure Linux is any better, other than that access to debug kernels may be limited to approved developers.
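
For what it's worth, one classic "difficult but not impossible" trick on Linux is for a process to detect a tracer at startup. This is just a sketch, assuming glibc and the Linux value PTRACE_TRACEME = 0; it does nothing against a determined attacker, who can simply patch the check out:

Code:
import ctypes

PTRACE_TRACEME = 0  # request value from <sys/ptrace.h> on Linux

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.ptrace.restype = ctypes.c_long

# Only one tracer may attach to a process at a time, so if this call
# fails, some debugger (gdb, strace, ...) is already attached.
if libc.ptrace(PTRACE_TRACEME, 0, 0, 0) == -1:
    print("tracer detected, refusing to run")
else:
    print("no tracer at startup (one can still attach later)")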
 
Let's try stating the problem a different way.

Suppose you have a Turing machine. Clearly the machine can be defined by a sequence of integers in some way, so its definition could be encrypted by standard "unbreakable" methods.

You can then pose the question: can you construct a Turing machine that processes the "encrypted" definition, without trivially decoding it back to the original definition?

I don't know how to start answering that version of the question, but at least it seems to be a more precisely defined problem than the original version.
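
To give that question some shape, here is a toy sketch in Python, with keyed hashing of state labels standing in for real encryption (an assumption for illustration only, not a secure scheme). The interpreter runs the scrambled transition table directly and never recovers the original labels:

Code:
import hashlib

def scramble(label, key):
    # Keyed hashing as a stand-in for "encrypting" a state label.
    return hashlib.sha256((key + label).encode()).hexdigest()[:8]

# A tiny Turing machine that flips bits until it reads a blank ("_").
# delta[(state, symbol)] = (next_state, symbol_to_write, head_move)
delta = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", "_"): ("halt", "_", 0),
}

KEY = "secret"
enc_delta = {(scramble(s, KEY), sym): (scramble(s2, KEY), w, mv)
             for (s, sym), (s2, w, mv) in delta.items()}

def run(table, state, tape):
    # The interpreter executes the scrambled table directly; it halts
    # when no transition applies to the current (state, symbol) pair.
    pos = 0
    while (state, tape[pos]) in table:
        state, tape[pos], mv = table[(state, tape[pos])]
        pos += mv
    return "".join(tape)

print(run(enc_delta, scramble("flip", KEY), list("0110_")))  # 1001_

Of course, scrambling labels is just the symbol renaming mXSCNT dismissed as ad hoc, and anyone watching the interpreter run can rebuild an equivalent table. That is exactly why "without trivially decoding" needs a precise definition.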
 
AlephZero said:
You can then pose the question: can you construct a Turing machine that processes the "encrypted" definition, without trivially decoding it back to the original definition? [...] at least it seems to be a more precisely defined problem than the original version.
That's less precise than my version because you don't say what it means to "trivially" decode it back to the original definition. My version is reducible to math; yours is not.
 
mXSCNT said:
My version is reducible to math; yours is not.

Hm... I studied Turing machines as part of a math degree. But please yourself.

I don't see why you require an obfuscated program to be slower than the original, though. Your assertion
If you understand a slow program, you should be able to improve the slow parts and make it faster. If a program is slow but you have no idea how to make it faster, you don't understand the program.
is false in general. There are "slow" algorithms which are very well understood (e.g. enumeration of all possible cases), but for which no faster algorithms are known.
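
A standard textbook illustration of that point (my example, not specific to any obfuscator): subset-sum solved by exhaustive enumeration. The code is completely transparent, yet it runs in exponential time, and no polynomial-time algorithm for the general problem is known:

Code:
from itertools import combinations

def subset_sum(nums, target):
    # Exhaustive enumeration: fully understood, nothing obfuscated,
    # yet exponential in len(nums).
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))  # (4, 5)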
 
Assuming that P represents some algorithm, doesn't any computer that runs Q end up carrying out the same algorithm defined by P? If so, and if there is some tool that lets a person run Q under a debugger with a trace feature, wouldn't that effectively allow the recreation of P?
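
As a toy illustration of what a trace feature exposes (using Python's sys.settrace in place of a machine-level debugger; the names q and tracer are hypothetical):

Code:
import sys

def tracer(frame, event, arg):
    # A debugger's trace feature in miniature: log every line executed.
    if event == "line":
        print(f"executed {frame.f_code.co_name} line {frame.f_lineno}")
    return tracer

def q(x):  # stand-in for the obfuscated program Q
    y = x * x
    return y + 1

sys.settrace(tracer)
q(3)
sys.settrace(None)

Every line Q executes is visible, so an observer can reconstruct at least the sequence of operations Q performs on a given input, even if recovering P itself (or a readable equivalent) takes further work.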
 