Steganographic Data File Searches

  • Thread starter: TimeSkip
  • Tags: Data File

Discussion Overview

The discussion revolves around the concept of steganographic data file searches, particularly in the context of intelligence agencies and the potential for detecting covert information within files. Participants explore the feasibility of identifying hidden content through statistical analysis and file signatures, as well as the implications of encryption and compression on such searches.

Discussion Character

  • Exploratory
  • Debate/contested
  • Technical explanation

Main Points Raised

  • Some participants propose that steganographic data file searches could involve analyzing file types and their signatures to detect hidden information, particularly in the context of illegal content.
  • Others argue that there are numerous methods to conceal information in files, and statistical fingerprints can be used to identify covert operators through integrity tests.
  • A participant questions whether a sound file could have a steganographic signature that reveals the content of conversations, suggesting that scanning for specific terms could yield results.
  • Some contributions highlight that the discussion may be overly hypothetical, with many possibilities existing without clear resolution.
  • Several participants clarify that the term "steganography" is often misused in this context, emphasizing that it typically refers to hiding one file within another rather than analyzing file content for specific terms.
  • There is a discussion about whether the information saved in a file has a unique signature, with some asserting that statistical analysis can indicate the type of content present.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the effectiveness or current use of steganographic data file searches. Multiple competing views remain regarding the definitions and implications of steganography, as well as the methods for detecting hidden information.

Contextual Notes

There are limitations in the discussion regarding the definitions of steganography and the assumptions about file signatures and statistical analysis. The conversation includes unresolved questions about the practical application of these concepts in real-world scenarios.

TimeSkip
<moved to General Discussion, posts that ask for thoughts are not hard science>

Summary: Soon?

I've been wondering whether steganographic data file searches are already common among intelligence agencies, or will become so in the near future.

Just as the term means, it would consist of looking at the data-file type, seeing which other files on the internet possess a similar format, and then looking at their steganographic data signature. I'm pretty sure one would have to rootkit the memory controller to do a hard drive scan and be able to search for child pornography or terrorism-related content.

As an example, I think one could write some kind of statement in a .txt file, and instead of scanning the file signature, one would look at the actual content of the file, based on its steganographic data file content in 1's and 0's as saved on some cloud, hard drive or SSD.

Is this something that goes on nowadays, or will it be possible in the near future, to thwart known databases or online outlets of child pornography by comparison with known content held by law enforcement agencies?

The only way I can imagine how to circumvent this would be encryption or compression (to a lesser extent).

Thoughts?
 
There are too many ways to hide covert information in a computer file, or in a transfer protocol. Every type of file has a statistical fingerprint, and most can be tested for integrity. The files that fail an integrity test, identify the covert operators.
Start here; https://en.wikipedia.org/wiki/Steganography

Traffic analysis will lead you to covert communication links faster than the examination of file contents.
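As a rough illustration of the "statistical fingerprint" idea (this sketch is not from the thread), one of the simplest statistics is the Shannon entropy of a file's byte distribution. Plain text usually scores well below 8 bits per byte, while compressed or encrypted data scores close to 8. The function name `byte_entropy` is hypothetical, and real forensic tools combine many more features than this single number.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    import os

    text = b"Every type of file has a statistical fingerprint. " * 20
    noise = os.urandom(4096)  # stands in for compressed or encrypted data
    print(f"repeated text: {byte_entropy(text):.2f} bits/byte")
    print(f"random bytes:  {byte_entropy(noise):.2f} bits/byte")
```

Entropy alone cannot tell compressed data from encrypted data, since both look close to uniform; it only separates "structured" from "nearly random" content.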

TimeSkip said:
The only way I can imagine how to circumvent this would be encryption or compression (to a lesser extent).
Do you want to circumvent the use of steganography by others, or do you want to use steganography without being detected?
Who are you investigating? Are they the suspect?
What are you trying to hide? Are you the criminal?
 
Anything hidden using well-known, publicly available software is likely to be found by detailed forensic analysis by well-informed security teams.
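One example of the kind of statistical test that catches naive hiding tools is the pairs-of-values chi-square test described by Westfeld and Pfitzmann for least-significant-bit (LSB) embedding. Writing random message bits into the LSBs tends to equalize how often the values 2k and 2k+1 occur, so an unusually small statistic is suspicious. The sketch below assumes raw byte values (for example, uncompressed pixel data) and computes only the statistic; the full test converts it into a probability, which is omitted here. The function name and the sample data are hypothetical.

```python
def pov_chi_square(data: bytes) -> float:
    """Pairs-of-values chi-square statistic over byte values.

    For each pair (2k, 2k+1), compare the count of the even value with the
    pair's mean count. LSB embedding of random bits pushes the two counts
    toward each other, so a small result is suspicious.
    """
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    chi = 0.0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2
        if expected > 0:  # skip pairs that never occur
            chi += (even - expected) ** 2 / expected
    return chi


# Hypothetical cover data in which each pair is very uneven (only even values).
cover = bytes([10, 10, 20, 20] * 250)
# Deterministic stand-in for embedding random message bits: flip every other LSB.
stego = bytes(b ^ 1 if i % 2 else b for i, b in enumerate(cover))

print(pov_chi_square(cover), pov_chi_square(stego))  # 500.0 0.0
```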
 
Baluncore said:
There are too many ways to hide covert information in a computer file, or in a transfer protocol. Every type of file has a statistical fingerprint, and most can be tested for integrity. The files that fail an integrity test, identify the covert operators.
I'm not sure we're on the same page, but can a sound file, such as a phone conversation in a widely known format, have a steganographic signature?

Let me give an example.

Person 01 is talking about terrorism. If we scan the data file of the cell phone conversation for data file signatures corresponding to "terrorism" being mentioned, then that could lead to a hit. Is it possible for the data file to inform the investigator of what is being talked about?

Similarly, assuming that we can scan .wav files, or even compressed formats, for a mention of "CP", could this lead to a hit?

I'm not talking about hiding information steganographically, but rather about finding content by scanning a data file for the information itself, with a signature of the search terms leading to a hit.
 
This is all too hypothetical, and there are too many possibilities.
Almost anything is possible in some unlikely situation.
 
TimeSkip said:
steganographic

You keep using that word. I do not think it means what you think it means.
 
Baluncore said:
This is all too hypothetical, and there are too many possibilities.
Almost anything is possible in some unlikely situation.
Well, let me provide an example:

You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?

Vanadium 50 said:
You keep using that word. I do not think it means what you think it means.
The above is all I mean by a steganographic signature based on the unique signature of the information saved on a storage drive.

Thanks.
 
First let me state that what I'm about to describe has nothing to do with steganography. Steganography is something altogether different.

TimeSkip said:
Well, let me provide an example:

You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?

I'm not sure what you mean. But every unique block of information has a unique signature, if you define the "signature" as being the block of information itself. If that sounds trivial, it may be because I'm not sure what you mean by "signature."

But I'll try to help.

Let's assume that your text file is stored in some sort of simple, 8-bit ASCII-based format (examples include "ANSI" and "UTF-8").

Each letter corresponds to a pattern of binary 1s and 0s.

't' = 0x74 = 0111 0100
'e' = 0x65 = 0110 0101
'r' = 0x72 = 0111 0010
'r' = 0x72 = 0111 0010
'o' = 0x6f = 0110 1111
'r' = 0x72 = 0111 0010
'i' = 0x69 = 0110 1001
's' = 0x73 = 0111 0011
'm' = 0x6d = 0110 1101

So, if you happen to know a priori that the file stores text data in a simple, 8-bit ASCII-based format, and you want to know if the file contains the word "terrorism," just look for the bit pattern,

0111 0100 0110 0101 0111 0010 0111 0010 0110 1111 0111 0010 0110 1001 0111 0011 0110 1101
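The bit pattern above can be derived mechanically. A minimal Python sketch, assuming the file really is ASCII (or UTF-8, which encodes these letters identically): searching for the encoded bytes is the same as searching for that bit pattern on byte boundaries. The helper names `bit_pattern` and `contains_word` are hypothetical.

```python
def bit_pattern(word: str, encoding: str = "ascii") -> str:
    """Bits of `word` in the given encoding, grouped in 4-bit nibbles."""
    bits = "".join(f"{byte:08b}" for byte in word.encode(encoding))
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


def contains_word(data: bytes, word: str, encoding: str = "ascii") -> bool:
    """Byte-aligned pattern match: does `data` contain `word` as encoded?"""
    return word.encode(encoding) in data


print(bit_pattern("terrorism"))
# 0111 0100 0110 0101 0111 0010 0111 0010 0110 1111 0111 0010 0110 1001 0111 0011 0110 1101
print(contains_word(b"the word terrorism appears here", "terrorism"))  # True
```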

Okay, so now suppose that the file format is not a simple, 8-bit ASCII based format, but maybe some 16-bit format (where each character is represented by 16 bits), and the format doesn't resemble ASCII at all. Well, if you know what the file format is, just create the bit pattern that corresponds to "terrorism" in that format (whatever that happens to be) and search for that.
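To make the 16-bit case concrete, here is a sketch that assumes UTF-16 little-endian as the 16-bit format (the post does not name one). For these Latin letters, UTF-16LE stores each character as its ASCII byte followed by a zero byte, so the 8-bit pattern no longer matches and the search pattern has to be built in the file's own encoding.

```python
word = "terrorism"

# Simulate a file written by an editor that saves text as UTF-16 little-endian.
utf16_file = "notes: terrorism".encode("utf-16-le")

ascii_needle = word.encode("ascii")       # 8 bits per character
utf16_needle = word.encode("utf-16-le")   # 16 bits per character

print(ascii_needle in utf16_file)   # False: every character is followed by 0x00
print(utf16_needle in utf16_file)   # True
print(utf16_needle[:4].hex(" "))    # 74 00 65 00
```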

None of this has anything to do with steganography though. It's just simple pattern matching.
 
Steganography in computing terms normally means hiding one file inside another, usually a picture file, but it can be any file. If you want to keep data secure, then you're better off going down the encryption route.
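To illustrate what "hiding one file inside another" can look like, here is a sketch of least-significant-bit (LSB) embedding, one common textbook scheme among many. It assumes the cover is raw byte values such as uncompressed pixel data (a real image file would have to be decoded first), and the function names are hypothetical.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide `message` in the least significant bit of successive cover bytes."""
    # Message bits, most significant bit of each byte first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & 0xFE) | bit  # overwrite only the lowest bit
    return stego


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Read back `length` bytes hidden by embed_lsb."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for carrier in stego[start:start + 8]:
            value = (value << 1) | (carrier & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200, 256)) * 4  # hypothetical stand-in for raw pixel values
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))  # b'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # each byte changes by at most 1
```

Because each carrier byte changes by at most one, the result looks unchanged to a viewer, but the byte statistics shift, which is what forensic tests look for. Note that the hidden message is not encrypted here.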
 
  • #10
TimeSkip said:
You have a .txt file with the word "terrorism" written in it, and then you save it on your hard drive or SSD or whatever storage system you may have. Does the information saved on your storage drive have a unique information signature?
Yes. Statistical analysis will strongly indicate that it contains ASCII text.

Steganography is something quite different.
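A crude version of that statistical check, as a sketch: count how many bytes fall in the printable ASCII range, plus tab, line feed and carriage return. The 0.95 threshold and the function names are assumptions chosen for illustration; tools such as the Unix `file` command use more elaborate rules.

```python
TEXT_WHITESPACE = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def printable_ascii_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (0x20-0x7E) or common whitespace."""
    if not data:
        return 0.0
    ok = sum(1 for b in data if 0x20 <= b <= 0x7E or b in TEXT_WHITESPACE)
    return ok / len(data)


def looks_like_ascii_text(data: bytes, threshold: float = 0.95) -> bool:
    """Heuristic guess: True if almost every byte is printable ASCII."""
    return printable_ascii_ratio(data) >= threshold


print(looks_like_ascii_text(b"terrorism\n"))     # True
print(looks_like_ascii_text(bytes(range(256))))  # False (98 of 256 bytes qualify)
```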
 
  • #12
collinsmark said:
None of this has anything to do with steganography though. It's just simple pattern matching.
This is precisely what I mean, but down to the very way the ASCII information is stored on a storage device. I will concede that by "steganographic searches" I meant to assume that every instance of, for example, saying words such as "I'm going to bomb (such) country." in a cell phone conversation saved as something like .mp3, .wav or .flac has a unique signature for each data file format.

Then all one ought to do is run a search query for the unique signatures in the device storage through the memory controller. One doesn't necessarily have to have the file available to open or inspect.
 
  • #13
I can't edit the OP, otherwise I would substitute something else for "steganographic", and I haven't come across this type of analysis method yet.
 
  • #14
TimeSkip said:
This is precisely what I mean, but down to the very way the ASCII information is stored on a storage device.
Of course, the way one says something in a phone conversation would have unique characteristics due to different speech patterns. Yet this pertains more to a stable input medium like .txt, or computerized speech with hard phonetics and proper grammar.
 
  • #15
I feel like you are still misunderstanding how these things work. The "signature" that @collinsmark mentioned is nothing but the text itself. Yes, it has a specific representation (which is what Collin tried to explain to you), but in no way is this representation a "signature": when looking for the word "terrorism" you just search the text for that word, not for any specific property/signature/whatever that is not simply the content of the file.

So no, even assuming you didn't mean steganography, there is still nothing of the kind you are looking for; there is just the content of the file, the message itself.
 
  • #16
TimeSkip said:
All that then one ought to do is run a search query for the unique signatures in the device storage through the memory controller. One doesn't necessarily have to have the file available to open or inspect necessarily.
This doesn't really make sense. The memory controller is not what is used for reading files.
You should be thinking harder about how you would go about getting access to the data.
The usual scenario is that you run a program on the device. There are programs that look for child porn by computing hashes of all the files and comparing them to hashes of known child porn. A child porn collector is almost certain to have some files that have been seen before.
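A minimal sketch of the hash-comparison approach described above, assuming a set of SHA-256 digests of known files is available. The choice of SHA-256, the hash set and the function names are assumptions for illustration. A cryptographic hash only matches byte-identical copies; re-encoding or resizing an image changes it, which is why deployed systems such as Microsoft's PhotoDNA use perceptual hashes instead.

```python
import hashlib
from pathlib import Path
from typing import List, Set


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root: Path, known_hashes: Set[str]) -> List[Path]:
    """Return files under `root` whose digest appears in `known_hashes`."""
    return [
        path
        for path in root.rglob("*")
        if path.is_file() and sha256_of_file(path) in known_hashes
    ]
```

Usage would look like `find_known_files(Path("/mnt/evidence"), known_hashes)`, where the mount point and the hash set are placeholders. This runs as an ordinary program reading files through the file system, not through the memory controller.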
Audio conversations are rarely saved on a phone or computer. You would have to listen in on some network. If you are law enforcement, you'd get a warrant for a wiretap to serve on the internet provider. You'd have to use speech recognition software to look for words.
 
