Searching inside .zip files on internet

  • Thread starter: Vespero
  • Tags: files, Internet

Discussion Overview

The discussion revolves around the feasibility of searching for content within .zip files hosted on the internet without downloading and extracting them. Participants explore potential methods and limitations related to this task, focusing on technical aspects of file retrieval and search capabilities.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant inquires about the possibility of searching .zip files online without downloading them, expressing a desire to find specific content within .doc files contained in those archives.
  • Another participant suggests that HTTP/1.1 range requests ("byte serving") make it possible to retrieve only part of a file. A third participant replies that no search engine appears to offer this, so it would help only someone writing their own search code.
  • A different participant argues that a search bot would need to retrieve files to local storage before searching, regardless of whether the files are compressed, indicating a fundamental limitation in searching directly on web servers.
  • There is mention of the possibility of writing custom code to search within .zip files if the files are fetched, but this requires local access to the files.
  • It is also noted that reading byte ranges helps only when archives can be ruled out by the filenames they contain; a .zip holding only files such as *.txt or *.docx would still have to be read in full.

Areas of Agreement / Disagreement

Participants express differing views on the capabilities of search engines and the technical limitations of searching within compressed files directly on web servers. No consensus is reached regarding a viable method for achieving the desired search functionality.

Contextual Notes

Limitations include the requirement for files to be downloaded before they can be searched, and the uncertainty regarding the capabilities of existing search engines to handle such tasks.

Vespero
Is anyone aware of a way to search through .zip files on the internet (such as on an archive site) without having to download and extract them? For example, I have a search phrase that may be in a .doc file inside one .zip file stored among many others. I don't want to download them all and search through them manually; I would like to at least find the correct .zip to download first.

Many thanks.
 

If you're using HTTP/1.1, then yes, because you can use range requests.

http://en.wikipedia.org/wiki/Byte_serving
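A minimal sketch of byte serving with Python's standard `urllib`, assuming the server honours the `Range` header. A server that supports it answers 206 Partial Content; one that ignores the header replies 200 with the whole file, which this sketch treats as an error. The function names and the example URL are hypothetical.

```python
import urllib.request
from typing import Optional

# Assumption: the server supports HTTP/1.1 range requests ("byte serving").
# Servers that do usually advertise "Accept-Ranges: bytes"; not all servers do.


def range_request(url: str, start: int, end: Optional[int] = None) -> urllib.request.Request:
    """Build a GET for bytes start..end (inclusive); end=None means through the last byte."""
    spec = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    return urllib.request.Request(url, headers={"Range": spec})


def suffix_request(url: str, n: int) -> urllib.request.Request:
    """Build a GET for only the last n bytes of the resource."""
    return urllib.request.Request(url, headers={"Range": f"bytes=-{n}"})


def fetch(req: urllib.request.Request) -> bytes:
    """Send the request and insist on 206 Partial Content, i.e. a real slice."""
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            # A server that ignores Range replies 200 with the entire file.
            raise OSError(f"range not honoured (HTTP {resp.status})")
        return resp.read()
```

For instance, `fetch(suffix_request("https://example.com/archive.zip", 22))` would ask for only the final 22 bytes, which in a zip without an archive comment is exactly the End of Central Directory record that says where the archive's file list is stored.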
 
SixNein said:
If your using HTTP version 1.1 then yes because you can use ranges.

http://en.wikipedia.org/wiki/Byte_serving
I think he is hoping to use a search engine, and I don't think any of them do what he wants.

If he's writing the search code himself, your byte-range approach would be useful if he could rule out many of the archives based on the filenames they contain. But whenever a zip file contained only files like *.txt or *.docx, he would still need to read the whole zip file.
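To make the filename-based elimination concrete: a zip archive keeps its list of entries (the central directory) at the end of the file, so a client that can seek with range requests needs only those few bytes to learn what an archive contains. The sketch below wraps a `fetch(start, end)` callable in a seekable file object and hands it to Python's `zipfile`, which then reads just the tail. `RangeFile`, `member_names`, and `worth_downloading` are hypothetical names; `fetch` is assumed to send a `Range: bytes=start-end` request and is left as a parameter so the sketch runs offline.

```python
import io
import zipfile
from typing import Callable

# fetch(start, end) must return bytes start..end inclusive. In practice it would
# wrap an HTTP/1.1 request with "Range: bytes=start-end" (an assumption here).
Fetch = Callable[[int, int], bytes]


class RangeFile(io.RawIOBase):
    """Read-only, seekable view of a remote file that pulls only the bytes asked for."""

    def __init__(self, size: int, fetch: Fetch) -> None:
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0
        self.bytes_fetched = 0  # total transferred, to show how little is needed

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        elif whence == io.SEEK_END:
            base = self._size
        else:
            raise ValueError(f"invalid whence: {whence}")
        if base + offset < 0:
            raise OSError("seek before start of file")
        self._pos = base + offset
        return self._pos

    def readinto(self, buf) -> int:
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0  # end of file
        data = self._fetch(self._pos, self._pos + n - 1)
        buf[: len(data)] = data
        self._pos += len(data)
        self.bytes_fetched += len(data)
        return len(data)


def member_names(rf: RangeFile) -> list[str]:
    """List a remote zip's entries; zipfile reads only the end records and central directory."""
    with zipfile.ZipFile(rf) as zf:
        return zf.namelist()


def worth_downloading(names: list[str], wanted=(".doc", ".docx", ".txt")) -> bool:
    """Filename pre-filter: skip archives with no entry that could hold the phrase."""
    return any(name.lower().endswith(wanted) for name in names)
```

The pre-filter only rules archives out. As the post says, an archive whose entries are .doc, .docx, or .txt files still has to be fetched and opened before its text can be searched.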
 
I can't think of any way for a search bot to look inside any file out on a web server without first retrieving the file to local disk. It seems irrelevant whether the file is compressed (zipped) or not. If you are writing code, you can certainly get a library to open the zip so it can be searched as plain text or whatever format you expect.

The only exception would be injecting code onto the web server itself so that it runs THERE, but that is possible only if you are permitted to add code to a site, which does not apply to most sites. If your code is looking at files on the web, they must first be fetched to your local disk by the HTTP client. Period.
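Once an archive has been fetched, the standard library's `zipfile` can open it and each entry can be searched. This is a crude byte-level sketch, not a document parser: it looks for the phrase encoded as UTF-8, cp1252, and UTF-16LE (the last because legacy binary .doc files often store text that way, an assumption that will not hold for every file), and it recurses into entries that are themselves zip containers, as .docx files are. Text that Word splits across formatting runs inside a .docx will be missed. `search_zip` is a hypothetical helper name.

```python
import io
import zipfile


def _needles(phrase: str) -> set[bytes]:
    # Heuristic encodings (an assumption, not a .doc/.docx parser):
    # UTF-8 for plain text and the XML inside .docx, cp1252 for older 8-bit text,
    # UTF-16LE because legacy binary .doc files often store their text that way.
    forms = {phrase.encode("utf-8"), phrase.encode("utf-16-le")}
    try:
        forms.add(phrase.encode("cp1252"))
    except UnicodeEncodeError:
        pass  # phrase contains characters cp1252 cannot represent
    return forms


def search_zip(zip_bytes: bytes, phrase: str, prefix: str = "") -> list[str]:
    """Return paths of entries whose raw bytes contain phrase, recursing into nested zips."""
    needles = _needles(phrase)
    hits: list[str] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)  # decompressed bytes of this one entry
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(data)):
                # .docx (like .xlsx and .odt) is itself a zip container: look inside it.
                hits.extend(search_zip(data, phrase, path + "!/"))
            elif any(needle in data for needle in needles):
                hits.append(path)
    return hits
```

In the scenario from the original question, this would run over each archive that survived any filename filtering, after it had been downloaded.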
 
