Compressed Zip Folder, MS Word, characters to avoid?

  • #1
paulb203
TL;DR Summary
Which characters to avoid in file and folder names when compressing a folder of MS Word documents
I’m trying to compress a folder containing a combination of sub-folders and individual files. All the files are Word documents.
I’m having to rename some of the sub-folders as I’ve become aware that some characters have to be avoided. For example, MS have told me that apostrophes have to be avoided.
Does anyone here know the full list? Google, as ever, doesn’t lead me to a consensus. For example, someone said spaces are to be avoided, but I managed to compress a folder containing them. Another said to avoid capital letters, but that's not a problem either.

Also, is it just the folder names, and sub-folder names, or does it apply to individual files too?
 
  • #2
Personally, instead of learning which ones to avoid, just limit yourself to A-Z, a-z, 0-9, the underscore "_" and the hyphen "-". These should work across operating systems as well, i.e. Windows, MacOS and Linux.

Why?

* and ? are used as wildcards when searching for files
' and " are used to quote file names with embedded spaces in command-line commands
. and / and \ and : and ; are used as separators in file names and paths, depending on the file system and OS. Examples include /xxx/yyy/zzz.doc on MacOS or Linux and c:\xxx\yyy\zzz.doc on Windows

Other special characters, like ! and & and # and @, may also have special meanings in scripting...
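If you would rather rename things than memorise the list, here is a minimal Python sketch of the whitelist idea above (A-Z, a-z, 0-9, "_" and "-"). The helper names are made up for illustration, and replacing each bad character with "_" is just one possible policy. The extension (e.g. .docx) is handled separately because "." is outside this particular whitelist.

```python
import os
import re

# Anything NOT in A-Z, a-z, 0-9, "_" or "-" counts as disallowed.
# "-" is placed last inside [...] so it is a literal hyphen, not a range.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_stem(stem: str, replacement: str = "_") -> str:
    """Replace disallowed characters, collapse repeats, trim the ends."""
    cleaned = _DISALLOWED.sub(replacement, stem)
    cleaned = re.sub(re.escape(replacement) + "+", replacement, cleaned)
    # Note: this also trims a leading/trailing "_" that was already there.
    return cleaned.strip(replacement) or replacement


def sanitize_filename(filename: str) -> str:
    """Sanitize the part before the extension and the extension separately."""
    stem, ext = os.path.splitext(filename)
    return sanitize_stem(stem) + ("." + sanitize_stem(ext[1:]) if ext else "")


for name in ["Bob's CV.docx", "What? Me!", "plain-name_1.docx"]:
    print(f"{name!r} -> {sanitize_filename(name)!r}")
```

Names that already follow the rule come through unchanged, so it is safe to run over a mix of good and bad names.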
 
  • #3
+1 to jedishrfu. I would avoid using apostrophes or similar special characters in filenames. You never know what they'll break down the line.

IMHO it is worth your while to retrofit your existing files and folders with boring names and to follow that policy moving forward.
 
  • #4
paulb203 said:
Also, is it just the folder names, and sub-folder names, or does it apply to individual files too?
It applies to files as well as folders.

@jedishrfu's advice is good: avoid everything except [A-Za-z0-9_.-]. (I've also included ".", which is fine to use as long as you recognise its special use to indicate an extension.)
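Since the rule covers files as well as folders, a quick audit can walk the whole tree before zipping. A minimal Python sketch, assuming the [A-Za-z0-9_.-] whitelist from this post; the function name find_bad_names is hypothetical. Note that "-" goes last inside the brackets: written as "_-." Python's re module rejects it as a bad character range.

```python
import os
import re

# "-" is last so it is a literal; "[A-Za-z0-9_-.]" would raise re.error.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def find_bad_names(root: str):
    """Yield paths (relative to root) of files AND folders whose own name
    contains anything outside A-Z, a-z, 0-9, "_", "." or "-"."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not ALLOWED_NAME.fullmatch(name):
                yield os.path.relpath(os.path.join(dirpath, name), root)
```

Calling find_bad_names on the folder you are about to zip lists every file or folder name that needs renaming.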

If you are working on both Windows and POSIX-based file systems (MacOS, Linux...), it is also a good idea to avoid capital letters, because on Linux MyFile.txt and myfile.txt are different files, but on Windows (and on MacOS with its default settings) they refer to the same file, so this would cause problems.
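To spot names that would collide when files move from a case-sensitive system to Windows, a sketch like this groups names within each folder case-insensitively. The function names are hypothetical, and str.casefold() is only an approximation of how Windows compares names (they can differ for some non-English characters).

```python
import os
from collections import defaultdict


def case_clashes(names):
    """Group names that differ only in letter case, e.g. MyFile.txt/myfile.txt."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def walk_case_clashes(root):
    """Yield (folder, clashing names) for every folder under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for group in case_clashes(dirnames + filenames):
            yield dirpath, group


print(case_clashes(["MyFile.txt", "myfile.txt", "other.txt"]))
```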

Other characters, like spaces, are also permitted, but that doesn't mean it is a good idea to use them, because they may break other things (for instance, if you use spaces you might sometimes have to surround the file name with quotes).

Also note that really_long_folder_or_file_names_are_a_bad_idea, as\are\deep\levels\of\nesting\particularly_with_long_names\for_folders, because in some cases the full path is limited to 260 characters (and sometimes it isn't).
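Where exactly the limit bites depends on the OS, its settings and the application, but a rough check is easy. This sketch assumes the classic Windows limit of 260 characters including a terminating NUL character (so about 259 visible characters); long_paths and its reserve parameter are hypothetical names, with reserve leaving headroom for a folder that might later be prefixed to every path (e.g. a backup location).

```python
import os

# Classic Win32 MAX_PATH is 260 including the terminating NUL character,
# so roughly 259 visible characters.
MAX_VISIBLE = 259


def long_paths(root, limit=MAX_VISIBLE, reserve=0):
    """Yield (length, path) for files/folders whose absolute path exceeds
    limit - reserve characters."""
    root = os.path.abspath(root)
    threshold = limit - reserve
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > threshold:
                yield len(full), full
```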
 
  • #5
All excellent advice (this is all coming back to me now!)

I have a subsite on my webhost where my code makes no distinction between .jpg and .JPG - but my webhost does, so half my images are busted.
 
  • #6
Thanks a lot guys.

So, in summary:

Stick to: a-z, 0-9, underscore, and dash. That's it.

Avoid everything else, including long names for files or folders, and deep nesting (which I think means folders containing sub-folders which themselves contain sub-folders, i.e. sub-sub-folders, and so on?).

NB: I know not everyone said to avoid uppercase and the full stop (period), but given what pbuk said, with caveats, I think it might be best for me to just avoid those too.

Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
 
  • #7
You might be interested in https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

paulb203 said:
Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
According to the previous link, with NTFS (Windows), there is a limit for the maximum pathname length: 32,767 characters with each path component (directory or filename) up to 255 characters long.
 
  • #8
jack action said:
You might be interested in https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits


According to the previous link, with NTFS (Windows), there is a limit for the maximum pathname length: 32,767 characters with each path component (directory or filename) up to 255 characters long.
That is what NTFS supports, yes, so in theory you can store such files on your disk. However, the default limit in the Windows API is a TOTAL of 260 characters, as I said above, so these files cannot normally be accessed by applications, which is not much use.



https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
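The Microsoft page linked above describes lifting the limit via a registry value. A read-only Python sketch that checks it, assuming the value name LongPathsEnabled under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem as described on that page; per the same page, an application must also declare itself long-path aware, so the setting alone doesn't fix every program. The function returns None on non-Windows systems or when the value is absent.

```python
import sys
from typing import Optional

# Registry location described in Microsoft's "Maximum Path Length Limitation" page.
_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
_VALUE = "LongPathsEnabled"


def long_paths_enabled() -> Optional[bool]:
    """True/False from the registry, or None if not on Windows or the value
    is missing. Read-only: this never changes the setting."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, _KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, _VALUE)
    except OSError:  # includes FileNotFoundError when the value is missing
        return None
    return value == 1


print(long_paths_enabled())
```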
 
  • #9
Concerning the dot, I've had some issues using it with applications that identify a file or directory by the file-type ending (extension) of its name.

As an example, while using the Obsidian app on MacOS, I named a directory my.math.notes, and MacOS decided it was a file and not a directory, so I had to rename it my-math-notes to avoid the issue.
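This doesn't reproduce the Obsidian behaviour (which may well be an app bug); it only shows why a dotted folder name can look like "name + extension" to software that guesses type from the name. Python's pathlib, for example, treats everything from the last dot onwards as the suffix; apparent_extension is a hypothetical helper name.

```python
from pathlib import PurePath


def apparent_extension(name: str) -> str:
    """What a program that infers type from the name would see:
    the text from the last dot onwards, or "" if there is none."""
    return PurePath(name).suffix


for name in ["my.math.notes", "my-math-notes", "essay.docx"]:
    print(f"{name!r}: {apparent_extension(name)!r}")
```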

Also, it's not good to use the "<", ">" or "|" characters, as they are used in command-line commands for redirection: "<" redirects input, i.e. input comes from the specified file, and ">" redirects output, i.e. output goes to the specified file. The "|" is for chaining (piping) commands together.

"&" is yet another, as it has special uses in scripting and, on Unix-based OSes, marks a command for background execution.

Also, the "\" used in Windows as the directory path separator is a bad character for other OSes, as many shells and programming languages interpret it as an escape character, so it may be removed from the string or change the meaning of the character after it. If you've done any C programming, recall that "\r" is the carriage return, "\n" is the newline character, "\t" is the tab character, and there are others. Most will cause havoc in path/filenames.
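The same trap exists in most programming languages, Python included. A short demonstration with a made-up path: in an ordinary string literal "\n" and "\t" become a newline and a tab, while a raw string (or forward slashes) keeps the path intact.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal, "\n" and "\t" are escapes, not folder separators.
typed = "c:\new\table.doc"        # really: c:<newline>ew<tab>able.doc
intended = r"c:\new\table.doc"    # raw string keeps each backslash literally

print(len(typed), len(intended))  # 14 16: two backslashes were swallowed
print(intended.split("\\"))       # ['c:', 'new', 'table.doc']

# Forward slashes avoid the problem; Windows path handling accepts both.
print(PureWindowsPath("c:/new/table.doc") == PureWindowsPath(intended))  # True
```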

Anyway, that's why it's best to stick with alphanumeric characters and "-" or "_" only.
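For POSIX shells (sh, bash, zsh; not Windows cmd.exe), Python's standard shlex.quote shows which names would need quoting before they could be used safely on a command line. A small sketch with a hypothetical helper name; note that shlex also treats a few extra characters such as @, %, + and : as safe, so "no quoting needed" is not quite the same as the whitelist above.

```python
import shlex


def needs_shell_quoting(name: str) -> bool:
    """True if a POSIX shell would need `name` quoted to treat it as one word."""
    return shlex.quote(name) != name


for name in ["notes_2024-10.docx", "my notes.docx", "a&b.txt", "in<out.txt", "x|y"]:
    print(f"{name!r:22} -> {shlex.quote(name)}")
```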

As for restricting subdirectories in zip files, I don't see the point. Zip files preserve directory structure, allowing you to move files and directories as is, which comes in very handy when doing backups or snapshots of your work or when sharing with others.
 
  • #10
paulb203 said:
Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
When the total length of the path approaches 260 characters.
 
  • #11
jedishrfu said:
As an example, while using the Obsidian app on MacOS, I named a directory my.math.notes, and MacOS decided it was a file and not a directory, so I had to change them to my-math-notes to avoid the issue.
That sounds like an Obsidian bug - the API call to create a directory is different to the call to create a file, and they have different representations on disk. At least in any sensible file system they do - I'm not sure about APFS, and although MacOS is evil and perverse I really can't see it changing a descriptor in this way.

I cannot reproduce this in Obsidian 1.6.7/Big Sur.
 
  • #12
pbuk said:
When the total length of the path approaches 260 characters.
I'm not sure what 'the path' means. I Googled it but regretted it. I'm borderline tech illiterate, as you've probably gathered.
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters. And a subfolder named folder_2 you've used up 8 more? If so, would that mean you could have 30 or so nested folders before it became problematic?
 
  • #13
pbuk said:
When the total length of the path approaches 260 characters.

Boy, I have a story for this one. At one time, I was working on an InstallShield script for our product. It was a developer kit for Windows, and originally we had to add 3 directories to the system path during the install.

However, the developers in their infinite wisdom divided up the code into three separate areas (runtime, developer tools, customer examples), which meant we had to add 9 directories (bin, lib, dll for each) to the system path.

They were added to the beginning of the path in order to supersede any Microsoft command/dll of the same name as ours. Sadly, when adding these 9 directories, each 30+ characters long, we managed to push the system dll + commands directory past the 256-character limit. The path got chopped at 256, and on reboot nothing came up except for a pretty blue screen which made you want to take a nice vacation.

This happened to me on my workstation. Fortunately, I had given a copy of my environment variables to a coworker who had to rebuild his machine a few days earlier, and with those in hand I was able to get things back to normal.

I filed a report with MS on it but heard nothing from them. It surprised me that they didn't respond, since we were a big user of Microsoft products and this seemed like a major flaw. There was nothing in their docs about this limitation that I could find. I felt it was a holdover from DOS days and lean, clean programming machines.

Curiously, the command-session path allowed for 4096 characters, but we couldn't use that, since much of our stuff was GUI-based and launched via the desktop, not in a command session.
 
  • #14
paulb203 said:
I'm not sure what 'the path' means. I Googled it but regretted. I'm borderline tech illiterate, as you've probably gathered.
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters. And a subfolder named folder_2 you've used up 8 more? If so, would that mean you could have 30 or so nested folders before it became problematic?
When you open a command-line session, a.k.a. a command shell, where you can type commands like dir, copy, format and others, the shell sets up a custom environment of parameters (environment variables) that commands can read to configure themselves properly when they are run.

One such parameter is the path parameter, which provides a list of directories for the command shell to search when the command the user typed isn't one it has built in.

When I type dir on Windows or cd on Unix OSes, the command shell knows these are built in and are common to all users. But if I type python, the command shell searches the directories listed in the path parameter looking for the Python command.

You can use the echo command to view the path parameter in Windows:
Code:
echo %path%

which prints something like:

c:\Program Files\python3\bin;c:\windows\system32;c:\windows

When you type python, the command shell searches these three directories in order, finds python in the first one and runs python.exe.

The set command displays all environment parameters:

set
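The path search described above can be sketched in Python. This is a simplified model, assuming the directories are tried left to right and only the listed extensions are tried; real cmd.exe also checks the current folder first and uses the PATHEXT variable, and the standard library's shutil.which implements the fuller rules. find_on_path is a hypothetical name.

```python
import os
from typing import Optional, Sequence


def find_on_path(command: str, path_value: str,
                 exts: Sequence[str] = ("", ".exe")) -> Optional[str]:
    """Return the first file named command+ext found in the directories of
    path_value, searched left to right, or None if there is no match."""
    for directory in path_value.split(os.pathsep):
        if not directory:          # skip empty entries such as a trailing ";"
            continue
        for ext in exts:
            candidate = os.path.join(directory, command + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


# Example: search the current process's own path parameter.
print(find_on_path("python", os.environ.get("PATH", "")))
```

Because the first match wins, a directory placed at the front of the path overrides any same-named command further along, which is exactly why installers prepend their own directories.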
 
  • #15
paulb203 said:
I'm not sure what 'the path' means.
https://en.wikipedia.org/wiki/Path_(computing)

paulb203 said:
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters.
In Windows (which is the only place it matters), it means you have used 12: C:\folder_1\.

paulb203 said:
And a subfolder named folder_2 you've used up 8 more?
9 more: folder_2\.

paulb203 said:
If so, would that mean you could have 30 or so nested folders before it became problematic?
Yes but that doesn't mean nesting folders that deep is a good idea - it would be impossible to keep track of what is where.

Also note that working close to the 260-character limit is not a good idea - let's say you want to take a backup on another disk in the folder "D:\backups\pbuk-laptop\2024-10-16T09-09-23\" - that's another 43 characters added on the front (guess how I know this).
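The character counting in this post can be checked with a few lines of Python. Assumptions: each folder name is followed by a backslash, the budget is the 260 discussed in the thread, and no room is left for the file name itself; the timestamp folder is written with hyphens because ":" is not allowed in Windows file names. Both function names are hypothetical.

```python
def path_cost(*folders: str, root: str = "C:\\") -> int:
    """Characters used by root plus each folder name and its trailing backslash."""
    return len(root) + sum(len(f) + 1 for f in folders)


def max_depth(name_len: int, root: str = "C:\\", limit: int = 260) -> int:
    """How many nested folders of name_len characters fit within limit."""
    return (limit - len(root)) // (name_len + 1)


backup_root = "D:\\backups\\pbuk-laptop\\2024-10-16T09-09-23\\"

print(path_cost("folder_1"))                                       # 12
print(path_cost("folder_1", "folder_2") - path_cost("folder_1"))   # 9
print(len(backup_root))                                            # 43
print(max_depth(8), max_depth(8, root=backup_root))                # 28 24
```

So 8-character folder names allow about 28 levels from C:\, but only about 24 once the backup prefix is added, before counting any file name.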

Edit [aside]: I think I first encountered this limit when ripping my CD collection with the (excellent) dbPoweramp on its default settings when it tried to create something like C:\Users\pbuk\Music\ripped-cds\Bach: St Matthew Passion Gardiner, Rolfe Johnson, Et Al\Gloria - Chorus: Et in terra pax - Nancy Argenta, Jane Fairfield, Jean Knibbs, Collin Patrick, Ashley Stafford, Andrew Murgatroyd, Lloyd Morgan, Stephen Varcoe, English Baroque Soloists, John Eliot Gardiner, The Monteverdi Choir.flac
 
  • #16
pbuk said:
C:\Users\pbuk\Music\ripped-cds\Bach: St Matthew Passion Gardiner, Rolfe Johnson, Et Al\Gloria - Chorus: Et in terra pax - Nancy Argenta, Jane Fairfield, Jean Knibbs, Collin Patrick, Ashley Stafford, Andrew Murgatroyd, Lloyd Morgan, Stephen Varcoe, English Baroque Soloists, John Eliot Gardiner, The Monteverdi Choir.flac
It uses the entire description of the recording as the file name?!? Yow! :wideeyed:
 
  • #17
I think that if you instead went to the ripped-cds directory and zipped from there you could reduce the length of the filepath of the captured files.
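A sketch of that idea using Python's standard zipfile module: entries are stored relative to the zipped folder's parent, so the archive contains names like "ripped-cds/Album/track01.flac" rather than the whole C:\Users\... prefix. zip_folder is a hypothetical helper; it skips empty folders, and the zip should be written outside the folder being zipped.

```python
import os
import zipfile


def zip_folder(src_dir: str, zip_path: str) -> str:
    """Zip src_dir, storing each file's path relative to src_dir's parent."""
    src_dir = os.path.abspath(src_dir)
    base = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # arcname controls the name stored inside the archive.
                zf.write(full, arcname=os.path.relpath(full, base))
    return zip_path
```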

Also, Windows has short names (8.3 names), which were remnants of the original DOS file system; the long names were stored as a kind of attribute in a separate internal table.

I'm fairly sure, though, that zip files don't use that short-name trick internally.

I've seen long names cause problems in some downloading utilities too, where they stop and say the name is too long.
 
  • #18
jtbell said:
It uses the entire description of the recording as the file name?!? Yow! :wideeyed:
This information is not on the CD, it is provided by third party databases e.g. MusicBrainz' Picard. This data is in turn often provided by individual users: if the user creates an excessively long description then that is what is picked up.

jedishrfu said:
I think that if you instead went to the ripped-cds directory and zipped from there you could reduce the length of the filepath of the captured files.
Instead, I edited the filenames (and the internal MP3 tags) to something more useful, and changed my ripping routine to include a review of the metadata being picked up. I also use 3 different profiles in dbPoweramp (rock, classical and audio books) with a different file-naming strategy for each; for instance, classical music is filed by composer (Bach) rather than artist (English Baroque Soloists, John Eliot Gardiner, The Monteverdi Choir). Also, this was about 20 years ago: the metadata sources are much better now.
 
  • #19
paulb203 said:
I’m trying to compress a folder containing a combination of sub-folders and individual files. All the files are Word documents.
Why?

Word files are already compressed. Compressing a compressed file might save you a percent or two - or it might cost you a percent or two. What are you trying to accomplish?
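A .docx file is itself a ZIP package of compressed XML, which is why zipping it again gains little. The effect is easy to see with Python's zlib (DEFLATE, the method ZIP normally uses). Random bytes stand in for already-compressed data here; that is an approximation, not a real Word file.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"Some ordinary text in a Word document. " * 4000
compressed_like = os.urandom(len(text))  # stand-in for already-compressed data

print(f"plain text:           {ratio(text):.3f}")
print(f"compressed-like data: {ratio(compressed_like):.3f}")
```

The repetitive text shrinks dramatically, while the compressed-like data comes out slightly larger than it went in.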
 
  • #20
There is some advantage to it, due to how Windows allocates space on NTFS drives in 4 KB chunks.

Nowadays, saving up to 4 KB per file on disk isn't much, but a penny saved is a penny saved.
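The arithmetic behind that saving, as a sketch: each file occupies a whole number of clusters, so the unused tail of its last cluster ("slack") is wasted. Assumptions: 4,096-byte clusters (the common NTFS default) and no special handling of very small files, which NTFS can store directly in its file table, so tiny files are overestimated here. The file sizes are made up.

```python
CLUSTER = 4096  # assumed NTFS cluster size (common default)


def on_disk(size: int, cluster: int = CLUSTER) -> int:
    """Bytes allocated when a file is rounded up to whole clusters."""
    return -(-size // cluster) * cluster  # ceiling division, integers only


def slack(sizes, cluster: int = CLUSTER) -> int:
    """Total bytes wasted in the partly used last cluster of each file."""
    return sum(on_disk(s, cluster) - s for s in sizes)


sizes = [10_000, 4_096, 1]   # three made-up file sizes in bytes
print(on_disk(10_000))        # 12288
print(slack(sizes))           # 6383 = 2288 + 0 + 4095
```

Zipping many small files into one archive replaces many partly used last clusters with a single one, which is where the saving comes from.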
 
  • #21
Vanadium 50 said:
Why?

Word files are already compressed. Compressing a compressed file might save you a percent or two - or it might cost you a percent or two. What are you trying to accomplish?
Thanks, @Vanadium 50.
I'm trying to email the folders to myself, as an extra back-up (I naively thought 'the cloud' was guaranteed safe). I think they have to be zipped to do this.
 
  • #22
paulb203 said:
I'm trying to email the folders to myself, as an extra back-up
Where are you going to save these emails?
 
  • #23
pbuk said:
Where are you going to save these emails?
I'm not going to do anything beyond sending them from my Gmail to my Outlook. Then I'll have one copy in my Gmail sent folder, and one in my Outlook inbox. So I guess they'll be saved wherever emails are stored; a 'remote server' according to Google. Some warehouse where lots of big computers store lots of data?
 
  • #24
paulb203 said:
Some warehouse where lots of big computers store lots of data?
Actually, with multiple copies replicated in different warehouses. This is what is meant by 'in the cloud': do you see why I asked?
 
  • #25
pbuk said:
Actually, with multiple copies replicated in different warehouses. This is what is meant by 'in the cloud': do you see why I asked?

Ah, so that counts as 'the cloud' too. Thanks. When I said the cloud before, I was thinking about OneDrive (and iCloud etc). It's the cloud in that respect that I'm less confident about now. I've had folders backed up in email accounts for decades and there's never been a problem. But I have had problems (lost files) with OneDrive.
 
  • #26
paulb203 said:
But I have had problems (lost files) with OneDrive.
Yes I've seen your thread about this. You lost access to the files you had stored in a OneDrive account because you lost access to the OneDrive account.

Exactly the same will happen if you lose access to the Outlook.com or Gmail accounts. However, if you are confident that this will not happen, it makes more sense to store files using the associated file storage services (OneDrive and Google Drive respectively) rather than the email service.
 
  • #27
As @pbuk said, this is the cloud.

You will also discover that there are limits on the size of attachments per message, limits on the total size of your mailbox and limits on how long these attachments are kept. These limits depend on your provider and how much you are paying them.

There is no free lunch. If you want someone offsite to curate your data, you will have to pay them.
 
  • #28
Vanadium 50 said:
As @pbuk said, this is the cloud.

You will also discover that there are limits on the size of attachments per message, limits on the total size of your mailbox and limits on how long these attachments are kept. These limits depend on your provider and how much you are paying them.

There is no free lunch. If you want someone offsite to curate your data, you will have to pay them.
Thanks, Vanadium.

I'm familiar with size limits of attachments. This is not a problem, as Gmail and Outlook have limits of 25MB and 20MB respectively, and all my folders/files are smaller than that.
Mailbox size is not a problem either, as even the free 15GB from Gmail and Outlook is way more than enough for my needs.
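One thing worth checking near those limits: email attachments are normally sent base64-encoded, which makes them about a third larger. A rough sketch, assuming the provider counts the encoded size and that "25MB" means 25 × 1024 × 1024 bytes; MIME line breaks and headers add a little more, so treat the result as optimistic. Both function names are hypothetical.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Size after base64 encoding: 4 characters per 3 bytes, padded up."""
    return 4 * ((n_bytes + 2) // 3)


def fits_as_attachment(n_bytes: int, limit_mb: int) -> bool:
    """Rough check, ASSUMING the limit applies to the encoded size and
    1 MB = 1024 * 1024 bytes. MIME line breaks/headers are ignored."""
    return base64_size(n_bytes) <= limit_mb * 1024 * 1024


# Sanity check against the standard library's encoder.
assert base64_size(1000) == len(base64.b64encode(bytes(1000)))

print(fits_as_attachment(18 * 1024 * 1024, 25))  # True  (encodes to 24 MB)
print(fits_as_attachment(20 * 1024 * 1024, 25))  # False (encodes to ~26.7 MB)
```

So under these assumptions a zip needs to be comfortably below about three quarters of the stated limit, not just below the limit itself.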

What I didn't know about was 'limits on how long these attachments are kept'. I'm getting conflicting advice from the internet on that one. Some say they, like emails, are there indefinitely, unless the account is deleted. Others say it depends whether it's a free account or a paid one (with a free one you can access them for 7 days).
In my experience all my attachments, like my emails, have been there, available, for decades, in free accounts.

And, to reiterate, this, for me, is an extra back-up: save to Documents on this PC; back up on a flash drive; extra back-up in email. As for OneDrive, I've not decided whether I'm going to get rid of it (the subscription version), or, if I do, whether I'll also stop using the free version.

One other good thing about the email back-up, I've found, is that if I'm away from my PC and don't have my flash drive handy, but have access to a PC, I can access a folder/file from my email.
 
  • #29
paulb203 said:
conflicting advice from the internet
Why are you asking them? Read the Terms of Service.
 
  • #30
Update.
I've managed to 'zip' some folders. Some took forever: large numbers of subfolders and individual files, most of them full of apostrophes, question marks, etc. Still, it was satisfying when each one was done, zipped, and emailed to myself.
Loads more to do though.
 
