How can I use AWK to batch rename files in Unix?

  • Thread starter: Ygggdrasil
  • Tags: Batch File Unix

Discussion Overview

The discussion revolves around how to batch rename files in Unix using AWK, with participants exploring various scripting approaches, including bash, Python, and Perl. The context includes technical explanations and personal experiences with different programming languages for text processing tasks.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes a method in R for renaming files based on a tab-delimited text file and seeks a bash script alternative.
  • Another participant questions the structure of the input file, specifically the roles of the second and third columns.
  • A participant clarifies that the second column requires additional information appended to it for the renaming process.
  • Some participants suggest that using an AWK or Python script is preferable to a pure bash script for this task.
  • One participant expresses a preference for Perl, citing its superiority for such tasks and its universal availability on Unix machines.
  • Another participant argues that AWK is better for this specific task, sharing personal experiences of how AWK revitalized their programming career.
  • There is a discussion about the capabilities of AWK versus sed, with some participants advocating for AWK's versatility.
  • One participant reflects on the evolution of programming languages for text processing, mentioning various languages and their limitations.

Areas of Agreement / Disagreement

Participants express differing opinions on the best scripting language for the task, with some favoring AWK and others advocating for Perl or Python. The discussion remains unresolved regarding which approach is definitively superior.

Contextual Notes

Participants mention various programming languages and their historical development, highlighting limitations and personal preferences without reaching a consensus on the best tool for the task.

Ygggdrasil
TL;DR
I'd like help writing a bash script to rename a list of files.
I'd like to rename a bunch of files in a directory based on data from a tab-delimited text file. I know how to do this in R:
Code:
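dir <- "~/user/folder/"
# Tab-delimited table with columns Lane, Sample, ID
fileNames <- read.table(paste0(dir, "fileNames.txt"), sep = "\t", header = TRUE)
for (i in seq_len(nrow(fileNames))) {
  # Original name: <Sample>_S<row number>_L<zero-padded lane>_R1_001.fastq.gz
  oriFile <- paste0(dir, fileNames[i, "Sample"], "_S", i, sprintf("_L%03d", fileNames[i, "Lane"]), "_R1_001.fastq.gz")
  # New name: <ID>.fastq.gz
  newFile <- paste0(dir, fileNames[i, "ID"], ".fastq.gz")
  system(paste("mv", oriFile, newFile))
}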

Where the fileNames.txt file looks something like:
Code:
Lane	Sample	ID
1	xxx-xx-32S-pl1-J01	WT1.IN
1	xxx-xx-32S-pl1-J02	WT2.IN
1	xxx-xx-32S-pl1-J03	WT3.IN

In an effort to improve my knowledge of unix shell scripting, I'd like to know how one would approach writing a bash script that does this.
 
Ygggdrasil said:
Where the fileNames.txt file looks something like

Is the intent that the second column is the old filename and the third column is the new filename? And the first column doesn't get used?
 
PeterDonis said:
Is the intent that the second column is the old filename and the third column is the new filename? And the first column doesn't get used?

No, the string in the second column needs additional information appended to it.

For example, the first file to be renamed is xxx-xx-32S-pl1-J01_S1_L001_R1_001.fastq.gz, where the S# is the row number of the item in the table and the L### is the integer in the first column of the file, zero-padded to three digits.
 
This is better done with an awk script or a python script rather than a pure bash script.

Because you are doing relatively simple line-by-line actions, awk seems to be the best option here, although Python wouldn't be much more difficult.
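As an illustration of the awk approach, here is a minimal sketch rather than a definitive script. It assumes fileNames.txt is tab-delimited with exactly one header line and no blank lines, that the files sit in the current directory, and that the Sample and ID strings contain no quotes or other shell metacharacters. The function name gen_mv_commands is hypothetical. It only prints the mv commands, so they can be checked before anything is renamed.

```shell
# Print (do not run) one mv command per data row of the table given as $1.
gen_mv_commands() {
  awk -F'\t' 'NR > 1 {
    # NR - 1 is the data-row number, which supplies the S# part of the old name
    old = sprintf("%s_S%d_L%03d_R1_001.fastq.gz", $2, NR - 1, $1)
    new = $3 ".fastq.gz"
    printf "mv -- \"%s\" \"%s\"\n", old, new
  }' "$1"
}
```

Running `gen_mv_commands fileNames.txt` shows the commands; once they look right, `gen_mv_commands fileNames.txt | sh` would execute them.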
 
jedishrfu said:
This is better done with an awk script or a python script rather than a pure bash script.
Ok, then using an R script is probably the correct approach after all. Thanks!
 
Doing filename edits in bash can be rather painful.
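For reference, the rename described in the opening post can also be sketched with only shell builtins plus mv. This is an illustration under assumptions, not a recommended final script: the function name rename_from_table and the DRY_RUN variable are hypothetical, the table is assumed to be tab-delimited with one header line and no empty fields, and lane numbers are assumed to be written without leading zeros, as in the sample table.

```shell
# rename_from_table TABLE [DIR]
# Renames DIR/<Sample>_S<row>_L<lane, 3 digits>_R1_001.fastq.gz to DIR/<ID>.fastq.gz.
# Set DRY_RUN=1 to print the mv commands instead of running them.
rename_from_table() (
  table=$1
  dir=${2:-.}
  tab=$(printf '\t')
  row=0
  {
    read -r header    # skip the header line
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
    while IFS=$tab read -r lane sample id || [ -n "$lane" ]; do
      row=$((row + 1))
      old=$(printf '%s_S%d_L%03d_R1_001.fastq.gz' "$sample" "$row" "$lane")
      new="$id.fastq.gz"
      if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'mv %s %s\n' "$dir/$old" "$dir/$new"
      else
        mv -- "$dir/$old" "$dir/$new"
      fi
    done
  } < "$table"
)
```

A call such as `DRY_RUN=1 rename_from_table fileNames.txt ~/user/folder` previews the renames. The row counter and the zero-padding each need their own line of bookkeeping here, which is part of what makes this kind of task feel more awkward in the shell than in awk or R.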
 
Perl was developed for these types of tasks. It is a superior scripting tool and is universally available on Unix machines.
 
But awk is better. Just sayin...

As an aside, discovering awk helped rejuvenate my career as a programmer. At the time, in the 1980s, I wanted to learn Unix but was stuck with DOS. However, there were some DOS software packages that transformed DOS, and the one I selected also had an awk compiler.

I was amazed at how easy it became to write text processing programs with regular expressions and awk's processing stanzas. I rewrote many of my C programs, enhanced them, and went crazy adding features. It was a wild time and I was having a blast. There was just something about the language and the accompanying book that inspired me; even today I turn to awk when I have to do some quick one-off project.

My most recent program converted Matlab to Julia for a work project that never took off. As always, the awk coding was a challenge and a lot more fun.
 
jedishrfu said:
But awk is better. Just sayin...
Awk may be perfectly fine (and simpler) for this task. If a task has multiple steps that become alternating sed and awk steps, then a scripting language like Perl is far superior.
 
Why would you use awk and sed when awk can do both?

I know it's common to write simple awk one-liners, but awk can do so much more, whereas sed is somewhat limited.
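To illustrate the point, here is a small sketch of awk doing a sed-style substitution with its built-in sub() function. The function name strip_suffix and the choice of pattern are assumptions made for this example, based on the file names in the opening post.

```shell
# Read file names on stdin and replace the _S#_L###_R1_001 run before .fastq.gz,
# much as sed -E 's/_S[0-9]+_L[0-9]+_R1_001\.fastq\.gz$/.fastq.gz/' would.
strip_suffix() {
  awk '{
    sub(/_S[0-9]+_L[0-9]+_R1_001\.fastq\.gz$/, ".fastq.gz")  # edits $0 in place
    print
  }'
}
```

Because the substitution happens inside an awk program, it can be combined with field tests, counters, or printf formatting in the same pass, which is where a sed-only pipeline starts to need a second tool.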

Anyway, we shouldn't sidetrack the thread any further. It was my fault for starting this.
 
FactChecker said:
Perl was developed for these types of tasks. It is a superior scripting tool and is universally available on Unix machines.

I second that. I found that small functions and utilities that could have been done with awk or similar "narrow purpose" tools quickly grew, for me, to require more general coding (like access to dictionaries when parsing and matching up data), and Perl just has all the needed bits in one package.
 
So many languages are developed when a programmer gets frustrated with the limitations of an existing one. For text processing you can start with SNOBOL, then TeX... then AWK, then Perl, then Python, then Ruby, then Groovy, then Kotlin, and the list will continue until we have some major paradigm shift in programming that changes forever the way we do things.

I just liked AWK for its uniqueness, clean design, novelty, and usefulness. Other languages addressed its limitations later on by extending regular expression parsing or by giving up on its filtering template, but for me that detracted from its elegance.

There was one issue that was a pain, though: the early implementations on Unix would throw an error, and it was devilishly hard to find what happened and where. However, my DOS version, being a compiler, fixed that issue and made it much easier to code with.

At one time, I tried designing an AWK++, based on AWK, to use in place of C++. I found C++ frustrating to program in, especially with the non-intuitive STL inclusion, but wanted a means to explore OO design principles more simply. It was a fool's errand, as you really need to follow compiler parsing rules with a language grammar to do it right.