How can I use AWK to batch rename files in Unix?

Ygggdrasil · Jun 7, 2019

I'd like to rename a bunch of files in a directory based on data from a tab-delimited text file. I know how to do this in R:

Code:

dir <- "~/user/folder/"
# Read the tab-delimited table (columns: Lane, Sample, ID)
fileNames <- read.table(paste0(dir, "fileNames.txt"), sep = "\t", header = TRUE)
for (i in seq_len(nrow(fileNames))) {
  # Original name: <Sample>_S<row>_L<lane padded to 3 digits>_R1_001.fastq.gz
  oriFile <- paste0(dir, fileNames[i, "Sample"], "_S", i, sprintf("_L%03d", fileNames[i, "Lane"]), "_R1_001.fastq.gz")
  newFile <- paste0(dir, fileNames[i, "ID"], ".fastq.gz")
  system(paste("mv", oriFile, newFile))
}

Where the fileNames.txt file looks something like:

Code:

Lane	Sample	ID
1	xxx-xx-32S-pl1-J01	WT1.IN
1	xxx-xx-32S-pl1-J02	WT2.IN
1	xxx-xx-32S-pl1-J03	WT3.IN

To improve my knowledge of Unix shell scripting, I'd like to know how one would approach writing a bash script that does this.

PeterDonis · Jun 7, 2019

Ygggdrasil said:

Where the fileNames.txt file looks something like

Is the intent that the second column is the old filename and the third column is the new filename? And the first column doesn't get used?

Ygggdrasil · Jun 7, 2019

PeterDonis said:

Is the intent that the second column is the old filename and the third column is the new filename? And the first column doesn't get used?

No, the string in the second column needs additional information appended to it.

For example, the first file to be renamed is xxx-xx-32S-pl1-J01_S1_L001_R1_001.fastq.gz, where the S# is the row number of the item in the table and the L### is the integer in the first column of the file, zero-padded to three digits.
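The naming rule described above can be sketched in plain shell. This is only a minimal sketch under stated assumptions: the function name `rename_from_table` is hypothetical, the `_R1_001.fastq.gz` suffix is taken from the R code, the table has exactly one header row, and no field contains a tab or newline.

```shell
#!/bin/sh
# Sketch: rename files row by row with a plain shell loop.
# rename_from_table is a hypothetical helper name, not from the thread.
rename_from_table() {
    # $1 = tab-delimited table (Lane, Sample, ID), $2 = directory with the files
    tab=$(printf '\t')
    row=0
    # tail -n +2 skips the header row; IFS is a tab only for the read command.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    tail -n +2 "$1" | while IFS="$tab" read -r lane sample id || [ -n "$lane" ]; do
        row=$((row + 1))
        # Build <Sample>_S<row>_L<lane padded to 3 digits>_R1_001.fastq.gz
        old=$(printf '%s/%s_S%d_L%03d_R1_001.fastq.gz' "$2" "$sample" "$row" "$lane")
        new="$2/$id.fastq.gz"
        if [ -e "$old" ]; then
            mv -- "$old" "$new"
        else
            echo "missing: $old" >&2
        fi
    done
}
```

It would be called as `rename_from_table fileNames.txt ~/user/folder`. Replacing `mv` with `echo mv` gives a dry run that shows what would be renamed.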

jedishrfu · Jun 7, 2019

This is better done with an awk script or a python script rather than a pure bash script.

Because you are doing relatively simple line-by-line actions, awk seems to be the best option here, although Python wouldn’t be much more difficult.
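As a rough illustration of the awk route, one option is to have awk print the `mv` commands so they can be reviewed before anything is renamed. This is a sketch, not a definitive solution: `gen_mv_commands` is a hypothetical name, the column order (Lane, Sample, ID) and the `_R1_001.fastq.gz` suffix come from the original post, and it assumes file names contain no quotes, `$`, or backticks.

```shell
#!/bin/sh
# Sketch: print one "mv" command per data row of the table.
# gen_mv_commands is a hypothetical helper name, not from the thread.
gen_mv_commands() {
    # $1 = tab-delimited table (Lane, Sample, ID), $2 = directory with the files
    awk -F'\t' -v dir="$2" '
        NR > 1 && NF >= 3 {
            # NR - 1 is the data-row number (the header is line 1)
            old = sprintf("%s/%s_S%d_L%03d_R1_001.fastq.gz", dir, $2, NR - 1, $1)
            new = sprintf("%s/%s.fastq.gz", dir, $3)
            printf "mv -- \"%s\" \"%s\"\n", old, new
        }
    ' "$1"
}
```

Running `gen_mv_commands fileNames.txt ~/user/folder` only prints the commands; appending `| sh` executes them once the output looks right.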

Ygggdrasil · Jun 7, 2019

jedishrfu said:

This is better done with an awk script or a python script rather than a pure bash script.

Ok, then using an R script is probably the correct approach after all. Thanks!

jedishrfu · Jun 7, 2019

Doing filename edits in bash can be rather painful.

FactChecker · Jun 7, 2019

Perl was developed for these types of tasks. It is a superior scripting tool and is universally available on Unix machines.

jedishrfu · Jun 7, 2019

But awk is better. Just sayin...

As an aside, discovering awk helped rejuvenate my career as a programmer. At the time, in the 1980s, I wanted to learn Unix but was stuck with DOS. However, there were some DOS software packages that transformed DOS, and the one I selected also had an awk compiler.

I was amazed at how easy it became to write text processing programs with regular expressions and awk’s processing stanzas. I rewrote many of my C programs, enhanced them, and went crazy adding features. It was a wild time and I was having a blast. There was just something about the language and the accompanying book that inspired me; even today I turn to awk when I have to do some quick one-off project.

My most recent program converted Matlab to Julia for a work project that never took off. As always, the awk coding was a challenge and a lot more fun.

FactChecker · Jun 8, 2019

jedishrfu said:

But awk is better. Just sayin...

Awk may be perfectly fine (and simpler) for this task. If a task has multiple steps that become alternating sed and awk steps, then a scripting language like Perl is far superior.

jedishrfu · Jun 8, 2019

Why would you use awk and sed when awk can do both?

I know it’s common to write simple awk text expressions but awk can do so much more whereas sed is somewhat limited.

Anyway, we shouldn’t sidetrack the thread any further. It was my fault for starting this.

Filip Larsen · Jun 8, 2019

FactChecker said:

Perl was developed for these types of tasks. It is a superior scripting tool and is universally available on Unix machines.

I second that. I found that small functions and utilities that could have been done with awk or similar "narrow purpose" tools quickly grew, for me, to require more general coding (like access to dictionaries when parsing and matching up data), and Perl just has all the needed bits in one package.

jedishrfu · Jun 8, 2019

So many languages are developed when a programmer gets frustrated with the limitations of an existing one. For text processing you can start with SNOBOL then Tex... then AWK then Perl then Python then Ruby then Groovy then Kotlin, and the list will continue until we have some major paradigm shift in programming that changes forever the way we do things.

I just liked AWK for its uniqueness, clean design, novelty, and usefulness. Other languages addressed its limitations later on by extending regular expression parsing or by giving up on its filtering template, but for me that detracted from its elegance.

There was one issue that was a pain though in that the early implementations on Unix would throw an error and it was devilishly hard to find what happened and where. However, my DOS version being a compiler fixed that issue and made it much easier to code with.

At one time, I tried designing an AWK++, written in AWK, to use in place of C++. I found C++ frustrating to program in, especially with the non-intuitive STL inclusion, but wanted a means to explore OO design principles more simply. It was a fool's errand, as you really need to follow compiler parsing rules with a language grammar to do it right.
