Getting all files under a folder and subfolders in a recursive manner

  • Context: Python
  • Thread starter: Arman777
  • Tags: files
SUMMARY

This discussion covers how to recursively list the full path of every file in a directory and its subdirectories using Python's os module. The accepted solution uses os.walk() to traverse the directory tree and joins each directory path with its file names to build a list of full paths. The final implementation, get_all_filePaths(folderPATH), returns only file paths (no subfolder paths) and works for folders nested to any depth.

PREREQUISITES
  • Python programming knowledge, specifically with version 3.x
  • Understanding of the os module for file and directory manipulation
  • Familiarity with list comprehensions in Python
  • Basic knowledge of recursive functions and their implementation
NEXT STEPS
  • Explore advanced file handling techniques in Python using the pathlib module
  • Learn about error handling in file operations to manage exceptions effectively
  • Investigate performance optimization strategies for large directory structures
  • Study the differences between os.walk() and other directory traversal methods
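As a starting point for the pathlib suggestion above, here is a minimal sketch, assuming Python 3.6+. It uses the standard Path.rglob("*"), which matches entries at every depth, and keeps only files. The helper name get_all_file_paths is hypothetical and does not come from the thread.

```python
from pathlib import Path


def get_all_file_paths(folder):
    """Return the full paths (as strings) of every file under folder, at any depth.

    Path.rglob("*") yields both files and folders, so folders are
    filtered out with is_file().
    """
    return [str(p) for p in Path(folder).rglob("*") if p.is_file()]
```

The order of the results is whatever the file system returns, so wrap the call in sorted() if a stable order matters.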
USEFUL FOR

Python developers, data scientists, and anyone needing to automate file management tasks within complex directory structures will benefit from this discussion.

Arman777
I want to list the full paths of all the files under a folder and all its subfolders, recursively. Is there a way to do it? It seems that if the files go only two levels deep, the code can be written like this:


Code:
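    import os

    folderPATH = r'C:\Users\Arman\Desktop\Cosmology\Articles'
    # os.walk yields one (dirpath, dirnames, filenames) triple per folder;
    # x[2] keeps only the list of file names in that folder
    filePATHS = [x[2] for x in os.walk(folderPATH)]

    for i in filePATHS:      # one list of names per folder
        for j in i:          # each bare file name (no directory part)
            print(j)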
This prints:


Code:
    Astrophysical Constants And Parameters.pdf
    desktop.ini
    Physics Latex Manuel.pdf
    Spactimes.pdf
    A parametric reconstruction of the cosmological jerk from diverse observational data.pdf
    A Thousand Problems in Cosmology Horizons.pdf
    An Almost Isotropic CM Temperature Does Not Imply An Almost Isotropic Universe.pdf
    Big Bang Cosmology - Review.pdf
    desktop.ini
    Expanding Confusion common misconceptions of cosmological horizons and the superluminal expansion of the universe.pdf
    Hubble Radius.pdf
    Is the Universe homogeneous.pdf
    LCDM and Mond.pdf
    Near galaxy clusters.pdf
    The Cosmological Constant and Dark Energy.pdf
    The mass of the Milky Way from satellite dynamic.pdf
    The Status of Cosmic Topology after Planck Data.pdf
    An upper limit to the central density of dark matter haloes from consistency with the presence of massive central black holes.pdf
    Dark Matter - Review.pdf
    Dark Matter Accretion into Supermassive Black Holes.pdf
    desktop.ini
    Andrew H. Jaffe - Cosmology - Imperial College Lecture Notes - Thermodynamics and Particle Physics.pdf
    Big Bang Nucleosynthesis.pdf
    Claus Grupen - Astroparticle Physics - The Early Universe.pdf
    Daniel Baumann - Cosmology - Mathematical Tripos III - Thermal History.pdf
    desktop.ini
    James Rich - Fundamentals of Cosmology - The Thermal History of the Universe.pdf
    Lars Bergström, Ariel Goobar - Cosmology and Particle Astrophysics -  Thermodynamics in the Early Universe.pdf
    Steven Weinberg - Cosmology - The Early Universe.pdf
    Andrei Linde - On the problem of initial conditions for inflation.pdf
    ...
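os.walk() already descends into every level of nesting, so the nested loop above is not limited to two levels. What it loses is the dirpath element of each triple, which is why only bare names are printed. A minimal sketch that shows one triple per folder on a throwaway tree (the helper walk_triples and the folder and file names are hypothetical):

```python
import os
import tempfile


def walk_triples(root):
    """Return (relative dirpath, dirnames, filenames) for each folder os.walk visits."""
    return [(os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))
            for dirpath, dirnames, filenames in os.walk(root)]


# Throwaway tree three levels deep; the names are hypothetical
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "level1", "level2", "level3")
    os.makedirs(nested)
    open(os.path.join(nested, "deep.pdf"), "w").close()
    for triple in walk_triples(root):
        print(triple)
```

The last triple printed belongs to level3 and lists deep.pdf, so the original loop was already reaching it. Only the folder part of the path was discarded by x[2].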

I want a function that produces the same results, but uses recursive logic and returns the full paths, so that it finds the paths for `n` levels of nested folders. I need something like this:


Code:
    import os

    def get_all_filePATHs(folderPATH):
        ...
        return get_all_filePATHs()

    folderPATH = r'C:\Users\Arman\Desktop\Cosmology\Articles'
    print(get_all_filePATHs(folderPATH))
Note that I am only interested in the full paths of the files, not the paths of the subfolders, as in the example above.
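For the explicitly recursive shape asked for here, one possible sketch swaps os.walk for os.scandir() (Python 3.6+ for the with-statement form) and recurses into each subfolder. It reuses the name get_all_filePATHs from the skeleton above, but the body is an illustration, not code posted in the thread.

```python
import os


def get_all_filePATHs(folderPATH):
    """Recursively collect the full paths of files (not folders) under folderPATH."""
    result = []
    with os.scandir(folderPATH) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recursive case: descend into the subfolder and keep its files
                result.extend(get_all_filePATHs(entry.path))
            else:
                # Base case: anything that is not a real folder (files, symlinks)
                result.append(entry.path)
    return result
```

Called as in the question, print(get_all_filePATHs(folderPATH)) prints one flat list. Each nesting level adds one Python stack frame, which is normally well within the default recursion limit. Symlinked folders are not followed, which avoids infinite loops but also means their contents are not listed.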
 
Okay, this works:

Code:
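    import os

    def get_all_filePaths(folderPATH):
        result = []
        # os.walk visits folderPATH and every subfolder, at any depth
        for dirpath, dirnames, filenames in os.walk(folderPATH):
            # Join each file name with its folder to get the full path
            result.extend([os.path.join(dirpath, filename) for filename in filenames])
        return result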
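By default, os.walk() ignores errors. A subfolder that cannot be listed (for example, because of permissions) is silently skipped, so the returned list can be incomplete. Below is a hedged variant that uses the documented onerror parameter to record those folders. The function name and the (files, skipped) return shape are illustrative choices and are not part of the thread.

```python
import os


def get_all_filePaths_checked(folderPATH):
    """Like get_all_filePaths, but also returns the folders that could not be listed."""
    result = []
    skipped = []

    def on_error(err):
        # os.walk passes the OSError raised while listing a folder;
        # err.filename holds the path that failed
        skipped.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(folderPATH, onerror=on_error):
        result.extend(os.path.join(dirpath, filename) for filename in filenames)
    return result, skipped
```

A misspelled folderPATH, for instance, then shows up in skipped instead of quietly producing an empty list.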
 
I seem to remember writing something like that years ago - ah, here it is (in Pascal):
Code:
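procedure TForm1.DoDirectory;
// Writes "shortname "longname"" pairs for the current directory, then
// recurses into each subdirectory. resF (output file) and testCnt
// (count of directories entered so far) are declared outside this procedure.
var
  sRec: TSearchRec;
  sRslt: Integer;
  specDir, First: Boolean;
  currFN, altFN: String;
Begin
  First := True;
  // Pass 1: every entry in the current directory
  sRslt := FindFirst ('*.*',faReadOnly    Or faHidden    Or faSysFile Or faDirectory, sRec);
  While (sRslt = 0) Do Begin
    altFN := sRec.FindData.cAlternateFileName;
    currFN := sRec.FindData.cFileName;
    // Only report entries that have a distinct 8.3 short name
    If (altFN<>'') and (altFN<>currFN) Then Begin
      If First Then Begin
        Writeln(resF);
        Writeln(resF,'[Directory]');
        Writeln(resF,GetCurrentDir);
        First := False;
        End;
      WriteLn(resF, altFN, ' "', currFN,'"');
      End;
    sRslt := FindNext(sRec);
    End;
  FindClose(sRec);  // release the search handle opened by FindFirst
  // Pass 2: recurse into real subdirectories (skip '.' and '..')
  sRslt := FindFirst ('*.*', faDirectory, sRec);
  While  (sRslt = 0) Do Begin
    currFN := sRec.FindData.cFileName;
    specDir := (currFN = '.') Or (currFN = '..') Or ((sRec.Attr And faDirectory)=0);
    If (not specDir) and (testCnt<32) Then Begin
      Inc(testCnt);
      ChDir(currFN);
      DoDirectory;
      Chdir('..');
      End;
    sRslt := FindNext(sRec);
    End;
  FindClose(sRec);
End;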
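The testCnt < 32 guard in the Pascal code appears to cap the total number of directories entered, since testCnt is declared outside the procedure and is never decremented. In Python, a depth limit is the more common variant. Here is a hedged sketch that relies on os.walk's documented top-down behaviour, where emptying dirnames in place prevents further descent. The function name and the max_depth parameter are hypothetical. Unlike the Pascal version, this sketch never changes the working directory.

```python
import os


def get_file_paths_limited(folderPATH, max_depth):
    """Collect file paths, descending at most max_depth folder levels below folderPATH."""
    result = []
    for dirpath, dirnames, filenames in os.walk(folderPATH):
        result.extend(os.path.join(dirpath, name) for name in filenames)
        rel = os.path.relpath(dirpath, folderPATH)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # In top-down mode, emptying dirnames in place stops os.walk descending further
            dirnames.clear()
    return result
```

With max_depth=0 only the files directly inside folderPATH are returned. Each increment of max_depth adds one more level of subfolders.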
 
