Python directory crunching

Files multiply seemingly without end. We often write scripts to reduce complexity, so let's make things easier for our users.

I've written about how to quickly process text files with a Python script, but what if you have many files to process? (And you always do, eventually.)

We want to be able to just point to whatever directory has our files and have the script figure out what it can work with.

Enter the os.walk function. It visits the root directory you give it and every directory beneath it, and for each one it yields that directory's path along with the names of the subdirectories and files it contains. Here is the basic outline:

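import os

start = '.'  # or some other directory path
# named dirpath rather than dir so the built-in dir() is not shadowed
for dirpath, subdirs, files in os.walk(start):
    print("dir: " + dirpath)
    for s in subdirs:
        print("subdir: " + s)
    for f in files:
        print("file: " + f)

Each iteration of the loop gives you the path of the current directory in dirpath, a list of its immediate subdirectory names in subdirs, and a list of the names of its files in files. The entries in subdirs and files are bare names, not full paths.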

Here are some things you can do with them:
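As one illustration, here is a minimal sketch that collects the files a script can work with. The function name find_files, the ".txt" default, and the choice to skip hidden directories are my own assumptions for the example, not anything os.walk requires. It relies on two things: the file names os.walk hands back are bare, so they need to be joined to the directory path, and (with the default top-down traversal) removing entries from the subdirectory list in place keeps os.walk from descending into them.

```python
import os


def find_files(start, suffix=".txt", skip_hidden=True):
    """Yield the path of every file under start whose name ends with suffix.

    find_files, suffix and skip_hidden are illustrative names only.
    """
    for dirpath, subdirs, files in os.walk(start):
        if skip_hidden:
            # Slice assignment edits the list in place, which is what makes
            # os.walk skip the directories we drop (e.g. ".git").
            subdirs[:] = [d for d in subdirs if not d.startswith(".")]
        for name in sorted(files):
            if name.endswith(suffix):
                # Names in files are bare, so join them to the directory path.
                yield os.path.join(dirpath, name)
```

Calling list(find_files('.')) gives the matching paths below the current directory. Note that the filter only prunes hidden directories; a hidden file with the right suffix would still be returned.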

Caveat: if the start directory is relative, as in my example, then the directory names will be relative too. They don't get expanded automatically.
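If you need absolute names, one way around this is to expand the start directory yourself before walking it. The sketch below assumes that is what you want; walk_absolute is a made-up helper name, and os.path.abspath resolves the path against the current working directory at the time of the call.

```python
import os


def walk_absolute(start="."):
    """Like os.walk, but every directory name comes back as an absolute path.

    walk_absolute is a hypothetical helper name used for illustration.
    """
    # abspath expands start once; every directory path os.walk builds
    # from that root is then absolute as well.
    root = os.path.abspath(start)
    for dirpath, subdirs, files in os.walk(root):
        yield dirpath, subdirs, files
```

The subdirectory and file names are still bare names, so they need os.path.join with the directory path just as before.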

Happy directory traversing!
