Python - all together

We'll use everything I've discussed up to now to build a full example.

I'll write a script that processes markdown files and trims out any headers deeper than a given level, along with the text under them.

I won't attempt to handle every markdown case, like commented or quoted sections, but this should work for most files.

I'll start from the bottom and work my way up.

Arguments

import argparse
import os

# more code will come here

def main_with_args(args):
  print("Will copy markdown files from {} to {} while trimming below level {}".format(args.indir, args.outdir, args.maxlevel))

def main():
  parser = argparse.ArgumentParser(description="Copies trimmed markdown files from one directory to another")
  parser.add_argument("-indir", required=True)
  parser.add_argument("-outdir", required=True)
  parser.add_argument("-maxlevel", type=int, default=3)
  args = parser.parse_args()
  main_with_args(args)

if __name__ == '__main__':
  main()
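
If you want to check the parser without going through the command line, you can hand parse_args an explicit list of strings. This is just a sketch: build_parser is a hypothetical helper name of mine that wraps the same three add_argument calls, and the directory names are made up.

```python
import argparse

def build_parser():
  # Hypothetical helper: the same options that main() sets up.
  parser = argparse.ArgumentParser(description="Copies trimmed markdown files from one directory to another")
  parser.add_argument("-indir", required=True)
  parser.add_argument("-outdir", required=True)
  parser.add_argument("-maxlevel", type=int, default=3)
  return parser

# Passing a list stands in for what would normally come from sys.argv[1:].
args = build_parser().parse_args(["-indir", "notes", "-outdir", "trimmed"])
print(args.indir, args.outdir, args.maxlevel)  # prints: notes trimmed 3
```

Since -maxlevel wasn't given, it falls back to the default of 3, and type=int means you get a number rather than a string when it is given.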

Handling Files

OK, now we're going to write code to create the output directory and process each file from the source directory into the destination. I don't want to recurse in this example, so we'll empty the list of subdirectories that os.walk hands us, which stops it from descending any further.

def process_file(in_name, out_name, maxlevel):
  print("processing {} into {}".format(in_name, out_name))

def main_with_args(args):
  # makedirs allows multiple path levels.
  try:
    os.makedirs(args.outdir)
  except FileExistsError:
    print("{} already exists, continuing ...".format(args.outdir))
  # dirpath rather than dir, so we don't shadow the built-in dir()
  for dirpath, subdirs, files in os.walk(args.indir):
    del subdirs[:]  # use slice syntax to delete all elements in place
    for f in files:
      basename, ext = os.path.splitext(f)
      if ext.lower() == ".md":
        process_file(os.path.join(dirpath, f), os.path.join(args.outdir, f), args.maxlevel)
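
To convince yourself that emptying subdirs really keeps os.walk at the top level, here's a small throwaway experiment. The file names are invented for the test, and everything lives in a temporary directory that gets cleaned up afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
  # Build a tiny tree: three files at the top, one markdown file nested below.
  os.makedirs(os.path.join(root, "nested"))
  for name in ("a.md", "B.MD", "notes.txt", os.path.join("nested", "c.md")):
    with open(os.path.join(root, name), "w") as f:
      f.write("# title\n")

  found = []
  for dirpath, subdirs, files in os.walk(root):
    del subdirs[:]  # prune in place so os.walk never descends
    for name in files:
      if os.path.splitext(name)[1].lower() == ".md":
        found.append(name)

print(sorted(found))  # prints: ['B.MD', 'a.md']
```

The nested c.md never shows up, and the lower() call is what lets B.MD through. Note that the deletion has to happen in place: rebinding the name with subdirs = [] would leave the list that os.walk is looking at untouched.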

Doing work

Now, finally, we get to the actual file processing. We don't need to look ahead to do our work. We can simply keep track of the current level.

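def get_level(s):
  # number of leading '#' characters, e.g. 2 for "## Section"
  return len(s) - len(s.lstrip('#'))

def process_file(in_name, out_name, maxlevel):
  # level 0 means we haven't seen a header yet, so leading text is always kept
  level = 0
  with open(in_name) as f, open(out_name, "w") as o:
    for line in f:
      if line.startswith("#"):
        level = get_level(line)
      # a header that is too deep suppresses everything up to the next shallower header
      if level <= maxlevel:
        o.write(line)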
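
Here's the same loop run over a list of lines in memory, which makes it easy to see what gets dropped. Take it as a sketch: trim_lines is a hypothetical variant of process_file that I'm using only for this demonstration, and the sample text is made up.

```python
def get_level(s):
  # number of leading '#' characters
  return len(s) - len(s.lstrip('#'))

def trim_lines(lines, maxlevel):
  # Hypothetical in-memory version of process_file, for experimenting.
  level = 0
  kept = []
  for line in lines:
    if line.startswith("#"):
      level = get_level(line)
    if level <= maxlevel:
      kept.append(line)
  return kept

sample = [
  "# Title\n",
  "intro\n",
  "## Section\n",
  "body\n",
  "### Detail\n",
  "fine print\n",
  "## Next\n",
  "more\n",
]

# With maxlevel 2, the level 3 header and its text are dropped.
print("".join(trim_lines(sample, 2)), end="")
```

Running it prints the title, the intro, both level 2 sections and their text, but not "### Detail" or "fine print". Output resumes as soon as "## Next" brings the level back to 2.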

If you put it all together, you should have a nice little outline for a text-crunching tool.

Happy markdown trimming!

