Tuesday, December 6, 2016

[xtopdf] Wildcard text files to PDF with xtopdf and glob

By Vasudev Ram



First joker card image attribution

This is another in my series of applications that use xtopdf (source), my Python toolkit for PDF creation from other formats.

[ Here's a good overview of xtopdf, its uses, supported platforms and formats, etc. ]

I called this app WildcardTextToPDF.py. It lets you specify a wildcard for text files, and then converts each text file matching the wildcard (like grades*2016.txt or monthly*sales.txt) [1] into a corresponding PDF file, with the same name but with '.pdf' appended.

[1] For example, the wildcard grades*2016.txt could match grades-math-2016.txt and grades-bio-2016.txt (think a school or college), while monthly*sales.txt might match monthly-car-sales.txt and monthly-bike-sales.txt (think a vehicle dealership), so a PDF file will be generated for each text file matching the given wildcard.
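To make the footnote concrete, here is a minimal sketch of that kind of wildcard matching, using the fnmatch module from the standard library (the glob module relies on the same style of shell pattern matching for filenames). The file names in the list are hypothetical, chosen to mirror the examples above, and no files need to exist on disk.

```python
import fnmatch

# Hypothetical file names, used only for illustration.
names = [
    "grades-math-2016.txt",
    "grades-bio-2016.txt",
    "grades-math-2015.txt",
    "monthly-car-sales.txt",
    "monthly-bike-sales.txt",
    "yearly-car-sales.txt",
]

def matching(pattern, candidates):
    """Return the candidates that match the shell-style wildcard pattern."""
    return fnmatch.filter(candidates, pattern)

# Only the 2016 grades files match the first pattern,
# and only the monthly sales files match the second.
print(matching("grades*2016.txt", names))
print(matching("monthly*sales.txt", names))
```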

The program uses the iglob function from the glob module in Python's standard library. This other recent post:

Simple directory lister with multiple wildcard arguments

used the related glob function. The difference is that glob returns a list, while iglob returns a generator, so it returns the matching filenames lazily, on demand, as shown here:
$ python
>>> import glob
>>> g1 = glob.glob('text*.txt')
>>> g1
['text1.txt', 'text2.txt', 'text3.txt']
>>> g2 = glob.iglob('text*.txt')
>>> g2
<generator object iglob at 0x027C2850>
>>> next(g2)
'text1.txt'
>>> for f in g2:
...     print(f)
...
text2.txt
text3.txt
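If you want to reproduce that session without having text*.txt files lying around, the following is a small sketch that creates three sample files in a temporary directory and then lists them both ways. The function name demo and the sample file contents are my own choices for illustration. Note that the glob documentation does not promise any particular ordering of results, so the sketch sorts names before printing them.

```python
import glob
import os
import shutil
import tempfile

def demo():
    """Create three sample files in a temp dir and list them both ways."""
    tmp_dir = tempfile.mkdtemp()
    try:
        for n in (1, 2, 3):
            path = os.path.join(tmp_dir, "text{}.txt".format(n))
            with open(path, "w") as f:
                f.write("sample\n")
        pattern = os.path.join(tmp_dir, "text*.txt")

        eager = glob.glob(pattern)    # a list, built in full right away
        lazy = glob.iglob(pattern)    # an iterator, yields names on demand

        first = next(lazy)            # consumes one name
        rest = list(lazy)             # consumes the remaining names
        again = list(lazy)            # already exhausted, so this is empty
    finally:
        shutil.rmtree(tmp_dir)

    base = os.path.basename
    return ([base(p) for p in eager], base(first),
            [base(p) for p in rest], again)

eager, first, rest, again = demo()
print(sorted(eager))
print(sorted([first] + rest))
print(again)
```

The last line prints an empty list: once the iterator from iglob has been consumed, it cannot be rewound, so call iglob again if you need a second pass.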

Here is the code for WildcardTextToPDF.py:
from __future__ import print_function

# WildcardTextToPDF.py
# Convert the text files specified by a filename wildcard,
# like '*.txt' or 'foo*bar*baz.txt', to PDF files.
# Each text file's content goes to a separate PDF file, with the 
# PDF file name being the full text file name (including the 
# '.txt' part), with '.pdf' appended.
# Requires:
# - xtopdf: https://bitbucket.org/vasudevram/xtopdf
# - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Product store: https://gumroad.com/vasudevram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com

import sys
import os
import glob
from PDFWriter import PDFWriter

def usage(argv):
    sys.stderr.write("Usage: python {} txt_filename_pattern\n".format(argv[0]))
    sys.stderr.write("E.g. python {} foo*.txt\n".format(argv[0]))

def text_to_pdf(txt_filename):
    pw = PDFWriter(txt_filename + '.pdf')
    pw.setFont('Courier', 12)
    pw.setHeader('{} converted to PDF'.format(txt_filename))
    pw.setFooter('PDF conversion by xtopdf: https://google.com/search?q=xtopdf')

    with open(txt_filename) as txt_fil:
        for line in txt_fil:
            pw.writeLine(line.strip('\n'))
        pw.savePage()

def main():
    if len(sys.argv) != 2:
        usage(sys.argv)
        sys.exit(1)  # non-zero exit status: the program was invoked incorrectly

    try:
        for filename in glob.iglob(sys.argv[1]):
            print("Converting {} to {}".format(filename, filename + '.pdf'))
            text_to_pdf(filename)
    except Exception as e:
        print("Caught Exception: type: {}, message: {}".format(
            e.__class__, str(e)))

if __name__ == '__main__':
    main()
And here are the relevant files before and after running the program, with the program's output in between:
$ dir text*.txt/b
text1.txt
text2.txt
text3.txt

$ python WildcardTextToPDF.py text*.txt
Converting text1.txt to text1.txt.pdf
Converting text2.txt to text2.txt.pdf
Converting text3.txt to text3.txt.pdf

$ dir text?.txt*/od/b
text1.txt
text2.txt
text3.txt
text1.txt.pdf
text2.txt.pdf
text3.txt.pdf
Finally, here's a cropped screenshot of the third output file, text3.txt.pdf, in Foxit PDF Reader (a lightweight PDF reader that I use):


Also look up:

[xtopdf] Batch convert text files to PDF (with xtopdf and fileinput)

for another variation on the approach to converting text files to PDF.
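One more possible variation, offered as a sketch rather than as part of the program above: the session shown earlier ran in a Windows command prompt, which passes text*.txt to Python unexpanded. Most Unix-like shells expand an unquoted wildcard themselves, so the program would receive several filenames, fail the len(sys.argv) != 2 check and print the usage message unless the pattern is quoted. A helper that accepts any number of arguments and globs each one handles both cases. The name expand_patterns is hypothetical.

```python
import glob
import itertools

def expand_patterns(patterns):
    """Lazily yield every filename matched by any of the given patterns.

    A plain filename with no wildcard characters is yielded as-is if the
    file exists, so arguments already expanded by a shell pass through.
    A file matched by more than one pattern is yielded more than once.
    """
    return itertools.chain.from_iterable(glob.iglob(p) for p in patterns)
```

In main(), the loop would then become something like "for filename in expand_patterns(sys.argv[1:]):", with the argument count check relaxed to require at least one pattern.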

Speaking of generators, also check out this other post about them:

Python generators are pluggable

The image at the top of the post is of the earliest Joker card by Samuel Hart c. 1863, according to Wikipedia:

Joker (playing card)

Enjoy.

- Vasudev Ram - Online Python training and consulting
