The Autodidacts

Exploring the universe from the inside out

Pandoc for Writers

In this tutorial I’ll show how to convert manuscripts written in markdown into a variety of industry-standard formats, automatically, using Pandoc.

Why bother?

I write short stories and essays in markdown, but many magazines and literary journals only accept manuscripts as Word documents or PDFs, which forces me to convert my source text into a myriad of (often obsolete) formats every time I want to submit to one of these journals.

To do this, I would upload my markdown to whatever document converter app was the top Google result of the day and then fuss over the formatting of the resulting document to get it to comply with the standard manuscript guidelines. But I’d inevitably make little changes while formatting the manuscript, and then bigger revisions, and pretty soon I was at the 36th draft, with my editing history split between markdown and Word — making diffing and version control impossible.

By automating this process, my manuscripts can stay in Markdown, and I can have a manuscript ready to submit to any journal in seconds.

Although spending time automating things is more interesting than spending time doing them, it does not necessarily follow that automating things yields any net savings … for the person doing it, as all nerds know and ignore, and as xkcd #1319 explains:

xkcd #1319, “Automation”

But making use of the automated workflows poor gullible nerds like me have come up with is a legitimate labour-saving device; and since I’ve already wasted more time automating it than I would have ever spent converting manuscripts — even if I churned out pieces as fast as Isaac Asimov — the only way I can make up for my lost time is by saving yours.

What is Pandoc?

Pandoc is an open-source document conversion supertool — the gas-powered 94-tool Swiss army knife of the document conversion world. It can do just about anything, but good luck getting it to do anything other than the simplest conversions when you’re getting started.

It’s easy to set up Pandoc and start converting stuff using its default settings. But using Pandoc like this isn’t much of an improvement over the above-mentioned online conversion apps, when it comes to converting and formatting manuscripts. So I set out to get Pandoc to convert my markdown into Open Document files, and put it into standard manuscript format at the same time.

It took a while to figure out, because I couldn’t find anyone else who had done it; Pandoc is popular with academics and web journalists, but so far few straight-up “writers” have adopted it.

Step 1: Know your goal

You’re reading this article, so you’re probably already familiar with the standard manuscript format for non-fiction and short stories. If you’re not, head over to William Shunn’s excellent page on the topic.

Step 2: Install Pandoc

On Ubuntu Linux, you can install Pandoc by opening a terminal window and running:

sudo apt install pandoc

I use Linux, so this article is Linux-first; however, this is the only step that will be much different if you’re using a different OS. If you’re using Mac or Windows, take a look at the official Pandoc installation instructions for your system.

Step 3: Test Pandoc

Let’s make sure everything’s hunky-dory with a simple conversion test. Write some markdown and save it in your home folder as "pandoc-test.md".

Open up a terminal and make sure Pandoc installed properly with pandoc --version.

Time for action! Convert your test document to OpenDocument (.odt) format, the format used by Open/Libre Office, with:

pandoc -s pandoc-test.md --output pandoc-test.odt

What does this do? First, it calls Pandoc. Next we have -s, which means "standalone" — telling it to convert the whole document, with the necessary headers, rather than part of a document — followed by the input file. (Pandoc turns this option on by itself for ODT and Word output, so including it here is harmless but not strictly required.) Then we have the --output flag followed by the file we want it to create. In this example, Pandoc automatically detects that you want to convert Markdown to an Open/Libre Office document, by looking at the file extensions of the input and output files. You can also set these manually using the -f and -t flags; for this example, we could set them manually with -f markdown -t odt. (So if you wanted to mess with someone you could run pandoc -s document.md --output document.docx -f markdown -t odt.)
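To make the format flags concrete, here is a minimal sketch of a shell function that spells out -f and -t instead of leaning on file extensions. The name convert_explicit is made up for illustration; only -s, -f, -t and --output belong to Pandoc.

```shell
# Hypothetical helper (not part of Pandoc): convert a Markdown file to ODT,
# naming the input and output formats explicitly rather than relying on
# Pandoc's file-extension detection.
convert_explicit() {
    input=$1                    # e.g. pandoc-test.md
    output="${input%.*}.odt"    # swap the extension: pandoc-test.odt
    pandoc -s "$input" -f markdown -t odt --output "$output"
}
```

Once the function is defined in your terminal session, running convert_explicit pandoc-test.md should give the same result as the command above.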

Step 4: Download the reference document

Pandoc cribs the styling for the output document from a reference document. The default styling for Office documents isn’t very attractive, and it isn’t manuscript-formatted, so I created a customized reference document. Here it is:

Download reference.odt

This document shows Pandoc how to style the output. I made it by taking a document that Pandoc had generated (this is recommended over creating one from scratch, to make sure it’s compatible) and adjusting the global styles for the body text, headings, and document header. When Pandoc uses it as a reference for conversion, it throws the content out but inherits the reference document’s global styles.

Place reference.odt where Pandoc goes looking for reference documents: in a folder called .pandoc in your home directory. If this hidden folder doesn’t exist, create it. The full path should be ~/.pandoc/reference.odt.
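The copying step can be sketched as a small shell function. The function name install_reference and the REFERENCE_DIR variable are illustrative only, not Pandoc settings. One caveat, depending on your Pandoc version: recent releases look first in the pandoc folder of the XDG data directory (by default ~/.local/share/pandoc) and fall back to ~/.pandoc only when that folder does not exist. The output of pandoc --version includes the user data directory actually in use, so check there if the reference document seems to be ignored.

```shell
# Hypothetical helper: copy a reference document into Pandoc's user data
# folder, creating the folder first if it does not exist.
# REFERENCE_DIR is an illustrative override for this sketch, not a Pandoc
# setting; it defaults to the ~/.pandoc location described above.
install_reference() {
    src=$1                                    # e.g. ~/Downloads/reference.odt
    data_dir=${REFERENCE_DIR:-$HOME/.pandoc}
    mkdir -p "$data_dir"
    cp "$src" "$data_dir/"
}
```

For example, install_reference ~/Downloads/reference.odt would leave the file at ~/.pandoc/reference.odt, assuming that is where your browser saved it.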

If you’re curious, this is what the reference document tells Pandoc about how to properly format your manuscript:

  1. Set all margins to 1 inch.

  2. Make the body text double-spaced.

  3. Set the document font to monospace.

  4. Set the font size to 12 point.

  5. Remove space between paragraphs, and indent the first line of each paragraph by half an inch.

  6. Add a header on the first page for contact information and word count.

  7. Center all section headers, and set them in 12 point type.

  8. Add page headers to all subsequent pages that say "Lastname / Title / Page #". (The page number is auto-generated.)

Step 5: Edit the template

Replace the placeholder name, address, email, and phone number on the first page with your contact information. On page two, replace "Lastname" (in the header) with your name. You’ll only need to do this once.

Step 6: Try it out

Open a terminal and navigate to the directory where you stashed your manuscript. Assuming your manuscript is imaginatively titled “manuscript.md”, run the following:

pandoc -s manuscript.md --output manuscript.odt

Open manuscript.odt and savour the formatting!
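If the output comes out with Pandoc's default styling, the reference document probably wasn't found. A minimal sketch of one workaround is to name the reference document explicitly. This assumes Pandoc 2.0 or later, where the option is --reference-doc (older releases used --reference-odt and --reference-docx instead). The function name convert_with_reference is hypothetical.

```shell
# Hypothetical helper: convert a manuscript to ODT using an explicitly
# named reference document instead of relying on Pandoc to locate it.
# Assumes Pandoc 2.0+ (older versions spell the option --reference-odt).
convert_with_reference() {
    input=$1                                     # e.g. manuscript.md
    reference=${2:-$HOME/.pandoc/reference.odt}  # default: path from Step 4
    pandoc -s "$input" --reference-doc="$reference" \
        --output "${input%.*}.odt"
}
```

Calling convert_with_reference manuscript.md uses the default path; pass a second argument to try a different reference file.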

Step 7: What you still need to do manually

Go to "File" > "Properties…", and edit the document title; it will be automatically inserted into the header. Alternatively, you can skip such trickery, and insert the title of the piece manually into the header of the second page. Make sure not to forget, though: you wouldn’t want to send it out titled “Title”!

Read through the manuscript to make sure that everything looks right, and that the auto-generated word count in the top right corner of the first page is correct. If you like, type "END" after the last paragraph, centred on its own line.
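For a rough cross-check of the word count, one option is to have Pandoc strip the markup and hand the plain text to wc. This is only a sketch: the function name count_words is invented here, and the figure may differ slightly from the one in the document because the two tools do not split words identically.

```shell
# Hypothetical helper: approximate word count of a Markdown manuscript.
# Pandoc renders the file as plain text (-t plain), so markup characters
# are not counted, and wc -w counts the remaining words.
count_words() {
    pandoc "$1" -t plain | wc -w
}
```

Run count_words manuscript.md and compare the number with the count on page one; a large gap suggests something went wrong in the conversion.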

Making Word documents

Some literary magazines are stuck in 1993, and only accept Word documents. Thanks to Pandoc, writers don’t have to remain in the past to submit work to these magazines. The steps for converting markdown to a manuscript-formatted Word document are pretty much the same as the above:

  1. Download this handy reference.docx, customize it with your contact information, and put it in your .pandoc directory.

  2. Run Pandoc with pandoc -s manuscript.md --output manuscript.docx

  3. Make any necessary adjustments, and send it off!

Making PDFs

Pandoc can generate nice PDFs, but it’s easier to export a PDF from the Office document than to set up a Pandoc template for auto-generating PDFs; even automation has its limits.

You can do it in two clicks:

  1. Click the red "export as PDF" button in the toolbar (or go to "File" > "Export PDF")
  2. Click “Save”.

And you’re done!
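If you would rather skip the clicking, LibreOffice can run the same export from a terminal. This is a sketch under assumptions: LibreOffice is installed and its soffice command is on your PATH (on some systems the command is libreoffice instead). The function name export_pdf is made up for illustration.

```shell
# Hypothetical helper: export an Office document to PDF without opening
# the LibreOffice window. Assumes the soffice command is available.
export_pdf() {
    doc=$1                      # e.g. manuscript.odt
    outdir=$(dirname "$doc")    # write the PDF next to the source file
    soffice --headless --convert-to pdf --outdir "$outdir" "$doc"
}
```

With this defined, export_pdf manuscript.odt should leave manuscript.pdf in the same folder. It is still worth opening the PDF once to check the formatting survived.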

Conclusion

This is only the beginning of what Pandoc can do. Here are some other goodies:

  • You don’t have to start with markdown. You can pass other formats to Pandoc and get the same manuscript output. You can even throw a Word document set in Comic Sans at Pandoc, and have it convert it into a Shunn-compliant manuscript. More realistically, you can convert back and forth between Microsoft Word and Open Document formats, depending on which Office suite is your tool of choice — or even convert HTML to manuscript format, if you’re submitting something that you previously published on the web.

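  • Pandoc can automatically convert "dumb" quotes into “smart” quotes, as well as turning fake em-dashes -- the kind made with two hyphens -- into real em-dashes (—), and turning three periods (...) into a genuine honest-to-goodness ellipsis … and all you need to do to enable this feature is add the --smart flag to your Pandoc command. (From Pandoc 2.0 onwards the --smart flag is gone: the feature became the smart extension, which is on by default for Markdown input and can be requested explicitly with -f markdown+smart.)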

  • Automate all the things. Want to convert a manuscript to multiple formats in one go? I made a little bash script to automate the task for you: format_manuscripts.sh. Run it from within the directory that contains your markdown source (bash format_manuscripts.sh -f="Your Filename") and it will create a directory for formatted manuscripts and generate .ODT and .DOCX manuscripts. Plus, it opens the manuscripts for editing so you can make sure they’re ready to go.

  • You can even make eBooks. The best thing about this workflow is that it leaves your original manuscript untouched. You can convert to as many formats as you want without losing backward compatibility. Want to double-dip and turn all those old blog posts and essays into an eBook? Just run them through Pandoc again.
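To give a feel for the multi-format idea in the list above, here is a minimal sketch of a shell function that converts one manuscript to ODT, DOCX and EPUB in a single run. It is not the format_manuscripts.sh script mentioned above, whose contents aren't shown here; the function name, the default folder name "formatted" and the choice of formats are all assumptions. The smart extension is requested explicitly, which assumes Pandoc 2.0 or later.

```shell
# Hypothetical sketch, NOT the format_manuscripts.sh script: convert one
# Markdown manuscript to several formats in one go.
format_manuscript() {
    input=$1                         # e.g. manuscript.md
    outdir=${2:-formatted}           # illustrative output folder name
    base=$(basename "${input%.*}")   # filename without folder or extension
    mkdir -p "$outdir"
    for ext in odt docx epub; do
        # markdown+smart asks for typographic quotes, dashes and ellipses;
        # stop at the first conversion that fails.
        pandoc -s "$input" -f markdown+smart \
            --output "$outdir/$base.$ext" || return 1
    done
}
```

Running format_manuscript manuscript.md should fill a "formatted" folder with manuscript.odt, manuscript.docx and manuscript.epub. The ODT and DOCX files pick up the reference documents from earlier steps, while the EPUB uses Pandoc's defaults.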

Now that you’ve got the basics, take a look at the Pandoc documentation for the full menu. If you have any questions or suggestions, leave a comment below or hit me up on Twitter. Thanks for reading!