Introduction
One really nice addition to a standard R environment is the ability to create reports which combine R code, comments and embedded graphical output. The original mechanism for doing this was Sweave, but more recently a second system called knitr has emerged which seems to be more flexible, and this is what I’ve been using.
Basic Knitr Use
One of the aspects I particularly liked about knitr was that you could use simple markdown syntax to create your reports rather than having to use LaTeX, which I’ve never really got to grips with. It’s really simple to create documents which mix comments, code and graphics. Here is a simple example:
Test
=====

Simple test markdown for use with knitr

```{r}
plot(1:20,(1:20)^2)
```
If you use the standard “Knit HTML” function built into RStudio then this will generate an HTML document with bitmap PNG images embedded within the document source as base64 encoded data. This is fine, but what I often wanted was the ability to generate vector images so that they could later be extracted for publication without having to rerun the script.
Having done some searches, I found lots of different bits of information describing ways to convert the various files which knitr can produce, but with wildly varying levels of success. Having figured out some nice ways to make this work, I thought I’d document them.
Using Vector Images directly in your HTML
The simplest way to get vector images into your document is to change the type of image you use to a vector-based format. Most web browsers these days can understand SVG, and this is a standard output option for R, so you can simply change your code to:
Test
=====

Simple test markdown for use with knitr

```{r dev='svg'}
plot(1:20,(1:20)^2)
```
...and you’ll now get an HTML document with embedded SVG graphics.
I did, however, find a problem with this. Some of the graphs I was drawing contained a reasonably large number of points, and the SVG files created were also very large, which made them unwieldy and slow to open. This is, however, the simplest way to do your conversion.
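If most chunks in a report should use the same device, one option is to set it once rather than on every chunk. This is a minimal sketch, assuming the knitr package is installed; `opts_chunk$set()` is knitr's way of changing chunk option defaults, and the `requireNamespace()` guard is only there so the snippet also runs where knitr is missing.

```r
# Set the default graphics device for all later chunks in the document.
# Typically this would sit in a setup chunk near the top of the .Rmd file.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "svg")
  # Show the default that is now in effect
  print(knitr::opts_chunk$get("dev"))
}
```

An individual chunk can still override the default by giving its own `dev` option in the chunk header.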
Using PDF as an output format
An alternative method would be to generate a PDF document as your output format. R’s PDF output seems to be better structured than its SVG: plots which were very large as SVG files were tiny as their PDF equivalents.
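A quick way to check the size difference on your own machine is to draw the same plot to both devices and compare the files. This is only a sketch: the point count and the random data are arbitrary choices for illustration, and the actual sizes will depend on your R build and the plot itself.

```r
# Draw the same many-point scatter plot to PDF and SVG, then compare file sizes.
set.seed(1)
n <- 20000                      # arbitrary point count, chosen for illustration
x <- rnorm(n)
y <- rnorm(n)

pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(x, y)
invisible(dev.off())
sizes <- c(pdf = file.size(pdf_file))

# svg() relies on cairo support, which not every R build has
if (capabilities("cairo")) {
  svg_file <- tempfile(fileext = ".svg")
  svg(svg_file)
  plot(x, y)
  invisible(dev.off())
  sizes <- c(sizes, svg = file.size(svg_file))
}

print(sizes)  # file sizes in bytes
```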
Knitr itself won’t natively create PDFs and the easiest way I found to do the conversion was to use a program called pandoc. Pandoc can read in files in a variety of formats and convert them into PDFs using a command as simple as:
pandoc -s input.html -o output.pdf
I found this suggestion in a few places for converting the HTML output from knitr into PDF, but there are a couple of problems with this approach.
- The default HTML output from RStudio uses html encoding and pandoc can’t understand this. In order to do the conversion you need to tell the program to create xhtml encoded output. You can do this by manually running the conversion in an R session:
library(knitr)     # provides knit()
library(markdown)  # provides markdownToHTML()
knit("knitr_test.Rmd", "knitr_test.md")
markdownToHTML("knitr_test.md", "knitr_test.html", options = c("use_xhtml"))
which will create an xhtml version of your Rmd file which can be converted.
- The second problem is that pandoc will embed whichever figures you created in your initial knitr run and try to put these into the PDF. This is fine if you used PNG files, but it can’t handle SVG files, so you’ll get an error if you try to use these. The solution to this is one of the following:
- Change your device type in your file to
{r dev='pdf'}
This will produce an HTML file which won’t display properly in a browser, as the linked PDF figures won’t be rendered, but pandoc will recognise them and make a nice PDF from them.
- Change your device type to PDF and run knitr, but instead of generating HTML, run pandoc directly on the .md file created by knitr. This actually produces a more nicely formatted PDF than going from the HTML. One problem with this approach is that you get odd figure legends under your plots (“Figure 1: plot of chunk unnamed-chunk-1”, for example). You can fix this by putting a proper figure caption on the R code chunk which generates the plot:
```{r dev='pdf', fig.cap="My first figure"}
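The second route (knit to .md, then run pandoc on the .md) can be wrapped in a small function. This is a sketch under several assumptions rather than a definitive implementation: `rmd_to_pdf` is a hypothetical helper name, the chunks in the .Rmd are assumed to already use `dev='pdf'`, the function is assumed to be run from the directory containing the .Rmd so that figure paths resolve, and pandoc (plus the LaTeX engine it uses for PDF output by default) is assumed to be installed and on the PATH.

```r
# Hypothetical helper: knit an .Rmd file to .md, then hand the .md to pandoc.
rmd_to_pdf <- function(rmd, pdf = sub("\\.[Rr]md$", ".pdf", rmd)) {
  stopifnot(file.exists(rmd))
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("knitr is not installed")
  }
  if (Sys.which("pandoc") == "") {
    stop("pandoc was not found on the PATH")
  }

  # Step 1: run the R chunks and write plain markdown (figures go to figure/)
  md <- sub("\\.[Rr]md$", ".md", rmd)
  knitr::knit(rmd, output = md)

  # Step 2: same command as shown earlier, but with the .md file as input
  status <- system2("pandoc", c("-s", shQuote(md), "-o", shQuote(pdf)))
  if (status != 0) {
    stop("pandoc failed with exit status ", status)
  }
  invisible(pdf)
}
```

With the example file used earlier, the call would be `rmd_to_pdf("knitr_test.Rmd")`, which should leave `knitr_test.pdf` next to the source file.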
Using these relatively simple steps you should be able to turn an R script with a minimal amount of additional formatting into a professional-looking document with publication-quality figures embedded in it.