Using pdfcrop to Remove White Margins | Ubuntu

One of the most annoying things about PDF files is their fixed font size: the only real way to get a better view of the text is to zoom the whole page in. This isn’t really a fix for that annoyance, but it is a way of getting a better info-to-screen ratio by removing the wasted white space around the body of each page, which comes in really useful when displaying PDF files on a Kindle, Nook, smartphone or other eBook reader. pdfcrop does exactly this, and on Ubuntu it ships in the texlive-extra-utils package:

 sudo apt-get install texlive-extra-utils

It’s really simple to use, and by default it crops all the white space from around the content of each page. Because it crops per page rather than using one box for the whole document, you get the best results, as long as you don’t mind the apparent font size changing from page to page while you’re reading.

Change input.pdf to the name of the file you want to crop and output.pdf to the name of the cropped output file.

 pdfcrop input.pdf output.pdf

Some PDFs give better results than others, and some will look much the same on an eBook reader even with the whitespace cropped, but it’s a useful tool to have for those old pesky JPEG-based PDF files with massive borders all the way around.

Example: Left = input, right = output
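If the default crop is too tight for your liking, the pdfcrop shipped in texlive-extra-utils also takes a --margins option to leave a small border behind. The values below are just examples; a single number pads all four sides, or you can give separate left/top/right/bottom values (check pdfcrop --help on your version):

 pdfcrop --margins 10 input.pdf output.pdf
 pdfcrop --margins "5 10 5 20" input.pdf output.pdf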

Some PDF files seem to bring up the following error:

!!! Error: Ghostscript exited with error code 1!

I’m currently not sure what causes this; possibly something to do with the encoding of the PDF, or maybe just some missing dependencies.
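If you run into this, one workaround that may be worth a try (an untested guess on my part): pass the file through Ghostscript’s pdfwrite device first to normalise it, then crop the rewritten copy:

 # rewrite the PDF first; -o implies -dBATCH and -dNOPAUSE
 gs -o rewritten.pdf -sDEVICE=pdfwrite input.pdf
 pdfcrop rewritten.pdf output.pdf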

For more info about pdfcrop, check out the Ubuntu manpage: http://manpages.ubuntu.com/manpages/gutsy/man1/pdfcrop.1.html

Comments

  • Hey Alex,

    I’m looking for a script like this to work on my Mac. I have dozens of PDF articles to read for classes with gobs of inefficient white margin. Can this be used on my Mac, considering it’s UNIX-based? Also, how do I view the source code? I’m a C development noob and want to see how this works.

    Thanks a million,

    William

  • Hey William,

    I don’t use a Mac, but it should work since pdfcrop is a Perl script (http://pdfcrop.sourceforge.net/). The source code should be linked on that site too, so you can read it; just note that it’s written in Perl, not C.

    Good luck and let me know if you get something working!

  • Phuc Nguyen

    This one works. But it turns my original 5M file into a huge 86M one. Obviously, I can’t convert ~100 files for reading on my tablet by this method. Is there any solution to this?

  • I didn’t notice this, I’ll have a check and update if I find anything!

  • John Paul Funk

    The Perl script pdfcrop in the comments is written by Eric Doviak. The one in the Ubuntu texlive-extra-utils package is by Heiko Oberdiek. They are not the same. Eric Doviak’s script did not work for me, but the other did. I was able to shrink the huge resulting PDF down to smaller than the original size by using the following command, which I found here: http://www.ubuntugeek.com/ubuntu-tiphowto-reduce-adobe-acrobat-file-size-from-command-line.html

     gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

    Perhaps there are some command-line arguments that can be passed to Heiko Oberdiek’s pdfcrop to override the gs command that writes the huge PDF, avoiding it in the first place, or maybe a simple change to the Perl script would do it.

  • John Paul Funk

    I found some people complaining that the above gs command removes some elements from PDFs. Removing a few of the arguments and just running the following gave an acceptable file size, and many claimed it avoided the problem:

     gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

    Note that the -sOutputFile= argument does not seem to work with a path name.

    There is a newer version of Heiko Oberdiek’s pdfcrop available here: http://www.ctan.org/tex-archive/support/pdfcrop. I played around with some of the new command-line options like --pdfversion, but could not get smaller file sizes. Perhaps someone can email the author and see if he has a solution, as I have already spent way too much time on this. He’s listed in the readme file on the page linked above as “heiko.oberdiek at googlemail.com”.

  • Robert Steed

    I too got a hopelessly large file size on the result. Would have been great otherwise.

  • Prakash Kailasa

    There is a --xetex option to pdfcrop which helped immensely in reducing the output file size. By default, pdfcrop uses pdftex to write the cropped file; the --xetex option makes it use xetex instead.

    My input file was just over 1M and had 426 pages. The pdftex-cropped version came out to 5.9M (495% larger), while the xetex-cropped version was only 1.1M (10% larger). YMMV.

    It did take very long (13 minutes) in each case. I was using an older version of pdfcrop (1.20) and texlive; newer versions might be better. Most of the time seems to be spent calculating the bounding box for each page using Ghostscript. Perhaps there is a better/faster way.

  • Prakash Kailasa

    I noticed that Eric Doviak’s pdfcrop script is very fast (although it doesn’t support all the options that Oberdiek’s pdfcrop does). It turns out Eric’s script passes an additional -r72 option to Ghostscript, specifying the output resolution.

    Oberdiek’s pdfcrop also supports a --resolution option. Using it reduced the processing time from 13 minutes to 18 seconds! That’s a huge improvement. So the options required to optimize both output file size and processing time are:

    pdfcrop --xetex --resolution 72 [other-options] input.pdf output.pdf
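Pulling the tips from this comment thread together, here’s a rough bash sketch that crops every PDF in the current directory with the fast options above and then shrinks each result with the stripped-down gs command from earlier. The cropped/ output directory and the temporary file names are my own choices, and gs writes to the working directory first because of the path-name issue mentioned above:

 #!/bin/bash
 # Crop each PDF with the fast pdfcrop options from the comments,
 # then compress the result with Ghostscript.
 mkdir -p cropped
 for f in *.pdf; do
     # per-page crop; --xetex and --resolution 72 keep output small and fast
     pdfcrop --xetex --resolution 72 "$f" cropped-tmp.pdf
     # stripped-down compression command; -sOutputFile reportedly
     # dislikes path names, so write locally and move afterwards
     gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
        -sOutputFile=compressed-tmp.pdf cropped-tmp.pdf
     mv compressed-tmp.pdf "cropped/$f"
     rm cropped-tmp.pdf
 done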
