fogbound.net v. 3.0

Stripping images from PDFs using Ghostscript

— SjG @ 10:28 am

A long PDF was to be printed, but only the text was important. As it was full of images, it seemed like removing the images would save a whole lot of ink.

It turns out ghostscript has some very nice filters for removing classes of content from a file. You can very simply remove text, images, or vector objects without changing the rest of the layout.

For example, to strip vector graphics and images from a PDF, leaving only the text, you can use:

gs -o text-only.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE pdf-with-pictures.pdf
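The command above drops two of the three content classes. As a rough sketch of how the switches line up, here is a small helper that maps each class to its filter switch. The function name gs_filter_flag is made up for illustration and is not part of Ghostscript; -dFILTERTEXT is the text counterpart of the two switches used above, and all three need a reasonably recent Ghostscript.

```shell
# Hypothetical helper (not part of Ghostscript): map a content class
# to the Ghostscript switch that filters that class out of the output.
gs_filter_flag() {
    case "$1" in
        text)   echo "-dFILTERTEXT" ;;
        image)  echo "-dFILTERIMAGE" ;;
        vector) echo "-dFILTERVECTOR" ;;
        *)      echo "unknown content class: $1" >&2; return 1 ;;
    esac
}

# Example use, left commented out because it needs gs and an input file.
# It keeps text and vector art and drops only the images:
# gs -o no-images.pdf -sDEVICE=pdfwrite "$(gs_filter_flag image)" pdf-with-pictures.pdf
```

The output file name no-images.pdf is just a placeholder. In practice it is probably simpler to type the switches directly; the helper is only there to make the class-to-switch mapping explicit.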

If you don’t have ghostscript installed but use Docker, there are containers that make it easy:

docker run --rm -v "$(pwd)":/app -w /app minidocks/ghostscript gs -o text-only.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE pdf-with-pictures.pdf


Solving a VPN Mystery

— SjG @ 1:14 pm

The Department of Water and Power is doing work near the office, and over the weekend, there was a sustained power outage. I came in Monday to shrieking UPSes and had to power up the firewall and a few other machines. It was the normal stupid kind of stuff.

We have a few virtual servers out in “the cloud,” and we use point-to-point VPNs to make them seem local to our network. Those VPNs also needed restarting.

Through the course of the day, however, one VPN connection kept unceremoniously disconnecting. Looking at logs on the various servers was unenlightening. Everything was running normally, other than the surprise disconnects.

In the evenings, I’ve been watching the old Granada TV/Jeremy Brett Sherlock Holmes series, so I had to apply Holmes’ deductive process. The virtual servers had experienced no changes except being disconnected, so I needed to focus on the firewall. The firewall had experienced no change, except being restarted. What could have happened?

I finally found a configuration that was incorrect (it was a netmask that was insufficiently restrictive, allowing devices not on the VPN to collide with VPN IP addresses). I fixed the netmask, and the VPN has been up and stable ever since.
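To make the netmask problem concrete, here is a minimal sketch in plain shell arithmetic. Every address and prefix length in it is hypothetical (the real configuration isn't shown here), and the helper names ip_to_int and in_network are invented for the example. It checks whether an address falls inside a network for a given prefix length, which is how a too-loose mask lets an unrelated device land in the VPN's range.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside network $2 with prefix length $3.
in_network() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Hypothetical addresses: the tunnel is meant to cover 10.8.0.0/24,
# and an unrelated office device sits at 10.8.1.50.
in_network 10.8.1.50 10.8.0.0 24 && echo "overlap with /24" || echo "no overlap with /24"
in_network 10.8.1.50 10.8.0.0 16 && echo "overlap with /16" || echo "no overlap with /16"
```

With the tighter /24 the office device stays outside the VPN range; with the looser /16 it falls inside it, which is the kind of collision described above.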

But how could this be? It had been running properly literally for years. It had to be something to do with the power outage. But if that had corrupted the configuration, it wouldn’t have been a single IP netmask changing. “[W]hen you have eliminated the impossible, whatever remains, however improbable, must be the truth.” The bad configuration file could not have been in use.

The best theory is that the configuration file had been (accidentally?) modified at some point in the past, but never loaded. When the firewall was restarted, it loaded this modified configuration for the first time.


Backups

— SjG @ 10:34 am

So, computer folks always talk about the 3-2-1 strategy of backups: have three copies of your data, stored on two different types of media, with one geographically separated. They also like to repeat slogans like “if you have one backup you have no backups.”

For years, I’ve relied on Time Machine, the backup system Apple includes with their operating system. It not only provides a backup, but it keeps multiple versions of files, so if you, for example, accidentally clobbered your book manuscript by searching and replacing a badly-chosen term but didn’t notice for a week, you could go back to the version you had backed up last week. I felt like I was doing a pretty good job of securing my data: I back up onto an external drive at home, and I also back up on an external drive at the office, a little over 1km away. These external drives are encrypted, so if someone were to break into either place and swipe a drive, they’d have the hardware but not my data.

A few years ago, I also added another layer of redundancy: an encrypted cloud backup. I hadn’t liked the cloud backup services I’d seen before, because all of my files would be on someone’s machine where I had no control over them. A screw-up on the part of a system administrator somewhere could make my files available to the open internet! However, a bunch of new services started offering encrypted backups, where the encryption happens locally and the service has no view into your files beyond knowing that they’re a big chunk o’ data (more on this later).

To make a long story short, I tried a few services, and went with Backblaze (disclaimer: that’s an affiliate link, I get credits if you follow it and subscribe. You can always avoid that by going directly to https://backblaze.com).

Fast forward a few years. A friend who’s not particularly computer savvy needed help with some IT stuff. They had an external hard drive connected to their machine and used Windows backup, but the process had silently failed a year before. In diagnosing and fixing this, I also convinced them to pay for and use cloud backups.

This friend lost their house and everything in it during the wildfires last week. Among the long list of things that they didn’t have time to grab before evacuating was that backup hard drive. Cloud backups to the rescue! I was able to download all their files for them.

The surprising scope of the fires also brought one thing into sharp focus: my original strategy of “one backup at home and one at the office” is really insufficient. One kilometer’s not far enough away! Having a remote backup somewhere is an important part of backup plans.

I mentioned above that encrypted cloud services like Backblaze have no visibility into your data. This is not completely true. If you use their encryption scheme, the data is encrypted on your local machine before it is transmitted over the network. So it’s true in normal operations that there’s no way for them to see the contents. However, when you use their interface to restore files, you need to give them your encryption key so they can identify which file(s) you wish to restore. That means the data is (at least temporarily) decrypted on their servers. When I did a full restore of my friend’s files, I provided the key and they generated a zip file for me to download. That zip file was not encrypted. They say it’s on their server for only seven days, and I don’t have any reason to distrust them.

I want my data encrypted when it’s backed up because I have financial information like account numbers, etc, that could be abused. That these could exist as clear-text on someone else’s server for short periods of time is not ideal, but it’s also a pretty minimal threat. That being said, if you are involved in journalism, political activism, or other activities where your information could impact people’s lives, this may not be the best solution.


Mysterious Crossword

— SjG @ 4:05 pm

In the so-called Golden Age of Detective Fiction, there was a group of four or five writers considered the Queens of Crime: Margery Allingham, Agatha Christie, Ngaio Marsh, Dorothy L. Sayers, and Josephine Tey. Christie gets most of the glory in the US due to the Hollywood adaptations of her novels, but recently I’ve been reading through Sayers’ Lord Peter Wimsey and Montague Egg mysteries.

Anyone who has read Christie (even the modern, bowdlerized versions) knows they’re chock-a-block with racism, classism, and antisemitism, and, sadly, Sayers suffers from this as well. Unlike Christie, Sayers brings to bear her Oxford education, so her novels and short stories contain frequent allusions to and excerpts from writers ranging back to classical Greece and in a variety of languages. Like Christie’s, her plots are convoluted, with many possible suspects and countless red herrings.

In her 1925 short story, “The Fascinating Problem of Uncle Meleager’s Will” (originally published in Pearson’s Magazine, volume 60), Sayers includes a full crossword puzzle that Lord Peter Wimsey and his associates must solve to locate the referenced will. Normally, I let this kind of story just wash over me. I don’t try to solve the murder and I don’t try to analyze the clues. But in this case, I thought I’d try to solve the crossword.

Of course, British crosswords are different than the NY Times style with which I’m more familiar. Furthermore, the number of classical references quickly overwhelmed me. I wasn’t able to complete it. But maybe you will? I took the layout, clues, and solution and laid them out in a convenient PDF for your puzzling pleasure.

dorothy-sayers-puzzle-with-clues Download


Holding on to Hardware

— SjG @ 11:19 am

My cousin sent me a box of old photos that she had inherited from her mother. It turns out that my mother and her mother would send photo albums to one another throughout the late 1960s and into the early 1980s.

Many of these pictures are interesting to me, and I’d like to digitize them. The average online service wants between $0.65 and $1.25 to scan a print without doing touchup. I’ve used services to scan negatives in the past, but I have an old photo scanner and I have digital cameras that I could use to take photos of the prints.

The prints are degraded to various degrees and many are not really flat, so my first thought was to put them under glass and photograph them. I set up a rig to do that, but it was pretty finicky. Lighting to prevent reflections isn’t easy (I’m space-constrained by boxes of old junk in my office). The prints are many different sizes, and positioning each one took a lot longer than I wanted to spend on it. I don’t really need these in 12 or 24 megapixel detail, plus my macro lens is old and introduces some distortion.

So I decided to use my old Epson Perfection Photo 3170 from … ulp … 20 years ago. It’s USB-A and my current M1 MacBook only has USB-C ports, but I have plenty of USB-A to USB-C adapters for this kind of situation. I plugged the scanner into my M1 MacBook, but it was not recognized. I downloaded a new driver from Epson, but it wouldn’t install, giving me the helpful message “You can’t open the application “EPSON Scan Installer” because this application is not supported on this Mac.” Is that because it’s Intel code and I can’t run drivers in emulation? I have no clue.

I tried downloading VueScan, which is widely recommended for scanners where the driver is no longer provided, but it couldn’t see the scanner either. Mysterious. I began to think it was something to do with the hardware itself. It used to work. Had the scanner died from sitting neglected?

I dug through one of those aforementioned space-constraining boxes of junk, and got out my Intel-based MacBook Pro from 2011. I powered it up, plugged the scanner in, and Image Capture immediately recognized it. So I’m scanning on the old machine.

Image Capture under old Mac OS is a little annoying, but I can scan 4 photos at a go into 32-bit TIFF files. I’m only scanning at 600dpi, so I’m getting roughly 6 megapixel scans of these photos. I considered scanning at a higher resolution, but the time and effort and storage involved didn’t seem to be worthwhile. I may regret this someday.

Anyway, here’s a birthday cake I decorated for my best friend Charlie back in March of 1978.


Tue, 11 Mar 2025

Stripping images from PDFs using Ghostscript

Tue, 4 Mar 2025

Solving a VPN Mystery

Thu, 16 Jan 2025

Backups

Sat, 7 Dec 2024

Mysterious Crossword

Sun, 24 Nov 2024

Holding on to Hardware