Tue, 20 Dec 2011

Kerning Pairs

— SjG @ 11:22 pm

I’ve been playing around with font creation for a couple of projects (more on that will be posted here at some point). One of the more surprising aspects of computer typography is the sheer complexity of it. I may have once naively thought that it was just a matter of splatting characters … er … glyphs out to some display device based on simple shapes, but I was sadly mistaken. In fact, TrueType and its successor OpenType not only use complex mathematical equations for creating the curves that define font outlines, but they also contain rules for scaling, hints for rendering these “mathematically perfect” curves on a bit-mapped display, and metrics for spacing character combinations. OpenType has its own internal language for doing such complex tasks as replacing some glyph pairs with ligatures, or doing fancy substitutions of glyphs depending on the surrounding glyphs or other rules. This allows ambitious font designers to do such things as imitate handwriting or handle non-Roman languages naturally (for example, in Semitic languages, the same letter may be written quite differently if it’s at the beginning or end of a word, and sometimes also depending on where it is in the sentence).

There’s a lifetime of complexity in typography, and, as yet, I’ve only been swimming in the shallow end. Still, I was deep enough to be playing with kerning pairs. Kerning involves moving letters so they fit together nicely. For a visual demonstration and nice game, take a look here. This does more to explain kerning than anything I could write.

The program I’m using for font creation has a facility for creating kerning pair metrics. You can type in a pair of letters, and then adjust the spacing for that particular pair. Of course, you can’t really go through and tune them all¹: consider the case where you only have upper case letters and digits from zero through nine. Neglecting accented characters, we’re talking 36 glyphs, or 666 unordered combinations (1,296 if you count “AV” and “VA” separately, as kerning does). Now throw in lower case, punctuation, etc., and you have an enormous list of possible combinations to tune.
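As a quick check on that arithmetic, here is a small sketch in Python (my choice of language for illustration, not necessarily what the font tool or anyone else uses). It assumes the 36 glyphs are the 26 uppercase letters plus ten digits, and shows how the total depends on whether the order of the two glyphs matters.

```python
from math import comb

glyphs = 26 + 10  # uppercase letters plus the digits 0-9

# Order ignored ("AV" and "VA" count once), doubled glyphs like "AA" included.
unordered_with_doubles = comb(glyphs, 2) + glyphs

# Order respected ("AV" and "VA" are separate pairs), doubled glyphs included.
ordered_pairs = glyphs * glyphs

print(unordered_with_doubles)  # 666
print(ordered_pairs)           # 1296
```

Either way, the count grows with the square of the number of glyphs, which is why tuning every pair by hand stops being practical so quickly.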

But think about it for a moment. There are character combinations that will want tuning in just about every kind of Roman-character-based font, like “VA” or “To” or “ij”. Equally, depending on your language, there are character combinations that will almost never occur. For example, in English, you’ll almost never see a lowercase letter followed immediately by an uppercase, or combinations like “Yq” or “Td” or “zn” in sequence.

So in the interest of selecting kerning pairs intelligently, I wrote a script to analyze character combinations. My target audience is English-speakers, so for my source data, I used English-language texts. But which English texts to use? Being an absurdist, I selected Emma by Jane Austen, At The Mountains of Madness by H. P. Lovecraft, The Adventures of Tom Sawyer, by Mark Twain, An Inquiry into the Nature and Causes of the Wealth of Nations by Adam Smith, Alice, or The Mysteries, Complete by Edward Bulwer Lytton, Tales of the Jazz Age by F. Scott Fitzgerald, Tarzan of the Apes by Edgar Rice Burroughs, An Unsocial Socialist by George Bernard Shaw, the collected writings of Thomas Jefferson, the complete works of William Shakespeare, the Project Gutenberg license text, and the Unix version of the English Dictionary that lives in /usr/share/dict/words.

To analyze the data, I loaded up the text, and stripped out all but the letters, digits, and the following punctuation: period, single-quote, double-quotes, exclamation mark, question mark, comma, semicolon, colon, left parenthesis, and right parenthesis². I took all of the two-character combinations, and filtered out all pairs where one character was a space. Then I simply counted the number of instances.
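For anyone who wants to repeat the experiment, the steps above might look roughly like this in Python. This is a minimal sketch and not the original script: the names `DROP` and `count_pairs` are made up for illustration, and it treats any whitespace (not just the space character) as a word boundary.

```python
import re
from collections import Counter

# Matches everything that is NOT a letter, digit, whitespace, or one of the
# punctuation marks listed above. Whitespace is kept at this stage so that
# pairs straddling a word boundary can be skipped later.
DROP = re.compile(r"""[^A-Za-z0-9.'"!?,;:()\s]""")

def count_pairs(text: str) -> Counter:
    """Count adjacent two-character combinations, skipping pairs that contain whitespace."""
    cleaned = DROP.sub("", text)
    counts = Counter()
    for a, b in zip(cleaned, cleaned[1:]):
        if a.isspace() or b.isspace():
            continue
        counts[a + b] += 1
    return counts

pairs = count_pairs("To be, or not to be.")
print(pairs.most_common(1))  # [('be', 2)]
```

Note that stripping a character joins its neighbors, so a hyphenated word like “well-known” would contribute an “lk” pair under this approach.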

Of course, the statistical analysis doesn’t match the experience of reading. While the frequency of combinations that start with an uppercase character followed by a lowercase character is low, those are possibly more important than combinations of lowercase characters. After all, they start out each sentence, and are very visually prominent. Additionally, the shapes of uppercase letters increase the propensity of these combinations to need kerning adjustments. With these thoughts in mind, I generated a file of statistics from the same texts, but based solely on combinations containing an uppercase character.
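The caps-only list is just a filter over the full counts. A sketch of that filter in Python follows; the function name `pairs_with_uppercase` is hypothetical and the counts in the example are invented purely to show the behavior.

```python
from collections import Counter

def pairs_with_uppercase(counts: Counter) -> Counter:
    """Keep only the pairs in which at least one character is an uppercase letter."""
    return Counter({pair: n for pair, n in counts.items()
                    if any(ch.isupper() for ch in pair)})

# Invented numbers, for illustration only.
counts = Counter({"th": 120, "Th": 14, "VA": 2, "e.": 30, "iT": 1})
caps = pairs_with_uppercase(counts)
print(caps.most_common())  # [('Th', 14), ('VA', 2), ('iT', 1)]
```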

You can download the lists for your own nefarious purposes. Here’s the complete list, and here’s the list containing caps. In the complete list, there is what appears to be bad data. Keep in mind that the text contained such things as Roman numeral chapter headers, older style numeric abbreviations (e.g., “3dly” and “23d”), some currency abbreviations (e.g., “1s.6d” or “1/6d”, both of which stand for 1 shilling and sixpence), and poetic contractions (e.g., “oer,” “stol’n,” or “capdv’d”). I also see what I suspect are errors due to imperfect OCR of the original texts.

Last, but not least, I have two files, The 128 Vitally Important Kerning Pairs and The 255 Important Kerning Pairs With One Repeat, which gather the most common combinations from the other two lists into a single text for examination when testing a font.
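One plausible way to turn a table of pair counts into a proofing text like these is sketched below in Python. This is a guess at the general approach, not a description of how the two files were actually assembled; `kerning_test_text` and its parameters are hypothetical, and the counts are invented.

```python
from collections import Counter

def kerning_test_text(counts: Counter, n: int, per_line: int = 16) -> str:
    """Lay out the n most common pairs, space-separated, per_line pairs to a row."""
    top = [pair for pair, _ in counts.most_common(n)]
    rows = [" ".join(top[i:i + per_line]) for i in range(0, len(top), per_line)]
    return "\n".join(rows)

# Invented numbers, for illustration only.
counts = Counter({"th": 120, "he": 110, "in": 90, "Th": 14, "VA": 2})
print(kerning_test_text(counts, n=4, per_line=2))
```

Setting the result in the font under test puts the pairs that matter most side by side on one page.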

¹ Ideally, the way you define the spacing of the glyphs themselves saves you from having to tune all combinations. Most should start out looking pretty good. But you do, of course, want your font to lay out perfectly, hence the rest of this discussion.

² This was admittedly an arbitrary choice of allowable punctuation. I also excluded accented characters like ü and à which would obviously need to be taken into consideration for many European languages. Since my focus was on English, I deemed them rare enough to ignore.

Sat, 10 Dec 2011

To all my Pastafarian friends

— SjG @ 10:14 am

Merry ChriFSMas!



Fri, 9 Dec 2011

DSL and Red Herrings

— SjG @ 6:33 pm

On Wednesday, December 7th, at 8:15 AM Pacific Standard Time, the Internet died.

That is to say, the DSL at home went down. No packets in, no packets out. Out in my “machine room,” the modem blinked its lights in a baleful hey I’m trying to sync dance. The usual tricks failed: rebooting, power cycling, yelling obscenities. From work, I could tracert down to what looked like one hop from my system, so I figured it was another blade needing a reboot in the local Covad DSLAM.

I called my ISP’s tech support, and, after the requisite delays, got escalated to a Tier 2 guy who walked me through more tests. He could see my modem, he said, when he did the line test, but it wasn’t syncing. I didn’t realize that that was possible, but I guess it’s not too surprising that one could test a circuit for connectivity on the physical level rather than the network level (as I try to remember whether it’s the 5-layer TCP/IP model or the 7-layer OSI reference model). The Tier 2 guy said that, at this point, it looked like interference on the copper lines in my house. To test, he had me plug the DSL modem into another phone jack in another part of the house. Lo and behold, it worked! I took the modem back out to my machine room, and the sync failed.

Problem isolated! I had originally done the phone wiring out to the machine room myself when we moved into the house in 2000, using two of the wires in a spare Cat-5 cable and a crimped-on 6P2C (“RJ-11”) connector. I figured that the easiest place to start would be to replace the connector, so I lopped off the old one, crimped on a new one, and tested. Voilà! It worked. Great Moral Victory, etc.

On Friday, December 9th, at 8:15 AM Pacific Standard Time, the Internet died.

WTF? The DSL at home was down again. The modem would keep its I’m a well-adjusted, happily-synced modem lights lit for about ten seconds, then fall into sync-seeking mode again. What was it about 8:15 AM? That was an hour or so after the door to the machine room was opened to the outside (the cat lives in there along with the machines). The door doesn’t impinge on the phone wire at all. Could there be a temperature component to the problem? Hm. The litterbox is right up against the wall where the conduit passes. Was Quackie micturating on the circuit? No evidence to support this theory could be found. What, then? Alas, it looked like I’d have to attack the phone wiring again.

This time, I went out to where the wires enter the house. I opened the access box, and made sure the connectors were properly attached. I cleaned everything up, and went to test. No luck. But now, the modem wouldn’t sync on any jack in the house.

So it was back out to the access box, and removing all of the connections. There were a few phone lines connected that go to unused jacks in the house. Maybe degradation of copper or oxidation in one of those jacks was enough to cause my problem. So I used my connection tester, and mapped all of the lines. I disconnected two unused jacks. I neatly redid the connectors for all of the active phone lines, and put it all back together. I went back in and tested. Hey! I’ve got sync! But when I plugged a computer in to test that I was getting a good connection, the LAN circuit of the modem dropped, and it went back to trying to sync. Crap.

Next, I ran an extension cord out to where the phone lines come into the house. I opened the access box, removed all the connectors, and patched the modem directly into the incoming phone line. Hey! I’ve got sync! But once again when I plugged a computer in to the LAN circuit, the modem dropped the LAN connection and lost sync. At this point, my suspicion was that the modem itself was failing somehow.

OK, time to call tech support again.

This time, I get an agent who listens patiently to my diatribe and all of the problem symptoms without saying anything. When I let him get a word in edgewise, he starts his script at the very beginning. “What kind of modem are you using?”

“It’s a D-Link 2320b.”

“And what lights are lit?”

I launch into a long description of the sync/no-sync dance sequence.

“Please tell me exactly what lights are lit and what color they are.”

This, of course, is ridiculous, since the power and status lights have red LEDs and the LAN, USB, and sync lights have green LEDs, but I play along.

“The power light is red? It should be green.”

At this point, I am doing everything I can to avoid sighing with disgust, cursing the idiot, or flinging my phone. He asks me to plug the modem into the power supply from my router (without asking make/model or any other details). I verify for myself that they’re both 12VDC devices, and do as he asks.

“What color is the power light?” he asks me again.

It’s still red, of course. The modem is doing its I’m desperately trying to start up and get synced dance, and it gets momentary sync. Then the power light changes from red to green. And the modem stays in sync.

Once I had recovered my composure, thanked tech support, and closed the ticket, I checked the modem power supply. It was putting out 7.4VDC (although I didn’t test it under load). It was rated for 12VDC. I was flabbergasted that the modem lit any LEDs at all when operating at roughly 60% of its rated voltage, much less that it tried to sync.

Grabbing a 12VDC wall-wart from the box of spares, I put everything back together, and the Internet lives again!

For now, anyway…

Fri, 25 Nov 2011

Eggshell brownies

— SjG @ 9:50 am

After seeing an article where they showed brownies baked in egg shells (via BoingBoing), I thought it would be a good thing to try for a family gathering. Thus began a haphazard adventure in baking…

Final verdict: a lot of work, interesting results, probably won’t be doing again.


Wed, 23 Nov 2011

YA Fiction

— SjG @ 10:02 am

As much as I like some of the new crop of young adult fiction, I can’t help but wonder if this phenomenon isn’t just rooted in publishers being squeamish and authors being lazy. The category allows — no, encourages — writers to be less nuanced, paint with broader strokes, and, of course, avoid sexuality altogether.

Then again, the YA Fiction phenomenon may simply be symptomatic of “non-young adults” non-reading.
