Tue, 20 Dec 2011

Kerning Pairs

— SjG @ 11:22 pm

I’ve been playing around with font creation for a couple of projects (more on that will be posted here at some point). One of the more surprising aspects of computer typography is the sheer complexity of it — I may have once naively thought that just it was just a matter of splatting characters … er … glyphs out to some display device based on simple shapes, but I was sadly mistaken. In fact, True Type and its successor Open Type not only use complex mathematical equations for creating the curves that define font outlines, but they also contain rules for scaling, hints for rendering these “mathematically perfect” curves on a bit-mapped display, and metrics for spacing character combinations. Open Type has its own internal language for doing such complex tasks as replacing some glyph pairs with ligatures, or doing fancy substitutions of glyphs depending on the surrounding glyphs or other rules. This allows ambitious font designers to do such things as imitate handwriting or handle non-Roman languages naturally (for example, in Semitic languages, the same letter may be written quite differently if it’s at the beginning or end of a word, and sometimes also depending on where it is in the sentence).

There’s a lifetime of complexity in typography, and, as yet, I’ve only been swimming in the shallow end. Still, I was deep enough to be playing with kerning pairs. Kerning involves moving letters so they fit together nicely. For a visual demonstration and nice game, take a look here. This does more to explain kerning than anything I could write.

The program I’m using for font creation has a facility for creating kerning pair metrics. You can type in a pair of letters, and then adjust the spacing for that particular pair. Of course, you can’t really go through and tune them all1: consider the case where you only have upper case letters and digits from zero through nine. Neglecting accented characters, we’re talking 36 glyphs, or 666 combinations. Now throw in lower case, punctuation, etc, and you have an enormous list of possible combinations to tune.

But think about it for a moment. There are characters combinations that will want tuning in just about every kind of Roman-character-based font, like “VA” or “To” or “ij”. Equally, depending on your language, there are character combinations that will almost never need to be combined. For example, in English, you’ll almost never see a lowercase letter followed immediately by an uppercase, or combinations like “Yq” or “Td” or “zn” in sequence.

So in the interest of selecting kerning pairs intelligently, I wrote a script to analyze character combinations. My target audience is English-speakers, so for my source data, I used English-language texts. But which English texts to use? Being an absurdist, I selected Emma by Jane Austen, At The Mountains of Madness by H. P. Lovecraft, The Adventures of Tom Sawyer, by Mark Twain, An Inquiry into the Nature and Causes of the Wealth of Nations by Adam Smith, Alice, or The Mysteries, Complete by Edward Bulwer Lytton, Tales of the Jazz Age by F. Scott Fitzgerald, Tarzan of the Apes by Edgar Rice Burroughs, An Unsocial Socialist by George Bernard Shaw, the collected writings of Thomas Jefferson, the complete works of William Shakespeare, the Project Gutenberg license text, and the Unix version of the English Dictionary that lives in /usr/share/dict/words.

To analyze the data, I loaded up the text, and stripped out all but the letters, digits, and the following punctuation: period, single-quote, double-quotes, exclamation mark, question mark, comma, semicolon, colon, left parenthesis, and right parenthesis2. I took all of the two-character combinations, and filtered out all pairs where one character was a space. Then I simply counted the number of instances.

Of course, the statistical analysis doesn’t match the experience of reading. While the frequency of combinations that start with an uppercase character followed by a lowercase character is low, those are possibly more important than combinations of lowercase characters. After all, they start out each sentence, and are very visually prominent. Additionally, the shapes of letters increases the propensity of these combinations to need kerning adjustments. With these thoughts in mind, I generated a file of statistics from the same texts, but based solely on combinations containing an uppercase character.

You can download the lists for your own nefarious purposes. Here’s the complete list, and here’s the list containing caps. In the complete list, there is what appears to be bad data. Keep in mind that the text contained such things as Roman Numeral chapter headers, older style numeric abbreviations (e.g., “3dly” and “23d”), some currency abbreviations (e.g., “1s.6d” or “1/6d”, both of which stand for 1 shilling and sixpence), and poetic contractions (e.g., “oer,” “stol’n,”, or “capdv’d”). I also see what I suspect are errors due to imperfect OCR of the original texts.

Last, but not least, I have two files which are my collection of The 128 Vitally Important Kerning Pairs and The 255 Important Kerning Pairs With One Repeat which comprise the most common combinations from the other two files as a single text for examination when testing a font.

1 Ideally, the way you define the spacing of the glyphs themselves saves you from having to tune all combinations. Most should start out looking pretty good. But you do, of course, want your font to lay out perfectly, hence the rest of this discussion.

2 This was admittedly an arbitrary choice of allowable punctuation. I also excluded accented characters like ü and à which would obviously need to be taken into consideration for many European languages. Since my focus was on English, I deemed them rare enough to ignore.

Fri, 11 Nov 2011

Imaginary Tech Conversation

— SjG @ 3:14 pm

Coder #1: Man, revision control sucks.
Coder #2: What? Are you insane?
Coder #1: Insane? I don’t think so.
Coder #2: How can you say revision control sucks, then?
Coder #1: Well, I’m stuck using SVN.
Coder #2: Subversion isn’t so bad. Especially with the new merge tracking stuff from version 1.5. It’s pretty easy to use.
Coder #1: Yeah, well, I stand by my assertion. It sucks.
Coder #2: Can you be more specific?
Coder #1: Have you tried using SVN in a mixed environment? Say Windows, Linux, and Mac OS?
Coder #2: It’s not going to be line-ending issues. Oh, I see. You’re talking about case-sensitive versus case-insensitive filesystems.
Coder #1: I wish it was that easy. That’s crappy, but you can work around it.
Coder #2: Then?
Coder #1: A riddle for you. When is UTF-8 not UTF-8?
Coder #2: Huh?
Coder #1: When it’s in a filename.
Coder #2: [clicks, reads]
Coder #1: But I guess if you don’t have any Macs or ZFS around, you’re fine.
Coder #2: Holy Linus on a unicycle! Well, I guess it’s back to dated tar files…
Coder #3: Use GIT, you IDIOT!

Fri, 21 Oct 2011

Using PHPExcel with Yii

— SjG @ 2:18 pm

I was running into problems with conflicting autoloaders when I tried to use PHPExcel with the Yii framework. There are instructions out on the web on how to resolve this, but they are only useful if you’re not creating instances of Yii active record models while you’re using PHPExcel. That’s fine if you’re using Yii to create an Excel spreadsheet from already instantiated Yii classes, but my need is to read an Excel spreadsheet and import its data into my database.

It turns out that this is fairly straightforward as well. The trick is in realizing that PHPExcel is registering its autoloader when it’s instantiated, so you don’t need to register it yourself.

public function actionImport()
  $message = '';
  if (!empty($_POST))
    $file = CUploadedFile::getInstanceByName('import');
    $spec = Yii::app()->basePath.'/data/imports/'.$file->name;
    spl_autoload_register(array('YiiBase', 'autoload'));
    try {
         $inputFileType = PHPExcel_IOFactory::identify($spec); 
         $objReader = PHPExcel_IOFactory::createReader($inputFileType);  
         if ($inputFileType != 'CSV')
        $objPHPExcel = $objReader->load($spec); 
        $objWorksheet = $objPHPExcel->setActiveSheetIndex(0);
        $highestRow = $objWorksheet->getHighestRow();
        for ($row = 1;$row < $highestRow+1; $row++)
             $myObjThing = new MyObject; // Yii AR model
             $myObjThing->someField = $objWorksheet->getCellByColumnAndRow(1, $row)->getValue();
             $myObjThing->otherField = $objWorksheet->getCellByColumnAndRow(5, $row)->getValue();
             $myObjThing->detachBehaviors(); // PHP < 5.3 memory management
    catch (Exception $e)
       $message = 'There was a problem handling your file. Technical details: '.$e->getMessage();
    if (! empty($message))

Pretend WordPress didn’t hose the formatting on that …

Tue, 18 Oct 2011

Publishing Old Projects

— SjG @ 9:52 pm

I’ve been publishing a bunch of old projects that I may have posted here, or simply left on my hard drive to suffer the slings and arrows of outrageous bit-rot. Most of these are projects that I created for some specific purpose or another, and have either coded to the point where I’m satisfied with them, or abandoned them.

I’m publishing this stuff in the hopes that it’ll be useful to somebody somewhere. In some cases, the code’s primary use may be as an example of how not to accomplish a task. In other cases, they’re projects that are being used in mission-critical operations, and so are reasonably robust.

I’ll be maintaining them on GitHub, if you want to get creative with the definition of “maintaining.”

Sat, 24 Sep 2011

Ffun ffmpeg ffunctionality

— SjG @ 2:59 pm

I’ve been processing a collection of product videos which came to me in a huge variety of sizes, aspect ratios, and qualities. I need to re-encode them to work in HTML 5, but, more importantly, I need to make them fit into a common player space on the web page.

It turns out that newer versions of ffmpeg support not only cropping, but also padding, and you can even do both operations at once!

For example, I had a source video that was originally 16:9, but had been letterboxed to 4:3, and then had two different sets of labels added. I needed to crop out the letterboxed portion and the top set of labels, and make the result fit nicely into 16:9. So I used VLC, a screen capture utility, and Photoshop to get the measurements. Then I used ffmpeg to crop the relevant section and pad it out to fit into my space (in this case, I’m left aligning the video in the padded output):

ffmpeg -i original/converted.wmv -vf crop=394:295:6:0,pad=524:295:0:0:0xFFFF00 -sameq

That’s cropping a 394 x 295 piece out of the original video (with the origin at 6 pixels from the left, and 0 pixels from the top), and then padding it out to 524 x 295 filling the padded area with bright yellow. The 524 x 295 is really close to 16:9 — and in a later process, it gets resized to the more standard 480 x 2721.

You can string together the padding and cropping in either order, depending on the effect you’re trying to achieve.

1I’m sure some educated person out there could tell me why video standards are so confused/confusing, down to the non-square pixels. While a true 16:9 would dictate 480 x 270 pixels, everybody seems to use 480 x 272. Why? The only thing I can figure out is that 272 is evenly divisible by a power of 2, which probably made display hardware cheaper to manufacture. As you can see, my resizing adds a bit of distortion, but at these resolutions, it doesn’t really matter.