Sat, 29 Jun 2013

8-bit Paleontology

— SjG @ 9:28 am

My maternal grandfather was a paleontologist of some note, yet it was no small surprise that L— broached the subject upon the occasion of my recent visit.

L—, a colleague from the heady days of the dot-com era and now a Web 2.0 millionaire in her own right, has used some of her new-found wealth to indulge in her original academic interest, the study of 8-bit Paleontology. In what was a rare honor, L— permitted me to view her private collection.

The collection is housed in the Library of her estate, a well-lit room dominated by the mounted skeleton of a contemporary Homocongregus namco. Surrounding this modern skeleton, she has a splendid display of fossils of some of the great precursor species, including Homocongregus dotovorus and Homocongregus arcada.

Today, however, she was pleased to show her latest acquisition. Opening a velvet-lined box, she revealed to me what is arguably the world’s best example of the immediate ancestor of Homocongregus namco: a gorgeous fossil of Homocongregus phasmaprosecutus, preserved in stone for all eternity in the very act of consuming a row of equally well-preserved Pacdotus powerupi. It is truly a stunning piece, worthy of any great museum, and an incontrovertible physical record of the evolutionary path of the species.

She kindly allowed me to photograph this specimen, so I am able to share this image with the world and contribute to the greater good of science:
Homocongregus phasmaprosecutus

Thu, 13 Jun 2013

Failures in Image Processing

— SjG @ 10:00 am

I tend to have ideas that seem simple until I attempt to implement them. Then all of the complexities jump out at me, and I give up and move on to the next idea. Once in a while, however, I’ll go so far as to prototype something or play around before abandoning ship. This here’s an example of the latter.

The Big Idea hit when I was looking up a recruiter who had emailed me out of the blue. He shared the name of someone I went to school with, and I wanted to see if he was, in fact, the same person. In this case, a quick Google Images search on the name and job title indicated that he was not, so I didn’t have to fake remembered camaraderie.

While searching, though, I was struck by the variety of faces that showed up for that name. Hm. Wouldn’t it be cool, thought I, if I could enter a name into my simple little web service and have it spit back the “average face” of the first ten or twenty matching images from Google? After writing a few pages of code, scrapping them, switching languages and libraries, writing a few pages of code, scrapping them, switching languages and libraries again, and writing a few more pages of code, I ditched the whole enterprise.

In the process, I did use some morphing software to test the underlying concept. Here is the average face produced by mixing my picture with the first seven Samuel Goldstein images that were clear enough, correctly angled, and large enough to work with:

For what it’s worth, here are a few of the software challenges to automating something like this:

  • Extracting the portrait from the background. This isn’t critical, but will simplify subsequent tasks.
  • Scaling the heads to be the same size.
  • Roughly aligning the images. I planned to use the eyes as the critical alignment points; if the eyes couldn’t be aligned within some tolerance, that would suggest incompatible images (e.g., one a 3/4 portrait, the other straight on).
  • Detail extraction: finding corresponding key points on each image. From my experiments combining images by hand, it may be sufficient to match:
    • 4 points for each eye, marking the boundaries
    • 3 points marking the outer edges of the nostrils and the bottom center of the septum
    • 5 points marking ear boundaries: top point where they connect to the head, top of extension, center, bottom of lobe, point where lobe connects to head
    • 5 points marking the top outer edges, the outer angles, and the center of the mandible
    • 5 points mapping hairline
    • 5 points mapping top of hair
    • 7 points along the superciliary ridge
  • Interpolate these points between each pair of images, then between each pair of interpolated results, and so on until a single final interpolation remains
  • Render the image to a useful format
  • View image, and laugh (or cry)
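The scaling, alignment, and interpolation steps above can be sketched for the landmark points alone; warping the actual pixels, which a real morph also needs, is not covered. This is a minimal PHP sketch under stated assumptions: each face is a list of [x, y] landmarks in the same order, with indices 0 and 1 holding the left- and right-eye centers (a hypothetical convention; with four points per eye, each center could be taken as the mean of those points). It handles scaling and alignment together with a similarity transform fixed by the two eyes, which is one common approach rather than necessarily the one I would have used. The names normalizeByEyes, midpointSet, and pairwiseAverage are illustrative, not from any library.

```php
<?php
// Sketch: align landmark sets by the eyes, then average them by repeated
// pairwise interpolation. Hypothetical convention: every set lists [x, y]
// points in the same order, index 0 = left-eye center, 1 = right-eye center.

// Similarity transform sending the left eye to (0, 0) and the right eye to
// (1, 0); treating points as complex numbers, q = (p - L) / (R - L). This
// removes differences in position, rotation, and head size in one step.
function normalizeByEyes(array $points): array
{
    [$lx, $ly] = $points[0];
    [$rx, $ry] = $points[1];
    $c = $rx - $lx;
    $d = $ry - $ly;
    $den = $c * $c + $d * $d; // squared distance between the eyes
    if ($den == 0) {
        throw new InvalidArgumentException('eye points coincide');
    }
    $out = array();
    foreach ($points as [$x, $y]) {
        $a = $x - $lx;
        $b = $y - $ly;
        $out[] = array(($a * $c + $b * $d) / $den, ($b * $c - $a * $d) / $den);
    }
    return $out;
}

// Interpolate halfway between two landmark sets, point by point.
function midpointSet(array $p, array $q): array
{
    $mid = array();
    foreach ($p as $i => [$x, $y]) {
        $mid[] = array(($x + $q[$i][0]) / 2, ($y + $q[$i][1]) / 2);
    }
    return $mid;
}

// Interpolate neighboring pairs, then pairs of results, until one set remains.
// With an odd count, the last set is carried into the next round unchanged.
// Assumes at least one set is passed in.
function pairwiseAverage(array $sets): array
{
    while (count($sets) > 1) {
        $next = array();
        $n = count($sets);
        for ($i = 0; $i + 1 < $n; $i += 2) {
            $next[] = midpointSet($sets[$i], $sets[$i + 1]);
        }
        if ($n % 2 === 1) {
            $next[] = $sets[$n - 1];
        }
        $sets = $next;
    }
    return $sets[0];
}

// Example: the same three-point "face" (eyes plus nose tip), once as-is, once
// shifted and scaled, once rotated 90 degrees; after normalization they coincide.
$faces = array(
    array(array(10, 20), array(30, 20), array(20, 35)),
    array(array(100, 100), array(140, 100), array(120, 130)),
    array(array(0, 0), array(0, 20), array(-15, 10)),
);
$average = pairwiseAverage(array_map('normalizeByEyes', $faces));
printf("nose tip: (%.2f, %.2f)\n", $average[2][0], $average[2][1]);
```

Running it prints `nose tip: (0.50, 0.75)`. One design note: interpolating pairs halfway gives every face equal weight only when the number of faces is a power of two, as with the eight pictures (mine plus seven) in the example above. With three sets whose single x-coordinates are 0, 3, and 9, pairwiseAverage returns 5.25 rather than the plain mean of 4, so averaging all sets at once may be safer when the count varies.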

A few other lessons learned:

The typical picture returned by Google images search for a name will be a thumbnail portrait. It’ll be small — maybe 150 by 300 pixels or so. While that’s enough data to recognize a face, it’s not a lot of data when you start manipulating it. Ideally, for nicer results, source images should be a minimum of two or three times that size.

Google gives back different results depending on whether or not you surround the name with quotes; it also makes a big difference if you pass size or color parameters. The “face” search is a good parameter to use, although for “Samuel Goldstein” it inexplicably returned lots of Hitlers and Osama Bin Ladens. The “safe search” feature is also strongly recommended for this project: searching for “Samuel Goldstein” without safe search yielded a variety of unexpected vaginas.

Bing image search gives different results (not too surprisingly), but it also has anomalies of its own. My search brought back many of the same pictures, along with an inexplicable collection of calabashes and LP labels.

If any ambitious programmers go ahead and implement this, please send me the link when you’re done!

Wed, 12 Jun 2013

Smart fixtures for testing

— SjG @ 1:28 pm

Because we’re not completely insane, we run automated testing of the web sites and web-based applications we develop. Because we are busy, we probably don’t do as much testing as we’d like. While there’s always room for improvement, though, having both unit tests and functional tests is a huge, huge win.

For enterprise intranet sites, there are a lot of things which are time-based. There are documents that get published or expired on given dates, users who receive notifications on a schedule, financial tables that depend on the fiscal year, and so on. If you test these functions (and, of course, you do), you may discover your automated tests suddenly fail on the first day of the quarter or some other threshold. This will likely be due to hard-coded dates in your test fixtures (and surely not because of boundary-condition bugs in your code).

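There is a simple solution. Where it’s appropriate, you can make your fixtures “smart” by computing dates relative to the current time instead of hard-coding them. Messy, beautiful functions like PHP’s strtotime, which parses relative expressions such as “first day of next month”, make this easy.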

For example, here’s a snippet from a fixture for a Yii-based project. It’s for a message data table, and I want to be able to guarantee that there’s at least one valid message and one expired message no matter when I’m testing:

```php
<?php
return array( // Yii fixture rows; the row aliases here are illustrative
    'currentMessage'=>array( // published last month, expires at the end of this month
        "Content"=>"There will be an important announcement on ".date('m/d/Y',strtotime('first day of next month')),
        "PublicationDate"=>date('Y-m-d H:i:s',strtotime('midnight first day of last month')),
        "ExpirationDate"=>date('Y-m-d H:i:s',strtotime('midnight first day of next month -1 second')),
    ),
    'expiredMessage'=>array( // expired by the start of this month
        "Subject"=>"Monthly Results",
        "Content"=>"This message is never current",
        "PublicationDate"=>date('Y-m-d H:i:s',strtotime('midnight first day of last month -2 month')),
        "ExpirationDate"=>date('Y-m-d H:i:s',strtotime('midnight first day of this month -1 day')),
    ),
);
```

With a fixture like that, I can make safe assumptions about time-related displays no matter what the current date is.
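One way to check that claim without waiting for a month boundary is to evaluate the same strtotime expressions against an arbitrary “now” by passing it as strtotime’s second argument, the base timestamp. This is an illustrative sketch: messageWindows and isLive are hypothetical helpers, the sample dates are arbitrary boundary cases, and pinning the timezone to UTC is an assumption made so the results are reproducible.

```php
<?php
// Sketch: evaluate the fixture's date expressions against a chosen "now"
// by passing it as strtotime()'s base timestamp.
date_default_timezone_set('UTC'); // assumption: pin the zone for reproducibility

// Hypothetical helper: [publication, expiration] timestamps for each fixture row.
function messageWindows(int $now): array
{
    return array(
        'current' => array(
            strtotime('midnight first day of last month', $now),
            strtotime('midnight first day of next month -1 second', $now),
        ),
        'expired' => array(
            strtotime('midnight first day of last month -2 month', $now),
            strtotime('midnight first day of this month -1 day', $now),
        ),
    );
}

// Hypothetical helper: a message is live when now lies within its window.
function isLive(array $window, int $now): bool
{
    return $window[0] <= $now && $now <= $window[1];
}

// Arbitrary boundary cases: month start and end, year rollover, leap day.
$samples = array(
    '2013-01-01 09:00', '2013-01-31 23:00', '2012-02-29 12:00',
    '2013-12-31 18:00', '2013-06-12 13:28',
);
foreach ($samples as $sample) {
    $now = strtotime($sample);
    $w = messageWindows($now);
    printf(
        "%s  current: %s  expired: %s\n",
        $sample,
        isLive($w['current'], $now) ? 'live' : 'not live',
        isLive($w['expired'], $now) ? 'live' : 'not live'
    );
}
```

For each sample, the current row should come out live and the expired row not live. The “first day of” phrasing matters here: a bare “next month” applied to January 31 overflows into March in PHP, while anchoring to the first day of the month sidesteps that.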