I have a tendency to come up with ideas that seem simple until I attempt the implementation. Then all of the complexities jump out at me, and I give up and move on to the next idea. Once in a while, however, I’ll go so far as to prototype something or play around before abandoning ship. This here’s an example of the latter.
The Big Idea hit when I was looking up a recruiter who had emailed me out of the blue. He shared the name of someone I went to school with, and I wanted to see if it was, in fact, the same person. In this case, a quick Google Images search on the name and job title indicated that it was not, so I didn’t have to fake remembered camaraderie.
While searching, though, I was struck by the variety of faces that showed up for that name. Hm. Wouldn’t it be cool, thought I, if I could enter a name into my simple little web service, and it would spit back the “average face” of the first ten or twenty matching images from Google? After writing a few pages of code, scrapping them, switching languages and libraries, writing a few more pages, scrapping those too, switching languages and libraries again, and writing yet a few more pages, I ditched the whole enterprise.
In the process, I did use some morphing software to test the underlying concept. Here is the average face for my picture mixed in with the first seven Samuel Goldstein images that were clear enough, angled correctly, and of sufficient size to work with:

For what it’s worth, here are a few of the software challenges to automating something like this:
- Extracting the portrait from the background. This isn’t critical, but it will simplify subsequent tasks (a background-extraction sketch follows this list).
- Scaling the heads to be the same size.
- Aligning the images, more or less. I was going to use the eyes as the critical alignment points; if they couldn’t be matched within a certain degree of accuracy, that would suggest incompatible images (e.g., one a 3/4 portrait, the other straight on). A sketch of eye-based scaling and alignment follows this list.
- Detail extraction. This means finding key points that match on each image. Experimentally, when combining images by hand, it may be sufficient to match:
  - 4 points for each eye, marking the boundaries
  - 3 points marking the extremities of the nostrils and the bottom center of the septum
  - 5 points marking ear boundaries: the top point where the ear connects to the head, the top of its extension, the center, the bottom of the lobe, and the point where the lobe connects to the head
  - 5 points marking the top outer edges, outer angles, and center of the mandible
  - 5 points mapping the hairline
  - 5 points mapping the top of the hair
  - 7 points along the superciliary ridge
- Interpolate these points on each pair of images, then on each interpolated pair, and so on, until a single final interpolation is achieved (the pairwise-reduction sketch after this list shows the structure)
- Render the image to a useful format
- View image, and laugh (or cry)
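If I were to take another run at it, OpenCV’s GrabCut seems like a plausible tool for the background extraction. A minimal sketch, not code I actually wrote; the initial bounding rectangle is a pure guess that would need per-image tuning:

```python
import cv2
import numpy as np

def extract_portrait(image_path):
    """Roughly separate the subject from the background with GrabCut."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    mask = np.zeros((h, w), np.uint8)

    # Pure guess: assume the subject fills the middle ~80% of the frame.
    rect = (w // 10, h // 10, w * 8 // 10, h * 8 // 10)

    bg_model = np.zeros((1, 65), np.float64)
    fg_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bg_model, fg_model, 5,
                cv2.GC_INIT_WITH_RECT)

    # Keep pixels marked definite or probable foreground; zero the rest.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                  1, 0).astype(np.uint8)
    return img * fg[:, :, np.newaxis]
```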
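For the scaling and eye-based alignment, OpenCV’s bundled Haar cascades can find eyes well enough to compute a similarity transform that drops both eye centers onto fixed canonical positions, which handles scale, rotation, and translation in one go. The canonical coordinates below are arbitrary; again, a sketch rather than anything I shipped:

```python
import cv2
import numpy as np

EYE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

# Arbitrary canonical eye positions in a 400x400 output frame.
CANVAS = (400, 400)
LEFT_EYE, RIGHT_EYE = (140.0, 160.0), (260.0, 160.0)

def align_by_eyes(img):
    """Scale/rotate/translate so the eyes land on canonical points.

    Returns None when two eyes can't be found -- the incompatible-image
    case (a 3/4 portrait vs. a straight-on shot, say).
    """
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = EYE_CASCADE.detectMultiScale(gray, 1.1, 5)
    if len(eyes) < 2:
        return None

    # Take the two largest detections, ordered left-to-right.
    eyes = sorted(eyes, key=lambda e: -e[2] * e[3])[:2]
    centers = sorted((x + w / 2, y + h / 2) for x, y, w, h in eyes)

    # Similarity transform mapping detected eye centers to the canon.
    src = np.float32([centers[0], centers[1]])
    dst = np.float32([LEFT_EYE, RIGHT_EYE])
    m, _ = cv2.estimateAffinePartial2D(src, dst)
    if m is None:
        return None
    return cv2.warpAffine(img, m, CANVAS)
```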
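The tournament-style interpolation at the end is just a pairwise reduction. The sketch below assumes each face is an (image array, landmark array) pair of NumPy arrays and blends 50/50 at each round; a real morph would warp both images toward the averaged landmarks (Delaunay triangulation, piecewise-affine warps) before blending, which I’m hand-waving past here:

```python
import numpy as np

def midpoint(a, b):
    """Naive 50/50 blend of two (image, landmarks) pairs.

    A real morph would warp both images toward the averaged landmarks
    before averaging pixels; a straight pixel blend is a stand-in.
    """
    img_a, pts_a = a
    img_b, pts_b = b
    return ((img_a.astype(np.float64) + img_b.astype(np.float64)) / 2,
            (pts_a + pts_b) / 2)

def average_face(faces):
    """Reduce pairs, then pairs of pairs, down to one final blend."""
    while len(faces) > 1:
        nxt = [midpoint(faces[i], faces[i + 1])
               for i in range(0, len(faces) - 1, 2)]
        if len(faces) % 2:  # odd one out rides along to the next round
            nxt.append(faces[-1])
        faces = nxt
    return faces[0]
```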
A few other lessons learned:
The typical picture returned by a Google Images search for a name will be a thumbnail portrait. It’ll be small, maybe 150 by 300 pixels or so. While that’s enough data to recognize a face, it’s not a lot of data once you start manipulating it. Ideally, for nicer results, source images should be at least two or three times that size.
Google gives back different results depending on whether you surround the name with quotes; it also makes a big difference if you pass size or color parameters. The “face” search is a good parameter to use, although when searching for “Samuel Goldstein”, face search inexplicably turned up lots of Hitlers and Osama Bin Ladens. The “safe search” feature is also strongly recommended for this project: searching for “Samuel Goldstein” without safe search yielded a variety of unexpected vaginas.
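As best I can reconstruct them, those knobs map to undocumented URL parameters on Google’s image-search endpoint (tbm=isch for image results, tbs=itp:face for faces, safe=active for safe search); they’re liable to change without notice, so treat this as a sketch:

```python
from urllib.parse import urlencode

def image_search_url(name, face_only=True, safe=True):
    """Build a Google Images URL for an exact-quoted name search.

    The parameter names (tbm, tbs, safe) are Google's undocumented
    URL scheme as I understand it; they could change at any time.
    """
    params = {
        "q": f'"{name}"',   # quotes force an exact-phrase match
        "tbm": "isch",      # image results
    }
    if face_only:
        params["tbs"] = "itp:face"
    if safe:
        params["safe"] = "active"  # strongly recommended, per above
    return "https://www.google.com/search?" + urlencode(params)

print(image_search_url("Samuel Goldstein"))
```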
Bing image search gives different results (not too surprisingly), but it has anomalies of its own. My search brought back many of the same pictures, along with an inexplicable collection of calabashes and LP labels.
If any ambitious programmers go ahead and implement this, please send me the link when you’re done!