fogbound.net




Page 10 of 60

Sat, 29 Jun 2013

8-bit Paleontology

— SjG @ 9:28 am

My maternal grandfather was a paleontologist of some note, yet it was no small surprise that L— broached the subject upon the occasion of my recent visit.

L—, a colleague from former heady days in the dot-com era and now a Web 2.0 millionaire in her own right, has used some of her new-found wealth to indulge in her original academic interest, the study of 8-bit Paleontology. In what was a rare honor, L— permitted me to view her private collection.

This collection, housed in the Library of her estate, is a well-lit room, dominated by the mounted skeleton of a contemporary Homocongregus namco. Surrounding this modern skeleton, she has a splendid display of fossils of some of the great precursor species, including Homocongregus dotovorus and Homocongregus arcada.

Today, however, she was pleased to show her latest acquisition. Opening a velvet-lined box, she revealed to me what is arguably the world’s best example of the immediate ancestor to Homocongregus namco: a gorgeous fossil of Homocongregus phasmaprosecutus preserved in stone for all eternity in the very act of consuming a row of equally well preserved Pacdotus powerupi. It is truly a stunning piece, worthy of any great museum, and incontrovertible physical record of the evolutionary path of the species.

She kindly allowed me to photograph this specimen, thus I am able to present this image to share with the world, and contribute to the greater good of science:
Homocongregus phasmaprosecutus


Thu, 13 Jun 2013

Failures in Image Processing

— SjG @ 10:00 am

I have a tendency to have ideas that seem simple until I attempt the implementation. Then, all of the complexities jump out at me, and I give up and move on to the next idea. Once in a while, however, I’ll go so far as to prototype something or play around before abandoning ship. This here’s an example of the latter.

The Big Idea hit when I was looking up a recruiter who had emailed me out of the blue. He shared the name of someone I went to school with, and I wanted to see if it was, in fact, the same person. In this case, a quick Google Images search on the name and job title indicated that it was not the same person, so I didn’t have to fake remembered camaraderie.

While searching, though, I thought it interesting the variety of faces that showed up for that name. Hm. Wouldn’t it be cool, thought I, if I could enter a name into my simple little web service, and it would spit back the “average face” of the first ten or twenty matching images from Google? After writing a few pages of code, scrapping them, switching languages and libraries, writing a few pages of code, scrapping them, switching languages and libraries again, writing a few more pages of code, I then ditched the whole enterprise.

In the process, I did use some morphing software to test the underlying concept. Here is the average face for my picture mixed in with the first seven Samuel Goldstein images that were clear enough, angled correctly, and of sufficient size to work with:

For what it’s worth, here are a few of the software challenges to automating something like this:

  • Extracting the portrait from the background. This isn’t critical, but will simplify subsequent tasks.
  • Scaling the heads to be the same size.
  • Aligning the images, more or less. I was going to use the eyes as the critical alignment points; if they couldn’t be aligned within a certain degree of accuracy, this would suggest incompatible images (e.g., one a 3/4 portrait, the other straight on).
  • Detail extraction. This is finding key points that match on each image. Experimentally, when combining images by hand, it may be sufficient to match:
    • 4 points for each eye, marking the boundaries
    • 3 points marking the extremities of the nostrils and bottom center of septum
    • 5 points marking ear boundaries: top point where they connect to the head, top of extension, center, bottom of lobe, point where lobe connects to head
    • 5 points marking top outer edges, outer angle, and center of mandible
    • 5 points mapping hairline
    • 5 points mapping top of hair
    • 7 points along the superciliary ridge
  • Interpolate these points on each pair of images, then on each interpolated pair, and so on until a single final interpolation is achieved
  • Render the image to a useful format
  • View image, and laugh (or cry)
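
The interpolation step, at least, is the easy part. Here’s a rough sketch (in Python, since I never got far enough to commit to a language) of reducing several faces’ landmark sets down to a single averaged set by repeatedly interpolating pairs, as described above. The function names and the flat list-of-points representation are my own invention, not from any particular library:

```python
# Sketch: reduce several faces' landmark sets to one "average" set by
# repeatedly interpolating pairs, as outlined in the steps above.
# Each face is a list of (x, y) landmark points in a fixed order
# (eye corners, nostrils, ears, mandible, hairline, etc.).

def interpolate_pair(a, b, t=0.5):
    """Linearly interpolate two same-length landmark lists."""
    return [((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
            for (ax, ay), (bx, by) in zip(a, b)]

def average_landmarks(faces):
    """Pairwise-reduce a list of landmark sets to a single set."""
    while len(faces) > 1:
        reduced = [interpolate_pair(faces[i], faces[i + 1])
                   for i in range(0, len(faces) - 1, 2)]
        if len(faces) % 2:            # odd face out rides along to the next round
            reduced.append(faces[-1])
        faces = reduced
    return faces[0]
```

Note that with an odd number of faces this pairwise scheme weights the images unequally; a straight mean of each point across all faces avoids that, but the pairwise reduction matches the by-hand morphing workflow described above.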

A few other lessons learned:

The typical picture returned by Google images search for a name will be a thumbnail portrait. It’ll be small — maybe 150 by 300 pixels or so. While that’s enough data to recognize a face, it’s not a lot of data when you start manipulating it. Ideally, for nicer results, source images should be a minimum of two or three times that size.

Google gives back different results depending on whether you surround the name with quotes or not; it also makes a big difference if you pass size or color parameters. The “face” search is a good parameter to use, although when searching for “Samuel Goldstein,” face search inexplicably returned lots of Hitlers and Osama Bin Ladens. The “safe search” feature is also strongly recommended for this project — again, searching for “Samuel Goldstein” without safe search yielded a variety of unexpected vaginas.

Bing image search gives different results (not too surprisingly), but they also have some anomalies. My search brought back many of the same pictures, along with an inexplicable collection of calabashes and LP labels.

If any ambitious programmers go ahead and implement this, please send me the link when you’re done!


Wed, 12 Jun 2013

Smart fixtures for testing

— SjG @ 1:28 pm

Because we’re not completely insane, we run automated testing of the web sites and web-based applications we develop. Because we are busy, we probably don’t do as much testing as we’d like. While there’s always room for improvement, though, having both unit tests and functional tests is a huge, huge win.

For enterprise intranet sites, there are a lot of things which are time-based. There are documents that get published or expired on given dates, users who receive notifications on a schedule, financial tables that depend on the fiscal year, and so on. If you test these functions (and, of course, you do), you may discover your automated tests suddenly fail on the first day of the quarter or some other threshold. This will likely be due to hard-coded dates in your test fixtures (and surely not because of boundary-condition bugs in your code).

There is an easy solution. Where it’s appropriate, you can make your fixtures “smart” by using adaptive logic. Messy, beautiful functions like PHP’s strtotime make this easy.

For example, here’s a snippet from a fixture for a Yii-based project. It’s for a message data table, and I want to be able to guarantee that there’s at least one valid message and one expired message no matter when I’m testing:


"message_1" => array(
    "Id" => 1,
    "Subject" => "Meeting",
    "Content" => "There will be an important announcement on " . date('m/d/Y', strtotime('first day of next month')),
    "PublicationDate" => date('Y-m-d H:i:s', strtotime('midnight first day of last month')),
    "ExpirationDate" => date('Y-m-d H:i:s', strtotime('midnight first day of next month -1 second')),
),
"message_2" => array(
    "Id" => 2,
    "Subject" => "Monthly Results",
    "Content" => "This message is never current",
    "PublicationDate" => date('Y-m-d H:i:s', strtotime('midnight first day of last month -2 month')),
    "ExpirationDate" => date('Y-m-d H:i:s', strtotime('midnight first day of this month -1 day')),
),

With a fixture like that, I can make safe assumptions about time-related displays no matter what the current date is.
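
The same trick works outside of PHP, of course, even without anything as wonderfully sloppy as strtotime. As a sketch, here’s roughly how the equivalent adaptive boundaries for message_1 could be computed in Python — the helper names are mine, not from any framework:

```python
from datetime import datetime, timedelta

def first_day_of_next_month(now=None):
    """Midnight on the first day of the month after `now`."""
    now = now or datetime.now()
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return datetime(year, month, 1)

def first_day_of_last_month(now=None):
    """Midnight on the first day of the month before `now`."""
    now = now or datetime.now()
    year, month = (now.year - 1, 12) if now.month == 1 else (now.year, now.month - 1)
    return datetime(year, month, 1)

# message_1's window always spans "now," whatever "now" happens to be.
publication = first_day_of_last_month()
expiration = first_day_of_next_month() - timedelta(seconds=1)
assert publication <= datetime.now() <= expiration
```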


Fri, 24 May 2013

JavaScript approximation for Pi

— SjG @ 11:01 am

Based on this tweet, I now have the ultimate JavaScript approximation for Pi (π), which I think we can all agree is preferable in every way to the outmoded Math.PI:

var pi=((++[+[]][+[]]+[]+ ++[+[]][+[]]+[])* ++[+[]][+[]])*(++[+[]][+[]]+ ++[+[]][+[]])/((+[+[]]+'x'+(![]+[])[[+!+[]+!+[]]*[+!+[]+!+[]]])/(++[+[]][+[]]+ ++[+[]][+[]]));
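
For anyone who doesn’t feel like evaluating all of those type coercions by hand: as far as I can tell, the numerator works out to 22, and the denominator to "0xe" (the "e" being plucked out of "false") divided by 2, i.e. 7, making the whole thing good old 22/7. A quick sanity check — in Python, just to keep things confusing:

```python
import math

# The obfuscated expression reduces to the classic schoolbook
# approximation 22/7: "0xe" coerces to 14, and 14/2 = 7.
assert int("0xe", 16) == 14
pi_approx = 22 / 7
assert abs(pi_approx - math.pi) < 0.0013   # good to about 3 decimal places
```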


Sun, 21 Apr 2013

Measuring network traffic between two hosts

— SjG @ 10:28 am

For a project that communicates over an expensive network connection (i.e., one that charges by the kilobyte), I needed to find out exactly how many bytes a specific process was going to transfer between my source host and a destination machine. For my own nefarious purposes, I need to know how many bytes of payload data I’m sending/receiving, but I also need to know the true total data transfer, including TCP/IP headers, etc.

Over the years, I have accumulated a few tricks for measuring this sort of thing. Usually, though, I’ve had to measure one particular kind of traffic (specifically, HTTP) — in which case, it’s not hard to set up a proxy using nc. In this latest case, however, the process not only uses HTTP/HTTPS but also ssh to issue remote commands, so I need to monitor all TCP/IP traffic between the machines.

There are other tools that are sometimes helpful. For example, to see what’s using up bandwidth at a given moment, a tool like iftop is great. Unfortunately, I need to know the aggregates, and iftop doesn’t log to a file in a way that I can use.

If I were on a pure Linux environment, it looks like IPTraf would do what I want, but I’m using a Mac.

I don’t doubt that there are much better approaches out there [1], but here’s what I used (pretending that the remote host was at IP 192.168.1.100):

sudo tcpdump -e host 192.168.1.100 > net_process_log.txt
perl -p -i.bak -e 's/(.*?)length (\d+):(.*)length (\d+)/$2,$4/g' net_process_log.txt
cut -d , -f 1 net_process_log.txt > actual_size.txt
cut -d , -f 2 net_process_log.txt > data_size.txt
awk '{s+=$1} END {print s}' actual_size.txt
138099
awk '{s+=$1} END {print s}' data_size.txt
102412

So, in my example, I’m using tcpdump to output all traffic between my machine and the host 192.168.1.100. Typical records output from tcpdump look like:

11:13:23.834080 xx:xx:xx:xx:xx:xx (oui Unknown) > xx:xx:xx:xx:xx:xx (oui Unknown), ethertype IPv4 (0x0800), length 292: dvr.home.http > apotheosis.home.64602: Flags [P.], seq 1:227, ack 468, win 1354, options [nop,nop,TS val 132300159 ecr 1750033024], length 226
11:13:23.834081 xx:xx:xx:xx:xx:xx (oui Unknown) > xx:xx:xx:xx:xx:xx (oui Unknown), ethertype IPv4 (0x0800), length 66: dvr.home.http > apotheosis.home.64602: Flags [.], ack 469, win 1354, options [nop,nop,TS val 132300159 ecr 1750033026], length 0

There are two lengths specified: the first is the actual packet size, and the second is the payload of the packet. As you can see in the second packet, the payload is zero bytes, but the packet length is 66 bytes.

In any case, I use perl to extract the two lengths into a comma-delimited file, cut to split out the columns, and awk to add them up. It’d be trivial to do all these steps together in a short perl program, but I like keeping around tons of obscure text files from forgotten procedures on my hard drive. Well, actually, I did it this way so I could sanity-check the intermediate steps.
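
For the record, that one-pass program might look something like this — a Python sketch rather than perl, with the regex adapted from the substitution above (the function name and output format are mine):

```python
import re

# One-pass version of the perl/cut/awk pipeline above: pull both
# "length N" values out of each `tcpdump -e` line and sum them.
# Group 1 is the on-the-wire packet length, group 2 the payload length.
LINE_RE = re.compile(r'length (\d+):.*length (\d+)')

def sum_lengths(lines):
    total_actual = total_payload = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            total_actual += int(m.group(1))
            total_payload += int(m.group(2))
    return total_actual, total_payload

# Usage: sum_lengths(open('net_process_log.txt'))
```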

So, this taught me that my process transferred 102,412 bytes of payload, and with the TCP/IP packet overhead transmitted a total of 138,099 bytes.

[1] I didn’t discover bwm-ng until after I did my measurement. It looks like it might be a good solution as well. I probably could have used Wireshark too.

