Fri, 2 Jan 2009

Address Book

— SjG @ 3:09 pm

So. Millions of people use Apple’s Address Book. Someone must have a solution to the problem of households … but somehow I have been left in the dark.

Stated more precisely:
I have numerous friends for whom I have multiple contacts. For example, Alice Code and Bob Crypto are what we now call pairbonded partners. They share a common home address, but separate cell phones, email, work addresses, work email, etc. So I end up keeping three records for the two of them: one for Alice, one for Bob, and one for Bob and Alice.

Now, three records for two people isn’t so onerous, particularly when there’s not a lot of redundant information. But if they happen to have a land-line and change the number, I have to update three records (unless I put the home number only in the shared record, in which case I’d have to remember that protocol). And the fact is that there are a lot of redundant notes I try to keep, such as anniversaries, favorite charities, etc. This makes the system unwieldy, especially since my whole goal is to reduce the amount of stuff I have to remember and maintain.
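One way to frame the problem is as a data-model question: store household details once and have each person refer to them, instead of copying them into every card. Here is a hypothetical Python model of that idea (the `Household` and `Person` classes, the street address, and the phone numbers are all made up for illustration). It is not how Apple’s Address Book or Palm OS actually store contacts, and it says nothing about syncing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Household:
    """Details shared by everyone at one address, stored exactly once."""
    address: str
    home_phone: str = ""
    notes: list[str] = field(default_factory=list)  # anniversaries, charities, ...


@dataclass
class Person:
    """Per-person details plus a reference (not a copy) to a household."""
    name: str
    cell: str = ""
    email: str = ""
    household: Household | None = None

    def home_phone(self) -> str:
        return self.household.home_phone if self.household else ""


home = Household("123 Example St.", home_phone="555-0100")
alice = Person("Alice Code", cell="555-0111", household=home)
bob = Person("Bob Crypto", cell="555-0122", household=home)

# Changing the land-line once updates what both people "see".
home.home_phone = "555-0199"
print(alice.home_phone(), bob.home_phone())  # 555-0199 555-0199
```

Shared notes work the same way: they live on `Household`, so there is exactly one place to edit them.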

So how do people in the real world manage this? Bonus points for solutions that will still work when I sync to my Palm-OS based phone!

Sat, 13 Sep 2008

Generating Plausible Fake User Data

— SjG @ 6:45 pm

So it’s a familiar problem: you’re developing a data-driven application and want to optimize the queries that will run against your database (I’ll have more interesting stuff on this later). The problem, of course, is that to really optimize those queries, you need a lot of sample data.

So I needed to run some address-lookup code against a huge collection of users. But because I might have to demo the prototype, I really didn’t want 100,000 users named “Foo McBar” living at “10101 Binary Place.” So, with the help of the almighty Internet, the all-frobnicating Perl, and the all-knowing US Bureau of the Census, I created a quick, semi-flexible script that generates people with plausible names and addresses which, if not Google-mappable, at least have consistent city/state/zip combinations. The city/state/zip data comes from a collection of 250 random zip codes; if you have good zip code data, you can easily extend this to be complete! Names are drawn from the most popular forenames and surnames, with a probabilistic bias towards the most common ones. The script also supports “pick one of n items” fields, numbers from a range, plausible email addresses, not-very-plausible phone numbers with or without extensions, and export as CSV or tab-delimited text.
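The “probabilistic bias towards the most common ones” amounts to a weighted random choice. The post doesn’t show how the Perl script does it, so here is a minimal Python sketch using the standard library’s `random.choices`, with a short hypothetical surname list and made-up weights standing in for real Census frequencies.

```python
import random


def weighted_pick(items, weights, rng=random):
    """Return one item, chosen with probability proportional to its weight."""
    return rng.choices(items, weights=weights, k=1)[0]


# Illustrative weights only, not real Census frequencies.
surnames = ["Smith", "Johnson", "Williams", "Brown", "Jones"]
weights = [5, 4, 3, 2, 1]

rng = random.Random(42)  # seeded so runs are reproducible
sample = [weighted_pick(surnames, weights, rng) for _ in range(10_000)]
print({name: sample.count(name) for name in surnames})
```

With these weights, “Smith” should come up roughly five times as often as “Jones”, while every name still appears.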

In principle, this should be easy to adapt to other countries, although you’ll need lists of common first names, surnames, street names, and a way of mapping cities to regions, states, districts, cantons, or whatever’s appropriate.

You can grab a copy of it here. It requires a Perl interpreter with the Text::CSV and Getopt::Long CPAN modules.

Usage: [OPTIONS]
   -t, --header : header, a colon-delimited list of column headers
   -f, --format : format string, a colon-delimited list of column contents
       data types:
         fn - first name
         ln - last name
         a1 - street address
         a2 - apartment number
         c - city*
         s - state*
         z - zip 5*
         e - email address
         pne - phone (US), no extension
         pwe - phone (US), with extension
         [a,b,c] - one of a, b, or c
         {a,b,c} - one of a, b, or c in decreasing probability
         [x-y] - a number between x and y, inclusive

         * city, state, and zip will agree, forming a valid address.
           If you need multiple addresses, use the code ! to reset the
           sync. The reset works on a left-to-right scan of the format string.

   -n, --number : number of records to create

   -c, --csv : output CSV format (otherwise, tab-delimited)
   -v, --(no)verbose : verbose mode (default false)


Viajante:samuelg$ --header "First:Last:Age:Email" --format "fn:ln:[10-100]:e" -n 5 --c

or, more exotically:

Viajante:samuelg$ --header "First Name:Last Name:Address:City:State:Zip:Super Power" --format "fn:ln:a1:c:s:z:[Invisibility,Invincibility,X-Ray Vision,Flight,Likes Squirrels]" -n 5 -c
"First Name","Last Name",Address,City,State,Zip,"Super Power"
Roseanna,Best,"8821 7th Str.",Manati,PR,00674,Flight
Euna,Crawford,"8195 Lee Str.","Fort Washington",PA,19034,Invincibility
Ted,Williams,"7140 Birch Ave.",Monroe,CT,06468,Invincibility
Mariano,Miranda,"2657 1st Way",Lyford,TX,78569,Flight
Tammy,Flowers,"2135 Washington Blvd.",Duluth,MN,55806,"Likes Squirrels"
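To make the format-string idea concrete, here is a hypothetical Python sketch of how a string like `fn:ln:[10-100]:c:s:z` could be turned into one generator per column. It is not a port of the Perl script: it covers only a subset of the codes (`fn`, `ln`, `c`, `s`, `z`, `[a,b,c]`, `{a,b,c}`, `[x-y]`), uses tiny name lists and three places taken from the sample output above, assumes linear weights for `{a,b,c}`, and does not implement the `!` reset. City, state, and zip stay in agreement because all three read from one place picked per record.

```python
import csv
import random
import re
import sys

# Tiny stand-in data; the real script uses Census name lists and 250 zip codes.
FIRST = ["Mary", "James", "Linda", "Robert"]
LAST = ["Smith", "Johnson", "Williams"]
PLACES = [("Duluth", "MN", "55806"), ("Monroe", "CT", "06468"),
          ("Lyford", "TX", "78569")]

RANGE = re.compile(r"\[(\d+)-(\d+)\]")   # [x-y]
ONE_OF = re.compile(r"\[(.*)\]")         # [a,b,c]
RANKED = re.compile(r"\{(.*)\}")         # {a,b,c}


def field_generator(code, rng, state):
    """Turn one format code into a zero-argument value generator."""
    if code == "fn":
        return lambda: rng.choice(FIRST)
    if code == "ln":
        return lambda: rng.choice(LAST)
    if code in ("c", "s", "z"):
        idx = "csz".index(code)

        def place_part():
            # Pick one place per record so city, state, and zip agree.
            if state["place"] is None:
                state["place"] = rng.choice(PLACES)
            return state["place"][idx]
        return place_part
    m = RANGE.fullmatch(code)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        return lambda: str(rng.randint(lo, hi))  # inclusive at both ends
    m = ONE_OF.fullmatch(code)
    if m:
        options = m.group(1).split(",")
        return lambda: rng.choice(options)
    m = RANKED.fullmatch(code)
    if m:
        options = m.group(1).split(",")
        # Assumed weighting: n, n-1, ..., 1 for n options.
        weights = list(range(len(options), 0, -1))
        return lambda: rng.choices(options, weights=weights, k=1)[0]
    raise ValueError(f"unsupported format code: {code!r}")


def make_records(fmt, n, seed=None):
    rng = random.Random(seed)
    state = {"place": None}
    gens = [field_generator(code, rng, state) for code in fmt.split(":")]
    rows = []
    for _ in range(n):
        state["place"] = None  # each record gets its own address
        rows.append([gen() for gen in gens])
    return rows


writer = csv.writer(sys.stdout)
writer.writerow(["First", "Last", "Age", "City", "State", "Zip"])
writer.writerows(make_records("fn:ln:[10-100]:c:s:z", 5, seed=1))
```

Building closures up front means the format string is parsed once, and each record is just a loop over ready-made generators.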


Wed, 28 Nov 2007

Aperture Import, Continued.

— SjG @ 1:36 pm

I’ve done some more work on the Aperture Importer (background here), and the latest is attached below. It now does some reformatting of keywords that get split (e.g., “San” and “Diego” can be merged into the single keyword “San Diego”). It’s hacky and ugly. You’ll have to set up your own keywords for this kind of merge.
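The post doesn’t show how the merge is configured, so the following is only a hypothetical Python sketch: given the split-up words and a user-supplied list of phrases (`MERGE_PHRASES` is an invented name), it rejoins adjacent words that match a phrase and leaves everything else alone.

```python
# Hypothetical user-configured phrases to re-merge after splitting.
MERGE_PHRASES = [("San", "Diego"), ("New", "York")]


def merge_keywords(words, phrases=MERGE_PHRASES):
    """Rejoin runs of adjacent words that match a configured phrase."""
    out = []
    i = 0
    while i < len(words):
        for phrase in phrases:
            n = len(phrase)
            if tuple(words[i:i + n]) == phrase:
                out.append(" ".join(phrase))
                i += n
                break
        else:  # no phrase matched at position i
            out.append(words[i])
            i += 1
    return out


print(merge_keywords(["Places", "San", "Diego", "Zoo"]))
# ['Places', 'San Diego', 'Zoo']
```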

I’ve found a couple of apparent Aperture bugs.

If I tell Aperture to import an empty directory from Applescript, it’ll stall and lock up Aperture.

Worse, I find that if I do a large import (more than, say, 5,000 images), Aperture grabs a bunch of memory that never gets released. Well, Aperture itself doesn’t grab the memory, but it causes the kernel_task “process” to allocate a big pile of real memory, which it seems to hold on to until reboot.

It’s a cumulative thing: if I import 5,000 images, the memory gets grabbed. Then, if I do another 5,000-image import, the memory usage doubles. Thinking it would be handled by swapping, I didn’t worry and continued. This was a bad idea. Aperture locked up, and so did the whole OS. The last thing I could see from top was that 100% of my real memory was allocated and that less than 256 MB of swap was in use. I had at least 50 GB of disk free, so that wasn’t the problem.

Anyway, for safety, if you use this import script, I recommend rebooting between import sessions. Yeah, it’s voodoo, but it’s guaranteed to work.

Aperture Importer Update

Sat, 17 Nov 2007

Aperture Import Script

— SjG @ 7:14 pm

So, after the demise/µsoftification of iView Media Pro, the time came to switch to Aperture.

However, while Aperture is mighty powerful, its limit of 10,000 images per project makes importing my photos difficult. What’s more, my … er … unique system of “organization” doesn’t natively work well with Aperture. My attempt at organization, which predates such things as iView, Aperture, or even a usable Bridge, is predicated on the idea of the filesystem as a hierarchical database of sorts.

For example, I start with a directory called “photos” and within it are directories for “animals,” “events,” “people,” “places,” “things,” “projects,” etc. Within “events” are “political,” “work,” “family,” etc. Within each of these are either further taxonomic directories or what might be equivalent to rolls-of-film directories, e.g., “MothersDay-2007-05-13” or “CocktailsAtBerris-2000-06-14.” Directory names are all Unix-friendly (no spaces or crazy punctuation) and generally use CamelCase for multiple words.

So I went through a frustrating attempt to write a good Applescript importer. The problem with languages like Applescript (or the Javascript implementations embedded in Adobe products) is that they promise more than they can deliver. They’re designed to interact with applications, but generally don’t have rich access to application functionality. Why can’t I create a folder in my Aperture library in Applescript? Why can’t I get/set a single pixel in Photoshop with Javascript? Yes, I know both are possible through some crazy GUI-script calls or cryptic Event IDs, but why give me the equivalent of object-oriented access and then leave out all the important methods?

Well, enough ranting. With a lot of help from others who have gone before me and posted comments and, even better, code, I hacked together something that will read in my hierarchy, create a new Aperture Project for each leaf node on my tree, and convert the path to that node into a set of keywords which it will apply.
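Finding the leaf nodes is the first step. As a hedged illustration in Python (not Applescript, and not the importer’s actual code), `os.walk` reports each directory’s subdirectories, so a leaf is simply a directory whose subdirectory list is empty. The function name and the demo tree are hypothetical.

```python
import os
import tempfile


def leaf_directories(root):
    """Yield each directory under root that contains no subdirectories.

    In the scheme described above, each leaf would become one Aperture project.
    """
    for dirpath, dirnames, _filenames in os.walk(root):
        if not dirnames:
            yield dirpath


# Demo on a throwaway tree shaped like the hierarchy described in the post.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["photos/events/family/MothersDay-2007-05-13",
                "photos/animals/arthropods/misc"]:
        os.makedirs(os.path.join(tmp, *rel.split("/")))
    leaves = sorted(os.path.relpath(p, tmp)
                    for p in leaf_directories(os.path.join(tmp, "photos")))

print(leaves)
```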

So “photos/events/family/MothersDay-2007-05-13” will become a project named “MothersDay-2007-05-13” and the images within it will all be tagged with the keywords: Events, Family, Mothers, Day, 2007-05-13. It’ll also throw in copyright notices and the author name.

There’re provisions for excluding words from becoming tags (e.g., “and”) as well as special-case code for directories named “misc,” which I often use as catchalls for a taxonomical branch. These get named for the parent directory plus “Misc” (e.g., “ArthropodsMisc”).
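The path-to-keyword rules above can be sketched as follows. This is a hypothetical Python reconstruction, not the importer’s own code: the stopword list, treating a trailing `-YYYY-MM-DD` date as a single keyword, and leaving “misc” itself out of the keywords are assumptions chosen to reproduce the `MothersDay-2007-05-13` example.

```python
import re

# Assumed exclusion list; the post mentions "and" as one such word.
STOPWORDS = {"and", "the", "of"}

# An optional trailing date such as "-2007-05-13" is kept as one keyword.
DATE_SUFFIX = re.compile(r"^(.*?)-?(\d{4}-\d{2}-\d{2})$")
# CamelCase pieces: acronyms, Capitalized words, lowercase runs, digit runs.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+")


def capitalize(word):
    return word[:1].upper() + word[1:]


def split_name(name):
    """'MothersDay-2007-05-13' -> ['Mothers', 'Day', '2007-05-13']"""
    m = DATE_SUFFIX.match(name)
    stem, date = (m.group(1), m.group(2)) if m else (name, None)
    words = [capitalize(w) for w in WORD.findall(stem)
             if w.lower() not in STOPWORDS]
    if date:
        words.append(date)
    return words


def path_to_project_and_keywords(path, root="photos"):
    parts = [p for p in path.strip("/").split("/") if p]
    if parts and parts[0] == root:
        parts = parts[1:]  # the top-level folder is not a keyword
    leaf = parts[-1]
    if leaf.lower() == "misc" and len(parts) > 1:
        project = capitalize(parts[-2]) + "Misc"  # e.g. "ArthropodsMisc"
    else:
        project = leaf
    keywords = []
    for part in parts:
        if part.lower() != "misc":  # assumption: "misc" is not itself a keyword
            keywords.extend(split_name(part))
    return project, keywords


print(path_to_project_and_keywords("photos/events/family/MothersDay-2007-05-13"))
# ('MothersDay-2007-05-13', ['Events', 'Family', 'Mothers', 'Day', '2007-05-13'])
```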

Perfect? No. Better than doing it manually? Yes.

In any case, here it is:
Aperture Importer

Fri, 9 Nov 2007

Finding File and Directory Counts

— SjG @ 3:31 pm

So, in the process of organizing photographs, I wanted to examine my deeply-nested hierarchy to figure out how it’s possible I have 30,000 images (Aperture only wants me to have 10,000 in a project, so I need to re-organize the hierarchy even before I import).

So, I figured it’d be easy to use find to list all my directories, and how many images they contain. It turns out that (at least for me) it’s not.

My best stab so far is to use find and a loop, which gives me almost what I want (it counts not only the images in each directory but also its subdirectories). It fails if there are too many directories. It’s good enough. But it’s not elegant.

So CLI Deities — how would you make this pretty?

find . -type d | while IFS= read -r dir; do echo "$(ls -1 "$dir" | wc -l) $dir"; done

Potential type-face issue disambiguation: after the ls, that first argument is a one, not an ell, although I suppose an ell would work too. The wc option is an ell.
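One possible answer, in Python instead of a shell pipeline: walk the tree once with `os.walk` and count only files with image extensions, so subdirectory names no longer inflate the count. Because the names come back as Python strings rather than through a pipe, directories with spaces in their names need no quoting. The extension list is an assumption; adjust it to whatever formats you shoot.

```python
import os
import tempfile

# Assumed image extensions; adjust to your own formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".psd",
              ".nef", ".cr2", ".dng"}


def image_counts(root):
    """Map each directory under root to the number of image files directly in it.

    Unlike `ls -1 | wc -l`, subdirectory names are not counted.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = sum(
            1 for name in filenames
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS)
    return counts


# Demo on a throwaway tree; on real photos, call image_counts(".").
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "events", "family"))
    for rel in [("events", "a.jpg"), ("events", "notes.txt"),
                ("events", "family", "b.JPG"), ("events", "family", "c.nef")]:
        open(os.path.join(tmp, *rel), "w").close()
    demo = {os.path.relpath(p, tmp): n for p, n in image_counts(tmp).items()}

for path, n in sorted(demo.items()):
    print(n, path)
```

To spot the directories that are pushing a project past Aperture’s 10,000-image limit, sort the result by count instead, e.g. `sorted(image_counts(".").items(), key=lambda kv: kv[1], reverse=True)`.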