fogbound.net

Page 2 of 4

Sat, 17 Nov 2007

Aperture Import Script

(— SjG @ 7:14 pm)

So, after the demise/µsoftification of iView Media Pro, the time came to switch to Aperture.

However, while Aperture is mighty powerful, its limit of 10,000 images per project makes importing my photos difficult. What’s more, my … er … unique system of “organization” doesn’t map naturally onto Aperture. My attempt at organization, which predates such things as iView, Aperture, or even a usable Bridge, is predicated on the idea of the filesystem as a hierarchical database of sorts.

For example, I start with a directory called “photos,” and within it are directories for “animals,” “events,” “people,” “places,” “things,” “projects,” etc. Within “events” are “political,” “work,” “family,” etc. Within each of these are either further taxonomic directories or what might be equivalent to rolls-of-film directories, e.g., “MothersDay-2007-05-13” or “CocktailsAtBerris-2000-06-14.” Directories are all Unix-friendly (no spaces or crazy punctuation) and are generally CamelCase for multiple words.

So I went through a frustrating attempt to write a good AppleScript importer. The problem with languages like AppleScript (or the JavaScript implementations embedded in Adobe products) is that they promise more than they can deliver. They’re designed to interact with applications, but generally don’t have rich access to application functionality. Why can’t I create a folder in my Aperture library in AppleScript? Why can’t I get/set a single pixel in Photoshop with JavaScript? Yes, I know both are possible through some crazy GUI-script calls or cryptic Event IDs, but why give me the equivalent of object-oriented access and then leave out all the important methods?

Well, enough ranting. With a lot of help from others who have gone before me and posted comments and, even better, code, I hacked together something that will read in my hierarchy, create a new Aperture Project for each leaf node on my tree, and convert the path to that node into a set of keywords which it will apply.
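Finding the leaf nodes is the part a plain shell can do on its own. As a rough sketch (in shell, not the AppleScript the importer actually uses), a directory counts as a leaf if it has no subdirectories. The name `leaf_dirs` is hypothetical, and the sketch assumes a `find` that supports `-mindepth`/`-maxdepth`, as both GNU find and the BSD find on macOS do.

```shell
#!/bin/sh
# leaf_dirs: hypothetical helper. Prints every directory under $1 that has no
# subdirectories of its own, i.e. the "rolls of film" that would each become a project.
leaf_dirs() {
  find "${1:-.}" -type d | while IFS= read -r dir; do
    # Look one level down only; an empty result means no child directories.
    if [ -z "$(find "$dir" -mindepth 1 -maxdepth 1 -type d)" ]; then
      printf '%s\n' "$dir"
    fi
  done
}
```

Running `leaf_dirs photos` prints one leaf path per line. Names containing newlines would confuse the `read` loop, which shouldn't matter given the Unix-friendly naming described above.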

So “photos/events/family/MothersDay-2007-05-13” will become a project named “MothersDay-2007-05-13,” and the images within it will all be tagged with the keywords: Events, Family, Mothers, Day, 2007-05-13. It’ll also throw in copyright notices and the author name.
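The path-to-keywords rule can be sketched in shell plus awk. This is a hypothetical reimplementation, not the importer’s code: `path_to_keywords` is an invented name, and it assumes the first path component is the top-level “photos” directory, splits each remaining component on CamelCase boundaries, keeps a trailing YYYY-MM-DD date as a single keyword, and capitalizes each word.

```shell
#!/bin/sh
# path_to_keywords: hypothetical helper (not the author's AppleScript).
# Prints one keyword per line for a path such as
# "photos/events/family/MothersDay-2007-05-13".
path_to_keywords() {
  printf '%s\n' "$1" | awk -F/ '
    {
      # Start at field 2: field 1 is assumed to be the top-level photos directory.
      for (i = 2; i <= NF; i++) {
        c = $i
        date = ""
        # Peel off a trailing -YYYY-MM-DD so the date survives as one keyword.
        if (match(c, /-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)) {
          date = substr(c, RSTART + 1)
          c = substr(c, 1, RSTART - 1)
        }
        # Split CamelCase by inserting a space before an upper-case letter
        # that follows a lower-case one: MothersDay -> Mothers Day.
        out = ""
        for (j = 1; j <= length(c); j++) {
          ch = substr(c, j, 1)
          if (j > 1 && ch ~ /[A-Z]/ && substr(c, j - 1, 1) ~ /[a-z]/)
            out = out " "
          out = out ch
        }
        # Capitalize each word: events -> Events.
        n = split(out, words, " ")
        for (k = 1; k <= n; k++)
          print toupper(substr(words[k], 1, 1)) substr(words[k], 2)
        if (date != "")
          print date
      }
    }'
}
```

For example, `path_to_keywords photos/events/family/MothersDay-2007-05-13` prints Events, Family, Mothers, Day and 2007-05-13, one per line. Only POSIX awk features (`match`, `RSTART`, `split`, `toupper`) are used, so the BSD awk on macOS should handle it as well as gawk.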

There are provisions for excluding words from becoming tags (e.g., “and”), as well as special-case code for directories named “misc,” which I often use as catchalls for a taxonomical branch; these get named for the parent directory plus “Misc” (e.g., “ArthropodsMisc”).
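The two special cases can be sketched the same way. Both helpers below are hypothetical, and the stop-word list (`and`, `at`, `the`, `of`) is only an example; the importer’s actual exclusion list isn’t given here.

```shell
#!/bin/sh
# project_name: hypothetical helper. A leaf directory becomes a project named after
# itself, except "misc", which is named after its parent plus "Misc".
project_name() {
  dir=${1%/}                                  # tolerate one trailing slash
  leaf=$(basename "$dir")
  if [ "$leaf" = "misc" ]; then
    parent=$(basename "$(dirname "$dir")")
    # Upper-case the parent's first letter: "arthropods" -> "Arthropods"
    first=$(printf '%s' "$parent" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$parent" | cut -c2-)
    printf '%s%sMisc\n' "$first" "$rest"
  else
    printf '%s\n' "$leaf"
  fi
}

# drop_stopwords: hypothetical filter. Reads one keyword per line on stdin and
# drops whole-line matches from an example exclusion list, ignoring case.
drop_stopwords() {
  grep -v -i -x -e and -e at -e the -e of
}
```

So `project_name photos/animals/arthropods/misc` yields “ArthropodsMisc,” and piping keywords through `drop_stopwords` removes “At” from Cocktails, At, Berris.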

Perfect? No. Better than doing it manually? Yes.

In any case, here it is:
Aperture Importer

Fri, 9 Nov 2007

Finding File and Directory Counts

So, in the process of organizing photographs, I wanted to examine my deeply nested hierarchy to figure out how it’s possible I have 30,000 images (Aperture only wants me to have 10,000 in a project, so I need to reorganize the hierarchy even before I import).

So, I figured it’d be easy to use find to list all my directories, and how many images they contain. It turns out that (at least for me) it’s not.

My best stab so far is to use find and a loop, which gives me almost what I want (the count for each directory includes not only its images but its subdirectories as well). It fails if there are too many directories. It’s good enough. But it’s not elegant.

So CLI Deities — how would you make this pretty?

find . -type d | while IFS= read -r dir; do echo "$(ls -1 "$dir" | wc -l) $dir"; done

Potential typeface issue disambiguation: after the ls, that first argument is a one, not an ell, although I suppose an ell would work too (ls -l adds a “total” line, so each count would come out one higher). The wc option is an ell.
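One possible tidier approach, sketched here as an untested suggestion rather than a definitive answer: run `find` once over the files, strip each file name back to its directory, and let `uniq -c` do the counting. That counts only files sitting directly in each directory (subdirectories aren’t counted) and starts no subshell per directory; directories holding no files simply don’t appear. The second, hypothetical helper totals each directory’s whole subtree and flags anything over a limit, defaulting to the 10,000 per-project figure mentioned above. It counts all regular files, not just images.

```shell
#!/bin/sh
# count_files_per_dir: one find pass; strips each file name back to its directory
# and lets uniq count how many files share that directory.
count_files_per_dir() {
  find "${1:-.}" -type f |
    sed 's|/[^/]*$||' |   # "./a/b/1.jpg" -> "./a/b"
    sort |
    uniq -c |             # "   2 ./a/b"
    sort -n               # smallest counts first
}

# over_limit: hypothetical helper. Prints the recursive file count of every
# directory whose subtree holds more than LIMIT files (default 10000).
over_limit() {
  limit=${2:-10000}
  find "${1:-.}" -type d | while IFS= read -r dir; do
    # $(( )) also strips the leading blanks some wc implementations print
    n=$(( $(find "$dir" -type f | wc -l) ))
    if [ "$n" -gt "$limit" ]; then
      printf '%s %s\n' "$n" "$dir"
    fi
  done
}
```

Usage: `count_files_per_dir photos` or `over_limit photos 10000`. Like the loop above, both assume file names without embedded newlines.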

Wed, 27 Jun 2007

Unix: How to find files lacking certain strings

(— SjG @ 4:10 pm)

So, I’m working on a convoluted web site, and a problem comes up. It seems that some vitally important code was not included in some pages (for the sake of argument, let’s say it’s a copyright string). This particular site has an ungodly mix of files, including .htm, .html, and .jsp files. Some of the .jsp files are actual pages, and others are stubs to be included in other .jsp pages. The majority of the full .jsp pages include a “footer.jsp” that has the desired string, so they’re good. But I need to generate a list of the full pages, of whatever sort, that lack this string.

The inverse of this problem is easy, and is the kind of thing I use all the time:
find . \( -name \*.htm -o -name \*.html -o -name \*.jsp \) -exec grep -il "myString" {} \;

Initially, I thought using the -v flag to grep would work for me, but grep -vl returns nearly every file it sees: -v selects the lines that don’t match the expression, and -l then lists any file with at least one such line, rather than the files that lack a match. Then there’s the problem that I need to match “full” pages rather than included .jsp stubs.

So here’s how the Mighty Power of Unix came to my rescue:

find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | sort -u > full_pages.txt

provides me with a list of pages that are not mere inclusions, if you accept my assumption that an inclusion won’t match the closing HTML tag.

Then I generate a list of full pages that contain the magic string and/or include the footer.jsp that would contain it:
find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | xargs grep -l -e "uniqueCopyrightTag" -e "footer\.jsp" | sort -u > pages_with_string.txt

Then I compare the two lists to find out which full pages lack both the magic string and the include (comm -13 hides the lines only in the first file and the lines common to both, leaving the full pages that never made it into pages_with_string.txt):
comm -13 pages_with_string.txt full_pages.txt

Wow. There it is!

I bet there’s an easier way. Post an example in the comments if you know of one!

NOTE: All commands are on a single line, regardless of whether they wrap in this particular display.
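One possibly easier route, offered as a sketch rather than a tested recipe: both GNU grep and BSD grep have -L (--files-without-match), the inverse of -l, so piping the full-page list into `xargs grep -L -e uniqueCopyrightTag -e 'footer\.jsp'` should give the answer in one line (-L isn’t in POSIX, though). The hypothetical function below sticks to POSIX options (-i, -l, -q, -e and `find -exec ... {} +`) and also copes with spaces in file names, which a bare xargs does not.

```shell
#!/bin/sh
# pages_missing_string: hypothetical helper. Lists full pages (those containing
# </html>) under $1 that contain neither the copyright tag nor a footer.jsp include.
pages_missing_string() {
  find "${1:-.}" -type f \( -name '*.htm' -o -name '*.html' -o -name '*.jsp' \) \
    -exec grep -il '</html>' {} + |
    while IFS= read -r page; do
      # grep -q prints nothing; its exit status alone says whether a pattern matched.
      if ! grep -q -e 'uniqueCopyrightTag' -e 'footer\.jsp' "$page"; then
        printf '%s\n' "$page"
      fi
    done
}
```

Usage: `pages_missing_string . | sort`. The “full page means it contains </html>” assumption is the same one made above.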

Thu, 8 Mar 2007

Automated Backups – Updated!

[Update -- fixed the link!]

Automated Backups are a good thing. Automated Backups make the little birds sing, the rainbows shine, and little fauns gambol about in beautiful green forests. When computers are backed up, the butterflies flutter, the flowers bloom, and the fruit from the trees tastes just a little sweeter. But when computers are not backed up, the universe becomes angry.

An angry universe is not a good thing. An angry universe makes little birds cry. An angry universe makes Cthulhu come and visit.

So. Automated backups. I’m partial to rdiff-backup because it allows me not only to back up data but also to keep previous versions available. Backing up nightly doesn’t help if you accidentally overwrite the contents of a file with something, and don’t notice for a day or two. But with rdiff-backup, you can restore the version before the error.

Unfortunately, rdiff-backup really is designed for server-to-server backups, where each end of the transaction has shell access. Enter duplicity, a related project. It’s designed more for storing backups on servers that you don’t control and/or don’t trust. It allows encryption of your backup sets and supports a wider variety of protocols (ftp, scp, s3, etc.).

So with a combination of these two scripts, you can back up pretty much any POSIX-ish server to pretty much anything that you can ftp or ssh into. Still, it’d be nice if you could:

  • Check that the backups completed successfully, and get email confirming that success or warning on a failure.
  • Configure all of your various backups with a simple text file, rather than remembering the different command-line formats.
  • Create groups of options that can be applied to backup tasks.
  • Issue commands on the backup source and destinations before and/or after the backup (good for dumping databases into a flat file, for example, and then deleting it after it’s backed up).
  • Get email confirmation on completion of backups.
  • Have some tools to simplify securing the backup process.

For these reasons, I put together this backup script, which is basically a Ruby wrapper for rdiff-backup and duplicity. It’s almost entirely configured via two human-readable YAML files.

It’s flexible, reasonably simple to use, and comes without any guarantees whatsoever. Feel free to use it yourself!

DISCLAIMER: it’s as-is. Not to be used in place of a certified Cthulhu-deterrent. Use at your own risk. To quote the duplicity page: “[it] is not stable yet. It is thought to have a few bugs, but will work for normal usage, and should continue to work fine until you depend on it for your business or to protect important personal data.” — that goes for me too, only double.
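Much of the wish list above comes down to running a few commands in order and checking their exit statuses. A minimal shell sketch of a single task follows; it is not the Ruby script, and `run_backup_task` and its four arguments (name, pre-command, backup command, post-command) are hypothetical. It stops at the first failing step and prints one status line that can be piped to mail.

```shell
#!/bin/sh
# run_backup_task: hypothetical helper. Runs an optional pre-command, the backup
# command itself, then an optional post-command; stops at the first failure,
# prints one status line, and returns non-zero if anything failed.
run_backup_task() {
  name=$1 pre=$2 backup=$3 post=$4
  status=ok
  if [ -n "$pre" ] && ! sh -c "$pre"; then
    status="failed (pre-command)"
  elif ! sh -c "$backup"; then
    status="failed (backup)"
  elif [ -n "$post" ] && ! sh -c "$post"; then
    status="failed (post-command)"
  fi
  printf '%s: %s\n' "$name" "$status"
  [ "$status" = ok ]
}
```

A hypothetical nightly task might look like `run_backup_task nightly-db 'mysqldump mydb > /var/backups/db/mydb.sql' 'rdiff-backup /var/backups/db backuphost::/backups/db' 'rm /var/backups/db/mydb.sql' | mail -s "Backup report" me@example.com`, where the database, host, paths and address are placeholders and the rdiff-backup line uses its classic source/destination syntax. Skipping the post-command after a failure is a design choice; if the post-command is cleanup, such as deleting a database dump, you might prefer to run it regardless.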

Wed, 3 Jan 2007

Software: is it too much to ask?

(— SjG @ 2:22 pm)

OK. Entrepreneurs, read up. I’m gonna give you some ideas that’ll make you rich.

Start my ranting:

1. Can I really be the only person who wants to share Thunderbird/Seamonkey address books with a spouse? I mean, how hard can it be?

What I’d like:

  • Each of our “Personal Address Book” collections shows up as a list in the other’s address book (e.g., mine shows up on my wife’s machine as “Samuel’s Address Book”; it could use the machine name instead, if that’s easier).
  • We can see one another’s mailing lists in our address books
  • Manual sync is fine — automatic would be even better
  • Simplistic merging is OK, so long as there’s a way to resolve conflicts
  • Ability to mark lists as private or shared

2. Can I really be the only person who wants to share a checkbook program with a spouse? I mean, how hard can it be?

What I’d like:

  • Ability to enter checks / charges / deposits into a common account register
  • Ability for either person to perform reconciliation
  • Ability to have accounts that are not shared

3. Can I really be the only person who wants to have an intelligent, revision-capable backup script that doesn’t require shell on the destination end? I mean, how hard can it be?

What I’d like:

  • rdiff-backup, only permitting an ftp-based push of the backup file.

More to come, as I experience more outrage.
