Fri, 9 Nov 2007

Finding File and Directory Counts

— SjG @ 3:31 pm

So, in the process of organizing photographs, I wanted to examine my deeply-nested hierarchy to figure out how it’s possible I have 30,000 images (Aperture only wants me to have 10,000 in a project, so I need to re-organize the hierarchy even before I import).

So, I figured it’d be easy to use find to list all my directories, and how many images they contain. It turns out that (at least for me) it’s not.

My best stab so far is to use find and a loop, which gives me almost what I want (it not only includes the count of images in the each directory, but subdirectories as well). It fails if there are too many directories. It’s good enough. But it’s not elegant.

So CLI Deities — how would you make this pretty?

find . -type d | while read dir; do echo `ls -1 "$dir" | wc -l` $dir; done

Potential type-face issue disambiguation: after the ls, that first argument is a one, not an ell, although I suppose an ell would work too. The wc option is an ell.

Wed, 27 Jun 2007

Unix: How to find files lacking certain strings

— SjG @ 4:10 pm

So, I’m working on a convoluted web site, and a problem comes up. It seems that some vitally important code was not included in some pages (for the sake of argument, let’s say it’s a copyright string). This particular site has an ungodly mix of files, including .htm, .html, and .jsp files. Some of the .jsp files are actual pages, and others are stubs to be included in other .jsp pages. The majority of the full .jsp pages include a “footer.jsp” that has the desired string, so they’re good. But I need to generate a list of the full pages, of whatever sort, that lack this string.

The inverse of this problem is easy, and is the kind of thing I use all the time:
find . -name \*.htm -o -name \*.html -o -name \*.jsp -exec grep -il "myString" {} \;

Initially, I thought using the -v flag to grep would work for me, but grep -vl returns all files it sees, because -v returns the lines that match the invert expression, not the files that match the invert expression. Then there’s the problem that I need to match “full” pages rather than included .jsp stubs.

So here’s how the Mighty Power of Unix came to my rescue:

find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | sort -u > full_pages.txt

provides me with a list of pages that are not mere inclusions, if you accept my assumption that an inclusion won’t match the closing HTML tag.

Then I generate a list of full pages that contain the magic string and or include the footer.jsp that would contain the magic string:
find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | xargs grep -le "uniqueCopyrightTag\|footer\.jsp" | sort -u > pages_no_string.txt

Then I compare the files to find out which full pages lack both the magic string and the include:
comm -3 pages_no_string.txt full_pages.txt

Wow. There it is!

I bet there’s an easier way. Post an example in the comments if you know of one!

NOTE: All commands are on a single line, regardless of whether they wrap in this particular display.

Thu, 8 Mar 2007

Automated Backups – Updated!

— SjG @ 3:50 pm

[Update — fixed the link!]

Automated Backups are a good thing. Automated Backups make the little birds sing, the rainbows shine, and little fauns gambol about in beautiful green forests. When computers are backed up, the butterflies flutter, the flowers bloom, and the fruit from the trees taste just a little sweeter. But when computers are not backed up, the universe becomes angry.

An angry universe is not a good thing. An angry universe makes little birds cry. An angry universe makes Cthulhu come and visit.

So. Automated backups. I’m partial to rdiff-backup because it allows me to not only back up data, but keep previous versions available. Backing up nightly doesn’t help if you accidentally overwrite the contents of a file with something, and don’t notice for a day or two. But with rdiff-backup, you can restore the version before the error.

Unfortunately, rdiff-backup really is designed for server-to-server backups, where each end of the transaction has shell access. Enter duplicity, a related project. It’s more designed for storing backups on servers that you don’t control and/or don’t trust. It allows encryption of your backup sets, as well as supporting a wider variety of protocols (ftp, scp, s3, etc.)

So with a combination of these two scripts, you can backup pretty much any POSIX-ish server to pretty much anything that you can ftp or ssh into. Still, it’d be nice if you could:

  • Check that the backups completed successfully, and get email confirming that success or warning on a failure.
  • Configure up all of your various backups by a simple text file, rather than remembering the different command-line formats.
  • Create groups of options that can be applied to backup tasks.
  • Issue commands on the backup source and destinations before and/or after the backup (good for dumping databases into a flat file, for example, and then deleting it after it’s backed up).
  • Get email confirmation on completion of backups.
  • Have some tools to simplify the securing of the backup process.

For these reasons, I put together this backup script, which is basically a Ruby wrapper for rdiff-backup and duplicity. It’s almost entirely configured via two human-readable yaml files.

It’s flexible, reasonably simple to use, and comes without any guarantees whatsoever. Feel free to use it yourself!

DISCLAIMER: it’s as-is. Not to be used in place of a certified Cthulhu-deterrent. Use at your own risk. To quote the duplicity page: “[it] is not stable yet. It is thought to have a few bugs, but will work for normal usage, and should continue to work fine until you depend on it for your business or to protect important personal data.” — that goes for me too, only double.

Wed, 3 Jan 2007

Software: is it too much to ask?

— SjG @ 2:22 pm

OK. Entrepreneurs, read up. I’m gonna give you some ideas that’ll make you rich.

Start my ranting:

1. Can I really be the only person who wants to share Thunderbird/Seamonkey address books with a spouse? I mean, how hard can it be?

What I’d like:

  • Each of our “Personal Address Book” collections show up as a list on one another’s address books as a list (e.g., mine shows up on my wife’s machine as “Samuel’s Address Book”. It could use the machine name instead, if it’s easier).
  • We can see one another’s mailing lists in our address books
  • Manual sync is fine — automatic would be even better
  • Simplistic merging is OK, so long as there’s a way to resolve conflicts
  • Ability to mark lists as private or shared

2. Can I really be the only person who wants to share a checkbook program with a spouse? I mean, how hard can it be?

What I’d like:

  • Ability to enter checks / charges / deposits into a common account register
  • Ability for either person to perform reconciliation
  • Ability to have accounts that are not shared

3. Can I really be the only person who wants to have an intelligent, revision-capable backup script that doesn’t require shell on the destination end? I mean, how hard can it be?

What I’d like:

  • rdiff-backup, only permitting an ftp-based push of the backup file.

More to come, as I experience more outrage.

Sun, 22 Oct 2006

Reverse SSH tunnels in Mac OS X

— SjG @ 9:02 am

I’m one of the many people who will be using VNC to do remote assistance for a relative using Windows.

There are a number of tutorials out there. Most of them fail because they require the ability to VNC in to the remote system, which won’t work in my case because the remote Windows box is behind a firewall/router that I can’t configure. There are also several reverse approaches out there, where the user needing assistance initiates the connection. The first of these I say was Gina Trapani’s approach at Geek to Live, which uses UltraVNC on both ends. This is almost the solution I want, except that it requires Windows on my end as well. It also assumes that I’m at a fixed location.

In the comments, I came across Fazal Majid’s response. He had the same requirements as I do, and links to his source where he built a customized VNC server that targets a fixed IP address. Fazal’s approach matches my needs exactly.
But then I ran into the problem of the last step: the reverse SSH tunnel from my known server (which gets hard-coded into the executable) to my notebook running Chicken of the VNC.
Building reverse SSH tunnels is really not that difficult. But when I created the setup, I was able to make it work from a Linux machine and from a Cygwin terminal under Windows, but it mysteriously failed under Mac OS. Using lots of -v flags, I kept seeing the service for the port on the Mac side refusing the connection from the tunnel. The ssh debug looked like:

debug1: remote forward success for: listen 5900, connect localhost:5500
debug1: client_input_channel_open: ctype forwarded-tcpip rchan 2 win 131072 max 32768
debug1: client_request_forwarded_tcpip: listen localhost port 5900, originator ::1 port 60475
debug1: channel 0: new [::1]
debug1: confirm forwarded-tcpip
debug3: channel 0: waiting for connection
debug1: channel 0: not connected: Connection refused
debug2: channel 0: zombie
debug2: channel 0: garbage collecting
It turns out that this means the tunnel doesn’t even see the service. After wasting time with firewall tests and a lot of other false leads, I finally noticed the [::1] notation in there. Yup, that’s an IPv6 address. The solution is to make sure the ssh tunnel is using IPv4. For reference, the command that works is:

ssh -nNT4 -R 5500:localhost:5500 -l my_username