Wed, 27 Jun 2007

Unix: How to find files lacking certain strings

— SjG @ 4:10 pm

So, I’m working on a convoluted web site, and a problem comes up. It seems that some vitally important code was not included in some pages (for the sake of argument, let’s say it’s a copyright string). This particular site has an ungodly mix of files, including .htm, .html, and .jsp files. Some of the .jsp files are actual pages, and others are stubs to be included in other .jsp pages. The majority of the full .jsp pages include a “footer.jsp” that has the desired string, so they’re good. But I need to generate a list of the full pages, of whatever sort, that lack this string.

The inverse of this problem is easy, and is the kind of thing I use all the time:
find . \( -name \*.htm -o -name \*.html -o -name \*.jsp \) -exec grep -il "myString" {} \;
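Why the grouping matters: find gives the implied -a (AND) between tests a higher precedence than -o. If the -o alternatives aren’t wrapped in escaped parentheses, the -exec attaches only to the last -name test, and grep never sees the .htm or .html files. A quick way to see the difference, using two hypothetical throwaway files (this assumes mktemp -d, which both GNU and BSD systems provide):

```shell
#!/bin/sh
# Two throwaway files (hypothetical names), both containing the string.
dir=$(mktemp -d)
printf 'myString\n' > "$dir/a.html"
printf 'myString\n' > "$dir/b.jsp"

# Ungrouped: the implicit -a between -name '*.jsp' and -exec binds tighter
# than -o, so grep only ever runs on the .jsp file.
ungrouped=$(find "$dir" -name '*.html' -o -name '*.jsp' -exec grep -il 'myString' {} \;)

# Grouped: the -exec applies to every name test, so both files are listed.
grouped=$(find "$dir" \( -name '*.html' -o -name '*.jsp' \) -exec grep -il 'myString' {} \;)

echo "ungrouped: $ungrouped"
echo "grouped:   $grouped"
rm -rf "$dir"
```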

Initially, I thought using the -v flag to grep would work for me, but grep -vl lists nearly every file it sees: -v selects the lines that do not match the expression, so -l reports any file containing at least one non-matching line, not just the files where the expression never matches. Then there’s the problem that I need to match “full” pages rather than included .jsp stubs.
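To see the difference concretely, here is a small sketch with two hypothetical files. It contrasts grep -vl with grep -L (--files-without-match), which lists only the files containing no match at all. -L is supported by GNU and BSD grep (so Linux and Mac OS X), but it isn’t in POSIX, so treat its availability on other Unixes as an assumption.

```shell
#!/bin/sh
# Two throwaway files (hypothetical names): one contains the string, one lacks it.
dir=$(mktemp -d)
printf 'header\nmyString\nfooter\n' > "$dir/has.html"
printf 'header\nfooter\n' > "$dir/lacks.html"

# -v selects lines that do NOT match, so -vl names every file with at least
# one such line: here, both files.
inverted=$(grep -vl 'myString' "$dir/has.html" "$dir/lacks.html")

# -L (--files-without-match, in GNU and BSD grep but not POSIX) names only
# the files in which the pattern never matches.
without=$(grep -L 'myString' "$dir/has.html" "$dir/lacks.html")

echo "grep -vl: $inverted"
echo "grep -L:  $without"
rm -rf "$dir"
```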

So here’s how the Mighty Power of Unix came to my rescue:

find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | sort -u > full_pages.txt

provides me with a list of pages that are not mere inclusions, if you accept my assumption that an inclusion won’t match the closing HTML tag.

Then I generate a list of full pages that contain the magic string and/or include footer.jsp, which would contain the magic string:
find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | xargs grep -le "uniqueCopyrightTag\|footer\.jsp" | sort -u > pages_with_string.txt

Then I compare the two files to find out which full pages lack both the magic string and the include:
comm -3 pages_with_string.txt full_pages.txt
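If comm’s column flags are unfamiliar: it prints three columns (lines only in the first file, lines only in the second, lines in both), and each of -1, -2 and -3 suppresses one of them. A sketch with hypothetical stand-in lists:

```shell
#!/bin/sh
# Stand-ins for the two sorted lists (hypothetical page names).
dir=$(mktemp -d)
printf './a.html\n./b.jsp\n./c.htm\n' > "$dir/full.txt"
printf './a.html\n./c.htm\n' > "$dir/with_string.txt"

# comm expects both inputs sorted the same way (sort -u took care of that).
# -3 hides lines present in both files; lines unique to the second file
# come out indented by one tab.
comm -3 "$dir/with_string.txt" "$dir/full.txt"

# Every page in the first list is also a full page, so -13 (also hide lines
# unique to the first file) gives the same answer without the tab.
lacking=$(comm -13 "$dir/with_string.txt" "$dir/full.txt")
echo "$lacking"
rm -rf "$dir"
```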

Wow. There it is!

I bet there’s an easier way. Post an example in the comments if you know of one!
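One candidate, offered as a sketch rather than a tested recipe for that site: GNU and BSD grep’s -L flag lists files with no match, and find can chain two -exec tests so that the first acts as the “full page” filter. It keeps the same </html> heuristic and the same uniqueCopyrightTag/footer.jsp patterns as above; the demo files are hypothetical. Running one grep per file is slower than xargs, but it copes with file names containing spaces.

```shell
#!/bin/sh
# Throwaway tree with hypothetical files to try the one-pass search on.
demo=$(mktemp -d)
printf '<html><body>uniqueCopyrightTag</body></html>\n' > "$demo/good.html"
printf '<html><body>no notice here</body></html>\n' > "$demo/bad.htm"
printf '<%%@ include file="footer.jsp" %%>\n</html>\n' > "$demo/page.jsp"
printf '<p>an include stub</p>\n' > "$demo/stub.jsp"

# The parentheses group the -o alternatives so the -exec tests apply to all
# three extensions. The first -exec is a filter: it succeeds only for files
# containing </html> (case-insensitive). The second prints the file name
# only if it contains neither the copyright tag nor the footer include.
missing=$(cd "$demo" && find . \( -name '*.htm' -o -name '*.html' -o -name '*.jsp' \) \
  -type f -exec grep -qi '</html>' {} \; \
  -exec grep -L -e 'uniqueCopyrightTag' -e 'footer\.jsp' {} \;)

echo "$missing"    # expected: ./bad.htm
rm -rf "$demo"
```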

NOTE: All commands are on a single line, regardless of whether they wrap in this particular display.

Thu, 8 Mar 2007

Automated Backups – Updated!

— SjG @ 3:50 pm

[Update — fixed the link!]

Automated Backups are a good thing. Automated Backups make the little birds sing, the rainbows shine, and little fauns gambol about in beautiful green forests. When computers are backed up, the butterflies flutter, the flowers bloom, and the fruit from the trees tastes just a little sweeter. But when computers are not backed up, the universe becomes angry.

An angry universe is not a good thing. An angry universe makes little birds cry. An angry universe makes Cthulhu come and visit.

So. Automated backups. I’m partial to rdiff-backup because it lets me not only back up data but also keep previous versions available. Backing up nightly doesn’t help if you accidentally overwrite the contents of a file and don’t notice for a day or two, since by then a plain mirror has copied the damage too. But with rdiff-backup, you can restore the version from before the error.

Unfortunately, rdiff-backup really is designed for server-to-server backups, where each end of the transaction has shell access. Enter duplicity, a related project. It’s designed more for storing backups on servers that you don’t control and/or don’t trust. It supports encryption of your backup sets, as well as a wider variety of protocols (ftp, scp, s3, etc.).

So with a combination of these two tools, you can back up pretty much any POSIX-ish server to pretty much anything that you can ftp or ssh into. Still, it’d be nice if you could:

  • Check that the backups completed successfully, and get email confirming that success or warning on a failure.
  • Configure all of your various backups in a simple text file, rather than remembering the different command-line formats.
  • Create groups of options that can be applied to backup tasks.
  • Issue commands on the backup source and destinations before and/or after the backup (good for dumping databases into a flat file, for example, and then deleting it after it’s backed up).
  • Get email confirmation on completion of backups.
  • Have some tools to simplify the securing of the backup process.
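The core of the first and fourth items (run a pre-command, run the backup, run a post-command, and report the result) fits in a few lines of shell. This is a hypothetical sketch, not the Ruby script described here: run_job and notify are made-up names, notify just echoes where a real setup would send email (for example via mail(1)), and true/false stand in for real commands.

```shell
#!/bin/sh
# Hypothetical stand-in for email delivery: just print the message.
notify() {
  echo "$1"
}

# run_job NAME PRE BACKUP POST  (hypothetical helper)
# Plain sh has no "local", so name and status are ordinary globals.
run_job() {
  name=$1
  if ! sh -c "$2"; then
    notify "$name: FAILED (pre-command)"
    return 1
  fi
  if sh -c "$3"; then
    status=OK
  else
    status=FAILED
  fi
  # Run the post-command even if the backup failed (e.g. to delete a DB dump).
  sh -c "$4" || status="$status (post-command failed)"
  notify "$name: $status"
  [ "$status" = OK ]
}

# Real use might look like (hypothetical paths):
#   run_job db 'mysqldump mydb > /tmp/dumps/db.sql' \
#     'rdiff-backup /tmp/dumps /backup/dumps' 'rm /tmp/dumps/db.sql'
ok=$(run_job nightly 'true' 'true' 'true')
bad=$(run_job nightly 'true' 'false' 'true')
echo "$ok"
echo "$bad"
```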

For these reasons, I put together this backup script, which is basically a Ruby wrapper for rdiff-backup and duplicity. It’s almost entirely configured via two human-readable YAML files.

It’s flexible, reasonably simple to use, and comes without any guarantees whatsoever. Feel free to use it yourself!

DISCLAIMER: it’s as-is. Not to be used in place of a certified Cthulhu-deterrent. Use at your own risk. To quote the duplicity page: “[it] is not stable yet. It is thought to have a few bugs, but will work for normal usage, and should continue to work fine until you depend on it for your business or to protect important personal data.” — that goes for me too, only double.

Wed, 3 Jan 2007

Software: is it too much to ask?

— SjG @ 2:22 pm

OK. Entrepreneurs, read up. I’m gonna give you some ideas that’ll make you rich.

Start my ranting:

1. Can I really be the only person who wants to share Thunderbird/Seamonkey address books with a spouse? I mean, how hard can it be?

What I’d like:

  • Each of our “Personal Address Book” collections shows up as a list in the other’s address book (e.g., mine shows up on my wife’s machine as “Samuel’s Address Book”; it could use the machine name instead, if that’s easier).
  • We can see one another’s mailing lists in our address books
  • Manual sync is fine — automatic would be even better
  • Simplistic merging is OK, so long as there’s a way to resolve conflicts
  • Ability to mark lists as private or shared

2. Can I really be the only person who wants to share a checkbook program with a spouse? I mean, how hard can it be?

What I’d like:

  • Ability to enter checks / charges / deposits into a common account register
  • Ability for either person to perform reconciliation
  • Ability to have accounts that are not shared

3. Can I really be the only person who wants to have an intelligent, revision-capable backup script that doesn’t require shell on the destination end? I mean, how hard can it be?

What I’d like:

  • rdiff-backup, but able to push the backup files to the destination over ftp.

More to come, as I experience more outrage.

Sun, 22 Oct 2006

Reverse SSH tunnels in Mac OS X

— SjG @ 9:02 am

I’m one of the many people who will be using VNC to do remote assistance for a relative using Windows.

There are a number of tutorials out there. Most of them don’t work for me because they require the ability to VNC into the remote system, which won’t work in my case because the remote Windows box is behind a firewall/router that I can’t configure. There are also several reverse approaches out there, where the user needing assistance initiates the connection. The first of these I saw was Gina Trapani’s approach at Geek to Live, which uses UltraVNC on both ends. This is almost the solution I want, except that it requires Windows on my end as well. It also assumes that I’m at a fixed location.

In the comments, I came across Fazal Majid’s response. He had the same requirements as I do, and he links to the source of a customized VNC server he built that connects to a fixed IP address. Fazal’s approach matches my needs exactly.

But then I ran into the problem of the last step: the reverse SSH tunnel from my known server (which gets hard-coded into the executable) to my notebook running Chicken of the VNC.

Building reverse SSH tunnels is really not that difficult. When I created the setup, I was able to make it work from a Linux machine and from a Cygwin terminal under Windows, but it mysteriously failed under Mac OS X. Using lots of -v flags, I kept seeing the service for the port on the Mac side refusing the connection from the tunnel. The ssh debug output looked like:

debug1: remote forward success for: listen 5900, connect localhost:5500
debug1: client_input_channel_open: ctype forwarded-tcpip rchan 2 win 131072 max 32768
debug1: client_request_forwarded_tcpip: listen localhost port 5900, originator ::1 port 60475
debug1: channel 0: new [::1]
debug1: confirm forwarded-tcpip
debug3: channel 0: waiting for connection
debug1: channel 0: not connected: Connection refused
debug2: channel 0: zombie
debug2: channel 0: garbage collecting

It turns out that this means the tunnel doesn’t even see the service. After wasting time with firewall tests and a lot of other false leads, I finally noticed the [::1] notation in there. Yup, that’s an IPv6 address (the IPv6 loopback, the counterpart of 127.0.0.1). The solution is to make sure the ssh tunnel uses IPv4, which is what the -4 flag does. For reference, the command that works is:

ssh -nNT4 -R 5500:localhost:5500 -l my_username

Wed, 7 Sep 2005

Build PHP 5.0.5 for Mac OS X 10.4

— SjG @ 8:39 pm

I’m trying to track down a bug in my CMS Made Simple PHP code where I did something stupid with references (a rant on the PHP pass-by-value model available upon request). It only manifests with PHP 4.4.x or PHP 5.0.5, since that’s where they finally decided to get strict with us idiot slackers. Neither of these is available as a binary package for Mac OS X 10.4.

I was dismayed, shocked, stunned, dazed, and confused to learn that PHP was no longer a package for Fink. Dammit! Now I have to figure it out for myself. Crap.

With the help of a variety of pages out there on the web (especially this one), I was able to do it. Here’s how:

Install Fink, if you haven’t already. I use the “unstable” packages. Read the FAQ, and muck around with it for a while until you feel ready to proceed.

Install a wholehellovalotta packages using Fink:

  • libjpeg
  • libtiff
  • libpng
  • libxml2

I also installed the Fink version of MySQL server 4.1, client, and a bunch of shared libraries.

Next, gotta build zlib:
curl -O
tar xzvf zlib-1.2.3.tar.gz
cd zlib-1.2.3
./configure --prefix=/sw
make
(su if necessary)
make install

(I did my install work in /sw/src, but you could do it somewhere else if it pleases you more. Just take note of this other location when you need it later.)

Finally, we get PHP 5.0.5 from

tar xzvf php-5.0.5.tar.gz
cd php-5.0.5
./configure --with-libjpeg=/sw --with-libtiff=/sw --with-libpng=/sw \
--with-gd --with-mysql=/sw --with-xml --with-apxs --with-exif \
--with-jpeg-dir=/sw --enable-exif --with-png-dir=/sw --with-zlib-dir=/sw
make
(su if necessary)
make install

Now, I already had a version of PHP installed before this, provided by Marc Liyanage’s excellent binary packages available at his page, so I didn’t need to tweak my php.ini file. If you do, you’d probably do something like:

cp /sw/src/php-5.0.5/php.ini-recommended /usr/local/lib/php.ini

and then edit into submission. You can also use the more general php.ini-dist instead of php.ini-recommended. I don’t know why they provide both; probably to confuse idiots like me. You’ll also need to register the PHP MIME type with Apache. Edit your /etc/httpd/httpd.conf file and add the following line, either in the main server section or inside a specific virtual host:

AddType application/x-httpd-php .php
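If you’d rather script that edit, here is a small idempotent sketch: it appends the line only if an identical line isn’t already in the file, so re-running it is harmless. HTTPD_CONF and add_line_once are made-up names; HTTPD_CONF defaults to a scratch temp file here, so point it at /etc/httpd/httpd.conf (with sudo) only once you’re happy with the result. Appending puts the line at the end of the file, i.e. in the main server config rather than inside a virtual host.

```shell
#!/bin/sh
# add_line_once LINE FILE (hypothetical helper): append LINE to FILE unless
# an identical whole line (-x, fixed string -F) is already present.
add_line_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Assumption: defaults to a scratch file so the demo never touches Apache.
HTTPD_CONF=${HTTPD_CONF:-$(mktemp)}

add_line_once 'AddType application/x-httpd-php .php' "$HTTPD_CONF"
add_line_once 'AddType application/x-httpd-php .php' "$HTTPD_CONF"   # no-op
count=$(grep -cxF 'AddType application/x-httpd-php .php' "$HTTPD_CONF")
echo "$count"
```

Apache only picks up the change after a restart, e.g. sudo apachectl graceful.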

Now test it! Create a test file in your web root containing:

<?php phpinfo(); ?>

and browse on over to it. With any luck, you’ll be greeted with a happy PHP 5.0.5 banner.
Celebrate this with red wine. Preferably good red wine. Then get back to coding. As should be obvious, I decided to document instead of code, but I didn’t skip that vital red wine step.