Mon, 9 Jul 2007

Preventing “Overlapping” cron Processes

— SjG @ 9:43 am

I have a number of very time-consuming processes that get run by cron on various machines. Some of these processes would cause problems if they “overlapped”, i.e., if a new one got started before the old one was done.

Now, there are a lot of ways to make sure you’re unique if you’re a process, but often I don’t want to modify the source of the process to add that (for many packages, I’d rather not patch and merge and recompile every time a new version comes out). So I write a simple shell script to run the process; cron calls my shell script, and the script prevents the overlap.

This uses the magic of “pgrep” — unfortunately, different versions of pgrep have different flags, so the code I originally wrote (which used the “-c” flag, which counts the matching processes) didn’t port to most systems. It’s easy enough to pipe the output through a “wc -l”.

I did have to move the pgrep exec out of my if statement, though, since the comparison was going against the return code, not the output. Doh!


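# No spaces around "=" in a shell assignment.
# Caveat: -f matches full command lines, so this wrapper's own name must not contain the pattern.
RUNNING_PROCS=`pgrep -f longRunningProcess | wc -l`
if [ "$RUNNING_PROCS" -gt "0" ]
then
        echo `date` longRunningProcess still running. I\'ll let it finish.
else
        echo `date` Starting longRunningProcess.
        /path/to/longRunningProcess -flags
fi
echo "----------------------------"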
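
A minimal sketch of the return-code-versus-output distinction mentioned above. The pattern is a made-up placeholder, chosen so that nothing on the system should match it; it is not part of the original script. The first test looks at pgrep's exit status, the second counts lines of output, and the two should agree.

```shell
#!/bin/sh
# Hypothetical pattern: the PID in the middle makes an accidental match unlikely.
PATTERN="no-such-job-$$-example"

# Form 1: test the exit status. pgrep exits 0 only when something matched.
if pgrep -f "$PATTERN" > /dev/null 2>&1
then
    STATUS_SAYS=running
else
    STATUS_SAYS=idle
fi

# Form 2: test the output by counting lines (tr strips any padding wc adds).
COUNT=$(pgrep -f "$PATTERN" 2>/dev/null | wc -l | tr -d ' ')
if [ "$COUNT" -gt 0 ]
then
    OUTPUT_SAYS=running
else
    OUTPUT_SAYS=idle
fi

echo "exit status: $STATUS_SAYS, line count: $OUTPUT_SAYS"
```

Either form can drive the if statement; the thing to avoid is writing the first form while expecting it to compare the count.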


Wed, 27 Jun 2007

Unix: How to find files lacking certain strings

— SjG @ 4:10 pm

So, I’m working on a convoluted web site, and a problem comes up. It seems that some vitally important code was not included in some pages (for the sake of argument, let’s say it’s a copyright string). This particular site has an ungodly mix of files, including .htm, .html, and .jsp files. Some of the .jsp files are actual pages, and others are stubs to be included in other .jsp pages. The majority of the full .jsp pages include a “footer.jsp” that has the desired string, so they’re good. But I need to generate a list of the full pages, of whatever sort, that lack this string.

The inverse of this problem is easy, and is the kind of thing I use all the time (the parentheses matter: without them, -exec binds only to the last -name test):
find . \( -name \*.htm -o -name \*.html -o -name \*.jsp \) -exec grep -il "myString" {} \;

Initially, I thought using the -v flag to grep would work for me, but grep -vl returns nearly every file it sees, because -v inverts the match line by line: -l then lists any file with at least one line that doesn’t contain the string, not the files that lack the string entirely. Then there’s the problem that I need to match “full” pages rather than included .jsp stubs.

So here’s how the Mighty Power of Unix came to my rescue:

find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | sort -u > full_pages.txt

provides me with a list of pages that are not mere inclusions, if you accept my assumption that an inclusion won’t match the closing HTML tag.

Then I generate a list of full pages that contain the magic string and/or include the footer.jsp that would contain the magic string (despite its name, pages_no_string.txt holds the pages that do have it):
find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | xargs grep -le "uniqueCopyrightTag\|footer\.jsp" | sort -u > pages_no_string.txt

Then I compare the files to find out which full pages lack both the magic string and the include:
comm -3 pages_no_string.txt full_pages.txt

Wow. There it is!
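
A small sketch of the comparison step on two tiny made-up lists (the file contents here are hypothetical stand-ins, not output from the real site). With comm -3 the surviving lines land in the second column and so come out tab-indented; adding -1 as well suppresses the first column and leaves them flush left.

```shell
#!/bin/sh
DIR=$(mktemp -d)
# Hypothetical stand-ins for the two sorted lists built above.
printf './a.html\n./c.jsp\n'          > "$DIR/pages_no_string.txt"
printf './a.html\n./b.htm\n./c.jsp\n' > "$DIR/full_pages.txt"

# -1 hides lines found only in the first file, -3 hides lines found in both,
# leaving just the lines unique to full_pages.txt.
LACKING=$(comm -13 "$DIR/pages_no_string.txt" "$DIR/full_pages.txt")
echo "$LACKING"
rm -rf "$DIR"
```

Here the only line left is ./b.htm, the one page that appears in the full list but not in the list of pages carrying the string.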

I bet there’s an easier way. Post an example in the comments if you know of one!
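
One candidate for an easier way, assuming a grep that supports the -L flag (list files with no matching line; GNU and BSD grep both have it, but it is not in every grep). The sample files below are invented purely for illustration. This is a sketch, not a tested replacement for the three-step version.

```shell
#!/bin/sh
# Build a throwaway directory with hypothetical sample pages.
DIR=$(mktemp -d)
printf '<html>uniqueCopyrightTag</html>\n'               > "$DIR/good.html"
printf '<html>no tag here</html>\n'                      > "$DIR/bad.htm"
printf '<html><jsp:include page="footer.jsp"/></html>\n' > "$DIR/included.jsp"
printf '<p>just a stub</p>\n'                            > "$DIR/stub.jsp"

# Step 1: keep full pages only (they contain the closing HTML tag).
# Step 2: of those, list the files with NO line matching either string.
MISSING=$(find "$DIR" \( -name \*.htm -o -name \*.html -o -name \*.jsp \) -print \
    | xargs grep -il "</html>" \
    | xargs grep -L -e "uniqueCopyrightTag" -e "footer\.jsp" \
    | sort -u)

echo "$MISSING"
rm -rf "$DIR"
```

With these four sample files, only bad.htm is listed: stub.jsp is dropped in step 1, and the other two full pages each match one of the strings. It skips the intermediate files and the comm step entirely.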

NOTE: All commands are on a single line, regardless of whether they wrap in this particular display.

Wed, 9 May 2007

Extracting Scripts from Javascript pages using Javascript

— SjG @ 1:56 pm

Here’s a weird one. There was the need to extract the contents of all Javascript <script> … </script> tags from an html page, using Javascript in an Ajax-y environment*. I tried using a similar regular expression to the one published by Matt Mecham, but found that IE threw an error. IE didn’t like the [^] construct.

So, since I knew that the pages that this would need to process would be standard strings with nothing odd in them, I substituted [^\0]. Works in Firefox and IE. I don’t know if it breaks under different encodings, though.

The other problem was conceptual: I didn’t remember that regex.exec() only gives you the first match in the resultant array (but gives you your submatches), so you have to call it in a loop to get them all. I confused it with the behavior of string.match(), which with the “g” flag gives you every match but not your submatches. *sigh*

So the code looks like this:

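var reg = new RegExp("<script[^>]*>([^\\0]*?)<\\/script>", "ig");
var m2, i;
// With the "g" flag, each exec() call resumes where the previous match ended.
while ((m2 = reg.exec(http.responseText)) != null) {
        // m2[0] is the whole match; m2[1] is the body of the script tag.
        for (i = 1; i < m2.length; i++) {
           alert(i + '(' + m2[i].length + ')' + m2[i]);
           // do other stuff
        }
}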

(Please note that WordPress seems insistent on munging that code. Spacing, in particular, might be corrupted.)

(* note the use of the passive voice. To protect the innocent, we won’t say who/why it was needed.)

Mon, 2 Apr 2007

Buffalo Terastation Problems

— SjG @ 4:20 pm

I’ve written here numerous dull tirades on the subject of backups. Well, here’s more.

We had my brand new shiny backup script working on the LAN to back up all the servers to a Linux box with a 300GB hard drive. For extra security, we copied it out to a Buffalo Terastation, which also serves as our office fileserver. For that extra bit of security, the Terastation is formatted as two shares, each of 250GB (using RAID-1). One of those shares is the office fileshare, the other is for server backups.

Well, there was a slight *cough* stupid *cough* problem with one of my backup scripts over a weekend, which resulted in a recursive backup of a directory (doh!). This filled up the disk on the Linux box, but it didn’t prevent it from happily trying to copy it all to the Terastation (using lftp).

When I came in on Monday, the Terastation was not happy. It simultaneously said the drives were ~30% full, and said that it couldn’t find any disks at all. FTP connections were dropped immediately. We were able to copy a few files off of it from machines that had kept the drive mounted via SMB, but then it would disconnect and vanish from that machine’s network visibility. This was not good. At some point, we thought it might be a good idea to try enabling another protocol to access the data, which had the unfortunate side effect of switching the Terastation admin into Japanese.

Tech support took about 20 minutes to answer the call, but they were courteous and helpful. Eventually, they concluded that the controller board was bad. To get a replacement, they charged our credit card the price of a new unit, and shipped it out, with the understanding that we’d swap the drives into the new unit, send back the old one, and get credited back the money. While this is not ideal, I can understand why they do it that way.

In any case, the new unit arrived today. I went through the effort of swapping the drives from one unit to the other (which is a lot more complicated than it should be, requiring a lot of screws). And voila! Still a Japanese admin, and no ability to access the data.

My working theory now is that the Terastation stores configuration data on the drives, and when the one share filled up, it corrupted the config data somehow. I’ll call tech support tomorrow and see what I can learn. *sigh*

Sun, 25 Mar 2007

Backups, cont.

— SjG @ 9:50 pm

OK. I’m a bonehead. The link I provided to my backup script tarball was broken. The link is fixed.

But wait! A new version of the scripts will be posted in a few days. It’s got some bug fixes and some new features. With it, the little birds really do sing more cheerfully, and the colors really will be brighter.

(As an aside … I don’t know why none of the people who clicked on the broken link bothered to send me an email or leave a comment to tell me there was a problem. Could that all have been robot traffic?)