fogbound.net




Wed, 27 Jun 2007

Unix: How to find files lacking certain strings

— SjG @ 4:10 pm

So, I’m working on a convoluted web site, and a problem comes up. It seems that some vitally important code was not included in some pages (for the sake of argument, let’s say it’s a copyright string). This particular site has an ungodly mix of files, including .htm, .html, and .jsp files. Some of the .jsp files are actual pages, and others are stubs to be included in other .jsp pages. The majority of the full .jsp pages include a “footer.jsp” that has the desired string, so they’re good. But I need to generate a list of the full pages, of whatever sort, that lack this string.

The inverse of this problem is easy, and is the kind of thing I use all the time:
find . \( -name \*.htm -o -name \*.html -o -name \*.jsp \) -exec grep -il "myString" {} \;

Initially, I thought using the -v flag to grep would work for me, but grep -vl returns practically every file it sees, because -v inverts the match line by line rather than file by file: any file with at least one line lacking the string gets listed. Then there’s the problem that I need to match “full” pages rather than included .jsp stubs.
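To make the difference concrete, here is a minimal sketch comparing -vl with the -L flag (list files with no matching line). It assumes a grep that supports -L, as GNU and BSD grep do; the sample file names are made up for illustration.

```shell
# Build a throwaway directory with two sample pages (hypothetical names).
demo=$(mktemp -d)
printf '<html>\nCopyright myString\n</html>\n' > "$demo/has.html"
printf '<html>\nno notice here\n</html>\n' > "$demo/lacks.html"

# -v -l lists any file with at least one NON-matching line: both files.
with_v=$(cd "$demo" && grep -ivl "myString" has.html lacks.html)

# -L lists only files with no matching line at all: just lacks.html.
with_L=$(cd "$demo" && grep -iL "myString" has.html lacks.html)

printf 'grep -vl:\n%s\n\ngrep -L:\n%s\n' "$with_v" "$with_L"
rm -rf "$demo"
```

So -L answers the “which files lack the string” question directly, though it does nothing about telling full pages from stubs.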

So here’s how the Mighty Power of Unix came to my rescue:

find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | sort -u > full_pages.txt

provides me with a list of pages that are not mere inclusions, if you accept my assumption that an inclusion won’t match the closing HTML tag.
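One caveat with piping find into xargs is that file names containing spaces get split apart. A hedged variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do), might look like the sketch below. Note that once an explicit action such as -print0 is added, the -name tests need grouping with \( \), or the action applies only to the last one. The sample files are hypothetical.

```shell
# Sample tree: two full pages (one with a space in its name) and one stub.
demo=$(mktemp -d)
printf '%s\n' '<html><body>page</body></html>' > "$demo/about us.html"
printf '%s\n' '<div>stub</div>' > "$demo/nav stub.jsp"
printf '%s\n' '<HTML><body>old page</body></HTML>' > "$demo/old.htm"

# NUL-separated names survive spaces; -i also catches an upper-case </HTML>.
full_pages=$(cd "$demo" &&
  find . \( -name '*.htm' -o -name '*.html' -o -name '*.jsp' \) -print0 |
  xargs -0 grep -il "</html>" | sort -u)

printf '%s\n' "$full_pages"
rm -rf "$demo"
```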

Then I generate a list of full pages that contain the magic string and/or include the footer.jsp that would contain it:
find . -name \*.htm -o -name \*.html -o -name \*.jsp | xargs grep -il "</html>" | xargs grep -le "uniqueCopyrightTag\|footer\.jsp" | sort -u > pages_with_string.txt

Then I compare the two lists to find out which full pages lack both the magic string and the include (-13 hides the lines unique to the first file and the lines common to both, leaving only the full pages missing from the first list):
comm -13 pages_with_string.txt full_pages.txt
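For what it’s worth, the same result can probably be had in one pipeline without the intermediate files. This is only a sketch under a few assumptions: grep supports -L, find and xargs support -print0 and -0, and no file name contains a newline. The sample files are invented for the demonstration.

```shell
# Sample tree: three full pages and one include stub (hypothetical names).
demo=$(mktemp -d)
printf '%s\n' '<html>uniqueCopyrightTag</html>' > "$demo/tagged.html"
printf '%s\n' '<html><jsp:include page="footer.jsp"/></html>' > "$demo/included.jsp"
printf '%s\n' '<html>nothing</html>' > "$demo/missing.htm"
printf '%s\n' '<div>stub only</div>' > "$demo/stub.jsp"

# Step 1: keep only full pages (those containing </html>).
# Step 2: of those, list the ones with neither the tag nor the footer include.
missing=$(cd "$demo" &&
  find . \( -name '*.htm' -o -name '*.html' -o -name '*.jsp' \) -print0 |
  xargs -0 grep -il "</html>" |
  tr '\n' '\0' |
  xargs -0 grep -L -e "uniqueCopyrightTag" -e "footer\.jsp" |
  sort -u)

printf '%s\n' "$missing"
rm -rf "$demo"
```

The tr step turns the newline-separated list from the first grep back into NUL-separated names, so that names with spaces reach the second grep intact.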

Wow. There it is!

I bet there’s an easier way. Post an example in the comments if you know of one!

NOTE: All commands are on a single line, regardless of whether they wrap in this particular display.


Sat, 16 Jun 2007

You Can’t Win

— SjG @ 6:16 pm

You Can’t Win, by Jack Black, 1926, reprinted by Nabat Press, 2000.

This is an interesting, conflicted, tripartite book. It’s an autobiography of a hobo and burglar, a jailbird, and a reform activist. The book starts as a good-natured telling of how Black left home and became a hobo. We follow him as he gets caught up in the seamier side of life away from home and, ostensibly through misunderstandings, comes to fall fully on the wrong side of the law. The arc continues through opium addiction, prison, and abuse, and ends in reform and moral outrage.

The first part of the telling is a light, almost romantic adventure. The young man goes off, has adventures in the city, then starts to ride the rails. Sure, there’s danger, there are police and railyard bulls to avoid, there’s even sudden death from shifting cargo, but the story is told with something close to the exuberance of youth. Black encounters other hobos, who welcome him into the family, teach him the argot, and start showing him the ropes.

From here, the tale darkens. Black apprentices himself out to be a burglar, and the situations get more perilous. Friends get killed; Black gets into and out of prison. Still, the tale is rip-roaring adventure: now a member of the brotherhood of thieves, Black introduces us to a cast of wild characters. He describes to us the great hobo gatherings, with their camaraderie and drunken abandon. He details many hair-raising exploits of burglary and safe breaking.

The latter part of the book involves a lot more prison, betrayal, and drug addiction. It still has elaborate capers of theft and jailbreak, but now Black has suffered under the system. Authority is now beating him down, and he responds with wantonness and violence. In the end, there is kindness and reform.

The book is particularly intriguing for its shifts of tone. There is definite pride in the exploits, even if the words condemn his actions. The latter parts of the book are quite bitter, and the emotions are contradictory: Black blames the cruel neglect and abuse of society for making him into a monster, yet he also happily admits that he never had any interest in becoming part of society or behaving in a way that society would accept. This is what makes the book more than just a personal journey or a thriller; we experience the world from Black’s perspective, seeing hypocrisies both in the society with which he’s in conflict and in his antisocial lifestyle.


Church Signs

— SjG @ 5:41 pm

Facing the main street at a church up the road from here is one of those illuminated signboards with the movable letters. For years, it amused me with its inadvertent proclamation:

SUN.WORSHIP
10-11 WEEKLY
ALL WELCOME

I’d always had to resist stealing the punctuation. “Get thee behind me, Loki!” I’d say quietly under my breath. And somehow I forbore.

Evidently, however, I was not the only one who interpreted the sign that way. So it’s been changed:

SUNDAY
WORSHIP 10-11
ALL WELCOME

But I still wonder if it bothers them that they’ve just abstracted the misunderstanding by one level. After all, the word “Sunday” originates in Pagan sun worship (refs: here which links to other sources, and numerous others).
