Wed, 9 May 2007

Extracting Scripts from Javascript pages using Javascript

— SjG @ 1:56 pm

Here’s a weird one. There was a need to extract the contents of all Javascript <script> … </script> tags from an HTML page, using Javascript in an Ajax-y environment*. I tried a regular expression similar to the one published by Matt Mecham, but IE threw an error: it didn’t like the [^] construct.

Since I knew that the pages this would need to process were ordinary strings with nothing odd in them, I substituted [^\0], which matches any character except NUL (newlines included). It works in Firefox and IE. I don’t know whether it breaks under other encodings, though.
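A minimal sketch comparing the "any character" options, runnable in any modern engine such as Node; the HTML string is made up for illustration. The reason a special class is needed at all is that `.` does not match line breaks (unless the newer ES2018 `s` flag is used). `[\s\S]` is a commonly used cross-browser alternative to `[^\0]` that excludes nothing.

```javascript
// [^]    : empty negated class; valid ECMAScript, but IE rejected it.
// [^\0]  : anything except the NUL character (the workaround described above).
// [\s\S] : whitespace or non-whitespace, i.e. truly any character.
var html = "<p>hi</p><script>\nvar a = 1;\n</script>";

var dotOnly = /<script[^>]*>(.*?)<\/script>/i;      // "." stops at line breaks
var notNul = /<script[^>]*>([^\0]*?)<\/script>/i;   // crosses line breaks
var anyChar = /<script[^>]*>([\s\S]*?)<\/script>/i; // crosses line breaks

console.log(dotOnly.test(html));                    // false: the body contains "\n"
console.log(JSON.stringify(notNul.exec(html)[1]));  // "\nvar a = 1;\n"
console.log(JSON.stringify(anyChar.exec(html)[1])); // "\nvar a = 1;\n"
```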

The other problem was conceptual. I had forgotten that regex.exec() returns only one match per call (but with its submatches), so with the g flag you have to keep calling it in a loop. I had confused it with string.match(), which, given the g flag, returns every match at once but not the submatches. *sigh*
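To make the exec()/match() distinction concrete, here is a small sketch using a made-up string with two script blocks. With the g flag, match() hands back the full text of every match, while exec() returns one match at a time (capture groups included) and advances the regex's lastIndex, which is why it has to be called repeatedly.

```javascript
var html = '<script>a()</script><p>x</p><script type="text/javascript">b()</script>';

// String.prototype.match with the g flag: every full match, no capture groups.
var all = html.match(/<script[^>]*>([^\0]*?)<\/script>/ig);
console.log(all.length); // 2
console.log(all[0]);     // <script>a()</script>  (tags included, no submatch)

// RegExp.prototype.exec with the g flag: one match per call, with its
// capture groups; lastIndex moves forward so the next call finds the next match.
var reg = /<script[^>]*>([^\0]*?)<\/script>/ig;
var bodies = [];
var m;
while ((m = reg.exec(html)) !== null) {
  bodies.push(m[1]); // m[0] is the whole tag, m[1] is the script body
}
console.log(JSON.stringify(bodies)); // ["a()","b()"]
```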

So the code looks like this:

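var reg = new RegExp("<script[^>]*>([^\\0]*?)<\\/script>", "ig");
var m2, i;   // declared so they don't leak out as implicit globals
while ((m2 = reg.exec(http.responseText)) !== null) {
    // m2[0] is the whole <script> element; m2[1] onward are the submatches
    for (i = 1; i < m2.length; i++) {
        alert(i + '(' + m2[i].length + ')' + m2[i]);
        // do other stuff
    }
}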

(Please note that WordPress seems insistent on munging that code. Spacing, in particular, might be corrupted.)
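One way to package that loop as a reusable function, sketched under assumptions: the name extractScripts and the sample input are hypothetical, not part of the original code. It builds a fresh RegExp on every call because a g-flagged regex keeps its lastIndex between exec() calls, so sharing one object across inputs can skip matches if an earlier loop stopped partway through.

```javascript
// Hypothetical helper: collect the body of every <script>...</script> block.
function extractScripts(text) {
  // New regex per call, so lastIndex always starts at 0 for this input.
  var reg = new RegExp("<script[^>]*>([^\\0]*?)<\\/script>", "ig");
  var bodies = [];
  var m;
  while ((m = reg.exec(text)) !== null) {
    bodies.push(m[1]); // the captured script body (may be empty)
  }
  return bodies;
}

// Made-up input; in the post this would be http.responseText.
var page = "<SCRIPT>\nfoo();\n</SCRIPT><div></div><script src=\"x.js\"></script>";
var scripts = extractScripts(page);
console.log(scripts.length);          // 2
console.log(JSON.stringify(scripts)); // ["\nfoo();\n",""]
```

Like any regex approach, this doesn't really parse HTML: a literal "</script>" inside a script's string would end the match early. Engines with ES2020 support also offer String.prototype.matchAll as a loop-free variant.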

(* Note the use of the passive voice. To protect the innocent, we won’t say who needed it or why.)
