Extracting Scripts from HTML Pages Using Javascript
Here’s a weird one. There was a need to extract the contents of every Javascript <script> … </script> tag from an HTML page, using Javascript in an Ajax-y environment*. I tried a regular expression similar to the one published by Matt Mecham, but IE threw an error: it didn’t like the [^] construct (an empty negated character class, which other browsers treat as "any character, including newlines").
Since I knew the pages this would need to process were ordinary strings with nothing odd in them, I substituted [^\0], meaning "any character except NUL". It works in both Firefox and IE, though I don’t know whether it breaks under other encodings.
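A minimal sketch of why the substitution matters, assuming (as above) that the input contains no NUL characters. The dot stops at line breaks, while [^\0] matches them, so only the second pattern finds a multi-line script body. The sample HTML string is made up for illustration.

```javascript
// Sample input (hypothetical): a script body that spans several lines.
var html = "<script>\nvar a = 1;\n</script>";

// "." does not match line terminators (no "s" flag), so this pattern
// cannot cross the newlines inside the script body.
var dotPattern = /<script[^>]*>(.*?)<\/script>/i;

// [^\0] means "any character except NUL", which includes newlines.
// Assumption: the page text contains no NUL characters.
var notNulPattern = /<script[^>]*>([^\0]*?)<\/script>/i;

console.log(dotPattern.test(html));                           // false
console.log(JSON.stringify(notNulPattern.exec(html)[1]));     // "\nvar a = 1;\n"
```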
The other problem was conceptual: I’d forgotten that regex.exec() returns only one match at a time (with the g flag, each call picks up after the previous match), although it does include your submatches. I had confused it with string.match(), which with the g flag returns every match but none of the submatches. *sigh*
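A small sketch of the difference, using a made-up two-script string. With the g flag, match() hands back only the whole tags, while repeated exec() calls walk through the matches one by one and expose the captured script bodies.

```javascript
// Hypothetical input containing two script elements.
var text = "<script>a()</script><script>b()</script>";

// String.prototype.match with the g flag: every full match, no capture groups.
var all = text.match(/<script[^>]*>([^\0]*?)<\/script>/gi);
console.log(all.length); // 2
console.log(all[0]);     // <script>a()</script>

// RegExp.prototype.exec with the g flag: one match per call, with capture
// groups; each call resumes from the regex's lastIndex.
var reg = /<script[^>]*>([^\0]*?)<\/script>/gi;
var first = reg.exec(text);
console.log(first[1]);   // a()
var second = reg.exec(text);
console.log(second[1]);  // b()
var third = reg.exec(text);
console.log(third);      // null (no more matches; lastIndex is reset to 0)
```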
So the code looks like this:
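var reg = new RegExp("<script[^>]*>([^\\0]*?)<\\/script>", "ig");
var m2, i;
// With the g flag, each exec() call returns the next match, or null when done.
while ((m2 = reg.exec(http.responseText)) != null) {
    // m2[0] is the whole <script> element; m2[1] onward are the submatches.
    for (i = 1; i < m2.length; i++) {
        alert(i + '(' + m2[i].length + ')' + m2[i]);
        // do other stuff
    }
}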
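The same loop can be wrapped in a reusable function that collects the script bodies instead of alerting them. This is a sketch, not the original code: the name extractScripts is hypothetical, and it assumes the markup has no NUL characters and no literal "</script>" inside a script body, since the regex cannot tell those apart.

```javascript
// Hypothetical helper: returns the body of every <script>...</script>
// element in an HTML string (for example, an XMLHttpRequest's responseText).
function extractScripts(html) {
  // Build the regex inside the function so lastIndex starts at 0 on every call.
  var reg = new RegExp("<script[^>]*>([^\\0]*?)<\\/script>", "ig");
  var scripts = [];
  var m;
  while ((m = reg.exec(html)) !== null) {
    // m[0] is the whole element; m[1] is the captured script body.
    scripts.push(m[1]);
  }
  return scripts;
}

// Made-up sample page with two scripts; the "i" flag also matches <SCRIPT>.
var page = '<p>hi</p><script type="text/javascript">\nvar x = 1;\n</script>' +
           '<SCRIPT>alert("two");</SCRIPT>';
console.log(extractScripts(page).length); // 2
```

Creating the RegExp inside the function is a deliberate choice: a shared global regex with the g flag keeps its lastIndex between calls, so reusing it on a new response could silently skip matches.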
(Please note that WordPress seems insistent on munging that code. Spacing, in particular, might be corrupted.)
(* Note the use of the passive voice. To protect the innocent, we won’t say who needed it or why.)