This program wraps HTML links around URLs found in files. It doesn't recognize every possible URL, but it handles the most common ones. It tries hard to keep end-of-sentence punctuation out of the marked-up URL.
It is a typical Perl filter, so it can be used by feeding it input:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:
% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6.13.
#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs
$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";
while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{<A HREF="$1">$1</A>}igox;
    print;
}
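To see how the minimal match and the look-ahead cooperate, it may help to run the same substitution on a single line. The following is only a sketch: the subroutine name urlify_line and the sample sentences are hypothetical and are not part of the original program. The /o modifier is left out here because the pattern variables never change.

```perl
#!/usr/bin/perl
# urlify_demo - hypothetical harness that applies the urlify pattern to one line
use strict;
use warnings;

# Same character classes as in urlify
my $urls = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# urlify_line is an illustrative helper: it returns a copy of its
# argument with each URL-like construct wrapped in an <A HREF> tag.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                      # start at word boundary
        ( $urls : [$any]+? )    # $1: scheme, colon, as few URL chars as possible
        (?=                     # look-ahead, consumes nothing
            [$punc]* [^$any]    # optional punctuation, then a non-URL char
          |                     # or else
            $                   # end of the string
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

print urlify_line("See http://www.perl.com/CPAN/. It is large.\n");
print urlify_line("Is it at ftp://ftp.example.com/pub? Yes.\n");
```

In both sample lines the sentence punctuation (the final period and the question mark) stays outside the link, because the look-ahead lets the minimal match stop just before any run of punctuation that is followed by a non-URL character.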
Copyright © 2001 O'Reilly & Associates. All rights reserved.