=head1 NAME

lwpcook - The libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library.  You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.

=head1 GET

It is very easy to use this library to just fetch documents from the
net.  The LWP::Simple module provides the get() function that returns
the document specified by its URL argument:

  use LWP::Simple;
  $doc = get 'http://www.linpro.no/lwp/';

or, as a perl one-liner using the getprint() function:

  perl -MLWP::Simple -e 'getprint "http://www.linpro.no/lwp/"'

or, how about fetching the latest perl by running this command:

  perl -MLWP::Simple -e '
    getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

  perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'

Enough of this simple stuff!  The LWP object oriented interface gives
you more control over the request sent to the server.  Using this
interface you have full control over the headers sent and over how you
handle the response returned.

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/0.1 " . $ua->agent);
  # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

  $req = HTTP::Request->new(GET => 'http://www.linpro.no/lwp');
  $req->header('Accept' => 'text/html');

  # send request
  $res = $ua->request($req);

  # check the outcome
  if ($res->is_success) {
     print $res->decoded_content;
  }
  else {
     print "Error: " . $res->status_line . "\n";
  }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.

=head1 HEAD

If you just want to check whether a document is present (i.e. the URL
is valid), try to run code that looks like this:

  use LWP::Simple;

  if (head($url)) {
     # ok document exists
  }

In list context the head() function returns a list of meta-information
about the document.  The first three values of the list returned are
the content type, the length of the document, and the last-modified
time of the document.

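As a minimal sketch of using that list, the following unpacks the first
three values returned by head().  The URL is a placeholder chosen for
illustration, and any of the values may be undefined if the server does
not send the corresponding header.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Placeholder URL, used only for illustration.
my $url = 'http://www.example.com/';

# In list context head() returns an empty list if the request fails.
my @info = head($url);

if (@info) {
    my ($content_type, $document_length, $modified_time) = @info;
    print "type:     ", (defined $content_type    ? $content_type    : 'unknown'), "\n";
    print "length:   ", (defined $document_length ? $document_length : 'unknown'), "\n";
    print "modified: ", (defined $modified_time
                         ? scalar localtime($modified_time)
                         : 'unknown'), "\n";
}
else {
    print "HEAD request for $url failed\n";
}
```
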
More control over the request, or access to all header values returned,
requires that you use the object oriented interface described for GET
above.  Just s/GET/HEAD/g.

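Spelled out, that substitution might look like the sketch below.  It
reuses the URL from the GET section.  Because a HEAD response has no
body, the sketch prints the response headers instead of the content.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(HEAD => 'http://www.linpro.no/lwp/');
my $res = $ua->request($req);

if ($res->is_success) {
    # A HEAD response carries headers only, so print those.
    print $res->headers->as_string;
}
else {
    print "Error: ", $res->status_line, "\n";
}
```
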
=head1 POST

There is no simple procedural interface for posting data to a WWW server.  You
must use the object oriented interface for this. The most common POST
operation is to access a WWW form application:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(POST => 'http://www.perl.com/cgi-bin/BugGlimpse');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('match=www&errors=0');

  my $res = $ua->request($req);
  print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message.  It handles all the escaping issues and has a
suitable default for the content_type:

  use HTTP::Request::Common qw(POST);
  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ match => 'www', errors => 0 ];

  print $ua->request($req)->as_string;

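To see what the escaping amounts to, you can inspect the request that
HTTP::Request::Common builds without sending it.  This is only a
sketch; the field value containing a space and an ampersand is made up
for illustration.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Made-up field value; the space and the '&' both need escaping.
my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
               [ match => 'a b&c', errors => 0 ];

print $req->content_type, "\n";   # the default form content type
print $req->content, "\n";        # the URL-encoded form data
```

Nothing is sent over the network here; the request object is only
constructed and printed.
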
The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.

=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as a
cache in order to improve performance.  Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy setting before you start sending
requests:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->env_proxy; # initialize from environment variables
  # or
  $ua->proxy(ftp  => 'http://proxy.myorg.com');
  $ua->proxy(wais => 'http://proxy.myorg.com');
  $ua->no_proxy(qw(no se fi));

  my $req = HTTP::Request->new(GET => 'wais://xxx.com/');
  print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.

Some proxies also require that you send them a username/password in
order to let requests through.  You should be able to have the
required header added by putting the credentials in the proxy URL,
with something like this:

  use LWP::UserAgent;

  $ua = LWP::UserAgent->new;
  $ua->proxy(['http', 'ftp'] => 'http://username:password@proxy.myorg.com');

  $req = HTTP::Request->new('GET',"http://www.perl.com");

  $res = $ua->request($req);
  print $res->decoded_content if $res->is_success;

Replace C<proxy.myorg.com>, C<username> and
C<password> with something suitable for your site.

=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authorization can easily be accessed
like this:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $req = HTTP::Request->new(GET => 'http://www.linpro.no/secret/');
  $req->authorization_basic('aas', 'mypassword');
  print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method. Study the I<lwp-request>
program for an example of this.

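A minimal sketch of such a subclass follows.  The class name
C<MyAgent> is hypothetical, and the hard-coded username and password
are the same placeholders used above.  A real program would more
likely prompt for them, as I<lwp-request> does.

```perl
use strict;
use warnings;

package MyAgent;                 # hypothetical class name
use parent 'LWP::UserAgent';

# LWP calls this when a server (or proxy) asks for credentials.
# Return a (username, password) list, or an empty list to give up.
sub get_basic_credentials {
    my ($self, $realm, $uri, $isproxy) = @_;
    return if $isproxy;              # this sketch has no proxy credentials
    return ('aas', 'mypassword');    # placeholder values
}

package main;
use HTTP::Request;

my $ua  = MyAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => 'http://www.linpro.no/secret/'));
print $res->status_line, "\n";
```

With this approach the credentials are only sent after the server has
challenged the first, unauthenticated request.
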
=head1 COOKIES

Some sites like to play games with cookies.  By default LWP ignores
cookies provided by the servers it visits.  LWP will collect cookies
and respond to cookie requests if you set up a cookie jar.

  use LWP::UserAgent;
  use HTTP::Cookies;

  $ua = LWP::UserAgent->new;
  $ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt",
                                     autosave => 1));

  # and then send requests just as you used to do
  $res = $ua->request(HTTP::Request->new(GET => "http://www.yahoo.no"));
  print $res->status_line, "\n";

As you visit sites that send you cookies to keep, the file
F<lwpcookies.txt> will grow.

=head1 HTTPS

URLs with https scheme are accessed in exactly the same way as with
http scheme, provided that an SSL interface module for LWP has been
properly installed (see the F<README.SSL> file found in the
libwww-perl distribution for more details).  If no SSL interface is
installed for LWP to use, then you will get "501 Protocol scheme
'https' is not supported" errors when accessing such URLs.

Here's an example of fetching and printing a WWW page using SSL:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  my $req = HTTP::Request->new(GET => 'https://www.helsinki.fi/');
  my $res = $ua->request($req);
  if ($res->is_success) {
      print $res->as_string;
  }
  else {
      print "Failed: ", $res->status_line, "\n";
  }

=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

  use LWP::Simple;

  %mirrors = (
     'http://www.sn.no/'             => 'sn.html',
     'http://www.perl.com/'          => 'perl.html',
     'http://www.sn.no/libwww-perl/' => 'lwp.html',
     'gopher://gopher.sn.no/'        => 'gopher.html',
  );

  while (($url, $localfile) = each(%mirrors)) {
     mirror($url, $localfile);
  }

Or, as a perl one-liner:

  perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")'

The document will not be transferred unless it has been updated.

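If you want to know whether a transfer actually took place, you can
look at the status code that mirror() returns.  The following is a
sketch only; it relies on the HTTP status constants and the
is_success() function that LWP::Simple exports.

```perl
use strict;
use warnings;
use LWP::Simple;

my $rc = mirror('http://www.perl.com/', 'perl.html');

if ($rc == RC_NOT_MODIFIED) {
    print "perl.html is already up to date\n";
}
elsif (is_success($rc)) {
    print "perl.html was updated\n";
}
else {
    print "mirror failed with status $rc\n";
}
```
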
=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives.  You can instruct the library to write
the document content to a file (second $ua->request() argument is a file
name):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(GET =>
     'http://www.linpro.no/lwp/libwww-perl-5.46.tar.gz');
  $res = $ua->request($req, "libwww-perl.tar.gz");
  if ($res->is_success) {
     print "ok\n";
  }
  else {
     print $res->status_line, "\n";
  }

Or you can process the document as it arrives (second $ua->request()
argument is a code reference):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

  my $expected_length;
  my $bytes_received = 0;
  my $res =
     $ua->request(HTTP::Request->new(GET => $URL),
               sub {
                   my($chunk, $res) = @_;
                   $bytes_received += length($chunk);
                   unless (defined $expected_length) {
                      $expected_length = $res->content_length || 0;
                   }
                   if ($expected_length) {
                        printf STDERR "%d%% - ",
                                  100 * $bytes_received / $expected_length;
                   }
                   print STDERR "$bytes_received bytes received\n";

                   # XXX Should really do something with the chunk itself
                   # print $chunk;
               });
  print $res->status_line, "\n";

=head1 COPYRIGHT

Copyright 1996-2001, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.