=head1 NAME

lwpcook - The libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library.  You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.



=head1 GET

It is very easy to use this library to just fetch documents from the
net.  The LWP::Simple module provides the get() function that returns
the document specified by its URL argument:

  use LWP::Simple;
  $doc = get 'http://www.linpro.no/lwp/';

or, as a perl one-liner using the getprint() function:

  perl -MLWP::Simple -e 'getprint "http://www.linpro.no/lwp/"'

or, how about fetching the latest perl by running this command:

  perl -MLWP::Simple -e '
    getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

  perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'
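
None of the LWP::Simple calls above check for failure.  The get()
function returns undef when the document cannot be fetched, so a
program that depends on the result can test for that.  A minimal
sketch, reusing the URL from the first example (the messages printed
are only illustrative):

  use LWP::Simple;

  my $url = 'http://www.linpro.no/lwp/';
  my $doc = get($url);
  if (defined $doc) {
     print length($doc), " characters fetched\n";
  }
  else {
     # get() gives no reason for the failure, only undef
     warn "Could not fetch $url\n";
  }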

Enough of this simple stuff!  The LWP object oriented interface gives
you more control over the request sent to the server.  Using this
interface you have full control over headers sent and how you want to
handle the response returned.

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/0.1 " . $ua->agent);
  # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

  $req = HTTP::Request->new(GET => 'http://www.linpro.no/lwp');
  $req->header('Accept' => 'text/html');

  # send request
  $res = $ua->request($req);

  # check the outcome
  if ($res->is_success) {
     print $res->decoded_content;
  }
  else {
     print "Error: " . $res->status_line . "\n";
  }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.



=head1 HEAD

If you just want to check if a document is present (i.e. the URL is
valid) try to run code that looks like this:

  use LWP::Simple;

  $url = 'http://www.linpro.no/lwp/';
  if (head($url)) {
     # ok document exists
  }

The head() function actually returns a list of meta-information about
the document.  The first three values of the list returned are the
document type, the size of the document, and the last modification
time of the document.
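
Calling head() in list context gives access to those values.  The
sketch below assumes the server reports a length and a modification
time (not every server does) and treats the third value as a time in
epoch seconds; the variable names are only illustrative:

  use LWP::Simple;

  my $url = 'http://www.linpro.no/lwp/';

  # head() returns an empty list on failure
  my ($type, $length, $mtime) = head($url)
      or die "Could not HEAD $url\n";

  print "Type:     $type\n";
  print "Size:     ", (defined $length ? $length : 'unknown'), "\n";
  print "Modified: ", ($mtime ? scalar localtime($mtime) : 'unknown'), "\n";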

More control over the request, or access to all header values returned,
requires that you use the object oriented interface described for GET
above.  Just s/GET/HEAD/g.
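
Spelled out, the substitution might look like the following sketch.
It sends a HEAD request with the object oriented interface and prints
all the headers the server returned:

  use LWP::UserAgent;

  my $ua  = LWP::UserAgent->new;
  my $req = HTTP::Request->new(HEAD => 'http://www.linpro.no/lwp/');
  my $res = $ua->request($req);

  if ($res->is_success) {
     # one "Name: value" line per header field
     print $res->headers->as_string;
  }
  else {
     print "Error: " . $res->status_line . "\n";
  }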


=head1 POST

There is no simple procedural interface for posting data to a WWW server.  You
must use the object oriented interface for this. The most common POST
operation is to access a WWW form application:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(POST => 'http://www.perl.com/cgi-bin/BugGlimpse');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('match=www&errors=0');

  my $res = $ua->request($req);
  print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message.  It handles all the escaping issues and has a
suitable default for the content_type:

  use HTTP::Request::Common qw(POST);
  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www', errors => 0 ];

  print $ua->request($req)->as_string;
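
To see what the escaping amounts to, you can build a request without
sending it and look at the result.  In this sketch the field value
C<www & ftp> is made up so that there is something to escape:

  use HTTP::Request::Common qw(POST);

  # nothing is sent; we only inspect the request object
  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www & ftp', errors => 0 ];

  print $req->content_type, "\n";   # the default content type
  print $req->content, "\n";        # the form data after escaping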

The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.



=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as a
cache in order to improve performance.  Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy settings before you start sending
requests:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->env_proxy; # initialize from environment variables
  # or
  $ua->proxy(ftp  => 'http://proxy.myorg.com');
  $ua->proxy(wais => 'http://proxy.myorg.com');
  $ua->no_proxy(qw(no se fi));

  my $req = HTTP::Request->new(GET => 'wais://xxx.com/');
  print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.
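
The env_proxy() method reads variables such as C<http_proxy> and
C<no_proxy> from the environment.  These are normally set in your
shell; the sketch below sets them from within the program only to
stay self-contained, and the port number and host names are
assumptions:

  use LWP::UserAgent;

  # hypothetical values, normally set outside the program
  $ENV{http_proxy} = 'http://proxy.myorg.com:8080/';
  $ENV{no_proxy}   = 'localhost,myorg.com';

  my $ua = LWP::UserAgent->new;
  $ua->env_proxy;

  # with a single argument, proxy() returns the current setting
  print "http proxy: ", $ua->proxy('http'), "\n";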

Some proxies also require that you send them a username/password in
order to let requests through.  You should be able to add the
required header, with something like this:

  use LWP::UserAgent;

  $ua = LWP::UserAgent->new;
  $ua->proxy(['http', 'ftp'] => 'http://username:password@proxy.myorg.com');

  $req = HTTP::Request->new('GET',"http://www.perl.com");

  $res = $ua->request($req);
  print $res->decoded_content if $res->is_success;

Replace C<proxy.myorg.com>, C<username> and
C<password> with something suitable for your site.


=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authentication can easily be accessed
like this:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $req = HTTP::Request->new(GET => 'http://www.linpro.no/secret/');
  $req->authorization_basic('aas', 'mypassword');
  print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method. Study the I<lwp-request>
program for an example of this.
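
A minimal sketch of such a subclass follows.  It is not taken from
I<lwp-request>; the class name C<MyAgent> is hypothetical and the
credentials are hard-coded, where a real program would probably look
them up by realm or prompt the user:

  use LWP::UserAgent;

  {
     package MyAgent;                 # hypothetical class name
     our @ISA = ('LWP::UserAgent');

     # LWP calls this when a server or proxy asks for credentials
     sub get_basic_credentials {
        my($self, $realm, $uri, $isproxy) = @_;
        return ('aas', 'mypassword');
     }
  }

  my $ua = MyAgent->new;
  my $req = HTTP::Request->new(GET => 'http://www.linpro.no/secret/');
  print $ua->request($req)->as_string;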


=head1 COOKIES

Some sites like to play games with cookies.  By default LWP ignores
cookies provided by the servers it visits.  LWP will collect cookies
and respond to cookie requests if you set up a cookie jar.

  use LWP::UserAgent;
  use HTTP::Cookies;

  $ua = LWP::UserAgent->new;
  $ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt",
                                     autosave => 1));

  # and then send requests just as you used to do
  $res = $ua->request(HTTP::Request->new(GET => "http://www.yahoo.no"));
  print $res->status_line, "\n";

As you visit sites that send you cookies to keep, the file
F<lwpcookies.txt> will grow.
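
To get a feel for what the cookie jar does on each request, you can
drive it by hand without touching the network.  This is only a sketch
of the mechanism: the host C<www.example.com>, the cookie and the
response are all made up, and the jar is kept in memory rather than
in a file:

  use HTTP::Cookies;
  use HTTP::Request;
  use HTTP::Response;

  my $jar = HTTP::Cookies->new;     # in-memory jar, nothing saved

  # a made-up response that asks us to keep a cookie
  my $req = HTTP::Request->new(GET => 'http://www.example.com/');
  my $res = HTTP::Response->new(200, 'OK');
  $res->request($req);
  $res->header('Set-Cookie' => 'session=abc123; path=/');
  $jar->extract_cookies($res);

  # the next request to the same site gets the cookie sent back
  my $next = HTTP::Request->new(GET => 'http://www.example.com/page');
  $jar->add_cookie_header($next);
  print $next->as_string;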

=head1 HTTPS

URLs with the https scheme are accessed in exactly the same way as
those with the http scheme, provided that an SSL interface module for
LWP has been properly installed (see the F<README.SSL> file found in
the libwww-perl distribution for more details).  If no SSL interface is
installed for LWP to use, then you will get "501 Protocol scheme
'https' is not supported" errors when accessing such URLs.
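
If you would rather find out up front than wait for the 501 error,
you can ask the user agent whether it can handle the scheme.  A small
sketch using the is_protocol_supported() method (the messages are
only illustrative):

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  if ($ua->is_protocol_supported('https')) {
     print "https is supported\n";
  }
  else {
     print "https is not supported; no SSL interface is installed\n";
  }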

Here's an example of fetching and printing a WWW page using SSL:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  my $req = HTTP::Request->new(GET => 'https://www.helsinki.fi/');
  my $res = $ua->request($req);
  if ($res->is_success) {
      print $res->as_string;
  }
  else {
      print "Failed: ", $res->status_line, "\n";
  }

=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

  use LWP::Simple;

  %mirrors = (
     'http://www.sn.no/'             => 'sn.html',
     'http://www.perl.com/'          => 'perl.html',
     'http://www.sn.no/libwww-perl/' => 'lwp.html',
     'gopher://gopher.sn.no/'        => 'gopher.html',
  );

  while (($url, $localfile) = each(%mirrors)) {
     mirror($url, $localfile);
  }

Or, as a perl one-liner:

  perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")'

The document will not be transferred unless it has been updated.
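
The mirror() function returns the HTTP status code of the response,
so a program can tell whether anything was actually transferred.  A
sketch using the status constants and the is_success() function that
LWP::Simple exports (the messages are only illustrative):

  use LWP::Simple;

  my $rc = mirror('http://www.perl.com/', 'perl.html');
  if ($rc == RC_NOT_MODIFIED) {
     print "perl.html is already up to date\n";
  }
  elsif (is_success($rc)) {
     print "perl.html was updated\n";
  }
  else {
     print "mirror failed with status $rc\n";
  }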



=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives.  You can instruct the library to write
the document content to a file (second $ua->request() argument is a file
name):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(GET =>
                'http://www.linpro.no/lwp/libwww-perl-5.46.tar.gz');
  $res = $ua->request($req, "libwww-perl.tar.gz");
  if ($res->is_success) {
     print "ok\n";
  }
  else {
     print $res->status_line, "\n";
  }


Or you can process the document as it arrives (second $ua->request()
argument is a code reference):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

  my $expected_length;
  my $bytes_received = 0;
  my $res =
     $ua->request(HTTP::Request->new(GET => $URL),
               sub {
                   my($chunk, $res) = @_;
                   $bytes_received += length($chunk);
                   unless (defined $expected_length) {
                      $expected_length = $res->content_length || 0;
                   }
                   if ($expected_length) {
                        printf STDERR "%d%% - ",
                                  100 * $bytes_received / $expected_length;
                   }
                   print STDERR "$bytes_received bytes received\n";

                   # XXX Should really do something with the chunk itself
                   # print $chunk;
               });
  print $res->status_line, "\n";



=head1 COPYRIGHT

Copyright 1996-2001, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.