=head1 NAME

lwptut -- An LWP Tutorial

=head1 DESCRIPTION

LWP (short for "Library for WWW in Perl") is a very popular group of
Perl modules for accessing data on the Web. Like most Perl
module-distributions, each of LWP's component modules comes with
documentation that is a complete reference to its interface. However,
there are so many modules in LWP that it's hard to know where to start
looking for information on how to do even the simplest, most common
things.

Really introducing you to using LWP would require a whole book -- a book
that just happens to exist, called I<Perl & LWP>. But this article
should give you a taste of how you can go about some common tasks with
LWP.


=head2 Getting documents with LWP::Simple

If you just want to get what's at a particular URL, the simplest way
to do it is to use LWP::Simple's functions.

In a Perl program, you can call its C<get($url)> function.  It will try
getting that URL's content.  If it works, then it'll return the
content; but if there's some error, it'll return undef.

  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
    # Just an example: the URL for the most recent /Fresh Air/ show

  use LWP::Simple;
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;

  # Then go do things with $content, like this:

  if($content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  }
  else {
    print "Fresh Air is apparently jazzless today.\n";
  }

The handiest variant on C<get> is C<getprint>, which is useful in Perl
one-liners.  If it can get the page whose URL you provide, it sends it
to STDOUT; otherwise it complains to STDERR.

  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"

That is the URL of a plaintext file that lists new files in CPAN in
the past two weeks.  You can easily make it part of a tidy little
shell command, like this one that mails you the list of new
C<Acme::> modules:

  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"  \
     | grep "/by-module/Acme" | mail -s "New Acme modules! Joy!" $USER

There are other useful functions in LWP::Simple, including one function
for running a HEAD request on a URL (useful for checking links, or
getting the last-revised time of a URL), and two functions for
saving/mirroring a URL to a local file. See L<the LWP::Simple
documentation|LWP::Simple> for the full details, or chapter 2 of I<Perl
& LWP> for more examples.


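As a rough sketch of those other functions: C<head> returns header-derived
values in list context, C<getstore> saves a URL to a file and returns the
HTTP status code, and C<mirror> does the same but skips the download when
the local copy is already current.  The helper names C<save_url>,
C<refresh_copy> and C<describe_url> are made up for illustration and are
not part of LWP::Simple.

```perl
use strict;
use warnings;
use LWP::Simple qw(head getstore mirror);

# Save $url to $file; true if the request came back as 200 OK.
sub save_url {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);    # numeric HTTP status code
    return $status == 200;
}

# Refresh a local copy; returns 'updated', 'unchanged', or 'failed'.
sub refresh_copy {
    my ($url, $file) = @_;
    my $status = mirror($url, $file);
    return 'updated'   if $status == 200;
    return 'unchanged' if $status == 304;  # Not Modified
    return 'failed';
}

# Report the type and last-modified time without downloading the body.
sub describe_url {
    my ($url) = @_;
    my ($type, $length, $mtime) = head($url)
        or return "Couldn't HEAD $url";
    return sprintf "%s, last modified %s",
        defined $type  ? $type                    : 'unknown type',
        defined $mtime ? scalar localtime $mtime  : 'unknown';
}
```

A call such as C<refresh_copy('http://cpan.org/RECENT', '/tmp/RECENT')> can
be repeated from a cron job; only changed files are fetched again.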
=for comment
 ##########################################################################

=head2 The Basics of the LWP Class Model

LWP::Simple's functions are handy for simple cases, but they
don't support cookies or authorization, don't support setting header
lines in the HTTP request, and generally don't support reading header lines
in the HTTP response (notably the full HTTP error message, in case of an
error). To get at all those features, you'll have to use the full LWP
class model.

While LWP consists of dozens of classes, the main two that you have to
understand are L<LWP::UserAgent> and L<HTTP::Response>. LWP::UserAgent
is a class for "virtual browsers" which you use for performing requests,
and L<HTTP::Response> is a class for the responses (or error messages)
that you get back from those requests.

The basic idiom is C<< $response = $browser->get($url) >>, or more fully
illustrated:

  # Early in your program:

  use LWP 5.64; # Loads all important LWP classes, and makes
                #  sure your version is reasonably recent.

  my $browser = LWP::UserAgent->new;

  ...

  # Then later, whenever you need to make a get request:
  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';

  my $response = $browser->get( $url );
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  die "Hey, I was expecting HTML, not ", $response->content_type
   unless $response->content_type eq 'text/html';
     # or whatever content-type you're equipped to deal with

  # Otherwise, process the content somehow:

  if($response->decoded_content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  }
  else {
    print "Fresh Air is apparently jazzless today.\n";
  }

There are two objects involved: C<$browser>, which holds an object of
class LWP::UserAgent, and then the C<$response> object, which is of
class HTTP::Response. You really need only one browser object per
program; but every time you make a request, you get back a new
HTTP::Response object, which will have some interesting attributes:

=over

=item *

A status code indicating success or failure
(which you can test with C<< $response->is_success >>).

=item *

An HTTP status line that is hopefully informative if there's failure
(which you can see with C<< $response->status_line >>,
returning something like "404 Not Found").

=item *

A MIME content-type like "text/html", "image/gif",
"application/xml", etc., which you can see with
C<< $response->content_type >>.

=item *

The actual content of the response, in C<< $response->decoded_content >>.
If the response is HTML, that's where the HTML source will be; if
it's a GIF, then C<< $response->decoded_content >> will be the binary
GIF data.

=item *

And dozens of other convenient and more specific methods that are
documented in the docs for L<HTTP::Response>, and its superclasses
L<HTTP::Message> and L<HTTP::Headers>.

=back


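To try those accessors without touching the network, one can build an
HTTP::Response by hand.  This is only a sketch: in a real program the
object comes back from C<< $browser->get >>, and the helper name
C<summarize> and the sample values are invented for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Summarize a response using the accessors described above.
sub summarize {
    my ($response) = @_;
    return "failed: " . $response->status_line
        unless $response->is_success;
    my $type   = $response->content_type;   # scalar context: just the MIME type
    my $length = length $response->decoded_content;
    return "ok: $type, $length characters";
}

# Hand-built stand-ins for what $browser->get($url) would return.
my $missing = HTTP::Response->new( 404, 'Not Found' );
my $page    = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=ISO-8859-1' ],
    '<html><body>All that jazz</body></html>',
);

print summarize($missing), "\n";
print summarize($page), "\n";
```

Note that C<content_type> is called in scalar context here; in list
context it also returns any parameters such as the charset.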
=for comment
 ##########################################################################

=head2 Adding Other HTTP Request Headers

The most commonly used syntax for requests is C<< $response =
$browser->get($url) >>, but in truth, you can add extra HTTP header
lines to the request by adding a list of key-value pairs after the URL,
like so:

  $response = $browser->get( $url, $key1, $value1, $key2, $value2, ... );

For example, here's how to send some more Netscape-like headers, in case
you're dealing with a site that would otherwise reject your request:

  my @ns_headers = (
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

  ...

  $response = $browser->get($url, @ns_headers);

If you weren't reusing that array, you could just go ahead and do this:

  $response = $browser->get($url,
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

If you were only ever changing the 'User-Agent' line, you could just change
the C<$browser> object's default line from "libwww-perl/5.65" (or the like)
to whatever you like, using the LWP::UserAgent C<agent> method:

  $browser->agent('Mozilla/4.76 [en] (Win98; U)');


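For headers other than 'User-Agent' that should go out with every
request, LWP::UserAgent also has a C<default_header> method.  The sketch
below assumes a reasonably recent LWP; it sets one default header and
builds (without sending) a request with HTTP::Request::Common's C<GET>
so the per-request headers can be inspected.  The URL is a placeholder.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);

my $browser = LWP::UserAgent->new;

# Headers set here are added to every request this browser sends,
# unless a request supplies its own value for the same header.
$browser->default_header( 'Accept-Language' => 'en-US' );
$browser->agent('Mozilla/4.76 [en] (Win98; U)');

# Build (but don't send) a request to look at its per-request headers.
my $request = GET( 'http://www.example.com/',
    'Accept-Charset' => 'iso-8859-1,*,utf-8',
);

print $request->as_string;
print "Default Accept-Language: ",
    scalar $browser->default_header('Accept-Language'), "\n";
```

Passing C<$request> to C<< $browser->request($request) >> would send it
with both the default and the per-request headers.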
=for comment
 ##########################################################################

=head2 Enabling Cookies

A default LWP::UserAgent object acts like a browser with its cookie
support turned off. There are various ways of turning it on, by setting
its C<cookie_jar> attribute. A "cookie jar" is an object representing
a little database of all the HTTP cookies that a browser can know about.
It can correspond to a
file on disk (the way Netscape uses its F<cookies.txt> file), or it can
be just an in-memory object that starts out empty, and whose collection of
cookies will disappear once the program is finished running.

To give a browser an in-memory empty cookie jar, you set its C<cookie_jar>
attribute like so:

  $browser->cookie_jar({});

To give it a cookie jar that will be read from a file on disk, and will be
saved to it when the program is finished running, set the C<cookie_jar>
attribute like this:

  use HTTP::Cookies;
  $browser->cookie_jar( HTTP::Cookies->new(
    'file' => '/some/where/cookies.lwp',
        # where to read/write cookies
    'autosave' => 1,
        # save it to disk when done
  ));

That file will be in an LWP-specific format. If you want to access the
cookies in your Netscape cookies file, you can use the
HTTP::Cookies::Netscape class:

  use HTTP::Cookies;
    # yes, loads HTTP::Cookies::Netscape too

  $browser->cookie_jar( HTTP::Cookies::Netscape->new(
    'file' => 'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt',
        # where to read cookies
  ));

You could add an C<< 'autosave' => 1 >> line as further above, but at
time of writing, it's uncertain whether Netscape might discard some of
the cookies you could be writing back to disk.


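To get a feel for what a cookie jar does, here is a small offline sketch
that stores one cookie by hand and lets the jar decide which requests
should carry it.  Normally the browser object fills the jar from the
server's responses; the cookie name, value and host names below are
invented for illustration.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $jar = HTTP::Cookies->new;   # in-memory; nothing is written to disk

# Store a cookie by hand (normally extract_cookies() does this from a
# server's Set-Cookie headers).
$jar->set_cookie(
    0,                  # version (0 = Netscape-style)
    'session',          # key
    'abc123',           # value
    '/',                # path
    'www.example.com',  # domain
    undef,              # port
    1,                  # path_spec
    0,                  # secure
    3600,               # max age in seconds
    0,                  # discard
);

# The jar adds a Cookie header only to requests for a matching host.
my $match = HTTP::Request->new( GET => 'http://www.example.com/account' );
my $other = HTTP::Request->new( GET => 'http://www.example.org/' );
$jar->add_cookie_header($_) for $match, $other;

print "Matching host: ", $match->header('Cookie') // '(none)', "\n";
print "Other host:    ", $other->header('Cookie') // '(none)', "\n";
```

Once a jar is attached with C<< $browser->cookie_jar($jar) >>, both steps
happen automatically on every request and response.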
=for comment
 ##########################################################################

=head2 Posting Form Data

Many HTML forms send data to their server using an HTTP POST request, which
you can send with this syntax:

 $response = $browser->post( $url,
   [
     formkey1 => value1,
     formkey2 => value2,
     ...
   ],
 );

Or if you need to send HTTP headers:

 $response = $browser->post( $url,
   [
     formkey1 => value1,
     formkey2 => value2,
     ...
   ],
   headerkey1 => value1,
   headerkey2 => value2,
 );

For example, the following program makes a search request to AltaVista
(by sending some form data via an HTTP POST request), and extracts from
the HTML the report of the number of matches:

  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;

  my $word = 'tarragon';

  my $url = 'http://www.altavista.com/sites/search/web';
  my $response = $browser->post( $url,
    [ 'q' => $word,  # the Altavista query string
      'pg' => 'q', 'avkw' => 'tgz', 'kl' => 'XX',
    ]
  );
  die "$url error: ", $response->status_line
   unless $response->is_success;
  die "Weird content type at $url -- ", $response->content_type
   unless $response->content_type eq 'text/html';

  if( $response->decoded_content =~ m{AltaVista found ([0-9,]+) results} ) {
    # The substring will be like "AltaVista found 2,345 results"
    print "$word: $1\n";
  }
  else {
    print "Couldn't find the match-string in the response\n";
  }


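If you are curious what such a POST actually puts on the wire, the
C<POST> function from HTTP::Request::Common builds the request object
without sending it.  This is just an inspection aid; the URL and form
values are placeholders.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build the kind of request that $browser->post sends, without sending it.
my $request = POST( 'http://www.example.com/search',
    [ 'q' => 'tarragon & thyme', 'pg' => 'q' ],
);

print $request->method, "\n";
print scalar $request->header('Content-Type'), "\n";
print $request->content, "\n";   # the URL-encoded form data
```

The form pairs end up URL-encoded in the request body, in the order
given, with a content type of C<application/x-www-form-urlencoded>.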
=for comment
 ##########################################################################

=head2 Sending GET Form Data

Some HTML forms convey their form data not by sending the data
in an HTTP POST request, but by making a normal GET request with
the data stuck on the end of the URL.  For example, if you went to
C<imdb.com> and ran a search on "Blade Runner", the URL you'd see
in your browser window would be:

  http://us.imdb.com/Tsearch?title=Blade%20Runner&restrict=Movies+and+TV

To run the same search with LWP, you'd use this idiom, which involves
the URI class:

  use URI;
  my $url = URI->new( 'http://us.imdb.com/Tsearch' );
    # makes an object representing the URL

  $url->query_form(  # And here the form data pairs:
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
  );

  my $response = $browser->get($url);

See chapter 5 of I<Perl & LWP> for a longer discussion of HTML forms
and of form data, and chapters 6 through 9 for a longer discussion of
extracting data from HTML.


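To check what C<query_form> produced before making the request, you can
simply print the URI object, as in this small sketch.  It needs no
network access.

```perl
use strict;
use warnings;
use URI;

my $url = URI->new('http://us.imdb.com/Tsearch');
$url->query_form(
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
);

print "$url\n";   # the URI object stringifies to the full URL

# Called with no arguments, query_form returns the decoded pairs.
my %form = $url->query_form;
print "$_ = $form{$_}\n" for sort keys %form;
```

URI writes the spaces as C<+> rather than C<%20>; in form data the two
are equivalent, so the server sees the same values either way.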
=head2 Absolutizing URLs

The URI class that we just mentioned above provides all sorts of methods
for accessing and modifying parts of URLs (such as asking what sort of URL
it is with C<< $url->scheme >>, and asking what host it refers to with C<<
$url->host >>, and so on, as described in L<the docs for the URI
class|URI>).  However, the methods of most immediate interest
are the C<query_form> method seen above, and now the C<new_abs> method
for taking a probably-relative URL string (like "../foo.html") and getting
back an absolute URL (like "http://www.perl.com/stuff/foo.html"), as
shown here:

  use URI;
  $abs = URI->new_abs($maybe_relative, $base);

For example, consider this program that matches URLs in the HTML
list of new modules in CPAN:

  use strict;
  use warnings;
  use LWP;
  my $browser = LWP::UserAgent->new;

  my $url = 'http://www.cpan.org/RECENT.html';
  my $response = $browser->get($url);
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  my $html = $response->decoded_content;
  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {
    print "$1\n";
  }

When run, it emits output that starts out something like this:

  MIRRORING.FROM
  RECENT
  RECENT.html
  authors/00whois.html
  authors/01mailrc.txt.gz
  authors/id/A/AA/AASSAD/CHECKSUMS
  ...

However, if you actually want to have those be absolute URLs, you
can use the URI module's C<new_abs> method, by changing the C<while>
loop to this:

  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {
    print URI->new_abs( $1, $response->base ) ,"\n";
  }

(The C<< $response->base >> method from L<HTTP::Response|HTTP::Response>
is for returning what URL
should be used for resolving relative URLs -- it's usually just
the same as the URL that you requested.)

That program then emits nicely absolute URLs:

  http://www.cpan.org/MIRRORING.FROM
  http://www.cpan.org/RECENT
  http://www.cpan.org/RECENT.html
  http://www.cpan.org/authors/00whois.html
  http://www.cpan.org/authors/01mailrc.txt.gz
  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
  ...

See chapter 4 of I<Perl & LWP> for a longer discussion of URI objects.

Of course, using a regexp to match hrefs is a bit simplistic, and for
more robust programs, you'll probably want to use an HTML-parsing module
like L<HTML::LinkExtor> or L<HTML::TokeParser> or even maybe
L<HTML::TreeBuilder>.


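As one possible shape for the parser-based approach, here is a minimal
HTML::LinkExtor sketch.  When given a base URL, LinkExtor absolutizes
each link for you.  The helper name C<absolute_links> and the sample
HTML are made up; in a real program you would pass
C<< $response->decoded_content >> and C<< $response->base >>.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the absolute URLs of all <a href="..."> links in $html,
# resolving relative links against $base.
sub absolute_links {
    my ($html, $base) = @_;
    my $parser = HTML::LinkExtor->new( undef, $base );
    $parser->parse($html);
    $parser->eof;
    my @urls;
    for my $link ( $parser->links ) {
        my ($tag, %attr) = @$link;   # e.g. ('a', href => URI object)
        push @urls, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }
    return @urls;
}

# A stand-in for a fetched page.
my $html = '<A HREF="authors/00whois.html">Authors</A> <IMG SRC="logo.png">';
print "$_\n" for absolute_links( $html, 'http://www.cpan.org/RECENT.html' );
```

Unlike the regexp, this copes with attribute order, quoting style and
letter case, and it skips the C<IMG> tag because only C<a> tags are kept.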
=for comment
 ##########################################################################

=head2 Other Browser Attributes

LWP::UserAgent objects have many attributes for controlling how they
work.  Here are a few notable ones:

=over

=item *

C<< $browser->timeout(15); >>

This sets this browser object to give up on requests that don't answer
within 15 seconds.

=item *

C<< $browser->protocols_allowed( [ 'http', 'gopher'] ); >>

This sets this browser object to not speak any protocols other than HTTP
and gopher. If it tries accessing any other kind of URL (like an "ftp:"
or "mailto:" or "news:" URL), then it won't actually try connecting, but
instead will immediately return an error code 500, with a message like
"Access to 'ftp' URIs has been disabled".

=item *

C<< use LWP::ConnCache; $browser->conn_cache(LWP::ConnCache->new()); >>

This tells the browser object to try using the HTTP/1.1 "Keep-Alive"
feature, which speeds up requests by reusing the same socket connection
for multiple requests to the same server.

=item *

C<< $browser->agent( 'SomeName/1.23 (more info here maybe)' ) >>

This changes how the browser object will identify itself in
the default "User-Agent" line in its HTTP requests.  By default,
it'll send "libwww-perl/I<versionnumber>", like
"libwww-perl/5.65".  You can change that to something more descriptive
like this:

  $browser->agent( 'SomeName/3.14 (contact@robotplexus.int)' );

Or if need be, you can go in disguise, like this:

  $browser->agent( 'Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)' );

=item *

C<< push @{ $browser->requests_redirectable }, 'POST'; >>

This tells this browser to obey redirection responses to POST requests
(like most modern interactive browsers), even though the HTTP RFC says
that should not normally be done.

=back

For more options and information, see L<the full documentation for
LWP::UserAgent|LWP::UserAgent>.


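Several of these attributes can also be given to the constructor as
key/value pairs.  The following sketch does that and then shows the
C<protocols_allowed> check refusing a URL locally, without any
connection attempt.  The FTP URL is a placeholder.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# The same attributes, passed to the constructor instead of set later.
my $browser = LWP::UserAgent->new(
    timeout           => 15,
    agent             => 'SomeName/3.14 (contact@robotplexus.int)',
    protocols_allowed => [ 'http', 'https' ],
);

# A URL with a disallowed scheme is refused before anything is sent.
my $response = $browser->get('ftp://ftp.example.com/pub/README');
print $response->status_line, "\n" unless $response->is_success;
```

Setting everything in one place at construction time makes it easier to
see how a given browser object is configured.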
=for comment
 ##########################################################################

=head2 Writing Polite Robots

If you want to make sure that your LWP-based program respects F<robots.txt>
files and doesn't make too many requests too fast, you can use the LWP::RobotUA
class instead of the LWP::UserAgent class.

The LWP::RobotUA class is just like LWP::UserAgent, and you can use it like so:

  use LWP::RobotUA;
  my $browser = LWP::RobotUA->new('YourSuperBot/1.34', 'you@yoursite.com');
    # Your bot's name and your email address

  my $response = $browser->get($url);

But LWP::RobotUA adds these features:

=over

=item *

If the F<robots.txt> on C<$url>'s server forbids you from accessing
C<$url>, then the C<$browser> object (assuming it's of class LWP::RobotUA)
won't actually request it, but instead will give you back (in C<$response>)
a 403 error with a message "Forbidden by robots.txt".  That is, if you
have this line:

  die "$url -- ", $response->status_line, "\nAborted"
   unless $response->is_success;

then the program would die with an error message like this:

  http://whatever.site.int/pith/x.html -- 403 Forbidden by robots.txt
  Aborted at whateverprogram.pl line 1234

=item *

If this C<$browser> object sees that the last time it talked to
C<$url>'s server was too recent, then it will pause (via C<sleep>) to
avoid making too many requests too often. By default it pauses for
one minute, but you can control that with the C<<
$browser->delay( I<minutes> ) >> attribute.

For example, this code:

  $browser->delay( 7/60 );

...means that this browser will pause when it needs to avoid talking to
any given server more than once every 7 seconds.

=back

For more options and information, see L<the full documentation for
LWP::RobotUA|LWP::RobotUA>.


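If blocking in C<sleep> is unwelcome (say, in a program that has other
work to do), LWP::RobotUA also has a C<use_sleep> attribute.  A minimal
configuration sketch, using the bot name and address from the example
above:

```perl
use strict;
use warnings;
use LWP::RobotUA;

my $browser = LWP::RobotUA->new( 'YourSuperBot/1.34', 'you@yoursite.com' );

$browser->delay( 7/60 );   # in minutes, so this is 7 seconds
$browser->use_sleep(0);    # don't block; return a 503 response instead

printf "Minimum gap between requests: %.0f seconds\n",
    $browser->delay * 60;
```

With C<use_sleep> turned off, a request that comes too soon yields a 503
response with a C<Retry-After> header, and it is up to your program to
try again later.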
=for comment
 ##########################################################################

=head2 Using Proxies

In some cases, you will want to (or will have to) use proxies for
accessing certain sites and/or using certain protocols. This is most
commonly the case when your LWP program is running (or could be running)
on a machine that is behind a firewall.

To make a browser object use proxies that are defined in the usual
environment variables (C<HTTP_PROXY>, etc.), just call the C<env_proxy>
method on a user-agent object before you go making any requests on it.
Specifically:

  use LWP::UserAgent;
  my $browser = LWP::UserAgent->new;

  # And before you go making any requests:
  $browser->env_proxy;

For more information on proxy parameters, see L<the LWP::UserAgent
documentation|LWP::UserAgent>, specifically the C<proxy>, C<env_proxy>,
and C<no_proxy> methods.


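When the environment variables aren't set, the C<proxy> and C<no_proxy>
methods let you configure the same thing in code.  The proxy address and
host names in this sketch are hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;

# Route HTTP and FTP requests through a proxy...
$browser->proxy( [ 'http', 'ftp' ], 'http://proxy.example.int:8080/' );

# ...except for these hosts, which are contacted directly.
$browser->no_proxy( 'localhost', 'intranet.example.int' );

print "HTTP proxy: ", scalar $browser->proxy('http'), "\n";
```

As with C<env_proxy>, do this before making any requests on the object.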
=for comment
 ##########################################################################

=head2 HTTP Authentication

Many web sites restrict access to documents by using "HTTP
Authentication". This isn't just any form of "enter your password"
restriction, but is a specific mechanism where the HTTP server sends the
browser an HTTP code that says "That document is part of a protected
'realm', and you can access it only if you re-request it and add some
special authorization headers to your request".

For example, the Unicode.org admins stop email-harvesting bots from
harvesting the contents of their mailing list archives, by protecting
them with HTTP Authentication, and then publicly stating the username
and password (at C<http://www.unicode.org/mail-arch/>) -- namely
username "unicode-ml" and password "unicode".

Consider this URL, which is part of the protected
area of the web site:

  http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html

If you access that with a browser, you'll get a prompt like
"Enter username and password for 'Unicode-MailList-Archives' at server
'www.unicode.org'".

In LWP, if you just request that URL, like this:

  use LWP;
  my $browser = LWP::UserAgent->new;

  my $url =
   'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
  my $response = $browser->get($url);

  die "Error: ", $response->header('WWW-Authenticate') || 'Error accessing',
    #  (the 'WWW-Authenticate' header names the realm)
    "\n ", $response->status_line, "\n at $url\n Aborting"
   unless $response->is_success;

Then you'll get this error:

  Error: Basic realm="Unicode-MailList-Archives"
   401 Authorization Required
   at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
   Aborting at auth1.pl line 9.  [or wherever]

...because the C<$browser> doesn't know the username and password
for that realm ("Unicode-MailList-Archives") at that host
("www.unicode.org").  The simplest way to let the browser know about this
is to use the C<credentials> method to let it know about a username and
password that it can try using for that realm at that host.  The syntax is:

  $browser->credentials(
    'servername:portnumber',
    'realm-name',
    'username' => 'password'
  );

In most cases, the port number is 80, the default TCP/IP port for HTTP; and
you usually call the C<credentials> method before you make any requests.
For example:

  $browser->credentials(
    'reports.mybazouki.com:80',
    'web_server_usage_reports',
    'plinky' => 'banjo123'
  );

So if we add the following to the program above, right after the C<<
$browser = LWP::UserAgent->new; >> line...

  $browser->credentials(  # add this to our $browser 's "key ring"
    'www.unicode.org:80',
    'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode'
  );

...then when we run it, the request succeeds, instead of causing the
C<die> to be called.


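Rather than typing the 'servername:portnumber' string by hand, you can
derive it from the URL with URI's C<host_port> method.  This sketch
assumes a reasonably recent LWP (6.x), where calling C<credentials> with
just the host and realm reads back what is stored; nothing is sent over
the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

my $browser = LWP::UserAgent->new;
my $url = URI->new(
    'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html');

# host_port gives "host:port", filling in the scheme's default port.
my $netloc = $url->host_port;

$browser->credentials( $netloc, 'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode' );

# With only host and realm, credentials() returns the stored pair.
my ($user, $pass) =
    $browser->credentials( $netloc, 'Unicode-MailList-Archives' );
print "Stored for $netloc: $user / $pass\n";
```

Deriving the host and port this way keeps the key ring in step with the
URL if the site ever moves to a non-default port.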
=for comment
 ##########################################################################

=head2 Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL
would -- if your LWP installation has HTTPS support (via an appropriate
Secure Sockets Layer library).  For example:

  use LWP;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked!  I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the
response will be unsuccessful, and you'll get this error message:

  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.   [or whatever program and line]

If your LWP installation I<does> have HTTPS support installed, then the
response should be successful, and you should be able to consult
C<$response> just like with any normal HTTP response.

For information about installing HTTPS support for your LWP
installation, see the helpful F<README.SSL> file that comes in the
libwww-perl distribution.

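You can also ask a browser object up front whether a scheme is usable,
instead of waiting for a 501 response.  A minimal sketch using the
C<is_protocol_supported> method (with LWP 6 and later, HTTPS support
generally comes from the separate LWP::Protocol::https distribution):

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;

# True if a handler module for the scheme can be loaded on this system.
for my $scheme (qw(http https)) {
    printf "%-5s %s\n", $scheme,
        $browser->is_protocol_supported($scheme)
            ? 'supported'
            : 'not supported';
}
```

A program that needs HTTPS can run this check at startup and exit with a
clear message if support is missing.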
=for comment
 ##########################################################################

=head2 Getting Large Documents

When you're requesting a large (or at least potentially large) document,
a problem with the normal way of using the request methods (like C<<
$response = $browser->get($url) >>) is that the response object in
memory will have to hold the whole document -- I<in memory>. If the
response is a thirty megabyte file, this is likely to be quite an
imposition on this process's memory usage.

A notable alternative is to have LWP save the content to a file on disk,
instead of saving it up in memory.  This is the syntax to use:

  $response = $browser->get($url,
    ':content_file' => $filespec,
  );

For example,

  $response = $browser->get('http://search.cpan.org/',
    ':content_file' => '/tmp/sco.html'
  );

When you use this C<:content_file> option, the C<$response> will have
all the normal header lines, but C<< $response->content >> will be
empty.

Note that this ":content_file" option isn't supported under older
versions of LWP, so you should consider adding C<use LWP 5.66;> to check
the LWP version, if you think your program might run on systems with
older versions.

If you need to be compatible with older LWP versions, then use
this syntax, which does the same thing:

  use HTTP::Request::Common;
  $response = $browser->request( GET($url), $filespec );


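Another way to avoid holding the whole document in memory is the
C<:content_cb> option, which hands the content to a callback one chunk
at a time.  In this sketch the helper name C<count_bytes> is invented,
and the callback only adds up chunk sizes; a real one might write each
chunk to a filehandle or feed it to a parser.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url, passing each chunk of content to a callback as it arrives
# instead of accumulating it; returns (response, total bytes seen).
sub count_bytes {
    my ($browser, $url) = @_;
    my $total = 0;
    my $response = $browser->get( $url,
        ':content_cb' => sub {
            my ($chunk, $res, $protocol) = @_;
            $total += length $chunk;   # or print {$out} $chunk, etc.
        },
    );
    return ($response, $total);
}
```

As with C<:content_file>, the returned response carries the headers and
status, while the content itself has already gone to the callback.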
=for comment
 ##########################################################################

=head1 SEE ALSO

Remember, this article is just the most rudimentary introduction to
LWP -- to learn more about LWP and LWP-related tasks, you really
must read from the following:

=over

=item *

L<LWP::Simple> -- simple functions for getting/heading/mirroring URLs

=item *

L<LWP> -- overview of the libwww-perl modules

=item *

L<LWP::UserAgent> -- the class for objects that represent "virtual browsers"

=item *

L<HTTP::Response> -- the class for objects that represent the response to
an LWP request, as in C<< $response = $browser->get(...) >>

=item *

L<HTTP::Message> and L<HTTP::Headers> -- classes that provide more methods
to HTTP::Response.

=item *

L<URI> -- class for objects that represent absolute or relative URLs

=item *

L<URI::Escape> -- functions for URL-escaping and URL-unescaping strings
(like turning "this & that" to and from "this%20%26%20that").

=item *

L<HTML::Entities> -- functions for HTML-escaping and HTML-unescaping strings
(like turning "C. & E. BrontE<euml>" to and from "C. &amp; E. Bront&euml;")

=item *

L<HTML::TokeParser> and L<HTML::TreeBuilder> -- classes for parsing HTML

=item *

L<HTML::LinkExtor> -- class for finding links in HTML documents

=item *

The book I<Perl & LWP> by Sean M. Burke.  O'Reilly & Associates, 2002.
ISBN: 0-596-00178-9.  C<http://www.oreilly.com/catalog/perllwp/>

=back

=head1 COPYRIGHT

Copyright 2002, Sean M. Burke.  You can redistribute this document and/or
modify it, but only under the same terms as Perl itself.

=head1 AUTHOR

Sean M. Burke C<sburke@cpan.org>

=for comment
 ##########################################################################

=cut

# End of Pod