[Talk] The simplest web client
Nick Simicich
talk@flux.org
Mon, 11 Sep 2006 04:48:39 -0400
I was playing with the VoA's web page monitoring program at
http://voa.his.com last night. The VoA compares their own web pages
with popular web pages to determine whether they have good
availability. I noted that www.wikipedia.org was listed by the VoA as
having miserable availability.
This seems to be related to the VoA's web page sampler. It seems to do
an HTTP/1.1 fetch (1.0 won't work on www.wikipedia.org at all, but many,
many web sites barf on 1.0 these days) and it does not include the
non-required User-Agent field. Wikipedia sometimes barfs if there is no
User-Agent header and sometimes it works. It seems to be related to the
way caching is set up there. It looks like they use squid in a mode
that matches on User-Agent, but instead of matching a null with a null,
they sometimes require that the user agent be specified and sometimes
do not.
While doing this research, I noted that:
( echo -en "GET / HTTP/1.1\r\nHost: www.wikipedia.org\r\n\r\n"; cat ) | nc -v www.wikipedia.org 80
was a quick and dirty replacement for a web fetch routine that allowed
you to bypass everything so that you could see what was actually
happening - every header going in and out could be seen easily. Pipe to
less and make everything output to file descriptor 2, maybe, if you
need to page it.
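
The paging idea can be sketched as a pair of small functions. This is
only a sketch: the names raw_request and raw_fetch are made up for
illustration, and the Connection: close header is an added assumption
(not in the one-liner above) so that the server hangs up after one
response and the trailing cat is not needed. Since nc -v writes its
diagnostics to stderr, 2>&1 merges them into the stream the pager sees.

```shell
# Hypothetical helper (not from the original post): emit a bare HTTP/1.1
# request for "/" with CRLF line endings. Connection: close is an added
# assumption so the server closes the connection after one response.
raw_request() {
    printf 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$1"
}

# Hypothetical helper: send the request with nc and merge nc's -v
# diagnostics (stderr) into stdout so that a pager shows both.
raw_fetch() {
    raw_request "$1" | nc -v "$1" 80 2>&1
}
```

Usage would be something like "raw_fetch www.wikipedia.org | less". If
your nc build quits as soon as its stdin closes, keep the "; cat" trick
from the one-liner above instead.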
( echo -en "GET / HTTP/1.1\r\nHost: www.wikipedia.org\r\nUser-Agent: dum dum dum dum dum\r\n\r\n"; cat ) | nc -v www.wikipedia.org 80
(think South Park's Mormon episode)
always works.
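
Because the failure is intermittent, one fetch proves little. A rough
way to compare the two cases is to repeat the request and print only
the status line each time. The sketch below assumes nc is installed;
status_line and probe are hypothetical names, the count of five is
arbitrary, and Connection: close is again an added assumption so each
fetch ends on its own.

```shell
# Hypothetical helper: reduce an HTTP response on stdin to its status
# line (the first line, with the trailing CR removed).
status_line() {
    head -n 1 | tr -d '\r'
}

# Hypothetical helper: request "/" from host $1 five times and print the
# status line of each reply. A non-empty $2 is sent as the User-Agent.
probe() {
    host=$1
    agent=${2:-}
    i=0
    while [ "$i" -lt 5 ]; do
        {
            printf 'GET / HTTP/1.1\r\nHost: %s\r\n' "$host"
            if [ -n "$agent" ]; then
                printf 'User-Agent: %s\r\n' "$agent"
            fi
            printf 'Connection: close\r\n\r\n'
        } | nc "$host" 80 | status_line
        i=$((i + 1))
    done
}
```

Running "probe www.wikipedia.org" and then
"probe www.wikipedia.org 'dum dum dum dum dum'" puts the two sets of
status lines side by side.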
But nc is netcat - not a standard command, though lots of us have
installed it. I wanted to report this bug, and what if the person who
got the bug report had not installed netcat? What do you do if you don't
have it? Telnet would not work for me - not sure why. But I was able to
get bash to work...with no programs other than echo and cat.
(echo -en "GET / HTTP/1.1\r\nHost: www.wikipedia.org\r\n\r\n" 1>&0; cat -nA 1>&2 & cat </dev/tty 1>&0) 0<>/dev/tcp/www.wikipedia.org/80
This works sometimes and fails sometimes, just like the nc version - it
does throw one cat error.
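
A non-interactive variation on the same /dev/tcp trick is sketched
below. It is not the one-liner above: it assumes a bash built with
network redirections, opens the socket on descriptor 3 instead of
stdin, and adds a Connection: close header (an assumption) so the
server ends the connection and cat returns by itself. The name
bash_fetch and the optional port argument are made up for illustration.

```shell
# Hypothetical helper (bash only): fetch "/" from host $1, port $2
# (default 80), through bash's /dev/tcp redirection. printf is a shell
# builtin, so cat is the only external program used.
bash_fetch() {
    host=$1
    port=${2:-80}
    {
        # Send the request over the socket opened on descriptor 3.
        printf 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$host" >&3
        # Copy the reply to stdout until the server closes the connection.
        cat <&3
    } 3<>"/dev/tcp/$host/$port"
}
```

For example, "bash_fetch www.wikipedia.org | cat -A | less" shows the
headers with their line endings made visible. If the connection cannot
be opened, the redirection fails and the function returns non-zero
without sending anything.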
And they say that bash is not the shell of choice! This works exactly
the same way when run under Windows using Cygwin and Bash, BTW. Isn't
that special?
--
Blog: http://majordomo.squawk.com/njs/blog/blogger.html
Atom: http://majordomo.squawk.com/njs/blog/atom.xml
RSS: http://majordomo.squawk.com/njs/blog/atom.rdf