Bottom layers: physical network, IP, TCP
Top layers: FTP, Telnet, HTTP
Type of connection: connection-oriented vs. packet-oriented
Comments:
1) "remote" supports the same read methods as a file handle.
"remote.readlines()" reads the content line by line into a list;
"remote.read()" reads the entire content into a single string.
Unless a for loop is used to print the content line by line,
remote.read() is usually more useful.
2) remote.info() returns the MIME headers of the response.
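The difference between the two read methods in comment 1 can be seen in a small sketch. It is written in Python 3 syntax, and open_page() is a hypothetical helper that returns an in-memory file instead of a real urlopen() handle, so that the example runs without a network connection.

```python
import io

def open_page():
    # Hypothetical stand-in for urlopen(url): an in-memory file with the same
    # read() and readlines() methods, so the example runs without a network.
    return io.BytesIO(b"<html>\n<body>Hello</body>\n</html>\n")

remote = open_page()
lines = remote.readlines()    # a list with one entry per line
remote.close()

remote = open_page()
content = remote.read()       # the whole page as a single string
remote.close()

for line in lines:            # a list is convenient when looping over lines
    print(line)
print(content)                # a single string can be printed or searched directly
```

The page is opened twice because a handle can only be read through once; after readlines() there is nothing left for read() to return.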
1.2 Create a form that asks the user to submit a URL. Write a CGI script that opens the URL, reads the content, and displays it to the user. (Such scripts have many applications: for example, they can serve as meta-search engines or filter content, such as images or advertisements, out of a web page.)
1.3 Create a form that asks the user to submit a URL and a search term. Write a CGI script that opens the URL and displays all the lines that contain the search term.
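One possible starting point for the filtering step in 1.3 is sketched below in Python 3 syntax. The function name matching_lines is made up for this illustration, and io.StringIO stands in for the remote page so the sketch runs offline. A page fetched over the network in Python 3 arrives as bytes and would need to be decoded first. The form handling and the CGI output are left out.

```python
import io

def matching_lines(remote, term):
    """Return the lines read from `remote` that contain `term`."""
    return [line.rstrip("\n") for line in remote.readlines() if term in line]

# In-memory stand-in for the opened URL (an assumption for this sketch).
page = io.StringIO("<h1>Python course</h1>\n<p>Perl notes</p>\n<p>More Python</p>\n")

for line in matching_lines(page, "Python"):
    print(line)
```

The match is a plain, case-sensitive substring test; a regular expression could be substituted if a looser match is wanted.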
1.4 (Optional) Write a script that determines whether a 404 File Not Found message is displayed. You should use remote.read() and regular expressions.
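As a hint for 1.4, the regular-expression test might look roughly like the following Python 3 sketch. The pattern and the name looks_like_404 are assumptions made for this example; servers word their error pages differently, so a real script may need a different pattern. The strings passed in play the role of what remote.read() returns.

```python
import re

# "404" followed within 40 characters by "not found", ignoring case.
# The 40-character window is an arbitrary choice for this sketch.
NOT_FOUND = re.compile(r"\b404\b.{0,40}?not\s+found", re.IGNORECASE | re.DOTALL)

def looks_like_404(content):
    """Return True if the page content appears to be a 404 error page."""
    return NOT_FOUND.search(content) is not None

print(looks_like_404("<title>404 Not Found</title>"))
print(looks_like_404("<h1>Welcome</h1>"))
```

Limiting the distance between "404" and "not found" reduces false alarms on long pages that happen to contain both somewhere.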
#!/usr/bin/env python
# Python 2 script: list all links (anchors) found on a remote page.
import urllib
import sys
import htmllib
import formatter

url = "http://www.slis.indiana.edu/"

##################### connect to remote page #################
try:
    remote = urllib.urlopen(url)
except IOError:
    print "cannot open URL"
    sys.exit(1)

#################### read the content of the remote page ####
content = remote.read()
remote.close()

#################### parse the HTML of the page #############
parser = htmllib.HTMLParser(formatter.NullFormatter())
parser.feed(content)
parser.close()

#################### get the links (anchors) ##################
links = parser.anchorlist
for eachlink in links:
    print eachlink
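The script above relies on htmllib, which exists only in Python 2. For readers on Python 3, a rough equivalent can be sketched with the standard html.parser module. The class name AnchorCollector is invented for this illustration, and a fixed HTML string replaces the remote page so that the sketch runs offline.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag, similar to htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)

# Fixed sample content (an assumption); a real script would feed the page text.
content = ('<a href="http://a.example/">A</a> '
           '<a name="x">no href</a> '
           '<a href="/local.html">B</a>')

parser = AnchorCollector()
parser.feed(content)
parser.close()

for eachlink in parser.anchorlist:
    print(eachlink)
```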
2.2 (Optional) You can combine the example above and 1.4 to check for broken links. To simplify matters, only check the links that actually start with http://.
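The overall shape of such a link checker might look like the Python 3 sketch below. Everything in it is illustrative: check_links and the fetch argument are hypothetical names, the 404 pattern is only one guess at how an error page is worded, and a dictionary of canned pages stands in for the network. In a real script, fetch would open the URL and return the result of remote.read(), or None if the URL cannot be opened.

```python
import re

# Assumed wording of an error page: "404" closely followed by "not found".
NOT_FOUND = re.compile(r"\b404\b.{0,40}?not\s+found", re.IGNORECASE | re.DOTALL)

def check_links(links, fetch):
    """Return the http:// links that cannot be opened or that look like a 404 page.

    fetch is a hypothetical callable: it takes a URL and returns the page
    content as a string, or None if the URL cannot be opened.
    """
    broken = []
    for link in links:
        if not link.startswith("http://"):
            continue                      # skip relative links, mailto:, etc.
        content = fetch(link)
        if content is None or NOT_FOUND.search(content):
            broken.append(link)
    return broken

# Canned pages stand in for the network so the sketch runs offline.
pages = {
    "http://ok.example/": "<h1>Welcome</h1>",
    "http://gone.example/": "<title>404 Not Found</title>",
}
links = ["http://ok.example/", "http://gone.example/",
         "/local.html", "http://down.example/"]

for link in check_links(links, pages.get):
    print(link)
```

Passing the fetch step in as an argument keeps the checking logic separate from the network code, which makes it easy to try out with sample pages first.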