Last updated: 2/26/03

The Internet:
Structure, Function and Applications
Computers and Society, GST 2710

  1. "Internet" is often used loosely today. We will use the technical definition here: the Internet connects local networks to one another so that any connected computer can send information to any other.
    1. Each computer connected to the Internet is given a unique (no repeats) numerical "IP address", or just plain "IP", written as four bytes (each a number from 0 to 255) separated by three dots, e.g. 141.217.12.24 for the IP address of the CLL web server (www.cll.wayne.edu).
      1. IPs can be "static" -- unchanging, assigned by the network administrator -- or "dynamic" -- assigned from a pool of IPs at each boot-up, typically by a DHCP server (often built into the router).
        1. DHCP (Dynamic Host Configuration Protocol) is the protocol that assigns IP addresses to computers when they boot up, out of a common pool. One advantage of this for computer administrators is that it almost eliminates the problem of two computers with the same IP address. Most WSU computers (including WACC) use DHCP.
        2. Because of the way another system (the Domain Name System or DNS, see below) works, Internet servers (e.g. web servers and email servers) must have fixed IP addresses.
        3. Modem dialup gets a different IP address each session. AOL changes IPs at each "hit".
        4. Cable modems and other broadband services such as DSL are always connected and often keep the same IP address for long periods. This makes these computers attractive targets for hackers, and a personal firewall is a good idea.
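To make the "four bytes separated by three dots" form concrete, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `is_valid_ipv4` is an illustrative assumption, not something from the course materials.

```python
import ipaddress

# The example address of the CLL web server from the notes above.
addr = ipaddress.IPv4Address("141.217.12.24")

# An IPv4 address is exactly four bytes; the dotted form prints each byte in decimal.
four_bytes = list(addr.packed)
print(four_bytes)  # [141, 217, 12, 24]

# The same four bytes can also be read as one 32-bit number.
as_number = int(addr)

def is_valid_ipv4(text):
    """Return True if text is a well-formed dotted IPv4 address (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False

# Each part must fit in one byte, so 256 is rejected.
print(is_valid_ipv4("141.217.12.256"))  # False
```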
    2. Information travels in packets - finite groups of bits. Packets have two parts:
      1. Header is standardized - contains "meta information" - "from" address, "to" address, length of packet, "time to live" (how many more hops the packet may take before it is discarded), etc.
      2. Body is freeform - contains the information itself
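The header/body split can be sketched in Python as follows. The header layout here is a toy invented for illustration (it is not the real IP header format), and the function names `make_packet` and `read_packet` are assumptions.

```python
import socket
import struct

# Toy header layout (an assumption for illustration, NOT the real IP header):
# 4-byte "from" address, 4-byte "to" address, 2-byte body length, 1-byte hop counter.
HEADER_FORMAT = "!4s4sHB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 11 bytes

def make_packet(src, dst, body, ttl=64):
    """Build a packet: fixed-format header followed by a freeform body."""
    header = struct.pack(
        HEADER_FORMAT,
        socket.inet_aton(src),   # dotted address -> 4 bytes
        socket.inet_aton(dst),
        len(body),
        ttl,
    )
    return header + body

def read_packet(packet):
    """Split a packet back into its header fields and its body."""
    src, dst, length, ttl = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    return {
        "from": socket.inet_ntoa(src),
        "to": socket.inet_ntoa(dst),
        "length": length,
        "ttl": ttl,
        "body": packet[HEADER_SIZE:HEADER_SIZE + length],
    }

packet = make_packet("141.217.12.24", "141.217.1.13", b"hello")
fields = read_packet(packet)
```

Because the header has a fixed layout, any device along the route can read the addresses without understanding the body at all.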
    3. Computers are very inflexible, and must have explicit rules for which computer starts communication and what it does, how the second responds, etc. These rules are "protocols". The Internet uses the TCP/IP protocol suite for communication. This is actually two main protocols -- TCP and IP -- with a host of others lumped in.
      1. IP is the raw transport mechanism, just throwing information out as fast as it can, without asking if it arrives.
      2. TCP sits on top, and uses IP both ways to confirm accurate arrival, and resend if not.
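The division of labor between IP and TCP can be sketched with a toy simulation. This is a simplified stand-in, not how real TCP is implemented: the class `LossyChannel`, the function `reliable_send`, and the use of a return value to represent an acknowledgement are all assumptions made for illustration.

```python
class LossyChannel:
    """Stands in for IP: sends a message and may silently drop it."""

    def __init__(self, drop_pattern):
        # drop_pattern is a list of booleans; True means "drop this send".
        # A fixed pattern (rather than random loss) keeps the example repeatable.
        self.drop_pattern = list(drop_pattern)
        self.delivered = []

    def send(self, message):
        drop = self.drop_pattern.pop(0) if self.drop_pattern else False
        if drop:
            return False  # nothing arrived, so no acknowledgement comes back
        self.delivered.append(message)
        return True       # stands in for an acknowledgement arriving

def reliable_send(channel, message, max_tries=5):
    """Stands in for TCP: resend until acknowledged; return the number of tries."""
    for attempt in range(1, max_tries + 1):
        if channel.send(message):
            return attempt
    raise TimeoutError("no acknowledgement after %d tries" % max_tries)

channel = LossyChannel([True, True, False])  # first two sends are lost
tries = reliable_send(channel, "hello")
print(tries)  # 3
```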
    4. The Domain Name System (DNS) is another layer of protocol that makes remembering addresses easier.
      1. Many servers (see below) have static IPs and employ "dot-com" names, e.g. www.cll.wayne.edu. These are called domain names, because the last group of letters is known as the domain. Primarily US: edu, com, org, gov, net
      2. Other countries use a two-letter domain, e.g. de (Germany)
      3. New domains were proposed in 1997 (firm, store, arts, rec, info, web, nom), but that plan was never put into effect. The new domains actually approved, in 2000, were aero, biz, coop, info, museum, name and pro.
      4. The part of the domain name in front of the domain is free form and unconstrained. For example, many people think that the "www" is required for a web site - it is not. As far as the Domain Name System is concerned, that is just letters.
      5. Client software goes to a local domain name server (DNS server) to get the IP (numerical) address
        1. Communication with the DNS server uses TCP/IP
        2. If the local DNS server does not have the entry, the request is kicked up to a higher-level DNS server
      6. This only happens the first time during a session. For the rest of that session, the client remembers the IP address
      7. This happens without action from the user
      8. Sequence and errors are:
        1. Client sends out domain name seeking IP address. Local domain name server answers if it is found, kicks up to higher level if not. On-screen message is "Looking up host". If not found, error is "Has no domain name entry"
        2. Once IP address is found, Client goes to that server. On-screen message is "Contacting host", then "Waiting for reply", then "Reading file". If nothing is received from the server, error message is "No response from... Perhaps the server is down."
      9. TCP/IP developed incrementally by decentralized workers and informal groups. Protocols and software are freely and publicly available. Government support, especially at the start, was critical. Very different from proprietary development. 
      10. Audio and video can use another protocol (such as UDP) instead of TCP, because dropping a bit or two is not serious, and faster transmission would more than make up for any "snow"
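The lookup sequence described above (ask the local name server, kick up to a higher level if needed, then remember the answer for the rest of the session) can be sketched as a toy model. The classes `NameServer` and `Client` are illustrative assumptions, and the `www.example.com` entry uses a made-up documentation address; real name servers are far more elaborate.

```python
class NameServer:
    """Toy name server: a table of names, plus an optional higher-level server."""

    def __init__(self, entries, parent=None):
        self.entries = dict(entries)
        self.parent = parent

    def lookup(self, name):
        if name in self.entries:
            return self.entries[name]
        if self.parent is not None:
            return self.parent.lookup(name)  # kick up to the higher level
        raise LookupError(name + " has no domain name entry")

class Client:
    """Toy client that remembers answers for the rest of its session."""

    def __init__(self, local_server):
        self.local_server = local_server
        self.session_cache = {}
        self.lookups_sent = 0

    def resolve(self, name):
        if name not in self.session_cache:
            self.lookups_sent += 1  # only the first request goes to the server
            self.session_cache[name] = self.local_server.lookup(name)
        return self.session_cache[name]

higher = NameServer({"www.example.com": "192.0.2.10"})  # hypothetical entry
local = NameServer({"www.cll.wayne.edu": "141.217.12.24"}, parent=higher)

client = Client(local)
first = client.resolve("www.cll.wayne.edu")
second = client.resolve("www.cll.wayne.edu")  # answered from the session cache
print(first, client.lookups_sent)  # 141.217.12.24 1
```

In a real program the operating system does this work; Python exposes it as `socket.gethostbyname(name)`, which needs a live network connection.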
    5. This is what I will mean by "The Internet" -- a pipeline for delivering information between any two connected computers
      1. By connecting to the Internet, an organization extends it and provides alternate routes. Each organization funds its own part. Backbone is maintained by large communications companies.
      2. Standards were developed by public discussion, are not proprietary, many companies make routers and provide services. (Cisco Systems is largest manufacturer.)
    6. Seeing how the Internet works. We will use an Internet application called "tracert" (Trace Route) to see how information travels over the Internet.
      1. Bring up a DOS command line window by going to Start > Run then type cmd (alternate if this doesn't work: command) and then tap the <Enter> key.
      2. Where ^ is a space, type tracert^www.cll.wayne.edu and then tap the <Enter> key. The result is a list of all of the devices (computers, gateways and routers) that helped to deliver Internet packets from your computer (not listed) to the destination (last in the list). Notice that both the Domain Names and IP addresses are shown.
      3. To see a Domain Name Server, at the DOS command prompt type tracert^141.217.1.13 and then tap the <Enter> key.
      4. To see a more distant computer, try tracert to www.ibm.com, www.microsoft.com, or Wayne County Community College at www.wccc.edu (I went through Cleveland, Chicago and Philadelphia!)
      5. To see your computer's IP address, at the DOS command prompt type ipconfig (IP Configuration) and tap the <Enter> key. The line that says "IP address" is the IP address for your computer. The subnet mask and default gateway are information for getting out of the building.
      6. When you are done experimenting with the DOS window, at the DOS command prompt type exit and tap the <Enter> key.
  2. "Applications" are programs that use this transmission mechanism
    1. Peer-to-peer applications have two computers acting as equals. This was fairly rare until Napster and others. With Napster, the basic listing of songs and where to find them is on a central server, but the actual files are transported client-to-client. With Gnutella, everything is client-to-client, so there is no central operator, which makes it very difficult to sue.
    2. Client-server is much more common
      1. A client requests information from a server, displays information when it is received. A web browser (Netscape or Internet Explorer, plus about fifty others) can also be called a web client.
      2. A server sits and waits for information request, services request when request is received. Server seems to be simpler, but it must be able to service simultaneous requests, also expected to be very robust -- always available
      3. Clients and servers using the same application protocol are (supposed to be) interchangeable.
    3. Email. Client A is the first user, with an account on mail server #1; a second client, B, has an account on mail server #2. A addresses a message to B and sends it by transmitting it to mail server #1, which sends it on to mail server #2. The message waits there until B logs on and picks it up. Messages are simple text, but files can be attached. There are two major protocols for picking up mail from the server -- POP (Post Office Protocol) and IMAP (Internet Message Access Protocol). POP is simpler and more popular; IMAP is more comprehensive and is commonly supposed to be the future. Client and server must use the same protocol, POP or IMAP. Some email server computers run both servers, and some email clients can be configured either way.
      An Internet email address has two parts separated by @, e.g. d.r.bowen@wayne.edu
      1. part to left of @ is name of account (d.r.bowen)
      2. part to right of @ is email server that account is on (wayne.edu)
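Splitting an address into these two parts takes only a few lines of Python. The function name `split_email_address` is an illustrative assumption, and the check is deliberately minimal (real address rules are more involved).

```python
def split_email_address(address):
    """Return (account, server) for an address of the form account@server."""
    account, at_sign, server = address.rpartition("@")
    if not at_sign or not account or not server:
        raise ValueError("not an email address: " + repr(address))
    return account, server

print(split_email_address("d.r.bowen@wayne.edu"))  # ('d.r.bowen', 'wayne.edu')
```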
    4. World Wide Web (the web). The user runs a web client (a.k.a. web browser, e.g. Netscape Communicator or Microsoft Internet Explorer). User can request a file by (a) typing in the address of the file, (b) clicking on a link containing the address as hidden text, or (c) selecting a bookmark, which is the saved address of a file previously viewed. Server gets file and returns it, client displays it. HyperText Transfer Protocol (HTTP) is the basic web protocol. (HyperText means linked text, but has provisions for graphics and many other extensions.)
      1. Anatomy of a URL (Uniform Resource Locator), what you type into the Location or Address window of your browser.
        Example:
        http://www.cll.wayne.edu/isp/drbowen/casw03/welcome.htm
        1. http:// - The method (of transfer). Typing http:// is optional in most browsers, which assume it if no method is given. Other methods are
          1. file:// (You can open a file directly in your browser to check it out, without going through the web server, and this is the method used in that case.)
          2. ftp:// (File Transfer Protocol)
          3. telnet:// (Logging into a computer with a command line interface)
          4. gopher:// (Earlier text-based protocol without links inside documents)
        2. www.cll.wayne.edu - Domain Name of the web server. You can also use the numerical IP address, e.g. 141.217.142.149
        3. /isp/drbowen/casw03/ - The path of folders to the requested file, from the "document root" folder of the server.
        4. welcome.htm - The name of the requested file. The browser displays files with extensions of htm, html, gif, jpeg, and jpg, and for others, asks if you want to download the file. If no file is listed, web servers are configured with a default file name, which is sent from the folder in the URL.
        5. If the requested filename is the "default" filename, it does not have to be listed. This is good because the user has to type less. If there is no file extension at the end of the URL, the URL is interpreted as requesting the default file name. (Normally the default file name is index.htm or index.html. On the CLL web server, it is welcome.htm)
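The same anatomy can be pulled apart with Python's standard `urllib.parse` module. This is a sketch: the function name `anatomy` is an assumption, and the default-file rule is applied here on the client side purely for illustration (in practice the web server decides which default file to send).

```python
import posixpath
from urllib.parse import urlsplit

# Default file name on the CLL web server, per the notes above;
# most other servers use index.htm or index.html.
DEFAULT_FILE = "welcome.htm"

def anatomy(url):
    """Split a URL into method, server, folder path and file name."""
    parts = urlsplit(url)
    folders, filename = posixpath.split(parts.path)
    if not folders.endswith("/"):
        folders += "/"
    return {
        "method": parts.scheme,
        "server": parts.netloc,
        "folders": folders,
        "file": filename or DEFAULT_FILE,  # no file listed -> default file
    }

full = anatomy("http://www.cll.wayne.edu/isp/drbowen/casw03/welcome.htm")
short = anatomy("http://www.cll.wayne.edu/isp/drbowen/casw03/")
print(full["server"], full["folders"], full["file"])
# www.cll.wayne.edu /isp/drbowen/casw03/ welcome.htm
```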
      2. The full URL specifies everything about the requested file. This is an absolute URL. If the requested file is on the same web server, an abbreviated form known as a relative URL can be used. This is particularly useful for creating links and loading images. There are several possible forms for relative URLs, depending on how close the requested page is to the current page.
        1. If the requested file is in the highest-level folder for this web server, only a "/" is necessary, followed by the filename if it is not the default filename
        2. If the requested page is in the same folder, only the name need be given.
        3. If the requested page is in a sub-folder, only the folder path from the folder for the current page, and the file name (if it is not the default file name) need be given. In this case, do not precede the first folder with "/" - that is interpreted as the highest level folder for this web server

        NOTE: Relative URLs are very convenient, because if you develop the web site on one computer and the web server is another computer, then you do not have to worry about the higher-level folders on the web server, which the web master will often be reluctant to divulge (the folder structure is one element needed to hack a web site). Also, if web sites are moved, absolute URLs for the same web site are broken, while relative ones usually survive.
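The three relative forms above can be checked with `urljoin` from Python's standard library, which resolves a relative URL against the current page in the way browsers do. The file and folder names `internet.htm` and `images/email.gif` are made up for the example.

```python
from urllib.parse import urljoin

# The current page (absolute URL).
current = "http://www.cll.wayne.edu/isp/drbowen/casw03/welcome.htm"

# 1. Leading "/" means the highest-level folder of this web server.
top_level = urljoin(current, "/welcome.htm")

# 2. A bare file name means "same folder as the current page".
same_folder = urljoin(current, "internet.htm")

# 3. A folder path with no leading "/" means a sub-folder of the current folder.
sub_folder = urljoin(current, "images/email.gif")

print(top_level)    # http://www.cll.wayne.edu/welcome.htm
print(same_folder)  # http://www.cll.wayne.edu/isp/drbowen/casw03/internet.htm
print(sub_folder)   # http://www.cll.wayne.edu/isp/drbowen/casw03/images/email.gif
```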

      3. Web file format is HTML - HyperText Markup Language. HTML files are simple text files with two types of content
        1. Text appears on the screen as typed except that multiple spaces and line breaks (<Enter>) are ignored.
        2. Markup or formatting commands appear inside angle brackets <>, e.g. <center>...</center>
        3. Browser implements formatting commands
        4. Formatting also includes links, graphics, audio, video, accept user input, etc.
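The two types of content can be separated with Python's standard `html.parser` module. This is a sketch only: the class name `MarkupSplitter` and the sample page are assumptions, and it shows just the whitespace rule, not the layout a real browser would produce.

```python
from html.parser import HTMLParser

class MarkupSplitter(HTMLParser):
    """Collects markup commands and text separately (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_pieces = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)          # a formatting command, e.g. "center"

    def handle_data(self, data):
        self.text_pieces.append(data)  # the text between the commands

    def visible_text(self):
        # Runs of spaces and line breaks collapse to a single space.
        return " ".join("".join(self.text_pieces).split())

page = "<center>Computers   and\n   Society</center>\n<p>GST    2710</p>"
splitter = MarkupSplitter()
splitter.feed(page)
splitter.close()

print(splitter.tags)            # ['center', 'p']
print(splitter.visible_text())  # Computers and Society GST 2710
```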
      4. For web-based email, browser takes information such as destination and message, sends it to web server, web server transfers it to an email server
      5. On the Internet, web traffic is growing rapidly, doubling approximately every eighteen months or less. It has surpassed the previous leader, email traffic. Other indicators, such as the total number of servers, are growing at similar rates. There are probably several reasons for this popularity:
        1. Connects all computer platforms
        2. Ease of use, including interactivity
        3. Colorful, attractive layout
        4. Wide variety of content, including purchasing from home
        5. Ability to search, although organization is not a strong point, hard to focus down on the content you want
      6. One convenient aspect of the World Wide Web is that the client and server are not in constant communication. The client requests a file, the server sends it back, and the two break contact until the next request. Therefore, if your modem fails and your Internet connection is broken, do nothing in your browser when you see the message. Do not panic and start clicking about; simply redial the modem. You will be able to pick up where you left off, with no loss. The client and server will not have been aware that your Internet connection was broken. Other communications protocols are not so modem-tolerant, and you may lose everything if your modem fails. (Cable modem connections, and network connections, are usually reliable enough that broken Internet connections are not a concern.)
  3. Computer security
    1. A computer virus is a computer program that causes mischief or damage in one of several ways. A good antivirus program is a must these days. Symantec (Norton) and McAfee are the standard ones, and both are very good. There are two parts: the "scan engine" or program, and the virus signature files, the data files that describe what the scan engine looks for. The scan engine should automatically scan all files on the computer on a regular basis, and also examine each file as it comes in to the computer from whatever source: floppy diskette, email, web, etc. The scan engine typically needs to be updated every three to five years. Updating the engine costs money. The virus signature files need to be updated at least monthly, as thousands of new viruses can be found each month. An automatic update feature is highly advisable. If your antivirus program warns you that the signature files are getting old, update. Updating the signature files is usually free, using an Internet connection.
    2. Computer accounts consist of a public User Name and a private Password. The User Name provides no security; all of the security comes from a strong password, one that cannot be guessed easily. Hackers use programs to guess passwords, and these programs can try guesses very quickly.
    3. Weak passwords include:
      1. No password (blank or null). This is often the first guess.
      2. Your User Name, or your User Name spelled backwards.
      3. A dictionary word (hackers will use electronic dictionaries).
    4. Some methods of generating strong passwords are:
      1. A completely random mixture of upper case letters, lower case letters, numbers and special characters such as *, %, and #, at least six characters long. There are approximately 1.4 trillion of these, so there would have to be an awful lot at stake in order to spend the necessary amount of time.
      2. Two unassociated dictionary words run together, such as jamraid or basebomb, but not baseball.
      3. The first or last letters of a sentence that you will remember, such as wcalicas for the first letters of "We cover a lot in Computers and Society."
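The arithmetic behind the first method can be checked directly. The exact count depends on which special characters are allowed, which the notes do not specify; the sketch below assumes the 94 printable ASCII characters Python lists in its `string` module, so its total differs somewhat from the figure quoted above while staying in the same range. The helper `first_letters` is an illustrative assumption for the third method.

```python
import string

# Assumed alphabet: upper case, lower case, digits and ASCII punctuation.
alphabet = (
    string.ascii_uppercase
    + string.ascii_lowercase
    + string.digits
    + string.punctuation
)
print(len(alphabet))  # 94

# Each of the six positions can hold any character in the alphabet.
six_char_passwords = len(alphabet) ** 6
print(six_char_passwords)  # 689869781056

def first_letters(sentence):
    """Build a password from the first letter of each word in a sentence."""
    return "".join(word[0].lower() for word in sentence.split())

print(first_letters("We cover a lot in Computers and Society."))  # wcalicas
```

Every extra character multiplies the count by the alphabet size again, which is why length matters so much.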
    5. Especially in a computer lab, having the computer remember your password, as Windows and Internet Explorer often offer to do, is not a good idea.