Last updated: 2/21/02

The Internet
Structure, Function and Applications

  1. "Internet" can be used vaguely today. We will use the technical definition here. The Internet connects computers within local networks so that any connected computer can send computer information to any other.
    1. Most computers connected to the Internet are either part of a network or are connected via an ISP - Internet Service Provider. In either case, there is a "gateway" computer that can receive information from and direct it to the connected computer. The gateway computer has the direct internet connection.
    2. Each connected computer is given a numerical "IP address", or just plain "IP", in the form of four bytes separated by three dots, e.g. 141.217.142.149 for the IP address of the CLL web server. A gateway handles a "family" of IPs. For example, 141.217.142.--- is CLL - College of Lifelong Learning, including the Interdisciplinary Studies Program. So, my desktop is 141.217.142.125. Within CLL, the gateway is 141.217.142.1, and we are all connected together via a Novell NetWare LAN.

      [Figure: InetWork.gif]
      1. Both the left-hand and right-hand LANs in this picture have many computers, only one of which is shown. We want to send information from left-hand computer to right-hand computer over the Internet.
      2. LAN - Local Area Network. Shares files, printers. LANs (NetWare, Token Ring, AppleTalk) are usually proprietary. Proprietary arrangements are trade secrets and so do not interoperate. How the gateway and workstations communicate is different on each type of LAN.
      3. Internet protocols are in the public domain: how they work is published freely, and anyone can use them.
      4. IP addresses must be unique - no repeats
      5. Organizations can be given control over a "domain" and allocate IP addresses within that domain. For example, 141.217.xxx.yyy is Wayne State University. WSU allocates 141.217.142 to CLL, and CLL allocates the yyy to individual computers.
      6. IPs can be "static" -- unchanging, assigned by network administrator -- or "dynamic" -- assigned from a pool of IPs at each boot up by the router
        1. AOL changes IPs at each "hit"
      7. Left-hand LAN uses its proprietary communications to send packet from computer to Gateway, Gateway uses Internet communications to send packet to right-hand Gateway, Right-hand LAN uses its proprietary communications to send packet to computer.
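The "family" idea above can be tried out in a few lines of Python. This is only a sketch: the /16 and /24 notation is my assumption about how to write "141.217.xxx.yyy" and "141.217.142.yyy" as address blocks, and the function names are made up for illustration.

```python
import ipaddress

# Address blocks taken from the notes, written in prefix notation (an assumption):
# WSU holds 141.217.x.y and CLL holds 141.217.142.y.
WSU = ipaddress.ip_network("141.217.0.0/16")
CLL = ipaddress.ip_network("141.217.142.0/24")


def describe(address: str) -> str:
    """Say which of the two example blocks an IPv4 address falls in."""
    ip = ipaddress.ip_address(address)
    if ip in CLL:
        return "CLL"
    if ip in WSU:
        return "WSU (outside CLL)"
    return "outside WSU"


def to_bytes(address: str) -> list[int]:
    """Split a dotted address into its four byte values (0-255 each)."""
    return list(ipaddress.IPv4Address(address).packed)
```

For example, describe("141.217.142.125") places my desktop inside the CLL block, and to_bytes("141.217.142.149") shows the four bytes behind the dotted form.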
    3. Information travels in packets - finite groups of bits. Packets have two parts
      1. Head is standardized - contains "meta information" - "from" address, length of packet, "to" address, etc. (time to live)
      2. Body is freeform - contains the information itself
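A packet's two parts can be pictured as a small Python record. This is a teaching sketch, not the real IP header layout: the field names are my own, and the forward function only illustrates what "time to live" is for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    # Head: standardized "meta information"
    source: str        # "from" address
    destination: str   # "to" address
    time_to_live: int  # remaining router hops before the packet is discarded
    # Body: the information itself, in whatever form the sender chooses
    body: bytes

    @property
    def length(self) -> int:
        """Length of the body in bytes."""
        return len(self.body)


def forward(packet: Packet) -> Optional[Packet]:
    """One router hop: lower the time to live, and drop the packet when it runs out."""
    if packet.time_to_live <= 1:
        return None  # packet is discarded instead of circulating forever
    return Packet(packet.source, packet.destination,
                  packet.time_to_live - 1, packet.body)
```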
    4. Packets are steered to destination by Internet routers. Each router has an Internet address.
      1. Each router has a "routing table" listing final destinations and the next hop for each. For each packet, the router matches the final destination against the table, then sends the packet on its next hop to the next router.
      2. There are usually many possible routes to destination, so routers have a method of making the choice, usually on the basis of low traffic and therefore probably fastest time
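The table lookup can be sketched as follows. The table entries and router names here are hypothetical, and I am assuming the usual rule that the most specific matching block wins; the traffic-based choice between alternate routes is not modeled.

```python
import ipaddress

# Hypothetical routing table: destination block -> name of the next router.
ROUTING_TABLE = {
    ipaddress.ip_network("141.217.142.0/24"): "cll-gateway",
    ipaddress.ip_network("141.217.0.0/16"): "wsu-core",
    ipaddress.ip_network("0.0.0.0/0"): "upstream-provider",  # default route, matches everything
}


def next_hop(destination: str) -> str:
    """Return the next router for a packet, preferring the most specific match."""
    ip = ipaddress.ip_address(destination)
    matches = [network for network in ROUTING_TABLE if ip in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTING_TABLE[best]
```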
    5. Computers are very inflexible, and must have an explicit order of which computer starts communication and what it does, how the second responds, etc. These are "protocols". The Internet uses the TCP/IP protocol for communication. This is actually two main protocols -- TCP and IP, with a host of others that are lumped in.
      1. IP is the raw transport mechanism, just throwing information out as fast as it can, without asking if it arrives.
      2. TCP sits on top, and uses IP both ways to confirm accurate arrival, and resend if not
    6. The Domain Name System (DNS) is another layer of protocol that makes remembering addresses easier.
      1. Many servers (see below) have static IPs and employ "dot-com" names, e.g. www.cll.wayne.edu. These are called domain names, because the last group of letters is known as the domain. Primarily US
        1. edu
        2. com
        3. org
        4. gov
        5. net
      2. Other countries use two-letter domain, e.g. de
      3. New domains proposed 1997
        1. firm
        2. store
        3. arts
        4. rec
        5. info
        6. web
        7. nom
      4. Client software goes to a local Domain Name Server (DNS) to get the IP (numerical) address
        1. Communication with DNS uses TCP/IP
        2. If local DNS does not have entry, kicked up to a higher-level DNS
      5. This only happens the first time during a session. For the rest of that session, the client remembers the IP address
      6. This happens without action from the user
      7. TCP/IP developed incrementally by decentralized workers and informal groups. Protocols and software are freely and publicly available. Government support, especially at the start, was critical. Very different from proprietary development.
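The "look it up once, then remember it for the session" behavior of the client can be sketched like this. The class name is made up, and real clients also let remembered entries expire, which is left out here. By default the lookup goes through Python's standard socket.gethostbyname, which needs a working Internet connection.

```python
import socket


class CachingResolver:
    """Remember name-to-IP answers so each name is looked up only once."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup   # function that asks a Domain Name Server
        self._cache = {}        # domain name -> numerical IP address
        self.lookups_sent = 0   # how many times a server was actually asked

    def resolve(self, name: str) -> str:
        if name not in self._cache:
            self.lookups_sent += 1
            self._cache[name] = self._lookup(name)
        return self._cache[name]
```

Calling resolve("www.cll.wayne.edu") twice on the same CachingResolver sends only one lookup; the second answer comes from the cache.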
    7. TCP/IP is the basic Internet communication protocol
      1. IP = Internet protocol
        1. One-way communication - packets go from A to B without any checking, implements IP addresses
        2. Not secure
        3. Fast
        4. Can run on many different types of hardware -- telephone wires, coax, optical fiber, etc.
        5. Current version is IPv4, being upgraded to IPv6. IPv4 has 4.3 billion addresses, which is not enough: growth is rapid, and many allocated addresses are not used. Number provided by IPv6: about 3.4 x 10^38 (34 followed by 37 zeroes)
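The address counts come straight from the address sizes: an IPv4 address is four bytes (32 bits) and an IPv6 address is sixteen bytes (128 bits). A quick check of the arithmetic, with variable names of my own choosing:

```python
IPV4_BITS = 32    # four bytes
IPV6_BITS = 128   # sixteen bytes

# Each extra bit doubles the number of possible addresses.
ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

print(f"IPv4: {ipv4_total:,} addresses")          # a little under 4.3 billion
print(f"IPv6: about {ipv6_total:.1e} addresses")  # scientific notation
print(f"IPv6 count has {len(str(ipv6_total))} digits")
```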
      2. TCP = Transmission Control Protocol. TCP uses ("sits on top of") IP, sending IP packets in both directions to implement reliable transmission
        1. Waits for distant computer to acknowledge ("connection")
        2. Successive packets have sequence numbers.
        3. Flow control - halt, continue
        4. Slower
      3. Audio and video could use another protocol instead of TCP (UDP is the usual choice), because losing an occasional packet is not serious, and faster transmission would more than make up for any "snow"
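How TCP gets reliable delivery out of an unreliable carrier can be shown with a toy simulation. This is a deliberately simplified "send one, wait for the acknowledgment, resend if it does not come" scheme, not real TCP, which keeps several packets in flight and uses timers. The function and parameter names are hypothetical.

```python
def send_reliably(chunks, drop_attempts=frozenset(), max_tries=5):
    """Deliver chunks in order over a channel that loses some transmissions.

    drop_attempts holds the numbers (counting from 1) of the transmissions
    that the simulated channel loses.
    Returns (chunks received, total transmissions used).
    """
    received = []
    attempt = 0
    for sequence_number, chunk in enumerate(chunks):
        for _ in range(max_tries):
            attempt += 1
            if attempt in drop_attempts:
                continue  # no acknowledgment arrives, so the sender resends
            # The receiver accepts only the sequence number it expects next.
            if sequence_number == len(received):
                received.append(chunk)
            break  # acknowledgment received, move on to the next chunk
        else:
            raise TimeoutError(f"chunk {sequence_number} was never acknowledged")
    return received, attempt
```

With no losses, three chunks take three transmissions. If the second and third transmissions are lost, the same three chunks still arrive in order, but it takes five transmissions, which is the "slower" price of reliability.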
    8. This is what I will mean by "The Internet" -- a pipeline for delivering information between any two connected computers
      1. By connecting to the Internet, an organization extends it and provides alternate routes. Each organization funds its own part. Backbone is maintained by large communications companies.
      2. Standards were developed by public discussion, are not proprietary, many companies make routers and provide services. (Cisco Systems is largest manufacturer.)
    9. Finding out (lab activity, works for Windows only)
      1. Setup - start a DOS command line window
        1. Click on "Start"
        2. Click on "Run..."
        3. Type (without the quotes) "cmd" and then tap the <Enter> key
        4. A "DOS command line window" will pop up. Notice the flashing cursor at the "C:>" prompt. (This "line" is called "the command line"; hence the name "command line window.")
      2. The IP address of the computer you are using.
        1. Type (without the quotes) "ipconfig" and then tap the <Enter> key. (What you type is a command to run the program "ipconfig.exe", which then displays the IP configuration of your computer.)
        2. Notice the IP address of your computer. You will not need this to use the computer, but it does confirm that one is there. (Other computers may not have fixed IP addresses, and may display something else here.)
      3. Showing all of the routers that are involved in sending packets
        1. In your DOS command line window, with the cursor at the  prompt, type (without the quotes and making sure to include the space)
          "tracert www.cll.wayne.edu". This will display all of the computers involved in sending your packets. This should be a short list. Your computer is not shown, but the destination computer is. Notice that the Domain Name gets translated to the numerical IP address.
        2. Try a longer hop, to "www.michigan.gov", the home page site for the State of Michigan. This is a longer list. If your computer locks up doing this, hold down the <Ctrl> key and tap the "c" key, then let go of both (Ctrl-c key combination). This will stop the program.
        3. Try a really long one such as "www.ibm.com" or any other commercial web site. If your computer locks up doing this, hold down the <Ctrl> key and tap the "c" key, then let go of both (Ctrl-c key combination). This will stop the program.
      4. Close the DOS command line window by typing (at the C:> prompt and without using the quotes) "exit" and then tapping the <Enter> key.
      5. Go to a web site without the Domain Name but using the IP address directly.
        1. Start a web browser such as Netscape Navigator or Internet Explorer.
        2. Go to the course web site - http://www.cll.wayne.edu/isp/drbowen/webeduw02
        3. The numerical IP address for "www.cll.wayne.edu" is 141.217.142.149. Go to the course web site using this form by going to http://141.217.142.149/isp/drbowen/webeduw02.
  2. "Applications" are programs that use this transmission mechanism
    1. Peer applications have two computers acting as equals, but this is fairly rare
    2. Client-server is much more common
      1. A client requests information from a server, displays information when it is received
      2. Server sits and waits for information request, services request when request is received. Server seems to be simpler, but it must be able to service simultaneous requests, also expected to be very robust -- always available
      3. Clients and servers using the same application protocol are (supposed to be) interchangeable.
        1. E.g., email has three primary protocols
          1. POP (currently POP3) or Post Office Protocol
          2. IMAP (currently IMAP4) or Internet Message Access Protocol
          3. Web-based email such as Hotmail. This is becoming very popular because it requires much less configuration than the others, and web-knowledgeable users already know how to use it.
          4. Client and server must be matched at each end, but can be different at the two ends, that is, any POP3 client can work with any POP3 server.
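The client-server pattern can be tried on a single computer with Python's standard socket module. This is a minimal sketch: the reply format is made up, the server handles just one request and then stops, and a real server would have to handle many simultaneous requests.

```python
import socket
import threading


def serve_one_request(listener: socket.socket) -> None:
    """Server side: sit and wait for a client, read its request, send a reply."""
    connection, _client_address = listener.accept()  # blocks until a client connects
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"reply to: " + request)


def ask(port: int, request: bytes) -> bytes:
    """Client side: connect, send a request, return the server's reply."""
    with socket.create_connection(("127.0.0.1", port)) as connection:
        connection.sendall(request)
        return connection.recv(1024)


def demo() -> bytes:
    """Run a server and a client on this computer and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
        listener.listen()
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_one_request, args=(listener,))
        server.start()
        reply = ask(port, b"welcome.htm")
        server.join()
        return reply
```

Any client that sends bytes in the way the server expects would work here, which is the sense in which clients and servers sharing a protocol are interchangeable.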
    3. World Wide Web (the web). The user runs a web client (a.k.a. web browser, e.g. Netscape Communicator or Microsoft Internet Explorer). The user can request a file by (a) typing in the file's address, (b) clicking on a link containing the address as hidden text, or (c) selecting a bookmark, which is the stored address of a file previously viewed. Server gets file and returns it, client displays it. HyperText Transfer Protocol (HTTP) is the basic web protocol. (HyperText means linked text, but has provisions for graphics and many other extensions.)
      1. Anatomy of a URL (Uniform Resource Locator, what you type into the Location or Address window of your browser).
        Example:
        http://www.cll.wayne.edu/isp/drbowen/internet/welcome.htm
        1. http:// - The method (of transfer). http is optional. If not present, this defaults to (is treated as) http://. Other methods are
          1. ftp:// (File Transfer Protocol)
          2. telnet:// (Logging into a computer with a command line interface)
          3. gopher:// (Earlier text-based protocol without links inside documents)
          4. file:// (You can open a file directly in your browser to check it out, without going through the web server, and this is the method used in that case.)
        2. www.cll.wayne.edu - Domain Name of the web server. You can also use the numerical IP address, e.g. 141.217.142.149
        3. /isp/drbowen/internet/ - The path of folders to the requested file, from the "document root" folder of the server.
          1. The path of folder for the case below would be \doc\isp\coord. (The web server path is just for the last part. The web server starts from a predetermined folder which is not shown to the user for security purposes.)
        4. welcome.htm - The name of the requested file. The browser displays files with extensions of htm, html, gif, jpeg, and jpg, and for others, asks if you want to download the file. If no file is listed, web servers are configured with a default file name, which is sent from the folder in the URL.
        5. If the requested filename is the "default" filename, it does not have to be listed. This is good because the user has to type less. If there is no file extension at the end of the URL, the URL is interpreted as requesting the default file name. (Normally the default file name is index.htm or index.html. On the CLL web server, it is welcome.htm)
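The same anatomy can be pulled apart with Python's standard urllib.parse module. The function name and the dictionary keys are my own labels, chosen to match the terms used above.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Break an absolute URL into method, server, folder path and file name."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the folder path; the rest is the file name.
    folder, _slash, filename = parts.path.rpartition("/")
    return {
        "method": parts.scheme,
        "server": parts.netloc,
        "folders": folder + "/",
        "file": filename,  # empty string means "send the default file"
    }


print(url_parts("http://www.cll.wayne.edu/isp/drbowen/internet/welcome.htm"))
```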
      2. The full URL specifies everything about the requested file. This is an absolute URL. If the requested file is on the same web server, an abbreviated form known as a relative URL can be used. This is particularly useful for creating links and loading images. There are several possible forms for relative URLs, depending on how close the requested page is to the current page.
        1. If the requested file is in the highest-level folder for this web server, only a "/" is necessary, followed by the filename if it is not the default filename
        2. If the requested page is in the same folder, only the name need be given.
        3. If the requested page is in a sub-folder, only the folder path from the folder for the current page, and the file name (if it is not the default file name) need be given. In this case, do not precede the first folder with "/" - that is interpreted as the highest level folder for this web server

        NOTE: Relative URLs are very convenient, because if you develop the web site on one computer and the web server is another computer, then you do not have to worry about the higher-level folders on the web server, which the web master will often be reluctant to divulge (the folder structure is one element needed to hack a web site). Also, if web sites are moved, absolute URLs for the same web site are broken, while relative ones usually survive.
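        To see how a browser turns a relative URL into an absolute one, Python's standard urljoin function follows the same rules. The file and folder names "notes.htm" and "labs/lab1.htm" are made up for illustration.

```python
from urllib.parse import urljoin

# The page currently being viewed; relative URLs are interpreted from here.
CURRENT_PAGE = "http://www.cll.wayne.edu/isp/drbowen/internet/welcome.htm"

RELATIVE_URLS = [
    "/",               # default file in the highest-level folder
    "notes.htm",       # file in the same folder as the current page
    "labs/lab1.htm",   # file in a sub-folder of the current folder
    "/labs/lab1.htm",  # leading "/" starts from the highest-level folder instead
]

for relative in RELATIVE_URLS:
    print(f"{relative:16} -> {urljoin(CURRENT_PAGE, relative)}")
```

        The last two lines of output differ only because of the leading "/", which is the mistake warned about above.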

      3. Web file format is HTML - HyperText Markup Language. HTML files are simple text files with two types of content
        1. Text appears on the screen as typed except that multiple spaces and line starts (<Enter>) are ignored.
        2. Markup or formatting commands appear inside corner brackets <>, e.g. <center>...</center>
        3. Browser implements formatting commands
        4. Formatting also includes links, graphics, audio, video, accept user input, etc.
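The two types of content can be separated with Python's standard html.parser module. This sketch only lists the markup commands it meets and collapses the spacing in the text the way a browser would; it does not do any actual formatting. The class and function names are my own.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect markup commands and plain text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []         # markup commands found inside <>
        self.text_pieces = []  # everything else

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_pieces.append(data)

    def visible_text(self) -> str:
        # Runs of spaces and line breaks collapse to a single space.
        return " ".join("".join(self.text_pieces).split())


def split_markup(source: str):
    """Return (list of opening tags, text as it would appear on screen)."""
    parser = MarkupSplitter()
    parser.feed(source)
    parser.close()
    return parser.tags, parser.visible_text()
```

For example, split_markup("<center>Hello     there\nclass</center>") reports one markup command, center, and the text "Hello there class".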
      4. For web-based email, browser takes information such as destination and message, sends it to web server, web server transfers it to an email server
      5. On the Internet, web traffic is increasing at a high rate of growth, doubling approximately every eighteen months or less. It has surpassed the previous leader, email traffic. Other indicators, such as the total number of servers, are growing at similar rates. There are probably several reasons for this popularity
        1. Connects all computer platforms
        2. Ease of use, including interactivity
        3. Colorful, attractive layout
        4. Wide variety of content, including purchasing from home
        5. Ability to search, although organization is not a strong point, hard to focus down on the content you want
    4. Email. User A has an account on mail server #1, and user B has an account on mail server #2. A addresses a message to B and sends it by transmitting it to mail server #1, which forwards it to mail server #2. The message waits there until B logs on and picks it up. Messages are simple text, but files can be attached. There are two major protocols -- POP (Post Office Protocol) and IMAP (Internet Message Access Protocol). POP is simpler and more popular; IMAP is more comprehensive and is commonly supposed to be the future. Client and server must use the same protocol. Some email server computers run both servers, and some email clients can be configured either way.
      [Figure: Email.gif]
      1. Internet email address has two parts separated by @. e.g. d.r.bowen@wayne.edu
        1. part to left of @ is name of account (d.r.bowen)
        2. part to right of @ is email server that account is on (wayne.edu)
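The address split and the "message waits until B picks it up" behavior can be sketched together. This is a toy model with made-up names: it skips the sender's own mail server and the POP or IMAP conversation, and only shows how the two parts of the address decide where a message waits.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an email address into (account name, mail server)."""
    account, at_sign, server = address.partition("@")
    if not (account and at_sign and server):
        raise ValueError(f"not an email address: {address!r}")
    return account, server


class MailServer:
    """Holds messages for each account until the owner picks them up."""

    def __init__(self, name: str):
        self.name = name
        self.mailboxes = {}  # account name -> list of waiting messages

    def accept(self, account: str, message: str) -> None:
        self.mailboxes.setdefault(account, []).append(message)

    def pick_up(self, account: str) -> list:
        """Hand over all waiting messages and empty the mailbox."""
        return self.mailboxes.pop(account, [])


def deliver(servers: dict, to_address: str, message: str) -> None:
    """Route a message to the mail server named to the right of the @."""
    account, server_name = split_address(to_address)
    servers[server_name].accept(account, message)
```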
  3. WSU will be going to a central login ID and password, your "Access ID." For example, my Access ID is aa2012 so my email address can be aa2012@wayne.edu. The Access ID is mailed to all WSU students in their first semester. The central login is carried out by an "authentication server" - it authenticates or logs in users. If you are unable to log in to a system that uses your Access ID (www.cll.wayne.edu does not), this authentication server may have crashed, or perhaps one of the routers to it has crashed. 
  4. Debugging Internet problems
    1. Numbers for help - you are likely to get more respect and better help if you know what you are talking about when you ask for help
      1. Instructor (may not be expert)
      2. For anything using "cll.wayne.edu", I am generally the administrator, so call me.
      3. WSU computer help desk (313) 577-4778, 8 AM - 8 PM weekdays. These people are knowledgeable and patient. Helping people is their job.
      4. WSU network operations center (313) 577-4746 24 x 7. These people run the system. They are not patient and kind. Try to know what you are talking about.
    2. Trouble logging in to a system that uses your Access ID and password? (The CLL web server does not.) Ask about the authentication server. If you have made any changes on your computer since the last time such a login worked, be honest and tell the operator.
    3. If you see an error pop-up like this, or possibly one mentioning "does not have a DNS entry", your browser is telling you that it cannot find a DNS entry for that computer name. The most likely problem is that your Internet connection has been lost and you cannot do anything over the Internet. You may also have mistyped the URL. Possibly, though, the DNS is not working. Try tracert - that impresses the experts.
    4. Notice what happens on the browser "status bar" at the lower left. Often, these steps will flash by more quickly than you can see, but if one fails, the status bar can help you figure out what is wrong. First, your browser will look up the server on a DNS

      Then it will make an initial request to the server and wait for the file to be sent.

      At this point, the server might not respond at all. You will probably get an error message that the server is not responding. Sometimes this can simply mean that the server is momentarily overloaded and is rejecting requests. If this condition persists, it probably means that the server has crashed. Try tracert and see if you get the last response, which is the one from the server. One other thing that can go wrong is that the server cannot find the specified file. Sometimes you may not be allowed access to the file, but for a course web site, that is unusual. Here is what you may see if the file cannot be found:

      404 Not Found

      The requested URL was not found on this server:

      /isp/drbowen/webedu/whatisit.htm

      (C:\WebSite\htdocs\webdocs\isp\drbowen\webedu\whatisit.htm)

      In this case, you may have mistyped the URL, or the file may have been moved or deleted.
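      The same sequence of checks (DNS lookup, then the request, then the server's answer) can be written as a short Python function. This is a sketch with made-up messages. The resolve and fetch parameters default to the standard library functions, which need a live Internet connection, and can be swapped for stand-ins when experimenting.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def diagnose(url, resolve=socket.gethostbyname, fetch=urllib.request.urlopen):
    """Follow the browser's steps and report the first one that fails."""
    host = urlsplit(url).hostname
    # Step 1: look up the server name on a DNS.
    try:
        resolve(host)
    except socket.gaierror:
        return "DNS lookup failed: check your connection and the spelling of the server name"
    # Step 2: request the file and wait for the server to answer.
    try:
        with fetch(url):
            pass  # the file arrived; a browser would display it here
    except urllib.error.HTTPError as error:
        # The server answered, but with an error code.
        if error.code == 404:
            return "404 Not Found: the URL may be mistyped, or the file was moved or deleted"
        return f"server reported HTTP error {error.code}"
    except urllib.error.URLError:
        return "server did not respond: it may be overloaded or may have crashed"
    return "OK"
```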

    5. You can also have errors internal to your computer that have error pop-ups that can look like the "grey box" one above, but have nothing to do with the Internet. There can be quite a bit going on just on your desktop - different programs interacting with each other. Here, in getting help, it helps to know exactly what the error message says, and also what it says on the blue bar at the top of the error box; this is the name of the program reporting the problem (although the problem could be in another program). Also know what operating system (Windows or Mac? XP Home or XP Professional or XP ?? Which version?) you are using, and what programs you had running at the time. For the running programs, look at the buttons on the "task bar" at the bottom of the Windows screen, and in the "tray" at the bottom right.
            
      For the name and version of a running program, use "Help > About..." as below