Wayne State University
College of Lifelong Learning
Interdisciplinary Studies Program
Fall, 2000
http://www.cll.wayne.edu/isp/drbowen/inetf00
Instructor: David R. Bowen
2311 A/AB
Wayne State University
Detroit, MI 48202
Daytime tel: (313) 577-1498
Evening tel: (248) 549-8518
FAX: (313) 577-8585
Email: d.r.bowen@wayne.edu

Instructor's home page (David R. Bowen) at http://www.cll.wayne.edu/isp/drbowen

eCommerce: Using the Web to Find and Service Customers
AGS 3360 Section 986 Call Number 92073
or ISP 5500 Section 982 Call Number 92136
Computers, the Internet, and Society
AGS 3340 Section 981 Call Number 96761
or ISP 5990 Section 982 Call Number 99915

Last updated: 11/3/00

Agenda for eCommerce Class #7
November 22, 2000

This agenda and class are for eCommerce only

  1. Announcements
    1. The eCommerce class was presented to WSU faculty on Thursday, November 9. The group was small (15 people), but the discussion was intense and largely favorable. That included the attention paid to, and the questions put to, the two students (Paul Mungar and Carolyn Mills).
    2. If your name is listed below, puh-LEAZE go to the course web site and fill in the course information form.
      1. Nafeesah Abdullah
      2. Paul Mungar
      3. Mary Phelps
      4. Chandra Williams
    3. The computer tutor is available at no charge (paid directly by ISP). His name is Matta Vijay Kumar and he is available as follows:
      1. Tel 313-832-6585
      2. Lab hours (the lab is also open then, and you can work in there): Fridays 10 to 12 and 3 to 7 PM
    4. Team 3 / Team 4
  2. Reminder of what you should be doing online on a regular basis -- these are part of the grade
    1. Signin, from the lab, only on days for the class(es) you are taking
    2. Weekly course report (if you are taking both classes, a single report will do)
    3. Conference postings (one for eCommerce, two for Computers, the Internet, and Society, three if you are taking both)
    4. Not required, but do it anyway: check your email at least weekly. If you don't have email, use Hotmail; it's easy and free. See me if you need help.
  3. Finishing up
    1. Web sites
    2. Reports
  4. Web Server logs - handout
    1. All web servers (at least the ones that I know about) maintain a log file of all requests they get from web clients (web browsers). This log file can tell a lot about the performance of the web site and how users access it. There is a "standard" format for log files, and there are many log file analysis programs, both free and pay-for. All of this is done because the log files are so useful. I have imported the log file covering the period from 9/25/2000 through today into Microsoft Access (a database program) and eliminated the requests for graphics files (*.gif and *.jpg). The conferencing system is separate software with separate logs, although it runs on the same computer, so it is not included here. The requests that remain after omitting gif and jpg files, generally requests for HTML files, are usually called "hits." There were over 200,000 total requests in the two months covered by this access log, of which slightly more than 100,000 were for HTML files. Finally, I took the first 116 of these and printed them out using Excel to get the small fonts.
    2. The CLL web server uses an enhanced file format. The information is in fields, with fields separated by tab characters (ASCII code = 9), and some extra information is included in addition to the standard information. The fields are:
      1. Key. This is a number added by Access, just a running count.
      2. ID. This counts all of the original requests, so jumps when going over a gif or jpg graphic file. This was also added by Access.
      3. Date/Time. This is the date and time that the request was received by the web server.
      4. UserIP. This is the user's IP address, needed so that the web server knows where to send the requested file back to. If desired, this can be converted to the user's domain name.
      5. UserID. If the user logs in to a section of the web site, this is the UserID. (For example, users have to log in to get to the eCommerce web sites for this course.) If the user is not logged in, this is blank.
      6. Password. Same for the user's password.
      7. Method. This is an internal parameter, not too useful.
      8. Request. This is the path and file that the user requested. Since gif and jpg files have been eliminated for this handout, only htm or html files should remain.
      9. Referer. This is the web page that has the link that the user clicked on to make the request. If the user typed in the URL or used a bookmark, this field is blank.
      10. UserEMail. The user's email address if given. You can see that only web "crawlers" give their email addresses.
      11. Browser. This is the user's web browser brand and version. "Mozilla" = Netscape. This is needed by some very cutting-edge web sites that also have a large staff, since the different browsers handle the same HTML in slightly different ways, or might not handle new features at all.
      12. Code. This is the completion code assigned by the web server from a standard list of codes. 200 means successful completion. 404 means that the requested file could not be found. There are many other codes.
      13. Bytes sent. This is the number of bytes that the web server sent out to service this request. Here, obviously, the number of bytes does not include the ones for graphics files. This is not included in the standard log file format.
      14. Time (ms). This is the time required to service the hit, in milliseconds. That is, 160 = 160 ms = 0.160 seconds.
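
As a rough illustration of the field layout above, here is a minimal Python sketch that splits tab-separated log lines into named fields and drops the gif and jpg requests. The field names follow the handout (without Key and ID, which Access added). The sample lines, IP addresses, and file paths are made up for illustration and are not taken from the CLL log, and the real date format may differ.

```python
import csv
import io

# Field names follow the handout, minus Key and ID (added later by Access).
FIELDS = ["DateTime", "UserIP", "UserID", "Password", "Method", "Request",
          "Referer", "UserEMail", "Browser", "Code", "BytesSent", "TimeMs"]

# Made-up sample lines for illustration only, NOT from the real CLL log.
SAMPLE_LOG = (
    "10/02/00 09:15:02\t192.0.2.17\t\t\tGET\t/isp/drbowen/index.htm"
    "\t\t\tMozilla/4.7\t200\t5120\t160\n"
    "10/02/00 09:15:03\t192.0.2.17\t\t\tGET\t/isp/drbowen/logo.gif"
    "\t\t\tMozilla/4.7\t200\t601\t40\n"
    "10/02/00 09:16:40\t192.0.2.44\tstudent1\t\tGET\t/isp/drbowen/missing.html"
    "\t\t\tMozilla/4.0\t404\t0\t20\n"
)


def parse_log(text):
    """Turn tab-separated log text into a list of dicts, one per request."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        rec = dict(zip(FIELDS, row))
        # The three numeric fields become integers; the rest stay as text.
        for name in ("Code", "BytesSent", "TimeMs"):
            rec[name] = int(rec[name])
        records.append(rec)
    return records


def is_hit(rec):
    """True for anything except a gif or jpg graphic request."""
    return not rec["Request"].lower().endswith((".gif", ".jpg"))


records = parse_log(SAMPLE_LOG)
hits = [r for r in records if is_hit(r)]
for h in hits:
    print(h["Request"], h["Code"], h["TimeMs"] / 1000, "seconds")
```

The same filter is what reduces the full request count to the "hits" count described in item 1.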
    3. Discussion - what can you learn from the log files?
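
One way to start the discussion is to tally a few fields. The sketch below is only an illustration: the four records are invented (they are not CLL data), and it assumes the log has already been split into the fields described above. It counts requests per page and per completion code, lists links that led to a 404, and averages the service time.

```python
from collections import Counter

# Invented records for illustration; NOT real data from the CLL log.
hits = [
    {"Request": "/index.htm", "Referer": "", "Code": 200, "TimeMs": 160},
    {"Request": "/agenda7.htm", "Referer": "/index.htm", "Code": 200, "TimeMs": 90},
    {"Request": "/index.htm", "Referer": "", "Code": 200, "TimeMs": 140},
    {"Request": "/old.htm", "Referer": "/agenda7.htm", "Code": 404, "TimeMs": 20},
]

# Which pages are requested most often?
pages = Counter(h["Request"] for h in hits)

# How many requests succeeded (200) or failed (404, and so on)?
codes = Counter(h["Code"] for h in hits)

# Which pages contain links to files that could not be found?
broken_links = sorted(
    {(h["Referer"], h["Request"]) for h in hits
     if h["Code"] == 404 and h["Referer"]}
)

# How long does the server take to service a hit, on average?
avg_ms = sum(h["TimeMs"] for h in hits) / len(hits)

print("Most requested:", pages.most_common(1))
print("Completion codes:", dict(codes))
print("Broken links (from page, to file):", broken_links)
print("Average service time (ms):", avg_ms)
```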
  5. The standard I have set for the eCommerce web sites is that all ordering information will be collected on a single web form. However, many eCommerce web sites have several forms that have to be completed one after the other. Many other interactive web sites also use forms that have to be filled out one after the other. The CLL conferencing system is an example of this, and the Online Math Tutor is another.

    All such interactive web sites with multiple forms, where the forms have to be connected to the same user, have a common problem. This problem is that the web itself does not "remember" the user. If two forms come in, there is no way that the web itself provides to identify them as coming from the same user. This was done initially to keep the web small and lean and fast, but now it is an impediment. The next version of the web may solve this problem, but for now, web site developers have to solve it for themselves. I thought for a while that the IP address would identify the computer. For example, dial-up connections have a different IP address for each session, but the IP address does not change during a session. Computers with dedicated connections, like the computers in the 113 Rackham lab, even have fixed IP addresses that should never change (this will change next year). However, AOL users can have a different IP address for each request. AOL users go through a "proxy server" that assigns an IP address for that hit and keeps track of which user the return information goes to. And AOL is a large ISP, so you cannot count on the IP address staying the same even during a session.

    So, for fully interactive web sites, the web site developer must have some way of matching up the different sets of form data from the same user. The CLL web server has available a rare but in my view superior method, which we use in both the conferencing system and the Online Math Tutor, called "Web Basic Authentication", but this is normally available only to web masters and not to web developers. The other two methods are as follows:
    1. Hidden Fields. Here, each user session is assigned an identifier such as a Session Number that increments (1 - 2 - 3 - 4 - etc.) for each session. A User Number can also be used, but this can introduce a security breach, as we will see. Whatever the identifier is, when a web form is sent to the user, this number is inserted into the form as a hidden field. That is a special type of INPUT that is not displayed to the user, with the identifier as its value and something like "SessionNumber" as its name. This data comes back as part of the form data. The return would look something like "SessionNumber = 106735," and would be available in iHTML using the colon variable :SessionNumber, which in this example would be replaced with 106735. The CGI script can use this number to link up this form-full of information with previous ones.

      The drawbacks to this method are:
      1. The CGI program has to generate each web form so that it can insert the hidden field information into it. This takes extra time.
      2. Savvy users can see this information using the "View / Source" menu item, and at least some will be able to figure out what is going on, and may be tempted to try and hack in (I have been tempted in this way). Web sites using this method need to pay extra attention to security.
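
The hidden-field idea can be sketched in a few lines of Python. This is an illustrative sketch, not the iHTML used in this course: the function names, the /cgi-bin/order address, and the Quantity field are all hypothetical. The first function plays the CGI program that writes the form, and the second reads the session number back out of the returned form data.

```python
import html
from urllib.parse import parse_qs


def render_order_form(session_number):
    """Build a web form that carries the session number in a hidden INPUT."""
    value = html.escape(str(session_number), quote=True)
    return (
        '<form method="POST" action="/cgi-bin/order">\n'  # hypothetical URL
        f'  <input type="hidden" name="SessionNumber" value="{value}">\n'
        '  Quantity: <input type="text" name="Quantity">\n'
        '  <input type="submit" value="Continue">\n'
        '</form>'
    )


def read_session_number(form_body):
    """Pull SessionNumber out of URL-encoded form data; None if absent."""
    values = parse_qs(form_body).get("SessionNumber")
    if values and values[0].isdigit():
        return int(values[0])
    return None


# The server keeps what it knows about each session, keyed by the number.
sessions = {106735: {"step": 1}}

form = render_order_form(106735)
print(form)

# What the browser might send back after the user fills in the form.
posted = "SessionNumber=106735&Quantity=2"
sid = read_session_number(posted)
sessions[sid]["Quantity"] = parse_qs(posted)["Quantity"][0]
print(sessions[sid])
```

Note that nothing stops a user from editing the form and sending back 106736 instead, which is exactly the temptation described in drawback 2. A common precaution is to use a long random identifier rather than a running count, so that neighboring sessions cannot be guessed.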
    2. Cookies. A cookie is a file that the web server tells the browser to store on the user's hard drive ("set" the cookie). The web server specifies the file contents, which can be, for example, a UserID or a Session Number. When the web server gets the form information from the user, it can also read the file contents back through the user's browser ("read" the cookie) and connect the cookie information with the form information. For example, you can set and read cookies in the iHTML language. There are several problems with cookies as well:
      1. Users can delete cookies, or choose not to accept cookies, in which case this method does not work.
      2. Cookies have a reputation for being intrusive and for leaking information. Apparently some web site developers have learned how to read cookies set by other web servers, and so collect information on the user. This is only possible if the other web site developer placed personal information about the user in the cookie, which should never be done, but there are careless people everywhere.
      3. Finally, it seems to me that the web site developer is using the user's computer for record-keeping, something that the web server should be doing for itself.
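
Setting and reading a cookie can be sketched with Python's standard http.cookies module. This is a minimal illustration rather than the iHTML version: the function names and the cookie name SessionNumber are assumptions made for the example. The server "sets" the cookie by sending a Set-Cookie header, and the browser returns the contents in a Cookie header on later requests.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(session_number):
    """Header line the server sends to 'set' the cookie in the browser."""
    cookie = SimpleCookie()
    cookie["SessionNumber"] = str(session_number)
    cookie["SessionNumber"]["path"] = "/"  # send it back for the whole site
    return cookie.output(header="Set-Cookie:")


def read_session_cookie(cookie_header):
    """'Read' the cookie from the Cookie header the browser sent back.

    Returns None when the cookie is missing, for example because the
    user deleted it or refuses cookies.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SessionNumber")
    return morsel.value if morsel is not None else None


header = make_set_cookie_header(106735)
print(header)

# A browser that accepted the cookie sends it back with the next form.
print(read_session_cookie("SessionNumber=106735"))

# A browser that refused the cookie sends nothing, so the link is lost.
print(read_session_cookie(""))
```

The last call shows problem 1 in practice: when no cookie comes back, the CGI program has nothing to match the form against.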
    3. Nevertheless, these are the two choices that web developers have for making truly interactive web sites. I find that neither of the choices is really a good one, which is why I have chosen the third way mentioned above. In this method, the web server tells the browser to pop up a grey login box and the user supplies a User Name and Password. Afterwards, the browser scrambles these and sends them on every request to that web site, and the CGI program can use that information to connect that hit with others from the same user. The problem with this is that only one web server currently lets a web site developer use this method, and that web server is O'Reilly Website, the web server presently used by CLL.
    4. Hopefully in the future there will be a really good method that is generally available, without the practical and ethical problems associated with hidden fields and cookies.
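
For the login-box method in item 3, the sketch below assumes that "Web Basic Authentication" is the standard HTTP Basic scheme, in which the browser joins the User Name and Password with a colon, encodes the result in Base64, and sends it in an Authorization header on every request. The function names are hypothetical, and this is not O'Reilly WebSite's own interface.

```python
import base64


def basic_auth_header(user, password):
    """Value of the Authorization header the browser would send."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header_value):
    """Recover (user, password) on the server side; None if not valid."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers bad Base64 and bad UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


# Made-up credentials for illustration.
sent = basic_auth_header("student1", "secret")
print(sent)

# The CGI program can use the user name to connect hits from the same user.
print(parse_basic_auth(sent))
```

Because Base64 can be decoded by anyone who sees the request, the scrambling hides the password from a casual glance only; it is not encryption.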