Last updated: 3/26/03

File Sizes for Computer Information

  1. Types of Computer Information. The table below summarizes the forms of computer information; we have not covered them all in class yet.
    Type of computer information | Subtype            | Examples
    -----------------------------+--------------------+---------------------------------------------
    Programs                     | Operating system   | Windows 95, Windows 98, Windows NT
                                 |                    | Mac OS 8
                                 |                    | Linux
                                 |                    | Unix
                                 | Application        | Word program
                                 |                    | Excel program
                                 |                    | Access program
                                 |                    | Netscape program
    Data                         | System information | User name and password
                                 |                    | Internet address
                                 |                    | Network connection
                                 |                    | Printer description
                                 |                    | Icons, screen colors
                                 |                    | Sound files - "The Windows Sound"
                                 | User information   | Word processing document - mostly text
                                 |                    | Spreadsheet document - text and numbers
                                 |                    | Database document - text and numbers
                                 |                    | Graphic file - can be animated
                                 |                    | Sound file
                                 |                    | Video - graphics and sound together
                                 |                    | Macro script - a small program written by a user, stored as a document, executed by the corresponding application
  2. How much information can be stored in n bits?
    1. Lowest number that can be stored is zero
    2. Highest number that can be stored is $2^n - 1$
    3. Number of different values (codes) is $2^n$
    4. $2^{10} = 1024 \approx 10^3 = 1000$
    5. A byte - 8 bits. Computers usually handle data in bytes. How much information is this, or how many different codes?
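    A minimal Python sketch of the three rules above, assuming unsigned values (no sign bit). The function name bit_range is made up for illustration; running it for n = 8 answers the question about a byte.

```python
def bit_range(n):
    """Return (lowest value, highest value, number of distinct codes) for n unsigned bits."""
    lowest = 0
    highest = 2 ** n - 1   # all n bits set to 1
    codes = 2 ** n         # every pattern from 0 to 2**n - 1
    return lowest, highest, codes


if __name__ == "__main__":
    for n in (1, 4, 8, 10, 16):
        low, high, count = bit_range(n)
        print(f"{n:2d} bits: values {low}..{high}, {count} codes")
    # The 2**10 ~ 10**3 shortcut: 1,024 is within 3% of 1,000.
    print(2 ** 10, "vs", 10 ** 3)
```

    For a byte (n = 8) the sketch reports values 0..255, or 256 different codes.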
  3. Sound files. Sound can be stored in and played by computers using a sound board (a piece of hardware plugged into the inside of the computer, possibly after it was purchased). How is this done?
    1. The sound waves go into a microphone, which converts them into an electronic signal, or voltage wave, in just the same way that sound is recorded in a studio or on a tape recorder. The microphone is plugged into the microphone input on the sound board.
    2. The sound board measures the voltage wave at very short, regular time intervals, and then converts the measurements into numbers.
    3. The numbers are stored in the computer, in its RAM or on its hard drive or other disk.
    4. When we want to play the sound back, the numbers are pulled back out of storage, converted to voltages by the sound board, and fed into a speaker or set of earphones.
    5. The graphic below shows a sound wave being measured at several points (the small circles).
      [Figure: audio.gif, a sound wave with its sample points marked by small circles]
    6. The size of a sound file is determined by:
      1. Length of the sound in seconds
      2. Number of samples (measurements) per second, usually between 8,000 per second (telephone quality) and 44,100 per second (CD quality)
      3. Number of bytes used to store each measurement - 8 bits or one byte per sample for low quality, 16 bits or two bytes per sample for high quality
      4. Stereo is two independent sounds and so takes twice the storage.
      5. Compression factor (CF) - how many times smaller compression makes the file (CF = 1 for no compression)
    7. The formula for the size of a sound file in bytes (it will be given on quizzes and exams if needed) is:
      File size in bytes =
           (length in seconds) × (samples / second) × (bytes / sample) × (2 if stereo, 1 if mono) / CF
      Example: A sound three seconds long, with 20,000 samples per second and 2 bytes per sample, in stereo, with a compression factor of 5, takes this many bytes:
      bytes = 3 × 20,000 × 2 × 2 / 5 = 48,000 bytes = 48 kB. A file this size could take a long time to download over the Internet. Originally, sound files had to download completely before they could play. Streaming audio plays as it downloads, and so seems much faster to users.
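    The measuring and numbering steps above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: a pure tone (a sine wave) stands in for the microphone voltage, each measurement is scaled to one byte (0 to 255), and the name sample_wave is hypothetical. A real sound board does this in hardware.

```python
import math

def sample_wave(frequency_hz, samples_per_second, duration_s):
    """Measure a sine-shaped voltage at regular time intervals and convert
    each measurement into a one-byte number from 0 to 255 (mono)."""
    count = round(samples_per_second * duration_s)
    samples = []
    for i in range(count):
        t = i / samples_per_second                           # time of measurement i
        voltage = math.sin(2 * math.pi * frequency_hz * t)   # between -1.0 and +1.0
        samples.append(round((voltage + 1) / 2 * 255))       # scale to 0..255
    return bytes(samples)


if __name__ == "__main__":
    data = sample_wave(440, 8_000, 0.5)   # half a second of a 440 Hz tone
    print(len(data))                      # one byte per sample
    print(list(data[:5]))                 # the first few measurements as numbers
```

    Half a second at 8,000 samples per second gives 4,000 one-byte numbers, which is exactly what the file-size formula predicts for mono sound with no compression.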
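    The file-size formula translates directly into a small Python function. The name sound_file_bytes is made up for illustration; the second call assumes the standard audio-CD rate of 44,100 samples per second.

```python
def sound_file_bytes(seconds, samples_per_second, bytes_per_sample, stereo,
                     compression_factor=1):
    """File size in bytes =
    seconds x samples/second x bytes/sample x (2 if stereo, 1 if mono) / CF."""
    channels = 2 if stereo else 1
    return seconds * samples_per_second * bytes_per_sample * channels / compression_factor


if __name__ == "__main__":
    # Worked example from the notes: 3 s, 20,000 samples/s, 2 bytes/sample, stereo, CF = 5
    print(sound_file_bytes(3, 20_000, 2, stereo=True, compression_factor=5))  # 48000.0
    # One minute of uncompressed CD-quality stereo
    print(sound_file_bytes(60, 44_100, 2, stereo=True))  # 10584000.0, about 10.6 MB
```

    The comparison shows why compression matters: an uncompressed minute of CD-quality stereo is over 200 times larger than the 3-second example.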
  4. File sizes for numbers
    1. ASCII codes will not work for calculations. The ASCII code for 1 is 49 and the ASCII code for 2 is 50. If we add the ASCII codes we get 49 + 50 = 99. This is not the ASCII code for 3, which is 51, but the ASCII code for c, which is not even a digit. We must use the binary code for numbers that will be used in arithmetic. (To display numbers on the screen or printer, the computer generally translates them into ASCII first. Similarly, numbers typed at the keyboard arrive as ASCII codes and are converted to binary numbers before storage if they are to be used in calculations. If you are typing a word processing document, the numbers are stored as characters, that is, as ASCII codes.)
    2. Computers can store numbers either as integers - whole numbers - or as "floating point" numbers - numbers with a decimal point.
      1. Integers use the straight binary number system as we have learned it in class. There are several storage arrangements. One or more bytes will be set aside for each integer, depending on how large a value we want to be able to store. For signed numbers (ones that can be + or -), generally the high-order bit is used for the sign, which reduces the size of the largest number that can be stored. The standard options are one, two or four bytes per integer. Two bytes is often called a "word", and four a "double word." It is important to pick the right scheme, depending on what range of numbers the program must accommodate. If too many bytes are allotted, storage is wasted, although with the low price of memory these days, that is not as important as it used to be. The real problem comes if too few bytes are allotted; then there are numbers that occur in the real world but cannot be stored in the computer.
        1. Example: What is the largest unsigned integer that can be stored in one byte?
        2. Example: What is the largest signed integer that can be stored in one byte?
        3. Example: What is the largest signed integer that can be stored in two bytes?
      2. Floating point numbers are stored in the form of a fraction times a power of ten, $x \times 10^{y}$, for example $0.5348 \times 10^{+47}$. (The hardware actually uses powers of two, but the idea is the same.) Standard schemes use four or eight bytes per number. In some schemes the power of ten can be as large as $10^{+300}$ (a 1 followed by 300 zeroes) or as small as $10^{-300}$.
      3. The file size, then, is the number of numbers times the bytes per number. If we have some numbers that are small and some that are large, we can use a mixed scheme. In this case, we find the size for each storage scheme and then add up the individual sizes to find the total.
      4. Example: How many bytes are required for 5,000 integers at two bytes per integer, and 2,000 floating point numbers at eight bytes per number?
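    The ASCII pitfall described above is easy to reproduce in Python, where ord gives a character's code and chr turns a code back into a character. This is an illustration of the idea, not of how any particular program stores numbers.

```python
# ASCII codes are character codes, not numeric values.
a, b = "1", "2"
code_sum = ord(a) + ord(b)              # 49 + 50
print(code_sum, repr(chr(code_sum)))    # 99 'c' -- not the character "3"

# Converting the ASCII text to binary numbers first gives the right answer...
total = int(a) + int(b)                 # 1 + 2
print(total)                            # 3
# ...and the result is converted back to ASCII for display.
print(str(total), ord(str(total)))      # 3 51
```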
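    A minimal Python sketch for checking the integer examples and the mixed-scheme file size, assuming (as in the notes) that a signed integer gives up its high-order bit for the sign. The function names are hypothetical; struct is Python's standard module for packing numbers into fixed-size bytes, used here only to show what happens when too few bytes are allotted.

```python
import struct

def unsigned_max(num_bytes):
    """Largest unsigned integer that fits in num_bytes bytes."""
    return 2 ** (8 * num_bytes) - 1

def signed_max(num_bytes):
    """Largest signed integer when the high-order bit is reserved for the sign."""
    return 2 ** (8 * num_bytes - 1) - 1

def mixed_file_size(groups):
    """Total bytes for a mixed scheme: sum of (count x bytes per number)."""
    return sum(count * bytes_per_number for count, bytes_per_number in groups)


if __name__ == "__main__":
    print(unsigned_max(1), signed_max(1), signed_max(2))   # 255 127 32767
    # 5,000 two-byte integers plus 2,000 eight-byte floating point numbers
    print(mixed_file_size([(5_000, 2), (2_000, 8)]))       # 26000
    # Too few bytes: 200 does not fit in a signed one-byte integer.
    try:
        struct.pack("<b", 200)
    except struct.error as err:
        print("cannot store 200 in one signed byte:", err)
```

    The mixed example works out to 5,000 × 2 + 2,000 × 8 = 10,000 + 16,000 = 26,000 bytes.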
  5. Computer programs or instructions.
    1. Almost all computers (and all current ones) use the same scheme for storing computer instructions (on Windows PCs, programs are stored in .exe or .com files). Data in the computer is stored in RAM, which is organized into bytes. The storage locations are numbered from 0 up to the number of bytes of storage minus 1; a location's number is called its address. Because there are so many storage locations, they must be cheap, and so they cannot do arithmetic. Instead, a special location called the Accumulator serves as temporary storage where arithmetic can be done. To add two numbers, for example, the computer copies or loads the first number into the accumulator, then adds the second number to the value already in the accumulator, and finally copies the value in the accumulator (now the sum of the two numbers) into the address set aside for the sum. Note that loading, adding or copying never changes what is in the source location, only what is in the destination.
    2. A computer instruction, then, has two parts: a number called the "operation code" or "op code", which represents the instruction (load from memory to accumulator, add from memory to accumulator, store from accumulator back to memory, multiply from memory to accumulator, copy from memory location to screen or disk, and so forth), and the address of the memory location to be used.
    3. The total set of instructions that a computer can carry out (each type and model is different) is called the "Instruction Set" of the computer (technically, of its Central Processing Unit or CPU). The more separate op codes a computer has in its Instruction Set, the more specific its instructions can be, and the fewer instructions it needs to carry out a given task. However, having many op codes can make the microprocessor very complicated, and therefore slower. (There used to be two divergent approaches: RISC, or Reduced Instruction Set Computer, and CISC, or Complex Instruction Set Computer. RISC computers were supposed to be faster because it was felt that, although they must carry out more of their simpler instructions, they could more than make up for it by executing each one much faster. However, this supposed advantage has not shown up in practice. CISC designs have also been helped by adopting many of the best ideas of the RISC advocates. The Intel Pentium processors are CISCs.)
    4. How much storage room does a computer instruction occupy?
      1. One factor is the number of different instructions that the CPU is capable of. Using the formula for the number of different values that can be stored in n bits, and knowing how many different op codes a given design needs, we can determine that part. For example, if a CPU is to have 64 op codes, we will need 6 bits, because $2^6 = 64$. That may seem like a large number of op codes, but many microprocessors have extra hardware such as Registers, which can do some arithmetic and so speed up operations on a series of numbers, and Interrupts, which handle critical events by taking the computer away from routine tasks. These features greatly increase the number of op codes needed, and some CPUs have hundreds or even thousands of op codes, or more. In the figure above, the operation code occupies l ("L", not "one") bits, so the computer can have $2^l$ different operations.
      2. The second factor is the size of the address portion of the instruction. This determines the maximum number of storage locations that the CPU can have connected. If the address is one byte, the CPU can use only 256 RAM locations. Set aside two bytes, and it can use 65,536 locations. It is not uncommon today to have five bytes (40 bits) for the address, with a capacity of about 1 terabyte ($2^{40}$ bytes, roughly 1,000 gigabytes). The program is also stored in RAM, so additional capacity is required for it: the total memory requirement is the maximum storage needed for data (text, graphics, sound and numbers) plus the maximum size of a program in instructions. This maximum is called the "address space" of the processor. (These days, programs are large enough that pieces of them are loaded into memory only when they are needed. In Windows, such a piece of a program is a file with the "dll" extension.) A good shortcut here is that 10 bits can have $2^{10} = 1024$ values, so $2^{10} \approx 10^3$ (= 1,000). So one million = 1,000 × 1,000 takes about 20 bits, and one billion = one million × one thousand takes about 30 bits. Finally, for example, 4 billion = 4 × one billion; multiplying by $4 = 2^2$ adds 2 bits, so 4 billion takes about 2 + 30 = 32 bits (indeed, $2^{32}$ is about 4.3 billion).
    5. So, a CPU with 65,000+ op codes and 1 terabyte of address space requires 2 bytes per instruction for the op code and 5 bytes per instruction for the address, or a total of 7 bytes of storage for each instruction. A program that is 1 million instructions long will require 7 × 1,000,000 = 7,000,000 bytes (7 MB). In general, (bytes per instruction) = (bytes for op code) + (bytes for address), and (program size in bytes) = (bytes per instruction) × (program length in instructions).
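    The load/add/store sequence described in this section can be simulated with a toy accumulator machine. This is a hypothetical sketch: the op code numbers (LOAD = 1, ADD = 2, STORE = 3) and the function name run are invented for illustration and do not belong to any real CPU. Each instruction is an (op code, address) pair, matching the two-part instruction format above.

```python
# Hypothetical op codes for a toy accumulator machine (not any real CPU's).
LOAD, ADD, STORE = 1, 2, 3

def run(program, memory):
    """Execute (op_code, address) instructions against a list of memory cells."""
    accumulator = 0
    for op_code, address in program:
        if op_code == LOAD:
            accumulator = memory[address]      # copy memory -> accumulator
        elif op_code == ADD:
            accumulator += memory[address]     # accumulator + memory -> accumulator
        elif op_code == STORE:
            memory[address] = accumulator      # copy accumulator -> memory
        else:
            raise ValueError(f"unknown op code {op_code}")
    return memory


memory = [0] * 8                  # eight storage locations, addresses 0..7
memory[0], memory[1] = 20, 22
program = [(LOAD, 0), (ADD, 1), (STORE, 2)]
run(program, memory)
print(memory[:3])                 # [20, 22, 42]
```

    Addresses 0 and 1 still hold 20 and 22 afterwards: only the destination of each copy changes.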
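    The instruction-size arithmetic can be checked with a short Python sketch. It assumes each part of an instruction is rounded up to whole bytes, as in the example above; the function names are hypothetical. bits_needed finds the smallest n with $2^n$ at least the number of distinct values, using Python's built-in int.bit_length.

```python
def bits_needed(distinct_values):
    """Smallest n such that 2**n >= distinct_values (at least 1 bit)."""
    return max(1, (distinct_values - 1).bit_length())

def bytes_needed(distinct_values):
    """Round the bit count up to whole bytes."""
    return (bits_needed(distinct_values) + 7) // 8

def instruction_bytes(num_op_codes, address_space):
    """(bytes per instruction) = (bytes for op code) + (bytes for address)."""
    return bytes_needed(num_op_codes) + bytes_needed(address_space)

def program_bytes(num_op_codes, address_space, num_instructions):
    """(program size in bytes) = (bytes per instruction) x (number of instructions)."""
    return instruction_bytes(num_op_codes, address_space) * num_instructions


if __name__ == "__main__":
    print(bits_needed(64))                              # 6
    print(bits_needed(4_000_000_000))                   # 32
    # 65,536 op codes and a 40-bit (5-byte) address space of 2**40 locations
    print(instruction_bytes(65_536, 2 ** 40))           # 7
    print(program_bytes(65_536, 2 ** 40, 1_000_000))    # 7000000
```

    The same helper shows how quickly the address grows: 10 trillion locations would need 44 bits, which rounds up to 6 bytes.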