Troubleshooting Memory

Written by Kyle Duke; published July 2006.




Memory problems can be difficult to troubleshoot. For one thing, computer memory is still mysterious to people because it is a kind of "virtual" thing that can be hard to grasp. The other difficulty is that memory problems can be intermittent and often look like problems with other areas of the system, even software. This section shows simple troubleshooting steps you can perform if you suspect you are having a memory problem.

To troubleshoot memory, you first need some memory-diagnostics testing programs. You already have several, and might not know it. Every motherboard BIOS has a memory diagnostic in the POST that runs when you first turn on the system. In most cases, you also receive a memory diagnostic on a utility disk that came with your system. Many commercial diagnostics programs are on the market, and almost all of them include memory tests.

When the POST runs, it not only tests memory, but also counts it. The count is compared to the amount counted the last time BIOS Setup was run; if it is different, an error message is issued. As the POST runs, it writes a pattern of data to all the memory locations in the system and reads that pattern back to verify that the memory works. If any failure is detected, you see or hear a message. Audio messages (beeping) are used for critical or "fatal" errors that occur in areas important for the system's operation. If the system can access enough memory to at least allow video to function, you see error messages instead of hearing beep codes.
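
The write-and-read-back idea described above can be sketched in a few lines. This is only an illustration running against a simulated memory, not what any particular BIOS does; the names `pattern_test` and `StuckBitMemory`, and the choice of patterns, are assumptions made for the example.

```python
def pattern_test(memory, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern to every cell, read it back, and collect mismatches.

    `memory` is any mutable, indexable sequence of byte values standing in
    for RAM. Returns a list of (address, expected, actual) tuples; an empty
    list means every cell passed.
    """
    failures = []
    for pattern in patterns:
        for addr in range(len(memory)):
            memory[addr] = pattern
        for addr in range(len(memory)):
            actual = memory[addr]
            if actual != pattern:
                failures.append((addr, pattern, actual))
    return failures


class StuckBitMemory:
    """Hypothetical faulty RAM: bit 0 of one address is stuck at 1."""

    def __init__(self, size, bad_addr):
        self.cells = [0] * size
        self.bad_addr = bad_addr

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, addr):
        return self.cells[addr]

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value |= 0x01  # the stuck bit forces the lowest bit on
        self.cells[addr] = value


good = pattern_test(bytearray(64))
bad = pattern_test(StuckBitMemory(64, bad_addr=10))
print(good)
print(bad)
```

Note that the stuck bit is caught only by patterns whose lowest bit is 0, which is why a test usually writes several complementary patterns rather than one.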

If your system makes it through the POST with no memory error indications, there might not be a hardware memory problem, or the POST might not be able to detect the problem. Intermittent memory errors are often not detected during the POST, and other subtle hardware defects can be hard for the POST to catch. The POST is designed to run quickly, so the testing is not nearly as thorough as it could be. That is why you often have to boot from a DOS or diagnostic disk and run a true hardware diagnostic to do more extensive memory testing. These types of tests can be run continuously and be left running for days if necessary to hunt down an elusive intermittent defect.

Still, even these programs do only pass/fail type testing; that is, all they can do is write patterns to memory and read them back. They can't determine how close the memory is to failing, only whether it worked. For the highest level of testing, the best thing to have is a dedicated memory test machine, usually called a SIMM/DIMM/RIMM module tester. These devices enable you to insert a module and test it thoroughly at a variety of speeds, voltages, and timings to let you know for certain whether the memory is good or bad. Versions of these testers are available to handle all types of memory from older SIMMs to the latest DDR DIMMs or RIMMs. I have defective modules, for example, that work in some systems (slower ones) but not others. What I mean is that the same memory test program fails the module in one machine but passes it in another. In the module tester, it is always identified as bad right down to the individual bit, and it even tells me the actual speed of the device, not just its rating. Companies that offer memory module testers include Tanisys (www.tanisys.com), CST (www.simmtester.com), and Innoventions (www.memorytest.com). They can be expensive, but for a professional in the PC repair business, using one of these SIMM/DIMM testers is the only way to go.

After your operating system is running, memory errors can still occur, typically identified by error messages you might receive. These are the most common:

  • Parity errors. Indicates that the parity-checking circuitry on the motherboard has detected a change in memory since the data was originally stored.

  • General or global protection faults. A general-purpose error indicating that a program has been corrupted in memory, usually resulting in immediate termination of the application. This can also be caused by buggy or faulty programs.

  • Fatal exception errors. Error codes returned by a program when an illegal instruction has been encountered, invalid data or code has been accessed, or the privilege level of an operation is invalid.

  • Divide error. A general-purpose error indicating that a division by 0 was attempted or the result of an operation does not fit in the destination register.
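
To make the first item in the list concrete, here is a minimal sketch of how a parity bit reveals that a stored byte has changed. It assumes one parity bit per byte and uses even parity for simplicity; the scheme a given motherboard uses may differ, and the function names are hypothetical.

```python
def parity_bit(byte):
    """Return the even-parity bit for an 8-bit value.

    The bit is 1 when the data holds an odd number of 1 bits, so that the
    data plus the parity bit always contain an even number of 1 bits.
    """
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Simulate a write: keep the data together with its parity bit."""
    return (byte & 0xFF, parity_bit(byte))


def check(stored):
    """Simulate a read: recompute parity and compare with the stored bit."""
    data, parity = stored
    return parity_bit(data) == parity


stored = store(0x5A)
one_bit_flipped = (stored[0] ^ 0x04, stored[1])
two_bits_flipped = (stored[0] ^ 0x06, stored[1])

print(check(stored))            # unchanged data passes
print(check(one_bit_flipped))   # a single flipped bit is detected
print(check(two_bits_flipped))  # two flipped bits cancel out and go unnoticed
```

The last case shows the limit of simple parity: it detects a change but cannot say which bit changed, and an even number of flipped bits slips through.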

If you are encountering these errors, they could be caused by defective or improperly configured memory, but they can also be caused by software bugs (especially drivers), bad power supplies, static discharges, close proximity radio transmitters, timing problems, and more.

If you suspect the problems are caused by memory, there are ways to test the memory to determine whether that is the problem. Most of this testing involves running one or more memory test programs.

I am amazed that most people make a critical mistake when they run memory test software. The biggest problem I see is that people run memory tests with the system caches enabled. This effectively invalidates memory testing because most systems have what is called a write-back cache. This means that data written to main memory is first written to the cache. Because a memory test program first writes data and then immediately reads it back, the data is read back from the cache, not the main memory. It makes the memory test program run very quickly, but all you tested was the cache. The bottom line is that if you test memory with the cache enabled, you aren't really writing to the SIMM/DIMMs, but only to the cache. Before you run any memory test programs, be sure your cache is disabled. The system will run very slowly when you do this, and the memory test will take much longer to complete, but you will be testing your actual RAM, not the cache.
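
The effect is easy to see in a toy model. The sketch below is not real hardware: `FaultyRam` and `WriteBackCache` are hypothetical classes, and the cache is unbounded and never evicts anything, which is a simplification of how real caches behave.

```python
class FaultyRam:
    """Hypothetical main memory in which one address always reads back 0."""

    def __init__(self, size, bad_addr):
        self.cells = [0] * size
        self.bad_addr = bad_addr

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return 0 if addr == self.bad_addr else self.cells[addr]


class WriteBackCache:
    """Toy write-back cache: writes stay in the cache, reads hit it first."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value  # main memory is not touched yet

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr]  # served from the cache, RAM never read
        return self.ram.read(addr)


def write_then_read_test(mem, size, pattern=0xA5):
    """Write a pattern to each address and immediately read it back."""
    bad = []
    for addr in range(size):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            bad.append(addr)
    return bad


ram = FaultyRam(32, bad_addr=7)
print(write_then_read_test(WriteBackCache(ram), 32))  # cache enabled
print(write_then_read_test(ram, 32))                  # cache disabled
```

With the cache in the path, the test reports no bad addresses even though address 7 is defective; run directly against the RAM, the same test finds it.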

First, let's cover the memory testing and troubleshooting procedures.

1.
Power up the system and observe the POST. If the POST completes with no errors, basic memory functionality has been tested. If errors are encountered, go to the defect isolation procedures.

2.
Restart the system, and enter your BIOS (or CMOS) Setup. In most systems, this is done by pressing the Del or F2 key during the POST but before the boot process begins (see your system or motherboard documentation for details). Once in BIOS Setup, verify that the memory count is equal to the amount that has been installed. If the count does not match what has been installed, go to the defect isolation procedures.

3.
Find the BIOS Setup options for cache, and set all cache options to disabled. Save the settings and reboot to a DOS-formatted system disk (floppy) containing the diagnostics program of your choice. Note that some diagnostic programs use a self-booting disk or CD. If your system came with a diagnostics disk, you can use that, or you can use one of the many commercial PC diagnostics programs on the market, such as PC-Technician by Windsor Technologies, Norton SystemWorks by Symantec, Doc Memory from SIMMTester, or others.

4.
Follow the instructions that came with your diagnostic program to have it test the system base and extended memory. Most programs have a mode that enables them to loop the test (that is, to run it continuously), which is great for finding intermittent problems. If the program encounters a memory error, proceed to the defect isolation procedures.

5.
If no errors are encountered in the POST or in the more comprehensive memory diagnostic, your memory has tested okay in hardware. Be sure at this point to reboot the system, enter the BIOS Setup, and re-enable the cache. The system will run very slowly until the cache is turned back on.

6.
If you are having memory problems yet the memory still tests okay, you might have a problem undetectable by simple pass/fail testing, or your problems could be caused by software or one of many other defects or problems in your system. You might want to bring the memory to a SIMM/DIMM tester for a more accurate analysis. Most PC repair shops have such a tester. I would also check the software (especially drivers, which might need updating), power supply, and system environment for problems such as static, radio transmitters, and so forth.
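
Step 4 recommends looping the test to catch intermittent problems. The following sketch shows why a single pass can miss a fault that repeated passes find. The simulated module fails on a fixed schedule (every fifth read of one cell) purely so the example is repeatable; real intermittent faults are not this regular, and all names here are hypothetical.

```python
class IntermittentRam:
    """Hypothetical RAM whose bad cell returns a wrong value on every Nth read."""

    def __init__(self, size, bad_addr, every):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.every = every
        self.reads_of_bad_cell = 0

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.bad_addr:
            self.reads_of_bad_cell += 1
            if self.reads_of_bad_cell % self.every == 0:
                return value ^ 0x01  # one bit flips on this read only
        return value


def single_pass(ram, size, pattern=0x5A):
    """Write the pattern everywhere, then read it all back once."""
    for addr in range(size):
        ram.write(addr, pattern)
    return [addr for addr in range(size) if ram.read(addr) != pattern]


def looped_test(ram, size, max_passes):
    """Repeat the pass until a failure appears or max_passes is reached.

    Returns (pass_number, bad_addresses) on the first failure, else None.
    """
    for pass_number in range(1, max_passes + 1):
        bad = single_pass(ram, size)
        if bad:
            return pass_number, bad
    return None


print(single_pass(IntermittentRam(16, bad_addr=3, every=5), 16))
print(looped_test(IntermittentRam(16, bad_addr=3, every=5), 16, max_passes=10))
```

A single pass comes back clean, while the looped run reports the bad address once the fault happens to occur, which is the reason for leaving a diagnostic running for hours or days.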

Memory Defect Isolation Procedures

To use these steps, I am assuming you have identified an actual memory problem that is being reported by the POST or disk-based memory diagnostics.

1.
Restart the system and enter the BIOS Setup. Memory timing parameters are usually found under a menu called Advanced or Chipset Setup. Select BIOS or Setup defaults, which are usually the slowest settings. If the memory timings have been manually set, reset the memory configuration to By SPD.


2.
Save the settings, reboot, and retest using the testing and troubleshooting procedures listed earlier. If the problem has been solved, improper BIOS settings were the problem. If the problem remains, you likely do have defective memory, so continue to the next step.

3.
Open the system for physical access to the SIMM/DIMM/RIMMs on the motherboard. Identify the bank arrangement in the system. Using the manual or the legend silk-screened on the motherboard, identify which modules correspond to which banks. Remember that if you are testing a dual-channel system, you must be sure you remove both Channel A and Channel B modules in the same logical bank.

4.
Remove all the memory except the first bank, and retest using the troubleshooting and testing procedures listed earlier. If the problem remains with all but the first bank removed, the problem has been isolated to the first bank, which must be replaced.

5.
Replace the memory in the first bank, preferably with known good spare modules, but you can also swap in others that you have removed and retest. If the problem still remains after testing all the memory banks (and finding them all to be working properly), it is likely the motherboard itself is bad (probably one of the memory sockets). Replace the motherboard and retest.

6.
At this point, the first (or previous) bank has tested good, so the problem must be in the remaining modules that have been temporarily removed. Install the next bank of memory and retest. If the problem resurfaces now, the memory in that bank is defective. Continue testing each bank until you find the defective module.

7.
Repeat the preceding step until all remaining banks of memory are installed and have been tested. If the problem has not resurfaced after removing and reinstalling all the memory, the problem was likely intermittent or caused by poor conduction on the memory contacts. Often simply removing and replacing memory can resolve problems because of the self-cleaning action between the module and the socket during removal and reinstallation.
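
The bank-by-bank part of this procedure (steps 4, 6, and 7) amounts to a simple search, sketched below. The function `isolate_bad_bank` and the `test_passes` callback are hypothetical stand-ins for physically installing modules and rerunning the diagnostic; the sketch does not cover the motherboard or socket fault handled in step 5.

```python
def isolate_bad_bank(banks, test_passes):
    """Install banks one at a time and return the first one that brings failures.

    banks: bank labels in the order they are installed, first bank first.
    test_passes: callable that takes the list of currently installed banks
        and returns True if the memory test passes with that configuration.
    Returns the label of the suspect bank, or None if the problem never
    reappears (suggesting an intermittent fault or poor contact).
    """
    installed = []
    for bank in banks:
        installed.append(bank)
        if not test_passes(installed):
            return bank
    return None


banks = ["bank0", "bank1", "bank2", "bank3"]

# Simulated diagnostics: the test fails whenever the defective bank is installed.
print(isolate_bad_bank(banks, lambda installed: "bank2" not in installed))
print(isolate_bad_bank(banks, lambda installed: "bank0" not in installed))
print(isolate_bad_bank(banks, lambda installed: True))
```

In a dual-channel system, each entry in `banks` would stand for the matched Channel A and Channel B modules of one logical bank, since those are removed and installed together.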

