Troubleshooting Methodology | CompTIA IT Fundamentals FC0-U61 | 1.6

By Tech Gee on March 27, 2020

In this video you will learn how to identify the problem; research the problem using a knowledge base or the internet, if applicable; establish a theory of probable cause; test the theory to determine the cause; establish a plan of action to resolve the problem & identify potential effects; implement the solution or escalate as necessary; verify full system functionality and, if applicable, implement preventive measures; and finally, document your findings/lessons learned, actions, and outcomes.

Troubleshooting Methodology

Identify the Problem

The first thing to understand when troubleshooting is that the symptoms are not the problem. The actual problem is whatever is causing the symptoms to appear. To identify the problem, use the following procedures as needed:

Gather information
Duplicate the problem, if possible
Question users
Identify symptoms
Determine if anything has changed
Approach multiple problems individually

Before making any changes, make sure you safeguard current settings:

If possible, back up the entire system
On a Windows computer, back up the system Registry using Regedit
Print out or record current firmware/UEFI/BIOS, network, and software settings
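Recording settings can be as simple as saving them to a timestamped file before you change anything. The Python sketch below assumes the settings have already been collected into a dictionary, by hand or with some other tool; the setting names, the `troubleshooting-backups` folder, and both helper functions are hypothetical. It does not replace a full system backup or a Registry export.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def snapshot_settings(settings: dict, folder: Path) -> Path:
    """Save a copy of the current settings to a timestamped JSON file."""
    folder.mkdir(parents=True, exist_ok=True)
    # Microsecond resolution keeps two quick snapshots from overwriting each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = folder / f"settings-{stamp}.json"
    path.write_text(json.dumps(settings, indent=2, sort_keys=True))
    return path


def load_snapshot(path: Path) -> dict:
    """Read a saved snapshot back so the original settings can be restored."""
    return json.loads(path.read_text())


if __name__ == "__main__":
    # Hypothetical settings recorded before troubleshooting begins.
    current = {
        "boot_order": ["SSD", "USB", "Network"],
        "ip_mode": "DHCP",
        "default_printer": "Office-Laser",
    }
    saved = snapshot_settings(current, Path("troubleshooting-backups"))
    print(f"Saved settings to {saved}")
    assert load_snapshot(saved) == current
```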

Gather Information

Start the process by gathering information. Some of the information you need may be obtained from log files created by the operating system. You can also compare the current settings for the device to its default settings.

Other information sources include:

Network logs
Printer self-test page, which shows the amount of RAM and number of pages printed (laser printers) and the firmware version (laser or inkjet printers)
Current BIOS/firmware settings
Hardware information
Windows update history
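When a log is large, filtering it down to warnings and errors makes the useful entries easier to spot. The sketch below assumes a simple, hypothetical text format (`<date> <time> <LEVEL> <source>: <message>`); real logs such as Windows event logs or a Linux syslog use their own formats, so the parsing would need to be adapted.

```python
from collections import Counter

# Hypothetical log excerpt in the assumed "<date> <time> <LEVEL> <source>: <message>" format.
SAMPLE_LOG = """\
2020-03-27 09:10:44 INFO Spooler: Service started
2020-03-27 09:14:02 ERROR Spooler: Print job 17 failed
2020-03-27 09:14:05 WARNING USB: Device descriptor request failed
2020-03-27 09:20:31 ERROR Spooler: Print job 18 failed
"""


def find_problems(log_text: str, levels: tuple[str, ...] = ("ERROR", "WARNING")) -> list[str]:
    """Return only the log lines whose severity is one of `levels`."""
    problems = []
    for line in log_text.splitlines():
        parts = line.split(maxsplit=3)  # date, time, level, rest of the line
        if len(parts) == 4 and parts[2] in levels:
            problems.append(line)
    return problems


def count_by_source(lines: list[str]) -> Counter:
    """Count problem lines per source (the text before the first colon)."""
    sources = Counter()
    for line in lines:
        rest = line.split(maxsplit=3)[3]
        sources[rest.split(":", 1)[0]] += 1
    return sources


if __name__ == "__main__":
    problems = find_problems(SAMPLE_LOG)
    for line in problems:
        print(line)
    print(count_by_source(problems))
```

Counting by source is a quick way to see whether the errors cluster around one subsystem, which is useful later when forming a theory of probable cause.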

Duplicate the Problem, If Possible

If possible, duplicate the problem. Try the same tasks with the same files and output devices that were originally involved. Record any error messages or dialogs that are displayed, using a screen capture utility to capture them.

Question Users

Ask the user(s) who reported the issue to describe the problems they are experiencing. Keep in mind that some users may be nervous because they believe they did something they were not supposed to do. Reassure them, in a compassionate and understanding manner, that you are simply there to help them solve the issue they are having with the network and/or their device.

Some ways to establish rapport and build a good relationship are:

Use proper language and avoid using jargon, acronyms, and slang when applicable
Maintain a positive attitude and project confidence
Actively listen (take notes) and avoid interrupting the user/customer
Avoid judging the customer
Avoid getting into arguments with the customer
Ask permission before touching, viewing, or moving printouts or other personal/business items
Use your smartphone only for business-related matters when in the presence of the customer

Identify Symptoms

As you talk with the user, you should also be trying to identify symptoms. Some possible symptoms include:

Loud noise from inside the computer
Unpleasant odor which might be accompanied by smoke or fumes
Unexpected error messages
Flickering lights or other signs of a power fluctuation
Slower than normal system response
Inability to connect to network resources
Blank screen
Software not performing as expected

Some questions you could ask the user:

“Did you hear anything different than usual?”
“Do you usually print to a network printer?”
“Do you remember what apps were open when the problem happened?”
“Have you used the device/app before, or was it just installed?”

Keep in mind that these are only starting points to help you begin identifying the problem.

Determine If Anything Has Changed

Determine whether anything has changed (device settings, hardware upgrades, operating system or app updates, cables, etc.). The change might be the cause of the failure you are troubleshooting. Some ways to determine whether anything has changed are as follows:

Review operating system and application update logs
Compare the default settings for the device to the current settings
Review device driver versions & when they were last updated
Review antivirus and anti-malware scan histories
Review network update logs
Check whether any device or peripheral cable has been disconnected or connected to a port other than its assigned port
Check IT department records of system hardware upgrades
Check BIOS/UEFI firmware revisions and updates
Check to see what apps/programs were installed recently
Determine if new hardware was installed recently
Check to see what apps/programs were updated recently
Determine if there were any recent changes to the database (new tables, new data, new queries, old data removed, etc.)
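Comparing the default settings for a device to its current settings comes down to listing every setting whose value differs. This minimal sketch assumes both sets of settings are available as Python dictionaries; the network adapter setting names are made up for illustration.

```python
def diff_settings(defaults: dict, current: dict) -> dict:
    """Return {setting: (default_value, current_value)} for every difference.

    A setting missing from one side is reported with None for that side, so a
    setting whose real value is None cannot be told apart from a missing one.
    """
    changes = {}
    for key in sorted(defaults.keys() | current.keys()):
        before = defaults.get(key)
        after = current.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


if __name__ == "__main__":
    # Hypothetical default and current network adapter settings.
    defaults = {"speed_duplex": "Auto", "wake_on_lan": True, "jumbo_frames": False}
    current = {"speed_duplex": "100 Mbps Full", "wake_on_lan": True,
               "jumbo_frames": False, "vlan_id": 20}
    for name, (before, after) in diff_settings(defaults, current).items():
        print(f"{name}: default={before!r}, current={after!r}")
```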

Approach Multiple Problems Individually

Multiple problems could share a common cause, such as a network issue. Unless you know for sure that they do, it's easier to solve one problem at a time before moving on to the next.

Research Knowledge Base/Internet, If Applicable

Always remember, the knowledge base and/or the internet are your friends when it comes to troubleshooting. If your organization has its own knowledge base, start your research there. If not, turn to the internet. When searching for possible causes, keep these items in mind:

Whether you suspect a hardware or software problem, try searching the internet for symptoms before going to a particular vendor’s website to search
Use keywords that describe the symptoms or problems
Be sure to include brand, model and operating system information in your search
Pay attention to the dates of proposed solutions if multiple suggestions are given, as newer suggestions may work better than older suggestions
Note the operating system or app version listed because sometimes the same solution will work with older or newer versions than the current one
For apps that have cloud or local/network versions, make sure you specify the correct version
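The search tips above can be combined into a single query string: brand, model, and operating system first, then any app version, then symptom keywords. This is a hypothetical helper, not part of any search engine's API; wrapping multi-word terms in double quotes asks most major search engines to match the exact phrase.

```python
from typing import Optional


def build_search_query(symptoms: list[str], brand: str, model: str,
                       os_name: str, version: Optional[str] = None) -> str:
    """Join device details and symptom keywords into one search query."""
    terms = [brand, model, os_name]
    if version:
        terms.append(version)  # e.g. the app version, or cloud vs. desktop edition
    terms.extend(symptoms)
    # Quote multi-word terms so they are searched as exact phrases.
    return " ".join(f'"{t}"' if " " in t else t for t in terms)


if __name__ == "__main__":
    # Example values only.
    print(build_search_query(["print job stuck", "offline"],
                             "HP", "LaserJet Pro", "Windows 10"))
```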

Establish a Theory of Probable Cause

Once you have researched possible solutions & identified potential issues that may be causing the problem, it is time to establish a theory of probable cause.

Question the Obvious

Sometimes the solution to a problem is something very simple that goes unnoticed. For example, a user calls the helpdesk to report that their screen suddenly went black. You arrive to investigate and notice that the monitor's power light is not on. You look behind the monitor and see that the power cord is plugged into it. You follow the cord to the wall outlet and discover that it is not plugged into the outlet. When you tell the user, they recall accidentally kicking something under their desk, which most likely pulled the power cord out of the outlet. You plug the cord back into the wall and secure it to the desk with zip ties so the user cannot accidentally kick it loose in the future. Problem solved.

Consider Multiple Approaches

A rule you should tell yourself (and only yourself) is the K.I.S.S. rule: Keep It Simple, Stupid. There are often multiple approaches to solving a problem, but it is best to go with the simplest, easiest-to-implement approach. For example, if a user's keyboard has sticky keys, simply swap it for another keyboard so the user can continue working, instead of taking apart the keys to remove the stickiness.

Divide & Conquer

Some problems involve a component and its various subsystems. Take a printer, for example. From the time a user clicks the print button until the document is actually printed, several linked subsystems could cause a print failure. The printing subsystem includes the printer, the USB cable between the printer and the computer, the USB port, the printer driver in the operating system, and the application. Any of them could cause a problem. First, check whether the printer is turned on and, if so, whether it has ink or toner. If not, that may be the problem. If it does, check that the USB cable is plugged into the correct port on both the printer and the computer. If it is, move on to the next subsystem, and continue until you isolate the problem. This is called the “divide & conquer” technique, and it allows you to find and fix problems systematically.
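The divide & conquer walk through the printing subsystems can be sketched as an ordered list of checks that stops at the first one that fails. The check names follow the order given in the paragraph above, but the `observed` results are hypothetical stand-ins for the hands-on checks a technician would actually perform.

```python
from typing import Callable, Optional

# Each check pairs a subsystem name with a function that returns True if it is OK.
Check = tuple[str, Callable[[], bool]]


def isolate_fault(checks: list[Check]) -> Optional[str]:
    """Run each subsystem check in order and return the first one that fails.

    Returns None if every check passes, meaning the fault lies outside the
    subsystems tested or the checks were not specific enough.
    """
    for name, check in checks:
        if not check():
            return name
    return None


if __name__ == "__main__":
    # Hypothetical results standing in for real inspections.
    observed = {
        "printer powered on": True,
        "ink or toner present": True,
        "USB cable seated at both ends": True,
        "USB port working": False,
        "printer driver installed": True,
        "application prints a test page": True,
    }
    # The default argument binds each name now, not when the lambda runs.
    print_path = [(name, lambda n=name: observed[n]) for name in observed]
    print("First failing subsystem:", isolate_fault(print_path))
```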

Test the Theory to Determine the Cause

Once you believe you have established a probable cause, it is time to test the theory to determine the cause. To test a theory, change what you think is causing the problem. Some examples are as follows:

Update device drivers
Uninstall the device and let the operating system reinstall the device
Swap a cable
Move a USB device to a different port
Install updates for the operating system
Install updates for apps

Make only one change at a time, and retest the system after each change to see if the problem is solved.

Side Note: If you don’t record the current configuration of the system’s hardware and software before you make a change to test your theory, you will not be able to reset the system to its previous condition if your first change doesn’t solve the problem.
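The one-change-at-a-time rule and the side note about recording the original configuration fit together in a simple loop: apply a change, retest, and undo the change if it did not help. In this sketch each theory is a hypothetical `(description, apply_change, undo_change)` tuple, and the simulated `state` dictionary stands in for a real system.

```python
from typing import Callable, Optional

# (description, apply_change, undo_change)
Theory = tuple[str, Callable[[], None], Callable[[], None]]


def try_theories(theories: list[Theory], problem_solved: Callable[[], bool]) -> Optional[str]:
    """Apply one change at a time and keep the first change that fixes the problem.

    A change that does not solve the problem is undone before the next theory
    is tried, so only one variable changes per test. Returns None when every
    theory fails, which is the point to establish a new theory or escalate.
    """
    for description, apply_change, undo_change in theories:
        apply_change()
        if problem_solved():
            return description
        undo_change()  # restore the recorded configuration
    return None


if __name__ == "__main__":
    # Simulated system: the problem goes away only when the device is on the rear port.
    state = {"driver": "1.0", "usb_port": "front"}

    def setter(key: str, value: str) -> Callable[[], None]:
        """Build a change function that sets one value in the simulated state."""
        return lambda: state.update({key: value})

    theories = [
        ("Update device driver", setter("driver", "2.0"), setter("driver", "1.0")),
        ("Move USB device to a different port",
         setter("usb_port", "rear"), setter("usb_port", "front")),
    ]
    result = try_theories(theories, lambda: state["usb_port"] == "rear")
    print(result or "No theory confirmed: establish a new theory or escalate")
```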

Once the Theory is Confirmed (Confirmed Root Cause), Determine the Next Steps to Resolve the Problem

If your theory is confirmed, it’s now time to resolve the problem. Here are some examples from the previous section:

Update device drivers: Continue to use the updated device drivers.
Uninstall the device & let the operating system reinstall the device: If the device now operates properly, continue to use it.
Swap a cable: If the replacement cable is not needed elsewhere, continue to use it.
Move a USB device to a different port: If the USB device now works, you must decide if the USB port’s host device (card, motherboard, or hub) needs to be replaced.
Install updates for the operating system: If installing operating system updates solves the problem, plan to implement the solution on other affected systems.
Install updates for apps: If installing app updates solves the problem, plan to implement the solution on other affected systems.

If the Theory is Not Confirmed, Establish a New Theory or Escalate

If your theory is not confirmed, develop a new theory and test it. If you are still confident that you have identified the correct problematic subsystem, move on to testing the next part of that subsystem.

For example, if you remove a USB cable from a USB port, plug that same cable into a different but similar system, and the system works fine, then the issue might not be the USB cable but the USB port on the original system. Possible causes include damaged contact pins or a buildup of dirt inside the port. If you find no dirt or damaged contact pins inside the USB port, your next step may be to escalate the problem to the next support tier.

Establish a Plan of Action to Resolve the Problem & Identify Potential Effects

Once you have identified the problem and discovered a solution, it is time to establish a plan of action to resolve the problem and identify potential effects. An example plan of action for dealing with a malware outbreak is as follows:

Identify & research malware symptoms
Quarantine the infected systems
Disable system restore (in Windows)
Remediate the infected systems
- Update the anti-malware software
- Scan and use removal techniques (safe mode, pre-installation environment)
Schedule scans and run updates
Enable system restore and create a restore point (in Windows)
Educate the user

Implement the Solution or Escalate as Necessary

If you are responsible for implementing the plan of action, follow it carefully. Be sure to note any problems with the plan or any additional problems you observe. If you are not responsible for implementing the plan of action, escalate it to the department that is responsible.

Verify Full System Functionality &, If Applicable, Implement Preventive Measures

Once you have implemented the solution, the next step is to verify that the system, peripheral, or device actually does what it is supposed to do. An example of a full system functionality test is as follows:

Connect to a wired network and open a folder, then a file.
Connect to a wireless network and open a folder, then a file.
Connect to the internet and view a web page that changes frequently, such as a news aggregator.
Print to a local printer.
Print to a network printer.
Open a file from local or network storage, edit it, & save it under a different name.
Scan a document or photograph.
Copy a file to a USB drive.
Burn an optical disc.
Extend the desktop or mirror dual displays (depending on task requirements).
Scroll through a document with a mouse or pointing device.
Pinch zoom a web page with a touchscreen.
Run a backup.
Run the OS or app update process.
Run an app and use it normally (open, edit, save data, close app).
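Recording the result of each functionality test makes it obvious which items still need attention. The sketch below assumes each test can be expressed as a function returning `True` or `False`; the test names come from the list above, but the lambdas are placeholders for checks that would normally be done by hand or with other tools.

```python
from typing import Callable


def run_checklist(tests: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every functionality test and record pass/fail for each one."""
    results = {}
    for name, test in tests.items():
        try:
            results[name] = bool(test())
        except Exception:  # a test that crashes counts as a failure
            results[name] = False
    return results


def failed(results: dict[str, bool]) -> list[str]:
    """List the names of the tests that did not pass."""
    return [name for name, passed in results.items() if not passed]


if __name__ == "__main__":
    # Placeholder checks; real ones would exercise the actual hardware and software.
    tests = {
        "Connect to a wired network and open a folder, then a file": lambda: True,
        "Print to a network printer": lambda: True,
        "Copy a file to a USB drive": lambda: False,
    }
    results = run_checklist(tests)
    for name, passed in results.items():
        print(f"[{'PASS' if passed else 'FAIL'}] {name}")
    remaining = failed(results)
    print("System verified" if not remaining else f"Still failing: {remaining}")
```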

Document Findings/Lessons Learned, Actions, & Outcomes

When it comes to IT problems, you are likely to encounter similar problems over and over again. Instead of approaching each problem as if it were the first time you've seen it, document your findings, lessons learned, actions, and outcomes for each problem you solve, so that you build a repository of solutions for the problems you are bound to encounter again. Be sure to add any figures (screen captures, diagrams, photos, etc.) that will help you or others solve similar problems next time. Detailed documentation is your friend when it comes to solving problems in IT.
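A knowledge base is easier to search when every entry has the same fields. This sketch uses a Python dataclass whose fields mirror the items named above (findings, actions, outcomes, lessons learned, attached figures); the field names and the keyword search are illustrative assumptions, not the format of any particular ticketing system. The sample record reuses the unplugged monitor example from earlier.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TroubleshootingRecord:
    """One knowledge base entry (field names are illustrative)."""
    problem: str
    symptoms: list[str]
    root_cause: str
    actions: list[str]
    outcome: str
    lessons_learned: list[str] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)  # screen captures, photos, diagrams


def search_records(records: list[TroubleshootingRecord], keyword: str) -> list[TroubleshootingRecord]:
    """Return records whose problem, symptoms, or root cause mention the keyword."""
    keyword = keyword.lower()
    matches = []
    for record in records:
        text = " ".join([record.problem, record.root_cause, *record.symptoms]).lower()
        if keyword in text:
            matches.append(record)
    return matches


if __name__ == "__main__":
    kb = [
        TroubleshootingRecord(
            problem="Monitor screen went black",
            symptoms=["blank screen", "monitor power light off"],
            root_cause="Power cord kicked out of the wall outlet",
            actions=["Plugged cord back in", "Secured cord to desk with zip ties"],
            outcome="Resolved",
            lessons_learned=["Check power lights and cords first"],
        )
    ]
    for match in search_records(kb, "blank screen"):
        print(json.dumps(asdict(match), indent=2))
```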