Mean Time Between Failures

How long a machine runs before it lets you down. Higher is better.

Definition

What is Mean Time Between Failures?

Mean time between failures, or MTBF, is the average run time a piece of equipment delivers between unplanned failures. It is calculated by dividing total run time by the number of failures over the same period. MTBF measures equipment reliability and feeds the availability factor of overall equipment effectiveness. A higher MTBF means the machine is failing less often, regardless of how long each repair takes.

Mean time between failures is one of the two reliability metrics every TPM program tracks, and the one that most directly answers the question "how dependable is this machine." Reliability is not the same as quality and not the same as productivity; it is specifically about how long the machine will keep running before something unexpected stops it. A shop that knows MTBF by machine can plan staffing, set realistic delivery promises, and target the right equipment for upgrades.

"Reliability is the difference between a quote you can keep and a quote that depends on luck."

How mean time between failures works

The calculation is simple arithmetic but the inputs require discipline. Run time is the time the machine was actually in production. It excludes scheduled meetings, lunches, planned maintenance, and changeovers. Failures are unplanned stops that took the machine out of production until someone repaired it. A jam the operator cleared in 90 seconds may or may not count depending on how the shop defines failure; many programs set a threshold of 10 or 15 minutes to filter out the noise. Whichever rule is chosen, the rule has to be consistent across machines and time periods or the comparisons mean nothing.
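The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Stop` record, the `mtbf_hours` function, the 10 minute default threshold, and the sample stops are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    """One unplanned stop from the machine log (hypothetical record shape)."""
    minutes_down: float
    cause: str


def mtbf_hours(run_hours: float, stops: list[Stop],
               min_failure_minutes: float = 10.0) -> float:
    """Run time divided by the number of stops long enough to count as failures."""
    failures = [s for s in stops if s.minutes_down >= min_failure_minutes]
    if not failures:
        raise ValueError("no qualifying failures in this window; MTBF is undefined")
    return run_hours / len(failures)


# Illustrative data only: five unplanned stops during 800 run hours.
stops = [
    Stop(1.5, "jam cleared by operator"),  # below the threshold, filtered out
    Stop(45, "hydraulic leak"),
    Stop(120, "spindle drive fault"),
    Stop(30, "coolant pump"),
    Stop(60, "axis lubrication"),
]

print(mtbf_hours(800, stops))  # 800 run hours / 4 qualifying failures = 200.0
```

Setting `min_failure_minutes=0` counts the 90 second jam as well and drops the result to 160 hours, which is why the threshold has to stay the same across machines and periods.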

MTBF combines with mean time to repair (MTTR) to produce availability. The relationship: availability equals MTBF divided by the sum of MTBF and MTTR. A machine with a 200-hour MTBF and a four-hour MTTR has availability of 200 over 204, or 98 percent. Drop MTBF to 50 hours with the same MTTR, and availability falls to 50 over 54, or about 93 percent. Drop MTTR to one hour at the original MTBF and availability rises to 200 over 201, or 99.5 percent. Improving either lever helps; the diagnostic question is which lever moves availability more for your shop.
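The availability relationship is easy to check with a short script. The sketch below uses the three scenarios from the paragraph above; the function name `availability` and the scenario labels are chosen for the example.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# (MTBF hours, MTTR hours) for each scenario described in the text.
scenarios = {
    "baseline (MTBF 200 h, MTTR 4 h)": (200, 4),
    "MTBF drops to 50 h": (50, 4),
    "MTTR drops to 1 h": (200, 1),
}

for label, (mtbf, mttr) in scenarios.items():
    print(f"{label}: {availability(mtbf, mttr):.1%}")
# baseline (MTBF 200 h, MTTR 4 h): 98.0%
# MTBF drops to 50 h: 92.6%
# MTTR drops to 1 h: 99.5%
```

Swapping in a shop's own MTBF and MTTR shows quickly which lever is worth more for a given machine.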

The lever to pull depends on the failure pattern. If failures are concentrated on a few wear parts, preventive maintenance on those parts will raise MTBF. If failures are random and varied, predictive maintenance or autonomous maintenance inspections by the operator are usually a better bet.
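One way to see whether failures are concentrated or scattered is a simple tally of causes from the stop log. The sketch below is an assumption about how a shop might do this; the cause names, the sample data, and the `cause_shares` helper are hypothetical.

```python
from collections import Counter

# Hypothetical failure causes pulled from one machine's stop log.
causes = [
    "drive belt", "drive belt", "limit switch", "drive belt",
    "hydraulic seal", "drive belt", "limit switch", "sensor cable",
]


def cause_shares(causes):
    """Return (cause, count, cumulative share) rows, most frequent cause first."""
    counts = Counter(causes)
    total = len(causes)
    running = 0
    rows = []
    for cause, n in counts.most_common():
        running += n
        rows.append((cause, n, running / total))
    return rows


for cause, n, cumulative in cause_shares(causes):
    print(f"{cause:15s} {n:2d}  cumulative {cumulative:.0%}")
```

In this made-up log, two causes account for 75 percent of the failures, which is the concentrated pattern where preventive maintenance on specific parts tends to pay off. A flat tally with no dominant cause would point toward inspection-based approaches instead.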

Where mean time between failures fits on the shop floor

Picture a 25-person fab shop with two CNC mills, three press brakes, and a laser cutter. The owner has been told the laser is the most reliable machine. MTBF tracking for a quarter tells a different story. The laser has an MTBF of 220 hours and an MTTR of three hours. One of the mills has an MTBF of 95 hours and an MTTR of 45 minutes. Both land at roughly 99 percent availability (about 98.7 percent for the laser, 99.2 percent for the mill). The laser is not the clear winner: it breaks less often but takes four times as long to recover when it does.

That data changes the conversation. The laser's MTTR problem points to parts staging and procedure: the failures are usually the same two or three modules, but the parts are not on the shelf and the procedure is in someone's head. The mill's MTBF problem points to a single chronic issue with the way oil is delivered to one axis. Both fixes are now visible. Without MTBF and MTTR broken out, the shop would have spent capital replacing the laser and never touched the mill.
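Breaking the two numbers out per machine can be sketched as follows. The MTBF and MTTR values come from the example above; the 1,000 run hour basis and the `per_1000_run_hours` helper are assumptions chosen to make the machines comparable.

```python
def per_1000_run_hours(mtbf_hours: float, mttr_hours: float):
    """Expected failure count and total repair hours per 1,000 run hours."""
    failures = 1000 / mtbf_hours
    repair_hours = failures * mttr_hours
    return failures, repair_hours


# (MTBF hours, MTTR hours) from the fab shop example; 45 minutes = 0.75 hours.
machines = {"laser": (220, 3.0), "mill": (95, 0.75)}

for name, (mtbf, mttr) in machines.items():
    failures, repair_hours = per_1000_run_hours(mtbf, mttr)
    print(f"{name}: {failures:.1f} failures, {repair_hours:.1f} repair hours")
# laser: 4.5 failures, 13.6 repair hours
# mill: 10.5 failures, 7.9 repair hours
```

The mill fails more than twice as often, yet the laser loses more hours to repair. That split is what separates a frequency problem from a recovery problem.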

Common mistakes with mean time between failures

  • Mixing run time and scheduled time. Run time is the machine actually making parts. Scheduled time is the bigger window. MTBF wants run time.
  • Counting changeovers and breaks as failures. Planned stops are not failures. Filter them out cleanly.
  • Sample size too small to mean anything. One machine with three failures in a month gives a number, but not one with much signal. Aggregate or extend the window.
  • Benchmarking across dissimilar machines. A press brake's expected MTBF is not a CNC mill's expected MTBF. Compare like to like.
  • Tracking MTBF without MTTR. Reliability is half the picture. Without MTTR you cannot tell which way to invest.

Mean time between failures and related Lean tools

MTBF pairs with mean time to repair to compute availability, the A in OEE. Raising MTBF is the goal of preventive maintenance and of any sensor-based program for predicting failures. Each failure event also adds to total downtime, which is the raw input that connects reliability data back to lost production time.

Common questions

The questions we hear most about this term.

How is mean time between failures calculated?
Total running time divided by number of failures. If a machine ran for 800 hours over a quarter and had four unplanned breakdowns in that time, MTBF is 200 hours. The numerator is run time only, not scheduled time. Time the machine was on a planned maintenance pause does not count as running. The denominator is unplanned failures only; planned stops, changeovers, and breaks are excluded. The interval used is whatever gives a meaningful sample: often a month for a machine that fails frequently, and a quarter or longer when failures are rare.
How is mean time between failures different from mean time to repair?
MTBF measures how long the machine runs between failures. MTTR measures how long it takes to fix the machine when it fails. They are independent. A machine with high MTBF and low MTTR is very reliable and quick to recover when it does break. A machine with low MTBF and low MTTR breaks often but recovers fast. The two together explain availability. Improve MTBF by removing failure causes; improve MTTR by training, parts staging, and standard repair procedures.
Is mean time between failures the same as mean time to repair?
No, and the two get confused because the abbreviations look alike. MTBF is the running interval between failures (how often the machine breaks). MTTR is the recovery interval after each failure (how long it takes to fix). The relationship is: availability equals MTBF divided by the sum of MTBF and MTTR. You need both numbers to know how much of the time the machine is actually available.
What are common mistakes when using mean time between failures?
The biggest is calculating MTBF on too small a sample. One machine that failed twice in a month has too small an n to draw conclusions from. Aggregate across machines of the same type or use a longer window. The second is mixing planned stops and unplanned failures in the failure count. Changeovers are not failures. The third is benchmarking MTBF across machine types. A press brake and a CNC mill have completely different baseline MTBF. Compare like to like.
What does mean time between failures look like on the shop floor of a small contract shop?
A simple log per machine. Every unplanned stop gets a line: date, downtime minutes, cause. At the end of the month the maintenance lead totals the run hours from the time clock or the machine's own counter, counts the unplanned stops, and posts the MTBF on a board by the machine. Operators see the number, which makes the calculation real. A machine whose MTBF dropped from 180 hours to 110 hours gets a focused review before the trend gets worse.
