Preventive Maintenance: A Practical Guide to Choosing the Right Strategy for Each Asset

Most plants get maintenance wrong in one of two opposite directions. Some run to failure: machines are fixed only when they break, downtime is unplanned, and the maintenance team lives in permanent firefighting mode. Others over-maintain: a calendar full of preventive tasks, technicians tearing down equipment on a schedule whether it needs it or not, burning labor and spares — and sometimes causing the very failures the program was meant to prevent. Both are expensive. Neither is necessary.

The hard-won insight behind a good maintenance program is simple to state and easy to ignore: not all assets deserve the same strategy, and calendar-based preventive maintenance is often the wrong default. The job is not "do more PM." It is to decide, asset by asset, which failure-management strategy actually reduces risk for the lowest cost — preventive, predictive, or deliberately running to failure. This guide gives you a real framework for that decision: criticality ranking to sort assets, the difference between time-based and condition-based maintenance and when each pays, how to set intervals with the P-F interval instead of guessing, and how to build toward TPM on the floor. The short version: maintenance is a portfolio of strategies, not one schedule applied to everything.

Why both extremes fail

Run-to-failure as a default fails for the obvious reason: unplanned downtime is the most expensive kind. A failure mid-shift takes the line out at the worst possible moment, often damages adjacent components, and forces overtime and expedited spares. Unplanned downtime erodes availability directly, and availability is the first of the three factors (alongside performance and quality) in overall equipment effectiveness (OEE). If your maintenance is mostly reactive, your OEE has a ceiling you will never break through.
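To make the ceiling concrete, here is a minimal sketch using the standard definition of OEE as availability × performance × quality. The function names and the hours in the example are illustrative assumptions, not figures from a real plant.

```python
def availability(run_time_hours: float, planned_time_hours: float) -> float:
    """Share of planned production time the asset actually ran."""
    if planned_time_hours <= 0:
        raise ValueError("planned_time_hours must be positive")
    if not 0 <= run_time_hours <= planned_time_hours:
        raise ValueError("run_time_hours must be between 0 and planned time")
    return run_time_hours / planned_time_hours


def oee(availability_ratio: float, performance_ratio: float, quality_ratio: float) -> float:
    """OEE as the product of its three factors, each expressed as 0..1."""
    factors = {
        "availability": availability_ratio,
        "performance": performance_ratio,
        "quality": quality_ratio,
    }
    for name, value in factors.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability_ratio * performance_ratio * quality_ratio


# Hypothetical week: 120 planned hours, 24 of them lost to breakdowns.
a = availability(run_time_hours=96, planned_time_hours=120)

# Even with perfect speed and zero scrap, OEE cannot exceed availability.
best_case = oee(a, performance_ratio=1.0, quality_ratio=1.0)
print(f"availability = {a:.0%}, best possible OEE = {best_case:.0%}")
```

Because the three factors multiply, every hour of unplanned downtime lowers the best OEE you can reach, no matter how well the line runs when it is up.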

Over-maintenance fails for a subtler reason that catches experienced teams off guard: maintenance-induced failures. Every time you open a machine, you introduce risk — a gasket reseated wrong, a bearing contaminated during a "preventive" replacement, a bolt torqued unevenly, a sensor knocked out of calibration. A well-known pattern in reliability work is that many components follow no wear-out curve at all; they fail randomly. For those, replacing a healthy part on a schedule does nothing to lower failure probability and adds a fresh chance to break something. "More PM is always better" is a fallacy. Past a point, intervention itself becomes the dominant failure mode.

The way out is to stop treating maintenance as one decision and start treating it as a sorting problem.

Step one: rank assets by criticality

You cannot apply the right strategy to an asset until you know how much its failure costs. Criticality ranking gives you that, and it does not require a consultant or software. A simple criticality matrix scores each asset on two axes — the consequence of failure and the likelihood of failure — and multiplies them into a single rank.

For a slightly richer version, borrow the Risk Priority Number (RPN) from failure mode and effects analysis (FMEA) and score each failure mode on three factors, each from 1 to 10:

  • Severity: how bad the consequence is (safety, environmental, lost production, scrap).
  • Occurrence: how likely the failure is, judged from history.
  • Detection: how hard the failure is to catch before it bites. The scale runs the opposite way from what the name suggests: a 1 means you will almost certainly see it coming, a 10 means it gives no warning.

Multiply the three for an RPN. The point is not the precise number — it is the ranking. Sort your asset list by RPN or criticality score, and a Pareto pattern almost always appears: a small share of assets carries most of the risk. Those get the most attention. The long tail at the bottom often deserves almost none.
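The scoring and sorting can be sketched in a few lines. This is a minimal illustration, not a full FMEA tool: the class name, the helper functions, and every score in the sample list are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    asset: str
    severity: int    # 1-10, higher = worse consequence
    occurrence: int  # 1-10, higher = more frequent
    detection: int   # 1-10, higher = less warning (10 = none)

    def __post_init__(self):
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def cumulative_risk_share(ranked):
    """For each ranked item, the share of total RPN covered so far."""
    total = sum(m.rpn for m in ranked)
    running = 0
    shares = []
    for m in ranked:
        running += m.rpn
        shares.append((m.asset, m.rpn, running / total))
    return shares


# Hypothetical scores, for illustration only.
modes = [
    FailureMode("redundant pump", severity=3, occurrence=3, detection=2),
    FailureMode("packaging-line motor", severity=8, occurrence=4, detection=7),
    FailureMode("low-cost sensor", severity=2, occurrence=4, detection=3),
    FailureMode("non-critical light", severity=1, occurrence=5, detection=1),
]

for asset, rpn, share in cumulative_risk_share(rank_by_rpn(modes)):
    print(f"{asset:22s} RPN={rpn:4d}  cumulative share={share:.0%}")
```

The cumulative-share column is what exposes the Pareto pattern: in this made-up list, one asset carries most of the total RPN.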

This sorting is what tells you which strategy each asset earns:

  • High criticality, gives warning before failure → predictive / condition-based maintenance.
  • High criticality, wears out predictably with no warning → time-based preventive maintenance.
  • Low criticality, cheap, easily replaced → run-to-failure, on purpose.

That last line is the one most programs miss. Run-to-failure is not a failure of planning; for the right asset it is the correct, lowest-cost strategy. A redundant pump, a $40 sensor with a spare on the shelf, a non-critical light — let it run until it dies and replace it then. Spending PM labor on those assets is waste dressed up as diligence.
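The three sorting rules above can be written as a small decision function. This is a sketch under simplifying assumptions: criticality is reduced to a yes/no flag (where you draw that line is your call), and the fourth outcome, for a critical asset that neither gives warning nor wears out predictably, is an addition of this sketch, since the three rules do not cover that case.

```python
from enum import Enum


class Strategy(Enum):
    CONDITION_BASED = "predictive / condition-based maintenance"
    TIME_BASED = "time-based preventive maintenance"
    RUN_TO_FAILURE = "run-to-failure, on purpose"
    NEEDS_REVIEW = "not covered by the three rules; review case by case"


def choose_strategy(
    high_criticality: bool,
    gives_warning: bool,
    wears_out_predictably: bool,
) -> Strategy:
    """Map an asset's failure behaviour to a maintenance strategy."""
    if not high_criticality:
        # Low-criticality, cheap, easily replaced assets.
        return Strategy.RUN_TO_FAILURE
    if gives_warning:
        # Failure develops detectably, so watch the condition.
        return Strategy.CONDITION_BASED
    if wears_out_predictably:
        # No warning, but age-related wear: replace on a schedule.
        return Strategy.TIME_BASED
    # Assumption of this sketch: flag the uncovered case explicitly.
    return Strategy.NEEDS_REVIEW


print(choose_strategy(high_criticality=True, gives_warning=True,
                      wears_out_predictably=False).value)
print(choose_strategy(high_criticality=False, gives_warning=False,
                      wears_out_predictably=False).value)
```

Checking `gives_warning` before `wears_out_predictably` is a deliberate choice here: when a critical asset does both, condition monitoring avoids opening a healthy machine.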

Time-based vs condition-based: when each pays

Time-based (preventive) maintenance does a task on a fixed interval — every 500 hours, every quarter. It pays off only when the failure is genuinely age-related: the component wears out predictably, so replacing it before the wear-out zone prevents the failure. Think lubrication, filter changes, timing belts with a known service life. If there is no wear-out pattern, a time-based task is just scheduled risk.

Condition-based (predictive) maintenance watches an actual indicator of health — vibration, temperature, oil particle count, motor current, ultrasound — and acts only when the indicator says the asset is heading toward failure. It pays off when the failure develops gradually enough to detect, and when the asset is critical enough to justify the monitoring. You do less work, you do it just in time, and you avoid opening healthy machines.

The trade-off is real: condition-based monitoring costs money and skill to set up and interpret, so it is not worth it for a low-criticality asset. That is exactly why criticality ranking comes first. You spend predictive effort where failures are both costly and detectable.

Setting intervals with the P-F interval (not guesswork)

Here is where most preventive programs quietly go wrong: the intervals are guessed, or worse, copied from the original equipment manufacturer (OEM) manual and never revisited. There is a better way, built on the P-F interval.

Most failures do not happen instantly. There is a point P where a defect becomes detectable — the first measurable sign, like a rising vibration signature — and a later point F where the asset has functionally failed. The time between them is the P-F interval. It is the window you have to catch and act on the problem.

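The rule that follows is the most useful single idea in condition-based maintenance: your inspection interval must be shorter than the P-F interval, and as a rule of thumb no more than half of it. At half, even a defect that becomes detectable right after one inspection is caught at the next with half the window still left to plan the repair. Inspect less often than the P-F interval and a defect can appear and run to failure between checks, so you find it only as a breakdown. This is how you set frequency with evidence instead of a hunch.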
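The half-interval rule is easy to express as arithmetic. The sketch below assumes evenly spaced inspections and a single P-F estimate in any consistent time unit; the function names are illustrative.

```python
def max_inspection_interval(pf_interval: float, fraction: float = 0.5) -> float:
    """Longest inspection interval allowed under the rule of thumb.

    fraction=0.5 encodes "no more than half the P-F interval".
    """
    if pf_interval <= 0:
        raise ValueError("pf_interval must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return pf_interval * fraction


def worst_case_lead_time(pf_interval: float, inspection_interval: float) -> float:
    """Time left before functional failure in the worst case.

    Worst case: the defect becomes detectable just after an inspection,
    so it is first seen one full inspection interval later.
    A result of 0 means the defect can reach failure unseen.
    """
    if pf_interval <= 0 or inspection_interval <= 0:
        raise ValueError("intervals must be positive")
    return max(pf_interval - inspection_interval, 0.0)


pf_weeks = 8.0  # example P-F estimate, in weeks
interval = max_inspection_interval(pf_weeks)
print(f"inspect at least every {interval} weeks")
print(f"worst-case warning: {worst_case_lead_time(pf_weeks, interval)} weeks")
print(f"quarterly checks leave: {worst_case_lead_time(pf_weeks, 13.0)} weeks")
```

The last line shows the failure case: a quarterly check against an eight-week window leaves zero guaranteed warning, which is how a scheduled program still ends up finding defects as breakdowns.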

Worked example: a packaging-line motor

Take a motor driving a packaging line — high criticality, because if it stops the whole line stops. The old program called for a monthly teardown inspection: pull the motor, inspect bearings, reassemble. It is invasive, costs four hours of planned downtime, and every teardown risks contaminating the bearings on reassembly — a textbook maintenance-induced failure.

Score it first. Severity is high (whole line down): say 8. Occurrence is moderate: 4. Detection on the old scheme is poor because a monthly teardown catches nothing in between: 7. RPN = 8 × 4 × 7 = 224 — clearly in the "do something smarter" tier.

Now look at the dominant failure mode: bearing wear. Bearing degradation announces itself in the vibration spectrum well before it seizes. From maintenance history and bearing data, the team estimates the P-F interval — first detectable vibration change to functional failure — at roughly eight weeks. Apply the rule: inspect at no more than half the P-F interval, so monitor vibration every four weeks, or continuously with a fixed sensor.

The strategy flips. Drop the monthly teardown entirely. Put vibration monitoring on a four-week (or continuous) cadence and intervene only when the trend turns. Re-score detection: you now get weeks of warning, so detection drops from 7 to about 2 and the RPN falls to 8 × 4 × 2 = 64. You have eliminated four hours of monthly planned downtime and the contamination risk of needless teardowns, while catching real bearing failures with a planned, in-window repair. The downtime that used to be unplanned and ugly is now scheduled, short, and rare.
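The arithmetic of the motor example can be checked in a few lines. All inputs come from the example itself; the one added assumption is that "monthly" means twelve teardowns a year, and the hours-removed figure ignores the occasional planned repair that condition monitoring will trigger.

```python
# Scores from the worked example.
SEVERITY = 8
OCCURRENCE = 4
DETECTION_TEARDOWN = 7   # monthly teardown, nothing seen in between
DETECTION_VIBRATION = 2  # weeks of warning from the vibration trend

PF_INTERVAL_WEEKS = 8    # first detectable change to functional failure
TEARDOWN_HOURS = 4       # planned downtime per teardown
TEARDOWNS_PER_YEAR = 12  # assumption: "monthly" = 12 per year

rpn_before = SEVERITY * OCCURRENCE * DETECTION_TEARDOWN
rpn_after = SEVERITY * OCCURRENCE * DETECTION_VIBRATION
rpn_reduction = (rpn_before - rpn_after) / rpn_before

# Rule of thumb: inspect at no more than half the P-F interval.
inspection_interval_weeks = PF_INTERVAL_WEEKS / 2

# Planned teardown downtime that disappears from the schedule.
planned_hours_removed = TEARDOWN_HOURS * TEARDOWNS_PER_YEAR

print(f"RPN before: {rpn_before}, after: {rpn_after} "
      f"({rpn_reduction:.0%} lower)")
print(f"Vibration check every {inspection_interval_weeks:.0f} weeks or less")
print(f"Teardown downtime removed: {planned_hours_removed} hours per year")
```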

That is the whole framework in one asset: rank it, pick condition-based over time-based because the failure gives warning, and let the P-F interval — not the calendar — set the frequency.

Building toward TPM and autonomous maintenance

Strategy on paper is worthless if it stays in the planner's office. Total Productive Maintenance (TPM) closes that gap by moving the simplest, highest-frequency tasks onto the operators who run the equipment — autonomous maintenance. Cleaning, lubrication, basic inspection, and "look, listen, feel" checks become part of running the line, not a separate work order.

This matters for two reasons. First, operators are at the machine all shift; they notice the new noise or the small leak weeks before a scheduled inspection would. They effectively shorten your detection interval for free. Second, it frees the skilled maintenance team to do the condition-based and reliability work that actually needs them. Autonomous maintenance is not about offloading work — it is about putting the right eyes on the asset continuously, which is exactly what catches problems inside the P-F window.

Common mistakes — and why they hurt

  • Copying OEM PM intervals blindly. Manufacturers set conservative intervals to cover every customer and limit warranty exposure, not to optimize your specific duty cycle. Treat them as a starting point, then tune with your own failure data.
  • Believing more PM is always better. Past the point that addresses real wear-out modes, extra PM adds cost and induces failures through unnecessary intervention. For random-failure components, scheduled replacement does nothing.
  • No failure coding. If your work orders do not record what failed and why, you can never learn. You cannot estimate a P-F interval, rank by occurrence, or tell whether a strategy is working. Failure coding is the data foundation; without it you are guessing forever.
  • One strategy for everything. Applying the same calendar PM to a critical motor and a redundant $40 sensor over-serves one and under-serves the other. Sort first.
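As an illustration of what failure coding buys you, here is a minimal sketch of a failure-coded work order and one thing you can compute from it. The record fields, the sample history, and the function name are all hypothetical; a real maintenance system will have its own schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class WorkOrder:
    asset: str
    closed_on: date
    failure_mode: str  # what failed, e.g. "bearing wear"
    cause: str         # why it failed, e.g. "contamination"


def mean_days_between_failures(orders):
    """Average gap in days between repeat failures, per asset and mode.

    Combinations with only one recorded failure are left out, because
    a single event gives no interval to measure.
    """
    dates_by_key = defaultdict(list)
    for wo in orders:
        dates_by_key[(wo.asset, wo.failure_mode)].append(wo.closed_on)

    result = {}
    for key, dates in dates_by_key.items():
        dates.sort()
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        if gaps:
            result[key] = mean(gaps)
    return result


# Hypothetical history, for illustration only.
history = [
    WorkOrder("motor M1", date(2023, 1, 10), "bearing wear", "contamination"),
    WorkOrder("motor M1", date(2023, 4, 10), "bearing wear", "contamination"),
    WorkOrder("motor M1", date(2023, 7, 9), "bearing wear", "misalignment"),
    WorkOrder("pump P2", date(2023, 3, 1), "seal leak", "age"),
]

for (asset, mode), days in mean_days_between_failures(history).items():
    print(f"{asset} / {mode}: about {days:.0f} days between failures")
```

Without the `failure_mode` field, all four work orders would simply read "repaired", and neither the occurrence score nor a P-F estimate could be grounded in data.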

Edge cases worth getting right

Cheap, redundant assets. If an asset is inexpensive, has a standby, and fails without dragging anything else down, run-to-failure is correct. Do not let a blanket "everything gets PM" policy waste labor here.

Long-lead critical spares. Some critical components have lead times measured in months. For these, your strategy is not just maintenance frequency — it is spares stocking. A critical asset whose replacement part takes twelve weeks to arrive justifies holding that spare on the shelf even at significant carrying cost, because the alternative is twelve weeks of downtime. Tie your spare-parts strategy to your criticality ranking: the same analysis that earns an asset predictive monitoring usually earns its key spares a place in inventory.
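One way to frame the stocking decision is a rough expected-cost comparison. The sketch below is a deliberately simple model, assuming the line is down for the full lead time whenever the part fails without a spare on hand; every number in the example is hypothetical.

```python
def expected_annual_downtime_cost(
    failures_per_year: float,
    lead_time_weeks: float,
    downtime_cost_per_week: float,
) -> float:
    """Expected yearly cost of waiting for the part with no spare held.

    Simplifying assumption: each failure costs the full lead time.
    """
    if min(failures_per_year, lead_time_weeks, downtime_cost_per_week) < 0:
        raise ValueError("inputs must be non-negative")
    return failures_per_year * lead_time_weeks * downtime_cost_per_week


def should_stock_spare(
    annual_carrying_cost: float,
    failures_per_year: float,
    lead_time_weeks: float,
    downtime_cost_per_week: float,
) -> bool:
    """True when holding the spare is cheaper than the expected wait."""
    exposure = expected_annual_downtime_cost(
        failures_per_year, lead_time_weeks, downtime_cost_per_week
    )
    return exposure > annual_carrying_cost


# Hypothetical critical gearbox: fails about once in ten years,
# twelve-week lead time, 50,000 per week of lost production.
print(should_stock_spare(
    annual_carrying_cost=5_000,
    failures_per_year=0.1,
    lead_time_weeks=12,
    downtime_cost_per_week=50_000,
))
```

Even a failure expected once a decade can justify the shelf space when the lead time is long and the line is critical. For a cheap asset with a short lead time, the same comparison goes the other way.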

The trick

If you remember one thing: match the strategy to the failure mode, not the calendar. Ask of every important asset, "Does this failure give warning?" If yes, watch the condition and let the P-F interval set your inspection frequency. If no but it wears out predictably, replace it on time. If it is cheap and harmless when it dies, let it run. The plants that escape both firefighting and over-maintenance are the ones that stopped asking "how often should we service this?" and started asking "how does this actually fail?"

Frequently asked questions

What is the difference between preventive and predictive maintenance?

Preventive (time-based) maintenance does a task on a fixed schedule — every so many hours or months — regardless of the asset's condition. Predictive (condition-based) maintenance measures an actual health indicator like vibration or temperature and acts only when the data shows trouble developing. Preventive suits predictable wear-out failures; predictive suits failures that give detectable warning and sit on assets critical enough to justify monitoring.

When is run-to-failure actually the right choice?

When the asset is cheap, easily and quickly replaced, and its failure does not endanger people, stop production, or damage other equipment — especially if it has a redundant backup. For those assets, preventive work is wasted labor. Run-to-failure is a deliberate, valid strategy for the low-criticality tail of your asset list, not a sign of poor planning.

How do I set the right maintenance interval?

Use the P-F interval: the time between when a failure first becomes detectable (P) and when the asset functionally fails (F). Estimate it from failure history and condition data, then inspect at no more than half that interval so you never miss the warning window. This replaces guesswork and blindly copied OEM intervals with evidence from how the asset really fails.

Can preventive maintenance cause failures?

Yes. Every intervention risks introducing a new fault — contamination, misassembly, calibration drift, an over-torqued bolt. These are called maintenance-induced failures. For components that fail randomly rather than wearing out, scheduled teardowns add risk without reducing failure probability, which is why "more PM is always better" is a costly fallacy.

What is TPM and how does it fit in?

Total Productive Maintenance (TPM) makes equipment reliability a shared responsibility. Its autonomous-maintenance element moves routine cleaning, lubrication, and basic inspection onto operators, who are at the machine all shift and spot problems early. That extends your detection coverage for free and frees skilled technicians for the condition-based and reliability work that needs their expertise.

Build a maintenance strategy that matches the asset

A good maintenance program is not a fuller schedule — it is a smarter sort. Rank your assets by criticality, decide preventive, predictive, or run-to-failure for each based on how it actually fails, set intervals from the P-F interval instead of the manual, and push routine checks onto the floor through TPM. Do that and you escape both the chaos of running to failure and the waste of over-maintaining. Explore more practical, vendor-neutral operations guides at Manufax.
