[Image: transformer failure with flames at a substation. Predictive maintenance and early fault detection prevent these events.]

Synthetic data, LLMs and edge AI: what they actually mean for transformer diagnostics

The technologies reshaping predictive maintenance are finally mature enough to discuss honestly. Here is what the latest research tells us about where transformer health assessment is headed.


If you spend any time reading about industrial AI these days, you will encounter breathless claims about how large language models and synthetic data are about to revolutionize everything. The power transformer industry is no exception. Every conference, every vendor pitch, every LinkedIn post seems to promise that AI will solve all your asset management problems.

The reality, as usual, is more interesting and more nuanced.

Over the past few months, I have been digging through the latest peer-reviewed research on these technologies as they apply to our field. Not vendor whitepapers. Not press releases. Actual published studies with methodology sections you can scrutinize and results you can verify.

What I found surprised me. Some things that seemed promising turn out to have serious limitations. Other approaches that flew under the radar are showing remarkable potential. And a few assumptions that most of us in the industry have been making for years are being challenged by new data.

Let me walk you through what the research actually says.

The persistent problem: we still do not have enough failure data

Here is a truth that everyone in transformer diagnostics knows but rarely discusses openly: transformer failures are rare events. This is excellent news for grid reliability but terrible news for anyone trying to train machine learning models.

A comprehensive review in Applied Sciences examined 124 studies published between 2014 and 2024. The authors found that class imbalance remains the single biggest obstacle in developing reliable diagnostic models. Your typical DGA dataset contains thousands of “normal” readings and maybe a handful of actual fault cases.

Traditional approaches to this problem involve various data balancing techniques. A December 2024 study in Energy Reports tested five different strategies on real DGA data. The best performer, a combination of Edited Nearest Neighbors with Support Vector Machine, achieved 88% accuracy.
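For readers who want to see what that kind of pipeline looks like, here is a minimal sketch using scikit-learn and imbalanced-learn. The generated stand-in data and the hyperparameters are placeholders; the study's exact preprocessing and settings may well differ.

```python
from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a real DGA feature matrix (gas concentrations) with a
# heavily imbalanced fault label; substitute your own data here.
X, y = make_classification(n_samples=5000, n_features=7,
                           weights=[0.97, 0.03], random_state=42)

# Keep a held-out test set with the original, imbalanced distribution.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)

# Balance only the training data; never resample the test set.
enn = EditedNearestNeighbours()
X_bal, y_bal = enn.fit_resample(scaler.transform(X_tr), y_tr)

clf = SVC(kernel="rbf").fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(scaler.transform(X_te))))
```

Note that the resampling happens strictly inside the training split. Evaluating on resampled data is one of the easiest ways to fool yourself about a diagnostic model's real-world performance.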

But here is where things get tricky. Performance on artificially balanced datasets does not always translate to real-world reliability. The researchers themselves acknowledge this limitation. An 88% accuracy rate sounds impressive until you realize that in production, you might be dealing with a 99:1 ratio of normal to abnormal cases. The math gets ugly fast.
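A quick back-of-the-envelope calculation shows why. Assume, purely for illustration, a model with 88% sensitivity and 88% specificity deployed where only 1 in 100 samples is a genuine fault:

```python
# Sketch: what "88% accuracy" can mean at a realistic 99:1 class ratio.
# The sensitivity and specificity values are illustrative assumptions.
sensitivity = 0.88   # fraction of true faults flagged
specificity = 0.88   # fraction of normal readings correctly cleared
prevalence = 0.01    # 1 fault per 100 samples in the field

# Bayes' rule: of all alarms raised, how many are real faults?
precision = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
print(f"Precision at 99:1 prevalence: {precision:.1%}")  # ~6.9%
```

Under those assumptions, more than nine out of every ten alarms would be false positives, despite the healthy-sounding headline number.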

This is why the industry has started looking seriously at synthetic data generation. Not as a silver bullet, but as one tool in a larger toolkit.

Synthetic data: the promise and the pitfalls

A January 2026 systematic review in the Journal of Intelligent Manufacturing analyzed 86 peer-reviewed articles on synthetic data for predictive maintenance. The researchers identified four main approaches: classical data augmentation, generative models like GANs and VAEs, physics-based simulation, and hybrid methods.

The finding that matters most for transformer diagnostics: hybrid and physics-informed models are particularly valuable in safety-critical domains where transparency and physical plausibility are essential.

Think about what this means in practice. If you are generating synthetic DGA data, it is not enough for the numbers to look statistically realistic. The gas ratios need to make physical sense. The relationships between hydrogen, methane, ethylene, and acetylene need to follow the actual chemistry of oil decomposition under different fault conditions.

Pure statistical generation can produce data that fools a model during training but leads to unreliable predictions on real equipment. The physics has to be baked in.
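As a concrete illustration, one simple way to bake physics into the generation step is rejection sampling: perturb real samples statistically, then discard any candidate whose gas ratios are implausible for the target fault type. The ratio thresholds below are illustrative placeholders, not values from the cited review; a real implementation would derive them from an interpretation scheme such as Rogers ratios or the Duval triangle.

```python
# Sketch: rejection sampling that keeps only physically plausible
# synthetic DGA samples. Ratio bounds are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def plausible_arc_fault(sample):
    """Crude plausibility check for a high-energy arcing sample."""
    h2, ch4, c2h4, c2h2 = sample
    if min(h2, ch4, c2h4, c2h2) <= 0:
        return False
    # High-energy arcing is associated with elevated acetylene;
    # the exact thresholds here are assumptions for illustration.
    return (c2h2 / c2h4) > 1.0 and (ch4 / h2) < 0.5

def generate_synthetic(real_samples, n, check):
    """Perturb real samples, keeping only candidates that pass the check."""
    out = []
    while len(out) < n:
        base = real_samples[rng.integers(len(real_samples))]
        # Multiplicative lognormal noise keeps concentrations positive.
        candidate = base * rng.lognormal(0.0, 0.2, size=base.shape)
        if check(candidate):  # physics gate: reject implausible chemistry
            out.append(candidate)
    return np.array(out)
```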

Another finding in this line of research deserves attention. An October 2025 arXiv paper investigated what happens when models are trained recursively on their own outputs. The resulting phenomenon, called model collapse, causes quality to degrade over successive generations.

The good news: collapse only occurs when you completely replace real data with synthetic data. If you maintain a foundation of verified real-world data and use synthetic data to augment it, performance actually improves. The key word is augment, not replace.

For anyone building diagnostic models, this suggests a clear strategy. Start with the best real DGA data you can get. Use synthetic generation to fill gaps, simulate rare fault scenarios, and stress-test your model. But never lose the anchor of actual field data.
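In code, that strategy amounts to a constraint on how training sets are assembled. A minimal sketch, where the 50% cap on synthetic volume is an assumption of mine rather than a figure from the research:

```python
# Sketch: augment real data with synthetic samples, never replace it.
# The max_syn_ratio cap is an illustrative assumption.
import numpy as np

def build_training_set(X_real, y_real, X_syn, y_syn, max_syn_ratio=0.5):
    """Cap synthetic samples at a fraction of the real data volume."""
    n_syn = min(len(X_syn), int(max_syn_ratio * len(X_real)))
    X = np.vstack([X_real, X_syn[:n_syn]])
    y = np.concatenate([y_real, y_syn[:n_syn]])
    return X, y

# The test set stays 100% real: synthetic data never touches evaluation.
```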

Large Language Models: useful, but not for what you might think

The AI hype cycle has created unrealistic expectations about what LLMs can do for industrial diagnostics. Let me be direct: if you are hoping that ChatGPT will replace your DGA interpretation methods, you are going to be disappointed.

A May 2025 study in Electronics tested an LLM-based approach against traditional models for industrial compressor monitoring. The LLM achieved 92.3% recall and 0.991 AUC-ROC. Impressive numbers, but the real insight is in why the LLM performed well.

It was not because the LLM was better at analyzing sensor data. Traditional models already do that effectively. The advantage came from multimodal data fusion. The LLM could simultaneously process structured sensor readings and unstructured maintenance logs, inspection reports, and operator notes.

This is where LLMs genuinely add value in industrial settings. Not as a replacement for specialized diagnostic algorithms, but as an integration layer that can make sense of heterogeneous information sources.
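A minimal version of this fusion pattern does not even require a full LLM: embed the unstructured text with a pretrained language model, then concatenate the embedding with the structured features before classification. The sketch below assumes the sentence-transformers library; the model name and feature choices are illustrative, not the architecture from the cited study.

```python
# Sketch: fuse structured DGA readings with free-text maintenance notes
# by concatenating numeric features with text embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import GradientBoostingClassifier

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def fuse_features(dga_features, maintenance_notes):
    """Concatenate numeric sensor features with text embeddings."""
    text_vecs = encoder.encode(maintenance_notes)   # shape (n, 384)
    return np.hstack([np.asarray(dga_features), text_vecs])

# Usage (with your own data):
# X_fused = fuse_features(dga_matrix, notes_list)
# clf = GradientBoostingClassifier().fit(X_fused, labels)
```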

A 2025 review based on 126 peer-reviewed articles confirms this trend. Hybrid models combining multiple data sources, including DGA, vibration analysis, thermal imaging, and partial discharge measurements, are becoming the standard. LLMs offer a natural architecture for this kind of fusion because they were designed to handle diverse input types.

At Seetalabs, we have been working with this exact challenge. RONIN AI already integrates DGA data with oil quality parameters, paper condition indicators, and equipment age. The question we are exploring now is how to bring in unstructured data sources without sacrificing the precision that our customers depend on.

Edge deployment: bringing intelligence to remote substations

Here is a practical problem that anyone managing geographically distributed transformer fleets understands: connectivity is not always reliable. Remote substations may have intermittent network access or none at all. Sending data to cloud servers for analysis introduces latency and creates dependencies on infrastructure you do not control.

This is why edge deployment matters. Running diagnostic models directly on local hardware eliminates these problems.

A December 2025 paper in Frontiers in Computer Science explores LLM integration with edge computing specifically for industrial data analysis. The authors note that edge-based analysis enables processing of maintenance logs, sensor data, and equipment reports without requiring continuous cloud connectivity.

The technical challenge is model size. Full-scale LLMs require significant computational resources. A typical edge device in a substation does not have the processing power or memory to run GPT-4.

The solution involves model compression through techniques like quantization. A January 2026 survey in MDPI AI summarizes the standard approach: train in 16-bit precision, then quantize to 4-bit for deployment. Methods like GPTQ and AWQ preserve most model quality while reducing memory requirements by 75%.
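The memory arithmetic is easy to verify. Moving from 16-bit to 4-bit weights cuts storage per parameter by a factor of four, which is where the 75% figure comes from (ignoring the small overhead of quantization scales):

```python
# Sketch: approximate weight-memory footprint of a 7B-parameter model
# at different precisions (ignores activations and scale overhead).
params = 7e9
for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>2}-bit: {gib:5.1f} GiB")
# 16-bit: ~13.0 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB (a 75% reduction)
```

That roughly 3.3 GiB figure also explains why 4-bit models are the first ones that plausibly fit on a 4GB-class edge device, and only barely.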

But how do these compressed models actually perform on real hardware?

A study in ACM Transactions on Internet of Things tested 28 quantized LLM versions on Raspberry Pi 4 hardware with 4GB RAM. The results reveal important tradeoffs. Highly compressed models using Q3 quantization showed significant variability in energy consumption; one variant swung by ±3.49 joules per token.

For transformer diagnostics, where consistency is essential, this suggests that moderate Q4 or Q8 quantization may be preferable to aggressive compression. The slight increase in hardware requirements is worth the reliability gain.

Small Language Models: a pragmatic choice for industry

The small language model market is growing rapidly, from $0.93 billion in 2025 to a projected $5.45 billion by 2032. For our industry, SLMs offer specific advantages that matter more than raw capability benchmarks.

Offline operation is perhaps the most important. A technical article on Industrial IoT applications emphasizes that small models can function without network connectivity. For remote substations, this is not a nice-to-have feature. It is a requirement.

Data privacy is another consideration. DGA results and transformer condition data often contain commercially sensitive information. Utilities may be reluctant to send this data to external cloud services. Edge processing keeps everything on-site.

Latency matters for real-time monitoring applications. Even a few seconds of delay can be significant when you are tracking developing faults. Local processing eliminates the network round-trip entirely.

What the research gets wrong

A January 2025 SSRN review on AI and ML in transformer fault diagnosis highlights gaps in the current literature that deserve attention.

Most studies use proprietary datasets that other researchers cannot access. This makes it nearly impossible to compare approaches fairly or replicate results. The field would benefit enormously from more open benchmarking data.

Researchers tend to focus on accuracy while ignoring other metrics that matter in production. Precision and recall are critical when error costs are asymmetric. A false negative, where you miss a developing fault, can cost millions in equipment damage. A false positive just triggers an unnecessary inspection. These are not equivalent outcomes, but accuracy treats them as if they were.
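One simple corrective is to score models on expected cost rather than accuracy, with a cost matrix that reflects the real asymmetry. The dollar figures below are placeholders for illustration, not estimates from the literature:

```python
# Sketch: compare models by expected misclassification cost instead of
# accuracy. Cost figures are illustrative placeholders.
from sklearn.metrics import confusion_matrix

COST_FN = 1_000_000   # missed developing fault: equipment damage
COST_FP = 5_000       # false alarm: one unnecessary inspection

def expected_cost(y_true, y_pred):
    """Total cost implied by a binary model's confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return fn * COST_FN + fp * COST_FP
```

Under a metric like this, a model with slightly lower accuracy but fewer false negatives can easily come out ahead.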

Almost all published work reports performance on held-out test sets. Very few papers document actual field deployments and performance over time. Laboratory results do not always translate to operational reliability.

Interpretability remains underdeveloped. A December 2025 paper in Technologies introduces SHAP-based methods for explaining Health Index predictions. This kind of approach, which allows operators to understand why a model made a particular recommendation, is essential for industrial adoption. People responsible for million-dollar assets need to trust the tools they use.
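For tree-based Health Index models, the SHAP pattern is straightforward to reproduce. The sketch below pairs the shap library with a random forest on randomly generated placeholder data; the cited paper's actual model, features, and data will differ.

```python
# Sketch: SHAP explanations for a Health Index regressor, so operators
# can see which inputs drove a given prediction.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder condition data; in practice X holds features such as gas
# levels, furans, oil quality, and age, and y holds Health Index scores.
rng = np.random.default_rng(1)
X = rng.random((500, 6))    # 500 transformers, 6 condition features
y = rng.random(500)         # Health Index scores in [0, 1]

model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# TreeExplainer computes fast, exact attributions for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Which factors pushed each unit's Health Index up or down:
shap.summary_plot(shap_values, X)
```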

What this means for asset managers

Based on the research I have reviewed, here are some practical takeaways.

Do not expect LLMs to replace established DGA interpretation methods. Traditional approaches, including the methods built into tools like RONIN AI, already work well for fault classification. The value of newer AI techniques is in integration and data fusion, not in replacing proven diagnostic logic.

Treat synthetic data as augmentation, not replacement. It can help fill gaps in imbalanced datasets and simulate rare scenarios for stress-testing. But your foundation must remain real field data from actual equipment.

Edge deployment is technically mature. The barrier now is organizational and integration-related, not technological. If your monitoring infrastructure supports it, field-deployable AI is achievable today.

Invest in data quality before chasing algorithmic sophistication. Clean, well-structured data fed into a straightforward model will outperform a cutting-edge algorithm trained on messy inputs.

Where things are headed

The convergence of LLMs, synthetic data generation, and edge computing is creating possibilities that did not exist even two years ago.

A transformer diagnostic system of the near future could process DGA data in real time on local hardware, with no cloud dependency. It could automatically integrate structured sensor readings with unstructured maintenance notes and inspection reports. It could generate synthetic scenarios to stress-test predictions and identify edge cases the training data did not cover. And it could provide clear explanations for its recommendations, citing the specific factors that influenced each decision.

We are not there yet. But the direction is clear, and the pace of progress is accelerating.

For those of us building tools in this space, the challenge is separating genuine capability from hype. The research literature provides a grounding that vendor marketing cannot. Technologies that seemed revolutionary five years ago have matured into practical tools. Others that generated enormous excitement have revealed serious limitations.

The organizations that will benefit most from these advances are the ones paying attention to the details, not just the headlines.


This article was written for power transformer engineers, asset managers, and grid operators interested in the intersection of AI and predictive maintenance. All cited research is peer-reviewed and publicly accessible through the provided links.


If you want to dig deeper into the primary sources referenced in this article: