While LLMs will become commodities, and therefore will come and go (do I need to mention DeepSeek these days, or have you been living in a cave lately?), what the industry is finally starting to realize is that the knowledge preparation for the LLMs is key, not the LLMs themselves. Let me stress: knowledge preparation, not just data. This consists of schemas and well-defined APIs, but also a semantic layer that combines the networking intent and business logic into an ontology. Let me decompose all of this.
Mark Twain is credited with saying, about the 1848-1855 California gold rush, “During the gold rush it’s a good time to be in the pick and shovel business”. Where is the gold in our networking business right now? My take: “During this fast-paced development of LLMs, it’s a good time to be in the business of transforming data into knowledge, mapped to networking intent and business logic.” This requires both an ontology and a knowledge graph as compulsory steps to be LLM-ready!

In the networking industry, we all dream of autonomous networks that manage themselves, without human intervention, exactly like an autonomous car. In case of a network incident, the network could re-configure itself, either to correct the fault or to circumnavigate it (basically, re-route the traffic around the faulty part … but “circumnavigate” just sounds cooler!) while it’s being fixed by the maintenance team. Next is incident prediction: incidents are predicted and even addressed before they actually happen. On top of that, networks could further optimize themselves based on the different (potentially conflicting) intents.
When networks correct themselves, based on the famous closed-loop system, network operators could work part time, making the CFO happy as the network OPEX is reduced. Alternatively, network operators could devote their time to higher-value tasks (this would be the topic for another blog: “will autonomous networking fulfill the promise to reduce OPEX, or will it simply transform the operational team’s job?”).
Most operators have data lakes that accumulate huge amounts of data. To derive value from it, this data needs to be clean, consistent, interconnected, and with clear semantics, so that data scientists, business analysts, and in the end the LLMs can extract useful insights based on the context. Without the right data, it’s extremely difficult, if not a lost cause, for machine learning algorithms to solve networking issues (in other words, it’s difficult to find gold). Remember: garbage in, garbage out. On top of clean, consistent, interconnected data with clear semantics and context, we need the network “intent”: without it, there is no way to detect anomalies, let alone derive a closed-loop action.
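To make the “no intent, no anomaly” point concrete, here is a minimal sketch (all names and values are hypothetical, not from our documents) of a machine-readable intent being checked against a telemetry measurement. The intent is the reference point; without it, the measured value is just a number.

```python
# Minimal, hypothetical sketch: an intent gives telemetry a reference point,
# which is what allows an "anomaly" to be declared and a closed loop to act.
from dataclasses import dataclass

@dataclass
class LatencyIntent:
    source: str            # e.g. "PE1"
    destination: str       # e.g. "PE2"
    max_latency_ms: float  # the declared business/networking intent

def violates(intent: LatencyIntent, measured_latency_ms: float) -> bool:
    """Return True if the measurement breaks the intent (an anomaly)."""
    return measured_latency_ms > intent.max_latency_ms

# Example: a measured delay (e.g. from IPFIX delay measurement) is compared
# against the declared intent; a violation would trigger the closed loop.
intent = LatencyIntent(source="PE1", destination="PE2", max_latency_ms=10.0)
if violates(intent, measured_latency_ms=14.2):
    print("Intent violated: trigger the closed-loop remediation workflow")
```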
Focusing on the “clean, consistent, and with clear semantics” characteristics of the data, a few of us have been working on the network monitoring aspects: YANG-Push & Apache Kafka integration (following the data mesh principles) in the management plane, IP Flow Information eXport (IPFIX) improvements in the data plane (including delay measurement), and BGP Monitoring Protocol (BMP) in the control plane. Obviously, scalability is key: from the onboarding of new devices and new telemetry data, all the way to preserving the data semantics in the Time Series Database (TSDB) for further analytics. This data collection done the right way is still work in progress, but on the right track, I would say.
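As an illustration of that pipeline, here is a minimal sketch, assuming YANG-Push notifications are published as JSON on a Kafka topic; the topic name, message layout, and field names are assumptions for the example, not the actual integration described above. It requires the kafka-python package and a reachable broker.

```python
# Minimal sketch: consume JSON-encoded YANG-Push notifications from Kafka and
# keep the YANG semantics (module/path) attached to each value, so the meaning
# survives all the way to the Time Series Database.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "yang-push.interfaces",              # hypothetical topic name
    bootstrap_servers="localhost:9092",  # hypothetical broker address
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    notification = message.value
    point = {
        "yang_path": notification.get("path"),       # e.g. "ietf-interfaces:interfaces/interface"
        "value": notification.get("value"),
        "timestamp": notification.get("eventTime"),
        "source": notification.get("node"),
    }
    print(point)  # in practice: write the point to the TSDB of your choice
```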
Let’s now focus on the “interconnected” characteristic of the data. It’s a fact that a data lake contains many disparate and siloed data sources. If not convinced, go into a NOC (Network Operations Center) and observe: it’s full of different screens & windows, each of them monitoring ONE specific aspect of the network: customer complaints, configuration transactions, IPFIX flow records, BMP route addition & withdrawal updates, syslog events, digital map & topology changes, connectivity tests, SLA monitoring (active, passive, or hybrid), etc. The combination of those data sources, siloed based on historical organizational and tooling decisions, is required to troubleshoot networks: configuration, monitoring, security, application, server, you-name-it. In a NOC, network operators navigate between the different data sources in case of incidents, trying to correlate the different events, mainly based on the time … for one definition of time (homework for you: what is common, or not, between a syslog timestamp and an IPFIX flow record observation time?).
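Here is a minimal sketch (synthetic records, hypothetical field names) of the kind of time-based correlation a NOC operator does by hand: find the IPFIX flow records whose observation window overlaps a syslog event. It also hints at the homework: a syslog message carries a single point in (device) time, while an IPFIX flow record carries a start and end time observed at the exporter.

```python
# Minimal sketch: correlate a syslog event with IPFIX flow records by time,
# allowing for clock skew between the two sources.
from datetime import datetime, timedelta

syslog_event = {
    "node": "PE1",
    "time": datetime(2025, 1, 15, 10, 42, 7),
    "msg": "%LINK-3-UPDOWN: Interface Gi0/0/1, changed state to down",
}

ipfix_flows = [
    {"exporter": "PE1", "start": datetime(2025, 1, 15, 10, 41, 50),
     "end": datetime(2025, 1, 15, 10, 42, 20), "dst": "198.51.100.7", "octets": 123456},
    {"exporter": "PE2", "start": datetime(2025, 1, 15, 9, 0, 0),
     "end": datetime(2025, 1, 15, 9, 0, 30), "dst": "203.0.113.9", "octets": 42},
]

skew = timedelta(seconds=30)  # tolerance for clock differences between sources

def correlated(flow, event):
    """Same node, and the event time falls inside the flow's observation window."""
    return (flow["exporter"] == event["node"]
            and flow["start"] - skew <= event["time"] <= flow["end"] + skew)

print([f for f in ipfix_flows if correlated(f, syslog_event)])
```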
While some may say, “Don’t worry, LLMs will be solving all networking issues anyway”, I would personally say: “LLMs (via agentic AI) should be solving some networking issues, mainly helping with troubleshooting for now, if and only if we provide the networking knowledge, based on an additional semantic layer, with an ontology of the business and networking logic, organized as a knowledge graph.”
Are we inventing something new here? Not really: this is the world of the semantic web. What is new is applying it to the networking world. And it’s complex, as we are reaching the limits of the known networking data models. And it’s not easy, as the networking world has some specific aspects, which we described in our “Knowledge Graph Framework for Network Operations” document, authored with Michael Mackey, Thomas Graf, Holger Keller, Daniel Voyer, and Paolo Lucente:
- Data Overload from Network Operations
- Difficulties in Data Analysis and Insight Extraction
- Complex Data Correlation Requirements
- Service and Customer Correlation
- Data Storage and Format Disparities
- Contextual Understanding and Relationship Mapping
- Loss of Context in Data Collection
- Data Collection Methods and Interpretation
- Organizational Silos
- Multiple Sources of Truths
- Machine Readable Knowledge
What Is a Knowledge Graph?
A knowledge graph is a semantically rich data model for storing, organizing, and understanding connected entities. A knowledge graph contains three essential elements:
- Entities, which represent the data of the organization or domain area.
- Relationships, which show how the data entities interact with or relate to each other. Relationships provide context for the data.
- An organizing principle that captures meta-information about core concepts relevant to the business.
Its primary function is to connect (link) and contextualize data in a way that makes this data more explicit and usable across applications. It thereby realizes a semantic layer, connecting all of the enterprise’s data context and mapping it to a conceptual model.
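To make the three elements tangible, here is a minimal sketch using rdflib; the namespace, class names, and the tiny ontology are purely illustrative and are not the ontology from the Knowledge Graph Framework document (any RDF or property-graph store would do equally well).

```python
# Minimal sketch: entities, relationships, and an organizing principle (a tiny
# ontology), plus one query a NOC would love: "which services depend on this
# interface?". All names are illustrative.
from rdflib import Graph, Namespace, RDF, RDFS, Literal

NET = Namespace("https://example.org/network#")
g = Graph()

# Organizing principle: core concepts of the domain.
g.add((NET.Router, RDF.type, RDFS.Class))
g.add((NET.Interface, RDF.type, RDFS.Class))
g.add((NET.Service, RDF.type, RDFS.Class))

# Entities: the data of the domain.
g.add((NET.PE1, RDF.type, NET.Router))
g.add((NET["PE1-Gi0-0-1"], RDF.type, NET.Interface))
g.add((NET.VPN_CustomerA, RDF.type, NET.Service))

# Relationships: the context that links the entities.
g.add((NET.PE1, NET.hasInterface, NET["PE1-Gi0-0-1"]))
g.add((NET.VPN_CustomerA, NET.dependsOn, NET["PE1-Gi0-0-1"]))
g.add((NET["PE1-Gi0-0-1"], RDFS.label, Literal("GigabitEthernet0/0/1")))

# Which services are impacted if this interface fails?
query = """
SELECT ?service WHERE {
  ?service <https://example.org/network#dependsOn>
           <https://example.org/network#PE1-Gi0-0-1> .
}
"""
for row in g.query(query):
    print(row.service)
```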
For an introduction to the semantic web technology stack and why it’s the right fit for the networking world, please review our Knowledge Graph Framework for Network Operations document. Your feedback is welcome!