Optimizing Your Data for Maximum LLM Reliability

Whether or not you’ve noticed, artificial intelligence is becoming integral to modern systems, either front and center or behind the scenes. And this is just the beginning. Thanks to recent advances in large language models (LLMs), examples range from customer service chatbots to health-care data analysis and nuanced writing advice. This new era has brought conversational interfaces, the processing of unstructured data, code synthesis and the automation of simple cognitive tasks.

Beneath this veneer of sophistication, however, lies a critical reality: LLMs are not a panacea for all computing challenges, especially given their tendency to produce results that are plausible without necessarily being accurate.

(Or, as Carnegie Mellon University professor Jignesh Patel put it: “Generative AI exceeded our expectations until we needed it to be dependable, not just amusing.”)

And if you need LLMs to make use of your enterprise data, models or algorithms, this is a very big issue.

Fine-Tuning Limitations

Fine-tuning LLMs was initially seen as the solution to inaccurate answers and hallucinations because it allowed models to be adapted specifically to particular domains or tasks. By exposing the LLM to a curated set of domain-specific data, the model could learn the nuances and specialized knowledge required to generate more accurate and contextually relevant responses. This process promised a significant reduction in errors and improved performance in niche applications, making it an appealing approach for early adopters.

In practice, however, fine-tuning revealed a significant limitation: fine-tuned models tend to become rigid and less adaptable, struggling to incorporate new information or contexts without additional retraining, which is impractical in rapidly evolving fields. This challenge highlights the need for more flexible and scalable approaches to improving LLM performance.

The Retrieval-Interleaved Generation (RIG) Paradigm to the Rescue

Fortunately, the retrieval-interleaved generation (RIG) paradigm addresses many of these limitations. Instead of relying solely on the static knowledge embedded in the model (that is, from its training data), the LLM is connected to external sources such as databases, knowledge systems or even the web. When it encounters a query that requires current or domain-specific information, the model retrieves relevant data dynamically and incorporates it into its generated responses.
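The loop below is a minimal Python sketch of that interleaving, not Wolfram's implementation. The model is a hypothetical stub written as a generator: it yields text, and whenever it needs external data it yields a `Retrieve` request, pauses and resumes with the retrieved value. The dictionary stands in for a database or knowledge system; all keys and values are invented for illustration.

```python
from typing import Dict, Generator, List, Optional, Union


class Retrieve:
    """A request, emitted mid-generation, for one value from an external source."""

    def __init__(self, key: str) -> None:
        self.key = key


def stub_model(query: str) -> Generator[Union[str, Retrieve], Optional[str], None]:
    """Hypothetical stand-in for an LLM (it ignores `query`).

    Yields text fragments or Retrieve requests. The value sent back after a
    Retrieve request is the retrieved data, which the model then uses in the
    text it generates next.
    """
    yield "Order A1001 "
    status = yield Retrieve("order_status:A1001")
    yield f"was {status}. "
    stock = yield Retrieve("stock_level:widget")
    yield f"We currently hold {stock}."


def generate_with_retrieval(query: str, source: Dict[str, str]) -> str:
    """Drive generation, answering each retrieval request from `source`."""
    gen = stub_model(query)
    parts: List[str] = []
    try:
        item = next(gen)
        while True:
            if isinstance(item, Retrieve):
                # Generation is paused here: fetch the data and resume with it.
                item = gen.send(source.get(item.key, "unavailable"))
            else:
                parts.append(item)
                item = next(gen)
    except StopIteration:
        pass
    return "".join(parts)


# Hypothetical external data standing in for a database or knowledge system.
ORDERS_AND_STOCK = {
    "order_status:A1001": "shipped on 2024-05-02",
    "stock_level:widget": "412 units",
}

print(generate_with_retrieval("Where is order A1001?", ORDERS_AND_STOCK))
# Order A1001 was shipped on 2024-05-02. We currently hold 412 units.
```

In a real system the stub would be replaced by calls to an actual LLM and the dictionary lookup by queries against live sources, but the control flow (generate, pause, retrieve, continue) is the same idea.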

This is why Wolfram was invited to provide one of the first plugins for ChatGPT (something that has since evolved into Wolfram GPT). That plugin used Wolfram|Alpha as a source of data and the Wolfram Cloud as an engine for executing any Wolfram Language code that the LLM synthesized.

This approach is relevant to most enterprise applications that access private, proprietary data, whether that is billing and shipping data for a customer service chatbot; production, stock and order data for a manufacturing control tool; or scientific and engineering models for a research assistant.

Of course, Wolfram Language already has a well-established pipeline of technology that makes it trivial to connect an LLM to various data sources and give it computational tools, so for simple projects, this is already a solved problem.


The Scaling Challenge

Unfortunately, while the approach of “take an LLM, add some prompt engineering and add some tools” can quickly produce great applications for narrow purposes, it can start to break down as you broaden the aspirations for your tool. The problem? As you add more endpoints, whether for each of your different databases or for multiple models and digital twins, the growing complexity can overwhelm your LLM and confuse it.

For example, consider a financial analysis tool that draws on multiple databases for different types of data, such as stock market data, economic indicators and company financials. When a user asks how a recent change in an economic indicator might affect the stock market, the LLM needs to fetch data from both the economic indicators database and the historical stock market database to analyze correlations. The LLM might, however, mistakenly call the stock market database for economic indicators or vice versa, or send incorrect arguments, such as the wrong date ranges or indicators, to each endpoint. The tool may then provide inaccurate or incomplete information, frustrating the user and undermining its reliability.
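Some of these mechanical mistakes can at least be caught before a call is executed, by giving each endpoint an explicit contract and checking every proposed call against it. The following Python sketch is one hypothetical way to do that: it assumes two invented endpoints, `economic_indicators` and `stock_market`, each declaring which indicators it serves; the names and indicator lists are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Endpoint:
    """Declared contract for one data endpoint (hypothetical)."""
    name: str
    indicators: FrozenSet[str]  # what this endpoint can actually serve


ENDPOINTS: Dict[str, Endpoint] = {
    "economic_indicators": Endpoint(
        "economic_indicators", frozenset({"cpi", "unemployment_rate", "gdp"})
    ),
    "stock_market": Endpoint("stock_market", frozenset({"sp500_close", "nasdaq_close"})),
}


def validate_call(endpoint: str, indicator: str, start: date, end: date) -> List[str]:
    """Return the problems with a proposed tool call; an empty list means it passed."""
    problems: List[str] = []
    ep = ENDPOINTS.get(endpoint)
    if ep is None:
        problems.append(f"unknown endpoint {endpoint!r}")
    elif indicator not in ep.indicators:
        # Point the model at the endpoint that does serve this indicator, if any.
        owners = [e.name for e in ENDPOINTS.values() if indicator in e.indicators]
        hint = f"; try {owners[0]!r}" if owners else ""
        problems.append(f"{endpoint!r} does not serve {indicator!r}{hint}")
    if start > end:
        problems.append(f"start date {start} is after end date {end}")
    return problems


# The LLM asked the stock market database for an economic indicator:
print(validate_call("stock_market", "cpi", date(2024, 1, 1), date(2024, 6, 30)))
```

In practice the problem list would be returned to the LLM so it can correct the call. A check like this catches a wrong endpoint or an inverted date range, but it cannot tell whether the LLM has understood what the data means.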

The problem is twofold. First, the LLM starts to get confused about which endpoint to call for which piece of information and which arguments to send to it. More profoundly, when you ask queries that cross different silos, say joining data or passing retrieved data into a model to produce a prediction, it gets confused about what things really mean. This is as much a consequence of the ambiguity of human language, the medium in which LLMs work, as it is a problem with the LLM itself. (It is, after all, why math and other forms of symbolic representation and processing were invented.)


The Computable Knowledge Layer

One solution to the scaling challenge is to build a single, all-encompassing endpoint: one source of computational knowledge and data that takes care of symbolic meaning, source identification, formal representation and processing. You then give the LLM a single, flexible interface to which it can send its knowledge queries.
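To make the idea concrete, here is a minimal Python sketch of such a single endpoint, under stated assumptions: the source functions, the entity naming scheme (`type:id`) and the properties are hypothetical, and the query arrives already parsed into an entity and a property, whereas a Wolfram|Alpha-style interface would accept natural language. The point is that the LLM sees only `query`, while the layer decides which source answers and returns the value together with its unit and provenance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A fetch function takes (entity, property) and returns (value, unit).
Fetch = Callable[[str, str], Tuple[float, str]]


@dataclass(frozen=True)
class Answer:
    value: float
    unit: str
    source: str  # provenance: which backing source produced the value


# Hypothetical backing sources; the LLM never calls these directly.
def billing_db(entity: str, prop: str) -> Tuple[float, str]:
    return {("customer:42", "balance"): (120.5, "USD")}[(entity, prop)]


def inventory_db(entity: str, prop: str) -> Tuple[float, str]:
    return {("sku:W-1", "stock"): (412.0, "units")}[(entity, prop)]


class KnowledgeLayer:
    """One query interface that hides which source answers which question."""

    def __init__(self) -> None:
        # (entity type, property) -> (source name, fetch function)
        self._routes: Dict[Tuple[str, str], Tuple[str, Fetch]] = {}

    def register(self, entity_type: str, prop: str, source: str, fetch: Fetch) -> None:
        self._routes[(entity_type, prop)] = (source, fetch)

    def query(self, entity: str, prop: str) -> Answer:
        entity_type = entity.split(":", 1)[0]  # e.g. "customer:42" -> "customer"
        route = self._routes.get((entity_type, prop))
        if route is None:
            raise LookupError(f"no source can answer {prop!r} for {entity_type!r}")
        source, fetch = route
        value, unit = fetch(entity, prop)
        return Answer(value, unit, source)


layer = KnowledgeLayer()
layer.register("customer", "balance", "billing_db", billing_db)
layer.register("sku", "stock", "inventory_db", inventory_db)
print(layer.query("sku:W-1", "stock"))
# Answer(value=412.0, unit='units', source='inventory_db')
```

Adding a new database then means registering new routes in the layer rather than teaching the LLM about yet another endpoint.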

Sure, the LLM still has to call this endpoint correctly, but Wolfram has already mastered this type of challenge, once again thanks to our earlier work with Wolfram|Alpha. Wolfram|Alpha is a knowledge engine designed to be a single source of computable data drawn from private knowledge sources and ontologies (albeit originally for direct human access). It also has a natural language interface that, while far less fluent than modern LLMs, is forgiving and broad enough for an LLM to communicate with it in natural language, as LLMs naturally do, without anyone having to teach the LLM formal API codes.


Making Your Data Computable

So what is involved in getting data ready for LLM access? At a small scale, nothing. If you have a relatively narrow goal and clean data sources, you can deal with the challenges through a combination of endpoint design and prompt engineering. Indeed, we are engaged in several “add an LLM to my data” projects, drawing on database or document sources, built directly with combinations of Wolfram Language LLM-related functionality, Wolfram Chat Notebooks and deployment technologies like Wolfram Enterprise Private Cloud.

But while you are getting these “easy wins” in place, you should start preparing your data for the more ambitious “make my entire enterprise knowledge accessible to AI” projects, which will soon become a decisive competitive advantage for many organizations. This requires moving all your data toward level 10 on Wolfram’s computable data scale.

[Figure: Wolfram Scale of Data Computability]

The central idea for reaching the higher levels is to build a symbolic representation layer that captures the meaning of the data and the relationships within it. That doesn’t require an upheaval in your data capture and storage infrastructure; rather, it means adding a layer that ensures that when you retrieve a value from your data, you know, in a fully automated way, what it means, how it relates to other values and which models, calculations or visualizations can consume it.

Take a simple example: if you extract a 2 and a 3 from a database, can you perform the operation “2 + 3”? If so, what does it mean? If they represent inches and meters, we could, but the answer would not be 5. If they represent product IDs, the operation probably isn’t valid. But if they were IDs of investment portfolios, adding them might reasonably be defined to represent the combined portfolio. Doing this systematically, so that high-fidelity digital twins or predictive models can consume the data, is what unlocks the open-ended, ad hoc queries that an LLM might make.
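The 2 + 3 example can be sketched in Python by wrapping each retrieved number in a small typed record that says what it is, and defining addition only where it has a meaning. This is a hypothetical illustration: the classes, the unit table and the choice to treat portfolio addition as combining portfolios are assumptions, not Wolfram's representation (in Wolfram Language, `Quantity` objects play a similar role for units).

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Conversion factors to meters for the length units this sketch knows about.
TO_METERS = {"m": 1.0, "in": 0.0254}


@dataclass(frozen=True)
class Length:
    magnitude: float
    unit: str  # a key of TO_METERS

    def in_meters(self) -> float:
        return self.magnitude * TO_METERS[self.unit]


@dataclass(frozen=True)
class ProductID:
    value: int  # an identifier: arithmetic on it has no meaning


@dataclass(frozen=True)
class Portfolio:
    members: FrozenSet[int]  # IDs of the portfolios combined into this one


Value = Union[Length, ProductID, Portfolio]


def add(a: Value, b: Value) -> Value:
    """Add two retrieved values only where addition means something."""
    if isinstance(a, Length) and isinstance(b, Length):
        return Length(a.in_meters() + b.in_meters(), "m")
    if isinstance(a, Portfolio) and isinstance(b, Portfolio):
        # A modeling choice: "adding" portfolios yields the combined portfolio.
        return Portfolio(a.members | b.members)
    raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")


# The same raw 2 and 3 pulled from a database, interpreted three ways:
print(add(Length(2, "in"), Length(3, "m")))  # roughly 3.0508 m, not 5
print(add(Portfolio(frozenset({2})), Portfolio(frozenset({3}))))
try:
    add(ProductID(2), ProductID(3))
except TypeError as err:
    print(err)  # cannot add ProductID and ProductID
```

Raising an error for product IDs, rather than silently returning 5, is the behavior that matters here: a downstream model or LLM is told that the operation is meaningless instead of receiving a plausible but wrong number.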

In most organizations, that knowledge is supplied by humans: librarians, business intelligence (BI) teams, analysts and others in similar roles. Not only is that expensive, it is also slow, which is why most organizations have near-real-time access only to their mission-critical data. And data deemed “less critical”? It will likely languish in a queue, waiting for an analyst’s attention.


Use Wolfram to Connect Your Dots

Smart business decisions come from making connections between disparate datasets. Take a retail company looking to streamline its supply chain: they’re not just looking at sales numbers. They’re diving into customer feedback, inventory levels and market trends. This holistic view uncovers patterns and forecasts demand with increased precision. And LLMs have the potential to crunch mountains of data to find insights your people could miss. But here’s the kicker: the advantages of LLMs can easily be limited by bad or messy data. If you feed them curated, high-quality data, they’ll give you recommendations that are spot-on. But if not? Bad analysis is worse than no analysis at all.

Your solution is Wolfram technology and our data curation team, which has a decade of experience in creating computable representations of enterprise data. We’re ready to help you on the journey toward enterprise AI.

Contact Wolfram Consulting Group to learn more about using Wolfram’s tech stack and LLM tools to generate actionable business intelligence.