The secret to making language models useful

If you described your symptoms to me, a business leader, and I typed them into ChatGPT, would you want me to generate and prescribe a treatment plan for you and send orders to your local pharmacist, all without consulting a doctor?

What if you were offered a trade: The top data scientists in the world will join your organization, but with the catch that every one of your business experts must join your competitor, leaving only data to work with and no experts to provide context?

In the era of AI, the public square is filled with voices touting the opportunities, risks, threats and recommended practices for adopting generative AI — especially language models such as GPT-4 or Bard. New open-sourced models, research breakthroughs and product launches are announced daily.

In the midst of this market momentum, emphasis has been placed on the capabilities of language models, but language is only useful when paired with knowledge and understanding. If someone memorized every word in the dictionary that had to do with chemistry and could recite them without knowledge or understanding of the basic principles, that language would be useless.

Getting the recipe right

For language models, this goes a step further and can be misleading, because models can recite not only related words but also the underlying documents, frameworks, phrases and recommendations that experts have written.

When asked to generate a new recipe, for example, they can use correlations between previous recipes and descriptions to create a new recipe, but they have no knowledge of what tastes good — or even what the experience of tasting is. If there’s no correlation between mixing olive oil, ketchup and peaches in past recipes, models are unlikely to mix those ingredients — not because they have knowledge or understanding of what tastes good, but because of the lack of correlation between those ingredients in their dataset.

A good-tasting recipe generated by a language model is therefore a statistical likelihood for which we can thank the experts whose recipes were included in the original source data. Language models are powerful, and the secret ingredient to making them useful is expertise.

Expertise combines language with knowledge and understanding

The phrase “correlation does not equal causation” is well known to those who work with data. It refers to the fact that two things can be correlated without one causing the other, and that misreading the connection leads to false conclusions, such as believing that a rooster crowing in the morning commands the sun to rise.

Machines are extremely helpful in identifying correlations and patterns, but expertise is required to determine if those imply true causations and should inform decision-making (such as whether to invest in training roosters to crow an hour earlier to get an extra hour of daylight).

In the human experience of learning, language is only the first step. As a child gains language to label things, people, places, actions and more, their caregivers use it to instill knowledge. We live on a planet called Earth. That ball in the sky is called the sun. The next step is understanding cause and effect (causation or causality): The sun in the sky is making your skin feel warm. Jumping into a cold lake can cool you back down.

By the time we arrive at adulthood, we have internalized complex structures of expertise that consist of language, knowledge (what) and understanding (why).

Recreating the structure of expertise

Consider any topic. If you have language without knowledge or understanding, you’re not an expert. I know that a traditional car has a transmission, an engine that has pistons, a gas tank — I have some language about cars.

But do I have knowledge? I know that the car delivers gas to the engine through fuel injection, and there’s a reaction involving pistons firing, and that it is crucial in moving the car forward. But do I understand why? And if it stopped working, would I know how to fix it? Much to the chagrin of my high school auto shop teacher, I would need to hire an expert who understood why and had knowledge of how to fix it, learned through hands-on experience.

Translating that to a machine context, language models without knowledge (represented by knowledge graphs and models) or understanding (represented by causal models) should never make decisions, as they have no expertise. Letting a language model make a decision on its own is like giving a toolbox and access to your car to a person who has only memorized the next most likely word on everything that has to do with cars.

So how do we harness the potential of language models by recreating the structure of expertise in machines?

Start with expertise and work backwards

Machine learning (ML) and machine teaching are sub-disciplines of the field of translating human expertise to machine language so that machines can either inform human decisions or autonomously make decisions. This can free up human capacity to focus on decisions and discovery that are either too nuanced or for which there is not enough data to translate to machine language.

ML begins with the question of how to better equip machines to learn, and machine teaching begins with the question of how to better equip humans to teach machines.

The most common misconception in discussions around AI and ML is that data is the most critical element. In fact, expertise is the most critical element. Otherwise, what is the model learning? Sure, it’s identifying patterns and classifications and combing through millions of rows of data in seconds. But what makes those patterns useful?

When an expert has identified that a pattern can inform a decision that benefits the organization and its customers, that expertise can be translated into machine language, and the machine can be taught to associate that pattern with business rules and autonomously make beneficial decisions.

Therefore, the process of distilling expertise into machines does not begin with data; it begins with expertise and works backwards. One example is a machine operator who notices that certain sounds a machine makes correlate to necessary adjustments. When it makes a high-pitched whistle, for instance, the temperature needs to be turned down. On top of a full workload, the operator listens throughout the day in case the machine makes one of those sounds. There isn’t preexisting data that corresponds to this situation, but there is expertise.

Working backwards from that expertise is fairly straightforward. Install sensors that detect the sounds made by the machine, then work with the expert to correlate those sounds (frequency and decibel combinations) with the adjustments the machine needs. This process can then be offloaded to an autonomous system, freeing up the operator’s time to handle other responsibilities.
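
As a minimal sketch of that workflow, assume each sensor reading arrives as a dominant frequency (in Hz) and a loudness (in dB), and that the operator’s knowledge has been written down as a small table of rules. The names SoundRule and recommend_adjustment, and every numeric threshold, are hypothetical and chosen only for illustration; real values would come from working with the expert.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoundRule:
    """One piece of operator expertise: a sound signature and the adjustment it calls for."""
    name: str
    min_hz: float
    max_hz: float
    min_db: float
    adjustment: str


# Hypothetical rule elicited from the operator; the thresholds are illustrative only.
RULES = [
    SoundRule("high-pitched whistle", 3000.0, 8000.0, 70.0, "turn temperature down"),
]


def recommend_adjustment(freq_hz: float, level_db: float) -> Optional[str]:
    """Return the adjustment for the first matching rule, or None if no rule applies."""
    for rule in RULES:
        if rule.min_hz <= freq_hz <= rule.max_hz and level_db >= rule.min_db:
            return rule.adjustment
    return None
```

A reading that matches no rule returns None, which keeps the system from acting where the expert has not yet supplied guidance.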

Identify the most critical expertise

The process of building AI solutions should begin with the question of what expertise is most important to the organization, followed by an assessment of the level of risk associated with losing that expertise or the potential upside of offloading that expert-driven decision to a machine.

Is there only one person in the organization who understands that process or how to fix a system when it goes down? Do thousands of employees follow the same process each day that could be offloaded to an autonomous system, thus freeing up an extra 30 minutes on their daily calendars?

The third step is to assess which of the decisions associated with the highest degree of risk or potential upside could be translated to machine language. This is the step when data and tools (such as language models) come into the conversation as enablers for translating expertise into machine language and interfacing with machines.

Fortunately for most organizations, the groundwork of building expert systems has already been laid. As a starting point, language models can either reference the expertise that has been programmed into those systems or be checked against it.
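
One way to picture the “checked against” option is the sketch below, which assumes the existing expert system can be reduced to a lookup of expert-approved actions per observed condition. EXPERT_APPROVED and review_suggestion are hypothetical names, and the suggestion is passed in as a plain string rather than produced by any particular model.

```python
# Hypothetical expert-system rules: for each observed condition,
# the set of actions that experts have approved.
EXPERT_APPROVED = {
    "high-pitched whistle": {"turn temperature down"},
}


def review_suggestion(condition: str, suggested_action: str) -> str:
    """Accept a model's suggestion only when expert rules approve it; otherwise escalate."""
    approved = EXPERT_APPROVED.get(condition, set())
    if suggested_action in approved:
        return "accept"
    return "escalate to expert"
```

Under this design the language model never makes the decision alone: anything the expert rules do not cover is routed back to a person.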

Exploration to operations

In the coming decade, we will see a shake-up of market sectors based on their investments in AI.

A cautionary tale is the emergence of video on demand: Netflix introduced streaming in 2007. Blockbuster filed for bankruptcy three years later, despite having incubated and piloted Blockbuster On Demand in 1995.

By the time a competitor introduces a product or service that is sufficiently advanced with meaningful and differentiated applications of AI, it will likely be too late to pivot or “fast follow,” especially given the time and expertise required to develop robust applications.

By 2030, household names we now revere will have joined Blockbuster because they chose to fast follow, and by the time they see the market force that will be their demise, it will be too late.

Rather than planning to wait and react to competitors’ investments and developments, leaders must begin with the question of what they could achieve in the market that would require everyone else to scramble to react to them.

In the era of autonomous transformation, the organizations best positioned to retain or expand their market position will be those that invest in transferring operationalized expertise to machines. They will set a vision for the future of the market and of the value their organization can create, commission expeditions into the unknown to discover whether that vision can be realized, and rigorously distill discoveries into tangible value.

Brian Evergreen is founder of The Profitable Good Company.
