How fine-tuned LLMs power knowledge assist, summarization, and chat suggestions

In our last blog post on Ocean-1, Cresta’s foundation model for the contact center, we showed that the Ocean model can match GPT-4 in Knowledge Assist at much lower cost and latency.

This model is now live with our customers, and we have expanded the same paradigm to two other use cases: summarization and chat suggestions.

The Ocean Paradigm

In a recent paper on levels of AGI, Google DeepMind researchers defined progress in AI along two dimensions: performance and generality. For example, ChatGPT is categorized as an emerging general AI, while AlphaGo is considered a superhuman narrow AI.

In the process of building the intelligence layer for the contact center, it became clear that we needed an expert-level AI that works well for this domain – one with both expertise and knowledge.

Expertise is different from knowledge: it is situational and requires long hours of practice. This is where learning by example comes in, through either in-context learning or fine-tuning.

That’s why we built this learning paradigm for our Ocean foundation models:

Public Base Model: Start with an open-source base model (e.g. Mistral) or a partner base model (e.g. OpenAI/Anthropic)
Ocean Base Model: Instruction fine-tuning using synthetic data and partner customer data
    1. Ocean Per Industry: Instruction fine-tuning per industry
    2. Ocean Per Customer: For enterprise customers, train on instruction data generated by our automated pipeline
Task Fine-Tunes: Continue fine-tuning on task-specific data so that the model learns the situational expertise it needs to perform at its peak
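One way to picture the stages above is as a chain of checkpoints, each fine-tuned from its parent. The sketch below is only an illustration: the `Stage` class, the stage names, and the data labels are hypothetical, and chaining the per-customer model from the per-industry model is an assumption, not something this post specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One checkpoint in the fine-tuning chain (hypothetical structure)."""
    name: str
    data: str                        # label for the data used at this stage
    parent: Optional["Stage"] = None

    def lineage(self) -> List[str]:
        """Return stage names from the public base model down to this stage."""
        chain = [] if self.parent is None else self.parent.lineage()
        return chain + [self.name]


# Stage names mirror the list above; the data labels are illustrative only.
public_base = Stage("public-base", "pretraining by the model provider")
ocean_base = Stage("ocean-base", "synthetic + partner customer instruction data", public_base)
# Assumption: the per-customer model is trained on top of the per-industry model.
ocean_industry = Stage("ocean-industry", "per-industry instruction data", ocean_base)
ocean_customer = Stage("ocean-customer", "instruction data from an automated pipeline", ocean_industry)
summarization = Stage("task-summarization", "task-specific summarization data", ocean_customer)

print(" -> ".join(summarization.lineage()))
```

Keeping each stage as a separate checkpoint means a new task fine-tune can start from the most specific model available for that customer rather than from the public base model.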

Knowledge Assist

In our previous blog post, we released Ocean for Knowledge Assist and demonstrated that a fine-tuned Mistral-7B model can beat GPT-3.5, and that its MoE version can beat GPT-4. We went live with the fine-tuned Mistral model. Both models are significantly cheaper to run at inference time than the GPT models.

After going live with the Ocean model for this use case with a customer, we evaluated the model’s retrieval-augmented generation (RAG) performance using the following metrics:

Metric	Question	Scoring
Coverage	Does the response cover all key points of the reference answer?	0 – covers no points; 1 – covers some points; 2 – covers all points
Contradiction	Does the response contradict any points of the reference answer?	1 – does not contradict; 0 – contradiction
Groundedness	Did the response hallucinate, i.e. generate facts not present in the retrieved articles?	1 – no hallucination; 0 – hallucination
Answer Relevancy	How relevant is the generated answer to the question? If the model produces too many irrelevant facts, it should score lower.	1 – relevant; 0 – not relevant
Citation Validation	Did the model include citations to articles? Are these citations accurate?
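As an illustration of how rubric scores like these could be rolled up, the sketch below averages each metric over a set of judged answers and normalizes it to the range 0 to 1. The `Judgment` record, the `aggregate` function, and the sample scores are hypothetical; this post does not describe how the scores are actually combined, and Citation Validation is left out because no scale is given for it.

```python
from dataclasses import dataclass
from statistics import mean

# Maximum score per metric, taken from the rubric above.
MAX_SCORE = {"coverage": 2, "contradiction": 1, "groundedness": 1, "answer_relevancy": 1}


@dataclass
class Judgment:
    """Rubric scores for one generated answer (hypothetical record layout)."""
    coverage: int          # 0, 1, or 2
    contradiction: int     # 1 = does not contradict, 0 = contradiction
    groundedness: int      # 1 = no hallucination, 0 = hallucination
    answer_relevancy: int  # 1 = relevant, 0 = not relevant


def aggregate(judgments):
    """Average each metric over all answers, normalized to [0, 1]."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return {
        metric: mean(getattr(j, metric) for j in judgments) / top
        for metric, top in MAX_SCORE.items()
    }


# Made-up sample values, for illustration only.
sample = [
    Judgment(coverage=2, contradiction=1, groundedness=1, answer_relevancy=1),
    Judgment(coverage=1, contradiction=1, groundedness=0, answer_relevancy=1),
]
print(aggregate(sample))
```

Normalizing by the maximum score keeps the three-level Coverage metric comparable with the binary metrics.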

Summarization

In our previous blog post on Auto Summarization, we showed how custom models can save hours of after-call work: a smaller language model can deliver post-call summaries in much less time, which improves agent receptivity. Now, with a 7B base model, we can further improve summarization performance. Our new summarization model is particularly suited to the contact center domain, and because most of our deployments require the model to adapt to customer templates, it supports several summary styles:

Style 1 (default 3Rs):

Call reason:
xxx
Customer’s Name:
xxx
Resolution steps:
– xxx
– xxx
– xxx
Conversation results:
– xxx
– xxx

Style 2 (customized topic):

Customized Topic 1: xxx
Customized Topic 2: xxx
Customized Topic 3: xxx
Customized Topic 4: xxx
Customized Topic 5: xxx
…

Style 3 (customized topic + tab view):

Customized Topic 1: xxx
Customized Topic 2: xxx
Customized Topic 3: xxx
[Tab 1] [Tab 2]

Customized Topic 4: xxx
Customized Topic 5: xxx
Customized Topic 6: xxx
Customized Topic 7: xxx
[Tab 1] [Tab 2]

Style 4 (customized topic + dynamic generation):

# when refund is not mentioned
Customized Topic 1: xxx
Customized Topic 2: xxx
Customized Topic 3: xxx

# when refund is mentioned
Customized Topic 1: xxx
Customized Topic 2: xxx
Customized Topic 3: xxx
Customized Topic A: xxx
Customized Topic B: xxx
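The dynamic-generation style (Style 4) amounts to a template whose topic list depends on what came up in the conversation. The sketch below shows only that rendering step, under stated assumptions: `render_summary`, the `mentions` set, and the "N/A" fallback are hypothetical, and how topic values are extracted from the transcript is out of scope.

```python
def render_summary(values, mentions, base_topics, conditional_topics):
    """Render 'Topic: value' lines, adding extra topics only when triggered.

    values: topic name -> extracted value
    mentions: set of themes detected in the conversation (hypothetical input)
    base_topics: topics that always appear
    conditional_topics: trigger theme -> extra topics shown when it is mentioned
    """
    topics = list(base_topics)
    for trigger, extra in conditional_topics.items():
        if trigger in mentions:
            topics.extend(extra)
    # Assumption: topics without an extracted value are rendered as "N/A".
    return "\n".join(f"{topic}: {values.get(topic, 'N/A')}" for topic in topics)


BASE = ["Customized Topic 1", "Customized Topic 2", "Customized Topic 3"]
CONDITIONAL = {"refund": ["Customized Topic A", "Customized Topic B"]}
values = {topic: "xxx" for topic in BASE + CONDITIONAL["refund"]}

without_refund = render_summary(values, set(), BASE, CONDITIONAL)
with_refund = render_summary(values, {"refund"}, BASE, CONDITIONAL)
print(without_refund)
print()
print(with_refund)
```

Keeping the trigger-to-topic mapping as data lets each customer template add or remove conditional sections without changing the rendering code.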

Here are a few observations about our new summarization model.

1. Our summary prioritizes customer privacy and meets high standards of sensitive data compliance.

Summary topic	ChatGPT	Cresta Summary
Reason for delinquency	Customer is going through cancer treatment	Medical

2. Our summary is more concise, allowing agents to digest the entire summary quickly

Summary topic	ChatGPT	Cresta Summary
Customer’s complaints	Customer was mistakenly charged for three nights instead of two and was not provided with information regarding the hotel’s policy on group reservations. Additionally, they were not told about the charges and policies prior to checking in.	Charged for three nights, not two. Unaware of hotel’s group reservation policy. Uninformed about charges and policies prior to check-in.

3. Our model can continuously improve based on edits/feedback made by agents

Summary topic	Possible values from initial summary	Possible values from summary after feedback loop
Administrative fee	Administrative fee: $ xx; Administrative fee: N/A (if the specific dollar amount is not mentioned, it outputs N/A)	Administrative fee: $ xx; Administrative fee: 5% of the loan amount; Administrative fee: N/A (it learns to use the percentage of the loan amount when the specific dollar amount is not mentioned)
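A feedback loop of this kind could be fed by turning agent edits into new training examples. The sketch below is a minimal illustration under stated assumptions: the `SummaryEdit` record, the `build_training_examples` function, and the sample transcript are hypothetical and do not describe Cresta's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class SummaryEdit:
    """A model summary and the agent's final version (hypothetical record)."""
    transcript: str
    model_summary: str
    agent_summary: str


def build_training_examples(edits):
    """Keep only summaries the agent changed, using the agent's text as target."""
    examples = []
    for edit in edits:
        final = edit.agent_summary.strip()
        if final != edit.model_summary.strip():
            examples.append({"input": edit.transcript, "target": final})
    return examples


# Made-up records, for illustration only.
edits = [
    SummaryEdit(
        transcript="Agent: The administrative fee is 5% of the loan amount.",
        model_summary="Administrative fee: N/A",
        agent_summary="Administrative fee: 5% of the loan amount",
    ),
    SummaryEdit(
        transcript="Agent: The administrative fee is $ xx.",
        model_summary="Administrative fee: $ xx",
        agent_summary="Administrative fee: $ xx ",  # unchanged apart from whitespace
    ),
]
examples = build_training_examples(edits)
print(examples)
```

Filtering out unchanged summaries keeps the new training data focused on cases where the model's output was actually corrected.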

Auto Compose (Chat)

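Auto Compose is a great use case for fine-tuning Ocean models, because we can leverage the large volume of available transcripts to derive a robust conversational model. The new Mistral-based fine-tune raises the BLEURT score over the previous production model from 0.4340 to 0.4925.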

Model	BLEURT Score	% time is better
Previous Production Model	0.4340	10.88%
New Mistral-Based Fine-Tune	0.4925	5.79%

Going Forward

We believe that a treasure trove of expertise sits inside our customers’ data. As an intelligence layer for customer conversations, Cresta integrates with hundreds of systems to build a unified understanding of expertise from this private data, in a highly secure, responsible way (read more about our commitment to responsible AI). We plan to bring this paradigm to more contact center workflows and supercharge coaching, assistance, and automation with a collection of expert models.