In our last blog post on Ocean-1, Cresta’s foundation model for the contact center, we showed that the Ocean model matches GPT-4 on Knowledge Assist at much lower cost and latency. That model is now live with our customers, and we have expanded the same paradigm to two other use cases: summarization and chat suggestions.
The Ocean Paradigm
In a recent paper on levels of AGI, researchers at Google DeepMind defined progress in AI along two dimensions: performance and generality. For example, ChatGPT is categorized as an emerging general AI, while AlphaGo is considered a superhuman narrow AI.
As we built the intelligence layer for the contact center, it became clear that this domain needs an expert-level AI: one with both knowledge and expertise. Expertise is different from knowledge; it is situational and comes from long hours of practice. This is where learning by example comes in, either through in-context learning or fine-tuning.
That’s why we built this learning paradigm for our Ocean foundation models (a code sketch of the staged pipeline follows the list):
- Public Base Model: Start with an open-source base model (e.g. Mistral) or a partner base model (e.g. OpenAI/Anthropic)
- Ocean Base Model: Instruction fine-tuning using synthetic data and partner customer data
- Ocean Per Industry: Instruction fine-tuning per industry
- Ocean Per Customer: For enterprise customers, train on instruction data generated by our automated pipeline
- Task Fine-tunes: Continue fine-tuning on task-specific data so that the model learns the situational expertise needed to perform at its peak
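To make the stages concrete, here is a minimal sketch of how such a pipeline could be wired together. The `fine_tune` helper and the stage dataset names are hypothetical placeholders, not Cresta’s actual training code:

```python
# Hypothetical sketch of a staged fine-tuning pipeline (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

def fine_tune(model, tokenizer, dataset_name: str):
    """Placeholder for one instruction fine-tuning stage on `dataset_name`.

    In practice this would run a Trainer (or a PEFT/LoRA loop) over the
    instruction pairs in the dataset and return the updated model.
    """
    print(f"fine-tuning on {dataset_name}")
    return model

base = "mistralai/Mistral-7B-v0.1"  # public base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Each stage continues from the previous stage's checkpoint.
for stage in [
    "synthetic_and_partner_instructions",  # -> Ocean Base Model
    "industry_instructions",               # -> Ocean Per Industry
    "customer_pipeline_instructions",      # -> Ocean Per Customer
    "task_specific_data",                  # -> Task Fine-tunes
]:
    model = fine_tune(model, tokenizer, stage)
```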
Knowledge Assist
In our previous blog post, we released Ocean for Knowledge Assist and demonstrated that a fine-tuned Mistral-7B model can beat GPT-3.5, and that its MoE version can beat GPT-4. We went live with the fine-tuned Mistral model. Both models are significantly cheaper to run inference with than the GPT models.
After going live with the Ocean model for this use case with a customer, we evaluated the model’s retrieval-augmented generation (RAG) performance using the following standard metrics:
| Metric | Description | Scoring |
| --- | --- | --- |
| Coverage | Does the response cover all key points of the reference answer? | 0 – covers no points<br>1 – covers some points<br>2 – covers all points |
| Contradiction | Does the response contradict any points of the reference answer? | 1 – no contradiction<br>0 – contradiction |
| Groundedness | Did the response hallucinate, i.e. generate facts not present in the retrieved articles? | 1 – no hallucination<br>0 – hallucination |
| Answer Relevancy | How relevant is the generated answer to the question? If the model produces too many irrelevant facts, it should score lower. | 1 – relevant<br>0 – not relevant |
| Citation Validation | Did the model include citations to articles? Are these citations accurate? | |
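To make the rubric concrete, it can be encoded as a small scoring record. The heuristic below is a naive stand-in for illustration only; in practice, grading like this is typically done by human reviewers or an LLM-as-judge, and every name here is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class RagScores:
    coverage: int       # 0 = covers no key points, 1 = some, 2 = all
    contradiction: int  # 1 = no contradiction, 0 = contradicts reference
    groundedness: int   # 1 = no hallucination, 0 = ungrounded facts
    relevancy: int      # 1 = relevant to the question, 0 = not relevant

def score_coverage(response: str, key_points: list[str]) -> int:
    """Naive substring-based coverage check (real graders are more robust)."""
    hits = sum(point.lower() in response.lower() for point in key_points)
    if hits == len(key_points):
        return 2
    return 1 if hits else 0

# Example: both key points are mentioned, so coverage scores 2.
print(score_coverage(
    "Refunds take 5-7 business days and require the original receipt.",
    ["5-7 business days", "original receipt"],
))
```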
Summarization
In our previous blog post on Auto Summarization, we showed how custom models can save hours of after-call work. A smaller language model can deliver a post-call summary in much less time, improving agent receptivity. Now, with a 7B base model, we can scale summarization performance even further. Our new summarization model is tailored to the contact center domain, and most of our deployments require the model to adapt to various customer templates:
| Style | Template |
| --- | --- |
| Style 1 (default 3Rs) | Call reason: xxx<br>Customer’s Name: xxx<br>Resolution steps:<br>– xxx<br>– xxx<br>– xxx<br>Conversation results:<br>– xxx<br>– xxx |
| Style 2 (customized topic) | Customized Topic 1: xxx<br>Customized Topic 2: xxx<br>Customized Topic 3: xxx<br>Customized Topic 4: xxx<br>Customized Topic 5: xxx<br>… |
| Style 3 (customized topic + tab view) | Customized Topic 1: xxx<br>Customized Topic 2: xxx<br>Customized Topic 3: xxx<br>[Tab 1] [Tab 2]<br>Customized Topic 4: xxx |
| Style 4 (customized topic + dynamic generation) | # when refund is not mentioned<br>Customized Topic 1: xxx<br>Customized Topic 2: xxx<br>Customized Topic 3: xxx<br># when refund is mentioned |
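One way to serve these per-customer formats is to render each customer’s template spec into the summarization instruction at request time. The sketch below is our simplified illustration of that idea, with hypothetical names and prompt wording, not Cresta’s production prompt:

```python
# Hypothetical sketch: render a customer's template into a summarization prompt.
def build_summary_prompt(transcript: str, topics: list[str]) -> str:
    fields = "\n".join(f"{topic}: ..." for topic in topics)
    return (
        "Summarize the following contact-center conversation.\n"
        f"Fill in exactly this template:\n{fields}\n\n"
        f"Conversation:\n{transcript}"
    )

print(build_summary_prompt(
    "Agent: Hi, how can I help? ...",
    ["Call reason", "Customer's Name", "Resolution steps"],
))
```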
Here are a few observations about our new summarization model.
1. Our summary prioritizes customer privacy and meets high standards of sensitive data compliance.
| Summary topic | ChatGPT | Cresta Summary |
| --- | --- | --- |
| Reason for delinquency | Customer is going through cancer treatment | Medical |
2. Our summary is more concise, allowing agents to digest the entire summary quickly
| Summary topic | ChatGPT | Cresta Summary |
| --- | --- | --- |
| Customer’s complaints | Customer was mistakenly charged for three nights instead of two and was not provided with information regarding the hotel’s policy on group reservations. Additionally, they were not told about the charges and policies prior to checking in. | |
3. Our model can continuously improve based on edits/feedback made by agents
| Summary topic | Possible values from initial summary | Possible values from summary after feedback loop |
| --- | --- | --- |
| Administrative fee | Administrative fee: $xx<br>Administrative fee: N/A<br>(If the specific dollar amount is not mentioned, it will output N/A.) | Administrative fee: $xx<br>Administrative fee: 5% of the loan amount<br>Administrative fee: N/A<br>(It learns to use the percentage of the loan amount when the specific dollar amount is not mentioned.) |
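A plausible shape for this feedback loop is to turn agent edits into new supervision pairs for the next fine-tuning round. The sketch below is our illustration of that idea; the types and field names are hypothetical, not Cresta’s pipeline:

```python
# Hypothetical sketch: convert agent edits into fine-tuning pairs.
from dataclasses import dataclass

@dataclass
class SummaryEdit:
    transcript: str     # the conversation that was summarized
    model_summary: str  # what the model originally produced
    agent_summary: str  # the summary after the agent's edits

def to_training_pairs(edits: list[SummaryEdit]) -> list[dict]:
    """Keep only genuinely edited summaries as (prompt, target) pairs."""
    return [
        {"prompt": e.transcript, "target": e.agent_summary}
        for e in edits
        if e.agent_summary != e.model_summary
    ]
```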
Auto Compose (Chat)
Auto Compose is a great use case for fine-tuning Ocean models: we can leverage the large volume of chat transcripts available to derive a robust conversational model.
| Model | BLEURT Score | % time is better |
| --- | --- | --- |
| Previous Production Model | 0.4340 | 10.88% |
| New Mistral-Based Fine-tune | 0.4925 | 5.79% |
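For reference, BLEURT scores like the ones above can be computed with the Hugging Face `evaluate` wrapper around Google’s learned metric. The example texts below are invented, and the BLEURT package has to be installed separately; this is a minimal sketch, not Cresta’s evaluation harness:

```python
# Minimal sketch of scoring suggestions with BLEURT via `evaluate`.
# Requires the BLEURT package (installed from google-research/bleurt).
import evaluate

bleurt = evaluate.load("bleurt")

# Each pair: the model's suggested reply vs. what the agent actually sent.
predictions = ["I can help you update your billing address."]
references = ["Sure, I can update the billing address on your account."]

scores = bleurt.compute(predictions=predictions, references=references)["scores"]
print(sum(scores) / len(scores))  # mean BLEURT over the eval set
```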
Going Forward
We believe that a treasure trove of expertise sits inside our customers’ data. As an intelligence layer for customer conversations, Cresta integrates into hundreds of systems to build a unified understanding of expertise from this private data – in a highly secure, responsible way (read more about our commitment to responsible AI here). We plan to bring this paradigm to more contact center workflows and supercharge coaching, assistance, and automation with a collection of expert models.