Cohere vs. OpenAI in the Enterprise: Which Will CIOs Choose?

As generative AI moves into the enterprise, a company founded by ex-Googlers aims to outperform Microsoft-backed OpenAI.

Mar 6th, 2023 6:36am by Richard MacManus

OpenAI has just announced an enterprise version of its popular generative AI product, ChatGPT. But in this case, OpenAI is a fast follower, not the first to market. Cohere, a Toronto-based company with close ties to Google, is already bringing generative AI to businesses.

I spoke with Cohere’s President and COO, Martin Kon, about how its machine learning models are being used within enterprise companies.

Cohere is only a few years old, but it has an impressive pedigree. Two of Cohere’s founders worked in the recent past for Google Brain, which kickstarted the current craze around generative AI. In 2017, Google Brain introduced the “transformer” model for Natural Language Processing (NLP) — the ‘T’ in ChatGPT. Aidan Gomez and Nick Frosst, the CEO and CTO respectively of Cohere, then teamed up with Ivan Zhang to commercialize this form of NLP at Cohere.

Martin Kon is brand new to the company, having started just last month. But like the founders, he also has Google ties, having worked for YouTube for six years prior to joining Cohere. He was brought on board to run the business operations side of Cohere — and business, it seems, is booming.

According to Kon, Cohere has experienced a “65% month-on-month growth over the past year in API calls [and] similar in number of developers.”
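To get a feel for what that growth rate implies, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the 65% rate held steady for each of 12 consecutive months, which Kon's quote does not spell out; the variable names are my own.

```python
# Assumption (not stated by Cohere): a constant 65% month-on-month
# growth rate sustained for 12 consecutive months.
monthly_growth = 0.65
months = 12

# Compound growth: multiply by (1 + rate) once per month.
growth_factor = (1 + monthly_growth) ** months

print(f"Cumulative growth over {months} months: about {growth_factor:.0f}x")
```

Under that assumption, API calls would have multiplied roughly 400-fold over the year, which is why compounding monthly figures can sound modest while describing very steep growth.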

[Figure: Cohere’s Playground]

Now that it has traction, Cohere has switched focus to bringing its large language models and associated tooling to the enterprise.

“We’re working with developers in organizations, the AI/ML teams, to bring these capabilities into their organizations,” said Kon. He claims that its approach is fundamentally different to OpenAI’s.

“OpenAI wants you to bring your data to their models, exclusive to Azure. Cohere wants to bring our models to your data, in whatever environment you feel comfortable in.”

Technical Comparisons of Cohere to OpenAI

Cohere has two types of LLM (large language model): generation and representation. The former is what ChatGPT does, the latter is for understanding language (for example, to do sentiment analysis). Each type comes in different sizes: small, medium, large, and xlarge. There are various tradeoffs between the size of the model and the speed it can work at.

Cohere’s base model has 52 billion parameters, based on the Stanford HELM rankings (Holistic Evaluation of Language Models). Stanford’s HELM website notes that this is for the “xlarge” version of Cohere’s model, the largest version. OpenAI’s GPT-3 davinci model, its largest, is listed by Stanford as having 175B parameters.

[Figure: List of Cohere’s models in Stanford HELM directory]

[Figure: The primary models of OpenAI]

During our conversation, Kon said that Cohere’s models had been shown to test better than GPT-3. I asked the company for verification of this and it responded by pointing me to Stanford’s accuracy measurements. According to Cohere, “the study shows that the Cohere xlarge model achieves higher accuracy than a number of well-known models which are 3x larger, including GPT-3, Jurassic-1 Jumbo, and BLOOM (each of which has about 175B parameters).”
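The “3x larger” figure can be checked against the parameter counts quoted above. This is a minimal sketch using only the two numbers listed by Stanford HELM as reported in this article; the variable names are illustrative.

```python
# Parameter counts as listed by Stanford HELM, per the figures quoted above.
cohere_xlarge_params = 52e9    # Cohere xlarge
gpt3_davinci_params = 175e9    # OpenAI GPT-3 davinci

# How many times larger is GPT-3 davinci than Cohere xlarge?
size_ratio = gpt3_davinci_params / cohere_xlarge_params

print(f"GPT-3 davinci is about {size_ratio:.1f}x the size of Cohere xlarge")
```

The ratio comes out a little above three, so Cohere’s “3x larger” description is, if anything, slightly conservative.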

However, it should be noted that Cohere’s model is ahead only of the original GPT-3 models. OpenAI’s more recent GPT-3.5 models, text-davinci-002 and text-davinci-003, are both rated higher than Cohere’s model in accuracy. Indeed, these are currently ranked highest of all models by the HELM accuracy measure (see below).

[Figure: Stanford HELM tests for accuracy of ML models]

Kon told me that Cohere’s latest model, Command (currently in beta), gets re-tuned every week. “This means that every week, you can expect the performance of command to improve,” he said.

According to the documentation, Command is “a generative model that responds well with instruction-like prompts.” Davinci, as a point of comparison, is described by OpenAI as being good at “complex intent, cause and effect, summarization for audience.”

Enterprise Use Cases

The earliest use cases for generative AI have been based on content generation and summarization: Stable Diffusion’s image generator, ChatGPT’s conversational search engine, GitHub’s Copilot code generator, and so forth. So I asked Kon what other use cases there are for Cohere’s technology, particularly for enterprise companies.

Firstly, he said, companies are using it to create a kind of semantic search engine for their own private data.

“Bringing semantic search — contextual search — into private environments, like the information inside an organization, in a similar way to what we’re used to with Google search,” said Kon. “So that enables companies, especially multinational companies, to search through or classify or look for sentiment across every single piece of material they have internally — every document, every customer call transcript, every video chat transcript, emails, documents, etc.”
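To illustrate the general idea behind semantic search, here is a minimal Python sketch. It is not Cohere’s API or implementation: in a real system a representation model would turn each document and query into an embedding vector, whereas the three-dimensional vectors, file names, and the `search` helper below are hypothetical and hand-written so the example runs on its own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: in practice these would come from a
# representation model, not be typed in by hand.
document_vectors = {
    "returns_policy.txt": [0.9, 0.1, 0.0],
    "bolivia_sales_report.txt": [0.1, 0.9, 0.2],
    "customer_call_transcript.txt": [0.2, 0.3, 0.9],
}

# Hypothetical embedding of a query such as "How do I return a TV?"
query_vector = [0.8, 0.2, 0.1]

def search(query, documents):
    """Rank documents by similarity of their embedding to the query."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in documents.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranked = search(query_vector, document_vectors)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The point of the design is that documents are matched by meaning, as captured in the vectors, rather than by shared keywords, so a query about returning a TV can surface the returns policy even if the wording differs.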

He explained that an organization typically adds its own data to one of Cohere’s base models. This would be “much smaller amounts of much higher quality data, generally human annotated reinforcement learning,” he said (and “smaller” here simply means relative to the billions of parameters in the base model). On top of all that, there is a “dialogue” layer, or conversational layer, like ChatGPT. Cohere’s dialogue model is in internal beta, he added.

I had earlier asked what might be an example use case for a big retailer, and Kon returned to that question here.

“Let’s say you’re a retailer and you want to [ask] how is our business doing in Bolivia? It [the AI] can then say, here are the latest sales results that are pulled out from Spanish, or whatever. No, [you say], I meant the wholesale business. Okay, [it replies] let me pull some different things from somewhere else. And so you’re basically having a conversation. You’re accessing [that data] in a very safe way, because no one externally can see it — you’re not feeding it into ChatGPT, which is kind of what everyone’s been doing.”

He ran me through a couple of other examples, including one from the point of view of a customer of a retailer. Say you bought a TV and want to return it; you could have a dialogue with the retailer’s AI, to check in real-time what the current returns policy is.

Google Partnership

Running machine learning workloads, especially ones with billions of parameters, is extremely hardware-intensive. So I asked Kon how Cohere is handling that: does it rely largely on in-house hardware, or has it partnered with any of the platform companies?

Unsurprisingly, given the background of the founders and of Kon himself, Cohere has partnered with Google on hardware.

“We have a strategic relationship right now with Google,” said Kon. “So I believe we were the first — and still the largest — consumer of TPUs [Tensor Processing Unit] outside of Google itself. So we have access to, and the funding to be able to afford, these enormous compute resources that we need to pre-train. The models take four to six weeks to train — the base models. Our command models, we can do nightly because it’s a much smaller amount of data — but really specific data — so we do those every week, but it takes about a night to train.”

Of course, Google itself is also in the generative AI game. It has several obscurely named ML models: T5, UL2, Flan-T5, and PaLM. So, just as with cloud computing platforms, the market for enterprise AI is turning into a “horses for courses” situation. Some enterprise customers will be Microsoft shops, and thus might lean towards OpenAI. Others will be Google Cloud customers. But many will not want to be tied to a single cloud platform, and that’s where Cohere most appeals.

There are clearly huge opportunities for enterprise companies to get a jump on their competitors with generative AI technology. As for the providers of generative AI, including Cohere and OpenAI, I can’t think of a more exciting enterprise IT category to watch this year. It’s game on!

Richard MacManus is a Senior Editor at The New Stack and writes about web and application development trends. Previously he founded ReadWriteWeb in 2003 and built it into one of the world’s most influential technology news sites. From the early...