There’s a new search algorithm in town. It’s called SMITH.
You might not have heard about it. That’s understandable as it’s not even in use yet.
But it might be coming to a search engine near you. That’s because a new study by Google shows that it’s superior to the BERT algorithm that you may already know about.
In this article, I’ll go over the differences between the SMITH and BERT algorithms and explain why Google says SMITH beats BERT.
But First, a Warning
Although I’ve made it clear that Google thinks SMITH is better than BERT, the company won’t say if its current search algorithm uses SMITH at all.
In other words, the SMITH algorithm might remain on the shelf. For a while, anyway.
Still, it’s a great idea to understand what’s going on with this new algo because odds are better than even money that one day Google will use it to return search results.
What Is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers. Any questions?
Seriously, Google uses BERT for natural language processing. It helps the search software better understand online documents so it can rank them according to a specific query.
It works great right now. But there’s a problem.
You see, BERT works best with short text. Not long-form content.
It’s limited to handling only a few sentences, or perhaps a paragraph, because of the “quadratic computational complexity of self-attention with respect to input length.”
Truth be told: all you really need to get out of that is that BERT is best for processing short documents. Not long ones.
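If you want a feel for why that quadratic limit bites, here’s a toy sketch. This is my own illustration in plain NumPy with random vectors, not anything from Google’s code: the point is simply that every token attends to every other token, so the attention matrix grows with the square of the input length.

```python
import numpy as np

def attention_scores(x):
    """Toy self-attention score matrix for a (seq_len, dim) input."""
    # Every token attends to every other token, so this matrix
    # is seq_len x seq_len -- quadratic in the input length.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
short_doc = attention_scores(rng.random((512, 64)))
long_doc = attention_scores(rng.random((2048, 64)))
print(short_doc.shape)  # (512, 512)
print(long_doc.shape)   # (2048, 2048): 4x the tokens, 16x the work
```

Quadruple the document length and you get sixteen times the attention work. That’s the wall BERT hits.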
And so Google is looking at another algorithm: the Siamese Multi-depth Transformer-based Hierarchical Encoder (SMITH).
What Is SMITH?
The SMITH algorithm enables Google to understand entire documents as opposed to just brief sentences or paragraphs.
Here it is in a nutshell: while BERT tries to understand words within sentences, SMITH tries to understand sentences within documents.
To do that, it uses a hierarchical approach: it encodes sentences first, then combines those sentence representations to understand the whole document.
And, according to Google, it’s already better suited to handling long-form documents.
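That hierarchical idea can be sketched in a few lines. To be clear, this toy code is my own illustration, not Google’s implementation: the mean-pooling “encoders” below stand in for the sentence-level and document-level Transformers the paper describes.

```python
import numpy as np

def encode_sentence(token_vecs):
    # Stand-in for a sentence-level Transformer: mean-pool the
    # token vectors of one sentence into a single vector.
    return token_vecs.mean(axis=0)

def encode_document(sentences):
    # Stage 1: encode each sentence independently. Each block is
    # short, so per-block attention would stay cheap.
    sent_vecs = np.stack([encode_sentence(s) for s in sentences])
    # Stage 2: a document-level encoder combines the sentence
    # vectors into one document representation (again mean-pooled
    # here as a stand-in).
    return sent_vecs.mean(axis=0)

rng = np.random.default_rng(0)
doc = [rng.random((10, 64)) for _ in range(30)]  # 30 toy sentences
doc_vec = encode_document(doc)
print(doc_vec.shape)  # (64,)
```

The payoff of the two-stage design: no single attention pass ever sees the full document at once, which is how the model sidesteps the quadratic cost.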
The research shows that “experimental results on several benchmark data for long-form text matching… show that our proposed SMITH model outperforms the previous state-of-the-art models and increases the maximum input text length from 512 to 2048 when comparing with BERT based baselines.”
In other words, SMITH can do what BERT can’t do.
But hold on, that doesn’t mean Google will replace BERT with SMITH. That’s not how this works.
Instead, Google will use SMITH to supplement BERT. They’ll work together to fully understand document content.
There’s another great benefit to the SMITH algorithm, though: it helps with long-form content and the long-tail queries that content answers.
According to the research, semantic matching between long documents “is less explored.”
That’s exactly the problem they’re solving with SMITH.
How It Works
In this section, I’ll cover how the SMITH algorithm works. If you’re a true hardcore nerd, read on.
First, I’ll go over the important concept of pre-training. That’s when the algorithm is first trained on a large data set so it can learn the patterns of language.
For example, take the sentence: “To be or not to be, that is the ____.” What comes next?
The algorithm gets trained to understand that “question” comes next.
Repeat that process over and over again countless times with different phrases and sentences and eventually the algorithm becomes fairly smart.
But the example I shared above only masks out a single word. What would happen if I masked out a sentence in the middle of a paragraph?
Well, that’s what SMITH handles.
BERT uses masked word prediction. SMITH uses masked sentence prediction.
That’s yet another reason why SMITH is better. It can predict blocks of sentences.
Read that again: SMITH can predict blocks of sentences.
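The two pre-training tasks can be sketched side by side. This is another illustrative toy with helper names I made up, not anything from the paper: masked word prediction hides one token and asks the model to recover it; masked sentence prediction hides a whole sentence inside the document.

```python
def mask_word(tokens, idx):
    # BERT-style pre-training example: hide one token and keep it
    # as the prediction target.
    masked = list(tokens)
    target = masked[idx]
    masked[idx] = "[MASK]"
    return masked, target

def mask_sentence(sentences, idx):
    # SMITH-style pre-training example: hide a whole sentence
    # within the document and keep it as the target.
    masked = list(sentences)
    target = masked[idx]
    masked[idx] = "[MASK]"
    return masked, target

tokens = "to be or not to be that is the question".split()
masked, target = mask_word(tokens, 9)
print(target)  # question
```

Same trick, bigger blank. Training a model to fill in whole missing sentences forces it to learn how sentences relate across an entire document, not just how words relate within a sentence.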
Welcome to the 21st century.
Where It’s Going
I think that you’ll see SMITH in Google’s search algorithm one day. But that’s just my opinion.
Google will make its own decision as to whether SMITH becomes part of the ranking algo.
However, given the positive reviews from Google’s own research, the current limitations of BERT, and the fact that strategists love to use long-form content, I’d be very surprised if we didn’t see SMITH in use in the near future.
It’s also worth noting what Google doesn’t say. It doesn’t say that “more research is needed.”
That’s a pretty big tell right there.