A simple guide on how search engines work

Search engines are a staple of the digital landscape. Whether it's Google, Bing, Yandex or another engine, many of us use one many times a day.

Feb 14, 2022

Technology

Peter Lambrou, Sitecore Optimisation Consultant MVP Strategist

Most people don't even think about the mechanics behind search results. But if you run a business with an online presence, or work in one, then understanding SEO and its inner workings can be of benefit.

In essence, the 'nuts and bolts' of search engines involve six key elements:

Crawling
Indexing
Rendering
Algorithms
Machine learning
User intent

1. Crawling

A web crawler, or spider, is a bot that systematically crawls the Internet in order to index websites. It's that simple!

You may have heard the term 'crawl budget'. A crawl budget is the amount of time and resources Google allocates to crawling a website. It is based on many factors, but two are central:

Server speed: Essentially, the faster the server, the more Google can crawl a website without affecting the user experience.
Site importance: Websites with regularly updated content, like news sites, get crawled more frequently so that the news content in the search index stays up to date. By contrast, a website for a nail salon, for instance, will have a lower crawl budget because it is deemed less important than the news site. It may sound unfair, but there is logic behind crawl budgeting.
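
To make the idea concrete, here is a minimal sketch of a crawler that follows links until its budget runs out. It is an illustration only: the `SITE` dictionary stands in for real web requests, and the `budget` parameter is a hypothetical page-count cap, whereas a real crawl budget is about time and resources.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start, budget):
    """Breadth-first crawl over an in-memory site, stopping when the budget is spent.

    `pages` maps a URL to its HTML and stands in for real HTTP fetches.
    `budget` is a hypothetical cap on the number of pages fetched.
    """
    seen = {start}
    queue = deque([start])
    crawled = []
    while queue and len(crawled) < budget:
        url = queue.popleft()
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to known pages that have not been queued yet.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# A tiny hypothetical site: the home page links to two pages, one of which links deeper.
SITE = {
    "/": '<a href="/news">News</a> <a href="/about">About</a>',
    "/news": '<a href="/news/today">Today</a>',
    "/about": "",
    "/news/today": "",
}

print(crawl(SITE, "/", budget=3))  # ['/', '/news', '/about']
```

With a budget of three, the deepest page is never reached, which is why a small budget can leave parts of a site undiscovered.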

2. Indexing

Indexing is essentially the adding of a web page's content to a search engine's index. If a web page isn't indexed, it can't be found via that search engine. And if search engines can't find it, neither can potential visitors.
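
A common way to picture an index is as a lookup from words to the pages that contain them (an 'inverted index'). The sketch below is a simplified illustration, not how any particular search engine stores its data; the page URLs and text are made up.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)


# Hypothetical pages and their text content.
PAGES = {
    "/nails": "gel nail salon prices",
    "/news": "local salon opens today",
}

index = build_index(PAGES)
print(search(index, "salon"))       # ['/nails', '/news']
print(search(index, "nail salon"))  # ['/nails']
```

A page that was never added to `PAGES` can never appear in the results, which mirrors the point above: no index entry, no visitors from search.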

There are a few ways a web page can be indexed after it's created:

Let the crawlers do the work: Search engine crawlers do the hard work by following links. As long as your website is already indexed and the new content is linked from it, new web pages will eventually be found and added to the index. This can also happen when other sites link to your page.
XML sitemaps: An XML sitemap lists, in XML format, the pages you want the search engine to index. As well as listing all the pages in your website, it can show additional details such as modification dates. The XML sitemap is submitted via Search Console (Google) and/or Webmaster Tools (Bing/Yandex).
Index request: If you have a web page of high importance, for example a customer notice of a recall, then requesting indexing speeds up the process. It's the best hope of an immediate index! The 'Request Index' can be done via Search Console (Google) and/or Webmaster Tools (Bing).
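
As an illustration of the sitemap option, the sketch below builds a small XML sitemap with Python's standard library. The `urlset`, `url`, `loc` and `lastmod` elements and the namespace come from the sitemaps.org protocol; the `build_sitemap` helper and the example URL are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for (url, last_modified) pairs; dates are YYYY-MM-DD strings."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page and modification date.
print(build_sitemap([("https://www.example.com/", "2022-02-14")]))
```

The resulting file is what you would then submit through Search Console or Webmaster Tools.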

3. Rendering

Rendering is the process of turning a page's code into the content and layout we see when we visit a web page. It's also what search engines look at to determine the context of the user experience.

There are many parameters that search engines examine during rendering that determine how a web page should be ranked. For example, is there content hidden behind a link? Are there any ads within the web page? Does the page load slowly?

Rendering happens after indexing, and it can lag behind by several weeks. This rendering latency occurs because web pages (more often than not) use more than just simple HTML.

Websites that rely on JavaScript to generate their content take longer to render, which means that search engines will only know the full page content once the Web Rendering Service has finished processing the page.

The rendering life-cycle of a web page is:

Discovered via sitemap, crawler, etc.
Added to the list of pages to be crawled when crawl budget is available
Content is crawled and indexed
Added to list of pages to be rendered when the rendering budget is available
Rendered
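
The life-cycle above can be pictured as a page moving through a fixed sequence of stages. The sketch below is only a way to visualise that order; the `Stage` names and the `advance` function are invented for this example and do not reflect any search engine's internals.

```python
from enum import Enum


class Stage(Enum):
    """The five life-cycle steps, in order (names are illustrative)."""
    DISCOVERED = 1
    QUEUED_FOR_CRAWL = 2
    CRAWLED_AND_INDEXED = 3
    QUEUED_FOR_RENDER = 4
    RENDERED = 5


def advance(stage):
    """Move a page to the next stage; a rendered page stays rendered."""
    if stage is Stage.RENDERED:
        return stage
    return Stage(stage.value + 1)


stage = Stage.DISCOVERED
while stage is not Stage.RENDERED:
    print(stage.name)
    stage = advance(stage)
print(stage.name)
```

Note that the page is already indexed at the third stage, before it has been rendered at all.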

4. Algorithms

Trawling through billions of web pages to meet a search query requires some complex programming and powerful algorithms. And search engines love algorithms. But what are they? The Oxford dictionary defines them as:

"a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer."

When it comes to search, Google's search engine algorithm, for instance, is defined as:

"the internal process that Google uses to rank content. It takes a number of factors into account when determining these rankings, such as the relevance and quality of the content against a particular search query."

So when you hit enter after typing your search query, a whole series of signals is triggered. The algorithms start to compute the search engine ranking factors that determine the search results, which include (but are not limited to):

The intent of the search query
Keywords
Relevance
Usability of web pages
Expertise of sources
Language
Location
Spellings
Query categorisation
Synonyms
Page significance (click through measurement and click through rates)
And more...

The core process that will determine the ranking of a web page can be summarised in five simple steps:

Classification
Context
Weight (importance)
Layout
Rank
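
To give a feel for the 'weight' and 'rank' steps, here is a minimal sketch that combines a few signals into one score and sorts pages by it. The signal names, weights and values are all made up; real search engines use far more factors and do not publish their weights.

```python
# Hypothetical weights; real search engines do not publish theirs.
WEIGHTS = {"relevance": 0.5, "usability": 0.2, "expertise": 0.3}


def score(signals):
    """Weighted sum of signal values, each expected in the range 0..1."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def rank(pages):
    """Return page URLs ordered from highest to lowest score."""
    return sorted(pages, key=lambda url: score(pages[url]), reverse=True)


# Hypothetical signal values for two pages.
PAGES = {
    "/a": {"relevance": 0.9, "usability": 0.5, "expertise": 0.2},
    "/b": {"relevance": 0.6, "usability": 0.9, "expertise": 0.9},
}

print(rank(PAGES))  # ['/b', '/a']
```

Here "/a" is the more relevant page, yet "/b" ranks first because it is stronger on the other two signals, which shows why no single factor decides a ranking.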

Search engine algorithms are constantly changing, so it's important to stay up to speed with the latest updates, as they could impact your SEO activities and your site's performance in search results.

5. Machine learning

Machine learning is one of the most significant technological advancements in recent years. Wikipedia defines machine learning as:

"Machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence."

Over the years, the influence of machine learning in search has increased. For instance, Google's machine learning program, RankBrain, has revolutionised how search results are determined. It does this by:

Continuously learning about the 'connectedness' of entities and their relationships
Understanding when words are synonyms and when they're not
Instructing other portions of the algorithm to produce the correct search engine results page (SERP)

In search there are two machine learning models:

Supervised model: Uses three steps: provide the data, set up a reward for success, and integrate machine learning with the algorithm.
Unsupervised model: The system is not told what it’s looking for. Instead it's instructed to group entities (an image, article, etc.) by similar traits.
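
The unsupervised idea of grouping entities by similar traits can be sketched in a few lines. This is a deliberately simple illustration, assuming each entity is described by a set of trait words and that 'similar' means sharing at least half of them; the entity names, traits and threshold are all made up, and production systems use far more sophisticated methods.

```python
def jaccard(a, b):
    """Share of traits two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_traits(entities, threshold=0.5):
    """Greedy grouping: add each entity to the first group whose first member is similar enough."""
    groups = []
    for name, traits in entities.items():
        for group in groups:
            if jaccard(traits, entities[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical entities described by trait words.
ENTITIES = {
    "article_1": {"football", "scores", "league"},
    "article_2": {"football", "league", "transfer"},
    "image_1": {"phone", "android"},
    "article_3": {"phone", "android", "review"},
}

print(group_by_traits(ENTITIES))
# [['article_1', 'article_2'], ['image_1', 'article_3']]
```

Nothing in the code says what a 'football' group or a 'phone' group is; the groups emerge from the traits alone, which is the point of the unsupervised approach.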

Machine learning improves the user experience by surfacing more relevant results based on the search query.

6. User intent

"User intent" is the 'why' behind a search query. Search engines infer it from user behaviour signals and use it to surface relevant search results.

There are four types of search intent:

Informational: The user is looking for information, for example, "what are the football scores?"
Navigational: The user is looking for a specific website, for example, "Codehouse."
Transactional: The user is looking at potentially purchasing a product or a service, for example, "cheap Android phones."
Commercial investigation: The user is looking for information on a product or service to ascertain the best purchasing options, for example, "best mobile provider."
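
As a rough illustration of the four types, the sketch below guesses a query's intent from a handful of cue words. The cue lists and the rule that anything unmatched counts as navigational are simplifying assumptions made for this example; real search engines infer intent from far richer signals.

```python
# Hypothetical cue words for each intent type, checked in this order.
CUES = [
    ("commercial investigation", ("best", "review", "compare", "vs")),
    ("transactional", ("buy", "cheap", "price", "order")),
    ("informational", ("what", "how", "why", "when", "who")),
]


def classify_intent(query):
    """Return the first intent whose cue word appears in the query, else 'navigational'."""
    words = query.lower().replace("?", "").split()
    for intent, cues in CUES:
        if any(cue in words for cue in cues):
            return intent
    # Assumption: a query with no cue words is treated as a site or brand name.
    return "navigational"


for query in ["what are the football scores?", "Codehouse",
              "cheap Android phones", "best mobile provider"]:
    print(query, "->", classify_intent(query))
```

Run against the four example queries above, this labels them informational, navigational, transactional and commercial investigation respectively.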

Whatever the type of search intent, search engines use user behaviour as a signal to rank websites appropriately in the SERPs. Google will know:

Which sites you click on in the search results.
How long you spent on the target website before you returned to Google.
What you did next.

For example, a user performs a transactional search for "cheap Android phones". They click on a link from the search results and visit the website. The user spends a lot of time on the website before returning to Google and searching for something else. Google interprets this behaviour as a positive signal, as it indicates that the user found what they were looking for and has moved on to a different task. Because Google knows this, it will improve the ranking of the web page for that search intent.
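
The time a user spends on a site before returning to the results is often called 'dwell time'. The sketch below shows how such a signal could be computed from two timestamps. The function names and the 60-second threshold are assumptions made for illustration, not values Google has published.

```python
from datetime import datetime


def dwell_seconds(clicked_at, returned_at):
    """Seconds between clicking a result and returning to the results page."""
    return (returned_at - clicked_at).total_seconds()


def is_positive_signal(clicked_at, returned_at, min_seconds=60):
    """Treat a long visit as a positive signal; the 60-second threshold is an assumption."""
    return dwell_seconds(clicked_at, returned_at) >= min_seconds


# Hypothetical visit: the user clicks at 10:00 and returns to the results at 10:05.
clicked = datetime(2022, 2, 14, 10, 0, 0)
returned = datetime(2022, 2, 14, 10, 5, 0)

print(dwell_seconds(clicked, returned))       # 300.0
print(is_positive_signal(clicked, returned))  # True
```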

Understanding what your audience wants should shape the content you produce so that it reflects user intent. This, in turn, will influence your page ranking, click-throughs and conversions.

Working with Codehouse

At Codehouse, our digital experience team, including Google-certified experts, is available to help you get the best from your content. Get in touch to find out more.
