Alternatives to the Hreflang Attribute

Bill Hunt

In a LinkedIn post, Gary Illyes noted that SEOs are frustrated with hreflang and asked for suggestions for alternative methods. Gary has been asking for this for several years, but most of the solutions presented at conferences and in sidebar conversations don’t seem viable.

This latest call for ideas has prompted several new suggestions. I have included those that I believe are feasible, along with my insights and the potential advantages and disadvantages of each option. I also wrote a lengthy article about why SEOs are so frustrated with hreflang. Its goal is to reframe the discussion around the causes of that frustration and how to mitigate them, rather than scrapping a very effective feature that already works well for many websites. Essentially, there are a few key reasons for the frustration:

  1. Global web infrastructures are complex and unwieldy
  2. Geopolitical and organizational challenges 
  3. Overly rigid syntax and implementation rules
  4. Lack of any sort of reporting from Google or SEO tools for cannibalization

Since hreflang’s launch in 2013, I have worked extensively with thousands of companies, including some of the largest sites in the world, to implement hreflang at scale. For many of these sites, it was nearly impossible for various reasons. During this time, I focused less on complaining about why it isn’t easy and more on how to implement hreflang despite these challenges.

Is there an alternate solution for hreflang?

Most SEOs are now at a point where, if there is no easy-to-use solution, they want to get rid of it. If we take a moment to understand what hreflang does and why we have challenges, we might be able to either fix what we have or logically work through scalable options.

Hreflang is an attribute applied at the page level that helps to distinguish nearly identical pages by specifying their language and intended target markets. It also identifies alternative versions of the page and their respective languages and target markets. As Gary noted, any solution that replaces the current hreflang attribute would need to pass the same level of information and work for both large and small websites.

Current hreflang functionality and purpose

Hreflang does two things: it sets the language and target market of a webpage, and it lists the alternate versions of that page along with their languages and target markets. Without it, search engines must guess which of several nearly identical pages to show to whom. Brands can deploy the hreflang attribute to resolve this ambiguity, give each alternate a specific purpose, and indicate to whom it should be presented. Why would we not want this gift from the Google gods?
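To make the two functions concrete, here is a minimal Python sketch that emits the page-level `<link rel="alternate">` tags for one cluster of alternates. The cluster, the example.com URLs, and the choice of x-default page are illustrative assumptions. The same full list goes on every page in the cluster, so each page declares itself and points at all of its alternates.

```python
from __future__ import annotations
from html import escape

def hreflang_links(cluster: dict[str, str], x_default: str | None = None) -> list[str]:
    """Build the <link> tags that every page in one alternate cluster should carry.

    cluster maps an hreflang value (language or language-region) to a URL.
    Placing the same list on every page gives each page a self-reference plus
    a reference to every alternate.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(cluster.items())
    ]
    if x_default is not None:
        # x-default names the fallback page for users who match no listed market.
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />')
    return tags


# Hypothetical cluster: one product page localized for three English-speaking markets.
cluster = {
    "en-us": "https://example.com/en-us/widget/",
    "en-gb": "https://example.co.uk/widget/",
    "en-au": "https://example.com/au/widget/",
}
for tag in hreflang_links(cluster, x_default="https://example.com/widget/"):
    print(tag)
```

Sorting the codes is only for stable output; search engines do not care about tag order.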

In their quest for global domination, brands create market-specific websites for various reasons. We are repeatedly told we must localize for market-specific nuances of language, sizes, price, etc. We also have KPIs. Launching market websites requires significant investments of time and resources, so we need to ensure they deliver a return on that investment. As anyone working in this vertical knows, adherence to proper market localization varies widely. Some companies simply clone a language version multiple times as unique market sites.

Before we dive in, I fully anticipate that some will claim I am biased toward the current standard and thus resist change. I have attempted to explain any potential bias at the end of this post. I want this to be a dialogue, so comments are open to encourage feedback, disagreement, and alternative ideas, which we can integrate into a larger working document.

Domain-Level Statement

A couple of recommendations propose shifting from declaring the website’s language and market on each individual page to declaring them at the domain level. These domain-level recommendations omit the critical second part of the functionality, identifying the alternate versions, which completely changes what the attribute does. See the bottom of the article for why I simplify “language region” to “market” and “target market.”

Pro

  • This allows SEOs to designate the language and target market (language region) for the root domain(s) and any subordinate language/country folders without setting the language/country declaration and alternate versions for individual pages. This makes implementation relatively easy, bypassing many of the frustrating challenges.
  • Depending on the method, it eliminates or minimizes the need for CMS and other technical implementations, making hreflang simpler to implement correctly.

Con

  • Moving from a page-level to a domain-level setting will require Google to make several modifications:
    • It must maintain the language and country state for every URL it finds under each declared domain structure. For example, any URLs found at the /au/ folder level and below would be attributed to Australian English.
    • It must ignore potential duplicate content within a domain variation. Applying the Australian English declaration to all subordinate URLs establishes that they are unique to that market and, therefore, not duplicates of content from other markets.
      • If a product has multiple variations, should Google take them all and sort them out?
  • Given Google’s goal to reduce the crawling and indexing burden, would it trust and accept all of these URLs, or flag more pages as duplicates and canonicalize them?
    • Does this open Google to additional duplicate content exploits, or would Helpful Content algorithms sort this positively or negatively for the site?
  • Each root domain variation (gTLD, ccTLD) must be set using the domain-level method, which may not be possible in all markets. This is necessary to validate cross-site ownership and prevent cross-domain exploits.
    • A large, complex organization could require a large footprint of annotations. We recently onboarded a company with 17 variations, each combining gTLDs and ccTLDs with folders and subdomains.
    • I have cataloged 87 variations that companies have used to designate language and country, all of which Google would need to understand and accept.
  • Google must enforce compliance with the settings and not allow any deviations in syntax, codes, or purpose.
  • Google will need to clarify how to handle x-default fallback options. (This should not require a syntax change, but the many current deviations across markets may be an issue.)
  • Google will need to clarify how it handles regional sites. Should a site be able to have 20 or 30 designations for the same root structure, or should regional options be introduced?
  • Multiple methods for setting the domain-level attribute may be necessary, as many enterprises and web hosts may not allow access to robots.txt or GSC. What option would they have?
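The inheritance rule in the first Con bullet (everything under /au/ is treated as Australian English) amounts to a longest-prefix lookup. The Python sketch below assumes a hypothetical table of declared roots; it is not how Google stores such declarations, only an illustration of the state a search engine would have to maintain, including how the longest match resolves nested declarations.

```python
from __future__ import annotations
from urllib.parse import urlsplit

# Hypothetical domain-level declarations: root URL -> hreflang value.
# Roots end in "/" so that /australia/ cannot accidentally match /au/.
DECLARATIONS = {
    "https://example.com/": "en-us",
    "https://example.com/au/": "en-au",
    "https://example.com/ca/fr/": "fr-ca",
    "https://example.co.uk/": "en-gb",
}

def market_for(url: str, declarations: dict[str, str] = DECLARATIONS) -> str | None:
    """Return the hreflang value inherited from the longest declared root that prefixes url."""
    parts = urlsplit(url)
    # Lowercase scheme and host; paths stay case-sensitive.
    normalized = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    best_root, best_value = "", None
    for root, value in declarations.items():
        if normalized.startswith(root) and len(root) > len(best_root):
            best_root, best_value = root, value
    return best_value  # None means no declaration covers this URL


print(market_for("https://example.com/au/products/widget"))  # en-au
print(market_for("https://example.com/products/widget"))     # en-us
print(market_for("https://example.de/"))                     # None
```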

Domain-Level Questions/Concerns

  1. Does this domain- or market-level declaration ensure /au/ content will appear in Australia rather than a potentially more relevant page from another market? This was easier to do with market-specific indexes, but can it be done globally?
  2. What if I have multiple similar pages for the same product in a market? Does Google consolidate them under one canonical? Using page-level hreflang, I can definitively state which variation I want.
    1. For example, a website has a main product page in the UK with facets to select package quantity, while other markets have individual pages for each package size and no main page. Before hreflang mapped the specific quantity pages to the category page, Google would show only the UK’s main product. It also canonicalized the product-size pages to the smaller packages. Expect similar issues with market-level settings.
  3. If I set the declaration in my gTLD domain for multiple ccTLDs, do I also need to declare the reciprocation in each ccTLD’s robots.txt?
  4. What if a website misstates the language or target market for a country? Will Google correct it?

Domain-Level Scenario Questions

  1. When a company launches a product in the US, the page(s) are part of the en-us designation and get indexed.
    1. Can/will searchers using related queries find the content in local engines or just in the US?
  2. The same company later launches the product in the UK on a UK ccTLD designated en-gb at the domain level.
    1. Can/will both be eligible to be shown in multiple markets?
    2. What happens to the original US page in the UK? Is it automatically replaced by the UK page, even if it is less authoritative?

Can this domain-level declaration work where you have different product names across markets but similar text? For example, Axe Body Spray in North America and Lynx Body Spray in the UK have product pages that are nearly identical except for the name. If we indicate both domains, will Google show Lynx in the UK and Axe in the US?

We can use several methods to indicate our preferred language and market settings to Google. Any of these options would require Google to apply site preferences at the domain level and present alternate pages accordingly.

Option 1: Robots.txt Declaration

Aleyda proposed a solid alternative that uses robots.txt statements to designate the language/market string and covers a variety of scenarios.

John Mueller suggested adapting a regex string that would incorporate more variations.

My only suggestions are to keep the simple name “hreflang” instead of “hreflang_scope” and to adhere to the established robots.txt syntax of field, colon, and value, which would keep each root domain entry to one line, similar to the Sitemap directive.

hreflang:https://example.com/en-us/ hreflang="en-us"
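To show that the one-line format proposed above is easy to consume, here is a Python sketch of a parser for it. The directive is hypothetical: it is not part of the Robots Exclusion Protocol (RFC 9309) and no search engine reads it today. The regex accepts straight quotes, optional spaces around the colon, and a trailing robots.txt comment.

```python
from __future__ import annotations
import re

# Matches the proposed (hypothetical) line: hreflang: <root URL> hreflang="<code>"
HREFLANG_LINE = re.compile(
    r'^\s*hreflang\s*:\s*(?P<url>\S+)\s+hreflang\s*=\s*"(?P<code>[^"]+)"\s*$',
    re.IGNORECASE,
)

def parse_hreflang_directives(robots_txt: str) -> dict[str, str]:
    """Collect root URL -> hreflang value pairs from the proposed directive lines."""
    declarations: dict[str, str] = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # robots.txt comments start with '#'
        match = HREFLANG_LINE.match(line)
        if match:
            declarations[match["url"]] = match["code"].lower()
    return declarations


sample = """\
User-agent: *
Disallow: /cart/
Sitemap: https://example.com/sitemap.xml
hreflang: https://example.com/en-us/ hreflang="en-us"
hreflang: https://example.com/au/ hreflang="en-AU"  # Australian English
"""
print(parse_hreflang_directives(sample))
```

Crawlers that do not recognize the field would simply skip these lines, which is how robots.txt parsers are expected to treat unknown fields.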

Pro

  • Same as noted for domain-level declaration
  • This option enables easy/quick hreflang implementation, especially for gTLD and folders.

Con

  • Same as noted for domain-level declaration
  • Google and other search engines would need to accept a standard syntax.
  • Robots.txt could become quite large with “SEO creativity,” similar to robots.txt sculpting, where SEOs attempt to channel robots into specific content sections away from others. 
  • Not everyone can edit robots.txt, which may cause more significant challenges at the market level or where a market refuses to participate, mainly for ccTLDs and those with uncommon implementations.
  • There is no easy way to ensure the correct language and country settings; over 30% of implementations today have the syntax set incorrectly. How do we prevent incorrect settings on sites whose folder structures reverse the code order to country-language (cc-ll)? Can this be tested or prevented? Could or should Google add a test protocol to its robots.txt tester?
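On the testing question, a validator can at least flag the most common mistakes: invalid codes and the reversed cc-ll order. This Python sketch uses small illustrative subsets of ISO 639-1 languages and ISO 3166-1 alpha-2 regions (a real tool would load the full lists). Some pairs are valid in both orders (with full lists, "ca" is Catalan and "CA" is Canada), so the check accepts the ll-cc reading first and only reports "reversed" when that reading fails.

```python
from __future__ import annotations

# Illustrative subsets only; load the complete ISO 639-1 and ISO 3166-1 lists in practice.
LANGUAGES = {"en", "de", "fr", "es", "ja", "pt"}
REGIONS = {"US", "GB", "AU", "DE", "AT", "FR", "CA", "ES", "MX", "JP", "BR", "PT"}

def check_hreflang(code: str) -> str:
    """Classify an hreflang value as 'ok', 'reversed', or 'invalid'."""
    if code.lower() == "x-default":
        return "ok"
    parts = code.split("-")
    if len(parts) == 1:
        return "ok" if parts[0].lower() in LANGUAGES else "invalid"
    if len(parts) != 2:
        return "invalid"
    first, second = parts
    if first.lower() in LANGUAGES and second.upper() in REGIONS:
        return "ok"
    # A country-language order such as "us-en" usually mirrors a /us/en/ folder structure.
    if first.upper() in REGIONS and second.lower() in LANGUAGES:
        return "reversed"
    return "invalid"


for code in ["en-us", "us-en", "de-AT", "en-uk", "jp-ja", "x-default"]:
    print(code, check_hreflang(code))
```

"en-uk" comes out invalid because the ISO code for the United Kingdom is GB.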

Option 2: GSC Setting

Restore and extend classic GSC geo-targeting to all domain formats. For large and more complex structures, brands could upload a language-country matrix CSV into GSC.
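GSC has no such upload today, so the column names below (`root_url`, `language`, `region`) are a hypothetical format. The Python sketch shows the kind of checks an import could run before accepting a matrix: duplicate roots are rejected, and two roots claiming the same market are reported for a human to resolve.

```python
from __future__ import annotations
import csv
import io

# Hypothetical upload: one row per root domain, subdomain, or folder.
MATRIX_CSV = """root_url,language,region
https://example.com/,en,US
https://example.co.uk/,en,GB
https://example.com/au/,en,AU
https://example.de/,de,DE
https://example.de/at/,de,AT
https://example.com/uk/,en,GB
"""

def load_matrix(text: str) -> tuple[dict[str, str], list[str]]:
    """Return root -> hreflang value, plus warnings for conflicting rows."""
    matrix: dict[str, str] = {}
    owners: dict[str, str] = {}  # hreflang value -> first root that claimed it
    warnings: list[str] = []
    for row in csv.DictReader(io.StringIO(text)):
        root = row["root_url"].strip()
        value = f'{row["language"].strip().lower()}-{row["region"].strip().lower()}'
        if root in matrix:
            warnings.append(f"duplicate root {root}")
            continue
        if value in owners:
            warnings.append(f"{value} claimed by both {owners[value]} and {root}")
        else:
            owners[value] = root
        matrix[root] = value
    return matrix, warnings


matrix, warnings = load_matrix(MATRIX_CSV)
print(matrix["https://example.de/at/"])  # de-at
print(warnings)
```

Reporting the shared en-gb claim rather than silently picking one root mirrors the governance question raised in the Con list: someone has to decide which site owns a market.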

Pro

  • Same as noted for domain-level declaration
  • It makes it relatively easy to create the matrix and import it to set hreflang globally.
  • Immediately cross-validates root domains in the same GSC account
  • Allows sites to set it once centrally, preventing easy changes by unauthorized users

Con

  • Same as noted for domain-level declaration
  • Some domains/markets may not be in the primary GSC account
  • There may be disputes over who can edit and manage the settings

Option 3: DNS TXT Record

Many websites currently use this method for GSC DNS verification. It is more rigid than uploading a file or adding a tag to the home page.

Theoretically, we could add a variation of the current syntax to a DNS TXT entry to indicate the language and region preference. This would offer similar resistance to change and help ensure that verification stays with the organization, not the agency. The language and region preference should not change, and once Google validates it, Google can apply that state to any URL matching the pattern. Any new versions would need to be added.

Each string in a TXT record is limited to 255 characters, but a record can contain multiple strings, and a domain can publish multiple TXT records. If multiple entries are not desired, a regex or pairing syntax could be created for gTLD setups.
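The 255-character limit applies to each character-string inside a TXT record (RFC 1035), so a long value is published as several strings and reassembled by concatenation, which is how SPF and DKIM consumers handle long records. The `hreflang=` prefix and the `code root;` pairing below are a format invented for this Python sketch; no search engine reads such a record today.

```python
from __future__ import annotations

MAX_STRING = 255  # per-string limit for a DNS character-string (RFC 1035)

def build_txt_record(declarations: dict[str, str]) -> list[str]:
    """Encode root -> hreflang pairs as one TXT value split into strings of <=255 bytes.

    Assumes ASCII input (IDN hostnames already converted to punycode).
    """
    value = "hreflang=" + ";".join(f"{code} {root}" for root, code in declarations.items())
    data = value.encode("ascii")
    return [data[i:i + MAX_STRING].decode("ascii") for i in range(0, len(data), MAX_STRING)]

def parse_txt_record(strings: list[str]) -> dict[str, str]:
    """Concatenate the strings with no separator, then decode the hypothetical pairs."""
    value = "".join(strings)
    if not value.startswith("hreflang="):
        return {}
    pairs = value[len("hreflang="):].split(";")
    return {root: code for code, root in (pair.split(" ", 1) for pair in pairs if pair)}


markets = ["us", "gb", "au", "ca", "ie", "nz", "sg", "in", "za", "ph"]
decls = {f"https://example.com/{cc}/": f"en-{cc}" for cc in markets}
strings = build_txt_record(decls)
print(len(strings), [len(s) for s in strings])
print(parse_txt_record(strings) == decls)
```

Splitting in the middle of a URL is harmless here because the reader joins the strings back together before parsing.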

We strongly recommend DNS verification for the main corporate GSC account so that it cannot be easily altered. This is also very effective for those with multiple gTLDs, and especially ccTLDs, that need cross-domain verification.

Pro

  • Same as noted for domain-level declaration
  • It makes it relatively easy for ccTLDs to set hreflang
  • Works similarly to GSC DNS verification (and could be done at the same time)
  • Allows sites to set it once centrally, preventing easy changes or deletion if robots.txt is edited

Con

  • Same as noted for domain-level declaration.
  • Google would need to align the original verification token across multiple domain combinations, similar to having all domains in primary GSC.
  • Some domains/markets may be unable to edit the DNS, requiring corporate intervention and increasing the time for implementation.
  • DNS records could grow large due to “SEO creativity”

Option 4: Maintain the Current Hreflang Protocol on the Home Page

We could keep what we have and require it only for the home page, using hreflang tags or hreflang XML sitemaps. Google would honor the home page settings and ignore any deeper page-level settings.

This option could serve as a fallback alongside one of the other options and act as a bridge for companies already using hreflang, since any change requires waiting many months to move up the development action list.
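For the XML sitemap route, the markup would stay in the format Google already documents for hreflang sitemaps (`<xhtml:link rel="alternate" hreflang=…>` inside each `<url>`); only the scope shrinks to the market home pages. This Python sketch builds that sitemap with the standard library; the home page URLs and the x-default choice are assumptions for illustration.

```python
from __future__ import annotations
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)

def home_page_sitemap(home_pages: dict[str, str], x_default: str | None = None) -> str:
    """Build an hreflang XML sitemap that lists only the market home pages.

    Each <url> entry repeats the full set of alternates, so every home page
    references itself and all of the others.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    alternates = sorted(home_pages.items())
    if x_default:
        alternates.append(("x-default", x_default))
    for _, loc in sorted(home_pages.items()):
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        for code, href in alternates:
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": code, "href": href})
    return ET.tostring(urlset, encoding="unicode")


homes = {
    "en-us": "https://example.com/",
    "en-gb": "https://example.co.uk/",
    "de-de": "https://example.de/",
}
print(home_page_sitemap(homes, x_default="https://example.com/"))
```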

Pro

  • Same as noted for domain-level declaration
  • Many websites already have it set up
  • Various plugins already allow you to set it on the home page

Con

  • Same as noted for domain-level declaration
  • Implementation has the same potential errors and complexities; how do we mitigate failures to follow syntax?
  • Websites in markets may be unable to cross-verify or add code to pages.

Option 5: HTML Lang Tag

If we are moving away from page-level settings and hate hreflang, why not use the old-school and required HTML lang tag? Most sites already have a tag such as <html lang="en-us">. We could argue that this is the simplest method for each home page variation.

Ironically, 20 years ago, when we initially submitted the idea of hreflang, search engine engineers pushed back, saying that sites must have the HTML lang tag and that they could use it. A test found that 80% of non-US websites had an incorrect setting, so we deemed it ineffective. 

We see problems with e-commerce sites that clone languages. For example, Michael Kors’ Austrian German-language website has an HTML lang code of “de” (German only), while its hreflang is correctly set to de-AT.
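A mismatch like this is easy to detect automatically, which is what a "validate and correct" step for Option 5 would need. This Python sketch uses the standard-library HTML parser to compare a page's `<html lang>` value with the hreflang value the page declares for itself; the markup and example.com URLs are hypothetical stand-ins, not the site mentioned above.

```python
from __future__ import annotations
from html.parser import HTMLParser

class LangSignals(HTMLParser):
    """Collect the <html lang> value and the hreflang annotation that points at this page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.html_lang: str | None = None
        self.self_hreflang: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        a = dict(attrs)
        if tag == "html":
            self.html_lang = a.get("lang")
        elif tag == "link" and a.get("rel") == "alternate" and a.get("href") == self.page_url:
            self.self_hreflang = a.get("hreflang")

def lang_mismatch(html_text: str, page_url: str) -> str | None:
    """Describe a disagreement between <html lang> and the self-referencing hreflang, if any."""
    parser = LangSignals(page_url)
    parser.feed(html_text)
    lang, hreflang = parser.html_lang, parser.self_hreflang
    if lang is None or hreflang is None:
        return "missing html lang or self-referencing hreflang"
    if lang.lower() != hreflang.lower():
        return f'html lang="{lang}" but hreflang="{hreflang}"'
    return None


page = """<!doctype html>
<html lang="de">
<head>
  <link rel="alternate" hreflang="de-AT" href="https://example.com/at/" />
  <link rel="alternate" hreflang="de-DE" href="https://example.com/de/" />
</head><body></body></html>"""
print(lang_mismatch(page, "https://example.com/at/"))
```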

Pro

  • Same as noted for domain-level declaration
  • Nearly all websites already have the tag, so no action is required other than validating it and correcting it if needed. Google could flag language tag mismatches.

Con

  • Same as noted for domain-level declaration
  • Many sites do not set the tag correctly. The last statistic I saw was that nearly 25% of websites have incorrect language and country settings.
  • No method to set x-default for a fallback website

Page-Level Statements

This is the root of the frustration and a challenge for most implementations. Either the CMS or the implementation team cannot completely and effectively map the alternate versions of pages. Because hreflang attributes fundamentally attempt to prevent duplicate content issues, websites need to pair each alternate page with every other alternate.
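The pairing requirement is where most page-level implementations break: Google's documentation states that if two pages do not both point to each other, the annotations may be ignored. This Python sketch audits return links from a hypothetical crawl result (page URL mapped to the hreflang links found on it) and reports every pair where the alternate does not link back.

```python
from __future__ import annotations

# Hypothetical crawl result: page URL -> {hreflang value: alternate URL} found on that page.
Annotations = dict[str, dict[str, str]]

def missing_return_links(pages: Annotations) -> list[tuple[str, str]]:
    """List (page, alternate) pairs where the alternate does not reference the page back."""
    problems = []
    for page, alternates in pages.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue  # self-reference, nothing to reciprocate
            back_links = pages.get(alt_url, {})
            if page not in back_links.values():
                problems.append((page, alt_url))
    return sorted(problems)


crawl = {
    "https://example.com/us/widget": {
        "en-us": "https://example.com/us/widget",
        "en-gb": "https://example.com/uk/widget",
        "de-de": "https://example.com/de/widget",
    },
    "https://example.com/uk/widget": {
        "en-gb": "https://example.com/uk/widget",
        "en-us": "https://example.com/us/widget",
    },
    "https://example.com/de/widget": {
        "de-de": "https://example.com/de/widget",
    },
}
for page, alt in missing_return_links(crawl):
    print(f"{alt} does not link back to {page}")
```

Note that the UK page also omits the German alternate; that is an incomplete cluster rather than a broken return link, so this particular check does not report it.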

Pro

  • Allows website owners to set the specific language and regional preference for each page and the alternate variations, removing any ambiguity between alternates and potentially duplicate pages. 
  • Indicates to Google alternate URLs for a language and market that may help improve global indexing 

Con

  • Requires websites to set full hreflang syntax at a page level
  • Creates the need for CMS and other technical implementations to enable hreflang.
  • Requires web teams to comply with rigid syntax and rules to enable hreflang
  • All of the reasons SEOs are frustrated with hreflang

If a domain-level solution is not viable, is there an easier way to implement page-level declarations? I don’t believe URL mapping for larger or decentralized organizations will ever have an easy button.

Option 1: Incentivize Correct Use of Hreflang

Many websites have perfectly fine hreflang implementations because they have mitigated many of the challenges that make it frustrating.

Rather than finding an alternate solution, could Google reward websites that have successfully deployed hreflang? What about awareness campaigns, or even scare tactics, to drive change? Didn’t this approach motivate compliance with mobile-friendliness and Core Web Vitals? When sites with poor user experiences were at risk of not being indexed or ranked, development teams and SEOs moved mountains to fix underperforming websites and improve their Core Web Vitals scores. Why can’t we have a similar emphasis on global representation, especially when real ROI is attached to a complete and correct hreflang implementation?

A significant challenge is that the people deciding on and prioritizing fixes for the mess need to understand the implications. I guarantee that if a CEO or board knew they were losing $2 to $20 million a month due to cannibalization, they would fix their organizational and infrastructure problems. I have seen agencies and internal teams debate a solution for months without resolving it, only for a senior executive to become aware and mandate an immediate fix.
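Getting an executive's attention starts with measuring cannibalization, which neither Google nor SEO tools currently report. This Python sketch assumes click data broken down by searcher country and landing page (for example, a search analytics export) and a hypothetical mapping of URL prefixes to markets. It computes the share of each country's clicks that land on a page built for a different market; translating that share into revenue would require your own conversion and order-value data.

```python
from __future__ import annotations
from collections import Counter

# Hypothetical mapping of URL prefixes to the market each section serves.
MARKET_PREFIXES = {
    "https://example.com/us/": "US",
    "https://example.com/uk/": "GB",
    "https://example.com/au/": "AU",
}

def market_of(url: str) -> str | None:
    """Return the market whose prefix matches url, or None for unmapped URLs."""
    for prefix, market in MARKET_PREFIXES.items():
        if url.startswith(prefix):
            return market
    return None

def wrong_market_share(clicks: list[tuple[str, str, int]]) -> dict[str, float]:
    """Share of each country's clicks that landed on a page built for another market.

    clicks holds (searcher country, landing URL, click count) rows. Clicks on
    unmapped URLs count toward the total but are not treated as cannibalized.
    """
    total: Counter[str] = Counter()
    wrong: Counter[str] = Counter()
    for country, url, n in clicks:
        total[country] += n
        if market_of(url) not in (None, country):
            wrong[country] += n
    return {country: wrong[country] / total[country] for country in total}


clicks = [
    ("GB", "https://example.com/uk/widget", 600),
    ("GB", "https://example.com/us/widget", 400),
    ("AU", "https://example.com/au/widget", 900),
    ("AU", "https://example.com/us/widget", 100),
]
print(wrong_market_share(clicks))  # {'GB': 0.4, 'AU': 0.1}
```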

Pro

  • Same as noted for page-level declaration
  • Benefits websites that follow hreflang rules and/or have an organized structure
  • It will go a long way toward getting websites to improve alignment and collaboration and make other needed infrastructure changes
  • We don’t have to change anything many already deem challenging or confusing.

Con

  • Same as noted for page-level declaration
  • It will require website owners to solve the problems that are frustrating SEOs.
  • It will continue to be challenging for many companies to implement hreflang.

Option 2: Merchant Center Feeds (GTIN, Language, and Market)

Google Merchant Center and Amazon product feeds may have enough information to make this a viable page-level option. For example, a regional product inventory feed overrides the primary feed to show regional pricing or product availability in your predefined regions. These feeds contain language and market identifiers.

Note: I am not an expert on merchant feeds, so please let me know if anything is incorrect or if there are other attributes we can use to support this method.

Google and Amazon product feeds carry a unique Global Trade Item Number (GTIN) for each product. GTINs are numbers of up to 14 digits that encompass identifiers such as UPC (GTIN-12), EAN (GTIN-13), and ISBN. Amazon uses a strict verification process to ensure that all GTINs on its platform are valid and licensed.
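Part of what makes GTINs a trustworthy join key is that they are self-checking: the last digit is a GS1 mod-10 check digit computed with alternating weights of 3 and 1. This Python sketch validates GTIN-8, -12, -13, and -14 values and left-pads them to the 14-digit form, so a UPC in one market's feed can match the padded form in another.

```python
def gtin_is_valid(gtin: str) -> bool:
    """Validate a GTIN-8, -12 (UPC), -13 (EAN), or -14 with the GS1 mod-10 check digit."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    body, check = gtin[:-1], int(gtin[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def gtin14(gtin: str) -> str:
    """Left-pad with zeros to the 14-digit form so UPC and EAN variants compare equal."""
    return gtin.zfill(14)


print(gtin_is_valid("4006381333931"))  # True (EAN-13)
print(gtin_is_valid("036000291452"))   # True (UPC-A / GTIN-12)
print(gtin_is_valid("036000291453"))   # False (wrong check digit)
print(gtin14("036000291452"))          # 00036000291452
```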

Many e-commerce sites already have the necessary attributes in their feeds, such as a GTIN, URL, language, and target market(s). Because regional product feeds supplement the primary market feed, Google may be able to infer the desired language and market for these product URLs. Since the setup and a validation method are already in place, only a subroutine to align each GTIN with its URLs would be required. This would also help cut down on aspirational hreflang implementations.
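The "align the GTIN and URL" subroutine is essentially a group-by. This generic Python sketch (not Hreflang Builder's implementation, and not anything Google runs today) takes hypothetical rows pulled from several market feeds, normalizes each GTIN to 14 digits, and groups the URLs into hreflang clusters keyed by GTIN. Products sold in only one market have no alternates and are dropped.

```python
from __future__ import annotations
from collections import defaultdict

# Hypothetical rows from several market feeds: (GTIN, landing URL, language, market).
feed_rows = [
    ("036000291452", "https://example.com/us/p/123", "en", "US"),
    ("00036000291452", "https://example.co.uk/p/abc", "en", "GB"),
    ("036000291452", "https://example.de/p/xyz", "de", "DE"),
    ("4006381333931", "https://example.de/p/777", "de", "DE"),
]

def clusters_by_gtin(rows: list[tuple[str, str, str, str]]) -> dict[str, dict[str, str]]:
    """Group URLs that share a GTIN into hreflang clusters (hreflang value -> URL)."""
    clusters: dict[str, dict[str, str]] = defaultdict(dict)
    for gtin, url, language, market in rows:
        key = gtin.zfill(14)  # GTIN-14 form, so UPC and padded variants match
        code = f"{language.lower()}-{market.lower()}"
        # Keep the first URL per market; a production pipeline should report conflicts.
        clusters[key].setdefault(code, url)
    return {k: v for k, v in clusters.items() if len(v) > 1}


for gtin, cluster in clusters_by_gtin(feed_rows).items():
    print(gtin, cluster)
```

Each resulting cluster could then feed the same tag or sitemap generation used for any other hreflang source.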

Recently, Hreflang Builder, my enterprise hreflang solution developed to overcome many of these challenges, added the ability to import product feeds for different markets and map URLs by extracting the GTIN, target URL, and language and market variables from each feed. Many of the more organized global merchants embed their GTIN or a unique global ID in the page code, which we can extract to generate hreflang XML sitemaps.

Pro

  • Same as noted for page-level declaration
  • Leverages a typically data-driven code element that more companies might get correct, because they often pay to have product feeds created and the feeds require GTINs

Con

  • Same as noted for page-level declaration
  • It would require Google to enable processing these variables for Hreflang purposes.
  • Not all websites use Google or Amazon feeds, but SGE may change that attitude.
  • Not all feeds are managed globally, and not all markets participate.

Option 3: Schema Elements

Ziggy Shtrosberg recently proposed on LinkedIn that the schema.org properties “inLanguage” and/or “areaServed” could be used to designate the internationalization signal. As more and more sites practice entity SEO and leverage schema, this could be a viable option. Product schema combined with these two properties could be an existing data source already embedded at the page level. The recent adoption of product variants helps solve the challenge of capturing variations of product pages in hreflang.
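To see what Google would have to derive, here is a Python sketch that turns a JSON-LD block into an hreflang-style value. The markup is hypothetical, but the properties are real schema.org ones (`inLanguage` on a WebPage, `areaServed` on an Offer, `gtin13` on a Product). Treating them as an hreflang signal is the proposal, not current Google behavior. The sketch only accepts a two-letter country in `areaServed` and otherwise falls back to `inLanguage`, which reflects the granularity concern in the Con list below.

```python
from __future__ import annotations
import json

# Hypothetical JSON-LD from an Australian product page.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "inLanguage": "en-AU",
  "mainEntity": {
    "@type": "Product",
    "name": "Widget",
    "gtin13": "4006381333931",
    "offers": {
      "@type": "Offer",
      "price": "19.95",
      "priceCurrency": "AUD",
      "areaServed": "AU"
    }
  }
}
"""

def hreflang_from_schema(doc: dict) -> str | None:
    """Derive an hreflang-style value from inLanguage plus a country-level areaServed."""
    language = doc.get("inLanguage")
    if not isinstance(language, str):
        return None
    offers = doc.get("mainEntity", {}).get("offers", {})
    area = offers.get("areaServed")
    base = language.split("-")[0].lower()
    if isinstance(area, str) and len(area) == 2 and area.isalpha():
        return f"{base}-{area.lower()}"
    # areaServed missing or too granular (a city, a radius): fall back to inLanguage alone.
    return language.lower()


print(hreflang_from_schema(json.loads(JSON_LD)))  # en-au
```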

Pro

  • Same as noted for page-level declaration
  • Leverages a typically data-driven code element that more companies might get correct
  • Many sites use schema, and since Google already ingests this data, we just need a method to indicate the domains in a cluster so they can be integrated. Maybe one of the domain-level options can solve this.

Con

  • Same as noted for page-level declaration
  • The “areaServed” attribute might be too granular and confusing if used by a business to indicate a smaller local radius than a country or region.
  • Not all websites use schema, and where they do, it may not be implemented correctly or indexed.
  • It would require Google to enable processing these tags for Hreflang purposes.

Market vs. Language Region Naming Convention

Google correctly uses “language region” to refer to the geographical area where the language is represented. While it is the correct term, some find it confusing. The region is designated by an ISO 3166-1 alpha-2 code, which represents countries and territories, yet some take “region” literally and try to set it to a wide geographical region like Latin America or the Middle East, while others take further liberties and invent codes and sub-codes. For completeness, there are several reasons why Google uses “language region”:

  1.  Geopolitical and cultural sensitivities toward language and national borders and history. Wars have been fought over language suppression and adaptations.
  2. To represent a language spoken within a region of a country, territory, or province.
  3. To represent a language for a country where it is not an official language

For all practical purposes, multinational websites use language websites to target either speakers of a language or specific markets. For hreflang simplicity, I propose we refer to markets or target markets in documentation, with an asterisk acknowledging the cultural, geopolitical, and linguistic differences of language.

Potential Bias:

Some will say I am biased toward the current standard and thus resist change because I built Hreflang Builder, an enterprise-grade tool whose success stems from these challenges. I have spent a good portion of my career trying to fix this problem and most of the last three years helping companies reap the benefits of hreflang.

Despite building Hreflang Builder, I have always advocated at conferences and in posts for companies to implement any method that will allow them to benefit from minimizing local market SERP cannibalization and cart abandonment. I built Hreflang Builder because I needed a solution for my clients who were facing significant negative financial impact. We estimate that we have helped recover over $100 million in lost revenue for companies. So, yes, I am biased toward the current hreflang method, specifically hreflang XML sitemaps, as they are very effective, highly scalable, and enable companies to work around most infrastructure challenges.

To the many SEOs who share this frustration: changing the methodology, even simplifying it, would unravel more than ten years of effort for many companies. I am completely open to a proper discussion of the challenges and all possible solutions. I have called for a roundtable multiple times, and maybe we can get enough traction to hold one.

Comments and Discussion

Please add any comments, ideas, suggestions, or criticisms in the comments section below, and we will incorporate them.
