Search Engine Internationalization Roundtable

Bill Hunt

For several years, many in the international search community have proposed a roundtable to discuss the challenges of internationalized content and websites. The challenges go beyond indexing and ranking in local-market SERPs; they include how search engines understand language and target markets, managing datasets globally vs. locally, IP detection, localization, and the operational and organizational challenges multinational websites experience. The growth of AI-generated search results, product shopping feeds, and the emphasis on entities significantly affects how all of this should be managed globally, yet these issues are only really addressed at the market level. Few information sources exist for managing websites globally, and little guidance is available beyond fundamental localization, hreflang, and choosing a domain structure.

A roundtable is even more important now that search engines are under pressure to index the exponential growth of content while managing the increasing costs of crawling and storage. Bing has been vocal about improving crawl efficiency, launching IndexNow so sites can flag updated content for indexing and revisiting, alongside similar efforts with CDN and CMS vendors. By elevating internationalization elements in the discussion, CMS vendors, DevOps teams, and creative shops can integrate them during site creation rather than trying to bolt them on after the fact.
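The IndexNow idea mentioned above is simple: a site notifies participating engines which URLs have changed so crawlers revisit only updated content. A minimal sketch of building the bulk-submission JSON body follows; the host, key, and URL list are placeholders, not real values.

```python
import json

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"  # shared IndexNow endpoint

def build_indexnow_payload(host, key, urls, key_location=None):
    """Build the JSON body for a bulk IndexNow POST submission."""
    payload = {
        "host": host,          # the site whose URLs changed
        "key": key,            # the site's IndexNow key
        "urlList": list(urls), # the updated URLs to revisit
    }
    # keyLocation is optional; it points at the hosted key file
    # that proves ownership of the submitted host.
    if key_location:
        payload["keyLocation"] = key_location
    return payload

# Placeholder example: two market pages that were just updated.
payload = build_indexnow_payload(
    host="www.example.com",
    key="6b1d2c0f0c6a4f2e9b7d",  # placeholder key, not a real one
    urls=[
        "https://www.example.com/es-ar/products/",
        "https://www.example.com/es-pa/products/",
    ],
    key_location="https://www.example.com/6b1d2c0f0c6a4f2e9b7d.txt",
)
print(json.dumps(payload, indent=2))
```

In practice this body is sent as an HTTP POST with a `Content-Type: application/json; charset=utf-8` header; the sketch stops at building the payload.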

Any successful roundtable must include a broad cohort of players, not just celebrity SEOs. We need a well-rounded group from multinational companies, both B2C and B2B, plus e-commerce, CDN, CMS, and localization vendors, and experienced international consultants and agencies. I would go as far as including government organizations like the World Trade Organization, which has a variety of initiatives to help nations and businesses thrive through its sustainable development programs. There is precedent for this kind of roundtable, and such collaborations have proven mutually beneficial.

Over 20 years ago, when I was at IBM, several large enterprises met with search engines to discuss methods for getting more of our content indexed. Search engines were in an arms race to acquire content, so it was a win-win situation. On IBM.com and for a few of our site search customers, we developed nested lists of URLs to enable crawlers to get deep into the content. Working with crawl engineers from multiple engines, we agreed that XML sitemaps would be a scalable solution, which resulted in the adoption of XML sitemaps in 2005.

Shortly after the XML sitemap beta test went live and most of the content was being indexed, we noticed that content for some markets was not. A deeper dive revealed a new problem: much of that content was being treated as duplicate. Since Argentina was first on the country selector and among the first XML sitemaps indexed, most Spanish-language pages for Panama, Uruguay, and Venezuela were not indexed because they appeared to be duplicates. We had a similar problem with Arabic and English content. Google was treating most of these pages as duplicates, and we needed a way to indicate that while they were near duplicates, each was dedicated to a specific market.

We proposed various methods for webmasters to indicate the specific market a page served, giving it a purpose so it would not be treated as a duplicate. We did not get much traction because most ideas were neither scalable nor easy for a company of any size to implement, not just large multinationals. Google finally deployed hreflang in 2013. I admit our initial request naively did not anticipate the chaos we see with websites today. At that time, most enterprises had very rigid architecture standards, making alternate market URL mapping relatively easy to indicate in XML sitemaps. This is another reason ongoing forums are needed to discuss changes in technology and indexing.
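The sitemap-based alternate market mapping described above can be sketched as follows. This is a minimal illustration, not IBM's actual tooling: it renders one sitemap `<url>` entry whose `xhtml:link` elements declare each market alternate via hreflang, tying near-duplicate Spanish pages (the hypothetical `example.com` URLs below) to their specific markets.

```python
from xml.sax.saxutils import quoteattr

def sitemap_url_entry(canonical, alternates):
    """Render one sitemap <url> entry whose xhtml:link elements list
    every market alternate, including the canonical URL itself."""
    lines = ["  <url>", f"    <loc>{canonical}</loc>"]
    for hreflang, href in alternates.items():
        # Each alternate declares its language-region code and URL.
        lines.append(
            f'    <xhtml:link rel="alternate" '
            f"hreflang={quoteattr(hreflang)} href={quoteattr(href)}/>"
        )
    lines.append("  </url>")
    return "\n".join(lines)

# Hypothetical market URLs for the same Spanish-language page.
alternates = {
    "es-ar": "https://example.com/es-ar/page/",
    "es-pa": "https://example.com/es-pa/page/",
    "es-uy": "https://example.com/es-uy/page/",
    "es-ve": "https://example.com/es-ve/page/",
}
print(sitemap_url_entry("https://example.com/es-ar/page/", alternates))
```

The full sitemap would wrap such entries in a `<urlset>` that declares the `xhtml` namespace, and every market's page would carry the same complete set of alternates, so each near-duplicate is explicitly assigned to its market.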

Another successful collaboration happened at SES San Jose in 2008, when a group of companies, SEOs, and Adobe met to discuss the Flash indexing problem. The group included Google crawl engineers, an Adobe Flash development team, developers from Flash-heavy brand sites such as Disney and Warner Brothers, top technical SEOs, and even a few black hats willing to discuss how they might exploit it.

Google explained its crawling process, and with a few questions from the Adobe engineers and technical SEOs, we clarified where the problems were and how to fix them. Adobe changed its compiler to bring text higher in the DOM and educated developers on using microdata and links. Then, Adobe worked with Google and Yahoo! to create a headless player for the crawler. The solution was rolled out and explained in a series of workshops at Adobe's partner event a month later. Educating developers on the new process while updating the crawl and compile process validated the benefit of a two-pronged approach to solving this massive problem.

This global search roundtable can document the issues and frustrations of the SEO, development, and localization community. The search engines can share clarity on how they process international content and define optimal infrastructures so the CDN and CMS vendors can integrate at the development level.
