I wanted to share some notes on an experiment my agency performed recently, which resulted in Google believing our website was the canonical version of their own search engine optimization starter guide PDF, and ranking us in place of their own content for “search engine optimization” and thousands of other phrases.
We perform many tests internally, both for our SEO Spider software and as an agency for clients. This particular experiment was purely for fun, to highlight the issue we discovered, with no intention of hurting anyone or making any profit. We have now ended the experiment and removed the content.
We had previously been in touch with Google after noticing some strange behavior in the search engine results. While their SEO starter guide PDF was ranking for relevant terms like “SEO” and “google SEO guide,” something wasn’t quite right…
Dan Sharp (@screamingfrog), November 7, 2016
For the searches we performed, the listing for the starter guide PDF would appear, but it would link to various other websites that had uploaded it rather than to Google’s own website. In other words, Google wasn’t ranking its own page; other websites appeared in its place, using Google’s content.
Here’s a view of some of the sites ranking for it in the UK. Each site appeared to knock the others out of the search results as Google changed which one it believed was the canonical version.
We decided to look into why Google’s page wasn’t being indexed while other pages were seemingly showing in its place. We noticed that the starter guide’s original URL appeared to use a 302 temporary redirect to the search engine optimization starter guide, which is hosted on a separate domain.
However, neither URL was indexed, and Google appeared to be struggling to determine the canonical and to index its own original content and URL. Google was not using “noindex,” nothing was blocked via robots.txt, other content on the subdomain was indexed, and there didn’t appear to be any conflicting canonical or other directives on the page or in the HTTP header.
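The checks described above (robots.txt blocking and a noindex directive sent in the HTTP header) are easy to script. The following is a minimal Python sketch, not the tooling we used: it assumes you have already fetched the robots.txt body and the response headers, uses the standard library’s `urllib.robotparser` to test whether a URL is disallowed, and scans any `X-Robots-Tag` header for `noindex` (or `none`). The helper names and example URLs are hypothetical.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt content disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(user_agent, url)


def has_noindex_header(headers: Mapping[str, str]) -> bool:
    """Return True if an X-Robots-Tag header value contains noindex (or none)."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        # Values look like "noindex", "noindex, nofollow" or "googlebot: noindex".
        for directive in value.split(","):
            tokens = directive.replace(":", " ").lower().split()
            if "noindex" in tokens or "none" in tokens:
                return True
    return False
```

A plain mapping holds one value per header name, so a response that repeats `X-Robots-Tag` would need its values joined first.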
Google has said that PageRank flows the same regardless of whether it’s a 302 temporary redirect or a 301 permanent redirect; the difference is really a matter of which URL Google indexes and shows in the search results. So in theory, the original URL should have been indexed and ranking, but this wasn’t the case.
While each type of redirect should pass PageRank in a similar way, Gary Illyes has said that 301s help with canonicalization.
Gary Illyes ᕕ( ᐛ )ᕗ (@methode), August 5, 2016
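To see which hops in a redirect chain are temporary (302, 303, 307) rather than permanent (301, 308), you can trace the chain one hop at a time. Below is a minimal Python sketch, assuming a hypothetical `fetch` callable that requests a URL without following redirects and returns the status code and `Location` header; injecting it keeps the logic testable offline. Any URLs you pass in (for example `https://www.example.com/guide.pdf`) are placeholders, not Google’s actual starter guide URLs.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher: requests a URL WITHOUT following redirects and returns
# (status_code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}
TEMPORARY_CODES = {302, 303, 307}  # 301 and 308 are the permanent variants


def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow a redirect chain hop by hop, returning [(url, status), ...]."""
    chain: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return chain  # final destination (or a redirect with no Location)
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError(f"gave up after {max_hops} hops")


def temporary_hops(chain: List[Tuple[str, int]]) -> List[str]:
    """Return the URLs in the chain that answered with a temporary redirect."""
    return [hop_url for hop_url, status in chain if status in TEMPORARY_CODES]
```

As noted above, Google says a 302 passes PageRank much like a 301; a trace like this only tells you which URL is asking to be treated as temporary, which is the canonicalization signal Gary Illyes refers to.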
We knew from previous experiments that identical content can be hijacked, but generally by more authoritative websites. Google’s SEO starter guide has links from about 2,100 root domains pointing to the original URL and another 485 pointing to the redirect target (HTTP and HTTPS combined), so it’s a very powerful page with lots of visibility.
The starter guide’s original URL is also on Google.com, which carries an enormous amount of authority, although the final redirect target was on a separate domain.
Obviously, the Screaming Frog website is not as authoritative as Google, but far less authoritative websites had already displaced Google’s own URL, due to the issues described above.
We decided to run a short-term experiment and simply upload Googleâ€™s SEO starter guide to our domain. We then got it indexed via GoogleÂ Search Console and forgot about it.
A week later, we noticed we had hijacked Google’s own rankings (and those of any previous hijackers, due to our higher “authority”), as their algorithm seemingly believed we were now the canonical source of their own content. Our URL would be returned for an info: or cache: query on either of Google’s URLs.
We had hijacked the hijackers, and Google.
Even though we are a UK site, we jumped into 4th position for “search engine optimization” and into the top 10 for “SEO” in the USA, up from outside the top 50.
The PDF ranked for “Google SEO,” “Google SEO guide,” “www google com” and every other phrase that Google’s content should be visible for.
The PDF also ranked for loads of other brand-type queries in the UK and the US, which we can see courtesy of SEMrush (the screenshot shows US data specifically).
And Sistrix highlighted the sudden “new” keywords we were now appearing for organically:
Google Search Console recorded nearly 800K impressions for the PDF alone over a four-day period.
This experiment received a lot of attention when we tweeted it.
So, we kept an eye on it over the following days to see if Google made any changes to correct indexing, canonicalization and ranking. Around 48 hours later, we noticed that Google’s guide had started ranking and was now clearly indexed: it appeared under a site: query, where previously no result was returned.
We then noticed that Google had added an HTTP canonical header to their PDF pointing to the original URL, which helped get it indexed.
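An HTTP canonical is sent as a `Link` response header, for example `Link: <https://www.example.com/guide.pdf>; rel="canonical"` (the URL is a placeholder). The Python sketch below pulls the canonical URI out of such a header value. It uses a simplified regular expression rather than a full RFC 8288 parser, and the function name is hypothetical, so treat it as illustrative only.

```python
import re
from typing import Optional

# One link-value: <URI> followed by zero or more ";"-separated parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,<]*)*)')
# The rel parameter, quoted or unquoted.
_REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def canonical_from_link_header(header_value: str) -> Optional[str]:
    """Return the URI of the first rel="canonical" link in a Link header value."""
    for uri, params in _LINK_VALUE.findall(header_value):
        rel = _REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return uri.strip()
    return None
```

A single `Link` header can carry several comma-separated links (canonical, alternate, preload and so on), which is why the sketch checks each link-value rather than assuming the header holds only the canonical.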
However, we were still appearing as the canonical under an info: query and still ranking for their queries. This meant both guides were now ranking in the search results, often with our site above Google’s own.
We expected this to change: for Google to become the canonical again and for our page to drop out of the rankings. Even five days later, we were still there, alongside Google in the search results for thousands of search queries. Then our PDF disappeared from the search results, and we ended the experiment fairly swiftly.
First of all, we don’t recommend messing with other people’s content. This is not a viable strategy or tactic for gaining higher rankings, merely an unusual and interesting case study. It can be very difficult to draw conclusions, as we can’t always be sure what other factors or unknowns might be in play.
While we have plenty of theories and thoughts internally, we’ll end on three closing points.
1. 302 redirect not (fully) to blame
While we initially believed the 302 redirect might be the root cause, I know Google is adamant that there are no issues with using 302 redirects. We believe there are some contributing factors around how the files are hosted.
We also found a few other quirks: URLs changing over time (based on the values supplied in the Accept-Language header) and, just for good measure, incorrect canonicalization on HTTPS.
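One way to check for the Accept-Language quirk is to request the same URL with different `Accept-Language` values and compare where each request ends up. Here is a minimal Python sketch, assuming urllib’s default redirect handling is acceptable and that the server answers `HEAD` requests; `resolve_final_url` and `final_urls_by_language` are hypothetical helper names, and the resolver can be swapped for a stub when testing.

```python
import urllib.request
from typing import Callable, Dict, Iterable

# Resolver signature: (url, accept_language) -> final URL after redirects.
Resolver = Callable[[str, str], str]


def resolve_final_url(url: str, accept_language: str) -> str:
    """Request `url` with an Accept-Language header; return the URL after redirects."""
    request = urllib.request.Request(
        url, headers={"Accept-Language": accept_language}, method="HEAD"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.geturl()  # urlopen follows redirects by default


def final_urls_by_language(
    url: str, languages: Iterable[str], resolve: Resolver = resolve_final_url
) -> Dict[str, str]:
    """Map each Accept-Language value to the URL the request ended up on."""
    return {language: resolve(url, language) for language in languages}


def varies_by_language(results: Dict[str, str]) -> bool:
    """True if different Accept-Language values led to different final URLs."""
    return len(set(results.values())) > 1
```

Googlebot usually crawls without setting `Accept-Language`, so a useful comparison also includes whatever the server returns when the header is absent or empty.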
2. Use canonicals
It’s very wise to use canonicals to help with indexation. As soon as Google updated the PDF’s HTTP canonicals to a single URL, it was immediately indexed.
Using a crawler, you can scan your site for missing canonical link elements or missing canonical links in your HTTP headers.
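A crawler runs this kind of check across every URL on a site; the core test for a single page can be sketched with Python’s built-in `html.parser`. This is a simplified illustration under stated assumptions: it reads `<link rel="canonical">` elements from the HTML, takes an already-extracted header canonical (if any) as a second input, and compares URLs as plain strings without normalization. The class, function and label names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        href = attributes.get("href", "").strip()
        if "canonical" in rel_values and href:
            self.canonicals.append(href)


def audit_canonical(html: str, header_canonical: Optional[str] = None) -> str:
    """Label a page 'missing', 'ok' or 'conflicting' from its canonical signals.

    `header_canonical` is the URL from a rel="canonical" Link header, if one was
    sent. URLs are compared as plain strings (no normalization).
    """
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    signals = set(finder.canonicals)
    if header_canonical:
        signals.add(header_canonical.strip())
    if not signals:
        return "missing"
    return "ok" if len(signals) == 1 else "conflicting"
```

For PDFs there is no HTML to parse, so `audit_canonical("", header_canonical)` reduces to checking whether the HTTP header is present.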
For PDFs and other non-HTML documents, you can easily set an HTTP canonical header using .htaccess, for example.
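The .htaccess approach relies on Apache’s mod_headers to attach a `Link` header to the file. As a hedged sketch, assuming Apache with mod_headers enabled and `AllowOverride` permitting FileInfo directives (the filename and URL are placeholders), the Python helper below generates such a block so the quoting stays consistent across many documents. `htaccess_canonical_block` is a hypothetical name.

```python
def htaccess_canonical_block(filename: str, canonical_url: str) -> str:
    """Return an Apache <Files> block that sends a rel="canonical" Link header.

    Assumes Apache with mod_headers enabled and AllowOverride permitting
    FileInfo directives in .htaccess. Filename and URL are placeholders.
    """
    if '"' in filename or any(ch in canonical_url for ch in '"<> '):
        raise ValueError("filename and URL must not contain quotes, spaces or angle brackets")
    # Inner quotes are backslash-escaped inside the double-quoted Header value.
    link_value = f'<{canonical_url}>; rel=\\"canonical\\"'
    return "\n".join(
        [
            f'<Files "{filename}">',
            f'  Header set Link "{link_value}"',
            "</Files>",
        ]
    )


# Example output for htaccess_canonical_block("guide.pdf", "https://www.example.com/guide.pdf"):
#   <Files "guide.pdf">
#     Header set Link "<https://www.example.com/guide.pdf>; rel=\"canonical\""
#   </Files>
```

`<Files>` matches the file name in the directory holding the .htaccess file and its subdirectories, and `Header set` replaces any `Link` header Apache would otherwise send for that file, so each document ends up with a single canonical.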
3. Although rare, hijacking can occur
Under specific circumstances, such as problems with indexation or the copying domain being a more authoritative source, a page’s rankings can be hijacked by another domain that uses identical content. This is generally unlikely, but perhaps there are still some things Google can improve in identifying and ranking the original source.
Some opinions expressed in this article may be those of a guest author and not necessarily Search Engine Land. Staff authors are listed here.