February 28, 2014 - Penalty for scraper sites

Google launches a hunt for scraper websites that rank better in its results than the sites they copy. Apparently the algorithm is not able to identify them, so Google is calling on users to report them.
For the moment, the most reported site is Google itself, which incorporates paragraphs from Wikipedia into its Knowledge Graph!
In fact, the entry for August 29, 2011 shows that this is not the first time Google has launched such an initiative.

Evolution in 2013: The verbose Web

Some important sites have been penalized for buying links (see Rap Genius), but this seems to be the exception rather than the rule. Buying links from quality sites and paying bloggers remains productive, except for small sites. This may be an effect of what Google calls "detecting the authority of a site"!
Besides that, we see more and more pages built on the same model: a piece of new information at the beginning, diluted in a flood of words to cover the total lack of originality, followed by equally verbose reminders of what we already know about the subject. Presumably the authors are satisfied with their ranking, so we must conclude that the algorithm is satisfied with wordy pages and considers only the first part of the text.

September 27, 2013, Hummingbird takes flight

This is the end of the black and white series: after Panda and Penguin, here is Hummingbird. This new algorithm applies to both the frontend and the backend: it can handle a question as a whole rather than as a sequence of keywords, and it relates the question to page contents, i.e. it finds the pages in the database that best answer it. The algorithm has only just been revealed by Google, but it has been in place for several weeks.

July 18, 2013, Panda softened

An update to Panda that adds signals of a site's authority in its niche, to prevent useful sites from being penalized.
The rollout takes about 10 days to complete worldwide.

May 22, 2013, Penguin 2

While the previous version of Penguin affected only the home page of a site, the new iteration directly affects all pages (and not indirectly because they depend on the home page). This penalizes sites that have artificial backlinks, usually with optimized anchors.

May 15, 2013. The program for the year to come.

In a video, the head of webspam at Google says that we should expect changes in the algorithm in the coming months:

  1. Better detection of authority in various niches, with better signals, so as to moderate the negative effect of Panda.
    It is claimed that sites on the borderline will get the benefit of the doubt and will no longer be penalized. Here, as always, it is all about signals: it is the signals that demonstrate authority in an area that will be improved. Hopefully this does not mean enrollment in social sites!
  2. More sophisticated methods to analyze links and to remove any value from the activity of spammers.
  3. Fewer links to sites that have too many of them in the SERPs.
    In this regard, Google goes back and forth between reduction and increase depending on the period, which leaves this intention little credibility.
  4. Improving categories of results that have too much spam. Once again, reducing spam.
  5. No longer passing PageRank through advertisements.
  6. Better information for the webmaster when a site is hacked.

This announcement confirms the weight given to "signals" in assessing pages. Apart from extracting keywords, the algorithm ignores the content of pages, which is the source of all spam. In essence, the list shows the intention to reduce spam, which is nothing new, and it acknowledges that Panda is approximate.

Evolution of the algorithm in 2012

In 2012, SEO became a minefield. The idea of downgrading a site regardless of its content, which appeared with Panda in 2011, was extended with Penguin in 2012. This time it is pages that have too many artificial backlinks that are penalized. But this tendency to treat all sites on the basis of hypothetical manipulations of the algorithm is growing. Any action on a site can be wrongly considered an attempt to spam if it looks like a maneuver practiced by some spammers.
It is not that Google is totally incapable of judging the content of a page: it shows that it can with the Knowledge Graph. But it seems that this ability is reserved for its own use, while websites are judged solely on signals that can be interpreted as users' votes for or against.
Penguin iterations are dated 24 April, 26 May and 5 October.

November 17, 2012. Mysterious update

There has been no communication about this change, which has affected many sites, except that it is not an iteration of Panda. My theory is that it relates to taking into account likes and comments on social networking sites, but this is purely personal. Note that the count of likes on Google Plus disappeared from GWT at the same time.
Panda updates occurred on November 5 and 21.

October 10, 2012. The footer links are a penalty factor

The Webmaster Guidelines have recently changed. As always they state that exchanges and sales of links are prohibited, but a new line appears:

Links widely distributed in the footers of various sites.

These links are not considered natural and therefore violate the webmaster guidelines.

September 28, 2012. Domain names with exact-matching keywords

Sites whose domain name was chosen for its keywords, but whose content is not considered to be of quality (see Panda), are now penalized. This affects less than 1% of sites.

September 14, 2012. Return to diversity

After the change of 17 November 2010, which allowed a single domain to spam search results pages, Google backtracks and once again limits the number of links to the same domain.

August 10, 2012. Complaints of copyright infringement penalize a site

Even though Google does not remove from its index content that is the subject of a complaint, it now penalizes a site's pages when the site is too often the subject of copyright infringement complaints. We wonder if Scribd will be penalized!
Probably not, because Google says that many popular sites will escape. YouTube uses a different DMCA form that is not covered by this criterion of the algorithm: it cannot be penalized.

April 24, 2012. Change in algorithm against webspam affecting 3% of websites: Penguin Update!

It is obviously related to over-optimization, as announced, even if the term has been refined. A site is targeted when it accumulates negative signals, such as link exchanges or backlinks created by the site owner, and even internal links if they have the same keywords in the anchor. Irrelevant external links, i.e. links unrelated to the surrounding text, are also a negative signal. So is filling a page with the same keyword, as we already know.
Studies have shown that a site is penalized when the majority of its backlinks are not from sites on the same topic, and when anchors contain keywords that are not related to the topic of the page that contains them.
It is the sum of negative signals that causes the penalty. These criteria have always been taken into account, as we know, but the new algorithm performs a deeper analysis to better penalize black hat techniques.

In fact, Google has communicated about the changes and talks specifically about "black hat webspam". This change also has a name, the Penguin update, referring to the black-hat and white-hat sides (it's not a joke).

Also in April, the algorithm was modified to stop granting a freshness boost to new articles when the site is considered to be of poor quality. This is one among other changes that mainly affect the presentation of results.

March 2012. Over-optimization to be penalized - Anchor of link revisited - Graphical interface, a ranking factor

After having penalized sites whose content the algorithm finds insufficiently original and substantial (which mostly resulted in promoting verbose sites that indulge in digression), Google is preparing to take the next step and attack sites that push optimization too far. That is what Matt Cutts just announced at SXSW.
What optimizations will be penalized?

All this was already countered by the algorithm, but it seems that Google wants to improve its recognition of these practices and penalize such sites more heavily. The effect will appear in a few weeks.
Some already believe that de-optimizing a site (ignoring all the optimization rules) to make it look more natural and avoid a penalty can have a positive effect; this may be confirmed by the expected changes to the algorithm. But Google says that good optimization, done only to help engines find the content, is recommended.

Google announces that the way it interprets link anchors has been modified. A classifier, not specified, has been removed. The interpretation of the anchor according to the query has been refined. Other changes relate to synonyms, dates of threads, freshness, and site quality.
We also learn that the algorithm takes into account the interface and the rendering on mobile for ranking. We understand that the presence of icons, stars, etc. is an indication of quality.

February 2012. After several years, the way links are analyzed has changed

"We often use the characteristics of links to help us understand the topic of a linked page. We have changed the way we analyze the links. In particular we cancel a method of link analysis used for years."

Even if Google does not say precisely what has changed in its approach to evaluating links, this sentence strongly implies that signals related to the relevance of a link are involved and that one of them is no longer taken into account. Here is the list of these signals:

  1. The link anchor.
  2. The text surrounding the link.
  3. The position in the page. Depending on whether the link is in the body of the text or at the bottom, in the footer, it has a different role and a different relationship with the linked page.
  4. The title and rel attributes, including nofollow. Since nofollow makes the engine ignore the link (while still consuming its share of PageRank), the only possible change would be that nofollow itself is ignored.
  5. The PageRank of the page containing the link.
  6. Social link or link inside an article.

One could refer to Google's various patents on link analysis to find out what has changed. But without a significant change in the SERPs, we cannot really judge. Each of these factors may be devalued except one: a relevant link in the body of the text.

The announcement on the changes in March contradicts the idea that it is the link anchor that is now devalued. Perhaps it will be later?

January 19, 2012. Content visible above the fold is now a ranking factor

As announced in November last year, pages that first show advertising, with the actual content visible only when you scroll, will be penalized.
It affects 1% of searches.
Users complained that, to find content that answers their query, they must scroll down the page and skip ads.
But how can we determine what is "above the fold", since it depends on the resolution of the screen? On a mobile device, it is not the same in portrait and in landscape mode. The change is intended to affect pages such as those that have two 280-pixel-high ads side by side. In fact, Google provides a statistical measure of the page height visible without scrolling, with the Browser Size tool. 550 pixels is an acceptable value.
Is the size of the header part of the equation? It is if the header is not considered part of the content.
Google's Browser Size site (now closed) provided a measure of what is "above the fold" for a web page.
Announcement in Inside Search.

Evolution of the algorithm in 2011

2011 was a challenge for Google because its algorithm was increasingly criticized. With the arrival of a new CEO, the policy of Google's search engine changed. Indeed, Google stopped ranking pages and started ranking sites.
On the one hand, there was Panda, a new classification tool that penalizes a site when a certain number of its pages are considered "low quality", i.e. lacking content or originality. If the site also has quality pages, they rank lower too.
On the other hand, there is the return of multiple links to a single site in results pages. You can often see results monopolized by two sites, which is more than unfortunate.
The impression we have is that, although Google communicates a great deal about the development of its algorithms, these mostly serve to promote the largest sites. This is perhaps not unrelated to the fact that 300 million new sites appeared in 2011?
Fighting spam has become an obsession, to the detriment of many sites that provide information people search for but that is not visible in the SERPs.

December 2, 2011. Detection of parked domains.

This concerns domains whose home page is filled with advertising. A new algorithm is added to detect them and exclude them from results. This is part of a dozen measures announced for the month of November, including others about the freshness of content, to promote the most recent pages.
In November.

November 14, 2011. Bonus for official sites.

Sites related to a product, a person, when identified as official sites (made by the brand owner itself), will now receive a preferential treatment in ranking, according to a modification of the algorithm announced November 14, 2011.

November 10, 2011. Too many ads in a page: a direct ranking factor in the algo.

At PubCon 2011, Matt Cutts said that having too many ads on a page was going to be a direct (negative) ranking factor.
This has always been an indirect factor, insofar as it can encourage visitors to leave the site, increasing the bounce rate and reducing the duration of visits. But it will now be taken into account directly.
This also confirms that it was not a criterion in Panda.
Note that "too many ads" depends on the size of the page, and he also said that ad placement above the fold is taken into account.

November 3, 2011. New ranking about freshness of pages.

A change in the algorithm affects 35% of queries on the search engine. It concerns the freshness of pages, which can be promoted depending on the search context.
It concerns recent events and hot topics, topics that come up regularly in the news (e.g. F1 Grand Prix), and things that are continually updated without being news (e.g. a piece of software).
Other topics such as cooking recipes should not be affected by this change.

August 29, 2011. Better recognition of scrapers.

Sites that duplicate the pages of other sites verbatim in order to display advertisements should be better identified. They are sometimes better positioned in search results pages than the originals!
Google is testing a new algorithm and asks users to report such sites to help develop it.
Report a scraper.
This is not about copyright infringement but about sites that use some tool to extract content and put it in their own pages.

August 12, 2011. Panda in all languages.

Apart from Chinese, Korean and Japanese, Panda now applies to the whole world. The impact is between 6 and 9% in each language.
At the same time, Google changed the way Analytics calculates the bounce rate.

June 20, 2011. Post-Panda recovery.

Since June 15, some sites have recovered from the Panda penalty after being modified to remove duplicate content.
This seems to concern mainly sites hit because of duplicate content. The sites that were victims of scrapers have recovered, and the scrapers are now often removed from the SERPs.

June 8, 2011. The author attribute.

Several attributes that assign authorship, placed on links in the body of the page, are now recognized by Google:

<a rel="author" href="profile.html">Myself </a>
<a rel="me" href="profile.html">Myself </a>

This will help to classify pages by author.
The profile page designated in this way must be on the same site as the page that contains the attribute.
More information.
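
The same-site rule above can be checked mechanically. The following is a minimal sketch, assuming that "on the site" means "same hostname as the page", which is an interpretation and not something Google states. The function name and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def author_profile_is_on_same_site(page_url: str, profile_href: str) -> bool:
    """Return True if the rel="author" target resolves to the page's own host.

    Assumption: "same site" is taken to mean "same hostname".
    """
    # Resolve relative hrefs such as "profile.html" against the page URL.
    resolved = urljoin(page_url, profile_href)
    return urlparse(resolved).hostname == urlparse(page_url).hostname


page = "https://www.example.com/articles/seo.html"
print(author_profile_is_on_same_site(page, "profile.html"))
print(author_profile_is_on_same_site(page, "https://other.example.org/me"))
```

The first call uses the relative href from the snippet above, so it resolves to the page's own host. The second points to another host and is rejected.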

April 11, 2011. Panda action extended to the World.

The Panda action against poor-quality content is now rolled out to the whole world.
But it targets only English-language queries (on local versions of the search engine). Google is also starting to take into account the fact that some sites are blocked by users. This is one more criterion, but a minor one.
More big sites such as eHow were affected by the update, as were many much smaller sites, with indirect effects: links from the affected sites are devalued, and this also affects other sites that were not directly hit.
Understanding Panda Update

Update March 3, 2011. Important change against content farms: Panda Update.

Called "Panda" internally (after the name of an engineer), this action impacted 11.8% of queries by reducing the presence in results pages of content that is poor, unoriginal or not very useful. Conversely, sites that provide detailed articles resulting from original research will be favored.

"We want to encourage a healthy ecosystem..." Google said.

Google says that these changes do not come from the new Chrome extension that allows you to block sites. But a comparison with the data collected shows that 84% of the sites concerned are included in the list of blocked sites.
The effects appear today only in the U.S.A. They will subsequently extend to the rest of the world. One main result will be an increase in AdSense revenue for other sites, because these content farms are mostly intended to display advertisements.
It remains to be seen, on Alexa or Google Trends, how content farms will be affected and whether this is a "Farmer Day".
Finding more quality sites.
February 24, 2011.
List of sites penalized by the change.
Interview of Google's staff.

January 28, 2011. Change against copied content.

To fight sites that take content from other sites or whose content has no originality, a change was made in the algorithm earlier this week, from 24 January.
This affects only 2% of queries but, according to Matt Cutts, that is enough for you to experience a change in positioning (in the case of Scriptol, the audience grew by 10%).
It is a further improvement affecting the long tail. It can affect content farms that mass-produce articles, which are necessarily unoriginal.
Announcement by Matt Cutts.
How organized spam works. By SEOMoz.

January 21, 2011. New ranking formula.

The new algorithm is more efficient at detecting spam in page content, in the form of words repeated with the obvious intention of being ranked on those words.
Such spam can be found in an article or in blog comments.
See link below.

January 21, 2011. Algorithm better than ever against spam.

So says Google in a letter responding to criticisms about the quality of its search engine, particularly in the fight against spam.
Google says that displaying AdSense ads does not prevent a site without meaningful content from being downgraded, and neither does participation in the AdWords program.
In 2010, the algorithm underwent two major changes against spam. We recall the change that affected the long tail at the expense of sites with no content.
Google wants to go further in 2011 and invites webmasters to give their opinion. The target is mainly "content farms", which provide pages of no interest filled with keywords in order to rank in the results.
Some sites such as Demand Media (eHow, Answerbag), Associated Content and Suite101 could match the definition of a content farm: many pages produced each day, with little or no interest, targeting the most searched keywords.
The algorithm will be enhanced to recognize copied content and content with no originality.
Google search and search engine spam.
Give your opinion.
Content farms. Exact definition of what is a content farm and the list.

Evolution of the algorithm in 2010

Significant changes occurred in 2010 in results pages, including instant search, site previews and filtering by reading level, and in the index, with the inclusion of new file formats.
But for the ranking algorithm itself, progress is not as obvious. Search results pages are spam-infested, full of empty pages. Very large sites are able to generate millions of links, internal or to satellite sites intended only to display advertisements.
Companies are formed to have armies of typists produce masses of web pages whose only purpose is to carry advertising, and in which originality is of course totally absent.
It is not pleasant to ask a question and get as a result a page with the same question and no answer. So the ability to assess semantic content is where the engine should make progress.

December 2, 2010. Sentiment analysis added to the algorithm.

Google reacted to an article in the New York Times denouncing the fact that a merchant who dissatisfies its customers and generates many complaints in blogs and forums gains an advantage with search engines.
Indeed, when we denounce the practices or content of a site, we link to it to provide examples, and these backlinks are treated as an indicator of popularity by search engines, which translates into better ranking in the results!
Google therefore developed a sentiment analysis algorithm, which aims to recognize whether the text surrounding a link is positive or negative towards it, depending on the keywords it contains, in order to penalize sites that are the subject of complaints.
Google also recommends using the nofollow attribute to link to a site without contributing to its positioning.
Being bad to your customers is bad for business.
Large-Scale Sentiment Analysis for News and Blogs. Analysis in English of the algorithm.

November 5, 2010. Black Friday.

Since 21 or 22 October, depending on the region, a modification of the ranking algorithm has affected a lot of sites, some losing up to 80% of their traffic. Alexa has published graphs showing huge losses, or equivalent gains, for some sites.
These changes seem permanent. Their purpose appears to be to improve the relevance of results.

August 31, 2010. SVG indexed.

SVG content is now indexed, whether it is in a separate file to be included or embedded in HTML code.
List of files formats supported by Google.

August 20, 2010. Harmful internationalization?

Some webmasters have seen an increase in traffic from Google engines other than Google.com or that of their own country.
So American sites can see the arrival of visitors coming from other ccTLD engines, such as google.co.uk or google.fr, implying that the engines of other ccTLDs include U.S. sites in their results.
This could reduce the audience for the sites in these countries.

June 8, 2010. Caffeine builds a fresher index.

Google announced on June 8 that the new indexing engine, Caffeine, is finalized. It offers a new index with 50% fresher results.
Its operation differs from the previous system, which updated the index as a whole, in waves. Caffeine updates the index incrementally. New pages can be added and made available for search as soon as they are discovered.
The new architecture also makes it possible to associate a page with several countries.
Caffeine vs. previous system.

May 27, 2010. MayDay: The long tail evolves.

Matt Cutts confirmed at Google I/O in May that the radical evolution seen in April comes from a change in the algorithm, intended to promote quality content on the long tail.

This is an algorithmic change in Google, looking for higher quality sites to surface for long tail queries. It went through vigorous testing and isn’t going to be rolled back.

Recall that the long tail is the set of queries made of multiple keywords, each query being rare, but which together form the bulk of traffic to a site.

Webmasters gave this evolution the name MayDay. I had previously called it Black Tuesday. It has been disastrous for some well-established sites that did not have enough content in their deep pages. It happened in late April or early May depending on the site, even though other sites experienced a loss of traffic for other reasons.
This has boosted traffic on scriptol.com.
Google confirms Mayday impact. By Vanessa Fox, who also says that Caffeine is not live yet.
Matt Cutts explains Mayday in a video. It is not related to Caffeine and is permanent. Webmasters must add content to their pages to recover the lost traffic.

April 27, 2010. Black Tuesday: Ranking changes on the long tail.

The long tail is the set of pages on a site that each receive few visits but together account for a large share of traffic.
Queries with multiple keywords make up the long tail.
Many sites have seen a change in the traffic of these pages since April 27. Some have lost up to 90% of their traffic. Webmasters attributed this change to Caffeine, Google's new infrastructure, which indexes more pages and creates more competition, but Google later confirmed that it is a change in the algorithm (see May 27).

April 9, 2010. Site speed.

It is officially a ranking factor. Announced a few months ago, it has become reality: a site that is too slow is now downgraded in the SERPs, or at least may be, in conjunction with other factors.

Today we're including a new signal in our search ranking algorithms: site speed.

It is possible to know if your site is too slow from Google Webmaster Tools (Labs -> Site performance).
Using site speed in web search ranking.

November 19, 2009. The speed of a site will be a ranking factor in 2010.

This is what Matt Cutts has just said in an interview.

"Historically, we haven't had to use it in our search rankings, but a lot of people within Google think that the web should be fast.
It should be a good experience, and so it's sort of fair to say that if you're a fast site, maybe you should get a little bit of a bonus. If you really have an awfully slow site, then maybe users don't want that as much.
I think a lot of people in 2010 are going to be thinking more about 'how do I have my site be fast, how do I have it be rich without writing a bunch of custom javascript?'"

This should favor static websites with no SQL database. See our article, How to build a CMS without database.
See also Let's make the Web faster.
The interview.


According to Google, 540 improvements were made to the search engine in the year 2009.

December 15, 2009. Canonical Cross-Domain.

Support for the rel="canonical" attribute, which was introduced some months ago to avoid duplicate content between pages within a site, has been extended to similar pages on different domain names.
It is still preferable to use 301 redirects when you migrate a site to another domain.
Source Google.
To protect your site against other sites that might copy your content without permission, see how to build a generic canonical tag in PHP.
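
As an illustration of the idea, here is a minimal sketch in Python (not the PHP article mentioned above) that builds a canonical tag pointing at one preferred host. The function name, the example hosts, and the choice to force https and drop the query string are all assumptions made for this example. A site that serves distinct content per query string would need a different rule.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url: str, canonical_host: str) -> str:
    """Build a <link rel="canonical"> tag for the preferred host.

    Assumptions: https is the preferred scheme, and the query string
    and fragment never change the content of the page.
    """
    parts = urlsplit(requested_url)
    # Keep only the path and rebuild the URL on the preferred host.
    canonical_url = urlunsplit(("https", canonical_host, parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


print(canonical_link_tag("http://mirror.example.net/page.html?ref=feed", "www.example.com"))
# <link rel="canonical" href="https://www.example.com/page.html">
```

The tag goes in the head of every page. A copy served from another domain then still declares the original address as the one to index.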

August 11, 2009. New search engine, Caffeine.

Google is trying a new search engine that is intended to be faster and to provide more relevant results.

July 2, 2009. Less weight for irrelevant backlinks.

This is not officially confirmed by Google (which says little about its algorithm in any case), but webmasters believe that the results have changed and that positions in the SERPs have been lost because they depended on large numbers of low-quality backlinks.
What are irrelevant links? They are:
- Blogrolls.
- Backlinks from social sites.
- Backlinks from directories.
- Backlinks in the footers of partner sites.
- Links included in CMS templates.
In fact, Google recently announced that it would no longer take blogrolls into account. This is without doubt the result. And it is not just a loss of importance for these links: they are no longer counted at all!

With regard to social sites (such as Delicious, Stumbleupon), in contrast, Google said in a roundtable with webmasters: "They are treated like other sites".

June 19, 2009. Flash resources indexed.

The crawler was already able to index Flash applications; now it can also index the images and text used by these applications. Source Webmaster Central Blog.

June 2, 2009. New effect of the nofollow attribute - Onclick links.

The nofollow attribute lets crawlers ignore a link in a page. Until now, the PR was distributed among the remaining links.
It now appears that the PR is first divided among all the links (with or without nofollow), and the share of the nofollowed links is then simply not passed on.
Example: a page has 10 points of PR and 5 links, so 2 points are allotted to each link. If two links are nofollow, no PR is passed through them, but the others do not receive more: the 3 remaining links share only 6 points.
The consequences are dramatic: links in comments on a blog would result in a loss of PR for the other links.
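
The arithmetic of the example can be sketched as follows. This is a simplified illustration only: the function name and the "points" model are assumptions, and the real PageRank computation (damping factor, iteration over the link graph) is deliberately left out.

```python
def pr_per_followed_link(page_points: float, total_links: int,
                         nofollow_links: int, new_model: bool = True) -> float:
    """Points passed through each followed link, in the simplified model above."""
    followed = total_links - nofollow_links
    if followed <= 0:
        return 0.0
    # New behaviour: divide among all links, the nofollow share evaporates.
    # Old behaviour: divide among the followed links only.
    divisor = total_links if new_model else followed
    return page_points / divisor


new_share = pr_per_followed_link(10, 5, 2)
old_share = pr_per_followed_link(10, 5, 2, new_model=False)
print(new_share)                # 2.0 per followed link, 6 points passed in total
print(round(old_share, 2))      # 3.33 per followed link, all 10 points passed
```

With the new behaviour, 4 of the 10 points are simply lost. That is why nofollow no longer helps to concentrate PR on the other links.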

Quoting Matt Cutts:

"Suppose you have 10 links and 5 of them are nofollowed. There’s this assumption that that the other 5 links get ALL that PageRank and that may not be as true anymore."

More on PageRank and nofollow.
Also, Google takes into account links assigned in the onclick event.

April 4, 2009. Local Search.

Google improves local search, based on the IP address, which allows it to find the country and the city of a visitor. From this, Google tries to show in the results sites that are located as close as possible.
To take advantage of this option, the search must include a place name, in which case a map is displayed.
Google's Blog.

February 26, 2009. Brand names.

The algorithm now gives more weight to brand names and therefore promotes the related sites. This is confirmed by Matt Cutts (head of the webspam team and spokesman of Google) in a video.
The video.

February 25, 2009. The canonical tag.

A new tag tells crawlers which URL they should index when a page is accessible at multiple addresses.
The duplicate content problem solved.

July 16, 2008.

Google introduces, on an experimental basis, some features of Wikia's search engine. Users can mark results as good or spam.
The engine takes this into account, but only for the user who made the rating. For now...

July 2008.

Google announces that it has one trillion URLs of Web pages in its database.
Not all of these pages are indexed.

June 2008. Nofollow and PageRank.

Nofollow links do not count for the transmission of PageRank, but their share of PR is not redistributed to the normal links.
So the PR passed to linked pages is first divided by the total number of links, and the share assigned to nofollow links then evaporates.
Source: PageRank Sculpting.

May 17, 2007. Universal search

New algorithm and architecture to populate the results pages with diverse content such as images, videos, maps, news...

October 19, 2005. Jagger Update

This update gives more weight to the relevance of links. Important sites appear to be favored.
Spam is fought, especially techniques that use CSS to hide content from visitors.
An analysis of the Jagger Update.

May 20, 2005. Bourbon Update

An update to penalize sites with duplicate content, links to irrelevant pages (unrelated to the linking page), reciprocal links in quantity, or large numbers of links to a closely related site. It has affected many sites, with collateral damage.

2003. Florida Update.

It upset the SERPs. One of the key changes is that the algorithm works differently for different types of queries, and the SERPs are populated with results of different and complementary types.


The Google search engine appears on the Web.

