Remember how we discussed the topic of enabling full-text RSS feeds? Well, full-text RSS feeds contain the full text of your articles. But you might also want to include some text in the RSS feeds that does not show in the articles of the site itself. Using this trick, you can include practically anything you want in your RSS feeds; for the purposes of this article we're going to take on spammers as an example of that.
Suppose there is a spammer harvesting your RSS feeds for content on his site. You don't want to help him, but you don't want to remove full-text RSS feeds on your site.
So how do we discourage spammers from harvesting our content? We undermine their content-reproduction scams by tacking a link to the original article onto the RSS feed text. This works against the spammer's interests: every time they republish content harvested from your site, your site gains a backlink, which can improve its search engine rank.
And, oh, this hack is very simple.
Injecting a custom script into the RSS generation template
Remember how we had customized the rss_template? Here is the snippet we added:
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/" tal:condition="obj_item/getText | nothing" tal:content="structure python: '<![CDATA[' + obj_item.getText() + ']]>' ">blah </content:encoded>
Well, we're going to change that a bit now:
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/" tal:condition="obj_item/getText | nothing" tal:content="structure python: '<![CDATA[' + context.rss_antispam(obj_item) + ']]>' ">blah </content:encoded>
Creating the antispam script that modifies the RSS content on the fly
You'll note that we've now included a call to context.rss_antispam(). This is a Script (Python) that you're going to add to the same folder where your customized rss_template lives. The contents of the script are straightforward:
text = object.getText()
try:
    m = unicode(text, "utf-8")  # if this is unicode, the next line does not execute
    text = text.decode(object.getCharset())  # convert to unicode
except TypeError:
    pass

title = object.pretty_title_or_id()
try:
    m = unicode(title, "utf-8")  # if this is unicode, the next line does not execute
    title = title.decode(object.getCharset())  # convert to unicode
except TypeError:
    pass

link = object.absolute_url()
pattern = u'<p><small>This article was culled from <a href="%s">%s</a></small></p>'
preface = pattern % (link, title)

return u"\n" + preface + u"\n" + text
Once you have added this script and set its ID to rss_antispam, add object to its parameter list.
You'll note that the script converts your article's fields to Unicode in several places. There is a rationale behind this: Plone sometimes returns Unicode objects and sometimes plain byte strings, and concatenating a Unicode object with a byte string that contains non-ASCII characters raises a UnicodeDecodeError. All we do here is decode any plain byte strings into Unicode objects, using the object's character set, so that everything we concatenate is Unicode. This prevents the error.
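The same normalization can be written with an explicit type check instead of a try/except. The sketch below is only an illustration and is written for a Python 3 interpreter, where bytes plays the role of Python 2's plain string and str the role of its Unicode object. The helper name to_text is hypothetical and not part of Plone.

```python
def to_text(value, charset="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(charset)
    return value


# Both inputs end up as text, so they can be concatenated safely.
body = to_text(b"caf\xc3\xa9")   # encoded bytes, as some fields may return them
title = to_text(u"caf\u00e9")    # already text, returned unchanged
combined = title + u" / " + body
```

The type check makes the intent visible at a glance: decode only what is still encoded, and leave everything else alone.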
And that's it.
Now, when an RSS feed is accessed on your site, every object to be "feedified" will be run through the rss_antispam script, which will prepend a nice direct link in small typeface to the text of your article.
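If you want to see what a feed reader would receive without touching your site, the script's logic can be exercised outside Plone. The sketch below rests on assumptions: FakeArticle is a hypothetical class that mimics only the four methods the script calls, the example URL is a placeholder, and the function mirrors the script in Python 3 syntax rather than the Python 2 dialect used by Script (Python) objects.

```python
class FakeArticle:
    """Hypothetical stand-in for a Plone content object (illustration only)."""

    def __init__(self, title, text, url, charset="utf-8"):
        self._title = title
        self._text = text
        self._url = url
        self._charset = charset

    def getText(self):
        return self._text

    def pretty_title_or_id(self):
        return self._title

    def absolute_url(self):
        return self._url

    def getCharset(self):
        return self._charset


def rss_antispam(obj):
    """Prepend a small attribution link to the article body."""
    text = obj.getText()
    if isinstance(text, bytes):      # decode byte strings to text
        text = text.decode(obj.getCharset())
    title = obj.pretty_title_or_id()
    if isinstance(title, bytes):
        title = title.decode(obj.getCharset())
    pattern = u'<p><small>This article was culled from <a href="%s">%s</a></small></p>'
    preface = pattern % (obj.absolute_url(), title)
    return u"\n" + preface + u"\n" + text


# Placeholder data; the body is passed as bytes to exercise the decode path.
article = FakeArticle("My post", b"<p>Hello</p>", "http://example.com/my-post")
feed_body = rss_antispam(article)
print(feed_body)
```

The printed result is the attribution paragraph followed by the untouched article body, which is what ends up inside the CDATA section of the feed item.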
You'll notice that you can add anything in your antispam script, not just the link to the original article. Some ideas for things to add:
- a link to the comments anchor on your article
- articles on your site related to the text in question
- links to other feeds or topic search results
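As a rough sketch of the first and third ideas, the preface can be assembled from a list of links. Everything specific here is an assumption to adapt to your own site: the #comments anchor, the feeds URL, and the helper name build_preface are all placeholders. The sketch is Python 3 and uses html.escape so that titles containing characters such as & stay valid HTML; a Script (Python) inside Plone would need whatever escaping function is available in its environment.

```python
from html import escape


def build_preface(url, title, extra_links=()):
    """Build the HTML snippet that gets prepended to the feed text.

    extra_links is a sequence of (href, label) pairs.
    """
    links = [(url, title)] + list(extra_links)
    anchors = [
        u'<a href="%s">%s</a>' % (escape(href, quote=True), escape(label))
        for href, label in links
    ]
    return u"<p><small>" + u" | ".join(anchors) + u"</small></p>"


url = "http://example.com/my-post"  # placeholder article URL
preface = build_preface(
    url,
    "Tips & tricks",
    extra_links=[
        (url + "#comments", "Comments"),             # assumed anchor name
        ("http://example.com/feeds", "More feeds"),  # placeholder URL
    ],
)
```

Keeping the links in a list means each new idea is one more (href, label) pair rather than another edit to the HTML pattern.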