Linq to Xml – querying an RSS feed

Linq (in all its flavors) will ship with .Net 3.5 and Visual Studio 2008. Linq to Xml brings classes such as XDocument, XElement, and XAttribute in the System.Xml.Linq namespace. What’s interesting about XElement in particular is that it can load Xml from many sources (XElement.Load accepts a URI or file path, a TextReader, or an XmlReader, and XElement.Parse accepts a string) and then let us query into it. Depending on your experience with XPath, a Linq to Xml query may be easier to write AND read.
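To make "many sources" a bit more concrete, here is a small sketch of two of the XElement entry points: XElement.Parse for Xml already in memory as a string, and XElement.Load for a TextReader (the URI overload is the one the examples below use). The tiny RSS string and the LoadingSources class are made up for illustration so the sketch runs without network access.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class LoadingSources
{
    // Made-up RSS snippet for illustration; this is not the real feed.
    public const string SampleRss =
        "<rss version=\"2.0\"><channel><title>Sample</title>" +
        "<item><title>First post</title></item>" +
        "</channel></rss>";

    // Parse Xml that is already in memory as a string.
    public static XElement FromString(string xml)
    {
        return XElement.Parse(xml);
    }

    // Load from any TextReader (a StringReader here; a StreamReader over a file works the same way).
    public static XElement FromReader(TextReader reader)
    {
        return XElement.Load(reader);
    }

    public static void Main()
    {
        XElement a = FromString(SampleRss);
        XElement b = FromReader(new StringReader(SampleRss));

        // A URI works too, e.g. XElement.Load("http://feeds.feedburner.com/jeffreypalermo"),
        // but that needs network access, so it is not called here.
        Console.WriteLine(a.Name);                   // rss
        Console.WriteLine(XNode.DeepEquals(a, b));   // True
    }
}
```

Whichever source you start from, you end up with the same XElement tree, so the queries that follow don't care where the Xml came from.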

Let’s look at a sample Linq to Xml query. I’m going to use my blog’s RSS feed, http://feeds.feedburner.com/jeffreypalermo.

Let’s start simple. We’ll read in the Xml from the RSS feed and enumerate through all the posts (or "items" in the RSS world).

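// Namespaces used by the snippets in this post
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// XElement.Load returns the root <rss> element of the feed.
XElement rssFeed = XElement.Load("http://feeds.feedburner.com/jeffreypalermo");

var items = from item in rssFeed.Elements("channel").Elements("item")
            select item;

Console.WriteLine(items.Count());
foreach (XElement post in items)
{
    Console.WriteLine(post);
}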

Note that we are merely reading the feed into an XElement, which holds the root <rss> element, and then selecting every <item> element under <channel>.
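Because XElement.Load hands back the root <rss> element, the query starts at Elements("channel") rather than at a document node. For readers coming from XPath, the sketch below runs the same selection both ways against a made-up two-item feed (the sample string and the ItemQueries class are illustrative only). XPathSelectElements is the extension method from the System.Xml.XPath namespace that works directly on Linq to Xml trees.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElements extension method

public static class ItemQueries
{
    // Made-up two-item feed so the example runs without network access.
    public const string SampleRss =
        "<rss version=\"2.0\"><channel>" +
        "<item><title>First</title></item>" +
        "<item><title>Second</title></item>" +
        "</channel></rss>";

    // Linq to Xml: from the <rss> root to <channel>, then to each <item>.
    public static int CountWithLinq(XElement rss)
    {
        return (from item in rss.Elements("channel").Elements("item")
                select item).Count();
    }

    // The same selection as an XPath expression relative to <rss>.
    public static int CountWithXPath(XElement rss)
    {
        return rss.XPathSelectElements("channel/item").Count();
    }

    public static void Main()
    {
        XElement rss = XElement.Parse(SampleRss);
        Console.WriteLine(CountWithLinq(rss));   // 2
        Console.WriteLine(CountWithXPath(rss));  // 2
    }
}
```

Both return the same elements; which one reads better is largely a matter of what you are used to.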

It’s nice that we can filter down to just the "item" elements that quickly, but what if I want to find only the posts whose author is "Jeffrey Palermo"? (Never mind that this is my own feed; imagine it was a composite feed.)

XNamespace dc = "http://purl.org/dc/elements/1.1/";

XElement rssFeed = XElement.Load("http://feeds.feedburner.com/jeffreypalermo");

// The (string) cast yields null when an item has no dc:creator,
// where .Value would throw a NullReferenceException.
var items = from item in rssFeed.Elements("channel").Elements("item")
            where (string)item.Element(dc + "creator") == "Jeffrey Palermo"
            select item;

Console.WriteLine(items.Count());
foreach (XElement post in items)
{
    Console.WriteLine(post);
}

I included this example to show how the XNamespace class helps match elements that live in a namespace, such as "dc:creator" in an RSS feed: dc + "creator" builds the fully qualified name. Note the "where" clause in my Linq query.
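A quick way to see why the XNamespace is needed: a bare "creator" name means "creator in no namespace", so it never matches dc:creator. The sketch below uses a made-up two-item feed (the NamespaceDemo class and the sample data are illustrative assumptions). It shows that mismatch, prints the expanded name that dc + "creator" produces, and relies on the (string) cast so an item with no creator is skipped instead of throwing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Made-up feed using the Dublin Core prefix; item B has no creator on purpose.
    public const string SampleRss =
        "<rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"><channel>" +
        "<item><title>A</title><dc:creator>Jeffrey Palermo</dc:creator></item>" +
        "<item><title>B</title></item>" +
        "</channel></rss>";

    static readonly XNamespace dc = "http://purl.org/dc/elements/1.1/";

    // "creator" alone is {no namespace}creator, so it never finds dc:creator.
    public static bool MatchesWithoutNamespace(XElement item)
    {
        return item.Element("creator") != null;
    }

    public static string[] TitlesByCreator(XElement rss, string creator)
    {
        return (from item in rss.Elements("channel").Elements("item")
                // A missing element casts to null rather than throwing.
                where (string)item.Element(dc + "creator") == creator
                select (string)item.Element("title")).ToArray();
    }

    public static void Main()
    {
        XElement rss = XElement.Parse(SampleRss);
        XElement first = rss.Element("channel").Element("item");

        Console.WriteLine(MatchesWithoutNamespace(first));   // False
        Console.WriteLine(dc + "creator");                    // {http://purl.org/dc/elements/1.1/}creator
        Console.WriteLine(string.Join(",", TitlesByCreator(rss, "Jeffrey Palermo")));  // A
    }
}
```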

Let’s go further and filter down to just the posts whose description contains the term "altnetconf" (or choose another term to search for).

XNamespace dc = "http://purl.org/dc/elements/1.1/";

XElement rssFeed = XElement.Load("http://feeds.feedburner.com/jeffreypalermo");

// Missing elements cast to null; ?? "" keeps Contains from throwing.
var items = from item in rssFeed.Elements("channel").Elements("item")
            where (string)item.Element(dc + "creator") == "Jeffrey Palermo"
               && ((string)item.Element("description") ?? "").Contains("altnetconf")
            select item;

Console.WriteLine(items.Count());
foreach (XElement post in items)
{
    Console.WriteLine(post);
}
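String.Contains is case-sensitive, so the query above would miss a post that mentions "AltNetConf". If that matters, one option is IndexOf with StringComparison.OrdinalIgnoreCase. The sketch below assumes a made-up feed and a hypothetical TitlesContaining helper, and uses a let clause so a missing <description> becomes an empty string.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TermFilter
{
    // Made-up feed: item C has no <description> at all.
    public const string SampleRss =
        "<rss version=\"2.0\"><channel>" +
        "<item><title>A</title><description>Notes from AltNetConf</description></item>" +
        "<item><title>B</title><description>Something else</description></item>" +
        "<item><title>C</title></item>" +
        "</channel></rss>";

    // Hypothetical helper: titles of items whose description mentions the term, ignoring case.
    public static string[] TitlesContaining(XElement rss, string term)
    {
        return (from item in rss.Elements("channel").Elements("item")
                // A missing <description> becomes "" instead of a NullReferenceException.
                let description = (string)item.Element("description") ?? ""
                // Unlike Contains, this comparison ignores case.
                where description.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
                select (string)item.Element("title")).ToArray();
    }

    public static void Main()
    {
        XElement rss = XElement.Parse(SampleRss);
        Console.WriteLine(string.Join(",", TitlesContaining(rss, "altnetconf")));  // A
    }
}
```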

Now we’ve seen how the where clause works, so let’s shape the data. I’ll drop the where clause for simplicity: I want to grab my feed and put each post’s title and abstract into a collection for use in my application. I’ll use an anonymous type to help me do that.

XElement rssFeed = XElement.Load("http://feeds.feedburner.com/jeffreypalermo");

// Substring(0, 100) throws if a description is shorter than 100 characters, so clamp it.
var items = from item in rssFeed.Elements("channel").Elements("item")
            let description = (string)item.Element("description") ?? ""
            select new
            {
                Title = (string)item.Element("title"),
                Abstract = description.Length > 100 ? description.Substring(0, 100) : description
            };

Console.WriteLine(items.Count());
foreach (var post in items)
{
    Console.WriteLine(post);
}

Here, I have created an anonymous type with a Title and an Abstract. Now that the information is in object form, I can work with it. I would not want to work with Xml throughout my application, only at the edges. I prefer to pull the information in and get it into my domain model as quickly as possible, because objects are easier to work with than raw Xml.
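A note on the 100-character abstract: String.Substring(0, 100) throws an ArgumentOutOfRangeException for any description shorter than 100 characters, which is easy to hit on a real feed. A small clamp avoids that. The Truncate helper below is a hypothetical name, not part of Linq to Xml, shown with the same anonymous-type projection against a one-item sample feed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AbstractShaping
{
    // Hypothetical helper: return at most maxLength characters, never throwing on short or null text.
    public static string Truncate(string text, int maxLength)
    {
        if (text == null) return string.Empty;
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }

    public static void Main()
    {
        XElement rss = XElement.Parse(
            "<rss version=\"2.0\"><channel>" +
            "<item><title>Short</title><description>Only a few words</description></item>" +
            "</channel></rss>");

        var items = from item in rss.Elements("channel").Elements("item")
                    select new
                    {
                        Title = (string)item.Element("title"),
                        Abstract = Truncate((string)item.Element("description"), 100)
                    };

        foreach (var post in items)
        {
            // The anonymous type is still strongly typed: post.Title is a string.
            Console.WriteLine("{0} - {1}", post.Title, post.Abstract);  // Short - Only a few words
        }
    }
}
```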

I’m not finished yet, though. My type doesn’t have a name, so there is no way to pass my IEnumerable<whatever-the-type-is> to another part of my application. I need to hydrate one of my domain objects so I can work with my abstracts.

I’m going to use this class:

public class PostAbstract
{
    private string _title;
    private string _abstract;

    public string Title
    {
        get { return _title; }
        set { _title = value; }
    }

    public string Abstract
    {
        get { return _abstract; }
        set { _abstract = value; }
    }
}
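Since C# 3.0 ships alongside Linq, the same class can also be written with auto-implemented properties, where the compiler generates the private backing fields. This sketch is an alternative spelling of the PostAbstract class above, not a change to it, and its Main shows the object initializer syntax that the next query relies on.

```csharp
using System;

// Same shape as PostAbstract above; the compiler generates the backing fields.
public class PostAbstract
{
    public string Title { get; set; }
    public string Abstract { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Object initializer: runs the parameterless constructor, then sets each property.
        PostAbstract post = new PostAbstract { Title = "Hello", Abstract = "World" };
        Console.WriteLine("{0} - {1}", post.Title, post.Abstract);  // Hello - World
    }
}
```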

I’ll have the results of my Xml query create a set of PostAbstracts, and then I can work with them.

XElement rssFeed = XElement.Load("http://feeds.feedburner.com/jeffreypalermo");

IEnumerable<PostAbstract> items = from item in rssFeed.Elements("channel").Elements("item")
                                  let description = (string)item.Element("description") ?? ""
                                  select new PostAbstract
                                  {
                                      Title = (string)item.Element("title"),
                                      Abstract = description.Length > 100 ? description.Substring(0, 100) : description
                                  };

Console.WriteLine(items.Count());
foreach (PostAbstract anAbstract in items)
{
    Console.WriteLine("{0} - {1}", anAbstract.Title, anAbstract.Abstract);
}

Note that I merely gave my type a name, PostAbstract. I used object initializers to set the properties, and now I have a set of objects I can work with to accomplish my purpose.
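To actually hand these objects to another part of an application, it can help to wrap the query in a method that returns a concrete list. Linq queries are deferred, so in the code above items.Count() and the foreach each walk the in-memory XElement tree again; calling ToList() runs the query once. The FeedReader class and ReadAbstracts method below are hypothetical names for that kind of Xml-at-the-edges boundary, sketched against an inline sample feed instead of the live one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class PostAbstract
{
    public string Title { get; set; }
    public string Abstract { get; set; }
}

// Hypothetical boundary class: the Xml stays in here; callers only see PostAbstract.
public static class FeedReader
{
    public static List<PostAbstract> ReadAbstracts(XElement rss, int maxLength)
    {
        var query = from item in rss.Elements("channel").Elements("item")
                    let description = (string)item.Element("description") ?? ""
                    select new PostAbstract
                    {
                        Title = (string)item.Element("title"),
                        Abstract = description.Length <= maxLength
                            ? description
                            : description.Substring(0, maxLength)
                    };

        // The query is deferred; ToList() runs it once and returns a plain list.
        return query.ToList();
    }

    public static void Main()
    {
        XElement rss = XElement.Parse(
            "<rss version=\"2.0\"><channel>" +
            "<item><title>One</title><description>First description</description></item>" +
            "<item><title>Two</title><description>Second description</description></item>" +
            "</channel></rss>");

        List<PostAbstract> abstracts = ReadAbstracts(rss, 5);
        Console.WriteLine(abstracts.Count);  // 2
        foreach (PostAbstract a in abstracts)
        {
            Console.WriteLine("{0} - {1}", a.Title, a.Abstract);
        }
    }
}
```

With the short maxLength used here, the loop prints "One - First" and "Two - Secon"; in the post’s own code the limit is 100.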

If you enjoyed this article, subscribe to my feed: http://feeds.feedburner.com/jeffreypalermo


10 Responses to Linq to Xml – querying an RSS feed

  1. Jim Wooley says:

    In addition to querying the raw XML, you can use the new SyndicationFeed in the 3.5 System.ServiceModel.Web library to load feeds in either RSS or ATOM and have strongly typed results which you can then query through LINQ to Objects. See http://devauthority.com/blogs/jwooley/archive/2007/10/12/85717.aspx for a sample implementation.

  2. DonXML says:

    @Jeffrey

    You don’t need a schema to use XML Properties (that’s what they are called), but it does help with the intellisense.

  3. @Jeremy,
    Nope, but I wonder if there is Linq support in J#. . . oh that’s right, Microsoft silently deprecated that language.

  4. @DonXML,
    Yep, I demoed the VB xml schema intellisense at the Austin DevCares event last Friday. It is cool how by importing the schema one could access the Xml with better syntax.

  5. @Joe,
    Yeah, I had to manually fix the  

  6. Joe Ocampo says:

    Looks like your posts are suffering from the same “ ” Live Writer Beta 3 bug that I have. Gotta love a double space after the end of a sentence. ;-)

  7. VB.Net, it’s the ultimate Alt.Net experience!

  8. Isn’t there something in the CodeBetter bylaws about VB.NET?

  9. DonXML says:

    And the VB version is even easier to read and write, with no perf difference from C#!

    Dim items = From item In rssFeed.<channel>.<item> _
        Select New PostAbstract With {.Title = item.<title>.Value, _
                                      .Abstract = item.<description>.Value.Substring(0, 100)}

    And handling namespaces is even easier. You declare namespaces as import statements:

    Imports <xmlns:dc="http://purl.org/dc/elements/1.1/">

    And then reference them:

    Dim items = From item In rssFeed.<channel>.<item> _
        Where item.<dc:creator>.Value = "Jeffrey Palermo" _
        Select item

    It works the same way when writing XML.

    If you don’t want to write your whole app in VB, you can use Hanselman’s ILMerge technique: http://www.hanselman.com/blog/MixingLanguagesInASingleAssemblyInVisualStudioSeamlesslyWithILMergeAndMSBuild.aspx

  10. Good post, Jeffrey. It illustrates very well how Linq can help us in the near future.
