WYSIWYG Web Printing with PDF & Response Filters

By far the #1 complaint I hear from users of various web applications is that printing a page is not WYSIWYG.  It’s really a simple problem that the industry has sort of swept under the rug for most of the history of the Internet.  Browser printing in the current versions of IE and Firefox is simply horrible, and while server technologies like ASP.NET and PHP have gotten better and better at screen presentation with every release, printing simply has not kept pace.  As web developers we spend our time figuring out all the intricacies of browser page rendering, and we are often asked to do the same thing for printing.  It’s frankly a frustrating, difficult problem to solve.  IE7 should finally fix many of these problems when it’s released, but what are we supposed to do until then?  And even after it’s released, how many years will we still be designing for the current crop of web browsers?

Adobe PDF Printing Rocks

Adobe’s PDF format does have its issues, but it also has two huge advantages that make it a great solution for creating printable website documents: 1) nearly all your users already have the free reader software, and 2) printing is nearly 100% WYSIWYG.  So what’s the solution to your website printing problems, at least until you can guarantee that the majority of your users will be using a print-friendly browser?  Simple: when a user makes a print request on your site, create a PDF document on the fly, making sure to include all of the user’s page state, and send that PDF document to the browser.  Simple, right?  Well, it’s easier said than done.  You have two big issues to solve: finding a decent PDF writing component to deploy on your web servers, and accurately rendering an HTML page through that component while preserving your user’s page state.

Finding a Decent PDF Component

In order to create a PDF document on the fly, I wanted a PDF component that would take a URL and create a document from the resulting page.  There are plenty of resources out there that will help you find the right PDF component, so I won’t list them here.  I settled on WebSupergoo’s ABCpdf component.  I liked its simplicity: you can grab any page and turn it into a PDF, including JavaScript-rendered elements, using the following code:

    Doc theDoc = new Doc();

    // Turn off the component's HTML page cache so the page is rendered fresh.
    theDoc.HtmlOptions.PageCacheEnabled = false;
    theDoc.HtmlOptions.PageCacheClear();

    theDoc.Page = theDoc.AddPage();

    // Let scripts run so JavaScript-rendered elements show up in the PDF.
    theDoc.HtmlOptions.UseScript = true;
    int theID = theDoc.AddImageUrl("http://codebetter.com");

    // Keep adding pages until all of the HTML content has been placed.
    while (true)
    {
        theDoc.FrameRect(); // add a black border
        if (!theDoc.Chainable(theID))
            break;
        theDoc.Page = theDoc.AddPage();
        theID = theDoc.AddImageToChain(theID);
    }

    // Flatten each page.
    for (int i = 1; i <= theDoc.PageCount; i++)
    {
        theDoc.PageNumber = i;
        theDoc.Flatten();
    }

You can then take the resulting document and stream it back to the browser.  Here’s what the PDF output looks like rendered via the Acrobat reader in IE.

 

Preserving Page State

A problem you’ll find with this method of generating PDF documents is that while it works well for static HTML pages and other content, it quickly breaks down for anything but the simplest web applications.  The reason?  The PDF component makes its own request for the URL, so it doesn’t carry the user’s state (login status, session, viewstate) along with it.  The resulting PDF output may therefore be missing things like secure content, grid sorting, or other page content that depends on page state.  You’ll often bump up against this limitation when you’re trying to send the content of a web response to some other output location, such as a file, database, cache, email or something else entirely.  The solution is to create a custom ASP.NET Response.Filter.  This lets you intercept the response that would normally be sent to the user’s browser and do all kinds of neat things with it, like turn it into a PDF document.  Here’s a great article from a great site that talks all about these Response.Filter thingies.

Putting it all Together

In order to get this all to work properly, you’ve got to wire it all up.  I’ve found that the following method works best:

First, create a LinkButton on your master page to allow the user to create a “Printable PDF View”.  Handle the button click event, and hand off the response to your custom PDF response filter.  Here’s the code snippet that switches the filter:

    protected void LinkButton1_Click(object sender, EventArgs e)
    {
        // HOW THIS WORKS:
        //    Add a new response filter.  This saves the page output to temporary storage, and
        //    redirects to Pdf/PageToPdf.ashx.  That handler creates a PDF object and adds the
        //    saved output page via Pdf/PreRenderedPdfPage.ashx.  This allows the current page
        //    state and view to be requested via the PDF object, which normally wouldn't be
        //    able to share session, etc.
        Response.Filter = new PdfResponseFilter(Response.Filter);
    }

In your custom filter, write the output stream temporarily to the cache (or wherever you want to store it temporarily).  It has to be stored somewhere that any web request can access it.  Good candidates are the file system, the cache, or the database.  NOTE: This is what is called “Security by Obscurity” and is generally not considered secure enough for highly sensitive data, such as credit card information.  It may be extremely difficult to guess the GUID, and an attacker would have to make the page request during the few milliseconds that the rendered HTML waits to be picked up, but it is still possible that it could be intercepted.  So, don’t do this if you work for a bank or the government.  I don’t, and I find it secure enough for most applications I work with.

    public override void Write(byte[] buffer, int offset, int count)
    {
        string strBuffer = Encoding.UTF8.GetString(buffer, offset, count);

        // Accumulate every chunk of the response.
        responseHtml.Append(strBuffer);

        // ---------------------------------
        // Wait for the closing </html> tag
        // (this assumes the tag arrives within a single chunk)
        // ---------------------------------
        Regex eof = new Regex("</html>", RegexOptions.IgnoreCase);

        if (eof.IsMatch(strBuffer))
        {
            string finalHtml = responseHtml.ToString();

            // Store the HTML under a one-time ticket (GUID).
            Guid g = Guid.NewGuid();
            HttpContext.Current.Cache[g.ToString()] = finalHtml;

            // Hand the ticket to the handler that builds the PDF.
            HttpContext.Current.Response.Redirect("PageToPdf.ashx?DocId=" + g.ToString(), false);
        }
    }
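One thing to watch in a filter like this: the response can arrive in several Write calls, and nothing guarantees that the closing tag, or a multi-byte UTF-8 character, lands entirely inside one of them.  Here’s a minimal sketch of one way around that, which keeps the raw bytes and re-scans only the tail of the buffer on each write.  HtmlCapture and its members are names I made up for illustration; they are not part of ASP.NET or ABCpdf, and wiring this into the filter’s Write method is left out.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an ASP.NET or ABCpdf type): collects response
// chunks as raw bytes and reports when the closing </html> tag has arrived.
public sealed class HtmlCapture
{
    private static readonly byte[] EndTag = Encoding.ASCII.GetBytes("</html>");
    private readonly MemoryStream raw = new MemoryStream();
    private bool sawEnd;

    // Appends one chunk; returns true once the closing tag has been seen.
    public bool Append(byte[] buffer, int offset, int count)
    {
        raw.Write(buffer, offset, count);

        // Scan the new bytes plus enough of the previous tail to catch
        // a tag that was split across two chunks.
        long start = Math.Max(0, raw.Length - count - (EndTag.Length - 1));
        if (!sawEnd)
            sawEnd = ContainsEndTag(raw.GetBuffer(), (int)start, (int)raw.Length);
        return sawEnd;
    }

    // Decodes everything at once, so split multi-byte characters stay intact.
    public string GetHtml()
    {
        return Encoding.UTF8.GetString(raw.GetBuffer(), 0, (int)raw.Length);
    }

    private static bool ContainsEndTag(byte[] data, int start, int end)
    {
        for (int i = start; i + EndTag.Length <= end; i++)
        {
            int j = 0;
            while (j < EndTag.Length && ToLowerAscii(data[i + j]) == EndTag[j])
                j++;
            if (j == EndTag.Length)
                return true;
        }
        return false;
    }

    private static byte ToLowerAscii(byte b)
    {
        return (b >= (byte)'A' && b <= (byte)'Z') ? (byte)(b + 32) : b;
    }
}
```

Searching the bytes directly is safe here because the tag is plain ASCII, and no byte of a multi-byte UTF-8 sequence can be mistaken for an ASCII character.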

The PDF component has been issued a one-time ticket (GUID) to pick up the resulting HTML (which may contain secure information), and the HTTP response is redirected to an ASHX handler that will create the PDF document.  This handler creates a SuperGoo PDF document and requests the stored HTML, passing in the ticket (GUID).

    public void ProcessRequest(HttpContext context)
    {
        Doc theDoc = new Doc();
        theDoc.HtmlOptions.PageCacheEnabled = false;
        theDoc.HtmlOptions.PageCacheClear();

        theDoc.Page = theDoc.AddPage();

        theDoc.HtmlOptions.UseScript = true;

        // ApplicationPath already starts with "/", so trim it instead of adding another slash.
        string appPath = context.Request.ApplicationPath.TrimEnd('/');
        int theID = theDoc.AddImageUrl("http://localhost" + appPath + "/PreRenderedPdfPage.ashx?DocId=" + context.Request["DocId"]);

        // Keep adding pages until all of the HTML content has been placed.
        while (true)
        {
            theDoc.FrameRect(); // add a black border
            if (!theDoc.Chainable(theID))
                break;
            theDoc.Page = theDoc.AddPage();
            theID = theDoc.AddImageToChain(theID);
        }

        for (int i = 1; i <= theDoc.PageCount; i++)
        {
            theDoc.PageNumber = i;
            theDoc.Flatten();
        }

        // Make sure neither the browser nor the server caches the generated PDF.
        context.Response.ClearHeaders();
        context.Response.Expires = 0;
        context.Response.Cache.SetCacheability(HttpCacheability.NoCache);
        context.Response.Cache.SetNoServerCaching();
        context.Response.Cache.SetNoStore();
        context.Response.Cache.SetMaxAge(TimeSpan.Zero);

        context.Response.AddHeader("content-disposition", "attachment; filename=Doc" + context.Request["DocId"] + ".pdf");
        context.Response.ContentType = "application/pdf";

        theDoc.Save(context.Response.OutputStream);
        theDoc.Clear();

        context.Response.End();
    }
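A small detail in the handler above is the URL handed to the PDF component.  Request.ApplicationPath is “/” for a site at the root and something like “/MyApp” for a virtual directory, so gluing it between two more slashes gives you doubled slashes.  Here’s a rough sketch of a helper that builds the pickup URL with exactly one slash and escapes the ticket.  PickupUrl.Build is a hypothetical name of mine, and the host is passed in rather than assumed.

```csharp
using System;

public static class PickupUrl
{
    // Hypothetical helper: builds the URL the PDF component will request.
    public static string Build(string host, string applicationPath, string docId)
    {
        // "/" becomes "" and "/MyApp" stays "/MyApp", so exactly one
        // slash ends up in front of the handler name.
        string basePath = applicationPath.TrimEnd('/');

        // Escape the ticket so it cannot add extra query string parameters.
        return "http://" + host + basePath
            + "/PreRenderedPdfPage.ashx?DocId=" + Uri.EscapeDataString(docId);
    }
}
```

In the handler you would call it with something like PickupUrl.Build("localhost", context.Request.ApplicationPath, context.Request["DocId"]).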

A final ASHX handler writes out the stored HTML and removes it from cache.

    public void ProcessRequest(HttpContext context)
    {
        context.Response.Expires = 0;
        context.Response.Cache.SetCacheability(HttpCacheability.NoCache);
        context.Response.Cache.SetNoServerCaching();
        context.Response.Cache.SetNoStore();
        context.Response.Cache.SetMaxAge(TimeSpan.Zero);

        if (context.Request["DocId"] != null && context.Cache[context.Request["DocId"]] != null)
        {
            String preRenderedHtml = context.Cache[context.Request["DocId"]].ToString();

            // Write out this document.
            context.Response.Write(preRenderedHtml);

            // Delete this so no one can pick up this page by guessing an ID later on.
            context.Cache.Remove(context.Request["DocId"]);
        }
        else
        {
            context.Response.Write("<HTML><BODY><H1>THIS PAGE HAS EXPIRED.</H1></BODY></HTML>");
        }

        context.Response.End();
    }
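If you’d rather not lean on the ASP.NET cache for the ticket handling, the same idea fits in a small class of its own.  This is only a sketch under my own assumptions: OneTimeTicketStore, Store and Redeem are made-up names, the store lives in memory in a single process, and the current time is passed in to keep it easy to test.  It adds two things the handler above doesn’t have: a time limit on each ticket, and a check that the ticket really is a GUID.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: hands out one-time tickets for stored HTML.
public sealed class OneTimeTicketStore
{
    private sealed class Entry
    {
        public string Html;
        public DateTime ExpiresUtc;
    }

    private readonly ConcurrentDictionary<Guid, Entry> entries =
        new ConcurrentDictionary<Guid, Entry>();
    private readonly TimeSpan lifetime;

    public OneTimeTicketStore(TimeSpan lifetime)
    {
        this.lifetime = lifetime;
    }

    // Stores the HTML and returns the ticket that can redeem it.
    public Guid Store(string html, DateTime nowUtc)
    {
        Guid ticket = Guid.NewGuid();
        entries[ticket] = new Entry { Html = html, ExpiresUtc = nowUtc + lifetime };
        return ticket;
    }

    // Returns the HTML at most once per ticket.  Malformed, unknown,
    // already used, or expired tickets all yield null.
    public string Redeem(string ticketText, DateTime nowUtc)
    {
        Guid ticket;
        Entry entry;
        if (!Guid.TryParse(ticketText, out ticket))
            return null;
        if (!entries.TryRemove(ticket, out entry))
            return null;
        return entry.ExpiresUtc > nowUtc ? entry.Html : null;
    }
}
```

Tickets that are never redeemed stay in the dictionary in this sketch, so a real version would need some kind of periodic cleanup.  Also note that a strictly one-time ticket fails if the page ends up being requested twice.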

All of this allows the exact HTML that would normally be sent back to the browser to be routed through the PDF component and sent back to the user as a PDF, resulting in a true WYSIWYG printable page.

I’ve thrown together a quick sample app that you can download here that demonstrates this entire process.  You’ll have to visit webSuperGoo and download the trial PDF component to get this to work.   It shows you the following page, allows you to set some session and view state information, then allows you to generate a WYSIWYG PDF document.

Clicking the Printable PDF View link will generate a PDF, with the calendar and other page state and login information preserved.

Good luck, and I hope this can help solve your web printing headaches, it certainly did for me!

[tags: ASP.NET, PDF, WYSIWYG Printing]

About Brendan Tompkins

By day, I'm a software engineer working in the transportation and logistics industry... By night I'm a fabulous disco dancer.
  • Dave

    It looks like you are simply embedding a “screenshot” of the page into a pdf. Can you actually select the text in the PDF you generated or is it just one image? Seems to me like you could do the same think without any 3rd party components.

  • btompkins

    “Can you actually select the text in the PDF you generated or is it just one image”

    Dave, this component creates a true PDF with selectable text and everything, this is why it’s so cool! You can scale in and print a perfect copy.

  • Dave

    Thanks for clearing that up for me. When I’ve looked into these types of controls before, they all used IE as the underlying renderer to get the webpage, is this one different?

  • btompkins

    The underlying mechanism looks to be a com automation of IE, so yes, it’s rendering the webpage from IE.. This is a good thing tho, because you don’t have to design for yet another browser.

    I don’t know how it scrapes the rendered page though.. Maybe via the printer output? It does do a true pdf document…

  • http://www.m2designgroup.com dmitchell

    Great artice except when I run the app I get “This Page Has Expired.” for my pdf. Any idea what i am doing wrong? I would greatly appreciate some help. i have been working on this for days.
    Thanks
    Dave Mitchell
    dmitchell@mhcc.state.md.us

  • btompkins

    Yes,

    I ran into this too. For some reason, under some browsers the page is being requested twice. To fix this, in the ProcessRequest (HttpContext context) method, comment out the line that clears the cache. You’ll have to clear out the cache some other way, perhaps using a timed expiration.

    // context.Cache.Remove(context.Request["DocId"]);

    Good luck!

  • dmitchell

    I am so close. The fix above to PreRenderedPage.ashx didn’t help. I still get This Page Has Expired. I am using IE 6.0.

  • btompkins

    Try changing the ProcessRequest method to the following:

    public void ProcessRequest (HttpContext context)

    {

    context.Response.Expires = 0;

    context.Response.Cache.SetCacheability(HttpCacheability.NoCache);

    context.Response.Cache.SetNoServerCaching();
    context.Response.Cache.SetNoStore();
    context.Response.Cache.SetMaxAge(TimeSpan.Zero);

    String preRenderedHtml = context.Cache[context.Request["DocId"]].ToString();

    // Write out this document.
    context.Response.Write(preRenderedHtml);
    context.Response.End();

    }

  • dmitchell

    Damn! Almost. I got the following error related to Line 21
    String preRenderedHtml = context.Cache[context.Request["DocId"]].ToString();

    “Object reference not set to an instance of an object”

  • dMitchell

    I was hoping I could get a working example. Still having trouble with Line 21.

  • David Bilbow

    HI,

    We are using a similar technique. One of the issues is long pages. The pdf put page breaks at the end of the page (makes sense). What we would like to do is control where the page breaks are inserted. We have trying to set a style to force page breaks which works when you print.

     

    This seems to be ignored which doesnt surprise me as not all HTML/CSS is supported.

    Have you had a requirement for this or do you know of any solutions?

    Thanks
    David

  • mon

    in the splitting of the document into multiple pages, how can i put footer and header margins in each break so it wouldn’t look like its a continuous document. this is for aesthetics purpose. thanks

  • http://www.winnovative-software.com razva

    you can simply get the html code from a .aspx page with the Server.Execute() method and then convert it with the html to pdf converter from http://www.winnovative-software.com

  • scokim

    does anyone have the sample app written by brendan please ?
