ASP.NET Triple Whammy

I got a phone call from a client last week asking for some help using .NET to grab the contents of a web page full of links. Pulling down the content of a web site is pretty simple using the WebRequest and WebResponse objects. The following code demonstrates how to “scrape” the contents of a web page using ASP.NET.

'-- Import a couple of namespaces
Imports System.IO
Imports System.Net

'-- And run this code
'-- open the channel to web site
Dim oReq As WebRequest = _
System.Net.HttpWebRequest.Create("http://www.dashpoint.com")
'-- get a response from the site
Dim oResp As WebResponse = oReq.GetResponse()
'-- attach the stream to a reader
Dim oSRead As New StreamReader(oResp.GetResponseStream)
'-- get the content
Dim cContent As String = oSRead.ReadToEnd
MessageBox.Show(cContent)

That was pretty simple. Now for the problems:

Problem 1: Attaching to an HTTPS site with credentials

The site we were accessing was an HTTPS site, and we needed to log in with a username and password. How the heck do you do that? It's actually pretty simple. If you want to connect to a secured site you need to create “Credentials” to hand to the site. You do this by creating a System.Net.NetworkCredential object and attaching it to the request object like so:

'-- Create the credentials for HTTPS and
'-- attach them to the request object
Dim oCred As New _
System.Net.NetworkCredential(<<USERNAME>>, <<PASSWORD>>)
oReq.Credentials = oCred

Problem 2: Attaching to an HTTPS site with a bad/invalid certificate

The second problem was that the site we were connecting to had a “questionable” certificate. We received an error when trying to open the site in a browser, like so:

[Screenshot: browser certificate warning]

This is more common than you would think. So how do you connect to an HTTPS site with a bad certificate? After a little research using Google we found code that discussed overriding the certificate policy by creating a class that implements the ICertificatePolicy interface.
The code below demonstrates this class:

Imports System.Net
Public Class CertificateOverride
Implements ICertificatePolicy
Public Function CheckValidationResult(. . .) As Boolean _
Implements System.Net.ICertificatePolicy.CheckValidationResult
Return True
End Function
End Class

Basically this interface defines one function, CheckValidationResult. When a bad certificate is found, the type of error is passed into the certificateProblem parameter. The possible values for this parameter are the Cert... constants listed further down in this post. I found these values at the following blog: http://www.codexchange.net/PreviewSnippet.aspx?SnippetID=d40708fc-4041-42b8-9016-f0ac96d14fce

We trusted the site we were connecting to, so we made this function return True regardless of the problem.

After creating this class we needed to override the certificate management in our code. So we added this code to the top of the function:

'-- override the bad certificate error
ServicePointManager.CertificatePolicy = New CertificateOverride

The complete code then looks like this:

'-- override the bad certificate error
ServicePointManager.CertificatePolicy = New CertificateOverride
'-- open the channel to web site
Dim oReq As WebRequest = _
System.Net.HttpWebRequest.Create("http://www.dashpoint.com")
'-- set the credentials for HTTPS
Dim oCred As New System.Net.NetworkCredential("", "")
oReq.Credentials = oCred
'-- get a response from the site
Dim oResp As WebResponse = oReq.GetResponse()
'-- attach the stream to a reader
Dim oSRead As New StreamReader(oResp.GetResponseStream)
'-- get the content
Dim cContent As String = oSRead.ReadToEnd
MessageBox.Show(cContent)

So now we have a comprehensive example of connecting to sites with bad (or good) certs and using credentials. One good thing is that the .NET Framework was capable of doing this every step of the way!

NOTE: In a comment someone pointed out that the ICertificatePolicy interface I used is obsolete in the 2.0 Framework (my client still uses VS 2003 and the 1.1 framework for their code). I did a little digging and found that this is now done via a callback function. The code and class for this are below.

Possible values for the certificateProblem parameter:
CertEXPIRED = 2148204801,
CertVALIDITYPERIODNESTING = 2148204802,
CertPATHLENCONST = 2148204804,
CertROLE = 2148204803,
CertCRITICAL = 2148204805,
CertPURPOSE = 2148204806,
CertISSUERCHAINING = 2148204807,
CertMALFORMED = 2148204808,
CertUNTRUSTEDROOT = 2148204809,
CertCHAINING = 2148204810,
CertREVOKED = 2148204812,
CertUNTRUSTEDTESTROOT = 2148204813,
CertREVOCATION_FAILURE = 2148204814,
CertCN_NO_MATCH = 2148204815,
CertWRONG_USAGE = 2148204816,
CertUNTRUSTEDCA = 2148204818
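
The constants above are written as unsigned numbers, while the 1.1 policy interface receives certificateProblem as a signed Integer, so a direct comparison needs the hex form. As a minimal sketch, assuming you wanted to tolerate only an expired certificate rather than every problem, a policy might look like the following. The class name ExpiredOnlyCertificatePolicy is made up for illustration, and the parameter list is the documented ICertificatePolicy signature rather than something shown in the post.

```vb
Imports System.Net
Imports System.Security.Cryptography.X509Certificates

' Hypothetical policy: accept a clean certificate or an expired one, nothing else.
Public Class ExpiredOnlyCertificatePolicy
    Implements ICertificatePolicy

    ' CertEXPIRED (2148204801) in hex. As a signed Integer this value is negative.
    Private Const CertExpired As Integer = &H800B0101

    Public Function CheckValidationResult( _
        ByVal srvPoint As ServicePoint, _
        ByVal certificate As X509Certificate, _
        ByVal request As WebRequest, _
        ByVal certificateProblem As Integer) As Boolean _
        Implements ICertificatePolicy.CheckValidationResult

        ' Assumption: 0 means no problem was found with the certificate.
        Return certificateProblem = 0 OrElse certificateProblem = CertExpired
    End Function
End Class
```

It would be installed the same way as the class in the post: ServicePointManager.CertificatePolicy = New ExpiredOnlyCertificatePolicy.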
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates
Public Class CertificateOverride
Public Function RemoteCertificateValidationCallback( _
ByVal sender As Object, _
ByVal certificate As X509Certificate, _
ByVal chain As X509Chain, _
ByVal sslPolicyErrors As SslPolicyErrors _
) As Boolean
Return True
End Function
End Class
Dim oCertOverride As New CertificateOverride
'-- override the bad certificate error
ServicePointManager.ServerCertificateValidationCallback = _
AddressOf oCertOverride.RemoteCertificateValidationCallback
'-- open the channel to web site
Dim oReq As WebRequest = _
System.Net.HttpWebRequest.Create("http://www.dashpoint.com")
'-- set the credentials for HTTPS
Dim oCred As New System.Net.NetworkCredential("", "")
oReq.Credentials = oCred
'-- get a response from the site
Dim oResp As WebResponse = oReq.GetResponse()
'-- attach the stream to a reader
Dim oSRead As New StreamReader(oResp.GetResponseStream)
'-- get the content
Dim cContent As String = oSRead.ReadToEnd
MessageBox.Show(cContent)
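
Because the callback is set on ServicePointManager, returning True for everything switches off certificate checking for every HTTPS request the process makes. A narrower sketch, assuming the 2.0 callback and that only one known host has the bad certificate, might look like this. The class name SelectiveCertificateOverride and the TrustedHost constant are hypothetical; the host name is taken from the example URL.

```vb
Imports System
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates

Public Class SelectiveCertificateOverride
    ' Hypothetical setting: the one host whose bad certificate we accept.
    Private Const TrustedHost As String = "www.dashpoint.com"

    Public Shared Function Validate( _
        ByVal sender As Object, _
        ByVal certificate As X509Certificate, _
        ByVal chain As X509Chain, _
        ByVal errors As SslPolicyErrors) As Boolean

        ' A certificate with no problems is always accepted.
        If errors = SslPolicyErrors.None Then
            Return True
        End If

        ' For web requests the sender is normally the request itself.
        ' If it is anything else, refuse rather than guess.
        Dim request As HttpWebRequest = TryCast(sender, HttpWebRequest)
        If request Is Nothing Then
            Return False
        End If

        ' Accept the bad certificate only for the one host we trust.
        Return String.Equals(request.RequestUri.Host, TrustedHost, _
            StringComparison.OrdinalIgnoreCase)
    End Function
End Class
```

It can be attached with ServicePointManager.ServerCertificateValidationCallback = AddressOf SelectiveCertificateOverride.Validate.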
Thanks Rod, but I have a problem. I checked the source code and I received status code 200, but only that. I expected to receive a text like CORRECT or INCORRECT, but I receive nothing. Would you help me?
Thanks in advance.
Thanks Rod! This code bailed me out big time. I needed to post variables to the page so I added the following before oResp was defined: (example)
Dim data As String = "action=updateAll&maxPrice=590&sPrice=0&startingPos=0&nbrRecs=60"
Dim writer As New StreamWriter(oReq.GetRequestStream)
writer.Write(data)
writer.Close()
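
For anyone trying the POST approach from this comment: GetRequestStream fails on a request that still uses the default GET method, so the method has to be set to POST before writing the data. A minimal sketch, assuming a form-encoded body and VB 2005 or later (for Using); the PostForm helper name is made up for illustration.

```vb
Imports System.IO
Imports System.Net

Module PostExample
    ' Hypothetical helper: posts form data to a URL and returns the response text.
    Function PostForm(ByVal url As String, ByVal data As String) As String
        Dim oReq As WebRequest = WebRequest.Create(url)

        ' A request body is only allowed once the method is POST.
        oReq.Method = "POST"
        oReq.ContentType = "application/x-www-form-urlencoded"

        ' Write the form data into the request stream.
        Using writer As New StreamWriter(oReq.GetRequestStream())
            writer.Write(data)
        End Using

        ' Read the full response body.
        Using oResp As WebResponse = oReq.GetResponse()
            Using oSRead As New StreamReader(oResp.GetResponseStream())
                Return oSRead.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```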
Rod,
Thanks for the code…but where do I put it in my project? Can this be used in the webbrowser control?
Thanks!
Please help me log in to
secure.lme.com/…/Dataprices_daily_metals.aspx
Can you write code that works?
Thanks.
Manic is right, I want to connect but I could not. Please help.
Thanks
This is not working for me. Check this page:
https://secure.lme.com/Data/community/Dataprices_daily_metals.aspx
Great post! Thanks
In problem 1 there is mention of a UserName and Password. The site I’m trying to scrape also has a username and password to get into the site. Is this the same? Or are these elements used for the certificate only?
Worked a treat. Thanks
I saw a lot of C# examples, but this was the first VB example I saw.
Thank you very much !
This is exactly what I was looking for.
Great help.
Thanks for the great code. I have a question: if the site requires authentication (forms authentication), we need to do it through a POST and then store the cookie to work further.
The problem I am facing is that when I attach the cookie to the request for the second page after a successful login, it fails to retrieve any data due to an SSL failure. Can you help me with that?
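
On the cookie question above: one common approach is to share a single CookieContainer between the login request and the later page request, so the cookie set at login is sent automatically. A rough sketch, assuming forms authentication over a plain form POST; FetchAfterLogin and its parameters are hypothetical names, and real login forms often need extra hidden fields. Since the certificate override is set on ServicePointManager, it applies to both requests.

```vb
Imports System.IO
Imports System.Net

Module CookieExample
    ' Hypothetical helper: logs in with a form POST, then fetches a second page
    ' using the cookies collected during the login.
    Function FetchAfterLogin(ByVal loginUrl As String, _
                             ByVal loginData As String, _
                             ByVal pageUrl As String) As String
        ' One container shared by both requests.
        Dim cookies As New CookieContainer()

        Dim loginReq As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
        loginReq.CookieContainer = cookies
        loginReq.Method = "POST"
        loginReq.ContentType = "application/x-www-form-urlencoded"
        Using writer As New StreamWriter(loginReq.GetRequestStream())
            writer.Write(loginData)
        End Using

        ' Getting the response lets the container pick up the login cookie.
        loginReq.GetResponse().Close()

        Dim pageReq As HttpWebRequest = CType(WebRequest.Create(pageUrl), HttpWebRequest)
        pageReq.CookieContainer = cookies
        Using resp As WebResponse = pageReq.GetResponse()
            Using reader As New StreamReader(resp.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```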
Hi sir,
I need to log in programmatically and scrape data from a website; the data is on the page that follows the login.
For this I am using the HTML Agility Pack.
I am using the link http://www.dotnetjunkies.com/WebLog/joshuagough/archive/2006/01/20/134825.aspx
as a reference.
As a trial I am trying to log in to Gmail, and the code is as follows:
using System;
using System.Data;
using System.Configuration;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
public partial class _Default : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
FormProcessor p = new FormProcessor();
string userName = "*****************";
string password = "******************";
Form form = p.GetForm("https://Gmail.com", "//form[@name='loginForm']", FormQueryModeEnum.Nested);
form["j_username"].SetAttributeValue("value", userName);
form["j_password"].SetAttributeValue("value", password);
HtmlDocument doc = p.SubmitForm(form);
string strBal = doc.DocumentNode.SelectSingleNode
("//span[@class='redText']").InnerText;
strBal = System.Web.HttpUtility.HtmlDecode(strBal);
strBal = strBal.Substring(1).Trim();
}
}
The problem I am facing is with the XPath //form[@name='loginForm']: the error is "node not found".
I want to know how I can compose the XPath for any website. Please point me to a complete reference on it.
thanks in advance
sharad soni
Love it Thanks
Very useful and simple – thanx !
DisonWorld: this article is suitable for a web application calling another web service, not for a web browser accessing a web application.
useful posting.
If the site uses HTTPS, then when a user visits the site, a popup like the picture in this article is shown. Could anyone tell me how to remove it?
That is, is there a way to put some code in the aspx page so that the popup never shows again?
I have tried putting the following code [C# 2.0] in global.asax [Application_AuthenticateRequest]:
System.Net.ServicePointManager.ServerCertificateValidationCallback += delegate(object objSender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors)
{
return true;
};
But the popup still shows after the code runs.
Super useful code here – I can now make my utility to go and grab my PIX firewall configurations on a regular basis with ease! Sweet & Thank you!
Good point, Joshua. It wasn’t HTTPS that required authentication; it was the web site itself. The HTTPS issue was the bad certificate.
Do the credentials really have anything to do with the fact that the site uses https? Isn’t it just because the site requires authentication (windows, or forms)? You would have to supply the credentials, regardless of the protocol. Similarly, if the site was available to anonymous users, I don’t think you would have to supply credentials, even if it was over https. Protecting access (authentication) is different than encrypting the traffic on the wire (https).
Hey Rod
Good to see there’s a VB guy here now that I’ve left codebetter
One thing to note, however, is that the ServicePointManager.CertificatePolicy property is now marked as obsolete in the 2.0 framework (the code is completely valid in 1.1, and still works in 2.0).
The warning says to use ServerCertificateValidationCallback instead.
I discovered this when I upgraded some old code a couple of months ago, but I haven’t had time to figure out how this new one works yet.
That’s useful information. Thanks.
Good post.