ASP.NET Triple Whammy
I got a phone call from a client last week asking for some help using .NET to grab the contents of a web page full of links. Pulling down the content of a web site is pretty simple using the WebRequest and WebResponse objects. The following VB.NET code demonstrates how to "scrape" the contents of a web page:
'-- Import a couple of namespaces
Imports System.IO
Imports System.Net
'-- And run this code
'-- open the channel to the web site
Dim oReq As WebRequest = _
    WebRequest.Create("http://www.dashpoint.com")
'-- get a response from the site
Dim oResp As WebResponse = oReq.GetResponse()
'-- attach the response stream to a reader
Dim oSRead As New StreamReader(oResp.GetResponseStream())
'-- get the content
Dim cContent As String = oSRead.ReadToEnd()
'-- release the reader and the underlying connection
oSRead.Close()
oResp.Close()
'-- MessageBox is Windows Forms; in an ASP.NET page use Response.Write
MessageBox.Show(cContent)
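For reuse, the same steps can be wrapped in a single function. This is a minimal sketch, assuming VB 2005 or later (the Using statement is not available in VS 2003); the module and function names (PageScraper, GetPageContent) are hypothetical. The Using blocks make sure the response and reader are closed even if ReadToEnd throws.

```vbnet
Imports System.IO
Imports System.Net

Public Module PageScraper
    '-- Hypothetical helper: download the body of any URL that WebRequest
    '-- understands (http://, https://, file://) and return it as a string
    Public Function GetPageContent(ByVal url As String) As String
        Dim oReq As WebRequest = WebRequest.Create(url)
        '-- Using closes the response and the reader even on an exception
        Using oResp As WebResponse = oReq.GetResponse()
            Using oSRead As New StreamReader(oResp.GetResponseStream())
                Return oSRead.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

In an ASP.NET page the result could be sent to the browser with Response.Write(GetPageContent("http://www.dashpoint.com")).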
That was pretty simple. Now for the problems:
Problem 1: Attaching to an HTTPS site with credentials
The site we were accessing was an HTTPS site that required a username and password. How the heck do you do that? It's actually pretty simple.
If the site uses HTTP authentication (Basic, Digest, NTLM, or Kerberos) rather than an HTML login form, you need to create "credentials" to hand to the site. You do this by creating a System.Net.NetworkCredential object and assigning it to the request object's Credentials property, like so:
'-- Create the credentials for the site and
'-- attach them to the request object
'-- (replace USERNAME and PASSWORD with the real values)
Dim oCred As New _
    System.Net.NetworkCredential("USERNAME", "PASSWORD")
oReq.Credentials = oCred
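Assigning a NetworkCredential directly offers it to whatever authentication scheme the server asks for. If you want the credentials used only for one site and one scheme, the framework's CredentialCache class can hold them instead. A minimal sketch, assuming the site uses Basic authentication; the module and function names (CredentialSetup, BuildCredentials) are hypothetical.

```vbnet
Imports System.Net

Public Module CredentialSetup
    '-- Hypothetical helper: register a user name/password for Basic
    '-- authentication against one URI prefix only
    Public Function BuildCredentials(ByVal siteUrl As String, _
            ByVal userName As String, ByVal password As String) As CredentialCache
        Dim oCache As New CredentialCache()
        oCache.Add(New Uri(siteUrl), "Basic", New NetworkCredential(userName, password))
        Return oCache
    End Function
End Module
```

Usage: oReq.Credentials = BuildCredentials("https://www.dashpoint.com/", "USERNAME", "PASSWORD"). CredentialCache implements ICredentials, so it can be assigned to the same Credentials property.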
Problem 2: Attaching to an HTTPS site with a bad/invalid certificate
The second problem was that the site we were connecting to had a "questionable" certificate; our browser showed a certificate error whenever we tried to visit it.
This is more common than you would think. So how do you connect to an HTTPS site with a bad certificate? After a little research on Google we found articles describing how to override the certificate policy by creating a class that implements the ICertificatePolicy interface. The code below demonstrates this class:
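Imports System.Net
Imports System.Security.Cryptography.X509Certificates
Public Class CertificateOverride
    Implements ICertificatePolicy
    '-- called by the framework to validate the server certificate;
    '-- returning True accepts it even when certificateProblem reports an error
    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal certificate As X509Certificate, _
            ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) As Boolean _
            Implements ICertificatePolicy.CheckValidationResult
        Return True
    End Function
End Class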
This interface defines a single function, CheckValidationResult. When the framework validates a server certificate, it calls this function and passes the type of error found, if any, in the certificateProblem parameter. Possible values include the following:
CertEXPIRED = 2148204801,
CertVALIDITYPERIODNESTING = 2148204802,
CertROLE = 2148204803,
CertPATHLENCONST = 2148204804,
CertCRITICAL = 2148204805,
CertPURPOSE = 2148204806,
CertISSUERCHAINING = 2148204807,
CertMALFORMED = 2148204808,
CertUNTRUSTEDROOT = 2148204809,
CertCHAINING = 2148204810,
CertREVOKED = 2148204812,
CertUNTRUSTEDTESTROOT = 2148204813,
CertREVOCATION_FAILURE = 2148204814,
CertCN_NO_MATCH = 2148204815,
CertWRONG_USAGE = 2148204816,
CertUNTRUSTEDCA = 2148204818
I found these values at the following blog:
http://www.codexchange.net/PreviewSnippet.aspx?SnippetID=d40708fc-4041-42b8-9016-f0ac96d14fce
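These are the standard Windows CERT_E_* error codes (0x800B0101 and up) written as unsigned decimals. Because certificateProblem is declared As Integer, the same bits arrive as negative numbers, so comparing it against 2148204801 directly will never match. One way around this, sketched below, is an Enum written with VB hex literals, which VB treats as signed 32-bit Integers. The names CertificateProblem, CertificateProblemText, and Describe are our own, not framework types.

```vbnet
'-- Our own enum, not a framework type. VB reads these hex literals as
'-- signed Integers, matching what CheckValidationResult receives.
Public Enum CertificateProblem As Integer
    CertEXPIRED = &H800B0101
    CertVALIDITYPERIODNESTING = &H800B0102
    CertROLE = &H800B0103
    CertPATHLENCONST = &H800B0104
    CertCRITICAL = &H800B0105
    CertPURPOSE = &H800B0106
    CertISSUERCHAINING = &H800B0107
    CertMALFORMED = &H800B0108
    CertUNTRUSTEDROOT = &H800B0109
    CertCHAINING = &H800B010A
    CertREVOKED = &H800B010C
    CertUNTRUSTEDTESTROOT = &H800B010D
    CertREVOCATION_FAILURE = &H800B010E
    CertCN_NO_MATCH = &H800B010F
    CertWRONG_USAGE = &H800B0110
    CertUNTRUSTEDCA = &H800B0112
End Enum

Public Module CertificateProblemText
    '-- Turn a raw certificateProblem value into a readable name, e.g. for
    '-- logging; values not in the enum come back as plain numbers
    Public Function Describe(ByVal problem As Integer) As String
        Return CType(problem, CertificateProblem).ToString()
    End Function
End Module
```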
Because we trusted the site we were connecting to, we simply returned True from this function regardless of the problem.
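Returning True unconditionally accepts any certificate from any site for the whole application, including one presented by a man-in-the-middle attacker. If you only want to tolerate a known problem on a known host, the check can be narrowed. A minimal sketch for the 1.1 ICertificatePolicy interface; the class name, the host name, and the choice of forgiving only a name mismatch (CERT_E_CN_NO_MATCH, 2148204815) are assumptions for illustration.

```vbnet
Imports System.Net
Imports System.Security.Cryptography.X509Certificates

'-- Hypothetical policy: accept valid certificates, and forgive a
'-- name mismatch only for one host we have decided to trust
Public Class SingleHostCertificatePolicy
    Implements ICertificatePolicy

    '-- CERT_E_CN_NO_MATCH as a signed Integer
    Private Const CertCN_NO_MATCH As Integer = &H800B010F

    Private ReadOnly m_trustedHost As String

    Public Sub New(ByVal trustedHost As String)
        m_trustedHost = trustedHost
    End Sub

    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal certificate As X509Certificate, _
            ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) As Boolean _
            Implements ICertificatePolicy.CheckValidationResult
        '-- 0 means the framework found nothing wrong with the certificate
        If certificateProblem = 0 Then Return True
        '-- otherwise only forgive a name mismatch on the trusted host
        Return certificateProblem = CertCN_NO_MATCH _
            AndAlso Not (request Is Nothing) _
            AndAlso String.Compare(request.RequestUri.Host, m_trustedHost, True) = 0
    End Function
End Class
```

It would be installed the same way as the blanket override: ServicePointManager.CertificatePolicy = New SingleHostCertificatePolicy("www.dashpoint.com").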
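After creating this class, we needed to tell the framework to use it instead of its default certificate checking. So we added this code at the top of the function (note that ServicePointManager.CertificatePolicy is a global setting, so it applies to every web request the application makes, not just this one):
'-- override the bad certificate error
ServicePointManager.CertificatePolicy = New CertificateOverride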
So now the complete example looks like this:
Imports System.IO
Imports System.Net
'-- override the bad certificate error
ServicePointManager.CertificatePolicy = New CertificateOverride
'-- open the channel to the web site (note the https:// scheme)
Dim oReq As WebRequest = _
    WebRequest.Create("https://www.dashpoint.com")
'-- set the credentials for the site (replace with the real values)
Dim oCred As New NetworkCredential("USERNAME", "PASSWORD")
oReq.Credentials = oCred
'-- get a response from the site
Dim oResp As WebResponse = oReq.GetResponse()
'-- attach the response stream to a reader
Dim oSRead As New StreamReader(oResp.GetResponseStream())
'-- get the content
Dim cContent As String = oSRead.ReadToEnd()
'-- release the reader and the underlying connection
oSRead.Close()
oResp.Close()
MessageBox.Show(cContent)
So now we have a complete example of connecting to sites with bad (or good) certificates and supplying credentials. One good thing is that the .NET Framework was capable of handling this every step of the way!
NOTE: A commenter pointed out that the ICertificatePolicy interface I used is obsolete in the 2.0 Framework (my client still uses VS 2003 and the 1.1 Framework for their code). I did a little digging and found that in 2.0 this is done via a callback function assigned to ServicePointManager.ServerCertificateValidationCallback. The code and class for this are below.