I had an interesting case today where I needed to strip all that ugly Office HTML out of an HTML document. I found a tool for Office 2000 that supposedly does just that: it strips all that ugly “MSO” stuff from the file. There was a problem, though: the .exe installer wouldn’t run on my machine, because I don’t have Office 2000 installed, only Office 2003.
I found that this installer supposedly contains a utility called “MSFilter.exe” that runs as a standalone .exe and batch converts HTML files, stripping out the Office XML. Some sites mentioned that if you install this tool on a machine with Office 2000, you can just copy the MSFilter.exe file to the computer without Office 2000 and use the utility there.
I don’t have such a system, but I figured out a way to extract these files from an installer. I’m not sure how often this will work, but the method I used here worked great to extract just the utility I needed from this installer.
Here’s what I did after downloading the MsoHtmf.exe file from MS.
Step 1: Extract CAB files from the .EXE installer file
I opened the .exe file in Visual Studio and peeked at the file’s resources. I noticed a binary node called “CABINET.” Right-clicking on this node allowed me to export the contents to disk as a “BIN” file. I changed the extension to “.CAB” and was able to open the file with WinZip.
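If you don't have Visual Studio handy, the same cabinet can probably be carved out of the installer with a few lines of script. This is only a rough sketch, not what I actually did: it assumes the cabinet is stored uncompressed inside the .exe, and it relies on the cabinet file header starting with the signature "MSCF", followed by a reserved field of zero and the total cabinet size as a little-endian 32-bit number. The function names and the output file naming are made up for illustration.

```python
import struct

CAB_MAGIC = b"MSCF"      # every cabinet file starts with this signature
MIN_HEADER_SIZE = 36     # size of the fixed part of the cabinet header


def carve_cabinets(data: bytes) -> list[bytes]:
    """Return every byte range in `data` that looks like an embedded cabinet."""
    found = []
    pos = data.find(CAB_MAGIC)
    while pos != -1:
        header = data[pos:pos + 12]
        if len(header) == 12:
            # Layout: signature (4 bytes), reserved1 (4), cbCabinet (4).
            _sig, reserved1, cab_size = struct.unpack("<4sII", header)
            # Accept only candidates whose declared size fits in the file.
            if reserved1 == 0 and MIN_HEADER_SIZE <= cab_size <= len(data) - pos:
                found.append(data[pos:pos + cab_size])
                pos = data.find(CAB_MAGIC, pos + cab_size)
                continue
        # False positive: keep scanning just past this match.
        pos = data.find(CAB_MAGIC, pos + 1)
    return found


def extract_cabinets(exe_path: str, out_prefix: str = "extracted") -> list[str]:
    """Write each carved cabinet to <out_prefix>_<n>.cab and return the paths."""
    with open(exe_path, "rb") as f:
        data = f.read()
    paths = []
    for index, cab in enumerate(carve_cabinets(data)):
        path = f"{out_prefix}_{index}.cab"
        with open(path, "wb") as out:
            out.write(cab)
        paths.append(path)
    return paths
```

Calling `extract_cabinets("MsoHtmf.exe")` should then leave one .cab file on disk for each cabinet it finds, ready to open in WinZip just like the renamed BIN file.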

When I did this, I was left with two files inside of this CAB file, “Luncher2.exe” and “msohtmf2.msi”. Running this .MSI file gives the same “Cannot Install” error as before, since I didn’t have Office 2000 installed. I was sort of back at square one. I needed to open an MSI file and extract the contents.
Step 2: Extract the CAB contents from the MSI file
It turns out that there’s a little utility deep inside the Windows Installer SDK, called “Orca,” that will allow you to extract files from MSI installers. I opened Orca.exe and extracted the “Cabs” item in the MSI file by selecting the proper item and choosing “Tables > Export Tables”.

This exported the contents of the cab file to disk. In this case the Cabs node contained all the files I needed to run this utility.
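Once a cabinet is sitting on disk, it can also be unpacked without WinZip by using the expand.exe tool that ships with Windows. Here is a small sketch of how that could be scripted. The file and folder names are placeholders, not the real names from this installer, and the helper functions are my own invention; only the `expand source.cab -F:* destination` command form comes from the tool itself.

```python
import subprocess
from pathlib import Path


def build_expand_command(cab_path: str, dest_dir: str) -> list[str]:
    """Build the expand.exe command line that unpacks every file in a cabinet."""
    # -F:* selects all files in the cabinet; the last argument is the target folder.
    return ["expand", cab_path, "-F:*", dest_dir]


def expand_cabinet(cab_path: str, dest_dir: str) -> None:
    """Unpack `cab_path` into `dest_dir` (Windows only, needs expand.exe on PATH)."""
    # expand.exe wants the destination folder to exist already.
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_expand_command(cab_path, dest_dir), check=True)
```

For example, `expand_cabinet("exported.cab", "msfilter_files")` would unpack a cabinet named exported.cab into a folder called msfilter_files.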

I extracted these files to disk, and could then run the batch utility to remove all that ugly code from my Office HTML documents. I wanted to blog about this because it’s sometimes useful to extract just one tool from an entire installer. I don’t recommend that you go around hacking installer files, but it’s nice to know how to do it when you need to. ;)
-Brendan