CodeBetter.Com

Brendan Tompkins [MVP]

Blog First. Ask Questions Later.

Extracting files from installer EXEs and MSIs

I had an interesting case today where I needed to strip all that ugly Office HTML out of an HTML document. I found a tool for Office 2000 that supposedly does just that: it strips all the “MSO” markup from the file. There was a problem, though. The .exe installer wouldn’t run on my machine, because I have Office 2003 installed, not Office 2000.

I found that this installer supposedly contains a utility called “MSFilter.exe” that runs as a standalone .exe and batch converts HTML files, stripping the Office XML. Some sites mentioned that if you install this tool on a machine with Office 2000, you can just copy MSFilter.exe to a computer without Office 2000 and use the utility there.

I don’t have such a system, but I figured out a way to extract these files from an installer.  I’m not sure how often this will work, but the method I used here worked great to extract just the utility I needed from this installer.

Here’s what I did after downloading the MsoHtmf.exe file from MS.

Step 1: Extract CAB files from the .EXE installer file

I opened the .exe file in Visual Studio and peeked at the file’s resources. I noticed a binary node called “CABINET.” Right-clicking on this node allowed me to export the contents to disk as a “BIN” file. I changed the extension to “.CAB” and was able to open the file with WinZip.
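
If Visual Studio isn’t handy, the same CABINET data can sometimes be carved straight out of the installer. What follows is only a sketch, not the method used above. It relies on two facts about the cabinet format: a cabinet starts with the signature “MSCF”, and the header stores the total cabinet size as a 32-bit little-endian value at offset 8. The function names and the output file naming are made up for illustration, and the whole thing assumes the cabinet sits in the .exe as one contiguous, unencrypted block.

```python
import struct

CAB_MAGIC = b"MSCF"       # signature at the start of every cabinet file
MIN_CAB_HEADER = 36       # size of the fixed part of the cabinet header


def find_cabinets(data: bytes):
    """Yield (offset, size) for each plausible cabinet embedded in data."""
    pos = data.find(CAB_MAGIC)
    while pos != -1:
        next_start = pos + 1
        header = data[pos:pos + 12]
        if len(header) == 12:
            # Layout: signature (4 bytes), reserved (4 bytes), cabinet size (4 bytes)
            _, reserved1, cab_size = struct.unpack("<4sII", header)
            # The reserved field should be zero and the size must fit in the file.
            if reserved1 == 0 and MIN_CAB_HEADER <= cab_size <= len(data) - pos:
                yield pos, cab_size
                # Skip over this cabinet so bytes inside it are not re-matched.
                next_start = pos + cab_size
        pos = data.find(CAB_MAGIC, next_start)


def extract_cabinets(exe_path, out_prefix="embedded"):
    """Write every cabinet found in exe_path to <out_prefix>_<n>.cab."""
    with open(exe_path, "rb") as f:
        data = f.read()
    written = []
    for i, (offset, size) in enumerate(find_cabinets(data)):
        out_path = f"{out_prefix}_{i}.cab"
        with open(out_path, "wb") as out:
            out.write(data[offset:offset + size])
        written.append(out_path)
    return written
```

Calling extract_cabinets("MsoHtmf.exe") would write any cabinets it finds next to the script. A random “MSCF” byte sequence can still produce a false hit, so treat each output file as a candidate and check that it actually opens.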

When I did this, I found two files inside the CAB file, “Luncher2.exe” and “msohtmf2.msi”. Running the .MSI file gave the same “Cannot Install” error as before, since I didn’t have Office 2000 installed. I was sort of back at square one. I needed to open an MSI file and extract its contents.

Step 2: Extract the CAB contents from the MSI file.

It turns out that there’s a little utility deep inside the Windows Installer SDK, called “Orca,” that will allow you to extract files from MSI installers. I opened Orca.exe and extracted the “Cabs” item in the MSI file by selecting the proper item and choosing “Tables..Export Tables”.

This exported the contents of the cab file to disk.  In this case the Cabs node contained all the files I needed to run this utility.
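
A different route to the same goal, and not what I did here, is an administrative install: msiexec’s /a switch unpacks a package’s files into a folder given by the TARGETDIR property instead of installing the product. The sketch below just wraps that command. The function names and the output folder are hypothetical, it only makes sense on Windows, and whether it gets past a “Cannot Install” check depends on how the package was authored, so I can’t promise it works for this particular MSI.

```python
import subprocess
from pathlib import Path


def admin_install_command(msi_path, target_dir):
    """Build the msiexec command line for an administrative install."""
    msi = Path(msi_path).resolve()
    target = Path(target_dir).resolve()
    # /a  = administrative install (unpack the files, do not install the product)
    # /qn = no user interface
    # Assumption: target has no spaces; such paths may need different quoting.
    return ["msiexec", "/a", str(msi), "/qn", f"TARGETDIR={target}"]


def extract_msi(msi_path, target_dir):
    """Run the administrative install and return msiexec's exit code."""
    cmd = admin_install_command(msi_path, target_dir)
    return subprocess.run(cmd, check=False).returncode
```

For example, extract_msi("msohtmf2.msi", "extracted") would ask msiexec to unpack the package into an “extracted” folder. An exit code of 0 means msiexec reported success.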

 

I extracted these files to disk, and could then run the batch utility to remove all that ugly code from my Office HTML documents. I wanted to blog about this because it’s sometimes useful to extract just one tool from an entire installer. I don’t recommend that you go around hacking installer files, but it’s nice to know how to do it when you need to. ;)

-Brendan



Comments

Mike said:

Or you could simply download the four files already zipped up for you with installation instructions from:

www.yesyes.com/asp/ArticleDetail.asp?ArticleId=282
# April 4, 2006 10:52 PM

Jeff Atwood said:

I have a Word cleaning solution here:

http://www.codinghorror.com/blog/archives/000485.html

It's basically just a light console wrapper around some regular expressions, written in C#.
# April 10, 2006 1:47 PM

Dipesh said:

No Comments
# June 28, 2006 2:14 AM

Dipesh said:

No Comments
# June 28, 2006 2:16 AM

rob said:

I downloaded the Windows Installer SDK using the link provided above, and after installing it of course there is no orcas.exe.  Are you sure this is where you found it?

# September 6, 2006 4:00 PM

Brendan Tompkins said:

Um.. I think so!  Maybe they removed it?

# September 6, 2006 4:02 PM

rob said:

Aha.  It's inside yet another MSI after you install that SDK.  How convenient! :)

# September 6, 2006 4:09 PM

MEK said:

Are you sure it's Orcas.exe? There is an orca.exe, though. The problem is that I cannot export any cabs or other files with it either. It just produces some nonsense .idt export table files.

Probably because there is no "Cabs" to select. The only item even close is CabinetDetail, which shows the .cabs, but how to extract them is the question.

I need a cab file that is inside e.g. the Office 2003 installation MSI named PRO11.msi, but how the f**k to get it out of there?

# April 13, 2008 5:10 PM


About Brendan Tompkins

Brendan has been programming with .NET since the first public beta and is owner and operator of Port Technology Services, a consultancy providing .NET application development services to the maritime industry. In July 2007, he was awarded the Microsoft MVP award for ASP.NET. He's also a proud co-founder of failed .COM startup Intrinsigo, and has had a hand in the failure of numerous other businesses. He currently runs CodeBetter.Com and Devlicio.us, and lives in Norfolk, Virginia with his wife Tiara and son Ian.
