Sponsored By Aspose - File Format APIs for .NET

Aspose are the market leader of .NET APIs for file business formats – natively work with DOCX, XLSX, PPT, PDF, MSG, MPP, images formats and many more!

Trying to access Windows Search from SQL Server: An Appeal

Had some time to review my search dilemma today and three billable hours later, I’m no further along in my task.

There is no shortage of documentation on the new Windows Search and I do so desperately want to try it out. Just not as a client.

To sum up my requirements, I have a search screen that combines meta data in a SQL database with a contents search. That is, the user types in a search word or phrase and/or selects meta data about the documents (e.g. country, document type, date, and so on and so forth).

At present, using Indexing Services, I am able to perform such a query within SQL Server with a single SQL Statement. That is what I would like to do with Windows Search.

Alas, all efforts to connect to Windows Search via SQL Server have failed. The nearest I’ve come is to find a couple of lost souls to commiserate with who are having the same problem. I’ve tried every possible combination of provider, datasource, and provstr I can fathom with the connection string, Provider=Search.CollatorDSO;Extended Properties=\"Application=Windows\", and have come up empty.

The issue appears to be with SQL Server as I am able to run pretty much every sample app under the sun and get search results back. I’ve even overcome my fear of C++ to read through some samples in the Search SDK.

So I’m appealing to you, generous reader(s), for some help. I know I’ve broken the unwritten rule of finding at least one workable solution, however hideous, before asking for help but the nature of Windows Search is such that any workable solution would be the one I glom onto.

Again, the criteria is relatively straightforward: I want to be able to search for documents (Word, PowerPoint, PDF, and Excel only for the moment) based on their contents as well as metadata.

I can do that with Windows Search now but it would involve retrieving a result set based on metadata, retrieving a second result set based on contents, then merging the two. Given the size of the repository (about 2600 documents), this is do-able but it’s the kind of bastardized union that perpetuates the hillbilly stereotype. And I’m trying to be more PC.

I would consider SharePoint only if someone can convince me it is an elegant solution that specifically meets these requirements. I am also open to a third-party component that indexes files and provides an API. But again, I can do this already with Windows Search. It would have to be something that can combine with SQL Server.

And if I don’t get a decent answer, then I’m gonna…well, swear a bit maybe but that’s probably it.

Kyle the Unthreatening

This entry was posted in Conscientious Coding. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Josh

    Here is a link doing it from code, but nothing from SQL Server that I have seen.

    http://www.thejoyofcode.com/Using_Windows_Search_in_your_applications.aspx

  • Kyle Baley

    @MSenn

    Not especially. The two questions I asked on the forums remain either unanswered or with “I’d like this too” responses. I have put this on hold for the time being but will report back once it becomes urgent again.

  • MSenn

    Kyle?
    Did you get information more about adding documents to the ‘Windows Search’ index?
    Manfred

  • http://www.ovalsquare.com Ted Jardine

    Another plug for Lucene.

  • Greg

    I was gonna say lucene as well but someone already has, consider it another endorsement.

  • Kyle Baley

    Sweet. Thanks! That looks even better assuming I can work out how to get the contents into the index somehow.

  • BrianW

    The technology at the foundation of Microsoft Sharepoint and Windows Search is Filtdump and iFilters. Filtdump is an exe that utilizes iFilters to extract the text from documents (word, excel, pdfs, etc). Best part is that adding support for additional file types is just a matter of finding an iFilter that supports it.

    You can either crawl the documents, call filtdump.exe, and insert the output (i.e. the text of the documents) into a SQL Server table or you can use SQL Server’s Full Text Index to do all this for you as it also uses iFilters under the hood (http://www.codeproject.com/KB/architecture/sqlfulltextindexing.aspx).

  • Kyle Baley

    Thanks Sean. That’s two endorsements I’ve had for Lucene.net which is enough to look into it in more detail. The part about having to extract the text from PDF documents and Office documents manually to insert into the index kind of troubles me but I like the idea of having the metadata and the contents in the same place.

  • http://geekswithblogs.net/scarpenter Sean Carpenter

    I think you should check out Lucene.Net (http://incubator.apache.org/projects/lucene.net.html). You can index and store the document content as well as the metadata and searches are very fast. The index file format is also compatible with the Java version, so anything that can read those indexes will be able to use yours as well (something like Luke (http://www.getopt.org/luke/) is very useful).