Monday, February 20, 2012

SSIS: How to Load search result from Search Engines?

This article is in continuation to my last post How to return search result in XML?

In case, you need to pull the search result in XML from the search engines, you can take the approach discussed in the above post.

I wanted to explore the same technique with SSIS. I created two tables– Table SearchXML stores the search key words we are searching for on internet. Table SearchXML_URL will store the searched key word and their respective URL(s) which it has retrieved from the Bing.

I have two search keywords in table SearchXML as shown in below screen.

In SSIS, I created following four variables ApplID, BingSearchURL, SearchKeyword and SourceType. These variables store the application id, URL for the Bing API search, Source type and Search keyword.

On the control flow, I added a Data Flow Task. Under Data Flow Task, I have following three components:

Read SearchXML Table for Search Keywords: This component is an OLE DB Source component and it reads the data from table SearchXML. It will reads the two keyword searches we have in the table (Chrome, Windows 8).

Return Search keyword in XML format: This is a script component. The script component is using the user defined variables. The script component has two output columns. XMLKeyWord and XMLURL. These columns will store the search keyword and the URL it returned from the Bing in XML format.

The code snippet inside the script component is following:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Xml;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
       string searchXMLURL;
        string strXMLURL, strXMLkeyword;

        public override void PreExecute()
        {
            base.PreExecute();
        }

        public override void PostExecute()
        {
            base.PostExecute();
        }

        public override void Input0_ProcessInputRow(Input0Buffer Row)
       {
        searchXMLURL = Variables.BingSearchURL + "?" + Variables.AppID + "&" + Variables.SearchKeyword + Row.SearchKeyword.ToString() + "&" + Variables.SourceType;
       XmlTextReader xmt = new XmlTextReader(searchXMLURL);
        while (xmt.Read())
        {
        if (xmt.Name == "web:DisplayUrl")
        {
        strXMLURL = xmt.ReadString();
        strXMLkeyword =Row.SearchKeyword.ToString();
        CreateNewOutputRows();
        }
        }
        }

        public override void CreateNewOutputRows()
        {
        Output0Buffer.AddRow();
        Output0Buffer.XMLKeyWord = strXMLkeyword;
        Output0Buffer.XMLURL = strXMLURL;
        }

       }

Insert in SearchXML_URL: This is an OLE DB Destination task and it simply stores the output of two columns (XMLKeyWord and XMLURL) into SQL table SearchXML_URL.

After setting up the components in SSIS, I run the SSIS package and it runs successfully and I got the live search results in my destination table.

Thanks for reading this post

Related Article:

Popular Posts

Real Time Web Analytics