A day with .Net

My day-to-day experience in .NET

Screen scraping in C#.NET

Posted by vivekcek on August 20, 2009

Hi, after some weeks I finally got time to post an article. This time I am writing about a problem that gave me a lot of headaches. I can't put my real application code here, so let me first describe the problem.

It was a Monday, and I was back at the office after enjoying the weekend at home. As soon as I reached the office, my manager called me and assigned me this work. As I said earlier, I work on an online airline reservation portal. Normally, online bookings are performed via web services such as Galileo and Amadeus. But some low-cost carriers do not offer such a web service and sell their flights directly through their own sites. In such cases we have to extract information from their sites and show it on ours, and also submit information to their sites through ours. The users of our site do not know where we fetch the data from; they can book flights on the provider airline's site through our website's UI. This technique is called screen scraping, and it is not ethical in every respect.

OK, now I will try to explain the concept with a Windows application that automatically logs in to Twitter. For efficient screen scraping in .NET applications we can use the WebBrowser control available in System.Windows.Forms. This control can also be used in ASP.NET with some threading; that is very tricky, and I will explain it in another post.

STEPS

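1. Put two text boxes, labels and a button on your form. The code below refers to the text boxes as ccTxtUid and ccTxtPwd and to the button as ccBtnLogIn.
2. Find the WebBrowser control in your toolbox, drag it onto your form and name it “WBrowser”.

Arrange the controls as shown below.

[Screenshot: form layout]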

The WebBrowser control's Navigate(string URL) method is used to navigate to a particular URL; in our example it is http://twitter.com/login.

The login page is then rendered in our WebBrowser control. Once the page has loaded completely, the control raises the DocumentCompleted event, which is handled here by WBrowser_DocumentCompleted(). We can be sure a page is fully loaded only after this event has fired, and only then can we extract its HTML. So after calling Navigate() we have to wait in a loop until the DocumentCompleted event fires.
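A wait loop like the one described above can spin forever if the page never finishes loading. As a minimal sketch (not the code from my real application), the wait can be bounded with a timeout. The names WaitHelper and WaitUntil are hypothetical, chosen only for this illustration; isDone is expected to read a flag set by the DocumentCompleted handler, and pump is expected to be Application.DoEvents.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; it is not part of the original application.
public static class WaitHelper
{
    // Calls pump() repeatedly until isDone() returns true or the timeout elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> isDone, Action pump, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed > timeout)
                return false;
            pump(); // in a form this would be Application.DoEvents
        }
        return true;
    }
}
```

Inside a form, a call could look like WaitHelper.WaitUntil(() => this.DocCompleted, Application.DoEvents, TimeSpan.FromSeconds(30)); the 30-second limit is an arbitrary assumption, and the caller decides what to do when the result is false.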

To log in to Twitter we have to enter the username and password in their respective text boxes and click the submit button on Twitter's login page. For that, we first study the HTML of the Twitter page and find the IDs of these controls. They are:

Username text box, ID: “username_or_email”
Password text box, ID: “session[password]”
Submit button, ID: “signin_submit”

Now we have to set the value attribute of these text boxes from the values entered in our Windows form, and then invoke the click method of the submit button.
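One way to do this, shown here as a minimal sketch under the assumption that the three IDs listed above are present on the loaded page, is to look up each element directly with HtmlDocument.GetElementById. The class name LoginFormFiller and the method name FillAndSubmit are hypothetical, introduced only for this example.

```csharp
using System.Windows.Forms;

// Hypothetical helper for illustration; the class and method names are assumptions.
public static class LoginFormFiller
{
    // Fills the username and password fields, then clicks the submit button.
    // Returns false if the document or any of the three elements is missing.
    public static bool FillAndSubmit(HtmlDocument doc, string uid, string pwd)
    {
        if (doc == null)
            return false;

        // GetElementById returns null when no element has the given ID.
        HtmlElement user = doc.GetElementById("username_or_email");
        HtmlElement pass = doc.GetElementById("session[password]");
        HtmlElement submit = doc.GetElementById("signin_submit");
        if (user == null || pass == null || submit == null)
            return false;

        user.SetAttribute("value", uid);
        pass.SetAttribute("value", pwd);
        submit.InvokeMember("click");
        return true;
    }
}
```

It would be called with the WebBrowser control's Document property once the login page has finished loading. The full listing below takes a different route and loops over all input elements instead.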

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace TwitterTweet
{
    public partial class Form1 : Form
    {
        private bool DocCompleted = false;
        public string LoginUrl = "http://twitter.com/login";
        public Form1()
        {
            InitializeComponent();
        }

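        // Raised by the WebBrowser control when a document has finished loading.
        private void WBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            this.DocCompleted = true;
        }

        // Blocks until DocumentCompleted has fired, pumping window messages so the
        // browser control can keep loading, then resets the flag for the next wait.
        private void Synchronize()
        {
            while (!this.DocCompleted)
            {
                Application.DoEvents();
            }
            this.DocCompleted = false;
        }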

        private void ccBtnLogIn_Click(object sender, EventArgs e)
        {
            this.Login(this.ccTxtUid.Text.Trim(),this.ccTxtPwd.Text.Trim());
            this.groupBox1.Visible = false;
            
        }

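        private void Login(string Uid, string Pwd)
        {
            // Clear any stale completion flag before starting a new navigation.
            this.DocCompleted = false;
            this.WBrowser.Navigate(LoginUrl);
            this.Synchronize();

            HtmlElement submit = null;
            HtmlElementCollection htmlCol = this.WBrowser.Document.GetElementsByTagName("input");
            foreach (HtmlElement el in htmlCol)
            {
                if (el.Id == "username_or_email")
                    el.SetAttribute("value", Uid);
                else if (el.Id == "session[password]")
                    el.SetAttribute("value", Pwd);
                else if (el.Id == "signin_submit")
                    submit = el;
            }

            // Click only after both fields are filled, whatever order the inputs appear in.
            if (submit != null)
            {
                submit.InvokeMember("click");
                this.Synchronize();
            }
        }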

       

       
    }
}
2 Responses to “Screen scraping in C#.NET”

  1. Aravind.S said

    Hai Vivek,

    I am a GNIIT(SWE) student. I find your posts very useful and helps me to gain more knowledge. Please continue your venture.

  2. Asharaf said

    Good! keep writing
