How to Create Add-ons for Internet Explorer, Part 2

In this continuation of the article about creating add-ons for IE, we will show you how to access browser cookies so that users do not have to log in to the website and to the browser add-on separately. Next, we will show how to send HTTP requests and data to the server and receive responses, all asynchronously without blocking threads. Finally, we will show how to manipulate the content of the web page currently open in the browser using the IE core API.

Reading cookies

For most tasks on today’s web, you need to know the user identifier.

Browser add-on actions associated with a user profile require the user to be logged in to that profile. The data identifying the user is then stored in a browser cookie. The easiest way to get it is to read the file in which Internet Explorer stores the cookies. One entry in it looks something like this:

webauthtoken
w7tkNt...Q%3D%3D
moje.domena.cz/
1536
1882420992
30229098
2616903031
30155471
*
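Each record spans nine lines. The article shows only the sample; the field meanings below follow the commonly documented layout of IE's legacy cookie files and should be treated as an assumption: name, value, host and path, flags, the expiry time as the low and high 32-bit halves of a Windows FILETIME, the creation time split the same way, and a `*` terminator. A minimal standard C++ sketch that parses one record and converts the expiry to Unix time (`CookieRecord`, `parse_cookie_record` and the helpers are hypothetical names, not part of any IE API):

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// One record from a legacy IE cookie .txt file. Assumed nine-line layout:
// name, value, host/path, flags, expiry low, expiry high,
// creation low, creation high, "*".
struct CookieRecord {
    std::string name;
    std::string value;
    std::string hostPath;
    std::uint32_t flags = 0;
    std::uint64_t expiry = 0;    // FILETIME: 100-ns ticks since 1601-01-01 UTC
    std::uint64_t creation = 0;  // same unit as expiry
};

// Joins the two 32-bit halves stored on separate lines into one FILETIME value.
std::uint64_t join_filetime(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Converts a FILETIME value to whole seconds since 1970-01-01 UTC.
std::int64_t filetime_to_unix(std::uint64_t filetime) {
    const std::int64_t secondsFrom1601To1970 = 11644473600LL;
    return static_cast<std::int64_t>(filetime / 10000000ULL) - secondsFrom1601To1970;
}

// Converts one numeric line; std::stoul throws on non-numeric text.
static std::uint32_t to_u32(const std::string& s) {
    return static_cast<std::uint32_t>(std::stoul(s));
}

// Parses exactly one nine-line record; returns std::nullopt if it is malformed.
std::optional<CookieRecord> parse_cookie_record(const std::vector<std::string>& lines) {
    if (lines.size() != 9 || lines[8] != "*") return std::nullopt;
    try {
        CookieRecord r;
        r.name = lines[0];
        r.value = lines[1];
        r.hostPath = lines[2];
        r.flags = to_u32(lines[3]);
        r.expiry = join_filetime(to_u32(lines[4]), to_u32(lines[5]));
        r.creation = join_filetime(to_u32(lines[6]), to_u32(lines[7]));
        return r;
    } catch (const std::logic_error&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

For the sample record above, the expiry halves join to 129832989180000000, which `filetime_to_unix` turns into 1338825318, a date in June 2012.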

The algorithm first locates the cookie file for the desired domain and then looks in it for the token, which the server converts into a user identifier. The token value is stored on the line following the webauthtoken key.

// read the token (C++/CLI; assumes using namespace System and System::IO)
String^ tokenValue = nullptr;
String^ cookies = Path::Combine(Environment::GetFolderPath(Environment::SpecialFolder::ApplicationData), L"Microsoft\\Windows\\Cookies");
for each (String^ file in Directory::GetFiles(cookies, L"*@www.contoso[*.txt", SearchOption::TopDirectoryOnly)) {
    bool value = false;
    for each (String^ line in File::ReadLines(file)) {
        if (value) {
            // the line after the cookie name holds its value
            tokenValue = line;
            break;
        } else if (String::Equals(line, L"webauthtoken")) {
            value = true;
        }
    }
    // stop searching once a file with the token has been found
    if (tokenValue != nullptr) break;
}

Depending on how unique your second-level domain name is, you may also need to check that the value belongs to the correct domain, because the file name does not contain the top-level domain. With a reasonably distinctive domain name, this is usually not a problem. In some cases, however, the file name can be quite unexpected, so it is important to look in the folder to see how IE actually names the cookie file for your domain.
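One way to do that check is to inspect the record line that holds the host and path, which in the sample record is the third line (`moje.domena.cz/`). A minimal sketch in standard C++ under that assumption; `record_matches_domain` is a hypothetical helper that accepts the domain itself or any of its subdomains:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cases ASCII letters so host names compare case-insensitively.
static std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns true if the host/path line of a cookie record (for example
// "www.contoso.com/") belongs to `domain` or one of its subdomains.
bool record_matches_domain(const std::string& hostPath, const std::string& domain) {
    // The host ends at the first '/'; the rest is the cookie path.
    std::string host = to_lower_ascii(hostPath.substr(0, hostPath.find('/')));
    std::string want = to_lower_ascii(domain);
    if (host == want) return true;
    // Accept subdomains only on a label boundary ('.' right before the suffix).
    return host.size() > want.size() &&
           host.compare(host.size() - want.size(), want.size(), want) == 0 &&
           host[host.size() - want.size() - 1] == '.';
}
```

Matching on a label boundary keeps a look-alike host such as `evilcontoso.com` from being accepted for `contoso.com`.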

The server receives the token with the request and converts it into a user identifier. For example, if the server receives the token in POST data and uses Windows Live ID, the processing looks like this:

string wlid;
var token = context.Request.Form["webauthtoken"];
var wll = new WindowsLiveLogin(true);
var user = wll.ProcessToken(token);
if (user != null) {
    wlid = user.Id;
} else {
    ...
    return;
}

A completely different situation occurs if your goal is to read the cookies of the page you have just loaded. An appropriate API is available for this:

void STDMETHODCALLTYPE CMyButton::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL) {
    HRESULT hr = S_OK;

    // Query for the IWebBrowser2 interface.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;

    // Is this event associated with the top-level browser?
    if (spTempWebBrowser && m_spWebBrowser &&
        m_spWebBrowser.IsEqualObject(spTempWebBrowser)) {
        
        // Get the current document object from browser...
        CComPtr<IDispatch> spDispDoc;
        hr = m_spWebBrowser->get_Document(&spDispDoc);
        if (SUCCEEDED(hr)) {
            
            // ...and query for an HTML document.
            CComQIPtr<IHTMLDocument2> spHTMLDoc = spDispDoc;
            if (spHTMLDoc != NULL) {

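                // read document.cookie; CComBSTR frees the string automatically
                CComBSTR cookie;
                hr = spHTMLDoc->get_cookie(&cookie);
                if (SUCCEEDED(hr) && cookie.m_str != NULL) {
                    MessageBox(NULL, cookie, L"Cookie", 0);
                }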

            }
        }
    }
}

However, this approach has a major drawback: a page from the domain whose cookies we want to read must already be loaded. A browser add-on for the www.contoso.com website would therefore learn who the user is only once the user visits the site; until then, the add-on would not work. That is why it is advisable to read the specific cookie file to obtain the user identifier.

If the add-on does not find the cookie in the file at all, the user is not logged in to www.contoso.com. In that case, a single MessageBox prompting the user to log in to the site is enough. Users usually understand what is required of them and log in on the page. Compared with a login form in the add-on itself, they are not confronted with an unfamiliar user interface. And the main advantage is that the form does not have to be programmed at all.

Sample script to download: Add-ons for IE - cookies

Communication with the server

Communication with the server can be just as important for a browser add-on as it is for a website. In this part, we will show you how to send a request to the server and receive the response in a simple way using C++/CLI, and how to do it asynchronously without blocking threads.

Sending data to the server is easiest through .NET, because making the code asynchronous only requires splitting it into three methods. First, add a reference to the System.Web assembly, which provides the HttpUtility class used below for URL encoding (the HttpWebRequest class itself is in the System.Net namespace of System.dll). Reference as few libraries as possible: in a browser helper object, loading each of them into memory delays its startup.

The first method, which is called whenever some data needs to be sent to the server, opens the request stream. To make the data easy to process on the server, I chose to simulate an HTML form submission.

void MyNetClass::Exec() {
    try {
        // prepare web request
        HttpWebRequest^ request = static_cast<HttpWebRequest^>(
          WebRequest::Create(L"http://www.contoso.com/api.ashx"));
        request->Method = L"POST";
        request->ContentType = L"application/x-www-form-urlencoded";
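        // MyNetClass is assumed to be a ref class; the callback is bound to this instance
        request->BeginGetRequestStream(gcnew AsyncCallback(this, &MyNetClass::GetRequestStreamCallback), request);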
        
    } catch (System::Exception^ ex) {
        // PtrToStringChars is declared in <vcclr.h>
        pin_ptr<const wchar_t> msg = PtrToStringChars(ex->Message);
        MessageBox(NULL, msg, _T("IE Toolbar Button"), 0);
    }
}

The request body carries the individual values: webauthtoken identifies the user, and action tells the server what to do with the request. The UrlEncode method escapes a string for use in the request body. It is not applied to the action parameter, because I assume it only takes values defined within the organization.
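To see what the encoding step does, here is a minimal standard C++ sketch of application/x-www-form-urlencoded encoding as used by HTML forms. It is an illustration, not a port of HttpUtility::UrlEncode, and is not guaranteed to match its output byte for byte (for example in the case of hex digits); `form_encode` and `form_body` are hypothetical names.

```cpp
#include <string>
#include <utility>
#include <vector>

// Encodes one name or value following the HTML form rules:
// ASCII letters, digits and "*-._" stay as they are, a space becomes '+',
// and every other byte becomes %XX.
std::string form_encode(const std::string& text) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '*' || c == '-' || c == '.' || c == '_';
        if (unreserved) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Joins name/value pairs into a request body such as "a=1&b=2".
std::string form_body(const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& [name, value] : fields) {
        if (!body.empty()) body += '&';
        body += form_encode(name) + '=' + form_encode(value);
    }
    return body;
}
```

A server that percent-decodes the body accepts upper- and lower-case hex digits alike, so the exact case does not matter on the receiving side.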

The code first builds the data as a string and then writes it into the stream. This is fine only for short strings (and for readable source code samples). For larger volumes of data, such as file contents, it is no longer appropriate; instead, write the data into the stream piece by piece.

void MyNetClass::GetRequestStreamCallback(IAsyncResult^ asynchronousResult) {
    try {
        HttpWebRequest^ request = static_cast<HttpWebRequest^>(asynchronousResult->AsyncState);
        Stream^ stream = request->EndGetRequestStream(asynchronousResult);

        // prepare data
        String^ data = L"webauthtoken=" + HttpUtility::UrlEncode(tokenValue) + L"&action=ActionName";
        array<unsigned char>^ post = Encoding::ASCII->GetBytes(data);
        stream->Write(post, 0, post->Length);
        stream->Close();

        // send the request
        request->BeginGetResponse(gcnew AsyncCallback(this, &MyNetClass::GetResponseCallback), request);

    } catch (System::Exception^ ex) {
        pin_ptr<const wchar_t> msg = PtrToStringChars(ex->Message);
        MessageBox(NULL, msg, _T("IE Toolbar Button"), 0);
    }
}

The last method processes the server's response. My habit is to have the server return the string OK when everything went without error and the error text otherwise. If the response carries data to be processed, it is better to detect errors from the HTTP status code. But this depends mainly on your own habits.
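Both approaches can be combined: check the HTTP status first and only then apply the OK convention. A minimal standard C++ sketch; `response_error` is a hypothetical helper. In the .NET code of this article, HttpWebRequest::EndGetResponse already reports error status codes by throwing a WebException, so there the first check happens implicitly.

```cpp
#include <optional>
#include <string>

// Interprets one response under the convention described above: the server
// answers "OK" on success and the error text otherwise.
// Returns std::nullopt on success, or the error message to show the user.
std::optional<std::string> response_error(int httpStatus, const std::string& body) {
    if (httpStatus < 200 || httpStatus > 299) {
        // Transport or server failure: report the status code itself.
        return "HTTP error " + std::to_string(httpStatus);
    }
    if (body == "OK") return std::nullopt;
    return body;  // the server put the error description in the body
}
```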

void MyNetClass::GetResponseCallback(IAsyncResult^ asynchronousResult) {
    try {

        // read the response; EndGetResponse throws a WebException for HTTP error status codes
        HttpWebRequest^ request = static_cast<HttpWebRequest^>(asynchronousResult->AsyncState);
        HttpWebResponse^ response = static_cast<HttpWebResponse^>(request->EndGetResponse(asynchronousResult));
        StreamReader^ reader = gcnew StreamReader(response->GetResponseStream());
        String^ answer = reader->ReadToEnd();

        // release the stream and the connection before evaluating the answer
        reader->Close();
        response->Close();

        // the server returns "OK" on success, otherwise the error text
        if (!String::Equals(answer, L"OK")) throw gcnew Exception(answer);

    } catch (System::Exception^ ex) {
        pin_ptr<const wchar_t> msg = PtrToStringChars(ex->Message);
        MessageBox(NULL, msg, _T("IE Toolbar Button"), 0);
    }
}

Beyond plain HTTP requests, many other options are available: web services, Managed Extensibility Framework components, connections to SQL Server, devices connected to the PC, in short, the entire .NET Framework.

Sample script for download: IE Add-ons - Communication

DOM manipulation

Finally, we will show you how to access the web page itself, manipulate its structure and change its cascading styles. Because this is the IE core API itself, which we access directly from native C++, managed code will not help us this time.

First, we will show you how to make all images disappear when the page loads. To do this, you need to create a browser helper object. The class is again called CMyButton, but only to keep the source code samples compatible across the entire series. Unless the class is also registered as a browser extension, it has nothing to do with the button.

void STDMETHODCALLTYPE CMyButton::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL) {
    HRESULT hr = S_OK;

    // Query for the IWebBrowser2 interface.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;

    // Is this event associated with the top-level browser?
    if (spTempWebBrowser && m_spWebBrowser &&
        m_spWebBrowser.IsEqualObject(spTempWebBrowser)) {
        
        // Get the current document object from browser...
        CComPtr<IDispatch> spDispDoc;
        hr = m_spWebBrowser->get_Document(&spDispDoc);
        if (SUCCEEDED(hr)) {
            
            // ...and query for an HTML document.
            CComQIPtr<IHTMLDocument2> spHTMLDoc = spDispDoc;
            if (spHTMLDoc != NULL) {
                
                // Finally, remove the images.
                RemoveImages(spHTMLDoc);
            }
        }
    }
}

Because the DocumentComplete event also fires for documents loaded inside frames, and the loaded document does not have to be HTML or XHTML at all, several checks are performed first. Then the RemoveImages method is called, which takes care of actually hiding the images.

void CMyButton::RemoveImages(IHTMLDocument2* pDocument) {
    CComPtr<IHTMLElementCollection> spImages;

    // Get the collection of images from the DOM.
    HRESULT hr = pDocument->get_images(&spImages);
    if (hr == S_OK && spImages != NULL) {
        
        // Get the number of images in the collection.
        long cImages = 0;
        hr = spImages->get_length(&cImages);
        if (hr == S_OK && cImages > 0) {
            for (int i = 0; i < cImages; i++) {
                CComVariant svarItemIndex(i);
                CComVariant svarEmpty;
                CComPtr<IDispatch> spdispImage;

                // Get the image out of the collection by index.
                hr = spImages->item(svarItemIndex, svarEmpty, &spdispImage);
                if (hr == S_OK && spdispImage != NULL) {
                    // First, query for the generic HTML element interface...
                    CComQIPtr<IHTMLElement> spElement = spdispImage;

                    if (spElement) {
                        // ...then ask for the style interface.
                        CComPtr<IHTMLStyle> spStyle;
                        hr = spElement->get_style(&spStyle);

                        // Set display="none" to hide the image.
                        if (hr == S_OK && spStyle != NULL) {
                            static const CComBSTR sbstrNone(L"none");
                            spStyle->put_display(sbstrNone);
                        }
                    }
                }
            }
        }
    }
}

The Document Object Model, more commonly abbreviated DOM, is a structure that describes a web page. HTML or XHTML is first converted into the DOM, which is then converted into a display tree in which all cascading styles are applied. Only then is the page rendered. In IE, the document is represented by the IHTMLDocument2 interface.
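Why setting display to none makes the images disappear can be shown with a toy model of the step from the DOM to the display tree. This is a standard C++ sketch under simplifying assumptions (every node already carries its computed display value, and the display tree is flattened into a list of tags), not a description of IE's internals; `DomNode` and `build_display_list` are hypothetical names. A node with display none gets no box, and neither does anything inside it:

```cpp
#include <string>
#include <vector>

// Toy DOM node: a tag name, its computed CSS display value and its children.
struct DomNode {
    std::string tag;
    std::string display = "inline";
    std::vector<DomNode> children;
};

// Collects, in document order, the tags of nodes that would get a box
// in the display tree. A node with display "none" is skipped together
// with its whole subtree.
void build_display_list(const DomNode& node, std::vector<std::string>& out) {
    if (node.display == "none") return;
    out.push_back(node.tag);
    for (const DomNode& child : node.children) build_display_list(child, out);
}
```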

The HTML document interface exposes the get_images method, which returns a collection of all images. The code then iterates through the collection and queries each item for the IHTMLElement interface. There is no need to work with CSS as text through the element's style attribute; instead, the IHTMLStyle interface lets you set individual properties directly.

The IHTMLDocument2 interface also has methods that return the active element, create a new element and open a new window. It provides events for keystrokes, mouse buttons, text selection, hovering over an element and much more. In short, everything that is available from JavaScript is available here as well. As the API grows with newer versions of IE, the DOM can be accessed through new interfaces; IE9 already has IHTMLDocument7.

And how many interfaces does Internet Explorer actually offer? Version 9 has over five hundred, fewer than a hundred of which were added in that version for SVG. By comparison, Canvas has only seven interfaces and XMLHttpRequest five. The vast majority of interfaces are for HTML. The full list, with documentation, is available in the MSDN Library.

Sample script for download: Add-ons for IE - DOM manipulation

Conclusion

If you have ever thought of a feature worth having in your browser, but writing a whole browser for it is out of the question, a browser extension is the easiest way to try it out. Internet Explorer offers not just a window in which web pages are rendered (although that alone is a huge benefit) but a whole platform for processing web pages.

The article was written for Zdroják.