A Guide to Selenium Architecture and Basics

Basics

Selenium remains one of the most popular tools for automating web applications across browsers. It runs on the major operating systems (Windows, Linux/Unix, and macOS) and drives the major browsers, including Chrome, Firefox, Edge, and Safari. It also provides language bindings for C#, Java, JavaScript, Python, and Ruby, so you can write automation scripts in the language you prefer.

Selenium Components

Selenium IDE

Selenium IDE is a browser extension primarily used for recording and replaying scripts. It supports Chrome, Firefox, and Edge, making it ideal for quick test creation and exploratory testing.

Selenium WebDriver

Selenium WebDriver lets you write test scripts in your programming language of choice through language bindings. Since Selenium 4 it follows the W3C WebDriver standard, which replaced the older JSON Wire Protocol and gives better compatibility and consistency across browsers. Selenium WebDriver also supports WebDriver BiDi (bidirectional) features, such as handling network responses and other browser events.
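As a rough illustration of what a script written against the Python binding can look like, here is a minimal sketch. The helper names fetch_title and fetch_title_with_chrome are made up for this example; the calls on the driver (get, title, quit) and webdriver.Chrome() come from the Selenium Python binding. It assumes Selenium 4 is installed and that a matching ChromeDriver is available on the machine.

```python
def fetch_title(driver, url):
    """Open `url` in an existing WebDriver session and return the page title."""
    try:
        driver.get(url)       # WebDriver command: navigate to the page
        return driver.title   # WebDriver command: read the document title
    finally:
        driver.quit()         # always end the session, even on failure


def fetch_title_with_chrome(url):
    """Same as fetch_title, but starts a local Chrome session first."""
    # Imported lazily so fetch_title can be reused without Selenium installed.
    from selenium import webdriver

    return fetch_title(webdriver.Chrome(), url)
```

Calling fetch_title_with_chrome("https://example.com") would open Chrome, load the page, and return its title. Passing the driver in as a parameter keeps the test logic independent of which browser is used.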

Architecture of Selenium WebDriver

Language Bindings

Language bindings are the client libraries, one per supported programming language, that expose the WebDriver API so you can write Selenium scripts in that language.

W3C Protocol

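The W3C WebDriver standard defines how the client (the language binding) and the server (the browser driver) talk to each other: each command is an HTTP request to a specific endpoint, with parameters and results carried as JSON. Because every driver implements the same protocol, the same script can communicate with different browser drivers.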
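To make the protocol a little more concrete, the sketch below maps a handful of commands to the HTTP method, path, and JSON body that the W3C WebDriver specification defines for them. The function build_request and its command names are hypothetical, chosen for illustration; real language bindings have their own internal equivalents.

```python
import json

# A few W3C WebDriver endpoints; {sid} is the session id returned by "new_session".
ROUTES = {
    "new_session": ("POST", "/session"),
    "navigate": ("POST", "/session/{sid}/url"),
    "get_title": ("GET", "/session/{sid}/title"),
    "find_element": ("POST", "/session/{sid}/element"),
    "delete_session": ("DELETE", "/session/{sid}"),
}


def build_request(command, session_id=None, **params):
    """Return (method, path, body) for one command; body is JSON text or None."""
    method, template = ROUTES[command]
    path = template.format(sid=session_id)
    # Only POST requests carry a JSON body in this sketch.
    body = json.dumps(params) if method == "POST" else None
    return method, path, body
```

For example, build_request("navigate", "abc123", url="https://example.com") yields a POST to /session/abc123/url whose body contains the target URL.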

Browser Drivers

Browser drivers act as servers that control their respective browsers. Each browser is implemented differently by its vendor, so each needs its own driver; the driver hides the browser internals behind the common WebDriver interface. For instance, ChromeDriver is needed to work with Chrome.
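The pairing of browsers and drivers can be sketched as a simple lookup. The executable names below are the ones the vendors ship; the helper find_driver is a hypothetical convenience that only checks whether the driver is on the PATH. Recent Selenium releases can also locate or download a driver for you, so a check like this is mostly useful for troubleshooting.

```python
import shutil

# Driver executable shipped by each browser vendor.
DRIVER_EXECUTABLES = {
    "chrome": "chromedriver",
    "firefox": "geckodriver",
    "edge": "msedgedriver",
    "safari": "safaridriver",
}


def find_driver(browser):
    """Return the full path of the driver for `browser`, or None if not on PATH."""
    try:
        executable = DRIVER_EXECUTABLES[browser.lower()]
    except KeyError:
        raise ValueError(f"no known driver for browser {browser!r}") from None
    return shutil.which(executable)
```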

Typical Workflow

  1. Each Selenium command is turned into an HTTP request and sent to the browser driver.
  2. The browser driver, acting as an HTTP server, passes the command to the browser for execution.
  3. The driver receives the execution status from the browser and returns it to the automation script in the HTTP response.
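The request and response loop described in these steps can be imitated with nothing but the Python standard library. The sketch below is a toy: FakeDriverHandler stands in for a browser driver and only pretends to have a browser behind it, and all names (FakeDriverHandler, send_command, run_demo) are invented for this example. The routes and the {"value": ...} response wrapper follow the shape used by the W3C WebDriver protocol.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a browser driver; handles two WebDriver-style routes."""

    current_url = None  # pretend browser state, shared across requests

    def _reply(self, value):
        payload = json.dumps({"value": value}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)  # step 3: status returns in the HTTP response

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        if self.path.endswith("/url"):
            # Step 2: the "driver" hands the command to the "browser".
            FakeDriverHandler.current_url = body["url"]
        self._reply(None)

    def do_GET(self):
        if self.path.endswith("/title"):
            self._reply(f"Fake page for {FakeDriverHandler.current_url}")
        else:
            self._reply(None)

    def log_message(self, *args):  # keep the demo quiet
        pass


def send_command(host, port, method, path, params=None):
    """Step 1: one command becomes one HTTP request to the driver."""
    body = json.dumps(params) if params is not None else None
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request(method, path, body=body,
                           headers={"Content-Type": "application/json"})
        return json.loads(connection.getresponse().read())["value"]
    finally:
        connection.close()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0: any free port
    host, port = server.server_address[:2]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        send_command(host, port, "POST", "/session/demo/url",
                     {"url": "https://example.com"})
        return send_command(host, port, "GET", "/session/demo/title")
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() sends a navigate command followed by a title query and returns the title reported by the fake driver. A real driver does the same thing, except that the commands are carried out by an actual browser.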

Selenium Grid

Selenium Grid runs multiple tests in parallel across different browsers and operating systems. Selenium 4 introduced a redesigned, more user-friendly Grid UI and support for scaling on Kubernetes, making large-scale testing environments easier to manage.
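A parallel run against a Grid might be sketched as follows. The helpers open_grid_session and run_in_parallel are hypothetical names, and the Grid address http://localhost:4444 is an assumption (a Grid started locally with default settings); webdriver.Remote and the browser options classes are part of the Selenium 4 Python binding.

```python
from concurrent.futures import ThreadPoolExecutor


def open_grid_session(browser, grid_url="http://localhost:4444"):
    """Open a remote session on a Grid; grid_url is an assumed local address."""
    from selenium import webdriver  # lazy import: only needed for real runs

    options_by_browser = {
        "chrome": webdriver.ChromeOptions,
        "firefox": webdriver.FirefoxOptions,
        "edge": webdriver.EdgeOptions,
    }
    options = options_by_browser[browser]()
    return webdriver.Remote(command_executor=grid_url, options=options)


def run_in_parallel(browsers, test, open_session=open_grid_session):
    """Run test(driver) once per browser, each in its own thread."""

    def run_one(browser):
        driver = open_session(browser)
        try:
            return test(driver)
        finally:
            driver.quit()  # release the Grid slot even if the test fails

    with ThreadPoolExecutor(max_workers=max(1, len(browsers))) as pool:
        # pool.map preserves input order, so results line up with browsers.
        return dict(zip(browsers, pool.map(run_one, browsers)))
```

With a Grid running, run_in_parallel(["chrome", "firefox"], lambda d: d.title) would return a dictionary of results keyed by browser name. The open_session parameter exists so the same runner can be pointed at a different Grid or at a stub during development.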