This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

WebDriver

WebDriver drives a browser natively; learn more about it.

WebDriver drives a browser natively, as a user would, either locally or on a remote machine using the Selenium server, marks a leap forward in terms of browser automation.

Selenium WebDriver refers to both the language bindings and the implementations of the individual browser controlling code. This is commonly referred to as just WebDriver.

Selenium WebDriver is a W3C Recommendation

  • WebDriver is designed as a simple and more concise programming interface.

  • WebDriver is a compact object-oriented API.

  • It drives the browser effectively.

1 - Getting started

If you are new to Selenium, we have a few resources that can help you get up to speed right away.

Selenium supports automation of all the major browsers in the market through the use of WebDriver. WebDriver is an API and protocol that defines a language-neutral interface for controlling the behaviour of web browsers. Each browser is backed by a specific WebDriver implementation, called a driver. The driver is the component responsible for delegating down to the browser, and handles communication to and from Selenium and the browser.

This separation is part of a conscious effort to have browser vendors take responsibility for the implementation for their browsers. Selenium makes use of these third party drivers where possible, but also provides its own drivers maintained by the project for the cases when this is not a reality.

The Selenium framework ties all of these pieces together through a user-facing interface that enables the different browser backends to be used transparently, enabling cross-browser and cross-platform automation.

Selenium setup is quite different from the setup of other commercial tools. Before you can start writing Selenium code, you have to install the language bindings libraries for your language of choice, the browser you want to use, and the driver for that browser.

Follow the links below to get up and going with Selenium WebDriver.

If you wish to start with a low-code/record and playback tool, please check Selenium IDE

Once you get things working, if you want to scale up your tests, check out the Selenium Grid.

1.1 - Install a Selenium library

Setting up the Selenium library for your favourite programming language.

First you need to install the Selenium bindings for your automation project. The installation process for libraries depends on the language you choose to use. Make sure you check the Selenium downloads page to make sure you are using the latest version.

Requirements by language

View the minimum supported Java version here.

Installation of Selenium libraries for Java is accomplished using a build tool.

Maven

Specify the dependencies in the project’s pom.xml file:

    <dependencies>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>${selenium.version}</version>

Gradle

Specify the dependency in the project build.gradle file as testImplementation:

    testImplementation 'org.seleniumhq.selenium:selenium-java:4.20.0'
    testImplementation 'org.junit.jupiter:junit-jupiter-engine:5.10.2'

The minimum supported Python version for each Selenium version can be found in Supported Python Versions on PyPi

There are a couple different ways to install Selenium.

Pip

pip install selenium

Download

Alternatively you can download the PyPI source archive (selenium-x.x.x.tar.gz) and install it using setup.py:

python setup.py install

Require in project

To use it in a project, add it to the requirements.txt file:

selenium==4.20.0

A list of all supported frameworks for each version of Selenium is available on Nuget

There are a few options for installing Selenium.

Packet Manager

Install-Package Selenium.WebDriver

.NET CLI

dotnet add package Selenium.WebDriver

CSProj

in the project’s csproj file, specify the dependency as a PackageReference in ItemGroup:

      <PackageReference Include="Selenium.WebDriver" Version="4.20.0" />

Additional considerations

Further items of note for using Visual Studio Code (vscode) and C#

Install the compatible .NET SDK as per the section above. Also install the vscode extensions (Ctrl-Shift-X) for C# and NuGet. Follow the instruction here to create and run the “Hello World” console project using C#. You may also create a NUnit starter project using the command line dotnet new NUnit. Make sure the file %appdata%\NuGet\nuget.config is configured properly as some developers reported that it will be empty due to some issues. If nuget.config is empty, or not configured properly, then .NET builds will fail for Selenium Projects. Add the following section to the file nuget.config if it is empty:

<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="nuget.org" value="https://www.nuget.org/api/v2/" />   
  </packageSources>
...

For more info about nuget.config click here. You may have to customize nuget.config to meet you needs.

Now, go back to vscode, press Ctrl-Shift-P, and type “NuGet Add Package”, and enter the required Selenium packages such as Selenium.WebDriver. Press Enter and select the version. Now you can use the examples in the documentation related to C# with vscode.

You can see the minimum required version of Ruby for any given Selenium version on rubygems.org

Selenium can be installed two different ways.

Install manually

gem install selenium-webdriver

Add to project’s gemfile

gem 'selenium-devtools', '= 0.124.0'

You can find the minimum required version of Node for any given version of Selenium in the Node Support Policy section on npmjs

Selenium is typically installed using npm.

Install locally

npm install selenium-webdriver

Add to project

In your project’s package.json, add requirement to dependencies:

        "mocha": "10.4.0"
Use the Java bindings for Kotlin.

Next Step

Create your first Selenium script

1.2 - Write your first Selenium script

Step-by-step instructions for constructing a Selenium script

Once you have Selenium installed, you’re ready to write Selenium code.

Eight Basic Components

Everything Selenium does is send the browser commands to do something or send requests for information. Most of what you’ll do with Selenium is a combination of these basic commands

Click on the link to “View full example on GitHub” to see the code in context.

1. Start the session

For more details on starting a session read our documentation on driver sessions

        WebDriver driver = new ChromeDriver();
driver = webdriver.Chrome()
        IWebDriver driver = new ChromeDriver();
driver = Selenium::WebDriver.for :chrome
    driver = await new Builder().forBrowser(Browser.CHROME).build();
        driver = ChromeDriver()

2. Take action on browser

In this example we are navigating to a web page.

        driver.get("https://www.selenium.dev/selenium/web/web-form.html");
driver.get("https://www.selenium.dev/selenium/web/web-form.html")
        driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/web-form.html");
driver.get('https://www.selenium.dev/selenium/web/web-form.html')
    await driver.get('https://www.selenium.dev/selenium/web/web-form.html');
        driver.get("https://www.selenium.dev/selenium/web/web-form.html")

3. Request browser information

There are a bunch of types of information about the browser you can request, including window handles, browser size / position, cookies, alerts, etc.

        driver.getTitle();
title = driver.title
        var title = driver.Title;
    let title = await driver.getTitle();
        val title = driver.title

4. Establish Waiting Strategy

Synchronizing the code with the current state of the browser is one of the biggest challenges with Selenium, and doing it well is an advanced topic.

Essentially you want to make sure that the element is on the page before you attempt to locate it and the element is in an interactable state before you attempt to interact with it.

An implicit wait is rarely the best solution, but it’s the easiest to demonstrate here, so we’ll use it as a placeholder.

Read more about Waiting strategies.

        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500));
driver.implicitly_wait(0.5)
        driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromMilliseconds(500);
driver.manage.timeouts.implicit_wait = 500
    await driver.manage().setTimeouts({implicit: 500});
        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500))

5. Find an element

The majority of commands in most Selenium sessions are element related, and you can’t interact with one without first finding an element

        WebElement textBox = driver.findElement(By.name("my-text"));
        WebElement submitButton = driver.findElement(By.cssSelector("button"));
text_box = driver.find_element(by=By.NAME, value="my-text")
submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")
        var textBox = driver.FindElement(By.Name("my-text"));
        var submitButton = driver.FindElement(By.TagName("button"));
text_box = driver.find_element(name: 'my-text')
submit_button = driver.find_element(tag_name: 'button')
    let textBox = await driver.findElement(By.name('my-text'));
    let submitButton = await driver.findElement(By.css('button'));
        var textBox = driver.findElement(By.name("my-text"))
        val submitButton = driver.findElement(By.cssSelector("button"))

6. Take action on element

There are only a handful of actions to take on an element, but you will use them frequently.

        textBox.sendKeys("Selenium");
        submitButton.click();
text_box.send_keys("Selenium")
submit_button.click()
        textBox.SendKeys("Selenium");
        submitButton.Click();
text_box.send_keys('Selenium')
submit_button.click
    await textBox.sendKeys('Selenium');
    await submitButton.click();
        textBox.sendKeys("Selenium")
        submitButton.click()

7. Request element information

Elements store a lot of information that can be requested.

        message.getText();
text = message.text
        var value = message.Text;
    let value = await message.getText();
        val value = message.getText()

8. End the session

This ends the driver process, which by default closes the browser as well. No more commands can be sent to this driver instance. See Quitting Sessions.

Running Selenium File

Next Steps

Most Selenium users execute many sessions and need to organize them to minimize duplication and keep the code more maintainable. Read on to learn about how to put this code into context for your use case with Using Selenium.

1.3 - Organizing and Executing Selenium Code

Scaling Selenium execution with an IDE and a Test Runner library

If you want to run more than a handful of one-off scripts, you need to be able to organize and work with your code. This page should give you ideas for how to actually do productive things with your Selenium code.

Common Uses

Most people use Selenium to execute automated tests for web applications, but Selenium supports any use case of browser automation.

Repetitive Tasks

Perhaps you need to log into a website and download something, or submit a form. You can create a Selenium script to run with a service at preset times.

Web Scraping

Are you looking to collect data from a site that doesn’t have an API? Selenium will let you do this, but please make sure you are familiar with the website’s terms of service as some websites do not permit it and others will even block Selenium.

Testing

Running Selenium for testing requires making assertions on actions taken by Selenium. So a good assertion library is required. Additional features to provide structure for tests require use of Test Runner.

IDEs

Regardless of how you use Selenium code, you won’t be very effective writing or executing it without a good Integrated Developer Environment. Here are some common options…

Test Runner

Even if you aren’t using Selenium for testing, if you have advanced use cases, it might make sense to use a test runner to better organize your code. Being able to use before/after hooks and run things in groups or in parallel can be very useful.

Choosing

There are many different test runners available.

All the code examples in this documentation can be found in (or is being moved to) our example directories that use test runners and get executed every release to ensure all the code is correct and updated. Here is a list of test runners with links. The first item is the one that is used by this repository and the one that will be used for all examples on this page.

  • JUnit - A widely-used testing framework for Java-based Selenium tests.
  • TestNG - Offers extra features like parallel test execution and parameterized tests.
  • pytest - A preferred choice for many, thanks to its simplicity and powerful plugins.
  • unittest - Python’s standard library testing framework.
  • NUnit - A popular unit-testing framework for .NET.
  • MS Test - Microsoft’s own unit testing framework.
  • RSpec - The most widely used testing library for running Selenium tests in Ruby.
  • Minitest - A lightweight testing framework that comes with Ruby standard library.
  • Jest - Primarily known as a testing framework for React, it can also be used for Selenium tests.
  • Mocha - The most common JS library for running Selenium tests.

Installing

This is very similar to what was required in Install a Selenium Library. This code is only showing examples for what is being used in our Documentation Examples project.

Maven

Gradle

To use it in a project, add it to the requirements.txt file:

in the project’s csproj file, specify the dependency as a PackageReference in ItemGroup:

Add to project’s gemfile

In your project’s package.json, add requirement to dependencies:

Asserting

		String title = driver.getTitle();
		assertEquals("Web form", title);
    expect(value).to eq('Received!')

Setting Up and Tearing Down

Set Up

	@BeforeEach
	public void setup() {
		driver = new ChromeDriver();
	}

Tear Down

	@AfterEach
	public void teardown() {
		driver.quit();
	}

Set Up

  before do
    @driver = Selenium::WebDriver.for :chrome
  end

Tear Down

  config.after { @driver&.quit }

Executing

Maven

mvn clean test

Gradle

gradle clean test
mocha runningTests.spec.js

Examples

In First script, we saw each of the components of a Selenium script. Here’s an example of that code using a test runner:

package dev.selenium.getting_started;

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.time.Duration;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class UsingSeleniumTest {

	WebDriver driver;

	@BeforeEach
	public void setup() {
		driver = new ChromeDriver();
	}

	@Test
	public void eightComponents() {

		driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500));
		driver.get("https://www.selenium.dev/selenium/web/web-form.html");

		String title = driver.getTitle();
		assertEquals("Web form", title);

		WebElement textBox = driver.findElement(By.name("my-text"));
		WebElement submitButton = driver.findElement(By.cssSelector("button"));

		textBox.sendKeys("Selenium");
		submitButton.click();

		WebElement message = driver.findElement(By.id("message"));
		String value = message.getText();
		assertEquals("Received!", value);

	}

	@AfterEach
	public void teardown() {
		driver.quit();
	}

}
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_eight_components():
    driver = webdriver.Chrome()

    driver.get("https://www.selenium.dev/selenium/web/web-form.html")

    title = driver.title
    assert title == "Web form"

    driver.implicitly_wait(0.5)

    text_box = driver.find_element(by=By.NAME, value="my-text")
    submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")

    text_box.send_keys("Selenium")
    submit_button.click()

    message = driver.find_element(by=By.ID, value="message")
    value = message.text
    assert value == "Received!"

    driver.quit()
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace SeleniumDocs.GettingStarted
{
    [TestClass]
    public class UsingSeleniumTest
    {

        [TestMethod]
        public void EightComponents()
        {
            IWebDriver driver = new ChromeDriver();

            driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/web-form.html");

            var title = driver.Title;
            Assert.AreEqual("Web form", title);

            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromMilliseconds(500);

            var textBox = driver.FindElement(By.Name("my-text"));
            var submitButton = driver.FindElement(By.TagName("button"));
            
            textBox.SendKeys("Selenium");
            submitButton.Click();
            
            var message = driver.FindElement(By.Id("message"));
            var value = message.Text;
            Assert.AreEqual("Received!", value);
            
            driver.Quit();
        }
    }
}
# frozen_string_literal: true
require 'spec_helper'
require 'selenium-webdriver'


RSpec.describe 'Using Selenium' do
  before do
    @driver = Selenium::WebDriver.for :chrome
  end

  it 'uses eight components' do
    @driver.get('https://www.selenium.dev/selenium/web/web-form.html')

    title = @driver.title
    expect(title).to eq('Web form')

    @driver.manage.timeouts.implicit_wait = 500

    text_box = @driver.find_element(name: 'my-text')
    submit_button = @driver.find_element(tag_name: 'button')

    text_box.send_keys('Selenium')
    submit_button.click

    message = @driver.find_element(id: 'message')
    value = message.text
    expect(value).to eq('Received!')
  end
end
const {By, Builder} = require('selenium-webdriver');
const assert = require("assert");

  describe('First script', function () {
    let driver;
    
    before(async function () {
      driver = await new Builder().forBrowser('chrome').build();
    });
    
    it('First Selenium script with mocha', async function () {
      await driver.get('https://www.selenium.dev/selenium/web/web-form.html');
      
      let title = await driver.getTitle();
      assert.equal("Web form", title);
      
      await driver.manage().setTimeouts({implicit: 500});
      
      let textBox = await driver.findElement(By.name('my-text'));
      let submitButton = await driver.findElement(By.css('button'));
      
      await textBox.sendKeys('Selenium');
      await submitButton.click();
      
      let message = await driver.findElement(By.id('message'));
      let value = await message.getText();
      assert.equal("Received!", value);
    });
  
    after(async () => await driver.quit());
  });

Next Steps

Take what you’ve learned and build out your Selenium code!

As you find more functionality that you need, read up on the rest of our WebDriver documentation.

2 - Driver Sessions

Starting and stopping a session is for opening and closing a browser.

Creating Sessions

Creating a new session corresponds with the W3C command for New session

The session is created automatically by initializing a new Driver class object.

Each language allows a session to be created with arguments from one of these classes (or equivalent):

  • Options to describe the kind of session you want; default values are used for local, but this is required for remote
  • Some form of HTTP Client configuration (the implementation varies between languages)
  • Listeners

Local Driver

The primary unique argument for starting a local driver includes information about starting the required driver service on the local machine.

  • Service object applies only to local drivers and provides information about the browser driver

Remote Driver

The primary unique argument for starting a remote driver includes information about where to execute the code. Read the details in the Remote Driver Section

Quitting Sessions

Quitting a session corresponds to W3C command for Deleting a Session.

Important note: the quit method is different from the close method, and it is recommended to always use quit to end the session

2.1 - Browser Options

These capabilities are shared by all browsers.

In Selenium 3, capabilities were defined in a session by using Desired Capabilities classes. As of Selenium 4, you must use the browser options classes. For remote driver sessions, a browser options instance is required as it determines which browser will be used.

These options are described in the w3c specification for Capabilities.

Each browser has custom options that may be defined in addition to the ones defined in the specification.

browserName

Browser name is set by default when using an Options class instance.

browserVersion

This capability is optional, this is used to set the available browser version at remote end. In recent versions of Selenium, if the version is not found on the system, it will be automatically downloaded by Selenium Manager

pageLoadStrategy

Three types of page load strategies are available.

The page load strategy queries the document.readyState as described in the table below:

StrategyReady StateNotes
normalcompleteUsed by default, waits for all resources to download
eagerinteractiveDOM access is ready, but other resources like images may still be loading
noneAnyDoes not block WebDriver at all

The document.readyState property of a document describes the loading state of the current document.

When navigating to a new page via URL, by default, WebDriver will hold off on completing a navigation method (e.g., driver.navigate().get()) until the document ready state is complete. This does not necessarily mean that the page has finished loading, especially for sites like Single Page Applications that use JavaScript to dynamically load content after the Ready State returns complete. Note also that this behavior does not apply to navigation that is a result of clicking an element or submitting a form.

If a page takes a long time to load as a result of downloading assets (e.g., images, css, js) that aren’t important to the automation, you can change from the default parameter of normal to eager or none to speed up the session. This value applies to the entire session, so make sure that your waiting strategy is sufficient to minimize flakiness.

normal (default)

WebDriver waits until the load event fire is returned.

Move Code

    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.NORMAL);
    WebDriver driver = new ChromeDriver(chromeOptions);
    options.page_load_strategy = 'normal'
    driver = webdriver.Chrome(options=options)
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.Normal;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
      options = Selenium::WebDriver::Options.chrome
      options.page_load_strategy = :normal
    it('Navigate using normal page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('normal'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.NORMAL)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

eager

WebDriver waits until DOMContentLoaded event fire is returned.

Move Code

    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER);
    WebDriver driver = new ChromeDriver(chromeOptions);
    options.page_load_strategy = 'eager'
    driver = webdriver.Chrome(options=options)
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.Eager;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
      options = Selenium::WebDriver::Options.chrome
      options.page_load_strategy = :eager
    it('Navigate using eager page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('eager'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

none

WebDriver only waits until the initial page is downloaded.

Move Code

    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.setPageLoadStrategy(PageLoadStrategy.NONE);
    WebDriver driver = new ChromeDriver(chromeOptions);
    options.page_load_strategy = 'none'
    driver = webdriver.Chrome(options=options)
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace pageLoadStrategy {
  class pageLoadStrategy {
    public static void Main(string[] args) {
      var chromeOptions = new ChromeOptions();
      chromeOptions.PageLoadStrategy = PageLoadStrategy.None;
      IWebDriver driver = new ChromeDriver(chromeOptions);
      try {
        driver.Navigate().GoToUrl("https://example.com");
      } finally {
        driver.Quit();
      }
    }
  }
}
      options = Selenium::WebDriver::Options.chrome
      options.page_load_strategy = :none
    it('Navigate using none page loading strategy', async function () {
      let driver = await env
        .builder()
        .setChromeOptions(options.setPageLoadStrategy('none'))
        .build();

      await driver.get('https://www.selenium.dev/selenium/web/blank.html');
import org.openqa.selenium.PageLoadStrategy
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
  val chromeOptions = ChromeOptions()
  chromeOptions.setPageLoadStrategy(PageLoadStrategy.NONE)
  val driver = ChromeDriver(chromeOptions)
  try {
    driver.get("https://www.google.com")
  }
  finally {
    driver.quit()
  }
}

platformName

This identifies the operating system at the remote-end, fetching the platformName returns the OS name.

In cloud-based providers, setting platformName sets the OS at the remote-end.

      options = Selenium::WebDriver::Options.firefox
      options.platform_name = 'Windows 10'

acceptInsecureCerts

This capability checks whether an expired (or) invalid TLS Certificate is used while navigating during a session.

If the capability is set to false, an insecure certificate error will be returned as navigation encounters any domain certificate problems. If set to true, invalid certificate will be trusted by the browser.

All self-signed certificates will be trusted by this capability by default. Once set, acceptInsecureCerts capability will have an effect for the entire session.

      options = Selenium::WebDriver::Options.chrome
      options.accept_insecure_certs = true

timeouts

A WebDriver session is imposed with a certain session timeout interval, during which the user can control the behaviour of executing scripts or retrieving information from the browser.

Each session timeout is configured with combination of different timeouts as described below:

Script Timeout

Specifies when to interrupt an executing script in a current browsing context. The default timeout 30,000 is imposed when a new session is created by WebDriver.

      options = Selenium::WebDriver::Options.chrome
      options.timeouts = {script: 40_000}

Page Load Timeout

Specifies the time interval in which web page needs to be loaded in a current browsing context. The default timeout 300,000 is imposed when a new session is created by WebDriver. If page load limits a given/default time frame, the script will be stopped by TimeoutException.

      options = Selenium::WebDriver::Options.chrome
      options.timeouts = {page_load: 400_000}

Implicit Wait Timeout

This specifies the time to wait for the implicit element location strategy when locating elements. The default timeout 0 is imposed when a new session is created by WebDriver.

      options = Selenium::WebDriver::Options.chrome
      options.timeouts = {implicit: 1}

unhandledPromptBehavior

Specifies the state of current session’s user prompt handler. Defaults to dismiss and notify state

User Prompt Handler

This defines what action must take when a user prompt encounters at the remote-end. This is defined by unhandledPromptBehavior capability and has the following states:

  • dismiss
  • accept
  • dismiss and notify
  • accept and notify
  • ignore
      options = Selenium::WebDriver::Options.chrome
      options.unhandled_prompt_behavior = :accept

setWindowRect

Indicates whether the remote end supports all of the resizing and repositioning commands.

      options = Selenium::WebDriver::Options.firefox
      options.set_window_rect = true

strictFileInteractability

This new capability indicates if strict interactability checks should be applied to input type=file elements. As strict interactability checks are off by default, there is a change in behaviour when using Element Send Keys with hidden file upload controls.

      options = Selenium::WebDriver::Options.chrome
      options.strict_file_interactability = true

proxy

A proxy server acts as an intermediary for requests between a client and a server. In simple terms, the traffic flows through the proxy server on its way to the address you requested and back.

A proxy server for automation scripts with Selenium could be helpful for:

  • Capture network traffic
  • Mock backend calls made by the website
  • Access the required website under complex network topologies or strict corporate restrictions/policies.

If you are in a corporate environment, and a browser fails to connect to a URL, this is most likely because the environment needs a proxy to be accessed.

Selenium WebDriver provides a way to proxy settings:

Move Code

import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class ProxyTest {
  public static void main(String[] args) {
    Proxy proxy = new Proxy();
    proxy.setHttpProxy("<HOST:PORT>");
    ChromeOptions options = new ChromeOptions();
    options.setCapability("proxy", proxy);
    WebDriver driver = new ChromeDriver(options);
    driver.get("https://www.google.com/");
    driver.manage().window().maximize();
    driver.quit();
  }
}
from selenium import webdriver

PROXY = "<HOST:PORT>"
webdriver.DesiredCapabilities.FIREFOX['proxy'] = {
"httpProxy": PROXY,
"ftpProxy": PROXY,
"sslProxy": PROXY,
"proxyType": "MANUAL",

}

with webdriver.Firefox() as driver:
    driver.get("https://selenium.dev")
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public class ProxyTest{
public static void Main() {
ChromeOptions options = new ChromeOptions();
Proxy proxy = new Proxy();
proxy.Kind = ProxyKind.Manual;
proxy.IsAutoDetect = false;
proxy.SslProxy = "<HOST:PORT>";
options.Proxy = proxy;
options.AddArgument("ignore-certificate-errors");
IWebDriver driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://www.selenium.dev/");
}
}
      options = Selenium::WebDriver::Options.chrome
      options.proxy = Selenium::WebDriver::Proxy.new(http: 'myproxy.com:8080')
let webdriver = require('selenium-webdriver');
let chrome = require('selenium-webdriver/chrome');
let proxy = require('selenium-webdriver/proxy');
let opts = new chrome.Options();

(async function example() {
opts.setProxy(proxy.manual({http: '<HOST:PORT>'}));
let driver = new webdriver.Builder()
.forBrowser('chrome')
.setChromeOptions(opts)
.build();
try {
await driver.get("https://selenium.dev");
}
finally {
await driver.quit();
}
}());
import org.openqa.selenium.Proxy
import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

class proxyTest {
fun main() {

        val proxy = Proxy()
        proxy.setHttpProxy("<HOST:PORT>")
        val options = ChromeOptions()
        options.setCapability("proxy", proxy)
        val driver: WebDriver = ChromeDriver(options)
        driver["https://www.google.com/"]
        driver.manage().window().maximize()
        driver.quit()
    }
}

2.2 - HTTP Client Configuration

These allow you to set various parameters for the HTTP library

2.3 - Driver Service Class

The Service classes are for managing the starting and stopping of local drivers. They cannot be used with a Remote WebDriver session.

Service classes allow you to specify information about the driver, like location and which port to use. They also let you specify what arguments get passed to the command line. Most of the useful arguments are related to logging.

Default Service instance

To start a driver with a default service instance:

    driver = new ChromeDriver(service);
  }

Note: Java Service classes only allow values to be set during construction with a Builder pattern.

Selenium v4.11

    service = webdriver.ChromeService()
    driver = webdriver.Chrome(service=service)

Note: Python Service classes only allow values to be set as arguments to the constructor.

            var service = ChromeDriverService.CreateDefaultService();
            driver = new ChromeDriver(service);

Note: .NET Service classes allow values to be set as properties.

    service = Selenium::WebDriver::Service.chrome
    @driver = Selenium::WebDriver.for :chrome, service: service

Note: Ruby Service classes allow values to be set either as arguments in the constructor or as attributes.

Driver location

Note: If you are using Selenium 4.6 or greater, you shouldn’t need to set a driver location. If you cannot update Selenium or have an advanced use case, here is how to specify the driver location:

        new ChromeDriverService.Builder().usingDriverExecutable(driverPath).build();

Selenium v4.11

    service = webdriver.ChromeService(executable_path=chromedriver_bin)

Selenium v4.9

            var service = ChromeDriverService.CreateDefaultService(GetDriverLocation(options));

Selenium v4.8

    service.executable_path = driver_path

Driver port

If you want the driver to run on a specific port, you may specify it as follows:

Logging

Logging functionality varies between browsers. Most browsers allow you to specify location and level of logs. Take a look at the respective browser page:

2.4 - Remote WebDriver

Selenium lets you automate browsers on remote computers if there is a Selenium Grid running on them. The computer that executes the code is referred to as the client computer, and the computer with the browser and driver is referred to as the remote computer or sometimes as an end-node. To direct Selenium tests to the remote computer, you need to use a Remote WebDriver class and pass the URL including the port of the grid on that machine. Please see the grid documentation for all the various ways the grid can be configured.

Basic Example

The driver needs to know where to send commands to and which browser to start on the Remote computer. So an address and an options instance are both required.

    ChromeOptions options = new ChromeOptions();
    driver = new RemoteWebDriver(gridUrl, options);
    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=server, options=options)
            var options = new ChromeOptions();
            driver = new RemoteWebDriver(GridUrl, options);
    options = Selenium::WebDriver::Options.chrome
    driver = Selenium::WebDriver.for :remote, url: grid_url, options: options

Uploads

Uploading a file is more complicated for Remote WebDriver sessions because the file you want to upload is likely on the computer executing the code, but the driver on the remote computer is looking for the provided path on its local file system. The solution is to use a Local File Detector. When one is set, Selenium will bundle the file, and send it to the remote machine, so the driver can see the reference to it. Some bindings include a basic local file detector by default, and all of them allow for a custom file detector.

Java does not include a Local File Detector by default, so you must always add one to do uploads.
    ((RemoteWebDriver) driver).setFileDetector(new LocalFileDetector());
    WebElement fileInput = driver.findElement(By.cssSelector("input[type=file]"));
    fileInput.sendKeys(uploadFile.getAbsolutePath());
    driver.findElement(By.id("file-submit")).click();

Python adds a local file detector to remote webdriver instances by default, but you can also create your own class.

    driver.file_detector = LocalFileDetector()
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys(upload_file)
    driver.find_element(By.ID, "file-submit").click()
.NET adds a local file detector to remote webdriver instances by default, but you can also create your own class.
            ((RemoteWebDriver)driver).FileDetector = new LocalFileDetector();
            IWebElement fileInput = driver.FindElement(By.CssSelector("input[type=file]"));
            fileInput.SendKeys(uploadFile);
            driver.FindElement(By.Id("file-submit")).Click();
Ruby adds a local file detector to remote webdriver instances by default, but you can also create your own lambda:
    driver.file_detector = ->((filename, *)) { filename.include?('selenium') && filename }
    file_input = driver.find_element(css: 'input[type=file]')
    file_input.send_keys(upload_file)
    driver.find_element(id: 'file-submit').click

Downloads

Chrome, Edge and Firefox each allow you to set the location of the download directory. When you do this on a remote computer, though, the location is on the remote computer’s local file system. Selenium allows you to enable downloads to get these files onto the client computer.

Enable Downloads in the Grid

Regardless of the client, when starting the grid in node or standalone mode, you must add the flag:

--enable-managed-downloads true

Enable Downloads in the Client

The grid uses the se:downloadsEnabled capability to toggle whether to be responsible for managing the browser location. Each of the bindings have a method in the options class to set this.

    ChromeOptions options = new ChromeOptions();
    options.setEnableDownloads(true);
    driver = new RemoteWebDriver(gridUrl, options);
    options = webdriver.ChromeOptions()
    options.enable_downloads = True
    driver = webdriver.Remote(command_executor=server, options=options)
            ChromeOptions options = new ChromeOptions
            {
                EnableDownloads = true
            };

            driver = new RemoteWebDriver(GridUrl, options);
    options = Selenium::WebDriver::Options.chrome(enable_downloads: true)
    driver = Selenium::WebDriver.for :remote, url: grid_url, options: options

List Downloadable Files

Be aware that Selenium is not waiting for files to finish downloading, so the list is an immediate snapshot of what file names are currently in the directory for the given session.

    List<String> files = ((HasDownloads) driver).getDownloadableFiles();
    files = driver.get_downloadable_files()
            IReadOnlyList<string> names = ((RemoteWebDriver)driver).GetDownloadableFiles();
    files = driver.downloadable_files

Download a File

Selenium looks for the name of the provided file in the list and downloads it to the provided target directory.

    ((HasDownloads) driver).downloadFile(downloadableFile, targetDirectory);
    driver.download_file(downloadable_file, target_directory)
    driver.download_file(downloadable_file, target_directory)

Delete Downloaded Files

By default, the download directory is deleted at the end of the applicable session, but you can also delete all files during the session.

    ((HasDownloads) driver).deleteDownloadableFiles();
    driver.delete_downloadable_files()
            ((RemoteWebDriver)driver).DeleteDownloadableFiles();
    driver.delete_downloadable_files

Browser specific functionalities

Each browser has implemented special functionality that is available only to that browser. Each of the Selenium bindings has implemented a different way to use those features in a Remote Session

Java requires you to use the Augmenter class, which allows it to automatically pull in implementations for all interfaces that match the capabilities used with the RemoteWebDriver

    driver = new Augmenter().augment(driver);

Of interest, using the RemoteWebDriverBuilder automatically augments the driver, so it is a great way to get all the functionality by default:

        RemoteWebDriver.builder()
            .address(gridUrl)
            .oneOf(new ChromeOptions())
            .setCapability("ext:options", Map.of("key", "value"))
            .config(ClientConfig.defaultConfig())
            .build();
.NET uses a custom command executor for executing commands that are valid for the given browser in the remote driver.
            var customCommandDriver = driver as ICustomDriverCommandExecutor;
            customCommandDriver.RegisterCustomDriverCommands(FirefoxDriver.CustomCommandDefinitions);

            var screenshotResponse = customCommandDriver
                .ExecuteCustomDriverCommand(FirefoxDriver.GetFullPageScreenshotCommand, null);
Ruby uses mixins to add applicable browser specific methods to the Remote WebDriver session; the methods should always just work for you.

Tracing client requests

This feature is only available for Java client binding (Beta onwards). The Remote WebDriver client sends requests to the Selenium Grid server, which passes them to the WebDriver. Tracing should be enabled at the server and client-side to trace the HTTP requests end-to-end. Both ends should have a trace exporter setup pointing to the visualization framework. By default, tracing is enabled for both client and server. To set up the visualization framework Jaeger UI and Selenium Grid 4, please refer to Tracing Setup for the desired version.

For client-side setup, follow the steps below.

Add the required dependencies

Installation of external libraries for tracing exporter can be done using Maven. Add the opentelemetry-exporter-jaeger and grpc-netty dependency in your project pom.xml:

  <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-exporter-jaeger</artifactId>
      <version>1.0.0</version>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-netty</artifactId>
      <version>1.35.0</version>
    </dependency>

Add/pass the required system properties while running the client

System.setProperty("otel.traces.exporter", "jaeger");
System.setProperty("otel.exporter.jaeger.endpoint", "http://localhost:14250");
System.setProperty("otel.resource.attributes", "service.name=selenium-java-client");

ImmutableCapabilities capabilities = new ImmutableCapabilities("browserName", "chrome");

WebDriver driver = new RemoteWebDriver(new URL("http://www.example.com"), capabilities);

driver.get("http://www.google.com");

driver.quit();

  

Please refer to Tracing Setup for more information on external dependencies versions required for the desired Selenium version.

More information can be found at:

3 - Supported Browsers

Each browser has custom capabilities and unique features.

3.1 - Chrome specific functionality

These are capabilities and features specific to Google Chrome browsers.

By default, Selenium 4 is compatible with Chrome v75 and greater. Note that the version of the Chrome browser and the version of chromedriver must match the major version.

Options

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Chrome and Chromium are documented at Google’s page for Capabilities & ChromeOptions

Starting a Chrome session with basic defined options looks like this:

    driver = new ChromeDriver(options);
  }
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(options=options)
            var options = new ChromeOptions();
            driver = new ChromeDriver(options);
      options = Selenium::WebDriver::Options.chrome
      @driver = Selenium::WebDriver.for :chrome, options: options
      const Options = new Chrome.Options();
      let driver = await env
        .builder()
        .setChromeOptions(Options)
        .build();

Arguments

The args parameter is for a list of command line switches to be used when starting the browser. There are two excellent resources for investigating these arguments:

Commonly used args include --start-maximized, --headless=new and --user-data-dir=...

Add an argument to options:

    options.add_argument("--start-maximized")
            options.AddArgument("--start-maximized");
      options.args << '--start-maximized'
      let driver = await env
        .builder()
        .setChromeOptions(options.addArguments('--headless=new'))
        .build();

Start browser in a specified location

The binary parameter takes the path of an alternate location of browser to use. With this parameter you can use chromedriver to drive various Chromium based browsers.

Add a browser location to options:

    options.binary_location = chrome_bin
            options.BinaryLocation = GetChromeLocation();
      options.binary = chrome_location
      let driver = await env
        .builder()
        .setChromeOptions(options.setChromeBinaryPath(`Path to chrome binary`))
        .build();

Add extensions

The extensions parameter accepts crx files. As for unpacked directories, please use the load-extension argument instead, as mentioned in this post.

Add an extension to options:

    options.add_extension(extension_file_path)
            options.AddExtension(extensionFilePath);
      options.add_extension(extension_file_path)
      const options = new Chrome.Options();
      let driver = await env
        .builder()
        .setChromeOptions(options.addExtensions(['./test/resources/extensions/webextensions-selenium-example.crx']))
        .build();

Keeping browser open

Setting the detach parameter to true will keep the browser open after the process has ended, so long as the quit command is not sent to the driver.

Note: This is already the default behavior in Java.

    options.add_experimental_option("detach", True)

Note: This is already the default behavior in .NET.

      options.detach = true
      let driver = await env
        .builder()
        .setChromeOptions(options.detachDriver(true))
        .build();

Excluding arguments

Chromedriver has several default arguments it uses to start the browser. If you do not want those arguments added, pass them into excludeSwitches. A common example is to turn the popup blocker back on. A full list of default arguments can be parsed from the Chromium Source Code

Set excluded arguments on options:

    options.add_experimental_option('excludeSwitches', ['disable-popup-blocking'])
            options.AddExcludedArgument("disable-popup-blocking");
      options.exclude_switches << 'disable-popup-blocking'
      let driver = await env
        .builder()
        .setChromeOptions(options.excludeSwitches('enable-automation'))
        .build();

Service

Examples for creating a default Service object, and for setting driver location and port can be found on the Driver Service page.

Log output

Getting driver logs can be helpful for debugging issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

        new ChromeDriverService.Builder().withLogFile(logLocation).build();

Note: Java also allows setting file output by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

Selenium v4.11

    service = webdriver.ChromeService(log_output=log_path)
            service.LogPath = GetLogLocation();

Console output

To change the logging output to display in the console as STDOUT:

Selenium v4.10

        new ChromeDriverService.Builder().withLogOutput(System.out).build();

Note: Java also allows setting console output by System Property;
Property key: ChromeDriverService.CHROME_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

Selenium v4.11

    service = webdriver.ChromeService(log_output=subprocess.STDOUT)

$stdout and $stderr are both valid values

Selenium v4.10

      service.log = $stdout

Log level

There are 6 available log levels: ALL, DEBUG, INFO, WARNING, SEVERE, and OFF. Note that --verbose is equivalent to --log-level=ALL and --silent is equivalent to --log-level=OFF, so this example is just setting the log level generically:

Selenium v4.8

        new ChromeDriverService.Builder().withLogLevel(ChromiumDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of ChromiumDriverLogLevel enum

Selenium v4.11

    service = webdriver.ChromeService(service_args=['--log-level=DEBUG'], log_output=subprocess.STDOUT)

Selenium v4.10

      service.args << '--log-level=DEBUG'

Log file features

There are 2 features that are only available when logging to a file:

  • append log
  • readable timestamps

To use them, you need to also explicitly specify the log path and log level. The log output will be managed by the driver, not the process, so minor differences may be seen.

Selenium v4.8

        new ChromeDriverService.Builder().withAppendLog(true).withReadableTimestamp(true).build();

Note: Java also allows toggling these features by System Property:
Property keys: ChromeDriverService.CHROME_DRIVER_APPEND_LOG_PROPERTY and ChromeDriverService.CHROME_DRIVER_READABLE_TIMESTAMP
Property value: "true" or "false"

    service = webdriver.ChromeService(service_args=['--append-log', '--readable-timestamp'], log_output=log_path)

Selenium v4.8

      service.args << '--append-log'
      service.args << '--readable-timestamp'

Disabling build check

Chromedriver and Chrome browser versions should match, and if they don’t the driver will error. If you disable the build check, you can force the driver to be used with any version of Chrome. Note that this is an unsupported feature, and bugs will not be investigated.

Selenium v4.8

        new ChromeDriverService.Builder().withBuildCheckDisabled(true).build();

Note: Java also allows disabling build checks by System Property:
Property key: ChromeDriverService.CHROME_DRIVER_DISABLE_BUILD_CHECK
Property value: "true" or "false"

Selenium v4.11

    service = webdriver.ChromeService(service_args=['--disable-build-check'], log_output=subprocess.STDOUT)
            service.DisableBuildCheck = true;

Selenium v4.8

      service.args << '--disable-build-check'

Special Features

Some browsers have implemented additional features that are unique to them.

Casting

You can drive Chrome Cast devices, including sharing tabs

Network conditions

You can simulate various network conditions.

Logs

Permissions

DevTools

See the Chrome DevTools section for more information about using Chrome DevTools

3.2 - Edge specific functionality

These are capabilities and features specific to Microsoft Edge browsers.

Microsoft Edge is implemented with Chromium, with the earliest supported version of v79. Similar to Chrome, the major version number of edgedriver must match the major version of the Edge browser.

Options

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Chromium are documented at Google’s page for Capabilities & ChromeOptions

Starting an Edge session with basic defined options looks like this:

    driver = new EdgeDriver(options);
  }
    options = webdriver.EdgeOptions()
    driver = webdriver.Edge(options=options)
            var options = new EdgeOptions();
            driver = new EdgeDriver(options);
      options = Selenium::WebDriver::Options.edge
      @driver = Selenium::WebDriver.for :edge, options: options
      driver = await env.builder()
        .setEdgeOptions(options)
        .build();
    });

Arguments

The args parameter is for a list of command line switches to be used when starting the browser. There are two excellent resources for investigating these arguments:

Commonly used args include --start-maximized, --headless=new and --user-data-dir=...

Add an argument to options:

    options.add_argument("--start-maximized")
            options.AddArgument("--start-maximized");
      options.args << '--start-maximized'
        .setEdgeOptions(options.addArguments('--headless=new'))

Start browser in a specified location

The binary parameter takes the path of an alternate location of browser to use. With this parameter you can use chromedriver to drive various Chromium based browsers.

Add a browser location to options:

    options.binary_location = edge_bin
            options.BinaryLocation = GetEdgeLocation();
      options.binary = edge_location

Add extensions

The extensions parameter accepts crx files. As for unpacked directories, please use the load-extension argument instead, as mentioned in this post.

Add an extension to options:

    options.add_extension(extension_file_path)
            options.AddExtension(extensionFilePath);
      options.add_extension(extension_file_path)

Keeping browser open

Setting the detach parameter to true will keep the browser open after the process has ended, so long as the quit command is not sent to the driver.

Note: This is already the default behavior in Java.

    options.add_experimental_option("detach", True)

Note: This is already the default behavior in .NET.

      options.detach = true

Excluding arguments

MSEdgedriver has several default arguments it uses to start the browser. If you do not want those arguments added, pass them into excludeSwitches. A common example is to turn the popup blocker back on. A full list of default arguments can be parsed from the Chromium Source Code

Set excluded arguments on options:

    options.add_experimental_option('excludeSwitches', ['disable-popup-blocking'])
            options.AddExcludedArgument("disable-popup-blocking");
      options.exclude_switches << 'disable-popup-blocking'

Service

Examples for creating a default Service object, and for setting driver location and port can be found on the Driver Service page.

Log output

Getting driver logs can be helpful for debugging issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

Selenium v4.10

Note: Java also allows setting file output by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

    service = webdriver.EdgeService(log_output=log_path)
            service.LogPath = GetLogLocation();

Console output

To change the logging output to display in the console as STDOUT:

Selenium v4.10

Note: Java also allows setting console output by System Property;
Property key: EdgeDriverService.EDGE_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

$stdout and $stderr are both valid values

Selenium v4.10

      service.log = $stdout

Log level

There are 6 available log levels: ALL, DEBUG, INFO, WARNING, SEVERE, and OFF. Note that --verbose is equivalent to --log-level=ALL and --silent is equivalent to --log-level=OFF, so this example is just setting the log level generically:

Selenium v4.8

        new EdgeDriverService.Builder().withLoglevel(ChromiumDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of ChromiumDriverLogLevel enum

    service = webdriver.EdgeService(service_args=['--log-level=DEBUG'], log_output=log_path)

Selenium v4.10

      service.args << '--log-level=DEBUG'

Log file features

There are 2 features that are only available when logging to a file:

  • append log
  • readable timestamps

To use them, you need to also explicitly specify the log path and log level. The log output will be managed by the driver, not the process, so minor differences may be seen.

Selenium v4.8

        new EdgeDriverService.Builder().withAppendLog(true).withReadableTimestamp(true).build();

Note: Java also allows toggling these features by System Property:
Property keys: EdgeDriverService.EDGE_DRIVER_APPEND_LOG_PROPERTY and EdgeDriverService.EDGE_DRIVER_READABLE_TIMESTAMP
Property value: "true" or "false"

    service = webdriver.EdgeService(service_args=['--append-log', '--readable-timestamp'], log_output=log_path)

Selenium v4.8

      service.args << '--append-log'
      service.args << '--readable-timestamp'

Disabling build check

Edge browser and msedgedriver versions should match, and if they don’t the driver will error. If you disable the build check, you can force the driver to be used with any version of Edge. Note that this is an unsupported feature, and bugs will not be investigated.

Selenium v4.8

        new EdgeDriverService.Builder().withBuildCheckDisabled(true).build();

Note: Java also allows disabling build checks by System Property:
Property key: EdgeDriverService.EDGE_DRIVER_DISABLE_BUILD_CHECK
Property value: "true" or "false"

    service = webdriver.EdgeService(service_args=['--disable-build-check'], log_output=log_path)
            service.DisableBuildCheck = true;

Selenium v4.8

      service.args << '--disable-build-check'

Internet Explorer Mode

Microsoft Edge can be driven in “Internet Explorer Compatibility Mode”, which uses the Internet Explorer Driver classes in conjunction with Microsoft Edge. Read the Internet Explorer page for more details.

Special Features

Some browsers have implemented additional features that are unique to them.

Casting

You can drive Chrome Cast devices with Edge, including sharing tabs

Network conditions

You can simulate various network conditions.

Logs

Permissions

DevTools

See the Chrome DevTools section for more information about using DevTools in Edge

3.3 - Firefox specific functionality

These are capabilities and features specific to Mozilla Firefox browsers.

Selenium 4 requires Firefox 78 or greater. It is recommended to always use the latest version of geckodriver.

Options

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Firefox can be found at Mozilla’s page for firefoxOptions

Starting a Firefox session with basic defined options looks like this:

    driver = new FirefoxDriver(options);
  }
    options = webdriver.FirefoxOptions()
    driver = webdriver.Firefox(options=options)
            var options = new FirefoxOptions();
            driver = new FirefoxDriver(options);
      options = Selenium::WebDriver::Options.firefox
      @driver = Selenium::WebDriver.for :firefox, options: options
      let options = new firefox.Options();
      driver = await env.builder()
        .setFirefoxOptions(options)
        .build();

Arguments

The args parameter is for a list of Command line switches used when starting the browser.
Commonly used args include -headless and "-profile", "/path/to/profile"

Add an argument to options:

    options.add_argument("-headless")
            options.AddArgument("-headless");
      options.args << '-headless'
    let driver = await env.builder()
      .setFirefoxOptions(options.addArguments('--headless'))
      .build();

Start browser in a specified location

The binary parameter takes the path of an alternate location of browser to use. For example, with this parameter you can use geckodriver to drive Firefox Nightly instead of the production version when both are present on your computer.

Add a browser location to options:

    options.binary_location = firefox_bin
            options.BinaryLocation = GetFirefoxLocation();
      options.binary = firefox_location

Profiles

There are several ways to work with Firefox profiles.

Move Code

FirefoxProfile profile = new FirefoxProfile();
FirefoxOptions options = new FirefoxOptions();
options.setProfile(profile);
driver = new FirefoxDriver(options);
  
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
options=Options()
firefox_profile = FirefoxProfile()
firefox_profile.set_preference("javascript.enabled", False)
options.profile = firefox_profile
  
var options = new FirefoxOptions();
var profile = new FirefoxProfile();
options.Profile = profile;
var driver = new FirefoxDriver(options);
  
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.dir'] = "/tmp/webdriver-downloads"
options = Selenium::WebDriver::Firefox::Options.new(profile: profile)
driver = Selenium::WebDriver.for :firefox, options: options
  
const { Builder } = require("selenium-webdriver");
const firefox = require('selenium-webdriver/firefox');

const options = new firefox.Options();
let profile = '/path to custom profile';
options.setProfile(profile);
const driver = new Builder()
    .forBrowser('firefox')
    .setFirefoxOptions(options)
    .build();
  
val options = FirefoxOptions()
options.profile = FirefoxProfile()
driver = FirefoxDriver(options)
  

Service

Service settings common to all browsers are described on the Service page.

Log output

Getting driver logs can be helpful for debugging various issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

        new GeckoDriverService.Builder().withLogFile(logLocation).build();

Note: Java also allows setting file output by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_PROPERTY
Property value: String representing path to log file

Selenium v4.11

    service = webdriver.FirefoxService(log_output=log_path, service_args=['--log', 'debug'])

Console output

To change the logging output to display in the console:

Selenium v4.10

        new GeckoDriverService.Builder().withLogOutput(System.out).build();

Note: Java also allows setting console output by System Property;
Property key: GeckoDriverService.GECKO_DRIVER_LOG_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

Selenium v4.11

    service = webdriver.FirefoxService(log_output=subprocess.STDOUT)

Log level

There are 7 available log levels: fatal, error, warn, info, config, debug, trace. If logging is specified the level defaults to info.

Note that -v is equivalent to -log debug and -vv is equivalent to log trace, so this examples is just for setting the log level generically:

Selenium v4.10

        new GeckoDriverService.Builder().withLogLevel(FirefoxDriverLogLevel.DEBUG).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_LEVEL_PROPERTY
Property value: String representation of FirefoxDriverLogLevel enum

Selenium v4.11

    service = webdriver.FirefoxService(log_output=log_path, service_args=['--log', 'debug'])

Selenium v4.10

      service.args += %w[--log debug]

Truncated Logs

The driver logs everything that gets sent to it, including string representations of large binaries, so Firefox truncates lines by default. To turn off truncation:

Selenium v4.10

        new GeckoDriverService.Builder().withTruncatedLogs(false).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_LOG_NO_TRUNCATE
Property value: "true" or "false"

Selenium v4.11

    service = webdriver.FirefoxService(service_args=['--log-no-truncate', '--log', 'debug'], log_output=log_path)

Selenium v4.10

      service.args << '--log-no-truncate'

Profile Root

The default directory for profiles is the system temporary directory. If you do not have access to that directory, or want profiles to be created some place specific, you can change the profile root directory:

Selenium v4.10

        new GeckoDriverService.Builder().withProfileRoot(profileDirectory).build();

Note: Java also allows setting log level by System Property:
Property key: GeckoDriverService.GECKO_DRIVER_PROFILE_ROOT
Property value: String representing path to profile root directory

    service = webdriver.FirefoxService(service_args=['--profile-root', temp_dir])

Selenium v4.8

      service.args += ['--profile-root', root_directory]

Special Features

Some browsers have implemented additional features that are unique to them.

Add-ons

Unlike Chrome, Firefox extensions are not added as part of capabilities as mentioned in this issue, they are created after starting the driver.

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

Installation

A signed xpi file you would get from Mozilla Addon page

    driver.install_addon(addon_path_xpi)
            driver.InstallAddOnFromFile(Path.GetFullPath(extensionFilePath));
      driver.install_addon(extension_file_path)
    const xpiPath = path.resolve('./test/resources/extensions/selenium-example.xpi')
    let driver = await env.builder().build();
    let id = await driver.installAddon(xpiPath);

Uninstallation

Uninstalling an addon requires knowing its id. The id can be obtained from the return value when installing the add-on.

    driver.uninstall_addon(id)

Selenium v4.5

            driver.UninstallAddOn(extensionId);
      driver.uninstall_addon(extension_id)
    await driver.uninstallAddon(id);

Unsigned installation

When working with an unfinished or unpublished extension, it will likely not be signed. As such, it can only be installed as “temporary.” This can be done by passing in either a zip file or a directory, here’s an example with a directory:

    driver.install_addon(addon_path_dir, temporary=True)

Selenium v4.5

            driver.InstallAddOnFromDirectory(Path.GetFullPath(extensionDirPath), true);

Selenium v4.5

      driver.install_addon(extension_dir_path, true)
    const xpiPath = path.resolve('./test/resources/extensions/selenium-example')
    let driver = await env.builder().build();
    let id = await driver.installAddon(xpiPath, true);

Full page screenshots

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

Context

The following examples are for local webdrivers. For remote webdrivers, please refer to the Remote WebDriver page.

3.4 - IE specific functionality

These are capabilities and features specific to Microsoft Internet Explorer browsers.

As of June 2022, Selenium officially no longer supports standalone Internet Explorer. The Internet Explorer driver still supports running Microsoft Edge in “IE Compatibility Mode.”

Special considerations

The IE Driver is the only driver maintained by the Selenium Project directly. While binaries for both the 32-bit and 64-bit versions of Internet Explorer are available, there are some known limitations with the 64-bit driver. As such it is recommended to use the 32-bit driver.

Additional information about using Internet Explorer can be found on the IE Driver Server page

Options

Starting a Microsoft Edge browser in Internet Explorer Compatibility mode with basic defined options looks like this:

        InternetExplorerOptions options = new InternetExplorerOptions();
        options.attachToEdgeChrome();
        options.withEdgeExecutablePath(getEdgeLocation());
        driver = new InternetExplorerDriver(options);
    options = webdriver.IeOptions()
    options.attach_to_edge_chrome = True
    options.edge_executable_path = edge_bin
    driver = webdriver.Ie(options=options)
            var options = new InternetExplorerOptions();
            options.AttachToEdgeChrome = true;
            options.EdgeExecutablePath = GetEdgeLocation();
            driver = new InternetExplorerDriver(options);
      options = Selenium::WebDriver::Options.ie
      options.attach_to_edge_chrome = true
      options.edge_executable_path = edge_location
      @driver = Selenium::WebDriver.for :ie, options: options

As of Internet Explorer Driver v4.5.0:

  • If IE is not present on the system (default in Windows 11), you do not need to use the two parameters above. IE Driver will use Edge and will automatically locate it.
  • If IE and Edge are both present on the system, you only need to set attaching to Edge, IE Driver will automatically locate Edge on your system.

So, if IE is not on the system, you only need:

Move Code

        InternetExplorerOptions options = new InternetExplorerOptions();
        driver = new InternetExplorerDriver(options);
    options = webdriver.IeOptions()
    driver = webdriver.Ie(options=options)
            var options = new InternetExplorerOptions();
            driver = new InternetExplorerDriver(options);
      options = Selenium::WebDriver::Options.ie
      @driver = Selenium::WebDriver.for :ie, options: options
let driver = await new Builder()
.forBrowser('internet explorer')
.setIEOptions(options)
.build();
val options = InternetExplorerOptions()
val driver = InternetExplorerDriver(options)

Here are a few common use cases with different capabilities:

fileUploadDialogTimeout

In some environments, Internet Explorer may timeout when opening the File Upload dialog. IEDriver has a default timeout of 1000ms, but you can increase the timeout using the fileUploadDialogTimeout capability.

Move Code

InternetExplorerOptions options = new InternetExplorerOptions();
options.waitForUploadDialogUpTo(Duration.ofSeconds(2));
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.file_upload_dialog_timeout = 2000
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.FileUploadDialogTimeout = TimeSpan.FromMilliseconds(2000);
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.file_upload_dialog_timeout = 2000
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().fileUploadDialogTimeout(2000);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.waitForUploadDialogUpTo(Duration.ofSeconds(2))
val driver = RemoteWebDriver(options)
  

ensureCleanSession

When set to true, this capability clears the Cache, Browser History and Cookies for all running instances of InternetExplorer including those started manually or by the driver. By default, it is set to false.

Using this capability will cause performance drop while launching the browser, as the driver will wait until the cache gets cleared before launching the IE browser.

This capability accepts a Boolean value as parameter.

Move Code

InternetExplorerOptions options = new InternetExplorerOptions();
options.destructivelyEnsureCleanSession();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ensure_clean_session = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.EnsureCleanSession = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ensure_clean_session = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().ensureCleanSession(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.destructivelyEnsureCleanSession()
val driver = RemoteWebDriver(options)
  

ignoreZoomSetting

InternetExplorer driver expects the browser zoom level to be 100%, else the driver will throw an exception. This default behaviour can be disabled by setting the ignoreZoomSetting to true.

This capability accepts a Boolean value as parameter.

Move Code

InternetExplorerOptions options = new InternetExplorerOptions();
options.ignoreZoomSettings();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ignore_zoom_level = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.IgnoreZoomLevel = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ignore_zoom_level = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().ignoreZoomSetting(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.ignoreZoomSettings()
val driver = RemoteWebDriver(options)
  

ignoreProtectedModeSettings

Whether to skip the Protected Mode check while launching a new IE session.

If not set and Protected Mode settings are not same for all zones, an exception will be thrown by the driver.

If capability is set to true, tests may become flaky, unresponsive, or browsers may hang. However, this is still by far a second-best choice, and the first choice should always be to actually set the Protected Mode settings of each zone manually. If a user is using this property, only a “best effort” at support will be given.

This capability accepts a Boolean value as parameter.

Move Code

InternetExplorerOptions options = new InternetExplorerOptions();
options.introduceFlakinessByIgnoringSecurityDomains();
WebDriver driver = new RemoteWebDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.ignore_protected_mode_settings = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
var options = new InternetExplorerOptions();
options.IntroduceInstabilityByIgnoringProtectedModeSettings = true;
var driver = new RemoteWebDriver(options);
  
options = Selenium::WebDriver::IE::Options.new
options.ignore_protected_mode_settings = true
driver = Selenium::WebDriver.for(:ie, options: options)
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options().introduceFlakinessByIgnoringProtectedModeSettings(true);
let driver = await Builder()
          .setIeOptions(options)
          .build(); 
  
val options = InternetExplorerOptions()
options.introduceFlakinessByIgnoringSecurityDomains()
val driver = RemoteWebDriver(options)
  

silent

When set to true, this capability suppresses the diagnostic output of the IEDriverServer.

This capability accepts a Boolean value as parameter.

Move Code

InternetExplorerOptions options = new InternetExplorerOptions();
options.setCapability("silent", true);
WebDriver driver = new InternetExplorerDriver(options);
  
from selenium import webdriver

options = webdriver.IeOptions()
options.set_capability("silent", True)
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
InternetExplorerOptions options = new InternetExplorerOptions();
options.AddAdditionalInternetExplorerOption("silent", true);
IWebDriver driver = new InternetExplorerDriver(options);
  
    
    
    
    
    <p><a href=/documentation/about/contributing/#creating-examples>
    <span class="selenium-badge-code" data-bs-toggle="tooltip" data-bs-placement="right"
          title="This code example is missing. Examples are added to the examples directory; click for details in the contribution guide">Add Example</span></a></p>
    

  
const {Builder,By, Capabilities} = require('selenium-webdriver');
let caps = Capabilities.ie();
caps.set('silent', true);

(async function example() {
    let driver = await new Builder()
        .forBrowser('internet explorer')
        .withCapabilities(caps)
        .build();
    try {
        await driver.get('http://www.google.com/ncr');
    }
    finally {
        await driver.quit();
    }
})();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.setCapability("silent", true)
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

Command-Line Options

Internet Explorer includes several command-line options that enable you to troubleshoot and configure the browser.

The following describes few supported command-line options

  • -private : Used to start IE in private browsing mode. This works for IE 8 and later versions.

  • -k : Starts Internet Explorer in kiosk mode. The browser opens in a maximized window that does not display the address bar, the navigation buttons, or the status bar.

  • -extoff : Starts IE in no add-on mode. This option specifically used to troubleshoot problems with browser add-ons. Works in IE 7 and later versions.

Note: forceCreateProcessApi should to enabled in-order for command line arguments to work.

Move Code

import org.openqa.selenium.Capabilities;
import org.openqa.selenium.ie.InternetExplorerDriver;
import org.openqa.selenium.ie.InternetExplorerOptions;

public class ieTest {
    public static void main(String[] args) {
        InternetExplorerOptions options = new InternetExplorerOptions();
        options.useCreateProcessApiToLaunchIe();
        options.addCommandSwitches("-k");
        InternetExplorerDriver driver = new InternetExplorerDriver(options);
        try {
            driver.get("https://google.com/ncr");
            Capabilities caps = driver.getCapabilities();
            System.out.println(caps);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

options = webdriver.IeOptions()
options.add_argument('-private')
options.force_create_process_api = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

namespace ieTest {
 class Program {
  static void Main(string[] args) {
   InternetExplorerOptions options = new InternetExplorerOptions();
   options.ForceCreateProcessApi = true;
   options.BrowserCommandLineArguments = "-k";
   IWebDriver driver = new InternetExplorerDriver(options);
   driver.Url = "https://google.com/ncr";
  }
 }
}
  
require 'selenium-webdriver'
options = Selenium::WebDriver::IE::Options.new
options.force_create_process_api = true
options.add_argument('-k')
driver = Selenium::WebDriver.for(:ie, options: options)

begin
  driver.get 'https://google.com'
  puts(driver.capabilities.to_json)
ensure
  driver.quit
end
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options();
options.addBrowserCommandSwitches('-k');
options.addBrowserCommandSwitches('-private');
options.forceCreateProcessApi(true);

driver = await env.builder()
          .setIeOptions(options)
          .build();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.useCreateProcessApiToLaunchIe()
    options.addCommandSwitches("-k")
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

forceCreateProcessApi

Forces launching Internet Explorer using the CreateProcess API. The default value is false.

For IE 8 and above, this option requires the “TabProcGrowth” registry value to be set to 0.

Move Code

import org.openqa.selenium.Capabilities;
import org.openqa.selenium.ie.InternetExplorerDriver;
import org.openqa.selenium.ie.InternetExplorerOptions;

public class ieTest {
    public static void main(String[] args) {
        InternetExplorerOptions options = new InternetExplorerOptions();
        options.useCreateProcessApiToLaunchIe();
        InternetExplorerDriver driver = new InternetExplorerDriver(options);
        try {
            driver.get("https://google.com/ncr");
            Capabilities caps = driver.getCapabilities();
            System.out.println(caps);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

options = webdriver.IeOptions()
options.force_create_process_api = True
driver = webdriver.Ie(options=options)

driver.get("http://www.google.com")

driver.quit()
  
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

namespace ieTest {
 class Program {
  static void Main(string[] args) {
   InternetExplorerOptions options = new InternetExplorerOptions();
   options.ForceCreateProcessApi = true;
   IWebDriver driver = new InternetExplorerDriver(options);
   driver.Url = "https://google.com/ncr";
  }
 }
}
  
require 'selenium-webdriver'
options = Selenium::WebDriver::IE::Options.new
options.force_create_process_api = true
driver = Selenium::WebDriver.for(:ie, options: options)

begin
  driver.get 'https://google.com'
  puts(driver.capabilities.to_json)
ensure
  driver.quit
end
  
const ie = require('selenium-webdriver/ie');
let options = new ie.Options();
options.forceCreateProcessApi(true);

driver = await env.builder()
          .setIeOptions(options)
          .build();
  
import org.openqa.selenium.Capabilities
import org.openqa.selenium.ie.InternetExplorerDriver
import org.openqa.selenium.ie.InternetExplorerOptions

fun main() {
    val options = InternetExplorerOptions()
    options.useCreateProcessApiToLaunchIe()
    val driver = InternetExplorerDriver(options)
    try {
        driver.get("https://google.com/ncr")
        val caps = driver.getCapabilities()
        println(caps)
    } finally {
        driver.quit()
    }
}
  

Service

Service settings common to all browsers are described on the Service page.

Log output

Getting driver logs can be helpful for debugging various issues. The Service class lets you direct where the logs will go. Logging output is ignored unless the user directs it somewhere.

File output

To change the logging output to save to a specific file:

Selenium v4.10

                .withLogFile(getLogLocation())

Note: Java also allows setting file output by System Property:
Property key: InternetExplorerDriverService.IE_DRIVER_LOGFILE_PROPERTY
Property value: String representing path to log file

    service = webdriver.IeService(log_output=log_path, log_level='INFO')

Console output

To change the logging output to display in the console as STDOUT:

Selenium v4.10

                .withLogOutput(System.out)

Note: Java also allows setting console output by System Property;
Property key: InternetExplorerDriverService.IE_DRIVER_LOGFILE_PROPERTY
Property value: DriverService.LOG_STDOUT or DriverService.LOG_STDERR

Selenium v4.11

    service = webdriver.IeService(log_output=subprocess.STDOUT)

Log Level

There are 6 available log levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE If logging output is specified, the default level is FATAL

                .withLogLevel(InternetExplorerDriverLogLevel.WARN)

Note: Java also allows setting log level by System Property:
Property key: InternetExplorerDriverService.IE_DRIVER_LOGLEVEL_PROPERTY
Property value: String representation of InternetExplorerDriverLogLevel.DEBUG.toString() enum

    service = webdriver.IeService(log_output=log_path, log_level='WARN')
            service.LoggingLevel = InternetExplorerDriverLogLevel.Warn;

Selenium v4.10

      service.args << '-log-level=WARN'

Supporting Files Path

                .withExtractPath(getTempDirectory())
**Note**: Java also allows setting log level by System Property:\ Property key: `InternetExplorerDriverService.IE_DRIVER_EXTRACT_PATH_PROPERTY`\ Property value: String representing path to supporting files directory

Selenium v4.11

    service = webdriver.IeService(service_args=["–extract-path="+temp_dir])
            service.LibraryExtractionPath = GetTempDirectory();

Selenium v4.8

      service.args << "–extract-path=#{root_directory}"

3.5 - Safari specific functionality

These are capabilities and features specific to Apple Safari browsers.

Unlike Chromium and Firefox drivers, the safaridriver is installed with the Operating System. To enable automation on Safari, run the following command from the terminal:

safaridriver --enable

Options

Capabilities common to all browsers are described on the Options page.

Capabilities unique to Safari can be found at Apple’s page About WebDriver for Safari

Starting a Safari session with basic defined options looks like this:

Move Code

        SafariOptions options = new SafariOptions();
        driver = new SafariDriver(options);
    options = webdriver.SafariOptions()
    driver = webdriver.Safari(options=options)
            var options = new SafariOptions();
            driver = new SafariDriver(options);
      options = Selenium::WebDriver::Options.safari
      @driver = Selenium::WebDriver.for :safari, options: options
      let driver = await env.builder()
      .setSafariOptions(options)
      .build();
  val options = SafariOptions()
  val driver = SafariDriver(options)

Mobile

Those looking to automate Safari on iOS should look to the Appium project.

Service

Service settings common to all browsers are described on the Service page.

Logging

Unlike other browsers, Safari doesn’t let you choose where logs are output, or change levels. The one option available is to turn logs off or on. If logs are toggled on, they can be found at:~/Library/Logs/com.apple.WebDriver/.

Selenium v4.10

                .withLogging(true)

Note: Java also allows setting console output by System Property;
Property key: SafariDriverService.SAFARI_DRIVER_LOGGING
Property value: "true" or "false"

    service = webdriver.SafariService(service_args=["--diagnose"])

Selenium v4.8

      service.args << '--diagnose'

Safari Technology Preview

Apple provides a development version of their browser — Safari Technology Preview

4 - Waiting Strategies

Perhaps the most common challenge for browser automation is ensuring that the web application is in a state to execute a particular Selenium command as desired. The processes often end up in a race condition where sometimes the browser gets into the right state first (things work as intended) and sometimes the Selenium code executes first (things do not work as intended). This is one of the primary causes of flaky tests.

All navigation commands wait for a specific readyState value based on the page load strategy (the default value to wait for is "complete") before the driver returns control to the code. The readyState only concerns itself with loading assets defined in the HTML, but loaded JavaScript assets often result in changes to the site, and elements that need to be interacted with may not yet be on the page when the code is ready to execute the next Selenium command.

Similarly, in a lot of single page applications, elements get dynamically added to a page or change visibility based on a click. An element must be both present and displayed on the page in order for Selenium to interact with it.

Take this page for example: https://www.selenium.dev/selenium/web/dynamic.html When the “Add a box!” button is clicked, a “div” element that does not exist is created. When the “Reveal a new input” button is clicked, a hidden text field element is displayed. In both cases the transition takes a couple seconds. If the Selenium code is to click one of these buttons and interact with the resulting element, it will do so before that element is ready and fail.

The first solution many people turn to is adding a sleep statement to pause the code execution for a set period of time. Because the code can’t know exactly how long it needs to wait, this can fail when it doesn’t sleep long enough. Alternately, if the value is set too high and a sleep statement is added in every place it is needed, the duration of the session can become prohibitive.

Selenium provides two different mechanisms for synchronization that are better.

Implicit waits

Selenium has a built-in way to automatically wait for elements called an implicit wait. An implicit wait value can be set either with the timeouts capability in the browser options, or with a driver method (as shown below).

This is a global setting that applies to every element location call for the entire session. The default value is 0, which means that if the element is not found, it will immediately return an error. If an implicit wait is set, the driver will wait for the duration of the provided value before returning the error. Note that as soon as the element is located, the driver will return the element reference and the code will continue executing, so a larger implicit wait value won’t necessarily increase the duration of the session.

Warning: Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. For example, setting an implicit wait of 10 seconds and an explicit wait of 15 seconds could cause a timeout to occur after 20 seconds.

Solving our example with an implicit wait looks like this:

    driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(2));
    driver.implicitly_wait(2)
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(2);
    driver.manage.timeouts.implicit_wait = 2
            await driver.manage().setTimeouts({ implicit: 2000 });
            await driver.get('https://www.selenium.dev/selenium/web/dynamic.html');
            await driver.findElement(By.id("adder")).click();

            let added = await driver.findElement(By.id("box0"));

Explicit waits

Explicit waits are loops added to the code that poll the application for a specific condition to evaluate as true before it exits the loop and continues to the next command in the code. If the condition is not met before a designated timeout value, the code will give a timeout error. Since there are many ways for the application not to be in the desired state, explicit waits are a great choice to specify the exact condition to wait for in each place it is needed. Another nice feature is that, by default, the Selenium Wait class automatically waits for the designated element to exist.

This example shows the condition being waited for as a lambda. Java also supports Expected Conditions

    Wait<WebDriver> wait = new WebDriverWait(driver, Duration.ofSeconds(2));
    wait.until(d -> revealed.isDisplayed());

This example shows the condition being waited for as a lambda. Python also supports Expected Conditions

    wait = WebDriverWait(driver, timeout=2)
    wait.until(lambda d : revealed.is_displayed())
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(2));
            wait.Until(d => revealed.Displayed);
    wait = Selenium::WebDriver::Wait.new
    wait.until { revealed.displayed? }

JavaScript also supports Expected Conditions

            await driver.get('https://www.selenium.dev/selenium/web/dynamic.html');
            let revealed = await driver.findElement(By.id("revealed"));
            await driver.findElement(By.id("reveal")).click();
            await driver.wait(until.elementIsVisible(revealed), 2000);
            await revealed.sendKeys("Displayed");

Customization

The Wait class can be instantiated with various parameters that will change how the conditions are evaluated.

This can include:

  • Changing how often the code is evaluated (polling interval)
  • Specifying which exceptions should be handled automatically
  • Changing the total timeout length
  • Customizing the timeout message

For instance, if the element not interactable error is retried by default, then we can add an action on a method inside the code getting executed (we just need to make sure that the code returns true when it is successful):

The easiest way to customize Waits in Java is to use the FluentWait class:

    Wait<WebDriver> wait =
        new FluentWait<>(driver)
            .withTimeout(Duration.ofSeconds(2))
            .pollingEvery(Duration.ofMillis(300))
            .ignoring(ElementNotInteractableException.class);

    wait.until(
        d -> {
          revealed.sendKeys("Displayed");
          return true;
        });
    errors = [NoSuchElementException, ElementNotInteractableException]
    wait = WebDriverWait(driver, timeout=2, poll_frequency=.2, ignored_exceptions=errors)
    wait.until(lambda d : revealed.send_keys("Displayed") or True)
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(2))
            {
                PollingInterval = TimeSpan.FromMilliseconds(300),
            };
            wait.IgnoreExceptionTypes(typeof(ElementNotInteractableException));

            wait.Until(d => {
                revealed.SendKeys("Displayed");
                return true;
            });
    errors = [Selenium::WebDriver::Error::NoSuchElementError,
              Selenium::WebDriver::Error::ElementNotInteractableError]
    wait = Selenium::WebDriver::Wait.new(timeout: 2,
                                         interval: 0.3,
                                         ignore: errors)

    wait.until { revealed.send_keys('Displayed') || true }

5 - Web elements

Identifying and working with element objects in the DOM.

The majority of most people’s Selenium code involves working with web elements.

5.1 - File Upload

Because Selenium cannot interact with the file upload dialog, it provides a way to upload files without opening the dialog. If the element is an input element with type file, you can use the send keys method to send the full path to the file that will be uploaded.

    WebElement fileInput = driver.findElement(By.cssSelector("input[type=file]"));
    fileInput.sendKeys(uploadFile.getAbsolutePath());
    driver.findElement(By.id("file-submit")).click();
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys(upload_file)
    driver.find_element(By.ID, "file-submit").click()
            IWebElement fileInput = driver.FindElement(By.CssSelector("input[type=file]"));
            fileInput.SendKeys(uploadFile);
            driver.FindElement(By.Id("file-submit")).Click();
    file_input = driver.find_element(css: 'input[type=file]')
    file_input.send_keys(upload_file)
    driver.find_element(id: 'file-submit').click
      await driver.findElement(By.id("file-upload")).sendKeys(image);
      await driver.findElement(By.id("file-submit")).submit();

Move Code

```java import org.openqa.selenium.By import org.openqa.selenium.chrome.ChromeDriver fun main() { val driver = ChromeDriver() driver.get("https://the-internet.herokuapp.com/upload") driver.findElement(By.id("file-upload")).sendKeys("selenium-snapshot.jpg") driver.findElement(By.id("file-submit")).submit() if(driver.pageSource.contains("File Uploaded!")) { println("file uploaded") } else{ println("file not uploaded") } } ```

5.2 - Locator strategies

Ways to identify one or more specific elements in the DOM.

A locator is a way to identify elements on a page. It is the argument passed to the Finding element methods.

Check out our encouraged test practices for tips on locators, including which to use when and why to declare locators separately from the finding methods.

Traditional Locators

Selenium provides support for these 8 traditional location strategies in WebDriver:

LocatorDescription
class nameLocates elements whose class name contains the search value (compound class names are not permitted)
css selectorLocates elements matching a CSS selector
idLocates elements whose ID attribute matches the search value
nameLocates elements whose NAME attribute matches the search value
link textLocates anchor elements whose visible text matches the search value
partial link textLocates anchor elements whose visible text contains the search value. If multiple elements are matching, only the first one will be selected.
tag nameLocates elements whose tag name matches the search value
xpathLocates elements matching an XPath expression

Creating Locators

To work on a web element using Selenium, we need to first locate it on the web page. Selenium provides us above mentioned ways, using which we can locate element on the page. To understand and create locator we will use the following HTML snippet.

<html>
<body>
<style>
.information {
  background-color: white;
  color: black;
  padding: 10px;
}
</style>
<h2>Contact Selenium</h2>

<form action="/action_page.php">
  <input type="radio" name="gender" value="m" />Male &nbsp;
  <input type="radio" name="gender" value="f" />Female <br>
  <br>
  <label for="fname">First name:</label><br>
  <input class="information" type="text" id="fname" name="fname" value="Jane"><br><br>
  <label for="lname">Last name:</label><br>
  <input class="information" type="text" id="lname" name="lname" value="Doe"><br><br>
  <label for="newsletter">Newsletter:</label>
  <input type="checkbox" name="newsletter" value="1" /><br><br>
  <input type="submit" value="Submit">
</form> 

<p>To know more about Selenium, visit the official page 
<a href ="www.selenium.dev">Selenium Official Page</a> 
</p>

</body>
</html>

class name

The HTML page web element can have attribute class. We can see an example in the above shown HTML snippet. We can identify these elements using the class name locator available in Selenium.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.className("information"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.CLASS_NAME, "information")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.ClassName("information"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(class: 'information')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.className('information'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.className("information"))
  

css selector

CSS is the language used to style HTML pages. We can use css selector locator strategy to identify the element on the page. If the element has an id, we create the locator as css = #id. Otherwise the format we follow is css =[attribute=value] . Let us see an example from above HTML snippet. We will create locator for First Name textbox, using css.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.cssSelector("#fname"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.CSS_SELECTOR, "#fname")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.CssSelector("#fname"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(css: '#fname')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.css('#fname'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.css("#fname"))
  

id

We can use the ID attribute of an element in a web page to locate it. Generally the ID property should be unique for each element on the web page. We will identify the Last Name field using it.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.id("lname"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.ID, "lname")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Id("lname"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(id: 'lname')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.id('lname'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.id("lname"))
  

name

We can use the NAME attribute of an element in a web page to locate it. Generally the NAME property should be unique for each element on the web page. We will identify the Newsletter checkbox using it.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.name("newsletter"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.NAME, "newsletter")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Name("newsletter"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(name: 'newsletter')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.name('newsletter'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.name("newsletter"))
  

If the element we want to locate is a link, we can use the link text locator to identify it on the web page. The link text is the text displayed of the link. In the HTML snippet shared, we have a link available, let’s see how will we locate it.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.linkText("Selenium Official Page"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.LINK_TEXT, "Selenium Official Page")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.LinkText("Selenium Official Page"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(link_text: 'Selenium Official Page')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.linkText('Selenium Official Page'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.linkText("Selenium Official Page"))
  

If the element we want to locate is a link, we can use the partial link text locator to identify it on the web page. The link text is the text displayed of the link. We can pass partial text as value. In the HTML snippet shared, we have a link available, lets see how will we locate it.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.partialLinkText("Official Page"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.PARTIAL_LINK_TEXT, "Official Page")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.PartialLinkText("Official Page"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(partial_link_text: 'Official Page')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.partialLinkText('Official Page'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.partialLinkText("Official Page"))
  

tag name

We can use the HTML TAG itself as a locator to identify the web element on the page. From the above HTML snippet shared, lets identify the link, using its html tag “a”.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.tagName("a"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.TAG_NAME, "a")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.TagName("a"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(tag_name: 'a')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.tagName('a'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.tagName("a"))
  

xpath

A HTML document can be considered as a XML document, and then we can use xpath which will be the path traversed to reach the element of interest to locate the element. The XPath could be absolute xpath, which is created from the root of the document. Example - /html/form/input[1]. This will return the male radio button. Or the xpath could be relative. Example- //input[@name=‘fname’]. This will return the first name text box. Let us create locator for female radio button using xpath.

Move Code

    WebDriver driver = new ChromeDriver();
	driver.findElement(By.xpath("//input[@value='f']"));
  
    driver = webdriver.Chrome()
	driver.find_element(By.XPATH, "//input[@value='f']")
  
    var driver = new ChromeDriver();
	driver.FindElement(By.Xpath("//input[@value='f']"));
  
    driver = Selenium::WebDriver.for :chrome
	driver.find_element(xpath: '//input[@value='f']')
  
    let driver = await new Builder().forBrowser('chrome').build();
	const loc = await driver.findElement(By.xpath('//input[@value='f']'));
  
    val driver = ChromeDriver()
	val loc: WebElement = driver.findElement(By.xpath('//input[@value='f']'))
  

Relative Locators

Selenium 4 introduces Relative Locators (previously called Friendly Locators). These locators are helpful when it is not easy to construct a locator for the desired element, but easy to describe spatially where the element is in relation to an element that does have an easily constructed locator.

How it works

Selenium uses the JavaScript function getBoundingClientRect() to determine the size and position of elements on the page, and can use this information to locate neighboring elements.

Relative locator methods can take as the argument for the point of origin, either a previously located element reference, or another locator. In these examples we’ll be using locators only, but you could swap the locator in the final method with an element object and it will work the same.

Let us consider the below example for understanding the relative locators.

Relative Locators

Available relative locators

Above

If the email text field element is not easily identifiable for some reason, but the password text field element is, we can locate the text field element using the fact that it is an “input” element “above” the password element.

Move Code

By emailLocator = RelativeLocator.with(By.tagName("input")).above(By.id("password"));
email_locator = locate_with(By.TAG_NAME, "input").above({By.ID: "password"})
var emailLocator = RelativeBy.WithLocator(By.TagName("input")).Above(By.Id("password"));
email_locator = {relative: {tag_name: 'input', above: {id: 'password'}}}
let emailLocator = locateWith(By.tagName('input')).above(By.id('password'));
val emailLocator = RelativeLocator.with(By.tagName("input")).above(By.id("password"))

Below

If the password text field element is not easily identifiable for some reason, but the email text field element is, we can locate the text field element using the fact that it is an “input” element “below” the email element.

Move Code

By passwordLocator = RelativeLocator.with(By.tagName("input")).below(By.id("email"));
password_locator = locate_with(By.TAG_NAME, "input").below({By.ID: "email"})
var passwordLocator = RelativeBy.WithLocator(By.TagName("input")).Below(By.Id("email"));
password_locator = {relative: {tag_name: 'input', below: {id: 'email'}}}
let passwordLocator = locateWith(By.tagName('input')).below(By.id('email'));
val passwordLocator = RelativeLocator.with(By.tagName("input")).below(By.id("email"))

Left of

If the cancel button is not easily identifiable for some reason, but the submit button element is, we can locate the cancel button element using the fact that it is a “button” element to the “left of” the submit element.

Move Code

By cancelLocator = RelativeLocator.with(By.tagName("button")).toLeftOf(By.id("submit"));
cancel_locator = locate_with(By.TAG_NAME, "button").to_left_of({By.ID: "submit"})
var cancelLocator = RelativeBy.WithLocator(By.tagName("button")).LeftOf(By.Id("submit"));
cancel_locator = {relative: {tag_name: 'button', left: {id: 'submit'}}}
let cancelLocator = locateWith(By.tagName('button')).toLeftOf(By.id('submit'));
val cancelLocator = RelativeLocator.with(By.tagName("button")).toLeftOf(By.id("submit"))

Right of

If the submit button is not easily identifiable for some reason, but the cancel button element is, we can locate the submit button element using the fact that it is a “button” element “to the right of” the cancel element.

Move Code

By submitLocator = RelativeLocator.with(By.tagName("button")).toRightOf(By.id("cancel"));
submit_locator = locate_with(By.TAG_NAME, "button").to_right_of({By.ID: "cancel"})
var submitLocator = RelativeBy.WithLocator(By.tagName("button")).RightOf(By.Id("cancel"));
submit_locator = {relative: {tag_name: 'button', right: {id: 'cancel'}}}
let submitLocator = locateWith(By.tagName('button')).toRightOf(By.id('cancel'));
val submitLocator = RelativeLocator.with(By.tagName("button")).toRightOf(By.id("cancel"))

Near

If the relative positioning is not obvious, or it varies based on window size, you can use the near method to identify an element that is at most 50px away from the provided locator. One great use case for this is to work with a form element that doesn’t have an easily constructed locator, but its associated input label element does.

Move Code

By emailLocator = RelativeLocator.with(By.tagName("input")).near(By.id("lbl-email"));
email_locator = locate_with(By.TAG_NAME, "input").near({By.ID: "lbl-email"})
var emailLocator = RelativeBy.WithLocator(By.tagName("input")).Near(By.Id("lbl-email"));
email_locator = {relative: {tag_name: 'input', near: {id: 'lbl-email'}}}
let emailLocator = locateWith(By.tagName('input')).near(By.id('lbl-email'));
val emailLocator = RelativeLocator.with(By.tagName("input")).near(By.id("lbl-email"));

Chaining relative locators

You can also chain locators if needed. Sometimes the element is most easily identified as being both above/below one element and right/left of another.

Move Code

By submitLocator = RelativeLocator.with(By.tagName("button")).below(By.id("email")).toRightOf(By.id("cancel"));
submit_locator = locate_with(By.TAG_NAME, "button").below({By.ID: "email"}).to_right_of({By.ID: "cancel"})
var submitLocator = RelativeBy.WithLocator(By.tagName("button")).Below(By.Id("email")).RightOf(By.Id("cancel"));
submit_locator = {relative: {tag_name: 'button', below: {id: 'email'}, right: {id: 'cancel'}}}
let submitLocator = locateWith(By.tagName('button')).below(By.id('email')).toRightOf(By.id('cancel'));
val submitLocator = RelativeLocator.with(By.tagName("button")).below(By.id("email")).toRightOf(By.id("cancel"))

5.3 - Finding web elements

Locating the elements based on the provided locator values.

One of the most fundamental aspects of using Selenium is obtaining element references to work with. Selenium offers a number of built-in locator strategies to uniquely identify an element. There are many ways to use the locators in very advanced scenarios. For the purposes of this documentation, let’s consider this HTML snippet:

<ol id="vegetables">
 <li class="potatoes"> <li class="onions"> <li class="tomatoes"><span>Tomato is a Vegetable</span></ol>
<ul id="fruits">
  <li class="bananas">  <li class="apples">  <li class="tomatoes"><span>Tomato is a Fruit</span></ul>

First matching element

Many locators will match multiple elements on the page. The singular find element method will return a reference to the first element found within a given context.

Evaluating entire DOM

When the find element method is called on the driver instance, it returns a reference to the first element in the DOM that matches with the provided locator. This value can be stored and used for future element actions. In our example HTML above, there are two elements that have a class name of “tomatoes” so this method will return the element in the “vegetables” list.

Move Code

WebElement vegetable = driver.findElement(By.className("tomatoes"));
  
vegetable = driver.find_element(By.CLASS_NAME, "tomatoes")
  
var vegetable = driver.FindElement(By.ClassName("tomatoes"));
  
vegetable = driver.find_element(class: 'tomatoes')
  
const vegetable = await driver.findElement(By.className('tomatoes'));
  
val vegetable: WebElement = driver.findElement(By.className("tomatoes"))
  

Evaluating a subset of the DOM

Rather than finding a unique locator in the entire DOM, it is often useful to narrow the search to the scope of another located element. In the above example there are two elements with a class name of “tomatoes” and it is a little more challenging to get the reference for the second one.

One solution is to locate an element with a unique attribute that is an ancestor of the desired element and not an ancestor of the undesired element, then call find element on that object:

Move Code

WebElement fruits = driver.findElement(By.id("fruits"));
WebElement fruit = fruits.findElement(By.className("tomatoes"));
  
fruits = driver.find_element(By.ID, "fruits")
fruit = fruits.find_element(By.CLASS_NAME,"tomatoes")
  
IWebElement fruits = driver.FindElement(By.Id("fruits"));
IWebElement fruit = fruits.FindElement(By.ClassName("tomatoes"));
  
fruits = driver.find_element(id: 'fruits')
fruit = fruits.find_element(class: 'tomatoes')
  
const fruits = await driver.findElement(By.id('fruits'));
const fruit = fruits.findElement(By.className('tomatoes'));
  
val fruits = driver.findElement(By.id("fruits"))
val fruit = fruits.findElement(By.className("tomatoes"))
  

Java and C#
WebDriver, WebElement and ShadowRoot classes all implement a SearchContext interface, which is considered a role-based interface. Role-based interfaces allow you to determine whether a particular driver implementation supports a given feature. These interfaces are clearly defined and try to adhere to having only a single role of responsibility.

Optimized locator

A nested lookup might not be the most effective location strategy since it requires two separate commands to be issued to the browser.

To improve the performance slightly, we can use either CSS or XPath to find this element in a single command. See the Locator strategy suggestions in our Encouraged test practices section.

For this example, we’ll use a CSS Selector:

Move Code

WebElement fruit = driver.findElement(By.cssSelector("#fruits .tomatoes"));
  
fruit = driver.find_element(By.CSS_SELECTOR,"#fruits .tomatoes")
  
var fruit = driver.FindElement(By.CssSelector("#fruits .tomatoes"));
  
fruit = driver.find_element(css: '#fruits .tomatoes')
  
const fruit = await driver.findElement(By.css('#fruits .tomatoes'));
  
val fruit = driver.findElement(By.cssSelector("#fruits .tomatoes"))
  

All matching elements

There are several use cases for needing to get references to all elements that match a locator, rather than just the first one. The plural find elements methods return a collection of element references. If there are no matches, an empty list is returned. In this case, references to all fruits and vegetable list items will be returned in a collection.

Move Code

List<WebElement> plants = driver.findElements(By.tagName("li"));
  
plants = driver.find_elements(By.TAG_NAME, "li")
  
IReadOnlyList<IWebElement> plants = driver.FindElements(By.TagName("li"));
  
plants = driver.find_elements(tag_name: 'li')
  
const plants = await driver.findElements(By.tagName('li'));
  
val plants: List<WebElement> = driver.findElements(By.tagName("li"))
  

Get element

Often you get a collection of elements but want to work with a specific element, which means you need to iterate over the collection and identify the one you want.

Move Code

List<WebElement> elements = driver.findElements(By.tagName("li"));

for (WebElement element : elements) {
    System.out.println("Paragraph text:" + element.getText());
}
  
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

    # Navigate to Url
driver.get("https://www.example.com")

    # Get all the elements available with tag name 'p'
elements = driver.find_elements(By.TAG_NAME, 'p')

for e in elements:
    print(e.text)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using System.Collections.Generic;

namespace FindElementsExample {
 class FindElementsExample {
  public static void Main(string[] args) {
   IWebDriver driver = new FirefoxDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");

    // Get all the elements available with tag name 'p'
    IList < IWebElement > elements = driver.FindElements(By.TagName("p"));
    foreach(IWebElement e in elements) {
     System.Console.WriteLine(e.Text);
    }

   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
begin
     # Navigate to URL
  driver.get 'https://www.example.com'

     # Get all the elements available with tag name 'p'
  elements = driver.find_elements(:tag_name,'p')

  elements.each { |e|
    puts e.text
  }
ensure
  driver.quit
end
  
const {Builder, By} = require('selenium-webdriver');
(async function example() {
    let driver = await new Builder().forBrowser('firefox').build();
    try {
        // Navigate to Url
        await driver.get('https://www.example.com');

        // Get all the elements available with tag 'p'
        let elements = await driver.findElements(By.css('p'));
        for(let e of elements) {
            console.log(await e.getText());
        }
    }
    finally {
        await driver.quit();
    }
})();
  
import org.openqa.selenium.By
import org.openqa.selenium.firefox.FirefoxDriver

fun main() {
    val driver = FirefoxDriver()
    try {
        driver.get("https://example.com")
        // Get all the elements available with tag name 'p'
        val elements = driver.findElements(By.tagName("p"))
        for (element in elements) {
            println("Paragraph text:" + element.text)
        }
    } finally {
        driver.quit()
    }
}
  

Find Elements From Element

It is used to find the list of matching child WebElements within the context of parent element. To achieve this, the parent WebElement is chained with ‘findElements’ to access child elements

Move Code

  import org.openqa.selenium.By;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.WebElement;
  import org.openqa.selenium.chrome.ChromeDriver;
  import java.util.List;

  public class findElementsFromElement {
      public static void main(String[] args) {
          WebDriver driver = new ChromeDriver();
          try {
              driver.get("https://example.com");

              // Get element with tag name 'div'
              WebElement element = driver.findElement(By.tagName("div"));

              // Get all the elements available with tag name 'p'
              List<WebElement> elements = element.findElements(By.tagName("p"));
              for (WebElement e : elements) {
                  System.out.println(e.getText());
              }
          } finally {
              driver.quit();
          }
      }
  }
  
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.example.com")

    # Get element with tag name 'div'
element = driver.find_element(By.TAG_NAME, 'div')

    # Get all the elements available with tag name 'p'
elements = element.find_elements(By.TAG_NAME, 'p')
for e in elements:
    print(e.text)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System.Collections.Generic;

namespace FindElementsFromElement {
 class FindElementsFromElement {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    driver.Navigate().GoToUrl("https://example.com");

    // Get element with tag name 'div'
    IWebElement element = driver.FindElement(By.TagName("div"));

    // Get all the elements available with tag name 'p'
    IList < IWebElement > elements = element.FindElements(By.TagName("p"));
    foreach(IWebElement e in elements) {
     System.Console.WriteLine(e.Text);
    }
   } finally {
    driver.Quit();
   }
  }
 }
}
  
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :chrome
  begin
    # Navigate to URL
    driver.get 'https://www.example.com'

    # Get element with tag name 'div'
    element = driver.find_element(:tag_name,'div')

    # Get all the elements available with tag name 'p'
    elements = element.find_elements(:tag_name,'p')

    elements.each { |e|
      puts e.text
    }
  ensure
    driver.quit
  end
  
  const {Builder, By} = require('selenium-webdriver');

  (async function example() {
      let driver = new Builder()
          .forBrowser('chrome')
          .build();

      await driver.get('https://www.example.com');

      // Get element with tag name 'div'
      let element = driver.findElement(By.css("div"));

      // Get all the elements available with tag name 'p'
      let elements = await element.findElements(By.css("p"));
      for(let e of elements) {
          console.log(await e.getText());
      }
  })();
  
  import org.openqa.selenium.By
  import org.openqa.selenium.chrome.ChromeDriver

  fun main() {
      val driver = ChromeDriver()
      try {
          driver.get("https://example.com")

          // Get element with tag name 'div'
          val element = driver.findElement(By.tagName("div"))

          // Get all the elements available with tag name 'p'
          val elements = element.findElements(By.tagName("p"))
          for (e in elements) {
              println(e.text)
          }
      } finally {
          driver.quit()
      }
  }
  

Get Active Element

It is used to track (or) find DOM element which has the focus in the current browsing context.

Move Code

  import org.openqa.selenium.*;
  import org.openqa.selenium.chrome.ChromeDriver;

  public class activeElementTest {
    public static void main(String[] args) {
      WebDriver driver = new ChromeDriver();
      try {
        driver.get("http://www.google.com");
        driver.findElement(By.cssSelector("[name='q']")).sendKeys("webElement");

        // Get attribute of current active element
        String attr = driver.switchTo().activeElement().getAttribute("title");
        System.out.println(attr);
      } finally {
        driver.quit();
      }
    }
  }
  
  from selenium import webdriver
  from selenium.webdriver.common.by import By

  driver = webdriver.Chrome()
  driver.get("https://www.google.com")
  driver.find_element(By.CSS_SELECTOR, '[name="q"]').send_keys("webElement")

    # Get attribute of current active element
  attr = driver.switch_to.active_element.get_attribute("title")
  print(attr)
  
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    namespace ActiveElement {
     class ActiveElement {
      public static void Main(string[] args) {
       IWebDriver driver = new ChromeDriver();
       try {
        // Navigate to Url
        driver.Navigate().GoToUrl("https://www.google.com");
        driver.FindElement(By.CssSelector("[name='q']")).SendKeys("webElement");

        // Get attribute of current active element
        string attr = driver.SwitchTo().ActiveElement().GetAttribute("title");
        System.Console.WriteLine(attr);
       } finally {
        driver.Quit();
       }
      }
     }
    }
  
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :chrome
  begin
    driver.get 'https://www.google.com'
    driver.find_element(css: '[name="q"]').send_keys('webElement')

    # Get attribute of current active element
    attr = driver.switch_to.active_element.attribute('title')
    puts attr
  ensure
    driver.quit
  end
  
  const {Builder, By} = require('selenium-webdriver');

  (async function example() {
      let driver = await new Builder().forBrowser('chrome').build();
      await driver.get('https://www.google.com');
      await  driver.findElement(By.css('[name="q"]')).sendKeys("webElement");

      // Get attribute of current active element
      let attr = await driver.switchTo().activeElement().getAttribute("title");
      console.log(`${attr}`)
  })();
  
  import org.openqa.selenium.By
  import org.openqa.selenium.chrome.ChromeDriver

  fun main() {
      val driver = ChromeDriver()
      try {
          driver.get("https://www.google.com")
          driver.findElement(By.cssSelector("[name='q']")).sendKeys("webElement")

          // Get attribute of current active element
          val attr = driver.switchTo().activeElement().getAttribute("title")
          print(attr)
      } finally {
          driver.quit()
      }
  }
  

5.4 - Interacting with web elements

A high-level instruction set for manipulating form controls.

There are only 5 basic commands that can be executed on an element:

  • click (applies to any element)
  • send keys (only applies to text fields and content editable elements)
  • clear (only applies to text fields and content editable elements)
  • submit (only applies to form elements)
  • select (see Select List Elements)

Additional validations

These methods are designed to closely emulate a user’s experience, so, unlike the Actions API, it attempts to perform two things before attempting the specified action.

  1. If it determines the element is outside the viewport, it scrolls the element into view, specifically it will align the bottom of the element with the bottom of the viewport.
  2. It ensures the element is interactable before taking the action. This could mean that the scrolling was unsuccessful, or that the element is not otherwise displayed. Determining if an element is displayed on a page was too difficult to define directly in the webdriver specification, so Selenium sends an execute command with a JavaScript atom that checks for things that would keep the element from being displayed. If it determines an element is not in the viewport, not displayed, not keyboard-interactable, or not pointer-interactable, it returns an element not interactable error.

Click

The element click command is executed on the center of the element. If the center of the element is obscured for some reason, Selenium will return an element click intercepted error.

        driver.get("https://www.selenium.dev/selenium/web/inputs.html");

	    // Click on the element 
        WebElement checkInput=driver.findElement(By.name("checkbox_input"));
        checkInput.click();
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Click on the element 
	driver.find_element(By.NAME, "color_input").click()
  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Click the element
  driver.FindElement(By.Name("color_input")).Click();
  
  
    # Navigate to URL
  driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Click the element
  driver.find_element(name: 'color_input').click

  
    await submitButton.click();
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    // Click the element
    driver.findElement(By.name("color_input")).click();
  
  

Send keys

The element send keys command types the provided keys into an editable element. Typically, this means an element is an input element of a form with a text type or an element with a content-editable attribute. If it is not editable, an invalid element state error is returned.

Here is the list of possible keystrokes that WebDriver Supports.

        // Clear field to empty it from any previous data
        WebElement emailInput=driver.findElement(By.name("email_input"));
        emailInput.clear();
	    //Enter Text
        String email="admin@localhost.dev";
	    emailInput.sendKeys(email);
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Clear field to empty it from any previous data
	driver.find_element(By.NAME, "email_input").clear()

	# Enter Text
	driver.find_element(By.NAME, "email_input").send_keys("admin@localhost.dev" )

  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Clear field to empty it from any previous data
  driver.FindElement(By.Name("email_input")).Clear();
  
  //Enter Text
  driver.FindElement(By.Name("email_input")).SendKeys("admin@localhost.dev");
  
  
}
  
    # Navigate to URL
	driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Clear field to empty it from any previous data
	driver.find_element(name: 'email_input').clear
	
	# Enter Text
	driver.find_element(name: 'email_input').send_keys 'admin@localhost.dev'

  
        await inputField.sendKeys('Selenium');
  
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

	//Clear field to empty it from any previous data
	driver.findElement(By.name("email_input")).clear()
	
    // Enter text 
    driver.findElement(By.name("email_input")).sendKeys("admin@localhost.dev")
  
  

Clear

The element clear command resets the content of an element. This requires an element to be editable, and resettable. Typically, this means an element is an input element of a form with a text type or an element with acontent-editable attribute. If these conditions are not met, an invalid element state error is returned.

        //Clear Element
        // Clear field to empty it from any previous data
        emailInput.clear();
    # Navigate to url
	driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Clear field to empty it from any previous data
	driver.find_element(By.NAME, "email_input").clear()

	
  
  // Navigate to Url
  driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

  // Clear field to empty it from any previous data
  driver.FindElement(By.Name("email_input")).Clear();
  
 
  
}
  
    # Navigate to URL
	driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Clear field to empty it from any previous data
	driver.find_element(name: 'email_input').clear

  
        await inputField.clear();
  
    // Navigate to Url
    driver.get("https://www.selenium.dev/selenium/web/inputs.html")

	//Clear field to empty it from any previous data
	driver.findElement(By.name("email_input")).clear()
	
  
  

Submit

In Selenium 4 this is no longer implemented with a separate endpoint and functions by executing a script. As such, it is recommended not to use this method and to click the applicable form submission button instead.

5.5 - Information about web elements

What you can learn about an element.

There are a number of details you can query about a specific element.

Is Displayed

This method is used to check if the connected Element is displayed on a webpage. Returns a Boolean value, True if the connected element is displayed in the current browsing context else returns false.

This functionality is mentioned in, but not defined by the w3c specification due to the impossibility of covering all potential conditions. As such, Selenium cannot expect drivers to implement this functionality directly, and now relies on executing a large JavaScript function directly. This function makes many approximations about an element’s nature and relationship in the tree to return a value.

         driver.get("https://www.selenium.dev/selenium/web/inputs.html");

    	// isDisplayed        
        // Get boolean value for is element display
        boolean isEmailVisible = driver.findElement(By.name("email_input")).isDisplayed();
        assertEquals(isEmailVisible,true);
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

# Get boolean value for is element display
is_email_visible = driver.find_element(By.NAME, "email_input").is_displayed()
//Navigate to the url
driver.Url = "https://www.selenium.dev/selenium/web/inputs.html";

//Get boolean value for is element display
Boolean is_email_visible = driver.FindElement(By.Name("email_input")).Displayed;
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

#fetch display status
val = driver.find_element(name: 'email_input').displayed?
    // Resolves Promise and returns boolean value
    let result =  await driver.findElement(By.name("email_input")).isDisplayed();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is displayed else returns false
 val flag = driver.findElement(By.name("email_input")).isDisplayed()

Is Enabled

This method is used to check if the connected Element is enabled or disabled on a webpage. Returns a boolean value, True if the connected element is enabled in the current browsing context else returns false.

        //isEnabled
       //returns true if element is enabled else returns false
        boolean isEnabledButton = driver.findElement(By.name("button_input")).isEnabled();
        assertEquals(isEnabledButton,true);
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns true if element is enabled else returns false
value = driver.find_element(By.NAME, 'button_input').is_enabled()
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Store the WebElement
IWebElement element = driver.FindElement(By.Name("button_input"));

// Prints true if element is enabled else returns false
System.Console.WriteLine(element.Enabled);
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns true if element is enabled else returns false
ele = driver.find_element(name: 'button_input').enabled?
  
    // Resolves Promise and returns boolean value
    let element =  await driver.findElement(By.name("button_input")).isEnabled();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is enabled else returns false
 val attr = driver.findElement(By.name("button_input")).isEnabled()
  

Is Selected

This method determines if the referenced Element is Selected or not. This method is widely used on Check boxes, radio buttons, input elements, and option elements.

Returns a boolean value, True if referenced element is selected in the current browsing context else returns false.

        //isSelected
        //returns true if element is checked else returns false
        boolean isSelectedCheck = driver.findElement(By.name("checkbox_input")).isSelected();
        assertEquals(isSelectedCheck,true); 
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns true if element is checked else returns false
value = driver.find_element(By.NAME, "checkbox_input").is_selected()
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Returns true if element ins checked else returns false
bool value = driver.FindElement(By.Name("checkbox_input")).Selected;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns true if element is checked else returns false
ele = driver.find_element(name: "checkbox_input").selected?
  
    // Returns true if element ins checked else returns false
    let isSelected = await driver.findElement(By.name("checkbox_input")).isSelected();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns true if element is checked else returns false
 val attr =  driver.findElement(By.name("checkbox_input")).isSelected()
  

Tag Name

It is used to fetch the TagName of the referenced Element which has the focus in the current browsing context.

        //TagName
        //returns TagName of the element
        String tagNameInp = driver.findElement(By.name("email_input")).getTagName();
        assertEquals(tagNameInp,"input"); 
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns TagName of the element
attr = driver.find_element(By.NAME, "email_input").tag_name
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

// Returns TagName of the element
string attr = driver.FindElement(By.Name("email_input")).TagName;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns TagName of the element
attr = driver.find_element(name: "email_input").tag_name
  
    // Returns TagName of the element
    let value = await driver.findElement(By.name('email_input')).getTagName();
 //navigates to url
 driver.get("https://www.selenium.dev/selenium/web/inputs.html")

 //returns TagName of the element
 val attr =  driver.findElement(By.name("email_input")).getTagName()
  

Size and Position

It is used to fetch the dimensions and coordinates of the referenced element.

The fetched data body contain the following details:

  • X-axis position from the top-left corner of the element
  • y-axis position from the top-left corner of the element
  • Height of the element
  • Width of the element
        //GetRect
        // Returns height, width, x and y coordinates referenced element
        Rectangle res =  driver.findElement(By.name("range_input")).getRect();
        // Rectangle class provides getX,getY, getWidth, getHeight methods
        assertEquals(res.getX(),10);
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

    # Returns height, width, x and y coordinates referenced element
res = driver.find_element(By.NAME, "range_input").rect
  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/inputs.html");

var res = driver.FindElement(By.Name("range_input"));
// Return x and y coordinates referenced element
System.Console.WriteLine(res.Location);
// Returns height, width
System.Console.WriteLine(res.Size);
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/inputs.html'

    # Returns height, width, x and y coordinates referenced element
res = driver.find_element(name: "range_input").rect
  
    let object = await driver.findElement(By.name('range_input')).getRect();
// Navigate to url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

// Returns height, width, x and y coordinates referenced element
val res = driver.findElement(By.name("range_input")).rect

// Rectangle class provides getX,getY, getWidth, getHeight methods
println(res.getX())
  

Get CSS Value

Retrieves the value of specified computed style property of an element in the current browsing context.

     // Retrieves the computed style property 'font-size' of field
     String cssValue = driver.findElement(By.name("color_input")).getCssValue("font-size");
     assertEquals(cssValue, "13.3333px");
    # Navigate to Url
driver.get('https://www.selenium.dev/selenium/web/colorPage.html')

    # Retrieves the computed style property 'color' of linktext
cssValue = driver.find_element(By.ID, "namedColor").value_of_css_property('background-color')

  
// Navigate to Url
driver.Navigate().GoToUrl("https://www.selenium.dev/selenium/web/colorPage.html");

// Retrieves the computed style property 'color' of linktext
String cssValue = driver.FindElement(By.Id("namedColor")).GetCssValue("background-color");

  
    # Navigate to Url
driver.get 'https://www.selenium.dev/selenium/web/colorPage.html'

    # Retrieves the computed style property 'color' of linktext
cssValue = driver.find_element(:id, 'namedColor').css_value('background-color')

  
    await driver.get('https://www.selenium.dev/selenium/web/colorPage.html');
      // Returns background color of the element
      let value = await driver.findElement(By.id('namedColor')).getCssValue('background-color');
// Navigate to Url
driver.get("https://www.selenium.dev/selenium/web/colorPage.html")

// Retrieves the computed style property 'color' of linktext
val cssValue = driver.findElement(By.id("namedColor")).getCssValue("background-color")

  

Text Content

Retrieves the rendered text of the specified element.

        //GetText
       // Retrieves the text of the element
        String text = driver.findElement(By.tagName("h1")).getText();
        assertEquals(text, "Testing Inputs");
    # Navigate to url
driver.get("https://www.selenium.dev/selenium/web/linked_image.html")

    # Retrieves the text of the element
text = driver.find_element(By.ID, "justanotherlink").text
  
// Navigate to url
driver.Url="https://www.selenium.dev/selenium/web/linked_image.html";

// Retrieves the text of the element
String text = driver.FindElement(By.Id("justanotherlink")).Text;
  
    # Navigate to url
driver.get 'https://www.selenium.dev/selenium/web/linked_image.html'

    # Retrieves the text of the element
text = driver.find_element(:id, 'justanotherlink').text
  
    await driver.get('https://www.selenium.dev/selenium/web/linked_image.html');
    // Returns text of the element
    let text = await driver.findElement(By.id('justanotherLink')).getText();
// Navigate to URL
driver.get("https://www.selenium.dev/selenium/web/linked_image.html")

// retrieves the text of the element
val text = driver.findElement(By.id("justanotherlink")).getText()
  

Fetching Attributes or Properties

Fetches the run time value associated with a DOM attribute. It returns the data associated with the DOM attribute or property of the element.

        //FetchAttributes
      //identify the email text box
      WebElement emailTxt = driver.findElement(By.name(("email_input")));
     //fetch the value property associated with the textbox
      String valueInfo = emailTxt.getAttribute("value");
      assertEquals(valueInfo,"admin@localhost");
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

# Identify the email text box
email_txt = driver.find_element(By.NAME, "email_input")

# Fetch the value property associated with the textbox
value_info = email_txt.get_attribute("value")
  
 //Navigate to the url
driver.Url="https://www.selenium.dev/selenium/web/inputs.html";

//identify the email text box
IWebElement emailTxt = driver.FindElement(By.Name(("email_input")));

//fetch the value property associated with the textbox
String valueInfo = eleSelLink.GetAttribute("value");
  
# Navigate to the url
driver.get("https://www.selenium.dev/selenium/web/inputs.html");

#identify the email text box
email_element=driver.find_element(name: 'email_input')

#fetch the value property associated with the textbox
emailVal = email_element.attribute("value");
  
    // identify the email text box
    const emailElement = await driver.findElement(By.xpath('//input[@name="email_input"]'));
    
    //fetch the attribute "name" associated with the textbox
    const nameAttribute = await emailElement.getAttribute("name");
// Navigate to URL
driver.get("https://www.selenium.dev/selenium/web/inputs.html")

//fetch the value property associated with the textbox
val attr = driver.findElement(By.name("email_input")).getAttribute("value")
  

6 - Browser interactions

Get browser information

Get title

You can read the current page title from the browser:

Move Code

      String title = driver.getTitle();
title = driver.title
driver.Title;
driver.title
    let title = await driver.getTitle();
driver.title

Get current URL

You can read the current URL from the browser’s address bar using:

Move Code

      String url = driver.getCurrentUrl();
url = driver.current_url
driver.Url;
driver.current_url
    let currentUrl = await driver.getCurrentUrl();
driver.currentUrl

6.1 - Browser navigation

The first thing you will want to do after launching a browser is to open your website. This can be achieved in a single line:

        //Convenient
        driver.get("https://selenium.dev");
            
        //Longer way
        driver.navigate().to("https://selenium.dev");
driver.get("https://www.selenium.dev/selenium/web/index.html")
  




  




  













  
  
  

  
  
  

  
  
    
    
    
    
    
    
  

  <div class="highlight"><pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cs" data-lang="cs"><span style="display:flex;"><span>            <span style="color:#8f5902;font-style:italic">//Convenient</span>
</span></span><span style="display:flex;"><span>            <span style="color:#000">driver</span><span style="color:#000;font-weight:bold">.</span><span style="color:#000">Url</span> <span style="color:#000;font-weight:bold">=</span> <span style="color:#4e9a06">&#34;https://selenium.dev&#34;</span><span style="color:#000;font-weight:bold">;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#8f5902;font-style:italic">//Longer</span>
</span></span><span style="display:flex;"><span>            <span style="color:#000">driver</span><span style="color:#000;font-weight:bold">.</span><span style="color:#000">Navigate</span><span style="color:#000;font-weight:bold">().</span><span style="color:#000">GoToUrl</span><span style="color:#000;font-weight:bold">(</span><span style="color:#4e9a06">&#34;https://selenium.dev&#34;</span><span style="color:#000;font-weight:bold">);</span></span></span></code></pre></div>

  <div class="text-end pb-2">
    <a href="https://github.com/SeleniumHQ/seleniumhq.github.io/blob/trunk/examples/dotnet/SeleniumDocs/Interactions/NavigationTest.cs#L17-L20" target="_blank">
      <i class="fas fa-external-link-alt pl-2"></i>
        <strong>View full example on GitHub</strong>
    </a>
  </div>




  
driver.navigate.to 'https://selenium.dev'
  
        //Convenient
          await driver.get('https://www.selenium.dev');
  
          //Longer way
          await driver.navigate().to("https://www.selenium.dev/selenium/web/index.html");
//Convenient
driver.get("https://selenium.dev")

//Longer way
driver.navigate().to("https://selenium.dev")
  

Back

Pressing the browser’s back button:

        //Back
        driver.navigate().back();
 
 
 
 
 
 
 
 
   
 
 
 
 
   
 
 
 
 
   
 
 
 
 
 
 
 
 
 
 
 
 
 
   
   
   
 
   
   
   
 
   
   
     
     
     
     
     
     
   
 
   <div class="highlight"><pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cs" data-lang="cs"><span style="display:flex;"><span>            <span style="color:#8f5902;font-style:italic">//Back</span>
 </span></span><span style="display:flex;"><span>            <span style="color:#000">driver</span><span style="color:#000;font-weight:bold">.</span><span style="color:#000">Navigate</span><span style="color:#000;font-weight:bold">().</span><span style="color:#000">Back</span><span style="color:#000;font-weight:bold">();</span></span></span></code></pre></div>
 
   <div class="text-end pb-2">
     <a href="https://github.com/SeleniumHQ/seleniumhq.github.io/blob/trunk/examples/dotnet/SeleniumDocs/Interactions/NavigationTest.cs#L24-L25" target="_blank">
       <i class="fas fa-external-link-alt pl-2"></i>
         <strong>View full example on GitHub</strong>
     </a>
   </div>
 
 
 

  
driver.navigate.back
        //Back
          await driver.navigate().back();
driver.navigate().back() 

Forward

Pressing the browser’s forward button:

        //Forward
        driver.navigate().forward();
 
 
 
 
 
 
 
 
   
 
 
 
 
   
 
 
 
 
   
 
 
 
 
 
 
 
 
 
 
 
 
 
   
   
   
 
   
   
   
 
   
   
     
     
     
     
     
     
   
 
   <div class="highlight"><pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cs" data-lang="cs"><span style="display:flex;"><span>            <span style="color:#8f5902;font-style:italic">//Forward</span>
 </span></span><span style="display:flex;"><span>            <span style="color:#000">driver</span><span style="color:#000;font-weight:bold">.</span><span style="color:#000">Navigate</span><span style="color:#000;font-weight:bold">().</span><span style="color:#000">Forward</span><span style="color:#000;font-weight:bold">();</span></span></span></code></pre></div>
 
   <div class="text-end pb-2">
     <a href="https://github.com/SeleniumHQ/seleniumhq.github.io/blob/trunk/examples/dotnet/SeleniumDocs/Interactions/NavigationTest.cs#L29-L30" target="_blank">
       <i class="fas fa-external-link-alt pl-2"></i>
         <strong>View full example on GitHub</strong>
     </a>
   </div>
 
 
 

  
driver.navigate.forward
        //Forward
        await driver.navigate().forward();
driver.navigate().forward()

Refresh

Refresh the current page:

        //Refresh
        driver.navigate().refresh();
 
 
 
 
 
 
 
 
   
 
 
 
 
   
 
 
 
 
   
 
 
 
 
 
 
 
 
 
 
 
 
 
   
   
   
 
   
   
   
 
   
   
     
     
     
     
     
     
   
 
   <div class="highlight"><pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cs" data-lang="cs"><span style="display:flex;"><span>            <span style="color:#8f5902;font-style:italic">//Refresh</span>
 </span></span><span style="display:flex;"><span>            <span style="color:#000">driver</span><span style="color:#000;font-weight:bold">.</span><span style="color:#000">Navigate</span><span style="color:#000;font-weight:bold">().</span><span style="color:#000">Refresh</span><span style="color:#000;font-weight:bold">();</span></span></span></code></pre></div>
 
   <div class="text-end pb-2">
     <a href="https://github.com/SeleniumHQ/seleniumhq.github.io/blob/trunk/examples/dotnet/SeleniumDocs/Interactions/NavigationTest.cs#L34-L35" target="_blank">
       <i class="fas fa-external-link-alt pl-2"></i>
         <strong>View full example on GitHub</strong>
     </a>
   </div>
 
 
 

  
driver.navigate.refresh
        //Refresh
        await driver.navigate().refresh();
driver.navigate().refresh()

6.2 - JavaScript alerts, prompts and confirmations

WebDriver provides an API for working with the three types of native popup messages offered by JavaScript. These popups are styled by the browser and offer limited customisation.

Alerts

The simplest of these is referred to as an alert, which shows a custom message, and a single button which dismisses the alert, labelled in most browsers as OK. It can also be dismissed in most browsers by pressing the close button, but this will always do the same thing as the OK button. See an example alert.

WebDriver can get the text from the popup and accept or dismiss these alerts.

Move Code

//Click the link to activate the alert
driver.findElement(By.linkText("See an example alert")).click();

//Wait for the alert to be displayed and store it in a variable
Alert alert = wait.until(ExpectedConditions.alertIsPresent());

//Store the alert text in a variable
String text = alert.getText();

//Press the OK button
alert.accept();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See an example alert").click()

# Wait for the alert to be displayed and store it in a variable
alert = wait.until(expected_conditions.alert_is_present())

# Store the alert text in a variable
text = alert.text

# Press the OK button
alert.accept()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See an example alert")).Click();

//Wait for the alert to be displayed and store it in a variable
IAlert alert = wait.Until(ExpectedConditions.AlertIsPresent());

//Store the alert text in a variable
string text = alert.Text;

//Press the OK button
alert.Accept();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See an example alert').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Store the alert text in a variable
alert_text = alert.text

# Press on OK button
alert.accept
  
        let alert = await driver.switchTo().alert();
          let alertText = await alert.getText();
          await alert.accept();
//Click the link to activate the alert
driver.findElement(By.linkText("See an example alert")).click()

//Wait for the alert to be displayed and store it in a variable
val alert = wait.until(ExpectedConditions.alertIsPresent())

//Store the alert text in a variable
val text = alert.getText()

//Press the OK button
alert.accept()
  

Confirm

A confirm box is similar to an alert, except the user can also choose to cancel the message. See a sample confirm.

This example also shows a different approach to storing an alert:

Move Code

//Click the link to activate the alert
driver.findElement(By.linkText("See a sample confirm")).click();

//Wait for the alert to be displayed
wait.until(ExpectedConditions.alertIsPresent());

//Store the alert in a variable
Alert alert = driver.switchTo().alert();

//Store the alert in a variable for reuse
String text = alert.getText();

//Press the Cancel button
alert.dismiss();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See a sample confirm").click()

# Wait for the alert to be displayed
wait.until(expected_conditions.alert_is_present())

# Store the alert in a variable for reuse
alert = driver.switch_to.alert

# Store the alert text in a variable
text = alert.text

# Press the Cancel button
alert.dismiss()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See a sample confirm")).Click();

//Wait for the alert to be displayed
wait.Until(ExpectedConditions.AlertIsPresent());

//Store the alert in a variable
IAlert alert = driver.SwitchTo().Alert();

//Store the alert in a variable for reuse
string text = alert.Text;

//Press the Cancel button
alert.Dismiss();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See a sample confirm').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Store the alert text in a variable
alert_text = alert.text

# Press on Cancel button
alert.dismiss
  
        let alert = await driver.switchTo().alert();
          let alertText = await alert.getText();
          await alert.dismiss();
//Click the link to activate the alert
driver.findElement(By.linkText("See a sample confirm")).click()

//Wait for the alert to be displayed
wait.until(ExpectedConditions.alertIsPresent())

//Store the alert in a variable
val alert = driver.switchTo().alert()

//Store the alert in a variable for reuse
val text = alert.text

//Press the Cancel button
alert.dismiss()
  

Prompt

Prompts are similar to confirm boxes, except they also include a text input. Similar to working with form elements, you can use WebDriver’s send keys to fill in a response. This will completely replace the placeholder text. Pressing the cancel button will not submit any text. See a sample prompt.

Move Code

//Click the link to activate the alert
driver.findElement(By.linkText("See a sample prompt")).click();

//Wait for the alert to be displayed and store it in a variable
Alert alert = wait.until(ExpectedConditions.alertIsPresent());

//Type your message
alert.sendKeys("Selenium");

//Press the OK button
alert.accept();
  
# Click the link to activate the alert
driver.find_element(By.LINK_TEXT, "See a sample prompt").click()

# Wait for the alert to be displayed
wait.until(expected_conditions.alert_is_present())

# Store the alert in a variable for reuse
alert = Alert(driver)

# Type your message
alert.send_keys("Selenium")

# Press the OK button
alert.accept()
  
//Click the link to activate the alert
driver.FindElement(By.LinkText("See a sample prompt")).Click();

//Wait for the alert to be displayed and store it in a variable
IAlert alert = wait.Until(ExpectedConditions.AlertIsPresent());

//Type your message
alert.SendKeys("Selenium");

//Press the OK button
alert.Accept();
  
# Click the link to activate the alert
driver.find_element(:link_text, 'See a sample prompt').click

# Store the alert reference in a variable
alert = driver.switch_to.alert

# Type a message
alert.send_keys("selenium")

# Press on Ok button
alert.accept
  
        let alert = await driver.switchTo().alert();
        //Type your message
        await alert.sendKeys(text);
        await alert.accept();
//Click the link to activate the alert
driver.findElement(By.linkText("See a sample prompt")).click()

//Wait for the alert to be displayed and store it in a variable
val alert = wait.until(ExpectedConditions.alertIsPresent())

//Type your message
alert.sendKeys("Selenium")

//Press the OK button
alert.accept()
  

6.3 - Working with cookies

A cookie is a small piece of data that is sent from a website and stored in your computer. Cookies are mostly used to recognise the user and load the stored information.

WebDriver API provides a way to interact with cookies with built-in methods:

It is used to add a cookie to the current browsing context. Add Cookie only accepts a set of defined serializable JSON object. Here is the link to the list of accepted JSON key values

First of all, you need to be on the domain that the cookie will be valid for. If you are trying to preset cookies before you start interacting with a site and your homepage is large / takes a while to load an alternative is to find a smaller page on the site (typically the 404 page is small, e.g. http://example.com/some404page)

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class addCookie {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");

            // Adds the cookie into current browser context
            driver.manage().addCookie(new Cookie("key", "value"));
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")

# Adds the cookie into current browser context
driver.add_cookie({"name": "key", "value": "value"})
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace AddCookie {
 class AddCookie {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");

    // Adds the cookie into current browser context
    driver.Manage().Cookies.AddCookie(new Cookie("key", "value"));
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  
  # Adds the cookie into current browser context
  driver.manage.add_cookie(name: "key", value: "value")
ensure
  driver.quit
end
  
            await driver.manage().addCookie({ name: 'key', value: 'value' });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")

        // Adds the cookie into current browser context
        driver.manage().addCookie(Cookie("key", "value"))
    } finally {
        driver.quit()
    }
}
  

It returns the serialized cookie data matching with the cookie name among all associated cookies.

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class getCookieNamed {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("foo", "bar"));

            // Get cookie details with named cookie 'foo'
            Cookie cookie1 = driver.manage().getCookieNamed("foo");
            System.out.println(cookie1);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")

# Adds the cookie into current browser context
driver.add_cookie({"name": "foo", "value": "bar"})

# Get cookie details with named cookie 'foo'
print(driver.get_cookie("foo"))
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace GetCookieNamed {
 class GetCookieNamed {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("foo", "bar"));

    // Get cookie details with named cookie 'foo'
    var cookie = driver.Manage().Cookies.GetCookieNamed("foo");
    System.Console.WriteLine(cookie);
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "foo", value: "bar")

  # Get cookie details with named cookie 'foo'
  puts driver.manage.cookie_named('foo')
ensure
  driver.quit
end
  
            // Get cookie details with named cookie 'foo'
            await driver.manage().getCookie('foo').then(function(cookie) {
                console.log('cookie details => ', cookie);
            });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("foo", "bar"))

        // Get cookie details with named cookie 'foo'
        val cookie = driver.manage().getCookieNamed("foo")
        println(cookie)
    } finally {
        driver.quit()
    }
}  
  

Get All Cookies

It returns a ‘successful serialized cookie data’ for current browsing context. If browser is no longer available it returns error.

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import java.util.Set;

public class getAllCookies {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            // Add few cookies
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            driver.manage().addCookie(new Cookie("test2", "cookie2"));

            // Get All available cookies
            Set<Cookie> cookies = driver.manage().getCookies();
            System.out.println(cookies);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")

driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

# Get all available cookies
print(driver.get_cookies())
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace GetAllCookies {
 class GetAllCookies {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    driver.Manage().Cookies.AddCookie(new Cookie("test2", "cookie2"));

    // Get All available cookies
    var cookies = driver.Manage().Cookies.AllCookies;
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # Get all available cookies
  puts driver.manage.all_cookies
ensure
  driver.quit
end
  
            await driver.manage().getCookies().then(function(cookies) {
                console.log('cookie details => ', cookies);
            });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        driver.manage().addCookie(Cookie("test2", "cookie2"))

        // Get All available cookies
        val cookies = driver.manage().cookies
        println(cookies)
    } finally {
        driver.quit()
    }
}  
  

It deletes the cookie data matching with the provided cookie name.

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class deleteCookie {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            Cookie cookie1 = new Cookie("test2", "cookie2");
            driver.manage().addCookie(cookie1);

            // delete a cookie with name 'test1'
            driver.manage().deleteCookieNamed("test1");

            /*
             Selenium Java bindings also provides a way to delete
             cookie by passing cookie object of current browsing context
             */
            driver.manage().deleteCookie(cookie1);
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver
driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")
driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

# Delete a cookie with name 'test1'
driver.delete_cookie("test1")
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace DeleteCookie {
 class DeleteCookie {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    var cookie = new Cookie("test2", "cookie2");
    driver.Manage().Cookies.AddCookie(cookie);

    // delete a cookie with name 'test1'	
    driver.Manage().Cookies.DeleteCookieNamed("test1");

    // Selenium .net bindings also provides a way to delete
    // cookie by passing cookie object of current browsing context
    driver.Manage().Cookies.DeleteCookie(cookie);
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # delete a cookie with name 'test1'
  driver.manage.delete_cookie('test1')
ensure
  driver.quit
end
  
            // Delete a cookie with name 'test1'
            await driver.manage().deleteCookie('test1');
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        val cookie1 = Cookie("test2", "cookie2")
        driver.manage().addCookie(cookie1)

        // delete a cookie with name 'test1'
        driver.manage().deleteCookieNamed("test1")
        
        // delete cookie by passing cookie object of current browsing context.
        driver.manage().deleteCookie(cookie1)
    } finally {
        driver.quit()
    }
}  
  

Delete All Cookies

It deletes all the cookies of the current browsing context.

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class deleteAllCookies {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.example.com");
            driver.manage().addCookie(new Cookie("test1", "cookie1"));
            driver.manage().addCookie(new Cookie("test2", "cookie2"));

            // deletes all cookies
            driver.manage().deleteAllCookies();
        } finally {
            driver.quit();
        }
    }
}
  
from selenium import webdriver
driver = webdriver.Chrome()

# Navigate to url
driver.get("http://www.example.com")
driver.add_cookie({"name": "test1", "value": "cookie1"})
driver.add_cookie({"name": "test2", "value": "cookie2"})

#  Deletes all cookies
driver.delete_all_cookies()
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace DeleteAllCookies {
 class DeleteAllCookies {
  public static void Main(string[] args) {
   IWebDriver driver = new ChromeDriver();
   try {
    // Navigate to Url
    driver.Navigate().GoToUrl("https://example.com");
    driver.Manage().Cookies.AddCookie(new Cookie("test1", "cookie1"));
    driver.Manage().Cookies.AddCookie(new Cookie("test2", "cookie2"));

    // deletes all cookies
    driver.Manage().Cookies.DeleteAllCookies();
   } finally {
    driver.Quit();
   }
  }
 }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  driver.manage.add_cookie(name: "test1", value: "cookie1")
  driver.manage.add_cookie(name: "test2", value: "cookie2")

  # deletes all cookies
  driver.manage.delete_all_cookies
ensure
  driver.quit
end
  
            // Delete all cookies
            await driver.manage().deleteAllCookies();
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("https://example.com")
        driver.manage().addCookie(Cookie("test1", "cookie1"))
        driver.manage().addCookie(Cookie("test2", "cookie2"))

        // deletes all cookies
        driver.manage().deleteAllCookies()
    } finally {
        driver.quit()
    }
}  
  

It allows a user to instruct browsers to control whether cookies are sent along with the request initiated by third party sites. It is introduced to prevent CSRF (Cross-Site Request Forgery) attacks.

Same-Site cookie attribute accepts two parameters as instructions

Strict:

When the sameSite attribute is set as Strict, the cookie will not be sent along with requests initiated by third party websites.

Lax:

When you set a cookie sameSite attribute to Lax, the cookie will be sent along with the GET request initiated by third party website.

Note: As of now this feature is landed in chrome(80+version), Firefox(79+version) and works with Selenium 4 and later versions.

Move Code

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class cookieTest {
  public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
      driver.get("http://www.example.com");
      Cookie cookie = new Cookie.Builder("key", "value").sameSite("Strict").build();
      Cookie cookie1 = new Cookie.Builder("key", "value").sameSite("Lax").build();
      driver.manage().addCookie(cookie);
      driver.manage().addCookie(cookie1);
      System.out.println(cookie.getSameSite());
      System.out.println(cookie1.getSameSite());
    } finally {
      driver.quit();
    }
  }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")
# Adds the cookie into current browser context with sameSite 'Strict' (or) 'Lax'
driver.add_cookie({"name": "foo", "value": "value", 'sameSite': 'Strict'})
driver.add_cookie({"name": "foo1", "value": "value", 'sameSite': 'Lax'})
cookie1 = driver.get_cookie('foo')
cookie2 = driver.get_cookie('foo1')
print(cookie1)
print(cookie2)
  
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace SameSiteCookie {
  class SameSiteCookie {
    static void Main(string[] args) {
      IWebDriver driver = new ChromeDriver();
      try {
        driver.Navigate().GoToUrl("http://www.example.com");

        var cookie1Dictionary = new System.Collections.Generic.Dictionary<string, object>() {
          { "name", "test1" }, { "value", "cookie1" }, { "sameSite", "Strict" } };
        var cookie1 = Cookie.FromDictionary(cookie1Dictionary);

        var cookie2Dictionary = new System.Collections.Generic.Dictionary<string, object>() {
          { "name", "test2" }, { "value", "cookie2" }, { "sameSite", "Lax" } };
        var cookie2 = Cookie.FromDictionary(cookie2Dictionary);

        driver.Manage().Cookies.AddCookie(cookie1);
        driver.Manage().Cookies.AddCookie(cookie2);

        System.Console.WriteLine(cookie1.SameSite);
        System.Console.WriteLine(cookie2.SameSite);
      } finally {
        driver.Quit();
      }
    }
  }
}
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://www.example.com'
  # Adds the cookie into current browser context with sameSite 'Strict' (or) 'Lax'
  driver.manage.add_cookie(name: "foo", value: "bar", same_site: "Strict")
  driver.manage.add_cookie(name: "foo1", value: "bar", same_site: "Lax")
  puts driver.manage.cookie_named('foo')
  puts driver.manage.cookie_named('foo1')
ensure
  driver.quit
end
  
            // set a cookie on the current domain with sameSite 'Strict' (or) 'Lax'
            await driver.manage().addCookie({ name: 'key', value: 'value', sameSite: 'Strict' });
            await driver.manage().addCookie({ name: 'key', value: 'value', sameSite: 'Lax' });
import org.openqa.selenium.Cookie
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val driver = ChromeDriver()
    try {
        driver.get("http://www.example.com")
        val cookie = Cookie.Builder("key", "value").sameSite("Strict").build()
        val cookie1 = Cookie.Builder("key", "value").sameSite("Lax").build()
        driver.manage().addCookie(cookie)
        driver.manage().addCookie(cookie1)
        println(cookie.getSameSite())
        println(cookie1.getSameSite())
    } finally {
        driver.quit()
    }
}
  

6.4 - Working with IFrames and frames

Frames are a now deprecated means of building a site layout from multiple documents on the same domain. You are unlikely to work with them unless you are working with an pre HTML5 webapp. Iframes allow the insertion of a document from an entirely different domain, and are still commonly used.

If you need to work with frames or iframes, WebDriver allows you to work with them in the same way. Consider a button within an iframe. If we inspect the element using the browser development tools, we might see the following:

<div id="modal">
  <iframe id="buttonframe" name="myframe"  src="https://seleniumhq.github.io">
   <button>Click here</button>
 </iframe>
</div>

If it was not for the iframe we would expect to click on the button using something like:

Move Code

//This won't work
driver.findElement(By.tagName("button")).click();
  
    # This Wont work
driver.find_element(By.TAG_NAME, 'button').click()
  
//This won't work
driver.FindElement(By.TagName("button")).Click();
  
    # This won't work
driver.find_element(:tag_name,'button').click
  
// This won't work
await driver.findElement(By.css('button')).click();
  
//This won't work
driver.findElement(By.tagName("button")).click()
  

However, if there are no buttons outside of the iframe, you might instead get a no such element error. This happens because Selenium is only aware of the elements in the top level document. To interact with the button, we will need to first switch to the frame, in a similar way to how we switch windows. WebDriver offers three ways of switching to a frame.

Using a WebElement

Switching using a WebElement is the most flexible option. You can find the frame using your preferred selector and switch to it.

Move Code

//Store the web element
WebElement iframe = driver.findElement(By.cssSelector("#modal>iframe"));

//Switch to the frame
driver.switchTo().frame(iframe);

//Now we can click the button
driver.findElement(By.tagName("button")).click();
  
    # Store iframe web element
iframe = driver.find_element(By.CSS_SELECTOR, "#modal > iframe")

    # switch to selected iframe
driver.switch_to.frame(iframe)

    # Now click on button
driver.find_element(By.TAG_NAME, 'button').click()
  
//Store the web element
IWebElement iframe = driver.FindElement(By.CssSelector("#modal>iframe"));

//Switch to the frame
driver.SwitchTo().Frame(iframe);

//Now we can click the button
driver.FindElement(By.TagName("button")).Click();
  
    # Store iframe web element
iframe = driver.find_element(:css,'#modal > iframe')

    # Switch to the frame
driver.switch_to.frame iframe

    # Now, Click on the button
driver.find_element(:tag_name,'button').click
  
// Store the web element
const iframe = driver.findElement(By.css('#modal > iframe'));

// Switch to the frame
await driver.switchTo().frame(iframe);

// Now we can click the button
await driver.findElement(By.css('button')).click();
  
//Store the web element
val iframe = driver.findElement(By.cssSelector("#modal>iframe"))

//Switch to the frame
driver.switchTo().frame(iframe)

//Now we can click the button
driver.findElement(By.tagName("button")).click()
  

Using a name or ID

If your frame or iframe has an id or name attribute, this can be used instead. If the name or ID is not unique on the page, then the first one found will be switched to.

Move Code

//Using the ID
driver.switchTo().frame("buttonframe");

//Or using the name instead
driver.switchTo().frame("myframe");

//Now we can click the button
driver.findElement(By.tagName("button")).click();
  
    # Switch frame by id
driver.switch_to.frame('buttonframe')

    # Now, Click on the button
driver.find_element(By.TAG_NAME, 'button').click()
  
//Using the ID
driver.SwitchTo().Frame("buttonframe");

//Or using the name instead
driver.SwitchTo().Frame("myframe");

//Now we can click the button
driver.FindElement(By.TagName("button")).Click();
  
    # Switch by ID
driver.switch_to.frame 'buttonframe'

    # Now, Click on the button
driver.find_element(:tag_name,'button').click
  
// Using the ID
await driver.switchTo().frame('buttonframe');

// Or using the name instead
await driver.switchTo().frame('myframe');

// Now we can click the button
await driver.findElement(By.css('button')).click();
  
//Using the ID
driver.switchTo().frame("buttonframe")

//Or using the name instead
driver.switchTo().frame("myframe")

//Now we can click the button
driver.findElement(By.tagName("button")).click()
  

Using an index

It is also possible to use the index of the frame, such as can be queried using window.frames in JavaScript.

Move Code

// Switches to the second frame
driver.switchTo().frame(1);
  
    # Switch to the second frame
driver.switch_to.frame(1)
  
// Switches to the second frame
driver.SwitchTo().Frame(1);
  
    # switching to second iframe based on index
iframe = driver.find_elements(By.TAG_NAME,'iframe')[1]

    # switch to selected iframe
driver.switch_to.frame(iframe)
  
// Switches to the second frame
await driver.switchTo().frame(1);
  
// Switches to the second frame
driver.switchTo().frame(1)
  

Leaving a frame

To leave an iframe or frameset, switch back to the default content like so:

Move Code

// Return to the top level
driver.switchTo().defaultContent();
  
    # switch back to default content
driver.switch_to.default_content()
  
// Return to the top level
driver.SwitchTo().DefaultContent();
  
    # Return to the top level
driver.switch_to.default_content
  
// Return to the top level
await driver.switchTo().defaultContent();
  
// Return to the top level
driver.switchTo().defaultContent()
  

6.5 - Working with windows and tabs

Windows and tabs

Get window handle

WebDriver does not make the distinction between windows and tabs. If your site opens a new tab or window, Selenium will let you work with it using a window handle. Each window has a unique identifier which remains persistent in a single session. You can get the window handle of the current window by using:

Move Code

driver.getWindowHandle();
driver.current_window_handle
driver.CurrentWindowHandle;
driver.window_handle
await driver.getWindowHandle();
driver.windowHandle

Switching windows or tabs

Clicking a link which opens in a new window will focus the new window or tab on screen, but WebDriver will not know which window the Operating System considers active. To work with the new window you will need to switch to it. If you have only two tabs or windows open, and you know which window you start with, by the process of elimination you can loop over both windows or tabs that WebDriver can see, and switch to the one which is not the original.

However, Selenium 4 provides a new api NewWindow which creates a new tab (or) new window and automatically switches to it.

Move Code

//Store the ID of the original window
String originalWindow = driver.getWindowHandle();

//Check we don't have other windows open already
assert driver.getWindowHandles().size() == 1;

//Click the link which opens in a new window
driver.findElement(By.linkText("new window")).click();

//Wait for the new window or tab
wait.until(numberOfWindowsToBe(2));

//Loop through until we find a new window handle
for (String windowHandle : driver.getWindowHandles()) {
    if(!originalWindow.contentEquals(windowHandle)) {
        driver.switchTo().window(windowHandle);
        break;
    }
}

//Wait for the new tab to finish loading content
wait.until(titleIs("Selenium documentation"));
  
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

with webdriver.Firefox() as driver:
    # Open URL
    driver.get("https://seleniumhq.github.io")

    # Setup wait for later
    wait = WebDriverWait(driver, 10)

    # Store the ID of the original window
    original_window = driver.current_window_handle

    # Check we don't have other windows open already
    assert len(driver.window_handles) == 1

    # Click the link which opens in a new window
    driver.find_element(By.LINK_TEXT, "new window").click()

    # Wait for the new window or tab
    wait.until(EC.number_of_windows_to_be(2))

    # Loop through until we find a new window handle
    for window_handle in driver.window_handles:
        if window_handle != original_window:
            driver.switch_to.window(window_handle)
            break

    # Wait for the new tab to finish loading content
    wait.until(EC.title_is("SeleniumHQ Browser Automation"))
  
//Store the ID of the original window
string originalWindow = driver.CurrentWindowHandle;

//Check we don't have other windows open already
Assert.AreEqual(driver.WindowHandles.Count, 1);

//Click the link which opens in a new window
driver.FindElement(By.LinkText("new window")).Click();

//Wait for the new window or tab
wait.Until(wd => wd.WindowHandles.Count == 2);

//Loop through until we find a new window handle
foreach(string window in driver.WindowHandles)
{
    if(originalWindow != window)
    {
        driver.SwitchTo().Window(window);
        break;
    }
}
//Wait for the new tab to finish loading content
wait.Until(wd => wd.Title == "Selenium documentation");
  
    # Store the ID of the original window
original_window = driver.window_handle

    # Check we don't have other windows open already
assert(driver.window_handles.length == 1, 'Expected one window')

    # Click the link which opens in a new window
driver.find_element(link: 'new window').click

    # Wait for the new window or tab
wait.until { driver.window_handles.length == 2 }

    #Loop through until we find a new window handle
driver.window_handles.each do |handle|
    if handle != original_window
        driver.switch_to.window handle
        break
    end
end

    #Wait for the new tab to finish loading content
wait.until { driver.title == 'Selenium documentation'}
  
//Store the ID of the original window
const originalWindow = await driver.getWindowHandle();

//Check we don't have other windows open already
assert((await driver.getAllWindowHandles()).length === 1);

//Click the link which opens in a new window
await driver.findElement(By.linkText('new window')).click();

//Wait for the new window or tab
await driver.wait(
    async () => (await driver.getAllWindowHandles()).length === 2,
    10000
  );

//Loop through until we find a new window handle
const windows = await driver.getAllWindowHandles();
windows.forEach(async handle => {
  if (handle !== originalWindow) {
    await driver.switchTo().window(handle);
  }
});

//Wait for the new tab to finish loading content
await driver.wait(until.titleIs('Selenium documentation'), 10000);
  
//Store the ID of the original window
val originalWindow = driver.getWindowHandle()

//Check we don't have other windows open already
assert(driver.getWindowHandles().size() === 1)

//Click the link which opens in a new window
driver.findElement(By.linkText("new window")).click()

//Wait for the new window or tab
wait.until(numberOfWindowsToBe(2))

//Loop through until we find a new window handle
for (windowHandle in driver.getWindowHandles()) {
    if (!originalWindow.contentEquals(windowHandle)) {
        driver.switchTo().window(windowHandle)
        break
    }
}

//Wait for the new tab to finish loading content
wait.until(titleIs("Selenium documentation"))

  

Create new window (or) new tab and switch

Creates a new window (or) tab and will focus the new window or tab on screen. You don’t need to switch to work with the new window (or) tab. If you have more than two windows (or) tabs opened other than the new window, you can loop over both windows or tabs that WebDriver can see, and switch to the one which is not the original.

Note: This feature works with Selenium 4 and later versions.

Move Code

// Opens a new tab and switches to new tab
driver.switchTo().newWindow(WindowType.TAB);

// Opens a new window and switches to new window
driver.switchTo().newWindow(WindowType.WINDOW);
  
    # Opens a new tab and switches to new tab
driver.switch_to.new_window('tab')

    # Opens a new window and switches to new window
driver.switch_to.new_window('window')
  
// Opens a new tab and switches to new tab
driver.SwitchTo().NewWindow(WindowType.Tab)

// Opens a new window and switches to new window
driver.SwitchTo().NewWindow(WindowType.Window)
  

Opens a new tab and switches to new tab:

    driver.switch_to.new_window(:tab)

Opens a new window and switches to new window:

    driver.switch_to.new_window(:window)
Opens a new tab and switches to new tab
    await driver.switchTo().newWindow('tab');
Opens a new window and switches to new window:
    await driver.switchTo().newWindow('window');
// Opens a new tab and switches to new tab
driver.switchTo().newWindow(WindowType.TAB)

// Opens a new window and switches to new window
driver.switchTo().newWindow(WindowType.WINDOW)
  

Closing a window or tab

When you are finished with a window or tab and it is not the last window or tab open in your browser, you should close it and switch back to the window you were using previously. Assuming you followed the code sample in the previous section you will have the previous window handle stored in a variable. Put this together and you will get:

Move Code

//Close the tab or window
driver.close();

//Switch back to the old tab or window
driver.switchTo().window(originalWindow);
  
    #Close the tab or window
driver.close()

    #Switch back to the old tab or window
driver.switch_to.window(original_window)
  
//Close the tab or window
driver.Close();

//Switch back to the old tab or window
driver.SwitchTo().Window(originalWindow);
  
    #Close the tab or window
driver.close

    #Switch back to the old tab or window
driver.switch_to.window original_window
  
//Close the tab or window
await driver.close();

//Switch back to the old tab or window
await driver.switchTo().window(originalWindow);
  
//Close the tab or window
driver.close()

//Switch back to the old tab or window
driver.switchTo().window(originalWindow)

  

Forgetting to switch back to another window handle after closing a window will leave WebDriver executing on the now closed page, and will trigger a No Such Window Exception. You must switch back to a valid window handle in order to continue execution.

Quitting the browser at the end of a session

When you are finished with the browser session you should call quit, instead of close:

Move Code

driver.quit();
driver.quit()
driver.Quit();
driver.quit
await driver.quit();
driver.quit()
  • Quit will:
    • Close all the windows and tabs associated with that WebDriver session
    • Close the browser process
    • Close the background driver process
    • Notify Selenium Grid that the browser is no longer in use so it can be used by another session (if you are using Selenium Grid)

Failure to call quit will leave extra background processes and ports running on your machine which could cause you problems later.

Some test frameworks offer methods and annotations which you can hook into to tear down at the end of a test.

Move Code

/**
 * Example using JUnit
 * https://junit.org/junit5/docs/current/api/org/junit/jupiter/api/AfterAll.html
 */
@AfterAll
public static void tearDown() {
    driver.quit();
}
  
    # unittest teardown
    # https://docs.python.org/3/library/unittest.html?highlight=teardown#unittest.TestCase.tearDown
def tearDown(self):
    self.driver.quit()
  
/*
    Example using Visual Studio's UnitTesting
    https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.testtools.unittesting.aspx
*/
[TestCleanup]
public void TearDown()
{
    driver.Quit();
}
  
    # UnitTest Teardown
    # https://www.rubydoc.info/github/test-unit/test-unit/Test/Unit/TestCase
def teardown
    @driver.quit
end
  
/**
 * Example using Mocha
 * https://mochajs.org/#hooks
 */
after('Tear down', async function () {
  await driver.quit();
});
  
/**
 * Example using JUnit
 * https://junit.org/junit5/docs/current/api/org/junit/jupiter/api/AfterAll.html
 */
@AfterAll
fun tearDown() {
    driver.quit()
}
  

If not running WebDriver in a test context, you may consider using try / finally which is offered by most languages so that an exception will still clean up the WebDriver session.

Move Code

try {
    //WebDriver code here...
} finally {
    driver.quit();
}
  
try:
    #WebDriver code here...
finally:
    driver.quit()
  
try {
    //WebDriver code here...
} finally {
    driver.Quit();
}
  
begin
    #WebDriver code here...
ensure
    driver.quit
end
  
try {
    //WebDriver code here...
} finally {
    await driver.quit();
}
  
try {
    //WebDriver code here...
} finally {
    driver.quit()
}
  

Python’s WebDriver now supports the python context manager, which when using the with keyword can automatically quit the driver at the end of execution.

with webdriver.Firefox() as driver:
  # WebDriver code here...

# WebDriver will automatically quit after indentation

Window management

Screen resolution can impact how your web application renders, so WebDriver provides mechanisms for moving and resizing the browser window.

Get window size

Fetches the size of the browser window in pixels.

Move Code

//Access each dimension individually
int width = driver.manage().window().getSize().getWidth();
int height = driver.manage().window().getSize().getHeight();

//Or store the dimensions and query them later
Dimension size = driver.manage().window().getSize();
int width1 = size.getWidth();
int height1 = size.getHeight();
  
    # Access each dimension individually
width = driver.get_window_size().get("width")
height = driver.get_window_size().get("height")

    # Or store the dimensions and query them later
size = driver.get_window_size()
width1 = size.get("width")
height1 = size.get("height")
  
//Access each dimension individually
int width = driver.Manage().Window.Size.Width;
int height = driver.Manage().Window.Size.Height;

//Or store the dimensions and query them later
System.Drawing.Size size = driver.Manage().Window.Size;
int width1 = size.Width;
int height1 = size.Height;
  
    # Access each dimension individually
width = driver.manage.window.size.width
height = driver.manage.window.size.height

    # Or store the dimensions and query them later
size = driver.manage.window.size
width1 = size.width
height1 = size.height
  
Access each dimension individually
    const { width, height } = await driver.manage().window().getRect();
(or) store the dimensions and query them later
    const rect = await driver.manage().window().getRect();
    const windowWidth = rect.width;
    const windowHeight = rect.height;
//Access each dimension individually
val width = driver.manage().window().size.width
val height = driver.manage().window().size.height

//Or store the dimensions and query them later
val size = driver.manage().window().size
val width1 = size.width
val height1 = size.height
  

Set window size

Restores the window and sets the window size.

Move Code

driver.manage().window().setSize(new Dimension(1024, 768));
driver.set_window_size(1024, 768)
driver.Manage().Window.Size = new Size(1024, 768);
driver.manage.window.resize_to(1024,768)
await driver.manage().window().setRect({ width: 1024, height: 768 });
driver.manage().window().size = Dimension(1024, 768)

Get window position

Fetches the coordinates of the top left coordinate of the browser window.

Move Code

// Access each dimension individually
int x = driver.manage().window().getPosition().getX();
int y = driver.manage().window().getPosition().getY();

// Or store the dimensions and query them later
Point position = driver.manage().window().getPosition();
int x1 = position.getX();
int y1 = position.getY();
  
    # Access each dimension individually
x = driver.get_window_position().get('x')
y = driver.get_window_position().get('y')

    # Or store the dimensions and query them later
position = driver.get_window_position()
x1 = position.get('x')
y1 = position.get('y')
  
//Access each dimension individually
int x = driver.Manage().Window.Position.X;
int y = driver.Manage().Window.Position.Y;

//Or store the dimensions and query them later
Point position = driver.Manage().Window.Position;
int x1 = position.X;
int y1 = position.Y;
  
    #Access each dimension individually
x = driver.manage.window.position.x
y = driver.manage.window.position.y

    # Or store the dimensions and query them later
rect  = driver.manage.window.rect
x1 = rect.x
y1 = rect.y
  
Access each dimension individually
    const { x, y } = await driver.manage().window().getRect();
(or) store the dimensions and query them later
    const rect = await driver.manage().window().getRect();
    const x1 = rect.x;
    const y1 = rect.y;
// Access each dimension individually
val x = driver.manage().window().position.x
val y = driver.manage().window().position.y

// Or store the dimensions and query them later
val position = driver.manage().window().position
val x1 = position.x
val y1 = position.y

  

Set window position

Moves the window to the chosen position.

Move Code

// Move the window to the top left of the primary monitor
driver.manage().window().setPosition(new Point(0, 0));
  
    # Move the window to the top left of the primary monitor
driver.set_window_position(0, 0)
  
// Move the window to the top left of the primary monitor
driver.Manage().Window.Position = new Point(0, 0);
  
driver.manage.window.move_to(0,0)
  
// Move the window to the top left of the primary monitor
await driver.manage().window().setRect({ x: 0, y: 0 });
  
// Move the window to the top left of the primary monitor
driver.manage().window().position = Point(0,0)
    

Maximize window

Enlarges the window. For most operating systems, the window will fill the screen, without blocking the operating system’s own menus and toolbars.

Move Code

driver.manage().window().maximize();
driver.maximize_window()
driver.Manage().Window.Maximize();
driver.manage.window.maximize
await driver.manage().window().maximize();
driver.manage().window().maximize()

Minimize window

Minimizes the window of current browsing context. The exact behavior of this command is specific to individual window managers.

Minimize Window typically hides the window in the system tray.

Note: This feature works with Selenium 4 and later versions.

Move Code

driver.manage().window().minimize();
driver.minimize_window()
driver.Manage().Window.Minimize();
driver.manage.window.minimize
await driver.manage().window().minimize();
driver.manage().window().minimize()

Fullscreen window

Fills the entire screen, similar to pressing F11 in most browsers.

Move Code

driver.manage().window().fullscreen();
driver.fullscreen_window()
driver.Manage().Window.FullScreen();
driver.manage.window.full_screen
await driver.manage().window().fullscreen();
driver.manage().window().fullscreen()

TakeScreenshot

Used to capture screenshot for current browsing context. The WebDriver endpoint screenshot returns screenshot which is encoded in Base64 format.

Move Code

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.*;
import org.openqa.selenium.*;

public class SeleniumTakeScreenshot {
    public static void main(String args[]) throws IOException {
        WebDriver driver = new ChromeDriver();
        driver.get("http://www.example.com");
        File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File("./image.png"));
        driver.quit();
    }
}
  
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("http://www.example.com")

    # Returns and base64 encoded string into image
driver.save_screenshot('./image.png')

driver.quit()
  using OpenQA.Selenium;
  using OpenQA.Selenium.Chrome;
  using OpenQA.Selenium.Support.UI;

  var driver = new ChromeDriver();
  driver.Navigate().GoToUrl("http://www.example.com");
  Screenshot screenshot = (driver as ITakesScreenshot).GetScreenshot();
  screenshot.SaveAsFile("screenshot.png", ScreenshotImageFormat.Png); // Format values are Bmp, Gif, Jpeg, Png, Tiff
  
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://example.com/'

    # Takes and Stores the screenshot in specified path
  driver.save_screenshot('./image.png')

end
  
    // Captures the screenshot
    let encodedString = await driver.takeScreenshot();
    // save screenshot as below
    // await fs.writeFileSync('./image.png', encodedString, 'base64');
import com.oracle.tools.packager.IOUtils.copyFile
import org.openqa.selenium.*
import org.openqa.selenium.chrome.ChromeDriver
import java.io.File

fun main(){
    val driver =  ChromeDriver()
    driver.get("https://www.example.com")
    val scrFile = (driver as TakesScreenshot).getScreenshotAs<File>(OutputType.FILE)
    copyFile(scrFile, File("./image.png"))
    driver.quit()
}
   

TakeElementScreenshot

Used to capture screenshot of an element for current browsing context. The WebDriver endpoint screenshot returns screenshot which is encoded in Base64 format.

Move Code

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.File;
import java.io.IOException;

public class SeleniumelementTakeScreenshot {
  public static void main(String args[]) throws IOException {
    WebDriver driver = new ChromeDriver();
    driver.get("https://www.example.com");
    WebElement element = driver.findElement(By.cssSelector("h1"));
    File scrFile = element.getScreenshotAs(OutputType.FILE);
    FileUtils.copyFile(scrFile, new File("./image.png"));
    driver.quit();
  }
}
  
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("http://www.example.com")

ele = driver.find_element(By.CSS_SELECTOR, 'h1')

    # Returns and base64 encoded string into image
ele.screenshot('./image.png')

driver.quit()
  
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;
    using OpenQA.Selenium.Support.UI;

    // Webdriver
    var driver = new ChromeDriver();
    driver.Navigate().GoToUrl("http://www.example.com");

    // Fetch element using FindElement
    var webElement = driver.FindElement(By.CssSelector("h1"));

    // Screenshot for the element
    var elementScreenshot = (webElement as ITakesScreenshot).GetScreenshot();
    elementScreenshot.SaveAsFile("screenshot_of_element.png");
  
    # Works with Selenium4-alpha7 Ruby bindings and above
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome

begin
  driver.get 'https://example.com/'
  ele = driver.find_element(:css, 'h1')

    # Takes and Stores the element screenshot in specified path
  ele.save_screenshot('./image.jpg')
end
  
    let header = await driver.findElement(By.css('h1'));
      // Captures the element screenshot
      let encodedString = await header.takeScreenshot(true);
      // save screenshot as below
      // await fs.writeFileSync('./image.png', encodedString, 'base64');
  
import org.apache.commons.io.FileUtils
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.*
import java.io.File

fun main() {
    val driver = ChromeDriver()
    driver.get("https://www.example.com")
    val element = driver.findElement(By.cssSelector("h1"))
    val scrFile: File = element.getScreenshotAs(OutputType.FILE)
    FileUtils.copyFile(scrFile, File("./image.png"))
    driver.quit()
}
  

Execute Script

Executes JavaScript code snippet in the current context of a selected frame or window.

Move Code

    //Creating the JavascriptExecutor interface object by Type casting
      JavascriptExecutor js = (JavascriptExecutor)driver;
    //Button Element
      WebElement button =driver.findElement(By.name("btnLogin"));
    //Executing JavaScript to click on element
      js.executeScript("arguments[0].click();", button);
    //Get return value from script
      String text = (String) js.executeScript("return arguments[0].innerText", button);
    //Executing JavaScript directly
      js.executeScript("console.log('hello world')");
  
    # Stores the header element
header = driver.find_element(By.CSS_SELECTOR, "h1")

    # Executing JavaScript to capture innerText of header element
driver.execute_script('return arguments[0].innerText', header)
  
	//creating Chromedriver instance
	IWebDriver driver = new ChromeDriver();
	//Creating the JavascriptExecutor interface object by Type casting
	IJavaScriptExecutor js = (IJavaScriptExecutor) driver;
	//Button Element
	IWebElement button = driver.FindElement(By.Name("btnLogin"));
	//Executing JavaScript to click on element
	js.ExecuteScript("arguments[0].click();", button);
	//Get return value from script
	String text = (String)js.ExecuteScript("return arguments[0].innerText", button);
	//Executing JavaScript directly
	js.ExecuteScript("console.log('hello world')");
  
    # Stores the header element
header = driver.find_element(css: 'h1')

    # Get return value from script
result = driver.execute_script("return arguments[0].innerText", header)

    # Executing JavaScript directly
driver.execute_script("alert('hello world')")
  
    // Stores the header element
    let header = await driver.findElement(By.css('h1'));

    // Executing JavaScript to capture innerText of header element
    let text = await driver.executeScript('return arguments[0].innerText', header);
// Stores the header element
val header = driver.findElement(By.cssSelector("h1"))

// Get return value from script
val result = driver.executeScript("return arguments[0].innerText", header)

// Executing JavaScript directly
driver.executeScript("alert('hello world')")
  

Prints the current page within the browser.

Note: This requires Chromium Browsers to be in headless mode

Move Code

    import org.openqa.selenium.print.PrintOptions;

    driver.get("https://www.selenium.dev");
    printer = (PrintsPage) driver;

    PrintOptions printOptions = new PrintOptions();
    printOptions.setPageRanges("1-2");

    Pdf pdf = printer.print(printOptions);
    String content = pdf.getContent();
  
    from selenium.webdriver.common.print_page_options import PrintOptions

    print_options = PrintOptions()
    print_options.page_ranges = ['1-2']

    driver.get("printPage.html")

    base64code = driver.print_page(print_options)
  
    // code sample not available please raise a PR
  
    driver.navigate_to 'https://www.selenium.dev'

    base64encodedContent = driver.print_page(orientation: 'landscape')
  
    await driver.get('https://www.selenium.dev/selenium/web/alerts.html');
      let base64 = await driver.printPage({pageRanges: ["1-2"]});
      // page can be saved as a PDF as below
      // await fs.writeFileSync('./test.pdf', base64, 'base64');
  
    driver.get("https://www.selenium.dev")
    val printer = driver as PrintsPage

    val printOptions = PrintOptions()
    printOptions.setPageRanges("1-2")
    
    val pdf: Pdf = printer.print(printOptions)
    val content = pdf.content
  

6.6 - Virtual Authenticator

A representation of the Web Authenticator model.

Web applications can enable a public key-based authentication mechanism known as Web Authentication to authenticate users in a passwordless manner. Web Authentication defines APIs that allows a user to create a public-key credential and register it with an authenticator. An authenticator can be a hardware device or a software entity that stores user’s public-key credentials and retrieves them on request.

As the name suggests, Virtual Authenticator emulates such authenticators for testing.

Virtual Authenticator Options

A Virtual Authenticatior has a set of properties. These properties are mapped as VirtualAuthenticatorOptions in the Selenium bindings.

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setIsUserVerified(true)
      .setHasUserVerification(true)
      .setIsUserConsenting(true)
      .setTransport(VirtualAuthenticatorOptions.Transport.USB)
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);
            // Create virtual authenticator options
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetIsUserVerified(true)
                .SetHasUserVerification(true)
                .SetIsUserConsenting(true)
                .SetTransport(VirtualAuthenticatorOptions.Transport.USB)
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);
    options = VirtualAuthenticatorOptions()
    options.is_user_verified = True
    options.has_user_verification = True
    options.is_user_consenting = True
    options.transport = VirtualAuthenticatorOptions.Transport.USB
    options.protocol = VirtualAuthenticatorOptions.Protocol.U2F
    options.has_resident_key = False
      options = new VirtualAuthenticatorOptions();
      options.setIsUserVerified(true);
      options.setHasUserVerification(true);
      options.setIsUserConsenting(true);
      options.setTransport(Transport['USB']);
      options.setProtocol(Protocol['U2F']);
      options.setHasResidentKey(false);

Add Virtual Authenticator

It creates a new virtual authenticator with the provided properties.

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);

    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);
            // Create virtual authenticator options
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            // Register a virtual authenticator
            ((WebDriver)driver).AddVirtualAuthenticator(options);

            List<Credential> credentialList = ((WebDriver)driver).GetCredentials();
    options = VirtualAuthenticatorOptions()
    options.protocol = VirtualAuthenticatorOptions.Protocol.U2F
    options.has_resident_key = False

    # Register a virtual authenticator
    driver.add_virtual_authenticator(options)
            options.setProtocol(Protocol['U2F']);
            options.setHasResidentKey(false);

            // Register a virtual authenticator
            await driver.addVirtualAuthenticator(options);

Remove Virtual Authenticator

Removes the previously added virtual authenticator.

    ((HasVirtualAuthenticator) driver).removeVirtualAuthenticator(authenticator);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            String virtualAuthenticatorId = ((WebDriver)driver).AddVirtualAuthenticator(options);

            ((WebDriver)driver).RemoveVirtualAuthenticator(virtualAuthenticatorId);
    options = VirtualAuthenticatorOptions()

    # Register a virtual authenticator
    driver.add_virtual_authenticator(options)

    # Remove virtual authenticator
    driver.remove_virtual_authenticator()
            await driver.addVirtualAuthenticator(options);
            await driver.removeVirtualAuthenticator();

Create Resident Credential

Creates a resident (stateful) credential with the given required credential parameters.

    byte[] credentialId = {1, 2, 3, 4};
    byte[] userHandle = {1};
    Credential residentCredential = Credential.createResidentCredential(
      credentialId, "localhost", rsaPrivateKey, userHandle, /*signCount=*/0);
            byte[] credentialId = { 1, 2, 3, 4 };
            byte[] userHandle = { 1 };

            Credential residentCredential = Credential.CreateResidentCredential(
              credentialId, "localhost", base64EncodedPK, userHandle, 0);
    options = VirtualAuthenticatorOptions()
    options.protocol = VirtualAuthenticatorOptions.Protocol.CTAP2
    options.has_resident_key = True
    options.has_user_verification = True
    options.is_user_verified = True

    # Register a virtual authenticator
    driver.add_virtual_authenticator(options)

    # parameters for Resident Credential
    credential_id = bytearray({1, 2, 3, 4})
    rp_id = "localhost"
    user_handle = bytearray({1})
    privatekey = urlsafe_b64decode(BASE64__ENCODED_PK)
    sign_count = 0

    # create a  resident credential using above parameters
    resident_credential = Credential.create_resident_credential(credential_id, rp_id, user_handle, privatekey, sign_count)
            options.setProtocol(Protocol['CTAP2']);
            options.setHasResidentKey(true);
            options.setHasUserVerification(true);
            options.setIsUserVerified(true);

            await driver.addVirtualAuthenticator(options);

            let residentCredential = new Credential().createResidentCredential(
                new Uint8Array([1, 2, 3, 4]),
                'localhost',
                new Uint8Array([1]),
                Buffer.from(BASE64_ENCODED_PK, 'base64').toString('binary'),
                0);

            await driver.addCredential(residentCredential);

Create Non-Resident Credential

Creates a resident (stateless) credential with the given required credential parameters.

    byte[] credentialId = {1, 2, 3, 4};
    Credential nonResidentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", ec256PrivateKey, /*signCount=*/0);
            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

Add Credential

Registers the credential with the authenticator.

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
      .setHasResidentKey(false);

    VirtualAuthenticator authenticator = ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);

    byte[] credentialId = {1, 2, 3, 4};
    Credential nonResidentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", ec256PrivateKey, /*signCount=*/0);
    authenticator.addCredential(nonResidentCredential);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(VirtualAuthenticatorOptions.Protocol.U2F)
                .SetHasResidentKey(false);

            ((WebDriver)driver).AddVirtualAuthenticator(options);

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

Get Credential

Returns the list of credentials owned by the authenticator.

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setProtocol(VirtualAuthenticatorOptions.Protocol.CTAP2)
      .setHasResidentKey(true)
      .setHasUserVerification(true)
      .setIsUserVerified(true);
    VirtualAuthenticator authenticator = ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(options);

    byte[] credentialId = {1, 2, 3, 4};
    byte[] userHandle = {1};
    Credential residentCredential = Credential.createResidentCredential(
      credentialId, "localhost", rsaPrivateKey, userHandle, /*signCount=*/0);

    authenticator.addCredential(residentCredential);

    List<Credential> credentialList = authenticator.getCredentials();
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetProtocol(Protocol.CTAP2)
                .SetHasResidentKey(true)
                .SetHasUserVerification(true)
                .SetIsUserVerified(true);

            ((WebDriver)driver).AddVirtualAuthenticator(options);

            byte[] credentialId = { 1, 2, 3, 4 };
            byte[] userHandle = { 1 };

            Credential residentCredential = Credential.CreateResidentCredential(
              credentialId, "localhost", base64EncodedPK, userHandle, 0);

            ((WebDriver)driver).AddCredential(residentCredential);

            List<Credential> credentialList = ((WebDriver)driver).GetCredentials();

Remove Credential

Removes a credential from the authenticator based on the passed credential id.

    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(new VirtualAuthenticatorOptions());

    byte[] credentialId = {1, 2, 3, 4};
    Credential credential = Credential.createNonResidentCredential(
      credentialId, "localhost", rsaPrivateKey, 0);

    authenticator.addCredential(credential);

    authenticator.removeCredential(credentialId);
            ((WebDriver)driver).AddVirtualAuthenticator(new VirtualAuthenticatorOptions());

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

            ((WebDriver)driver).RemoveCredential(credentialId);

Remove All Credentials

Removes all the credentials from the authenticator.

    VirtualAuthenticator authenticator =
      ((HasVirtualAuthenticator) driver).addVirtualAuthenticator(new VirtualAuthenticatorOptions());

    byte[] credentialId = {1, 2, 3, 4};
    Credential residentCredential = Credential.createNonResidentCredential(
      credentialId, "localhost", rsaPrivateKey, /*signCount=*/0);

    authenticator.addCredential(residentCredential);

    authenticator.removeAllCredentials();
            ((WebDriver)driver).AddVirtualAuthenticator(new VirtualAuthenticatorOptions());

            byte[] credentialId = { 1, 2, 3, 4 };

            Credential nonResidentCredential = Credential.CreateNonResidentCredential(
              credentialId, "localhost", base64EncodedEC256PK, 0);

            ((WebDriver)driver).AddCredential(nonResidentCredential);

            ((WebDriver)driver).RemoveAllCredentials();

Set User Verified

Sets whether the authenticator will simulate success or fail on user verification.

    VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
      .setIsUserVerified(true);
            VirtualAuthenticatorOptions options = new VirtualAuthenticatorOptions()
                .SetIsUserVerified(true);

7 - Actions API

A low-level interface for providing virtualized device input actions to the web browser.

In addition to the high-level element interactions, the Actions API provides granular control over exactly what designated input devices can do. Selenium provides an interface for 3 kinds of input sources: a key input for keyboard devices, a pointer input for a mouse, pen or touch devices, and wheel inputs for scroll wheel devices (introduced in Selenium 4.2). Selenium allows you to construct individual action commands assigned to specific inputs and chain them together and call the associated perform method to execute them all at once.

Action Builder

In the move from the legacy JSON Wire Protocol to the new W3C WebDriver Protocol, the low level building blocks of actions became especially detailed. It is extremely powerful, but each input device has a number of ways to use it and if you need to manage more than one device, you are responsible for ensuring proper synchronization between them.

Thankfully, you likely do not need to learn how to use the low level commands directly, since almost everything you might want to do has been given a convenience method that combines the lower level commands for you. These are all documented in keyboard, mouse, pen, and wheel pages.

Pause

Pointer movements and Wheel scrolling allow the user to set a duration for the action, but sometimes you just need to wait a beat between actions for things to work correctly.

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .moveToElement(clickable)
                .pause(Duration.ofSeconds(1))
                .clickAndHold()
                .pause(Duration.ofSeconds(1))
                .sendKeys("abc")
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .move_to_element(clickable)\
        .pause(1)\
        .click_and_hold()\
        .pause(1)\
        .send_keys("abc")\
        .perform()

Selenium v4.2

            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .MoveToElement(clickable)
                .Pause(TimeSpan.FromSeconds(1))
                .ClickAndHold()
                .Pause(TimeSpan.FromSeconds(1))
                .SendKeys("abc")
                .Perform();

Selenium v4.2

    clickable = driver.find_element(id: 'clickable')
    driver.action
          .move_to(clickable)
          .pause(duration: 1)
          .click_and_hold
          .pause(duration: 1)
          .send_keys('abc')
          .perform
      const clickable = await driver.findElement(By.id('clickable'))
      await driver.actions()
        .move({ origin: clickable })
        .pause(1000)
        .press()
        .pause(1000)
        .sendKeys('abc')
        .perform()
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
            .moveToElement(clickable)
            .pause(Duration.ofSeconds(1))
            .clickAndHold()
            .pause(Duration.ofSeconds(1))
            .sendKeys("abc")
            .perform() 

Release All Actions

An important thing to note is that the driver remembers the state of all the input items throughout a session. Even if you create a new instance of an actions class, the depressed keys and the location of the pointer will be in whatever state a previously performed action left them.

There is a special method to release all currently depressed keys and pointer buttons. This method is implemented differently in each of the languages because it does not get executed with the perform method.

        ((RemoteWebDriver) driver).resetInputState();
    ActionBuilder(driver).clear_actions()
            ((WebDriver)driver).ResetInputState();
    driver.action.release_actions
      await driver.actions().clear()
        (driver as RemoteWebDriver).resetInputState()

7.1 - Keyboard actions

A representation of any key input device for interacting with a web page.

There are only 2 actions that can be accomplished with a keyboard: pressing down on a key, and releasing a pressed key. In addition to supporting ASCII characters, each keyboard key has a representation that can be pressed or released in designated sequences.

Keys

In addition to the keys represented by regular unicode, unicode values have been assigned to other keyboard keys for use with Selenium. Each language has its own way to reference these keys; the full list can be found here.

Key down

        new Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .perform();
    ActionChains(driver)\
        .key_down(Keys.SHIFT)\
        .send_keys("abc")\
        .perform()
                .KeyDown(Keys.Shift)
                .SendKeys("a")
                .Perform();
    driver.action
          .key_down(:shift)
          .send_keys('a')
          .perform
      await driver.actions()
        .keyDown(Key.SHIFT)
        .sendKeys('a')
        .perform()
        Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .perform()

Key up

        new Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .keyUp(Keys.SHIFT)
                .sendKeys("b")
                .perform();
    ActionChains(driver)\
        .key_down(Keys.SHIFT)\
        .send_keys("a")\
        .key_up(Keys.SHIFT)\
        .send_keys("b")\
        .perform()
            new Actions(driver)
                .KeyDown(Keys.Shift)
                .SendKeys("a")
                .KeyUp(Keys.Shift)
                .SendKeys("b")
                .Perform();
    driver.action
          .key_down(:shift)
          .send_keys('a')
          .key_up(:shift)
          .send_keys('b')
          .perform
      await driver.actions()
        .keyDown(Key.SHIFT)
        .sendKeys('a')
        .keyUp(Key.SHIFT)
        .sendKeys('b')
        .perform()
        Actions(driver)
                .keyDown(Keys.SHIFT)
                .sendKeys("a")
                .keyUp(Keys.SHIFT)
                .sendKeys("b")
                .perform()

Send keys

This is a convenience method in the Actions API that combines keyDown and keyUp commands in one action. Executing this command differs slightly from using the element method, but primarily this gets used when needing to type multiple characters in the middle of other actions.

Active Element

        new Actions(driver)
                .sendKeys("abc")
                .perform();
    ActionChains(driver)\
        .send_keys("abc")\
        .perform()

            new Actions(driver)
                .SendKeys("abc")
    driver.action
          .send_keys('abc')
          .perform
      const textField = driver.findElement(By.id('textInput'))
      await textField.click()
        Actions(driver)
                .sendKeys("abc")
                .perform()

Designated Element

        new Actions(driver)
                .sendKeys(textField, "Selenium!")
                .perform();
    text_input = driver.find_element(By.ID, "textInput")
    ActionChains(driver)\
        .send_keys_to_element(text_input, "abc")\
        .perform()
            driver.FindElement(By.TagName("body")).Click();
            
            IWebElement textField = driver.FindElement(By.Id("textInput"));
            new Actions(driver)
    text_field = driver.find_element(id: 'textInput')
    driver.action
          .send_keys(text_field, 'Selenium!')
          .perform

Selenium v4.5.0

      const textField = await driver.findElement(By.id('textInput'))

      await driver.actions()
        .sendKeys(textField, 'abc')
        .perform()
        val textField = driver.findElement(By.id("textInput"))
        Actions(driver)
                .sendKeys(textField, "Selenium!")
                .perform()

Copy and Paste

Here’s an example of using all of the above methods to conduct a copy / paste action. Note that the key to use for this operation will be different depending on if it is a Mac OS or not. This code will end up with the text: SeleniumSelenium!

        Keys cmdCtrl = Platform.getCurrent().is(Platform.MAC) ? Keys.COMMAND : Keys.CONTROL;

        WebElement textField = driver.findElement(By.id("textInput"));
        new Actions(driver)
                .sendKeys(textField, "Selenium!")
                .sendKeys(Keys.ARROW_LEFT)
                .keyDown(Keys.SHIFT)
                .sendKeys(Keys.ARROW_UP)
                .keyUp(Keys.SHIFT)
                .keyDown(cmdCtrl)
                .sendKeys("xvv")
                .keyUp(cmdCtrl)
                .perform();

        Assertions.assertEquals("SeleniumSelenium!", textField.getAttribute("value"));
    cmd_ctrl = Keys.COMMAND if sys.platform == 'darwin' else Keys.CONTROL

    ActionChains(driver)\
        .send_keys("Selenium!")\
        .send_keys(Keys.ARROW_LEFT)\
        .key_down(Keys.SHIFT)\
        .send_keys(Keys.ARROW_UP)\
        .key_up(Keys.SHIFT)\
        .key_down(cmd_ctrl)\
        .send_keys("xvv")\
        .key_up(cmd_ctrl)\
        .perform()

            var capabilities = ((WebDriver)driver).Capabilities;
            String platformName = (string)capabilities.GetCapability("platformName");

            String cmdCtrl = platformName.Contains("mac") ? Keys.Command : Keys.Control;

            new Actions(driver)
                .SendKeys("Selenium!")
                .SendKeys(Keys.ArrowLeft)
                .KeyDown(Keys.Shift)
                .SendKeys(Keys.ArrowUp)
    cmd_ctrl = driver.capabilities.platform_name.include?('mac') ? :command : :control
    driver.action
          .send_keys('Selenium!')
          .send_keys(:arrow_left)
          .key_down(:shift)
          .send_keys(:arrow_up)
          .key_up(:shift)
          .key_down(cmd_ctrl)
          .send_keys('xvv')
          .key_up(cmd_ctrl)
          .perform
      const cmdCtrl = platform.includes('darwin') ? Key.COMMAND : Key.CONTROL

      await driver.actions()
        .click(textField)
        .sendKeys('Selenium!')
        .sendKeys(Key.ARROW_LEFT)
        .keyDown(Key.SHIFT)
        .sendKeys(Key.ARROW_UP)
        .keyUp(Key.SHIFT)
        .keyDown(cmdCtrl)
        .sendKeys('xvv')
        .keyUp(cmdCtrl)
        .perform()
        val cmdCtrl = if(platformName == Platform.MAC) Keys.COMMAND else Keys.CONTROL

        val textField = driver.findElement(By.id("textInput"))
        Actions(driver)
                .sendKeys(textField, "Selenium!")
                .sendKeys(Keys.ARROW_LEFT)
                .keyDown(Keys.SHIFT)
                .sendKeys(Keys.ARROW_UP)
                .keyUp(Keys.SHIFT)
                .keyDown(cmdCtrl)
                .sendKeys("xvv")
                .keyUp(cmdCtrl)
                .perform()

7.2 - Mouse actions

A representation of any pointer device for interacting with a web page.

There are only 3 actions that can be accomplished with a mouse: pressing down on a button, releasing a pressed button, and moving the mouse. Selenium provides convenience methods that combine these actions in the most common ways.

Click and hold

This method combines moving the mouse to the center of an element with pressing the left mouse button. This is useful for focusing a specific element:

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .clickAndHold(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .click_and_hold(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .ClickAndHold(clickable)
                .Perform();
    clickable = driver.find_element(id: 'clickable')
    driver.action
          .click_and_hold(clickable)
          .perform
      let clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.move({origin: clickable}).press().perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .clickAndHold(clickable)
                .perform()

Click and release

This method combines moving to the center of an element with pressing and releasing the left mouse button. This is otherwise known as “clicking”:

        WebElement clickable = driver.findElement(By.id("click"));
        new Actions(driver)
                .click(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "click")
    ActionChains(driver)\
        .click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("click"));
            new Actions(driver)
                .Click(clickable)
                .Perform();
    clickable = driver.find_element(id: 'click')
    driver.action
          .click(clickable)
          .perform
      let click = driver.findElement(By.id("click"));
      const actions = driver.actions({async: true});
      await actions.move({origin: click}).click().perform();
        val clickable = driver.findElement(By.id("click"))
        Actions(driver)
                .click(clickable)
                .perform()

Alternate Button Clicks

There are a total of 5 defined buttons for a Mouse:

  • 0 — Left Button (the default)
  • 1 — Middle Button (currently unsupported)
  • 2 — Right Button
  • 3 — X1 (Back) Button
  • 4 — X2 (Forward) Button

Context Click

This method combines moving to the center of an element with pressing and releasing the right mouse button (button 2). This is otherwise known as “right-clicking”:

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .contextClick(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .context_click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .ContextClick(clickable)
                .Perform();
      clickable = driver.find_element(id: 'clickable')
      driver.action
            .context_click(clickable)
            .perform
      const clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.contextClick(clickable).perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .contextClick(clickable)
                .perform()

Back Click

There is no convenience method for this, it is just pressing and releasing mouse button 3

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.BACK.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.BACK.asArg()));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));

Selenium v4.2

    action = ActionBuilder(driver)
    action.pointer_action.pointer_down(MouseButton.BACK)
    action.pointer_action.pointer_up(MouseButton.BACK)
    action.perform()

Selenium v4.2

            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerDown(MouseButton.Back));
            actionBuilder.AddAction(mouse.CreatePointerUp(MouseButton.Back));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());

Selenium v4.2

      driver.action
            .pointer_down(:back)
            .pointer_up(:back)
            .perform

Selenium v4.5.0

      const actions = driver.actions({async: true});
      await actions.press(Button.BACK).release(Button.BACK).perform()
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.BACK.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.BACK.asArg()))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Forward Click

There is no convenience method for this, it is just pressing and releasing mouse button 4

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.FORWARD.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.FORWARD.asArg()));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));

Selenium v4.2

    action = ActionBuilder(driver)
    action.pointer_action.pointer_down(MouseButton.FORWARD)
    action.pointer_action.pointer_up(MouseButton.FORWARD)
    action.perform()

Selenium v4.2

            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerDown(MouseButton.Forward));
            actionBuilder.AddAction(mouse.CreatePointerUp(MouseButton.Forward));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());

Selenium v4.2

      driver.action
            .pointer_down(:forward)
            .pointer_up(:forward)
            .perform

Selenium v4.5.0

      const actions = driver.actions({async: true});
      await actions.press(Button.FORWARD).release(Button.FORWARD).perform()
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerDown(PointerInput.MouseButton.FORWARD.asArg()))
                .addAction(mouse.createPointerUp(PointerInput.MouseButton.FORWARD.asArg()))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Double click

This method combines moving to the center of an element with pressing and releasing the left mouse button twice.

        WebElement clickable = driver.findElement(By.id("clickable"));
        new Actions(driver)
                .doubleClick(clickable)
                .perform();
    clickable = driver.find_element(By.ID, "clickable")
    ActionChains(driver)\
        .double_click(clickable)\
        .perform()
            IWebElement clickable = driver.FindElement(By.Id("clickable"));
            new Actions(driver)
                .DoubleClick(clickable)
                .Perform();
    clickable = driver.find_element(id: 'clickable')
    driver.action
          .double_click(clickable)
          .perform
      const clickable = driver.findElement(By.id("clickable"));
      const actions = driver.actions({async: true});
      await actions.doubleClick(clickable).perform();
        val clickable = driver.findElement(By.id("clickable"))
        Actions(driver)
                .doubleClick(clickable)
                .perform()

Move to element

This method moves the mouse to the in-view center point of the element. This is otherwise known as “hovering.” Note that the element must be in the viewport or else the command will error.

        WebElement hoverable = driver.findElement(By.id("hover"));
        new Actions(driver)
                .moveToElement(hoverable)
                .perform();
    hoverable = driver.find_element(By.ID, "hover")
    ActionChains(driver)\
        .move_to_element(hoverable)\
        .perform()
            IWebElement hoverable = driver.FindElement(By.Id("hover"));
            new Actions(driver)
                .MoveToElement(hoverable)
                .Perform();
    hoverable = driver.find_element(id: 'hover')
    driver.action
          .move_to(hoverable)
          .perform
      const hoverable = driver.findElement(By.id("hover"));
      const actions = driver.actions({async: true});
      await actions.move({origin: hoverable}).perform();
        val hoverable = driver.findElement(By.id("hover"))
        Actions(driver)
                .moveToElement(hoverable)
                .perform()

Move by offset

These methods first move the mouse to the designated origin and then by the number of pixels in the provided offset. Note that the position of the mouse must be in the viewport or else the command will error.

Offset from Element

This method moves the mouse to the in-view center point of the element, then moves by the provided offset.

        WebElement tracker = driver.findElement(By.id("mouse-tracker"));
        new Actions(driver)
                .moveToElement(tracker, 8, 0)
                .perform();
    mouse_tracker = driver.find_element(By.ID, "mouse-tracker")
    ActionChains(driver)\
        .move_to_element_with_offset(mouse_tracker, 8, 0)\
        .perform()
            IWebElement tracker = driver.FindElement(By.Id("mouse-tracker"));
            new Actions(driver)
                .MoveToElement(tracker, 8, 0)
                .Perform();
      mouse_tracker = driver.find_element(id: 'mouse-tracker')
      driver.action
            .move_to(mouse_tracker, 8, 11)
            .perform
        val tracker = driver.findElement(By.id("mouse-tracker"))
        Actions(driver)
                .moveToElement(tracker, 8, 0)
                .perform()

Offset from Viewport

This method moves the mouse from the upper left corner of the current viewport by the provided offset.

        PointerInput mouse = new PointerInput(PointerInput.Kind.MOUSE, "default mouse");

        Sequence actions = new Sequence(mouse, 0)
                .addAction(mouse.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 8, 12));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actions));
    action = ActionBuilder(driver)
    action.pointer_action.move_to_location(8, 0)
    action.perform()
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice mouse = new PointerInputDevice(PointerKind.Mouse, "default mouse");
            actionBuilder.AddAction(mouse.CreatePointerMove(CoordinateOrigin.Viewport,
                8, 0, TimeSpan.Zero));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
      driver.action
            .move_to_location(8, 12)
            .perform
      const actions = driver.actions({async: true});
      await actions.move({x: 8, y: 0}).perform();
        val mouse = PointerInput(PointerInput.Kind.MOUSE, "default mouse")

        val actions = Sequence(mouse, 0)
                .addAction(mouse.createPointerMove(Duration.ZERO, PointerInput.Origin.viewport(), 8, 12))

        (driver as RemoteWebDriver).perform(Collections.singletonList(actions))

Offset from Current Pointer Location

This method moves the mouse from its current position by the offset provided by the user. If the mouse has not previously been moved, the position will be in the upper left corner of the viewport. Note that the pointer position does not change when the page is scrolled.

Note that the first argument X specifies to move right when positive, while the second argument Y specifies to move down when positive. So moveByOffset(30, -10) moves right 30 and up 10 from the current mouse position.

        new Actions(driver)
                .moveByOffset(13, 15)
                .perform();
    ActionChains(driver)\
        .move_by_offset( 13, 15)\
        .perform()
            new Actions(driver)
                .MoveByOffset(13, 15)
                .Perform();
      driver.action
            .move_by(13, 15)
            .perform
      await actions.move({x: 13, y: 15, origin: Origin.POINTER}).perform()
        Actions(driver)
                .moveByOffset(13, 15)
                .perform()

Drag and Drop on Element

This method firstly performs a click-and-hold on the source element, moves to the location of the target element and then releases the mouse.

        WebElement draggable = driver.findElement(By.id("draggable"));
        WebElement droppable = driver.findElement(By.id("droppable"));
        new Actions(driver)
                .dragAndDrop(draggable, droppable)
                .perform();
    draggable = driver.find_element(By.ID, "draggable")
    droppable = driver.find_element(By.ID, "droppable")
    ActionChains(driver)\
        .drag_and_drop(draggable, droppable)\
        .perform()
            IWebElement draggable = driver.FindElement(By.Id("draggable"));
            IWebElement droppable = driver.FindElement(By.Id("droppable"));
            new Actions(driver)
                .DragAndDrop(draggable, droppable)
                .Perform();
    draggable = driver.find_element(id: 'draggable')
    droppable = driver.find_element(id: 'droppable')
    driver.action
          .drag_and_drop(draggable, droppable)
          .perform
      const draggable = driver.findElement(By.id("draggable"));
      const droppable = await driver.findElement(By.id("droppable"));
      const actions = driver.actions({async: true});
      await actions.dragAndDrop(draggable, droppable).perform();
        val draggable = driver.findElement(By.id("draggable"))
        val droppable = driver.findElement(By.id("droppable"))
        Actions(driver)
                .dragAndDrop(draggable, droppable)
                .perform()

Drag and Drop by Offset

This method firstly performs a click-and-hold on the source element, moves to the given offset and then releases the mouse.

        WebElement draggable = driver.findElement(By.id("draggable"));
        Rectangle start = draggable.getRect();
        Rectangle finish = driver.findElement(By.id("droppable")).getRect();
        new Actions(driver)
                .dragAndDropBy(draggable, finish.getX() - start.getX(), finish.getY() - start.getY())
                .perform();
    draggable = driver.find_element(By.ID, "draggable")
    start = draggable.location
    finish = driver.find_element(By.ID, "droppable").location
    ActionChains(driver)\
        .drag_and_drop_by_offset(draggable, finish['x'] - start['x'], finish['y'] - start['y'])\
        .perform()
            IWebElement draggable = driver.FindElement(By.Id("draggable"));
            Point start = draggable.Location;
            Point finish = driver.FindElement(By.Id("droppable")).Location;
            new Actions(driver)
                .DragAndDropToOffset(draggable, finish.X - start.X, finish.Y - start.Y)
                .Perform();
    draggable = driver.find_element(id: 'draggable')
    start = draggable.rect
    finish = driver.find_element(id: 'droppable').rect
    driver.action
          .drag_and_drop_by(draggable, finish.x - start.x, finish.y - start.y)
          .perform
      const draggable = driver.findElement(By.id("draggable"));
      let start = await draggable.getRect();
      let finish = await driver.findElement(By.id("droppable")).getRect();
      const actions = driver.actions({async: true});
      await actions.dragAndDrop(draggable, {x: finish.x - start.x, y: finish.y - start.y}).perform();
        val draggable = driver.findElement(By.id("draggable"))
        val start = draggable.getRect()
        val finish = driver.findElement(By.id("droppable")).getRect()
        Actions(driver)
                .dragAndDropBy(draggable, finish.getX() - start.getX(), finish.getY() - start.getY())
                .perform()

7.3 - Pen actions

A representation of a pen stylus kind of pointer input for interacting with a web page.

Chromium Only

A Pen is a type of pointer input that has most of the same behavior as a mouse, but can also have event properties unique to a stylus. Additionally, while a mouse has 5 buttons, a pen has 3 equivalent button states:

  • 0 — Touch Contact (the default; equivalent to a left click)
  • 2 — Barrel Button (equivalent to a right click)
  • 5 — Eraser Button (currently unsupported by drivers)

Using a Pen

Selenium v4.2

        WebElement pointerArea = driver.findElement(By.id("pointerArea"));
        new Actions(driver)
                .setActivePointer(PointerInput.Kind.PEN, "default pen")
                .moveToElement(pointerArea)
                .clickAndHold()
                .moveByOffset(2, 2)
                .release()
                .perform();

Selenium v4.2

    pointer_area = driver.find_element(By.ID, "pointerArea")
    pen_input = PointerInput(POINTER_PEN, "default pen")
    action = ActionBuilder(driver, mouse=pen_input)
    action.pointer_action\
        .move_to(pointer_area)\
        .pointer_down()\
        .move_by(2, 2)\
        .pointer_up()
    action.perform()
            IWebElement pointerArea = driver.FindElement(By.Id("pointerArea"));
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice pen = new PointerInputDevice(PointerKind.Pen, "default pen");
            
            actionBuilder.AddAction(pen.CreatePointerMove(pointerArea, 0, 0, TimeSpan.FromMilliseconds(800)));
            actionBuilder.AddAction(pen.CreatePointerDown(MouseButton.Left));
            actionBuilder.AddAction(pen.CreatePointerMove(CoordinateOrigin.Pointer,
                2, 2, TimeSpan.Zero));
            actionBuilder.AddAction(pen.CreatePointerUp(MouseButton.Left));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());

Selenium v4.2

    pointer_area = driver.find_element(id: 'pointerArea')
    driver.action(devices: :pen)
          .move_to(pointer_area)
          .pointer_down
          .move_by(2, 2)
          .pointer_up
          .perform
        val pointerArea = driver.findElement(By.id("pointerArea"))
        Actions(driver)
                .setActivePointer(PointerInput.Kind.PEN, "default pen")
                .moveToElement(pointerArea)
                .clickAndHold()
                .moveByOffset(2, 2)
                .release()
                .perform()

Adding Pointer Event Attributes

Selenium v4.2

        WebElement pointerArea = driver.findElement(By.id("pointerArea"));
        PointerInput pen = new PointerInput(PointerInput.Kind.PEN, "default pen");
        PointerInput.PointerEventProperties eventProperties = PointerInput.eventProperties()
                .setTiltX(-72)
                .setTiltY(9)
                .setTwist(86);
        PointerInput.Origin origin = PointerInput.Origin.fromElement(pointerArea);

        Sequence actionListPen = new Sequence(pen, 0)
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 0, 0))
                .addAction(pen.createPointerDown(0))
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 2, 2, eventProperties))
                .addAction(pen.createPointerUp(0));

        ((RemoteWebDriver) driver).perform(Collections.singletonList(actionListPen));
    pointer_area = driver.find_element(By.ID, "pointerArea")
    pen_input = PointerInput(POINTER_PEN, "default pen")
    action = ActionBuilder(driver, mouse=pen_input)
    action.pointer_action\
        .move_to(pointer_area)\
        .pointer_down()\
        .move_by(2, 2, tilt_x=-72, tilt_y=9, twist=86)\
        .pointer_up(0)
    action.perform()
            IWebElement pointerArea = driver.FindElement(By.Id("pointerArea"));
            ActionBuilder actionBuilder = new ActionBuilder();
            PointerInputDevice pen = new PointerInputDevice(PointerKind.Pen, "default pen");
            PointerInputDevice.PointerEventProperties properties = new PointerInputDevice.PointerEventProperties() {
                TiltX = -72,
                TiltY = 9,
                Twist = 86,
            };            
            actionBuilder.AddAction(pen.CreatePointerMove(pointerArea, 0, 0, TimeSpan.FromMilliseconds(800)));
            actionBuilder.AddAction(pen.CreatePointerDown(MouseButton.Left));
            actionBuilder.AddAction(pen.CreatePointerMove(CoordinateOrigin.Pointer,
                2, 2, TimeSpan.Zero, properties));
            actionBuilder.AddAction(pen.CreatePointerUp(MouseButton.Left));
            ((IActionExecutor)driver).PerformActions(actionBuilder.ToActionSequenceList());
    pointer_area = driver.find_element(id: 'pointerArea')
    driver.action(devices: :pen)
          .move_to(pointer_area)
          .pointer_down
          .move_by(2, 2, tilt_x: -72, tilt_y: 9, twist: 86)
          .pointer_up
          .perform
        val pointerArea = driver.findElement(By.id("pointerArea"))
        val pen = PointerInput(PointerInput.Kind.PEN, "default pen")
        val eventProperties = PointerInput.eventProperties()
                .setTiltX(-72)
                .setTiltY(9)
                .setTwist(86)
        val origin = PointerInput.Origin.fromElement(pointerArea)
        
        val actionListPen = Sequence(pen, 0)
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 0, 0))
                .addAction(pen.createPointerDown(0))
                .addAction(pen.createPointerMove(Duration.ZERO, origin, 2, 2, eventProperties))
                .addAction(pen.createPointerUp(0))

        (driver as RemoteWebDriver).perform(listOf(actionListPen))

7.4 - Scroll wheel actions

A representation of a scroll wheel input device for interacting with a web page.

Selenium v4.2

Chromium Only

There are 5 scenarios for scrolling on a page.

Scroll to element

This is the most common scenario. Unlike traditional click and send keys methods, the actions class does not automatically scroll the target element into view, so this method will need to be used if elements are not already inside the viewport.

This method takes a web element as the sole argument.

Regardless of whether the element is above or below the current viewscreen, the viewport will be scrolled so the bottom of the element is at the bottom of the screen.

        WebElement iframe = driver.findElement(By.tagName("iframe"));
        new Actions(driver)
                .scrollToElement(iframe)
                .perform();
    iframe = driver.find_element(By.TAG_NAME, "iframe")
    ActionChains(driver)\
        .scroll_to_element(iframe)\
        .perform()
            IWebElement iframe = driver.FindElement(By.TagName("iframe"));
            new Actions(driver)
                .ScrollToElement(iframe)
                .Perform();
    iframe = driver.find_element(tag_name: 'iframe')
    driver.action
          .scroll_to(iframe)
          .perform
      const iframe = await driver.findElement(By.css("iframe"))

      await driver.actions()
        .scroll(0, 0, 0, 0, iframe)
        .perform()
        val iframe = driver.findElement(By.tagName("iframe"))
        Actions(driver)
                .scrollToElement(iframe)
                .perform()

Scroll by given amount

This is the second most common scenario for scrolling. Pass in an delta x and a delta y value for how much to scroll in the right and down directions. Negative values represent left and up, respectively.

        WebElement footer = driver.findElement(By.tagName("footer"));
        int deltaY = footer.getRect().y;
        new Actions(driver)
                .scrollByAmount(0, deltaY)
                .perform();
    footer = driver.find_element(By.TAG_NAME, "footer")
    delta_y = footer.rect['y']
    ActionChains(driver)\
        .scroll_by_amount(0, delta_y)\
        .perform()
            IWebElement footer = driver.FindElement(By.TagName("footer"));
            int deltaY = footer.Location.Y;
            new Actions(driver)
                .ScrollByAmount(0, deltaY)
                .Perform();
    footer = driver.find_element(tag_name: 'footer')
    delta_y = footer.rect.y
    driver.action
          .scroll_by(0, delta_y)
          .perform
      const footer = await driver.findElement(By.css("footer"))
      const deltaY = (await footer.getRect()).y

      await driver.actions()
        .scroll(0, 0, 0, deltaY)
        .perform()
        val footer = driver.findElement(By.tagName("footer"))
        val deltaY = footer.getRect().y
        Actions(driver)
                .scrollByAmount(0, deltaY)
                .perform()

Scroll from an element by a given amount

This scenario is effectively a combination of the above two methods.

To execute this use the “Scroll From” method, which takes 3 arguments. The first represents the origination point, which we designate as the element, and the second two are the delta x and delta y values.

If the element is out of the viewport, it will be scrolled to the bottom of the screen, then the page will be scrolled by the provided delta x and delta y values.

        WebElement iframe = driver.findElement(By.tagName("iframe"));
        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromElement(iframe);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform();
    iframe = driver.find_element(By.TAG_NAME, "iframe")
    scroll_origin = ScrollOrigin.from_element(iframe)
    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            IWebElement iframe = driver.FindElement(By.TagName("iframe"));
            WheelInputDevice.ScrollOrigin scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Element = iframe
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    iframe = driver.find_element(tag_name: 'iframe')
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.element(iframe)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      const iframe = await driver.findElement(By.css("iframe"))

      await driver.actions()
        .scroll(0, 0, 0, 200, iframe)
        .perform()
        val iframe = driver.findElement(By.tagName("iframe"))
        val scrollOrigin = WheelInput.ScrollOrigin.fromElement(iframe)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform()

Scroll from an element with an offset

This scenario is used when you need to scroll only a portion of the screen, and it is outside the viewport. Or is inside the viewport and the portion of the screen that must be scrolled is a known offset away from a specific element.

This uses the “Scroll From” method again, and in addition to specifying the element, an offset is specified to indicate the origin point of the scroll. The offset is calculated from the center of the provided element.

If the element is out of the viewport, it first will be scrolled to the bottom of the screen, then the origin of the scroll will be determined by adding the offset to the coordinates of the center of the element, and finally the page will be scrolled by the provided delta x and delta y values.

Note that if the offset from the center of the element falls outside of the viewport, it will result in an exception.

        WebElement footer = driver.findElement(By.tagName("footer"));
        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromElement(footer, 0, -50);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin,0, 200)
                .perform();
    footer = driver.find_element(By.TAG_NAME, "footer")
    scroll_origin = ScrollOrigin.from_element(footer, 0, -50)
    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            IWebElement footer = driver.FindElement(By.TagName("footer"));
            var scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Element = footer,
                XOffset = 0,
                YOffset = -50
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    footer = driver.find_element(tag_name: 'footer')
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.element(footer, 0, -50)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      const footer = await driver.findElement(By.css("footer"))

      await driver.actions()
        .scroll(0, -50, 0, 200, footer)
        .perform()
        val footer = driver.findElement(By.tagName("footer"))
        val scrollOrigin = WheelInput.ScrollOrigin.fromElement(footer, 0, -50)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin,0, 200)
                .perform()

Scroll from a offset of origin (element) by given amount

The final scenario is used when you need to scroll only a portion of the screen, and it is already inside the viewport.

This uses the “Scroll From” method again, but the viewport is designated instead of an element. An offset is specified from the upper left corner of the current viewport. After the origin point is determined, the page will be scrolled by the provided delta x and delta y values.

Note that if the offset from the upper left corner of the viewport falls outside of the screen, it will result in an exception.

        WheelInput.ScrollOrigin scrollOrigin = WheelInput.ScrollOrigin.fromViewport(10, 10);
        new Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform();
    scroll_origin = ScrollOrigin.from_viewport(10, 10)

    ActionChains(driver)\
        .scroll_from_origin(scroll_origin, 0, 200)\
        .perform()
            var scrollOrigin = new WheelInputDevice.ScrollOrigin
            {
                Viewport = true,
                XOffset = 10,
                YOffset = 10
            };
            new Actions(driver)
                .ScrollFromOrigin(scrollOrigin, 0, 200)
                .Perform();
    scroll_origin = Selenium::WebDriver::WheelActions::ScrollOrigin.viewport(10, 10)
    driver.action
          .scroll_from(scroll_origin, 0, 200)
          .perform
      await driver.actions()
        .scroll(10, 10, 0, 200)
        .perform()
        val scrollOrigin = WheelInput.ScrollOrigin.fromViewport(10, 10)
        Actions(driver)
                .scrollFromOrigin(scrollOrigin, 0, 200)
                .perform()

8 - BiDirectional functionality

Selenium is working with browser vendors to create the WebDriver BiDirectional Protocol as a means to provide a stable, cross-browser API that uses the bidirectional functionality useful for both browser automation generally and testing specifically. Before now, users seeking this functionality have had to rely on CDP (Chrome DevTools Protocol) with all of its frustrations and limitations.

The traditional WebDriver model of strict request/response commands will be supplemented with the ability to stream events from the user agent to the controlling software via WebSockets, better matching the evented nature of the browser DOM.

As it is not a good idea to tie your tests to a specific version of any browser, the Selenium project recommends using WebDriver BiDi wherever possible.

While the specification is in works, the browser vendors are parallely implementing the WebDriver BiDirectional Protocol. Refer web-platform-tests dashboard to see how far along the browser vendors are. Selenium is trying to keep up with the browser vendors and has started implementing W3C BiDi APIs. The goal is to ensure APIs are W3C compliant and uniform among the different language bindings.

However, until the specification and corresponding Selenium implementation is complete there are many useful things that CDP offers. Selenium offers some useful helper classes that use CDP.

8.1 - Chrome DevTools

Many browsers provide “DevTools” – a set of tools that are integrated with the browser that developers can use to debug web apps and explore the performance of their pages. Google Chrome’s DevTools make use of a protocol called the Chrome DevTools Protocol (or “CDP” for short). As the name suggests, this is not designed for testing, nor to have a stable API, so functionality is highly dependent on the version of the browser.

The WebDriver BiDirectional Protocol is the next generation of the W3C WebDriver protocol and aims to provide a stable API implemented by all browsers, but it’s not yet complete. Until it is, Selenium provides access to the CDP for those browsers that implement it (such as Google Chrome, or Microsoft Edge, and Firefox), allowing you to enhance your tests in interesting ways. Some examples of what you can do with it are given below.

Ways to Use Chrome DevTools With Selenium

There are three different ways to access Chrome DevTools in Selenium. If you look for other examples online, you will likely see each of these mixed and matched.

  • The CDP Endpoint was the first option available to users. It only works for the most simple things (setting state, getting basic information), and you have to know the “magic strings” for the domain and methods and key value pairs. For basic requirements, this might be simpler than the other options. These methods are only temporarily supported.
  • The CDP API is an improvement on just using the endpoint because you can set do things asynchronously. Instead of a String and a Map, you can access the supported classes, methods and parameters in the code. These methods are also only temporarily supported.
  • The BiDi API option should be used whenever possible because it abstracts away the implementation details entirely and will work with either CDP or WebDriver-BiDi when Selenium moves away from CDP.

Examples With Limited Value

There are a number of commonly cited examples for using CDP that are of limited practical value.

  • Geo Location — almost all sites use the IP address to determine physical location, so setting an emulated geolocation rarely has the desired effect.
  • Overriding Device Metrics — Chrome provides a great API for setting Mobile Emulation in the Options classes, which is generally superior to attempting to do this with CDP.

Check out the examples in these documents for ways to do additional useful things:

8.1.1 - Chrome DevTools Protocol Endpoint

Google provides a /cdp/execute endpoint that can be accessed directly. Each Selenium binding provides a method that allows you to pass the CDP domain as a String, and the required parameters as a simple Map.

These methods will eventually be removed. It is recommended to use the WebDriver-BiDi or WebDriver Bidi APIs methods where possible to ensure future compatibility.

Usage

Generally you should prefer the use of the CDP API over this approach, but sometimes the syntax is cleaner or significantly more simple.

Limitations include:

  • It only works for use cases that are limited to setting or getting information; any actual asynchronous interactions require another implementation
  • You have to know the exactly correct “magic strings” for domains and keys
  • It is possible that an update to Chrome will change the required parameters

Examples

An alternate implementation can be found at CDP API Set Cookie

    Map<String, Object> cookie = new HashMap<>();
    cookie.put("name", "cheese");
    cookie.put("value", "gouda");
    cookie.put("domain", "www.selenium.dev");
    cookie.put("secure", true);

    ((HasCdp) driver).executeCdpCommand("Network.setCookie", cookie);
    cookie = {'name': 'cheese',
              'value': 'gouda',
              'domain': 'www.selenium.dev',
              'secure': True}

    driver.execute_cdp_cmd('Network.setCookie', cookie)
            var cookie = new Dictionary<string, object>
            {
                { "name", "cheese" },
                { "value", "gouda" },
                { "domain", "www.selenium.dev" },
                { "secure", true }
            };

            ((ChromeDriver)driver).ExecuteCdpCommand("Network.setCookie", cookie);

The CDP API Set Cookie implementation should be preferred

    cookie = {name: 'cheese',
              value: 'gouda',
              domain: 'www.selenium.dev',
              secure: true}

    driver.execute_cdp('Network.setCookie', **cookie)

Performance Metrics

An alternate implementation can be found at CDP API Performance Metrics

The CDP API Performance Metrics implementation should be preferred

    ((HasCdp) driver).executeCdpCommand("Performance.enable", new HashMap<>());

    Map<String, Object> response =
        ((HasCdp) driver).executeCdpCommand("Performance.getMetrics", new HashMap<>());
    driver.execute_cdp_cmd('Performance.enable', {})

    metric_list = driver.execute_cdp_cmd('Performance.getMetrics', {})["metrics"]
            ((ChromeDriver)driver).ExecuteCdpCommand("Performance.enable", emptyDictionary);

            Dictionary<string, object> response = (Dictionary<string, object>)((ChromeDriver)driver)
                .ExecuteCdpCommand("Performance.getMetrics", emptyDictionary);

The CDP API Performance Metrics implementation should be preferred

    driver.execute_cdp('Performance.enable')

    metric_list = driver.execute_cdp('Performance.getMetrics')['metrics']

Basic authentication

Alternate implementations can be found at CDP API Basic Authentication and BiDi API Basic Authentication

The BiDi API Basic Authentication implementation should be preferred

    ((HasCdp) driver).executeCdpCommand("Network.enable", new HashMap<>());

    String encodedAuth = Base64.getEncoder().encodeToString("admin:admin".getBytes());
    Map<String, Object> headers =
        ImmutableMap.of("headers", ImmutableMap.of("authorization", "Basic " + encodedAuth));

    ((HasCdp) driver).executeCdpCommand("Network.setExtraHTTPHeaders", headers);
    driver.execute_cdp_cmd("Network.enable", {})

    credentials = base64.b64encode("admin:admin".encode()).decode()
    headers = {'headers': {'authorization': 'Basic ' + credentials}}

    driver.execute_cdp_cmd('Network.setExtraHTTPHeaders', headers)
            ((ChromeDriver)driver).ExecuteCdpCommand("Network.enable", emptyDictionary);
            
            string encodedAuth = Convert.ToBase64String(Encoding.Default.GetBytes("admin:admin"));
            var headers = new Dictionary<string, object>
            {
                { "headers", new Dictionary<string, string> { { "authorization", "Basic " + encodedAuth } } }
            };

            ((ChromeDriver)driver).ExecuteCdpCommand("Network.setExtraHTTPHeaders", headers);

The BiDi API Basic Authentication implementation should be preferred

    driver.execute_cdp('Network.enable')

    credentials = Base64.strict_encode64('admin:admin')
    headers = {authorization: "Basic #{credentials}"}

    driver.execute_cdp('Network.setExtraHTTPHeaders', headers: headers)

8.1.2 - Chrome DevTools Protocol API

Each of the Selenium bindings dynamically generates classes and methods for the various CDP domains and features; these are tied to specific versions of Chrome.

While Selenium 4 provides direct access to the Chrome DevTools Protocol (CDP), these methods will eventually be removed. It is recommended to use the WebDriver Bidi APIs methods where possible to ensure future compatibility.

Usage

If your use case has been implemented by WebDriver Bidi or the BiDi API, you should use those implementations instead of this one. Generally you should prefer this approach over executing with the CDP Endpoint, especially in Ruby.

Examples

An alternate implementation can be found at CDP Endpoint Set Cookie

Because Java requires using all the parameters example, the Map approach used in CDP Endpoint Set Cookie might be more simple.

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();

    devTools.send(
        Network.setCookie(
            "cheese",
            "gouda",
            Optional.empty(),
            Optional.of("www.selenium.dev"),
            Optional.empty(),
            Optional.of(true),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty()));

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Set Cookie might be easier.

    async with driver.bidi_connection() as connection:
        execution = connection.devtools.network.set_cookie(
            name="cheese",
            value="gouda",
            domain="www.selenium.dev",
            secure=True
        )

        await connection.session.execute(execution)

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Set Cookie might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V124.DevToolsSessionDomains>();
            await domains.Network.Enable(new OpenQA.Selenium.DevTools.V124.Network.EnableCommandSettings());

            var cookieCommandSettings = new SetCookieCommandSettings
            {
                Name = "cheese",
                Value = "gouda",
                Domain = "www.selenium.dev",
                Secure = true
            };

            await domains.Network.SetCookie(cookieCommandSettings);
    driver.devtools.network.set_cookie(name: 'cheese',
                                       value: 'gouda',
                                       domain: 'www.selenium.dev',
                                       secure: true)

Performance Metrics

An alternate implementation can be found at CDP Endpoint Performance Metrics

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Performance.enable(Optional.empty()));

    List<Metric> metricList = devTools.send(Performance.getMetrics());

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Performance Metrics might be easier.

    async with driver.bidi_connection() as connection:
        await connection.session.execute(connection.devtools.performance.enable())

        metric_list = await connection.session.execute(connection.devtools.performance.get_metrics())

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Performance Metrics might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V124.DevToolsSessionDomains>();
            await domains.Performance.Enable(new OpenQA.Selenium.DevTools.V124.Performance.EnableCommandSettings());

            var metricsResponse =
                await session.SendCommand<GetMetricsCommandSettings, GetMetricsCommandResponse>(
                    new GetMetricsCommandSettings()
                );
    driver.devtools.performance.enable

    metric_list = driver.devtools.performance.get_metrics.dig('result', 'metrics')

Basic authentication

Alternate implementations can be found at CDP Endpoint Basic Authentication and BiDi API Basic Authentication

The BiDi API Basic Authentication implementation should be preferred

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Network.enable(Optional.of(100000), Optional.of(100000), Optional.of(100000)));

    String encodedAuth = Base64.getEncoder().encodeToString("admin:admin".getBytes());
    Map<String, Object> headers = ImmutableMap.of("Authorization", "Basic " + encodedAuth);

    devTools.send(Network.setExtraHTTPHeaders(new Headers(headers)));

Because Python requires using async methods for this example, the synchronous approach found in CDP Endpoint Basic Authentication might be easier.

    async with driver.bidi_connection() as connection:
        await connection.session.execute(connection.devtools.network.enable())

        credentials = base64.b64encode("admin:admin".encode()).decode()
        auth = {'authorization': 'Basic ' + credentials}

        await connection.session.execute(connection.devtools.network.set_extra_http_headers(Headers(auth)))

Due to the added complexity in .NET of obtaining the domains and executing with awaits, the CDP Endpoint Basic Authentication might be easier.

            var session = ((IDevTools)driver).GetDevToolsSession();
            var domains = session.GetVersionSpecificDomains<OpenQA.Selenium.DevTools.V124.DevToolsSessionDomains>();
            await domains.Network.Enable(new OpenQA.Selenium.DevTools.V124.Network.EnableCommandSettings());

            var encodedAuth = Convert.ToBase64String(Encoding.Default.GetBytes("admin:admin"));
            var headerSettings = new SetExtraHTTPHeadersCommandSettings
            {
                Headers = new Headers()
                {
                    { "authorization", "Basic " + encodedAuth }
                }
            };

            await domains.Network.SetExtraHTTPHeaders(headerSettings);

The BiDi API Basic Authentication implementation should be preferred

    driver.devtools.network.enable

    credentials = Base64.strict_encode64('admin:admin')

    driver.devtools.network.set_extra_http_headers(headers: {authorization: "Basic #{credentials}"})

Console logs

Because reading console logs requires setting an event listener, this cannot be done with a CDP Endpoint implementation Alternate implementations can be found at BiDi API Console logs and errors and WebDriver BiDi Console logs

Use the WebDriver BiDi Console logs implementation

    DevTools devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Runtime.enable());

    CopyOnWriteArrayList<String> logs = new CopyOnWriteArrayList<>();
    devTools.addListener(
        Runtime.consoleAPICalled(),
        event -> logs.add((String) event.getArgs().get(0).getValue().orElse("")));

The BiDi API Console logs and errors implementation should be preferred

    driver.devtools.runtime.enable

    logs = []
    driver.devtools.runtime.on(:console_api_called) do |params|
      logs << params['args'].first['value']
    end

JavaScript exceptions

Similar to console logs, but this listens for actual javascript exceptions not just logged errors Alternate implementations can be found at BiDi API JavaScript exceptions and WebDriver BiDi JavaScript exceptions

Use the WebDriver BiDi JavaScript exceptions implementation

    DevTools devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(Runtime.enable());

    CopyOnWriteArrayList<JavascriptException> errors = new CopyOnWriteArrayList<>();
    devTools.getDomains().events().addJavascriptExceptionListener(errors::add);

Download complete

Wait for a download to finish before continuing. Because getting download status requires setting a listener, this cannot be done with a CDP Endpoint implementation.

    devTools = ((HasDevTools) driver).getDevTools();
    devTools.createSession();
    devTools.send(
        Browser.setDownloadBehavior(
            Browser.SetDownloadBehaviorBehavior.ALLOWANDNAME,
            Optional.empty(),
            Optional.of(""),
            Optional.of(true)));

    AtomicBoolean completed = new AtomicBoolean(false);
    devTools.addListener(
        Browser.downloadProgress(),
        e -> completed.set(Objects.equals(e.getState().toString(), "completed")));
    driver.devtools.browser.set_download_behavior(behavior: 'allow',
                                                  download_path: '',
                                                  events_enabled: true)

    driver.devtools.browser.on(:download_progress) do |progress|
      @completed = progress['state'] == 'completed'
    end

8.1.3 - Chrome Devtools Protocol with BiDi API

These examples are currently implemented with CDP, but the same code should work when the functionality is re-implemented with WebDriver-BiDi.

Usage

The following list of APIs will be growing as the Selenium project works through supporting real world use cases. If there is additional functionality you’d like to see, please raise a feature request.

As these examples are re-implemented with the WebDriver-Bidi protocol, they will be moved to the WebDriver Bidi pages.

Examples

Basic authentication

Some applications make use of browser authentication to secure pages. It used to be common to handle them in the URL, but browser stopped supporting this. With BiDi, you can now provide the credentials when necessary

Alternate implementations can be found at CDP Endpoint Basic Authentication and CDP API Basic Authentication

    Predicate<URI> uriPredicate = uri -> uri.toString().contains("herokuapp.com");
    Supplier<Credentials> authentication = UsernameAndPassword.of("admin", "admin");

    ((HasAuthentication) driver).register(uriPredicate, authentication);

An alternate implementation may be found at CDP Endpoint Basic Authentication

Implementation Missing

            var handler = new NetworkAuthenticationHandler()
            {
                UriMatcher = uri => uri.AbsoluteUri.Contains("herokuapp"),
                Credentials = new PasswordCredentials("admin", "admin")
            };

            var networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddAuthenticationHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.register(username: 'admin',
                    password: 'admin',
                    uri: /herokuapp/)

Move Code

const {Builder} = require('selenium-webdriver');

(async function example() {
  try {
    let driver = await new Builder()
      .forBrowser('chrome')
      .build();

    const pageCdpConnection = await driver.createCDPConnection('page');
    await driver.register('username', 'password', pageCdpConnection);
    await driver.get('https://the-internet.herokuapp.com/basic_auth');
    await driver.quit();
  }catch (e){
    console.log(e)
  }
}())

Move Code

val uriPredicate = Predicate { uri: URI ->
        uri.host.contains("your-domain.com")
    }
(driver as HasAuthentication).register(uriPredicate, UsernameAndPassword.of("admin", "password"))
driver.get("https://your-domain.com/login")

Pin scripts

This can be especially useful when executing on a remote server. For example, whenever you check the visibility of an element, or whenever you use the classic get attribute method, Selenium is sending the contents of a js file to the script execution endpoint. These files are each about 50kB, which adds up.

            var key = await new JavaScriptEngine(driver).PinScript("return arguments;");

            var arguments = ((WebDriver)driver).ExecuteScript(key, 1, true, element);
    key = driver.pin_script('return arguments;')
    arguments = driver.execute_script(key, 1, true, element)

Mutation observation

Mutation Observation is the ability to capture events via WebDriver BiDi when there are DOM mutations on a specific element in the DOM.

    CopyOnWriteArrayList<WebElement> mutations = new CopyOnWriteArrayList<>();
    ((HasLogEvents) driver).onLogEvent(domMutation(e -> mutations.add(e.getElement())));
    async with driver.bidi_connection() as session:
        log = Log(driver, session)
            var mutations = new List<IWebElement>();
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            monitor.DomMutated += (_, e) =>
            {
                var locator = By.CssSelector($"*[data-__webdriver_id='{e.AttributeData.TargetId}']");
                mutations.Add(driver.FindElement(locator));
            };

            await monitor.StartEventMonitoring();
            await monitor.EnableDomMutationMonitoring();
    mutations = []
    driver.on_log_event(:mutation) { |mutation| mutations << mutation.element }

Move Code

const {Builder, until} = require('selenium-webdriver');
const assert = require("assert");

(async function example() {
  try {
    let driver = await new Builder()
      .forBrowser('chrome')
      .build();

    const cdpConnection = await driver.createCDPConnection('page');
    await driver.logMutationEvents(cdpConnection, event => {
      assert.deepStrictEqual(event['attribute_name'], 'style');
      assert.deepStrictEqual(event['current_value'], "");
      assert.deepStrictEqual(event['old_value'], "display:none;");
    });

    await driver.get('dynamic.html');
    await driver.findElement({id: 'reveal'}).click();
    let revealed = driver.findElement({id: 'revealed'});
    await driver.wait(until.elementIsVisible(revealed), 5000);
    await driver.quit();
  }catch (e){
    console.log(e)
  }
}())

Console logs and errors

Listen to the console.log events and register callbacks to process the event.

CDP API Console logs and WebDriver BiDi Console logs

Use the WebDriver BiDi Console logs implementation. HasLogEvents will likely end up deprecated because it does not implement Closeable.

    CopyOnWriteArrayList<String> messages = new CopyOnWriteArrayList<>();
    ((HasLogEvents) driver).onLogEvent(consoleEvent(e -> messages.add(e.getMessages().get(0))));
    async with driver.bidi_connection() as session:
        log = Log(driver, session)

        async with log.add_listener(Console.ALL) as messages:
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            var messages = new List<string>();
            monitor.JavaScriptConsoleApiCalled += (_, e) =>
            {
                messages.Add(e.MessageContent);
            };

            await monitor.StartEventMonitoring();
    logs = []
    driver.on_log_event(:console) { |log| logs << log.args.first }

Move Code

const {Builder} = require('selenium-webdriver');
(async () => {
  try {
    let driver = new Builder()
      .forBrowser('chrome')
      .build();

    const cdpConnection = await driver.createCDPConnection('page');
    await driver.onLogEvent(cdpConnection, function (event) {
      console.log(event['args'][0]['value']);
    });
    await driver.executeScript('console.log("here")');
    await driver.quit();
  }catch (e){
    console.log(e);
  }
})()

Move Code

fun kotlinConsoleLogExample() {
    val driver = ChromeDriver()
    val devTools = driver.devTools
    devTools.createSession()

    val logConsole = { c: ConsoleEvent -> print("Console log message is: " + c.messages)}
    devTools.domains.events().addConsoleListener(logConsole)

    driver.get("https://www.google.com")

    val executor = driver as JavascriptExecutor
    executor.executeScript("console.log('Hello World')")

    val input = driver.findElement(By.name("q"))
    input.sendKeys("Selenium 4")
    input.sendKeys(Keys.RETURN)
    driver.quit()
}

JavaScript exceptions

Listen to the JS Exceptions and register callbacks to process the exception details.

    async with driver.bidi_connection() as session:
        log = Log(driver, session)

        async with log.add_js_error_listener() as messages:
            using IJavaScriptEngine monitor = new JavaScriptEngine(driver);
            var messages = new List<string>();
            monitor.JavaScriptExceptionThrown += (_, e) =>
            {
                messages.Add(e.Message);
            };

            await monitor.StartEventMonitoring();
    exceptions = []
    driver.on_log_event(:exception) { |exception| exceptions << exception }

Network Interception

Both requests and responses can be recorded or transformed.

Response information

    CopyOnWriteArrayList<String> contentType = new CopyOnWriteArrayList<>();

    try (NetworkInterceptor ignored =
        new NetworkInterceptor(
            driver,
            (Filter)
                next ->
                    req -> {
                      HttpResponse res = next.execute(req);
                      contentType.add(res.getHeader("Content-Type"));
                      return res;
                    })) {
            var contentType = new List<string>();

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.NetworkResponseReceived += (_, e)  =>
            {
                contentType.Add(e.ResponseHeaders["content-type"]);
            };

            await networkInterceptor.StartMonitoring();
    content_type = []
    driver.intercept do |request, &continue|
      continue.call(request) do |response|
        content_type << response.headers['content-type']
      end
    end

Response transformation

    try (NetworkInterceptor ignored =
        new NetworkInterceptor(
            driver,
            Route.matching(req -> true)
                .to(
                    () ->
                        req ->
                            new HttpResponse()
                                .setStatus(200)
                                .addHeader("Content-Type", MediaType.HTML_UTF_8.toString())
                                .setContent(Contents.utf8String("Creamy, delicious cheese!"))))) {
            var handler = new NetworkResponseHandler()
            {
                ResponseMatcher = _ => true,
                ResponseTransformer = _ => new HttpResponseData
                {
                    StatusCode = 200,
                    Body = "Creamy, delicious cheese!"
                }
            };

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddResponseHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.intercept do |request, &continue|
      continue.call(request) do |response|
        response.body = 'Creamy, delicious cheese!' if request.url.include?('blank')
      end
    end

Move Code

const connection = await driver.createCDPConnection('page')
let url = fileServer.whereIs("/cheese")
let httpResponse = new HttpResponse(url)
httpResponse.addHeaders("Content-Type", "UTF-8")
httpResponse.body = "sausages"
await driver.onIntercept(connection, httpResponse, async function () {
  let body = await driver.getPageSource()
  assert.strictEqual(body.includes("sausages"), true, `Body contains: ${body}`)
})
driver.get(url)

Move Code

val driver = ChromeDriver()
val interceptor = new NetworkInterceptor(
      driver,
      Route.matching(req -> true)
        .to(() -> req -> new HttpResponse()
          .setStatus(200)
          .addHeader("Content-Type", MediaType.HTML_UTF_8.toString())
          .setContent(utf8String("Creamy, delicious cheese!"))))

    driver.get(appServer.whereIs("/cheese"))

    String source = driver.getPageSource()

Request interception

            var handler = new NetworkRequestHandler
            {
                RequestMatcher = request => request.Url.Contains("one.js"),
                RequestTransformer = request =>
                {
                    request.Url = request.Url.Replace("one", "two");

                    return request;
                }
            };

            INetwork networkInterceptor = driver.Manage().Network;
            networkInterceptor.AddRequestHandler(handler);

            await networkInterceptor.StartMonitoring();
    driver.intercept do |request, &continue|
      uri = URI(request.url)
      request.url = uri.to_s.gsub('one', 'two') if uri.path&.end_with?('one.js')
      continue.call(request)
    end

8.2 - BiDirectional API (W3C compliant)

The following list of APIs will be growing as the WebDriver BiDirectional Protocol grows and browser vendors implement the same. Additionally, Selenium will try to support real-world use cases that internally use a combination of W3C BiDi protocol APIs.

If there is additional functionality you’d like to see, please raise a feature request.

8.2.1 - Browsing Context

This section contains the APIs related to browsing context commands.

Open a new window

Creates a new browsing context in a new window.

Selenium v4.8

    void testCreateAWindow() {
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.WINDOW);
        Assertions.assertNotNull(browsingContext.getId());
    }

Selenium v4.8

            const browsingContext = await BrowsingContext(driver, {
                type: 'window',
            })

Open a new tab

Creates a new browsing context in a new tab.

Selenium v4.8

    void testCreateATab() {
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);
        Assertions.assertNotNull(browsingContext.getId());
    }

Selenium v4.8

            const browsingContext = await BrowsingContext(driver, {
                type: 'tab',
            })

Use existing window handle

Creates a browsing context for the existing tab/window to run commands.

Selenium v4.8

    void testCreateABrowsingContextForGivenId() {
        String id = driver.getWindowHandle();
        BrowsingContext browsingContext = new BrowsingContext(driver, id);
        Assertions.assertEquals(id, browsingContext.getId());
    }

Selenium v4.8

            const id = await driver.getWindowHandle()
            const browsingContext = await BrowsingContext(driver, {
                browsingContextId: id,
            })

Open a window with a reference browsing context

A reference browsing context is a top-level browsing context. The API allows to pass the reference browsing context, which is used to create a new window. The implementation is operating system specific.

Selenium v4.8

    void testCreateAWindowWithAReferenceContext() {
        BrowsingContext
                browsingContext =
                new BrowsingContext(driver, WindowType.WINDOW, driver.getWindowHandle());
        Assertions.assertNotNull(browsingContext.getId());
    }

Selenium v4.8

            const browsingContext = await BrowsingContext(driver, {
                type: 'window',
                referenceContext: await driver.getWindowHandle(),
            })

Open a tab with a reference browsing context

A reference browsing context is a top-level browsing context. The API allows to pass the reference browsing context, which is used to create a new tab. The implementation is operating system specific.

Selenium v4.8

    void testCreateATabWithAReferenceContext() {
        BrowsingContext
                browsingContext =
                new BrowsingContext(driver, WindowType.TAB, driver.getWindowHandle());
        Assertions.assertNotNull(browsingContext.getId());
    }

Selenium v4.8

            const browsingContext = await BrowsingContext(driver, {
                type: 'tab',
                referenceContext: await driver.getWindowHandle(),
            })

Selenium v4.8

    void testNavigateToAUrl() {
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

        NavigationResult info = browsingContext.navigate("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");

        Assertions.assertNotNull(browsingContext.getId());
        Assertions.assertNotNull(info.getNavigationId());
        Assertions.assertTrue(info.getUrl().contains("/bidi/logEntryAdded.html"));
    }

Selenium v4.8

            let info = await browsingContext.navigate('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')

Selenium v4.8

    void testNavigateToAUrlWithReadinessState() {
        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

        NavigationResult info = browsingContext.navigate("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html",
                ReadinessState.COMPLETE);

        Assertions.assertNotNull(browsingContext.getId());
        Assertions.assertNotNull(info.getNavigationId());
        Assertions.assertTrue(info.getUrl().contains("/bidi/logEntryAdded.html"));
    }

Selenium v4.8

            const info = await browsingContext.navigate(
                'https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html',
                'complete'
            )

Get browsing context tree

Provides a tree of all browsing contexts descending from the parent browsing context, including the parent browsing context.

Selenium v4.8

    void testGetTreeWithAChild() {
        String referenceContextId = driver.getWindowHandle();
        BrowsingContext parentWindow = new BrowsingContext(driver, referenceContextId);

        parentWindow.navigate("https://www.selenium.dev/selenium/web/iframes.html", ReadinessState.COMPLETE);

        List<BrowsingContextInfo> contextInfoList = parentWindow.getTree();

        Assertions.assertEquals(1, contextInfoList.size());
        BrowsingContextInfo info = contextInfoList.get(0);
        Assertions.assertEquals(1, info.getChildren().size());
        Assertions.assertEquals(referenceContextId, info.getId());
        Assertions.assertTrue(info.getChildren().get(0).getUrl().contains("formPage.html"));
    }

Selenium v4.8

            const browsingContextId = await driver.getWindowHandle()
            const parentWindow = await BrowsingContext(driver, {
                browsingContextId: browsingContextId,
            })
            await parentWindow.navigate('https://www.selenium.dev/selenium/web/iframes.html', 'complete')

            const contextInfo = await parentWindow.getTree()

Get browsing context tree with depth

Provides a tree of all browsing contexts descending from the parent browsing context, including the parent browsing context upto the depth value passed.

Selenium v4.8

    void testGetTreeWithDepth() {
        String referenceContextId = driver.getWindowHandle();
        BrowsingContext parentWindow = new BrowsingContext(driver, referenceContextId);

        parentWindow.navigate("https://www.selenium.dev/selenium/web/iframes.html", ReadinessState.COMPLETE);

        List<BrowsingContextInfo> contextInfoList = parentWindow.getTree(0);

        Assertions.assertEquals(1, contextInfoList.size());
        BrowsingContextInfo info = contextInfoList.get(0);
        Assertions.assertNull(info.getChildren()); // since depth is 0
        Assertions.assertEquals(referenceContextId, info.getId());
    }

Selenium v4.8

            const browsingContextId = await driver.getWindowHandle()
            const parentWindow = await BrowsingContext(driver, {
                browsingContextId: browsingContextId,
            })
            await parentWindow.navigate('https://www.selenium.dev/selenium/web/iframes.html', 'complete')

            const contextInfo = await parentWindow.getTree(0)

Get All Top level browsing contexts

Selenium v4.8

    void testGetAllTopLevelContexts() {
        BrowsingContext window1 = new BrowsingContext(driver, driver.getWindowHandle());
        BrowsingContext window2 = new BrowsingContext(driver, WindowType.WINDOW);

        List<BrowsingContextInfo> contextInfoList = window1.getTopLevelContexts();

        Assertions.assertEquals(2, contextInfoList.size());
    }

Selenium v4.20.0

            const id = await driver.getWindowHandle()
            const window1 = await BrowsingContext(driver, {
                browsingContextId: id,
            })
            await BrowsingContext(driver, { type: 'window' })
            const res = await window1.getTopLevelContexts()

Close a tab/window

Selenium v4.8

    void testCloseAWindow() {
        BrowsingContext window1 = new BrowsingContext(driver, WindowType.WINDOW);
        BrowsingContext window2 = new BrowsingContext(driver, WindowType.WINDOW);

        window2.close();

        Assertions.assertThrows(BiDiException.class, window2::getTree);
    }

    @Test
    void testCloseATab() {
        BrowsingContext tab1 = new BrowsingContext(driver, WindowType.TAB);
        BrowsingContext tab2 = new BrowsingContext(driver, WindowType.TAB);

        tab2.close();

        Assertions.assertThrows(BiDiException.class, tab2::getTree);
    }

Selenium v4.8

            const window1 = await BrowsingContext(driver, {type: 'window'})
            const window2 = await BrowsingContext(driver, {type: 'window'})

            await window2.close()

Activate a browsing context

Selenium v4.14.1

        BrowsingContext window1 = new BrowsingContext(driver, driver.getWindowHandle());
        BrowsingContext window2 = new BrowsingContext(driver, WindowType.WINDOW);

        window1.activate();

Selenium v4.15

            const window1 = await BrowsingContext(driver, {
                browsingContextId: id,
            })

            await BrowsingContext(driver, {
                type: 'window',
            })

            const result = await driver.executeScript('return document.hasFocus();')

            assert.equal(result, false)

            await window1.activate()

Reload a browsing context

Selenium v4.13.0

        BrowsingContext browsingContext = new BrowsingContext(driver, WindowType.TAB);

        browsingContext.navigate("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html", ReadinessState.COMPLETE);

        NavigationResult reloadInfo = browsingContext.reload(ReadinessState.INTERACTIVE);

Selenium v4.15

            await browsingContext.reload(undefined, 'complete')

Handle user prompt

Selenium v4.13.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());

        driver.get("https://www.selenium.dev/selenium/web/alerts.html");

        driver.findElement(By.id("prompt-with-default")).click();

        String userText = "Selenium automates browsers";
        browsingContext.handleUserPrompt(true, userText);

Selenium v4.15

            await browsingContext.handleUserPrompt(true, userText)

Capture Screenshot

Selenium v4.13.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());

        driver.get("https://www.selenium.dev/selenium/web/alerts.html");

        String screenshot = browsingContext.captureScreenshot();

Selenium v4.15

            const response = await browsingContext.captureScreenshot()

Capture Viewport Screenshot

Selenium v4.14.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());

        driver.get("https://www.selenium.dev/selenium/web/coordinates_tests/simple_page.html");

        WebElement element = driver.findElement(By.id("box"));
        Rectangle elementRectangle = element.getRect();

        String screenshot =
                browsingContext.captureBoxScreenshot(
                        elementRectangle.getX(), elementRectangle.getY(), 5, 5);

Selenium v4.15

            const browsingContext = await BrowsingContext(driver, {
                browsingContextId: id,
            })

            const response = await browsingContext.captureBoxScreenshot(5, 5, 10, 10)

Capture Element Screenshot

Selenium v4.14.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());

        driver.get("https://www.selenium.dev/selenium/web/formPage.html");
        WebElement element = driver.findElement(By.id("checky"));

        String screenshot = browsingContext.captureElementScreenshot(((RemoteWebElement) element).getId());

Selenium v4.15

            const response = await browsingContext.captureElementScreenshot(elementId)

Set Viewport

Selenium v4.14.1

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());
        driver.get("https://www.selenium.dev/selenium/web/formPage.html");

        browsingContext.setViewport(250, 300, 5);

Selenium v4.15

            await browsingContext.setViewport(250, 300)

Selenium v4.14.1

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());

        driver.get("https://www.selenium.dev/selenium/web/formPage.html");
        PrintOptions printOptions = new PrintOptions();

        String printPage = browsingContext.print(printOptions);

Selenium v4.10

            const result = await browsingContext.printPage({
                orientation: 'landscape',
                scale: 1,
                background: true,
                width: 30,
                height: 30,
                top: 1,
                bottom: 1,
                left: 1,
                right: 1,
                shrinkToFit: true,
                pageRanges: ['1-2'],
            })

Selenium v4.16.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());
        browsingContext.navigate("https://www.selenium.dev/selenium/web/formPage.html", ReadinessState.COMPLETE);

        wait.until(visibilityOfElementLocated(By.id("imageButton"))).submit();
        wait.until(titleIs("We Arrive Here"));

        browsingContext.back();

Selenium v4.17

            await browsingContext.back()

Selenium v4.16.0

    void canNavigateForwardInTheBrowserHistory() {
        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());
        browsingContext.navigate("https://www.selenium.dev/selenium/web/formPage.html", ReadinessState.COMPLETE);

        wait.until(visibilityOfElementLocated(By.id("imageButton"))).submit();
        wait.until(titleIs("We Arrive Here"));

        browsingContext.back();
        Assertions.assertTrue(driver.getPageSource().contains("We Leave From Here"));

        browsingContext.forward();

Selenium v4.17

            await browsingContext.forward()

Traverse history

Selenium v4.16.0

        BrowsingContext browsingContext = new BrowsingContext(driver, driver.getWindowHandle());
        browsingContext.navigate("https://www.selenium.dev/selenium/web/formPage.html", ReadinessState.COMPLETE);

        wait.until(visibilityOfElementLocated(By.id("imageButton"))).submit();
        wait.until(titleIs("We Arrive Here"));

        browsingContext.traverseHistory(-1);

Selenium v4.17

            await browsingContext.traverseHistory(-1)

8.2.2 - Input

This section contains the APIs related to input commands.

Perform Actions

Selenium v4.17

        Actions selectThreeOptions =
                actions.click(options.get(1)).keyDown(Keys.SHIFT).click(options.get(3)).keyUp(Keys.SHIFT);

        input.perform(windowHandle, selectThreeOptions.getSequences());

Selenium v4.17

            const actions = driver.actions().click(options[1]).keyDown(Key.SHIFT).click(options[3]).keyUp(Key.SHIFT).getSequences()

            await input.perform(browsingContextId, actions)

Release Actions

Selenium v4.17

        Actions sendLowercase =
                new Actions(driver).keyDown(inputTextBox, "a").keyDown(inputTextBox, "b");

        input.perform(windowHandle, sendLowercase.getSequences());
        ((JavascriptExecutor) driver).executeScript("resetEvents()");

        input.release(windowHandle);

Selenium v4.17

            await input.release(browsingContextId)

8.2.3 - Log

This section contains the APIs related to logging.

Console logs

Listen to the console.log events and register callbacks to process the event.

Selenium v4.8

    public void jsErrors() {
        CopyOnWriteArrayList<ConsoleLogEntry> logs = new CopyOnWriteArrayList<>();

        try (LogInspector logInspector = new LogInspector(driver)) {
            logInspector.onConsoleEntry(logs::add);
        }

        driver.get("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");
            const inspector = await LogInspector(driver)
            await inspector.onConsoleEntry(function (log) {
                logEntry = log
            })

            await driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
            await driver.findElement({id: 'consoleLog'}).click()

            assert.equal(logEntry.text, 'Hello, world!')
            assert.equal(logEntry.realm, null)
            assert.equal(logEntry.type, 'console')
            assert.equal(logEntry.level, 'info')
            assert.equal(logEntry.method, 'log')
            assert.equal(logEntry.stackTrace, null)
            assert.equal(logEntry.args.length, 1)

JavaScript exceptions

Listen to the JS Exceptions and register callbacks to process the exception details.

            logInspector.onJavaScriptLog(future::complete);

            driver.get("https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html");
            driver.findElement(By.id("jsException")).click();

            JavascriptLogEntry logEntry = future.get(5, TimeUnit.SECONDS);

            Assertions.assertEquals("Error: Not working", logEntry.getText());
            const inspector = await LogInspector(driver)
            await inspector.onJavascriptException(function (log) {
                logEntry = log
            })

            await driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
            await driver.findElement({id: 'jsException'}).click()

            assert.equal(logEntry.text, 'Error: Not working')
            assert.equal(logEntry.type, 'javascript')
            assert.equal(logEntry.level, 'error')

Listen to JS Logs

Listen to all JS logs at all levels and register callbacks to process the log.

Selenium v4.8

            driver.findElement(By.id("consoleLog")).click();

            ConsoleLogEntry logEntry = future.get(5, TimeUnit.SECONDS);

            Assertions.assertEquals("Hello, world!", logEntry.getText());
            Assertions.assertNull(logEntry.getRealm());
            Assertions.assertEquals(1, logEntry.getArgs().size());
            Assertions.assertEquals("console", logEntry.getType());

8.2.4 - Network

This section contains the APIs related to network commands.

Add network intercept

Selenium v4.18

        try (Network network = new Network(driver)) {
            String intercept =
                    network.addIntercept(new AddInterceptParameters(InterceptPhase.BEFORE_REQUEST_SENT));

Selenium v4.18

            const intercept = await network.addIntercept(new AddInterceptParameters(InterceptPhase.BEFORE_REQUEST_SENT))

Remove network intercept

Selenium v4.18

        try (Network network = new Network(driver)) {
            String intercept =
                    network.addIntercept(new AddInterceptParameters(InterceptPhase.BEFORE_REQUEST_SENT));
            Assertions.assertNotNull(intercept);
            network.removeIntercept(intercept);

Selenium v4.18

            const network = await Network(driver)
            const intercept = await network.addIntercept(new AddInterceptParameters(InterceptPhase.BEFORE_REQUEST_SENT))

Continue request blocked at authRequired phase with credentials

Selenium v4.18

        try (Network network = new Network(driver)) {
            network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED));
            network.onAuthRequired(
                    responseDetails ->
                            network.continueWithAuth(
                                    responseDetails.getRequest().getRequestId(),
                                    new UsernameAndPassword("admin", "admin")));
            driver.get("https://the-internet.herokuapp.com/basic_auth");

Selenium v4.18

            await network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED))

            await network.authRequired(async (event) => {
                await network.continueWithAuth(event.request.request, 'admin','admin')
            })

Continue request blocked at authRequired phase without credentials

Selenium v4.18

        try (Network network = new Network(driver)) {
            network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED));
            network.onAuthRequired(
                    responseDetails ->
                            // Does not handle the alert
                            network.continueWithAuthNoCredentials(responseDetails.getRequest().getRequestId()));
            driver.get("https://the-internet.herokuapp.com/basic_auth");

Selenium v4.18

            await network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED))

            await network.authRequired(async (event) => {
                await network.continueWithAuthNoCredentials(event.request.request)
            })

Cancel request blocked at authRequired phase

Selenium v4.18

        try (Network network = new Network(driver)) {
            network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED));
            network.onAuthRequired(
                    responseDetails ->
                            // Does not handle the alert
                            network.cancelAuth(responseDetails.getRequest().getRequestId()));
            driver.get("https://the-internet.herokuapp.com/basic_auth");

Selenium v4.18

            await network.addIntercept(new AddInterceptParameters(InterceptPhase.AUTH_REQUIRED))

            await network.authRequired(async (event) => {
                await network.cancelAuth(event.request.request)
            })

Fail request

Selenium v4.18

        try (Network network = new Network(driver)) {
            network.addIntercept(new AddInterceptParameters(InterceptPhase.BEFORE_REQUEST_SENT));
            network.onBeforeRequestSent(
                    responseDetails -> network.failRequest(responseDetails.getRequest().getRequestId()));
            driver.manage().timeouts().pageLoadTimeout(Duration.of(5, ChronoUnit.SECONDS));

8.2.5 - Script

This section contains the APIs related to script commands.

Call function in a browsing context

Selenium v4.15

        try (Script script = new Script(id, driver)) {
            List<LocalValue> arguments = new ArrayList<>();
            arguments.add(PrimitiveProtocolValue.numberValue(22));

            Map<Object, LocalValue> value = new HashMap<>();
            value.put("some_property", LocalValue.numberValue(42));
            LocalValue thisParameter = LocalValue.objectValue(value);

            arguments.add(thisParameter);

            EvaluateResult result =
                    script.callFunctionInBrowsingContext(
                            id,
                            "function processWithPromise(argument) {\n"
                                    + "  return new Promise((resolve, reject) => {\n"
                                    + "    setTimeout(() => {\n"
                                    + "      resolve(argument + this.some_property);\n"
                                    + "    }, 1000)\n"
                                    + "  })\n"
                                    + "}",
                            true,
                            Optional.of(arguments),
                            Optional.of(thisParameter),
                            Optional.of(ResultOwnership.ROOT));

Selenium v4.9

            const manager = await ScriptManager(id, driver)

            let argumentValues = []
            let value = new ArgumentValue(LocalValue.createNumberValue(22))
            argumentValues.push(value)

            let mapValue = {some_property: LocalValue.createNumberValue(42)}
            let thisParameter = new ArgumentValue(LocalValue.createObjectValue(mapValue)).asMap()

            const result = await manager.callFunctionInBrowsingContext(
                id,
                'function processWithPromise(argument) {' +
                'return new Promise((resolve, reject) => {' +
                'setTimeout(() => {' +
                'resolve(argument + this.some_property);' +
                '}, 1000)' +
                '})' +
                '}',
                true,
                argumentValues,
                thisParameter,
                ResultOwnership.ROOT)

Call function in a sandbox

Selenium v4.15

        try (Script script = new Script(id, driver)) {
            EvaluateResult result =
                    script.callFunctionInBrowsingContext(
                            id,
                            "sandbox",
                            "() => window.foo",
                            true,
                            Optional.empty(),
                            Optional.empty(),
                            Optional.empty());

Selenium v4.9

            const manager = await ScriptManager(id, driver)

            await manager.callFunctionInBrowsingContext(id, '() => { window.foo = 2; }', true, null, null, null, 'sandbox')

Call function in a realm

Selenium v4.15

        try (Script script = new Script(tab, driver)) {
            List<RealmInfo> realms = script.getAllRealms();
            String realmId = realms.get(0).getRealmId();

            EvaluateResult result = script.callFunctionInRealm(
                    realmId,
                    "() => { window.foo = 3; }",
                    true,
                    Optional.empty(),
                    Optional.empty(),
                    Optional.empty());

Selenium v4.9

            const manager = await ScriptManager(firstTab, driver)

            const realms = await manager.getAllRealms()
            const realmId = realms[0].realmId

            await manager.callFunctionInRealm(realmId, '() => { window.foo = 3; }', true)

Evaluate script in a browsing context

Selenium v4.15

        try (Script script = new Script(id, driver)) {
            EvaluateResult result =
                    script.evaluateFunctionInBrowsingContext(id, "1 + 2", true, Optional.empty());

Selenium v4.9

            const manager = await ScriptManager(id, driver)

            const result = await manager.evaluateFunctionInBrowsingContext(id, '1 + 2', true)

Evaluate script in a sandbox

Selenium v4.15

        try (Script script = new Script(id, driver)) {
            EvaluateResult result =
                    script.evaluateFunctionInBrowsingContext(
                            id, "sandbox", "window.foo", true, Optional.empty());

Selenium v4.9

            const manager = await ScriptManager(id, driver)

            await manager.evaluateFunctionInBrowsingContext(id, 'window.foo = 2', true, null, 'sandbox')

            const resultInSandbox = await manager.evaluateFunctionInBrowsingContext(id, 'window.foo', true, null, 'sandbox')

Evaluate script in a realm

Selenium v4.15

        try (Script script = new Script(tab, driver)) {
            List<RealmInfo> realms = script.getAllRealms();
            String realmId = realms.get(0).getRealmId();

            EvaluateResult result =
                    script.evaluateFunctionInRealm(
                            realmId, "window.foo", true, Optional.empty());

Selenium v4.9

            const manager = await ScriptManager(firstTab, driver)

            const realms = await manager.getAllRealms()
            const realmId = realms[0].realmId

            await manager.evaluateFunctionInRealm(realmId, 'window.foo = 3', true)

            const result = await manager.evaluateFunctionInRealm(realmId, 'window.foo', true)

Disown handles in a browsing context

Selenium v4.15

            script.disownBrowsingContextScript(

Selenium v4.9

            await manager.disownBrowsingContextScript(id, boxId)

Disown handles in a realm

Selenium v4.15

            script.disownRealmScript(realmId, List.of(boxId));

Selenium v4.9

            await manager.disownRealmScript(realmId, boxId)

Get all realms

Selenium v4.15

        try (Script script = new Script(firstWindow, driver)) {
            List<RealmInfo> realms = script.getAllRealms();

Selenium v4.9

            const manager = await ScriptManager(firstWindow, driver)

            const realms = await manager.getAllRealms()

Get realm by type

Selenium v4.15

        try (Script script = new Script(firstWindow, driver)) {
            List<RealmInfo> realms = script.getRealmsByType(RealmType.WINDOW);

Selenium v4.9

            const manager = await ScriptManager(firstWindow, driver)

            const realms = await manager.getRealmsByType(RealmType.WINDOW)

Get browsing context realms

Selenium v4.15

        try (Script script = new Script(windowId, driver)) {
            List<RealmInfo> realms = script.getRealmsInBrowsingContext(tabId);

Selenium v4.9

            const manager = await ScriptManager(windowId, driver)

            const realms = await manager.getRealmsInBrowsingContext(tabId)

Get browsing context realms by type

Selenium v4.15

            List<RealmInfo> windowRealms =
                    script.getRealmsInBrowsingContextByType(windowId, RealmType.WINDOW);

Selenium v4.9

            const realms = await manager.getRealmsInBrowsingContextByType(windowId, RealmType.WINDOW)

Preload a script

Selenium v4.15

            String id = script.addPreloadScript("() => { window.bar=2; }", "sandbox");

Selenium v4.10

            const manager = await ScriptManager(id, driver)

            const scriptId = await manager.addPreloadScript('() => {{ console.log(\'{preload_script_console_text}\') }}')

Remove a preloaded script

Selenium v4.15

                script.removePreloadScript(id);

Selenium v4.10

            await manager.removePreloadScript(scriptId)

9 - Support features

Support classes provide optional higher level features.

The core libraries of Selenium try to be low level and non-opinionated. The Support classes in each language provide opinionated wrappers for common interactions that may be used to simplify some behaviors.

9.1 - Waiting with Expected Conditions

These are classes used to describe what needs to be waited for.

Expected Conditions are used with Explicit Waits. Instead of defining the block of code to be executed with a lambda, an expected conditions method can be created to represent common things that get waited on. Some methods take locators as arguments, others take elements as arguments.

These methods can include conditions such as:

  • element exists
  • element is stale
  • element is visible
  • text is visible
  • title contains specified value
[Expected Conditions Documentation](https://www.selenium.dev/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.expected_conditions.html)

Add Example

.NET stopped supporting Expected Conditions in Selenium 4 to minimize maintenance hassle and redundancy.
Ruby makes frequent use of blocks, procs and lambdas and does not need Expected Conditions classes

9.2 - Command Listeners

These allow you to execute custom actions in every time specific Selenium commands are sent

9.3 - Working With Colors

You will occasionally want to validate the colour of something as part of your tests; the problem is that colour definitions on the web are not constant. Would it not be nice if there was an easy way to compare a HEX representation of a colour with a RGB representation of a colour, or a RGBA representation of a colour with a HSLA representation of a colour?

Worry not. There is a solution: the Color class!

First of all, you will need to import the class:

Move Code

import org.openqa.selenium.support.Color;
  
from selenium.webdriver.support.color import Color
  
include Selenium::WebDriver::Support
  
import org.openqa.selenium.support.Color

You can now start creating colour objects. Every colour object will need to be created from a string representation of your colour. Supported colour representations are:

Move Code

private final Color HEX_COLOUR = Color.fromString("#2F7ED8");
private final Color RGB_COLOUR = Color.fromString("rgb(255, 255, 255)");
private final Color RGB_COLOUR = Color.fromString("rgb(40%, 20%, 40%)");
private final Color RGBA_COLOUR = Color.fromString("rgba(255, 255, 255, 0.5)");
private final Color RGBA_COLOUR = Color.fromString("rgba(40%, 20%, 40%, 0.5)");
private final Color HSL_COLOUR = Color.fromString("hsl(100, 0%, 50%)");
private final Color HSLA_COLOUR = Color.fromString("hsla(100, 0%, 50%, 0.5)");
  
HEX_COLOUR = Color.from_string('#2F7ED8')
RGB_COLOUR = Color.from_string('rgb(255, 255, 255)')
RGB_COLOUR = Color.from_string('rgb(40%, 20%, 40%)')
RGBA_COLOUR = Color.from_string('rgba(255, 255, 255, 0.5)')
RGBA_COLOUR = Color.from_string('rgba(40%, 20%, 40%, 0.5)')
HSL_COLOUR = Color.from_string('hsl(100, 0%, 50%)')
HSLA_COLOUR = Color.from_string('hsla(100, 0%, 50%, 0.5)')
  
HEX_COLOUR = Color.from_string('#2F7ED8')
RGB_COLOUR = Color.from_string('rgb(255, 255, 255)')
RGB_COLOUR = Color.from_string('rgb(40%, 20%, 40%)')
RGBA_COLOUR = Color.from_string('rgba(255, 255, 255, 0.5)')
RGBA_COLOUR = Color.from_string('rgba(40%, 20%, 40%, 0.5)')
HSL_COLOUR = Color.from_string('hsl(100, 0%, 50%)')
HSLA_COLOUR = Color.from_string('hsla(100, 0%, 50%, 0.5)')
  
private val HEX_COLOUR = Color.fromString("#2F7ED8")
private val RGB_COLOUR = Color.fromString("rgb(255, 255, 255)")
private val RGB_COLOUR_PERCENT = Color.fromString("rgb(40%, 20%, 40%)")
private val RGBA_COLOUR = Color.fromString("rgba(255, 255, 255, 0.5)")
private val RGBA_COLOUR_PERCENT = Color.fromString("rgba(40%, 20%, 40%, 0.5)")
private val HSL_COLOUR = Color.fromString("hsl(100, 0%, 50%)")
private val HSLA_COLOUR = Color.fromString("hsla(100, 0%, 50%, 0.5)")
  

The Color class also supports all of the base colour definitions specified in http://www.w3.org/TR/css3-color/#html4.

Move Code

private final Color BLACK = Color.fromString("black");
private final Color CHOCOLATE = Color.fromString("chocolate");
private final Color HOTPINK = Color.fromString("hotpink");
  
BLACK = Color.from_string('black')
CHOCOLATE = Color.from_string('chocolate')
HOTPINK = Color.from_string('hotpink')
  
BLACK = Color.from_string('black')
CHOCOLATE = Color.from_string('chocolate')
HOTPINK = Color.from_string('hotpink')
  
private val BLACK = Color.fromString("black")
private val CHOCOLATE = Color.fromString("chocolate")
private val HOTPINK = Color.fromString("hotpink")
  

Sometimes browsers will return a colour value of “transparent” if no colour has been set on an element. The Color class also supports this:

Move Code

private final Color TRANSPARENT = Color.fromString("transparent");
  
TRANSPARENT = Color.from_string('transparent')
  
TRANSPARENT = Color.from_string('transparent')
  
private val TRANSPARENT = Color.fromString("transparent")
  

You can now safely query an element to get its colour/background colour knowing that any response will be correctly parsed and converted into a valid Color object:

Move Code

Color loginButtonColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("color"));

Color loginButtonBackgroundColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("background-color"));
  
login_button_colour = Color.from_string(driver.find_element(By.ID,'login').value_of_css_property('color'))

login_button_background_colour = Color.from_string(driver.find_element(By.ID,'login').value_of_css_property('background-color'))
  
login_button_colour = Color.from_string(driver.find_element(id: 'login').css_value('color'))

login_button_background_colour = Color.from_string(driver.find_element(id: 'login').css_value('background-color'))
  
val loginButtonColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("color"))

val loginButtonBackgroundColour = Color.fromString(driver.findElement(By.id("login")).getCssValue("background-color"))
  

You can then directly compare colour objects:

Move Code

assert loginButtonBackgroundColour.equals(HOTPINK);
  
assert login_button_background_colour == HOTPINK
  
assert(login_button_background_colour == HOTPINK)
  
assert(loginButtonBackgroundColour.equals(HOTPINK))
  

Or you can convert the colour into one of the following formats and perform a static validation:

Move Code

assert loginButtonBackgroundColour.asHex().equals("#ff69b4");
assert loginButtonBackgroundColour.asRgba().equals("rgba(255, 105, 180, 1)");
assert loginButtonBackgroundColour.asRgb().equals("rgb(255, 105, 180)");
  
assert login_button_background_colour.hex == '#ff69b4'
assert login_button_background_colour.rgba == 'rgba(255, 105, 180, 1)'
assert login_button_background_colour.rgb == 'rgb(255, 105, 180)'
  
assert(login_button_background_colour.hex == '#ff69b4')
assert(login_button_background_colour.rgba == 'rgba(255, 105, 180, 1)')
assert(login_button_background_colour.rgb == 'rgb(255, 105, 180)')
  
assert(loginButtonBackgroundColour.asHex().equals("#ff69b4"))
assert(loginButtonBackgroundColour.asRgba().equals("rgba(255, 105, 180, 1)"))
assert(loginButtonBackgroundColour.asRgb().equals("rgb(255, 105, 180)"))
  

Colours are no longer a problem.

9.4 - Working with select list elements

Select lists have special behaviors compared to other elements.

The Select object will now give you a series of commands that allow you to interact with a <select> element.

If you are using Java or .NET make sure that you’ve properly required the support package in your code. See the full code from GitHub in any of the examples below.

Note that this class only works for HTML elements select and option. It is possible to design drop-downs with JavaScript overlays using div or li, and this class will not work for those.

Types

Select methods may behave differently depending on which type of <select> element is being worked with.

Single select

This is the standard drop-down object where one and only one option may be selected.

<select name="selectomatic">
    <option selected="selected" id="non_multi_option" value="one">One</option>
    <option value="two">Two</option>
    <option value="four">Four</option>
    <option value="still learning how to count, apparently">Still learning how to count, apparently</option>
</select>

Multiple select

This select list allows selecting and deselecting more than one option at a time. This only applies to <select> elements with the multiple attribute.

<select name="multi" id="multi" multiple="multiple">
    <option selected="selected" value="eggs">Eggs</option>
    <option value="ham">Ham</option>
    <option selected="selected" value="sausages">Sausages</option>
    <option value="onion gravy">Onion gravy</option>
</select>

Create class

First locate a <select> element, then use it to initialize a Select object. Note that as of Selenium 4.5, you can’t create a Select object if the <select> element is disabled.

        WebElement selectElement = driver.findElement(By.name("selectomatic"));
        Select select = new Select(selectElement);
    select_element = driver.find_element(By.NAME, 'selectomatic')
    select = Select(select_element)
            var selectElement = driver.FindElement(By.Name("selectomatic"));
            var select = new SelectElement(selectElement);
    select_element = driver.find_element(name: 'selectomatic')
    select = Selenium::WebDriver::Support::Select.new(select_element)
      const selectElement = await driver.findElement(By.name('selectomatic'))
      const select = new Select(selectElement)
    val selectElement = driver.findElement(By.name("selectomatic"))
    val select = Select(selectElement)

List options

There are two lists that can be obtained:

All options

Get a list of all options in the <select> element:

        List<WebElement> optionList = select.getOptions();
    option_list = select.options
            IList<IWebElement> optionList = select.Options;
    option_list = select.options
      const optionList = await select.getOptions()
    val optionList = select.getOptions()

Selected options

Get a list of selected options in the <select> element. For a standard select list this will only be a list with one element, for a multiple select list it can contain zero or many elements.

        List<WebElement> selectedOptionList = select.getAllSelectedOptions();
    selected_option_list = select.all_selected_options
            IList<IWebElement> selectedOptionList = select.AllSelectedOptions;
    selected_option_list = select.selected_options
      const selectedOptionList = await select.getAllSelectedOptions()
    val selectedOptionList = select.getAllSelectedOptions()

Select option

The Select class provides three ways to select an option. Note that for multiple select type Select lists, you can repeat these methods for each element you want to select.

Text

Select the option based on its visible text

        select.selectByVisibleText("Four");
    select.select_by_visible_text('Four')
            select.SelectByText("Four");
    select.select_by(:text, 'Four')
      await select.selectByVisibleText('Four')
    select.selectByVisibleText("Four")

Value

Select the option based on its value attribute

        select.selectByValue("two");
    select.select_by_value('two')
            select.SelectByValue("two");
    select.select_by(:value, 'two')
      await select.selectByValue('two')
    select.selectByValue("two")

Index

Select the option based on its position in the list

        select.selectByIndex(3);
    select.select_by_index(3)
            select.SelectByIndex(3);
    select.select_by(:index, 3)
      await select.selectByIndex(3)
    select.selectByIndex(3)

Disabled options

Selenium v4.5

Options with a disabled attribute may not be selected.

    <select name="single_disabled">
      <option id="sinlge_disabled_1" value="enabled">Enabled</option>
      <option id="sinlge_disabled_2" value="disabled" disabled="disabled">Disabled</option>
    </select>
        Assertions.assertThrows(UnsupportedOperationException.class, () -> {
            select.selectByValue("disabled");
        });
    with pytest.raises(NotImplementedError):
        select.select_by_value('disabled')
            Assert.ThrowsException<InvalidOperationException>(() => select.SelectByValue("disabled"));
    expect {
      select.select_by(:value, 'disabled')
    }.to raise_exception(Selenium::WebDriver::Error::UnsupportedOperationError)
      await assert.rejects(async () => {
        await select.selectByValue("disabled")
      }, {
        name: 'UnsupportedOperationError',
    Assertions.assertThrows(UnsupportedOperationException::class.java) {
      select.selectByValue("disabled")
    }

De-select option

Only multiple select type select lists can have options de-selected. You can repeat these methods for each element you want to select.

        select.deselectByValue("eggs");
    select.deselect_by_value('eggs')
            select.DeselectByValue("eggs");
    select.deselect_by(:value, 'eggs')
      await select.deselectByValue('eggs')
    select.deselectByValue("eggs")

9.5 - ThreadGuard

This class is only available in the Java Binding

ThreadGuard checks that a driver is called only from the same thread that created it. Threading issues especially when running tests in Parallel may have mysterious and hard to diagnose errors. Using this wrapper prevents this category of errors and will raise an exception when it happens.

The following example simulate a clash of threads:

public class DriverClash {
  //thread main (id 1) created this driver
  private WebDriver protectedDriver = ThreadGuard.protect(new ChromeDriver()); 

  static {
    System.setProperty("webdriver.chrome.driver", "<Set path to your Chromedriver>");
  }
  
  //Thread-1 (id 24) is calling the same driver causing the clash to happen
  Runnable r1 = () -> {protectedDriver.get("https://selenium.dev");};
  Thread thr1 = new Thread(r1);
   
  void runThreads(){
    thr1.start();
  }

  public static void main(String[] args) {
    new DriverClash().runThreads();
  }
}

The result shown below:

Exception in thread "Thread-1" org.openqa.selenium.WebDriverException:
Thread safety error; this instance of WebDriver was constructed
on thread main (id 1)and is being accessed by thread Thread-1 (id 24)
This is not permitted and *will* cause undefined behaviour

As seen in the example:

  • protectedDriver Will be created in Main thread
  • We use Java Runnable to spin up a new process and a new Thread to run the process
  • Both Thread will clash because the Main Thread does not have protectedDriver in it’s memory.
  • ThreadGuard.protect will throw an exception.

Note:

This does not replace the need for using ThreadLocal to manage drivers when running parallel.

10 - Troubleshooting Assistance

How to get manage WebDriver problems.

It is not always obvious the root cause of errors in Selenium.

  1. The most common Selenium-related error is a result of poor synchronization. Read about Waiting Strategies. If you aren’t sure if it is a synchronization strategy you can try temporarily hard coding a large sleep where you see the issue, and you’ll know if adding an explicit wait can help.

  2. Note that many errors that get reported to the project are actually caused by issues in the underlying drivers that Selenium sends the commands to. You can rule out a driver problem by executing the command in multiple browsers.

  3. If you have questions about how to do things, check out the Support options for ways get assistance.

  4. If you think you’ve found a problem with Selenium code, go ahead and file a Bug Report on GitHub.

10.1 - Understanding Common Errors

How to get deal with various problems in your Selenium code.

Invalid Selector Exception

CSS and XPath Selectors are sometimes difficult to get correct.

Likely Cause

The CSS or XPath selector you are trying to use has invalid characters or an invalid query.

Possible Solutions

Run your selector through a validator service:

Or use a browser extension to get a known good value:

No Such Element Exception

The element can not be found at the exact moment you attempted to locate it.

Likely Cause

  • You are looking for the element in the wrong place (perhaps a previous action was unsuccessful).
  • You are looking for the element at the wrong time (the element has not shown up in the DOM, yet)
  • The locator has changed since you wrote the code

Possible Solutions

  • Make sure you are on the page you expect to be on, and that previous actions in your code completed correctly
  • Make sure you are using a proper Waiting Strategy
  • Update the locator with the browser’s devtools console or use a browser extension like:

Stale Element Reference Exception

An element goes stale when it was previously located, but can not be currently accessed. Elements do not get relocated automatically; the driver creates a reference ID for the element and has a particular place it expects to find it in the DOM. If it can not find the element in the current DOM, any action using that element will result in this exception.

Common Causes

This can happen when:

  • You have refreshed the page, or the DOM of the page has dynamically changed.
  • You have navigated to a different page.
  • You have switched to another window or into or out of a frame or iframe.

Common Solutions

The DOM has changed

When the page is refreshed or items on the page have moved around, there is still an element with the desired locator on the page, it is just no longer accessible by the element object being used, and the element must be relocated before it can be used again. This is often done in one of two ways:

  • Always relocate the element every time you go to use it. The likelihood of the element going stale in the microseconds between locating and using the element is small, though possible. The downside is that this is not the most efficient approach, especially when running on a remote grid.

  • Wrap the Web Element with another object that stores the locator, and caches the located Selenium element. When taking actions with this wrapped object, you can attempt to use the cached object if previously located, and if it is stale, exception can be caught, the element relocated with the stored locator, and the method re-tried. This is more efficient, but it can cause problems if the locator you’re using references a different element (and not the one you want) after the page has changed.

The Context has changed

Element objects are stored for a given context, so if you move to a different context — like a different window or a different frame or iframe — the element reference will still be valid, but will be temporarily inaccessible. In this scenario, it won’t help to relocate the element, because it doesn’t exist in the current context. To fix this, you need to make sure to switch back to the correct context before using the element.

The Page has changed

This scenario is when you haven’t just changed contexts, you have navigated to another page and have destroyed the context in which the element was located. You can’t just relocate it from the current context, and you can’t switch back to an active context where it is valid. If this is the reason for your error, you must both navigate back to the correct location and relocate it.

10.1.1 - Unable to Locate Driver Error

Troubleshooting missing path to driver executable.

Historically, this is the most common error beginning Selenium users get when trying to run code for the first time:

The path to the driver executable must be set by the webdriver.chrome.driver system property; for more information, see https://chromedriver.chromium.org/. The latest version can be downloaded from https://chromedriver.chromium.org/downloads
The executable chromedriver needs to be available in the path.
The file geckodriver does not exist. The driver can be downloaded at https://github.com/mozilla/geckodriver/releases"
Unable to locate the chromedriver executable;

Likely cause

Through WebDriver, Selenium supports all major browsers. In order to drive the requested browser, Selenium needs to send commands to it via an executable driver. This error means the necessary driver could not be found by any of the means Selenium attempts to use.

Possible solutions

There are several ways to ensure Selenium gets the driver it needs.

Use the latest version of Selenium

As of Selenium 4.6, Selenium downloads the correct driver for you. You shouldn’t need to do anything. If you are using the latest version of Selenium and you are getting an error, please turn on logging and file a bug report with that information.

If you want to read more information about how Selenium manages driver downloads for you, you can read about the Selenium Manager.

Use the PATH environment variable

This option first requires manually downloading the driver.

This is a flexible option to change location of drivers without having to update your code, and will work on multiple machines without requiring that each machine put the drivers in the same place.

You can either place the drivers in a directory that is already listed in PATH, or you can place them in a directory and add it to PATH.

To see what directories are already on PATH, open a Terminal and execute:

echo $PATH

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

echo 'export PATH=$PATH:/path/to/driver' >> ~/.bash_profile
source ~/.bash_profile

You can test if it has been added correctly by checking the version of the driver:

chromedriver --version

To see what directories are already on PATH, open a Terminal and execute:

echo $PATH

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

echo 'export PATH=$PATH:/path/to/driver' >> ~/.zshenv
source ~/.zshenv

You can test if it has been added correctly by checking the version of the driver:

chromedriver --version

To see what directories are already on PATH, open a Command Prompt and execute:

echo %PATH%

If the location to your driver is not already in a directory listed, you can add a new directory to PATH:

setx PATH "%PATH%;C:\WebDriver\bin"

You can test if it has been added correctly by checking the version of the driver:

chromedriver.exe --version

Specify the location of the driver

If you cannot upgrade to the latest version of Selenium, you do not want Selenium to download drivers for you, and you can’t figure out the environment variables, you can specify the location of the driver in the Service object.

You first need to download the desired driver, then create an instance of the applicable Service class and set the path.

Specifying the location in the code itself has the advantage of not needing to figure out Environment Variables on your system, but has the drawback of making the code less flexible.

Driver management libraries

Before Selenium managed drivers itself, other projects were created to do so for you.

If you can’t use Selenium Manager because you are using an older version of Selenium (please upgrade), or need an advanced feature not yet implemented by Selenium Manager, you might try one of these tools to keep your drivers automatically updated:

Download the driver

BrowserSupported OSMaintained byDownloadIssue Tracker
Chromium/ChromeWindows/macOS/LinuxGoogleDownloadsIssues
FirefoxWindows/macOS/LinuxMozillaDownloadsIssues
EdgeWindows/macOS/LinuxMicrosoftDownloadsIssues
Internet ExplorerWindowsSelenium ProjectDownloadsIssues
SafarimacOS High Sierra and newerAppleBuilt inIssues

Note: The Opera driver no longer works with the latest functionality of Selenium and is currently officially unsupported.

10.2 - Logging Selenium commands

Getting information about Selenium execution.

Turning on logging is a valuable way to get extra information that might help you determine why you might be having a problem.

Getting a logger

Java logs are typically created per class. You can work with the default logger to work with all loggers. To filter out specific classes, see Filtering

Get the root logger:

        Logger logger = Logger.getLogger("");

Java Logging is not exactly straightforward, and if you are just looking for an easy way to look at the important Selenium logs, take a look at the Selenium Logger project

Python logs are typically created per module. You can match all submodules by referencing the top level module. So to work with all loggers in selenium module, you can do this:

    logger = logging.getLogger('selenium')

.NET logger is managed with a static class, so all access to logging is managed simply by referencing Log from the OpenQA.Selenium.Internal.Logging namespace.

If you want to see as much debugging as possible in all the classes, you can turn on debugging globally in Ruby by setting $DEBUG = true.

For more fine-tuned control, Ruby Selenium created its own Logger class to wrap the default Logger class. This implementation provides some interesting additional features. Obtain the logger directly from the #loggerclass method on the Selenium::WebDriver module:

Selenium v4.10

    logger = Selenium::WebDriver.logger
const logging = require('selenium-webdriver/lib/logging')
logger = logging.getLogger('webdriver')

Logger level

Logger level helps to filter out logs based on their severity.

Java has 7 logger levels: SEVERE, WARNING, INFO, CONFIG, FINE, FINER, and FINEST. The default is INFO.

You have to change both the level of the logger and the level of the handlers on the root logger:

        logger.setLevel(Level.FINE);
        Arrays.stream(logger.getHandlers()).forEach(handler -> {
            handler.setLevel(Level.FINE);
        });

Python has 6 logger levels: CRITICAL, ERROR, WARNING, INFO, DEBUG, and NOTSET. The default is WARNING

To change the level of the logger:

    logger.setLevel(logging.DEBUG)

Things get complicated when you use PyTest, though. By default, PyTest hides logging unless the test fails. You need to set 3 things to get PyTest to display logs on passing tests.

To always output logs with PyTest you need to run with additional arguments. First, -s to prevent PyTest from capturing the console. Second, -p no:logging, which allows you to override the default PyTest logging settings so logs can be displayed regardless of errors.

So you need to set these flags in your IDE, or run PyTest on command line like:

pytest -s -p no:logging

Finally, since you turned off logging in the arguments above, you now need to add configuration to turn it back on:

logging.basicConfig(level=logging.WARN)

.NET has 6 logger levels: Error, Warn, Info, Debug, Trace and None. The default level is Info.

To change the level of the logger:

            Log.SetLevel(LogEventLevel.Trace);

Ruby logger has 5 logger levels: :debug, :info, :warn, :error, :fatal. The default is :info.

To change the level of the logger:

Selenium v4.10

    logger.level = :debug

JavaScript has 9 logger levels: OFF, SEVERE, WARNING, INFO, DEBUG, FINE, FINER, FINEST, ALL. The default is OFF.

To change the level of the logger:

logger.setLevel(logging.Level.INFO)

Actionable items

Things are logged as warnings if they are something the user needs to take action on. This is often used for deprecations. For various reasons, Selenium project does not follow standard Semantic Versioning practices. Our policy is to mark things as deprecated for 3 releases and then remove them, so deprecations may be logged as warnings.

Java logs actionable content at logger level WARN

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
WARNING: this is a warning

Python logs actionable content at logger level — WARNING Details about deprecations are logged at this level.

Example:

WARNING  selenium:test_logging.py:23 this is a warning

.NET logs actionable content at logger level Warn.

Example:

11:04:40.986 WARN LoggingTest: this is a warning

Ruby logs actionable content at logger level — :warn. Details about deprecations are logged at this level.

For example:

2023-05-08 20:53:13 WARN Selenium [:example_id] this is a warning 

Because these items can get annoying, we’ve provided an easy way to turn them off, see filtering section below.

Useful information

This is the default level where Selenium logs things that users should be aware of but do not need to take actions on. This might reference a new method or direct users to more information about something

Java logs useful information at logger level INFO

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
INFO: this is useful information

Python logs useful information at logger level — INFO

Example:

INFO     selenium:test_logging.py:22 this is useful information

.NET logs useful information at logger level Info.

Example:

11:04:40.986 INFO LoggingTest: this is useful information

Ruby logs useful information at logger level — :info.

Example:

2023-05-08 20:53:13 INFO Selenium [:example_id] this is useful information 

Logs useful information at level: INFO

Debugging Details

The debug log level is used for information that may be needed for diagnosing issues and troubleshooting problems.

Java logs most debug content at logger level FINE

Example:

May 08, 2023 9:23:38 PM dev.selenium.troubleshooting.LoggingTest logging
FINE: this is detailed debug information

Python logs debugging details at logger level — DEBUG

Example:

DEBUG    selenium:test_logging.py:24 this is detailed debug information

.NET logs most debug content at logger level Debug.

Example:

11:04:40.986 DEBUG LoggingTest: this is detailed debug information

Ruby only provides one level for debugging, so all details are at logger level — :debug.

Example:

2023-05-08 20:53:13 DEBUG Selenium [:example_id] this is detailed debug information 

Logs debugging details at level: FINER and FINEST

Logger output

Logs can be displayed in the console or stored in a file. Different languages have different defaults.

By default all logs are sent to System.err. To direct output to a file, you need to add a handler:

        Handler handler = new FileHandler("selenium.xml");
        logger.addHandler(handler);

By default all logs are sent to sys.stderr. To direct output somewhere else, you need to add a handler with either a StreamHandler or a FileHandler:

    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)

By default all logs are sent to System.Console.Error output. To direct output somewhere else, you need to add a handler with a FileLogHandler:

            Log.Handlers.Add(new FileLogHandler(filePath));

By default, logs are sent to the console in stdout.
To store the logs in a file:

Selenium v4.10

    logger.output = file_name

JavaScript does not currently support sending output to a file.

To send logs to console output:

logging.installConsoleHandler()

Logger filtering

Java logging is managed on a per class level, so instead of using the root logger (Logger.getLogger("")), set the level you want to use on a per-class basis:

        Logger.getLogger(RemoteWebDriver.class.getName()).setLevel(Level.FINEST);
        Logger.getLogger(SeleniumManager.class.getName()).setLevel(Level.SEVERE);
Because logging is managed by module, instead of working with just "selenium", you can specify different levels for different modules:
    logging.getLogger('selenium.webdriver.remote').setLevel(logging.WARN)
    logging.getLogger('selenium.webdriver.common').setLevel(logging.DEBUG)

.NET logging is managed on a per class level, set the level you want to use on a per-class basis:

            Log.SetLevel(typeof(RemoteWebDriver), LogEventLevel.Debug);
            Log.SetLevel(typeof(SeleniumManager), LogEventLevel.Info);

Ruby’s logger allows you to opt in (“allow”) or opt out (“ignore”) of log messages based on their IDs. Everything that Selenium logs includes an ID. You can also turn on or off all deprecation notices by using :deprecations.

These methods accept one or more symbols or an array of symbols:

Selenium v4.10

    logger.ignore(:jwp_caps, :logger_info)

or

Selenium v4.10

    logger.allow(%i[selenium_manager example_id])

10.3 - Upgrade to Selenium 4

Are you still using Selenium 3? This guide will help you upgrade to the latest release!

Upgrading to Selenium 4 should be a painless process if you are using one of the officially supported languages (Ruby, JavaScript, C#, Python, and Java). There might be some cases where a few issues can happen, and this guide will help you to sort them out. We will go through the steps to upgrade your project dependencies and understand the major deprecations and changes the version upgrade brings.

These are the steps we will follow to upgrade to Selenium 4:

  • Preparing our test code
  • Upgrading dependencies
  • Potential errors and deprecation messages

Note: while Selenium 3.x versions were being developed, support for the W3C WebDriver standard was implemented. Both this new protocol and the legacy JSON Wire Protocol were supported. Around version 3.11, Selenium code became compliant with the level W3C 1 specification. The W3C compliant code in the latest version of Selenium 3 will work as expected in Selenium 4.

Preparing our test code

Selenium 4 removes support for the legacy protocol and uses the W3C WebDriver standard by default under the hood. For most things, this implementation will not affect end users. The major exceptions are Capabilities and the Actions class.

Capabilities

If the test capabilities are not structured to be W3C compliant, may cause a session to not be started. Here is the list of W3C WebDriver standard capabilities:

  • browserName
  • browserVersion (replaces version)
  • platformName (replaces platform)
  • acceptInsecureCerts
  • pageLoadStrategy
  • proxy
  • timeouts
  • unhandledPromptBehavior

An up-to-date list of standard capabilities can be found at W3C WebDriver.

Any capability that is not contained in the list above, needs to include a vendor prefix. This applies to browser specific capabilities as well as cloud vendor specific capabilities. For example, if your cloud vendor uses build and name capabilities for your tests, you need to wrap them in a cloud:options block (check with your cloud vendor for the appropriate prefix).

Before

Move Code

DesiredCapabilities caps = DesiredCapabilities.firefox();
caps.setCapability("platform", "Windows 10");
caps.setCapability("version", "92");
caps.setCapability("build", myTestBuild);
caps.setCapability("name", myTestName);
WebDriver driver = new RemoteWebDriver(new URL(cloudUrl), caps);
caps = {};
caps['browserName'] = 'Firefox';
caps['platform'] = 'Windows 10';
caps['version'] = '92';
caps['build'] = myTestBuild;
caps['name'] = myTestName;
DesiredCapabilities caps = new DesiredCapabilities();
caps.SetCapability("browserName", "firefox");
caps.SetCapability("platform", "Windows 10");
caps.SetCapability("version", "92");
caps.SetCapability("build", myTestBuild);
caps.SetCapability("name", myTestName);
var driver = new RemoteWebDriver(new Uri(CloudURL), caps);
      caps = Selenium::WebDriver::Remote::Capabilities.firefox
      caps[:platform] = 'Windows 10'
      caps[:version] = '92'
      caps[:build] = my_test_build
      caps[:name] = my_test_name
      driver = Selenium::WebDriver.for :remote, url: cloud_url, desired_capabilities: caps
      driver.get(url)
      driver.quit
caps = {}
caps['browserName'] = 'firefox'
caps['platform'] = 'Windows 10'
caps['version'] = '92'
caps['build'] = my_test_build
caps['name'] = my_test_name
driver = webdriver.Remote(cloud_url, desired_capabilities=caps)

After

Move Code

FirefoxOptions browserOptions = new FirefoxOptions();
browserOptions.setPlatformName("Windows 10");
browserOptions.setBrowserVersion("92");
Map<String, Object> cloudOptions = new HashMap<>();
cloudOptions.put("build", myTestBuild);
cloudOptions.put("name", myTestName);
browserOptions.setCapability("cloud:options", cloudOptions);
WebDriver driver = new RemoteWebDriver(new URL(cloudUrl), browserOptions);
capabilities = {
  browserName: 'firefox',
  browserVersion: '92',
  platformName: 'Windows 10',
  'cloud:options': {
     build: myTestBuild,
     name: myTestName,
  }
}
var browserOptions = new FirefoxOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "92";
var cloudOptions = new Dictionary<string, object>();
cloudOptions.Add("build", myTestBuild);
cloudOptions.Add("name", myTestName);
browserOptions.AddAdditionalOption("cloud:options", cloudOptions);
var driver = new RemoteWebDriver(new Uri(CloudURL), browserOptions);
      options = Selenium::WebDriver::Options.firefox
      options.platform_name = 'Windows 10'
      options.browser_version = 'latest'
      cloud_options = {}
      cloud_options[:build] = my_test_build
      cloud_options[:name] = my_test_name
      options.add_option('cloud:options', cloud_options)
      driver = Selenium::WebDriver.for :remote, capabilities: options
      driver.get(url)
      driver.quit
from selenium.webdriver.firefox.options import Options as FirefoxOptions
options = FirefoxOptions()
options.browser_version = '92'
options.platform_name = 'Windows 10'
cloud_options = {}
cloud_options['build'] = my_test_build
cloud_options['name'] = my_test_name
options.set_capability('cloud:options', cloud_options)
driver = webdriver.Remote(cloud_url, options=options)

Find element(s) utility methods in Java

The utility methods to find elements in the Java bindings (FindsBy interfaces) have been removed as they were meant for internal use only. The following code samples explain this better.

Finding a single element with findElement*

Before

driver.findElementByClassName("className");
driver.findElementByCssSelector(".className");
driver.findElementById("elementId");
driver.findElementByLinkText("linkText");
driver.findElementByName("elementName");
driver.findElementByPartialLinkText("partialText");
driver.findElementByTagName("elementTagName");
driver.findElementByXPath("xPath");
After

driver.findElement(By.className("className"));
driver.findElement(By.cssSelector(".className"));
driver.findElement(By.id("elementId"));
driver.findElement(By.linkText("linkText"));
driver.findElement(By.name("elementName"));
driver.findElement(By.partialLinkText("partialText"));
driver.findElement(By.tagName("elementTagName"));
driver.findElement(By.xpath("xPath"));

Finding a multiple elements with findElements*

Before

driver.findElementsByClassName("className");
driver.findElementsByCssSelector(".className");
driver.findElementsById("elementId");
driver.findElementsByLinkText("linkText");
driver.findElementsByName("elementName");
driver.findElementsByPartialLinkText("partialText");
driver.findElementsByTagName("elementTagName");
driver.findElementsByXPath("xPath");
After

driver.findElements(By.className("className"));
driver.findElements(By.cssSelector(".className"));
driver.findElements(By.id("elementId"));
driver.findElements(By.linkText("linkText"));
driver.findElements(By.name("elementName"));
driver.findElements(By.partialLinkText("partialText"));
driver.findElements(By.tagName("elementTagName"));
driver.findElements(By.xpath("xPath"));

Upgrading dependencies

Check the subsections below to install Selenium 4 and have your project dependencies upgraded.

Java

The process of upgrading Selenium depends on which build tool is being used. We will cover the most common ones for Java, which are Maven and Gradle. The minimum Java version required is still 8.

Maven

Before

<dependencies>
  <!-- more dependencies ... -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.141.59</version>
  </dependency>
  <!-- more dependencies ... -->
</dependencies>
After

<dependencies>
    <!-- more dependencies ... -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.4.0</version>
    </dependency>
    <!-- more dependencies ... -->
</dependencies>

After making the change, you could execute mvn clean compile on the same directory where the pom.xml file is.

Gradle

Before

plugins {
    id 'java'
}
group 'org.example'
version '1.0-SNAPSHOT'
repositories {
    mavenCentral()
}
dependencies {
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.7.0'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.7.0'
    implementation group: 'org.seleniumhq.selenium', name: 'selenium-java', version: '3.141.59'
}
test {
    useJUnitPlatform()
}
After

plugins {
    id 'java'
}
group 'org.example'
version '1.0-SNAPSHOT'
repositories {
    mavenCentral()
}
dependencies {
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.7.0'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.7.0'
    implementation group: 'org.seleniumhq.selenium', name: 'selenium-java', version: '4.4.0'
}
test {
    useJUnitPlatform()
}

After making the change, you could execute ./gradlew clean build on the same directory where the build.gradle file is.

To check all the Java releases, you can head to MVNRepository.

C#

The place to get updates for Selenium 4 in C# is NuGet. Under the Selenium.WebDriver package you can get the instructions to update to the latest version. Inside of Visual Studio, through the NuGet Package Manager you can execute:

PM> Install-Package Selenium.WebDriver -Version 4.4.0

Python

The most important change to use Python is the minimum required version. Selenium 4 will require a minimum Python 3.7 or higher. More details can be found at the Python Package Index. To upgrade from the command line, you can execute:

pip install selenium==4.4.3

Ruby

The update details for Selenium 4 can be seen at the selenium-webdriver gem in RubyGems. To install the latest version, you can execute:

gem install selenium-webdriver

To add it to your Gemfile:

gem 'selenium-webdriver', '~> 4.4.0'

JavaScript

The selenium-webdriver package can be found at the Node package manager, npmjs. Selenium 4 can be found here. To install it, you could either execute:

npm install selenium-webdriver

Or, update your package.json and run npm install:

{
  "name": "selenium-tests",
  "version": "1.0.0",
  "dependencies": {
    "selenium-webdriver": "^4.4.0"
  }
}

Potential errors and deprecation messages

Here is a set of code examples that will help to overcome the deprecation messages you might encounter after upgrading to Selenium 4.

Java

Waits and Timeout

The parameters received in Timeout have switched from expecting (long time, TimeUnit unit) to expect (Duration duration).

Before

driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
driver.manage().timeouts().setScriptTimeout(2, TimeUnit.MINUTES);
driver.manage().timeouts().pageLoadTimeout(10, TimeUnit.SECONDS);
After

driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
driver.manage().timeouts().scriptTimeout(Duration.ofMinutes(2));
driver.manage().timeouts().pageLoadTimeout(Duration.ofSeconds(10));

Waits are also expecting different parameters now. WebDriverWait is now expecting a Duration instead of a long for timeout in seconds and milliseconds. The withTimeout and pollingEvery utility methods from FluentWait have switched from expecting (long time, TimeUnit unit) to expect (Duration duration).

Before

new WebDriverWait(driver, 3)
.until(ExpectedConditions.elementToBeClickable(By.cssSelector("#id")));

Wait<WebDriver> wait = new FluentWait<WebDriver>(driver)
  .withTimeout(30, TimeUnit.SECONDS)
  .pollingEvery(5, TimeUnit.SECONDS)
  .ignoring(NoSuchElementException.class);
After

new WebDriverWait(driver, Duration.ofSeconds(3))
  .until(ExpectedConditions.elementToBeClickable(By.cssSelector("#id")));

  Wait<WebDriver> wait = new FluentWait<WebDriver>(driver)
  .withTimeout(Duration.ofSeconds(30))
  .pollingEvery(Duration.ofSeconds(5))
  .ignoring(NoSuchElementException.class);

Merging capabilities is no longer changing the calling object

It was possible to merge a different set of capabilities into another set, and it was mutating the calling object. Now, the result of the merge operation needs to be assigned.

Before

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("platformVersion", "Windows 10");
FirefoxOptions options = new FirefoxOptions();
options.setHeadless(true);
options.merge(capabilities);

// As a result, the `options` object was getting modified.
After

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("platformVersion", "Windows 10");
FirefoxOptions options = new FirefoxOptions();
options.setHeadless(true);
options = options.merge(capabilities);

// The result of the `merge` call needs to be assigned to an object.

Firefox Legacy

Before GeckoDriver was around, the Selenium project had a driver implementation to automate Firefox (version <48). However, this implementation is not needed anymore as it does not work in recent versions of Firefox. To avoid major issues when upgrading to Selenium 4, the setLegacy option will be shown as deprecated. The recommendation is to stop using the old implementation and rely only on GeckoDriver. The following code will show the setLegacy line deprecated after upgrading.

FirefoxOptions options = new FirefoxOptions();
options.setLegacy(true);

BrowserType

The BrowserType interface has been around for a long time, however it is getting deprecated in favour of the new Browser interface.

Before

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("browserVersion", "92");
capabilities.setCapability("browserName", BrowserType.FIREFOX);
After

MutableCapabilities capabilities = new MutableCapabilities();
capabilities.setCapability("browserVersion", "92");
capabilities.setCapability("browserName", Browser.FIREFOX);

C#

AddAdditionalCapability is deprecated

Instead of it, AddAdditionalOption is recommended. Here is an example showing this:

Before

var browserOptions = new ChromeOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "latest";
var cloudOptions = new Dictionary<string, object>();
browserOptions.AddAdditionalCapability("cloud:options", cloudOptions, true);
After

var browserOptions = new ChromeOptions();
browserOptions.PlatformName = "Windows 10";
browserOptions.BrowserVersion = "latest";
var cloudOptions = new Dictionary<string, object>();
browserOptions.AddAdditionalOption("cloud:options", cloudOptions);

Python

executable_path has been deprecated, please pass in a Service object

In Selenium 4, you’ll need to set the driver’s executable_path from a Service object to prevent deprecation warnings. (Or don’t set the path and instead make sure that the driver you need is on the System PATH.)

Before

from selenium import webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(
    executable_path=CHROMEDRIVER_PATH, 
    options=options
)
After

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
options = webdriver.ChromeOptions()
service = ChromeService(executable_path=CHROMEDRIVER_PATH)
driver = webdriver.Chrome(service=service, options=options)

Summary

We went through the major changes to be taken into consideration when upgrading to Selenium 4. Covering the different aspects to cover when test code is prepared for the upgrade, including suggestions on how to prevent potential issues that can show up when using the new version of Selenium. To finalize, we also covered a set of possible issues that you can bump into after upgrading, and we shared potential fixes for those issues.

This was originally posted at https://saucelabs.com/resources/articles/how-to-upgrade-to-selenium-4