HOPE 2020 (2020): "Back Seat Webdriving via Browser Automation"
Sunday, July 26, 2020: 2100. There are many reasons to automate web browsing for security purposes, from scraping websites, to request manipulation, to task automation. Staid tools like wget and curl are a good start. But the modern web is dynamic and often rendered client-side, which limits the effectiveness of these tools. Luckily, most modern web browsers provide webdriver engines that, when coupled with an automation framework, give users near-limitless ways to automate interactive web browsing sessions as if they were interacting with the browser themselves.
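To illustrate the limitation described above, here is a minimal sketch (not taken from the talk) of what a static fetch sees on a client-side page. The page in `PAGE` and the helper names `VisibleText` and `static_text` are hypothetical. The sketch only uses Python's standard `html.parser`, which, like wget or curl, never executes JavaScript.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container, and a script
# fills it in only once a browser actually executes it.
PAGE = """<html><body>
<div id="results"></div>
<script>
document.getElementById("results").textContent = "3 items found";
</script>
</body></html>"""


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, as a static scraper would."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script or style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text a non-executing client would extract from html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # The script never runs, so the "results" container stays empty.
    print(repr(static_text(PAGE)))
```

A real browser driven through a webdriver would run the script and expose the filled-in element through the DOM, which is the gap that browser automation closes.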
This talk will share basic concepts and advanced tips and tricks from years of experience automating web browsers using automation frameworks like Selenium. It will begin by discussing common methods of web automation, the Document Object Model and how to use it, and how webdrivers work with automation frameworks. From there, more advanced topics will be explored, such as browser configuration for research, headless browsing, interacting with modals, dealing with captchas, and logging all the things. Code snippets will be provided along the way, including multiple methods of solving most problems.
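As a rough sketch of how a webdriver works with an automation framework (this is not one of the talk's snippets), the code below builds the kind of HTTP requests that a framework such as Selenium sends to a driver under the W3C WebDriver protocol. It assumes a geckodriver instance listening on its default address, and the helper names are hypothetical. Nothing is sent over the network; each function only returns the method, URL, and JSON body.

```python
import json

# Assumption: geckodriver running locally on its default port.
DRIVER_URL = "http://127.0.0.1:4444"


def new_session_request(headless=True):
    """New Session: ask the driver to start Firefox, optionally headless."""
    args = ["-headless"] if headless else []
    body = {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "firefox",
                "moz:firefoxOptions": {"args": args},
            }
        }
    }
    return ("POST", f"{DRIVER_URL}/session", json.dumps(body))


def navigate_request(session_id, url):
    """Navigate To: load a URL in the session's current browsing context."""
    endpoint = f"{DRIVER_URL}/session/{session_id}/url"
    return ("POST", endpoint, json.dumps({"url": url}))


def find_element_request(session_id, css):
    """Find Element: locate a DOM node using a CSS selector."""
    endpoint = f"{DRIVER_URL}/session/{session_id}/element"
    body = {"using": "css selector", "value": css}
    return ("POST", endpoint, json.dumps(body))


def delete_session_request(session_id):
    """Delete Session: close the browser and end the session."""
    return ("DELETE", f"{DRIVER_URL}/session/{session_id}", None)


if __name__ == "__main__":
    # "abc123" is a placeholder; a real ID comes from the New Session response.
    for request in (
        new_session_request(),
        navigate_request("abc123", "https://example.com/"),
        find_element_request("abc123", "#results"),
        delete_session_request("abc123"),
    ):
        print(request)
```

In practice a framework hides these calls behind methods for opening pages and locating elements, but seeing the raw requests can help when debugging or logging a session.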