
About QA - AI and what's in between

  • Apr 25
  • 16 min read
Infographic (in Hebrew): the connection between QA and AI across four key areas: script automation, visual and API testing, results analysis and regression prediction, and business optimization.

Introduction

The future is here; artificial intelligence (AI) is changing the face of testing and quality assurance (QA) processes.

Today, with the proper use of artificial intelligence, organizations can move to the forefront of innovation. Instead of relying solely on manually written scripts, which consume significant time and resources, teams can streamline testing with machine learning and deep learning algorithms that improve accuracy and shorten feedback loops.

  • Maximum efficiency: Select and focus on high-risk test cases in real time, saving resources that can be reinvested in broader test coverage.

  • Uncompromising quality: Identify functional and visual deviations with a consistency and precision that can exceed manual inspection.

  • Fast development pace: Add smart quality gates and automated feedback to the CI/CD pipeline so that every new version reaches production at high quality.


In this article, we review the key benefits of integrating AI into the testing process, present practical examples of existing uses, and look ahead to a future in which every test is another opportunity to innovate and grow.


The potential of AI in QA

1. Saving time and costs

  • Automation of menial tasks: Intelligent bots run repetitive tests, collect logs, and generate reports without human intervention, freeing QA testers from manual work.

  • Shortening the Feedback Loop: In CI/CD, an AI model can flag likely bugs as early as the pre-integration stage and alert immediately, which shortens time to fix and reduces the cost of late fixes.

  • Streamlining development resources: Instead of writing hundreds of scripts manually, QA engineers can focus on designing complex and creative test cases, while AI generates and maintains routine scripts.


2. Improving quality and reliability

  • Error Pattern Detection: Analyzing historical data (logs, error reports) allows the model to find recurring patterns — for example, recurring bugs in a specific module — and focus resources on them.

  • Predictive Testing: Smart algorithms predict which scenarios are likely to fail after a code change, so those tests run first and fewer bugs reach production.

  • Automated Root Cause Analysis: NLP models analyze stack traces and error messages, propose a likely root cause, and automatically suggest an initial fix.


3. Dynamic adaptation and continuous improvement

  • Continuous Learning: Each test run provides new data—successes and failures—and the models use the new data to improve the quality of predictions and choices.

  • Self-Optimizing Pipelines: When issues recur, the AI automatically adds tests for the problematic modules and drops unnecessary tests for stable ones.


4. Expanding coverage and testing edge cases

  • Synthetic test data generation: Models create complex and unusual data sets (e.g., extremely long strings, special characters) to find bugs that ordinary real-world data does not reveal.

  • Smart fuzz testing scripts: AI generates many variations of API calls or UI actions that push the system beyond its expected edge cases and expose vulnerabilities.


5. Scalability and operational flexibility

  • Intelligent load balancing: When a job runs, the AI distributes the tests across machines according to script complexity, enabling fast, cost-effective parallel execution in the cloud or on-premises.

  • Support for a variety of technologies: AI-based tools can work with Web, Mobile, and Desktop interfaces and monitor backend infrastructures – all on one centralized platform.


6. Business innovation/decision support

  • Smart dashboards: Real-time reporting on quality metric trends (such as MTTR and failure rate) and mapping of relationships between bugs and business modules, to demonstrate the ROI of QA investment.

  • Risk-based test planning: AI can categorize and prioritize scripts by business risk (e.g., online payment tests get top priority) and recommend an execution order that matches the organization's strategic needs.


Creating automated test scripts

1. Introduction to the idea

A test script is a set of commands that perform automated testing of a particular software component. In the past, these scripts were written manually, line by line, based on understanding the system. Today, artificial intelligence models (mainly large language models – LLMs) allow us to generate scripts automatically from documentation, requirements specifications, or existing code.


2. How it works

Input to the AI model

  • Description of the test scenario in natural language (for example: “Test login with a valid user and an incorrect password”).

  • The page's DOM (for Playwright/Selenium), an API specification, or source code.

Input processing

  • The model identifies entities (buttons, fields, menus) and maps them to technical elements (locators, selectors).

  • Builds the action logic: opening the browser, navigating, entering values, clicking, and validating the result.

Output – Automatic script

  • Usually written for an existing framework (such as Playwright, Selenium, or Postman).

  • Includes asserts/expectations that check the results (such as the presence of certain text or the HTTP status code).


3. Key Benefits

  • Development speed: Instead of hours of manual writing, basic scripts are built in minutes.

  • Consistency: The model generates code according to a uniform template, reducing syntax errors and “drift” in writing style.

  • Instant feedback: You can change a scenario description and request an immediate refresh of the script.

  • Adaptation to changing environments: If the UI design changes (IDs, class names), you can ask the model to update the selectors with relatively little effort.


4. Challenges and recommendations for overcoming them

The accuracy of the selectors

Use dedicated data attributes (e.g., data-test-id) as locators, so that changes to styling or layout do not break the scripts.

Script maintenance

Separate page structure from test logic with the Page Object Model: define a class for each page that holds its elements, and write the test steps against those classes.

Full coverage of edge cases

Identifying “edge cases” still requires active human intervention.
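To make the Page Object Model and data-test-id advice concrete, here is a minimal Python sketch. The LoginPage class, its data-test-id values, the URL, and the error text are all hypothetical. The page object only calls goto, fill, click, and text_content, which are method names on Playwright's synchronous Page API, so a real Playwright page should be usable in its place; to keep the sketch self-contained, a small in-memory FakePage stands in for the browser.

```python
class LoginPage:
    """Page object: the only place that knows the login page's selectors."""

    URL = "https://example.test/login"          # hypothetical URL
    USERNAME = "[data-test-id=login-username]"  # hypothetical test ids
    PASSWORD = "[data-test-id=login-password]"
    SUBMIT = "[data-test-id=login-submit]"
    ERROR = "[data-test-id=login-error]"

    def __init__(self, page):
        self.page = page

    def open(self):
        self.page.goto(self.URL)
        return self

    def login(self, username, password):
        self.page.fill(self.USERNAME, username)
        self.page.fill(self.PASSWORD, password)
        self.page.click(self.SUBMIT)

    def error_message(self):
        return self.page.text_content(self.ERROR)


def run_wrong_password_scenario(page):
    """Scenario: valid user, incorrect password. Steps use the page object only."""
    login = LoginPage(page).open()
    login.login("valid_user", "wrong_password")
    assert login.error_message() == "Invalid username or password"


class FakePage:
    """In-memory stand-in for a browser page, just enough to run the sketch."""

    def __init__(self):
        self.url, self.values, self.texts = None, {}, {}

    def goto(self, url):
        self.url = url

    def fill(self, selector, value):
        self.values[selector] = value

    def click(self, selector):
        # Simulated app: any password other than "correct-horse" is rejected.
        if selector == LoginPage.SUBMIT and self.values.get(LoginPage.PASSWORD) != "correct-horse":
            self.texts[LoginPage.ERROR] = "Invalid username or password"

    def text_content(self, selector):
        return self.texts.get(selector)
```

Calling run_wrong_password_scenario(FakePage()) runs the scenario end to end. If the login form's markup changes, only the selector constants in LoginPage need updating; the scenario function stays the same.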


5. A look into the future in the field

Multimodal UI models

Models that can read screen images and generate test scripts from them.

No-Code Automation

Graphical interfaces that allow drag-and-drop of test scenarios – and the AI translates them into code in the background.

Self-healing scripts

Scripts that detect changed elements (e.g., changed ID) and automatically update the selector using pattern recognition.
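The self-healing idea can be sketched with the Python standard library alone. This is a simplified illustration, not how any particular commercial tool works: it parses an HTML snapshot, looks for the expected data-test-id, and, if that attribute has disappeared, falls back to the element whose data-test-id, id, or name is most similar according to difflib's similarity ratio. The heal_locator name and the 0.6 cutoff are hypothetical choices.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _ElementCollector(HTMLParser):
    """Records (tag, attributes) for every start tag in an HTML snapshot."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def heal_locator(html, test_id, min_ratio=0.6):
    """Return (selector, healed).

    healed is False when an element with data-test-id == test_id still exists.
    Otherwise the most similar data-test-id / id / name value is used instead,
    provided its similarity ratio is at least min_ratio.
    """
    collector = _ElementCollector()
    collector.feed(html)
    best, best_score = None, 0.0
    for _tag, attrs in collector.elements:
        if attrs.get("data-test-id") == test_id:
            return f'[data-test-id="{test_id}"]', False
        for attr in ("data-test-id", "id", "name"):
            value = attrs.get(attr)
            if value:  # boolean attributes arrive as None and are skipped
                score = SequenceMatcher(None, test_id, value).ratio()
                if score > best_score:
                    best, best_score = f'[{attr}="{value}"]', score
    if best is not None and best_score >= min_ratio:
        return best, True
    raise LookupError(f"no element resembles {test_id!r}")
```

A real tool would likely also weigh element type, visible text, and position, and log every healed selector for human review rather than trusting it silently.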


Identifying and testing APIs

1. Definition

APIs (Application Programming Interfaces) are the meeting points between software components, such as frontend and backend, or between microservices. API testing verifies the inputs (requests) and outputs (responses) at the protocol level, without going through a user interface.


2. Interface Discovery Steps

Documented API specifications

  • OpenAPI / Swagger: A YAML/JSON file that describes all endpoints, parameters, response structures, and data models.

  • RAML / API Blueprint: Common alternative formats for text-based API description.

Dynamic Inspection (Runtime Inspection)

  • Use proxy tools (such as Charles, Fiddler, or mitmproxy) to capture calls while the application is running.

  • Detect endpoints that may be missing from the documentation.

Source code

  • Search the codebase for Spring Boot controllers, Express.js routers, and similar constructs to discover internal or experimental endpoints.
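The documented and dynamic discovery steps can be combined: list every operation in the OpenAPI document, then compare it with calls captured by a proxy to surface undocumented endpoints. A minimal Python sketch, assuming the spec has already been loaded into a dict (for example with json.load or a YAML parser) and that captured traffic is available as (method, path) pairs; the function names are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def documented_endpoints(spec):
    """{(METHOD, path template)} for every operation in an OpenAPI 3 document."""
    endpoints = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:  # skip "parameters", "summary", etc.
                endpoints.add((key.upper(), path))
    return endpoints


def _matches(template, path):
    """True if a concrete path such as /users/42 fits a template such as /users/{id}."""
    path = path.split("?", 1)[0]  # ignore query strings from captured traffic
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    return all(t == p or (t.startswith("{") and t.endswith("}") and p != "")
               for t, p in zip(t_parts, p_parts))


def undocumented(observed_calls, spec):
    """Observed (method, path) pairs, e.g. from a proxy capture, that the spec does not describe."""
    documented = documented_endpoints(spec)
    return sorted({(method.upper(), path) for method, path in observed_calls
                   if not any(m == method.upper() and _matches(t, path) for m, t in documented)})
```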


3. Common tools

  • Postman: A user-friendly platform for building, organizing, and running API calls, with JavaScript scripting support.

  • Rest Assured: A Java library for writing readable API tests with a DSL.

  • HTTPie: A simple, convenient CLI tool for making HTTP calls and integrating them into scripts.

  • Swagger UI: A graphical interface that renders an OpenAPI specification and allows interactive testing from within the browser.

  • Pact: A contract testing library for Node.js, Java, and .NET, with integration into CI environments.

  • JMeter / Gatling / k6: Load and performance tools that support complex scenarios and throughput test runs.

  • OWASP ZAP / Burp: Security tools for automated scanning and penetration testing of HTTP/S interfaces.

4. Integration in a CI/CD environment

Running tests as part of a pipeline

  • For every code change, run a dedicated API test step and publish its reports (JUnit XML, HTML).

Threshold Gates

  • For example, if more than 1% of the tests fail, block the change and allow deployment only after the failures are fixed.

Reporting and automation

  • Integrate with Allure or ReportPortal to display insights and a live dashboard of API status over time.
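A threshold gate like the 1% example above can be a small script in the pipeline. This Python sketch assumes the API tests produce a JUnit-style XML report (testcase elements with failure or error children); the script name and the 0.01 default are illustrative. It returns a non-zero exit code, which CI systems generally treat as a failed step.

```python
import sys
import xml.etree.ElementTree as ET


def failure_rate(junit_xml):
    """Fraction of <testcase> elements that have a <failure> or <error> child.

    Skipped tests count as not failed here; exclude them if your policy differs.
    """
    root = ET.fromstring(junit_xml)
    cases = list(root.iter("testcase"))
    if not cases:
        return 0.0  # an empty report passes; you may prefer to fail the gate instead
    failed = sum(1 for case in cases
                 if case.find("failure") is not None or case.find("error") is not None)
    return failed / len(cases)


def gate(junit_xml, max_rate=0.01):
    """Exit code for the pipeline step: 0 lets the change through, 1 blocks it."""
    rate = failure_rate(junit_xml)
    print(f"API test failure rate: {rate:.2%} (limit {max_rate:.2%})")
    return 0 if rate <= max_rate else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CI usage: python api_gate.py reports/api-junit.xml
    with open(sys.argv[1], "rb") as report:
        sys.exit(gate(report.read()))
```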


5. Challenges and recommendations

  • Documentation consistency: Make sure every interface change is reflected in the OpenAPI specification.

  • Versioning: Using URIs like /v1/…, /v2/… to avoid breaking existing consumers.

  • Mocking/Stubbing: In a development environment, stand up a mock of the API with WireMock or MockServer, so consumers can be tested without touching the production environment.

  • Data Clean-Up: After create/update/delete tests, return the system to a clean state so that leftover data does not affect later tests.
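Stubbing and data clean-up can be shown together in one self-contained Python sketch. Instead of WireMock or MockServer it uses the standard library's http.server as a stand-in, so the /v1/users endpoint, its fields, and its behavior are hypothetical. The check creates a record through the API, verifies the status code and body, and deletes the record in a finally block so the system returns to a clean state whether the assertions pass or fail.

```python
import itertools
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

USERS = {}                     # in-memory state of the stub
_NEXT_ID = itertools.count(1)  # ids are never reused, even after deletes


class StubUsersAPI(BaseHTTPRequestHandler):
    """Hypothetical /v1/users stub standing in for the real service."""

    def _send(self, status, body=None):
        payload = json.dumps(body).encode() if body is not None else b""
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_POST(self):
        if self.path != "/v1/users":
            return self._send(404)
        length = int(self.headers.get("Content-Length", 0))
        user = json.loads(self.rfile.read(length))
        user["id"] = str(next(_NEXT_ID))
        USERS[user["id"]] = user
        self._send(201, user)

    def do_DELETE(self):
        user_id = self.path.rsplit("/", 1)[-1]
        self._send(204 if USERS.pop(user_id, None) is not None else 404)

    def log_message(self, *args):  # silence per-request logging
        pass


def call(method, url, body=None):
    """Send a JSON request and return (status, parsed body or None)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return resp.status, (json.loads(raw) if raw else None)


def check_create_user(base_url):
    status, user = call("POST", f"{base_url}/v1/users", {"name": "qa-temp"})
    try:
        assert status == 201 and user["name"] == "qa-temp"
    finally:
        # Data clean-up: remove what the test created, whether it passed or not.
        call("DELETE", f"{base_url}/v1/users/{user['id']}")


def run_demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubUsersAPI)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        check_create_user(f"http://127.0.0.1:{server.server_address[1]}")
    finally:
        server.shutdown()
        server.server_close()
    return USERS  # should be empty again after clean-up
```

run_demo() starts the stub on a free local port, runs the check, and returns the stub's remaining data, which should be empty.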

 

Test Results Analysis and Defect Triage

1. Definition

  • Test Results Analysis is the phase in which the output of a test run (reports, logs, UI states, etc.) is examined to understand what passed, what failed, which defects were found, and how severe they are.

  • Defect Triage is a systematic process in which each defect found is prioritized, categorized, and assigned an owner, so that fixes are managed and released effectively.


2. The stages of the process

Collecting results

  • Execution reports from all test frameworks (unit, integration, API, UI).

  • System logs (application/server logs).

  • Performance reports if available (response times, memory/CPU).


Initial filtering (Filtering & Grouping)

  • Filtering false-positives (tests that failed due to environmental circumstances, not code errors).

  • Grouping recurring errors by stack trace, message, or location in the code.
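Grouping recurring errors usually starts by normalizing away the parts of a stack trace that change from run to run, then fingerprinting what remains. A minimal Python sketch follows; the normalization rules (memory addresses, line numbers, other numbers) are assumptions to tune per codebase, and replacing every number can occasionally merge distinct errors.

```python
import hashlib
import re
from collections import defaultdict

# Parts of a stack trace that differ between runs of the "same" bug (assumed rules).
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "0x?"),  # memory addresses
    (re.compile(r"line \d+"), "line ?"),     # Python-style line numbers
    (re.compile(r":\d+\)"), ":?)"),          # Java-style (File.java:42)
    (re.compile(r"\b\d+\b"), "N"),           # ids, counters, other numbers
]


def signature(stack_trace):
    """Short, stable fingerprint of a normalized stack trace."""
    text = stack_trace.strip()
    for pattern, replacement in _VOLATILE:
        text = pattern.sub(replacement, text)
    return hashlib.sha1(text.encode()).hexdigest()[:12]


def group_failures(failures):
    """failures: iterable of (test_name, stack_trace) -> {signature: [test names]}."""
    groups = defaultdict(list)
    for test_name, trace in failures:
        groups[signature(trace)].append(test_name)
    return dict(groups)
```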


Classification by severity and type

  • Severity levels – Critical, Major, Minor, Trivial.

  • Priority: P0/P1 must be fixed before release; P2 is recommended; P3 can be postponed to the next release.

  • Type – Functional, Performance, Security, Usability.


Assignment of responsibility

  • Who is responsible for the fix (Developer, DevOps, Security Team).

  • Determine who is responsible for validating the fix (QA) and who should close the ticket.


Effort Estimation

  • Estimate the time required for the fix and for re-testing.

  • Feed the estimate into sprint or release backlog planning.


Documentation and drawing conclusions

  • Log each defect in a bug management tool (Jira, Azure DevOps) with accurate documentation: steps to reproduce, system data, screenshots/logs.

  • Publish a daily or weekly triage report to product management and the development team.


3. Process support tools

  • Bug management (Jira, Azure DevOps, GitHub Issues): Defect documentation, assignment, status tracking, and prioritization.

  • Log analysis (ELK Stack, Splunk, Graylog): Collecting, searching, and aggregating logs by run ID and error type.

  • Dashboards and monitoring (Grafana, Kibana, Azure Monitor): Displaying failure trends, response times, and failure rates over time.

  • Defect triage coordination (Confluence, Slack, Microsoft Teams): Running structured triage meetings and sharing reports and approvals.

4. Best Practices

Report automation

  • Configure reporting that automatically gathers all failures into a single report, with links to logs and screenshots.


Version tagging

  • Tag each bug with the code version in which it was observed, so it can be traced and reproduced against the right build.


Effective Triage Meetings

  • Set a regular agenda: a short daily check-in (15 minutes) and a more detailed weekly triage meeting.


KPIs

  • Track MTTR (Mean Time To Resolve), backlog size, and the false-positive rate.
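These KPIs are simple to compute once ticket and triage data are exported. A minimal Python sketch, assuming each ticket is a dict with opened and resolved datetime values (resolved is None while open) and each reported failure carries a triage label; the field names and labels are hypothetical.

```python
from datetime import timedelta


def mttr(tickets):
    """Mean Time To Resolve over resolved tickets, or None if none are resolved."""
    durations = [t["resolved"] - t["opened"] for t in tickets if t.get("resolved")]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def backlog_size(tickets):
    """Number of tickets that are still open."""
    return sum(1 for t in tickets if not t.get("resolved"))


def false_positive_rate(failures):
    """Share of reported failures that triage labeled as not a real defect."""
    if not failures:
        return 0.0
    return sum(1 for f in failures if f["triage"] == "false_positive") / len(failures)
```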


Sharing and training

  • Provide developers and the QA team with guidance and training to create a common language between the teams – for example, proper bug ticket drafting (clear steps, environment).


5. Challenges and recommendations for overcoming them

High bug volume

  • Use auto-triage to cluster similar problems and avoid duplication.
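Auto-triage can start as simply as clustering near-duplicate bug titles. This Python sketch uses difflib's similarity ratio in a single greedy pass; the cluster_reports name and the 0.8 threshold are assumptions, and a real setup would likely also compare stack traces and affected components.

```python
from difflib import SequenceMatcher


def cluster_reports(titles, threshold=0.8):
    """Greedy clustering of bug titles.

    Each title joins the first cluster whose representative (its first title)
    is at least `threshold` similar, ignoring case and surrounding whitespace;
    otherwise it starts a new cluster.
    """
    clusters = []  # clusters[i][0] is the representative title
    for title in titles:
        key = title.lower().strip()
        for cluster in clusters:
            if SequenceMatcher(None, key, cluster[0].lower().strip()).ratio() >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```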


Coordination between teams

  • Define SLAs for triage meetings and for reviewing the initial triage report.


Many False Positives

  • Track down the source of the FP (infrastructure, data instability) and address the underlying issue to improve test stability.

 

Regression prediction and test case selection

1. Definition and objectives

  • Regression Prediction: Using historical test run and bug data to predict which areas of the system are likely to experience problems following code changes.

  • Test Case Selection: Based on the prediction, select the most relevant test cases to run, to optimize CI/CD resources and shorten feedback times.


2. General process

Historical data collection

  • Data on all previous test runs: status (Pass/Fail), timing, code version, commit ID.

  • Bug logs: stack traces, log files, description of the commit in which the problem occurred.


Training a predictive model

Feature Engineering:

  • Generate features such as lines_changed, files_touched, historical_failure_rate for each file/module.
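The three example features can be derived from the diff and the run history. A Python sketch, assuming the diff comes from git diff --numstat (one "added<TAB>deleted<TAB>path" line per file) and the history is a list of (file, failed) records linking earlier test outcomes to files; how that link is established is project-specific, and the function names are hypothetical.

```python
from collections import Counter


def parse_numstat(output):
    """Parse `git diff --numstat` output into {path: (lines_added, lines_deleted)}."""
    diff = {}
    for line in output.strip().splitlines():
        added, deleted, path = line.split("\t", 2)
        if added == "-":  # binary files are reported as "-\t-\tpath"
            added = deleted = "0"
        diff[path] = (int(added), int(deleted))
    return diff


def build_features(diff, history):
    """Per-file features for one change.

    diff: {path: (added, deleted)}; history: iterable of (path, failed) records
    from earlier runs. Files with no history get a failure rate of 0.0 here.
    """
    runs, fails = Counter(), Counter()
    for path, failed in history:
        runs[path] += 1
        fails[path] += int(failed)
    features = {}
    for path, (added, deleted) in diff.items():
        features[path] = {
            "lines_changed": added + deleted,
            "files_touched": len(diff),
            "historical_failure_rate": fails[path] / runs[path] if runs[path] else 0.0,
        }
    return features
```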


Choose a suitable algorithm:

  • Random Forest, XGBoost, or graph-based models such as Graph Neural Networks (GNNs), which can map dependencies between components.

  • Training and cross-validation to measure prediction accuracy (precision/recall).
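Precision and recall for a risk model can be computed per change by comparing the modules it flagged with the modules that actually regressed. This Python sketch assumes the model outputs a score between 0 and 1 per module; over many examples, library helpers such as scikit-learn's precision_score and recall_score serve the same purpose.

```python
def precision_recall(scores, regressed, threshold=0.5):
    """scores: {module: predicted risk in [0, 1]}; regressed: set of modules that
    actually failed after the change. Returns (precision, recall) at `threshold`."""
    flagged = {module for module, score in scores.items() if score >= threshold}
    hits = len(flagged & regressed)
    precision = hits / len(flagged) if flagged else 0.0
    # If nothing regressed, nothing was missed, so recall is treated as perfect.
    recall = hits / len(regressed) if regressed else 1.0
    return precision, recall
```

Sweeping the threshold shows the trade-off: lowering it raises recall (fewer missed regressions) at the cost of running more tests.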


Running the model on the new changes

In each merge request/pull request, the system sends the change information (diff) and commit data to the model and receives back a risk score for each changed file or module.


Mapping to Test Cases

  • Define a mapping between files in the system and test cases (using Metadata in scripts, such as @covers login.js).

  • Select only the scripts that cover the highest-scoring files (for example, files with a risk_score ≥ 0.7).


Running in a CI/CD environment

  • In GitLab CI, Jenkins, or Azure DevOps, an “Impact Analysis” stage is defined that decides which scripts to run based on the model's output.


3. Tools and technologies

  • Models and ML platforms (TensorFlow, PyTorch): Developing and training regression prediction models.

  • Risk analysis: Commercial or open-source solutions for AI-based Test Impact Analysis.

  • Version and change information (Git, GitHub API): Retrieving diffs and commit history for feature engineering.

  • CI/CD integration (Jenkins, GitLab CI, Azure DevOps): Adding the Impact Analysis stage to the pipeline.

  • Code-to-script mapping (Allure, TestRail): Tagging scripts and mapping them to components in the code.

4. Best Practices

Constant updating of the model

  • Retrain the model periodically with new data, for example once a month.


Dynamic Threshold

  • Adjust the selection threshold according to pipeline load and the target test duration.


Logging and transparency

  • Keep a log of the model's decisions and run A/B tests to monitor performance: were the relevant scripts, and only those, actually run?


Fallback full run

  • In cases of model failure or missing data, run all scripts to avoid missing critical bugs.
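A dynamic threshold and a fallback full run fit naturally into one decision function. This Python sketch is hypothetical throughout: the queue-length signal, the 20-job limit, and the 0.1 step are illustrative, and select stands for whatever selection routine the pipeline uses. It also returns a reason string that can be logged for transparency.

```python
def choose_threshold(queue_length, base=0.7, busy_queue=20, step=0.1, ceiling=0.9):
    """Raise the selection threshold when the pipeline is busy (illustrative policy)."""
    if queue_length >= busy_queue:
        return min(base + step, ceiling)
    return base


def plan_run(all_tests, select, risk_scores, queue_length):
    """Return (tests_to_run, reason).

    select(risk_scores, threshold) is the pipeline's own selection routine.
    Missing model output or any selection error falls back to the full suite.
    """
    if not risk_scores:
        return sorted(all_tests), "fallback: no model output"
    threshold = choose_threshold(queue_length)
    try:
        return select(risk_scores, threshold), f"selected at threshold {threshold:.2f}"
    except Exception as exc:  # a broken model must never silently skip tests
        return sorted(all_tests), f"fallback: {exc}"
```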


5. Challenges and recommendations for overcoming them

Dealing with big changes

  • A fundamental change in the code architecture (extensive refactoring) may mislead the model.

Recommendation: Mark large commits as "unsafe" and require a full run.


Lack of precise mapping between code and test cases

  • Scripts with no labeling or poor mapping may not be selected.

Recommendation: Enforce uniform script tagging (@component:auth, @module:payments).


Calculation and implementation costs

  • Model training is heavy on GPU resources and cloud costs.

Recommendation: Run training on scheduled (off-peak) machines or spot instances, and rotate training jobs instead of running them all at once.

 

Visual Testing

1. Definition

Visual testing involves identifying unexpected changes in the user interface (UI) by comparing screenshots across different versions. The goal is to catch visual deviations—such as shifting elements, incorrect colors, inconsistent fonts, and responsiveness issues—that might otherwise be missed during manual functional testing.


2. How it works

Baseline image collection

  • During the first test run, screenshots of all relevant screens/components are taken.

  • The images are saved as baseline images for future comparisons.


Repeated test run

  • With every code change, the UI scenario automatically runs and saves up-to-date images.


Comparison (Image Comparison)

  • Pixel-by-pixel or fuzzy matching algorithms compare the baseline with the current screenshot.

  • Identifies diff regions: regions where pixels differ beyond a defined sensitivity threshold.
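A pixel-by-pixel comparison with a sensitivity threshold can be sketched in plain Python. Images are assumed to be equally sized lists of rows of (r, g, b) tuples (for example, converted from a screenshot with an imaging library); tolerance is the per-channel difference still treated as unchanged, and ignore boxes exclude dynamic regions. Real tools add anti-aliasing handling and far faster implementations.

```python
def compare(baseline, current, tolerance=16, ignore=()):
    """Pixel-by-pixel comparison of two same-sized RGB images.

    baseline, current: lists of rows, each row a list of (r, g, b) tuples.
    tolerance: max per-channel difference still counted as the same pixel.
    ignore: (x0, y0, x1, y1) boxes, end-exclusive, skipped entirely.
    Returns (mismatch_ratio, bounding_box_of_diff_region or None).
    """
    if len(baseline) != len(current) or any(len(a) != len(b) for a, b in zip(baseline, current)):
        raise ValueError("images must have the same dimensions")
    diffs, checked = [], 0
    for y, (row_a, row_b) in enumerate(zip(baseline, current)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in ignore):
                continue
            checked += 1
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                diffs.append((x, y))
    if not diffs:
        return 0.0, None
    xs, ys = [d[0] for d in diffs], [d[1] for d in diffs]
    return len(diffs) / checked, (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```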


Reporting results

  • Each deviation is shown in the report with a diff overlay (e.g., a red highlight over the changed area).

  • A bug ticket is opened automatically (e.g., in Jira) if the threshold is exceeded.


3. Common tools

  • Applitools Eyes: An AI-based platform that uses Visual AI to detect meaningful visual deviations.

  • Percy: A cloud service that integrates with CI to capture and compare snapshots.

  • BackstopJS: An open-source JavaScript tool that compares screenshots captured with Puppeteer.

  • Selenium + OpenCV: A custom integration that uses Selenium for capture and OpenCV for advanced comparison.

  • Storybook + Chromatic: Testing UI components built in Storybook in the Chromatic cloud service.

4. Integration in a CI/CD environment

  • Run “backstop reference” in the initial pipeline run (per branch or at a controlled release) to create the baseline, then run “backstop test” on each subsequent change to compare against it.


5. Challenges and recommendations

  • False positives: Adjust the mismatch threshold (misMatchThreshold in BackstopJS) or use ignore areas to exclude dynamic regions (e.g., a timer).

  • Responsive layouts: Define multiple viewports and test on different screen sizes (mobile, tablet, desktop).

  • Animations and dynamic content: Wait for a delay or disable animations with CSS (e.g., * { animation: none !important; }).

  • Baseline management: Keep baselines in Git and use a separate branch to avoid conflicts.

6. Best Practices

Partial Matching

  • Only test critical elements (headlines, forms, call-to-action buttons) instead of the entire page.


Visual AI

  • Prefer tools like Applitools that, instead of relying on raw pixel thresholds, judge whether screens are “similar” in terms of structure and layout.


Storybook Integration

  • If you're building UI components as separate packages, test them in Storybook with Chromatic to catch issues early.


Documentation and sharing

  • Distribute UI reports on a shared dashboard so QA and developers can quickly see what has changed.


Automated Baseline Approval

  • If a change is intentional (for example, a new design), let QA managers approve the new baseline directly through CI.


7. Looking to the Future

AI-Driven Visual Analysis

  • Beyond pixel comparison: Identifying elements (buttons, text, images) and changes in their context.


Accessibility Visual Testing

  • Verifying color contrast, font size, and accessibility-related elements using AI.


Self-Healing Snapshots

  • Tooling that automatically updates the baseline when agreed-upon design changes are made, instead of accumulating multiple images.

 

Challenges and concerns

1. Reliability of the results

  • Artificial intelligence models are not perfect, and in dynamic environments—where the UI, DOM, or data change at a high rate—they can be wrong.

  • False Positives: The system reports a visual or functional deviation when there is actually no real problem.

  • False Negatives (“Missing Faults”): A critical change goes under the radar because the model is not restrictive enough or because the threshold is set too high.

Effects:

  • QA engineers and developers waste time manually reviewing and verifying reports.

  • Lack of trust in the automated tool, which may lead the team back to old manual methods.

  • Risk of releasing critical bugs that were not diagnosed and fixed in time.


Ways to cope

Dynamic thresholds

  • Setting MisMatch Threshold at varying levels for different components: buttons and input fields have a low threshold, “cosmetic” design elements have a higher threshold.
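Per-component thresholds can be applied on top of an image diff's mismatch ratios. In this Python sketch the component kinds and limits are illustrative values, not recommendations, and the input is assumed to be one mismatch ratio per component.

```python
# Illustrative limits: functional controls are strict, decorative areas looser.
THRESHOLDS = {"button": 0.001, "input": 0.001, "banner": 0.05}
DEFAULT_THRESHOLD = 0.01


def review(mismatch_by_component, kinds):
    """mismatch_by_component: {component_id: mismatch ratio from the image diff};
    kinds: {component_id: component kind}.
    Returns {component_id: (ratio, limit)} for components over their limit."""
    failures = {}
    for component, ratio in mismatch_by_component.items():
        limit = THRESHOLDS.get(kinds.get(component), DEFAULT_THRESHOLD)
        if ratio > limit:
            failures[component] = (ratio, limit)
    return failures
```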


Using Visual AI tools

  • Moving beyond raw pixel comparison to methods that identify elements and understand page structure (such as Applitools), which reduces false positives.


Controlled human supervision

  • Every automated report of a critical deviation gets a brief human review before the baseline is updated or a bug ticket is opened.


Monitoring model performance

  • Checking the percentage of false positives/negatives over time and triggering alerts when they rise above a reasonable threshold.


2. Data and Training Requirements

Machine learning models need large amounts of training data:

  • Test history data

  • Bug history data

Effects:

  • Manually collecting and labeling data is challenging and requires a lot of effort.

  • Risk of inconsistent data (e.g., variable naming conventions), which harms training quality.

  • Privacy and regulatory issues when using real user data — GDPR, ISO 27001.


Ways to cope:

Early planning of data labeling

  • Defining uniform templates for documenting test results and bugs (required fields, clear formats).


Anonymization and aggregation

  • Removing personally identifiable information (PII) and sensitive data before training, using hashing or aggregation.
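Pseudonymization with a keyed hash keeps records joinable (the same e-mail address always maps to the same token) without exposing the original value. A minimal Python sketch using hmac and hashlib; the field names and the e-mail pattern are assumptions, and the key must be stored separately from the training data. Note that under GDPR, pseudonymized data can still count as personal data, so this complements aggregation rather than replacing it.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PII_FIELDS = {"user_email", "customer_name", "phone"}  # hypothetical field names


def pseudonymize(value, key):
    """Keyed hash (HMAC-SHA256): the same input gives the same token, and the
    token cannot be recomputed from a guessed value without the secret key."""
    return "pii_" + hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def scrub(record, key):
    """Return a copy of a test/bug record that is safer to use as training data."""
    clean = {}
    for field, value in record.items():
        if field in PII_FIELDS and value is not None:
            clean[field] = pseudonymize(str(value), key)
        elif isinstance(value, str):
            # Free text (logs, bug descriptions) may embed e-mail addresses.
            clean[field] = EMAIL.sub(lambda m: pseudonymize(m.group(), key), value)
        else:
            clean[field] = value
    return clean
```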


Automated Data Pipeline

  • Automatic connection between the CI/CD tools and a data validation system that exports and stores clean data in a dedicated repository.


Collaborations with security and regulatory teams

  • Building GDPR-compliant processes for collecting and using data, including signing Data Processing Agreements (DPA).


3. Implementation and maintenance

Integrating AI tools into existing infrastructures often requires:

  • Adjustments to the system architecture (microservices, data pipelines).

  • Cultural change in teams — new workflow processes and writing metadata for scripts.

  • A process of continuously updating and upgrading the models to adapt them to innovations in the project and technology.

Effects:

  • High development and DevOps costs for designing data infrastructure and writing integration layers.

  • A new learning curve for QA and development teams, with the need for training and adoption of new practices.

  • Risk of model staleness: a model whose predictions degrade over time because no retraining process is defined.


Ways to cope:

Scoped Proof of Concept (POC)

  • Start with a small project or one component and demonstrate clear business value before expanding.


Modular infrastructure

  • Building AI components as independent microservices, which communicate through clear APIs, to avoid a broad impact on the system.


Automatic re-training process

  • Define a separate CI job that collects new data, retrains the model, and deploys it gradually, as in a canary release.


Documentation and internal support

  • Maintain a wiki (e.g., in Confluence) with user guides, architecture diagrams, and error-handling guidelines.

 

4. Costs and ROI

Significant initial investment in model development and training:

  • Hardware acquisition (GPU/TPU), cloud costs (compute/storage).

  • License costs for commercial tools.

  • Development resources and QA tools for data collection and operations.


Effects:

  • Difficulty justifying a budget to management when benefits are not directly measured (time reduction, bug prevention).

  • Complex measurement of return on investment in implementing the use of AI, due to the qualitative and asymmetric nature of the improvement.


Ways to cope:

Measuring clear KPIs

  • Determine metrics such as:

  • Decrease in MTTR for UI issues

  • Visual inspection coverage percentage

  • Average time saved to create a test script

  • Track these metrics before and after implementation.


Focus on Quick Wins

  • Choose simple, high-ROI scenarios (e.g., smoke tests for critical flows) to demonstrate value quickly.


Hybrid tooling

  • Combining simple open source tools (BackstopJS) with advanced tools only for critical parts, to regulate costs.


Gradual financing

  • Set up budget monitoring and alerts for crossing cost thresholds, and create periodic ROI reports that reflect improvements.


A look into the future of AI in QA and testing

  • Full autonomy in testing scenarios

The next generation of tools will build and run end-to-end scenarios without human intervention. The system will independently detect edge conditions, interact with UI/API, and issue detailed error reports.

Agents combined with smart RPA will complete repetitive tasks, such as filling out forms, checking emails, and transferring files, without the need for manual scripts.


  • Scripting using natural language (NL2Test Code)

Product or QA people will write “Given/When/Then” scenarios, or simply describe scenarios in free text, and the AI will convert them directly into working test code in Selenium, Playwright, Postman, or any other framework.

The ability to combine natural language processing with DOM structure recognition will enable support for even complex applications (Desktop, Mobile, Web).


  • Self-Healing Tests and Self-Adaptive Pipelines

Scripts that will identify broken selectors or moved elements, and fix them “on the fly” by dynamically searching for elements based on patterns or similar attributes.

Pipelines that will adapt themselves through automation: If the AI detects multiple bugs in a particular module, additional testing in that module will be prioritized over testing of stable modules.


  • Reinforcement Learning for Stress Scenarios

Agents will autonomously explore unexpected extreme scenarios, such as service collapse, abnormal load, or widespread slowdowns, and develop new scenarios that test the system's stability at levels never tested before.

Reward functions will guide the agent in learning which scenarios matter most and in what order to run them.


  • Synthetic Monitoring & Testing

Systems that will generate complex synthetic test data and scenarios based on real user traffic, to perform load simulations and test services in a staging/production environment without interfering with customer data.

AI observability integration will analyze network metrics and traces, identify bottlenecks, and generate automated test scenarios that reproduce problems.


  • AI-Assistants for QA and development teams

Chatbots will answer questions in real time (for example: “What scenario is relevant for this code change?”), suggest test fixes, provide sample code, and help maintain scripts.

IDE integration: AI assistants in the IDE will detect changes in test code and offer local updates or coverage feedback.


  • Intelligent Maintenance Automation

Continuous monitoring of test results and upgrade suggestions for old or redundant scripts based on usage statistics.

When the AI tool detects scripts that deviate from Page Object Model conventions, it will suggest refactoring and produce cleaner, more maintainable code.


  • No-Code/Low-Code Testing Platforms

Advanced GUIs where users will “drag” visual components, thus creating complex tests. The AI will translate the graphical flows into optimized and readable code.

Tools will import existing assets (such as Postman Collections or TestRail cases) and transform them into editable graphical flows.


  • Automated Security and Compliance

The AI tool will review regulatory compliance (GDPR, HIPAA) at every stage: identify whether tests involve personal information, assign GDPR-aware tags, and generate compliance reports.

Automated scanning against security standards will cover not only functional security tests but also simulated penetration tests.


  • Advanced Measurement and Analytics

Smart dashboards with predictive analytics: which test scenarios are likely to fail in the future, the estimated ROI, and recommendations for adding coverage in weak areas.

MLOps integration: Automation of model training cycles, model version management, and A/B testing between model versions to find the best fit for the system.


Summary

Artificial intelligence is a turning point at the core of QA and software testing: it enables fast automated script generation, deep insights from error data, and the detection of bug trends before they surface in production. Today it works alongside API testing and visual testing tools to perform both functional checks and intelligent visual comparisons, and it powers regression prediction algorithms that select only the relevant tests.

Looking ahead, we could see these capabilities fully integrated into CI/CD pipelines: from intelligent “gatekeepers” that decide when to run certain tests, to autonomous agents that prepare test environments and deploy scripts on their own, to reinforcement learning that creates previously unimaginable edge-case scenarios. In addition, LLM-based virtual assistants will work closely with development and QA teams, accelerating bug investigation, script writing, and maintenance.

Organizations that adopt these technologies in a balanced way, with sound infrastructure planning, proper model training, thoughtful integration of the human factor, and systematic ROI measurement, can achieve significant improvements in reliability and release speed, reduce operating costs, continuously improve the product experience, and take their place at the forefront of innovation in the global software industry.
