About QA - AI and what's in between
- Apr 25

Introduction
The future is here; artificial intelligence (AI) is changing the face of testing and quality assurance (QA) processes.
Today, with the proper use of artificial intelligence, every organization can leap to the forefront of innovation. Instead of relying solely on manual scripting, which consumes significant time and resources, testing processes can now be streamlined with machine learning and deep learning algorithms, improving accuracy and accelerating feedback.
Maximum efficiency: Select and focus on high-risk test cases in real time, saving resources and increasing test coverage.
Uncompromising quality: Identify functional and visual deviations with an accuracy that far exceeds human ability.
Fast development pace: Integrate CI/CD with smart gatekeepers and automated feedback so that every new version reaches production with high quality.
In this article, we will review the key benefits of integrating AI into the testing system, present practical examples of existing uses, and look ahead to a future in which every test is another opportunity to innovate and grow.
The potential of AI in QA
1. Saving time and costs
Automation of menial tasks: Intelligent bots run repeated tests, collect logs, and produce reports without human intervention, so QA testers no longer do this work manually.
Shortening the Feedback Loop: In CI/CD, an AI model can identify bugs as early as the pre-integration stage and alert immediately, which reduces time to fix and reduces late-fix costs.
Streamlining development resources: Instead of writing hundreds of scripts manually, QA engineers can focus on creating complex and creative cases, while AI generates and maintains routine scripts.
2. Improving quality and reliability
Error Pattern Detection: Analyzing historical data (logs, error reports) allows the model to find recurring patterns — for example, recurring bugs in a specific module — and focus resources on them.
Predictive Testing: Smart algorithms predict which scenarios are likely to fail after code changes, so that fewer bugs reach production.
Automated Root Cause Analysis: Using NLP to propose likely root causes from stack traces and error messages, and to automatically suggest a first fix.
3. Dynamic adaptation and continuous improvement
Continuous Learning: Each test run provides new data—successes and failures—and the models use the new data to improve the quality of predictions and choices.
Self-Optimizing Pipelines: Recurring issues cause AI to automatically run additional tests on problematic modules and drop unnecessary tests on stable modules.
4. Expanding coverage and testing edge cases
Synthetic test data generation: Models create complex and unpredictable data sets (e.g., extreme-length strings, special characters) to find bugs that are not detected in regular real data.
Smart fuzz testing scripts: AI generates many variations of API calls or UI actions that push the system beyond the anticipated edge cases and help expose vulnerabilities.
5. Scalability and operational flexibility
Intelligent load balancing: When a job runs, the AI divides the test run between different machines based on the complexity of the scripts, enabling fast and cost-effective parallel execution in the cloud or on-premises.
Support for a variety of technologies: AI-based tools can work with Web, Mobile, and Desktop interfaces and monitor backend infrastructures – all on one centralized platform.
6. Business innovation/decision support
Futuristic dashboards: Real-time reporting on quality metrics trends (such as MTTR, failure rate) and mapping relationships between bugs and business modules, to show ROI for QA investment.
Risk-Based testing planning: Using AI, scripts can be categorized/prioritized by business risk (e.g., Online payment tests – top priority) and recommended execution order according to the strategic needs of the organization.
Creating automated test scripts
1. Introduction to the idea
A test script is a set of commands that perform automated testing of a particular software component. In the past, these scripts were written manually, line by line, based on understanding the system. Today, artificial intelligence models (mainly large language models – LLMs) allow us to generate scripts automatically from documentation, requirements specifications, or existing code.
2. How it works
Input to the AI model
Description of the test scenario in natural language (for example: “Test login with a valid user and an incorrect password”).
A reference to the page's DOM (for Playwright/Selenium), an API specification, or source code.
Input processing
The model identifies entities (buttons, fields, menus) and maps them to technical elements (locators, selectors).
Creates logic for running actions: opening the browser, navigating, entering values, clicking, validating a result.
Output – Automatic script
Usually in the format of an existing framework.
Also includes asserts/expectations for checking results (such as the existence of certain text, checking HTTP status).
3. Key Benefits
Development speed: Instead of hours of manual writing, basic scripts are built in minutes.
Consistency: The model generates code according to a uniform template, reducing syntax errors and “drift” in writing style.
Instant feedback: You can change a scenario description and request an immediate refresh of the script.
Adaptation to changing environments: If the UI design changes (IDs, class names), you can ask the model to regenerate the selectors with relatively little effort.
4. Challenges and recommendations for overcoming them
The accuracy of the selectors
Prefer dedicated data attributes (such as data-test-id) over IDs or class names, so that styling and layout changes are less likely to break the selectors.
Script maintenance
Separate the logic using the Page Object Model: define classes that represent each page's elements, and have the test steps work only through these classes.
Full coverage of edge cases
Identifying “edge cases” still requires active human intervention.
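The Page Object Model recommendation above can be sketched as follows. This is a minimal illustration, not a definitive implementation: FakePage is a hypothetical stand-in for a real browser page (in practice this would be a Playwright or Selenium object), and the selector values and the LoginPage class name are assumptions made for the example.

```python
class FakePage:
    """Hypothetical stand-in for a browser page; a real test would use Playwright or Selenium."""

    def __init__(self):
        self.values = {}   # selector -> text typed into it
        self.clicked = []  # selectors clicked, in order

    def fill(self, selector, value):
        self.values[selector] = value

    def click(self, selector):
        self.clicked.append(selector)


class LoginPage:
    """Page object: the only place that knows the login page's selectors."""

    # Assumed data-test-id values; if the UI changes, only these lines change.
    USERNAME = '[data-test-id="username"]'
    PASSWORD = '[data-test-id="password"]'
    SUBMIT = '[data-test-id="login-submit"]'

    def __init__(self, page):
        self.page = page

    def login(self, username, password):
        self.page.fill(self.USERNAME, username)
        self.page.fill(self.PASSWORD, password)
        self.page.click(self.SUBMIT)


# The test step reads as intent and contains no selectors of its own.
page = FakePage()
LoginPage(page).login("alice", "wrong-password")
```

Because the test body never mentions a selector, a generated or hand-written script only needs its page object updated when the UI changes.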
5. A look into the future in the field
Learning UI Language Models
Models that can read screen images and generate test scripts from them.
No-Code Automation
Graphical interfaces that allow drag-and-drop of test scenarios – and the AI translates them into code in the background.
Self-healing scripts
Scripts that detect changed elements (e.g., changed ID) and automatically update the selector using pattern recognition.
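One possible reading of "pattern recognition" for self-healing is plain attribute similarity: when the recorded element is no longer found, pick the element on the page that shares the most attributes with it. The sketch below assumes elements are represented as simple dictionaries; heal_selector, the attribute names, and the 0.5 cut-off are all hypothetical choices for illustration, and real tools are likely to use richer signals.

```python
def heal_selector(expected, candidates, min_score=0.5):
    """Return the candidate element most similar to `expected`, or None if nothing is close enough."""

    def score(candidate):
        # Fraction of attributes (over the union of both key sets) that match exactly.
        keys = set(expected) | set(candidate)
        if not keys:
            return 0.0
        matches = sum(1 for k in keys if expected.get(k) == candidate.get(k))
        return matches / len(keys)

    best = max(candidates, key=score, default=None)
    if best is None or score(best) < min_score:
        return None
    return best


# Attributes recorded when the script was created (illustrative values).
expected = {"id": "login-btn", "tag": "button", "text": "Log in", "class": "primary"}

# Elements found on the current page: the ID of the login button has changed.
dom = [
    {"id": "signup-btn", "tag": "button", "text": "Sign up", "class": "secondary"},
    {"id": "btn-login", "tag": "button", "text": "Log in", "class": "primary"},
]

healed = heal_selector(expected, dom)
```

Here the second element matches on tag, text, and class, so it is chosen despite the changed ID. Returning None below the cut-off keeps the script from silently binding to an unrelated element.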
Identifying and testing APIs
1. Definition
APIs (Application Programming Interfaces) are the meeting points between software components: between Frontend and Backend, or between microservices. API testing checks the inputs (requests) and outputs (responses) at the protocol level, without the need for a user interface.
2. Interface Discovery Steps
Documented API specifications
OpenAPI / Swagger: A YAML/JSON file that describes all endpoints, parameters, response structures, and data models.
RAML / API Blueprint: Common text-based alternatives for describing an API.
Dynamic Inspection (Runtime Inspection)
Using proxy tools (such as Charles, Fiddler, Mitmproxy) and tracking calls while the application is running.
Detecting new endpoints that may not be documented.
Source code
Search the codebase for Controllers in Spring Boot, Routers in Express.js, etc., to discover internal or experimental endpoints.
3. Common tools
Name | Short description |
Postman | A user-friendly platform for building, organizing, and running API calls, with JavaScript scripting support. |
Rest Assured | A Java library for writing intuitive API tests using DSL. |
HTTPie | A simple and convenient CLI tool for running HTTP calls and integrating into scripts. |
Swagger UI | A graphical interface that displays OpenAPI and allows for interactive testing from within the browser. |
Pact | Contract Testing library for Node.js/Java/.NET, with integration into CI environments. |
JMeter/Gatling/k6 | Load and performance tools with complex scenario options and throughput test runs. |
OWASP ZAP/Burp | Security tools for automatic scanning and penetration testing of HTTP/S interfaces. |
4. Integration in a CI/CD environment
Running tests as part of a pipeline
For every change made to the code, run a dedicated API testing step and collect the resulting reports (JUnit, HTML).
Threshold Gates
For example, if more than 1% of the tests fail, reject the code and allow deployment only after correction.
Reporting and automation
Integrate with Allure or ReportPortal to display insights and a live dashboard of API status over time.
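The threshold gate described above reduces to a small check on the test counts. The following is a sketch under stated assumptions: the function name gate_passes and its parameters are hypothetical, and treating an empty run as a failed gate is a design choice made here, not something the pipeline tools prescribe.

```python
def gate_passes(total, failed, max_failure_rate=0.01):
    """Return True when the share of failed tests is within the allowed rate (1% by default)."""
    if total <= 0:
        # No tests ran: treat this as a failed gate rather than a silent pass (assumed policy).
        return False
    return failed / total <= max_failure_rate


# Illustrative numbers: 1 failure out of 200 is 0.5%, 3 out of 200 is 1.5%.
print(gate_passes(total=200, failed=1))  # within the 1% threshold
print(gate_passes(total=200, failed=3))  # above the 1% threshold
```

In a pipeline, the counts would typically come from the parsed JUnit report, and a False result would fail the job so that deployment is blocked until the failures are fixed.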
5. Challenges and recommendations
Documentation coherence: It is important to ensure that any changes to the interface are documented in OpenAPI.
Versioning: Using URIs like /v1/…, /v2/… to avoid breaking existing consumers.
Mocking/Stubbing: In a development environment, create a copy of the API using WireMock or MockServer for consumer testing without touching the production environment.
Data Clean-Up: After create/update/delete checks, return the system to a clean state so that leftover data does not interfere with later test runs.
Test Results Analysis and Defect Triage
1. Definition
Test Results Analysis is the phase in which the output of a test run (reports, logs, UI states, and so on) is examined to understand what changed, what defects were found, and how severe they are.
Defect Triage is a systematic process in which each defect found is prioritized, categorized, and assigned an owner, in order to manage remediation and the release of fixes effectively.
2. The stages of the process
Collecting results
Runtime reports from all Frameworks (Unit, Integration, API, UI).
System logs (application/server logs).
Performance reports if available (response times, memory/CPU).
Initial filtering (Filtering & Grouping)
Filtering false-positives (tests that failed due to environmental circumstances, not code errors).
Grouping recurring errors by stack trace, message, or location in the code.
Classification by severity and type
Severity levels – Critical, Major, Minor, Trivial.
Priority – P0/P1: must fix before release; P2: recommended; P3: postpone to the next release.
Type – Functional, Performance, Security, Usability.
Assignment of responsibility
Who is responsible for the fix (Developer, DevOps, Security Team).
Determine who is responsible for validating the fix (QA) and who should close the ticket.
Effort Estimation
Estimating the time required for the fix and for retesting.
Feed the estimate into Sprint or Release backlog planning.
Documentation and drawing conclusions
Log each defect in a bug management tool (Jira, Azure DevOps) with accurate documentation: steps to reproduce, system data, screenshot/log.
Publishing a weekly/daily Triage report to product management and the development teams.
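The "grouping recurring errors" step in the process above can be approximated by normalizing each error message into a signature and bucketing failures by it. This is a minimal sketch: the function names and the sample failures are made up for illustration, and the two regular expressions are only one assumed way of stripping volatile details such as numbers and memory addresses.

```python
import re
from collections import defaultdict


def failure_signature(message):
    """Normalize an error message so that volatile details do not split groups."""
    signature = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # memory addresses
    signature = re.sub(r"\d+", "<n>", signature)              # timings, counts, IDs
    return signature.strip()


def group_failures(failures):
    """Map each signature to the list of tests that failed with it."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[failure_signature(message)].append(test_name)
    return dict(groups)


# Illustrative failures: two timeouts that differ only in the waited duration.
failures = [
    ("test_checkout", "TimeoutError: waited 3000 ms for #pay"),
    ("test_refund", "TimeoutError: waited 5000 ms for #pay"),
    ("test_login", "AssertionError: expected 200, got 500"),
]

groups = group_failures(failures)
for signature, tests in groups.items():
    print(f"{len(tests)} x {signature}: {tests}")
```

Triage can then start from one ticket per signature rather than one per failed test, which also helps avoid duplicate tickets.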
3. Process support tools
Category | Main tools | Purpose |
Bug management | Jira, Azure DevOps, GitHub Issues | Fault documentation, assignment, status tracking and prioritization. |
Log analysis | ELK Stack, Splunk, Graylog | Collection, search, and aggregation of logs by run ID, error type. |
Dashboards and monitoring | Grafana, Kibana, Azure Monitor | Displaying failure trends, response times, and failure rates over time. |
Defect Triage Coordination | Confluence, Slack, Microsoft Teams | Conduct structured triage meetings, share reports and approvals. |
4. Best Practices
Report automation
Define reports that automatically centralize all failures in one report, with links to logs and screenshots.
Version tagging
Tag bugs with the code version in which they were observed, to ensure consistency and make them easy to reproduce.
Effective Triage Meetings
Set a regular agenda: a short daily check-in (15 minutes) and a more detailed weekly triage meeting.
KPIs
Track MTTR (Mean Time To Resolve), the size of the backlog, and the false-positive rate.
Sharing and training
Provide developers and the QA team with guidance and training to create a common language between the teams – for example, proper bug ticket drafting (clear steps, environment).
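As a worked example of the MTTR metric mentioned under KPIs: it is the average of (resolved time minus opened time) over resolved tickets. The sketch below assumes tickets are available as (opened, resolved) pairs; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(tickets):
    """Average resolution time over resolved tickets; open tickets (resolved is None) are skipped."""
    durations = [resolved - opened for opened, resolved in tickets if resolved is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Illustrative data only.
tickets = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 13)),  # resolved after 4 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 17)),  # resolved after 8 hours
    (datetime(2024, 1, 3, 9), None),                      # still open, not counted
]

mttr = mean_time_to_resolve(tickets)
print(mttr)  # 6:00:00
```

Excluding open tickets keeps the number well defined, but it can make MTTR look better than it is when old tickets pile up, so it is worth tracking alongside the backlog size.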
5. Challenges and recommendations for overcoming them
Bug overload
Use auto-triage to cluster similar problems and avoid duplication.
Coordination between teams
Define an SLA for Triage meetings and for reviewing the preliminary Triage report.
Many False Positives
Track down the source of the FP (infrastructure, data instability) and address the underlying issue to improve test stability.
Regression prediction and test case selection
1. Definition and objectives
Regression Prediction: Using historical test run and bug data to predict which areas of the system are likely to experience problems following code changes.
Test Case Selection: Based on the prediction, select the most relevant test cases to run, to optimize CI/CD resources and shorten feedback times.
2. General process
Historical data collection
Data on all previous test runs: status (Pass/Fail), timing, code version, commit ID.
Bug logs: stack traces, log files, description of the commit in which the problem occurred.
Training a predictive model
Feature Engineering:
Generate features such as lines_changed, files_touched, historical_failure_rate for each file/module.
Choose a suitable algorithm:
Random Forest, XGBoost, or graph-based models like GNN – Graph Neural Networks that can map dependencies between components.
Training and cross-validation to measure prediction accuracy (precision/recall).
Running the model on the new changes
In each Merge Request/Pull Request, the system sends the information about the changes (diff) and commit data to the AI, and receives back a risk score for each changed file or module.
Mapping to Test Cases
Define a mapping between files in the system and test cases (using Metadata in scripts, such as @covers login.js).
Select only the scripts that cover the files with the highest score (for example, any script that receives a risk_score ≥ 0.7).
Running in a CI/CD environment
In GitLab/Jenkins/Azure DevOps, an “Impact Analysis” phase is defined that decides which scripts to run based on the model's output.
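The mapping and selection steps above can be sketched in a few lines. This example assumes the model's output is already available as a dictionary of per-file risk scores and that the @covers metadata has been parsed into a test-to-files mapping; select_tests, the file names, and the scores are all hypothetical.

```python
def select_tests(risk_scores, coverage, threshold=0.7):
    """Return the tests that cover at least one file whose risk score meets the threshold."""
    risky_files = {path for path, score in risk_scores.items() if score >= threshold}
    return sorted(
        test for test, files in coverage.items() if risky_files & set(files)
    )


# Assumed model output for the files touched by a change.
risk_scores = {"login.js": 0.85, "cart.js": 0.40, "payment.js": 0.72}

# Assumed mapping derived from @covers metadata in the test scripts.
coverage = {
    "test_login_valid": ["login.js"],
    "test_cart_total": ["cart.js"],
    "test_checkout": ["cart.js", "payment.js"],
}

selected = select_tests(risk_scores, coverage)
print(selected)  # ['test_checkout', 'test_login_valid']
```

test_cart_total is skipped because cart.js scores below 0.7, while test_checkout is kept because it also covers payment.js.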
3. Tools and technologies
Category | Tool/Library | Description |
Models and ML Platforms | TensorFlow, PyTorch | Development and training of regression prediction models. |
Risk Analysis | Commercial/open source solutions for AI-based Test Impact Analysis. | |
Information about versions and changes | Git, GitHub API | Retrieving the diff and commit history for Feature Engineering. |
CI/CD Integration | Jenkins, GitLab CI, Azure DevOps | Integrating the Impact Analysis phase into the pipeline. |
Managing the relationship between code and scripts | Allure, TestRail | Tagging scripts and mapping them to components in the code. |
4. Best Practices
Constant updating of the model
Make sure to re-train periodically with new data, for example a month after the last training run.
Dynamic Threshold
Adjust the selection threshold according to pipeline load and the time budget for the test run.
Log and transparency
Keep a log of the model's decisions and conduct A/B tests to monitor performance: Were only the relevant scripts actually run?
Fallback full run
In cases of model failure or missing data, run all scripts to avoid missing critical bugs.
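The fallback and logging practices above can be combined in one small decision function. This is a sketch under assumptions: plan_run and its reason strings are hypothetical, a model failure is represented here as None, and a selection that names unknown tests or is empty is treated as untrustworthy.

```python
def plan_run(all_tests, model_selection, min_tests=1):
    """Return (tests_to_run, reason); fall back to the full suite when the selection cannot be trusted."""
    if model_selection is None:
        # Model failed or was unreachable.
        return list(all_tests), "fallback: model unavailable"
    unknown = set(model_selection) - set(all_tests)
    if unknown or len(model_selection) < min_tests:
        # Stale mapping or missing data: safer to run everything.
        return list(all_tests), "fallback: selection incomplete or invalid"
    return list(model_selection), "model selection"


all_tests = ["test_login", "test_cart", "test_checkout"]

for selection in (["test_login"], None, []):
    tests, reason = plan_run(all_tests, selection)
    print(f"{reason}: running {len(tests)} test(s)")
```

Returning the reason alongside the list gives the pipeline something concrete to write to the decision log, which supports the transparency practice described above.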
5. Challenges and recommendations for overcoming them
Dealing with big changes
A fundamental change in the code architecture (extensive refactoring) may mislead the model.
Recommendation: Mark large commits as "unsafe" and require a full run.
Lack of precise mapping between code and test cases
Scripts with no labeling or poor mapping may not be selected.
Recommendation: Enforce uniform script tagging (@component:auth, @module:payments).
Calculation and implementation costs
Model training is heavy on GPU resources and cloud costs.
Recommendation: Use scheduled (fixed-time) machines or spot instances, and run training on a rotating schedule.
Visual Testing
1. Definition
Visual testing involves identifying unexpected changes in the user interface (UI) by comparing screenshots across different versions. The goal is to catch visual deviations—such as shifting elements, incorrect colors, inconsistent fonts, and responsiveness issues—that might otherwise be missed during manual functional testing.
2. How it works
Baseline image collection
During the first test run, screenshots of all relevant screens/components are taken.
The images are saved as baseline images against which future runs will be compared.
Repeated test run
With every code change, the UI scenario automatically runs and saves up-to-date images.
Comparison (Image Comparison)
Pixel-by-pixel or fuzzy matching algorithms compare the baseline image to the current one.
The comparison identifies diff regions: regions where pixels differ beyond a defined sensitivity threshold.
Reporting results
Each deviation is displayed in the report with an overlay of the diff (a red block over the changed area).
An automatic bug ticket is raised (e.g., in Jira) if the threshold is exceeded.
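The comparison step above can be illustrated with a pure-Python pixel diff. This sketch assumes images are small grayscale grids (lists of rows of 0-255 values) rather than real screenshots; mismatch_ratio and the tolerance parameter are hypothetical names, and real tools operate on decoded image files with more sophisticated matching.

```python
def mismatch_ratio(baseline, current, tolerance=0):
    """Fraction of pixels whose grayscale values differ by more than `tolerance`."""
    same_shape = len(baseline) == len(current) and all(
        len(a) == len(b) for a, b in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in baseline)
    differing = sum(
        1
        for row_a, row_b in zip(baseline, current)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > tolerance
    )
    return differing / total if total else 0.0


# Two 2x4 "images": one pixel is slightly darker in the current run.
baseline = [[0, 0, 0, 0], [0, 255, 255, 0]]
current = [[0, 0, 0, 0], [0, 255, 250, 0]]

print(mismatch_ratio(baseline, current))                # 0.125 (1 of 8 pixels differs)
print(mismatch_ratio(baseline, current, tolerance=10))  # 0.0 (difference is within tolerance)
```

The per-pixel tolerance plays the role of fuzzy matching, while the returned ratio is what would be compared against the sensitivity threshold before a deviation is reported.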
3. Common tools
Tool / Framework | Description |
Applitools Eyes | An AI-based platform that uses Visual AI to detect deviations intelligently. |
Percy | A cloud service that integrates with CI for taking and comparing snapshots. |
BackstopJS | An open-source JavaScript tool for comparing screenshots captured with Puppeteer. |
Selenium + OpenCV | Manual integration of Selenium for capture and OpenCV for advanced comparison. |
Storybook + Chromatic | Testing UI components built into the Storybook library in the Chromatic cloud. |
4. Integration in a CI/CD environment
Add a BackstopJS reference step (backstop reference) to the initial pipeline run (per branch or in a controlled release) to create the baseline images.
5. Challenges and recommendations
Challenge | Recommendation |
False Positives | Adjust the MisMatch Threshold or use “IgnoreAreas” to remove dynamic areas (e.g. timer). |
Responsive support | Define multiple viewports and test on different screens (mobile, tablet, desktop). |
Support for animations and dynamics | Wait for a delay or block animations with CSS (e.g., * { animation: none !important; }). |
Baseline Management | Keep a baseline in Git and use a separate branch to avoid conflicts. |
6. Best Practices
Partial Matching
Only test critical elements (headlines, forms, call-to-action buttons) instead of the entire page.
Visual AI
Prefer tools like Applitools that do not rely on pixel thresholds but instead judge similarity in terms of structure and position.
Storybook Integration
If you're building UI components as separate packages, test them within Storybook in Chromatic for early testing.
Documentation and sharing
Distribute UI reports on a shared dashboard so QA and developers can quickly see what has changed.
Automated Baseline Approval
If the change is valid (for example, a new design), give QA managers the option to automatically approve a new baseline through CI.
7. Looking to the Future
AI-Driven Visual Analysis
Beyond pixel comparison: Identifying elements (buttons, text, images) and changes in their context.
Accessibility Visual Testing
Verifying color contrast, font size, and assistive elements using AI.
Self-Healing Snapshots
Infrastructure that can automatically refresh a baseline when approved changes are introduced, instead of creating multiple images.
Challenges and concerns
1. Reliability of the results
Artificial intelligence models are not perfect, and in dynamic environments—where the UI, DOM, or data change at a high rate—they can be wrong.
False Positives: The system reports a visual or functional deviation when there is actually no real problem.
False Negatives (missed faults): A critical change goes unnoticed because the model is not strict enough or because the threshold is set too high.
Effects:
Wasting time of QA engineers and developers on manual testing for approval and verification.
Lack of trust in the automated tool, which may lead the team back to old manual methods.
Risk of releasing critical bugs that were not fixed/diagnosed in a timely manner.
Ways to cope:
Dynamic adjustment of thresholds
Setting the MisMatch Threshold at different levels for different components: buttons and input fields get a low threshold, “cosmetic” design elements get a higher one.
Using Visual AI tools
Moving from raw pixel-based comparison to methods based on identifying elements and understanding the DOM structure (such as Applitools), which reduce false positives.
Controlled human supervision
Each automated report of a critical deviation undergoes a brief human review before the baseline is updated or a bug ticket is opened.
Monitoring model performance
Checking the percentage of false positives/negatives over time and triggering alerts when they rise above a reasonable threshold.
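Monitoring model performance can start very simply, by recording the outcome of each human review and watching the recent false-positive rate. The sketch below assumes each reviewed deviation is stored as a boolean (True when it was a real defect); the function names, the 20-review window, and the 20% alert level are hypothetical values chosen for illustration.

```python
def false_positive_rate(reviews):
    """Share of reported deviations that reviewers marked as not real defects."""
    if not reviews:
        return 0.0
    return reviews.count(False) / len(reviews)


def should_alert(reviews, max_rate=0.2, window=20):
    """Alert when the false-positive rate over the most recent reviews exceeds max_rate."""
    recent = reviews[-window:]
    return false_positive_rate(recent) > max_rate


# True = confirmed defect, False = false positive (illustrative history).
stable_history = [True] * 9 + [False]
noisy_history = [True] * 5 + [False] * 5

print(should_alert(stable_history))  # False: 10% of recent reports were false positives
print(should_alert(noisy_history))   # True: 50% of recent reports were false positives
```

Measuring false negatives needs a different data source, such as bugs found later in production that the tool did not flag, so it is not covered by this sketch.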
2. Data and Training Requirements
Machine learning models need large amounts of training data:
Test history data
Bug history data
Effects:
Manually collecting and labeling data is challenging and requires a lot of effort.
Risk of inconsistent data (e.g., variable naming conventions), which harms training quality.
Privacy and regulatory issues when using real user data — GDPR, ISO 27001.
Ways to cope:
Early planning of data labeling
Defining uniform templates for documenting test results and bugs (required fields, clear formats).
Anonymization and aggregation
Removing personally identifiable information (PII) and sensitive data before training, using hashing or aggregation.
Automated Data Pipeline
Automatic connection between the CI/CD tools and a data validation system that exports and stores clean data in a dedicated repository.
Collaborations with security and regulatory teams
Building GDPR-compliant processes for collecting and using data, including signing Data Processing Agreements (DPA).
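The hashing approach mentioned above can be sketched with Python's standard hmac and hashlib modules. The field names, the record, and the helper names are hypothetical, and the secret key is a placeholder that would be stored outside the dataset. Note that keyed hashing is pseudonymization rather than full anonymization, so whether it is sufficient is a question for the security and regulatory teams.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a value with a keyed SHA-256 digest so the same input always maps to the same token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record, pii_fields, secret_key):
    """Return a copy of the record with the listed PII fields replaced by tokens."""
    return {
        key: pseudonymize(str(value), secret_key) if key in pii_fields else value
        for key, value in record.items()
    }


# Placeholder key for the example only; keep the real key in a secrets manager.
SECRET_KEY = b"example-key-not-for-production"

record = {"user_email": "alice@example.com", "test_name": "test_checkout", "status": "failed"}
clean = anonymize_record(record, pii_fields={"user_email"}, secret_key=SECRET_KEY)
print(clean["test_name"], clean["status"], clean["user_email"][:12] + "...")
```

Using a keyed digest rather than a bare hash makes it harder to recover values by hashing a list of likely inputs, and keeping the mapping deterministic preserves the ability to group records by the same (unknown) user during training.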
3. Implementation and maintenance
Integrating AI tools into existing infrastructures often requires:
Adjustments to the system architecture (microservices, data pipelines).
Cultural change in teams — new workflow processes and writing metadata for scripts.
A process of continuously updating and upgrading the models to adapt them to innovations in the project and technology.
Effects:
High development and DevOps costs for designing data infrastructure and writing integration layers.
A new learning curve for QA and development teams: the need for training and for adopting new practices.
Risk of "model obsolescence": a model that does not improve over time because no re-training process has been defined.
Ways to cope:
Gradual Proof of Concept (POC)
Start with a small project or one component and demonstrate clear business value before expanding.
Modular infrastructure
Building AI components as independent microservices, which communicate through clear APIs, to avoid a broad impact on the system.
Automatic re-training process
Define a separate Job in CI whose purpose is to collect new data, retrain the model, and deploy it similarly to canary releases.
Documentation and internal support
Create a Wiki or Confluence with user guides, architecture diagrams, and error handling guidelines.
4. Costs and ROI
Significant initial investment in model development and training:
Hardware acquisition (GPU/TPU), cloud costs (compute/storage).
License costs for commercial tools.
Development resources and QA tools for data collection and operations.
Effects:
Difficulty justifying a budget to management when benefits are not directly measured (time reduction, bug prevention).
Complex measurement of return on investment in implementing the use of AI, due to the qualitative and asymmetric nature of the improvement.
Ways to cope:
Measuring clear KPIs
Determine metrics such as:
Decrease in MTTR for UI issues
Visual inspection coverage percentage
Average time saved to create a test script
Track these metrics before and after implementation.
Focus on Quick Wins
Choose simple, high-ROI scenarios (e.g., smoke tests for critical flows) to demonstrate value quickly.
Hybrid tooling
Combining simple open source tools (BackstopJS) with advanced tools only for critical parts, to keep costs under control.
Gradual financing
Set up budget monitoring and alerts for crossing cost thresholds, and create periodic ROI reports that reflect improvements.
A look into the future of AI in QA and testing
Full autonomy in testing scenarios
The next generation of tools will build and run end-to-end scenarios without human intervention. The system will independently detect edge conditions, interact with UI/API, and issue detailed error reports.
A combination of agents and smart RPA, which will complete repetitive tasks—such as filling out forms, checking emails, transferring files—without the need for manual scripts.
Scripting using natural language (NL2Test Code)
Going forward, product or QA people will write “Given/When/Then” steps or simply describe scenarios in free language, and the AI will convert them directly into functional test code in Selenium, Playwright, Postman or any other framework.
The ability to combine natural language processing with DOM structure recognition will enable support for even complex applications (Desktop, Mobile, Web).
Self-Healing Tests and Self-Adaptive Pipelines
Scripts that will identify broken selectors or moved elements, and fix them “on the fly” by dynamically searching for elements based on patterns or similar attributes.
Pipelines that will adapt themselves through automation: If the AI detects multiple bugs in a particular module, additional testing in that module will be prioritized over testing of stable modules.
Reinforcement Learning for Stress Scenarios
Agents will autonomously investigate unexpected extreme scenarios, such as service collapse, abnormal load, or cascading slowdowns, and develop new scenarios to test the system's stability at levels never before tested.
Using “reward functions” will guide the agent to better learn which scenarios are important and in what order to run for testing.
Synthetic Monitoring & Testing
Systems that will generate complex synthetic test data and scenarios based on real user traffic, to perform load simulations and test services in a staging/production environment without interfering with customer data.
AI observability integration: analyzing network metrics and traces, identifying bottlenecks, and generating automated test scenarios to reproduce problems.
AI-Assistants for QA and development teams
Chatbots will answer questions in real time (for example: “What scenario is relevant for this code change?”), suggest test fixes, provide sample code, and contribute to script maintenance.
IDE integration: AI-powered development that will detect changes in test code and offer local updates or coverage feedback.
Intelligent Maintenance Automation
Continuous monitoring of test results and upgrade suggestions for old or redundant scripts based on usage statistics.
When the AI tool detects scripts that deviate from Page Object Model conventions, it will suggest refactoring and produce cleaner, more maintainable code.
No-Code/Low-Code Testing Platforms
Advanced GUIs where users will “drag” visual components, thus creating complex tests. The AI will translate the graphical flows into optimized and readable code.
Assets imported from existing projects (such as Postman Collections or TestRail) will be transformed into editable graphical flows.
Automated Security and Compliance
The AI tool will review regulatory compliance (GDPR, HIPAA) at every stage: identify whether checks mention personal information, assign GDPR-aware tags, and generate compliance reports.
Automated scanning against security standards: not only functional security tests, but also simulated penetration tests.
Advanced Measurement and Analytics
Smart dashboards with predictive analytics: which test scenarios are likely to fail in the future, the estimated ROI, and recommendations for adding coverage in weak areas.
MLOps integration: Automation of model training cycles, model version management, and A/B testing between different versions to find the best fit for the system.
Summary
Artificial intelligence marks a turning point at the core of QA and software testing processes: it enables high-speed automated script generation, deep insights from error data, and the identification of bug trends before they surface in production. Today, it interfaces with API testing and Visual Testing tools to perform both functional testing and intelligent display comparisons, and it applies regression prediction algorithms to select only the relevant tests.
Looking ahead, we could see these capabilities fully integrated into CI/CD pipelines: from intelligent “gatekeepers” that decide when to run certain tests, to autonomous agents that prepare test environments and install scripts on their own, to the use of reinforcement learning to create previously unimaginable edge-case scenarios. In addition, LLM-based virtual assistants will closely accompany development and QA teams, accelerating bug investigation, script writing, and maintenance.
Organizations that integrate technologies in a balanced manner—while ensuring infrastructure planning, proper model training, proper integration of the human factor, and systematic measurement of ROI—will produce dramatic improvements in reliability and release speed, reduce operating costs, and continuously improve the product experience, and will take their place at the forefront of innovation in the global software industry.


