Continuous Accessibility is an approach that ensures code intended to be displayed in browsers can be continuously checked and monitored for digital accessibility requirements, through a novel application of existing software engineering concepts and the Web Content Accessibility Guidelines (WCAG).
There is currently no sufficient way to quantify the full scope of digital accessibility (a11y) work, and no public recommendation for quantifying that work exists.
Additionally, these challenges frequently exist:
- Limited visibility into the audit process and results. A11y audit results are typically available to only a few people, and the results/reports themselves are often complex or difficult to use.
- Incomplete audit process. Sometimes an accessibility scorecard or dashboard does not provide enough context, so folks don’t have the insights they need to improve.
- Lack of sufficient methods to prevent regressions in tested code; while technical solutions for regression and snapshot testing exist, they are largely ineffective because they are not trained to recognize the significance of a regression. Additionally, they often lack policy support.
- Static (linting) and dynamic (testing) checks can be turned off at the application level, at the code line level, and elsewhere (see the sketch after this list).
- Lack of incentive for developers to create accessible code. Development practices have yet to align in such a way that accessibility is considered an integral part of shipping feature work.
- Lack of metrics. Subject matter experts have the lived experience to know what kind of work needs to be done, but need the numbers to back them up. Clearly defined metrics will give management and product teams the numbers they need to both understand and support this work.
- The work is complex. In this case, complexity is a function of the number of interactions required to complete the work, the amount of time the project will take, and the fact that some of the necessary methods do not yet exist and still need to be invented.
- Inability to benchmark. Accessibility occupies a different space than other software engineering specialties because of the legal requirements for digital accessibility. Avoiding the risk of legal liability is just one of the potential reasons that no public, industry-wide benchmarks exist.
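To make the suppression problem concrete, here is a minimal sketch assuming a React/TypeScript codebase linted with eslint-plugin-jsx-a11y and tested with axe-core (two commonly used tools; the setup is an assumption, not a prescription). Both kinds of checks can be silenced with a single line:

```tsx
import React from 'react';
import axe from 'axe-core';

// Static analysis: one comment silences the a11y lint rule for this line.
// eslint-disable-next-line jsx-a11y/alt-text
const logo = <img src="/logo.png" />;

// Dynamic analysis: axe-core rules can be disabled per run just as easily.
async function checkPage() {
  const results = await axe.run(document, {
    rules: { 'color-contrast': { enabled: false } }, // check quietly turned off
  });
  return results.violations;
}
```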
Digital accessibility can be hard. Let's automate it.
Comprehensive accessibility conformance for digital products is an unsolved, complex problem, known to be without a reliable, transparent route to success. Further adding to the complexity, standards are still being developed for native applications on devices such as mobile phones and wearables.
In practice, achieving a given WCAG criterion requires a set of actions, implementations, or policy updates, each of which ensures that some portion of the success criterion (e.g., a particular use case) is met. As such, the most effective means of generating rapid, high-impact accessibility improvements for digital products is to deconstruct a WCAG success criterion into a subset of accessibility quality-control mechanisms: action steps that can be taken to comprehensively achieve an accessible experience.
The process by which efficient improvements can be achieved matches that of the software development process in general: tightly scoped, iterative, and distributed.
A Continuous Accessibility strategy, then, can be divided into three parts:
- A plan for the code that already exists
- A plan for the code that will be created
- A method to measure and report progress
I will only briefly cover the first two (at least for now), since I have previously spoken about this topic, most recently at GitHub Universe, and the slides are available on my noti.st profile.
The code that already exists
Any plans made to improve accessibility in digital products should include plans for the code that already exists. It is necessary to consider the age of the code base. How long has that code been around? This will probably lead to conversations about dependencies that need to be upgraded.
How do we plan for upgrades? It’s great if the accessibility of a common dependency is improved, but what does this mean for existing users? What does the upgrade path look like? This will probably lead to conversations about the ease…or not…of delivering the latest and greatest.
Delivery of new features or developer tools also needs to be carefully considered. Backwards compatibility and stability are important and, let’s face it, sometimes overlooked in our race to build new awesome things. Unless this has been done thoughtfully and purposefully, the very people we are trying to empower may reject what they are being offered.
The code that will be created
Let’s talk about future code. It is important to plan for future code, as failure to do so increases the likelihood of repeating past mistakes.
Who knows what will come next? The way code was written ten years ago is different from the way code is written today...and who knows how code will be written ten years from now?
So what does a strategy need to prepare for these unknown unknowns?
Let’s consider the principles of continuous delivery, especially the third one: computers perform repetitive tasks, people solve problems. When engineers solve the problem of automating accessibility checks, computers can perform the repetitive tasks of automatically checking code.
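As a hedged illustration, once an engineer has solved the automation problem once, a repetitive check like the one below can run on every commit. This sketch assumes the jest-axe package (a real, commonly used matcher, though not the only option):

```ts
import { axe, toHaveNoViolations } from 'jest-axe';

expect.extend(toHaveNoViolations);

test('rendered markup has no detectable a11y violations', async () => {
  const container = document.createElement('div');
  container.innerHTML = '<button aria-label="Close dialog">×</button>';

  // The computer repeats this inspection on every run; CI fails on any violation.
  expect(await axe(container)).toHaveNoViolations();
});
```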
Goodhart's law: When a measure becomes a target, it ceases to be a good measure.
On to the third key part of this strategy: a plan to measure and report progress. Right now, there is no shared, public standard for metrics in accessibility engineering; my vision is to change that. Metrics play an essential part in a Continuous Accessibility strategy. After all, metrics and reporting are how the business is empowered to provide essential things. Things like time. Things like budget. Things like vocalization of priorities. When ways to measure this work are provided, a gap that has historically been unfilled and ignored can be bridged.
These are the metrics I think would be reasonable to consider when creating a progress-tracking strategy.
Metric: Total Criteria Count
Actionable Outcome(s): Goal Setting
The Total Criteria Count (TCC) is the first item to establish, as it will be the basis for all other metrics. It is also likely to take the most time to establish (relative to the other metrics). An exact count of all WCAG criteria with which a digital product must comply must be established, with the full understanding that this count may (some may even argue, should) change over time. These changes should be tracked.
The Total Criteria Count should be an itemized list that includes:
- WCAG Success Criteria
- Criteria that are not already reflected in WCAG criteria (perhaps related to tooling, not covered by web content guidelines)
- Known techniques
- Common failures
- Failures as identified by audit findings
As the ability to identify root causes increases, we should be able to add to this list. The more detail we have, the better it will be, since success criteria cover generalities rather than specifics. For example, WCAG 1.3.1 (Info and Relationships) is a single success criterion but relates to 25+ different failure scenarios. It may not be meaningful enough to simply indicate a decrease in issues related to a single WCAG success criterion; the metrics may need to be more granular than that.
The list MUST be easily accessible and must support tracking change over time.
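As a minimal sketch (all names here are illustrative, not a proposed standard), the itemized list could be modeled so that both the count and its change over time fall out naturally:

```ts
// One line item in the Total Criteria Count; the `source` field mirrors
// the categories in the itemized list above.
type CriterionSource =
  | 'wcag-success-criterion'
  | 'non-wcag'          // e.g., tooling concerns not covered by web content guidelines
  | 'known-technique'
  | 'common-failure'
  | 'audit-finding';

interface CriteriaLineItem {
  id: string;           // e.g., 'wcag-1.3.1' or 'audit-2024-017'
  description: string;
  source: CriterionSource;
  addedOn: string;      // ISO date, so changes to the list can be tracked over time
  removedOn?: string;   // retire items rather than deleting them, to preserve history
}

// The Total Criteria Count is simply the number of active line items.
const totalCriteriaCount = (items: CriteriaLineItem[]): number =>
  items.filter((item) => !item.removedOn).length;
```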
Metric: Automation Capabilities
Actionable Outcome(s): Problem Analysis, Goal Setting
We must then identify additional baseline numbers from TCC line items, related to automation:
- Automated Linting Criteria Count (ALCC): Criteria for which we can provide automated linting (static analysis).
- Automated Testing Criteria Count (ATCC): Criteria for which we can provide automated testing (dynamic analysis).
- Developer Test Criteria Count (DTCC): Criteria that require developer-authored tests.
- Manual Test Criteria Count (MTCC): Criteria that require manual testing.
We should determine which line items from TCC currently require manual testing but could reasonably have automated tests. This can help to determine future work. This number may also change over time, given a few variables:
- Reduction in number of ways to approach any given UI component (either self-imposed or via engineering policy)
- Deeper analysis of a line item may show that its status should change (from “possible to automate” to “manual check” or vice versa)
- Changes to the success criteria (as WCAG grows to include devices and not just web, etc.)
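Continuing the illustrative sketch from the TCC section (it extends the hypothetical `CriteriaLineItem` type defined there, and the field names remain assumptions), each line item can be labeled with how it is currently verifiable, and the four baseline counts become a simple tally:

```ts
type CheckCapability = 'automated-lint' | 'automated-test' | 'developer-test' | 'manual';

interface ClassifiedLineItem extends CriteriaLineItem {
  capability: CheckCapability;
  automatable?: boolean; // manual today, but could reasonably gain automated tests
}

function baselineCounts(items: ClassifiedLineItem[]) {
  const active = items.filter((item) => !item.removedOn);
  const tally = (c: CheckCapability) =>
    active.filter((item) => item.capability === c).length;
  return {
    alcc: tally('automated-lint'),
    atcc: tally('automated-test'),
    dtcc: tally('developer-test'),
    mtcc: tally('manual'),
    // Candidates for future automation work, per the variables above.
    automationBacklog: active.filter((i) => i.capability === 'manual' && i.automatable).length,
  };
}
```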
Metric: Audit Results
Actionable Outcome(s): Problem Analysis, Trend Development
If apps are audited and scorecards are produced, these metrics are indicators of overall progress. As such, they could be analyzed from a few angles:
- Total Bug Count (TBC): Further breakdowns may be justifiable, such as by Line of Business or by WCAG Success Criteria
- Valid Bug Count (VBC): It may be more meaningful to look at VBC than TBC, or at least to normalize the data to account for the mean
- Bug Severity Count (BSC): For bugs that are violations of WCAG conformance, we can categorize based on impact to the user: Severe, Critical, Major, and Minor.
- Time To Resolution (TTR): How long does it take each team to resolve violations? What percent of the resolved violations are within the SLA for resolution? What are the blockers to violation resolution (is there a recognizable pattern that could inform tooling)? What is the severity of those blockers?
- Relative Incident Frequency (RIF): Is there a WCAG violation that seems to appear more than the others?
- Conformance Exemption Count (CEC): What LOBs have exemptions? How many? Why? This can inform process improvements.
- Linting Automation Suppression Count (LASC) and Testing Automation Suppression Count (TASC): While the validity of suppressions should be examined, too many suppressions could be an indication that issues are being hidden instead of resolved.
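As one hedged example of where severity counts could come from: axe-core reports an impact level per violation, which could be tallied into a Bug Severity Count. Mapping axe's impact levels onto the Severe/Critical/Major/Minor buckets above is an assumption of this sketch, not an official correspondence:

```ts
// Minimal structural types matching the relevant slice of axe-core results.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface ViolationLike {
  id: string;             // e.g., 'color-contrast'
  impact?: Impact | null; // axe-core may omit impact for some results
}

function severityCounts(violations: ViolationLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const violation of violations) {
    const bucket = violation.impact ?? 'unknown';
    counts[bucket] = (counts[bucket] ?? 0) + 1;
  }
  return counts; // e.g., { critical: 2, serious: 5, minor: 1 }
}
```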
It is necessary to understand the audit process, as that shows how the proposed metrics can be useful both now and in the long term. Typically, manual testing is done on specific workflows for each product, and those numbers are then fed into an audit report. This means it may be useless, at first, to build a process around upticks in reported issues, since an uptick only indicates that an audit has occurred.
However, it should be possible to implement improved monitoring with an increase in automated testing capabilities. This will allow teams to be notified sooner if a product’s digital a11y conformance deteriorates too rapidly (Conformance Deterioration Rate (CDR)).
Additionally, by obtaining the baseline metrics for audits, we should be able to identify training opportunities for each team. Teams with high audit numbers should automatically qualify for a11y training (A11y Training Threshold (ATT)).
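A minimal sketch of how the CDR and ATT checks might be wired together (the threshold values and names are placeholders, not recommendations; real values would come from policy):

```ts
const CDR_LIMIT = 0.10; // alert if violations grow more than 10% between audits
const ATT = 25;         // teams above this violation count qualify for training

interface AuditSnapshot {
  team: string;
  violationCount: number;
}

function reviewAudit(previous: AuditSnapshot, current: AuditSnapshot): string[] {
  const notices: string[] = [];
  const base = previous.violationCount;
  const rate = base === 0
    ? (current.violationCount > 0 ? 1 : 0)
    : (current.violationCount - base) / base;

  if (rate > CDR_LIMIT) {
    notices.push(`CDR exceeded for ${current.team}: conformance is deteriorating`);
  }
  if (current.violationCount > ATT) {
    notices.push(`${current.team} has crossed the ATT and qualifies for a11y training`);
  }
  return notices;
}
```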
These audit-related metrics could also inform processes from a business/legal perspective, as they can be used to quantify risk (of legal action, costs, and other things). When the risk is quantified, the reduction in that risk by taking specific steps to remedy the issues can also be determined; while this may be a business justification, the result is that users have an improved default experience.
Finally, and perhaps most importantly, these numbers can be used to see what the persistent issues are. By this I mean: which violations happen most often but lack a meaningful automated way to test for or prevent them? In these instances, a body of work could be considered in which a component (simple or complex) is built around the ability to test specific accessibility criteria. A horizontal initiative (or other appropriate action) could then be used so teams could refactor their code to use the accessible replacement component.
Metric: Internal Developer A11y Training
Actionable Outcome(s): Process Improvement
Potential items to track:
- Total UI Developer Count (TDC): How many developers are contributing UI code?
- Training: Has a developer received a11y-specific training while at the company?
- Champion: Has the developer gone through something like an a11y champions program?
It is logical to infer that required a11y-focused training for front-end (FE) developers would reduce the number of accessibility issues in a given codebase.
When used in conjunction with tracking the TBC (or VBC) per line of business (LOB) or product team, we can determine if there is a reduction in reported accessibility issues for that product for teams that have received training.
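A rough sketch of that comparison (every name here is hypothetical) might look like:

```ts
interface TeamRecord {
  lob: string;            // line of business
  trained: boolean;       // has the team received a11y-specific training?
  validBugCount: number;  // VBC for the team's product
}

// Mean VBC for trained vs. untrained teams; a persistent gap between the
// two groups supports the case for scheduled, mandatory training.
function meanVbc(teams: TeamRecord[], trained: boolean): number {
  const group = teams.filter((team) => team.trained === trained);
  if (group.length === 0) return NaN;
  return group.reduce((sum, team) => sum + team.validBugCount, 0) / group.length;
}
```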
This data could be made available to promotion committees to be used in craft quality measurements. For developers who have received training, is there an improvement in code quality? If so, we will be able to make the case for scheduled, mandatory accessibility training for engineers. Furthermore, if the data suggests that teams with alumni of an A11y Champions Program produce higher-quality products, a business value can be assigned to the investment in the program.
Metric: Internal Developer Support
Actionable Outcome(s): Problem Analysis, Trend Development
One of the ways we could identify common problem areas for developers is by developing the means to analyze discussion channels internally. With sentiment analysis (or similar), we could determine what the commonly asked questions are, for a few reasons:
- An automated tool or bot that could provide answers to common questions or provide links to relevant reference materials would reduce the amount of manual support required.
- The commonly asked questions could also be indicative of a problem area and could inform future tooling work. If we see the same question asked 100 times, it’s an indication that a broadly applied solution could be useful.
- Tracking chat activity over time could reveal trends.
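A deliberately naive sketch of the frequency side of this analysis (a production system would use real sentiment/NLP tooling rather than keyword counting, and all names here are hypothetical):

```ts
function topQuestionTopics(
  messages: string[],
  topics: string[],   // e.g., ['focus management', 'alt text', 'color contrast']
  limit = 5,
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const text = message.toLowerCase();
    for (const topic of topics) {
      if (text.includes(topic)) {
        counts.set(topic, (counts.get(topic) ?? 0) + 1);
      }
    }
  }
  // If the same question surfaces 100 times, a broadly applied solution may help.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}
```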
Think SRE, but for A11Y.