How we score AI exposure
FYHR's headline outputs — automation exposure and augmentation potential — are computed using a fixed aggregation function applied to model-generated task estimates. We decompose a role into concrete tasks, score each task on four dimensions, and aggregate the results using explicit task weights. The approach is informed by published AI exposure research — the GPTs are GPTs task-exposure rubric, the Suitability for Machine Learning framework, the AI Occupational Exposure (AIOE) literature, and O*NET — but it is not a direct implementation of any single academic model. Treat the headline numbers as a transparent house rubric, not a peer-reviewed labor-market forecast.
Two layers, one map
The top row is what you do. The bottom row is what happens under the hood — the math that turns a job title into a defensible score.
What you do
- 01. Type a job title. Anything from “graphic designer” to “mid-level data scientist.” Plain English.
- 02. AI maps the role. We break the job into 9–15 concrete tasks across automated, augmented, and human work.
- 03. See the split. A clean dashboard shows automation risk, augmentation potential, and outlook at a glance.
- 04. Skills + chat. Get a focused list of skills to learn and ask follow-ups about your specific situation.
What we do under the hood
- 01. Anchor to O*NET. Each role is mapped to a U.S. Department of Labor occupation code and live-validated.
- 02. Rate every task (R · D · J · P). Routinization, Data availability, Judgment, Presence: each scored 0–4 against our rubric.
- 03. Compute time-weighted scores. Per-task scores are weighted by estimated time share, then summed into role-level numbers.
- 04. Show the dispersion (±). A heterogeneity band tells you how much the tasks inside the role disagree with each other.
The full math, sources, and known limitations are spelled out below. This overview is the map; the rest of the page is the territory.
Want to see this on a real role?
Map a role
1. Decompose the role into tasks
Each role is decomposed into 9–15 representative tasks generated by a language model and guided by O*NET-aligned role context. Tasks are split across three buckets: tasks AI handles end-to-end, tasks where you and AI collaborate, and tasks that stay human. This task-based approach is inspired by prior work — notably Eloundou et al. (2024), "GPTs are GPTs" in Science, and the Felten, Raj & Seamans (2021) AIOE framework — but the specific decomposition here is generated per query and is not sourced directly from those datasets.
2. Rate each task on four dimensions (0–4)
Each task is scored on four ordinal dimensions (0–4 scale), using structured prompts with anchored definitions:
Routinization (R)
How repeatable and rule-based is the task? Filing the same expense report each week scores high; negotiating a contract scores low.
Data availability (D)
Does AI have the inputs it needs — text, code, images, structured data? A task that hinges on tacit context the model can't see scores low.
Judgment required (J)
How much human judgment, ethics, or accountability does it take? This is an inverse signal — high judgment lowers automation risk.
Presence required (P)
Does it need a body in a room or a real human relationship? Also inverse — physical / interpersonal presence resists automation.
The 0–4 scale is ordinal, not interval. Arithmetic operations applied to these values assume approximate linear spacing for practical modeling purposes.
Each dimension is defined using anchored behavioral criteria to reduce rating variance; however, ratings remain model-estimated rather than independently human-validated.
3. Compute per-task scores
automation(task) = (R + D) / 8 × (1 − J/4) × (1 − P/4)
augmentation(task) = (D/4) × max(J/4, P/4) × (1 − automation(task))
This functional form assumes:
- R and D contribute linearly and equally to automation potential.
- J and P act as multiplicative suppressors of automation.
- Maximum judgment or presence (a score of 4) reduces automation potential to zero by construction.
These assumptions reflect a modeling choice rather than an empirically calibrated relationship and may be revised as validation data becomes available.
The augmentation formula is heuristic. It assumes augmentation potential rises with data availability and at least one human-dependent dimension (judgment or presence), and falls as tasks become fully automatable.
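As a rough illustration, the two per-task formulas above can be written as a pair of small Python functions. This is a minimal sketch, not FYHR's production code: the function names and the input check are assumptions introduced here, and only the arithmetic comes from the formulas on this page.

```python
def _check_rating(name: str, value: int) -> None:
    """Reject ratings outside the 0-4 ordinal scale (hypothetical helper)."""
    if not isinstance(value, int) or not 0 <= value <= 4:
        raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")


def automation_score(r: int, d: int, j: int, p: int) -> float:
    """automation(task) = (R + D) / 8 x (1 - J/4) x (1 - P/4)."""
    for name, value in (("R", r), ("D", d), ("J", j), ("P", p)):
        _check_rating(name, value)
    return (r + d) / 8 * (1 - j / 4) * (1 - p / 4)


def augmentation_score(r: int, d: int, j: int, p: int) -> float:
    """augmentation(task) = (D/4) x max(J/4, P/4) x (1 - automation(task))."""
    auto = automation_score(r, d, j, p)
    return (d / 4) * max(j / 4, p / 4) * (1 - auto)


# Ratings for a highly routine, data-rich task with little judgment or presence.
print(automation_score(4, 4, 1, 0))    # 0.75
print(augmentation_score(4, 4, 1, 0))  # 0.0625
```

Because J and P enter as multiplicative suppressors, a rating of 4 on either one returns an automation score of exactly zero regardless of R and D.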
Two paralegal tasks, same rubric
Drafting a standard NDA from a template
automation = (4 + 4) / 8 × (1 − 1/4) × (1 − 0/4)
= 1.00 × 0.75 × 1.00
= 0.75 → 75%
augmentation = 4/4 × max(1/4, 0/4) × (1 − 0.75)
= 1.00 × 0.25 × 0.25
= 0.06 → 6%
High automation: AI handles end-to-end.
Interviewing a distressed client about a custody dispute
automation = (1 + 1) / 8 × (1 − 4/4) × (1 − 4/4)
= 0.25 × 0.00 × 0.00
= 0.00 → 0%
augmentation = 1/4 × max(4/4, 4/4) × (1 − 0.00)
= 0.25 × 1.00 × 1.00
= 0.25 → 25%
Stays human: judgment + presence dominate.
These two tasks live under the same job title. The role-level automation_risk is the time-weighted average of the task scores, so a paralegal who spends 60% of their week on template work and 40% on client intake lands in between: 0.60 × 0.75 + 0.40 × 0.00 = 0.45, or 45%. The task heterogeneity band is wide for this role because the two tasks disagree.
4. Estimate time share and weight
Role-level scores are computed as time-weighted averages of task-level scores, where task time shares are estimated by the model and normalized to sum to 100%.
automation_risk = Σ time_share_i × automation_score_i
augmentation_potential = Σ time_share_i × augmentation_score_i
Time allocation is inferred rather than observed and introduces an additional source of uncertainty in role-level scores.
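The weighting step can be sketched in a few lines of Python. The function name `role_score` is hypothetical, and normalizing by the sum of the shares is an assumption about how "normalized to sum to 100%" is carried out; the example numbers reuse the 60/40 paralegal split from the worked example above.

```python
from typing import Sequence


def role_score(time_shares: Sequence[float], task_scores: Sequence[float]) -> float:
    """Time-weighted average of task-level scores.

    Shares may be given in any unit (fractions, percentages, hours);
    they are rescaled to sum to 1 before weighting (an assumption).
    """
    if len(time_shares) != len(task_scores):
        raise ValueError("time_shares and task_scores must have the same length")
    total = sum(time_shares)
    if total <= 0:
        raise ValueError("time shares must sum to a positive number")
    return sum(w / total * s for w, s in zip(time_shares, task_scores))


# 60% template drafting, 40% client intake.
automation_risk = role_score([60, 40], [0.75, 0.00])
augmentation_potential = role_score([60, 40], [0.0625, 0.25])
print(round(automation_risk, 4))         # 0.45
print(round(augmentation_potential, 4))  # 0.1375
```

The same function serves both role-level numbers, since they differ only in which task-level score is being averaged.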
5. Task heterogeneity band (the ± number)
The ± value is the time-weighted standard deviation of task-level automation scores, expressed in percentage points. For interpretability, the displayed value is clamped to the range ±3 to ±18 percentage points: smaller dispersions are shown as ±3 and larger ones as ±18. A role with consistent task ratings gets a tight band; a role that mixes very automatable and very human work gets a wider one.
Because of this capping, interpret the displayed value as a bounded heterogeneity indicator rather than a raw statistical estimate.
This is not a statistical confidence interval in the repeated-sampling sense — it doesn't quantify uncertainty around a population parameter. It's a dispersion measure that tells you how internally mixed the task profile is. Read it as: "how much do the tasks inside this role disagree with each other?"
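A minimal sketch of how such a band could be computed is shown below. The page does not say which form of weighted standard deviation is used, so this assumes the population form with normalized time shares as weights; the function and parameter names are hypothetical, and only the 3 and 18 point bounds come from the text above.

```python
import math
from typing import Sequence


def heterogeneity_band(
    time_shares: Sequence[float],
    automation_scores: Sequence[float],
    floor_pp: float = 3.0,
    cap_pp: float = 18.0,
) -> float:
    """Time-weighted standard deviation of task automation scores,
    in percentage points, clamped to [floor_pp, cap_pp]."""
    if len(time_shares) != len(automation_scores):
        raise ValueError("inputs must have the same length")
    total = sum(time_shares)
    if total <= 0:
        raise ValueError("time shares must sum to a positive number")
    weights = [w / total for w in time_shares]
    mean = sum(w * s for w, s in zip(weights, automation_scores))
    # Population-style weighted variance (assumed; no small-sample correction).
    variance = sum(w * (s - mean) ** 2 for w, s in zip(weights, automation_scores))
    sd_pp = math.sqrt(variance) * 100
    return min(max(sd_pp, floor_pp), cap_pp)


# The 60/40 paralegal split: raw dispersion is about 36.7 points, shown as 18.
print(heterogeneity_band([60, 40], [0.75, 0.00]))  # 18.0
```

Under these assumptions, the paralegal example hits the upper bound, which is why the displayed value reads as an indicator of mixedness and not as the raw dispersion.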
6. O*NET anchor
Each map is programmatically matched to the closest O*NET-SOC occupation — the U.S. Department of Labor's standard occupation taxonomy — using keyword and similarity-based search via O*NET Web Services. We then confirm the code against O*NET in real time:
- Verified by O*NET — the code exists and either matches O*NET's own top search hit for the job title, or its canonical title overlaps the input meaningfully. The map page surfaces a green pill linking straight to the O*NET occupation summary.
- Best-guess match — the SOC exists but O*NET's top hit for the title is a different occupation. We surface the alternative in the badge tooltip so you can judge for yourself.
- Unverified SOC — the suggested code doesn't exist in O*NET. We fall back to O*NET's top match for the title and show it in the tooltip.
This mapping confirms alignment with an existing occupation code but does not guarantee semantic equivalence between the generated role description and the official O*NET definition.
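The three badge states can be read as a small decision rule, sketched below. This is illustrative only: the function name and its inputs are hypothetical, it makes no calls to O*NET Web Services, and the "meaningful overlap" test is passed in as a precomputed boolean because this page does not define how that overlap is measured.

```python
from typing import Optional


def classify_soc_match(
    suggested_code: str,
    code_exists: bool,
    top_hit_code: Optional[str],
    title_overlaps: bool,
) -> str:
    """Return the badge label for a suggested O*NET-SOC code.

    code_exists:    whether the suggested code was found in O*NET.
    top_hit_code:   O*NET's own top search hit for the job title, if any.
    title_overlaps: whether the code's canonical title overlaps the input
                    title meaningfully (criterion assumed to be computed elsewhere).
    """
    if not code_exists:
        return "Unverified SOC"
    if suggested_code == top_hit_code or title_overlaps:
        return "Verified by O*NET"
    return "Best-guess match"


# The suggested code exists, but O*NET's top hit for the title is a different code.
print(classify_soc_match("23-2011.00", True, "43-6012.00", False))  # Best-guess match
```

Checking existence first mirrors the order of the descriptions above: a code that is missing from O*NET cannot be verified by either of the other two tests.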
Note that our time-share weighting is a product choice that goes beyond O*NET's native core-vs-supplemental task convention (which is based on relevance and importance thresholds, not estimated time). It's defensible as a modeling decision, but it is additional structure on top of O*NET, not a feature of O*NET itself.
Interpreting these scores
These scores reflect technical task-level exposure to AI capabilities — not realized adoption, job loss, or labor market outcomes.
Organizational, regulatory, economic, and behavioral constraints are not modeled and may materially affect real-world outcomes.
Sources
Methodological foundations:
- Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: Labor market impact potential of LLMs. Science, 384(6702). Preprint: arXiv:2303.10130 (we rely on the preprint appendix for rubric-level detail).
- Brynjolfsson, E., Mitchell, T., & Rock, D. (2018). What can machines learn, and what does it imply for the workforce? AEA Papers and Proceedings — the Suitability for Machine Learning (SML) framework, the closer methodological ancestor of the GPTs task-exposure rubric.
- Felten, E., Raj, M., & Seamans, R. (2021). Occupational, industry, and geographic exposure to artificial intelligence (AIOE). Strategic Management Journal. The authors explicitly state AIOE is agnostic about whether AI substitutes for or complements labor — citing it as direct support for an "automation risk" score would overclaim.
- U.S. Department of Labor / ETA, O*NET OnLine (onetonline.org) — task, ability, and SOC taxonomy.
Career-guidance reference (not a methodological foundation):
- U.S. Bureau of Labor Statistics, Occupational Outlook Handbook — used for role descriptions and outlook context, not for the exposure scores themselves.
Limitations
This system is a structured estimation model with the following limitations:
- Model-generated inputs. Task decomposition, time allocation, and dimension ratings are generated by a language model and may vary with phrasing, model version, or prompt structure.
- Unverified reliability. Test–retest stability, inter-rater agreement, and prompt sensitivity have not yet been formally benchmarked.
- Heuristic functional forms. The automation and augmentation formulas encode modeling assumptions that are not empirically calibrated against observed labor outcomes.
- Ordinal scaling. Dimension scores (0–4) are ordinal but are treated as approximately linear for aggregation.
- Incomplete construct coverage. Factors such as error tolerance, regulatory constraints, and implementation costs are not explicitly modeled.
- O*NET alignment limits. SOC mapping improves standardization but does not guarantee precise occupational equivalence.
- Non-predictive scope. Scores represent technical feasibility, not timelines, adoption rates, or economic displacement.
Validation roadmap
To improve reliability and validity, future iterations will benchmark:
- Test–retest stability across repeated runs.
- Sensitivity to prompt phrasing and role descriptions.
- Agreement between model-generated and human-rated scores.
- Alignment with O*NET task importance and frequency data.
- Sensitivity of outputs to changes in task weights and dimension scores.
These evaluations will inform calibration and potential revision of scoring functions.
Ready to map your role?
Generate a career map