How we score AI exposure
FYHR's automation and augmentation scores are product-specific exposure estimates inspired by O*NET-based AI-exposure research. We decompose a role into concrete tasks, score each task on four dimensions, and aggregate the results using explicit task weights. The approach is informed by the GPTs are GPTs task-exposure rubric, the Suitability for Machine Learning framework, the AI Occupational Exposure (AIOE) literature, and O*NET — but it does not reproduce any published index exactly. Treat the headline numbers as the output of a transparent house rubric, not a peer-reviewed labor-market forecast.
1. Decompose the role into tasks
Every role gets broken into 9–15 concrete tasks across three buckets: tasks AI handles end-to-end, tasks where you and AI collaborate, and tasks that stay human. This task decomposition is inspired by task-based and occupation-based exposure studies — notably Eloundou et al. (2024), "GPTs are GPTs" in Science, and the Felten, Raj & Seamans (2021) AIOE framework — but the specific task list and rubric below are ours, not theirs.
2. Rate each task on four dimensions (0–4)
Routinization (R)
How repeatable and rule-based is the task? Filing the same expense report each week scores high; negotiating a contract scores low.
Data availability (D)
Does AI have the inputs it needs — text, code, images, structured data? A task that hinges on tacit context the model can't see scores low.
Judgment required (J)
How much human judgment, ethics, or accountability does it take? This is an inverse signal — high judgment lowers automation risk.
Presence required (P)
Does it need a body in a room or a real human relationship? Also inverse — physical / interpersonal presence resists automation.
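In code terms, a rated task is just four small integers. Here is a minimal sketch in Python (the type and field names are ours, and the expense-report ratings beyond R = 4 are illustrative, not product output):

```python
from dataclasses import dataclass

@dataclass
class TaskRating:
    """One task rated 0-4 on each dimension of the rubric."""
    routinization: int      # R: how repeatable / rule-based
    data_availability: int  # D: AI can see the inputs it needs
    judgment: int           # J: inverse signal for automation
    presence: int           # P: inverse signal for automation

    def __post_init__(self) -> None:
        for name, value in vars(self).items():
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be 0-4, got {value}")

# Filing the same weekly expense report: highly routine, data-rich
weekly_expenses = TaskRating(routinization=4, data_availability=4,
                             judgment=1, presence=0)
```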
3. Compute per-task scores (deterministic)
automation(task) = (R + D) / 8 × (1 − J/4) × (1 − P/4)
augmentation(task) = D/4 × max(J/4, P/4) × (1 − automation)
Both scores are bounded 0–1. A task scores high on automation only when it is repeatable, AI has the data it needs, and the required judgment and presence are low. Augmentation is highest when AI has the data but a human still has to own the call.
These are design choices, not literature outputs. By construction, automation falls to exactly zero whenever J = 4 or P = 4 — meaning maximum required judgment or in-person presence zeroes out automation regardless of how routine the task is. That's a strong normative bet on what makes work resist AI; reasonable people could weight these factors differently.
Two paralegal tasks, same rubric
Drafting a standard NDA from a template
automation = (4 + 4) / 8 × (1 − 1/4) × (1 − 0/4)
= 1.00 × 0.75 × 1.00
= 0.75 → 75%
augmentation = 4/4 × max(1/4, 0/4) × (1 − 0.75)
= 1.00 × 0.25 × 0.25
= 0.06 → 6%
High automation — AI handles end-to-end.
Interviewing a distressed client about a custody dispute
automation = (1 + 1) / 8 × (1 − 4/4) × (1 − 4/4)
= 0.25 × 0.00 × 0.00
= 0.00 → 0%
augmentation = 1/4 × max(4/4, 4/4) × (1 − 0.00)
= 0.25 × 1.00 × 1.00
= 0.25 → 25%
Stays human — judgment + presence dominate.
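Both worked examples reproduce in a few lines of code. A minimal sketch of the Step 3 formulas in Python (the function name and signature are ours, for illustration only):

```python
def task_scores(r: int, d: int, j: int, p: int) -> tuple[float, float]:
    """Per-task automation and augmentation scores from 0-4 ratings."""
    automation = (r + d) / 8 * (1 - j / 4) * (1 - p / 4)
    augmentation = d / 4 * max(j / 4, p / 4) * (1 - automation)
    return automation, augmentation

# Drafting a standard NDA from a template: R=4, D=4, J=1, P=0
print(task_scores(4, 4, 1, 0))  # (0.75, 0.0625) -> 75% / 6%

# Interviewing a distressed client: R=1, D=1, J=4, P=4
print(task_scores(1, 1, 4, 4))  # (0.0, 0.25) -> 0% / 25%
```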
These two tasks live under the same job title. The role-level automation_risk is the time-weighted average, so a paralegal who spends 60% of their week on template work and 40% on client intake lands at 0.6 × 0.75 + 0.4 × 0.00 = 0.45 → 45%, with a wider task heterogeneity band because the tasks disagree.
4. Estimate time share and weight
The model also estimates each task's share of the role's time (a kind of "where does the day actually go"). Weights are normalized to sum to 100%, and the role-level scores are time-weighted sums:
automation_risk = Σ time_share_i × automation_score_i
augmentation_potential = Σ time_share_i × augmentation_score_i
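A sketch of the aggregation, reusing the paralegal numbers from above (the data layout and names are ours; time shares are assumed already normalized):

```python
def role_scores(tasks: list[dict]) -> tuple[float, float]:
    """Time-weighted role-level scores; time_share values sum to 1."""
    automation_risk = sum(t["time_share"] * t["automation"] for t in tasks)
    augmentation = sum(t["time_share"] * t["augmentation"] for t in tasks)
    return automation_risk, augmentation

# 60% template drafting, 40% client intake (per-task scores from Step 3)
paralegal = [
    {"time_share": 0.6, "automation": 0.75, "augmentation": 0.0625},
    {"time_share": 0.4, "automation": 0.00, "augmentation": 0.25},
]
print(role_scores(paralegal))  # ~(0.45, 0.1375) -> 45% risk, ~14% potential
```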
5. Task heterogeneity band (the ± number)
We compute a time-weighted standard deviation of per-task automation scores and present it as a ± range (clamped to ±3 to ±18 percentage points). A role with consistent task ratings gets a tight band; a role that mixes very automatable and very human work gets a wider one.
This is not a statistical confidence interval in the repeated-sampling sense — it doesn't quantify uncertainty around a population parameter. It's a dispersion measure that tells you how internally mixed the task profile is. Read it as: "how much do the tasks inside this role disagree with each other?"
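One plausible implementation is a weighted population standard deviation of the per-task automation scores, clamped to the stated range; the exact estimator and names here are our assumptions, not a product spec:

```python
def heterogeneity_band(tasks: list[dict]) -> float:
    """Time-weighted std dev of per-task automation scores,
    expressed in percentage points and clamped to [3, 18]."""
    mean = sum(t["time_share"] * t["automation"] for t in tasks)
    var = sum(t["time_share"] * (t["automation"] - mean) ** 2 for t in tasks)
    return min(max(var ** 0.5 * 100, 3.0), 18.0)

# The mixed paralegal role: one very automatable task, one very human one
paralegal = [
    {"time_share": 0.6, "automation": 0.75},
    {"time_share": 0.4, "automation": 0.00},
]
print(heterogeneity_band(paralegal))  # 18.0 (clamped; raw std dev ~36.7pp)
```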
6. O*NET anchor
Each career map is matched to its closest O*NET-SOC code — the U.S. Department of Labor's standard occupational taxonomy — so your role connects to a real labor-market reference, not a free-form string.
Note that our time-share weighting is a product choice that goes beyond O*NET's native core-vs-supplemental task convention (which is based on relevance and importance thresholds, not estimated time). It's defensible as a modeling decision, but it is additional structure on top of O*NET, not a feature of O*NET itself.
Sources
Methodological foundations:
- Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: Labor market impact potential of LLMs. Science, 384(6702). Preprint: arXiv:2303.10130 (we rely on the preprint appendix for rubric-level detail).
- Brynjolfsson, E., Mitchell, T., & Rock, D. (2018). What can machines learn, and what does it imply for the workforce? AEA Papers and Proceedings — the Suitability for Machine Learning (SML) framework, the closer methodological ancestor of the GPTs task-exposure rubric.
- Felten, E., Raj, M., & Seamans, R. (2021). Occupational, industry, and geographic exposure to artificial intelligence (AIOE). Strategic Management Journal. The authors explicitly state AIOE is agnostic about whether AI substitutes for or complements labor — citing it as direct support for an "automation risk" score would overclaim.
- U.S. Department of Labor / ETA, O*NET OnLine (onetonline.org) — task, ability, and SOC taxonomy.
Career-guidance reference (not a methodological foundation):
- U.S. Bureau of Labor Statistics, Occupational Outlook Handbook — used for role descriptions and outlook context, not for the exposure scores themselves.
Known limitations
- Per-task ratings (R, D, J, P) and the task list itself are generated by a language model against our rubric. So the end-to-end pipeline is "model-generated task judgments + deterministic aggregation," not deterministic from first principles.
- The headline number is best read as technical automation exposure — how much of the work AI could do — not a labor-substitution forecast about hiring, wages, or whether a specific employer will adopt AI.
- The "2–3 year horizon" is our near-term forecast window, not a claim made by any of the cited papers. The GPTs paper in particular explicitly does not predict adoption timelines.
- The O*NET code is the model's best mapping — not yet validated through O*NET Web Services or the O*NET Code Connector.
- The ± band is task dispersion, not a statistical confidence interval (see Step 5).
Ready to map your role?
Generate a career map