How MRM Method Builder normalizes inputs and assembles GC-QQQ output
MRM Method Builder converts compound names or CAS numbers into workflow-aware GC-QQQ output by normalizing inputs, routing requests by family, and returning vendor-neutral transition rows with method metadata where available.
This page describes the current production logic. It explains how pesticide and environmental workflows share the main GC dataset path while odor/VOC workflows use a separate one, and why final lab validation remains necessary.
Accept list inputs
The workflow starts with a user-supplied list of compound names or CAS numbers.
Normalize entries
Each item is resolved to a canonical internal record when a confident match exists.
Route by family
Family selection controls which dataset and method logic path are applied.
Assemble output rows
Matched compounds are expanded into vendor-neutral transition rows with method context.
Return audit metadata
The response includes preview state, counts, and unsupported cases where relevant.
Input normalization
Input normalization trims each entry, drops empty lines, removes repeated entries, and then attempts to match each remaining item to a canonical internal compound record.
For non-odor workflows, the current implementation searches the main GC dataset in data/database.csv using CAS numbers, compound names, and supported alternate naming fields. For odor workflows, the resolver uses the odor dataset and maps names or CAS values to a canonical odor CAS key.
If an input cannot be matched confidently, it remains unmatched. If multiple aliases resolve to the same canonical CAS, the system keeps one matched record and reports the rest as duplicates rather than inflating the result set.
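The matching rules above can be sketched as follows. This is a minimal illustration, not the production resolver: the `CompoundRecord` shape, the `normalizeInputs` function, and the exact-match-only lookup are assumptions, and the actual columns of data/database.csv are not described on this page. The sketch also assumes that a verbatim repeat of an earlier entry is reported as a duplicate.

```typescript
// Hypothetical record shape: the real columns of data/database.csv are not shown here.
interface CompoundRecord {
  cas: string;       // canonical CAS number
  name: string;      // primary compound name
  aliases: string[]; // supported alternate names
}

interface NormalizationResult {
  matched: CompoundRecord[]; // one record per canonical CAS
  duplicates: string[];      // inputs repeating an earlier entry or an already-matched CAS
  unmatched: string[];       // inputs with no exact match
}

// Case-insensitive lookup key for names and CAS strings.
const toKey = (value: string): string => value.trim().toLowerCase();

// Index every CAS, name, and alias so each one points at its canonical record.
function buildIndex(records: CompoundRecord[]): Map<string, CompoundRecord> {
  const index = new Map<string, CompoundRecord>();
  for (const record of records) {
    for (const term of [record.cas, record.name, ...record.aliases]) {
      index.set(toKey(term), record);
    }
  }
  return index;
}

function normalizeInputs(rawInput: string, records: CompoundRecord[]): NormalizationResult {
  const index = buildIndex(records);
  const result: NormalizationResult = { matched: [], duplicates: [], unmatched: [] };
  const seenKeys = new Set<string>();
  const seenCas = new Set<string>();

  for (const line of rawInput.split(/\r?\n/)) {
    const entry = line.trim();
    if (entry.length === 0) continue; // skip empty lines
    const key = toKey(entry);
    const record = index.get(key);
    if (seenKeys.has(key) || (record !== undefined && seenCas.has(record.cas))) {
      result.duplicates.push(entry); // repeated entry or alias of a matched CAS
    } else if (record === undefined) {
      result.unmatched.push(entry); // exact match only, no fuzzy fallback
    } else {
      seenCas.add(record.cas);
      result.matched.push(record);
    }
    seenKeys.add(key);
  }
  return result;
}

// Illustrative data only.
const sampleRecords: CompoundRecord[] = [
  { cas: "2921-88-2", name: "Chlorpyrifos", aliases: ["Dursban"] },
  { cas: "1912-24-9", name: "Atrazine", aliases: [] },
];
const normalized = normalizeInputs(
  "chlorpyrifos\n\nDursban\n1912-24-9\nnot-a-compound",
  sampleRecords,
);
```

In this sketch, "Dursban" lands in `duplicates` because it resolves to the same CAS as "chlorpyrifos", and "not-a-compound" stays in `unmatched` instead of being forced onto a nearby record.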
Why this matters
Users trust the build result only if unmatched and duplicate cases stay visible during review.
This is why the current logic favors explicit unmatched reporting over silent fallback matching.
Family routing and active data sources
| Workflow family | Normalization source | Build source | Method behavior |
|---|---|---|---|
| Pesticides | Main GC dataset data/database.csv | Main GC dataset data/database.csv | Canonical GC method mapping selects RI fields and method fingerprints. |
| Environmental | Main GC dataset data/database.csv | Main GC dataset data/database.csv | Shares the current canonical GC method handling used by the main dataset path. |
| Odor / VOCs | Odor resolver data/odor/odor-dataset.json | Odor transitions + RI/RT metadata data/odor/odor-dataset.json | Compound support depends on the selected method (WAX or 5ms); unsupported compounds are reported explicitly. |
Family selection changes the actual processing path. In the current codebase, pesticide and environmental workflows share the main GC dataset, while odor workflows use a separate odor-specific path generated from the odor source workbook.
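The routing table above can be expressed as a small lookup. This is a sketch under assumptions: the `WorkflowFamily` keys, the `FamilyRoute` shape, and the `routeFamily` helper are hypothetical names, and only the two dataset paths come from this page.

```typescript
type WorkflowFamily = "pesticides" | "environmental" | "odor";

interface FamilyRoute {
  path: "main-gc" | "odor";    // which processing path handles the request
  normalizationSource: string; // dataset used to resolve names and CAS numbers
  buildSource: string;         // dataset used to assemble output rows
}

const MAIN_GC_DATASET = "data/database.csv";
const ODOR_DATASET = "data/odor/odor-dataset.json";

// Pesticide and environmental workflows share the main GC path; odor has its own.
const FAMILY_ROUTES: Record<WorkflowFamily, FamilyRoute> = {
  pesticides: { path: "main-gc", normalizationSource: MAIN_GC_DATASET, buildSource: MAIN_GC_DATASET },
  environmental: { path: "main-gc", normalizationSource: MAIN_GC_DATASET, buildSource: MAIN_GC_DATASET },
  odor: { path: "odor", normalizationSource: ODOR_DATASET, buildSource: ODOR_DATASET },
};

function routeFamily(family: WorkflowFamily): FamilyRoute {
  return FAMILY_ROUTES[family];
}
```

Keeping the routes in one typed record means a new family cannot be added without declaring both of its data sources.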
Method-aware build logic
After normalization, the build step assembles vendor-neutral output rows for the selected family and method context. On the main GC path, canonical method IDs are resolved through the method-mapping layer. That method context is then used to select the relevant RI field and attach a method fingerprint to the result.
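As a rough illustration of the main GC path, the sketch below resolves a method ID, picks the RI field that method points at, and attaches a fingerprint. Everything named here is an assumption: the schema of data/methods.json, the method IDs and aliases, the RI field names, and the fingerprint format are placeholders, and the RI numbers are not real retention indices.

```typescript
// Hypothetical method definition; the real schema of data/methods.json is not shown on this page.
interface GcMethodDef {
  id: string;          // canonical method ID
  columnPhase: string; // column phase label
  riField: string;     // name of the RI field to read from a compound record
}

// Placeholder IDs, aliases, and field names for illustration only.
const METHOD_DEFS: GcMethodDef[] = [
  { id: "method-a", columnPhase: "phase-a", riField: "ri_a" },
  { id: "method-b", columnPhase: "phase-b", riField: "ri_b" },
];
const METHOD_ALIASES = new Map<string, string>([
  ["alias-a", "method-a"],
  ["alias-b", "method-b"],
]);

// Map an ID or alias to its canonical method definition, if one exists.
function resolveMethod(idOrAlias: string): GcMethodDef | undefined {
  const wanted = idOrAlias.trim().toLowerCase();
  const canonicalId = METHOD_ALIASES.get(wanted) ?? wanted;
  return METHOD_DEFS.find((method) => method.id === canonicalId);
}

interface MethodContext {
  methodId: string;
  ri: number | undefined; // undefined when the compound has no RI for this method
  fingerprint: string;    // assumed format: id|phase|riField
}

// Select the RI value for the chosen method and attach a fingerprint string.
function attachMethodContext(
  riValues: Record<string, number | undefined>,
  method: GcMethodDef,
): MethodContext {
  return {
    methodId: method.id,
    ri: riValues[method.riField],
    fingerprint: `${method.id}|${method.columnPhase}|${method.riField}`,
  };
}
```

A call such as `resolveMethod("alias-a")` returns the `method-a` definition in this sketch, and an unknown ID returns `undefined` so the caller can report it instead of guessing.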
On the odor path, the selected method controls compound availability, transitions, RI/RT metadata, RT windows, and unsupported-compound reporting. In the current project snapshot, WAX covers the full odor dataset while 5ms supports most, but not all, odor compounds.
- WAX (PEG, polar)
- 5ms (5% phenyl, low-polarity)
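The method-dependent availability on the odor path can be sketched like this. The `OdorCompound` shape, the `buildOdorSelection` function, and the sample entries are assumptions for illustration; the actual layout of data/odor/odor-dataset.json is not described on this page, and the CAS-like keys and RI values below are placeholders.

```typescript
type OdorMethod = "WAX" | "5ms";

// Hypothetical per-method metadata; fields are optional because they may be absent.
interface OdorMethodData {
  ri?: number;
  rtMin?: number;
  rtWindowMin?: number;
}

interface OdorCompound {
  cas: string;
  name: string;
  // A compound is supported for a method only if it has an entry under that method.
  methods: Partial<Record<OdorMethod, OdorMethodData>>;
}

interface OdorBuild {
  supported: Array<{ cas: string; name: string; data: OdorMethodData }>;
  unsupported: Array<{ cas: string; name: string }>;
}

// Split matched odor compounds by whether the selected method covers them.
function buildOdorSelection(compounds: OdorCompound[], method: OdorMethod): OdorBuild {
  const build: OdorBuild = { supported: [], unsupported: [] };
  for (const compound of compounds) {
    const data = compound.methods[method];
    if (data === undefined) {
      build.unsupported.push({ cas: compound.cas, name: compound.name });
    } else {
      build.supported.push({ cas: compound.cas, name: compound.name, data });
    }
  }
  return build;
}

// Placeholder entries, not real odor data.
const odorSample: OdorCompound[] = [
  { cas: "EX-0001", name: "Example odorant A", methods: { WAX: { ri: 1200 }, "5ms": { ri: 900 } } },
  { cas: "EX-0002", name: "Example odorant B", methods: { WAX: { ri: 1450 } } },
];
```

With this sample, selecting WAX supports both entries, while selecting 5ms leaves "Example odorant B" in `unsupported` so it remains visible in the response.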
Returned row content
The build response returns export-oriented rows, not a final instrument method file. Each row can include:
- compound identity and CAS
- precursor and product ions
- quantifier / qualifier role and relative intensity
- RI references and RT window defaults when available
- column phase, flow mode, oven program, and inlet context
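One way to picture a returned row is as a typed record. The `TransitionRow` interface below is a hypothetical shape derived from the field list above, not the actual response schema, and the values in `exampleRow` are illustrative placeholders rather than validated method parameters.

```typescript
type IonRole = "quantifier" | "qualifier";

// Hypothetical row shape based on the fields listed on this page.
interface TransitionRow {
  compoundName: string;
  cas: string;
  precursorMz: number;
  productMz: number;
  role: IonRole;
  relativeIntensity: number; // units assumed to be percent
  riReference?: number;      // present only when available
  rtWindowMin?: number;      // present only when available
  columnPhase: string;
  flowMode: string;
  ovenProgram: string;
  inletContext: string;
}

// Render a transition in the usual "precursor > product" form.
function formatTransition(row: TransitionRow): string {
  return `${row.precursorMz} > ${row.productMz}`;
}

// Placeholder values for illustration only.
const exampleRow: TransitionRow = {
  compoundName: "Chlorpyrifos",
  cas: "2921-88-2",
  precursorMz: 314,
  productMz: 258,
  role: "quantifier",
  relativeIntensity: 100,
  columnPhase: "5ms",
  flowMode: "constant flow",
  ovenProgram: "placeholder program",
  inletContext: "placeholder inlet",
};
```

Marking the RI and RT window fields as optional mirrors the "when available" wording, so consumers have to handle rows without them.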
Preview and credits
Demo mode and limited-access sessions may receive restricted previews. Logged-out users and users without credits can also see preview-only output.
These limits change how much preview data is shown, but they do not change the normalization and family routing logic used by the build pipeline.
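The separation between preview limits and pipeline logic can be sketched as a final step applied to rows that are already built. The `applyPreview` function, the `hasFullAccess` flag, and the default limit of 5 rows are assumptions for illustration; the actual preview rules are not specified on this page.

```typescript
interface PreviewResult<T> {
  rows: T[];            // rows actually returned to the caller
  totalRows: number;    // rows produced by the full build
  previewOnly: boolean; // true when the output was restricted
}

// Restrict output after the build; normalization and routing have already run unchanged.
function applyPreview<T>(
  rows: T[],
  hasFullAccess: boolean,
  previewLimit: number = 5, // hypothetical default
): PreviewResult<T> {
  if (hasFullAccess) {
    return { rows, totalRows: rows.length, previewOnly: false };
  }
  return { rows: rows.slice(0, previewLimit), totalRows: rows.length, previewOnly: true };
}
```

Because the limit is applied to finished rows, `totalRows` still reflects the full build even when only a preview is shown.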
Unsupported vs unmatched
Unmatched inputs are entries the system cannot map confidently to an internal record.
Unsupported compounds are matched compounds that exist in the broader library but are unavailable in the selected method context, especially in odor workflows.
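The distinction can be made concrete with a three-way classifier. This is a minimal sketch: the `classifyInput` and `countStatuses` helpers and the library shape (a lookup key mapped to the set of methods that cover it) are hypothetical, chosen only to show that "unsupported" requires a successful match first.

```typescript
type InputStatus = "matched" | "unmatched" | "unsupported";

// Hypothetical library: normalized name or CAS -> methods that cover the compound.
type MethodLibrary = Map<string, Set<string>>;

function classifyInput(input: string, library: MethodLibrary, method: string): InputStatus {
  const methods = library.get(input.trim().toLowerCase());
  if (methods === undefined) return "unmatched";          // no internal record found
  return methods.has(method) ? "matched" : "unsupported"; // record exists, method may not cover it
}

// Tally statuses so each case stays visible in the response counts.
function countStatuses(
  inputs: string[],
  library: MethodLibrary,
  method: string,
): Record<InputStatus, number> {
  const counts: Record<InputStatus, number> = { matched: 0, unmatched: 0, unsupported: 0 };
  for (const input of inputs) {
    counts[classifyInput(input, library, method)] += 1;
  }
  return counts;
}
```

An entry that is missing from the library is never reported as unsupported in this sketch, which keeps the two failure modes separate during review.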
Validation still required
Method Builder accelerates list preparation and method drafting. It does not replace laboratory validation.
Users still need to confirm RT behavior, ion ratios, matrix effects, acquisition settings, and acceptance criteria in their own environment.
Source files described by this page
- data/database.csv
- data/odor/odor-dataset.json
- data/methods.json + lib/utils/gcMethods.ts
Methodology FAQ
How does the platform match names and CAS numbers?
Inputs are trimmed, deduplicated, and matched to internal compound records. Non-odor workflows use the main GC dataset, while odor workflows use an odor-specific resolver that maps names and CAS values to canonical odor records.
Do pesticide and environmental workflows use the same dataset?
Yes, in the current implementation. Pesticide and environmental workflows share the main GC dataset in data/database.csv, while odor workflows use a separate odor dataset.
Why can an odor compound be supported in WAX but not in 5ms?
Odor support is method-aware. Some compounds are available in the WAX library but not in the current 5ms library, so they remain visible as unsupported when 5ms is selected.
Does the exported list replace lab validation?
No. The exported list is a workflow and method-building output. Users still need to verify RT behavior, ion ratios, matrix suitability, and acceptance criteria under their own laboratory conditions.