MaterialsAtlas Docs

StructMatcher

StructMatcher is a CIF-based app for comparing compositions and structures. It supports structure hygiene, duplicate removal, known-library matching, novelty filtering, and structure-landscape visualization.

Modes

Mode | Use case | Main outputs
--- | --- | ---
Pairwise | Compare exactly two CIFs | structure-comparison-pair.csv, summary report
Dedup | Cluster identical CIFs in one uploaded set | structure-comparison-dedup.csv, summary report
Cross-reference | Find candidate CIFs that match a reference library | structure-comparison-cross-matches.csv, unmatched novel CIF ZIP, summary report
Novelty filter | Keep candidates with no match in local MP or an uploaded reference library | novelty CSV, novel CIF ZIP, novel representative CIF ZIP, summary report
Cluster map | Embed uploaded CIFs into a 2D map | embedding CSV, PNG plot, summary report

Matching Logic

The app uses pymatgen StructureMatcher. Matching can tolerate translations, rotations, primitive/supercell changes, and modest relaxation noise.

Pairwise mode reports both the normal element-aware structure match and an anonymous framework/prototype match, in which the species of one structure may be relabeled onto the other's. The pair CSV and HTML report mark exactly one of these four outcome columns:

Result tables report derived symmetry metadata (space_group_name, space_group_number, and crystal_system) from pymatgen SpacegroupAnalyzer.
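The crystal_system column is fully determined by the space-group number. As a small illustration of that relationship, the sketch below applies the standard International Tables ranges for the 230 space groups. crystal_system_from_number is a hypothetical helper, not the app's code; pymatgen's SpacegroupAnalyzer.get_crystal_system returns the same seven-system classification.

```python
# Upper bound (inclusive) of each space-group number range -> crystal system.
_CRYSTAL_SYSTEM_UPPER_BOUNDS = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]


def crystal_system_from_number(space_group_number: int) -> str:
    """Return the crystal system for a space-group number in 1..230."""
    if not 1 <= space_group_number <= 230:
        raise ValueError(f"space-group number out of range: {space_group_number}")
    for upper, system in _CRYSTAL_SYSTEM_UPPER_BOUNDS:
        if space_group_number <= upper:
            return system
    raise AssertionError("unreachable: ranges cover 1..230")


print(crystal_system_from_number(225))  # Fm-3m (rock salt type) -> cubic
```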

Identity-style matching now uses a two-step gate by default:

1. StructureMatcher must fit the structures.
2. If the Space group match checkbox is enabled, both structures must have the same derived space-group number, computed with the same symprec and angle_tolerance values.

The default SpacegroupAnalyzer tolerances are symprec=0.05 and angle_tolerance=8 (degrees). The CSV reports structurematcher_fit, symmetry_agrees, final_identity_match, and symmetry_rejected_matches so users can audit cases where the StructureMatcher fit succeeded but the derived space groups disagreed. Framework/anonymous topology mode disables this gate by default; for other topology-only checks, uncheck Space group match or set identity_symmetry_check=false in the advanced options.
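The two-step gate can be sketched as a small function that records the same three per-pair audit fields the CSV reports. This is a hedged sketch, not the app's implementation: identity_gate and IdentityResult are hypothetical names, and the matcher and space-group lookups are passed in as callables. With pymatgen they could be StructureMatcher(...).fit and a function returning SpacegroupAnalyzer(s, symprec=0.05, angle_tolerance=8).get_space_group_number(); here toy dictionaries stand in so the logic runs without pymatgen.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # any structure type, e.g. a pymatgen Structure


@dataclass
class IdentityResult:
    structurematcher_fit: bool       # step 1: geometric fit
    symmetry_agrees: Optional[bool]  # step 2; None when the gate is disabled
    final_identity_match: bool

    @property
    def symmetry_rejected(self) -> bool:
        """True when the geometric fit passed but the space-group gate rejected it."""
        return self.structurematcher_fit and self.symmetry_agrees is False


def identity_gate(
    a: S,
    b: S,
    fit: Callable[[S, S], bool],
    space_group_number: Callable[[S], int],
    identity_symmetry_check: bool = True,
) -> IdentityResult:
    fitted = bool(fit(a, b))
    if not identity_symmetry_check:
        # Topology-only check: the geometric fit alone decides.
        return IdentityResult(fitted, None, fitted)
    agrees = space_group_number(a) == space_group_number(b)
    return IdentityResult(fitted, agrees, fitted and agrees)


# Toy stand-ins: a "structure" is a dict with a prototype label and a space group.
def same_prototype(x: dict, y: dict) -> bool:
    return x["proto"] == y["proto"]


def sg_of(s: dict) -> int:
    return s["sg"]


cubic = {"proto": "perovskite", "sg": 221}
distorted = {"proto": "perovskite", "sg": 62}
print(identity_gate(cubic, distorted, same_prototype, sg_of))
print(identity_gate(cubic, distorted, same_prototype, sg_of, identity_symmetry_check=False))
```

The first call fits geometrically but is rejected by the space-group gate; the second, with the gate off, reports a match.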

Presets:

The UI exposes the main matching controls:

Advanced options can still be supplied as key=value lines:

ltol=0.3
stol=0.5
angle_tol=10
rms_tolerance=0.2
max_dist_tolerance=0.5
symprec=0.05
angle_tolerance=8
identity_symmetry_check=true
formula_mode=reduced
species_mode=element
cell_mode=primitive_supercell
include_matched=false
reference_database=local_mp
cluster_eps=1.5
max_cifs=1000
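A parser for these key=value lines might look like the sketch below. The function name parse_advanced_options, the skipping of blank and #-comment lines, and the coercion order (true/false to booleans, then int, then float, otherwise string) are assumptions; the app's actual parser is not documented here.

```python
def _coerce(value: str) -> object:
    """Turn 'true'/'false' into bools and numeric strings into int or float."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # e.g. formula_mode=reduced stays a string


def parse_advanced_options(text: str) -> dict:
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        options[key.strip()] = _coerce(value.strip())
    return options


opts = parse_advanced_options("ltol=0.3\nidentity_symmetry_check=true\nmax_cifs=1000")
print(opts)
```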

Candidate And Reference Uploads

For cross-reference and novelty modes, the UI splits the input row into candidates on the left and the reference set on the right. The reference set can be:

Cross-reference mode also exports structure-comparison-cross-novel-cifs.zip, containing all candidate CIFs that did not match the selected reference set.
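Building that archive amounts to zipping every candidate that produced no match. A minimal in-memory sketch, assuming CIFs are keyed by filename and the set of matched filenames is already known; build_novel_zip is a hypothetical helper, not the app's API.

```python
import io
import zipfile
from typing import Dict, Set


def build_novel_zip(candidate_cifs: Dict[str, str], matched: Set[str]) -> bytes:
    """Zip every candidate CIF whose filename is not in `matched`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(candidate_cifs):
            if name not in matched:
                zf.writestr(name, candidate_cifs[name])
    return buffer.getvalue()


cifs = {"a.cif": "data_a\n", "b.cif": "data_b\n", "c.cif": "data_c\n"}
archive = build_novel_zip(cifs, matched={"b.cif"})
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())  # ['a.cif', 'c.cif']
```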

When Space group match is enabled, cross-reference and novelty reports also include a “Geometric Matches Rejected by Space Group” audit section in summary.html. Those rows passed the geometric StructureMatcher fit but were rejected because the derived candidate and reference space-group numbers disagreed under the selected symprec and angle_tolerance. The same rows are exported as structure-comparison-symmetry-rejected.csv when present.
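The audit rows can be selected with a simple filter over per-pair results. The sketch below is illustrative only: the row keys and the CSV column names in FIELDS are hypothetical, and only the selection rule (geometric fit passed, space-group numbers differ) comes from the description above.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names for the audit export.
FIELDS = ["candidate", "reference", "candidate_space_group_number", "reference_space_group_number"]


def symmetry_rejected(rows: List[Dict]) -> List[Dict]:
    """Rows that passed the StructureMatcher fit but disagree on space-group number."""
    return [
        r for r in rows
        if r["structurematcher_fit"]
        and r["candidate_space_group_number"] != r["reference_space_group_number"]
    ]


def to_csv(rows: List[Dict]) -> str:
    out = io.StringIO()
    # extrasaction="ignore" drops keys (like structurematcher_fit) not in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


comparisons = [
    {"candidate": "c1.cif", "reference": "r1.cif", "structurematcher_fit": True,
     "candidate_space_group_number": 221, "reference_space_group_number": 221},
    {"candidate": "c2.cif", "reference": "r2.cif", "structurematcher_fit": True,
     "candidate_space_group_number": 221, "reference_space_group_number": 62},
    {"candidate": "c3.cif", "reference": "r3.cif", "structurematcher_fit": False,
     "candidate_space_group_number": 225, "reference_space_group_number": 14},
]
rejected = symmetry_rejected(comparisons)
if rejected:  # mirror "exported ... when present"
    print(to_csv(rejected))
```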

The legacy combined-upload workflow is still supported:

For novelty mode, the same reference selector is used, but the output emphasizes candidates that do not match the selected reference set. With the default Local Materials Project CIFs reference, all left-side uploaded CIFs are treated as candidates. For each candidate, the app first normalizes its composition with pymatgen, retrieves local MP CIFs with the identical reduced composition, and only then runs StructureMatcher. With Uploaded reference archive, left-side uploads are candidates and right-side uploads are references. The CSV reports:

Choose Uploaded reference archive when you want the older candidate-vs-reference archive workflow.
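The composition-first novelty search described above can be sketched as a bucketed lookup: index references by reduced composition once, then run the expensive matcher only against same-composition references. find_novel and reduced_formula_key are hypothetical names; compositions are modeled as integer element counts and the local MP library as an in-memory list. In pymatgen, Composition(...).reduced_formula would be the natural key. Integer counts cannot represent partial occupancy, consistent with the limitations noted below.

```python
from collections import defaultdict
from functools import reduce
from math import gcd
from typing import Callable, Dict, List, Tuple, TypeVar

S = TypeVar("S")
Composition = Dict[str, int]  # element symbol -> integer count in the cell


def reduced_formula_key(comp: Composition) -> Tuple[Tuple[str, int], ...]:
    """Divide counts by their GCD so Fe4O6 and Fe2O3 share one key."""
    counts = {el: n for el, n in comp.items() if n > 0}
    if not counts:
        raise ValueError("empty composition")
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((el, n // divisor) for el, n in counts.items()))


def find_novel(
    candidates: List[S],
    references: List[S],
    composition_of: Callable[[S], Composition],
    fit: Callable[[S, S], bool],
) -> List[S]:
    # Index references once by reduced composition.
    by_formula: Dict[tuple, List[S]] = defaultdict(list)
    for ref in references:
        by_formula[reduced_formula_key(composition_of(ref))].append(ref)

    novel = []
    for cand in candidates:
        # Only same-composition references reach the structure matcher.
        pool = by_formula.get(reduced_formula_key(composition_of(cand)), [])
        if not any(fit(cand, ref) for ref in pool):
            novel.append(cand)
    return novel


# Toy data: (name, composition, prototype label); "fit" compares prototypes.
refs = [("mp-a", {"Fe": 4, "O": 6}, "corundum")]
cands = [
    ("c1", {"Fe": 2, "O": 3}, "corundum"),  # matches mp-a after reduction
    ("c2", {"Fe": 2, "O": 3}, "bixbyite"),  # same composition, new structure
    ("c3", {"Fe": 3, "O": 4}, "spinel"),    # no reference with this composition
]
novel = find_novel(cands, refs, composition_of=lambda s: s[1], fit=lambda a, b: a[2] == b[2])
print([name for name, _, _ in novel])  # ['c2', 'c3']
```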

Novel Representatives

Novelty mode first keeps only the candidates that have no reference match. It then deduplicates this novel subset and exports:

This is the recommended output for downstream screening pipelines.
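The deduplication step can be sketched as greedy clustering: each structure joins the first cluster whose representative it fits, otherwise it starts a new cluster. This is an assumed algorithm for illustration (deduplicate is a hypothetical name), and the app's clustering may differ; pymatgen's StructureMatcher.group_structures is a built-in alternative. Because tolerance-based matching is not transitive, greedy results can depend on input order.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def deduplicate(items: List[S], fit: Callable[[S, S], bool]) -> Tuple[List[S], List[S]]:
    """Return (representatives, duplicates) using greedy first-fit clustering."""
    clusters: List[List[S]] = []
    for item in items:
        for cluster in clusters:
            if fit(cluster[0], item):  # compare against the representative only
                cluster.append(item)
                break
        else:  # no existing cluster matched
            clusters.append([item])
    representatives = [cluster[0] for cluster in clusters]
    duplicates = [item for cluster in clusters for item in cluster[1:]]
    return representatives, duplicates


# Toy "structures": integers that match when they share a remainder mod 3.
reps, dups = deduplicate([1, 2, 4, 5, 3], fit=lambda a, b: a % 3 == b % 3)
print(reps, dups)  # [1, 2, 3] [4, 5]
```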

Phase 2 Report Package

Every run now exports structure-comparison-report-package.zip, which bundles:

Dedup mode also exports standalone structure-comparison-representatives.zip and structure-comparison-duplicates-removed.zip.

Cluster Features

Cluster mode builds a lightweight feature matrix from:

PCA and t-SNE both use scikit-learn, so they are available whenever scikit-learn is installed. UMAP is used when umap-learn is installed; otherwise the app falls back to PCA.
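The backend fallback can be sketched with importlib.util.find_spec, which reports whether a module is importable without importing it. choose_embedding and the method names pca/tsne/umap are hypothetical, and raising an error when scikit-learn is missing is an assumption, since the behavior in that case is not documented. Note that umap-learn is imported as umap and scikit-learn as sklearn.

```python
from importlib.util import find_spec
from typing import Callable


def module_available(name: str) -> bool:
    return find_spec(name) is not None


def choose_embedding(requested: str, is_installed: Callable[[str], bool] = module_available) -> str:
    """Return the embedding method that will actually run."""
    if requested not in {"pca", "tsne", "umap"}:
        raise ValueError(f"unknown embedding method: {requested!r}")
    if requested == "umap" and is_installed("umap"):
        return "umap"
    # UMAP without umap-learn falls back to PCA; PCA and t-SNE need scikit-learn.
    method = "pca" if requested == "umap" else requested
    if not is_installed("sklearn"):
        raise RuntimeError(f"{method} requires scikit-learn")
    return method


print(choose_embedding("umap", is_installed=lambda m: m == "sklearn"))  # pca
```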

Limitations

The comparison ignores magnetic order, charge state, and detailed partial-occupancy semantics. Borderline structures should be inspected manually, especially when using tolerant or anonymous matching.
