Class CheckFingerprint

java.lang.Object
picard.cmdline.CommandLineProgram
picard.fingerprint.CheckFingerprint

@DocumentedFeature public class CheckFingerprint extends CommandLineProgram
Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF) against a set of known genotypes in the supplied genotype file (in VCF format).

Summary

Computes a fingerprint (essentially, genotype information from different parts of the genome) from the supplied input file (SAM/BAM or VCF) and compares it to the expected fingerprint genotypes provided. The key output is a LOD score, which represents the relative likelihood of the sequence data originating from the same sample as the genotypes vs. from a random sample.
Two outputs are produced:
  1. A summary metrics file that gives metrics of the fingerprint matches when comparing the input to a set of genotypes for the expected sample. Metrics are reported at the single-sample level if the input was a VCF, or at the read level (lane or index within a lane) if the input was a SAM/BAM.
  2. A detail metrics file that contains an individual SNP/Haplotype comparison within a fingerprint comparison.
The metrics files fill the fields of the classes FingerprintingSummaryMetrics and FingerprintingDetailMetrics. The output files may be specified individually using the SUMMARY_OUTPUT and DETAIL_OUTPUT options. Alternatively, the OUTPUT option may be used to give the base name of the two output files; the summary metrics file then gets the extension ".fingerprinting_summary_metrics" and the detail metrics file gets the extension ".fingerprinting_detail_metrics".
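
The naming rule for the OUTPUT option can be sketched as follows. This is a minimal illustration, not Picard's own code: the class and method names (FingerprintOutputNames, summaryFile, detailFile) are hypothetical, and only the two extensions are taken from the text above.

```java
import java.io.File;

// Hypothetical helper illustrating how an OUTPUT base name maps to the two metrics files.
final class FingerprintOutputNames {
    // Extensions as stated in the documentation above.
    static final String SUMMARY_SUFFIX = ".fingerprinting_summary_metrics";
    static final String DETAIL_SUFFIX = ".fingerprinting_detail_metrics";

    // Appends the summary extension to the base name given via OUTPUT.
    static File summaryFile(String outputBase) {
        return new File(outputBase + SUMMARY_SUFFIX);
    }

    // Appends the detail extension to the base name given via OUTPUT.
    static File detailFile(String outputBase) {
        return new File(outputBase + DETAIL_SUFFIX);
    }

    public static void main(String[] args) {
        String base = "sample_fingerprinting"; // same base as in the example command
        System.out.println(summaryFile(base).getName());
        System.out.println(detailFile(base).getName());
    }
}
```

With OUTPUT=sample_fingerprinting, this sketch yields the names sample_fingerprinting.fingerprinting_summary_metrics and sample_fingerprinting.fingerprinting_detail_metrics.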

Example comparing a BAM against known genotypes:

     java -jar picard.jar CheckFingerprint \
          INPUT=sample.bam \
          GENOTYPES=sample_genotypes.vcf \
          HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
          OUTPUT=sample_fingerprinting
 

Detailed Explanation

This tool calculates a single number that reports the LOD score for the identity check between the INPUT and the GENOTYPES. A positive value indicates that the data seems to have come from the same individual or, in other words, that the identity checks out. The scale is logarithmic (base 10), so a LOD of 6 indicates that it is 1,000,000 times more likely that the data matches the genotypes than not. A negative value indicates that the data do not match. A score that is near zero is inconclusive and can result from low coverage or non-informative genotypes.
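
The base-10 scale can be made concrete with a small sketch. The class below is hypothetical and not part of Picard. The width of the "near zero" band is not defined in this documentation, so inconclusiveBand is an assumed parameter that the caller must choose.

```java
// Hypothetical helper for reading a LOD score; not part of Picard.
final class LodInterpretation {
    // A LOD is a base-10 log of a likelihood ratio, so the ratio is 10^LOD.
    static double likelihoodRatio(double lod) {
        return Math.pow(10.0, lod);
    }

    // ASSUMPTION: the band that counts as "near zero" is caller-defined;
    // the documentation gives no numeric cutoff.
    static String interpret(double lod, double inconclusiveBand) {
        if (Math.abs(lod) <= inconclusiveBand) {
            return "INCONCLUSIVE";
        }
        return lod > 0 ? "MATCH" : "MISMATCH";
    }

    public static void main(String[] args) {
        System.out.println(likelihoodRatio(6.0));
        System.out.println(interpret(6.0, 1.0));
        System.out.println(interpret(-4.5, 1.0));
        System.out.println(interpret(0.2, 1.0));
    }
}
```

Here likelihoodRatio(6.0) evaluates to one million, matching the statement above, and a negative LOD corresponds to a ratio below 1.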

The identity check makes use of haplotype blocks defined in the HAPLOTYPE_MAP file to enable it to have higher statistical power for detecting identity or swap by aggregating data from several SNPs in the haplotype block. This enables an identity check of samples with very low coverage (e.g. ~1x mean coverage).

When provided a VCF, the identity check looks at the PL, GL and GT fields (in that order) and uses the first one that it finds.
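
The field precedence described above can be sketched as a simple ordered lookup. This is an illustration only and does not use the htsjdk API: the sample's FORMAT values are modeled as a plain map, the class name GenotypeFieldChoice is hypothetical, and treating an empty value or "." as missing is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "PL, then GL, then GT; use the first one found".
final class GenotypeFieldChoice {
    // Precedence as stated in the documentation above.
    static final List<String> PRIORITY = List.of("PL", "GL", "GT");

    // Returns the key of the first field that is present for this sample.
    // ASSUMPTION: an absent key, an empty string, or "." all count as missing.
    static Optional<String> firstAvailable(Map<String, String> formatFields) {
        for (String key : PRIORITY) {
            String value = formatFields.get(key);
            if (value != null && !value.isEmpty() && !value.equals(".")) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A sample with GT and GL but no PL: GL takes precedence over GT.
        Map<String, String> sample = Map.of("GT", "0/1", "GL", "-0.1,-1.0,-5.0");
        System.out.println(firstAvailable(sample).orElse("none"));
    }
}
```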

  • Field Details

    • INPUT

      @Argument(shortName="I", doc="Input file SAM/BAM/CRAM or VCF. If a VCF is used, it must have at least one sample. If there is more than one sample in the VCF, the parameter OBSERVED_SAMPLE_ALIAS must be provided in order to indicate which sample\'s data to use. If there are no samples in the VCF, an exception will be thrown.") public String INPUT
    • OBSERVED_SAMPLE_ALIAS

      @Argument(optional=true, doc="If the input is a VCF, this parameter is used to select which sample\'s data in the VCF to use.") public String OBSERVED_SAMPLE_ALIAS
    • OUTPUT

      @Argument(shortName="O", doc="The base prefix of output files to write. The summary metrics will have the file extension \'.fingerprinting_summary_metrics\' and the detail metrics will have the extension \'.fingerprinting_detail_metrics\'.", mutex={"SUMMARY_OUTPUT","DETAIL_OUTPUT"}) public String OUTPUT
    • SUMMARY_OUTPUT

      @Argument(shortName="S", doc="The text file to which to write summary metrics.", mutex="OUTPUT") public File SUMMARY_OUTPUT
    • DETAIL_OUTPUT

      @Argument(shortName="D", doc="The text file to which to write detail metrics.", mutex="OUTPUT") public File DETAIL_OUTPUT
    • GENOTYPES

      @Argument(shortName="G", doc="File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.") public String GENOTYPES
    • EXPECTED_SAMPLE_ALIAS

      @Argument(shortName="SAMPLE_ALIAS", optional=true, doc="This parameter can be used to specify which sample\'s genotypes to use from the expected VCF file (the GENOTYPES file). If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.") public String EXPECTED_SAMPLE_ALIAS
    • HAPLOTYPE_MAP

      @Argument(shortName="H", doc="The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.") public File HAPLOTYPE_MAP
    • GENOTYPE_LOD_THRESHOLD

      @Argument(shortName="LOD", doc="When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.") public double GENOTYPE_LOD_THRESHOLD
    • IGNORE_READ_GROUPS

      @Argument(optional=true, shortName="IGNORE_RG", doc="If the input is a SAM/BAM/CRAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.") public boolean IGNORE_READ_GROUPS
    • EXIT_CODE_WHEN_EXPECTED_SAMPLE_NOT_FOUND

      @Argument(doc="When the expected fingerprint sample is not found in the genotypes file, this exit code is returned.") public int EXIT_CODE_WHEN_EXPECTED_SAMPLE_NOT_FOUND
    • EXIT_CODE_WHEN_NO_VALID_CHECKS

      @Argument(doc="When all LOD scores are zero, exit with this value.") public int EXIT_CODE_WHEN_NO_VALID_CHECKS
    • FINGERPRINT_SUMMARY_FILE_SUFFIX

      public static final String FINGERPRINT_SUMMARY_FILE_SUFFIX
    • FINGERPRINT_DETAIL_FILE_SUFFIX

      public static final String FINGERPRINT_DETAIL_FILE_SUFFIX
  • Constructor Details

    • CheckFingerprint

      public CheckFingerprint()
  • Method Details

    • doWork

      protected int doWork()
      Description copied from class: CommandLineProgram
      Do the work after the command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
      Specified by:
      doWork in class CommandLineProgram
      Returns:
      program exit status.
    • customCommandLineValidation

      protected String[] customCommandLineValidation()
      Description copied from class: CommandLineProgram
      Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
      Overrides:
      customCommandLineValidation in class CommandLineProgram
      Returns:
      null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
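
The return contract above (null when valid, otherwise an array of messages) can be sketched in isolation. The class below is hypothetical and does not reproduce the checks CheckFingerprint actually performs. It only mirrors the documented exclusivity between OUTPUT and SUMMARY_OUTPUT/DETAIL_OUTPUT to have something to validate, and it models unset options as null, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, standalone illustration of the "null or error messages" contract.
final class ValidationContractSketch {
    // ASSUMPTION: an option that was not supplied is represented as null.
    static String[] validate(String output, String summaryOutput, String detailOutput) {
        List<String> errors = new ArrayList<>();
        if (output != null && summaryOutput != null) {
            errors.add("OUTPUT cannot be used together with SUMMARY_OUTPUT");
        }
        if (output != null && detailOutput != null) {
            errors.add("OUTPUT cannot be used together with DETAIL_OUTPUT");
        }
        // Contract: null signals a valid command line; otherwise return the messages.
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = validate("sample_fingerprinting", "summary.txt", null);
        if (result != null) {
            for (String message : result) {
                System.out.println(message);
            }
        }
    }
}
```

Returning null rather than an empty array for the valid case follows the wording of the inherited method description; callers of this sketch must therefore null-check the result before iterating.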