DNANexus & UK Biobank Research Analysis Platform (RAP)¶

Last update: 2023/03/16

Getting access to the RAP¶

Sign up here. UKB takes about a week to approve your application.
Once that’s done, ask Melissa or Jonathan to add you to our UKB application. That’s instantaneous.
Then create an account on DNANexus’s website. This is instantaneous. You must use your UCSD email.
Then email Melissa and ask her to add you to our project (currently UKB_Test).

Using DNANexus from the command line¶

(There’s also a GUI, docs TODO)

Install the DNANexus command line tools, which are distributed through pip: pip3 install dxpy.
Run dx login and dx select <project name>.

Choosing Instance Types¶

Instances are the virtual machines allocated for each job in the cloud. The link here (under the words rate card) describes the costs of the various instances. At the time of writing, asking for more memory per core doesn’t seem to cost more (mem1 vs mem3). Here is an explanation of the naming conventions for instance types.
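As a rough illustration of the naming convention, the sketch below splits an instance type name into its parts. It assumes names of the form mem<tier>_<storage>_v<version>_x<cores> (for example mem1_ssd1_v2_x4); describe_instance is a hypothetical helper written for this page, not part of the dx tools.

```shell
# Hypothetical helper (not part of the dx toolkit): pull the memory tier,
# storage type and core count out of an instance type name.
# Assumes names shaped like mem1_ssd1_v2_x4.
describe_instance() {
  name=$1
  mem=${name%%_*}       # leading part, e.g. mem1
  rest=${name#*_}       # everything after the first underscore
  storage=${rest%%_*}   # e.g. ssd1
  cores=${name##*_x}    # number after the final _x
  echo "memory tier: ${mem#mem}"
  echo "storage: ${storage}"
  echo "cores: ${cores}"
}

describe_instance mem1_ssd1_v2_x4
```

An instance type can then be requested when launching a job with dx run <app> --instance-type <instance type>.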

Job runtimes¶

Jobs listed under the Monitor tab on the web GUI have a Status and a Duration. Note that as soon as a job is submitted, its Status is listed as In Progress and its Duration starts counting, regardless of whether the jobs it depends on have finished. The Duration is therefore effectively meaningless. What matters is a job’s actual runtime, even though that number is harder to find, because actual runtime determines costs. If actual runtime exceeds one day, you will receive a warning message. As far as I understand, if the actual runtime of any one job exceeds two days, the job will be killed. A Duration exceeding two days is irrelevant.
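One possible way to estimate actual runtime is to compare the times at which a job started and stopped running. The sketch below assumes that dx describe <job ID> --json reports startedRunning and stoppedRunning as milliseconds since the epoch, and that jq is installed. Neither assumption is confirmed on this page. runtime_minutes is a hypothetical helper and the timestamps are made-up values.

```shell
# Hypothetical helper: whole minutes between two timestamps given in milliseconds.
runtime_minutes() {
  echo $(( ($2 - $1) / 60000 ))
}

# Made-up timestamps for illustration, 90 minutes apart.
runtime_minutes 1678950000000 1678955400000

# With a real job, the two timestamps might be fetched like this
# (field names are an assumption, check the output of dx describe):
#   dx describe <job ID> --json | jq '.startedRunning, .stoppedRunning'
```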

Exporting UKB phenotypes as TSVs¶

This code runs the table exporter app from the command line:

dx run table-exporter \
  --folder <output_folder> \
  -ioutput=<output_file_name_without_extension> \
  -idataset_or_cohort_or_dashboard=/app46122_20220823045256.dataset \
  -ioutput_format=TSV \
  -icoding_option=RAW \
  -iheader_style=UKB-FORMAT \
  -ientity=participant \
  -ifield_names=eid \
  ...

You should then append an additional -ifield_names=<field_name> for each field you want in the TSV. The field names of UKB data fields follow the format p<numeric field ID>_i<instance>_a<array>. Look up fields in the UKB data showcase to determine whether a given data field has instances or arrays (if it doesn’t, omit that portion of the field name). As far as I can tell, instances are zero indexed but arrays are one indexed. You need to append -ifield_names=<field_name> for each instance and/or array element you wish to extract. For example, to extract a data field with three instances, you could append the following to the command above:

$(for i in $(seq 0 2); do echo "-ifield_names=p<field ID>_i${i}" ; done)
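For a field that has both instances and arrays, the same idea extends to a nested loop. The sketch below is one way to do it. field_args is a hypothetical helper, the field ID 12345 is a placeholder rather than a real UKB field, and the index ranges are passed in explicitly so that nothing is assumed about where instances or arrays start.

```shell
# Hypothetical helper: print one -ifield_names argument per instance/array pair.
# Usage: field_args <field ID> <first instance> <last instance> <first array> <last array>
field_args() {
  for i in $(seq "$2" "$3"); do
    for a in $(seq "$4" "$5"); do
      echo "-ifield_names=p${1}_i${i}_a${a}"
    done
  done
}

# Placeholder field ID with instances 0-1 and arrays 1-2.
field_args 12345 0 1 1 2
```

As with the loop above, the output can be spliced into the dx run command with $(field_args <field ID> <first instance> <last instance> <first array> <last array>).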
