RAGE-redcap Data and Sequence Import Instructions

This guide provides step-by-step instructions for preparing and importing metadata and sequence data into RAGE-redcap.

1. Metadata Import

Fill in the template using the latest data dictionary as reference.
Adhere to approved choices for dropdown or radio fields; only these will be accepted by the database.
Not all fields are relevant for every user; complete those that apply.
The latest data dictionary can be obtained:
- From REDCap: Project Home > Design > Download Data Dictionary
- From GitHub: RAGE REDCap Data Dictionary
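Before importing, it can help to check the coded values in a completed template against the data dictionary. The following is a minimal sketch, not part of the RAGE-redcap tooling. It assumes the dictionary is the standard REDCap CSV export (with the columns "Variable / Field Name", "Field Type", and "Choices, Calculations, OR Slider Labels"), that choices are written as `code, label` pairs separated by `|`, and that the template holds the coded values. The field name `host_species` and its choices are hypothetical.

```python
import csv
import io

CHOICES_COL = "Choices, Calculations, OR Slider Labels"


def parse_choices(raw):
    """Turn '1, Dog | 2, Cat' into {'1': 'Dog', '2': 'Cat'}."""
    choices = {}
    for item in raw.split("|"):
        item = item.strip()
        if not item:
            continue
        code, _, label = item.partition(",")
        choices[code.strip()] = label.strip()
    return choices


def load_allowed_choices(dictionary_file):
    """Map each dropdown/radio field name to its allowed coded values."""
    allowed = {}
    for row in csv.DictReader(dictionary_file):
        if row["Field Type"] in ("dropdown", "radio"):
            allowed[row["Variable / Field Name"]] = parse_choices(row[CHOICES_COL])
    return allowed


def find_invalid_values(rows, allowed):
    """Return (row_number, field, value) for every value not in the dictionary."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for field, choices in allowed.items():
            value = (row.get(field) or "").strip()
            if value and value not in choices:
                problems.append((number, field, value))
    return problems


# Hypothetical two-field dictionary, used only to illustrate the format.
DEMO_DICTIONARY = (
    '"Variable / Field Name","Field Type","Choices, Calculations, OR Slider Labels"\n'
    'sample_id,text,\n'
    'host_species,dropdown,"1, Dog | 2, Cat"\n'
)

allowed = load_allowed_choices(io.StringIO(DEMO_DICTIONARY))
problems = find_invalid_values(
    [{"host_species": "1"}, {"host_species": "Fox"}], allowed
)
print(problems)
```

For a real dictionary file, pass an open file handle instead of the `io.StringIO` demo; opening it with `encoding="utf-8-sig"` guards against a leading byte-order mark in the header.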

Save the completed template with a suitable filename.
Use the rabvRedcapProcessing R tool (see the rabvRedcapProcessing GitHub repository) to prepare the data for REDCap import.
Run the checkScripts.R script to process your imported sheet and generate sequencing and diagnostic forms ready for REDCap import.
- Edit only the input file paths to match your data.
Visually inspect the outputs.

Duplicate sequencing events: Carefully check your import list against the latest REDCap dataset to identify any repeat sequencing events. Manually adjust the redcap_repeat_instance for these repeats and remove duplicates from the diagnostic sheet to prevent overwriting existing records. Assign repeat instances in the new data consistently to ensure accurate tracking and integration.
Review REDCap import warnings carefully; data that may be overwritten is highlighted in red.
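The repeat-instance bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions, not the project's own procedure: the record identifier column is assumed to be `record_id`, the repeating instrument name `sequencing` is hypothetical, and rows are plain dictionaries such as those produced by `csv.DictReader`. The columns `redcap_repeat_instrument` and `redcap_repeat_instance` are the standard REDCap names for repeating instruments.

```python
from collections import defaultdict


def assign_repeat_instances(existing, incoming, instrument="sequencing"):
    """Give each incoming row the next free redcap_repeat_instance for its record."""
    highest = defaultdict(int)  # record ID -> highest instance already in REDCap
    for row in existing:
        if row.get("redcap_repeat_instrument") == instrument:
            record = row["record_id"]
            highest[record] = max(highest[record], int(row["redcap_repeat_instance"]))

    assigned = []
    for row in incoming:
        record = row["record_id"]
        highest[record] += 1  # next unused instance for this record
        assigned.append(
            {
                **row,
                "redcap_repeat_instrument": instrument,
                "redcap_repeat_instance": highest[record],
            }
        )
    return assigned


# Hypothetical data: S1 already has two sequencing instances in REDCap.
existing_rows = [
    {"record_id": "S1", "redcap_repeat_instrument": "sequencing", "redcap_repeat_instance": "1"},
    {"record_id": "S1", "redcap_repeat_instrument": "sequencing", "redcap_repeat_instance": "2"},
]
incoming_rows = [{"record_id": "S1"}, {"record_id": "S2"}, {"record_id": "S1"}]

numbered = assign_repeat_instances(existing_rows, incoming_rows)
print([(r["record_id"], r["redcap_repeat_instance"]) for r in numbered])
```

The sketch only numbers new events so they do not collide with existing instances. It does not detect a run that has already been imported, so the manual comparison against the latest REDCap dataset is still needed.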

2. Sequence Import

Before running the sequence import scripts, set up the conda environment included in this repository.

Create the environment from the provided environment.yml file:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate rage-redcap
```
Verify installation:
```
python --version
```

Prepare Sequences
Ensure metadata has been imported first to associate sequences with records.
Obtain consensus FASTA sequences from your latest run.
The artic-rabv pipeline writes the concatenated sequences to results/concatenate/concat_genome.fasta.

Use multi_to_single_fasta.py to split the multi-FASTA into individual FASTA files: python3 multi_to_single_fasta.py
- Edit input/output paths in the script as required.
- Ensure filenames correspond to sample IDs from FASTA headers.
- Negative controls: Copy and rename manually as negative_runname__runname__instance1.fasta and edit internal FASTA headers to match.
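For orientation, the splitting step can be sketched as below. This is not the repository's multi_to_single_fasta.py, whose internals are not shown here. It is a minimal stand-in that assumes the sample ID is the first whitespace-delimited token of each FASTA header and that each output file is named after that ID. The function name `split_multi_fasta` is hypothetical.

```python
from pathlib import Path


def split_multi_fasta(multi_fasta, out_dir):
    """Write one <sample_id>.fasta file per record and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = {}  # sample ID -> lines of that record, header first
    current = None
    with open(multi_fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if not fields:
                    raise ValueError("FASTA header with no sample ID")
                current = fields[0]
                if current in records:
                    raise ValueError(f"duplicate sample ID: {current}")
                records[current] = [line]
            elif current is not None and line.strip():
                records[current].append(line)

    written = []
    for sample_id, lines in records.items():
        target = out_dir / f"{sample_id}.fasta"
        target.write_text("".join(lines))
        written.append(target)
    return written
```

Raising on duplicate IDs is a deliberate choice in this sketch: silently overwriting one sample's file with another would be hard to spot after upload.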

Run redcap-prepareFASTA.R
- Edit metadata_file, fasta_dir, and output_dir filepaths.
- Script matches sequences to REDCap metadata and renames FASTA files as sampleID__runname__instanceX.fasta.
Verify all renamed FASTA files and negative controls.
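Part of this verification can be automated by checking filenames against the sampleID__runname__instanceX.fasta convention. The sketch below is an assumption-laden helper, not part of the repository: the function names are hypothetical, and the pattern assumes that neither the sample ID nor the run name contains a double underscore.

```python
import re
from pathlib import Path

# sampleID__runname__instanceX.fasta, where X is a positive integer
NAME_PATTERN = re.compile(
    r"^(?P<sample_id>.+?)__(?P<run_name>.+?)__instance(?P<instance>\d+)\.fasta$"
)


def parse_fasta_name(filename):
    """Return the parts of a conforming filename, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["instance"] = int(parts["instance"])
    return parts


def nonconforming_files(fasta_dir):
    """List .fasta files in fasta_dir whose names break the convention."""
    return sorted(
        path.name
        for path in Path(fasta_dir).glob("*.fasta")
        if parse_fasta_name(path.name) is None
    )
```

Calling `nonconforming_files` on the renamed FASTA directory returns an empty list when every file follows the convention. Negative controls named as described above parse with a sample ID of the form `negative_runname`.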

Use bulk_upload_fasta_repeatInstances.py: python3 bulk_upload_fasta_repeatInstances.py
- Edit the input folder path to point to the renamed FASTA directory.
- Ensure the REDCap API URL points to the correct project version (updates may change the endpoint).
- Monitor command-line output for successful uploads.
- If files do not appear in REDCap, check the API URL and the repeat instance numbers.
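When troubleshooting, it can be useful to see what a single file upload looks like. The sketch below is not the repository's bulk_upload_fasta_repeatInstances.py. It assumes the REDCap "Import a File" API method (`content=file`, `action=import`) and the third-party `requests` package. The function names and the file field name are hypothetical, and longitudinal projects would additionally need an `event` parameter.

```python
def build_file_import_fields(token, record_id, field, repeat_instance):
    """Form fields for one REDCap file import into a repeating instance."""
    return {
        "token": token,
        "content": "file",
        "action": "import",
        "record": record_id,
        "field": field,  # name of the file-upload field (project-specific)
        "repeat_instance": str(repeat_instance),
        "returnFormat": "json",
    }


def upload_fasta(api_url, fields, fasta_path):
    """POST one FASTA file to REDCap; raises on an HTTP error status."""
    import requests  # third-party; imported here so the helper above has no dependencies

    with open(fasta_path, "rb") as handle:
        response = requests.post(
            api_url, data=fields, files={"file": handle}, timeout=60
        )
    response.raise_for_status()
    return response
```

In use, the token and URL would come from the local environment files rather than being written into the script, and `repeat_instance` would be taken from the instanceX part of the filename so that each upload lands on the intended repeat instance.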

Always verify sequencing records are not accidentally overwritten.
These instructions assume you have REDCap API access configured through local environment files; contact the project lead if this is not set up.
REDCap provides informative error messages during import; use these to correct formatting or repeat instance issues before re-importing.