GeneRxGPT Overview
GeneRxGPT is a user-friendly web server that utilizes the GPT-4 model to infer drug-gene relationships in cancer based on the latest scientific literature. GeneRxGPT features an interactive R Shiny framework with an automated pipeline that integrates PubMed, GPT-4, and PubChem APIs with real-world data resources. Given a gene, a drug, and a cancer type, the tool 1) displays relevant drug information available from PubChem, 2) leverages GPT-4 to infer the drug-gene relationship within the specific cancer context based on real-time PubMed abstracts, and 3) analyzes the similarity in cell viability effects induced by drug treatments and gene knockouts using high-throughput screening datasets. Here, we provide an overview of our tool.
How to Run GeneRxGPT
GeneRxGPT requires three inputs to start the analysis:
- Gene: Type a gene symbol and select the gene of interest (example: EGFR).
- Drug: Type a drug name (example: Afatinib), PubChem CID (example: 10184653), or SMILES (example: CN(C)C/C=C/C(=O)NC1=C(C=C2C(=C1)C(=NC=N2)NC3=CC(=C(C=C3)F)Cl)O[C@H]4CCOC4). The input identifier is used to search PubChem for the drug and extract relevant chemical information.
- Cancer type: Select from the drop-down menu of the 27 most frequent cancer types in adults and children, or choose "PanCan" for a general cancer analysis (in which case the term "Cancer" is used for PubMed searches). The selected cancer type provides context for searching PubMed for relevant abstracts and analyzing cell viability data.
After specifying the gene, drug, and cancer type of interest, the user should click on the [SUBMIT] button to start the analysis. Alternatively, users can test our tool using a built-in example featuring EGFR, afatinib, and lung cancer by clicking the [EXAMPLE] button. For details regarding the input data format or each module, please visit the corresponding help pages or click on the question mark at each input/output section.
**The analysis can take up to one minute, depending on the current load of the GPT-4, PubMed, and PubChem servers.**
Results: PubChem Drug Information
This section searches the PubChem database for the query drug and displays the structural and descriptive information available there, including the CID (compound ID, hyperlinked to PubChem), canonical SMILES (Simplified Molecular Input Line Entry System), InChI (International Chemical Identifier), InChIKey, IUPAC (International Union of Pure and Applied Chemistry) name, and chemical properties such as molecular formula and molecular weight.
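As a rough illustration of how the three accepted drug identifiers (name, CID, or SMILES) might be turned into a PubChem lookup, the sketch below builds a PUG REST property URL without sending any request. It is not GeneRxGPT's actual code: the function names, the character-based heuristic for spotting SMILES, and the chosen property list are assumptions made for this example, and the property names should be checked against the current PUG REST documentation.

```python
from urllib.parse import quote, urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Assumed property list for illustration; verify names against the PUG REST docs.
PROPERTIES = "MolecularFormula,MolecularWeight,InChI,InChIKey,IUPACName"


def classify_drug_input(text: str) -> str:
    """Guess whether the input is a CID, a SMILES string, or a drug name.

    Hypothetical heuristic: all digits means CID; characters that are common
    in SMILES but rare in drug names mean SMILES; anything else is a name.
    Very simple SMILES (e.g. "CCO") would be misread as a name.
    """
    text = text.strip()
    if text.isdigit():
        return "cid"
    if any(ch in text for ch in "=#@[]()/\\"):
        return "smiles"
    return "name"


def pubchem_property_url(text: str) -> str:
    """Build a PUG REST URL that requests compound properties as JSON."""
    text = text.strip()
    kind = classify_drug_input(text)
    if kind == "smiles":
        # SMILES can contain "/" and other reserved characters, so it is
        # passed as a URL argument rather than inside the path.
        query = urlencode({"smiles": text})
        return f"{PUG_REST}/compound/smiles/property/{PROPERTIES}/JSON?{query}"
    return f"{PUG_REST}/compound/{kind}/{quote(text, safe='')}/property/{PROPERTIES}/JSON"


if __name__ == "__main__":
    for example in ("Afatinib", "10184653"):
        print(classify_drug_input(example), pubchem_property_url(example))
```

Keeping URL construction separate from the network call makes the identifier handling easy to check offline; the resulting URL can then be fetched with any HTTP client.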
Results: GPT Inference Based on PubMed Abstracts (see Help -> GPT Inference)
GeneRxGPT searches PubMed for articles related to the drug-gene pair and extracts their abstracts. A specialized prompt, developed using prompt engineering techniques, guides the analysis of these abstracts and generates four key outputs:
- A one-sentence inference about the relationship between the drug and gene, accompanied by a confidence level.
- Detailed step-by-step explanations of the inference process.
- A concise summary of the abstracts.
- References to the abstracts.
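The flow described above (search PubMed, collect abstracts, prompt the model for the listed outputs) can be sketched as follows. This is only an illustration under stated assumptions: the search term format, the `retmax` default, and the prompt wording are hypothetical and are not GeneRxGPT's actual query or prompt. The only external interface used is the NCBI E-utilities `esearch` endpoint, and no request is sent.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(gene: str, drug: str, cancer_type: str, retmax: int = 20) -> str:
    """Build an esearch URL for abstracts mentioning the gene, drug, and cancer type.

    Assumption: "PanCan" maps to the generic search term "Cancer".
    """
    cancer_term = "Cancer" if cancer_type == "PanCan" else cancer_type
    term = f"({gene}) AND ({drug}) AND ({cancer_term})"  # hypothetical query format
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def build_prompt(gene: str, drug: str, cancer_type: str, abstracts: list[str]) -> str:
    """Assemble a hypothetical prompt that asks for the four outputs listed above."""
    # Number the abstracts so the model can cite them as [1], [2], ...
    numbered = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(abstracts, start=1))
    return (
        f"Using only the PubMed abstracts below, assess the relationship between "
        f"the drug {drug} and the gene {gene} in {cancer_type}.\n"
        "Return:\n"
        "1. A one-sentence inference with a confidence level.\n"
        "2. A step-by-step explanation of the inference.\n"
        "3. A concise summary of the abstracts.\n"
        "4. References to the abstracts by their bracketed numbers.\n\n"
        f"Abstracts:\n{numbered}"
    )


if __name__ == "__main__":
    print(pubmed_search_url("EGFR", "afatinib", "lung cancer"))
    print(build_prompt("EGFR", "afatinib", "lung cancer", ["<abstract text>"]))
```

Numbering the abstracts in the prompt is one simple way to let the model point back to its sources, which matches the "References to the abstracts" output.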
Results: Cell Viability Analysis Based on Drug and CRISPR Screening Data (see Help -> Cell Viability Analysis)
This section supplements the GPT-based literature analysis by assessing drug-gene relationships in real-world cell viability data. These data are derived from high-throughput drug screens (the PRISM project) and CRISPR screens (the Broad and Sanger DepMap projects) across pan-cancer cell lines. The analysis evaluates how similar the cell viability effects of the drug treatment are to those of the gene knockout within the selected cancer type.
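One plausible way to quantify this kind of similarity is to correlate drug response with gene knockout effect across the cell lines present in both screens. The sketch below assumes a Pearson correlation; the text above does not say which similarity measure GeneRxGPT uses, so treat the metric, the function names, and the toy values (which are invented placeholders, not PRISM or DepMap data) as illustrative assumptions.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)


def viability_similarity(drug_response: dict[str, float],
                         knockout_effect: dict[str, float]) -> tuple[float, int]:
    """Correlate the two profiles over cell lines found in both screens.

    Returns the correlation and the number of shared cell lines.
    """
    shared = sorted(set(drug_response) & set(knockout_effect))
    xs = [drug_response[line] for line in shared]
    ys = [knockout_effect[line] for line in shared]
    return pearson(xs, ys), len(shared)


if __name__ == "__main__":
    # Invented placeholder values keyed by hypothetical cell line names.
    drug = {"line_A": -1.0, "line_B": -0.5, "line_C": 0.0, "line_D": 0.2}
    knockout = {"line_A": -2.0, "line_B": -1.0, "line_C": 0.0, "line_E": 0.5}
    r, n_lines = viability_similarity(drug, knockout)
    print(f"r = {r:.3f} over {n_lines} shared cell lines")
```

Restricting the input dictionaries to cell lines of the selected cancer type before calling `viability_similarity` would give the cancer-specific comparison described above.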