[ScRAT Dataset] compare in cellxgene

by doraemin_dev 2025. 7. 23.

Paper : Phenotype prediction from single-cell RNA-seq data using attention-based neural networks

 https://academic.oup.com/bioinformatics/article/40/2/btae067/7613064

https://github.com/yuzhenmao/ScRAT


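Let's take a look at the datasets used in ScRAT.

The COMBAT and Haniffa datasets are about whether a sample is COVID or not.

The SC4 dataset consists mostly of COVID samples, so it is about whether the COVID case is mild or severe, and whether the sample was taken during convalescence or progression.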

 For COMBAT and Haniffa datasets, we perform the task of disease diagnosis
    COVID versus Non-COVID

For SC4 which includes mostly COVID samples
    mild/moderate versus severe/critical
    convalescence versus progression

0. Data 

Download the data:

https://figshare.com/projects/ScRAT_Early_Phenotype_Prediction_From_Single-cell_RNA-seq_Data_using_Attention-Based_Neural_Networks/151659


1. COMBAT Dataset

https://figshare.com/articles/dataset/COMBAT/21397239

Instead of the multiple pkl files, I downloaded a single h5ad file.

wget https://zenodo.org/record/5139561/files/COMBAT-CITESeq-DATA.h5ad

Total cells: 836,148

Total patients: 124

Unique Labels (8, object)

  • ['COVID_SEV', 'COVID_MILD', 'COVID_HCW_MILD', 'COVID_CRIT', 'COVID_LDN', 'Sepsis', 'HV', 'Flu']
    • COVID_SEV → severe COVID-19 patients
    • COVID_MILD → mild COVID-19 patients
    • COVID_HCW_MILD → mild COVID-19 patients (healthcare workers)
    • COVID_CRIT → critical COVID-19 patients
    • COVID_LDN → London COVID-19 patient cohort
    • Sepsis → sepsis patients
    • HV → healthy volunteers (controls)
    • Flu → influenza patients
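
For the COVID versus Non-COVID task, these eight labels have to be collapsed into two classes. Below is a minimal sketch, assuming that every label starting with `COVID_` counts as COVID and that Sepsis, HV, and Flu count as Non-COVID. The helper name `to_binary_label` is hypothetical, and this rule is an assumption rather than a confirmed copy of what the ScRAT code does.

```python
COMBAT_LABELS = [
    "COVID_SEV", "COVID_MILD", "COVID_HCW_MILD", "COVID_CRIT",
    "COVID_LDN", "Sepsis", "HV", "Flu",
]


def to_binary_label(label: str) -> int:
    """Return 1 for COVID and 0 for Non-COVID (assumed rule: 'COVID_' prefix)."""
    return int(label.startswith("COVID_"))


# Map every COMBAT label to its binary class.
binary = {label: to_binary_label(label) for label in COMBAT_LABELS}
print(binary)
```

Under this rule five labels fall into the COVID class and three into the Non-COVID class.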

Unique Cell Types Large (41, object)

  • ['NK.CD16hi', 'CD8.TEMRA', 'nan', 'ncMono', 'cMono', ..., 'GDT.VD2', 'CD8.TREG', 'PLT', 'RET', 'Mast']

Unique Cell Types (18, object)

  • ['NK', 'CD8', 'nan', 'ncMono', 'cMono', ..., 'HSC', 'DC', 'PLT', 'RET', 'Mast']

This dataset was used for scGPT training.

https://cellxgene.cziscience.com/collections/8f126edf-5405-4731-8374-b5ce11f53e82

 



2. Haniffa Dataset

https://figshare.com/articles/dataset/Haniffa/21397254

Instead of the multiple pkl files, I downloaded a single h5ad file.
wget https://covid19.cog.sanger.ac.uk/submissions/release1/haniffa21.processed.h5ad

Total cells: 647,366

Total patients: 130 unique patient_id values (143 unique sample_id values)

Unique Labels (10, object) : ['Moderate', 'Healthy', 'Death', 'Mild', 'Severe', 'LPS', 'Critical ', 'Non-covid', 'Asymptomatic', 'nan']

  • Moderate → moderate COVID-19 patients
  • Mild → mild COVID-19 patients
  • Severe → severe COVID-19 patients
  • Critical → critical COVID-19 patients
  • Asymptomatic → asymptomatic COVID-19 patients
  • Non-covid → patients not infected with COVID-19
  • Healthy → healthy controls
  • LPS → LPS (lipopolysaccharide) inflammatory-response experimental group
  • Death → deceased patients
  • nan → missing values

Unique Cell Types Large (51, object)

  • ['CD8.TE', 'CD4.IL22', 'CD8.Naive', 'CD4.Naive', 'CD8.EM', ..., 'HSC_CD38neg', 'HSC_myeloid', 'HSC_MK', 'CD4.Th17', 'B_malignant']

Unique Cell Types (18, object)

  • ['CD8', 'CD4', 'CD14', 'B_cell', 'NK_16hi', ..., 'gdT', 'HSC', 'pDC', 'RBC', 'Mono_prolif']

🧬 Total cells: 647366
🧪 Total features (genes, etc.): 24929
📝 obs columns: ['sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'full_clustering', 'initial_clustering', 'Resample', 'Collection_Day', 'Sex', 'Age_interval', 'Swab_result', 'Status', 'Smoker', 'Status_on_day_collection', 'Status_on_day_collection_summary', 'Days_from_onset', 'Site', 'time_after_LPS', 'Worst_Clinical_Status', 'Outcome', 'patient_id']

👤 Number of patients: 143
👥 Cells per patient:
sample_id
MH9143427    14317
AP6          14086
MH8919333    12081
MH9143277    11710
AP11         10921
             ...  
MH8919233      307
MH8919229      305
MH8919232      184
MH8919228      147
MH8919277       65
Name: count, Length: 143, dtype: int64

👤 Number of patients: 130
👥 Cells per patient:
patient_id
MH9143427    14317
AP6          14086
MH8919333    12081
MH9143277    11710
AP11         10921
             ...  
MH8919233      307
MH8919229      305
MH8919232      184
MH8919228      147
MH8919277       65
Name: count, Length: 130, dtype: int64
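
The two counts above differ (143 by sample_id, 130 by patient_id), which is what one would expect if some patients contributed more than one sample. The sketch below shows one way to check this with the standard library. The records and IDs are toy values made up for illustration, and `samples_per_patient` is a hypothetical helper.

```python
from collections import defaultdict


def samples_per_patient(records):
    """Map each patient_id to the set of sample_ids observed for it."""
    samples = defaultdict(set)
    for rec in records:
        samples[rec["patient_id"]].add(rec["sample_id"])
    return dict(samples)


# Toy per-cell records (hypothetical IDs, not taken from the dataset).
cells = [
    {"sample_id": "S1", "patient_id": "P1"},
    {"sample_id": "S1", "patient_id": "P1"},
    {"sample_id": "S2", "patient_id": "P1"},
    {"sample_id": "S3", "patient_id": "P2"},
]

by_patient = samples_per_patient(cells)
n_samples = len({c["sample_id"] for c in cells})
n_patients = len(by_patient)
# Patients that appear under more than one sample_id.
multi = {p: sorted(s) for p, s in by_patient.items() if len(s) > 1}
print(n_samples, n_patients, multi)
```

On the real data, where the obs table is a pandas DataFrame, `adata.obs.groupby("patient_id")["sample_id"].nunique()` should give the same per-patient sample count (`adata` being whatever name the loaded AnnData object has).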

Status                                                                                                           
Covid        527286                                                                                              
Healthy       97039                                                                                              
Non_covid     15157                                                                                              
LPS            7884                                                                                              
Name: count, dtype: int64    

Outcome                                                                                                          
Home       504847
unknown    100683
Death       41836
Name: count, dtype: int64


 

This dataset was used for scGPT training.

https://cellxgene.cziscience.com/collections/ddfad306-714d-4cc0-9985-d9072820c530

 



3. SC4

https://figshare.com/articles/dataset/SC4/21397257

http://covid19.cancer-pku.cn (this site links to a Google Drive download.)

* Download COVID19_ALL.h5ad.tar.gz (15.35 GB) from Google Drive, then move it to the server.

# gdown is a Python package that makes it easy to download Google Drive files from the command line.
# pip install gdown
gdown "https://drive.google.com/uc?id=1IwWcn4W-YKgNbz4DpNweM2cKxlx1hbM0"
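
The downloaded file is a .tar.gz archive, so it has to be unpacked before the h5ad inside can be read. Here is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_archive` and the destination directory `data/SC4` are placeholders chosen for illustration, not paths from the original workflow.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where the interpreter supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    archive = Path("COVID19_ALL.h5ad.tar.gz")  # file name from the post
    if archive.exists():
        print(extract_archive(archive, "data/SC4"))  # placeholder destination
```

With a 15 GB archive the extraction takes a while and needs at least as much free disk space again for the unpacked file.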

# obs column: 'CoVID-19 severity'

 

Total cells: 1,462,702

Total patients: 196

 

CoVID-19 severity
mild/moderate      700968
severe/critical    596734
control            165000
Name: count, dtype: int64

 

SARS-CoV-2   
positive    1297702
negative     165000
Name: count, dtype: int64


🧬 Total cells: 1462702

🧪 Total features (genes, etc.): 27943

📝 obs columns: ['celltype', 'majorType', 'sampleID', 'PatientID', 'datasets', 'City', 'Age', 'Sex', 'Sample type', 'CoVID-19 severity', 'Sample time', 'Sampling day (Days after symptom onset)', 'SARS-CoV-2', 'Single cell sequencing platform', 'BCR single cell sequencing', 'TCR single cell sequencing', 'Outcome', 'Comorbidities', 'COVID-19-related medication and anti-microbials', 'Leukocytes [G/L]', 'Neutrophils [G/L]', 'Lymphocytes [G/L]', 'Unpublished']

👤 Number of patients: 196

👥 Cells per patient:
PatientID
P-M004    49223
P-M007    31763
P-M010    31359
P-S022    29408
P-S086    28625
          ...  
P-S003      447
P-S006      403
P-M057      356
P-S002      252
P-S009      168
Name: count, Length: 196, dtype: int64

 

📊 Cell type distribution:
celltype
B_c01-TCL1A                    227948
Mono_c3-CD14-VCAN              136158
T_CD4_c01-LEF1                 107008
B_c02-MS4A1-CD27                92913
Mono_c2-CD14-HLA-DPB1           84402
                                ...  
T_CD4_c14-MKI67-CCL5_h            191
DC_c3-LAMP3                       186
Neu_c5-GSTP1(high)OASL(low)        59
Epi-AT2                            25
Mast                               17
Name: count, Length: 64, dtype: int64


Outcome
discharged    1216871
control        165000
deceased        80831
Name: count, dtype: int64

 

Sample type                                                                                
fresh PBMC                                                    542075
frozen PBMC                                                   451096
B cells sorted from frozen PBMC (MACS, STEMCELL 19054)        196908
CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)     74822
CD19+ B cell sorted from fresh PBMC (MACS)                     65822
fresh BALF                                                     42723
CD3+ T cell sorted from fresh PBMC (FACS)                      36550
CD19+ B cell sorted from fresh PBMC (FACS)                     31913
fresh Sputum                                                   14502
fresh PFMC                                                      6291
Name: count, dtype: int64

Sample time
convalescence    787987
progression      509715
control          165000
Name: count, dtype: int64
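
The two SC4 tasks can be read directly off the 'CoVID-19 severity' and 'Sample time' columns once the control cells are set aside. The sketch below assumes that controls are simply excluded from both tasks, and the choice of which class gets index 0 or 1 is arbitrary. The dictionaries and helper names are hypothetical, while the counts are copied from the value counts listed in this section.

```python
SEVERITY_TASK = {"mild/moderate": 0, "severe/critical": 1}
STAGE_TASK = {"convalescence": 0, "progression": 1}


def task_label(value, mapping):
    """Return the class index, or None for values outside the task (e.g. 'control')."""
    return mapping.get(value)


def cells_in_task(counts, mapping):
    """Sum the cells whose value belongs to the task."""
    return sum(n for value, n in counts.items() if task_label(value, mapping) is not None)


# Cell counts copied from the value counts listed in this section.
severity_counts = {"mild/moderate": 700968, "severe/critical": 596734, "control": 165000}
stage_counts = {"convalescence": 787987, "progression": 509715, "control": 165000}

print(cells_in_task(severity_counts, SEVERITY_TASK))  # 1297702
print(cells_in_task(stage_counts, STAGE_TASK))        # 1297702
```

Both totals match the 1,297,702 SARS-CoV-2 positive cells, which is consistent with the 165,000 control cells being exactly the SARS-CoV-2 negative ones.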


This dataset was used for scGPT training.

https://cellxgene.cziscience.com/collections/0a839c4b-10d0-4d64-9272-684c49a2c8ba

 



List of datasets used for scGPT training

https://github.com/bowang-lab/scGPT/blob/main/data/cellxgene/metainfo.json

 


 

 

{
  # COVID # Cells: about 580,000 # cell_type: 22
  "f72958f5-7f42-4ebb-98da-445b0c6de516": {
    "name": "Azimuth meta-analysis of 10 datasets of healthy and diseased human lung",
    "url": "https://cellxgene.cziscience.com/e/f72958f5-7f42-4ebb-98da-445b0c6de516.cxg/",
    "include_disease": ["normal", "COVID-19"]
  }, 

  "d0c12af4-c0e4-4c7b-873a-70752b449689": {
    "name": "Stromal cells (all non-immune cells)",
    "url": "https://cellxgene.cziscience.com/e/d0c12af4-c0e4-4c7b-873a-70752b449689.cxg/"
  },
  "804d9a85-665b-45c6-b204-13457fdcc7ac": {
    "name": "Megakaryocyte/erythroid cells",
    "url": "https://cellxgene.cziscience.com/e/804d9a85-665b-45c6-b204-13457fdcc7ac.cxg/"
  },
  "6a30bf44-c490-41ac-965b-0bb58432b10a": {
    "name": "HSC/progenitor cells",
    "url": "https://cellxgene.cziscience.com/e/6a30bf44-c490-41ac-965b-0bb58432b10a.cxg/"
  },
  "2aa1c93c-4ef3-4e9a-98e7-0bd37933953c": {
    "name": "Myeloid cells",
    "url": "https://cellxgene.cziscience.com/e/2aa1c93c-4ef3-4e9a-98e7-0bd37933953c.cxg/"
  },
  "3affa268-8a74-460a-ac9b-a984c0832469": {
    "name": "Lymphoid cells",
    "url": "https://cellxgene.cziscience.com/e/3affa268-8a74-460a-ac9b-a984c0832469.cxg/"
  },
  "fd072bc3-2dfb-46f8-b4e3-467cb3223182": {
    "name": "Full dataset of single-cell RNA-seq profiles from 9 developmental tissues across gestation (4-17 pcw)",
    "url": "https://cellxgene.cziscience.com/e/fd072bc3-2dfb-46f8-b4e3-467cb3223182.cxg/"
  },
  "48101fa2-1a63-4514-b892-53ea1d3a8657": {
    "name": "HSC/immune cells (all hematopoietic-derived cells)",
    "url": "https://cellxgene.cziscience.com/e/48101fa2-1a63-4514-b892-53ea1d3a8657.cxg/"
  },
  "aa633105-e8e5-4dcc-b72d-6da6c191b3e9": {
    "name": "NK/T cells",
    "url": "https://cellxgene.cziscience.com/e/aa633105-e8e5-4dcc-b72d-6da6c191b3e9.cxg/"
  },
  "1b9d8702-5af8-4142-85ed-020eb06ec4f6": {
    "name": "Global",
    "url": "https://cellxgene.cziscience.com/e/1b9d8702-5af8-4142-85ed-020eb06ec4f6.cxg/"
  },
  "71be997d-ff75-41b9-8a9f-1288c865f921": {
    "name": "B cell compartment",
    "url": "https://cellxgene.cziscience.com/e/71be997d-ff75-41b9-8a9f-1288c865f921.cxg/"
  },
  "fe52003e-1460-4a65-a213-2bb1a508332f": {
    "name": "Myeloid compartment",
    "url": "https://cellxgene.cziscience.com/e/fe52003e-1460-4a65-a213-2bb1a508332f.cxg/"
  },
  "53d208b0-2cfd-4366-9866-c3c6114081bc": {
    "name": "Tabula Sapiens - All Cells",
    "url": "https://cellxgene.cziscience.com/e/53d208b0-2cfd-4366-9866-c3c6114081bc.cxg/"
  },
  "c5d88abe-f23a-45fa-a534-788985e93dad": {
    "name": "Tabula Sapiens - Immune",
    "url": "https://cellxgene.cziscience.com/e/c5d88abe-f23a-45fa-a534-788985e93dad.cxg/"
  },
  "ae29ebd0-1973-40a4-a6af-d15a5f77a80f": {
    "name": "T & innate lymphoid cells",
    "url": "https://cellxgene.cziscience.com/e/ae29ebd0-1973-40a4-a6af-d15a5f77a80f.cxg/"
  },
  "218acb0f-9f2f-4f76-b90b-15a4b7c7f629": {
    "name": "multiplexed scRNA-seq of 1.2 million PBMCs from adult lupus samples",
    "url": "https://cellxgene.cziscience.com/e/218acb0f-9f2f-4f76-b90b-15a4b7c7f629.cxg/",
    "include_disease": ["normal"]
  },
  "3faad104-2ab8-4434-816d-474d8d2641db": {
    "name": "Single-cell eQTL mapping identifies cell type specific genetic control of autoimmune disease",
    "url": "https://cellxgene.cziscience.com/e/3faad104-2ab8-4434-816d-474d8d2641db.cxg/"
  },
  
  # COVID # COMBAT dataset: used for scGPT training
  "ebc2e1ff-c8f9-466a-acf4-9d291afaf8b3": {
    "name": "COMBAT project: single cell gene expression data from COVID-19, sepsis and flu patient PBMCs",
    "url": "https://cellxgene.cziscience.com/e/ebc2e1ff-c8f9-466a-acf4-9d291afaf8b3.cxg/"
  },
  "2a498ace-872a-4935-984b-1afa70fd9886": {
    "name": "PBMC",
    "url": "https://cellxgene.cziscience.com/e/2a498ace-872a-4935-984b-1afa70fd9886.cxg/"
  },
  "c05fb583-eb2f-4e3a-8e74-f9bd6414e418": {
    "name": "healthy young bone marrow donor",
    "url": "https://cellxgene.cziscience.com/e/c05fb583-eb2f-4e3a-8e74-f9bd6414e418.cxg/"
  },
  "cd4c96bb-ad66-4e83-ba9e-a7df8790eb12": {
    "name": "3 healthy young and 3 healthy old bone marrow donors (Reference sample)",
    "url": "https://cellxgene.cziscience.com/e/cd4c96bb-ad66-4e83-ba9e-a7df8790eb12.cxg/"
  },
  "d3566d6a-a455-4a15-980f-45eb29114cab": {
    "name": "blood and bone marrow from a healthy young donor",
    "url": "https://cellxgene.cziscience.com/e/d3566d6a-a455-4a15-980f-45eb29114cab.cxg/"
  },
  "1a2e3350-28a8-4f49-b33c-5b67ceb001f6": {
    "name": "Fetal Bone Marrow (10x)",
    "url": "https://cellxgene.cziscience.com/e/1a2e3350-28a8-4f49-b33c-5b67ceb001f6.cxg/"
  },
  "343ff97c-85df-494b-8400-beb937618611": {
    "name": "Human Fetal Bone Marrow (CITE-seq)",
    "url": "https://cellxgene.cziscience.com/e/343ff97c-85df-494b-8400-beb937618611.cxg/"
  },
  "471647b3-04fe-4c76-8372-3264feb950e8": {
    "name": "CD34+ Fetal Bone Marrow, Fetal Liver, Cord Blood (CITE-seq)",
    "url": "https://cellxgene.cziscience.com/e/471647b3-04fe-4c76-8372-3264feb950e8.cxg/"
  },
  "4c4cd77c-8fee-4836-9145-16562a8782fe": {
    "name": "Individual Single-Cell RNA-seq PBMC Data from Lee et al.",
    "url": "https://cellxgene.cziscience.com/e/4c4cd77c-8fee-4836-9145-16562a8782fe.cxg/"
  },
  
  # COVID # Cells: about 550,000
  "db0752b9-f20e-40b8-8997-992f3ae0bb2e": {
    "name": "Classical Monocyte sub_clusters of COVID-19 Immune Altas: Integration of 5 public COVID-19 PBMC single-cell datasets",
    "url": "https://cellxgene.cziscience.com/e/db0752b9-f20e-40b8-8997-992f3ae0bb2e.cxg/"
  },
  # COVID # Cells: about 7,000
  "e763ed0d-0e5a-4b8e-9514-6da3d9e47956": {
    "name": "Platelet sub_clusters of COVID-19 Immune Altas: Integration of 5 public COVID-19 PBMC single-cell datasets",
    "url": "https://cellxgene.cziscience.com/e/e763ed0d-0e5a-4b8e-9514-6da3d9e47956.cxg/"
  },
  "59b69042-47c2-47fd-ad03-d21beb99818f": {
    "name": "Individual Single-Cell RNA-seq PBMC Data from Arunachalam et al.",
    "url": "https://cellxgene.cziscience.com/e/59b69042-47c2-47fd-ad03-d21beb99818f.cxg/"
  },
  # COVID # Cells: about 20,000
  "d9b4bc69-ed90-4f5f-99b2-61b0681ba436": {
    "name": "B Cell/Plasmablast Sub_clusters of COVID-19 Immune Altas: Integration of 5 public COVID-19 PBMC single-cell datasets",
    "url": "https://cellxgene.cziscience.com/e/d9b4bc69-ed90-4f5f-99b2-61b0681ba436.cxg/"
  },
  # COVID # Cells: about 230,000
  "96a3f64b-0ee9-40d8-91e9-813ce38261c9": {
    "name": "COVID-19 Immune Altas: Integration of 5 public COVID-19 PBMC single-cell datasets",
    "url": "https://cellxgene.cziscience.com/e/96a3f64b-0ee9-40d8-91e9-813ce38261c9.cxg/"
  },
  # COVID # Cells: about 100,000
  "bc2a7b3d-f04e-477e-96c9-9d5367d5425c": {
    "name": "T Cell and NK Cell Subtypes of COVID-19 Immune Altas: Integration of 5 public COVID-19 PBMC single-cell datasets",
    "url": "https://cellxgene.cziscience.com/e/bc2a7b3d-f04e-477e-96c9-9d5367d5425c.cxg/"
  },
  "055ca631-6ffb-40de-815e-b931e10718c0": {
    "name": "Individual Single-Cell RNA-seq PBMC Data from Wilk et al.",
    "url": "https://cellxgene.cziscience.com/e/055ca631-6ffb-40de-815e-b931e10718c0.cxg/"
  },
  "ae5341b8-60fb-4fac-86db-86e49ee66287": {
    "name": "Individual Single-Cell RNA-seq PBMC Data from Guo et al.",
    "url": "https://cellxgene.cziscience.com/e/ae5341b8-60fb-4fac-86db-86e49ee66287.cxg/"
  },
  "5e717147-0f75-4de1-8bd2-6fda01b8d75f": {
    "name": "Individual Single-Cell RNA-seq PBMC Data from Schulte-Schrepping et al.",
    "url": "https://cellxgene.cziscience.com/e/5e717147-0f75-4de1-8bd2-6fda01b8d75f.cxg/"
  },
  
  # COVID # Cells: about 600,000 # covers more than 600,000 PBMCs from 54 COVID-19 patients and 26 non-COVID-19 controls
  "01ad3cd7-3929-4654-84c0-6db05bd5fd59": {
    "name": "Type I interferon autoantibodies are associated with systemic immune alterations in patients with COVID-19",
    "url": "https://cellxgene.cziscience.com/e/01ad3cd7-3929-4654-84c0-6db05bd5fd59.cxg/"
  },
  "ed5d841d-6346-47d4-ab2f-7119ad7e3a35": {
    "name": "nygc multimodal pbmc",
    "url": "https://cellxgene.cziscience.com/e/ed5d841d-6346-47d4-ab2f-7119ad7e3a35.cxg/"
  },
  # COVID # Cells: about 640,000 # this study performed single-cell transcriptome, surface proteome, and T and B cell antigen receptor analysis of more than 780,000 PBMCs from a cross-sectional cohort of 130 patients with varying COVID-19 severity
  "c7775e88-49bf-4ba2-a03b-93f00447c958": {
    "name": "Single-cell multi-omics analysis of the immune response in COVID-19",
    "url": "https://cellxgene.cziscience.com/e/c7775e88-49bf-4ba2-a03b-93f00447c958.cxg/"
  },
  # COVID # Cells: about 240,000 # Cells: about 120,000
  "30cd5311-6c09-46c9-94f1-71fe4b91813c": {
    "name": "Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19: Innate Cells",
    "url": "https://cellxgene.cziscience.com/e/30cd5311-6c09-46c9-94f1-71fe4b91813c.cxg/"
  },
  "c874f155-9bf9-4928-b821-f52c876b3e48": {
    "name": "49 years old male - Fresh PBMCs (1 day post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/c874f155-9bf9-4928-b821-f52c876b3e48.cxg/"
  },
  "8a554710-08bc-4005-87cd-da9675bdc2e7": {
    "name": "82 years old female - Fresh PBMCs (1 day post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/8a554710-08bc-4005-87cd-da9675bdc2e7.cxg/"
  },
  "881fe679-c6e0-45a3-9427-c4e81be6921f": {
    "name": "66 years old female - Fresh PBMCs (2 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/881fe679-c6e0-45a3-9427-c4e81be6921f.cxg/"
  },
  "eeacb0c1-2217-4cf6-b8ce-1f0fedf1b569": {
    "name": "49 years old male - Fresh PBMCs (3 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/eeacb0c1-2217-4cf6-b8ce-1f0fedf1b569.cxg/"
  },
  "ed9e9f96-4f08-49d2-bef5-b2c29adf3edc": {
    "name": "66 years old female - Fresh PBMCs (4 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/ed9e9f96-4f08-49d2-bef5-b2c29adf3edc.cxg/"
  },
  "01c93cf6-b695-4e30-a26e-121ae8b16a9e": {
    "name": "66 years old female - Fresh PBMCs (7 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/01c93cf6-b695-4e30-a26e-121ae8b16a9e.cxg/"
  },
  "db59611b-42de-4035-93aa-1ed39f38b467": {
    "name": "49 years old male - Fresh PBMCs (2 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/db59611b-42de-4035-93aa-1ed39f38b467.cxg/"
  },
  "ea786a06-5855-48b7-80d7-0313a21a2044": {
    "name": "66 years old female - Fresh PBMCs (3 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/ea786a06-5855-48b7-80d7-0313a21a2044.cxg/"
  },
  "84230ea4-998d-4aa8-8456-81dd54ce23af": {
    "name": "74 years old female - Fresh PBMCs (3 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/84230ea4-998d-4aa8-8456-81dd54ce23af.cxg/"
  },
  "50eb1e23-b8d4-4f76-a184-44e5541fa05a": {
    "name": "74 years old female - Fresh PBMCs (8 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/50eb1e23-b8d4-4f76-a184-44e5541fa05a.cxg/"
  },
  "79ef1959-a6b4-4cac-82ca-30feaec48df1": {
    "name": "74 years old female - Fresh PBMCs (7 days post-intubation)",
    "url": "https://cellxgene.cziscience.com/e/79ef1959-a6b4-4cac-82ca-30feaec48df1.cxg/"
  },
  
  # COVID # Cells: about 1,460,000
  "9dbab10c-118d-496b-966a-67f1763a6b7d": {
    "name": "Large-scale single-cell analysis reveals critical immune characteristics of COVID-19 patients",
    "url": "https://cellxgene.cziscience.com/e/9dbab10c-118d-496b-966a-67f1763a6b7d.cxg/"
  },
  "krasnow_lab_human_lung_cell_atlas_10x-1-remixed": {
    "name": "Krasnow Lab Human Lung Cell Atlas, 10X",
    "url": "https://cellxgene.cziscience.com/e/krasnow_lab_human_lung_cell_atlas_10x-1-remixed.cxg/"
  },
  "krasnow_lab_human_lung_cell_atlas_smartseq2-2-remixed": {
    "name": "Krasnow Lab Human Lung Cell Atlas, Smart-seq2",
    "url": "https://cellxgene.cziscience.com/e/krasnow_lab_human_lung_cell_atlas_smartseq2-2-remixed.cxg/"
  },
  "c2a461b1-0c15-4047-9fcb-1f966fe55100": {
    "name": "Autoimmunity PBMCs",
    "url": "https://cellxgene.cziscience.com/e/c2a461b1-0c15-4047-9fcb-1f966fe55100.cxg/"
  },
  "fa8605cf-f27e-44af-ac2a-476bee4410d3": {
    "name": "PBMCs",
    "url": "https://cellxgene.cziscience.com/e/fa8605cf-f27e-44af-ac2a-476bee4410d3.cxg/"
  },
  "Single_cell_atlas_of_peripheral_immune_response_to_SARS_CoV_2_infection": {
    "name": "Single-cell atlas of peripheral immune response to SARS-CoV-2 infection",
    "url": "https://cellxgene.cziscience.com/e/Single_cell_atlas_of_peripheral_immune_response_to_SARS_CoV_2_infection.cxg/"
  },
  "human_cell_landscape": {
    "name": "Construction of a human cell landscape at single-cell level",
    "url": "https://cellxgene.cziscience.com/e/human_cell_landscape.cxg/"
  },
  "01209dce-3575-4bed-b1df-129f57fbc031": {
    "name": "Single-cell transcriptomics of human T cells reveals tissue and activation signatures in health and disease",
    "url": "https://cellxgene.cziscience.com/e/01209dce-3575-4bed-b1df-129f57fbc031.cxg/"
  },
  "5bc42b88-bb76-4954-927b-8bb7369adc64": {
    "name": "Pregnant Uterus (All)",
    "url": "https://cellxgene.cziscience.com/e/5bc42b88-bb76-4954-927b-8bb7369adc64.cxg/"
  },
  # COVID # Cells: about 59,000
  "de2c780c-1747-40bd-9ccf-9588ec186cee": {
    "name": "Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19",
    "url": "https://cellxgene.cziscience.com/e/de2c780c-1747-40bd-9ccf-9588ec186cee.cxg/"
  }
}
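
The listing above carries `#` annotations that were added for this post, so it is not valid JSON as shown. The sketch below is one way to work with such an annotated copy: it drops whole-line `#` comments, parses the rest, and keeps entries whose name mentions COVID. The function names and the two-entry sample are illustrative, and the unannotated metainfo.json from the scGPT repository can be passed to `json.loads` directly.

```python
import json


def load_annotated_json(text):
    """Parse JSON after dropping whole-line '#' annotations (not valid JSON)."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))


def covid_datasets(meta):
    """Return {dataset_id: name} for entries whose name mentions COVID."""
    return {k: v["name"] for k, v in meta.items() if "covid" in v["name"].lower()}


# Two entries copied from the listing above, with one annotation line.
sample = """
{
  # COVID # COMBAT dataset
  "ebc2e1ff-c8f9-466a-acf4-9d291afaf8b3": {
    "name": "COMBAT project: single cell gene expression data from COVID-19, sepsis and flu patient PBMCs",
    "url": "https://cellxgene.cziscience.com/e/ebc2e1ff-c8f9-466a-acf4-9d291afaf8b3.cxg/"
  },
  "2a498ace-872a-4935-984b-1afa70fd9886": {
    "name": "PBMC",
    "url": "https://cellxgene.cziscience.com/e/2a498ace-872a-4935-984b-1afa70fd9886.cxg/"
  }
}
"""

meta = load_annotated_json(sample)
print(covid_datasets(meta))
```

A name-based filter is only a rough first pass: an entry such as "Single-cell atlas of peripheral immune response to SARS-CoV-2 infection" does not contain the string "COVID" and would be missed.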

scGPT paper

https://www.biorxiv.org/content/10.1101/2023.04.30.538439v2.full

 

scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI
