[Hierarchical MIL] Exploratory Data Analysis (Summary)
Paper
Incorporating Hierarchical Information into Multiple Instance Learning for Patient Phenotype Prediction with scRNA-seq Data
https://www.biorxiv.org/content/10.1101/2025.02.10.637389v1.full.pdf
Paper summary
[Hierarchical MIL] Incorporating Hierarchical Information into Multiple Instance Learning for Patient Phenotype Prediction with
doraemin.tistory.com
GitHub
https://github.com/minhchaudo/hier-mil
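DATA (3 datasets)
1. Cardio: multi-class classification on cardiomyopathy patient data (DCM, HCM, normal; three classes)
2. COVID: binary classification of COVID-19 infection status
3. ICB: binary classification of response to immune checkpoint blockade (immunotherapy)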
COVID dataset
https://singlecell.broadinstitute.org/single_cell/study/SCP1289/impaired-local-intrinsic-immunity-to-sars-cov-2-infection-in-severe-covid-19
Impaired local intrinsic immunity to SARS-CoV-2 infection in severe COVID-19 - Single Cell Portal
Columns: 27
Total samples (cells): 32,588
Total patients (donor_id): 58
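These three numbers already describe the multiple-instance structure: each patient (donor_id) is a bag and its cells are the instances, so the 32,588 cells form 58 bags of about 562 cells on average (32,588 / 58 ≈ 561.9). Bag sizes can differ a lot between patients, so the full distribution is more informative than the mean. A minimal pandas sketch; bag_sizes is a hypothetical helper and the toy rows are invented:

```python
import pandas as pd


def bag_sizes(meta: pd.DataFrame, patient_col: str = "donor_id") -> pd.Series:
    """Number of cells (instances) per patient (bag), largest bag first."""
    return meta[patient_col].value_counts()


# Toy metadata (invented): 3 donors contributing 4, 2 and 1 cells
toy = pd.DataFrame({"donor_id": ["P1", "P1", "P1", "P1", "P2", "P2", "P3"]})
sizes = bag_sizes(toy)
print(sizes.to_dict())  # {'P1': 4, 'P2': 2, 'P3': 1}
print(sizes.mean())     # average number of cells per patient
```

On the real metadata, bag_sizes(df_data) would return 58 entries that sum to 32,588.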
- SARSCoV2_PCR_Status
- pos 18073
- neg 14515
- disease__ontology_label
- COVID-19 18073
- normal 8874
- respiratory failure 3335
- long COVID-19 2306
- Coarse_Cell_Annotations (18 types)
- Ciliated Cells 10059
- Squamous Cells 5250
- Developing Ciliated Cells 3854
- Secretory Cells 3633
- Goblet Cells 2807
- Basal Cells 1691
- T Cells 1475
- Erythroblasts 986
- Macrophages 903
- Deuterosomal Cells 583
- Developing Secretory and Goblet Cells 406
- Ionocytes 399
- Mitotic Basal Cells 266
- Dendritic Cells 142
- B Cells 71
- Enteroendocrine Cells 41
- Plasmacytoid DCs 13
- Mast Cells 9
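Like the label counts, these cell-type counts are pooled over all 58 donors and are very imbalanced (10,059 ciliated cells against 9 mast cells). Pooled counts hide how composition differs between patients; a per-patient table of cell-type fractions shows it directly. A sketch with pd.crosstab, where normalize="index" makes each patient's row sum to 1; the helper name and toy rows are invented:

```python
import pandas as pd


def celltype_composition(meta: pd.DataFrame,
                         patient_col: str = "donor_id",
                         celltype_col: str = "Coarse_Cell_Annotations") -> pd.DataFrame:
    """Rows: patients; columns: cell types; values: fraction of that patient's cells."""
    return pd.crosstab(meta[patient_col], meta[celltype_col], normalize="index")


# Toy metadata (invented): P1 has a mixed composition, P2 only squamous cells
toy = pd.DataFrame({
    "donor_id": ["P1", "P1", "P1", "P1", "P2", "P2"],
    "Coarse_Cell_Annotations": ["Ciliated Cells", "Ciliated Cells",
                                "T Cells", "Squamous Cells",
                                "Squamous Cells", "Squamous Cells"],
})
comp = celltype_composition(toy)
print(comp.round(2))
```

With the real metadata the result would have 58 rows and one column per Coarse_Cell_Annotations category.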
MetaData.txt
import pandas as pd

# Load the original metadata file
df = pd.read_csv("20210701_NasalSwab_MetaData.txt", sep="\t")

# Separate the type row from the actual data
column_types = df.iloc[0]  # row 0 holds the column types (group / numeric)
df_data = df.iloc[1:].copy()  # the real data starts at row 1
df_data.reset_index(drop=True, inplace=True)
print(f" 🔍 Example data: {df_data.head(3).to_dict()}")

# Convert each column to its proper type (numeric columns become floats)
for col in column_types[column_types == "numeric"].index:
    df_data[col] = pd.to_numeric(df_data[col], errors="coerce")  # values that cannot be parsed become NaN

print("Total cells:", len(df_data), "\n")
print(f"Unique patients (donors): {df_data['donor_id'].nunique()}\n")
print("Unique biosamples:", df_data["biosample_id"].nunique(), "\n")

# Check the label distributions (counts are numbers of cells)
print(f"🦠 COVID infection status distribution: {df_data['SARSCoV2_PCR_Status'].value_counts()}\n")
print(f"Disease code: {df_data['disease'].value_counts()}\n")
print(f"Disease name: {df_data['disease__ontology_label'].value_counts()}\n")
print(f"🧪 (Coarse_Cell_Annotations) types: {df_data['Coarse_Cell_Annotations'].nunique()}.\n{df_data['Coarse_Cell_Annotations'].value_counts()}")

missing = df_data.isnull().sum()
print("📉 Columns with missing values:")
print(missing[missing > 0])

print(f"📋 Column names ({len(df_data.columns)} total):")
for col in df_data.columns:
    print(col)
Output
kim89@ailab-System-Product-Name:~/hier-mil/data_original$ python data_MetaData.py
🔍 Example data: {'NAME': {0: 'GTCGGGGGGTGG_Control_Participant7', 1: 'CAAATCAATTAT_Control_Participant7', 2: 'ATACAATTGACA_Control_Participant7'}, 'donor_id': {0: 'Control_Participant7', 1: 'Control_Participant7', 2: 'Control_Participant7'}, 'Peak_Respiratory_Support_WHO_Score': {0: '0', 1: '0', 2: '0'}, 'Bloody_Swab': {0: 'No', 1: 'No', 2: 'No'}, 'Percent_Mitochondrial': {0: '11.00478469', 1: '34.21052632', 2: '7.068223725'}, 'SARSCoV2_PCR_Status': {0: 'neg', 1: 'neg', 2: 'neg'}, 'SARSCoV2_PCR_Status_and_WHO_Score': {0: 'neg_0', 1: 'neg_0', 2: 'neg_0'}, 'Cohort_Disease_WHO_Score': {0: 'Control_WHO_0', 1: 'Control_WHO_0', 2: 'Control_WHO_0'}, 'biosample_id': {0: 'WHO_0_Control_Participant7', 1: 'WHO_0_Control_Participant7', 2: 'WHO_0_Control_Participant7'}, 'SingleCell_SARSCoV2_RNA_Status': {0: 'neg', 1: 'neg', 2: 'neg'}, 'SARSCoV2_Unspliced_TRS_Total_Corrected': {0: '0', 1: '0', 2: '0'}, 'SARSCoV2_Spliced_TRS_Total_Corrected': {0: '0', 1: '0', 2: '0'}, 'SARSCoV2_NegativeStrand_Total_Corrected': {0: '0', 1: '0', 2: '0'}, 'SARSCoV2_PositiveStrand_Total_Corrected': {0: '0', 1: '0', 2: '0'}, 'SARSCoV2_Total_Corrected': {0: '0', 1: '0', 2: '0'}, 'species': {0: 'NCBITaxon_9606', 1: 'NCBITaxon_9606', 2: 'NCBITaxon_9606'}, 'species__ontology_label': {0: 'Homo sapiens', 1: 'Homo sapiens', 2: 'Homo sapiens'}, 'sex': {0: 'male', 1: 'male', 2: 'male'}, 'disease': {0: 'PATO_0000461', 1: 'PATO_0000461', 2: 'PATO_0000461'}, 'disease__ontology_label': {0: 'normal', 1: 'normal', 2: 'normal'}, 'organ': {0: 'UBERON_0001728', 1: 'UBERON_0001728', 2: 'UBERON_0001728'}, 'organ__ontology_label': {0: 'nasopharynx', 1: 'nasopharynx', 2: 'nasopharynx'}, 'library_preparation_protocol': {0: 'EFO_0008919', 1: 'EFO_0008919', 2: 'EFO_0008919'}, 'library_preparation_protocol__ontology_label': {0: 'Seq-Well', 1: 'Seq-Well', 2: 'Seq-Well'}, 'age': {0: '50-59', 1: '50-59', 2: '50-59'}, 'Coarse_Cell_Annotations': {0: 'Developing Ciliated Cells', 1: 'Developing Ciliated Cells', 2: 'Developing Ciliated Cells'}, 'Detailed_Cell_Annotations': {0: 'Developing Ciliated Cells', 1: 'Developing Ciliated Cells', 2: 'Developing Ciliated Cells'}}
Total cells: 32588
Unique patients (donors): 58
Unique biosamples: 58
🦠 COVID infection status distribution: SARSCoV2_PCR_Status
pos 18073
neg 14515
Name: count, dtype: int64
Disease code: disease
MONDO_0100096 18073
PATO_0000461 8874
MONDO_0021113 3335
MONDO_0100233 2306
Name: count, dtype: int64
Disease name: disease__ontology_label
COVID-19 18073
normal 8874
respiratory failure 3335
long COVID-19 2306
Name: count, dtype: int64
🧪 (Coarse_Cell_Annotations) types: 18.
Coarse_Cell_Annotations
Ciliated Cells 10059
Squamous Cells 5250
Developing Ciliated Cells 3854
Secretory Cells 3633
Goblet Cells 2807
Basal Cells 1691
T Cells 1475
Erythroblasts 986
Macrophages 903
Deuterosomal Cells 583
Developing Secretory and Goblet Cells 406
Ionocytes 399
Mitotic Basal Cells 266
Dendritic Cells 142
B Cells 71
Enteroendocrine Cells 41
Plasmacytoid DCs 13
Mast Cells 9
Name: count, dtype: int64
📉 Columns with missing values:
Series([], dtype: int64)
📋 Column names (27 total):
NAME
donor_id
Peak_Respiratory_Support_WHO_Score
Bloody_Swab
Percent_Mitochondrial
SARSCoV2_PCR_Status
SARSCoV2_PCR_Status_and_WHO_Score
Cohort_Disease_WHO_Score
biosample_id
SingleCell_SARSCoV2_RNA_Status
SARSCoV2_Unspliced_TRS_Total_Corrected
SARSCoV2_Spliced_TRS_Total_Corrected
SARSCoV2_NegativeStrand_Total_Corrected
SARSCoV2_PositiveStrand_Total_Corrected
SARSCoV2_Total_Corrected
species
species__ontology_label
sex
disease
disease__ontology_label
organ
organ__ontology_label
library_preparation_protocol
library_preparation_protocol__ontology_label
age
Coarse_Cell_Annotations
Detailed_Cell_Annotations
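The output shows 58 donors and 58 biosamples. Equal counts are consistent with, but do not prove, a one-to-one mapping between donor_id and biosample_id. Also, every count above is a number of cells, not patients; for patient-level prediction each donor needs exactly one label. (The disease counts are at least consistent with the PCR status: COVID-19 has 18,073 cells, like pos, and normal + respiratory failure + long COVID-19 = 8,874 + 3,335 + 2,306 = 14,515, like neg.) A pandas sketch of both checks; the helper names and toy rows are invented for illustration:

```python
import pandas as pd


def is_one_to_one(meta: pd.DataFrame, col_a: str, col_b: str) -> bool:
    """True if each value of col_a maps to exactly one value of col_b and vice versa."""
    a_to_b = meta.groupby(col_a)[col_b].nunique()
    b_to_a = meta.groupby(col_b)[col_a].nunique()
    return bool((a_to_b == 1).all() and (b_to_a == 1).all())


def patient_labels(meta: pd.DataFrame, patient_col: str, label_col: str) -> pd.Series:
    """Collapse cell-level labels to one label per patient.

    Raises ValueError if a patient's cells carry more than one distinct label,
    since a bag (patient) label must be constant across its cells.
    """
    n_labels = meta.groupby(patient_col)[label_col].nunique()
    mixed = n_labels[n_labels > 1]
    if not mixed.empty:
        raise ValueError(f"patients with mixed labels: {list(mixed.index)}")
    return meta.groupby(patient_col)[label_col].first()


# Toy cell-level metadata (invented): 7 cells from 3 donors
toy = pd.DataFrame({
    "donor_id": ["D1", "D1", "D2", "D2", "D2", "D3", "D3"],
    "biosample_id": ["S1", "S1", "S2", "S2", "S2", "S3", "S3"],
    "SARSCoV2_PCR_Status": ["pos", "pos", "neg", "neg", "neg", "pos", "pos"],
})
print(is_one_to_one(toy, "donor_id", "biosample_id"))  # True
labels = patient_labels(toy, "donor_id", "SARSCoV2_PCR_Status")
print(toy["SARSCoV2_PCR_Status"].value_counts().to_dict())  # cells per class: {'pos': 4, 'neg': 3}
print(labels.value_counts().to_dict())  # patients per class: {'pos': 2, 'neg': 1}
```

On the loaded metadata the calls would be is_one_to_one(df_data, "donor_id", "biosample_id") and patient_labels(df_data, "donor_id", "SARSCoV2_PCR_Status").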
Preprocessed icb.h5ad dataset
- AnnData object with n_obs × n_vars = 9292 × 824
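- Total cells: 9,292 → .obs is a (9292, 197) DataFrame
- Genes: 824 → the gene names form the (824,) index of .var
- Per-cell information (metadata) is stored in .obs: a (9292, 197) DataFrame
- Per-gene information is stored in .var; here .var has no columns, only the 824 gene names as its index, so its shape is (824, 0)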
import pandas as pd
import scanpy as sc

adata = sc.read_h5ad("icb.h5ad")
print(adata)  # summary of the whole object
print(adata.X.shape)  # (9292, 824)

# Expression matrix as a DataFrame (cells x genes).
# Assumes adata.X is a dense array; if it is sparse, use adata.to_df() instead.
df_X = pd.DataFrame(adata.X, columns=adata.var_names)
print(df_X.head())

print("==== columns: 824 genes ====")
print(adata.var.shape)  # (824, 0) → gene names only, no per-gene annotation columns
print(adata.var.index)  # all 824 gene names (pandas truncates the display), e.g. 'HAVCR2', 'CTLA4', 'PDCD1', ...
print(adata.var.head())  # first 5 genes; an empty DataFrame because .var has no columns

print("==== rows: 9292 cells × 197 metadata fields ====")
print(adata.obs.shape)  # (9292, 197) → metadata for each cell
for start in range(0, adata.obs.shape[1], 50):  # metadata column names, 50 at a time
    print(adata.obs.columns[start:start + 50])
print(adata.obs.head())  # first 5 rows (cells)
Output
(venv) kim89@ailab-System-Product-Name:~/hier-mil$ python data/icb/icb_h5ad_analysis.py
AnnData object with n_obs × n_vars = 9292 × 824
obs: 'sample_id', 'cell_id', 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'biosample_id', 'species', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'sex', 'cell.type', 'flow', 'X', 'Y', 'Gender', 'Primary Location', 'Immunotherapy #1', 'Immunotherapy #2', 'Immunotherapy #3', 'Immunotherapy #4', 'Targeted Therapy (dates)', 'CNS metastasic sites', 'Systemic sites of metastasis', 'SNaPshot Mutations', 'Location of surgery #1', 'Surgery #1 Single-cell ID', 'Location of surgery #2', 'Surgery #2 Single-cell ID', 'Location of surgery #3', 'Surgery #3 Single-cell ID', 'Location of surgery #4', 'Surgery #4 Single-cell ID', 'Pre/post ICI', 'outcome', 'Presence of necrosis on H&E', 'Var.41', '.1', '.2', '.3', '.4', '.5', '.6', 'donor_id_prepost', 'donor_id_responder', 'enough_cells', 'donor_id_prepost_responder', 'pre_post', 'Study_name', 'Cancer_type', 'Primary_or_met', 'sample_id_pre_post', 'total_cell_per_patient', 'cell_type_for_count', 'total_T_Cell', 'normalized_CD8_totalcells', 'RNA_snn_res.0.8', 'seurat_clusters', 'treatment', 'sort', 'cluster', 'UMAP1', 'UMAP2', 'Tumor.Type', 'Treatment', 'Ongoing.Vismodegib.treatment', 'Prior.treatment', 'Response', 'Best...change', 'scRNA.pre.site', 'scRNA.days.pre.treatment', 'scRNA.post.site', 'scRNA.days.post.treatment', 'Adaptive.pre.site', 'Adaptive.days.pre.treatment', 'Adaptive.post.site', 'Adaptive.days.post.treatment', 'PBMC.Adaptive.days.pre.treatment', 'PBMC.Adaptive.days.post.treatment', 'Exome.pre.site', 'Exome.days.pre.treatment', 'Exome.post.site', 'Exome.days.post.treatment', 'epi', 'sample_id_outcome', 'cell_type_for_count.x', 'cell_type_for_count.y', 'total_T_Cell_only', 'normalized_CD8_actual_totalcells', 'cell.types', 'treatment.group', 'Cohort', 'no.of.genes', 'no.of.reads', 'NAME', 'LABELS', 'tumor', 'immune_outcome', 'Immune_resistance.up', 'Immune_resistance.down', 
'OE.Immune_resistance', 'OE.Immune_resistance.up', 'OE.Immune_resistance.down', 'no.genes', 'log.no.reads', 'technology', 'n_cells', 'patient', 'age', 'smoking_status', 'PY', 'diagnosis_recurrence', 'disease_extent', 'AJCC_T', 'AJCC_N', 'AJCC_M', 'AJCC_stage', 'sample_primary_met', 'size', 'site', 'histology', 'genetic_hormonal_features', 'grade', 'KI67', 'chemotherapy_exposed', 'chemotherapy_response', 'targeted_rx_exposed', 'targeted_rx_response', 'ICB_exposed', 'ICB_response', 'ET_exposed', 'ET_response', 'time_end_of_rx_to_sampling', 'post_sampling_rx_exposed', 'post_sampling_rx_response', 'PFS_DFS', 'OS', 'total_T.CD8', 'timepoint', 'cellType', 'cohort', 'treatment_info', 'Cancer_type_pre_post', 'sample', 'id', 'Type', 'No.a', 'Sex', 'Age..Years.b', 'Race', 'Diagnosis', 'Stage', 'Etiology', 'Biopsy.Timingc', 'Treatmentd', 'Mode.of.Actione', 'set', 'Sample', 'Source', 'Stage.y', 'Mode.of.Actione_2', 'sample_id_Mode.of.Actione_2', 'ICB_Exposed', 'ICB_Response', 'TKI_Exposed', 'Initial_Louvain_Cluster', 'Lineage', 'InferCNV', 'FinalCellType', 'sex.x', 'cancer_type', 'sex.y', 'treated_naive', 'Cancer_type_update', 'Outcome', 'Combined_outcome', 'Malignant_clusters', 'patient_ID', 'pre_post_outcome', 'percent.mito', 'percent.ribo', 'pANN_0.25_0.21_50', 'DoubletFinder', 'pANN_0.25_0.21_642', 'pANN_0.25_0.21_61', 'pANN_0.25_0.21_7', 'pANN_0.25_0.21_18', 'pANN_0.25_0.21_94', 'pANN_0.25_0.21_6', 'pANN_0.25_0.21_35', 'Study_name_cancer', 'label', 'cell_type_annotation'
(9292, 824)
HAVCR2 CTLA4 PDCD1 IDO1 CXCL10 CXCL9 HLA-DRA STAT1 IFNG CD3E GZMK CD2 CXCL13 IL2RG ... WT1 TET2 ZRSR2 PTPN11 EZH2 TP53 CALR STAG2 CEBPA CUX1 U2AF1 EP300 PHF6 KRAS
0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.000000 3.111702 3.111702 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 2.892357 0.000000 2.892357 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 2.892357 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 3.905236 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[5 rows x 824 columns]
==== columns: 824 genes ====
(824, 0)
Index(['HAVCR2', 'CTLA4', 'PDCD1', 'IDO1', 'CXCL10', 'CXCL9', 'HLA-DRA',
'STAT1', 'IFNG', 'CD3E',
...
'EZH2', 'TP53', 'CALR', 'STAG2', 'CEBPA', 'CUX1', 'U2AF1', 'EP300',
'PHF6', 'KRAS'],
dtype='object', length=824)
Empty DataFrame
Columns: []
Index: [HAVCR2, CTLA4, PDCD1, IDO1, CXCL10]
==== rows: 9292 cells × 197 metadata fields ====
(9292, 197)
Index(['sample_id', 'cell_id', 'orig.ident', 'nCount_RNA', 'nFeature_RNA',
'biosample_id', 'species', 'species__ontology_label', 'disease',
'disease__ontology_label', 'organ', 'organ__ontology_label',
'library_preparation_protocol',
'library_preparation_protocol__ontology_label', 'sex', 'cell.type',
'flow', 'X', 'Y', 'Gender', 'Primary Location', 'Immunotherapy #1',
'Immunotherapy #2', 'Immunotherapy #3', 'Immunotherapy #4',
'Targeted Therapy (dates)', 'CNS metastasic sites',
'Systemic sites of metastasis', 'SNaPshot Mutations',
'Location of surgery #1', 'Surgery #1 Single-cell ID',
'Location of surgery #2', 'Surgery #2 Single-cell ID',
'Location of surgery #3', 'Surgery #3 Single-cell ID',
'Location of surgery #4', 'Surgery #4 Single-cell ID', 'Pre/post ICI',
'outcome', 'Presence of necrosis on H&E', 'Var.41', '.1', '.2', '.3',
'.4', '.5', '.6', 'donor_id_prepost', 'donor_id_responder',
'enough_cells'],
dtype='object')
Index(['donor_id_prepost_responder', 'pre_post', 'Study_name', 'Cancer_type',
'Primary_or_met', 'sample_id_pre_post', 'total_cell_per_patient',
'cell_type_for_count', 'total_T_Cell', 'normalized_CD8_totalcells',
'RNA_snn_res.0.8', 'seurat_clusters', 'treatment', 'sort', 'cluster',
'UMAP1', 'UMAP2', 'Tumor.Type', 'Treatment',
'Ongoing.Vismodegib.treatment', 'Prior.treatment', 'Response',
'Best...change', 'scRNA.pre.site', 'scRNA.days.pre.treatment',
'scRNA.post.site', 'scRNA.days.post.treatment', 'Adaptive.pre.site',
'Adaptive.days.pre.treatment', 'Adaptive.post.site',
'Adaptive.days.post.treatment', 'PBMC.Adaptive.days.pre.treatment',
'PBMC.Adaptive.days.post.treatment', 'Exome.pre.site',
'Exome.days.pre.treatment', 'Exome.post.site',
'Exome.days.post.treatment', 'epi', 'sample_id_outcome',
'cell_type_for_count.x', 'cell_type_for_count.y', 'total_T_Cell_only',
'normalized_CD8_actual_totalcells', 'cell.types', 'treatment.group',
'Cohort', 'no.of.genes', 'no.of.reads', 'NAME', 'LABELS'],
dtype='object')
Index(['tumor', 'immune_outcome', 'Immune_resistance.up',
'Immune_resistance.down', 'OE.Immune_resistance',
'OE.Immune_resistance.up', 'OE.Immune_resistance.down', 'no.genes',
'log.no.reads', 'technology', 'n_cells', 'patient', 'age',
'smoking_status', 'PY', 'diagnosis_recurrence', 'disease_extent',
'AJCC_T', 'AJCC_N', 'AJCC_M', 'AJCC_stage', 'sample_primary_met',
'size', 'site', 'histology', 'genetic_hormonal_features', 'grade',
'KI67', 'chemotherapy_exposed', 'chemotherapy_response',
'targeted_rx_exposed', 'targeted_rx_response', 'ICB_exposed',
'ICB_response', 'ET_exposed', 'ET_response',
'time_end_of_rx_to_sampling', 'post_sampling_rx_exposed',
'post_sampling_rx_response', 'PFS_DFS', 'OS', 'total_T.CD8',
'timepoint', 'cellType', 'cohort', 'treatment_info',
'Cancer_type_pre_post', 'sample', 'id', 'Type'],
dtype='object')
Index(['No.a', 'Sex', 'Age..Years.b', 'Race', 'Diagnosis', 'Stage', 'Etiology',
'Biopsy.Timingc', 'Treatmentd', 'Mode.of.Actione', 'set', 'Sample',
'Source', 'Stage.y', 'Mode.of.Actione_2', 'sample_id_Mode.of.Actione_2',
'ICB_Exposed', 'ICB_Response', 'TKI_Exposed', 'Initial_Louvain_Cluster',
'Lineage', 'InferCNV', 'FinalCellType', 'sex.x', 'cancer_type', 'sex.y',
'treated_naive', 'Cancer_type_update', 'Outcome', 'Combined_outcome',
'Malignant_clusters', 'patient_ID', 'pre_post_outcome', 'percent.mito',
'percent.ribo', 'pANN_0.25_0.21_50', 'DoubletFinder',
'pANN_0.25_0.21_642', 'pANN_0.25_0.21_61', 'pANN_0.25_0.21_7',
'pANN_0.25_0.21_18', 'pANN_0.25_0.21_94', 'pANN_0.25_0.21_6',
'pANN_0.25_0.21_35', 'Study_name_cancer', 'label',
'cell_type_annotation'],
dtype='object')
sample_id cell_id orig.ident nCount_RNA ... pANN_0.25_0.21_35 Study_name_cancer label cell_type_annotation
Row.names ...
Breast_previous_Breast_BIOKEY_10_Pre_AAAGCAAAGC... BIOKEY_10 BIOKEY_10_Pre_AAAGCAAAGCGTCTAT-1 BIOKEY 407 ... NaN Bassez:TNBC 1 Mesangial cells
Breast_previous_Breast_BIOKEY_10_Pre_AAATGCCGTT... BIOKEY_10 BIOKEY_10_Pre_AAATGCCGTTAGGGTG-1 BIOKEY 411 ... NaN Bassez:TNBC 1 HSC
Breast_previous_Breast_BIOKEY_10_Pre_AACTCTTGTA... BIOKEY_10 BIOKEY_10_Pre_AACTCTTGTAACGTTC-1 BIOKEY 466 ... NaN Bassez:TNBC 1 Mesangial cells
Breast_previous_Breast_BIOKEY_10_Pre_AACTGGTAGT... BIOKEY_10 BIOKEY_10_Pre_AACTGGTAGTACATGA-1 BIOKEY 587 ... NaN Bassez:TNBC 1 B-cells
Breast_previous_Breast_BIOKEY_10_Pre_AACTTTCAGG... BIOKEY_10 BIOKEY_10_Pre_AACTTTCAGGATCGCA-1 BIOKEY 411 ... NaN Bassez:TNBC 1 Adipocytes
[5 rows x 197 columns]
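In the ICB data each cell carries identifiers such as sample_id (e.g. BIOKEY_10) and a label column (1 in the rows shown). A sketch of turning the cells × genes matrix into per-bag matrices for MIL, assuming (not confirmed by this output) that sample_id is the bag identifier and label is the bag-level target, and that adata.X is dense, as it was for pd.DataFrame(adata.X) in the code above. make_bags is a hypothetical helper and the toy matrix is random:

```python
import numpy as np
import pandas as pd


def make_bags(X: np.ndarray, obs: pd.DataFrame,
              bag_col: str = "sample_id", label_col: str = "label") -> dict:
    """Split a cells x genes matrix into one matrix per bag.

    Returns {bag id: (cells_in_bag x genes array, bag label)}.
    Assumes row i of X describes the cell in row i of obs, and that the
    label is constant within each bag (raises ValueError otherwise).
    """
    if X.shape[0] != len(obs):
        raise ValueError("X and obs must have the same number of rows")
    bags = {}
    # .indices maps each bag id to the integer row positions of its cells
    for bag_id, rows in obs.groupby(bag_col).indices.items():
        bag_labels = obs[label_col].iloc[rows].unique()
        if len(bag_labels) != 1:
            raise ValueError(f"bag {bag_id!r} has mixed labels: {list(bag_labels)}")
        bags[bag_id] = (X[rows], bag_labels[0])
    return bags


# Toy data (random values): 5 cells x 3 genes from two bags
rng = np.random.default_rng(0)
X_toy = rng.random((5, 3))
obs_toy = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2", "S2"],
    "label": [1, 1, 0, 0, 0],
})
bags = make_bags(X_toy, obs_toy)
for bag_id, (cells, y) in bags.items():
    print(bag_id, cells.shape, y)
```

On the real object this would be make_bags(adata.X, adata.obs), passing bag_col and label_col explicitly if another column (the obs also has patient and patient_ID) turns out to be the right bag identifier.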