An aberrant developmental pathway during breast cancer metastasis

  • March 30, 2020
  • Notes

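When your talent cannot yet carry your ambition, settle down, stay grounded, and improve steadily along with us. Almost two years have passed since we started curating knowledge in the single-cell transcriptomics field, and through this Literature Express column we have been lucky to gather a group of companions to move forward and grow together.

The Literature Express column broadens your knowledge through short introductions. Follow it every day, and we hope you get something out of it too!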

Article information

Today's article is a bioRxiv preprint. It summarises the developmental mechanisms that go wrong during carcinogenesis in a PABC model. The title is: Single-cell RNAseq uncovers involution mimicry as an aberrant development pathway during breast cancer metastasis

Abstract

Using Drop-seq on a pregnancy-associated breast cancer (PABC) transgenic mouse model (Elf5 overexpression), the authors describe the cellular composition and functional diversity of the breast tumours: 1. they infer the lineage of each subpopulation of mammary epithelial cells; 2. they reveal a mechanism of cancer progression in PABC: an aberrant involution process led by alveolar milk-secretory cells and assisted by multiple cell types in the tumour microenvironment (TME); 3. they map the cellular and molecular pathway network in the TME: the involution process involves numerous interactions between cell types and is characterised by ECM remodelling and inflammation.

Background

Workflow

Results

1. General cell population characterization

1.1 Unbiased high-resolution scRNAseq captures cell heterogeneity of MMTV-PyMT mammary tumours

Figure 1.

A) Experimental workflow showing a schematic representation of the transgenic MMTV-PyMT/Elf5 mouse model, the number of tumours analysed and the number of cells passing the QC filter in each genotype. B) Distribution of variable genes defined by expression and dispersion, highlighting typical canonical markers for each of the main lineages. C) Heatmap showing differential expression of the top expressed genes contributing to the epithelial, stromal and immune signature. The top right panel shows the score of the signature and the percentage of cells classified to each main cell lineage analysed. The tSNE visualisation shows the coordinates of each analysed cell after dimensional reduction coloured by its main cell lineage. Roman numerals define each of the spatially formed clusters (inset). The dot plot shows the top differential markers from each of the main cell lineages and their level of expression. Bottom left panel shows a representative contour plot of the cell composition of an MMTV-PyMT tumor analysed by FACS defined by EpCAM antibodies (epithelial cells), CD45 (leukocytes) and double negative cells (stroma). Violin plot shows distribution of the number of genes per cell in each of the main cell lineages. D) Feature tSNE plots showing the expression of typical canonical markers of each of the main cell lineages.

Coarse cell grouping:

  • Marker genes (labelled): average expression vs dispersion
  • Cell type identification: FACS-sorted, hence only 3 types (epithelial, immune, stromal); shows the expression of their marker genes and the proportion of these three components in the tumour
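
The "average expression vs dispersion" view used to highlight marker genes can be illustrated with a small sketch. It assumes dispersion is the variance-to-mean ratio of raw counts, which is a simplification of what single-cell toolkits do; the toy counts and the helper names `mean_and_dispersion` and `top_variable_genes` are hypothetical and not taken from the paper.

```python
from statistics import mean, pvariance


def mean_and_dispersion(counts_by_gene):
    """Return {gene: (mean, dispersion)} with dispersion = variance / mean (assumed definition)."""
    stats = {}
    for gene, counts in counts_by_gene.items():
        mu = mean(counts)
        disp = pvariance(counts) / mu if mu > 0 else 0.0
        stats[gene] = (mu, disp)
    return stats


def top_variable_genes(counts_by_gene, n_top):
    """Rank genes by dispersion, highest first, and keep the first n_top."""
    stats = mean_and_dispersion(counts_by_gene)
    return sorted(stats, key=lambda g: stats[g][1], reverse=True)[:n_top]


# Toy counts for six cells (two epithelial, two immune, two stromal); values are made up.
toy_counts = {
    "Epcam": [9, 8, 0, 0, 0, 0],
    "Ptprc": [0, 0, 7, 9, 0, 0],
    "Col1a1": [0, 0, 0, 0, 10, 12],
    "Actb": [5, 6, 5, 6, 5, 6],
}

variable_genes = top_variable_genes(toy_counts, 3)
```

In this toy input the lineage-restricted genes rank above the uniformly expressed one, which is the behaviour the plot relies on.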

Figure 2.

A) tSNE plot showing cell clusters defined in each of the main cell lineages and their relative frequency. Far right column depicts the main cell lineage of origin for each cluster, showing 6 clusters of epithelial origin, 5 immune and 5 stromal. B) Heatmap showing the top differentially expressed genes that define each of the clusters. C) Cluster tree modelling the phylogenic relationship of the different clusters in each of the main cell types compartments at different clustering resolutions. Dashed red line shows the resolution chosen. Coloured circles in the cluster tree represent the origin of the clusters represented in the tSNE plot shown in panel A, the full cluster tree can be found in SuppFig 4A. D) Visualisation of the top differential genes for each of the defined clusters in the immune lineage (top) and in the stroma (bottom). E) Cell identification using score values for each of the metasignatures of the xCell algorithm in the immune and stromal compartment divided by cluster.

Finer cell grouping:

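  • K-means clustering-based subclusters: 6 epithelial, 5 immune, 5 stromal
  • A deduced phylogenic tree is built between the subclusters
  • Cell type identification of the subclusters: for stromal & immune cells, xCell scoring based on various metasignatures (as above, shows marker gene expression plus the cell components within each subcluster and their proportions)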

2. Epithelial cell population: assign lineages for subpopulations and compare Elf5 OE vs WT models

2.1. PyMT cancer cells are organised in a structure that resembles the mammary gland epithelial hierarchy

Figure 3.

A) tSNE visualisation of the cell groups defined by k-means clustering analysis. Bottom panel shows a gene-expression heatmap of the top expressed genes for each cell cluster. B) Distribution of cells by genotype in the defined tSNE dimensions. The Sankey diagram shows the contribution of each of the genotypes to the cell clusters. Cluster numbers are coloured by the dominant genotype (>2-fold cell content of one genotype), Elf5 (red), WT (green). Violin plots showing Elf5 (upper plot) and PyMT (bottom plot) expression in each cell cluster. C) Scatter plot showing FACS data to define the % alveolar versus luminal progenitors using canonical antibodies that define the epithelial mammary gland hierarchy (EpCAM, CD49f, Sca1 and CD49b), in PyMT tumours. Each dot represents one animal (WT n = 6 and Elf5 n = 5); bottom panels are representative FACS plots of one of the replicates for each genotype. D) Dot plot representing the expression level (red jet) and the number of expressing cells (dot size) of the transcriptional mammary gland epithelium markers in each PyMT cluster. These marker genes were grouped according to each mammary epithelial cell type as defined by Bach, et al.: Hormone sensing differentiated (Hs-d, dark pink), Hormone-sensing progenitor (Hs-p, light pink), Luminal progenitor (LP, orange), Alveolar differentiated (Alv-d, dark red), Alveolar progenitor (Alv-p, light red), Basal (B, light purple), Myoepithelial (Myo, dark purple), Undifferentiated (Multi, light blue). E) Dot plot of the expression level of the top differential marker genes in each of the PyMT clusters coloured by genotype. The yellow rectangles highlight the top genes represented by each cluster. The size of the dots represents the percentage of cells/cluster that express each particular gene (pct. exp) and the colour gradient shows the level of expression for each gene/cluster. 
Note both colours are shown only when the cluster was populated similarly by both genotypes according to panel B.
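
The ">2-fold cell content" rule for colouring clusters by dominant genotype can be written down as a short sketch. It assumes the rule is applied to raw cell counts per cluster (the caption does not say whether counts are normalised by the total number of cells per genotype); the function name `dominant_genotype` and the toy counts are hypothetical.

```python
def dominant_genotype(cells_by_genotype, fold=2.0):
    """Label a cluster by the genotype contributing more than `fold` times the cells of the other."""
    elf5 = cells_by_genotype.get("Elf5", 0)
    wt = cells_by_genotype.get("WT", 0)
    if elf5 > fold * wt:
        return "Elf5"
    if wt > fold * elf5:
        return "WT"
    return "mixed"  # neither genotype exceeds the fold threshold


# Toy cell counts per cluster; values are made up for illustration.
toy_clusters = {
    0: {"Elf5": 120, "WT": 400},
    1: {"Elf5": 500, "WT": 90},
    2: {"Elf5": 210, "WT": 190},
}

cluster_labels = {c: dominant_genotype(n) for c, n in toy_clusters.items()}
```

If the two genotypes contribute very different total cell numbers, dividing each count by its genotype total before applying the rule would be the more cautious variant.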

Analysis of the epithelial cells:

  • Two ways of grouping the cells:
      • de novo clustering, which gives 11 subclusters (unknown biological significance)
      • by genotype: WT vs Elf5 overexpression (OE) PyMT
  • Two aims:
      • Aim A: annotate the identities of these 11 subclusters using literature-derived marker genes
      • Aim B: compare the phenotypic differences between WT and Elf5 OE, building on previous findings about Elf5 OE
  • Aim A – mapping clusters onto the lineage hierarchy:
      • Difference in the alveolar vs luminal proportion between the two genotypes (WT & Elf5 OE), measured by FACS with surface markers => Elf5 forces differentiation of the luminal progenitors (into alveolar cells)
      • Expression of the mammary gland epithelium markers (Bach et al.) within each subcluster; in effect a forward annotation, where de novo clusters are annotated with known signatures => clusters 0, 7 – stem cells; 9 – hormone sensing; 8 – myoepithelial; 2, 3, 4 – luminal progenitor; 1, 6 – luminal, alveolar differentiated; 5, 10 – undefined (mixture of luminal and myoepithelial features)
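
Annotating de novo clusters with known signatures can be sketched as follows. This is a deliberately simple stand-in (mean expression of the signature genes per cluster, then pick the best-scoring signature), not the dot-plot inspection or GSVA used in the paper; the gene names are placeholders rather than the actual Bach et al. markers, and `signature_score` and `annotate_clusters` are hypothetical helpers.

```python
from statistics import mean


def signature_score(cluster_expr, genes):
    """Mean expression of the signature genes that are present in the cluster profile."""
    present = [cluster_expr[g] for g in genes if g in cluster_expr]
    return mean(present) if present else 0.0


def annotate_clusters(avg_expr_by_cluster, signatures):
    """Assign each cluster the signature with the highest score."""
    annotation = {}
    for cluster, expr in avg_expr_by_cluster.items():
        scores = {name: signature_score(expr, genes) for name, genes in signatures.items()}
        annotation[cluster] = max(scores, key=scores.get)
    return annotation


# Placeholder signatures and per-cluster average expression; all values are made up.
toy_signatures = {
    "luminal_progenitor": ["lp_gene_1", "lp_gene_2"],
    "myoepithelial": ["myo_gene_1", "myo_gene_2"],
}
toy_avg_expr = {
    2: {"lp_gene_1": 3.1, "lp_gene_2": 2.4, "myo_gene_1": 0.2, "myo_gene_2": 0.1},
    8: {"lp_gene_1": 0.3, "lp_gene_2": 0.2, "myo_gene_1": 2.8, "myo_gene_2": 3.5},
}

toy_annotation = annotate_clusters(toy_avg_expr, toy_signatures)
```

A cluster whose top two scores are close, such as the mixed luminal/myoepithelial clusters 5 and 10 in the notes, would be better reported as undefined than forced into one label.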

2.2 Dynamic relationship and states of the malignant lineages of PyMT tumours: implications for the cell of origin of cancer

Figure 4.

A) Pseudotiming alignment of the PyMT cancer epithelial cells along the gene signatures that define the main lineages of the mammary gland epithelial hierarchy using the DDRTree method in Monocle2. Right panels show the distribution of the cell states by genotype. B) Projection of the states defined by pseudotime analysis into tSNE clustering coordinates overall and per pseudotime state (miniaturised tSNE plots). Right panels show the projection by genotype. C) Overlay representation of the cell identities (k-means clustering as per Fig.3) and cell lineage identification (pseudotime analysis), the proportion of cells in each cluster that belong to each defined state is shown in the bar chart (right hand side). D) Enrichment analysis (GSVA score) for the gene signatures that define the main mammary gland lineages: Basal, Luminal Progenitor (LP) and Mature Luminal (ML) for each of the clusters. Bottom panels show the expression of each of the gene signatures at single cell resolution. The top bar shows the assigned mammary epithelial cell type as per section C. E) Frequency of the different cell lineages in each genotype. F) Cluster tree showing the phylogenic relationship of the different clusters. Red arrow shows the resolution used (0.7). G) Illustration of cell diversity of PyMT tumours based on the canonical structure of the mammary gland epithelial lineages.

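Aim A – a second set of mammary gland hierarchy gene signatures (Pal et al.) is used to build a trajectory and define 7 pseudotime states; the correspondence between clusters under the two classification schemes is then compared (I personally consider it to be kind of redundant)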

  • Aim B: Elf5 OE clearly drives the cell population towards state 7 (alveolar lineage) from states 2, 4, 5 and 6
  • The tSNE clusters are annotated with GSVA, and also with the pseudotime states
  • => Finally, a phylogenic tree between the clusters is built

Altogether, the combination analysis of gene signatures, gene markers and pseudotiming enabled the precise annotation of PyMT cell clusters within the mammary hierarchy proposed by Pal et al., identifying a large luminal lineage that retains most of the cell diversity and strong plasticity, a basal/myoepithelial compartment and a hormone-sensing lineage

2.3. Elf5 OE vs WT: molecular effects => upregulated involution signatures

Molecular mechanisms of cancer progression associated to cancer cells of Alveolar origin

Figure 5.

A) Cell cycle stages of the PyMT cancer cells as defined by gene expression signatures using tSNE coordinates and their deconvolution (middle panel). Circled area shows the cycling cluster (C5 in Fig. 3) characterised by a total absence of G1 cells. The quantification of the proportion of cells in each stage grouped by genotype is shown in the bar chart. B) Enrichment GSVA analysis of gene expression metasignatures of cancer-related and Elf5-related hallmarks associated to PyMT/WT (green) and /Elf5 (red) tumours. C) tSNE representation of the EMT gene expression metasignature at the single cell level. Right panel shows a western blot of canonical EMT markers (E-Cadh, E-Cadherin and Vim, vimentin) on PyMT/WT or ELF5 full tumour lysates. Note: the two images correspond to the same western blot gel cropped to show the relevant samples. D) Hypoxia metasignature at the single cell level is shown in the tSNE plot, bottom panel shows a bar plot of the extension of the hypoxic areas in PyMT/WT (green) and /Elf5 (red) tissue sections (tumours and lung metastasis) stained using IHC based on hypoxyprobe binding, representative images are shown in the right panels. E) Lactation and Late Involution (stage 4, S4) metasignatures at the single cell level are shown in the tSNE plots. Pictures show IHC with an anti-milk antibody in tissue sections from a lactating mammary gland at established lactation compared with a mammary gland from an age-matched virgin mouse; and in PyMT/WT and Elf5 tumours. F) Kaplan-Meier survival curves based on Elf5 expression using the METABRIC cohort. Patients were segregated according to Elf5 expression levels based on tertiles. Elf5-high patients (red) were defined as the top-tertile and Elf5-low patients (green) as the bottom-tertile, Log-rank p values <0.05 are shown in red. The bar chart (bottom panel) corresponds to the distribution of the PAM50 classified breast cancer subtypes in the top and bottom Elf5 expressing tertile of patients.
G) Upper panel: Survival analysis (Kaplan-Meier curves) for the expression of Elf5 (left hand side) and the Involution metasignature (right hand side) in luminal breast cancer patients as per section F) Bottom panel: Kaplan-Meier survival curves for the late involution metasignature in Elf5-high patients (left hand side, ELF5-H) and Elf5-low patients (right hand side, ELF5-L). Each group of patients (ELF5-H and ELF5-L) were segregated according to tertiles for the combined expression levels of the genes from the involution metasignature: Green, inv low, bottom third; Blue, inv mid, middle third and Red, inv high, top third. Log-rank p values <0.05 are shown in red.

  • Validates features of the ELF5 OE tumour model found previously, so most of these are explanatory results:
      • Cell cycle => G1% ++ (consistent with previous findings that Elf5 decreases cell proliferation)
      • GSVA for specific signatures of known functions (EMT, hypoxia, lactation and late involution), with experimental validation:
          • EMT: WB – Elf5 OE shows reduction of Vimentin, thus indicating MET
          • Hypoxia: hypoxyprobe is used to quantify the hypoxic region area (% of area) – Elf5 OE shows extensive hypoxic regions
          • Lactation & involution: IHC – Elf5 OE shows strong milk production features => (since involution is known biologically to be associated with milk stasis) an involution gene signature is indeed highly expressed in the subpopulation of the luminal lineage that is enriched in alveolar cells
  • Clinical value (prognostic significance) of Elf5 OE and the associated lactation & involution signatures:
      • Survival analysis:
          • Known: high Elf5 expression ~ poor prognosis in the luminal subcohort of patients
          • Testing the lactation & involution signatures: only the involution signature, and only in Elf5-high patients, shows high expression ~ poor prognosis
          • (Caution! => Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002240)
      • PAM50: distribution of subtypes in ELF5-high/low patients (because PAM50 is also strongly related to prognosis): the proportion of the Basal subtype is markedly higher in ELF5-high
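
The tertile split and the Kaplan-Meier curves behind this survival analysis can be sketched in a few lines. This is a minimal, assumption-laden illustration: the patient identifiers, expression values and follow-up times are made up, the helper names are hypothetical, and no log-rank test is included.

```python
def tertile_groups(expression_by_patient):
    """Split patients into low / mid / high thirds by expression (mid absorbs any remainder)."""
    ranked = sorted(expression_by_patient, key=expression_by_patient.get)
    n = len(ranked)
    cut = n // 3
    return {"low": ranked[:cut], "mid": ranked[cut:n - cut], "high": ranked[n - cut:]}


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time where at least one event occurred.

    events[i] is 1 if patient i had the event at durations[i], 0 if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both events and censored patients leave the risk set
    return curve


# Toy cohort: six patients with made-up Elf5 expression values.
toy_expression = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0, "p5": 5.0, "p6": 6.0}
toy_groups = tertile_groups(toy_expression)

# Toy follow-up for one group: the second patient is censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The caution quoted in the notes still applies: a signature separating two such curves is weak evidence on its own, since many random gene sets do the same in breast cancer cohorts.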

At this point, the downstream effects of Elf5 OE are narrowed down to the involution process.

3. Fibroblast cell population: assign lineages for subpopulations and compare Elf5 OE vs WT models

3.1 Characterisation of cancer-associated fibroblasts in PyMT tumours

Figure 6.

A) tSNE plot groups defined by k-means clustering analysis showing a total of three cell clusters defined within the fibroblast subtype. B) Metasignatures of Cancer-associated fibroblast (CAF) signature (upper plot) and myofibroblasts (bottom panel) plotted in the fibroblast tSNE. The gene list of each metasignature was manually annotated from published scRNAseq data in human tumours 64,65. C) Desmoplastic (upper plot), Inflammatory (middle plot) and Contractile (bottom plot) metasignatures plotted in the fibroblast tSNEs from public data 67. D) Violin plots displaying marker genes for each of the three fibroblast clusters defined in section A: ECM-CAFs (0), immune-CAFs (iCAFs, 1) and myofibroblasts (2). E) Upper plot: tSNE illustration of the involution signature from 62. Middle plot: tSNE plot defined by k-means clustering analysis at resolution 1 of the fibroblast population showing a total of nine cell clusters. Bottom section: Violin plots on these nine cell clusters of three out of the four genes from the involution signature. F) Upper plot: Distribution of fibroblasts by genotype (Elf5: red, WT: green) in the defined tSNE dimensions. Bottom plot: Sankey diagram showing the contribution percentage of each of the genotypes to the cell clusters. Cluster numbers are coloured by the dominant genotype (>2-fold cell content of one genotype), Elf5 (red), WT (green). G) GSVA enrichment analysis of involuting mammary fibroblast metasignatures associated to PyMT/WT and /Elf5 tumours. Violin plots of Cxcl12, Mmp3 and Col1a1 genes in all fibroblasts of each genotype. PyMT/WT (green) PyMT/Elf5 (red).

For the fibroblast population:

  • Identify subpopulations:
      • CAFs vs myofibroblasts are distinguished by metasignatures (the figure shows the expression of each signature): 0, 1 – CAFs; 2 – myofibroblasts
      • Three subclusters and their enriched metasignatures: 0 – ECM-CAFs (secretory), 1 – immune-CAFs (iCAFs), 2 – contractile CAFs (myofibroblasts) (doubly verified by GSVA using hallmark gene sets and an additional CAF function dataset, ref. 67)
      • Using involution marker genes, subpopulations with involution features (involution fibroblasts) are found within the ECM-CAFs and iCAFs:
          • involution ECM-CAF: top enriched pathways relate to lipid metabolism => presumed to be adipocyte-derived fibroblasts, which are known to have high ECM remodeling potential and invasiveness
          • involution iCAF: top enriched pathways relate to wound healing, which corresponds to late stages of involution
  • Compare Elf5 OE vs WT:
      • GSVA using the CAF-involution signature – more enriched in CAFs from Elf5 OE than from WT (epithelial education?)
      • Wet-lab validation (CAFs from Elf5/PyMT mammary tumours show features of mimicry involution) of the involution phenotype that scRNAseq found significantly enriched in Elf5 OE. Since involuting fibroblasts have high fibrillar collagen activity, the authors quantified:
          • Collagen coverage & intensity (SHG imaging)
          • Collagen thickness (polarised light imaging)
          • Collagen spatial alignment (SHG imaging)
      • This confirms that Elf5 OE has higher fibroblast activity in both collagen deposition and collagen rearrangement
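
The collagen alignment readout (the share of fibres oriented within +/-10 degrees of the peak orientation) can be illustrated with a small sketch. It assumes fibre orientations are given in degrees and are equivalent modulo 180, bins them to whole degrees, and takes the most frequent bin as the peak; the angle values and function names are hypothetical and this is not the authors' image-analysis pipeline.

```python
from collections import Counter


def circular_distance(a, b, period=180):
    """Smallest angular difference between two orientations that repeat every `period` degrees."""
    d = abs(a - b) % period
    return min(d, period - d)


def alignment_near_peak(angles_deg, window=10):
    """Return (peak_angle, fraction of fibres within +/- window degrees of the peak)."""
    binned = [round(a) % 180 for a in angles_deg]
    peak = Counter(binned).most_common(1)[0][0]
    near = sum(1 for a in binned if circular_distance(a, peak) <= window)
    return peak, near / len(binned)


# Toy fibre orientations (degrees); values are made up for illustration.
toy_angles = [88, 90, 90, 92, 95, 101, 10, 175, 45, 90]
toy_peak, toy_fraction = alignment_near_peak(toy_angles)

# Orientations wrap around: 179 degrees is only 2 degrees away from 1 degree.
wrap_peak, wrap_fraction = alignment_near_peak([179, 1, 1, 3, 90])
```

A higher fraction near the peak means the fibres are more aligned, which is the direction of the difference the notes report for Elf5 OE tumours.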

Figure 7.

A) Representative bright field images and quantification of total coverage of picrosirius red-stained PyMT/WT and PyMT/ELF5 tumours sections n=4 mice per genotype with 10 regions of interest (ROI) per tumour. B) Representative maximum intensity projections of SHG signal and quantification of SHG signal intensity at depth (µm) and at peak in PyMT/WT and PyMT/ELF5 tumour sections, n=6 mice per genotype with 6 ROI per tumour. C) Polarised light imaging of picrosirius red stained PyMT/WT and PyMT/ELF5 tumour sections, and quantification of total signal intensity acquired via polarised light. Thick remodelled fibres/high birefringence (red-orange), medium birefringence (yellow) and less remodelled fibres/low birefringence (green) n=4 mice per genotype with 10 ROI. D) SHG images of PyMT/WT and PyMT/ELF5 tumours assessed for differences in fibre orientation angle and quantification of frequency of fibre alignment ranging from the peak alignment. Different colours correspond to specific angles of orientation n=6 PyMT/WT and n=4 PyMT/ELF5. Inset shows the cumulative frequency of fibre alignment +/-10 degrees from peak.

4. Crosstalk between epithelial cells/fibroblasts/immune cells

Characterisation of the cell-to-cell interactions involved in the cancer-associated involution mimicry

A) Heatmap of the cell-cell interactions of all cell types from PyMT tumours based on CellphoneDB. Cell classification was based on the annotation from Figure 3 for the epithelial compartment; from Figure 6 at resolution 1 in the case of fibroblasts, where the cycling cluster (Cluster 6) and the residual cluster of 15 cells (Cluster 8) were removed; in addition, Clusters 7 and 2 were considered as a sole group annotated as “Myofibroblasts”. The rest of the cells from the immune and stromal compartments were classified according to the annotation done in Figure 2 (See Supplementary Figure 9C for global annotation). The Scale at the right-hand side shows the interaction strength based on the statistical framework included in CellphoneDB (count of statistically significant (p<0.01) interactions above mean= 0.3, see methods). B) Graphical representation of all significant cell-cell interactions identified by CellphoneDB using the parameters of more than 10 significant interactions with a mean score greater than 0.3, number cut as more than 10 connections and number split 10. The red circles correspond to the cell types from the epithelial compartment; the blue triangles represent the cells from the stromal compartment and the green squares are the cells from the immune compartment. The size of geometric figures is relative to the number of cells involved in the interactions (displayed as count). Different number splits were applied to establish the most significant interactions for the fibroblast and immune cell types. Fibroblasts showed the strongest interactions (highlighted as blue lines) when a number split of 67 (1st Tier) and 50 (2nd Tier) were used. The immune system showed weaker interactions (highlighted as green lines) at a number split of 15 (3rd Tier) and 11 (4th Tier). C) Representative dot plots of ligand (no background)-receptor (red background) pairs.
The size of the circles is relative to the number of cells within each annotated cluster that showed a positive expression of each gene and the blue gradient represents the average scaled expression. D) Violin plots of genes from canonical pathways known to recruit and expand MDSCs. E) Proposed molecular model of involution mimicry driven by Elf5 where CAFs and MDSCs are the major cell types involved.

The intercellular interactome is inferred from known receptor-ligand pairs combined with their expression in each cell type:

  • Particular cell populations:
      • Among fibroblasts: ECM-CAFs & involution iCAFs interact strongly both within their own group and with other cell types (key talkers)
      • Among immune cells: myeloid cells are the key talkers & hub talkers (they talk to epithelial cells / fibroblasts / endothelium)
  • Particular ligand-receptor interactions between cell types, aiming to find common pathways linked to involution:
      • involution – TGF
      • immune suppressive ecosystem – Cxcl12 & Dpp4
      • ECM remodeling – IGF
  • The authors wanted to validate previous findings of increased infiltration of myeloid-derived suppressor cells (MDSCs) in Elf5 OE tumours:
      • Because there were too few cells to subcluster the myeloid cells further, they instead checked the expression, across cell types, of genes that promote MDSC infiltration (genes in signaling pathways for MDSC expansion, recruitment and malignant activation)
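
The idea of inferring interactions from receptor-ligand expression can be sketched as a mean-expression score with a label-permutation test. This is a simplified illustration of the general approach, not the CellphoneDB API or its exact statistics; the expression values, the assignment of Cxcl12 and Dpp4 to particular cell types, and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def cluster_mean(expr, labels, gene, cluster):
    """Average expression of `gene` over the cells labelled `cluster`."""
    values = [expr[gene][i] for i, lab in enumerate(labels) if lab == cluster]
    return mean(values) if values else 0.0


def interaction_score(expr, labels, ligand, receptor, sender, receiver):
    """Mean of the ligand average in the sender and the receptor average in the receiver."""
    return (cluster_mean(expr, labels, ligand, sender)
            + cluster_mean(expr, labels, receptor, receiver)) / 2


def permutation_p_value(expr, labels, ligand, receptor, sender, receiver,
                        n_perm=1000, seed=0):
    """Share of label shufflings whose score is at least as high as the observed one."""
    rng = random.Random(seed)
    observed = interaction_score(expr, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if interaction_score(expr, shuffled, ligand, receptor, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: three CAFs followed by three myeloid cells; expression values are made up.
toy_labels = ["CAF", "CAF", "CAF", "Myeloid", "Myeloid", "Myeloid"]
toy_expr = {
    "Cxcl12": [5, 6, 7, 0, 0, 0],
    "Dpp4": [0, 0, 0, 4, 5, 6],
}

toy_score, toy_p = permutation_p_value(
    toy_expr, toy_labels, "Cxcl12", "Dpp4", "CAF", "Myeloid"
)
```

With only six cells the smallest achievable p-value is limited by the number of distinct label arrangements, which is one reason such analyses need reasonably large clusters.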

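This part of the analysis ultimately builds a picture of the interactions among the cell subgroups in PyMT tumours (but without emphasising the differences between Elf5 OE and WT).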

Comment

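This paper builds on a very good model (a preclinical transgenic mouse model of PABC), whose tumorigenesis mimics pregnancy-associated alveolar epithelium differentiation. Although it does not dynamically track how pregnancy-related tumours arise, from a single snapshot of that process it carefully describes a complete tumour ecosystem and addresses several big questions in the breast cancer field: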

  • Tracing cancerous epithelial cell lineages back to normal breast tissue
  • The role of cancer-related fibroblasts and immune cells
  • Communications among different types of cells

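The first two points rely almost entirely on previously published signatures to repeatedly define the cluster identities in the samples. Although no new markers or lineages are discovered, the definition becomes more solid when several approaches point to a similar function, and this also helps with the third point, building the whole ecosystem. This part is therefore mostly explanatory, amounting to a precise quantitative description of TME interactions that earlier work had hypothesised.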

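What is specific to this paper is the Elf5 OE model. Because some biological facts about the model are already known, the bioinformatic validation is more accurate and reliable, and wet-lab validation is easier to obtain. Most of the results are nonetheless only explanatory. The new finding is that an involution-related signature is also present in CAFs, validated by collagen detection; its prognostic power depends on the overexpression of Elf5 itself, so it is not especially striking. In the last part, describing the interactome, Elf5 OE and WT are hardly shown separately (perhaps because no strong difference was found), and the analysis largely stays at the descriptive level.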