To investigate the populations' genetic affinities further, PCA was first performed on the populations from the published Indian samples. Figure 2B shows the distribution of the south Indian populations in the PC1/PC2 space. This analysis was chosen over the population-based PCA of Indian populations presented in Figure 2A because it excludes the influence of the Andamanese and Burmese populations (upper right corner of the plot), which are not comparable to the north Indian populations. Nonetheless, the southern populations separate into three groups. Although the positions of these south Indian groups differ substantially, on the PC1/PC2 plot they also lie along a north-to-south cline. In this respect, they correspond to the western, northern and eastern groups of the Indian populations, respectively. The grouping of the south Indian populations agrees with the known distribution of the different ethnic groups and with the results of the PCA on the full set of Indian populations (Figure 2A).
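The effect of excluding outlying populations before computing the principal components can be illustrated with a minimal sketch. It assumes a samples-by-SNPs matrix of allele counts and one population label per sample; the function names, the labels, and the random data are hypothetical and are not the pipeline or data used in this study.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of allele counts (0, 1 or 2).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP across samples
    # SVD of the centered matrix; U * S gives the sample coordinates
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def pca_excluding(genotypes, labels, excluded, n_components=2):
    """Run PCA after dropping samples whose label is in `excluded`."""
    labels = np.asarray(labels)
    keep = ~np.isin(labels, list(excluded))
    scores = pca_scores(np.asarray(genotypes)[keep], n_components)
    return scores, labels[keep]

# Synthetic example (assumption: random genotypes, made-up labels)
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(12, 50))
labels = np.array(["A"] * 4 + ["B"] * 4 + ["OUT"] * 4)

scores, kept = pca_excluding(genotypes, labels, {"OUT"})
```

Because the components are recomputed on the retained samples only, the excluded populations no longer contribute to the axes of variation, which is the motivation given above for the restricted analysis.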
In short, the PCA results confirm the linguistic affiliations of the Indian populations and reveal the distinct genetic makeup of the Austroasiatic-speaking populations, as well as the interplay of their ancestry with that of other Southeast Asian populations in the PC1/PC2 space. The results also support a separate position of the Andamanese from all other Indian populations.
Similarly, the populations from the Andaman Islands, in the westernmost part of this study, are placed on the opposite side of the Indian populations and away from the Asian populations, a result corroborating the suggested genetic affinity of the Andamanese to Native Americans26 and their proposed connection to the Denisovans.27,28 The results for the Andamanese samples are also supported by a recent study, which found them to be distinct in several respects.29
The two additional Austroasiatic-speaking populations, on the other hand, are placed opposite the Dravidian-speaking populations, which probably reflects their distinct origins in Southeast Asia. This is also in agreement with the proposed language relationships25 and highlights the influence of the Dravidian language family on the Austroasiatic language group.
The location of the Arabian populations (Figure 2A) on the PC1/PC2 plot (Figure S12) reflects the intra-regional correlation of language and geography. The position of the Arabian Austroasiatic speakers is in agreement with a former proposal that the Arabian languages are a late addition to the ASI,6,69 which likely occurred in the Arabian Peninsula as a consequence of contact with the Persian-speaking population.70 The association of South Asian populations with a South Asian axis (Figure 2A) mirrors the situation of the Austroasiatic speakers in India, who have previously been suggested to have an origin in South Asia.21