Introduction
The red swamp crayfish (Procambarus clarkii), commonly known as the crayfish, is native to the south-central United States and northeastern Mexico. It was introduced to Nanjing, China from Japan during the 1920s and 1930s.1 Over the years, P. clarkii has become China’s most widely farmed freshwater crustacean through a combination of market-driven initiatives, government support, and consumer demand. However, the limited availability of wild resources, coupled with a mismatch between seed quality and market supply capabilities, has created a bottleneck that hinders the sustainable and scalable development of the P. clarkii industry. Prolonged reliance on self-propagated seed stock has led to a degradation of germplasm.2 Cross-regional hybrid breeding offers a potential solution, but its outcomes are uncertain and produce variable quality within P. clarkii aquaculture stocks. Investigating the genetic diversity of P. clarkii populations across regions of China and identifying genetic markers specific to different areas can guide targeted cross-breeding between genetically distinct populations. Such cross-breeding should slow the genetic decline caused by inbreeding and thereby contribute to the sustainable development of the P. clarkii industry in China.
With the advancement and application of molecular biology techniques, an increasing number of scholars are assessing population genetic diversity and structure at the molecular level. Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), consist of short nucleotide motifs repeated multiple times in tandem.3 Microsatellite markers are widely distributed in the genome, abundant, co-dominantly inherited, highly polymorphic, easy to assay, and highly reproducible. These features make them extensively used in genetic and evolutionary biology studies, including assessments of population genetic diversity, analysis of genetic structure, identification of kinship, construction of genetic maps, and molecular marker-assisted breeding.4–6 The development and application of second-generation high-throughput sequencing technologies have created new opportunities for large-scale development of SSR markers, which have now been applied in various aquatic species such as Megalobrama amblycephala,7 Quasipaa spinosa,8 Portunus trituberculatus,9 and Litopenaeus vannamei.10 However, the development of microsatellites based on transcriptomic data in P. clarkii remains relatively underexplored.
Genetic analysis of P. clarkii cultivated strains from various regions can provide a basis for the conservation of genetic resources and scientific breeding of this species. This study aims to identify and screen microsatellite markers using transcriptome sequencing of muscle tissue from P. clarkii, thereby offering a scientific reference for targeted conservation and genetic breeding efforts. Subsequently, thirteen microsatellite loci were selected to analyze the genetic diversity of seven P. clarkii farming populations, clarifying the genetic richness and differentiation among these groups. This analysis is intended to support further studies on the genetic diversity of P. clarkii populations and related breeding programs.
Materials and Methods
RNA extraction and sequencing
Adult samples of P. clarkii from seven different regions (Changsha (CS), Hebei (HB), Hainan (HN), Jiangsu (JS), Qianjiang (QJ), Shishou (SS), and Changde (WT)) were selected for experimentation. Six individuals from each region were sampled, and total RNA was extracted from dorsal muscle tissue and pooled by region. The RNA samples were subjected to quality control, requiring a concentration of ≥ 100 ng/μL, a total amount of > 2 μg, an OD260/280 ratio between 1.8 and 2.2, an OD260/230 ratio ≥ 2.0, and a RIN value ≥ 6.5 as assessed by the Agilent 2100 Bioanalyzer. A double-stranded cDNA library with 5’ phosphorylated blunt ends was constructed, followed by high-throughput sequencing on the Illumina Nova platform in 2 × 150 bp paired-end mode, yielding FastQ data. Library construction and sequencing were outsourced to Shanghai Tianhao Biotechnology Co., Ltd.
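The RNA acceptance thresholds listed above can be expressed as a simple check. The following is a minimal sketch in Python; the class name RnaSample, its field names, and the function passes_rna_qc are hypothetical and introduced only for illustration, and the code is not the sequencing provider's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    """Hypothetical record of the QC measurements named in the text."""
    concentration_ng_per_ul: float
    total_ug: float
    od260_280: float
    od260_230: float
    rin: float


def passes_rna_qc(s: RnaSample) -> bool:
    """Return True only if every threshold stated in the methods is met."""
    return (
        s.concentration_ng_per_ul >= 100   # concentration >= 100 ng/uL
        and s.total_ug > 2                 # total amount > 2 ug
        and 1.8 <= s.od260_280 <= 2.2      # OD260/280 between 1.8 and 2.2
        and s.od260_230 >= 2.0             # OD260/230 >= 2.0
        and s.rin >= 6.5                   # RIN >= 6.5
    )


# Invented example values, not measurements from this study.
good = RnaSample(150.0, 3.0, 2.0, 2.1, 8.0)
degraded = RnaSample(150.0, 3.0, 2.0, 2.1, 6.0)
print(passes_rna_qc(good), passes_rna_qc(degraded))
```

Keeping all five criteria in a single conjunction means that a sample failing any one threshold is rejected.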
SSR loci screening and analysis
Using rnaSPAdes software, the filtered reads were assembled at the transcript level with reference to the genome of a closely related species available in the NCBI public database (https://www.ncbi.nlm.nih.gov/genome/66438?genome_assembly_id=358089). SSRs were identified within the assembled sequences using the MIcroSAtellite (MISA) identification tool. Each SSR was extended by 50 bp of flanking sequence, and the extended sequences were aligned to the reference genome to determine the coordinates of the SSRs in the reference genome. The number of SSRs, their repeat types, their frequencies, and their repeat motif types were tabulated in Excel.
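To make the idea of SSR screening concrete, the sketch below scans a sequence for perfect tandem repeats with regular expressions. It is an illustration of the general technique, not MISA itself. The minimum repeat counts in MIN_REPEATS are an assumption chosen for the example; the thresholds actually used in this study are not stated in the text. The function name find_ssrs is hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
# These are illustrative only, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len in sorted(min_repeats):
        n = min_repeats[unit_len]
        # A motif of unit_len bases followed by at least n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA").
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Invented test sequence: an (AT)7 run and an (AAG)5 run with short flanks.
example = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "TTC"
print(find_ssrs(example))
```

A production tool also handles compound and interrupted repeats, which this sketch deliberately ignores.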
Primer design and SSR genotyping
A total of 230 adult P. clarkii specimens were selected from seven different aquaculture populations: 33 from CS, 33 from HB, 33 from HN, 33 from JS, 32 from QJ, 33 from SS, and 33 from YN. DNA was extracted from dorsal muscle tissue of these specimens and assessed with a NanoDrop 2000 to ensure genomic DNA quality: concentration ≥ 20 ng/μL and total amount ≥ (200 + 30n) ng, where n is the number of multiplex PCR panels. Based on an initial transcriptome screening, 20 SSRs were identified, for which eight pairs of high-precision polymorphic microsatellite primers were designed. Additionally, five pairs of polymorphic microsatellite primers were sourced from the published literature.11,12 The sequences of these 13 primer pairs are listed in Table 1. Primer synthesis and subsequent high-throughput sequencing were outsourced to Shanghai Tianhao Biotechnology Co., Ltd. Primers were combined into multiplex PCR primer panels, and multiplex PCR amplifications were conducted using the genomic DNA of the samples as a template. After quality control, amplification products from all multiplex PCR primer panels for the same sample were mixed so that each locus was represented in equal quantity. Primers with index sequences were then used to add Illumina-compatible sample-specific tag sequences to the library ends through PCR amplification. Equal amounts of the index PCR products from all samples were pooled, and the final FastTarget™ sequencing library was obtained through gel extraction and recovery, followed by high-throughput sequencing on the Illumina HiSeq platform.
Data analysis
The raw sequencing data were subjected to quality control, and SSR genotypes were called from the filtered data. Popgene 32 software was used to calculate the number of alleles (Na), effective number of alleles (Ne), Shannon’s diversity index (I), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphic information content (PIC). Additionally, Hardy-Weinberg equilibrium tests were performed on the SSR loci, and a phylogenetic tree of the populations was constructed using the UPGMA method. The partitioning of genetic variation within and among populations was analyzed using Analysis of Molecular Variance (AMOVA). The most likely number of genetic clusters was inferred using Structure software.
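For readers unfamiliar with these indices, the sketch below computes them for a single diploid locus from a list of genotypes, using the standard textbook definitions: Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², and the PIC formula of Botstein et al. It is an assumption that these match Popgene's output exactly; in particular, Popgene may report a sample-size-corrected expected heterozygosity, which this sketch does not apply. The function name locus_stats and the example genotypes are hypothetical.

```python
from collections import Counter
from math import log


def locus_stats(genotypes):
    """Diversity indices for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    # Cross term of the PIC formula: sum over allele pairs of 2 * p_i^2 * p_j^2.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(freqs),                               # number of alleles
        "Ne": 1 / sum_p2,                               # effective number of alleles
        "I": -sum(p * log(p) for p in freqs),           # Shannon's index
        "Ho": sum(a != b for a, b in genotypes) / n,    # observed heterozygosity
        "He": 1 - sum_p2,                               # expected heterozygosity (uncorrected)
        "PIC": 1 - sum_p2 - cross,                      # polymorphic information content
    }


# Invented genotypes for illustration only (allele sizes in bp).
demo = [(150, 150), (150, 154), (154, 154), (150, 154)]
print(locus_stats(demo))
```

With two equally frequent alleles, as in the invented example, Ne equals 2 and He equals 0.5, which is a convenient sanity check.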
Results
Sequencing quality assessment
Sequencing of the seven samples on the Illumina Nova platform generated raw reads, which were filtered to remove low-quality sequences, yielding clean reads. The Q20 values exceeded 98%, and Q30 values were above 95%, with GC content ranging from 49.2% to 50.7%, indicating high sequencing quality (Table 2). These results provide a solid foundation for subsequent data assembly and analysis.
Quantification and distribution of SSR loci in transcriptomes
Using the MISA software, a total of 11,304 SSR loci were identified across the cultured populations of P. clarkii from seven different regions. The SSRs fell into six repeat classes, with dinucleotide and trinucleotide repeats being the most prevalent, comprising 3,699 and 3,055 loci respectively and accounting for 33% and 27% of the total SSRs. These were followed by mononucleotide and tetranucleotide repeats, which numbered 2,537 and 1,486 respectively, representing 22% and 13% of the total. Pentanucleotide and hexanucleotide repeats were less common, constituting 4% and 1% of the total SSRs, respectively (Figure 1a). In the muscle transcriptome of P. clarkii, the number of repeats per SSR ranged from 3 to 19, and the number of loci decreased as the repeat number increased. The majority of SSR loci (97.29%) had between 3 and 10 repeats; those with 11 to 15 repeats comprised 277 loci, accounting for 2.52% of the total, while those with 16 to 19 repeats were the least common, with 21 loci making up 0.19% (Figure 1b). SSR analysis of the transcriptomes from different regions showed that 572, 203, 698, 428, 709, 522, and 430 region-specific SSR loci were uniquely identified in CS, HB, HN, JS, QJ, SS, and WT, respectively. Additionally, 1,399 SSR loci were shared among all seven populations (Figure 1c).
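The repeat-class percentages can be cross-checked directly from the reported counts. The short sketch below does this arithmetic; the dictionary keys are hypothetical labels, and the combined pentanucleotide plus hexanucleotide count is derived here as the remainder because the text reports only percentages for those two classes.

```python
# Counts reported in the text for the four largest repeat classes.
counts = {"mono": 2537, "di": 3699, "tri": 3055, "tetra": 1486}
total = 11304

# Percentage share of each class, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

# Penta- and hexanucleotide repeats together make up the remainder.
remainder = total - sum(counts.values())
remainder_share = round(100 * remainder / total)

print(shares)
print(remainder, remainder_share)
```

The remainder of 527 loci corresponds to about 5% of the total, which agrees with the reported 4% and 1% taken together.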
Polymorphism analysis of eight microsatellite loci developed from transcriptome sequencing
In a sample of 230 individuals, 35 alleles were detected across the eight SSR loci, with the number of alleles per locus ranging from 3 to 9 and a mean number of alleles (Na) of 4.375. The loci MRZY010010061.1, MRZY010037823.1, MRZY010109927.1, and MRZY010268220.1 exhibited the lowest Na, with only three alleles each; MRZY010015679.1 had the highest, with nine alleles. The effective number of alleles (Ne) ranged from 1.754 to 3.413, with an average of 2.589; the Shannon information index (I) ranged from 0.641 to 1.284, averaging 1.038; observed heterozygosity (Ho) ranged from 0.219 to 0.657, with an average of 0.450; expected heterozygosity (He) ranged from 0.430 to 0.707, averaging 0.595; and the PIC ranged from 0.341 to 0.650, averaging 0.527. The loci MRZY010010061.1 and MRZY010109927.1 had PIC values between 0.25 and 0.50, indicating moderate polymorphism, while the remaining six loci had values greater than 0.5, indicating high polymorphism. Significant deviations from Hardy-Weinberg equilibrium were observed at five loci (p < 0.05), potentially due to inbreeding or sampling effects (Table 3). Despite these deviations, all loci demonstrated sufficient polymorphism for population genetic studies, and the average PIC value (0.527) indicates high overall informativeness.
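The PIC classification applied to these loci can be written as a small helper. This is a minimal sketch using the thresholds given in this paper (greater than 0.5 high, 0.25 to 0.50 moderate, below 0.25 low); the function name classify_pic is hypothetical.

```python
def classify_pic(pic: float) -> str:
    """Classify a locus by polymorphic information content (PIC)."""
    if pic > 0.5:
        return "high"
    if pic >= 0.25:
        return "moderate"
    return "low"


# The minimum, mean, and maximum PIC values reported for the eight loci.
for value in (0.341, 0.527, 0.650):
    print(value, classify_pic(value))
```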
Genetic polymorphism analysis of cultured populations of P. clarkii from seven different regions
Genetic diversity was assessed in 230 P. clarkii individuals from seven regional aquaculture populations using 13 pairs of polymorphic microsatellite primers. The relevant genetic diversity parameters are presented in Table 4. The mean Na of the seven populations ranged from 3.769 to 4.385, with an overall average of 4.110. The Ne varied from 2.216 to 2.726, Ho ranged from 0.374 to 0.502, and He ranged from 0.502 to 0.604. The population from Yunnan (YN) exhibited the lowest average PIC at 0.450, whereas the Hebei (HB) population displayed the highest average PIC at 0.533. All seven populations had an average PIC of at least 0.450, indicating that these cultured populations possess medium to high levels of genetic polymorphism. Among them, the HB population showed the highest genetic diversity and the YN population the lowest.
Genetic differentiation analysis of P. clarkii cultured populations from seven different regions
Based on the genetic distances among the seven cultured populations of P. clarkii, a tree was constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) to visualize the relationships between these populations. As depicted in Figure 2a, the seven populations were divided into two major clades. The YN and QJ populations formed one clade, while the other five populations grouped into a second clade, in which the Shishou, Hubei (SS) and Hainan (HN) populations clustered first, followed by the Changsha, Hunan (CS), Jiangsu (JS), and Hebei (HB) populations.
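The UPGMA procedure repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The sketch below is a minimal pure-Python illustration of that procedure. The function name upgma and the four-taxon distance matrix are hypothetical; the matrix is not the genetic distance data from this study.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns (tree, merges): tree is a nested tuple of labels and merges
    lists each join as (left, right, node_height).
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merges.append((tree_i, tree_j, d_ij / 2))
        # Distance from the merged cluster to each other cluster is the
        # size-weighted mean of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, merges


# Invented distances between four hypothetical populations A-D.
names = ["A", "B", "C", "D"]
d = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, merges = upgma(names, d)
print(tree)
print([h for _, _, h in merges])
```

Because UPGMA assumes a constant rate of divergence, the resulting tree is ultrametric, and each node height equals half the distance between the clusters it joins.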
Population structure was analyzed using the Bayesian clustering approach in the Structure software to infer the most likely number of genetic clusters. The relationship between ΔK and the number of clusters (K) showed that the first peak of ΔK occurred at K = 3, suggesting that three clusters represent the most probable population structure (Figure 2b). The clustering diagram of the seven populations at K = 3 is shown in Figure 2c and indicates some degree of gene flow among these geographically distinct groups.
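The ΔK statistic is commonly computed from the log-likelihoods of replicate Structure runs at each K (the Evanno method). The sketch below uses the widely used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), where L(K) is the mean log-likelihood across replicates; it is an assumption that this matches the calculation used in this study. The function name delta_k and all log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute delta K from replicate log-likelihoods.

    lnp: dict mapping K to a list of ln P(D) values from replicate runs.
    Returns a dict K -> delta K for every K that has both neighbours.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp[k])
            if sd > 0:  # delta K is undefined when replicates are identical
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result


# Invented log-likelihoods (two replicates per K), for illustration only.
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK cannot be evaluated at the smallest or largest K tested, because the second-order difference needs a neighbour on each side.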
AMOVA was used to partition the genetic variation within and among the seven cultured populations of P. clarkii, comprising 230 individuals. The results indicated that 6.48% of the genetic variation occurred among populations, while 93.52% occurred within populations (Table 5).
Discussion
SSR markers are neutral and largely unaffected by environmental pressures, making them particularly effective in objectively reflecting the fundamental genetic differences between individuals from different geographical regions. Thus, SSRs are highly effective and accurate for detecting genetic variations within organisms.13 Compared to traditional methods of SSR marker development, the application of transcriptome sequencing technology has significantly enhanced the efficiency and accuracy of SSR marker development, providing a powerful tool for genetic research and applications.5 In recent years, high-throughput transcriptome sequencing has been extensively applied to the development of SSR markers in aquatic animals. Studies on species such as Squaliobarbus curriculus,14 Pelteobagrus fulvidraco,15 and Macrobrachium rosenbergii16 have used this technology to develop SSR markers in bulk and analyze the genetic structure and diversity of different cultured populations, laying a foundation for further assessment of species genetic diversity and the innovative utilization of germplasm resources. In this study, SSR markers were screened at the transcriptome level from P. clarkii sourced from seven different regions of China, and 11,304 SSR loci were identified. Microsatellites show interspecies variability in distribution and abundance; generally, more evolutionarily advanced species have a higher proportion of lower-order repeat units in their microsatellites.17 The SSRs in P. clarkii predominantly consist of dinucleotide repeats, consistent with reports in Penaeus monodon.18 This pattern is also observed in other aquaculture species, such as Coilia nasus19 and Coreius guichenoti,20 where the distribution of SSR types in the transcriptomes follows this rule. The PIC is an important metric for measuring the polymorphism of genetic markers. In this study, the PIC values of the eight developed SSR loci ranged from 0.341 to 0.650. According to PIC criteria, values greater than 0.5 indicate high polymorphism, values between 0.25 and 0.50 suggest moderate polymorphism, and values less than 0.25 indicate low polymorphism. Six of the eight microsatellites developed in this study exhibited high polymorphism and the remaining two exhibited moderate polymorphism, with a mean PIC of 0.527, making them practical for genetic studies in populations of this species.
Genetic diversity is a critical component of biodiversity, with higher genetic diversity indicating greater evolutionary potential and a stronger capacity for species to adapt to environmental changes.21 Preserving high genetic diversity within populations is therefore essential for long-term species sustainability.22 Understanding the genetic diversity of P. clarkii populations from different regions is a crucial prerequisite for scientific breeding. Thus, this study utilized 13 polymorphic SSR markers to analyze the germplasm resources of P. clarkii cultured populations from seven different regions. The selected loci exhibited moderate to high polymorphism, enabling a more accurate reflection of the true genetic diversity levels within these groups. The results indicated that the sampled populations from the seven regions generally exhibited moderate to high levels of genetic diversity (average PIC ranging from 0.450 to 0.533). These levels fell within the range reported by Gao et al.23 in their analysis of P. clarkii populations from six typical lake regions in Jiangsu, Shandong, Hunan, Anhui, Jiangxi, and Hubei (average PIC ranging from 0.42 to 0.59), and were higher than those reported by Liu et al.24 for P. clarkii from fourteen locations in Jiangxi, Zhejiang, Guangdong, and Hubei provinces (average PIC ranging from 0.33 to 0.43). Previous findings25 indicate that cultured populations display reduced genetic diversity compared to their wild counterparts, likely due to inbreeding and reliance on closed breeding systems. Notably, we observed lower-than-expected heterozygosity levels, with most microsatellite loci deviating from Hardy-Weinberg equilibrium. Such deviations from random mating expectations are consistent with population substructure and inbreeding within cultured stocks, possibly arising from restricted gene flow and artificial selection pressures. To mitigate genetic erosion while preserving desirable production traits, regular genetic monitoring, the incorporation of wild genetic material through controlled breeding programs, and the establishment of regional broodstock exchange networks are recommended.
The variation in genetic diversity among regional aquaculture populations may be related to different stocking and breeding practices in the selected regions. Taken together, the results indicate that the Yunnan population has the lowest genetic diversity and is most closely related to the Qianjiang, Hubei population. The main aquaculture areas and migration routes for P. clarkii cover provinces such as Hubei, Hunan, and Jiangsu, while Yunnan is not a primary aquaculture region for this species.26 It is hypothesized that the P. clarkii populations in Yunnan primarily originated from stock introduced from Hubei, a major aquaculture province. Subsequent non-standard breeding practices and the lack of genetic input from external wild populations may then have reduced the genetic diversity of the P. clarkii population in Yunnan.27 Additionally, moderate genetic differentiation was observed among the seven P. clarkii populations, with the majority of variation occurring within populations (93.52%) rather than between them (6.48%). Reduced gene flow may be a significant factor contributing to the genetic differentiation observed between populations. Consequently, it is advisable to enrich the genetic diversity of progeny populations by introducing stocks from different regions with diverse genetic backgrounds and more distantly related groups, as well as by incorporating wild populations known for their high genetic diversity.
Acknowledgments
This work was supported by Hunan Province key research and development project (2020NK2039); Open fund of Hunan Key Laboratory of Healthy Aquaculture and Processing of aquatic products (2020-011); Changde City science and technology plan key project (2021-50).
Authors’ Contribution
Writing – original draft: Dan Zeng (Lead). Methodology: Yunsheng Zhang (Equal), Min Du (Equal), Liye Shao (Equal). Validation: Hu Xia (Lead). Writing – review & editing: Qing Han (Lead).
Competing Interests – COPE
The authors declare that they have no competing interests.
Ethical Conduct Approval – IACUC
Ethical approval for the animal experiments was granted by the Animal Ethics Committee of Hunan University of Arts and Science (HUAS-2021-0045).
Informed Consent Statement
All listed authors have approved the manuscript for publication.
Data Availability Statement
All data are available from the authors upon reasonable request.