Master function for generating stats from taxonomic classification results.
This function parses reports, aggregates taxon counts, generates
read assignment statistics and, most importantly, computes diversity and
abundance metrics.
Parameters:
- samples (List[Sample]) – List of samples. Each Sample is an object with two attributes: the report path and a string specifying the report type.

Returns:
- dict (Dict) – Dict with 4 keys: 'sample n reads', containing read assignment stats; 'common taxas', containing the 5 most common taxa and their respective counts in each sample; 'abund and div', containing abundance and diversity metrics; and 'beta div', containing a PCoA of beta diversity results.
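To make the 'common taxas' entry concrete, here is a minimal sketch of selecting the most common taxa for one sample. It is not MicroView's `get_common_taxas` implementation; the helper `top_taxa` and the toy counts are hypothetical and only illustrate the idea of keeping the 5 highest counts.

```python
from collections import Counter
from typing import Dict


def top_taxa(counts: Dict[str, int], n: int = 5) -> Dict[str, int]:
    """Return the n taxa with the highest counts, in descending order.

    Hypothetical helper for illustration only.
    """
    return dict(Counter(counts).most_common(n))


# Toy taxon counts for a single sample (made-up values).
sample_counts = {
    "Escherichia": 120,
    "Bacteroides": 95,
    "Prevotella": 60,
    "Lactobacillus": 30,
    "Clostridium": 12,
    "Akkermansia": 4,
}

top5 = top_taxa(sample_counts)
print(top5)
```

With six taxa in the input, the lowest-count taxon is dropped and the remaining five are kept together with their counts.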
Source code in microview/parse_taxonomy.py
def get_tax_data(samples: List[Sample]) -> Dict:
    """
    Master function for generating stats from taxonomic classification results.

    This function parses reports, aggregates taxon counts, generates
    read assignment statistics and, most importantly, computes diversity and
    abundance metrics.

    Args:
        samples (List[Sample]): List of samples. Each Sample is an object with
            two attributes: the report path and a string specifying the
            report type.
    Returns:
        dict: Dict with 4 keys: 'sample n reads', containing read assignment
            stats; 'common taxas', containing the 5 most common taxa and their
            respective counts in each sample; 'abund and div', containing
            abundance and diversity metrics; and 'beta div', containing a PCoA
            of beta diversity results.
    """
    # Parse the reports, then derive taxon counts and read assignment stats.
    parsed_stats = parse_reports(samples)
    all_sample_counts = get_taxon_counts(parsed_stats)
    n_reads = get_read_assignment(parsed_stats)
    most_common = get_common_taxas(all_sample_counts)
    abund_div_df, betadiv_pcoa = calculate_abund_diver(all_sample_counts)

    # Reshape both summaries to long format ("index", "variable", "value").
    stats_df = DataFrame(n_reads).T.reset_index().melt(id_vars=["index"])
    most_common_df = (
        DataFrame.from_dict(most_common, orient="index")
        .reset_index()
        .melt(id_vars=["index"])
        .sort_values(["index", "variable"], ascending=False)
    )

    return {
        "sample n reads": stats_df,
        "common taxas": most_common_df,
        "abund and div": abund_div_df,
        "beta div": betadiv_pcoa,
    }
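The two reshaping steps in `get_tax_data` can be tried in isolation. The sketch below applies the same pandas calls to toy data. The shape of the inputs is an assumption: both `n_reads` and `most_common` are taken to be nested dicts keyed by sample name, and the sample names, statistic names, and counts are made up.

```python
from pandas import DataFrame

# Assumed shape: {sample: {statistic: value}} (toy values).
n_reads = {
    "sample_a": {"assigned": 90, "unassigned": 10},
    "sample_b": {"assigned": 70, "unassigned": 30},
}

# Assumed shape: {sample: {taxon: count}} (toy values).
most_common = {
    "sample_a": {"Escherichia": 50, "Bacteroides": 40},
    "sample_b": {"Escherichia": 20, "Bacteroides": 60},
}

# Transpose so samples become rows, then melt to one row per
# (sample, statistic) pair.
stats_df = DataFrame(n_reads).T.reset_index().melt(id_vars=["index"])

# Same idea for taxa: one row per (sample, taxon) pair, sorted descending
# by sample name and then by taxon name.
most_common_df = (
    DataFrame.from_dict(most_common, orient="index")
    .reset_index()
    .melt(id_vars=["index"])
    .sort_values(["index", "variable"], ascending=False)
)

print(stats_df)
print(most_common_df)
```

Both results are long-format frames with the columns `index` (the sample), `variable` (the statistic or taxon), and `value`, a layout that plotting libraries generally handle well.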