Accuracy Performance
Accuracy measures the proportion of correct predictions across all feature classifications. Our Mulberry-Qwen DCoT model achieves high accuracy on key architectural features, most notably weep holes and balconies.
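As a minimal sketch of how per-feature accuracy can be computed, assume each architectural feature is scored as a binary present/absent label per image. The feature keys and label values below are hypothetical placeholders chosen for illustration; they are not results from this evaluation.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels (1 = feature present, 0 = absent); not real results.
example_true = {"weep_holes": [1, 0, 1, 1], "balconies": [0, 0, 1, 0]}
example_pred = {"weep_holes": [1, 0, 0, 1], "balconies": [0, 0, 1, 0]}

# Score each feature independently so one feature cannot mask another.
per_feature_accuracy = {
    feature: accuracy(example_true[feature], example_pred[feature])
    for feature in example_true
}
```

Reporting accuracy per feature rather than pooled over all features keeps a frequent, easy feature from hiding weak performance on a rarer one.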
F1-Score Performance
F1-Score is the harmonic mean of precision and recall, so it is high only when both are high. Our Mulberry-Qwen DCoT approach achieves strong F1-Scores across multiple architectural features, which suggests that Chain-of-Thought distillation is effective for spatial reasoning tasks.
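The harmonic mean can be sketched in a few lines. The function name and the convention of returning 0.0 when both inputs are zero are assumptions made for this illustration, not details taken from the evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (an assumed convention that
    avoids division by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller input, a precision of 0.9 with a recall of 0.1 gives an F1-Score of 0.18, well below their arithmetic mean of 0.5.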
Recall Performance
Recall measures the proportion of actual positive cases that the model correctly identifies, indicating model sensitivity. Our DCoT approach maintains consistent recall, capturing architectural features with balanced sensitivity across the dataset.
Precision Performance
Precision measures the proportion of positive predictions that are actually correct, indicating how reliable the model's positive detections are. Our DCoT approach shows competitive precision, keeping false positives low in architectural feature detection.
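Precision and recall both derive from the same confusion counts, which the following sketch makes explicit. It assumes binary labels (1 = feature present, 0 = absent) for a single feature; the helper names and the zero-denominator convention are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Return (precision, recall); each is 0.0 if its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # found among actual positives
    return precision, recall
```

Precision falls as false positives rise, while recall falls as false negatives rise, which is why the two are reported separately.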