Big errors in big data

Large-scale global databases are increasingly used in biogeographical and evolutionary research and have led to unprecedented advances through tools that far surpass traditional data-processing methods. While technology has advanced rapidly in processing these large databases, the accuracy of the raw data still limits the reliability of the final results. In the case of fossil records, accurate identification is a key issue that is not always sufficiently addressed, potentially leading to flawed biogeographical and evolutionary interpretations.
A recent study by IBB researcher Valentí Rull provides an example of how the accuracy of fossil identification, specifically of pollen, can influence our understanding of the evolutionary origin, dispersal, and diversification of the Neotropical mangrove tree genus Pelliciera, which is now restricted to a small area around the Isthmus of Panama. Previous studies demonstrated that Pelliciera originated in the Eocene of northern South America and expanded across the Neotropics during the Oligo-Miocene, later retreating to its original Eocene range between the Pliocene and the present. These range expansions and contractions were attributed mainly to climatic forcing, given the narrow environmental tolerance of Pelliciera.
However, the strictly Neotropical character of Pelliciera has been disputed because of supposedly abundant records of its fossil pollen in North America, Europe, Africa, and the Middle East. If confirmed, these records would require a reconsideration of the global biogeographical trends of Pelliciera and would offer a new perspective on the evolutionary history of this genus and of mangrove ecosystems in general.
This prompted Rull to carefully analyze the extra-Neotropical records of Pelliciera fossil pollen. Several criteria were applied to filter these records. First, only articles published in journals indexed in Scopus were considered, to avoid gray literature and predatory journals; this excluded approximately 40% of the records. Second, only records that included pollen descriptions, diagnostic morphological characters, or identifiable images were accepted; simple mentions of Pelliciera were excluded.
This filtering reduced the more than 80 published records to approximately 20 cases, which were then compared with Pelliciera pollen. Only three records (less than 4% of the total) met the morphological criteria for being considered high-reliability Pelliciera pollen, and another three were classified in the medium-high reliability category. All six of these records come from equatorial West Africa. In contrast, the European, Middle Eastern, and North American records were categorized as low to negligible in reliability.
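For readers who manage such databases, the two-step screening and the reported percentages can be illustrated with a minimal Python sketch. The record fields and function names below are hypothetical and are not taken from the study. The total of 80 is used only as a lower bound, because the article says "more than 80", so the computed share of high-reliability records is an upper bound.

```python
from dataclasses import dataclass


@dataclass
class PollenRecord:
    """Hypothetical representation of one published fossil pollen record."""
    region: str
    scopus_indexed: bool   # criterion 1: journal indexed in Scopus
    has_morphology: bool   # criterion 2: description, diagnostic characters, or image


def screen(records):
    """Keep only the records that satisfy both screening criteria."""
    return [r for r in records if r.scopus_indexed and r.has_morphology]


def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Toy data for illustration only; these are not the actual records.
records = [
    PollenRecord("West Africa", True, True),
    PollenRecord("Europe", False, True),       # fails criterion 1
    PollenRecord("Middle East", True, False),  # fails criterion 2
]
kept = screen(records)

# Percentages based on the counts given in the article (total taken as 80).
high_reliability_share = share(3, 80)
remaining_share = share(80 - 3, 80)
```

With a total of 80, three high-reliability records correspond to 3.75% of the published records, which agrees with the "less than 4%" figure in the text.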
In summary, if all published records of Pelliciera fossil pollen were accepted uncritically, the entire global evolutionary history of this genus would need to be reconsidered. However, a careful taxonomic analysis reveals that more than 95% of these records are doubtful or unreliable. The lesson for paleontologists is that their fossil records do not serve only as evidence for localized, potentially short-term biostratigraphic or paleoenvironmental interpretations; they may also be integrated into long-term, global databases used by non-specialist researchers. The lesson for database managers is that uncritical acceptance of fossil records can lead to flawed conclusions.
Reference
Rull, V. 2025. A critical evaluation of fossil pollen records from the mangrove tree Pelliciera beyond the Neotropics: biogeographical and evolutionary implications. Review of Palaeobotany and Palynology 335, 105299.