| 1 | +<!-- README.md for bibliometrix-python --> |
| 2 | + |
| 3 | +# bibliometrix-python |
| 4 | + |
| 5 | +## A Python tool for comprehensive science mapping analysis |
| 6 | + |
| 7 | +[DOI: 10.1016/j.joi.2017.08.007](https://doi.org/10.1016/j.joi.2017.08.007)
| 9 | + |
| 10 | +<p align="center"> |
| 11 | +<img src="https://www.bibliometrix.org/logo_new.png" width="400"/> |
| 12 | +</p> |
| 13 | + |
| 14 | +## Overview |
| 15 | + |
| 16 | +**bibliometrix-python** is a Python implementation of the renowned **bibliometrix** R package, providing a comprehensive set of tools for quantitative research in bibliometrics and scientometrics. |
| 17 | + |
| 18 | +This project reimplements the core functionality of [bibliometrix](https://github.com/massimoaria/bibliometrix) (developed by Massimo Aria and Corrado Cuccurullo) using Python and the Shiny for Python framework, making these powerful bibliometric tools accessible to the Python scientific community. |
| 19 | + |
| 20 | +Bibliometrics applies quantitative analysis and statistics to scientific publications and their citation patterns. It has become essential across all scientific fields for evaluating growth, maturity, leading authors, conceptual and intellectual maps, and emerging trends within research communities. |
| 21 | + |
| 22 | +**bibliometrix-python** supports scholars in three key phases of analysis: |
| 23 | + |
| 24 | +- **Data importing and conversion** from major bibliographic databases (Web of Science is fully supported; Scopus, PubMed, Dimensions, Lens, and Cochrane importers are in progress)
| 25 | + |
| 26 | +- **Bibliometric analysis** of publication datasets, including descriptive statistics, author productivity, and source impact |
| 27 | + |
| 28 | +- **Building and visualizing networks** for co-citation, coupling, collaboration, and co-word analysis |
| 29 | + |
| 30 | +## biblioshiny: Python Edition |
| 31 | + |
| 32 | +**bibliometrix-python** includes an interactive web application built with **Shiny for Python**, providing an intuitive interface for comprehensive bibliometric analysis. |
| 33 | + |
| 34 | +The web application enables scholars to easily access bibliometric analysis features through an interactive workflow: |
| 35 | + |
| 36 | +### Data Management |
| 37 | + |
| 38 | +- **Import and convert** data from multiple bibliographic databases: |
| 39 | + - Web of Science (plaintext, BibTeX, EndNote) - ✅ Fully supported |
| 40 | + - Scopus (CSV, BibTeX) - 🚧 In progress |
| 41 | + - PubMed (plaintext export) - 🚧 In progress |
| 42 | + - Dimensions (Excel, CSV) - 🚧 In progress |
| 43 | + - Lens.org (CSV) - 🚧 In progress |
| 44 | + - Cochrane CDSR (plaintext) - 🚧 In progress |
| 45 | + |
| 46 | +- **Filter data** by various criteria including publication years, languages, document types, citation counts, and Bradford's Law zones |
| 47 | + |
| 48 | +- **Sample datasets** for testing and learning |
| 49 | + |
| 50 | +### Analytics and Visualization |
| 51 | + |
| 52 | +- **Three-level metrics** for comprehensive analysis: |
| 53 | + |
| 54 | + - **Sources**: journal performance, impact metrics, Bradford's Law, sources' local impact, production over time |
| 55 | + |
| 56 | + - **Authors**: productivity analysis, Lotka's Law, collaboration patterns, h-index, local impact, affiliations analysis |
| 57 | + |
| 58 | + - **Documents**: citation analysis, most relevant papers, references spectroscopy |
| 59 | + |
| 60 | +- **Countries Analysis**: scientific production by country, collaboration networks, corresponding authors' countries |
| 61 | + |
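The h-index listed among the author metrics can be computed from a list of per-document citation counts. The following is a minimal sketch, not the project's implementation: the function name `h_index` and the sample citation counts are hypothetical and chosen only for illustration.

```python
def h_index(citations):
    """Return the largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` documents all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts per document for two authors
author_citations = {
    "AUTHOR A": [10, 8, 5, 4, 3],
    "AUTHOR B": [25, 8, 5, 3, 3],
}
author_h = {name: h_index(cites) for name, cites in author_citations.items()}
print(author_h)
```

Sorting once and stopping at the first rank that fails the test keeps the computation simple; an author with no documents gets an h-index of 0.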
| 62 | +### Knowledge Structure Analysis |
| 63 | + |
| 64 | +- **Conceptual Structure**: analyzing topics and themes through co-word analysis, thematic mapping, and thematic evolution |
| 65 | + |
| 66 | +- **Intellectual Structure**: examining citation networks through co-citation analysis, historiograph, and document coupling |
| 67 | + |
| 68 | +- **Social Structure**: exploring collaboration patterns through co-authorship networks at author, institution, and country levels |
| 69 | + |
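Co-word analysis starts from counting how often two keywords appear in the same document; those counts become the edge weights of a co-occurrence network. The sketch below illustrates that counting step only, under stated assumptions: `cooccurrence_counts` and the sample keyword lists are hypothetical, and the project's own network code may normalize or filter terms differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for every keyword pair, the number of documents containing both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and drop duplicates within a document
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author keywords for three documents
docs = [
    ["Bibliometrics", "Science Mapping", "Citation Analysis"],
    ["bibliometrics", "science mapping"],
    ["citation analysis", "bibliometrics"],
]
edges = cooccurrence_counts(docs)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each pair is stored in alphabetical order, so `("bibliometrics", "science mapping")` and its reverse are never counted as separate edges.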
| 70 | +### Content Analysis Features |
| 71 | + |
| 72 | +- **Word Analysis**: frequent words, word clouds, treemaps, word frequency over time |
| 73 | + |
| 74 | +- **Trend Topics**: identify emerging and declining research topics |
| 75 | + |
| 76 | +- **Three-Field Plot**: Sankey diagrams for exploring relationships between authors, keywords, and journals |
| 77 | + |
| 78 | +### Advanced Features |
| 79 | + |
| 80 | +- **AI-Powered Assistant**: Integrated Google Gemini AI chatbot for contextual help and insights - 🧪 BETA |
| 81 | + |
| 82 | +- **Interactive Reports**: Generate comprehensive Excel reports combining multiple analyses |
| 83 | + |
| 84 | +- **Export Capabilities**: Download plots as high-resolution images and tables as Excel files |
| 85 | + |
| 86 | +### How to use biblioshiny |
| 87 | + |
| 88 | +To launch the application, run the following from the repository root:
| 89 | + |
| 90 | +```bash |
| 91 | +shiny run app.py |
| 92 | +``` |
| 93 | + |
| 94 | +Or using Python: |
| 95 | + |
| 96 | +```bash |
| 97 | +python -m shiny run app.py |
| 98 | +``` |
| 99 | + |
| 100 | +The application will start and provide a local URL (typically `http://127.0.0.1:8000`) to access the web interface. |
| 101 | + |
| 102 | +## How to cite |
| 103 | + |
| 104 | +If you use this package for your research, please cite the original R package: |
| 105 | + |
| 106 | +Aria, M. & Cuccurullo, C. (2017). **bibliometrix: An R-tool for comprehensive science mapping analysis**. *Journal of Informetrics*, 11(4), pp. 959-975. Elsevier. DOI: 10.1016/j.joi.2017.08.007
| 107 | + |
| 108 | +## Community |
| 109 | + |
| 110 | +**Original bibliometrix (R version):** |
| 111 | +- Official website: https://www.bibliometrix.org |
| 112 | +- CRAN page: https://cran.r-project.org/package=bibliometrix |
| 113 | +- GitHub repository: https://github.com/massimoaria/bibliometrix |
| 114 | + |
| 115 | +**Python implementation:** |
| 116 | +- GitHub repository: https://github.com/PRAISELab-PicusLab/bibliometrix-python |
| 117 | +- Issue tracker: https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues |
| 118 | + |
| 119 | +## Installation |
| 120 | + |
| 121 | +### Prerequisites |
| 122 | + |
| 123 | +- Python 3.9 or higher |
| 124 | +- pip package manager |
| 125 | + |
| 126 | +### Install from source |
| 127 | + |
| 128 | +Clone the repository: |
| 129 | + |
| 130 | +```bash |
| 131 | +git clone https://github.com/PRAISELab-PicusLab/bibliometrix-python.git |
| 132 | +cd bibliometrix-python |
| 133 | +``` |
| 134 | + |
| 135 | +Install dependencies: |
| 136 | + |
| 137 | +```bash |
| 138 | +pip install -r requirements.txt |
| 139 | +``` |
| 140 | + |
| 141 | +### Run the application |
| 142 | + |
| 143 | +```bash |
| 144 | +shiny run app.py |
| 145 | +``` |
| 146 | + |
| 147 | +Or specify a custom host and port (binding to `0.0.0.0` makes the app reachable from other machines on the network):
| 148 | + |
| 149 | +```bash |
| 150 | +shiny run app.py --port 8000 --host 0.0.0.0 |
| 151 | +``` |
| 152 | + |
| 153 | +## Project Structure |
| 154 | + |
| 155 | +```plaintext |
| 156 | +bibliometrix-python/ |
| 157 | +│ |
| 158 | +├── app.py # Main application entry point |
| 159 | +├── requirements.txt # Python dependencies |
| 160 | +├── README.md |
| 161 | +│ |
| 162 | +├── functions/ # Analysis functions |
| 163 | +│ ├── get_annualproduction.py |
| 164 | +│ ├── get_averagecitations.py |
| 165 | +│ ├── get_bradfordlaw.py |
| 166 | +│ ├── get_relevantauthors.py |
| 167 | +│ ├── get_relevantsources.py |
| 168 | +│ └── ... (35+ analysis modules) |
| 169 | +│ |
| 170 | +├── www/ # Web application components |
| 171 | +│ ├── services/ # Core bibliometric services |
| 172 | +│ │ ├── parsers.py |
| 173 | +│ │ ├── format_functions.py |
| 174 | +│ │ ├── networkplot.py |
| 175 | +│ │ ├── thematicmap.py |
| 176 | +│ │ └── utils.py |
| 177 | +│ └── static/ # Static assets (CSS, JS) |
| 178 | +│ └── biblioshiny.css |
| 179 | +│ |
| 180 | +└── sources/ # Sample datasets and test files |
| 181 | + ├── Web_of_Science/ |
| 182 | + ├── Scopus/ |
| 183 | + ├── PubMed/ |
| 184 | + ├── Dimensions/ |
| 185 | + ├── Lens/ |
| 186 | + └── Cochrane/ |
| 187 | +``` |
| 188 | + |
| 189 | +## Key Features |
| 190 | + |
| 191 | +### Data Import and Processing |
| 192 | + |
| 193 | +bibliometrix-python supports importing bibliographic data from major scientific databases: |
| 194 | + |
| 195 | +- **Web of Science**: plaintext (.txt), BibTeX (.bib), EndNote (.ciw) - ✅ Fully supported |
| 196 | +- **Scopus**: CSV (.csv), BibTeX (.bib) - 🚧 In progress |
| 197 | +- **PubMed**: plaintext export - 🚧 In progress |
| 198 | +- **Dimensions**: Excel (.xlsx), CSV (.csv) - 🚧 In progress |
| 199 | +- **Lens.org**: CSV (.csv) - 🚧 In progress |
| 200 | +- **Cochrane**: plaintext (.txt) - 🚧 In progress |
| 201 | + |
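Web of Science plaintext exports store each record as a sequence of two-letter field tags (for example `AU`, `TI`, `PY`), with continuation lines indented and each record closed by `ER`. The sketch below shows how such a file could be split into records. It is an illustration under those format assumptions, not the code in `www/services/parsers.py`; the function name `parse_wos_plaintext` is hypothetical, and the sample record reuses the bibliometrix citation given in this README.

```python
def parse_wos_plaintext(text):
    """Split a WoS plaintext export into a list of {tag: [values]} records."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        head, value = line[:2], line[3:].strip()
        if head == "ER":  # end of record
            records.append(current)
            current, tag = {}, None
        elif head == "EF":  # end of file
            break
        elif head in ("FN", "VR"):  # file header lines
            continue
        elif head.strip():  # a new two-letter field tag
            tag = head
            current.setdefault(tag, []).append(value)
        elif tag is not None:  # indented continuation of the previous tag
            current[tag].append(value)
    return records


sample = "\n".join([
    "FN Clarivate Analytics Web of Science",
    "VR 1.0",
    "PT J",
    "AU Aria, M",
    "   Cuccurullo, C",
    "TI bibliometrix: An R-tool for comprehensive science mapping analysis",
    "SO JOURNAL OF INFORMETRICS",
    "PY 2017",
    "ER",
    "",
    "EF",
])
records = parse_wos_plaintext(sample)
print(records[0]["AU"], records[0]["PY"])
```

Keeping every field as a list of lines leaves the choice of how to join multi-line values (authors versus wrapped titles) to a later formatting step.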
| 202 | +### Comprehensive Bibliometric Analysis |
| 203 | + |
| 204 | +The application provides extensive analysis capabilities organized by analytical level: |
| 205 | + |
| 206 | +#### Overview Analysis |
| 207 | +- Main information and descriptive statistics |
| 208 | +- Annual scientific production |
| 209 | +- Average citations per year |
| 210 | +- Document type distribution |
| 211 | +- Keywords analysis |
| 212 | + |
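Annual scientific production is the number of documents published per year. As a rough sketch (the function name `annual_production` and the sample years are hypothetical, and this is not the code in `functions/get_annualproduction.py`), the counts can be derived from the publication-year field as follows:

```python
from collections import Counter


def annual_production(years):
    """Return {year: document count} for every year in the observed range."""
    counts = Counter(int(y) for y in years)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    # Fill years without publications with 0 so a time-series plot has no gaps
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


# Hypothetical publication years, mixing ints and strings as parsers often do
production = annual_production([2017, 2019, 2019, "2020"])
print(production)
```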
| 213 | +#### Sources Analysis |
| 214 | +- Most relevant sources (journals) |
| 215 | +- Most locally cited sources |
| 216 | +- Bradford's Law |
| 217 | +- Sources' local impact |
| 218 | +- Sources' production over time |
| 219 | + |
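Bradford's Law ranks sources by productivity and splits them into three zones that each account for roughly one third of the documents, with a small core of journals in the first zone. The sketch below illustrates one common way to assign zones from cumulative counts. It is an assumption-laden example rather than the project's method: `bradford_zones` and the source counts are hypothetical.

```python
def bradford_zones(source_counts):
    """Assign each source to zone 1, 2, or 3 by cumulative share of documents."""
    # Rank sources by descending document count (name breaks ties)
    ranked = sorted(source_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(source_counts.values())
    zones, cumulative = {}, 0
    for source, n_docs in ranked:
        cumulative += n_docs
        # Integer comparisons avoid floating-point issues at the 1/3 and 2/3 cuts
        if 3 * cumulative <= total:
            zones[source] = 1
        elif 3 * cumulative <= 2 * total:
            zones[source] = 2
        else:
            zones[source] = 3
    return zones


# Hypothetical number of documents per journal (60 in total)
counts = {"A": 20, "B": 10, "C": 10, "D": 5, "E": 5, "F": 4, "G": 3, "H": 3}
zones = bradford_zones(counts)
print(zones)
```

In this example the core zone holds a single journal while the third zone needs five journals to cover a similar number of documents. Note that with this cut rule, a single source holding more than a third of all documents leaves zone 1 empty.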
| 220 | +#### Authors Analysis |
| 221 | +- Most relevant authors |
| 222 | +- Most locally cited authors |
| 223 | +- Authors' production over time |
| 224 | +- Lotka's Law |
| 225 | +- Authors' local impact |
| 226 | +- Affiliations analysis |
| 227 | +- Author collaboration patterns |
| 228 | + |
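Lotka's Law describes author productivity: the number of authors who published n documents falls off roughly as 1/n^β, with β close to 2 in the classic formulation. The sketch below builds the observed distribution and estimates β with a least-squares fit on the log-log values. The function names and the sample data are hypothetical, and the fitting choice is an assumption rather than the project's confirmed approach.

```python
import math
from collections import Counter


def lotka_distribution(docs_per_author):
    """Return (n_docs, n_authors, share_of_authors) rows, sorted by n_docs."""
    freq = Counter(docs_per_author.values())
    n_authors = sum(freq.values())
    return [(n, freq[n], freq[n] / n_authors) for n in sorted(freq)]


def fit_lotka_beta(distribution):
    """Estimate beta as minus the slope of log(n_authors) against log(n_docs)."""
    xs = [math.log(n) for n, _, _ in distribution]
    ys = [math.log(count) for _, count, _ in distribution]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Hypothetical data following an exact inverse-square pattern:
# 16 authors with 1 document, 4 with 2 documents, 1 with 4 documents
docs_per_author = {f"S{i}": 1 for i in range(16)}
docs_per_author.update({f"D{i}": 2 for i in range(4)})
docs_per_author["Q0"] = 4

distribution = lotka_distribution(docs_per_author)
beta = fit_lotka_beta(distribution)
print(distribution, round(beta, 3))
```

The fit needs at least two distinct productivity levels; with real data, sparse high-productivity levels can dominate the slope, so the estimate is best read alongside a plot of the observed distribution.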
| 229 | +#### Documents Analysis |
| 230 | +- Most globally cited documents |
| 231 | +- Most locally cited documents |
| 232 | +- Most locally cited references |
| 233 | +- References spectroscopy |
| 234 | +- Frequent words analysis |
| 235 | +- Word clouds and treemaps |
| 236 | +- Words' frequency over time |
| 237 | +- Trend topics |
| 238 | + |
| 239 | +#### Network Analysis |
| 240 | +- Co-occurrence networks |
| 241 | +- Co-citation networks |
| 242 | +- Collaboration networks |
| 243 | +- Country collaboration maps |
| 244 | +- Thematic maps |
| 245 | +- Thematic evolution |
| 246 | +- Clustering analysis |
| 247 | +- Factorial analysis |
| 248 | +- Historiograph |
| 249 | + |
| 250 | +### Interactive Visualizations |
| 251 | + |
| 252 | +All analyses include interactive visualizations built with Plotly and other modern Python libraries: |
| 253 | + |
| 254 | +- Bar charts, line plots, and scatter plots |
| 255 | +- Network diagrams |
| 256 | +- Sankey diagrams (Three-Field Plot) |
| 257 | +- Heatmaps |
| 258 | +- Word clouds |
| 259 | +- Treemaps |
| 260 | +- Thematic maps |
| 261 | + |
| 262 | +### Export and Reporting |
| 263 | + |
| 264 | +- Export plots as high-resolution PNG images (customizable DPI) |
| 265 | +- Download tables as Excel files |
| 266 | +- Generate comprehensive reports combining multiple analyses |
| 267 | +- Add analyses to report collection for batch download |
| 268 | + |
| 269 | +## AI Assistant Integration (BETA) |
| 270 | + |
| 271 | +The application includes an AI-powered chatbot that uses the Google Gemini API to help users:
| 272 | + |
| 273 | +- Understand bibliometric concepts |
| 274 | +- Interpret analysis results |
| 275 | +- Get contextual help |
| 276 | +- Receive recommendations for further analysis |
| 277 | + |
| 278 | +**Note:** This feature is currently in BETA testing. |
| 279 | + |
| 280 | +To use the AI assistant, configure your Gemini API key in the Settings panel. |
| 281 | + |
| 282 | +## Acknowledgments |
| 283 | + |
| 284 | +This project is a Python reimplementation of the original **bibliometrix** R package developed by: |
| 285 | + |
| 286 | +**Massimo Aria** and **Corrado Cuccurullo** |
| 287 | +*University of Naples Federico II, Italy* |
| 288 | + |
| 289 | +We are grateful for their pioneering work in making bibliometric analysis accessible to researchers worldwide. |
| 290 | + |
| 291 | +For the original R implementation and comprehensive documentation, please visit: |
| 292 | +- Website: https://www.bibliometrix.org |
| 293 | +- GitHub: https://github.com/massimoaria/bibliometrix |
| 294 | + |
| 295 | +### Main References (Original bibliometrix) |
| 296 | + |
| 297 | +Aria, M. & Cuccurullo, C. (2017). **bibliometrix: An R-tool for comprehensive science mapping analysis**. *Journal of Informetrics*, 11(4), pp. 959-975. Elsevier. DOI: 10.1016/j.joi.2017.08.007
| 298 | + |
| 299 | +Aria, M., Le, T., Cuccurullo, C., Belfiore, A., & Choe, J. (2024). **openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex**. *The R Journal*, DOI: 10.32614/RJ-2023-089 |
| 300 | + |
| 301 | +Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2022). **Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy**. *Sustainability*, 14(6), 3643 |
| 302 | + |
| 303 | +For a complete list of references and applications, visit: https://www.bibliometrix.org |
| 304 | + |
| 305 | +## 🤝 Contributing |
| 306 | + |
| 307 | +We welcome contributions to improve the application! To contribute, simply open a pull request or report issues on our [issue tracker](https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues). We look forward to your improvements! |
| 308 | + |
| 309 | +## 👨‍💻 Team
| 310 | + |
| 311 | +This project was developed by: |
| 312 | + |
| 313 | +**Mariano Barone** · **Gian Marco Orlando** · **Giuseppe Riccio** · **Antonio Romano** · **Diego Russo** · **Vincenzo Moscato** |
| 314 | + |
| 315 | +*Department of Electrical Engineering and Information Technology* |
| 316 | +*University of Naples Federico II, Italy* |
| 317 | + |
| 318 | +**Research Lab:** The [PRAISE](https://github.com/PRAISELab) (PRedictive AnalytIcs for Understanding big multimEdia data) research group is part of the PICUS Lab at the Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Italy.
| 319 | + |
| 320 | +## 📄 License |
| 321 | + |
| 322 | +This application is distributed under the GNU General Public License as specified in the [LICENSE](LICENSE) file. |
| 323 | + |
| 324 | +When used in a publication, please cite the original bibliometrix R package (see the [How to cite](#how-to-cite) section).
| 325 | + |
| 326 | +## ⚠️ Development Notes |
| 327 | + |
| 328 | +**Note:** This is an independent Python implementation and may not be fully compatible with the R version. Some features are still under development. |
| 329 | + |
| 330 | +For detailed development status and known issues, please check the [issue tracker](https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues). |
| 331 | + |
| 332 | +--- |
| 333 | + |
| 334 | +<p align="center"> |
| 335 | +Made with ❤️ by PRAISELab Team at University of Naples Federico II |
| 336 | +</p> |