Commit 2e2e91c

Bibliometrix-Python v1.0.0
0 parents  commit 2e2e91c

101 files changed

+1128314
-0
lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
__pycache__/
2+
bibliovenv/
3+
Bibenv/
4+
.idea/

LICENSE

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
bibliometrix-python Package for Python - Tool for Quantitative Research in Bibliometrics and Scientometrics.
2+
3+
Copyright (C) 2025 PRAISELab Team - University of Naples Federico II
4+
5+
Based on the original bibliometrix R package:
6+
Copyright (C) 2016 Massimo Aria and Corrado Cuccurullo
7+
8+
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
9+
10+
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
11+
12+
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

README.md

Lines changed: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
1+
<!-- README.md for bibliometrix-python -->
2+
3+
# bibliometrix-python
4+
5+
## A Python tool for comprehensive science mapping analysis
6+
7+
[![bibliometrix: An R-tool for comprehensive science mapping
8+
analysis.](https://www.bibliometrix.org/JOI-badge.svg)](https://doi.org/10.1016/j.joi.2017.08.007)
9+
10+
<p align="center">
11+
<img src="https://www.bibliometrix.org/logo_new.png" width="400"/>
12+
</p>
13+
14+
## Overview
15+
16+
**bibliometrix-python** is a Python implementation of the renowned **bibliometrix** R package, providing a comprehensive set of tools for quantitative research in bibliometrics and scientometrics.
17+
18+
This project reimplements the core functionality of [bibliometrix](https://github.com/massimoaria/bibliometrix) (developed by Massimo Aria and Corrado Cuccurullo) using Python and the Shiny for Python framework, making these powerful bibliometric tools accessible to the Python scientific community.
19+
20+
Bibliometrics applies quantitative analysis and statistics to scientific publications and their citation patterns. It has become essential across all scientific fields for evaluating growth, maturity, leading authors, conceptual and intellectual maps, and emerging trends within research communities.
21+
22+
**bibliometrix-python** supports scholars in three key phases of analysis:
23+
24+
- **Data importing and conversion** from major bibliographic databases (Web of Science, Scopus, PubMed, Dimensions, Lens, Cochrane)
25+
26+
- **Bibliometric analysis** of publication datasets, including descriptive statistics, author productivity, and source impact
27+
28+
- **Building and visualizing networks** for co-citation, coupling, collaboration, and co-word analysis
29+
30+
## biblioshiny: Python Edition
31+
32+
**bibliometrix-python** includes an interactive web application built with **Shiny for Python**, providing an intuitive interface for comprehensive bibliometric analysis.
33+
34+
The web application enables scholars to easily access bibliometric analysis features through an interactive workflow:
35+
36+
### Data Management
37+
38+
- **Import and convert** data from multiple bibliographic databases:
39+
- Web of Science (plaintext, BibTeX, EndNote) - ✅ Fully supported
40+
- Scopus (CSV, BibTeX) - 🚧 In progress
41+
- PubMed (plaintext export) - 🚧 In progress
42+
- Dimensions (Excel, CSV) - 🚧 In progress
43+
- Lens.org (CSV) - 🚧 In progress
44+
- Cochrane CDSR (plaintext) - 🚧 In progress
45+
46+
- **Filter data** by various criteria including publication years, languages, document types, citation counts, and Bradford's Law zones
47+
48+
- **Sample datasets** for testing and learning
49+
50+
### Analytics and Visualization
51+
52+
- **Three-level metrics** for comprehensive analysis:
53+
54+
- **Sources**: journal performance, impact metrics, Bradford's Law, sources' local impact, production over time
55+
56+
- **Authors**: productivity analysis, Lotka's Law, collaboration patterns, h-index, local impact, affiliations analysis
57+
58+
- **Documents**: citation analysis, most relevant papers, references spectroscopy
59+
60+
- **Countries Analysis**: scientific production by country, collaboration networks, corresponding authors' countries
61+
62+
### Knowledge Structure Analysis
63+
64+
- **Conceptual Structure**: analyzing topics and themes through co-word analysis, thematic mapping, and thematic evolution
65+
66+
- **Intellectual Structure**: examining citation networks through co-citation analysis, historiograph, and document coupling
67+
68+
- **Social Structure**: exploring collaboration patterns through co-authorship networks at author, institution, and country levels
69+
70+
### Content Analysis Features
71+
72+
- **Word Analysis**: frequent words, word clouds, treemaps, word frequency over time
73+
74+
- **Trend Topics**: identify emerging and declining research topics
75+
76+
- **Three-Field Plot**: Sankey diagrams for exploring relationships between authors, keywords, and journals
77+
78+
### Advanced Features
79+
80+
- **AI-Powered Assistant**: Integrated Google Gemini AI chatbot for contextual help and insights - 🧪 BETA
81+
82+
- **Interactive Reports**: Generate comprehensive Excel reports combining multiple analyses
83+
84+
- **Export Capabilities**: Download plots as high-resolution images and tables as Excel files
85+
86+
### How to use biblioshiny
87+
88+
To launch the application, simply run:
89+
90+
```bash
91+
shiny run app.py
92+
```
93+
94+
Or using Python:
95+
96+
```bash
97+
python -m shiny run app.py
98+
```
99+
100+
The application will start and provide a local URL (typically `http://127.0.0.1:8000`) to access the web interface.
101+
102+
## How to cite
103+
104+
If you use this package for your research, please cite the original R package:
105+
106+
Aria, M. & Cuccurullo, C. (2017) **bibliometrix: An R-tool for comprehensive science mapping analysis**, *Journal of Informetrics*, 11(4), pp 959-975, Elsevier, DOI: 10.1016/j.joi.2017.08.007
107+
108+
## Community
109+
110+
**Original bibliometrix (R version):**
111+
- Official website: https://www.bibliometrix.org
112+
- CRAN page: https://cran.r-project.org/package=bibliometrix
113+
- GitHub repository: https://github.com/massimoaria/bibliometrix
114+
115+
**Python implementation:**
116+
- GitHub repository: https://github.com/PRAISELab-PicusLab/bibliometrix-python
117+
- Issue tracker: https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues
118+
119+
## Installation
120+
121+
### Prerequisites
122+
123+
- Python 3.9 or higher
124+
- pip package manager
125+
126+
### Install from source
127+
128+
Clone the repository:
129+
130+
```bash
131+
git clone https://github.com/PRAISELab-PicusLab/bibliometrix-python.git
132+
cd bibliometrix-python
133+
```
134+
135+
Install dependencies:
136+
137+
```bash
138+
pip install -r requirements.txt
139+
```
140+
141+
### Run the application
142+
143+
```bash
144+
shiny run app.py
145+
```
146+
147+
Or specify custom host and port:
148+
149+
```bash
150+
shiny run app.py --port 8000 --host 0.0.0.0
151+
```
152+
153+
## Project Structure
154+
155+
```plaintext
156+
bibliometrix-python/
157+
158+
├── app.py # Main application entry point
159+
├── requirements.txt # Python dependencies
160+
├── README.md
161+
162+
├── functions/ # Analysis functions
163+
│ ├── get_annualproduction.py
164+
│ ├── get_averagecitations.py
165+
│ ├── get_bradfordlaw.py
166+
│ ├── get_relevantauthors.py
167+
│ ├── get_relevantsources.py
168+
│ └── ... (35+ analysis modules)
169+
170+
├── www/ # Web application components
171+
│ ├── services/ # Core bibliometric services
172+
│ │ ├── parsers.py
173+
│ │ ├── format_functions.py
174+
│ │ ├── networkplot.py
175+
│ │ ├── thematicmap.py
176+
│ │ └── utils.py
177+
│ └── static/ # Static assets (CSS, JS)
178+
│ └── biblioshiny.css
179+
180+
└── sources/ # Sample datasets and test files
181+
├── Web_of_Science/
182+
├── Scopus/
183+
├── PubMed/
184+
├── Dimensions/
185+
├── Lens/
186+
└── Cochrane/
187+
```
188+
189+
## Key Features
190+
191+
### Data Import and Processing
192+
193+
bibliometrix-python supports importing bibliographic data from major scientific databases:
194+
195+
- **Web of Science**: plaintext (.txt), BibTeX (.bib), EndNote (.ciw) - ✅ Fully supported
196+
- **Scopus**: CSV (.csv), BibTeX (.bib) - 🚧 In progress
197+
- **PubMed**: plaintext export - 🚧 In progress
198+
- **Dimensions**: Excel (.xlsx), CSV (.csv) - 🚧 In progress
199+
- **Lens.org**: CSV (.csv) - 🚧 In progress
200+
- **Cochrane**: plaintext (.txt) - 🚧 In progress
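
As a rough illustration of what importing a Web of Science plaintext export involves, the sketch below parses the field-tagged format (two-letter tags, indented continuation lines, `ER` closing a record, `EF` closing the file). The function name `parse_wos_plaintext` is hypothetical and is not taken from this repository's `parsers.py`; treat it as a minimal sketch rather than the package's importer.

```python
from collections import defaultdict


def parse_wos_plaintext(text):
    """Parse field-tagged Web of Science plaintext into a list of records.

    Hypothetical helper for illustration only. Each record is a dict that
    maps a two-letter tag (e.g. "AU", "TI") to the list of lines under it.
    """
    records = []
    current = defaultdict(list)
    tag = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith(" "):
            # Indented lines continue the most recent tag (e.g. more authors).
            if tag is not None:
                current[tag].append(line.strip())
            continue
        tag, value = line[:2], line[2:].strip()
        if tag == "ER":            # end of record
            records.append(dict(current))
            current = defaultdict(list)
            tag = None
        elif tag == "EF":          # end of file
            break
        elif tag in ("FN", "VR"):  # file header lines, not record fields
            tag = None
        else:
            current[tag].append(value)
    return records


sample = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Aria, M
   Cuccurullo, C
TI bibliometrix: An R-tool for comprehensive science mapping analysis
PY 2017
ER

EF
"""

records = parse_wos_plaintext(sample)
```

Here `records[0]["AU"]` holds both author lines, because the indented line is treated as a continuation of the `AU` field.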
201+
202+
### Comprehensive Bibliometric Analysis
203+
204+
The application provides extensive analysis capabilities organized by analytical level:
205+
206+
#### Overview Analysis
207+
- Main information and descriptive statistics
208+
- Annual scientific production
209+
- Average citations per year
210+
- Document type distribution
211+
- Keywords analysis
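
To make the first overview metrics concrete, here is a minimal sketch of annual production and average citations per year. It assumes each document is a dict with a publication year under `PY` and a total citation count under `TC` (the Web of Science field tags). The function names and the choice to divide by the years elapsed up to a caller-supplied `reference_year` are assumptions for illustration, not the package's implementation.

```python
from collections import Counter, defaultdict


def annual_production(docs):
    """Count documents per publication year, sorted by year."""
    counts = Counter(d["PY"] for d in docs)
    return dict(sorted(counts.items()))


def mean_citations_per_year(docs, reference_year):
    """Mean total citations per document for each publication year,
    divided by the number of years elapsed up to reference_year.

    Assumption: documents published in the reference year count as
    one citable year, which avoids division by zero.
    """
    by_year = defaultdict(list)
    for d in docs:
        by_year[d["PY"]].append(d["TC"])
    result = {}
    for year, cites in sorted(by_year.items()):
        mean_tc = sum(cites) / len(cites)
        citable_years = max(reference_year - year, 1)
        result[year] = mean_tc / citable_years
    return result


docs = [
    {"PY": 2020, "TC": 10},
    {"PY": 2020, "TC": 20},
    {"PY": 2022, "TC": 6},
]

production = annual_production(docs)            # {2020: 2, 2022: 1}
per_year = mean_citations_per_year(docs, 2025)  # {2020: 3.0, 2022: 2.0}
```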
212+
213+
#### Sources Analysis
214+
- Most relevant sources (journals)
215+
- Most locally cited sources
216+
- Bradford's Law
217+
- Sources' local impact
218+
- Sources' production over time
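
Bradford's Law ranks sources by productivity and splits them into three zones that each hold roughly one third of the articles, with a small core in Zone 1. The sketch below shows one simple way to assign zones. The function name `bradford_zones` and the cut rule (a source's zone is decided by the cumulative article count before that source is added) are assumptions chosen for clarity and may differ from how `get_bradfordlaw.py` handles the boundaries.

```python
from collections import Counter


def bradford_zones(source_per_document):
    """Assign each source to Bradford zone 1, 2, or 3.

    source_per_document: one source name per document in the collection.
    Sources are ranked by descending article count; a source belongs to
    the zone in which the cumulative count stood before it was added.
    """
    counts = Counter(source_per_document)
    total = sum(counts.values())
    zones = {}
    cumulative = 0
    for source, n in counts.most_common():
        # Integer comparisons avoid floating-point issues at the thirds.
        if 3 * cumulative < total:
            zones[source] = 1
        elif 3 * cumulative < 2 * total:
            zones[source] = 2
        else:
            zones[source] = 3
        cumulative += n
    return zones


sources = ["A"] * 6 + ["B"] * 3 + ["C"] * 3 + ["D"] * 2 + ["E"] * 2 + ["F", "G"]
zones = bradford_zones(sources)
# {'A': 1, 'B': 2, 'C': 2, 'D': 3, 'E': 3, 'F': 3, 'G': 3}
```

In this toy collection of 18 articles, one source covers the first third, two cover the second, and four cover the last.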
219+
220+
#### Authors Analysis
221+
- Most relevant authors
222+
- Most locally cited authors
223+
- Authors' production over time
224+
- Lotka's Law
225+
- Authors' local impact
226+
- Affiliations analysis
227+
- Author collaboration patterns
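
Author impact measures such as the h-index can be computed directly from per-document citation counts. The following sketch assumes each document is a dict with an author list under `AU` and a citation count under `TC`. The helper names are hypothetical and the code is not taken from the repository.

```python
from collections import defaultdict


def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_index_by_author(docs):
    """Compute the h-index of every author appearing in the collection."""
    cites_by_author = defaultdict(list)
    for d in docs:
        for author in d["AU"]:
            cites_by_author[author].append(d["TC"])
    return {author: h_index(c) for author, c in cites_by_author.items()}


docs = [
    {"AU": ["ARIA M", "CUCCURULLO C"], "TC": 10},
    {"AU": ["ARIA M"], "TC": 3},
    {"AU": ["ARIA M"], "TC": 2},
    {"AU": ["CUCCURULLO C"], "TC": 1},
]

impact = h_index_by_author(docs)  # {'ARIA M': 2, 'CUCCURULLO C': 1}
```

The citation counts in this example are invented for illustration. Because they are restricted to the documents in the analyzed collection, the result reflects impact within that collection rather than an author's full career.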
228+
229+
#### Documents Analysis
230+
- Most globally cited documents
231+
- Most locally cited documents
232+
- Most locally cited references
233+
- References spectroscopy
234+
- Frequent words analysis
235+
- Word clouds and treemaps
236+
- Words' frequency over time
237+
- Trend topics
238+
239+
#### Network Analysis
240+
- Co-occurrence networks
241+
- Co-citation networks
242+
- Collaboration networks
243+
- Country collaboration maps
244+
- Thematic maps
245+
- Thematic evolution
246+
- Clustering analysis
247+
- Factorial analysis
248+
- Historiograph
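
Most of these networks start from a co-occurrence count: two items are linked whenever they appear in the same document, and the edge weight is the number of documents they share. The sketch below builds such weighted edges from keyword lists using only the standard library. The function name `cooccurrence_edges` and the lowercase normalization are assumptions for illustration, not the logic in `networkplot.py`.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how many documents each unordered keyword pair shares.

    keyword_lists: one list of keywords per document.
    Returns a Counter keyed by alphabetically ordered (a, b) tuples.
    """
    edges = Counter()
    for keywords in keyword_lists:
        # Normalize and deduplicate so a pair counts once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


keyword_lists = [
    ["Bibliometrics", "Science Mapping", "R"],
    ["bibliometrics", "science mapping"],
    ["Bibliometrics", "Co-citation"],
]

edges = cooccurrence_edges(keyword_lists)
# edges[("bibliometrics", "science mapping")] == 2
```

The resulting edge list can be passed to a graph library for clustering or layout. Co-citation and collaboration networks follow the same pattern, with cited references or authors in place of keywords.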
249+
250+
### Interactive Visualizations
251+
252+
All analyses include interactive visualizations built with Plotly and other modern Python libraries:
253+
254+
- Bar charts, line plots, and scatter plots
255+
- Network diagrams
256+
- Sankey diagrams (Three-Field Plot)
257+
- Heatmaps
258+
- Word clouds
259+
- Treemaps
260+
- Thematic maps
261+
262+
### Export and Reporting
263+
264+
- Export plots as high-resolution PNG images (customizable DPI)
265+
- Download tables as Excel files
266+
- Generate comprehensive reports combining multiple analyses
267+
- Add analyses to report collection for batch download
268+
269+
## AI Assistant Integration (BETA)
270+
271+
The application includes an AI-powered chatbot using Google Gemini API to help users:
272+
273+
- Understand bibliometric concepts
274+
- Interpret analysis results
275+
- Get contextual help
276+
- Receive recommendations for further analysis
277+
278+
**Note:** This feature is currently in BETA testing.
279+
280+
To use the AI assistant, configure your Gemini API key in the Settings panel.
281+
282+
## Acknowledgments
283+
284+
This project is a Python reimplementation of the original **bibliometrix** R package developed by:
285+
286+
**Massimo Aria** and **Corrado Cuccurullo**
287+
*University of Naples Federico II, Italy*
288+
289+
We are grateful for their pioneering work in making bibliometric analysis accessible to researchers worldwide.
290+
291+
For the original R implementation and comprehensive documentation, please visit:
292+
- Website: https://www.bibliometrix.org
293+
- GitHub: https://github.com/massimoaria/bibliometrix
294+
295+
### Main References (Original bibliometrix)
296+
297+
Aria, M. & Cuccurullo, C. (2017). **bibliometrix: An R-tool for comprehensive science mapping analysis**, *Journal of Informetrics*, 11(4), pp 959-975, Elsevier, DOI: 10.1016/j.joi.2017.08.007
298+
299+
Aria, M., Le, T., Cuccurullo, C., Belfiore, A., & Choe, J. (2024). **openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex**. *The R Journal*, DOI: 10.32614/RJ-2023-089
300+
301+
Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2022). **Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy**. *Sustainability*, 14(6), 3643
302+
303+
For a complete list of references and applications, visit: https://www.bibliometrix.org
304+
305+
## 🤝 Contributing
306+
307+
We welcome contributions to improve the application! To contribute, simply open a pull request or report issues on our [issue tracker](https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues). We look forward to your improvements!
308+
309+
## 👨‍💻 Team
310+
311+
This project was developed by:
312+
313+
**Mariano Barone** · **Gian Marco Orlando** · **Giuseppe Riccio** · **Antonio Romano** · **Diego Russo** · **Vincenzo Moscato**
314+
315+
*Department of Electrical Engineering and Information Technology*
316+
*University of Naples Federico II, Italy*
317+
318+
**Research Lab:** The [PRAISE](https://github.com/PRAISELab) (PRedictive AnalytIcs for underStanding big multimEdia data) research group is part of the PICUS Lab at the Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Italy.
319+
320+
## 📄 License
321+
322+
This application is distributed under the GNU General Public License, version 2 or (at your option) any later version, as specified in the [LICENSE](LICENSE) file.
323+
324+
When used in a publication, please cite the original bibliometrix R package (see [How to cite](#how-to-cite) section).
325+
326+
## ⚠️ Development Notes
327+
328+
**Note:** This is an independent Python implementation and may not be fully compatible with the R version. Some features are still under development.
329+
330+
For detailed development status and known issues, please check the [issue tracker](https://github.com/PRAISELab-PicusLab/bibliometrix-python/issues).
331+
332+
---
333+
334+
<p align="center">
335+
Made with ❤️ by PRAISELab Team at University of Naples Federico II
336+
</p>
