How to Add More Data to the Chatbot
Author: Georgi Peev
The chatbot's data pipeline consists of three main components:
- Parsers: Scripts that fetch and format data from the WFP Hungermap API
- Uploaders: Scripts that upload the parsed data to the MongoDB database
- Upload All: A utility file to run all parsers and uploaders in sequence
See also: Database Structure for the format of stored data.
Environment Setup
Before working with the data pipeline:
- Create and activate a Python virtual environment:
```bash
# Create venv
python -m venv venv

# Activate venv
# On Windows:
venv\Scripts\activate
# On Unix/macOS:
source venv/bin/activate
```
- Install required dependencies:
```bash
pip install -r requirements.txt
```
Available Data Types
| Data Type | Parser | Description | Collection Fields |
|---|---|---|---|
| Country Reports | api_country_reports.py | Country-specific reports | document_name, country_name, report_content |
| General Data | api_country_general_data.py | Basic country metrics | country_id, country_name, fcs, rcsi |
| Additional Data | api_country_additional_data.py | Extended metrics | regions_data, fcs_graph, rcsi_graph |
| PDC Data | api_country_pdc.py | Pacific Disaster Center data | event_type, severity, location |
| Conflict Data | api_country_conflict.py | Conflict events | event_type, occurrences, regions |
| IPC Data | api_country_ipc.py | Food security classification | phase, population, region |
| ISO3 Data | api_iso3_data.py | ISO3 country code mappings | iso3, country_name |
| Yearly Review | api_yearly_review.py | Annual review reports | document_name, year, report_content |
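The "Collection Fields" column maps directly onto document shapes in MongoDB. As a hedged sketch, a General Data document might look like the following (field names come from the table above; the values and the validation helper are illustrative, not from the codebase):

```python
# Illustrative General Data document; field names match the table,
# values are made up.
general_data_doc = {
    "country_id": 1,
    "country_name": "Example Country",
    "fcs": 0.25,   # Food Consumption Score metric
    "rcsi": 0.18,  # Reduced Coping Strategy Index metric
}

REQUIRED_FIELDS = {"country_id", "country_name", "fcs", "rcsi"}


def has_required_fields(doc, required=REQUIRED_FIELDS):
    """Return True if the document carries every field listed for its type."""
    return required.issubset(doc)
```

A check like this can catch incomplete rows before they reach the database.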
Adding New Data Through Parsers
Each parser fetches data from the WFP Hungermap API and saves the result in two formats in the src/assets/ directory:
- A cleaned and structured CSV file for database upload
- A raw JSON file containing the complete API response for debugging and reference
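The dual-format output can be sketched as follows. This is a minimal example under stated assumptions: the helper name and record layout are illustrative, not taken from the codebase.

```python
import csv
import json
import os


def save_both_formats(records, raw_response, output_dir, name):
    """Write cleaned records to CSV and the raw API response to JSON.

    `records` is a list of flat dicts (one per CSV row); `raw_response`
    is the unmodified JSON payload kept for debugging and reference.
    """
    os.makedirs(output_dir, exist_ok=True)

    # Cleaned, structured CSV for the database upload step
    csv_path = os.path.join(output_dir, f"{name}.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    # Raw JSON response preserved alongside the CSV
    json_path = os.path.join(output_dir, f"{name}.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(raw_response, f, indent=2)

    return csv_path, json_path
```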
Available Parsers
Located in src/parsers/:
- api_country_reports.py: Parses country reports
- api_country_general_data.py: Parses general country data
- api_country_additional_data.py: Parses additional country metrics
- api_country_pdc.py: Parses PDC (Pacific Disaster Center) data
- api_country_conflict.py: Parses conflict data
- api_country_ipc.py: Parses IPC (Integrated Food Security Phase Classification) data
- api_iso3_data.py: Parses ISO3 country code mappings
- api_yearly_review.py: Parses yearly review reports
Creating a New Parser
- Create a new parser file in src/parsers/:
```python
# api_your_data.py
import csv
import json
import os

import requests

from ..utils.country_utils import get_list_of_all_country_ids

API_ENDPOINT = "https://api.hungermapdata.org/v2/your-endpoint"
SRC = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


def parse_your_data():
    # Set up the output directory
    output_dir = os.path.join(SRC, "assets", "your_data_type")
    os.makedirs(output_dir, exist_ok=True)

    # Define paths and remove any files left over from a previous run
    csv_path = os.path.join(output_dir, "your_data.csv")
    json_path = os.path.join(output_dir, "your_data.json")
    for path in (csv_path, json_path):
        if os.path.exists(path):
            os.remove(path)

    # Your parsing logic goes here...


def main():
    parse_your_data()


if __name__ == "__main__":
    main()
```
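The "parsing logic" placeholder typically amounts to one request per country followed by flattening the response into CSV-ready rows. A minimal sketch, assuming the endpoint returns one JSON object per country; the `value` field and the injectable `fetch` parameter are illustrative, not part of the project's API:

```python
def fetch_rows(country_ids, endpoint, fetch=None, timeout=30):
    """Fetch one JSON payload per country and flatten it into rows.

    `fetch` is injectable so the flattening logic can be exercised
    without network access; by default it uses requests.
    """
    if fetch is None:
        import requests  # third-party; only needed without a custom fetcher

        def fetch(url):
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response.json()

    rows = []
    for country_id in country_ids:
        payload = fetch(f"{endpoint}/{country_id}")
        rows.append({
            "country_id": country_id,
            "value": payload.get("value"),  # illustrative field name
        })
    return rows
```

The rows returned here are what a parser would hand to its CSV-writing step.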
- Once you have added the file to src/parsers/, run the run_all_parsers.py script, which automatically detects and runs every parser in the folder. Note that after all parsers run, temporary files in assets/ are cleaned up.
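The auto-detection in run_all_parsers.py is not reproduced here, but a common pattern for this kind of discovery (an assumption about the approach, not the project's actual code) is to iterate over a package's modules and call each one's main():

```python
import importlib
import pkgutil


def run_all_modules(package):
    """Import every module in a package and call its main(), if it has one."""
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        main = getattr(module, "main", None)
        if callable(main):
            main()
```

Because discovery is by module, a new parser is picked up simply by dropping its file into the folder, with no registry to update.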
Adding Data Through Uploaders
Once the data is parsed, it needs to be uploaded to the MongoDB database.
Available Uploaders
Located in src/data_uploaders/:
Country Data Uploaders:
- db_upload_country_additional_data.py: Additional country metrics and features
- db_upload_country_and_region_data.py: Basic country and region information
- db_upload_country_conflict_data.py: Conflict events and statistics
- db_upload_country_economy_data.py: Economic indicators
- db_upload_country_fcs_data.py: Food Consumption Score data
- db_upload_country_ipc_data.py: Integrated Food Security Phase Classification
- db_upload_country_news.py: Country-specific news articles
- db_upload_country_nutrition_data.py: Nutrition statistics
- db_upload_country_pdc_data.py: Pacific Disaster Center events
- db_upload_country_population_data.py: Population statistics
- db_upload_country_rcsi_data.py: Reduced Coping Strategy Index data
Report Uploaders:
- db_upload_reports_data.py: Country-specific reports
- db_upload_yearly_reports_data.py: Annual review reports
Creating a New Uploader
- Create a new uploader in src/data_uploaders/:
```python
# db_upload_your_data.py
from ..utils.csv_utils import read_csv_data
from ..utils.db_utils import upload_chatbot_data

if __name__ == "__main__":
    # Read the CSV file produced by the matching parser
    data = read_csv_data("path/to/your_data.csv")

    # Format the rows for the database
    processed_data = []
    for row in data:
        processed_data.append({
            "document_name": f"your_identifier_{row['type']}",
            "data": row,
        })

    # Upload to MongoDB
    upload_chatbot_data(processed_data)
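The per-row formatting step in the template can be pulled into a small pure function, which makes it easy to check before wiring in the database helpers (the function name and the prefix are illustrative):

```python
def format_rows(rows, prefix):
    """Turn parsed CSV rows into MongoDB-ready documents."""
    return [
        {"document_name": f"{prefix}_{row['type']}", "data": row}
        for row in rows
    ]
```

Keeping the formatting separate from the upload call means a bad `document_name` scheme can be caught without touching the database.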
- Once you have added the file to src/data_uploaders/, run the run_all_uploaders.py script, which automatically detects and runs every uploader in the folder.