How to Add More Data to the Chatbot
Author: Georgi Peev
The chatbot's data pipeline consists of three main components:
- Parsers: Scripts that fetch and format data from the WFP Hungermap API
- Uploaders: Scripts that upload the parsed data to the MongoDB database
- Upload All: A utility file to run all parsers and uploaders in sequence
See also: Database Structure for the format of stored data.
Environment Setup
Before working with the data pipeline:
- Create and activate a Python virtual environment:

```bash
# Create venv
python -m venv venv

# Activate venv
# On Windows:
venv\Scripts\activate
# On Unix/macOS:
source venv/bin/activate
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```
Available Data Types
| Data Type | Parser | Description | Collection Fields |
|---|---|---|---|
| Country Reports | `api_country_reports.py` | Country-specific reports | `document_name`, `country_name`, `report_content` |
| General Data | `api_country_general_data.py` | Basic country metrics | `country_id`, `country_name`, `fcs`, `rcsi` |
| Additional Data | `api_country_additional_data.py` | Extended metrics | `regions_data`, `fcs_graph`, `rcsi_graph` |
| PDC Data | `api_country_pdc.py` | Pacific Disaster Center data | `event_type`, `severity`, `location` |
| Conflict Data | `api_country_conflict.py` | Conflict events | `event_type`, `occurrences`, `regions` |
| IPC Data | `api_country_ipc.py` | Food security classification | `phase`, `population`, `region` |
| ISO3 Data | `api_iso3_data.py` | ISO3 country code mappings | `iso3`, `country_name` |
| Yearly Review | `api_yearly_review.py` | Annual review reports | `document_name`, `year`, `report_content` |
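As a sketch of what one stored document might look like, here is a hypothetical General Data record. The field names come from the table above; the values are made up for illustration and are not real WFP Hungermap figures:

```python
# Hypothetical example of a "General Data" document; field names match the
# table above, values are invented for illustration only.
general_data_doc = {
    "country_id": 231,           # numeric country identifier (example value)
    "country_name": "Ethiopia",
    "fcs": 18.4,                 # Food Consumption Score (example value)
    "rcsi": 12.1,                # reduced Coping Strategies Index (example value)
}

print(sorted(general_data_doc))
```

Each uploader writes documents in the shape of its collection's fields, so the table doubles as a quick schema reference.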
Adding New Data Through Parsers
The parsers fetch the available data from the WFP Hungermap API. The acquired raw data is saved in two formats in the `src/assets/` directory:
- A cleaned and structured CSV file for database upload
- A raw JSON file containing the complete API response, for debugging and reference
Available Parsers
Located in `src/parsers/`:
- `api_country_reports.py`: Parses country reports
- `api_country_general_data.py`: Parses general country data
- `api_country_additional_data.py`: Parses additional country metrics
- `api_country_pdc.py`: Parses PDC (Pacific Disaster Center) data
- `api_country_conflict.py`: Parses conflict data
- `api_country_ipc.py`: Parses IPC (Integrated Food Security Phase Classification) data
- `api_iso3_data.py`: Parses ISO3 country code mappings
- `api_yearly_review.py`: Parses yearly review reports
Creating a New Parser
- Create a new parser file in `src/parsers/`:
```python
# api_your_data.py
import csv
import json
import os

import requests

from ..utils.country_utils import get_list_of_all_country_ids

API_ENDPOINT = "https://api.hungermapdata.org/v2/your-endpoint"
SRC = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


def parse_your_data():
    # Set up the output directory
    output_dir = os.path.join(SRC, "assets", "your_data_type")
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Define paths and clean up any existing output files
    csv_path = os.path.join(output_dir, "your_data.csv")
    json_path = os.path.join(output_dir, "your_data.json")
    if os.path.exists(csv_path):
        os.remove(csv_path)
    if os.path.exists(json_path):
        os.remove(json_path)

    # Your parsing logic goes here...


def main():
    parse_your_data()


if __name__ == "__main__":
    main()
```
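The elided parsing logic is where a parser fetches the endpoint and writes its two outputs. A minimal, self-contained sketch of the writing step follows; it uses made-up rows in place of a real API response and a temporary directory in place of `src/assets/`, so only the CSV/JSON pattern itself is being illustrated:

```python
import csv
import json
import os
import tempfile

# Made-up rows standing in for a parsed API response.
rows = [
    {"country_id": 1, "value": 10},
    {"country_id": 2, "value": 20},
]

# Temporary directory stands in for src/assets/<your_data_type>/.
output_dir = tempfile.mkdtemp()
csv_path = os.path.join(output_dir, "your_data.csv")
json_path = os.path.join(output_dir, "your_data.json")

# Raw JSON dump of the full response, kept for debugging and reference.
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)

# Cleaned, structured CSV for the database upload step.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["country_id", "value"])
    writer.writeheader()
    writer.writerows(rows)
```

In a real parser the rows would come from `requests.get(API_ENDPOINT)`, typically once per country id from `get_list_of_all_country_ids()`.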
- Once you have added the file to `src/parsers/`, you can run the `run_all_parsers.py` script, which automatically detects and runs every parser in the folder. Note that after all parsers run, temporary files in `assets/` are cleaned up.
Adding Data Through Uploaders
Once the data is parsed, it needs to be uploaded to the MongoDB database.
Available Uploaders
Located in `src/data_uploaders/`:

Country Data Uploaders:
- `db_upload_country_additional_data.py`: Additional country metrics and features
- `db_upload_country_and_region_data.py`: Basic country and region information
- `db_upload_country_conflict_data.py`: Conflict events and statistics
- `db_upload_country_economy_data.py`: Economic indicators
- `db_upload_country_fcs_data.py`: Food Consumption Score data
- `db_upload_country_ipc_data.py`: Integrated Food Security Phase Classification data
- `db_upload_country_news.py`: Country-specific news articles
- `db_upload_country_nutrition_data.py`: Nutrition statistics
- `db_upload_country_pdc_data.py`: Pacific Disaster Center events
- `db_upload_country_population_data.py`: Population statistics
- `db_upload_country_rcsi_data.py`: Reduced Coping Strategy Index data

Report Uploaders:
- `db_upload_reports_data.py`: Country-specific reports
- `db_upload_yearly_reports_data.py`: Annual review reports
Creating a New Uploader
- Create a new uploader in `src/data_uploaders/`:
```python
# db_upload_your_data.py
from ..utils.csv_utils import read_csv_data
from ..utils.db_utils import upload_chatbot_data

if __name__ == "__main__":
    # Read the parsed CSV file
    data = read_csv_data("path/to/your_data.csv")

    # Format the rows for the database
    processed_data = []
    for row in data:
        processed_data.append({
            "document_name": f"your_identifier_{row['type']}",
            "data": row,
        })

    # Upload to MongoDB
    upload_chatbot_data(processed_data)
```
- Once you have added the file to `src/data_uploaders/`, you can run the `run_all_uploaders.py` script, which automatically detects and runs every uploader in the folder.