Investigación Indoamérica
An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
Journal
Agriculture (Switzerland)
Date Issued
2023
Author(s)
Santos, Fabián
Centro de Investigación para el Territorio y el Hábitat Sostenible
Acosta N.
Type
Article
DOI
10.3390/agriculture13051015
URL
https://cris.indoamerica.edu.ec/handle/123456789/8244
Abstract
Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also results in incomplete cases. These tasks are often labor-intensive since they require a case-wise review to obtain the requested and completed information. To address these problems, an approach based on Selenium web-scraping software and the multiple imputation denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, with 72 species of food crops based on the data from 3 different open data web databases. This methodology resulted in an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops, whose imputed data obtained an R-square of 0.84 for a control numerical parameter selected for validation. This enriched dataset was later clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion. © 2023 by the authors.
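The validation step mentioned in the abstract (scoring imputed values for a control numerical parameter with R-square) can be illustrated with a minimal sketch. This is not the authors' code: the imputer here is a plain least-squares line standing in for MIDAS, and the column names, the toy values, and the function names (`r_squared`, `fit_line`, `validate_imputation`) are all hypothetical, chosen only to show the hold-out idea of hiding known values, imputing them, and comparing against the truth.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in imputer, NOT MIDAS)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def validate_imputation(rows, predictor, control, holdout_idx):
    """Hide the control value on hold-out rows, impute it, score with R-square."""
    holdout = set(holdout_idx)
    train = [r for i, r in enumerate(rows) if i not in holdout]
    a, b = fit_line([r[predictor] for r in train], [r[control] for r in train])
    truth = [rows[i][control] for i in holdout_idx]
    imputed = [a + b * rows[i][predictor] for i in holdout_idx]
    return r_squared(truth, imputed)


# Hypothetical toy records; not taken from the paper's dataset.
rows = [
    {"protein_g": 1.0, "energy_kcal": 50.0},
    {"protein_g": 2.0, "energy_kcal": 93.0},
    {"protein_g": 3.5, "energy_kcal": 150.0},
    {"protein_g": 5.0, "energy_kcal": 205.0},
    {"protein_g": 7.0, "energy_kcal": 298.0},
    {"protein_g": 9.0, "energy_kcal": 371.0},
]

score = validate_imputation(rows, "protein_g", "energy_kcal", holdout_idx=[1, 4])
print(f"R-square on held-out control values: {score:.2f}")
```

The same scoring function applies whatever imputer is used; swapping `fit_line` for a multiple-imputation model would leave `r_squared` and the hold-out logic unchanged.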
Subjects
Alzheimer; apache net...
Scopus© citations
1
Acquisition Date
Jun 6, 2024
Views
3
Acquisition Date
Dec 2, 2024