Right now, we are gathering data to determine what yes/no ratio a location needs to reach a given confidence level that it contains a school, and what ratio it needs to reach a given confidence level that it does not. As ground truth for this initial validation, and for setting these yes/no thresholds, we are using a dataset that has been manually validated by a trained team of "data mappers". The more users play the game and the more answers we collect, the better we’ll be able to define these thresholds. In any case, no single person will have the power to validate a school alone.
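One common way to turn the yes/no counts for a location into a confidence statement is a binomial confidence interval on the share of "YES" answers. The sketch below uses the Wilson score interval. That technique, the 0.8 threshold, the minimum of 3 votes and the function names are all assumptions for illustration, not the thresholds the project will settle on. A location is labeled only once the whole interval clears the threshold, so one or two votes can never decide it on their own.

```python
import math


def wilson_interval(yes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the share of YES answers among `total` answers.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        return 0.0, 1.0  # no answers yet: the share could be anything
    p = yes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - margin, center + margin


def classify_location(yes: int, no: int, threshold: float = 0.8, min_votes: int = 3) -> str:
    """Hypothetical rule: label a location only when the interval is confidently
    above `threshold` (school) or below 1 - threshold (no school)."""
    total = yes + no
    if total < min_votes:
        return "undecided"  # never let a handful of players decide alone
    lo, hi = wilson_interval(yes, total)
    if lo >= threshold:
        return "school"
    if hi <= 1 - threshold:
        return "no_school"
    return "undecided"
```

With these hypothetical settings, 20 "YES" and 0 "NO" gives a lower bound of about 0.84, enough to label the location a school, while 5 "YES" and 0 "NO" (lower bound about 0.57) stays undecided until more answers arrive.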
The ratings will be used in the following ways:
- For locations rated with a high number of "NO", this information will be sent to the government (our main source of data), and we’ll work with them to correct the geolocation tagging of the corresponding schools.
- Locations rated with a high number of either "NO" or "YES" will be used to train our ML algorithms.
- For locations rated with a high number of "UNSURE", we will consider that satellite imagery is not sufficient to determine whether the location contains a school, and these locations won’t be used to train our ML algorithms.
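The three rules above can be expressed as a small routing function. This is a hypothetical sketch: the action names, the minimum of 5 answers and the 70% share are placeholders, since the actual thresholds are still being calibrated against the data mappers' ground truth.

```python
from collections import Counter


def route_location(answers: list[str], min_answers: int = 5, share: float = 0.7) -> set[str]:
    """Map the answers for one location to the (hypothetical) follow-up actions."""
    counts = Counter(a.upper() for a in answers)
    total = sum(counts.values())
    if total < min_answers:
        return set()  # not enough answers yet; keep collecting
    if counts["UNSURE"] / total >= share:
        # Imagery is not sufficient: keep the location out of ML training.
        return {"exclude_from_training"}
    if counts["NO"] / total >= share:
        # Likely mis-geolocated school: report it and use it as a negative example.
        return {"report_to_government", "train_as_no_school"}
    if counts["YES"] / total >= share:
        return {"train_as_school"}
    return set()  # no clear majority either way
```

A plain majority share is used here only to keep the example short; a confidence-interval rule like the Wilson bound could replace the share checks without changing the routing logic.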