Friday, April 8, 2016

Data Normalization, Geocoding, and Error Assessment: Sand Mining Suitability Project

Goals

The goal of this assignment was to geocode the locations of all the sand mines in Wisconsin, then compare our results to the actual mine locations and to our classmates' geocoding results. Each student was given around 16 mines to geocode, which ensured that each of us could compare our results against several other people's geocoding.

Methods

The data was provided as an Excel table. The first step before geocoding was to normalize the address table. This was necessary because some mines were listed with a PLSS (Public Land Survey System) address, some with a street address, and some with both. Normalizing the table meant separating the two kinds of address. Some of the street addresses were also missing parts, such as a zip code, street name, or city, so those records had to be normalized further.
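
As a rough illustration of the separation step, here is a minimal Python sketch that splits one raw address cell into a street part and a PLSS part. It is not what was done in the lab, where the table was normalized by hand. The function name `normalize_address`, the regular expression, and the sample address are all assumptions, and the PLSS text in the real table may be formatted differently.

```python
import re

# Hypothetical pattern for PLSS text such as "S12 T25N R10W" or
# "Sec 12, T25N, R10W". The real table's format may differ.
PLSS_RE = re.compile(
    r"S(?:ec(?:tion)?)?\.?\s*(\d{1,2})\W+"   # section number
    r"T\.?\s*(\d{1,2})\s*([NS])\W+"          # township number and direction
    r"R\.?\s*(\d{1,2})\s*([EW])",            # range number and direction
    re.IGNORECASE,
)

def normalize_address(raw):
    """Split one raw address cell into a street part and a PLSS part."""
    match = PLSS_RE.search(raw)
    plss = None
    street = raw
    if match:
        sec, twp, twp_dir, rng, rng_dir = match.groups()
        # Rewrite the PLSS part in one consistent form.
        plss = f"S{int(sec)} T{int(twp)}{twp_dir.upper()} R{int(rng)}{rng_dir.upper()}"
        # Whatever is left after removing the PLSS text is the street part.
        street = raw[:match.start()] + raw[match.end():]
    # Collapse repeated whitespace and trim stray commas and spaces.
    street = " ".join(street.split()).strip(" ,")
    return {"street": street or None, "plss": plss}

# Made-up example address, not one of the actual mines.
print(normalize_address("N1234 County Road X, Sec 12, T25N, R10W"))
# {'street': 'N1234 County Road X', 'plss': 'S12 T25N R10W'}
```

A record that comes back with `street` set to `None` can only be located through its PLSS description, and a record with `plss` set to `None` depends entirely on the address locator.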

Using ArcMap, we signed into the UWEC enterprise account, which made the geocoding tools available. The first step in the geocoding was to add the table to ArcMap so that the software's geocoding tools could match the addresses in the table. The initial geocoding results were very messy because the data was incomplete. Before finding mines manually, the mines not assigned to me were separated out. Once only the assigned mines were left, it was time to go through each mine to find its actual location.

First, the mines with complete street addresses were checked for accuracy. Ideally, the geocoding had already placed them correctly. In a couple of cases, the address locator couldn't find the address, so these addresses had to be found manually. In most cases, Google Maps came in handy for finding the mines because it is easier to navigate and covers ground faster. If an address was incomplete and no PLSS address was given, the mine could still be located, because all of the mines' locations were also given in latitude and longitude. Once all of the street-address mines were located, the PLSS mines needed to be located.

The PLSS shapefile was added to ArcMap. All of the PLSS addresses were given and were relatively easy to find, but finding them was a long and tedious process. The PLSS addresses were given as a series of directional units (townships and ranges) and numbered squares (sections).
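
Part of what made the PLSS search tedious is the numbering of the squares. Assuming the numbered squares are standard PLSS sections, each township is a 6 by 6 grid of 36 sections numbered in a back-and-forth pattern: section 1 is in the northeast corner, the first row runs west to section 6, the second row runs back east from 7 to 12, and so on down to section 36 in the southeast corner. The sketch below converts a section number into a grid position. The function name `section_position` is hypothetical, and this was not part of the lab workflow.

```python
def section_position(section):
    """Return (row, col) of a section within its township.

    row 0 is the northern row and col 0 is the western column.
    """
    if not 1 <= section <= 36:
        raise ValueError("PLSS sections are numbered 1 through 36")
    row, offset = divmod(section - 1, 6)
    # Rows 0, 2, 4 are numbered east to west; rows 1, 3, 5 west to east.
    col = 5 - offset if row % 2 == 0 else offset
    return row, col

print(section_position(1))   # (0, 5): northeast corner
print(section_position(6))   # (0, 0): northwest corner
print(section_position(7))   # (1, 0): directly south of section 6
print(section_position(36))  # (5, 5): southeast corner
```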

The second part of the assignment was to gather the data from the students who geocoded the same mines as I did. First, all of their mines were added to ArcMap and inspected. There were visible differences in the geocoding. The next step was to merge all of the classmates' data into one feature class.

Once the data was merged, the shapefile of actual mine locations was added to ArcMap. After comparing my own and my classmates' geocoding to the correct locations, I generated a near table. This gave the distances from my geocoded points to the correct locations and from my classmates' geocoded points to the correct locations. The results are shown in the next section in two maps and two tables.
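
To show what a near table contains, here is a minimal Python sketch that pairs each geocoded point with its closest actual point and reports the distance in meters. It is a stand-in for the ArcMap tool, not a copy of it: it uses the haversine great-circle formula on latitude and longitude with an assumed spherical Earth radius of 6,371 km, which may not match the distance method used in the lab. The function names and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def near_table(geocoded, actual):
    """For each geocoded point, find the nearest actual point.

    Both arguments map an ID to a (lat, lon) tuple. Returns a list of
    (geocoded_id, nearest_actual_id, distance_in_meters) rows.
    """
    rows = []
    for gid, (glat, glon) in geocoded.items():
        nearest_id, nearest_d = None, float("inf")
        for aid, (alat, alon) in actual.items():
            d = haversine_m(glat, glon, alat, alon)
            if d < nearest_d:
                nearest_id, nearest_d = aid, d
        rows.append((gid, nearest_id, nearest_d))
    return rows

# Made-up coordinates, not the real mine locations.
geocoded = {"my_mine_1": (44.80, -91.50), "my_mine_2": (44.95, -90.10)}
actual = {"mine_A": (44.80, -91.50), "mine_B": (45.00, -90.00)}
for row in near_table(geocoded, actual):
    print(row)
```

A distance of zero means a geocoded point sits exactly on the actual location, while a very large distance points to an outlier worth checking by hand.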




Results


The following map shows the differences between the mines that I mapped and the mines that my classmates mapped. For the most part, the mines appear to be in the same general area. In reality, there is quite a difference once we look at them at a larger scale. There is also one point located on the east side of the state. Figure 2 is a table that lists the distances between all of our data points in meters.
Figure 1. Map of Wisconsin showing geocoded mines

Figure 2. Distances between the mapped points in Figure 1.



Below, Figure 3 shows the difference between my mapped mines and the actual locations of the mines. For the most part, my mines are in the right general location, and some of them are in exactly the correct spot. However, as with the map above, the errors become more apparent at a larger scale. Figure 4 is a table of the distances in meters from my mapped points to the actual mine locations.

Figure 3. My mapped points compared to the actual mine locations.


Figure 4. Distances in meters between my points and the correct points.



Discussion

After doing this assignment, it is easier to look at data and think about the potential errors in it. From the start, we were told that the data would contain some address errors and that some of the addresses would be incomplete. I even found some addresses where two digits of the street number had been switched. Errors like these can come up at any time when working with GIS data. The majority of the errors in this assignment would be classified as operational errors, which arise mostly in the collecting and managing of data. The main challenge of this lab was geocoding data that contained operational errors. Inherent errors appeared in the results of our geocoding, and they have to be kept in mind when looking at and interpreting those results. Inherent errors occur because the real world is far more complex than our ability to map it. For example, a couple of mines, when viewed on an imagery base map, didn't look like they were actually located where they were supposed to be. This is all part of GIS: knowing that not everything is going to work perfectly is very important.





Conclusion

This exercise really showed how difficult geocoding and working with error-filled data can be. It showed that things are not always going to work smoothly. We need to learn to work around errors and know how to fix them. It also showed the importance of keeping data organized.













