
Tim_Tex
Veteran

Joined: 2 Jul 2004
Age: 42
Gender: Male
Posts: 43,400
Location: Houston, Texas

11 Mar 2021, 11:50 pm

I am trying to make a map that shows economic and unemployment data for each county. When I reach the point where I merge my Excel data with the shapefile data, every column that comes from the Excel data is NaN.

If there are any Python programmers who know what's going on, your help would be greatly appreciated.

Here is my code thus far:

Quote:
#Import required libraries

import numpy as np
import pandas as pd
import geopandas as gpd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns

#Set the filepath and load in a shapefile

fp = 'C:/Users/timho/UScounties/UScounties.shp'

map_df = gpd.read_file(fp)
print (map_df)

NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS \
0 Lake of the Woods Minnesota 27 077 27077
1 Ferry Washington 53 019 53019
2 Stevens Washington 53 065 53065
3 Okanogan Washington 53 047 53047
4 Pend Oreille Washington 53 051 53051
... ... ... ... ... ...
3136 Skagway-Hoonah-Angoon Alaska 02 232 02232
3137 Yukon-Koyukuk Alaska 02 290 02290
3138 Southeast Fairbanks Alaska 02 240 02240
3139 Denali Alaska 02 068 02068
3140 Broomfield Colorado 08 014 08014

geometry
0 POLYGON ((-95.34283 48.54668, -95.34105 48.715...
1 POLYGON ((-118.85163 47.94956, -118.84846 48.4...
2 POLYGON ((-117.43883 48.04412, -117.54219 48.0...
3 POLYGON ((-118.97209 47.93915, -118.97406 47.9...
4 POLYGON ((-117.43858 48.99992, -117.03205 48.9...
... ...
3136 MULTIPOLYGON (((-137.80952 58.71648, -137.4674...
3137 POLYGON ((-161.04770 62.20469, -160.99428 62.8...
3138 POLYGON ((-146.96382 63.46070, -146.95735 64.2...
3139 POLYGON ((-152.98947 62.74900, -152.48773 63.1...
3140 POLYGON ((-105.05201 39.99761, -104.99139 40.0...

[3141 rows x 6 columns]

#Prep Shapefile data

map_df = map_df.drop(columns=["STATE_FIPS", "CNTY_FIPS"])
map_df.head()

NAME STATE_NAME FIPS geometry
0 Lake of the Woods Minnesota 27077 POLYGON ((-95.34283 48.54668, -95.34105 48.715...
1 Ferry Washington 53019 POLYGON ((-118.85163 47.94956, -118.84846 48.4...
2 Stevens Washington 53065 POLYGON ((-117.43883 48.04412, -117.54219 48.0...
3 Okanogan Washington 53047 POLYGON ((-118.97209 47.93915, -118.97406 47.9...
4 Pend Oreille Washington 53051 POLYGON ((-117.43858 48.99992, -117.03205 48.9...

%matplotlib inline

map_df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x289acce5e88>

df = pd.read_csv('C:/Users/timho/Econ_Plan.csv', header=0)
df.head()

fips state area_name urban_or_rural ue_rate_2015 ue_rate_2016 ue_rate_2017 ue_rate_2018 ue_rate_2019 avg_ue_rate_chg high_ue main_industry fossil_fuel hhi_2019
0 1001 AL Autauga County, AL urban 5.2 5.1 3.9 3.6 2.7 -0.6 no utilities yes 58,233
1 1003 AL Baldwin County, AL urban 5.5 5.3 4.1 3.6 2.7 -0.7 no hospitality no 59,871
2 1005 AL Barbour County, AL rural 8.9 8.3 5.8 5.1 3.8 -1.3 no mining yes 35,972
3 1007 AL Bibb County, AL urban 6.6 6.4 4.4 3.9 3.1 -0.9 no construction no 47,918
4 1009 AL Blount County, AL urban 5.4 5.4 4.0 3.5 2.7 -0.7 no manufacturing no 52,902

df = df[['fips', 'urban_or_rural', 'ue_rate_2015', 'ue_rate_2019', 'avg_ue_rate_chg', 'main_industry', 'fossil_fuel']]
data_for_map = df.rename(
    index=str,
    columns={
        'urban_or_rural': 'makeup',
        'ue_rate_2015': 'unemp_15',
        'ue_rate_2019': 'unemp_19',
        'avg_ue_rate_chg': 'net_change',
    },
)

#Check the dataframe

data_for_map.head()

fips makeup unemp_15 unemp_19 net_change main_industry fossil_fuel
0 1001 urban 5.2 2.7 -0.6 utilities yes
1 1003 urban 5.5 2.7 -0.7 hospitality no
2 1005 rural 8.9 3.8 -1.3 mining yes
3 1007 urban 6.6 3.1 -0.9 construction no
4 1009 urban 5.4 2.7 -0.7 manufacturing no

#Join the geodataframe with the cleaned up csv dataframe

merged = map_df.set_index('FIPS').join(data_for_map.set_index('fips'))

merged.head()

NAME STATE_NAME geometry makeup unemp_15 unemp_19 net_change main_industry fossil_fuel
FIPS
27077 Lake of the Woods Minnesota POLYGON ((-95.34283 48.54668, -95.34105 48.715... NaN NaN NaN NaN NaN NaN
53019 Ferry Washington POLYGON ((-118.85163 47.94956, -118.84846 48.4... NaN NaN NaN NaN NaN NaN
53065 Stevens Washington POLYGON ((-117.43883 48.04412, -117.54219 48.0... NaN NaN NaN NaN NaN NaN
53047 Okanogan Washington POLYGON ((-118.97209 47.93915, -118.97406 47.9... NaN NaN NaN NaN NaN NaN
53051 Pend Oreille Washington POLYGON ((-117.43858 48.99992, -117.03205 48.9... NaN NaN NaN NaN NaN NaN

#Set a variable that will call whatever column we want to visualize on the map

variable = 'unemp_19'

#Set the range for the choropleth

vmin, vmax = 0, 20

#Create figure and axes for Matplotlib

fig, ax = plt.subplots(1, figsize=(10, 6))

#Create map

merged.plot(column=variable, cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-185-3dd9742880f0> in <module>
----> 1 merged.plot(column=variable, cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8')

~\Anaconda3\lib\site-packages\geopandas\geodataframe.py in plot(self, *args, **kwargs)
604 from there.
605 """
--> 606 return plot_dataframe(self, *args, **kwargs)
607
608 plot.__doc__ = plot_dataframe.__doc__

~\Anaconda3\lib\site-packages\geopandas\plotting.py in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, classification_kwds, **style_kwds)
542 values = np.array(binning.yb)
543
--> 544 mn = values[~np.isnan(values)].min() if vmin is None else vmin
545 mx = values[~np.isnan(values)].max() if vmax is None else vmax
546

~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims, initial)
30 def _amin(a, axis=None, out=None, keepdims=False,
31 initial=_NoValue):
---> 32 return umr_minimum(a, axis, None, out, keepdims, initial)
33
34 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,

ValueError: zero-size array to reduction operation minimum which has no identity


_________________
Who’s better at math than a robot? They’re made of math!


zacb
Veteran

Joined: 7 May 2012
Age: 27
Gender: Male
Posts: 1,117

14 Mar 2021, 3:47 pm

I half wonder whether the path might be part of the issue, but that is just based on my own experience.

Also, I know that if a value needs to be converted to a string, that can sometimes be a bit of an issue.
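A minimal sketch of how a string conversion issue could produce exactly the all-NaN join shown in the first post. In the printed output, the shapefile's FIPS values keep their leading zeros (for example 02232), which suggests text, while the CSV's fips values appear without them (1001), which suggests integers. The two small frames below are hypothetical stand-ins, not the real files; the unemp_19 values are copied from the CSV preview, and the NAME values are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile attributes: FIPS is text, zero-padded to 5 characters.
map_df = pd.DataFrame(
    {"FIPS": ["01001", "01003"], "NAME": ["Autauga", "Baldwin"]}
)

# Hypothetical stand-in for the CSV as read_csv would infer it: fips is an integer column.
data_for_map = pd.DataFrame({"fips": [1001, 1003], "unemp_19": [2.7, 2.7]})

# Joining a text index against an integer index finds no matching labels,
# so every column from the right-hand frame is filled with NaN.
mismatched = map_df.set_index("FIPS").join(data_for_map.set_index("fips"))
print(mismatched["unemp_19"].isna().all())

# Convert the CSV key to zero-padded text so both sides use the same labels.
fixed = data_for_map.assign(fips=data_for_map["fips"].astype(str).str.zfill(5))
merged = map_df.set_index("FIPS").join(fixed.set_index("fips"))
print(merged)
```

Passing dtype={'fips': str} to pd.read_csv would keep the column as text, but if the file itself stores 1001 rather than 01001, the zfill(5) step would still be needed.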

One final thing: if worst comes to worst, and assuming the file is not too big, parsing it as a plain text file might work better. I once had a 2 GB file that would fail to load even with 32 GB of RAM. I opened it as a regular file in Python and parsed it myself, and that actually worked better.

I wish I could be of more help. The only major thing I might do differently is the path separator: \ might work better than /, assuming this is on Windows.
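On the separator question, a small sketch that can be run on any operating system: the standard library's pathlib treats / and \ as the same separator in a Windows path, so the forward slashes in the original code are unlikely to be the cause. The file name is taken from the first post.

```python
from pathlib import PureWindowsPath

# The same location written with forward slashes and with backslashes.
# The r"..." prefix makes the second one a raw string, so the backslashes
# are not read as escape sequences (a plain "C:\Users..." literal would
# fail to compile because \U starts a Unicode escape).
forward = PureWindowsPath("C:/Users/timho/Econ_Plan.csv")
backward = PureWindowsPath(r"C:\Users\timho\Econ_Plan.csv")

# PureWindowsPath normalizes both separators, so the two compare equal.
same = forward == backward
print(same)
print(forward.name)
```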



Fenn
Veteran

Joined: 1 Sep 2014
Gender: Male
Posts: 1,240
Location: Pennsylvania

22 Mar 2021, 9:16 pm

I would first try exporting the Excel spreadsheet as text.
I like tab-separated text, but CSV is another option.
I would then try writing the Python code to read one line at a time, printing each line.
I would use RosettaCode to find reference code.
Then, when I was sure I was reading each line correctly, I would break each line into fields and print each field for each line.
I like to use a variable called "debug" to turn "tracer code" on and off. That way I can see what I am doing.
After I had the fields breaking up correctly, I would work on making sure I was converting text to numbers correctly.
Numbers can be used for math (adding zero or multiplying by 1.0). Text cannot.
When that was all working, I would work on the graphing, or the generating of SVG shapes.
Take one thing at a time: get it working, then move on.
I can stay focused for longer if I take small steps and can see the results each time.
Using classes and functions can help to break up the code into smaller chunks.
TDD is a good way to do that, as is reading about "refactoring".
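The steps above could be sketched roughly like this, using the standard csv module and a DEBUG flag for the tracer output. The sample rows are copied from the CSV preview in the first post; the column subset, the quoting of the fields that contain commas, and the helper names (trace, parse_rows) are assumptions made for illustration.

```python
import csv
import io

DEBUG = True  # set to False to turn the tracer output off


def trace(*args):
    """Print a tracer line only while DEBUG is switched on."""
    if DEBUG:
        print("[debug]", *args)


# Two rows from the CSV preview, held in memory so the sketch runs on its own.
# The quoting of "Autauga County, AL" and "58,233" is an assumption.
SAMPLE = io.StringIO(
    'fips,area_name,ue_rate_2019,hhi_2019\n'
    '1001,"Autauga County, AL",2.7,"58,233"\n'
    '1003,"Baldwin County, AL",2.7,"59,871"\n'
)


def parse_rows(handle):
    """Read one record at a time, trace the raw fields, then convert types."""
    rows = []
    for record in csv.DictReader(handle):
        trace("raw fields:", dict(record))
        rows.append({
            "fips": record["fips"].zfill(5),                        # keep as padded text
            "area_name": record["area_name"],
            "ue_rate_2019": float(record["ue_rate_2019"]),          # text -> number
            "hhi_2019": int(record["hhi_2019"].replace(",", "")),   # drop thousands comma
        })
    return rows


rows = parse_rows(SAMPLE)
trace("converted:", rows)
```

Each step prints what it saw, so a field that arrives as the wrong type shows up before any joining or plotting is attempted.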


_________________
ADHD-I(diagnosed) ASD-HF(undiagnosed - maybe)
RDOS scores - Aspie score 131/200 - neurotypical score 69/200 - very likely Aspie


Fenn
Veteran

Joined: 1 Sep 2014
Gender: Male
Posts: 1,240
Location: Pennsylvania

29 Mar 2021, 8:58 pm

P.S.
Have you made any progress?
I tried to get your Python code running on my Mac, but I couldn't get the pandas library to compile from pip.
(I used Windows and Linux at work but have a Mac for my home use - not really built out for development - I bought it just to create my resume).
I may try again with miniconda when I have more time.
What is the exact version of Python that you are running? Python 2.x or 3.x? On Windows 10 or Windows 7?
It looks like what you posted was an interactive session - do you have the same code in a .py file to run it as a script?

Best of luck




Tim_Tex
Veteran

Joined: 2 Jul 2004
Age: 42
Gender: Male
Posts: 43,400
Location: Houston, Texas

06 Apr 2021, 4:53 pm

It turns out there were compatibility issues between the version of Python (3.8) and the data libraries I was using.
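For anyone chasing a similar mismatch, a small sketch that reports the interpreter version next to the installed versions of the libraries imported in the first post. The helper name installed_versions is made up for illustration, and the package list is an assumption about which libraries matter; importlib.metadata requires Python 3.8 or newer.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return the Python version plus the installed version of each package."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


report = installed_versions(["numpy", "pandas", "geopandas", "matplotlib"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside a traceback makes it easier for others to reproduce the same environment.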




Fenn
Veteran

Joined: 1 Sep 2014
Gender: Male
Posts: 1,240
Location: Pennsylvania

09 Apr 2021, 6:51 pm

Glad you found a solution.
A lot of Python intros are written for Linux. For Windows, it looks like creating a Python virtual environment with Miniconda or Anaconda is the way to go.
I use Cygwin on Windows and it doesn't play well with Python.
venv is one of the places where that is especially true.
Mixing pip and Miniconda might cause problems with library versions too.
MSYS2 is supposed to be more Python friendly; they patched CPython a lot to get there.
It is kind of like Perl on Windows was years ago.

