Cancer County EDA with PandaSQL¶

Summary¶

Cancer is the second leading cause of death in the United States, after heart disease. Nearly everyone in the US today knows someone who has been affected by cancer, whether through a death in the family, watching a friend recover with chemotherapy, or meeting a survivor who prays for continued remission. Although the cancer death rate has decreased by 27% over the last 20 years (1, CDC), cancer remains a looming fear for most people, especially as they approach their 50s.

In this exploratory data analysis I hope to uncover insights into cancer rates across the US, perhaps giving someone more in tune with cultural, political, or geographic causes and datasets a place to start their own discoveries. And maybe create a comprehensive list of states and counties to avoid for the sake of your personal health...

Kidding aside, I hope this EDA, or its use of pandasql, is useful to you. I used three slightly different syntaxes for running SQL against pandas dataframes throughout this notebook. I will point them out as they come up and go over the strengths and weaknesses I found in each while doing this project.

Preparation¶

Dataset used¶

This data comes from cancer.gov and the US Census American Community Survey, and was compiled and released by another Kaggle user, Noah Rippner. Other users are free to use, manipulate, and transform the data at will with appropriate credit. Thank you!

Dataset Organization¶

There are two tables within this dataset: one reporting incidence rates of cancer across the counties in the US, and another reporting death rates from cancer-related causes within the US. The initial data was collected on 05/09/2017 and was added to until 2020, when it was published from data.world to Kaggle. The data includes the county, state, FIPS, age-adjusted death rate/100k, average deaths per year per county, recent trend of aa_death_rate, upper and lower 95% confidence intervals of trends, 5 year trends in death and incidence, average annual count of diagnoses per county, and, interestingly enough, a Met45.5 objective: in short, did this county/state maintain a 45.5/100,000 age-adjusted death rate? That is incredibly low compared to the US average of 144.1/100,000 (1, CDC).

Dataset Integrity¶

There are 3,141 individual rows of data, each representing a different county apart from the first row, which holds the US total. There are 3,143 counties in the 50 states, and 3,243 when territories are included. This data does include D.C. and Rhode Island, along with various boroughs and census areas in Alaska that bring the count up. Even with every county listed in its own row, there are quite a few nulls that we will get into later. The data creator scraped various sources to complete the incidence data, but no data was available for Nevada.

Processing of the Data¶

Analysis Tools¶

Throughout this notebook I tried to use SQL through pandasql as much as possible. A few steps had to be done strictly in Python to manipulate the dataframe before analysis with SQL. I am used to R and RStudio syntax/tools, so please help me build my understanding of Python by critiquing my code in the comments! After the initial publishing of this EDA using pandasql, I will create a dashboard in Tableau Public as soon as possible to visualize the insights found in the hard numbers of the data. This would be a great dataset for practicing matplotlib or plotly; however, within the scope of this project I think SQL and Tableau better serve my purposes.

Packages for Analysis¶

To use SQL in Python I imported:

  • NumPy for manipulations that pandas does not handle

  • pandas to create dataframes and transform the data from .CSV into dataframes to be used as database tables

  • pandasql to manipulate the pandas dataframes with basic SQL queries

Reading the .csv files with pandas loads each one straight into a dataframe, without having to connect to a database or convert the files into SQL tables, which was very helpful. I then set the maximum number of displayed rows to 100, since there were times when the table would print 3,000+ rows, which DRASTICALLY slowed down the system and was not all that helpful for data processing. By default pandas truncates the printed table to the first and last few rows; however, I found a lot of the really cool findings were about 50 rows down.
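The loading step described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's original cell: the inline CSV text and its header names are hypothetical stand-ins for the real files, and only `pd.read_csv` and the `display.max_rows` option are taken as given.

```python
import io
import pandas as pd

# Hypothetical stand-in for one of the dataset's CSV files. In the notebook
# this would be a file path passed to pd.read_csv (file name not shown here).
sample_csv = io.StringIO(
    "County,FIPS,Met Objective,Death Rate\n"
    '"Perry County, Kentucky",21193,No,125.6\n'
    '"Powell County, Kentucky",21197,No,125.3\n'
)

# read_csv turns the CSV straight into a dataframe, no database needed.
death = pd.read_csv(sample_csv)

# Allow up to 100 rows to print before pandas truncates the output.
pd.set_option("display.max_rows", 100)

print(death.shape)
```

Raising `display.max_rows` only changes how frames are printed; it does not change the data itself.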

After loading in the data I knew I had to rename the columns to be able to use SQL, which is already particular about column names from specific tables. So I made them more concise, using '_', avoiding '.', and refraining from capitalization, while hopefully maintaining the general meaning of each column throughout the analysis.
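A minimal sketch of that renaming step, assuming hypothetical original headers (the real CSV headers are not shown in this notebook). The target names match the ones printed in the tables below.

```python
import pandas as pd

# Hypothetical original headers, used only for illustration.
death = pd.DataFrame({
    "County": ["Perry County, Kentucky"],
    "Age-Adjusted Death Rate": ["125.6"],
    "Average Deaths per Year": ["43"],
})

# An explicit mapping keeps the rename readable and SQL-friendly:
# underscores instead of spaces or dots, no capital letters.
death = death.rename(columns={
    "County": "county",
    "Age-Adjusted Death Rate": "aa_deathrate",
    "Average Deaths per Year": "avgdeath_year",
})

print(list(death.columns))
```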

      index                                        county   FIPS met45 aa_deathrate Lower95  \
0         0                                 United States      0    No           46    45.9   
1         1                        Perry County, Kentucky  21193    No        125.6   108.9   
2         2                       Powell County, Kentucky  21197    No        125.3   100.2   
3         3                   North Slope Borough, Alaska   2185    No        124.9      73   
4         4                       Owsley County, Kentucky  21189    No        118.5    83.1   
...     ...                                           ...    ...   ...          ...     ...   
3136   3136  Yakutat City and Borough, Alaska<sup>3</sup>   2282     *            *       *   
3137   3137             Yukon-Koyukuk Census Area, Alaska   2290     *            *       *   
3138   3138                          Zapata County, Texas  48505     *            *       *   
3139   3139                          Zavala County, Texas  48507     *            *       *   
3140   3140                  Ziebach County, South Dakota  46137     *            *       *   

     upper95 avgdeath_year trending five_year lower95_trend upper95_trend  
0       46.1       157,376  falling      -2.4          -2.6          -2.2  
1      144.2            43   stable      -0.6          -2.7           1.6  
2      155.1            18   stable       1.7             0           3.4  
3      194.7             5       **        **            **            **  
4      165.5             8   stable       2.2          -0.4           4.8  
...      ...           ...      ...       ...           ...           ...  
3136       *             *       **        **            **            **  
3137       *             *       **        **            **            **  
3138       *             *        *         *             *             *  
3139       *             *       **        **            **            **  
3140       *             *       **        **            **            **  

[3141 rows x 12 columns]
      index                            county   FIPS aa_deathrate_per_100k lower95 upper95  \
0         0              US (SEER+NPCR)(1,10)      0                  62.4    62.3    62.6   
1         1     Autauga County, Alabama(6,10)   1001                  74.9    65.1    85.7   
2         2     Baldwin County, Alabama(6,10)   1003                  66.9    62.4    71.7   
3         3     Barbour County, Alabama(6,10)   1005                  74.6    61.8    89.4   
4         4        Bibb County, Alabama(6,10)   1007                  86.4      71   104.2   
...     ...                               ...    ...                   ...     ...     ...   
3136   3136  Sweetwater County, Wyoming(6,10)  56037                  39.9    30.5    51.1   
3137   3137       Teton County, Wyoming(6,10)  56039                  23.7    14.7    36.1   
3138   3138       Uinta County, Wyoming(6,10)  56041                  31.7    20.8    46.1   
3139   3139    Washakie County, Wyoming(6,10)  56043                    50    33.8    72.2   
3140   3140      Weston County, Wyoming(6,10)  56045                  44.9    27.9    69.6   

     avg_cases_annually trending five_year lower95_trend upper95_trend  
0                214614  falling      -2.5            -3            -2  
1                    43   stable       0.5         -14.9          18.6  
2                   170   stable         3         -10.2          18.3  
3                    25   stable      -6.4         -18.3           7.3  
4                    23   stable      -4.5         -31.4          32.9  
...                 ...      ...       ...           ...           ...  
3136                 14   stable      12.6         -18.1          54.9  
3137                  5   stable     -19.6         -35.5           0.1  
3138                  6   stable      -0.1         -18.3            22  
3139                  6   stable      13.5         -12.2          46.7  
3140                  4   stable     -26.2         -65.4          57.4  

[3141 rows x 11 columns]

After printing the tables I could see the changes took effect and I was ready to go! The first thing I did was drop the upper and lower 95% confidence bounds of the trends, as they were not really useful information for my EDA. To my knowledge pandasql does not have a function to drop whole columns, but I left the typical syntax in, in case the functionality does come online, as the package is always being updated.
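Two common ways to get the same effect are sketched below: dropping the columns in pandas, or selecting only the columns to keep in SQL. This is an illustration under assumptions, not the notebook's original cell. It uses the standard-library `sqlite3` module together with pandas to stay self-contained; in the notebook the same query string would go through pandasql, e.g. `sqldf(query, locals())`.

```python
import sqlite3
import pandas as pd

incd = pd.DataFrame({
    "county": ["Autauga County, Alabama(6,10)"],
    "five_year": ["0.5"],
    "lower95_trend": ["-14.9"],
    "upper95_trend": ["18.6"],
})

# Option 1: drop the unwanted columns in pandas.
trimmed = incd.drop(columns=["lower95_trend", "upper95_trend"])

# Option 2: in SQL, list only the columns to keep.
conn = sqlite3.connect(":memory:")
incd.to_sql("incd", conn, index=False)
trimmed_sql = pd.read_sql_query("SELECT county, five_year FROM incd", conn)
conn.close()

print(trimmed_sql)
```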

index county FIPS aa_deathrate_per_100k lower95 upper95 avg_cases_annually trending five_year
0 0 US (SEER+NPCR)(1,10) 0 62.4 62.3 62.6 214614 falling -2.5
1 1 Autauga County, Alabama(6,10) 1001 74.9 65.1 85.7 43 stable 0.5
2 2 Baldwin County, Alabama(6,10) 1003 66.9 62.4 71.7 170 stable 3
3 3 Barbour County, Alabama(6,10) 1005 74.6 61.8 89.4 25 stable -6.4
4 4 Bibb County, Alabama(6,10) 1007 86.4 71 104.2 23 stable -4.5
... ... ... ... ... ... ... ... ... ...
3136 3136 Sweetwater County, Wyoming(6,10) 56037 39.9 30.5 51.1 14 stable 12.6
3137 3137 Teton County, Wyoming(6,10) 56039 23.7 14.7 36.1 5 stable -19.6
3138 3138 Uinta County, Wyoming(6,10) 56041 31.7 20.8 46.1 6 stable -0.1
3139 3139 Washakie County, Wyoming(6,10) 56043 50 33.8 72.2 6 stable 13.5
3140 3140 Weston County, Wyoming(6,10) 56045 44.9 27.9 69.6 4 stable -26.2

3141 rows × 9 columns

Next I parsed the state from the county information as best I could using Python. In a full SQL dialect this would have been easy with LIKE combined with RIGHT, LEFT, or a string-split function. In Python I created new columns by splitting at the first space, deleted the piece that was just the word 'County', and split again to get the portion I renamed to state. It was not the most elegant solution (multi-word names such as 'North Slope Borough, Alaska' end up with 'Borough, Alaska' as the state), but it allowed me to group in SQL fairly accurately.
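For comparison, here is a sketch of an alternative approach, not the one used in this notebook: split once from the right on ", " so multi-word county names stay intact, then strip trailing footnote markers such as "(6,10)" or "<sup>3</sup>". The column name `county_name` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"county": [
    "United States",
    "Perry County, Kentucky",
    "North Slope Borough, Alaska",
    "Autauga County, Alabama(6,10)",
    "Yakutat City and Borough, Alaska<sup>3</sup>",
]})

# Split only at the last ", " so the county name keeps all of its words.
parts = df["county"].str.rsplit(", ", n=1, expand=True)
df["county_name"] = parts[0]

# Remove a trailing "(...)" or "<sup>...</sup>" footnote marker from the state.
df["state"] = parts[1].str.replace(
    r"\s*(\([^)]*\)|<sup>.*?</sup>)\s*$", "", regex=True
)

print(df[["county_name", "state"]])
```

Rows without a ", " separator, such as the US total, end up with a missing state, which makes them easy to filter out before grouping.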

index county FIPS met45 aa_deathrate Lower95 upper95 avgdeath_year trending five_year state
0 0 United 0 No 46 45.9 46.1 157,376 falling -2.4 None
1 1 Perry 21193 No 125.6 108.9 144.2 43 stable -0.6 Kentucky
2 2 Powell 21197 No 125.3 100.2 155.1 18 stable 1.7 Kentucky
3 3 North 2185 No 124.9 73 194.7 5 ** ** Borough, Alaska
4 4 Owsley 21189 No 118.5 83.1 165.5 8 stable 2.2 Kentucky
... ... ... ... ... ... ... ... ... ... ... ...
3136 3136 Yakutat 2282 * * * * * ** ** and Borough, Alaska<sup>3</sup>
3137 3137 Yukon-Koyukuk 2290 * * * * * ** ** Area, Alaska
3138 3138 Zapata 48505 * * * * * * * Texas
3139 3139 Zavala 48507 * * * * * ** ** Texas
3140 3140 Ziebach 46137 * * * * * ** ** South Dakota

3141 rows × 11 columns

Then I did the same processing on the incd table to keep the two tables consistent, which allows me to JOIN their information.

index county FIPS aa_deathrate_per_100k lower95 upper95 avg_cases_annually trending five_year state
0 0 US 0 62.4 62.3 62.6 214614 falling -2.5 None
1 1 Autauga 1001 74.9 65.1 85.7 43 stable 0.5 Alabama(6,10)
2 2 Baldwin 1003 66.9 62.4 71.7 170 stable 3 Alabama(6,10)
3 3 Barbour 1005 74.6 61.8 89.4 25 stable -6.4 Alabama(6,10)
4 4 Bibb 1007 86.4 71 104.2 23 stable -4.5 Alabama(6,10)
... ... ... ... ... ... ... ... ... ... ...
3136 3136 Sweetwater 56037 39.9 30.5 51.1 14 stable 12.6 Wyoming(6,10)
3137 3137 Teton 56039 23.7 14.7 36.1 5 stable -19.6 Wyoming(6,10)
3138 3138 Uinta 56041 31.7 20.8 46.1 6 stable -0.1 Wyoming(6,10)
3139 3139 Washakie 56043 50 33.8 72.2 6 stable 13.5 Wyoming(6,10)
3140 3140 Weston 56045 44.9 27.9 69.6 4 stable -26.2 Wyoming(6,10)

3141 rows × 10 columns

FIPS is an identifier for individual states and their counties. The first two digits correspond to the state and the last three signify the county. While doing the actual analysis I found some FIPS codes had only 4 digits, because the leading zero was dropped when the column was read in as an integer. So I converted FIPS to string type and padded it with leading zeros until every value had 5 digits, then pulled the first 2 digits into a state code. This was a much cleaner grouping key than the messy state/county split.
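A minimal sketch of the padding step, using a few FIPS values that appear in the tables above. It assumes the column was read in as integers, as the dtype listings below show.

```python
import pandas as pd

df = pd.DataFrame({"FIPS": [0, 1001, 2185, 21193, 56045]})

# Convert to string first, then left-pad with zeros to a width of 5.
df["FIPS"] = df["FIPS"].astype(str).str.zfill(5)

# The first two characters are the state code.
df["state_code"] = df["FIPS"].str[:2]

print(df)
```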

death FIPS BEFORE CONVERSION
 index             int64
county           object
FIPS              int64
met45            object
aa_deathrate     object
Lower95          object
upper95          object
avgdeath_year    object
trending         object
five_year        object
state            object
dtype: object 

death FIPS AFTER CONVERSION
 index             int64
county           object
FIPS             object
met45            object
aa_deathrate     object
Lower95          object
upper95          object
avgdeath_year    object
trending         object
five_year        object
state            object
state_code       object
dtype: object 

incd FIPS BEFORE CONVERSION
 index                     int64
county                   object
FIPS                      int64
aa_deathrate_per_100k    object
lower95                  object
upper95                  object
avg_cases_annually       object
trending                 object
five_year                object
state                    object
dtype: object 

incd FIPS AFTER CONVERSION
 index                     int64
county                   object
FIPS                     object
aa_deathrate_per_100k    object
lower95                  object
upper95                  object
avg_cases_annually       object
trending                 object
five_year                object
state                    object
state_code               object
dtype: object 

index county FIPS aa_deathrate_per_100k lower95 upper95 avg_cases_annually trending five_year state state_code
0 0 US 00000 62.4 62.3 62.6 214614 falling -2.5 None 00
1 1 Autauga 01001 74.9 65.1 85.7 43 stable 0.5 Alabama(6,10) 01
2 2 Baldwin 01003 66.9 62.4 71.7 170 stable 3 Alabama(6,10) 01
3 3 Barbour 01005 74.6 61.8 89.4 25 stable -6.4 Alabama(6,10) 01
4 4 Bibb 01007 86.4 71 104.2 23 stable -4.5 Alabama(6,10) 01
... ... ... ... ... ... ... ... ... ... ... ...
3136 3136 Sweetwater 56037 39.9 30.5 51.1 14 stable 12.6 Wyoming(6,10) 56
3137 3137 Teton 56039 23.7 14.7 36.1 5 stable -19.6 Wyoming(6,10) 56
3138 3138 Uinta 56041 31.7 20.8 46.1 6 stable -0.1 Wyoming(6,10) 56
3139 3139 Washakie 56043 50 33.8 72.2 6 stable 13.5 Wyoming(6,10) 56
3140 3140 Weston 56045 44.9 27.9 69.6 4 stable -26.2 Wyoming(6,10) 56

3141 rows × 11 columns

In strict SQL it would again have been easier to use LEFT(FIPS, 2) AS state_code to pull the state code into its own column and name it. That function is not supported by the SQLite dialect that pandasql uses by default.
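SQLite does offer `substr` and `printf`, which can stand in for LEFT and for the zero padding. The sketch below runs the query through the standard-library `sqlite3` module with pandas; passing the same query string to pandasql's `sqldf` should behave the same way, since it also runs on SQLite by default, though that was not how this notebook did it.

```python
import sqlite3
import pandas as pd

# FIPS kept as integers here, as it was before the conversion above.
incd = pd.DataFrame({"county": ["Autauga", "Weston"], "FIPS": [1001, 56045]})

conn = sqlite3.connect(":memory:")
incd.to_sql("incd", conn, index=False)

query = """
SELECT county,
       printf('%05d', FIPS)               AS fips,
       substr(printf('%05d', FIPS), 1, 2) AS state_code
FROM incd
"""
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```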

Now that the data has been organized and made easily usable by pandasql queries, it is time to fill in the missing values with a consistent value. I researched and found that NaN is the typical Python form for missing values, and initially plugged in NumPy's np.nan (Not a Number), but pulling the data and excluding those values in SQL was not compatible, so I used the string 'nan' instead to get the message across and easily exclude it from the results. As seen above in the dtypes output, all values besides the index were already strings, so this was a convenient solution specific to this dataset.
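One way this fill could look is sketched below. It assumes the missing entries are the '*' and '**' markers visible in the tables above plus any true nulls; the exact set of markers the notebook replaced is not shown, so treat the list as an assumption.

```python
import pandas as pd

death = pd.DataFrame({
    "county": ["Perry", "North", "Zapata"],
    "aa_deathrate": ["125.6", "124.9", "*"],
    "trending": ["stable", "**", None],
})

# Replace the suppression markers (assumed to be '*' and '**') and any
# real nulls with the single string 'nan' so SQL filters stay simple.
cleaned = death.replace(["*", "**"], "nan").fillna("nan")

print(cleaned)
```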

Analysis¶

Now the SQL queries begin. I started with the total row count of the dataset and then found all of the nulls in the data. Typically you would want to drop null values, but since they were previously marked with 'nan' and there is still good data within those rows, I think it would be a waste to drop entire counties from the analysis when I can just include != 'nan' conditions.
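The counting queries can be sketched as follows on a three-row stand-in table. The queries are an illustration of the approach described, not the notebook's exact cells, and they run through `sqlite3` here rather than pandasql to keep the example self-contained.

```python
import sqlite3
import pandas as pd

death = pd.DataFrame({
    "county": ["Perry", "Powell", "Zapata"],
    "aa_deathrate": ["125.6", "125.3", "nan"],
    "avgdeath_year": ["43", "18", "nan"],
})

conn = sqlite3.connect(":memory:")
death.to_sql("death", conn, index=False)

# Total number of rows in the table.
total = pd.read_sql_query("SELECT COUNT(*) AS counties FROM death", conn)

# Rows where the rate was marked as missing.
missing = pd.read_sql_query(
    "SELECT COUNT(*) AS missing_rate FROM death WHERE aa_deathrate = 'nan'", conn
)

# Keep the row, skip only the missing values for this column.
usable = pd.read_sql_query(
    "SELECT county, aa_deathrate FROM death WHERE aa_deathrate != 'nan'", conn
)
conn.close()

print(total, missing, usable, sep="\n")
```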

counties in death count
0 3141
counties in incidence count
0 3141

3,141 counties are reported in the dataset, and both tables contain the same number of rows.

age adjusted nulls
0 328
total death nulls
0 328

328 counties do not have a reported average death count or age-adjusted death rate.

total counties missing avg cases annually
0 209

209 counties have no information about average cases diagnosed annually in the incidence table.

county count state
0 1 None
1 67 Alabama
2 15 Arizona
3 75 Arkansas
4 27 Borough, Alaska
5 64 Colorado
6 1 Columbia (State)
7 8 Connecticut
8 58 County, California
9 39 County, Washington
10 3 Delaware
11 67 Florida
12 159 Georgia
13 5 Hawaii
14 44 Idaho
15 102 Illinois
16 92 Indiana
17 99 Iowa
18 105 Kansas
19 120 Kentucky
20 16 Maine
21 24 Maryland
22 14 Massachusetts
23 83 Michigan
24 87 Minnesota
25 82 Mississippi
26 115 Missouri
27 56 Montana
28 93 Nebraska
29 17 Nevada
30 10 New Hampshire
31 21 New Jersey
32 33 New Mexico
33 62 New York
34 100 North Carolina
35 53 North Dakota
36 88 Ohio
37 77 Oklahoma
38 36 Oregon
39 64 Parish, Louisiana
40 67 Pennsylvania
41 5 Rhode Island
42 46 South Carolina
43 66 South Dakota
44 95 Tennessee
45 254 Texas
46 29 Utah
47 14 Vermont
48 133 Virginia
49 55 West Virginia
50 72 Wisconsin
51 23 Wyoming
total average deaths in US per year
0 146841.0

There is a total of 146,841 deaths due to cancer related problems in the US per year according to this dataset.

Avg across all counties
0 134.54

There is an average of 134.5/100,000 population cases diagnosed per county per year across the United States. This data includes all cancer types.

    county county with least diagnosis
0  Choctaw                          10
   county county with most diagnosis
0  Nassau                        992

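Choctaw County, Alabama has the fewest average annual cases diagnosed, while Nassau County, New York has the most. After a quick Google search, Choctaw County has a population of about 12,400, while Nassau County has about 1,383,000, so this is an unfair comparison but a good look at cases on the whole. One caveat: avg_cases_annually is stored as text, so MIN and MAX compare strings rather than numbers, and the tables above show counties with fewer than 10 cases (Weston County, Wyoming has 4), so these two extremes are probably lexicographic rather than numeric. Let's try to group them by state and see which states have the highest average annual cases.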
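Because every value column in these tables is stored as text, MIN and MAX in SQLite compare the values as strings. The sketch below shows the difference a CAST makes. The first four rows use counts that appear in the tables above; the row named "Example" is hypothetical and is only there to show a value that is numerically larger than 992 but sorts before it as text.

```python
import sqlite3
import pandas as pd

incd = pd.DataFrame({
    "county": ["Choctaw", "Nassau", "Weston", "Baldwin", "Example"],
    "avg_cases_annually": ["10", "992", "4", "170", "1500"],
})

conn = sqlite3.connect(":memory:")
incd.to_sql("incd", conn, index=False)

query = """
SELECT MIN(avg_cases_annually)                  AS text_min,
       MAX(avg_cases_annually)                  AS text_max,
       MIN(CAST(avg_cases_annually AS INTEGER)) AS numeric_min,
       MAX(CAST(avg_cases_annually AS INTEGER)) AS numeric_max
FROM incd
WHERE avg_cases_annually != 'nan'
"""
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```

On text, '10' sorts first and '992' sorts last, which matches the result reported above; after the cast, the extremes are 4 and 1500.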

cases_per_county_per_year state
0 152.46 Maryland(6,10)
1 24.13 Iowa(7,8)
2 159.07 Pennsylvania(6,10)
3 59.52 Tennessee(6,10)
4 13.32 Montana(6,10)
5 294.40 California(7,8)
6 91.25 Illinois(6,10)
7 110.28 Washington(6,10)
8 332.88 Connecticut(7,8)
9 105.60 New Hampshire(6,10)
10 54.45 Louisiana(7,9)
11 51.48 Texas(6,10)
12 39.82 Virginia(6,10)
13 59.60 Alabama(6,10)
14 8.91 South Dakota(6,10)
15 172.80 Rhode Island(6,10)
16 351.00 Columbia(6,10)
17 55.89 Wisconsin(6,10)
18 109.67 Ohio(6,10)
19 37.36 Vermont(6,10)
20 22.24 Utah(7)
21 9.09 North Dakota(6,10)
22 13.48 Borough, Alaska(6,10)
23 30.49 Mississippi(6,10)
24 29.42 New Mexico(7,8)
25 219.53 New York(6,10)
26 358.79 Massachusetts(6,10)
27 81.41 South Carolina(6,10)
28 13.47 Nebraska(6,10)
29 280.90 New Jersey(7,8)
30 214614.00 None
31 40.08 Kentucky(7,9)
32 35.73 Arkansas(6,10)
33 19.50 Idaho(6,10)
34 57.77 Indiana(6,10)
35 35.28 Colorado(6,10)
36 74.69 Oregon(6,10)
37 39.27 Oklahoma(6,10)
38 245.46 Florida(6,10)
39 252.60 Arizona(6,10)
40 46.24 Missouri(6,10)
41 95.01 Michigan(6,10)
42 39.82 Georgia(7,9)
43 257.67 Delaware(6,10)
44 75.58 North Carolina(6,10)
45 36.44 West Virginia(6,10)
46 12.74 Wyoming(6,10)
47 155.80 Hawaii(7,9)
48 82.75 Maine(6,10)

Subtracting the US total deaths from the 'None' row of the previous query (the US total of diagnosed cases) gives a rough look at the cure/recovery count per year. This is a very rough estimate: some treatment may last years, we don't know how recurring diagnoses are counted, the deaths could have been cancer related but not caused by cancer, or the incidence and death averages could be skewed by other confounding variables. It will be a fun look nonetheless. (214,614 - 146,841 = 67,773 average cases cured/recovered in a year.) Dividing that by the number of deaths, while grim, gives 67,773 / 146,841 = 0.4615, or about 46.2 recoveries for every 100 deaths. As a share of everyone diagnosed, 67,773 / 214,614 ≈ 0.316, or roughly a 31.6% chance on average of recovering from all-cause cancer in any given year according to these numbers.
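The arithmetic can be checked in a few lines. The two input totals are the ones reported above; the variable names are introduced here for illustration, and the same caveats about how rough this estimate is still apply.

```python
diagnosed = 214_614  # US total average cases diagnosed per year (the 'None' row)
deaths = 146_841     # total average deaths per year from the death table

difference = diagnosed - deaths

# Ratio of the difference to deaths, and its share of all diagnoses.
ratio_to_deaths = difference / deaths
share_of_diagnosed = difference / diagnosed

print(difference)                            # 67773
print(round(100 * ratio_to_deaths, 1))       # 46.2
print(round(100 * share_of_diagnosed, 1))    # 31.6
```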

state total average cases diagnosed per state per year
0 None 214614.0
1 Alabama(6,10) 3993.0
2 Arizona(6,10) 3789.0
3 Arkansas(6,10) 2680.0
4 Borough, Alaska(6,10) 364.0
5 California(7,8) 17075.0
6 Colorado(6,10) 2258.0
7 Columbia(6,10) 351.0
8 Connecticut(7,8) 2663.0
9 Delaware(6,10) 773.0
10 Florida(6,10) 16446.0
11 Georgia(7,9) 6332.0
12 Hawaii(7,9) 779.0
13 Idaho(6,10) 858.0
14 Illinois(6,10) 9308.0
15 Indiana(6,10) 5315.0
16 Iowa(7,8) 2389.0
17 Kansas(6) 0.0
18 Kentucky(7,9) 4810.0
19 Louisiana(7,9) 3485.0
20 Maine(6,10) 1324.0
21 Maryland(6,10) 3659.0
22 Massachusetts(6,10) 5023.0
23 Michigan(6,10) 7886.0
24 Minnesota(6) 0.0
25 Mississippi(6,10) 2500.0
26 Missouri(6,10) 5318.0
27 Montana(6,10) 746.0
28 Nebraska(6,10) 1253.0
29 Nevada(6) 0.0
30 New Hampshire(6,10) 1056.0
31 New Jersey(7,8) 5899.0
32 New Mexico(7,8) 971.0
33 New York(6,10) 13611.0
34 North Carolina(6,10) 7558.0
35 North Dakota(6,10) 482.0
36 Ohio(6,10) 9651.0
37 Oklahoma(6,10) 3024.0
38 Oregon(6,10) 2689.0
39 Pennsylvania(6,10) 10658.0
40 Rhode Island(6,10) 864.0
41 South Carolina(6,10) 3745.0
42 South Dakota(6,10) 588.0
43 Tennessee(6,10) 5654.0
44 Texas(6,10) 13076.0
45 Utah(7) 645.0
46 Vermont(6,10) 523.0
47 Virginia(6,10) 5296.0
48 Washington(6,10) 4301.0
49 West Virginia(6,10) 2004.0
50 Wisconsin(6,10) 4024.0
51 Wyoming(6,10) 293.0
county aa_deathrate avgdeath_year
0 United 46 157,376
1 Perry 125.6 43
2 Powell 125.3 18
3 North 124.9 5
4 Owsley 118.5 8
... ... ... ...
2808 Eagle 14.9 5
2809 Summit 14.4 4
2810 Utah 12.4 37
2811 McKinley 11.6 7
2812 Cache 9.2 7

2813 rows × 3 columns

state total average deaths per state per year
0 Alabama 3186.0
1 Arizona 1278.0
2 Arkansas 2141.0
3 Borough, Alaska 242.0
4 Colorado 1563.0
5 Columbia (State) 240.0
6 Connecticut 1735.0
7 County, California 8780.0
8 County, Washington 3139.0
9 Delaware 565.0
10 Florida 11928.0
11 Georgia 4506.0
12 Hawaii 537.0
13 Idaho 597.0
14 Illinois 4262.0
15 Indiana 4010.0
16 Iowa 1751.0
17 Kansas 1437.0
18 Kentucky 3452.0
19 Maine 956.0
20 Maryland 2738.0
21 Massachusetts 3465.0
22 Michigan 4747.0
23 Minnesota 2355.0
24 Mississippi 1942.0
25 Missouri 3914.0
26 Montana 472.0
27 Nebraska 842.0
28 Nevada 1302.0
29 New Hampshire 736.0
30 New Jersey 4097.0
31 New Mexico 719.0
32 New York 9094.0
33 North Carolina 5484.0
34 North Dakota 276.0
35 Ohio 7410.0
36 Oklahoma 2433.0
37 Oregon 2058.0
38 Parish, Louisiana 2720.0
39 Pennsylvania 7697.0
40 Rhode Island 623.0
41 South Carolina 2798.0
42 South Dakota 374.0
43 Tennessee 4358.0
44 Texas 8266.0
45 Utah 421.0
46 Vermont 371.0
47 Virginia 3967.0
48 West Virginia 1509.0
49 Wisconsin 2963.0
50 Wyoming 228.0

Met 45.5¶

In the next section I got curious about the Met 45.5 objective, as it would be a great indicator of healthy states overall. The CDC's current goal is 122.7 deaths per 100,000 by 2030, so already being at 45.5/100,000 is a fantastic achievement. It is also worth looking into for further studies of the specific states that are able to tout an assumed survival rate at that level.

# of counties that reached standard 45.5/100,000 age adjusted deaths
0 828

In total, 828 of the 3,141 reported counties (~26.4%) reached the 45.5/100,000 objective. Only about a quarter of US counties are smashing the survival objective for cancer rates. So let's dive a little deeper and see why that may be.

Starting with which states have even 1 county within that objective...

states that have counties that reached 45.5
0 Alabama
1 Alaska
2 Arizona
3 County, California
4 Colorado
5 County, Connecticut
6 Columbia (State)
7 Florida
8 Georgia
9 Hawaii
10 Idaho
11 Illinois
12 Indiana
13 Iowa
14 Kansas
15 Louisiana
16 Maine
17 Maryland
18 Massachusetts
19 Michigan
20 Minnesota
21 Mississippi
22 Missouri
23 Montana
24 Nebraska
25 Nevada
26 New Hampshire
27 New Jersey
28 New Mexico
29 New York
30 North Carolina
31 North Dakota
32 Ohio
33 Oklahoma
34 Oregon
35 Pennsylvania
36 Rhode Island
37 South Carolina
38 South Dakota
39 Tennessee
40 County, Texas
41 Utah
42 County, Vermont
43 and County, Virginia
44 Washington
45 West Virginia
46 Wisconsin
47 Wyoming

It looks like the majority of states have at least one county that met the 45.5 objective. The missing states could just have NaN values for the state as a whole. So let's turn it into a percentage of each state's reported counties to see which ones are actually doing well overall.
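A per-state percentage like this can be computed in one grouped query. The sketch below uses made-up rows for three states and assumes the met45 column holds 'Yes', 'No', or the 'nan' placeholder ('No' appears in the tables above; 'Yes' is an assumption). It runs through `sqlite3` rather than pandasql to stay self-contained.

```python
import sqlite3
import pandas as pd

# Hypothetical rows, for illustration only.
death = pd.DataFrame({
    "state": ["Utah", "Utah", "Kentucky", "Kentucky", "Texas", "Texas", "Texas"],
    "met45": ["Yes", "Yes", "No", "No", "Yes", "No", "nan"],
})

conn = sqlite3.connect(":memory:")
death.to_sql("death", conn, index=False)

# In SQLite a comparison evaluates to 1 or 0, so SUM counts the matches.
query = """
SELECT state,
       ROUND(100.0 * SUM(met45 = 'Yes') / COUNT(*), 2) AS percent_counties_met45,
       ROUND(100.0 * SUM(met45 = 'No')  / COUNT(*), 2) AS percent_counties_not_met45
FROM death
WHERE met45 IN ('Yes', 'No')
GROUP BY state
ORDER BY state
"""
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```

Filtering to 'Yes' and 'No' first means the percentages are taken over counties that actually reported, not over every row.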

state Percent_counties_met45 Percent_counties_not_met45
0 Alabama 8.96 91.04
1 Arizona 93.33 6.67
2 Arkansas 0.00 100.00
3 Borough, Alaska 15.38 84.62
4 Colorado 95.00 5.00
5 Columbia (State) 100.00 0.00
6 Connecticut 87.50 12.50
7 County, California 78.18 21.82
8 County, Washington 54.05 45.95
9 Delaware 0.00 100.00
10 Florida 28.36 71.64
11 Georgia 17.22 82.78
12 Hawaii 100.00 0.00
13 Idaho 75.86 24.14
14 Illinois 13.73 86.27
15 Indiana 10.87 89.13
16 Iowa 55.67 44.33
17 Kansas 33.33 66.67
18 Kentucky 0.00 100.00
19 Maine 6.25 93.75
20 Maryland 20.83 79.17
21 Massachusetts 57.14 42.86
22 Michigan 17.07 82.93
23 Minnesota 71.95 28.05
24 Mississippi 6.17 93.83
25 Missouri 11.40 88.60
26 Montana 57.14 42.86
27 Nebraska 57.14 42.86
28 Nevada 25.00 75.00
29 New Hampshire 30.00 70.00
30 New Jersey 61.90 38.10
31 New Mexico 92.31 7.69
32 New York 22.58 77.42
33 North Carolina 13.00 87.00
34 North Dakota 60.87 39.13
35 Ohio 13.64 86.36
36 Oklahoma 8.33 91.67
37 Oregon 54.55 45.45
38 Parish, Louisiana 7.81 92.19
39 Pennsylvania 40.91 59.09
40 Rhode Island 40.00 60.00
41 South Carolina 13.04 86.96
42 South Dakota 57.14 42.86
43 Tennessee 2.11 97.89
44 Texas 38.38 61.62
45 Utah 100.00 0.00
46 Vermont 28.57 71.43
47 Virginia 23.85 76.15
48 West Virginia 10.91 89.09
49 Wisconsin 47.89 52.11
50 Wyoming 85.71 14.29

Sure enough, out of the states listed, the ones that stick out to me are Arizona, Colorado, Connecticut, New Mexico, and Wyoming. Potentially Hawaii and Utah as well; however, being at 100% makes me skeptical (Hawaii has 5 counties and Utah has 29). Other than that, the main commonality between these states, to me, is the prevalence of an outdoor culture. They all have many opportunities for outdoor activities due to nature, government infrastructure, and a cultural lifestyle that would lead to a healthier outlook in recovering from cancer, while simultaneously boasting a decent-sized population (besides Wyoming) for assumed better-than-average healthcare from doctors who have world-class training.

On the other side, many of the states that are lacking in their ability to meet the objective are in the South: Alabama, Louisiana, Mississippi, and Tennessee, plus another odd one out, Maine. I am again ignoring Arkansas, Delaware, and Kentucky, which show 0% of counties meeting the objective. Since they are also absent from the first query about having any met45 counties, and are mostly in the South besides Delaware, these figures may not be reliable and require further research.

Trending in Incidence¶

Now that we have seen the met45.5 objective, it might be interesting to look at the individual states and how they are trending in their diagnosis of cancer. The trends are listed on a last-year basis as a numeric value, on a 5-year basis as a numeric value, and as a last-year trend in a string: rising, falling, or stable. I thought it would be fun to encode the string first, before dealing with the numeric values, to see what is trending in the trends!
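A minimal sketch of the encoding, assuming the mapping falling = 1, stable = 2, rising = 3. The first two codes match the output below; the code for 'rising' is an assumption, since no rising row appears in the printed sample.

```python
import pandas as pd

incd = pd.DataFrame({"trending": ["falling", "stable", "rising", "nan"]})

# Assumed mapping; values not in the mapping (such as 'nan') become NaN,
# which is why the encoded column ends up as floats.
codes = {"falling": 1, "stable": 2, "rising": 3}
incd["trending_encoded"] = incd["trending"].map(codes)

print(incd)
```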

      index      county   FIPS aa_deathrate_per_100k lower95 upper95 avg_cases_annually trending  \
0         0          US  00000                  62.4    62.3    62.6             214614  falling   
1         1     Autauga  01001                  74.9    65.1    85.7                 43   stable   
2         2     Baldwin  01003                  66.9    62.4    71.7                170   stable   
3         3     Barbour  01005                  74.6    61.8    89.4                 25   stable   
4         4        Bibb  01007                  86.4      71   104.2                 23   stable   
...     ...         ...    ...                   ...     ...     ...                ...      ...   
3136   3136  Sweetwater  56037                  39.9    30.5    51.1                 14   stable   
3137   3137       Teton  56039                  23.7    14.7    36.1                  5   stable   
3138   3138       Uinta  56041                  31.7    20.8    46.1                  6   stable   
3139   3139    Washakie  56043                    50    33.8    72.2                  6   stable   
3140   3140      Weston  56045                  44.9    27.9    69.6                  4   stable   

     five_year          state state_code  trending_encoded  
0         -2.5           None         00               1.0  
1          0.5  Alabama(6,10)         01               2.0  
2            3  Alabama(6,10)         01               2.0  
3         -6.4  Alabama(6,10)         01               2.0  
4         -4.5  Alabama(6,10)         01               2.0  
...        ...            ...        ...               ...  
3136      12.6  Wyoming(6,10)         56               2.0  
3137     -19.6  Wyoming(6,10)         56               2.0  
3138      -0.1  Wyoming(6,10)         56               2.0  
3139      13.5  Wyoming(6,10)         56               2.0  
3140     -26.2  Wyoming(6,10)         56               2.0  

[3141 rows x 12 columns]

Now that we have encoded the trend from the previous years of data (whether the county is rising, falling, or stable in its diagnosis of cancer year to year), we can group the counties and analyze them on a state-by-state basis or for the US as a whole.

total average of trend avg five year trend
0 1.941595 -1.243453

Here we averaged over the entire US to see what the trend is. It seems the diagnosis of cancer is slightly below stable, which is good news! That means there have been fewer diagnoses than in previous years overall, or the diagnostic rate is "falling". I put the average five-year trend next to the average of the last-year trend to see how diagnosis was trending through the dataset. It seems incidence of cancer has been falling consistently over the previous 5 years and in the last year as well, so it is a continuing trend in this glimpse.

Next let's look at the trend on a state-by-state basis.
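A state-level version of the averages can be sketched with GROUP BY. The rows below are a small stand-in (the five_year values are taken from the sample rows above, the encoded values are illustrative), and the query is an assumption about the approach rather than the notebook's exact cell. Since five_year is stored as text, the sketch casts it and skips the 'nan' placeholder; SQLite's AVG would otherwise treat non-numeric text as 0.

```python
import sqlite3
import pandas as pd

incd = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Wyoming", "Wyoming"],
    "trending_encoded": [2.0, 2.0, 2.0, 1.0],
    "five_year": ["0.5", "3", "12.6", "nan"],
})

conn = sqlite3.connect(":memory:")
incd.to_sql("incd", conn, index=False)

# CASE without ELSE yields NULL for 'nan', and AVG ignores NULLs.
query = """
SELECT state,
       ROUND(AVG(trending_encoded), 2) AS avg_state_trend,
       ROUND(AVG(CASE WHEN five_year != 'nan'
                      THEN CAST(five_year AS REAL) END), 2) AS avg_5yr_state_trend
FROM incd
GROUP BY state
ORDER BY state
"""
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```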

state avg state trend avg 5 yr state trend
0 Alabama(6,10) 1.93 -2.79
1 Alaska(6,10) 2.00 -6.27
2 Arizona(6,10) 1.93 -3.51
3 Arkansas(6,10) 1.96 1.48
4 California(7,8) 1.73 -5.71
5 Colorado(6,10) 1.93 -4.91
6 Columbia(6,10) 2.00 -1.70
7 Connecticut(7,8) 1.63 -0.74
8 County, Utah(7,8) 1.87 -1.57
9 Delaware(6,10) 2.00 -0.80
10 Florida(6,10) 1.87 -1.77
11 Georgia(7,9) 1.94 -0.35
12 Hawaii(7,9) 2.00 -1.43
13 Idaho(6,10) 2.00 -1.59
14 Illinois(6,10) 1.88 -1.38
15 Indiana(6,10) 2.00 -2.08
16 Iowa(7,8) 2.03 -0.36
17 Kentucky(7,9) 1.98 -1.27
18 Louisiana(7,9) 1.97 -2.28
19 Maine(6,10) 2.00 0.49
20 Maryland(6,10) 1.92 -0.89
21 Massachusetts(6,10) 1.86 1.54
22 Michigan(6,10) 1.93 -3.34
23 Mississippi(6,10) 1.96 -1.89
24 Missouri(6,10) 2.02 -0.34
25 Montana(6,10) 2.03 -0.99
26 Nebraska(6,10) 2.02 2.74
27 New Hampshire(6,10) 1.90 -2.34
28 New Jersey(7,8) 1.33 -3.48
29 New Mexico(7,8) 1.85 -1.13
30 New York(6,10) 1.94 -1.40
31 North Carolina(6,10) 1.97 -1.73
32 North Dakota(6,10) 2.04 3.30
33 Ohio(6,10) 1.89 -2.17
34 Oklahoma(6,10) 1.96 -1.33
35 Oregon(6,10) 1.94 -3.85
36 Pennsylvania(6,10) 2.00 -1.36
37 Rhode Island(6,10) 2.00 -2.96
38 South Carolina(6,10) 1.96 -0.69
39 South Dakota(6,10) 1.95 -0.76
40 Tennessee(6,10) 1.95 -1.27
41 Texas(6,10) 1.97 -1.64
42 Vermont(6,10) 1.86 -4.60
43 Virginia(6,10) 1.90 -2.26
44 Washington(6,10) 1.71 -2.30
45 West Virginia(6,10) 2.00 -0.15
46 Wisconsin(6,10) 1.97 -0.48
47 Wyoming(6,10) 1.95 -3.42

From this view, only a handful of states sit collectively above a 'stable' trend relative to previous years, and New Jersey somehow has the fastest-falling rate of diagnosis. A quick Google search suggests New Jersey has had a sharp increase in population over the last five years and only a mild decrease of 0.04% to 0.07% within the last year in which this data was collected. So New Jersey may have implemented other environmental, health, or diagnostic criteria that are decreasing its cancer incidence from previous years. (Further research showed that New Jersey had some of the highest cancer incidence rates until about 2018, when new toxic waste protocols were put into place, so this one-year glimpse was not a very accurate picture of the data.)
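The per-state view above can be sketched as a `GROUP BY` over the encoded trend. As before, the table `incd`, its columns, the sample rows, and the 1/2/3 coding are hypothetical stand-ins rather than the notebook's real schema, and `sqlite3` is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical rows: (state, trend_code, five_year_trend)
# trend_code assumes 1 = falling, 2 = stable, 3 = rising
rows = [
    ("Stateland", 1, -3.0),
    ("Stateland", 2, -1.0),
    ("Otherland", 2, 0.5),
    ("Otherland", 3, 1.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incd (state TEXT, trend_code INTEGER, five_year_trend REAL)"
)
conn.executemany("INSERT INTO incd VALUES (?, ?, ?)", rows)

# One output row per state, lowest (fastest-falling) average first
query = """
SELECT state,
       ROUND(AVG(trend_code), 2)      AS avg_state_trend,
       ROUND(AVG(five_year_trend), 2) AS avg_5yr_state_trend,
       COUNT(*)                       AS counties
FROM incd
GROUP BY state
ORDER BY avg_state_trend ASC
"""
state_trends = conn.execute(query).fetchall()
for row in state_trends:
    print(row)
```

Including the county count next to each average is a deliberate choice here: a state with only a few reporting counties can swing well away from 2 on the strength of one or two rows.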

Next, it might be worthwhile to join the two tables and see how average cases diagnosed, the trend in cases diagnosed, and the trend in death rate look per state.

state state_incdtrend_1_year state_avg_cases overall_deathtrend
0 Alabama 1.93 59.60 1.72
1 Arizona 1.93 270.21 1.29
2 Arkansas 1.96 35.73 1.83
3 Colorado 1.89 54.59 1.44
4 Columbia (State) 2.00 351.00 1.00
5 Connecticut 1.63 332.88 1.00
6 County, California 1.73 310.29 1.11
7 County, Washington 1.69 119.17 1.36
8 Delaware 2.00 257.67 1.00
9 Florida 1.87 245.46 1.30
10 Georgia 1.95 43.20 1.71
11 Idaho 2.00 32.29 1.83
12 Illinois 1.88 92.96 1.81
13 Indiana 2.00 57.77 1.84
14 Iowa 2.03 25.31 1.92
15 Kansas NaN 0.00 1.92
16 Kentucky 1.97 40.39 1.94
17 Maine 2.00 82.75 1.44
18 Maryland 1.92 152.46 1.25
19 Massachusetts 1.86 358.79 1.14
20 Michigan 1.93 96.13 1.71
21 Minnesota NaN 0.00 1.93
22 Mississippi 1.96 31.48 1.81
23 Missouri 2.02 48.12 1.86
24 Montana 1.96 23.07 1.63
25 Nebraska 2.02 24.32 1.89
26 Nevada NaN 0.00 1.70
27 New Hampshire 1.90 105.60 1.30
28 New Jersey 1.33 280.90 1.00
29 New Mexico 1.85 36.50 1.62
30 New York 1.94 219.53 1.45
31 North Carolina 1.97 76.28 1.62
32 North Dakota 2.05 19.11 1.95
33 Ohio 1.89 109.67 1.69
34 Oklahoma 1.96 43.39 1.77
35 Oregon 1.94 83.56 1.31
36 Parish, Louisiana 1.97 55.24 1.71
37 Pennsylvania 2.00 161.41 1.62
38 Rhode Island 2.00 172.80 1.00
39 South Carolina 1.96 81.41 1.61
40 South Dakota 1.93 16.29 1.82
41 Star, Alaska 2.00 53.20 1.40
42 Tennessee 1.95 60.09 1.84
43 Texas 1.97 68.72 1.53
44 Utah 1.85 41.93 1.64
45 Vermont 1.85 39.62 1.77
46 Virginia 1.90 40.90 1.67
47 West Virginia 2.00 36.44 1.76
48 Wisconsin 1.97 57.36 1.87
49 Wyoming 1.95 14.10 1.75

state state_code
0 Kansas(6) 20
1 Minnesota(6) 27
2 Nevada(6) 32

The previous two queries show that incidence and death rates have been falling at a decent pace in most states. The death rate trend over the last year is below 2 (stable) everywhere, and the incidence trend is at or below 2 in all but a few states (Iowa, Missouri, Nebraska, and North Dakota sit marginally above it). Values below 2 signify that deaths and diagnoses are decreasing from the previous year. There were three NaN values in the table, though, so I ran a query on the incidence table alone and found three states that are missing all of their county information: Kansas, Minnesota, and Nevada. Sadly, I have lived my whole life in Kansas and was hoping to explore its data more deeply. However, Georgia, being the "Peak South", will probably yield some interesting findings given its rising obesity rate, and a comparison with New Jersey, with its swiftly falling incidence rate, may produce vastly different-looking tables.
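One way the NaN rows above can arise is a `LEFT JOIN` from the death table to the incidence table: every state in the death table is kept, and states with no matching incidence rows get NULLs (which pandas displays as NaN). The sketch below illustrates that mechanism with made-up tables `death` and `incd`; the names, columns, and rows are assumptions, and the notebook's actual join may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE death (state TEXT, county TEXT, death_trend_code INTEGER);
CREATE TABLE incd  (state TEXT, county TEXT, trend_code INTEGER, avg_cases REAL);
INSERT INTO death VALUES
    ('Stateland', 'A', 1),
    ('Stateland', 'B', 2),
    ('Emptyland', 'C', 2);
INSERT INTO incd VALUES
    ('Stateland', 'A', 2, 40.0),
    ('Stateland', 'B', 2, 60.0);
""")

# LEFT JOIN keeps every death-table row, matched or not
query = """
SELECT d.state,
       AVG(i.trend_code)       AS state_incd_trend,
       AVG(i.avg_cases)        AS state_avg_cases,
       AVG(d.death_trend_code) AS overall_death_trend
FROM death AS d
LEFT JOIN incd AS i
       ON i.state = d.state AND i.county = d.county
GROUP BY d.state
ORDER BY d.state
"""
joined = conn.execute(query).fetchall()

# States whose incidence average is NULL have no incidence rows at all
missing = [state for state, incd_trend, _, _ in joined if incd_trend is None]
print(joined)
print("No incidence data:", missing)
```

An inner `JOIN` would have silently dropped such states, so the `LEFT JOIN` is what makes the gap visible in the first place.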

Georgia¶

five year incidence trend death_trend county
0 -0.1 -0.9 Jackson
1 -0.1 -0.9 Thomas
2 -0.2 -0.1 Bulloch
3 -0.4 -0.5 Carroll
4 -0.5 -0.6 Brantley
... ... ... ...
132 8.3 9.6 Habersham
133 8.5 -0.2 Peach
134 8.6 -1.3 Rabun
135 9.1 -4.8 Ben
136 9.2 -2.1 Camden

137 rows × 3 columns

county age_adjusted_rate AverageDeaths_per_year Average_Diagnoses_per_year
0 Appling 56.3 12 15
1 Atkinson 75.8 6 5
2 Bacon 66.5 8 11
3 Baldwin 47.3 23 36
4 Banks 33.8 7 13
... ... ... ... ...
146 Whitfield 64.5 64 79
147 Wilcox 45.7 5 7
148 Wilkes 57.7 9 10
149 Wilkinson 56.9 8 11
150 Worth 51.3 13 19

151 rows × 4 columns

According to the last two queries, 137 of Georgia's 159 counties have reported recent five-year trend numbers. I ordered the incidence trend ascending and showed the whole table, then joined the death rate to it and grouped by county name to eliminate duplicates. The query shows that 82 of the 137 reporting counties have a negative incidence trend, so in the majority of counties both death and incidence are slowly decreasing. Georgia's age-adjusted death rates are also decently low but still hover around the 60-70 mark, which beats the national average but is nowhere near the Met45.5 objective.
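The join, de-duplication, and ordering described above can be sketched as follows. One detail worth checking when reproducing it: in the Georgia output above, -0.1 is listed before -0.5, and the New Jersey table further down runs from -0.4 to -7.3. That is the order text values sort in, so the trend columns may be stored as strings; this is a guess from the printed output, not something the notebook states. The sketch stores the trends as text on purpose and casts them to numbers before sorting. Table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incd  (state TEXT, county TEXT, five_year_trend TEXT);
CREATE TABLE death (state TEXT, county TEXT, death_trend TEXT);
INSERT INTO incd VALUES
    ('Stateland', 'Alpha', '-0.4'),
    ('Stateland', 'Alpha', '-0.4'),
    ('Stateland', 'Beta',  '-7.3'),
    ('Stateland', 'Gamma', '1.5'),
    ('Otherland', 'Delta', '2.0');
INSERT INTO death VALUES
    ('Stateland', 'Alpha', '-1.1'),
    ('Stateland', 'Beta',  '-3.1'),
    ('Stateland', 'Gamma', '0.2'),
    ('Otherland', 'Delta', '1.1');
""")

# Sorting the raw text column compares character by character
text_order = [
    r[0] for r in conn.execute(
        "SELECT DISTINCT five_year_trend FROM incd "
        "WHERE state = ? ORDER BY five_year_trend",
        ("Stateland",),
    )
]

# Cast to REAL for a numeric sort; GROUP BY collapses the duplicated county row
query = """
SELECT i.county,
       MIN(CAST(i.five_year_trend AS REAL)) AS incidence_trend,
       MIN(CAST(d.death_trend AS REAL))     AS death_trend_num
FROM incd AS i
JOIN death AS d
  ON d.state = i.state AND d.county = i.county
WHERE i.state = ?
GROUP BY i.county
ORDER BY incidence_trend ASC
"""
counties = conn.execute(query, ("Stateland",)).fetchall()
n_falling = sum(1 for _, inc, _ in counties if inc < 0)

print(text_order)
print(counties)
print(n_falling, "of", len(counties), "counties have a negative incidence trend")
```

Casting also matters for any count of negative trends, since comparing a text value with a number does not behave like a numeric comparison.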

New Jersey¶

five year incidence trend death_trend county
0 -0.4 -1.1 Cape
1 -1 -1.4 Warren
2 -1.1 -1.6 Salem
3 -1.2 -1.9 Mercer
4 -1.2 -1.7 Passaic
5 -1.3 -2.1 Somerset
6 -1.3 -1.6 Sussex
7 -1.6 -1.8 Union
8 -1.7 -2.2 Hunterdon
9 -2.1 -2.7 Hudson
10 -2.6 -2.9 Essex
11 -3.3 -2.2 Bergen
12 -4 -4.3 Ocean
13 -5.4 -2.9 Camden
14 -5.6 -3.2 Gloucester
15 -5.7 -1.8 Burlington
16 -5.8 -3.8 Morris
17 -6.4 -5.5 Atlantic
18 -6.9 -1 Cumberland
19 -7.1 -3.2 Monmouth
20 -7.3 -3.1 Middlesex

Unsurprisingly, New Jersey's incidence and death trends are falling in every county, some of them drastically (0 is stable and a positive value indicates an increase). This is great for New Jersey residents, and it may also mean that the state had higher rates previously and was able to bring them down. Perhaps other states could follow suit with legislation or public intervention in a way that is reliable and replicable?

county age_adjusted_rate AverageDeaths_per_year Average_Diagnoses_per_year
0 Atlantic 47.9 156 230
1 Bergen 34.7 402 580
2 Burlington 44.2 232 342
3 Camden 48.9 275 406
4 Cape 54.9 90 136
5 Cumberland 50.7 84 122
6 Essex 37.1 289 399
7 Gloucester 55.5 172 250
8 Hudson 36.5 206 279
9 Hunterdon 37.8 55 80
10 Mercer 38.2 152 235
11 Middlesex 37 319 459
12 Monmouth 42.8 317 475
13 Morris 34.8 201 287
14 Ocean 47.7 442 645
15 Passaic 39.3 202 276
16 Salem 48.1 41 62
17 Somerset 35.3 122 171
18 Sussex 45.2 74 106
19 Union 35.8 207 274
20 Warren 45.6 59 85

New Jersey's counties also have very low age-adjusted death rates, many hovering around the 45.5 objective discussed earlier.
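To put a number on "many", the short check below counts how many of the 21 counties in the table above sit at or under the objective. The rates are copied from that table; treating "met" as a rate of 45.5 or lower is an assumption, inferred from the Kansas table later on, where a county at exactly 45.5 is marked "Yes".

```python
# Age-adjusted death rates per 100k, copied from the New Jersey table above
nj_rates = {
    "Atlantic": 47.9, "Bergen": 34.7, "Burlington": 44.2, "Camden": 48.9,
    "Cape": 54.9, "Cumberland": 50.7, "Essex": 37.1, "Gloucester": 55.5,
    "Hudson": 36.5, "Hunterdon": 37.8, "Mercer": 38.2, "Middlesex": 37.0,
    "Monmouth": 42.8, "Morris": 34.8, "Ocean": 47.7, "Passaic": 39.3,
    "Salem": 48.1, "Somerset": 35.3, "Sussex": 45.2, "Union": 35.8,
    "Warren": 45.6,
}

OBJECTIVE = 45.5  # assumption: "met" means rate <= 45.5

met = sorted(county for county, rate in nj_rates.items() if rate <= OBJECTIVE)
share_met = len(met) / len(nj_rates)
highest = max(nj_rates, key=nj_rates.get)

print(f"{len(met)} of {len(nj_rates)} counties meet the objective ({share_met:.0%})")
print("Highest rate:", highest, nj_rates[highest])
```

On these figures, 13 of the 21 counties are at or under 45.5, and Warren misses it by a tenth of a point.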

Kansas¶

And now on to Kansas, for my own personal interest... Sadly, the earlier queries against the incidence table showed that Kansas is one of the three states with no incidence data to report. It also has very limited data on the Met45.5 goal, which narrows the insights available for the state.

index county FIPS met45 aa_deathrate Lower95 upper95 avgdeath_year trending five_year state state_code met45_percentage
0 105 Chautauqua 20019 No 80.1 51.7 123 5 nan nan Kansas 20 0
1 174 Woodson 20207 No 75.6 44.9 123.4 4 stable 2.1 Kansas 20 0
2 260 Kearny 20093 No 71.7 40.6 117.8 3 nan nan Kansas 20 0
3 269 Geary 20061 No 71.4 56.4 88.9 16 stable 0.5 Kansas 20 0
4 403 Wyandotte 20209 No 67.4 61.5 73.7 99 stable -0.3 Kansas 20 0
5 503 Linn 20107 No 65.1 47.5 88 10 stable 1 Kansas 20 0
6 543 Montgomery 20125 No 64.4 54.4 75.9 30 stable -0.3 Kansas 20 0
7 601 Cherokee 20021 No 63.2 50.5 78.5 17 stable 0.4 Kansas 20 0
8 626 Rice 20159 No 62.7 45.3 85.4 9 rising 1.9 Kansas 20 0
9 666 Smith 20183 No 61.9 38.2 100.1 4 nan nan Kansas 20 0
10 674 Osage 20139 No 61.8 47.9 78.9 14 stable 0.2 Kansas 20 0
11 859 Crawford 20037 No 59 49.2 70.3 26 stable -0.3 Kansas 20 0
12 927 Doniphan 20043 No 58.1 38.1 85.3 6 stable -0.1 Kansas 20 0
13 1072 Phillips 20147 No 56.3 35.1 87.3 5 nan nan Kansas 20 0
14 1098 Elk 20049 No 56.1 32 99.9 3 stable -0.7 Kansas 20 0
15 1099 Labette 20099 No 56.1 44.1 70.4 16 stable 1.2 Kansas 20 0
16 1100 Jackson 20085 No 56.1 41.2 75.1 10 stable 0 Kansas 20 0
17 1109 Barber 20007 No 56 33.4 89.9 4 stable 0.9 Kansas 20 0
18 1172 Jewell 20089 No 55.2 31.5 97.6 3 nan nan Kansas 20 0
19 1204 Dickinson 20041 No 54.8 42.9 69.3 15 stable 1.2 Kansas 20 0
20 1232 Coffey 20031 No 54.4 37.3 77.8 7 stable 0 Kansas 20 0
21 1310 Brown 20013 No 53.6 37.9 74.4 8 nan nan Kansas 20 0
22 1330 Lyon 20111 No 53.3 43 65.4 19 stable -0.4 Kansas 20 0
23 1353 Franklin 20059 No 53 42 66.1 16 stable 0.7 Kansas 20 0
24 1354 Butler 20015 No 53 45.6 61.2 38 stable -0.5 Kansas 20 0
25 1355 Harper 20077 No 53 34.6 80 5 stable 1.3 Kansas 20 0
26 1369 Clay 20027 No 52.9 35.9 76 7 nan nan Kansas 20 0
27 1422 Greenwood 20073 No 52.3 35.4 76.8 6 stable -0.8 Kansas 20 0
28 1433 Allen 20001 No 52.2 38.2 70.1 10 stable -1 Kansas 20 0
29 1464 Saline 20169 No 51.8 44.3 60.2 35 stable -0.6 Kansas 20 0
30 1490 Sherman 20181 No 51.6 32.4 79.5 5 stable 0.6 Kansas 20 0
31 1508 Reno 20155 No 51.4 44.7 58.8 45 stable 0.6 Kansas 20 0
32 1516 Morris 20127 No 51.3 33.3 78.1 5 stable -0.1 Kansas 20 0
33 1532 Atchison 20005 No 51.1 37.8 67.8 10 stable 0 Kansas 20 0
34 1569 Anderson 20003 No 50.6 33.9 73.7 6 stable -0.6 Kansas 20 0
35 1596 Sedgwick 20173 No 50.3 47.5 53.2 251 falling -1 Kansas 20 0
36 1631 Sumner 20191 No 49.9 39.2 62.8 15 stable 0.2 Kansas 20 0
37 1632 Shawnee 20177 No 49.9 45.8 54.4 109 falling -0.9 Kansas 20 0
38 1646 Neosho 20133 No 49.7 37.5 65 11 stable 0.4 Kansas 20 0
39 1688 Republic 20157 No 49 29.6 79.9 4 stable -0.3 Kansas 20 0
40 1735 Cloud 20029 No 48.4 33.4 68.7 7 stable 0.3 Kansas 20 0
41 1745 Leavenworth 20103 No 48.3 41.4 56 37 falling -1.5 Kansas 20 0
42 1797 Jefferson 20087 No 47.7 36.2 62.2 12 stable -1.2 Kansas 20 0
43 1817 Bourbon 20011 No 47.5 34.9 63.7 10 stable -1.3 Kansas 20 0
44 1912 Ottawa 20143 No 46.4 27.6 74.6 4 nan nan Kansas 20 0
45 1923 Pratt 20151 No 46.2 31.7 66 7 stable 0.2 Kansas 20 0
46 1987 Barton 20009 Yes 45.5 36.3 56.6 17 stable -0.4 Kansas 20 1
47 1990 Mitchell 20123 Yes 45.4 28.4 70.9 5 stable 1.2 Kansas 20 1
48 2008 Riley 20161 Yes 45.2 36.5 55.2 20 stable 1.4 Kansas 20 1
49 2148 Cowley 20035 Yes 43.1 35 52.6 20 falling -4.5 Kansas 20 1
50 2166 Ellsworth 20053 Yes 42.8 26.6 67.2 4 nan nan Kansas 20 1
51 2174 Washington 20201 Yes 42.7 26.2 68.3 4 stable -1.5 Kansas 20 1
52 2259 Harvey 20079 Yes 41.4 33.5 50.7 20 stable 0.5 Kansas 20 1
53 2274 Douglas 20045 Yes 41.2 35.2 47.8 36 stable -0.9 Kansas 20 1
54 2275 Ford 20057 Yes 41.2 31.3 53.1 12 stable -1 Kansas 20 1
55 2289 Miami 20121 Yes 41 32.3 51.5 15 stable -1.1 Kansas 20 1
56 2350 Finney 20055 Yes 40.2 30.1 52.4 11 falling -1.9 Kansas 20 1
57 2363 Russell 20167 Yes 39.9 25.6 61.4 5 stable -1.2 Kansas 20 1
58 2375 Johnson 20091 Yes 39.6 37.2 42.2 210 falling -1.3 Kansas 20 1
59 2396 Pottawatomie 20149 Yes 39.2 28.5 52.7 9 stable -1 Kansas 20 1
60 2468 Ellis 20051 Yes 37.7 28.4 49 12 stable -1.5 Kansas 20 1
61 2473 Marshall 20117 Yes 37.5 25.4 54.5 6 stable 0.4 Kansas 20 1
62 2497 Seward 20175 Yes 37.1 25.4 52.2 7 stable -1.6 Kansas 20 1
63 2531 Wilson 20205 Yes 36.5 23.7 54.9 5 stable -1.9 Kansas 20 1
64 2607 McPherson 20113 Yes 34.4 26.7 43.7 15 stable 0.4 Kansas 20 1
65 2648 Pawnee 20145 Yes 33.2 19.1 55.1 3 stable -2.2 Kansas 20 1
66 2727 Nemaha 20131 Yes 29.6 18.7 45.6 5 stable -0.6 Kansas 20 1
67 2754 Marion 20115 Yes 26.8 17 41.1 5 stable -0.9 Kansas 20 1
68 2757 Kingman 20095 Yes 26.5 15.1 44.8 3 stable -1.3 Kansas 20 1

There are 105 counties in Kansas, and this data is missing quite a bit of the information I would need for a reliable measure of the state of the State. Many of the age-adjusted death rates hover around 50-55, but a few are very low, in the 20s. The majority of counties are stable to slightly falling in their death trend. Let's condense this information into a summary table so it's easier to understand.

  | Average deaths per year per county | avg upper 95th for death per year | avg lower 95th for death per year | avg age adjusted death rate | avg five year trend | State total deaths per year
0 | 13.69 | 45.47 | 23.9 | 32.95 | -0.18 | 1437.0

The population of Kansas is relatively small, at about 2.94 million spread across a fairly large swath of land (~82,000 sq mi, or ~213,000 sq km). So an average of about 14 deaths per county per year is fantastic! The five-year trend for cancer-related deaths has also been decreasing. The decline is slow, but Kansas as a whole has a largely aging population, so that is quite an accomplishment. The state averages about 1,437 cancer deaths per year in total. That is well below 1% of the population and could also be due to the larger farming population, which brings more outdoor activity and less sedentary lifestyles.
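The "well below 1%" figure can be checked with a few lines of arithmetic. All inputs come from the summary row and the paragraph above; the population is the approximate 2.94 million quoted there, so the results are rough.

```python
avg_deaths_per_county = 13.69  # from the summary row above
kansas_counties = 105          # county count quoted in the text
state_total_deaths = 1437      # from the summary row above
population = 2_940_000         # approximate figure quoted in the text

# Does the per-county average line up with the state total?
implied_total = avg_deaths_per_county * kansas_counties

# State total as a share of the population, and as a crude rate per 100k
share_pct = state_total_deaths / population * 100
per_100k = state_total_deaths / population * 100_000

print(f"implied total: {implied_total:.2f}")
print(f"share of population: {share_pct:.3f}%")
print(f"crude rate: {per_100k:.1f} per 100k")
```

The implied total of roughly 1,437 matches the state total, and the share works out to about 0.049% of the population, or about 48.9 deaths per 100,000. That last number is a crude rate, not an age-adjusted one, so it is not directly comparable with the 45.5 objective.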

Conclusion¶

Thank you so much for reading my EDA of the Cancer County dataset. I hope it inspires you to plug in your own state and discover more through this data. I have really only scratched the surface of what could be uncovered in the information provided by cancer.gov and the US Census. In closing, if you hope to better your own state, maybe take a page out of the book of New Jersey, Wyoming, or Vermont for their incredible progress over the last five years in incidence and death trends. And if you or a loved one is diagnosed with any form of cancer, please do your research. This data suggests that moving to a medium-population state with mountainous regions may be your best bet for a good recovery: fresh air and wonderful healthcare provided.

If you have any suggestions on improving my analysis or want to show off your own EDA, Regression model, or other training/prediction model please leave it in the comments so I can grow my own knowledge in this field/language!


External Analysis¶

Further in-depth data visualization and analysis can be found at the following Tableau dashboard:
Project Deep-Dive: Tableau Public


References¶

  1. CDC - Update on Cancer Deaths in the United States