Hexagonal binning, contours | Contingency tables | Box plots, violin plots | Conditioning
Step 1: Loading the data
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Step 2: Checking the data
In [ ]:
KC_TAX_CSV = '/content/drive/MyDrive/통계공부/PSDS/data/kc_tax.csv.gz'
In [ ]:
import pandas as pd
kc_tax = pd.read_csv(KC_TAX_CSV)
In [ ]:
# Filter the data: drop very expensive homes and very small or very large ones
kc_tax0 = kc_tax.loc[(kc_tax.TaxAssessedValue < 750000) &
                     (kc_tax.SqFtTotLiving > 100) &
                     (kc_tax.SqFtTotLiving < 3500), :]
print(kc_tax0.shape)
(432693, 3)
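The same three-condition filter can also be written with `DataFrame.query`, which some readers find easier to scan. The sketch below uses a small made-up frame (`toy_tax` and its values are assumptions for illustration, not rows from `kc_tax.csv.gz`) and checks that both forms select the same rows.

```python
import pandas as pd

# Made-up stand-in for kc_tax; the values are illustrative only
toy_tax = pd.DataFrame({
    'TaxAssessedValue': [300000, 800000, 450000, 500000, 200000],
    'SqFtTotLiving': [1500, 2000, 90, 4000, 3499],
    'ZipCode': [98188, 98105, 98108, 98126, 98105],
})

# Boolean-mask form, as used in the cell above
mask = (
    (toy_tax['TaxAssessedValue'] < 750000)
    & (toy_tax['SqFtTotLiving'] > 100)
    & (toy_tax['SqFtTotLiving'] < 3500)
)
by_mask = toy_tax.loc[mask]

# Equivalent query-string form
by_query = toy_tax.query(
    'TaxAssessedValue < 750000 and SqFtTotLiving > 100 and SqFtTotLiving < 3500'
)

print(by_mask.shape)
print(by_mask.equals(by_query))
```

Only the first and last toy rows pass all three conditions, so both selections keep two rows.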
In [ ]:
LC_LOANS_CSV = '/content/drive/MyDrive/통계공부/PSDS/data/lc_loans.csv'
lc_loans = pd.read_csv(LC_LOANS_CSV)
In [ ]:
AIRLINE_STATS_CSV = '/content/drive/MyDrive/통계공부/PSDS/data/airline_stats.csv'
airline_stats = pd.read_csv(AIRLINE_STATS_CSV)
Step 3: Computing
In [ ]:
# Hexagonal binning
import matplotlib.pyplot as plt
ax = kc_tax0.plot.hexbin(x='SqFtTotLiving', y='TaxAssessedValue',
                         gridsize=30, sharex=False, figsize=(5, 4))
ax.set_xlabel('Finished Square Feet')
ax.set_ylabel('Tax Assessed Value')
plt.tight_layout()
plt.show()
In [ ]:
# Contours (2D kernel density estimate on a 10,000-row random sample)
import seaborn as sns
fig, ax = plt.subplots(figsize=(4, 4))
sns.kdeplot(data=kc_tax0.sample(10000), x='SqFtTotLiving', y='TaxAssessedValue', ax=ax)
ax.set_xlabel('Finished Square Feet')
ax.set_ylabel('Tax Assessed Value')
plt.tight_layout()
plt.show()
In [ ]:
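# Contingency table
crosstab = lc_loans.pivot_table(index='grade', columns='status',
                                aggfunc=lambda x: len(x), margins=True)
print(crosstab)
print("\n")
# Keep grades A to G and cast to float so the proportions can be assigned in place
df = crosstab.loc['A':'G', :].astype(float)
# Divide each status count by its row total to get per-grade proportions
df.loc[:, 'Charged Off':'Late'] = df.loc[:, 'Charged Off':'Late'].div(df['All'], axis=0)
# Turn the row totals into each grade's share of all loans
df['All'] = df['All'] / df['All'].sum()
perc_crosstab = df
print(perc_crosstab)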
status  Charged Off  Current  Fully Paid  Late     All
grade
A              1562    50051       20408   469   72490
B              5302    93852       31160  2056  132370
C              6023    88928       23147  2777  120875
D              5007    53281       13681  2308   74277
E              2842    24639        5949  1374   34804
F              1526     8444        2328   606   12904
G               409     1990         643   199    3241
All           22671   321185       97316  9789  450961

status  Charged Off   Current  Fully Paid      Late       All
grade
A          0.021548  0.690454    0.281528  0.006470  0.160746
B          0.040054  0.709013    0.235401  0.015532  0.293529
C          0.049828  0.735702    0.191495  0.022974  0.268039
D          0.067410  0.717328    0.184189  0.031073  0.164708
E          0.081657  0.707936    0.170929  0.039478  0.077177
F          0.118258  0.654371    0.180409  0.046962  0.028614
G          0.126196  0.614008    0.198396  0.061401  0.007187
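As a cross-check on the row proportions, `pd.crosstab` with `normalize='index'` gives per-grade proportions in one call. This is a minimal sketch on a made-up frame (`toy_loans` and its eight rows are assumptions for illustration, not the Lending Club data).

```python
import pandas as pd

# Made-up stand-in for lc_loans; eight illustrative rows
toy_loans = pd.DataFrame({
    'grade': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'status': ['Current', 'Current', 'Fully Paid', 'Late',
               'Current', 'Current', 'Charged Off', 'Fully Paid'],
})

# Counts with row and column totals (the 'All' margins)
counts = pd.crosstab(toy_loans['grade'], toy_loans['status'], margins=True)

# Each row divided by its own total, so every row sums to 1
row_props = pd.crosstab(toy_loans['grade'], toy_loans['status'],
                        normalize='index')

# Share of all loans in each grade (the role of the 'All' column above)
grade_share = toy_loans['grade'].value_counts(normalize=True).sort_index()

print(counts)
print(row_props)
print(grade_share)
```

Here `row_props` plays the role of the `Charged Off` to `Late` columns of `perc_crosstab`, and `grade_share` plays the role of its `All` column.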
In [ ]:
# Box plot
airline_stats.head()
ax = airline_stats.boxplot(by='airline', column='pct_carrier_delay',
                           figsize=(5, 5))
ax.set_xlabel('')
ax.set_ylabel('Daily % of Delayed Flights')
plt.suptitle('')
plt.tight_layout()
plt.show()
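The numbers behind each box can be computed directly, which helps when reading the plot. The sketch below assumes the default whisker rule of 1.5 times the interquartile range and uses a made-up frame (`toy_delays`, the airline labels `X` and `Y`, and the helper `box_stats` are all hypothetical).

```python
import pandas as pd

# Made-up stand-in for airline_stats; values are illustrative only
toy_delays = pd.DataFrame({
    'airline': ['X'] * 5 + ['Y'] * 5,
    'pct_carrier_delay': [1.0, 2.0, 3.0, 4.0, 20.0,
                          2.0, 4.0, 6.0, 8.0, 10.0],
})

def box_stats(values):
    """Quartiles, IQR, 1.5*IQR fences, and outlier count for one group."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are the ones drawn individually on the plot
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return pd.Series({
        'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
        'lower_fence': lower_fence, 'upper_fence': upper_fence,
        'n_outliers': len(outliers),
    })

# One row of statistics per airline
summary = pd.DataFrame({
    name: box_stats(group)
    for name, group in toy_delays.groupby('airline')['pct_carrier_delay']
}).T

print(summary)
```

In the toy data, the value 20.0 for airline `X` falls above its upper fence of 7.0, so it would be drawn as a separate point.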
In [ ]:
# Violin plot
fig, ax = plt.subplots(figsize=(5, 5))
sns.violinplot(data=airline_stats, x='airline', y='pct_carrier_delay',
               ax=ax, inner='quartile', color='white')
ax.set_xlabel('')
ax.set_ylabel('Daily % of Delayed Flights')
plt.tight_layout()
plt.show()
In [ ]:
# Conditioning: one hexbin panel per zip code
zip_codes = [98188, 98105, 98108, 98126]
kc_tax_zip = kc_tax0.loc[kc_tax0.ZipCode.isin(zip_codes), :]
kc_tax_zip

def hexbin(x, y, color, **kwargs):
    # Build a light-to-dark colormap from the facet color and draw the hexbin
    cmap = sns.light_palette(color, as_cmap=True)
    plt.hexbin(x, y, gridsize=25, cmap=cmap, **kwargs)

g = sns.FacetGrid(kc_tax_zip, col='ZipCode', col_wrap=2)
g.map(hexbin, 'SqFtTotLiving', 'TaxAssessedValue',
      extent=[0, 3500, 0, 700000])
g.set_axis_labels('Finished Square Feet', 'Tax Assessed Value')
g.set_titles('Zip code {col_name:.0f}')
plt.tight_layout()
plt.show()
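A numeric summary per zip code can be read alongside the faceted plot. This is a rough sketch on a made-up frame (`toy_tax_zip` and its values are assumptions for illustration, not the King County data).

```python
import pandas as pd

# Made-up stand-in for kc_tax_zip; values are illustrative only
toy_tax_zip = pd.DataFrame({
    'ZipCode': [98188, 98188, 98105, 98105, 98105, 98108],
    'SqFtTotLiving': [1000, 1400, 2000, 2400, 2200, 1800],
    'TaxAssessedValue': [200000, 260000, 500000, 600000, 550000, 300000],
})

# Row count and medians within each zip code (the conditioning variable)
by_zip = toy_tax_zip.groupby('ZipCode').agg(
    n=('TaxAssessedValue', 'size'),
    median_sqft=('SqFtTotLiving', 'median'),
    median_value=('TaxAssessedValue', 'median'),
)

print(by_zip)
```

Each row of `by_zip` corresponds to one panel of a faceted plot like the one above.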