Some titles, such as 'Ms' and 'Miss', mean the same thing, so we merge titles like these into a single category.
O’Driscoll, Miss. Bridget
Samaan, Mr. Youssef
Arnold-Franchi, Mrs. Josef (Josefine Franchi)
Panula, Master. Juha Niilo
Nosworthy, Mr. Richard Cater
Harper, Mrs. Henry Sleeper (Myna Haxtun)
Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)
Ostby, Mr. Engelhart Cornelius
Woolner, Mr. Hugh
import re

def get_title(name):
    # A title is the run of letters between a space and a period, e.g. " Mr."
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    # If the title exists, extract and return it.
    if title_search:
        return title_search.group(1)
    return ""

for dataset in full_data:
    dataset['Title'] = dataset['Name'].apply(get_title)

print(pd.crosstab(train['Title'], train['Sex']))
Running the code above prints the following list of titles extracted from the names, broken down by sex:
Sex       female  male
Title
Capt           0     1
Col            0     2
Countess       1     0
Don            0     1
Dr             1     6
Jonkheer       0     1
Lady           1     0
Major          0     2
Master         0    40
Miss         182     0
Mlle           2     0
Mme            1     0
Mr             0   517
Mrs          125     0
Ms             1     0
Rev            0     6
Sir            0     1
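To see what the regular expression picks up from an individual name, here is a minimal standalone sketch. It uses the same pattern as `get_title`, applied to a few of the sample names plus one made-up string without a title (`"No title here"` is a hypothetical input, not from the dataset). `TITLE_PATTERN`, `samples`, and `titles` are names introduced only for this illustration.

```python
import re

# Same pattern as in get_title: a space, a run of letters, then a literal period.
TITLE_PATTERN = re.compile(r' ([A-Za-z]+)\.')

def get_title(name):
    """Return the first word that directly precedes a period, or "" if none."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else ""

samples = [
    "O'Driscoll, Miss. Bridget",
    "Panula, Master. Juha Niilo",
    "Arnold-Franchi, Mrs. Josef (Josefine Franchi)",
    "No title here",  # hypothetical input with no period at all
]

titles = [get_title(s) for s in samples]
print(titles)  # ['Miss', 'Master', 'Mrs', '']
```

Because the pattern returns the first match, a name with more than one period would still yield only the earliest title-like word.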
Next, merge the titles that have the same meaning and group the uncommon ones under 'Rare':
# Group the uncommon titles under a single 'Rare' label.
for dataset in full_data:
    dataset['Title'] = dataset['Title'].replace(
        ['Lady', 'Countess', 'Capt', 'Col', 'Don', 'Dr',
         'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
    # Mlle and Ms are equivalent to Miss; Mme is equivalent to Mrs.
    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

print(train[['Title', 'Survived']].groupby(['Title'], as_index=False).mean())
Result:
    Title  Survived
0  Master  0.575000
1    Miss  0.702703
2      Mr  0.156673
3     Mrs  0.793651
4    Rare  0.347826
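The same merging rules can also be written as a single lookup table, which keeps them in one place. The following is only a sketch in plain Python, not the approach used in this post: `RARE_TITLES`, `TITLE_MAP`, and `normalize_title` are hypothetical names introduced here, and the `raw` list is toy input rather than the real data.

```python
# Titles that the post groups under 'Rare'.
RARE_TITLES = ['Lady', 'Countess', 'Capt', 'Col', 'Don', 'Dr',
               'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona']

# One dictionary holding every merge rule.
TITLE_MAP = {title: 'Rare' for title in RARE_TITLES}
TITLE_MAP.update({'Mlle': 'Miss', 'Ms': 'Miss', 'Mme': 'Mrs'})

def normalize_title(title):
    """Return the merged category; titles without a rule pass through unchanged."""
    return TITLE_MAP.get(title, title)

# Toy input for illustration only.
raw = ['Mr', 'Mlle', 'Ms', 'Mme', 'Dr', 'Countess', 'Master']
merged = [normalize_title(t) for t in raw]
print(merged)  # ['Mr', 'Miss', 'Miss', 'Mrs', 'Rare', 'Rare', 'Master']
```

With a DataFrame, a function like this could presumably be applied through `dataset['Title'].apply(normalize_title)`, in the same way `get_title` is applied to the `Name` column.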