문제
Table: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.
For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.
For Pandas users, please note that you are supposed to modify Person in place.
After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.
https://leetcode.com/problems/delete-duplicate-emails/description/?lang=pythondata
예시
Input:
Person table:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Output:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
Explanation: john@example.com is repeated two times. We keep the row with the smallest Id = 1.
문제 풀이
import pandas as pd
def delete_duplicate_emails(person: pd.DataFrame) -> None:
person.sort_values(by='id', inplace=True)
person.drop_duplicates(subset='email', keep='first', inplace=True)
파이썬을 독학하시는 분들에게 도움이 되길 바라며,
혹 더 좋은 방법이 있거나 오류가 있다면 편하게 말씀 부탁드립니다.