포스팅 목차
48. Display the department numbers with more than three employees in each dept.
* 개별 부서에 속한 직원의 숫자가 3명 이상인 부서만 출력하라.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- [그룹별 집계함수와 조건식] 그룹별 건수 집계 후 조건절에 만족하는 그룹 선택
|
1. Oracle(오라클)
- Group by 절과 having 절
Oracle Programming |
select deptno,
count(*) emp_cnt
from emp
group
by deptno
having count(*) > 3;
2. Python Pandas(파이썬)
- Groupby 함수과 Query() 함수
Python Programming |
emp.groupby('deptno')['sal'].count().reset_index().query('sal > 3')
Results |
deptno | sal | |
1 | 20 | 5 |
2 | 30 | 6 |
- Groupby 함수와 관측치 선택
Python Programming |
emp1 = emp['sal'].groupby(emp['deptno']).count()
emp1.loc[emp1 > 3,]
Results |
deptno
20 5
30 6
Name: sal, dtype: int64
- [참고] Groupby 함수와 관측치 선택
Python Programming |
emp1 = emp['sal'].groupby(emp['deptno']).count()
emp1.loc[emp1==emp1.max(),]
Results |
deptno
30 6
Name: sal, dtype: int64
3. R Programming (R Package)
- 그룹별 집계함수(Aggregate() 함수) 와 subset 함수
R Programming |
%%R
subset( aggregate(sal ~ deptno, data = emp, FUN = function(x) c(count_sal = length(x) ) ), sal >3 )
Results |
deptno sal
2 20 5
3 30 6
4. R Dplyr Package
- Group_by 함수과 Filter() 함수
R Programming |
%%R
emp %>%
group_by(deptno) %>% summarise( n_sal=n() ) %>% filter(n_sal > 3)
Results |
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
deptno n_sal
<dbl> <int>
1 20 5
2 30 6
5. R sqldf Package
- Group by 절과 having 절
R Programming |
%%R
sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")
Results |
deptno count(*)
1 20 5
2 30 6
6. Python pandasql Package
- Group by 절과 having 절
Python Programming |
ps.sqldf("select deptno, count(*) from emp group by deptno having count(*)>3")
Results |
deptno | count(*) | |
0 | 20 | 5 |
1 | 30 | 6 |
7. R data.table Package
- keyby= 구문과 .SD
R Programming |
%%R
DT <- data.table(emp)
DT[, .SD[.N > 3, .('N_count' = .N)], keyby = .(deptno)]
Results |
deptno N_count
1: 20 5
2: 30 6
8. SAS Proc SQL
- Group by 절과 having 절
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select deptno, count(*) as sal_cnt
from emp
group
by deptno
having count(*) > 3;;
QUIT;
PROC PRINT;RUN;
Results |
OBS | deptno | sal_cnt |
1 | 20 | 5 |
2 | 30 | 6 |
9. SAS Data Step
- Proc summary 프로시져와 Where= 인수
SAS Programming |
%%SAS sas
PROC SUMMARY DATA=EMP NWAY;
CLASS deptno;
VAR SAL;
OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(sal_cnt>3)) n=sal_cnt;
RUN;
PROC PRINT;RUN;
Results |
OBS | deptno | sal_cnt |
1 | 20 | 5 |
2 | 30 | 6 |
- By 구문과 Where= 인수
SAS Programming |
%%SAS sas
PROC SORT DATA=EMP OUT=EMP_1;
BY deptno;
RUN;
DATA STATSAS_3(WHERE=(sal_cnt > 3));
SET EMP_1;
BY deptno;
IF FIRST.deptno THEN sal_cnt = 1;
ELSE DO;
sal_cnt+1;
END;
IF LAST.deptno THEN OUTPUT STATSAS_3; * IF LAST.deptno;
KEEP DEPTNO sal_cnt;
RUN;
PROC PRINT;RUN;
Results |
OBS | deptno | sal_cnt |
1 | 20 | 5 |
2 | 30 | 6 |
10. Python Dfply Package
- Group_by 함수와 fylter_by() 함수
Python Programming |
emp >> \
group_by('deptno') >> \
summarize( emp_cnt = X.empno.count(), emp_cnt1 = X.empno.nunique(), emp_cnt2 = n(X.empno) ) >> \
ungroup() >> \
filter_by( X.emp_cnt > 3 )
Results |
deptno | emp_cnt | emp_cnt1 | emp_cnt2 | |
1 | 20 | 5 | 5 | 5 |
2 | 30 | 6 | 6 | 6 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트
반응형
댓글