포스팅 목차
49. Display the various jobs along with total sal for each of the jobs where total sal is greater than 4000.
* 개별 직무별 급여 합계를 구한 후 그 합계값이 5000 초과인 직무와 급여 합계를 출력하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- [데이터 추출] having 절
|
1. Oracle(오라클)
- Having 구문
Oracle Programming |
select job,
sum(sal) sal_sum
from emp
group
by job
having sum(sal)>5000;
2. Python Pandas(파이썬)
- Query() 함수
Python Programming |
emp.groupby('job')['sal'].sum().reset_index().query('sal > 5000')
Results |
job | sal | |
0 | ANALYST | 6000 |
2 | MANAGER | 8275 |
4 | SALESMAN | 5600 |
- Query() 함수
Python Programming |
emp.groupby('job').agg({'sal':'sum'}).reset_index().query('sal > 5000')
Results |
job | sal | |
0 | ANALYST | 6000 |
2 | MANAGER | 8275 |
4 | SALESMAN | 5600 |
- Filter() 함수
Python Programming |
emp.groupby('job').filter(lambda x: x['sal'].sum() > 5000).groupby('job')['sal'].sum()
Results |
job
ANALYST 6000
MANAGER 8275
SALESMAN 5600
Name: sal, dtype: int64
3. R Programming (R Package)
- Subset() 함수
R Programming |
%%R
subset( aggregate(sal ~ job, data=emp ,sum) , sal > 5000 )
Results |
job sal
1 ANALYST 6000
3 MANAGER 8275
5 SALESMAN 5600
- [참고] 집계함수 - Aggregate() 함수
R Programming |
%%R
aggregate(emp$sal,list(v1=emp$job),sum)
Results |
v1 sum.emp$sal
1 ANALYST 6000
2 CLERK 4150
3 MANAGER 8275
4 PRESIDENT 5000
5 SALESMAN 5600
4. R Dplyr Package
- Filter() 함수
R Programming |
%%R
emp %>%
group_by(job) %>%
summarise(sum_sal = sum(sal, na.rm = TRUE)) %>%
filter(sum_sal > 5000)
Results |
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
job sum_sal
<chr> <dbl>
1 ANALYST 6000
2 MANAGER 8275
3 SALESMAN 5600
5. R sqldf Package
- Having() 절
R Programming |
%%R
sqldf(" select job, sum(sal) sal_sum from emp group by job having sum(sal)>5000 ")
Results |
job sal_sum
1 ANALYST 6000
2 MANAGER 8275
3 SALESMAN 5600
6. Python pandasql Package
- Having() 절
Python Programming |
ps.sqldf("select job, sum(sal) sal_sum from emp group by job having sum(sal)>5000")
Results |
job | sal_sum | |
0 | ANALYST | 6000 |
1 | MANAGER | 8275 |
2 | SALESMAN | 5600 |
7. R data.table Package
- .SD
R Programming |
%%R
DT <- data.table(emp)
DT[, .SD[sum(sal, na.rm = TRUE) > 5000, .('sum_sal' = sum(sal,na.rm = TRUE))], keyby = .(job)]
Results |
job sum_sal
1: ANALYST 6000
2: MANAGER 8275
3: SALESMAN 5600
8. SAS Proc SQL
- Having() 절
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select job, sum(sal) AS SAL_SUM
from emp
group
by job
having sum(sal) > 5000;
QUIT;
PROC PRINT;RUN;
Results |
OBS | job | SAL_SUM |
1 | ANALYST | 6000 |
2 | MANAGER | 8275 |
3 | SALESMAN | 5600 |
9. SAS Data Step
- Where() 구문
SAS Programming |
%%SAS sas
PROC SUMMARY DATA=EMP NWAY;
CLASS job;
VAR SAL;
OUTPUT OUT=STATSAS_2(DROP=_: WHERE=(SAL_SUM > 5000)) SUM=SAL_SUM;
RUN;
PROC PRINT;RUN;
Results |
OBS | job | SAL_SUM |
1 | ANALYST | 6000 |
2 | MANAGER | 8275 |
3 | SALESMAN | 5600 |
- Where() 구문
SAS Programming |
%%SAS sas
PROC SORT DATA=EMP OUT=EMP_1;
BY job;
RUN;
DATA STATSAS_3(WHERE=(sal_SUM > 5000));
SET EMP_1;
BY job;
IF FIRST.job THEN sal_SUM = SAL;
ELSE sal_SUM + SAL; * SAL_SUM = SUM_SUL + SAL;
IF LAST.job; * IF LAST.JOB THEN OUTPUT STATSAS_3;
KEEP job sal_SUM;
RUN;
PROC PRINT;RUN;
Results |
OBS | job | sal_SUM |
1 | ANALYST | 6000 |
2 | MANAGER | 8275 |
3 | SALESMAN | 5600 |
10. Python Dfply Package
- Filter_by() 함수
Python Programming |
emp >> group_by('job') >> summarize( sal_tot = X.sal.sum() ) >> filter_by(X.sal_tot > 5000)
Results |
job | sal_tot | |
0 | ANALYST | 6000 |
2 | MANAGER | 8275 |
4 | SALESMAN | 5600 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트
반응형
댓글