포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ SUM Oracle Function ]
SUM 함수는 개별 그룹 내 관측치 또는 전체 자료의 모든 관측치를 기준으로 인수 expr에 대한 값의 합계를 반환한다. 집계 함수 또는 분석함수로 사용할 수 있다.
- 함수 설명 : SUM 오라클 함수 링크
1. Oracle(오라클)
1) 집계 함수
- sum() 함수
개별 부서에 속한 사원들의 급여 합계를 반환한다.
Oracle Programming |
SELECT deptno, SUM(SAL) sal_tot
FROM EMP
GROUP BY deptno;
Results |
DEPTNO SAL_TOT
-----------------
30 9400
10 8750
20 10875
2) 분석함수
- sum() 함수
개별 그룹별 사원들의 급여 합계를 계산하여 원본 자료에 추가하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.
Oracle Programming |
SELECT ENAME, deptno, sal,
SUM(SAL) OVER (PARTITION BY deptno) sal_tot
FROM EMP
Results |
ENAME DEPTNO SAL SAL_TOT
--------------------------------
CLARK 10 2450 8750
MILLER 10 1300 8750
KING 10 5000 8750
FORD 20 3000 10875
SCOTT 20 3000 10875
JONES 20 2975 10875
SMITH 20 800 10875
ADAMS 20 1100 10875
WARD 30 1250 9400
MARTIN 30 1250 9400
TURNER 30 1500 9400
JAMES 30 950 9400
ALLEN 30 1600 9400
BLAKE 30 2850 9400
2. Python Pandas(파이썬)
1) 집계 함수
- sum() 함수
개별 부서에 속한 사원들의 급여 합계를 반환한다.
Python Programming |
emp.groupby('deptno')['sal'].sum()
Results |
deptno
10 8750
20 10875
30 9400
Name: sal, dtype: int64
- agg(lambda x: sum(x))
Python Programming |
emp.groupby('deptno')['sal'].agg(lambda x: sum(x))
Results |
deptno
10 8750
20 10875
30 9400
Name: sal, dtype: int64
2) 분석함수
- transform(lambda x: sum(x))
개별 그룹별 사원들의 급여 합계를 계산 후 원본 자료에 추가(Join)하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.
Python Programming |
withmooc = copy.copy(emp)
withmooc['sal_sum'] = withmooc.groupby('deptno')['sal'].transform(lambda x: sum(x))
withmooc.sort_values(['deptno']).head()
Results |
empno ename job mgr hiredate sal comm deptno sal_sum
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 8750
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 8750
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 8750
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 10875
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 10875
3. R Programming (R Package)
1) 집계 함수
- sum() 함수
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
plyr::ddply(emp,.(deptno),transform,sal_sum = sum(sal))[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_sum
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 8750
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 8750
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 8750
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 10875
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 10875
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 10875
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 10875
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 10875
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 9400
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 9400
- aggregate() 함수와 sum
개별 부서에 속한 사원들의 급여 합계를 반환한다.
R Programming |
%%R
aggregate(sal ~ deptno, data=emp, FUN=sum)
Results |
deptno sal
1 10 8750
2 20 10875
3 30 9400
- by() 함수와 sum
R Programming |
%%R
unlist(with(emp, by(sal, deptno, sum)))
Results |
deptno: 10
[1] 8750
------------------------------------------------------------
deptno: 20
[1] 10875
------------------------------------------------------------
deptno: 30
[1] 9400
2) 분석함수
- sum() 함수
R Programming |
%%R
withmooc <- emp
withmooc['sum_sal'] = with(emp, ave(sal, deptno, FUN =function(x) { sum(x) } ))
withmooc[order(withmooc$deptno), ][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sum_sal
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 8750
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 8750
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 8750
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 10875
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 10875
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 10875
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 10875
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 10875
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 9400
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 9400
- ave() 함수와 sum
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
withmooc['sal_sum'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { sum(x) }))
withmooc[order(withmooc$deptno),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_sum
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 8750
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 8750
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 8750
4 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 10875
5 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 10875
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 10875
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 10875
8 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 10875
9 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 9400
10 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 9400
4. R Dplyr Package
1) 집계 함수
- sum() 함수
개별 부서에 속한 사원들의 급여 합계를 반환한다.
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::summarize(sal_sum = sum(sal)) %>%
head()
Results |
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
deptno sal_sum
<dbl> <dbl>
1 10 8750
2 20 10875
3 30 9400
2) 분석함수
- sum() 함수
개별 그룹별 사원들의 급여 합계를 계산 후 원본 자료에 추가(Join)하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_sum = sum(sal)) %>%
dplyr::arrange(deptno) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: deptno [2]
empno ename job mgr hiredate sal comm deptno sal_sum
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 8750
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 8750
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 8750
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 10875
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 10875
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 10875
5. R sqldf Package
1) 집계 함수
- sum() 함수
R Programming |
%%R
sqldf(" SELECT deptno, SUM(sal) sal_sum FROM emp GROUP BY deptno; ")
Results |
deptno sal_sum
1 10 8750
2 20 10875
3 30 9400
2) 분석함수
- sum() 함수
R Programming |
%%R
sqldf(" SELECT ename, deptno, sal, SUM(sal) OVER (PARTITION BY deptno) sal_sum
FROM emp ")
Results |
ename deptno sal sal_sum
1 CLARK 10 2450 8750
2 KING 10 5000 8750
3 MILLER 10 1300 8750
4 SMITH 20 800 10875
5 JONES 20 2975 10875
6 SCOTT 20 3000 10875
7 ADAMS 20 1100 10875
8 FORD 20 3000 10875
9 ALLEN 30 1600 9400
10 WARD 30 1250 9400
11 MARTIN 30 1250 9400
12 BLAKE 30 2850 9400
13 TURNER 30 1500 9400
14 JAMES 30 950 9400
6. Python pandasql Package
1) 집계 함수
- sum() 함수
Python Programming |
ps.sqldf(" SELECT deptno, SUM(sal) sal_sum FROM emp GROUP BY deptno ")
Results |
deptno sal_sum
0 10 8750
1 20 10875
2 30 9400
2) 분석함수
- sum() 함수
Python Programming |
ps.sqldf(" SELECT ename, deptno, sal, \
SUM(sal) OVER (PARTITION BY deptno) sal_sum \
FROM emp ").head()
Results |
ename deptno sal sal_sum
0 CLARK 10 2450 8750
1 KING 10 5000 8750
2 MILLER 10 1300 8750
3 SMITH 20 800 10875
4 JONES 20 2975 10875
7. R data.table Package
1) 집계 함수
- sum() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[,list(sal_sum = sum(sal)), by=deptno]
Results |
deptno sal_sum
1: 20 10875
2: 30 9400
3: 10 8750
2) 분석함수
- sum() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[,sal_sum := sum(sal), by=deptno][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_sum
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 8750
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 8750
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 8750
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 10875
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 10875
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 10875
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 10875
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 10875
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 9400
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 9400
8. Python Duckdb의 SQL
1) 집계 함수
- sum() 함수
Python Programming |
%%sql
SELECT deptno, SUM(SAL) sal_tot FROM emp
GROUP BY deptno
ORDER BY 1
Python Programming |
duckdb.sql(" SELECT deptno, SUM(SAL) sal_tot FROM emp \
GROUP BY deptno \
ORDER BY 1 ").df()
Results |
deptno sal_tot
0 10 8750.0
1 20 10875.0
2 30 9400.0
2) 분석 함수
- sum() 함수
Python Programming |
%%sql
SELECT ename, deptno, sal,
SUM(SAL) OVER (PARTITION BY deptno) sal_tot
FROM emp
ORDER BY DEPTNO
LIMIT 6
Python Programming |
duckdb.sql(" SELECT ename, deptno, sal, \
SUM(SAL) OVER (PARTITION BY deptno) sal_tot \
FROM emp \
ORDER BY DEPTNO \
LIMIT 6 ").df()
Results |
ename deptno sal sal_tot
0 CLARK 10 2450 8750.0
1 KING 10 5000 8750.0
2 MILLER 10 1300 8750.0
3 SMITH 20 800 10875.0
4 JONES 20 2975 10875.0
5 SCOTT 20 3000 10875.0
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글