본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

SUM 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 21.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ SUM Oracle Function ]

     


    SUM 함수는 개별 그룹 내 관측치 또는 전체 자료의 모든 관측치를 기준으로 인수 expr에 대한 값의 합계를 반환한다. 집계 함수 또는 분석함수로 사용할 수 있다.

     

     


    1. Oracle(오라클)

     

    1) 집계 함수

     

    • sum() 함수

    개별 부서에 속한 사원들의 급여 합계를 반환한다.

     

    Oracle Programming
    SELECT deptno, SUM(SAL) sal_tot 
    FROM   EMP
    GROUP  BY deptno;

     

    Results
    DEPTNO	SAL_TOT
    -----------------
    30	9400
    10	8750
    20	10875

    2) 분석함수

     

    • sum() 함수

    개별 그룹별 사원들의 급여 합계를 계산하여 원본 자료에 추가하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.

     

    Oracle Programming
    SELECT ENAME, deptno, sal, 
           SUM(SAL) OVER (PARTITION BY deptno) sal_tot
    FROM   EMP

     

    Results
    ENAME	DEPTNO	SAL	SAL_TOT
    --------------------------------
    CLARK	10	2450	8750
    MILLER	10	1300	8750
    KING	10	5000	8750
    FORD	20	3000	10875
    SCOTT	20	3000	10875
    JONES	20	2975	10875
    SMITH	20	800	10875
    ADAMS	20	1100	10875
    WARD	30	1250	9400
    MARTIN	30	1250	9400
    TURNER	30	1500	9400
    JAMES	30	950	9400
    ALLEN	30	1600	9400
    BLAKE	30	2850	9400

     


    2. Python Pandas(파이썬)

     

    1) 집계 함수

     

    • sum() 함수

    개별 부서에 속한 사원들의 급여 합계를 반환한다.

     

    Python Programming
    emp.groupby('deptno')['sal'].sum()

     

    Results
    deptno
    10     8750
    20    10875
    30     9400
    Name: sal, dtype: int64

     


    • agg(lambda x: sum(x))
    Python Programming
    emp.groupby('deptno')['sal'].agg(lambda x: sum(x))

     

    Results
    deptno
    10     8750
    20    10875
    30     9400
    Name: sal, dtype: int64

     


    2) 분석함수

     

    • transform(lambda x: sum(x))

    개별 그룹별 사원들의 급여 합계를 계산 후 원본 자료에 추가(Join)하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.

     

    Python Programming
    withmooc = copy.copy(emp)
    withmooc['sal_sum'] = withmooc.groupby('deptno')['sal'].transform(lambda x: sum(x))
    withmooc.sort_values(['deptno']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_sum
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	8750
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	8750
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	8750
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	10875
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	10875
     

     


    3. R Programming (R Package)

     

    1) 집계 함수

    • sum() 함수
    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    plyr::ddply(emp,.(deptno),transform,sal_sum = sum(sal))[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno sal_sum
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    8750
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    8750
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    8750
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20   10875
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20   10875
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20   10875
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20   10875
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20   10875
    9   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30    9400
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    9400

     


    • aggregate() 함수와 sum

    개별 부서에 속한 사원들의 급여 합계를 반환한다.

     

    R Programming
    %%R
    
    aggregate(sal ~ deptno, data=emp, FUN=sum)

     

    Results
      deptno   sal
    1     10  8750
    2     20 10875
    3     30  9400

     


    • by() 함수와 sum
    R Programming
    %%R
    
    unlist(with(emp, by(sal, deptno, sum)))

     

    Results
    deptno: 10
    [1] 8750
    ------------------------------------------------------------ 
    deptno: 20
    [1] 10875
    ------------------------------------------------------------ 
    deptno: 30
    [1] 9400

     


    2) 분석함수

     

    • sum() 함수
    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['sum_sal'] = with(emp, ave(sal, deptno, FUN =function(x) { sum(x) } ))  
    
    withmooc[order(withmooc$deptno), ][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sum_sal
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>
     1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    8750
     2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    8750
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    8750
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20   10875
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20   10875
     6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20   10875
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20   10875
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20   10875
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30    9400
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    9400

     


    • ave() 함수와 sum
    R Programming
    %%R
    withmooc <- emp[order(-emp$sal),]
    withmooc['sal_sum'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { sum(x) }))
    
    withmooc[order(withmooc$deptno),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_sum
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>
     1  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    8750
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    8750
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    8750
     4  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20   10875
     5  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20   10875
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20   10875
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20   10875
     8  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20   10875
     9  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30    9400
    10  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30    9400

     

     


    4. R Dplyr Package

     

    1) 집계 함수

     

    • sum() 함수

    개별 부서에 속한 사원들의 급여 합계를 반환한다.

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::summarize(sal_sum = sum(sal)) %>%
      head()

     

    Results
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 2
      deptno sal_sum
       <dbl>   <dbl>
    1     10    8750
    2     20   10875
    3     30    9400

     


    2) 분석함수

     

    • sum() 함수

    개별 그룹별 사원들의 급여 합계를 계산 후 원본 자료에 추가(Join)하여서 각 관측치에 해당 부서의 급여 합계를 추가한다.

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_sum = sum(sal)) %>%
      dplyr::arrange(deptno) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   deptno [2]
      empno ename  job         mgr hiredate     sal  comm deptno sal_sum
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>
    1  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    8750
    2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    8750
    3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    8750
    4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20   10875
    5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20   10875
    6  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20   10875

     


    5. R sqldf Package

     

    1) 집계 함수

    • sum() 함수
    R Programming
    %%R
    
    sqldf(" SELECT deptno, SUM(sal) sal_sum FROM emp GROUP BY deptno; ")

     

    Results
      deptno sal_sum
    1     10    8750
    2     20   10875
    3     30    9400

     


    2) 분석함수

    • sum() 함수
    R Programming
    %%R
    
    sqldf(" SELECT ename, deptno, sal, SUM(sal) OVER (PARTITION BY deptno) sal_sum
            FROM emp ")

     

    Results
        ename deptno  sal sal_sum
    1   CLARK     10 2450    8750
    2    KING     10 5000    8750
    3  MILLER     10 1300    8750
    4   SMITH     20  800   10875
    5   JONES     20 2975   10875
    6   SCOTT     20 3000   10875
    7   ADAMS     20 1100   10875
    8    FORD     20 3000   10875
    9   ALLEN     30 1600    9400
    10   WARD     30 1250    9400
    11 MARTIN     30 1250    9400
    12  BLAKE     30 2850    9400
    13 TURNER     30 1500    9400
    14  JAMES     30  950    9400

     


    6. Python pandasql Package

     

     

    1) 집계 함수

    • sum() 함수
    Python Programming
    ps.sqldf(" SELECT deptno, SUM(sal) sal_sum FROM emp GROUP BY deptno   ")

     

    Results
    	deptno	sal_sum
    0	10	8750
    1	20	10875
    2	30	9400
     

    2) 분석함수

    • sum() 함수
    Python Programming
    ps.sqldf(" SELECT ename, deptno, sal,                             \
                      SUM(sal) OVER (PARTITION BY deptno) sal_sum     \
               FROM emp ").head()

     

    Results
    	ename	deptno	sal	sal_sum
    0	CLARK	10	2450	8750
    1	KING	10	5000	8750
    2	MILLER	10	1300	8750
    3	SMITH	20	800	10875
    4	JONES	20	2975	10875

     


    7. R data.table Package

     

    1) 집계 함수

    • sum() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[,list(sal_sum = sum(sal)), by=deptno]

     

    Results
       deptno sal_sum
    1:     20   10875
    2:     30    9400
    3:     10    8750

     


    2) 분석함수

    • sum() 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[,sal_sum := sum(sal), by=deptno][order(deptno),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_sum
     1:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    8750
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    8750
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    8750
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20   10875
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20   10875
     6:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20   10875
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20   10875
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20   10875
     9:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30    9400
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    9400

     


    8. Python Duckdb의 SQL

     

    1) 집계 함수

    • sum() 함수
    Python Programming
    %%sql
      SELECT deptno, SUM(SAL) sal_tot FROM emp
      GROUP  BY deptno
      ORDER  BY 1

     

    Python Programming
    duckdb.sql(" SELECT deptno, SUM(SAL) sal_tot FROM emp   \
                 GROUP  BY deptno                           \
                 ORDER  BY 1 ").df()

     

    Results
       deptno  sal_tot
    0      10   8750.0
    1      20  10875.0
    2      30   9400.0

     


     

    2) 분석 함수

    • sum() 함수
    Python Programming
    %%sql
      SELECT ename, deptno, sal,
             SUM(SAL) OVER (PARTITION BY deptno) sal_tot
      FROM   emp
      ORDER  BY DEPTNO
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT ename, deptno, sal,                              \
                        SUM(SAL) OVER (PARTITION BY deptno) sal_tot      \
                 FROM   emp                                              \
                 ORDER  BY DEPTNO                                        \
                 LIMIT  6 ").df()

     

    Results
        ename  deptno   sal  sal_tot
    0   CLARK      10  2450   8750.0
    1    KING      10  5000   8750.0
    2  MILLER      10  1300   8750.0
    3   SMITH      20   800  10875.0
    4   JONES      20  2975  10875.0
    5   SCOTT      20  3000  10875.0

     


    ( https://unsplash.com/photos/gdL-UZfnD3I )

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글