본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

RATIO_TO_REPORT 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 13.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ RATIO_TO_REPORT Oracle Function ]

     

     


    RATIO_TO_REPORT함수는 분석 함수이다. 이 함수는 개별 그룹에서 관측치 합에 대한 개별 관측치 값의 상대적 비율을 계산한다. 만약 expr이 NULL이라면, ratio-to-report값은 NULL이다.

     

     

     


    1. Oracle(오라클)

     

    RATIO_TO_REPORT() 함수

     

    Oracle Programming
    SELECT deptno, ename, sal,
           ROUND( SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) ,3) AS sal_ratio_1,
           ROUND( 100*RATIO_TO_REPORT(sal) OVER (PARTITION BY deptno)   ,3) AS sal_ratio_2
    FROM   emp

     

    DEPTNO	ENAME	SAL	SAL_RATIO_1	SAL_RATIO_2
    -------------------------------------------------
    10	MILLER	1300	1		14.857
    10	CLARK	2450	.653		28
    10	KING	5000	.571		57.143
    20	SMITH	800	1		7.356
    20	ADAMS	1100	.579		10.115
    20	JONES	2975	.61		27.356
    20	SCOTT	3000	.276		27.586
    20	FORD	3000	.276		27.586
    30	JAMES	950	1		10.106
    30	MARTIN	1250	.362		13.298
    30	WARD	1250	.362		13.298
    30	TURNER	1500	.303		15.957
    30	ALLEN	1600	.244		17.021
    30	BLAKE	2850	.303		30.319

     


    2. Python Pandas(파이썬)

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['sal_ratio'] = withmooc["sal"] / withmooc.groupby("deptno")["sal"].transform("sum")
    
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_ratio
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	0.148571
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0.280000
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0.571429
      0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	0.073563
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0.101149

     


     

    Python Programming
    withmooc = emp.copy()
    withmooc['sal_ratio'] = emp.groupby('deptno')['sal'].transform(lambda x: x / x.sum())
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_ratio
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	0.148571
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0.280000
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0.571429
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	0.073563
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0.101149
     

     


    3. R Programming (R Package)

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['sal_ratio'] = with(emp, ave(sal, deptno, FUN =function(x) { x / sum(x) } ))  
    
    withmooc[order(withmooc$deptno,withmooc$sal), ][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_ratio
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    0.149 
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    0.28  
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    0.571 
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    0.0736
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    0.101 
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    0.274 
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    0.276 
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    0.276 
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    0.101 
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    0.133 

     

     


    4. R Dplyr Package

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_ratio = sal / sum(sal)) %>%
      dplyr::arrange(deptno,sal) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   deptno [2]
      empno ename  job         mgr hiredate     sal  comm deptno sal_ratio
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
    1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    0.149 
    2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    0.28  
    3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    0.571 
    4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    0.0736
    5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    0.101 
    6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    0.274 

     

     


    5. R sqldf Package

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    R Programming
    %%R
    
    sqldf(" select deptno,
                   sal,
                   sal / sum(sal) over(partition by deptno) sal_ratio
            from emp 
            order by 1,2")

     


    6. Python pandasql Package

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    Python Programming
    ps.sqldf(" select deptno,                                                      \
                      sal,                                                         \
                      sum(sal) over(partition by deptno) sal_sum,                  \
                      100 * sal / sum(sal) over(partition by deptno) sal_ratio     \
               from emp ").head()

     

    Results
    	deptno	sal	sal_sum	sal_ratio
    0	10	2450	8750	28
    1	10	5000	8750	57
    2	10	1300	8750	14
    3	20	800	10875	7
    4	20	2975	10875	27
     

     


    7. R data.table Package

     

    개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_ratio := sal / sum(sal), by=deptno][order(deptno,sal), ][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno  sal_ratio
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10 0.14857143
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 0.28000000
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 0.57142857
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 0.07356322
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 0.10114943
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 0.27356322
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 0.27586207
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 0.27586207
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 0.10106383
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 0.13297872

     


    8. Python Duckdb의 SQL

     

    Python Programming
    %%sql
    SELECT deptno, ename, sal,
           SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) AS sal_ratio_1
    FROM   emp
    LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT deptno, ename, sal,                                                      \
                        SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) AS sal_ratio_1     \
                 FROM   emp                                                                      \
                 LIMIT  6 ").df()

     

    Results
       deptno   ename   sal  sal_ratio_1
    0      30   JAMES   950     1.000000
    1      30    WARD  1250     0.362319
    2      30  MARTIN  1250     0.362319
    3      30  TURNER  1500     0.303030
    4      30   ALLEN  1600     0.244275
    5      30   BLAKE  2850     0.303191

     


    Python Programming
    %%sql
      select deptno,
             sal,
             sum(sal) over(partition by deptno) sal_sum,
             100 * sal / sum(sal) over(partition by deptno) sal_ratio
      from   emp
      order  by 1,2
      LIMIT  6

     

    Python Programming
    duckdb.sql(" select deptno,                                                        \
                        sal,                                                           \
                        sum(sal) over(partition by deptno) sal_sum,                    \
                        100 * sal / sum(sal) over(partition by deptno) sal_ratio       \
                 from   emp                                                            \
                 order  by 1,2                                                         \
                 LIMIT  6 ").df()

     

    Results
       deptno   sal  sal_sum  sal_ratio
    0      10  1300   8750.0  14.857143
    1      10  2450   8750.0  28.000000
    2      10  5000   8750.0  57.142857
    3      20   800  10875.0   7.356322
    4      20  1100  10875.0  10.114943
    5      20  2975  10875.0  27.356322

     

     


    ( https://unsplash.com/photos/Obyan1bxya8 )

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글