포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ RATIO_TO_REPORT Oracle Function ]
RATIO_TO_REPORT함수는 분석 함수이다. 이 함수는 개별 그룹에서 관측치 합에 대한 개별 관측치 값의 상대적 비율을 계산한다. 만약 expr이 NULL이라면, ratio-to-report값은 NULL이다.
- 함수 설명 : RATIO_TO_REPORT 오라클 함수 링크
1. Oracle(오라클)
RATIO_TO_REPORT() 함수
Oracle Programming |
SELECT deptno, ename, sal,
ROUND( SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) ,3) AS sal_ratio_1,
ROUND( 100*RATIO_TO_REPORT(sal) OVER (PARTITION BY deptno) ,3) AS sal_ratio_2
FROM emp
DEPTNO ENAME SAL SAL_RATIO_1 SAL_RATIO_2
-------------------------------------------------
10 MILLER 1300 1 14.857
10 CLARK 2450 .653 28
10 KING 5000 .571 57.143
20 SMITH 800 1 7.356
20 ADAMS 1100 .579 10.115
20 JONES 2975 .61 27.356
20 SCOTT 3000 .276 27.586
20 FORD 3000 .276 27.586
30 JAMES 950 1 10.106
30 MARTIN 1250 .362 13.298
30 WARD 1250 .362 13.298
30 TURNER 1500 .303 15.957
30 ALLEN 1600 .244 17.021
30 BLAKE 2850 .303 30.319
2. Python Pandas(파이썬)
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
Python Programming |
withmooc = copy.copy(emp)
withmooc['sal_ratio'] = withmooc["sal"] / withmooc.groupby("deptno")["sal"].transform("sum")
withmooc.sort_values(['deptno','sal']).head()
Results |
empno ename job mgr hiredate sal comm deptno sal_ratio
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 0.148571
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.280000
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.571429
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 0.073563
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.101149
Python Programming |
withmooc = emp.copy()
withmooc['sal_ratio'] = emp.groupby('deptno')['sal'].transform(lambda x: x / x.sum())
withmooc.sort_values(['deptno','sal']).head()
Results |
empno ename job mgr hiredate sal comm deptno sal_ratio
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 0.148571
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.280000
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.571429
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 0.073563
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.101149
3. R Programming (R Package)
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
R Programming |
%%R
withmooc <- emp
withmooc['sal_ratio'] = with(emp, ave(sal, deptno, FUN =function(x) { x / sum(x) } ))
withmooc[order(withmooc$deptno,withmooc$sal), ][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_ratio
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 0.149
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.28
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.571
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0.0736
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.101
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.274
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0.276
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0.276
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 0.101
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 0.133
4. R Dplyr Package
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_ratio = sal / sum(sal)) %>%
dplyr::arrange(deptno,sal) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: deptno [2]
empno ename job mgr hiredate sal comm deptno sal_ratio
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 0.149
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.28
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.571
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0.0736
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.101
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.274
5. R sqldf Package
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
R Programming |
%%R
sqldf(" select deptno,
sal,
sal / sum(sal) over(partition by deptno) sal_ratio
from emp
order by 1,2")
6. Python pandasql Package
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
Python Programming |
ps.sqldf(" select deptno, \
sal, \
sum(sal) over(partition by deptno) sal_sum, \
100 * sal / sum(sal) over(partition by deptno) sal_ratio \
from emp ").head()
Results |
deptno sal sal_sum sal_ratio
0 10 2450 8750 28
1 10 5000 8750 57
2 10 1300 8750 14
3 20 800 10875 7
4 20 2975 10875 27
7. R data.table Package
개별 부서의 급여 합계를 기준으로 부서별로 개별 사원의 급여 비중을 계산한다.
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_ratio := sal / sum(sal), by=deptno][order(deptno,sal), ][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_ratio
1: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 0.14857143
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.28000000
3: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.57142857
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0.07356322
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.10114943
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.27356322
7: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0.27586207
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0.27586207
9: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 0.10106383
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 0.13297872
8. Python Duckdb의 SQL
Python Programming |
%%sql
SELECT deptno, ename, sal,
SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) AS sal_ratio_1
FROM emp
LIMIT 6
Python Programming |
duckdb.sql(" SELECT deptno, ename, sal, \
SAL / SUM(sal) OVER(PARTITION BY deptno ORDER BY sal) AS sal_ratio_1 \
FROM emp \
LIMIT 6 ").df()
Results |
deptno ename sal sal_ratio_1
0 30 JAMES 950 1.000000
1 30 WARD 1250 0.362319
2 30 MARTIN 1250 0.362319
3 30 TURNER 1500 0.303030
4 30 ALLEN 1600 0.244275
5 30 BLAKE 2850 0.303191
Python Programming |
%%sql
select deptno,
sal,
sum(sal) over(partition by deptno) sal_sum,
100 * sal / sum(sal) over(partition by deptno) sal_ratio
from emp
order by 1,2
LIMIT 6
Python Programming |
duckdb.sql(" select deptno, \
sal, \
sum(sal) over(partition by deptno) sal_sum, \
100 * sal / sum(sal) over(partition by deptno) sal_ratio \
from emp \
order by 1,2 \
LIMIT 6 ").df()
Results |
deptno sal sal_sum sal_ratio
0 10 1300 8750.0 14.857143
1 10 2450 8750.0 28.000000
2 10 5000 8750.0 57.142857
3 20 800 10875.0 7.356322
4 20 1100 10875.0 10.114943
5 20 2975 10875.0 27.356322
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글