포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ PERCENT_RANK Oracle Function ]
PERCENT_RANK함수는 파라미터로 지정한 인수의 기준 값(위치)에 대한 그룹 내의 위치를 나타내는 순위 퍼센트(상대 순위)를 반환. PERCENT_RANK함수는 CUME_DIST(누적 분포) 함수와 유사하다. PERCENT_RANK에 의해 반환되는 값의 범위는 0~1(포함한) 사이의 값이다. 모든 집합의 최초 행은 0의 PERCENT_RANK이다. 반환되는 값은 NUMBER이다.
- 함수 설명 : PERCENT_RANK 오라클 함수 링크
1. Oracle(오라클)
1) 집계 함수
다음 예제는 hr.employees테이블에서 급여가 $1,000를 수령하는 직원의 Percent Rank를 계산한다.
Oracle Programming |
SELECT PERCENT_RANK(1000) WITHIN GROUP (ORDER BY SAL) AS PERCENT_RANK
FROM EMP;
Results |
PERCENT_RANK
-------------------------------------------
.1428571428571428571428571428571428571429
2) 분석함수
다음 예제는 개별 부서 내에서 근무하는 직원의 급여를 기준으로 Percent Rank를 계산한다.
Oracle Programming |
SELECT A.*,
PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS PERCENT_RANK
FROM EMP A
ORDER
BY DEPTNO, SAL DESC;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO PERCENT_RANK
-------------------------------------------------------------------------------------
7839 KING PRESIDENT - 17-NOV-81 5000 - 10 0
7782 CLARK MANAGER 7839 09-JUN-81 2450 - 10 .5
7934 MILLER CLERK 7782 23-JAN-82 1300 - 10 1
7788 SCOTT ANALYST 7566 19-APR-87 3000 - 20 0
7902 FORD ANALYST 7566 03-DEC-81 3000 - 20 0
7566 JONES MANAGER 7839 02-APR-81 2975 - 20 .5
7876 ADAMS CLERK 7788 23-MAY-87 1100 - 20 .75
7369 SMITH CLERK 7902 17-DEC-80 800 - 20 1
7698 BLAKE MANAGER 7839 01-MAY-81 2850 - 30 0
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 .2
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 .4
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 .6
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 .6
7900 JAMES CLERK 7698 03-DEC-81 950 - 30 1
2. Python Pandas(파이썬)
2) 분석함수
- Rank()
Python Programming |
withmooc = copy.copy(emp)
withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].apply(lambda x: x.rank(ascending=False) / len(x))
withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno pct_first
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.333333
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.666667
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1.000000
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 0.300000
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 0.300000
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 0.600000
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.800000
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1.000000
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 0.166667
1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 0.333333
- rank(pct=True,method='dense',ascending=False)
Python Programming |
withmooc = copy.copy(emp)
withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].rank(pct=True,method='dense',ascending=False)
withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno pct_first
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.333333
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.666667
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1.000000
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 0.250000
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 0.250000
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 0.500000
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.750000
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1.000000
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 0.200000
1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 0.400000
- x.rank(method='min',ascending=False)
Python Programming |
withmooc = copy.copy(emp)
withmooc['pct_first'] = withmooc.reset_index() \
.groupby(['deptno'])['sal'] \
.apply(lambda x: (x.rank(method='min',ascending=False) - 1) / (x.count() - 1) ) \
.values
withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno pct_first
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.00
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.50
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1.00
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 0.00
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 0.00
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 0.50
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.75
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1.00
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 0.00
1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 0.20
3. R Programming (R Package)
2) 분석함수
- rank(-x,ties.method = 'min')
R Programming |
%%R
withmooc <- emp[order(-emp$sal), ]
withmooc['pct_first'] = with(withmooc, ave(sal,deptno, FUN=function(x) { (rank(-x,ties.method = 'min') -1)/ (length(x)-1) }))
withmooc[order(withmooc$deptno,-withmooc$sal),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno pct_first
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.5
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
4 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
5 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.5
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.75
8 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
9 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 0
10 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 0.2
4. R Dplyr Package
2) 분석함수
- percent_rank(desc(sal))
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_Perrank = percent_rank(desc(sal))) %>%
dplyr::arrange(deptno,desc(sal)) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: deptno [3]
empno ename job mgr hiredate sal comm deptno sal_Perrank
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.5
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
4 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
5 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.5
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.75
8 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
9 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 0
10 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 0.2
5. R sqldf Package
1) 집계 함수
- PERCENT_RANK()
- 참고 : https://www.sqlshack.com/calculate-sql-percentile-using-the-sql-server-percent_rank-function/
R Programming |
%%R
sqldf(" WITH Student
AS ( SELECT '1' grp, sal from emp
union all
select '1' grp, 1000 sal)
SELECT grp, sal, PERCENT_RANK() OVER (PARTITION BY grp ORDER BY sal) as sal_PerRank
from Student ")[1:10, ]
Results |
grp sal sal_PerRank
1 1 800 0.00000000
2 1 950 0.07142857
3 1 1000 0.14285714
4 1 1100 0.21428571
5 1 1250 0.28571429
6 1 1250 0.28571429
7 1 1300 0.42857143
8 1 1500 0.50000000
9 1 1600 0.57142857
10 1 2450 0.64285714
2) 분석함수
R Programming |
%%R
sqldf(" SELECT DEPTNO, ENAME, SAL,
PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) as sal_PerRank
FROM emp
order by deptno, sal desc; ")[1:10, ]
Results |
deptno ename sal sal_PerRank
1 10 KING 5000 0.00
2 10 CLARK 2450 0.50
3 10 MILLER 1300 1.00
4 20 SCOTT 3000 0.00
5 20 FORD 3000 0.00
6 20 JONES 2975 0.50
7 20 ADAMS 1100 0.75
8 20 SMITH 800 1.00
9 30 BLAKE 2850 0.00
10 30 ALLEN 1600 0.20
6. Python pandasql Package
1) 집계 함수
- PERCENT_RANK()
- 참고 : https://www.sqlshack.com/calculate-sql-percentile-using-the-sql-server-percent_rank-function/
with 구문에 급여 1000 값을 임으로 추가 후 이에 해당하는 순위 퍼센트(0.14285714)를 계산한다.
Python Programming |
ps.sqldf(" WITH Student \
AS ( SELECT '1' grp, sal from emp \
union all \
select '1' grp, 1000 sal) \
SELECT sal, PERCENT_RANK() OVER (PARTITION BY grp ORDER BY sal) as sal_PerRank from Student ").head(10)
Results |
sal sal_PerRank
0 800 0.000000
1 950 0.071429
2 1000 0.142857
3 1100 0.214286
4 1250 0.285714
5 1250 0.285714
6 1300 0.428571
7 1500 0.500000
8 1600 0.571429
9 2450 0.642857
2) 분석함수
- PERCENT_RANK()
Python Programming |
ps.sqldf(" SELECT DEPTNO, ENAME, SAL, \
PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) as PerRank \
FROM emp \
order by deptno, sal desc; ").head(10)
Results |
deptno ename sal PerRank
0 10 KING 5000 0.00
1 10 CLARK 2450 0.50
2 10 MILLER 1300 1.00
3 20 SCOTT 3000 0.00
4 20 FORD 3000 0.00
5 20 JONES 2975 0.50
6 20 ADAMS 1100 0.75
7 20 SMITH 800 1.00
8 30 BLAKE 2850 0.00
9 30 ALLEN 1600 0.20
7. R data.table Package
2) 분석함수
- dplyr::percent_rank(desc(sal))
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_Perrank := dplyr::percent_rank(desc(sal)), by=deptno][order(deptno,-sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.00
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.50
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1.00
4: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0.00
5: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0.00
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.50
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.75
8: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1.00
9: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 0.00
10: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 0.20
- rank(-sal,ties.method = 'min')
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_Perrank := (rank(-sal,ties.method = 'min') -1)/ (length(sal)-1), by=deptno][order(deptno,-sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.00
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.50
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1.00
4: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0.00
5: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0.00
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.50
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.75
8: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1.00
9: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 0.00
10: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 0.20
- rank(-sal,ties.method = 'min')
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_Perrank := (rank(-sal,ties.method = 'min') -1)/ (length(sal)-1), by=deptno][order(deptno,-sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0.00
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0.50
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1.00
4: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0.00
5: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0.00
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0.50
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0.75
8: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1.00
9: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 0.00
10: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 0.20
[참고] frank 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 3
4: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1
5: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 2
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 4
8: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 5
9: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 1
10: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
8. Python Duckdb의 SQL
Python Programming |
%%sql
SELECT A.*,
PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS per_rank
FROM emp A
ORDER BY DEPTNO, SAL DESC
LIMIT 7
Python Programming |
duckdb.sql(" SELECT A.*, \
PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS per_rank \
FROM emp A \
ORDER BY DEPTNO, SAL DESC \
LIMIT 7 ").df()
Results |
empno ename job mgr hiredate sal comm deptno per_rank
0 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0.00
1 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0.50
2 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1.00
3 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 0.00
4 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 0.00
5 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 0.50
6 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0.75
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글