본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

PERCENT_RANK 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 10.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ PERCENT_RANK  Oracle Function ]

     


    PERCENT_RANK함수는 파라미터로 지정한 인수의 기준 값(위치)에 대한 그룹 내의 위치를 나타내는 순위 퍼센트(상대 순위)를 반환. PERCENT_RANK함수는 CUME_DIST(누적 분포) 함수와 유사하다. PERCENT_RANK에 의해 반환되는 값의 범위는 0~1(포함한) 사이의 값이다. 모든 집합의 최초 행은 0의 PERCENT_RANK이다. 반환되는 값은 NUMBER이다.

     

     

     


    1. Oracle(오라클)

     

    1) 집계 함수

    다음 예제는 hr.employees테이블에서 급여가 $1,000를 수령하는 직원의 Percent Rank를 계산한다.

     

    Oracle Programming
    SELECT PERCENT_RANK(1000) WITHIN GROUP (ORDER BY SAL) AS PERCENT_RANK
    FROM   EMP;

     

    Results
                 PERCENT_RANK
    -------------------------------------------
    .1428571428571428571428571428571428571429

     


    2) 분석함수

    다음 예제는 개별 부서 내에서 근무하는 직원의 급여를 기준으로 Percent Rank를 계산한다.

     

    Oracle Programming
    SELECT A.*,
           PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS PERCENT_RANK       
    FROM   EMP A
    ORDER 
       BY  DEPTNO, SAL DESC;
    EMPNO	ENAME	JOB	MGR	HIREDATE	SAL	COMM	DEPTNO	PERCENT_RANK
    -------------------------------------------------------------------------------------
    7839	KING	PRESIDENT	 - 	17-NOV-81	5000	 - 	10	0
    7782	CLARK	MANAGER		7839	09-JUN-81	2450	 - 	10	.5
    7934	MILLER	CLERK		7782	23-JAN-82	1300	 - 	10	1
    7788	SCOTT	ANALYST		7566	19-APR-87	3000	 - 	20	0
    7902	FORD	ANALYST		7566	03-DEC-81	3000	 - 	20	0
    7566	JONES	MANAGER		7839	02-APR-81	2975	 - 	20	.5
    7876	ADAMS	CLERK		7788	23-MAY-87	1100	 - 	20	.75
    7369	SMITH	CLERK		7902	17-DEC-80	800	 - 	20	1
    7698	BLAKE	MANAGER		7839	01-MAY-81	2850	 - 	30	0
    7499	ALLEN	SALESMAN	7698	20-FEB-81	1600	300	30	.2
    7844	TURNER	SALESMAN	7698	08-SEP-81	1500	0	30	.4
    7521	WARD	SALESMAN	7698	22-FEB-81	1250	500	30	.6
    7654	MARTIN	SALESMAN	7698	28-SEP-81	1250	1400	30	.6
    7900	JAMES	CLERK		7698	03-DEC-81	950	 - 	30	1

    2. Python Pandas(파이썬)

     

    2) 분석함수

     

    • Rank()

     

    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].apply(lambda x: x.rank(ascending=False) / len(x))
    
    withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0.333333
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0.666667
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1.000000
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	0.300000
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	0.300000
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	0.600000
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0.800000
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1.000000
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	0.166667
    1	7499	ALLEN	SALESMAN	7698.0	1981/02/20	1600	300.0	30	0.333333

     


    • rank(pct=True,method='dense',ascending=False)

     

    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['pct_first'] = withmooc.groupby('deptno')['sal'].rank(pct=True,method='dense',ascending=False)
    
    withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0.333333
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0.666667
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1.000000
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	0.250000
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	0.250000
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	0.500000
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0.750000
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1.000000
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	0.200000
    1	7499	ALLEN	SALESMAN	7698.0	1981/02/20	1600	300.0	30	0.400000

     


    • x.rank(method='min',ascending=False)

     

    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['pct_first'] = withmooc.reset_index() \
                          .groupby(['deptno'])['sal'] \
                          .apply(lambda x: (x.rank(method='min',ascending=False) - 1) / (x.count() - 1) ) \
                          .values
    
    withmooc.sort_values(['deptno','sal'],ascending = [True,False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	pct_first
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0.00
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0.50
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1.00
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	0.00
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	0.00
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	0.50
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0.75
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1.00
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	0.00
    1	7499	ALLEN	SALESMAN	7698.0	1981/02/20	1600	300.0	30	0.20
     

     


    3. R Programming (R Package)

     

    2) 분석함수

     

    • rank(-x,ties.method = 'min')

     

    R Programming
    %%R
    withmooc <- emp[order(-emp$sal), ]
    withmooc['pct_first'] = with(withmooc, ave(sal,deptno, FUN=function(x) { (rank(-x,ties.method = 'min') -1)/ (length(x)-1) }))
    
    withmooc[order(withmooc$deptno,-withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno pct_first
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10      0   
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10      0.5 
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10      1   
     4  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      0   
     5  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20      0   
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      0.5 
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20      0.75
     8  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20      1   
     9  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30      0   
    10  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30      0.2 

     

     


    4. R Dplyr Package

     

    2) 분석함수

     

    • percent_rank(desc(sal))

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_Perrank = percent_rank(desc(sal))) %>%
      dplyr::arrange(deptno,desc(sal)) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno sal_Perrank
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>       <dbl>
     1  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        0   
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        0.5 
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        1   
     4  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        0   
     5  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        0   
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        0.5 
     7  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        0.75
     8  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1   
     9  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        0   
    10  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30        0.2 

     


    5. R sqldf Package

     

    1) 집계 함수

    R Programming
    %%R
    
    sqldf(" WITH Student
              AS ( SELECT '1' grp, sal from emp
                   union all
                   select '1' grp, 1000 sal)
            SELECT grp, sal, PERCENT_RANK() OVER (PARTITION BY grp ORDER BY sal) as sal_PerRank 
            from   Student  ")[1:10, ]

     

    Results
       grp  sal sal_PerRank
    1    1  800  0.00000000
    2    1  950  0.07142857
    3    1 1000  0.14285714
    4    1 1100  0.21428571
    5    1 1250  0.28571429
    6    1 1250  0.28571429
    7    1 1300  0.42857143
    8    1 1500  0.50000000
    9    1 1600  0.57142857
    10   1 2450  0.64285714

     


    2) 분석함수

     

    R Programming
    %%R
    
    sqldf(" SELECT DEPTNO, ENAME, SAL,
                   PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) as sal_PerRank 
            FROM emp
            order by deptno, sal desc; ")[1:10, ]

     

    Results
       deptno  ename  sal sal_PerRank
    1      10   KING 5000        0.00
    2      10  CLARK 2450        0.50
    3      10 MILLER 1300        1.00
    4      20  SCOTT 3000        0.00
    5      20   FORD 3000        0.00
    6      20  JONES 2975        0.50
    7      20  ADAMS 1100        0.75
    8      20  SMITH  800        1.00
    9      30  BLAKE 2850        0.00
    10     30  ALLEN 1600        0.20

     

     


    6. Python pandasql Package

     

    1) 집계 함수

    with 구문에 급여 1000 값을 임으로 추가 후 이에 해당하는 순위 퍼센트(0.14285714)를 계산한다.

     

    Python Programming
    ps.sqldf(" WITH Student                            \
                 AS ( SELECT '1' grp, sal from emp     \
                      union all                        \
                      select '1' grp, 1000 sal)        \
               SELECT sal, PERCENT_RANK() OVER (PARTITION BY grp ORDER BY sal) as sal_PerRank from  Student ").head(10)

     

    Results
    	sal	sal_PerRank
    0	800	0.000000
    1	950	0.071429
    2	1000	0.142857
    3	1100	0.214286
    4	1250	0.285714
    5	1250	0.285714
    6	1300	0.428571
    7	1500	0.500000
    8	1600	0.571429
    9	2450	0.642857
     

    2) 분석함수

     

    • PERCENT_RANK()

     

    Python Programming
    ps.sqldf(" SELECT DEPTNO, ENAME, SAL,     \
                      PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) as PerRank      \
               FROM emp     \
               order by deptno, sal desc;  ").head(10)

     

    Results
    	deptno	ename	sal	PerRank
    0	10	KING	5000	0.00
    1	10	CLARK	2450	0.50
    2	10	MILLER	1300	1.00
    3	20	SCOTT	3000	0.00
    4	20	FORD	3000	0.00
    5	20	JONES	2975	0.50
    6	20	ADAMS	1100	0.75
    7	20	SMITH	800	1.00
    8	30	BLAKE	2850	0.00
    9	30	ALLEN	1600	0.20

     

     


    7. R data.table Package

     

    2) 분석함수

     

    • dplyr::percent_rank(desc(sal))

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_Perrank := dplyr::percent_rank(desc(sal)), by=deptno][order(deptno,-sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        0.00
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10        0.50
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10        1.00
     4:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        0.00
     5:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20        0.00
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        0.50
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        0.75
     8:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        1.00
     9:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30        0.00
    10:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30        0.20

     


    • rank(-sal,ties.method = 'min')

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_Perrank := (rank(-sal,ties.method = 'min') -1)/ (length(sal)-1), by=deptno][order(deptno,-sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        0.00
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10        0.50
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10        1.00
     4:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        0.00
     5:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20        0.00
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        0.50
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        0.75
     8:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        1.00
     9:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30        0.00
    10:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30        0.20

     


    • rank(-sal,ties.method = 'min')

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_Perrank := (rank(-sal,ties.method = 'min') -1)/ (length(sal)-1), by=deptno][order(deptno,-sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        0.00
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10        0.50
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10        1.00
     4:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        0.00
     5:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20        0.00
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        0.50
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        0.75
     8:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        1.00
     9:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30        0.00
    10:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30        0.20

     


    [참고] frank 함수

     

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10           1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10           2
     3:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10           3
     4:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20           1
     5:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20           2
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20           3
     7:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20           4
     8:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20           5
     9:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30           1
    10:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     

     


    8. Python Duckdb의 SQL

     

    Python Programming
    %%sql
    SELECT A.*,
           PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS per_rank
    FROM   emp A
    ORDER  BY DEPTNO, SAL DESC
    LIMIT  7

     

    Python Programming
    duckdb.sql(" SELECT A.*,                                                                           \
                        PERCENT_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS per_rank        \
                 FROM   emp A                                                                          \
                 ORDER  BY DEPTNO, SAL DESC                                                            \
                 LIMIT  7 ").df()

     

    Results
       empno   ename        job     mgr    hiredate   sal  comm  deptno  per_rank
    0   7839    KING  PRESIDENT     NaN  1981/11/17  5000   NaN      10      0.00
    1   7782   CLARK    MANAGER  7839.0  1981/01/09  2450   NaN      10      0.50
    2   7934  MILLER      CLERK  7782.0  1982/01/23  1300   NaN      10      1.00
    3   7788   SCOTT    ANALYST  7566.0  1982/12/09  3000   NaN      20      0.00
    4   7902    FORD    ANALYST  7566.0  1981/12/03  3000   NaN      20      0.00
    5   7566   JONES    MANAGER  7839.0  1981/04/02  2975   NaN      20      0.50
    6   7876   ADAMS      CLERK  7788.0  1983/01/12  1100   NaN      20      0.75

     

     

     


    Jamsil Baseball Stadium, South Korea (https://unsplash.com/photos/jwM5CCi6M9M)

      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글