본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

RANK 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 13.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ RANK Oracle Function ]

     


    RANK함수는 개별 그룹 별로 데이터를 정렬 후 그룹 내에서 기준 관측치 값의 순위를 계산한다. 반환되는 데이터형은 NUMBER이다.

    같은 순위를 가지는 순위 기준에 대해서는 같은 값을 가진다. 오라클 데이터베이스는 동일한 행에 대해 동일한 순위를 더하고, 다음의 순위를 계산한다. 그런 이유로 RANK가 연속하지 않는 숫자일 수 있다. 이 함수는 top-N과 bottom-N 리포팅에 유용하다.

     

     

     


    1. Oracle(오라클)

     

    1) 집계함수

    • RANK() 함수

     

    Oracle Programming
    SELECT DEPTNO,
           RANK(3000) WITHIN GROUP (ORDER BY SAL) AS RANK
    FROM   EMP
    GROUP  BY DEPTNO

     

    Results
    DEPTNO	RANK
    ---------------
    10	3
    20	4
    30	7

     


    2) 분석함수

    • RANK() 함수

     

    Oracle Programming
    SELECT DEPTNO, EMPNO, SAL, 
           RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL) ROW_RANK 
    FROM   EMP
    ORDER  
       BY  DEPTNO, SAL

     

    Results
    DEPTNO	EMPNO	SAL	ROW_RANK
    -----------------------------------
    10	7934	1300	1
    10	7782	2450	2
    10	7839	5000	3
    20	7369	800	1
    20	7876	1100	2
    20	7566	2975	3
    20	7788	3000	4
    20	7902	3000	4
    30	7900	950	1
    30	7654	1250	2
    30	7521	1250	2
    30	7844	1500	4
    30	7499	1600	5
    30	7698	2850	6

     


    2. Python Pandas(파이썬)

     

    2) 분석함수

    • RANK() 함수

     

    Python Programming
    withmooc = emp.copy()
    
    withmooc['sal_rank'] = withmooc.groupby('deptno')['sal'].rank(method="first",ascending=True)
    
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_rank
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1.0
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2.0
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	3.0
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1.0
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2.0

     

     


    3. R Programming (R Package)

     

    2) 분석함수

     

    • RANK() 함수

     

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['sal_rank'] = with(withmooc, ave(sal, deptno, FUN =function(x) { rank(x,ties.method='first') } ))  # FUN =function(x) { median(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_rank
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        4
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        5
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30        1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30        2

     


    • RANK() 함수

     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    withmooc<- emp[order(emp$sal),]
    plyr::ddply(withmooc,.(deptno),transform,sal_median = rank(sal,ties.method='first'))[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno sal_median
    1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10          1
    2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10          2
    3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          3
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20          1
    5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          2
    6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          3
    7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          4
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20          5
    9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30          1
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30          2

     

     


    4. R Dplyr Package

     

    2) 분석함수

    • RANK() 함수

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_rank = rank(sal,ties.method='first')) %>%
      dplyr::arrange(deptno,sal) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   deptno [2]
      empno ename  job         mgr hiredate     sal  comm deptno sal_rank
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <int>
    1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        1
    2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        2
    3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        3
    4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1
    5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        2
    6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3

     

     


    5. R sqldf Package

     

    1) 집계함수

    • RANK() 함수

     

    R Programming
    %%R
    
    sqldf(" WITH new_emp
              AS ( SELECT deptno, sal from emp
                   union all
                   select 10 deptno, 3000 sal
                   union all
                   select 20 deptno, 3000 sal
                   union all
                   select 30 deptno, 3000 sal
                 )
            SELECT *
            FROM   ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from  new_emp order by deptno,sal )
            where  sal = 3000
            ")

     

    Results
      deptno  sal sal_PerRank
    1     10 3000           3
    2     20 3000           4
    3     20 3000           4
    4     20 3000           4
    5     30 3000           7

     


    2) 분석함수

    • RANK() 함수

     

    R Programming
    %%R
    
    sqldf(" SELECT ename, deptno, sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank
            FROM emp ")[1:10, ]

     

    Results
        ename deptno  sal sal_rank
    1  MILLER     10 1300        1
    2   CLARK     10 2450        2
    3    KING     10 5000        3
    4   SMITH     20  800        1
    5   ADAMS     20 1100        2
    6   JONES     20 2975        3
    7   SCOTT     20 3000        4
    8    FORD     20 3000        4
    9   JAMES     30  950        1
    10   WARD     30 1250        2

     

     


    6. Python pandasql Package

     

    1) 집계함수

    • RANK() 함수

     

    Python Programming
    ps.sqldf(" WITH Student                       \
              AS ( SELECT deptno, sal from emp    \
                   union all                      \
                   select 10 deptno, 3000 sal     \
                   union all                      \
                   select 20 deptno, 3000 sal     \
                   union all                      \
                   select 30 deptno, 3000 sal     \
                 )                                \
            SELECT *                              \
            FROM   ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from  Student order by deptno,sal )     \
            where  sal = 3000  ")

     

    Results
    	deptno	sal	sal_PerRank
    0	10	3000	3
    1	20	3000	4
    2	20	3000	4
    3	20	3000	4
    4	30	3000	7

     


    2) 분석함수

    • RANK() 함수

     

    Python Programming
    ps.sqldf(" SELECT ename, deptno, sal,                                         \
                      RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank     \
               FROM emp  ").head()

     

    Results
    	ename	deptno	sal	sal_rank
    0	MILLER	10	1300	1
    1	CLARK	10	2450	2
    2	KING	10	5000	3
    3	SMITH	20	800	1
    4	ADAMS	20	1100	2

     

     


    7. R data.table Package

     

    2) 분석함수

    • RANK() 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    # DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),]
    DT[, sal_Perrank := rank(sal,ties.method=c('first')), by=deptno][order(deptno,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10           1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10           2
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10           3
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20           1
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20           2
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20           3
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20           4
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20           5
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30           1
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30           2

     


    • frank 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    # DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),]
    DT[, sal_Perrank := frank(sal,ties.method=c('first')), by=deptno][order(deptno,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_Perrank
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10           1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10           2
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10           3
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20           1
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20           2
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20           3
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20           4
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20           5
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30           1
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30           2

     


    8. Python Duckdb의 SQL

     

    1) 집계함수

     

    Python Programming
    %%sql
    Student << SELECT deptno, sal from emp
                   union all
                   select 10 deptno, 3000 sal
                   union all
                   select 20 deptno, 3000 sal
                   union all
                   select 30 deptno, 3000 sal
    %%sql
      SELECT *
      FROM   ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from  Student order by deptno,sal )
      where  sal = 3000

     

    Python Programming
    duckdb.sql(" SELECT *                                                                                                                        \
                 FROM   ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from  Student order by deptno,sal )   \
                 where  sal = 3000 ").df()

     

    Results
       deptno   sal  sal_PerRank
    0      10  3000            3
    1      20  3000            4
    2      20  3000            4
    3      20  3000            4
    4      30  3000            7

     


    2) 분석함수

     

    Python Programming
    %%sql
      SELECT ename, deptno, sal,
             RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank
      FROM   emp
      ORDER  BY DEPTNO, SAL
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT ename, deptno, sal,                                              \
                        RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank          \
                 FROM   emp                                                              \
                 ORDER  BY DEPTNO, SAL                                                   \
                 LIMIT  6 ").df()

     

    Results
        ename  deptno   sal  sal_rank
    0  MILLER      10  1300         1
    1   CLARK      10  2450         2
    2    KING      10  5000         3
    3   SMITH      20   800         1
    4   ADAMS      20  1100         2
    5   JONES      20  2975         3

     

     


    Citrus fruit flat order ( https://unsplash.com/photos/hOwcob_3dpc )

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글