본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

DENSE_RANK 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 11. 23.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ DENSE_RANK Oracle Function ]

     


    DENSE_RANK()는 Rank()와 유사한 함수로 ORDER BY절에 사용된 컬럼이나 표현식에 대하여 순위를 부여하는데 RANK()와 달리 동일 순위 다음의 순위는 동일 순위의 수와 상관없이 1 증가된 값을 돌려준다.

     

     


    1. Oracle(오라클)

     

    Oracle Programming
    SELECT DEPTNO, ENAME, SAL,
           DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL) DENSE_RANK
    FROM   EMP
    ORDER 
       BY  DEPTNO, DENSE_RANK;

     

    Results
    DEPTNO	ENAME	SAL	DENSE_RANK
    --------------------------------
    10	MILLER	1300	1
    10	CLARK	2450	2
    10	KING	5000	3
    20	SMITH	800	1
    20	ADAMS	1100	2
    20	JONES	2975	3
    20	SCOTT	3000	4
    20	FORD	3000	4
    30	JAMES	950	1
    30	MARTIN	1250	2
    30	WARD	1250	2
    30	TURNER	1500	3
    30	ALLEN	1600	4
    30	BLAKE	2850	5

     


    2. Python Pandas(파이썬)

     

    Rank()

    Python Programming
    withmooc = emp.copy()
    
    withmooc['DENSE_RANK'] = emp.groupby('deptno')['sal'].transform(lambda x: x.rank(method='min'))
    
    withmooc.sort_values(['deptno','DENSE_RANK']).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	DENSE_RANK
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	3
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	3
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	4
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	4
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1
    2	7521	WARD	SALESMAN	7698.0	1981/02/22	1250	500.0	30	2

     


    3. R Programming (R Package)

     

    Rank

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['DENSE_RANK'] = ave(withmooc[, 'sal'], withmooc[, 'deptno'], FUN = function (x) { rank(x, ties.method = "min", na.last = "keep") }  )
    
    withmooc[order(withmooc$deptno,withmooc$DENSE_RANK), ][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno DENSE_RANK
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10          1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10          2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10          3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20          1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20          2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20          3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20          4
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20          4
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30          1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30          2

     


    dplyr::dense_rank()

    R Programming
    %%R
    
    library(plyr)
    withmooc <- ddply(emp,.(deptno),transform,DENSE_RANK = dplyr::dense_rank(sal))
    
    withmooc[order(withmooc$deptno,withmooc$DENSE_RANK), ][1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno DENSE_RANK
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10          1
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10          2
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          3
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20          1
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          2
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          3
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          4
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20          4
    14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30          1
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30          2

     

     


    4. R Dplyr Package

     

    [참고] dense_rank 함수 구조

    R Programming
    %%R
    
    dplyr::dense_rank

     

    Results (함수 구조)
    function (x) 
    {
        match(x, sort(unique(x)))
    }
    <bytecode: 0x000001b0ddbe9258>
    <environment: namespace:dplyr>

     


    dplyr::dense_rank

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate( DENSE_RANK = dense_rank(sal)) %>%
      dplyr::arrange(deptno,DENSE_RANK) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno DENSE_RANK
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <int>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10          1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10          2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10          3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20          1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20          2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20          3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20          4
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20          4
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30          1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30          2

     


    sort & unique()

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate( DENSE_RANK = match(sal, sort(unique(sal)))) %>%
      dplyr::arrange(deptno,DENSE_RANK) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno DENSE_RANK
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <int>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10          1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10          2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10          3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20          1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20          2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20          3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20          4
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20          4
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30          1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30          2

     


    Rank()

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate( DENSE_RANK = rank(sal, ties.method = "min", na.last = "keep") ) %>%
      dplyr::arrange(deptno,DENSE_RANK) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno DENSE_RANK
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <int>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10          1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10          2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10          3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20          1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20          2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20          3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20          4
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20          4
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30          1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30          2

     

     


    5. R sqldf Package

     

    R Programming
    %%R
    
    sqldf(" select *, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK from emp ")[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno DENSE_RANK
    1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10          1
    2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10          2
    3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          3
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20          1
    5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          2
    6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          3
    7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          4
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20          4
    9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30          1
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30          2

     

     

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" select *, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK from emp ").head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	DENSE_RANK
    0	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1
    1	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2
    2	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	3
    3	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1
    4	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2
    5	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	3
    6	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	4
    7	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	4
    8	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1
    9	7521	WARD	SALESMAN	7698.0	1981/02/22	1250	500.0	30	2

     

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, DENSE_RANK := match(sal, sort(unique(sal))),by = deptno][order(deptno,DENSE_RANK)][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno DENSE_RANK
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10          1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10          2
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          3
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20          1
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          2
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          3
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          4
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20          4
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30          1
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30          2

     

     


    8. Python DuckDB의 SQL

     

    Python Programming
    %%sql
      select *, 
             DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK
      from   emp
      order  by deptno, sal
      LIMIT  5

     

    Python Programming
    duckdb.sql(" select empno,ename,job,sal,comm,deptno,                                  \
                        DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK   \
                 from   emp                                                               \
                 order  by deptno, sal                                                    \
                 LIMIT  6 ").df()

     

     

    Results
       empno   ename        job   sal  comm  deptno  DENSE_RANK
    0   7934  MILLER      CLERK  1300   NaN      10           1
    1   7782   CLARK    MANAGER  2450   NaN      10           2
    2   7839    KING  PRESIDENT  5000   NaN      10           3
    3   7369   SMITH      CLERK   800   NaN      20           1
    4   7876   ADAMS      CLERK  1100   NaN      20           2
    5   7566   JONES    MANAGER  2975   NaN      20           3

     

     

     


     

    Traditional Korean Masks Used for Skits (https://unsplash.com/photos/6ti-VnQ_wyM)

      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록? 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글