본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

ROW_NUMBER 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 15.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ ROW_NUMBER Oracle Function ]

     


    ROW_NUMBER함수는 분석 함수이다. 이 함수는 처리되는 범위(파티션에 포함된 모든 행 또는 쿼리에 의해서 반환되는 모든 행)에서 존재하는 모든 관측치에 대하여 order_by_clause에서 지정된 행의 순서대로 1로 시작하여 각 행에 유일한 (unique) 번호를 부여한다.

     

     

     

     


    1. Oracle(오라클)

     

    row_number() 함수

    개별 직무 내에 속한 직원의 관측치에 대하여 월급을 기준으로 정렬 후 중복 없이 유일한 번호를 부여하라.

     

    Oracle Programming
    SELECT job, ename, sal, RN
    FROM   ( SELECT job, ename, sal,
                    ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
             FROM emp
           )
    WHERE RN <= 1
    order by 1

     

    Results
    JOB		ENAME	SAL	RN
    --------------------------------------
    ANALYST		SCOTT	3000	1
    CLERK		SMITH	800	1
    MANAGER		CLARK	2450	1
    PRESIDENT	KING	5000	1
    SALESMAN	WARD	1250	1

     


    2. Python Pandas(파이썬)

     

    cumcount() 함수

     

    Python Programming
    withmooc = copy.copy(emp)
    withmooc['row_num'] = withmooc.sort_values(['sal'], ascending=[True]).groupby('job').cumcount() + 1
    
    withmooc.sort_values(['job','sal']).head(7)

     

    Results
    	empno	ename	job	mgr	hiredate	sal	comm	deptno	row_num
    7	7788	SCOTT	ANALYST	7566.0	1982/12/09	3000	NaN	20	1
    12	7902	FORD	ANALYST	7566.0	1981/12/03	3000	NaN	20	2
    0	7369	SMITH	CLERK	7902.0	1980/12/17	800	NaN	20	1
    11	7900	JAMES	CLERK	7698.0	1981/12/03	950	NaN	30	2
    10	7876	ADAMS	CLERK	7788.0	1983/01/12	1100	NaN	20	3
    13	7934	MILLER	CLERK	7782.0	1982/01/23	1300	NaN	10	4
    6	7782	CLARK	MANAGER	7839.0	1981/01/09	2450	NaN	10	1

     


    rank 함수

    - method=’first’ : 동일한 값으로 구성된 관측치에 대하여 데이터 상에서 먼저 반환되는 관측치부터 순위를 부여한다.

     

    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['sal_rank']  = withmooc.sort_values('sal',ascending = True).groupby('job')['sal'].transform(lambda x: x.rank(method='first',ascending=True) )
    withmooc.sort_values(['job','sal']).head()

     

    Results
    	empno	ename	job	mgr	hiredate	sal	comm	deptno	sal_rank
    7	7788	SCOTT	ANALYST	7566.0	1982/12/09	3000	NaN	20	1
    12	7902	FORD	ANALYST	7566.0	1981/12/03	3000	NaN	20	2
    0	7369	SMITH	CLERK	7902.0	1980/12/17	800	NaN	20	1
    11	7900	JAMES	CLERK	7698.0	1981/12/03	950	NaN	30	2
    10	7876	ADAMS	CLERK	7788.0	1983/01/12	1100	NaN	20	3
     

     


    3. R Programming (R Package)

     

    dplyr::row_number 함수


    R Programming
    %%R
    withmooc <- emp[order(withmooc$sal),]
    withmooc['sal_rank']  = with(withmooc, ave(sal, job, FUN =function(x) { dplyr::row_number(x) }))
    
    withmooc[order(withmooc$job,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_rank
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        1
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        2
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30        2
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        3
     6  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        4
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        1
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        2
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        1

     


    rank 함수

    • method=’first’ : 동일한 값으로 구성된 관측치에 대하여 데이터 상에서 먼저 반환되는 관측치부터 순위를 부여한다.
    • na.last = “keep” : 결측치를 원래의 위치에 순위로 NA 값을 그대로 반환한다.

     

    R Programming
    %%R
    withmooc <- emp
    withmooc['sal_rank']  = with(withmooc, ave(sal, job, FUN =function(x) { rank(x, ties.method = "first", na.last = "keep") }))
    
    withmooc[order(withmooc$job,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_rank
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        1
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        2
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30        2
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        3
     6  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        4
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        1
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        2
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        1

     

     


    4. R Dplyr Package

     

    row_number() 함수

     

    R Programming
    %%R
    
    emp %>%
      dplyr::arrange(sal) %>%
      dplyr::group_by(job) %>%
      dplyr::mutate(row_num = row_number(sal)) %>%
      dplyr::arrange(job,sal) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   job [2]
      empno ename  job       mgr hiredate     sal  comm deptno row_num
      <dbl> <chr>  <chr>   <dbl> <date>     <dbl> <dbl>  <dbl>   <int>
    1  7788 SCOTT  ANALYST  7566 1982-12-09  3000    NA     20       1
    2  7902 FORD   ANALYST  7566 1981-12-03  3000    NA     20       2
    3  7369 SMITH  CLERK    7902 1980-12-17   800    NA     20       1
    4  7900 JAMES  CLERK    7698 1981-12-03   950    NA     30       2
    5  7876 ADAMS  CLERK    7788 1983-01-12  1100    NA     20       3
    6  7934 MILLER CLERK    7782 1982-01-23  1300    NA     10       4

     

     


    5. R sqldf Package

     

    ROW_NUMBER 함수

     

    R Programming
    %%R
    
    sqldf(" SELECT job, ename, sal,
                   ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
            FROM emp ")[1:10, ]

     

    Results
             job  ename  sal RN
    1    ANALYST  SCOTT 3000  1
    2    ANALYST   FORD 3000  2
    3      CLERK  SMITH  800  1
    4      CLERK  JAMES  950  2
    5      CLERK  ADAMS 1100  3
    6      CLERK MILLER 1300  4
    7    MANAGER  CLARK 2450  1
    8    MANAGER  BLAKE 2850  2
    9    MANAGER  JONES 2975  3
    10 PRESIDENT   KING 5000  1

     


    row_number() 함수

    개별 직무 내에서 최소 급여를 받는 사원의 명단을 출력하시오.

     

    R Programming
    %%R
    
    sqldf(" SELECT job, ename, sal,RN
            FROM   ( SELECT job, ename, sal,
                            ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
                     FROM emp
                   )
            WHERE RN <= 1 ")

     

    Results
            job ename  sal RN
    1   ANALYST SCOTT 3000  1
    2     CLERK SMITH  800  1
    3   MANAGER CLARK 2450  1
    4 PRESIDENT  KING 5000  1
    5  SALESMAN  WARD 1250  1

     

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" SELECT job, ename, sal,RN                                                     \
               FROM   ( SELECT job, ename, sal,                                              \
                               ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN          \
                        FROM emp                                                             \
                      )                                                                      \
               WHERE RN <= 1 ")

     

    Results
    	job		ename	sal	RN
    0	ANALYST		SCOTT	3000	1
    1	CLERK		SMITH	800	1
    2	MANAGER		CLARK	2450	1
    3	PRESIDENT	KING	5000	1
    4	SALESMAN	WARD	1250	1
     

     


    7. R data.table Package

     

    dplyr::row_number 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal),row_num := dplyr::row_number(sal), by=job][order(job,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno row_num
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       1
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       2
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       1
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       2
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       3
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       4
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       1
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       3
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       1

     


    개별 직무내에서 데이터를 정렬 후 순위 부여(1:그룹 내 관측치 개수)

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal),row_num := 1:.N , by=job][order(job,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno row_num
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       1
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       2
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       1
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       2
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       3
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       4
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       1
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       3
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       1

     


    seq_len 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal),row_num := seq_len(.N) , by=job][order(job,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno row_num
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       1
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       2
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       1
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       2
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       3
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       4
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       1
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       3
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       1

     


    그룹내 부여된 행 번호(row.names)를 사용하여 관측치 번호(순위) 부여

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal),row_num := row.names(.SD) , by=job][order(job,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno row_num
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       1
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       2
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       1
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       2
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       3
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       4
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       1
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       3
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       1

     


    data.table::frank

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal),row_num := data.table::frank(sal,ties.method="first") , by=job][order(job,sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno row_num
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       1
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       2
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       1
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       2
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       3
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       4
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       1
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       3
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       1

     


    8. Python Duckdb의 SQL

     

    ROW_NUMBER() 함수

     

    Python Programming
    %%sql
      SELECT job, ename, sal
      FROM   ( SELECT job, ename, sal,
                      ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN
               FROM emp
             )
      WHERE RN <= 1
      order by 1

     

    Python Programming
    duckdb.sql(" SELECT job, ename, sal                                                         \
                 FROM   ( SELECT job, ename, sal,                                               \
                                 ROW_NUMBER() OVER (PARTITION BY job ORDER BY sal) RN           \
                          FROM emp                                                              \
                        )                                                                       \
                 WHERE RN <= 1                                                                  \
                 order by 1 ").df()

     

    Results
          Arc_Cosine
    0	1.266104

     


    Old Bingo tiles found in my grandfathers house ( https://unsplash.com/photos/3UbsiRcrFV4 )

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글