본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

NTH_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2022. 1. 4.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ NTH_VALUE Oracle Function ]

     


    NTH_VALUE 함수는 ‘analytic_clause’로 정의된 윈도우(그룹)에서 n번째 행의 measure_expr 값을 반환한다. 반환된 값은 ‘measure_expr’의 데이터 유형을 가진다.

     

    • 함수 설명 : 

     


    1. Oracle(오라클)

     

    NTH_VALUE() 함수

     

    Oracle Programming
    SELECT empno,
           ename,
           deptno,
           sal,
           nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
           NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
    FROM   emp
    order by deptno, sal;

     

    Results
    EMPNO	ENAME	DEPTNO	SAL	NTH_TOP	NTH_BOTTOM
    ------------------------------------------------------
    7934	MILLER	10	1300	2450	2450
    7782	CLARK	10	2450	2450	2450
    7839	KING	10	5000	2450	2450
    7369	SMITH	20	800	1100	3000
    7876	ADAMS	20	1100	1100	3000
    7566	JONES	20	2975	1100	3000
    7788	SCOTT	20	3000	1100	3000
    7902	FORD	20	3000	1100	3000
    7900	JAMES	30	950	1250	1600
    7654	MARTIN	30	1250	1250	1600
    7521	WARD	30	1250	1250	1600
    7844	TURNER	30	1500	1250	1600
    7499	ALLEN	30	1600	1250	1600
    7698	BLAKE	30	2850	1250	1600

     


    2. Python Pandas(파이썬)

     

    transform() 함수와 'nth' 인수

    개별 부서 내에서 최소 급여 값과 최대 급여 값을 반환한다. 급여를 오름차순과 내림차순으로 각각 정렬 후 첫 번째 값을 취한다.

     

    Python Programming
    withmooc = emp.copy()
    withmooc['nth_top']    = emp.sort_values('sal', ascending=True).groupby('deptno')['sal'].transform('nth',1)
    withmooc['nth_bottom'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform('nth',1)
    withmooc.sort_values(['deptno','sal']).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	1100	3000
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	1100	3000
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	1100	3000
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1250	1600
    2	7521	WARD	SALESMAN	7698.0	1981/02/22	1250	500.0	30	1250	1600

     

     


    transform() 함수와 iloc

     

    Python Programming
    withmooc = emp.copy()
    withmooc['nth_top']    = emp.sort_values('sal', ascending=True).groupby('deptno')['sal'].transform(lambda x:x.iloc[1])
    withmooc['nth_bottom'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform(lambda x:x.iloc[1])
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000

     


    transform() 함수와 'nth' 인수

    개별 부서의 급여 최솟값과 최댓값을 먼저 선택 후 원본 테이블에 해당 그룹의 선택한 값을 mapping 한다.

     

    Python Programming
    withmooc = emp.copy()
    
    group_top = withmooc.sort_values('sal',ascending = True).groupby('deptno')['sal'].nth(1)
    withmooc['nth_top'] = withmooc['deptno'].map(group_top)
    
    group_bottom = withmooc.sort_values('sal',ascending = False).groupby('deptno')['sal'].nth(1)
    withmooc['nth_bottom'] = withmooc['deptno'].map(group_bottom)
    
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000

     


    3. R Programming (R Package)

     

    dplyr::nth() 함수

    개별 부서 내에서 상위 2번째와 하위 2번째 급여 값을 반환한다. 급여를 오름차순과 내림차순으로 각각 정렬 후 두 번째 값을 취한다.

     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    withmooc <- emp
    
    withmooc <- plyr::ddply(withmooc,.(deptno),transform,nth_top    = dplyr::nth(sal,2, order_by = sal ))
    withmooc <- plyr::ddply(withmooc,.(deptno),transform,nth_bottom = dplyr::nth(sal,2, order_by = desc(sal) ))
    
    withmooc[order(withmooc$deptno,withmooc$sal), ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno nth_top nth_bottom
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    2450       2450
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    2450       2450
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    2450       2450
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1100       3000
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1100       3000
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1100       3000
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1100       3000
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1100       3000
    14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30    1250       1600
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    1250       1600

     


    dplyr::nth() 함수

     

    R Programming
    %%R
    withmooc <- emp[order(emp$sal),]
    withmooc['nth_top'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { dplyr::nth(x,2) }))
    #withmooc['nth_top'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { x[2] }))
    
    withmooc <- withmooc[order(-withmooc$sal),]
    withmooc['nth_bottom'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { dplyr::nth(x,2) }))
    #withmooc['nth_bottom'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { x[2] }))
    
    withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 10
       empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

     


    dplyr::nth() 함수

     

    R Programming
    %%R
    
    withmooc <- emp[order(emp$sal),]
    
    withmooc['nth_top'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  dplyr::nth(x,2) } )
    # withmooc['nth_top'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  x[2] } )
    
    withmooc <- withmooc[order(-withmooc$sal),]
    withmooc['nth_bottom'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  dplyr::nth(x,2) } )
    # withmooc['nth_bottom'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  x[2] } )
    
    withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 10
       empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

     


    nth() 함수

    개별 부서 내에서 하위 2번째 급여 값을 반환한다.

     

    R Programming
    %%R
    # split 되는 그룹내의 관측치 순서 보존을 위하여 데이터 사전 정렬 필요
    withmooc <- emp[order(emp$deptno, emp$sal),]
    
    Nth_var <- with(withmooc, purrr::map_chr(base::split(withmooc,withmooc$deptno), ~nth(.x$sal,2)) ) 
    
    withmooc1 <- data.frame(withmooc, nth_top =rep(Nth_var, table(withmooc$deptno)))
    
    withmooc1[order(withmooc1$deptno,withmooc1$sal),][1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno     nth_top
    1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10 2450.000000
    2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 2450.000000
    3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 2450.000000
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 1100.000000
    5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 1100.000000
    6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 1100.000000
    7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 1100.000000
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 1100.000000
    9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 1250.000000
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 1250.000000

     


    tapply() 함수

     

    R Programming
    %%R
    
    withmooc <- emp[order(emp$deptno, emp$sal),]
    
    First_var <- with(withmooc, tapply(sal, deptno, nth, 2))   # head, 1
    
    data.frame(withmooc, nth_top =rep(Nth_var, table(withmooc$deptno)))[1:10, ]
    
    # with(withmooc[order(withmooc$sal),],  data.frame(emp, nth_top=rep(tapply(sal, denptno, nth, 2), table(job))) )

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno     nth_top
    1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10 2450.000000
    2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 2450.000000
    3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 2450.000000
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 1100.000000
    5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 1100.000000
    6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 1100.000000
    7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 1100.000000
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 1100.000000
    9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 1250.000000
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 1250.000000

     


    nth() 함수

     

    R Programming
    %%R
    
    withmooc <- emp[order(emp$sal),]
    
    mapping    <- t( purrr::map_dfc( split(withmooc,withmooc$deptno), ~nth(.x$sal,2) ) )
    
    trans_data <- melt(mapping, value.name="nth_top")
    
    merge(emp,trans_data,by.x='deptno',by.y='Var1')[1:10, ]

     

    Results
       deptno empno  ename       job  mgr   hiredate  sal comm Var2 nth_top
    1      10  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA    1    2450
    2      10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA    1    2450
    3      10  7934 MILLER     CLERK 7782 1982-01-23 1300   NA    1    2450
    4      20  7369  SMITH     CLERK 7902 1980-12-17  800   NA    1    1100
    5      20  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA    1    1100
    6      20  7566  JONES   MANAGER 7839 1981-04-02 2975   NA    1    1100
    7      20  7902   FORD   ANALYST 7566 1981-12-03 3000   NA    1    1100
    8      20  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA    1    1100
    9      30  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300    1    1250
    10     30  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA    1    1250

     


    4. R Dplyr Package

     

    dplyr::nth() 함수 구조 확인

     

    R Programming
    %%R
    
    dplyr::nth

     

    Results - dplyr::nth 함수 구조
    function (x, n, order_by = NULL, default = default_missing(x)) 
    {
        if (length(n) != 1 || !is.numeric(n)) {
            abort("`n` must be a single integer.")
        }
        n <- trunc(n)
        if (n == 0 || n > length(x) || n < -length(x)) {
            return(default)
        }
        if (n < 0) {
            n <- length(x) + n + 1
        }
        if (is.null(order_by)) {
            x[[n]]
        }
        else {
            x[[order(order_by)[[n]]]]
        }
    }
    <bytecode: 0x00000218d09751f0>
    <environment: namespace:dplyr>

     


    dplyr::nth() 함수

     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate( nth_top    = nth(sal, 2, order_by = sal),
                     nth_bottom = nth(sal, 2, order_by = desc(sal))) %>%
      dplyr::arrange(deptno,sal) %>%
      head(10)

     

    Results
    # A tibble: 10 x 10
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

     

     


    [참고]

    • row_number()

     

    R Programming
    %%R
    
    emp %>%
     dplyr::group_by(deptno) %>%
     dplyr::arrange(deptno,sal) %>%
     filter(row_number() == 2)

     

    Results
    # A tibble: 3 x 8
    # Groups:   deptno [3]
      empno ename job        mgr hiredate     sal  comm deptno
      <dbl> <chr> <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7782 CLARK MANAGER   7839 1981-01-09  2450    NA     10
    2  7876 ADAMS CLERK     7788 1983-01-12  1100    NA     20
    3  7521 WARD  SALESMAN  7698 1981-02-22  1250   500     30

     


    5. R sqldf Package

     

    nth_value() 함수

     

    R Programming
    %%R
    
    sqldf(" SELECT empno,
                   ename,
                   deptno,
                   sal,
                   nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
                   NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
            FROM   emp
            order by deptno, sal")

     

    Results
       empno  ename deptno  sal nth_top nth_bottom
    1   7934 MILLER     10 1300    2450       2450
    2   7782  CLARK     10 2450    2450       2450
    3   7839   KING     10 5000    2450       2450
    4   7369  SMITH     20  800    1100       3000
    5   7876  ADAMS     20 1100    1100       3000
    6   7566  JONES     20 2975    1100       3000
    7   7788  SCOTT     20 3000    1100       3000
    8   7902   FORD     20 3000    1100       3000
    9   7900  JAMES     30  950    1250       1600
    10  7521   WARD     30 1250    1250       1600
    11  7654 MARTIN     30 1250    1250       1600
    12  7844 TURNER     30 1500    1250       1600
    13  7499  ALLEN     30 1600    1250       1600
    14  7698  BLAKE     30 2850    1250       1600

     


    6. Python pandasql Package

     

    nth_value() 함수

     

    Python Programming
    ps.sqldf(" SELECT empno,     \
                   ename,        \
                   deptno,       \
                   sal,          \
                   nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,     \
                   NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom     \
            FROM   emp     \
            order by deptno, sal   ").head(10)

     

    Results
    	empno	ename	deptno	sal	nth_top	nth_bottom
    0	7934	MILLER	10	1300	2450	2450
    1	7782	CLARK	10	2450	2450	2450
    2	7839	KING	10	5000	2450	2450
    3	7369	SMITH	20	800	1100	3000
    4	7876	ADAMS	20	1100	1100	3000
    5	7566	JONES	20	2975	1100	3000
    6	7788	SCOTT	20	3000	1100	3000
    7	7902	FORD	20	3000	1100	3000
    8	7900	JAMES	30	950	1250	1600
    9	7521	WARD	30	1250	1250	1600

     


    7. R data.table Package

     

    nth() 함수

     

    R Programming
    %%R
    DT          <- data.table(emp)
    
    DT[order(-sal) ,  `:=`( nth_top= nth(sal,2,order_by=sal), nth_bottom= nth(sal,2,order_by=desc(sal)) ), by = deptno][order(deptno,sal),]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno nth_top nth_bottom
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    2450       2450
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    2450       2450
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    2450       2450
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1100       3000
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1100       3000
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1100       3000
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1100       3000
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1100       3000
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30    1250       1600
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    1250       1600
    11:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30    1250       1600
    12:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30    1250       1600
    13:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30    1250       1600
    14:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30    1250       1600

     


    dplyr::nth() 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(sal), `:=` ( nth_top_0  = dplyr::nth(sal,2, order_by = sal ),            # 첫 번째 정렬 옵션(order(sal)) 필요 없음
                          nth_top_1  = na.omit(unlist(.SD))[2]),                      # 첫 번째 정렬 옵션(order(sal)) 필요
                          by = deptno,
                          .SDcols = c("sal")][order(deptno,sal),]
    
    DT[order(-sal), `:=` ( nth_bottom_0  = dplyr::nth(sal,2, order_by = desc(sal) ),  # 첫 번째 정렬 옵션(order(-sal)) 필요 없음
                           nth_bottom_1  = na.omit(unlist(.SD))[2]),                  # 첫 번째 정렬 옵션(order(-sal)) 필요
                           by = deptno,
                           .SDcols = c("sal")][order(deptno,sal),]
    
    DT[,c('hiredate','job','mgr','ename') := NULL][1:10, ]

     

    Results
        empno  sal comm deptno nth_top_0 nth_top_1 nth_bottom_0 nth_bottom_1
     1:  7369  800   NA     20      1100      1100         3000         3000
     2:  7499 1600  300     30      1250      1250         1600         1600
     3:  7521 1250  500     30      1250      1250         1600         1600
     4:  7566 2975   NA     20      1100      1100         3000         3000
     5:  7654 1250 1400     30      1250      1250         1600         1600
     6:  7698 2850   NA     30      1250      1250         1600         1600
     7:  7782 2450   NA     10      2450      2450         2450         2450
     8:  7788 3000   NA     20      1100      1100         3000         3000
     9:  7839 5000   NA     10      2450      2450         2450         2450
    10:  7844 1500    0     30      1250      1250         1600         1600

     


    dplyr::nth() 함수

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    
    DT[ order(sal), `:=` (nth_top_0  = dplyr::nth(sal,2, order_by = sal ),
                          nth_top_1  = sal[2],
                          nth_top_2  = na.omit(unlist(.SD))[2],
                          fir_sal    = first(sal), 
                          max_sal    = max(sal),
                          min_sal    = min(sal), 
                          last_sal   = last(sal)),by=deptno,.SDcols = c("sal")][order(deptno,sal),][,c('hiredate','job','mgr','ename','comm','empno') := NULL][1:10, ]

     

    Results
         sal deptno nth_top_0 nth_top_1 nth_top_2 fir_sal max_sal min_sal last_sal
     1: 1300     10      2450      2450      2450    1300    5000    1300     5000
     2: 2450     10      2450      2450      2450    1300    5000    1300     5000
     3: 5000     10      2450      2450      2450    1300    5000    1300     5000
     4:  800     20      1100      1100      1100     800    3000     800     3000
     5: 1100     20      1100      1100      1100     800    3000     800     3000
     6: 2975     20      1100      1100      1100     800    3000     800     3000
     7: 3000     20      1100      1100      1100     800    3000     800     3000
     8: 3000     20      1100      1100      1100     800    3000     800     3000
     9:  950     30      1250      1250      1250     950    2850     950     2850
    10: 1250     30      1250      1250      1250     950    2850     950     2850

     


    8. Python Duckdb의 SQL

     

    Python Programming
    %%sql
      SELECT empno,
             ename,
             deptno,
             sal,
             nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
             NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
      FROM   emp
      order by deptno, sal
      LIMIT 6

     

    Python Programming
     duckdb.sql(" SELECT empno,                                                                                                                                 \
                         ename,                                                                                                                                 \
                         deptno,                                                                                                                                \
                         sal,                                                                                                                                   \
                         nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,   \
                         NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom \
                  FROM   emp                                                                                                                                    \
                  order by deptno, sal                                                                                                                          \
                  LIMIT 6 ").df()

     

    Results
       empno   ename  deptno   sal  nth_top  nth_bottom
    0   7934  MILLER      10  1300     2450        2450
    1   7782   CLARK      10  2450     2450        2450
    2   7839    KING      10  5000     2450        2450
    3   7369   SMITH      20   800     1100        3000
    4   7876   ADAMS      20  1100     1100        3000
    5   7566   JONES      20  2975     1100        3000

     

     


    https://unsplash.com/photos/qodjMu0byZ8

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글