NTH_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

포스팅 목차

[ NTH_VALUE Oracle Function ]

NTH_VALUE 함수는 ‘analytic_clause’로 정의된 윈도우(그룹)에서 n번째 행의 measure_expr 값을 반환한다. 반환된 값은 ‘measure_expr’의 데이터 유형을 가진다.

함수 설명 :

1. Oracle(오라클)

NTH_VALUE() 함수

Oracle Programming

SELECT empno,
       ename,
       deptno,
       sal,
       nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
       NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
FROM   emp
order by deptno, sal;

Results

EMPNO	ENAME	DEPTNO	SAL	NTH_TOP	NTH_BOTTOM
------------------------------------------------------
7934	MILLER	10	1300	2450	2450
7782	CLARK	10	2450	2450	2450
7839	KING	10	5000	2450	2450
7369	SMITH	20	800	1100	3000
7876	ADAMS	20	1100	1100	3000
7566	JONES	20	2975	1100	3000
7788	SCOTT	20	3000	1100	3000
7902	FORD	20	3000	1100	3000
7900	JAMES	30	950	1250	1600
7654	MARTIN	30	1250	1250	1600
7521	WARD	30	1250	1250	1600
7844	TURNER	30	1500	1250	1600
7499	ALLEN	30	1600	1250	1600
7698	BLAKE	30	2850	1250	1600

2. Python Pandas(파이썬)

transform() 함수와 'nth' 인수

개별 부서 내에서 최소 급여 값과 최대 급여 값을 반환한다. 급여를 오름차순과 내림차순으로 각각 정렬 후 첫 번째 값을 취한다.

Python Programming

withmooc = emp.copy()
withmooc['nth_top']    = emp.sort_values('sal', ascending=True).groupby('deptno')['sal'].transform('nth',1)
withmooc['nth_bottom'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform('nth',1)
withmooc.sort_values(['deptno','sal']).head(10)

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000
3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	1100	3000
7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	1100	3000
12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	1100	3000
11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1250	1600
2	7521	WARD	SALESMAN	7698.0	1981/02/22	1250	500.0	30	1250	1600

transform() 함수와 iloc

Python Programming

withmooc = emp.copy()
withmooc['nth_top']    = emp.sort_values('sal', ascending=True).groupby('deptno')['sal'].transform(lambda x:x.iloc[1])
withmooc['nth_bottom'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform(lambda x:x.iloc[1])
withmooc.sort_values(['deptno','sal']).head()

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000

transform() 함수와 'nth' 인수

개별 부서의 급여 최솟값과 최댓값을 먼저 선택 후 원본 테이블에 해당 그룹의 선택한 값을 mapping 한다.

Python Programming

withmooc = emp.copy()

group_top = withmooc.sort_values('sal',ascending = True).groupby('deptno')['sal'].nth(1)
withmooc['nth_top'] = withmooc['deptno'].map(group_top)

group_bottom = withmooc.sort_values('sal',ascending = False).groupby('deptno')['sal'].nth(1)
withmooc['nth_bottom'] = withmooc['deptno'].map(group_bottom)

withmooc.sort_values(['deptno','sal']).head()

Results

	empno	ename	job		mgr	hiredate	sal	comm	deptno	nth_top	nth_bottom
13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2450	2450
6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2450	2450
8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	2450	2450
0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1100	3000
10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1100	3000

3. R Programming (R Package)

dplyr::nth() 함수

개별 부서 내에서 상위 2번째와 하위 2번째 급여 값을 반환한다. 급여를 오름차순과 내림차순으로 각각 정렬 후 두 번째 값을 취한다.

R Programming

%%R

library(plyr)   # dplyr 보다 먼저 로딩
library(dplyr)

withmooc <- emp

withmooc <- plyr::ddply(withmooc,.(deptno),transform,nth_top    = dplyr::nth(sal,2, order_by = sal ))
withmooc <- plyr::ddply(withmooc,.(deptno),transform,nth_bottom = dplyr::nth(sal,2, order_by = desc(sal) ))

withmooc[order(withmooc$deptno,withmooc$sal), ]

Results

   empno  ename       job  mgr   hiredate  sal comm deptno nth_top nth_bottom
3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    2450       2450
1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    2450       2450
2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    2450       2450
4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1100       3000
7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1100       3000
5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1100       3000
6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1100       3000
8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1100       3000
14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30    1250       1600
10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    1250       1600

dplyr::nth() 함수

R Programming

%%R
withmooc <- emp[order(emp$sal),]
withmooc['nth_top'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { dplyr::nth(x,2) }))
#withmooc['nth_top'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { x[2] }))

withmooc <- withmooc[order(-withmooc$sal),]
withmooc['nth_bottom'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { dplyr::nth(x,2) }))
#withmooc['nth_bottom'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { x[2] }))

withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

Results

# A tibble: 10 x 10
   empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
   <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
 1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
 2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
 3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
 4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
 5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
 6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
 7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
 8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
 9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

dplyr::nth() 함수

R Programming

%%R

withmooc <- emp[order(emp$sal),]

withmooc['nth_top'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  dplyr::nth(x,2) } )
# withmooc['nth_top'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  x[2] } )

withmooc <- withmooc[order(-withmooc$sal),]
withmooc['nth_bottom'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  dplyr::nth(x,2) } )
# withmooc['nth_bottom'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  x[2] } )

withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

Results

# A tibble: 10 x 10
   empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
   <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
 1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
 2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
 3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
 4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
 5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
 6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
 7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
 8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
 9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

nth() 함수

개별 부서 내에서 하위 2번째 급여 값을 반환한다.

R Programming

%%R
# split 되는 그룹내의 관측치 순서 보존을 위하여 데이터 사전 정렬 필요
withmooc <- emp[order(emp$deptno, emp$sal),]

Nth_var <- with(withmooc, purrr::map_chr(base::split(withmooc,withmooc$deptno), ~nth(.x$sal,2)) ) 

withmooc1 <- data.frame(withmooc, nth_top =rep(Nth_var, table(withmooc$deptno)))

withmooc1[order(withmooc1$deptno,withmooc1$sal),][1:10, ]

Results

   empno  ename       job  mgr   hiredate  sal comm deptno     nth_top
1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10 2450.000000
2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 2450.000000
3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 2450.000000
4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 1100.000000
5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 1100.000000
6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 1100.000000
7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 1100.000000
8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 1100.000000
9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 1250.000000
10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 1250.000000

tapply() 함수

R Programming

%%R

withmooc <- emp[order(emp$deptno, emp$sal),]

First_var <- with(withmooc, tapply(sal, deptno, nth, 2))   # head, 1

data.frame(withmooc, nth_top =rep(Nth_var, table(withmooc$deptno)))[1:10, ]

# with(withmooc[order(withmooc$sal),],  data.frame(emp, nth_top=rep(tapply(sal, denptno, nth, 2), table(job))) )

Results

   empno  ename       job  mgr   hiredate  sal comm deptno     nth_top
1   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10 2450.000000
2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 2450.000000
3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 2450.000000
4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 1100.000000
5   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 1100.000000
6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 1100.000000
7   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 1100.000000
8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 1100.000000
9   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 1250.000000
10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 1250.000000

nth() 함수

R Programming

%%R

withmooc <- emp[order(emp$sal),]

mapping    <- t( purrr::map_dfc( split(withmooc,withmooc$deptno), ~nth(.x$sal,2) ) )

trans_data <- melt(mapping, value.name="nth_top")

merge(emp,trans_data,by.x='deptno',by.y='Var1')[1:10, ]

Results

   deptno empno  ename       job  mgr   hiredate  sal comm Var2 nth_top
1      10  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA    1    2450
2      10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA    1    2450
3      10  7934 MILLER     CLERK 7782 1982-01-23 1300   NA    1    2450
4      20  7369  SMITH     CLERK 7902 1980-12-17  800   NA    1    1100
5      20  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA    1    1100
6      20  7566  JONES   MANAGER 7839 1981-04-02 2975   NA    1    1100
7      20  7902   FORD   ANALYST 7566 1981-12-03 3000   NA    1    1100
8      20  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA    1    1100
9      30  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300    1    1250
10     30  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA    1    1250

4. R Dplyr Package

dplyr::nth() 함수 구조 확인

R Programming

%%R

dplyr::nth

Results - dplyr::nth 함수 구조

function (x, n, order_by = NULL, default = default_missing(x)) 
{
    if (length(n) != 1 || !is.numeric(n)) {
        abort("`n` must be a single integer.")
    }
    n <- trunc(n)
    if (n == 0 || n > length(x) || n < -length(x)) {
        return(default)
    }
    if (n < 0) {
        n <- length(x) + n + 1
    }
    if (is.null(order_by)) {
        x[[n]]
    }
    else {
        x[[order(order_by)[[n]]]]
    }
}
<bytecode: 0x00000218d09751f0>
<environment: namespace:dplyr>

dplyr::nth() 함수

R Programming

%%R

emp %>%
  dplyr::group_by(deptno) %>%
  dplyr::mutate( nth_top    = nth(sal, 2, order_by = sal),
                 nth_bottom = nth(sal, 2, order_by = desc(sal))) %>%
  dplyr::arrange(deptno,sal) %>%
  head(10)

Results

# A tibble: 10 x 10
# Groups:   deptno [3]
   empno ename  job         mgr hiredate     sal  comm deptno nth_top nth_bottom
   <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>      <dbl>
 1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    2450       2450
 2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2450       2450
 3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10    2450       2450
 4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20    1100       3000
 5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1100       3000
 6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    1100       3000
 7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    1100       3000
 8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    1100       3000
 9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1250       1600
10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1250       1600

[참고]

row_number()

R Programming

%%R

emp %>%
 dplyr::group_by(deptno) %>%
 dplyr::arrange(deptno,sal) %>%
 filter(row_number() == 2)

Results

# A tibble: 3 x 8
# Groups:   deptno [3]
  empno ename job        mgr hiredate     sal  comm deptno
  <dbl> <chr> <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
1  7782 CLARK MANAGER   7839 1981-01-09  2450    NA     10
2  7876 ADAMS CLERK     7788 1983-01-12  1100    NA     20
3  7521 WARD  SALESMAN  7698 1981-02-22  1250   500     30

5. R sqldf Package

nth_value() 함수

R Programming

%%R

sqldf(" SELECT empno,
               ename,
               deptno,
               sal,
               nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
               NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
        FROM   emp
        order by deptno, sal")

Results

   empno  ename deptno  sal nth_top nth_bottom
1   7934 MILLER     10 1300    2450       2450
2   7782  CLARK     10 2450    2450       2450
3   7839   KING     10 5000    2450       2450
4   7369  SMITH     20  800    1100       3000
5   7876  ADAMS     20 1100    1100       3000
6   7566  JONES     20 2975    1100       3000
7   7788  SCOTT     20 3000    1100       3000
8   7902   FORD     20 3000    1100       3000
9   7900  JAMES     30  950    1250       1600
10  7521   WARD     30 1250    1250       1600
11  7654 MARTIN     30 1250    1250       1600
12  7844 TURNER     30 1500    1250       1600
13  7499  ALLEN     30 1600    1250       1600
14  7698  BLAKE     30 2850    1250       1600

6. Python pandasql Package

nth_value() 함수

Python Programming

ps.sqldf(" SELECT empno,     \
               ename,        \
               deptno,       \
               sal,          \
               nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,     \
               NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom     \
        FROM   emp     \
        order by deptno, sal   ").head(10)

Results

	empno	ename	deptno	sal	nth_top	nth_bottom
0	7934	MILLER	10	1300	2450	2450
1	7782	CLARK	10	2450	2450	2450
2	7839	KING	10	5000	2450	2450
3	7369	SMITH	20	800	1100	3000
4	7876	ADAMS	20	1100	1100	3000
5	7566	JONES	20	2975	1100	3000
6	7788	SCOTT	20	3000	1100	3000
7	7902	FORD	20	3000	1100	3000
8	7900	JAMES	30	950	1250	1600
9	7521	WARD	30	1250	1250	1600

7. R data.table Package

nth() 함수

R Programming

%%R
DT          <- data.table(emp)

DT[order(-sal) ,  `:=`( nth_top= nth(sal,2,order_by=sal), nth_bottom= nth(sal,2,order_by=desc(sal)) ), by = deptno][order(deptno,sal),]

Results

    empno  ename       job  mgr   hiredate  sal comm deptno nth_top nth_bottom
 1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10    2450       2450
 2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10    2450       2450
 3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10    2450       2450
 4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20    1100       3000
 5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1100       3000
 6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    1100       3000
 7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    1100       3000
 8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20    1100       3000
 9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30    1250       1600
10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30    1250       1600
11:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30    1250       1600
12:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30    1250       1600
13:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30    1250       1600
14:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30    1250       1600

dplyr::nth() 함수

R Programming

%%R

DT          <- data.table(emp)
dept_DT     <- data.table(dept)

DT[order(sal), `:=` ( nth_top_0  = dplyr::nth(sal,2, order_by = sal ),            # 첫 번째 정렬 옵션(order(sal)) 필요 없음
                      nth_top_1  = na.omit(unlist(.SD))[2]),                      # 첫 번째 정렬 옵션(order(sal)) 필요
                      by = deptno,
                      .SDcols = c("sal")][order(deptno,sal),]

DT[order(-sal), `:=` ( nth_bottom_0  = dplyr::nth(sal,2, order_by = desc(sal) ),  # 첫 번째 정렬 옵션(order(-sal)) 필요 없음
                       nth_bottom_1  = na.omit(unlist(.SD))[2]),                  # 첫 번째 정렬 옵션(order(-sal)) 필요
                       by = deptno,
                       .SDcols = c("sal")][order(deptno,sal),]

DT[,c('hiredate','job','mgr','ename') := NULL][1:10, ]

Results

    empno  sal comm deptno nth_top_0 nth_top_1 nth_bottom_0 nth_bottom_1
 1:  7369  800   NA     20      1100      1100         3000         3000
 2:  7499 1600  300     30      1250      1250         1600         1600
 3:  7521 1250  500     30      1250      1250         1600         1600
 4:  7566 2975   NA     20      1100      1100         3000         3000
 5:  7654 1250 1400     30      1250      1250         1600         1600
 6:  7698 2850   NA     30      1250      1250         1600         1600
 7:  7782 2450   NA     10      2450      2450         2450         2450
 8:  7788 3000   NA     20      1100      1100         3000         3000
 9:  7839 5000   NA     10      2450      2450         2450         2450
10:  7844 1500    0     30      1250      1250         1600         1600

dplyr::nth() 함수

R Programming

%%R

DT          <- data.table(emp)

DT[ order(sal), `:=` (nth_top_0  = dplyr::nth(sal,2, order_by = sal ),
                      nth_top_1  = sal[2],
                      nth_top_2  = na.omit(unlist(.SD))[2],
                      fir_sal    = first(sal), 
                      max_sal    = max(sal),
                      min_sal    = min(sal), 
                      last_sal   = last(sal)),by=deptno,.SDcols = c("sal")][order(deptno,sal),][,c('hiredate','job','mgr','ename','comm','empno') := NULL][1:10, ]

Results

     sal deptno nth_top_0 nth_top_1 nth_top_2 fir_sal max_sal min_sal last_sal
 1: 1300     10      2450      2450      2450    1300    5000    1300     5000
 2: 2450     10      2450      2450      2450    1300    5000    1300     5000
 3: 5000     10      2450      2450      2450    1300    5000    1300     5000
 4:  800     20      1100      1100      1100     800    3000     800     3000
 5: 1100     20      1100      1100      1100     800    3000     800     3000
 6: 2975     20      1100      1100      1100     800    3000     800     3000
 7: 3000     20      1100      1100      1100     800    3000     800     3000
 8: 3000     20      1100      1100      1100     800    3000     800     3000
 9:  950     30      1250      1250      1250     950    2850     950     2850
10: 1250     30      1250      1250      1250     950    2850     950     2850

8. Python Duckdb의 SQL

Python Programming

%%sql
  SELECT empno,
         ename,
         deptno,
         sal,
         nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,
         NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom
  FROM   emp
  order by deptno, sal
  LIMIT 6

Python Programming

 duckdb.sql(" SELECT empno,                                                                                                                                 \
                     ename,                                                                                                                                 \
                     deptno,                                                                                                                                \
                     sal,                                                                                                                                   \
                     nth_value(sal, 2) over (PARTITION BY deptno ORDER BY sal      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_top,   \
                     NTH_VALUE(sal, 2) over (PARTITION BY deptno ORDER BY sal desc RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_bottom \
              FROM   emp                                                                                                                                    \
              order by deptno, sal                                                                                                                          \
              LIMIT 6 ").df()

Results

   empno   ename  deptno   sal  nth_top  nth_bottom
0   7934  MILLER      10  1300     2450        2450
1   7782   CLARK      10  2450     2450        2450
2   7839    KING      10  5000     2450        2450
3   7369   SMITH      20   800     1100        3000
4   7876   ADAMS      20  1100     1100        3000
5   7566   JONES      20  2975     1100        3000

--------------------------------------------

[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크

저작자표시 비영리 변경금지

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

오라클 함수와 R & 파이썬 비교 목록 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] (0)	2022.01.05
LISTAGG 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2022.01.04
VARIANCE / VAR_SAMP 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2022.01.04
PERCENTILE_CONT 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB] (0)	2022.01.03
TO_Time 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] (0)	2022.01.03

NTH_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

[ NTH_VALUE Oracle Function ]

1. Oracle(오라클)

NTH_VALUE() 함수

2. Python Pandas(파이썬)

transform() 함수와 'nth' 인수

transform() 함수와 iloc

transform() 함수와 'nth' 인수

3. R Programming (R Package)

dplyr::nth() 함수

dplyr::nth() 함수

dplyr::nth() 함수

nth() 함수

tapply() 함수

nth() 함수

4. R Dplyr Package

dplyr::nth() 함수 구조 확인

dplyr::nth() 함수

[참고]

5. R sqldf Package

nth_value() 함수

6. Python pandasql Package

nth_value() 함수

7. R data.table Package

nth() 함수

dplyr::nth() 함수

dplyr::nth() 함수

8. Python Duckdb의 SQL

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

댓글

티스토리툴바

NTH_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

[ NTH_VALUE Oracle Function ]

1. Oracle(오라클)

NTH_VALUE() 함수

2. Python Pandas(파이썬)

transform() 함수와 'nth' 인수

transform() 함수와 iloc

transform() 함수와 'nth' 인수

3. R Programming (R Package)

dplyr::nth() 함수

dplyr::nth() 함수

dplyr::nth() 함수

nth() 함수

tapply() 함수

nth() 함수

4. R Dplyr Package

dplyr::nth() 함수 구조 확인

dplyr::nth() 함수

[참고]

5. R sqldf Package

nth_value() 함수

6. Python pandasql Package

nth_value() 함수

7. R data.table Package

nth() 함수

dplyr::nth() 함수

dplyr::nth() 함수

8. Python Duckdb의 SQL

'통계프로그램 비교 시리즈 > 오라클함수 비교(R & Python)' 카테고리의 다른 글

관련글

댓글

티스토리툴바