본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

NTILE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 12. 6.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ NTILE Oracle Function ]

     


    NTILE 함수는 분석 함수이다. 순서화된 데이터를 파티션 별로 expr에 의해 지정된 bucket의 수로 분할하여, 각 행을 적절한 bucket 번호로 할당한다. buckets은 1~expr의 번호를 붙일 수 있다. expr의 값은 각 파티션에 대하여 양의 정수로 변활 될 필요가 있다. 오라클 데이터 베이스는 정수로 간주하기 위해, 만약 expr이 정수가 아닌 상수라면, 오라클은 정수로 값을 절삭한다. 반환되는 값은 NUMBER이다.

     

     


    1. Oracle(오라클)

     

    Oracle Programming
    SELECT EMPNO,
           DEPTNO,
           ENAME,SAL,
           NTILE(3) OVER (ORDER BY SAL DESC) GROUPING,
           RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL) GROUP_RANK
    FROM   EMP
    ORDER  BY DEPTNO;

     

    EMPNO	DEPTNO	ENAME	SAL	GROUPING	GROUP_RANK
    ----------------------------------------------------------
    7934	10	MILLER	1300	2		1
    7782	10	CLARK	2450	2		2
    7839	10	KING	5000	1		3
    7369	20	SMITH	800	3		1
    7876	20	ADAMS	1100	3		2
    7566	20	JONES	2975	1		3
    7788	20	SCOTT	3000	1		4
    7902	20	FORD	3000	1		4
    7900	30	JAMES	950	3		1
    7654	30	MARTIN	1250	2		2
    7521	30	WARD	1250	3		2
    7844	30	TURNER	1500	2		4
    7499	30	ALLEN	1600	2		5
    7698	30	BLAKE	2850	1		6

     


    2. Python Pandas(파이썬)

     

    • 개별 부서 내에 속한 사원을 3개의 bucket으로 구분한다.
    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['sal_ntile'] = withmooc.groupby('deptno')['sal'].transform(lambda x: pd.qcut(x,3, labels=range(1,4)))
    
    withmooc.sort_values(['deptno','sal'])

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_ntile
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	3
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	2
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	3
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	3
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1
    2	7521	WARD	SALESMAN	7698.0	1981/02/22	1250	500.0	30	1
    4	7654	MARTIN	SALESMAN	7698.0	1981/09/28	1250	1400.0	30	1
    9	7844	TURNER	SALESMAN	7698.0	1981/09/08	1500	0.0	30	2
    1	7499	ALLEN	SALESMAN	7698.0	1981/02/20	1600	300.0	30	3
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	3

     


    • [참고] sal_rank : PARTITION 내에 속한 사원을 급여를 기준으로 순위를 부여한다.
    • sal_ntile : 순서는 고려 안 하고 PARTITION 내에 속한 사원을 3개의 그룹으로 그룹화
    Python Programming
    withmooc = copy.copy(emp)
    
    withmooc['sal_rank']  = withmooc.sort_values('sal',ascending = True).groupby('deptno')['sal'].transform(lambda x: x.rank(method='first',ascending=True) )
    withmooc['sal_ntile'] = withmooc.sort_values('sal',ascending = True).groupby('deptno')['sal'].transform(lambda x: x.rank(method='first',ascending=True) % 3 + 1)
    
    withmooc.sort_values(['deptno','sal']).head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_rank	sal_ntile
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1	2
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2	3
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	3	1
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	1	2
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	2	3
     

     


    3. R Programming (R Package)

     

    • 좀 더 다양한 윈도우 함수 처리 방법은 first_value 함수 참조
    R Programming
    %%R
    withmooc <- emp
    withmooc['sal_ntile'] = with(withmooc, ave(sal, deptno, FUN =function(x) { ntile(x,3) }))  # FUN =function(x) { median(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_ntile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10         1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10         2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10         3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20         1
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20         2
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20         2
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20         3
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30         1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30         1

     


    RANK 함수

    • : 순서는 고려 안 하고 PARTITION 내에서 그룹화
    R Programming
    %%R
    withmooc <- emp
    withmooc['sal_rank']  = with(withmooc, ave(sal, deptno, FUN =function(x) { rank(x, ties.method = "first", na.last = "keep") }))
    withmooc['sal_ntile'] = with(withmooc, ave(sal, deptno, FUN =function(x) { rank(x, ties.method = "first", na.last = "keep") %% 3 + 1 }))  # FUN =function(x) { median(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),]

     

    Results
    # A tibble: 14 x 10
       empno ename  job         mgr hiredate     sal  comm deptno sal_rank sal_ntile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        1         2
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        2         3
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        3         1
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1         2
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        2         3
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3         1
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        4         2
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        5         3
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30        1         2
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30        2         3
    11  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30        3         1
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30        4         2
    13  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30        5         3
    14  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        6         1

     


    Hmisc::cut2

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['sal_ntile'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) {  Hmisc::cut2(x, g=3) })    # FUN = function (x) {  min(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),]

     

    Results
    # A tibble: 14 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_ntile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10         1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10         2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10         3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20         1
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20         2
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20         2
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20         2
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30         1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30         1
    11  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30         1
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30         2
    13  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30         3
    14  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30         3

     


    • rnorm 임의의 난수를 생성하여 PARTITION 내에 속한 사원을 3개의 그룹(bucket)으로 구분한다. 그룹을 표시하는 이름은 임의의 난수로 표시된다.
    R Programming
    %%R
    
    withmooc <- emp
    
    set.seed(123)
    withmooc['sal_ntile'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) { round(rnorm(3) * 100) })    # FUN = function (x) {  min(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),]

     

    Results
    # A tibble: 14 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_ntile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10       156
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10       -56
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10       -23
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         7
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20         7
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        13
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20       172
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        13
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30       -69
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30      -127
    11  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30       -69
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30      -127
    13  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30        46
    14  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        46

     


    dplyr::ntile() 함수

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['sal_ntile'] = ave(x = withmooc$sal, withmooc$deptno, FUN = function (x) { dplyr::ntile(x,3) })    # FUN = function (x) {  min(x) }
    
    withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno sal_ntile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10         1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10         2
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10         3
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20         1
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20         2
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20         2
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20         3
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30         1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30         1

     


    dplyr::ntile() 함수

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    withmooc <- plyr::ddply(emp,.(deptno),transform,sal_ntile = dplyr::ntile(sal,3))
    
    withmooc[order(withmooc$deptno,withmooc$sal),]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno sal_ntile
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10         1
    1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10         2
    2   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10         3
    4   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20         1
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20         1
    5   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20         2
    6   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20         2
    8   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20         3
    14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30         1
    10  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30         1
    11  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30         2
    13  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30         2
    9   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30         3
    12  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30         3

     

     

     

     


    4. R Dplyr Package

     

    dplyr::ntile 함수 로직

    R Programming
    %%R
    
    dplyr::ntile

     

    dplyr::ntile 함수 로직
    function (x = row_number(), n) 
    {
        if (!missing(x)) {
            x <- row_number(x)
        }
        len <- length(x) - sum(is.na(x))
        n <- as.integer(floor(n))
        if (len == 0L) {
            rep(NA_integer_, length(x))
        }
        else {
            n_larger <- as.integer(len%%n)
            n_smaller <- as.integer(n - n_larger)
            size <- len/n
            larger_size <- as.integer(ceiling(size))
            smaller_size <- as.integer(floor(size))
            larger_threshold <- larger_size * n_larger
            bins <- if_else(x <= larger_threshold, (x + (larger_size - 
                1L))/larger_size, (x + (-larger_threshold + smaller_size - 
                1L))/smaller_size + n_larger)
            as.integer(floor(bins))
        }
    }
    <bytecode: 0x000001ac1d364cd8>
    <environment: namespace:dplyr>

     


    dplyr::ntile() 함수

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(deptno) %>%
      dplyr::mutate(sal_ntile = dplyr::ntile(sal,3)) %>%
      dplyr::arrange(deptno,sal) %>%
      head()

     

    Results
    # A tibble: 6 x 9
    # Groups:   deptno [2]
      empno ename  job         mgr hiredate     sal  comm deptno sal_ntile
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <int>
    1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10         1
    2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10         2
    3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10         3
    4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         1
    5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20         1
    6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20         2

     


    findInterval() 함수

    R Programming
    %%R
    
    emp %>%
      group_by(deptno) %>% 
      dplyr::mutate(Quantile = findInterval(sal, quantile(sal, probs=0:4/4))) %>%
      dplyr::arrange(deptno,sal)

     

    Results
    # A tibble: 14 x 9
    # Groups:   deptno [3]
       empno ename  job         mgr hiredate     sal  comm deptno Quantile
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>    <int>
     1  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10        1
     2  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10        3
     3  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10        5
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        1
     5  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20        2
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20        3
     7  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20        5
     8  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20        5
     9  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30        1
    10  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30        2
    11  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30        2
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30        3
    13  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30        4
    14  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30        5

     

     


    5. R sqldf Package

     

    Ntile() 함수

    R Programming
    %%R
    
    sqldf(" SELECT  *, 
                    NTILE(3) over (PARTITION BY DEPTNO ORDER BY (sal) DESC) sal_ntile
            FROM emp ")

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno sal_ntile
    1   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10         1
    2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10         2
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10         3
    4   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20         1
    5   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20         1
    6   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20         2
    7   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20         2
    8   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20         3
    9   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30         1
    10  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30         1
    11  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30         2
    12  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30         2
    13  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30         3
    14  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30         3

     

     


    6. Python pandasql Package

     

    Ntile() 함수

    Python Programming
    ps.sqldf(" SELECT  *,    \
                       NTILE(2) over (PARTITION BY DEPTNO ORDER BY (sal) DESC) sal_ntile    \
               FROM emp ").head()

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_ntile
    0	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	1
    1	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	1
    2	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	2
    3	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	1
    4	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	1

     

     


    7. R data.table Package

     

    dplyr::ntile() 함수

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(-sal), Ntile := dplyr::ntile(sal, 3), by=deptno][order(deptno, sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno Ntile
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10     1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10     2
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10     3
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20     1
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20     1
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20     2
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20     2
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20     3
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30     1
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     1

     


    hutils::mutate_ntile() 함수

    R Programming
    %%R
    library(hutils)
    DT          <- data.table(emp)
    
    hutils::mutate_ntile(DT, "sal", n = 3, by = "deptno")[order(deptno, sal),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno salTerciles
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10           1
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10           2
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10           3
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20           1
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20           1
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20           2
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20           2
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20           3
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30           1
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30           1

     


    Hmisc::cut2() 함수

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(-sal), Ntile := Hmisc::cut2(sal, g=3), by=deptno][order(deptno, sal),]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno       Ntile
     1:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10        1300
     2:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10        2450
     3:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        5000
     4:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20 [ 800,2975)
     5:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20 [ 800,2975)
     6:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 [2975, Inf]
     7:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20 [2975, Inf]
     8:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20 [2975, Inf]
     9:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30 [ 950,1500)
    10:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 [ 950,1500)
    11:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30 [ 950,1500)
    12:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30        1500
    13:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30 [1600,2850]
    14:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30 [1600,2850]

     


    8. Python Duckdb의 SQL

     

    Python Programming
    %%sql
      SELECT *,
             NTILE(3) over (PARTITION BY DEPTNO ORDER BY (sal) DESC) sal_ntile
      FROM   emp
      ORDER  BY DEPTNO, SAL DESC
      LIMIT  7

     

    Python Programming
    duckdb.sql(" SELECT DEPTNO, EMPNO, ENAME, SAL,                                          \
                        NTILE(3) over (PARTITION BY DEPTNO ORDER BY (sal) DESC) sal_ntile   \
                 FROM   emp                                                                 \
                 ORDER  BY DEPTNO, SAL DESC                                                 \
                 LIMIT  7 ").df()

     

    Results
       deptno  empno   ename   sal  sal_ntile
    0      10   7839    KING  5000          1
    1      10   7782   CLARK  2450          2
    2      10   7934  MILLER  1300          3
    3      20   7788   SCOTT  3000          1
    4      20   7902    FORD  3000          1
    5      20   7566   JONES  2975          2
    6      20   7876   ADAMS  1100          2

     


    Vienna, Austria (https://unsplash.com/photos/E40JUGMOAAE)

      --------------------------------------------  

     

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글