본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

FIRST_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 11. 24.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크

    [ FIRST_VALUE Oracle Function ]

     


    FIRST_VALUE함수는 분석 함수이다. 값의 정렬된 세트에서 첫 번째 값을 반환한다. 만약 집합 내에서 첫 번째 값이 NULL이라면, IGNORE NULLS를 지정하지 않은 한 NULL을 반환한다. 이 설정은 데이터 조밀화에 유용하다. 만약 IGNORE NULLS을 지정하면, FIRST_VALUE함수는 집합 내에서 최초 NON_NULL값을 반환하고, 만약 모든 값이 NULL이라면 NULL을 반환한다.

     

     


    1. Oracle(오라클)

     

    Oracle Programming
    SELECT ENAME,
           JOB,
           SAL,
           FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY 
    FROM   emp

     

    Results
    ENAME	JOB		SAL	FIRST_SALARY
    -------------------------------------
    FORD	ANALYST		3000	FORD
    SCOTT	ANALYST		3000	FORD
    MILLER	CLERK		1300	MILLER
    ADAMS	CLERK		1100	MILLER
    JAMES	CLERK		950	MILLER
    SMITH	CLERK		800	MILLER
    JONES	MANAGER		2975	JONES
    BLAKE	MANAGER		2850	JONES
    CLARK	MANAGER		2450	JONES
    KING	PRESIDENT	5000	KING
    ALLEN	SALESMAN	1600	ALLEN
    TURNER	SALESMAN	1500	ALLEN
    MARTIN	SALESMAN	1250	ALLEN
    WARD	SALESMAN	1250	ALLEN

     


    2. Python Pandas(파이썬)

     

    • [stackoverflow 참고] Pandas - group by one column, sort by another, get value from the third column [링크]
    • [stackoverflow 참고] Apply vs transform on a group object [링크]

     

    Python Programming
    withmooc = emp.copy()
    
    group_first = withmooc.sort_values('sal',ascending = False).groupby('job')['ename'].first()
    withmooc['Time'] = withmooc['job'].map(group_first)
    
    withmooc.sort_values('job').head()

     

    Results
    	empno	ename	job	mgr	hiredate	sal	comm	deptno	Time
    7	7788	SCOTT	ANALYST	7566.0	1982/12/09	3000	NaN	20	SCOTT
    12	7902	FORD	ANALYST	7566.0	1981/12/03	3000	NaN	20	SCOTT
    0	7369	SMITH	CLERK	7902.0	1980/12/17	800	NaN	20	MILLER
    10	7876	ADAMS	CLERK	7788.0	1983/01/12	1100	NaN	20	MILLER
    11	7900	JAMES	CLERK	7698.0	1981/12/03	950	NaN	30	MILLER

     


     

    Python Programming
    withmooc = emp.copy()
    withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform(lambda x:x.iloc[0])
    withmooc.sort_values('job').head()

     

    Results
    	empno	ename	job	mgr	hiredate	sal	comm	deptno	first_value
    7	7788	SCOTT	ANALYST	7566.0	1982/12/09	3000	NaN	20	SCOTT
    12	7902	FORD	ANALYST	7566.0	1981/12/03	3000	NaN	20	SCOTT
    0	7369	SMITH	CLERK	7902.0	1980/12/17	800	NaN	20	MILLER
    10	7876	ADAMS	CLERK	7788.0	1983/01/12	1100	NaN	20	MILLER
    11	7900	JAMES	CLERK	7698.0	1981/12/03	950	NaN	30	MILLER​

     


     

    Python Programming
    withmooc = emp.copy()
    withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform('first')
    withmooc.sort_values('job').head()

     

    Results
    	empno	ename	job	mgr	hiredate	sal	comm	deptno	first_value
    7	7788	SCOTT	ANALYST	7566.0	1982/12/09	3000	NaN	20	SCOTT
    12	7902	FORD	ANALYST	7566.0	1981/12/03	3000	NaN	20	SCOTT
    0	7369	SMITH	CLERK	7902.0	1980/12/17	800	NaN	20	MILLER
    10	7876	ADAMS	CLERK	7788.0	1983/01/12	1100	NaN	20	MILLER
    11	7900	JAMES	CLERK	7698.0	1981/12/03	950	NaN	30	MILLER

     


    [참고]

    • job 그룹별 ename에 대한 rank
    Python Programming
    withmooc = emp.copy()
    withmooc.sort_values('sal', ascending=False).groupby('job').apply(lambda x : x[ x['ename'].rank() ==1 ]["ename"])

     

    Results
    job          
    ANALYST    12     FORD
    CLERK      10    ADAMS
    MANAGER    5     BLAKE
    PRESIDENT  8      KING
    SALESMAN   1     ALLEN
    Name: ename, dtype: object

     


     

    Python Programming
    emp.sort_values('sal',ascending = False).groupby('job')['ename'].apply(max)

     

    Results
    job
    ANALYST      SCOTT
    CLERK        SMITH
    MANAGER      JONES
    PRESIDENT     KING
    SALESMAN      WARD
    Name: ename, dtype: object

     


    3. R Programming (R Package)

     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    plyr::ddply(emp,.(job),transform,first_value = dplyr::first(ename, order_by = desc(sal) ))[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno first_value
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       SCOTT
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       SCOTT
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      MILLER
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      MILLER
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      MILLER
    6   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      MILLER
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       JONES
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       JONES
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       JONES
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        KING

     


     

    R Programming
    %%R
    withmooc <- emp[order(-emp$sal),]
    withmooc['first_value'] = with(withmooc, ave(x = ename, job, FUN =function(x) { dplyr::first(x) }))
    
    # withmooc['first_value'] = with(emp[order(-emp$sal),], ave(x = ename, job, FUN =function(x) { x[1] }))
    
    withmooc[order(withmooc$job),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno first_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 SCOTT      
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 SCOTT      
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 MILLER     
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 MILLER     
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 MILLER     
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 MILLER     
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 JONES      
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 JONES      
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 JONES      
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING       

     


     

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    withmooc['first_value'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) {  dplyr::first(x) } )
    # withmooc['first_value'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) {  x[1] } )
    
    withmooc[order(withmooc$job),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno first_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 SCOTT      
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 SCOTT      
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 MILLER     
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 MILLER     
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 MILLER     
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 MILLER     
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 JONES      
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 JONES      
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 JONES      
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING       

     


     

    R Programming
    %%R
    
    withmooc <- emp
    
    # split 되는 그룹내의 관측치 순서 보존을 위하여 데이터와 group 데이터 양쪽 재 정렬 필요
    
    First_var <- with(withmooc, purrr::map_chr(base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job), ~first(.x$ename)) ) 
    
    data.frame(withmooc, First_value =rep(First_var, table(withmooc$job)))[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno First_value
    1   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       SCOTT
    2   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       SCOTT
    3   7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30      MILLER
    4   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      MILLER
    5   7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30      MILLER
    6   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      MILLER
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       JONES
    8   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       JONES
    9   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       JONES
    10  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30        KING

     


     

    • [stackoverflow 참고] In R, how do I add a max by group? [duplicate] [링크]
    R Programming
    %%R
    
    withmooc <- emp
    
    First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, first))   # head, 1
    
    # First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, head, 1))
    
    data.frame(emp, First_value =rep(First_var, table(withmooc$job)))[1:10, ]
    
    # with(withmooc[order(-withmooc$sal),],  data.frame(emp, max.per.group=rep(tapply(ename, job, first), table(job))) )

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno First_value
    1   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       SCOTT
    2   7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       SCOTT
    3   7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30      MILLER
    4   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      MILLER
    5   7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30      MILLER
    6   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      MILLER
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       JONES
    8   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       JONES
    9   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       JONES
    10  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30        KING

     


     

     

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    mapping    <- t( purrr::map_dfc( split(withmooc,withmooc$job), ~first(.x$ename) ) )
    
    trans_data <- melt(mapping, value.name="first_ename")
    
    merge(emp,trans_data,by.x='job',by.y='Var1')[1:10, ]

     

    Results
             job empno  ename  mgr   hiredate  sal comm deptno Var2 first_ename
    1    ANALYST  7902   FORD 7566 1981-12-03 3000   NA     20    1       SCOTT
    2    ANALYST  7788  SCOTT 7566 1982-12-09 3000   NA     20    1       SCOTT
    3      CLERK  7369  SMITH 7902 1980-12-17  800   NA     20    1      MILLER
    4      CLERK  7934 MILLER 7782 1982-01-23 1300   NA     10    1      MILLER
    5      CLERK  7876  ADAMS 7788 1983-01-12 1100   NA     20    1      MILLER
    6      CLERK  7900  JAMES 7698 1981-12-03  950   NA     30    1      MILLER
    7    MANAGER  7566  JONES 7839 1981-04-02 2975   NA     20    1       JONES
    8    MANAGER  7782  CLARK 7839 1981-01-09 2450   NA     10    1       JONES
    9    MANAGER  7698  BLAKE 7839 1981-03-01 2850   NA     30    1       JONES
    10 PRESIDENT  7839   KING   NA 1981-11-17 5000   NA     10    1        KING

     


    [참고] Group By - first

    R Programming
    %%R
    
    do.call(rbind, Map(first, split(emp, emp$job)))

     

    Results
    # A tibble: 5 x 8
      empno ename job         mgr hiredate     sal  comm deptno
    * <dbl> <chr> <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7788 SCOTT ANALYST    7566 1982-12-09  3000    NA     20
    2  7369 SMITH CLERK      7902 1980-12-17   800    NA     20
    3  7566 JONES MANAGER    7839 1981-04-02  2975    NA     20
    4  7839 KING  PRESIDENT    NA 1981-11-17  5000    NA     10
    5  7499 ALLEN SALESMAN   7698 1981-02-20  1600   300     30

     


     

    R Programming
    %%R
    
    emp  %>% 
        rownames_to_column %>%
        gather(variable, value, -rowname) %>% 
        spread(variable, value) %>%
        head(10)

     

    Results
    # A tibble: 10 x 9
       rowname comm  deptno empno ename  hiredate job      mgr   sal  
       <chr>   <chr> <chr>  <chr> <chr>  <chr>    <chr>    <chr> <chr>
     1 1       <NA>  20     7369  SMITH  4003     CLERK    7902  800  
     2 10      0     30     7844  TURNER 4268     SALESMAN 7698  1500 
     3 11      <NA>  20     7876  ADAMS  4759     CLERK    7788  1100 
     4 12      <NA>  30     7900  JAMES  4354     CLERK    7698  950  
     5 13      <NA>  20     7902  FORD   4354     ANALYST  7566  3000 
     6 14      <NA>  10     7934  MILLER 4405     CLERK    7782  1300 
     7 2       300   30     7499  ALLEN  4068     SALESMAN 7698  1600 
     8 3       500   30     7521  WARD   4070     SALESMAN 7698  1250 
     9 4       <NA>  20     7566  JONES  4109     MANAGER  7839  2975 
    10 5       1400  30     7654  MARTIN 4288     SALESMAN 7698  1250 

     


     

    R Programming
    %%R
    
    purrr::map_dfr(split(emp,emp$job), ~first(.x$ename))

     

    Results
    # A tibble: 1 x 5
      ANALYST CLERK MANAGER PRESIDENT SALESMAN
      <chr>   <chr> <chr>   <chr>     <chr>   
    1 SCOTT   SMITH JONES   KING      ALLEN   

     


     

    R Programming
    %%R
    
    do.call(rbind, purrr::map_dfr(split(emp,emp$job), ~first(.x$ename)) )

     

    Results
              [,1]   
    ANALYST   "SCOTT"
    CLERK     "SMITH"
    MANAGER   "JONES"
    PRESIDENT "KING" 
    SALESMAN  "ALLEN"

     


     

    R Programming
    %%R
    
    emp %>%
      tbl_df() %>%
      nest(-job) %>%
      mutate(Quantiles = map(data, ~ first(.$ename))) %>% 
      unnest(Quantiles)

     

    Results
    # A tibble: 5 x 3
      job       data             Quantiles
      <chr>     <list>           <chr>    
    1 CLERK     <tibble [4 x 7]> SMITH    
    2 SALESMAN  <tibble [4 x 7]> ALLEN    
    3 MANAGER   <tibble [3 x 7]> JONES    
    4 ANALYST   <tibble [2 x 7]> SCOTT    
    5 PRESIDENT <tibble [1 x 7]> KING     

     


    [참고] Emulate the window function of SQL in R

    • [stackoverflow 참고] Emulate the window function of SQL in R [링크]
    R Programming
    %%R
    
    withmooc <- emp
    
    unlist(with(withmooc[order(-emp$sal),], by(ename, job, head,1)))

     

    Results
    job: ANALYST
    [1] "SCOTT"
    ------------------------------------------------------------ 
    job: CLERK
    [1] "MILLER"
    ------------------------------------------------------------ 
    job: MANAGER
    [1] "JONES"
    ------------------------------------------------------------ 
    job: PRESIDENT
    [1] "KING"
    ------------------------------------------------------------ 
    job: SALESMAN
    [1] "ALLEN"

     


     

    R Programming
    %%R
    
    aggregate(ename ~ job, data=emp[order(-emp$sal),], FUN=head,1)

     

    Results
            job  ename
    1   ANALYST  SCOTT
    2     CLERK MILLER
    3   MANAGER  JONES
    4 PRESIDENT   KING
    5  SALESMAN  ALLEN

     


     

    R Programming
    %%R
    
    with(emp[order(-emp$sal),], ave(ename, job, FUN=head,1))

     

    Results
     [1] "KING"   "SCOTT"  "FORD"   "JONES"  "BLAKE"  "CLARK"  "ALLEN"  "TURNER"
     [9] "MILLER" "WARD"   "MARTIN" "ADAMS"  "JAMES"  "SMITH" 

     


    [참고] first / Last

    • 리스트 반복 작업 purrr 패키지
    R Programming
    %%R
    
    purrr::map_chr(split(emp,emp$job), ~first(.x$ename))

     

    Results
      ANALYST     CLERK   MANAGER PRESIDENT  SALESMAN 
      "SCOTT"   "SMITH"   "JONES"    "KING"   "ALLEN" 

     


     

    R Programming
    %%R
    
    withmooc <- emp
    
    do.call(rbind, lapply(split(withmooc[order(withmooc$sal, decreasing=TRUE),], withmooc$job), head, 1))

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
    * <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30
    2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    3  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
    4  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
    5  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20

     


     

    R Programming
    %%R
    
    do.call(rbind, lapply(split(withmooc[order(withmooc$sal, decreasing=TRUE),], withmooc$job), `[`, 1, ))

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
    * <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30
    2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    3  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
    4  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
    5  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20

     

     


    4. R Dplyr Package

     

    [참고] dplyr 함수 구문 참고

    R Programming
    %%R
    
    dplyr::first

     

    Results
    function (x, order_by = NULL, default = default_missing(x)) 
    {
        nth(x, 1L, order_by = order_by, default = default)
    }
    <bytecode: 0x00000210154b2470>
    <environment: namespace:dplyr>

     


     

    R Programming
    %%R
    
    dplyr::nth

     

    Results
    function (x, n, order_by = NULL, default = default_missing(x)) 
    {
        if (length(n) != 1 || !is.numeric(n)) {
            abort("`n` must be a single integer.")
        }
        n <- trunc(n)
        if (n == 0 || n > length(x) || n < -length(x)) {
            return(default)
        }
        if (n < 0) {
            n <- length(x) + n + 1
        }
        if (is.null(order_by)) {
            x[[n]]
        }
        else {
            x[[order(order_by)[[n]]]]
        }
    }
    <bytecode: 0x00000210103471a8>
    <environment: namespace:dplyr>

     


     

    R Programming
    %%R
    
    emp %>%
      dplyr::arrange(-sal) %>%
      dplyr::group_by(job) %>%
      dplyr::mutate(first_value = dplyr::first(ename, order_by = desc(sal) )) %>%
      dplyr::arrange(job) %>%
      head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   job [4]
       empno ename  job         mgr hiredate     sal  comm deptno first_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 SCOTT      
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 SCOTT      
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 MILLER     
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 MILLER     
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 MILLER     
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 MILLER     
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 JONES      
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 JONES      
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 JONES      
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING       

     


    purrr 패키지

    R Programming
    %%R
    
    withmooc <- emp[order(withmooc$sal, decreasing=TRUE),]
    
    emp %>%
      dplyr::arrange(-sal) %>%
      dplyr::left_join( ( purrr::map_dfr(split(withmooc,withmooc$job), ~first(.x$ename)) %>%
                          gather(key = job, value = first_job )
                        ),
                        id = "job"
                      ) %>%
      dplyr::arrange(job) %>%
      head(10)

     

    Results
    Joining, by = "job"
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno first_job
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>    
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 SCOTT    
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 SCOTT    
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 SMITH    
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 SMITH    
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 SMITH    
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 SMITH    
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 JONES    
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 JONES    
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 JONES    
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING     

     


    [참고] First / Last

    R Programming
    %%R
    
    # 단일 건에 대한 중복 처리 필요(PRESIDENT)
    emp %>% arrange(desc(sal)) %>% group_by(job) %>% slice(c(1,n()))

     

    Results
    # A tibble: 10 x 8
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
     6  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
     7  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     8  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30
    10  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30

     


     

    R Programming
    %%R
    
    emp %>% 
      group_by(job) %>% 
      slice(c(which.min(sal), which.max(sal)))

     

    Results
    # A tibble: 10 x 8
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
     4  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
     5  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
     7  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     8  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     9  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30
    10  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30

     


     

    R Programming
    %%R
    
    emp %>% 
      dplyr::arrange(desc(sal)) %>%
      split(f = emp$job)        %>%   
      purrr::map_df(first)

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30
    2  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    3  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
    4  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
    5  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20

     

     


    5. R sqldf Package

     

    • 직무(Job) 별 급여가 가장 높은 사람
    R Programming
    %%R
    
    sqldf(" SELECT  ENAME,
                      JOB,
                      SAL,
                      FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY 
            FROM emp ")[1:10, ]

     

    Results
        ename       job  sal FIRST_SALARY
    1   SCOTT   ANALYST 3000        SCOTT
    2    FORD   ANALYST 3000        SCOTT
    3  MILLER     CLERK 1300       MILLER
    4   ADAMS     CLERK 1100       MILLER
    5   JAMES     CLERK  950       MILLER
    6   SMITH     CLERK  800       MILLER
    7   JONES   MANAGER 2975        JONES
    8   BLAKE   MANAGER 2850        JONES
    9   CLARK   MANAGER 2450        JONES
    10   KING PRESIDENT 5000         KING

     

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" SELECT  ENAME,     \
                       JOB,       \
                       SAL,       \
                       FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY      \
               FROM emp  ").head()

     

    Results
    	ename	job	sal	FIRST_SALARY
    0	SCOTT	ANALYST	3000	SCOTT
    1	FORD	ANALYST	3000	SCOTT
    2	MILLER	CLERK	1300	MILLER
    3	ADAMS	CLERK	1100	MILLER
    4	JAMES	CLERK	950	MILLER

     

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    
    DT[order(-sal) , first_value := first(ename), by = job][order(job),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno first_value
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       SCOTT
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       SCOTT
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      MILLER
     4:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      MILLER
     5:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      MILLER
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      MILLER
     7:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       JONES
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       JONES
     9:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       JONES
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        KING

     


     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    # s = dplyr::first(.SD, order_by = desc(sal) )
    
    DT[, `:=` ( first_value  = dplyr::first(ename, order_by = desc(sal) ), 
                first_value1 = na.omit(unlist(.SD))[1],
                last_value   = tail(na.omit(unlist(.SD)),1)), 
              by = job,
              .SDcols = c("ename")][order(job),][, c("mgr","sal","comm","depno","job","hiredate") := NULL][1:8, ]

     

    Results
       empno  ename deptno first_value first_value1 last_value
    1:  7788  SCOTT     20       SCOTT        SCOTT       FORD
    2:  7902   FORD     20       SCOTT        SCOTT       FORD
    3:  7369  SMITH     20      MILLER        SMITH     MILLER
    4:  7876  ADAMS     20      MILLER        SMITH     MILLER
    5:  7900  JAMES     30      MILLER        SMITH     MILLER
    6:  7934 MILLER     10      MILLER        SMITH     MILLER
    7:  7566  JONES     20       JONES        JONES      CLARK
    8:  7698  BLAKE     30       JONES        JONES      CLARK

     


     

    R Programming
    %%R
    DT          <- data.table(emp)
    
    DT[order(-sal) , first_value := xts::first(ename) , by=job][order(job), ][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno first_value
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       SCOTT
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       SCOTT
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      MILLER
     4:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      MILLER
     5:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      MILLER
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      MILLER
     7:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       JONES
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       JONES
     9:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       JONES
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        KING

     


     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    
    DT[ order(-sal), `:=` (first_ename  = ename[1],
                           first_ename1 = first(ename), 
                           max_ename    = max(ename),
                           min_ename    = min(ename), last_ename=last(ename)),by=job][order(job,-sal),][, c("mgr","sal","comm","depno","job","hiredate") := NULL]

     

    Results
        empno  ename deptno first_ename first_ename1 max_ename min_ename last_ename
     1:  7788  SCOTT     20       SCOTT        SCOTT     SCOTT      FORD       FORD
     2:  7902   FORD     20       SCOTT        SCOTT     SCOTT      FORD       FORD
     3:  7934 MILLER     10      MILLER       MILLER     SMITH     ADAMS      SMITH
     4:  7876  ADAMS     20      MILLER       MILLER     SMITH     ADAMS      SMITH
     5:  7900  JAMES     30      MILLER       MILLER     SMITH     ADAMS      SMITH
     6:  7369  SMITH     20      MILLER       MILLER     SMITH     ADAMS      SMITH
     7:  7566  JONES     20       JONES        JONES     JONES     BLAKE      CLARK
     8:  7698  BLAKE     30       JONES        JONES     JONES     BLAKE      CLARK
     9:  7782  CLARK     10       JONES        JONES     JONES     BLAKE      CLARK
    10:  7839   KING     10        KING         KING      KING      KING       KING
    11:  7499  ALLEN     30       ALLEN        ALLEN      WARD     ALLEN     MARTIN
    12:  7844 TURNER     30       ALLEN        ALLEN      WARD     ALLEN     MARTIN
    13:  7521   WARD     30       ALLEN        ALLEN      WARD     ALLEN     MARTIN
    14:  7654 MARTIN     30       ALLEN        ALLEN      WARD     ALLEN     MARTIN

     


    [참고] 그룹별 First / Last (단일 건의 중복 문제는 존재)

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[ DT[order(sal, decreasing=TRUE), .I[c(1L,.N)], by=job]$V1 ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
     3:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20
     4:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20
     6:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10
     7:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30
     8:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30
     9:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10
    10:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20

     


    8. Python DuckDB의 SQL

     

    Python Programming
    %%sql
      SELECT ENAME,
             JOB,
             SAL,
             FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY
      FROM   emp
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT ENAME,                                                                      \
                        JOB,                                                                        \
                        SAL,                                                                        \
                        FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY   \
                 FROM   emp                                                                         \
                 LIMIT  6 ").df()

     

    Results
        ename       job   sal FIRST_SALARY
    0   SCOTT   ANALYST  3000        SCOTT
    1    FORD   ANALYST  3000        SCOTT
    2   ALLEN  SALESMAN  1600        ALLEN
    3  TURNER  SALESMAN  1500        ALLEN
    4    WARD  SALESMAN  1250        ALLEN
    5  MARTIN  SALESMAN  1250        ALLEN

     

     

     


    Gyeongbokgung Palace, Sajik-ro, Sejongno, Jongno-gu, Seoul (https://unsplash.com/photos/jbnuIDkOrFU)

      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글