본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

LAST_VALUE 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table]

by 기서무나구물 2021. 11. 26.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ LAST_VALUE Oracle Function ]

     


    FIRST_VALUE 분석 함수는 윈도우에서 정렬된 값 중에서 첫 번째 값을 반환하고, LAST_VALUE 분석 함수는 윈도우에서 정렬된 값 중에서 마지막 값을 반환한다.

     

    LAST_VALUE함수는 분석함수로 순서형 집합에서 마지막 값을 반환한다. 만약 집합내의 마지막 값이 Null이라면, 그 함수는 IGNORE NULL 옵션을 지정하지 않는 한 Null을 반환한다. 이 설정은 데이터의 정밀화에 유용하다. 만약 IGNORE NULL을 지정하면, 그때 LAST_VALUE함수는 집합 내에서 최초 Non-Null 값을 반환하거나, 만약 모든 값이 Null이라면 NULL을 반환한다.

     

    expr에 대하여 LAST_VALUE 또는 임의의 다른 분석 함수를 이용할수 없다. 분석함수는 중첩할 수 없지만, expr에 대하여 다른 이미 만들어진 함수 표현을 이용할 수 있다.

     

     


    1. Oracle(오라클)

    개별 직무 내에서 급여(sal)가 가장 적은 직원의 이름을 반환한다.

     

    Oracle Programming
    SELECT  ENAME,
            JOB,
            SAL,
            last_value(ENAME) over (partition by job order by sal desc) LAST_SALARY 
    FROM    emp
    order   by job, sal desc

     

    Results
    ENAME	JOB		SAL	LAST_SALARY
    --------------------------------------
    FORD	ANALYST		3000	SCOTT
    SCOTT	ANALYST		3000	SCOTT
    MILLER	CLERK		1300	MILLER
    ADAMS	CLERK		1100	ADAMS
    JAMES	CLERK		950	JAMES
    SMITH	CLERK		800	SMITH
    JONES	MANAGER		2975	JONES
    BLAKE	MANAGER		2850	BLAKE
    CLARK	MANAGER		2450	CLARK
    KING	PRESIDENT	5000	KING
    ALLEN	SALESMAN	1600	ALLEN
    TURNER	SALESMAN	1500	TURNER
    MARTIN	SALESMAN	1250	WARD
    WARD	SALESMAN	1250	WARD

     


    2. Python Pandas(파이썬)

    개별 직무 내에서 급여(sal)가 가장 적은 직원의 이름을 반환한다.

     

    Python Programming
    withmooc = emp.copy()
    
    group_first = withmooc.sort_values('sal',ascending = False).groupby('job')['ename'].last()
    withmooc['Time'] = withmooc['job'].map(group_first)
    
    withmooc.sort_values(['job','sal'],ascending=[True, False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	Time
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	FORD
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	FORD
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	SMITH
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	SMITH
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	SMITH
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	SMITH
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	CLARK
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	CLARK
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	CLARK
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	KING

     


     

    Python Programming
    withmooc = emp.copy()
    withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform(lambda x:x.iloc[-1])
    withmooc.sort_values(['job','sal'],ascending = [True,False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	first_value
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	FORD
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	FORD
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	SMITH
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	SMITH
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	SMITH
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	SMITH
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	CLARK
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	CLARK
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	CLARK
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	KING

     


    transform('last')

    Python Programming
    withmooc = emp.copy()
    withmooc['last_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform('last')
    withmooc.sort_values(['job','sal'],ascending = [True,False]).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	last_value
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	FORD
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	FORD
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	SMITH
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	SMITH
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	SMITH
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	SMITH
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	CLARK
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	CLARK
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	CLARK
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	KING

     

     


    [참고] rank()

    Python Programming
    withmooc = emp.copy()
    withmooc.sort_values('sal', ascending=True).groupby('job').apply(lambda x : x[ x['ename'].rank() ==1 ]["ename"])

     

    Results
    job          
    ANALYST    12     FORD
    CLERK      10    ADAMS
    MANAGER    5     BLAKE
    PRESIDENT  8      KING
    SALESMAN   1     ALLEN
    Name: ename, dtype: object

     


    min

    Python Programming
    emp.sort_values('sal',ascending = False).groupby('job')['ename'].apply(min)

     

    Results
    job
    ANALYST       FORD
    CLERK        ADAMS
    MANAGER      BLAKE
    PRESIDENT     KING
    SALESMAN     ALLEN
    Name: ename, dtype: object

     

     


    3. R Programming (R Package)

     

    dplyr::last()

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    plyr::ddply(emp,.(job),transform,last_value = dplyr::last(ename, order_by = desc(sal) ))[1:12, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno last_value
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       FORD
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       FORD
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      SMITH
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      SMITH
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      SMITH
    6   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      SMITH
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      CLARK
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      CLARK
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      CLARK
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       KING
    11  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     MARTIN
    12  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     MARTIN

     


    dplyr::last(x) 와 max()

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    withmooc['last_value_1'] = with(withmooc, ave(x = ename, job, FUN =function(x) { dplyr::last(x) }))
    
    withmooc['last_value_2'] = with(emp[order(-emp$sal),], ave(x = ename, job, FUN =function(x) { x[max(which(!is.na(x)))] }))
    
    withmooc[order(withmooc$job),c('empno','ename','job','sal','last_value_1','last_value_2')][1:12, ]

     

    Results
    # A tibble: 12 x 6
       empno ename  job         sal last_value_1 last_value_2
       <dbl> <chr>  <chr>     <dbl> <chr>        <chr>       
     1  7788 SCOTT  ANALYST    3000 FORD         FORD        
     2  7902 FORD   ANALYST    3000 FORD         FORD        
     3  7934 MILLER CLERK      1300 SMITH        SMITH       
     4  7876 ADAMS  CLERK      1100 SMITH        SMITH       
     5  7900 JAMES  CLERK       950 SMITH        SMITH       
     6  7369 SMITH  CLERK       800 SMITH        SMITH       
     7  7566 JONES  MANAGER    2975 CLARK        CLARK       
     8  7698 BLAKE  MANAGER    2850 CLARK        CLARK       
     9  7782 CLARK  MANAGER    2450 CLARK        CLARK       
    10  7839 KING   PRESIDENT  5000 KING         KING        
    11  7499 ALLEN  SALESMAN   1600 MARTIN       MARTIN      
    12  7844 TURNER SALESMAN   1500 MARTIN       MARTIN      

     


    dplyr::last(x) 와 max()

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    withmooc['last_value_1'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) {  dplyr::last(x) } )
    withmooc['last_value_2'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) {  x[max(which(!is.na(x)))] } )
    
    withmooc[order(withmooc$job),c('empno','ename','job','sal','last_value_1','last_value_2')][1:10, ]

     

    Results
    # A tibble: 10 x 6
       empno ename  job         sal last_value_1 last_value_2
       <dbl> <chr>  <chr>     <dbl> <chr>        <chr>       
     1  7788 SCOTT  ANALYST    3000 FORD         FORD        
     2  7902 FORD   ANALYST    3000 FORD         FORD        
     3  7934 MILLER CLERK      1300 SMITH        SMITH       
     4  7876 ADAMS  CLERK      1100 SMITH        SMITH       
     5  7900 JAMES  CLERK       950 SMITH        SMITH       
     6  7369 SMITH  CLERK       800 SMITH        SMITH       
     7  7566 JONES  MANAGER    2975 CLARK        CLARK       
     8  7698 BLAKE  MANAGER    2850 CLARK        CLARK       
     9  7782 CLARK  MANAGER    2450 CLARK        CLARK       
    10  7839 KING   PRESIDENT  5000 KING         KING        

     


    last()

    R Programming
    %%R
    
    withmooc <- emp[order(emp$job, -emp$sal),]
    
    last_var <- with(withmooc, purrr::map_chr(split(withmooc,withmooc$job), ~last(.x$ename)) ) 
    
    data.frame(withmooc, last_value =rep(last_var, table(withmooc$job)))[1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno last_value
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       FORD
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       FORD
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      SMITH
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      SMITH
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      SMITH
    6   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      SMITH
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      CLARK
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      CLARK
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      CLARK
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       KING

     


    tapply 와 last

    R Programming
    %%R
    
    withmooc <- emp[order(emp$job, -emp$sal),]
    
    Last_var <- with(withmooc, tapply(ename, job, last))   # head, 1
    
    print(Last_var)
    # First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, tail, 1))
    
    data.frame(withmooc, First_value =rep(Last_var, table(withmooc$job)))[1:10, ]
    
    # with(withmooc[order(-withmooc$sal),],  data.frame(emp, max.per.group=rep(tapply(ename, job, last), table(job))) )

     

    Results
      ANALYST     CLERK   MANAGER PRESIDENT  SALESMAN 
       "FORD"   "SMITH"   "CLARK"    "KING"  "MARTIN" 
       
       
       empno  ename       job  mgr   hiredate  sal comm deptno First_value
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        FORD
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20        FORD
    3   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       SMITH
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20       SMITH
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       SMITH
    6   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       SMITH
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20       CLARK
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       CLARK
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       CLARK
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        KING

     


    데이터 Merge 방식

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    # split 되는 그룹내의 관측치 순서 보존을 위하여 데이터와 group 데이터 양쪽 재 정렬 필요
    mapping    <- t( purrr::map_dfc( base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job) , ~last(.x$ename) ) )
    
    trans_data <- melt(mapping, value.name="last_ename")
    
    merge(withmooc,trans_data,by.x='job',by.y='Var1')[1:10,]

     

    Results
             job empno  ename  mgr   hiredate  sal comm deptno Var2 last_ename
    1    ANALYST  7788  SCOTT 7566 1982-12-09 3000   NA     20    1       FORD
    2    ANALYST  7902   FORD 7566 1981-12-03 3000   NA     20    1       FORD
    3      CLERK  7934 MILLER 7782 1982-01-23 1300   NA     10    1      SMITH
    4      CLERK  7876  ADAMS 7788 1983-01-12 1100   NA     20    1      SMITH
    5      CLERK  7900  JAMES 7698 1981-12-03  950   NA     30    1      SMITH
    6      CLERK  7369  SMITH 7902 1980-12-17  800   NA     20    1      SMITH
    7    MANAGER  7566  JONES 7839 1981-04-02 2975   NA     20    1      CLARK
    8    MANAGER  7698  BLAKE 7839 1981-03-01 2850   NA     30    1      CLARK
    9    MANAGER  7782  CLARK 7839 1981-01-09 2450   NA     10    1      CLARK
    10 PRESIDENT  7839   KING   NA 1981-11-17 5000   NA     10    1       KING

     


     

    [참고] 그룹별(Group By) 최종값(last)을 추출

    R Programming
    %%R
    
    # base::split(emp, emp$job)
    do.call(rbind, Map(last, base::split(emp, emp$job)))

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
    * <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
    2  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
    3  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
    4  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    5  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30

     


    gather() 와 spread()

    R Programming
    %%R
    
    emp  %>% 
        rownames_to_column %>%
        gather(variable, value, -rowname) %>% 
        spread(variable, value)

     

    Results
    # A tibble: 14 x 9
       rowname comm  deptno empno ename  hiredate job       mgr   sal  
       <chr>   <chr> <chr>  <chr> <chr>  <chr>    <chr>     <chr> <chr>
     1 1       <NA>  20     7369  SMITH  4003     CLERK     7902  800  
     2 10      0     30     7844  TURNER 4268     SALESMAN  7698  1500 
     3 11      <NA>  20     7876  ADAMS  4759     CLERK     7788  1100 
     4 12      <NA>  30     7900  JAMES  4354     CLERK     7698  950  
     5 13      <NA>  20     7902  FORD   4354     ANALYST   7566  3000 
     6 14      <NA>  10     7934  MILLER 4405     CLERK     7782  1300 
     7 2       300   30     7499  ALLEN  4068     SALESMAN  7698  1600 
     8 3       500   30     7521  WARD   4070     SALESMAN  7698  1250 
     9 4       <NA>  20     7566  JONES  4109     MANAGER   7839  2975 
    10 5       1400  30     7654  MARTIN 4288     SALESMAN  7698  1250 
    11 6       <NA>  30     7698  BLAKE  4077     MANAGER   7839  2850 
    12 7       <NA>  10     7782  CLARK  4026     MANAGER   7839  2450 
    13 8       <NA>  20     7788  SCOTT  4725     ANALYST   7566  3000 
    14 9       <NA>  10     7839  KING   4338     PRESIDENT <NA>  5000 

     


    purrr::map_dfr() 와 last

    R Programming
    %%R
    
    purrr::map_dfr(split(emp,emp$job), ~last(.x$ename))

     

    Results
    # A tibble: 1 x 5
      ANALYST CLERK  MANAGER PRESIDENT SALESMAN
      <chr>   <chr>  <chr>   <chr>     <chr>   
    1 FORD    MILLER CLARK   KING      TURNER  

     


    purrr::map_dfr() 와 last

    R Programming
    %%R
    
    do.call(rbind, purrr::map_dfr(split(emp,emp$job), ~last(.x$ename)) )

     

    Results
              [,1]    
    ANALYST   "FORD"  
    CLERK     "MILLER"
    MANAGER   "CLARK" 
    PRESIDENT "KING"  
    SALESMAN  "TURNER"

     


    map() 와 last

    R Programming
    %%R
    
    emp %>%
      tbl_df() %>%
      dplyr::arrange(job,-sal) %>%
      nest(-job)               %>%
      mutate(Quantiles = map(data, ~ last(.$ename))) %>% 
      unnest(Quantiles)

     

    Results
    # A tibble: 5 x 3
      job       data             Quantiles
      <chr>     <list>           <chr>    
    1 ANALYST   <tibble [2 x 7]> FORD     
    2 CLERK     <tibble [4 x 7]> SMITH    
    3 MANAGER   <tibble [3 x 7]> CLARK    
    4 PRESIDENT <tibble [1 x 7]> KING     
    5 SALESMAN  <tibble [4 x 7]> MARTIN   

     


    [참고] Emulate the window function of SQL in R

    • [stackoverflow 참고] Emulate the window function of SQL in R [링크]
    R Programming
    %%R
    
    withmooc <- emp
    
    unlist(with(withmooc[order(-emp$sal),], by(ename, job, tail,1)))  # last

     

    Results
    job: ANALYST
    [1] "FORD"
    ------------------------------------------------------------ 
    job: CLERK
    [1] "SMITH"
    ------------------------------------------------------------ 
    job: MANAGER
    [1] "CLARK"
    ------------------------------------------------------------ 
    job: PRESIDENT
    [1] "KING"
    ------------------------------------------------------------ 
    job: SALESMAN
    [1] "MARTIN"

     


    aggregate() 와 tail

    R Programming
    %%R
    
    aggregate(ename ~ job, data=emp[order(-emp$sal),], FUN=tail,1)  # last

     

    Results
            job  ename
    1   ANALYST   FORD
    2     CLERK  SMITH
    3   MANAGER  CLARK
    4 PRESIDENT   KING
    5  SALESMAN MARTIN

     


    [참고] first / Last

    • 리스트 반복 작업 purrr 패키지
    R Programming
    %%R
    
    purrr::map_chr(split(emp,emp$job), ~last(.x$ename))

     

    Results
      ANALYST     CLERK   MANAGER PRESIDENT  SALESMAN 
       "FORD"  "MILLER"   "CLARK"    "KING"  "TURNER" 

     


     

    R Programming
    %%R
    withmooc <- emp[order(emp$sal, decreasing=TRUE),]
    
    do.call(rbind, lapply(split(withmooc, withmooc$job), tail, 1))  # last

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
    * <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
    2  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
    3  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
    4  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    5  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30

     


     

    R Programming
    %%R
    withmooc <- emp[order(emp$sal, decreasing=TRUE),]
    split(withmooc, withmooc$job)

     

    Results
    $ANALYST
    # A tibble: 2 x 8
      empno ename job       mgr hiredate     sal  comm deptno
      <dbl> <chr> <chr>   <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7788 SCOTT ANALYST  7566 1982-12-09  3000    NA     20
    2  7902 FORD  ANALYST  7566 1981-12-03  3000    NA     20
    
    $CLERK
    # A tibble: 4 x 8
      empno ename  job     mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr> <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7934 MILLER CLERK  7782 1982-01-23  1300    NA     10
    2  7876 ADAMS  CLERK  7788 1983-01-12  1100    NA     20
    3  7900 JAMES  CLERK  7698 1981-12-03   950    NA     30
    4  7369 SMITH  CLERK  7902 1980-12-17   800    NA     20
    
    $MANAGER
    # A tibble: 3 x 8
      empno ename job       mgr hiredate     sal  comm deptno
      <dbl> <chr> <chr>   <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7566 JONES MANAGER  7839 1981-04-02  2975    NA     20
    2  7698 BLAKE MANAGER  7839 1981-03-01  2850    NA     30
    3  7782 CLARK MANAGER  7839 1981-01-09  2450    NA     10
    
    $PRESIDENT
    # A tibble: 1 x 8
      empno ename job         mgr hiredate     sal  comm deptno
      <dbl> <chr> <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7839 KING  PRESIDENT    NA 1981-11-17  5000    NA     10
    
    $SALESMAN
    # A tibble: 4 x 8
      empno ename  job        mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30
    2  7844 TURNER SALESMAN  7698 1981-09-08  1500     0     30
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30
    4  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30

     


    개별 직무 내에서 급여(sal)가 가장 적은 직원의 데이터를 반환한다.

    R Programming
    %%R
    withmooc <- emp[order(emp$sal, decreasing=TRUE),]
    
    do.call(rbind, lapply(split(withmooc, withmooc$job), last ))  # comma-space 1, ) : [1, ]

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
    * <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
    2  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
    3  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
    4  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    5  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30

     


    참고 : base::split

     

    R Programming
    %%R
    
    withmooc <- emp[order(-emp$sal),]
    
    base::split(withmooc,withmooc$job)$CLERK

     

    Results
    # A tibble: 4 x 8
      empno ename  job     mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr> <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7934 MILLER CLERK  7782 1982-01-23  1300    NA     10
    2  7876 ADAMS  CLERK  7788 1983-01-12  1100    NA     20
    3  7900 JAMES  CLERK  7698 1981-12-03   950    NA     30
    4  7369 SMITH  CLERK  7902 1980-12-17   800    NA     20

     

     

     


    4. R Dplyr Package

     

    R Programming
    %%R
    
    dplyr::last

     

    Results
    function (x, order_by = NULL, default = default_missing(x)) 
    {
        nth(x, -1L, order_by = order_by, default = default)
    }
    <bytecode: 0x0000019dc1a70040>
    <environment: namespace:dplyr>

     


     

    R Programming
    %%R
    
    emp %>%
      dplyr::group_by(job) %>%
      dplyr::mutate(last_value = dplyr::last(ename, order_by = desc(sal) )) %>%
      dplyr::arrange(job,-sal)

     

    Results
    # A tibble: 14 x 9
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno last_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>     
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 FORD      
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 FORD      
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 SMITH     
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 SMITH     
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 SMITH     
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 SMITH     
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 CLARK     
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 CLARK     
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 CLARK     
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING      
    11  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30 MARTIN    
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30 MARTIN    
    13  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30 MARTIN    
    14  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30 MARTIN    

     


    purrr 패키지

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc %>%
      dplyr::left_join( ( purrr::map_dfr(base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job) , ~last(.x$ename)) %>%
                          gather(key = job, value = last_job )
                        ),
                        id = "job"
                      ) %>%
      dplyr::arrange(job,-sal)

     

    Results
    Joining, by = "job"
    # A tibble: 14 x 9
       empno ename  job         mgr hiredate     sal  comm deptno last_job
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <chr>   
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20 FORD    
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20 FORD    
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10 SMITH   
     4  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20 SMITH   
     5  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30 SMITH   
     6  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20 SMITH   
     7  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20 CLARK   
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30 CLARK   
     9  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10 CLARK   
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10 KING    
    11  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30 MARTIN  
    12  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30 MARTIN  
    13  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30 MARTIN  
    14  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30 MARTIN  

     


    • [참고] 직무(job) 내에서 급여(sal)를 내림차순으로 정렬 후 그룹 내의 First / Last 값을 반환
    R Programming
    %%R
    
    # 단일 건에 대한 중복 처리 필요(PRESIDENT)
    emp %>% 
      arrange(desc(sal)) %>% 
      group_by(job) %>% 
      slice(c(1,n()))

     

    Results
    # A tibble: 10 x 8
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     2  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
     3  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
     4  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
     5  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
     6  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
     7  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     8  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     9  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30
    10  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30

     


     

    R Programming
    %%R
    
    emp %>% 
      group_by(job) %>% 
      slice(c(which.min(sal), which.max(sal)))

     

    Results
    # A tibble: 10 x 8
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
     1  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
     4  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10
     5  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
     6  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20
     7  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     8  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
     9  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30
    10  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30

     


     

    R Programming
    %%R
    
    withmooc <- emp[order(withmooc$sal, decreasing=TRUE),]
    
    withmooc %>% 
      split(f = withmooc$job) %>%
      purrr::map_df(last)

     

    Results
    # A tibble: 5 x 8
      empno ename  job         mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20
    2  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20
    3  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10
    4  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10
    5  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30

     

     


    5. R sqldf Package

     

    • last_value 현재 에러
    R Programming
    %%R
    
    sqldf(" SELECT  ENAME,
                      JOB,
                      SAL,
                      FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY,
                      LAST_VALUE(ENAME)  OVER (PARTITION BY JOB ORDER BY SAL DESC) LAST_SALARY 
            FROM emp
            order by job, sal desc")

     

    Results
        ename       job  sal FIRST_SALARY LAST_SALARY
    1   SCOTT   ANALYST 3000        SCOTT        FORD
    2    FORD   ANALYST 3000        SCOTT        FORD
    3  MILLER     CLERK 1300       MILLER      MILLER
    4   ADAMS     CLERK 1100       MILLER       ADAMS
    5   JAMES     CLERK  950       MILLER       JAMES
    6   SMITH     CLERK  800       MILLER       SMITH
    7   JONES   MANAGER 2975        JONES       JONES
    8   BLAKE   MANAGER 2850        JONES       BLAKE
    9   CLARK   MANAGER 2450        JONES       CLARK
    10   KING PRESIDENT 5000         KING        KING
    11  ALLEN  SALESMAN 1600        ALLEN       ALLEN
    12 TURNER  SALESMAN 1500        ALLEN      TURNER
    13   WARD  SALESMAN 1250        ALLEN      MARTIN
    14 MARTIN  SALESMAN 1250        ALLEN      MARTIN

     

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" SELECT  ENAME,   \
                       JOB,     \
                       SAL,     \
                       last_value(ENAME) over (partition by job order by sal desc) LAST_SALARY      \
             FROM emp     \
             order by job, sal desc ").head(10)

     

    Results
    	ename	job		sal	LAST_SALARY
    0	SCOTT	ANALYST		3000	FORD
    1	FORD	ANALYST		3000	FORD
    2	MILLER	CLERK		1300	MILLER
    3	ADAMS	CLERK		1100	ADAMS
    4	JAMES	CLERK		950	JAMES
    5	SMITH	CLERK		800	SMITH
    6	JONES	MANAGER		2975	JONES
    7	BLAKE	MANAGER		2850	BLAKE
    8	CLARK	MANAGER		2450	CLARK
    9	KING	PRESIDENT	5000	KING

     

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    
    DT[order(-sal) , last_value := last(ename), by = job][order(job),][1:12, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno last_value
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       FORD
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       FORD
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      SMITH
     4:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      SMITH
     5:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      SMITH
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      SMITH
     7:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      CLARK
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      CLARK
     9:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      CLARK
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       KING
    11:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     MARTIN
    12:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     MARTIN

     


     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    # s = dplyr::last(.SD, order_by = desc(sal) )
    
    DT[, `:=` ( last_value_1 = dplyr::last(ename, order_by = desc(sal) ), 
                last_value_2 = tail(na.omit(unlist(.SD)),1),
                first_value1 = na.omit(unlist(.SD))[1]
                ), 
              by = job,
              .SDcols = c("ename")][order(job),][, c("mgr","comm","hiredate") := NULL][1:10, ]

     

    Results
        empno  ename       job  sal deptno last_value_1 last_value_2 first_value1
     1:  7788  SCOTT   ANALYST 3000     20         FORD         FORD        SCOTT
     2:  7902   FORD   ANALYST 3000     20         FORD         FORD        SCOTT
     3:  7369  SMITH     CLERK  800     20        SMITH       MILLER        SMITH
     4:  7876  ADAMS     CLERK 1100     20        SMITH       MILLER        SMITH
     5:  7900  JAMES     CLERK  950     30        SMITH       MILLER        SMITH
     6:  7934 MILLER     CLERK 1300     10        SMITH       MILLER        SMITH
     7:  7566  JONES   MANAGER 2975     20        CLARK        CLARK        JONES
     8:  7698  BLAKE   MANAGER 2850     30        CLARK        CLARK        JONES
     9:  7782  CLARK   MANAGER 2450     10        CLARK        CLARK        JONES
    10:  7839   KING PRESIDENT 5000     10         KING         KING         KING

     


     

    R Programming
    %%R
    DT          <- data.table(emp)
    
    DT[order(-sal) , last_ename := xts::last(ename) , by=job][order(job), ][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno last_ename
     1:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20       FORD
     2:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       FORD
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      SMITH
     4:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      SMITH
     5:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      SMITH
     6:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10      SMITH
     7:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      CLARK
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      CLARK
     9:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      CLARK
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       KING

     


     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    
    DT[ order(-sal), `:=` (fir_ename  = ename[.N],
                           last_ename1 = last(ename), 
                           max_ename    = max(ename),
                           min_ename    = min(ename), 
                           fir_ename_1=first(ename)),by=job][order(job,-sal),][,.SD, .SDcols = !c('empno','hiredate', 'mgr','comm','deptno')]

     

    Results
         ename       job  sal fir_ename last_ename1 max_ename min_ename fir_ename_1
     1:  SCOTT   ANALYST 3000      FORD        FORD     SCOTT      FORD       SCOTT
     2:   FORD   ANALYST 3000      FORD        FORD     SCOTT      FORD       SCOTT
     3: MILLER     CLERK 1300     SMITH       SMITH     SMITH     ADAMS      MILLER
     4:  ADAMS     CLERK 1100     SMITH       SMITH     SMITH     ADAMS      MILLER
     5:  JAMES     CLERK  950     SMITH       SMITH     SMITH     ADAMS      MILLER
     6:  SMITH     CLERK  800     SMITH       SMITH     SMITH     ADAMS      MILLER
     7:  JONES   MANAGER 2975     CLARK       CLARK     JONES     BLAKE       JONES
     8:  BLAKE   MANAGER 2850     CLARK       CLARK     JONES     BLAKE       JONES
     9:  CLARK   MANAGER 2450     CLARK       CLARK     JONES     BLAKE       JONES
    10:   KING PRESIDENT 5000      KING        KING      KING      KING        KING
    11:  ALLEN  SALESMAN 1600    MARTIN      MARTIN      WARD     ALLEN       ALLEN
    12: TURNER  SALESMAN 1500    MARTIN      MARTIN      WARD     ALLEN       ALLEN
    13:   WARD  SALESMAN 1250    MARTIN      MARTIN      WARD     ALLEN       ALLEN
    14: MARTIN  SALESMAN 1250    MARTIN      MARTIN      WARD     ALLEN       ALLEN

     


    [참고] 그룹별 First / Last (단일 건의 중복 문제는 존재)

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[ DT[order(sal, decreasing=TRUE), .I[c(1L,.N)], by=job]$V1 ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno
     1:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
     2:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
     3:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20
     4:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20
     5:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20
     6:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10
     7:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30
     8:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30
     9:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10
    10:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20

     

     


      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글