본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 필터링 - 비상관 서브쿼리를 통한 최대값 추출] 테이블 데이터 전처리 비교 - 52 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 19.

포스팅 목차

     

    52. Display the employee number and name of employee working as CLERK and earning highest salary among CLERKS.

     

    * 사무직 직원 중에서 최대 연봉을 받는 직원의 정보를 출력하시오.


    • Oracle : 서브쿼리(Subquery), max() 함수
    • 파이썬 Pandas : max() 함수
    • R 프로그래밍 : max() 함수
    • R Dplyr Package : filter(), max() 함수, pull() : 벡터 추출, unique()
    • R sqldf Package : 서브쿼리(Subquery), max() 함수
    • Python pandasql Package : 서브쿼리(Subquery), max() 함수
    • R data.table Package : max() 함수
    • SAS Proc SQL : 서브쿼리(Subquery), max() 함수
    • SAS Data Step : PROC RANK, obs
    • Python Dfply Package : filter_by(), max() 함수, @pipe (사용자 정의 함수)
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • Max() 함수
    Oracle Programming
    select empno, ename 
    from   emp 
    where  job='CLERK' 
      and  sal=(select max(sal) from emp where job='CLERK')

     


    2. Python Pandas(파이썬)

    • Max() 함수
    Python Programming
    emp[(emp['job']=='CLERK') & (emp['sal'] == max(emp[emp['job']=='CLERK'].sal))]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


    3. R Programming (R Package)

    • Max() 함수
    R Programming
    %%R
    
    emp[ emp$job == 'CLERK' & emp$sal == max(emp[emp$job == 'CLERK',]$sal),]

     

    Results
    # A tibble: 1 x 8
      empno ename  job     mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr> <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7934 MILLER CLERK  7782 1982-01-23  1300    NA     10

     


    4. R Dplyr Package

    • Max() 함수
    R Programming
    %%R
    
    emp %>%
      filter(job == "CLERK" & sal == ( emp %>% filter(job == 'CLERK') 
                                            %>% summarise(sal_max = max(sal)) 
                                            %>% pull(sal_max) 
                                            %>% unique) )

     

    Results
    # A tibble: 1 x 8
      empno ename  job     mgr hiredate     sal  comm deptno
      <dbl> <chr>  <chr> <dbl> <date>     <dbl> <dbl>  <dbl>
    1  7934 MILLER CLERK  7782 1982-01-23  1300    NA     10

     


    5. R sqldf Package

    • Max() 함수
    R Programming
    %%R
    sqldf(" select *
            from emp 
            where job='CLERK' 
              and sal=(select max(sal) from emp where job='CLERK')")

     

    Results
      empno  ename   job  mgr   hiredate  sal comm deptno
    1  7934 MILLER CLERK 7782 1982-01-23 1300   NA     10

     


    6. Python pandasql Package

    • Max() 함수
    Python Programming
    ps.sqldf("select * from emp where job='CLERK' and sal=(select max(sal) from emp where job='CLERK')")

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7934 MILLER CLERK 7782.0 1982/01/23 1300 None 10

     


    7. R data.table Package

    • Max() 함수
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[job == "CLERK" & sal == (DT[job == "CLERK", max(sal)]) ,]

     

    Results
       empno  ename   job  mgr   hiredate  sal comm deptno
    1:  7934 MILLER CLERK 7782 1982-01-23 1300   NA     10

     


    8. SAS Proc SQL

    • Max() 함수
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select empno, ename 
        from   emp
        where  job='CLERK'
          and  sal=(select max(sal) from emp where job='CLERK');
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename
    1 7934 MILLER

     


    9. SAS Data Step

    • 내부 조인(Inner join)
    SAS Programming
    %%SAS sas
    
    PROC SORT DATA=EMP OUT=SAL_MAX;
         BY DESCENDING SAL;
         WHERE job='CLERK';
    RUN;
    
    DATA STATSAS_3;
     MERGE EMP_1(IN=A) SAL_MAX(IN=B OBS=1);
         BY EMPNO;
         IF A AND B;   * IF A = 1 AND B = 1 THEN OUTPUT STATSAS_3;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7934 MILLER CLERK 7782 1982-01-23 1300 . 10

     


    • DATA STEP & PROC rank;
    SAS Programming
    %%SAS sas
    
    proc rank data=EMP_1 out= SAL_MAX(where=(SAL_rank=1)) ties=low descending;
          var   SAL;
          ranks SAL_rank;
          WHERE job='CLERK';
    run;
    
    DATA STATSAS_3;
     MERGE EMP_1(IN=A) SAL_MAX(IN=B);
         BY EMPNO;
         IF A AND B;   * IF A = 1 AND B = 1 THEN OUTPUT STATSAS_3;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno SAL_rank
    1 7934 MILLER CLERK 7782 1982-01-23 1300 . 10 1

     


    10. Python Dfply Package

    • dfply 의 pull() 함수 에러 발생 : 현재 파이썬 .ix 메서드 지원 안함.
    Python Programming
    # [참고] 함수 사용. ( 사용 시 변수를 큰 따옴표로 지정)
    @pipe
    def pull_fun(df, column=-1):
        return df.loc[:, column].item()
    
    
    emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max()) >> pull_fun("sal_max")

     

    Results
    1300

     


    • Max() 함수
    Python Programming
    emp >> filter_by(X.job == 'CLERK', X.sal == ( emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max()) >> pull_fun("sal_max") )    )

     

    Results
      empno ename job mgr hiredate sal comm deptno
    13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


    • [참고] 반환되는 값이 리스트 형태인 경우 아래 함수 사용
    Python Programming
    @pipe
    def pull_list(df, column=-1):
        return df.loc[:, column]

     


    • 상수(Scalar) 추출 및 변환 하기
    Python Programming
    ## Dfply
    ## .values
    
    # np.asarray(emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max())).item()
    # ( (emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max()))['sal_max'] ).item()
    
    (emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max())).iloc[0,0]

     

    Results
    1300

     


    • Max() 함수
    Python Programming
    emp >> filter_by(X.job == "CLERK" , X.sal == ( (emp >> filter_by(X.job == 'CLERK') >> summarize(sal_max = X.sal.max())).iloc[0,0] ) )

     

    Results
      empno ename job mgr hiredate sal comm deptno
    13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글