본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[변수 생성] 신규 변수를 생성하여 출력- 153 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2023. 1. 6.

포스팅 목차

    153. Define variable representing the expressions used to calculate on employee’s total annual remuneration.

     

    * 수수료를 포함한 직원의 년봉을 계산하여 신규 변수를 생성하시오.


    • Oracle : NVL 함수
    • 파이썬 Pandas : replace(), np.nan
    • R 프로그래밍 : transform(), ifelse(), is.na()
    • R Dplyr Package : dplyr::mutate(), ifelse(), is.na()
    • R sqldf Package : ifnull()
    • Python pandasql Package : ifnull()
    • R data.table Package : ifelse(), is.na()
    • SAS Proc SQL : Coalesce(), IFN() 함수
    • SAS Data Step : Coalesce(), IFN() 함수, KEEP 구문
    • Python Dfply Package : if_else(), isnull()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal로 할당한다.

     

    Oracle Programming
    select empno,sal,comm,
           (sal+nvl(comm,0))*12 emp_ann_sal
    from   emp;

     


    2. Python Pandas(파이썬)

    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal로 할당한다.

     

    Python Programming
    import copy 
    withmooc = copy.copy(emp)
    
    withmooc['emp_ann_sal'] = 12 * (withmooc['sal'] + withmooc['comm'].replace(np.nan,0));
    
    withmooc.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno emp_ann_sal
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 9600.0
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 22800.0
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 21000.0
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 35700.0
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 31800.0

     

     


    3. R Programming (R Package)

    transform() 함수를 사용하여 수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal을 생성한다.

     

    R Programming
    %%R
    withmooc <- emp
    
    transform(withmooc, tot_sal = 12 * (sal + ifelse(is.na(comm),0,comm) )) %>% head()

     

    Results
      empno  ename      job  mgr   hiredate  sal comm deptno tot_sal
    1  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20    9600
    2  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30   22800
    3  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30   21000
    4  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20   35700
    5  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30   31800
    6  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30   34200

     


    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal을 생성한다.

     

    R Programming
    %%R
    
    withmooc <- emp
    withmooc['emp_ann_sal'] <- 12 * ( withmooc['sal'] + ifelse(is.na(withmooc$comm),0,withmooc$comm) )
    
    withmooc %>% head()

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno emp_ann_sal
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>       <dbl>
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20        9600
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30       22800
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30       21000
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20       35700
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30       31800
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30       34200

     


    4. R Dplyr Package

    dplyr::mutate() 함수를 사용하여 수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal을 생성한다.

    R Programming
    %%R
    
    emp %>% 
      dplyr::mutate( emp_ann_sal = 12 * (sal+ifelse(is.na(comm),0,comm)) ) %>%
      head()

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno emp_ann_sal
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl>       <dbl>
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20        9600
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30       22800
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30       21000
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20       35700
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30       31800
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30       34200

     


    5. R sqldf Package

     

    R Programming
    %%R
    
    sqldf(" select empno,ename,
                   (sal + ifnull(comm,0))*12 emp_ann_sal
            from   emp")  %>% head()

     

    Results
      empno  ename emp_ann_sal
    1  7369  SMITH        9600
    2  7499  ALLEN       22800
    3  7521   WARD       21000
    4  7566  JONES       35700
    5  7654 MARTIN       31800
    6  7698  BLAKE       34200

     


    6. Python pandasql Package

    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal로 할당한다.

     

    Python Programming
    ps.sqldf(" select empno,ename,                            \
                      (sal + ifnull(comm,0))*12 emp_ann_sal   \
               from   emp").head()

     

    Results
      empno ename emp_ann_sal
    0 7369 SMITH 9600.0
    1 7499 ALLEN 22800.0
    2 7521 WARD 21000.0
    3 7566 JONES 35700.0
    4 7654 MARTIN 31800.0

     


    7. R data.table Package

    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal로 할당한다.

     

    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[ ,emp_ann_sal := 12 * (sal+ifelse(is.na(comm),0,comm))][1:5, ]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno emp_ann_sal
    1:  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20        9600
    2:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30       22800
    3:  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30       21000
    4:  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20       35700
    5:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30       31800

     


    8. SAS Proc SQL

    수수료를 포함한 직원의 1년간 년봉을 계산하여 신규변수 emp_ann_sal로 할당한다.

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select EMPNO,ENAME,
               (sal+COALESCE(comm,0))*12   AS emp_ann_sal,
               (sal+IFN(comm=.,0,comm))*12 AS emp_ann_sal_1
        from   emp;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=5);RUN;

     

    Results
    OBS empno ename emp_ann_sal emp_ann_sal_1
    1 7369 SMITH 9600 9600
    2 7499 ALLEN 22800 22800
    3 7521 WARD 21000 21000
    4 7566 JONES 35700 35700
    5 7654 MARTIN 31800 31800

     


    9. SAS Data Step

     

    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2;
     SET EMP;
         emp_ann_sal   = (sal+COALESCE(comm,0))*12;
         emp_ann_sal_1 = (sal+IFN(comm=.,0,comm))*12;
         keep empno ename emp_:;
    RUN;
    PROC PRINT data=STATSAS_2(obs=5);RUN;

     

    Results
    OBS empno ename emp_ann_sal emp_ann_sal_1
    1 7369 SMITH 9600 9600
    2 7499 ALLEN 22800 22800
    3 7521 WARD 21000 21000
    4 7566 JONES 35700 35700
    5 7654 MARTIN 31800 31800

     


    10. Python Dfply Package

     

    Python Programming
    emp >> \
      mutate( emp_ann_sal = 12 * (X.sal+if_else(X.comm.isnull(),0,X.comm)) ) >> \
      head()

     

    Results
      empno ename job mgr hiredate sal comm deptno emp_ann_sal
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 9600.0
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 22800.0
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 21000.0
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 35700.0
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 31800.0

     


    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트

    반응형

    댓글