본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[기초 통계량 - 총합계(Total) 계산] 연산작업을 통한 신규변수 생성 - 41 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 10.

포스팅 목차

     

    41. Display name, Sal, hra, pf, da, total Sal for each employee. The output should be in the order of total Sal, hra 15% of Sal, da 10% of sal, pf 5% of sal total salary will be (sal + hra+ da)-pf.

     

    * 아래의 수식을 사용하여서 총급여를 계산하시오.

    • HRA = sal * 15% , PF = sal * 5%, DA = sal * 10%
    • 총급여 = (sal + HRA + DF) - PF

     


    • [신규 변수 생성] 복잡한 연산 작업 수행
    • Oracle :
    • 파이썬 Pandas : .assign(), lambda() 함수
    • R 프로그래밍 : transform() 함수
    • R Dplyr Package : dplyr::mutate()
    • R sqldf Package :
    • Python pandasql Package :
    • R data.table Package : 복수의 변수 생성
    • SAS Proc SQL :
    • SAS Data Step : keep
    • Python Dfply Package : mutate(), select()
    • 파이썬 Base 프로그래밍 :

     


    1. 오라클(Oracle)

    • 사칙연산을 통한 신규 변수 생성
    Oracle Programming
    select ename,sal,
           sal*15/100 HRA, 
           sal*5/100  PF, 
           sal*10/100 DA,
           sal+sal*15/100-sal*5/100+sal*10/100 TOTAL_SALARY 
    from   emp

     


    2. 파이썬(Pandas)

    • 개별 변수 생성
    Python Programming
    emp['HRA'] = emp['sal']*15/100
    emp['PF']  = emp['sal']*5/100
    emp['DA']  = emp['sal']*10/100
    emp['TOTAL_SALARY'] = (emp['sal']+emp['sal']*15/100-emp['sal']*5/100+emp['sal']*10/100)
    emp.head()

     

    Results
      empno ename job mgr hiredate sal comm deptno HRA PF DA TOTAL_SALARY
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 120.00 40.00 80.0 960.0
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 240.00 80.00 160.0 1920.0
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 187.50 62.50 125.0 1500.0
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 446.25 148.75 297.5 3570.0
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 187.50 62.50 125.0 1500.0

    • assign 사용하여서 일괄 변수 생성하기
    Python Programming
    emp.assign( HRA = emp['sal']*15/100,
                PF  = emp['sal']*5/100,
                DA  = emp['sal']*10/100,
                TOTAL_SALARY = (emp['sal']+emp['sal']*15/100-emp['sal']*5/100+emp['sal']*10/100)).head()

     

    Results
      empno ename job mgr hiredate sal comm deptno HRA PF DA TOTAL_SALARY
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 120.00 40.00 80.0 960.0
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 240.00 80.00 160.0 1920.0
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 187.50 62.50 125.0 1500.0
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 446.25 148.75 297.5 3570.0
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 187.50 62.50 125.0 1500.0

     


    • assign 함수와 lamda 함수를 함께 사용
    Python Programming
    emp.assign( HRA = lambda x: x['sal']*15/100,
                PF  = lambda x: x['sal']*5/100,
                DA  = lambda x: x['sal']*10/100,
                TOTAL_SALARY = (lambda x: x['sal']+x['sal']*15/100-x['sal']*5/100+x['sal']*10/100)).head()

     

    Results
      empno ename job mgr hiredate sal comm deptno HRA PF DA TOTAL_SALARY
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 120.00 40.00 80.0 960.0
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 240.00 80.00 160.0 1920.0
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 187.50 62.50 125.0 1500.0
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 446.25 148.75 297.5 3570.0
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 187.50 62.50 125.0 1500.0

     


    3. R Programming (R Package)

    • 개별 변수 생성
    R Programming
    %%R
    
    emp['HRA'] <- emp["sal"]*15/100
    emp['PF']  <- emp["sal"]*5/100
    emp['DA']  <- emp["sal"]*10/100
    emp['TOTAL_SALARY'] <- (emp["sal"]+emp["sal"]*15/100-emp["sal"]*5/100+emp["sal"]*10/100)
    
    head(emp)

     

    Results
    # A tibble: 6 x 12
      empno ename  job        mgr hiredate     sal  comm deptno   HRA    PF    DA TOTAL_SALARY
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>        <dbl>
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20  120   40     80           960
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30  240   80    160          1920
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30  188.  62.5  125          1500
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20  446. 149.   298.         3570
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30  188.  62.5  125          1500
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30  428. 142.   285          3420

     


    • transform함수를 사용하여서 일괄 변수 생성하기
    R Programming
    %%R
    
    emp_chg <- transform(emp, HRA = sal*15/100, 
                              PF  = sal*5/100,
                              DA  = sal*10/100,
                              TOTAL_SALARY = (sal+sal*15/100-sal*5/100+sal*10/100) )
    head(emp_chg)

     

    Results
      empno  ename      job  mgr   hiredate  sal comm deptno    HRA     PF    DA TOTAL_SALARY
    1  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20 120.00  40.00  80.0          960
    2  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30 240.00  80.00 160.0         1920
    3  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30 187.50  62.50 125.0         1500
    4  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20 446.25 148.75 297.5         3570
    5  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30 187.50  62.50 125.0         1500
    6  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30 427.50 142.50 285.0         3420

     


    4. R Dplyr Package

    • 사칙연산을 통한 신규 변수 생성
    R Programming
    %%R
    options(width = 150)
    emp %>% dplyr::mutate(HRA = sal*15/100, 
                          PF  = sal*5/100,
                          DA = sal*10/100,
                          TOTAL_SALARY = (sal+sal*15/100-sal*5/100+sal*10/100) )

     

    Results
    # A tibble: 14 x 12
       empno ename  job         mgr hiredate     sal  comm deptno   HRA    PF    DA TOTAL_SALARY
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>        <dbl>
     1  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20  120   40     80           960
     2  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30  240   80    160          1920
     3  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30  188.  62.5  125          1500
     4  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20  446. 149.   298.         3570
     5  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30  188.  62.5  125          1500
     6  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30  428. 142.   285          3420
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10  368. 122.   245          2940
     8  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20  450  150    300          3600
     9  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10  750  250    500          6000
    10  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30  225   75    150          1800
    11  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20  165   55    110          1320
    12  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30  142.  47.5   95          1140
    13  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20  450  150    300          3600
    14  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10  195   65    130          1560

     


    5. R sqldf Package

    • 사칙연산을 통한 신규 변수 생성
    R Programming
    %%R
    sqldf("select ename,sal,
                  sal*15/100 HRA, 
                  sal*10/100 DA,
                  sal + sal*15/100 - sal*5/100 + sal*10/100 TOTAL_SALARY 
           from emp")

     

    Results
        ename  sal    HRA    DA TOTAL_SALARY
    1   SMITH  800 120.00  80.0          960
    2   ALLEN 1600 240.00 160.0         1920
    3    WARD 1250 187.50 125.0         1500
    4   JONES 2975 446.25 297.5         3570
    5  MARTIN 1250 187.50 125.0         1500
    6   BLAKE 2850 427.50 285.0         3420
    7   CLARK 2450 367.50 245.0         2940
    8   SCOTT 3000 450.00 300.0         3600
    9    KING 5000 750.00 500.0         6000
    10 TURNER 1500 225.00 150.0         1800
    11  ADAMS 1100 165.00 110.0         1320
    12  JAMES  950 142.50  95.0         1140
    13   FORD 3000 450.00 300.0         3600
    14 MILLER 1300 195.00 130.0         1560

     


    6. Python pandasql Package

    • 사칙연산을 통한 신규 변수 생성
    Python Programming
    ps.sqldf("select ename,sal,sal*15/100 HRA, sal*5/100 PF, sal*10/100 DA,     \
                     sal+sal*15/100-sal*5/100+sal*10/100 TOTAL_SALARY           \
              from emp").head()

     

    Results
      ename sal HRA PF DA TOTAL_SALARY
    0 SMITH 800 120 40 80 960
    1 ALLEN 1600 240 80 160 1920
    2 WARD 1250 187 62 125 1500
    3 JONES 2975 446 148 297 3570
    4 MARTIN 1250 187 62 125 1500

     


    7. R data.table Package

    • 사칙연산을 통한 신규 변수 생성
    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[, { ename <- ename
           sal   <- sal
           HRA   <- sal * 15/100
           PF    <- sal * 5/100
           DA    <- sal * 10/100
           TOTAL_SALARY <- sal + sal * 15/100 - sal * 5/100 + sal * 10/100
        .( ename = ename, sal = sal, HRA = HRA, PF = PF, DA = DA, 
           TOTAL_SALARY = TOTAL_SALARY)
    }]

     

    Results
         ename  sal    HRA     PF    DA TOTAL_SALARY
     1:  SMITH  800 120.00  40.00  80.0          960
     2:  ALLEN 1600 240.00  80.00 160.0         1920
     3:   WARD 1250 187.50  62.50 125.0         1500
     4:  JONES 2975 446.25 148.75 297.5         3570
     5: MARTIN 1250 187.50  62.50 125.0         1500
     6:  BLAKE 2850 427.50 142.50 285.0         3420
     7:  CLARK 2450 367.50 122.50 245.0         2940
     8:  SCOTT 3000 450.00 150.00 300.0         3600
     9:   KING 5000 750.00 250.00 500.0         6000
    10: TURNER 1500 225.00  75.00 150.0         1800
    11:  ADAMS 1100 165.00  55.00 110.0         1320
    12:  JAMES  950 142.50  47.50  95.0         1140
    13:   FORD 3000 450.00 150.00 300.0         3600
    14: MILLER 1300 195.00  65.00 130.0         1560

     


    8. SAS Proc SQL

    • 사칙연산을 통한 신규 변수 생성
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select ename,sal,
               sal*15/100 AS HRA, 
               sal*5/100  AS PF, 
               sal*10/100 AS DA,
               sal+sal*15/100-sal*5/100+sal*10/100 AS TOTAL_SALARY 
        from   emp;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=5);RUN;

     

    Results
    OBS ename sal HRA PF DA TOTAL_SALARY
    1 SMITH 800 120.00 40.00 80.0 960
    2 ALLEN 1600 240.00 80.00 160.0 1920
    3 WARD 1250 187.50 62.50 125.0 1500
    4 JONES 2975 446.25 148.75 297.5 3570
    5 MARTIN 1250 187.50 62.50 125.0 1500

     


    9. SAS Data Step

    • 사칙연산을 통한 신규 변수 생성
    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2;
     SET EMP;
         HRA = sal*15/100;
         PF  = sal*5/100;
         DA  = sal*10/100;
         TOTAL_SALARY = sal+sal*15/100-sal*5/100+sal*10/100;
    
         KEEP ename sal HRA PF DA TOTAL_SALARY;
    RUN;
    PROC PRINT data=STATSAS_2(obs=3);RUN;

     

    Results
    OBS ename sal HRA PF DA TOTAL_SALARY
    1 SMITH 800 120.0 40.0 80 960
    2 ALLEN 1600 240.0 80.0 160 1920
    3 WARD 1250 187.5 62.5 125 1500

     


    10. Python Dfply Package

    • 사칙연산을 통한 신규 변수 생성
    Python Programming
    emp >> \
      mutate( HRA = X.sal*15/100, 
              PF  = X.sal*5/100,
              DA  = X.sal*10/100,
              TOTAL_SALARY = (X.sal+X.sal*15/100-X.sal*5/100+X.sal*10/100) ) >> \
      select(X.empno, X.sal, X.HRA, X.PF, X.DA, X.TOTAL_SALARY)  

     

    Results
      empno sal HRA PF DA TOTAL_SALARY
    0 7369 800 120.00 40.00 80.0 960.0
    1 7499 1600 240.00 80.00 160.0 1920.0
    2 7521 1250 187.50 62.50 125.0 1500.0
    3 7566 2975 446.25 148.75 297.5 3570.0
    4 7654 1250 187.50 62.50 125.0 1500.0
    5 7698 2850 427.50 142.50 285.0 3420.0
    6 7782 2450 367.50 122.50 245.0 2940.0
    7 7788 3000 450.00 150.00 300.0 3600.0
    8 7839 5000 750.00 250.00 500.0 6000.0
    9 7844 1500 225.00 75.00 150.0 1800.0
    10 7876 1100 165.00 55.00 110.0 1320.0
    11 7900 950 142.50 47.50 95.0 1140.0
    12 7902 3000 450.00 150.00 300.0 3600.0
    13 7934 1300 195.00 65.00 130.0 1560.0

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글