본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[문자변수 생성] 신규 변수 생성 - 문자 추출과 문자 결합 - 129 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2022. 12. 16.

포스팅 목차

    129. Display those employees whose first 2 characters from hire date-last 2 characters of salary?

     

    * 입사일자에서 첫 2글자와 급여의 마지막 2글자로 구성된 신규 변수를 생성하시오.


    • Oracle : substr(), to_char(), 문자 결합 연산자('||')
    • 파이썬 Pandas : str.slice(), 문자 결합 연산자('+'), astype(str)
    • R 프로그래밍 : paste(), paste0(), base::substr(), stringr::str_sub()
    • R Dplyr Package : dplyr::mutate(), stringi::stri_join(), stringr::str_c(), paste(), paste0(), stringr::str_sub()
    • R sqldf Package : typeof() 변수 형태 확인, datetime(), substr(), strftime(), cast(), 문자 결합 연산자('||')
    • Python pandasql Package : typeof() 변수 형태 확인, datetime(), substr(), strftime(), cast(), 문자 결합 연산자('||')
    • R data.table Package : stringi::stri_join(), stringr::str_c(), paste(), paste0(), stringr::str_sub()
    • SAS Proc SQL : put(), strip(), substr(), reverse(), prxchange() 정규식 함수, 문자 결합 연산자('||'), cat 함수, cats 함수
    • SAS Data Step : put(), strip(), substr(), reverse(), prxchange() 정규식 함수, 문자 결합 연산자('||'), cat 함수, cats 함수
    • Python Dfply Package : .str.slice(), astype(str), 문자 결합 연산자('+')
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    파이프 연산자를 사용하여서 입사일자의 첫 2글자와 급여의 마지막 2글자를 연결하여서 신규 문자 변수를 생성한다.

    Oracle Programming
    select ename,hiredate,sal,
           SUBSTR(to_char(hiredate,'yyyymmdd'),1,2)||ENAME||substr(sal,-2,2) as ename_str
    from   emp

     


    2. Python Pandas(파이썬)

     

    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    withmooc['ename_string'] = withmooc['hiredate'].str.slice(0,2) + withmooc['ename'] + withmooc['sal'].astype(str).str.slice(-2,)
    display(withmooc.head())

     

    Results
      empno ename job mgr hiredate sal comm deptno ename_string
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 19SMITH00
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 19ALLEN00
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 19WARD50
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 19JONES75
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 19MARTIN50

     


    3. R Programming (R Package)

     

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['str_combined_1'] <- paste(base::substr(withmooc$hiredate,1,2),withmooc$ename,stringr::str_sub(withmooc$sal,-2) ,sep="")
    withmooc['str_combined_2'] <- paste0(base::substr(withmooc$hiredate,1,2),'',withmooc$ename,'', stringr::str_sub(withmooc$sal,-2))
    head(withmooc)

     

    Results
    # A tibble: 6 x 10
      empno ename  job        mgr hiredate     sal  comm deptno str_combined_1 str_combined_2
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>          <chr>         
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 19SMITH00      19SMITH00     
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 19ALLEN00      19ALLEN00     
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 19WARD50       19WARD50      
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 19JONES75      19JONES75     
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 19MARTIN50     19MARTIN50    
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 19BLAKE50      19BLAKE50     

     


    4. R Dplyr Package

     

    R Programming
    %%R
    
    library(stringi)
    
    emp %>% 
      dplyr::mutate(str_combined_2 = stringi::stri_join(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
      dplyr::mutate(str_combined_3 = stringr::str_c(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
      dplyr::mutate(str_combined_4 = base::paste(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
      dplyr::mutate(str_combined_5 = base::paste0(stringr::str_sub(withmooc$hiredate,1,2),"",ename,"",stringr::str_sub(withmooc$sal,-2))) %>%
      head()

     

    Results
    # A tibble: 6 x 12
      empno ename  job        mgr hiredate     sal  comm deptno str_combined_2 str_combined_3 str_combined_4 str_combined_5
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>          <chr>          <chr>          <chr>         
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 19SMITH00      19SMITH00      19SMITH00      19SMITH00     
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 19ALLEN00      19ALLEN00      19ALLEN00      19ALLEN00     
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 19WARD50       19WARD50       19WARD50       19WARD50      
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 19JONES75      19JONES75      19JONES75      19JONES75     
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 19MARTIN50     19MARTIN50     19MARTIN50     19MARTIN50    
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 19BLAKE50      19BLAKE50      19BLAKE50      19BLAKE50     

     


    5. R sqldf Package

    • 현재 hiredate가 “real” 형태로 저장되어 있음. / sqlite에는 날짜형이 존재 안 함 (129번 sqldf)
    R Programming
    %%R
    
    sqldf(" select hiredate, 
                   typeof(hiredate) hire_type,
    
                   datetime(hiredate * 3600 * 24,'unixepoch') real_to_datetext,
                   typeof(datetime(hiredate * 3600 * 24,'unixepoch')) type2,
    
                   substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2) k,
    
                   strftime('%Y/%M/%d', datetime(hiredate * 3600 * 24,'unixepoch')) dateformat_chagne,
                   strftime('%Y-%M-%d', hiredate) date_error,
    
                   substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2)||ENAME||substr(cast(sal as int),-2,2) ename_1 
            from emp") %>% head(10)

     

    Results
         hiredate hire_type    real_to_datetext type2  k dateformat_chagne date_error    ename_1
    1  1980-12-17      real 1980-12-17 00:00:00  text 19        1980/00/17 -470-00-09  19SMITH00
    2  1981-02-20      real 1981-02-20 00:00:00  text 19        1981/00/20 -470-00-13  19ALLEN00
    3  1981-02-22      real 1981-02-22 00:00:00  text 19        1981/00/22 -470-00-15   19WARD50
    4  1981-04-02      real 1981-04-02 00:00:00  text 19        1981/00/02 -470-00-23  19JONES75
    5  1981-09-28      real 1981-09-28 00:00:00  text 19        1981/00/28 -470-00-21 19MARTIN50
    6  1981-03-01      real 1981-03-01 00:00:00  text 19        1981/00/01 -470-00-22  19BLAKE50
    7  1981-01-09      real 1981-01-09 00:00:00  text 19        1981/00/09 -470-00-02  19CLARK50
    8  1982-12-09      real 1982-12-09 00:00:00  text 19        1982/00/09 -470-00-01  19SCOTT00
    9  1981-11-17      real 1981-11-17 00:00:00  text 19        1981/00/17 -470-00-10   19KING00
    10 1981-09-08      real 1981-09-08 00:00:00  text 19        1981/00/08 -470-00-01 19TURNER00

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" select hiredate, sal, \
                      substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2)||ENAME||substr(cast(sal as int),-2,2) ename_1 \
               from emp ").head()

     

    Results
      hiredate sal ename_1
    0 1980/12/17 800 19SMITH00
    1 1981/02/20 1600 19ALLEN00
    2 1981/02/22 1250 19WARD50
    3 1981/04/02 2975 19JONES75
    4 1981/09/28 1250 19MARTIN50

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    salgrade_DT <- data.table(salgrade)
    
    
    DT[, c('str_combined_2','str_combined_3','str_combined_4','str_combined_5') 
                := list( stringi::stri_join(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep="") ,
                          stringr::str_c(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep=""),
                          base::paste(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep=""),
                          base::paste0(stringr::str_sub(hiredate,1,2),"",ename,"",stringr::str_sub(sal,-2))
                       ) ]
    
    DT

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno str_combined_2 str_combined_3 str_combined_4 str_combined_5
     1:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      19SMITH00      19SMITH00      19SMITH00      19SMITH00
     2:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30      19ALLEN00      19ALLEN00      19ALLEN00      19ALLEN00
     3:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       19WARD50       19WARD50       19WARD50       19WARD50
     4:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      19JONES75      19JONES75      19JONES75      19JONES75
     5:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30     19MARTIN50     19MARTIN50     19MARTIN50     19MARTIN50
     6:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      19BLAKE50      19BLAKE50      19BLAKE50      19BLAKE50
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      19CLARK50      19CLARK50      19CLARK50      19CLARK50
     8:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20      19SCOTT00      19SCOTT00      19SCOTT00      19SCOTT00
     9:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       19KING00       19KING00       19KING00       19KING00
    10:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30     19TURNER00     19TURNER00     19TURNER00     19TURNER00
    11:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      19ADAMS00      19ADAMS00      19ADAMS00      19ADAMS00
    12:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30      19JAMES50      19JAMES50      19JAMES50      19JAMES50
    13:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       19FORD00       19FORD00       19FORD00       19FORD00
    14:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10     19MILLER00     19MILLER00     19MILLER00     19MILLER00

     


    8. SAS Proc SQL

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select ename,
               put(hiredate, yymmddn.)                                       as head_base,
               strip(put(sal,5.))                                            as tail_base,
               substr( strip(put(sal,5.)) ,length(strip(put(sal,5.))) -1,2 ) as last_char_1,
               reverse( substr( reverse( strip(put(sal,5.)) ),1,2) )         as last_char_2,
               prxchange('s/.*(\d{2})/$1/',-1, strip(put(sal,5.))  )         as last_char_3,
    
               SUBSTR( calculated head_base , 1,2)||compress(ENAME)|| calculated last_char_1 as concat_str format=$12.
    
        from emp;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=5);RUN;

     

    Results
    OBS ename head_base tail_base last_char_1 last_char_2 last_char_3 concat_str
    1 SMITH 19801217 800 00 00 00 19SMITH00
    2 ALLEN 19810220 1600 00 00 00 19ALLEN00
    3 WARD 19810222 1250 50 50 50 19WARD50
    4 JONES 19810402 2975 75 75 75 19JONES75
    5 MARTIN 19810928 1250 50 50 50 19MARTIN50

     


    9. SAS Data Step

     

    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2; 
     SET emp; 
         head_base   = put(hiredate, yymmddn.);
         tail_base   = strip(put(sal,5.));
         last_char_1 = substr( strip(put(sal,5.)) ,length(strip(put(sal,5.))) -1,2 );
         last_char_2 = reverse( substr( reverse( strip(put(sal,5.)) ),1,2) );
         last_char_3 = prxchange('s/.*(\d{2})/$1/',-1, strip(put(sal,5.))  );
    
         concat_str = SUBSTR( head_base , 1,2)||compress(ENAME)|| last_char_1;
         keep empno head_base tail_base last_: concat_str;
    RUN;
    
    PROC PRINT Data=STATSAS_2(Obs=5);RUN;

     


    Results
    OBS empno head_base tail_base last_char_1 last_char_2 last_char_3 concat_str
    1 7369 19801217 800 00 00 00 19SMITH00
    2 7499 19810220 1600 00 00 00 19ALLEN00
    3 7521 19810222 1250 50 50 50 19WARD50
    4 7566 19810402 2975 75 75 75 19JONES75
    5 7654 19810928 1250 50 50 50 19MARTIN50

     


    10. Python Dfply Package

     

    Python Programming
    emp >> mutate( ename_string = X.hiredate.str.slice(0,2) + X.ename + X.sal.astype(str).str.slice(-2,) ) >> \
      head()
     

     

    Results
      empno ename job mgr hiredate sal comm deptno ename_string
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 19SMITH00
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 19ALLEN00
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 19WARD50
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 19JONES75
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 19MARTIN50

     


    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트

    반응형

    댓글