본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 전처리- 문자함수 예제] 문자열 변경 - 72

by 기서무나구물 2021. 8. 30.

포스팅 목차

     

    72. Display the information from EMP table. Wherever job ‘manager’ is found it should be displayed as boss(replace function).

     

    * 직무 변수에서 직무명 'manager'를 'boss'로 변경하시오.


    • Oracle : replace() 함수
    • 파이썬 Pandas : .str.replace() 함수
    • R 프로그래밍 : base::gsub(), lapply(), function(x) 사용자 정의 함수, str_replace_all()
    • R Dplyr Package : str_replace_all() 함수, base::gsub() 함수
    • R sqldf Package : replace() 함수
    • Python pandasql Package : replace() 함수
    • R data.table Package : str_replace_all() 함수
    • SAS Proc SQL : tranwrd(), prxchange() 함수(정규식 함수)
    • SAS Data Step : tranwrd(), prxchange() 함수(정규식 함수)
    • Python Dfply Package : .str.replace()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    • replace 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.
    Oracle Programming
    select empno, ename, job, 
           replace(job, 'MANAGER', 'Boss') JOB_replace
    from   emp;

     


    2. Python Pandas(파이썬)

    • pandas.Series.str.replace 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.
    Python Programming
    import copy
    
    withmooc = copy.copy(emp)
    withmooc
    
    withmooc['job_replace'] = withmooc['job'].str.replace('MANAGER','Boss')
    display(withmooc.head(6))

     

    Results
      empno ename job mgr hiredate sal comm deptno job_replace
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 CLERK
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 SALESMAN
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 SALESMAN
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 Boss
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 SALESMAN
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 Boss

     


    3. R Programming (R Package)

    base::gsub 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    
    withmooc <- emp
    
    withmooc['job_replace'] = base::gsub("MANAGER", "Boss", withmooc$job, fixed=TRUE)
    head(withmooc)

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno job_replace
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 CLERK      
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALESMAN   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALESMAN   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 Boss       
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALESMAN   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 Boss       

     


    stringi::stri_locate_all 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    withmooc <- emp
    
    withmooc['job_replace'] = lapply(withmooc['job'], function(x) str_replace_all(x, "MANAGER" , "Boss") ) 
    head(withmooc)

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno job_replace
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 CLERK      
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALESMAN   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALESMAN   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 Boss       
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALESMAN   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 Boss       

     


    4. R Dplyr Package

    stringi::stri_locate_all 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    
    emp %>% 
      dplyr::mutate(job_replace =  str_replace_all( job, "MANAGER" , "Boss" )) %>%
      head()

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno job_replace
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 CLERK      
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALESMAN   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALESMAN   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 Boss       
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALESMAN   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 Boss       

     


    gsub 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    
    emp %>% 
      dplyr::mutate(job_replace =  gsub("MANAGER", "Boss", job, fixed=TRUE)) %>%
      head()

     

    Results
    # A tibble: 6 x 9
      empno ename  job        mgr hiredate     sal  comm deptno job_replace
      <dbl> <chr>  <chr>    <dbl> <date>     <dbl> <dbl>  <dbl> <chr>      
    1  7369 SMITH  CLERK     7902 1980-12-17   800    NA     20 CLERK      
    2  7499 ALLEN  SALESMAN  7698 1981-02-20  1600   300     30 SALESMAN   
    3  7521 WARD   SALESMAN  7698 1981-02-22  1250   500     30 SALESMAN   
    4  7566 JONES  MANAGER   7839 1981-04-02  2975    NA     20 Boss       
    5  7654 MARTIN SALESMAN  7698 1981-09-28  1250  1400     30 SALESMAN   
    6  7698 BLAKE  MANAGER   7839 1981-03-01  2850    NA     30 Boss       

     


    5. R sqldf Package

    replace 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    
    sqldf("select empno, ename, 
                  job, 
                  replace(job, 'MANAGER', 'Boss') job_replace 
           from emp;") %>% head()

     

    Results
      empno  ename      job job_replace
    1  7369  SMITH    CLERK       CLERK
    2  7499  ALLEN SALESMAN    SALESMAN
    3  7521   WARD SALESMAN    SALESMAN
    4  7566  JONES  MANAGER        Boss
    5  7654 MARTIN SALESMAN    SALESMAN
    6  7698  BLAKE  MANAGER        Boss

     


    6. Python pandasql Package

    • replace 함수를 사용하여서 직무명(Job)에서 'MANAGER'를 'Boss'로 변경한다.
    Python Programming
    ps.sqldf("select empno, ename, replace(job, 'MANAGER', 'Boss') JOB from emp;").head()

     

    Results
      empno ename JOB
    0 7369 SMITH CLERK
    1 7499 ALLEN SALESMAN
    2 7521 WARD SALESMAN
    3 7566 JONES Boss
    4 7654 MARTIN SALESMAN

     


    7. R data.table Package

    stringi::stri_locate_all 함수를 사용하여서 직무명 'MANAGER'를 'Boss'로 변경한다.

    R Programming
    %%R
    
    DT <- data.table(emp)
    
    DT[,ename_str := str_replace_all( job, "MANAGER" , "Boss" )][1:7, ]

     

    Results
       empno  ename      job  mgr   hiredate  sal comm deptno ename_str
    1:  7369  SMITH    CLERK 7902 1980-12-17  800   NA     20     CLERK
    2:  7499  ALLEN SALESMAN 7698 1981-02-20 1600  300     30  SALESMAN
    3:  7521   WARD SALESMAN 7698 1981-02-22 1250  500     30  SALESMAN
    4:  7566  JONES  MANAGER 7839 1981-04-02 2975   NA     20      Boss
    5:  7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400     30  SALESMAN
    6:  7698  BLAKE  MANAGER 7839 1981-03-01 2850   NA     30      Boss
    7:  7782  CLARK  MANAGER 7839 1981-01-09 2450   NA     10      Boss

     


    8. SAS Proc SQL

    • Tranwrd()함수, Prxchange() 함수를 사용하여서 직무명(Job)에서 'MANAGER'를 'Boss'로 변경한다.
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select empno, job, 
               tranwrd(job, 'MANAGER', 'Boss')       as job_tranwrd,
               prxchange('s/MANAGER/Boss/', -1, job) as job_prxchange
        from   emp;
    QUIT;
    PROC PRINT data=STATSAS_1(obs=3);RUN;

     

    Results
    OBS empno job job_tranwrd job_prxchange
    1 7369 CLERK CLERK CLERK
    2 7499 SALESMAN SALESMAN SALESMAN
    3 7521 SALESMAN SALESMAN SALESMAN

     


    9. SAS Data Step

    • Tranwrd()함수, Prxchange() 함수를 사용하여서 직무명(Job)에서 'MANAGER'를 'Boss'로 변경한다.
    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2; 
     SET emp;
         job_tranwrd   = tranwrd(job, 'MANAGER', 'Boss');
         job_prxchange = prxchange('s/MANAGER/Boss/', -1, job);
         keep ename empno job_tranwrd job_prxchange;
    RUN;
    
    
    PROC PRINT data=STATSAS_2(obs=3);RUN;

     

    Results
    OBS empno ename job_tranwrd job_prxchange
    1 7369 SMITH CLERK CLERK
    2 7499 ALLEN SALESMAN SALESMAN
    3 7521 WARD SALESMAN SALESMAN

     


    10. Python Dfply Package

    • replace 함수를 사용하여서 직무명(Job)에서 'MANAGER'를 'Boss'로 변경한다.
    Python Programming
    emp >> \
      mutate( job_replace = X.job.str.replace('MANAGER','Boss') ) >> head()

     

    Results
      empno ename job mgr hiredate sal comm deptno job_replace
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 CLERK
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 SALESMAN
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 SALESMAN
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 Boss
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 SALESMAN

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

    반응형

    댓글