awk

References

http://www.gnu.org/software/gawk/manual/gawk.html (GAWK: Effective AWK Programming)

http://en.wikipedia.org/wiki/AWK (Wikipedia: AWK)

Introduction

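AWK is an interpreted language for sophisticated data processing, more advanced than grep, sed and similar tools: it has variables, arithmetic, associative arrays and user-defined functions.

The principle: each line of the input file goes through a sequence of rules, in order. A rule has the form pattern { action }, and the action runs on every line the pattern matches. A rule without a pattern applies to every line; BEGIN and END rules run before the first line and after the last one.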

“The awk utility reads the input files one line at a time. For each line, awk tries the patterns of each of the rules. If several patterns match, then several actions are run in the order in which they appear in the awk program. If no patterns match, then no actions are run. After processing all the rules that match the line (and perhaps there are none), awk reads the next line. (However, see Next Statement, and also see Nextfile Statement). This continues until the program reaches the end of the file.” [Effective AWK Programming]
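A minimal sketch of this rule loop, assuming any POSIX awk and three made-up input lines: one line can trigger several rules, which run in the order they appear in the program, and next stops the processing of the current line. The function name rules_demo is arbitrary.

```shell
#!/bin/sh
# Sketch of the awk rule loop on three made-up lines (any POSIX awk).
rules_demo() {
  printf '%s\n' "alpha 1" "beta 2" "gamma 3" |
  awk '
    /a/    { print "rule 1: " $0 }   # every line containing an "a"
    $2 > 1 { print "rule 2: " $0 }   # numeric test on the second field
    /beta/ { next }                  # "beta" lines skip the remaining rules
           { print "rule 4: " $0 }   # no pattern: runs on every line that gets here
  '
}

rules_demo
```

Each line goes through the rules from top to bottom; the beta line never reaches rule 4 because next moves straight on to the following input line.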

Sample data

The data used in the examples:

$ cat addressbook.txt
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Bill         555-1675     bill.drowning@hotmail.com       A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Camilla      555-2912     camilla.infusarum@skynet.be     R
Fabius       555-1234     fabius.undevicesimus@ucb.edu    F
Julie        555-6699     julie.perscrutabor@skeeve.com   F
Martin       555-6480     martin.codicibus@hotmail.com    A
Samuel       555-3430     samuel.lanceolis@shu.edu        A
Jean-Paul    555-2127     jeanpaul.campanorum@nyu.edu     R

Example 1

Selecting lines.

The awk program:

#!/usr/bin/gawk.exe -f
BEGIN {
   TEST="Test 2"
   print "------------- BEGIN AWK " TEST " -------------"
}

/thon/ { print }
/li/ { print }

END {
   print "------------- END AWK " TEST " -------------"
}

Running it:

$ ./test2.awk addressbook.txt
------------- BEGIN AWK Test 2 -------------
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Julie        555-6699     julie.perscrutabor@skeeve.com   F
Samuel       555-3430     samuel.lanceolis@shu.edu        A
------------- END AWK Test 2 -------------
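
This example shows that the output follows the order of the input lines, not the order of the rules: each line is tested against every rule in turn, which is why Amelia (matched by /li/) comes out before Anthony (matched by /thon/). Here the order of the rules makes no visible difference, because no line matches both patterns and no rule modifies the line.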
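When a line matches several rules, it goes through all of them, in program order. A hypothetical variant of Example 1 (the /gmail/ rule and the both_rules name are made up for illustration) run on four lines of addressbook.txt:

```shell
#!/bin/sh
# Hypothetical variant of Example 1: /li/ and /gmail/ can both match one line.
# Input: four lines copied from addressbook.txt, passed as a here-document.
both_rules() {
  awk '
    /li/    { print "li:    " $1 }
    /gmail/ { print "gmail: " $1 }
  ' <<'EOF'
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Becky        555-7685     becky.algebrarum@gmail.com      A
Julie        555-6699     julie.perscrutabor@skeeve.com   F
EOF
}

both_rules
```

The Amelia line is reported twice, first by /li/ then by /gmail/, because that is the order of the rules in the program.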

Example 2

Substituting characters, then selecting lines.

The awk program:

#!/usr/bin/gawk.exe -f
/lia/ { gsub(/lia/, "zzia")}
/thon/ { print }
/li/ { print }

Running it:

$ ./test3.awk addressbook.txt
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Broderick    555-0542     broderick.aliquotiens@yahoo.com R
Julie        555-6699     julie.perscrutabor@skeeve.com   F
Samuel       555-3430     samuel.lanceolis@shu.edu        A
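
Here the order of the rules does matter: rule 1 replaces "lia" with "zzia" (Amelia becomes Amezzia) before rule 3 tests the line, so that line no longer matches /li/ and is not printed.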
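To see the effect of rule order, a sketch that runs the Example 2 rules on three lines of addressbook.txt, first in the original order, then in a hypothetical order where the substitution comes last (run_on_sample, SUBST_FIRST and SUBST_LAST are names chosen for illustration):

```shell
#!/bin/sh
# Runs the awk program given as $1 on three lines of addressbook.txt.
run_on_sample() {
  awk "$1" <<'EOF'
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Julie        555-6699     julie.perscrutabor@skeeve.com   F
EOF
}

# Order used in Example 2: "Amelia" becomes "Amezzia" before /li/ is tested.
SUBST_FIRST='
/lia/  { gsub(/lia/, "zzia") }
/thon/ { print }
/li/   { print }'

# Hypothetical reordering: /li/ sees the line before it is modified.
SUBST_LAST='
/thon/ { print }
/li/   { print }
/lia/  { gsub(/lia/, "zzia") }'

echo "--- substitution first"; run_on_sample "$SUBST_FIRST"
echo "--- substitution last";  run_on_sample "$SUBST_LAST"
```

With the substitution last, the Amelia line is printed (unmodified) by /li/; the modified version is never printed since no later rule prints it.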

Example 3

Swapping the two parts of the phone number.

The awk program:

#!/usr/bin/gawk.exe -f
/li/ { print gensub(/([0-9]+)-([0-9]+)/, "\\2-\\1", "g")}

Variant that rewrites the whole line ($0) first, so that every following rule sees the change:

#!/usr/bin/gawk.exe -f
{ $0=gensub(/([0-9]+)-([0-9]+)/, "\\2-\\1", "g") }
/li/ { print }

Running it:

$ ./test3.awk addressbook.txt
Amelia       5553-555     amelia.zodiacusque@gmail.com    F
Broderick    0542-555     broderick.aliquotiens@yahoo.com R
Julie        6699-555     julie.perscrutabor@skeeve.com   F
Samuel       3430-555     samuel.lanceolis@shu.edu        A
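
Of the substitution functions (sub, gsub, gensub), only gensub lets the replacement reuse the parts of the text matched by parenthesized groups of the regexp (\n, written "\\1", "\\2", ... inside a string; "\\0" is the whole match). gensub is a gawk extension.
Unlike sub and gsub, gensub does not modify the target string: "It returns the modified string as the result of the function and the original target string is not changed." To modify the line, assign the result: $0=gensub(…)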
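Since gensub only exists in gawk, here is a sketch of the same swap that should run on any POSIX awk, for illustration only: match() sets RSTART and RLENGTH, substr() cuts the line around the match and split() separates the two numbers. Unlike gensub(..., "g"), it swaps only the first number on the line, which is enough for this data. The function name swap_phone is arbitrary.

```shell
#!/bin/sh
# Portable sketch of Example 3 (no gensub): swap the two parts of the first
# "digits-digits" group on each line matching /li/.
swap_phone() {
  awk '
    /li/ {
      if (match($0, /[0-9]+-[0-9]+/)) {
        # part[1] = digits before the "-", part[2] = digits after it
        split(substr($0, RSTART, RLENGTH), part, "-")
        $0 = substr($0, 1, RSTART - 1) part[2] "-" part[1] substr($0, RSTART + RLENGTH)
      }
      print
    }
  ' <<'EOF'
Amelia       555-5553     amelia.zodiacusque@gmail.com    F
Anthony      555-3412     anthony.asserturo@hotmail.com   A
Julie        555-6699     julie.perscrutabor@skeeve.com   F
EOF
}

swap_phone
```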

Example of a more complex script (converting a log file to HTML)

log2html.awk
#!/usr/bin/gawk.exe -f
BEGIN {
   print "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"fr\" lang=\"fr\" >"
   print "<head>"
   print "  <meta charset=\"UTF-8\" />"
   print "  <style>"
   print "		body {"
   print "			margin: 0.3em;"
   print "			padding: 0.4em;"
   print "			font-family: monospace, \"Courier New\", Consolas;"
   print "			color: #444;"
   print "			font-size: 1.1em;"
   print "			line-height: 1.2;"
   print "		}"
   print "		.Title {"
   print "			display: inline-block ; "
   print "			margin: 1.5em 0 0.5em 0;"
   print "			padding: 0.1em 0.8em;"
   print "			border: 1px solid rgba(100, 100, 100, 0.5);"
   print "			border-radius: 128px;"
   print "			box-shadow: /*0px 0px 4px 4px rgba(150, 150, 200, 0.5),*/ 2px 2px 8px 2px rgba(200, 200, 220, 0.5);"
   # print "			color: #66d;"
   print "			background-color: #ddd;"
   print "			font-size: 1.2em;"
   print "			font-weight: bold;"
   print "		}"
   print "		.Report {"
   print "			display: inline-block ; "
   print "			margin: 1.5em 0 0.5em 0;"
   print "			padding: 0 1.2em 0.8em 1.2em;"
   print "			border: 1px solid rgba(100, 100, 100, 0.5);"
   print "			border-radius: 4px;"
   print "			box-shadow: /*0px 0px 4px 4px rgba(150, 150, 200, 0.5),*/ 2px 2px 8px 2px rgba(200, 200, 220, 0.5);"
   print "			color: #666;"
   print "			background-color: #eee;"
   print "		}"
   print "		.Emphasise {"
   print "			display: inline-block;"
   print "			margin: 0.8em 0 0.5em 0;"
   print "			color: #484;"
   print "			font-weight: bold;"
   print "			font-size: 1.1em;"
   print "		}"
   print "		.Error {"
   print "			display: inline-block;"
   print "			margin: 0.3em 0;"
   print "			color: #C00;"
   print "			font-weight: bold;"
   print "			font-size: 1.2em;"
   print "		}"
   print "		.Added {"
   print "			display: inline-block;"
   print "			font-size: 0.9em;"
   print "			padding: 0 0.6em 0.2em 0.6em;"
   print "			color: #4a4;"
   print "			background-color: #EFE;"
   print "		}"
   print "		.Deleted {"
   print "			display: inline-block;"
   print "			font-size: 0.9em;"
   print "			padding: 0 0.6em 0.2em 0.6em;"
   print "			color: #a44;"
   print "			background-color: #FEE;"
   print "		}"
   print "		.EndProcess {"
   print "			display: inline-block ; "
   print "			margin: 0.5em 0 0.3em 0;"
   print "			color: #669;"
   print "			font-weight: bold;"
   print "			font-size: 1.0em;"
   print "		}"
   print "		.grey {"
   print "			display: inline-block;"
   print "			color: #88F;"
   print "		}"
   print "		.Egal {"
   print "			background-color: #ddd;"
   print "			border-radius: 4px;"
   print "			padding: 0 0.8em;"
   print "		}"
   print "		.Different {"
   print "			background-color: #Fee;"
   print "			border-radius: 4px;"
   print "			padding: 0 0.8em;"
   print "		}"
   print "		.Date, .Time, .Duration {"
   print "			padding: 0 0.4em;"
   print "			border-radius: 6px;"
   print "			color: #4a4;"
   print "			background-color: #f4faf4;"
   print "		}"
 
   print "		table {"
   print "			display: block ; "
   print "			border-collapse:collapse;"
   print "			border: 3px solid #ccc;"
   print "			border-radius: 6px;"
   print "			font-size: 90%;"
   print "		}"
   print "		tr th {"
   print "			background-color: #eee;"
   print "		}"
   print "		caption {"
   print "			font-weight:bold;"
   print "			text-align: center;"
   print "		}"
   print "		th, td {"
   print "			padding: 0.1em 0.6em;"
   print "			vertical-align:middle;"
   print "			border: 2px solid #ddd"
   print "		}"
   print "		.different {"
   print "			color: #F00;"
   print "		}"
   print "		.right {"
   print "			text-align: right;"
   print "		}"
   print "	</style>"
   print "</head><body><PRE>"
 
   FS=" "
   CYGDRIVE="/cygdrive/(.)";
   LINE____="^[_]{25,}$";
   COMPARE_REPORT=".*_WORKING_DIR.*|.*LOCAL_DIR.*";
   EMPTY_LINE="^$";
   DURATION="[0-9]*'[0-9]+\"";
   DATE="[0-9]{4}-[0-9]{2}-[0-9]{2}"
   TIME="[0-9]{2}:[0-9]{2}:[0-9]{2}"
}
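# Convert Cygwin paths (/cygdrive/c/...) to DOS-style paths (C:/...).
# toupper("\\1") cannot be used inside gensub(): it is evaluated before gensub()
# runs, so the drive letter would stay in lower case. match() is used instead.
function convert2DosPath(   s, out) {
   s = $0
   out = ""
   while (match(s, /\/cygdrive\/./)) {
      out = out substr(s, 1, RSTART - 1) toupper(substr(s, RSTART + RLENGTH - 1, 1)) ":"
      s = substr(s, RSTART + RLENGTH)
   }
   return out s
}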
 
function setErrorLine() {
   gsub(/^ */, "")
   $0="<span class=\"Error\">" $0 "</span>"
}
 
function emphasiseLine() {
   gsub(/^ */, "")
   $0="<span class=\"Emphasise\">" $0 "</span>"
}
 
function createTitle() {
   getline; # skip the opening ___________ line
   while (match($0,EMPTY_LINE)) {getline}
   $0=convert2DosPath()
   gsub(/^ */, "")
   $0="<span class=\"Title\">" $0 "</span>"
   diskSizeCheck=match($0, "Disks size check")
   print; getline
   while (match($0,EMPTY_LINE)) {getline}
   getline; # skip the closing ___________ line
   while (match($0,EMPTY_LINE)) {getline}
   if (diskSizeCheck) createDfReportTable()
}
 
function createDfReportTable(   pct) {
   getline # skip the df header line
   print "<TABLE class=\"ReportTable\">"
   print "<tr><th>Device</th><th>Size</th><th>Used</th><th>Avail.</th><th>Use%</th><th>Mounted on</th></tr>"
   while (! match($0,EMPTY_LINE)) {
      # $5 holds values such as "93%": "+ 0" forces a numeric comparison,
      # otherwise "9%" > 80 would be compared as strings (and be true).
      pct = $5 + 0
      if (pct > 90) {
         print "<tr style=\"color:#fdd; background-color: #c00;\">"
      } else if (pct > 80) {
         print "<tr style=\"color:#fdd; background-color: #c61;\">"
      } else if (pct > 50) {
         print "<tr style=\"color:#383; background-color: #cfc;\">"
      } else {
         print "<tr style=\"color:#363; background-color: #efe;\">"
      }
      print "<td>"$1"</td><td align=\"right\">"$2"</td><td align=\"right\">"$3"</td><td align=\"right\">"$4"</td><td align=\"right\">"$5"</td><td>"$6"</td></tr>"
      getline
   }
   print "</TABLE>"
}
 
function createCompareReportTable() {
   print; getline # print the "compareLocal" line
   # Print any error lines
   while (! match($0, "^LOCAL report")) {
      if (! match($0,EMPTY_LINE)) {
         setErrorLine()
         print
      }
      getline
   }
   getline # skip the "LOCAL report" line
   getline # skip the dir1 / dir2 line
   print "<table>"
   print "<tr><th>Dir</th><th>Items</th><th>Size (bytes)</th><th>Dir</th><th>Items</th><th>Size (bytes)</th></tr>"
   while (! match($0, "Total:")) {
      parseReportLine()   
   }
   getline # skip the "Total:" line
   parseReportLine()
   print "</table>"
   getline
}
 
function setTd(val) {
   return "<td>" val "</td>"
}
function setTdDifferent(val) {
   return "<td class=\"different\">" val "</td>"
}
function setTdRight(val) {
   return "<td class=\"right\">" val "</td>"
}
function setTdRightDifferent(val) {
   return "<td class=\"right different\">" val "</td>"
}
 
function parseReportLine() {
   dir1=""
   dir2=""
   items1=""
   items2=""
   size1=""
   size2=""
   $0=gensub(/( {4,}[\|<>]{0,1} *)/, " ;-; ", 1)
   dir1=gensub(/([ ]*)(.*)( ;-; .*)/, "\\2", 1)
   dir2=gensub(/(.* ;-; *)([^:]*)(.*)/, "\\2", 1)
   getline
   if (dir1 == "") {
      $0=gensub(/^ *>/, "  ;-;  ", 1)
      items2=gensub(/.*  ;-;  ([0-9\.]*) items.*/, "\\1", 1)
      size2=gensub(/.*  ;-;  .* items, ([0-9\.]*) octets.*/, "\\1", 1)
   } else if (dir2 == "" ) {
      $0=gensub(/< *$/, "  ;-;  ", 1)
      items1=gensub(/^ *([0-9\.]*) items, .*/, "\\1", 1)
      size1=gensub(/^ *[0-9\.]* items, ([0-9\.]*) octets.*/, "\\1", 1)
   } else {
      $0=gensub(/( {10,}[\|<>]{0,1} *)/, "  ;-;  ", 1)
      items1=gensub(/^ *([0-9\.]*) items, .*  ;-;  .*/, "\\1", 1)
      size1=gensub(/^ *[0-9\.]* items, ([0-9\.]*) octets.*  ;-;  .*/, "\\1", 1)
      items2=gensub(/.*  ;-;  ([0-9\.]*) items.*/, "\\1", 1)
      size2=gensub(/.*  ;-;  .* items, ([0-9\.]*) octets.*/, "\\1", 1)
   }
 
   if ( dir1 != dir2 ) {
      lleft=setTdDifferent(dir1)
      lright=setTdDifferent(dir2)
   } else {
      lleft=setTd(dir1)
      lright=setTd(dir2)
   }
   if ( items1 != items2 ) {
      lleft = lleft setTdRightDifferent(items1)
      lright = lright setTdRightDifferent(items2)
   } else {
      lleft = lleft setTdRight(items1)
      lright = lright setTdRight(items2)
   }
   if ( size1 != size2 ) {
      lleft = lleft setTdRightDifferent(size1)
      lright = lright setTdRightDifferent(size2)
   } else {
      lleft = lleft setTdRight(size1)
      lright = lright setTdRight(size2)
   }
   print "<tr>" lleft lright "</tr>"
   getline
}
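# Remove all empty lines. "next" moves on to the next input line; a plain
# "getline" here would let the second of two consecutive empty lines through.
$0 ~ EMPTY_LINE { next }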
 
# Time and duration formatting
$0 ~ " " DATE { gsub(DATE, "<span class=\"Date\">&</span>") }
$0 ~ TIME { gsub(TIME, "<span class=\"Time\">&</span>") }
$0 ~ DURATION { gsub(DURATION, "<span class=\"Duration\">&</span>") }
 
# Path patch
$0 ~ CYGDRIVE { $0=convert2DosPath() }
 
# Special lines formatting
/ - kept/ { $0="<span class=\"Added\">" $0 "</span>" }
/ - deleted|^deleting / { $0="<span class=\"Deleted\">" $0 "</span>" }
/^rsync |log: |^compareDirs|ile created: / { emphasiseLine() }
 
/^rsync error/ { setErrorLine() }
/^Error / { setErrorLine() }
 
/<span class="Emphasise">compareDirs/ { createCompareReportTable() }
 
/.*: Fin de / { 
   gsub(/\*/, ""); 
   $0="<span class=\"EndProcess\">" $0 "</span>" 
}
 
$0 ~ LINE____ {createTitle()}
 
{ print }
 
# Print html file end
END { print "</PRE></body></html>" }
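Two awk behaviours are worth checking when adapting this script. The df Use% column ($5 in createDfReportTable) contains values such as 93%: because of the trailing %, awk does not treat the field as a number, so a test written directly as $5 > 90 compares strings, whereas $5 + 0 converts the leading digits to a number. And a rule whose action is just getline removes only one empty line at a time, while next skips every empty line. The sketch below checks both on made-up input with any POSIX awk; compare_pct and drop_empty are arbitrary names.

```shell
#!/bin/sh
# 1) String versus numeric comparison of df-style "Use%" values (made-up input).
compare_pct() {
  printf '%s\n' "9%" "85%" "100%" |
  awk '{
    s = ($1 > 80) ? "yes" : "no"       # "9%" is not a numeric string: compared as text with "80"
    n = ($1 + 0 > 80) ? "yes" : "no"   # $1 + 0 keeps the leading number: 9, 85, 100
    print $1, s, n
  }'
}

# 2) Removing empty lines with getline versus next
#    (made-up input: "a", two empty lines, "b").
drop_empty() {
  printf 'a\n\n\nb\n' | awk "$1"
}

compare_pct
echo "--- getline:"; drop_empty '/^$/ { getline }; { print }'
echo "--- next:";    drop_empty '/^$/ { next }; { print }'
```

With the string comparison, 9% counts as more than 80 while 100% does not; the getline version still prints one of the two consecutive empty lines.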