http://www.gnu.org/software/gawk/manual/gawk.html GAWK: Effective AWK Programming
AWK is an interpreted language for sophisticated data processing. It is more capable than grep, sed and similar tools.
The principle is to apply a set of rules, one after the other, to each line of an input file.
“The awk utility reads the input files one line at a time. For each line, awk tries the patterns of each of the rules. If several patterns match, then several actions are run in the order in which they appear in the awk program. If no patterns match, then no actions are run. After processing all the rules that match the line (and perhaps there are none), awk reads the next line. (However, see Next Statement, and also see Nextfile Statement). This continues until the program reaches the end of the file.” [Effective AWK Programming]
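The behaviour described in the quotation can be checked with a small experiment. This is only a sketch: the two-line input and the "A:" / "B:" labels are invented for illustration and are not part of the examples on this page. A line that matches both patterns triggers both actions in program order, and `next` stops rule processing for the current line.

```sh
#!/bin/sh
# Hypothetical two-line input, invented for this sketch.
input='Anthony 555-3412
Becky 555-7685'

# "Anthony" matches both patterns, so both actions run, in program order.
# "Becky" matches neither pattern, so nothing is printed for that line.
both=$(printf '%s\n' "$input" |
    awk '/thon/ { print "A: " $1 } /An/ { print "B: " $1 }')
printf '%s\n' "$both"

# With "next", awk skips the remaining rules for the current line.
first_only=$(printf '%s\n' "$input" |
    awk '/thon/ { print "A: " $1; next } /An/ { print "B: " $1 }')
printf '%s\n' "$first_only"
```

The first command prints "A: Anthony" then "B: Anthony"; the second prints only "A: Anthony".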
Reference data for the examples:
$ cat addressbook.txt
Amelia 555-5553 amelia.zodiacusque@gmail.com F
Anthony 555-3412 anthony.asserturo@hotmail.com A
Becky 555-7685 becky.algebrarum@gmail.com A
Bill 555-1675 bill.drowning@hotmail.com A
Broderick 555-0542 broderick.aliquotiens@yahoo.com R
Camilla 555-2912 camilla.infusarum@skynet.be R
Fabius 555-1234 fabius.undevicesimus@ucb.edu F
Julie 555-6699 julie.perscrutabor@skeeve.com F
Martin 555-6480 martin.codicibus@hotmail.com A
Samuel 555-3430 samuel.lanceolis@shu.edu A
Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R
Selecting lines.
The awk program:
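#!/usr/bin/gawk.exe -f
# BEGIN runs once, before any input is read.
BEGIN {
    TEST = "Test 2"
    print "------------- BEGIN AWK " TEST " -------------"
}
# Print every line that contains "thon" or "li".
/thon/ { print }
/li/   { print }
# END runs once, after the last input line.
END {
    print "------------- END AWK " TEST " -------------"
}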
Running it:
$ ./test2.awk addressbook.txt
------------- BEGIN AWK Test 2 -------------
Amelia 555-5553 amelia.zodiacusque@gmail.com F
Anthony 555-3412 anthony.asserturo@hotmail.com A
Broderick 555-0542 broderick.aliquotiens@yahoo.com R
Julie 555-6699 julie.perscrutabor@skeeve.com F
Samuel 555-3430 samuel.lanceolis@shu.edu A
------------- END AWK Test 2 -------------
Character substitution, then line selection.
The awk program:
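#!/usr/bin/gawk.exe -f
# Rewrite "lia" in the current line; gsub() modifies $0 in place.
/lia/  { gsub(/lia/, "zzia") }
# The selection rules below see the modified line.
/thon/ { print }
/li/   { print }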
Running it:
$ ./test3.awk addressbook.txt
Anthony 555-3412 anthony.asserturo@hotmail.com A
Broderick 555-0542 broderick.aliquotiens@yahoo.com R
Julie 555-6699 julie.perscrutabor@skeeve.com F
Samuel 555-3430 samuel.lanceolis@shu.edu A
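The Amelia line is missing from this output because the first rule rewrites $0 before the /li/ rule is tested. The following sketch isolates that effect on a single record. The shell variable names (line, plain, after) are chosen for this illustration only.

```sh
#!/bin/sh
# One record taken from addressbook.txt.
line='Amelia 555-5553 amelia.zodiacusque@gmail.com F'

# Without substitution, /li/ matches and the line is printed unchanged.
plain=$(printf '%s\n' "$line" | awk '/li/ { print }')

# gsub() rewrites $0 in place; "li" no longer occurs in the line,
# so the later /li/ rule does not match and nothing is printed.
after=$(printf '%s\n' "$line" |
    awk '/lia/ { gsub(/lia/, "zzia") } /li/ { print }')

printf 'plain=[%s]\nafter=[%s]\n' "$plain" "$after"
```

Here plain holds the original record and after is empty, which is why rule order matters as soon as one rule changes the line.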
Swapping the two parts of the phone number.
The awk program:
#!/usr/bin/gawk.exe -f
# For lines containing "li", print a copy with the two digit groups swapped.
/li/ { print gensub(/([0-9]+)-([0-9]+)/, "\\2-\\1", "g") }
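Variant that modifies every line before selection:
#!/usr/bin/gawk.exe -f
# Rule without a pattern: runs on every line and replaces $0 with the swapped text.
{ $0 = gensub(/([0-9]+)-([0-9]+)/, "\\2-\\1", "g") }
/li/ { print }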
Running it:
$ ./test3.awk addressbook.txt
Amelia 5553-555 amelia.zodiacusque@gmail.com F
Broderick 0542-555 broderick.aliquotiens@yahoo.com R
Julie 6699-555 julie.perscrutabor@skeeve.com F
Samuel 3430-555 samuel.lanceolis@shu.edu A
gensub lets you capture parts of the regexp match with parentheses and reuse them in the replacement text (\n, where n is the number of the group: \1, \2, ...).
gensub does not modify the target string: “It returns the modified string as the result of the function and the original target string is not changed.” To modify the line itself, assign the result back: $0=gensub(…)
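The difference between the value returned by gensub and the unchanged $0 can be seen side by side. This is a minimal sketch: it assumes gawk is installed (gensub is a gawk extension), and the "returned:" / "original:" labels and the shortened record are invented for the illustration.

```sh
#!/bin/sh
# Sample record, shortened from the address book.
line='Julie 555-6699'

result=$(printf '%s\n' "$line" | gawk '{
    # gensub() returns a new string with the two digit groups swapped.
    swapped = gensub(/([0-9]+)-([0-9]+)/, "\\2-\\1", "g")
    print "returned: " swapped
    # $0 was not touched by the call above.
    print "original: " $0
}')
printf '%s\n' "$result"
```

With gawk available, this prints "returned: Julie 6699-555" followed by "original: Julie 555-6699".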
#!/usr/bin/gawk.exe -f
BEGIN {
    # Print the HTML header and the embedded style sheet
    print "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"fr\" lang=\"fr\" >"
    print "<head>"
    print " <meta charset=\"UTF-8\" />"
    print " <style>"
    print " body {"
    print " margin: 0.3em;"
    print " padding: 0.4em;"
    print " font-family: monospace, \"Courier New\", Consolas;"
    print " color: #444;"
    print " font-size: 1.1em;"
    print " line-height: 1.2;"
    print " }"
    print " .Title {"
    print " display: inline-block ; "
    print " margin: 1.5em 0 0.5em 0;"
    print " padding: 0.1em 0.8em;"
    print " border: 1px solid rgba(100, 100, 100, 0.5);"
    print " border-radius: 128px;"
    print " box-shadow: /*0px 0px 4px 4px rgba(150, 150, 200, 0.5),*/ 2px 2px 8px 2px rgba(200, 200, 220, 0.5) ; "
    # print " color: #66d;"
    print " background-color: #ddd;"
    print " font-size: 1.2em;"
    print " font-weight: bold;"
    print " }"
    print " .Report {"
    print " display: inline-block ; "
    print " margin: 1.5em 0 0.5em 0;"
    print " padding: 0 1.2em 0.8em 1.2em;"
    print " border: 1px solid rgba(100, 100, 100, 0.5);"
    print " border-radius: 4px;"
    print " box-shadow: /*0px 0px 4px 4px rgba(150, 150, 200, 0.5),*/ 2px 2px 8px 2px rgba(200, 200, 220, 0.5) ; "
    print " color: #666;"
    print " background-color: #eee;"
    print " }"
    print " .Emphasise {"
    print " display: inline-block;"
    print " margin: 0.8em 0 0.5em 0;"
    print " color: #484;"
    print " font-weight: bold;"
    print " font-size: 1.1em;"
    print " }"
    print " .Error {"
    print " display: inline-block;"
    print " margin: 0.3em 0;"
    print " color: #C00;"
    print " font-weight: bold;"
    print " font-size: 1.2em;"
    print " }"
    print " .Added {"
    print " display: inline-block;"
    print " font-size: 0.9em;"
    print " padding: 0 0.6em 0.2em 0.6em;"
    print " color: #4a4;"
    print " background-color: #EFE;"
    print " }"
    print " .Deleted {"
    print " display: inline-block;"
    print " font-size: 0.9em;"
    print " padding: 0 0.6em 0.2em 0.6em;"
    print " color: #a44;"
    print " background-color: #FEE;"
    print " }"
    print " .EndProcess {"
    print " display: inline-block ; "
    print " margin: 0.5em 0 0.3em 0;"
    print " color: #669;"
    print " font-weight: bold;"
    print " font-size: 1.0em;"
    print " }"
    print " .grey {"
    print " display: inline-block;"
    print " color: #88F;"
    print " }"
    print " .Egal {"
    print " background-color: #ddd;"
    print " border-radius: 4px;"
    print " padding: 0 0.8em;"
    print " }"
    print " .Different {"
    print " background-color: #Fee;"
    print " border-radius: 4px;"
    print " padding: 0 0.8em;"
    print " }"
    print " .Date, .Time, .Duration {"
    print " padding: 0 0.4em;"
    print " border-radius: 6px;"
    print " color: #4a4;"
    print " background-color: #f4faf4;"
    print " }"
    print " table {"
    print " display: block ; "
    print " border-collapse:collapse;"
    print " border: 3px solid #ccc;"
    print " border-radius: 6px;"
    print " font-size: 90%;"
    print " }"
    print " tr th {"
    print " background-color: #eee;"
    print " }"
    print " caption {"
    print " font-weight:bold;"
    print " text-align: center;"
    print " }"
    print " th, td {"
    print " padding: 0.1em 0.6em;"
    print " vertical-align:middle;"
    print " border: 2px solid #ddd"
    print " }"
    print " .different {"
    print " color: #F00;"
    print " }"
    print " .right {"
    print " text-align: right;"
    print " }"
    print " </style>"
    print "</head><body><PRE>"

    # Field separator and the dynamic regexps used by the rules below
    FS=" "
    CYGDRIVE="/cygdrive/(.)";
    LINE____="^[_]{25,}$";
    COMPARE_REPORT=".*_WORKING_DIR.*|.*LOCAL_DIR.*";
    EMPTY_LINE="^$";
    DURATION="[0-9]*'[0-9]+\"";
    DATE="[0-9]{4}-[0-9]{2}-[0-9]{2}"
    TIME="[0-9]{2}:[0-9]{2}:[0-9]{2}"
}

# convert Cygwin path to Dos path
# Note: toupper("\\1") is evaluated before gensub() runs and returns "\\1"
# unchanged, so the drive letter keeps its original case.
function convert2DosPath() {
    return gensub(/\/cygdrive\/(.)/, toupper("\\1") ":", "g")
}

# Strip leading spaces and wrap the current line in an "Error" span
function setErrorLine() {
    gsub(/^ */, "")
    $0="<span class=\"Error\">" $0 "</span>"
}

# Strip leading spaces and wrap the current line in an "Emphasise" span
function emphasiseLine() {
    gsub(/^ */, "")
    $0="<span class=\"Emphasise\">" $0 "</span>"
}

# Turn a title framed by two ___________ lines into a "Title" span
function createTitle() {
    getline; # consume the opening ___________ line
    while (match($0,EMPTY_LINE)) {getline}
    $0=convert2DosPath()
    gsub(/^ */, "")
    $0="<span class=\"Title\">" $0 "</span>"
    diskSizeCheck=match($0, "Disks size check")
    print; getline
    while (match($0,EMPTY_LINE)) {getline}
    getline; # consume the closing ___________ line
    while (match($0,EMPTY_LINE)) {getline}
    if (diskSizeCheck) createDfReportTable()
}

# Render the disk usage report as an HTML table, coloured by usage level
function createDfReportTable() {
    getline # consume the header line
    print "<TABLE class=\"ReportTable\">"
    print "<tr><th>Device</th><th>Size</th><th>Utilisé</th><th>Dispo.</th><th>% util.</th><th>Monté sur</th></tr>"
    while (! match($0,EMPTY_LINE)) {
        # "+ 0" forces a numeric comparison even if the field carries a trailing "%"
        if ($5 + 0 > 90) {
            print "<tr style=\"color:#fdd; background-color: #c00;\">"
        } else if ($5 + 0 > 80) {
            print "<tr style=\"color:#fdd; background-color: #c61;\">"
        } else if ($5 + 0 > 50) {
            print "<tr style=\"color:#383; background-color: #cfc;\">"
        } else {
            print "<tr style=\"color:#363; background-color: #efe;\">"
        }
        print "<td>"$1"</td><td align=\"right\">"$2"</td><td align=\"right\">"$3"</td><td align=\"right\">"$4"</td><td align=\"right\">"$5"</td><td>"$6"</td></tr>"
        getline
    }
    print "</TABLE>"
}

# Render the directory comparison report as an HTML table
function createCompareReportTable() {
    print; getline # print the "compareLocal" line
    # Print any error lines
    while (! match($0, "^LOCAL report")) {
        if (! match($0,EMPTY_LINE)) {
            setErrorLine()
            print
        }
        getline
    }
    getline # consume the "LOCAL report" line
    getline # consume the dir1 and dir2 line
    print "<table>"
    print "<tr><th>Dir</th><th>Items </th><th>Size (octets)</th><th>Dir</th><th> items</th><th>Size (octets)</th></tr>"
    while (! match($0, "Total:")) {
        parseReportLine()
    }
    getline # drop the "Total:" line
    parseReportLine()
    print "</table>"
    getline
}

# Helpers that wrap a value in a <td> cell with the appropriate classes
function setTd(val) {
    return "<td>" val "</td>"
}
function setTdDifferent(val) {
    return "<td class=\"different\">" val "</td>"
}
function setTdRight(val) {
    return "<td class=\"right\">" val "</td>"
}
function setTdRightDifferent(val) {
    return "<td class=\"right different\">" val "</td>"
}

# Parse one report entry (a directory line followed by an items/size line)
# and print it as a table row, marking the cells that differ
function parseReportLine() {
    dir1=""
    dir2=""
    items1=""
    items2=""
    size1=""
    size2=""
    $0=gensub(/( {4,}[\|<>]{0,1} *)/, " ;-; ", 1)
    dir1=gensub(/([ ]*)(.*)( ;-; .*)/, "\\2", 1)
    dir2=gensub(/(.* ;-; *)([^:]*)(.*)/, "\\2", 1)
    getline
    if (dir1 == "") {
        $0=gensub(/^ *>/, " ;-; ", 1)
        items2=gensub(/.* ;-; ([0-9\.]*) items.*/, "\\1", 1)
        size2=gensub(/.* ;-; .* items, ([0-9\.]*) octets.*/, "\\1", 1)
    } else if (dir2 == "" ) {
        $0=gensub(/< *$/, " ;-; ", 1)
        items1=gensub(/^ *([0-9\.]*) items, .*/, "\\1", 1)
        size1=gensub(/^ *[0-9\.]* items, ([0-9\.]*) octets.*/, "\\1", 1)
    } else {
        $0=gensub(/( {10,}[\|<>]{0,1} *)/, " ;-; ", 1)
        items1=gensub(/^ *([0-9\.]*) items, .* ;-; .*/, "\\1", 1)
        size1=gensub(/^ *[0-9\.]* items, ([0-9\.]*) octets.* ;-; .*/, "\\1", 1)
        items2=gensub(/.* ;-; ([0-9\.]*) items.*/, "\\1", 1)
        size2=gensub(/.* ;-; .* items, ([0-9\.]*) octets.*/, "\\1", 1)
    }
    if ( dir1 != dir2 ) {
        lleft=setTdDifferent(dir1)
        lrigth=setTdDifferent(dir2)
    } else {
        lleft=setTd(dir1)
        lrigth=setTd(dir2)
    }
    if ( items1 != items2 ) {
        lleft = lleft setTdRightDifferent(items1)
        lrigth = lrigth setTdRightDifferent(items2)
    } else {
        lleft = lleft setTdRight(items1)
        lrigth = lrigth setTdRight(items2)
    }
    if ( size1 != size2 ) {
        lleft = lleft setTdRightDifferent(size1)
        lrigth = lrigth setTdRightDifferent(size2)
    } else {
        lleft = lleft setTdRight(size1)
        lrigth = lrigth setTdRight(size2)
    }
    print "<tr>" lleft lrigth "</tr>"
    getline
}

#remove all empty lines
$0 ~ EMPTY_LINE { getline }

# Time and duration formatting
$0 ~ " " DATE { gsub(DATE, "<span class=\"Date\">&</span>") }
$0 ~ TIME { gsub(TIME, "<span class=\"Time\">&</span>") }
$0 ~ DURATION { gsub(DURATION, "<span class=\"Duration\">&</span>") }

# Path patch
$0 ~ CYGDRIVE { $0=convert2DosPath() }

# Special lines formatting
/ - kept/ { $0="<span class=\"Added\">" $0 "</span>" }
/ - deleted|^deleting / { $0="<span class=\"Deleted\">" $0 "</span>" }
/^rsync |log: |^compareDirs|ile created: / { emphasiseLine() }
/^rsync error/ { setErrorLine() }
/^Error / { setErrorLine() }
/<span class="Emphasise">compareDirs/ { createCompareReportTable() }
/.*: Fin de / { gsub(/\*/, ""); $0="<span class=\"EndProcess\">" $0 "</span>" }
$0 ~ LINE____ {createTitle()}
{ print }

# Print html file end
END { print "</PRE></body></html>" }