<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>export Archives -</title>
	<atom:link href="https://simonetocco.it/tag/export/feed/" rel="self" type="application/rss+xml" />
	<link>https://simonetocco.it/tag/export/</link>
	<description></description>
	<lastBuildDate>Sat, 05 Mar 2016 10:31:03 +0000</lastBuildDate>
	<language>it-IT</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>

<image>
	<url>https://simonetocco.it/wp-content/uploads/2020/12/logoSimone-1-150x150.png</url>
	<title>export Archives -</title>
	<link>https://simonetocco.it/tag/export/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Extracting email addresses from Pagine Gialle</title>
		<link>https://simonetocco.it/estrarre-indirizzi-email-pagine-gialle/</link>
					<comments>https://simonetocco.it/estrarre-indirizzi-email-pagine-gialle/#comments</comments>
		
		<dc:creator><![CDATA[Simone Tocco]]></dc:creator>
		<pubDate>Sat, 05 Mar 2016 10:31:03 +0000</pubDate>
				<category><![CDATA[PERL]]></category>
		<category><![CDATA[Ricerca]]></category>
		<category><![CDATA[estrattore]]></category>
		<category><![CDATA[export]]></category>
		<category><![CDATA[extractor]]></category>
		<category><![CDATA[paginegialle]]></category>
		<category><![CDATA[Perl]]></category>
		<guid isPermaLink="false">http://simonetocco.it/?p=1346</guid>

					<description><![CDATA[<p>A few years ago I needed to (pretend to 😉) collect a number of email addresses from the Pagine Gialle portal in order to contact companies in a given product category. I was studying Perl at the time, so why not give it a try 🙂 Now, years later, I have found the script again and, as a good open-sourcer, I am publishing the extractor here. [&#8230;]</p>
<p>The post <a href="https://simonetocco.it/estrarre-indirizzi-email-pagine-gialle/">Extracting email addresses from Pagine Gialle</a> appeared first on <a href="https://simonetocco.it"></a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>A few years ago I needed to (pretend to <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ) collect a number of email addresses from the Pagine Gialle portal in order to contact companies in a given product category. I was studying Perl at the time, so why not give it a try <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Now, years later, I have found the script again and, as a good open-sourcer, I am publishing the extractor here.</p>
<p>&nbsp;</p>
<p>I have not checked whether it is still possible to extract email addresses from Pagine Gialle, since the portal has probably changed some parameters. I am confident, though, that a few simple changes would get it working again <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> .</p>
<p>&nbsp;</p>
<p><em><strong>NB:</strong> sending e-mail to addresses whose owners have not given their consent is a criminal offence. The script is released for research purposes; any unlawful use is expressly warned against.</em></p>
<p>&nbsp;</p>
<pre>#!/usr/bin/perl</pre>
<pre>$queryString='http://www.paginegialle.it/pgol/p-1/4-';
$keywords = $ARGV[0];
$location = $ARGV[1];
$raggio = $ARGV[2];
unless(open DATI, '&gt;' . 'export.txt') {
die "\nCould not create the export file\n";
}
##########################################################################################################
######################## Create the fake browser #########################################################
##########################################################################################################
#use strict; 
#use warnings; 
 
use LWP::UserAgent;
use HTTP::Cookies;</pre>
<pre>$ua = LWP::UserAgent-&gt;new();
print "Fake browser created.\n";</pre>
<pre>$cookies = HTTP::Cookies-&gt;new(
file =&gt; "cookies.txt",
autosave =&gt; 1,
);
print "Cookie reception enabled.\n";</pre>
<pre>$ua-&gt;cookie_jar($cookies);
print "Cookie jar attached to the browser.\n";</pre>
<pre>$ua-&gt;agent("Windows IE 7");
print "Declared ourselves to be the worst browser around ;)\n";</pre>
<pre>
##########################################################################################################
######################## Make the category keyword GET-friendly ##########################################
##########################################################################################################
$keywords =~ s/\s/%20/g;
$queryString=$queryString.$keywords;</pre>
<pre>##########################################################################################################
######################## Make the search location GET-friendly ###########################################
##########################################################################################################
if($location){
 $location =~ s/\s/%20/g;
 $queryString = $queryString."/3-".$location; 
}</pre>
<pre>##########################################################################################################
######################## Set the number of results per page to the maximum ###############################
##########################################################################################################
$queryString = $queryString."?mr=50";</pre>
<pre>##########################################################################################################
######################## Make the search radius GET-friendly #############################################
##########################################################################################################
# "?mr=50" has already opened the query string, so further parameters are joined with "&amp;"
if($raggio){
 if($raggio == 25){
 $queryString = $queryString."&amp;enl=7";
 }elsif($raggio == 50){
 $queryString = $queryString."&amp;enl=8";
 }elsif($raggio == 100){
 $queryString = $queryString."&amp;enl=9";
 }else{
 die "Invalid radius. Use '25', '50' or '100'";
 }
}</pre>
<pre>print "Query string ready.\n";
$pageNumber = 1;
##########################################################################################################
######################## Work out how many pages have to be fetched ######################################
##########################################################################################################</pre>
<pre>for($i=1;$i&lt;=$pageNumber;$i++){</pre>
<pre>$response = $ua-&gt;get($queryString);
 print "Downloaded page $queryString, size ".length($response-&gt;decoded_content)." characters\n";</pre>
<pre>unless($response-&gt;is_success) {
 print "\nError while downloading the HTML file: " . $response-&gt;status_line."\n";
 }</pre>
<pre>$fileOut = "result-$i.html";</pre>
<pre>unless(open OUT, '&gt;' . $fileOut) {
 die "\nCould not create the file '$fileOut'\n";
 }
print "Created the file $fileOut\n";
# Set the output encoding
binmode(OUT, ":utf8");</pre>
<pre>print OUT $response-&gt;decoded_content;</pre>
<pre>close OUT;
$fileIn = "result-$i.html";
open(IN,$fileIn) or die "\nCould not reopen the file '$fileIn'\n";
@contenuto=&lt;IN&gt;;</pre>
<pre>##########################################################################################################
######################## Read the number of results found and remember it ################################
##########################################################################################################
if($i==1){
 foreach (@contenuto){
 if ($_ =~/&lt;span class="h-bold"&gt;(\d+)&lt;\/span&gt;/){
 $resultNumber= $1;
 print "Found $resultNumber results.\n";
 last; # Perl leaves a loop with "last", not "break"
 }
 }
}</pre>
<pre># 50 results per page, rounded up. Written as an integer ceiling so that
# fewer than 50 results no longer causes a division by zero.
$pageNumber = int(($resultNumber + 49) / 50);</pre>
<pre>foreach (@contenuto)
{</pre>
<pre>chomp ($_);
 if($_=~/&lt;a title=\"Scheda Azienda (.+)\" class="_lms _noc"/)
 {
 $ragione_sociale = $1;
 $ragione_sociale =~ s/\&amp;amp\;/\&amp;/g;
 }
 elsif($_=~/\&lt;span class\=\"street-address\"\&gt;(.+)\, ([\w|\.|\s]+).+ class="locality"&gt;(\d\d\d\d\d) ([\w|\.|\s]+) \((\w\w)\)&lt;\/span&gt;/)
 {
 $civico = $1;
 $indirizzo = $2;
 $cap = $3;
 $citta = $4;
 $provincia = $5;
 }
 elsif($_=~/&lt;div class="address"&gt;/){
 $inizio = 1;
 }
 elsif($inizio==1 &amp;&amp; $_=~/&lt;span class\=\"type\"&gt; tel\:&lt;\/span&gt;/){
 $tel = 1;
 }
 elsif($inizio==1 &amp;&amp; $_=~/^&lt;span class=\"type\"&gt; fax\:&lt;\/span&gt;/){
 $fax = 1;
 }
 elsif($tel==1 &amp;&amp; (($_=~/^&lt;span class="value"&gt;([\d|\s]+), ([\d|\s]+)&lt;\/span&gt;/) || ($_=~/&lt;span class="value"&gt;([\d|\s]+)&lt;\/span&gt;/)) ){
 $tel1 = $1;
 $tel2 = $2;
 $tel = 0;
 }
 elsif($fax==1 &amp;&amp; $_=~/^&lt;span class="value"&gt;([\d|\s]+)&lt;\/span&gt;/ ){
 $fax1 = $1;
 $fax = 0;
 }
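 elsif($inizio==1 &amp;&amp; $_=~/&lt;\/div&gt;/ ){
 $inizio = 0;
 print DATI "$ragione_sociale,$indirizzo,$cap,$citta,$provincia,$tel1,$tel2,$fax1\n";
 # Clear the phone and fax fields so a record without them does not inherit the previous values
 ($tel1, $tel2, $fax1) = ('', '', '');
 }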
}
close(IN);</pre>
<pre>
# After each page, point the query string at the next page to fetch
$nextPage = $i + 1;
$queryString =~ s/p-(\d*)/p-$nextPage/;
}</pre>
<pre>
close (DATI);</pre>
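<p>The keyword and location handling and the page count used in the script can also be sketched as small standalone functions. This is only a minimal sketch under stated assumptions: <code>encode_segment</code>, <code>build_query_url</code> and <code>page_count</code> are hypothetical names that do not appear in the original script, the inputs are assumed to be byte strings such as the ones in <code>@ARGV</code>, and the URL layout is simply copied from the script, so it may no longer match the live site. Unlike the script, which only replaces whitespace with <code>%20</code>, the sketch percent-encodes every character outside the unreserved set.</p>
<pre>use strict;
use warnings;

# Hypothetical helper: percent-encode one URL path segment.
# Assumes a byte string, such as a command-line argument from @ARGV.
sub encode_segment {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9._~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Hypothetical helper: build the first-page search URL from a keyword and
# an optional location. The path layout and the mr parameter are copied
# from the script above.
sub build_query_url {
    my ($keywords, $location) = @_;
    my $url = 'http://www.paginegialle.it/pgol/p-1/4-' . encode_segment($keywords);
    if (defined $location and length $location) {
        $url .= '/3-' . encode_segment($location);
    }
    return $url . '?mr=50';
}

# Hypothetical helper: pages needed for $results hits at $per_page hits
# per page (integer ceiling; zero hits means zero pages).
sub page_count {
    my ($results, $per_page) = @_;
    return int(($results + $per_page - 1) / $per_page);
}

print build_query_url('studio legale', 'Reggio Emilia'), "\n";
print page_count(120, 50), "\n";</pre>
<p>Run as is, the sketch should print the encoded URL for the sample search followed by 3, the number of 50-result pages needed for 120 results.</p>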
]]></content:encoded>
					
					<wfw:commentRss>https://simonetocco.it/estrarre-indirizzi-email-pagine-gialle/feed/</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
			</item>
	</channel>
</rss>
