Slowness in Data::Random

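Hi, I am using the Data::Random module to generate random dates, but it is very slow when producing 1 million sample records. How can I speed it up? Here is the code I tried.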

#!/usr/bin/perl -w 

use Data::Random qw(:all); 

my $randDate_Start = '1900-01-01'; 
my $randDate_End = '2010-12-31'; 

open Outfile, ">", "D:/Test.txt" or die "Cannot open D:/Test.txt: $!"; 

for(1..1000000) 
{ 
    my $randDate = rand_date(min=>$randDate_Start, max=>$randDate_End); 
    print Outfile $randDate."\n"; 
} 

close Outfile;

Is there any other way to generate random dates?

2014-09-11 lazy

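Generate less sample data? – 2014-09-11 04:51:48

@ialarmedalien: The speed of a car does not depend on how far you drive it, and the speed of Data::Random is not affected by how many values you generate. – 2014-09-11 04:54:47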

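@RenéNyffenegger, but the time your car journey takes does depend on the distance travelled. Then again, it's not the destination that matters, it's the journey. – 2014-09-11 05:00:07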

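I would suggest using Time::Piece.

It shows a 6x performance improvement, as the benchmark below demonstrates.

If you cache the possible date values, you can get all one million values almost instantly: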

#!/usr/bin/perl -w 
use strict; 
use warnings; 
use autodie; 

use Benchmark; 
use Data::Random qw(:all); 
use Time::Piece; 
use Time::Seconds; 

my $randDate_Start = '1900-01-01'; 
my $randDate_End = '2010-12-31'; 

my $tp_start = Time::Piece->strptime("$randDate_Start 12:00:00", "%Y-%m-%d %T"); 
my $tp_end = Time::Piece->strptime("$randDate_End 12:00:00", "%Y-%m-%d %T"); 
my $tp_days = ($tp_end - $tp_start)->days; 

my @tp_cached = map { ($tp_start + ONE_DAY * $_)->strftime('%Y-%m-%d') } (0 .. $tp_days); 

# Compare date-generation methods. Sampling covers all $tp_days + 1 dates, 
# so the end date itself can be drawn. 
timethese(
    1_000_000, 
    { 'Data::Random'   => sub { rand_date(min => $randDate_Start, max => $randDate_End) }, 
     'Time::Piece'   => sub { ($tp_start + ONE_DAY * int rand($tp_days + 1))->strftime('%Y-%m-%d') }, 
     'Time::Piece (cached)' => sub { $tp_cached[ rand @tp_cached ] }, 
    } 
);

Output:

Benchmark: timing 1000000 iterations of Data::Random, Time::Piece, Time::Piece (cached)... 
Data::Random: 61 wallclock secs (60.20 usr + 0.07 sys = 60.27 CPU) @ 16592.00/s (n=1000000) 
Time::Piece: 10 wallclock secs (9.95 usr + 0.01 sys = 9.96 CPU) @ 100401.61/s (n=1000000) 
Time::Piece (cached): 0 wallclock secs (0.08 usr + 0.00 sys = 0.08 CPU) @ 12500000.00/s (n=1000000) 
      (warning: too few iterations for a reliable count)
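
To tie the cached approach back to the original task of writing a million dates to a file, a minimal sketch might look like the following. The sub names `build_date_cache` and `write_random_dates` and the output file name `random_dates.txt` are hypothetical and chosen only for illustration. The index is drawn from the full array length so the end date can also be selected.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;

# Build every date from $start to $end (inclusive) once, as 'YYYY-MM-DD' strings.
# Noon is used as the time of day, as in the benchmark above.
sub build_date_cache {
    my ($start, $end) = @_;
    my $t0 = Time::Piece->strptime("$start 12:00:00", '%Y-%m-%d %H:%M:%S');
    my $t1 = Time::Piece->strptime("$end 12:00:00",   '%Y-%m-%d %H:%M:%S');
    my $last = int(($t1 - $t0)->days + 0.5);    # offset of the final day
    return [ map { ($t0 + ONE_DAY * $_)->strftime('%Y-%m-%d') } 0 .. $last ];
}

# Write $n randomly chosen dates from the cache to an open filehandle.
sub write_random_dates {
    my ($fh, $cache, $n) = @_;
    print {$fh} $cache->[ rand @$cache ], "\n" for 1 .. $n;
}

my $cache = build_date_cache('1900-01-01', '2010-12-31');

# Hypothetical output path; adjust as needed.
open my $out, '>', 'random_dates.txt' or die "Cannot open random_dates.txt: $!";
write_random_dates($out, $cache, 1_000_000);
close $out or die "Cannot close random_dates.txt: $!";
```

The per-date formatting cost is paid only once per distinct day, so the loop itself is reduced to an array lookup and a print.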

2014-09-11 07:31:22 Miller

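I would start by unrolling the loop. You probably cannot unroll it a million times, but you can unroll it a large number of times and loop far fewer times. That should help speed it up because it does not have to branch for each item. I ran a short test and saw a 5 to 10 times speedup. Here is what I came up with for a 1 million item loop (if I have my math right :))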

# Declare the variable before the loop 
my $randDate; 
# Statement is what we want to execute a number of times 
my $statement = "$randDate = rand_date(min=>$randDate_Start, max=>$randDate_End);print Outfile \$randDate.\"\\n\";"; 
# Replicate the statement 1000 times 
$statement = $statement x 1000; 
# Get the time we started (to the second) 
my $start = time(); 
# Loop 1000 times to make a 1 million items 
for(0..1000) 
{ 
    # Evaluate the 1000 statements 
    eval($statement); 
} 
# Determine the amount of time it took 
my $diff = time() - $start; 
# Print out the time 
print "Time taken is: $diff\n";

When I ran this, it took 107 seconds when looping a million times, and 28 seconds when using the method above to generate 1 million items.

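If that is not fast enough, you may want to pregenerate the dates. Given the range, 111 years at 365.25 days per year comes to roughly 40,543 dates. These could be generated once at the start: build an array with one entry for each date in the range, then use rand to pick a random index into that array and select the date from it. This is a bit more work than the approach above, so it is only worth doing if the approach above does not give enough of a speedup.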
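
The 365.25-day estimate can be checked against the calendar. The sketch below counts the dates in the range exactly, using the core Time::Piece module; the sub name `count_dates` is hypothetical. Because 1900 is not a leap year, the exact count for this range should be 40,542 rather than 40,543, which matters when the number is used as an upper bound for an array index.

```perl
use strict;
use warnings;
use Time::Piece;

# Count the calendar dates from $start to $end, both inclusive.
sub count_dates {
    my ($start, $end) = @_;
    my $t0 = Time::Piece->strptime($start, '%Y-%m-%d');
    my $t1 = Time::Piece->strptime($end,   '%Y-%m-%d');
    # The difference is a Time::Seconds object; ->days converts it to days.
    return int(($t1 - $t0)->days + 0.5) + 1;
}

print count_dates('1900-01-01', '2010-12-31'), "\n";
```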

2014-09-11 05:28:11 Glenn

Thanks!!! I will try your method for *1 million* now, and then I have to try *1 billion*! Can you show the **routine**? I have never heard of it... – lazy 2014-09-11 05:45:20

0..1000 is 1001 iterations, not 1000 – ysth 2014-09-11 06:37:04

Your code is so fast because the eval fails. There is no way unrolling gives that kind of speedup. – ysth 2014-09-11 06:39:26
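
The failure mode described in this comment can be reproduced in a small sketch. A double-quoted string interpolates `$randDate` at the moment the string is built, so the text handed to `eval` is no longer valid Perl. A string eval that fails to compile runs nothing and only reports the problem in `$@`. Here `fake_rand_date` is a hypothetical stand-in for `rand_date`, used so the sketch runs without Data::Random.

```perl
use strict;
use warnings;

our $calls = 0;

# Hypothetical stand-in for rand_date; it only counts how often it is called.
sub fake_rand_date { $calls++; return '2000-01-01' }

my $randDate = '';

# Double quotes: $randDate is interpolated now, leaving " = fake_rand_date();"
my $broken = "$randDate = fake_rand_date();" x 1000;

# Single quotes: the variable name survives and is resolved when eval runs.
my $fixed = '$randDate = fake_rand_date();' x 1000;

eval $broken;
my $broken_error = $@ ? 1 : 0;    # compile error is reported only in $@
my $broken_calls = $calls;        # nothing was executed

$calls = 0;
eval $fixed;
die "eval failed: $@" if $@;      # always check $@ after a string eval
my $fixed_calls = $calls;

print "broken: error=$broken_error calls=$broken_calls\n";
print "fixed:  calls=$fixed_calls\n";
```

With the check on `$@` in place, a timing comparison only counts runs in which the generated code actually executed.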

Using the second technique that @Glenn suggested, without any optimization:

use 5.010; 
use strict; 
use warnings; 
use Date::Calc qw(Delta_Days Add_Delta_Days); 

# create an array with one entry per day in the range (both ends inclusive) 
my $numdays = Delta_Days(1900,1,1, 2010,12,31) + 1; 
my @dates = map { sprintf("%d-%02d-%02d", Add_Delta_Days(1900,1,1, $_)) } 0..$numdays-1; 

say $dates[ rand($numdays) ] for(1..100_000_000);

Run for 100_000_000:

$ time perl dat | wc -l 
100000000 

real 0m32.227s 
user 0m31.439s 
sys  0m1.159s

For 10 million it takes 1.2 seconds...

2014-09-11 06:45:46 jm666
