Tracking the evolution of experimental data with a relational database

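I have no experience with relational databases, and before I write C++ code to solve my problem I would like to check whether a database would offer a simpler solution. Here is my problem: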

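I have a set of physical samples and a simple measurement that produces one real-number result per sample. The measurement is repeated many times over all available samples (new samples are added regularly), and the results are stored in the database as SAMPLE_ID and RESULT columns. Each measurement is stored as a new table containing its results (the table name identifies the particular measurement). Alternatively, if it makes more sense, each measurement adds a column to a global table of current results (the column name identifies the particular measurement). I will create the tables through a C++ API and receive the reports (query results) the same way. I need at least two reports (simple ASCII text is fine):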

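A list of all samples with their best (highest) result.
For a small subset of measurements, a list of the samples whose most recent result is worse (lower) than any previous one (within the selected subset).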

What database queries would generate each report?


2012-02-05 Paul Jurczak

So, what exactly are you asking? – 2012-02-05 21:11:00

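You are right, a database looks like a solution. Now you need an analyst to design the database, a developer to code the end-user interface, and a reporting specialist to write the reports. I think you could test a few things yourself and break the problem down. I vote to close. – danihp 2012-02-05 21:12:00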

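Sample_id does not appear to be unique. You need at least a third column holding the date and time of the measurement, and perhaps even a second table. – wildplasser 2012-02-05 21:15:08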

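Yes, a database would work well.

You will need a column that stores a date or timestamp so that you can tell the results for one sample apart. Without such a column, "most recent measurement" has no meaning. (The order of rows in a table is inherently meaningless.)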

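You probably don't need anyone to develop a front end; try entering the data by hand, or load a CSV file through the dbms's bulk loader. (Every modern dbms has one; their names vary.)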
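As an illustration of loading CSV data without a front end, here is a minimal sketch. In PostgreSQL the bulk loader is the COPY command (or psql's \copy); the sketch below instead uses Python's standard sqlite3 and csv modules so that it runs without a database server. The CSV layout, the sample rows, and the helper name load_measurements are assumptions made for the example, not part of the original answer.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export with one row per (sample, time) measurement.
CSV_TEXT = """sample_id,measurement_time,measurement
1,2012-02-02 00:25,25.07
1,2012-02-02 08:03,13.89
2,2012-02-02 00:00,13.86
"""


def load_measurements(conn, csv_file):
    """Create the table if needed, insert every CSV row, return the row count."""
    conn.execute(
        """create table if not exists measurements (
               sample_id integer not null,
               measurement_time text not null,  -- ISO-style text sorts chronologically
               measurement real not null,
               primary key (sample_id, measurement_time)
           )"""
    )
    rows = [
        (int(r["sample_id"]), r["measurement_time"], float(r["measurement"]))
        for r in csv.DictReader(csv_file)
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("insert into measurements values (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_measurements(conn, io.StringIO(CSV_TEXT))
print(loaded, "rows loaded")
```

With a real file, `open("results.csv", newline="")` would replace the `io.StringIO` object; the file name is likewise only a placeholder.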

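And you probably don't need a reporting specialist to build the reports. Plain query output is often all that research needs.

Some queries are simple; others might not be simple, but they are at least straightforward. The code below was tested in PostgreSQL, but it should work in any dbms that supports common table expressions and row constructors.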

create table measurements (
    sample_id integer not null, 
    measurement_time timestamp not null, 
    measurement real not null check(measurement >= 0 and measurement <= 30), 
    primary key (sample_id, measurement_time) 
); 

insert into measurements values 
(1, '2012-02-02 08:03', 13.89), 
(2, '2012-02-02 00:00', 13.86), 
(1, '2012-02-02 00:25', 25.07), 
(1, '2012-02-02 03:32', 25.38), 
(1, '2012-02-02 05:47', 16.64), 
(2, '2012-02-02 08:03', 16.16), 
(2, '2012-02-02 07:25', 25.85), 
(3, '2012-02-02 08:03', 14.78), 
(3, '2012-02-02 09:29', 17.08), 
(3, '2012-02-02 10:31', 13.41), 
(4, '2012-02-02 12:38', 20.98), 
(5, '2012-02-02 08:03', 25.00), 
(5, '2012-02-02 14:02', 16.27), 
(5, '2012-02-02 03:32', 12.10), 
(5, '2012-02-02 17:47', 21.34), 
(6, '2012-02-02 18:32', 17.16), 
(6, '2012-02-02 18:33', 21.59), 
(7, '2012-02-02 20:07', 21.47), 
(8, '2012-02-02 21:58', 11.50), 
(8, '2012-02-02 22:53', 21.01); 

-- All samples with their highest measurement. 
select sample_id, max(measurement) 
from measurements 
group by sample_id 
order by sample_id; 

-- Most recent measurement lower than any preceding measurement. 
-- Another way of saying this is that the max() measurement isn't the 
-- latest measurement. 
with max_measurements as (
    select m.* 
    from measurements m 
    inner join (select sample_id, max(measurement) measurement 
       from measurements 
       group by sample_id) max_m 
     on max_m.sample_id = m.sample_id 
    and max_m.measurement = m.measurement 
), 
latest_measurement as (
    select m.* 
    from measurements m 
    inner join (select sample_id, max(measurement_time) measurement_time 
       from measurements 
       group by sample_id) max_m 
     on max_m.sample_id = m.sample_id 
    and max_m.measurement_time = m.measurement_time 
) 
select m.* 
from max_measurements m 
where row(m.sample_id, m.measurement_time) not in (select sample_id, measurement_time 
                from latest_measurement);
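The second query above looks at every measurement, while the question asks for the report over a small subset of measurements. The sketch below is one possible variant, not the answer's own method: it assumes the subset can be expressed as a time window (start inclusive, end exclusive), reports the latest row rather than the best row, and avoids row constructors by joining on the latest time per sample. It runs the answer's sample data through Python's sqlite3 module, with timestamps stored as ISO-style text; the names best_results and regressed_samples are made up for the example.

```python
import sqlite3

ROWS = [
    (1, "2012-02-02 08:03", 13.89), (2, "2012-02-02 00:00", 13.86),
    (1, "2012-02-02 00:25", 25.07), (1, "2012-02-02 03:32", 25.38),
    (1, "2012-02-02 05:47", 16.64), (2, "2012-02-02 08:03", 16.16),
    (2, "2012-02-02 07:25", 25.85), (3, "2012-02-02 08:03", 14.78),
    (3, "2012-02-02 09:29", 17.08), (3, "2012-02-02 10:31", 13.41),
    (4, "2012-02-02 12:38", 20.98), (5, "2012-02-02 08:03", 25.00),
    (5, "2012-02-02 14:02", 16.27), (5, "2012-02-02 03:32", 12.10),
    (5, "2012-02-02 17:47", 21.34), (6, "2012-02-02 18:32", 17.16),
    (6, "2012-02-02 18:33", 21.59), (7, "2012-02-02 20:07", 21.47),
    (8, "2012-02-02 21:58", 11.50), (8, "2012-02-02 22:53", 21.01),
]

# Samples whose latest measurement inside the window is below their best
# measurement inside the same window.
REGRESSED_SQL = """
with subset as (
    select sample_id, measurement_time, measurement
    from measurements
    where measurement_time >= ? and measurement_time < ?
),
summary as (
    select sample_id,
           max(measurement) as best,
           max(measurement_time) as latest_time
    from subset
    group by sample_id
)
select s.sample_id, s.measurement as latest, b.best
from subset s
inner join summary b
    on b.sample_id = s.sample_id
   and b.latest_time = s.measurement_time
where s.measurement < b.best
order by s.sample_id
"""


def best_results(conn):
    """Report 1: every sample with its highest measurement."""
    return conn.execute(
        "select sample_id, max(measurement) from measurements "
        "group by sample_id order by sample_id"
    ).fetchall()


def regressed_samples(conn, start, end):
    """Report 2: (sample_id, latest, best) for samples that got worse in [start, end)."""
    return conn.execute(REGRESSED_SQL, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """create table measurements (
           sample_id integer not null,
           measurement_time text not null,
           measurement real not null,
           primary key (sample_id, measurement_time)
       )"""
)
conn.executemany("insert into measurements values (?, ?, ?)", ROWS)

for sample_id, best in best_results(conn):
    print(f"{sample_id}\t{best:.2f}")
for sample_id, latest, best in regressed_samples(conn, "2012-02-02 08:00", "2012-02-02 12:00"):
    print(f"{sample_id}\t{latest:.2f}\t{best:.2f}")
```

If the subset is a list of specific measurement runs rather than a contiguous window, the `where` clause in the `subset` expression is the only part that would need to change.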


2012-02-05 22:16:03

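Thanks! This is the answer I was hoping for. I like your single-table design (SAMPLE_ID, MEASUREMENT_TIME, MEASUREMENT_RESULT); when sorted, it helps data locality, which means faster processing. – 2012-02-05 23:22:23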

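Your problem, as far as we know it, is a single-table problem. That is actually quite rare. (Well, two tables, if you have more information about the samples than just an ID number.) – 2012-02-06 03:59:55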
