CREATE TABLE Sales1
(
CustID varchar(10),
TransID varchar(10),
SaleDate datetime,
S1 money,
S2 money,
S3 money,
S4 money,
V1 numeric
)
CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
"money" is just the right size for my fields. This table has 9,500,000
records.
Although I need to do this select in less than a half second, it takes 1
full minute:
SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
And I need to do this select in less than a half second, but it takes 3
minutes:
SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
Am I supposed to create a new field with these values pre-calculated? I hope
not, because I have several other formulas - up to 500 different types of
selects which are all similar.
I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and 2GB
of memory.
Thanks|||Hi
The 1st query can use the index you created on SaleDate.
The 2nd query has no option but to read every row (9.5 million) in the
table to find where S3 > S4, and then, based on that subset, do the S1/S2
evaluation.
Either add an index, or restrict the select to something that can use one
of the existing indexes.
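For example (untested, and an extra index is not free on a 9.5 million row
table), a composite index that holds both SaleDate and S1 would let the first
query be answered from the index alone:
-- Index name is just an example; use whatever fits your naming convention.
CREATE NONCLUSTERED INDEX Sales1_SaleDate_S1 ON Sales1 (SaleDate, S1)
The optimizer can then seek the SaleDate range and take MAX(S1) without
touching the base table at all.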
Regards
--
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
IM: mike@.epprecht.net
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
"Rich" <no@.spam.invalid> wrote in message
news:G_uGe.53624$4o.5775@.fed1read06...
> CREATE TABLE Sales1
> (
> CustID varchar(10),
> TransID varchar(10),
> SaleDate datetime,
> S1 money,
> S2 money,
> S3 money,
> S4 money,
> V1 numeric
> )
> CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
> CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
> "money" is just the right size for my fields. This table has 9,500,000
> records.
> Although I need to do this select in less than a half second, it takes 1
> full minute:
> SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
> And I need to do this select in less than a half second, but it takes 3
> minutes:
> SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
> Am I supposed to create a new field with these values pre-calculated? I
> hope
> not, because I have several other formulas - up to 500 different types of
> selects which are all similar.
> I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and
> 2GB
> of memory.
> Thanks
>
>|||Shoot... SQL server might not work for me. I have to be able to calculate
sums, averages, etc. based on a number of different fields, and there are
millions of records. I have to get results within milliseconds. Is there a
way I can tell SQL server to load *everything* into memory?
"Mike Epprecht (SQL MVP)" <mike@.epprecht.net> wrote in message
news:OVlgKgHlFHA.1440@.TK2MSFTNGP14.phx.gbl...
> Hi
> The 1st query can use the index you created on SaleDate.
> The 2nd query has no option but to read every row (9.5 million) in the
> table to find where S3 > S4, and then, based on that subset, do the S1/S2
> evaluation.
> Either add an index, or restrict the select to something that can use one
> of the existing indexes.
> Regards
> --
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
> IM: mike@.epprecht.net
> MVP Program: http://www.microsoft.com/mvp
> Blog: http://www.msmvps.com/epprecht/
> "Rich" <no@.spam.invalid> wrote in message
> news:G_uGe.53624$4o.5775@.fed1read06...
>|||Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
But based on your weird, vague personal language:
1) You have no key because everything is NULL-able
2) You don't know the proper format for dates in SQL
3) You don't know about the math errors in the deprecated MONEY data
type - use DECIMAL(s,p) instead.
4) You don't know how to declare NUMERIC (s,p) data
5) Either you were sloppy or you don't know what a repeating group and
1NF are.
6) You don't know that rows are not records and columns are not fields.
If you cared about the people helping you for free, your DDL would look
more like this:
CREATE TABLE SalesTransactions
(trans_nbr CHAR(10) NOT NULL PRIMARY KEY,
cust_id CHAR(10) NOT NULL
REFERENCES Customers (cust_id)
ON UPDATE CASCADE,
sales_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL
CHECK (<<SET to 00:00:00 Hrs >> ),
repeating_grp_1 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_2 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_3 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_4 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
vague_thing_1 NUMERIC(12,4) DEFAULT 0.00 NOT NULL);
You can use a CASE expression with aggregates to do all of these things
in one statement. You will simply have to make a table scan for some
of them, so indexing will not help. Do them all at once or put them in
a VIEW.
CREATE VIEW Godzilla (..)
AS
SELECT MAX(CASE WHEN sales_date > '2005-01-01'
           THEN repeating_grp_1 ELSE NULL END)
       AS max_1,
       AVG(CASE WHEN repeating_grp_3 > repeating_grp_4
           THEN repeating_grp_1 / repeating_grp_2
           ELSE NULL END)
       AS foobar_avg,
       etc.
FROM SalesTransactions;
But this table feels like a set of design flaws.|||How about thinking a little outside the norm? Denormalisation may be useful
here.
The idea is that you have a table which already contains the results of the
query.
Then you could add triggers, or maybe a SQL Job, to re-calculate the
aggregates and fill in that table.
Averages (or more correctly the mean value) can be calculated easily during
the insert. Updates and deletes may be a little more tricky, but then
that's where the job comes in. If it's not critical that the aggregate is
calculated NOW, but can be done, let's say, once a day, then a SQL Job could
be set up to re-calculate every so often. Does it really matter if it takes
a minute if no one is sat there waiting for it? OK, not what you wanted to
hear I know, but it's a possibility.
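Just to sketch the idea (the table, procedure and row names below are ones
I've made up, and I can't test it from here):
-- Made-up summary table: one row per pre-calculated figure.
CREATE TABLE Sales1Aggregates
(
calc_name varchar(50) NOT NULL PRIMARY KEY,
calc_value decimal(18, 6) NULL,
last_refreshed datetime NOT NULL
)
GO
-- Procedure a SQL Agent job (or a trigger) could call to refresh the figures.
-- Insert the two calc_name rows once before the first refresh.
CREATE PROCEDURE RefreshSales1Aggregates
AS
BEGIN
DECLARE @max_s1 decimal(18, 6)
DECLARE @avg_ratio decimal(18, 6)
SELECT @max_s1 = MAX(S1) FROM Sales1 WHERE SaleDate > '20050101'
SELECT @avg_ratio = AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
UPDATE Sales1Aggregates
SET calc_value = @max_s1, last_refreshed = GETDATE()
WHERE calc_name = 'max_s1_since_2005'
UPDATE Sales1Aggregates
SET calc_value = @avg_ratio, last_refreshed = GETDATE()
WHERE calc_name = 'avg_s1_s2_where_s3_gt_s4'
END
GO
Your queries then read the stored values from Sales1Aggregates, which takes
milliseconds regardless of how big Sales1 gets.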
Another thought is that you may want to take a look into using composite
indexes. For your second query, try this index; it may help to improve the
performance (my SQL box is at work so I can't try it).
Before you create the index, run the query and get SQL to display the Actual
Execution Plan. You'll notice that the query is performing a Table Scan
(or, if you've now created a clustered index, a Clustered Index Scan). Use
SQL Profiler to take a reading of the number of reads that SQL needs to
execute your query.
Now create the following index:
CREATE NONCLUSTERED INDEX Sales1_S3_S4_S1_S2 ON Sales1 (S3, S4, S1, S2)
If I've got it right, when you re-execute the query, the execution plan will
show that it's performing an Index Scan, or possibly even an Index Seek (but
I don't think this will happen). Now use SQL Profiler to get the number of
reads; you'll notice that it is a lot lower. The execution time should be
lower too. Now turn off the Execution Plan and stop the Profiler to get the
production speed of the query.
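If you'd rather not fire up Profiler, Query Analyzer can report much the
same numbers for you (just a suggestion; the reads and timings come back as
messages rather than in a trace):
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
SET STATISTICS IO OFF
SET STATISTICS TIME OFF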
I know that I've rambled on a bit, but I hope it gives you some ideas.
Regards
Colin Dawson
www.cjdawson.com
"Rich" <no@.spam.invalid> wrote in message
news:EjxGe.53631$4o.43600@.fed1read06...
> Shoot... SQL server might not work for me. I have to be able to calculate
> sums, averages, etc. based on a number of different fields, and there are
> millions of records. I have to get results within milliseconds. Is there a
> way I can tell SQL server to load *everything* into memory?
> "Mike Epprecht (SQL MVP)" <mike@.epprecht.net> wrote in message
> news:OVlgKgHlFHA.1440@.TK2MSFTNGP14.phx.gbl...
>|||Thanks, I'll give that a try. But I have to recalculate those 500 fields
every 5 minutes... do you suppose that "Real Time OLAP" would help in this
case?
"Colin Dawson" <newsgroups@.cjdawson.com> wrote in message
news:iJxGe.79457$G8.17569@.text.news.blueyonder.co.uk...
> How about thinking a little outside the norm? Denormalisation may be useful
> here.
> The idea is that you have a table which already contains the results of the
> query.
> Then you could add triggers, or maybe a SQL Job, to re-calculate the
> aggregates and fill in that table.
> Averages (or more correctly the mean value) can be calculated easily during
> the insert. Updates and deletes may be a little more tricky, but then
> that's where the job comes in. If it's not critical that the aggregate is
> calculated NOW, but can be done, let's say, once a day, then a SQL Job could
> be set up to re-calculate every so often. Does it really matter if it takes
> a minute if no one is sat there waiting for it? OK, not what you wanted to
> hear I know, but it's a possibility.
> Another thought is that you may want to take a look into using composite
> indexes. For your second query, try this index; it may help to improve the
> performance (my SQL box is at work so I can't try it).
> Before you create the index, run the query and get SQL to display the Actual
> Execution Plan. You'll notice that the query is performing a Table Scan
> (or, if you've now created a clustered index, a Clustered Index Scan). Use
> SQL Profiler to take a reading of the number of reads that SQL needs to
> execute your query.
> Now create the following index:
> CREATE NONCLUSTERED INDEX Sales1_S3_S4_S1_S2 ON Sales1 (S3, S4, S1, S2)
> If I've got it right, when you re-execute the query, the execution plan will
> show that it's performing an Index Scan, or possibly even an Index Seek (but
> I don't think this will happen). Now use SQL Profiler to get the number of
> reads; you'll notice that it is a lot lower. The execution time should be
> lower too. Now turn off the Execution Plan and stop the Profiler to get the
> production speed of the query.
> I know that I've rambled on a bit, but I hope it gives you some ideas.
> Regards
> Colin Dawson
> www.cjdawson.com
> "Rich" <no@.spam.invalid> wrote in message
> news:EjxGe.53631$4o.43600@.fed1read06...
>|||Honestly, I've not looked at OLAP yet. I really think that I should do so
at some point soon.
Re-calculating 500 fields every five minutes! Ouch. I know that this idea
could end up in a complete tangle, but maybe it would be worth breaking the
workload down. For example, if you're working out averages, use the real
data to work out the results for portions of the data, then use the results
from the portions to calculate the figure for the whole. Also, does the
entire table have to be re-calculated? If not, don't do the whole table. For
example...
Say that you're talking about sales for the last week. Today is Sunday, and
as the information is entered into the system each day, and the previous
days' information doesn't get altered, there's no need to re-calculate
yesterday's average; it's static.
Calculating the average for the week so far is then just a matter of
combining the six daily portions - strictly speaking, keep each day's SUM
and COUNT rather than its average, add them together, and divide the total
sum by the total count. Then all you need to do is fold in today's figures,
which will only ever come from a small dataset compared to the entire
table. Bet this can be calculated in a matter of ms.
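To make that concrete (the daily summary table below is invented for the
example, so treat it as an untested sketch):
-- Invented daily summary table: one row per day.
CREATE TABLE Sales1DailySummary
(
summary_date datetime NOT NULL PRIMARY KEY,
s1_sum decimal(18, 4) NOT NULL,
row_count int NOT NULL
)
GO
-- Refresh only today's row; earlier days never change, so they never get re-read.
DELETE FROM Sales1DailySummary
WHERE summary_date = CONVERT(char(8), GETDATE(), 112)
INSERT INTO Sales1DailySummary (summary_date, s1_sum, row_count)
SELECT CONVERT(char(8), GETDATE(), 112), ISNULL(SUM(S1), 0), COUNT(*)
FROM Sales1
WHERE SaleDate >= CONVERT(char(8), GETDATE(), 112)
GO
-- The exact average over the whole period: total sum divided by total count.
SELECT SUM(s1_sum) / SUM(row_count) AS avg_s1
FROM Sales1DailySummary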
I can't give you any specifics on how to get the best performance from your
database, for the simple reason that I don't have a full and complete
understanding of your data. This is the key to getting maximum performance
out of any database.
Regards
Colin Dawson
www.cjdawson.com
"Rich" <no@.spam.invalid> wrote in message
news:exyGe.53638$4o.28608@.fed1read06...
> Thanks, I'll give that a try. But I have to recalculate those 500 fields
> every 5 minutes... do you suppose that "Real Time OLAP" would help in this
> case?
> "Colin Dawson" <newsgroups@.cjdawson.com> wrote in message
> news:iJxGe.79457$G8.17569@.text.news.blueyonder.co.uk...
>