I'm creating an application that will allow users to contribute "content". The content can be tagged, saved as "my content", etc... very web 2.0'ish. The site will rely very heavily on SQL Server 2005. By default, there is a Content table that simply stores the content with an identity column PK. There is also a Tag table and User table.
What is the most effective schema for speed and reliable scalability? Eventually there could be hundreds to thousands of people contributing and tagging content.
Idea 1: Lookup Tables
Simply make a new table that holds the TagID and UserID. The table will be very important as it will be queried very regularly to show the users which tags they have stored.
Pros: This will effectively allow me to store and query which tags the user has selected. It's simple to set up, and the application won't have to do much work with the data returned from the query.
Cons: What happens when there are thousands of users tagging content? At what point does it become very inefficient to query a table that has a huge number of rows?
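For concreteness, here is a minimal sketch of what the Idea 1 table might look like, assuming the Tag and User tables key on TagID and UserID (all names are illustrative). The composite primary key keeps each user/tag pair unique and doubles as the index for "which tags has this user stored?" queries:

-- Hypothetical junction table for Idea 1; names are illustrative.
CREATE TABLE UserTag (
    UserID INT NOT NULL REFERENCES [User](UserID),
    TagID  INT NOT NULL REFERENCES Tag(TagID),
    CONSTRAINT PK_UserTag PRIMARY KEY (UserID, TagID)
);

-- Reverse index for "which users stored this tag?" queries.
CREATE INDEX IX_UserTag_TagID ON UserTag (TagID, UserID);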
Idea 2: Comma Delimited List
Simply add a field to the User table that holds a comma-delimited list of the TagIDs the user has selected.
Pros: Keeps table size low, fairly easy to implement.
Cons: The application has to perform more work. It has to split the TagIDs on the commas and re-query the database to get each tag's data by TagID.
Those are basically the only two methods I've really got experience using. The reason for this post is to see if there are methods that I'm not aware of that are better suited for what I'm trying to do.
Any assistance is greatly appreciated, thank you in advance!
Phil, great links, really helpful and appreciated.
I just need to verify one thing on the lookup method: one of the lookup methods people were discussing is the non-cached lookup, which seemed to be evaluated as the fastest. Is non-cached the default for the Lookup transformation? And when I want the lookup to be cached, do I need to go into the Advanced tab and set the cache percentage? Thanks.
Actually, this is in regard to an SCD Type 2 dimension. The scenario is this: I am moving a fact table from an old source, and the fact contains a DimensionA description value that I want to replace with the appropriate ID from the dimension table. That dimension table is SCD Type 2, based on StartDate and EndDate, and the fact table doesn't contain a direct date value; instead there is a TimeID in the fact. So to update the value in the fact table I have to join the time dimension table and the other dimension table to replace the fact description with the proper ID.
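A minimal sketch of the kind of set-based update described above, with all table and column names assumed for illustration (FactTable, DimTime, DimensionA), and an open-ended current-row EndDate handled via ISNULL:

-- Sketch only: resolve the SCD Type 2 surrogate key by joining the fact
-- to the time dimension, then matching into the StartDate/EndDate range.
UPDATE f
SET    f.DimensionAId = d.DimensionAId
FROM   FactTable  AS f
JOIN   DimTime    AS t ON t.TimeId = f.TimeId
JOIN   DimensionA AS d ON d.Description = f.DimensionADescription
                      AND t.CalendarDate >= d.StartDate
                      AND t.CalendarDate <  ISNULL(d.EndDate, '99991231');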
OK, say I would like to build a table for the following questions (say 6 questions for the sake of argument). Do I just store the index of the RadioButtonList? Should I make a lookup table? What are some resources I could look at?
5) If money were no object, I would live . . .
- Prefer not to say
- On a tropical island
- In a New York penthouse
- In an English castle
- On a Texas ranch
- In a Malibu beach house
- In a mountain retreat (Selected)
- On the moon
- None of the above
The questions flow 1 ---> 2 ---> 3 ---> 4 ---> 5 ---> 6; question 5 above is the one we are looking at.
How should I create the database table for the above example?
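One possible shape, as a hedged sketch (table and column names are invented for illustration): store the options in a lookup table keyed by question, and store each response as a reference to the chosen option rather than the RadioButtonList index, so reordering the list on screen never corrupts old answers:

-- Sketch: survey schema with an answer-option lookup table (names illustrative).
CREATE TABLE Question (
    QuestionId   INT IDENTITY PRIMARY KEY,
    QuestionText VARCHAR(200) NOT NULL
);

CREATE TABLE AnswerOption (
    AnswerOptionId INT IDENTITY PRIMARY KEY,
    QuestionId     INT NOT NULL REFERENCES Question(QuestionId),
    OptionText     VARCHAR(100) NOT NULL,
    SortOrder      INT NOT NULL   -- display order, independent of the ID
);

CREATE TABLE Response (
    ResponseId     INT IDENTITY PRIMARY KEY,
    UserId         INT NOT NULL,
    AnswerOptionId INT NOT NULL REFERENCES AnswerOption(AnswerOptionId)
);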
I am facing a design problem and am unable to decide which design to choose for my data model.
Usually, what we have is data tables plus reference tables that store the allowed values for those data tables. My database has tables with 20-30 columns each, and most of those columns (though not all of them) store values from some reference table. Meaning each column has an associated reference table holding the list of possible values for that particular column; a sort of data domain for that column. FYI, my database is in the medical field.
For example, a table has a varchar(40) column called "Differentiation". It can only store values from the following list:
- Undifferentiated
- Moderate
- Poor
- Poor - Moderate
- Moderate - Well
Now, to implement this, the simple solution would be to have a reference table where I can store all these possible values, and then have each data item in my "Differentiation" column reference it.
This is the simplest, and probably the best, solution for such a thing, and I can also have referential integrity enforced.
But if we look at the bigger picture, my database is growing and I have about 80 tables to create, where most of the columns will have different reference tables like I mentioned above. The approximate number of reference tables is 300. All the reference tables will have the same structure, with different values for different columns.
Now, it seems to me that because the table structure is the same for every column, rather than having 300 different tables I can have just 2 tables, holding all these reference values. Table 1: this table holds the names of the reference tables, like "DifferentiationList", etc.
Table 2:
It has a reference to the reference-table list in Table 1 discussed above, and all the values that belong to a given reference table go in this table alongside that reference.
But the problem with this is: because all the reference tables are folded into these two tables, I don't know how to implement referential integrity in this design.
Does anyone have any idea or solution for a situation like this?
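One known workaround, sketched here with invented names: give the shared values table a composite key of (ListId, ValueId), and have each data table reference it with a composite foreign key whose ListId side is pinned to a constant by a CHECK constraint, so the FK can only land in the right list:

-- Sketch only; all names are illustrative.
CREATE TABLE RefList (
    ListId   INT PRIMARY KEY,
    ListName VARCHAR(50) NOT NULL UNIQUE   -- e.g. 'DifferentiationList'
);

CREATE TABLE RefValue (
    ListId    INT NOT NULL REFERENCES RefList(ListId),
    ValueId   INT NOT NULL,
    ValueText VARCHAR(40) NOT NULL,
    CONSTRAINT PK_RefValue PRIMARY KEY (ListId, ValueId)
);

CREATE TABLE Specimen (
    SpecimenId INT IDENTITY PRIMARY KEY,
    -- Pin this column to the Differentiation list (ListId = 1 here).
    DifferentiationListId INT NOT NULL
        CONSTRAINT DF_Specimen_DiffList DEFAULT (1)
        CONSTRAINT CK_Specimen_DiffList CHECK (DifferentiationListId = 1),
    DifferentiationId INT NOT NULL,
    CONSTRAINT FK_Specimen_Differentiation
        FOREIGN KEY (DifferentiationListId, DifferentiationId)
        REFERENCES RefValue (ListId, ValueId)
);

The cost is an extra pinned column per lookup-backed column, which is one reason many designers accept 300 small tables instead.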
Right now I am leaning towards a temp table that pulls my aggregates in and then joins them on metric ID and year-month. But I noticed my boss's boss, who knows this stuff a lot better than I do, seemed to do direct inserts, dropping that whole range.
Same criteria, but a different approach. I like updating to the new results rather than dropping or deleting the contents. I like using temp tables too; it makes it easy to just SELECT INTO them and join off of them onto the destination table that will be updated.
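For reference, the update-in-place approach being described might look like this sketch (all table and column names are assumptions):

-- Sketch: stage the aggregates in a temp table, then update the destination.
SELECT MetricId, YearMonth, SUM(Amount) AS Total
INTO   #Agg
FROM   SourceTable
GROUP  BY MetricId, YearMonth;

UPDATE d
SET    d.Total = a.Total
FROM   DestTable AS d
JOIN   #Agg      AS a ON a.MetricId  = d.MetricId
                     AND a.YearMonth = d.YearMonth;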
I'm working on a query for a report. I've done this before and it works, but I think it's a little slow due to the joins and I'm wondering if I'm doing this the best way.
This is from a Microsoft CRM system. I'm only using the LEAD table. There is a field on the lead table called StateCode. When a user "Qualifies" a lead, the statecode changes. The report requires a column for total leads, a column for # of leads qualified, and a column for % of leads qualified. There are other columns, but those three will illustrate the problem.
Because total leads means all StateCode values are included, and qualified leads means only one StateCode value is included, I can't get those two values from the same query (that I know of). So what I do is take two queries, one for total leads and one for qualified leads, put them in parentheses and name them, and then join them on the lead source, like below. I often end up with 10 or 15 of these "query tables" in my main query. Is this the best way?
SELECT * FROM ( SELECT LeadSource,
                       COUNT(CreatedOn) AS TotalLeads   -- derived tables need column aliases
                FROM Leads
                GROUP BY LeadSource ) AS A
LEFT OUTER JOIN
              ( SELECT LeadSource,
                       COUNT(CreatedOn) AS QualifiedLeads,
                       -- * 100.0 avoids integer division truncating the percentage to 0
                       COUNT(CreatedOn) * 100.0 / (SELECT COUNT(CreatedOn) FROM Leads)
                           AS [% of Leads Qualified from this Lead Source]
                FROM Leads
                WHERE StateCode = 2
                GROUP BY LeadSource ) AS B
ON A.LeadSource = B.LeadSource
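For comparison, the same columns can usually come from a single pass with conditional aggregation instead of one derived table per measure; a hedged sketch against the same Leads table, keeping the original's denominator (all leads overall):

-- Sketch: one scan, counting qualified leads with a CASE flag.
SELECT LeadSource,
       COUNT(CreatedOn) AS TotalLeads,
       SUM(CASE WHEN StateCode = 2 THEN 1 ELSE 0 END) AS QualifiedLeads,
       SUM(CASE WHEN StateCode = 2 THEN 1 ELSE 0 END) * 100.0
           / (SELECT COUNT(CreatedOn) FROM Leads)
           AS [% of Leads Qualified from this Lead Source]
FROM   Leads
GROUP  BY LeadSource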
So I am designing a new database that currently has several 'lookup tables' that are used just to limit the values of columns in other tables.
My question is: what is the best way to do this?
1. Multiple tables - one for each set of values (e.g. JobType, Position, PayGrade).
2. One large table that holds all the lookup values, with a 'Category' field to group them.
3. Put constraints on the columns of the tables that are 'looking up' and get rid of the lookup tables.
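For option 3, a minimal sketch of what a column-level constraint replacement looks like (names invented for illustration):

-- Sketch: a CHECK constraint standing in for a JobType lookup table.
CREATE TABLE Employee (
    EmployeeId INT IDENTITY PRIMARY KEY,
    JobType    VARCHAR(20) NOT NULL
        CONSTRAINT CK_Employee_JobType
        CHECK (JobType IN ('FullTime', 'PartTime', 'Contract'))
);

The trade-off is that changing the value list means an ALTER TABLE rather than an INSERT into a lookup table.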
Hi, if you have a lookup table which is used by multiple tables (e.g. a 'City' lookup table might be used by both the 'Employee' and the 'Company' tables), do we need to link it to both tables? Or can it sit by itself and just be referenced? Cheers, Jack
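For what it's worth, a single lookup table can be referenced by any number of tables; a minimal sketch with assumed names:

-- Sketch: one City lookup, two independent foreign keys into it.
CREATE TABLE City (
    CityId   INT PRIMARY KEY,
    CityName VARCHAR(50) NOT NULL
);

CREATE TABLE Employee (
    EmployeeId INT PRIMARY KEY,
    CityId     INT NOT NULL REFERENCES City(CityId)
);

CREATE TABLE Company (
    CompanyId INT PRIMARY KEY,
    CityId    INT NOT NULL REFERENCES City(CityId)
);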
I can't see what is going on, this is the situation:
I call the Pull method, specifying the table to be affected, the query to be used, the connection string to the remote SQL Server, the tracking options (on), and the error table. The Pull method executes with no errors; however, no table is ever created. I don't know why. Here's what I have done so far:
I read the SQL Books Online help on preparing RDA, I set up the IIS virtual directory for anonymous access, and in the connection string I send the user name and password for the SQL Server. I went into SQL Server and granted the user name access to the database that I am going to access, and I made the user a db_owner.
So, according to SQL Books Online I have everything right; however, it won't populate, so right now I am open to suggestions on how to get this to work. Here's the code:
------------------------------------------------------------------------------------------------------------------
string rdaOleDbConnectString = "Provider=SQLOLEDB;Data Source=<Server>;Initial Catalog=<DB>;User Id=<User>;Password=<Password>"; // (not exactly like this, but it has the proper values)
string connectionString = @"Data Source=\Program Files\client\db\MobileDB.sdf";
SqlCeRemoteDataAccess rda = new SqlCeRemoteDataAccess("http://10.1.1.206/mobile/sqlcesa30.dll", connectionString);
IList _tableNames = new ArrayList(); IList _queries = new ArrayList();
############ Code that prepares tables and queries ############
It has always been said that it is best to put indexes on commonly joined fields in a table. But putting too many indexes on a table slows down its inserts and updates.
My question is, how do you deal with your fields that use lookup tables? Like, for example, these fields.
Those fields aren't a big part of the table, though when I query the table I always join them with their respective primary tables to get their text values. Do I still need to put indexes and FK relationships on these fields?
What fields are normally good candidates for indexes or FK relationships?
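As a sketch of the usual pattern (names invented): declare the FK for integrity, and add an index on the referencing column yourself, since SQL Server does not automatically index foreign key columns:

-- Sketch: FK plus a supporting index on the referencing column.
ALTER TABLE OrderDetail
    ADD CONSTRAINT FK_OrderDetail_Status
    FOREIGN KEY (StatusId) REFERENCES Status(StatusId);

CREATE INDEX IX_OrderDetail_StatusId ON OrderDetail (StatusId);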
Two tables:

T1 (c1 int, TestVal numeric(18,2), ResultFactor numeric(18,2)) -- c1 is the primary key.
T2 (x1 int, FromVal numeric(18,2), ToVal numeric(18,2), Factor numeric(18,2)) -- x1 is the primary key.

T2 contains non-overlapping ranges, so a few rows in T2 may look like:

1, 51, 51.999, 51
2, 52, 52.999, 52
...
32, 82, 82.999, 82
...

T2 is basically a lookup table. There is no relationship between the two tables T1 and T2. However, if the TestVal from T1 falls in the range between FromVal and ToVal in T2, then I want to update ResultFactor in T1 with the corresponding value of Factor from the T2 table.

------ Example for illustration only ---------------
Even though the tables cannot be joined using keys, the above problem is a very common one in everyday life. For example, T1 could be an employee pay-raise table, c1 = EmployeeID, with "TestVal" representing test scores (from 1 to 100), and T2 representing the lookup of the ranges, with "Factor" representing the percent raise to be given to the employee. If TestVal is 65 (the employee scored 65% in a test) and a row in T2 is (FromVal=60, ToVal=70, Factor=12), then I would like to update 12 into table T1 from T2 using SQL. Basically T2 (like a global table) applies to all the employees, so EmpID cannot serve as a key in T2.
---------------------------------------------------------
Could anyone suggest how I would solve my problem using SQL? I would like to avoid cursors and loops. Reply appreciated. Thanks.
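A minimal sketch of the set-based range join being asked for, using the table and column names from the question (it assumes the ranges in T2 really are non-overlapping, so each TestVal matches at most one row):

-- Sketch: join T1 to T2 on the range predicate instead of a key.
UPDATE t1
SET    t1.ResultFactor = t2.Factor
FROM   T1 AS t1
JOIN   T2 AS t2
  ON   t1.TestVal BETWEEN t2.FromVal AND t2.ToVal;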
I need to match each car to its respective class. I've already searched the database for posts on this subject, but unfortunately my goal is yet to be reached. Can someone help?
I am trying to convert a stored procedure that joins 8 tables with INNER and OUTER JOINs. My understanding is that the Lookup task is the one to use and that I should break these joins into smaller blocks, but it takes a long time to load when I do this, since each of these tables has 10-40 million rows and I have 8 tables to go through. Currently the stored procedure takes 3-4 minutes to run; after converting it to 8 Lookup tasks, it ran for 20 minutes. Has anyone run into this issue before and found a workaround?
Hi, I am trying to write a method which needs to call a stored procedure and then get the response of the stored procedure back into a variable declared in the method.

private string GetFromCode(string strWebVersionFromCode, string strWebVersionString)
{
    // call stored procedure
}

strWebVersionFromCode = GetFromCode(strFromCode, "web_version"); // the variable which will store the response

How should I do this? Please assist.
I am trying to create views that would join 2 tables:
Table 1: Has all the columns needed by the view.
Name: Product
Structure: ID, Attribute1, Attribute2, Attribute3, Attribute4, Attribute5, etc.

Table 2: Is a lookup table that provides the names of the columns.
Name: lookupTable
Structure: tableName, columnName, columnValue
Values:
Product, Attribute1, Color
Product, Attribute2, Size
Product, Attribute3, Flavor
Product, Attribute4, Shape
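A view's column names cannot be read from data at query time, so a static view has to hard-code the names recorded in lookupTable; a sketch using the values above (generating it would mean building the CREATE VIEW text from lookupTable with dynamic SQL):

-- Sketch: static view renaming the generic columns per lookupTable's rows.
CREATE VIEW ProductView
AS
SELECT ID,
       Attribute1 AS Color,
       Attribute2 AS Size,
       Attribute3 AS Flavor,
       Attribute4 AS Shape
FROM   Product;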
Hi, I just have a DataSet with my tables and that's it. I have a GridView with several data items on it; no problem getting the data or inserting, but as soon as I try to delete or update some records the local machine throws the same error: "Unable to find nongeneric method...". I've tried to create an Update query in my TableAdapters, but it's still not working. I also tried to remove the original_{0} and got the same error. Please help if anyone has a solution.
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the Fuzzy Lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem inline with expectations? Any thoughts to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Say I want to look up a value in another dataset, but there is a grouping that requires you to know the values at each level in order to get to the correct detail record. Can you still use the Lookup function with more than one field to compare against? So, for example:
Department
 \___ SalesPerson
       \___ Measure
I want to be able to add a new row at the Measure level, but look up each field from another dataset. In order to do that I will need both the Department AND SalesPerson values to do the lookup, but I don't think the Lookup function will let us do that, will it?
I am doing a lookup that requires mapping 2 columns in the column-mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a Conditional Split with 3 outputs. On the first output I have a lookup. When I execute the package I have 56 rows going through the Conditional Split; all rows then go to the 2nd and 3rd outputs, but the lookup on the first output generates the error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error when there are no rows going through it.
I am designing an SSIS package intended to mine text data (data extracted from websites). Term Lookup/Term Extraction are being used as the mining tools. I have lookup terms defined for the reference table, but the main problem lies in extracting the nearby text/numbers/characters around these lookup terms during mining. For example: I found the noun "Email" 200 times (frequency score) in my text; now I want to extract the nearby email address (the same goes for PhoneNumber and Address attributes). So how can I achieve this with SSIS? If you have an idea or suggestion for carrying out this challenge, with or without Term Extraction/Term Lookup, please do write here.
Is this the best and fastest method to connect to a database for DataGrid databinding?
---------
Dim MyConnection As SqlConnection = New SqlConnection(ConfigurationSettings.AppSettings("ConnectionStringSQL"))
Dim MyCommand As SqlCommand = New SqlCommand("sp_BuddiesPendingSelect1", MyConnection)
MyCommand.CommandType = CommandType.StoredProcedure
MyCommand.Parameters.Add(New SqlParameter("@UserID", intUserID))
MyConnection.Open() Dim dr As SqlDataReader = MyCommand.ExecuteReader()
DataList1.DataSource = dr DataList1.DataBind()
dr.Close() MyConnection.Close()
-----------------------
Or should I use a DataAdapter, fill a DataSet, and then attach the DataSet to the DataGrid?
I have been using VB6 for a long time now and had no problems using ADODB.Recordset. I had a module to which I would send my recordset (ByRef) and the SQL command (ByVal) and use the resultant recordset for adding, modifying or deleting records. How can I do the same with VS2005? This is a major problem, as there are DataAdapters, grids, DataReaders, etc. Does someone have a simple method to get the recordset so that I can modify records in code (e.g. .AddNew/.Delete/.Update) and close the connection? Please note that I do not need to display anything and have to run 40/50 Add/Mod/Del operations for every click of the program. Any help is greatly appreciated. Thanks in advance.
If you're not familiar with it, the Rozenshtein Method (http://www.stephenforte.net/owdasblog/permalink.aspx?guid=2b0532fc-4318-4ac0-a405-15d6d813eeb8) uses SQL to create a crosstab. The concept is a stroke of genius, but I'm having trouble getting it to work on one of our production databases.
I successfully used the Northwind example explained at Stephen Forte's site (see the link above), but no luck on my real-world problems.
I can get the date statements to resolve to 0 correctly, but when I try to aggregate the data, the statements turn into 1, multiplying the aggregate data for each cell, which fills in the same value across the entire row.
For example (the columns represent the time period):

GROUP    T1  T2  T3  T4
group1    9   9   9   9
group2    3   3   3   3
group3    5   5   5   5
My SQL code is:

SELECT dbo.tblHassBatch.ProdLine,
       COUNT((dbo.tblHassUUT.UnitID) * (1 - ABS(SIGN(DATEDIFF(dd, dbo.tblHassBatch.StartTime, GETDATE()) - 0)))) AS Today,
       COUNT((dbo.tblHassUUT.UnitID) * (1 - ABS(SIGN(DATEDIFF(ww, dbo.tblHassBatch.StartTime, GETDATE()) - 0)))) AS [This Week],
       COUNT((dbo.tblHassUUT.UnitID) * (1 - ABS(SIGN(DATEDIFF(mm, dbo.tblHassBatch.StartTime, GETDATE()) - 0)))) AS [This Month],
       COUNT((dbo.tblHassUUT.UnitID) * (1 - ABS(SIGN(DATEDIFF(mm, dbo.tblHassBatch.StartTime, GETDATE()) - 1)))) AS [Last Month]
FROM dbo.tblHassBatch
INNER JOIN dbo.tblHassUUT ON dbo.tblHassBatch.BatchID = dbo.tblHassUUT.BatchID
GROUP BY dbo.tblHassBatch.ProdLine
Am I overlooking something here? I'm pulling my hair out b/c if this works it's really going to provide a great solution for a project I'm working on...but I can't seem to figure it out.
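One hedged observation on the technique itself: COUNT() counts every non-NULL expression regardless of its value, so multiplying by a 0/1 characteristic flag does not exclude anything (which would produce the same number in every column, as described above); the Rozenshtein form normally uses SUM over the flag. A sketch of the Today column rewritten that way:

-- Sketch: SUM the 0/1 flag so out-of-period rows contribute 0.
SELECT b.ProdLine,
       SUM(1 - ABS(SIGN(DATEDIFF(dd, b.StartTime, GETDATE()) - 0))) AS Today
FROM   dbo.tblHassBatch AS b
INNER JOIN dbo.tblHassUUT AS u ON b.BatchID = u.BatchID
GROUP  BY b.ProdLine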
I'm not really strong in SQL. My goal is to compare the beginning mileage of a vehicle record with its previous ending mileage reading. I have something that works, but it feels clunky. I wonder if there is a better method, i.e. a join. Here's what I have:
SELECT A.Trolley_num, A.[Date], A.Speedo_start, A.Speedo_end,
       (SELECT B.Speedo_end
        FROM Daily_Trolley AS B
        WHERE B.Trolley_num = A.Trolley_num
          AND B.[Date] = (SELECT MAX([Date])
                          FROM Daily_Trolley AS C
                          WHERE C.Trolley_num = A.Trolley_num
                            AND C.[Date] < '1/23/2005')) AS PrevSpeedoEnd
FROM Daily_Trolley AS A
WHERE A.[Date] = '1/23/2005'
ps: I inherited this db; I'm aware that "Date" should not have been used as a field name.
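A hedged sketch of a join-based alternative (same tables and literal date as above): pre-compute each trolley's latest prior date in a derived table, then join back for its Speedo_end:

-- Sketch: derived-table join instead of a correlated subquery.
SELECT A.Trolley_num, A.[Date], A.Speedo_start, A.Speedo_end,
       B.Speedo_end AS PrevSpeedoEnd
FROM Daily_Trolley AS A
LEFT JOIN (SELECT Trolley_num, MAX([Date]) AS PrevDate
           FROM Daily_Trolley
           WHERE [Date] < '1/23/2005'
           GROUP BY Trolley_num) AS M
       ON M.Trolley_num = A.Trolley_num
LEFT JOIN Daily_Trolley AS B
       ON B.Trolley_num = M.Trolley_num
      AND B.[Date] = M.PrevDate
WHERE A.[Date] = '1/23/2005'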
'TOP 1' or 'DISTINCT' or 'MAX': any suggestions on which is better to use if I need to select a record that has the highest value? It could be an INT or sometimes a DATETIME.
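Roughly: MAX is the natural fit when only the highest value itself is needed; TOP 1 with ORDER BY ... DESC when the whole row is needed; DISTINCT merely removes duplicates and does not pick a highest value at all. A sketch with an invented table and column:

-- Just the highest value:
SELECT MAX(SomeValue) FROM SomeTable;

-- The entire row carrying the highest value (works for INT or DATETIME):
SELECT TOP 1 * FROM SomeTable ORDER BY SomeValue DESC;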
Can someone please tell me how to fix the following error in SQL 2005 when trying to create a maintenance plan: Method Not Found: 'Void Microsoft.SqlServer.Management.DatabaseMaintenance.TaskUIUtils..ctor()' (Microsoft.SqlServer.MaintenancePlanTasksUI).
Is there an efficient scripting method to update the connection string for ALL reports that reside on a reporting/web server (automating the process, rather than having to change the data source for each individual report on that server)?
I have never used replication before and I have been asked to consider it in a project I am currently working on. I have created an application for a sales team which is loaded on their machines; it uses MS SQL as its data source and connects via the internet back to the central server in the office. The problem is this has proven too slow, causing time-out error messages and so on. I have been told to research the possibility of replication, but am unclear what type of replication to use or where to start. Any assistance would be appreciated. Regards, Ben