Preparing the Database for In-Memory Capability:
Changing Database Compatibility Level:
This is needed so that queries against memory-optimized tables can use parallel plans and statistics are auto-updated in the database.
ALTER DATABASE FACILITIES_FARM_DB SET COMPATIBILITY_LEVEL = 130;
Enabling MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT Option:
Enables the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT to avoid the need to use the WITH (SNAPSHOT) hint for ad hoc queries accessing memory-optimized tables.
ALTER DATABASE FACILITIES_FARM_DB
SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
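As a quick sanity check, both settings can be verified by querying sys.databases (column names as of SQL Server 2016):

```sql
SELECT name,
       compatibility_level,
       is_memory_optimized_elevate_to_snapshot_on
FROM sys.databases
WHERE name = 'FACILITIES_FARM_DB';
```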
Add a FILEGROUP and CONTAINER:
- Only one memory-optimized filegroup can be created per database. All the data and metadata for memory-optimized tables is stored in this filegroup.
- Now, I am creating a filegroup named FACILITIES_FARM_DB_mod using the following statement:
ALTER DATABASE FACILITIES_FARM_DB
ADD FILEGROUP FACILITIES_FARM_DB_mod CONTAINS MEMORY_OPTIMIZED_DATA;
- At least one container must be added to this filegroup. The following statement adds a container to the FACILITIES_FARM_DB_mod filegroup:
ALTER DATABASE FACILITIES_FARM_DB
ADD FILE (name='FACILITIES_FARM_DB_mod2', filename='F:\data\FACILITIES_FARM_DB_mod2') TO FILEGROUP FACILITIES_FARM_DB_mod;
These statements add the MEMORY_OPTIMIZED_DATA filegroup; the same can also be done through the UI in SSMS.
Next, create a schema to hold the in-memory tables:
CREATE SCHEMA INFY_SCHEMA;
GO
To populate the in-memory tables, I chose as my source another database in which the tables are present. This is a transactional (OLTP) database where records arrive every second. Every record in these tables also captures an update_dt, which we will use to build the incremental strategy. To fetch data from the tables in this database, we need to link it to our FACILITIES_FARM_DB using the Linked Server setup in SSMS.
Linked Server can be added from Server Objects in SSMS.
Select SQL Server as the provider, give the linked server a name, and enter the server details of your source database.
Using these steps, I created a linked server named FACILITIES_MASTER_DB_SERVER, which connects me to FACILITIES_MASTER_DB and lets me browse and reference the tables in the master database.
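If you prefer scripting over the SSMS UI, the same linked server can be created with sp_addlinkedserver and sp_addlinkedsrvlogin; the source instance name and login below are placeholders for your environment:

```sql
EXEC sp_addlinkedserver
     @server     = N'FACILITIES_MASTER_DB_SERVER',  -- linked server name
     @srvproduct = N'',
     @provider   = N'SQLNCLI',                      -- SQL Server Native Client
     @datasrc    = N'SQLSRC01';                     -- placeholder: source instance name

-- Map logins as appropriate for your environment (placeholder credentials)
EXEC sp_addlinkedsrvlogin
     @rmtsrvname  = N'FACILITIES_MASTER_DB_SERVER',
     @useself     = N'FALSE',
     @rmtuser     = N'etl_user',
     @rmtpassword = N'********';
```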
The points to ponder before we create memory-optimized tables:
- Change the database compatibility level to 130 (Microsoft's recommendation for memory-optimized tables)
- Enable the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT (automatically elevates ad hoc queries to SNAPSHOT isolation, so no hint is needed)
- Enable the MEMORY_OPTIMIZED option on the table
- Set DURABILITY to SCHEMA_AND_DATA (persists both the schema and the data of the table to disk)
Let's create a new in-memory optimized table called INFY_PMU_ORDER in schema INFY_SCHEMA.
The CREATE TABLE script for a memory-optimized table is a regular script, but we need to add two options at the end: MEMORY_OPTIMIZED and DURABILITY. Also, a nonclustered primary key index is mandatory for a durable table.
CREATE TABLE [INFY_SCHEMA].[INFY_PMU_ORDER](
[ROW_WID] [int] NOT NULL PRIMARY KEY NONCLUSTERED,
[INSERT_DT] [datetime] NULL,
[UPDATE_DT] [datetime] NULL,
[PROJECT_ID] [varchar](32) NULL,
[ORDER_NUMBER] [varchar](4000) NULL,
[TYPE] [varchar](100) NULL,
[ORDER_TYPE] [varchar](30) NULL,
[COMPLETION_DATE] [datetime] NULL,
) WITH
(MEMORY_OPTIMIZED = ON,
DURABILITY = SCHEMA_AND_DATA);
GO
A control table is used to maintain and monitor the delta records. It typically holds the load start time, the load end time, and the status of the load (RUNNING/SUCCESS). In the control table we use, there is one record per table per day with these typical fields plus the counts of records affected by each UPDATE and each INSERT. The structure of our control table is as follows:
CREATE TABLE [INFY_SCHEMA].[CONTROL_TABLE](
[SNO] [int] NULL,
[BATCH_ID] [int] NULL,
[CURR_ETL_RUN_DATETIME] [datetime] NULL,
[LAST_ETL_RUN_DATETIME] [datetime] NULL,
[STATUS] [nvarchar](50) NULL,
[TABLE_NAME] [nvarchar](50) NULL,
[UPDATE_COUNT] [int] NULL,
[INSERT_COUNT] [int] NULL,
[SOURCE] [nvarchar](20) NULL,
[LOAD_TYPE] [varchar](30) NULL
) ON [PRIMARY]
GO
For every incremental load, we update the table's record in the control table with the insert and update counts, along with the batch id and sno.
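Because the procedure below looks up MAX(sno) and MAX(batch_id), at least one row must already exist for each table; the initial full load is therefore recorded once up front. A possible seed insert (the values shown are illustrative):

```sql
INSERT INTO INFY_SCHEMA.CONTROL_TABLE
    (SNO, BATCH_ID, CURR_ETL_RUN_DATETIME, LAST_ETL_RUN_DATETIME,
     STATUS, TABLE_NAME, UPDATE_COUNT, INSERT_COUNT, SOURCE, LOAD_TYPE)
VALUES
    (1, 1, GETDATE(), NULL,
     'SUCCESS', 'INFY_PMU_ORDER', 0, 0, 'DATAHUB', 'FULL');
```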
Stored Procedure with Incremental Logic:
This stored procedure implements the incremental logic for all the in-memory tables. The source for these is the Datahub database, and a linked server must be created between the Datahub database and the replicator database. Once the linked server is established, the incremental strategy is implemented with a set of INSERT and UPDATE statements. The steps to follow for each table are:
1. Insert the full load record in the control table
2. Get the max start datetime from the control table into a variable say @UpdateDt
3. Insert a default record into control table with status 'RUNNING'
4. Update the existing records in respective table based on the value @UpdateDt
5. Get the rows count using @@ROWCOUNT and update the UPDATE_COUNT field in control table
6. Insert the new records in respective table based on the value @UpdateDt
7. Get the rows count using @@ROWCOUNT and update the INSERT_COUNT field in control table
8. Update the relevant record in control table with status 'SUCCESS'
The above block of steps is repeated for as many tables as you need, and thus the procedure is created.
Here is the script for this procedure:
CREATE PROCEDURE [dbo].[Sp_facilities_farm_incremental]
AS
BEGIN
SET nocount ON;
DECLARE @UpdateDt DATETIME
DECLARE @RowCount1 INTEGER
DECLARE @RowCount2 INTEGER
/************************ INFY_PMU_ORDER ***********************************/
SET @UpdateDt = (SELECT Max(curr_etl_run_datetime)
FROM infy_schema.control_table
WHERE status = 'SUCCESS'
AND table_name = 'INFY_PMU_ORDER')
INSERT INTO infy_schema.control_table
([sno],
[batch_id],
[curr_etl_run_datetime],
[last_etl_run_datetime],
[status],
[table_name],
[update_count],
[insert_count],
[source],
[load_type])
VALUES ((SELECT Max(sno) + 1
FROM infy_schema.control_table
WHERE ( status = 'SUCCESS'
OR status = 'RUNNING' )),
(SELECT Max(batch_id) + 1
FROM infy_schema.control_table
WHERE table_name = 'INFY_PMU_ORDER'
AND ( status = 'SUCCESS'
OR status = 'RUNNING' )),
Getdate(),
(SELECT Max(curr_etl_run_datetime)
FROM infy_schema.control_table
WHERE table_name = 'INFY_PMU_ORDER'
AND status = 'SUCCESS'),
'RUNNING',
'INFY_PMU_ORDER',
NULL,
NULL,
'DATAHUB',
'INCREMENTAL');
UPDATE tab1
SET tab1.insert_dt = tab2.insert_dt,
tab1.update_dt = tab2.update_dt,
tab1.project_id = tab2.project_id,
tab1.order_number = tab2.order_number,
tab1.type = tab2.type,
tab1.order_type = tab2.order_type,
tab1.completion_date = tab2.completion_date
FROM [INFY_SCHEMA].[infy_pmu_order] AS tab1
INNER JOIN (SELECT [row_wid],
[insert_dt],
[update_dt],
[project_id],
[order_number],
[type],
[order_type],
[completion_date]
FROM
[FACILITIES_MASTER_DB_SERVER].[FACILITIES_MASTER_DB].[INFY_SCHEMA].[infy_pmu_order]
WHERE update_dt >= @UpdateDt) tab2
ON tab1.row_wid = tab2.row_wid;
SELECT @RowCount1 = @@ROWCOUNT;
UPDATE infy_schema.control_table
SET update_count = @RowCount1
WHERE sno = (SELECT Max(sno)
FROM infy_schema.control_table
WHERE table_name = 'INFY_PMU_ORDER'
AND status = 'RUNNING');
INSERT INTO [INFY_SCHEMA].[infy_pmu_order]
([row_wid],
[insert_dt],
[update_dt],
[project_id],
[order_number],
[type],
[order_type],
[completion_date])
SELECT [row_wid],
[insert_dt],
[update_dt],
[project_id],
[order_number],
[type],
[order_type],
[completion_date]
FROM
[FACILITIES_MASTER_DB_SERVER].[FACILITIES_MASTER_DB].[INFY_SCHEMA].[infy_pmu_order]
WHERE update_dt >= @UpdateDt
AND row_wid NOT IN (SELECT row_wid
FROM [INFY_SCHEMA].[infy_pmu_order]);
SELECT @RowCount2 = @@ROWCOUNT;
UPDATE infy_schema.control_table
SET insert_count = @RowCount2
WHERE sno = (SELECT Max(sno)
FROM infy_schema.control_table
WHERE table_name = 'INFY_PMU_ORDER'
AND status = 'RUNNING');
UPDATE infy_schema.control_table
SET status = 'SUCCESS'
WHERE sno = (SELECT Max(sno)
FROM infy_schema.control_table
WHERE table_name = 'INFY_PMU_ORDER'
AND status = 'RUNNING');
RETURN 1
END
To execute the procedure, go to:
FACILITIES_FARM_DB -> Programmability -> Stored Procedures -> Sp_facilities_farm_incremental (right click) -> Execute Stored Procedure
Now schedule the procedure created in the previous step using SQL Server Agent in SSMS. After giving the job a name in the General tab, set the timing and repeat properties in the Schedules tab.
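The same Agent job can also be scripted with the msdb stored procedures; the job name and the 15-minute schedule below are only examples:

```sql
USE msdb;
GO
EXEC dbo.sp_add_job
     @job_name = N'FACILITIES_FARM_Incremental_Load';

EXEC dbo.sp_add_jobstep
     @job_name      = N'FACILITIES_FARM_Incremental_Load',
     @step_name     = N'Run incremental procedure',
     @subsystem     = N'TSQL',
     @database_name = N'FACILITIES_FARM_DB',
     @command       = N'EXEC dbo.Sp_facilities_farm_incremental;';

EXEC dbo.sp_add_jobschedule
     @job_name             = N'FACILITIES_FARM_Incremental_Load',
     @name                 = N'Every 15 minutes',
     @freq_type            = 4,   -- daily
     @freq_interval        = 1,
     @freq_subday_type     = 4,   -- unit: minutes
     @freq_subday_interval = 15;  -- every 15 of that unit

EXEC dbo.sp_add_jobserver
     @job_name = N'FACILITIES_FARM_Incremental_Load';
GO
```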
Creating Indexes on Memory Optimized Tables:
The regular (CREATE INDEX ...) statement doesn't work on in-memory tables. Instead, we need to use the (ALTER TABLE ... ADD INDEX ...) syntax.
ALTER TABLE INFY_SCHEMA.INFY_PMU_ORDER ADD CONSTRAINT constraint_Unique_PI
UNIQUE NONCLUSTERED (UPDATE_DT DESC, PROJECT_ID);
ALTER TABLE INFY_SCHEMA.INFY_PMU_ORDER ADD INDEX hash_projectid
HASH (PROJECT_ID) WITH (BUCKET_COUNT = 64); -- nonunique hash index
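The bucket count of a hash index should be sized against the actual number of distinct keys; the sys.dm_db_xtp_hash_index_stats DMV reports chain lengths and empty buckets, which indicate whether the count is too low or too high. A rough diagnostic query:

```sql
SELECT OBJECT_NAME(hs.object_id) AS table_name,
       i.name                    AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,    -- very few empty buckets => bucket count too low
       hs.avg_chain_length,      -- long chains => many key collisions
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
  ON hs.object_id = i.object_id
 AND hs.index_id  = i.index_id;
```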